Search Results (242)

Search Parameters:
Keywords = single-shot detector

19 pages, 5844 KB  
Article
Cloud Particle Detection in 2D-S Imaging Data via an Adaptive Anchor SSD Model
by Shuo Liu, Dingkun Yang and Luhong Fan
Atmosphere 2025, 16(8), 985; https://doi.org/10.3390/atmos16080985 - 19 Aug 2025
Viewed by 497
Abstract
The airborne 2D-S optical array probe has been in operation for more than ten years and has collected a large number of cloud particle images. However, existing detection methods cannot detect cloud particles with high precision because of the wide range of particle sizes and the particle fragmentation that occurs during imaging. This paper therefore proposes a novel cloud particle detection method. The key innovation is an adaptive anchor SSD module, which overcomes existing limitations by generating anchor points that adaptively align with cloud particle size distributions. First, morphological transformations generate multi-scale image information through repeated dilation and erosion operations, while removing irrelevant artifacts and fragmented particles for data cleaning. The method then generates geometric and mass centers across multiple scales and dynamically merges these centers to form adaptive anchor points. Finally, a detection module integrates a modified SSD with a ResNet-50 backbone for accurate bounding box predictions. Experimental results show that the proposed method achieves an mAP of 0.934 and a recall of 0.905 on the test set, demonstrating its effectiveness and reliability for cloud particle detection using the 2D-S probe.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Atmospheric Sciences)
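
As a rough illustration of the anchor-generation step described in the abstract, the sketch below derives candidate anchor points from geometric (bounding-box) and mass (centroid) centers computed at several morphological scales. It is a minimal sketch, not the authors' code: the scale set, the minimum-area threshold used for fragment cleaning, and the merge distance are illustrative assumptions.

```python
import cv2
import numpy as np

def adaptive_anchor_points(binary_img, scales=(1, 2, 3), min_area=5, merge_dist=10):
    """Derive anchor points from particle centers found at several
    morphological scales (illustrative parameters, not the paper's)."""
    kernel = np.ones((3, 3), np.uint8)
    centers = []
    for s in scales:
        # Repeated dilation and erosion give a coarser view of each particle.
        img = cv2.dilate(binary_img, kernel, iterations=s)
        img = cv2.erode(img, kernel, iterations=s)
        n, _, stats, centroids = cv2.connectedComponentsWithStats(img)
        for i in range(1, n):                       # label 0 is background
            if stats[i, cv2.CC_STAT_AREA] < min_area:
                continue                            # drop fragments (cleaning)
            x, y, w, h = stats[i, :4]
            centers.append((x + w / 2.0, y + h / 2.0))  # geometric center
            centers.append(tuple(centroids[i]))         # mass (area) centroid
    merged = []                                     # greedy center merging
    for c in centers:
        if all(np.hypot(c[0] - m[0], c[1] - m[1]) > merge_dist for m in merged):
            merged.append(c)
    return np.array(merged)
```

The merged points would then seed the anchors of the modified SSD head rather than a fixed regular grid.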

21 pages, 5889 KB  
Article
Mobile-YOLO: A Lightweight Object Detection Algorithm for Four Categories of Aquatic Organisms
by Hanyu Jiang, Jing Zhao, Fuyu Ma, Yan Yang and Ruiwen Yi
Fishes 2025, 10(7), 348; https://doi.org/10.3390/fishes10070348 - 14 Jul 2025
Viewed by 825
Abstract
Accurate and rapid aquatic organism recognition is a core technology for fisheries automation and aquatic organism statistical research. However, due to absorption and scattering effects, images of aquatic organisms often suffer from poor contrast and color distortion. Additionally, the clustering behavior of aquatic organisms often leads to occlusion, further complicating the identification task. This study proposes a lightweight object detection model, Mobile-YOLO, for the recognition of four representative aquatic organisms: holothurian, echinus, scallop, and starfish. Our model first utilizes our proposed Mobile-Nano backbone network, which enhances feature perception while maintaining a lightweight design. We then propose a lightweight detection head, LDtect, which balances a lightweight structure with high accuracy. Additionally, we introduce Dysample (dynamic sampling) and HWD (Haar wavelet downsampling) modules, which optimize the feature fusion structure by improving upsampling and downsampling, furthering the lightweight goal and compensating for the accuracy loss caused by the lightweight design of LDtect. Compared to the baseline model, our model reduces Params (parameters) by 32.2%, FLOPs (floating-point operations) by 28.4%, and weights (model storage size) by 30.8%, while improving FPS (frames per second) by 95.2% and raising mAP (mean average precision) by 1.6%. This accuracy gain can translate into better performance in practical applications such as marine species monitoring, conservation, and biodiversity assessment. Compared with the YOLO (You Only Look Once) series (YOLOv5–YOLOv12), SSD (Single Shot MultiBox Detector), EfficientDet, RetinaNet, and RT-DETR (Real-Time Detection Transformer), our model achieves leading overall performance in both accuracy and lightweight design. The results indicate that our research provides technological support for precise and rapid aquatic organism recognition.
(This article belongs to the Special Issue Technology for Fish and Fishery Monitoring)
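
Of the modules listed above, HWD is the most self-contained, so here is a minimal PyTorch sketch of Haar wavelet downsampling under its common formulation: a stride-2 Haar transform whose four subbands are mixed by a 1x1 convolution. Channel sizes and the BatchNorm/ReLU choice are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class HWD(nn.Module):
    """Haar-wavelet downsampling sketch: a stride-2 Haar transform
    replaces pooling, then a 1x1 conv mixes the four subbands."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(4 * in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        a = x[..., 0::2, 0::2]    # the four stride-2 sample grids
        b = x[..., 0::2, 1::2]
        c = x[..., 1::2, 0::2]
        d = x[..., 1::2, 1::2]
        ll = (a + b + c + d) / 2  # low-frequency approximation
        lh = (a - b + c - d) / 2  # horizontal detail
        hl = (a + b - c - d) / 2  # vertical detail
        hh = (a - b - c + d) / 2  # diagonal detail
        return self.proj(torch.cat([ll, lh, hl, hh], dim=1))

feat = torch.randn(1, 64, 80, 80)
print(HWD(64, 128)(feat).shape)   # torch.Size([1, 128, 40, 40])
```

Unlike strided convolution or pooling, the Haar transform is lossless before the projection, which is the usual argument for using it in lightweight downsampling paths.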

25 pages, 1669 KB  
Article
Zero-Shot Infrared Domain Adaptation for Pedestrian Re-Identification via Deep Learning
by Xu Zhang, Yinghui Liu, Liangchen Guo and Huadong Sun
Electronics 2025, 14(14), 2784; https://doi.org/10.3390/electronics14142784 - 10 Jul 2025
Viewed by 570
Abstract
In computer vision, the performance of detectors trained under optimal lighting conditions is significantly impaired when applied to infrared domains, owing to the scarcity of labeled infrared target domain data and the inherent degradation of infrared image quality. Progress in cross-domain pedestrian re-identification is hindered by the lack of labeled infrared image data. To address the degradation of pedestrian recognition in infrared environments, we propose a framework for zero-shot infrared domain adaptation. This integrated approach is designed to mitigate the challenges of pedestrian recognition in infrared domains while enabling zero-shot domain adaptation. Specifically, an advanced reflectance representation learning module and an exchange–re-decomposition–coherence process are employed to learn illumination invariance and to enhance the model's effectiveness, respectively. Additionally, the CLIP (Contrastive Language–Image Pretraining) image encoder and DINO (Distillation with No Labels) are fused for feature extraction, improving model performance under infrared conditions and enhancing its generalization capability. To further improve model performance, we introduce the Non-Local Attention (NLA) module, the Instance-based Weighted Part Attention (IWPA) module, and the Multi-head Self-Attention module. The NLA module captures global feature dependencies, particularly long-range feature relationships, effectively mitigating issues such as blurred or missing image information in feature degradation scenarios. The IWPA module focuses on localized regions to enhance model accuracy in complex backgrounds and unevenly lit scenes. Meanwhile, the Multi-head Self-Attention module captures long-range dependencies between cross-modal features, further strengthening environmental understanding and scene modeling. The key innovation of this work lies in combining and applying existing techniques to a new domain, overcoming the challenges posed by vision in infrared environments. Experimental results on the SYSU-MM01 dataset show that, under the single-shot setting, Rank-1 Accuracy (Rank-1) and mean Average Precision (mAP) values of 37.97% and 37.25%, respectively, were achieved, while in the multi-shot setting, values of 34.96% and 34.14% were attained.
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)
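
The NLA module described above follows the well-known non-local block pattern; the sketch below shows that pattern in PyTorch. The halved inter-channel width and the residual output are conventional choices for non-local blocks, not confirmed details of this paper's variant.

```python
import torch
import torch.nn as nn

class NonLocalAttention(nn.Module):
    """Classic non-local block: every position attends to every other,
    capturing the long-range dependencies the paper relies on."""
    def __init__(self, ch):
        super().__init__()
        self.inter = ch // 2                     # conventional bottleneck width
        self.theta = nn.Conv2d(ch, self.inter, 1)
        self.phi = nn.Conv2d(ch, self.inter, 1)
        self.g = nn.Conv2d(ch, self.inter, 1)
        self.out = nn.Conv2d(self.inter, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # B, HW, C'
        k = self.phi(x).flatten(2)                     # B, C', HW
        v = self.g(x).flatten(2).transpose(1, 2)       # B, HW, C'
        attn = torch.softmax(q @ k, dim=-1)            # pairwise affinities
        y = (attn @ v).transpose(1, 2).reshape(b, self.inter, h, w)
        return x + self.out(y)                         # residual connection

x = torch.randn(2, 64, 24, 12)
print(NonLocalAttention(64)(x).shape)  # torch.Size([2, 64, 24, 12])
```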

11 pages, 6080 KB  
Article
Single-Shot Femtosecond Raster-Framing Imaging with High Spatio-Temporal Resolution Using Wavelength/Polarization Time Coding
by Yang Yang, Yongle Zhu, Xuanke Zeng, Dong He, Li Gu, Zhijian Wang and Jingzhen Li
Photonics 2025, 12(7), 639; https://doi.org/10.3390/photonics12070639 - 24 Jun 2025
Viewed by 427
Abstract
This paper introduces a single-shot ultrafast imaging technique termed wavelength and polarization time-encoded ultrafast raster imaging (WP-URI). By integrating raster imaging principles with wavelength- and polarization-based temporal encoding, the system uses a spatial raster mask and time–space mapping to aggregate multiple two-dimensional temporal raster images onto a single detector plane, thereby enabling the effective spatial separation and extraction of target information. Finally, the target dynamics are recovered using a reconstruction algorithm based on the Nyquist–Shannon sampling theorem. Numerical simulations demonstrate the single-shot acquisition of four dynamic frames at 25 trillion frames per second (Tfps) with an intrinsic spatial resolution of 50 line pairs per millimeter (lp/mm) and a wide field of view. The WP-URI technique achieves unparalleled spatio-temporal resolution and frame rates, offering significant potential for investigating ultrafast phenomena such as matter interactions, carrier dynamics in semiconductor devices, and femtosecond laser–matter processes.
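
To make the raster encoding concrete, the toy sketch below splits one detector image into four interleaved sub-frames (one per wavelength/polarization code) and interpolates each back to full resolution. The 2 x 2 raster period is assumed, and cubic-spline interpolation stands in for the paper's Nyquist–Shannon (sinc) reconstruction.

```python
import numpy as np
from scipy.ndimage import zoom

def extract_raster_frames(composite, period=2):
    """Pull the temporally coded sub-frames off one detector plane and
    interpolate each back to full resolution (cubic spline here as a
    stand-in for sinc reconstruction)."""
    frames = []
    for dy in range(period):
        for dx in range(period):
            sub = composite[dy::period, dx::period]   # one raster channel
            frames.append(zoom(sub, period, order=3)) # upsample to full size
    return frames

composite = np.random.rand(512, 512)   # stand-in for the detector image
frames = extract_raster_frames(composite)
print(len(frames), frames[0].shape)    # 4 (512, 512)
```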

15 pages, 2420 KB  
Article
Performance Comparison of Multipixel Biaxial Scanning Direct Time-of-Flight Light Detection and Ranging Systems With and Without Imaging Optics
by Konstantin Albert, Manuel Ligges, Andre Henschke, Jennifer Ruskowski, Menaka De Zoysa, Susumu Noda and Anton Grabmaier
Sensors 2025, 25(10), 3229; https://doi.org/10.3390/s25103229 - 21 May 2025
Viewed by 735
Abstract
The laser pulse detection probability of a scanning direct time-of-flight light detection and ranging (LiDAR) measurement is evaluated based on the optical signal distribution on a multipixel single photon avalanche diode (SPAD) array. These detectors intrinsically suffer from dead times after the successful detection of a single photon and thus allow only limited counting statistics when multiple returning laser photons are imaged onto a single pixel. By blurring the imaged laser spot, the transition from single-pixel statistics with high signal intensity to multipixel statistics with less signal intensity is examined. Specifically, a comparison is made between the boundary cases in which (i) the returning LiDAR signal is focused through optics onto a single pixel and (ii) the detection is performed without lenses using all available pixels on the sensor matrix. The omission of imaging optics reduces the overall system size and minimizes optical transfer losses, which is crucial given the limited laser emission power permitted by safety standards. The investigation relies on a photon rate model for interfering (background) and signal light, applied to a simulated first-photon sensor architecture. For single-shot scenarios that reflect the optimal use of the time budget in scanning LiDAR systems, the lens-less and blurred approaches can achieve comparable or even superior results to the focusing system. This highlights the potential of fully solid-state scanning LiDAR systems utilizing optical phased arrays or multidirectional laser chips.
(This article belongs to the Special Issue SPAD-Based Sensors and Techniques for Enhanced Sensing Applications)
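
The following toy model illustrates the intuition behind the pixel-count trade-off, not the paper's full photon-rate model: photon counts per pixel follow Poisson statistics, a first-photon pixel reports whichever photon arrives first, and spreading the same photon budget over more pixels dilutes per-pixel background pile-up. The simple s/(s+b) race approximation is our assumption.

```python
import numpy as np

def first_photon_signal_prob(n_sig, n_bg, n_pixels):
    """Toy Poisson model for a first-photon SPAD architecture: the chance
    that at least one pixel's first detected photon is a signal photon."""
    s, b = n_sig / n_pixels, n_bg / n_pixels  # mean photons per pixel
    p_any = 1.0 - np.exp(-(s + b))            # pixel detects >= 1 photon
    p_pixel = p_any * s / (s + b)             # first photon is signal
    # At least one of the n_pixels pixels reports a signal photon.
    return 1.0 - (1.0 - p_pixel) ** n_pixels

# Fixed photon budget (2 signal, 5 background photons) spread over
# progressively more pixels, mimicking defocus/lens-less operation.
for n in (1, 4, 16, 64):
    print(n, round(first_photon_signal_prob(2.0, 5.0, n), 3))
```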

16 pages, 7005 KB  
Article
Digitization of Medical Device Displays Using Deep Learning Models: A Comparative Study
by Pedro Ferreira, Pedro Lobo, Filipa Reis, João L. Vilaça and Pedro Morais
Appl. Sci. 2025, 15(10), 5436; https://doi.org/10.3390/app15105436 - 13 May 2025
Viewed by 744
Abstract
With the growing number of patients living with chronic conditions, there is an increasing need for efficient systems that can automatically capture and convert medical device readings into digital data, particularly in home-based care settings. However, most home-based medical devices are closed systems that do not support straightforward automatic data export and often require complex connections to access or transmit patient information. Since most of these devices display clinical information on a screen, this research explores how a standard smartphone camera, combined with artificial intelligence, can be used to automatically extract the displayed data in a simple and non-intrusive way. In particular, this study provides a comparative analysis of several You Only Look Once (YOLO) and Single Shot MultiBox Detector (SSD) models to evaluate their effectiveness in detecting and recognizing the readings on medical device displays. In addition to these comparisons, we also explore a hybrid approach that combines the YOLOv8l model for object detection with a Convolutional Neural Network (CNN) for classification. Several iterations of the aforementioned models were tested, using image resolutions of 320 × 320 and 640 × 640. The performance was assessed using metrics such as precision, recall, mean average precision at 0.5 Intersection over Union (mAP@50), and frames per second (FPS). The results show that YOLOv8l (640) achieved the highest mAP@50 of 0.979, but at a lower inference speed (13.20 FPS), while YOLOv8n (320) offered the fastest inference (129.79 FPS) with a reduction in mean average precision (0.786). Combining YOLOv8l with a CNN classifier resulted in a slight reduction in overall accuracy (0.96) when compared to the standalone model (0.98). While the results are promising, the study acknowledges certain limitations, including dataset-specific biases, controlled acquisition settings, and challenges in adapting to real-world scenarios. Nevertheless, the comparative analysis offers valuable insights into the trade-off between inference time and accuracy, helping guide the selection of the most suitable model based on the specific demands of the intended scanning application.
(This article belongs to the Special Issue Innovations in Artificial Neural Network Applications)
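
A minimal sketch of the hybrid detect-then-classify pipeline evaluated in the study, using the ultralytics package for YOLOv8 detection and a small PyTorch CNN as the second-stage classifier. The weights file, image path, ten-class output, and the CNN layout itself are placeholders; the paper does not spell out its classifier architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Hypothetical reading classifier; layout is illustrative only.
classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
    nn.Flatten(), nn.Linear(32 * 16, 10),   # e.g. digits 0-9
)
classifier.eval()

detector = YOLO("yolov8l.pt")               # placeholder weights file
results = detector("display_photo.jpg")     # placeholder image path
img = results[0].orig_img                   # original frame, HWC array
for box in results[0].boxes.xyxy:           # one crop per detected region
    x1, y1, x2, y2 = map(int, box.tolist())
    crop = torch.from_numpy(img[y1:y2, x1:x2]).permute(2, 0, 1).float() / 255
    crop = F.interpolate(crop[None], size=(32, 32))  # classifier input size
    with torch.no_grad():
        print(classifier(crop).argmax(1).item())     # predicted class index
```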

13 pages, 3561 KB  
Article
Retrospective Clinical Trial to Evaluate the Effectiveness of a New Tanner–Whitehouse-Based Bone Age Assessment Algorithm Trained with a Deep Neural Network System
by Meesun Lee, Young-Hun Choi, Seul-Bi Lee, Jae-Won Choi, Seunghyun Lee, Jae-Yeon Hwang, Jung-Eun Cheon, SungHyuk Hong, Jeonghoon Kim and Yeon-Jin Cho
Diagnostics 2025, 15(8), 993; https://doi.org/10.3390/diagnostics15080993 - 14 Apr 2025
Viewed by 858
Abstract
Background/Objectives: To develop an automated deep learning-based bone age prediction model using the Tanner–Whitehouse (TW3) method and evaluate its feasibility by comparing its performance with that of pediatric radiologists. Methods: The hand and wrist radiographs of 560 Korean children and adolescents (280 female, 280 male, mean age 9.43 ± 2.92 years) were evaluated using the TW3-based model and by three pediatric radiologists. Images with bony destruction, congenital anomalies, or non-diagnostic quality were excluded. A commercialized AI solution built upon the Rotated Single Shot MultiBox Detector (SSD) and EfficientNet-B0 was used. Bone age measurements from the model and the radiologists were compared using paired t-tests. Linear regression analysis was performed, and the coefficient of determination (r²), mean absolute error (MAE), and root mean square error (RMSE) were measured. A Bland–Altman analysis was conducted, and the proportion of bone age predictions within 0.6 years of the radiologists' assessments was calculated. Results: The TW3-based model showed no significant differences from the radiologists' bone age measurements, except for participants <6 and >13 years old (overall, p = 0.874; 6–8 years, p = 0.737; 8–9 years, p = 0.093; 9–10 years, p = 0.301; 10–11 years, p = 0.584; 11–13 years, p = 0.976; <6 or >13 years, p < 0.001). There was a strong linear correlation between the model predictions and radiologist assessments (r² = 0.977). The RMSE and MAE values of the model were 0.529 (95% CI, 0.482–0.575) and 0.388 (95% CI, 0.361–0.417) years, respectively. Overall, 82.3% of the model's bone age predictions were within 0.6 years of the radiologists' interpretation. Conclusions: Automated deep learning-based bone age assessment has the potential to reduce radiologists' workload and provide standardized measurements for clinical decision making.
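
The agreement statistics named above (paired t-test, linear regression r², MAE, RMSE, Bland–Altman bias and limits, and the within-0.6-years proportion) can all be reproduced with SciPy and NumPy; the sketch below runs them on a five-case toy sample, not the study's data.

```python
import numpy as np
from scipy import stats

model = np.array([8.2, 9.1, 10.4, 11.0, 7.6])    # toy model predictions (years)
readers = np.array([8.0, 9.4, 10.1, 11.3, 7.9])  # toy radiologist values (years)

t, p = stats.ttest_rel(model, readers)            # paired t-test
res = stats.linregress(readers, model)
r2 = res.rvalue ** 2                              # coefficient of determination
mae = np.mean(np.abs(model - readers))
rmse = np.sqrt(np.mean((model - readers) ** 2))
diff = model - readers                            # Bland-Altman quantities
bias, loa = diff.mean(), 1.96 * diff.std(ddof=1)  # bias and limits of agreement
within = np.mean(np.abs(diff) <= 0.6)             # fraction within 0.6 years
print(f"p={p:.3f} r2={r2:.3f} MAE={mae:.3f} RMSE={rmse:.3f} "
      f"bias={bias:.3f}+/-{loa:.3f} within0.6={within:.0%}")
```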

18 pages, 3958 KB  
Article
AI-Driven UAV Surveillance for Agricultural Fire Safety
by Akmalbek Abdusalomov, Sabina Umirzakova, Komil Tashev, Nodir Egamberdiev, Guzalxon Belalova, Azizjon Meliboev, Ibragim Atadjanov, Zavqiddin Temirov and Young Im Cho
Fire 2025, 8(4), 142; https://doi.org/10.3390/fire8040142 - 2 Apr 2025
Cited by 4 | Viewed by 1368
Abstract
The increasing frequency and severity of agricultural fires pose significant threats to food security, economic stability, and environmental sustainability. Traditional fire-detection methods, which rely on satellite imagery and ground-based sensors, often suffer from delayed response times and high false-positive rates, limiting their effectiveness in mitigating fire-related damage. In this study, we propose an advanced deep learning-based fire-detection framework that integrates the Single-Shot MultiBox Detector (SSD) with the computationally efficient MobileNetV2 architecture. This integration enhances real-time fire- and smoke-detection capabilities while maintaining a lightweight, deployable model suitable for Unmanned Aerial Vehicle (UAV)-based agricultural monitoring. The proposed model was trained and evaluated on a custom dataset comprising diverse fire scenarios, including various environmental conditions and fire intensities. Comprehensive experiments and comparative analyses against state-of-the-art object-detection models, such as You Only Look Once (YOLO), Faster Region-based Convolutional Neural Network (Faster R-CNN), and SSD-based variants, demonstrated the superior performance of our model. The results indicate that our approach achieves a mean Average Precision (mAP) of 97.7%, significantly surpassing conventional models while maintaining a detection speed of 45 frames per second (fps) and requiring only 5.0 GFLOPs of computational power. These characteristics make it particularly suitable for deployment in edge-computing environments, such as UAVs and remote agricultural monitoring systems.
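
As a rough stand-in for the SSD-plus-MobileNetV2 detector described above, the sketch below uses torchvision's off-the-shelf SSDLite with a MobileNetV3 backbone, the closest packaged equivalent (torchvision does not ship an SSD-MobileNetV2 model). The class count and confidence threshold are illustrative.

```python
import torch
from torchvision.models.detection import ssdlite320_mobilenet_v3_large

# Closest off-the-shelf stand-in for the paper's SSD + MobileNetV2 pairing.
model = ssdlite320_mobilenet_v3_large(num_classes=3)  # background, fire, smoke
model.eval()

frames = [torch.rand(3, 320, 320)]     # one UAV video frame (random stand-in)
with torch.no_grad():
    out = model(frames)[0]             # dict with boxes, labels, scores
keep = out["scores"] > 0.5             # illustrative confidence threshold
print(out["boxes"][keep], out["labels"][keep])
```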

16 pages, 11868 KB  
Article
A Robust YOLOv5 Model with SE Attention and BIFPN for Jishan Jujube Detection in Complex Agricultural Environments
by Hao Chen, Lijun Su, Yiren Tian, Yixin Chai, Gang Hu and Weiyi Mu
Agriculture 2025, 15(6), 665; https://doi.org/10.3390/agriculture15060665 - 20 Mar 2025
Cited by 2 | Viewed by 1407
Abstract
This study presents an improved detection model based on the YOLOv5 (You Only Look Once version 5) framework to enhance the accuracy of Jishan jujube detection in complex natural environments, particularly with varying degrees of occlusion and dense foliage. To improve detection performance, we integrate an SE (squeeze-and-excitation) attention module into the backbone network to enhance the model’s ability to focus on target objects while suppressing background noise. Additionally, the original neck network is replaced with a BIFPN (bi-directional feature pyramid network) structure, enabling efficient multiscale feature fusion and improving the extraction of critical features, especially for small and occluded fruits. The experimental results demonstrate that the improved YOLOv5 model achieves a mean average precision (mAP) of 96.5%, outperforming the YOLOv3, YOLOv4, YOLOv5, and SSD (Single-Shot Multibox Detector) models by 7.4%, 9.9%, 2.5%, and 0.8%, respectively. Furthermore, the proposed model improves precision (95.8%) and F1 score (92.4%), reducing false positives and achieving a better balance between precision and recall. These results highlight the model’s effectiveness in addressing missed detections of small and occluded fruits while maintaining higher confidence in predictions.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
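
The SE attention module added to the backbone follows the standard squeeze-and-excitation design; a minimal PyTorch version is sketched below. The reduction ratio of 16 is the conventional default, not a value reported in the paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global pooling summarizes each channel,
    a small bottleneck MLP predicts per-channel reweighting factors."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid(),
        )

    def forward(self, x):
        w = x.mean(dim=(2, 3))             # squeeze: B x C channel summary
        w = self.fc(w)[:, :, None, None]   # excitation: B x C x 1 x 1 weights
        return x * w                       # recalibrate feature channels

x = torch.randn(2, 256, 20, 20)
print(SEBlock(256)(x).shape)  # torch.Size([2, 256, 20, 20])
```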

14 pages, 7611 KB  
Article
Detection of Apple Trees in Orchard Using Monocular Camera
by Stephanie Nix, Airi Sato, Hirokazu Madokoro, Satoshi Yamamoto, Yo Nishimura and Kazuhito Sato
Agriculture 2025, 15(5), 564; https://doi.org/10.3390/agriculture15050564 - 6 Mar 2025
Viewed by 1039
Abstract
This study proposes an object detector for apple trees as a first step in developing agricultural digital twins. An original dataset of orchard images was created and used to train Single Shot MultiBox Detector (SSD) and You Only Look Once (YOLO) models. Performance was evaluated using mean Average Precision (mAP). YOLO significantly outperformed SSD, achieving 91.3% mAP compared to the SSD’s 46.7%. Results indicate YOLO’s Darknet-53 backbone extracts more complex features suited to tree detection. This work demonstrates the potential of deep learning for automated data collection in smart farming applications.
(This article belongs to the Special Issue Innovations in Precision Farming for Sustainable Agriculture)
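
The mAP comparison reported above can be reproduced with torchmetrics' detection metric; the sketch below scores one toy prediction against one toy ground-truth box at IoU 0.5. The boxes, label, and score are made up for illustration.

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

# Toy ground truth and prediction for one orchard image.
target = [{"boxes": torch.tensor([[10., 20., 110., 220.]]),
           "labels": torch.tensor([0])}]           # one apple tree
preds = [{"boxes": torch.tensor([[12., 25., 105., 210.]]),
          "scores": torch.tensor([0.92]),
          "labels": torch.tensor([0])}]

metric = MeanAveragePrecision(iou_thresholds=[0.5])
metric.update(preds, target)
print(metric.compute()["map_50"])  # AP at IoU 0.5 for this toy case
```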

27 pages, 10747 KB  
Article
MC-EVM: A Movement-Compensated EVM Algorithm with Face Detection for Remote Pulse Monitoring
by Abdallah Benhamida and Miklos Kozlovszky
Appl. Sci. 2025, 15(3), 1652; https://doi.org/10.3390/app15031652 - 6 Feb 2025
Viewed by 1286
Abstract
Automated tasks, particularly in the biomedical field, help develop new techniques for faster monitoring of patients' health status. For instance, they help measure different types of human bio-signals, perform fast data analysis, and enable overall patient status monitoring. Eulerian Video Magnification (EVM) can reveal small-scale, hidden changes in real life, such as the color and motion changes used to detect the actual pulse. However, patient movement during the measurement causes the EVM process to produce wrong pulse estimates. In this research, we provide a working prototype for effective artefact elimination using a face-movement-compensated EVM (MC-EVM), which tracks the human face as the main Region Of Interest (ROI) and then uses EVM to estimate the pulse. Our primary contribution lies in the development and training of two face detection models using TensorFlow Lite: the Single-Shot MultiBox Detector (SSD) and EfficientDet-Lite0 models, which are selected according to the computational capabilities of the device in use. By employing one of these models, we can crop the face accurately from the video, which is then processed using EVM to estimate the pulse. MC-EVM showed very promising results and ensured robust pulse measurement by effectively mitigating the impact of patient movement. The results were compared and validated against publicly available ground-truth data and against pre-existing state-of-the-art solutions.
(This article belongs to the Special Issue Monitoring of Human Physiological Signals)
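
After the face ROI is cropped, the pulse readout reduces to band-passing a per-frame skin-color trace and locating the dominant spectral peak. The sketch below shows that step on a synthetic 72 bpm trace; the band limits, filter order, and use of the green channel are common remote-photoplethysmography conventions assumed here, not settings from the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_bpm(green_means, fps=30.0):
    """Band-pass the per-frame mean green value of the face ROI to the
    human pulse range, then read the dominant spectral peak as BPM."""
    b, a = butter(3, [0.7 / (fps / 2), 4.0 / (fps / 2)], btype="band")
    sig = filtfilt(b, a, green_means - np.mean(green_means))
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    return 60.0 * freqs[np.argmax(spec)]

# Synthetic 10 s trace with a 72 bpm (1.2 Hz) component plus noise.
t = np.arange(0, 10, 1 / 30)
trace = 0.5 * np.sin(2 * np.pi * 1.2 * t) + np.random.randn(t.size) * 0.2
print(round(estimate_bpm(trace), 1))  # ~72 bpm
```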

22 pages, 2506 KB  
Article
Segmentation of ADPKD Computed Tomography Images with Deep Learning Approach for Predicting Total Kidney Volume
by Ting-Wen Sheng, Djeane Debora Onthoni, Pushpanjali Gupta, Tsong-Hai Lee and Prasan Kumar Sahoo
Biomedicines 2025, 13(2), 263; https://doi.org/10.3390/biomedicines13020263 - 22 Jan 2025
Cited by 2 | Viewed by 1790
Abstract
Background: Total Kidney Volume (TKV) is widely used to predict the progressive loss of renal function in patients with Autosomal Dominant Polycystic Kidney Disease (ADPKD). Typically, TKV is calculated from Computed Tomography (CT) images by manually locating, delineating, and segmenting the ADPKD kidneys. However, manual localization and segmentation are tedious, time-consuming tasks that are prone to human error. In particular, there is a lack of studies that focus on CT modality variation. Methods: In contrast, our work develops a step-by-step framework that robustly handles both Non-enhanced Computed Tomography (NCCT) and Contrast-enhanced Computed Tomography (CCT) images, ensuring balanced sample utilization and consistent performance across modalities. To achieve this, Artificial Intelligence (AI)-enabled localization and segmentation models are proposed for estimating TKV on both NCCT and CCT images. These AI-based models incorporate various image preprocessing techniques, including dilation and global thresholding, combined with Deep Learning (DL) approaches such as an adapted Single Shot Detector (SSD), Inception V2, and DeepLab V3+. Results: The experimental results demonstrate that the proposed AI-based models outperform other DL architectures, achieving a mean Average Precision (mAP) of 95% for automatic localization, a mean Intersection over Union (mIoU) of 92% for segmentation, and a mean R2 score of 97% for TKV estimation. Conclusions: These results clearly indicate that the proposed AI-based models can robustly localize and segment ADPKD kidneys and estimate TKV using both NCCT and CCT images.
(This article belongs to the Special Issue The Promise of Artificial Intelligence in Kidney Disease)
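
Once the kidneys are segmented, TKV itself is a simple voxel-counting step; the sketch below computes it from a binary mask and the CT voxel spacing. The spacing values and toy mask are illustrative.

```python
import numpy as np

def total_kidney_volume(mask, spacing_mm=(1.0, 0.7, 0.7)):
    """TKV from a binary kidney segmentation of a CT volume:
    voxel count times voxel volume, reported in millilitres."""
    voxel_mm3 = float(np.prod(spacing_mm))                # z, y, x spacing, mm
    return mask.astype(bool).sum() * voxel_mm3 / 1000.0   # mm^3 -> mL

mask = np.zeros((120, 256, 256), dtype=np.uint8)
mask[40:80, 60:120, 60:120] = 1                           # toy "kidney" block
print(round(total_kidney_volume(mask), 1), "mL")
```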

38 pages, 6770 KB  
Article
Evaluation and Selection of Hardware and AI Models for Edge Applications: A Method and A Case Study on UAVs
by Müge Canpolat Şahin and Ayça Kolukısa Tarhan
Appl. Sci. 2025, 15(3), 1026; https://doi.org/10.3390/app15031026 - 21 Jan 2025
Cited by 6 | Viewed by 4635
Abstract
This study proposes a method for selecting suitable edge hardware and the Artificial Intelligence (AI) models to be deployed on it. Edge AI, which enables devices at the network periphery to perform intelligent tasks locally, is rapidly expanding across various domains. However, selecting appropriate edge hardware and AI models is a multi-faceted challenge owing to the wide range of available options, diverse application requirements, and the unique constraints of edge environments, such as limited computational power, strict energy budgets, and the need for real-time processing. Ad hoc approaches often lead to suboptimal solutions and inefficiencies. Considering these issues, we propose a method based on the ISO/IEC 25010:2011 quality standard that integrates Multi-Criteria Decision Analysis (MCDA) techniques to systematically assess both the hardware and software aspects of Edge AI applications. We evaluated the proposed method in a two-stage experiment. In the first stage, to show the applicability of the method across different use cases, we tested it on four UAV scenarios, each presenting distinct edge requirements. In the second stage, we followed the method's recommendations for Scenario I, which identified the STM32H7 series microcontrollers as the suitable hardware and an object detection model with the Single Shot Multi-Box Detector (SSD) architecture and a MobileNet backbone as the suitable AI model, and developed a TensorFlow Lite model from scratch to enhance its efficiency and versatility for object detection tasks across various categories. This additional TensorFlow Lite model shows how the proposed method can guide the further development of optimized AI models tailored to the constraints and requirements of specific edge hardware.
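
To make the MCDA step concrete, the sketch below ranks three hypothetical boards with a simple weighted-sum model over min-max-normalized criteria. All numbers, criteria, and weights are invented for illustration; the paper's method is grounded in ISO/IEC 25010:2011 and may use different MCDA techniques.

```python
import numpy as np

# Toy decision matrix: rows = candidate edge boards, columns = criteria
# (latency in ms, power in W, model mAP). Values are invented.
boards = ["STM32H7", "Jetson Nano", "Coral Dev"]
X = np.array([[40.0, 0.5, 0.61],
              [15.0, 10.0, 0.78],
              [12.0, 4.0, 0.74]])
benefit = np.array([False, False, True])  # latency/power: lower is better
weights = np.array([0.4, 0.3, 0.3])       # assumed stakeholder weights

# Min-max normalize each criterion, flipping cost criteria so that
# higher normalized values are always better.
lo, hi = X.min(0), X.max(0)
N = (X - lo) / (hi - lo)
N[:, ~benefit] = 1.0 - N[:, ~benefit]
scores = N @ weights                      # simple weighted-sum MCDA
for b, s in sorted(zip(boards, scores), key=lambda p: -p[1]):
    print(f"{b}: {s:.3f}")
```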

13 pages, 3531 KB  
Article
Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection
by Guimei Qi, Zhihong Yu and Jian Song
Appl. Sci. 2025, 15(2), 924; https://doi.org/10.3390/app15020924 - 18 Jan 2025
Cited by 3 | Viewed by 1271
Abstract
Accurate and efficient object detection in UAV images is a challenging task due to the diversity of target scales and the massive number of small targets. This study investigates enhancing the detection head with sparse convolution, which proves effective in achieving an optimal balance between accuracy and efficiency. Nevertheless, sparse convolution struggles to incorporate global contextual information, and its fixed mask ratios make the network inflexible. To address these issues, we propose the MFFCESSC-SSD, a novel single-shot detector (SSD) with multi-scale feature fusion and context-enhanced spatial sparse convolution. First, a global context-enhanced group normalization (CE-GN) layer is developed to address the information loss resulting from applying convolution exclusively to the masked region. Subsequently, a dynamic masking strategy is designed to determine the optimal mask ratios, thereby ensuring compact foreground coverage that enhances both accuracy and efficiency. Experiments on two datasets (VisDrone and ARH2000, the latter created by the authors) demonstrate that the MFFCESSC-SSD markedly outperforms the SSD and numerous conventional object detection algorithms in terms of accuracy and efficiency.
(This article belongs to the Special Issue Object Detection and Image Classification)
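
The sketch below emulates spatial sparse convolution by masking a dense convolution's output, with the mask ratio chosen dynamically from the activation distribution, echoing the dynamic masking strategy described above. True sparse convolution would skip the masked computation entirely; the saliency measure and quantile threshold here are our assumptions.

```python
import torch
import torch.nn as nn

class MaskedConv(nn.Module):
    """Dense emulation of spatial sparse convolution: a saliency map picks
    a foreground region, and the keep ratio is derived from the activation
    distribution rather than fixed in advance."""
    def __init__(self, ch, keep_quantile=0.75):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.q = keep_quantile

    def forward(self, x):
        saliency = x.abs().mean(dim=1, keepdim=True)       # B x 1 x H x W
        thr = torch.quantile(saliency.flatten(1), self.q, dim=1)
        mask = (saliency >= thr[:, None, None, None]).float()
        return self.conv(x) * mask                          # foreground only

x = torch.randn(2, 64, 40, 40)
y = MaskedConv(64)(x)
print((y != 0).float().mean().item())  # roughly 1 - keep_quantile
```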

22 pages, 6782 KB  
Article
Multi-Modal Prototypes for Few-Shot Object Detection in Remote Sensing Images
by Yanxing Liu, Zongxu Pan, Jianwei Yang, Peiling Zhou and Bingchen Zhang
Remote Sens. 2024, 16(24), 4693; https://doi.org/10.3390/rs16244693 - 16 Dec 2024
Cited by 4 | Viewed by 2660
Abstract
Few-shot object detection has attracted extensive attention because large-scale data labeling is time-consuming or even impractical. Current studies have attempted to employ prototype-matching approaches for object detection, constructing class prototypes from textual or visual features. However, visual prototypes alone exhibit limited generalization in few-shot scenarios, while textual prototypes alone lack the spatial details of remote sensing targets. Therefore, to achieve the best of both worlds, we propose a prototype aggregation module that integrates textual and visual prototypes, leveraging the semantics of textual prototypes and the spatial details of visual prototypes. In addition, the transferability of multi-modal few-shot detectors from natural to remote sensing scenarios remains unexplored, and previous training strategies for few-shot object detection (FSOD) do not adequately consider the characteristics of text encoders. To address this issue, we conducted extensive ablation studies on the detector's feature extractors and propose an efficient two-stage training strategy that takes the characteristics of the text feature extractor into account. Experiments on two common few-shot detection benchmarks demonstrate the effectiveness of the proposed method. On four widely used data splits of DIOR, our method significantly outperforms previous state-of-the-art methods by up to 8.7%.
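
A minimal sketch of prototype aggregation and matching: textual and visual class prototypes are L2-normalized and mixed, and region features are assigned to the most similar fused prototype by cosine similarity. The fixed mixing weight alpha is an assumption; the paper's aggregation module is learned.

```python
import torch
import torch.nn.functional as F

def aggregate_prototypes(text_proto, vis_proto, alpha=0.5):
    """Fuse textual and visual class prototypes; alpha is an assumed
    mixing weight standing in for the paper's learned aggregation."""
    t = F.normalize(text_proto, dim=-1)
    v = F.normalize(vis_proto, dim=-1)
    return F.normalize(alpha * t + (1 - alpha) * v, dim=-1)

def classify(roi_feats, protos):
    """Assign each region feature to the nearest prototype (cosine sim)."""
    sims = F.normalize(roi_feats, dim=-1) @ protos.T
    return sims.argmax(dim=-1), sims

C, D = 5, 256                        # 5 classes, 256-dim embeddings (toy)
protos = aggregate_prototypes(torch.randn(C, D), torch.randn(C, D))
labels, sims = classify(torch.randn(8, D), protos)
print(labels)                        # predicted class per region feature
```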
