Search Results (63)

Search Parameters:
Keywords = transpose convolution

21 pages, 3725 KB  
Article
Pruning-Friendly RGB-T Semantic Segmentation for Real-Time Processing on Edge Devices
by Jun Young Hwang, Youn Joo Lee, Ho Gi Jung and Jae Kyu Suhr
Electronics 2025, 14(17), 3408; https://doi.org/10.3390/electronics14173408 - 27 Aug 2025
Viewed by 638
Abstract
RGB-T semantic segmentation, which uses thermal and RGB images simultaneously, is actively being researched to robustly recognize the surroundings of vehicles under challenging lighting and weather conditions, and such networks must operate in real time on edge devices. Since the transformer-based approaches adopted by most recent RGB-T semantic segmentation studies are very difficult to run on edge devices, this paper considers only CNN-based RGB-T semantic segmentation networks that can run on edge devices in real time. Although EAEFNet shows the best performance among CNN-based networks on edge devices, its inference speed is too slow for real-time operation, and even when channel pruning is applied, the speed improvement is minimal. An analysis of EAEFNet identifies the intermediate fusion of RGB and thermal features and the high complexity of the decoder as the main causes. To address these issues, this paper proposes a network using a ResNet encoder with an early-fused four-channel input and a U-Net decoder structure. To improve decoder performance, bilinear upsampling is replaced with PixelShuffle, and mini Atrous Spatial Pyramid Pooling (ASPP) and Progressive Transposed Module (PTM) modules are applied. Since the proposed network is primarily composed of convolutional layers, channel pruning is confirmed to be effectively applicable. Consequently, channel pruning significantly improves inference speed and enables real-time operation on the neural processing unit (NPU) of edge devices. The proposed network is evaluated on the MFNet dataset, one of the most widely used public datasets for RGB-T semantic segmentation, and achieves performance comparable to EAEFNet while operating at over 30 FPS on an embedded board equipped with the Qualcomm QCS6490 SoC.
(This article belongs to the Special Issue New Insights in 2D and 3D Object Detection and Semantic Segmentation)
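A minimal sketch, assuming PyTorch, of the decoder change this abstract describes: replacing bilinear upsampling with a learnable PixelShuffle step. The module name and channel sizes are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class PixelShuffleUp(nn.Module):
    """2x upsampling: a 1x1 conv expands channels 4x, then PixelShuffle
    rearranges them into a 2x2 spatial grid, making the upsampling
    learnable (unlike fixed bilinear interpolation)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.expand = nn.Conv2d(in_ch, out_ch * 4, kernel_size=1)
        self.shuffle = nn.PixelShuffle(upscale_factor=2)

    def forward(self, x):
        return self.shuffle(self.expand(x))

x = torch.randn(1, 64, 40, 40)          # N, C, H, W decoder feature map
print(PixelShuffleUp(64, 32)(x).shape)  # torch.Size([1, 32, 80, 80])
```

Because the result is built entirely from convolutions, it stays compatible with the channel pruning the paper relies on.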

12 pages, 7323 KB  
Article
WinEdge: Low-Power Winograd CNN Execution with Transposed MRAM for Edge Devices
by Milad Ashtari Gargari, Sepehr Tabrizchi and Arman Roohi
Electronics 2025, 14(12), 2485; https://doi.org/10.3390/electronics14122485 - 19 Jun 2025
Viewed by 580
Abstract
This paper presents a novel transposed MRAM architecture (WinEdge) specifically optimized for Winograd convolution acceleration in edge computing devices. Leveraging Magnetic Tunnel Junctions (MTJs) with Spin Hall Effect (SHE)-assisted Spin-Transfer Torque (STT) writing, the proposed design enables a single SHE current to simultaneously write data to four MTJs, substantially reducing power consumption. Additionally, the integration of stacked MTJs significantly improves storage density. The proposed WinEdge efficiently supports both standard and transposed data access modes regardless of bit-width, achieving up to 36% lower power, 47% reduced energy consumption, and 28% faster processing speed compared to existing designs. Simulations conducted in 45 nm CMOS technology validate its superiority over conventional SRAM-based solutions for convolutional neural network (CNN) acceleration in resource-constrained edge environments.
(This article belongs to the Special Issue Emerging Computing Paradigms for Efficient Edge AI Acceleration)
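Since the memory layout here serves Winograd convolution, a short sketch of the arithmetic being accelerated may help: the F(2,3) variant computes two outputs of a 3-tap filter with four multiplications instead of six. This illustrates the algorithm, not the MRAM circuit design.

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap sliding correlation using
    4 multiplications (direct computation needs 6)."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])           # 4 input samples
g = np.array([0.5, -1.0, 2.0])               # 3 filter taps
direct = np.array([d[0:3] @ g, d[1:4] @ g])  # reference sliding correlation
assert np.allclose(winograd_f23(d, g), direct)
```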

16 pages, 2672 KB  
Article
AI-Powered Convolutional Neural Network Surrogate Modeling for High-Speed Finite Element Analysis in the NPPs Fuel Performance Framework
by Salvatore A. Cancemi, Andrius Ambrutis, Mantas Povilaitis and Rosa Lo Frano
Energies 2025, 18(10), 2557; https://doi.org/10.3390/en18102557 - 15 May 2025
Cited by 2 | Viewed by 2175
Abstract
Convolutional Neural Networks (CNNs) are proposed for use in the nuclear power plant domain as surrogate models to enhance the computational efficiency of finite element analyses in simulating nuclear fuel behavior under varying conditions. The dataset comprises 3D fuel pellet FE models and involves 13 input features, such as pressure, Young’s modulus, and temperature. The CNNs predict outcomes such as displacement, von Mises stress, and creep strain from these inputs, significantly reducing the simulation time from several seconds per analysis to approximately one second. The data are normalized using local and global min–max scaling to maintain consistency across inputs and outputs, facilitating accurate model learning. The CNN architecture includes multiple dense, reshaping, and transpose convolution layers, optimized through a brute-force hyperparameter tuning process and validated using 5-fold cross-validation. The models are trained with the Adam optimizer, and training on a GPU reduces computational time significantly compared with traditional CPUs. The findings suggest that integrating CNN models into nuclear fuel analysis can drastically reduce computational times while maintaining accuracy, making them valuable for real-time monitoring and decision-making within nuclear power plant operations.
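A hedged sketch of the architecture pattern this abstract names (dense, reshaping, and transpose convolution layers mapping the 13 scalar inputs to a predicted field); the layer sizes and grid resolution are assumptions, not the authors' values.

```python
import torch
import torch.nn as nn

surrogate = nn.Sequential(
    nn.Linear(13, 64 * 8 * 8), nn.ReLU(),   # dense layer on the 13 features
    nn.Unflatten(1, (64, 8, 8)),            # reshape to a coarse feature grid
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),              # 16 -> 32
)

x = torch.rand(5, 13)        # min-max scaled inputs, all in [0, 1]
print(surrogate(x).shape)    # torch.Size([5, 1, 32, 32]) predicted field
```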

20 pages, 3901 KB  
Article
Design and Implementation of a Lightweight and Energy-Efficient Semantic Segmentation Accelerator for Embedded Platforms
by Hui Li, Jinyi Li, Bowen Li, Zhengqian Miao and Shengli Lu
Micromachines 2025, 16(3), 258; https://doi.org/10.3390/mi16030258 - 25 Feb 2025
Cited by 1 | Viewed by 1082
Abstract
With the rapid development of lightweight network models and efficient hardware deployment techniques, the demand for real-time semantic segmentation in areas such as autonomous driving and medical image processing has increased significantly. However, realizing efficient semantic segmentation on resource-constrained embedded platforms still faces many challenges. As a classical lightweight semantic segmentation network, ENet has attracted much attention due to its low computational complexity. In this study, we optimize the ENet semantic segmentation network, significantly reducing its computational complexity through structural simplification and 8-bit quantization while maintaining 51.18% mIoU. We also optimize the computational units for transposed convolution and dilated convolution and improve the on-chip data storage and data transfer design to enhance hardware compatibility. The optimized network is successfully deployed on a hardware accelerator and SoC system based on the Xilinx ZYNQ ZCU104 FPGA. The system achieves a frame rate of 130.75 FPS, which meets the real-time processing requirements of areas such as autonomous driving and medical imaging. Meanwhile, the power consumption of the accelerator is 3.479 W, the throughput reaches 460.8 GOPS, and the energy efficiency reaches 132.2 GOPS/W. These results demonstrate the effectiveness of the optimization and deployment strategies in balancing computational efficiency and accuracy, making the system well suited for resource-constrained embedded platform applications.
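For context on what a dedicated transposed-convolution unit computes: a transposed convolution is equivalent to inserting stride-1 zeros between input samples, padding by kernel_size-1, and running an ordinary convolution with the flipped kernel, which is the form hardware dataflows typically implement. A minimal PyTorch check of that identity (illustrative, not the paper's FPGA design):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4)                  # N, C, W
w = torch.randn(1, 1, 3)                  # 3-tap kernel

ref = F.conv_transpose1d(x, w, stride=2)  # built-in transposed convolution

stuffed = torch.zeros(1, 1, 2 * x.shape[-1] - 1)
stuffed[..., ::2] = x                     # insert stride-1 zeros between samples
padded = F.pad(stuffed, (2, 2))           # pad by kernel_size - 1 on both sides
out = F.conv1d(padded, w.flip(-1))        # correlate with the flipped kernel

assert torch.allclose(ref, out, atol=1e-6)
print(ref.shape)                          # torch.Size([1, 1, 9])
```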

23 pages, 20581 KB  
Article
A Novel Pseudo-Siamese Fusion Network for Enhancing Semantic Segmentation of Building Areas in Synthetic Aperture Radar Images
by Mengguang Liao, Longcheng Huang and Shaoning Li
Appl. Sci. 2025, 15(5), 2339; https://doi.org/10.3390/app15052339 - 21 Feb 2025
Viewed by 843
Abstract
Segmenting building areas from synthetic aperture radar (SAR) images holds significant research value and practical application potential. However, the complexity of the environment, the diversity of building shapes, and interference from speckle noise make building area segmentation from SAR images a challenging research topic. Compared to traditional methods, deep learning-driven approaches exhibit superior stability and efficiency. Currently, most segmentation methods use a single neural network to encode SAR images, then decode them through interpolation or transposed convolution operations, and finally obtain the segmented building area images using a loss function. Although effective, such methods lose detailed information and do not fully extract the deep-level features of building areas. Therefore, we propose an innovative network named PSANet. First, two sets of deep-level features of building areas are extracted using ResNet-18 and ResNet-34, and five encoded features of varying scales are obtained through a fusion algorithm. Meanwhile, the information in the deepest encoded features is enriched using an atrous spatial pyramid pooling module. Next, the encoded features are reconstructed through skip connections and transposed convolution operations to obtain discriminative features of the building areas. Finally, the model is optimized using a combined CE-Dice loss function to achieve superior performance. Experimental results on SAR images from regions with different geographical characteristics demonstrate that the proposed PSANet outperforms several recent state-of-the-art methods.
(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation, 2nd Edition)
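A minimal sketch of a combined CE-Dice objective of the kind the abstract names; the equal weighting of the two terms is an assumption.

```python
import torch
import torch.nn.functional as F

def ce_dice_loss(logits, target, num_classes, ce_weight=0.5, eps=1e-6):
    """logits: (N, C, H, W); target: (N, H, W) integer class labels."""
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    denom = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = 1 - (2 * inter + eps) / (denom + eps)   # per-class soft Dice
    return ce_weight * ce + (1 - ce_weight) * dice.mean()

logits = torch.randn(2, 2, 64, 64)                 # building vs. background
target = torch.randint(0, 2, (2, 64, 64))
print(ce_dice_loss(logits, target, num_classes=2))
```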

21 pages, 6133 KB  
Article
BEV Semantic Map Reconstruction for Self-Driving Cars with the Multi-Head Attention Mechanism
by Yi-Cheng Liao, Jichiang Tsai and Hsuan-Ying Chien
Electronics 2025, 14(1), 32; https://doi.org/10.3390/electronics14010032 - 25 Dec 2024
Viewed by 2583
Abstract
Environmental perception is crucial for safe autonomous driving, enabling accurate analysis of the vehicle’s surroundings. While 3D LiDAR is traditionally used for 3D environment reconstruction, its high cost and complexity present challenges. In contrast, camera-based cross-view frameworks offer a cost-effective alternative. Hence, this manuscript proposes a new cross-view model that extracts mapping features from camera images and transfers them to a Bird’s-Eye View (BEV) map. In particular, a multi-head attention mechanism in the decoder architecture generates the final semantic map. Each camera learns embedding information corresponding to its position and angle within the BEV map. Cross-view attention fuses information from different perspectives to predict top-down map features enriched with spatial information. The multi-head attention mechanism then performs global dependency matching, enhancing long-range information and capturing latent relationships between features. Transposed convolution replaces traditional upsampling methods, avoiding highly similar local features and facilitating semantic segmentation inference of the BEV map. Finally, we conduct numerous simulation experiments to verify the performance of our cross-view model.
(This article belongs to the Special Issue Advancement on Smart Vehicles and Smart Travel)
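A hedged sketch of the decoder pattern described above: multi-head attention over flattened BEV feature tokens for global dependency matching, followed by transposed convolution instead of plain upsampling. All dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

B, C, H, W = 1, 128, 25, 25                    # coarse BEV feature map
feat = torch.randn(B, C, H, W)

attn = nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True)
tokens = feat.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
fused, _ = attn(tokens, tokens, tokens)        # global dependency matching
fused = fused.transpose(1, 2).reshape(B, C, H, W)

upsample = nn.ConvTranspose2d(C, 64, kernel_size=4, stride=2, padding=1)
print(upsample(fused).shape)                   # torch.Size([1, 64, 50, 50])
```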

15 pages, 6894 KB  
Article
A Novel Approach to Pedestrian Re-Identification in Low-Light and Zero-Shot Scenarios: Exploring Transposed Convolutional Reflectance Decoders
by Zhenghao Li and Jiping Xiong
Electronics 2024, 13(20), 4069; https://doi.org/10.3390/electronics13204069 - 16 Oct 2024
Viewed by 1506
Abstract
In recent years, pedestrian re-identification technology has made significant progress, with various neural network models performing well under normal conditions such as good weather and adequate lighting. However, most research has overlooked extreme environments such as rainy weather and nighttime, and the existing pedestrian re-identification datasets consist predominantly of well-lit images. Although some studies have started to address these issues by proposing methods for enhancing low-light images to restore their original features, the effectiveness of these approaches remains limited. We noted that a prior method based on Retinex theory designed a reflectance representation learning module aimed at restoring image features as much as possible, but it has so far been applied only in object detection networks. In response, we improved the method and applied it to pedestrian re-identification, proposing a transposed convolution reflectance decoder (TransConvRefDecoder) to better restore details in low-light images. Extensive experiments on the Market1501, CUHK03, and MSMT17 datasets demonstrated that our approach delivers superior performance.

24 pages, 9004 KB  
Article
NPSFF-Net: Enhanced Building Segmentation in Remote Sensing Images via Novel Pseudo-Siamese Feature Fusion
by Ningbo Guo, Mingyong Jiang, Xiaoyu Hu, Zhijuan Su, Weibin Zhang, Ruibo Li and Jiancheng Luo
Remote Sens. 2024, 16(17), 3266; https://doi.org/10.3390/rs16173266 - 3 Sep 2024
Cited by 6 | Viewed by 1787
Abstract
Building segmentation has extensive research value and application prospects in high-resolution remote sensing image (HRSI) processing. However, complex architectural contexts, varied building morphologies, and non-building occlusions make building segmentation challenging. Compared with traditional methods, deep learning-based methods present certain advantages in terms of accuracy and intelligence. At present, the most popular option is to first apply a single neural network to encode an HRSI, then perform a decoding process through up-sampling or a transposed convolution operation, and finally obtain the segmented building image with the help of a loss function. Although effective, this approach not only tends to lead to a loss of detail information, but also fails to fully utilize the contextual features. As an alternative, we propose a novel network called NPSFF-Net. First, using an improved pseudo-Siamese network composed of ResNet-34 and ResNet-50, two sets of deep semantic features of buildings are extracted with the support of transfer learning, and four encoded features at different scales are obtained after fusion. Then, information from the deepest encoded feature is enriched using a feature enhancement module, and the resolutions are recovered via skip connections and transposed convolutions. Finally, the discriminative features of buildings are obtained using the designed feature fusion algorithm, and the optimal segmentation model is obtained by fitting a cross-entropy loss function. Our method obtained intersection-over-union values of 89.45% for the Aerial Imagery Dataset, 71.88% for the Massachusetts Buildings Dataset, and 68.72% for the Satellite Dataset I.
(This article belongs to the Section Remote Sensing Image Processing)

18 pages, 4141 KB  
Article
High-Performance Binocular Disparity Prediction Algorithm for Edge Computing
by Yuxi Cheng, Yang Song, Yi Liu, Hui Zhang and Feng Liu
Sensors 2024, 24(14), 4563; https://doi.org/10.3390/s24144563 - 14 Jul 2024
Viewed by 1680
Abstract
End-to-end disparity estimation algorithms based on cost volumes face structural adaptation problems when deployed on edge neural network accelerators and must maintain accuracy using only the operators the hardware supports. This paper therefore proposes a novel disparity calculation algorithm that uses low-rank approximation to replace 3D convolution and transposed 3D convolution, WReLU to reduce the data compression caused by the activation function, and unimodal cost volume filtering and a confidence estimation network to regularize the cost volume. The approach alleviates the problem of the disparity-matching cost distribution deviating from the true distribution and greatly reduces the computational complexity and parameter count of the algorithm while improving accuracy. Experimental results show that, compared with a typical disparity estimation network, the absolute error of the proposed algorithm is reduced by 38.3%, the three-pixel error is reduced to 1.41%, and the number of parameters is reduced by 67.3%. The calculation accuracy is better than that of other algorithms, and the method is easier to deploy, structurally adaptable, and more practical.
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
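A sketch of the low-rank idea named above: replacing a full 3D convolution with a factorized pair, as in (2+1)D-style decompositions. This illustrates the parameter saving in general, not the paper's exact factorization.

```python
import torch
import torch.nn as nn

cin, cout, r = 64, 64, 32                      # r: rank of the approximation

full3d = nn.Conv3d(cin, cout, kernel_size=3, padding=1)
lowrank = nn.Sequential(
    nn.Conv3d(cin, r, kernel_size=(1, 3, 3), padding=(0, 1, 1)),   # spatial
    nn.Conv3d(r, cout, kernel_size=(3, 1, 1), padding=(1, 0, 0)),  # depth
)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(full3d), params(lowrank))         # 110656 vs. 24672 parameters

x = torch.randn(1, cin, 8, 32, 32)             # N, C, D, H, W cost volume
assert full3d(x).shape == lowrank(x).shape     # same output geometry
```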

17 pages, 8746 KB  
Article
Robot Grasp Detection with Loss-Guided Collaborative Attention Mechanism and Multi-Scale Feature Fusion
by Haibing Fang, Caixia Wang and Yong Chen
Appl. Sci. 2024, 14(12), 5193; https://doi.org/10.3390/app14125193 - 14 Jun 2024
Cited by 2 | Viewed by 2108
Abstract
Grasp detection serves as the fundamental element for achieving successful grasping in robotic systems. The encoder–decoder structure has become widely adopted as the foundational architecture for grasp detection networks due to its inherent advantages in speed and accuracy. However, traditional network structures fail to effectively extract the essential features required for accurate grasping poses and neglect to eliminate the checkerboard artifacts caused by transposed convolution during decoding. To overcome these challenges, we propose a novel generative grasp detection network (LGAR-Net2). A transposed convolution layer is employed to replace the bilinear interpolation layer in the decoder to remove the issue of uneven overlapping and consequently eliminate checkerboard artifacts. In addition, a loss-guided collaborative attention block (LGCA), which combines attention blocks with spatial pyramid blocks to enhance attention to important regions of the image, is constructed to improve the accuracy of information extraction. Validated on the Cornell public dataset using RGB images as the input, LGAR-Net2 achieves an accuracy of 97.7%, an improvement of 1.1% over the baseline network, and processes a single RGB image in just 15 ms.
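The "uneven overlapping" behind checkerboard artifacts is easy to reproduce: with constant inputs and weights, a transposed convolution whose kernel size is not divisible by its stride covers output positions unevenly, while a divisible kernel yields a flat interior. A small PyTorch check (illustrative parameters, not the paper's configuration):

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 1, 6)
w3 = torch.ones(1, 1, 1, 3)   # kernel 3, stride 2: 3 not divisible by 2
w4 = torch.ones(1, 1, 1, 4)   # kernel 4, stride 2: evenly overlapping

print(F.conv_transpose2d(x, w3, stride=2)[0, 0, 0])
# tensor([1., 1., 2., 1., 2., 1., 2., 1., 2., 1., 2., 1., 1.])  <- alternating
print(F.conv_transpose2d(x, w4, stride=2)[0, 0, 0])
# tensor([1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1.])  <- flat interior
```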

23 pages, 3028 KB  
Article
Effective Video Summarization Using Channel Attention-Assisted Encoder–Decoder Framework
by Faisal Alharbi, Shabana Habib, Waleed Albattah, Zahoor Jan, Meshari D. Alanazi and Muhammad Islam
Symmetry 2024, 16(6), 680; https://doi.org/10.3390/sym16060680 - 1 Jun 2024
Cited by 4 | Viewed by 3263
Abstract
A significant number of cameras regularly generate massive amounts of data, demanding hardware, time, and labor resources to acquire, process, and monitor. Asymmetric frames within videos pose a challenge to automatic video summarization, making it difficult to capture key content. Developments in computer vision have accelerated the seamless capture and analysis of high-resolution video content. Video summarization (VS) has garnered considerable interest due to its ability to provide concise summaries of lengthy videos. The current literature mainly relies on a reduced set of representative features implemented using shallow sequential networks. Therefore, this work utilizes an optimal feature-assisted visual intelligence framework for representative feature selection and summarization. First, an empirical analysis of several features is performed, and ultimately a fine-tuned InceptionV3 backbone is adopted for feature extraction, deviating from conventional approaches. Second, our strategic encoder–decoder module captures complex relationships with five convolutional blocks and two convolution transpose blocks. Third, we introduce a channel attention mechanism, illuminating interrelations between channels and prioritizing essential patterns to grasp complex refinement features for final summary generation. Additionally, comprehensive experiments and ablation studies validate our framework’s exceptional performance, consistently surpassing state-of-the-art networks on two benchmark datasets (TVSum and SumMe).
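A hedged sketch of a channel attention block of the squeeze-and-excitation kind the abstract describes, reweighting channels by global context; the reduction ratio is an assumption.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)         # squeeze: global context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(n, c))  # per-channel weights in (0, 1)
        return x * weights.view(n, c, 1, 1)         # excite: reweight channels

feats = torch.randn(4, 256, 12, 12)                 # frame-level feature maps
print(ChannelAttention(256)(feats).shape)           # torch.Size([4, 256, 12, 12])
```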

20 pages, 6246 KB  
Article
A Two-Stage Automatic Container Code Recognition Method Considering Environmental Interference
by Meng Yu, Shanglei Zhu, Bao Lu, Qiang Chen and Tengfei Wang
Appl. Sci. 2024, 14(11), 4779; https://doi.org/10.3390/app14114779 - 31 May 2024
Viewed by 2076
Abstract
Automatic Container Code Recognition (ACCR) is critical for enhancing the efficiency of container terminals. However, existing ACCR methods frequently fail to achieve satisfactory performance in complex environments at port gates. In this paper, we propose an approach for accurate, fast, and compact container code recognition that utilizes YOLOv4 for container region localization and Deeplabv3+ for character recognition. To enhance the recognition speed and accuracy of YOLOv4 and Deeplabv3+, and to facilitate their deployment at gate entrances, we introduce several improvements. First, we optimize the feature-extraction process of YOLOv4 and Deeplabv3+ to reduce their computational complexity. Second, we enhance the multi-scale recognition and loss functions of YOLOv4 to improve the accuracy and speed of container region localization. Furthermore, we adjust the dilated convolution rates of the ASPP module in Deeplabv3+. Finally, we replace two upsampling structures in the decoder of Deeplabv3+ with transposed convolution upsampling and sub-pixel convolution upsampling. Experimental results on our custom dataset demonstrate that our proposed method, C-YOLOv4, achieves a container region localization accuracy of 99.76% at a speed of 56.7 frames per second (FPS), while C-Deeplabv3+ achieves a mean pixel accuracy (MPA) of 99.88% at 11.4 FPS. The overall recognition success rate and recognition speed of our approach are 99.51% and 2.3 ms per frame, respectively. Moreover, C-YOLOv4 and C-Deeplabv3+ outperform existing methods in complex scenarios.

26 pages, 21449 KB  
Article
Automated Multi-Type Pavement Distress Segmentation and Quantification Using Transformer Networks for Pavement Condition Index Prediction
by Zaiyan Zhang, Weidong Song, Yangyang Zhuang, Bing Zhang and Jiachen Wu
Appl. Sci. 2024, 14(11), 4709; https://doi.org/10.3390/app14114709 - 30 May 2024
Cited by 5 | Viewed by 2176
Abstract
Pavement distress detection is a crucial task when assessing pavement performance conditions. Here, a novel deep-learning method based on a transformer network, referred to as ISTD-DisNet, is proposed for multi-type pavement distress semantic segmentation. In this methodology, a mix transformer (MiT) based on a hierarchical transformer structure is chosen as the backbone to obtain multi-scale feature information on pavement distress, and a mixed attention module (MAM) is introduced at the decoding stage to capture the pavement distress features across different channels and spatial locations. A learnable transposed convolution upsampling module (TCUM) enhances the model’s ability to restore multi-scale distress details. Subsequently, a novel parameter, the distress pixel density ratio (PDR), is introduced based on the segmentation results. By analyzing the intrinsic correlation between the PDR and the pavement condition index (PCI), a new pavement damage index prediction model is proposed. Finally, the experimental results reveal that the F1 and mIOU of the proposed method are 95.51% and 91.67%, respectively, and the segmentation performance is better than that of seven other mainstream segmentation models. Validation experiments for the PCI prediction model further indicate that utilizing the PDR enables the quantitative evaluation of pavement damage conditions for each assessment unit, holding promising engineering application potential.

20 pages, 6807 KB  
Article
Single Image Super Resolution Using Deep Residual Learning
by Moiz Hassan, Kandasamy Illanko and Xavier N. Fernando
AI 2024, 5(1), 426-445; https://doi.org/10.3390/ai5010021 - 21 Mar 2024
Cited by 11 | Viewed by 6642
Abstract
Single Image Super Resolution (SISR) is an intriguing research topic in computer vision where the goal is to create high-resolution images from low-resolution ones using innovative techniques. SISR has numerous applications in fields such as medical and satellite imaging, remote target identification, and autonomous vehicles. Compared to traditional interpolation-based approaches, deep learning techniques have recently gained attention in SISR due to their superior performance and computational efficiency. This article proposes an autoencoder-based deep learning model for SISR. The down-sampling part of the autoencoder mainly uses 3×3 convolutions and has no subsampling layers. The up-sampling part uses transpose convolutions and residual connections from the down-sampling part. The model is trained using a subset of the ILSVRC ImageNet database as well as the RealSR database. Quantitative metrics such as PSNR and SSIM are found to be as high as 76.06 and 0.93 in our testing, and qualitative measures such as perceptual quality are also used.
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
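The up-sampling path described above follows the transposed-convolution size rule out = (in - 1) * stride - 2 * padding + kernel_size (+ output_padding). A quick check of that arithmetic, with layer sizes that are assumptions rather than the authors' values:

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(64, 64, kernel_size=3, stride=2, padding=1,
                        output_padding=1)  # exact 2x upsampling
x = torch.randn(1, 64, 56, 56)
y = up(x)
print(y.shape)                             # torch.Size([1, 64, 112, 112])
assert y.shape[-1] == (56 - 1) * 2 - 2 * 1 + 3 + 1
```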

20 pages, 9791 KB  
Article
Downscaling Daily Reference Evapotranspiration Using a Super-Resolution Convolutional Transposed Network
by Yong Liu, Xiaohui Yan, Wenying Du, Tianqi Zhang, Xiaopeng Bai and Ruichuan Nan
Water 2024, 16(2), 335; https://doi.org/10.3390/w16020335 - 19 Jan 2024
Cited by 4 | Viewed by 2013
Abstract
The current work proposes a novel super-resolution convolutional transposed network (SRCTN) deep learning architecture for downscaling daily climatic variables. The algorithm is based on a super-resolution convolutional neural network with transposed convolutions. This study designed synthetic experiments to downscale daily reference evapotranspiration (ET0) data, a key indicator for climate change, from low resolutions (2°, 1°, and 0.5°) to a fine resolution (0.25°). The entire time period was divided into training–validation (80%) and test (20%) periods, and the training–validation period was further divided into training (80%) and validation (20%) parts. Downscaling accuracy was compared between the SRCTN and Q-M models using the root-mean-squared error (RMSE). The SRCTN RMSE values were 0.239, 0.077, and 0.015 for scaling ratios of 8, 4, and 2, respectively, versus 0.334, 0.208, and 0.109 for the Q-M method; the consistently lower SRCTN values indicate better downscaling performance. The results showed that the SRCTN method reproduces the spatiotemporal distributions and extremes for the testing period very well. The trained SRCTN model from one study area also performed remarkably well in a different area via transfer learning, without re-training or calibration, and outperformed the classic downscaling approach. The good performance of the SRCTN algorithm can be attributed primarily to the incorporation of transposed convolutions, which can be seen in part as trainable upsampling operations. The proposed SRCTN method is therefore a promising tool for downscaling daily ET0 and can potentially be employed to downscale other variables.
(This article belongs to the Special Issue Advances in Hydraulic and Water Resources Research)
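One plausible reading of the scaling ratios above (8, 4, and 2) is a chain of stride-2 transposed convolutions, each acting as a trainable 2x upsampling; the channel widths below are assumptions, not the SRCTN authors' values.

```python
import torch
import torch.nn as nn

def upsampler(ratio, ch=32):
    """Chain log2(ratio) stride-2 transposed convolutions."""
    layers, n = [], ratio.bit_length() - 1  # log2 of 2, 4, or 8
    for _ in range(n):
        layers += [nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU()]
    return nn.Sequential(*layers)

coarse = torch.randn(1, 32, 20, 20)                   # hypothetical coarse ET0 grid
for ratio in (2, 4, 8):
    print(ratio, upsampler(ratio)(coarse).shape[-1])  # 40, 80, 160
```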
