Search Results (70)

Search Parameters:
Keywords = transpose convolution

15 pages, 6862 KB  
Article
SLR-Net: Lightweight and Accurate Detection of Weak Small Objects in Satellite Laser Ranging Imagery
by Wei Zhu, Jinlong Hu, Weiming Gong, Yong Wang and Yi Zhang
Sensors 2026, 26(2), 732; https://doi.org/10.3390/s26020732 - 22 Jan 2026
Viewed by 108
Abstract
To address the challenges of insufficient efficiency and accuracy in traditional detection models caused by minute target sizes, low signal-to-noise ratios (SNRs), and feature volatility in Satellite Laser Ranging (SLR) images, this paper proposes an efficient, lightweight, and high-precision detection model. The core motivation of this study is to fundamentally enhance the model’s capabilities in feature extraction, fusion, and localization for minute and blurred targets through a specifically designed network architecture and loss function, without significantly increasing the computational burden. To achieve this goal, we first design a DMS-Conv module. By employing dense sampling and channel function separation strategies, this module effectively expands the receptive field while avoiding the high computational overhead and sampling artifacts associated with traditional multi-scale methods, thereby significantly improving feature representation for faint targets. Secondly, to optimize information flow within the feature pyramid, we propose a Lightweight Upsampling Module (LUM). Integrating depthwise separable convolutions with a channel reshuffling mechanism, this module replaces traditional transposed convolutions at minimal computational cost, facilitating more efficient multi-scale feature fusion. Finally, addressing the stringent requirements for small-target localization accuracy, we introduce the MPD-IoU Loss. By incorporating the diagonal distance of bounding boxes as a geometric penalty term, this loss function provides finer and more direct spatial alignment constraints for model training, effectively boosting localization precision. Experimental results on a self-constructed real-world SLR observation dataset demonstrate that the proposed model achieves an mAP50:95 of 47.13% and an F1-score of 88.24%, with only 2.57 M parameters and 6.7 GFLOPs. The model outperforms various mainstream lightweight detectors in combined precision and recall; these results validate that our method effectively resolves the small-target detection challenges in SLR scenarios while maintaining a lightweight design, exhibiting superior performance and practical value.
(This article belongs to the Section Remote Sensors)
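As a rough illustration of the kind of replacement the abstract describes, the PyTorch sketch below upsamples with nearest-neighbor interpolation, a depthwise separable convolution, and a channel shuffle instead of a transposed convolution. The class name, layer sizes, group count, and operation order are illustrative assumptions, not the authors' LUM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightUpsample(nn.Module):
    """Hypothetical lightweight upsampling block: interpolation plus a
    depthwise separable convolution and a ShuffleNet-style channel shuffle."""
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.groups = groups

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, scale_factor=2, mode="nearest")  # 2x upsampling
        x = self.pointwise(self.depthwise(x))                 # cheap feature mixing
        b, c, h, w = x.shape                                  # channel shuffle
        x = x.view(b, self.groups, c // self.groups, h, w).transpose(1, 2)
        return x.reshape(b, c, h, w)

x = torch.randn(1, 64, 16, 16)
print(LightweightUpsample(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```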

18 pages, 2081 KB  
Article
Breast Ultrasound Image Segmentation Integrating Mamba-CNN and Feature Interaction
by Guoliang Yang, Yuyu Zhang and Hao Yang
Sensors 2026, 26(1), 105; https://doi.org/10.3390/s26010105 - 23 Dec 2025
Cited by 1 | Viewed by 565
Abstract
The large scale and shape variation in breast lesions make their segmentation extremely challenging. A segmentation model integrating Mamba-CNN and feature interaction is proposed for breast ultrasound images, which contain substantial speckle noise and multiple artifacts. The model first uses the visual state space model (VSS) as an encoder for feature extraction to better capture long-range dependencies. Second, a hybrid attention enhancement mechanism (HAEM) is designed at the bottleneck between the encoder and the decoder to provide fine-grained control of the feature map in both the channel and spatial dimensions, so that the network captures key features and regions more comprehensively. The decoder uses transposed convolution to upsample the feature map, gradually increasing the resolution and recovering its spatial information. Finally, a cross-fusion module (CFM) is constructed to simultaneously attend to the spatial information of the shallow feature maps and the deep semantic information, which effectively reduces the interference of noise and artifacts. In experiments on the BUSI and UDIAT datasets, the Dice similarity coefficient and HD95 indices reach 76.04% and 20.28 mm, respectively, showing that the algorithm effectively mitigates noise and artifacts in ultrasound image segmentation and improves segmentation performance over existing algorithms.
(This article belongs to the Section Sensing and Imaging)
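For concreteness, one decoder stage of the kind described, a transposed convolution that doubles spatial resolution before further refinement, can be sketched in PyTorch as follows; the channel counts are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# One illustrative decoder stage: a stride-2 transposed convolution doubles
# the feature-map resolution while halving the channel count.
decoder_stage = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 128, 32, 32)
print(decoder_stage(x).shape)  # torch.Size([1, 64, 64, 64])
```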

17 pages, 2961 KB  
Article
Generative Model Construction Based on Highly Rated Koi Images to Evaluate Koi Quality
by Jiahong Gang, Tatsuya Yamazaki and Yusuke Iida
Fishes 2025, 10(12), 655; https://doi.org/10.3390/fishes10120655 - 17 Dec 2025
Viewed by 392
Abstract
Nishikigoi are highly valued ornamental fish whose evaluation affects their market price. However, the judging criteria of the exhibitions remain unclear. This study applies a generative artificial intelligence model to explore potential factors behind non-award-winning Kohaku Nishikigoi. An improved Variational Autoencoder (VAE) is developed from the standard VAE by introducing a perceptual loss to enhance detail, adding a mask loss to maintain body-shape consistency, and replacing transposed convolutions with UpSampling layers to reduce artifacts. With the improved VAE, we propose a method to evaluate a non-award-winning Koi. Specifically, because the improved VAE is designed to generate images that could potentially win competitions, the differences between input and output images become large when non-award-winning images are input, thereby identifying the visual deficiencies of the inputs. For the experiments, synthetic non-award-winning Koi images were created by modifying award-winning ones. The synthesized non-award-winning images were input into the improved VAE and the generated images were obtained. Experimental results showed that the shape-consistency measure computed with a Multi-layer Sliding Window was lower for award-winning images (0.110) than for non-award-winning images (0.141). Also, the average difference in color was smaller for award-winning Koi (4.75%) than for non-award-winning Koi (28.7%).
(This article belongs to the Special Issue Application of Artificial Intelligence in Aquaculture)
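The decoder change the abstract mentions, replacing transposed convolutions with UpSampling layers, is a standard remedy for checkerboard artifacts. A minimal PyTorch sketch of the two alternatives (channel counts are illustrative assumptions):

```python
import torch.nn as nn

# Option A: transposed convolution (prone to checkerboard artifacts when
# the kernel size is not a multiple of the stride).
transposed = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)

# Option B: fixed upsampling followed by a plain convolution, as the
# improved VAE uses; the learned kernel then only smooths, not upsamples.
upsample_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 32, kernel_size=3, padding=1),
)
```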

23 pages, 11094 KB  
Article
RSDB-Net: A Novel Rotation-Sensitive Dual-Branch Network with Enhanced Local Features for Remote Sensing Ship Detection
by Danshu Zhou, Yushan Xiong, Shuangming Yu, Peng Feng, Jian Liu, Nanjian Wu, Runjiang Dou and Liyuan Liu
Remote Sens. 2025, 17(23), 3925; https://doi.org/10.3390/rs17233925 - 4 Dec 2025
Viewed by 415
Abstract
Ship detection in remote sensing imagery is hindered by cluttered backgrounds, large variations in scale, and random orientations, limiting the performance of detectors designed for natural images. We propose RSDB-Net, a Rotation-Sensitive Dual-Branch Detection Network that introduces innovations in feature extraction, fusion, and detection. The Swin Transformer–CNN Backbone (STCBackbone) combines a Swin Transformer for global semantics with a CNN branch for local spatial detail, while the Feature Conversion and Coupling Module (FCCM) aligns and fuses heterogeneous features to handle multi-scale objects, and a Rotation-sensitive Cross-branch Fusion Head (RCFHead) enables bidirectional interaction between classification and localization, improving detection of randomly oriented targets. Additionally, an enhanced Feature Pyramid Network (eFPN) with learnable transposed convolutions restores semantic information while maintaining spatial alignment. Experiments on DOTA-v1.0 and HRSC2016 show that RSDB-Net performs better than the state of the art (SOTA), with mAP-ship values of 89.13% and 90.10% (+5.54% and +44.40% over the baseline, respectively), and reaches 72 FPS on an RTX 3090. RSDB-Net also demonstrates strong generalization and scalability, providing an effective solution for rotation-aware ship detection.
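As a rough sketch of the eFPN idea, learnable transposed convolutions in place of fixed interpolation in the top-down pathway, consider the following; the module name and channel width are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TopDownMerge(nn.Module):
    """Hypothetical FPN top-down step: a learnable transposed convolution
    upsamples the coarser level before merging with the lateral feature."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.up = nn.ConvTranspose2d(channels, channels, kernel_size=2, stride=2)
        self.smooth = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, top: torch.Tensor, lateral: torch.Tensor) -> torch.Tensor:
        return self.smooth(self.up(top) + lateral)

p5 = torch.randn(1, 256, 16, 16)
p4_lateral = torch.randn(1, 256, 32, 32)
print(TopDownMerge()(p5, p4_lateral).shape)  # torch.Size([1, 256, 32, 32])
```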

21 pages, 1194 KB  
Article
Retentive-HAR: Human Activity Recognition from Wearable Sensors with Enhanced Temporal and Inter-Feature Dependency Retention
by Ayokunle Olalekan Ige, Daniel Ayo Oladele and Malusi Sibiya
Appl. Sci. 2025, 15(23), 12661; https://doi.org/10.3390/app152312661 - 29 Nov 2025
Viewed by 747
Abstract
Human Activity Recognition (HAR) using wearable sensor data plays a vital role in health monitoring, context-aware computing, and smart environments. Many existing deep learning models for HAR incorporate MaxPooling layers after convolutional operations to reduce dimensionality and computational load. While this approach is effective in image-based tasks, it is less suitable for the sensor signals used in HAR: MaxPooling introduces a form of temporal downsampling that can discard subtle yet crucial temporal information. Also, traditional CNNs often struggle to capture long-range dependencies within each window due to their limited receptive fields, and they lack effective mechanisms to aggregate information across multiple windows without stacking many layers, which increases computational cost. In this study, we introduce Retentive-HAR, a model designed to enhance feature learning by capturing dependencies both within and across sliding windows. The proposed model intentionally omits the MaxPooling layer, thereby preserving the full temporal resolution throughout the network. The model begins with parallel dilated convolutions, which capture long-range dependencies within each window. Feature outputs from these convolutional layers are then concatenated along the feature dimension and transposed, allowing the Retentive Module to analyze dependencies across both the window and feature dimensions. Additional 1D-CNN layers are then applied to the transposed feature maps to capture complex interactions across the concatenated window representations before Bi-LSTM layers are included. Experiments on the PAMAP2, HAPT, and WISDM datasets achieve performances of 96.40%, 94.70%, and 96.16%, respectively, outperforming existing methods at minimal computational cost.
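The front end the abstract describes, parallel dilated convolutions whose concatenated outputs are transposed for a downstream module, might look like the following sketch; channel sizes and dilation rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ParallelDilatedConv(nn.Module):
    """Hypothetical front end: parallel dilated 1D convolutions widen the
    receptive field over a sensor window; outputs are concatenated on the
    feature axis and transposed for the next module."""
    def __init__(self, in_ch: int = 9, out_ch: int = 32):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d)
            for d in (1, 2, 4)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); no pooling, so full temporal resolution.
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return y.transpose(1, 2)  # (batch, time, features)

x = torch.randn(8, 9, 128)  # e.g., 9 sensor channels, 128-sample window
print(ParallelDilatedConv()(x).shape)  # torch.Size([8, 128, 96])
```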

23 pages, 3777 KB  
Article
Quantum Down-Sampling Filter for Variational Autoencoder
by Farina Riaz, Fakhar Zaman, Hajime Suzuki, Alsharif Abuadbba and David Nguyen
Electronics 2025, 14(23), 4626; https://doi.org/10.3390/electronics14234626 - 25 Nov 2025
Viewed by 478
Abstract
Variational Autoencoders (VAEs) are fundamental for generative modeling and image reconstruction, yet they often struggle to maintain high reconstruction fidelity. This study introduces a hybrid model, the Quantum Variational Autoencoder (Q-VAE), which integrates quantum encoding within the encoder while utilizing fully connected layers to extract meaningful representations. The decoder uses transposed convolution layers for up-sampling. The Q-VAE is evaluated against the classical VAE and the classical direct-passing VAE, which utilizes windowed pooling filters. Results on the MNIST and USPS datasets demonstrate that the Q-VAE consistently outperforms the classical approaches, achieving lower Fréchet Inception Distance scores and thereby indicating superior image fidelity and enhanced reconstruction quality. These findings highlight the potential of the Q-VAE for high-quality synthetic data generation and improved image reconstruction in generative models.
(This article belongs to the Special Issue Second Quantum Revolution: Sensing, Computing, and Transmitting)
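A classical decoder of the kind described, transposed convolution layers for up-sampling, can be sketched as below; the latent size and channel progression are assumptions chosen for 28x28 MNIST-style output, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.Linear(16, 64 * 7 * 7),          # latent vector -> flat feature map
    nn.Unflatten(1, (64, 7, 7)),
    nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),  # 7x7 -> 14x14
    nn.ReLU(),
    nn.ConvTranspose2d(32, 1, kernel_size=2, stride=2),   # 14x14 -> 28x28
    nn.Sigmoid(),                        # pixel intensities in [0, 1]
)

z = torch.randn(4, 16)
print(decoder(z).shape)  # torch.Size([4, 1, 28, 28])
```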

14 pages, 1280 KB  
Article
DMBT Decoupled Multi-Modal Binding Transformer for Multimodal Sentiment Analysis
by Rui Guo, Gu Gong and Fan Jiang
Electronics 2025, 14(21), 4296; https://doi.org/10.3390/electronics14214296 - 31 Oct 2025
Viewed by 639
Abstract
The performance of Multimodal Sentiment Analysis (MSA) is commonly hindered by two major bottlenecks: the complexity and redundancy associated with supervised feature disentanglement, and the coarse granularity of static fusion mechanisms. To systematically address these challenges, a novel framework, the Decoupled Multi-modal Binding Transformer (DMBT), is proposed. The framework first introduces an Unsupervised Semantic Disentanglement (USD) module, which resolves the issue of complex redundancy by cleanly separating features into modality-common and modality-specific components in a lightweight, parameter-free manner. Subsequently, to tackle the challenge of coarse-grained fusion, a Gated Interaction and Fusion Transformer (GIFT) is constructed as the core engine. The performance of GIFT is driven by two synergistic components: a Multi-modal Binding Transposed Attention (MBTA), which employs a hybrid convolutional and attention design to concurrently perceive both global context and local fine-grained features, and a Dynamic Fusion Gate (DFG), which performs the final, adaptive decision-making by re-weighting all deeply enhanced representations. Extensive experiments on the CMU-MOSI and CMU-MOSEI benchmarks demonstrate that the proposed DMBT framework surpasses existing state-of-the-art models across all key evaluation metrics. The efficacy of each innovative component is further validated through comprehensive ablation studies.
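The word "transposed" in MBTA suggests attention computed across feature channels rather than tokens, in the general sense popularized by models such as Restormer; the sketch below shows that generic mechanism only and is not the authors' MBTA.

```python
import torch
import torch.nn.functional as F

def transposed_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
    """Channel-wise ("transposed") attention: the attention map is
    channels x channels, so cost is linear in sequence length."""
    # q, k, v: (batch, channels, tokens)
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    attn = torch.softmax(q @ k.transpose(-2, -1), dim=-1)  # (batch, c, c)
    return attn @ v                                        # (batch, c, tokens)

q = k = v = torch.randn(2, 32, 100)
print(transposed_attention(q, k, v).shape)  # torch.Size([2, 32, 100])
```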

21 pages, 3725 KB  
Article
Pruning-Friendly RGB-T Semantic Segmentation for Real-Time Processing on Edge Devices
by Jun Young Hwang, Youn Joo Lee, Ho Gi Jung and Jae Kyu Suhr
Electronics 2025, 14(17), 3408; https://doi.org/10.3390/electronics14173408 - 27 Aug 2025
Viewed by 1411
Abstract
RGB-T semantic segmentation, which uses thermal and RGB images simultaneously, is being actively researched to robustly recognize the surroundings of vehicles under challenging lighting and weather conditions. It is important for such models to operate in real time on edge devices. Because the transformer-based approaches adopted by most recent RGB-T semantic segmentation studies are very difficult to run on edge devices, this paper considers only CNN-based RGB-T semantic segmentation networks that can run on edge devices in real time. Although EAEFNet shows the best performance among CNN-based networks on edge devices, its inference speed is too slow for real-time operation. Furthermore, even when channel pruning is applied, the speed improvement is minimal. An analysis of EAEFNet identifies the intermediate fusion of RGB and thermal features and the high complexity of the decoder as the main causes. To address these issues, this paper proposes a network using a ResNet encoder with an early-fused four-channel input and a U-Net decoder structure. To improve decoder performance, bilinear upsampling is replaced with PixelShuffle. Additionally, mini Atrous Spatial Pyramid Pooling (ASPP) and Progressive Transposed Module (PTM) modules are applied. Since the proposed network is primarily composed of convolutional layers, channel pruning is confirmed to be effectively applicable. Consequently, channel pruning significantly improves inference speed and enables real-time operation on the neural processing unit (NPU) of edge devices. The proposed network is evaluated on the MFNet dataset, one of the most widely used public datasets for RGB-T semantic segmentation, and achieves performance comparable to EAEFNet while operating at over 30 FPS on an embedded board equipped with the Qualcomm QCS6490 SoC.
(This article belongs to the Special Issue New Insights in 2D and 3D Object Detection and Semantic Segmentation)
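The decoder swap the abstract mentions, bilinear upsampling replaced by PixelShuffle, keeps the decoder purely convolutional and hence friendlier to channel pruning. A minimal sketch with illustrative channel counts:

```python
import torch
import torch.nn as nn

# Before: parameter-free bilinear upsampling.
bilinear_up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

# After: a convolution expands channels by r^2, then PixelShuffle rearranges
# them into a 2x-larger spatial grid (r = 2).
pixelshuffle_up = nn.Sequential(
    nn.Conv2d(64, 64 * 4, kernel_size=3, padding=1),
    nn.PixelShuffle(upscale_factor=2),
)

x = torch.randn(1, 64, 30, 40)
print(bilinear_up(x).shape, pixelshuffle_up(x).shape)
# torch.Size([1, 64, 60, 80]) torch.Size([1, 64, 60, 80])
```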

12 pages, 7323 KB  
Article
WinEdge: Low-Power Winograd CNN Execution with Transposed MRAM for Edge Devices
by Milad Ashtari Gargari, Sepehr Tabrizchi and Arman Roohi
Electronics 2025, 14(12), 2485; https://doi.org/10.3390/electronics14122485 - 19 Jun 2025
Viewed by 850
Abstract
This paper presents a novel transposed MRAM architecture (WinEdge) specifically optimized for Winograd convolution acceleration in edge computing devices. Leveraging Magnetic Tunnel Junctions (MTJs) with Spin Hall Effect (SHE)-assisted Spin-Transfer Torque (STT) writing, the proposed design enables a single SHE current to simultaneously write data to four MTJs, substantially reducing power consumption. Additionally, the integration of stacked MTJs significantly improves storage density. The proposed WinEdge efficiently supports both standard and transposed data access modes regardless of bit-width, achieving up to 36% lower power, 47% reduced energy consumption, and 28% faster processing speed compared to existing designs. Simulations conducted in 45 nm CMOS technology validate its superiority over conventional SRAM-based solutions for convolutional neural network (CNN) acceleration in resource-constrained edge environments.
(This article belongs to the Special Issue Emerging Computing Paradigms for Efficient Edge AI Acceleration)
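For readers unfamiliar with the arithmetic such accelerators target, the 1D Winograd F(2,3) algorithm computes two outputs of a 3-tap filter with four multiplications instead of six; a minimal NumPy sketch follows (this illustrates the algorithm only, not the MRAM circuit).

```python
import numpy as np

def winograd_f23(d: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Winograd F(2,3): two correlation outputs with four multiplications."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])   # four input samples
g = np.array([0.5, 1.0, -0.5])       # three filter taps
print(winograd_f23(d, g))                     # [1. 2.]
print(np.convolve(d, g[::-1], mode="valid"))  # same result, direct form
```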

16 pages, 2672 KB  
Article
AI-Powered Convolutional Neural Network Surrogate Modeling for High-Speed Finite Element Analysis in the NPPs Fuel Performance Framework
by Salvatore A. Cancemi, Andrius Ambrutis, Mantas Povilaitis and Rosa Lo Frano
Energies 2025, 18(10), 2557; https://doi.org/10.3390/en18102557 - 15 May 2025
Cited by 2 | Viewed by 3464
Abstract
Convolutional Neural Networks (CNNs) are proposed for use in the nuclear power plant domain as surrogate models to enhance the computational efficiency of finite element analyses in simulating nuclear fuel behavior under varying conditions. The dataset comprises 3D fuel pellet FE models and involves 13 input features, such as pressure, Young’s modulus, and temperature. The CNNs predict outcomes such as displacement, von Mises stress, and creep strain from these inputs, reducing the simulation time from several seconds per analysis to approximately one second. The data are normalized using local and global min–max scaling to maintain consistency across inputs and outputs, facilitating accurate model learning. The CNN architecture includes multiple dense, reshaping, and transposed convolution layers, optimized through a brute-force hyperparameter tuning process and validated using 5-fold cross-validation. The study employs the Adam optimizer and highlights a significant reduction in training time on a GPU, which substantially outperforms traditional CPUs. The findings suggest that integrating CNN models into nuclear fuel analysis can drastically reduce computational times while maintaining accuracy, making them valuable for real-time monitoring and decision-making within nuclear power plant operations.
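The architecture pattern the abstract names, dense layers followed by reshaping and transposed convolution layers, maps a 13-value input vector to a spatial field. A sketch with assumed layer sizes, an assumed 32x32 output grid, and three assumed output fields:

```python
import torch
import torch.nn as nn

surrogate = nn.Sequential(
    nn.Linear(13, 128 * 8 * 8),      # 13 scalar inputs -> flat features
    nn.ReLU(),
    nn.Unflatten(1, (128, 8, 8)),    # reshape to a coarse feature map
    nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2),  # 8 -> 16
    nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=2, stride=2),    # 16 -> 32
)

params = torch.randn(5, 13)  # e.g., pressure, Young's modulus, temperature, ...
print(surrogate(params).shape)  # torch.Size([5, 3, 32, 32])
```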

20 pages, 3901 KB  
Article
Design and Implementation of a Lightweight and Energy-Efficient Semantic Segmentation Accelerator for Embedded Platforms
by Hui Li, Jinyi Li, Bowen Li, Zhengqian Miao and Shengli Lu
Micromachines 2025, 16(3), 258; https://doi.org/10.3390/mi16030258 - 25 Feb 2025
Cited by 3 | Viewed by 1675
Abstract
With the rapid development of lightweight network models and efficient hardware deployment techniques, the demand for real-time semantic segmentation in areas such as autonomous driving and medical image processing has increased significantly. However, realizing efficient semantic segmentation on resource-constrained embedded platforms still faces many challenges. As a classical lightweight semantic segmentation network, ENet has attracted much attention due to its low computational complexity. In this study, we optimize the ENet semantic segmentation network, significantly reducing its computational complexity through structural simplification and 8-bit quantization and improving its hardware compatibility through optimized on-chip data storage and data transfer, while maintaining 51.18% mIoU. The optimized network is successfully deployed on a hardware accelerator and SoC system based on the Xilinx ZYNQ ZCU104 FPGA. In addition, we optimize the computational units for transposed convolution and dilated convolution. The optimized system achieves a frame rate of 130.75 FPS, which meets the real-time processing requirements of areas such as autonomous driving and medical imaging. Meanwhile, the power consumption of the accelerator is 3.479 W, the throughput reaches 460.8 GOPS, and the energy efficiency reaches 132.2 GOPS/W. These results demonstrate the effectiveness of the optimization and deployment strategies in balancing computational efficiency and accuracy, making the system well suited for resource-constrained embedded platform applications.
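A fact that hardware implementations of transposed convolution commonly exploit (a plausible basis for a dedicated computational unit, though the paper's exact datapath is not shown here): a stride-2 transposed convolution equals zero-insertion followed by an ordinary convolution with the 180-degree-rotated kernel.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
w = torch.randn(1, 1, 3, 3)

# Direct transposed convolution.
ref = F.conv_transpose2d(x, w, stride=2, padding=1)

# Equivalent form: insert zeros between pixels, then run a standard
# convolution with the spatially flipped kernel and padding k - 1 - p.
z = torch.zeros(1, 1, 7, 7)
z[:, :, ::2, ::2] = x
equiv = F.conv2d(z, w.flip(-1).flip(-2).transpose(0, 1), padding=1)

print(torch.allclose(ref, equiv, atol=1e-5))  # True
```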

23 pages, 20581 KB  
Article
A Novel Pseudo-Siamese Fusion Network for Enhancing Semantic Segmentation of Building Areas in Synthetic Aperture Radar Images
by Mengguang Liao, Longcheng Huang and Shaoning Li
Appl. Sci. 2025, 15(5), 2339; https://doi.org/10.3390/app15052339 - 21 Feb 2025
Viewed by 1123
Abstract
Segmenting building areas from synthetic aperture radar (SAR) images holds significant research value and practical application potential. However, the complexity of the environment, the diversity of building shapes, and interference from speckle noise have made building-area segmentation from SAR images a challenging research topic. Compared with traditional methods, deep learning-driven approaches are superior in terms of stability and efficiency. Currently, most segmentation methods use a single neural network to encode SAR images, decode them through interpolation or transposed convolution operations, and finally obtain the segmented building-area images by optimizing a loss function. Although effective, these methods lose detailed information and do not fully extract the deep-level features of building areas. Therefore, we propose an innovative network named PSANet. First, two sets of deep-level features of building areas are extracted using ResNet-18 and ResNet-34, and five encoded features of varying scales are obtained through a fusion algorithm. Meanwhile, the information in the deepest-level encoded features is enriched using an atrous spatial pyramid pooling module. Next, the encoded features are reconstructed through skip connections and transposed convolution operations to obtain discriminative features of the building areas. Finally, the model is optimized using a combined CE-Dice loss function to achieve superior performance. Experimental results on SAR images from regions with different geographical characteristics demonstrate that the proposed PSANet outperforms several recent state-of-the-art methods.
(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation, 2nd Edition)
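A combined CE-Dice loss of the kind the abstract names can be sketched as follows; the equal weighting and the smoothing constant are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def ce_dice_loss(logits, target, smooth=1.0, alpha=0.5):
    """Weighted sum of cross-entropy and Dice loss for binary segmentation.
    logits: (B, 2, H, W); target: (B, H, W) integer labels in {0, 1}."""
    ce = F.cross_entropy(logits, target)
    prob = torch.softmax(logits, dim=1)[:, 1]   # foreground probability
    tgt = target.float()
    inter = (prob * tgt).sum()
    dice = 1 - (2 * inter + smooth) / (prob.sum() + tgt.sum() + smooth)
    return alpha * ce + (1 - alpha) * dice

logits = torch.randn(2, 2, 64, 64)
target = torch.randint(0, 2, (2, 64, 64))
print(ce_dice_loss(logits, target))
```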

21 pages, 6133 KB  
Article
BEV Semantic Map Reconstruction for Self-Driving Cars with the Multi-Head Attention Mechanism
by Yi-Cheng Liao, Jichiang Tsai and Hsuan-Ying Chien
Electronics 2025, 14(1), 32; https://doi.org/10.3390/electronics14010032 - 25 Dec 2024
Viewed by 3871
Abstract
Environmental perception is crucial for safe autonomous driving, enabling accurate analysis of the vehicle’s surroundings. While 3D LiDAR is traditionally used for 3D environment reconstruction, its high cost and complexity present challenges. In contrast, camera-based cross-view frameworks offer a cost-effective alternative. Hence, this manuscript proposes a new cross-view model that extracts mapping features from camera images and transfers them to a Bird’s-Eye View (BEV) map. In particular, a multi-head attention mechanism in the decoder architecture generates the final semantic map. Each camera learns embedding information corresponding to its position and angle within the BEV map. Cross-view attention fuses information from the different perspectives to predict top-down map features enriched with spatial information. The multi-head attention mechanism then performs global dependency matching, enhancing long-range information and capturing latent relationships between features. Transposed convolution replaces traditional upsampling methods, avoiding excessive similarity among local features and facilitating semantic segmentation inference on the BEV map. Finally, we conduct extensive simulation experiments to verify the performance of our cross-view model.
(This article belongs to the Special Issue Advancement on Smart Vehicles and Smart Travel)
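A bare-bones version of the decoder idea, multi-head attention fusing camera features into BEV queries followed by transposed-convolution upsampling, might look like this; the grid sizes, embedding width, and head count are illustrative assumptions.

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=128, num_heads=8, batch_first=True)
upsample = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)

bev_queries = torch.randn(1, 25 * 25, 128)  # flattened 25x25 BEV grid
cam_tokens = torch.randn(1, 6 * 49, 128)    # 6 cameras, 7x7 tokens each

# Every BEV cell attends to all camera tokens (global dependency matching).
fused, _ = attn(bev_queries, cam_tokens, cam_tokens)
bev_map = fused.transpose(1, 2).reshape(1, 128, 25, 25)
print(upsample(bev_map).shape)  # torch.Size([1, 64, 50, 50])
```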

15 pages, 6894 KB  
Article
A Novel Approach to Pedestrian Re-Identification in Low-Light and Zero-Shot Scenarios: Exploring Transposed Convolutional Reflectance Decoders
by Zhenghao Li and Jiping Xiong
Electronics 2024, 13(20), 4069; https://doi.org/10.3390/electronics13204069 - 16 Oct 2024
Viewed by 1694
Abstract
In recent years, pedestrian re-identification technology has made significant progress, with various neural network models performing well under normal conditions such as good weather and adequate lighting. However, most research has overlooked extreme environments, such as rainy weather and nighttime. Additionally, the existing pedestrian re-identification datasets consist predominantly of well-lit images. Although some studies have begun to address these issues by proposing methods for enhancing low-light images to restore their original features, the effectiveness of these approaches remains limited. We noted a method based on Retinex theory that designs a reflectance representation learning module aimed at restoring image features as much as possible; however, this method has so far only been applied in object detection networks. In response, we improved the method and applied it to pedestrian re-identification, proposing a transposed convolution reflectance decoder (TransConvRefDecoder) to better restore details in low-light images. Extensive experiments on the Market1501, CUHK03, and MSMT17 datasets demonstrated that our approach delivers superior performance.
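A transposed-convolution decoder head for reflectance, in the spirit of (but not identical to) the paper's TransConvRefDecoder, can be sketched as follows; all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

reflectance_decoder = nn.Sequential(
    nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2),  # upsample 2x
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2),   # upsample 2x
    nn.ReLU(),
    nn.Conv2d(64, 3, kernel_size=3, padding=1),
    nn.Sigmoid(),  # reflectance map in [0, 1]
)

feats = torch.randn(1, 256, 32, 16)  # backbone features of a pedestrian crop
print(reflectance_decoder(feats).shape)  # torch.Size([1, 3, 128, 64])
```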

24 pages, 9004 KB  
Article
NPSFF-Net: Enhanced Building Segmentation in Remote Sensing Images via Novel Pseudo-Siamese Feature Fusion
by Ningbo Guo, Mingyong Jiang, Xiaoyu Hu, Zhijuan Su, Weibin Zhang, Ruibo Li and Jiancheng Luo
Remote Sens. 2024, 16(17), 3266; https://doi.org/10.3390/rs16173266 - 3 Sep 2024
Cited by 6 | Viewed by 2092
Abstract
Building segmentation has extensive research value and application prospects in high-resolution remote sensing image (HRSI) processing. However, complex architectural contexts, varied building morphologies, and non-building occlusions make building segmentation challenging. Compared with traditional methods, deep learning-based methods present certain advantages in terms of accuracy and intelligence. At present, the most popular option is to first encode an HRSI with a single neural network, then decode it through up-sampling or a transposed convolution operation, and finally obtain the segmented building image with the help of a loss function. Although effective, this approach not only tends to lose detail information but also fails to fully utilize contextual features. As an alternative, we propose a novel network called NPSFF-Net. First, using an improved pseudo-Siamese network composed of ResNet-34 and ResNet-50, two sets of deep semantic features of buildings are extracted with the support of transfer learning, and four encoded features at different scales are obtained after fusion. Then, information from the deepest encoded feature is enriched using a feature enhancement module, and the resolutions are recovered via skip connections and transposed convolutions. Finally, the discriminative features of buildings are obtained using the designed feature-fusion algorithm, and the optimal segmentation model is obtained by fitting a cross-entropy loss function. Our method obtained intersection-over-union values of 89.45% on the Aerial Imagery Dataset, 71.88% on the Massachusetts Buildings Dataset, and 68.72% on Satellite Dataset I.
(This article belongs to the Section Remote Sensing Image Processing)
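The pseudo-Siamese idea, two different backbones encoding the same image with their features fused, might be sketched as below; the 1x1 projection and fusion-by-addition are assumptions, not the authors' fusion algorithm.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34, resnet50

class PseudoSiameseEncoder(nn.Module):
    """Hypothetical pseudo-Siamese encoder: ResNet-34 and ResNet-50 branches
    encode the same image; a 1x1 convolution aligns channel widths for fusion."""
    def __init__(self):
        super().__init__()
        self.branch_a = nn.Sequential(*list(resnet34(weights=None).children())[:-2])
        self.branch_b = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
        self.proj_b = nn.Conv2d(2048, 512, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.branch_a(x) + self.proj_b(self.branch_b(x))

x = torch.randn(1, 3, 224, 224)
print(PseudoSiameseEncoder()(x).shape)  # torch.Size([1, 512, 7, 7])
```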
