Search Results (20)

Search Parameters:
Keywords = dual-stream decoder

24 pages, 5200 KiB  
Article
Edge-Guided Dual-Stream U-Net for Secure Image Steganography
by Peng Ji, Youlue Zhang and Zhongyou Lv
Appl. Sci. 2025, 15(8), 4413; https://doi.org/10.3390/app15084413 - 17 Apr 2025
Viewed by 358
Abstract
Steganography, a technique for concealing information, often faces challenges such as low decoding accuracy and inadequate extraction of edge and global features. To overcome these limitations, we propose a dual-stream U-Net framework with integrated edge enhancement for image steganography. Our main contributions include the adoption of a dual-stream U-Net structure in the encoder, integrating an edge-enhancement stream with the InceptionDMK module for multi-scale edge detail extraction, and incorporating a multi-scale median attention (MSMA) module into the original input stream to enhance feature representation. This dual-stream design promotes deep feature fusion, thereby improving the edge details and embedding capacity of stego images. Moreover, an iterative optimization strategy is employed to progressively refine the selection of cover images and the embedding process, achieving enhanced stego quality and decoding performance. Experiments show that our method produces high-quality stego images across multiple public datasets, achieving near-100% decoding accuracy. It also surpasses existing methods in visual quality metrics like PSNR and SSIM. This framework offers a promising approach for enhancing steganographic security in real-world applications such as secure communication and data protection. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
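For readers curious how such a dual-stream encoder might be wired, here is a minimal PyTorch sketch (not the authors' code): an image stream and an edge stream processed in parallel and fused per stage. The Sobel edge input and the 1x1 fusion convolution are assumptions, and the paper's InceptionDMK and MSMA modules are replaced by plain convolutions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStreamEncoderStage(nn.Module):
    """One encoder stage with an image stream and an edge stream (illustrative only)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.img_branch = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())
        self.edge_branch = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, 1)  # stand-in for the paper's fusion step

    def forward(self, image, edge_map):
        f_img = self.img_branch(image)       # original-input stream (MSMA omitted)
        f_edge = self.edge_branch(edge_map)  # edge-enhancement stream (InceptionDMK omitted)
        return self.fuse(torch.cat([f_img, f_edge], dim=1))

# toy usage: a Sobel edge map serves as the second stream's input
x = torch.rand(1, 3, 64, 64)
sobel = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3).repeat(3, 1, 1, 1)
edges = F.conv2d(x, sobel, padding=1, groups=3)
print(DualStreamEncoderStage(3, 16)(x, edges).shape)  # torch.Size([1, 16, 64, 64])
```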

16 pages, 5387 KiB  
Article
Dual-Stream Contrastive Latent Learning Generative Adversarial Network for Brain Image Synthesis and Tumor Classification
by Junaid Zafar, Vincent Koc and Haroon Zafar
J. Imaging 2025, 11(4), 101; https://doi.org/10.3390/jimaging11040101 - 28 Mar 2025
Viewed by 403
Abstract
Generative adversarial networks (GANs) prioritize pixel-level attributes over capturing the entire image distribution, which is critical in image synthesis. To address this challenge, we propose a dual-stream contrastive latent projection generative adversarial network (DSCLPGAN) for the robust augmentation of MRI images. The dual-stream generator in our architecture incorporates two specialized processing pathways: one is dedicated to local feature variation modeling, while the other captures global structural transformations, ensuring a more comprehensive synthesis of medical images. We used a transformer-based encoder–decoder framework for contextual coherence and the contrastive learning projection (CLP) module integrates contrastive loss into the latent space for generating diverse image samples. The generated images undergo adversarial refinement using an ensemble of specialized discriminators, where discriminator 1 (D1) ensures classification consistency with real MRI images, discriminator 2 (D2) produces a probability map of localized variations, and discriminator 3 (D3) preserves structural consistency. For validation, we utilized a publicly available MRI dataset which contains 3064 T1-weighted contrast-enhanced images with three types of brain tumors: meningioma (708 slices), glioma (1426 slices), and pituitary tumor (930 slices). The experimental results demonstrate state-of-the-art performance, achieving an SSIM of 0.99, classification accuracy of 99.4% for an augmentation diversity level of 5, and a PSNR of 34.6 dB. Our approach has the potential of generating high-fidelity augmentations for reliable AI-driven clinical decision support systems. Full article
(This article belongs to the Section Medical Imaging)
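The CLP module is described as integrating a contrastive loss into the latent space; a generic NT-Xent-style contrastive loss over paired latent projections is one plausible reading of that idea (a sketch, not the paper's exact formulation).

```python
import torch
import torch.nn.functional as F

def contrastive_latent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent-style loss: matching rows of z_a and z_b are positives, all others negatives."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature     # (N, N) cosine-similarity matrix
    targets = torch.arange(z_a.size(0))      # positive pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# toy usage: two latent projections of the same batch of MRI slices
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(contrastive_latent_loss(z1, z2).item())
```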

20 pages, 6296 KiB  
Article
Privacy-Preserving Image Captioning with Partial Encryption and Deep Learning
by Antoinette Deborah Martin and Inkyu Moon
Mathematics 2025, 13(4), 554; https://doi.org/10.3390/math13040554 - 7 Feb 2025
Viewed by 576
Abstract
Although image captioning has gained remarkable interest, privacy concerns are raised because it relies heavily on images, and there is a risk of exposing sensitive information in the image data. In this study, a privacy-preserving image captioning framework that leverages partial encryption using Double Random Phase Encoding (DRPE) and deep learning is proposed to address privacy concerns. Unlike previous methods that rely on full encryption or masking, our approach involves encrypting sensitive regions of the image while preserving the image’s overall structure and context. Partial encryption ensures that the sensitive regions’ information is preserved instead of lost by masking it with a black or gray box. It also allows the model to process both encrypted and unencrypted regions, which could be problematic for models with fully encrypted images. Our framework follows an encoder–decoder architecture where a dual-stream encoder based on ResNet50 extracts features from the partially encrypted images, and a transformer architecture is employed in the decoder to generate captions from these features. We utilize the Flickr8k dataset and encrypt the sensitive regions using DRPE. The partially encrypted images are then fed to the dual-stream encoder, which processes the real and imaginary parts of the encrypted regions separately for effective feature extraction. Our model is evaluated using standard metrics and compared with models trained on the original images. Our results demonstrate that our method achieves comparable performance to models trained on original and masked images and outperforms models trained on fully encrypted data, thus verifying the feasibility of partial encryption in privacy-preserving image captioning. Full article
(This article belongs to the Section E2: Control Theory and Mechanics)
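Double Random Phase Encoding itself is compact to express; below is a minimal NumPy sketch of encrypting a sensitive region and splitting the complex result into the real and imaginary parts that a dual-stream encoder could consume. The patch coordinates are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def drpe_encrypt(region):
    """Double Random Phase Encoding of a 2-D image region (returns a complex field)."""
    n = np.exp(2j * np.pi * np.random.rand(*region.shape))  # phase mask in the spatial domain
    m = np.exp(2j * np.pi * np.random.rand(*region.shape))  # phase mask in the Fourier domain
    return np.fft.ifft2(np.fft.fft2(region * n) * m)

# encrypt only a sensitive 32x32 patch; the rest of the image stays in the clear
image = np.random.rand(128, 128)
encrypted = drpe_encrypt(image[48:80, 48:80])
real_part, imag_part = encrypted.real, encrypted.imag  # fed to the two encoder streams
print(real_part.shape, imag_part.shape)  # (32, 32) (32, 32)
```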

30 pages, 13159 KiB  
Article
GLMAFuse: A Dual-Stream Infrared and Visible Image Fusion Framework Integrating Local and Global Features with Multi-Scale Attention
by Fu Li, Yanghai Gu, Ming Zhao, Deji Chen and Quan Wang
Electronics 2024, 13(24), 5002; https://doi.org/10.3390/electronics13245002 - 19 Dec 2024
Viewed by 811
Abstract
Integrating infrared and visible-light images facilitates a more comprehensive understanding of scenes by amalgamating dual-sensor data derived from identical environments. Traditional CNN-based fusion techniques are predominantly confined to local feature emphasis due to their inherently limited receptive fields. Conversely, Transformer-based models tend to prioritize global information, which can lead to a deficiency in feature diversity and detail retention. Furthermore, methods reliant on single-scale feature extraction are inadequate for capturing extensive scene information. To address these limitations, this study presents GLMAFuse, an innovative dual-stream encoder–decoder network, which utilizes a multi-scale attention mechanism to harmoniously integrate global and local features. This framework is designed to maximize the extraction of multi-scale features from source images while effectively synthesizing local and global information across all layers. We introduce the global-aware and local embedding (GALE) module to adeptly capture and merge global structural attributes and localized details from infrared and visible imagery via a parallel dual-branch architecture. Additionally, the multi-scale attention fusion (MSAF) module is engineered to optimize attention weights at the channel level, facilitating an enhanced synergy between high-frequency edge details and global backgrounds. This promotes effective interaction and fusion of dual-modal features. Extensive evaluations using standard datasets demonstrate that GLMAFuse surpasses the existing leading methods in both qualitative and quantitative assessments, highlighting its superior capability in infrared and visible image fusion. On the TNO and MSRS datasets, our method achieves outstanding performance across multiple metrics, including EN (7.15, 6.75), SD (46.72, 47.55), SF (12.79, 12.56), MI (2.21, 3.22), SCD (1.75, 1.80), VIF (0.79, 1.08), Qbaf (0.58, 0.71), and SSIM (0.99, 1.00). These results underscore its exceptional proficiency in infrared and visible image fusion. Full article
(This article belongs to the Special Issue Artificial Intelligence Innovations in Image Processing)
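The MSAF module is described as optimizing attention weights at the channel level when merging the two modalities; a squeeze-and-excitation-style channel gate over concatenated infrared/visible features gives a rough, simplified picture (not the published module).

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Fuse infrared and visible feature maps with a channel-attention gate (illustrative)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(2 * channels // reduction, 2 * channels, 1), nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_ir, f_vis):
        cat = torch.cat([f_ir, f_vis], dim=1)
        return self.proj(cat * self.gate(cat))  # channel-wise reweighting, then projection

f_ir, f_vis = torch.rand(1, 32, 64, 64), torch.rand(1, 32, 64, 64)
print(ChannelAttentionFusion(32)(f_ir, f_vis).shape)  # torch.Size([1, 32, 64, 64])
```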

22 pages, 1174 KiB  
Article
Dual Stream Encoder–Decoder Architecture with Feature Fusion Model for Underwater Object Detection
by Mehvish Nissar, Amit Kumar Mishra and Badri Narayan Subudhi
Mathematics 2024, 12(20), 3227; https://doi.org/10.3390/math12203227 - 15 Oct 2024
Cited by 1 | Viewed by 1319
Abstract
Underwater surveillance is an imminent and fascinating exploratory domain, particularly in monitoring aquatic ecosystems. This field offers valuable insights into underwater behavior and activities, which have broad applications across various domains. Specifically, underwater surveillance involves detecting and tracking moving objects within aquatic environments. However, the complex properties of water make object detection a challenging task. Background subtraction is a commonly employed technique for detecting local changes in video scenes by segmenting images into the background and foreground to isolate the object of interest. Within this context, we propose an innovative dual-stream encoder–decoder framework based on the VGG-16 and ResNet-50 models for detecting moving objects in underwater frames. The network includes a feature fusion module that effectively extracts multiple-level features. Using a limited set of images and performing training in an end-to-end manner, the proposed framework yields accurate results without post-processing. The efficacy of the proposed technique is confirmed through visual and quantitative comparisons with eight cutting-edge methods using two standard databases. The first one employed in our experiments is the Underwater Change Detection Dataset, which includes five challenges, each challenge comprising approximately 1000 frames. The categories in this dataset were recorded under various underwater conditions. The second dataset used for practical analysis is the Fish4Knowledge dataset, where we considered five challenges. Each category, recorded in different aquatic settings, contains a varying number of frames, typically exceeding 1000 per category. Our proposed method surpasses all methods used for comparison by attaining an average F-measure of 0.98 on the Underwater Change Detection Dataset and 0.89 on the Fish4Knowledge dataset. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
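Results are summarized by the average F-measure over foreground masks; for reference, the standard per-frame computation from predicted and ground-truth binary masks looks like this (toy masks for illustration).

```python
import numpy as np

def f_measure(pred_mask, gt_mask, eps=1e-8):
    """F-measure (harmonic mean of precision and recall) for binary foreground masks."""
    tp = np.logical_and(pred_mask, gt_mask).sum()
    precision = tp / (pred_mask.sum() + eps)
    recall = tp / (gt_mask.sum() + eps)
    return 2 * precision * recall / (precision + recall + eps)

pred = np.zeros((240, 320), bool); pred[100:150, 100:160] = True
gt = np.zeros((240, 320), bool); gt[105:150, 100:165] = True
print(round(f_measure(pred, gt), 3))
```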

27 pages, 5309 KiB  
Article
A Case Study on the Integration of Powerline Communications and Visible Light Communications from a Power Electronics Perspective
by Felipe Loose, Juan Ramón Garcia-Meré, Adrion Andrei Rosanelli, Carlos Henrique Barriquello, José Antonio Fernandez Alvárez, Juan Rodríguez and Diego González Lamar
Sensors 2024, 24(20), 6627; https://doi.org/10.3390/s24206627 - 14 Oct 2024
Viewed by 1479
Abstract
This paper presents a dual-purpose LED driver system that functions as both a lighting source and a Visible Light Communication (VLC) transmitter integrated with a Powerline Communication (PLC) network under the PRIME G3 standard. The system decodes PLC messages from the powerline grid and transmits the information via LED light to an optical receiver using binary phase shift keying (BPSK) modulation. The load design targets a light flux of 800 lumens, suitable for LED light bulb applications up to 10 watts, ensuring practicality and energy efficiency. The Universal Asynchronous Receiver-Transmitter (UART) module enables communication between the PLC and VLC systems, allowing for an LED driver with dynamic control and real-time operation. Key signal processing stages are discussed and developed, including a hybrid buck converter with modulation capabilities and a nonlinear optical receiver that regenerates the BPSK reference signal for VLC. Results show a successful prototype operating in a laboratory environment, with experimental validation confirming the transmission of bit streams from the PLC grid to the VLC setup. A design guideline is presented to guide the design of the electronic devices involved in the experiment. Finally, this research highlights the feasibility of integrating PLC and VLC technologies, offering an efficient and cost-effective solution for data transmission over existing infrastructure. Full article
(This article belongs to the Special Issue Challenges and Future Trends in Optical Communications)
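Since the optical link carries the PLC payload under BPSK, a bare-bones baseband BPSK modulator and coherent detector may help fix ideas; the carrier frequency, sample rate, and samples-per-bit values below are arbitrary illustration choices, not the prototype's parameters.

```python
import numpy as np

def bpsk_modulate(bits, fc=10_000, fs=80_000, samples_per_bit=80):
    """Map bits {0,1} to phases {pi,0} on a carrier of frequency fc sampled at fs."""
    t = np.arange(len(bits) * samples_per_bit) / fs
    symbols = np.repeat(2 * np.array(bits) - 1, samples_per_bit)   # 0 -> -1, 1 -> +1
    return symbols * np.cos(2 * np.pi * fc * t)

def bpsk_demodulate(signal, fc=10_000, fs=80_000, samples_per_bit=80):
    """Coherent detection: correlate each bit interval with the reference carrier."""
    t = np.arange(len(signal)) / fs
    corr = (signal * np.cos(2 * np.pi * fc * t)).reshape(-1, samples_per_bit).sum(axis=1)
    return (corr > 0).astype(int)

bits = [1, 0, 1, 1, 0, 0, 1]
print(bpsk_demodulate(bpsk_modulate(bits)).tolist())  # [1, 0, 1, 1, 0, 0, 1]
```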

16 pages, 1521 KiB  
Article
A Novel End-to-End Deep Learning Framework for Chip Packaging Defect Detection
by Siyi Zhou, Shunhua Yao, Tao Shen and Qingwang Wang
Sensors 2024, 24(17), 5837; https://doi.org/10.3390/s24175837 - 8 Sep 2024
Cited by 3 | Viewed by 2791
Abstract
As semiconductor chip manufacturing technology advances, chip structures are becoming more complex, leading to an increased likelihood of void defects in the solder layer during packaging. However, identifying void defects in packaged chips remains a significant challenge due to the complex chip background, varying defect sizes and shapes, and blurred boundaries between voids and their surroundings. To address these challenges, we present a deep-learning-based framework for void defect segmentation in chip packaging. The framework consists of two main components: a solder region extraction method and a void defect segmentation network. The solder region extraction method includes a lightweight segmentation network and a rotation correction algorithm that eliminates background noise and accurately captures the solder region of the chip. The void defect segmentation network is designed for efficient and accurate defect segmentation. To cope with the variability of void defect shapes and sizes, we propose a Mamba model-based encoder that uses a visual state space module for multi-scale information extraction. In addition, we propose an interactive dual-stream decoder that uses a feature correlation cross gate module to fuse the streams’ features to improve their correlation and produce more accurate void defect segmentation maps. The effectiveness of the framework is evaluated through quantitative and qualitative experiments on our custom X-ray chip dataset. Furthermore, the proposed void defect segmentation framework for chip packaging has been applied to a real factory inspection line, achieving an accuracy of 93.3% in chip qualification. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
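The "feature correlation cross gate" is not specified here; as a loose illustration of cross-gating between two decoder streams (an assumption, not the paper's module), each stream can be modulated by a sigmoid gate computed from the other.

```python
import torch
import torch.nn as nn

class CrossGate(nn.Module):
    """Each decoder stream is gated by a sigmoid map derived from the other stream (illustrative)."""
    def __init__(self, channels):
        super().__init__()
        self.gate_a = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_b = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, stream_a, stream_b):
        a = stream_a * self.gate_b(stream_b)   # stream A modulated by B's gate
        b = stream_b * self.gate_a(stream_a)   # stream B modulated by A's gate
        return a, b

a, b = torch.rand(1, 16, 32, 32), torch.rand(1, 16, 32, 32)
out_a, out_b = CrossGate(16)(a, b)
print(out_a.shape, out_b.shape)
```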

14 pages, 9214 KiB  
Article
End-to-End Implicit Object Pose Estimation
by Chen Cao, Baocheng Yu, Wenxia Xu, Guojun Chen and Yuming Ai
Sensors 2024, 24(17), 5721; https://doi.org/10.3390/s24175721 - 3 Sep 2024
Cited by 1 | Viewed by 1357
Abstract
To accurately estimate the 6D pose of objects, most methods employ a two-stage algorithm. While such two-stage algorithms achieve high accuracy, they are often slow. Additionally, many approaches utilize encoding–decoding to obtain the 6D pose, with many employing bilinear sampling for decoding. However, bilinear sampling tends to sacrifice the accuracy of precise features. In our research, we propose a novel solution that utilizes implicit representation as a bridge between discrete feature maps and continuous feature maps. We represent the feature map as a coordinate field, where each coordinate pair corresponds to a feature value. These feature values are then used to estimate feature maps of arbitrary scales, replacing upsampling for decoding. We apply the proposed implicit module to a bidirectional fusion feature pyramid network. Based on this implicit module, we propose three network branches: a class estimation branch, a bounding box estimation branch, and the final pose estimation branch. For this pose estimation branch, we propose a miniature dual-stream network, which estimates object surface features and complements the relationship between 2D and 3D. We represent the rotation component using the SVD (Singular Value Decomposition) representation method, resulting in a more accurate object pose. We achieved satisfactory experimental results on the widely used 6D pose estimation benchmark dataset Linemod. This innovative approach provides a more convenient solution for 6D object pose estimation. Full article
(This article belongs to the Section Physical Sensors)
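The SVD representation of rotation mentioned in the abstract typically means projecting an unconstrained 3x3 network output onto the nearest rotation matrix; the standard recipe is shown below.

```python
import numpy as np

def svd_rotation(m):
    """Project an arbitrary 3x3 matrix onto SO(3) via SVD (special orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))           # flip to guarantee det(R) = +1
    return u @ np.diag([1.0, 1.0, d]) @ vt

raw = np.random.randn(3, 3)                      # unconstrained 9-D regression output
rot = svd_rotation(raw)
print(np.allclose(rot @ rot.T, np.eye(3)), round(np.linalg.det(rot), 3))  # True 1.0
```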

22 pages, 56379 KiB  
Article
Utilizing Dual-Stream Encoding and Transformer for Boundary-Aware Agricultural Parcel Extraction in Remote Sensing Images
by Weiming Xu, Juan Wang, Chengjun Wang, Ziwei Li, Jianchang Zhang, Hua Su and Sheng Wu
Remote Sens. 2024, 16(14), 2637; https://doi.org/10.3390/rs16142637 - 18 Jul 2024
Cited by 1 | Viewed by 1270
Abstract
The accurate extraction of agricultural parcels from remote sensing images is crucial for advanced agricultural management and monitoring systems. Existing methods primarily emphasize regional accuracy over boundary quality, often resulting in fragmented outputs due to uniform crop types, diverse agricultural practices, and environmental variations. To address these issues, this paper proposes DSTBA-Net, an end-to-end encoder–decoder architecture. Initially, we introduce a Dual-Stream Feature Extraction (DSFE) mechanism within the encoder, which consists of Residual Blocks and Boundary Feature Guidance (BFG) to separately process image and boundary data. The extracted features are then fused in the Global Feature Fusion Module (GFFM), utilizing Transformer technology to further integrate global and detailed information. In the decoder, we employ Feature Compensation Recovery (FCR) to restore critical information lost during the encoding process. Additionally, the network is optimized using a boundary-aware weighted loss strategy. DSTBA-Net aims to achieve high precision in agricultural parcel segmentation and accurate boundary extraction. To evaluate the model’s effectiveness, we conducted experiments on agricultural parcel extraction in Denmark (Europe) and Shandong (Asia). Both quantitative and qualitative analyses show that DSTBA-Net outperforms comparative methods, offering significant advantages in agricultural parcel extraction. Full article
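The boundary-aware weighted loss is not spelled out in the abstract; a common formulation up-weights the per-pixel cross-entropy near ground-truth boundaries, as in the sketch below, where the boundary map comes from a simple morphological gradient and the weight factor is an assumed value.

```python
import torch
import torch.nn.functional as F

def boundary_weighted_bce(logits, target, boundary, weight=5.0):
    """BCE where pixels on/near parcel boundaries contribute `weight` times more (illustrative)."""
    per_pixel = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    w = 1.0 + weight * boundary                  # boundary: 1 near edges, 0 elsewhere
    return (w * per_pixel).sum() / w.sum()

logits = torch.randn(1, 1, 64, 64)
target = (torch.rand(1, 1, 64, 64) > 0.5).float()
# morphological gradient (dilation minus erosion) marks boundary pixels
edges = (F.max_pool2d(target, 3, 1, 1) - (-F.max_pool2d(-target, 3, 1, 1))).clamp(0, 1)
print(boundary_weighted_bce(logits, target, edges).item())
```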

17 pages, 10137 KiB  
Article
MFFnet: Multimodal Feature Fusion Network for Synthetic Aperture Radar and Optical Image Land Cover Classification
by Yangyang Wang, Wengang Zhang, Weidong Chen, Chang Chen and Zhenyu Liang
Remote Sens. 2024, 16(13), 2459; https://doi.org/10.3390/rs16132459 - 4 Jul 2024
Cited by 2 | Viewed by 2059
Abstract
Optical and Synthetic Aperture Radar (SAR) imagery offers a wealth of complementary information on a given target, attributable to the distinct imaging modalities of each component image type. Thus, multimodal remote sensing data have been widely used to improve land cover classification. However, fully integrating optical and SAR image data is not straightforward due to the distinct distributions of their features. To this end, we propose a land cover classification network based on multimodal feature fusion, i.e., MFFnet. We adopt a dual-stream network to extract features from SAR and optical images, where a ResNet network is utilized to extract deep features from optical images and PidiNet is employed to extract edge features from SAR. Simultaneously, the iAFF feature fusion module is used to facilitate data interactions between multimodal data for both low- and high-level features. Additionally, to enhance global feature dependency, the ASPP module is employed to handle the interactions between high-level features. The processed high-level features extracted from the dual-stream encoder are fused with low-level features and inputted into the decoder to restore the dimensional feature maps, generating predicted images. Comprehensive evaluations demonstrate that MFFnet achieves excellent performance in both qualitative and quantitative assessments on the WHU-OPT-SAR dataset. Compared to the suboptimal results, our method improves the OA and Kappa metrics by 7.7% and 11.26% on the WHU-OPT-SAR dataset, respectively. Full article
(This article belongs to the Section AI Remote Sensing)
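OA and Kappa, the reported metrics, both follow directly from the confusion matrix; for reference:

```python
import numpy as np

def oa_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a (num_classes x num_classes) confusion matrix."""
    n = confusion.sum()
    oa = np.trace(confusion) / n
    pe = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / n**2  # chance agreement
    return oa, (oa - pe) / (1 - pe)

cm = np.array([[50, 2, 1], [3, 40, 4], [0, 5, 45]])  # toy 3-class land cover result
oa, kappa = oa_and_kappa(cm)
print(round(oa, 3), round(kappa, 3))
```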

24 pages, 5326 KiB  
Article
Edge-Enhanced Dual-Stream Perception Network for Monocular Depth Estimation
by Zihang Liu and Quande Wang
Electronics 2024, 13(9), 1652; https://doi.org/10.3390/electronics13091652 - 25 Apr 2024
Cited by 2 | Viewed by 1363
Abstract
Estimating depth from a single RGB image has a wide range of applications, such as in robot navigation and autonomous driving. Currently, Convolutional Neural Networks based on encoder–decoder architecture are the most popular methods to estimate depth maps. However, convolutional operators have limitations in modeling large-scale dependence, often leading to inaccurate depth predictions at object edges. To address these issues, a new edge-enhanced dual-stream monocular depth estimation method is introduced in this paper. ResNet and Swin Transformer are combined to better extract global and local features, which benefits the estimation of the depth map. To better integrate the information from the two branches of the encoder and the shallow branch of the decoder, we designed a lightweight decoder based on the multi-head Cross-Attention Module. Furthermore, in order to improve the boundary clarity of objects in the depth map, a loss function with an additional penalty for depth estimation error on the edges of objects is presented. The results on three datasets, NYU Depth V2, KITTI, and SUN RGB-D, show that the method presented in this paper achieves better performance for monocular depth estimation. Additionally, it has good generalization capabilities for various scenarios and real-world images. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Computer Vision)
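The edge-penalty term of the loss is easy to approximate: detect edges in the ground-truth depth with a Sobel filter and add an extra L1 term on those pixels. The threshold and weight below are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def edge_penalty_depth_loss(pred, gt, alpha=2.0, threshold=0.1):
    """L1 depth loss plus an extra L1 term restricted to ground-truth edge pixels (illustrative)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    gx = F.conv2d(gt, kx, padding=1)                  # horizontal depth gradient
    gy = F.conv2d(gt, kx.transpose(2, 3), padding=1)  # vertical depth gradient
    edge = ((gx.abs() + gy.abs()) > threshold).float()
    l1 = (pred - gt).abs()
    return l1.mean() + alpha * (edge * l1).sum() / edge.sum().clamp(min=1.0)

pred, gt = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
print(edge_penalty_depth_loss(pred, gt).item())
```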

17 pages, 5387 KiB  
Article
Edge-Enhanced TempoFuseNet: A Two-Stream Framework for Intelligent Multiclass Video Anomaly Recognition in 5G and IoT Environments
by Gulshan Saleem, Usama Ijaz Bajwa, Rana Hammad Raza and Fan Zhang
Future Internet 2024, 16(3), 83; https://doi.org/10.3390/fi16030083 - 29 Feb 2024
Cited by 1 | Viewed by 1983
Abstract
Surveillance video analytics encounters unprecedented challenges in 5G and IoT environments, including complex intra-class variations, short-term and long-term temporal dynamics, and variable video quality. This study introduces Edge-Enhanced TempoFuseNet, a cutting-edge framework that strategically reduces spatial resolution to allow the processing of low-resolution images. A dual upscaling methodology based on bicubic interpolation and an encoder–bank–decoder configuration is used for anomaly classification. The two-stream architecture combines the power of a pre-trained Convolutional Neural Network (CNN) for spatial feature extraction from RGB imagery in the spatial stream, while the temporal stream focuses on learning short-term temporal characteristics, reducing the computational burden of optical flow. To analyze long-term temporal patterns, the extracted features from both streams are combined and routed through a Gated Recurrent Unit (GRU) layer. The proposed framework (TempoFuseNet) outperforms the encoder–bank–decoder model in terms of performance metrics, achieving a multiclass macro average accuracy of 92.28%, an F1-score of 69.29%, and a false positive rate of 4.41%. This study presents a significant advancement in the field of video anomaly recognition and provides a comprehensive solution to the complex challenges posed by real-world surveillance scenarios in the context of 5G and IoT. Full article
(This article belongs to the Special Issue Edge Intelligence: Edge Computing for 5G and the Internet of Things)
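The final stage, concatenating spatial- and temporal-stream features per frame and passing them through a GRU for long-term temporal modeling, could look roughly like this (feature sizes and the class count are placeholders).

```python
import torch
import torch.nn as nn

class TwoStreamGRUHead(nn.Module):
    """Concatenate per-frame spatial and temporal features, run a GRU, classify the last state."""
    def __init__(self, spatial_dim=512, temporal_dim=512, hidden=256, num_classes=7):
        super().__init__()
        self.gru = nn.GRU(spatial_dim + temporal_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, spatial_feats, temporal_feats):
        x = torch.cat([spatial_feats, temporal_feats], dim=-1)  # (batch, frames, spatial+temporal)
        _, h = self.gru(x)
        return self.classifier(h[-1])                           # logits per anomaly class

spatial = torch.rand(2, 16, 512)   # 2 clips, 16 frames, CNN spatial features
temporal = torch.rand(2, 16, 512)  # matching temporal-stream features
print(TwoStreamGRUHead()(spatial, temporal).shape)  # torch.Size([2, 7])
```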

29 pages, 7529 KiB  
Article
MJ-GAN: Generative Adversarial Network with Multi-Grained Feature Extraction and Joint Attention Fusion for Infrared and Visible Image Fusion
by Danqing Yang, Xiaorui Wang, Naibo Zhu, Shuang Li and Na Hou
Sensors 2023, 23(14), 6322; https://doi.org/10.3390/s23146322 - 12 Jul 2023
Cited by 3 | Viewed by 1800
Abstract
The challenging issues in infrared and visible image fusion (IVIF) are extracting and fusing as much useful information as possible contained in the source images, namely, the rich textures in visible images and the significant contrast in infrared images. Existing fusion methods cannot address this problem well due to the handcrafted fusion operations and the extraction of features only from a single scale. In this work, we solve the problems of insufficient information extraction and fusion from another perspective to overcome the difficulties in lacking textures and unhighlighted targets in fused images. We propose a multi-scale feature extraction (MFE) and joint attention fusion (JAF) based end-to-end method using a generative adversarial network (MJ-GAN) framework for the aim of IVIF. The MFE modules are embedded in the two-stream structure-based generator in a densely connected manner to comprehensively extract multi-grained deep features from the source image pairs and reuse them during reconstruction. Moreover, an improved self-attention structure is introduced into the MFEs to enhance the pertinence among multi-grained features. The merging procedure for salient and important features is conducted via the JAF network in a feature recalibration manner, which also produces the fused image in a reasonable manner. Eventually, we can reconstruct a primary fused image with the major infrared radiometric information and a small amount of visible texture information via a single decoder network. The dual discriminator with strong discriminative power can add more texture and contrast information to the final fused image. Extensive experiments on four publicly available datasets show that the proposed method ultimately achieves phenomenal performance in both visual quality and quantitative assessment compared with nine leading algorithms. Full article
(This article belongs to the Section Sensing and Imaging)
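The joint attention fusion ("feature recalibration manner") is sketched below as a channel gate followed by a spatial gate over the concatenated infrared/visible features; this is a simplified stand-in for the JAF network, not its published design.

```python
import torch
import torch.nn as nn

class JointAttentionFusion(nn.Module):
    """Recalibrate and merge infrared/visible features with channel and spatial gates (illustrative)."""
    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * channels, 2 * channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(nn.Conv2d(2 * channels, 1, 7, padding=3), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_ir, f_vis):
        x = torch.cat([f_ir, f_vis], dim=1)
        x = x * self.channel_gate(x)       # which channels matter
        x = x * self.spatial_gate(x)       # where they matter
        return self.merge(x)

print(JointAttentionFusion(32)(torch.rand(1, 32, 64, 64), torch.rand(1, 32, 64, 64)).shape)
```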

23 pages, 3226 KiB  
Article
A Frequency Attention-Based Dual-Stream Network for Image Inpainting Forensics
by Hongquan Wang, Xinshan Zhu, Chao Ren, Lan Zhang and Shugen Ma
Mathematics 2023, 11(12), 2593; https://doi.org/10.3390/math11122593 - 6 Jun 2023
Cited by 1 | Viewed by 1579
Abstract
The rapid development of digital image inpainting technology is causing serious hidden danger to the security of multimedia information. In this paper, a deep network called frequency attention-based dual-stream network (FADS-Net) is proposed for locating the inpainting region. FADS-Net is established by a dual-stream encoder and an attention-based associative decoder. The dual-stream encoder includes two feature extraction streams, the raw input stream (RIS) and the frequency recalibration stream (FRS). RIS directly captures feature maps from the raw input, while FRS performs feature extraction after recalibrating the input via learning in the frequency domain. In addition, a module based on dense connection is designed to ensure efficient extraction and full fusion of dual-stream features. The attention-based associative decoder consists of a main decoder and two branch decoders. The main decoder performs up-sampling and fine-tuning of fused features by using attention mechanisms and skip connections, and ultimately generates the predicted mask for the inpainted image. Then, two branch decoders are utilized to further supervise the training of two feature streams, ensuring that they both work effectively. A joint loss function is designed to supervise the training of the entire network and two feature extraction streams for ensuring optimal forensic performance. Extensive experimental results demonstrate that the proposed FADS-Net achieves superior localization accuracy and robustness on multiple datasets compared to the state-of-the-art inpainting forensics methods. Full article
(This article belongs to the Special Issue Mathematical Methods for Pattern Recognition)
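One way to picture the frequency recalibration stream is a learnable reweighting of the input's Fourier coefficients before spatial feature extraction; the sketch below is an assumption about the mechanism, not the published FRS module.

```python
import torch
import torch.nn as nn

class FrequencyRecalibration(nn.Module):
    """Recalibrate an image in the frequency domain with learnable per-frequency weights (illustrative)."""
    def __init__(self, channels, height, width):
        super().__init__()
        # one learnable weight per rFFT coefficient and channel
        self.weight = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

    def forward(self, x):
        spec = torch.fft.rfft2(x, norm="ortho")             # (B, C, H, W//2+1) complex spectrum
        spec = spec * self.weight                           # reweight frequency components
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

x = torch.rand(1, 3, 64, 64)
print(FrequencyRecalibration(3, 64, 64)(x).shape)  # torch.Size([1, 3, 64, 64])
```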

16 pages, 3299 KiB  
Article
LUVS-Net: A Lightweight U-Net Vessel Segmentor for Retinal Vasculature Detection in Fundus Images
by Muhammad Talha Islam, Haroon Ahmed Khan, Khuram Naveed, Ali Nauman, Sardar Muhammad Gulfam and Sung Won Kim
Electronics 2023, 12(8), 1786; https://doi.org/10.3390/electronics12081786 - 10 Apr 2023
Cited by 10 | Viewed by 2250
Abstract
This paper presents LUVS-Net, which is a lightweight convolutional network for retinal vessel segmentation in fundus images that is designed for resource-constrained devices that are typically unable to meet the computational requirements of large neural networks. The computational challenges arise due to low-quality retinal images, wide variance in image acquisition conditions and disparities in intensity. Consequently, the training of existing segmentation methods requires a multitude of trainable parameters for the training of networks, resulting in computational complexity. The proposed Lightweight U-Net for Vessel Segmentation Network (LUVS-Net) can achieve high segmentation performance with only a few trainable parameters. This network uses an encoder–decoder framework in which edge data are transposed from the first layers of the encoder to the last layer of the decoder, massively improving the convergence latency. Additionally, LUVS-Net’s design allows for a dual-stream information flow both inside as well as outside of the encoder–decoder pair. The network width is enhanced using group convolutions, which allow the network to learn a larger number of low- and intermediate-level features. Spatial information loss is minimized using skip connections, and class imbalances are mitigated using dice loss for pixel-wise classification. The performance of the proposed network is evaluated on the publicly available retinal blood vessel datasets DRIVE, CHASE_DB1 and STARE. LUVS-Net proves to be quite competitive, outperforming alternative state-of-the-art segmentation methods and achieving comparable accuracy using trainable parameters that are reduced by two to three orders of magnitude compared with those of comparative state-of-the-art methods. Full article
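The dice loss used for pixel-wise classification has a standard soft formulation; for reference:

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for pixel-wise vessel segmentation (pred: probabilities in [0, 1])."""
    intersection = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1.0 - (2.0 * intersection + eps) / (union + eps)).mean()

pred = torch.rand(2, 1, 128, 128)                       # sigmoid outputs
target = (torch.rand(2, 1, 128, 128) > 0.9).float()     # sparse vessel mask
print(dice_loss(pred, target).item())
```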
