Search Results (53)

Search Parameters:
Keywords = multi-task cascaded convolutional networks

19 pages, 7295 KB  
Article
Video Identifying and Eraser: Use Multi-Task Cascaded Convolutional Neural Network to Enhance Safety in a Text-to-Video Diffusion Model
by Shuang Lin, Ranran Zhou and Yong Wang
Appl. Sci. 2026, 16(6), 2995; https://doi.org/10.3390/app16062995 - 20 Mar 2026
Abstract
Current security solutions predominantly rely on cloud-based implementations, often neglecting computational resource constraints and operational efficiency. While contemporary methodologies typically require additional training, the few that operate without retraining frequently yield suboptimal performance. To address these limitations, this work leverages a pre-trained MTCNN architecture to detect faces of copyright-protected individuals. We construct a facial landmark database comprising five critical fiducial points, which serves as a supplementary module integrated into the Stable Diffusion framework, enabling real-time security filtering for synthesized video content. The proposed system utilizes MTCNN models pre-trained in the cloud to build a repository of copyrighted facial signatures, generating a geometric parameter database of facial landmarks. This database, coupled with a parallel verification unit, functions as a plugin within the standard Stable Diffusion pipeline. By leveraging Stable Diffusion’s native decoder, we decode stochastic frames from the U-Net latent representations and perform real-time comparative analysis to identify potential copyright violations in generated video sequences. Upon detecting an infringement, an on-screen display (OSD) alert notifies the user and immediately halts the text-to-video (T2V) generation process. Experimental evaluations demonstrate that our framework effectively mitigates the resource constraints and latency issues inherent in edge deployment scenarios of prior security implementations. Leveraging MTCNN’s proven robustness and extensive edge compatibility for facial recognition, the proposed detection and obfuscation plugin integrates seamlessly with Stable Diffusion while preserving generation quality.
(This article belongs to the Special Issue Applied Multimodal AI: Methods and Applications Across Domains)
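For context, the detection building block named here is widely available; below is a minimal sketch of five-point landmark extraction with the facenet-pytorch implementation of MTCNN. The frame path and the database-matching step are illustrative assumptions, not the paper’s code.

```python
# Sketch: detect faces and the five MTCNN fiducial points in a decoded frame.
# Assumes the facenet-pytorch package; file names are placeholders.
from PIL import Image
from facenet_pytorch import MTCNN

mtcnn = MTCNN(keep_all=True)  # pre-trained P-Net/R-Net/O-Net cascade

frame = Image.open("decoded_frame.png")  # e.g., a frame decoded mid-generation
boxes, probs, landmarks = mtcnn.detect(frame, landmarks=True)

if landmarks is not None:
    for pts in landmarks:  # pts: (5, 2) array - eyes, nose tip, mouth corners
        # A real filter would compare pts against the stored geometric
        # parameter database of protected faces and halt generation on a match.
        pass
```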

27 pages, 16570 KB  
Article
Dual-Region Encryption Model Based on a 3D-MNFC Chaotic System and Logistic Map
by Jingyan Li, Yan Niu, Dan Yu, Yiling Wang, Jiaqi Huang and Mingliang Dou
Entropy 2026, 28(2), 132; https://doi.org/10.3390/e28020132 - 23 Jan 2026
Abstract
Facial images carry sensitive personal information, and it is crucial to ensure their security through encryption. Traditional encryption for portrait images typically processes the entire image, despite the fact that most regions lack sensitive facial information. This approach is notably inefficient and imposes unnecessary computational burdens. To address this inefficiency while maintaining security, we propose a novel dual-region encryption model for portrait images. Firstly, a Multi-task Cascaded Convolutional Network (MTCNN) was adopted to efficiently segment facial images into two regions: facial and non-facial. Subsequently, given the high sensitivity of facial regions, a robust encryption scheme was designed by integrating a CNN-based key generator, the proposed three-dimensional Multi-module Nonlinear Feedback-coupled Chaotic System (3D-MNFC), DNA encoding, and bit reversal. The 3D-MNFC, which incorporates time-varying parameters, nonlinear terms, state feedback terms, and coupling mechanisms, has been proven to exhibit excellent chaotic performance. As for non-facial regions, the Logistic map combined with XOR operations is used to balance efficiency and basic security. Finally, the encrypted image is obtained by restoring the two ciphertext images to their original positions. Comprehensive security analyses confirm the exceptional performance of the regional model: a large key space (2^536), near-ideal information entropy (7.9995), and NPCR and UACI values of 99.6055% and 33.4599%. It is worth noting that the model has been verified to improve efficiency by at least 37.82%.
(This article belongs to the Section Multidisciplinary Applications)
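The lightweight branch is easy to picture; here is a minimal sketch of a logistic-map keystream XORed with the non-facial pixels. The map parameters and the stand-in pixel block are illustrative assumptions, not the paper’s key schedule.

```python
# Sketch of the non-facial path only: logistic-map keystream + XOR.
import numpy as np

def logistic_keystream(n, r=3.99, x0=0.618):  # illustrative parameters
    x, out = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)          # logistic map iteration
        out[i] = int(x * 256) % 256    # quantize chaotic state to a key byte
    return out

region = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in pixels
ks = logistic_keystream(region.size).reshape(region.shape)
cipher = region ^ ks                        # encryption
assert np.array_equal(cipher ^ ks, region)  # XOR is its own inverse
```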

28 pages, 11618 KB  
Article
Cascaded Multi-Attention Feature Recurrent Enhancement Network for Spectral Super-Resolution Reconstruction
by He Jin, Jinhui Lan, Zhixuan Zhuang and Yiliang Zeng
Remote Sens. 2026, 18(2), 202; https://doi.org/10.3390/rs18020202 - 8 Jan 2026
Abstract
Hyperspectral imaging (HSI) captures the same scene across multiple spectral bands, providing richer spectral characteristics of materials than conventional RGB images. The spectral reconstruction task seeks to map RGB images into hyperspectral images, enabling high-quality HSI data acquisition without additional hardware investment. Traditional methods based on linear models or sparse representations struggle to effectively model the nonlinear characteristics of hyperspectral data. Although deep learning approaches have made significant progress, issues such as detail loss and insufficient modeling of spatial–spectral relationships persist. To address these challenges, this paper proposes the Cascaded Multi-Attention Feature Recurrent Enhancement Network (CMFREN). This method achieves targeted breakthroughs over existing approaches through a cascaded architecture of feature purification, spectral balancing, and progressive enhancement. This network comprises two core modules: (1) the Hierarchical Residual Attention (HRA) module, which suppresses artifacts in illumination transition regions through residual connections and multi-scale contextual feature fusion, and (2) the Cascaded Multi-Attention (CMA) module, which incorporates a Spatial–Spectral Balanced Feature Extraction (SSBFE) module and a Spectral Enhancement Module (SEM). The SSBFE combines Multi-Scale Residual Feature Enhancement (MSRFE) with Spectral-wise Multi-head Self-Attention (S-MSA) to achieve dynamic optimization of spatial–spectral features, while the SEM synergistically utilizes attention and convolution to progressively enhance spectral details and mitigate spectral aliasing in low-resolution scenes. Experiments across multiple public datasets demonstrate that CMFREN achieves state-of-the-art (SOTA) performance on metrics including RMSE, PSNR, SAM, and MRAE, validating its superiority under complex illumination conditions and detail-degraded scenarios.
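To make the S-MSA idea concrete, the sketch below computes multi-head self-attention across spectral channels rather than spatial positions, so each band attends to every other band. The head count, normalization, and shapes are illustrative assumptions, not CMFREN’s exact module.

```python
# Minimal spectral-wise multi-head self-attention: channels are the tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralSelfAttention(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        assert channels % heads == 0
        self.heads = heads
        self.qkv = nn.Conv2d(channels, channels * 3, 1, bias=False)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)  # each (B, C, H, W)
        split = lambda t: t.reshape(b, self.heads, c // self.heads, h * w)
        q, k, v = map(split, (q, k, v))        # channel tokens of length H*W
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)  # (B, heads, C/h, C/h)
        return self.proj((attn @ v).reshape(b, c, h, w))

y = SpectralSelfAttention(32)(torch.randn(2, 32, 16, 16))  # toy 32-band cube
```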

25 pages, 5001 KB  
Article
SAR-to-Optical Remote Sensing Image Translation Method Based on InternImage and Cascaded Multi-Head Attention
by Cheng Xu and Yingying Kong
Remote Sens. 2026, 18(1), 55; https://doi.org/10.3390/rs18010055 - 24 Dec 2025
Abstract
Synthetic aperture radar (SAR), with its all-weather and all-day observation capabilities, plays a significant role in the field of remote sensing. However, due to the unique imaging mechanism of SAR, its interpretation is challenging. To enhance the interpretability of SAR images, translating them into optical remote sensing images has become a research hotspot in recent years. This paper proposes a deep learning-based method for SAR-to-optical remote sensing image translation. The network comprises three parts: a global representor, a generator with cascaded multi-head attention, and a multi-scale discriminator. The global representor, built upon InternImage with deformable convolution v3 (DCNv3) as its core operator, leverages its global receptive field and adaptive spatial aggregation capabilities to extract global semantic features from SAR images. The generator follows the classic “encoder-bottleneck-decoder” structure, where the encoder focuses on extracting local detail features from SAR images. The cascaded multi-head attention module within the bottleneck layer optimizes local detail features and facilitates feature interaction between global semantics and local details. The discriminator adopts a multi-scale structure based on the local receptive field PatchGAN, enabling joint global and local discrimination. Furthermore, for the first time in SAR image translation tasks, structural similarity index measure (SSIM) loss is combined with adversarial loss, perceptual loss, and feature matching loss as the loss function. A series of experiments demonstrate the effectiveness and reliability of the proposed method. Compared to mainstream image translation methods, our method generates higher-quality optical remote sensing images that are semantically consistent, texturally authentic, clearly detailed, and visually plausible.
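A combined objective of this shape is straightforward to assemble; the sketch below adds an SSIM term to precomputed adversarial, perceptual, and feature-matching losses. The pytorch_msssim dependency and the weights are assumptions for illustration, not the paper’s settings.

```python
# Sketch: total translation loss with an SSIM term, weights illustrative.
import torch
from pytorch_msssim import ssim  # assumed third-party SSIM implementation

def translation_loss(fake_opt, real_opt, adv_loss, perc_loss, fm_loss,
                     w_adv=1.0, w_perc=10.0, w_fm=10.0, w_ssim=5.0):
    # fake_opt/real_opt: (N, 3, H, W) tensors scaled to [0, 1]
    ssim_term = 1.0 - ssim(fake_opt, real_opt, data_range=1.0)
    return (w_adv * adv_loss + w_perc * perc_loss
            + w_fm * fm_loss + w_ssim * ssim_term)
```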

17 pages, 2779 KB  
Article
Image Restoration Based on Semantic Prior Aware Hierarchical Network and Multi-Scale Fusion Generator
by Yapei Feng, Yuxiang Tang and Hua Zhong
Technologies 2025, 13(11), 521; https://doi.org/10.3390/technologies13110521 - 13 Nov 2025
Abstract
As a fundamental low-level vision task, image restoration plays a pivotal role in reconstructing authentic visual information from corrupted inputs, directly impacting the performance of downstream high-level vision systems. Current approaches frequently exhibit two critical limitations: (1) progressive texture degradation and blurring during iterative refinement, particularly for irregular damage patterns, and (2) structural incoherence when handling cross-domain artifacts. To address these challenges, we present a semantic-aware hierarchical network (SAHN) that synergistically integrates multi-scale semantic guidance with structural consistency constraints. Firstly, we construct a Dual-Stream Feature Extractor: based on a modified U-Net backbone with dilated residual blocks, this skip-connected encoder–decoder module simultaneously captures hierarchical semantic contexts and fine-grained texture details. Secondly, we propose a semantic prior mapper that establishes spatial–semantic correspondences between damaged areas and multi-scale features through predefined semantic prototypes and adaptive attention pooling. Additionally, we construct a multi-scale fusion generator by employing cascaded association blocks with structural similarity constraints. This unit progressively aggregates features from different semantic levels using deformable convolution kernels, effectively bridging the gap between global structure and local texture reconstruction. Compared to existing methods, our algorithm attains the highest overall PSNR of 34.99 together with the best visual authenticity (the lowest FID, 11.56). Comprehensive evaluations on three datasets demonstrate its leading performance in restoring visual realism.

18 pages, 1960 KB  
Article
CasDacGCN: A Dynamic Attention-Calibrated Graph Convolutional Network for Information Popularity Prediction
by Bofeng Zhang, Yanlin Zhu, Zhirong Zhang, Kaili Liao, Sen Niu, Bingchun Li and Haiyan Li
Entropy 2025, 27(10), 1064; https://doi.org/10.3390/e27101064 - 14 Oct 2025
Abstract
Information popularity prediction is a critical problem in social network analysis. With the increasing prevalence of social platforms, accurate prediction of the diffusion process has become increasingly important. Existing methods mainly rely on graph neural networks to model structural relationships, but they are often insufficient for capturing the complex interplay between temporal evolution and local cascade structures, especially in real-world scenarios involving sparse or rapidly changing cascades. To address this issue, we propose the Cascading Dynamic attention-calibrated Graph Convolutional Network (CasDacGCN). It enhances prediction performance through spatiotemporal feature fusion and adaptive representation learning. The model integrates snapshot-level local encoding, global temporal modeling, cross-attention mechanisms, and a hypernetwork-based sample-wise calibration strategy, enabling flexible modeling of multi-scale diffusion patterns. Experimental results demonstrate that the proposed model consistently surpasses existing approaches on two real-world datasets, validating its effectiveness in popularity prediction tasks.
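For orientation, snapshot-level local encoding typically starts from a graph-convolution step over the cascade graph; below is the textbook symmetric-normalized propagation rule on a toy cascade, not CasDacGCN’s exact layer.

```python
# Generic GCN step: D^-1/2 (A + I) D^-1/2 X W, on a toy retweet cascade.
import torch

def gcn_layer(adj, feats, weight):
    a_hat = adj + torch.eye(adj.size(0))   # add self-loops
    d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
    norm = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
    return torch.relu(norm @ feats @ weight)

adj = torch.tensor([[0, 1, 0, 0],          # 4 users, edges = retweets
                    [1, 0, 1, 1],
                    [0, 1, 0, 0],
                    [0, 1, 0, 0]], dtype=torch.float)
h = gcn_layer(adj, torch.randn(4, 8), torch.randn(8, 16))  # (4, 16) embeddings
```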

25 pages, 3263 KB  
Article
Combining MTCNN and Enhanced FaceNet with Adaptive Feature Fusion for Robust Face Recognition
by Sasan Karamizadeh, Saman Shojae Chaeikar and Hamidreza Salarian
Technologies 2025, 13(10), 450; https://doi.org/10.3390/technologies13100450 - 3 Oct 2025
Abstract
Face recognition systems typically face real-world challenges such as pose variation, illumination, occlusion, and ageing that significantly impact recognition accuracy. In this paper, a robust face recognition system is presented that uses Multi-task Cascaded Convolutional Networks (MTCNN) for face detection and alignment together with an enhanced FaceNet for facial embedding extraction. The enhanced FaceNet uses attention mechanisms to achieve more discriminative facial embeddings, especially in challenging scenarios. In addition, an Adaptive Feature Fusion module combines identity-specific embeddings with contextual information such as pose, lighting, and the presence of masks, further enhancing robustness and accuracy. Training uses the CelebA dataset, and testing is conducted independently on LFW and IJB-C to enable subject-disjoint evaluation. CelebA contains over 200,000 face images of 10,177 individuals, LFW consists of 13,000+ images of 5749 individuals in unconstrained conditions, and IJB-C has 31,000 face images and 117,000 video frames with extreme pose and occlusion changes. The system introduced here achieves 99.6% on CelebA, 94.2% on LFW, and 91.5% on IJB-C, and outperforms baselines such as a simple MTCNN-FaceNet pipeline and AFF-Net, as well as state-of-the-art models such as ArcFace, CosFace, and AdaCos. These findings demonstrate that the proposed framework generalizes effectively between datasets and is resilient in real-world scenarios.
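The underlying detect-align-embed pipeline can be sketched with facenet-pytorch, which ships both MTCNN and a FaceNet-style embedder; the paper’s attention and Adaptive Feature Fusion modules are not reproduced here, and the similarity threshold is an illustrative assumption.

```python
# Sketch: MTCNN alignment + FaceNet (InceptionResnetV1) embeddings + cosine match.
import torch
import torch.nn.functional as F
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN(image_size=160)
embedder = InceptionResnetV1(pretrained='vggface2').eval()

def embed(path):
    face = mtcnn(Image.open(path))      # detect, align, crop (assumes a face is found)
    return embedder(face.unsqueeze(0))  # (1, 512) identity embedding

sim = F.cosine_similarity(embed("a.jpg"), embed("b.jpg"))  # placeholder files
same_person = sim.item() > 0.7          # illustrative decision threshold
```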

19 pages, 7270 KB  
Article
A Fast Rotation Detection Network with Parallel Interleaved Convolutional Kernels
by Leilei Deng, Lifeng Sun and Hua Li
Symmetry 2025, 17(10), 1621; https://doi.org/10.3390/sym17101621 - 1 Oct 2025
Abstract
In recent years, convolutional neural network-based object detectors have achieved extensive applications in remote sensing (RS) image interpretation. While multi-scale feature modeling optimization remains a persistent research focus, existing methods frequently overlook the symmetrical balance between feature granularity and morphological diversity, particularly when handling high-aspect-ratio RS targets with anisotropic geometries. This oversight leads to suboptimal feature representations characterized by spatial sparsity and directional bias. To address this challenge, we propose the Parallel Interleaved Convolutional Kernel Network (PICK-Net), a rotation-aware detection framework that embodies symmetry principles through dual-path feature modulation and geometrically balanced operator design. The core innovation lies in the synergistic integration of cascaded dynamic sparse sampling and symmetrically decoupled feature modulation, enabling adaptive morphological modeling of RS targets. Specifically, the Parallel Interleaved Convolution (PIC) module establishes symmetric computation patterns through mirrored kernel arrangements, effectively reducing computational redundancy while preserving directional completeness through rotational symmetry-enhanced receptive field optimization. Complementing this, the Global Complementary Attention Mechanism (GCAM) introduces bidirectional symmetry in feature recalibration, decoupling channel-wise and spatial-wise adaptations through orthogonal attention pathways that maintain equilibrium in gradient propagation. Extensive experiments on the RSOD and NWPU-VHR-10 datasets demonstrate superior performance, achieving 92.2% and 84.90% mAP, respectively, outperforming state-of-the-art methods including EfficientNet and YOLOv8. With only 12.5 M parameters, the framework achieves symmetrical optimization of accuracy-efficiency trade-offs. Ablation studies confirm that the symmetric interaction between PIC and GCAM enhances detection performance by 2.75%, particularly excelling in scenarios requiring geometric symmetry preservation, such as dense target clusters and extreme scale variations. Cross-domain validation on agricultural pest datasets further verifies its rotational symmetry generalization capability, demonstrating 84.90% accuracy in fine-grained orientation-sensitive detection tasks.
(This article belongs to the Section Computer)

22 pages, 1269 KB  
Article
LightFakeDetect: A Lightweight Model for Deepfake Detection in Videos That Focuses on Facial Regions
by Sarab AlMuhaideb, Hessa Alshaya, Layan Almutairi, Danah Alomran and Sarah Turki Alhamed
Mathematics 2025, 13(19), 3088; https://doi.org/10.3390/math13193088 - 25 Sep 2025
Abstract
In recent years, the proliferation of forged videos, known as deepfakes, has escalated significantly, primarily due to advancements in technologies such as Generative Adversarial Networks (GANs), diffusion models, and Vision Language Models (VLMs). These deepfakes present substantial risks, threatening political stability, facilitating celebrity impersonation, and enabling tampering with evidence. As the sophistication of deepfake technology increases, detecting these manipulated videos becomes increasingly challenging. Most of the existing deepfake detection methods use Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), or Vision Transformers (ViTs), achieving strong accuracy but exhibiting high computational demands. This highlights the need for a lightweight yet effective pipeline for real-time and resource-limited scenarios. This study introduces a lightweight deep learning model for deepfake detection in order to address this emerging threat. The model incorporates three integral components: MobileNet for feature extraction, a Convolutional Block Attention Module (CBAM) for feature enhancement, and a Gated Recurrent Unit (GRU) for temporal analysis. Additionally, a pre-trained Multi-Task Cascaded Convolutional Network (MTCNN) is utilized for face detection and cropping. The model is evaluated using the Deepfake Detection Challenge (DFDC) and Celeb-DF v2 datasets, demonstrating impressive performance, with 98.2% accuracy and a 99.0% F1-score on Celeb-DF v2 and 95.0% accuracy and a 97.2% F1-score on DFDC, achieving a commendable balance between simplicity and effectiveness.
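The overall pipeline shape is simple enough to sketch: per-frame MobileNet features pooled and fed to a GRU, with a single logit for real/fake. CBAM and the MTCNN cropping stage are omitted, and all sizes are illustrative assumptions.

```python
# Sketch: MobileNet-per-frame features -> GRU -> deepfake logit.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class FrameSequenceClassifier(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features  # per-frame CNN
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gru = nn.GRU(1280, hidden, batch_first=True)    # temporal analysis
        self.head = nn.Linear(hidden, 1)                     # real/fake logit

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.pool(self.backbone(clips.flatten(0, 1))).flatten(1)
        _, h_n = self.gru(feats.reshape(b, t, -1))
        return self.head(h_n[-1])              # last hidden state -> logit

logits = FrameSequenceClassifier()(torch.randn(2, 8, 3, 224, 224))
```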

36 pages, 8122 KB  
Article
Human Activity Recognition via Attention-Augmented TCN-BiGRU Fusion
by Ji-Long He, Jian-Hong Wang, Chih-Min Lo and Zhaodi Jiang
Sensors 2025, 25(18), 5765; https://doi.org/10.3390/s25185765 - 16 Sep 2025
Abstract
With the widespread application of wearable sensors in health monitoring and human–computer interaction, deep learning-based human activity recognition (HAR) research faces challenges such as the effective extraction of multi-scale temporal features and the enhancement of robustness against noise in multi-source data. This study proposes the TGA-HAR (TCN-GRU-Attention-HAR) model. TGA-HAR integrates temporal convolutional and recurrent networks, constructing a hierarchical feature abstraction architecture by cascading Temporal Convolutional Network (TCN) and Bidirectional Gated Recurrent Unit (BiGRU) layers for complex activity recognition. TCN layers with dilated convolution kernels extract multi-order temporal features, while BiGRU layers capture bidirectional temporal contextual correlations. To further optimize feature representation, TGA-HAR introduces residual connections to stabilize gradient propagation and employs an adaptive weighted attention mechanism. Experimental results demonstrate that the model achieved test accuracies of 99.37% on the WISDM dataset, 95.36% on the USC-HAD dataset, and 96.96% on the PAMAP2 dataset. Furthermore, we conducted tests on datasets collected in real-world scenarios. This method provides a highly robust solution for complex human activity recognition tasks.
(This article belongs to the Section Biomedical Sensors)
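The TCN-BiGRU cascade itself is compact; the sketch below stacks dilated 1-D convolutions ahead of a bidirectional GRU, leaving out the residual connections and attention refinement. Channel counts, dilations, and window length are illustrative assumptions.

```python
# Sketch: dilated Conv1d stack (TCN) -> BiGRU -> activity logits.
import torch
import torch.nn as nn

class TCNBiGRU(nn.Module):
    def __init__(self, in_ch=6, n_classes=6, hidden=64):
        super().__init__()
        self.tcn = nn.Sequential(  # dilations 1, 2, 4 widen the receptive field
            nn.Conv1d(in_ch, 64, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(64, 64, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(64, 64, 3, padding=4, dilation=4), nn.ReLU(),
        )
        self.bigru = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                  # x: (B, C, T) sensor windows
        seq = self.tcn(x).transpose(1, 2)  # (B, T, 64)
        out, _ = self.bigru(seq)
        return self.head(out.mean(dim=1))  # temporal average pooling

logits = TCNBiGRU()(torch.randn(4, 6, 128))  # e.g., 128-sample IMU windows
```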

22 pages, 1243 KB  
Article
ProCo-NET: Progressive Strip Convolution and Frequency-Optimized Framework for Scale-Gradient-Aware Semantic Segmentation in Off-Road Scenes
by Zihang Liu, Donglin Jing and Chenxiang Ji
Symmetry 2025, 17(9), 1428; https://doi.org/10.3390/sym17091428 - 2 Sep 2025
Abstract
In off-road scenes, segmentation targets exhibit significant scale progression due to perspective depth effects from oblique viewing angles, meaning that the size of the same target undergoes continuous, boundary-less progressive changes along a specific direction. This asymmetric variation disrupts the geometric symmetry of targets, causing traditional segmentation networks to face three key challenges: (1) inefficient capture of continuous-scale features, where pyramid structures and multi-scale kernels struggle to balance computational efficiency with sufficient coverage of progressive scales; (2) degraded intra-class feature consistency, where local scale differences within targets induce semantic ambiguity; and (3) loss of high-frequency boundary information, where feature sampling operations exacerbate the blurring of progressive boundaries. To address these issues, this paper proposes the ProCo-NET framework for systematic optimization. Firstly, a Progressive Strip Convolution Group (PSCG) is designed to construct multi-level receptive field expansion through orthogonally oriented strip convolution cascading (employing symmetric processing in the horizontal/vertical directions) integrated with self-attention mechanisms, enhancing perception of asymmetric continuous-scale variations. Secondly, an Offset-Frequency Cooperative Module (OFCM) is developed, wherein a learnable offset generator dynamically adjusts sampling point distributions to enhance intra-class consistency, while a dual-channel frequency domain filter performs adaptive high-pass filtering to sharpen target boundaries. These components synergistically address feature consistency degradation and boundary ambiguity under asymmetric changes. Experiments show that this framework significantly improves the segmentation accuracy and boundary clarity of multi-scale targets in off-road scene segmentation tasks: it achieves 71.22% MIoU on the standard RUGD dataset (0.84% higher than the best existing method) and 83.05% MIoU on the Freiburg_Forest dataset. In particular, segmentation accuracy on key obstacle categories improves to 52.04% (2.7% higher than the next-best model). The framework effectively compensates for the impact of asymmetric deformation through a symmetric computing mechanism.
(This article belongs to the Section Computer)
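The orthogonal strip-convolution idea reduces to a mirrored 1xk / kx1 kernel pair whose summed responses cover elongated shapes along both axes; the cascading and self-attention of the PSCG are not reproduced, and k is an illustrative assumption.

```python
# Sketch: orthogonally oriented strip convolutions (1 x k and k x 1).
import torch
import torch.nn as nn

class StripConvPair(nn.Module):
    def __init__(self, ch, k=9):
        super().__init__()
        self.horiz = nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2))  # 1 x k
        self.vert = nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0))   # k x 1

    def forward(self, x):
        # Mirrored orientations give elongated receptive fields along both
        # axes, suited to targets whose scale grows along one direction.
        return self.horiz(x) + self.vert(x)

y = StripConvPair(32)(torch.randn(1, 32, 64, 64))
```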

29 pages, 15488 KB  
Article
GOFENet: A Hybrid Transformer–CNN Network Integrating GEOBIA-Based Object Priors for Semantic Segmentation of Remote Sensing Images
by Tao He, Jianyu Chen and Delu Pan
Remote Sens. 2025, 17(15), 2652; https://doi.org/10.3390/rs17152652 - 31 Jul 2025
Abstract
Geographic object-based image analysis (GEOBIA) has demonstrated substantial utility in remote sensing tasks. However, its integration with deep learning remains largely confined to image-level classification. This is primarily due to the irregular shapes and fragmented boundaries of segmented objects, which limit its applicability in semantic segmentation. While convolutional neural networks (CNNs) excel at local feature extraction, they inherently struggle to capture long-range dependencies. In contrast, Transformer-based models are well suited for global context modeling but often lack fine-grained local detail. To overcome these limitations, we propose GOFENet (Geo-Object Feature Enhanced Network)—a hybrid semantic segmentation architecture that effectively fuses object-level priors into deep feature representations. GOFENet employs a dual-encoder design combining CNN and Swin Transformer architectures, enabling multi-scale feature fusion through skip connections to preserve both local and global semantics. An auxiliary branch incorporating cascaded atrous convolutions is introduced to inject information about segmented objects into the learning process. Furthermore, we develop a cross-channel selection module (CSM) for refined channel-wise attention, a feature enhancement module (FEM) to merge global and local representations, and a shallow–deep feature fusion module (SDFM) to integrate pixel- and object-level cues across scales. Experimental results on the GID and LoveDA datasets demonstrate that GOFENet achieves superior segmentation performance, with 66.02% mIoU and 51.92% mIoU, respectively. The model exhibits strong capability in delineating large-scale land cover features, producing sharper object boundaries and reducing classification noise, while preserving the integrity and discriminability of land cover categories.

13 pages, 13928 KB  
Article
Voter Authentication Using Enhanced ResNet50 for Facial Recognition
by Aminou Halidou, Daniel Georges Olle Olle, Arnaud Nguembang Fadja, Daramy Vandi Von Kallon and Tchana Ngninkeu Gil Thibault
Signals 2025, 6(2), 25; https://doi.org/10.3390/signals6020025 - 23 May 2025
Abstract
Electoral fraud, particularly multiple voting, undermines the integrity of democratic processes. To address this challenge, this study introduces an innovative facial recognition system that integrates an enhanced 50-layer Residual Network (ResNet50) architecture with Additive Angular Margin Loss (ArcFace) and Multi-Task Cascaded Convolutional Neural Networks (MTCNN) for face detection. Using the Mahalanobis distance, the system verifies voter identities by comparing captured facial images with previously recorded biometric features. Extensive evaluations demonstrate the methodology’s effectiveness, achieving a facial recognition accuracy of 99.85%. This significant improvement over existing baseline methods has the potential to enhance electoral transparency and prevent multiple voting. The findings contribute to developing robust biometric-based electoral systems, thereby promoting democratic trust and accountability.
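The Mahalanobis check is a one-liner once embeddings exist; the sketch below scores a probe embedding against hypothetical enrolled features. The embedding dimension, gallery, and threshold are illustrative assumptions, not the paper’s values.

```python
# Sketch: Mahalanobis distance of a probe embedding to the enrolled gallery.
import numpy as np

enrolled = np.random.randn(500, 128)    # stand-in for recorded biometric features
mu = enrolled.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(enrolled, rowvar=False)
                        + 1e-6 * np.eye(128))  # regularized inverse covariance

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

probe = np.random.randn(128)            # embedding of the captured face
match = mahalanobis(probe) < 3.0        # illustrative decision threshold
```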

13 pages, 6337 KB  
Article
Printed Circuit Board Sample Expansion and Automatic Defect Detection Based on Diffusion Models and ConvNeXt
by Youzhi Xu, Hao Wu, Yulong Liu and Xiaoming Liu
Micromachines 2025, 16(3), 261; https://doi.org/10.3390/mi16030261 - 26 Feb 2025
Abstract
Soldering of printed circuit board (PCB)-based surface-mounted assemblies is a critical process, and to enhance the accuracy of detecting their multi-target soldering defects, we propose an automated sample generation method that combines ControlNet and a Stable Diffusion model. This method expands the dataset by quickly generating high-quality sample images containing both defects and normal detection targets. Meanwhile, we propose a Cascade Mask R-CNN model with ConvNeXt as the backbone, which performs well in multi-target defect detection tasks. Unlike previous detection methods that can only detect a single component, it can detect all components in the region. The experimental results demonstrate that the detection accuracy of our proposed approach is significantly enhanced over the previous convolutional neural network model, with an increase of more than 10.5% in mean average precision (mAP) and 9.5% in average recall (AR).
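Sample expansion of this kind maps naturally onto the diffusers library’s ControlNet pipeline; the sketch below uses public model IDs and a placeholder conditioning image rather than the paper’s PCB-specific checkpoints.

```python
# Sketch: ControlNet-guided Stable Diffusion for synthetic defect samples.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

cond = load_image("pcb_edges.png")  # placeholder edge map fixing the layout
image = pipe("solder joint with bridging defect on a PCB pad",
             image=cond, num_inference_steps=30).images[0]
image.save("synthetic_defect.png")
```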

41 pages, 1802 KB  
Review
A Systematic Review of CNN Architectures, Databases, Performance Metrics, and Applications in Face Recognition
by Andisani Nemavhola, Colin Chibaya and Serestina Viriri
Information 2025, 16(2), 107; https://doi.org/10.3390/info16020107 - 5 Feb 2025
Abstract
This study provides a comparative evaluation of face recognition databases and Convolutional Neural Network (CNN) architectures used in training and testing face recognition systems. The databases span from early datasets like Olivetti Research Laboratory (ORL) and Facial Recognition Technology (FERET) to more recent collections such as MegaFace and MS-Celeb-1M, offering a range of sizes, subject diversity, and image quality. Older databases, such as ORL and FERET, are smaller and cleaner, while newer datasets enable large-scale training with millions of images but pose challenges like inconsistent data quality and high computational costs. The study also examines CNN architectures, including FaceNet and Visual Geometry Group 16 (VGG16), which show strong performance on large datasets like Labeled Faces in the Wild (LFW) and VGGFace, achieving accuracy rates above 98%. In contrast, earlier models like Support Vector Machine (SVM) and Gabor Wavelets perform well on smaller datasets but lack scalability for larger, more complex datasets. The analysis highlights the growing importance of multi-task learning and ensemble methods, as seen in Multi-Task Cascaded Convolutional Networks (MTCNNs). Overall, the findings emphasize the need for advanced algorithms capable of handling large-scale, real-world challenges while optimizing accuracy and computational efficiency in face recognition systems.
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)
