Search Results (47)

Search Parameters:
Keywords = multi-task cascaded convolutional networks

25 pages, 3263 KB  
Article
Combining MTCNN and Enhanced FaceNet with Adaptive Feature Fusion for Robust Face Recognition
by Sasan Karamizadeh, Saman Shojae Chaeikar and Hamidreza Salarian
Technologies 2025, 13(10), 450; https://doi.org/10.3390/technologies13100450 - 3 Oct 2025
Abstract
Face recognition systems routinely face real-world challenges such as facial pose, illumination, occlusion, and ageing that significantly impact recognition accuracy. In this paper, a robust face recognition system is presented that uses Multi-task Cascaded Convolutional Networks (MTCNN) for face detection and alignment together with an enhanced FaceNet for facial embedding extraction. The enhanced FaceNet uses attention mechanisms to produce more discriminative facial embeddings, especially in challenging scenarios. In addition, an Adaptive Feature Fusion module combines identity-specific embeddings with contextual information such as pose, lighting, and the presence of masks, thereby enhancing robustness and accuracy. Training is performed on the CelebA dataset, and testing is conducted independently on LFW and IJB-C to enable subject-disjoint evaluation. CelebA contains over 200,000 face images of 10,177 individuals, LFW consists of more than 13,000 face images of 5749 individuals in unconstrained conditions, and IJB-C comprises 31,000 face images and 117,000 video frames with extreme pose and occlusion variation. The proposed system achieves 99.6% accuracy on CelebA, 94.2% on LFW, and 91.5% on IJB-C, outperforming baselines such as plain MTCNN-FaceNet and AFF-Net as well as state-of-the-art models such as ArcFace, CosFace, and AdaCos. These findings demonstrate that the proposed framework generalizes effectively across datasets and is resilient in real-world scenarios.
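As a concrete reference point, a minimal detection-plus-embedding pipeline of the MTCNN-to-FaceNet kind described above can be sketched with the facenet-pytorch package; the paper's attention-enhanced FaceNet and Adaptive Feature Fusion module are not public, so a stock pretrained InceptionResnetV1 and a cosine-similarity check stand in here, and the image paths and threshold are placeholders.

```python
# Baseline MTCNN -> FaceNet verification sketch using facenet-pytorch; the
# paper's enhanced FaceNet and fusion module are NOT reproduced here.
import torch
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN(image_size=160, margin=20)                    # detection + alignment
encoder = InceptionResnetV1(pretrained='vggface2').eval()   # 512-d embeddings

def embed(path: str) -> torch.Tensor:
    """Detect, align, and embed the most confident face in an image."""
    face = mtcnn(Image.open(path).convert('RGB'))           # (3, 160, 160) or None
    if face is None:
        raise ValueError(f'no face found in {path}')
    with torch.no_grad():
        return encoder(face.unsqueeze(0)).squeeze(0)

# Cosine similarity between two identities; 'a.jpg'/'b.jpg' are placeholder
# paths and the 0.6 threshold is dataset-dependent, not the paper's value.
e1, e2 = embed('a.jpg'), embed('b.jpg')
same = torch.cosine_similarity(e1, e2, dim=0) > 0.6
print('same identity:', bool(same))
```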

19 pages, 7270 KB  
Article
A Fast Rotation Detection Network with Parallel Interleaved Convolutional Kernels
by Leilei Deng, Lifeng Sun and Hua Li
Symmetry 2025, 17(10), 1621; https://doi.org/10.3390/sym17101621 - 1 Oct 2025
Abstract
In recent years, convolutional neural network-based object detectors have achieved extensive applications in remote sensing (RS) image interpretation. While multi-scale feature modeling optimization remains a persistent research focus, existing methods frequently overlook the symmetrical balance between feature granularity and morphological diversity, particularly when handling high-aspect-ratio RS targets with anisotropic geometries. This oversight leads to suboptimal feature representations characterized by spatial sparsity and directional bias. To address this challenge, we propose the Parallel Interleaved Convolutional Kernel Network (PICK-Net), a rotation-aware detection framework that embodies symmetry principles through dual-path feature modulation and geometrically balanced operator design. The core innovation lies in the synergistic integration of cascaded dynamic sparse sampling and symmetrically decoupled feature modulation, enabling adaptive morphological modeling of RS targets. Specifically, the Parallel Interleaved Convolution (PIC) module establishes symmetric computation patterns through mirrored kernel arrangements, effectively reducing computational redundancy while preserving directional completeness through rotational symmetry-enhanced receptive field optimization. Complementing this, the Global Complementary Attention Mechanism (GCAM) introduces bidirectional symmetry in feature recalibration, decoupling channel-wise and spatial-wise adaptations through orthogonal attention pathways that maintain equilibrium in gradient propagation. Extensive experiments on the RSOD and NWPU-VHR-10 datasets demonstrate superior performance, achieving 92.2% and 84.90% mAP, respectively, outperforming state-of-the-art methods including EfficientNet and YOLOv8. With only 12.5 M parameters, the framework achieves symmetrical optimization of accuracy-efficiency trade-offs. Ablation studies confirm that the symmetric interaction between PIC and GCAM enhances detection performance by 2.75%, particularly excelling in scenarios requiring geometric symmetry preservation, such as dense target clusters and extreme scale variations. Cross-domain validation on agricultural pest datasets further verifies its rotational symmetry generalization capability, demonstrating 84.90% accuracy in fine-grained orientation-sensitive detection tasks.
(This article belongs to the Section Computer)
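The PIC module is not specified here in enough detail to reproduce; as a loose illustration of the underlying idea of mirrored, orthogonally oriented kernel paths, a toy PyTorch block might look as follows (the channel count, kernel length, and residual summation are all assumptions, not the authors' design):

```python
# Toy reading of a mirrored-kernel block: two paths apply 1xk and kx1
# convolutions in opposite order, keeping horizontal/vertical receptive-
# field growth symmetric. Illustrative only; the real PIC module differs.
import torch
import torch.nn as nn

class MirroredStripBlock(nn.Module):
    def __init__(self, ch: int, k: int = 7):
        super().__init__()
        p = k // 2
        self.h_then_v = nn.Sequential(
            nn.Conv2d(ch, ch, (1, k), padding=(0, p)),
            nn.Conv2d(ch, ch, (k, 1), padding=(p, 0)),
        )
        self.v_then_h = nn.Sequential(
            nn.Conv2d(ch, ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(ch, ch, (1, k), padding=(0, p)),
        )

    def forward(self, x):
        # Summing the mirrored paths keeps directional balance; the
        # residual add preserves the input signal.
        return x + self.h_then_v(x) + self.v_then_h(x)

x = torch.randn(1, 32, 64, 64)
print(MirroredStripBlock(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```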

22 pages, 1269 KB  
Article
LightFakeDetect: A Lightweight Model for Deepfake Detection in Videos That Focuses on Facial Regions
by Sarab AlMuhaideb, Hessa Alshaya, Layan Almutairi, Danah Alomran and Sarah Turki Alhamed
Mathematics 2025, 13(19), 3088; https://doi.org/10.3390/math13193088 - 25 Sep 2025
Abstract
In recent years, the proliferation of forged videos, known as deepfakes, has escalated significantly, primarily due to advancements in technologies such as Generative Adversarial Networks (GANs), diffusion models, and Vision Language Models (VLMs). These deepfakes present substantial risks, threatening political stability, facilitating celebrity impersonation, and enabling tampering with evidence. As the sophistication of deepfake technology increases, detecting these manipulated videos becomes increasingly challenging. Most of the existing deepfake detection methods use Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), or Vision Transformers (ViTs), achieving strong accuracy but exhibiting high computational demands. This highlights the need for a lightweight yet effective pipeline for real-time and resource-limited scenarios. This study introduces a lightweight deep learning model for deepfake detection in order to address this emerging threat. The model incorporates three integral components: MobileNet for feature extraction, a Convolutional Block Attention Module (CBAM) for feature enhancement, and a Gated Recurrent Unit (GRU) for temporal analysis. Additionally, a pre-trained Multi-Task Cascaded Convolutional Network (MTCNN) is utilized for face detection and cropping. The model is evaluated using the Deepfake Detection Challenge (DFDC) and Celeb-DF v2 datasets, demonstrating impressive performance, with 98.2% accuracy and a 99.0% F1-score on Celeb-DF v2 and 95.0% accuracy and a 97.2% F1-score on DFDC, achieving a commendable balance between simplicity and effectiveness.
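A minimal sketch of this MobileNet-to-GRU frame-sequence pattern, assuming face crops have already been extracted by MTCNN, could look like the following; the CBAM stage is omitted and all layer sizes are simplified, so this illustrates the pipeline shape rather than the authors' model:

```python
# Frame-sequence deepfake classifier sketch: per-frame MobileNetV2
# features, a GRU over time, and a sigmoid head for P(fake).
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class TinyDeepfakeNet(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.backbone = mobilenet_v2(weights='DEFAULT').features  # -> (N,1280,h,w)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gru = nn.GRU(1280, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, clips):                    # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.pool(self.backbone(clips.flatten(0, 1))).flatten(1)
        feats = feats.view(b, t, -1)             # (B, T, 1280)
        _, h = self.gru(feats)                   # final hidden state summarizes time
        return torch.sigmoid(self.head(h[-1]))   # (B, 1) probability of "fake"

clips = torch.randn(2, 8, 3, 224, 224)           # two 8-frame face-crop clips
print(TinyDeepfakeNet()(clips).shape)            # torch.Size([2, 1])
```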

36 pages, 8122 KB  
Article
Human Activity Recognition via Attention-Augmented TCN-BiGRU Fusion
by Ji-Long He, Jian-Hong Wang, Chih-Min Lo and Zhaodi Jiang
Sensors 2025, 25(18), 5765; https://doi.org/10.3390/s25185765 - 16 Sep 2025
Viewed by 449
Abstract
With the widespread application of wearable sensors in health monitoring and human–computer interaction, deep learning-based human activity recognition (HAR) research faces challenges such as the effective extraction of multi-scale temporal features and robustness against noise in multi-source data. This study proposes the TGA-HAR (TCN-GRU-Attention-HAR) model, which integrates temporal convolutional and recurrent networks by cascading Temporal Convolutional Network (TCN) and Bidirectional Gated Recurrent Unit (BiGRU) layers into a hierarchical feature abstraction architecture for complex activity recognition. TCN layers with dilated convolution kernels extract multi-scale temporal features, while BiGRU layers capture bidirectional temporal contextual correlations. To further optimize feature representation, TGA-HAR introduces residual connections to stabilize gradient propagation and employs an adaptive weighted attention mechanism to strengthen feature representation. Experimental results demonstrate that the model achieved test accuracies of 99.37% on the WISDM dataset, 95.36% on the USC-HAD dataset, and 96.96% on the PAMAP2 dataset, and we additionally conducted tests on datasets collected in real-world scenarios. This method provides a highly robust solution for complex human activity recognition tasks.
(This article belongs to the Section Biomedical Sensors)
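For orientation, the cascaded TCN-BiGRU-attention pattern described above can be sketched in a few lines of PyTorch; the dilation schedule, widths, and class count below are illustrative assumptions rather than the paper's configuration:

```python
# TCN -> BiGRU -> attention-pooling sketch over (batch, time, channels)
# inertial windows; sizes are toy values.
import torch
import torch.nn as nn

class TGASketch(nn.Module):
    def __init__(self, in_ch=6, n_classes=6, width=64):
        super().__init__()
        # Dilated 1-D convolutions widen the temporal receptive field
        # exponentially (dilations 1, 2, 4) while preserving length.
        self.tcn = nn.Sequential(*[
            nn.Sequential(nn.Conv1d(in_ch if d == 1 else width, width, 3,
                                    padding=d, dilation=d), nn.ReLU())
            for d in (1, 2, 4)])
        self.bigru = nn.GRU(width, width, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * width, 1)      # adaptive weights over time steps
        self.head = nn.Linear(2 * width, n_classes)

    def forward(self, x):                        # x: (B, T, C)
        h = self.tcn(x.transpose(1, 2)).transpose(1, 2)  # (B, T, width)
        h, _ = self.bigru(h)                     # (B, T, 2*width)
        w = torch.softmax(self.attn(h), dim=1)   # attention over time
        return self.head((w * h).sum(dim=1))     # weighted temporal pooling

x = torch.randn(4, 128, 6)                       # 128-sample windows, 6 IMU axes
print(TGASketch()(x).shape)                      # torch.Size([4, 6])
```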

22 pages, 1243 KB  
Article
ProCo-NET: Progressive Strip Convolution and Frequency- Optimized Framework for Scale-Gradient-Aware Semantic Segmentation in Off-Road Scenes
by Zihang Liu, Donglin Jing and Chenxiang Ji
Symmetry 2025, 17(9), 1428; https://doi.org/10.3390/sym17091428 - 2 Sep 2025
Viewed by 467
Abstract
In off-road scenes, segmentation targets exhibit significant scale progression due to perspective depth effects from oblique viewing angles, meaning that the size of the same target undergoes continuous, boundary-less progressive changes along a specific direction. This asymmetric variation disrupts the geometric symmetry of targets, causing traditional segmentation networks to face three key challenges: (1) inefficient capture of continuous-scale features, where pyramid structures and multi-scale kernels struggle to balance computational efficiency with sufficient coverage of progressive scales; (2) degraded intra-class feature consistency, where local scale differences within targets induce semantic ambiguity; and (3) loss of high-frequency boundary information, where feature sampling operations exacerbate the blurring of progressive boundaries. To address these issues, this paper proposes the ProCo-NET framework for systematic optimization. First, a Progressive Strip Convolution Group (PSCG) is designed to construct multi-level receptive field expansion through orthogonally oriented strip convolution cascading (employing symmetric processing in the horizontal/vertical directions) integrated with self-attention mechanisms, enhancing the capability to perceive asymmetric continuous-scale variations. Second, an Offset-Frequency Cooperative Module (OFCM) is developed wherein a learnable offset generator dynamically adjusts sampling point distributions to enhance intra-class consistency, while a dual-channel frequency domain filter performs adaptive high-pass filtering to sharpen target boundaries. These components synergistically address feature consistency degradation and boundary ambiguity under asymmetric changes. Experiments show that this framework significantly improves the segmentation accuracy and boundary clarity of multi-scale targets in off-road scene segmentation tasks: it achieves 71.22% mIoU on the standard RUGD dataset (0.84% higher than the existing optimal method) and 83.05% mIoU on the Freiburg_Forest dataset, and the segmentation accuracy of key obstacle categories improves to 52.04% (2.7% higher than the sub-optimal model). This framework effectively compensates for the impact of asymmetric deformation through a symmetric computing mechanism.
(This article belongs to the Section Computer)
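The OFCM's high-pass branch can be illustrated with a simple Fourier-domain operation: suppress a low-frequency core and add the remaining high-frequency residue back to sharpen boundaries. The learnable offset generator and dual-channel filter are not reproduced, and the radius and gain below are arbitrary:

```python
# Frequency-domain boundary sharpening sketch: attenuate low frequencies
# of a feature map, then add the high-pass residue back.
import torch

def high_pass_sharpen(feat: torch.Tensor, radius: int = 4, gain: float = 0.5):
    """feat: (B, C, H, W). Zeroes a (2r x 2r) low-frequency core."""
    f = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    b, c, h, w = feat.shape
    mask = torch.ones(h, w, device=feat.device)
    mask[h//2-radius:h//2+radius, w//2-radius:w//2+radius] = 0.0  # kill lows
    high = torch.fft.ifft2(torch.fft.ifftshift(f * mask, dim=(-2, -1))).real
    return feat + gain * high   # original plus amplified high frequencies

x = torch.randn(1, 8, 32, 32)
print(high_pass_sharpen(x).shape)  # torch.Size([1, 8, 32, 32])
```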

29 pages, 15488 KB  
Article
GOFENet: A Hybrid Transformer–CNN Network Integrating GEOBIA-Based Object Priors for Semantic Segmentation of Remote Sensing Images
by Tao He, Jianyu Chen and Delu Pan
Remote Sens. 2025, 17(15), 2652; https://doi.org/10.3390/rs17152652 - 31 Jul 2025
Viewed by 798
Abstract
Geographic object-based image analysis (GEOBIA) has demonstrated substantial utility in remote sensing tasks. However, its integration with deep learning remains largely confined to image-level classification, primarily because the irregular shapes and fragmented boundaries of segmented objects limit its applicability in semantic segmentation. While convolutional neural networks (CNNs) excel at local feature extraction, they inherently struggle to capture long-range dependencies. In contrast, Transformer-based models are well suited for global context modeling but often lack fine-grained local detail. To overcome these limitations, we propose GOFENet (Geo-Object Feature Enhanced Network), a hybrid semantic segmentation architecture that effectively fuses object-level priors into deep feature representations. GOFENet employs a dual-encoder design combining CNN and Swin Transformer architectures, enabling multi-scale feature fusion through skip connections to preserve both local and global semantics. An auxiliary branch incorporating cascaded atrous convolutions is introduced to inject information about segmented objects into the learning process. Furthermore, we develop a cross-channel selection module (CSM) for refined channel-wise attention, a feature enhancement module (FEM) to merge global and local representations, and a shallow–deep feature fusion module (SDFM) to integrate pixel- and object-level cues across scales. Experimental results on the GID and LoveDA datasets demonstrate that GOFENet achieves superior segmentation performance, with 66.02% mIoU and 51.92% mIoU, respectively. The model exhibits strong capability in delineating large-scale land cover features, producing sharper object boundaries and reducing classification noise, while preserving the integrity and discriminability of land cover categories.
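A toy version of the dual-encoder idea, with a small CNN branch for local detail and a TransformerEncoder over patch tokens for global context fused by concatenation, is sketched below; the real GOFENet uses a Swin backbone and the CSM/FEM/SDFM modules, none of which are reproduced here:

```python
# Dual-encoder fusion sketch: CNN branch for local features, transformer
# branch over patch tokens for global context, concatenated channel-wise.
import torch
import torch.nn as nn

class DualEncoderFusion(nn.Module):
    def __init__(self, ch=64, patch=8):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.to_tokens = nn.Conv2d(3, ch, patch, stride=patch)   # patch embedding
        layer = nn.TransformerEncoderLayer(d_model=ch, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.patch = patch

    def forward(self, x):                        # x: (B, 3, H, W)
        local = self.cnn(x)                      # (B, ch, H, W)
        tok = self.to_tokens(x)                  # (B, ch, H/p, W/p)
        b, c, h, w = tok.shape
        glob = self.transformer(tok.flatten(2).transpose(1, 2))
        glob = glob.transpose(1, 2).view(b, c, h, w)
        glob = nn.functional.interpolate(glob, scale_factor=self.patch,
                                         mode='bilinear', align_corners=False)
        return torch.cat([local, glob], dim=1)   # (B, 2*ch, H, W)

print(DualEncoderFusion()(torch.randn(1, 3, 64, 64)).shape)
```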

13 pages, 13928 KB  
Article
Voter Authentication Using Enhanced ResNet50 for Facial Recognition
by Aminou Halidou, Daniel Georges Olle Olle, Arnaud Nguembang Fadja, Daramy Vandi Von Kallon and Tchana Ngninkeu Gil Thibault
Signals 2025, 6(2), 25; https://doi.org/10.3390/signals6020025 - 23 May 2025
Viewed by 1149
Abstract
Electoral fraud, particularly multiple voting, undermines the integrity of democratic processes. To address this challenge, this study introduces an innovative facial recognition system that integrates an enhanced 50-layer Residual Network (ResNet50) architecture with Additive Angular Margin Loss (ArcFace) and Multi-Task Cascaded Convolutional Neural Networks (MTCNN) for face detection. Using the Mahalanobis distance, the system verifies voter identities by comparing captured facial images with previously recorded biometric features. Extensive evaluations demonstrate the methodology’s effectiveness, achieving a facial recognition accuracy of 99.85%. This significant improvement over existing baseline methods has the potential to enhance electoral transparency and prevent multiple voting. The findings contribute to developing robust biometric-based electoral systems, thereby promoting democratic trust and accountability.
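The Mahalanobis-distance verification step is straightforward to illustrate: estimate a covariance from enrolled embeddings and accept a probe whose distance to the enrolled mean falls under a threshold. The embeddings, dimensionality, and threshold below are synthetic placeholders, with the ResNet50/ArcFace extractor abstracted away:

```python
# Mahalanobis-distance identity check on stand-in embeddings.
import numpy as np

def mahalanobis(x, mean, cov_inv):
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(0)
enrolled = rng.normal(size=(200, 16))       # stand-in enrolled embeddings
mean = enrolled.mean(axis=0)
cov = np.cov(enrolled, rowvar=False) + 1e-6 * np.eye(16)  # small ridge for stability
cov_inv = np.linalg.inv(cov)

probe = rng.normal(size=16)                 # stand-in probe embedding
THRESHOLD = 5.7                             # placeholder; tune on validation data
dist = mahalanobis(probe, mean, cov_inv)
print(f'distance={dist:.2f} ->', 'accept' if dist < THRESHOLD else 'reject')
```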

13 pages, 6337 KB  
Article
Printed Circuit Board Sample Expansion and Automatic Defect Detection Based on Diffusion Models and ConvNeXt
by Youzhi Xu, Hao Wu, Yulong Liu and Xiaoming Liu
Micromachines 2025, 16(3), 261; https://doi.org/10.3390/mi16030261 - 26 Feb 2025
Cited by 2 | Viewed by 1170
Abstract
Soldering of printed circuit board (PCB)-based surface-mounted assemblies is a critical process, and to enhance the accuracy of detecting their multi-targeted soldering defects, we propose an automated sample generation method that combines ControlNet and a Stable Diffusion Model. This method can expand the dataset by quickly obtaining high-quality sample images containing both defects and normal detection targets. Meanwhile, we propose a Cascade Mask R-CNN model with ConvNeXt as the backbone, which performs well in multi-target defect detection tasks. Unlike previous detection methods that can only detect a single component, it can detect all components in the region. The experimental results demonstrate that the detection accuracy of our proposed approach is significantly enhanced over the previous convolutional neural network model, with an increase of more than 10.5% in mean average precision (mAP) and 9.5% in average recall (AR).
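The conditioning mechanism behind such sample expansion can be shown with the diffusers library; note that the paper trains its own ControlNet on PCB imagery, whereas the public canny checkpoint, model IDs, file names, and prompt below are only illustrative:

```python
# Generic ControlNet + Stable Diffusion generation sketch via diffusers.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Edge map of a real board pins the component layout ('pcb_board.jpg' is
# a hypothetical input file).
gray = cv2.cvtColor(cv2.imread('pcb_board.jpg'), cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)
cond = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    'lllyasviel/sd-controlnet-canny', torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    'runwayml/stable-diffusion-v1-5', controlnet=controlnet,
    torch_dtype=torch.float16).to('cuda')

# The prompt varies solder appearance while the edge map fixes geometry.
sample = pipe('macro photo of a printed circuit board with a bridged solder joint',
              image=cond, num_inference_steps=30).images[0]
sample.save('synthetic_defect.png')
```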

41 pages, 1802 KB  
Review
A Systematic Review of CNN Architectures, Databases, Performance Metrics, and Applications in Face Recognition
by Andisani Nemavhola, Colin Chibaya and Serestina Viriri
Information 2025, 16(2), 107; https://doi.org/10.3390/info16020107 - 5 Feb 2025
Cited by 5 | Viewed by 6273
Abstract
This study provides a comparative evaluation of face recognition databases and Convolutional Neural Network (CNN) architectures used in training and testing face recognition systems. The databases span from early datasets like Olivetti Research Laboratory (ORL) and Facial Recognition Technology (FERET) to more recent collections such as MegaFace and Ms-Celeb-1M, offering a range of sizes, subject diversity, and image quality. Older databases, such as ORL and FERET, are smaller and cleaner, while newer datasets enable large-scale training with millions of images but pose challenges like inconsistent data quality and high computational costs. The study also examines CNN architectures, including FaceNet and Visual Geometry Group 16 (VGG16), which show strong performance on large datasets like Labeled Faces in the Wild (LFW) and VGGFace, achieving accuracy rates above 98%. In contrast, earlier models like Support Vector Machine (SVM) and Gabor Wavelets perform well on smaller datasets but lack scalability for larger, more complex datasets. The analysis highlights the growing importance of multi-task learning and ensemble methods, as seen in Multi-Task Cascaded Convolutional Networks (MTCNNs). Overall, the findings emphasize the need for advanced algorithms capable of handling large-scale, real-world challenges while optimizing accuracy and computational efficiency in face recognition systems.
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)

19 pages, 14722 KB  
Article
Log Volume Measurement and Counting Based on Improved Cascade Mask R-CNN and Deep SORT
by Chunjiang Yu, Yongke Sun, Yong Cao, Lei Liu and Xiaotao Zhou
Forests 2024, 15(11), 1884; https://doi.org/10.3390/f15111884 - 26 Oct 2024
Cited by 1 | Viewed by 1542
Abstract
Logs require multiple verifications to ensure accurate volume and quantity measurements. Log end detection is a crucial step in measuring log volume and counting logs. Currently, this task primarily relies on the Mask R-CNN instance segmentation model. However, the Feature Pyramid Network (FPN) in Mask R-CNN may compromise accuracy due to feature redundancy during multi-scale fusion, particularly with small objects. Moreover, counting logs in a single image is challenging due to their large size and stacking. To address the above issues, we propose an improved log segmentation model based on Cascade Mask R-CNN. This method uses ResNet for multi-scale feature extraction and integrates a hierarchical Convolutional Block Attention Module (CBAM) to refine feature weights and enhance object emphasis. Then, a Region Proposal Network (RPN) is employed to generate log segmentation proposals. Finally, combined with Deep SORT, the model tracks log ends in video streams and counts the number of logs in the stack. Experiments demonstrate the effectiveness of our method, achieving an average precision (AP) of 82.3, an AP_S of 75.3 for small objects, an AP_M of 70.9 for medium objects, and an AP_L of 86.2 for large objects. These results represent improvements of 1.8%, 3.7%, 2.6%, and 1.4% over Mask R-CNN, respectively. The detection rate reached 98.6%, with a counting accuracy of 95%. Compared to manually measured volumes, our method shows a low error rate of 4.07%.
(This article belongs to the Section Wood Science and Forest Products)
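The counting step reduces to tallying distinct tracker IDs across frames, since Deep SORT keeps an ID stable through short occlusions. A backend-agnostic sketch, with detection and tracking abstracted behind a per-frame list of (track_id, bbox) pairs, might be:

```python
# Count logs from per-frame tracker output: each confirmed track ID is
# one log end, so the stack count is the number of distinct IDs seen.
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]

def count_logs(tracked_frames: Iterable[List[Tuple[int, Box]]]) -> int:
    """tracked_frames yields, per video frame, (track_id, bbox) pairs."""
    seen_ids = set()
    for detections in tracked_frames:
        for track_id, _bbox in detections:
            seen_ids.add(track_id)       # IDs persist across frames
    return len(seen_ids)

# Three frames; log 2 is briefly occluded but keeps its ID, so it is
# counted once.
frames = [
    [(1, (10, 10, 50, 50)), (2, (60, 12, 100, 55))],
    [(1, (11, 10, 51, 50))],
    [(1, (12, 11, 52, 51)), (2, (61, 13, 101, 56)), (3, (110, 9, 150, 48))],
]
print(count_logs(frames))  # 3
```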

18 pages, 4515 KB  
Article
Historical Blurry Video-Based Face Recognition
by Lujun Zhai, Suxia Cui, Yonghui Wang, Song Wang, Jun Zhou and Greg Wilsbacher
J. Imaging 2024, 10(9), 236; https://doi.org/10.3390/jimaging10090236 - 20 Sep 2024
Cited by 1 | Viewed by 2169
Abstract
Face recognition is a widely used computer vision technology that plays an increasingly important role in user authentication systems, security systems, and consumer electronics. The models for most current applications are based on high-definition digital cameras. In this paper, we focus on digital images derived from historical motion picture films. Historical motion picture films often have poorer resolution than modern digital imagery, making face detection a more challenging task. To approach this problem, we first propose a trunk–branch concatenated multi-task cascaded convolutional neural network (TB-MTCNN), which efficiently extracts facial features from blurry historical films by combining the trunk with branch networks and employing various sizes of kernels to enrich the multi-scale receptive field. Next, we build a deep neural network-integrated object-tracking algorithm to compensate for failed recognition over one or more video frames. The framework combines simple online and real-time tracking with deep data association (Deep SORT), and TB-MTCNN with the residual neural network (ResNet) model. Finally, a state-of-the-art image restoration method is employed to reduce the effect of noise and blurriness. The experimental results show that our proposed joint face recognition and tracking network can significantly reduce missed recognition in historical motion picture film frames.
(This article belongs to the Section Computer Vision and Pattern Recognition)

12 pages, 1968 KB  
Article
A Deep Learning Approach for Early Detection of Facial Palsy in Video Using Convolutional Neural Networks: A Computational Study
by Anuja Arora, Jasir Mohammad Zaeem, Vibhor Garg, Ambikesh Jayal and Zahid Akhtar
Computers 2024, 13(8), 200; https://doi.org/10.3390/computers13080200 - 15 Aug 2024
Cited by 4 | Viewed by 2807
Abstract
Facial palsy causes the face to droop due to sudden weakness in the muscles on one side of the face. Computer-aided assistance systems for the automatic recognition of palsy faces present a promising solution for recognizing facial paralysis at an early stage. A few research studies have already addressed this issue using automatic deep feature extraction with deep learning as well as handcrafted machine learning approaches. This empirical research work designed a multi-model facial palsy framework combining two convolutional models: a multi-task cascaded convolutional network (MTCNN) for face and landmark detection and a convolutional neural network model with tuned hyperparameters and parametric settings for facial palsy classification. Using the proposed multi-model facial palsy framework, we present results on a dataset of YouTube videos featuring patients with palsy. The results indicate that the proposed framework can detect facial palsy efficiently. Furthermore, the achieved accuracy, precision, recall, and F1-score values of the proposed framework for facial palsy detection are 97%, 94%, 90%, and 97%, respectively, on the training dataset. For the validation dataset, the accuracy achieved is 95%, precision is 90%, recall is 75.6%, and F1-score is 76%. As a result, this framework can easily be used for facial palsy detection.

14 pages, 4131 KB  
Article
Concurrent Learning Approach for Estimation of Pelvic Tilt from Anterior–Posterior Radiograph
by Ata Jodeiri, Hadi Seyedarabi, Sebelan Danishvar, Seyyed Hossein Shafiei, Jafar Ganjpour Sales, Moein Khoori, Shakiba Rahimi and Seyed Mohammad Javad Mortazavi
Bioengineering 2024, 11(2), 194; https://doi.org/10.3390/bioengineering11020194 - 17 Feb 2024
Viewed by 3390
Abstract
Accurate and reliable estimation of the pelvic tilt is one of the essential pre-planning factors for total hip arthroplasty to prevent common post-operative complications such as implant impingement and dislocation. Inspired by the latest advances in deep learning-based systems, our focus in this paper has been to present an innovative and accurate method for estimating the functional pelvic tilt (PT) from a standing anterior–posterior (AP) radiography image. We introduce an encoder–decoder-style network based on a concurrent learning approach called VGG-UNET (VGG embedded in U-NET), where a deep fully convolutional network known as VGG is embedded at the encoder part of an image segmentation network, i.e., U-NET. In the bottleneck of the VGG-UNET, in addition to the decoder path, we use another path utilizing lightweight convolutional and fully connected layers to combine all extracted feature maps from the final convolution layer of VGG and thus regress PT. In the test phase, we exclude the decoder path and consider only a single target task, i.e., PT estimation. The absolute errors obtained using VGG-UNET, VGG, and Mask R-CNN are 3.04 ± 2.49, 3.92 ± 2.92, and 4.97 ± 3.87, respectively. It is observed that the VGG-UNET leads to a more accurate prediction with a lower standard deviation (STD). Our experimental results demonstrate that the proposed multi-task network leads to a significantly improved performance compared to the best-reported results based on cascaded networks.
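The concurrent-learning layout, one shared encoder feeding both a segmentation decoder used during training and a regression path that predicts PT, can be sketched as below; the depths, widths, and input size are toy values rather than the paper's VGG-based configuration:

```python
# Concurrent-learning sketch: shared encoder, auxiliary mask decoder
# (training only), and a bottleneck regression head for the tilt angle.
import torch
import torch.nn as nn

class ConcurrentSegRegress(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(                 # auxiliary mask head
            nn.ConvTranspose2d(2 * ch, ch, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(ch, 1, 2, stride=2))
        self.regress = nn.Sequential(                 # tilt-angle head
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * ch, ch), nn.ReLU(), nn.Linear(ch, 1))

    def forward(self, x, with_mask: bool = True):
        z = self.encoder(x)
        tilt = self.regress(z)
        if with_mask:                                 # training: both tasks
            return tilt, self.decoder(z)
        return tilt                                   # test: regression only

x = torch.randn(2, 1, 128, 128)                       # AP radiograph crops
tilt, mask = ConcurrentSegRegress()(x)
print(tilt.shape, mask.shape)  # torch.Size([2, 1]) torch.Size([2, 1, 128, 128])
```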

21 pages, 2285 KB  
Article
MDAU-Net: A Liver and Liver Tumor Segmentation Method Combining an Attention Mechanism and Multi-Scale Features
by Jinlin Ma, Mingge Xia, Ziping Ma and Zhiqing Jiu
Appl. Sci. 2023, 13(18), 10443; https://doi.org/10.3390/app131810443 - 18 Sep 2023
Cited by 4 | Viewed by 2006
Abstract
In recent years, U-Net and its extended variants have made remarkable progress in the realm of liver and liver tumor segmentation. However, the limitations of single-path convolutional operations have hindered the full exploitation of valuable features and restricted their propagation within networks. Moreover, the semantic gap between shallow and deep features means that a simplistic shortcut is not enough. To address these issues and realize automatic liver and tumor segmentation in CT images, we introduce MDAU-Net, a segmentation method combining multi-scale feature fusion, dense connections, and an attention mechanism. This network leverages the multi-head attention (MHA) mechanism and multi-scale feature fusion. First, we introduced a double-flow linear pooling enhancement unit to optimize the fusion of deep and shallow features while mitigating the semantic gap between them. Subsequently, we proposed a cascaded adaptive feature extraction unit, combining attention mechanisms with a series of dense connections to capture valuable information and encourage feature reuse. Additionally, we designed a cross-level information interaction mechanism utilizing bidirectional residual connections to address the issue of forgetting a priori knowledge during training. Finally, we assessed MDAU-Net’s performance on the LiTS and SLiver07 datasets. The experimental results demonstrated that MDAU-Net is well-suited for liver and tumor segmentation tasks, outperforming existing widely used methods in terms of robustness and accuracy.
(This article belongs to the Special Issue Advances in Biomedical Image Processing and Analysis)

23 pages, 24808 KB  
Article
Classification of Hyperspectral and LiDAR Data Using Multi-Modal Transformer Cascaded Fusion Net
by Shuo Wang, Chengchao Hou, Yiming Chen, Zhengjun Liu, Zhenbei Zhang and Geng Zhang
Remote Sens. 2023, 15(17), 4142; https://doi.org/10.3390/rs15174142 - 24 Aug 2023
Cited by 8 | Viewed by 3190
Abstract
With the continuous development of surface observation methods and technologies, we can acquire multiple sources of data more effectively in the same geographic area, and the quality and availability of these data have also significantly improved. Consequently, how to better utilize multi-source data to represent ground information has become an important research question in the field of geoscience. In this paper, a novel model called multi-modal transformer cascaded fusion net (MMTCFN) is proposed for the fusion and classification of multi-modal remote sensing data, namely Hyperspectral Imagery (HSI) and LiDAR data. The model consists of two stages, feature extraction and feature fusion. First, in the feature extraction stage, a three-branch cascaded Convolutional Neural Network (CNN) framework is employed to fully leverage the advantages of convolutional operators in extracting shallow-level local features. Based on this, we generated multi-modal long-range integrated deep features utilizing the transformer-based vectorized pixel group transformer (VPGT) module during the feature fusion stage. In the VPGT block, we designed a vectorized pixel group embedding that preserves the global features extracted from the three branches in a non-overlapping multi-space manner. Moreover, we introduce the DropKey mechanism into the multi-head self-attention (MHSA) to alleviate overfitting caused by insufficient training samples. Finally, we employ a probabilistic decision fusion strategy to integrate multiple class estimations, assigning a specific category to each pixel. The model was evaluated on three HSI-LiDAR datasets with balanced and unbalanced training samples, and it outperforms seven SOTA approaches in terms of OA, demonstrating the superiority of MMTCFN for the HSI-LiDAR classification task.
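Of the components above, DropKey is compact enough to sketch directly: rather than dropping attention weights after softmax, random key logits are masked to negative infinity before softmax, regularizing which positions can be attended to. A single-head toy version (the paper applies it inside multi-head self-attention) might be:

```python
# Self-attention with DropKey: mask random key logits to -inf before
# softmax during training, instead of dropping weights after softmax.
import torch
import torch.nn.functional as F

def attention_dropkey(q, k, v, p=0.1, training=True):
    """q, k, v: (B, T, D). Returns (B, T, D)."""
    logits = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # (B, T, T)
    if training and p > 0:
        drop = torch.rand_like(logits) < p        # Bernoulli mask over keys
        logits = logits.masked_fill(drop, float('-inf'))
    return F.softmax(logits, dim=-1) @ v

x = torch.randn(2, 16, 64)
print(attention_dropkey(x, x, x).shape)           # torch.Size([2, 16, 64])
```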
