Search Results (7)

Search Parameters:
Keywords = Selective Kernel Networks (SKNet)

19 pages, 6822 KB  
Article
Intelligent Fault Diagnosis Based on Dual-Graph Transformation and P2D-Sk-ResNet-XGBoost
by Zhining Jia, Hongtao Yu, Lei Qiao, Guanqun Wang, You Cui, Zhimin Xu, Yang Yang and Fengjun Zhang
Processes 2025, 13(10), 3342; https://doi.org/10.3390/pr13103342 - 18 Oct 2025
Viewed by 259
Abstract
To address the limitations of one-dimensional vibration signals in convolutional neural networks and the insufficient feature extraction capability of traditional single data processing methods under complex operating conditions, this paper proposes a novel fault diagnosis method that integrates dual-graph transformation and an improved residual network. Firstly, the one-dimensional vibration signals are converted into time–frequency representations using the short-time Fourier transform (STFT) and the synchrosqueezed wavelet transform (SWT). Subsequently, these dual-domain representations are fed in parallel into a customized parallel two-dimensional residual network (P2D-Sk-ResNet), which incorporates the selective kernel network (SKNet) mechanism into a ResNet architecture. This design enables adaptive multi-scale feature extraction. Finally, the features from the fully connected layer are classified using the extreme gradient boosting (XGBoost) algorithm to complete the fault diagnosis task. Comparative experiments demonstrate that the proposed STFT-SWT-P2D-Sk-ResNet-XGBoost achieves a diagnostic accuracy of 98.51% under constant load conditions, significantly outperforming several baseline models. Furthermore, the model exhibits superior generalization capability under varying load conditions and strong robustness in noisy environments. The proposed method provides a valuable and practical reference for intelligent fault diagnosis in industrial applications.
(This article belongs to the Section Process Control and Monitoring)
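The first step of the abstract above, turning a one-dimensional vibration signal into a time–frequency representation with the STFT, can be illustrated with a minimal standard-library sketch. The function name `stft_magnitude`, the frame length, the hop size, and the synthetic test tone are all assumptions made for illustration; they are not taken from the paper, which does not state its STFT settings in the abstract.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame."""
    # Hann window: tapers each frame to limit spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame (slow but dependency-free).
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Hypothetical "vibration" signal: a tone with 8 cycles per 64-sample frame.
signal = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = stft_magnitude(signal)
# Index of the strongest frequency bin in each frame.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
```

The resulting list of spectra is the kind of two-dimensional array that could be rendered as an image and passed to a 2D network; a production pipeline would more likely use an FFT-based library routine than this direct DFT.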

19 pages, 9284 KB  
Article
UAV-YOLO12: A Multi-Scale Road Segmentation Model for UAV Remote Sensing Imagery
by Bingyan Cui, Zhen Liu and Qifeng Yang
Drones 2025, 9(8), 533; https://doi.org/10.3390/drones9080533 - 29 Jul 2025
Viewed by 1801
Abstract
Unmanned aerial vehicles (UAVs) are increasingly used for road infrastructure inspection and monitoring. However, challenges such as scale variation, complex background interference, and the scarcity of annotated UAV datasets limit the performance of traditional segmentation models. To address these challenges, this study proposes UAV-YOLOv12, a multi-scale segmentation model specifically designed for UAV-based road imagery analysis. The proposed model builds on the YOLOv12 architecture by adding two key modules. It uses a Selective Kernel Network (SKNet) to adjust receptive fields dynamically and a Partial Convolution (PConv) module to improve spatial focus and robustness in occluded regions. These enhancements help the model better detect small and irregular road features in complex aerial scenes. Experimental results on a custom UAV dataset collected from national highways in Wuxi, China, show that UAV-YOLOv12 achieves F1-scores of 0.902 for highways (road-H) and 0.825 for paths (road-P), outperforming the original YOLOv12 by 5% and 3.2%, respectively. Inference speed is maintained at 11.1 ms per image, supporting near real-time performance. Moreover, comparative evaluations with U-Net show that UAV-YOLOv12 improves by 7.1% and 9.5%. The model also exhibits strong generalization ability, achieving F1-scores above 0.87 on public datasets such as VHR-10 and the Drone Vehicle dataset. These results demonstrate that the proposed UAV-YOLOv12 can achieve high accuracy and robustness in diverse road environments and object scales.

16 pages, 3143 KB  
Article
DGA Domain Detection Based on Transformer and Rapid Selective Kernel Network
by Jisheng Tang, Yiling Guan, Shenghui Zhao, Huibin Wang and Yinong Chen
Electronics 2024, 13(24), 4982; https://doi.org/10.3390/electronics13244982 - 18 Dec 2024
Viewed by 1495
Abstract
Botnets pose a significant challenge in network security by leveraging Domain Generation Algorithms (DGA) to evade traditional security measures. Extracting DGA domain samples is inherently complex, and the current DGA detection models often struggle to capture domain features effectively when facing limited training data. This limitation results in suboptimal detection performance and an imbalance between model accuracy and complexity. To address these challenges, this paper introduces a novel multi-scale feature fusion model that integrates the Transformer architecture with the Rapid Selective Kernel Network (R-SKNet). The proposed model employs the Transformer’s encoder to couple the single-domain character elements with the multiple types of relationships within the global domain block. This paper proposes integrating R-SKNet into DGA detection and developing an efficient channel attention (ECA) module. By enhancing the branch information guidance in the SKNet architecture, the approach achieves adaptive receptive field selection, multi-scale feature capture, and lightweight yet efficient multi-scale convolution. Moreover, an improved Feature Pyramid Network (FPN) architecture, termed EFAM, is used to adjust channel weights for outputs at different stages of the backbone network, thereby achieving multi-scale feature fusion. Experimental results demonstrate that, in tasks with limited training samples, the proposed method achieves lower computational complexity and higher detection accuracy than mainstream detection models.

18 pages, 4533 KB  
Article
A Bearing Fault Diagnosis Method in Scenarios of Imbalanced Samples and Insufficient Labeled Samples
by Xiaohan Cheng, Yuxin Lu, Zhihao Liang, Lei Zhao, Yuandong Gong and Meng Wang
Appl. Sci. 2024, 14(19), 8582; https://doi.org/10.3390/app14198582 - 24 Sep 2024
Viewed by 1580
Abstract
In practical working environments, rolling bearings are one of the components that are prone to failure. Their vibration signal samples are faced with challenges, mainly including the imbalance between normal and fault samples as well as an insufficient number of labeled samples. This study proposes a sample-expansion method based on generative adversarial networks (GANs) and a fault diagnosis method based on a transformer to solve the above issues. First, selective kernel networks (SKNets) and a genetic algorithm (GA) were introduced to construct a conditional variational autoencoder–evolutionary generative adversarial network with a selective kernel (CVAE-SKEGAN) to achieve a balance between the proportion of normal and faulty samples. Then, a semi-supervised learning–variational convolutional Swin transformer (SSL-VCST) network was built for the fault classification, specifically introducing variational attention and semi-supervised mechanisms to reduce the overfitting risk of the model and solve the problem of a shortage of labeled samples. Three typical operating conditions were designed for the multi-case applicability verification. The results show that the method proposed in this study had good application effects when solving both sample imbalances and labeled-sample deficiencies and improved the accuracy of fault diagnosis in the above scenarios.

19 pages, 2503 KB  
Article
ResSKNet-SSDP: Effective and Light End-To-End Architecture for Speaker Recognition
by Fei Deng, Lihong Deng, Peifan Jiang, Gexiang Zhang and Qiang Yang
Sensors 2023, 23(3), 1203; https://doi.org/10.3390/s23031203 - 20 Jan 2023
Cited by 12 | Viewed by 2943
Abstract
In speaker recognition tasks, convolutional neural network (CNN)-based approaches have shown significant success. Modeling the long-term contexts and efficiently aggregating the information are two challenges in speaker recognition, and they have a critical impact on system performance. Previous research has addressed these issues by introducing deeper, wider, and more complex network architectures and aggregation methods. However, it is difficult to significantly improve the performance with these approaches because they also have trouble fully utilizing global information, channel information, and time-frequency information. To address the above issues, we propose a lighter and more efficient CNN-based end-to-end speaker recognition architecture, ResSKNet-SSDP. ResSKNet-SSDP consists of a residual selective kernel network (ResSKNet) and self-attentive standard deviation pooling (SSDP). ResSKNet can capture long-term contexts, neighboring information, and global information, thus extracting more informative frame-level features. SSDP can capture short- and long-term changes in frame-level features, aggregating the variable-length frame-level features into fixed-length, more distinctive utterance-level features. Extensive comparison experiments were performed on two popular public speaker recognition datasets, Voxceleb and CN-Celeb, with current state-of-the-art speaker recognition systems and achieved the lowest EER/DCF of 2.33%/0.2298, 2.44%/0.2559, 4.10%/0.3502, and 12.28%/0.5051. Compared with the lightest x-vector, our designed ResSKNet-SSDP has 3.1 M fewer parameters and 31.6 ms less inference time, but 35.1% better performance. The results show that ResSKNet-SSDP significantly outperforms the current state-of-the-art speaker recognition architectures on all test sets and is an end-to-end architecture with fewer parameters and higher efficiency for applications in realistic situations. The ablation experiments further show that our proposed approaches also provide significant improvements over previous methods.
(This article belongs to the Special Issue Acoustic Sensors and Their Applications)
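The abstract above describes pooling variable-length frame-level features into a fixed-length utterance-level vector using attention and a standard deviation. The abstract does not give the exact SSDP formulation, so the following is only a generic sketch of attention-weighted standard deviation pooling; the function name `attentive_std_pooling` and the scoring vector `score_vec` (standing in for learned attention parameters) are hypothetical.

```python
import math

def attentive_std_pooling(frames, score_vec):
    """frames: T x D frame-level features; returns a length-D vector."""
    # One attention score per frame: dot product with a hypothetical
    # learned vector (a real model would use a small network here).
    scores = [sum(x * v for x, v in zip(f, score_vec)) for f in frames]
    # Numerically stable softmax over the time axis.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    dim = len(frames[0])
    # Attention-weighted mean and second moment per feature dimension.
    mean = [sum(a * f[d] for a, f in zip(alpha, frames)) for d in range(dim)]
    second = [sum(a * f[d] ** 2 for a, f in zip(alpha, frames))
              for d in range(dim)]
    # Weighted standard deviation; clamp tiny negatives from rounding.
    return [math.sqrt(max(second[d] - mean[d] ** 2, 0.0)) for d in range(dim)]

# Two utterances of different length map to vectors of the same size.
short_utt = [[1.0, 0.0], [3.0, 0.0]]
long_utt = [[0.5, 1.0], [1.5, 2.0], [2.5, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled_short = attentive_std_pooling(short_utt, [0.0, 0.0])
pooled_long = attentive_std_pooling(long_utt, [0.1, -0.2])
```

With an all-zero scoring vector the weights are uniform, so the result reduces to the ordinary per-dimension standard deviation; non-zero scores let some frames contribute more than others.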

20 pages, 4617 KB  
Article
Mathematical Formula Image Screening Based on Feature Correlation Enhancement
by Hongyuan Liu, Fang Yang, Xue Wang and Jianhui Si
Electronics 2022, 11(5), 799; https://doi.org/10.3390/electronics11050799 - 3 Mar 2022
Cited by 2 | Viewed by 3925
Abstract
Scientific and technical documents and web pages contain mathematical formula images alongside other images, and mathematical formula images either contain only mathematical formulas or intersperse formulas with other elements, such as text and coordinate diagrams. To screen and collect images containing mathematical formulas for others to study or for further research, a model for screening images of mathematical formulas based on feature correlation enhancement is proposed. First, the Feature Correlation Enhancement (FCE) module was designed to improve the correlation degree of mathematical formula features and weaken other features. Then, the strip multi-scale pooling (SMP) module was designed to solve the problem of non-uniform image size, while enhancing the focus on horizontal formula features. Finally, the loss function was improved to balance the dataset. The model achieved an accuracy of 89.50% in the experiments, outperforming the existing model. Screening images with the model lets the user pick out those containing mathematical formulas, which helps to speed up the creation of a database of mathematical formula images.
(This article belongs to the Collection Computer Vision and Pattern Recognition Techniques)

17 pages, 7998 KB  
Article
A Two-Stream CNN Model with Adaptive Adjustment of Receptive Field Dedicated to Flame Region Detection
by Peng Lu, Yaqin Zhao and Yuan Xu
Symmetry 2021, 13(3), 397; https://doi.org/10.3390/sym13030397 - 28 Feb 2021
Cited by 12 | Viewed by 3561
Abstract
Convolutional neural networks (CNN) have yielded state-of-the-art performance in image segmentation. Their application in video surveillance systems can provide very useful information for extinguishing fire in time. The current studies mostly focused on CNN-based flame image classification and have achieved good accuracy. However, research on CNN-based flame region detection is extremely scarce due to the bulky network structures and high hardware configuration requirements of the state-of-the-art CNN models. Therefore, this paper presents a two-stream convolutional neural network for flame region detection (TSCNNFlame). TSCNNFlame is a lightweight CNN architecture including a spatial stream and temporal stream for detecting flame pixels in video sequences captured by fixed cameras. The static features from the spatial stream and dynamic features from the temporal stream are fused by three convolutional layers to reduce the false positives. We replace the convolutional layer of CNN with the selective kernel (SK)-Shuffle block constructed by integrating the SK convolution into the deep convolutional layer of ShuffleNet V2. The SKNet blocks can adaptively adjust the size of a receptive field according to the proportion of the region of interest (ROI) within it. The grouped convolution used in ShuffleNet solves the problem in which the multi-branch structure of SKNet causes the network parameters to double with the number of branches. Therefore, the CNN network dedicated to flame region detection balances the efficiency and accuracy by the lightweight architecture, the temporal–spatial features fusion, and the advantages of the SK-Shuffle block. The experimental results, which are evaluated by multiple metrics and are analyzed from many angles, show that this method can achieve significant performance while reducing the running time.
(This article belongs to the Section Computer)
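Several abstracts in these results rely on selective kernel (SK) convolution to adjust the receptive field adaptively. The core "fuse and select" idea can be sketched as follows: branch outputs (for example, from small and large kernels) are summed and globally averaged per channel, and a softmax across branches then decides how much each branch contributes. This is a simplified illustration, not any listed paper's implementation; the function `sk_select` is hypothetical, and the per-branch scalars in `branch_weights` stand in for the learned fully connected layers of a real SK block.

```python
import math

def sk_select(branches, branch_weights):
    """branches: B feature maps, each channels x positions (flattened)."""
    n_branch = len(branches)
    n_ch = len(branches[0])
    n_pos = len(branches[0][0])
    # Fuse: sum the branches, then global average pooling per channel.
    desc = [sum(sum(b[c]) for b in branches) / n_pos for c in range(n_ch)]
    out, attn = [], []
    for c in range(n_ch):
        # Hypothetical scoring: one scalar per branch replaces learned layers.
        logits = [w * desc[c] for w in branch_weights]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        a = [e / total for e in exps]  # softmax across branches
        attn.append(a)
        # Select: attention-weighted sum of the branch outputs.
        out.append([sum(a[b] * branches[b][c][p] for b in range(n_branch))
                    for p in range(n_pos)])
    return out, attn

# Two branches (e.g. small-kernel and large-kernel), 2 channels, 2 positions.
small = [[1.0, 1.0], [0.0, 0.0]]
large = [[3.0, 3.0], [0.0, 0.0]]
out, attn = sk_select([small, large], [0.0, 1.0])
```

In this toy input, the first channel has a strong response, so the softmax favors the second branch there, while the silent second channel splits its attention evenly. The attention weights for each channel always sum to one, so the output is a per-channel blend of the branches.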