MDPI - Publisher of Open Access Journals

21 pages, 3725 KB

Open Access Article

Pruning-Friendly RGB-T Semantic Segmentation for Real-Time Processing on Edge Devices

by Jun Young Hwang, Youn Joo Lee, Ho Gi Jung and Jae Kyu Suhr

Electronics 2025, 14(17), 3408; https://doi.org/10.3390/electronics14173408 - 27 Aug 2025

Viewed by 463

RGB-T semantic segmentation, which uses thermal and RGB images simultaneously, is actively being researched to robustly recognize the surrounding environment of vehicles regardless of challenging lighting and weather conditions. It is important for such networks to operate in real time on edge devices. As transformer-based approaches, to which most recent RGB-T semantic segmentation studies belong, are very difficult to run on edge devices, this paper considers only CNN-based RGB-T semantic segmentation networks that can run on edge devices in real time. Although EAEFNet shows the best performance among CNN-based networks on edge devices, its inference speed is too slow for real-time operation. Furthermore, even when channel pruning is applied, the speed improvement is minimal. The analysis of EAEFNet identifies the intermediate fusion of RGB and thermal features and the high complexity of the decoder as the main causes. To address these issues, this paper proposes a network using a ResNet encoder with an early-fused four-channel input and the U-Net decoder structure. To improve the decoder performance, bilinear upsampling is replaced with PixelShuffle. Additionally, a mini Atrous Spatial Pyramid Pooling (ASPP) module and a Progressive Transposed Module (PTM) are applied. Since the proposed network is primarily composed of convolutional layers, channel pruning is confirmed to be effectively applicable. Consequently, channel pruning significantly improves inference speed and enables real-time operation on the neural processing unit (NPU) of edge devices. The proposed network is evaluated on the MFNet dataset, one of the most widely used public datasets for RGB-T semantic segmentation. The proposed method achieves performance comparable to EAEFNet while operating at over 30 FPS on an embedded board equipped with the Qualcomm QCS6490 SoC.
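
The two structural ideas named in this abstract, early fusion into a four-channel input and PixelShuffle upsampling, can be illustrated with a small sketch. This is not the authors' code: the function names `early_fuse` and `pixel_shuffle` are hypothetical, images are plain nested lists in channel-first order, and the channel-to-space layout follows the commonly used PixelShuffle convention, which is assumed here rather than taken from the paper.

```python
def early_fuse(rgb, thermal):
    """Concatenate 3 RGB channels and 1 thermal channel into a 4-channel input."""
    if len(rgb) != 3 or len(thermal) != 1:
        raise ValueError("expected 3 RGB channels and 1 thermal channel")
    return rgb + thermal


def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Output pixel (c, y*r + i, x*r + j) is read from input channel
    c*r*r + i*r + j at position (y, x); no interpolation is involved.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out
```

Because the upsampling is a pure rearrangement of values produced by a preceding convolution, the decoder stays convolution-only, which is consistent with the abstract's point that such a network is amenable to channel pruning.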

(This article belongs to the Special Issue New Insights in 2D and 3D Object Detection and Semantic Segmentation)

21 pages, 1681 KB

Open Access Article

Cross-Modal Complementarity Learning for Fish Feeding Intensity Recognition via Audio–Visual Fusion

by Jian Li, Yanan Wei, Wenkai Ma and Tan Wang

Animals 2025, 15(15), 2245; https://doi.org/10.3390/ani15152245 - 31 Jul 2025

Viewed by 732

Abstract

Accurate evaluation of fish feeding intensity is crucial for optimizing aquaculture efficiency and the healthy growth of fish. Previous methods mainly rely on single-modal approaches (e.g., audio or visual). However, the complex underwater environment makes single-modal monitoring methods face significant challenges: visual systems are severely affected by water turbidity, lighting conditions, and fish occlusion, while acoustic systems suffer from background noise. Although existing studies have attempted to combine acoustic and visual information, most adopt simple feature-level fusion strategies, which fail to fully explore the complementary advantages of the two modalities under different environmental conditions and lack dynamic evaluation mechanisms for modal reliability. To address these problems, we propose the Adaptive Cross-modal Attention Fusion Network (ACAF-Net), a cross-modal complementarity learning framework with a two-stage attention fusion mechanism: (1) a cross-modal enhancement stage that enriches individual representations through Low-rank Bilinear Pooling and learnable fusion weights; (2) an adaptive attention fusion stage that dynamically weights acoustic and visual features based on complementarity and environmental reliability. Our framework incorporates dimension alignment strategies and attention mechanisms to capture temporal–spatial complementarity between acoustic feeding signals and visual behavioral patterns. Extensive experiments demonstrate superior performance compared to single-modal and conventional fusion approaches, with 6.4% accuracy improvement. The results validate the effectiveness of exploiting cross-modal complementarity for underwater behavioral analysis and establish a foundation for intelligent aquaculture monitoring systems. Full article
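
As a rough illustration of the two stages described above, the sketch below pairs a low-rank bilinear interaction with a softmax-weighted fusion of the two modalities. It is a minimal sketch under stated assumptions, not ACAF-Net itself: the projection matrices `u`, `v`, `p` and the `reliability_scores` input are hypothetical stand-ins for learned parameters, and nonlinearities and attention layers are omitted.

```python
import math


def matvec_t(m, vec):
    """Compute m^T vec for a matrix m given as len(vec) rows."""
    cols = len(m[0])
    return [sum(m[i][k] * vec[i] for i in range(len(vec))) for k in range(cols)]


def low_rank_bilinear(audio, visual, u, v, p):
    """Low-rank bilinear pooling: P^T ((U^T a) * (V^T b)), '*' element-wise."""
    a_proj = matvec_t(u, audio)    # project audio features to rank-k space
    b_proj = matvec_t(v, visual)   # project visual features to rank-k space
    joint = [x * y for x, y in zip(a_proj, b_proj)]
    return matvec_t(p, joint)      # map the joint vector to the output size


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(audio_feat, visual_feat, reliability_scores):
    """Weight two equal-length feature vectors by softmax-normalised scores."""
    w_audio, w_visual = softmax(reliability_scores)
    return [w_audio * a + w_visual * b for a, b in zip(audio_feat, visual_feat)]
```

With equal reliability scores the fusion reduces to a plain average; a larger score for one modality (for example, audio in turbid water) shifts the fused feature toward it.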

(This article belongs to the Special Issue Innovations in Aquaculture: New Technologies, Culture Systems and Integration of Emerging Species)

21 pages, 4008 KB

Open Access Article

Enhancing Suburban Lane Detection Through Improved DeepLabV3+ Semantic Segmentation

by Shuwan Cui, Bo Yang, Zhifu Wang, Yi Zhang, Hao Li, Hui Gao and Haijun Xu

Electronics 2025, 14(14), 2865; https://doi.org/10.3390/electronics14142865 - 17 Jul 2025

Viewed by 577

Abstract

Lane detection is a key technology in automatic driving environment perception, and its accuracy directly affects vehicle positioning, path planning, and driving safety. In this study, an enhanced real-time model for lane detection based on an improved DeepLabV3+ architecture is proposed to address the challenges posed by complex dynamic backgrounds and blurred road boundaries in suburban road scenarios. To address the lack of feature correlation in the traditional Atrous Spatial Pyramid Pooling (ASPP) module of the DeepLabV3+ model, we propose an improved LC-DenseASPP module. First, inspired by DenseASPP, the number of dilated convolution layers is reduced from six to three by adopting a dense connection to enhance feature reuse, significantly reducing computational complexity. Second, the convolutional block attention module (CBAM) attention mechanism is embedded after the LC-DenseASPP dilated convolution operation. This effectively improves the model’s ability to focus on key features through the adaptive refinement of channel and spatial attention features. Finally, an image-pooling operation is introduced in the last layer of the LC-DenseASPP to further enhance the ability to capture global context information. DySample is introduced to replace bilinear upsampling in the decoder, ensuring model performance while reducing computational resource consumption. The experimental results show that the model achieves a good balance between segmentation accuracy and computational efficiency, with a mean intersection over union (mIoU) of 95.48% and an inference speed of 128 frames per second (FPS). Additionally, a new lane-detection dataset, SubLane, is constructed to fill the gap in the research field of lane detection in suburban road scenarios. Full article

20 pages, 623 KB

Open Access Article

Fast Normalization for Bilinear Pooling via Eigenvalue Regularization

by Sixiang Xu, Huihui Dong, Chen Zhang and Chaoxue Wang

Appl. Sci. 2025, 15(8), 4155; https://doi.org/10.3390/app15084155 - 10 Apr 2025

Viewed by 649

Abstract

Bilinear pooling, as an aggregation approach that outputs second-order statistics of deep learning features, has demonstrated effectiveness in a wide range of visual recognition tasks. Among major improvements on the bilinear pooling, matrix square root normalization—applied to the bilinear representation matrix—is regarded as a crucial step for further boosting performance. However, most existing works leverage Newton’s iteration to perform normalization, which becomes computationally inefficient when dealing with high-dimensional features. To address this limitation, through a comprehensive analysis, we reveal that both the distribution and magnitude of eigenvalues in the bilinear representation matrix play an important role in the network performance. Building upon this insight, we propose a novel approach, namely RegCov, which regularizes the eigenvalues when the normalization is absent. Specifically, RegCov incorporates two regularization terms that encourage the network to align the current eigenvalues with the target ones in terms of their distribution and magnitude. We implement RegCov across different network architectures and run extensive experiments on the ImageNet1K and fine-grained image classification benchmarks. The results demonstrate that RegCov maintains robust recognition to diverse datasets and network architectures while achieving superior inference speed compared to previous works. Full article
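
For readers unfamiliar with the baseline this abstract argues against, the sketch below shows bilinear pooling as a second-order statistic followed by matrix square root normalization with a coupled Newton-Schulz iteration. It is an illustrative sketch, not RegCov: the function names are hypothetical, the trace pre-scaling is one common way to keep the iteration convergent, and the input is assumed to be a small list of feature vectors with a positive-trace second-moment matrix.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def bilinear_pool(features):
    """Second-order pooling: average outer product of N d-dimensional features."""
    n, d = len(features), len(features[0])
    return [[sum(f[i] * f[j] for f in features) / n for j in range(d)]
            for i in range(d)]


def newton_schulz_sqrt(a, iterations=15):
    """Approximate the square root of a symmetric PSD matrix.

    The matrix is divided by its trace first so that the coupled
    iteration converges, and the result is rescaled at the end.
    """
    d = len(a)
    trace = sum(a[i][i] for i in range(d))
    y = [[a[i][j] / trace for j in range(d)] for i in range(d)]
    z = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(iterations):
        zy = matmul(z, y)
        # t = 0.5 * (3I - ZY), shared by both updates
        t = [[0.5 * ((3.0 if i == j else 0.0) - zy[i][j]) for j in range(d)]
             for i in range(d)]
        y, z = matmul(y, t), matmul(t, z)
    scale = trace ** 0.5
    return [[scale * value for value in row] for row in y]
```

Each iteration costs several d-by-d matrix products, which is the overhead the abstract refers to when feature dimensions are high.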

(This article belongs to the Special Issue Application of Machine Learning to Image Classification and Image Segmentation)

24 pages, 12658 KB

Open Access Article

Camouflaged Object Detection with Enhanced Small-Structure Awareness in Complex Backgrounds

by Yaning Lv, Sanyang Liu, Yudong Gong and Jing Yang

Electronics 2025, 14(6), 1118; https://doi.org/10.3390/electronics14061118 - 12 Mar 2025

Cited by 5 | Viewed by 1466

Abstract

Small-Structure Camouflaged Object Detection (SSCOD) is a highly promising yet challenging task, as small-structure targets often exhibit weaker features and occupy a significantly smaller proportion of the image compared to normal-sized targets. Such data are not only prevalent in existing benchmark camouflaged object detection datasets but also frequently encountered in real-world scenarios. Although existing camouflaged object detection (COD) methods have significantly improved detection accuracy, research specifically focused on SSCOD remains limited. To further advance the SSCOD task, we propose a detail-preserving multi-scale adaptive network architecture that incorporates the following key components: (1) An adaptive scaling strategy designed to mimic human visual perception when observing blurry targets. (2) An Attentive Atrous Spatial Pyramid Pooling (A2SPP) module, enabling each position in the feature map to autonomously learn the optimal feature scale. (3) A scale integration mechanism, leveraging Haar Wavelet-based Downsampling (HWD) and bilinear upsampling to preserve both contextual and fine-grained details across multiple scales. (4) A Feature Enhancement Module (FEM), specifically tailored to refine feature representations in small-structure detection scenarios. Extensive comparative experiments and ablation studies conducted on three camouflaged object detection datasets, as well as our proposed small-structure test datasets, demonstrated that our framework outperformed existing state-of-the-art (SOTA) methods. Notably, our approach achieved superior performance in detecting small-structured targets, highlighting its effectiveness and robustness in addressing the challenges of SSCOD tasks. Additionally, we conducted polyp segmentation experiments on four datasets, and the results showed that our framework is also well-suited for polyp segmentation, consistently outperforming other recent methods. Full article
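
The Haar wavelet-based downsampling mentioned in component (3) can be sketched for a single-channel image as follows. This is an assumption-laden illustration rather than the paper's module: `haar_downsample` is a hypothetical name, the orthonormal one-level 2D Haar transform is used, sub-band naming varies between libraries, and the learned convolution that usually follows the transform is omitted.

```python
def haar_downsample(image):
    """One-level 2D Haar transform of an H x W image with even H and W.

    Returns four half-resolution sub-bands stacked as channels:
    [approximation, horizontal detail, vertical detail, diagonal detail].
    Unlike strided pooling, no information is discarded: the four
    sub-bands together hold exactly the energy of the input.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands = [[], [], [], []]
    for y in range(0, h, 2):
        rows = [[], [], [], []]
        for x in range(0, w, 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            rows[0].append((a + b + c + d) / 2)  # local average (low-pass)
            rows[1].append((a - b + c - d) / 2)  # left-right difference
            rows[2].append((a + b - c - d) / 2)  # top-bottom difference
            rows[3].append((a - b - c + d) / 2)  # diagonal difference
        for band, row in zip(bands, rows):
            band.append(row)
    return bands
```

The spatial resolution halves while the channel count quadruples, which is why this kind of downsampling is described as detail-preserving.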

25 pages, 152810 KB

Open Access Article

QEDetr: DETR with Query Enhancement for Fine-Grained Object Detection

by Chenguang Dong, Shan Jiang, Haijiang Sun, Jiang Li, Zhenglei Yu, Jiasong Wang and Jiacheng Wang

Remote Sens. 2025, 17(5), 893; https://doi.org/10.3390/rs17050893 - 3 Mar 2025

Cited by 2 | Viewed by 2736

Abstract

Fine-grained object detection aims to accurately localize the object bounding box while identifying the specific model of the object, which is more challenging than conventional remote sensing object detection. Transformer-based object detector (DETR) can capture remote inter-feature dependencies by using attention, which is suitable for fine-grained object detection tasks. However, most existing DETR-like object detectors are not specifically optimized for remote sensing detection tasks. Therefore, we propose an oriented fine-grained object detection method based on transformers. First, we combine denoising training and angle coding to propose a baseline DETR-like object detector for oriented object detection. Next, we propose a new attention mechanism for extracting finer-grained features by constraining the angle of sampling points during the attentional process, ensuring that the sampling points are more evenly distributed across the object features. Then, we propose a multiscale fusion method based on bilinear pooling to obtain the enhanced query and initialize a more accurate object bounding box. Finally, we combine the localization accuracy of each query with its classification accuracy and propose a new classification loss to further enhance the high-quality queries. Evaluation results on the FAIR1M dataset show that our method achieves an average accuracy of 48.5856 mAP and the highest accuracy of 49.7352 mAP in object detection, outperforming other methods. Full article

(This article belongs to the Section AI Remote Sensing)

25 pages, 1547 KB

Open Access Article

Dual-Policy Attribute-Based Searchable Encryption with Secure Keyword Update for Vehicular Social Networks

by Qianxue Wan, Muhua Liu, Lin Wang, Feng Wang and Mingchuan Zhang

Electronics 2025, 14(2), 266; https://doi.org/10.3390/electronics14020266 - 10 Jan 2025

Viewed by 1297

Abstract

Cloud-to-Vehicle (C2V) integration serves as a fundamental infrastructure to provide robust computing and storage support for Vehicular Social Networks (VSNs). However, the proliferation of sensitive personal data within VSNs poses significant challenges in achieving secure and efficient data sharing while maintaining data usability and precise retrieval capabilities. Although existing searchable attribute-based encryption schemes offer the secure retrieval of encrypted data and fine-grained access control mechanisms, these schemes still exhibit limitations in terms of bilateral access control, dynamic index updates, and search result verification. This study presents a Dual-Policy Attribute-based Searchable Encryption (DP-ABSE) scheme with dynamic keyword update functionality for VSNs. The scheme implements a fine-grained decoupling mechanism that decomposes data attributes into two distinct components: immutable attribute names and mutable attribute values. This decomposition transfers the attribute verification process from data owners to the encrypted files themselves, enabling data attribute-level granularity in access control. Through the integration of an identity-based authentication mechanism derived from the data owner’s unique identifier and bilinear pairing verification, it achieves secure updates of the specified keywords index while preserving both the anonymity of the non-updated data and the confidentiality of the message content. The encryption process employs an offline/online two-phase design, allowing data owners to pre-compute ciphertext pools for efficient real-time encryption. Subsequently, the decryption process introduces an outsourcing local-phase mechanism, leveraging key encapsulation technology for secure attribute computation outsourcing, thereby reducing the terminal computational load. 
To enhance security at the terminal decryption stage, the scheme incorporates a security verification module based on retrieval keyword and ciphertext correlation validation, preventing replacement attacks and ensuring data integrity. Security analysis under standard assumptions confirms the theoretical soundness of the proposed solution, and extensive performance evaluations showcase its effectiveness.

19 pages, 6995 KB

Open Access Article

A Classification Model for Fine-Grained Silkworm Cocoon Images Based on Bilinear Pooling and Adaptive Feature Fusion

by Mochen Liu, Xin Hou, Mingrui Shang, Eunice Oluwabunmi Owoola, Guizheng Zhang, Wei Wei, Zhanhua Song and Yinfa Yan

Agriculture 2024, 14(12), 2363; https://doi.org/10.3390/agriculture14122363 - 22 Dec 2024

Viewed by 1618

Abstract

The quality of silkworm cocoons affects the quality and cost of silk processing. It is necessary to sort silkworm cocoons prior to silk production. Cocoon images consist of fine-grained images with large intra-class differences and small inter-class differences. The subtle intra-class features pose a serious challenge in accurately locating the effective areas and classifying silkworm cocoons. To improve the perception of intra-class features and the classification accuracy, this paper proposes a bilinear pooling classification model (B-Res41-ASE) based on adaptive multi-scale feature fusion and enhancement. B-Res41-ASE consists of three parts: a feature extraction module, a feature fusion module, and a feature enhancement module. Firstly, the backbone network, ResNet41, is constructed based on the bilinear pooling algorithm to extract complete cocoon features. Secondly, the adaptive spatial feature fusion module (ASFF) is introduced to fuse different semantic information to solve the problem of fine-grained information loss in the process of feature extraction. Finally, the squeeze and excitation module (SE) is used to suppress redundant information, enhance the weight of distinguishable regions, and reduce classification bias. Compared with the widely used classification network, the proposed model achieves the highest classification performance in the test set, with accuracy of 97.0% and an F1-score of 97.5%. The accuracy of B-Res41-ASE is 3.1% and 2.6% higher than that of the classification networks AlexNet and GoogLeNet, respectively, while the F1-score is 2.5% and 2.2% higher, respectively. Additionally, the accuracy of B-Res41-ASE is 1.9% and 7.7% higher than that of the Bilinear CNN and HBP, respectively, while the F1-score is 1.6% and 5.7% higher. 
The experimental results show that the proposed classification model without complex labelling outperforms other cocoon classification algorithms in terms of classification accuracy and robustness, providing a theoretical basis for the intelligent sorting of silkworm cocoons.
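
The squeeze and excitation step used in this model for re-weighting channels can be sketched as below. This is a generic illustration of the SE idea, not the B-Res41-ASE implementation: `squeeze_excite` and the weight matrices `w1` and `w2` are hypothetical, biases are omitted, and feature maps are nested lists in channel-first order.

```python
import math


def squeeze_excite(feature_maps, w1, w2):
    """Re-weight channels of a (C, H, W) nested list.

    w1 has shape (hidden, C) and w2 has shape (C, hidden); both stand in
    for learned fully connected layers.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[g * v for v in row] for row in ch]
            for ch, g in zip(feature_maps, gates)]
```

Channels whose gate is close to zero are suppressed, which matches the abstract's description of reducing redundant information and emphasising distinguishable regions.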

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

17 pages, 4993 KB

Open Access Article

NFSA-DTI: A Novel Drug–Target Interaction Prediction Model Using Neural Fingerprint and Self-Attention Mechanism

by Feiyang Liu, Huang Xu, Peng Cui, Shuo Li, Hongbo Wang and Ziye Wu

Int. J. Mol. Sci. 2024, 25(21), 11818; https://doi.org/10.3390/ijms252111818 - 3 Nov 2024

Cited by 2 | Viewed by 2335

Abstract

Existing deep learning methods have shown outstanding performance in predicting drug–target interactions. However, they still have limitations: (1) the over-reliance on locally extracted features by some single encoders, with insufficient consideration of global features, and (2) the inadequate modeling and learning of local crucial interaction sites in drug–target interaction pairs. In this study, we propose a novel drug–target interaction prediction model called the Neural Fingerprint and Self-Attention Mechanism (NFSA-DTI), which effectively integrates the local information of drug molecules and target sequences with their respective global features. The neural fingerprint method is used in this model to extract global features of drug molecules, while the self-attention mechanism is utilized to enhance CNN’s capability in capturing the long-distance dependencies between the subsequences in the target amino acid sequence. In the feature fusion module, we improve the bilinear attention network by incorporating attention pooling, which enhances the model’s ability to learn local crucial interaction sites in the drug–target pair. The experimental results on three benchmark datasets demonstrated that NFSA-DTI outperformed all baseline models in predictive performance. Furthermore, case studies illustrated that our model could provide valuable insights for drug discovery. Moreover, our model offers molecular-level interpretations. Full article

(This article belongs to the Special Issue Molecular Computer Science and Artificial Intelligence for Drug Discovery)

24 pages, 1677 KB

Open Access Article

CPINet: Towards A Novel Cross-Polarimetric Interaction Network for Dual-Polarized SAR Ship Classification

by Jinglu He, Ruiting Sun, Yingying Kong, Wenlong Chang, Chenglu Sun, Gaige Chen, Yinghua Li, Zhe Meng and Fuping Wang

Remote Sens. 2024, 16(18), 3479; https://doi.org/10.3390/rs16183479 - 19 Sep 2024

Cited by 1 | Viewed by 2022

Abstract

With the rapid development of the modern world, it is imperative to achieve effective and efficient monitoring for territories of interest, especially for the broad ocean area. For surveillance of ship targets at sea, a common and powerful approach is to take advantage of satellite synthetic aperture radar (SAR) systems. Currently, using satellite SAR images for ship classification is a challenging issue due to complex sea situations and the imaging variances of ships. Fortunately, the emergence of advanced satellite SAR sensors has shed much light on the SAR ship automatic target recognition (ATR) task, e.g., utilizing dual-polarization (dual-pol) information to boost the performance of SAR ship classification. Therefore, in this paper we have developed a novel cross-polarimetric interaction network (CPINet) to explore the abundant polarization information of dual-pol SAR images with the help of deep learning strategies, leading to an effective solution for high-performance ship classification. First, we establish a novel multiscale deep feature extraction framework to fully mine the characteristics of dual-pol SAR images in a coarse-to-fine manner. Second, to further leverage the complementary information of dual-pol SAR images, we propose a mixed-order squeeze–excitation (MO-SE) attention mechanism, in which the first- and second-order statistics of the deep features from one single-polarized SAR image are extracted to guide the learning of another polarized one. Then, the intermediate multiscale fused and MO-SE augmented dual-polarized deep feature maps are respectively aggregated by the factorized bilinear coding (FBC) pooling method. Meanwhile, the last multiscale fused deep feature maps for each single-polarized SAR image are also individually aggregated by the FBC. Finally, four kinds of highly discriminative deep representations are obtained for loss computation and category prediction. 
For better network training, the gradient normalization (GradNorm) method for multitask networks is extended to adaptively balance the contribution of each loss component. Extensive experiments on the three- and five-category dual-pol SAR ship classification dataset collected from the open and free OpenSARShip database demonstrate the superiority and robustness of CPINet compared with state-of-the-art methods for the dual-polarized SAR ship classification task.

(This article belongs to the Special Issue SAR in Big Data Era III)

17 pages, 4607 KB

Open Access Article

Research on the Wild Mushroom Recognition Method Based on Transformer and the Multi-Scale Feature Fusion Compact Bilinear Neural Network

by He Liu, Qingran Hu and Dongyan Huang

Agriculture 2024, 14(9), 1618; https://doi.org/10.3390/agriculture14091618 - 15 Sep 2024

Cited by 3 | Viewed by 1400

Abstract

Wild mushrooms are popular for their taste and nutritional value; however, non-experts often struggle to distinguish between toxic and non-toxic species when foraging in the wild, potentially leading to poisoning incidents. To address this issue, this study proposes a compact bilinear neural network method based on Transformer and multi-scale feature fusion. The method utilizes a dual-stream structure that integrates multiple feature extractors, enhancing the comprehensiveness of image information capture. Additionally, bottleneck attention and efficient multi-scale attention modules are embedded to effectively capture multi-scale features while maintaining low computational costs. By employing a compact bilinear pooling module, the model achieves high-order feature interactions, reducing the number of parameters without compromising performance. Experimental results demonstrate that the proposed method achieves an accuracy of 98.03%, outperforming existing comparative methods. This proves the superior recognition performance of the model, making it more reliable in distinguishing wild mushrooms while capturing key information from multiple dimensions, enabling it to better handle complex scenarios. Furthermore, the development of public-facing identification tools based on this method could help reduce the risk of poisoning incidents. Building on these findings, the study suggests strengthening the research and development of digital agricultural technologies, promoting the application of intelligent recognition technologies in agriculture, and providing technical support for agricultural production and resource management through digital platforms. This would provide a theoretical foundation for the innovation of digital agriculture and promote its sustainable development. Full article
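
Compact bilinear pooling, as referenced in this abstract, is commonly realised by sketching the outer product of two feature vectors instead of materialising it. The following is a minimal sketch of that general idea using count sketches and a circular convolution; it is not the paper's module. The helper names, the seeded random hash parameters, and the direct (non-FFT) convolution are all assumptions made for readability.

```python
import random


def make_sketch_params(dim_in, dim_out, seed):
    """Draw a random bucket index and a random sign for every input index."""
    rng = random.Random(seed)
    buckets = [rng.randrange(dim_out) for _ in range(dim_in)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim_in)]
    return buckets, signs


def count_sketch(x, buckets, signs, dim_out):
    """Project x to dim_out values by signed accumulation into buckets."""
    out = [0.0] * dim_out
    for value, bucket, sign in zip(x, buckets, signs):
        out[bucket] += sign * value
    return out


def circular_convolve(a, b):
    """Direct circular convolution of two equal-length vectors."""
    d = len(a)
    return [sum(a[k] * b[(j - k) % d] for k in range(d)) for j in range(d)]


def compact_bilinear(x, y, params_x, params_y, dim_out):
    """Sketch of the outer product of x and y in dim_out dimensions."""
    sx = count_sketch(x, params_x[0], params_x[1], dim_out)
    sy = count_sketch(y, params_y[0], params_y[1], dim_out)
    return circular_convolve(sx, sy)
```

The output has `dim_out` entries regardless of the input sizes, whereas the full outer product has `len(x) * len(y)`; this is the source of the parameter reduction the abstract mentions. Practical implementations usually compute the convolution with FFTs.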

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

17 pages, 13784 KB

Open Access Article

Continuous Dictionary of Nodes Model and Bilinear-Diffusion Representation Learning for Brain Disease Analysis

by Jiarui Liang, Tianyi Yan, Yin Huang, Ting Li, Songhui Rao, Hongye Yang, Jiayu Lu, Yan Niu, Dandan Li, Jie Xiang and Bin Wang

Brain Sci. 2024, 14(8), 810; https://doi.org/10.3390/brainsci14080810 - 13 Aug 2024

Viewed by 1687

Abstract

Brain networks based on functional magnetic resonance imaging (fMRI) provide a crucial perspective for diagnosing brain diseases. Representation learning has recently attracted tremendous attention due to its strong representation capability, which can be naturally applied to brain disease analysis. However, traditional representation learning only considers direct and local node interactions in original brain networks, posing challenges in constructing higher-order brain networks to represent indirect and extensive node interactions. To address this problem, we propose the Continuous Dictionary of Nodes model and Bilinear-Diffusion (CDON-BD) network for brain disease analysis. The CDON model is innovatively used to learn the original brain network, with its encoder weights directly regarded as latent features. To fully integrate latent features, we further utilize Bilinear Pooling to construct higher-order brain networks. The Diffusion Module is designed to capture extensive node interactions in higher-order brain networks. Compared to state-of-the-art methods, CDON-BD demonstrates competitive classification performance on two real datasets. Moreover, the higher-order representations learned by our method reveal brain regions relevant to the diseases, contributing to a better understanding of the pathology of brain diseases. Full article

(This article belongs to the Section Neuropsychiatry)

18 pages, 5800 KB

Open Access Article

Bilinear Distance Feature Network for Semantic Segmentation in PowerLine Corridor Point Clouds

by Yunyi Zhou, Ziyi Feng, Chunling Chen and Fenghua Yu

Sensors 2024, 24(15), 5021; https://doi.org/10.3390/s24155021 - 2 Aug 2024

Cited by 1 | Viewed by 1446

Abstract

Semantic segmentation of target objects in power transmission line corridor point cloud scenes is a crucial step in powerline tree barrier detection. The massive quantity, disordered distribution, and non-uniformity of point clouds in these scenes pose significant challenges for feature extraction. Previous studies have often underused spatial information, limiting the network's ability to understand complex geometric shapes. To overcome this limitation, this paper focuses on enhancing the deep expression of spatial geometric information in segmentation networks and proposes BDF-Net, a method that improves on RandLA-Net. For each input 3D point cloud, BDF-Net first encodes the relative coordinates and relative distances into spatial geometric feature representations through the Spatial Information Encoding block, capturing the local spatial structure of the point cloud. Subsequently, the Bilinear Pooling block combines the point cloud features with the spatial geometric representation through bilinear interaction, thus learning more discriminative local feature descriptors. The Global Feature Extraction block captures global structure information by using the ratio between the point position and the relative position, enhancing the network's semantic understanding. To verify the performance of BDF-Net, this paper constructs a dataset, PPCD, for transmission line corridor point cloud scenes and conducts detailed experiments on it. The experimental results show that BDF-Net achieves an OA of 97.16%, an mIoU of 77.48%, and an mAcc of 87.6%, which are 3.03%, 16.23%, and 18.44% higher than RandLA-Net, respectively. Moreover, comparisons with other state-of-the-art methods also verify the superiority of BDF-Net in point cloud semantic segmentation tasks. Full article
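
The abstract above reports OA, mIoU, and mAcc without defining them. As a minimal sketch, assuming the usual confusion-matrix definitions (overall accuracy, mean per-class intersection over union, mean per-class recall), the three metrics could be computed as follows. The function names and the toy labels are hypothetical and are not taken from the BDF-Net paper.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Rows index the true class, columns index the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Tuple[float, float, float]:
    """Return (OA, mIoU, mAcc) under the assumed standard definitions."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    oa = sum(cm[i][i] for i in range(n)) / total

    ious, accs = [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # true class i, predicted otherwise
        fp = sum(cm[r][i] for r in range(n)) - tp   # predicted i, true class differs
        if tp + fp + fn > 0:                        # skip classes absent from both
            ious.append(tp / (tp + fp + fn))
        if tp + fn > 0:                             # skip classes with no true points
            accs.append(tp / (tp + fn))
    return oa, sum(ious) / len(ious), sum(accs) / len(accs)


# Toy example: six points, three classes (labels are illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
oa, miou, macc = segmentation_metrics(confusion_matrix(y_true, y_pred, 3))
```

Because mIoU and mAcc average over classes rather than points, they can move much more than OA when rare classes improve, which may explain why the reported gains in mIoU and mAcc are far larger than the gain in OA.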

(This article belongs to the Section Sensing and Imaging)

18 pages, 2412 KB

Open AccessArticle

Fine-Grained Recognition of Mixed Signals with Geometry Coordinate Attention

by Qingwu Yi, Qing Wang, Jianwu Zhang, Xiaoran Zheng and Zetao Lu

Sensors 2024, 24(14), 4530; https://doi.org/10.3390/s24144530 - 13 Jul 2024

Cited by 1 | Viewed by 1133

Abstract

With the advancement of technology, signal modulation types are becoming increasingly diverse and complex. The phenomenon of signal time–frequency overlap during transmission poses significant challenges for the classification and recognition of mixed signals, including poor recognition capabilities and low generality. This paper presents a recognition model for the fine-grained analysis of mixed signal characteristics, proposing a Geometry Coordinate Attention mechanism and introducing a low-rank bilinear pooling module to more effectively extract signal features for classification. The model employs a residual neural network as its backbone architecture and utilizes the Geometry Coordinate Attention mechanism for time–frequency weighted analysis based on information geometry theory. This analysis targets multiple-scale features within the architecture, producing time–frequency weighted features of the signal. These weighted features are further analyzed through a low-rank bilinear pooling module, combined with the backbone features, to achieve fine-grained feature fusion. This results in a fused feature vector for mixed signal classification. Experiments were conducted on a simulated dataset comprising 39,600 mixed-signal time–frequency plots. The model was benchmarked against a baseline using a residual neural network. The experimental outcomes demonstrated an improvement of 9% in the exact match ratio and 5% in the Hamming score. These results indicate that the proposed model significantly enhances the recognition capability and generalizability of mixed signal classification. Full article
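
The abstract above evaluates mixed-signal recognition with the exact match ratio and the Hamming score but does not define them. The sketch below assumes the common multi-label definitions: exact match ratio is the fraction of samples whose predicted label set equals the true set, and Hamming score is the mean per-sample ratio of intersection to union. The paper may use a different variant, and the modulation labels here are purely illustrative.

```python
from typing import Iterable, Sequence


def exact_match_ratio(y_true: Sequence[Iterable[str]],
                      y_pred: Sequence[Iterable[str]]) -> float:
    """Fraction of samples whose predicted label set equals the true set."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if set(t) == set(p))
    return hits / len(y_true)


def hamming_score(y_true: Sequence[Iterable[str]],
                  y_pred: Sequence[Iterable[str]]) -> float:
    """Mean per-sample |true AND pred| / |true OR pred| (assumed definition)."""
    scores = []
    for t, p in zip(y_true, y_pred):
        t_set, p_set = set(t), set(p)
        union = t_set | p_set
        # Two empty label sets count as a perfect match.
        scores.append(1.0 if not union else len(t_set & p_set) / len(union))
    return sum(scores) / len(scores)


# Three mixed signals, each carrying one or two modulation types.
y_true = [{"BPSK", "QPSK"}, {"FM"}, {"AM", "FM"}]
y_pred = [{"BPSK", "QPSK"}, {"FM", "AM"}, {"AM"}]
emr = exact_match_ratio(y_true, y_pred)
hs = hamming_score(y_true, y_pred)
```

The exact match ratio gives no credit for partially correct label sets, whereas the Hamming score does, so the two together indicate how often a mixture is fully identified versus partly identified.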

(This article belongs to the Section Intelligent Sensors)

17 pages, 6040 KB

Open AccessArticle

AM-UNet: Field Ridge Segmentation of Paddy Field Images Based on an Improved MultiResUNet Network

by Xulong Wu, Peng Fang, Xing Liu, Muhua Liu, Peichen Huang, Xianhao Duan, Dakang Huang and Zhaopeng Liu

Agriculture 2024, 14(4), 637; https://doi.org/10.3390/agriculture14040637 - 21 Apr 2024

Cited by 5 | Viewed by 2013

Abstract

To address the difficulty of segmenting image boundaries caused by the irregular shapes of paddy fields in southern China, a high-precision segmentation method for paddy field mapping is proposed, based on an improved MultiResUNet model and tailored to the characteristics of paddy field scenes. We introduce the attention gate (AG) mechanism at the end of the encoder–decoder skip connections in the MultiResUNet model to generate weights and highlight the response of the field ridge area. We add an atrous spatial pyramid pooling (ASPP) module after the end of the encoder down-sampling, with an appropriate combination of dilation rates, to improve the identification of small-scale edge details. We also use a 1 × 1 convolution after bilinear interpolation to improve the range of the receptive field and increase segmentation accuracy. Together, these changes form the AM-UNet paddy field ridge segmentation model. The experimental results show that the IoU, precision, and F1 value of the AM-UNet model are 88.74%, 93.45%, and 93.95%, respectively, and that the inference time for a single image is 168 ms, enabling accurate and real-time segmentation of field ridges in a complex paddy field environment. Thus, the AM-UNet model can provide technical support for the development of vision-based automatic navigation systems for agricultural machines. Full article
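
The IoU, precision, and F1 values quoted above are pixel-level scores for a binary ridge mask. A minimal sketch of how such scores are typically derived from a predicted mask and a ground-truth mask is shown below. It assumes the standard definitions; the function name and the tiny masks are hypothetical and not part of the AM-UNet work.

```python
from typing import Dict, Sequence


def binary_mask_metrics(pred: Sequence[Sequence[int]],
                        target: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Pixel-wise IoU, precision, recall, and F1 for two same-sized 0/1 masks."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # ridge pixel correctly predicted
            elif p:
                fp += 1      # predicted ridge, actually background
            elif t:
                fn += 1      # missed ridge pixel

    # Guard against empty masks to avoid division by zero.
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}


# Toy 2 x 3 masks: 1 marks a field-ridge pixel, 0 marks background.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
scores = binary_mask_metrics(pred, target)
```

Under these definitions F1 equals 2·IoU / (1 + IoU) for a single mask pair, so an F1 of 93.95% alongside an IoU of 88.74% is roughly what one would expect.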

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

Search Results (47)
