Search Results (100)

Search Parameters:
Keywords = context aggregation attention network

24 pages, 4213 KiB  
Article
Multi-Scale Feature Fusion and Global Context Modeling for Fine-Grained Remote Sensing Image Segmentation
by Yifan Li and Gengshen Wu
Appl. Sci. 2025, 15(10), 5542; https://doi.org/10.3390/app15105542 - 15 May 2025
Abstract
High-precision remote sensing image semantic segmentation plays a crucial role in Earth science analysis and urban management, especially in urban remote sensing scenarios with rich details and complex structures. In such cases, the collaborative modeling of global and local contexts is a key challenge for improving segmentation accuracy. Existing methods that rely on single feature extraction architectures, such as convolutional neural networks (CNNs) and vision transformers, are prone to semantic fragmentation due to their limited feature representation capabilities. To address this issue, we propose a hybrid architecture model called PLGTransformer, which is based on dual-encoder collaborative enhancement and integrates pyramid pooling and graph convolutional network (GCN) modules. Our model constructs a parallel encoding architecture combining a Swin transformer and a CNN: the CNN branch captures fine-grained features such as road and building edges through multi-scale heterogeneous convolutions, while the Swin transformer branch models global dependencies of large-scale land cover using hierarchical window attention. To further strengthen multi-granularity feature fusion, we design a dual-path pyramid pooling module that performs adaptive multi-scale context aggregation for both feature types and dynamically balances local and global contributions using learnable weights. We also introduce GCNs to build a topological graph in the feature space, enabling geometric relationship reasoning for multi-scale feature nodes at high resolution. Experiments on the Potsdam and Vaihingen datasets show that our model outperforms contemporary advanced methods and significantly improves segmentation accuracy for small objects such as vehicles and individual buildings, validating the effectiveness of the multi-feature collaborative enhancement mechanism.
(This article belongs to the Special Issue Signal and Image Processing: From Theory to Applications: 2nd Edition)
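The learnable balancing of local (CNN) and global (transformer) contributions described in the abstract can be sketched as a softmax over two scalar logits. The function and variable names below are illustrative assumptions, not the authors' code:

```python
import math

def fuse(local_feat, global_feat, w_local, w_global):
    """Blend local (CNN-branch) and global (transformer-branch) features,
    flattened to 1-D lists here, using two learnable scalar logits that are
    softmax-normalized so the contributions always sum to 1."""
    e_l, e_g = math.exp(w_local), math.exp(w_global)
    a_l, a_g = e_l / (e_l + e_g), e_g / (e_l + e_g)
    return [a_l * x + a_g * y for x, y in zip(local_feat, global_feat)]
```

With equal logits the two branches contribute equally; training would move the logits to favor whichever branch helps the loss more.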

33 pages, 649 KiB  
Article
Feature Integration Strategies for Neural Speaker Diarization in Conversational Telephone Speech
by Juan Ignacio Alvarez-Trejos, Alicia Lozano-Diez and Daniel Ramos
Appl. Sci. 2025, 15(9), 4842; https://doi.org/10.3390/app15094842 - 27 Apr 2025
Abstract
This paper addresses the challenge of optimizing end-to-end neural diarization systems for conversational telephone speech, exploring diverse acoustic features beyond traditional Mel-filterbanks. We present a methodological framework for integrating and analyzing different feature types as input to the well-known End-to-End Neural Diarization with Encoder-Decoder Attractors (EEND-EDA) model, focusing on Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) embeddings and Geneva Minimalistic Acoustic Parameter Sets (GeMAPS). Our approach combines systematic feature analysis with adaptation strategies, including speaker-count restriction and regularization techniques. Through comprehensive ablation studies of GeMAPS features, we identify optimal acoustic parameters and temporal contexts for diarization, achieving a reduced feature set that maintains performance while decreasing computational complexity. Experiments on the CallHome corpus demonstrate that our optimized combination of ECAPA-TDNN embeddings with Mel-filterbanks reduces the Diarization Error Rate by 29% relative to baseline systems. Our evaluation framework extends beyond traditional metrics, revealing that different feature combinations exhibit distinct strengths in specific diarization aspects.
(This article belongs to the Special Issue Applied Audio Interaction)

19 pages, 19461 KiB  
Article
MEF-CAAN: Multi-Exposure Image Fusion Based on a Low-Resolution Context Aggregation Attention Network
by Wenxiang Zhang, Chunmeng Wang and Jun Zhu
Sensors 2025, 25(8), 2500; https://doi.org/10.3390/s25082500 - 16 Apr 2025
Abstract
Recently, deep learning-based multi-exposure image fusion methods have been widely explored due to their high efficiency and adaptability. However, most existing methods have insufficient feature extraction ability to recover information and detail in extremely exposed areas. To solve this problem, we propose a multi-exposure image fusion method based on a low-resolution context aggregation attention network (MEF-CAAN). First, we feed low-resolution versions of the input images to CAAN to predict their low-resolution weight maps. Then, high-resolution weight maps are generated by guided filtering for upsampling (GFU). Finally, the high-resolution fused image is produced by a weighted summation. The proposed network is unsupervised and adaptively adjusts channel weights to achieve better feature extraction. Experimental results show that our method outperforms existing state-of-the-art methods in both quantitative and qualitative evaluations.
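The final weighted-summation step can be illustrated on flattened images: each output pixel is the weight-map-normalized average of the corresponding pixels across the exposure bracket. This is a minimal sketch under assumed list inputs, not the MEF-CAAN implementation:

```python
def fuse_exposures(images, weight_maps, eps=1e-8):
    """Per-pixel weighted summation of exposure-bracketed images (flattened
    to 1-D lists). Weights are normalized per pixel so they sum to 1; eps
    guards against division by zero where all weights vanish."""
    fused = []
    for px in range(len(images[0])):
        wsum = sum(w[px] for w in weight_maps) + eps
        fused.append(sum(w[px] * img[px]
                         for img, w in zip(images, weight_maps)) / wsum)
    return fused
```

In the paper the weight maps come from the network and are upsampled by guided filtering before this step; here they are given directly.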

17 pages, 3806 KiB  
Article
M3RTNet: Combustion State Recognition Model of MSWI Process Based on Res-Transformer and Three Feature Enhancement Strategies
by Jian Zhang, Rongcheng Sun, Jian Tang and Haoran Pei
Sustainability 2025, 17(8), 3412; https://doi.org/10.3390/su17083412 - 11 Apr 2025
Abstract
The accurate identification of combustion state can effectively improve the efficiency of municipal solid waste incineration (MSWI) and reduce the risk of secondary pollution, playing a key role in promoting the sustainable development of the waste treatment industry. To address the low accuracy of incinerator flame combustion state recognition in the current MSWI process, this paper proposes a Res-Transformer combustion state recognition model based on three feature enhancement strategies. Res-Transformer serves as the backbone network, effectively integrating local flame combustion features with global features. First, an efficient multi-scale attention module is introduced into ResNet, which uses a multi-scale parallel sub-network to establish long and short dependencies. Then, a deformable multi-head attention module is designed in the Transformer layer, using deformable self-attention to extract long-term feature dependencies. Finally, we design a context feature fusion module to efficiently aggregate the spatial information of the shallow network and the channel information of the deep network, enhancing the cross-layer features extracted by the network. To verify the effectiveness of the proposed model, comparative and ablation experiments were conducted on a municipal solid waste incineration image dataset. The model achieved accuracy, precision, recall, and F1 scores of 96.16%, 96.15%, 96.07%, and 96.11%, respectively, demonstrating the effectiveness and robustness of the method.
(This article belongs to the Section Waste and Recycling)

25 pages, 8014 KiB  
Article
Breaking Barriers in Thyroid Cytopathology: Harnessing Deep Learning for Accurate Diagnosis
by Seo Young Oh, Yong Moon Lee, Dong Joo Kang, Hyeong Ju Kwon, Sabyasachi Chakraborty and Jae Hyun Park
Bioengineering 2025, 12(3), 293; https://doi.org/10.3390/bioengineering12030293 - 14 Mar 2025
Abstract
Background: We address the application of artificial intelligence (AI) techniques in thyroid cytopathology, specifically for diagnosing papillary thyroid carcinoma (PTC), the most common type of thyroid cancer. Methods: Our research introduces deep learning frameworks that analyze cytological images from fine-needle aspiration cytology (FNAC), a key preoperative diagnostic method for PTC. The first framework is a patch-level classifier referred to as “TCS-CNN”, based on a convolutional neural network (CNN) architecture, which predicts the thyroid cancer Bethesda System (TBS) category. The second framework is an attention-based deep multiple instance learning (AD-MIL) model, which employs a TCS-CNN feature extractor and an attention mechanism to aggregate features from smaller-patch-level regions into predictions for larger-patch-level regions, referred to as bag-level predictions in this context. Results: The proposed TCS-CNN framework achieves an accuracy of 97% and a recall of 96% for small-patch-level classification, accurately capturing local malignancy information. The AD-MIL framework also achieves approximately 96% accuracy and recall, demonstrating that it can maintain comparable performance while expanding diagnostic coverage to larger regions through patch aggregation. Conclusions: This study provides a feasibility analysis for thyroid cytopathology classification and visual interpretability for AI diagnosis, suggesting potential improvements in patient outcomes and reductions in healthcare costs.
(This article belongs to the Section Biosignal Processing)
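The patch-to-bag aggregation AD-MIL builds on can be sketched with generic attention-based MIL pooling: each instance embedding gets a scalar score, the scores are softmax-normalized, and the bag embedding is the attention-weighted sum. The dot-product scoring below is a simplified assumption, not the paper's exact module:

```python
import math

def attention_mil_pool(instance_feats, score_w):
    """Attention pooling over instance embeddings (lists of floats): score
    each instance by a dot product with a learned vector, softmax the
    scores, and return the weighted-sum bag embedding plus the weights."""
    scores = [sum(w * h for w, h in zip(score_w, feat))
              for feat in instance_feats]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]
    dim = len(instance_feats[0])
    bag = [sum(a * feat[d] for a, feat in zip(attn, instance_feats))
           for d in range(dim)]
    return bag, attn
```

The attention weights also provide the kind of visual interpretability the abstract mentions: high-weight patches indicate where the model found malignancy evidence.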

22 pages, 2908 KiB  
Article
LSTGINet: Local Attention Spatio-Temporal Graph Inference Network for Age Prediction
by Yi Lei, Xin Wen, Yanrong Hao, Ruochen Cao, Chengxin Gao, Peng Wang, Yuanyuan Guo and Rui Cao
Algorithms 2025, 18(3), 138; https://doi.org/10.3390/a18030138 - 3 Mar 2025
Abstract
There is a close correlation between brain aging and age. However, traditional neural networks cannot fully capture the potential correlation between age and brain aging due to their limited receptive fields. Furthermore, they focus on deep spatial semantics, ignoring the fact that effective temporal information can enrich the representation of low-level semantics. To address these limitations, a local attention spatio-temporal graph inference network (LSTGINet) was developed to explore the association between age and brain aging from both spatial and temporal perspectives. First, multi-scale temporal and spatial branches are used to increase the receptive field and model the age information simultaneously, achieving the perception of static correlation. Second, these spatio-temporal feature graphs are reconstructed, and large topographies are constructed. The graph inference node aggregation and transfer functions fully capture the hidden dynamic correlation between brain aging and age. A new local attention module is embedded in the graph inference component to enrich the global context semantics, establish dependencies and interactivity between different spatio-temporal features, and balance the differences in their spatio-temporal distributions. A newly designed weighted loss function supervises the learning of the entire prediction framework to strengthen the inference of spatio-temporal correlation. The final experimental results show MAEs of 6.33 and 6.28 on the CamCAN and NKI baseline datasets, respectively, better than current state-of-the-art age prediction methods, providing a basis for assessing the state of brain aging in adults.
(This article belongs to the Section Combinatorial Optimization, Graph, and Network Algorithms)

22 pages, 11312 KiB  
Article
Multi-Scale Kolmogorov-Arnold Network (KAN)-Based Linear Attention Network: Multi-Scale Feature Fusion with KAN and Deformable Convolution for Urban Scene Image Semantic Segmentation
by Yuanhang Li, Shuo Liu, Jie Wu, Weichao Sun, Qingke Wen, Yibiao Wu, Xiujuan Qin and Yanyou Qiao
Remote Sens. 2025, 17(5), 802; https://doi.org/10.3390/rs17050802 - 25 Feb 2025
Abstract
The introduction of an attention mechanism in remote sensing image segmentation improves segmentation accuracy. In this paper, a novel multi-scale KAN-based linear attention (MKLA) segmentation network, MKLANet, is developed to achieve better segmentation results. A hybrid global–local attention mechanism in the feature decoder is designed to enhance the aggregation of global–local context and avoid potential blocking artifacts during feature extraction and segmentation. The local attention channel adopts the MKLA block, which brings the merits of KAN convolution into a Mamba-like linear attention block to improve the handling of linear and nonlinear features and complex function approximation with few extra computations. The global attention channel uses a long-range cascade encoder–decoder block, which mainly employs a 7 × 7 depth-wise convolution token mixer and a lightweight 7 × 7 dilated depth-wise convolution to capture long-distance spatial features and retain key spatial information. In addition, to enrich the input of the attention block, a deformable convolution module is introduced between the encoder output and the corresponding-scale decoder, which improves the expressive power of the segmentation model without increasing network depth. Experimental results on the Vaihingen dataset (83.68% mIoU, 92.98% OA, 91.08% mF1), the UAVid dataset (69.78% mIoU, 96.51% OA), the LoveDA dataset (51.53% mIoU, 86.42% OA, 67.19% mF1), and the Potsdam dataset (97.14% mIoU, 92.64% OA, 93.8% mF1) outperform other advanced attention-based approaches in the segmentation of small targets and edges.

19 pages, 972 KiB  
Article
Context Geometry Volume and Warping Refinement for Real-Time Stereo Matching
by Ning Liu, Nannan Zhao, Ou Yang, Qingtian Wu and Xinyu Ouyang
Electronics 2025, 14(5), 892; https://doi.org/10.3390/electronics14050892 - 24 Feb 2025
Abstract
In the past three years, stereo matching methods based on 3D CNNs have achieved surprising results and received increasing attention. However, most stereo matching approaches seek to improve prediction accuracy by constructing and aggregating cost volumes through extensive 3D convolutions, which not only underutilizes geometric information but also neglects computational speed. Achieving high-accuracy, high-efficiency stereo matching therefore remains challenging. In this paper, we present a rapid and precise stereo matching network named CGW based on 3D CNNs, which simultaneously achieves real-time operation, considerable accuracy, and strong generalization. The network is divided into two parts. The first constructs a geometric attention cube through a lightweight feature extraction network and a lightweight 3D regularization network. The second filters the context features using the geometric attention cube to obtain the context geometric cube; finally, the disparity is predicted and refined to obtain the final disparity. We adopt MobileNetV3 as an efficient backbone for feature extraction and design 3D depthwise separable convolutions with residual structures to replace traditional 3D convolutions for constructing the cost volume and performing cost aggregation, reducing model size and improving computational speed. Additionally, we design the context geometric attention (CGA) module, embedded in the lightweight 3D regularization network, and the Warped Disparity Refinement (WDR) network to further improve disparity prediction accuracy. CGA effectively guides cost aggregation by integrating rich contextual and geometric information, while also providing feedback for feature learning to guide more efficient context feature extraction. WDR constructs a warping cost volume from the initial disparity, combined with image features, the initial disparity map, and reconstruction errors, to optimize the disparity. Starting from the initial disparity, it searches for the accurate disparity within a refined range. By narrowing the search range, WDR simplifies the network's task of locating the correct disparity (residual) while improving computational efficiency. Experiments conducted on multiple benchmark datasets show that, compared to other fast methods, CGW has advantages in both speed and accuracy and exhibits better generalization.
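The narrowed-search idea behind WDR, keeping only candidates within a small residual window around the initial disparity and picking the one with the lowest reconstruction error, can be illustrated with a toy 1-D scanline. This uses pure photometric matching and no learned network, so it is a sketch of the search principle only:

```python
def refine_disparity(left, right, d_init, radius=2):
    """For each pixel of a 1-D scanline, test disparities within +/-radius
    of the initial estimate and keep the candidate minimizing the absolute
    reconstruction error |left[x] - right[x - d]|."""
    refined = []
    for x, d0 in enumerate(d_init):
        best_d, best_err = d0, float("inf")
        for r in range(-radius, radius + 1):
            d = d0 + r
            if 0 <= x - d < len(right):          # candidate must warp in-bounds
                err = abs(left[x] - right[x - d])
                if err < best_err:
                    best_d, best_err = d, err
        refined.append(best_d)
    return refined
```

Restricting the search to a small window around the initial disparity is what makes the residual easy to locate and the refinement cheap, mirroring the efficiency argument in the abstract.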

20 pages, 7947 KiB  
Article
Towards an Efficient Remote Sensing Image Compression Network with Visual State Space Model
by Yongqiang Wang, Feng Liang, Shang Wang, Hang Chen, Qi Cao, Haisheng Fu and Zhenjiao Chen
Remote Sens. 2025, 17(3), 425; https://doi.org/10.3390/rs17030425 - 26 Jan 2025
Abstract
In the past few years, deep learning has achieved remarkable advancements in image compression. Remote sensing image compression networks focus on enhancing the similarity between the input and reconstructed images, effectively reducing the storage and bandwidth requirements for high-resolution remote sensing images. As the network's effective receptive field (ERF) expands, it can capture more feature information across a remote sensing image, thereby reducing spatial redundancy and improving compression efficiency. However, most learned image compression (LIC) techniques are CNN-based or transformer-based and often fail to balance a global ERF against computational complexity. To alleviate this issue, we propose a learned remote sensing image compression network with a visual state space model, named VMIC, to achieve a better trade-off between computational complexity and performance. Specifically, instead of stacking small convolution kernels or heavy self-attention mechanisms, we employ a 2D bidirectional selective scan mechanism. Every element within the feature map aggregates data from multiple spatial positions, establishing a global effective receptive field with linear computational complexity. We extend it to an omni-selective scan for global-spatial correlations within our Channel and Global Context Entropy Model (CGCM), enabling the integration of spatial and channel priors to minimize redundancy across slices. Experimental results demonstrate that the proposed method achieves a superior trade-off between rate-distortion performance and complexity. Compared to traditional codecs and learned image compression algorithms, our model achieves BD-rate reductions of −4.48% and −9.80% over the state-of-the-art VTM on the AID and NWPU VHR-10 datasets, respectively, as well as −6.73% and −7.93% on the panchromatic and multispectral images of the WorldView-3 remote sensing dataset.

24 pages, 1649 KiB  
Article
Heterogeneous Multi-Agent Risk-Aware Graph Encoder with Continuous Parameterized Decoder for Autonomous Driving Trajectory Prediction
by Shaoyu Sun, Chunyang Wang, Bo Xiao, Xuelian Liu, Chunhao Shi, Rongliang Sun and Ruijie Han
Electronics 2025, 14(1), 105; https://doi.org/10.3390/electronics14010105 - 30 Dec 2024
Abstract
Trajectory prediction is a critical component of autonomous driving, intelligent transportation systems, and human–robot interaction, particularly in complex environments like intersections, where diverse road constraints and multi-agent interactions significantly increase the risk of collisions. To address these challenges, a Heterogeneous Risk-Aware Graph Encoder with Continuous Parameterized Decoder for Trajectory Prediction (HRGC) is proposed. The architecture integrates a heterogeneous risk-aware local graph attention encoder, a low-rank temporal transformer, a fused lane and global interaction encoder layer, and a continuous parameterized decoder. First, a heterogeneous risk-aware edge-enhanced local attention encoder is proposed, which enhances edge features using risk metrics, constructs graph structures through graph optimization and spectral clustering, maps the enhanced edge features to corresponding graph structure indices, and enriches node features with local agent-to-agent attention. Risk-aware edge attention is aggregated to update node features, capturing spatial and collision-aware representations and embedding crucial risk information into agents' features. Next, the low-rank temporal transformer is employed to reduce computational complexity while preserving accuracy. By modeling agent-to-lane relationships, it captures critical map context, enhancing the understanding of agent behavior. Global interaction further refines node-to-node interactions via attention mechanisms, integrating risk and spatial information for improved trajectory encoding. Finally, a trajectory decoder uses the aforementioned encoders to generate control points for continuous parameterized curves. These control points are multiplied by dynamically adjusted basis functions, determined by an adaptive knot vector that responds to velocity and curvature. This mechanism ensures precise local control and superior handling of sharp turns and speed variations, yielding more accurate real-time predictions in complex scenarios. The HRGC network achieves superior performance on the Argoverse 1 benchmark, outperforming state-of-the-art methods in complex urban intersections.
(This article belongs to the Section Artificial Intelligence)
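The control-points-times-basis-functions decoding can be illustrated with the simplest continuous parameterized curve, a quadratic Bezier (a special case of the B-spline family; the paper's adaptive knot vector and learned control points are omitted here):

```python
def eval_curve(control_points, t):
    """Evaluate a quadratic Bezier curve at parameter t in [0, 1] from three
    2-D control points: each point is weighted by its Bernstein basis
    function and the weighted coordinates are summed."""
    (x0, y0), (x1, y1), (x2, y2) = control_points
    b0, b1, b2 = (1 - t) ** 2, 2 * t * (1 - t), t ** 2   # Bernstein basis
    return (b0 * x0 + b1 * x1 + b2 * x2,
            b0 * y0 + b1 * y1 + b2 * y2)
```

Because the curve is continuous in t, a decoder that outputs control points can be sampled at any temporal resolution, which is the appeal of a continuous parameterized decoder over fixed-step waypoint regression.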

31 pages, 1942 KiB  
Article
An Evidential Solar Irradiance Forecasting Method Using Multiple Sources of Information
by Mohamed Mroueh, Moustapha Doumiati, Clovis Francis and Mohamed Machmoum
Energies 2024, 17(24), 6361; https://doi.org/10.3390/en17246361 - 18 Dec 2024
Abstract
In the context of global warming, renewable energy sources, particularly wind and solar power, have garnered increasing attention in recent decades. Accurate forecasting of the energy output of microgrids (MGs) is essential for optimizing energy management, reducing maintenance costs, and prolonging the lifespan of energy storage systems. This study proposes an innovative approach to solar irradiance forecasting based on the theory of belief functions, introducing a novel and flexible evidential method for short-to-medium-term predictions. The proposed machine learning model is designed to handle missing data effectively and make optimal use of the available information. By integrating multiple predictive models, each focusing on different meteorological factors, the approach enhances forecasting accuracy. The Yager combination rule and the pignistic transformation are used to aggregate the individual models. Applied to a publicly available dataset, the method achieved promising results, with an average root mean square error (RMSE) of 27.83 W/m2 over eight distinct forecast days. This performance surpasses the best reported result of 30.21 W/m2 from recent comparable studies of one-day-ahead solar irradiance forecasting. Comparisons with deep learning-based methods, such as long short-term memory (LSTM) networks and recurrent neural networks (RNNs), demonstrate that the proposed approach is competitive with state-of-the-art techniques, delivering reliable predictions with significantly less training data. The full potential and limitations of the proposed approach are also discussed.
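The Yager combination rule mentioned in the abstract differs from Dempster's rule in how it treats conflict: masses of intersecting focal sets still multiply, but the mass of empty intersections is reassigned to the whole frame of discernment rather than renormalized away. A minimal sketch over dict-based mass functions (the example data is made up for illustration):

```python
def yager_combine(m1, m2, frame):
    """Combine two mass functions (dicts mapping frozenset focal elements to
    masses) over the same frame of discernment with Yager's rule: products
    of masses go to the intersection of their focal sets, and all conflict
    mass (empty intersections) is added to the full frame."""
    out = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                out[inter] = out.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    omega = frozenset(frame)
    out[omega] = out.get(omega, 0.0) + conflict
    return out
```

Routing conflict to the frame keeps the combined mass function additive to 1 while expressing the conflict as ignorance, which is what makes the rule robust when individual forecasters disagree.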

21 pages, 4740 KiB  
Article
Multi-Scale Geometric Feature Extraction and Global Transformer for Real-World Indoor Point Cloud Analysis
by Yisheng Chen, Yu Xiao, Hui Wu, Chongcheng Chen and Ding Lin
Mathematics 2024, 12(23), 3827; https://doi.org/10.3390/math12233827 - 3 Dec 2024
Abstract
Indoor point clouds often present significant challenges due to the complexity and variety of structures and high object similarity. Local geometric structure helps a model learn the shape features of objects at the detail level, while global context provides overall scene semantics and spatial relationships between objects. To address these challenges, we propose a novel network architecture, PointMSGT, which includes a multi-scale geometric feature extraction (MSGFE) module and a global Transformer (GT) module. The MSGFE module consists of a geometric feature extraction (GFE) module and a multi-scale attention (MSA) module. The GFE module reconstructs a triangle from each point's two neighbors and extracts detailed local geometric relationships through the triangle's centroid, normal vector, and plane constant. The MSA module extracts features through multi-scale convolutions and adaptively aggregates them, attending to both local geometric details and global semantic information at different scales, which enhances the understanding of complex scenes. The global Transformer employs a self-attention mechanism to capture long-range dependencies across the entire point cloud. The proposed method demonstrates competitive performance in real-world indoor scenarios, with a mIoU of 68.6% for semantic segmentation on S3DIS and an OA of 86.4% for classification on ScanObjectNN.
(This article belongs to the Special Issue Machine Learning Methods and Mathematical Modeling with Applications)
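The per-point triangle descriptor the GFE module extracts (centroid, normal vector, plane constant) comes down to basic vector algebra on a point and its two neighbors. This standalone sketch assumes 3-D tuples and is not the authors' implementation:

```python
def triangle_features(p, q, r):
    """From a point p and its two neighbors q, r (3-D tuples), return the
    triangle's centroid, unit normal (cross product of the edge vectors),
    and plane constant d such that n . x + d = 0 on the triangle's plane."""
    cx = (p[0] + q[0] + r[0]) / 3.0
    cy = (p[1] + q[1] + r[1]) / 3.0
    cz = (p[2] + q[2] + r[2]) / 3.0
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5 or 1.0  # guard degenerate triangles
    n = [c / norm for c in n]
    d = -(n[0] * cx + n[1] * cy + n[2] * cz)
    return (cx, cy, cz), n, d
```

Together the three quantities describe local surface position and orientation, which is richer than raw coordinates for distinguishing similarly shaped indoor objects.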

24 pages, 5614 KiB  
Article
Semantic Segmentation of Corn Leaf Blotch Disease Images Based on U-Net Integrated with RFB Structure and Dual Attention Mechanism
by Ye Mu, Ke Li, Yu Sun and Yu Bao
Agronomy 2024, 14(11), 2652; https://doi.org/10.3390/agronomy14112652 - 11 Nov 2024
Abstract
Northern corn leaf blight (NCLB) is caused by a fungus; corn is susceptible to the disease throughout its growing period, and infection has a significant impact on yield. To address the under-segmentation, over-segmentation, and low segmentation accuracy of traditional segmentation models for northern corn leaf blight, this study proposes a segmentation method based on an improved U-Net. By introducing a convolutional layer and a max-pooling layer into a VGG19 network, fusing the channel and spatial attention modules of the convolutional block attention module (CBAM), and incorporating the squeeze-and-excitation (SE) attention mechanism, the method enhances image feature decoding, integrates the feature maps of each layer, strengthens feature extraction, expands receptive fields, aggregates context information, and reduces the loss of location and dense semantic information caused by pooling. Findings show that the proposed NCLB-Net significantly improves the MIoU and PA indices, reaching 92.43% and 94.71%, respectively. Compared with traditional methods such as U-Net, SETR, DAnet, OCnet, and PSPNet, the MIoU is improved by 20.81%, 16.10%, 9.79%, 5.27%, and 11.06%, and the PA by 11.49%, 8.18%, 9.54%, 13.11%, and 6.26%, respectively.
(This article belongs to the Section Precision and Digital Agriculture)

19 pages, 3453 KiB  
Article
Autonomous UAV Chasing with Monocular Vision: A Learning-Based Approach
by Yuxuan Jin, Tiantian Song, Chengjie Dai, Ke Wang and Guanghua Song
Aerospace 2024, 11(11), 928; https://doi.org/10.3390/aerospace11110928 - 9 Nov 2024
Cited by 1 | Viewed by 858
Abstract
In recent years, unmanned aerial vehicles (UAVs) have shown significant potential across diverse applications, drawing attention from both academia and industry. In specific scenarios, UAVs are expected to achieve formation flying without relying on communication or external assistance. In this context, our work focuses on the classic leader-follower formation and presents a learning-based UAV chasing control method that enables a quadrotor UAV to autonomously chase a highly maneuverable fixed-wing UAV. The proposed method utilizes a neural network called Vision Follow Net (VFNet), which integrates monocular visual data with the UAV’s flight state information. Utilizing a multi-head self-attention mechanism, VFNet aggregates data over a time window to predict the waypoints for the chasing flight. The quadrotor’s yaw angle is controlled by calculating the line-of-sight (LOS) angle to the target, ensuring that the target remains within the onboard camera’s field of view during the flight. A simulation flight system is developed and used for neural network training and validation. Experimental results indicate that the quadrotor maintains stable chasing performance through various maneuvers of the fixed-wing UAV and can sustain formation over long durations. Our research explores the use of end-to-end neural networks for UAV formation flying, spanning from perception to control. Full article
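The line-of-sight yaw control described above reduces to a planar angle computation; a minimal sketch follows (the coordinate convention is an assumption for illustration, not taken from the paper):

```python
import math

def los_yaw(own_pos, target_pos):
    """Yaw angle (radians) of the line of sight from the chaser to the
    target in the horizontal x-y plane; commanding this yaw keeps the
    target near the center of a forward-facing camera's field of view."""
    dx = target_pos[0] - own_pos[0]
    dy = target_pos[1] - own_pos[1]
    return math.atan2(dy, dx)  # atan2 handles all quadrants and dx == 0
```

For example, a target directly ahead on the x-axis gives a yaw of 0, and a target directly to the left (positive y) gives pi/2.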
(This article belongs to the Section Aeronautics)

22 pages, 4866 KiB  
Article
TCEDN: A Lightweight Time-Context Enhanced Depression Detection Network
by Keshan Yan, Shengfa Miao, Xin Jin, Yongkang Mu, Hongfeng Zheng, Yuling Tian, Puming Wang, Qian Yu and Da Hu
Life 2024, 14(10), 1313; https://doi.org/10.3390/life14101313 - 16 Oct 2024
Viewed by 971
Abstract
The automatic video-based recognition of depression is becoming increasingly important in clinical applications. However, traditional depression recognition models still face practical challenges, such as high computational costs, the poor effectiveness of facial movement features, and spatial feature degradation due to model stitching. To overcome these challenges, this work proposes a lightweight Time-Context Enhanced Depression Detection Network (TCEDN). We first use attention-weighted blocks to aggregate and enhance video frame-level features, easing the model’s computational workload. Next, by integrating the temporal and spatial changes of raw video features and facial movement features with self-learned weights, we improve the precision of depression detection. Finally, a fusion network combining a 3-Dimensional Convolutional Neural Network (3D-CNN) and a Convolutional Long Short-Term Memory network (ConvLSTM) is constructed to minimize spatial feature loss by avoiding feature flattening and to predict depression scores. Tests on the AVEC2013 and AVEC2014 datasets reveal that our approach yields results on par with state-of-the-art techniques for detecting depression from video, with significantly lower computational complexity than mainstream methods. Full article
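The attention-weighted frame aggregation step can be sketched as softmax-weighted pooling of per-frame features. This is a hypothetical minimal version (the score vector `w` stands in for learned parameters), not the TCEDN code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()

def attention_pool(frames, w):
    """Aggregate T frame-level feature vectors (T, D) into one (D,) vector:
    score each frame, softmax-normalize the scores, take the weighted sum."""
    alpha = softmax(frames @ w)   # (T,) attention weights summing to 1
    return alpha @ frames         # (D,) aggregated clip-level feature
```

Collapsing the time axis this way before the heavier 3D-CNN/ConvLSTM stages is one way such a design keeps the per-video computational cost low.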
