Search Results (15)

Search Parameters:
Keywords = channel cross-concatenation

16 pages, 1714 KB  
Article
MCAF-Net: Multi-Channel Temporal Cross-Attention Network with Dynamic Gating for Sleep Stage Classification
by Xuegang Xu, Quan Wang, Changyuan Wang and Yaxin Zhang
Sensors 2025, 25(14), 4251; https://doi.org/10.3390/s25144251 - 8 Jul 2025
Viewed by 512
Abstract
Automated sleep stage classification is essential for objective sleep evaluation and clinical diagnosis. While numerous algorithms have been developed, the predominant existing methods utilize single-channel electroencephalogram (EEG) signals, neglecting the complementary physiological information available from other channels. Standard polysomnography (PSG) recordings capture multiple concurrent biosignals, where sophisticated integration of these multi-channel data represents a critical factor for enhanced classification accuracy. Conventional multi-channel fusion techniques typically employ elementary concatenation approaches that insufficiently model the intricate cross-channel correlations, consequently limiting classification performance. To overcome these shortcomings, we present MCAF-Net, a novel network architecture that employs temporal convolution modules to extract channel-specific features from each input signal and introduces a dynamic gated multi-head cross-channel attention mechanism (MCAF) to effectively model the interdependencies between different physiological channels. Experimental results show that our proposed method successfully integrates information from multiple channels, achieving significant improvements in sleep stage classification compared to the vast majority of existing methods.
(This article belongs to the Section Sensor Networks)
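
The gated cross-channel attention described above can be sketched compactly. The following is a minimal illustration of the general mechanism, not the authors' MCAF implementation; the module name, gating form, feature dimensions, and residual wiring are all assumptions.

```python
import torch
import torch.nn as nn

class GatedCrossChannelAttention(nn.Module):
    """Sketch: one channel's features query the other channels, gated dynamically."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, query_feats, other_feats):
        # query_feats: (B, T, D) features of one physiological channel
        # other_feats: (B, T, D) features of the remaining channels
        cross, _ = self.attn(query_feats, other_feats, other_feats)
        g = self.gate(torch.cat([query_feats, cross], dim=-1))  # dynamic gate
        return query_feats + g * cross                          # gated residual fusion

x = torch.randn(2, 30, 64)  # e.g. one EEG channel, 30 time steps
y = torch.randn(2, 30, 64)  # e.g. EOG/EMG channel features
print(GatedCrossChannelAttention(64)(x, y).shape)  # torch.Size([2, 30, 64])
```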

11 pages, 670 KB  
Article
LLM-Enhanced Chinese Morph Resolution in E-Commerce Live Streaming Scenarios
by Xiaoye Ouyang, Liu Yuan, Xiaocheng Hu, Jiahao Zhu and Jipeng Qiang
Entropy 2025, 27(7), 698; https://doi.org/10.3390/e27070698 - 29 Jun 2025
Viewed by 513
Abstract
E-commerce live streaming in China has become a major retail channel, yet hosts often employ subtle phonetic or semantic “morphs” to evade moderation and make unsubstantiated claims, posing risks to consumers. To address this, we study the Live Auditory Morph Resolution (LiveAMR) task, which restores morphed speech transcriptions to their true forms. Building on prior text-based morph resolution, we propose an LLM-enhanced training framework that mines three types of explanation knowledge—predefined morph-type labels, LLM-generated reference corrections, and natural-language rationales constrained for clarity and comprehensiveness—from a frozen large language model. These annotations are concatenated with the original morphed sentence and used to fine-tune a lightweight T5 model under a standard cross-entropy objective. In experiments on two test sets (in-domain and out-of-domain), our method achieves substantial gains over baselines, improving F0.5 by up to 7 pp in-domain (to 0.943) and 5 pp out-of-domain (to 0.799) compared to a strong T5 baseline. These results demonstrate that structured LLM-derived signals can be mined without fine-tuning the LLM itself and injected into small models to yield efficient, accurate morph resolution.
(This article belongs to the Special Issue Natural Language Processing and Data Mining)
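
As a rough sketch of the training setup described — structured LLM-derived annotations concatenated with the morphed sentence, then fine-tuning a small T5 under cross-entropy — the snippet below uses Hugging Face transformers. The field separators, example text, and the t5-small checkpoint are illustrative assumptions, not the paper's exact format.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

morphed = "this product cures a11 illnesses"        # hypothetical morphed transcription
annotations = {
    "type": "phonetic-morph",                        # predefined morph-type label
    "reference": "this product cures all illnesses", # LLM-generated reference correction
    "rationale": "'a11' is a homoglyph of 'all'",    # LLM natural-language rationale
}
# concatenate explanation knowledge with the morphed sentence (format assumed)
source = (f"type: {annotations['type']} | ref: {annotations['reference']} | "
          f"why: {annotations['rationale']} | input: {morphed}")
target = "this product cures all illnesses"

batch = tok(source, return_tensors="pt")
labels = tok(target, return_tensors="pt").input_ids
loss = model(**batch, labels=labels).loss  # standard cross-entropy objective
loss.backward()                            # one fine-tuning step (optimizer omitted)
```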

15 pages, 1265 KB  
Article
Research on a Short-Term Power Load Forecasting Method Based on a Three-Channel LSTM-CNN
by Xiaojing Zhao, Huimin Peng, Lanyong Zhang and Hongwei Ma
Electronics 2025, 14(11), 2262; https://doi.org/10.3390/electronics14112262 - 31 May 2025
Viewed by 677
Abstract
To address the problem of insufficient fusion of multi-source heterogeneous features in short-term power load forecasting, this paper proposes a three-channel LSTM-CNN hybrid forecasting model. The method extracts the temporal characteristics of time, weather, and historical loads through independent LSTM channels and realizes cross-modal spatial correlation mining with a Convolutional Neural Network (CNN). The time channel takes hour, week, and holiday codes as input to capture daily/weekly cycle patterns. The meteorological channel integrates real-time data such as temperature and humidity and models their nonlinear, delayed effect on the load. The historical load channel analyzes the load sequence of the past 24 h to capture its internal trend and fluctuation characteristics. The outputs of the three channels are concatenated and fed into a one-dimensional convolutional layer, where cross-modal cooperative features are extracted through local perception. Finally, the 24 h load prediction is produced by a fully connected layer. The experimental results show that the three-channel LSTM-CNN model predicts better than existing models, reducing the mean absolute percentage error on two datasets to 1.367% and 0.974%, respectively. The research results provide an extensible framework for multi-source time series data modeling, supporting precise dispatching of smart grids and optimal energy allocation.
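
A minimal sketch of the three-channel pipeline (independent LSTMs → channel concatenation → 1D convolution → fully connected 24 h head), with assumed feature dimensions:

```python
import torch
import torch.nn as nn

class ThreeChannelLSTMCNN(nn.Module):
    def __init__(self, d_time=8, d_weather=5, d_load=1, hidden=32, horizon=24):
        super().__init__()
        self.lstm_time = nn.LSTM(d_time, hidden, batch_first=True)
        self.lstm_wx = nn.LSTM(d_weather, hidden, batch_first=True)
        self.lstm_load = nn.LSTM(d_load, hidden, batch_first=True)
        self.conv = nn.Conv1d(3 * hidden, 64, kernel_size=3, padding=1)
        self.head = nn.Linear(64 * 24, horizon)

    def forward(self, x_time, x_wx, x_load):
        # each input: (B, 24, features) over the past 24 hours
        h1, _ = self.lstm_time(x_time)
        h2, _ = self.lstm_wx(x_wx)
        h3, _ = self.lstm_load(x_load)
        h = torch.cat([h1, h2, h3], dim=-1).transpose(1, 2)  # (B, 3*hidden, 24)
        z = torch.relu(self.conv(h)).flatten(1)              # local cross-modal features
        return self.head(z)                                  # 24-hour load forecast

m = ThreeChannelLSTMCNN()
out = m(torch.randn(4, 24, 8), torch.randn(4, 24, 5), torch.randn(4, 24, 1))
print(out.shape)  # torch.Size([4, 24])
```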

23 pages, 14157 KB  
Article
A Spatial–Frequency Combined Transformer for Cloud Removal of Optical Remote Sensing Images
by Fulian Zhao, Chenlong Ding, Xin Li, Runliang Xia, Caifeng Wu and Xin Lyu
Remote Sens. 2025, 17(9), 1499; https://doi.org/10.3390/rs17091499 - 23 Apr 2025
Cited by 1 | Viewed by 934
Abstract
Cloud removal is a vital preprocessing step in optical remote sensing images (RSIs), directly enhancing image quality and providing a high-quality data foundation for downstream tasks, such as water body extraction and land cover classification. Existing methods attempt to combine spatial and frequency features for cloud removal, but they rely on shallow feature concatenation or simplistic addition operations, which fail to establish effective cross-domain synergistic mechanisms. These approaches lead to edge blurring and noticeable color distortions. To address this issue, we propose a spatial–frequency collaborative enhancement Transformer network named SFCRFormer, which significantly improves cloud removal performance. The core of SFCRFormer is the spatial–frequency combined Transformer (SFCT) block, which implements cross-domain feature reinforcement through a dual-branch spatial attention (DBSA) module and frequency self-attention (FreSA) module to effectively capture global context information. The DBSA module enhances the representation of spatial features by decoupling spatial-channel dependencies via parallelized feature refinement paths, surpassing the performance of traditional single-branch attention mechanisms in maintaining the overall structure of the image. FreSA leverages fast Fourier transform to convert features into the frequency domain, using frequency differences between object and cloud regions to achieve precise cloud detection and fine-grained removal. In order to further enhance the features extracted by DBSA and FreSA, we design the dual-domain feed-forward network (DDFFN), which effectively improves the detail fidelity of the restored image by multi-scale convolution for local refinement and frequency transformation for global structural optimization. A composite loss function, incorporating Charbonnier loss and Structural Similarity Index (SSIM) loss, is employed to optimize model training and balance pixel-level accuracy with structural fidelity. Experimental evaluations on public datasets demonstrate that SFCRFormer outperforms state-of-the-art methods across various quantitative metrics, including PSNR and SSIM, while delivering superior visual results.
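
The frequency-domain step that FreSA builds on — moving features into the frequency domain with a fast Fourier transform and reweighting components there — can be illustrated as below. This is a generic frequency-modulation sketch under assumed shapes, not the published FreSA module.

```python
import torch
import torch.nn as nn

class FrequencyModulation(nn.Module):
    def __init__(self, channels: int, h: int, w: int):
        super().__init__()
        # learnable complex-valued filter over the half-spectrum (assumption)
        self.weight = nn.Parameter(torch.ones(channels, h, w // 2 + 1, dtype=torch.cfloat))

    def forward(self, x):                        # x: (B, C, H, W) spatial features
        spec = torch.fft.rfft2(x, norm="ortho")  # -> (B, C, H, W//2+1), complex
        spec = spec * self.weight                # reweight frequency components
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

x = torch.randn(1, 16, 32, 32)
print(FrequencyModulation(16, 32, 32)(x).shape)  # torch.Size([1, 16, 32, 32])
```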

26 pages, 16081 KB  
Article
Deep Learning for Enhanced-Resolution Reconstruction of Sentinel-1 Backscatter NRCS in China’s Offshore Seas
by Xiaoxiao Zhang, Yu Du, Xiang Su and Zhensen Wu
Remote Sens. 2025, 17(8), 1385; https://doi.org/10.3390/rs17081385 - 13 Apr 2025
Viewed by 714
Abstract
High-precision and high-resolution scattering data play a crucial role in remote sensing applications, including ocean environment monitoring, target recognition, and classification. This paper proposes a deep learning-based model aimed at enhancing and reconstructing the spatial resolution of Sentinel-1 backscatter NRCS (Normalized Radar Cross Section) data for China’s offshore seas, including the Bohai Sea, Yellow Sea, East China Sea, Taiwan Strait, and South China Sea. The proposed model innovatively integrates a Self-Attention Feature Fusion based on the Weighted Channel Concatenation (SAFF-WCC) module, combined with the Global Attention Mechanism (GAM) and High-Order Attention (HOA) modules. The feature fusion module effectively regulates the proportion of each feature during the fusion process through weight allocation, significantly enhancing the effectiveness of multi-feature integration. The experimental results show that the model can effectively enhance the fine structural features of marine targets when the resolution is doubled, though the enhancement effect is slightly diminished when the resolution is quadrupled. For high-resolution data reconstruction, the proposed model demonstrates significant advantages over traditional methods under a scale factor of 2 across four key evaluation metrics, including PSNR, SSIM, MS-SSIM, and MAPE. These results indicate that the proposed deep learning-based model is not only well-suited for scattering data from China’s offshore seas but also provides robust support for subsequent research on ocean target recognition, as well as the compression and transmission of SAR data.
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-III)
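
A minimal sketch of the weighted channel concatenation idea, in which each branch is scaled by a learned fusion weight before concatenation; the softmax parameterization of the weights is an assumption.

```python
import torch
import torch.nn as nn

class WeightedChannelConcat(nn.Module):
    def __init__(self, num_branches: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_branches))

    def forward(self, branches):               # list of (B, C_i, H, W) tensors
        w = torch.softmax(self.logits, dim=0)  # fusion proportions sum to 1
        return torch.cat([w[i] * f for i, f in enumerate(branches)], dim=1)

f1, f2 = torch.randn(1, 8, 16, 16), torch.randn(1, 8, 16, 16)
print(WeightedChannelConcat(2)([f1, f2]).shape)  # torch.Size([1, 16, 16, 16])
```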

19 pages, 14103 KB  
Article
DCFA-YOLO: A Dual-Channel Cross-Feature-Fusion Attention YOLO Network for Cherry Tomato Bunch Detection
by Shanglei Chai, Ming Wen, Pengyu Li, Zhi Zeng and Yibin Tian
Agriculture 2025, 15(3), 271; https://doi.org/10.3390/agriculture15030271 - 26 Jan 2025
Cited by 5 | Viewed by 1596
Abstract
To better utilize multimodal information for agriculture applications, this paper proposes a cherry tomato bunch detection network using dual-channel cross-feature fusion. It aims to improve detection performance by employing the complementary information of color and depth images. Using the existing YOLOv8_n as the baseline framework, it incorporates a dual-channel cross-fusion attention mechanism for multimodal feature extraction and fusion. In the backbone network, a ShuffleNetV2 unit is adopted to optimize the efficiency of initial feature extraction. During the feature fusion stage, two modules are introduced by using re-parameterization, dynamic weighting, and efficient concatenation to strengthen the representation of multimodal information. Meanwhile, the CBAM mechanism is integrated at different feature extraction stages, combined with the improved SPPF_CBAM module, to effectively enhance the focus and representation of critical features. Experimental results using a dataset obtained from a commercial greenhouse demonstrate that DCFA-YOLO excels in cherry tomato bunch detection, achieving an mAP50 of 96.5%, a significant improvement over the baseline model, while drastically reducing computational complexity. Furthermore, comparisons with other state-of-the-art YOLO variants and object detection models validate its detection performance. This provides an efficient solution for multimodal fusion for real-time fruit detection in the context of robotic harvesting, running at 52 fps on a regular computer.
(This article belongs to the Special Issue Computational, AI and IT Solutions Helping Agriculture)
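
One plausible reading of dual-channel cross-feature fusion, sketched below as an assumption, is to let each modality's channel gating be driven by a global descriptor of the other modality before concatenation; the actual DCFA-YOLO modules differ, and all names here are illustrative.

```python
import torch
import torch.nn as nn

class CrossModalChannelFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.fc_rgb = nn.Linear(channels, channels)
        self.fc_depth = nn.Linear(channels, channels)

    def forward(self, rgb, depth):             # both (B, C, H, W)
        # channel descriptors from the *other* modality drive the gating
        g_rgb = torch.sigmoid(self.fc_rgb(depth.mean(dim=(2, 3))))
        g_depth = torch.sigmoid(self.fc_depth(rgb.mean(dim=(2, 3))))
        rgb = rgb * g_rgb[:, :, None, None]
        depth = depth * g_depth[:, :, None, None]
        return torch.cat([rgb, depth], dim=1)  # fused multimodal features

print(CrossModalChannelFusion(32)(torch.randn(2, 32, 20, 20),
                                  torch.randn(2, 32, 20, 20)).shape)
```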

18 pages, 6627 KB  
Article
Attention-Guided Fusion and Classification for Hyperspectral and LiDAR Data
by Jing Huang, Yinghao Zhang, Fang Yang and Li Chai
Remote Sens. 2024, 16(1), 94; https://doi.org/10.3390/rs16010094 - 25 Dec 2023
Cited by 8 | Viewed by 2931
Abstract
The joint use of hyperspectral image (HSI) and Light Detection And Ranging (LiDAR) data has been widely applied for land cover classification because it can comprehensively represent urban structures and land material properties. However, existing methods fail to combine the different image information effectively, which limits the semantic relevance of different data sources. To solve this problem, in this paper, an Attention-guided Fusion and Classification framework based on a Convolutional Neural Network (AFC-CNN) is proposed to classify land cover based on the joint use of HSI and LiDAR data. In the feature extraction module, AFC-CNN employs a three-dimensional convolutional neural network (3D-CNN) combined with a multi-scale structure to extract the spatial-spectral features of HSI, and uses a 2D-CNN to extract the spatial features from LiDAR data. Simultaneously, a spectral attention mechanism is adopted to assign weights to the spectral channels, and a cross attention mechanism is introduced to impart significant spatial weights from LiDAR to HSI, which enhances the interaction between HSI and LiDAR data and leverages the fusion information. The two feature branches are then concatenated and transferred to the feature fusion module for higher-level feature extraction and fusion. In the fusion module, AFC-CNN adopts depthwise separable convolutions connected through residual structures to obtain the advanced features, which helps reduce computational complexity and improve the fitting ability of the model. Finally, the fused features are sent into the linear classification module for final classification. Experimental results on three datasets (Houston, MUUFL, and Trento) show that the proposed AFC-CNN framework achieves better classification accuracy than state-of-the-art algorithms. The overall accuracies of AFC-CNN on the Houston, MUUFL, and Trento datasets are 94.2%, 95.3%, and 99.5%, respectively.
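
The spectral attention step — squeezing each HSI band to a descriptor and reweighting the bands before the branches are concatenated — can be sketched as follows, with the band count and reduction ratio assumed:

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    def __init__(self, bands: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(bands, bands // reduction), nn.ReLU(),
            nn.Linear(bands // reduction, bands), nn.Sigmoid())

    def forward(self, hsi):                 # (B, bands, H, W)
        w = self.mlp(hsi.mean(dim=(2, 3)))  # per-band attention weights
        return hsi * w[:, :, None, None]

hsi = torch.randn(2, 64, 11, 11)            # 64 spectral bands, 11x11 patch
print(SpectralAttention(64)(hsi).shape)      # torch.Size([2, 64, 11, 11])
```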

17 pages, 6980 KB  
Article
A Maturity Detection Method for Hemerocallis Citrina Baroni Based on Lightweight and Attention Mechanism
by Bin Sheng, Ligang Wu and Nan Zhang
Appl. Sci. 2023, 13(21), 12043; https://doi.org/10.3390/app132112043 - 4 Nov 2023
Cited by 3 | Viewed by 1574
Abstract
Hemerocallis citrina Baroni at different maturity levels has different food and medicinal uses, and different economic benefits and sales value. However, Hemerocallis citrina Baroni grows quickly, the harvesting cycle is short, and maturity identification depends entirely on experience, so harvesting efficiency is low, reliance on manual labor is heavy, and the identification standard is not uniform. In this paper, we propose GCB YOLOv7, a Hemerocallis citrina Baroni maturity detection method based on a lightweight neural network and an attention mechanism. First, lightweight Ghost convolution is introduced to reduce the difficulty of feature extraction and decrease the number of computations and parameters of the model. Second, between the feature extraction backbone and the feature fusion network, the CBAM mechanism is added to perform feature extraction independently in the channel and spatial dimensions, which sharpens the focus of feature extraction and enhances the expressive ability of the model. Last, in the feature fusion network, BiFPN is used instead of the concatenation-based feature fusion method, which increases the information fusion channels while decreasing the number of edge nodes and realizing cross-channel information fusion. The experimental results show that the improved GCB YOLOv7 algorithm reduces the number of parameters and floating-point operations by about 2.03 million and 7.3 G, respectively. The training time is reduced by about 0.122 h, and the model volume is compressed from 74.8 M to 70.8 M. In addition, the average precision is improved from 91.3% to 92.2%, mAP@0.5 and mAP@0.5:0.95 are improved by about 1.38% and 0.20%, respectively, and the detection efficiency reaches 10 ms/frame, which meets real-time performance requirements. The improved GCB YOLOv7 algorithm is thus not only lightweight but also effectively improves detection precision.
(This article belongs to the Special Issue Intelligent Control of Unmanned Aerial Vehicles)
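
A minimal sketch of the Ghost convolution idea used for lightweighting: a cheap grouped convolution generates "ghost" feature maps from a small set of intrinsic maps, roughly halving the dense convolution cost. The channel split and kernel sizes are illustrative, not the GCB YOLOv7 configuration.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        c_mid = c_out // 2
        self.primary = nn.Conv2d(c_in, c_mid, k, padding=k // 2)
        # depthwise convolution: cheap generation of the remaining maps
        self.cheap = nn.Conv2d(c_mid, c_out - c_mid, k, padding=k // 2, groups=c_mid)

    def forward(self, x):
        y = self.primary(x)                       # intrinsic feature maps
        return torch.cat([y, self.cheap(y)], 1)   # append cheap "ghost" maps

print(GhostConv(16, 32)(torch.randn(1, 16, 40, 40)).shape)  # (1, 32, 40, 40)
```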

34 pages, 12677 KB  
Article
CCC-SSA-UNet: U-Shaped Pansharpening Network with Channel Cross-Concatenation and Spatial–Spectral Attention Mechanism for Hyperspectral Image Super-Resolution
by Zhichao Liu, Guangliang Han, Hang Yang, Peixun Liu, Dianbing Chen, Dongxu Liu and Anping Deng
Remote Sens. 2023, 15(17), 4328; https://doi.org/10.3390/rs15174328 - 2 Sep 2023
Cited by 5 | Viewed by 2574
Abstract
A hyperspectral image (HSI) has a very high spectral resolution, which can reflect the target’s material properties well. However, the limited spatial resolution poses a constraint on its applicability. In recent years, some hyperspectral pansharpening studies have attempted to integrate HSI with panchromatic (PAN) images to improve the spatial resolution of HSI. Although some achievements have been made, there are still shortcomings, such as insufficient utilization of multi-scale spatial and spectral information, high computational complexity, and long network model inference time. To address the above issues, we propose a novel U-shaped hyperspectral pansharpening network with channel cross-concatenation and a spatial–spectral attention mechanism (CCC-SSA-UNet). A novel channel cross-concatenation (CCC) method was designed to effectively enhance the fusion ability of different input source images and the fusion ability between feature maps at different levels. Regarding network design, integrating a UNet based on an encoder–decoder architecture with a spatial–spectral attention network (SSA-Net) based on residual spatial–spectral attention (Res-SSA) blocks further enhances the ability to extract spatial and spectral features. Experiments show that our proposed CCC-SSA-UNet exhibits state-of-the-art performance and has a shorter inference runtime and lower GPU memory consumption than most existing hyperspectral pansharpening methods.
(This article belongs to the Section AI Remote Sensing)
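
One natural reading of channel cross-concatenation — interleaving the channels of two sources rather than stacking them block-wise, so adjacent channels mix both inputs — is sketched below as an assumption; the paper defines the exact scheme.

```python
import torch

def channel_cross_concat(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a, b: (B, C, H, W) with equal C; output interleaves a0, b0, a1, b1, ...
    B, C, H, W = a.shape
    return torch.stack([a, b], dim=2).reshape(B, 2 * C, H, W)

a = torch.arange(4.0).view(1, 2, 1, 2)  # two channels per source
b = -a
print(channel_cross_concat(a, b)[0, :, 0, 0])  # channels alternate: a0, b0, a1, b1
```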

16 pages, 3416 KB  
Article
LSGP-USFNet: Automated Attention Deficit Hyperactivity Disorder Detection Using Locations of Sophie Germain’s Primes on Ulam’s Spiral-Based Features with Electroencephalogram Signals
by Orhan Atila, Erkan Deniz, Ali Ari, Abdulkadir Sengur, Subrata Chakraborty, Prabal Datta Barua and U. Rajendra Acharya
Sensors 2023, 23(16), 7032; https://doi.org/10.3390/s23167032 - 8 Aug 2023
Cited by 7 | Viewed by 2407
Abstract
Anxiety, learning disabilities, and depression are symptoms of attention deficit hyperactivity disorder (ADHD), a pattern of hyperactivity, impulsivity, and inattention. For the early diagnosis of ADHD, electroencephalogram (EEG) signals are widely used. However, direct analysis of an EEG is highly challenging, as it is time-consuming and the signal is nonlinear and nonstationary in nature. Thus, in this paper, a novel approach (LSGP-USFNet) is developed based on the patterns obtained from Ulam’s spiral and Sophie Germain’s prime numbers. The EEG signals are initially filtered to remove noise and segmented with a non-overlapping sliding window of a length of 512 samples. Then, a time–frequency analysis approach, namely continuous wavelet transform, is applied to each channel of the segmented EEG signal to interpret it in the time and frequency domain. The obtained time–frequency representation is saved as a time–frequency image, and a non-overlapping n × n sliding window is applied to this image for patch extraction. An n × n Ulam’s spiral is localized on each patch, and gray levels are acquired from the patch as features at the positions where Sophie Germain’s primes are located in Ulam’s spiral. All gray tones from all patches are concatenated to construct the features for the ADHD and normal classes. A gray tone selection algorithm, namely ReliefF, is employed on the representative features to acquire the final, most important gray tones. The support vector machine classifier is used with 10-fold cross-validation. Our proposed approach, LSGP-USFNet, was developed using a publicly available dataset and obtained an accuracy of 97.46% in detecting ADHD automatically. Our model is ready to be validated using a bigger database, and it can also be used to detect other childhood neurological disorders.
(This article belongs to the Special Issue EEG Sensors for Biomedical Applications)
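
The feature-location idea can be sketched as follows: build an n × n Ulam spiral, mark cells holding Sophie Germain primes (p such that both p and 2p + 1 are prime), and read the gray levels of a time-frequency patch at those cells. The spiral orientation, the sympy primality test, and the random stand-in patch are assumptions.

```python
import numpy as np
from sympy import isprime

def ulam_spiral(n: int) -> np.ndarray:
    """Fill an n x n grid with 1..n^2 along a square spiral from the center."""
    grid = np.zeros((n, n), dtype=int)
    x = y = n // 2
    dx, dy, step, num = 1, 0, 1, 1
    grid[y, x] = num
    while num < n * n:
        for _ in range(2):                  # two runs per ring side length
            for _ in range(step):
                if num >= n * n:
                    break
                x, y = x + dx, y + dy
                num += 1
                grid[y, x] = num
            dx, dy = -dy, dx                # turn 90 degrees
        step += 1
    return grid

spiral = ulam_spiral(5)
is_sg = np.vectorize(lambda v: bool(isprime(v) and isprime(2 * v + 1)),
                     otypes=[bool])
mask = is_sg(spiral)                         # Sophie Germain prime positions
patch = np.random.randint(0, 256, (5, 5))    # stand-in time-frequency patch
features = patch[mask]                       # gray levels at SG-prime cells
print(spiral, mask.astype(int), features, sep="\n")
```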

23 pages, 1849 KB  
Article
Emotion Recognition from Large-Scale Video Clips with Cross-Attention and Hybrid Feature Weighting Neural Networks
by Siwei Zhou, Xuemei Wu, Fan Jiang, Qionghao Huang and Changqin Huang
Int. J. Environ. Res. Public Health 2023, 20(2), 1400; https://doi.org/10.3390/ijerph20021400 - 12 Jan 2023
Cited by 23 | Viewed by 5869
Abstract
The emotion of humans is an important indicator or reflection of their mental states, e.g., satisfaction or stress, and recognizing or detecting emotion from different media is essential for sequence analysis and for certain applications, e.g., mental health assessments, job stress level estimation, and tourist satisfaction assessments. Emotion recognition based on computer vision techniques, as an important method of detecting emotion from visual media (e.g., images or videos) of human behaviors with the use of plentiful emotional cues, has been extensively investigated because of its significant applications. However, most existing models neglect inter-feature interaction and use simple concatenation for feature fusion, failing to capture the crucial complementary gains between face and context information in video clips, which is significant in addressing the problems of emotion confusion and emotion misunderstanding. Accordingly, in this paper, to fully exploit the complementary information between face and context features, we present a novel cross-attention and hybrid feature weighting network to achieve accurate emotion recognition from large-scale video clips, and the proposed model consists of a dual-branch encoding (DBE) network, a hierarchical-attention encoding (HAE) network, and a deep fusion (DF) block. Specifically, the face and context encoding blocks in the DBE network generate the respective shallow features. After this, the HAE network uses the cross-attention (CA) block to investigate and capture the complementarity between facial expression features and their contexts via a cross-channel attention operation. The element recalibration (ER) block is introduced to revise the feature map of each channel by embedding global information. Moreover, the adaptive-attention (AA) block in the HAE network is developed to infer the optimal feature fusion weights and obtain the adaptive emotion features via a hybrid feature weighting operation. Finally, the DF block integrates these adaptive emotion features to predict an individual emotional state. Extensive experimental results on the CAER-S dataset demonstrate the effectiveness of our method, exhibiting its potential in the analysis of tourist reviews with video clips, estimation of job stress levels with visual emotional evidence, or assessments of mental health with visual media.
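
The adaptive feature-weighting step — inferring blend weights for the face and context features from their combination — can be sketched as below; the two-way softmax and vector dimensions are assumptions, not the AA block's exact design.

```python
import torch
import torch.nn as nn

class AdaptiveWeightFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(2 * dim, 2)  # one score per branch

    def forward(self, face, context):        # both (B, D)
        w = torch.softmax(self.scorer(torch.cat([face, context], -1)), -1)
        # blend the two branches with the inferred fusion weights
        return w[:, :1] * face + w[:, 1:] * context

print(AdaptiveWeightFusion(128)(torch.randn(4, 128), torch.randn(4, 128)).shape)
```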

19 pages, 6842 KB  
Article
DAMNet: A Dual Adjacent Indexing and Multi-Deraining Network for Real-Time Image Deraining
by Penghui Zhao, Haowen Zheng, Suigu Tang, Zongren Chen and Yangyan Liang
Fractal Fract. 2023, 7(1), 24; https://doi.org/10.3390/fractalfract7010024 - 26 Dec 2022
Cited by 3 | Viewed by 3415
Abstract
Image deraining is increasingly critical in the domain of computer vision. However, there is a lack of fast deraining algorithms for multiple images without temporal and spatial features. To fill this gap, an efficient image-deraining algorithm based on dual adjacent indexing and multi-deraining layers is proposed to increase deraining efficiency. The deraining operation is based on two proposals: the dual adjacent method and a joint training method based on multi-deraining layers. The dual adjacent structure indexes pixels from adjacent features of the previous layer to merge with features produced by the deraining layers, and the merged features are reshaped to prepare for the loss computation. The joint training method is based on multi-deraining layers, which utilise the pixel-shuffle operation to prepare various deraining features for the multi-loss functions. The multi-loss functions jointly compute structural similarity from the reshaped and deraining features. The features produced by the four deraining layers are concatenated in the channel dimension to obtain the total structural similarity and mean square error. In experiments, the proposed deraining model is efficient on primary rain datasets, reaching more than 200 fps, and maintains impressive results in single-dataset and cross-dataset evaluations, demonstrating that it ranks among the most advanced models in the domain of rain removal.
(This article belongs to the Section Engineering)
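
The pixel-shuffle operation the multi-deraining layers rely on can be illustrated in isolation: a convolution expands channels by r², and nn.PixelShuffle rearranges them into an r× larger feature map. Channel counts here are placeholders, not DAMNet's configuration.

```python
import torch
import torch.nn as nn

r = 2
derain_layer = nn.Sequential(
    nn.Conv2d(16, 3 * r * r, kernel_size=3, padding=1),
    nn.PixelShuffle(r),   # (B, 3*r^2, H, W) -> (B, 3, r*H, r*W)
)
features = torch.randn(1, 16, 32, 32)
out = derain_layer(features)  # one upscaled deraining feature map;
print(out.shape)              # torch.Size([1, 3, 64, 64])
# several such outputs could be concatenated along the channel
# dimension for a joint multi-loss, as described above.
```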

16 pages, 30605 KB  
Article
Optical Flow-Aware-Based Multi-Modal Fusion Network for Violence Detection
by Yang Xiao, Guxue Gao, Liejun Wang and Huicheng Lai
Entropy 2022, 24(7), 939; https://doi.org/10.3390/e24070939 - 6 Jul 2022
Cited by 9 | Viewed by 3598
Abstract
Violence detection aims to locate violent content in video frames. Improving the accuracy of violence detection is of great importance for security. However, current methods do not make full use of multi-modal vision and audio information, which limits detection accuracy. We found that the violence detection accuracy for different kinds of videos is related to the change of optical flow. With this in mind, we propose an optical flow-aware-based multi-modal fusion network (OAMFN) for violence detection. Specifically, we use three different fusion strategies to fully integrate multi-modal features. First, the main branch concatenates RGB features and audio features, and the optical flow branch concatenates optical flow features with RGB features and audio features, respectively. Then, the cross-modal information fusion module integrates the features of the different combinations and applies weights to them to capture cross-modal information in audio and video. After that, the channel attention module extracts valuable information by weighting the integrated features. Furthermore, an optical flow-aware-based score fusion strategy is introduced to fuse features of different modalities from the two branches. On the XD-Violence dataset, our multi-modal fusion network achieves an AP of 83.09% in offline detection, 1.4% higher than the state-of-the-art, and 78.09% in online detection, 4.42% higher than the state-of-the-art.
(This article belongs to the Topic Machine and Deep Learning)
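
The three concatenation-based combinations feeding the two branches can be written out schematically; feature sizes below are placeholders.

```python
import torch

rgb, audio, flow = (torch.randn(8, 512) for _ in range(3))
main_branch = torch.cat([rgb, audio], dim=1)   # main branch: RGB + audio
flow_rgb = torch.cat([flow, rgb], dim=1)       # flow branch: flow + RGB
flow_audio = torch.cat([flow, audio], dim=1)   # flow branch: flow + audio
# downstream (per the abstract): cross-modal fusion weights these
# combinations, and an optical-flow-aware score fusion merges the
# two branches' predictions.
print(main_branch.shape, flow_rgb.shape, flow_audio.shape)
```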

13 pages, 2541 KB  
Article
OtoPair: Combining Right and Left Eardrum Otoscopy Images to Improve the Accuracy of Automated Image Analysis
by Seda Camalan, Aaron C. Moberly, Theodoros Teknos, Garth Essig, Charles Elmaraghy, Nazhat Taj-Schaal and Metin N. Gurcan
Appl. Sci. 2021, 11(4), 1831; https://doi.org/10.3390/app11041831 - 19 Feb 2021
Cited by 9 | Viewed by 4935
Abstract
The accurate diagnosis of otitis media (OM) and other middle ear and eardrum abnormalities is difficult, even for experienced otologists. In our earlier studies, we developed computer-aided diagnosis systems to improve diagnostic accuracy. In this study, we investigate a novel approach, called OtoPair, which uses paired eardrum images together, rather than a single eardrum image, to classify them as ‘normal’ or ‘abnormal’. This mimics the way otologists evaluate ears, because they diagnose eardrum abnormalities by examining both ears. Our approach creates a new feature vector, which is formed from features extracted from a pair of high-resolution otoscope images or images captured by digital video-otoscopes. The feature vector has two parts. The first part consists of lookup table-based values created by using the deep learning techniques reported in our previous OtoMatch content-based image retrieval system. The second part consists of handcrafted features created by recording registration errors between paired eardrums, color-based features such as histograms of the a* and b* components of the L*a*b* color space, and statistical measurements of these color channels. The extracted features are concatenated to form a single feature vector, which is then classified by a tree bagger classifier. A total of 150 pairs (300 individual images) of eardrums, which are either same-category (normal-normal and abnormal-abnormal) or different-category (normal-abnormal and abnormal-normal) pairs, are used to perform several experiments. The proposed approach increases the accuracy from 78.7% (±0.1%) to 85.8% (±0.2%) under three-fold cross-validation. These are promising results with a limited number of eardrum pairs, demonstrating the feasibility of using pairs of eardrum images instead of single eardrum images to improve diagnostic accuracy.
(This article belongs to the Special Issue Computer-aided Biomedical Imaging 2020: Advances and Prospects)
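
A hedged sketch of the pair-level classification step: per-ear feature vectors are concatenated into one vector per pair and classified by a bagged tree ensemble (scikit-learn's random forest stands in here for a tree bagger), evaluated with three-fold cross-validation as in the paper. The data is synthetic and the feature dimension is assumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
left = rng.normal(size=(150, 32))    # per-ear feature vectors (assumed dim)
right = rng.normal(size=(150, 32))
X = np.concatenate([left, right], axis=1)  # one vector per eardrum pair
y = rng.integers(0, 2, size=150)           # normal / abnormal pair labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=3).mean())  # three-fold CV accuracy
```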

24 pages, 4034 KB  
Article
Good Code Sets from Complementary Pairs via Discrete Frequency Chips
by Ravi Kadlimatti and Adly T. Fam
Aerospace 2017, 4(2), 28; https://doi.org/10.3390/aerospace4020028 - 7 May 2017
Cited by 5 | Viewed by 7672
Abstract
It is shown that replacing the sinusoidal chip in Golay complementary code pairs with special classes of waveforms that satisfy two conditions, symmetry/anti-symmetry and quasi-orthogonality in the convolution sense, renders the complementary codes immune to frequency-selective fading and also allows them to be concatenated in time using one frequency band/channel. This results in a zero-sidelobe region around the mainlobe and an adjacent region of small cross-correlation sidelobes. The symmetry/anti-symmetry property produces the zero-sidelobe region on either side of the mainlobe, while quasi-orthogonality of the two chips keeps the adjacent region of cross-correlations small. Such codes are constructed using discrete frequency-coding waveforms (DFCW) based on linear frequency modulation (LFM) and piecewise LFM (PLFM) waveforms as chips for the complementary code pair, as they satisfy both the symmetry/anti-symmetry and quasi-orthogonality conditions. It is also shown that changing the slopes/chirp rates of the DFCW waveforms (based on LFM and PLFM waveforms) used as chips with the same complementary code pair results in good code sets with a zero-sidelobe region. A second good code set with a zero-sidelobe region can be constructed from the mates of the complementary code pair while using the same DFCW waveforms as their chips. The cross-correlation between the two sets is shown to contain a zero-sidelobe region and an adjacent region of small cross-correlation sidelobes. Thus, the two sets are quasi-orthogonal and can be combined to form a good code set with twice the number of codes without affecting their cross-correlation properties, or a better good code set with the same number of codes can be constructed by choosing the best candidates from the two sets. Such code sets find utility in multiple-input multiple-output (MIMO) radar applications.
(This article belongs to the Special Issue Radar and Aerospace)
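
The complementary-pair property the construction builds on is easy to verify numerically: the aperiodic autocorrelations of a Golay pair sum to a delta, i.e., zero sidelobes everywhere off the mainlobe.

```python
import numpy as np

a = np.array([1, 1, 1, -1])  # a length-4 Golay complementary pair
b = np.array([1, 1, -1, 1])

acf = lambda x: np.correlate(x, x, mode="full")  # aperiodic autocorrelation
total = acf(a) + acf(b)
print(total)  # [0 0 0 8 0 0 0] -> mainlobe only, zero sidelobes
```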