AI, Machine Learning and Deep Learning in Signal Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 September 2023) | Viewed by 119193

Special Issue Editors


Guest Editor
Department of Electronics Engineering, Chief of Creative Content Labs; Sangmyung University, Seoul 110-743, Korea
Interests: digital signal processing; deep learning; copyright protection technology; digital watermarking

Guest Editor
Department of Electronics Engineering, Chungwoon University, Chungnam, Korea
Interests: micro-processor; digital signal processing; biomedical signal processing

Special Issue Information

Dear Colleagues,

Over the past 10 years, the entire field of signal processing has faced new challenges and paradigm shifts due to dramatic improvements in the computational performance of hardware and an explosive increase in the number of devices connected to the internet. The tremendous volumes of data generated in this ubiquitous environment must be analyzed and processed to provide useful and meaningful information.

Artificial intelligence (AI), represented by machine learning and deep learning, provides novel insights into the field of signal processing. Consequently, the signal processing community has to develop new approaches, methods, theories, and tools to analyze and account for the data volumes being generated.

This Special Issue aims to attract manuscripts on timely topics in AI and machine learning, including deep learning, for signal processing. Its objective is to bring together recent high-quality works in these areas, to promote key advances in the signal processing areas covered by the journal, and to provide reviews of the state of the art in emerging domains.

Prof. Dr. Jongweon Kim
Prof. Dr. Yongseok Lee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Artificial Intelligence (AI)
  • Deep learning
  • Machine learning
  • Signal processing
  • Image and video processing
  • Audio and acoustic signal processing
  • Biomedical signal processing
  • Speech processing
  • Multimedia signal processing
  • Multidimensional signal processing
  • Augmented Reality
  • Virtual Reality…

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (38 papers)

Research

19 pages, 1335 KiB  
Article
Congestive Heart Failure Category Classification Using Neural Networks in Short-Term Series
by Juan L. López and José A. Vásquez-Coronel
Appl. Sci. 2023, 13(24), 13211; https://doi.org/10.3390/app132413211 - 13 Dec 2023
Cited by 1 | Viewed by 1573
Abstract
Congestive heart failure carries immense importance in the realm of public health. This significance arises from its substantial influence on the number of lives lost, economic burdens, the potential for prevention, and the opportunity to enhance the well-being of both individuals and the broader community through decision-making in healthcare. Several researchers have proposed neural networks for classification of different congestive heart failure categories. However, there is little information about the confidence of the prediction on short-term series. Therefore, evaluating classification models is required for effective decision-making in healthcare. This paper explores the use of three classical variants of neural networks to classify three groups of patients with congestive heart failure. The study considered the iterative method Multilayer Perceptron neural network (MLP), two non-iterative models (Extreme Learning Machine (ELM) and Random Vector Functional Link Network (RVFL)), and the CNN approach. The results showed that the deep feature learning system obtained better classification rates than MLP, ELM, and RVFL. Several scenarios designed by coupling some deep feature maps with the RVFL and MLP models showed very high simulation accuracy. The overall accuracy rate of CNN–MLP and CNN–RVFL varies between 98% and 99%. Full article
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

16 pages, 2920 KiB  
Article
Underwater Image Super-Resolution via Dual-aware Integrated Network
by Aiye Shi and Haimin Ding
Appl. Sci. 2023, 13(24), 12985; https://doi.org/10.3390/app132412985 - 5 Dec 2023
Cited by 5 | Viewed by 1535
Abstract
Underwater scenes are often affected by issues such as blurred details, color distortion, and low contrast, which are primarily caused by wavelength-dependent light scattering; these factors significantly impact human visual perception. Convolutional neural networks (CNNs) have recently displayed very promising performance in underwater super-resolution (SR). However, the nature of CNN-based methods is local operations, making it difficult to reconstruct rich features. To solve these problems, we present an efficient and lightweight dual-aware integrated network (DAIN) comprising a series of dual-aware enhancement modules (DAEMs) for underwater SR tasks. In particular, DAEMs primarily consist of a multi-scale color correction block (MCCB) and a swin transformer layer (STL). These components work together to incorporate both local and global features, thereby enhancing the quality of image reconstruction. MCCBs can use multiple channels to process the different colors of underwater images to restore the uneven underwater light decay-affected real color and details of the images. The STL captures long-range dependencies and global contextual information, enabling the extraction of neglected features in underwater images. Experimental results demonstrate significant enhancements with a DAIN over conventional SR methods. Full article

30 pages, 1958 KiB  
Article
CL-TAD: A Contrastive-Learning-Based Method for Time Series Anomaly Detection
by Huynh Cong Viet Ngu and Keon Myung Lee
Appl. Sci. 2023, 13(21), 11938; https://doi.org/10.3390/app132111938 - 31 Oct 2023
Cited by 1 | Viewed by 2668
Abstract
Anomaly detection has gained increasing attention in recent years, but detecting anomalies in time series data remains challenging due to temporal dynamics, label scarcity, and data diversity in real-world applications. To address these challenges, we introduce a novel method for anomaly detection in time series data, called CL-TAD (Contrastive-Learning-based method for Time series Anomaly Detection), which employs a contrastive-learning-based representation learning technique. Inspired by the successes of reconstruction-based approaches and contrastive learning approaches, the proposed method seeks to leverage both for time series anomaly detection. The CL-TAD method comprises two main components: positive sample generation and contrastive-learning-based representation learning. The former generates positive samples by trying to reconstruct the original data from masked samples. These positive samples, together with the original data, serve as input for the contrastive-learning-based representation learning component. The representations of the original input data and their masked data are then used to detect anomalies. Experimental results demonstrate that the CL-TAD method achieved the best performance on five of nine benchmark datasets when compared with 10 other recent methods. By combining reconstruction learning and contrastive learning techniques, our method offers a promising, high-performance solution for detecting anomalies in time series data while handling the issues raised by label scarcity and data diversity.

16 pages, 3328 KiB  
Article
Clover Dry Matter Predictor Based on Semantic Segmentation Network and Random Forest
by Yin Ji, Jiandong Fang and Yudong Zhao
Appl. Sci. 2023, 13(21), 11742; https://doi.org/10.3390/app132111742 - 26 Oct 2023
Cited by 1 | Viewed by 1309
Abstract
As a key animal feed source, the dry matter content of clover is widely regarded as an important indicator of its nutritional value and quality. The primary aim of this study is to introduce a methodology for forecasting clover dry matter content utilizing a semantic segmentation network. This approach involves constructing a predictive model based on visual image information to analyze the dry matter content within clover. Given the complex features embedded in clover images and the difficulty of obtaining labeled data, it becomes challenging to analyze the dry matter content directly from the images. In order to address this issue, a method for predicting dry matter in clover based on semantic segmentation network is proposed. The method uses the improved DeepLabv3+ network as the backbone of feature extraction, and integrates the SE (Squeeze-and-Excitation) attention mechanism into the ASPP (Atrous Spatial Pyramid Pooling) module to enhance the semantic segmentation performance, in order to realize the efficient extraction of the features of clover images; on this basis, a regression model based on the Random Forest (RF) method is constructed to realize the prediction of dry matter in clover. Extensive experiments conducted by applying the trained model to the dry matter prediction dataset evaluated the good predictor performance and showed that the number of each pixel level after semantic segmentation improved the performance of semantic segmentation by 18.5% compared to the baseline, and there was a great improvement in the collinearity of dry matter prediction. Full article

16 pages, 3592 KiB  
Article
PSI Analysis of Adversarial-Attacked DCNN Models
by Youngseok Lee and Jongweon Kim
Appl. Sci. 2023, 13(17), 9722; https://doi.org/10.3390/app13179722 - 28 Aug 2023
Viewed by 959
Abstract
In the past few years, deep convolutional neural networks (DCNNs) have surpassed human performance in tasks related to recognizing objects. However, DCNNs are also threatened by performance degradation due to adversarial examples. DCNNs are essentially black-boxed, and it is not known how the output is determined internally; consequently, it is not known how adversarial attacks cause performance degradation inside the DCNNs. To observe the internal neuronal activities of DCNN models for adversarial examples, we analyzed the population sparseness index (PSI) values at each layer of two representative DCNN models, namely AlexNet and VGG11. From the experimental results, we observed that the internal responses of the two DCNN models to adversarial examples exhibited distinct layer-wise PSI values, differing from the internal responses to benign examples. The main contribution of this study is the discovery of significant differences in the internal responses of two specific DCNN models to adversarial and benign examples by PSI. Furthermore, our research has the potential not only to contribute to the design of more robust DCNN models against adversarial examples but also to bridge the gap between the fields of artificial intelligence and neurophysiology of the brain. Full article
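As a rough illustration of the layer-wise measurement described in this abstract, the sketch below computes a population sparseness value for the activations of one layer. The formula is an assumption: it is the Treves-Rolls population sparseness in the normalized form used by Vinje and Gallant, which is one common definition. The abstract does not say which definition the authors used, and the function name and the toy activation vectors are hypothetical.

```python
def population_sparseness(activations):
    """Return a sparseness value in [0, 1] for one layer's response to one input.

    Assumed formula (not confirmed by the abstract):
        S = (1 - (mean r)^2 / mean(r^2)) / (1 - 1/n)
    For non-negative activations, 0 means every unit responds equally and
    1 means a single unit carries the whole response.
    """
    n = len(activations)
    if n < 2:
        raise ValueError("need at least two units")
    mean_r = sum(activations) / n
    mean_r2 = sum(r * r for r in activations) / n
    if mean_r2 == 0.0:
        # All units silent: reported as 0 here by convention (an assumption).
        return 0.0
    return (1.0 - (mean_r * mean_r) / mean_r2) / (1.0 - 1.0 / n)


# Toy activation vectors, standing in for the flattened output of one layer.
dense = population_sparseness([1.0, 1.0, 1.0, 1.0])
sparse = population_sparseness([0.0, 0.0, 0.0, 4.0])
```

Under this definition, computing the value per layer for a benign input and for its adversarial counterpart would give two layer-wise profiles that can be compared directly.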

17 pages, 2193 KiB  
Article
Speed Bump and Pothole Detection Using Deep Neural Network with Images Captured through ZED Camera
by José-Eleazar Peralta-López, Joel-Artemio Morales-Viscaya, David Lázaro-Mata, Marcos-Jesús Villaseñor-Aguilar, Juan Prado-Olivarez, Francisco-Javier Pérez-Pinal, José-Alfredo Padilla-Medina, Juan-José Martínez-Nolasco and Alejandro-Israel Barranco-Gutiérrez
Appl. Sci. 2023, 13(14), 8349; https://doi.org/10.3390/app13148349 - 19 Jul 2023
Cited by 12 | Viewed by 4594
Abstract
The condition of the roads on which cars travel is of the utmost importance to ensure that each autonomous or manually driven car can complete its journey satisfactorily. Potholes, speed bumps, and other irregularities in the pavement can cause vehicle wear and fatal traffic accidents. Therefore, detecting and characterizing these anomalies helps reduce the risk of accidents and damage to the vehicle. However, street images are naturally multivariate, contain substantial redundant information, and are significantly contaminated by measurement noise, which makes the detection of street anomalies more challenging. In this work, we propose an automatic color image analysis that uses a deep neural network to detect potholes on the road in images taken by a ZED camera. A lightweight architecture was designed to speed up training and usage. It consists of seven properly connected and synchronized layers. All the pixels of the original image are used without resizing. The classic stride and pooling operations were used to obtain as much information as possible. A database was built using a ZED camera mounted on the front of a car. The routes where the photographs were taken are located in the city of Celaya in Guanajuato, Mexico. Seven hundred and fourteen images were manually tagged, several of which contain bumps and potholes. The system was trained with 70% of the database and validated with the remaining 30%. In addition, we propose a database that discriminates between potholes and speed bumps. A precision of 98.13% was obtained using 37 convolution filters in a 3 × 3 window, which improves upon recent state-of-the-art work.
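To make the 70%/30% partition of the 714 tagged images concrete, the following minimal sketch splits image indices into training and validation sets. Whether the authors shuffled the images, and how they rounded the split sizes, is not stated in the abstract; the shuffling, the rounding rule, the seed, and the function name are all assumptions made for illustration.

```python
import random


def split_indices(n_items, train_fraction=0.7, seed=0):
    """Shuffle item indices and split them into training and validation lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    n_train = round(n_items * train_fraction)  # rounding rule is an assumption
    return indices[:n_train], indices[n_train:]


# 714 is the number of manually tagged images reported in the abstract.
train_idx, val_idx = split_indices(714)
```

With these assumptions the split yields 500 training images and 214 validation images.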

15 pages, 1960 KiB  
Article
Predicting the Quality of Tangerines Using the GCNN-LSTM-AT Network Based on Vis–NIR Spectroscopy
by Yiran Wu, Xinhua Zhu, Qiangsheng Huang, Yuan Zhang, Julian Evans and Sailing He
Appl. Sci. 2023, 13(14), 8221; https://doi.org/10.3390/app13148221 - 15 Jul 2023
Cited by 9 | Viewed by 1915
Abstract
Fruit quality assessment plays a crucial role in determining their market value, consumer acceptance, and post-harvest management. In recent years, spectroscopic techniques have gained significant attention as non-destructive methods for evaluating fruit quality. In this study, we propose a novel deep-learning network, called GCNN-LSTM-AT, for the prediction of five important parameters of tangerines using visible and near-infrared spectroscopy (Vis–NIR). The quality attributes include soluble solid content (SSC), total acidity (TA), acid–sugar ratio (A/S), firmness, and Vitamin C (VC). The proposed model combines the strengths of graph convolutional network (GCN), convolutional neural networks (CNNs), and long short-term memory (LSTM) to capture both spatial and sequential dependencies in the spectra data, and incorporates an attention mechanism to enhance the discriminative ability of the model. To investigate the effectiveness and stability of the model, comparisons with three traditional machine-learning algorithms—moving window partial least squares (MWPLS), random forest (RF), and support vector regression (SVR)—and two deep neural networks—DeepSpectra2D and CNN-AT—are provided. The results have shown that the GCNN-LSTM-AT network outperforms other algorithms and models, achieving accurate predictions for SSC (R2: 0.9885, RMSECV: 0.1430 Brix), TA (R2: 0.8075, RMSECV: 0.0868%), A/S (R2: 0.9014, RMSECV: 1.9984), firmness (R2: 0.9472, RMSECV: 0.0294 kg), and VC (R2: 0.7386, RMSECV: 29.4104 mg/100 g) of tangerines. Full article
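For readers unfamiliar with the two figures of merit quoted in this abstract, the sketch below shows how a coefficient of determination (R2) and a root-mean-square error are usually computed. RMSECV is the same error computed on predictions made during cross-validation. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between reference values and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference values and cross-validated predictions (illustrative only).
reference = [10.0, 11.0, 12.0, 13.0]
predicted = [10.5, 10.5, 12.5, 12.5]
error = rmse(reference, predicted)
fit = r_squared(reference, predicted)
```

An R2 close to 1 together with a small RMSECV, relative to the range of the measured attribute, indicates that the predictions track the reference measurements closely.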

17 pages, 10638 KiB  
Article
DANet: Temporal Action Localization with Double Attention
by Jianing Sun, Xuan Wu, Yubin Xiao, Chunguo Wu, Yanchun Liang, Yi Liang, Liupu Wang and You Zhou
Appl. Sci. 2023, 13(12), 7176; https://doi.org/10.3390/app13127176 - 15 Jun 2023
Viewed by 1723
Abstract
Temporal action localization (TAL) aims to predict action instance categories in videos and identify their start and end times. However, existing Transformer-based backbones focus only on global or local features, resulting in the loss of information. In addition, both global and local self-attention mechanisms tend to average embeddings, thereby reducing the preservation of critical features. To solve these two problems better, we propose two kinds of attention mechanisms, namely multi-headed local self-attention (MLSA) and max-average pooling attention (MA) to extract simultaneously local and global features. In MA, max-pooling is used to select the most critical information from local clip embeddings instead of averaging embeddings, and average-pooling is used to aggregate global features. We use MLSA for modeling local temporal context. In addition, to enhance collaboration between MA and MLSA, we propose the double attention block (DABlock), comprising MA and MLSA. Finally, we propose the final network double attention network (DANet), composed of DABlocks and other advanced blocks. To evaluate DANet’s performance, we conduct extensive experiments for the TAL task. Experimental results demonstrate that DANet outperforms the other state-of-the-art models on all datasets. Finally, ablation studies demonstrate the effectiveness of the proposed MLSA and MA. Compared with structures using backbone with convolution and global Transformer, DABlock consisting of MLSA and MA has a superior performance, achieving an 8% and 0.5% improvement on overall average mAP, respectively. Full article

21 pages, 6845 KiB  
Article
A New Defect Diagnosis Method for Wire Rope Based on CNN-Transformer and Transfer Learning
by Mingyuan Wang, Jishun Li and Yujun Xue
Appl. Sci. 2023, 13(12), 7069; https://doi.org/10.3390/app13127069 - 13 Jun 2023
Cited by 3 | Viewed by 1494
Abstract
Accurate wire rope defect diagnosis is crucial for the health of whole machinery systems in various industries and practical applications. Although the loss of metallic cross-sectional area signals is the most widely used method in non-destructive wire rope evaluation methods, the weakness and scarcity of defect signals lead to poor diagnostic performance, especially in diverse conditions or those with noise interference. Thus, a new wire rope defect diagnosis method is proposed in this study. First, empirical mode decomposition and isolation forest methods are applied to eliminate noise signals and to locate the defects. Second, a convolution neural network and transformer encoder are used to design a new wire rope defect diagnosis network for the improvement of the feature extraction ability. Third, transfer learning architecture is established based on gray feature images to fine-tune the pre-trained model using a small target domain dataset. Finally, comparison experiments and a visualization analysis are conducted to verify the effectiveness of the proposed methods. The results demonstrate that the presented model can improve the performance of the wire rope defect diagnosis method under cross-domain conditions. Additionally, the transfer feasibility of transfer learning architecture is discussed for future practical applications. Full article

14 pages, 1496 KiB  
Article
An Ensemble Transfer Learning Model for Detecting Stego Images
by Dina Yousif Mikhail, Roojwan Sc Hawezi and Shahab Wahhab Kareem
Appl. Sci. 2023, 13(12), 7021; https://doi.org/10.3390/app13127021 - 11 Jun 2023
Cited by 6 | Viewed by 2811
Abstract
As internet traffic grows daily, so does the need to protect it. Network security protects data from unauthorized access and ensures their confidentiality and integrity. Steganography is the practice and study of concealing communications by inserting them into seemingly unrelated data streams (cover media). Investigating and adapting machine learning models in digital image steganalysis is becoming more popular. It has been demonstrated that steganography techniques used within such a framework perform more securely than do techniques using hand-crafted pieces. This work was carried out to investigate and examine machine learning methods’ critical contributions and beneficial roles. Machine learning is a field of artificial intelligence (AI) that provides the ability to learn without being explicitly programmed. Steganalysis is considered a classification problem that can be addressed by employing machine learning techniques and recent deep learning tools. The proposed ensemble model had four models (convolution neural networks (CNNs), Inception, AlexNet, and Resnet50), and after evaluating each model, the system voted on the best model for detecting stego images. Since active steganalysis is a classification problem that may be solved using active deep learning tools and modern machine learning methods, this paper’s major goal was to analyze deep learning algorithms’ vital roles and main contributions. The evaluation shows how to successfully detect images that contain a steganography algorithm that hides data in images. Thus, it suggests which algorithms work best, which need improvement, and which are easier to identify. Full article

16 pages, 4066 KiB  
Article
An Efficient SMOTE-Based Deep Learning Model for Voice Pathology Detection
by Ji-Na Lee and Ji-Yeoun Lee
Appl. Sci. 2023, 13(6), 3571; https://doi.org/10.3390/app13063571 - 10 Mar 2023
Cited by 13 | Viewed by 3002
Abstract
The Saarbruecken Voice Database (SVD) is a public database used by voice pathology detection systems. However, the distributions of the pathological and normal voice samples show a clear class imbalance. This study aims to develop a system for the classification of pathological and normal voices that uses efficient deep learning models based on various oversampling methods, such as the adaptive synthetic sampling (ADASYN), synthetic minority oversampling technique (SMOTE), and Borderline-SMOTE directly applied to feature parameters. The suggested combinations of oversampled linear predictive coefficients (LPCs), mel-frequency cepstral coefficients (MFCCs), and deep learning methods can efficiently classify pathological and normal voices. The balanced datasets from ADASYN, SMOTE, and Borderline-SMOTE are used to validate and evaluate the various deep learning models. The experiments are conducted using model evaluation metrics such as the recall, specificity, G, and F1 value. The experimental results suggest that the proposed voice pathology detection (VPD) system integrating the LPCs oversampled by the SMOTE and a convolutional neural network (CNN) can effectively yield the highest accuracy at 98.89% when classifying pathological and normal voices. Finally, the performances of oversampling algorithms such as the ADASYN, SMOTE, and Borderline-SMOTE are discussed. Furthermore, the performance of SMOTE is superior to conventional imbalanced data oversampling algorithms, and it can be used to diagnose pathological signals in real-world applications. Full article
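The oversampling step named in this abstract can be sketched as follows. This is a simplified, standard-library version of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbors), not the authors' implementation. The function name, the toy two-dimensional feature vectors, and the parameter defaults are assumptions; in the paper the features are LPCs and MFCCs.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority-class samples by neighbor interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbors of the base sample, excluding itself.
        neighbors = sorted(
            (s for s in minority if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between the base and the neighbor.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority-class feature vectors (hypothetical, two features each).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6)
```

Because each synthetic sample is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class, which is why the technique is commonly applied directly to feature vectors rather than to raw signals.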

11 pages, 14273 KiB  
Article
GhostNeXt: Rethinking Module Configurations for Efficient Model Design
by Kiseong Hong, Gyeong-hyeon Kim and Eunwoo Kim
Appl. Sci. 2023, 13(5), 3301; https://doi.org/10.3390/app13053301 - 4 Mar 2023
Cited by 4 | Viewed by 1902
Abstract
Despite the continuous development of convolutional neural networks, it remains a challenge to achieve performance improvement with fewer parameters and floating point operations (FLOPs) as a light-weight model. In particular, excessive expressive power on a module is a crucial cause of skyrocketing the computational cost of the entire network. We argue that it is necessary to optimize the entire network by optimizing single modules or blocks of the network. Therefore, we propose GhostNeXt, a promising alternative to GhostNet, by adjusting the module configuration inside the Ghost block. We introduce a controller to select channel operations of the module dynamically. It holds a plug-and-play component that is more useful than the existing approach. Experiments on several classification tasks demonstrate that the proposed method is a better alternative to convolution layers in baseline models. GhostNeXt achieves competitive recognition performance compared to GhostNet and other popular models while reducing computational costs on the benchmark datasets. Full article

18 pages, 2507 KiB  
Article
A Study of Breast Cancer Classification Algorithms by Fusing Machine Learning and Deep Learning
by Lifei Sun and Sen Li
Appl. Sci. 2023, 13(5), 3097; https://doi.org/10.3390/app13053097 - 27 Feb 2023
Cited by 2 | Viewed by 1876
Abstract
Although breast cancer, with its easy recurrence and high mortality, has become one of the leading causes of cancer death in women, early and accurate diagnosis can effectively increase the likelihood of a cure. It is therefore particularly important to improve the accuracy of early diagnosis of breast cancer. However, conventional early diagnosis relies on human experience and has a low accuracy rate, so many researchers have proposed machine learning methods to improve the accuracy and efficiency of prediction. Most existing studies on breast cancer classification adopt a single algorithm to fit breast cancer data but ignore how well different breast cancer data features suit the model. In this paper, we use machine learning algorithms to separate the features suited to machine learning methods from the remaining features, and we attempt to enhance model performance by designing deep learning model structures that find hidden patterns in the remaining features. In addition, because of strict medical data privacy requirements and the difficulty and cost of data collection, the model designed in this paper is trained on a small number of samples. We therefore attempt to find a minimal model for breast cancer classification that offers both low cost and high efficiency. At the same time, the deep learning model is further designed to complement the original model when complex data indicators can be introduced. Experimental results show that the model designed in this paper performs best not only under limited data with limited indicators but also under limited data with complex indicators, demonstrating the effectiveness of the approach of mixed comparison and feature selection across multiple classification algorithms. In summary, the fusion model designed and implemented in this paper performs well in the experiments, and its test accuracy reaches 98.3%.
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

19 pages, 704 KiB  
Article
A Machine Learning Tool to Monitor and Forecast Results from Testing Products in End-of-Line Systems
by Carlos Nunes, Ricardo Nunes, E. J. Solteiro Pires, João Barroso and Arsénio Reis
Appl. Sci. 2023, 13(4), 2263; https://doi.org/10.3390/app13042263 - 10 Feb 2023
Cited by 1 | Viewed by 2125
Abstract
The massive industrialization of products in a factory environment requires testing each product before it is exported to the sales market. For example, the end-of-line (EOL) tests at Continental Advanced Antenna contribute to validating the functionality of the antennas manufactured by this organization. In addition, storing information from the testing process allows the data to be processed by automated machine learning algorithms in search of useful insights. Studies in this area (automatic learning/machine learning) lead to the search for and development of tools with objectives such as preventing anomalies in the production line, predictive maintenance, product quality assurance, demand forecasting, forecasting safety problems, increasing resources, proactive maintenance, resource scalability, reduced production time, and anomaly detection, isolation, and correction. Once applied to the manufacturing environment, these advantages make the EOL system more productive, more reliable, and less time-consuming. Accordingly, a tool is proposed that allows the visualization and early detection of trends associated with faults in the antenna testing system. Furthermore, it focuses on predicting failures at Continental’s EOL.
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

14 pages, 2692 KiB  
Article
A Neural Network-Based Partial Fingerprint Image Identification Method for Crime Scenes
by Yuting Sun, Yanfeng Tang and Xiaojuan Chen
Appl. Sci. 2023, 13(2), 1188; https://doi.org/10.3390/app13021188 - 16 Jan 2023
Cited by 7 | Viewed by 2965
Abstract
Fingerprints are the most widely used of all biological characteristics in public safety and forensic identification. However, fingerprint images extracted from the crime scene are incomplete. On the one hand, due to the lack of effective area in partial fingerprint images, the extracted features are insufficient. On the other hand, a broken ridge may lead to a large number of false feature points, which affect the accuracy of fingerprint recognition. Existing fingerprint identification methods are not ideal for partial fingerprint identification. To overcome these problems, this paper proposes an attention-based partial fingerprint identification model named APFI. Firstly, the algorithm utilizes the residual network (ResNet) for feature descriptor extraction, which generates a representation of spatial information on fingerprint expression. Secondly, the channel attention module is inserted into the proposed model to obtain more accurate fingerprint feature information from the residual block. Then, to improve the identification accuracy of partial fingerprints, the angular distance between features is used to calculate the similarity of fingerprints. Finally, the proposed model is trained and validated on a home-made partial fingerprint image dataset. Experiments on the home-made fingerprint datasets and the NIST-SD4 datasets show that the partial fingerprint identification method proposed in this paper has higher identification accuracy than other state-of-the-art methods. Full article
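The abstract above states that the angular distance between features is used to calculate fingerprint similarity. A minimal sketch of that idea follows, assuming features are plain lists of floats; the function name `angular_similarity`, the normalization to [0, 1], and the toy vectors are illustrative assumptions, not details taken from the paper.

```python
import math

def angular_similarity(u, v):
    """Map the angle between two feature vectors to a similarity in [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("feature vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    angle = math.acos(cosine)       # radians, in [0, pi]
    return 1.0 - angle / math.pi    # 1 = same direction, 0 = opposite

# Hypothetical probe and gallery descriptors (toy values).
probe = [0.2, 0.9, 0.1]
gallery = {"A": [0.1, 0.8, 0.2], "B": [0.9, 0.1, 0.0]}
best = max(gallery, key=lambda k: angular_similarity(probe, gallery[k]))
```

Because only the direction of each vector matters, the score is unaffected by the overall scale of a descriptor.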
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

16 pages, 2143 KiB  
Article
Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition
by Wondimu Lambamo, Ramasamy Srinivasagan and Worku Jifara
Appl. Sci. 2023, 13(1), 569; https://doi.org/10.3390/app13010569 - 31 Dec 2022
Cited by 4 | Viewed by 2896
Abstract
Speaker recognition systems perform very well on datasets without noise and mismatch. However, performance degrades with environmental noise, channel variation, and physical and behavioral changes in the speaker. The type of speaker-related feature plays a crucial role in improving the performance of speaker recognition systems. Gammatone Frequency Cepstral Coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, achieving better performance than Mel Frequency Cepstral Coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better performance in speaker recognition than conventional machine learning. Most previous deep learning-based speaker recognition models have used the Mel Spectrogram and similar inputs rather than handcrafted features such as MFCC and GFCC. However, the performance of Mel Spectrogram features degrades at high noise ratios and with mismatch in the utterances. Similar to the Mel Spectrogram, the Cochleogram is another important feature for deep learning speaker recognition models. Like GFCC features, the Cochleogram represents utterances on the Equivalent Rectangular Bandwidth (ERB) scale, which is important in noisy conditions. However, no study has analyzed the noise robustness of Cochleogram and Mel Spectrogram features in speaker recognition. In addition, only a limited number of studies have used the Cochleogram to develop speech-based models under noisy and mismatched conditions using deep learning. In this study, the noise robustness of Cochleogram and Mel Spectrogram features in deep learning-based speaker recognition is analyzed at Signal-to-Noise Ratio (SNR) levels from −5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 and noise-added VoxCeleb1 datasets using basic 2DCNN, ResNet-50, VGG-16, ECAPA-TDNN, and TitaNet model architectures. The speaker identification and verification performance of both Cochleogram and Mel Spectrogram features is evaluated. The results show that the Cochleogram performs better than the Mel Spectrogram in both speaker identification and verification under noisy and mismatched conditions.
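The abstract above evaluates features on noise-added data at SNR levels from −5 dB to 20 dB. One common way to build such data is to scale a noise recording so the mixture reaches a target SNR; the sketch below illustrates that calculation only. The helper names `mean_power` and `mix_at_snr` and the synthetic signals are assumptions for illustration, and the paper's actual data preparation may differ.

```python
import math
import random

def mean_power(x):
    """Average power of a sampled signal."""
    return sum(s * s for s in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + noise has the requested SNR in dB.

    Returns the mixture and the scaled noise.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    # SNR_dB = 10 * log10(p_speech / (g^2 * p_noise)), solved for the gain g.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled

# Synthetic stand-ins for an utterance and a noise recording.
random.seed(0)
speech = [math.sin(0.05 * i) for i in range(2000)]
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
noisy, _ = mix_at_snr(speech, noise, snr_db=-5)
```

At −5 dB the scaled noise carries roughly three times the power of the speech, which is the harshest condition in the range the abstract reports.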
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

21 pages, 16692 KiB  
Article
Anomaly Detection Method in Railway Using Signal Processing and Deep Learning
by Jaeseok Shim, Jeongseo Koo, Yongwoon Park and Jaehoon Kim
Appl. Sci. 2022, 12(24), 12901; https://doi.org/10.3390/app122412901 - 15 Dec 2022
Cited by 9 | Viewed by 3110
Abstract
In this paper, anomaly detection of wheel flats based on signal processing and deep learning techniques is analyzed. Wheel flats mostly affect running stability and ride comfort. Currently, domestic railway companies visually inspect wheel flats one by one after railway vehicles enter the depots for maintenance. Therefore, CBM (Condition-Based Maintenance) is required to address wheel flats. Anomaly detection for wheel flat signals of railway vehicles using order analysis and STFT (Short-Time Fourier Transform) is studied in this paper. For railway vehicles, it is not easy for a university laboratory to obtain actual failure data from running vehicles because of safety and cost issues. Therefore, vibration-induced acceleration was obtained using the multibody dynamics simulation software SIMPACK. This method has also been validated by rig tests in another paper. In addition, since the simulated vibration did not include noise, the noise signal obtained from a Seoul Metro Subway Line 7 vehicle was superimposed on the simulated signal. Finally, to improve both the detection rate and the real-time characteristics of the existing LeNet-5 architecture, spectrogram images transformed from time-domain data were processed with a LeNet deep learning model whose pooling method and activation function were modified. As a result, it is validated that the method using spectrograms with a deep learning approach yields higher accuracy than the method using time-domain data.
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

23 pages, 9473 KiB  
Article
Deep Compressed Sensing Generation Model for End-to-End Extreme Observation and Reconstruction
by Han Diao, Xiaozhu Lin and Chun Fang
Appl. Sci. 2022, 12(23), 12176; https://doi.org/10.3390/app122312176 - 28 Nov 2022
Cited by 3 | Viewed by 1548
Abstract
Data transmission and storage are inseparable from compression technology. Compressed sensing directly undersamples and reconstructs data at a sampling frequency much lower than the Nyquist rate, which reduces redundant sampling. However, the requirement of data sparsity in compressed sensing limits its application. The combination of neural network-based generative models and compressed sensing removes the limitation of data sparsity. Compressed sensing for extreme observations can reduce costs, but the reconstructions produced by the above methods under extreme observations are blurry. We address this problem by proposing an end-to-end observation and reconstruction method based on a deep compressed sensing generative model. Under the restricted isometry property (RIP) and the set-restricted eigenvalue condition (S-REC), data can be observed and reconstructed end to end. In MNIST extreme observation and reconstruction, the feasibility of the end-to-end approach compared to random input is verified. End-to-end reconstruction accuracy improves by 5.20% over random input, and SSIM improves by 0.2200. In the Fashion_MNIST extreme observation and reconstruction, it is verified that the reconstruction quality of the deconvolution generative model is better than that of the multi-layer perceptron. The end-to-end reconstruction accuracy of the deconvolution generative model is 2.49% higher than that of the multi-layer perceptron generative model, and the SSIM is 0.0532 higher.
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

16 pages, 3855 KiB  
Article
A Multi-Pedestrian Tracking Algorithm for Dense Scenes Based on an Attention Mechanism and Dual Data Association
by Chang Li, Yiding Wang and Xiaoming Liu
Appl. Sci. 2022, 12(19), 9597; https://doi.org/10.3390/app12199597 - 24 Sep 2022
Cited by 5 | Viewed by 2030
Abstract
Aiming at the problems of frequent identity switches (IDs) and trajectory interruption in multi-pedestrian tracking algorithms in dense scenes, this paper proposes a multi-pedestrian tracking algorithm based on an attention mechanism and dual data association. First, with the FairMOT algorithm as a baseline, a feature pyramid network is introduced into the CenterNet detection network and the output multi-scale fused feature maps are up-sampled, effectively reducing the rate of missed detection of small and occluded pedestrians. An improved channel attention module is embedded in the CenterNet backbone network to improve detection accuracy. Then, a re-identification (ReID) branch is embedded in the head of the detection network, and the two sub-tasks of pedestrian detection and pedestrian appearance feature extraction are combined in a multi-task joint learning approach that outputs pedestrian appearance feature vectors while detecting pedestrians, which improves the computational efficiency and localization accuracy of the algorithm. Finally, we propose a dual data association tracking model that tracks by associating almost every detection box instead of only the high-scoring ones. For low-scoring detection boxes, we use their similarities with trajectories to recover occluded pedestrians. The experiment on the MOT17 dataset shows that the tracking accuracy improves by 0.6% compared with the baseline FairMOT algorithm and that the number of identity switches decreases from 3303 to 2056, which indicates that the proposed algorithm can effectively reduce trajectory interruptions and identity switching.
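The dual data association described above can be illustrated with a two-stage matching loop: high-scoring boxes are matched to trajectories first, then low-scoring boxes are offered to the trajectories still unmatched. The sketch below is a simplification under stated assumptions: it uses box IoU as the only similarity and greedy matching, whereas the paper also uses appearance features; the thresholds `high`, `low`, and `min_iou` and all function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_match(tracks, detections, min_iou):
    """Pair track ids with detection indices in order of descending IoU."""
    pairs = sorted(
        ((iou(box, det["box"]), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in used_tracks or j in used_dets:
            continue
        matches[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    return matches

def dual_associate(tracks, detections, high=0.6, low=0.1, min_iou=0.3):
    """Match high-score boxes first, then low-score boxes to leftover tracks."""
    strong = [d for d in detections if d["score"] >= high]
    weak = [d for d in detections if low <= d["score"] < high]
    result = {tid: strong[j] for tid, j in greedy_match(tracks, strong, min_iou).items()}
    leftover = {tid: box for tid, box in tracks.items() if tid not in result}
    result.update({tid: weak[j] for tid, j in greedy_match(leftover, weak, min_iou).items()})
    return result

# Toy example: track 2 is only supported by a low-scoring (e.g. occluded) box.
tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [
    {"box": (1, 1, 11, 11), "score": 0.9},
    {"box": (21, 21, 31, 31), "score": 0.3},
    {"box": (100, 100, 110, 110), "score": 0.05},
]
assigned = dual_associate(tracks, detections)
```

A tracker that kept only the 0.9 box would lose track 2 in this frame; the second stage is what keeps its trajectory alive.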
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

17 pages, 3224 KiB  
Article
Evading Logits-Based Detections to Audio Adversarial Examples by Logits-Traction Attack
by Songshen Han, Kaiyong Xu, Songhui Guo, Miao Yu and Bo Yang
Appl. Sci. 2022, 12(18), 9388; https://doi.org/10.3390/app12189388 - 19 Sep 2022
Cited by 2 | Viewed by 1702
Abstract
Automatic Speech Recognition (ASR) provides a new way of human-computer interaction. However, it is vulnerable to adversarial examples, which are obtained by deliberately adding perturbations to the original audios. Thorough studies on the universal feature of adversarial examples are essential to prevent potential attacks. Previous research has shown classic adversarial examples have different logits distribution compared to normal speech. This paper proposes a logit-traction attack to eliminate this difference at the statistical level. Experiments on the LibriSpeech dataset show that the proposed attack reduces the accuracy of the LOGITS NOISE detection to 52.1%. To further verify the effectiveness of this approach in attacking detection based on logits, three different features quantifying the dispersion of logits are constructed in this paper. Furthermore, a richer target sentence is adopted for experiments. The results indicate that these features can detect baseline adversarial examples with an accuracy of about 90% but cannot effectively detect Logits-Traction adversarial examples, proving that Logits-Traction attack can evade the logits-based detection method. Full article
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

14 pages, 3444 KiB  
Article
Wiener Filter and Deep Neural Networks: A Well-Balanced Pair for Speech Enhancement
by Dayana Ribas, Antonio Miguel, Alfonso Ortega and Eduardo Lleida
Appl. Sci. 2022, 12(18), 9000; https://doi.org/10.3390/app12189000 - 7 Sep 2022
Cited by 14 | Viewed by 3770
Abstract
This paper proposes a Deep Learning (DL) based Wiener filter estimator for speech enhancement in the framework of the classical spectral-domain speech estimator algorithm. Based on the characteristics of the intermediate steps of the speech enhancement algorithm, i.e., the SNR estimation and the gain function, we determine the best use of the network for learning a robust instance of the Wiener filter estimator. Experiments show that data-driven learning of the SNR estimator provides robustness to the statistical-based speech estimator algorithm and achieves performance on par with the state of the art. Several objective quality metrics show the performance of the speech enhancement. In addition, examples of noisy vs. enhanced speech are available for listening to demonstrate the method's capabilities in practice on simulated and real audio.
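The two intermediate steps named in the abstract above, SNR estimation and the gain function, can be illustrated with the textbook Wiener gain G = ξ / (1 + ξ), where ξ is the a priori SNR on a linear scale. In the sketch below the SNR values are hard-coded; in the paper a network supplies the estimate, and the function names here are assumptions for illustration.

```python
def wiener_gain(snr_prior):
    """Classical Wiener gain for a linear-scale a priori SNR."""
    if snr_prior < 0:
        raise ValueError("a priori SNR must be non-negative")
    return snr_prior / (1.0 + snr_prior)

def enhance_frame(noisy_magnitudes, snr_estimates):
    """Apply a per-bin Wiener gain to one frame of spectral magnitudes."""
    return [wiener_gain(xi) * m for m, xi in zip(noisy_magnitudes, snr_estimates)]

# Hypothetical frame with three frequency bins and their estimated SNRs.
frame = [2.0, 4.0, 1.0]
snr = [1.0, 3.0, 0.0]
enhanced = enhance_frame(frame, snr)
```

Bins with a high estimated SNR pass almost unchanged, while bins judged to be mostly noise are attenuated towards zero, so the quality of the SNR estimate drives the quality of the output.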
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

16 pages, 832 KiB  
Article
GLFormer: Global and Local Context Aggregation Network for Temporal Action Detection
by Yilong He, Yong Zhong, Lishun Wang and Jiachen Dang
Appl. Sci. 2022, 12(17), 8557; https://doi.org/10.3390/app12178557 - 26 Aug 2022
Cited by 1 | Viewed by 1775
Abstract
As the core component of video analysis, Temporal Action Localization (TAL) has experienced remarkable success. However, some issues are not well addressed. First, most of the existing methods process the local context individually, without explicitly exploiting the relations between features in an action instance as a whole. Second, the duration of different actions varies widely; thus, it is difficult to choose the proper temporal receptive field. To address these issues, this paper proposes a novel network, GLFormer, which can aggregate short, medium, and long temporal contexts. Our method consists of three independent branches with different ranges of attention, and these features are then concatenated along the temporal dimension to obtain richer features. One is multi-scale local convolution (MLC), which consists of multiple 1D convolutions with varying kernel sizes to capture the multi-scale context information. Another is window self-attention (WSA), which tries to explore the relationship between features within the window range. The last is global attention (GA), which is used to establish long-range dependencies across the full sequence. Moreover, we design a feature pyramid structure to be compatible with action instances of various durations. GLFormer achieves state-of-the-art performance on two challenging video benchmarks, THUMOS14 and ActivityNet 1.3. Our performance is 67.2% and 54.5% AP@0.5 on the datasets THUMOS14 and ActivityNet 1.3, respectively.
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

20 pages, 2213 KiB  
Article
Feature Extraction with Handcrafted Methods and Convolutional Neural Networks for Facial Emotion Recognition
by Eleni Tsalera, Andreas Papadakis, Maria Samarakou and Ioannis Voyiatzis
Appl. Sci. 2022, 12(17), 8455; https://doi.org/10.3390/app12178455 - 24 Aug 2022
Cited by 19 | Viewed by 4304
Abstract
This research compares the facial expression recognition accuracy achieved using image features extracted (a) manually through handcrafted methods and (b) automatically through convolutional neural networks (CNNs) from different depths, with and without retraining. The Karolinska Directed Emotional Faces, Japanese Female Facial Expression, and Radboud Faces Database databases, which differ in image number and characteristics, have been used. Local binary patterns and histogram of oriented gradients have been selected as handcrafted methods, and the extracted features are examined in terms of image and cell size. Five CNNs have been used, including three from the residual architecture of increasing depth, Inception_v3, and EfficientNet-B0. The CNN-based features are extracted from the pre-trained networks at 25%, 50%, 75%, and 100% of their depths, both before and after retraining on the new databases. Each method is also evaluated in terms of calculation time. CNN-based feature extraction has proved to be more efficient since the classification results are superior and the computational time is shorter. The best performance is achieved when the features are extracted from shallower layers of pre-trained CNNs (50% or 75% of their depth), achieving high accuracy with shorter computational time. CNN retraining is, in principle, beneficial in terms of classification accuracy, mainly for the larger databases, where it improves accuracy by an average of 8% while increasing the computational time by an average of 70%. Its contribution to classification accuracy is minimal when applied to smaller databases. Finally, the effect of two types of noise on the models is examined, with ResNet50 appearing to be the most robust to noise.
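Local binary patterns, one of the two handcrafted methods named above, encode each pixel by comparing its eight neighbours with it and packing the results into an 8-bit code; a histogram of the codes then serves as the feature vector. The following is a minimal sketch of the basic 3x3 variant on a nested-list image. The neighbour ordering and the `>=` comparison are common conventions assumed here, not settings confirmed by the paper.

```python
def lbp_code(patch):
    """8-bit local binary pattern for the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over the interior pixels of an image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist

# Toy 4x4 grayscale image with constant intensity.
flat = [[5] * 4 for _ in range(4)]
features = lbp_histogram(flat)
```

In practice the image is usually split into cells and one histogram is computed per cell, which is why the abstract examines the effect of cell size.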
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

15 pages, 900 KiB  
Article
Investigating the Difference of Fake News Source Credibility Recognition between ANN and BERT Algorithms in Artificial Intelligence
by Tosti H. C. Chiang, Chih-Shan Liao and Wei-Ching Wang
Appl. Sci. 2022, 12(15), 7725; https://doi.org/10.3390/app12157725 - 31 Jul 2022
Cited by 4 | Viewed by 2945
Abstract
Fake news permeates daily life through many channels and misleads people with disinformation. To reduce the harm of fake news and provide multiple effective channels for assessing news credibility, this study applies a linguistic approach to a word-frequency-based ANN system and a semantics-based BERT system, using mainstream news as a general news dataset and content farms as a fake news dataset. The models judge news source credibility, and the difference in news source credibility recognition between ANN and BERT is compared. The research findings show high similarity in the highest and lowest hit rates between the ANN system and the BERT system (Liberty Time had the highest hit rate, while ETtoday and nooho.net had the lowest hit rates). The BERT system presents a higher and more stable overall source credibility recognition rate than the ANN system (BERT 91.2% > ANN 82.75%). Recognizing news source credibility through artificial intelligence could not only effectively enhance people’s sensitivity to news sources but also, in the long term, cultivate public media literacy so that technology and the public work together to resist fake news.
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

15 pages, 518 KiB  
Article
Singing Voice Detection in Electronic Music with a Long-Term Recurrent Convolutional Network
by Raymundo Romero-Arenas, Alfonso Gómez-Espinosa and Benjamín Valdés-Aguirre
Appl. Sci. 2022, 12(15), 7405; https://doi.org/10.3390/app12157405 - 23 Jul 2022
Cited by 2 | Viewed by 2116
Abstract
Singing Voice Detection (SVD) is a classification task that determines whether there is a singing voice in a given audio segment. While current systems produce high-quality results on this task, the reported experiments are usually limited to popular music. A Long-Term Recurrent Convolutional Network (LRCN) was adapted to detect vocals in a new dataset of electronic music, and its results were compared against those of other state-of-the-art experiments in pop music to assess its effectiveness in a different genre. Experiments on two datasets studied the impact of different audio features and block sizes on LRCN temporal relationship learning, as well as the benefits of preprocessing on performance. The results provide a benchmark for evaluating electronic music and its intricacies.
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

18 pages, 5213 KiB  
Article
Attention-Gate-Based Model with Inception-like Block for Single-Image Dehazing
by Cheng-Ying Tsai and Chieh-Li Chen
Appl. Sci. 2022, 12(13), 6725; https://doi.org/10.3390/app12136725 - 2 Jul 2022
Cited by 5 | Viewed by 2315
Abstract
In recent decades, haze has become an environmental issue due to its effects on human health. It also reduces visibility and degrades the performance of computer vision algorithms in autonomous driving applications, which may jeopardize car driving safety. Therefore, it is extremely important to instantly remove the haze effect on an image. The purpose of this study is to leverage useful modules to achieve a lightweight and real-time image-dehazing model. Based on the U-Net architecture, this study integrates four modules, including an image pre-processing block, inception-like blocks, spatial pyramid pooling blocks, and attention gates. The original attention gate was revised to fit the field of image dehazing and consider different color spaces to retain the advantages of each color space. Furthermore, using an ablation study and a quantitative evaluation, the advantages of using these modules were illustrated. Through existing indoor and outdoor test datasets, the proposed method shows outstanding dehazing quality and an efficient execution time compared to other state-of-the-art methods. This study demonstrates that the proposed model can improve dehazing quality, keep the model lightweight, and obtain pleasing dehazing results. A comparison to existing methods using the RESIDE SOTS dataset revealed that the proposed model improves the SSIM and PSNR metrics by at least 5–10%. Full article
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

16 pages, 1640 KiB  
Article
Sex Recognition through ECG Signals aiming toward Smartphone Authentication
by Jose-Luis Cabra Lopez, Carlos Parra, Libardo Gomez and Luis Trujillo
Appl. Sci. 2022, 12(13), 6573; https://doi.org/10.3390/app12136573 - 29 Jun 2022
Cited by 9 | Viewed by 3449
Abstract
Physiological signals are strongly related to a person’s state of health and carry information about the human body. For example, an ECG can provide information about cardiac disease, emotions, personal identity, and the sex of a person, among others. This paper proposes the study of the heartbeat from a soft-biometric perspective to be applied to smartphone unlocking services. We employ the user's heartbeat to classify the individual by sex (male, female) with the use of Deep Learning, reaching an accuracy of 94.4% ± 2.0%. This result was obtained with the RGB representation of the union of the time-frequency transformations of the pseudo-orthogonal X, Y, and Z bipolar signals. Evaluating each bipolar contribution, we found that the XYZ combination provides the best category distinction using GoogLeNet. The 24-h Holter database of the study contains 202 subjects, 49.5% of whom are female. We propose an architecture for managing this signal that allows the network to be trained with few samples. Because of its hidden nature, ECG does not present vulnerabilities such as public trait exposure, light/noise sensitivity, or learnability, unlike fingerprint, facial, voice, or password verification methods. ECG may fill those gaps en route to a cooperative authentication ecosystem.
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)

15 pages, 3771 KiB  
Article
Improving Human Activity Recognition for Sparse Radar Point Clouds: A Graph Neural Network Model with Pre-Trained 3D Human-Joint Coordinates
by Gawon Lee and Jihie Kim
Appl. Sci. 2022, 12(4), 2168; https://doi.org/10.3390/app12042168 - 18 Feb 2022
Cited by 22 | Viewed by 5532
Abstract
Many devices have been used to detect human action, including wearable devices, cameras, lidars, and radars. However, some people, such as the elderly and young children, may not know how to use wearable devices effectively. Cameras have the disadvantage of invading privacy, and lidar is rather expensive. In contrast, radar, which is widely used commercially, is easily accessible and relatively cheap. However, due to the limitations of radio waves, radar data are sparse and not easy to use for human activity recognition. In this study, we present a novel human activity recognition model that consists of a pre-trained model and graph neural networks (GNNs). First, we overcome the sparsity of the radar data. To achieve that, we use a model pre-trained with the 3D coordinates of radar data and Kinect data that represents the ground truth. With this pre-trained model, we extract reliable features as 3D human joint coordinate estimates from sparse radar data. Then, a GNN model is used to extract additional information in the spatio-temporal domain from these joint coordinate estimates. Our approach was evaluated using the MMActivity dataset, which includes five different human activities. Our system achieved an accuracy of 96%. The experimental result demonstrates that our algorithm is more effective than five other baseline models. Full article

15 pages, 846 KiB  
Article
Ensemble-Guided Model for Performance Enhancement in Model-Complexity-Limited Acoustic Scene Classification
by Seokjin Lee, Minhan Kim, Seunghyeon Shin, Seungjae Baek, Sooyoung Park and Youngho Jeong
Appl. Sci. 2022, 12(1), 44; https://doi.org/10.3390/app12010044 - 21 Dec 2021
Cited by 6 | Viewed by 2734
Abstract
In recent acoustic scene classification (ASC) models, various auxiliary methods have been applied to enhance performance, e.g., subsystem ensembles and data augmentation. In particular, ensembles of several submodels can be effective in ASC models, but they increase the model size because the ensemble contains several submodels. Ensembles are therefore hard to use in model-complexity-limited ASC tasks. In this paper, we seek a performance enhancement method that takes advantage of the model ensemble technique without increasing the model size. Our method is based on the mean-teacher model, which was developed for consistency learning in semi-supervised learning. Because our problem is supervised learning, which differs from the purpose of the conventional mean-teacher model, we modify the detailed strategies to maximize the consistency learning performance. To evaluate the effectiveness of our method, experiments were performed with the ASC database from the Detection and Classification of Acoustic Scenes and Events 2021 Task 1A. The small-sized ASC model with our proposed method achieved a log loss of 1.009 and an F1-score of 67.12%, whereas the vanilla ASC model showed a log loss of 1.052 and an F1-score of 65.79%. Full article
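As a rough illustration of the mean-teacher idea this abstract builds on, the sketch below shows the two generic ingredients: a teacher whose weights are an exponential moving average (EMA) of the student's weights, and a consistency loss between the two models' predictions. The function names, the decay value, and the mean-squared-error form of the loss are assumptions made for illustration; the paper's modified strategies for the supervised setting are not reproduced here.

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Move each teacher weight a small step toward the matching student weight.

    The decay of 0.99 is an illustrative assumption, not a value from the paper.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher class probabilities.

    MSE is one common choice of consistency loss; it is assumed here.
    """
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)


# Hypothetical usage: after each optimizer step on the student, refresh the teacher.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student, decay=0.9)
```

Because the teacher is only a running average of the student, it acts somewhat like an ensemble over training time without adding a second set of submodels at inference.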

17 pages, 934 KiB  
Article
Attention-Based CNN and Bi-LSTM Model Based on TF-IDF and GloVe Word Embedding for Sentiment Analysis
by Marjan Kamyab, Guohua Liu and Michael Adjeisah
Appl. Sci. 2021, 11(23), 11255; https://doi.org/10.3390/app112311255 - 27 Nov 2021
Cited by 69 | Viewed by 10309
Abstract
Sentiment analysis (SA) detects people’s opinions from text using natural language processing (NLP) techniques. Recent research has shown that deep learning models, i.e., Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Transformer-based models, provide promising results for recognizing sentiment. Although CNN has the advantage of extracting high-level features by using convolutional and max-pooling layers, it cannot efficiently learn sequential correlations. A Bidirectional RNN uses two RNN directions to improve the extraction of long-term dependencies, but it cannot extract local features in parallel. Transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) require substantial computational resources to fine-tune and face an overfitting problem on small datasets. This paper proposes a novel attention-based model that utilizes CNNs with LSTM (named ACL-SA). First, it applies a preprocessor to enhance the data quality and employs term frequency-inverse document frequency (TF-IDF) feature weighting and pre-trained GloVe word embedding approaches to extract meaningful information from textual data. In addition, it utilizes CNN’s max-pooling to extract contextual features and reduce feature dimensionality. Moreover, it uses an integrated bidirectional LSTM to capture long-term dependencies. Furthermore, it applies the attention mechanism at the CNN’s output layer to emphasize each word’s attention level. To avoid overfitting, GaussianNoise and GaussianDropout are adopted as regularization. The model’s robustness is evaluated on four standard English datasets, i.e., Sentiment140, US-airline, Sentiment140-MV, and SA4A, with various performance metrics, and its efficiency is compared with existing baseline models and approaches. The experimental results show that the proposed method significantly outperforms the state-of-the-art models. Full article
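For readers unfamiliar with the TF-IDF feature weighting mentioned in this abstract, the following is a minimal sketch of one textbook variant: term frequency normalized by document length, multiplied by the unsmoothed logarithm of the inverse document frequency. The whitespace tokenizer, the function name, and this particular formula are assumptions for illustration; the abstract does not state which TF-IDF variant or preprocessing the authors used.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document.

    weight = (count of term in doc / doc length) * log(N / docs containing term)
    This unsmoothed form is an assumed textbook variant.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-document corpus.
scores = tf_idf(["good flight", "bad flight"])
```

In this toy corpus the word shared by both documents gets weight zero, while the words that distinguish the documents get positive weight, which is the behavior that makes TF-IDF useful as a feature weighting.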

27 pages, 661 KiB  
Article
A Bayesian Modeling Approach to Situated Design of Personalized Soundscaping Algorithms
by Bart van Erp, Albert Podusenko, Tanya Ignatenko and Bert de Vries
Appl. Sci. 2021, 11(20), 9535; https://doi.org/10.3390/app11209535 - 14 Oct 2021
Cited by 3 | Viewed by 2103
Abstract
Effective noise reduction and speech enhancement algorithms have great potential to enhance the lives of hearing aid users by restoring speech intelligibility. An open problem in today’s commercial hearing aids is how to take into account users’ preferences, indicating which acoustic sources should be suppressed or enhanced, since they are not only user-specific but also depend on many situational factors. In this paper, we develop a fully probabilistic approach to “situated soundscaping”, which aims at enabling users to make on-the-spot (“situated”) decisions about the enhancement or suppression of individual acoustic sources. The approach rests on a compact generative probabilistic model for acoustic signals. In this framework, all signal processing tasks (source modeling, source separation and soundscaping) are framed as automatable probabilistic inference tasks. These tasks can be efficiently executed using message passing-based inference on factor graphs. Since all signal processing tasks are automatable, the approach supports fast future model design cycles in an effort to reach commercializable performance levels. The presented results show promising performance in terms of SNR, PESQ and STOI improvements in a situated setting. Full article

14 pages, 1920 KiB  
Article
Preprocessing for Unintended Conducted Emissions Classification with ResNet
by Gregory Sheets, Philip Bingham, Mark B. Adams, David Bolme and Scott L. Stewart
Appl. Sci. 2021, 11(19), 8808; https://doi.org/10.3390/app11198808 - 22 Sep 2021
Viewed by 2104
Abstract
Characterization of Unintended Conducted Emissions (UCE) from electronic devices is important when diagnosing electromagnetic interference, performing nonintrusive load monitoring (NILM) of power systems, and monitoring electronic device health, among other applications. Prior work has demonstrated that UCE analysis can serve as a diagnostic tool for energy efficiency investigations and detailed load analysis. Because the feature selection of deep networks often cannot be explained fully, and in some applications can hardly be explained at all, additional tools and methods for corroboration and confirmation can further the researcher's understanding. This is especially true for the application studied in this paper. Such efforts often focus on the selected features themselves, and less understanding is gained about the noise in the collected data. If the characteristics of the selected features and of the noise are known, they can be used to further shape the design of the deep network or the associated preprocessing. This is more difficult when the available data are limited, as in the case investigated in this study. Here, the authors present a novel work (a proposed complementary portion of the overall solution to the deep network classification explainability problem for this application) that applies a systematic progression of preprocessing and a deep neural network (ResNet architecture) to classify UCE data obtained via current transformers. By methodically applying preprocessing techniques prior to a deep classifier, hypotheses can be produced concerning which features the deep network deems important relative to what it perceives as noise.
For instance, executing the proposed method and periodically inspecting the classifier output led to two hypotheses in this study. First, the UCE spectral features are relatively close to each other or to the interferers: systematically reducing the beta parameter of the Kaiser window produced progressively better classification performance, but only to a point, as going below a beta of eight decreased classifier performance. Second, further spectral feature resolution was not as important to the classifier as rejection of the leakage from a spectrally distant interference. This can be very important in unpredictable applications with a low features-to-noise ratio (FNR), where knowing the difference between features and noise is difficult. As a side benefit, much was learned regarding the best preprocessing to use with the selected deep network for the UCE collected via current transformers from these low-power consumer devices. Baseline rectangular-windowed FFT preprocessing provided a 62% classification increase versus using raw samples. After more optimal preprocessing, more than 90% classification accuracy was achieved across 18 low-power consumer devices for scenarios in which the in-band FNR was very poor. Full article
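The trade-off this abstract describes, between spectral resolution and leakage rejection as the Kaiser beta parameter changes, can be explored with a small sketch of Kaiser-windowed spectral preprocessing. The code below uses only the standard library: it evaluates the standard Kaiser window formula with a series expansion of the modified Bessel function I0 and computes a magnitude spectrum with a direct (slow) DFT. The function names, the frame length, and the test tone are illustrative assumptions and do not reflect the authors' pipeline; in practice a library FFT would replace the direct DFT.

```python
import cmath
import math


def bessel_i0(x, terms=50):
    """Zeroth-order modified Bessel function via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= (x / 2.0) / k  # term is now (x/2)^k / k!
        total += term * term
    return total


def kaiser_window(n, beta):
    """Symmetric Kaiser window of length n; beta = 0 gives a rectangular window."""
    if n == 1:
        return [1.0]
    denom = bessel_i0(beta)
    window = []
    for i in range(n):
        ratio = 2.0 * i / (n - 1) - 1.0
        arg = beta * math.sqrt(max(0.0, 1.0 - ratio * ratio))
        window.append(bessel_i0(arg) / denom)
    return window


def magnitude_spectrum(samples, beta):
    """Window the frame, then return |DFT| for bins 0..n/2 (direct O(n^2) DFT)."""
    n = len(samples)
    tapered = [s * w for s, w in zip(samples, kaiser_window(n, beta))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, x in enumerate(tapered))
        spectrum.append(abs(acc))
    return spectrum


# Hypothetical frame: a pure tone centered on bin 4 of a 32-sample frame.
frame = [math.cos(2.0 * math.pi * 4 * i / 32) for i in range(32)]
spectrum = magnitude_spectrum(frame, beta=8.0)
```

A lower beta narrows the main lobe (finer frequency resolution) but raises the side lobes (more leakage from distant interferers); a higher beta does the opposite. Sweeping beta and retraining the classifier at each setting is the kind of systematic progression the abstract refers to.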

14 pages, 2462 KiB  
Article
Bayesian Feature Fusion Using Factor Graph in Reduced Normal Form
by Amedeo Buonanno, Antonio Nogarotto, Giuseppe Cacace, Giovanni Di Gennaro, Francesco A. N. Palmieri, Maria Valenti and Giorgio Graditi
Appl. Sci. 2021, 11(4), 1934; https://doi.org/10.3390/app11041934 - 22 Feb 2021
Cited by 5 | Viewed by 2064
Abstract
In this work, we investigate an Information Fusion architecture based on a Factor Graph in Reduced Normal Form. This paradigm makes it possible to describe the fusion in a completely probabilistic framework, in which the information related to the different features is represented as messages that flow in a probabilistic network. In this way, we build a sort of context for the observed features, which gives the solution great flexibility in managing different types of features with wrong and missing values, as required by many real applications. Moreover, by suitably modifying the messages that flow into the network, we obtain an effective way to condition the inference on the different reliability of each information source or on the presence of a single unreliable signal. The proposed architecture has been used to fuse different detectors for an identity document classification task, but its flexibility, extensibility, and robustness make it suitable for many real scenarios where the signal can be wrongly received or completely missing. Full article

15 pages, 2206 KiB  
Article
A Study on a Speech Emotion Recognition System with Effective Acoustic Features Using Deep Learning Algorithms
by Sung-Woo Byun and Seok-Pil Lee
Appl. Sci. 2021, 11(4), 1890; https://doi.org/10.3390/app11041890 - 21 Feb 2021
Cited by 28 | Viewed by 6064
Abstract
The goal of the human interface is to recognize the user’s emotional state precisely. In the speech emotion recognition study, the most important issue is the effective parallel use of the extraction of proper speech features and an appropriate classification engine. Well defined speech databases are also needed to accurately recognize and analyze emotions from speech signals. In this work, we constructed a Korean emotional speech database for speech emotion analysis and proposed a feature combination that can improve emotion recognition performance using a recurrent neural network model. To investigate the acoustic features, which can reflect distinct momentary changes in emotional expression, we extracted F0, Mel-frequency cepstrum coefficients, spectral features, harmonic features, and others. Statistical analysis was performed to select an optimal combination of acoustic features that affect the emotion from speech. We used a recurrent neural network model to classify emotions from speech. The results show the proposed system has more accurate performance than previous studies. Full article

11 pages, 13604 KiB  
Article
Design of a Multi-Condition Emotional Speech Synthesizer
by Sung-Woo Byun and Seok-Pil Lee
Appl. Sci. 2021, 11(3), 1144; https://doi.org/10.3390/app11031144 - 26 Jan 2021
Cited by 3 | Viewed by 2559
Abstract
Recently, researchers have developed text-to-speech models based on deep learning, which have produced results superior to those of previous approaches. However, because those systems only mimic the generic speaking style of reference audio, it is difficult to assign user-defined emotional types to synthesized speech. This paper proposes an emotional speech synthesizer constructed by embedding not only speaking styles but also emotional styles. We extend speaker embedding to multi-condition embedding by adding emotional embedding in Tacotron, so that the synthesizer can generate emotional speech. An evaluation of the results showed the superiority of the proposed model to a previous model, in terms of emotional expressiveness. Full article

17 pages, 755 KiB  
Article
Hierarchical Phoneme Classification for Improved Speech Recognition
by Donghoon Oh, Jeong-Sik Park, Ji-Hwan Kim and Gil-Jin Jang
Appl. Sci. 2021, 11(1), 428; https://doi.org/10.3390/app11010428 - 4 Jan 2021
Cited by 13 | Viewed by 6587
Abstract
Speech recognition consists of converting input sound into a sequence of phonemes, then finding text for the input using language models. Therefore, phoneme classification performance is a critical factor for the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics is still a challenging problem even for state-of-the-art classification methods, and the classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix from a baseline speech recognition model. Using automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. According to the results of a number of phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% and 71.7% for the baseline and proposed hierarchical models, respectively, showing a 2.2% overall improvement. Full article

17 pages, 1790 KiB  
Article
Performance Boosting of Scale and Rotation Invariant Human Activity Recognition (HAR) with LSTM Networks Using Low Dimensional 3D Posture Data in Egocentric Coordinates
by Ibrahim Furkan Ince
Appl. Sci. 2020, 10(23), 8474; https://doi.org/10.3390/app10238474 - 27 Nov 2020
Cited by 8 | Viewed by 3165
Abstract
Human activity recognition (HAR) has been an active area in computer vision with a broad range of applications, such as education, security surveillance, and healthcare. HAR is a general time series classification problem, and LSTMs are widely used for time series classification tasks. However, they work well with high-dimensional feature vectors, which reduce the processing speed of LSTM in real-time applications. Therefore, dimension reduction is required to create a low-dimensional feature space. As shown experimentally in a previous study, LSTM with dimension reduction performed worse than the other classifiers tested, none of which were deep learning methods. Therefore, in this paper, a novel scale and rotation invariant human activity recognition system, which can also work in a low-dimensional feature space, is presented. For this purpose, a Kinect depth sensor is employed to obtain skeleton joints. Since angles are used, the proposed system is already scale invariant. To provide rotation invariance, the body-relative direction in egocentric coordinates is calculated. The 3D vector between the right hip and the left hip is used to get the horizontal axis, and its cross product with the vertical axis of the global coordinate system is assumed to be the depth axis of the proposed local coordinate system. Instead of using 3D joint angles, eight limbs and their corresponding 3D angles with the X, Y, and Z axes of the proposed coordinate system are compressed with several dimension reduction methods, such as an averaging filter, the Haar wavelet transform (HWT), and the discrete cosine transform (DCT), and employed as the feature vector. Finally, the extracted features are used to train and test a long short-term memory (LSTM) network, which is an artificial recurrent neural network (RNN) architecture. Experimental and benchmarking results indicate that the proposed framework boosts the accuracy of LSTM by approximately 30% in a low-dimensional feature space. Full article
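The egocentric coordinate construction described in this abstract can be sketched as follows: the hip-to-hip vector gives the horizontal axis, its cross product with the global vertical gives the depth axis, and each limb is then described by its angles to the local axes. Several details are assumptions made for illustration and are not stated in the abstract: the hip vector points from the right hip to the left hip, the global vertical is (0, 1, 0), the local vertical axis is completed with a second cross product, and angles are reported in degrees. The function names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)


def body_frame(right_hip, left_hip, global_up=(0.0, 1.0, 0.0)):
    """Local (x, y, z) axes: x across the hips, z as depth, y completing the frame.

    Assumes the hip line is not parallel to global_up (the cross product would vanish).
    """
    x_axis = normalize(sub(left_hip, right_hip))
    z_axis = normalize(cross(x_axis, global_up))
    y_axis = cross(z_axis, x_axis)
    return (x_axis, y_axis, z_axis)


def limb_angles(joint_a, joint_b, frame):
    """Angles in degrees between the limb a->b and each local axis."""
    limb = normalize(sub(joint_b, joint_a))
    # Clamp the cosine to [-1, 1] to guard against rounding error before acos.
    return tuple(math.degrees(math.acos(max(-1.0, min(1.0, dot(limb, axis)))))
                 for axis in frame)


# Hypothetical joints: hips on the global X axis, one limb pointing straight up.
frame = body_frame((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
angles = limb_angles((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), frame)
```

Because the frame turns with the body, rotating the whole skeleton about the global vertical leaves these angles unchanged, and because only directions are used, scaling the skeleton does too. The per-limb angle sequences would then be passed to a dimension reduction step such as the DCT before classification.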

Review

36 pages, 7252 KiB  
Review
Robustness of Deep Learning Models for Vision Tasks
by Youngseok Lee and Jongweon Kim
Appl. Sci. 2023, 13(7), 4422; https://doi.org/10.3390/app13074422 - 30 Mar 2023
Cited by 4 | Viewed by 3794
Abstract
In recent years, artificial intelligence technologies in vision tasks have gradually begun to be applied to the physical world, proving they are vulnerable to adversarial attacks. Thus, the importance of improving robustness against adversarial attacks has emerged as an urgent issue in vision tasks. This article aims to provide a historical summary of the evolution of adversarial attacks and defense methods on CNN-based models and also introduces studies focusing on brain-inspired models that mimic the visual cortex, which is resistant to adversarial attacks. As the origination of CNN models was in the application of physiological findings related to the visual cortex of the time, new physiological studies related to the visual cortex provide an opportunity to create more robust models against adversarial attacks. The authors hope this review will promote interest and progress in artificially intelligent security by improving the robustness of deep learning models for vision tasks. Full article