MDPI - Publisher of Open Access Journals

37 pages, 2286 KB

Open Access Article

Parameterised Quantum SVM with Data-Driven Entanglement for Zero-Day Exploit Detection

by Steven Jabulani Nhlapo, Elodie Ngoie Mutombo and Mike Nkongolo Wa Nkongolo

Computers 2025, 14(8), 331; https://doi.org/10.3390/computers14080331 - 15 Aug 2025

Viewed by 534

Zero-day attacks pose a persistent threat to computing infrastructure by exploiting previously unknown software vulnerabilities that evade traditional signature-based network intrusion detection systems (NIDSs). To address this limitation, machine learning (ML) techniques offer a promising approach for enhancing anomaly detection in network traffic. This study evaluates several ML models on a labeled network traffic dataset, with a focus on zero-day attack detection. Ensemble learning methods, particularly eXtreme gradient boosting (XGBoost), achieved perfect classification, identifying all 6231 zero-day instances without false positives and maintaining efficient training and prediction times. While classical support vector machines (SVMs) performed modestly at 64% accuracy, their performance improved to 98% with the use of the borderline synthetic minority oversampling technique (SMOTE) and SMOTE + edited nearest neighbours (SMOTEENN). To explore quantum-enhanced alternatives, a quantum SVM (QSVM) is implemented using three-qubit and four-qubit quantum circuits simulated on the aer_simulator_statevector. The QSVM achieved high accuracy (99.89%) and strong F1-scores (98.95%), indicating that nonlinear quantum feature maps (QFMs) can increase sensitivity to zero-day exploit patterns. Unlike prior work that applies standard quantum kernels, this study introduces a parameterised quantum feature encoding scheme, where each classical feature is mapped using a nonlinear function tuned by a set of learnable parameters. Additionally, a sparse entanglement topology is derived from mutual information between features, ensuring a compact and data-adaptive quantum circuit that aligns with the resource constraints of noisy intermediate-scale quantum (NISQ) devices. Our contribution lies in formalising a quantum circuit design that enables scalable, expressive, and generalisable quantum architectures tailored for zero-day attack detection. 
This extends beyond conventional usage of QSVMs by offering a principled approach to quantum circuit construction for cybersecurity. While these findings are obtained via noiseless simulation, they provide a theoretical proof of concept for the viability of quantum ML (QML) in network security. Future work should target real quantum hardware execution and adaptive sampling techniques to assess robustness under decoherence, gate errors, and dynamic threat environments.

(This article belongs to the Special Issue Intrusion Detection and Trust Provisioning in Edge-of-Things Environment)

21 pages, 3919 KB

Open Access Article

Comparative Analysis of Resampling Techniques for Class Imbalance in Financial Distress Prediction Using XGBoost

by Guodong Hou, Dong Ling Tong, Soung Yue Liew and Peng Yin Choo

Mathematics 2025, 13(13), 2186; https://doi.org/10.3390/math13132186 - 4 Jul 2025

Viewed by 573

Abstract

One of the key challenges in financial distress data is class imbalance, where the data are characterized by a highly imbalanced ratio between the number of distressed and non-distressed samples. This study examines eight resampling techniques for improving distress prediction using the XGBoost algorithm. The study was performed on a dataset acquired from the CSMAR database, containing 26,383 firm-quarter samples from 639 Chinese A-share listed companies (2007–2024), with only 12.1% of the cases being distressed. Results show that standard Synthetic Minority Oversampling Technique (SMOTE) enhanced F1-score (up to 0.73) and Matthews Correlation Coefficient (MCC, up to 0.70), while SMOTE-Tomek and Borderline-SMOTE further boosted recall, slightly sacrificing precision. These oversampling and hybrid methods also maintained reasonable computational efficiency. Random Undersampling (RUS) was the fastest method and yielded high recall (0.85), but suffered from low precision (0.46) and weaker generalization. Among all techniques, Bagging-SMOTE achieved balanced performance (AUC 0.96, F1 0.72, PR-AUC 0.80, MCC 0.68) using a minority-to-majority ratio of 0.15, demonstrating that ensemble-based resampling can improve robustness with minimal impact on the original class distribution, albeit with higher computational cost. The comparative findings highlight that no single approach fits all use cases, and technique selection should align with specific goals. Techniques favoring recall (e.g., Bagging-SMOTE, SMOTE-Tomek) are suited for early warning, while conservative techniques (e.g., Tomek Links) help reduce false positives in risk-sensitive applications, and efficient methods such as RUS are preferable when computational speed is a priority.
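The metrics named in this abstract can be made concrete with a minimal sketch. It uses the standard definitions of F1-score and MCC; the function names are illustrative and do not come from the paper. The last line plugs in the precision and recall reported for RUS, assuming both figures refer to the same class and test set; the abstract itself does not state the resulting F1.

```python
import math


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Precision (0.46) and recall (0.85) as reported for Random Undersampling.
rus_f1 = f1_from_precision_recall(0.46, 0.85)
```

Under that assumption the implied F1 for RUS is roughly 0.60, which illustrates why high recall alone does not translate into a strong F1 when precision is low.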

(This article belongs to the Special Issue New Advances in Computational Finance and Computational Intelligence in Finance)

19 pages, 2124 KB

Open Access Article

A Unified Deep Learning Ensemble Framework for Voice-Based Parkinson’s Disease Detection and Motor Severity Prediction

by Madjda Khedimi, Tao Zhang, Chaima Dehmani, Xin Zhao and Yanzhang Geng

Bioengineering 2025, 12(7), 699; https://doi.org/10.3390/bioengineering12070699 - 27 Jun 2025

Viewed by 860

Abstract

This study presents a hybrid ensemble learning framework for the joint detection and motor severity prediction of Parkinson’s disease (PD) using biomedical voice features. The proposed architecture integrates a deep multimodal fusion model with dense expert pathways, multi-head self-attention, and multitask output branches to simultaneously perform binary classification and regression. To ensure data quality and improve model generalization, preprocessing steps included outlier removal via Isolation Forest, two-stage feature scaling (RobustScaler followed by MinMaxScaler), and augmentation through polynomial and interaction terms. Borderline-SMOTE was employed to address class imbalance in the classification task. To enhance prediction performance, ensemble learning strategies were applied by stacking outputs from the fusion model with tree-based regressors (Random Forest, Gradient Boosting, and XGBoost), using diverse meta-learners including XGBoost, Ridge Regression, and a deep neural network. Among these, the Stacking Ensemble with XGBoost (SE-XGB) achieved the best results, with an R² of 99.78% and RMSE of 0.3802 for UPDRS regression and 99.37% accuracy for PD classification. Comparative analysis with recent literature highlights the superior performance of our framework, particularly in regression settings. These findings demonstrate the effectiveness of combining advanced feature engineering, deep learning, and ensemble meta-modeling for building accurate and generalizable models in voice-based PD monitoring. This work provides a scalable foundation for future clinical decision support systems.

(This article belongs to the Special Issue Machine Learning and Artificial Intelligence for Biomedical Applications, 3rd Edition)

44 pages, 13985 KB

Open Access Article

Improving Transformer Health Index Prediction Performance Using Machine Learning Algorithms with a Synthetic Minority Oversampling Technique

by Muhammad Akmal A. Putra, Suwarno and Rahman Azis Prasojo

Energies 2025, 18(9), 2364; https://doi.org/10.3390/en18092364 - 6 May 2025

Viewed by 849

Abstract

Machine learning (ML) has emerged as a powerful tool in transformer condition assessment, enabling more accurate diagnostics by leveraging historical test data. However, imbalanced datasets, often characterized by limited samples in poor transformer conditions, pose significant challenges to model performance. This study investigates the application of oversampling techniques to enhance ML model accuracy in predicting the Health Index of transformers. A dataset comprising 3850 transformer tests collected from utilities across Indonesia was used. Key parameters, including oil quality, dissolved gas analysis, and paper condition factors, were employed as inputs for ML modeling. To address the class imbalance, various oversampling methods, such as the Synthetic Minority Oversampling Technique (SMOTE), Borderline-SMOTE, SMOTE-Tomek, and SMOTE-ENN, were implemented and compared. This study explores the impact of these techniques on model performance, focusing on classification accuracy, precision, recall, and F1-score. The results reveal that all SMOTE-based methods improved model performance, with SMOTE-ENN yielding the best outcomes. It significantly reduced classification errors, particularly for minority classes, ensuring better predictive reliability. These findings underscore the importance of advanced oversampling techniques in improving transformer diagnostics. By effectively addressing the challenges posed by imbalanced datasets, this research provides a robust framework for applying ML in transformer condition monitoring and other domains with similar data constraints.
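The core interpolation step shared by the SMOTE family can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function name, the toy data, and the choice of `k` are assumptions, and the cleaning or borderline-selection steps that distinguish SMOTE-ENN, SMOTE-Tomek, and Borderline-SMOTE are not shown.

```python
import random


def smote_sample(minority, k, rng):
    """Create one synthetic point between a minority sample and one of its
    k nearest minority neighbours (squared Euclidean distance)."""
    base = rng.choice(minority)
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
    )[:k]
    neighbour = rng.choice(neighbours)
    lam = rng.random()  # interpolation factor in [0, 1)
    return [a + lam * (b - a) for a, b in zip(base, neighbour)]


# Hypothetical two-feature minority class (values are made up).
rng = random.Random(42)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(20)]
```

Every synthetic point lies on a line segment between two existing minority samples, so the oversampled class stays inside the region the minority samples already occupy.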

(This article belongs to the Special Issue Dielectric Insulation in Medium- and High-Voltage Power Equipment—Degradation and Failure Mechanism, Diagnostics, and Electrical Parameters Improvement: 2nd Edition)

20 pages, 3197 KB

Open Access Article

Research on Intrusion Detection Method Based on Transformer and CNN-BiLSTM in Internet of Things

by Chunhui Zhang, Jian Li, Naile Wang and Dejun Zhang

Sensors 2025, 25(9), 2725; https://doi.org/10.3390/s25092725 - 25 Apr 2025

Cited by 2 | Viewed by 2130

Abstract

With the widespread deployment of Internet of Things (IoT) devices, their complex network environments and open communication modes have made them prime targets for cyberattacks. Traditional Intrusion Detection Systems (IDS) face challenges in handling complex attack types, data imbalance, and feature extraction difficulties in IoT environments. Accurately detecting abnormal traffic in IoT has become increasingly critical. To address the limitation of single models in comprehensively capturing the diverse features of IoT traffic, this paper proposes a hybrid model based on CNN-BiLSTM-Transformer, which better handles complex features and long-sequence dependencies in intrusion detection. To address the issue of data class imbalance, the Borderline-SMOTE method is introduced to enhance the model’s ability to recognize minority class attack samples. To tackle the problem of redundant features in the original dataset, a comprehensive feature selection strategy combining XGBoost, Chi-square (Chi2), and Mutual Information is adopted to ensure the model focuses on the most discriminative features. Experimental validation demonstrates that the proposed method achieves 99.80% accuracy on the CIC-IDS 2017 dataset and 97.95% accuracy on the BoT-IoT dataset, significantly outperforming traditional intrusion detection methods, proving its efficiency and accuracy in detecting abnormal traffic in IoT environments.

(This article belongs to the Section Internet of Things)

17 pages, 2638 KB

Open Access Article

An Evaluation of Mine Water Inrush Based on Data Expansion and Machine Learning

by Ye Zhang and Shoufeng Tang

Appl. Sci. 2025, 15(8), 4229; https://doi.org/10.3390/app15084229 - 11 Apr 2025

Cited by 1 | Viewed by 365

Abstract

The accuracy of coal mine water inrush prediction models is affected mainly by the small number of samples and difficulty in feature extraction. A new data augmentation water inrush prediction method is proposed. The method uses a SMOTE variant improved with natural neighbor theory and a mutual information sparse autoencoder to augment the data and predict the risk of water inrush. By learning features through the autoencoder, we can achieve better separation between classes and weaken the influence of data overlap between classes in the original sample. Then, the natural neighbor search algorithm is used to determine the intrinsic neighbor relationships between samples, remove outliers and noise samples, and apply different oversampling methods to borderline samples and center samples in the minority class. Synthetic samples are generated in the feature space, mapped back to the original space, and merged with the original samples to form an expanded water inrush dataset. Experiments show that the enhanced SMOTE oversampling algorithm proposed in this paper effectively expands the dataset. On the standard dataset it achieves a G-mean of 0.9025, outperforming the comparison algorithms SMOTE (average 0.8581), B-SMOTE (average 0.873), and ADASYN (average 0.8909). It also performs well on the coal mine floor water inrush dataset, increasing the accuracy of the water inrush prediction algorithm.
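For readers unfamiliar with the G-mean reported here, the following minimal sketch shows the usual binary definition: the geometric mean of sensitivity and specificity. Whether the paper uses exactly this binary form is an assumption, and the function name and counts are illustrative.

```python
import math


def g_mean(tp: int, fp: int, fn: int, tn: int) -> float:
    """Geometric mean of sensitivity (recall on positives) and
    specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Made-up confusion-matrix counts, for illustration only.
balanced = g_mean(tp=90, fp=10, fn=10, tn=90)
ignores_minority = g_mean(tp=0, fp=0, fn=10, tn=100)
```

Because the two rates are multiplied, a classifier that ignores the minority class scores 0 regardless of its overall accuracy, which is why the metric is popular for imbalanced data.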

32 pages, 4876 KB

Open Access Article

Research on Network Intrusion Detection Model Based on Hybrid Sampling and Deep Learning

by Derui Guo and Yufei Xie

Sensors 2025, 25(5), 1578; https://doi.org/10.3390/s25051578 - 4 Mar 2025

Cited by 1 | Viewed by 2503

Abstract

This study proposes an enhanced network intrusion detection model, 1D-TCN-ResNet-BiGRU-Multi-Head Attention (TRBMA), aimed at addressing the issues of incomplete learning of temporal features and low accuracy in the classification of malicious traffic found in existing models. The TRBMA model utilizes Temporal Convolutional Networks (TCNs) to improve the ResNet18 architecture and incorporates Bidirectional Gated Recurrent Units (BiGRUs) and Multi-Head Self-Attention mechanisms to enhance the comprehensive learning of temporal features. Additionally, the ResNet network is adapted into a one-dimensional version that is more suitable for processing time-series data, while the AdamW optimizer is employed to improve the convergence speed and generalization ability during model training. Experimental results on the CIC-IDS-2017 dataset indicate that the TRBMA model achieves an accuracy of 98.66% in predicting malicious traffic types, with improvements in precision, recall, and F1-score compared to the baseline model. Furthermore, to address the challenge of low identification rates for malicious traffic types with small sample sizes in unbalanced datasets, this paper introduces TRBMA (BS-OSS), a variant of the TRBMA model that integrates Borderline SMOTE-OSS hybrid sampling. Experimental results demonstrate that this model effectively identifies malicious traffic types with small sample sizes, achieving an overall prediction accuracy of 99.88%, thereby significantly enhancing the performance of the network intrusion detection model.

(This article belongs to the Section Intelligent Sensors)

18 pages, 6065 KB

Open Access Article

Risk Assessment of High-Voltage Power Grid Under Typhoon Disaster Based on Model-Driven and Data-Driven Methods

by Xiao Zhou and Jiang Li

Energies 2025, 18(4), 809; https://doi.org/10.3390/en18040809 - 9 Feb 2025

Cited by 1 | Viewed by 1332

Abstract

As global warming continues to intensify, typhoon disasters will occur more frequently in East and Southeast Asia, posing a high risk of large-scale power outages in the power system. To investigate the impact of typhoon disasters on high-voltage power grids, a comprehensive risk assessment method that integrates model-driven and data-driven approaches is proposed, which can predict power grid faults in advance and provide support for power grid operators to generate emergency dispatching plans. Firstly, by comparing actual loads with the design strengths of the transmission tower-line system and analyzing the geometric relationship between typhoon wind circles and the system, key variables, such as wind speed, longitude, latitude, and other pertinent factors, are screened. The Spearman correlation coefficient is employed to pinpoint the meteorological variables that exhibit a high degree of relevance, enhancing the accuracy and interpretability of the model. Secondly, addressing the lack of power grid fault samples, three data balancing methods (Borderline-SMOTE, ADASYN, and SMOTE-Tomek) are compared, with Borderline-SMOTE selected for its superior performance in enhancing the sample set. Additionally, a power grid failure risk assessment model is built based on Light Gradient Boosting Machine (LightGBM), and the Borderline-Smoothing Algorithm (BSA) is used for the modeling of power grid faults. The nonlinear mapping relationship between typhoon meteorological data and the power grid equipment failure rate is extracted through deep learning training. Subsequently, the Tree-structured Parzen Estimator (TPE) is leveraged to optimize the hyperparameters of the LightGBM model, thus enhancing its prediction accuracy. Finally, the actual power system data of a province in China under a strong typhoon are assessed, validating the proposed assessment method’s effectiveness.
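The Spearman screening step mentioned above can be illustrated with a small, self-contained sketch. It computes the Pearson correlation of rank-transformed values, with tied values sharing an average rank. The variable names and readings are hypothetical and are not taken from the paper's data.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # raises ZeroDivisionError if an input is constant


# Hypothetical readings: wind speed (m/s) and a made-up fault count.
wind_speed = [12.0, 18.5, 25.0, 31.0, 40.0]
fault_count = [0, 1, 1, 3, 7]
rho = spearman(wind_speed, fault_count)
```

Since only ranks enter the calculation, the coefficient captures monotonic but nonlinear relationships, which is a plausible reason to prefer it for screening meteorological variables.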

(This article belongs to the Section F: Electrical Engineering)

26 pages, 1911 KB

Open Access Article

Machine Learning-Based Stacking Ensemble Model for Prediction of Heart Disease with Explainable AI and K-Fold Cross-Validation: A Symmetric Approach

by Sara Qamar Sultan, Nadeem Javaid, Nabil Alrajeh and Muhammad Aslam

Symmetry 2025, 17(2), 185; https://doi.org/10.3390/sym17020185 - 25 Jan 2025

Cited by 3 | Viewed by 3588

Abstract

One of the most complex and prevalent diseases is heart disease (HD). It is among the main causes of death around the globe. With changes in lifestyles and the environment, its prevalence is rising rapidly. The prediction of the disease in its early stages is crucial, as delays in diagnosis can cause serious complications and even death. Machine learning (ML) can be effective in this regard. Many researchers have used different techniques for the efficient detection of the disease and to overcome the drawbacks of existing models. Several ensemble models have also been applied. We propose a stacking ensemble model named NCDG, which uses Naive Bayes, Categorical Boosting, and Decision Tree as base learners, with Gradient Boosting serving as the meta-learner classifier. We perform preprocessing using a factorization method to convert string columns into integers. We employ the Synthetic Minority Oversampling TEchnique (SMOTE) and BorderLineSMOTE balancing techniques to address the issue of data class imbalance. Additionally, we implement hard and soft voting using a voting classifier and compare the results with the proposed stacking model. For the Artificial Intelligence-based eXplainability of our proposed NCDG model, we use the SHapley Additive exPlanations (SHAP) technique. The results show that our proposed stacking model, NCDG, outperforms existing benchmark techniques, achieving the highest accuracy, F1-Score, precision and recall of 0.91 each, with an execution time of 653 s. Moreover, we use the K-Fold Cross-Validation method to validate our predicted results. It is worth mentioning that our prediction results and their validation strongly coincide with each other, which proves our approach to be symmetric.
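The difference between the hard and soft voting baselines mentioned in this abstract can be shown with a minimal sketch. The function names and the three sets of model outputs are hypothetical and are not drawn from the paper.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over predicted class labels."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average per-class probabilities across models, then take the argmax."""
    n_classes = len(probabilities[0])
    mean = [
        sum(p[c] for p in probabilities) / len(probabilities)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=mean.__getitem__)


# Hypothetical outputs of three base models for one patient: [P(no HD), P(HD)].
probs = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]
labels = [max(range(2), key=p.__getitem__) for p in probs]

hard_result = hard_vote(labels)
soft_result = soft_vote(probs)
```

In this made-up case two weakly confident models outvote one strongly confident model under hard voting, while soft voting lets the confident model tip the decision. A stacking model replaces both fixed rules with a learned meta-learner.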

(This article belongs to the Section Computer)

19 pages, 4555 KB

Open Access Article

Enhanced Intrusion Detection for ICS Using MS1DCNN and Transformer to Tackle Data Imbalance

by Yuanlin Zhang, Lei Zhang and Xiaoyuan Zheng

Sensors 2024, 24(24), 7883; https://doi.org/10.3390/s24247883 - 10 Dec 2024

Cited by 1 | Viewed by 1305

Abstract

With the escalating threat posed by network intrusions, the development of efficient intrusion detection systems (IDSs) has become imperative. This study focuses on improving detection performance in programmable logic controller (PLC) network security while addressing challenges related to data imbalance and long-tail distributions. A dataset containing five types of attacks targeting PLCs in industrial control systems (ICS) was first constructed. To address class imbalance and challenges posed by complex network traffic, Synthetic Minority Oversampling Technique (SMOTE) and Borderline-SMOTE were applied to oversample minority classes, thereby enhancing their diversity. This paper proposes a dual-channel feature extraction model that integrates a multi-scale one-dimensional convolutional neural network (MS1DCNN) and a Weight-Dropped Transformer (WDTransformer) for IDS. The MS1DCNN is designed to extract fine-grained temporal features from packet-level data, whereas the WDTransformer leverages self-attention mechanisms to capture long-range dependencies and incorporates regularization techniques to mitigate overfitting. To further enhance performance on long-tail distributions, a custom combined loss function was developed by integrating cross-entropy loss and focal loss to reduce misclassification in minority classes. Experimental validation on the constructed dataset demonstrated that the proposed model achieved an accuracy of 95.11% and an F1 score of 95.12%, significantly outperforming traditional machine learning and deep learning models.
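The idea of combining cross-entropy with focal loss can be sketched per sample as follows. The abstract does not say how the two terms are weighted, so the convex mix controlled by `alpha` and the default `gamma` are assumptions made purely for illustration, not the authors' formulation.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Cross-entropy for one sample, given the probability assigned to
    its true class (must be in (0, 1])."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss: cross-entropy down-weighted by (1 - p_true) ** gamma."""
    return (1.0 - p_true) ** gamma * cross_entropy(p_true)


def combined_loss(p_true: float, alpha: float = 0.5, gamma: float = 2.0) -> float:
    """Hypothetical convex mix of the two losses; alpha is an assumed weight."""
    return alpha * cross_entropy(p_true) + (1.0 - alpha) * focal_loss(p_true, gamma)


easy = combined_loss(0.95)  # confidently correct sample
hard = combined_loss(0.20)  # poorly classified sample
```

The focal term shrinks the contribution of well-classified samples, so poorly classified ones, which in long-tail data are often minority-class samples, account for a larger share of the total loss.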

(This article belongs to the Section Internet of Things)

33 pages, 5826 KB

Open Access Article

Improving Churn Detection in the Banking Sector: A Machine Learning Approach with Probability Calibration Techniques

by Alin-Gabriel Văduva, Simona-Vasilica Oprea, Andreea-Mihaela Niculae, Adela Bâra and Anca-Ioana Andreescu

Electronics 2024, 13(22), 4527; https://doi.org/10.3390/electronics13224527 - 18 Nov 2024

Cited by 6 | Viewed by 5355

Abstract

Identifying and reducing customer churn have become a priority for financial institutions seeking to retain clients. Our research focuses on customer churn rate analysis using advanced machine learning (ML) techniques, leveraging a synthetic dataset sourced from the Kaggle platform. The dataset undergoes a preprocessing phase to select variables directly impacting customer churn behavior. SMOTETomek, a hybrid technique that combines oversampling of the minority class (churn) with SMOTE and the removal of noisy or borderline instances through Tomek links, is applied to balance the dataset and improve class separability. Two cutting-edge ML models are applied: random forest (RF) and the Light Gradient-Boosting Machine (LGBM) Classifier. To evaluate the effectiveness of these models, several key performance metrics are utilized, including precision, sensitivity, F1 score, accuracy, and Brier score, which helps assess the calibration of the predicted probabilities. A particular contribution of our research is the calibration of classification probabilities, as many ML models tend to produce uncalibrated probabilities due to the complexity of their internal mechanisms. Probability calibration techniques are employed to adjust the predicted probabilities, enhancing their reliability and interpretability. Furthermore, the Shapley Additive Explanations (SHAP) method, an explainable artificial intelligence (XAI) technique, is implemented to increase the transparency and credibility of the model’s decision-making process. SHAP provides insights into the importance of individual features in predicting churn, giving banking institutions knowledge for developing personalized customer retention strategies.
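The Brier score used here to assess calibration is the mean squared difference between predicted probabilities and observed binary outcomes. A minimal sketch follows, with made-up labels and probabilities; the function name is illustrative.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up churn labels (1 = churned) and predicted churn probabilities.
y = [1, 0, 1, 0]
score = brier_score(y, [0.9, 0.1, 0.8, 0.3])
uninformative = brier_score(y, [0.5, 0.5, 0.5, 0.5])
```

Lower is better: 0 corresponds to perfect, fully confident predictions, and always predicting 0.5 yields 0.25. Because the score is computed on probabilities rather than thresholded labels, it responds to calibration changes that accuracy does not register.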

(This article belongs to the Special Issue Applied Machine Learning in Intelligent Systems)

23 pages, 4001 KB

Open Access Article

Enhancing Firewall Packet Classification through Artificial Neural Networks and Synthetic Minority Over-Sampling Technique: An Innovative Approach with Evaluative Comparison

by Adem Korkmaz, Selma Bulut, Tarık Talan, Selahattin Kosunalp and Teodor Iliev

Appl. Sci. 2024, 14(16), 7426; https://doi.org/10.3390/app14167426 - 22 Aug 2024

Cited by 3 | Viewed by 2029

Abstract

Firewall packet classification is a critical component of network security, demanding precise and reliable methods to ensure optimal functionality. This study introduces an advanced approach that combines Artificial Neural Networks (ANNs) with various data balancing techniques, including the Synthetic Minority Over-sampling Technique (SMOTE), ADASYN, and BorderlineSMOTE, to enhance the classification of firewall packets into four distinct classes: ‘allow’, ‘deny’, ‘drop’, and ‘reset-both’. Initial experiments without data balancing revealed that while the ANN model achieved perfect precision, recall, and F1-Scores for the ‘allow’, ‘deny’, and ‘drop’ classes, it struggled to accurately classify the ‘reset-both’ class. To address this, we applied SMOTE, ADASYN, and BorderlineSMOTE to mitigate class imbalance, which led to significant improvements in overall classification performance. Among the techniques, the ANN combined with BorderlineSMOTE demonstrated superior efficacy, achieving a 97% overall accuracy and consistently high performance across all classes, particularly in the accurate classification of minority classes. In contrast, while SMOTE and ADASYN also improved the model’s performance, the results with BorderlineSMOTE were notably more balanced and reliable. This study provides a comparative analysis with existing machine learning models, highlighting the effectiveness of the proposed approach in firewall packet classification. The synthesized results validate the potential of integrating ANNs with advanced data balancing techniques to enhance the robustness and reliability of network security systems. The findings underscore the importance of addressing class imbalance in machine learning models, particularly in security-critical applications, and offer valuable insights for the design and improvement of future network security infrastructures.

(This article belongs to the Special Issue Progress and Research in Cybersecurity and Data Privacy)

20 pages, 3083 KB

Open Access Article

Efficient Detection of Irrelevant User Reviews Using Machine Learning

by Cheolgi Kim and Hyeon Gyu Kim

Appl. Sci. 2024, 14(16), 6900; https://doi.org/10.3390/app14166900 - 7 Aug 2024

Viewed by 1545

Abstract

User reviews such as SNS feeds and blog writings have been widely used to extract opinions, complaints, and requirements about a given place or product from the users’ perspective. However, during the process of collecting them, a lot of reviews that are irrelevant to a given search keyword can be included in the results. Such irrelevant reviews may lead to distorted results in data analysis. In this paper, we discuss a method to detect irrelevant user reviews efficiently by combining various oversampling and machine learning algorithms. About 35,000 user reviews collected from 25 restaurants and 33 tourist attractions in Ulsan Metropolitan City, South Korea, were used for learning, where the ratio of irrelevant reviews in the two kinds of data sets was 53.7% and 71.6%, respectively. To deal with skewness in the collected reviews, oversampling algorithms such as SMOTE, Borderline-SMOTE, and ADASYN were used. To build a model for the detection of irrelevant reviews, RNN, LSTM, GRU, and BERT were adopted and compared, as they are known to provide high accuracy in text processing. The performance of the detection models was examined through experiments, and the results showed that the BERT model presented the best performance, with an F1 score of 0.965.

(This article belongs to the Section Computing and Artificial Intelligence)

17 pages, 1054 KB

Open AccessArticle

Predictive Modeling of COVID-19 Readmissions: Insights from Machine Learning and Deep Learning Approaches

by Wei Kit Loo, Wingates Voon, Anwar Suhaimi, Cindy Shuan Ju Teh, Yee Kai Tee, Yan Chai Hum, Khairunnisa Hasikin, Kareen Teo, Hang Cheng Ong and Khin Wee Lai

Diagnostics 2024, 14(14), 1511; https://doi.org/10.3390/diagnostics14141511 - 12 Jul 2024

Viewed by 1317

Abstract

This project employs artificial intelligence, including machine learning and deep learning, to assess COVID-19 readmission risk in Malaysia. It offers tools to mitigate healthcare resource strain and enhance patient outcomes. This study outlines a methodology for classifying COVID-19 readmissions. It starts with dataset description and pre-processing, followed by data balancing through Random Oversampling (ROS), Borderline SMOTE (BSMOTE), and Adaptive Synthetic Sampling (ADASYN). Nine machine learning and ten deep learning techniques are applied, with five-fold cross-validation for evaluation. Optuna is used for hyperparameter selection, while the training hyperparameters are kept consistent. Evaluation metrics encompass accuracy, AUC, and training/inference times. Results were based on stratified five-fold cross-validation and different data-balancing methods. Notably, CatBoost consistently excelled in accuracy and AUC across all tables. Using ROS, CatBoost achieved the highest accuracy (0.9882 ± 0.0020) with an AUC of 1.0000 ± 0.0000. CatBoost maintained its superiority with BSMOTE and ADASYN as well. Deep learning approaches performed well, with SAINT leading under ROS and TabNet leading under BSMOTE and ADASYN. Decision-tree ensembles such as Random Forest and XGBoost consistently showed strong performance.
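The combination of stratified five-fold cross-validation and random oversampling described in this abstract can be sketched roughly as follows. This is a minimal pure-Python sketch under stated assumptions: the helper names `stratified_folds` and `random_oversample` and the toy labels are hypothetical, and balancing only the training folds is a common practice assumed here, not something the abstract states.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until classes are balanced."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical labels: 1 = readmitted (minority), 0 = not readmitted.
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, k=5)
test_idx = folds[0]
train_idx = [i for fold in folds[1:] for i in fold]
balanced_train = random_oversample(train_idx, labels)
```

Keeping the held-out fold untouched means duplicated minority samples never appear on both sides of the split, so the evaluation reflects the original class ratio.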

(This article belongs to the Special Issue Machine Learning in Signal and Image Analysis for Biomedical Application: 2nd Edition)

20 pages, 12282 KB

Open AccessArticle

Harnessing Machine Learning and Data Fusion for Accurate Undocumented Well Identification in Satellite Images

by Teeratorn Kadeethum and Christine Downs

Remote Sens. 2024, 16(12), 2116; https://doi.org/10.3390/rs16122116 - 11 Jun 2024

Cited by 1 | Viewed by 1356

Abstract

This study uses satellite data to detect undocumented oil and gas wells, which pose significant environmental concerns, including greenhouse gas emissions. Three key findings emerge from the study. Firstly, the problem of imbalanced data is addressed by recommending oversampling techniques such as Rotation–GaussianBlur–Solarization data augmentation (RGS), the Synthetic Minority Over-Sampling Technique (SMOTE), or ADASYN (an extension of SMOTE) over undersampling techniques. Borderline SMOTE is less effective than the other oversampling techniques, as its performance relies heavily on the quality and distribution of data near the decision boundary. Secondly, incorporating pre-trained models trained on large-scale datasets enhances the models’ generalization ability: models trained on one county’s dataset achieve high overall accuracy, recall, and F1 scores and can be extended to other areas. This transferability allows for wider application. Lastly, including persistent homology (PH) as an additional input improves performance for in-distribution testing but may affect the model’s generalization for out-of-distribution testing. Careful consideration of PH’s impact on overall performance and generalizability is recommended. Overall, this study provides a robust approach to identifying undocumented oil and gas wells, contributing to the acceleration of a net-zero economy and supporting environmental sustainability efforts.
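The sensitivity of borderline SMOTE to data near the decision boundary can be seen in how it selects the samples it oversamples. The sketch below follows the commonly described rule: a minority sample is "noise" if all of its m nearest neighbours are majority samples, "danger" if at least half (but not all) are, and "safe" otherwise, and only "danger" samples are used to generate synthetic points. The function name `borderline_status` and the 1-D toy data are assumptions for illustration, not taken from the study.

```python
import math


def borderline_status(sample, minority, majority, m=5):
    """Classify a minority sample as 'safe', 'danger', or 'noise'."""
    # Tag each candidate neighbour with 1 if it is a majority sample, else 0.
    others = [(p, 0) for p in minority if p is not sample]
    others += [(p, 1) for p in majority]
    nearest = sorted(others, key=lambda item: math.dist(sample, item[0]))[:m]
    n_majority = sum(flag for _, flag in nearest)
    if n_majority == len(nearest):
        return "noise"   # surrounded only by majority samples
    if 2 * n_majority >= len(nearest):
        return "danger"  # close to the class boundary
    return "safe"        # mostly surrounded by its own class


# Hypothetical 1-D feature values for the two classes.
minority = [(0.0,), (0.1,), (0.3,), (0.6,), (2.0,), (2.3,), (5.0,)]
majority = [(1.9,), (2.2,), (2.5,), (4.8,), (5.1,), (5.3,)]
statuses = [borderline_status(s, minority, majority, m=3) for s in minority]
```

If few minority samples land in the "danger" group, or if those samples are mislabelled, the method has little reliable material to interpolate from, which is consistent with the dependence on boundary data noted in the abstract.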

Search Results (54)
