Search Results (489)

Search Parameters:
Keywords = ensemble decision tree model

26 pages, 7623 KiB  
Article
An Ensemble Classification Method Based on a Stacking Strategy for Ship Type Classification with AIS Data
by Lei Deng, Shichen Yang, Limin Jia and Danyang Geng
J. Mar. Sci. Eng. 2025, 13(5), 886; https://doi.org/10.3390/jmse13050886 - 29 Apr 2025
Viewed by 94
Abstract
Ship type (e.g., Cargo, Tanker and Fishing) classification is crucial for marine management, environmental protection, and maritime safety, as it enhances navigation safety and aids regulatory agencies in combating illegal activities. Traditional ship type classification methods based on Automatic Identification System (AIS) data are often plagued by problems such as data imbalance, insufficient feature extraction, reliance on single-model approaches, or unscientific model combination methods, which reduce the accuracy of classification. In this paper, we propose an ensemble classification method based on a stacking strategy to overcome these challenges. We apply the SMOTE technique to balance the dataset by generating minority class samples. Then, a more comprehensive ship behavior model is developed by combining static and dynamic features. A stacking strategy is adopted for the classification, integrating multiple tree structure-based classifiers to improve classification performance. The experimental results show that the ensemble classification method based on the stacking strategy outperforms traditional classifiers such as CatBoost, Random Forest, Decision Tree, LightGBM, and the ensemble classification method, especially in terms of improving classification precision, recall, F1 score, ROC curve, and AUC. This method improves the accuracy of ship type recognition, and it is suitable for real-time online classification, which is helpful for applications in marine safety monitoring, law enforcement, and illegal fishing detection.
(This article belongs to the Section Ocean Engineering)
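
The abstract above mentions SMOTE, which balances a dataset by creating synthetic minority-class samples. The following is a minimal sketch of the interpolation idea only, not the authors' pipeline: the function name `smote_like`, the parameter `k`, and the toy two-feature points are all hypothetical, chosen for illustration.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Hypothetical minority-class samples with two numeric features each.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like(minority, n_new=6)
```

Each synthetic point lies on the line segment between two real minority samples, so the new points stay inside the region the minority class already occupies.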

20 pages, 4164 KiB  
Article
MAL-XSEL: Enhancing Industrial Web Malware Detection with an Explainable Stacking Ensemble Model
by Ezz El-Din Hemdan, Samah Alshathri, Haitham Elwahsh, Osama A. Ghoneim and Amged Sayed
Processes 2025, 13(5), 1329; https://doi.org/10.3390/pr13051329 - 26 Apr 2025
Viewed by 189
Abstract
The escalating global incidence of malware presents critical cybersecurity threats to manufacturing, automation, and industrial process control systems. Given the fast-developing web applications and IoT devices in use by industry operations, securing a transparent and effective malware detection mechanism has become imperative to operational resilience and data integrity. Classical methods of malware detection are conventionally opaque “black boxes” with limited transparency, thus eroding trust and hindering deployment in security-sensitive contexts. In this respect, this research proposes MAL-XSEL—a malware detection framework using an explainable stacking ensemble learning approach for performing high-accuracy classification and interpretable decision-making. MAL-XSEL explicates the model predictions through Shapley additive explanations (SHAP) and local interpretable model-agnostic explanations (LIME), which enable security analysts to validate how the detection logic works and prioritize the features contributing to the most critical threats. Evaluated on two benchmark datasets, MAL-XSEL outperformed conventional machine learning models, achieving top accuracies of 99.62% (ClaMP dataset) and 99.16% (MalwareDataSet). Notably, it surpassed state-of-the-art algorithms such as LightGBM (99.52%), random forest (99.33%), and decision trees (98.89%) across both datasets while maintaining computational efficiency. A unique interaction of ensemble learning and XAI is employed for detection, not only with improved accuracy but also with interpretable insight into the behavior of malware, thereby allowing trust to be substantiated in an automated system. By closing the divide between performance and interpretability, MAL-XSEL enables cybersecurity practitioners to deploy transparent and auditable defenses against an ever-growing resource of threats. 
This work demonstrates how there can be no compromise on explainability in security-critical applications and, as such, establishes a roadmap for future research on industrial malware analysis tools. Full article

32 pages, 6398 KiB  
Article
Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost
by Leonidas Theodorakopoulos, Alexandra Theodoropoulou, Anastasios Tsimakis and Constantinos Halkiopoulos
Electronics 2025, 14(9), 1754; https://doi.org/10.3390/electronics14091754 - 25 Apr 2025
Viewed by 402
Abstract
This study presents an optimization for a distributed machine learning framework to achieve credit card fraud detection scalability. Due to the growth in fraudulent activities, this research implements the PySpark-based processing of large-scale transaction datasets, integrating advanced machine learning models: Logistic Regression, Decision Trees, Random Forests, XGBoost, and CatBoost. These have been evaluated in terms of scalability, accuracy, and handling imbalanced datasets. Key findings: Among the most promising models for complex and imbalanced data, XGBoost and CatBoost promise close-to-ideal accuracy rates in fraudulent transaction detection. PySpark will be instrumental in scaling these systems to enable them to perform distributed processing, real-time analysis, and adaptive learning. This study further discusses challenges like overfitting, data access, and real-time implementation with potential solutions such as ensemble methods, intelligent sampling, and graph-based approaches. Future directions are underlined by deploying these frameworks in live transaction environments, leveraging continuous learning mechanisms, and integrating advanced anomaly detection techniques to handle evolving fraud patterns. The present research demonstrates the importance of distributed machine learning frameworks for developing robust, scalable, and efficient fraud detection systems, considering their significant impact on financial security and the overall financial ecosystem. Full article
(This article belongs to the Special Issue New Advances in Cloud Computing and Its Latest Applications)

18 pages, 4305 KiB  
Article
Decoding Depression from Different Brain Regions Using Hybrid Machine Learning Methods
by Qi Sang, Chen Chen and Zeguo Shao
Bioengineering 2025, 12(5), 449; https://doi.org/10.3390/bioengineering12050449 - 24 Apr 2025
Viewed by 248
Abstract
Depression has become one of the most common mental illnesses, causing severe physical and mental harm. To clarify the impact of brain region segmentation on the detection accuracy of moderate-to-severe major depressive disorder (MDD) and identify the optimal brain region for detecting MDD using electroencephalography (EEG), this study compared eight traditional single-machine learning algorithms with a hybrid machine learning model based on a stacking ensemble technique. The hybrid model employed K-nearest neighbors (KNN), decision tree (DT), and Extreme Gradient Boosting (XGBoost) as base learners and used a DT as the meta-learner. Compared with traditional single methods, the hybrid approach significantly improved detection accuracy by leveraging the strengths of different algorithms. In addition, this study divided the brain regions into the left and right temporal lobes and extracted both linear and nonlinear features to comprehensively capture the complexity and dynamic behavior of EEG signals, enhancing the model’s ability to distinguish features across different brain regions. The experimental results showed that among the eight traditional machine learning methods, the KNN classifier achieved the highest detection accuracy of 96.97% in the left temporal lobe region. In contrast, the stacking hybrid learning model further increased the detection accuracy to 98.07%, significantly outperforming the single models. Moreover, the analysis of the brain region segmentation revealed that the left temporal lobe exhibited higher discriminative power in detecting MDD, highlighting its important role in the neurobiology of depression. This study provides a solid foundation for developing more efficient and portable methods for detecting depression, offering new perspectives and approaches for EEG-based MDD detection, and contributing to the improvement in objectivity and precision in depression diagnosis. Full article

23 pages, 7608 KiB  
Article
Machine-Learning-Based Ensemble Prediction of the Snow Water Equivalent in the Upper Yalong River Basin
by Jujia Zhang, Mingxiang Yang, Ningpeng Dong and Yicheng Wang
Sustainability 2025, 17(9), 3779; https://doi.org/10.3390/su17093779 - 22 Apr 2025
Viewed by 230
Abstract
The snow water equivalent (SWE) in high-altitude regions is crucial for water resource management and disaster risk reduction, yet accurate predictions remain challenging due to complex snowmelt processes, nonlinear meteorological factors, and time-lag effects. This study used snow remote sensing products from the Advanced Microwave Scanning Radiometer (AMSR) as the predictand for evaluating SWE predictions. It applied nine machine learning models—linear regression (LR), decision trees (DT), support vector regression (SVR), random forest (RF), artificial neural networks (ANNs), AdaBoost, XGBoost, gradient boosting decision trees (GBDT), and CatBoost. For each machine learning model, submodels were constructed to predict the SWE for the next 1 to 30 days. The 30 submodels of each machine learning model formed the prediction model for the snow water equivalent over the next 30 days. Through an accuracy evaluation and ensemble forecasting, the snow water equivalent prediction for the next 30 days in the Yalong River above the Ganzi Basin was finally achieved. The results showed that for all models, the average Nash–Sutcliffe Efficiency (NSE) rate was greater than 0.8, the average root mean square error (RMSE) was under 8 mm, and the average relative error (RE) was below 7% across three lead time periods (1–10, 11–20, and 21–30 days). The ensemble average model, combining ANNs, GBDT, and CatBoost, demonstrated superior accuracy, with NSE values exceeding 0.85 and RMSE values under 6 mm. A sensitivity analysis using the Shapley Additive Explanations (SHAP) model revealed that temperature variables (average, minimum, and maximum temperatures) were the most influential factors, while relative humidity (Rhu) significantly affected the SWE by reducing evaporation. These findings provide insights for improving SWE prediction accuracy and support water resource management in high-altitude regions. Full article
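
The abstract above reports the Nash–Sutcliffe Efficiency (NSE) and RMSE of an ensemble-average model. As a rough illustration of how these quantities are usually computed, here is a minimal sketch; the function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rmse(obs, sim):
    """Root mean square error between observations and simulations."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def ensemble_mean(member_preds):
    """Average the predictions of several models, time step by time step."""
    return [sum(vals) / len(vals) for vals in zip(*member_preds)]

# Hypothetical observed values and three member forecasts (illustrative units).
obs = [10.0, 12.0, 15.0, 11.0]
members = [
    [9.0, 13.0, 14.0, 12.0],
    [11.0, 11.0, 16.0, 10.0],
    [10.0, 12.0, 15.0, 11.0],
]
avg = ensemble_mean(members)
```

In this toy data the member errors were chosen to cancel exactly, so the averaged forecast matches the observations; real ensembles only reduce error to the extent that member errors are not strongly correlated.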

17 pages, 1488 KiB  
Article
A Machine Learning Approach for Predicting Maternal Health Risks in Lower-Middle-Income Countries Using Sparse Data and Vital Signs
by Avnish Malde, Vishnunarayan Girishan Prabhu, Dishant Banga, Michael Hsieh, Chaithanya Renduchintala and Ronald Pirrallo
Future Internet 2025, 17(5), 190; https://doi.org/10.3390/fi17050190 - 22 Apr 2025
Viewed by 316
Abstract
According to the World Health Organization, maternal mortality rates remain a critical public health issue, with 94% of maternal deaths occurring in low- and middle-income countries (LMICs), where the rates reached 430 per 100,000 live births in 2020 compared to 13 in high-income countries. Despite this difference, only a few studies have investigated whether sparse data and features such as vital signs can effectively predict maternal health risks. This study addresses this gap by evaluating the predictive capability of vital sign data: multiple machine learning models were trained on a dataset containing the age, blood pressure, temperature, heart rate, and blood glucose of 1014 pregnant women from rural Bangladesh. The models’ performance was evaluated using regular, random, and stratified sampling techniques. Additionally, we developed a stacking ensemble machine learning model combining multiple methods to evaluate predictive accuracy. A key contribution of this study is developing a stacking ensemble model combined with stratified sampling, an approach not previously considered in maternal health risk prediction. The ensemble model using stratified sampling achieved the highest accuracy (87.2%), outperforming CatBoost (84.7%), XGBoost (84.2%), random forest (81.3%) and decision trees (80.3%) without stratified sampling. Observations from our study demonstrate the feasibility of using sparse data and features for maternal health risk prediction using algorithms. By focusing on data from resource-constrained settings, we show that machine learning offers a convenient and accessible solution to improve prenatal care and reduce maternal deaths in LMICs.
(This article belongs to the Special Issue Artificial Intelligence-Enabled Smart Healthcare)
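
The abstract above contrasts random with stratified sampling. A minimal sketch of a stratified train/test split follows, assuming class labels are available as a plain list; the function name `stratified_split`, the 20% test fraction, and the three risk labels are hypothetical and not taken from the study.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class keeps roughly
    the same share in the training and test parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # shuffle within the class, then cut
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical imbalanced labels: 10 "low", 5 "mid", 5 "high".
labels = ["low"] * 10 + ["mid"] * 5 + ["high"] * 5
train_idx, test_idx = stratified_split(labels)
```

A purely random split of so small a dataset can leave a rare class under-represented in the test set; splitting within each class avoids that.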

13 pages, 2299 KiB  
Article
Machine Learning Introduces Electrophysiology Assessment as the Best Predictor for the Recovery Prognosis of Spinal Cord Injury Patients for Personalized Rehabilitation Approaches
by Dionysia Chrysanthakopoulou, Charalampos Matzaroglou, Eftychia Trachani and Constantinos Koutsojannis
Appl. Sci. 2025, 15(8), 4578; https://doi.org/10.3390/app15084578 - 21 Apr 2025
Viewed by 296
Abstract
The strong correlation between evoked potentials (EPs) and American Spinal Injury Association (ASIA) scores in individuals with spinal cord injury (SCI) suggests that EPs may serve as reliable predictive markers for rehabilitation progress. Numerous studies have confirmed a relationship between variations in somatosensory evoked potentials (SSEPs) and ASIA scores, especially in the early stages of SCI. Machine learning’s (ML’s) increasing importance in medicine is driven by the growing availability of health data and improved algorithms. It enables the creation of predictive models for disease diagnosis, progression prediction, personalized treatment, and improved healthcare efficiency. Data-driven approaches can significantly improve patient care, reduce costs, and facilitate personalized medicine. The meticulous analysis of medical data is crucial for timely disease identification, leading to effective symptom management and appropriate treatment. This study applies artificial intelligence to identify predictors of SCI progression, as measured by the disability index, ASIA impairment scale (AIS), and final motor recovery. We aim to clarify the prognostic role of electrophysiological testing (SSEPs, MEPs, and nerve conduction studies (NCSs)) in SCI. We analyzed data from a medical database of 123 records. We developed an ML-based intelligent system, utilizing ensemble algorithms combining decision trees and neural network approaches, to predict SCI recovery. Our evaluation showed SEP accuracies of 90% for motor recovery prediction and 80% for AIS scale determination, comparable to full electrophysiology evaluation accuracies of 93% and 89%, respectively, and generally superior results compared to MEP and NCS results. EPs emerged as the best predictors, comparable to a comprehensive electrophysiology assessment, significantly improving accuracy compared to clinical findings alone. 
An electrophysiological assessment, when available, increased overall accuracy for final motor recovery prediction to 93% (from a maximum of 75%) and, for ASIA score determination, to 89% (from a maximum of 66%). Further validation is needed with a larger dataset. Future research should validate that sensory electrophysiology assessment is a less expensive, portable, and simpler alternative to other prognostic tests and more effective than clinical assessments, like the AIS, biomarker for SCI, and personalized rehabilitation planning. Full article
(This article belongs to the Special Issue Advanced Physical Therapy for Rehabilitation)

20 pages, 6307 KiB  
Article
Machine Learning Models for Chlorophyll-a Forecasting in a Freshwater Lake: Case Study of Lake Taihu
by Guojin Sun, Weitang Zhu, Xiaoyan Qian, Chunlei Wei, Pengfei Xie, Yao Shi, Xiaoyong Cao and Yi He
Water 2025, 17(8), 1219; https://doi.org/10.3390/w17081219 - 18 Apr 2025
Viewed by 217
Abstract
Cyanobacteria harmful blooms (Cyano-HABs) have become a globally critical environmental issue, threatening freshwater ecosystems by degrading water quality and posing risks to human and aquatic life. Chlorophyll-a (Chl-a), a key biomarker of bloom intensity, offers crucial insights into algal bloom dynamics. However, predicting Chl-a concentrations remains challenging due to the complex interactions between various environmental factors. This study utilizes machine learning (ML) models to predict Chl-a concentrations, focusing on Lake Taihu in China, a large eutrophic lake that serves as an example of numerous freshwater lakes suffering from Cyano-HABs. The research leverages nine critical water quality parameters (water temperature, pH, dissolved oxygen, turbidity, electrical conductivity, permanganate index, ammonia nitrogen, total phosphorus, and total nitrogen) to develop an ensemble ML model using XGBoost, known for its ability to handle nonlinear relationships and integrate multiple variables. The XGBoost model achieved superior predictive accuracy with an R² value of 0.78 and RMSE of 8.97 mg/m³ on the test set, outperforming traditional models like linear regression, decision trees, multi-layer perceptrons, support vector regression, and random forests. Feature importance analysis identified electrical conductivity, turbidity, and water temperature as the most significant predictors of Chl-a levels. This study further enhances model interpretability through Pearson correlation analysis, which quantifies the relationships between Chl-a concentrations and other water quality factors. Additionally, we employed principal component analysis (PCA), mutual information, Spearman rank correlation coefficients, and SHAP models to analyze feature importance and model interpretability in ML.
The model’s robustness was tested across multiple monitoring sites in Lake Taihu, demonstrating its potential for broader application in other eutrophic lakes facing similar environmental challenges. By providing a reliable tool for forecasting Chl-a concentrations, this research contributes to the development of early warning systems that can help mitigate the impacts of Cyano-HABs, aiding in more effective water resource management. Full article
(This article belongs to the Section Water Resources Management, Policy and Governance)

25 pages, 2389 KiB  
Article
Analysis of Demographic, Familial, and Social Determinants of Smoking Behavior Using Machine Learning Methods
by Joanna Chwał, Małgorzata Kostka, Paweł Stanisław Kostka, Radosław Dzik, Anna Filipowska and Rafał Jan Doniec
Appl. Sci. 2025, 15(8), 4442; https://doi.org/10.3390/app15084442 - 17 Apr 2025
Viewed by 301
Abstract
Smoking behavior, encompassing both traditional tobacco and electronic cigarette use, is influenced by a range of demographic, familial, and social factors. This study examines the relationship between smoking habits and family dynamics through a cross-sectional survey of 100 participants, using an anonymous questionnaire to collect demographic data, smoking patterns, and familial interactions. Validated instruments, including the Penn State Electronic Cigarette Dependence Index and the Family Relationship Assessment Scale, were employed to assess smoking dependence and family dynamics. The analysis identified key patterns, such as increased smoking frequency among individuals experiencing higher family tension and variations in smoking habits across age and gender groups. Nocturnal smoking was linked to higher cigarette consumption, whereas early-day smokers exhibited a lower desire to quit. Machine learning models were applied to predict and classify smoking behaviors based on socio-demographic and familial variables, with an ensemble learning model achieving the highest accuracy (93.33%), outperforming k-nearest neighbors (90.00%), support vector machines (80.00%), and decision trees (83.33%). These findings underscore the complex interplay between family relationships and smoking behavior, providing insights for public health interventions. Additionally, this study highlights the potential of machine learning in behavioral research, demonstrating its utility in identifying and predicting smoking-related patterns. Full article
(This article belongs to the Special Issue Artificial Intelligence in Medicine and Healthcare)

22 pages, 7978 KiB  
Article
Research on High Spatiotemporal Resolution of XCO2 in Sichuan Province Based on Stacking Ensemble Learning
by Zhaofei Li, Na Zhao, Han Zhang, Yang Wei, Yumin Chen and Run Ma
Sustainability 2025, 17(8), 3433; https://doi.org/10.3390/su17083433 - 11 Apr 2025
Viewed by 250
Abstract
Global warming caused by the increase in the atmospheric CO2 content has become a focal environmental issue of common concern to the international community. As a key resource support for achieving the “dual carbon” goals in Western China, Sichuan Province requires a deep analysis of its carbon sources, carbon sinks, and its characteristics in terms of atmospheric environmental capacity, which is of great significance for formulating effective regional sustainable development strategies and responding to global climate change. In view of the unique geographical and climatic conditions in Sichuan Province and the current situation of a low and uneven distribution of atmospheric environmental capacity, this paper uses three forms of multi-source satellite data, OCO-2, OCO-3, and GOSAT, combined with other auxiliary data, to generate a daily XCO2 concentration dataset on a 1 km grid for Sichuan Province from 2015 to 2022. The Optuna optimization method with 10-fold cross-validation is used to search for the optimal hyperparameter configurations of the four Stacking base learners (random forest, gradient boosting decision tree, extreme gradient boosting, and K-nearest neighbors); the logistic regression algorithm is then used as the second-layer meta-learner to effectively improve the prediction accuracy and generalization ability of the Stacking ensemble learning model. According to the comparison of the performance of each model by cross-validation and TCCON site verification, the Stacking model significantly improved in accuracy, with an R², RMSE, and MAE of 0.983, 0.87 ppm and 0.19 ppm, respectively, which is better than those of traditional models such as RF, KNN, XGBoost, and GBRT.
The accuracy verification of the atmospheric XCO2 data estimated by the model based on the observation data of the two TCCON stations in Xianghe and Hefei showed that the correlation coefficients were 0.96 and 0.98, and the MAEs were 0.657 ppm and 0.639 ppm, respectively, further verifying the high accuracy and reliability of the model. At the same time, the fusion of multi-source satellite data significantly improved the spatial coverage of XCO2 concentration data in Sichuan Province, effectively filling the gap in single satellite observation data. Based on the reconstructed XCO2 dataset of Sichuan Province, the study revealed that there are significant regional and seasonal differences in the XCO2 concentrations in the region, showing seasonal variation characteristics of being higher in spring and winter and lower in summer and autumn; in terms of the spatial distribution, the overall spatial distribution characteristics are high in the east and low in the west. This study helps to deepen our understanding of the carbon cycle and climate change, and can provide a scientific basis and risk assessment methods for policy formulation, effect evaluation, and international cooperation. Full article

18 pages, 2629 KiB  
Article
Ensemble Machine Learning Models Utilizing a Hybrid Recursive Feature Elimination (RFE) Technique for Detecting GPS Spoofing Attacks Against Unmanned Aerial Vehicles
by Raghad Al-Syouf, Omar Y. Aljarrah, Raed Bani-Hani and Abdallah Alma’aitah
Sensors 2025, 25(8), 2388; https://doi.org/10.3390/s25082388 - 9 Apr 2025
Viewed by 304
Abstract
The dependency of Unmanned Aerial Vehicles (UAVs), also known as drones, on off-board data, such as control and position data, makes them highly susceptible to serious safety and security threats, including data interceptions, Global Positioning System (GPS) jamming, and spoofing attacks. This indeed necessitates the existence of an Intrusion Detection System (IDS) in place to detect potential security threats/intrusions promptly. Recently, machine-learning-based IDSs have gained popularity due to their high performance in detecting known as well as novel cyber-attacks. However, the time and computation efficiencies of ML-based IDSs still present a challenge in the UAV domain. Therefore, this paper proposes a hybrid Recursive Feature Elimination (RFE) technique based on feature importance ranking along with a Spearman Correlation Analysis (SCA). This technique is built on ensemble learning approaches, namely, bagging, boosting, stacking, and voting classifiers, to efficiently detect GPS spoofing attacks. Two benchmark datasets are employed: the GPS spoofing dataset and the UAV location GPS spoofing dataset. The results show that our proposed ensemble models achieved a notable balance between efficacy and efficiency, showing that the bagging classifier achieved the highest accuracy rate of 99.50%. At the same time, the Decision Tree (DT) and the bagging classifiers achieved the lowest processing time of 0.003 s and 0.029 s, respectively, using the GPS spoofing dataset. For the UAV location GPS spoofing dataset, the bagging classifier emerged as the top performer, achieving 99.16% accuracy and 0.002 s processing time compared to other well-known ML models. In addition, the experimental results show that our proposed methodology (RFE) outperformed other well-known ML models built on conventional feature selection techniques for detecting GPS spoofing attacks, such as mutual information gain, correlation matrices, and the chi-square test. Full article
(This article belongs to the Section Navigation and Positioning)
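
The abstract above pairs feature-importance ranking with a Spearman Correlation Analysis. The sketch below illustrates only the correlation part, under the assumption that highly rank-correlated features are treated as redundant; the greedy filter `drop_correlated`, the 0.9 threshold, and the feature names are hypothetical and do not reproduce the authors' hybrid RFE procedure.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if |rho| with every
    already-kept feature is below the threshold."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns for illustration.
features = {
    "speed": [1.0, 2.0, 3.0, 4.0, 5.0],
    "speed_squared": [1.0, 4.0, 9.0, 16.0, 25.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_correlated(features)
```

Because Spearman's rho works on ranks, `speed_squared` is flagged as redundant with `speed` even though the relationship between them is not linear.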

25 pages, 5420 KiB  
Article
Explainable AI for Chronic Kidney Disease Prediction in Medical IoT: Integrating GANs and Few-Shot Learning
by Nermeen Gamal Rezk, Samah Alshathri, Amged Sayed and Ezz El-Din Hemdan
Bioengineering 2025, 12(4), 356; https://doi.org/10.3390/bioengineering12040356 - 29 Mar 2025
Viewed by 566
Abstract
According to recent global public health studies, chronic kidney disease (CKD) is increasingly recognized as a serious health risk that affects many people. Machine learning techniques have demonstrated high efficiency in identifying CKD, but their opaque decision-making processes limit their adoption in clinical settings. To address this, this study employs a generative adversarial network (GAN) to handle missing values in CKD datasets and utilizes few-shot learning techniques, such as prototypical networks and model-agnostic meta-learning (MAML), combined with explainable machine learning to predict CKD. Additionally, traditional machine learning models, including support vector machines (SVM), logistic regression (LR), decision trees (DT), random forests (RF), and voting ensemble learning (VEL), are applied for comparison. To unravel the “black box” nature of machine learning predictions, explainable AI techniques, such as SHapley Additive exPlanations (SHAP) and local interpretable model-agnostic explanations (LIME), are applied to understand the predictions made by the model, thereby supporting the decision-making process and identifying significant parameters in the diagnosis of CKD. Model performance is evaluated using predefined metrics, and the results indicate that few-shot learning models integrated with GANs significantly outperform traditional machine learning techniques. Prototypical networks with GANs achieve the highest accuracy of 99.99%, while MAML reaches 99.92%. Furthermore, prototypical networks attain F1-score, recall, precision, and Matthews correlation coefficient (MCC) values of 99.89%, 99.9%, 99.9%, and 100%, respectively, on the raw dataset. These results demonstrate the effectiveness of the proposed method, offering a reliable and trustworthy model to classify CKD. This framework supports the objectives of the Medical Internet of Things (MIoT) by enhancing smart medical applications and services, enabling accurate prediction and detection of CKD, and facilitating optimal medical decision making. Full article
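The abstract above reports precision, recall, F1-score, and the Matthews correlation coefficient (MCC). As a reading aid, here is a minimal sketch of how these metrics are commonly computed from a binary confusion matrix. The function names and the example counts are illustrative assumptions and are not taken from the article.

```python
import math


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts.

    Each metric is reported as 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    MCC ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal total is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, chosen only to exercise the functions.
example_prf = precision_recall_f1(tp=90, fp=10, fn=10)
example_mcc = matthews_corrcoef(tp=90, tn=90, fp=10, fn=10)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside F1 on imbalanced medical datasets.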
25 pages, 12169 KiB  
Article
Assessment of Landslide Susceptibility Based on the Two-Layer Stacking Model—A Case Study of Jiacha County, China
by Zhihan Wang, Tao Wen, Ningsheng Chen and Ruixuan Tang
Remote Sens. 2025, 17(7), 1177; https://doi.org/10.3390/rs17071177 - 26 Mar 2025
Viewed by 224
Abstract
The challenge of obtaining landslide susceptibility zoning in Tibet is compounded by the high altitude, extensive range, and difficult exploration of the region. To address this issue, a novel evaluation approach based on Stacking ensemble machine learning is proposed. This study focuses on Jiacha County, adopts the slope unit as the evaluation unit, and selects 14 evaluation factors that represent the topography and geomorphology, environmental and hydrological features, and basic geological features. These landslide conditioning factors were fed into a total of 4660 Stacking ensemble learning models, built from random combinations of 10 base algorithms: AdaBoost, Decision Tree (DT), Gradient Boosting Decision Tree (GBDT), k-Nearest Neighbors (kNN), LightGBM, Multilayer Perceptron (MLP), Random Forest (RF), Ridge Regression, Support Vector Machine (SVM), and XGBoost. All models were trained, the natural discontinuity method was used to classify landslide susceptibility, and the AUC value (the area under the ROC curve) was used to evaluate each model. The results show that, among the nine best-performing models, the maximum AUC values reach 0.78 on the test set and 0.99 on the training set. Most of the areas identified as high susceptibility and above are consistent with the interpretation of the existing geological field data. Thus, the Stacking ensemble method is applicable to landslide susceptibility assessment in Jiacha County, Tibet, and can provide theoretical support for disaster prevention and mitigation work in the Qinghai–Tibet Plateau area. Full article
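The abstract above evaluates models by AUC, the area under the ROC curve. The sketch below shows one standard way to compute it: AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the sample data are illustrative assumptions, not the authors' code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. landslide), 0 for the negative class.
    scores: predicted susceptibility, higher meaning more likely positive.
    Runs in O(P * N) time, which is fine for illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four slope units: three of the four
# positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A large gap between training and test AUC, such as the 0.99 versus 0.78 reported above, is usually read as a sign of overfitting, so the test-set value is the more informative one.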
34 pages, 2285 KiB  
Article
Empirical Analysis of Data Sampling-Based Decision Forest Classifiers for Software Defect Prediction
by Fatima Enehezei Usman-Hamza, Abdullateef Oluwagbemiga Balogun, Hussaini Mamman, Luiz Fernando Capretz, Shuib Basri, Rafiat Ajibade Oyekunle, Hammed Adeleye Mojeed and Abimbola Ganiyat Akintola
Software 2025, 4(2), 7; https://doi.org/10.3390/software4020007 - 21 Mar 2025
Viewed by 300
Abstract
The strategic significance of software testing in ensuring the success of software development projects is paramount. Comprehensive testing, conducted early and consistently across the development lifecycle, is vital for mitigating defects, especially given the constraints on time, budget, and other resources often faced by development teams. Software defect prediction (SDP) serves as a proactive approach to identifying software components that are most likely to be defective. By predicting these high-risk modules, teams can prioritize thorough testing and inspection, thereby preventing defects from escalating to later stages where resolution becomes more resource intensive. SDP models must be continuously refined to improve predictive accuracy and performance. This involves integrating clean and preprocessed datasets, leveraging advanced machine learning (ML) methods, and optimizing key metrics. Statistical-based and traditional ML approaches have been widely explored for SDP. However, statistical-based models often struggle with scalability and robustness, while conventional ML models face challenges with imbalanced datasets, limiting their prediction efficacy. In this study, innovative decision forest (DF) models were developed to address these limitations. Specifically, this study evaluates the cost-sensitive forest (CS-Forest), forest penalizing attributes (FPA), and functional trees (FT) as DF models. These models were further enhanced using homogeneous ensemble techniques, such as bagging and boosting. The experimental analysis on benchmark SDP datasets demonstrates that the proposed DF models effectively handle class imbalance, accurately distinguishing between defective and non-defective modules. Compared to baseline and state-of-the-art ML and deep learning (DL) methods, the proposed DF models exhibit superior prediction performance and offer scalable solutions for SDP. Consequently, the application of DF-based models is recommended for advancing defect prediction in software engineering and similar ML domains. Full article
28 pages, 5493 KiB  
Article
Multi-Objective Optimization Method for Power Transformer Design Based on Surrogate Modeling and Hybrid Heuristic Algorithm
by Baidi Shi, Wei Xiao, Liangxian Zhang, Tao Wang, Yongfeng Jiang, Jingyu Shang, Zixing Li, Xinfu Chen and Meng Li
Electronics 2025, 14(6), 1198; https://doi.org/10.3390/electronics14061198 - 18 Mar 2025
Viewed by 325
Abstract
In response to the increasing demands for energy conservation and pollution reduction, optimizing transformer design to reduce operational losses and minimize raw material usage has become crucial. This paper introduces an innovative methodology that combines ensemble learning models with hybrid multi-objective optimization heuristic algorithms to optimize leakage impedance deviation, on-load loss, and raw material consumption in power transformers. The stacking ensemble model uses support vector machines, linear regression, decision tree regression, and K-nearest neighbors as base learners, with the extreme learning machine serving as the meta-learner to re-learn outputs from first-level learners. Given the significant impact of hyperparameters on the prediction performance of ensemble learning models, an improved particle swarm optimization method is proposed for effective hyperparameter optimization. To assess the uncertainty of the proposed ensemble learning model, a Kriging surrogate model-based analysis is outlined. Moreover, a multi-objective algorithm that integrates the multi-objective grey wolf optimization (MOGWO) and the non-dominated sorting genetic algorithm-III (NSGA3) is presented for model optimization. This approach demonstrates superior performance compared to mainstream multi-objective optimization algorithms. The effectiveness of this method is further validated through tests on two real engineering cases. The proposed algorithm can accommodate various design requirements and, under the given constraints, achieve a multi-objective optimization design for power transformers, ensuring optimal performance in different operational scenarios. Full article
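The multi-objective methods named in the abstract above (MOGWO and NSGA-III) both rely on Pareto dominance to compare candidate designs. The sketch below illustrates only that shared building block for a minimization problem with three objectives. It is not the hybrid algorithm from the article, and the function names and candidate values are hypothetical.

```python
from typing import Sequence

Objectives = Sequence[float]  # assumed order: (impedance deviation, loss, material use)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if design `a` Pareto-dominates design `b` under minimization.

    `a` must be no worse than `b` in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Objectives]) -> list[Objectives]:
    """Return the non-dominated subset of `points`, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate designs; lower is better in every objective.
candidates = [
    (1.0, 5.0, 3.0),  # best on the first objective, so it stays on the front
    (2.0, 2.0, 2.0),  # balanced design, not dominated by any other candidate
    (3.0, 3.0, 3.0),  # dominated by (2.0, 2.0, 2.0)
]
front = pareto_front(candidates)
```

The result is a set of trade-off designs rather than a single optimum, which leaves the final choice to the engineer's design requirements.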