Search Results (406)

Search Parameters:
Keywords = SMOTE oversampling

11 pages, 832 KB  
Proceeding Paper
Heart Failure Prediction Through a Comparative Study of Machine Learning and Deep Learning Models
by Mohid Qadeer, Rizwan Ayaz and Muhammad Ikhsan Thohir
Eng. Proc. 2025, 107(1), 61; https://doi.org/10.3390/engproc2025107061 - 4 Sep 2025
Abstract
The heart is essential to human life, so it is important to protect it and to understand the kinds of damage it can sustain. Many cardiac diseases ultimately lead to heart failure. To help address this, a tool for predicting survival is needed. This study explores the use of several classification models for forecasting heart failure outcomes using the Heart Failure Clinical Records dataset. The study contrasts a deep learning (DL) model, the Convolutional Neural Network (CNN), with several machine learning models, including Random Forest (RF), K-Nearest Neighbors (KNN), Decision Tree (DT), and Naïve Bayes (NB). Various data processing techniques, such as standard scaling and the Synthetic Minority Oversampling Technique (SMOTE), are used to improve prediction accuracy. The CNN model performs best, achieving 99% accuracy; in comparison, the best-performing ML model, Naïve Bayes, reaches 92.57%. This shows that deep learning provides better predictions of heart failure, making it a useful tool for early detection and better patient care.
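
A rough illustration of the preprocessing pipeline this abstract describes (standard scaling plus SMOTE ahead of a classifier), using imbalanced-learn's pipeline so that oversampling touches only the training folds during cross-validation; the Naïve Bayes stand-in and the commented data variables are assumptions, not the authors' exact setup:

```python
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Scale features, oversample the minority class, then classify.
# The imblearn Pipeline applies SMOTE only when fitting, so evaluation
# folds are never contaminated with synthetic samples.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("smote", SMOTE(random_state=42)),
    ("clf", GaussianNB()),
])
# scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")  # X, y: clinical records

```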
10 pages, 1019 KB  
Proceeding Paper
Classification of Infectious and Parasitic Diseases by Smart Healthcare System
by Junwei Yang, Teerawat Simmachan, Subij Shakya and Pichit Boonkrong
Eng. Proc. 2025, 108(1), 14; https://doi.org/10.3390/engproc2025108014 - 1 Sep 2025
Viewed by 676
Abstract
We developed a machine-learning model for International Classification of Diseases, 10th Revision (ICD-10) classification using data from 5108 patients. Nine features, including age, gender, BMI, and vital signs, were extracted to classify the top three ICD-10 categories: intestinal infections, tuberculosis, and other bacterial diseases. Decision tree, random forest, and XGBoost models were tested with the synthetic minority over-sampling technique (SMOTE) and with class weights to mitigate class imbalance. Five-fold cross-validation was applied, with the data split into training and testing sets at an 80:20 ratio. The random forest model with class weights showed the best performance. Shapley additive explanations (SHAP) analysis highlighted body-mass index (BMI), gender, and pulse as key features. The developed model shows potential for enhancing ICD-10 classification in real-time, personalized medical applications.
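
The abstract compares SMOTE against class weighting; a minimal sketch of the class-weight alternative that won here, for a random forest (tree count and other settings are illustrative assumptions):

```python
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

# Option 1: reweight classes inversely to their frequency (no synthetic data).
rf_weighted = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)

# Option 2: oversample minority ICD-10 categories before fitting a plain forest.
# X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
# rf_plain = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_res, y_res)

```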
18 pages, 1553 KB  
Article
GAN-AHR: A GAN-Based Adaptive Hybrid Resampling Algorithm for Imbalanced Intrusion Detection
by Monirah Al-Ajlan and Mourad Ykhlef
Electronics 2025, 14(17), 3476; https://doi.org/10.3390/electronics14173476 - 29 Aug 2025
Viewed by 375
Abstract
With the recent proliferation of the Internet and the ever-evolving threat landscape, developing a reliable and effective intrusion detection system (IDS) has become an urgent need. However, one of the key challenges hindering IDS development is class imbalance, which often leads to biased models and poor detection rates. To address this challenge, this paper proposes GAN-AHR, an algorithm that adaptively balances the dataset by augmenting minority classes with either CGAN or BSMOTE, based on class-specific characteristics such as compactness and density. Leveraging BSMOTE to oversample classes with high compactness and high density exploits its simplicity and effectiveness. However, the quality of BSMOTE-generated data is significantly lower when classes are sparse and lack clear boundaries; in such cases, CGAN is better suited, given its ability to capture complex data distributions. We present empirical results on the NF-UNSW-NB15 dataset using a Random Forest (RF) classifier, reporting a significant improvement in the precision, recall, and F1-score of several minority classes. In particular, the F1-scores for the Shellcode and DoS classes rose markedly, reaching 0.90 and 0.51, respectively.
(This article belongs to the Special Issue New Trends in Cryptography, Authentication and Information Security)
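
A hedged sketch of the adaptive-routing idea: score each minority class by compactness and density, then send compact, dense classes to BSMOTE and sparse ones to a conditional GAN. The scoring formulas and thresholds below are simple stand-ins, not the paper's definitions:

```python
import numpy as np
from imblearn.over_sampling import BorderlineSMOTE

def choose_resampler(X_min, compact_thr=1.0, dense_thr=10.0):
    # Stand-in metrics: inverse mean distance to the centroid as compactness,
    # sample count scaled by compactness as density.
    centroid = X_min.mean(axis=0)
    compactness = 1.0 / (np.linalg.norm(X_min - centroid, axis=1).mean() + 1e-9)
    density = len(X_min) * compactness
    if compactness >= compact_thr and density >= dense_thr:
        return "bsmote"  # well-bounded class: BorderlineSMOTE is cheap and effective
    return "cgan"        # sparse class: a conditional GAN would be trained instead

# sampler = BorderlineSMOTE(random_state=0)  # used when choose_resampler(...) == "bsmote"

```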
7 pages, 347 KB  
Proceeding Paper
Stroke Prediction Using Machine Learning Algorithms
by Nayab Kanwal, Sabeen Javaid and Dhita Diana Dewi
Eng. Proc. 2025, 107(1), 32; https://doi.org/10.3390/engproc2025107032 - 27 Aug 2025
Viewed by 192
Abstract
Stroke is a major global cause of death and disability, and improving outcomes requires early prediction. Although class imbalance in datasets causes biased predictions and inferior classification accuracy, machine learning (ML) techniques have shown potential in stroke prediction. To address these problems, we used the Synthetic Minority Oversampling Technique (SMOTE) to balance the datasets and lessen bias. Furthermore, we proposed a method that combines an autoencoder for feature extraction with a linear discriminant analysis (LDA) model for classification. A grid search was used to optimize the hyperparameters of the LDA model. We used criteria such as accuracy, sensitivity, specificity, AUC (area under the curve), and the ROC (receiver operating characteristic) curve to guarantee a strong evaluation. With 98.51% sensitivity, 97.56% specificity, 99.24% accuracy, and 98.00% balanced accuracy, our model demonstrated remarkable performance, indicating its potential to improve stroke prediction and aid clinical decision-making.
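
For the grid-searched LDA stage, a small sketch (the autoencoder features and grid values are assumptions; LDA exposes only a handful of tunable hyperparameters):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV

# shrinkage is only valid for the lsqr/eigen solvers, hence two sub-grids.
param_grid = [
    {"solver": ["svd"]},
    {"solver": ["lsqr", "eigen"], "shrinkage": [None, "auto", 0.1, 0.5]},
]
search = GridSearchCV(LinearDiscriminantAnalysis(), param_grid, cv=5, scoring="roc_auc")
# search.fit(Z_train, y_train)  # Z_train: autoencoder-extracted features (assumed name)

```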
23 pages, 2230 KB  
Article
Ensemble Learning for Software Requirement-Risk Assessment: A Comparative Study of Bagging and Boosting Approaches
by Chandan Kumar, Pathan Shaheen Khan, Medandrao Srinivas, Sudhanshu Kumar Jha, Shiv Prakash and Rajkumar Singh Rathore
Future Internet 2025, 17(9), 387; https://doi.org/10.3390/fi17090387 - 27 Aug 2025
Viewed by 347
Abstract
In software development, software requirement engineering (SRE) is an essential stage that ensures requirements are clear and unambiguous. However, incompleteness, inconsistency, and ambiguity in requirement documents often occur and can cause project delays, cost escalation, or total failure. In response to these challenges, this paper introduces a machine learning method that automatically identifies the risk levels of software requirements using ensemble classification methods. The labeled textual requirement dataset was preprocessed using conventional preprocessing techniques, label encoding, and oversampling with the synthetic minority oversampling technique (SMOTE) to handle class imbalance. Various ensemble and baseline models, such as extra trees, random forest, bagging with decision trees, XGBoost, LightGBM, gradient boosting, decision trees, support vector machine, and multi-layer perceptron, were trained and compared. Five-fold cross-validation was used to provide stable performance evaluation on accuracy, area under the ROC curve (AUC), F1-score, precision, recall, root mean square error (RMSE), and error rate. The bagging (DT) classifier achieved the best overall performance, with an accuracy of 99.55%, an AUC of 0.9971, and an F1-score of 97.23%, while maintaining a low RMSE of 0.03 and an error rate of 0.45%. These results demonstrate the effectiveness of ensemble-based classifiers, especially bagging (DT), in accurately predicting high-risk software requirements. The proposed method enables early detection and mitigation of requirement risks, aiding project managers and software engineers in improving resource planning, reducing rework, and enhancing overall software quality.
(This article belongs to the Collection Information Systems Security)
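
A minimal sketch of the best-performing configuration, SMOTE followed by bagged decision trees (the estimator= keyword assumes scikit-learn >= 1.2; the estimator count and the commented data names are illustrative):

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bootstrap-aggregated decision trees over a SMOTE-balanced training set.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, random_state=0)
# X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)  # X_train: encoded requirement texts
# bag.fit(X_res, y_res)

```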
27 pages, 2279 KB  
Article
HQRNN-FD: A Hybrid Quantum Recurrent Neural Network for Fraud Detection
by Yao-Chong Li, Yi-Fan Zhang, Rui-Qing Xu, Ri-Gui Zhou and Yi-Lin Dong
Entropy 2025, 27(9), 906; https://doi.org/10.3390/e27090906 - 27 Aug 2025
Viewed by 418
Abstract
Detecting financial fraud is a critical aspect of modern intelligent financial systems. Despite the advances brought by deep learning in predictive accuracy, challenges persist—particularly in capturing complex, high-dimensional nonlinear features. This study introduces a novel hybrid quantum recurrent neural network for fraud detection (HQRNN-FD). The model utilizes variational quantum circuits (VQCs) incorporating angle encoding, data reuploading, and hierarchical entanglement to project transaction features into quantum state spaces, thereby facilitating quantum-enhanced feature extraction. For sequential analysis, the model integrates a recurrent neural network (RNN) with a self-attention mechanism to effectively capture temporal dependencies and uncover latent fraudulent patterns. To mitigate class imbalance, the synthetic minority over-sampling technique (SMOTE) is employed during preprocessing, enhancing both class representation and model generalizability. Experimental evaluations reveal that HQRNN-FD attains an accuracy of 0.972 on publicly available fraud detection datasets, outperforming conventional models by 2.4%. In addition, the framework exhibits robustness against quantum noise and improved predictive performance with increasing qubit numbers, validating its efficacy and scalability for imbalanced financial classification tasks.
(This article belongs to the Special Issue Quantum Computing in the NISQ Era)
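
A rough PennyLane sketch of angle encoding with data re-uploading and chain entanglement, the VQC ingredients the abstract names; the circuit shape, layer count, and rotation choices are assumptions, and the paper's hierarchical entanglement and RNN head are not reproduced here:

```python
import numpy as np
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def vqc(features, weights):
    # Data re-uploading: re-encode the transaction features before each
    # variational layer instead of encoding them only once.
    for layer in weights:
        qml.AngleEmbedding(features, wires=range(n_qubits), rotation="Y")
        for w in range(n_qubits):
            qml.RY(layer[w], wires=w)
        for w in range(n_qubits - 1):  # linear (chain) entanglement
            qml.CNOT(wires=[w, w + 1])
    return qml.expval(qml.PauliZ(0))

weights = np.random.uniform(0, np.pi, size=(3, n_qubits))  # 3 re-uploading layers
# score = vqc(np.array([0.1, 0.5, 0.9, 0.3]), weights)

```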
14 pages, 3720 KB  
Proceeding Paper
A Novel Data-Driven Framework for Automated Migraines Classification Using Ensemble Learning
by Muhammad Owais Butt, Azka Mir and Alun Sujjada
Eng. Proc. 2025, 107(1), 25; https://doi.org/10.3390/engproc2025107025 - 26 Aug 2025
Viewed by 280
Abstract
Migraines are recurring and highly painful headaches with multiple associated symptoms that severely affect millions of people around the world. The condition is considered serious from a neurologist's perspective because it is highly debilitating. Effective treatment of migraine begins with diagnosis, but the subjective nature of clinical evaluations, along with class imbalance in patient datasets, makes this complicated. This paper tackles these issues by developing a machine-learning framework for automated migraine classification using a Kaggle dataset of 400 samples with 23 independent attributes and 1 dependent attribute representing different types of migraine. Our framework starts with a detailed cleansing of the data, including the removal of missing values. The imbalanced dataset is then addressed with SMOTE (Synthetic Minority Oversampling Technique), followed by optimized feature selection through forward selection and cross-validation with Naïve Bayes. Supervised machine-learning classifiers, namely Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), and Naïve Bayes (NB), are evaluated and combined through voting to predict the outcome.
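
The forward selection with Naïve Bayes and the voting step might look like this sketch (the selector settings and individual estimators' hyperparameters are assumptions):

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Forward selection scored by cross-validated Naïve Bayes performance.
selector = SequentialFeatureSelector(GaussianNB(), direction="forward", cv=5)

# Majority vote over the four classifiers named in the abstract.
vote = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
# X_sel = selector.fit_transform(X_res, y_res); vote.fit(X_sel, y_res)

```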
13 pages, 1341 KB  
Proceeding Paper
Predicting Nurse Stress Levels Using Time-Series Sensor Data and Comparative Evaluation of Classification Algorithms
by Ayşe Çiçek Korkmaz, Adem Korkmaz and Selahattin Koşunalp
Eng. Proc. 2025, 104(1), 30; https://doi.org/10.3390/engproc2025104030 - 22 Aug 2025
Viewed by 242
Abstract
This study proposes a machine learning-based framework for classifying occupational stress levels among nurses using physiological time-series data collected from wearable sensors. The dataset comprises multimodal signals, including electrodermal activity (EDA), heart rate (HR), skin temperature (TEMP), and tri-axial accelerometer measurements (X, Y, Z), labeled into three categorical stress levels: low (0), medium (1), and high (2). To enhance the usability of the raw data, a resampling process was performed to aggregate the measurements into one-minute intervals, followed by the application of the Synthetic Minority Over-sampling Technique (SMOTE) to mitigate severe class imbalance. Subsequently, a comparative classification analysis was conducted using four supervised learning algorithms: Random Forest, XGBoost, k-Nearest Neighbors (k-NN), and LightGBM. Model performances were evaluated based on accuracy, weighted F1-score, and confusion matrices to ensure robustness across imbalanced class distributions. Additionally, temporal pattern analyses by day of the week and hour of the day revealed significant trends in stress variation, underscoring the influence of circadian and organizational factors. Among the models tested, ensemble-based methods, particularly Random Forest and XGBoost with optimized hyperparameters, demonstrated superior predictive performance. These findings highlight the feasibility of integrating real-time, sensor-driven stress monitoring systems into healthcare environments to support proactive workforce management and improve care quality.
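
The one-minute aggregation plus SMOTE step could be sketched as follows; the column names and the per-interval labeling rule (worst stress observed in the minute) are assumptions, not the paper's specification:

```python
import pandas as pd
from imblearn.over_sampling import SMOTE

SIGNALS = ["EDA", "HR", "TEMP", "X", "Y", "Z"]  # hypothetical column names

def minute_level_balanced(df: pd.DataFrame):
    # df is indexed by timestamp; aggregate raw sensor readings to one-minute means.
    feats = df[SIGNALS].resample("1min").mean()
    labels = df["stress_level"].resample("1min").max()  # assumed rule: worst stress in the minute
    minute = feats.join(labels).dropna()
    # Oversample the minority stress levels on the aggregated rows.
    return SMOTE(random_state=0).fit_resample(minute[SIGNALS], minute["stress_level"].astype(int))

```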
25 pages, 1872 KB  
Article
Food Safety Risk Prediction and Regulatory Policy Enlightenment Based on Machine Learning
by Daqing Wu, Hangqi Cai and Tianhao Li
Systems 2025, 13(8), 715; https://doi.org/10.3390/systems13080715 - 19 Aug 2025
Viewed by 433
Abstract
This paper focuses on the challenges of food safety governance in megacities, taking Shanghai as the research object. To address the pain points in food sampling inspections, it proposes a risk prediction and regulatory optimization scheme combining text mining and machine learning. First, the paper uses the LDA method to mine over 78,000 food sampling records across 34 categories in Shanghai and identify core risk themes. Second, it applies SMOTE oversampling to the sampling data, in which the unqualified rate is extremely low (0.5%). Finally, a machine learning model for predicting food safety risks is constructed and used to generate predictions. The research findings are as follows: ① Food risks in Shanghai show significant patterns in timing, category, and pollution causes. ② Supply chain links, regulatory intensity, and consumption scenarios are among the core influencing factors. ③ The traditional "full coverage" model is inefficient, and resources need to be tilted toward high-risk categories. ④ Public attention (e.g., the "You Order, We Inspect" initiative) can drive regulatory responses that improve the qualified rate. Based on these findings, the paper suggests that relevant authorities should ① classify food categories into three risk levels, increase inspection frequency for high-risk products in summer, adjust sampling intensity across business entities, and establish a dynamic hierarchical regulatory mechanism; ② tackle source governance, reduce environmental pollution, upgrade process supervision, and strengthen whole-chain risk prevention and control; and ③ promote public participation, strengthen the enterprise responsibility system, and deepen the pattern of social co-governance. This study addresses risk early-warning problems in the food safety supervision of megacities, providing a scientific basis and a practical path for optimizing the allocation of regulatory resources and improving governance efficiency.
(This article belongs to the Topic Digital Technologies in Supply Chain Risk Management)
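
A toy sketch of the LDA topic-mining step on inspection text, using scikit-learn's LatentDirichletAllocation; the document strings and topic count are placeholders, not the paper's corpus or settings:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "vegetable pesticide residue exceeded limit",  # toy stand-ins for the
    "aquatic product veterinary drug detected",    # 78,000 sampling records
]
X_counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=10, random_state=0).fit(X_counts)
# lda.components_ ranks terms per topic, i.e., per candidate risk theme

```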
23 pages, 5632 KB  
Article
Classification of Rockburst Intensity Grades: A Method Integrating k-Medoids-SMOTE and BSLO-RF
by Qinzheng Wu, Bing Dai, Danli Li, Hanwen Jia and Penggang Li
Appl. Sci. 2025, 15(16), 9045; https://doi.org/10.3390/app15169045 - 16 Aug 2025
Viewed by 366
Abstract
Precise forecasting of rockburst intensity categories is vital to safeguarding operational safety and refining design protocols in deep underground engineering. This study proposes an intelligent forecasting framework through the integration of k-medoids-SMOTE and the BSLO-optimized Random Forest (BSLO-RF) algorithm. A curated dataset encompassing 351 rockburst instances, stratified into four intensity grades, was compiled via systematic literature synthesis. To mitigate data imbalance and outlier interference, z-score normalization and k-medoids-SMOTE oversampling were implemented, with t-SNE visualization confirming improved inter-class distinguishability. Notably, the BSLO algorithm was utilized for hyperparameter tuning of the Random Forest model, thereby strengthening its global search and local refinement capabilities. Comparative analyses revealed that the optimized BSLO-RF framework outperformed conventional machine learning methods (e.g., BSLO-SVM, BSLO-BP), achieving an average prediction accuracy of 89.16% on the balanced dataset—accompanied by a recall of 87.5% and F1-score of 0.88. It exhibited superior performance in predicting extreme grades: 93.3% accuracy for Level I (no rockburst) and 87.9% for Level IV (severe rockburst), exceeding BSLO-SVM (75.8% for Level IV) and BSLO-BP (72.7% for Level IV). Field validation via the Zhongnanshan Tunnel project further corroborated its reliability, yielding an 80% prediction accuracy (four out of five cases correctly classified) and verifying its adaptability to complex geological settings. This research introduces a robust intelligent classification approach for rockburst intensity, offering actionable insights for risk assessment and mitigation in deep mining and tunneling initiatives.
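
A hedged sketch of the k-medoids-SMOTE idea: cluster the minority class with k-medoids, then interpolate between samples within the same cluster. It relies on scikit-learn-extra's KMedoids, and the cluster count and interpolation rule are assumptions, not the paper's exact procedure:

```python
import numpy as np
from sklearn_extra.cluster import KMedoids  # pip install scikit-learn-extra

def kmedoids_smote(X_min, n_new, k=4, seed=0):
    rng = np.random.default_rng(seed)
    labels = KMedoids(n_clusters=k, random_state=seed).fit_predict(X_min)
    synthetic = []
    while len(synthetic) < n_new:
        members = X_min[labels == rng.integers(k)]
        if len(members) < 2:
            continue  # skip degenerate clusters
        a, b = members[rng.choice(len(members), size=2, replace=False)]
        synthetic.append(a + rng.random() * (b - a))  # interpolate inside the cluster
    return np.asarray(synthetic)

```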
37 pages, 2286 KB  
Article
Parameterised Quantum SVM with Data-Driven Entanglement for Zero-Day Exploit Detection
by Steven Jabulani Nhlapo, Elodie Ngoie Mutombo and Mike Nkongolo Wa Nkongolo
Computers 2025, 14(8), 331; https://doi.org/10.3390/computers14080331 - 15 Aug 2025
Viewed by 607
Abstract
Zero-day attacks pose a persistent threat to computing infrastructure by exploiting previously unknown software vulnerabilities that evade traditional signature-based network intrusion detection systems (NIDSs). To address this limitation, machine learning (ML) techniques offer a promising approach for enhancing anomaly detection in network traffic. This study evaluates several ML models on a labeled network traffic dataset, with a focus on zero-day attack detection. Ensemble learning methods, particularly eXtreme gradient boosting (XGBoost), achieved perfect classification, identifying all 6231 zero-day instances without false positives and maintaining efficient training and prediction times. While classical support vector machines (SVMs) performed modestly at 64% accuracy, their performance improved to 98% with the use of the borderline synthetic minority oversampling technique (SMOTE) and SMOTE + edited nearest neighbours (SMOTEENN). To explore quantum-enhanced alternatives, a quantum SVM (QSVM) is implemented using three-qubit and four-qubit quantum circuits simulated on the aer_simulator_statevector. The QSVM achieved high accuracy (99.89%) and strong F1-scores (98.95%), indicating that nonlinear quantum feature maps (QFMs) can increase sensitivity to zero-day exploit patterns. Unlike prior work that applies standard quantum kernels, this study introduces a parameterised quantum feature encoding scheme, where each classical feature is mapped using a nonlinear function tuned by a set of learnable parameters. Additionally, a sparse entanglement topology is derived from mutual information between features, ensuring a compact and data-adaptive quantum circuit that aligns with the resource constraints of noisy intermediate-scale quantum (NISQ) devices. Our contribution lies in formalising a quantum circuit design that enables scalable, expressive, and generalisable quantum architectures tailored for zero-day attack detection. This extends beyond conventional usage of QSVMs by offering a principled approach to quantum circuit construction for cybersecurity. While these findings are obtained via noiseless simulation, they provide a theoretical proof of concept for the viability of quantum ML (QML) in network security. Future work should target real quantum hardware execution and adaptive sampling techniques to assess robustness under decoherence, gate errors, and dynamic threat environments.
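
The two resampling variants credited with lifting the classical SVM from 64% to 98% are both available in imbalanced-learn; a minimal sketch (SVM settings and data names assumed):

```python
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.combine import SMOTEENN
from sklearn.svm import SVC

svm = SVC(kernel="rbf", C=1.0)  # classical baseline; settings assumed
# Borderline-SMOTE synthesizes samples near the class boundary only:
# X_bs, y_bs = BorderlineSMOTE(random_state=0).fit_resample(X_train, y_train)
# SMOTEENN oversamples, then cleans noisy points with edited nearest neighbours:
# X_se, y_se = SMOTEENN(random_state=0).fit_resample(X_train, y_train)
# svm.fit(X_bs, y_bs)  # or (X_se, y_se)

```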
22 pages, 3234 KB  
Article
A Lightweight CNN for Multiclass Retinal Disease Screening with Explainable AI
by Arjun Kumar Bose Arnob, Muhammad Hasibur Rashid Chayon, Fahmid Al Farid, Mohd Nizam Husen and Firoz Ahmed
J. Imaging 2025, 11(8), 275; https://doi.org/10.3390/jimaging11080275 - 15 Aug 2025
Viewed by 833
Abstract
Timely, balanced, and transparent detection of retinal diseases is essential to avert irreversible vision loss; however, current deep learning screeners are hampered by class imbalance, large models, and opaque reasoning. This paper presents a lightweight attention-augmented convolutional neural network (CNN) that addresses all three barriers. The network combines depthwise separable convolutions, squeeze-and-excitation, and global-context attention, and it incorporates gradient-based class activation mapping (Grad-CAM) and Grad-CAM++ to ensure that every decision is accompanied by pixel-level evidence. A 5335-image ten-class color-fundus dataset from Bangladeshi clinics, which was severely skewed (17–1509 images per class), was equalized using a synthetic minority oversampling technique (SMOTE) and task-specific augmentations. Images were resized to 150×150 px and split 70:15:15. The training used the adaptive moment estimation (Adam) optimizer (initial learning rate of 1×10⁻⁴, reduce-on-plateau, early stopping), ℓ2 regularization, and dual dropout. The 16.6 M parameter network converged in fewer than 50 epochs on a mid-range graphics processing unit (GPU) and reached 87.9% test accuracy, a macro-precision of 0.882, a macro-recall of 0.879, and a macro-F1-score of 0.880, reducing the error by 58% relative to the best ImageNet backbone (Inception-V3, 40.4% accuracy). Eight disorders recorded true-positive rates above 95%; macular scar and central serous chorioretinopathy attained F1-scores of 0.77 and 0.89, respectively. Saliency maps consistently highlighted optic disc margins, subretinal fluid, and other hallmarks. Targeted class re-balancing, lightweight attention, and integrated explainability, therefore, deliver accurate, transparent, and deployable retinal screening suitable for point-of-care ophthalmic triage on resource-limited hardware.
(This article belongs to the Section Medical Imaging)
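
A rough Keras sketch of the building block the abstract describes, a depthwise separable convolution followed by squeeze-and-excitation; channel counts and the reduction ratio are assumptions, not the paper's architecture:

```python
from tensorflow.keras import layers

def sep_conv_se_block(x, filters, ratio=16):
    # Depthwise separable convolution keeps the parameter count low.
    x = layers.SeparableConv2D(filters, 3, padding="same", activation="relu")(x)
    # Squeeze-and-excitation: reweight channels by a learned global descriptor.
    s = layers.GlobalAveragePooling2D()(x)
    s = layers.Dense(filters // ratio, activation="relu")(s)
    s = layers.Dense(filters, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, filters))(s)
    return layers.Multiply()([x, s])

```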
17 pages, 2542 KB  
Article
Automated Landform Classification from InSAR-Derived DEMs Using an Enhanced Random Forest Model for Urban Transportation Corridor Hazard Assessment
by Song Zhu, Yuansheng Hua, Jiasong Zhu and Fanyi Meng
Remote Sens. 2025, 17(16), 2819; https://doi.org/10.3390/rs17162819 - 14 Aug 2025
Viewed by 288
Abstract
Interferometric Synthetic Aperture Radar (InSAR)-derived Digital Elevation Models (DEMs) provide critical landform data for monitoring the stability of urban infrastructure, especially for linear infrastructure such as roads and transportation corridors. Traditional landform classification methods are often hindered by incomplete results and require significant manual intervention. To address these challenges, we propose an automated landform classification method based on an enhanced Random Forest (RF) model that integrates Optimization of Decreasing Reduction (ODR) for majority class undersampling and Support Vector Machine Synthetic Minority Oversampling Technique (SVM-SMOTE) for minority class oversampling, specifically to address class imbalance. The method was validated using a dataset of 82,450 expert-labeled samples from approximately 100 km of highway corridors, with independent test sets and ten-fold cross-validation. The enhanced RF model achieved a classification completeness rate of 100% and a macro F-score of 97.0%, significantly outperforming traditional rule-based and standard RF methods. This approach provides robust post-processing support for InSAR-based urban infrastructure monitoring and environmental modeling.
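
SVM-SMOTE is available in imbalanced-learn; ODR is not, so the sketch below pairs SVMSMOTE with a generic undersampler as a stand-in (pipeline order and forest settings are assumptions):

```python
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SVMSMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier

pipe = Pipeline([
    ("under", RandomUnderSampler(random_state=0)),  # stand-in for ODR undersampling
    ("over", SVMSMOTE(random_state=0)),             # SVM-boundary-guided oversampling
    ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
])
# pipe.fit(X_train, y_train)  # X_train: DEM-derived terrain features (assumed name)

```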
32 pages, 2983 KB  
Article
TS-SMOTE: An Improved SMOTE Method Based on Symmetric Triangle Scoring Mechanism for Solving Class-Imbalanced Problems
by Shihao Song and Sibo Yang
Symmetry 2025, 17(8), 1326; https://doi.org/10.3390/sym17081326 - 14 Aug 2025
Viewed by 367
Abstract
The imbalanced classification problem is a key research topic in machine learning, as the relevant algorithms tend to focus on the features and patterns of the majority class while learning the minority class insufficiently, resulting in unsatisfactory performance. Scholars have attempted to solve this problem and have proposed many ideas at the data and algorithm levels. The SMOTE (Synthetic Minority Over-sampling Technique) method is an effective approach at the data level. In this paper, we propose an oversampling method based on SMOTE and a scoring mechanism built on symmetric regular triangles. The method tiles the plane with symmetric triangles and establishes a suitable scoring mechanism to select the minority samples that participate in synthesis. After selecting the minority samples, it conducts multiple linear interpolations according to the established rules to generate new minority samples. In the experimental section, we select 30 imbalanced datasets to compare the performance of the proposed method and several classical oversampling methods under different indicators. To demonstrate how these oversampling methods interact with classifiers, we also test them with three different classifiers. The experimental results show that the TS-SMOTE method achieves the best performance.
(This article belongs to the Special Issue Advances in Neural Network/Deep Learning and Symmetry/Asymmetry)
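
For reference, the vanilla SMOTE interpolation that TS-SMOTE builds on looks like the sketch below; TS-SMOTE's contribution is replacing the uniform random choice of seed samples with its triangle-tiling score, which is not reproduced here:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_interpolate(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    # k nearest minority neighbors of each minority sample (excluding itself).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbors = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))      # TS-SMOTE scores this choice instead
        j = neighbors[i, rng.integers(k)]
        out.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.asarray(out)

```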
22 pages, 1710 KB  
Article
Machine Learning Techniques Improving the Box–Cox Transformation in Breast Cancer Prediction
by Sultan S. Alshamrani
Electronics 2025, 14(16), 3173; https://doi.org/10.3390/electronics14163173 - 9 Aug 2025
Viewed by 434
Abstract
Breast cancer remains a major global health problem, characterized by high incidence and mortality rates. Developing accurate prediction models is essential to improving early detection and treatment outcomes. Machine learning (ML) has become a valuable resource in breast cancer prediction; however, the complexities inherent in medical data, including biases and imbalances, can hinder the effectiveness of these models. This paper explores combining the Box–Cox transformation with ML models to normalize data distributions and stabilize variance, thereby enhancing prediction accuracy. Two datasets were analyzed: a synthetic gamma-distributed dataset that simulates skewed real-world data and the Surveillance, Epidemiology, and End Results (SEER) breast cancer dataset, which exhibits real-world imbalance. Four experimental scenarios were conducted to evaluate the impact of the Box–Cox transformation across different lambda values: the ML models on the synthetic dataset, the SEER dataset with the Box–Cox transformation, the SEER dataset with the logarithmic transformation, and the SEER dataset with Synthetic Minority Over-sampling Technique (SMOTE) augmentation. The results show that the Box–Cox transformation significantly improves the performance of the Artificial Intelligence (AI) models, with the stacking model achieving the highest accuracy (94.53%) and F1-score (94.74%). This study demonstrates the importance of feature transformation in healthcare analytics, offering a scalable framework for improving breast cancer prediction that is potentially applicable to other medical datasets with similar challenges.
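
The Box–Cox step itself is one SciPy call per positive-valued feature; a minimal sketch, where the gamma parameters mirror the synthetic-data scenario and are otherwise assumed:

```python
import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=2.0, size=1000)  # skewed, strictly positive feature

x_transformed, lam = boxcox(x)  # lambda fitted by maximum likelihood
# lam near 0 approaches a log transform; lam = 1 leaves the shape unchanged.

```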