Article

SMOTE vs. SMOTEENN: A Study on the Performance of Resampling Algorithms for Addressing Class Imbalance in Regression Models

1 Department of Anatomy, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
2 Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
3 The Ferrara Center for Patient Safety and Clinical Simulation, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, NY 11568, USA
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(1), 37; https://doi.org/10.3390/a18010037
Submission received: 20 December 2024 / Revised: 8 January 2025 / Accepted: 9 January 2025 / Published: 10 January 2025
(This article belongs to the Special Issue Algorithms in Data Classification (2nd Edition))

Abstract

Class imbalance is a prevalent challenge in machine learning. It arises when the data distribution is skewed toward one class, causing models to prioritize the majority class and underperform on the minority classes. This bias can significantly undermine prediction accuracy in real-world scenarios, which makes the robust handling of imbalanced data important for dependable results. This study examines one such scenario: real-time monitoring systems for fall risk assessment in bedridden patients, where class imbalance may compromise the effectiveness of machine learning. It compares two resampling techniques, the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with Edited Nearest Neighbors (SMOTEENN), in mitigating class imbalance and improving predictive performance. Using a controlled sampling strategy across various instance levels, both methods were evaluated in conjunction with decision tree regression, gradient boosting regression, and Bayesian regression models. The results indicate that SMOTEENN consistently outperforms SMOTE in terms of accuracy and mean squared error across all sample sizes and models. SMOTEENN also demonstrates healthier learning curves, suggesting improved generalization capabilities, particularly for a sampling strategy with a given number of instances. Furthermore, cross-validation analysis reveals that SMOTEENN achieves higher mean accuracy and lower standard deviation than SMOTE, indicating more stable and reliable performance. These findings suggest that SMOTEENN is a more effective technique for handling class imbalance, potentially contributing to the development of more accurate and generalizable predictive models in various applications.
Keywords: class imbalance; SMOTE; SMOTEENN; oversampling; machine learning
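
To make the difference between the two resampling techniques concrete, the following is a minimal pure-Python sketch of the general idea, not the implementation used in this study. SMOTE is shown as interpolation between a minority sample and one of its nearest minority neighbours; the Edited Nearest Neighbors step is shown as removal of any sample whose label disagrees with the majority vote of its nearest neighbours. The function names, the toy data, the choice of k = 3, and the decision to clean both classes are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def nearest(point, pool, k):
    """Indices of the k points in `pool` closest to `point` (Euclidean)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def smote(X, y, minority, n_new, k=3, seed=0):
    """Append n_new synthetic minority samples (hypothetical helper).

    Each synthetic sample lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    if len(pts) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pts)
        # Rank k + 1 points and drop the first: it is the base itself.
        neighbours = nearest(base, pts, k + 1)[1:]
        other = pts[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return list(X) + synthetic, list(y) + [minority] * n_new


def edited_nearest_neighbours(X, y, k=3):
    """Drop samples whose label differs from the majority vote of their
    k nearest neighbours (assumption: every class is cleaned)."""
    keep = []
    for i, point in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        ranked = sorted(others, key=lambda j: math.dist(point, X[j]))[:k]
        vote = Counter(y[j] for j in ranked).most_common(1)[0][0]
        if vote == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smoteenn(X, y, minority, n_new, k=3, seed=0):
    """Oversample with SMOTE, then clean the result with ENN."""
    X_over, y_over = smote(X, y, minority, n_new, k, seed)
    return edited_nearest_neighbours(X_over, y_over, k)


# Toy data: six majority points (label 0), four minority points (label 1),
# and one majority-labelled point sitting inside the minority cluster.
X_demo = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (2, 0),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (5.5, 5.5)]
y_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

X_sm, y_sm = smote(X_demo, y_demo, minority=1, n_new=3)
X_se, y_se = smoteenn(X_demo, y_demo, minority=1, n_new=3)
print("SMOTE class counts:   ", dict(Counter(y_sm)))
print("SMOTEENN class counts:", dict(Counter(y_se)))
```

In this toy setting, SMOTE alone balances the class counts but keeps the mislabelled-looking point inside the minority cluster, while the added cleaning step removes it. This is only meant to illustrate why the combined method can yield cleaner class boundaries; it says nothing about the magnitude of the effect reported in the study.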
Share and Cite

MDPI and ACS Style

Husain, G.; Nasef, D.; Jose, R.; Mayer, J.; Bekbolatova, M.; Devine, T.; Toma, M. SMOTE vs. SMOTEENN: A Study on the Performance of Resampling Algorithms for Addressing Class Imbalance in Regression Models. Algorithms 2025, 18, 37. https://doi.org/10.3390/a18010037
