Student dropout remains a persistent challenge in higher education, with substantial personal, institutional, and societal costs. We developed a modular dropout prediction pipeline that couples data preprocessing with multi-model benchmarking and a governance-ready explainability layer. Using 17,883 undergraduate records from a Moroccan higher education institution, we evaluated nine algorithms: logistic regression (LR), decision tree (DT), random forest (RF), k-nearest neighbors (k-NN), support vector machine (SVM), gradient boosting, Extreme Gradient Boosting (XGBoost), Naïve Bayes (NB), and multilayer perceptron (MLP). On our test set, XGBoost attained an area under the receiver operating characteristic curve (AUC–ROC) of 0.993, an F1-score of 0.911, and a recall of 0.944. Subgroup reporting supported governance and fairness: across credit–load bins, recall remained high and stable (<9 credits: precision 0.85, recall 0.932; 9–12 credits: 0.886/0.969; >12 credits: 0.915/0.936), with full TP/FP/FN/TN counts provided. A Shapley additive explanations (SHAP)-based layer identified risk and protective factors (e.g., administrative deadlines, cumulative GPA, and passed-course counts), surfaced ambiguous and anomalous cases for human review, and offered case-level diagnostics. To assess generalization, we replicated our findings on a public dataset (UCI–Portugal; tables only), where XGBoost remained the top-ranked model (F1-score 0.792, AUC–ROC 0.922). Overall, boosted ensembles combined with SHAP delivered high accuracy, transparent attribution, and governance-ready outputs, enabling responsible early-warning implementation for student retention.
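The evaluation described above (a boosted ensemble scored with AUC–ROC, F1, and recall, plus per-subgroup confusion counts) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it uses synthetic data in place of the private student records, scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, and arbitrary bin edges in place of the paper's credit-load bins.

```python
# Hypothetical sketch of the benchmarking and subgroup-reporting step.
# Synthetic data stands in for the private student records; the
# GradientBoostingClassifier stands in for XGBoost; bin edges are arbitrary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (confusion_matrix, f1_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split

# Imbalanced binary task: minority class plays the role of "dropout".
X, y = make_classification(n_samples=2000, n_features=12,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)

print(f"AUC-ROC: {roc_auc_score(y_te, proba):.3f}")
print(f"F1:      {f1_score(y_te, pred):.3f}")
print(f"Recall:  {recall_score(y_te, pred):.3f}")

# Subgroup reporting: full TP/FP/FN/TN per (illustrative) credit-load bin,
# mirroring the governance tables described in the abstract.
bins = np.digitize(X_te[:, 0], [-0.5, 0.5])  # three synthetic bins
for b in range(3):
    m = bins == b
    tn, fp, fn, tp = confusion_matrix(y_te[m], pred[m],
                                      labels=[0, 1]).ravel()
    print(f"bin {b}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Per-bin confusion counts make subgroup precision/recall auditable directly from the table, which is what the abstract's "full TP/FP/FN/TN provided" refers to; a SHAP layer would sit on top of the fitted model to attribute each prediction to individual features.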