1. Introduction
Student dropout has long been recognized as one of the most persistent and damaging problems facing educational institutions worldwide. Beyond the personal losses it inflicts on students, such as diminished career prospects, financial strain, and reduced social mobility, it also burdens society and institutions through wasted resources, weaker outcomes, and lower contributions to social and economic development. As online and hybrid education gains momentum, particularly amid the ongoing digital transformation and growing reliance on learning management systems, the need to address early student withdrawal becomes even more acute. Digitalization also confronts higher education institutions (HEIs) with large volumes of student data, demanding data-driven methods that detect at-risk students and enable timely interventions.
Machine learning (ML) has become a transformative force across many fields, combining predictive power with practical decision support. In agriculture, ML is being integrated into smart farm platforms to fine-tune crop management, monitor soils, and forecast yields, supporting sustainable, climate-resilient practices [1]. In healthcare, ML models improve disease diagnosis, personalize treatments, and guide population health management, sifting through medical data with greater accuracy and efficiency [2]. Beyond these areas, ML is applied in domains ranging from cybersecurity and smart city infrastructures to business intelligence and educational settings, reinforcing its cross-sector relevance in the Fourth Industrial Revolution [3]. Taken together, these deployments illustrate the broad capacity of ML to tackle complex problems while highlighting the need for interpretable, governance-ready frameworks, especially for high-stakes tasks such as forecasting dropout in higher education.
Attention mechanisms have swiftly become a cornerstone of deep learning models, letting them focus on salient cues while filtering out irrelevant noise. Recent surveys highlight their reach. In their 2021 review, Correia and Colombini [4] unpack the building blocks and design choices behind attention, underscoring its relevance for natural language processing, vision, and multimodal applications. Brauwers and Frasincar [5] follow up with a taxonomy that not only categorizes the various attention styles but also offers evaluation frameworks and draws parallels across disciplines. Meanwhile, Hassanin et al. [6] survey visual attention techniques, outlining the architectural families (channel, spatial, and self-attention) that dominate contemporary computer vision models. Collectively, this body of work showcases the flexibility and interpretability of attention mechanisms and points to fertile ground for future applications in educational data analytics, especially where intricate temporal and relational dependencies must be captured.
Over the last decade, data mining and artificial intelligence (AI) research in education has made significant progress, and machine learning models are being developed to predict student dropout more accurately and earlier. Systematic reviews indicate that hundreds of factors have been reported to influence student dropout, ranging from academic performance and socioeconomic status to institutional and individual factors [7].
Despite this heterogeneity, however, research is heavily biased toward academic and demographic variables, which are straightforward to collect, while psychological, motivational, and behavioral predictors are often neglected. Moreover, interpretability has failed to keep pace with gains in predictive accuracy, and only a small proportion of studies have employed explainable artificial intelligence techniques such as Shapley additive explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). As a result, many models are difficult to translate into practical policies for teachers and administrators, raising questions about the real-world usefulness of models that are merely accurate.
Recent advances in the discipline, however, have demonstrated that more effective solutions are achievable. For example, Alghamdi, Soh, and Li [8] designed ISELDP, a stacking ensemble architecture that combines Adaptive Boosting (AdaBoost), random forest, gradient boosting, and Extreme Gradient Boosting (XGBoost) with a multilayer perceptron as a meta-learner, and reported a significant improvement in predicting in-session dropout. Their experiments showed that the ensemble-based architecture can address class imbalance and support real-time predictive analytics for at-risk learners, although computational cost, generalization, and explainability remain open issues for deploying such architectures in real learning environments. Similarly, Bañeres et al. [9] implemented an extensive early-warning system (EWS) at the Universitat Oberta de Catalunya, where the gradual at-risk (GAR) model delivers continuous predictions and near-instantaneous responses from the earliest stages of student activity in a course. This model shows how combining a predictive model with intervention mechanisms, such as nudges and an instructor dashboard, can help increase retention. However, the approach has notable drawbacks: it depends on continuous assessment grades, and its generalizability to other assessment and administrative contexts remains unproven.
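To make the stacking idea concrete, the sketch below assembles AdaBoost, random forest, gradient boosting, and XGBoost base learners under a multilayer-perceptron meta-learner with scikit-learn; it is a minimal illustration on synthetic data, not the published ISELDP configuration, and all hyperparameters are placeholders.

```python
# Minimal stacking sketch in the spirit of ISELDP [8]; synthetic data and
# placeholder hyperparameters, not the published configuration.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Synthetic stand-in for engineered in-session features and dropout labels.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

base_learners = [
    ("ada", AdaBoostClassifier(n_estimators=200, random_state=42)),
    ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
    ("gb", GradientBoostingClassifier(random_state=42)),
    ("xgb", XGBClassifier(n_estimators=300, eval_metric="logloss", random_state=42)),
]

stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=42),
    stack_method="predict_proba",  # meta-learner sees base-model probabilities
    cv=5,                          # out-of-fold predictions limit leakage
)
stack.fit(X_train, y_train)
risk_scores = stack.predict_proba(X_test)[:, 1]  # probability of dropout
```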
Besides system-level deployments, a growing body of work stresses the importance of explainability in dropout prediction. Padmasiri and Kasthuriarachchi [10] investigated interpretable dropout prediction models by integrating machine learning predictors with SHAP and LIME to identify which variables, including tuition history, age, and academic grades, most influence dropout status. Their user evaluation found that educators valued the interpretability of these models, but the study is limited by its reliance on a single dataset and a small sample of professionals. Similarly, Islam et al. [11] combined several explainable artificial intelligence (XAI) methods within an educational data-mining model to enable both global and local interpretability and to detect key factors such as Kernel Tuning Toolkit (KTT) usage, scholarship status, and unemployment rates. The literature demonstrates a growing trend toward decision-support systems that provide actionable insights instead of opaque predictive models, yet scalability, dataset diversity, and longitudinal validation remain major challenges.
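As a minimal illustration of how global (SHAP) and local (LIME) explanations are typically attached to a dropout classifier, the sketch below uses a toy random forest on hypothetical features; the feature names, labels, and model are assumptions and do not reproduce the setups in [10] or [11].

```python
# Toy pairing of SHAP (global) and LIME (local) explanations for a dropout
# classifier; feature names, labels, and model are illustrative placeholders.
import numpy as np
import pandas as pd
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "tuition_paid_on_time": rng.integers(0, 2, 500),
    "age_at_enrollment": rng.integers(17, 40, 500),
    "first_semester_gpa": rng.uniform(0, 4, 500),
})
y = (X["first_semester_gpa"] < 2.0).astype(int)  # toy dropout label

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X.values, y)

# Global view: mean |SHAP| per feature ranks overall dropout drivers.
sv = shap.TreeExplainer(model).shap_values(X.values)
sv_dropout = sv[1] if isinstance(sv, list) else sv[..., 1]  # class 1 = dropout
print(dict(zip(X.columns, np.abs(sv_dropout).mean(axis=0).round(3))))

# Local view: LIME explains one student's prediction.
lime_exp = LimeTabularExplainer(X.values, feature_names=list(X.columns),
                                class_names=["retained", "dropout"],
                                mode="classification")
print(lime_exp.explain_instance(X.values[0], model.predict_proba,
                                num_features=3).as_list())
```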
Comparative analyses have also examined how methodological choices affect dropout prediction. The authors of [12] demonstrated that support vector regression, gradient boosting, and XGBoost outperform baseline models after extensive hyperparameter optimization. Villar and de Andrade [13] showed that the Categorical Boosting (CatBoost) and Light Gradient Boosting Machine (LightGBM) algorithms, combined with SHAP analysis, achieved superior F1-scores and better identification of dropout drivers than standard methods. These results reflect a general trend in the literature: ensemble and deep learning models consistently yield better predictive results. However, such methods require more computational power and carry a higher risk of overfitting, particularly on small or imbalanced datasets.
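To illustrate the kind of hyperparameter optimization these comparisons rely on, the following is a minimal randomized search over an XGBoost classifier scored by F1; the search space, data, and scoring choice are illustrative assumptions rather than the protocols of [12] or [13].

```python
# Minimal randomized hyperparameter search for a boosted dropout classifier;
# the search space, data, and F1 scoring are illustrative assumptions.
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from xgboost import XGBClassifier

X, y = make_classification(n_samples=3000, n_features=25,
                           weights=[0.75, 0.25], random_state=1)

param_distributions = {
    "n_estimators": randint(100, 600),
    "max_depth": randint(3, 9),
    "learning_rate": uniform(0.01, 0.3),
    "subsample": uniform(0.6, 0.4),
}

search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss", random_state=1),
    param_distributions=param_distributions,
    n_iter=30,
    scoring="f1",  # balances precision and recall on the dropout class
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
    random_state=1,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```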
The predictive horizons are expanding in new directions. Models now aim to identify dropout risk not just in higher education but also much earlier in the student lifecycle. Psyridou et al. [
14] demonstrated that dropout can be predicted as early as the end of primary school using balanced random forests trained on a 13-year longitudinal dataset. Their research showed that reading fluency, comprehension, and early arithmetic skills are among the strongest predictors of future dropout risk. This demonstrates the significance of early interventions. Similarly, Martinez, Sood, and Mahto [
15] stressed the importance of engagement and behavioral characteristics in the detection of dropout in real time in higher education. They showed that the Naïve Bayes and random forest models can achieve a precision greater than
in small pilot deployments. The research demonstrates how dropout develops over time through academic and cognitive factors, as well as behavioral and contextual elements. The success of an intervention depends on both the early identification of problems and the continuous assessment of the situation.
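Balanced random forests, as used by Psyridou et al. [14], counter the scarcity of dropout cases by under-sampling the majority class within each bootstrap; the sketch below shows this idea with imbalanced-learn on synthetic data standing in for their longitudinal features.

```python
# Sketch of a balanced random forest for rare dropout labels; synthetic data
# stand in for the longitudinal reading/arithmetic features in [14].
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=15,
                           weights=[0.92, 0.08],  # dropout is the rare class
                           random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=7)

# Each tree is grown on a bootstrap sample that is under-sampled to balance
# the classes, which typically improves recall on the minority dropout class.
clf = BalancedRandomForestClassifier(n_estimators=300, random_state=7)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), digits=3))
```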
Research has made significant progress, but important gaps remain. Persistent class imbalance reduces the reliability of predictive models, as classifiers tend to favor the majority non-dropout class and overlook minority dropout cases. The risk of overfitting remains high in ensemble and deep learning frameworks because they require complex hyperparameter tuning. Findings are rarely validated across different institutions, cultural settings, or academic fields. Explainable AI has received significant attention, but most existing research focuses on post hoc feature attribution rather than designing models that are interpretable from the start. Finally, the literature lacks systematic methods for distinguishing risk factors from protective factors, which is necessary for developing comprehensive, proactive intervention strategies.
It is against this background that the present study was conducted. Building on prior work on ensemble learning, explainable AI, and EWSs, we developed a modular dropout prediction pipeline. The system evaluates classical, ensemble, and deep learning models with explainability integrated at the core of the pipeline. SHAP-based analysis serves two purposes: it provides global feature-importance analysis, and it supports case-level diagnostics that detect anomalies, probe ambiguous predictions, and flag high-confidence misclassifications. By distinguishing risk factors from protective factors, our method delivers practical insights that go beyond black-box prediction. We aim to provide educators and policymakers with tools that are both comprehensible and actionable, demonstrating how methodological rigor can translate into real-world measures against student dropout. We use “dropout” to denote eventual non-completion within the observation window, “at-risk” to indicate positive predictions, and “withdrawal” for institution-specific administrative codes.
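One possible operationalization of the risk-versus-protection distinction is to split features by the sign of their mean SHAP contribution toward the dropout class, as sketched below; the feature names, data, and splitting rule are hypothetical and do not reflect the study's actual variables.

```python
# One way to separate risk factors (push toward dropout) from protective
# factors (push away from it): the sign of the mean SHAP value per feature.
# Feature names and data are hypothetical placeholders.
import numpy as np
import pandas as pd
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(3)
X = pd.DataFrame({
    "cumulative_gpa": rng.uniform(0, 4, 1000),
    "credits_behind_schedule": rng.integers(0, 30, 1000),
    "scholarship_holder": rng.integers(0, 2, 1000),
})
y = ((X["cumulative_gpa"] < 2.0) | (X["credits_behind_schedule"] > 20)).astype(int)

model = XGBClassifier(n_estimators=200, eval_metric="logloss", random_state=3).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)  # (n_samples, n_features)

mean_shap = pd.Series(np.asarray(shap_values).mean(axis=0), index=X.columns)
risk_factors = mean_shap[mean_shap > 0].sort_values(ascending=False)
protective_factors = mean_shap[mean_shap < 0].sort_values()
print("Risk factors:\n", risk_factors)
print("Protective factors:\n", protective_factors)
```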
This work contributes a modular, reproducible pipeline that benchmarks nine classifiers on 17,883 Moroccan undergraduate records and couples them with a governance-ready SHAP layer; statistically grounded comparisons showing XGBoost as the top model on our test set (area under the receiver operating characteristic curve (AUC–ROC)
, F1-score
, and recall
), with significance testing against strong peers; subgroup reporting (credit–load, division, and major) with full true positives, true negatives, false positives, and false negatives for governance and fairness auditing; case-level explanations that surface anomalous/ambiguous predictions for human-in-the-loop review; and an external replication on the “Predict Students’ Dropout and Academic Success” public dataset [
16], indicating preserved ranking and generalization of the approach.
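A condensed sketch of this kind of benchmarking is given below: two boosted classifiers are compared on a held-out set using AUC-ROC, F1-score, and recall, and their error patterns are contrasted with McNemar's test, shown here as one common choice rather than the study's exact procedure; the data are synthetic.

```python
# Sketch of benchmarking two classifiers and testing whether their error
# patterns differ; McNemar's test is shown as one common choice, not
# necessarily the study's exact procedure. Data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from statsmodels.stats.contingency_tables import mcnemar
from xgboost import XGBClassifier

X, y = make_classification(n_samples=4000, n_features=20, weights=[0.8, 0.2], random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=5)

models = {
    "xgboost": XGBClassifier(n_estimators=300, eval_metric="logloss", random_state=5),
    "gradient_boosting": GradientBoostingClassifier(random_state=5),
}
preds = {}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    preds[name] = m.predict(X_te)
    proba = m.predict_proba(X_te)[:, 1]
    print(name,
          "AUC-ROC=%.3f" % roc_auc_score(y_te, proba),
          "F1=%.3f" % f1_score(y_te, preds[name]),
          "recall=%.3f" % recall_score(y_te, preds[name]))

# 2x2 contingency table of correct/incorrect test-set predictions.
a_ok = preds["xgboost"] == y_te
b_ok = preds["gradient_boosting"] == y_te
table = [[np.sum(a_ok & b_ok), np.sum(a_ok & ~b_ok)],
         [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]]
print("McNemar p-value:", mcnemar(table, exact=False, correction=True).pvalue)
```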
6. Conclusions and Future Work
This research developed and evaluated a modular student dropout prediction system spanning tree-based ensembles, neural networks, and traditional linear classifiers. We make three contributions. First, the evaluation of nine classifiers on historical student data revealed that tree-based ensembles, notably XGBoost and gradient boosting, delivered the best results across accuracy, recall, F1-score, and AUC. Second, we used Shapley additive explanations (SHAP) as a unifying interpretability layer to show that the most predictive models base their decisions on educationally meaningful factors, including administrative deadlines, cumulative GPA, and progression indicators. Third, we added diagnostic lenses to the explainability analysis to identify anomalous profiles, uncertain probability ranges, and confident misclassifications.
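The three diagnostic lenses can be expressed as simple filters over predicted probabilities and feature profiles; the sketch below uses illustrative thresholds and substitutes an isolation forest for the SHAP-based anomaly check, so it is a schematic rather than the study's implementation.

```python
# Sketch of the three diagnostic lenses: ambiguous probabilities, confident
# misclassifications, and anomalous profiles (here via an isolation forest
# rather than the SHAP-based check). Thresholds and data are illustrative.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=3000, n_features=12, weights=[0.8, 0.2], random_state=9)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=9)

model = XGBClassifier(n_estimators=300, eval_metric="logloss", random_state=9).fit(X_tr, y_tr)
report = pd.DataFrame({"p_dropout": model.predict_proba(X_te)[:, 1], "true": y_te})

# Lens 1: ambiguous cases near the decision boundary -> route to human review.
report["ambiguous"] = report["p_dropout"].between(0.4, 0.6)

# Lens 2: confident misclassifications -> candidates for label/feature audits.
report["confident_miss"] = (((report["p_dropout"] > 0.9) & (report["true"] == 0)) |
                            ((report["p_dropout"] < 0.1) & (report["true"] == 1)))

# Lens 3: anomalous feature profiles flagged by an isolation forest.
report["anomalous"] = IsolationForest(random_state=9).fit(X_tr).predict(X_te) == -1

print(report[["ambiguous", "confident_miss", "anomalous"]].sum())
```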
The results underscore the need to balance predictive accuracy with model interpretability. Although the gradient boosting family achieves better ranking quality, its use in educational settings is warranted only when paired with clear explanations that enable stakeholders to verify the plausibility of risk factors and act on them. Confusion matrices, ROC analyses, and SHAP attributions showed that the observed gains reflect consistent, understandable patterns in the student data rather than statistical artifacts, and that the available information can support early-warning interventions at an early stage of the academic trajectory.
However, a number of limitations must be acknowledged. Some predictors (e.g., administrative hold timestamps such as AP_START_DTE and AP_END_DTE) are institution specific and may not exist or carry the same semantics elsewhere. Division and program variables are reported in encoded form; several majors/divisions have small sample sizes, so subgroup estimates can be unstable and should be interpreted with caution or aggregated for governance reporting. Our fairness audit was intentionally restricted to routinely collected academic variables (credit load, division, and major); we did not infer or analyze protected attributes (e.g., gender or socioeconomic status). Although replication on a public dataset preserved the model ordering, broader multi-institution validation remains necessary. SHAP values, while informative, are approximations and inherit any biases of the underlying models. Finally, the analysis focused on tabular features and therefore did not capture the temporal dynamics and relational structures that shape student trajectories.
Accordingly, we position the pipeline as a screening aid within a documented, human-in-the-loop workflow. Institutions should apply a single global threshold per model, archive subgroup metrics for audit, and avoid automating high-stakes decisions from model scores alone. Three complementary research paths should be explored in future work. First, continuous-time models such as neural ordinary differential equations (neural ODEs) and neural controlled differential equations (neural CDEs) could track student performance and engagement over time, detecting early warning signs from time-dependent patterns. Second, graph-based learning over relational data linking students, courses, and instructors could reveal structural factors that affect persistence. Third, deployment research should focus on real-time institutional dashboards that combine predictive scores with interpretable visualizations so that advisors and policymakers can act promptly and fairly. Together, these directions can yield dependable early-warning systems that remain transparent, ethical, and adaptable to the evolving requirements of higher education.