Article

Analysis of Student Dropout Risk in Higher Education Using Proportional Hazards Model and Based on Entry Characteristics

1 Institute of Computer Systems and Data Science, Latvia University of Life Sciences and Technologies, 2 Liela Street, LV-3001 Jelgava, Latvia
2 Study Centre, Latvia University of Life Sciences and Technologies, 2 Liela Street, LV-3001 Jelgava, Latvia
* Author to whom correspondence should be addressed.
Data 2025, 10(7), 110; https://doi.org/10.3390/data10070110
Submission received: 27 May 2025 / Revised: 4 July 2025 / Accepted: 5 July 2025 / Published: 8 July 2025

Abstract

The aim of this study is to identify the key factors contributing to student dropout and to develop a predictive model that estimates the dropout risk of students based on their entry characteristics and enrolment registration data. Our analysis is based on the registration and academic data of 971 full-time and part-time bachelor’s students in five faculties, who were enrolled in the academic year 2021–2022 at the Latvia University of Life Sciences and Technologies (LBTU). The dropout analysis covered the first 3.5 years of study, by which point students in engineering and information technology, agriculture and food technology, economics and social sciences, and forest and environmental studies had started their last semester and veterinary medicine students had completed more than half of their programme of study. Survival analysis methods were used. Students’ dropout risk in relation to gender, faculty, priority to study in the programme, and secondary school performance (SM) was estimated using the proportional hazards model (Cox model). The highest student dropout was observed during the first year of study. Secondary school performance was a significant predictor of students’ dropout risk; students with higher SM had a lower dropout risk (HR = 0.66, p < 0.05). Dropout risk also varied by faculty or study programme: students in economics and social sciences were at lower dropout risk than students from the other faculties. The model’s concordance index was 0.59, which indicates that additional or stronger predictors may be needed to improve model performance.

1. Introduction

Student dropout in higher education remains a persistent and widely recognised challenge across many societies. Despite extensive discussions and policy efforts, dropout rates continue to vary significantly across European Union (EU) countries and among different fields of study. In countries such as Latvia, demographic trends, particularly a declining school-age population, are expected to increase competition among higher education institutions (HEIs) for student enrolment and retention. Consequently, reducing student dropout is increasingly viewed not only as a matter of educational quality but also as a critical component of institutional sustainability.
A growing number of countries have either implemented or are transitioning to performance-based funding models, where financial allocations to HEIs are determined by student graduation rates rather than enrolment numbers [1]. This shift further emphasises the importance of predicting, identifying, and addressing the factors that may contribute to student dropout.
According to a report by the Organisation for Economic Co-operation and Development (OECD) [2], Latvian universities exhibit a high dropout rate in higher education, with only 48% of bachelor’s students completing their studies within the standard timeframe of three or four years. However, this does not necessarily mean that the remaining students fail to graduate entirely. Many of them take academic breaks, transfer to different study programmes or institutions, and eventually complete their degrees beyond the scheduled period [2]. The Latvian Ministry of Education and Science attributes this trend partly to labour shortages in various sectors, which lead students to prioritise employment over continuing their studies [3,4].
In the same OECD report [4], students identified several key reasons for discontinuing their studies, including inadequate support from academic staff and family, financial constraints, and a misalignment between initial expectations and the actual academic content. While the share of the Latvian population holding higher education degrees is rising, it remains below the OECD average: 34% of individuals aged 25 to 64 in Latvia possess a higher education qualification, compared to the OECD average of 37%.
Latvia also demonstrates the highest gender disparity among higher education graduates within OECD countries, with only 30% of graduates being male compared to 54% female. The Ministry of Education and Science has expressed concern regarding the particularly low share of doctoral degree holders in Latvia, at just 0.4%, significantly below the OECD average of 1.1%. At the same time, the OECD reports an increase in financial investment in higher education in Latvia.
For many years, educational institutions in Latvia [5], including the Latvia University of Life Sciences and Technologies (LBTU), have systematically collected student registration data at enrolment, as well as academic performance data throughout the course of study. The registration data include information on individual student characteristics (such as place of residence, gender, and age) and prior educational performance (e.g., academic success in secondary school subjects). LBTU is one of the largest universities in Latvia and plays a key role in educating students and conducting research in the country’s primary export sectors: forestry, agriculture and food production, information and communication technologies (ICT), and veterinary medicine.
As Europe, including Latvia, transitions toward a performance-based funding model for public universities, student satisfaction and a student-centred approach are becoming increasingly critical for ensuring the sustainable development of higher education institutions. In this context, our motivation is to identify effective solutions for predicting and addressing the widespread issue of student dropout, which remains a significant challenge across all universities in Latvia.
The aim of this study is to identify which variables significantly influence student dropout and to assess whether it is possible to predict academic success based on the registration data available at the beginning of a student’s university studies.

2. Related Works

To understand the causes of student dropout, various methods are employed. In typical scenarios, universities conduct brief exit interviews with students who intend to withdraw or ask them to complete a questionnaire. In other cases, HEIs carry out automatic exmatriculation for administrative reasons such as unpaid tuition fees, prolonged absence without official notification, and unexplained withdrawal from studies. One study [6] used an extensive questionnaire covering nearly 150,000 students to find the reasons why undergraduate students drop out of university. It concluded that gender, age, study branch, entry qualification, scholarship, nationality, and job are predictors of dropout and that males from abroad are more likely to drop out of university studies in Spain. The same research identified that lack of a scholarship and working during studies increase the chances of dropping out.
To predict and understand the factors contributing to student dropout, researchers have applied machine learning methods with promising results. These studies often utilise data such as log files from learning management systems and the number of failed courses per student [1]. For example, in a study on dropout prediction among bachelor’s students in Italy [7], machine learning techniques, specifically random forest and gradient boosting, were used. The authors identified that academic performance during the first year (measured by credits earned) is a significant predictor of dropout, with additional, though less impactful, factors including family income, high school graduation grades, and the type of secondary school attended. The study also mentions that the high dropout rate may be linked to the strict requirement in Italy that students must pass all courses to progress to the second year.
A similar emphasis on the importance of the first academic year is mentioned in a study conducted in Spain [8], which focused entirely on reducing dropout rates during the study period. The researchers found that program selection, study habits, and the amount of time dedicated to studying were the most influential factors associated with student dropout.
Another study [9] also identified student performance (measured by grades) and attendance in courses as the most significant factors for assessing the risk of dropout. In a study conducted in South Korea [10] focusing on dropout in distance education, researchers found that the most reliable data for predicting dropout came from students’ performance over the last four semesters. The study also shows that dropout factors differ between genders. In the Austrian context [11], researchers observed that students who performed well in their bachelor’s programmes were also likely to succeed and graduate from master’s programmes. Additionally, when bachelor’s programmes included preparatory training for master’s-level studies, students required less effort to obtain the necessary credits for completing their master’s degrees.
In another study [12], the authors used the Knowledge Discovery in Databases (KDD) approach to predict and assess student dropout. The findings suggest that in order to reduce dropout rates, universities should focus on improving teaching quality, reviewing the accessibility of scholarships, and offering personalised curriculum options. However, the study also emphasises that many dropout-related factors are beyond the control of higher education institutions, including labour market demands, national economic conditions, and individuals’ social circumstances.
A data mining approach was also applied in a Portuguese study [13], where the national average dropout rate is approximately 29%. Analysing data from 331 computer engineering students, researchers found that the XGBoost algorithm provided the highest prediction accuracy, even in a situation with limited data availability. The study also showed that older students were less likely to drop out and that successful completion of the most challenging courses was an indicator that students would graduate.
Another Spanish study [8] addressed the known issue of higher dropout rates in engineering disciplines by surveying 624 engineering students across eight universities in Spain. The study evaluated 40 factors and found that 23 of them were directly related to student motivation. The most frequently mentioned reasons for dropout were poor academic performance and negative relationships with professors.
A study conducted in Denmark and Norway [14] revealed that dropout rates were higher among students from working-class and lower-middle-class backgrounds, across all analysed universities in both countries. In contrast, students from upper and upper-middle-class backgrounds had lower dropout rates. These observations persisted even when students had similar academic achievement at the point of secondary school graduation and university entry. The authors suggest that students from working-class backgrounds have higher social costs, which may contribute to their increased possibility of leaving higher education.
Regarding dropout reasons, those identified in Latvian higher education institutions align with the reasons mentioned by Rabelo & Zárate [12]. These include dropout due to program completion, dropout associated with transferring to another higher education institution, dropout resulting from a change of study program within the same institution or faculty, and dropout due to failure to meet academic requirements.
The analysed research continues to support Tinto’s Model of Student Integration [15], emphasizing the crucial role institutions play in creating an environment that assists student integration. Findings indicating that male international students are more likely to drop out, or that first-year academic performance is a key predictor of persistence, align with this framework. First-year academic success is often closely linked to a student’s degree of integration into university life, particularly in terms of relationships with professors and peers.
Moreover, students who are already socially and academically integrated during their bachelor’s studies, and who then pursue a master’s degree within the same faculty, tend to perform better. This pattern is clearly demonstrated in the Austrian study by Loder [11]. Similarly, the finding of a Portuguese study [13] that older students are less likely to drop out can also be interpreted through Tinto’s model: older, more mature students may integrate more easily due to their prior experience of integrating into various social groups.

3. Data Collection and Processing

3.1. Data Collection and Pre-Processing

For several years, the Latvia University of Life Sciences and Technologies (LBTU) has systematically collected both registration data from newly enrolled students and longitudinal data generated during the course of their studies. The dataset utilised in this study was obtained from the University Study Centre and comprises data from 971 bachelor’s students, enrolled in either full-time or part-time study modes, across five faculties during the 2021–2022 academic year.
LBTU offers four-year bachelor’s degree programmes in the following faculties: Engineering and Information Technology (IITF), Agriculture and Food Technology (LPTF), Economics and Social Sciences (ESAF), and Forestry and Environmental Sciences (MVZF). The Faculty of Veterinary Medicine (VMF) offers a six-year professional study program. Notably, the study programmes in information technology and veterinary medicine are delivered exclusively in a full-time format.
Student gender, faculty, priority to study in the programme, and sum of secondary school marks were included in the data set (Table 1).
Data pre-processing was conducted to convert the data to a suitable format, and data cleaning, variable encoding, and scaling were applied. Pre-processing was conducted using R 4.4.3 [16] and Microsoft Excel.

3.2. The Conceptual Model

The objective of this study is to identify variables associated with student dropout and to develop a predictive model for early identification of students at risk. The research employed survival analysis methodologies to examine time-to-dropout patterns.
Dependent Variable: The dependent variable in this study is survival time, representing the duration a student remains enrolled before dropping out. At the time of data extraction from the university system, students had completed up to eight semesters of study at LBTU. For students who discontinued their studies, survival time was recorded as the number of days or months from enrolment to dropout, with time fixed at the semester in which the dropout occurred.
Students who took an official academic leave (study break) or transferred to a different study program were not classified as dropouts; instead, they were considered continuing students, as they remained within the higher education system.
Explanatory Variables/Covariates: Student gender, faculty, finance source, priority to study in the programme, and sum of secondary school marks were included in the analysis (Table 1).
Variable Gender: Female and male.
Variable Faculty: IITF—Engineering and Information technology, LPTF—Agriculture and Food Technology, ESAF—Economics and Social Sciences, MVZF—Forestry and Environmental Sciences, and VMF—Veterinary Medicine.
Variable Finance: Students may begin their studies under a state-funded agreement, which is available exclusively to full-time students, or under a self-financed agreement, which is open to both full-time and part-time students. If a student fails to meet the academic requirements of the study program, they may be transferred from state-funded to self-financed status.
Variable Priority: During the application process, prospective students indicate their priority for enrolling in a specific study program. For the purpose of analysis, the priority variable was categorised into three groups (1st, 2nd, 3rd and lower) and one group where priority was not mentioned (NM). If the priority was NM, then students were enrolled during the additional admission and priority was not recorded in the system.
Variable Sum of secondary school marks (SM): Secondary school performance was calculated based on the sum of secondary school marks (SM), with the results of the centralised state examinations also incorporated into the overall assessment.
Survival and hazard probability analyses were employed to estimate dropout risks throughout the students’ period of study. The log-rank test was used to assess differences in dropout rates across factor groups such as gender, faculty, priority, and financial status. Additionally, the proportional hazards model was applied to identify key predictors of student dropout, facilitating the development of a predictive model. Results are presented in Section 4.2.
Survival probability S(t), the probability that a student will survive (remain enrolled) from the beginning of the study (t = 0) to a specified time t (t = 0, 1, 2, …, 42 months), was calculated for the study cohort in total and by gender, faculty, and priority to study in the programme. The survival function is
$$S(t_j) = S(t_{j-1})\left(1 - \frac{d_j}{n_j}\right) \qquad (1)$$
where S(t_j) and S(t_{j−1}) are the probabilities of survival at times j and j − 1; d_j is the number of events (dropouts) at time j; and n_j is the number of students known to have survived (i.e., still enrolled) up to time j.
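The product-limit recursion above can be illustrated with a minimal Python sketch. It is not the authors' code (the article reports using the R survival package); the function name `kaplan_meier` and the ten-student toy cohort are hypothetical and only show how d_j and n_j enter each step.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t_j, S(t_j))] using S(t_j) = S(t_{j-1}) * (1 - d_j / n_j).

    durations: months until dropout or until the end of follow-up
    events:    1 if the student dropped out at that time, 0 if censored
    """
    dropouts = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)      # everyone who exits the risk set at t
    at_risk = len(durations)          # n_j: students still enrolled before t_j
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = dropouts.get(t, 0)        # d_j: dropouts observed at t_j
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]         # dropouts and censored students both leave
    return curve


# Hypothetical toy cohort of 10 students (not data from the article).
months = [2, 2, 5, 6, 6, 12, 12, 42, 42, 42]
dropped = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
curve = kaplan_meier(months, dropped)
for t, s in curve:
    print(f"t = {t:>2} months, S(t) = {s:.3f}")
```

Censored students (for example, those still enrolled when the data were extracted) reduce n_j at later times but never count towards d_j.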
The hazard function was calculated as h(t) = −log(S(t)) for the study cohort in total and by each factor. Strictly, −log(S(t)) is the cumulative hazard: it measures the risk of dropout accumulated up to time t rather than the probability that a student drops out exactly at time t.
The log-rank test was used for the analysis of survival time between two or more groups of students based on survival curves. Differences in survival curves between genders, the five faculties, and the four study priority groups were compared.
$$\chi^2 = \sum_{i=1}^{g} \frac{(O_i - E_i)^2}{E_i} \qquad (2)$$
where O_i is the number of observed events, E_i is the number of expected events, and g is the number of groups.
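As a rough illustration, the statistic can be computed directly from the formula as written. The sketch below is an assumption-laden toy: `logrank_chi2` is a hypothetical helper, the four-student data set is invented, and statistical packages (including the R survival package cited in the article) typically use a variance-based form of the test that can give a somewhat different value.

```python
def logrank_chi2(durations, events, groups):
    """Chi-square statistic sum_i (O_i - E_i)^2 / E_i over g groups."""
    labels = sorted(set(groups))
    observed = {g: 0.0 for g in labels}
    expected = {g: 0.0 for g in labels}
    event_times = sorted({t for t, e in zip(durations, events) if e})
    for t in event_times:
        n_g = {g: 0 for g in labels}  # at risk just before t, per group
        d_g = {g: 0 for g in labels}  # dropouts exactly at t, per group
        for dur, ev, g in zip(durations, events, groups):
            if dur >= t:
                n_g[g] += 1
                if ev and dur == t:
                    d_g[g] += 1
        n_total = sum(n_g.values())
        d_total = sum(d_g.values())
        for g in labels:
            observed[g] += d_g[g]
            # Expected dropouts if every group shared the same hazard at t.
            expected[g] += d_total * n_g[g] / n_total
    chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in labels)
    return chi2, observed, expected


# Hypothetical toy data: two groups of two students each.
durations = [1, 2, 3, 4]
events = [1, 1, 1, 0]
groups = ["A", "A", "B", "B"]
chi2, observed, expected = logrank_chi2(durations, events, groups)
print(round(chi2, 4), observed, expected)
```

The statistic is compared against a chi-square distribution with g − 1 degrees of freedom; the observed and expected totals always sum to the same number of events.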
The relationship between students’ secondary school performance and their academic performance at university was also examined. The results were presented using the correlation coefficient and scatter plot visualization. Secondary school performance was calculated based on the sum of secondary school marks (SM), with the results from the centralised state examination included in the calculation. University academic performance was measured using the Weighted Average Mark (WAM), which is computed by multiplying each course grade (on a scale from 1 to 10) by its credit point value, summing these weighted grades, and dividing by the total number of credit points. Results are presented in Section 4.3.
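The WAM definition and the correlation coefficient can be sketched in a few lines of Python. The function names and the grade, credit, and score values below are hypothetical; the correlation is assumed to be Pearson's r, which the article does not state explicitly.

```python
import math


def weighted_average_mark(grades, credits):
    """WAM = sum(grade * credit points) / sum(credit points)."""
    return sum(g * c for g, c in zip(grades, credits)) / sum(credits)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical student: three courses graded on the 1-10 scale.
grades = [8, 6, 9]
credits = [3, 6, 3]
wam = weighted_average_mark(grades, credits)

# Hypothetical SM and WAM values for four students.
sm_scores = [150.0, 200.0, 250.0, 300.0]
wam_scores = [5.5, 7.0, 6.5, 8.0]
r = pearson_r(sm_scores, wam_scores)
print(f"WAM = {wam:.2f}, r = {r:.3f}")
```

Weighting by credit points means that a low grade in a large course lowers the WAM more than the same grade in a small course.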

3.3. Model Testing and Evaluation

For the students’ dropout prediction, Weibull [17] and Cox proportional hazards [18] models were used at the beginning of the study. The models were compared, and the models’ validity was examined through statistical testing (e.g., Akaike information criterion (AIC), log-likelihood, and covariate significance) and model assumptions (see Appendix A). For Weibull model assumptions, the linearity of covariates and model residual distribution were evaluated (see Appendix A, Figure A1). Schoenfeld residuals were examined to test the Cox model proportional hazards assumption, and diagnostic plots (scaled Schoenfeld residuals over time) were provided (see Appendix A, Figure A2). Both models (Weibull and Cox proportional hazards) showed the same significant covariates in the model (see Appendix A, Table A1); however, in the Weibull model case, the residuals were not normally distributed, so the Cox proportional hazards model (Cox PH Model 2) with four variables (see Appendix A, Figure A1) was selected for data processing.
The proportional hazards model (Cox model) [19] was used to estimate the hazard ratio (HR) of student dropout depending on gender, faculty, priority to study in the programme, and SM:
$$h_i(t) = h_0(t)\, e^{\,b_0 + b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i3} + b_4 x_{i4}} \qquad (3)$$
where
  • h_i(t): the hazard rate for the ith case at time t;
  • h_0(t): the baseline hazard at time t;
  • b_j: the value of the jth regression coefficient;
  • x_{i1}: gender (female, male);
  • x_{i2}: faculty (IITF, LPTF, ESAF, MVZF, VMF);
  • x_{i3}: priority to study in the programme (1st, 2nd, 3rd and lower, NM);
  • x_{i4}: sum of secondary school marks (SM). The SM variable was standardised using z-score transformation.
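To show how the coefficients combine, the sketch below computes a student's hazard relative to a reference student, so that the baseline hazard and the constant term cancel. It uses the faculty and SM hazard ratios reported in the article's results (ESAF as reference) and the reported mean SM of 211.4. The SM standard deviation of 80.0 is a hypothetical value, the helper names are illustrative, and gender and priority are held at their reference levels because their coefficients are not quoted in the text.

```python
import math

# Hazard ratios quoted in the article's results; ESAF is the reference faculty.
FACULTY_HR = {"ESAF": 1.00, "MVZF": 1.01, "IITF": 1.12, "LPTF": 1.31, "VMF": 1.94}
SM_HR_PER_SD = 0.66  # hazard ratio per one standard deviation of SM


def z_score(value, mean, sd):
    """Standardise a raw SM value."""
    return (value - mean) / sd


def relative_hazard(faculty, sm_z):
    """exp(b_faculty + b_SM * z): hazard relative to an ESAF student with mean SM.

    Gender and priority terms are omitted (held at reference levels).
    """
    b_faculty = math.log(FACULTY_HR[faculty])
    b_sm = math.log(SM_HR_PER_SD)
    return math.exp(b_faculty + b_sm * sm_z)


# Mean SM of 211.4 is from the article; sd=80.0 is a hypothetical assumption.
z = z_score(291.4, mean=211.4, sd=80.0)
rh = relative_hazard("IITF", z)
print(f"z = {z:.2f}, relative hazard = {rh:.4f}")
```

Because the model is multiplicative on the hazard scale, the combined effect is simply the product of the individual hazard ratios raised to the covariate values.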
The predictive performance of the Cox proportional hazards model was evaluated using the concordance index (C-index), log-likelihood, and the Akaike Information Criterion (AIC). The C-index assesses how accurately the model ranks students according to their dropout risk, with values closer to 1 indicating better predictive discrimination. Log-likelihood and AIC values were employed to evaluate model fit, where a higher log-likelihood and lower AIC suggest a better-fitting model. A significant p-value from the log-likelihood test further supports the model’s predictive validity.
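The two evaluation measures can be sketched as follows. This is a simplified version of Harrell's C-index that ignores tied event times, and the standard AIC formula; the function names and the four-student toy data are hypothetical and are not taken from the article.

```python
def concordance_index(durations, events, risk_scores):
    """Share of comparable pairs in which the student who dropped out earlier
    has the higher predicted risk; ties in predicted risk count as 0.5."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i dropped out strictly before j's time.
            if events[i] and durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical toy data: months enrolled, dropout flag, predicted risk.
durations = [2, 5, 8, 12]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(durations, events, risk)
print(f"C-index = {c_index:.2f}, AIC = {aic(-100.0, 4):.1f}")
```

A C-index of 0.5 corresponds to random ranking, which is why a value of 0.59 indicates only modest discrimination.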
Statistical analyses were performed using R [16] and the R packages survival [20], survminer [21], and GGally [22].

4. Results and Discussion

4.1. Analysis of Student Dropout

Of the students enrolled at LBTU during the 2021–2022 academic year, 50.93% reached the eighth semester, which is the final semester for students in four faculties: ESAF, IITF, LPTF, and MVZF. As illustrated in Figure 1, notable differences were observed across faculties: 52.4% of students in ESAF, 47.3% in IITF, 51.8% in LPTF, and 49.6% in MVZF progressed to the final semester, compared to 64.0% in the Faculty of Veterinary Medicine (VMF).
Figure 1 also presents the distribution of students according to reasons for leaving the university across different faculties. A prominent trend is the relatively high proportion of students who voluntarily discontinued their studies: 22.1% for ESAF and 33.3% for VMF. While voluntary dropout is considered an acceptable reason for leaving, it does not provide sufficient insight into the underlying causes of student attrition.
Voluntary dropout can also be explained through Tinto’s Dropout Model [15], as it is closely associated with factors such as pre-entry attributes (e.g., student readiness) and initial commitment to studies (e.g., low motivation). Effective integration from school to university may be a critical factor in reducing voluntary dropout, particularly during the first year of study.
Other common reasons for dropout include failure to complete study courses and failure to commence studies. For instance, 11% of students at ESAF and 9.9% at MVZF enrolled but did not begin their studies.
On average, more than 49% of students at the university drop out for various reasons. Among these, 10.62% of dropouts are attributed to students failing to complete the requirements of their study program (i.e., not completing study courses). Additionally, 7.01% of students did not commence their studies, and 3.30% failed to fulfil financial obligations or did not sign an additional agreement related to changes in funding.
Students may begin their studies under a state-funded (budget) agreement, available only to full-time students, or under a self-financed agreement, which applies to both full-time and part-time students. If academic requirements are not met (i.e., courses are not passed), students may either be expelled or transferred from budget-funded to self-financed status. Consequently, students facing financial difficulties may voluntarily leave the university, resulting in dropout.
Overall, 28.14% of students dropped out voluntarily. This group includes students who did not return from a study break, failed to register for courses, or reported being unable to balance academic demands with employment, leading to incomplete coursework.
Our previous investigation revealed that approximately 50% of IT students who dropped out did so due to failure to complete the study program curriculum, whereas only 5% attributed their dropout to financial difficulties [23]. A study by Tayebi et al. [8] across nine universities in Spain examined the dropout reasons reported by students (n = 285). The most frequently cited reasons included academic difficulty (approximately 18%), poor academic performance (12%), and dissatisfaction with teaching quality (around 11%). Additional factors mentioned were stress, financial constraints, lack of time, and the challenge of balancing work alongside studies.
Furthermore, Pusztai et al. [24] highlighted that student dropout is influenced not only by gender and financial status but also by poor communication and insufficient engagement between students and academic staff, which is also emphasised in Tinto’s Dropout Model.

4.2. Student Survival and Dropout Risk

The student dropout rate decreased during the study period, with the highest student dropout observed during the first study year. Figure 2a presents the survival probability (St) of the students. The St declined during the study period: it was 73.0 ± 1.42% at 6 months and further decreased to 64.1 ± 1.54% at 12 months (Figure 2a). During further studies, the dropout rate of students decreased, and the total student dropout reached around 49% before the end of the study period. The student dropout risk rapidly grows during the first year of study and then tends to decrease (Figure 2b). This aligns with findings of related work [23,25] and supports a core principle of Tinto’s Dropout Model: that poor integration and low commitment significantly increase the risk of student dropout. Consequently, universities should invest in support services for new students, particularly during the first year of study, to enhance student commitment and integration into academic life and reduce dropout rates.
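As a plausibility check, the reported ± values are consistent with a binomial standard error for a cohort of 971 students, which coincides with the Kaplan-Meier (Greenwood) standard error when no students are censored before the time point. Whether the authors computed them this way is an assumption; the sketch below only reproduces the arithmetic and also shows the cumulative hazard implied by the 12-month survival.

```python
import math


def binomial_se(p, n):
    """Standard error of a proportion p estimated from n students."""
    return math.sqrt(p * (1.0 - p) / n)


n_students = 971                       # cohort size reported in the article
se_6 = binomial_se(0.730, n_students)  # survival at 6 months
se_12 = binomial_se(0.641, n_students) # survival at 12 months
cum_hazard_12 = -math.log(0.641)       # -log(S(t)) at 12 months

print(f"SE at 6 months:  {se_6 * 100:.2f}%")
print(f"SE at 12 months: {se_12 * 100:.2f}%")
print(f"-log S(12) = {cum_hazard_12:.3f}")
```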
Students’ dropout rate depends on many factors. Figure 3 shows students’ overall survival depending on finance source (Figure 3a), gender (Figure 3b), faculty (Figure 3c), and study priority (Figure 3d). Significant differences in survival time by the log-rank test were obtained between genders (p < 0.0053), with females having higher survival compared to males. Students from the Faculty of Veterinary Medicine have higher survival rates compared to the other four faculties. In addition, better survival during the first year of study was seen when students were enrolled in their first-priority university and study programme. In Latvia’s unified student enrolment information system, applicants have the option to select multiple priority choices across different universities throughout the country.
The risk of student dropout increases during the first year of study (Figure 4). During the initial semester (approximately six months), no significant differences in dropout risk were observed between genders (Figure 4b) or among faculties (Figure 4c). However, the dropout risk increases over time, with males and students from the IITF faculty exhibiting a higher risk. Additionally, students admitted through the additional admission category (study priority NM) demonstrate a higher likelihood of dropping out after the first year of study (Figure 4d).
Our previous research has consistently shown that student dropout is influenced by the type of study programme [23,25], with the highest dropout rates typically occurring during the first year of study. Wild and Heuling [26] report that, on average, 50% of students in higher education drop out, primarily within the first academic year. Dropout rates are particularly high in STEM and engineering programmes, often exceeding 50% [27]. Similar findings were reported by Behr et al. [28], who observed the highest dropout rates in engineering, mathematics, and natural sciences, as well as in law, economics, and social sciences.
Numerous studies also demonstrate gender differences in dropout rates, with men generally facing a higher risk of dropout than women [29,30,31]. Lower dropout rates among women are often attributed to greater self-efficacy, better academic performance [30,31], and stronger social integration. However, some studies present contrasting results. For instance, Pedersen et al. [32] and Astorne-Figari et al. [33] found that women exhibit higher dropout rates in certain STEM fields. Similarly, Meyer and Strauß [34] observed that women studying in gender-atypical disciplines are more likely to drop out compared to their male counterparts.

4.3. Relationship Between Secondary School and Academic Performance

Student SM ranged between 46.0 and 512.9, with an average of 211.4 ± 2.68, and WAM ranged between 1.0 (4.0) and 9.49, with an average of 6.88 ± 0.05 (7.05 ± 0.04). A positive relationship exists between SM and WAM, with a moderate correlation coefficient (r = 0.359) (Figure 5): students with higher SM scores tend to have better WAM results. The dataset reveals a clear tendency: students in the Faculty of Veterinary Medicine (VMF) demonstrate higher secondary school marks (SM) and weighted average marks (WAM) compared to students in other faculties. This may help explain the lower dropout risk observed among VMF students. Additionally, the correlation between SM and WAM is stronger for students who are state funded (r = 0.390) compared to those who are self-financed and is highest among students who selected their study programme as their first priority (r = 0.416). These findings suggest that prior academic achievement and initial study motivation are important predictors of academic success and retention.
The empirical results show a moderate relationship between SM and WAM; that is, secondary school achievement (SM) is one of the significant predictors of student WAM and dropout rate. Masci et al. [35] conclude that early student academic performance is the most important factor in predicting student dropout.

4.4. Risk Factor Analysis of Student Dropout

In many studies, machine learning approaches have been employed to evaluate student dropout in higher education [13]. In contrast, our investigation applied survival analysis to predict student dropout. This method not only estimates the probability of dropout but also provides insight into the timing of dropout events and identifies the factors that significantly influence the likelihood of dropout [35,36,37].
Gutierrez-Pachas et al. [36] concluded that the Cox proportional hazards model provides one of the most effective approaches for predicting dropout through survival curves. Their study highlighted that academic variables significantly influence dropout risk and should be prioritised in predictive modelling, whereas socioeconomic and gender variables were found to have relatively low correlation with dropout.
Our previous study [25] also demonstrated that dropout risk during the first year of study is influenced by gender, secondary school performance, and faculty affiliation. The highest dropout rate was observed among students from the Faculty of Information Technologies. Findings from previous studies, supplemented by our own results (Appendix A), indicated that four variables (gender, faculty, priority, and SM) would be included in the Cox proportional hazard model. Figure 6 summarises the results from the Cox proportional hazard model. Each variable included in the model is presented by its hazard ratio (HR), 95% confidence interval (CI), and p-value.
Based on the results of the Cox proportional hazards model (Equation (3)), one of the most significant predictors of student dropout is secondary school performance (SM) (p < 0.001), which positively affects the student graduation rate [38]. Students with higher SM have a lower risk of dropout (HR = 0.66, 95% CI 0.57–0.76): for every one-point increase in a student's SM, the dropout hazard is multiplied by 0.66, that is, it decreases by 34% (1 − 0.66 = 0.34). In the current investigation, a lower risk of dropout was also observed among female students and among those who selected their programme as a first or third priority choice.
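The arithmetic behind this interpretation can be sketched in a few lines of Python. The function names `summarise_hr` and `hr_for_increase` are hypothetical, and the back-calculation of the standard error from the rounded 95% CI assumes a Wald-type interval on the log scale; this is an illustration, not the authors' R code, and values derived from rounded inputs are approximate.

```python
import math

def summarise_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Back-calculate Cox summary quantities from a reported HR and 95% CI.

    Assumption: the CI is a Wald interval, log(HR) +/- z_crit * SE.
    """
    beta = math.log(hr)                       # regression coefficient
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se                             # Wald statistic
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return {
        "beta": beta,
        "se": se,
        "z": z,
        "p": p_two_sided,
        "pct_change": (hr - 1.0) * 100.0,     # percent change in hazard per unit
    }

def hr_for_increase(hr, k):
    """HR for a k-unit increase in the covariate: exp(k * beta) = hr ** k."""
    return hr ** k

# Values reported for SM in the text: HR = 0.66, 95% CI 0.57-0.76.
sm = summarise_hr(0.66, 0.57, 0.76)
print(f"percent change in hazard per unit of SM: {sm['pct_change']:.0f}%")
print(f"approximate Wald z: {sm['z']:.2f}; p < 0.001: {sm['p'] < 0.001}")
print(f"HR for a two-unit increase: {hr_for_increase(0.66, 2):.2f}")
```

Under these assumptions, the back-calculated p-value is consistent with the reported p < 0.001, and the multiplicative form shows why effects over several units compound rather than add.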
After combining the factors in the Cox model, the risk of dropout also varies by faculty or study programme (Figure 6): students in the veterinary medicine programme (VMF) have a significantly higher dropout risk (HR = 1.94, 95% CI 1.08–3.48, p < 0.05) than students in the Faculty of Economics and Social Sciences (ESAF), the reference group. A difference in hazard ratios was observed when the faculty variable was analysed separately compared to when it was included in the full model. For example, the dropout risk for VMF students increased markedly in the full model, with the hazard ratio rising to 1.94. This may be explained by the fact that VMF students typically have higher grades than those in other faculties: once SM is included in the model, VMF students are compared with students of similar SM, and because some VMF students with good academic performance still drop out, their adjusted dropout risk is higher.
Students enrolled in Agriculture and Food Technology (LPTF; HR = 1.31) have a slightly higher dropout risk, followed by those in the Faculty of Engineering and Information Technologies (IITF; HR = 1.12) and Forestry and Environmental Sciences (MVZF; HR = 1.01), when compared to students in the Faculty of Economics and Social Sciences (ESAF), which is defined as the reference group.
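Because every faculty hazard ratio is expressed relative to ESAF, point estimates relative to any other faculty follow by division, since exp(β_a − β_b) = HR_a / HR_b. The sketch below uses the hazard ratios quoted in the text; the helper name `rebase` is hypothetical. Confidence intervals cannot be re-based this way, because that requires the covariance of the coefficient estimates, which is not reported here.

```python
# Hazard ratios versus the reference faculty (ESAF), as quoted in the text.
hr_vs_esaf = {"ESAF": 1.00, "MVZF": 1.01, "IITF": 1.12, "LPTF": 1.31, "VMF": 1.94}

def rebase(hazard_ratios, new_reference):
    """Re-express point-estimate hazard ratios against another reference level."""
    ref = hazard_ratios[new_reference]
    return {level: hr / ref for level, hr in hazard_ratios.items()}

# Example: hazard ratios relative to IITF instead of ESAF.
hr_vs_iitf = rebase(hr_vs_esaf, "IITF")
for faculty, hr in hr_vs_iitf.items():
    print(f"{faculty}: {hr:.2f}")
```

The choice of reference group changes the numbers shown in a forest plot but not the ordering of the faculties or the fit of the model.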
Higher dropout rates in VMF, LPTF, and IITF programmes may be linked to a concerning trend in Latvia: declining student performance in national graduation exams in mathematics, physics, and chemistry. This ongoing issue has caused frustration, particularly as the Ministry of Education and Science has not presented a clear strategy to improve performance in these key subjects at the school level. Furthermore, many engineering and IT students report that the content of their university studies does not meet their expectations. They often enter these programmes with a limited understanding of the academic demands, particularly in challenging subjects such as advanced mathematics, physics, and chemistry, which may contribute to their decision to drop out.
To increase student interest in their chosen program, faculties have taken a proactive approach to student recruitment. For example, prospective students are given opportunities to shadow students and teaching staff, allowing them to gain a realistic understanding of the profession before enrolling.
The result may also reflect that specific study programmes (e.g., forestry, veterinary medicine, and food technology) are uniquely available at LBTU in Latvia, strengthening students’ commitment to completing them.
The Cox proportional hazards model was statistically significant (global p-value < 0.05). However, the model’s concordance index was 0.59 (Figure 6), and this indicates that additional or stronger predictors may be needed to improve model performance.
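To make the concordance index concrete, the following is a minimal sketch of Harrell's C for right-censored data in plain Python. It is not the routine used by the authors (the R `survival` package reports concordance itself); the function name, the toy data, and the simplification of skipping pairs with tied survival times are assumptions made for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair is comparable when the subject with the shorter time had the event.
    Pairs with tied times are skipped (a simplification); tied risk scores
    count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk dropped out earlier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): months until dropout or censoring, event flag
# (1 = dropout, 0 = still studying), and a model risk score.
times = [2, 5, 8, 12, 41]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.5, 0.1]

c_index = concordance_index(times, events, risk)
print(f"C-index on toy data: {c_index:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a C-index of 0.59 means the model orders students by risk only modestly better than chance.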
Incorporating interaction effects into the model may enhance its predictive accuracy. In this study, potential interaction terms, such as those between secondary school performance (SM) and gender, or between SM and the faculty of admission, may be statistically significant. Specifically, the effect of SM appears to vary depending on a student's gender or the faculty in which they are enrolled. Thus, the influence of SM on dropout risk may depend not only on SM itself but also on its interaction with other factors, such as gender and faculty affiliation.
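One way to write such an interaction is sketched below. The notation is assumed for illustration and is not taken from the paper's Equation (3): h_0(t) is the baseline hazard, G is a 0/1 gender indicator, and β_1, β_2, β_3 are hypothetical coefficients.

```latex
h(t \mid \mathrm{SM}, G) = h_0(t)\,\exp\bigl(\beta_1\,\mathrm{SM} + \beta_2\,G + \beta_3\,\mathrm{SM}\cdot G\bigr)
\quad\Longrightarrow\quad
\mathrm{HR}_{\mathrm{SM}}(G) = \exp(\beta_1 + \beta_3\,G)
```

Under this form, the hazard ratio per unit of SM is exp(β_1) in the group with G = 0 and exp(β_1 + β_3) in the group with G = 1, so a non-zero β_3 is what "the effect of SM depends on gender" means in model terms.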

5. Conclusions

This study found that the highest student dropout occurred during the first year of study, aligning with previous research [7,38]. The risk of student dropout increases during the first year and then gradually declines over time.
Among the most significant factors contributing to dropout were students’ secondary school performance (p < 0.001) and the type of academic discipline. A positive correlation was observed between secondary school performance and academic performance at university (r = 0.359), indicating that students with stronger academic backgrounds in secondary school tend to perform better in higher education. This variable may serve as an important predictor in dropout risk models.
Higher dropout rates were observed in veterinary, engineering, technology, forestry, and agriculture programmes, whereas social sciences exhibited lower dropout rates. These findings partially contrast with those of González-Morales et al. [6], who reported the highest dropout rates in social and legal sciences and the lowest in health sciences.
First-year university curricula typically include STEM courses (such as mathematics, chemistry, and physics), with varying credit point allocations depending on the specific study programme. Many students begin their university studies with insufficient prior knowledge in these subjects, particularly in mathematics, physics, or chemistry. As a result, students with low confidence or inadequate preparation in these areas often struggle to meet academic requirements, which may contribute to their decision to drop out. Additionally, the academic demands of university education, characterised by subject complexity and the necessity for consistent, self-directed learning, can further compound these challenges.
Student dropout remains a global challenge, carrying significant economic and social costs. The Cox proportional hazards model developed in this study enables early identification of students at higher risk of dropout. This predictive capability offers universities the opportunity to design and implement targeted interventions aimed at reducing dropout rates and improving student retention.
The limitations of this study include the following: the analysis is based on data from a single academic period (2021–2025), and potentially influential factors (e.g., socio-economic status, student health conditions, and detailed data on course engagement) were not included due to data unavailability.
Future work could address these limitations by expanding the dataset to multiple years (study periods) or different institutions to validate the model and improve its generalizability. Integrating additional data sources, such as data on student activity in e-learning systems, and exploring advanced machine learning methods (e.g., Random Forests, XGBoost, Support Vector Machines) could provide more in-depth analysis and enhance predictive performance, especially when applied to larger datasets over time. Finally, developing a real-time risk monitoring software module in e-learning and student management systems could support early interventions.

Author Contributions

Conceptualization, L.P. and I.A.; methodology, L.P. and I.A.; software, L.P.; validation, I.A. and L.P.; formal analysis, I.A., L.P. and G.V.; investigation, L.P., G.V. and S.S.; resources, I.A., G.V. and L.P.; data curation, S.S.; writing – original draft preparation, G.V. and L.P.; writing – review and editing, G.V., I.A. and L.P.; visualization, L.P.; supervision, L.P. and I.A.; project administration, L.P. and S.S.; funding acquisition, I.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study did not involve human experimentation. The data were obtained from the LBTU platform through the LBTU Study Centre Secretariat. No subject identification information, such as name or code, was included in these data.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset on higher education student entry characteristics and student dropout is available at: https://dv.dataverse.lv/dataset.xhtml?persistentId=doi:10.71782/DATA/X7CWTH (accessed on 7 July 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Assumptions and Comparison of Weibull and Cox Proportional Hazards Models

Figure A1. Weibull model assumptions: linearity of covariates (a) and model residual distribution (b).
Figure A2. Proportional hazards assumption diagnostic plots (scaled Schoenfeld residuals over time) for model factors gender (a), faculty (b), priority (c), and SM (d).
Table A1. Comparison of Weibull and Cox Proportional Hazards models.

| Model | Time Variable Distribution | AIC | Log-Likelihood | Significant Covariates | Assumptions |
|---|---|---|---|---|---|
| Cox PH | Does not assume a specific distribution for survival time | 6013.8 | −2996.9 (df = 10) | Faculty, Finance, SM | PH assumptions are met for Cox PH Model 2 (see Figure A3) |
| Weibull | Assumes a specific distribution for survival time | 4156.2 | −2066.1 (df = 12) | Faculty, Finance, SM | Residuals not normally distributed (see Figure A1) |
The proportional hazards assumption was tested, and the authors concluded the following (see Figure A3):
Cox PH Model 1, with five variables, does not meet the overall proportional hazards (PH) assumption (GLOBAL test: p < 0.05); at least one factor, Finance, violates the PH assumption (p < 0.05). This could be related to changes in finance source from semester to semester.
The authors decided to exclude the Finance factor from the analysis. In this case, the PH assumption is met (p > 0.05) for each factor, and the effect is proportional over time.
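The AIC values in Table A1 can be reproduced from the reported log-likelihoods with the standard definition AIC = 2k − 2 log L. The sketch below assumes that the "df" shown in the table is the number of estimated parameters k; the function name `aic` is hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_likelihood

# Log-likelihoods and df as reported in Table A1 (df taken as k, an assumption).
aic_cox = aic(-2996.9, 10)
aic_weibull = aic(-2066.1, 12)

print(f"Cox PH AIC:  {aic_cox:.1f}")
print(f"Weibull AIC: {aic_weibull:.1f}")
```

Both results match the tabulated values. As a general caution, the Cox model is fitted by partial likelihood and the Weibull model by full likelihood, so their AIC values are not on the same scale and the difference between them should be read with care.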
Figure A3. Diagnostic of proportional hazards assumption: Cox PH Model 1 with five variables (a) and Cox PH Model 2 with four variables (b).

References

  1. Vaarma, M.; Li, H. Predicting student dropouts with machine learning: An empirical study in Finnish higher education. Technol. Soc. 2024, 76, 102474.
  2. OECD. Education at a Glance 2024 (Education at a Glance). 2024. Available online: https://www.oecd.org/en/publications/education-at-a-glance-2024_c00cad36-en.html (accessed on 7 July 2025).
  3. Latvia Ministry of Education. OECD INES|Ministry of Education. 2024. Available online: https://www.izm.gov.lv/lv/oecd-ines (accessed on 24 February 2025).
  4. OECD. Education at a Glance 2022. OECD Indicators/Education at a Glance. 2022. Available online: https://www.oecd.org/en/publications/education-at-a-glance-2022_3197152b-en.html (accessed on 7 July 2025).
  5. Latvia Ministry of Education. Higher Education|Study in Latvia. 2024. Available online: https://studyinlatvia.lv/higher-education# (accessed on 24 February 2025).
  6. González-Morales, M.O.; López-Aguilar, D.; Álvarez-Pérez, P.R.; Toledo-Delgado, P.A. Dropping out of higher education: Analysis of variables that characterise students who interrupt their studies. Acta Psychol. 2025, 252, 104669.
  7. Delogu, M.; Lagravinese, R.; Paolini, D.; Resce, G. Predicting dropout from higher education: Evidence from Italy. Econ. Model. 2024, 130, 106583.
  8. Tayebi, A.; Gomez, J.; Delgado, C. Analysis on the lack of motivation and dropout in engineering students in Spain. IEEE Access 2021, 9, 66253–66265.
  9. Pecuchova, J.; Drlik, M. Predicting Students at Risk of Early Dropping Out from Course Using Ensemble Classification Methods. Procedia Comput. Sci. 2023, 225, 3223–3232.
  10. Seo, E.Y.; Yang, J.; Lee, J.E.; So, G. Predictive modelling of student dropout risk: Practical insights from a South Korean distance university. Heliyon 2024, 10, e30960.
  11. Loder, A.K.F. Master’s programs’ dropout and graduation clusters in a university system with a multiple enrollment policy. Int. J. Educ. Res. Open 2025, 8, 100423.
  12. Rabelo, A.M.; Zárate, L.E. A model for predicting dropout of higher education students. Data Sci. Manag. 2025, 8, 72–85.
  13. Da Silva, D.E.M.; Pires, E.J.S.; Reis, A.; De Moura Oliveira, P.B.; Barroso, J. Forecasting students dropout: A UTAD university study. Future Internet 2022, 14, 76.
  14. Helland, H.; Strømme, T.B.; Thomsen, J.-P. Social inequality in dropout rates in higher education: Denmark and Norway. Stud. High. Educ. 2024, 1–16.
  15. Tinto, V. Leaving College: Rethinking the Causes and Cures of Student Attrition, 2nd ed.; University of Chicago Press: Chicago, IL, USA, 1993.
  16. R Core Team. R, version 4.4.3; A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2025. Available online: https://www.R-project.org/ (accessed on 15 February 2025).
  17. Murthy, D.P.; Xie, M.; Jiang, R. Weibull Models; John Wiley & Sons: Hoboken, NJ, USA, 2004.
  18. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B 1972, 34, 187–220.
  19. Therneau, T.M.; Grambsch, P.M. Modeling Survival Data: Extending the Cox Model; Springer Science & Business Media: Berlin, Germany, 2000; ISBN 0-387-98784-3.
  20. Therneau, T. R Package, version 3.8-3; A Package for Survival Analysis in R; R Foundation for Statistical Computing: Vienna, Austria, 2024. Available online: https://CRAN.R-project.org/package=survival (accessed on 15 February 2025).
  21. Kassambara, A.; Kosinski, M.; Biecek, P. R Package, version 0.5.0; Survminer: Drawing Survival Curves Using ‘ggplot2’; R Foundation for Statistical Computing: Vienna, Austria, 2024. Available online: https://rpkgs.datanovia.com/survminer/index.html (accessed on 3 February 2025).
  22. Schloerke, B.; Cook, D.; Larmarange, J.; Briatte, F.; Marbach, M.; Thoen, E.; Elberg, A.; Crowley, J. R Package, version 2.2.1; GGally: Extension to ‘ggplot2’; R Foundation for Statistical Computing: Vienna, Austria, 2024. Available online: https://github.com/ggobi/ggally (accessed on 15 February 2025).
  23. Paura, L.; Arhipova, I.; Vitols, G. Evaluation of students dropout rate and reasons during the study. In INTED2017 Proceedings; IATED: Valencia, Spain, 2017; pp. 2233–2238.
  24. Pusztai, G.; Fényes, H.; Kovács, K. Factors Influencing the Chance of Dropout or Being at Risk of Dropout in Higher Education. Educ. Sci. 2022, 12, 804.
  25. Paura, L.; Arhipova, I. Cause Analysis of students’ dropout rate in higher education study program. Procedia - Soc. Behav. Sci. 2014, 109, 1282–1286.
  26. Wild, S.; Heuling, L.S. Student dropout and retention: An event history analysis among students in cooperative higher education. Int. J. Educ. Res. 2020, 104, 101687.
  27. Kabashi, Q.; Shabani, I.; Caka, N. Analysis of the student dropout rate at the Faculty of Electrical and Computer Engineering of the University of Prishtina, Kosovo, from 2001 to 2015. IEEE Access 2022, 10, 68126–68137.
  28. Behr, A.; Giese, M.; Teguim, K.H.D.; Theune, K. Dropping out from Higher Education in Germany an Empirical Evaluation of Determinants for Bachelor Students. Open Educ. Stud. 2020, 2, 126–148.
  29. Espinoza, O.; Sandoval, L.; González, L.; Maldonado, K.; Larrondo, Y.; Corradi, B. Reasons for university dropout in Chile: Does student gender play a role? Educ. Rev. 2024, 77, 562–577.
  30. Fior, C.A.; Polydoro, S.A.J.; Pelissoni, A.M.S.; Dantas, M.A.; Martins, M.J.; Da Silva Almeida, L. Impact of self-efficacy and academic performance in the dropout of higher education students. Psicol. Esc. E Educ. 2022, 26, e235218.
  31. Cocoradă, E.; Curtu, A.L.; Năstasă, L.E.; Vorovencii, I. Dropout Intention, Motivation, and Socio-Demographics of Forestry students in Romania. Forests 2021, 12, 618.
  32. Pedersen, J.V.; Nielsen, M.W. Gender, self-efficacy and attrition from STEM programmes: Evidence from Danish survey and registry data. Stud. High. Educ. 2023, 49, 47–61.
  33. Astorne-Figari, C.; Speer, J.D. Drop out, switch majors, or persist? The contrasting gender gaps. Econ. Lett. 2018, 164, 82–85.
  34. Meyer, J.; Strauß, S. The influence of gender composition in a field of study on students’ drop-out of higher education. Eur. J. Educ. 2019, 54, 443–456.
  35. Masci, C.; Giovio, M.; Mussida, P. Survival models for predicting student dropout at university across time. Int. Conf. Educ. New Dev. 2022, 1, 203–207.
  36. Gutierrez-Pachas, D.A.; Garcia-Zanabria, G.; Cuadros-Vargas, E.; Camara-Chavez, G.; Gomez-Nieto, E. Supporting Decision-Making Process on Higher Education Dropout by Analyzing Academic, Socioeconomic, and Equity Factors through Machine Learning and Survival Analysis Methods in the Latin American Context. Educ. Sci. 2023, 13, 154.
  37. Kalamaras, D.; Maska, L.; Nasika, F. A Cox Proportional Hazards Model with Latent Covariates Reflecting Students’ Preparation, Motives, and Expectations for the Analysis of Time to Degree. Stats 2025, 8, 37.
  38. Llauró, A.; Fonseca, D.; Romero, S.; Aláez, M.; Lucas, J.T.; Felipe, M.M. Identification and comparison of the main variables affecting early university dropout rates according to knowledge area and institution. Heliyon 2023, 9, e17435.
Figure 1. Percentage of students who progressed to the eighth semester and the distribution of students by causes of dropout.
Figure 2. Survival probability (%) of students (a) and dropout risk (b) for study cohort in total.
Figure 3. Survival probability (%) of students depending on (a) finance source, (b) gender, (c) faculty, and (d) priority.
Figure 4. Hazard rate of students’ dropout depending on (a) finance source, (b) gender, (c) faculty, and (d) priority.
Figure 5. Relationship between SM and WAM in upper and lower panels and distribution of SM and WAM on diagonal panels depending on (a) finances, (b) gender, (c) faculty, and (d) priority. Statistical analysis was performed by correlation analysis (** p < 0.01, *** p < 0.001).
Figure 6. Proportional hazard model results: HR, hazard ratio; HR 95% CI, 95% confidence interval of the hazard ratio; p-value (* p < 0.05, *** p < 0.001).
Table 1. Variable description.

| Variable Name | Type of Data | No. of Levels or Range | Description |
|---|---|---|---|
| Gender | Nominal | 2 | Female, Male |
| Faculty | Nominal | 5 | IITF, LPTF, ESAF, MVZF, VMF |
| Dropout date | Date | - | Student dropout date |
| Dropout semester | Nominal | 7 | Student dropout semester |
| Dropout reason | Nominal | 4 | Did not start study, voluntary dropout, did not fulfil financial obligations, did not complete study courses |
| Dropout after days | Numeric | 0–1258 days | Survival time = difference between student dropout date and study start date |
| Dropout after month | Numeric | 0–41 months | Student survival time in months |
| Status | Nominal | 2 | 1 = dropout; 0 = study |
| Priority | Nominal | 4 | Not mentioned (NM), 1st, 2nd, 3rd and lower |
| Finance source | Nominal | 2 | Government financed, self-financed |
| WAM | Numeric | 1–9.49 | Weighted Average Mark |
| SM | Numeric | 46–512.9 | Sum of secondary school marks and the results from the central exam |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Paura, L.; Arhipova, I.; Vitols, G.; Sproge, S. Analysis of Student Dropout Risk in Higher Education Using Proportional Hazards Model and Based on Entry Characteristics. Data 2025, 10, 110. https://doi.org/10.3390/data10070110
