Early Dropout Prediction Model: A Case Study of University Leveling Course Students

Sandoval-Palis, Iván; Naranjo, David; Vidal, Jack; Gilar-Corbi, Raquel

doi:10.3390/su12229314

¹ Departamento de Formación Básica, Escuela Politécnica Nacional, Quito 17-01-2759, Ecuador

² Department of Developmental Psychology and Didactics, University of Alicante, 03690 Alicante, Spain

* Author to whom correspondence should be addressed.

Sustainability 2020, 12(22), 9314; https://doi.org/10.3390/su12229314

Submission received: 26 September 2020 / Revised: 20 October 2020 / Accepted: 22 October 2020 / Published: 10 November 2020

(This article belongs to the Section Sustainable Education and Approaches)

Abstract

The school-dropout problem is a serious issue that affects both a country’s education system and its economy, given the substantial investment in education made by national governments. One strategy for counteracting the problem at an early stage is to identify students at risk of dropping out. The present study introduces a model to predict student dropout rates in the Escuela Politécnica Nacional leveling course. Data related to 2097 higher education students were analyzed; a logistic regression model and an artificial neural network model were trained using four variables, which incorporated student academic and socio-economic information. After comparing the two models, the neural network, with an experimentally defined 4–7–1 architecture and a logistic activation function, was selected as the model that should be applied for the early prediction of dropout in the leveling course. The study findings show that students with the highest risk of dropping out are those in vulnerable situations, with low application grades, from the Costa regime, who are enrolled in the leveling course for technical degrees. This model can be used by the university authorities to identify possible dropout cases, as well as to establish policies to reduce university dropout and failure rates.

Keywords: dropout; artificial neural network; logistic regression; dropout prediction model; university students

1. Introduction

One of the most serious problems facing academic institutions is the high rate of student failure. Several studies have provided evidence that student failure is influenced by an interaction between several decisive factors throughout the academic process, suggesting that the risk of dropping out is configured from a group of variables, rather than a single variable [1,2,3].

The problem of higher education attrition has been extensively researched [4]. In 1973, Tinto and Cullen defined two categories of dropping out: leaving the college of registration, and failing to obtain any degree [5]. Both definitions cover a wide range of concerns, including economics. As governments make substantial investments in public and community colleges, they need to measure how much they spend on students who drop out during the first year. During the 2008–2009 academic year, U.S. taxpayers spent more than USD 900 million on full-time, degree-seeking community college students who dropped out during their first year [6]. Between 2003 and 2008, the U.S. invested nearly USD 6.2 billion in colleges and universities to educate students who did not return for a second year. State governments gave more than USD 1.4 billion and the federal government gave more than USD 1.5 billion in grants to students who did not return for a second year [7].

In Europe, many countries do not systematically monitor higher education success rates. In a study of 35 European countries, only 12 regularly report indicators related to completion. Even fewer countries report on retention rates, dropout rates, or time-to-degree. The available cross-country comparative statistics must be interpreted with care due to differences in underlying definitions, context, and institutional arrangements across higher education systems [8]. Norway is among the countries that analyze higher education data most closely; according to a report published on 20 August 2020, 67.5% of new students in Norway completed a degree within 8 years [9].

In countries that belong to the OECD, 12% of students who enter a full-time bachelor’s program, on average, leave the tertiary system before beginning their second year of study. This share increases to 20% by the end of the program’s theoretical duration and to 24% three years later. In all countries with available data, women have higher completion rates than men in BA programs [10].

In Latin America, the performance of the higher education system has been disappointing. On average, around half of citizens aged 25–29 have not established a career. Only Mexico and Peru have a completion rate near that of the United States (65 percent). In Colombia, around 37 percent of students who begin a BA program drop out of the higher education system altogether; this rises to around 53 percent among students who begin short-cycle programs. Around 36 percent of all dropouts leave university at the end of their first year, in contrast to approximately 15 percent of student dropouts in the United States. Despite the concentration of dropouts at the start of their college careers, almost 30 percent of all dropouts leave the system after four years [11]. According to a national survey of employment, unemployment, and underemployment in Ecuador, there are 7,780,767 people aged between 25 and 64 years, of whom only 20% have been to university. It is therefore highly probable that the remaining 80% includes students who have dropped out of college.

In view of the previous data, we consider that the study of the student dropout phenomenon is a fundamental aspect for the improvement of higher education, and with this study, we intend to answer some questions, such as: what socioeconomic and academic variables influence dropout during the first levels of higher education? Can a predictive model be defined for the identification of these factors in order to improve university student retention and decrease dropout rates?

2. Literature Review

The academic and socio-economic backgrounds of students with lower socio-economic status can negatively affect their ability to remain at university [1,2,3]. In various studies, factors including a lack of prior academic preparation and economic and financial difficulties have been shown to potentially cause students to drop out of the higher education system [12,13,14]. However, these are not the only factors affecting dropout rates; the problem must be considered as a phenomenon affecting the entire higher education system in which endogenous and exogenous factors converge within the same system.

According to Donoso et al. [15], the retention of students in university education is a broad phenomenon, related to access and higher education selection policies, which reflect the fact that some high-school graduates do not possess the skills, conditions, capacities, aptitudes, or competences to continue their university studies. Recent studies have concluded that the college-dropout issue generally arises during the early years of an individual’s career [16]; Tinto [17] highlights two critical periods when the risk of desertion is higher than usual. The first critical period is the admissions process, when a student first accesses the university. The second critical period occurs during the first semesters spent in university, when the student begins the process of social and academic adaptation. Bean [18] points out that dropout is not only due to academic variables but can also be explained by psychosocial, environmental, and socialization factors. Authors such as Chen and DesJardins [19,20] argue that dropout is based on a cost-benefit decision, highlighting the impact of student benefits in said decision. Braxton et al. [21] and Kuh [22] point out that dropout depends on the quality of teaching and the student’s learning experience. When students enter higher education, they present their own family and personal characteristics; they must find ways to fit in with the reality of the institutional social system. The organization of higher education institutions has an impact on the individual and his or her socialization and satisfaction [23]. For this reason, a student’s social relations with classmates, teachers, and the social environment are vitally important [24]; it is essential to achieve a balance between social adaptation and social support [8,24].

In this context, greater attention must be paid to retention rates during the first stages of education. This is vitally important for international evaluations, which reflect the capacity and effectiveness of institutions in retaining students, given that the highest dropout rates occur early [25]. Factors that can lead to early desertion include performance, but also region of origin, age, and year of admission. Economic factors are not necessarily of vital importance [26].

During the last two decades, there has been an increase in empirical studies of variables associated with performance in higher education; Schneider [4] conducted a systematic review of the literature, including 38 meta-analyses based on almost 2 million students. They set out to answer this important question: ‘What characteristics of students, teachers and instruction are strongly associated with better learning outcomes?’ The key variables that explain academic performance appear to be (a) related to the following instructional variables: social interaction, stimulating meaningful learning, evaluation and feedback, presenting information clearly, technology employment, and extracurricular training programs, as well as (b) related to the following student variables: intelligence and previous achievements, strategies, motivation, personality, and context.

Several studies have attempted to identify the factors that influence dropout rates. In this context, factors such as monthly family income, type of school, type of housing, and even gender have been identified as factors that influence the student-dropout phenomenon [12,27]. A Romanian study showed that academic satisfaction at the beginning of a course was a significant predictor of dropout intention [28]. In South Korea [29] and in Latin America, Amo et al. [28] found that getting a job and receiving a student scholarship were two major factors that reduced the university dropout rate. Other factors to consider include the financial situation [8], class attendance [24] and study time [24,30]; for example, students who combine part-time study with work are more likely to drop out of higher education [30].

Gallegos et al. [26] found that the student’s region of origin, grades, and scholarships are variables that affect the probability of dropping out. Similarly, the level of income, parents’ education, and the type of school in which they attended secondary education were statistically significant in explaining dropout.

Álvarez [31] classified the factors that cause students to drop out as follows: (a) personal factors (motivational factors, psychological or emotional factors, student expectations, health problems, age, lack of discipline, etc.); (b) academic factors (lack of academic aptitude, lack of vocational guidance, poor choice of career or institution, poor academic performance, poor previous training, deficiencies in academic programs, etc.); and (c) socio-economic factors (precarious economic-social situation, institutional reasons, etc.).

Lin, Yu, and Chen [32] used a probit model to analyze the factors that predict retention among first-year higher education students. They found that the grade point average (GPA) obtained in secondary education was one predictive factor, along with class rank, the size of the secondary school of origin, and gender (women are less likely). They also found very interesting results regarding possible interventions to reverse the dropout process: programs that include orientation or remedial English courses, on-campus jobs, and on-campus residency have a positive impact on retention.

At the same time, governments and universities have proposed affirmative action policies to help students overcome difficulties associated with these factors. Identifying these factors and analyzing their influence on student academic performance is an important process, which can help to identify at-risk students early and to take corrective actions during the educational process [33,34]. Mathematical modeling and data-mining algorithms have been used to identify factors that influence education-related phenomena.

Logistic regression is a traditional predictive method that is often used in the educational field, especially when the predictor variables are continuous. The model calculates the probability that a categorical variable will take a certain value, given a set of predictor variables. During model training, regression coefficients analogous to those of linear regression are estimated. It is therefore necessary to verify that the statistical assumptions are fulfilled to guarantee the validity of the model [35].
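As a brief illustration of the probability calculation described above, the standard binary logistic model can be written as follows. The symbols are assumptions introduced here for illustration and do not appear in the original text: x_1, …, x_p denote the predictor variables, \beta_0, …, \beta_p the regression coefficients, and Y = 1 the event of interest (for example, dropping out).

```latex
P(Y = 1 \mid x_1, \ldots, x_p) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)}}
\quad\Longleftrightarrow\quad
\ln\frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)} = \beta_0 + \sum_{i=1}^{p} \beta_i x_i
```

The right-hand form shows why the coefficients resemble those of a linear regression: they act linearly on the log-odds rather than on the probability itself.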

By contrast, Artificial Neural Networks (ANN) constitute a predictive model based on machine learning, which allows researchers to explore and model functional relationships between variables that cannot be established using traditional statistical methods. Neural networks have been used to solve prediction and classification problems in various areas of knowledge, generating special interest in the field of education, especially in relation to student performance modeling and variables that influence the educational process [35,36,37].

It is essential to study this phenomenon because there is a pressing need to reduce the dropout figures, at both the national and international level. The main objective of this paper is to present a dropout prediction model in order to identify the academic and socio-economic factors that cause students to drop out. This model will allow university authorities to identify possible dropout cases early and to establish policies that support these vulnerable students and reduce dropout rates.

3. Methods

3.1. Participants

In Ecuador, Secretaría Nacional de Educación Superior, Ciencia, Tecnología e Innovación (SENESCYT) regulates the admission process to Higher Education Institutions (HEIs). In 2014, SENESCYT implemented an affirmative action policy through a socioeconomic characterization of applicants during the admission process. The aim of this affirmative action policy was to increase the rates of access to Higher Education of applicants in a situation of greater economic and social vulnerability. Thus, since 2017, SENESCYT has included, among the new students entering the Escuela Politécnica Nacional (EPN) leveling course, those students coming from this new admission process [34].

Data regarding 2097 first-year students enrolled in the Escuela Politécnica Nacional (EPN) leveling course during the 2017-B, 2018-A, and 2018-B academic periods were analyzed. All participants in this study came from the socioeconomic characterization process of the new SENESCYT admission process. The EPN is an engineering university that receives approximately 2% of the total student intake of all universities in Ecuador. The data were obtained from the EPN and SENESCYT databases. The characteristics of the study population are shown in Table 1.

3.2. Variables and Predictors

3.2.1. Predicted Variable

Dropout: Whether or not a student dropped out of the leveling course. The dropout data include cases in which a student who failed the leveling course did not enroll again in the EPN, as well as cases in which students did not register grades during the leveling course. These types of dropouts can occur during the first or second student-enrollment period. In exceptional cases, students also drop out during the third enrollment and readmissions periods. However, the predicted variable is Boolean, so modeling each type of dropout is beyond the scope of this work.

The models trained in the present study perform their classification based on probabilities calculated to predict the variable as a function of interactions between predictors. In the absence of conditioned or historical criteria for these probabilities, basic criteria were applied: probabilities greater than or equal to 0.5 were classified as Yes (students who dropped out of the leveling course), while probabilities less than 0.5 were classified as No (students who did not drop out of the leveling course).
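The classification rule described above can be sketched in a few lines. This is only an illustration of the 0.5 cut-off, not the authors' code; the function name `classify_dropout` and the sample probabilities are hypothetical.

```python
def classify_dropout(probability: float, threshold: float = 0.5) -> str:
    """Map a predicted dropout probability to the Yes/No label.

    Probabilities greater than or equal to the threshold are labeled "Yes"
    (dropped out); lower probabilities are labeled "No".
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "Yes" if probability >= threshold else "No"


# Hypothetical predicted probabilities for three students.
labels = [classify_dropout(p) for p in (0.12, 0.50, 0.91)]
print(labels)
```

Keeping the threshold as a parameter makes it easy to explore other cut-offs if historical criteria become available.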

3.2.2. Predictors

The variables in this study were selected from a set of 14 variables through correlation and an independence analysis. Only variables significantly associated with the dropout variable were considered. These variables are as follows:

Regime: The academic regime in which a student enters the leveling course. There are two regimes: Costa and Sierra. The Sierra regime is analogous to the autumn semester, which runs from September to January, while the Costa regime is analogous to the spring semester, which runs from March to July.
Leveling course type: The type of leveling course in which a student is enrolled. When students accept a place at the EPN, they are assigned to the leveling course in engineering, sciences and administrative sciences (corresponding to International Standard Classification of Education 6) or to the leveling course for a technical degree (corresponding to International Standard Classification of Education 5), depending on their chosen careers.
Application grade: The score that a student achieved on the university application exam. Exam scores range between 400 and 1000 points. The higher the score, the higher the student’s performance in the exam. The application score does not include the student’s high-school GPA.
Vulnerability index: A numerical indicator of a student’s relative socio-economic vulnerability. The index, scored out of 1000 points, is inversely related to the student’s level of vulnerability: the lower the score, the more vulnerable the student. This index is calculated using information self-declared by students in a socioeconomic survey called the Associated Factors Survey, which students fill out during the application process.

Through the Associated Factors Survey, information about the socioeconomic aspects of the students is collected. This information is stored in 25 variables and, based on the students’ responses, a weighted score is assigned to each variable; the maximum scores of the 25 variables add up to 1000 points. The 25 variables are listed below, and the corresponding weighted scores for each level of each variable are shown in parentheses.

Water: Whether or not the student’s home has potable water service. This variable has two levels: “No” (0.000) and “Yes” (39.000).
Waste: Whether or not the student’s home has waste collection service. This variable has two levels: “No” (0.000) and “Yes” (36.000).
Washing machine: Whether or not the student’s home has a washing machine. This variable has two levels: “No” (0.000) and “Yes” (27.000).
Internet: Whether or not the student’s home has an internet connection. This variable has two levels: “No” (0.000) and “Yes” (30.000).
Telephone: Whether or not the student’s home has a landline phone. This variable has two levels: “No” (0.000) and “Yes” (27.000).
Cable television: Whether or not the student’s home has cable or satellite television. This variable has two levels: “No” (0.000) and “Yes” (30.000).
Job: Whether the student works or not. This variable has two levels: “No” (0.000) and “Yes” (46.099).
Desktop computer: The number of desktop computers in the student’s home. This variable has two levels: “None” (0.000) and “One or more” (31.155).
Laptop: The number of laptops in the student’s home. This variable has two levels: “None” (0.000) and “One or more” (30.746).
Stove: The number of stoves in the student’s home. This variable has two levels: “None” (0.000) and “One or more” (29.000).
Radio: The number of radios in the student’s home. This variable has two levels: “None” (0.000) and “One or more” (27.000).
Refrigerator: The number of refrigerators in the student’s home. This variable has two levels: “None” (0.000) and “One or more” (35.000).
Microwave: The number of microwave ovens in the student’s home. This variable has two levels: “None” (0.000) and “One or more” (29.000).
Television: The number of televisions in the student’s home. This variable has two levels: “None” (0.000) and “One or more” (29.000).
Car: The number of cars in the student’s home. This variable has four levels: “None” (0.000), “One” (31.000), “Two” (55.000), and “Three or more” (64.000).
Toilet: The number of toilets in the student’s home. This variable has four levels: “None” (0.000), “One” (32.000), “Two” (53.000), and “Three or more” (67.000).
Bathroom: The number of bathrooms (containing either a bathtub or a shower) in the student’s home. This variable has two levels: “None” (0.000) and “One or more” (38.000).
Social security: The number of social security beneficiaries in the student’s home. This variable has two levels: “None” (0.000) and “One or more” (24.000).
Private insurance: The number of private insurance beneficiaries in the student’s home. This variable has two levels: “None” (0.000) and “One or more” (36.000).
Educational level: The highest educational level of the student’s father, mother or guardian. This variable has four levels: “None or unknown” (0.000), “Elementary school” (9.000), “High school” (22.000), and “University degree or higher” (45.000).
Occupation: The occupation of the student’s father, mother or guardian. This variable has thirteen levels: “Unemployed” (0.000), “Retired” (0.000), “Armed forces and police” (0.000), “Elementary occupations” (0.000), “Operators of facilities, machines and assemblers” (0.000), “Craftsmen” (0.000), “Qualified agricultural and fishing worker” (0.000), “Office employees” (25.000), “Mid-Level Technicians” (34.000), “Scientific and intellectual professionals” (46.000), “Management staff of a private company” (48.000), “Management staff of the public administration” (48.000), and “Service and merchant worker” (73.000).
Wall: The material of the walls of the student’s house. This variable has five levels: “Uncoated cane or other materials” (0.000), “Coated cane, bahareque or wood” (14.000), “Adobe or rammed earth” (33.000), “Brick or CMU” (48.000), and “Concrete” (51.000).
Floor: The material of the floor of the student’s house. This variable has five levels: “Cane or ground” (0.000), “Untreated wood” (4.000), “Brick or cement” (26.000), “Parquet, plank, floating floor or carpeting” (42.000), and “Ceramic, tile, vinyl or marble” (43.000).
Housing: Type of student’s home. This variable has eight levels: “Hut or covacha” (0.000), “Ranch” (16.000), “Mediagua” (54.000), “Room(s) in tenancy house” (69.000), “Collective housing” (69.000), “Apartment in house or building” (69.000), “Suite” (69.000), and “House” (69.000).
Wastewater: Type of sewage collection at the student’s home. This variable has six levels: “None” (0.000), “Direct discharge to the sea, river, lake or stream” (0.000), “Latrine” (1.000), “Connected to cesspit” (14.000), “Connected to septic tank” (30.000), and “Connected to public sewerage” (44.000).

These weighted scores were determined by the Instituto Nacional de Estadística y Censos of Ecuador, by analyzing the Living Conditions Survey that was applied in the last population and housing census. The same 25 variables of the Associated Factors Survey with their corresponding weighted scores were used by the government to determine the socioeconomic status of Ecuadorians with the Living Conditions Survey.

Finally, the weighted scores are added to calculate the vulnerability index. In this way, the vulnerability index determines that an applicant presents a situation of greater vulnerability when the score is closer to zero.
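The summation described above can be illustrated with a short sketch. The maximum scores are copied from the variable list; the function name `vulnerability_index` and the example applicant are hypothetical, and the check only verifies that each score lies within its variable's range, not that it matches one of the listed levels.

```python
import math

# Maximum weighted score of each Associated Factors Survey variable,
# copied from the variable list above.
MAX_SCORES = {
    "Water": 39.0, "Waste": 36.0, "Washing machine": 27.0, "Internet": 30.0,
    "Telephone": 27.0, "Cable television": 30.0, "Job": 46.099,
    "Desktop computer": 31.155, "Laptop": 30.746, "Stove": 29.0,
    "Radio": 27.0, "Refrigerator": 35.0, "Microwave": 29.0,
    "Television": 29.0, "Car": 64.0, "Toilet": 67.0, "Bathroom": 38.0,
    "Social security": 24.0, "Private insurance": 36.0,
    "Educational level": 45.0, "Occupation": 73.0, "Wall": 51.0,
    "Floor": 43.0, "Housing": 69.0, "Wastewater": 44.0,
}


def vulnerability_index(scores):
    """Sum the weighted scores of one applicant (lower = more vulnerable)."""
    if set(scores) != set(MAX_SCORES):
        raise ValueError("scores must cover exactly the 25 survey variables")
    for name, value in scores.items():
        if not 0.0 <= value <= MAX_SCORES[name]:
            raise ValueError(f"score out of range for {name}")
    return sum(scores.values())


# The maximum scores add up to 1000 points.
max_total = sum(MAX_SCORES.values())

# Hypothetical applicant: top level everywhere except five variables.
applicant = dict(MAX_SCORES)
applicant.update({
    "Car": 0.0,                  # no car
    "Job": 0.0,                  # student does not work
    "Private insurance": 0.0,    # no private insurance beneficiaries
    "Educational level": 22.0,   # guardian finished high school
    "Occupation": 25.0,          # guardian is an office employee
})
index = vulnerability_index(applicant)
print(round(max_total, 3), round(index, 3))
```

Comparing the totals with `math.isclose` rather than `==` avoids spurious mismatches caused by floating-point rounding of the three non-integer weights.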

3.3. Model Training

Two predictive models were trained: a logistic regression, which is based on classical criteria, and an artificial neural network, which is based on machine learning. These models were chosen because several authors have demonstrated their efficiency in solving problems such as the one addressed in this study [35,36,38,39,40].

3.3.1. Logistic Regression

A logistic regression was used to model the student dropout rate in the EPN leveling course based on academic and socio-economic information.

First, the data were partitioned into training (70%) and test (30%) sets, using a random sampling process. Next, a logistic regression was carried out with lasso regularization and a C value of 0.001.

The model was then evaluated on the test set; a significance test was carried out to determine the statistical significance of each variable in the model [35].
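The 70/30 random partition mentioned above can be sketched with the Python standard library alone. This is a minimal illustration rather than the authors' procedure; the function name `random_split` and the seed value are hypothetical, and integer indices stand in for the 2097 student records.

```python
import random


def random_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and cut them into training and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Indices 0..2096 stand in for the 2097 student records.
train, test = random_split(range(2097))
print(len(train), len(test))
```

Fixing the seed keeps the partition reproducible, which matters when two models are compared on the same held-out students.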

3.3.2. Artificial Neural Network

An ANN was trained to model student dropout rates in the EPN leveling course based on academic and socio-economic information.

First, the data were partitioned into training (70%), validation (15%), and test (15%) sets, using a random sampling process. Next, an artificial neural network was modeled on the training dataset. The hyperparameters for the neural network were as follows: one hidden layer, Adam optimization algorithm, a learning rate of 0.0001, and a maximum number of 4000 iterations. Various neural network models were trained, varying the number of neurons in the hidden layer and the activation function. The number of neurons in the hidden layer was derived from Equation (1):

\frac{2}{3}\,\mathrm{NIL} + \mathrm{NOL} \leq \mathrm{NHL} \leq 2\,\mathrm{NIL} \qquad (1)

where NHL is the number of neurons in the hidden layer, NIL is the number of neurons in the input layer, and NOL is the number of neurons in the output layer.
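As a worked example of Equation (1), the sketch below lists the admissible integer hidden-layer sizes for a given number of input and output neurons. The function name `hidden_layer_range` is hypothetical; exact fractions are used so that the lower bound is rounded up without floating-point error.

```python
import math
from fractions import Fraction


def hidden_layer_range(n_inputs, n_outputs):
    """Integer hidden-layer sizes satisfying (2/3)*NIL + NOL <= NHL <= 2*NIL."""
    lower = Fraction(2, 3) * n_inputs + n_outputs  # may be fractional
    upper = 2 * n_inputs
    return list(range(math.ceil(lower), upper + 1))


# Four predictors and one output neuron, as in this study.
print(hidden_layer_range(4, 1))
```

For four inputs and one output the lower bound is 11/3 (about 3.7), so the smallest admissible integer size is 4 and the largest is 8.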

The activation functions tested were as follows: identity, logistic, hyperbolic tangent, and rectified linear unit (ReLu).

Once the maximum classification accuracy of the model was obtained in the validation set, the neural network was evaluated in the test set, and a Garson test was carried out to determine the relative importance of each variable in the model [41,42,43].
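To make the idea of relative importance concrete, the sketch below implements one common formulation of Garson's algorithm for a network with a single hidden layer and a single output. The text does not specify which variant or software routine was used, so this formulation is an assumption; the function name `garson_importance` and the weight values are hypothetical, and bias terms are ignored.

```python
def garson_importance(input_hidden, hidden_output):
    """Relative importance of each input from absolute connection weights.

    input_hidden[i][j] is the weight from input i to hidden neuron j;
    hidden_output[j] is the weight from hidden neuron j to the output.
    Assumes every hidden neuron has at least one non-zero incoming weight.
    """
    n_inputs = len(input_hidden)
    contributions = [0.0] * n_inputs
    for j, out_weight in enumerate(hidden_output):
        # Total absolute weight entering hidden neuron j.
        column_total = sum(abs(input_hidden[i][j]) for i in range(n_inputs))
        for i in range(n_inputs):
            # Share of hidden neuron j attributed to input i,
            # scaled by how strongly neuron j drives the output.
            share = abs(input_hidden[i][j]) / column_total
            contributions[i] += share * abs(out_weight)
    total = sum(contributions)
    return [c / total for c in contributions]


# Hypothetical weights for a 4-3-1 network (not the trained model).
weights_in = [
    [0.5, -1.0, 0.2],
    [1.5, 0.5, -0.4],
    [-0.5, 0.5, 0.2],
    [2.5, -2.0, 1.2],
]
weights_out = [0.8, -1.1, 0.3]
importance = garson_importance(weights_in, weights_out)
print([round(v, 3) for v in importance])
```

Because only absolute weights are used, the result indicates how much each variable contributes, not whether it raises or lowers the predicted probability.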

3.4. Model Performance Evaluation

Although the procedure used to train the models made it possible to obtain a first approximation of their performance, this procedure was very susceptible to overfitting on the training and validation sets. Therefore, the general performance of the models was determined by cross-validation with k = 10, which reduces the effects of overfitting and selection bias. The performance of each model was evaluated using two indicators: classification accuracy and the area under the Receiver Operating Characteristic (ROC) curve. The ROC curve is a graph that shows the performance of a classification model at all classification thresholds, and the area under this curve provides an aggregate measure of performance across all possible classification thresholds [39,44].
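The two ingredients of this evaluation can be sketched with the standard library alone. The code below is illustrative and not the authors' implementation: `k_fold_indices` and `roc_auc` are hypothetical names, the seed is arbitrary, and the area under the ROC curve is computed through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        yield train_idx, folds[i]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc_example = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
print(auc_example)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a dataset of about 2000 students but would be replaced by a rank-based computation for larger data.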

Preliminary tests were carried out in Orange 3.22.0. Descriptive analyses and hypothesis tests were performed in SPSS 22. Model training was performed in RStudio version 1.2.1335.

4. Results and Discussion

The results of the logistic regression are presented below, in Table 2 and Table 3.

Of the four variables considered to model dropping out of the leveling course through logistic regression, only application grade and regime were statistically significant at the 5% level (the p-value for each term tests the null hypothesis that the coefficient is equal to zero, which implies that it has no effect on the modeled variable). In other words, neither the vulnerability index nor the type of leveling course provided useful information for the model. On the other hand, the residuals were distributed symmetrically, which is why they could be considered normally distributed. Table 4 shows the confusion matrix of the logistic regression model, obtained by cross-validation. The correctly classified instances were 1525 (true positives) and 0 (true negatives), while the misclassified instances were 571 (false positives) and 1 (false negative). From this information, a classification accuracy of 0.727 was obtained (1525/2097); although this is a relatively high value, it is not an adequate indicator of the overall performance of the model, since the model’s selectivity (its ability to identify students who did not drop out) is null. In other words, the model classifies practically all students as having dropped out of the leveling course, since, according to the original data, a student is more likely than not to have dropped out (72.8% of cases).
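The figures quoted above can be checked directly from the four confusion-matrix counts. The sketch below takes "dropped out" as the positive class, as in the text; the variable names are illustrative.

```python
# Confusion-matrix counts for the logistic regression model, as reported above.
tp, tn, fp, fn = 1525, 0, 571, 1

total = tp + tn + fp + fn
accuracy = (tp + tn) / total      # share of correctly classified students
sensitivity = tp / (tp + fn)      # dropouts correctly identified
specificity = tn / (tn + fp)      # non-dropouts correctly identified
precision = tp / (tp + fp)        # predicted dropouts who did drop out
base_rate = (tp + fn) / total     # actual share of dropouts in the data

print(total, round(accuracy, 3), round(sensitivity, 3),
      specificity, round(precision, 3), round(base_rate, 3))
```

The accuracy is almost identical to the base rate of dropouts, which is what one would expect from a model that labels nearly every student as a dropout.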

Table 5 presents the training stage results of the artificial neural network. During the training stage, twenty-four models were trained with different combinations of neurons in the hidden layer and activation functions, while the number of neurons in the input and output layers remained constant and equal to 4 and 1, respectively. Thus, according to Equation (1), in the training stage, the range of neurons in the hidden layer was between 3.7 and 8, and since the number of neurons must be an integer, the actual range is between 4 and 8.

It is observed that the simplest model, with 4–3–1 architecture (4 neurons in the input layer, 3 neurons in the hidden layer, 1 neuron in the output layer) and an identity activation function, presents a relatively high and comparable performance with respect to the other models. However, the number of neurons in the hidden layer is outside the range established in the methodology.
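One way to arrive at twenty-four candidate models is to combine six hidden-layer sizes with the four activation functions. The sketch below assumes that sizes 3 to 8 were tried, an inference from the 4–3–1 model mentioned above rather than something the text states explicitly; the activation labels are illustrative names.

```python
from itertools import product

ACTIVATIONS = ("identity", "logistic", "tanh", "relu")
HIDDEN_SIZES = range(3, 9)  # assumed: 3 through 8 neurons

# Each candidate is (inputs, hidden neurons, outputs, activation).
candidates = [(4, hidden, 1, activation)
              for hidden, activation in product(HIDDEN_SIZES, ACTIVATIONS)]
print(len(candidates))
```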

The model with 4–7–1 architecture and a logistic activation function and the model with 4–6–1 architecture and a ReLu activation function present comparable performances. Although the classification accuracy of the 4–6–1 ReLu model is slightly higher (by less than 1%), the area under the ROC curve of the 4–7–1 logistic model is slightly higher (by less than 1%). For this reason, and because the area under the ROC curve is the most significant indicator of model performance, the model with 4–7–1 architecture and a logistic activation function was selected as the optimized model.

Figure 1 shows the results of the Garson test, applied to the neural network model with 4–7–1 architecture and a logistic activation function.

All variables have a relative importance greater than 5%, the most important being the vulnerability index. The reference category for the regime variable was Sierra and for the leveling course variable was technical degree.

Table 6 shows the confusion matrix of the neural network model obtained by cross-validation. From this information, a classification accuracy of 0.768 was obtained, a relatively high performance value with a precision value of 0.796.

Table 7 compares the models’ performance indicators; Figure 2 shows the ROC curve for each model.

The area under the neural network ROC curve is greater than 0.5, while the corresponding value for logistic regression is less than 0.5. This value (0.5) constitutes an important theoretical reference since, according to the literature, models with an AUC value of 0.5 or less perform no better than random classification, that is, they operate without considering interactions between variables [35]. For this reason, it can be stated that the ANN model performs better than logistic regression.

At this point, it is important to note that the vulnerability index, which had no statistical importance in the logistic regression model, had the greatest relative importance in the neural network model. This contrast may show that machine-learning methods can find relationships between variables that are rarely found using traditional methods.

When the classification accuracy of the two models is compared, the values are similar. However, as the corresponding results indicate, the logistic regression model cannot correctly classify students who did not drop out of the leveling course. This shows that the performance of a model cannot be evaluated solely by its classification accuracy; if it were, there would be no notable differences between the logistic regression and the neural network. Nevertheless, as mentioned before, when the two models are compared by the area under their corresponding ROC curves, the ANN performs much better than the logistic regression, and, for this reason, the ANN model should be applied to predict dropout in the leveling course. Although the neural network model presents a false-positive rate of 0.627, this contrasts with a false-negative rate of 0.085. This result is extremely important because the model will be used to carry out interventions with students at risk of dropping out. From an academic point of view, it is far better to carry out interventions on students who do not need them (false positives) than to fail to identify students who do need interventions (false negatives).

The classification accuracy of the ANN model did not exceed 80%. This apparently low performance is explained, on the one hand, from a theoretical perspective: dropout is a multifactorial phenomenon, and it is impossible to include all the factors and their corresponding information in a model, so every model will produce some misclassifications [35,43]. On the other hand, Helal [42] and Yang [45] agree that predictions made in very early stages of the educational process are inaccurate; therefore, it is advisable to build and apply different models at different educational stages. This is evidenced, for example, when comparing the classification accuracy of the ANN model of this study (0.768) with the results obtained by [46], who modeled multilayer perceptrons (a more complex type of ANN) to predict dropout in the pre-registration, first-semester, and first-year stages. The classification accuracy of their model applied in the pre-registration stage, analogous to the application stage of the ANN model built in this study, was 0.667; however, the classification accuracies of the models applied in the first semester and the first year were greater than 0.95. This suggests that the model of the present study could be adapted and applied at different stages so that more factors that influence dropout rates could be included.

However, despite the fact that the ANN model proposed in this study makes predictions at the stage in which students are just entering university, its application is an advantage in terms of carrying out early academic interventions, since dropout rates are higher during the first years of university [47]. Furthermore, the ANN model may be periodically optimized since the variables used in this study are generated in each academic semester; in this way, the student information generated during each semester will be incorporated into the partition data for training the model.

From a practical point of view, a neural network is considered a black box in the sense that, although the results are easily interpretable, the interactions between the variables and the activation functions are far more complex. Therefore, the neural network model presented in this study does not directly determine the profile of a student at risk of dropping out. However, based on the results of the model, some approximations can be achieved by applying traditional techniques to determine the profile of a dropout student.
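The following Python sketch illustrates why a 4–7–1 network is hard to read directly: each of the seven hidden neurons mixes all four inputs before a nonlinear activation is applied, and the output mixes the hidden neurons again. The weights are random placeholders, the input scaling and encoding are hypothetical, and the use of a logistic activation on the output neuron is an assumption; none of these are the fitted parameters of the model in this study.

```python
import math
import random


def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 4-7-1 network: 4 inputs, 7 hidden neurons, 1 output."""
    hidden = [
        logistic(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return logistic(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical weights: random placeholders, NOT the fitted parameters of the study.
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(7)]
b_hidden = [rng.uniform(-1, 1) for _ in range(7)]
w_out = [rng.uniform(-1, 1) for _ in range(7)]
b_out = rng.uniform(-1, 1)

# Hypothetical, already-scaled inputs:
# [application grade, vulnerability index, regime (1 = Sierra), course type (1 = technical)]
student = [0.4, -0.2, 0.0, 1.0]
risk = forward(student, w_hidden, b_hidden, w_out, b_out)
print(f"output in (0, 1): {risk:.3f}")
```

Even in this small network, the contribution of a single input to the output depends on the values of the other three, so no single coefficient summarizes its effect.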

Table 8 shows the results of a t-test, comparing the vulnerability index and application grades of students who did and did not drop out. The vulnerability index and application grade both reveal statistically significant differences between students who dropped out of the leveling course and those who did not. Overall, the values of vulnerability index and application grades of students who remained in the leveling course are higher than the values for students who dropped out.

Table 9 shows the results of an independence test of the regime and type of leveling course, comparing students who did and did not drop out. It is clear that the regime and type of leveling course are not independent of the dropout variable. An analysis of the proportions of each category shows that students enrolled in the Costa regime and leveling course for a technical degree were more likely to drop the leveling course.
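As an illustration of the kind of independence test reported in Table 9, the sketch below computes the Pearson chi-squared statistic for a 2×2 table in plain Python. The counts are hypothetical and chosen only for illustration (the cross-tabulated counts are not given in this article), and the absence of a continuity correction is an assumption, since the exact test variant is not stated here.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts for illustration only (rows: Costa, Sierra; columns: dropped out, stayed).
toy_counts = [[80, 20], [50, 50]]
statistic, p_value = chi_squared_2x2(toy_counts)
print(f"chi-squared = {statistic:.3f}, p = {p_value:.2e}")
```

A large statistic with a small p-value, as in Table 9, indicates that the dropout proportions differ between categories; comparing the row proportions then shows which category drops out more often.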

The results achieved in this study agree with those of other investigations in that socio-economic and academic factors are among the risk factors that influence dropout rates [48]. Each of the variables used in this study can be categorized as one of these two types of factors.

First, the application grade in Table 8 is a variable that shows students’ previous knowledge in mathematics, language, natural sciences, and social sciences; therefore, it is an academic factor. As explained above, the vulnerability index is an indicator of a student’s relative socio-economic vulnerability; therefore, it is a socio-economic factor.

On the other hand, the regime variable shows the academic period in which students enter the leveling course. When students graduate from high school, they should enter the EPN in the Sierra regime. However, those students whose application grade was not high enough to get a place in the EPN take the admission test again and, if their application score is high enough, they have the possibility of entering in the Costa regime. In this context, the regime is an academic factor. Most of the students who enter in the Costa regime are those who did not enter the leveling course in the process corresponding to their first application; they are students with a lower academic level. Most of the students who enter in the Sierra regime are those who entered the EPN in the process corresponding to their first application.

Regarding the leveling course type, the application grade of engineering, sciences, and administrative sciences students is higher than that of technical degree students (in this study, the mean application grade of the former was 863.26, while that of the latter was 790.66). The leveling course type is therefore an academic factor.

The problem of university dropout has economic repercussions: if students drop out, the money that has been allocated to their fees and scholarships is effectively wasted. González and Uribe [49] indicated that 23.5% of the expenditure invested by the state in higher education is lost through dropout. In this way, if a student drops out, the investment made by the state or by private entities in that student cannot be recovered. Likewise, not having university studies is related to a greater probability of suffering unemployment, since most of the unemployed in Ecuador, as previously mentioned, do not have university studies. Therefore, university dropout becomes a very important problem, both for the individual and for the university institution, which, in order to improve its effectiveness, should try to reverse it.

5. Conclusions

The present study has considered variables related to academic and socio-economic factors influencing dropout. In the model proposed in this study, the vulnerability index is a socio-economic factor, whereas the application grade, the regime, and the leveling course type are academic factors.

In comparing the logistic regression and neural network models, this study concludes that the neural network model offers superior performance. The area under the ROC curve of the logistic regression model (0.475) is less than 0.5, the baseline classification level, meaning that this model cannot consider the interactions between variables.

The optimized neural network model has 4–7–1 architecture and a logistic activation function. This model has a classification accuracy of 0.768 and an area under the ROC curve of 0.795. Although the neural network model has a false-positive rate of 0.627, it is offset by a false-negative rate of 0.085. This result is extremely important because the model will be used to carry out interventions with students at risk of dropping out. From an academic point of view, it is far better to carry out interventions with students who do not need them (false positives) than to fail to identify students who do need them (false negatives).

From running the optimized model and statistical analysis of the variables considered when developing the model, we conclude that students who are likely to drop out of the leveling course have a profile that includes a low application grade, a low vulnerability index, and enrollment in both the Costa regime and the leveling course for the technical degree. However, it is not possible to establish the threshold for the quantitative variables (application grade and vulnerability index) at which a student risks dropping out. The results of the model are the product of a complex interaction between the four variables. One way of optimizing the results of the neural network is to model a multilayer perceptron, a more complex neural network, with more hidden layers and hyperparameters. However, this modeling falls outside the scope of this work. This model could be implemented by the Admissions Unit of Escuela Politécnica Nacional to identify potential dropout cases early, establishing, in coordination with the university authorities, policies that support such students. In this way, the dropout and failure rates could be reduced.

Considering the entire process of developing this study, it is essential to reiterate that students drop out as a consequence of combinations of variables. We must reconsider the relationship between those variables and levels of social integration, since following the logic of [17], students who reach higher education shape their own social integration based on situations that involve rewards. This presents institutes of higher education with a task: they must implement strategies that generate a sense of belonging and motivate students to remain in university. Likewise, given the fact that previous academic performance, attendance, active class participation, dedication to studying, and other activities have an impact on the dropout phenomenon, this study advises universities to implement actions that promote student persistence and support affirmative-action students. Institutes of higher education should create support and guidance programs for students at a higher risk of dropping out, implementing strategies that address socio-economic and psychological factors, as well as institutional and academic issues.

Early identification of students at risk of dropping out will allow authorities and university teachers to take actions with the aim of reducing the dropout rate. Martínez [50] points out that the preventive actions that stand out for strengthening retention include tutoring programs, mentoring, preparatory courses, first-year seminars, remedial courses, curricular learning communities, learning support services, and the use of technology to make teaching more flexible and motivate students. In the case of the EPN, it is suggested to define the access process for students who are at risk of dropping out. The application of diagnostic tests and questionnaires would allow one to determine the academic, emotional, and socio-economic profile of these students, with the aim of identifying the areas in which interventions should be carried out. Students with academic deficiencies should take remedial courses in the subjects in which they have deficiencies, as well as tutoring programs to support their learning process. Students living in poverty should receive institutional scholarships and be regularly monitored by the Student Welfare Directorate. Additionally, peer mentoring programs should be established, in which higher-level students provide accompaniment to lower-level students. All these actions together will allow students at risk of dropping out to be adequately integrated into university life, thereby reducing university dropout rates.

One limitation of this study is the fact that it has analyzed the specific case of the EPN, an engineering university that receives approximately 2% of the total student intake of all universities in Ecuador. The model can easily be applied by universities that share the same variables as the EPN. For other universities, however, a new model with new variables must be generated. A future study could standardize the variables and generate a model that could be used at the national level.

The study was developed under conditions different from the current ones; the data used in this research correspond to the face-to-face teaching modality. The COVID-19 pandemic is increasing school dropout, as millions of students have been affected both by the closure of schools around the world and by the global economic recession. According to the World Bank Group [51], by the end of April 2020, schools had been closed in 180 countries, and 85% of students around the world were not attending school; in addition, the world economy will shrink by 3% in 2020. Dropouts will increase, and many of these students will probably drop out of school forever. The highest dropout rate will be concentrated in students belonging to vulnerable groups, as they are forced to abandon their studies due to a lack of economic and technological resources, thus widening the existing gap in education. These events delay the achievement of the sustainable development goal of ensuring inclusive, equitable, and quality education for all by 2030.

Author Contributions

Conceptualization, I.S.-P., J.V., and R.G.-C.; methodology, D.N.; software, D.N.; validation, I.S.-P., D.N. and J.V.; formal analysis, D.N.; data curation, D.N.; writing—original draft preparation, I.S.-P., D.N., J.V., and R.G.-C.; writing—review and editing, R.G.-C.; project administration, I.S.-P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Escuela Politécnica Nacional, grant number PIJ-17-12.

Acknowledgments

The authors thank the Escuela Politécnica Nacional for the funding of the project PIJ-17-12: “Implementación de un Curso Preparatorio Piloto para Estudiantes de Grupos Vulnerables—Política de Cuotas”.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

1. Caballero Dominguez, C.; Reyes Sarmiento, M.; Rodríguez Pautt, A.; Bolivar Troncoso, A. Factores de Riesgo Sociodemográficos, Psicosociales y Académicos de Abandono de los estudios en Estudiantes de Primer Semestre de la Universidad del Magdalena, Colombia. In Proceedings of the Tercera Conferencia Latinoamericana Sobre el Abandono en la Educación Superior, Cali, Colombia, 14–16 November 2018.
2. Gartner, L.; Gallego, C. La deserción estudiantil en la Universidad de Caldas: Sus características, factores determinantes y el impacto de las estrategias institucionales de prevención. In Proceedings of the V CLABES Quinta Conferencia Latinoamericana sobre el Abandono en la Educación Superior, Talca, Chile, 11–13 November 2015.
3. Castrillón, G. Factores que Inciden en la Deserción Estudiantil en el Programa Académico Administración de Empresas en la Universidad del Valle Sede Pacífico; Universidad del Valle: Buenaventura, Colombia, 2014.
4. Schneider, M.; Preckel, F. Variables associated with achievement in higher education: A systematic review of meta-analyses. Psychol. Bull. 2017, 143, 565–600.
5. Tinto, V.; Cullen, J. Dropout in Higher Education: A Review and Theoretical Synthesis of Recent Research; Office of Planning, Budgeting, and Evaluation of the U.S. Office of Education: New York, NY, USA, 1973.
6. Schneider, M.; Yin, L. The Hidden Costs of Community Colleges; American Institutes for Research: Washington, DC, USA, 2011; Volume 37.
7. Schneider, M. Finishing the First Lap: The Cost of First Year Student Attrition in America’s Four Year Colleges and Universities. Am. Inst. Res. 2010, 23.
8. Vossensteyn, H.; Kottmann, A.; Jongbloed, B.; Kaiser, F.; Cremonini, L.; Stensaker, B.; Hovdhaugen, E.; Wollscheid, S. Drop-Out and Completion in Higher Education in Europe—Literature Review; Publications Office of the European Union: Luxembourg, 2015.
9. National Statistical Institute of Norway. Completion Rates of Students in Higher Education—SSB. Available online: https://www.ssb.no/en/utdanning/statistikker/hugjen/aar (accessed on 10 September 2020).
10. OECD. Education at a Glance 2019: OECD Indicators; OECD Publishing: Paris, France, 2019.
11. Marta Ferreyra, M.; Avitabile, C.; Botero Álvarez, J.; Haimovich Paz, F.; Urzúa, S. The Economic Impact of Higher Education; International Bank for Reconstruction and Development/The World Bank: Washington, DC, USA, 2017.
12. Amo, C.; Santelices, M.V. Trayectorias universitarias: Más que persistencia o deserción. In Proceedings of the VII CLABES Séptima Conferencia Latinoamericana sobre el Abandono en la Educación Superior, Córdoba, Argentina, 15–17 November 2017.
13. Montoya Gutiérrez, G. Estudio factores asociados al abandono temprano de la educación superior. In Proceedings of the IV CLABES Cuarta Conferencia Latinoamericana sobre el Abandono en la Educación Superior, Medellín, Colombia, 22–24 October 2014; Volume 1.
14. Lara, H.O.; Silva, J.S.; Galeano, M.O.; Carreño, C.C.; Ariza, A.B. Estudio factores asociados a la deserción estudiantil en la universidad minuto de dios de la sede virtual y a distancia. In Proceedings of the Congr. CLABES VII CLABES Séptima Conferencia Latinoamericana sobre el Abandono en la Educación Superior, Córdoba, Argentina, 15–17 November 2017.
15. Donoso, S.; Schiefelbein, E. Análisis de los modelos explicativos de retención de estudiantes en la universidad: Una visión desde la desigualdad social. Estud. Pedagógicos 2007, XXXIII, 7–27.
16. Lugo, B. La Deserción Estudiantil: ¿Realmente Es Un Problema Social? Rev. Postgrado FACE-UC 2013, 7, 289–309.
17. Tinto, V. Research and practice of student retention: What next? J. Coll. Student Retent. Res. Theory Pract. 2006, 8, 1–19.
18. Bean, J.P. Student attrition, intentions, and confidence: Interaction effects in a path model. Res. High. Educ. 1982, 17, 291–320.
19. Chen, R.; DesJardins, S.L. Exploring the effects of financial aid on the gap in student dropout risks by income level. Res. High. Educ. 2008, 49, 1–18.
20. Chen, R.; DesJardins, S.L. Investigating the Impact of Financial Aid on Student Dropout Risks: Racial and Ethnic Differences. J. Higher Educ. 2010, 81, 179–208.
21. Braxton, J.M.; Milem, J.F.; Sullivan, A.S. The influence of active learning on the college student departure process toward a revision of Tinto’s theory. J. Higher Educ. 2000, 71, 569–590.
22. Kuh, G.D. Organizational Culture and Student Persistence: Prospects and Puzzles. J. Coll. Student Retent. Res. Theory Pract. 2001, 3, 23–39.
23. Tinto, V. Una Reconsideración de las Teorias de la Deserción Estudiantil. En Trayectoria Escolar en la Educación Superior; ANUIES-SEP: México D.F., Mexico, 1989.
24. Bernardo, A.; Esteban, M.; Fernández, E.; Cervero, A.; Tuero, E.; Solano, P. Comparison of Personal, Social and Academic Variables Related to University Drop-out and Persistence. Front. Psychol. 2016, 7, 1610.
25. Sánchez-Arévalo, M.L.; Cruz-Hueso, L.F.; Ferro-Escobar, R. Modelo de aproximación al comportamiento de la deserción voluntaria universitaria en pregrados de Ingeniería periodo 2015–2018. Ing. Solidar. 2018, 14, 1–27.
26. Gallegos, J.A.; Campos, N.A.; Canales, K.A.; González, E.N. Factores Determinantes en la Deserción Universitaria. Caso Facultad de Ciencias Económicas y Administrativas de la Universidad Católica de la Santísima Concepción (Chile). Form. Univ. 2018, 11, 11–18.
27. Sandoval, I.; Sánchez, T.; Velasteguí, V.; Naranjo, D. Factores Asociados Al Abandono En Estudiantes De Grupos Vulnerables. Caso Escuela Politécnica Nacional. In Proceedings of the Congr. CLABES Conferencia Latinoamericana sobre el Abandono en la Educación Superior, Ciudad de Panamá, Panamá, 14–16 November 2018.
28. Truta, C.; Parv, L.; Topala, I. Academic engagement and intention to drop out: Levers for sustainability in higher education. Sustainability 2018, 10, 4637.
29. Kim, D.; Kim, S. Sustainable Education: Analyzing the Determinants of University Student Dropout by Nonlinear Panel Data Models. Sustainability 2018, 10, 954.
30. Quinn, J. Drop-out and Completion in Higher Education in Europe among Students from under-Represented Groups; Network of Experts on Social Aspects of Education and Training: Plymouth, UK, 2013; 104p.
31. Álvarez, J. Etiología de un Sueño o el Abandono de la Universidad por Parte de los Estudiantes por Factores no Académicos; Universidad Autónoma de Colombia: Bogotá, Colombia, 1997.
32. Lin, T.C.; Yu, W.W.C.; Chen, Y.C. Determinants and probability prediction of college student retention: New evidence from the Probit model. Int. J. Educ. Econ. Dev. 2012, 3, 217.
33. Di Caudo, M. Política de cuotas en Ecuador: Me gané una beca para estudiar en la Universidad. Ponto-e-Vírgula. Rev. Ciências Sociais. 2015, 17, 196–218.
34. Jimenez, A.; Naranjo, D.; Sanchez, T.; Sandoval, I. Proposal of a Mathematics Pilot Program for Engineering Students from Vulnerable Groups of Escuela Politécnica Nacional. In Proceedings of the 17th LACCEI International Multi-Conference for Engineering, Education, and Technology, Montego Bay, Jamaica, 24–26 July 2019.
35. Marbouti, F.; Diefes-Dux, H.A.; Madhavan, K. Models for early prediction of at-risk students in a course using standards-based grading. Comput. Educ. 2016, 103, 1–15.
36. Baars, G.J.A.; Stijnen, T.; Splinter, T.A.W. A Model to Predict Student Failure in the First Year of the Undergraduate Medical Curriculum. Health Prof. Educ. 2017, 3, 5–14.
37. Mason, C.; Twomey, J.; Wright, D.; Whitman, L. Predicting Engineering Student Attrition Risk Using a Probabilistic Neural Network and Comparing Results with a Backpropagation Neural Network and Logistic Regression. Res. High. Educ. 2018, 59, 382–400.
38. Cao, W.; Wang, X.; Ming, Z.; Gao, J. A review on neural networks with random weights. Neurocomputing 2018, 275, 278–287.
39. Juba, B.; Le, H.S. Precision-Recall versus Accuracy and the Role of Large Data Sets. Proc. AAAI Conf. Artif. Intell. 2019, 33, 4039–4048.
40. Mayra, A.; Mauricio, D. Factors to predict dropout at the universities: A case of study in Ecuador. In Proceedings of the 2018 IEEE Global Engineering Education Conference (EDUCON), Tenerife, Spain, 17–20 April 2018; pp. 1238–1242.
41. Teoh, E.J.; Tan, K.C.; Xiang, C. Estimating the number of hidden neurons in a feedforward network using the singular value decomposition. IEEE Trans. Neural Netw. 2006, 17, 1623–1629.
42. Helal, S.; Li, J.; Liu, L.; Ebrahimie, E.; Dawson, S.; Murray, D.J.; Long, Q. Predicting academic performance by considering student heterogeneity. Knowledge-Based Syst. 2018, 161, 134–146.
43. Vandamme, J.-P.; Meskens, N.; Superby, J.-F. Predicting Academic Performance by Data Mining Methods. Educ. Econ. 2007, 15, 405–419.
44. Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 2010, 11, 2079–2107.
45. Yang, F.; Li, F.W.B. Study on student performance estimation, student progress analysis, and student potential prediction based on data mining. Comput. Educ. 2018, 123, 97–108.
46. Callejas, Z.; De, U.; Informáticas, C. Predicting computer engineering students dropout in Cuban higher education with pre-enrollment. J. Technol. Sci. Educ. 2020, 10, 241–258.
47. Vila, D.; Granda, P.; Ortega, C. Technology Trends; Botto-Tobar, M., Pizarro, G., Zúñiga-Prieto, M., D’Armas, M., Zúñiga Sánchez, M., Eds.; Communications in Computer and Information Science; Springer International Publishing: Cham, Switzerland, 2019; Volume 895, ISBN 978-3-030-05531-8.
48. Zambrano, G.; Rodríguez, K.; Guevara, L. Análisis de la Deserción Estudiantil en las Universidades del Ecuador y América Latina. Pertinencia Académica 2018, 8, 1–28.
49. González, L.E.; Uribe, D. Estimaciones sobre la “repitencia” y deserción en la educación superior chilena. Consideraciones sobre sus implicaciones. Calidad en la Educación. 2002, 17, 75–90.
50. Martínez Guerrero, J.; Campillo Labrandero, M.; Ibarra Vega, R. Desarrollo de un sistema de evaluación de prácticas para disminuir el abandono en Educación Superior. In Proceedings of the III CLABES Tercera Conferencia Latinoamericana sobre el Abandono en la Educación Superior, México D.F., México, 13–15 November 2013.
51. World Bank. The-COVID-19-Pandemic-Shocks-to-Education-and-Policy-Responses. Available online: https://openknowledge.worldbank.org/handle/10986/33696 (accessed on 5 October 2020).

Figure 1. Garson test results.

Figure 2. ROC curves of the logistic regression (orange) and neural network (green) models for dropping out of the leveling course.

Table 1. Characteristics of the study population.

Variable	Distribution
Gender	65.3% Male
	34.7% Female
Ethnicity	92.7% Mestizo	0.8% Mulatto
	3.2% Indigenous	0.2% Black
	1.2% White	0.3% Montubio
	1.3% Afro-Descendant	0.3% Other
Province of Origin	84.0% Pichincha	0.8% Carchi	0.3% Chimborazo	0.1% Santa Elena
	2.1% Tungurahua	3.4% Imbabura	0.4% Sucumbíos	0.2% Zamora Chinchipe
	1.9% Cotopaxi	0.6% El Oro	0.2% Guayas	0.1% Orellana
	2.0% Santo Domingo	0.1% Azuay	1.5% Manabí
	1.1% Esmeraldas	0.4% Bolívar	0.8% Loja
Population Segment	93.5% General Population
	5.3% Affirmative Action
	1.1% Territorial Merit
	0.1% High Performance Group
Regime	60.8% Costa
	39.2% Sierra
Dropout	72.8% Yes
	27.2% No
Leveling Course Type	76.3% Engineering, Sciences and Administrative Sciences
	23.7% Technical Degree
Application Grade	846.02 ± 81.679
Vulnerability Index	667.6287 ± 116.60856

Table 2. The coefficients and statistical significance of the logistic regression model for dropping out of the leveling course.

Coefficient	Value	p-Value
Intercept	17.1660958	<2 × 10⁻¹⁶
Application Grade	−0.0175573	<2 × 10⁻¹⁶
Vulnerability Index	−0.0002889	0.624
Regime (Sierra)	−1.8613324	<2 × 10⁻¹⁶
Leveling Course Type (Technical Degree)	0.0454707	0.815

Table 3. Residual deviations in the logistic regression model for dropping out of the leveling course.

Min	1st Q	Median	3rd Q	Max
−2.7581	−0.8236	0.3987	0.7675	2.0675

Table 4. Confusion matrix of the logistic regression model for dropping out of the leveling course.

		Predicted Value
		Yes	No	Total
Actual Value	Yes	1525	1	1526
Actual Value	No	571	0	571
	Total	2096	1	2097

Table 5. Area under the ROC curve (AUC) and Classification Accuracy (CA) as a function of the number of neurons in the hidden layer and the activation function of the neural network model for dropping out of the leveling course.

Neurons in Hidden Layer	Activation Function
	Identity		Logistic		Tanh		ReLU
	AUC	CA	AUC	CA	AUC	CA	AUC	CA
3	0.795	0.768	0.795	0.763	0.793	0.767	0.792	0.767
4	0.796	0.766	0.795	0.765	0.791	0.760	0.792	0.754
5	0.795	0.767	0.795	0.766	0.794	0.763	0.793	0.761
6	0.796	0.764	0.795	0.767	0.794	0.760	0.794	0.769
7	0.795	0.767	0.795	0.768	0.794	0.768	0.794	0.762
8	0.796	0.766	0.795	0.764	0.795	0.765	0.795	0.767

Table 6. Confusion matrix of the Artificial Neural Network (ANN) model for dropping out of the leveling course.

		Predicted Value
		Yes	No	Total
Actual Value	Yes	1397	129	1526
Actual Value	No	358	213	571
	Total	1755	342	2097

Table 7. Area under the ROC curve (AUC) and Classification Accuracy (CA) of the logistic regression and neural network models for dropping out of the leveling course.

Model	AUC	CA
Logistic Regression	0.475	0.727
ANN	0.795	0.768

Table 8. Results of the t-test for the vulnerability index and application grades on dropping out of the leveling course.

Variable	Difference of Means (Stay–Drop Out)	p-Value
Vulnerability Index	20.716	0.000
Application Grade	62.177	0.000

Table 9. Results of a test of independence of regime and type of leveling course, comparing students who did and did not drop out.

Variable	Chi-Squared	p-Value
Regime	74.986	0.000
Leveling Course	47.212	0.000

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
