Institutional Challenge Dataset—Course Materials
Figure 3 shows how difficult students find NOUN course materials to understand: 58.9% of the students find the materials difficult, while 41.1% do not.
Figure 4 shows the percentage of students who abandoned the program owing to frustration and delays in obtaining information from NOUN: 77.3% abandoned their program out of such frustration, while 22.7% abandoned it for other reasons.
Figure 5 shows that 51.7% of the students abandoned their studies because no social network platform was available for communicating with their departmental course mates, while for 48.3% this was not the reason. This suggests that social networking platforms are a popular way for students to stay connected and share information.
Figure 6 shows that 48.8% of the students abandoned their studies due to poor academic performance, while for 51.2% poor performance was not the reason.
Figure 7 indicates that 61.5% of the students who abandoned their program did so due to inadequate communication with the university, while 38.5% did not.
Figure 8 shows that 47.8% of the students failed to complete their studies at NOUN due to family challenges, while 52.2% abandoned their studies for other reasons.
Figure 9 indicates that 58.2% of NOUN students abandoned their studies for financial reasons, while 41.8% did so for other reasons.
Figure 10 indicates that 44.9% of NOUN students failed to complete their studies on health grounds, while 55.1% did so for other reasons.
Figure 11 shows a correlation matrix of the factors affecting students’ attrition, with the Pearson correlation coefficient reported for every pair of factors. The coefficient ranges from −1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation). The strongest positive correlation is between poor academic performance and lack of social networking (r = 0.14), indicating that students who reported poor academic performance were somewhat more likely to also cite the lack of a social networking platform as a reason for dropping out. Another notable positive correlation is between difficult course materials and lack of social networking (r = 0.11), suggesting that students who found the course materials difficult were also somewhat more likely to cite the absence of a university social networking platform. Both correlations are weak in absolute terms.
Figure 12 shows a correlation matrix of the reasons for attrition among the non-institutional challenges. The value in each cell is the correlation coefficient between two reasons: a positive coefficient means the two reasons tend to occur together, while a negative coefficient means that when one reason is present, the other tends to be absent. The strongest positive correlation is between financial reasons and other reasons (r = 0.012), and the strongest negative correlation is between attrition and family challenges (r = −0.0005). Both magnitudes are negligible, indicating that the non-institutional reasons for attrition are essentially uncorrelated with one another.
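The pairwise Pearson coefficients reported in Figures 11 and 12 can be computed with NumPy's `np.corrcoef`. The binary survey responses below are randomly generated stand-ins (1 = factor cited as a reason, 0 = not), not the study's actual data; the factor names are illustrative only.

```python
import numpy as np

# Hypothetical binary survey responses: 200 students, 3 attrition factors.
# These are synthetic stand-ins for the questionnaire data.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(200, 3))
factors = ["poor_performance", "no_social_network", "difficult_materials"]

# Pearson correlation matrix: each entry lies in [-1, +1],
# with the diagonal equal to 1 (each factor with itself).
corr = np.corrcoef(responses, rowvar=False)
for i, name in enumerate(factors):
    print(name, np.round(corr[i], 2))
```

For binary (0/1) variables, the Pearson coefficient coincides with the phi coefficient, so the same call applies directly to yes/no survey responses.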
The LSTM model’s accuracy during training and validation is displayed in
Figure 13. The training accuracy is the percentage of correctly classified examples in the training set, and the validation accuracy is the percentage of correctly classified examples in the validation set. The model was trained on 80% of the dataset, with the remaining 20% held out for validation. The training accuracy rises quickly as the model learns the fundamental characteristics of the data. The validation accuracy rises as well, but more slowly, because the validation set consists of examples the model has not seen during training. The model eventually plateaus with both training and validation accuracy at about 57%, meaning that it correctly classifies about 57% of the examples in each set. Both accuracies exceed 55%, indicating that the model correctly classifies over 55% of the attrition dataset in both sets. This is a positive outcome, demonstrating the model’s capacity to learn the characteristics of the data and generate predictions.
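The 80/20 split and the accuracy metric described above can be sketched as follows. The data are synthetic, and a majority-class baseline stands in for the actual LSTM, so only the split and the metric (not the model) reflect the study.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))      # hypothetical feature matrix
y = rng.integers(0, 2, size=500)   # binary attrition labels (synthetic)

# 80/20 train/validation split, as described in the text.
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
train_idx, val_idx = idx[:cut], idx[cut:]

def accuracy(y_true, y_pred):
    """Fraction of correctly classified examples."""
    return float(np.mean(y_true == y_pred))

# Majority-class baseline in place of the trained model.
majority = int(np.bincount(y[train_idx]).argmax())
train_acc = accuracy(y[train_idx], np.full(len(train_idx), majority))
val_acc = accuracy(y[val_idx], np.full(len(val_idx), majority))
print(train_acc, val_acc)
```

With roughly balanced binary labels, such a baseline sits near 50% accuracy, which is the chance level against which the reported 57% should be read.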
The LSTM model’s training and validation losses are depicted in
Figure 14. The training loss is the model’s loss on the training data, and the validation loss is its loss on the validation data. Because the model is optimized on the training data, the training loss is usually smaller than the validation loss. As the number of epochs grows in this figure, both the training loss and the validation loss decline, suggesting that the model is learning from the data and improving its predictive ability.
The CNN model’s accuracy during training and validation is displayed in
Figure 15. The training accuracy is the model’s accuracy on the training data, and the validation accuracy is its accuracy on the validation data. Because the model is fitted directly to the training data, the training accuracy is usually higher than the validation accuracy; the difference between the two is known as the generalization gap. In this study, the training accuracy is approximately 50.5% and the validation accuracy approximately 49.9%. Both figures are close to chance level, implying that the model captures little predictive signal and generalizes poorly to unseen data, possibly because of insufficient information in the features or mild overfitting.
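The generalization gap mentioned above is simply the difference between the two accuracies; using the figures quoted in the text for the CNN model:

```python
def generalization_gap(train_acc: float, val_acc: float) -> float:
    """Difference between training and validation accuracy.

    A large positive gap suggests overfitting; a gap near zero
    (as here) means the model performs similarly on both sets."""
    return train_acc - val_acc

# Accuracies reported in the text for the CNN model.
gap = generalization_gap(0.505, 0.499)
print(round(gap, 3))  # -> 0.006
```

A gap of 0.6 percentage points is negligible, which supports reading the CNN's weakness as under-learning rather than severe overfitting.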
Figure 16 shows the training and validation loss of the CNN model. The training loss is the model’s loss on the training data, whereas the validation loss is its loss on the validation data. Because the model is optimized on the training data, the training loss is usually smaller than the validation loss. In
Figure 16, both the validation loss and the training loss decline as the number of epochs increases, showing that the model is learning from the data rather than merely overfitting. The validation loss reaches its minimum at around 12 epochs; beyond this point it starts to increase, indicating that the model is beginning to overfit to the training data.
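Detecting the point where validation loss stops improving is the basis of early stopping. A minimal sketch of that logic (the loss curve below is illustrative, constructed to bottom out around epoch 12 as in Figure 16; it is not the study's data):

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the best (lowest) validation loss,
    scanning until the loss has failed to improve for `patience` epochs."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss is rising: stop training
    return best_epoch

# Illustrative curve: declines for 12 epochs, then starts rising.
losses = [0.70 - 0.002 * e for e in range(1, 13)] + [0.678, 0.680]
print(early_stop_epoch(losses))  # -> 12
```

Deep learning frameworks provide equivalent callbacks (e.g., Keras's `EarlyStopping` with `restore_best_weights=True`), which would halt training near epoch 12 in the scenario the figure depicts.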
The LSTM model’s accuracy during training and validation is displayed in
Figure 17. The training accuracy is the model’s accuracy on the training data, and the validation accuracy is its accuracy on the validation data. Because the model is fitted directly to the training data, the training accuracy is usually higher than the validation accuracy. Since the validation accuracy is computed on data the model has not seen, it provides a more reliable assessment of the model’s performance. In this study, the training accuracy is about 51% and the validation accuracy about 46%.
The LSTM model’s training and validation losses are shown in
Figure 18. The model’s loss on training data is referred to as the training loss, and the model’s loss on validation data is referred to as the validation loss. The goal of training a neural network model is to minimize the training loss and the validation loss.
Figure 18 illustrates how both the training loss and the validation loss decline as the number of epochs rises, indicating that the model continues to learn as training progresses. Because the model is optimized on the training data, the training loss is expected to remain lower than the validation loss. The percentage shown in
Figure 18 is the validation accuracy, i.e., the model’s accuracy on the validation data; it increases with the number of epochs, which means that the model is learning to classify the data correctly.
The CNN model’s accuracy during training and validation is displayed in
Figure 19. The training accuracy is the model’s accuracy on the training data, and the validation accuracy is its accuracy on the validation data. Because the validation data are unseen during training, the validation accuracy is typically lower than the training accuracy. In this study, the training accuracy is about 50.5% and the validation accuracy about 49.9%.
The training and validation loss of a CNN model is displayed in
Figure 20. The training loss is the model’s loss on the training data, and the validation loss is its loss on the validation data. Because the model is optimized on the training data, the training loss is usually smaller than the validation loss. In this study, both the training loss and the validation loss decline as the number of epochs rises, suggesting that the model is learning from the data and generalizing effectively to new data.
In
Table 4, the training and test accuracies of the LSTM and CNN models in the institutional challenge dataset are presented. The LSTM model exhibits a training accuracy of 57.29% and a test accuracy of 56.75%, indicating its ability to predict labels in both seen and unseen data. In comparison, the CNN model achieves a training accuracy of 49.91% and a test accuracy of 50.50%. Notably, the LSTM model outperforms the CNN model on both training and test datasets, showcasing its superior learning and generalization capabilities in this specific context. This agrees with [
32], who likewise found that LSTM outperformed the compared model. Consideration of dataset characteristics and further evaluation metrics can provide a more nuanced understanding of model performance.
Table 5 presents the training and test loss values for the LSTM and CNN models in the institutional challenge dataset. The LSTM model achieved a training loss of 0.6765 and a test loss of 0.6852, while the CNN model showed slightly lower loss values with a training loss of 0.6730 and a test loss of 0.6782. Lower loss values indicate reduced discrepancies between predicted and actual labels, showcasing effective error minimization by both models during training and evaluation. The marginal superiority of the CNN model in minimizing losses suggests its slightly enhanced performance in handling the dataset’s intricacies. These loss metrics contribute valuable insights to the models’ efficacy and generalization capabilities on the institutional challenge dataset.
In
Table 6, the training and test accuracies of the LSTM and CNN models in the non-institutional challenge dataset are presented. Both models exhibit identical training and test accuracy values, with the LSTM and CNN models achieving 50.98% accuracy during training and 46.19% accuracy on the test dataset. This symmetry in performance suggests that, in the context of the non-institutional challenge dataset, neither model outperforms the other, yielding similar predictive capabilities. The comparable accuracies indicate that both the LSTM and CNN models have similar effectiveness in learning patterns and making predictions on unseen data, emphasizing the importance of dataset characteristics and the suitability of model architectures in achieving satisfactory performance in this specific non-institutional challenge scenario.
In
Table 7, the training and test loss values for the LSTM and CNN models in the non-institutional challenge dataset are presented. Both models exhibit closely aligned loss metrics, with the LSTM model showing a training loss of 0.6928 and a test loss of 0.6953, while the CNN model records a training loss of 0.6929 and a test loss of 0.6951. The minimal disparity in loss values underscores the similar performance of the two models in minimizing errors during both training and evaluation on the non-institutional challenge dataset. These comparable loss metrics indicate that neither the LSTM nor CNN model demonstrates a significant advantage in error reduction, highlighting a balance in their capacity to capture patterns and make predictions on the non-institutional challenge dataset. The marginal differences in loss values emphasize the nuanced comparison of these models’ efficacy in handling the specific characteristics of the non-institutional challenge scenario.
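It is worth noting that all the loss values in Tables 5 and 7 sit near 0.693 ≈ ln 2, which is the binary cross-entropy of a model that always predicts probability 0.5. Assuming the models were trained with binary cross-entropy (the text does not state the loss function explicitly), a quick check of that baseline:

```python
import math

def binary_cross_entropy(y_true, p_pred):
    """Mean binary cross-entropy over (label, predicted-probability) pairs."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)

# A constant 0.5 prediction yields loss ln(2) ~= 0.6931 regardless of labels,
# so reported losses of 0.67-0.70 are close to this chance-level baseline.
y = [0, 1, 1, 0]
print(round(binary_cross_entropy(y, [0.5] * 4), 4))  # -> 0.6931
```

Under that assumption, both models' losses being within a few hundredths of ln 2 is consistent with the near-chance accuracies reported above.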