Article

Sustainable Smart Education Based on AI Models Incorporating Firefly Algorithm to Evaluate Further Education

1 College of Music and Dance, Guangzhou University, Guangzhou 510006, China
2 School of Civil and Transportation Engineering, Guangzhou University, Guangzhou 510006, China
3 Higher School of Advanced Digital Technologies, Peter the Great St. Petersburg Polytechnic University, St. Petersburg 195251, Russia
4 School of Civil Engineering and Architecture, Linyi University, Linyi 276000, China
* Authors to whom correspondence should be addressed.
Sustainability 2024, 16(24), 10845; https://doi.org/10.3390/su162410845
Submission received: 3 November 2024 / Revised: 3 December 2024 / Accepted: 8 December 2024 / Published: 11 December 2024

Abstract

With the popularity of higher education and the evolution of the workplace environment, graduate education has become a key choice for students planning their future career paths. This study therefore proposes to use the data processing and pattern recognition abilities of machine learning models to analyze the relevant information of graduate applicants. It explores three different models, namely, backpropagation neural networks (BPNN), random forests (RF), and logistic regression (LR), each combined with the firefly algorithm (FA). After data selection, the models were constructed and verified. By comparing the verification results of the three composite models, the model whose evaluation results were closest to the actual data was selected as the research outcome. The experimental results show that the BPNN-FA model performs best, with an R value of 0.8842 and the highest prediction accuracy. The influence of each characteristic parameter on the prediction results was also analyzed; the results show that the CGPA has the greatest influence on the evaluation results. These findings give evaluators a direction and a basis for analyzing the level of students' scientific research ability, as well as providing impetus for continuing to combine education with artificial intelligence.

1. Introduction

With the popularization of higher education and the evolution of the workplace environment, further study to improve academic qualifications has become a key choice for students to plan their future career paths. However, graduate education is not suitable for all students. Although some students perform well on academic tests, they may lack the necessary research skills to dissect scientific questions from existing research. Also, the importance of balanced communication and information dissemination across various educational levels should be noted, and gaps should be identified in themes and target groups for sustainability education [1,2]. Therefore, how to accurately assess whether students are suitable for scientific research has become an important consideration for college enrollment.
In the process of graduate student admission, the evaluator’s judgment has a decisive influence on the admission results of candidates. To accurately identify a student’s scientific potential, the admissions team needs to extract key information from the candidates’ data and rank their actual abilities. However, existing assessment methods tend to focus on academic competence rather than research competence, making assessment work dependent on extensive experience and complex analytical skills [3,4,5].
In the traditional admissions process, applicants are evaluated mainly based on test scores, letters of recommendation, and personal statements. However, the examination results only reflect the depth of academic understanding, and the recommendation letters and personal statements contain more subjective factors, making it difficult to comprehensively and objectively evaluate the scientific research potential of the applicants [6,7,8]. Therefore, the application of modern technology is expected to optimize the recruitment process of graduate students and improve the efficiency and accuracy of their selection.
In recent years, the application of diversified evaluation systems and big data analysis in the field of education has gradually increased, and enrollment work is developing in the direction of refinement, individuation, and intelligence. Although this trend has brought about positive changes, it is also accompanied by challenges such as the formulation of evaluation standards, the effectiveness of evaluation methods, and the fairness of evaluation processes [9,10]. These issues need to be further explored to ensure that research results can be translated into practical applications and provide scientific tool support for graduate students’ enrollment.
This study proposes using the data processing ability and pattern recognition ability of machine learning models to analyze the relevant feature information of graduate applicants, dig into the correlations between data, and build a graduate student ability prediction model [11,12,13]. The model aims to evaluate students’ data, predict their scientific research ability levels, and provide objective decision support for the admissions department [14].
At present, machine learning models have begun to play a role in the field of student ability assessment, and their application in graduate enrollment has broad prospects that are expected to bring revolutionary improvements to the selection process.
Zhang [15] proposed an innovative idea to build a student English learning assessment system based on BP neural network training and a K-means clustering algorithm. The system collects data by assigning different roles and permissions to people with different identities, aiming to build a prediction system with high test performance, low sensitivity, and strong stability. Almufarreh et al. [16] proposed the Quality Teaching and Evaluation Framework (QTEF) by analyzing student performance data collected from the online course systems of higher education institutions. This framework aims to ensure teachers' performance in online courses, promote reforms in the way teachers teach, and create a better learning environment for students. Soffer et al. [17] identified problems in online academic programs by investigating how differences in the time, place, and access to learning resources relate to students' course achievement. The Educational Data Mining (EDM) methodology was used to track students' curriculum behavior and analyze the relationships among these parameters, and a system was constructed to assess students' abilities and provide personalized learning support. To assess the impact of the coronavirus pandemic on the online learning quality of Pakistani students, Saleem et al. [18] surveyed students from different universities through questionnaires, using Hayes's stepwise linear regression and PROCESS Macro for data analysis. The results showed that environmental factors harmed the quality of e-learning. Chen et al. [19] argue that, in the diagnostic evaluation of metaphor learning, the subjective factors in students' learning are difficult to judge, making it hard to improve students' knowledge levels according to their circumstances. They therefore proposed a diagnostic evaluation model for English learning based on machine learning technology, building a variety of models with different bases in the process to compare the evaluation results of the different models and find the best diagnostic analysis algorithm.
In the work of existing scholars, we have observed that machine learning models exhibit remarkable capabilities in processing data in the field of education [20]. However, most studies only customize models for specific problems, limiting the computational and analytical capabilities of the models in the face of unknown or more complex feature parameters. In other words, when dealing with more complex student information, the model hyperparameters need to be retuned to ensure the realistic adaptability of the results. This process places high demands on the experience of model developers and increases the time cost of model construction. To overcome these challenges, this study proposes a method that combines search algorithms with machine learning models. The core of this method is to use the algorithm's selection mechanism to quickly search for and determine the hyperparameters of the model, ensuring that these hyperparameters can be efficiently applied to the current dataset. This strategy effectively removes the machine learning model's dependence on the manual setting of hyperparameters and significantly improves the prediction accuracy and development efficiency of the model [21].
Therefore, in the face of the changing environment and diverse data characteristics in the field of education, combining search algorithms with machine learning models can enhance the generalization ability of the model, enabling it to better adapt to different data distributions and student situations. This flexibility provides effective support in responding to emerging issues in the field of education [22,23].
The remainder of this paper is structured as follows: Section 2 describes the datasets used in detail, including data sources and features. Section 3 introduces the search algorithm selected in this study and various machine learning models combined with it, while detailing the specific parameters used to quantify and evaluate the results of model construction. In the fourth section, the experimental process is designed and elaborated. The experimental data are visually displayed through tables and graphs, and the prediction effects of different combined models are objectively analyzed. In Section 5, all of the experimental results are analyzed and discussed, the optimal training model is extracted, and the final experimental results are presented.

2. Materials

In this section, the core content of the study is discussed in depth, and a detailed basic description of the collected data and their characteristics is provided. First, the data were initially processed and analyzed to ensure a stable and complete database for the subsequent experimental part. This step is crucial to improving the predictive accuracy of the model being constructed. One of the key analytical steps is to explore correlations between different features. We will discuss in detail whether there are significant correlations between the various characteristics that represent student information, which will help us to understand which characteristics may have an important impact on students’ research potential. Through this correlation analysis, the features that are crucial to model construction can be further screened, and the necessary operating conditions can be provided for the development of the subsequent model [23,24].

2.1. Dataset

In this study, we relied on datasets collected by previous researchers to train advanced machine learning classification models [25]. The performance of the model was evaluated comprehensively based on the deviation between its predicted results and the actual dataset. The dataset used includes three key features: the individual ability and subjective evaluation of students, research background, and school reputation. A detailed description of these features is given in Table 1.
In graduate admissions, some of the seven characteristics considered can directly reflect the specific situation and performance of students, and they can be objectively evaluated by specific numbers. Other features lack objective criteria and rely on the subjective judgment of third parties for their content. Therefore, before using these data to build a model, it is necessary to conduct proper pre-processing to meet the requirements of model construction.
Specifically, for those features that have objective numerical values, such as the CGPA, GRE Score, and TOEFL Score, these numerical values can be directly used as objective indicators for the model to assess students’ learning ability. Although the two characteristics of university rating and research are supported by objective data, they do not reflect students’ abilities as directly as the first three characteristics. Meanwhile, these data are time-sensitive and need to be updated according to the actual situation. As for the SOP and LOR, these two features contain descriptive text of the student’s abilities, which cannot be directly expressed in numerical terms, so before incorporating these features into the model, text processing and feature extraction are required to convert them into a format suitable for model training.
In addition, not all students’ information can be fully recorded, or there may be errors in the recorded data. Noise and missing values in datasets can have a significant negative impact on model training results [11]. Therefore, before building the model, it is necessary to clean and preprocess the basic dataset to a certain extent, which will effectively improve the training efficiency and reliability of the model.
In the process of constructing the model dataset, it is necessary to deal with the characteristic parameters that cannot be directly used for model training. In particular, SOPs and LORs are two features that rely mainly on subjective judgment and cannot be described by intuitive data. In addition, not all applicants provide an LOR as a feature parameter, so these data need to be transformed.
In this study, the results of SOPs were determined based on the subjective ratings of reviewers. For LORs, due to the large differences in the contents of recommendation letters, we chose a binary encoding of 1 and 0 to indicate whether the applicant has this characteristic. As for the university rating feature, although worldwide ranking data exist, their values do not match the order of magnitude of the other feature parameters, so this feature was normalized to reduce the impact that excessive value differences could have on model construction.
As for the feature of research, although it can directly reflect students’ scientific research experience, not all students have the opportunity to participate in scientific research. Therefore, taking the same approach as for LORs, only 1s and 0s were used to distinguish whether students have the characteristic of scientific research experience.
On the other hand, the CGPA, GRE Score, and TOEFL Score—three data that can be used directly—may be missing during the entry process. This lack of data is detrimental to the construction of the model, so it is necessary to supplement the data manually. In this study, we decided to use the average value method to fill in the missing parts of these data to ensure the integrity and accuracy of model training. This approach is simple and effective, helping to reduce the potential negative impact on model performance due to missing data. The pre-processing steps can ensure that the data used in the training process of the model is more complete, so as to improve the accuracy of the model prediction and the generalization ability of the model [26,27].
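As a concrete illustration, the pre-processing steps above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' actual pipeline; the file name and column names (taken from the features listed in Table 1) are assumptions.

```python
import pandas as pd

# Hypothetical file and column names, following the features in Table 1.
df = pd.read_csv("graduate_admissions.csv")

# Binary-encode LOR and Research: 1 if the applicant has the characteristic.
df["LOR"] = (df["LOR"] > 0).astype(int)
df["Research"] = (df["Research"] > 0).astype(int)

# Min-max normalize University Rating so its magnitude matches other features.
rating = df["University Rating"]
df["University Rating"] = (rating - rating.min()) / (rating.max() - rating.min())

# Mean-impute missing values in the directly usable numerical features.
for col in ["CGPA", "GRE Score", "TOEFL Score"]:
    df[col] = df[col].fillna(df[col].mean())
```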

2.2. Correlation Analysis

In the initial stage of model construction, analysis of the correlation among parameters is very important. The purpose of this step is to identify and eliminate parameters that, due to their high linear correlation, may lead to increased errors during model training. When there is a high correlation between parameters, their change trends will be very synchronous, and this multicollinearity may obscure the contribution of other important features to the model, thus affecting the predictive ability of the model. As a result, the regression coefficient estimation in the model becomes unstable, increasing the variance of the model and, thus, affecting the generalization ability of the model. Therefore, through correlation analysis, it is possible to identify those features with high correlation and determine whether some features need to be removed from the model to reduce the correlation between features [28,29].
In addition, correlation analysis can help understand the relationships between different features and how they collectively affect the output of the model. In this way, it is possible to ensure that the model can capture key information in the data, improving the model’s explanatory power and prediction accuracy.
Figure 1 gives the results of the correlation analysis. As expected, each parameter has a correlation coefficient of 1 with itself. The correlation coefficient between the SOP and university rating is 0.4, which implies a moderate but not very strong linear association between the two; this relationship is not strong enough to have a significant impact on the model's prediction accuracy. On the other hand, the correlation coefficients between the research feature and the other features in the dataset are below 0.1, indicating a very low correlation. This suggests that, from a data perspective, the variation in research is not influenced by the other features, and there is no significant connection between them.
Based on the results of these correlation analyses, it can be considered that the selected feature set is appropriate when constructing the model. These features can fully capture the key information required by the model while avoiding the prediction bias that may be introduced due to the high correlation between the features [30]. Therefore, the input features adopted here are suitable for model construction and can effectively support the development and optimization of the model.
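The correlation screening described above can be reproduced with a short check of the Pearson correlation matrix. A sketch follows, reusing the preprocessed DataFrame `df` from Section 2.1; the target column name and the 0.8 cut-off are illustrative assumptions, not values from the paper.

```python
# Pearson correlation matrix over the input features only.
corr = df.drop(columns=["Chance of Admit"]).corr()

# Flag feature pairs whose absolute correlation suggests multicollinearity.
THRESHOLD = 0.8  # illustrative cut-off
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if abs(corr.loc[a, b]) > THRESHOLD:
            print(f"High correlation: {a} / {b} = {corr.loc[a, b]:.2f}")
```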

3. Methodology

In order to determine the best model for assessing students’ scientific research ability, this study will explore three different machine learning models. The first is the backpropagation neural network (BPNN), which is particularly suitable for dealing with complex nonlinear problems. The strength of the BPNN is its ability to identify the intrinsic links between multiple seemingly unrelated features and the output results, and to quickly provide predictive results after the model is successfully constructed.
Secondly, the random forest (RF) model was selected. RF has excellent processing capabilities for noise in raw data, and by integrating multiple decision trees, noise interference with prediction results can be effectively reduced. In addition, the results of the RF model have a high interpretability, which helps to analyze the specific impact of individual features on the evaluation results.
Finally, the logistic regression (LR) model is introduced. LR is usually used for binary classification problems, and it has a good ability to distinguish the data that need to be classified. Therefore, LR was chosen as the baseline for comparison with the other two models.
In addition, the firefly algorithm was combined with the above three models in this study to improve the speed and accuracy of hyperparameter adjustment. The aim of this work is to find the most suitable model for evaluating students’ scientific research ability. This will provide educators and researchers with deeper insights to optimize methods for assessing scientific competence.
In order to ensure a sufficient amount of data for model training and to verify the accuracy of the model, a stratified sampling method was adopted. Specifically, the collected dataset is ranked by outcome and divided into ten equal parts, and 70% of the data are randomly drawn from these ten parts for the training stage, in which the model learns the patterns and relationships in the data. The remaining 30% of the data are used to test the constructed model and evaluate whether it meets the requirements of the experimental design. The split proportion was determined empirically. Through this verification process, the prediction accuracy of the three composite models can be compared intuitively and their performance ranking determined accordingly, so as to select the optimal model.
In the following sections, we will elaborate on the firefly algorithm and the chosen machine learning model, and on how they work together to improve the model’s predictive power and accuracy. Through this comprehensive method, the aim is to build a model that can accurately evaluate the scientific research potential of students with high efficiency, and that provides strong data support for the enrollment decisions of higher education institutions.

3.1. Data Classification and Evaluation Methods

To maximize the use of the collected data and accurately evaluate the performance differences between different composite models, we took a phased approach to partition the datasets. Specifically, we used 70% of the dataset for the training phase to train and optimize the three models: BPNN-FA, RF-FA, and LR-FA. This step is crucial in the model-building process because it allows the model to adjust its parameters by learning the patterns and relationships in the dataset.
The remaining 30% of the data were reserved for the validation phase, which aims to evaluate the model’s ability to generalize on unseen data. This validation process is important for ensuring the predictive performance of the model, as it reflects how the model will perform in real-world applications. With this rigorous evaluation, it is possible to ensure that the selected model not only performs well on the training data but also accurately predicts trends and outcomes of new data. This data partitioning strategy helps to fully understand the performance of the model and provides solid data support for the final selection of the most suitable model [12,31].
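A sketch of this stratified 70/30 split is shown below, again assuming the `df` from Section 2.1; the outcome is binned into ten rank-based strata so that each stratum contributes proportionally to the training and test sets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

X = df.drop(columns=["Chance of Admit"])
y = df["Chance of Admit"]

# Ten equal rank-based strata over the outcome, used only to stratify the split.
strata = pd.qcut(y.rank(method="first"), q=10, labels=False)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=strata, random_state=42
)
```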

3.2. Tuning Algorithms and Machine Learning Models

3.2.1. Firefly Algorithm (FA)

During the execution of the firefly algorithm, a group of fireflies is randomly initialized in the solution space of the dataset, and the position of each firefly corresponds to a potential solution. The brightness of a firefly, i.e., the quality or fitness of its solution, is determined by the objective value at its current position. The higher the brightness, the closer the firefly's position in the solution space is to the optimal solution [32,33].
At the same time, fireflies interact with each other based on differences in brightness. Less bright fireflies are attracted to and move toward brighter ones, mimicking learning and information sharing among individuals in a colony. However, the light of the fireflies is gradually weakened by the influence of the medium during propagation, which means that the degree of mutual attraction between fireflies gradually decreases as the distance increases.
Therefore, the two core elements of the firefly algorithm are brightness and attraction. Brightness determines the quality of the firefly solution, while attraction affects the movement and search process between fireflies. By simulating this natural phenomenon, the firefly algorithm can effectively search and optimize in the solution space to find the optimal solution or approximate optimal solution. With its unique characteristics of swarm intelligence and random search, this algorithm shows remarkable potential and advantages in solving complex optimization problems [33,34].
Brightness:
$$I_i \propto \frac{1}{J(x_i)}, \quad 1 \le i \le n$$
where $I_i$ is the brightness of the $i$-th firefly, $x_i$ is the position of the $i$-th firefly, and $J(x_i)$ is the objective function value at that position.
Attraction:
$$\beta_{i,j} = \beta_0 \times e^{-\gamma r_{ij}^2}$$
where $\beta_{i,j}$ is the relative attraction between firefly $i$ and firefly $j$, $\beta_0$ is the maximum attraction (that is, the attraction at $r = 0$), $\gamma$ is the light-intensity absorption factor that attenuates the light between fireflies, and $r_{ij}$ is the distance between firefly $i$ and firefly $j$.
The calculation method is as follows:
$$r_{ij} = \left\| x_i - x_j \right\| = \sqrt{\sum_{k=1}^{d} \left( x_{ik} - x_{jk} \right)^2}$$
where $d$ is the dimension of the problem being solved.
The low-brightness firefly will move towards the high-brightness firefly after being attracted, and then update its position. The formula for its position is as follows:
$$x_i^{k+1} = x_i + \beta \left( x_j - x_i \right) + \alpha \varepsilon_i$$
where $x_i^{k+1}$ represents the updated position of firefly $i$, $\alpha \varepsilon_i$ is a random perturbation that allows the firefly to search for a better solution, $\alpha$ is the scale coefficient, and $\varepsilon_i$ is a random number drawn from a Gaussian distribution.
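Putting the four formulas together, a compact implementation of the firefly algorithm for minimization might look like the following sketch; the parameter defaults are illustrative choices, not the paper's settings.

```python
import numpy as np

def firefly_minimize(J, dim, n_fireflies=20, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.1, bounds=(0.0, 1.0)):
    rng = np.random.default_rng(0)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_fireflies, dim))  # random initial positions
    fit = np.array([J(xi) for xi in x])               # lower J means brighter

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fit[j] < fit[i]:                     # firefly j is brighter
                    r2 = np.sum((x[i] - x[j]) ** 2)     # squared distance r_ij^2
                    beta = beta0 * np.exp(-gamma * r2)  # attraction formula
                    # Position update: move toward j plus a Gaussian disturbance.
                    x[i] += beta * (x[j] - x[i]) + alpha * rng.standard_normal(dim)
                    x[i] = np.clip(x[i], lo, hi)
                    fit[i] = J(x[i])
    best = int(np.argmin(fit))
    return x[best], fit[best]

# Usage example: minimize the 2-D sphere function.
best_x, best_f = firefly_minimize(lambda v: float(np.sum(v ** 2)), dim=2)
```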

3.2.2. Backpropagation Neural Network (BPNN)

A BPNN is a multi-layer feedforward neural network prediction model trained by an error backpropagation algorithm. By analyzing a large number of input and output data pairs, the model mines the internal mapping relationships between the data and establishes a complex nonlinear mapping function, as shown in Figure 2.
The structure of the BPNN model is composed of three levels: the input layer, the hidden layer, and the output layer. The most critical is the hidden layer, which performs the complex nonlinear transformation from input to output and captures the inherent patterns of the data by adjusting its internal weights and biases. The output layer produces the network's response to the input data [35].
In the training process, the model calculates the predicted output through the forward propagation algorithm and calculates the error between the predicted value and the actual value. The error is then passed back to the network through a backpropagation mechanism to adjust the weight and bias of the hidden layer. By iterating through this process, the parameters of the hidden layer are optimized until the prediction error of the network output is reduced to an acceptable range. This means that the network has learned the mapping between the input data and the output data, which can be used to make predictions about new input data.
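As a rough sketch of such a network, scikit-learn's MLPRegressor can stand in for the BPNN. The single hidden layer with six neurons, the learning rate, and the momentum below are the FA-tuned values reported later in Section 4.1; the remaining settings are assumptions and may need adjusting for convergence.

```python
from sklearn.neural_network import MLPRegressor

# One hidden layer with six neurons; SGD training with error backpropagation.
bpnn = MLPRegressor(hidden_layer_sizes=(6,), activation="logistic",
                    solver="sgd", learning_rate_init=2.3e-5,
                    momentum=0.0027, max_iter=5000, random_state=0)
bpnn.fit(X_train, y_train)        # forward pass + backpropagated error updates
y_pred_bpnn = bpnn.predict(X_test)
```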

3.2.3. Random Forest (RF)

The random forest algorithm is a machine learning technique based on the bagging ensemble method. Each decision tree is a base classifier in the random forest model. Bootstrap random sampling is used to extract a certain amount of data from the dataset as a discriminant sample, and a corresponding sub-training set and test set are generated for each tree, from which the tree's result is obtained. The results of all decision trees are then aggregated to become the final output of the random forest model [36,37].
The computational steps of the RF model are outlined as follows:
(1) Using the bootstrap sampling method, randomly select n samples from the sample set. (2) Randomly extract k attributes from all attributes and establish a decision tree, using the optimal partitioning attribute as the node. (3) Repeat these steps to establish m decision trees, generating a random forest consisting of m decision trees. The data classification is determined through mode voting, and the mode voting formula is as follows:
$$H(x) = \arg\max_{Y} \sum_{i=1}^{k} I\left( h_i(x) = Y \right)$$
where $H(x)$ represents the final voting result of the RF model, $h_i(x)$ represents the prediction of the $i$-th decision tree, $I(\cdot)$ is the indicator function, and $Y$ represents the candidate output class.
In the general flow of the random forest algorithm, the final result depends on the maximum number of categories. Because the results of a single decision tree are quite sensitive to the training set, multiple decision trees are integrated into a random forest model, which effectively reduces the correlation problem among multiple decision trees by increasing the number of samples [37,38].
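A minimal RF sketch using scikit-learn is given below, with the FA-tuned values reported later in Section 4.1 (97 trees, at least one sample per leaf). Note that in the regression form used here the trees' outputs are averaged, whereas the mode-voting formula above applies to classification.

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=97, min_samples_leaf=1, random_state=0)
rf.fit(X_train, y_train)        # each tree is grown on a bootstrap sample
y_pred_rf = rf.predict(X_test)  # regression averages the individual trees
```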

3.2.4. Logistic Regression (LR)

The logistic regression model is a powerful statistical tool; its core function is to use one or more independent variables (explanatory variables) to predict the probability of a particular event. This model not only evaluates the probability of each event but also classifies the events according to the predicted probability. In binary classification problems, logistic regression converts the output of a linear model to a probability value through the sigmoid function, which represents the confidence of the input features corresponding to the positive class label [39,40]. In general, a threshold is set (such as 0.5); when the probability value is above this threshold, the event is classified as a positive class; otherwise, it is classified as a negative class.
$$P(y = 1 \mid x) = \frac{1}{1 + e^{-\left( \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n \right)}}$$
where $P(y = 1 \mid x)$ represents the probability that the target variable $y$ equals 1 given input $x$, and $\theta_0, \theta_1, \ldots, \theta_n$ are the model parameters.
In addition, as can be seen from the sigmoid function, another significant advantage of LR is its interpretability. Each parameter in the model represents the degree of influence of the corresponding independent variable on the probability of event occurrence, which makes the prediction process and results of the model easy to understand, thus making it easier to gain the trust of users in practical applications [41].
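The sigmoid mapping and 0.5 threshold described above translate directly into code; the parameter values in the usage line below are arbitrary illustrations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_predict(theta, x, threshold=0.5):
    # theta[0] is the intercept; theta[1:] weight the features x_1..x_n.
    p = sigmoid(theta[0] + np.dot(theta[1:], x))
    return p, int(p >= threshold)

p, label = lr_predict(np.array([0.1, 0.5, -0.3]), np.array([1.2, 0.7]))
```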

3.2.5. Tenfold Cross-Validation

During the 10-fold cross-validation test, the dataset is randomly divided into ten equal subsets. In each round of validation, one of the subsets is selected as the test set, while the remaining nine subsets are combined as the training set. The training results of the models are tested, and the process is repeated ten times. The performance of the model is evaluated by calculating the average of ten test results. In this way, we can make full use of the sample data, each of which is used in the training and testing phases of the model. In addition, this can effectively reduce the contingency of prediction results and improve the stability of model evaluation through multiple verifications.
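With scikit-learn, this procedure reduces to a few lines. The sketch below reuses the data and the fitted RF setup from the previous sections and scores each fold by RMSE.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(rf, X, y, cv=cv,
                         scoring="neg_root_mean_squared_error")
print("Per-fold RMSE:", -scores)
print("Mean RMSE over 10 folds:", -scores.mean())
```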

3.3. Evaluation Index

To show the prediction effect of the model directly, various statistical parameters were used to analyze the prediction results of the model. To effectively compare the gap between model training results and actual data, root-mean-square error (RMSE) was used as a measurement index in this study.
$$RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( p_{y_i} - m_{y_i} \right)^2}$$
where $m$ is the size of the test set, $p_{y_i}$ is the $i$-th predicted value, and $m_{y_i}$ is the corresponding actual value.
The correlation coefficient, R, measures the strength of the linear relationship between the predicted value and the actual value. When R is close to 1, it means that there is a strong correlation between the predicted value and the actual value; that is, the model has a high prediction accuracy.
$$R = \frac{\sum \left( x_i - \bar{x} \right) \left( y_i - \bar{y} \right)}{\sqrt{\sum \left( x_i - \bar{x} \right)^2 \sum \left( y_i - \bar{y} \right)^2}}$$
where $x_i$ and $y_i$ are the corresponding predicted and actual values, respectively, and $\bar{x}$ and $\bar{y}$ are their respective averages.
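Both indices translate directly into code; a minimal sketch assuming NumPy arrays of actual and predicted values follows.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-square error between predictions and actual values.
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def corr_r(y_true, y_pred):
    # Pearson correlation coefficient between predictions and actual values.
    xm, ym = y_pred - y_pred.mean(), y_true - y_true.mean()
    return float(np.sum(xm * ym) / np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2)))
```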

4. Results and Discussion

4.1. Hyperparameter Adjustment

Manual setting of hyperparameters during model building is a challenging task that not only tests the experience and ability of the engineer but also can take a lot of time for engineers unfamiliar with the data to find the right combination of hyperparameters. Once the dataset needs to be replaced, the process has to be repeated, resulting in a significant increase in uptime and cost. To solve this problem, this study combines a search algorithm with a machine learning model, using the search algorithm to automatically adjust the hyperparameters of the model to quickly find the most suitable hyperparameter configuration for the current dataset, thereby improving work efficiency.
The hyperparameters of the RF model were optimized by FA, the number of trees in the forest was determined to be 97, and the lower limit of the sample number of leaf nodes of each tree was set to 1.
For the BPNN, the results of structural optimization showed that the configuration of one hidden layer, with six neurons per layer, was the most efficient. The learning rate was set to 2.3 × 10−5, and the momentum factor was set to 0.0027.
The FA was used to search and test the hyperparameters of different models, and the most suitable value of the hyperparameter was found. After multiple model iterations, RMSE values were used to visually express the results of the hyperparameter adjustments. The specific optimization results are shown below. This approach not only improves the performance of the model but also reduces the time and resources required for model tuning, providing a solid foundation for subsequent model deployment and application.
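To make the coupling concrete, the sketch below shows one way the FA can drive the search: each firefly position encodes a candidate hyperparameter vector, and its brightness is the validation RMSE of a model trained with those values. It reuses firefly_minimize (Section 3.2.1) and rmse (Section 3.3); the 1 to 200 tree range and the use of the held-out split as a validation set are simplifying assumptions.

```python
from sklearn.ensemble import RandomForestRegressor

def rf_objective(pos):
    # Map the firefly position in [0, 1] to an integer tree count in 1..200.
    n_trees = max(1, int(round(pos[0] * 200)))
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)
    return rmse(y_test.to_numpy(), model.predict(X_test))  # lower = brighter

best_pos, best_rmse = firefly_minimize(rf_objective, dim=1, n_iter=15)
```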
Figure 3 gives the results of the hyperparameter tuning; its panels show how the gap between the predicted results and the actual values of the three different models narrows as the number of algorithm-assisted iterations increases. It can be observed from Figure 3a,b that, as the number of iterations increases, the RMSE decreases in a downward, stepped trend. The RMSE values reached their lowest points at the 15th and 14th iterations, respectively, and stabilized thereafter, with the values of the BPNN model and RF model decreasing to 0.037 and 0.045, respectively. This shows that, after about 15 iterations, the algorithm successfully found the most suitable combinations of hyperparameters for the BPNN and RF models to process the student-related data.
In the issue of postgraduate admission, it is crucial to accurately analyze the ability and potential of students, which helps to judge whether students are suitable for scientific research. Therefore, it is necessary to build an accurate model to evaluate students’ academic ability in order to make the most appropriate decision. In this section, we use the algorithm to search and adjust the hyperparameters in the process of model construction, which effectively improves the accuracy of the model in analyzing and predicting students’ academic ability and provides a precise basis for the specific construction of the subsequent model. Through this approach, we can ensure that the model can provide more accurate and reliable predictions when processing complex datasets, thus providing strong data support for higher education institutions’ admissions decisions.

4.2. Building Machine Learning Models

After using the algorithm to adjust the hyperparameters of the model, the collected data were used to build the graduate research ability evaluation model. The construction of the model was divided into two parts, namely, the training stage and the testing stage. In the training stage, the model deeply analyzes the training dataset to dig out the intrinsic relationships and potential patterns between students’ relevant information and scientific research ability. These relationships and patterns form the core of the assessment model, which enables the model to identify key factors that affect students’ ability to research. The model was then tested using validation datasets, and the model’s predictions were compared with the actual data in detail. This comparison process can evaluate the prediction accuracy of different models and judge their performance in practical applications. Comparing the prediction results of the three models helps to understand the performance of the different models in practical applications and select the most suitable evaluation model. The specific training results are shown below.
Figure 4 shows the results of the three composite models run on the training set and the test set. These charts show that the RF-FA's data points lie closest to the ideal 1:1 line, indicating the highest agreement between the RF model's predictions and the actual values. In contrast, the distribution of the LR-FA's prediction points is relatively scattered, which indicates that the LR model was not able to effectively capture the underlying relationships between students' characteristic information and scientific research ability.
The BPNN and RF models optimized by FA predicted values that were closer to the actual situation of the students. This shows that the BPNN and RF models can more accurately assess students’ scientific research ability than the unoptimized LR models. Therefore, these two models are more suitable for assisting assessors in analyzing and screening graduate applicants.
These results highlight the importance of using advanced optimization algorithms to adjust model hyperparameters, as well as the need to consider predictive performance when selecting models. With these optimized models, evaluators can more reliably predict students’ research potential and make more informed admissions decisions.
It is clear from Table 2 that the FA-optimized RF model exhibits the lowest RMSE value and the highest R value. This result shows that the RF model, with the help of the FA, successfully finds hyperparameters that are highly matched to the student feature data, thus effectively identifying the associations between the feature parameters and the actual results. Therefore, the predicted results of the RF-FA model have the highest agreement with the actual data. The results shown in this table are consistent with the previous observation in Figure 3 that RF models with carefully tuned hyperparameters perform best in subsequent predictions. In contrast, the LR model has a large deviation from the actual value.
To show the prediction results of the three models more intuitively, the prediction data of the three composite models are summarized in the same bar chart, so that the data distribution range of each model can be clearly compared. This visualization method will help to more intuitively understand the performance of each model, as well as their accuracy and reliability in predicting students’ scientific research ability. Through this comparison, we can further confirm which model is most suitable for evaluating the scientific research potential of graduate applicants, thus providing strong data support for admissions decisions.
The performance of the three composite models can be visually compared in Figure 5. Among these models, the LR model's correlation coefficient R is mainly distributed in the range of 0.82 to 0.88, which indicates that the LR model has a relatively low correlation between the predicted and actual values, and its prediction accuracy is lower than that of the other two models. In contrast, the RF model has an R value closer to 1, distributed in the range of 0.96 to 1, showing an extremely high correlation with the actual values, that is, the highest predictive correlation. The performance of the BPNN model was in between, overlapping with the data of the other two models, but it did not show better predictions.
The RMSE numerical diagram provides a clearer picture of the performance differences between the three models. Although all models showed the same trend, the test set performed slightly worse than the training set. This may be because the test set contained data that the model had not seen during the training process, resulting in a certain bias in the model’s fitting to the test data. However, from the results, the RF model and BPNN model show high prediction accuracy in predicting the relationships between students’ information-related features and students’ scientific research ability.
Table 3 gives the results for multiple evaluation parameters computed during the machine learning process. The mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) can be used to evaluate the prediction accuracy, robustness, and sensitivity of the evaluation model: MSE is sensitive to outliers, MAE is more robust and less sensitive to large errors, and MAPE takes into account the proportional relationship between the prediction error and the true value. As can be seen from the table, RF shows the best results, indicating higher accuracy, higher robustness, and lower sensitivity in predicting students' admission outcomes. The median prediction error, which calculates the median of the absolute error between the predicted and true values, is a robust measure of regression performance that is little affected by outliers; when outliers are present in the data, it may therefore be a better indicator of predictive performance than MSE or MAE. As the table shows, after excluding outliers, the RF model still exhibits the highest accuracy among the three models. The difference between the standard deviations (SD) of the predicted and true values is usually not a direct indicator of model performance; rather, it reflects the degree of uncertainty or dispersion of the model's predictions. The table shows that RF has a small degree of dispersion, reflecting its stability in predicting students' continuing education.
Overall, based on the training results alone, the RF model has the best performance in the correlation analysis and the lowest RMSE values. However, a comparison with the test set data suggests that the RF model may be overfitting to a certain degree: its training set results are very close to the actual values, while its test set results are not as strong. Therefore, to identify the most advantageous of these three composite models, i.e., the one that most accurately captures and predicts the characteristics relevant to students' scientific research ability, the data are analyzed in two further ways in the following work.

4.3. Tenfold Cross-Validation Results

According to the previous test results, it was found that the three composite models constructed in this research have different degrees of fitting effects on the relevant feature information of students. However, these predictions only apply to the currently collected dataset, and the accuracy of the model’s assessment may vary for new data that have not yet been seen. To evaluate the generalization ability of the model, the tenfold cross-validation method was used to evaluate the model.
Tenfold cross-validation works by evenly dividing the existing dataset into ten pieces, selecting one piece each time as the validation set and the remaining nine pieces as the training set. This process is repeated ten times, each time selecting a different piece of data for the validation set, ensuring that each data point has a chance to be used for validation. After ten iterations of training and verification, the difference between the predicted result and the actual value is used as the evaluation index, so that the generalization ability of the model can be evaluated intuitively. The specific verification results are shown in the following figure.
As can be seen from the tenfold cross-validation results shown in Figure 6, the BPNN model and RF model show the lowest RMSE values in the first fold, at 0.07 and 0.045, respectively, while the LR model has a relatively high RMSE value of 0.067. This shows that the BPNN model and RF model have advantages in accuracy and generalization ability, and that they can maintain relatively accurate prediction results on different datasets.
It is worth noting that, although the BPNN model's results were slightly worse than the RF model's in the earlier training results, in the tenfold cross-validation the BPNN model had a better prediction effect than the RF model, showing a lower RMSE value and higher prediction accuracy. This finding is significant for the evaluation of graduate admission results, because accurate predictions can help staff to evaluate students' scientific research ability more reasonably and thus make more informed admissions decisions.
In summary, the BPNN model has the best performance in the tenfold cross-validation, which not only proves its effectiveness in predicting students’ scientific research ability but also provides a reliable evaluation tool for graduate students’ enrollment. With this precise predictive approach, admissions teams can more confidently screen out students with research potential, thereby improving the quality and efficiency of the admissions process.

4.4. Monte Carlo Simulation

To further verify the accuracy and generalization ability of the model, the Monte Carlo simulation method was used to detect the data of the three composite models. Monte Carlo simulation is a statistical method based on random sampling that approximates the distribution of random variables by generating large amounts of simulated data and repeating the experiment many times to improve the accuracy of the results. Through Monte Carlo simulation, multiple sets of unknown data are generated to simulate the performance of the model in different situations. The specific results are shown below.
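A simple sketch of such a simulation is to repeat the random split/train/evaluate cycle with different seeds and track how R behaves, reusing corr_r from Section 3.3; the 100 repetitions are an illustrative choice, not the paper's setting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

r_values = []
for seed in range(100):  # 100 illustrative Monte Carlo repetitions
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7,
                                              random_state=seed)
    model = RandomForestRegressor(n_estimators=97, random_state=seed)
    model.fit(X_tr, y_tr)
    r_values.append(corr_r(y_te.to_numpy(), model.predict(X_te)))

print("Mean R:", np.mean(r_values), "SD:", np.std(r_values))
```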
Figure 7 shows the performance of the three models in Monte Carlo simulation. As the number of simulation iterations increases, the agreement between the predicted results of each model and the actual data gradually increases. Specifically, the RF model consistently maintained a high R value near 1 on the training set, showing a high degree of agreement between its predicted results and the actual results, and this state remained stable over multiple simulations.
However, for the test set, the performance of the RF model is similar to that of the other models, showing large fluctuations in the initial stage, and gradually stabilizing the R value as the number of simulations increases. It is worth noting that although the RF model performed well on the training set, it did not show better performance than the BPNN and LR models on the test set. This may suggest that the RF model overfits during training; that is, the model performs too well on the training data to capture the noise and details in the data, rather than the underlying data patterns, resulting in a decline in its ability to generalize on previously unseen test data.
Therefore, although RF models can achieve low error on the training set, their performance on the test set fails to meet expectations, emphasizing the need to balance model complexity and generalization ability when selecting models and hyperparameter tuning to ensure good predictive performance on new data.
From the perspective of RMSE as a performance indicator, it can be observed from Figure 8 that the RMSE values of the three models tended to stabilize as the number of Monte Carlo simulations increased. This shows that, as the simulation progressed, the prediction accuracy of all models gradually improved and finally stabilized at a relatively consistent level.
However, the RF model performed particularly well on the RMSE on the training set, and its value was significantly lower than that of the other two models, showing a very high degree of fit on the training data. This superior performance on the training set may imply that the model learned specific patterns in the data during the training phase, including underlying noise and details, rather than just the underlying distribution. This phenomenon is also reflected in the test set: although the RF model’s RMSE remains the lowest, the gap with the other models is not significant, which may mean that the RF model’s generalization ability on the test set is not as strong as expected.
Therefore, although the RF model showed excellent performance on the training set, it did not significantly outperform the other models on the test set, which may point to an overfitting problem. Overfitting occurs when the model overlearns the training data to the point that it ignores the general laws of the data, resulting in poor performance of the model on the new data. Therefore, when selecting the final model, it is important not only to look at its performance on the training set but also to consider its performance on previously unseen data, to avoid choosing overfitted models that only perform well on the training set.
The output results of the three models were plotted on the same axes for comparison. In Figure 9, the BPNN model and the RF model occupy similar positions, meaning that the prediction effects of the two models are similar; the RF model is closer to the actual values, and the LR model deviates the most.
However, combining this with the previous Monte Carlo simulation results and the model prediction scatter plots shows that, although the RF results are nominally the best, the RF model can overfit during the training phase. It can therefore be concluded that the BPNN has the best predictive effect in analyzing students' characteristic information and evaluating their research ability.

4.5. Importance and Sensitivity Analysis of Input Variables

In the previous work, the prediction results of the model were analyzed, and the most suitable model for the analysis of students’ research ability was found. To further analyze the influence of each input feature on the model prediction results, the importance analysis of each input and output feature was carried out. The degree of influence of different characteristics on the strength of students’ research ability is expressed by intuitive numerical values. This provides information guidance for school evaluators and points out the influence of various parameters, allowing the evaluator to focus on a few key data to make a more accurate judgment.
Figure 10 reveals the degree of influence of the different characteristics on the assessment model of students' research ability. Among them, the CGPA has the highest importance score (3.2542), significantly higher than the other characteristic parameters. This result emphasizes that the CGPA is a key indicator of students' academic achievement: its value directly reflects students' learning ability and knowledge mastery, making it the most important reference factor in evaluating students' research potential.
The next most important features are the GRE Score and SOP. While the GRE Score shows how students perform on standardized tests, the SOP provides a platform for students to present themselves and articulate their research interests and career goals. These two materials provide direct, specific information to the evaluator; therefore, they score relatively high in importance in the evaluation model.
In contrast, the importance scores of the other feature parameters are relatively low, indicating that these factors play a limited role in the model and can be regarded as auxiliary reference information. In particular, the TOEFL Score, with an importance score of only 0.1769, may suggest that language ability, while important, is not a decisive factor in the assessment of research ability.
Taken together, these findings provide valuable insights for college evaluators. More attention should be paid to the three key characteristics of CGPA, GRE Score, and SOP when evaluating graduate applications. Other characteristics, although also taken into account, have relatively little impact on the evaluation results and can be used as secondary factors. This data-driven approach can help admissions teams to more efficiently and precisely screen out students with scientific potential.
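For reference, importances like those in Figure 10 can be read directly off a fitted random forest. The sketch below prints illustrative scores from the rf model of Section 3.2.3; it does not reproduce the paper's reported values, which come from the authors' own importance analysis.

```python
# Rank the input features by the fitted model's importance scores.
for name, score in sorted(zip(X_train.columns, rf.feature_importances_),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.4f}")
```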

5. Conclusions

In the context of the intelligent reform of the education system, this study focuses on the challenges of graduate admissions. A detailed analysis of applicants' data using machine learning models was designed to assess their potential research abilities, allowing students to be ranked accordingly and helping school evaluators identify talents suitable for research work. Despite the increasing application of machine learning technology in the field of education, existing models are still inadequate in terms of generalization ability. Therefore, an innovative method combining the firefly algorithm with machine learning models is proposed in this study. Through the fast search capability of the FA, the machine learning model can find the hyperparameters most suitable for the current dataset, significantly improving the prediction accuracy and efficiency of the model.
After an in-depth analysis of the data, the following conclusions can be drawn:
  • The firefly algorithm is excellent at optimizing machine learning models. In particular, under the hyperparameter adjustment of the FA, the RMSE values of the BPNN and RF models decreased steadily with the increase in the number of iterations, finally reaching 0.037 and 0.045, respectively. The predicted results were very close to the actual values.
  • The RF model showed the highest correlation coefficient (R value, 0.9787) and the lowest RMSE value (0.0295) in the training phase, significantly better than the BPNN and LR models. However, on the test set, the RF model's R value dropped to 0.8825, similar to the results for the BPNN (0.8712) and LR (0.8619) models, suggesting that the RF model may have overfitted the original dataset during the training phase.
  • Through Monte Carlo simulation, we further verified the generalization ability and operation accuracy of the model. The simulation results confirm the overfitting phenomenon of the RF model in the training stage from another angle.
  • The importance analysis of input features shows that the CGPA has the greatest impact on the evaluation results of students’ scientific research ability, and its importance score is 3.2542. In contrast, the TOEFL Score is less influential in the evaluation process, with an importance score of 0.1769.
Although this study found the most suitable model for the evaluation of students’ scientific research ability among the three selected machine learning models, this does not mean that this model is the optimal solution. This study aims to provide a new research direction for the development of information technology in the field of education. However, there are still many limitations in this model. Due to the small number of features contained in the selected dataset, it is unable to fully display all of the factors that affect the enrollment of graduate students, which may lead to errors in the evaluation results of the model. Secondly, only three models and one algorithm are discussed in this study, which lacks extensive comparison. Therefore, in future work, we will explore the influence of more features on the results and analyze more models, so as to promote the informatization process in the field of education and provide more scientific and accurate technical support for educational decision-making.

Author Contributions

Conceptualization, E.L., Z.W. and J.L.; methodology, E.L., Z.W. and J.L.; software, J.H.; validation, E.L., Z.W. and J.L.; formal analysis, E.L. and Z.W.; investigation, E.L., Z.W. and J.L.; writing—original draft preparation, E.L., Z.W. and J.L.; writing—review and editing, E.L., Z.W., J.L. and J.H.; supervision, J.L. and J.H.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the 2024 Humanities and Social Sciences Youth Fund Project of the Ministry of Education of China (24YJC760131), 2024 Higher Education Research Project of the Guangzhou Municipal Education Bureau (2024312216), Ministry of Science and Higher Education of the Russian Federation within the framework of the state assignment No. 075-03-2022-010 dated 14 January 2022 and No. 075-01568-23-04 dated 28 March 2023 (Additional agreement 075-03-2022-010/10 dated 9 November 2022, Additional agreement 075-03-2023-004/4 dated 22 May 2023), FSEG-2022-0010.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Anne McDonagh, S.; Caforio, A.; Pollini, A. The European Green Deal in Education; Routledge: Oxfordshire, UK, 2024. [Google Scholar]
  2. Tomassi, A.; Falegnami, A.; Meleo, L.; Romano, E. The GreenSCENT Competence Frameworks. In The European Green Deal in Education; Routledge: Oxfordshire, UK, 2024; pp. 25–44. [Google Scholar]
  3. Daucourt, M.C.; Napoli, A.R.; Quinn, J.M.; Wood, S.G.; Hart, S.A. The Home Math Environment and Math Achievement: A Meta-Analysis. Psychol. Bull. 2021, 147, 565–596. [Google Scholar] [CrossRef] [PubMed]
  4. Urhahne, D.; Wijnia, L. A Review on the Accuracy of Teacher Judgments. Educ. Res. Rev. 2021, 32, 100374. [Google Scholar] [CrossRef]
  5. Feraco, T.; Resnati, D.; Fregonese, D.; Spoto, A.; Meneghetti, C. An Integrated Model of School Students’ Academic Achievement and Life Satisfaction. Linking Soft Skills, Extracurricular Activities, Self-Regulated Learning, Motivation, and Emotions. Eur. J. Psychol. Educ. 2023, 38, 109–130. [Google Scholar] [CrossRef]
  6. Steenbergen-Hu, S.; Makel, M.C.; Olszewski-Kubilius, P. What One Hundred Years of Research Says About the Effects of Ability Grouping and Acceleration on K-12 Students’ Academic Achievement: Findings of Two Second-Order Meta-Analyses. Rev. Educ. Res. 2016, 86, 849–899. [Google Scholar] [CrossRef]
  7. Wu, P.-H.; Wu, H.-K.; Hsu, Y.-S. Establishing the Criterion-Related, Construct, and Content Validities of a Simulation-Based Assessment of Inquiry Abilities. Int. J. Sci. Educ. 2014, 36, 1630–1650. [Google Scholar] [CrossRef]
  8. Shan, L. The Study of Test Method in Authentic Assessment of English Reading Ability. In Proceedings of the 2nd International Conference on Education, Management and Social Science, Shanghai, China, 21–22 August 2014; Volume 6, pp. 444–446. [Google Scholar]
  9. Liu, Y.; Han, C.; Huang, L.; Wang, B.; Zhu, Z. IEEE Research and Development of Student Assessment System Based on Knowledge, Ability and Mentality. In Proceedings of the 2012 7th International Conference on Computer Science & Education, Melbourne, Australia, 14–17 July 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1829–1832. [Google Scholar]
  10. Wu, D. A Brief Analysis on How Formative Assessment Helps to Develop Students’ Ability of English Autonomous Learning. In Proceedings of the 2012 2nd International Conference On Applied Social Science (ICASS 2012), Kuala Lumpur, Malaysia, 1–2 February 2012; Volume 2, pp. 273–276. [Google Scholar]
  11. Wang, S.; Wang, H.T.; Lu, Y.J.; Huang, J.D. Toward a Comprehensive Evaluation of Student Knowledge Assessment for Art Education: A Hybrid Approach by Data Mining and Machine Learning. Appl. Sci. 2024, 14, 5020. [Google Scholar] [CrossRef]
  12. Zhou, J.; Lu, Y.J.; Tian, Q.; Liu, H.C.; Hasanipanah, M.; Huang, J.D. Advanced Machine Learning Methods for Prediction of Blast-Induced Flyrock Using Hybrid SVR Methods. CMES-Comput. Model. Eng. Sci. 2024, 140, 1595–1617. [Google Scholar] [CrossRef]
  13. Zhu, F.; Wu, X.P.; Lu, Y.J.; Huang, J.D. Understanding Penetration Attenuation of Permeable Concrete: A Hybrid Artificial Intelligence Technique Based on Particle Swarm Optimization. Buildings 2024, 14, 1173. [Google Scholar] [CrossRef]
  14. Bai, Y. A Study on Intercultural Operational Ability and Assessment of Students of International Trade Major. In Proceedings of the 2012 International Conference on Financial, Management and Education Science (ICFMES 2012), Beijing, China, 19–20 May 2012; pp. 432–437. [Google Scholar]
  15. Zhang, T.Y. Design of English Learning Effectiveness Evaluation System Based on K-Means Clustering Algorithm. Mob. Inf. Syst. 2021, 2021, 5937742. [Google Scholar] [CrossRef]
  16. Almufarreh, A.; Noaman, K.M.; Saeed, M.N. Academic Teaching Quality Framework and Performance Evaluation Using Machine Learning. Appl. Sci. 2023, 13, 3121. [Google Scholar] [CrossRef]
  17. Soffer, T.; Kahan, T.; Nachmias, R. Patterns of Student’s Utilization of Flexibility in Online Academic Courses and Their Relation to Course Achievement. Int. Rev. Res. Open Distrib. Learn. 2019, 20, 202–220. [Google Scholar]
  18. Saleem, F.; AlNasrallah, W.; Malik, M.I.; Rehman, S.U. Factors Affecting the Quality of Online Learning During COVID-19: Evidence from a Developing Economy. Front. Educ. 2022, 7, 847571. [Google Scholar] [CrossRef]
  19. Chen, Y.Y.; Wang, X.; Du, X.H. Diagnostic Evaluation Model of English Learning Based on Machine Learning. J. Intell. Fuzzy Syst. 2021, 40, 2169–2179. [Google Scholar] [CrossRef]
  20. Gentrup, S.; Lorenz, G.; Kristen, C.; Kogan, I. Self-Fulfilling Prophecies in the Classroom: Teacher Expectations, Teacher Feedback and Student Achievement. Learn. Instr. 2020, 66, 101296. [Google Scholar] [CrossRef]
  21. Ji, Z.; Zhou, M.; Wang, Q.; Huang, J. Predicting the International Roughness Index of JPCP and CRCP Rigid Pavement: A Random Forest (RF) Model Hybridized with Modified Beetle Antennae Search (MBAS) for Higher Accuracy. Comput. Model. Eng. Sci. 2024, 139, 1557–1582. [Google Scholar] [CrossRef]
  22. Bashir, T.; Haoyong, C.; Tahir, M.F.; Liqiang, Z. Short Term Electricity Load Forecasting Using Hybrid Prophet-LSTM Model Optimized by BPNN. Energy Rep. 2022, 8, 1678–1686. [Google Scholar] [CrossRef]
  23. Zhou, J.; Su, Z.; Hosseini, S.; Tian, Q.; Lu, Y.; Luo, H.; Xu, X.; Chen, C.; Huang, J. Decision Tree Models for the Estimation of Geo-Polymer Concrete Compressive Strength. Math. Biosci. Eng. 2024, 21, 1413–1444. [Google Scholar] [CrossRef]
  24. Liang, X.M.; Yu, X.; Jin, Y.; Huang, J.D. Compactness Prediction of Asphalt Concrete Using Ground-Penetrating Radar: A Comparative Study. Constr. Build. Mater. 2022, 361, 129588. [Google Scholar] [CrossRef]
  25. Acharya, M.S.; Armaan, A.; Antony, A.S. A Comparison of Regression Models for Prediction of Graduate Admissions. In Proceedings of the 2019 Second International Conference on Computational Intelligence in Data Science (ICCIDS 2019), Chennai, India, 21–23 February 2019; IEEE: Piscataway, NJ, USA, 2019. [Google Scholar]
  26. Yue, Y.; Cao, L.; Lu, D.; Hu, Z.; Xu, M.; Wang, S.; Li, B.; Ding, H. Review and Empirical Analysis of Sparrow Search Algorithm. Artif. Intell. Rev. 2023, 56, 10867–10919. [Google Scholar] [CrossRef]
  27. Yao, Z.; Wang, Z.; Wang, D.; Wu, J.; Chen, L. An Ensemble CNN-LSTM and GRU Adaptive Weighting Model Based Improved Sparrow Search Algorithm for Predicting Runoff Using Historical Meteorological and Runoff Data as Input. J. Hydrol. 2023, 625, 129977. [Google Scholar] [CrossRef]
  28. Tian, Q.; Lu, Y.J.; Zhou, J.; Song, S.T.; Yang, L.M.; Cheng, T.; Huang, J.D. Supplementary Cementitious Materials-Based Concrete Porosity Estimation Using Modeling Approaches: A Comparative Study of GEP and MEP. Rev. Adv. Mater. Sci. 2024, 63, 20230189. [Google Scholar] [CrossRef]
  29. Zhou, J.; Tian, Q.; Ahmad, A.; Huang, J.D. Compressive and Tensile Strength Estimation of Sustainable Geopolymer Concrete Using Contemporary Boosting Ensemble Techniques. Rev. Adv. Mater. Sci. 2024, 63, 20240014. [Google Scholar] [CrossRef]
  30. Tian, Q.; Lu, Y.; Zhou, J.; Song, S.; Yang, L.; Cheng, T.; Huang, J. Exploring the Viability of AI-Aided Genetic Algorithms in Estimating the Crack Repair Rate of Self-Healing Concrete. Rev. Adv. Mater. Sci. 2024, 63, 20230179. [Google Scholar] [CrossRef]
  31. Wang, R.; Zhang, J.; Lu, Y.; Ren, S.; Huang, J. Towards a Reliable Design of Geopolymer Concrete for Green Landscapes: A Comparative Study of Tree-Based and Regression-Based Models. Buildings 2024, 14, 615. [Google Scholar] [CrossRef]
  32. Wang, H.; Wang, W.; Zhou, X.; Sun, H.; Zhao, J.; Yu, X.; Cui, Z. Firefly Algorithm with Neighborhood Attraction. Inf. Sci. 2017, 382, 374–387. [Google Scholar] [CrossRef]
  33. Zare, M.; Ghasemi, M.; Zahedi, A.; Golalipour, K.; Mohammadi, S.K.; Mirjalili, S.; Abualigah, L. A Global Best-Guided Firefly Algorithm for Engineering Problems. J. Bionic. Eng. 2023, 20, 2359–2388. [Google Scholar] [CrossRef]
  34. Aydilek, I.B. A Hybrid Firefly and Particle Swarm Optimization Algorithm for Computationally Expensive Numerical Problems. Appl. Soft Comput. 2018, 66, 232–249. [Google Scholar] [CrossRef]
  35. Wang, L.; Zeng, Y.; Chen, T. Back Propagation Neural Network with Adaptive Differential Evolution Algorithm for Time Series Forecasting. Expert Syst. Appl. 2015, 42, 855–863. [Google Scholar] [CrossRef]
  36. Hajjem, A.; Bellavance, F.; Larocque, D. Mixed-Effects Random Forest for Clustered Data. J. Stat. Comput. Simul. 2014, 84, 1313–1328. [Google Scholar] [CrossRef]
  37. Speiser, J.L.; Miller, M.E.; Tooze, J.; Ip, E. A Comparison of Random Forest Variable Selection Methods for Classification Prediction Modeling. Expert Syst. Appl. 2019, 134, 93–101. [Google Scholar] [CrossRef]
  38. Karabadji, N.E.I.; Korba, A.A.; Assi, A.; Seridi, H.; Aridhi, S.; Dhifli, W. Accuracy and Diversity-Aware Multi-Objective Approach for Random Forest Construction. Expert Syst. Appl. 2023, 225, 120138. [Google Scholar] [CrossRef]
  39. Sperandei, S. Understanding Logistic Regression Analysis. Biochem. Med. 2014, 24, 12–18. [Google Scholar] [CrossRef] [PubMed]
  40. Zhang, Z. Model Building Strategy for Logistic Regression: Purposeful Selection. Ann. Transl. Med. 2016, 4, 111. [Google Scholar] [CrossRef] [PubMed]
  41. Austin, P.C.; Merlo, J. Intermediate and Advanced Topics in Multilevel Logistic Regression Analysis. Stat. Med. 2017, 36, 3257–3277. [Google Scholar] [CrossRef]
Figure 1. Results of correlation analysis among different features.
Figure 2. BPNN model operation schematic diagram.
Figure 3. The results of hyperparameter tuning of the three combined models.
Figure 4. Composite model prediction results.
Figure 5. Histogram of model prediction results.
Figure 6. Tenfold cross-validation results.
Figure 7. Monte Carlo simulation (R).
Figure 8. Monte Carlo simulation (RMSE).
Figure 9. Data comparison.
Figure 10. Importance and sensitivity analysis of input variables.
Table 1. Student-related characteristics.

Attribute Type | Evaluation Characteristics | Attribute Description
Individual ability of students | Cumulative Grade Point Average (CGPA) | A measure of a student’s overall academic performance during college
 | Graduate Record Examination Score (GRE Score) | Standardized test scores used to assess an applicant’s mathematical, verbal, and analytical writing skills
 | Test of English as a Foreign Language Score (TOEFL Score) | A standardized test that assesses the English ability of non-native English speakers
Subjective evaluation of students | Statement of Purpose (SOP) | A personal statement required to apply to graduate school that reflects the applicant’s research interests, career goals, and suitability for the program
 | Letter of Recommendation (LOR) | Evaluation of applicants’ academic or professional abilities by a third party
Research background and school reputation | University rating | The university ranking of the applicant, reflecting the reputation and academic level of the applicant’s undergraduate school
 | Research | The applicant’s research experience, reflecting the applicant’s research ability and achievements at the undergraduate or graduate level
Table 2. Comparison of model experimental data.

Model | Training Set RMSE | Training Set R | Test Set RMSE | Test Set R
BPNN | 0.0669 | 0.8842 | 0.0765 | 0.8712
RF | 0.0295 | 0.9787 | 0.0695 | 0.8825
LR | 0.0868 | 0.8770 | 0.0954 | 0.8619
Table 3. Comparison of the difference between the prediction results of different models.

Model | MSE | MAE | MAPE | Median Forecast Error | Difference Between Predicted and True SD
BPNN | 0.006 | 0.057 | 0.093 | 0.043 | 0.023
RF | 0.005 | 0.050 | 0.082 | 0.034 | 0.018
LR | 0.009 | 0.075 | 0.116 | 0.064 | 0.032
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
