Article

The Role of Socioeconomic Factors in Improving the Performance of Students Based on Intelligent Computational Approaches

by Yar Muhammad 1, Muhammad Abul Hassan 2,*, Sultan Almotairi 3,4, Kawsar Farooq 5, Fabrizio Granelli 2 and Ľubomíra Strážovská 6

1 School of Computer Science and Engineering, Beihang University, Beijing 100191, China
2 Department of Information Engineering and Computer Science, University of Trento, 38123 Trento, Italy
3 Department of Computer Science, Faculty of College of Computer and Information Sciences, Majmaah University, Majmaah 11952, Saudi Arabia
4 Department of Computer Science, Faculty of Computer and Information Systems, Islamic University of Madinah, Medinah 42351, Saudi Arabia
5 School of Electronic and Information Engineering, Beihang University, Beijing 100191, China
6 Faculty of Management, Comenius University Bratislava, Odbojárov 10, 82005 Bratislava 25, Slovakia
* Author to whom correspondence should be addressed.
Electronics 2023, 12(9), 1982; https://doi.org/10.3390/electronics12091982
Submission received: 22 February 2023 / Revised: 13 April 2023 / Accepted: 20 April 2023 / Published: 24 April 2023

Abstract

In the current era of technological change and globalization, education is increasingly prioritized as a means of competing among the top countries and of personal achievement. Education improves a person’s abilities and creativity, which in turn benefits the economic development of both the individual and the nation. Traditional approaches rely only on statistical measures and cannot identify the socioeconomic factors that most strongly affect a student’s performance. Given the significance of socioeconomic status (SES) for student performance, this study analyzes the important socioeconomic factors that affect the performance of students in Khyber Pakhtunkhwa, Pakistan. We developed our own dataset by administering a questionnaire survey to more than 5550 students from 100 different schools (both government and private) in Khyber Pakhtunkhwa. The dataset consists of 18 features and a target class. We used different statistical and machine learning (ML) methods to identify the factors that most strongly affect a student’s academic achievements and correlate strongly with the target class. To select the most prominent features from the dataset, we used two feature selectors (FCBF and Relief) and measured their performance together with the ML models. To evaluate each ML algorithm on the full and selected feature spaces, we used performance measures such as accuracy, precision, recall, sensitivity, and specificity. The experimental outcomes show that the feature selection algorithms significantly improve the performance of the classification models by providing more relevant features that have a strong association with the target class. This study also offers advice for decision-makers, particularly in the education sector and other authorities, on developing specific strategies, plans, and initiatives to address the issue. It is envisioned that the suggested scheme will help the residents of Khyber Pakhtunkhwa province, in particular, obtain a high-quality education that can help pave the way for an educated and developed Pakistan.

1. Introduction

Nowadays, we are living in an era in which education is considered the most important factor of success for all human beings. It offers individuals greater chances for a better lifestyle and plays an important part in the growth of human capital. It makes a person smarter and more productive, which has a big effect on a country’s or an individual’s economic growth. Education opens additional avenues for generating revenue and improves a person’s overall quality of life [1]. An individual’s educational accomplishments and performances are inextricably linked to his or her SES. A person’s SES can be determined using different approaches, including his/her professional and educational achievement, vulnerability to poverty, and bad life circumstances such as unemployment, etc. [2]. The SES of an individual can be best described with the help of two elements, i.e., social and economic status. Social status describes an individual’s social standing, such as his or her role in the growth and development of a society, whereas economic status describes an individual’s economic situation, such as financial status [3]. Both measures play a significant part in the success and development of an individual, an organization, and a country. Numerous educators, sociologists, and psychiatrists have used the same variables to assess people’s socioeconomic levels [4,5]. According to academics, the SES of an individual is a composite assessment of the person’s social and economic standing in comparison to others’ qualifications, revenue, and employment.
SES has a great influence on psychological health, physical health, family well-being, and education. Of these, education is considered the area most strongly affected by the SES of an individual, a state, or a country [6]. Parents’ qualifications and occupations, transportation, and the basic amenities needed for a normal life are all socioeconomic elements that influence a student’s academic achievements [7]. According to many experts, socioeconomic factors are the most important elements for meeting educational requirements. In general, the higher a student’s SES, the higher his or her academic achievements will be. Students from lower economic backgrounds tend to have poorer academic performance and are more likely to drop out of high school and college.
Parental education is expected to be strongly associated with a student’s achievement and is regarded as the most important component of SES. Eamon et al. [8] stated that educated parents have a significant impact on student success; the higher the parents’ education, the higher the student’s achievement. Family income also has a significant impact on a student’s educational achievements. Machebe et al. [9] reported that parents’ income and students’ educational achievements are significantly correlated. They found that about 45% of students agree that rich students have access to all educational resources; however, some students disagree with the idea that rich students do better in school. In another study, Akhtar et al. [4] reported that a student’s educational background and performance are affected by several factors, of which family income is the most important. Family size is another factor that strongly affects academic performance: with many children, parents cannot give sufficient time to each child and cannot afford extensive educational resources for all of them.
In Pakistan, some factors have an important impact on education, such as economic behavior, social activities, and political knowledge. Important aspects of student behavior include their drive for learning, their punctuality, and their enthusiasm for studying [10]. Some students struggle academically due to a lack of resources, including a lack of funding, ignorance, and limited access to standard educational institutes [11]. In addition to the factors discussed above, other socioeconomic (SE) factors have a great impact on students’ achievements. In this study, we considered all the important SE factors that may affect a student’s performance in Khyber Pakhtunkhwa, Pakistan, choosing a total of 18 SE factors, such as family size (FS), parent occupation (PO), and parent education (PE). Furthermore, a student’s environment, such as their family, the community in which they live, and the standard of the school where they study, also greatly influences the student’s knowledge. Establishing a generic hierarchical educational structure is crucial, as different schools may have distinct hierarchies and educational systems. A logical way to understand how differences in the educational system affect students’ SE status and performance is to develop a dataset that provides high-quality information about students’ performance in different schools [12].
In this article, we developed an intelligent computational system that captures all the important aspects that have a significant impact on a student’s performance. We created our own dataset by visiting various schools in Khyber Pakhtunkhwa, Pakistan, and administering a survey questionnaire that included all the important SE factors. The following are the basic contributions of this study:
  • A comprehensive study on the impact of the SES of a student on his or her educational performance and achievements is conducted in this article. The primary goal of the proposed research study is to determine the critical socioeconomic factors that influence student performance and to evaluate the educational systems of public and private schools using data from a questionnaire survey.
  • We created our own dataset by visiting different government and private schools in Khyber Pakhtunkhwa, Pakistan. The created dataset consists of a total of 18 crucial socioeconomic factors that have a great impact on the educational achievements of a student.
  • We proposed a novel approach using different ML algorithms to identify the impact of a student’s SES on his or her educational achievements. The proposed system worked on full features and on selected features as well, using two important feature selection algorithms, FCBF and Relief. Furthermore, the feature selectors present significant and correlated features to the ML models and play an important role in enhancing the performance of the models.
  • The performance of the proposed system using different ML models was tested on both full and selected features. Furthermore, to track the performance of each model, different performance measures were used. The experimental outcomes of the proposed system were found to be better than those of earlier statistical techniques.
The remainder of the paper is organized as follows: Section 2 reviews related work in the socioeconomic domain. Section 3 presents the complete methodology of the proposed system. Section 4 reports and discusses the results, including all simulation results obtained with the utilized ML models on both full and selected features. Finally, Section 5 concludes the paper.

2. Related Work

In today’s modern and technological era, education is a prerequisite for both responsible citizenship and economic advancement. However, there are several issues that students run into while in school and college that cause many to drop out or delay finishing their studies [13]. Recently, numerous studies on the factors influencing student accomplishment have been conducted to improve educational systems all over the world through the development of policy recommendation systems [14,15]. The statistical techniques suggested by the literature from this perspective include stochastic frontier analysis, multilevel linear models, and linear regression models, all of which try to parameterize the educational production function (EPF) [16].
Numerous studies have examined the impact of SES on students’ achievements, and almost all of them suggest that SES has a great influence on educational achievement, whether in middle school, high school, college, or university. Student achievement is significantly influenced by socioeconomic factors such as the number of siblings (family size), parents’ education, and the family’s economic status (family income). For example, Tahir Hijazi et al. [10] conducted a comprehensive study on the impact of SES on college students’ performance by establishing a link between performance and SES. They reported that mothers’ education is strongly related to student performance, whereas family income is negatively related to student performance. According to Farooq et al. [17], socioeconomic factors and parents’ education level are significantly correlated; however, there is no correlation between fathers’ professions and student academic achievement. They also stated that students with higher SES perform better in mathematics and other related courses than those with lower SES. Ogunshola et al. [18] determined the effects of parental SE factors on students’ performance, using a high school in Nigeria’s Kwara State for their study.
W. Korir et al. [19] studied whether a student’s academic success is correlated with the number of siblings and with whether the parents live in rural or urban areas. According to the study, the number of siblings is strongly negatively associated with educational achievement, as a larger number of siblings harms a student’s performance at school. They also found that students whose parents live in urban regions perform better than students whose parents live in rural areas. Adhanja et al. [20] conducted an empirical study on how family SES affects students’ learning and performance in Kenya’s public secondary schools (Rongo Sub-County, Migori County). They used a survey questionnaire with a descriptive research approach and found a strong correlation between parental education and students’ performance: parents with higher education levels can guide their children better than those with lower education levels. Hossain et al. [21] conducted a research study to identify the impact of SES characteristics on students’ academic performance, collecting data through a random questionnaire survey. According to their analysis, family background, particularly family income, and parents’ education are the two most prominent characteristics that play significant parts in the educational achievements of students.
Suleman et al. [22] conducted a comprehensive study on the impact of parents’ SES on the educational performance of their children. For their questionnaire survey, which was only for 10th graders, they looked at 60 government schools. They considered a number of variables in their questionnaire, the most important of which were the parents’ education, occupation, and income, all of which have a significant impact on a student’s performance. They found that parents with high education and income levels have a greater impact on their children’s educational achievements. Okioga et al. [23] conducted a research study in Kenya by establishing a relationship between the learning performance of students and their SES. They performed multiple experiments based on their collected data and found that almost 95% of the SES factors have a positive impact on a student’s learning performance, while 5% of the students feel no effect from the SES factors. Singh et al. [24] investigated the effects of the SES conditions of parents on their children’s educational performance in Uttar Pradesh, India. They collected the data from around 100 students and concluded that a strong financial background has a promising impact on a student’s academic qualifications and achievements. Dudaite et al. [25] determined the impact of SE factors in terms of the home environment on the educational achievements of students. They discovered that domestic socioeconomic circumstances have a strong influence on learners’ behavior. In a study conducted in Mogadishu, Somalia, Machebe et al. [9] discovered a link between students’ SES and educational performance. Correlation analysis and regression were used to test the research hypothesis, while survey methodology was used to investigate the relationship. Student performance, parental education, and income were shown to be significantly correlated.
Pea-López et al. [26] studied socioeconomic factors that are detrimental to students’ education. They identified the characteristics associated with poor student performance by evaluating subject scores in math and science. The outcomes were measured in terms of achievement and evaluated via the national standardization test while taking into account significant influencing factors such as lower SES, language barriers, cultural background, and lack of permanent residence status. Jiusi Zhang et al. [27] proposed a novel parallel hybrid model based on a 1D CNN and a bidirectional gated recurrent unit for predicting remaining useful life and attained promising results on two datasets, i.e., the aircraft turbofan engine dataset and the milling dataset. In another study [28], the authors suggested a new method for predicting remaining useful life, called VLSTM-LWSAN, and used data from turbofan engines to confirm its effectiveness. According to the experimental outcomes, VLSTM-LWSAN performs better than several deep learning (DL) models and techniques. Jiusi Zhang et al. [29] proposed a new integrated multitasking intelligent bearing fault detection method that can detect faults, classify them, and identify previously unknown faults. This method uses representation learning under imbalanced-sample conditions. Using a public bearing dataset and a bearing dataset produced by the rotor dynamics experiment rig (RDER), they demonstrated the efficacy of their system. C. Masci et al. [30] conducted a study on SES and the average performance of males versus females. They found that, on average, boys do better in science courses than girls (this is also true in Finland) and that, compared to non-immigrant youths, immigrant pupils had a higher probability of low science test scores due to their socioeconomic level. Using a survey approach and a test score to evaluate performance, F. Leva et al. [31] established a link between school quality and student achievement. They also included 10 countries in an international student evaluation, where they examined the influence of a student’s score across 9 nations using ML and statistical techniques.
Some researchers have already adopted ML techniques to examine pertinent socioeconomic problems, support parametric assumptions, and describe multipart systems. For example, Buenao et al. [32] carried out a comprehensive study of the impact of SES on engineering students’ performance using ML techniques; their outcomes demonstrated the potency of ML approaches in predicting student achievement. Hussain et al. [33] studied how SES affects student performance based on an internal assessment using a DL approach, i.e., an artificial neural network (ANN). They attained promising results with fewer attributes and smaller amounts of data; however, the performance of their system degrades with more attributes and larger amounts of data. Belachew et al. [34] presented an ML-based student performance prediction model for Wolkite University that uses multiple ML algorithms (NB, SVM, and NN) to determine the impact of SES on student performance. Oloruntoba et al. [35] carried out a study on the use of SVM to predict student academic achievement. Data were collected from students at one of Nigeria’s Federal Polytechnics in the southwest, and the students’ GPAs were used to determine their academic performance. Although econometric methods and conventional statistical procedures such as regression frequently produce accurate results, they struggle on large datasets, which creates a need for alternative tools. Some scholars have used hierarchical ML models to identify the educational achievements of a student [36,37,38]. However, these methods still have drawbacks and need further refinement. Two things are particularly important: (1) selecting the socioeconomic variables that correlate most strongly with student performance; and (2) using other ML models such as DT, RF, and KNN to overcome the issues of the earlier approaches. In this study, we considered the most important socioeconomic variables and assessed their impact on the educational achievements of a student. We used different ML models and examined their usefulness in predicting student performance. The simulation outcomes demonstrate the effectiveness of the proposed system.

3. Proposed Methodology

The methodology adopted for this study consists of several steps. First, a questionnaire survey is conducted to collect data from different government and private schools in Khyber Pakhtunkhwa, Pakistan. The questionnaire consists of a total of 19 questions. Second, preprocessing techniques are used to normalize the collected data. Third, different ML classification models are used to identify whether the SES of a student affects their performance. The ML models are trained on both full and selected feature sets, which also helps identify the parameters that most strongly affect a student’s performance. Furthermore, numerous performance measures are used to track the effectiveness of the ML models. The following subsections describe the materials used and the methodology adopted in this study.

3.1. Dataset Acquisition

The first and most important step in developing an intelligent computational system is to build a dataset whose features have a strong association with the target class. We therefore developed our own dataset by visiting different government and private schools in Khyber Pakhtunkhwa, Pakistan, and conducting a questionnaire survey to create the students’ SES dataset. Mature and mentally stable students with a good understanding of the socioeconomic environment and its impact on a student’s lifestyle, particularly on educational achievement, were selected for the survey. The collected dataset contains two types of students: those whose academic performance is affected by their family’s SES and those whose performance is not. Students whose performance was affected by family SES were far fewer than those whose performance was not, resulting in an imbalanced dataset. The main problem with an imbalanced dataset is that it can cause the model to overfit. To address overfitting and class imbalance, we used the synthetic minority oversampling technique (SMOTE). SMOTE increases the number of samples of affected students, which balances the dataset, improves the model’s performance, and helps avoid overfitting. The dataset consists of 19 parameters: 18 features that determine whether a student’s performance is affected by SES, and one target class. A complete description of the developed dataset is given in Table 1.
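The idea behind SMOTE-style oversampling can be illustrated with a minimal, standard-library sketch. The source does not state which SMOTE implementation or settings were used, so the function name `smote_sketch`, the neighbour count, and the toy data below are all assumptions made for illustration only.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (excluding itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical toy data: 3 "affected" students vs 6 "not affected" students.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]]
majority = [[5.0, 5.0], [5.5, 4.8], [6.0, 5.2], [5.2, 5.5], [4.8, 5.1], [5.9, 4.9]]

new_points = smote_sketch(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because each synthetic point lies on a line segment between two real minority samples, the oversampled class stays inside the region already occupied by the minority data rather than simply duplicating rows.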

3.2. Proposed System Framework Based on Intelligent ML Models

This section discusses the framework proposed for determining the impact of SES on student performance in more detail. The framework consists of several steps, where the output of one step is used as the input to the next. First, the dataset is loaded and normalized using preprocessing techniques. After normalization, the data are passed to the models for training in both the full feature space and the selected feature space. Two feature selection algorithms, Relief and FCBF, are used to select the 12 most prominent features from the 18 total features. Various ML algorithms are used in this study, such as DT, KNN, RF, NB, XGBoost, and LR. Different performance measures are used to track the performance of the utilized models. Figure 1 illustrates the complete framework of the proposed system.

3.2.1. Preprocessing of Data

Preprocessing techniques play an important role in improving the efficiency of ML models, as they present the data to the models in a normalized and well-structured format. Different preprocessing techniques can be used to normalize the data, including the standard scaler and the min-max scaler.
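As a rough illustration of the two normalization schemes named here, the following sketch implements min-max scaling and standard scaling for a single feature column using only the standard library. The helper names and the example column are hypothetical; the source does not say which scaler was applied to which feature.

```python
import math

def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def standard_scale(column):
    """Centre values to mean 0 and scale to unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]

# Hypothetical feature column, e.g. family size for five students.
family_size = [3, 5, 8, 4, 10]
scaled_01 = min_max_scale(family_size)
z_scores = standard_scale(family_size)
```

Min-max scaling preserves the shape of the distribution within fixed bounds, while standard scaling is less sensitive to the exact minimum and maximum; distance-based models such as KNN are affected by this choice.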

3.2.2. Feature Selection Algorithms

Feature selection algorithms are very important when a dataset contains many features. Choosing the most important features from a feature space is critical for improving the performance of a classification model. Feature selection has two main benefits: (1) enhancing the performance of the model and (2) reducing the computation time of the model. We chose two feature selectors for the simulations run in this paper: FCBF and Relief.
1. FCBF Feature Selection Algorithm:
The fast correlation-based filter (FCBF) is a multivariate feature selection technique that considers the dependencies between every feature pair as well as their relevance to the class. FCBF is grounded in information theory and uses symmetrical uncertainty to compute both feature dependency and class relevance. To eliminate irrelevant and redundant features, FCBF follows a heuristic search strategy that combines a sequential search with backward selection. The algorithm terminates when there are no more features to remove. Table 2 lists the 12 most prominent features selected via the FCBF feature selector.
2. Relief Feature Selection Algorithm:
Feature selection algorithms are useful for removing irrelevant and redundant features, enhancing model performance, reducing the effects of the curse of dimensionality, improving the data’s readability, and preventing learning algorithms from running slowly. The Relief algorithm assigns each feature a score, which is used to select the highest-scoring features from the feature space; these scores can also be used as feature weights in subsequent modelling. Scores are based on differences in feature values between nearest-neighbor instance pairs. A feature’s score decreases if its value differs between neighboring instances of the same class, and increases if its value differs between neighboring instances of different classes. The Relief feature selector selects the 12 most important features from the dataset, as shown in Table 3.
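The two scoring ideas described for these selectors can be sketched in a few lines. The block below is an illustrative simplification, not the implementation used in the study: `symmetrical_uncertainty` computes the information-theoretic measure that FCBF relies on (for discrete values), and `relief_scores` applies the basic Relief hit/miss update over every sample instead of a random subset. The toy data and all function names are assumptions.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)); 0 = independent, 1 = fully dependent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)

def relief_scores(X, y):
    """Basic Relief for numeric features scaled to [0, 1] and two classes.

    For every sample, find the nearest same-class sample (hit) and the nearest
    other-class sample (miss); a feature loses weight when it differs on the
    hit and gains weight when it differs on the miss."""
    n, d = len(X), len(X[0])
    dist = lambda a, b: sum(abs(u - v) for u, v in zip(a, b))
    weights = [0.0] * d
    for i in range(n):
        hit = min((j for j in range(n) if j != i and y[j] == y[i]),
                  key=lambda j: dist(X[i], X[j]))
        miss = min((j for j in range(n) if y[j] != y[i]),
                   key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            weights[f] += (diff_miss - diff_hit) / n
    return weights

# Hypothetical toy data: feature 0 tracks the class, feature 1 is noise.
X = [[0.0, 0.2], [0.1, 0.9], [0.0, 0.5], [1.0, 0.3], [0.9, 0.8], [1.0, 0.6]]
y = [0, 0, 0, 1, 1, 1]

su_informative = symmetrical_uncertainty([0, 0, 0, 1, 1, 1], y)
su_noise = symmetrical_uncertainty([0, 1, 0, 1, 0, 1], y)
scores = relief_scores(X, y)
top_feature = max(range(len(scores)), key=scores.__getitem__)
```

In both cases, ranking features by the resulting score and keeping the top entries yields a reduced feature subset; FCBF additionally uses the same measure between feature pairs to drop redundant features.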

3.2.3. Utilized ML Classification Algorithms

With the constant advancement of technology, traditional mathematical and statistical methods are no longer capable of providing the modern services that are in demand. ML classification algorithms have various applications in different aspects of life, such as healthcare, agriculture, engineering, sports, entertainment, economics, management, social sciences, etc. In this study, we investigated the application of different ML algorithms in the field of socioeconomics, to find out whether SES has an impact on student performance or not. To do so, we trained and tested the performance of ML models such as the decision tree (DT), random forest (RF), naïve Bayes (NB), adaptive boosting (AB), logistic regression (LR), gradient boosting (GB), K-nearest neighbor (KNN), and extra tree classifier (ET) methods on the created dataset. The use of multiple models further explores the significance of ML in the field of socioeconomics.

3.2.4. Model Validation Techniques

Validating the results of the classification models is also an important step. Cross-validation (CV) methods help ensure the stability of the results produced by the models. Common CV techniques include leave-one-out, hold-out, K-fold, and stratified K-fold. In this study, we used K-fold CV. In K-fold CV, the data are divided into k equal parts; k - 1 parts are used to train the model, and the remaining part is used to test it. The number of iterations equals the number of folds. For example, in 10-fold CV, the first nine folds are used for training and the last fold for testing in the first iteration. In the next iteration, the second-last fold is used for testing and the remaining nine folds for training, and so on. The accuracy and loss of the model are recorded at every iteration, and their averages over all iterations are taken as the final accuracy and loss of the model.
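The K-fold procedure can be sketched as an index generator plus an averaging step. This is a minimal illustration with no shuffling; the `evaluate` callback is a hypothetical placeholder for "train on these indices, score on those", and the order in which folds are held out does not change the averaged result.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def cross_validate(n_samples, k, evaluate):
    """Average a per-fold score; `evaluate(train, test)` is a hypothetical callback
    that trains a model on `train` and returns its score on `test`."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(n_samples=25, k=10))
# Placeholder evaluator so the sketch runs without a model:
# it simply reports the share of samples held out in each fold.
mean_score = cross_validate(25, 10, lambda train, test: len(test) / 25)
```

Each sample is used for testing exactly once and for training in the other k - 1 iterations, which is what makes the averaged score less dependent on any single split.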

3.2.5. Performance Metrics

The final step of the proposed system is to evaluate the performance of the implemented ML classification models using various performance metrics such as accuracy, sensitivity, specificity, precision, recall, the Matthews correlation coefficient (MCC), and the ROC curve. All the performance measures are derived from the confusion matrix shown in Table 4.
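The formulas for the mentioned performance measures are as follows, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives in the confusion matrix:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$$
$$\text{Specificity} = \frac{TN}{TN + FP} \times 100\%$$
$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100\%$$
$$\text{Precision} = \frac{TP}{TP + FP} \times 100$$
$$\text{F1-Measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \times 100$$
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \times 100$$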
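As a worked illustration, the confusion-matrix measures listed in this section can be computed from the four counts TP, TN, FP, and FN. The sketch below uses the standard definitions and returns fractions rather than percentages; the counts are hypothetical and are not taken from the paper's tables, and the function assumes no denominator is zero.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the listed measures from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # same ratio as sensitivity for the positive class
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }

# Hypothetical counts, not taken from the paper's tables.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts the accuracy is 0.85, the sensitivity 0.80, and the specificity 0.90; multiplying by 100 gives the percentage form.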

4. Results and Discussion

In this section, we discuss the experimental setup, the simulation environment, and the conduction of multiple experiments on the developed student socioeconomic dataset using different ML classification algorithms. Firstly, we discuss the experimental setup and environment, such as system specifications, the tools (IDE and programming language) used for simulations, the packages used for the implementation of multiple ML algorithms, and the parameter settings of the utilized ML models. Secondly, we checked the performance of the utilized ML models on full features with the help of different performance measures. Thirdly, we tracked the performance of the ML models on selected feature spaces using the FCBF and Relief feature selection algorithms. Fourthly, we analyzed the performance of the models on both full and selected features. After analyzing the results, we found that the ML models performed well when the most prominent and target-related features were given to them. Finally, we compared the experimental outcomes of our proposed system with those of earlier approaches.

4.1. Experimental Setup and Environment

This section describes the experimental setup and the simulation environment used to run the experiments and the ML models. All simulations were carried out on a Lenovo Legion R7 5th Generation system with 16 GB of RAM, an NVIDIA G4 RTX360, and a 1 TB SSD. An Anaconda Jupyter Notebook was used for the experiments, and Python was used to train and test the classification models. The main packages used for implementing the ML models and generating graphics are Scikit-Learn (Sklearn) [39], Pandas [40], Numpy [41], Seaborn [42], and Matplotlib [43]. Because parameter settings play a significant role in the performance of an ML model, all of the ML models and their parameters are listed in Table 5.
The following subsections represent the experimental outcomes on both full and selected features.
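As a rough illustration of the setup above, the nine classifiers can be instantiated in Scikit-Learn with the settings listed in Table 5. This is a minimal sketch, not the authors' code: the helper name `build_models` is hypothetical, any parameter not listed in Table 5 is left at the library default, and the GB depth, given as 1.0 in Table 5, is assumed here to mean the integer 1.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """Return the nine classifiers keyed by the abbreviations used in Table 5."""
    return {
        "DT": DecisionTreeClassifier(
            max_depth=100, max_leaf_nodes=13, min_samples_split=10,
            min_samples_leaf=1, random_state=5,
        ),
        "MLP": MLPClassifier(
            alpha=0.007, solver="lbfgs", activation="relu", random_state=5,
        ),
        "KNN": KNeighborsClassifier(
            metric="manhattan", n_neighbors=3, weights="uniform",
        ),
        "RF": RandomForestClassifier(
            max_depth=10, min_samples_leaf=5, min_samples_split=5,
            n_estimators=100,
        ),
        "NB": GaussianNB(),
        "AB": AdaBoostClassifier(n_estimators=50, learning_rate=1.0),
        "LR": LogisticRegression(C=1.0, solver="liblinear", random_state=5),
        "ET": ExtraTreesClassifier(
            n_estimators=50, criterion="gini", min_samples_split=2,
            min_samples_leaf=1,
        ),
        # Assumption: Table 5 gives max_depth = 1.0; an integer depth of 1 is used.
        "GB": GradientBoostingClassifier(
            learning_rate=1.0, n_estimators=50, max_depth=1, random_state=5,
        ),
    }


models = build_models()
```

Keeping the models in one dictionary makes it straightforward to loop over them and evaluate each on the full and the selected feature sets with the same code.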

4.2. Experimental Outcomes of the Utilized Models Using Full Features

This section presents the experimental outcomes of all the utilized ML classification models on the full feature set of the created dataset. We measured the performance of each model using accuracy, sensitivity, specificity, precision, recall, AUC, F1-measure, and MCC, as shown in Table 6.
The results in Table 6 show that the ET classifier performed best on almost every performance measure, obtaining an accuracy of 82.84%, a sensitivity of 87.00%, a specificity of 79.85%, a precision of 84.00%, a recall of 83.15%, an AUC score of 92.60%, an F1-measure of 0.83, and an MCC of 0.660. RF ranked second, with an accuracy of 78.24% and an AUC score of 85.00%. NB performed worst, with the lowest accuracy (69.87%), sensitivity (68.06%), and MCC (0.397), as shown in Table 6.
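The measures reported in Table 6 can all be derived from the four cells of the confusion matrix in Table 4. The following is a minimal sketch using the standard positive-class definitions; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the study. Under these definitions, recall equals sensitivity; the tables report recall separately, which suggests an averaged variant that is not reproduced here.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true-positive rate (recall of the positive class)
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example = confusion_metrics(tp=80, fp=20, tn=70, fn=30)
print({name: round(value, 3) for name, value in example.items()})
```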
The performance of the KNN model depends on the value of K. We therefore evaluated KNN for K = 3 to 9, as shown in Table 7.
Table 7 shows that accuracy varies with K, ranging from 69.95% (K = 8) to 78.24% (K = 3), while sensitivity ranges from 74.41% to 86.20% and specificity from 65.89% to 76.19%. Another important tool for assessing a classification model is the ROC curve. Figure 2 shows the ROC curves of all models investigated in this article.
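A sweep over K such as the one summarized in Table 7 can be sketched as follows. This is an illustration under stated assumptions: the data are synthetic (generated with `make_classification` as a stand-in for the 18-feature survey dataset), the 80/20 stratified split is assumed rather than taken from the study, and the resulting accuracies are therefore unrelated to the values in Table 7. The KNN settings follow Table 5.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the survey data: 18 features and a binary target.
X, y = make_classification(
    n_samples=600, n_features=18, n_informative=8, random_state=5
)
# Assumption: an 80/20 stratified hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=5
)

accuracy_by_k = {}
for k in range(3, 10):  # K = 3 to 9
    knn = KNeighborsClassifier(n_neighbors=k, metric="manhattan", weights="uniform")
    knn.fit(X_train, y_train)
    accuracy_by_k[k] = knn.score(X_test, y_test)  # accuracy on the held-out split

best_k = max(accuracy_by_k, key=accuracy_by_k.get)
print(accuracy_by_k, "best K:", best_k)
```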

4.3. Results of the Investigated Models on Selected Features (Relief)

This section presents the results of the investigated ML models on the features selected by the Relief feature selector. We selected the 12 most important of the 18 features via Relief. Table 8 shows the performance of the investigated models on these features.
From Table 8, the accuracy of every classification model is higher on the Relief-selected features than on the full feature set (Table 6). As with the full features, the ET model performed best on the selected features, attaining an accuracy of 83.68%, a specificity of 83.61%, a sensitivity of 83.76%, an AUC score of 91.20%, a precision of 84.00%, a recall of 84.00%, an f1-score of 0.84, and an MCC of 0.674. The RF classification model achieved the second-highest accuracy of 82.01%, with a sensitivity of 78.13%, a specificity of 86.49%, an AUC score of 88.10%, a precision of 83.00%, a recall of 82.00%, an f1-score of 0.82, and an MCC of 0.645.
The performance of the KNN algorithm changes with the value of K. We therefore ran the KNN algorithm with a different value of K in each experiment to find the optimal value. The results are shown in Table 9.
Table 9 shows that changing K changes the accuracy, sensitivity, and specificity. The highest accuracy (78.26%) was observed for K = 3, and we therefore selected K = 3 as the optimal value.
The ROC curves of the investigated ML models in combination with the Relief feature selection algorithm are depicted in Figure 3.
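The Relief selection step used in this subsection can be illustrated with the basic Relief algorithm for a binary target, which rewards features that differ between a sample and its nearest neighbor of the other class (nearest miss) and agree with its nearest neighbor of the same class (nearest hit). The text does not state which Relief variant or implementation was used, so the sketch below is an assumption, with hypothetical function names and synthetic data; it does not reproduce the scores in Table 3.

```python
import numpy as np


def relief_scores(X, y, n_iterations=100, seed=0):
    """Basic Relief feature weights for a binary target."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Scale every feature to [0, 1] so that per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    n_samples, n_features = Xs.shape
    weights = np.zeros(n_features)
    for _ in range(n_iterations):
        i = rng.integers(n_samples)
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to every sample
        dist[i] = np.inf                        # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        weights += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n_iterations
    return weights


def top_k_features(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(scores)[::-1][:k]


# Synthetic demo: feature 0 tracks the class, the other five are noise.
rng = np.random.default_rng(5)
y_demo = rng.integers(0, 2, size=200)
X_demo = rng.random((200, 6))
X_demo[:, 0] = y_demo + 0.1 * rng.random(200)

scores = relief_scores(X_demo, y_demo, n_iterations=200)
selected = top_k_features(scores, k=3)
print(scores.round(3), selected)
```

With the study's data, `top_k_features(scores, k=12)` would correspond to keeping the 12 highest-ranked of the 18 features.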

4.4. Results of the Investigated Models on Selected Features (FCBF)

This section presents the experimental procedure and the results obtained for the investigated ML classification models on the features chosen by the FCBF feature selection algorithm. First, we loaded the dataset with the full set of 18 features and passed it to the FCBF algorithm to select the 12 most important features. Second, we trained the ML models on the selected features to determine whether feature selection affects the performance of the classification models. We tracked the performance of the models using accuracy, sensitivity, specificity, and the other measures used in the previous subsections. Table 10 shows the results attained by the ML models on the feature space selected via the FCBF feature selection algorithm.
Table 10 shows more variation in the results than the other two approaches, i.e., full features + ML models and selected features (Relief) + ML models. The combination of the FCBF feature selector and the GB classifier performed better than the rest of the models, achieving an accuracy of 82.01%, a sensitivity of 83.87%, a specificity of 80.00%, a precision of 82.00%, a recall of 82.00%, an AUC of 87.00%, an f1-measure of 0.82, and an MCC score of 0.640, as shown in Table 10. NB again performed poorly, with the lowest accuracy (66.95%) of all models.
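The FCBF selection step used in this subsection ranks features by their symmetric uncertainty (SU) with the target and then drops any feature that is at least as strongly associated with an already-kept feature as with the target. The sketch below illustrates that idea for discrete features; it is an assumption about the general algorithm, not the authors' implementation. The function names, the toy columns, and the threshold value are hypothetical, and the column labels merely borrow the style of the feature codes in Table 1. Standard FCBF determines the number of kept features through the threshold, so obtaining exactly 12 features, as in this study, would need an additional rule that the text does not specify.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(x, y)))  # mutual information
    return 2.0 * info_gain / (hx + hy)


def fcbf(columns, target, threshold=0.0):
    """Return (selected feature names, SU of each feature with the target)."""
    relevance = {name: symmetric_uncertainty(col, target) for name, col in columns.items()}
    # Step 1: keep features whose relevance exceeds the threshold, most relevant first.
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=relevance.get,
        reverse=True,
    )
    # Step 2: drop a feature if a kept feature is at least as correlated with it
    # as the feature is with the target.
    selected = []
    for name in ranked:
        redundant = any(
            symmetric_uncertainty(columns[kept], columns[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected, relevance


# Toy data, for illustration only: the target is 1 when both LF and TF are 1.
columns = {
    "LF": [0, 0, 0, 0, 1, 1, 1, 1],
    "LF_dup": [0, 0, 0, 0, 1, 1, 1, 1],   # exact copy of LF, hence redundant
    "TF": [0, 0, 1, 1, 0, 0, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],    # carries no information about the target
}
target = [0, 0, 0, 0, 0, 0, 1, 1]

selected, relevance = fcbf(columns, target, threshold=0.01)
print(selected)
```

In this toy case the duplicate column is removed as redundant and the noise column falls below the relevance threshold, leaving `LF` and `TF`.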
Parameter settings play a vital role in the performance of classification models. Given the importance of the parameter K for KNN, we performed experiments for K = 3 to 9 and observed that the accuracy fluctuated with the value of K. Table 11 shows the results attained for the different values of K.
The ROC curve shows how a model trades off the true-positive rate against the false-positive rate as the decision threshold varies. Figure 4 compares the ROC curves of all of the models used in this study on the features selected by the FCBF feature selector.
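To make the ROC curves and AUC scores concrete, the following minimal sketch computes the ROC points by sweeping the decision threshold over a model's scores and integrates the curve with the trapezoidal rule. The function names, labels, and scores are hypothetical and purely illustrative; they are not outputs of the models in this study.

```python
def roc_curve_points(y_true, scores):
    """Return the (FPR, TPR) points obtained by lowering the threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), reverse=True)  # highest score first
    points = [(0.0, 0.0)]
    tp = fp = 0
    for idx, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample of each group of tied scores.
        last_of_group = idx == len(pairs) - 1 or pairs[idx + 1][0] != score
        if last_of_group:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the (FPR, TPR) polyline using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical labels (1 = positive class) and predicted scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_curve_points(y_true, scores)
area = auc(points)
print(points, round(area, 4))
```

Here eight of the nine positive-negative pairs are ranked correctly, so the area is 8/9; a model that ranked every positive above every negative would reach 1.0.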

4.5. Comparison of the Proposed Work with Earlier Approaches

The proposed system achieved accuracies of 82.84% (full features + ET), 83.68% (Relief + ET), and 82.01% (FCBF + GB). Table 12 compares the proposed system with the earlier approaches used for finding the impact of SES on student performance, whose reported accuracies range from 62.40% to 80.10%.

5. Conclusions

This study investigated the most critical socioeconomic factors and their impact on the educational achievement and performance of students, focusing on how a student’s SES affects his or her performance. Given the importance of SES for the performance of secondary school students, we conducted a comprehensive questionnaire survey by visiting different private and public schools in Khyber Pakhtunkhwa, Pakistan. We considered the 18 most critical socioeconomic factors in the survey and created our own dataset, consisting of 18 features and a target class. We applied several preprocessing techniques, namely missing-value removal, outlier removal, and standard scaling, to clean and normalize the data. The negative instances in the dataset outnumbered the positive ones, making the dataset imbalanced; to address this, we used the SMOTE oversampling technique to balance it. After balancing the dataset, we trained and tested nine ML models (DT, RF, NB, KNN, LR, GB, AB, ET, and MLP) on the full set of 18 features and recorded the outcomes in terms of accuracy, specificity, sensitivity, and other performance measures. On the full feature set, the ET classifier performed best, attaining an accuracy of 82.84%, a sensitivity of 87.00%, a specificity of 79.85%, a precision of 84.00%, a recall of 83.15%, an AUC score of 92.60%, an f1-score of 0.83, and an MCC of 0.660 (Table 6). We then used two well-known feature selection algorithms (FCBF and Relief) to select the 12 most important of the 18 features and tracked the performance of the models on each subset.
First, we combined the Relief feature selection algorithm with the investigated ML models. With this approach, the accuracy of every model improved relative to the full feature set. The ET classifier again performed best, attaining an accuracy of 83.68%, a sensitivity of 83.76%, a specificity of 83.61%, a precision of 84.00%, a recall of 84.00%, an AUC score of 91.20%, an f1-score of 0.84, and an MCC score of 0.674. Second, we combined FCBF with the nine ML models and recorded the outcomes in terms of the same performance measures. Unlike the full features + ML models and Relief + ML models, the GB classifier combined with the FCBF feature selection algorithm performed better than the ET classifier, attaining an accuracy of 82.01%, a sensitivity of 83.87%, a specificity of 80.00%, a precision of 82.00%, a recall of 82.00%, an AUC score of 87.00%, an f1-score of 0.82, and an MCC score of 0.640. The experimental results indicate that the proposed approach achieves higher accuracy than the earlier approaches listed in Table 12. The proposed computational system can help education departments improve the performance of students by taking the important socioeconomic factors into account.

Author Contributions

Conceptualization, Y.M., M.A.H. and S.A.; methodology, Y.M., M.A.H. and S.A.; software, Y.M., M.A.H., K.F. and S.A.; validation, Y.M., M.A.H. and K.F.; formal analysis, Y.M., M.A.H. and K.F.; investigation, Y.M., M.A.H. and K.F.; resources, Y.M., M.A.H. and F.G.; data curation, Y.M., M.A.H., Ľ.S. and F.G.; writing—original draft preparation, Y.M., M.A.H. and Ľ.S.; writing—review and editing, Y.M. and M.A.H.; visualization, Y.M. and M.A.H.; supervision, Y.M. and M.A.H.; project administration, Y.M. and M.A.H.; funding acquisition, Ľ.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Faculty of Management of Comenius University in Bratislava, Slovakia.

Data Availability Statement

All data are available in this manuscript.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

  1. Lempp, H.; Seale, C. The hidden curriculum in undergraduate medical education: Qualitative study of medical students’ perceptions of teaching. BMJ 2004, 329, 770–773.
  2. Kimaiga, H.O. The Influence of Parental Socio Economic Status on Pupil’s Academic Performance at Kenya Certificate of Primary Education in Kiamokama Division of Kisii County; University of Nairobi: Nairobi, Kenya, 2014.
  3. Guo, L.-H.; Cheng, S.; Liu, J.; Wang, Y.; Cai, Y.; Hong, X.-C. Does social perception data express the spatio-temporal pattern of perceived urban noise? A case study based on 3137 noise complaints in Fuzhou, China. Appl. Acoust. 2022, 201, 109129.
  4. Akhtar, Z. Socio-economic status factors effecting the students achievement: A predictive study. Int. J. Soc. Sci. Educ. 2012, 2, 281–287.
  5. Li, T.; Li, Y.; Hoque, M.A.; Xia, T.; Tarkoma, S.; Hui, P. To what extent we repeat ourselves? Discovering daily activity patterns across mobile app usage. IEEE Trans. Mob. Comput. 2020, 21, 1492–1507.
  6. Saifi, S.; Mehmood, T. Effects of socioeconomic status on students achievement. Int. J. Soc. Sci. Educ. 2011, 1, 119–128.
  7. Sirin, S.R. Socioeconomic status and academic achievement: A meta-analytic review of research. Rev. Educ. Res. 2005, 75, 417–453.
  8. Eamon, M.K. Social-demographic, school, neighborhood, and parenting influences on the academic achievement of Latino young adolescents. J. Youth Adolesc. 2005, 34, 163–174.
  9. Machebe, C.H.; Ezegbe, B.N.; Onuoha, J. The Impact of Parental Level of Income on Students’ Academic Performance in High School in Japan. Univers. J. Educ. Res. 2017, 5, 1614–1620.
  10. Rasul, S.; Bukhsh, Q. A study of factors affecting students’ performance in examination at university level. Procedia-Soc. Behav. Sci. 2011, 15, 2042–2047.
  11. Iqbal, S.M.; Long, C.S.; Fei, G.C.; Bukhari, S.M. Moderating effect of top management support on relationship between transformational leadership and project success. Pak. J. Commer. Soc. Sci. 2015, 9, 540–567.
  12. Tomul, E.; Polat, G. The effects of socioeconomic characteristics of students on their academic achievement in higher education. Am. J. Educ. Res. 2013, 1, 449–455.
  13. Musso, M.F.; Hernández, C.F.R.; Cascallar, E.C. Predicting key educational outcomes in academic trajectories: A machine-learning approach. High. Educ. 2020, 80, 875–894.
  14. Cao, H. Entrepreneurship education-infiltrated computer-aided instruction system for college Music Majors using convolutional neural network. Front. Psychol. 2022, 13, 900195.
  15. Xiong, Z.; Liu, Q.; Huang, X. The influence of digital educational games on preschool Children’s creative thinking. Comput. Educ. 2022, 189, 104578.
  16. Wu, B.; Liu, Z.; Gu, Q.; Tsai, F.-S. Underdog mentality, identity discrimination and access to peer-to-peer lending market: Exploring effects of digital authentication. J. Int. Financ. Mark. Inst. Money 2023, 83, 101714.
  17. Farooq, M.S.; Chaudhry, A.H.; Shafiq, M.; Berhanu, G. Factors affecting students’ quality of academic performance: A case of secondary school level. J. Qual. Technol. Manag. 2011, 7, 1–14.
  18. Ogunshola, F.; Adewale, A. The effects of parental socio-economic status on academic performance of students in selected schools in Edu Lga of Kwara State Nigeria. Int. J. Acad. Res. Bus. Soc. Sci. 2012, 2, 230–239.
  19. Korir, W. Influence of number of siblings and learning resources at home on students’ academic performance. J. Emerg. Trends Educ. Res. Policy Stud. 2017, 8, 291–299.
  20. Nyakan, P.; Yambo, J. Family Based Socio-Economic Factors that Affect Students’ academic Performance in Public Secondary Schools in Rongo Sub-County, Migori County, Kenya. J. Bus. Manag. Sci. 2016, 2, 8.
  21. Hossain, A.; Zeheen, A.; Islam, M.A. Socio-economic background and performance of the students at presidency university in Bangladesh. Eur. J. Educ. Stud. 2012, 5.
  22. Suleman, Q.; Hussain, I.; Khan, F.U.; Nisa, U. Effects of parental socioeconomic status on the academic achievement of secondary school students in Karak District, Pakistan. Int. J. Hum. Resour. Stud. 2012, 2, 14–32.
  23. Okioga, C.K. The impact of students’ socio-economic background on academic performance in Universities, a case of students in Kisii University College. Am. Int. J. Soc. Sci. 2013, 2, 38–46.
  24. Singh, A.; Singh, J.P. The influence of socio-economic status of parents and home environment on the study habits and academic achievement of students. Educ. Res. 2014, 5, 348–352.
  25. Dudaite, J. Impact of socio-economic home environment on student learning achievement. Indep. J. Manag. Prod. 2016, 7, 854–871.
  26. Peña-López, I. PISA 2015 Results (Volume I). Excellence and Equity in Education; OECD Publishing: Paris, France, 2016.
  27. Zhang, J.; Tian, J.; Li, M.; Leon, J.I.; Franquelo, L.G.; Luo, H.; Yin, S. A parallel hybrid neural network with integration of spatial and temporal features for remaining useful life prediction in prognostics. IEEE Trans. Instrum. Meas. 2022, 72, 3501112.
  28. Zhang, J.; Li, X.; Tian, J.; Jiang, Y.; Luo, H.; Yin, S. A variational local weighted deep sub-domain adaptation network for remaining useful life prediction facing cross-domain condition. Reliab. Eng. Syst. Saf. 2023, 231, 108986.
  29. Zhang, J.; Zhang, K.; An, Y.; Luo, H.; Yin, S. An Integrated Multitasking Intelligent Bearing Fault Diagnosis Scheme Based on Representation Learning Under Imbalanced Sample Condition. IEEE Trans. Neural Netw. Learn. Syst. 2023.
  30. Masci, C.; Johnes, G.; Agasisti, T. Student and school performance across countries: A machine learning approach. Eur. J. Oper. Res. 2018, 269, 1072–1085.
  31. Masci, C.; Ieva, F.; Agasisti, T.; Paganoni, A.M. Does class matter more than school? Evidence from a multilevel statistical analysis on Italian junior secondary school students. Socio-Econ. Plan. Sci. 2016, 54, 47–57.
  32. Buenaño-Fernández, D.; Gil, D.; Luján-Mora, S. Application of machine learning in predicting performance for computer engineering students: A case study. Sustainability 2019, 11, 2833.
  33. Hussain, S.; Muhsin, Z.; Salal, Y.; Theodorou, P.; Kurtoğlu, F.; Hazarika, G. Prediction model on student performance based on internal assessment using deep learning. Int. J. Emerg. Technol. Learn. 2019, 14, 4–22.
  34. Belachew, E.B.; Gobena, F.A. Student performance prediction model using machine learning approach: The case of Wolkite university. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2017, 7, 46–50.
  35. Oloruntoba, S.; Akinode, J. Student academic performance prediction using support vector machine. Int. J. Eng. Sci. Res. Technol. 2017, 6, 588–597.
  36. Agasisti, T.; Ieva, F.; Paganoni, A.M. Heterogeneity, school-effects and the North/South achievement gap in Italian secondary education: Evidence from a three-level mixed model. Stat. Methods Appl. 2017, 26, 157–180.
  37. Rauer, J.N.; Kroiss, M.; Kryvinska, N.; Engelhardt-Nowitzki, C.; Aburaia, M. Cross-university virtual teamwork as a means of internationalization at home. Int. J. Manag. Educ. 2021, 19, 100512.
  38. Kryvinska, N.; Baroková, A.; Auer, L.; Ivanochko, I.; Strauss, C. Business value assessment of services re-use on SOA using appropriate methodologies, metrics and models. Int. J. Serv. Econ. Manag. 12 2013, 5, 301–327.
  39. Available online: https://scikit-learn.org/stable/index.html (accessed on 12 April 2023).
  40. Available online: https://pandas.pydata.org/ (accessed on 12 April 2023).
  41. Available online: https://numpy.org/ (accessed on 12 April 2023).
  42. Available online: https://seaborn.pydata.org/ (accessed on 12 April 2023).
  43. Available online: https://matplotlib.org/ (accessed on 12 April 2023).
  44. Gerritsen, L.; Conijn, R. Predicting Student Performance with Neural Networks; Tilburg University: Tilburg, The Netherlands, 2017.
  45. Stapel, M.; Zheng, Z.; Pinkwart, N. An Ensemble Method to Predict Student Performance in an Online Math Learning Environment. In Proceedings of the 9th International Conference on Educational Data Mining, Raleigh, NC, USA, 29 June–2 July 2016; Available online: https://files.eric.ed.gov/fulltext/ED592647.pdf (accessed on 12 April 2023).
  46. Zohair, A.; Mahmoud, L. Prediction of Student’s performance by modelling small dataset size. Int. J. Educ. Technol. High. Educ. 2019, 16, 27.
  47. Kirsal Ever, Y.; Dimililer, K.; Sekeroglu, B. Comparison of machine learning techniques for prediction problems. In Web, Artificial Intelligence and Network Applications: Proceedings of the Workshops of the 33rd International Conference on Advanced Information Networking and Applications (WAINA-2019); Springer: Berlin/Heidelberg, Germany, 2019; pp. 713–723.
Figure 1. Methodology of the proposed system.
Figure 2. ROC curves of all of the utilized ML models on full features.
Figure 3. ROC curves of all of the models using Relief as a feature selection algorithm.
Figure 4. ROC curves of all of the models using FCBF as a feature selection algorithm.

Table 1. Dataset Description.

| S. No | Feature Name | Code | Description of the Feature |
|---|---|---|---|
| 1 | School access | SA | Easy access to school; Y = 1, N = 0 |
| 2 | Market access | MA | Access to market; Y = 1, N = 0 |
| 3 | Hospital access | HA | Access to hospital; Y = 1, N = 0 |
| 4 | Rented house | RH | Living in rented house; Y = 1, N = 0 |
| 5 | Main source of income | MI | Main source of family income; 1 = father, 2 = mother, 3 = brother, 4 = sister, 5 = other |
| 6 | Family income | FI | Total monthly income of the family; 1 = 5–20k, 2 = 21–40k, 3 = 41–50k, 4 = 51–60k, 5 = above state |
| 7 | Defendant Family | DF | 1 = 1–3, 2 = 4–7, 3 = 8–11, 4 = 12–15, 5 = MORE |
| 8 | Family Size | FS | Number of people in the family; 1 = 3–5, 2 = 5–8, 3 = 8–12, 4 = 12–15 |
| 9 | Father Education | FE | Highest education of the father; 1 = primary, 2 = High, 3 = Graduate, 4 = Postgraduate, 5 = No Education |
| 10 | Mother Education | ME | Highest education of the mother; 1 = primary, 2 = High, 3 = Graduate, 4 = Postgraduate, 5 = No Education |
| 11 | Father’s Occupation | FO | Occupation of the father; 1 = Govt. sector, 2 = private sector, 3 = business, 4 = Farmer, 5 = other, 6 = none |
| 12 | Mother’s Occupation | MO | Occupation of the mother; 1 = Govt. sector, 2 = private sector, 3 = business, 4 = Farmer, 5 = other, 6 = none |
| 13 | Transport facility | TF | Having transport facility; Y = 1, N = 0 |
| 14 | Traveling source | TS | Conveyance Type; 1 = By walk, 2 = By wan, 3 = by cycle, 4 = by car |
| 15 | Board-Exam percentage | BP | 1 = 33–50%, 2 = 51–60%, 3 = 61–70%, 4 = 71–80%, 5 = 81% and above, 6 = fail |
| 16 | Learning Facilities | LF | Having learning facilities; Y = 1, N = 0 |
| 17 | Studying at home | SH | Type of study at home; 1 = Family Member, 2 = Tutor, 3 = self-study |
| 18 | Education satisfaction | ES | Satisfied with the quality of education; Y = 1, N = 0 |
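
Table 2. Features selected by FCBF algorithm.

| S. No | Feature No | Feature Code | Feature Score |
|---|---|---|---|
| 1 | 17 | LF | 0.77 |
| 2 | 03 | AH | 0.56 |
| 3 | 14 | TF | 0.53 |
| 4 | 16 | BP | 0.53 |
| 5 | 15 | SA | 0.44 |
| 6 | 02 | AM | 0.31 |
| 7 | 18 | SH | 0.29 |
| 8 | 09 | FE | 0.23 |
| 9 | 05 | EF | 0.21 |
| 10 | 13 | NS | 0.031 |
| 11 | 10 | ME | 0.030 |
| 12 | 07 | DF | 0.012 |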
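
Table 3. Features selected by Relief algorithm.

| S. No | Feature No | Feature Code | Feature Score |
|---|---|---|---|
| 1 | 11 | FO | 0.32 |
| 2 | 03 | HA | 0.20 |
| 3 | 02 | MA | 0.16 |
| 4 | 10 | ME | 0.16 |
| 5 | 12 | MO | 0.14 |
| 6 | 18 | SH | 0.13 |
| 7 | 16 | BP | 0.13 |
| 8 | 01 | SA | 0.12 |
| 9 | 09 | FE | 0.09 |
| 10 | 17 | LF | 0.09 |
| 11 | 8 | FS | 0.01 |
| 12 | 07 | DF | 0.003 |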

Table 4. Confusion matrix of the proposed system.

| Actual | Predicted: Unaffected Student | Predicted: Affected Student |
|---|---|---|
| Unaffected student | TN | FP |
| Affected student | FN | TP |

Table 5. Utilized ML models and their parameter settings.

| Classifiers | Parameter Settings |
|---|---|
| DT | max_depth = 100, max_leaf_nodes = 13, min_samples_split = 10, random_state = 5, min_samples_leaf = 1 |
| MLP | Alpha = 0.007, random_state = 5, solver = ‘lbfgs’, activation = ‘relu’ |
| KNN | metric = ‘manhattan’, n_neighbors = 3, weights = ‘uniform’ |
| RF | max_depth = 10, min_samples_leaf = 5, min_samples_split = 5, n_estimators = 100 |
| NB | GaussianNB() |
| AB | n_estimators = 50, learning_rate = 1.0 |
| LR | C = 1.0, random_state = 5, solver = ‘liblinear’ |
| ET | n_estimators = 50, criterion = ‘gini’, min_samples_split = 2, min_samples_leaf = 1 |
| GB | learning_rate = 1.0, n_estimators = 50, max_depth = 1.0, random_state = 5 |

Table 6. Performance of the ML models using the full features of the dataset.

| Classifiers | Accuracy | Sensitivity | Specificity | Precision | Recall | AUC | F1 | MCC |
|---|---|---|---|---|---|---|---|---|
| DT | 73.22 | 70.73 | 75.86 | 73.10 | 73.12 | 75.90 | 0.73 | 0.466 |
| MLP | 72.38 | 76.34 | 69.86 | 74.11 | 72.12 | 79.20 | 0.73 | 0.450 |
| KNN (K = 3) | 78.00 | 86.20 | 73.68 | 81.13 | 78.00 | 83.50 | 0.74 | 0.576 |
| RF | 78.24 | 74.80 | 82.14 | 79.00 | 78.00 | 85.00 | 0.78 | 0.568 |
| NB | 69.87 | 68.06 | 71.66 | 70.00 | 70.00 | 76.50 | 0.70 | 0.397 |
| AB | 71.54 | 68.34 | 71.86 | 72.00 | 72.10 | 80.80 | 0.72 | 0.536 |
| LR | 71.54 | 70.08 | 72.95 | 72.05 | 72.00 | 78.50 | 0.72 | 0.405 |
| ET | 82.84 | 87.00 | 79.85 | 84.00 | 83.15 | 92.60 | 0.83 | 0.660 |
| GB | 77.40 | 80.19 | 75.36 | 78.00 | 77.06 | 84.90 | 0.78 | 0.549 |

Table 7. Performance of KNN model (K = 3 to 9) on full features.

| KNN (K = 3 to 9) | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| K = 3 | 78.24 | 86.20 | 73.68 |
| K = 4 | 71.12 | 84.84 | 65.89 |
| K = 5 | 75.31 | 80.43 | 72.10 |
| K = 6 | 71.54 | 81.33 | 67.07 |
| K = 7 | 73.64 | 76.02 | 71.94 |
| K = 8 | 69.95 | 74.41 | 66.66 |
| K = 9 | 75.73 | 75.22 | 76.19 |

Table 8. Performance of the ML models on selected features selected through Relief.

| Model | Accuracy | Sensitivity | Specificity | AUC | Precision | Recall | F1 | MCC |
|---|---|---|---|---|---|---|---|---|
| KNN (N = 3) | 78.24 | 83.16 | 75.00 | 85.30 | 80.00 | 78.00 | 0.78 | 0.567 |
| DT | 75.73 | 68.75 | 89.87 | 75.40 | 82.00 | 76.00 | 0.76 | 0.551 |
| ET | 83.68 | 83.76 | 83.61 | 91.20 | 84.00 | 84.00 | 0.84 | 0.674 |
| GB | 79.08 | 81.48 | 77.10 | 86.00 | 79.00 | 79.00 | 0.79 | 0.583 |
| RF | 82.01 | 78.13 | 86.49 | 88.10 | 83.00 | 82.00 | 0.82 | 0.645 |
| AB | 74.90 | 73.11 | 76.67 | 81.60 | 75.00 | 75.00 | 0.75 | 0.498 |
| NB | 74.89 | 72.65 | 77.48 | 78.30 | 75.00 | 75.00 | 0.75 | 0.500 |
| LR | 78.66 | 75.57 | 82.41 | 82.70 | 79.00 | 79.00 | 0.79 | 0.577 |
| MLP | 74.06 | 77.89 | 71.53 | 79.10 | 76.00 | 74.00 | 0.74 | 0.484 |

Table 9. Performance of the KNN model (K = 3 to 9) on selected features (Relief).

| KNN (K = 3 to 9) | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| K = 3 | 78.26 | 82.00 | 76.74 |
| K = 4 | 76.57 | 86.90 | 70.97 |
| K = 5 | 78.15 | 81.73 | 75.56 |
| K = 6 | 76.98 | 83.87 | 72.60 |
| K = 7 | 78.24 | 78.95 | 77.60 |
| K = 8 | 78.24 | 83.67 | 74.47 |
| K = 9 | 76.99 | 78.38 | 75.78 |

Table 10. Experimental results of the utilized ML models on selected features selected through FCBF.

| Model | Accuracy | Sensitivity | Specificity | AUC | Precision | Recall | F1 | MCC |
|---|---|---|---|---|---|---|---|---|
| KNN (K = 3) | 74.89 | 82.52 | 69.12 | 83.00 | 77.00 | 75.00 | 0.75 | 0.512 |
| DT | 71.12 | 64.97 | 82.93 | 75.10 | 77.00 | 71.00 | 0.72 | 0.455 |
| ET | 79.58 | 81.82 | 76.27 | 85.90 | 79.00 | 79.00 | 0.79 | 0.582 |
| GB | 82.01 | 83.87 | 80.00 | 87.00 | 82.00 | 82.00 | 0.82 | 0.640 |
| RF | 79.50 | 76.00 | 82.09 | 85.10 | 82.00 | 79.00 | 0.80 | 0.595 |
| AB | 71.13 | 71.64 | 70.48 | 78.80 | 72.00 | 72.00 | 0.72 | 0.420 |
| NB | 66.95 | 68.46 | 65.13 | 73.50 | 67.00 | 67.00 | 0.67 | 0.335 |
| LR | 71.12 | 71.62 | 70.47 | 76.00 | 71.00 | 71.00 | 0.71 | 0.419 |
| MLP | 72.80 | 68.89 | 77.88 | 73.50 | 74.00 | 73.00 | 0.73 | 0.464 |

Table 11. Performance of the KNN model (K = 3 to 9) using the FCBF feature selection algorithm.

| KNN (K = 3 to 9) | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| K = 3 | 74.89 | 73.95 | 67.50 |
| K = 4 | 70.71 | 73.95 | 67.50 |
| K = 5 | 77.82 | 73.95 | 67.50 |
| K = 6 | 70.71 | 73.95 | 67.50 |
| K = 7 | 73.64 | 73.95 | 67.50 |
| K = 8 | 70.71 | 73.95 | 67.50 |
| K = 9 | 70.71 | 73.95 | 67.50 |
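
Table 12. Comparative analysis of the proposed system with earlier approaches.

| Reference | Dataset | Approach/Model | Accuracy |
|---|---|---|---|
| Gerritsen et al. [44] | Moodle log | LR | 62.40% |
| Gerritsen et al. [44] | Moodle log | NN | 66.10% |
| Stapel et al. [45] | Own dataset (small dataset) | NB | 65.40% |
| Stapel et al. [45] | Own dataset (small dataset) | LR | 68.20% |
| Stapel et al. [45] | Own dataset (small dataset) | DT | 71.50% |
| Zohair et al. [46] | Own dataset (small dataset) | LDA | 71.10% |
| Zohair et al. [46] | Own dataset (small dataset) | SVM | 76.30% |
| Sekeroglu et al. [47] | SPD | BPNN | 79.22% |
| Sekeroglu et al. [47] | SAPD | BPNN | 80.10% |
| Proposed | Developed dataset (18 features) | Full features + ET | 82.84% |
| Proposed | Developed dataset (18 features) | Relief (12 features) + ET | 83.68% |
| Proposed | Developed dataset (18 features) | FCBF (12 features) + GB | 82.01% |
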
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
