Education Data Mining

A special issue of Data (ISSN 2306-5729). This special issue belongs to the section "Information Systems and Data Management".

Deadline for manuscript submissions: closed (28 February 2022) | Viewed by 42939

Special Issue Editors


Guest Editor
Dipartimento di Statistica, Informatica, Applicazioni (DiSIA), Università di Firenze, I-50134 Firenze, Italy
Interests: multilevel models; latent variable models; causal inference; methods for the evaluation of public services

Guest Editor
Dipartimento di Statistica, Informatica, Applicazioni (DiSIA), Università di Firenze, I-50134 Firenze, Italy
Interests: analysis of algorithms and data structures; enumerative combinatorics; symbolic computation; databases and data mining; educational data mining

Guest Editor
Dipartimento di Statistica, Informatica, Applicazioni (DiSIA), Università di Firenze, I-50134 Firenze, Italy
Interests: multilevel models; duration models; causal inference; evaluation of educational systems

Guest Editor
Dipartimento di Statistica, Informatica, Applicazioni (DiSIA), Università di Firenze, I-50134 Firenze, Italy
Interests: databases and algorithms; analysis of algorithms and combinatorics; educational data mining

Special Issue Information

Dear Colleagues,

Many fields and sectors, from business, medicine and biology to public administration, are affected by the growth of data in computer systems. For this reason, it is important to develop new methodologies and technologies to manage and analyse all the information that can be derived from such large sources of data. In the field of education, educational data mining is a research area that uses data mining, machine learning and statistical methods to explore and analyse both the large repositories of data that schools and universities usually store in their databases for administrative purposes and the large amounts of information about teaching-learning interaction generated in e-learning or web-based educational contexts. Educational data mining considers a wide variety of data types, including but not limited to log files of interactive learning environments and intelligent tutoring systems, results of examinations and assessment tests, and student-produced artifacts. It seeks to use all this information to better understand the student learning process, and its results can be used by university or school management to improve the entire educational process. The use of data mining in the educational context mainly involves techniques such as clustering, classification, regression, text mining, association rule mining and sequential pattern analysis.

This Special Issue invites papers in the field of educational data mining that are significant and original and that clearly delineate their contributions to the literature, in terms of both data pre-processing and data organization techniques and algorithms for data analysis.

Topics of interest include, but are not limited to, the following:

  • New techniques for mining educational data
  • Evaluation of students' performance
  • Evaluation of curricula and university quality
  • Social network analysis of student and teacher interactions
  • Temporal patterns in student behavior
  • Text mining of educational documents
  • Students' evaluation of teaching
  • Publishing educational datasets that are useful for the context

Prof. Dr. Leonardo Grilli
Prof. Dr. Donatella Merlini
Prof. Dr. Carla Rampichini
Prof. Dr. Maria Cecilia Verri
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Data is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Research


19 pages, 2685 KiB  
Article
A Mixture Hidden Markov Model to Mine Students’ University Curricula
by Silvia Bacci and Bruno Bertaccini
Data 2022, 7(2), 25; https://doi.org/10.3390/data7020025 - 21 Feb 2022
Cited by 2 | Viewed by 3161
Abstract
In the context of higher education, the wide availability of data gathered by universities for administrative purposes or for recording the evolution of students’ learning processes makes novel data mining techniques particularly useful to tackle critical issues. In Italy, current academic regulations allow students to customize the chronological sequence of courses they have to attend to obtain the final degree. This leads to a variety of sequences of exams, with an average time taken to obtain the degree that may significantly differ from the time established by law. In this contribution, we propose a mixture hidden Markov model to classify students into groups that are homogenous in terms of university paths, with the aim of detecting bottlenecks in the academic career and improving students’ performance.
(This article belongs to the Special Issue Education Data Mining)

19 pages, 1423 KiB  
Article
Development of a Web-Based Prediction System for Students’ Academic Performance
by Dabiah Alboaneen, Modhe Almelihi, Rawan Alsubaie, Raneem Alghamdi, Lama Alshehri and Renad Alharthi
Data 2022, 7(2), 21; https://doi.org/10.3390/data7020021 - 29 Jan 2022
Cited by 22 | Viewed by 9197
Abstract
Educational Data Mining (EDM) is used to extract and discover interesting patterns from educational institution datasets using Machine Learning (ML) algorithms. There is much academic information related to students available. Therefore, it is helpful to apply data mining to extract factors affecting students’ academic performance. In this paper, a web-based system for predicting academic performance and identifying students at risk of failure through academic and demographic factors is developed. The ML model is developed to predict the total score of a course at the early stages. Several ML algorithms are applied, namely: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), Artificial Neural Network (ANN), and Linear Regression (LR). This model applies to the data of female students of the Computer Science Department at Imam Abdulrahman bin Faisal University (IAU). The dataset contains 842 instances for 168 students. Moreover, the results showed that the prediction’s Mean Absolute Percentage Error (MAPE) reached 6.34%, and the academic factors had a higher impact on students’ academic performance than the demographic factors, with the midterm exam score ranking highest. The developed web-based prediction system is available on an online server and can be used by tutors.
(This article belongs to the Special Issue Education Data Mining)

15 pages, 2494 KiB  
Article
Analysing Computer Science Courses over Time
by Renza Campagni, Donatella Merlini and Maria Cecilia Verri
Data 2022, 7(2), 14; https://doi.org/10.3390/data7020014 - 24 Jan 2022
Viewed by 2588
Abstract
In this paper we consider courses of a Computer Science degree in an Italian university from the year 2011 up to 2020. For each course, we know the number of exams taken by students during a given calendar year and the corresponding average grade; we also know the average normalized value of the result obtained in the entrance test and the distribution of students according to the gender. By using classification and clustering techniques, we analyze different data sets obtained by pre-processing the original data with information about students and their exams, and highlight which courses show a significant deviation from the typical progression of the courses of the same teaching year, as time changes. Finally, we give heat maps showing the order in which exams were taken by graduated students. The paper shows a reproducible methodology that can be applied to any degree course with a similar organization, to identify courses that present critical issues over time. A strength of the work is to consider courses over time as variables of interest, instead of the more frequently used personal and academic data concerning students.
(This article belongs to the Special Issue Education Data Mining)

19 pages, 1739 KiB  
Article
Dealing with Randomness and Concept Drift in Large Datasets
by Kassim S. Mwitondi and Raed A. Said
Data 2021, 6(7), 77; https://doi.org/10.3390/data6070077 - 19 Jul 2021
Cited by 5 | Viewed by 4611
Abstract
Data-driven solutions to societal challenges continue to bring new dimensions to our daily lives. For example, while good-quality education is a well-acknowledged foundation of sustainable development, innovation and creativity, variations in student attainment and general performance remain commonplace. Developing data-driven solutions hinges on two fronts: technical and application. The former relates to the modelling perspective, where two of the major challenges are the impact of data randomness and general variations in definitions, typically referred to as concept drift in machine learning. The latter relates to devising data-driven solutions to address real-life challenges such as identifying potential triggers of pedagogical performance, which aligns with the Sustainable Development Goal (SDG) #4, Quality Education. A total of 3145 pedagogical data points were obtained from the central data collection platform for the United Arab Emirates (UAE) Ministry of Education (MoE). Using simple data visualisation and machine learning techniques via a generic algorithm for sampling, measuring and assessing, the paper highlights research pathways for educationists and data scientists to attain unified goals in an interdisciplinary context. Its novelty derives from embedded capacity to address data randomness and concept drift by minimising modelling variations and yielding consistent results across samples. Results show that intricate relationships among data attributes describe the invariant conditions that practitioners in the two overlapping fields of data science and education must identify.
(This article belongs to the Special Issue Education Data Mining)

31 pages, 1021 KiB  
Article
Performing Learning Analytics via Generalised Mixed-Effects Trees
by Luca Fontana, Chiara Masci, Francesca Ieva and Anna Maria Paganoni
Data 2021, 6(7), 74; https://doi.org/10.3390/data6070074 - 9 Jul 2021
Cited by 10 | Viewed by 3671
Abstract
Nowadays, the importance of educational data mining and learning analytics in higher education institutions is being recognised. The analysis of university careers and of student dropout prediction is one of the most studied topics in the area of learning analytics. From the perspective of estimating the likelihood of a student dropping out, we propose an innovative statistical method that is a generalisation of mixed-effects trees for a response variable in the exponential family: generalised mixed-effects trees (GMET). We performed a simulation study in order to validate the performance of our proposed method and to compare GMET to classical models. In the case study, we applied GMET to model undergraduate student dropout in different courses at Politecnico di Milano. The model was able to identify discriminating student characteristics and estimate the effect of each degree-based course on the probability of student dropout.
(This article belongs to the Special Issue Education Data Mining)

Other


10 pages, 415 KiB  
Data Descriptor
A Dataset of Dropout Rates and Other School-Level Variables in Louisiana Public High Schools
by Michael Stein, Michael Leitner, Jill C. Trepanier and Kory Konsoer
Data 2022, 7(4), 48; https://doi.org/10.3390/data7040048 - 12 Apr 2022
Cited by 2 | Viewed by 6031
Abstract
Students dropping out of high school is a nationwide problem in the United States, plaguing communities and often greatly reducing the prospects of a quality life for those students who do not complete their high school education. The state of Louisiana consistently has among the highest public high school dropout rates in the United States and, often, the highest. This massive dataset of school variables covering a duration of five academic years (2014–2015 to 2018–2019) was originally compiled with the intention of identifying the factors that correlate with high school dropouts in Louisiana public high schools, specifically. However, it can be useful to any researchers interested in analyzing school-level data concerning a wide range of variables beyond merely dropout rates. This dataset also contains socioeconomic demographics, financial variables, class size, and much more. The correlation analyses ultimately revealed many intriguing insights into the relationships between the tested variables and the dropout rates.
(This article belongs to the Special Issue Education Data Mining)

10 pages, 714 KiB  
Data Descriptor
Dataset of Students’ Performance Using Student Information System, Moodle and the Mobile Application “eDify”
by Raza Hasan, Sellappan Palaniappan, Salman Mahmood, Ali Abbas and Kamal Uddin Sarker
Data 2021, 6(11), 110; https://doi.org/10.3390/data6110110 - 22 Oct 2021
Cited by 11 | Viewed by 10075
Abstract
The data presented in this article comprise an educational dataset collected from the student information system (SIS), the learning management system (LMS) called Moodle, and video interactions from the mobile application called “eDify.” The dataset, from the higher educational institution (HEI) in Sultanate of Oman, comprises five modules of data from Spring 2017 to Spring 2021. The dataset consists of 326 student records with 40 features in total, including the students’ academic information from SIS (which has 24 features), the students’ activities performed on Moodle within and outside the campus (comprising 10 features), and the students’ video interactions collected from eDify (consisting of six features). The dataset is useful for researchers who want to explore students’ academic performance in online learning environments, and will help them to model their educational datamining models. Moreover, it can serve as an input for predicting students’ academic performance within the module for educational datamining and learning analytics. Furthermore, researchers are highly recommended to refer to the original papers for more details.
(This article belongs to the Special Issue Education Data Mining)
