Knowledge Discovery and Dataset for the Improvement of Digital Literacy Skills in Undergraduate Students
Abstract
1. Summary
Dataset | Year | High School | Undergraduate | Number of Observations | Purpose | Location |
---|---|---|---|---|---|---|
Open University Learning Analytics Dataset [13] | 2017 | - | ✓ | 22 courses, 32,593 students | Students’ interactions in the virtual learning environment (VLE) | Open University (OU) |
Digital Competency Observation Dataset [15] | 2019 | ✓ | - | 1061 students | Digital competency | Vietnam |
Academic Performance Evaluation Dataset [11] | 2020 | ✓ | ✓ | 12,411 students | Observe the influence of social variables and the evolution of students’ learning skills | Colombia |
Video Conferencing Tools Acceptance Dataset [14] | 2020 | - | ✓ | 277 records | Acceptance of video conferencing tools (VCTs) | Vietnam |
High-School Dropout Rate Dataset [10] | 2022 | ✓ | - | 1613 records | Student dropout rates | United States |
C# Programming Examination Dataset [12] | 2022 | - | ✓ | Unspecified | Academic results in C# programming language | Iraq, Sudan, Nigeria, South Africa, and India |
Undergraduate and High-School Dropout Rate Dataset [9] | 2022 | ✓ | ✓ | 50 records, 143,326 records | Student dropout rate | Mexico |
* RMUTT-DLD | 2023 | - | ✓ | 45,603 records | IC3 Digital Literacy Certification | Thailand |
2. Data Description
3. Methods
3.1. Raw Data
- Demographic data: basic information on each student, such as name, age (date of birth), home province, home district, first-entry GPA, current GPA, and faculty name.
- Academic data: enrollment records for a student’s education at RMUTT, including information on teachers, classes, and activities in the RMUTT LMS.
- IC3 digital literacy exam data: records of student exam results by digital literacy ability.
3.2. Data Cleansing
- Removing the duplicated data and unused columns from the raw dataset.
- Joining, merging, and splitting the data among sources using student ID as a key.
- Removing outliers from the data sources. For example, negative GPA values on the 4.0 scale were removed because they reflect data that were entered incorrectly at the source.
- Converting local units to international ones, such as years from B.E. to A.D., and simplifying the number of assignments submitted into four levels.
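The cleansing steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the record keys (`student_id`, `module`, `gpa`, `exam_year_be`, `submissions`) and the cut-offs used for the four submission levels are hypothetical, since this section names neither. The B.E. to A.D. conversion uses the standard 543-year offset.

```python
BE_OFFSET = 543  # a Buddhist Era year is the A.D. year plus 543


def be_to_ad(year_be: int) -> int:
    """Convert a Buddhist Era (B.E.) year to an A.D. year."""
    return year_be - BE_OFFSET


def submission_level(count: int) -> str:
    """Map a raw submission count to one of four levels.

    The cut-offs (0, 5, 10) are hypothetical; the paper does not state them.
    """
    if count <= 0:
        return "Lowest"
    if count <= 5:
        return "Low"
    if count <= 10:
        return "Medium"
    return "High"


def clean(exam_records: list[dict], demographics: dict[str, dict]) -> list[dict]:
    """Deduplicate, join on student ID, drop invalid GPAs, convert units."""
    seen = set()
    cleaned = []
    for rec in exam_records:
        # Step 1: skip exact repeats of the same student/module/year.
        key = (rec["student_id"], rec["module"], rec["exam_year_be"])
        if key in seen:
            continue
        seen.add(key)
        # Step 2: join with demographic data using student ID as the key.
        demo = demographics.get(rec["student_id"])
        if demo is None:
            continue
        # Step 3: drop GPAs outside the 4.0 scale (covers negative values).
        if not 0.0 <= demo["gpa"] <= 4.0:
            continue
        # Step 4: convert the year and simplify the submission count.
        cleaned.append({
            "student_id": rec["student_id"],
            "module": rec["module"],
            "gpa": demo["gpa"],
            "exam_year": be_to_ad(rec["exam_year_be"]),
            "submission_level": submission_level(rec["submissions"]),
        })
    return cleaned
```

Keeping each step as a separate check makes it easy to count how many records each rule removes, which is useful when documenting a released dataset.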
3.3. Data Anonymization and Release
4. Data Evaluation
- (1) IC3_Score, IC3_Result, and IC3_Exam_Timeused are strongly correlated with one another. IC3_Exam_Timeused is negatively correlated with performance: students who take more time to complete the exam tend to have lower scores.
- (2) IC3_Exam_Year, Std_Admit_Year, Class_Id, and Class_Academic_Year are positively correlated with IC3_Score and IC3_Result. This suggests that students who enrolled after the digital literacy learning procedure was introduced achieved better scores and higher pass rates.
- (3) Std_Entry_GPA and Std_Current_GPA are also positively correlated with IC3_Score and IC3_Result. Students with strong entry and current GPAs tend to obtain higher IC3 scores and are more likely to pass the exam.
- (4) Class_Teacher_Encode_Id is associated with IC3_Score and IC3_Result. This indicates that the choice of teacher can influence a student’s grades and overall success, as teachers may differ in how they deliver course materials.
- (5) Online_Assignment_Submission_Frequency is also correlated with IC3_Score and IC3_Result. A lower frequency of online assignment submission is associated with lower scores and pass rates.
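As a minimal sketch of how a pairwise relationship such as finding (1) could be checked, the snippet below computes Pearson's correlation coefficient by hand. Two things are assumptions here: this section does not say which correlation coefficient was used, and the sample values are made up for illustration rather than taken from the dataset.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Return Pearson's r for two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, NOT from RMUTT-DLD: exam time in seconds and scores.
time_used = [1200, 1500, 1800, 2400, 2900]
score = [880, 820, 760, 650, 540]

r = pearson(time_used, score)
print(f"r = {r:.3f}")  # a negative r means longer times go with lower scores
```

A coefficient near -1 or +1 indicates a strong linear relationship, but it does not by itself show that one variable causes the other.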
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Tinmaz, H.; Lee, Y.T.; Fanea-Ivanovici, M.; Baber, H. A Systematic Review on Digital Literacy. Smart Learn. Environ. 2022, 9, 21.
2. Ramaila, S.; Molwele, A.J. The Role of Technology Integration in the Development of 21st Century Skills and Competencies in Life Sciences Teaching and Learning. Int. J. High. Educ. 2022, 11, 9.
3. Alhassan, M.D.; Adam, I.O. The Effects of Digital Inclusion and ICT Access on the Quality of Life: A Global Perspective. Technol. Soc. 2021, 64, 101511.
4. Wittayasin, S. Education Challenges to Thailand 4.0. Int. J. Integr. Educ. Dev. 2017, 2, 29–35.
5. Tripopsakul, W. Preparing for Industry 4.0-Will Youths Have Enough Essential Skills? An Evidence from Thailand. Int. J. Instr. 2020, 13, 89–104.
6. Metee, P. Expectations of Hands-on Instructional Quality in the 21st Century Amongst Undergraduate Student: A Case Study at RMUTT. Adv. Sci. Lett. 2018, 24, 4507–4510.
7. Daungtod, S. A Study of Digital Literacy of 1st Year Computer Education Students Faculty of Education Nakhon Phanom University. In Proceedings of the ACM International Conference Proceeding Series; Association for Computing Machinery: New York, NY, USA, 2019; pp. 241–244.
8. Fernández-García, A.J.; Rodríguez-Echeverría, R.; Preciado, J.C.; Conejero Manzano, J.M.; Sánchez-Figueroa, F. Creating a Recommender System to Support Higher Education Students in the Subject Enrollment Decision. IEEE Access 2020, 8, 189069–189088.
9. Alvarado-Uribe, J.; Mejía-Almada, P.; Masetto Herrera, A.L.; Molontay, R.; Hilliger, I.; Hegde, V.; Montemayor Gallegos, J.E.; Ramírez Díaz, R.A.; Ceballos, H.G. Student Dataset from Tecnologico de Monterrey in Mexico to Predict Dropout in Higher Education. Data 2022, 7, 119.
10. Stein, M.; Leitner, M.; Trepanier, J.C.; Konsoer, K. A Dataset of Dropout Rates and Other School-Level Variables in Louisiana Public High Schools. Data 2022, 7, 48.
11. Delahoz-Dominguez, E.; Zuluaga, R.; Fontalvo-Herrera, T. Dataset of Academic Performance Evolution for Engineering Students. Data Brief 2020, 30, 105537.
12. Ibrahim, W.; Abdullaev, S.; Alkattan, H.; Adelaja, O.A.; Subhi, A.A. Development of a Model Using Data Mining Technique to Test, Predict and Obtain Knowledge from the Academics Results of Information Technology Students. Data 2022, 7, 67.
13. Kuzilek, J.; Hlosta, M.; Zdrahal, Z. Data Descriptor: Open University Learning Analytics Dataset. Sci. Data 2017, 4, 170171.
14. Pho, D.-H.; Nguyen, X.-A.; Luong, D.-H.; Nguyen, H.-T.; Vu, T.-P.-T.; Nguyen, T.-T.-T. Data on Vietnamese Students’ Acceptance of Using VCTs for Distance Learning during the COVID-19 Pandemic. Data 2020, 5, 83.
15. Le, A.V.; Do, D.L.; Pham, D.Q.; Hoang, P.H.; Duong, T.H.; Nguyen, H.N.; Vuong, T.T.; Nguyen, H.K.T.; Ho, M.T.; La, V.P.; et al. Exploration of Youth’s Digital Competencies: A Dataset in the Educational Context of Vietnam. Data 2019, 4, 69.
16. Wahbeh, A.H.; Al-Radaideh, Q.A.; Al-Kabi, M.N.; Al-Shawakfa, E.M. A Comparison Study between Data Mining Tools over Some Classification Methods. IJACSA Int. J. Adv. Comput. Sci. Appl. Spec. Issue Artif. Intell. 2020, 18, 72–76.
17. Chen, S.; Webb, G.I.; Liu, L.; Ma, X. A Novel Selective Naïve Bayes Algorithm. Knowl.-Based Syst. 2020, 192, 105361.
18. Christodoulou, E.; Ma, J.; Collins, G.S.; Steyerberg, E.W.; Verbakel, J.Y.; van Calster, B. A Systematic Review Shows No Performance Benefit of Machine Learning over Logistic Regression for Clinical Prediction Models. J. Clin. Epidemiol. 2019, 110, 12–22.
19. Chen, Y.; Hu, X.; Fan, W.; Shen, L.; Zhang, Z.; Liu, X.; Du, J.; Li, H.; Chen, Y.; Li, H. Fast Density Peak Clustering for Large Scale Data Based on KNN. Knowl.-Based Syst. 2020, 187, 104824.
20. Tyralis, H.; Papacharalampous, G.; Langousis, A. A Brief Review of Random Forests for Water Scientists and Practitioners and Their Recent History in Water Resources. Water 2019, 11, 910.
21. Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A Comprehensive Survey on Support Vector Machine Classification: Applications, Challenges and Trends. Neurocomputing 2020, 408, 189–215.
22. Yamazaki, K.; Vo-Ho, V.K.; Bulsara, D.; Le, N. Spiking Neural Networks and Their Applications: A Review. Brain Sci. 2022, 12, 863.
23. Sarlis, N.V.; Skordas, E.S.; Christopoulos, S.R.G.; Varotsos, P.A. Natural Time Analysis: The Area under the Receiver Operating Characteristic Curve of the Order Parameter Fluctuations Minima Preceding Major Earthquakes. Entropy 2020, 22, 583.
24. Maxwell, A.E.; Warner, T.A.; Guillén, L.A. Accuracy Assessment in Convolutional Neural Network-Based Deep Learning Remote Sensing Studies—Part 1: Literature Review. Remote Sens. 2021, 13, 2450.
No. | Field Name | Data Type | Description | Data Scope |
---|---|---|---|---|
1 | STD_ENCODE_ID | Text | Student’s encoded identifier. | The dataset holds 45,603 IC3 examination records. |
2 | IC3_MODULE_NAME | Text | IC3 certificate module name. This field has only three modules. | IC3 GS5—Computing Fundamentals IC3 GS5—Key Applications IC3 GS5—Living Online |
3 | IC3_EXAM_LANGUAGE | Text | Language for examination. | English/Thai. |
4 | IC3_SCORE | Integer | IC3 certificate score for each module. | 0 to 1000 points. |
5 | IC3_RESULT | Text | IC3 certificate result. Scores ≥ 700 pass; otherwise, fail. | Fail/Pass. |
6 | IC3_EXAM_TIMEUSED | Integer | The time that was used during the examination. | 0 to 3000 s. |
7 | IC3_EXAM_STATION | Text | Station of the test taker, mostly including building and computer name. For example, IWORK-201-01 is IWORK building, room number 201, and computer number 01. | There are 997 stations. Some are not in the standard format because they may use an extra building or computer. |
8 | IC3_EXAM_YEAR | DateTime (Year) | Year of IC3 examination in yyyy format, such as 2023. | 2016 to 2023 A.D. |
9 | STD_ENTRY_GPA | Float | Student’s first-entry GPA. | 1.0 to 4.0 on a 4.0 scale. |
10 | STD_CURRENT_GPA | Float | Student’s current GPA during the IC3 examination. | 0.0 to 4.0 on a 4.0 scale. |
11 | STD_ADMIT_YEAR | DateTime (Year) | Student’s admission year in yyyy format, such as 2022. | 2012 to 2022 A.D. |
12 | STD_FACULTYNAME_THAI | Text | Student’s faculty name in Thai. | There are 13 faculties. |
13 | STD_FACULTYNAME_ENG | Text | Student’s faculty name in English. | There are 13 faculties. |
14 | STD_DEPARTMENTNAME_THAI | Text | Student’s department name in Thai. | There are 43 departments. |
15 | STD_DEPARTMENTNAME_ENG | Text | Student’s department name in English. | There are 43 departments. |
16 | STD_HOME_PROVINCENAME | Text (GEO) | Student’s home province name in Thai. | There are 77 provinces in Thailand. |
17 | STD_HOME_DISTRICT | Text (GEO) | Student’s home district name in Thai. | There are 988 districts. |
18 | STD_CONTACT_ZIPCODE | Text (GEO) | Student’s contact zip code in Thailand. In general, some districts have the same contact zip code. | There are 855 contact zip codes. Some values are NA, which is undefined. |
19 | CLASS_ID | Text | Class identifier is used for classifying a class/section for RMUTT CITS. | There are 788 sections for the RMUTT CITS class. |
20 | CLASS_TEACHER_ENCODE_ID | Text | Teacher’s encoded identifier. This field distinguishes one lecturer from another. | There are 76 teachers, each with a distinct encoded ID; many taught multiple classes. |
21 | CLASS_ENROLLSEAT | Integer | Number of students who enrolled in a class. | Between 3 and 78 students in a class. |
22 | CLASS_ACADEMIC_YEAR | DateTime (Year) | Year of class opening in yyyy format, such as 2022. | 2015 to 2022 A.D. |
23 | CLASS_SEMESTER | Integer | Semester period in which the class opens. | Semester 1, 2, or 3. |
24 | ONLINE_ASSIGNMENT_SUBMISSION_FREQUENCY | Text | Frequency of online assignment submissions in related modules. This field was transformed to include four levels. | Lowest/Low/Medium/High |
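The data scopes in the table above can double as validation rules when loading the dataset. The sketch below is an assumption-laden illustration rather than a tool shipped with RMUTT-DLD: it treats each record as a plain dictionary keyed by the field names in the table, and the `validate` and `expected_result` helpers are hypothetical names introduced here.

```python
PASS_MARK = 700  # from the table: scores >= 700 pass, otherwise fail
LEVELS = {"Lowest", "Low", "Medium", "High"}


def expected_result(score: int) -> str:
    """Derive the pass/fail label implied by an IC3 score."""
    return "Pass" if score >= PASS_MARK else "Fail"


def validate(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    score = record["IC3_SCORE"]
    if not 0 <= score <= 1000:
        problems.append("IC3_SCORE outside 0 to 1000")
    elif record["IC3_RESULT"] != expected_result(score):
        problems.append("IC3_RESULT inconsistent with IC3_SCORE")
    if not 0 <= record["IC3_EXAM_TIMEUSED"] <= 3000:
        problems.append("IC3_EXAM_TIMEUSED outside 0 to 3000 s")
    if not 0.0 <= record["STD_CURRENT_GPA"] <= 4.0:
        problems.append("STD_CURRENT_GPA outside 0.0 to 4.0")
    if record["CLASS_SEMESTER"] not in (1, 2, 3):
        problems.append("CLASS_SEMESTER not 1, 2, or 3")
    if record["ONLINE_ASSIGNMENT_SUBMISSION_FREQUENCY"] not in LEVELS:
        problems.append("unknown submission frequency level")
    return problems
```

Returning a list of messages instead of raising on the first failure lets a loader report every issue in a record at once.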
Model | AUC | CA | F1 | Precision | Recall |
---|---|---|---|---|---|
Logistic Regression | 0.976 | 0.925 | 0.926 | 0.930 | 0.925 |
kNN | 0.976 | 0.921 | 0.921 | 0.922 | 0.921 |
Random Forest | 0.974 | 0.914 | 0.914 | 0.915 | 0.914 |
Neural Network | 0.974 | 0.902 | 0.902 | 0.902 | 0.902 |
Naïve Bayes | 0.952 | 0.896 | 0.896 | 0.899 | 0.896 |
SVM | 0.934 | 0.889 | 0.889 | 0.892 | 0.889 |
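In the table above, the Recall column equals CA for every model. That is what one would expect if the per-class metrics were averaged with weights proportional to class support, because support-weighted recall is algebraically identical to classification accuracy. Whether the authors' tool averaged this way is an assumption. The sketch below computes CA and the weighted precision, recall, and F1 from a confusion matrix; the counts are hypothetical and are not taken from the paper.

```python
def weighted_metrics(confusion: dict[str, dict[str, int]]) -> dict[str, float]:
    """Compute CA and support-weighted precision, recall, and F1.

    confusion[actual][predicted] holds the number of records.
    """
    labels = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[c].get(c, 0) for c in labels)
    precision = recall = f1 = 0.0
    for c in labels:
        tp = confusion[c].get(c, 0)
        support = sum(confusion[c].values())  # records whose true class is c
        predicted = sum(confusion[a].get(c, 0) for a in labels)
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"CA": correct / total, "Precision": precision,
            "Recall": recall, "F1": f1}


# Hypothetical counts for a Pass/Fail classifier, for illustration only.
matrix = {
    "Pass": {"Pass": 80, "Fail": 20},
    "Fail": {"Pass": 10, "Fail": 90},
}
metrics = weighted_metrics(matrix)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included because it needs predicted probabilities or scores rather than a confusion matrix.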
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nilaphruek, P.; Charoenporn, P. Knowledge Discovery and Dataset for the Improvement of Digital Literacy Skills in Undergraduate Students. Data 2023, 8, 121. https://doi.org/10.3390/data8070121