The Enhancement of Statistical Literacy: A Cross-Institutional Study Using Data Analysis and Text Mining to Identify Statistical Issues in the Transition to University Education
Abstract
1. Introduction
- (1) What are the recurring statistical issues posing challenges to students?
- (2) What are the similarities between the UCD MSC categories and the URJC categories?
- (3) How can text mining extract insights from open-text responses to uncover specific statistical difficulties?
2. Literature Review
2.1. Statistics Education Research
2.2. Text Mining in Education Research
3. Materials and Methods
3.1. Methodology for Categorization of URJC Modules
- Access the URJC website and identify every School:
  - a. Social Sciences and Law;
  - b. Sciences;
  - c. Health Sciences;
  - d. Engineering and Architecture;
  - e. Arts and Humanities.
- Identify every degree inside every School in which a statistics module is taught.
- Download every module webpage for every statistics module and identify the statistical Lessons for each module.
- Summarize, compare, and match the statistical Lessons across the modules.
- Check which Lessons are consistent across different modules and identify the main Lessons that statistics covers.
- Create a description for each Lesson.
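The matching step in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `syllabi` dictionary, its module names, and the `min_share` threshold are hypothetical placeholders, not the actual URJC modules or a rule taken from the study.

```python
from collections import Counter

# Hypothetical module syllabi: module name -> set of normalized Lesson titles.
# These entries are illustrative placeholders, not the actual URJC modules.
syllabi = {
    "module_a": {"data description", "probability", "statistical inference"},
    "module_b": {"data description", "probability", "random variables"},
    "module_c": {"data description", "bivariate data description", "probability"},
}


def consistent_lessons(syllabi, min_share=0.5):
    """Return the Lessons that appear in at least `min_share` of the modules."""
    counts = Counter(lesson for lessons in syllabi.values() for lesson in lessons)
    threshold = min_share * len(syllabi)
    return sorted(lesson for lesson, n in counts.items() if n >= threshold)


# Lessons shared by at least half of the (hypothetical) modules.
main_lessons = consistent_lessons(syllabi, min_share=0.5)
print(main_lessons)
```

In practice the Lesson titles would first need manual or rule-based normalization, since different module webpages may name the same content differently.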
3.2. Description of UCD MSC Data
- (1) Code: the module code that the student sought help for;
- (2) Name and Description: the presenting issue as categorized by the tutor for the session;
- (3) Time: time stamp of the student’s entry to the MSC;
- (4) Comment: the tutor’s open-text response outlining the statistical issue the student needed help with and how the tutor helped; in some cases, it also includes an open-text response by the student describing the statistical issue they are requesting help with. Students’ comments are indicated by quotation marks, for example, Student Query: “sampling distributions”.
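As a rough sketch of how one such record might be represented, the Python below models the four fields and pulls out a quoted student query from the Comment field. The class name `MscVisit`, the function `student_query`, and the regular expression are assumptions made for illustration; the source does not describe how the data were actually stored or parsed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MscVisit:
    """One MSC visit record with the fields described above (hypothetical type)."""
    code: str         # module code the student sought help for
    name: str         # presenting issue as categorized by the tutor
    description: str  # description of the presenting issue
    time: str         # time stamp, kept as text because formats may vary
    comment: str      # tutor's open-text response, may embed a student query


# Assumed pattern: Student Query: followed by text in curly or straight quotes.
STUDENT_QUERY = re.compile(r'Student Query:\s*[“"](.+?)[”"]')


def student_query(visit: MscVisit) -> Optional[str]:
    """Return the quoted student query from the comment, or None if absent."""
    match = STUDENT_QUERY.search(visit.comment)
    return match.group(1) if match else None


visit = MscVisit(
    code="STAT1xxxx",
    name="Continuous distributions (normal, exponential, uniform)",
    description="Continuous distributions (normal, exponential, uniform)",
    time="",  # left empty in this sketch
    comment="Student Query: “sampling distributions”. "
            "Student didn’t understand what a sampling distribution was.",
)
print(student_query(visit))
```

Separating the student's own wording from the tutor's account in this way would allow the two voices to be analysed independently.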
3.3. Classification of Tutor Comments in the UCD MSC Data
- Module II for those students who have issues in Lesson 3 and Lesson 4.
- Module I, Module II, and Module III for those students who have issues in all Lessons.
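The two combined categories above can be expressed as a small lookup rule. The sketch below assumes each comment has already been tagged with a set of Lesson numbers; the function name `combined_category` and the handling of single-Lesson and other combinations are assumptions, since the source only defines the two combined cases.

```python
def combined_category(lessons):
    """Map a set of Lesson numbers to a category label (illustrative rule)."""
    lessons = frozenset(lessons)
    if lessons == {3, 4}:
        # Issues in both probability Lessons.
        return "Module II"
    if lessons == {1, 2, 3, 4, 5}:
        # Issues in all Lessons (Lesson 6, software, is not part of this category).
        return "Module I, Module II, and Module III"
    if len(lessons) == 1:
        # Assumption: a single Lesson keeps its own label.
        return f"Lesson {next(iter(lessons))}"
    # Other combinations are not described in the source.
    return None


print(combined_category({3, 4}))
print(combined_category({1, 2, 3, 4, 5}))
print(combined_category({5}))
```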
3.4. Statistical Analysis
4. Results
4.1. RQ1: Lesson Data Descriptive Analysis
4.2. RQ2: Similarities between the UCD MSC Categories and the URJC Categories
4.3. RQ3: Insights from Open-Text Responses
5. Discussion
6. Conclusions and Future Research Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Topic | Lesson | Description |
---|---|---|
I. Descriptive Statistics. | Lesson 1: Data Description. | Basic Concepts. Variable Types. Summary of Graphical Data. Summary of Numerical Data. |
I. Descriptive Statistics. | Lesson 2: Bivariate Data Description. | Summary of Bivariate Data. Covariance, Correlation. Regression Line. |
II. Probability. | Lesson 3: Probability. | Random Events. Probability Definitions and Interpretation. Conditional Probability. Independence of Events. Law of Total Probability and Bayes’ Theorem. |
II. Probability. | Lesson 4: Random Variables. Special Random Variables. | Random Variable Definition. Types of Variables. Probability Mass Functions and Density Functions. Distribution Functions. Expectation and Variance. Some Distributions. |
III. Statistical Inference. | Lesson 5: Statistical Inference. | Introduction. Sampling. Definition of Statistical Inference. Central Limit Theorem. Point Estimation and Intervals for Average, Proportions and Variances. Statistical Hypothesis Testing and Decision Making. |
Code | Name | Description | Time | Comment |
---|---|---|---|---|
STAT1xxxx | Basic Statistics | Basic Statistics | 23 February 2015 12:28:01 | Student was unsure of a formula for a small sample t test that involved getting the pooled sample standard deviation; after explaining what the pooled sample s.d was and giving them the formula, they were happy to continue. They also wanted to know the difference between a two same t test and a two sample-paired t test. I explained this using simple examples of data sets where it would be better to use one over the other. |
STAT2xxxx | Other | Other | 19 February 2015 14:36:00 | Student was doing a probability question relating to testing positive for a disease and having the disease. After showing how to draw a probability tree, the student was able to finish the problem. |
STAT2xxxx | Basic Probability | Basic Probability | 23 February 2015 15:00:00 | Explained the difference between a continuous and discrete random variable. How to find the cumulative distribution function by integrating the probability distribution function—did an example with the exponential distribution. |
STAT1xxxx | Continuous distributions (normal, exponential, uniform) | Continuous distributions (normal, exponential, uniform) | 10:25:2 | Student Query: “sampling distributions”. Student didn’t understand what a sampling distribution was. |
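To make the idea of mining such comments concrete, the following Python sketch counts term frequencies over two comments shortened from the sample rows above. It is a minimal standard-library illustration, not the pipeline used in the study: the stop-word list, the tokenization rule, and the function name `term_frequencies` are all assumptions, and no stemming is applied, so "distribution" and "distributions" are counted separately.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, far smaller than a real one).
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "was", "what", "how",
    "between", "student", "didn",
}


def term_frequencies(comments, stop_words=STOP_WORDS):
    """Count lowercase alphabetic tokens, skipping stop words and short tokens."""
    counts = Counter()
    for comment in comments:
        tokens = re.findall(r"[a-z]+", comment.lower())
        counts.update(t for t in tokens if t not in stop_words and len(t) > 2)
    return counts


comments = [
    "Student Query: “sampling distributions”. "
    "Student didn’t understand what a sampling distribution was.",
    "Explained the difference between a continuous and discrete random variable.",
]

freqs = term_frequencies(comments)
print(freqs.most_common(5))
```

Counts like these are the usual input for word clouds or frequency bar charts of the most common terms in tutor comments.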
Topic | Lesson | Description |
---|---|---|
I. Descriptive Statistics. | Lesson 1: Data Description. | Basic Concepts. Variable Types. Summary of Graphical Data. Summary of Numerical Data. |
I. Descriptive Statistics. | Lesson 2: Bivariate Data Description. | Summary of Bivariate Data. Covariance, Correlation. Regression Line. |
II. Probability. | Lesson 3: Probability. | Random Events. Probability Definition and Interpretation. Conditional Probability. Independence of Events. Law of Total Probability and Bayes’ Theorem. |
II. Probability. | Lesson 4: Random Variables. Special Random Variables. | Random Variable Definition. Types of Variables. Probability Mass Function and Density Function. Distribution Function. Expectation and Variance. Some Distributions. |
III. Statistical Inference. | Lesson 5: Statistical Inference. | Introduction. Sampling. Definition of Statistical Inference. Central Limit Theorem. Point Estimation and Intervals for Average, Proportions and Variances. Statistical Hypothesis Testing and Decision Making. |
IV. Software. | Lesson 6: Statistics Software. | Statistical Computing. R Programming Language. |
Other. | Other. | Comments with Limited Information, Students who work alone, etc. |
High Level (Advanced Statistics). | High Level (Advanced Statistics). | Higher Courses, Content beyond that of Introductory or Service Level. |
Module II. | Lesson 3 and Lesson 4. | See descriptions for Lesson 3 and Lesson 4. |
Module I, Module II, and Module III. | Lesson 1, Lesson 2, Lesson 3, Lesson 4, and Lesson 5. | See descriptions for all Lessons (except Lesson 6). |
Lesson | Frequency (N) |
---|---|
Lesson 1 | 254 |
Lesson 2 | 298 |
Lesson 3 | 355 |
Lesson 4 | 609 |
Lesson 5 | 709 |
Lesson 6 | 96 |
High Level | 150 |
Lessons 3 and 4 | 104 |
Other | 1098 |
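The frequencies above are easier to compare as shares of the total. The Python sketch below copies the counts from the table and derives percentages; the variable names are arbitrary, and the rounding to one decimal place is a presentation choice rather than something taken from the source.

```python
# Counts copied from the Lesson frequency table.
lesson_counts = {
    "Lesson 1": 254,
    "Lesson 2": 298,
    "Lesson 3": 355,
    "Lesson 4": 609,
    "Lesson 5": 709,
    "Lesson 6": 96,
    "High Level": 150,
    "Lessons 3 and 4": 104,
    "Other": 1098,
}

total = sum(lesson_counts.values())

# Percentage share of each category, largest first.
shares = {name: 100 * n / total for name, n in lesson_counts.items()}
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

print(f"Total comments: {total}")
for name, share in ranked:
    print(f"{name:<16} {share:5.1f}%")
```

Sorting by share makes it clear that, apart from Other, Lesson 5 and Lesson 4 account for the largest numbers of classified comments.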
New Merged Category | UCD Categories | N | Merged N |
---|---|---|---|
Other | Other | 403 | 1184 |
Other | Other (please specify topic) | 665 | |
Other | Word Problem | 79 | |
Other | Student working alone for now | 21 | |
Other | Other (please specify topic) | 17 | |
Basic Statistics | Basic Statistics | 843 | 902 |
Basic Statistics | Standard deviation or variance | 44 | |
Basic Statistics | Graphs (reading, sketching and interpreting) | 15 | |
Basic Probability | Basic Probability | 196 | 263 |
Basic Probability | Basics of Probability theory | 67 | |
Random Vectors and Distributions | Continuous distributions (normal, exponential, uniform) | 167 | 325 |
Random Vectors and Distributions | Discrete distributions (binomial, Poisson, hypergeometric) | 82 | |
Random Vectors and Distributions | Random Vectors | 29 | |
Random Vectors and Distributions | Continuous Probability Distributions | 19 | |
Random Vectors and Distributions | Properties of Random Samples | 16 | |
Random Vectors and Distributions | Functions | 6 | |
Random Vectors and Distributions | Functions (exponential and logarithmic) | 6 | |
Hypothesis Testing and Confidence Intervals | Hypothesis Testing | 224 | 507 |
Hypothesis Testing and Confidence Intervals | Confidence Intervals | 199 | |
Hypothesis Testing and Confidence Intervals | Hypothesis test—One sample | 27 | |
Hypothesis Testing and Confidence Intervals | Hypothesis test—Two samples | 19 | |
Hypothesis Testing and Confidence Intervals | Statistical Inference | 17 | |
Hypothesis Testing and Confidence Intervals | Inference about linear regression | 9 | |
Hypothesis Testing and Confidence Intervals | Confidence interval—One sample | 6 | |
Hypothesis Testing and Confidence Intervals | Confidence interval—Two samples | 6 | |
Linear Regression | Linear Regression | 35 | 35 |
Integration | Integration | 88 | 88 |
Arithmetic | Arithmetic | 76 | 76 |
Statistical Software, e.g., Minitab, Excel, SPSS, R | Statistical Software, e.g., Minitab, Excel, SPSS, R | 37 | 37 |
Differentiation Rules | Differentiation Rules | 9 | 9 |
Differentiation | Differentiation | 7 | 7 |
Construction of Estimators | Construction of Estimators | 13 | 13 |
Basic Algebra | Basic Algebra | 8 | 8 |
Asymptotics | Asymptotics | 7 | 7 |
Advanced | Advanced | 36 | 36 |
Matrices | Matrices | 8 | 8 |
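A quick consistency check on the merged categories is to sum the original UCD counts in each group and compare the result with the printed merged total. The Python sketch below does this for the five groups that combine several UCD categories, with all numbers copied from the table; the variable names are arbitrary.

```python
# Merged category -> (original UCD category counts, printed merged total).
merged_groups = {
    "Other": ([403, 665, 79, 21, 17], 1184),
    "Basic Statistics": ([843, 44, 15], 902),
    "Basic Probability": ([196, 67], 263),
    "Random Vectors and Distributions": ([167, 82, 29, 19, 16, 6, 6], 325),
    "Hypothesis Testing and Confidence Intervals":
        ([224, 199, 27, 19, 17, 9, 6, 6], 507),
}

# Keep only the groups whose parts do not add up to the printed total.
mismatches = {
    name: (sum(parts), printed)
    for name, (parts, printed) in merged_groups.items()
    if sum(parts) != printed
}

for name, (computed, printed) in mismatches.items():
    print(f"{name}: parts sum to {computed}, table shows {printed}")
```

With the counts as listed, four of the five groups agree with their printed totals, while the parts of Other sum to 1185 against a printed 1184. The source does not explain this one-count difference, so it may reflect a record counted under two labels or a transcription slip.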
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
de la Hoz-Ruiz, A.; Howard, E.; Hijón-Neira, R. The Enhancement of Statistical Literacy: A Cross-Institutional Study Using Data Analysis and Text Mining to Identify Statistical Issues in the Transition to University Education. Information 2024, 15, 567. https://doi.org/10.3390/info15090567