Article

Leveraging Machine Learning Techniques to Investigate Media and Information Literacy Competence in Tackling Disinformation

by José Manuel Alcalde-Llergo 1,2, Mariana Buenestado Fernández 3, Carlos Enrique George-Reyes 4, Andrea Zingoni 2 and Enrique Yeguas-Bolívar 1,*

1 Department of Computer Science, University of Córdoba, 14071 Córdoba, Spain
2 Department of Economics, Engineering, Society and Business Organization, University of Tuscia, 01100 Viterbo, Italy
3 Department of Education, University of Córdoba, 14071 Córdoba, Spain
4 Dirección de Investigación, Vicerrectoría de Investigación y Vinculación, Universidad Bolivariana del Ecuador, Durán 170507, Ecuador
* Author to whom correspondence should be addressed.
Information 2025, 16(11), 929; https://doi.org/10.3390/info16110929
Submission received: 24 September 2025 / Revised: 17 October 2025 / Accepted: 21 October 2025 / Published: 23 October 2025

Abstract

This study develops machine learning models to assess Media and Information Literacy (MIL) skills specifically in the context of disinformation among students, particularly future educators and communicators. While the digital revolution has expanded access to information, it has also amplified the spread of false and misleading content, making MIL essential for fostering critical thinking and responsible media engagement. Despite its relevance, predictive modeling of MIL in relation to disinformation remains underexplored. To address this gap, a quantitative study was conducted with 723 students in education and communication programs using a validated survey. Classification and regression algorithms were applied to predict MIL competencies and identify key influencing factors. Results show that complex models outperform simpler approaches, with variables such as academic year and prior training significantly improving prediction accuracy. These findings can inform the design of targeted educational interventions and personalized strategies to enhance students’ ability to critically navigate and respond to disinformation in digital environments.

1. Introduction

The current international political agenda underscores the need to develop and implement policies, action plans and strategies aimed at promoting MIL. These efforts seek to raise public awareness, strengthen prevention capacities and foster resilience against disinformation and the spread of misinformation [1]. The relevance of these issues is reflected in the UN General Assembly Resolution (2021), which calls on Member States and stakeholders to design and implement specific measures to promote MIL. Moreover, it highlights the importance of increasing awareness, enhancing prevention capacity and reinforcing resilience in the face of disinformation and false information. This approach is also recognized as a key solution in UNESCO’s Windhoek +30 Declaration (2021), where one of the three core principles to counter disinformation is fostering critical thinking among citizens, promoted through MIL initiatives.
However, a report from the same organization warns that young people, due to their constant interaction with digital platforms, are particularly vulnerable to disinformation. Approximately 70% of young people globally are connected online but still lack solid and sustainable training in MIL to tackle the challenges of the digital age [2]. The importance of adapting the guidelines of international organizations as a foundation for innovating the development of practices that strengthen professional training and address specific issues related to education and communication fields is often emphasized [3].
In the development of good MIL practices to counter disinformation, the scientific literature has sought to identify a competency profile that, through processes of social and professional transfer, allows for the selection of training proposals that positively impact the teaching culture and student body [4,5].
In recent years, MIL has evolved beyond its traditional focus on media access, critical consumption and message production to address new challenges posed by digital misinformation and disinformation [2]. While MIL generally promotes critical thinking and informed engagement with media, its application in the context of disinformation is distinctive because it requires not only the evaluation of information credibility but also an understanding of the intentional manipulation of content and its socio-political consequences. Disinformation represents a deliberate and systematic distortion of information, designed to mislead or polarize audiences, which demands higher-order cognitive, ethical, and civic competencies. Therefore, exploring MIL specifically as a means to counter disinformation highlights a more proactive and defensive dimension of literacy, one that equips individuals not just to interpret media critically, but to actively resist and respond to manipulative information flows in digital environments.
MIL against disinformation, like other topics in educational sciences, can be addressed through more advanced research designs and analytical techniques. These approaches open numerous opportunities in the educational field, in line with the strategic recommendations established in the Beijing Consensus on Artificial Intelligence and Education [6]. Such advancements have the potential to transform educational processes through improved monitoring, evaluation and research [7]. In particular, Machine Learning (ML) offers a wide range of algorithms capable of analyzing and interpreting complex data with great precision, facilitating the creation of predictive models that adapt to individual circumstances and the specific progress of each participant in the study [8].
While recent advances in generative artificial intelligence have expanded the possibilities for content creation and educational innovation, they also raise concerns about reliability, bias and hallucinated outputs that can distort information and amplify misinformation. These challenges underscore the need for stronger MIL skills to critically assess and responsibly engage with artificial intelligence-generated content. In contrast, the ML approach adopted in this study serves a different purpose: it functions as an analytical framework for identifying patterns and predicting competencies based on empirical data, without generating or altering information. By leveraging ML in this way, the study contributes to evidence-based educational research while remaining aligned with the ethical imperatives of fostering critical, data-informed literacy among students.
Various ML techniques have been employed to address the challenge of detecting disinformation on social networks, a topic that has received considerable attention, as evidenced by several literature reviews [9,10,11]. However, the application of these techniques remains in an early stage as methodological proposals in the field of research on MIL to combat disinformation [12]. In this context, the present article provides a valuable and original contribution to the scientific literature by analyzing the performance of ML models in predicting sociodemographic variables of future professionals in the fields of education and communication, specifically focusing on their perception of MIL against disinformation. By exploring the intersection between ML and the prediction of sociodemographic variables, this study expands our understanding of the potential of AI techniques to reveal relevant information and support informed decision-making in the educational field.
The article is organized as follows: Section 1 presents the object of study and its relevance. Section 2 reviews the background and analyzes previous studies related to MIL against disinformation, as well as the associated variables and the use of ML in the educational field. Section 3 details the methodology, including the research design, study variables, data collection instrument and participant characteristics. Section 4 outlines the experimental design, evaluating the performance of ML models in three key tasks: classification of the knowledge branch, selection of the most relevant variables and regression of key factors. Section 5 presents the results obtained during the study. Section 6 provides a comprehensive discussion of the findings, highlights their implications and outlines the study’s limitations. Finally, Section 7 concludes the article by highlighting the main findings and proposing directions for future research.

2. Background, Variables and Research Model

The spread of disinformation in digital environments has highlighted the importance of MIL as a key competency for thoughtfully engaging with media content. Research suggests that MIL is not only about accessing information but also about developing ethical and analytical thinking skills to analyze, evaluate and create content responsibly. This section reviews previous studies on MIL and its relationship with disinformation, focusing on three key aspects.
First, it explores how MIL empowers individuals to identify misinformation, detect biases and navigate the complexities of the modern information landscape. Then, the demographic and academic factors, such as gender, age, academic background and previous training, that influence MIL levels are examined. Finally, it discusses how ML methods have been used to assess and predict MIL competencies, offering insights into their role in educational contexts.

2.1. MIL Against Disinformation

MIL is a set of key knowledge, skills and attitudes that enable individuals to access, analyze, evaluate and create media and informational content critically and responsibly. Scientific studies on this topic indicate that MIL is not only about technical competencies to access information but also involves a critical and ethical capacity to understand the messages conveyed by the media, as well as the contexts in which they are produced and distributed [13].
It has been shown that people with strong MIL are capable of discerning between reliable and unreliable sources, identifying biases in content and understanding the intentions and objectives of information producers [14,15]. This ability is especially crucial in a world where disinformation, fake news and algorithms prioritizing sensationalist and polarizing content dominate many platforms [16].
Experts emphasize that MIL empowers individuals, granting them control over the information they consume and produce [17,18]. Instead of being passive consumers, people should be active participants who evaluate, question and create content that promotes a more democratic, transparent and respectful society.
The MIL approach requires disciplinary convergence and employs a conceptual framework grounded in various fields of knowledge, drawing inspiration particularly from communication studies, sociology, psychology, political science, engineering and education sciences [19]. Specifically, edu-communication is based on the interconnection between two traditionally separate intellectual fields, education and communication, which now converge to address a new social and educational need [20].
Osuna-Acedo et al. [21] highlight the need for communicators to take on greater social responsibility and for educators to integrate the relevance of media into their pedagogical methods.

2.2. Associated Variables

A wide range of variables has been examined in the literature to explain differences in MIL levels, particularly in the context of disinformation. Representative factors commonly reported include gender, age, academic year, academic field (education or communication) and prior training in media literacy or disinformation. In addition, MIL has been associated with broader contextual and behavioral dimensions, such as learners’ socio-cultural background, digital engagement and patterns of media consumption, which together shape how individuals acquire and apply critical information skills. These variables were selected because they have been repeatedly associated with variations in critical evaluation, digital engagement and exposure to media education, allowing the analysis to focus on demographic and academic factors most consistently linked to MIL in previous studies.
Regarding gender, although women tend to be more reflective than men, they are still less likely than men to frequently verify the accuracy of information in the media [22]. Additionally, female university students have greater difficulty identifying fake news when it comes from media sources with a humorous tone [23]. However, other studies report no significant gender differences or even higher verification rates among women [24]. Furthermore, studies indicate that women are more likely to engage with social media content in ways that can increase exposure to misinformation, highlighting a potential gender-related susceptibility to false information [25]. These findings underscore the complexity of gender-related patterns in MIL and the need for further research to clarify these relationships.
The studies by [26,27] hypothesized that online searches are more likely to lead young people to validate true news than to refute false ones. However, they concluded that age does not influence their ability to identify false information. Similarly, Romero-Rodríguez et al. [4] reported that levels of media competence are not linked to specific age groups. While associating university students with the terms “digital natives” and “digital immigrants” may be useful to describe instrumental access to platforms and digital competencies, these terms do not guarantee an adequate level of media literacy or critical media consumption [28].
The study by [29] also highlights differences between first-year and final-year students in communication-related degrees concerning the effectiveness of tools acquired during their studies, emphasizing the need to review and improve university training strategies.
On the other hand, Herrero-Diz et al. [23] emphasized significant differences in media competence in the face of disinformation between university students in communication and education. In particular, education students face greater difficulties in correctly interpreting aspects such as the accuracy, authorship and purpose of texts.
Lastly, Cherner and Curry [30] discussed that adequate training in media literacy is essential for individuals to develop the competencies necessary to critically evaluate and verify information. People with training in this area face fewer difficulties in interpreting content, a conclusion supported by studies such as those by [31,32], which emphasize from different perspectives that media literacy is the only effective tool to protect society against disinformation. However, Adjin-Tettey [33] warns that one-off training, while positive, is insufficient to consolidate strong critical skills. In this sense, a continuous and consistent educational approach is essential to enable students to identify and evaluate information more effectively.

2.3. Application of ML Techniques in Educational Perceptions

ML is a subfield of AI that focuses on developing models and algorithms capable of learning and making predictions or decisions based on data, without requiring explicitly programmed instructions for each task [34]. The literature review by [35] on the use of ML in educational sciences concludes that, although the impact of these methods in research and their practical applications in this field is still limited, it is constantly growing and is significantly transforming traditional approaches to educational research. Scientific evidence has shown that one of the main advantages of ML over traditional approaches, such as linear regression, is its ability to incorporate non-parametric components, making it more adaptable and versatile in practical applications as a predictive model [36,37].
In educational contexts, this adaptability allows ML algorithms to analyze large amounts of heterogeneous student data to detect individual learning trajectories and predict areas where students may struggle. By identifying these patterns, ML systems can recommend or even generate content, assessments and feedback that match each learner’s profile. In this way, ML facilitates the design of personalized curricula that adapt dynamically to students’ needs, optimizing teaching strategies and fostering more effective and equitable learning progress [38,39].
ML has found various applications in the educational field, including the exploration of more subjective aspects, such as students’ perceptions of their level of educational competencies. Examples of this include competency in computational thinking [40], competency in open education [41,42], digital teaching competency [43], competency in complex thinking [41] and competency in global citizenship [44,45].
Some relevant studies are related to the topic under study, although the application of ML techniques in research on MIL is still in a relatively nascent phase. The study by [46] focuses on predicting the level of media literacy through the use of classification algorithms, such as random forest (RF), decision trees (DT) and support vector machines (SVM), with the aim of building predictive models. The results show that media literacy skills among students are limited, with 61.4% of participants at low literacy levels. Two key factors for improving these skills are the interpretation of media messages and the use of various sources and devices.
On the other hand, the study by [47] analyzes student participation in media literacy campaigns on TikTok using ML techniques. The study explored how topic modeling, sentiment analysis and network analysis can provide valuable insights into students’ interactions with these campaigns. In a different approach, Mohd Nadzir and Abu Bakar [48] proposed a predictive model of digital media literacy in the context of distance education. During the data analysis phase, they selected five ML algorithms as classifier models: logistic regression (as a baseline model), k-nearest neighbor (KNN), RF, SVM and multilayer perceptron. Additionally, they employed five cross-validation configurations to evaluate the models. The comparison of results was conducted using metrics such as accuracy, precision, recall and F-measure. The best model, based on the SVM algorithm, achieved the highest accuracy: 82.9% for generation-based learning and 41.4% for device-based learning. This study identified interesting patterns according to the generation of students, revealing that different generations employ digital literacy in distinct ways. Thus, the proposed predictive model demonstrated that the generation of students is a useful indicator for evaluating their level of digital media literacy, especially in the case of distance education students.

3. Methodology

This section presents the methodology used in this research, including the research design, study variables, the instrument employed for data collection and the participants involved. The focus is on understanding how MIL impacts the ability to critically engage with disinformation in educational contexts, particularly for future educators and communicators. The research design and corresponding data collection methods were therefore structured to develop and validate predictive models of MIL competencies based on sociodemographic and academic factors, allowing for the identification of variables that most strongly influence students’ capacity to address disinformation.

3.1. Research Design and Study Variables

The study followed a quantitative design and employed a survey-based approach to assess students’ MIL in the context of disinformation. From a temporal perspective, this is a cross-sectional study, as data were collected at a single point in time. The selection of study variables is directly informed by the literature reviewed in Section 2. Prior research has shown that MIL competencies are influenced by demographic and academic factors, such as gender, age, academic year, field of study and previous training in media literacy or misinformation. These factors were therefore included as independent (predictor) variables. The dependent (criterion) variables correspond to the core dimensions of MIL identified in previous studies and discussed in Section 2, namely: (1) knowledge about misinformation, (2) skills and behaviors in responding to misinformation and (3) attitudes of commitment and responsibility toward misinformation. This conceptualization aligns with empirical findings showing that effective MIL requires not only cognitive understanding but also behavioral competencies and ethical attitudes. By linking predictor and criterion variables to established literature, the study ensures that the analyses are grounded in the theoretical and empirical context of MIL in the face of disinformation.

3.2. Instrument

For this research, a questionnaire titled “Perception of Future Edu-Communicators on Disinformation” was designed. The instrument consists of 25 items organized on a 4-point Likert scale (1 = None, 2 = Little, 3 = Quite a bit, 4 = A lot). Its structure is divided into three main sections:
  • Sociodemographic variables: Five items collect information on gender, age, academic year, academic field and prior training in MIL and disinformation.
  • Theoretical dimensions of MIL in relation to disinformation: Five items for each theoretical dimension, covering knowledge, skills and attitudes toward disinformation.
  • Responsibility toward disinformation: Five items assess the perception of responsibility as future edu-communicators in combating disinformation.
Regarding the origin of the instrument, no existing questionnaire in the literature addressed perceptions of MIL in relation to disinformation for both future educators and communicators while aligning with our research objectives and hypotheses. Therefore, an ad hoc questionnaire was developed based on a review of the literature and documents from prestigious organizations. Concerning the structure of the instrument, the three dimensions (knowledge, skills and attitudes) are intrinsic to the concept of competence [49]. Moreover, UNESCO [2] emphasizes that digital communication poses challenges such as disinformation. To address these challenges and foster a media- and information-literate citizenship, it is essential to promote media education and critical thinking, particularly during the initial training of education and communication professionals. Taking this critical observation into account, the dimension of responsibility was added. The items corresponding to the different dimensions were identified from previously validated questionnaires used in studies such as those by [29,50,51]. Subsequently, these items were reformulated and adapted to fit the specific characteristics of the participants in this study.
To assess content validity, the expert judgment method was employed. A purposive sample of five specialists was selected. These experts met the following criteria: doctoral studies in education or communication, experience in research projects, scientific output related to media literacy and disinformation and at least twelve years of research experience. Agreement among the experts was evaluated with Kendall’s W test, which yielded statistically significant results for all three criteria: clarity ($p < 0.001$), coherence ($p = 0.032$) and relevance ($p = 0.003$). The quantitative evaluations did not suggest further changes to the instrument; however, the wording of some items was adapted in response to the qualitative comments provided by the experts during the judgment process.
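For reference, Kendall’s coefficient of concordance on which this test is based is defined as (standard formulation, included here for clarity; it is not reproduced from the source, which reports only the p-values):

$$W = \frac{12 \sum_{i=1}^{n} \left(R_i - \bar{R}\right)^2}{m^2\,(n^3 - n)},$$

where $m$ is the number of experts, $n$ the number of items rated, $R_i$ the sum of the ranks assigned to item $i$ and $\bar{R}$ the mean of the $R_i$.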

3.3. Participants and Data Collection

The sample consisted of a total of 723 university students from two fields of study: 435 from Education degrees and 288 from Communication degrees. The study was conducted in a region in northern Spain, where both degrees share the Media and Information Literacy competency in their curricula. Regarding gender, 201 participants were men and 522 were women. Among the sociodemographic data, a significant tendency was observed for women to specialize in fields related to communication and education, a phenomenon reflected in the participants’ profile [50,52].
The age distribution of the participants was as follows: 277 students were between 17 and 19 years old, 384 were between 20 and 22 years old, and 62 were over 23 years old. Additionally, participants came from all four academic years, distributed as follows: 209 students were in their first year, 161 in their second year, 161 in their third year and 192 in their fourth year. Regarding prior training in disinformation, 218 students reported having received training on this topic, while 505 indicated they had not.
The inclusion criteria for the sample were enrollment in one of the two aforementioned fields of study and access to devices with an internet connection to respond to the online questionnaire. A convenience sampling method was used, as participants were selected from students enrolled in courses taught by the researchers.
Participants were informed about the purpose of the study, and their responses were anonymous and confidential. All gave explicit, voluntary consent to participate in this research.

3.4. Procedure and Data Preprocessing

The data collected were processed using SPSS v.25 and Python 3.14. In the first phase, an exploratory factor analysis was conducted to assess the validity and reliability of the attitude questionnaire, aiming to identify the empirical structure of the data. In the second phase, the study’s objectives were addressed.
Before conducting the experiments, a data preprocessing step was performed. This process included normalization and encoding of categorical variables using the One-Hot Encoding technique. Additionally, the final scores for each subject were computed for the four categories: knowledge, skills, attitudes and responsibilities. As a result, the final dataset consists of 30 features.
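A minimal sketch of this preprocessing step, using hypothetical column and file names (the actual variable names are not given in the paper, and the normalization scheme is assumed to be min-max scaling):

```python
# Minimal preprocessing sketch (hypothetical column names, not the authors'
# actual variables): one-hot encoding of categorical predictors, aggregation
# of the 4-point Likert items into dimension scores, and normalization.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("mil_survey.csv")  # hypothetical file with the 723 responses

# One-hot encode the categorical sociodemographic variables
categorical = ["gender", "age_group", "academic_year", "field", "disinfo_training"]
df = pd.get_dummies(df, columns=categorical)

# Sum the Likert items (1-4) into a final score per MIL dimension
dimensions = {
    "Knowledge": [f"K{i}" for i in range(1, 6)],
    "Skills": [f"S{i}" for i in range(1, 6)],
    "Attitudes": [f"A{i}" for i in range(1, 6)],
    "Responsibility": [f"R{i}" for i in range(1, 6)],
}
for dim, items in dimensions.items():
    df[dim] = df[items].sum(axis=1)

# Scale all numeric features to [0, 1] (the paper says "normalization"
# without specifying the scheme; min-max scaling is assumed here)
numeric_cols = df.select_dtypes("number").columns
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])
```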

4. Experimentation

This section describes the experimental design of this study. Specifically, the objective is to analyze the performance of different ML models in three key tasks:
  • Classification of the Knowledge Branch: This task evaluates whether students’ MIL profiles differ systematically between the two academic areas considered: Education and Communication. The purpose is not merely to label students but to identify discipline-specific patterns that may influence the development of MIL competencies. This helps reveal latent relationships in the dataset that are not easily observable through descriptive statistics alone.
  • Selection of the Most Relevant Variables: Different feature-selection techniques were applied to determine which variables contribute most to classification performance and to interpret their educational significance in predicting MIL outcomes.
  • Regression of the Key Factors: Regression models were trained to estimate the scores students would achieve in each MIL dimension (Knowledge, Skills, Attitudes and Responsibility). This allows the identification of which sociodemographic and academic factors most strongly predict each competence and the extent of their influence.
All algorithms and configurations used for the different tasks were implemented locally with open-source Python libraries (scikit-learn [53] and LightGBM [54]).
The experiments were conducted using the dataset derived from the questionnaire and data collection described in Section 3.2 and Section 3.3. After preprocessing, each row in the dataset represented one student and included both sociodemographic information and aggregated scores for the four MIL dimensions: Knowledge, Skills, Attitudes and Responsibility. In total, 723 valid responses were available, considering 30 numerical features for input into the different models. The following subsections provide a detailed breakdown of each stage in the experimental process.

4.1. Classification of the Knowledge Branch

To conduct the experimentation for the prediction of the knowledge branch, various ML algorithms and different hyperparameter configurations have been considered. The algorithms and configurations are as follows:
  • Support Vector Machine (SVM): Different configurations were explored by varying the regularization parameter C among {0.1, 1, 10, 100}, the kernel function among {radial basis function (RBF), linear, polynomial} and the parameter $\gamma$, which controls the influence of training samples in defining the decision boundary. Instead of manually selecting $\gamma$, two widely used and theoretically justified heuristics were employed:
    – $\gamma = \frac{1}{n_{\text{features}} \cdot \operatorname{Var}(X)}$, where $n_{\text{features}}$ is the number of input features and $\operatorname{Var}(X)$ is their variance;
    – $\gamma = \frac{1}{n_{\text{features}}}$, which depends only on the number of features.
  • Decision Tree (DT): Experiments were conducted by varying the maximum tree depth among {unrestricted, 5, 10, 20} and the minimum number of samples required to split a node among {2, 5, 10, 20}.
  • Random Forest (RF): Different numbers of DTs (estimators) were considered, choosing among {10, 50, 100}. Additionally, maximum tree depths {unrestricted, 5, 10, 20} and minimum sample sizes for node splitting {2, 5, 10, 20} were explored.
  • Light Gradient Boosting Machine (LightGBM): The model was evaluated with different numbers of boosting iterations (estimators) among {50, 100, 200}, maximum depths among {unrestricted, 5, 10} and learning rates among {0.01, 0.1, 0.2}.
  • k-Nearest Neighbors (KNN): The number of nearest neighbors was varied among {3, 5, 7, 9}, using two different weighting schemes: uniform (equal weight to all neighbors) and distance-based (closer neighbors have greater influence). Additionally, two distance metrics were considered: Euclidean and Manhattan.
This set of algorithms was selected because they represent a diverse set of widely used ML paradigms, ranging from linear (SVM) and distance-based (KNN) approaches to tree-based and ensemble methods (DT, RF, LightGBM), allowing a comprehensive comparison of both simple and complex classifiers for this prediction task.
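As an illustration, these search spaces map naturally onto scikit-learn and LightGBM parameter grids. The sketch below is a hypothetical reconstruction, not the authors’ code; in scikit-learn, the two $\gamma$ heuristics correspond to the built-in "scale" and "auto" settings:

```python
# Hypothetical reconstruction of the search spaces described above as
# scikit-learn / LightGBM parameter grids (not the authors' exact code).
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from lightgbm import LGBMClassifier

param_grids = {
    "SVM": (SVC(), {
        "C": [0.1, 1, 10, 100],
        "kernel": ["rbf", "linear", "poly"],
        "gamma": ["scale", "auto"],  # the two heuristics given above
    }),
    "DT": (DecisionTreeClassifier(), {
        "max_depth": [None, 5, 10, 20],  # None = unrestricted depth
        "min_samples_split": [2, 5, 10, 20],
    }),
    "RF": (RandomForestClassifier(), {
        "n_estimators": [10, 50, 100],
        "max_depth": [None, 5, 10, 20],
        "min_samples_split": [2, 5, 10, 20],
    }),
    "LightGBM": (LGBMClassifier(), {
        "n_estimators": [50, 100, 200],
        "max_depth": [-1, 5, 10],  # -1 = unrestricted depth in LightGBM
        "learning_rate": [0.01, 0.1, 0.2],
    }),
    "KNN": (KNeighborsClassifier(), {
        "n_neighbors": [3, 5, 7, 9],
        "weights": ["uniform", "distance"],
        "metric": ["euclidean", "manhattan"],
    }),
}
```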
To ensure proper generalization and avoid overfitting the data, the dataset has been split into a training subset (80%) and a test subset (20%). All model training, as well as hyperparameter calculation, has been performed on the training set, while the final validation has been conducted using the test subset. Furthermore, this data division was carried out randomly but stratified, maintaining the same proportion of the classes to be predicted in both sets. These training and test sets have been the same for each of the experiments conducted to ensure that the comparison between different models and configurations is fair.
Cross-validation has also been performed on the training set in each of the experiments, obtaining average scores for different combinations of training and validation data. These scores have been used to decide which classifier is the best for our task and apply it to the test set. The different folds in the cross-validation were also constructed in a stratified manner.
Due to the imbalance between the “Communication” class (288 students) and the “Education” class (435 students), the F1-score was adopted as the main evaluation metric instead of overall accuracy. The F1-score is the harmonic mean of precision (the proportion of correctly predicted positive cases among all positive predictions) and recall (the proportion of actual positive cases that were correctly identified), providing a balanced measure of performance when class sizes are unequal. The performance of the final model is subsequently reported using accuracy as well. Additionally, the considered models, detailed in the following section, were trained with balanced class weights to address this imbalance simply and improve final generalization.
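Formally, $F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$. A minimal sketch of the evaluation protocol described above, continuing the previous sketch and assuming a preprocessed feature matrix X and binary labels y (0 = Education, 1 = Communication); the fold count is an assumption, as Section 4.1 does not state it:

```python
# Evaluation protocol sketch: stratified 80/20 split, stratified CV on the
# training set, grid search selected by F1, balanced class weights.
# Assumed inputs: X, y (y encoded 0/1) and param_grids from the sketch above.
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV

# Stratified hold-out split preserving the class proportions in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Stratified folds for model selection on the training set (5 folds assumed)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

best = {}
for name, (estimator, grid) in param_grids.items():
    # Balanced class weights where the estimator supports them (KNN does not)
    if "class_weight" in estimator.get_params():
        estimator.set_params(class_weight="balanced")
    search = GridSearchCV(estimator, grid, scoring="f1", cv=cv)
    search.fit(X_train, y_train)
    best[name] = (search.best_params_, search.best_score_)
```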

4.2. Selection of the Most Relevant Features for Predicting the Knowledge Branch

Regarding feature selection, different methods were considered and subsequently compared by applying our best model to the reduced subsets of features they produced. The methods considered are:
  • SelectKBest (SKB): This method employs the ANOVA (f_classif) scoring function to evaluate feature importance, selecting the top k features based on their scores. In this case, the selected features were obtained using the dataset transformed by this method.
  • Forward Feature Selector (FFS): Using an RF model with specific hyperparameters obtained from the best model configuration for the training data, this method selects features sequentially in a forward manner and optimizes the selection based on the accuracy metric using stratified five-fold cross-validation.
  • Recursive Feature Elimination (RFE): Also based on the previously configured RF model, this method recursively eliminates features until the desired number of selected features is reached.
  • Feature Importance Based on DT: Using a DT classifier, the importance of each feature was calculated, selecting the most relevant n features according to the generated importance scores (corresponding to the first splits in the tree branches).
The feature-selection techniques were chosen to combine complementary strategies: univariate statistical ranking (SKB), sequential wrapper-based optimization (FFS), recursive elimination (RFE) and model-based importance estimation (DT). This combination ensures that both individual feature relevance and inter-feature dependencies are considered in the conducted analysis.
Finally, in order to identify the optimal feature subset for predicting the field of knowledge, we will use as a baseline the best-performing model obtained from the experimentation aimed at optimizing this prediction.
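A hedged sketch of the four selection strategies, reusing X_train, y_train and cv from the previous sketches; the subset size k is illustrative rather than taken from the paper:

```python
# Four complementary feature-selection strategies (illustrative sketch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import (
    SelectKBest, f_classif, RFE, SequentialFeatureSelector
)

k = 5  # number of features to retain (assumed for illustration)
rf = RandomForestClassifier(class_weight="balanced", random_state=42)  # stand-in for the tuned RF

# SKB: univariate ANOVA (f_classif) ranking, keeping the top-k features
skb_mask = SelectKBest(f_classif, k=k).fit(X_train, y_train).get_support()

# FFS: forward sequential selection optimizing accuracy with stratified 5-fold CV
ffs_mask = SequentialFeatureSelector(
    rf, n_features_to_select=k, direction="forward", scoring="accuracy", cv=cv
).fit(X_train, y_train).get_support()

# RFE: recursively drop the weakest features until k remain
rfe_mask = RFE(rf, n_features_to_select=k).fit(X_train, y_train).get_support()

# DT importances: keep the k features with the highest impurity-based importance
dt = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
dt_top = np.argsort(dt.feature_importances_)[::-1][:k]
```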

4.3. Regression of the Measured Competencies

The final experiment will involve performing a regression to estimate the scores a student would achieve in each of the four assessed competencies: Knowledge, Skills, Attitudes and Responsibility. As input variables, all the students’ sociodemographic characteristics will be considered, along with their course, field of study and the scores obtained in the remaining competencies. To conduct the experimentation, various regression models and different hyperparameter configurations have been considered. The algorithms and configurations are as follows:
  • Linear Regression (LR): This model captures the linear relationship between predictors and the target variable, estimating how each input feature proportionally influences the predicted competency score. The model was evaluated by considering whether to include an intercept term in the equation. The inclusion of an intercept allows the model to better fit datasets where the dependent variable does not naturally pass through the origin.
  • Ridge Regression (RR): RR measures the same linear relationships as LR but adds a regularization term that penalizes large coefficients. Different values for the regularization strength parameter α were explored, selecting from {0.01, 0.1, 1.0, 10.0}, where higher values impose stronger penalties on large coefficients to prevent overfitting. Additionally, different optimization solvers were tested:
    Singular Value Decomposition (SVD): A matrix factorization method that decomposes the design matrix into singular vectors and singular values.
    Least Squares (LSQR): An iterative method based on the conjugate gradient approach, solving the normal equations for least squares problems.
    SAGA: A stochastic gradient-based method that updates coefficients in small batches, making it efficient for high-dimensional and sparse datasets.
  • DT Regressor: This model identifies non-linear, hierarchical relationships by recursively partitioning the feature space, creating threshold-based rules that predict competency scores. The tree-based model was evaluated using different maximum depths among {unrestricted, 5, 10, 20}, where deeper trees can capture more complex patterns but may lead to overfitting. The minimum number of samples required to split a node was varied among {2, 5, 10, 20} to regulate tree complexity. Furthermore, two different criteria for measuring the quality of a split were considered:
    Squared Error: Minimizes the variance within each node, leading to mean-squared-error minimization.
    Absolute Error: Focuses on minimizing the mean absolute deviation, which is more robust to outliers.
  • RF Regressor: The RF model extends the DT approach by combining multiple trees to capture more complex patterns and reduce variance. This ensemble learning method was tested with different numbers of trees (estimators) among {10, 50, 100}, balancing computational cost and predictive performance. The features for each of the trees within the forest and the split criteria are the same as those considered for the DT Regressor.
These regression models were selected to capture different types of relationships between predictors and target variables: linear dependencies (LR, RR) and non-linear or hierarchical structures (DT, RF). Comparing them enables assessment of whether students’ competencies can be better explained by proportional or more complex interactions among variables.
Finally, the pipeline followed for the development of this experimental stage in terms of validation of the models remained identical to that of Section 4.1.
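A brief sketch of this regression comparison, under the assumption that X_reg contains the sociodemographic features plus the other three competence scores and y_reg the target competence; stratified folds do not apply to continuous targets, so plain shuffled K-fold is used here, and the hyperparameters shown are single representative points from the grids above:

```python
# Regression model comparison via cross-validated RMSE (illustrative sketch).
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

regressors = {
    "LR": LinearRegression(fit_intercept=True),
    "RR": Ridge(alpha=1.0, solver="svd"),
    "DT": DecisionTreeRegressor(max_depth=10, min_samples_split=5,
                                criterion="squared_error"),
    "RF": RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42),
}

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for name, model in regressors.items():
    # cross_val_score returns negated RMSE, so flip the sign for reporting
    rmse = -cross_val_score(model, X_reg, y_reg, cv=kf,
                            scoring="neg_root_mean_squared_error")
    print(f"{name}: mean RMSE = {rmse.mean():.3f} (std {rmse.std():.3f})")
```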

5. Results

This section presents the main findings derived from the implementation of ML models to predict students’ MIL competencies. The results are organized into four subsections. First, we validate the questionnaire used for data collection and assess its reliability. Next, we examine the models’ classification performance across the two primary knowledge branches, Education and Communication. We then identify the most relevant features contributing to accurate classification and prediction. Finally, we analyze the performance of regression models used to estimate MIL competencies, highlighting key factors that influence their predictive accuracy.

5.1. Preliminary Results: Validation and Reliability of the Questionnaire

An analysis of the main psychometric properties of the instrument was carried out to verify its reliability and validity. The results showed an acceptable internal consistency (Cronbach’s alpha of 0.77) and a high sampling adequacy (Kaiser-Meyer-Olkin = 0.85). Bartlett’s test of sphericity was significant ($\chi^2(190) = 2995.42$; $p < 0.001$), confirming that the correlation matrix was suitable for factor analysis. An exploratory factor analysis was then conducted using the principal component method with Varimax rotation. Although the eigenvalue-greater-than-one criterion initially suggested four factors, the fourth component was residual and lacked conceptual coherence. Therefore, a three-factor solution was retained, explaining 66.86% of the total variance. This configuration provided a meaningful and interpretable structure aligned with both empirical and theoretical considerations. The interpretation of each factor was based on the content of the items with the highest loadings (≥0.40) and their conceptual relationship with existing frameworks of MIL and disinformation competence. Table 1 shows the items that configure each factor.
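For reference, the internal-consistency coefficient reported here follows the standard Cronbach’s alpha formulation (included for clarity; not reproduced from the source):

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_t^2}\right),$$

where $k$ is the number of items, $\sigma_i^2$ the variance of item $i$ and $\sigma_t^2$ the variance of the total score.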
Considering the content of the items that most saturated each factor, the following interpretation was reached:
  • Factor 1: Knowledge about disinformation. This dimension includes five items reflecting conceptual understanding of disinformation, i.e., its definition, characteristics and differences from related terms such as fake news or hoaxes. It also encompasses awareness of the mechanisms that facilitate the spread of false content and the risks it poses to informed citizenship.
  • Factor 2: Skills and behaviors against disinformation. Comprising six items, this factor captures the ability to verify information accuracy by consulting multiple and credible sources, such as established media outlets, fact-checking organizations and official online platforms. It also reflects students’ discernment in evaluating information from social networks or close contacts.
  • Factor 3: Attitudes of commitment and responsibility towards disinformation. This factor, composed of nine items, emphasizes ethical and civic engagement in countering disinformation. It involves recognizing the role of responsible communication in safeguarding freedom of expression and democratic participation, as well as acknowledging the professional duty, particularly among future educators and communicators, to disseminate truthful information in an ethical and reflective manner.
Together, these three empirically derived dimensions capture the multifaceted nature of students’ MIL competencies in the context of disinformation, integrating cognitive, behavioral and attitudinal components in a coherent and statistically supported structure.
Once the construct structure was clarified, internal consistency was calculated for each factor, with Cronbach’s alpha values of 0.77, 0.79 and 0.86, all at acceptable-to-good levels.

5.2. Classification Results for the Knowledge Branch

Table 2 presents the classification performance of the evaluated models. The comparison is based on F1-score and accuracy for both training and test sets, providing insights into the models’ generalization capabilities.
The results indicate that SVM achieved the highest performance, with a test F1-score of 0.815 and accuracy of 0.814, demonstrating both high predictive power and generalization ability. The RF Classifier and LightGBM followed closely, with F1-scores of 0.744 and 0.792, respectively. These ensemble models outperformed the DT Classifier, which had the lowest test F1-score (0.679) and accuracy (0.676), suggesting that a single-tree approach may be less effective for this classification task.
The KNN model showed moderate performance, with a test F1-score of 0.779 and accuracy of 0.779, surpassing the DT and RF classifiers but falling short of SVM and LightGBM. Notably, KNN exhibited a low standard deviation in training accuracy (0.020), suggesting stable performance across different training splits.
Overall, SVM emerged as the most effective model, followed by LightGBM and KNN, while DT performed the worst. The results highlight the advantage of more complex models that leverage multiple decision boundaries over simpler tree-based or distance-based classifiers.

5.3. Feature Selection Results for Knowledge-Branch Prediction

Feature selection plays a critical role in ML models, as it can significantly impact both model interpretability and predictive performance. This section presents the results of different feature-selection methods applied to the dataset, aiming to identify the most relevant features for classification. Table 3 summarizes the selected features and the corresponding model performance metrics.
The feature selectors exhibit notable variations in the features they prioritize. The SKB method selected three of the Knowledge category items, K1, K2 and K3, along with the Disinfo Training condition in both its False and True states. This suggests that these specific features hold significant discriminatory power according to univariate statistical tests. The FFS identified a different subset, incorporating K3, A3 and R5, while also retaining the Disinfo Training condition in both states. Notably, the inclusion of A3 and R5 suggests that this method captures additional dependencies between features that may not be evident through individual statistical scoring. The RFE approach selected a broader range of features, including Academic Year, K1 and K2, along with the Disinfo Training condition. The presence of academic year indicates that RFE might emphasize structural data attributes that influence model performance. Finally, the DT method produced a unique feature subset, selecting Academic Year, K3, A4 and S5, while disregarding the Disinfo Training = True feature. This difference implies that the DT structure favors hierarchical relationships among features rather than purely statistical significance.
Figure 1 further highlights the importance of specific features across different selection methods. Notably, Disinfo Training = False was the most frequently selected feature, appearing in all four selection methods, followed closely by Disinfo Training = True, which was chosen three times. This suggests that the disinformation training condition is a key determinant in the classification process. Additionally, K3 was selected three times, reinforcing its importance, whereas Academic Year, K1 and K2 were each chosen twice, indicating their moderate but relevant contribution to predictive performance.
Among all feature-selection methods, FFS achieved the highest test F1-score (0.756) and accuracy (0.759), indicating that its selected feature set contributed most effectively to model generalization. The DT-based selection followed closely with an F1-score and accuracy of 0.738, suggesting that its feature set was also effective but slightly less optimal for this classification task. Interestingly, SKB and RFE exhibited similar performance, with both reaching an F1-score of 0.751–0.752 and test accuracy of 0.752, reinforcing the idea that simpler selection methods based on statistical significance and elimination heuristics can still yield competitive results.
Overall, these findings indicate that FFS provided the most robust feature-selection strategy, likely due to its ability to iteratively assess combinations of features, leading to improved generalization. This highlights the importance of considering feature interactions rather than relying solely on individual statistical measures when optimizing feature selection for classification tasks. The frequency analysis in Figure 1 further supports this conclusion, as FFS incorporated highly recurrent features such as K3 and Disinfo Training variables, which were also selected by other methods.

5.4. Regression Results for Measured Competencies

Table 4 presents the predictive performance of the regression models for the different objective variables. The results are evaluated based on the Root Mean Squared Error (RMSE) for both the training phase, using cross-validation and the test phase. Lower RMSE values indicate better predictive accuracy.
The results indicate that the best model varies depending on the objective variable. Knowledge was best predicted by LR, with a test RMSE of 2.287. This suggests that a simple linear relationship between the input features and the target variable is sufficient for accurate predictions. Skills and Responsibility were best modeled by RF, achieving test RMSE values of 1.835 and 2.064, respectively. The ensemble approach of RF likely allowed for better handling of complex, nonlinear relationships in these cases. Attitudes were best captured by RR, with a test RMSE of 1.598. The use of regularization in RR likely helped in controlling model complexity and improving generalization.
Among all models, RR for Attitudes achieved the lowest test RMSE (1.598), indicating the best predictive performance overall. The RF models exhibited higher variability in their training RMSE, especially for Responsibility, with a standard deviation of 0.715, which suggests sensitivity to different training subsets. In contrast, LR and RR showed more stable performance across cross-validation folds.

6. Discussion and Limitations

The preceding section presented the main findings of the machine learning analyses. This section discusses their implications, situates them within the existing literature and addresses the study’s limitations.
From an educational perspective, the findings presented in Section 5.2 highlight the effectiveness of ML tools in assessing MIL competencies. The SVM model performed the best, suggesting that more complex approaches are essential for accurately evaluating MIL skills, as they can consider multiple variables and interactions. The strong performance of RF and LightGBM models further emphasizes the importance of using methods that can handle complex relationships in the data. These models are particularly useful in assessing students’ knowledge, skills and attitudes toward misinformation, which is increasingly vital in modern education. On the other hand, the lower performance of the DT model highlights the limitations of simpler approaches, indicating that more robust models are needed for assessing MIL in future educators and communicators. Finally, while KNN had lower accuracy, its stability suggests that it could still be valuable in smaller groups or less variable contexts.
Comparing these outcomes with prior works reveals several points of convergence. Studies such as Reddy et al. [46] and Mohd Nadzir and Abu Bakar [48] also reported the superiority of non-linear classifiers, particularly SVM, in educational prediction tasks. Similarly, our results support the notion that ensemble models outperform single-tree approaches when modeling students’ perceptual data, due to their robustness to noise and ability to generalize across heterogeneous populations.
The analysis of feature selection also provides meaningful insights. The feature-selection methods demonstrated that variables related to disinformation training were consistently important across all methods, underlining the critical role that prior training in misinformation plays in shaping students’ understanding and responses. The results also show that knowledge items (K1, K2, K3) and attitude and responsibility items (A3, R5) were influential in accurately predicting the knowledge branch. Additionally, the highest performance of the FFS method suggests that educational approaches should not only focus on standalone knowledge but also incorporate a more holistic understanding of how various factors, like academic year or previous training, interact to influence students’ competencies. The variability in feature importance across methods further suggests that educational systems should take a multifaceted approach when assessing and addressing MIL, ensuring that all relevant factors are considered to improve training and education on misinformation.
Finally, regarding the regression of the competencies, the results emphasize the importance of choosing appropriate regression models for different competencies in education. LR is effective for predicting knowledge, as it handles simple relationships well. For skills and responsibility, RF excels due to its ability to manage complex, nonlinear relationships. RR performs best for predicting attitudes, thanks to its stability through regularization. These findings suggest that selecting the right model based on the nature of the competency can enhance the accuracy of educational assessments.
Despite these contributions, some limitations must be acknowledged. First, the study relies on self-reported data, which may be subject to social desirability bias or inaccuracies in participants’ self-perception. Second, ML models require large volumes of data to generate accurate predictions, which can be challenging in studies with relatively small or heterogeneous samples. Moreover, the sample was restricted to students in education and communication degrees from a single geographic region, which may constrain the generalizability of the results. Future research could address these issues by combining self-perception data with performance-based assessments and expanding the sample to include other disciplines and institutions.

7. Conclusions

This study demonstrates the potential of ML tools in assessing MIL competencies among students. Key findings show that more complex models, such as SVM, RF and LightGBM, provide better insights into students’ abilities, highlighting the need for a holistic approach that considers multiple factors, such as academic year and prior training, when evaluating MIL skills.
The research emphasizes that educational systems should integrate various elements of students’ backgrounds to improve their understanding of misinformation. It suggests that incorporating ML tools and considering a variety of student-related factors can significantly improve how misinformation is addressed within educational frameworks, ultimately fostering more critical and informed students.
Importantly, this study contributes to the international academic debate on MIL in the context of disinformation by bridging theoretical understanding and advanced methodological approaches. While prior research has largely relied on survey-based assessments or traditional statistical analyses, this study demonstrates how ML can be used to model and predict MIL competencies, providing a more nuanced and data-driven understanding of the factors influencing students’ skills. Furthermore, by examining sociodemographic variables such as gender, academic year and prior training, the study highlights patterns and disparities that have been inconsistently addressed in the existing literature, thereby offering new insights into how these factors shape MIL development across diverse educational contexts.
Finally, the findings provide practical implications for designing targeted interventions and evidence-based educational strategies. Unlike previous studies that have primarily focused on general MIL competencies, this work explicitly addresses MIL against disinformation, demonstrating the relevance of predictive analytics for tailoring educational initiatives. By linking ML-based predictions with theoretically grounded MIL dimensions, the study extends the current literature on MIL, informing both policy recommendations and pedagogical practices aimed at enhancing students’ ability to critically navigate and respond to disinformation in digital environments.
Future research should investigate the generalizability of these findings across diverse regions, educational programs and cultural contexts. Moreover, the integration of more advanced ML techniques, such as deep learning, has the potential to further improve prediction accuracy. Subsequent studies could also examine the effectiveness of targeted interventions informed by these predictive models in enhancing students’ MIL competencies against disinformation.

Author Contributions

Conceptualization, J.M.A.-L., M.B.F., C.E.G.-R., A.Z. and E.Y.-B.; Methodology, J.M.A.-L., M.B.F., C.E.G.-R. and E.Y.-B.; Resources, J.M.A.-L., M.B.F. and E.Y.-B.; Data curation, J.M.A.-L., M.B.F. and E.Y.-B.; Writing—original draft preparation, J.M.A.-L., M.B.F., C.E.G.-R. and E.Y.-B.; Writing—review and editing, J.M.A.-L., M.B.F., C.E.G.-R., A.Z. and E.Y.-B.; Supervision, M.B.F. and E.Y.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was conducted with the support of the project “Media and Digital Literacy in Young People and Adolescents: Diagnosis and Educational Innovation Strategies to Prevent Risks and Promote Good Practices on the Internet,” funded by the Department of Universities of the Government of Cantabria (Spain). Additionally, these results are part of UNITE: University Network for Inclusive and digiTal Education, project funded by the European Union within the Erasmus+ programme (call 2023), Key Action 2: KA220 Cooperation Partnerships for higher education (2023-1-IT02- KA220-HED-0001621181).

Institutional Review Board Statement

The research adhered strictly to ethical regulations, including Spain’s Organic Law 3/2018 on the Protection of Personal Data and the principles outlined in the Declaration of Helsinki. Ethical approval was obtained from the university’s institutional ethics committee (CEIH-25-54) on 11 July 2025.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

José Manuel Alcalde-Llergo enrolled in the National PhD in Artificial Intelligence, XXXVIII cycle, course on Health and Life Sciences, organized by Università Campus Bio-Medico di Roma. He is also pursuing his doctorate with co-supervision at the Universidad de Córdoba (Spain), enrolled in its PhD program in Computation, Energy and Plasmas.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Marcos-Vílchez, J.M.; Sánchez-Martín, M.; Muñiz-Velázquez, J.A. Effectiveness of training actions aimed at improving critical thinking in the face of disinformation: A systematic review protocol. Think. Skills Creat. 2024, 51, 101474.
  2. UNESCO. Ciudadanía Alfabetizada en Medios e Información: Pensar Críticamente, Hacer Clic Sabiamente. Available online: https://unesdoc.unesco.org/ark:/48223/pf0000385119 (accessed on 23 September 2025).
  3. Mateus De Oro, C.; Jabba, D.; Erazo-Coronado, A.M.; Aguaded, I.; Campis Carrillo, R. Educommunication and ICT: From a corpus to a model of educational intervention for critical attitude. Technol. Pedagog. Educ. 2024, 33, 235–254.
  4. Romero-Rodríguez, L.M.; Contreras-Pulido, P.; Pérez-Rodríguez, M.A. Media competencies of university professors and students. Comparison of levels in Spain, Portugal, Brazil and Venezuela. Cult. Educ. 2019, 31, 326–368.
  5. Alcolea-Díaz, G.; Reig, R.; Mancinas-Chávez, R. UNESCO’s Media and Information Literacy curriculum for teachers from the perspective of Structural Considerations of Information. Comunicar 2020, 28, 103–114.
  6. UNESCO. Beijing Consensus on Artificial Intelligence and Education. Available online: https://unesdoc.unesco.org/ark:/48223/pf0000368303 (accessed on 23 September 2025).
  7. OECD. Pushing the Frontiers with Artificial Intelligence, Blockchain and Robots; OECD Digital Education Outlook; OECD: Paris, France, 2021.
  8. Zhai, X. Practices and Theories: How Can Machine Learning Assist in Innovative Assessment Practices in Science Education. J. Sci. Educ. Technol. 2021, 30, 139–149.
  9. Aïmeur, E.; Amri, S.; Brassard, G. Fake news, disinformation and misinformation in social media: A review. Soc. Netw. Anal. Min. 2023, 13, 39.
  10. Sanaullah, A.R.; Das, A.; Das, A.; Kabir, M.; Shu, K. Applications of machine learning for COVID-19 misinformation: A systematic review. Soc. Netw. Anal. Min. 2022, 12, 94.
  11. Korkmaz, C.; Correia, A.P. A review of research on machine learning in educational technology. Educ. Media Int. 2019, 56, 250–267.
  12. Fastrez, P.; Landry, N. Media Literacy and Media Education Research Methods: A Handbook; Routledge: Oxfordshire, UK, 2023.
  13. Knaus, T. Technology criticism and data literacy: The case for an augmented understanding of media literacy. J. Media Lit. Educ. 2020, 12, 6–16.
  14. Gross, E.C.; Balaban, D.C. The Effectiveness of an Educational Intervention on Countering Disinformation Moderated by Intellectual Humility. Media Commun. 2025, 13, 1–18.
  15. Huang, G.; Jia, W.; Yu, W. Media Literacy Interventions Improve Resilience to Misinformation: A Meta-Analytic Investigation of Overall Effect and Moderating Factors; Communication Research; Sage: Thousand Oaks, CA, USA, 2024.
  16. Cooper, T. Calling out ‘alternative facts’: Curriculum to develop students’ capacity to engage critically with contradictory sources. Teach. High. Educ. 2019, 24, 444–459.
  17. Aguaded, I.; Marin-Gutierrez, I.; Diaz-Parejo, E. Media literacy between primary and secondary students in Andalusia (Spain). RIED-Rev. Iberoam. Educ. Distancia 2015, 18, 275–298.
  18. Hobbs, R.; Moen, M.; Tang, R.; Steager, P. Measuring the implementation of media literacy statewide: A validation study. Educ. Media Int. 2022, 59, 189–208.
  19. Landry, N.; Caneva, C. Defining media education policies: Building blocks, scope, and characteristics. In The Handbook of Media Education Research; Frau-Meigs, D., Kotilainen, S., Pathak-Shelat, M., Hoechsmann, M., Poyntz, S.R., Eds.; Wiley: Hoboken, NJ, USA, 2020; pp. 289–308.
  20. Aguaded, I.; Delgado-Ponce, A. Educommunication. In The International Encyclopedia of Media Literacy; Wiley: Hoboken, NJ, USA, 2019; pp. 1–6.
  21. Osuna-Acedo, S.; Frau-Meigs, D.; Marta-Lazo, C. Educación mediática y formación del profesorado. Educomunicación más allá de la alfabetización digital. Rev. Interuniv. Form. Profr. 2018, 32, 29–42.
  22. Golob, T.; Makarovič, M.; Rek, M. Meta-reflexivity for resilience against disinformation. Comunicar 2021, 29, 107–118.
  23. Herrero-Diz, P.; Conde-Jiménez, J.; Tapia-Frade, A.; Varona-Aramburu, D. The credibility of online news: An evaluation of the information by university students/La credibilidad de las noticias en Internet: Una evaluación de la información por estudiantes universitarios. Cult. Educ. 2019, 31, 407–435.
  24. Almenar, E.; Aran-Ramspott, S.; Suau, J.; Masip, P. Gender Differences in Tackling Fake News: Different Degrees of Concern, but Same Problems. Media Commun. 2021, 9, 229–238.
  25. Mansoori, A.; Tahat, K.; Tahat, D.; Habes, M.; Salloum, S.; Mesbah, H.; Elareshi, M. Gender as a moderating variable in online misinformation acceptance during COVID-19. Heliyon 2023, 9, e19425.
  26. Bouleimen, A.; Luceri, L.; Cardoso, F.; Botturi, L.; Hermida, M.; Addimando, L.; Beretta, C.; Galloni, M.; Giordano, S. Online search is more likely to lead youth to validate true news than to refute false ones. arXiv 2023, arXiv:2303.13138.
  27. Pennycook, G.; Rand, D.G. Who falls for fake news? The roles of bullshit receptivity, overclaiming, familiarity, and analytic thinking. J. Personal. 2020, 88, 185–200.
  28. Avello-Martínez, R.; Villalba-Condori, K.O.; Arias-Chávez, D. Algunos mitos más difundidos sobre las TIC en la educación. Rev. Mendive 2021, 19, 1359–1375.
  29. Herrero-Curiel, E.; González-Aldea, P. Impacto de las fake news en estudiantes de periodismo y comunicación audiovisual de la Universidad Carlos III de Madrid. Vivat Academia. Rev. Comun. 2022, 155, 1–21.
  30. Cherner, T.S.; Curry, K. Preparing Pre-Service Teachers to Teach Media Literacy: A Response to “Fake News”. J. Media Lit. Educ. 2019, 11, 1–31.
  31. Kahne, J.; Bowyer, B. Education for democracy in a partisan age: Confronting the challenges of motivated reasoning and misinformation. Am. Educ. Res. J. 2017, 54, 3–34.
  32. Fajardo, I.; Villalta, E.; Salmerón, L. Are really digital natives so good? Relationship between digital skills and digital reading. Ann. Psychol. 2015, 32, 89–97.
  33. Adjin-Tettey, T.D. Combating fake news, disinformation, and misinformation: Experimental evidence for media literacy education. Cogent Arts Humanit. 2022, 9, 2037229.
  34. Chen, X.; Xie, H.; Zou, D.; Hwang, G.J. Application and theory gaps during the rise of Artificial Intelligence in Education. Comput. Educ. Artif. Intell. 2020, 1, 100002.
  35. Hilbert, S.; Coors, S.; Kraus, E.; Bischl, B.; Lindl, A.; Frei, M.; Wild, J.; Krauss, S.; Goretzko, D.; Stachl, C. Machine learning for the educational sciences. Rev. Educ. 2021, 9, e3310.
  36. Ersozlu, Z.; Taheri, S.; Koch, I. A review of machine learning methods used for educational data. Educ. Inf. Technol. 2024, 29, 22125–22145.
  37. Gibson, D.C.; Ifenthaler, D. Preparing the next generation of education researchers for big data in higher education. In Big Data and Learning Analytics in Higher Education; Daniel, B.K., Ed.; Springer: Berlin/Heidelberg, Germany, 2017; pp. 29–42.
  38. Munir, H.; Vogel, B.; Jacobsson, A. Artificial intelligence and machine learning approaches in digital education: A systematic revision. Information 2022, 13, 203.
  39. Luan, H.; Tsai, C.C. A review of using machine learning approaches for precision education. Educ. Technol. Soc. 2021, 24, 250–266.
  40. Tan, B.; Jin, H.Y.; Cutumisu, M. The applications of machine learning in computational thinking assessments: A scoping review. Comput. Sci. Educ. 2023, 34, 193–221.
  41. Ibarra-Vázquez, G.; Ramírez-Montoya, M.S.; Terashima, H. Gender prediction based on university students’ complex thinking competency: An analysis from machine learning approaches. Educ. Inf. Technol. 2024, 29, 2721–2739.
  42. Estrada-Molina, O.; Mena, J.; López-Padrón, A. The Use of Deep Learning in Open Learning: A Systematic Review (2019 to 2023). Int. Rev. Res. Open Distrib. Learn. 2024, 25, 370–393.
  43. Forero-Corba, W.; Negre Bennasar, F. Diseño y simulación de un modelo de predicción para la evaluación de la competencia digital docente usando técnicas de Machine Learning. Edutec. Rev. Electrón. Tecnol. Educ. 2024, 89, 18–43.
  44. Bernardo, A.B.I.; Cordel, M.O.I.I.; Ricardo, J.G.E.; Galanza, M.A.M.C.; Almonte-Acosta, S. Global Citizenship Competencies of Filipino Students: Using Machine Learning to Explore the Structure of Cognitive, Affective, and Behavioral Competencies in the 2019 Southeast Asia Primary Learning Metrics. Educ. Sci. 2022, 12, 547.
  45. Miao, X.; Nadaf, A.; Zhou, Z. Machine learning evidence from PISA 2018 data to integrate global competence intervention in UAE K–12 public schools. Int. Rev. Educ. 2023, 69, 675–690.
  46. Reddy, P.; Chaudhary, K.; Sharma, B. Predicting Media Literacy Level of Secondary School Students in Fiji. In Proceedings of the 2019 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), Melbourne, Australia, 9–11 December 2019; pp. 1–7.
  47. Wusylko, C.; Weisberg, L.; Opoku, R.A.; Abramowitz, B.; Williams, J.; Xing, W.; Vu, T.; Vu, M. Using machine learning techniques to investigate learner engagement with TikTok media literacy campaigns. J. Res. Technol. Educ. 2024, 56, 72–93.
  48. Mohd Nadzir, M.; Abu Bakar, J. A digital literacy predictive model in the context of distance education. J. ICT Educ. 2023, 10, 118–134.
  49. OECD. La Definición y Selección de Competencias Clave; Resumen Ejecutivo; OECD: Paris, France, 2005.
  50. Catalina-García, B.; Sousa, J.P.; Silva Sousa, L.C. Consumo de noticias y percepción de fake news entre estudiantes de Comunicación de Brasil, España y Portugal. Rev. Comun. 2019, 18, 123–135.
  51. Figueira, J.; Santos, S. Percepción de las noticias falsas en universitarios de Portugal: Análisis de su consumo y actitudes. Prof. Inf. 2019, 28, e280315.
  52. Marín Suelves, D.; García Tort, E.; Gabarda Méndez, V. La elección de los grados de Maestro/a: Análisis de tendencias e incidencia del género y la titulación. Educ. Siglo XXI 2021, 39, 301–324.
  53. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  54. Shi, Y.; Ke, G.; Soukhavong, D.; Lamb, J.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; et al. LightGBM: Light Gradient Boosting Machine, R Package Version 4.6.0.99; 2025.
Figure 1. Most useful features selected by the different feature-selection methods in order to predict the knowledge branch.
Table 1. Factor analysis (rotated component matrix).

Id | Item | Loading
---|------|--------
K1 | I know what misinformation means. | 0.706
K2 | I can identify the characteristics of misinformation in a fake headline. | 0.757
K3 | I differentiate the concept of misinformation from similar terms such as “fake news”, “rumors” and “false news”. | 0.665
K4 | I understand why misinformation spreads. | 0.694
K5 | I can identify the risks of misinformation. | 0.679
S1 | I differentiate true information from false information based on the author or media outlet. | 0.569
S2 | I verify the truthfulness of information with different resources (news outlets, fact-checking websites, social media, etc.). | 0.472
S3 | I use various reliable sources of information such as reputable newspapers, official websites, and institutional pages to search for information on topics that interest me. | 0.430
S4 | I share information with my contacts without verifying its truthfulness. | 0.704
S5 | I report false information when I detect it on any channel or platform. | 0.316
A1 | I positively value the fight against misinformation to guarantee freedom of expression. | 0.684
A2 | I am aware that misinformation goes against the formation of a free and democratic citizenry. | 0.633
A3 | I trust the information I receive through social media. | 0.628
A4 | I believe that the media is trustworthy. | 0.786
A5 | I give more credibility to information received from a close contact (family, friends, etc.) than from other sources. | 0.595
R1 | I am aware that misinformation can affect the exercise of my future professional practice. | 0.724
R2 | I perceive that the fight against misinformation falls on professionals in my field. | 0.639
R3 | I recognize that misinformation is a necessary topic in my education for the practice of my profession. | 0.644
R4 | I believe it is each person’s responsibility to fight against misinformation. | 0.545
R5 | I have the ethical and moral responsibility as a future edu-communicator to transmit truthful information. | 0.602
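As a hedged illustration of how a rotated loading matrix like Table 1 could be obtained in code: the paper’s “rotated component matrix” wording suggests a rotated component/factor solution, and the sketch below uses scikit-learn’s FactorAnalysis with varimax rotation as a close analogue. The file name mil_survey.csv and the three-factor choice are assumptions for illustration only, not the authors’ actual software or pipeline.

```python
# Illustrative analogue of Table 1 (not necessarily the authors' tool):
# varimax-rotated loadings via scikit-learn's FactorAnalysis. The CSV
# file is a hypothetical placeholder holding the 20 Likert items.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

items = [f"{p}{i}" for p in "KSAR" for i in range(1, 6)]  # K1..K5, S1..S5, A1..A5, R1..R5
df = pd.read_csv("mil_survey.csv")                        # hypothetical dataset

fa = FactorAnalysis(n_components=3, rotation="varimax", random_state=0)
fa.fit(df[items])

# Rows are items, columns are factors; mask small loadings, mirroring
# how sparse rotated matrices are usually reported.
loadings = pd.DataFrame(fa.components_.T, index=items,
                        columns=[f"Factor {k + 1}" for k in range(3)])
print(loadings.where(loadings.abs() >= 0.30).round(3))
```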
Table 2. Results of the different models in predicting the knowledge branch, including the best configuration for each of the models.

Algorithm | Best Parameters | Mean F1 (CV) | Std F1 (CV) | Mean Acc (CV) | Std Acc (CV) | Test F1 | Test Acc
----------|-----------------|--------------|-------------|---------------|--------------|---------|---------
SVM | {'C': 1, 'gamma': 'scale', 'kernel': 'rbf'} | 0.734 | 0.032 | 0.782 | 0.033 | 0.815 | 0.814
DT Classifier | {'criterion': 'gini', 'max_depth': 5, 'min_samples_split': 10} | 0.662 | 0.041 | 0.704 | 0.040 | 0.679 | 0.676
RF Classifier | {'criterion': 'entropy', 'max_depth': 5, 'min_samples_split': 2, 'n_estimators': 10} | 0.705 | 0.061 | 0.773 | 0.037 | 0.744 | 0.745
LightGBM | {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 50} | 0.683 | 0.058 | 0.765 | 0.043 | 0.792 | 0.793
KNN | {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'} | 0.658 | 0.034 | 0.747 | 0.020 | 0.779 | 0.779
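For readers wishing to reproduce this kind of comparison, the following is a minimal sketch of a cross-validated grid search with scikit-learn [53], limited to the SVM row of Table 2. The file name mil_survey.csv, the column knowledge_branch and the candidate grids are illustrative assumptions, not the authors’ exact pipeline.

```python
# Minimal sketch (not the authors' exact pipeline): grid-search an SVM
# with 5-fold CV, as in the SVM row of Table 2. File and column names
# below are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

df = pd.read_csv("mil_survey.csv")             # hypothetical dataset
X = df.drop(columns=["knowledge_branch"])      # assumed feature columns
y = df["knowledge_branch"]                     # assumed target label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

param_grid = {"C": [0.1, 1, 10],
              "gamma": ["scale", "auto"],
              "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="f1_macro")
search.fit(X_train, y_train)

print(search.best_params_)             # Table 2 reports {'C': 1, 'gamma': 'scale', 'kernel': 'rbf'}
print(search.score(X_test, y_test))    # macro F1 on the held-out split
```

The same loop would be repeated with DecisionTreeClassifier, RandomForestClassifier, LGBMClassifier and KNeighborsClassifier to fill in the remaining rows.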
Table 3. Results obtained for different feature selection methods.

Method | Best Features | Mean F1 (CV) | Std F1 (CV) | Mean Acc (CV) | Std Acc (CV) | Test F1 | Test Acc
-------|---------------|--------------|-------------|---------------|--------------|---------|---------
SKB | K1, K2, K3, Disinfo Training = False, Disinfo Training = True | 0.682 | 0.030 | 0.763 | 0.019 | 0.751 | 0.752
FFS | K3, A3, R5, Disinfo Training = False, Disinfo Training = True | 0.665 | 0.039 | 0.760 | 0.021 | 0.756 | 0.759
RFE | Academic Year, K1, K2, Disinfo Training = False, Disinfo Training = True | 0.691 | 0.027 | 0.756 | 0.017 | 0.752 | 0.752
DT | Academic Year, K3, A4, S5, Disinfo Training = False | 0.705 | 0.057 | 0.761 | 0.045 | 0.738 | 0.738
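As a companion to Table 3, the sketch below shows how the scikit-learn selectors corresponding to the SKB, FFS and RFE rows could be applied. It assumes a hypothetical one-hot-encoded dataset (so that categorical answers such as Disinfo Training appear as separate columns) and an illustrative logistic-regression base estimator; neither is confirmed by the paper.

```python
# Sketch of the feature-selection methods compared in Table 3:
# SelectKBest (SKB), forward sequential selection (FFS) and recursive
# feature elimination (RFE). Data loading is a hypothetical placeholder;
# categorical predictors are assumed to be one-hot encoded already.
import pandas as pd
from sklearn.feature_selection import (RFE, SelectKBest, f_classif,
                                       SequentialFeatureSelector)
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("mil_survey_encoded.csv")     # hypothetical dataset
X = df.drop(columns=["knowledge_branch"])      # assumed feature columns
y = df["knowledge_branch"]                     # assumed target label

base = LogisticRegression(max_iter=1000)       # illustrative base estimator
selectors = {
    "SKB": SelectKBest(f_classif, k=5),
    "FFS": SequentialFeatureSelector(base, n_features_to_select=5,
                                     direction="forward"),
    "RFE": RFE(base, n_features_to_select=5),
}
for name, sel in selectors.items():
    sel.fit(X, y)
    print(name, list(X.columns[sel.get_support()]))  # selected feature names
```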
Table 4. Regression performance metrics for the evaluated models.

Objective Variable | Best Model and Parameters | Mean RMSE (CV) | Std RMSE (CV) | Test RMSE
-------------------|---------------------------|----------------|---------------|----------
Knowledge | LR {'fit_intercept': True} | 2.201 | 0.311 | 2.287
Skills | RF {'criterion': 'squared_error', 'max_depth': 5, 'min_samples_split': 20, 'n_estimators': 100} | 1.759 | 0.455 | 1.835
Attitudes | RR {'alpha': 10.0} | 1.686 | 0.354 | 1.598
Responsibility | RF {'criterion': 'squared_error', 'max_depth': 5, 'min_samples_split': 2, 'n_estimators': 100} | 2.088 | 0.715 | 2.064
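A comparable sketch for Table 4: each MIL dimension score is regressed on the survey predictors, and candidate models are compared by cross-validated RMSE. Model abbreviations follow the table (LR = linear regression, RR = ridge regression, RF = random forest), while the file and column names are assumptions for illustration.

```python
# Sketch of the regression comparison in Table 4, scoring each model by
# 5-fold cross-validated RMSE per target dimension. File and column
# names are hypothetical placeholders, not the authors' pipeline.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

df = pd.read_csv("mil_survey_encoded.csv")     # hypothetical dataset
targets = ["knowledge", "skills", "attitudes", "responsibility"]
X = df.drop(columns=targets)                   # assumed predictor columns

models = {"LR": LinearRegression(),
          "RR": Ridge(alpha=10.0),
          "RF": RandomForestRegressor(n_estimators=100, max_depth=5,
                                      random_state=0)}

for target in targets:
    for name, model in models.items():
        # scikit-learn returns negated RMSE, so flip the sign back
        rmse = -cross_val_score(model, X, df[target], cv=5,
                                scoring="neg_root_mean_squared_error")
        print(f"{target:14s} {name}: mean RMSE {rmse.mean():.3f} "
              f"(std {rmse.std():.3f})")
```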