Article

Combining Virtual Reality and Machine Learning to Identify the Presence of Dyslexia: A Cross-Linguistic Approach

by Michele Materazzini 1,*, Gianluca Morciano 1, José Manuel Alcalde-Llergo 1, Enrique Yeguas-Bolívar 2, Giuseppe Calabrò 1, Andrea Zingoni 1 and Juri Taborri 1,*

1 Department of Economics, Engineering, Society and Business Organization, University of Tuscia, 01100 Viterbo, Italy
2 Computing and Numerical Analysis, University of Córdoba, 14071 Córdoba, Spain
* Authors to whom correspondence should be addressed.
Information 2025, 16(9), 719; https://doi.org/10.3390/info16090719
Submission received: 3 June 2025 / Revised: 13 July 2025 / Accepted: 12 August 2025 / Published: 22 August 2025
(This article belongs to the Special Issue Machine Learning and Artificial Intelligence with Applications)

Abstract

This study explores the use of virtual reality (VR) and artificial intelligence (AI) to predict the presence of dyslexia in Italian and Spanish university students. In particular, the research investigates whether VR-derived data from Silent Reading (SR) tests and self-esteem assessments can differentiate between students who are affected by dyslexia and students who are not, employing machine learning (ML) algorithms. Participants completed VR-based tasks measuring reading performance and self-esteem. A preliminary statistical analysis (t-tests and Mann–Whitney tests) was performed on these data to compare the scores obtained by individuals with and without dyslexia, revealing significant differences in completion time for the SR test, but not in accuracy or self-esteem. Supervised ML models were then trained and tested, demonstrating the ability to classify the presence/absence of dyslexia with an accuracy of 87.5% for Italian, 66.6% for Spanish, and 75.0% for the pooled group. These findings suggest that VR and ML can effectively serve as supporting tools for assessing dyslexia, particularly by capturing differences in task completion speed, although language-specific factors may influence classification accuracy.

1. Introduction

The World Health Organization classifies specific learning disorders (SLDs) as neuro-developmental conditions characterized by persistent and significant difficulties in core learning abilities, such as reading, writing, and mathematics [1]. Students with SLDs, including dyslexia, dyscalculia, dysorthography, and dysgraphia, often face academic challenges, which can undermine their confidence levels [2]. Dyslexia, in particular, manifests differently across languages because of differences in spelling rules: in transparent orthographies (e.g., Spanish, Italian), dyslexic reading tends to be slow but relatively accurate, whereas in opaque orthographies (e.g., English, French), irregular spelling makes word recognition and spelling harder. Morphology also plays a key role in some languages, affecting how dyslexia is diagnosed and supported [3,4,5]. Beyond the primary difficulties associated with SLDs, secondary issues frequently arise, with low self-esteem being one of the most prominent. Self-esteem is a critical component of psychological well-being, influencing academic success, social relationships [6], and overall quality of life [7]. Psychologists typically assess self-esteem using specialized questionnaires designed for this purpose. One of the most renowned and widely employed tools for such analysis is the Rosenberg Self-Esteem Scale (RSES) [8]. This scale consists of 10 statements, with responses ranging from “strongly agree” to “strongly disagree”; the resulting scores offer valuable insights into individuals’ self-perception across various life domains [9]. The RSES is a widely used and validated tool, although its negatively worded items have been criticized as being linked to lower reliability in some studies [10].
The relationship between SLDs and self-esteem is complex and widely explored in educational psychology. Research consistently indicates that students with SLDs, such as dyslexia and dyscalculia, tend to have lower self-esteem compared to their peers without these challenges [11]. Individuals with SLDs may experience significant impacts on both academic and social self-esteem, with affected students often experiencing diminished confidence and a reduced sense of self-worth [12]. Social dynamics further exacerbate these challenges, as peer rejection commonly leads to loneliness and feelings of inadequacy [13]. Additionally, anxiety and depression frequently accompany SLDs, compounding self-esteem issues and deepening the emotional struggles of affected students [14]. Dyslexia can also negatively impact self-concept and can lead to learned helplessness in the face of academic difficulties [15]. Despite this general trend, not all studies report uniformly low self-esteem among students with SLDs. Some students with learning disabilities maintain positive self-esteem, pointing to the influence of factors such as personal resilience and social support. This variability highlights the multifaceted nature of self-esteem in students with SLDs and suggests that external and individual factors can significantly mitigate the negative psychological impact of learning challenges [16]. As a result of the above, early identification of dyslexia is crucial for effective intervention; in fact, research shows that children identified as at risk for dyslexia can achieve significant progress when they are detected early and enrolled in targeted intervention programs [17].
Traditional dyslexia tests involve reading meaningful and meaningless words aloud, while newer methods focus on Silent Reading (SR), missing spaces, and recognizing misspelled words. Digital technologies have enhanced dyslexia diagnosis through neurological data analysis (magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), electroencephalography (EEG), and eye-tracking) and AI-driven test optimization. AI helps refine test selection, analyze results, and develop automated predictors [18], achieving over 90% accuracy in some studies [19,20]. Beyond the tests themselves, the reading-mode preferences of the target population are also informative. Research indicates that dyslexic children show little difference in performance between SR and reading aloud [21]. Dyslexic adolescents, however, often prefer reading aloud to enhance text comprehension [22], whereas dyslexic adults demonstrate limited improvement in reading speed during SR [23]. Additionally, students struggle with SR due to difficulty with complex words, vocabulary, and focus, and often lack opportunities to develop independent reading skills [24].
It is important to note that traditional tools, such as the Rosenberg Self-Esteem Scale (RSES) and the SR test, may lack the ecological validity and sensitivity needed to fully capture the nuanced experiences of children with SLDs [25]. To generalize these tests to real-life situations, Virtual Reality (VR) has emerged as a promising means of improving the ecological validity of neuro-psychological tests by providing more realistic testing environments than traditional methods [26]. Through interactive and immersive scenarios in a VR environment, users can engage in tasks and experiences that mirror real-life situations, providing a more authentic and nuanced assessment of their self-esteem and emotional well-being [27]. VR has already been applied successfully as an educational tool for supporting students with learning disabilities [28,29], and for stimulating empathy [30,31] and self-esteem [32]. It has also been used with dyslexic children on multiple fronts, including teaching [33], cognitive rehabilitation [34,35], and even diagnosis, where its combination with machine learning and eye-tracking devices achieved 98% accuracy [36]. Furthermore, VR can increase engagement [28], accessibility [37], and concentration [38] by providing an interactive experience that builds on the strengths of modern devices. Lastly, VR enables the rapid, repeatable, and efficient collection of large amounts of data during user interactions within the virtual environment [39], and its immersive nature allows real-time tracking of user responses [40], reducing manual evaluation time while ensuring a more structured and reliable dataset. Given these findings, this work explores the potential of combining self-esteem assessment with SR evaluations as an additional diagnostic tool for dyslexia, all within an immersive VR environment.
Collected data can be analyzed using Artificial Intelligence (AI), particularly through Machine Learning (ML), creating a synergistic integration of these two cutting-edge technologies. Several studies have already demonstrated the effectiveness of various ML algorithms for detecting learning disorders. One example used k-Nearest Neighbors (k-NN) [41]: a machine learning system was developed to identify dyslexia risk based on data from 857 first-grade students in Malaysia. The dataset, collected in a prior study using specially designed tests, was manually labeled by a dyslexia expert, preprocessed to ensure quality, and trained using a 70/30 train–test split; the model achieved 99% classification accuracy, with validation against expert assessments confirming its reliability. Another study adopted a Support Vector Machine (SVM) [42]: an SVM-based anatomical classifier distinguished students with (22) and without (27) dyslexia, achieving 80% accuracy in the training sample but dropping to 59% in a general population sample of 876 subjects. The key brain regions involved were the Left Inferior Parietal Lobule (LIPL) and the Left and Right Orbitofrontal Gyrus (LOFG, ROFG), and brain structure correlated with dyslexia severity; however, the high false-positive rate in the second sample suggests that the classifier may capture broader anatomical variation rather than dyslexia-specific traits. A further work applied Logistic Regression (LR) [43]: the study analyzed neuro-imaging data from 130 dyslexic and 106 control subjects, each described by 742 cortical features. An LR classifier initially achieved an Area Under the Curve (AUC) of 0.61, but step-forward feature selection improved performance to 0.73 using only 25 features (3.4% of the total). To address site dependency, two methods were proposed: site-dependent whitening (SDW) and site-dependent extension (SDE). Both enhanced classification performance, to AUC 0.82 (SDW) and 0.83 (SDE), representing a 12% improvement over naive feature selection and a 25% improvement over using all features. Finally, a combination of SVM, LR, and Random Forest (RF) can be found in the literature [44]: the study examined grey matter differences between typical and dyslexic children using a multivariate approach on T1-weighted MRI images from 236 children across Poland, France, and Germany. Participants aged 8.5–13.7 years were screened for dyslexia based on standardized reading tests, IQ, and the exclusion of ADHD or neurological disorders. The classification of dyslexic vs. control subjects achieved moderate accuracy (AUC = 0.66, accuracy (ACC) = 0.65 with 10-fold cross-validation) using LR, SVM, and RF classifiers. Dyslexic children showed higher mean curvature and a greater folding index in left-hemisphere language-related regions, supporting prior findings of cortical folding anomalies in dyslexia. While multi-site variability and differences in social background pose challenges, these results suggest geometric cortical properties as potential biomarkers of dyslexia.
From this perspective, it is clear that combining VR scenarios to gather data with ML algorithms for data processing could represent a groundbreaking approach, enhancing the use of technology to complement traditional methods for diagnosing dyslexia. To the best of the authors’ knowledge, while various efforts have been made to incorporate VR and ML tools into the clinical assessment of learning disorders, the virtualization of SR and self-esteem tests remains unexplored. Similarly, the use of machine-learning algorithms to automatically identify the presence of dyslexia from data gathered by the abovementioned tests is still an open challenge. Recent metric-learning research shows that block-level feature augmentation combined with a self-supervised auxiliary loss can markedly improve few-shot classification under extreme data scarcity, a result that directly motivated the present exploratory design [45].
This study introduces an innovative VR-based tool designed to administer both SR and Rosenberg self-esteem tests. The tool collects digital data that can be used to train ML algorithms to identify the presence of dyslexia. As an additional innovative aspect, this study adopted a cross-linguistic approach by exploring how the novel methodology could be applied to data collected from both Italian and Spanish students.

2. Materials and Methods

2.1. Setup and Virtual Test

The study was conducted using Meta Quest 2 (Meta Platforms, Inc.: Menlo Park, CA, USA) head-mounted displays (HMDs) with the “Out of the Box” app pre-installed. This application, developed through collaboration between experts in educational psychology and VR technologies, was designed in [46] and developed in [47] as part of the VRAIlexia project [48]. The virtual environment was designed to replicate real-world challenges faced by students with specific learning disorders (SLDs) and included two primary assessments.
The first was an SR evaluation, measuring reading-related cognitive and motor performance through interactive tasks. This is a commonly used test [49] in which participants are asked to read a text describing various tasks to be accomplished. The VR SR scene reproduces the reading-comprehension task of the BDA 16–30 adult dyslexia battery (Giunti Psychometrics: Florence, Italy) [50], substituting the physical three-color button panel and sheets with virtual ones while leaving the stimulus–response format unchanged. During the reading, users were asked to select buttons based on color, follow specific button sequences, hold and release buttons as instructed, choose words within the text, and use voice recognition to complete verbal interactions. The system recorded various data points, including the start time, error count, interaction duration, environmental factors, and voice recognition errors.
The second assessment was the Rosenberg Self-Esteem Test, a well-established 10-item scale evaluating global self-esteem in areas such as academic performance, social relationships, and personal competence. Participants responded to each statement using a four-point Likert scale ranging from “strongly agree” to “strongly disagree.” Data collection included start time, test duration, environmental context, and emotional responses. Self-esteem levels were categorized based on predefined cut-off points, with scores ranging from 30 to 40 indicating high self-esteem, 26 to 29 representing a medium level, and scores below 25 classified as low self-esteem. Responses were further analyzed in relation to emotional state descriptors relevant to students with SLDs, providing a deeper understanding of their psychological well-being. The ability to collect detailed behavioral insights within a controlled VR setting allows for more precise assessment and the development of targeted intervention strategies tailored to individual needs.
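For illustration, the cut-off logic described above can be expressed as a short scoring helper. The following is a minimal Python sketch, not part of the original application; the function name is hypothetical, and a total of exactly 25, which the stated bands leave unassigned, is grouped with the low band as an assumption.

def rses_band(total_score: int) -> str:
    """Map a Rosenberg Self-Esteem Scale total (10-40) to the bands above."""
    if total_score >= 30:   # 30-40: high self-esteem
        return "high"
    if total_score >= 26:   # 26-29: medium self-esteem
        return "medium"
    return "low"            # below 25 (and, by assumption, 25 itself): low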

2.2. Participants and Experimental Protocols

A total of 80 university students participated in this study, divided into two independent samples of 40 students each. The first sample consisted of 40 Italian university students: half were diagnosed with an SLD (SLD group), of whom 100% had dyslexia and 85% also had a mixed-type disorder, whereas the remaining 20 formed the control group (CG) with no diagnosed disorders. These participants were recruited from the University of Tuscia and the University of Perugia. The second sample included 40 Spanish university students, similarly divided into 20 with SLDs (of whom 100% had dyslexia and 55% also had a mixed-type disorder) and 20 in the control group; these participants were recruited from the University of Córdoba under the same selection criteria. All dyslexic participants held a current clinical certificate issued by a licensed neuro-psychologist; these certificates served as the ground-truth labels for supervised machine learning. For both groups, the inclusion criteria required participants to be university students over 18 years of age, native speakers of either Italian or Spanish, with no physical conditions that could interfere with VR use. Additionally, participants were excluded if they had any cognitive or psychological disorders beyond SLDs, ensuring a more homogeneous sample for comparison. Participation was entirely voluntary, no monetary or material compensation was offered, and behavioral data were collected in fully anonymized form.
In Table 1, we provide descriptive information for each subgroup, including mean age, gender distribution, and academic background.
Each student was instructed to wear the HMD and complete both tests. Upon launching the application, users were greeted by an initial screen where they provided basic socio-demographic information, including age, gender, and whether they had been diagnosed with one or more SLDs. To help participants familiarize themselves with the VR environment, a virtual character introduced the experience and guided them through the interaction mechanics. Given the study’s research objectives, user movement within the virtual space was deliberately restricted to ensure standardized conditions across all participants. Instead of freely navigating the environment or using hand gestures for selection, users interacted via head movement: a small circle at the center of the screen served as a pointer, requiring participants to align it with an interactive element before confirming their choice with the controller buttons (Figure 1a). This constraint ensured uniform interaction patterns, reducing variability in data collection while maintaining focus on the cognitive processes involved in reading and response selection. For the SR scenario, voice interaction was introduced as an additional mode of engagement. The virtual character explained this feature by instructing participants to repeat a specific sentence, familiarizing them with the speech-to-text algorithm (Figure 1b); an audible signal confirmed when the system had successfully processed the spoken input. Since the study targeted individuals with reading disorders, the application also allowed users to adjust text size and font to accommodate their reading preferences and minimize accessibility barriers (Figure 1c). This familiarization ensured that the VR interface, unfamiliar yet identical for all participants, did not bias the subsequent SR measures.
Once participants had become comfortable with the VR interaction mechanics, the virtual character provided an educational segment on SLDs and the VRAIlexia project [51] (Figure 2a). This segment aimed to increase awareness of reading difficulties and foster a sense of relatability by presenting a list of well-known historical and contemporary figures diagnosed with learning disorders (Figure 2b). Lastly, all the parties that participated in the development of the VRAIlexia project were shown (Figure 2c). By incorporating this step, the experience was designed to be more engaging, reducing the potential anxiety associated with performance-based assessments.
The SR test was structured to evaluate participants’ ability to process written content while simultaneously engaging in interactive tasks. Because selections were made through head movement, users often had to shift their focus between the text and the response options, simulating real-world reading challenges experienced by individuals with SLDs (Figure 3a). This additional cognitive load was an intentional design choice, reflecting the difficulty many dyslexic readers face when needing to process visual information while performing secondary tasks. Participants were required to interact five times with three colored buttons at the bottom of the screen, select a specific word within the text, and verbally repeat three words from the passage. These tasks were designed to assess different aspects of reading engagement, such as visual tracking, word recognition, and short-term verbal recall. Following the reading assessment, the Rosenberg Self-Esteem Test was administered using the same interaction mechanics: participants responded to ten statements assessing global self-esteem, selecting their answers by aligning the central pointer with the desired option before confirming with the controller (Figure 3b). Rather than imposing a fixed testing order, users were free to choose which test to complete first (Figure 3c). This approach aimed to reduce stress and increase participant comfort, ensuring that individual preferences were accounted for in the testing process; it also avoided bias in the results due to the test sequence.
All the operations described above are part of an experimental protocol that has been approved by the Ethical Committee CEIm Provincial de Córdoba n. 367 (27/11/2024).

2.3. Data Analysis

To assess the effectiveness of the VR-based psychometric tests, we implemented a supervised ML classification framework. The dataset consisted of performance metrics extracted from VR assessments, including error rates, total response times, and self-esteem scores.
To reduce dimensionality and remove redundant information, a correlation matrix was computed among the initial numerical features. As shown in Figure 4, we observed high correlations among the individual time measurements of the SR (e.g., time_SR1 to time_SR9) and RSES (e.g., RSES1 to RSES10) items, as well as between these and their respective totals. Therefore, we retained only the total SR time and total RSES time as synthetic indicators. Similarly, individual RSES responses showed high internal correlations, and no particular item demonstrated discriminative power over the others, so only the global self-esteem score was kept. The same rationale was applied to the error-related variables, where only the total number of SR errors was selected as a compact and meaningful metric. This resulted in a final feature set composed of four variables: total SR errors, total SR time, total RSES time, and self-esteem score.
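A minimal sketch of this reduction step, assuming pandas; the 0.9 threshold and the column names are illustrative, since the text reports the rationale rather than a numeric cut-off.

import pandas as pd

def drop_correlated(df: pd.DataFrame, keep: set, threshold: float = 0.9) -> pd.DataFrame:
    """Drop numeric features whose absolute pairwise correlation with an
    earlier column exceeds `threshold`, always retaining the synthetic
    totals listed in `keep`. The 0.9 threshold is an assumption."""
    corr = df.corr(numeric_only=True).abs()
    cols = list(corr.columns)
    to_drop = set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] > threshold and b not in keep:
                to_drop.add(b)
    return df.drop(columns=sorted(to_drop))

# Totals retained as synthetic indicators, per the text (illustrative names):
KEEP = {"sr_errors_total", "sr_time_total", "rses_time_total", "rses_score"}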
The target variable was the presence of dyslexia, binarized into class labels (0: no dyslexia, 1: dyslexia). The initial dataset comprised 41 features (see Table 2) extracted by the Out of the Box application, synthesizing the results of the two psychometric tests; these included the number of errors in the different stages of the SR test and the score for each RSES item. Prior to model training, records with missing values were removed, and the correlation analysis described above reduced the feature set to the four key predictors: total SR errors, total SR time, total RSES time, and self-esteem score. Finally, the dataset was split into training and testing sets using an 80–20 ratio, with stratified sampling to maintain the class distribution.
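The labeling and split can be sketched as follows, assuming scikit-learn; `data` and `labels` are hypothetical placeholders for the participant table and the certificate-based diagnoses, and the random seed is an assumption.

from sklearn.model_selection import train_test_split

# `data`, `labels`: hypothetical names for the reduced feature table and
# the ground-truth diagnoses (0 = no dyslexia, 1 = dyslexia).
X = data[["sr_errors_total", "sr_time_total",
          "rses_time_total", "rses_score"]].dropna()  # drop missing values
y = labels.loc[X.index]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2,   # 80-20 split, as in the text
    stratify=y,            # preserve the class distribution
    random_state=42)       # seed is an assumption, not from the paper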
A set of widely used supervised learning algorithms was evaluated for dyslexia prediction. To optimize their predictive performance, a systematic search over predefined hyperparameter grids was conducted (a code sketch of this search follows the list below). The models evaluated and their corresponding hyperparameter tuning strategies were as follows:
  • LR: The hyperparameter optimization process evaluated two types of regularization: L1, which promotes sparse solutions by driving some coefficients to zero, and L2, which distributes regularization more evenly across all parameters. These were assessed in combination with two optimization algorithms: Limited-memory BFGS, a quasi-Newton method suited for smooth, differentiable objectives and typically used with L2 regularization; and Liblinear, a coordinate descent-based solver that efficiently handles both L1 and L2 penalties;
  • SVM: The kernel function, which defines how data are mapped into a higher-dimensional space, was explored using three types: a simple linear kernel, a polynomial kernel (2nd degree), and an RBF kernel, which measures similarity based on the distance between points, using a Gaussian function to assign higher weights to closer data points. The gamma parameter, controlling how the influence of individual points decreases with distance, was evaluated with two predefined settings: ‘auto’, based on the reciprocal of the number of features in the dataset, and ‘scale’, which additionally adjusts for the spread of the input values;
  • k-NN: The number of neighboring data points considered for classification was tested with values of 3, 5, and 7;
  • DT (Decision Tree): The maximum depth allowed in the tree was varied between 10 and 15. The method for determining the best split at each node was tested using two well-known criteria: gini and entropy;
  • RF: The number of decision trees combined in the ensemble was tested with values 10, 20, 30, and 40. The other parameters considered were the same as those for the Decision Tree Classifier.
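A compact sketch of this search, as referenced above, assuming scikit-learn’s GridSearchCV and the X_train/y_train split from the previous sketch; the number of cross-validation folds is not stated in the text and five is assumed here.

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Grids mirroring the list above; LR solver/penalty pairs are kept valid.
models = {
    "LR": (LogisticRegression(max_iter=1000),
           [{"penalty": ["l2"], "solver": ["lbfgs"]},
            {"penalty": ["l1", "l2"], "solver": ["liblinear"]}]),
    "SVM": (SVC(),
            {"kernel": ["linear", "poly", "rbf"], "degree": [2],
             "gamma": ["scale", "auto"]}),
    "k-NN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "DT": (DecisionTreeClassifier(),
           {"max_depth": list(range(10, 16)), "criterion": ["gini", "entropy"]}),
    "RF": (RandomForestClassifier(),
           {"n_estimators": [10, 20, 30, 40],
            "max_depth": list(range(10, 16)), "criterion": ["gini", "entropy"]}),
}

best = {}
for name, (estimator, grid) in models.items():
    search = GridSearchCV(estimator, grid, scoring="accuracy", cv=5)  # folds assumed
    search.fit(X_train, y_train)
    best[name] = search.best_estimator_  # later evaluated on the held-out test set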
The choice of traditional ML algorithms was guided by the dataset size and the nature of the input features. With four final predictors and a sample size of 80 participants, deep learning models such as recurrent neural networks (RNNs) or multilayer perceptrons (MLPs) were unsuitable due to their higher risk of overfitting, increased computational demands, and lack of interpretability. In contrast, classical ML models provide robust performance in low-data regimes and are easier to interpret in terms of feature contributions and decision boundaries, making them preferable for an application with potential clinical and educational implications.
Finally, model performance was assessed using standard classification metrics: accuracy and F1-score. Each model’s effectiveness was evaluated on the test set, and the results were compared to determine the most suitable approach for VR-based dyslexia assessment. To investigate potential cross-cultural differences and validate the robustness of the predictive models, the analysis was conducted in three phases: first, using data exclusively from Italian students, then, from Spanish students, and finally, using the combined dataset.
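A sketch of this final evaluation step, assuming scikit-learn metrics and the objects from the previous sketches; the model key is illustrative, and the same procedure would be repeated for the Italian, Spanish, and pooled subsets.

from sklearn.metrics import accuracy_score, f1_score

# Evaluate the best estimator found in training on the held-out test set.
y_pred = best["SVM"].predict(X_test)       # model name illustrative
print("accuracy:", accuracy_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred))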

2.4. Statistical Analysis

In addition to the ML approach, statistical analyses were conducted to evaluate differences between the CG and SLD groups for each sample of data (CGi and SLDi for the Italians, CGs and SLDs for the Spanish, and CGp and SLDp for the pooled group). To determine whether the time taken to complete each test differed significantly between the SLD and CG groups, an independent t-test with a post hoc power analysis was applied. Meanwhile, the non-parametric Mann–Whitney (MW) test was used to analyze the Rosenberg self-esteem scores and the number of errors made during the SR task. A significance level of 0.05 was set for all statistical tests.
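These tests can be reproduced with standard scientific-Python tools; a minimal sketch assuming SciPy and statsmodels, where the array arguments are hypothetical per-group measurement vectors and the pooled-SD Cohen’s d used for the post hoc power is an assumed effect-size convention.

import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

ALPHA = 0.05

def time_comparison(sld_times: np.ndarray, cg_times: np.ndarray):
    """Independent t-test on completion times, plus post hoc power at the
    observed effect size (pooled-SD Cohen's d, an assumed convention)."""
    t_stat, p_value = stats.ttest_ind(sld_times, cg_times)
    pooled_sd = np.sqrt((sld_times.var(ddof=1) + cg_times.var(ddof=1)) / 2)
    d = (sld_times.mean() - cg_times.mean()) / pooled_sd
    power = TTestIndPower().power(effect_size=abs(d), nobs1=len(sld_times),
                                  ratio=len(cg_times) / len(sld_times),
                                  alpha=ALPHA)
    return p_value, power

def score_comparison(sld_scores: np.ndarray, cg_scores: np.ndarray):
    """Non-parametric Mann-Whitney U test for RSES scores and SR errors."""
    u_stat, p_value = stats.mannwhitneyu(sld_scores, cg_scores,
                                         alternative="two-sided")
    return p_value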

3. Results

In this section, the results of the assessments are presented, first subdivided by nationality and then for the pooled group.

3.1. Italian Group

3.1.1. Differences Between Groups

As shown in Figure 5, the results illustrate the performance differences between the two Italian groups. The mean time to finish the SR test (Figure 5a) was 306 s for the Italian SLD group (SLDi) and 200 s for the Italian CG (CGi). The mean time required to complete the RSES test (Figure 5b) was 70.3 s for SLDi participants and 50.3 s for the CGi. The mean number of errors recorded during the SR test (Figure 5c) was 3.58 for the SLDi group and 3.45 for the CGi. Finally, the mean RSES scores (Figure 5d) were 23.9 for the SLDi group and 24.5 for the CGi.
Table 3 presents the results of the statistical tests for the Italian sample. The t-test results indicate statistically significant differences in the time taken to complete the SR and RSES tasks (p < 0.001 for SR time and p = 0.003 for RSES time), and the power analysis showed high sensitivity to the large effects (power = 0.999 for SR time and power = 0.932 for RSES time). This suggests that the SLD group took significantly longer to complete both tasks than the control group. Crucially, the Mann–Whitney U test results show no significant differences for SR errors (p = 0.584) and RSES scores (p = 0.531), indicating no significant difference in the number of errors made in the SR test or in self-esteem scores between the two groups.

3.1.2. Classifier Performance

The results of the training session for the Italian user group are now presented. The SVM demonstrated a strong capacity to model the data (Figure 6c). Notably, the SVM with the RBF kernel and ‘scale’ gamma achieved the highest performance, with an accuracy of 77.1% and an F1-score of 75.3%. This result indicates that, among the algorithms tested, the SVM was particularly adept at capturing the underlying patterns in the user data. RF was also competitive within the Italian group (Figure 6a,b). Its performance, however, was more variable and sensitive to the chosen parameters, such as the maximum tree depth and the number of estimators. The best results observed for RF were around 70.5% accuracy and 69.7% F1-score; for instance, with the gini criterion and a max depth of 11, accuracy reached 70.5% with 20 estimators. LR provided a more stable performance baseline for the Italian group (Figure 6d), consistently delivering accuracy scores around 70% and F1-scores around 68% across the different parameter settings; as an example, the L1 penalty combined with the liblinear solver yielded an accuracy of 71.0% and an F1-score of 70.0%. In contrast, the DT classifier showed accuracies generally between 50% and 60% (Figure 6f). The entropy criterion tended to produce slightly better results than gini, but overall the DT’s predictive power was lower than that of SVM, RF, or LR in this group. Finally, k-NN exhibited improving performance as the number of neighbors increased (Figure 6e); its highest accuracy was approximately 70% with seven neighbors, suggesting that a larger neighborhood enhanced the model’s ability to generalize.
Therefore, the best configuration (SVM with the RBF kernel and ‘scale’ gamma) was applied to the test set, achieving 87.5% accuracy and an 85.7% F1-score.

3.2. Spanish Group

3.2.1. Differences Between Groups

In Figure 7, the first panel (Figure 7a) shows the mean time needed to finish the SR test: 264 s for the SLDs group and 196 s for the CGs. The mean time required to complete the RSES test (Figure 7b) was 70.7 s for SLDs participants and 57.5 s for the CGs. The mean number of errors recorded during the SR test (Figure 7c) was 2.95 for the SLDs group and 3.40 for the CGs. Lastly, the mean RSES scores (Figure 7d) were 21.6 for the SLDs group and 24.1 for the CGs.
Table 4 shows the statistical test results for the Spanish sample. The t-test results revealed no statistically significant differences in task completion time between groups for either SR (p = 0.063) or RSES (p = 0.174); the power analysis showed moderate sensitivity to large effects for SR time (power = 0.522) and low sensitivity for RSES time (power = 0.264). Consistently, the Mann–Whitney U test also showed no significant differences for SR errors (p = 0.696) and RSES scores (p = 0.069).

3.2.2. Classifier Performance

Moving on to the ML training session for the Spanish user group, the RF algorithm proved particularly effective (Figure 8a,b), achieving the highest accuracy, up to 75.2%, and an F1-score of 73.2%. This peak performance was observed with the gini criterion, a max depth of 15, and 40 estimators, demonstrating RF’s capacity to model the complexities of the data. k-NN also performed reasonably well on this group of data (Figure 8e), obtaining an accuracy of 71.9% and an F1-score of 69.1% with either five or seven neighbors; this indicates that k-NN can be a viable alternative, though it did not reach the peak performance of RF in this context. The SVM showed more varied performance in the Spanish group than in the Italian group (Figure 8c). While the polynomial kernel achieved an accuracy of 68.6% and an F1-score of 63.3%, SVM did not outperform RF, and its effectiveness appeared more sensitive to the choice of kernel and parameters on the Spanish data. LR generally exhibited accuracy in the lower 60% range (Figure 8d), suggesting that while it provides a stable baseline, it may not fully capture the nuances present in the user data. Similarly to the first group, the DT classifier’s performance for the Spanish group was generally around 60% accuracy (Figure 8f); this consistency across both groups indicates that DT, while simple, may lack the predictive power of more complex algorithms such as RF or SVM.
Again, the best training configuration was applied to the test set. This time, the best performance in training was achieved by the RF classifier with the gini criterion, a max depth of 15, and 40 estimators, which reached 66.6% for both accuracy and F1-score on the test set.

3.3. Pooled Results

3.3.1. Differences Between Groups

In Figure 9, the results for the entire dataset are presented. The mean time taken to finish the SR test (Figure 9a) was 284 s for the SLDp group and 198 s for the CGp. The mean time required to complete the RSES test (Figure 9b) was 70.5 s for SLDp participants and 53.9 s for the CGp. The mean number of SR errors (Figure 9c) was 3.25 for SLDp participants and 3.43 for the CGp. Finally, the mean RSES scores (Figure 9d) were 22.7 for the SLDp group and 24.3 for the CGp.
Table 5 presents the statistical test results for the pooled sample. The t-test results indicate statistically significant differences in task completion time for SR (p < 0.001) and RSES (p = 0.005), and the power analysis shows high sensitivity to large effects for both SR time (power = 0.994) and RSES time (power = 0.814). However, the Mann–Whitney U test showed no significant differences for SR errors (p = 0.952) or RSES scores (p = 0.853).

3.3.2. Classifier Performance

When considering the pooled Italian and Spanish user group, the ML training results confirmed RF and SVM as the top-performing algorithms. RF achieved a peak accuracy of 75.4% and an F1-score of 73.3% (Figure 10a,b); this strong performance underscores RF’s ability to generalize across different user demographics and potentially capture common patterns present in both groups. SVM also demonstrated robust performance in the pooled group (Figure 10c): with the RBF kernel, it achieved an accuracy of 72.3% and an F1-score of 70.2%, indicating that SVM remains a highly effective algorithm, although it was slightly outperformed by RF in this cross-language analysis. LR continued to provide consistent performance, with accuracy and F1-score hovering around 70% (Figure 10d); this stability suggests that LR could reliably model the data, although it may not have achieved the highest possible predictive accuracy. k-NN was also competitive on the pooled group, achieving an accuracy of approximately 72% with five neighbors (Figure 10e), balancing between bias and variance. In contrast, the DT classifier remained the least effective algorithm for the pooled group (Figure 10f), with accuracy scores generally in the low 60% range. This consistent underperformance across all three groups reinforces the observation that DT may not be complex enough to capture the intricacies of user behavior compared with the other algorithms.
Again, the best training configuration was applied to the test set; this time it was RF with the entropy criterion, a max depth of 15, and 20 estimators. The final scores for this configuration were 75.0% for accuracy and 71.4% for F1-score.

4. Discussion

This study explored the use of VR and ML to identify dyslexia in Italian and Spanish university students. The research encompassed two main areas: (i) analyzing differences in VR-derived behavioral data between students with and without SLDs, and (ii) evaluating the performance of ML algorithms in classifying dyslexia based on these data.
The statistical analysis revealed a pattern that partially supported the study’s hypothesis. In the Italian sample, the t-tests indicated significant differences in the time taken to complete the SR and RSES tasks (p < 0.001 and p = 0.003, respectively), suggesting that students with SLDs exhibited different task completion speeds compared to the control group. Importantly, the MW tests showed no significant differences in SR errors (p = 0.584) or RSES scores (p = 0.531), indicating that task accuracy and self-esteem levels did not significantly differ between the groups. These findings align with the expectation that dyslexia primarily manifests as a difference in processing speed rather than in accuracy or self-esteem. In contrast, the Spanish sample showed no statistical differences between groups for task completion time (t-tests) or for SR errors/RSES scores (MW tests), suggesting that, within this sample, the VR-derived data did not reveal significant differences between students with and without SLDs in terms of speed, accuracy, or self-esteem. When the Italian and Spanish samples were combined, the results mirrored the Italian findings: significant differences in task completion time (p < 0.001 and p = 0.005) but no significant differences in SR errors or RSES scores (p = 0.952 and p = 0.853). Thus, the SR and RSES time analyses statistically confirmed, in two out of three cases, that reading is a more time-consuming activity for people with SLDs [52,53]. Regarding the Spanish group, which did not reach statistical significance but whose values remained close to the threshold, the sample may simply have been too small to reveal this effect. Staying with SR, the accuracy analysis in all three cases confirmed no difference between the two groups, indicating that individuals with SLDs can achieve the same results as unaffected individuals, provided they are supported with appropriate time and tools [54,55]. Finally, an interesting result lies in the self-esteem values recorded during this study, where the MW test confirmed no statistical difference between the SLD and CG groups; in addition, while, as expected, the SLD groups reported the lowest levels of self-esteem, these were similar to those of the CGs, which likewise never exceeded the threshold value of 25 delimiting the low self-esteem band on the RSES. This analysis confirms, as mentioned in the introduction, both that people with SLDs usually have low self-esteem [12,13,14,15] and that it does not differ much from that of similarly aged peers without any disorder [16]. Since the study was aimed at college students, the probable cause of these low values should be sought in academic challenges and social dynamics [56,57].
The ML analysis demonstrated the feasibility of using VR-derived data to automatically identify dyslexia, but also revealed interesting differences in algorithm performance across the groups. On the test set, classification accuracy was highest for the Italian group (87.5% accuracy, 85.7% F1-score), indicating the strong ability of the model to generalize to new data within that population. SVM is known for its effectiveness in high-dimensional spaces and its ability to capture complex non-linear relationships [58,59], which might explain its strong performance with the Italian data. The pooled group also showed promising results (75.0% accuracy, 71.4% F1-score), suggesting that the model could capture some cross-linguistic features relevant to dyslexia. RF, an algorithm robust to overfitting and capable of handling complex feature interactions [60], may have performed well on the pooled group due to its ability to generalize across the increased variability introduced by combining the Italian and Spanish data [61]. However, the Spanish group showed lower classification accuracy (66.6% for both accuracy and F1-score), which might indicate that the model struggled to generalize to the specific characteristics of this group. This could be due to differences in the nature of the VR-derived data itself, such as the feature distributions or the strength of the relationship between VR measures and dyslexia manifestation in Spanish speakers. The latter, in particular, again emphasizes that each language, like every orthographic system, poses its own challenges [3]. Simpler algorithms such as Logistic Regression and Decision Trees, while useful in other contexts, may have been less effective because of their limitations in capturing non-linear relationships and handling complex feature interactions.
The cross-linguistic approach is a notable strength of this study. The successful application of ML algorithms, particularly on the Italian and entire datasets, suggests that the VR-based assessment captured some language-invariant features relevant to dyslexia. However, the lower performance on the Spanish group indicates that further investigation into language-specific influences is warranted.

4.1. Limitations

The power analysis indicated that the design was highly sensitive to the large speed effects detected in the Italian and pooled samples, but only moderately powered for SR time and under-powered for RSES time in the Spanish cohort, reinforcing the need for larger, age-diverse samples to confirm cross-linguistic generalization. Regarding algorithmic fairness, no explicit fairness metrics, such as demographic parity or equal opportunity, were computed. However, the dataset was carefully balanced in terms of group composition (equal numbers of participants with and without dyslexia), as reported in Table 1, as well as across key demographic variables such as age, language, and gender. This balanced design aimed to mitigate potential sources of bias during model training and evaluation. Nevertheless, given the relatively small sample size, we avoided further subdividing the dataset to compute fairness-related metrics, as this would have reduced statistical power and compromised the reliability of the results. This limitation will be addressed in future studies involving larger and more diverse populations, where dedicated fairness assessments could be incorporated to ensure equitable performance across subgroups. Additionally, the lower performance observed on the Spanish dataset may have been due to several factors. The statistical differences between groups were weaker than in the Italian sample, suggesting that the selected features may capture dyslexia-related patterns less effectively in Spanish. Although Spanish and Italian are both transparent languages, linguistic and cultural differences may affect how dyslexia manifests and how participants interacted with the VR tasks, demonstrating why a cross-linguistic approach is fundamental.

4.2. Future Works

Since the present study was conceived as an exploratory proof of concept, the VR protocol was not administered concurrently with the reference battery; a follow-up validation presenting equivalent BDA 16–30 passages in VR has already been planned. Additionally, the BDA 16–30 manual is intended for a single administration, mainly to avoid passage familiarity, and test–retest coefficients are not provided in the normative tables; the present study followed the same approach. Therefore, future work will develop parallel passages to estimate reliability without learning effects. Regarding subsamples, the post hoc power of the Spanish subsample was 0.52 for SR time and fell below 0.30 for RSES time, so medium cross-linguistic effects could have escaped detection. A follow-up study will recruit a larger number of participants per language and will incorporate language as an explicit covariate in the classification model, to test orthographic-transparency hypotheses with adequate statistical sensitivity. Finally, although neither the Rosenberg self-esteem score nor the SR comprehension score showed a significant group difference, both were retained in this exploratory model because small distributional shifts, especially when combined with timing features, may still aid classification. Future studies will benchmark models with and without these accuracy measures, and will remove them if they add no practical value.

5. Conclusions

This study provides valuable evidence for the potential of VR and ML in dyslexia assessment in Italian and Spanish university students. The machine learning results on the test set demonstrated the promise of these techniques for classifying dyslexia, with a particularly strong performance on the Italian group. The analysis of group differences largely supported the hypothesis that dyslexia is characterized by differences in task completion speed, with less impact on task accuracy and self-esteem. However, the difference in results between the Italian and Spanish samples and the varying performance of the ML models across groups suggest that every language needs a dedicated assessment, in order to select an appropriate algorithm. Future research should aim to expand the sample size, include participants from diverse age groups, and explore additional ML techniques. A key focus should be to further investigate the language-specific factors that may influence both VR-derived behavioral data and the performance of machine learning models.

Author Contributions

Conceptualization, M.M., G.M., J.M.A.-L., E.Y.-B., G.C., A.Z. and J.T.; Methodology, M.M., G.M., J.M.A.-L., A.Z. and J.T.; Resources, E.Y.-B. and G.C.; Data curation, M.M., G.M., J.M.A.-L., A.Z. and J.T.; Writing—original draft preparation, M.M., G.M., J.M.A.-L., A.Z. and J.T.; Writing—review and editing, M.M., G.M., J.M.A.-L., A.Z., E.Y.-B., G.C. and J.T.; Supervision, J.T., G.C. and E.Y.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study, since no experimentation was carried out on human beings. The involvement of humans was limited to the completion of Virtual Reality tests.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

José Manuel Alcalde-Llergo enrolled in the National PhD in Artificial Intelligence, XXXVIII cycle, course on Health and Life Sciences, organized by Università Campus Bio-Medico di Roma. He is also pursuing his doctorate with co-supervision at the Universidad de Córdoba (Spain), enrolled in its PhD program in Computation, Energy and Plasmas.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. World Health Organization. 6A03-Developmental Learning Disorder. Available online: https://icd.who.int/browse/2024-01/mms/en#2099676649 (accessed on 22 April 2025).
  2. Benedetti, I.; Barone, M.; Panetti, V.; Taborri, J.; Urbani, T.; Zingoni, A.; Calabrò, G. Clustering analysis of factors affecting academic career of university students with dyslexia in Italy. Sci. Rep. 2022, 12, 9010.
  3. Moore, K.A.; Lai, J.; Quinonez-Beltran, J.F.; Wijekumar, K.; Joshi, R.M. A cross-orthographic view of dyslexia identification. J. Cult. Cogn. Sci. 2023, 7, 197–217.
  4. Hengeveld, K.; Leufkens, S. Transparent and nontransparent languages. Folia Linguist. 2018, 52, 139–175.
  5. Cossu, G. The acquisition of Italian orthography. In Learning to Read and Write: A Cross-Linguistic Perspective; Cambridge University Press: Cambridge, UK, 1999; Volume 2, pp. 10–33.
  6. Li, J.; Han, X.; Wang, W.; Sun, G.; Cheng, Z. How social support influences university students’ academic achievement and emotional exhaustion: The mediating role of self-esteem. Learn. Individ. Differ. 2018, 61, 120–126.
  7. Kermode, S.; MacLean, D. A study of the relationship between quality of life, self-esteem and health. Aust. J. Adv. Nurs. 2001, 19, 33–40.
  8. Donnellan, M.B.; Trzesniewski, K.H.; Robins, R.W. Measures of self-esteem. In Measures of Personality and Social Psychological Constructs; Elsevier: Amsterdam, The Netherlands, 2015; pp. 131–157.
  9. Rosenberg, M. Rosenberg self-esteem scale. J. Relig. Health 1965.
  10. Tinakon, W.; Nahathai, W. A comparison of reliability and construct validity between the original and revised versions of the Rosenberg Self-Esteem Scale. Psychiatry Investig. 2012, 9, 54.
  11. Alesi, M.; Rappo, G.; Pepi, A. Self-esteem at school and self-handicapping in childhood: Comparison of groups with learning disabilities. Psychol. Rep. 2012, 111, 952–962.
  12. Conley, T.D.; Ghavami, N.; VonOhlen, J.; Foulkes, P. General and domain-specific self-esteem among regular education and special education students. J. Appl. Soc. Psychol. 2007, 37, 775–789.
  13. Parshurami, A. A study on self-esteem and adjustment in children with learning disability. Indian J. Ment. Health 2015, 2, 306–311.
  14. Shah, P. The Relationship between Anxiety, Depression and Self-esteem in Adolescents with Learning Disability. Indian J. Ment. Health 2019, 6, 368–376.
  15. Burden, R.; Burdett, J. Factors associated with successful learning in pupils with dyslexia: A motivational analysis. Br. J. Spec. Educ. 2005, 32, 100–104.
  16. Pathrikar, K.R. A study on perceived social support and self esteem in children with and without learning disability. Indian J. Ment. Health 2016, 3, 271–277.
  17. Catts, H.; Petscher, Y. Early identification of dyslexia: Current advancements and future directions. Perspect. Lang. Lit. 2018, 44, 33–36.
  18. Zingoni, A.; Taborri, J.; Panetti, V.; Bonechi, S.; Aparicio-Martínez, P.; Pinzi, S.; Calabrò, G. Investigating issues and needs of dyslexic students at university: Proof of concept of an artificial intelligence and virtual reality-based supporting platform and preliminary results. Appl. Sci. 2021, 11, 4624.
  19. Zingoni, A.; Taborri, J.; Calabrò, G. A machine learning-based classification model to support university students with dyslexia with personalized tools and strategies. Sci. Rep. 2024, 14, 273.
  20. Morciano, G.; Llergo, J.M.A.; Zingoni, A.; Bolívar, E.Y.; Taborri, J.; Calabrò, G. Use of recommendation models to provide support to dyslexic students. Expert Syst. Appl. 2024, 249, 123738.
  21. van den Boer, M.; Bazen, L.; de Bree, E. The same yet different: Oral and silent reading in children and adolescents with dyslexia. J. Psycholinguist. Res. 2022, 51, 803–817.
  22. Smyrnakis, I.; Andreadakis, V.; Rina, A.; Boufachrentin, N.; Aslanides, I.M. Silent versus reading out loud modes: An eye-tracking study. J. Eye Mov. Res. 2021, 14, 10–16910.
  23. Gagliano, A.; Ciuffo, M.; Ingrassia, M.; Ghidoni, E.; Angelini, D.; Benedetto, L.; Germanò, E.; Stella, G. Silent reading fluency: Implications for the assessment of adults with developmental dyslexia. J. Clin. Exp. Neuropsychol. 2015, 37, 972–980.
  24. Hairrell, A.; Edmonds, M.; Vaughn, S.; Simmons, D. Independent silent reading for struggling readers: Pitfalls and potential. In Revisiting Silent Reading: New Directions for Teachers and Researchers; International Reading Association: Newark, DE, USA, 2010; pp. 275–289. ISBN 979-0-87207-833-8.
  25. Grigorenko, E.L.; Compton, D.L.; Fuchs, L.S.; Wagner, R.K.; Willcutt, E.G.; Fletcher, J.M. Understanding, educating, and supporting children with specific learning disabilities: 50 years of science and practice. Am. Psychol. 2020, 75, 37.
  26. Kourtesis, P.; Collina, S.; Doumas, L.A.; MacPherson, S.E. Validation of the Virtual Reality Everyday Assessment Lab (VR-EAL): An immersive virtual reality neuropsychological battery with enhanced ecological validity. J. Int. Neuropsychol. Soc. 2021, 27, 181–196.
  27. Servotte, J.C.; Goosse, M.; Campbell, S.H.; Dardenne, N.; Pilote, B.; Simoneau, I.L.; Guillaume, M.; Bragard, I.; Ghuysen, A. Virtual reality experience: Immersion, sense of presence, and cybersickness. Clin. Simul. Nurs. 2020, 38, 35–43.
  28. Lin, X.P.; Li, B.B.; Yao, Z.N.; Yang, Z.; Zhang, M. The impact of virtual reality on student engagement in the classroom—A critical review of the literature. Front. Psychol. 2024, 15, 1360574.
  29. Drigas, A.; Mitsea, E.; Skianis, C. Virtual reality and metacognition training techniques for learning disabilities. Sustainability 2022, 14, 10170.
  30. Alcalde-Llergo, J.M.; Yeguas-Bolívar, E.; Aparicio-Martínez, P.; Zingoni, A.; Taborri, J.; Pinzi, S. A VR serious game to increase empathy towards students with phonological dyslexia. In Proceedings of the 2023 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), Milano, Italy, 25–27 October 2023; pp. 184–188.
  31. Alcalde-Llergo, J.M.; Aparicio-Martínez, P.; Zingoni, A.; Pinzi, S.; Yeguas-Bolívar, E. Fostering Inclusion: A Virtual Reality Experience to Raise Awareness of Dyslexia-Related Barriers in University Settings. Electronics 2025, 14, 829.
  32. Pijnenborg, G.; Nijman, S.; Veling, W. DiscoVR: Results of a multicenter RCT on a social cognitive virtual reality training to enhance social cognition in psychosis. Eur. Psychiatry 2022, 65, S119.
  33. Maskati, E.; Alkeraiem, F.; Khalil, N.; Baik, R.; Aljuhani, R.; Alsobhi, A. Using virtual reality (VR) in teaching students with dyslexia. Int. J. Emerg. Technol. Learn. 2021, 16, 291–305.
  34. Maresca, G.; Leonardi, S.; De Cola, M.C.; Giliberto, S.; Di Cara, M.; Corallo, F.; Quartarone, A.; Pidalà, A. Use of virtual reality in children with dyslexia. Children 2022, 9, 1621.
  35. Maresca, G.; Corallo, F.; De Cola, M.C.; Formica, C.; Giliberto, S.; Rao, G.; Crupi, M.F.; Quartarone, A.; Pidalà, A. Effectiveness of the Use of Virtual Reality Rehabilitation in Children with Dyslexia: Follow-Up after One Year. Brain Sci. 2024, 14, 655.
  36. Vaitheeshwari, R.; Chih-Hsuan, C.; Chung, C.R.; Yang, H.Y.; Yeh, S.C.; Wu, E.H.K.; Kumar, M. Dyslexia Analysis and Diagnosis Based on Eye Movement. IEEE Trans. Neural Syst. Rehabil. Eng. 2024, 32, 4109–4119.
  37. Chalkiadakis, A.; Seremetaki, A.; Kanellou, A.; Kallishi, M.; Morfopoulou, A.; Moraitaki, M.; Mastrokoukou, S. Impact of artificial intelligence and virtual reality on educational inclusion: A systematic review of technologies supporting students with disabilities. Educ. Sci. 2024, 14, 1223.
  38. Ko, J.; Jang, S.W.; Lee, H.T.; Yun, H.K.; Kim, Y.S. Effects of virtual reality and non–virtual reality exercises on the exercise capacity and concentration of users in a ski exergame: Comparative study. JMIR Serious Games 2020, 8, e16693.
  39. Wen, E.; Gupta, C.; Sasikumar, P.; Billinghurst, M.; Wilmott, J.; Skow, E.; Dey, A.; Nanayakkara, S. VR.net: A real-world dataset for virtual reality motion sickness research. IEEE Trans. Vis. Comput. Graph. 2024, 30, 2330–2336.
  40. Spitzley, K.A.; Karduna, A.R. Feasibility of using a fully immersive virtual reality system for kinematic data collection. J. Biomech. 2019, 87, 172–176.
  41. Khan, R.U.; Cheng, J.L.A.; Bee, O.Y. Machine learning and Dyslexia: Diagnostic and classification system (DCS) for kids with learning disabilities. Int. J. Eng. Technol. 2018, 7, 97–100.
  42. Tamboer, P.; Vorst, H.; Ghebreab, S.; Scholte, H. Machine learning and dyslexia: Classification of individual structural neuro-imaging scans of students with and without dyslexia. Neuroimage Clin. 2016, 11, 508–514.
  43. Płoński, P.; Gradkowski, W.; Marchewka, A.; Jednoróg, K.; Bogorodzki, P. Dealing with the heterogeneous multi-site neuroimaging data sets: A discrimination study of children dyslexia. In Proceedings of the Brain Informatics and Health: International Conference, BIH 2014, Warsaw, Poland, 11–14 August 2014; Springer: Cham, Switzerland, 2014; pp. 471–480.
  44. Płoński, P.; Gradkowski, W.; Altarelli, I.; Monzalvo, K.; van Ermingen-Marbach, M.; Grande, M.; Heim, S.; Marchewka, A.; Bogorodzki, P.; Ramus, F.; et al. Multi-parameter machine learning approach to the neuroanatomical basis of developmental dyslexia. Hum. Brain Mapp. 2017, 38, 900–908.
  45. Zhang, L.; Lin, Y.; Yang, X.; Chen, T.; Cheng, X.; Cheng, W. From sample poverty to rich feature learning: A new metric learning method for few-shot classification. IEEE Access 2024, 12, 124990–125002.
  46. Yeguas-Bolívar, E.; Alcalde-Llergo, J.M.; Aparicio-Martínez, P.; Taborri, J.; Zingoni, A.; Pinzi, S. Determining the difficulties of students with dyslexia via virtual reality and artificial intelligence: An exploratory analysis. In Proceedings of the 2022 IEEE International Conference on Metrology for Extended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), Rome, Italy, 26–28 October 2022; pp. 585–590.
  47. Materazzini, M.; Morciano, G.; Alcalde-Llergo, J.M.; Yeguas-Bolivar, E.; Zingoni, A.; Taborri, J. VR-based Silent Reading and Rosenberg Tests: Machine-Learning Approach to Identify Learning Disorders. In Proceedings of the 2024 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), St Albans, UK, 21–23 October 2024; pp. 541–546.
  48. Zingoni, A.; Morciano, G.; Alcalde-Llergo, J.M.; Taborri, J.; Yeguas-Bolivar, E.; Aparicio-Martinez, P.; Pinzi, S.; Calabro, G. VRAIlexia project: Provide customized support to university students with dyslexia using Artificial Intelligence and Virtual Reality. In Proceedings of the 2024 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), St Albans, UK, 21–23 October 2024; pp. 535–540. [Google Scholar] [CrossRef]
  49. Santulli, F.; Scagnelli, M.; Ciuffo, M.; Baradello, A. SuperReading: Ulteriori prove di efficacia rilevate con i test di valutazione per l’adulto. Dislessia 2018, 15, 35–51. [Google Scholar]
  50. Ciuffo, M.; Angelini, D.; Barletta Rodolfi, C.; Gagliano, A.; Ghidoni, E.; Stella, G. BDA 16-30 Batteria Dislessia Adulti; Giunti Psychometrics S.r.l.: Florence, Italy, 2018. [Google Scholar]
  51. VRAIlexia. Available online: https://vrailexia.eu/ (accessed on 22 April 2025).
  52. Casini, L.; Pech-Georgel, C.; Ziegler, J.C. It’s about time: Revisiting temporal processing deficits in dyslexia. Dev. Sci. 2018, 21, e12530. [Google Scholar] [CrossRef]
  53. Snowling, M.J.; Hulme, C.; Nation, K. Defining and understanding dyslexia: Past, present and future. Oxf. Rev. Educ. 2020, 46, 501–513. [Google Scholar] [CrossRef]
  54. Moojen, S.M.P.; Gonçalves, H.A.; Bassôa, A.; Navas, A.L.; de Jou, G.; Miguel, E.S. Adults with dyslexia: How can they achieve academic success despite impairments in basic reading and writing abilities? The role of text structure sensitivity as a compensatory skill. Ann. Dyslexia 2020, 70, 115–140. [Google Scholar] [CrossRef] [PubMed]
  55. van Viersen, S.; de Bree, E.H.; de Jong, P.F. Protective factors and compensation in resolving dyslexia. Sci. Stud. Read. 2019, 23, 461–477. [Google Scholar] [CrossRef]
  56. Arsandaux, J.; Boujut, E.; Salamon, R.; Tzourio, C.; Galéra, C. Self-esteem in male and female college students: Does childhood/adolescence background matter more than young-adulthood conditions? Personal. Individ. Differ. 2023, 206, 112117. [Google Scholar] [CrossRef]
  57. Yang, S.; Huang, P.; Li, B.; Gan, T.; Lin, W.; Liu, Y. The relationship of negative life events, trait-anxiety and depression among Chinese university students: A moderated effect of self-esteem. J. Affect. Disord. 2023, 339, 384–391. [Google Scholar] [CrossRef]
  58. Rajeswari, S.; Kumari, D.A. Role of Environmental, Social and Governance on Firm Value using Support Vector Machine. In Proceedings of the 2024 2nd International Conference on Advances in Computation, Communication and Information Technology (ICAICCIT), Faridabad, India, 28–29 November 2024; Volume 1, pp. 524–529. [Google Scholar]
  59. Roy, A.; Chakraborty, S. Support vector machine in structural reliability analysis: A review. Reliab. Eng. Syst. Saf. 2023, 233, 109126. [Google Scholar] [CrossRef]
  60. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  61. Lovrić, M.; Pavlović, K.; Žuvela, P.; Spataru, A.; Lučić, B.; Kern, R.; Wong, M.W. Machine learning in prediction of intrinsic aqueous solubility of drug-like compounds: Generalization, complexity, or predictive ability? J. Chemom. 2021, 35, e3349. [Google Scholar] [CrossRef]
Figure 1. Screenshots of the familiarization phase. (a) How to interact; (b) Test of the speech-to-text algorithm; (c) Font customization.
Figure 2. Screenshots of the educational segment. (a) VRAIlexia’s project explanation; (b) Well-known figures with certified SLDs; (c) Overview of institutions participating in the project.
Figure 3. Screenshots of the test phase. (a) SR test; (b) RSES; (c) Option to choose which test to start with.
Figure 4. Correlation matrix.
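For readers who want to reproduce this step, the following is a minimal sketch of how a feature correlation matrix can be computed with pandas; the column names and values are illustrative placeholders, not the exact variables collected in the study.

```python
import pandas as pd

# Illustrative feature table; column names and values are placeholders,
# not the actual variables extracted from the VR application.
df = pd.DataFrame({
    "sr_total_time": [312.4, 280.1, 455.9, 390.2],
    "sr_errors": [1, 0, 3, 2],
    "rses_score": [22, 25, 18, 20],
    "rses_time": [95.0, 88.5, 130.2, 110.7],
})

# Pearson correlation between every pair of features.
corr = df.corr(method="pearson")
print(corr.round(2))
```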
Figure 5. Performance of the Italian group: (a) average time to perform the SR test; (b) average time to perform the RSES test; (c) average number of errors made during the SR test; (d) average score obtained in the RSES test.
Figure 6. Performance of the ML algorithms for the Italian group. (a) RF with Gini criterion; (b) RF with entropy criterion; (c) SVM; (d) LR; (e) k-NN; (f) DT.
Figure 7. Performance of the Spanish group: (a) average time to perform the SR test; (b) average time to perform the RSES test; (c) average number of errors made during the SR test; (d) average score obtained in the RSES test.
Figure 8. Performance of the ML algorithms for the Spanish group. (a) RF with Gini criterion; (b) RF with entropy criterion; (c) SVM; (d) LR; (e) k-NN; (f) DT.
Figure 9. Performance of the pooled group: (a) average time to perform the SR test; (b) average time to perform the RSES test; (c) average number of errors made during the SR test; (d) average score obtained in the RSES test.
Figure 10. Performance of the ML algorithms for the pooled group. (a) RF with Gini criterion; (b) RF with entropy criterion; (c) SVM; (d) LR; (e) k-NN; (f) DT.
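Figures 6, 8 and 10 compare the same six classifier configurations. As a minimal, non-authoritative sketch of such a comparison, the snippet below trains them with scikit-learn on synthetic data; the hyperparameters and the five-fold cross-validation shown here are illustrative assumptions, not the exact settings of the study.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Synthetic stand-in for the VR-derived features and dyslexia labels.
X, y = make_classification(n_samples=40, n_features=10, random_state=0)

# The six configurations compared in Figures 6, 8 and 10; the
# hyperparameters here are assumptions for illustration.
models = {
    "RF (Gini)": RandomForestClassifier(criterion="gini", random_state=0),
    "RF (entropy)": RandomForestClassifier(criterion="entropy", random_state=0),
    "SVM": SVC(kernel="rbf"),
    "LR": LogisticRegression(max_iter=1000),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "DT": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    # Five-fold cross-validated accuracy; the paper's protocol may differ.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

Swapping the synthetic data for the real feature matrix and labels would reproduce the structure of the comparison, though not necessarily the reported accuracies.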
Table 1. Demographic characteristics of the participant subgroups.

Group       | Mean Age (years) | Gender (M/F) | Field of Study
Italian CG  | 25.8 ± 2.8       | 11/9         | Engineering–Humanities
Italian SLD | 21.6 ± 2.8       | 10/10        | Engineering–Humanities
Spanish CG  | 26.0 ± 3.1       | 12/8         | Computer Science–Nursing
Spanish SLD | 24.9 ± 4.7       | 10/10        | Computer Science–Nursing
Table 2. Features extracted from the Out of the Box application.

Feature Name          | Unit          | Test Origin | Description
Demographic info.     | —             | General     | Basic demographic information (age, sex, and language)
Reported SLDs         | —             | General     | SLDs diagnosed in the participant (dyslexia, dyscalculia, dysgraphia, and dysorthography)
Additional problems   | —             | General     | Other cognitive or developmental conditions (e.g., ADHD)
Device                | —             | General     | Device used during the VR session (headset or cardboard)
SR response times     | Seconds       | SR          | Time taken to perform each of the nine comprehension tasks during the SR test
SR accuracy           | Boolean       | SR          | Whether each response in the reading task was correct
Total reading time    | Seconds       | SR          | Time to complete the entire reading assessment
Total reading errors  | Count         | SR          | Total number of incorrect answers in the reading test
Environment noise     | Boolean       | SR          | Quality of the environment during the reading task
Microphone issues     | Boolean       | SR          | Presence of technical microphone problems
Self-esteem responses | Ordinal (1–4) | RSES        | Responses to the 10 items of the RSES
Self-esteem score     | Score (0–30)  | RSES        | Sum of the Rosenberg items, indicating global self-esteem
Total RSES time       | Seconds       | RSES        | Time taken to complete the self-esteem test
Environment           | Categorical   | RSES        | Conditions during the self-esteem assessment
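To make the feature set in Table 2 concrete, here is a minimal sketch of how one participant’s record could be flattened into a numeric feature vector; the field names, values, and encodings are illustrative assumptions rather than the application’s actual data format.

```python
# Hypothetical record for one participant; all names and values are
# illustrative, not taken from the application's real output.
record = {
    "age": 23,
    "sex": "F",
    "device": "headset",        # headset vs. cardboard
    "sr_response_times": [12.3, 9.8, 15.1, 11.0, 10.2, 13.7, 9.5, 14.8, 12.0],
    "sr_accuracy": [1, 1, 0, 1, 1, 1, 0, 1, 1],        # Boolean per task
    "environment_noise": 0,
    "microphone_issues": 0,
    "rses_responses": [3, 2, 3, 3, 2, 3, 3, 2, 3, 3],  # ordinal 1-4
    "rses_time": 102.5,
}

features = [
    float(record["age"]),
    1.0 if record["sex"] == "F" else 0.0,
    1.0 if record["device"] == "headset" else 0.0,
    sum(record["sr_response_times"]),                  # total reading time (s)
    float(len(record["sr_accuracy"]) - sum(record["sr_accuracy"])),  # errors
    float(record["environment_noise"]),
    float(record["microphone_issues"]),
    # Proxy for the self-esteem score; the standard RSES scoring maps
    # each item to 0-3 (with reverse-scored items) before summing.
    float(sum(record["rses_responses"])),
    record["rses_time"],
]
print(features)
```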
Table 3. Results of the t-test and Mann–Whitney test for the Italian sample.

                       | SR             | RSES
t-test p-value (power) | <0.001 (0.999) | 0.003 (0.932)
Mann–Whitney p-value   | 0.584          | 0.531
Table 4. Results of the t-test and Mann–Whitney test for the Spanish sample.

                       | SR            | RSES
t-test p-value (power) | 0.063 (0.522) | 0.174 (0.264)
Mann–Whitney p-value   | 0.696         | 0.069
Table 5. Results of the t-test and Mann–Whitney test for the entire sample.

                       | SR             | RSES
t-test p-value (power) | <0.001 (0.994) | 0.005 (0.814)
Mann–Whitney p-value   | 0.952          | 0.853
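The group comparisons reported in Tables 3–5 can be reproduced in spirit with SciPy and statsmodels, as sketched below; the arrays are synthetic stand-ins for per-participant SR completion times, and the power computation follows the standard two-sample t-test formulation rather than the study’s exact procedure.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-participant SR completion times (seconds).
times_sld = rng.normal(loc=420.0, scale=60.0, size=20)  # group with dyslexia
times_cg = rng.normal(loc=320.0, scale=55.0, size=20)   # control group

# Parametric comparison (two-sample t-test), first row of Tables 3-5.
t_stat, p_t = stats.ttest_ind(times_sld, times_cg)

# Non-parametric comparison (Mann-Whitney U), second row of Tables 3-5.
u_stat, p_u = stats.mannwhitneyu(times_sld, times_cg, alternative="two-sided")

# Post hoc power of the t-test, using Cohen's d as the effect size.
pooled_sd = np.sqrt((times_sld.var(ddof=1) + times_cg.var(ddof=1)) / 2)
effect = (times_sld.mean() - times_cg.mean()) / pooled_sd
power = TTestIndPower().power(effect_size=effect, nobs1=len(times_sld),
                              alpha=0.05, ratio=1.0)

print(f"t-test: p = {p_t:.4f} (power = {power:.3f})")
print(f"Mann-Whitney: p = {p_u:.4f}")
```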