1. Introduction
Neurodevelopmental disorders (NDs) refer to a range of conditions that impact the development and functioning of the brain, leading to difficulties in communication, learning, social interaction, behavior, cognition, and emotional functioning [
1,
2]. NDs are defined by persistent difficulties in acquiring, understanding, and/or using spoken or written language, leading to an inability to express oneself, engage in meaningful conversations, and fully participate in social and professional interactions [
2,
3,
4,
5,
6,
7,
8,
9,
10]. NDs affect more than 10% of people globally, triggering long-term consequences and a significant economic burden [
11]. NDs include the following [
1,
12,
13]:
Autism spectrum disorders (ASD), which are characterized by behavior and communication difficulties;
Attention deficit hyperactivity disorder (ADHD), which is characterized by inattention, impulsivity, and hyperactivity;
Intellectual disability (ID), indicated by cognitive impairments;
Specific learning disorder (SLD), characterized by persistent difficulties in learning and using academic skills, such as reading, writing, and arithmetic;
Communication disorders (CDs), illustrated by persistent difficulties in language acquisition and usage.
The field of neurodevelopmental disorders and their identification has become increasingly complex and challenging [
14]. Identifying neurodevelopmental disorders poses significant challenges due to the complexity and variability of their symptoms and presentations [
15,
16,
17]. While medical advancements have helped, inaccurate diagnosis and comorbidities make it hard to establish precise diagnostic boundaries [
18]. In particular, more than one-third of individuals with ASD exhibit symptoms that meet the criteria for other disorders, resulting in numerous possible diagnostic combinations. Further, traditional diagnostic methods often rely on subjective observations and lengthy assessments, leading to delayed or inaccurate diagnoses [
3,
19,
20,
21]. Early detection is crucial as the developing brain is adaptable, allowing for the creation of compensation mechanisms. Rapid medical intervention can contribute to the reduction or mitigation of symptoms, ultimately enhancing the individual’s overall quality of life [
18].
With advancements in technology, specifically in the domain of machine learning (ML), researchers are exploring innovative ways to enhance the accuracy and efficiency of assessing the risk of NDs. The potential of ML algorithms to analyze large amounts of complex data is enormous [
18]. These algorithms can identify patterns and connections within the data that may be difficult for humans to discern, laying the groundwork for more accurate and efficient diagnostic tools that assist clinicians in early detection and support children with NDs during a critical period in which early intervention can significantly impact long-term outcomes [
18,
22,
23,
24].
Numerous studies have investigated various types of NDs, applying a wide range of ML techniques to diverse types of data for diagnosis and prediction, with an emphasis on accuracy and cost-effectiveness [
6,
15,
25,
26]. These ML techniques mainly employ supervised learning methods (such as regression, support vector machines (SVMs), decision trees, artificial neural networks (ANNs), and Bayesian approaches) and unsupervised learning methods (such as clustering, association rules, and dimensionality reduction) [
27]. Semi-supervised learning and reinforcement learning techniques are used less frequently [
27].
The current literature reports ML classifiers used to diagnose ASD, developmental language disorder (DLD), and global developmental delay (GDD) in preschoolers [
24]. To be precise, different models were utilized, including neural networks, decision trees, support vector machines, XGBoost (eXtreme Gradient Boosting), and logistic regression, while accuracy was evaluated by twelve doctors. The study reported the potential for improving our understanding of these disorders’ diagnosis based on behavioral and developmental features. In a different study, reduced motor synergies were linked to motor control issues in individuals with ASD, and early detection of motor impairments was suggested as a way to effectively differentiate between ASD and typically developing (TD) individuals using an SVM with a radial basis function (RBF) kernel [
29]. In another research study, a face recognition framework for detecting early signs of ASD analyzed facial traits and eye contact using k-means clustering; this unsupervised learning approach has been suggested to assist medical professionals in making accurate clinical decisions [
30]. Also using the k-means algorithm, Vargason et al. studied ASD and identified three distinct categories of children with ASD [
30]. They also found that developmental delay, gastrointestinal issues, and immunological imbalances are common comorbidities associated with ASD. These findings can help identify comorbidities and subgroups within the ASD population. Moreover, ML-based evaluation was proposed to detect heart rate variability in children with ASD dealing with bradycardia (low heart rates) [
31]. Most cardiovascular conditions in children with ASD are congenital. Other researchers used eye-tracking data in ML to screen for ASD [
32], and their neural network model outperformed traditional methods, indicating that eye-tracking data could help doctors to quickly and accurately identify autism.
Early indicators of ADHD and ASD have been reported with ML and deep learning (DL) approaches, utilizing convolutional neural networks (CNNs) and deep learning APIs [
15]. DL techniques and CNNs have utilized personalized spatial-frequency anomalies in EEG power spectrum density to identify ADHD in children [
33]. Also, a CNN has been used to distinguish ADHD from healthy controls using stacked multi-channel EEG time-frequency decompositions [
34]. ML prediction based on morphological and other feature extraction techniques applied to EEG signals was also analyzed; the Bernoulli naive Bayes classifier outperformed the others in distinguishing ADHD [
35].
Researchers have developed a new method to categorize adolescents with intellectual impairments, involving the extraction of speech features from linear predictive coding (LPC)-based cepstral parameters and mel-frequency cepstral coefficients (MFCC) [
36]. The utilized classification models were k-nearest neighbor (k-NN), support vector machine (SVM), linear discriminant analysis (LDA), and radial basis function neural network (RBFNN), with findings suggesting that the proposed methodology can help speech pathologists in estimating intellectual disabilities at an early age. Further, ML on resting-state electroencephalography recordings attempted to distinguish healthy individuals from those with intellectual and developmental disorders [
37]. Their approach achieved a balanced accuracy of 91.67% and identified a lower beta activity in the 19.5–21 Hz range as the most distinguishing characteristic for individuals with these disorders. Furthermore, ML and regression models were compared for early ASD and ID diagnosis [
38]. Using logistic regression, SVM, and ensemble learning techniques, 241 children with ASD were diagnosed. Of these children, 40.66% had both ASD and ID, and the researchers’ findings suggested that ML models based on socio-demographic and behavioral observation data, like SVM, may better identify autistic children with ID than regression models. More studies reveal the ML efficiency in SLD [
10,
21,
39,
40], and CD [
41]. A common observation in real-world scenarios is the co-occurrence of multiple disorders within a single person, which can be further explored in research [
6,
15,
19,
25,
42,
43].
Consequently, there is a growing need for automated diagnostic tools to help experts accurately and efficiently identify NDs in children. SmartSpeech (Ioannina, Greece) is an innovative system that uses a serious game and a machine learning model to assess a child’s developmental profile [
19,
43]. The aim of this study is to improve the model’s ability to capture complex patterns in the game dataset that are difficult for humans to comprehend, thereby enhancing its accuracy. This study can contribute to advancing our understanding of NDs, aid clinicians in early detection, support children with NDs during a crucial period, and improve digital diagnostic tools. It employs a complete machine learning strategy with logistic regression, using Thurstone’s factor score estimation on the SmartSpeech game dataset to make predictions about NDs.
2. Materials and Methods
This study is an extension of the ongoing project SmartSpeech, with the full title “Smart Computing Models, Sensors, and Early diagnostic speech and language deficiencies indicators in Child Communication”, funded by the Region of Epirus and supported by the European Regional Development Fund (ERDF). Participants were recruited through public and private health and education establishments, with most of them being young children. Prior to the start of the study, parents were informed of the project’s scope and protocols, provided with written consent forms, and shared information about their child’s developmental and communication skills. They were also informed that the study had been approved by the University of Ioannina Research Ethics Committee, in compliance with the General Data Protection Regulation (GDPR).
During the project’s data collection, the children played a serious game (SG) that was part of the SmartSpeech system. The SG activities were designed to collect data on the children’s developmental skills and biometric measurements to examine potential biomarkers for classification purposes. The game dataset included variables that were child responses quantified from two sources: hand movements on the touch screen (such as solving puzzles, manipulating items on the touchscreen, or identifying images and forms) and verbal responses to questions or executing commands (such as recalling names/events, recognizing emotions, or answering with vocal replies). It is important to note that this study only focuses on the dataset gathered from the SG activities, excluding biometric measurements [
19,
43]. The children participated in a range of activities that were presented in an engaging and visually appealing manner.
To recognize the children’s verbal responses, we used the CMUSphinx voice-to-text program (Pittsburgh, PA, USA) [
44]. This program is accessible, open-source, and compatible with both desktop and mobile platforms. Additionally, we designed and trained a Greek language model using this program [
45].
2.1. Data Measurements
Most prediction models classify the data points they use into four categories: (i) true positive (TP): the individual being referred to does indeed have NDs, and our prediction correctly identified that the person has NDs; (ii) true negative (TN): the individual does not have NDs, and our prediction correctly identified that the individual does not have NDs; (iii) false positive (FP): despite the absence of actual NDs in the individual, our prediction erroneously indicated the presence of NDs (this type of problem is called a Type 1 error); and (iv) false negative (FN): despite the presence of NDs in the subject, our prediction erroneously indicated that the individual does not have NDs (this type of problem is called a Type 2 error).
For the classification of the datasets, the reported accuracy is the average classification accuracy as measured in the test set. Accuracy quantifies the probability of the classifier predicting the correct outcome; put simply, it is the ratio of correct predictions to the total number of predictions. Accuracy is expressed in Equation (1):

Accuracy = (TP + TN)/(TP + TN + FP + FN)  (1)

Next, the precision metric measures how accurate our positive predictions were, i.e., the percentage of predicted positive cases that are actually positive. Precision is defined in Equation (2):

Precision = TP/(TP + FP)  (2)

Next, the recall metric measures the proportion of actual positive cases that the model correctly identifies. Recall and sensitivity are equivalent. Equation (3) defines recall:

Recall = TP/(TP + FN)  (3)

Finally, the F1 score is a performance metric that assesses the efficacy of a model in terms of both precision and recall. It ranges from 0 to 1, with a higher value indicating better performance. It is particularly useful in situations where maintaining a balance between false positives and false negatives is critical, such as in medical diagnosis or fraud detection. Equation (4) defines the F1 score:

F1 = 2 × (Precision × Recall)/(Precision + Recall)  (4)
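As an illustrative sketch (independent of the SmartSpeech pipeline), the four metrics defined in Equations (1)–(4) can be computed directly from the raw confusion counts; the counts below are hypothetical:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, precision, recall, and F1 from raw confusion counts,
    following Equations (1)-(4)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # a.k.a. sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, purely for illustration
acc, prec, rec, f1 = classification_metrics(tp=40, tn=35, fp=8, fn=9)
```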
2.2. Analysis Workflow
Our analysis workflow has been divided into the following fundamental phases to achieve accurate and meaningful results.
In the first phase, we paid particular attention to data preprocessing, a crucial step in ensuring data quality and cleanliness. During this phase, we handled missing values, removed outliers, and standardized variables to make the data consistent and homogeneous.
Next, we performed a cluster analysis in the second phase to group similar variables into homogeneous clusters. This procedure allowed us to identify patterns and structures within the data, revealing any groupings of participants with similar characteristics.
In the third phase, we conducted a reliability analysis to assess the consistency and reliability of our variables’ measurements. This analysis allowed us to verify the stability of the measures over time and ensure the validity of the obtained results.
Subsequently, in the fourth phase, we engaged in a factor analysis on the initial 13 variables, which led to the identification of five latent factors. These factors precisely aligned with the clusters identified in the cluster analysis, providing deeper insights into the underlying dimensions of our data.
After completing the factor analysis, we moved on to the fifth and final phase of our workflow, where we developed a predictive model based on machine learning techniques. Using the logistic regression model, we predicted the presence of an ND based on the latent factors identified in the factor analysis. This prediction model allowed us to obtain accurate and clinically relevant results, providing valuable insights for early diagnosis and treatment of NDs. The methodological workflow is illustrated in
Figure 1.
Through this comprehensive and in-depth analysis workflow, our study aims to provide a deeper understanding of NDs and develop a precise and reliable predictive model for their identification. The results obtained could significantly contribute to clinical practice and neurological research, enabling the early detection of at-risk subjects and providing targeted and personalized support for their well-being and development.
2.3. Application Context
In this study, we utilize a novel and recently developed serious game dataset that collects various data on children’s speech and linguistic responses [
19].
The initial dataset consisted of 520 instances, which, after undergoing the first phase of preprocessing, was reduced to 473 participants. Analyses were performed on this sample to obtain reliable data for a more robust model. Finally, predictive analyses were conducted on a subset of 184 participants with an average age of 7 years.
The analyses were conducted using Orange Data Mining v3.36 on an Apple M1 Pro system with 16 GB RAM and 1 TB storage, operating on macOS Sonoma 14.2.1. This setup, coupled with the application of advanced machine learning techniques, ensured the efficiency and reproducibility of our analyses. The significance of such machine learning methodologies in extracting meaningful insights and predictive models from complex data sets has been previously underscored and validated in similar studies within the field of public health performance assessment [
46].
2.4. Data Preprocessing
In phase number 1 of the preprocessing in the machine learning environment, we performed the following operations:
We addressed the issue of missing data, which constituted 2.1% of the initial dataset comprising 520 instances, by employing the “impute” widget of the Orange data mining software. Specifically, we utilized a model-based imputer (simple tree) approach. This method constructs a decision tree for each attribute with missing values, using the remaining attributes to predict and impute the missing data. This technique is particularly noted for its ability to maintain the intrinsic structure of the data and provide a statistically sound solution for handling incomplete observations in datasets.
We selected the 13 features under study using the “select column” widget.
We standardized the selected variables using the “Continuize” widget, with mean = 0 and SD = 1.
We considered only inliers using the “outliers” widget.
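The preprocessing steps above were performed with Orange widgets. As a loose analogue only (not Orange’s exact algorithms), the pipeline can be sketched in scikit-learn: a shallow decision tree imputes each incomplete column, `StandardScaler` standardizes to mean 0 / SD 1, and `LocalOutlierFactor` stands in for the outlier filter:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import LocalOutlierFactor

def tree_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Mimic a model-based (simple tree) imputer: for each column with
    missing values, fit a shallow decision tree on the rows where the
    column is observed and predict the missing entries."""
    out = df.copy()
    provisional = df.fillna(df.mean())  # provisional fill for predictor columns
    for col in df.columns[df.isna().any()]:
        mask = df[col].isna()
        X = provisional.drop(columns=col)
        tree = DecisionTreeRegressor(max_depth=3, random_state=0)
        tree.fit(X[~mask], df.loc[~mask, col])
        out.loc[mask, col] = tree.predict(X[mask])
    return out

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute, standardize to mean 0 / SD 1, then keep only inliers."""
    filled = tree_impute(df)
    z = pd.DataFrame(StandardScaler().fit_transform(filled), columns=df.columns)
    inlier = LocalOutlierFactor(n_neighbors=20).fit_predict(z) == 1
    return z[inlier]
```

A tree-based imputer of this kind tends to preserve the joint structure of the data better than a plain mean fill, which is the property the Orange widget is chosen for.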
The descriptive statistics of the resulting dataset are shown in
Table 1.
2.5. Cluster Analysis
A hierarchical clustering analysis was conducted using the Spearman distance metric and Ward linkage. This helped to identify and differentiate various attributes within five distinct groups, which are illustrated in the dendrogram of
Figure 2. Because the 13 variables under study were not normally distributed, the Spearman correlation distance metric and Ward linkage were appropriate choices. The Spearman metric computes the correlation between the ranks of the variable values and remaps it to a distance within the interval of 0 to 1; it therefore depends on the rankings of the variables rather than their actual values. Ward linkage, on the other hand, determines the distance between clusters in a hierarchical clustering process. It follows a “bottom-up” approach in which each observation begins in its own cluster and clusters are merged as one moves up the hierarchy, with the goal of minimizing the variance within the merged clusters. These methods were applied to compute the distances between the variables in our dataset using the Orange visual programming tool, grouping the variables by rank similarity and revealing five clear clusters (see
Figure 2): C1. Verbal Development and Spatial Reasoning, C2. Language Proficiency and Psychoemotional Development, C3. Cognition and Attention Development, C4. Pragmatical Competence, and C5. Auditory Processing and Phonological Ability.
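A minimal sketch of this procedure, applied to hypothetical data rather than the SmartSpeech variables: Spearman correlations between the columns are remapped to distances in [0, 1], and Ward linkage builds the hierarchy bottom-up:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import spearmanr

def cluster_variables(X: np.ndarray, n_clusters: int):
    """Hierarchically cluster the columns (variables) of X: Spearman rank
    correlation between variables is remapped to a distance in [0, 1],
    and Ward linkage merges clusters bottom-up."""
    rho = spearmanr(X)[0]                     # variables x variables correlations
    dist = (1.0 - rho) / 2.0                  # correlation [-1, 1] -> distance [0, 1]
    condensed = dist[np.triu_indices_from(dist, k=1)]
    Z = linkage(condensed, method="ward")     # Z can also feed a dendrogram plot
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Hypothetical data: four correlated variables plus four independent ones
rng = np.random.default_rng(1)
base = rng.normal(size=(50, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(50, 4)),
               rng.normal(size=(50, 4))])
labels = cluster_variables(X, n_clusters=2)   # correlated block clusters together
```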
2.6. Reliability Analysis
We performed the reliability analysis of the five clusters identified earlier. The aim was to assess the internal consistency of the measures within each cluster and determine if the selected variables reliably represent the latent factor identified for each cluster.
By calculating Cronbach’s alpha for the five clusters, we obtained the results shown in
Figure 3. The calculated Cronbach’s alpha indicates that the selected variables within each cluster are consistent with one another and reliably measure the respective cluster.
These results further confirm the choice of a single factor for each cluster, as the data suggest a strong internal consistency of the measures within each cluster.
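Cronbach’s alpha for a cluster of k items follows its standard definition, α = k/(k − 1) · (1 − Σ item variances / variance of the total score); a minimal sketch on hypothetical item data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical cluster: four items driven by one latent trait
rng = np.random.default_rng(0)
trait = rng.normal(size=(200, 1))
items = trait + 0.3 * rng.normal(size=(200, 4))
alpha = cronbach_alpha(items)  # high alpha -> internally consistent cluster
```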
2.7. Factor Analysis
The exploratory factor analysis of the 13 selected variables revealed significant findings. The minimum residual extraction method was used in combination with a ‘varimax’ rotation to optimize the clarity and simplicity of factor interpretation. Five latent factors emerged, explaining 76.2% of the total variance. All variables show correlations with their respective factors, indicating a strong association among them. Bartlett’s test of sphericity confirmed the presence of a significant factor structure, with a p-value below 0.001. Additionally, the Kaiser–Meyer–Olkin measure of sampling adequacy (KMO MSA) indicates that the data are suitable for factor analysis (0.872). A scree test based on parallel analysis confirmed the retention of the five latent factors. These results indicate the presence of five latent factors that consistently explain the observed variations in their respective variables.
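These steps can be sketched as follows on synthetic stand-in data. Note the hedges: scikit-learn’s `FactorAnalysis` fits by maximum likelihood rather than the minimum-residual method used here (though it does support varimax rotation), and Bartlett’s sphericity statistic is computed from its textbook formula:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import FactorAnalysis

def bartlett_sphericity(X: np.ndarray):
    """Bartlett's test that the correlation matrix is the identity
    (i.e., that there is no factor structure worth extracting)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    dof = p * (p - 1) / 2
    return stat, chi2.sf(stat, dof)

# Synthetic stand-in for 13 observed variables driven by 5 latent factors
rng = np.random.default_rng(0)
F = rng.normal(size=(300, 5))
X = F @ rng.normal(size=(5, 13)) + 0.3 * rng.normal(size=(300, 13))

stat, p_value = bartlett_sphericity(X)             # small p -> factorable data
fa = FactorAnalysis(n_components=5, rotation="varimax", random_state=0)
scores = fa.fit_transform(X)                       # (300, 5) factor scores
```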
The cluster, reliability, and factor analyses highlight the identification of latent factors that align with the respective clusters highlighted among reliable and consistent variables, forming a more robust model. The five identified factors are renamed as follows (
Figure 4):
The factor scores following the factor analysis were estimated using Thurstone’s method. This well-established technique employs a regression approach, where the observed variables are regressed on the extracted factors, allowing for the calculation of factor scores from the regression predictions. Thurstone’s method is valued for its robustness and appropriateness in factor score estimation, as it effectively captures the relationships between observed variables and latent factors, ensuring that the factor scores accurately represent the underlying constructs for further analysis and interpretation. The dataset, prepared for the predictive model and comprising 184 instances with an average age of about 7 years, incorporates latent factors as features and the dichotomous variable “disorder” as the target. This structure highlights the relationship between the latent factors and the presence or absence of NDs, facilitating interpretation and use in prediction.
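Thurstone’s regression method can be written compactly as F = Z R⁻¹ Λ, where Z is the standardized data matrix, R the observed correlation matrix, and Λ the factor loading matrix; a minimal sketch:

```python
import numpy as np

def thurstone_scores(X: np.ndarray, loadings: np.ndarray) -> np.ndarray:
    """Regression-method (Thurstone) factor scores: F = Z R^{-1} L."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    weights = np.linalg.solve(R, loadings)  # avoids forming R^{-1} explicitly
    return Z @ weights
```

On the SmartSpeech data, X would correspond to the 13 standardized variables and Λ to the 13 × 5 loading matrix produced by the factor analysis.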
2.8. Prediction Model
The predictive model will use a logistic regression algorithm to forecast the presence of NDs based on the five clusters, C1–C5.
The logistic regression technique is used to predict binary classifications in healthcare decision making by relying on the given features [
18,
47]. It is frequently employed to predict the presence or absence of an ND. This algorithm is known for its computational efficiency and quick training time. The coefficients associated with each feature indicate the direction and strength of the relationship, providing a straightforward interpretation of results. In this technique, the input feature values (e.g., C1–C5) are used to compute a weighted sum based on the acquired coefficients. The outcome is then subjected to the logistic function, which converts the continuous output into a probability value ranging from 0 to 1.
For example, if the probability is greater than 0.5, the model predicts the presence of NDs (class 1); otherwise, it predicts the absence of NDs (class 0).
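A minimal sketch of this computation on hypothetical factor scores (the data and weights below are illustrative stand-ins, not the fitted SmartSpeech model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical factor scores C1-C5 and a binary "disorder" target
rng = np.random.default_rng(0)
C = rng.normal(size=(184, 5))
y = (C @ np.array([0.8, -0.5, 0.9, -0.4, 0.3])
     + 0.3 * rng.normal(size=184) > 0).astype(int)

model = LogisticRegression().fit(C, y)

# Weighted sum of the features, squashed by the logistic function to (0, 1)
z = C @ model.coef_.ravel() + model.intercept_[0]
prob = 1 / (1 + np.exp(-z))

# Decision rule: probability > 0.5 -> ND present (class 1)
pred = (prob > 0.5).astype(int)
```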
For the external validation of the predictive model, a cohort comprising 184 individuals, approximately 7 years of age, was employed. This subset was strategically chosen to reflect a specific demographic within the target population. In contrast, the training phase utilized an expanded initial dataset of 473 subjects, whose ages ranged from 3 to 52 years. This methodology was meticulously designed to ascertain the model’s robustness and its generalizability across a heterogeneous and broader population spectrum.
The predictive algorithm was used in age group 3, consisting of 184 participants with an average age of approximately 7 years, whose descriptive statistics are presented in
Figure 5.
3. Experimental Results and Discussion
For an exhaustive evaluation of the model’s performance, we incorporated an analysis of the confusion matrix and receiver operating characteristic (ROC) curves. The confusion matrix elucidated a substantial concordance between the model’s prognostications and the actual classifications: 81.2% of cases without NDs (class 0) were correctly identified (true negatives), while 85.7% of cases with NDs (class 1) were accurately classified (true positives). Conversely, 18.8% of non-ND cases were erroneously classified as NDs (false positives), and 14.3% of ND cases were misclassified as non-NDs (false negatives), as delineated in
Figure 6.
Furthermore, the ROC curves for both categories (absence and presence of NDs) were incorporated to furnish a visual elucidation of the model’s discriminative capacity. These curves exhibit pronounced class separation, with elevated area under the curve (AUC) values signifying the model’s robust capability to correctly classify cases based on the presence of NDs (
Figure 7). The inclusion of these visual and statistical analyses not only augments our comprehension of the model’s performance but also lays a substantial groundwork for subsequent inquiries and practical implementations in our study.
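The ROC/AUC machinery can be sketched as follows on synthetic labels and scores: `roc_curve` sweeps the decision threshold to trade true positives against false positives, and the AUC summarizes the resulting curve:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Synthetic labels and predicted probabilities, purely illustrative
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
scores = np.clip(0.4 * y_true + rng.normal(0.3, 0.25, size=200), 0.0, 1.0)

fpr, tpr, thresholds = roc_curve(y_true, scores)  # one point per threshold
auc = roc_auc_score(y_true, scores)               # area under that curve
```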
We introduced an analytical extension to the logistic regression model for predicting NDs, embodied in the integration of a sophisticated nomogram. This tool, fundamental in the interpretation of predictive models, transforms complex results into an intuitive visual representation, offering a quantitative and qualitative understanding of the impact of each predictive variable. The nomogram, built on a foundation of carefully calibrated parameters, illuminates the probability of absence (target class 0) and presence (target class 1) of NDs with extraordinary precision.
For the absence of NDs, a total probability of 81% and a log-odds ratio of 5.79 are manifested through a mosaic of attributes: “Language Proficiency and Social” (2.64 points, value 0.9), “Text Comprehension and Processing” (0.6 points, value −0.7), “Cognitive Precision” (1.38 points, value 0.4), “Auditory Processing Skills” (1.13 points, value 0.4), and “Cognitive Communication Skills” (0.05 points, value 0.1). In contrast, the presence of NDs, with a total probability of 19% and a log-odds ratio of 5.05, is influenced differently by these same attributes, reflecting the complex nature of neurodevelopmental disorders.
The detailed analysis of the nomogram, illustrated in
Figure 8, reveals a dynamic and significant correlation between the variables and the probability of the presence of NDs. Each line in the nomogram, with its length and direction, not only indicates the importance and influence on the prediction but also tells a story of how the variables interact in a complex clinical context. The points assigned to each variable represent their relative weight in the model; higher values indicate a greater impact on the outcome probability. In particular, we observed that reducing the values of “Language Proficiency and Social”, “Text Comprehension and Processing”, “Cognitive Precision”, and “Auditory Processing Skills”, while increasing “Cognitive Communication Skills”, results in a marked increase in the predicted probability of the presence of NDs. This observation underscores the critical importance and sensitivity of the selected variables in our model and opens new perspectives for future investigations, suggesting specific pathways through which targeted interventions could positively influence the incidence and management of NDs.
The coefficients represent the effect of predictor variables on the log-odds of the binary target variable (ND). Let us see the interpretations for each coefficient (
Figure 9).
In summary, the coefficients of the logistic regression model provide information about the effect of predictor variables on the log-odds of having ND. For example, higher “Verbal Development and Spatial Reasoning” and “Cognition and Attention Development” are associated with an increase in the log-odds of having ND, while increases in the other variables are associated with a decrease in the log-odds of having ND.
3.1. Experiments
At the heart of our scientific investigation into neurodevelopmental disorders in children, we have adopted a rigorous and cutting-edge methodological approach to select the most effective predictive model.
Figure 10 represents the culmination of this analytical process, offering a detailed report of the metrics for all considered regressors. This visualization not only embodies our dedication to scientific precision but also serves as a critical reference point for understanding the comparative performance of various models.
Figure 10 illustrates the performance of models such as logistic regression, random forest, gradient boosting, SVM, kNN, naive Bayes, stochastic gradient descent (SGD), and AdaBoost, evaluated through rigorous metrics such as area under the ROC curve (AUC), accuracy (CA), F1 score (F1), precision (Prec), recall, and Matthews correlation coefficient (MCC). These metrics have been carefully selected to provide a holistic and multidimensional evaluation of each model’s capabilities, ensuring that our final choice is informed by a complete understanding of their performance.
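A sketch of such a comparison under stratified 10-fold cross-validation (the data here are random stand-ins for the factor scores, and only three of the listed models are shown):

```python
import numpy as np
from sklearn.model_selection import cross_validate, StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Hypothetical stand-in data: 5 factor-score features, binary ND target
rng = np.random.default_rng(0)
X = rng.normal(size=(184, 5))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=184) > 0).astype(int)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scoring = ["roc_auc", "accuracy", "f1", "precision", "recall"]
models = {
    "Logistic regression": LogisticRegression(),
    "Random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
}
for name, clf in models.items():
    res = cross_validate(clf, X, y, cv=cv, scoring=scoring)
    summary = {m: res[f"test_{m}"].mean() for m in scoring}
    print(name, summary)
```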
Logistic regression emerges as a beacon of excellence in this analytical landscape, distinguishing itself for its superior discrimination ability and an optimal balance between precision and sensitivity. With an AUC of 0.730, accuracy of 0.815, F1 score of 0.776, precision of 0.823, and recall of 0.815, this model not only demonstrates an excellent ability to distinguish between classes but also provides a clear and direct interpretation of its predictions, a crucial aspect for clinical application.
External validation on an independent cohort has further strengthened our confidence in logistic regression, confirming its robustness and applicability to a broader and more heterogeneous population. This step was crucial to ensure that the selected model is not only accurate and reliable on the training dataset but also generalizable and applicable in real-world scenarios. Despite a slight decrease in performance with an AUC of 0.729, accuracy of 0.725, F1 score of 0.686, precision of 0.703, and recall of 0.725, the model maintains a strong discriminative ability and a good balance between key metrics.
The AUC curves presented in
Figure 10 offer an immediate visual comparison of each model’s ability to balance true positives and false positives, further highlighting the superiority of logistic regression in the context of our evaluation criteria. This visual comparison not only confirms our choice but also provides a transparent and quantifiable basis for the decision, ensuring that the selected model is the most suitable to guide informed and accurate clinical decisions.
The performance indicators of the predictive model, using a logistic regression algorithm to forecast the presence of NDs, are described in
Figure 11.
This indicates that the model achieves a good balance between accuracy, precision, and recall. The model was evaluated using stratified 10-fold cross-validation. An AUC of 0.7229 suggests that the model has a reasonably good discriminative capacity between the two classes (presence or absence of NDs), while an accuracy of 0.8152 indicates that approximately 81.52% of instances are correctly classified by the model.
The F1 score, which considers both precision and recall, is also quite high at 0.7763, indicating that the model performs well in identifying true positive instances while minimizing false positives and false negatives. With a precision of 0.8226, the model has a good proportion of correct positive predictions among all positive predictions; when the model predicts that an instance belongs to the positive class, it is correct about 82.26% of the time.
These results indicate that the model has a good ability to discriminate between the classes, achieving a balance between precision and recall. Overall, the model appears to be effective in predicting the presence of NDs based on the provided features.
3.2. Advantages and Limitations of Applying Machine Learning in the Study of Neurodevelopmental Disorders
The adoption of machine learning in our study represents a data-driven approach that unveils patterns and correlations not immediately evident through traditional analytical methods. The ability to process and analyze large volumes of data allows the use of complex and detailed datasets that enhance the quality of predictions. For instance, machine learning has been instrumental in developing personalized assistive tools, demonstrating significant potential in enhancing the educational and social development of children with various neurodevelopmental disorders. This approach has shown promise in improving their social interaction and supportive education, indicating a promising direction for enhancing care and education in these areas [
48]. Similarly, the investigation into the use of machine learning-based diagnostic techniques for the early prediction of neurodevelopmental disorders in children highlighted reduced intervention time and increased accuracy [
15].
The ability to compare different models, such as logistic regression, random forest, and SVM, enables us to select the best-performing and most suitable one for the specific context of the study. The use of stratified 10-fold cross-validation provides a reliable estimate of the model’s performance and minimizes the risk of overfitting, while validation on an independent external cohort confirms the generalizability of the models to a broader population, increasing the robustness of the study. However, it is important to recognize the limitations associated with these methods. The complexity and interpretation of some machine learning models can be daunting, especially for those without specific training. Studies that have explored the use of machine learning to analyze qualitative data have indicated that, although it can provide valuable insights, it requires careful interpretation and validation [
49].
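The idea behind the stratified splitting used above can be illustrated with a minimal pure-Python sketch that preserves class proportions in each fold. In practice one would rely on a library implementation (e.g., scikit-learn's StratifiedKFold); the labels here are hypothetical.

```python
from collections import defaultdict

def stratified_kfold(labels, k):
    """Assign sample indices to k folds while preserving class proportions.

    Returns a list of k index lists (the held-out fold for each split).
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)  # round-robin within each class
    return folds

# Hypothetical binary labels: 6 negatives and 4 positives, split into 2 folds.
folds = stratified_kfold([0] * 6 + [1] * 4, k=2)
# Each fold keeps the 6:4 class ratio (3 negatives and 2 positives per fold).
```

Because every fold mirrors the overall class distribution, each of the 10 performance estimates is computed on a representative sample, which is what makes the averaged cross-validation metrics more stable than a single random split.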
The quality and reliability of predictions strongly depend on the quality of the input data; erroneous, incomplete, or biased data can lead to misleading results. Despite the use of cross-validation, there is always a risk of overfitting, especially when working with small datasets or a high number of features. Moreover, the literature analysis has highlighted how many machine learning models can be sensitive to imbalanced datasets, affecting their ability to generalize to new data [
38]. Finally, the configuration, interpretation, and validation of models require in-depth knowledge and specific skills, which can limit accessibility for some research teams. Studies that have developed models to differentiate individuals with specific conditions have highlighted the complexity involved in developing and interpreting these models [
50].
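One common mitigation for the imbalanced-dataset sensitivity noted above is inverse-frequency class weighting, which gives the minority class proportionally more influence during training. The following is a minimal sketch; the class counts are hypothetical.

```python
def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so that rarer classes receive proportionally larger weights."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical imbalanced labels: 8 controls (0) versus 2 cases (1).
weights = inverse_frequency_weights([0] * 8 + [1] * 2)
# The minority class receives a 4x larger weight (2.5 vs. 0.625).
```

This is the same balancing scheme exposed by many libraries (e.g., the `class_weight="balanced"` option in scikit-learn classifiers), and it helps prevent a model from simply predicting the majority class.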
In conclusion, while the machine learning-based approach offers numerous advantages in terms of analytical capacity and data processing, a deep understanding of its limitations and challenges is essential to ensure that the results are interpreted correctly and used responsibly in a clinical context. This balance between the advantages and limitations of machine learning guides our study towards a rigorous and informed scientific investigation, aiming to provide more effective and reliable tools for the diagnosis and treatment of neurodevelopmental disorders.
4. Conclusions
This study provides an in-depth analysis and a deeper understanding of NDs through the development of a precise and reliable predictive model, using a logistic regression algorithm to forecast the presence of NDs.
The phases of this study’s analysis to achieve accurate and meaningful results were as follows: (i) data preprocessing for data consistency and homogeneity; (ii) cluster analysis for grouping similar observations into homogeneous clusters; (iii) reliability analysis to assess the consistency and reliability of our variables’ measurements; (iv) factor analysis for the identification of latent factors that align with the respective clusters highlighted among reliable and consistent variables, forming a more robust model; and (v) a predictive model based on machine learning techniques using the logistic regression model, through which we predicted the presence of an ND based on the clusters identified in the cluster analysis.
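Step (v) can be sketched as follows. The coefficients and cluster scores below are illustrative placeholders, not the study's fitted values; the sketch only shows how a fitted logistic regression maps cluster-level features to a predicted probability of an ND.

```python
import math

def predict_nd_probability(cluster_scores, coefficients, intercept):
    """Logistic-regression prediction: the sigmoid of a linear combination
    of the cluster-level features identified in the cluster analysis."""
    z = intercept + sum(w * x for w, x in zip(coefficients, cluster_scores))
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (not fitted) coefficients for three hypothetical clusters:
p = predict_nd_probability(cluster_scores=[0.0, 0.0, 0.0],
                           coefficients=[0.9, -0.4, 1.2],
                           intercept=0.0)
# With all cluster scores at zero, the predicted probability is exactly 0.5.
```

In practice the coefficients are estimated from the training data, and a threshold (commonly 0.5) converts the predicted probability into the binary presence/absence classification evaluated in Section 3.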
The results of this study are comparable with two previous experiments on the SmartSpeech game score dataset [
19,
45]. This study’s prediction model achieves better accuracy (81.52%) than the best-performing method in those studies, a Grammatical Evolution variant named GenClass, which reached an accuracy of 79.56%. Moreover, the precision and recall metrics of this study’s model are also superior.
The findings of this study yield a prediction model that is more accurate and clinically relevant. By providing valuable insights, this model can support clinicians in the early diagnosis and treatment of NDs. This study encourages further research into models that help children with NDs deal more effectively with real-life problems; such models should be interpretable, flexible, and practically useful.