1. Introduction
Results of intelligence tests are often used in childhood and adolescence to make decisions about the appropriate type of school, grade level, or support measures. Moreover, the measured IQ is used as a predictor of school performance. In fact, IQ is a measure that has surprisingly high reliability and predictive validity compared to many other psychometric measures [1,2,3]. Nevertheless, there are also many individual cases in which IQ fails as a predictor of school performance. Children with learning disorders, for example, often fall into this category. In fact, a striking discrepancy between perceived intelligence and the inability to learn to read and write was already reported for one of the earliest cases of dyslexia described in the scientific literature, in 1896 [4]. Such a discrepancy was considered so remarkable that it long served as the basis for diagnosing learning disorders. While the DSM-5 [5] has now abandoned the ability–achievement discrepancy as a requirement for diagnosing learning disorders, the ICD-11 definition [6] still requires the affected academic skill to be markedly below what would be expected for the general level of intellectual functioning. In other words, the individual IQ must by definition fail as a predictor of the academic skill for a learning disorder to be diagnosed. Moreover, adverse external conditions that also affect school performance, such as economic or environmental disadvantage, lack of instruction, or difficulties in speaking or understanding the language, are explicit exclusion criteria for the diagnosis of a learning disorder [5,6].
Some cases of learning disorder might result from the fact that individual factors other than intelligence also affect school performance. The most important personality factors in this respect are motivation, emotion, and conscientiousness [7,8]. As a consequence, depending on the measures used, the average correlation between intelligence and school performance is only about r = 0.50 to 0.60 [3,9]. In many cases, however, no such exceptional personality traits (e.g., very low motivation or very high school anxiety) can be found either. In these cases, instead of the general level of cognitive functioning or other personal traits, an alternative explanation for the poor school performance could be the general heterogeneity of cognitive abilities in children with learning disorders [10]. In fact, in addition to the striking discrepancies between ability and school performance, children with learning disorders often exhibit exceptionally large intraindividual differences between other cognitive abilities as well. For example, children with a specific reading and/or writing disorder typically perform substantially below matched control groups on measures of working memory and/or perceptual speed, but not necessarily on measures of visual processing, fluid reasoning, or verbal comprehension [11,12,13,14]. For children with specific problems in mathematics, the evidence is somewhat more complex. Some studies showed that working memory predicts not only reading and writing skills but also mathematical skills [15]. By contrast, others demonstrated that dyscalculia is typically accompanied by difficulties in fluid reasoning [2,16,17] and/or visual processing [2,18]. Since many cognitive abilities can be low in children with dyscalculia, it comes as no surprise that measures of general cognitive functioning (i.e., IQ) are, on average, also lower in these children than in control groups [2], and that IQ is usually a better predictor of mathematical skills than of reading and writing skills [3].
1.1. The Wechsler Intelligence Scales
Since the seminal work of Spearman in the early 20th century [19], it has been widely acknowledged that different cognitive abilities are substantially correlated with each other, indicating a latent general factor, also referred to as “g”. However, there is still disagreement on how the different facets of intelligence relate to each other and how they are hierarchically structured. The most frequently used intelligence test today is the Wechsler Intelligence Scale for Children (WISC) [2]. Like other widespread intelligence tests [20,21], it is now predominantly aligned with the Cattell–Horn–Carroll (CHC) model of intelligence, which is supported by an overwhelming body of evidence [22]. The CHC model postulates three levels of varying specificity: more than 80 narrow cognitive abilities (e.g., quantitative reasoning, auditory short-term storage, retrieval fluency) on level one, up to 17 broad abilities (e.g., fluid reasoning, working memory capacity, visual processing) on level two, and a general factor on level three. This hierarchical structure is predominantly inspired by Carroll’s three-stratum model of intelligence [1,23]. Carroll, however, was not the first to propose a three-stratum model. In fact, Alexander [24] had already proposed a similar idea in 1935, stating that a single common g-factor is not sufficient to explain the covariances between individual cognitive abilities and that there are several less general factors below the g-factor representing clusters of cognitive abilities. Wechsler recognized the importance of Alexander’s work shortly thereafter [25]. However, although he likewise assumed that there were several such clusters, in his own intelligence tests he grouped the subtests into only two broad scales, a Verbal and a Performance scale [26,27,28], noting that this was a content-based rather than a statistically verified grouping. Unfortunately, this second-level split was far from corresponding to today’s differentiated model of intelligence, both in the number and in the content of the ability clusters. It was only after Wechsler’s death that the Wechsler scales were adapted more closely to the empirical findings that had since been established. The WISC-III [29] was the first edition of the Wechsler scales to offer four different indices below the level of the Full-scale IQ (FSIQ) that roughly corresponded to the broad abilities described today in the CHC model. These indices were called Verbal Comprehension, Perceptual Organization, Freedom from Distractibility, and Perceptual Speed. The Perceptual Organization Index represented aspects of both visual processing and fluid reasoning. The Freedom from Distractibility Index combined several broad abilities but placed the greatest emphasis on working memory. In fact, this scale was later revised and renamed the Working Memory Index in the WISC-IV [30]. In addition, the fluid reasoning component was strengthened in this edition of the test, and the Perceptual Organization Index was therefore renamed the Perceptual Reasoning Index. The fifth edition of the WISC [2] completed the shift towards the CHC model by splitting the Perceptual Reasoning Index into the Visual Spatial Index and the Fluid Reasoning Index.
Nevertheless, the five-factor structure of the WISC-V has repeatedly been criticized. In particular, it was noted that, depending on the method and sample used, only one to four factors could be extracted from the subtests with factor analysis [31,32,33]. The development and critique of the Wechsler scales show that grouping subtests into broader scales is not a one-way street. Accordingly, the WISC-V does not provide only a five-factor structure. Alternatively, 9 of the 10 primary subtests can be grouped into two broad ancillary indices, namely the General Ability Index (GAI) and the Cognitive Proficiency Index (CPI). This grouping is based on content as well as practical considerations, in a manner similar to David Wechsler’s division of the subtests into a verbal and a performance scale. We describe the content and possible uses of the GAI and the CPI in more detail below.
1.1.1. GAI
The GAI was first developed for use with the WISC-III to offer additional flexibility in the assessment of cognitive abilities [34]. The goal was to establish an index that is less sensitive to the influence of working memory and perceptual speed. Consequently, the original scale combined eight subtests from the Verbal Comprehension Index and the Perceptual Organization Index, whereas subtests explicitly measuring working memory or perceptual speed were excluded. The development of the GAI was largely driven by the diagnosis of learning disorders, which at that time required a discrepancy between ability and achievement, with the IQ score representing ability and tests of reading, writing, or mathematics representing achievement. Consequently, special education services were only granted when the IQ score was markedly higher than the results of the corresponding achievement tests. At the same time, it was well known that many children with learning disabilities also have deficits in working memory and perceptual speed, which in turn decreased their FSIQ in the WISC [34]. Thus, calculating an ability index without working memory and perceptual speed increased the ability–achievement discrepancy and, therefore, the chance of receiving special education services if school performance was markedly below average.
Of course, the WISC changed significantly from the third to the fifth edition. These changes also affected the GAI, which is now calculated from only five subtests, namely similarities, vocabulary, block design, matrix reasoning, and figure weights. These subtests are drawn exclusively from the Verbal Comprehension Index, the Visual Spatial Index, and the Fluid Reasoning Index. (Visual puzzles, a primary subtest of the Visual Spatial Index, is not included.) All of these subtests are characterized by high g-loadings, ranging from 0.67 to 0.72 [2]. The GAI still excludes subtests measuring working memory and perceptual speed and can therefore be interpreted as a broad scale of fluid reasoning containing verbal, figural, and quantitative content.
1.1.2. CPI
The CPI is essentially the counterpart to the GAI in that it combines all of the primary subtests of the WISC that measure either working memory or perceptual speed. In the WISC-V, these are digit span, picture span, symbol search, and coding. On average, these subtests show much lower g-loadings than the GAI subtests, ranging from 0.36 to 0.65 [2]. The abilities measured by these subtests are also often referred to as executive functions [35] and are closely related to attentional control [22,36,37]. It should be noted, though, that the correlation between the Working Memory Index and the Perceptual Speed Index is rather low (r = 0.36). Hence, at first glance, it seems that the subtests forming the Working Memory Index and the Perceptual Speed Index should not be combined into another index beyond the FSIQ, since they appear to have little shared variance and almost all of it is already captured in the FSIQ. What nonetheless makes the CPI interesting to practitioners is that many clinical groups score exceptionally low on this specific index [38]. Specifically, they score below what would be expected based on their average FSIQ and the correlation between the FSIQ and the CPI. For example, the data of clinical groups presented in the WISC-V technical manual [2] indicate that this holds true for children with mild and moderate intellectual disability, for children with attention deficit/hyperactivity disorder (ADHD), and for children diagnosed with autism spectrum disorder with or without language impairment. (Note that for gifted children, the CPI is, on average, not below what would be expected based on the FSIQ. Although it is well known that these children usually score lower on the Working Memory Index and the Perceptual Speed Index than on the Verbal Comprehension Index, the Visual Spatial Index, and the Fluid Reasoning Index [2,30], their average CPI is actually within the range one would expect based on regression toward the mean.)
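The regression-based expectation mentioned in this paragraph can be written out explicitly. For scales with mean 100 and SD 15, the expected score on one scale given a score X on another is 100 + r * (X - 100), and the spread of observed scores around that expectation is the standard error of estimate, 15 * sqrt(1 - r^2). The sketch below assumes a hypothetical FSIQ–CPI correlation of 0.80 (not a value reported here) and shows how a gifted child's CPI can lie well below the FSIQ while remaining close to its expected value.

```python
import math

MEAN, SD = 100.0, 15.0


def expected_score(predictor: float, r: float) -> float:
    """Expected score on a second scale, regressed toward the population mean."""
    return MEAN + r * (predictor - MEAN)


def standard_error_of_estimate(r: float) -> float:
    """SD of observed scores around the regression-based expectation."""
    return SD * math.sqrt(1.0 - r ** 2)


def deviation_from_expected(observed: float, predictor: float, r: float) -> float:
    """Observed minus expected score, in units of the standard error of estimate."""
    return (observed - expected_score(predictor, r)) / standard_error_of_estimate(r)


R_FSIQ_CPI = 0.80  # hypothetical correlation, for illustration only

# Hypothetical gifted child: CPI is 10 points below the FSIQ,
# but only 4 points below the regression-based expectation.
z_gifted = deviation_from_expected(observed=120, predictor=130, r=R_FSIQ_CPI)
print(f"expected CPI: {expected_score(130, R_FSIQ_CPI):.1f}, z = {z_gifted:.2f}")
```

With these assumptions, the child is less than half a standard error below expectation, which is the kind of result the parenthetical note on gifted children describes.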
1.2. Rationale and Research Questions
Although the GAI was developed explicitly for diagnosing learning disorders, its purpose is not to predict school performance directly. Instead, the stated goal is to maximize the ability–achievement discrepancy, which has long been considered an essential criterion of learning disorders. Since the GAI is supposed to represent ability in this procedure, the logical conclusion is that it cannot be a good predictor of academic skills. On the contrary, it must be a particularly poor one; otherwise, it could not maximize the discrepancy between ability and achievement.
On the one hand, this line of reasoning is misguided for two reasons. First, it has repeatedly been demonstrated that IQ scores combining different verbal and non-verbal subtests, such as the FSIQ in the WISC-V, are at least moderately good predictors of school performance [3]. Since the GAI correlates with the FSIQ at r = 0.96, the same must hold true for the GAI. Second, if the FSIQ fails as a predictor of academic achievement for children with learning disorders, then this should hold true for the GAI as well. The question then is whether the GAI should be used at all to diagnose learning disorders. In fact, the requirement of an ability–achievement discrepancy in the diagnosis of learning disorders has at least been removed from the DSM.
On the other hand, it is consistent with both our practical experience and empirical data that children with low school performance often do show exceptionally large discrepancies between general measures of cognitive functioning and school performance [11,12,13,14,15,16,17]. We therefore propose a more direct and parsimonious explanation for the observed discrepancies, namely that they are not an indicator of the learning disorders themselves. Instead, they can simply be interpreted as an indicator of unusually high heterogeneity among the individual cognitive abilities. Note that academic skills such as reading, writing, and mathematics are treated as broad cognitive abilities in the CHC model. Thus, if these abilities deviate considerably from measures such as the FSIQ, they must also deviate from at least some of the broad cognitive abilities that are captured in the FSIQ.
Furthermore, we assume that the FSIQ usually is a good predictor of school performance but that it fails when heterogeneity among cognitive abilities is exceptionally high. Although practitioners have long relied on the assertion that the validity of global scores of intellectual functioning is diminished when heterogeneity across broad cognitive abilities is exceptionally high [39], to the best of our knowledge, this assertion has never been directly tested. Nevertheless, the assertion is highly plausible because the calculation of the FSIQ rests on the assumption that all cognitive abilities share a substantial amount of common variance. Hence, a specific cognitive ability should likely be low if the FSIQ is also low. For some children, however, the heterogeneity among their cognitive abilities is apparently much higher than expected on the basis of representative samples; that is, their cognitive abilities share considerably less common variance. It can therefore be assumed that for these children, scales such as the FSIQ or the GAI, which represent this shared variance, generally have low predictive validity for specific cognitive abilities, especially if these specific abilities have relatively low g-loadings, as basic reading and writing skills do.
In this article, we analyzed whether an exceptionally large discrepancy between the GAI and the CPI is an easy-to-handle indicator of low predictive validity of the FSIQ. To this end, we compared German-speaking children with and without an exceptionally large discrepancy between the GAI and the CPI with regard to the correlations between the FSIQ and school grades in mathematics and German. As is presumably also the case in other languages and countries, the curriculum in German as a school subject mainly focuses on basic reading and spelling skills in elementary school. In secondary school, it covers a much broader range of language skills, with an emphasis on advanced language and text comprehension as well as writing skills. We hypothesized that when the discrepancy between the GAI and the CPI is large, the correlations between the FSIQ and school grades in mathematics and German are significantly lower than when the discrepancy is small. Moreover, we assumed that this effect would be larger for grades in German than for grades in mathematics because, as described above, mathematical skills seem to depend on fluid reasoning as much as on working memory.
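The central hypothesis compares two correlations from independent groups (small vs. large GAI–CPI discrepancy). A standard way to test such a difference is Fisher's r-to-z transformation; we do not claim that this is the exact procedure used in the study. The sketch below uses only the Python standard library. The correlations and group sizes in the example are hypothetical, and both correlations are assumed to be coded in the same direction (higher values meaning better performance).

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1: float, n1: int, r2: float, n2: int):
    """One-sided test of H1: rho1 > rho2 for correlations from independent samples.

    Uses Fisher's z = atanh(r), whose sampling SD is approximately 1 / sqrt(n - 3).
    Returns the test statistic and the one-sided p-value.
    """
    z_diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = z_diff / se
    p_one_sided = 1.0 - NormalDist().cdf(z)
    return z, p_one_sided


# Hypothetical example: small-discrepancy group vs. large-discrepancy group.
z, p = compare_independent_correlations(r1=0.50, n1=1000, r2=0.30, n2=100)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

Because the large-discrepancy group is, by definition, rare, its sample size dominates the standard error; this is one reason why even sizable differences in explained variance can narrowly miss significance.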
Since it has repeatedly been demonstrated that the predictive validity of IQ scores is generally lower in elementary school than in secondary school [3,40] (note that elementary school in Germany usually covers only the first four years of school, beginning at the age of six), we hypothesized that the predictive validity of IQ scores might also be more susceptible to effects of heterogeneity in elementary school than in secondary school. We therefore conducted all analyses separately for children in elementary vs. secondary schools.
If the FSIQ is indeed a poor predictor of school performance for children with an exceptionally large discrepancy between the GAI and the CPI, then the question immediately arises as to which is the best predictor of school performance in this particular group. Specifically, the CPI might be a better predictor of reading and writing because the importance of working memory and perceptual speed for reading and writing has repeatedly been demonstrated [12,13,14]. Moreover, because the CPI captures processes closely related to attentional control [22,35,36,37], it may be particularly predictive in elementary school, when basic academic skills, such as reading fluency and spelling, are still poorly automated. Mathematical skills, on the other hand, seem to correlate more strongly with IQ than reading and writing skills do. Therefore, despite reduced predictive validity, the FSIQ might still be the best predictor of mathematical skills, even if the discrepancy between the GAI and the CPI is exceptionally large.
However, there is also another possible explanation as to why the FSIQ might be a bad predictor in the case of heterogeneous intraindividual cognitive abilities. The predictive validity of the FSIQ is based on the assumption of a compensation model. Such models assume that poor performance in one cognitive ability can at least partially be compensated by good performance in the other abilities. However, school performance of children with high heterogeneity in their cognitive abilities might be better explained by a deficit model. In deficit models, the overall performance is limited by the lowest single determinant. To explore the questions described above, we additionally modelled the data with lasso regression (least absolute shrinkage and selection operator regression).
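Lasso regression adds an L1 penalty to least squares, which shrinks coefficients and sets some of them exactly to zero; it can therefore choose between, for example, an FSIQ-based (compensation) and a minimum-based (deficit) predictor. The following is a minimal coordinate-descent sketch in plain Python, not the software or settings used in the study. It assumes standardized predictors and a centered outcome, omits the intercept, and takes the penalty `lam` as given (in practice it would typically be chosen by cross-validation).

```python
from __future__ import annotations


def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def standardize(column: list[float]) -> list[float]:
    """Center a column and scale it to unit (population) variance."""
    n = len(column)
    m = sum(column) / n
    sd = (sum((v - m) ** 2 for v in column) / n) ** 0.5
    return [(v - m) / sd for v in column]


def lasso_cd(X: list[list[float]], y: list[float], lam: float,
             n_iter: int = 1000, tol: float = 1e-10) -> list[float]:
    """Coordinate-descent lasso without intercept.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    X is a list of rows; columns should be standardized and y centered.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 at the start
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(v * v for v in col) / n  # equals 1 for standardized columns
            # Correlation of column j with the partial residual (excluding b[j]).
            rho = sum(c * r for c, r in zip(col, residual)) / n + scale * b[j]
            new_bj = soft_threshold(rho, lam) / scale
            delta = new_bj - b[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Toy example with two orthogonal, standardized predictors:
# y depends strongly on x1 and only weakly on x2.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3 * x1 + 0.2 * x2 for x1, x2 in X]
coefs = lasso_cd(X, y, lam=0.5)
print(coefs)  # x1 is shrunk by lam, x2 is set to zero
```

In an analysis like the one described here, the columns of X could hypothetically be the standardized FSIQ, GAI, CPI, and min(GAI, CPI). Because these scores are highly correlated, which of them survives the penalty can be sensitive to `lam`, which is one reason to tune it by cross-validation rather than fix it in advance.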
4. Discussion
To determine whether the predictive validity of the FSIQ for school performance depends on the homogeneity of the test results in the WISC and to find optimal prediction models, we analyzed the significance of the FSIQ, the GAI, the CPI, and the minimum of these variables as predictors of grades in mathematics and German as a function of the discrepancy between the GAI and the CPI. We did so separately for children in elementary and secondary school.
The analysis of the correlations between the different scores and school grades demonstrated that the FSIQ alone was a significant predictor of school performance in all cases, regardless of the subject (mathematics vs. German), the type of school (elementary vs. secondary school), or the discrepancy between the GAI and the CPI. However, for children in elementary school, a large discrepancy between the GAI and the CPI tended to decrease the predictive validity of the FSIQ. Although this effect narrowly missed statistical significance, the difference in explained variance was so large that, in our opinion, its practical implications cannot be ignored. Moreover, the explained variance in school grades was almost twice as high when predictors other than the FSIQ were used.
For school grades in German, the CPI turned out to be the best predictor in this specific group. This result highlights the importance of the cognitive abilities measured by the CPI (i.e., working memory and perceptual speed) for the acquisition of basic reading and writing skills. The importance of working memory, especially auditory working memory, for these skills has repeatedly been shown in the scientific literature [13,14]. Perceptual speed appears to play a more indirect role, mediated by attentional control, specifically shifting and inhibition [35]. In addition, fine motor skills are closely related to perceptual speed and could play a specific role in writing skills in elementary school. For example, it has been shown that children with dyslexia not only have a deficit in spelling but that many of them also show difficulties in handwriting [50]. However, the etiology of such comorbidities is not entirely clear. On the one hand, reduced perceptual speed could have a direct causal effect on the development of both fine motor skills and spelling. On the other hand, deficits in fine motor skills could also divert attentional resources from other tasks such as spelling.
Unfortunately, our sample contained only a few 6- and 7-year-old children, because school grades in Germany are usually not assigned before third grade. However, we do not believe that this shortcoming calls the basic results into question. On the contrary, the influence of attentional control can be assumed to be even more pronounced in the first two years of school because the automation of basic academic skills is still extremely low at this early stage.
In line with this assumption, the predictive validity of the CPI for language skills seems to decrease in secondary school, that is, when most of the processes involved in basic reading and writing skills have been automated [9,51] and can therefore be performed without attentional control. In addition, in secondary school, the native language curriculum focuses much more on complex tasks, such as text comprehension and interpretation, than on basic reading and writing skills. Therefore, verbal comprehension and fluid reasoning may play a more important role in secondary school than in elementary school. Importantly, however, there were only weak indications that deficit models markedly improved the prediction of language skills in secondary school compared to compensation models. Hence, the FSIQ seems to be a valid predictor of language skills in secondary school, even if the discrepancy between the GAI and the CPI is large.
In mathematics, the pattern was somewhat different. When the discrepancy between the GAI and the CPI was large in elementary school, the most important predictor of grades was the minimum of the GAI and the CPI. (Note that the FSIQ is never the minimum score when the discrepancy between the GAI and the CPI is large.) This result suggests that for children with large intraindividual heterogeneity between different cognitive domains, compensation models provide suboptimal predictions of mathematical skills, at least in elementary school. As mentioned in the introduction, mathematical skills generally correlate more strongly with intelligence than reading and writing skills do. In addition, children with dyscalculia often show intraindividual weaknesses in visual processing and fluid reasoning, but some of them also show intraindividual weaknesses in working memory and/or perceptual speed. The latter finding already indicates that individual deficits in determinants of mathematical ability can limit overall math performance. The results we obtained with lasso regression confirm this assumption. Interestingly, in secondary school, lasso regression also favored a minimum score over the FSIQ as a significant predictor of grades in mathematics, regardless of the discrepancy between the GAI and the CPI. Admittedly, though, a deficit model improved the prediction of math skills only slightly compared to the FSIQ alone. Therefore, the FSIQ will probably be a valid, reliable, and easy-to-use predictor in secondary school in most cases, regardless of the subject or the heterogeneity of the cognitive abilities. One must not forget, though, that the minimum score is highly correlated with the FSIQ, with correlations above r = 0.90 in every single subgroup. This is precisely why the advantage of deficit models presumably grows with the discrepancy between the FSIQ and the minimum score.
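The difference between the two model types can be illustrated with two hypothetical profiles that share the same FSIQ. A compensation predictor (here simply the FSIQ) treats them identically, whereas a deficit predictor (the minimum of FSIQ, GAI, and CPI) separates them, and the two predictors diverge more the further the minimum falls below the FSIQ. The profiles are invented for illustration; in a real analysis these scores would enter a regression model rather than being compared directly.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    """Composite scores (mean 100, SD 15) of one child; values are hypothetical."""
    fsiq: int
    gai: int
    cpi: int


def compensation_predictor(p: Profile) -> int:
    """Compensation model: strengths offset weaknesses, summarized by the FSIQ."""
    return p.fsiq


def deficit_predictor(p: Profile) -> int:
    """Deficit model: performance is limited by the weakest composite."""
    return min(p.fsiq, p.gai, p.cpi)


homogeneous = Profile(fsiq=100, gai=101, cpi=99)
heterogeneous = Profile(fsiq=100, gai=112, cpi=82)

for name, prof in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    gap = compensation_predictor(prof) - deficit_predictor(prof)
    print(f"{name}: FSIQ={prof.fsiq}, minimum={deficit_predictor(prof)}, gap={gap}")
```

For the homogeneous profile the two predictors are nearly identical, which mirrors the high correlation between the minimum score and the FSIQ; only for the heterogeneous profile do they carry different information.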
In summary, we demonstrated that, at least in elementary school, the GAI and the CPI can be used to markedly improve the prediction of school performance compared to the use of the FSIQ alone. Specifically, a large discrepancy between the two indices indicates a loss of predictive validity of the FSIQ. Of course, the question arises why one should use atheoretical scales such as the GAI and the CPI for prediction purposes in the first place, when the construct validity of the five primary indices of the WISC-V is empirically established. Unfortunately, we could not reliably test the prediction of school performance with these five indices. Very large samples are needed to examine children with exceptionally large discrepancies between cognitive abilities, which was the reason for pooling the German standardization samples of the WISC-IV and the WISC-V. However, the primary scales of the WISC-IV and the WISC-V differ considerably, with the WISC-IV providing only four instead of five index scales. We therefore leave this issue to future studies that may be conducted with the pooled standardization samples of the WISC-V and the WISC-VI.
Although we faced a methodological barrier here, there are some theoretical and practical reasons for using the GAI and the CPI instead of the five primary indices. First, the GAI and the CPI are, on average, more reliable than the primary indices because each is based on more subtests. Second, two scales can be compared with each other more quickly and easily than five different indices (simple difference score vs. complex profile analysis). Apart from these aspects, the caveat that applies to the FSIQ presumably also applies to the two scales: their validity might be markedly decreased if there is too much heterogeneity within each scale. In other words, using the five primary indices probably does not significantly improve the prediction of academic achievement when intraindividual heterogeneity within the GAI and the CPI is small, but it might do so when heterogeneity is large. Using the two scales thus requires that homogeneity within each scale be ascertained beforehand.
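The reliability argument can be made concrete with the Spearman–Brown prophecy formula, which gives the reliability of a composite of k parallel parts, each with reliability rho, as k * rho / (1 + (k - 1) * rho). WISC subtests are not strictly parallel and published composite reliabilities are computed differently, so the sketch below, which assumes a hypothetical subtest reliability of 0.80, only shows the direction and rough size of the effect for two-subtest primary indices versus the four-subtest CPI and the five-subtest GAI.

```python
def spearman_brown(rho: float, k: int) -> float:
    """Reliability of a composite of k parallel parts, each with reliability rho."""
    return k * rho / (1.0 + (k - 1) * rho)


SUBTEST_RELIABILITY = 0.80  # hypothetical value for illustration

for label, k in [("primary index (2 subtests)", 2),
                 ("CPI (4 subtests)", 4),
                 ("GAI (5 subtests)", 5)]:
    print(f"{label}: {spearman_brown(SUBTEST_RELIABILITY, k):.3f}")
```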
Unfortunately, there were also a few limitations that restrict the validity of our results. First of all, the WISC-IV and the WISC-V contain different subtests, which is why the GAI and the CPI were derived in slightly different ways in the two subsamples. Specifically, we used two auditory working memory subtests to derive the CPI in the WISC-IV subsample, whereas in the WISC-V, one auditory and one visual–spatial working memory subtest enter the CPI. This difference might be important since auditory working memory has been shown to be more important for reading and writing skills than visual–spatial working memory [13,52]. Therefore, the ancillary auditory working memory scale of the WISC-V might have higher predictive validity than the CPI. However, there was no indication in our data that this is in fact the case.
Second, the FSIQ is derived from a different number of subtests in the two different WISC versions. While all ten core subtests are used in the WISC-IV, “only” seven of the ten primary subtests are used in the WISC-V. However, since the correlation between the corresponding FSIQ scores of both instruments is almost as high as the test–retest correlation of the German WISC-V, we do not think that the different number of subtests plays a major role.
Third, the school system in Germany is such that after elementary school, which usually covers only the first four years of school, children are assigned to different school types based on their previous school performance. This system restricts the variance of cognitive abilities within each type of school. Furthermore, the same grade does not necessarily reflect the same level of proficiency when achieved in different types of schools. Both factors reduce the covariation among the secondary school variables. Therefore, the validity of the models and results is probably markedly lower in secondary school than in elementary school.
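The effect of selecting children into school types by prior performance can be illustrated with a small simulation: restricting the range of one variable attenuates its observed correlation with another. The sketch below draws bivariate normal data with a hypothetical population correlation of 0.50 and then keeps only cases in the top third of the performance variable, as a simplified stand-in for one school type (the same variable serves as both selection criterion and outcome). The numbers are simulated, not taken from the standardization samples.

```python
import math
import random


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n: int = 20000, rho: float = 0.5, seed: int = 1):
    """Draw (ability, performance) pairs from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    ability, performance = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        perf = rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        ability.append(a)
        performance.append(perf)
    return ability, performance


ability, performance = simulate()
r_full = pearson(ability, performance)

# Keep only the top third on performance (one hypothetical school type).
cutoff = sorted(performance)[2 * len(performance) // 3]
selected = [(a, p) for a, p in zip(ability, performance) if p >= cutoff]
r_restricted = pearson([a for a, _ in selected], [p for _, p in selected])

print(f"full range: r = {r_full:.2f}; restricted: r = {r_restricted:.2f}")
```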
Fourth, we used grades as dependent variables. Grades are easy to collect and were therefore collected by default in the WISC-IV and WISC-V standardization studies. Nevertheless, they have certain drawbacks. To begin with, the curriculum in German (as presumably in other countries) mainly focuses on basic reading and writing skills in elementary school but covers a wide range of linguistic skills in secondary school, for example, knowledge about different types of texts, literary periods, or authors. Hence, the grade in German measures very different skills and abilities in elementary school than in secondary school. In addition, grades are usually not normally distributed, and their variance is smaller than that of standardized achievement tests. Grades also tend to be less reliable than the results of standardized achievement tests because they are partly based on subjective judgments and are therefore susceptible to various types of judgment bias. Therefore, the variance explained by the different models was certainly lower than it would have been if standardized achievement tests had been used as dependent variables.
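The reliability point corresponds to the classical attenuation formula: an observed correlation equals the true-score correlation multiplied by the square root of the product of the two reliabilities. The sketch below compares a hypothetical standardized achievement test with a less reliable grade. The true correlation of 0.60 and all reliabilities are assumptions chosen for illustration, and the formula addresses only unreliability, not the restricted variance or non-normality of grades.

```python
import math


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation given the true-score correlation and reliabilities."""
    return true_r * math.sqrt(rel_x * rel_y)


TRUE_R = 0.60   # hypothetical true-score correlation
REL_IQ = 0.95   # hypothetical reliability of the IQ score
for label, rel_criterion in [("standardized test", 0.90), ("school grade", 0.60)]:
    r_obs = attenuated_r(TRUE_R, REL_IQ, rel_criterion)
    print(f"{label}: observed r = {r_obs:.2f}, explained variance = {r_obs ** 2:.2f}")
```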
Finally, it is well known that gifted children frequently score much lower on the CPI than on the GAI. By contrast, the cognitive profiles of children with a below-average IQ or intellectual disability are, on average, much more homogeneous [2,30]. Hence, whether a certain discrepancy between the GAI and the CPI is actually exceptional also depends on the general ability level of the child. The results obtained in this study may therefore not apply equally to all levels of general cognitive ability. However, it is methodologically extremely difficult and costly to include the general ability level as a qualifying factor in such studies, as this would require extremely large samples. In particular, the question remains whether the abilities captured by the CPI are a limiting factor for the acquisition of basic academic skills in gifted children to the same extent as in children with an average or below-average IQ score.
5. Conclusions
Decisions about school placement or support measures are often based solely on IQ scores. Our analyses have shown that more attention needs to be paid to heterogeneity in the intraindividual cognitive profile, especially in elementary school. Unfortunately, the ICD-11 still requires academic performance to be below what would be expected for chronological age and level of intellectual functioning in order to diagnose a learning disorder. In other words, a lack of predictive validity of the IQ score is considered a necessary criterion of the disorder. Admittedly, we showed that the predictive validity of the FSIQ decreases only when the discrepancy between the CPI and the GAI is exceptionally large. Hence, the affected children were in fact exceptional in certain respects. However, exceptionality in and of itself is not necessarily a sign of pathology; otherwise, giftedness would also have to be considered a disorder. In our opinion, it is the low performance that makes a learning disorder a disorder, not the fact that existing (and probably insufficient) predictive models fail. After all, no one would argue that cancer is a disease that should be treated only if it was not expected for chronological age and level of physiological functioning, or that a hurricane is only bad weather if the weather forecast was wrong. The only difference is that the necessary actions can be taken earlier if the forecast is correct.
In light of the findings described in this article, we consider it imperative that the ability–achievement discrepancy criterion, which has long since been removed from the DSM [5], finally be removed from future versions of the ICD as well. As shown, the expected level of academic achievement may turn out differently if intraindividual heterogeneity in cognitive abilities is taken into account in the prediction, at least in elementary school. Most importantly, the prediction of academic achievement can be improved if exceptionally pronounced intraindividual deficits, such as weak working memory and attentional control, are included as predictors instead of IQ alone.
Finally, our data have shown that cognitive measures are generally limited predictors of school performance even when heterogeneity is included. However, whether and how other factors, such as conscientiousness, motivation, or socioeconomic status, should be included in school decisions goes beyond the scope of this article because such data were not contained in our samples.