1. Introduction
The past two decades have seen a significant increase in the number of meta-analyses examining topics in L2 reading comprehension. These meta-analyses vary in their scope, purpose, and methodology; some are large-scale, including a wide range of reading-related variables (e.g., 10 and 11 variables in [
1] and [
2], respectively) while others focus on a single or a smaller number of variables (e.g., 1 variable each in [
3,
4]). Some examine the relationship between L2 reading comprehension and relevant variables that are linguistic (e.g., decoding, orthographic skills, phonological awareness, morphological knowledge/awareness, vocabulary, grammar), cognitive (e.g., working memory), or metacognitive (e.g., metacognitive awareness, reading strategy) in nature [
1,
2,
4,
5]. Others have examined the treatment effectiveness of interventions targeting certain reading-related variables (e.g., phonology instruction in [
6], reading strategy instruction in [
7,
8]) or certain reading methods (e.g., extensive reading in [
9]). Still other meta-analyses have investigated the potential effects of test task types [
10] or effects of certain reading conditions (e.g., use of gloss in [
11]) on reading comprehension outcomes.
Over the years, meta-analyses on L2 reading comprehension have also made significant advancements in their statistical techniques. Meta-analyses that examine the relationship between reading-related variables and reading comprehension now do not stop at synthesizing bivariate correlations reported in primary studies, but also test theoretical models through meta-analytic structural equation modeling (henceforth, MASEM) [
12].
Because meta-analyses typically require a substantial body of primary research on a topic, the topics they address indicate notable trends and important issues within the field. However, reviewing all of the diverse topics and aspects of reading-related meta-analyses of the past decades is beyond the scope of this review. For this reason, we focus our attention on the meta-analyses of primary studies that have investigated the relationship between L2 reading comprehension and reading-related variables through correlational techniques (e.g., meta-analyses of correlation coefficients, MASEM). Through this review, we aim to investigate the following research questions: (1) what are the notable patterns in the methods and findings (convergent and divergent, if any) across the 14 meta-analyses that examined the relationship between L2 reading comprehension and reading-related variables, and (2) what are the gaps in the current research domain? We then offer our suggestions for future research.
2. Materials and Methods
We conducted a series of literature searches in September and October of 2023 using the following method. First, we ran searches in the ExLibris PRIMO discovery system (henceforth, PRIMO) with the following keywords and no restrictions on the publication dates of eligible studies: second language, foreign language, L2, reading comprehension, reading, meta-analy*, correlat*, relation*, variable* (an asterisk after the root word matches any possible ending, e.g., meta-analy* retrieves meta-analysis, meta-analyses, and meta-analytic). PRIMO is a one-stop discovery system that pulls search results from all scholarly databases available through the library, such as ERIC, LLBA, PsycINFO, ScienceDirect, and Web of Science. For this reason, we used PRIMO instead of searching individual databases. In addition, we used Google Scholar, as it is another comprehensive search tool that we believed would supplement our literature search.
To be included in this review, first, the study had to be a quantitative meta-analysis; narrative or synthetic reviews of a qualitative nature were excluded. Second, the study had to report on the relationship between L2 reading comprehension and one or more reading-related variables (e.g., decoding, phonological awareness/knowledge, morphological awareness/knowledge, vocabulary knowledge, grammar, metacognition). No meta-analyses examining treatment effectiveness were included in the review. After the initial pool of potentially eligible studies (k = 55) was gathered, we examined the full texts of the studies in detail to see if they were suitable for our review. At this stage, 41 studies were eliminated for various reasons (e.g., they were not quantitative meta-analyses but narrative synthesis reviews; they were not meta-analyses of correlation-based statistics but reported on effect sizes based on treatment efficacy), leaving 14 meta-analyses.
We present in
Table 1 a summary of the key features of the 14 meta-analyses on L2 reading comprehension. As can be seen in the “Years Searched” column, most studies searched relevant literature spanning two to three decades, dating back as early as the 1970s. Two studies did not place restrictions on the earliest publication date for their literature search; In’nami et al. [
5] conducted their literature search in October 2020, with no restrictions on the earliest publication date, and Lee et al. [
12] similarly searched all relevant literature published up to December 2020.
Three meta-analyses [
1,
2,
3] reported that they only included articles in refereed publications, while most of the rest of the studies included both published and unpublished research, citing the concern of publication bias as the rationale for this decision.
As can be seen in
Table 1 and
Table 2, the vast majority (11 of 14) of meta-analyses included in this review were published between 2020 and 2023, indicating a conspicuously recent interest in the use of meta-analysis among L2 reading researchers. While most of these studies included participants from a wide range of age groups (e.g., kindergarten to graduate levels), Ke et al. [15] and Zhang et al. [3] limited their age range to ages 5 to 18 and to lower elementary through high school, respectively, because of their focused interest in biliteracy development during childhood [15] and among school-aged readers [3].
It is interesting to note that the two studies by Dong and colleagues [
13,
14] limited their L1 type to Chinese, while the rest of the studies included various L1s and L2s. Particularly, Dong et al. [
13] only included primary studies whose L1 was Chinese and L2 was English. The authors of the study explained that this decision was influenced by the substantial differences between the logographic Chinese and alphabetic English writing systems and how these differences may manifest in the contributions of metalinguistic knowledge to reading comprehension among Chinese L1 readers reading in L1 and L2. Similarly, Dong et al. [
14] also included primary studies with subjects whose L1 was Chinese, although it did allow studies on participants with various L2s. In this study, which examined the contribution of vocabulary knowledge to reading comprehension, the language type (measurements made in L1 and measurements made in L2) was used as a moderator to examine any differences as a function of language type.
In addition to Dong et al. [
13,
14], two meta-analyses restricted the language types included in the primary studies. Park [
17] sought to examine the relationship between reading strategy use and reading comprehension in L2 English exclusively because of a special interest in L2 reading in EFL/ESL settings. Similarly, Zhang et al. [
3], due to their focused interest in English morphological awareness as a contributor to English reading comprehension, limited their L2 to English.
All of the 14 meta-analyses aimed to examine the relationship between reading-related variables and L2 reading comprehension. As such, all of them synthesized effect sizes that were correlational (e.g., Pearson’s
r, Fisher’s
z). However, two of these meta-analyses went a step further and conducted model testing through MASEM. MASEM, a technique that combines meta-analysis and structural equation modeling, fits the researcher’s theoretical model of choice (e.g., the Simple View of Reading) to meta-analytic data, for example, correlation matrices composed of averaged correlations from multiple independent samples. As mentioned above, the unique advantage of this technique is that it not only allows the synthesis of bivariate correlations from primary studies, but also enables the researcher to test the validity of theoretical models using empirical data pooled from multiple studies. Although it has only recently been introduced in L2 reading research, MASEM has been more widely used in other disciplines (e.g., business, psychology, education), and the tools for this analysis are also diversifying and becoming more user-friendly [
21].
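To make the first, univariate step concrete, the following is a minimal Python sketch of how correlations from independent samples can be pooled via Fisher's z. It assumes a fixed-effect model with inverse-variance weights (n − 3); the meta-analyses reviewed here may well have used random-effects models and dedicated software, and the function names and sample values below are hypothetical illustrations rather than data from any primary study.

```python
import math

def fisher_z(r):
    """Transform a Pearson correlation to Fisher's z."""
    return math.atanh(r)

def inverse_fisher_z(z):
    """Back-transform Fisher's z to a Pearson correlation."""
    return math.tanh(z)

def pool_correlations(samples):
    """Pool (r, n) pairs under a fixed-effect model (an assumption of this sketch).

    Each sample's z has sampling variance 1 / (n - 3), so its
    inverse-variance weight is n - 3. Returns the pooled r and a
    95% confidence interval, both back-transformed from z.
    """
    weights = [n - 3 for _, n in samples]
    z_values = [fisher_z(r) for r, _ in samples]
    z_bar = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    ci = (inverse_fisher_z(z_bar - 1.96 * se), inverse_fisher_z(z_bar + 1.96 * se))
    return inverse_fisher_z(z_bar), ci

# Hypothetical (r, n) pairs, for illustration only.
hypothetical_samples = [(0.62, 120), (0.71, 85), (0.55, 200)]
pooled_r, (ci_low, ci_high) = pool_correlations(hypothetical_samples)
print(round(pooled_r, 3), round(ci_low, 3), round(ci_high, 3))
```

In MASEM, a matrix of such pooled correlations among all variables in the model then serves as the input to the structural equation modeling stage, which is not shown here.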
In
Table 3 and
Table 4 below, we provide an overview of the reading-related variables whose relationship with L2 reading comprehension was examined in each of the meta-analyses included in this review. In the columns of
Table 3 and
Table 4, variables that are conventionally thought of as lower-level processes (decoding, orthography, phonology, morphology, vocabulary, and grammar) appear before variables that are conventionally considered as higher-level or cognitive (rather than linguistic) processes (meta-cognition, working memory) or composite proficiency variables (L1 reading comprehension, L2 listening comprehension). Oral reading fluency, a variable that was included in only one meta-analysis [
2], is also a composite variable that represents multiple linguistic processes working in conjunction and is listed in the last column of the tables. Lastly, the numbers appearing parenthetically under each variable name in the columns of the tables indicate the number of meta-analyses that examined the variables in question (e.g., 4 meta-analyses included decoding and 3 meta-analyses included orthography-related variables).
Regarding our discussion of the study findings that follows
Table 3 and
Table 4, the reading-related variables are ordered based on multiple factors, such as (1) their importance as a contributor to L2 reading comprehension (e.g., key linguistic knowledge variables such as vocabulary and grammar were presented first and next to each other for this reason) and (2) their theoretical relevance (e.g., sub-lexical, metalinguistic variables such as decoding, orthographic awareness, and phonological awareness are presented next, and together for this reason). We would like to note, however, that the order of appearance of these variables does not indicate their absolute importance as reading contributors, as all of them were found to have a significant relationship with L2 reading comprehension in the meta-analyses reviewed herein.
3. Vocabulary and L2 Reading Comprehension: Findings from Meta-Analyses
Five meta-analyses investigated the relationship between vocabulary knowledge and L2 reading comprehension with different foci and yielded interesting findings. Jeon and Yamashita [
1,
2], in their respective, large-scale meta-analyses, synthesized 31 and 51 correlations between vocabulary knowledge and L2 reading comprehension. Jeon and Yamashita [
2] was based on the 31 correlations included in Jeon and Yamashita [
1] plus 20 additional correlations from more recent primary studies published between 2011 and 2017. In both studies, the overall correlations were significant and large (
r = 0.79 in Jeon and Yamashita [
1] and
r = 0.724 in Jeon and Yamashita [
2]). Jeon and Yamashita [
1] also found, through moderator analyses, that the correlation was higher for older (adolescent and adult) L2 readers than for children. Although Jeon and Yamashita [
2] did not find a significant moderator effect, the correlation was, again, found to be higher for the adolescent/adult group (
r = 0.746) than for the child group (
r = 0.698); as readers grow older and their L2 proficiency develops, reading texts feature increasingly difficult, low-frequency words which are likely to be associated with a wider range of individual variability. We believe that the higher correlation observed among older L2 readers is indicative of this change.
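The logic of a categorical moderator analysis such as the age-group comparison above can be sketched as a between-group heterogeneity (Q-between) test. The Python sketch below assumes a fixed-effect subgroup analysis on Fisher's z with exactly two groups; it is not the procedure used in the meta-analyses reviewed here, which may have relied on random- or mixed-effects models, and all function names and sample values are hypothetical.

```python
import math

def pooled_z(samples):
    """Fixed-effect pooled Fisher's z and total weight for (r, n) pairs."""
    weights = [n - 3 for _, n in samples]
    z_bar = sum(w * math.atanh(r) for w, (r, _) in zip(weights, samples)) / sum(weights)
    return z_bar, sum(weights)

def q_between_two_groups(group_a, group_b):
    """Q-between statistic and p-value for two subgroups (df = 1).

    For one degree of freedom, the chi-square survival function
    equals erfc(sqrt(Q / 2)), so no external library is needed.
    """
    z_a, w_a = pooled_z(group_a)
    z_b, w_b = pooled_z(group_b)
    z_all = (w_a * z_a + w_b * z_b) / (w_a + w_b)
    q = w_a * (z_a - z_all) ** 2 + w_b * (z_b - z_all) ** 2
    p = math.erfc(math.sqrt(q / 2))
    return q, p

# Hypothetical (r, n) pairs for two age groups, for illustration only.
older_readers = [(0.75, 120), (0.70, 80)]
child_readers = [(0.55, 100), (0.50, 60)]
q, p = q_between_two_groups(older_readers, child_readers)
print(round(q, 2), round(p, 4))
```

A small p-value would indicate that the pooled correlations differ between the two groups by more than sampling error alone would predict; a non-significant result, as in Jeon and Yamashita [2], does not by itself show that the groups are equivalent.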
The relative importance of vocabulary knowledge as a reading correlate among readers of different age levels or education levels was investigated in depth by Dong et al. [
14]. The authors meta-analyzed studies involving Chinese participants whose ages ranged from elementary school to the graduate level to investigate whether the magnitude of the relationship between vocabulary knowledge and reading comprehension is moderated by the education stage. Covering studies published over 20 years, Dong et al. [
14] reported that the size of the overall correlation was nearly large (
z = 0.54,
p < 0.001). In addition to estimating the overall correlation between vocabulary and comprehension, the meta-analysis investigated the potential moderator effects of the education stage, thereby examining the validity of the reading stage statement [
22] and the Information Gap Theory [
23]. Both theories noted that the nature of reading comprehension significantly shifts as the reader progresses from lower grades (early education stages) to higher grades (later education stages) due to the changes in the purpose of reading (from “learning to read” to “reading to learn”) and more rigorous requirements in terms of word, passage, and main idea comprehension in later education stages. On this account, the education stage would be expected to moderate the relationship between vocabulary knowledge and reading comprehension. Dong et al. [
14] showed through their moderator analysis that the relationship between vocabulary knowledge and reading comprehension was, indeed, significantly moderated by the education stage. More interestingly, the authors reported that the size of the overall correlation plotted an inverted U-shape across education stages, which ranged from primary school to secondary school to the university stage to the Master’s stage.
In a large-scale meta-analysis focusing on vocabulary and L2 comprehension (reading and listening comprehension), Zhang and Zhang [
4] carried out an extensive series of moderator analyses to investigate how different dimensions of vocabulary knowledge (e.g., recognition, recall, associational) moderate the relationship between vocabulary and reading comprehension. To this end, the features of vocabulary measures used in the primary studies were coded using the following scheme: type of form-meaning knowledge (meaning recognition, meaning recall, form recall), type of vocabulary depth (associational, morphological), test modality (auditory, orthographical), and context of vocabulary knowledge items (dependent, independent). The results showed that meaning recall knowledge had the strongest correlation (0.66) with L2 reading comprehension. Vocabulary depth knowledge in the form of word association knowledge and morphological awareness also showed a significant correlation (0.54 and 0.53) with L2 reading comprehension. As for the modality of the vocabulary test, the orthographical test type correlated more strongly with the L2 reading comprehension than the auditory test type.
Lee et al. [
12], through MASEM, also investigated the effects of vocabulary knowledge on L2 reading comprehension. As one of the three indicators (L2 vocabulary knowledge, L2 grammar knowledge, and L2 listening comprehension) of a latent variable, L2 comprehension abilities, L2 vocabulary knowledge showed a large correlation with L2 reading comprehension (
k = 58,
r = 0.55). It also showed a significant and large path coefficient to L2 comprehension abilities (
β = 0.77) which, in turn, had a significant and nearly large path coefficient to L2 reading comprehension (
β = 0.55).
Although all five meta-analyses reviewed above examined the relationship between vocabulary knowledge and L2 reading comprehension, the overlap across the primary studies included in these meta-analyses was surprisingly low; for example, Jeon and Yamashita [
2], which included 45 studies involving vocabulary, and Zhang and Zhang [
4], which included 21 studies involving vocabulary, shared only two studies. Another comparison between Dong et al. [
14] (68 samples involving vocabulary) and Jeon and Yamashita [
2] yielded a similar result of only five overlapping samples. Given the minimal overlap, we can claim that the consistently strong average correlation between vocabulary and L2 reading comprehension observed in the meta-analyses is a true and robust finding, and not a simple product of duplicate samples across meta-analyses.
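The overlap check described above amounts to intersecting the lists of primary studies (or samples) included in two meta-analyses. A minimal Python sketch follows; the function name and the study labels are hypothetical, and in practice the labels would first need to be normalized (e.g., author, year, and sample) so that the same study is not counted as two.

```python
def pool_overlap(pool_a, pool_b):
    """Return the shared study labels and their share of the smaller pool.

    Assumes both pools are non-empty and that labels are already normalized.
    """
    set_a, set_b = set(pool_a), set(pool_b)
    shared = set_a & set_b
    share_of_smaller = len(shared) / min(len(set_a), len(set_b))
    return shared, share_of_smaller

# Hypothetical study labels, for illustration only.
meta_a = ["Study01", "Study02", "Study03", "Study04", "Study05"]
meta_b = ["Study04", "Study05", "Study06", "Study07"]

shared, share = pool_overlap(meta_a, meta_b)
print(sorted(shared), round(share, 2))
```

Expressing the overlap relative to the smaller pool is one possible design choice; it gives an upper bound on how much of either meta-analysis could be driven by shared samples.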
4. Grammar and L2 Reading Comprehension: Findings from Meta-Analyses
Considering the importance of grammar as one of the two pillars of linguistic knowledge contributing to reading comprehension along with vocabulary, a surprisingly small number of meta-analyses (4) have investigated the relationship between grammar and L2 reading comprehension. The overall correlations reported by Jeon and Yamashita from 18 and 26 effect sizes, in their studies [
1,
2], respectively, were large (
r = 0.85 in [
1] and
r = 0.697 in [
2]). The increase in the number of effect sizes included in Jeon and Yamashita [
2] enabled moderator analyses with higher reliability and showed that the script distance between L1 and L2 (alphabetic–alphabetic, alphabetic–logographic, alphabetic–mixed) and one type of measurement characteristic (sentence completion, grammaticality judgment, other types) significantly moderated the relationship between grammar and L2 reading comprehension.
Zheng et al. [
20] also investigated the relationship between grammar and reading comprehension. This meta-analysis, however, expanded on the findings of Jeon and Yamashita [
1,
2] by examining the relationship between L1 grammar and L1 reading comprehension (41 effect sizes), as well as that between L2 grammar and L2 reading comprehension (49 effect sizes). The authors reported that the overall effect size, based on both L1 and L2 language types, was large (Fisher’s
z = 0.54). Additionally, the authors provided more fine-grained results by examining the moderating effects of language type (L1 vs. L2) in five grade levels (kindergarten to grade 2, grade 3 to grade 6, secondary school, grade 7 to grade 12, undergraduate and master’s students). The results showed that language type did not significantly moderate the relationship between grammar and reading comprehension, indicating that the magnitude of grammar as a contributor to reading comprehension remains largely similar across L1 and L2. In contrast, grade level was found to have significant moderator effects; the relationship between grammar and reading comprehension was stronger in the higher grades than in the lower grades. The authors attributed this finding to the increasingly complex language structures common to texts read at higher grades and the higher demands for grammatical knowledge required of readers at this level.
Lee et al. [
12] synthesized 23 correlations between L2 grammar knowledge and L2 reading comprehension in their meta-analysis and reported an overall average correlation of 0.49. The authors also tested the validity of the Simple View of Reading model via MASEM. Although L2 grammar knowledge had the lowest path coefficient of the three indicators (
β = 0.78 for L2 listening comprehension,
β = 0.77 for L2 vocabulary knowledge, and
β = 0.69 for L2 grammar knowledge), the size of the coefficient was significant and large.
Overlaps across the primary studies included in the four meta-analyses were found to be small, with only six overlapping studies between Jeon and Yamashita [
2] and Zheng et al. [
20], for example. Since Lee et al. [
12] did not report the subset of studies that were meta-analyzed for different variables, we could not examine the extent of overlap between this study and the other studies. With this caveat in mind, we can claim that the strong relationship between grammar and L2 reading comprehension found in multiple meta-analyses is genuine and not an artifact of overlapping primary studies.
5. Morphology and L2 Reading Comprehension: Findings from Meta-Analyses
Five meta-analyses examined the relationship between morphology and L2 reading comprehension [1,2,3,13,15]. In their respective meta-analyses ([1], k = 6; [2], k = 11), which included child, adolescent, and adult L2 readers, Jeon and Yamashita [1,2] reported the overall correlations to be large (
r = 0.61 and 0.635, respectively), indicating that various constructs related to morphology (e.g., morphological awareness, knowledge of English morphemes) have a strong relationship with L2 reading comprehension outcomes.
Dong et al. [
13] also investigated the relationship between morphological awareness and reading comprehension, but only through primary studies whose participants were Chinese L1 speakers. Of the three metalinguistic variables included in this study (orthographic skills, phonological awareness, morphological awareness), morphological awareness was found to be the strongest correlate with reading comprehension (Fisher’s
z = 0.49). In this study, the authors only reported the overall effect size computed based on the correlations between morphological awareness and reading comprehension in Chinese (participants’ L1) and English (participants’ L2), and did not report separate correlations for reading comprehension of different language types. However, the authors reported that the language type of reading comprehension measure did not have significant moderating effects on the relationship between morphological knowledge and reading comprehension. In other words, the importance of morphological awareness was comparable across different orthographies (i.e., logographic Chinese vs. alphabetic English). With regard to this finding, the authors conjectured that L1 Chinese learners who are trained to rely on morphological cues as they learn to read their L1 seem to transfer this strategy to reading English.
Contrasting with Jeon and Yamashita [
1,
2] and Dong et al. [
13], which included a range of other correlates in addition to morphological awareness, Ke et al. [
15] took a focused look at the role of morphological awareness in reading comprehension by examining the relationship between L1 morphological awareness and L2 reading comprehension (
r = 0.39,
k = 6), as well as that between L2 morphological awareness and L2 reading comprehension (
r = 0.52,
k = 17). Moderator analyses showed that the relationship between L2 morphological awareness and L2 reading comprehension was stronger among older elementary school children (grades 3 to 5,
r = 0.59,
k = 6) than among younger children (kindergarten to grade 2,
r = 0.34,
k = 3).
In a large-scale meta-analysis, Zhang et al. [
3] utilized MASEM, which allows for theory-based model testing. In two subgroups of studies, involving monolingual and bilingual school-aged students, respectively, Zhang et al. [
3] examined how morphological awareness directly and indirectly affects L2 reading comprehension in connection with other variables, such as word reading and vocabulary knowledge. In the bilingual subgroup (
k = 46, total number of study participants = 5423), morphological awareness had a significant, direct effect on reading comprehension (β = 0.201). Moreover, it was found that morphological awareness made a unique contribution to predicting L2 reading comprehension above and beyond word reading and vocabulary knowledge. Further, the results showed that morphological awareness also had significant indirect effects on reading comprehension through word reading and vocabulary knowledge, respectively.
An interesting finding emerged in the comparison of the indirect effects of morphological awareness on reading comprehension through word reading between the monolingual and bilingual subgroups; the results of the study showed that the indirect effect was much smaller for the bilingual subgroup. The authors explained that this finding indicated that bilingual readers seem to rely on meaning-based, morphological analysis of words to achieve reading comprehension, while monolingual readers may rely on the code-based route (using morphological awareness for word recognition to read). Although this result cannot be directly compared to those of Dong et al. [
13] and Jeon and Yamashita [
1,
2], as these studies only reported on the univariate meta-analysis results, the results of Zhang et al. [
3] and the two previously mentioned studies resonate with one another in that morphological awareness consistently emerged as the strongest variable among other, lower-level reading variables (e.g., orthography, phonology, word reading). The findings yielded by Dong et al. [
13] and Jeon and Yamashita [
1,
2] may also be explained by the use of a meaning-based (over code-based) route by L2 readers, as suggested by Zhang et al. [
3].
In line with our previous observation, the primary studies included in the meta-analyses reviewed here did not substantially overlap either; Ke et al. [
15], a meta-analysis whose sole focus was morphological awareness, and Jeon and Yamashita [
2], respectively, had 36 and 14 samples on morphological awareness, but only six overlapping samples. Dong et al. [
13] had zero overlapping primary studies with Jeon and Yamashita [
2]. Zhang et al. [
3] had a slightly larger overlap of eight studies with Jeon and Yamashita [
2], but to no substantial degree considering the large primary study pool (107 studies).
6. Decoding and L2 Reading Comprehension: Findings from Meta-Analyses
Four meta-analyses reported on the relationship between decoding and L2 reading comprehension. Jeon and Yamashita [
1] reported, from a pool of 20 independent samples, an average correlation of 0.56. None of the moderators examined showed a significant result. In their updated meta-analysis, Jeon and Yamashita [
2] included nine additional independent samples and reported a similarly high average correlation of 0.585. This time, one moderator, language setting, was found to have a significant effect; the average correlation for the second language setting was higher (0.622) than for the foreign language setting (0.424).
Lee et al. [
12], whose primary focus was to investigate the validity of the Simple View of Reading in the context of second-language reading, carried out a more fine-grained investigation into the relationship between decoding and reading comprehension; this study reported a separate overall correlation for each of the following pairs: L2 decoding efficiency and L2 reading comprehension (
r = 0.48,
k = 18), L2 decoding accuracy and L2 reading comprehension (
r = 0.57,
k = 23), and L2 decoding fluency and L2 reading comprehension (
r = 0.41,
k = 14), all of which were in a comparable range with those reported by Jeon and Yamashita [
1,
2]. In addition to reporting the average univariate correlations, Lee et al. [
12] also performed model testing using MASEM. The results showed that, of the three indicators of L2 decoding skills, L2 decoding accuracy had the highest beta coefficient (
β = 0.90 for L2 decoding accuracy,
β = 0.76 for L2 decoding efficiency, and
β = 0.67 for L2 decoding fluency). The beta coefficient of the path from L2 decoding skills to L2 reading comprehension was also significant and nearly large (
β = 0.38), although it was smaller than the beta coefficient of the path from L2 comprehension abilities to L2 reading comprehension (
β = 0.47). The smaller contribution of decoding skills to reading comprehension found in this study resonated with the findings of Zhang et al. [
3], which reported that bilingual readers, compared to their monolingual counterparts, tend to rely more heavily on the meaning-based route (using morphological awareness directly to access word meaning) than the code-based route (using morphological awareness for word recognition to read).
Lastly, Melby-Lervåg and Lervåg [
16], as part of their meta-analysis, examined the relationship between L2 decoding and L2 reading comprehension among children whose ages ranged from 5 to 10. The average correlation of six independent samples was significant and large (
r = 0.54). In addition, the authors reported that the relationship between L1 decoding and L2 reading comprehension became significantly weaker as children grew older.
The overlap across the primary study pools included in the meta-analyses was, yet again, far from extensive. Between the 29 samples included in Jeon and Yamashita [
2] and the 6 samples included in Melby-Lervåg and Lervåg [
16], there were only 3 overlapping samples. As noted above, crosschecking samples with Lee et al. [
12] was not possible because the researchers did not report which particular studies or samples were included in the meta-analysis of correlations between decoding and L2 reading comprehension. We conclude, therefore, that the consistently strong relationship between decoding and L2 reading comprehension found in meta-analyses can be taken at face value and not as an artifact of overlapping primary study pools.
8. Phonology and L2 Reading Comprehension: Findings from Meta-Analyses
Three meta-analyses examined the relationship between phonology and L2 reading comprehension: Dong et al. [
13] and Jeon and Yamashita [
1,
2]. In all three studies, the construct under analysis was phonological awareness, indicating its importance in reading research. In Jeon and Yamashita [
1], phonological awareness was treated as a low evidence correlate due to its relatively low number of effect sizes (
k = 11). The overall average correlation was 0.48 and due to the small number of effect sizes, no moderator analyses were carried out. In Jeon and Yamashita [
2], however, phonological awareness rose to the status of a high evidence correlate with nine additional effect sizes. The mean correlation was slightly higher (
r = 0.611) compared to that of Jeon and Yamashita [
1], but no moderators were found to have a statistically significant effect on the relationship between phonological awareness and L2 reading comprehension.
As noted earlier, Dong et al. [
13] first reported an average correlation based on 28 effect sizes (7 from L2 English samples and 21 from L1 Chinese samples), which included both Chinese L1 (i.e., correlations between L1 Chinese phonological awareness and L1 Chinese reading comprehension) and English L2 (i.e., correlations between L2 English phonological awareness and L2 English reading comprehension). Then, the authors examined through a moderator analysis whether the language type (L1 Chinese vs. L2 English) had a significant effect on the relationship between phonological awareness and reading comprehension. The overall average correlation based on all effect sizes involving both L1 Chinese and L2 English was moderate in size (Fisher’s
z = 0.33). The moderator analysis results showed that language type (i.e., L1 Chinese vs. L2 English) did not have a significant effect, indicating that phonological awareness is a comparably important correlate with reading comprehension across languages.
A comparison of the study pools included in Dong et al. [
13] and Jeon and Yamashita [
2] yielded only two overlapping samples, indicating that the moderate to large overall effect sizes found in the meta-analyses were true findings, not artifacts of redundant sampling.
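Because some of the meta-analyses reviewed here report Pearson's r while others report Fisher's z, it can help to place the two on a common scale when comparing magnitudes. The Python sketch below back-transforms the Fisher's z values cited in this review using r = tanh(z). The dictionary labels are shorthand introduced for this illustration, and the conversion assumes that the reported z values are standard Fisher transformations of correlations.

```python
import math

def z_to_r(z):
    """Back-transform Fisher's z to Pearson's r."""
    return math.tanh(z)

# Fisher's z values as cited in this review; labels are illustrative shorthand.
reported_z = {
    "vocabulary, Dong et al. [14]": 0.54,
    "grammar, Zheng et al. [20]": 0.54,
    "morphological awareness, Dong et al. [13]": 0.49,
    "phonological awareness, Dong et al. [13]": 0.33,
}

for label, z in reported_z.items():
    print(f"{label}: z = {z:.2f}, r = {z_to_r(z):.3f}")
```

The two scales are close for small values and diverge as correlations grow, so a Fisher's z of 0.54 corresponds to a somewhat smaller r, which matters when comparing it with the r values of 0.7 or higher reported for the same variables elsewhere.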
9. Working Memory and L2 Reading Comprehension: Findings from Meta-Analyses
Four meta-analyses examined the relationship between working memory and L2 reading comprehension. Although the overall goal was commonly to synthesize the correlation between these variables, the sample size varied according to the researchers’ interests and the times of the studies. Jeon and Yamashita [
1,
2] and Shin [
18] had smaller sample sizes (10, 19, and 37, respectively) than In’nami et al. [
4] (228 samples). In’nami et al. [
4] was the most comprehensive, with the fewest restrictions on participants and tasks, while the other three had more specific research focuses. Jeon and Yamashita [
1,
2] delved into the relationship between L2 reading comprehension and various L2 reading components, including working memory. Consequently, their studies were more strongly guided by L2 reading theories than theories or past findings related to working memory per se. For Jeon and Yamashita, examining the effects of L1–L2 script/language distances was one of the critical theoretical considerations, leading them to exclude studies with mixed or unidentified L1 groups. Shin [
18] specifically included only studies utilizing reading span tests. Despite such variations in the inclusion/exclusion criteria, the overall mean correlations remained consistent across studies, ranging from small to medium [
24]: 0.30, 0.42, 0.33, and 0.30 in In’nami et al. [
5], Jeon and Yamashita [
1,
2], and Shin [
18], respectively.
Apart from Jeon and Yamashita [
1], who refrained from conducting moderator analyses due to a small sample size, the other three meta-analyses explored potential moderator variables that could affect the overall effect sizes. The moderators examined in each study can be broadly categorized as measurement features (working memory test and reading comprehension test variables) and learner features. Both In’nami et al. [
5] and Shin [
18] focused on working memory and examined a wider range of measurement features compared to learner features, while Jeon and Yamashita [
2] exhibited the opposite trend.
A consistent finding emerged across all three moderator analyses with respect to working memory test features: L2 working memory tests showed stronger correlations than L1 tests (the effect of testing language). In’nami et al. [
5] found no significant effects from other working memory test variables, including content (verbal/non-verbal/both items), complexity (simple/complex tasks), and mode (paper/computer). Jeon and Yamashita [
2] similarly found no significant effects related to L1–L2 script distance or L1–L2 language distance. These collective findings underscore the significance of the effect of testing language in working memory measures. Based on this finding, In’nami et al. [
4] maintained that, in studies utilizing both L1 and L2 working memory tasks, the two should be treated separately (i.e., the scores should not be averaged), and recommended using L1 tests to minimize the interference of L2 proficiency with working memory capacity. This convergence of meta-analytic findings suggests that even if L1 and L2 working memory capacities are highly correlated and, thus, argued to be language-independent by some researchers [
25], concerning the association with L2 reading comprehension, the L1 and L2 working memory tests likely do not tap into identical constructs. Future researchers are advised to carefully consider the testing language of their working memory measure in line with their theories and research purposes.
Although less robust than the effect of testing language, the scoring method of the reading span test, the most popular working memory test in reading research, also had an effect: scores incorporating both storage and processing components of working memory exhibited higher correlations than scores representing only the storage component [
5,
18]. This result aligns with the demands of reading comprehension, which requires simultaneous storage and processing of information.
Many of the moderator analyses in In’nami et al. [
5] and Shin [
18] revolve around the reliability of working memory and reading comprehension measures. While there are similarities and differences in their findings, both studies highlighted a low ratio of reliability reporting in the included studies: 27.03% (20 out of 74 samples) and 29.73% (22 out of 74) in In’nami et al. [
5]; 32.43% (12 out of 37) and 24.32% (9 out of 37) in Shin [
18] for working memory and reading comprehension tests, respectively. For working memory measures, both meta-analyses found significant differences in the magnitude of correlations between studies that reported reliabilities and those that did not. Interestingly, however, the direction of the difference was opposite. Shin [
18] observed a higher correlation in studies reporting reliabilities, while In’nami et al. [
5] found a higher correlation in studies that did not report reliabilities. The latter finding, unexpected given the assumption that studies reporting reliabilities are conducted more rigorously, led In’nami et al. [
4] to contend that not reporting reliabilities might be a field convention rather than an indication of low reliability. If this is true, the field convention should shift towards encouraging the reporting of reliabilities. Additionally, Shin [
18] found effects of the number and length of sentences in the reading span task: studies using larger numbers of sentences or longer sentences showed stronger correlations, likely owing to the higher reliabilities associated with these features. In reading comprehension tests, the difference between studies that reported reliabilities and those that did not was not significant [
5,
18]. On the other hand, Shin [
18] found a higher correlation in studies using standardized tests compared to those with researcher-made tests, a distinction not observed by In’nami et al. [
5]. Generally, higher reliabilities are expected in standardized tests than in researcher-made tests, as the former go through extensive development and examination by expert teams, although researcher-made tests may better cater to the local needs and characteristics of focused learner groups. Viewed collectively, it seems that a higher reliability of measurement (both working memory and reading comprehension) tends to produce stronger correlations when reading span tests are used [
18], but such a trend is not as evident with a broad range of working memory measures, including verbal/nonverbal and simple/complex tasks [
5].
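One way to see why measurement reliability can matter for the size of an observed correlation is the classical attenuation relationship from test theory, under which the observed correlation equals the true-score correlation multiplied by the square root of the product of the two measures' reliabilities. The sketch below is only an illustration of that relationship, not a procedure used in the reviewed meta-analyses; the function name and all numeric values are hypothetical.

```python
import math

def attenuated_r(true_r, rel_x, rel_y):
    """Observed correlation expected under classical test theory,
    given a true-score correlation and the reliabilities of both measures."""
    return true_r * math.sqrt(rel_x * rel_y)

# Hypothetical values, not taken from any reviewed study.
true_r = 0.40
high_reliability = attenuated_r(true_r, 0.90, 0.90)  # both measures reliable
low_reliability = attenuated_r(true_r, 0.60, 0.60)   # both measures less reliable

print(round(high_reliability, 2), round(low_reliability, 2))
```

With the same underlying relationship, the less reliable pair of measures yields the smaller observed correlation, which is consistent with the pattern reported for reading span tests.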
Regarding sample overlap, the largest meta-analysis by In’nami et al. [
5] had 18 and 11 samples in common with Shin [
18] and Jeon and Yamashita [
2], respectively. The latter two studies shared seven samples.
10. Metacognition/Reading Strategies and L2 Reading Comprehension: Findings from Meta-Analyses
Metacognition is a broad concept originally defined as “the knowledge and cognition about cognitive phenomena” [
26] (p. 906). It is recognized as crucial, along with motivational and affective factors, for understanding how and why people perform cognitive tasks as they do [
27]. In reading research, it is widely, if not exclusively, conceptualized through two components: metacognitive knowledge/awareness about self/person, task, and task-specific strategies and metacognitive control in three domains (planning, monitoring, and revising) [
27,
28]. Each domain of metacognitive control is achieved through various reading strategies, such as setting a goal, previewing, predicting, inferring, summarizing, skipping, re-reading, or problem-solving. Consequently, metacognition plays a pivotal role in strategic reading [
29]. Given the strong connection between metacognition and reading strategies, we explore insights from four meta-analyses that investigated metacognition or reading strategies in relation to reading comprehension in this section.
Jeon and Yamashita [
1,
2] adopted a broad definition of metacognition, accepting a wide range of operational definitions such as perceived and actual use of reading strategies, self-assessment of reading comprehension, and monitoring. As mentioned earlier, they excluded studies with mixed L1 groups. With sample sizes of 10 and 11, the overall mean correlations were 0.32 and 0.33 in Jeon and Yamashita [
1] and Jeon and Yamashita [
2], respectively. Because of the relatively small sample sizes, these studies did not conduct moderator analyses. Park [
17] focused on reading strategy use, defining it as “techniques, actions, and procedures that readers deliberately employ to enhance their comprehension in reading a text” (p. 4). Park [
17] accepted one of the following techniques as operationalization of reading strategy use: interview, questionnaire, observation, journal, and think-aloud protocol. This study concentrated on L2 English only (EFL and ESL). As many of the potential studies were intervention studies that did not report correlation coefficients, Park [
17] computed correlations for studies that provided the necessary information, ultimately including 20 samples. The overall mean correlation was 0.326. Sun et al. [
19] investigated four types of reading strategies (affective strategy, elaboration strategy, monitoring strategy, and organization strategy). The final sample of this study consisted of 44 articles (57 samples) written in either English or Chinese. However, L2 reading was not their primary interest. They included both L1 and L2 studies in their analyses and examined language type (L1 vs. L2) as one of the moderators. The effect size for L2 readers was reported only for the monitoring strategy, where L1 and L2 readers were significantly different. Therefore, we take their meta-analytic result from this strategy only (four L2 samples with a total of 673 participants, calculated from
Table 1, pp. 5–6). The mean correlation was 0.56 (calculated from Fisher’s
z of 0.63, which was the original effect size unit reported by Sun et al. [
19]).
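The conversion from Fisher's z to r mentioned above uses the standard relationship r = tanh(z), with z = atanh(r) as its inverse. The following minimal sketch reproduces the conversion of z = 0.63; the function names are ours and purely illustrative.

```python
import math

def fisher_z_to_r(z):
    """Convert a Fisher's z value back to a correlation coefficient."""
    return math.tanh(z)

def r_to_fisher_z(r):
    """Convert a correlation coefficient to Fisher's z."""
    return math.atanh(r)

r = fisher_z_to_r(0.63)
print(round(r, 2))  # 0.56
```

Because the transformation is nonlinear, z and r are close for small values but diverge as correlations become larger, which is why the unit of the reported effect size should be checked before comparing results across meta-analyses.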
Park [
17] is the only study providing moderator analysis results. Four moderator variables were found to be significant: grade level, language context, L1 or native culture, and standardization of reading comprehension test. Correlations were higher in postsecondary than secondary school levels, in ESL than EFL contexts, and in studies that used standardized rather than researcher-made reading comprehension tests. Moreover, there was variation among five groups based on L1 or native culture. Although the last result may show the effect of environmental factors such as educational systems, teaching/learning environment, and culture, as conjectured by Park [
17], the sample sizes of some groups were small (two or three); hence, more studies are needed. On the other hand, there was no effect of the research design (correlational vs. experimental). Although the difference in effect sizes was not significant, the contrast in sample sizes between correlational and experimental/quasi-experimental designs suggests a trend in L2 reading research. The sample size was much larger for experimental studies (
k = 14) than correlational studies (
k = 6), showing that more attention has been directed to pedagogical questions, such as the effects of reading strategy instruction (experimental/quasi-experimental design), than to componential or construct-related questions (correlational design).
What we can say from the mean correlations between reading comprehension and metacognition/reading strategies from these meta-analyses is that the overall correlation is not large, ranging from small (0.32, 0.33, 0.326) [
1,
2,
17] to medium (0.56) [
19]. The result from Sun et al. is based on only four L2 studies and one type of reading strategy. Considering this, the available meta-analyses likely point more strongly to significant but small effect sizes for the association between metacognition/reading strategies and L2 reading comprehension.
A notable trend among the four meta-analyses is very little overlap in their samples. Except for Jeon and Yamashita [
1,
2], by the same authors, virtually no study overlapped in the samples included in these meta-analyses. This may be partly due to non-identical constructs despite conceptual overlap (metacognition and reading strategies), theoretical differences (e.g., including studies that meet a certain theoretical framework, see Sun et al. [
19]), or other methodological differences across meta-analyses, even if all seem to have followed standard meta-analytic procedures. Because the sample sizes are relatively small in this domain, one way to increase them may be to include intervention studies, as Park [
17] did by calculating correlation coefficients where possible. Since intervention studies seem more popular among L2 researchers, this method is likely to help increase the number of potential studies to include in future meta-analyses. The fact that the results of moderator analysis are available only from Park [
17] indicates the necessity of replicating it through further meta-analyses.
11. L2 Listening Comprehension and L2 Reading Comprehension: Findings from Meta-Analyses
Three meta-analyses examined the relationship between L2 listening comprehension and L2 reading comprehension: Jeon and Yamashita [
1,
2] and Lee et al. [
12]. Auditory/oral language comprehension and its relationship with reading comprehension have been at the center of the Simple View of Reading [
30], an important theory of reading that proposes a drastically parsimonious model of reading. This theory notes that reading outcomes can be largely accounted for by two key factors: decoding and linguistic comprehension, the latter of which is often operationalized as comprehension of auditory/oral language input. Originally built on Spanish–English bilingual readers, this theory has inspired L2 reading researchers, and this interest is also reflected in the meta-analyses included in this review, especially in Lee et al. [
12], which aimed to directly examine the validity of the Simple View of Reading through MASEM.
In Jeon and Yamashita [
1], L2 listening comprehension was treated as a low-evidence correlate as it had only 14 effect sizes, and as a result, no moderator analyses were conducted. However, the overall average effect size was high (
r = 0.77), the highest among the six low-evidence correlates. In the updated study, Jeon and Yamashita [
2], six effect sizes were added, and the result was even stronger (
r = 0.812). Although smaller in size compared to Jeon and Yamashita [
1,
2], Lee et al. [
12] also found a strong overall correlation (
r = 0.59) based on 25 effect sizes. Furthermore, their MASEM showed that L2 listening comprehension was the stronger of the two contributors to L2 comprehension abilities (
β = 0.86 from L2 listening comprehension to L2 comprehension abilities vs.
β = 0.76 from L2 vocabulary knowledge to L2 comprehension abilities). The path coefficient from L2 comprehension abilities to L2 reading comprehension (
β = 0.47) was also larger than that from L2 decoding skills to L2 reading comprehension (
β = 0.38).
As mentioned earlier, Lee et al. [
12] did not report which primary studies included in their meta-analysis were synthesized for which variable. For this reason, we could not examine the extent of overlap between the study pool in Lee et al. [
12] and that in Jeon and Yamashita [
1,
2].
12. L1 Reading Comprehension and L2 Reading Comprehension: Findings from Meta-Analyses
Of the fourteen meta-analyses included in this review, only Jeon and Yamashita [
1,
2] examined the relationship between L1 and L2 reading comprehension. The lack of interest in L1 reading comprehension as a correlate of L2 reading comprehension among meta-analysts is inexplicable given its theoretical importance (e.g., Interdependence Hypothesis, Linguistic Threshold Hypothesis) which is evident in the popularity of L1 reading comprehension among researchers of L2 reading comprehension; to illustrate, Jeon and Yamashita [
1] included 22 correlations between L1 and L2 reading comprehension and Jeon and Yamashita [
2] included 34 correlations, the second-largest number of correlations, after vocabulary knowledge (
k = 51).
In Jeon and Yamashita [
1], L1 reading comprehension was a high-evidence correlate, with 22 effect sizes. The overall average correlation was medium in size (
r = 0.50), and L1–L2 language difference was found to have a significant moderating effect on the relationship between L1 reading comprehension and L2 reading comprehension; the average correlation was significantly higher (
r = 0.60) when the reader’s L1 and L2 were both Indo-European languages than when they formed a heterogeneous pairing. In Jeon and Yamashita [
2], which included 12 additional effect sizes from studies published between 2011 and 2017, the initial findings were sustained; the overall mean correlation was, again, found to be medium in size (
r = 0.483). L1–L2 language difference was once again found to have a significant moderating effect, with the Indo-European L1–L2 subgroup showing a significantly higher correlation (
r = 0.626).
Since Jeon and Yamashita [
2] was an updated version of Jeon and Yamashita [
1] with 12 additional correlations, an examination of primary study overlap was irrelevant and was not conducted.
14. Discussion, Conclusions, and Suggestions
The past two decades have seen a proliferation of quantitative meta-analyses in L2 reading research. Through the present review of meta-analyses, we hoped to identify convergence or divergence in their findings, identify notable trends in methodology, and report any observations that would improve our practices of meta-analysis in the research domain in the future. In this section, we provide our findings and observations.
Across all subsets of meta-analyses that examined the relationship between L2 reading comprehension and a particular reading-related variable, we consistently found that the overlaps between the primary studies included in each meta-analysis were minimal, sometimes with no overlapping studies at all (e.g., the primary study subsets for morphological awareness in Dong et al. [
13] and Jeon and Yamashita [
2]). On the one hand, this lack of overlapping primary studies increases the reliability of the findings yielded by meta-analyses; for example, since Jeon and Yamashita [
2] and Melby-Lervåg and Lervåg [
16] had 29 and 6 correlations, respectively, between decoding and L2 reading comprehension with only three overlapping samples, the strong and significant relationship between these two variables found across the two studies is based on 32 independent samples (29 + 6 − 3 = 32). Since the magnitudes of average effect sizes for all reading-related correlates were largely comparable across multiple meta-analyses, the dearth of overlapping primary studies adds confidence to the collective findings of the meta-analyses.
On the other hand, the lack of overlap among primary studies across meta-analyses makes us wonder why the divergence across study pools is so great. While some differences in the literature search are anticipated for various reasons (e.g., differences in the search time frame, differences in search engines, databases, and publications used for the literature search, and differences in research questions and purposes), our observations of the primary studies included in the meta-analyses revealed that even some primary studies that were eligible according to the search criteria of a meta-analysis were simply not included, without any explanation, and that this was the case with almost all the meta-analyses we reviewed in this paper. This observation seems to be related to the challenges of conducting a truly systematic and exhaustive search, perhaps the most difficult task of all procedures of meta-analysis due to the sheer number of published and unpublished studies to be examined and databases that are constantly updated. Given this, we would like to make two suggestions: first, we recommend that, whenever possible, authors of meta-analyses always crosscheck their pool of primary studies with those of other, relevant meta-analyses. Most of the meta-analyses we reviewed in this paper were published within a relatively narrow time frame (2020 to 2023), which indicates that many of them were prepared around the same time and were not available for comparison when the authors were conducting literature searches. We conjecture that this constraint must have made it difficult for the authors to crosscheck their primary studies against those in other meta-analyses. Now that the meta-analyses and the primary studies synthesized in these meta-analyses are available, researchers working in relevant domains are strongly recommended to crosscheck their primary studies against those study pools as well as those of any newer meta-analyses.
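The crosschecking recommended above can be sketched as a simple set comparison between two pools of primary studies. This is a minimal illustration under our own assumptions: the function name and the study labels are hypothetical placeholders, and in practice the labels would first need to be normalized so that the same primary study is written identically in both pools.

```python
def compare_pools(pool_a, pool_b):
    """Compare two pools of primary-study labels and summarize their overlap."""
    a, b = set(pool_a), set(pool_b)
    return {
        "overlap": sorted(a & b),        # studies included in both pools
        "only_in_a": sorted(a - b),      # candidates missing from pool B
        "only_in_b": sorted(b - a),      # candidates missing from pool A
        "n_independent": len(a | b),     # unique studies across both pools
    }

# Hypothetical labels, not actual primary studies.
own_pool = ["Study A (2010)", "Study B (2012)", "Study C (2015)"]
other_pool = ["Study B (2012)", "Study D (2018)"]

result = compare_pools(own_pool, other_pool)
print(result["overlap"])
print(result["only_in_b"])
print(result["n_independent"])
```

Studies that appear only in the other pool can then be screened against the meta-analyst's own eligibility criteria, and any exclusion can be documented explicitly.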
Despite the divergence in primary study pools and some methodological choices mentioned above, the overall results of the studies largely converged. As shown in
Table 3 and
Table 4, the overall effect sizes reported in meta-analyses on each of the reading-related variables did not vary greatly. Since the primary study pools of the meta-analyses included in this review overlapped only minimally, one possible interpretation of this finding is that the overall strength of the observed relationship between each reading-related variable and reading comprehension would remain similar if we conducted a truly large-scale meta-analysis including all the primary studies in each individual meta-analysis included in the present review. On the other hand, we would like to acknowledge that any summative interpretation of the findings of the individual meta-analyses included in this review must be conducted with caution due to the potential effects of many extraneous variables arising from divergence in study features (e.g., some studies only involved L1 Chinese samples while others included various L1 groups; some studies were more inclusive regarding the age range of eligible studies while others were more limiting).
Our review highlighted a new turn in the meta-analysis research on L2 reading; taking up the suggestions made by Jeon and Yamashita [
1], meta-analyses of correlation coefficients no longer stop at the synthesis of bivariate correlations, but are attempting model testing using advanced techniques such as MASEM. L2 reading research is a fertile ground for model testing due to its abundance of reading models: e.g., the Simple View of Reading [
30], the Interactive Compensatory Model of Reading [
31], and Bernhardt’s [
32,
33] interactive models of reading. While we commend the effort to expand the methodological frontiers, a close examination of Lee et al. [
12] yielded some serious concerns in both the conceptualization of the model and the construct validity of the measurement instruments. For model testing, Lee et al. [
12] used L2 vocabulary knowledge, L2 grammar knowledge, and L2 listening comprehension as the three indicators of the latent variable, L2 comprehension abilities. In the original Simple View of Reading model, vocabulary and grammar are not included as components of L2 comprehension. Rather, L2 comprehension is almost always operationalized as L2 listening comprehension [
30]. Apart from the theoretically questionable choice of subcomponents of L2 comprehension, a more serious problem lies in the modality of the tests. Upon examining the list of primary studies included in Lee et al. [
12], we noted that the vocabulary measures used in at least two primary studies [
34,
35] were written tests. In addition, grammar knowledge measures in at least two primary studies [
35,
36] included in Lee et al. [
12] were also written tests; therefore, they inherently involved decoding in the target construct. As noted earlier, in the original Simple View of Reading model, L2 comprehension is measured almost exclusively through an auditory comprehension test (listening comprehension test), so that the two key constructs, decoding and L2 comprehension, are kept separate. The apparent cross-contamination of decoding and L2 comprehension in the model tested by Lee et al. [
12] poses a serious threat to the validity of their findings. Since meta-analysis is, by nature, a secondary analysis of data, we strongly suggest that researchers take extra care to examine the descriptions of the measurement instruments reported in primary studies to prevent such problems.
Lastly, it is notable that two of the fourteen meta-analyses included in this review included primary studies with L1 Chinese participants only. Placing such a restriction makes the search process more challenging, as it limits eligible studies. Even with such a challenge, however, both studies by Dong et al. [
13,
14] managed to secure a substantial body of primary studies and carried out principled and theory-informed investigations into a special group of readers who share an orthography substantially different from English, the most popularly investigated L2 in reading research [
1,
2]. Considering the substantial presence of L1 Chinese learners of various alphabetic L2s, this line of research provides useful information for language teachers and researchers alike.