1. Introduction
In addition to single words, a large part of our vocabulary consists of multiword expressions, such as
how are you (greeting) or
know by heart (to be able to retrieve some information from memory). These multiword expressions are estimated to comprise between 20% and more than 50% of our spoken and written language (
e.g., Biber et al. 1999;
Erman and Warren 2000). It has been suggested that these fixed phrases are easier to store in memory (e.g.,
Wray 2002), and they have been found to facilitate production (e.g.,
Arnon and Cohen Priva 2013;
Tremblay and Tucker 2011) and processing (e.g.,
Arnon and Snider 2010;
Conklin and Schmitt 2012;
Tremblay et al. 2011). Idioms are a subtype of these multiword expressions, as they are not only fixed phrases, but also have an intended meaning that is different from their literal meaning. For example, the meaning of
know by heart is unrelated to the literal meaning of heart. Idioms are a fundamental part of our adult language and idiom vocabulary is crucial for nativelike language proficiency (
Pawley and Syder 1983). For language learners, whether of a native or a foreign language (e.g.,
Cieślicka 2006;
Conklin and Schmitt 2008), this additional, idiomatic interpretation seems difficult to learn: the development of the idiom vocabulary seems to be delayed compared to the single-word vocabulary (
Carrol 2023;
Kuiper et al. 2009;
Sprenger et al. 2019). Whereas single-word learning levels off around the age of 20 (
Brysbaert et al. 2016), idiom vocabulary levels off at a later age (
Carrol 2023;
Sprenger et al. 2019) or may even show a different development trajectory over age (
Carrol 2023, but see
Sprenger et al. 2019). It remains an open question which mechanisms lead to the observed delay in idiom acquisition relative to the single-word vocabulary. This paper aims to provide more insight into the process of idiom acquisition by investigating children’s exposure to idioms.
Unlike single-word vocabulary research, attempts to quantify idiom knowledge so far have been sparse. We know neither how many idioms an adult language speaker knows, nor how many idioms are part of the language. Idiom dictionaries may contain more than 10,000 idioms (e.g.,
Ayto 2010 for English and
de Groot 1999 for Dutch), but it is not clear whether language users know and use all of these.
Martinez and Schmitt (
2012) constructed a list of 505 ‘phrasal expressions’—non-transparent and fixed multiword expressions with high frequencies—which may include syntactically fixed, non-transparent idioms. They concluded that these multiword expressions are part of the top 5000 most frequent word families. This led
Brysbaert et al. (
2016) to calculate that a 20-year-old English native speaker knows on average 4200 non-transparent multiword expressions and a 60-year-old 4820 (i.e., 10% of the estimated average single-word vocabularies of 20- and 60-year-olds). However, these estimates assume that the proportion of multiword expressions and idioms in our vocabularies is constant over age, which is not in line with the above-mentioned finding that idiom knowledge develops slowly across the life span (
Carrol 2023;
Sprenger et al. 2019). Furthermore, we also do not yet know much about children’s and adolescents’ idiom vocabulary.
The prevalence of multiword expressions in languages has inspired an ongoing debate on how language is learned and represented: various studies find evidence that children store frequently co-occurring words (i.e., multiword expressions) as holistic representations, and only later learn to decompose these (e.g.,
Arnon and Snider 2010;
Bannard and Matthews 2008; see
Arnon and Christiansen 2017, for a review). This contrasts with the traditional view that children first learn words and apply computational operations to combine these into larger units (e.g.,
Pinker 1991, see
Contreras Kallens and Christiansen 2022, for a recent theoretical approach to this problem). For example,
Bannard and Matthews (
2008) reported that 2- and 3-year-olds are more accurate and faster in repeating high phrase frequency four-word expressions than similar four-word expressions with a low phrase frequency, using multiword expressions extracted from a corpus of child-directed speech. Similar processing advantages for multiword expressions have also been reported for adults (e.g.,
Arnon and Snider 2010;
Arnon and Cohen Priva 2013;
Tremblay et al. 2011). The finding that young children are sensitive to the phrase frequency of multiword expressions suggested that these phrases are stored in memory from a young age. Interestingly, a study by
Nicoladis (
2019) suggests that children can decompose fixed expressions at an earlier age than generally assumed:
Nicoladis analyzed highly frequent fixed expressions produced by a three-year-old French-English bilingual child and found cross-linguistic influences on the child’s use of fixed expressions (for example, using the expression ‘I have hungry’ instead of ‘I am hungry’ in English, because in French the verb
avoir, ‘to have’, is used for this expression).
As idioms are also multiword expressions, the above-mentioned findings suggest that
frequency should be a good predictor for idiom knowledge in children. However, the figurative meaning of idioms adds another dimension to the task of the learner, because they do not only need to learn a specific configuration of words, but also their non-literal interpretation. It is conceivable that the difficulty of this task varies with the degree to which an idiom is perceived as being
transparent: the more straightforward the relationship between the idiomatic meaning and the literal meaning of the constituent words, the more transparent the idiom. Various experimental studies have shown that transparent idioms are easier to understand and explain for children (e.g.,
Cain et al. 2009;
Gibbs 1991;
Levorato and Cacciari 1992;
Nippold and Taylor 1995;
Nippold and Rudzinski 1993). For opaque idioms, without a clear relation between the two different meanings, there is a consensus that language users must have stored the idiomatic meaning in memory and be able to retrieve it in order to arrive at the correct interpretation (e.g.,
Swinney and Cutler 1979). In contrast, various studies have shown that language users—in the case of unfamiliar transparent idioms, even children as young as eight years old (
Levorato and Cacciari 1992)—can infer the idiomatic meaning, given a supporting context (e.g.,
Cain et al. 2009;
Gibbs 1987,
1991;
Levorato and Cacciari 1992;
Nippold and Martin 1989). This suggests that the idiomatic meaning can also be constructed compositionally, based on the constituent words (e.g.,
Cacciari and Tabossi 1988;
Titone and Connine 1999).
Titone and Connine (
1999) have argued that in idiom processing, both strategies—retrieving the idiomatic meaning from memory and interpreting the idiom compositionally—may be applied, and that the success of the strategies depends on the idiom characteristics: frequent and/or opaque idioms may benefit from storing the meaning in memory, whereas the compositional process may work better for the less common and/or more transparent idioms. Thus, both frequency of occurrence and the idiom’s transparency may influence idiom processing and memory access.
These same two predictors—idiom frequency and transparency—also play an important role in theories of idiom acquisition. One line of research has focused on children’s exposure to idioms, measured by familiarity ratings (e.g.,
Nippold and Taylor 1995;
Nippold and Rudzinski 1993;
Nippold and Martin 1989). In these experiments, children and adolescents show a gradual increase in their understanding of idioms with age, and better comprehension for high-familiarity idioms than for low-familiarity idioms. Interestingly,
Reuterskiöld and Van Lancker Sidtis (
2013) found that children do not require much exposure to learn unfamiliar idioms: 8–9-year-old children recognized idioms significantly better than novel utterances that they had heard once in a conversational context. Another line of research has focused on how children learn to derive an idiom’s meaning from context.
Levorato and Cacciari (
1992,
1995) have proposed that the compositional interpretation of idioms is dependent on children’s cognitive development: children first need to acquire
figurative competence, a set of skills necessary for considering a figurative interpretation and for deriving a figurative interpretation from the context. Only with figurative competence are children able to make use of the idiom’s transparency to compute the figurative meaning. Figurative competence develops over time, starting around the age of eight (
Levorato and Cacciari 1995). This set of skills is not specific to idiom interpretation, but is assumed to facilitate language comprehension in general, and may include semantic analysis—i.e., retrieving alternative meanings of polysemous words and inferring a non-literal meaning of a phrase—and context processing—i.e., deriving an interpretation that is coherent with the broader context (e.g.,
Cain et al. 2009;
Levorato and Cacciari 1995).
Levorato and Cacciari (
1995) proposed that children focus on the literal meaning of the words (in a ‘local, piece-by-piece elaboration of the text’,
Levorato and Cacciari 1995), and need to learn to integrate information from the global discourse to arrive at an interpretation that is coherent with the surrounding context. Other researchers (e.g.,
Piquer-Píriz 2020;
Winner 1988;
Zurer-Pearson 1990) have argued that figurative competence is acquired gradually, starting at a much earlier age. It has been proposed that one of the reasons for young children’s tendency to interpret figurative language literally may be that they lack world knowledge to understand the link between the figurative and literal interpretation (e.g.,
Piquer-Píriz 2020). An implication of this hypothesis is that children may understand and use a very specific set of idioms that fits their understanding of the world. Furthermore, for this specific set of idioms young children may be able to use transparency to derive the figurative meaning from context. This may explain why even 5-year-old children have been found to select the correct interpretation of idioms when presented in a supportive context (
Cain et al. 2009;
Gibbs 1987).
Although these lines of research are sometimes contrasted, it is probable that both idiom exposure and the development of figurative competence skills (which may be dependent on increasing world knowledge, cf.
Piquer-Píriz 2020) jointly play an important role in idiom acquisition, and that their relative contributions may be modulated by idiom characteristics such as transparency (comparable with the proposal of
Titone and Connine (
1999) for idiom
processing).
1.1. Current Study
In our view, idioms pose two challenges for learners of a native language: (1) language learners need to detect that a multiword expression is idiomatic, having another meaning than the literal (compositional) meaning of the constituent words, and (2) they need to learn the intended meaning. Figurative competence plays a role in these two processes: children need to be aware that phrases can have a non-literal meaning in order to detect an idiom, and they need sufficient context processing skills and world knowledge to derive the intended meaning from the context (cf.
Cain et al. 2009;
Levorato and Cacciari 1992,
1995;
Piquer-Píriz 2020). Our hypothesis is that when children’s figurative competence is developing (i.e., in older children), idiom transparency will start to influence their learning and processing: transparent idioms are understood better and remembered more easily, and therefore, they become relatively more familiar than opaque idioms. In addition, we expect an influence of idiom frequency for the same children, because language exposure also plays a role in these two processes (
Nippold and Taylor 1995): children need to encounter an idiom in order to recognize it as such, and they need to learn the idiomatic meaning.
However, these two processes are only necessary when the idiom actually has a competing literal interpretation. When the idiom is only used idiomatically, children may just learn to associate this idiomatic meaning with this phrase, similar to learning other types of multiword expressions (cf.
Arnon and Christiansen 2017;
Arnon and Snider 2010;
Bannard and Matthews 2008). In such learning situations, figurative competence (i.e., the ability to derive the figurative meaning) is not required and language exposure will be the only predicting factor. As a result, our hypothesis is that young children may show an effect of frequency, but not of transparency, and only for specific idioms that occur in children’s language input.
To test these hypotheses, the current study investigates the effects of idiom exposure on children’s acquisition of idioms in their native language, and also the influence of transparency as a marker of children’s figurative competence.
1.2. Operationalizing Idiom Frequency and Transparency
Levorato and Cacciari (
1992) have put forward a similar proposal and have tested the roles of idiom familiarity and context on idiom interpretation in seven-year-old and nine-year-old children. The seven-year-olds showed more idiomatic interpretations for familiar idioms, whereas familiarity did not play a large role for the nine-year-olds when context was provided, suggesting that younger children may be more sensitive to exposure than older children.
A potential problem with this study and other studies is that frequency of occurrence is operationalized in terms of familiarity ratings (e.g.,
Levorato and Cacciari 1992;
Nippold and Taylor 1995, see also
Bonin et al. 2013;
Hubers et al. 2019;
Tabossi et al. 2011;
Titone and Connine 1994). Typically, these familiarity ratings are provided by adult participants (for example, by teachers in
Levorato and Cacciari 1992). However, idiom familiarity increases with age and children’s familiarity ratings are generally quite different from those of adults (
Carrol 2023;
Nippold and Rudzinski 1993;
Sprenger et al. 2019). Therefore, adults’ familiarity ratings may not reflect children’s idiom exposure. An additional theoretical concern is that the use of familiarity ratings to capture frequency of occurrence assumes that idioms are always stored in memory. However, it may be easier to store an idiom for which the idiomatic interpretation was successfully derived from context than an opaque idiom that was not understood well (however, see
Reuterskiöld and Van Lancker Sidtis 2013). It also has been found that familiarity may not solely capture frequency of occurrence, but may also be influenced by transparency and other idiom characteristics (cf.
Carrol et al. 2018;
Keysar and Bly 1995;
Nordmann et al. 2014). For example,
Carrol et al. (
2018) showed that for native speakers, familiar idioms are perceived as more transparent. Therefore, the current paper investigates children’s idiom exposure by looking at corpus frequencies (cf.
Bannard and Matthews 2008, for multiword expressions).
To date, few studies have used corpus frequencies to capture idiom exposure. A probable reason is that idioms are not easy to find in a corpus, because they may show many types of variation (including syntactic and/or lexical variation, as well as insertions or modifications of adjectives and adverbs, e.g.,
Barlow 2000;
Moon 1998). Furthermore, corpus studies require at least some degree of manual checking of the results, to verify that the non-literal meaning of the phrase was intended.
Sprenger et al. (
2019) collected frequency counts for 193 Dutch idioms from the Lassy Large corpus (
Van Noord et al. 2013), a 700-million-word corpus of Dutch texts from mixed sources. They investigated how frequency and
decomposability (i.e., a different measure to quantify the relation between the literal and figurative interpretation, rating how strongly the literal meaning of the constituent words contributes to the figurative interpretation) influenced familiarity ratings for different ages, and reported interactions between age and frequency and between age and decomposability: whereas
low-frequency idioms receive low familiarity ratings for all ages,
high-frequency idioms show a sharp increase in familiarity ratings over age for participants younger than 30 and are rated as being highly familiar by participants older than 30 (
Sprenger et al. 2019). Decomposability seemed to influence only the familiarity ratings of young adults, with the familiarity ratings increasing with decomposability (i.e., higher decomposability ratings reflect a shorter distance between the literal and figurative interpretation). The current paper uses these same idioms with frequency and decomposability ratings to compare children’s and adults’ exposure to these idioms.
In the literature, the relation between the literal meaning of the constituent words and the idiomatic interpretation has been defined and measured in different ways. For example, the concept of
transparency measures how easy it is to derive the idiomatic meaning from the literal meaning, focusing on the underlying motivation (i.e., why this idiomatic meaning is associated with the phrase;
Cieślicka 2015). Another commonly used measure is
decomposability (or semantic analyzability), which measures how strongly the literal meaning of the constituent words contributes to the figurative interpretation, focusing on the constituents and structure of the idiom (
Cieślicka 2015). However, these measures are not defined in consistent ways and the terms are used interchangeably (as discussed in
Carrol et al. 2018;
Hubers et al. 2019). Typically, the idiomatic meaning is provided when these ratings are collected to control the idiomatic meaning that participants rate (e.g.,
Hubers et al. 2019;
Sprenger et al. 2019; however, see
Carrol et al. 2018, for another approach). Although transparency and decomposability measure different idiom properties (e.g.,
Carrol et al. 2018;
Cieślicka 2015;
Nunberg et al. 1994), they are related in that they both aim to quantify an aspect of the distance between the literal and figurative interpretation. For the current studies, we assume that when children’s figurative competence is developing (i.e., in older children) they are able to use both idiom properties—transparency and decomposability—to derive the figurative meaning from context. Therefore, both measures can serve as a marker for the development of children’s figurative competence. In this study, we have used the collected decomposability ratings from
Sprenger et al. (
2019) to quantify the relation between the literal and idiomatic meaning, and therefore we will use the term
decomposability in the remainder of this paper, with higher values indicating a stronger relation—that is, with higher decomposability values, it is easier to get to the figurative meaning from the literal meaning.
In the following sections, we will present three studies that together investigate children’s exposure to and familiarity with idioms. Study 1 is a corpus study investigating which idioms from the database of
Sprenger et al. (2019) occur in a corpus of Dutch children’s books. Study 2 is a controlled experiment that compares children’s and adults’ familiarity with idioms, and how these are influenced by frequency counts from the adult corpus, the frequency counts from the children’s books corpus, and idiom decomposability. Study 3 presents familiarity ratings from children collected by means of an online questionnaire, using the same set of idioms, to verify the results of Study 2. Together, these three studies provide new insights into children’s exposure to idioms and how this exposure influences their idiom vocabulary as measured by the familiarity ratings.
3. Study 2: Controlled Experiment
To test whether children’s idiom vocabulary is predicted by children’s idiom exposure and the idiom’s decomposability, we asked children of around 7 years old, children of around 9 years old (cf.
Levorato and Cacciari 1995), and adult controls to indicate their familiarity with 104 idioms from Study 1 (which were based on the
Sprenger et al. (
2019) database). For these items, decomposability ratings and adult frequency counts are available, and additionally we added the frequency counts from Study 1 (henceforth
child frequency). Children that were between 7 and 9 years old were tested, because we expect to see differences between these age groups. Experimental studies suggest that 7-year-old children generally have more difficulties with selecting the correct interpretation for idioms
without supporting context than 9-year-old children (e.g.,
Cain et al. 2009;
Gibbs 1991). We are interested in testing whether this difference also shows in their familiarity ratings and whether idiom frequency modulates their familiarity ratings differently. Furthermore, it has been proposed that figurative competence develops between 7 and 11 years old (cf.
Levorato and Cacciari 1995). Therefore, we want to test whether 7- and 9-year-old children show a different effect of decomposability on their familiarity ratings.
3.1. Methods
3.1.1. Participants
Two classes from a Dutch primary school participated in the study: 15 participants (9 male, 6 female) from grade 4 and 15 participants (11 male, 4 female) from grade 6 in the Dutch school system, which means that the children were about seven years (m = 7;4 years, range = 6;11–7;9) and about nine years old (m = 9;7, range = 9;2–10;9), respectively. In addition, 15 adults (9 male, 6 female) participated as controls, with a mean age of 23 (range = 19–27).
3.1.2. Experimental Design
Each participant was presented with a unique semi-random list of 30 items: the idioms were ordered by frequency (based on the LassyLarge (adult) corpus;
Van Noord et al. 2013) and labeled as
high frequency (idioms 1–16, the top 15%),
mid frequency (idioms 17–47, 15–45%), and
low frequency (idioms 48–100, 45–96%). For each participant, 15 idioms were randomly selected from the high-frequency idioms, 8 idioms were selected from the mid-frequency idioms, and 3 idioms were randomly selected from the low-frequency idioms. The reason for using this semi-random selection procedure was to make it more likely for children to hear a familiar idiom—which would make the test more motivating—by including more higher frequency idioms. Due to an implementation error, the randomly sampled high-frequency idioms were presented first, followed by a block of randomly sampled mid-frequency idioms, and the experimental session concluded with a block of low-frequency idioms. As the experiment was rather short and the idioms were randomly ordered within these blocks, this unintentional effect probably did not have large consequences on the results. Each list also contained four control items in fixed positions in the list, namely at trials 7, 14, 21, and 28. These control items were idioms that were not expected to be familiar to children, because they had low frequencies in the adult corpus and they were not familiar to the (adult) Dutch speaking authors involved in designing this study.
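The list construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the control labels, the helper name `build_list`, and the seed are hypothetical, and the sketch reproduces the blocked order in which the lists were actually presented (high, then mid, then low frequency), with the four control items at their fixed trial positions.

```python
import random

# Frequency ranks in the adult corpus, banded as described in the text.
HIGH = list(range(1, 17))    # idioms 1-16
MID = list(range(17, 48))    # idioms 17-47
LOW = list(range(48, 101))   # idioms 48-100
# Hypothetical labels for the four control idioms.
CONTROLS = ["control_1", "control_2", "control_3", "control_4"]
CONTROL_TRIALS = (7, 14, 21, 28)  # fixed 1-based trial positions


def build_list(rng):
    """Return one 30-item list: 26 sampled idioms plus 4 fixed controls."""
    # rng.sample draws without replacement and returns the items in random
    # order, so each frequency block is randomly ordered internally.
    experimental = rng.sample(HIGH, 15) + rng.sample(MID, 8) + rng.sample(LOW, 3)
    idioms = iter(experimental)
    controls = iter(CONTROLS)
    return [
        next(controls) if trial in CONTROL_TRIALS else next(idioms)
        for trial in range(1, 31)
    ]


trial_list = build_list(random.Random(2024))
```

Shuffling `experimental` before interleaving the controls would give the fully randomized order that was originally intended.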
3.1.3. Procedure
The experiment was implemented in OpenSesame 3.2.5 (
Mathôt et al. 2012) and was presented on a Lenovo 10” TB-X103F tablet. Each trial started with a fixation dot, followed by a and an idiom phrase, such as ‘Toen hield hij een oogje in het zeil’ (literal translation
Then he held an eye in the sail, meaning to keep an eye on things). Each experimental session started with three practice trials, which were not included in the analysis data. The test block consisted of 30 trials. On each trial, the phrase was presented on the screen and after 500 ms the phrase was also presented auditorily. The sound files were prerecorded by a native female speaker of Dutch. After hearing the sentence, the question ‘Ken je deze?’ (
Do you know this one?) was added to the screen with a big green checkmark symbol (✓) on the right and a big red cross symbol (×) on the left, for answering ‘yes’ and ‘no’, respectively. Participants had to press these pictures to indicate whether they recognized the idiom or not. Before answering the question, participants could ask the experimenter to play the sound recording again. After rating the idiom for familiarity, a second question was asked about the idiom (‘Who is likely to use this idiom?’), aiming to identify with which age groups children associate the idiom. As this question was difficult for children to answer and the data are outside the scope of this paper, we will only present the results of the familiarity ratings here.
3.2. Results
Participants completed between 14 and 30 items (mean 28.9; 46 responses were missing in total). In addition, five participants were presented with one duplicate trial because of a technical error. The responses for the second encounter were removed, so that only the first encounter with the idiom was included in the data. This increased the missing responses to 51 (3.8% of 1350), resulting in valid data for 93 (out of 104) idioms.
Figure A3 in
Appendix B shows the number of observations per idiom. We then selected only those idioms that received more than two observations (19 idioms were excluded and 30 out of 1299 observations, 2.3% of the data), resulting in valid data for 74 idioms: 16 high-frequency idioms, 29 medium-frequency idioms, and 29 low-frequency idioms. In addition, we used the four control items to check whether participants were actually doing the task: it would be very unlikely for children to be familiar with all four low-frequency idioms, so we decided to exclude children when they indicated familiarity with all four control idioms. Children indicated that they were familiar with 0–2 of the control idioms, and adults with 0–3 of the control idioms (Grade 4 mean: 0.67, Grade 6 mean: 0.73, Adults mean: 0.67). No participants were excluded, and the control items were included in the analysis data.
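The bookkeeping behind these counts, and the rule of keeping only idioms with more than two observations, can be sketched as follows. The counts are taken from the text; the helper name `drop_rare_idioms` and the toy observations are hypothetical.

```python
from collections import Counter

# Bookkeeping with the counts reported in the text.
planned = 45 * 30            # 45 participants x 30 trials
missing = 46 + 5             # missing responses plus removed duplicate trials
valid = planned - missing    # observations before the idiom-level exclusion
excluded_share = 100 * 30 / valid  # percentage of observations dropped with rare idioms


def drop_rare_idioms(observations, min_observations=3):
    """Keep only observations of idioms seen at least `min_observations` times."""
    counts = Counter(idiom for idiom, _ in observations)
    return [(idiom, response) for idiom, response in observations
            if counts[idiom] >= min_observations]


# Toy data (hypothetical): (idiom, familiarity response) pairs.
toy = [("idiom_a", 1), ("idiom_a", 0), ("idiom_a", 1), ("idiom_b", 0), ("idiom_b", 1)]
kept = drop_rare_idioms(toy)
```

In this toy example, `idiom_b` has only two observations and is therefore removed, mirroring the "more than two observations" criterion.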
Figure 3 shows the average familiarity ratings for each age group.
To investigate the effects of decomposability, adult frequency, and child frequency on children’s and adults’ familiarity ratings, we ran two analyses. First, we ran separate analyses for each of the predictors, to see the individual contribution of each of these predictors and how it interacted with age group. We ran these analyses separately to avoid spurious effects due to collinearity of the predictors. For the second analysis, we reorganized the three predictors in orthogonal terms using principal component analysis (PCA), and included the three PCA components and their interaction with age groups in one model, to verify the results of our earlier analyses. In all analyses, children were grouped by their school grades instead of their age, because we did not have access to background information (such as IQ, verbal skills, and language or attention disorders). Children within a school grade may still show a large variation in language skills, but their attending a regular school program in the Netherlands ensures a minimal level of IQ and language experience.
Figure 4 visualizes the results of the three separate analyses. Random intercepts for idioms and participants were included in all models to account for item and participant variability, but the data did not allow us to include random slopes for the predictors.
Decomposability. The top row of
Figure 4 shows the effect of idiom decomposability on the familiarity ratings. The adult participants (right panel) show a significant linear trend for decomposability (χ² = 23.50; p < 0.001), quite similar in direction to the effect of idiom frequency in the adult corpus. The trend for the children in Grade 4 (left panel) was not significant (χ² = 0.01), but the trend for the children in Grade 6 (center panel) was significantly different from zero (χ² = 5.13, p = 0.024). A model comparison procedure indicated that the interaction between decomposability and age group contributed significantly to the model (χ² = 24.25, p < 0.001, AIC = 25.9). Put differently, in all but the youngest age group, idioms are more likely to be familiar if they are also rated as more decomposable.
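As a plausibility check on the reported trend tests, the sketch below converts a test statistic into a p-value under the assumption that it is a χ² statistic with one degree of freedom. That assumption is ours: the text gives the degrees of freedom only for the PC3 result later in this section. The function name is hypothetical.

```python
import math


def chi2_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Trend statistics reported for decomposability.
p_adults = chi2_p_value_df1(23.50)
p_grade4 = chi2_p_value_df1(0.01)
p_grade6 = chi2_p_value_df1(5.13)
```

Under this assumption, the Grade 6 statistic of 5.13 gives a p-value that rounds to the reported 0.024, the adult statistic falls well below 0.001, and the Grade 4 statistic is far from significance.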
Frequency in adult corpus. The middle row of
Figure 4 shows the effect of the Zipf-transformed adult frequencies on the familiarity ratings. The adult participants (right) show a significant linear trend for frequency (χ² = 41.68; p < 0.001), but the trends for the children in Grade 4 (left) and Grade 6 (center) were not significant (χ² = 0.41, p = 0.52, and χ² = 3.30, p = 0.07, respectively). A model comparison procedure indicated that the interaction between frequency and age group contributed significantly to the model (χ² = 44.8, p < 0.001, AIC = 45.5). That is, the more frequent an idiom is in the adult corpus, the more familiar it is to the adult raters. However, this relationship is not seen in the two groups of children.
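The exact transformation used for the adult frequencies is not spelled out here. The sketch below assumes the commonly used definition of the Zipf scale, the base-10 logarithm of the frequency per million words plus 3, without any smoothing. The function name and the example count are hypothetical; the corpus size is the 700 million words reported for the adult corpus.

```python
import math


def zipf(count, corpus_size):
    """Zipf value: log10 of the frequency per million words, plus 3.

    Assumes count > 0; a raw count of zero has no Zipf value without smoothing.
    """
    per_million = count / corpus_size * 1_000_000
    return math.log10(per_million) + 3


# Hypothetical idiom occurring 700 times in a 700-million-word corpus,
# i.e., once per million words.
example = zipf(700, 700_000_000)
```

On this scale, an idiom occurring once per million words has a Zipf value of 3, and every tenfold increase in frequency adds 1.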
Frequency in children’s books. The bottom row of
Figure 4 shows the effect of the child frequencies on the familiarity ratings. The adult participants (right) show a significant linear trend for child frequency (χ² = 8.79; p < 0.003), quite similar in direction to the effects of the adult frequency and decomposability. Again, the trend for the children in Grade 4 (left) was not significant (χ² = 0.095), but the trend for the children in Grade 6 (center) was significantly different from zero (χ² = 11.85, p < 0.001). A model comparison procedure indicated that the interaction between child frequency and age group contributed significantly to the model (χ² = 18.06, p < 0.001, AIC = 19.5). Note that for this analysis, we only included the idioms that appeared at least once in the children’s book corpus (35 out of the 74 idioms).
Presence in children’s books. We also tested whether the presence of an idiom in the children’s books corpus (categorical predictor: ‘yes’, ‘no’) influenced the familiarity ratings, as a complementary measure of looking at the influence of children’s idiom exposure. In this analysis, the familiarity ratings for the absent idioms are also included (74 idioms in total). This measure is illustrated in
Figure 5: For the youngest children, there was no significant difference in their ratings for idioms that were present and absent in the children’s books corpus. However, the ratings of the older children and adults increased significantly for idioms that were present in the corpus (β = 1.051, SE = 0.354, z-value = 2.97, p = 0.003; β = 1.131, SE = 0.394, z-value = 2.87, p = 0.004).
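The two reported effects can be checked for internal consistency with a Wald test, in which the z-value is the estimate divided by its standard error. The sketch below uses the reported numbers. The odds-ratio line additionally assumes that the estimates are on the log-odds scale of a logistic model, which the yes/no responses suggest but which is not stated explicitly; the function name is hypothetical.

```python
import math


def wald_test(estimate, standard_error):
    """Return the Wald z-value and its two-sided p-value."""
    z = estimate / standard_error
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


z_first, p_first = wald_test(1.051, 0.354)
z_second, p_second = wald_test(1.131, 0.394)

# Assumption: log-odds scale, so exp(estimate) is an odds ratio.
odds_ratio_first = math.exp(1.051)
```

Both recomputed z-values and p-values round to the reported figures, so the estimates, standard errors, and test results agree with each other.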
PCA analysis. The analyses presented in
Figure 4 show that all three predictors show a very similar influence on adult participants’ familiarity ratings, and decomposability and child frequency seem to show the same effect for the older children (Grade 6). To test whether we can separate the effects of adult frequency, children’s frequency, and decomposability, we reorganized the three (scaled and centered) predictors into three PCA components. All three components explain considerable proportions of the variance (0.47, 0.28, and 0.24, respectively), showing that they each potentially can account for variation in the data. The analysis only included the 35 idioms that were present in the children’s books.
The model showed a gradual effect for PC1, which captures the shared effects of the predictors decomposability, adult frequency, and child frequency. Children in Grade 4 did not show a significant trend for PC1, but the familiarity ratings of children in Grade 6 and adult participants increased with increasing values for PC1. PC2 did not show a significant trend for any of the age groups and did not contribute to the model. Only the older children (Grade 6) showed a significant trend for PC3 (χ²(1.0) = 6.21; p = 0.013). This component captures the difference between the child frequency and decomposability. The direction of the effect of PC3 indicates that the older children are more sensitive to child frequency than to decomposability.
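The reorganization of three scaled and centered predictors into orthogonal components can be illustrated with a small, dependency-free sketch. It is not the authors' analysis: the predictor values are toy data, the function names are hypothetical, and the eigenvectors of the correlation matrix are found by power iteration with deflation, a stand-in for whatever PCA routine was actually used.

```python
import math


def standardize(values):
    """Center and scale a predictor to mean 0 and sample SD 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def correlation_matrix(columns):
    z = [standardize(col) for col in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def dominant_eigenpair(matrix, iterations=1000):
    """Largest eigenvalue and unit eigenvector via power iteration."""
    vec = [1.0 / (i + 1) for i in range(len(matrix))]  # arbitrary start vector
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in product))
        vec = [p / norm for p in product]
    value = sum(v * sum(m * w for m, w in zip(row, vec)) for v, row in zip(vec, matrix))
    return value, vec


def pca(columns):
    """Return (proportion of variance, loadings) per component, largest first."""
    matrix = correlation_matrix(columns)
    total = len(columns)  # trace of a correlation matrix
    components = []
    for _ in range(len(columns)):
        value, vec = dominant_eigenpair(matrix)
        components.append((value / total, vec))
        # Deflation: remove the component just found before extracting the next.
        matrix = [[m - value * vi * vj for m, vj in zip(row, vec)]
                  for row, vi in zip(matrix, vec)]
    return components


# Toy predictor values for six hypothetical idioms.
decomposability = [1, 2, 3, 4, 5, 6]
adult_frequency = [2, 1, 4, 3, 6, 5]
child_frequency = [1, 3, 2, 6, 4, 5]
components = pca([decomposability, adult_frequency, child_frequency])
proportions = [share for share, _ in components]
```

The proportions sum to one, and the loadings of each component show which predictors it combines or contrasts, which is how a component such as PC3 can be read as the difference between child frequency and decomposability.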
3.3. Discussion
Contrary to our hypothesis, we did not find an effect of child frequency on the familiarity ratings of the youngest children. A potential reason is that the list did not include enough idioms that they knew, because it was originally constructed for adult participants (see
Sprenger et al. 2019) and the selection procedure in this experimental study was based on adult frequencies. A closer look at the items that were rated by more than three children as familiar reveals that only one item fulfills this criterion for the youngest children, namely ‘Toen hield hij een oogje in het zeil’ (
Then he kept an eye on the situation). For the older children, there are six idioms that meet this criterion. These idioms are listed in
Table A7 and
Table A8 in
Appendix B (with translations).
In line with our predictions, we found that older children’s familiarity ratings are influenced by child frequencies, but not by adult frequencies. This confirms the conclusion from Study 1 that children’s idiom exposure may be quite different from adults’ idiom exposure. In addition, older children also showed an effect of decomposability, but the PCA analysis suggests that the effect of child frequency is stronger and may cancel out the effect of decomposability when these effects conflict.
Adults, on the other hand, showed an overlapping effect of adult frequencies, child frequencies, and decomposability. It may be the case that they are sensitive to all three effects, or that these effects are driven by items for which the three predictors overlap in direction. Because these adult participants were relatively young, the results are in line with the results of
Sprenger et al. (
2019), who reported an effect of decomposability for younger adults.
In this study, the participants performed the task in the presence of the experimenter. This may have resulted in a response bias to rate idioms as familiar, even though the participants were explicitly instructed to only indicate idioms they recognized as familiar. However, the effects of frequency and decomposability are not expected to be caused or influenced by a response bias, because the participants were not aware of these manipulations. Nevertheless, it is useful to compare the overall ratings of Study 2 with Study 3, in which children completed an online questionnaire at home and did not meet the experimenter.
In Study 3, we zoom in more closely on the older children, comparing 9-, 10-, and 11-year-old children. We were interested to see whether the effects of decomposability and frequency would get stronger, and the children’s familiarity ratings would become more adult-like with age.
5. General Discussion
In the present work, we have investigated the extent to which idiom frequency and decomposability explain idiom knowledge in children between 7 and 12 years old. To this end, we combined adult frequency data and decomposability ratings from a previous study (
Sprenger et al. 2019) with the results of three new studies: in Study 1, we determined the frequency of 192 Dutch idioms in a corpus of 50 popular children’s books (>2.5 million words) in order to determine the extent and quality of idiom exposure in children’s literature. The results of this top-down approach show that only a subset of our items (i.e., less than half of them) appears in the corpus, often with very low frequencies. The sparseness of the data is in line with our expectations, as our item set was originally created for research on adults. Interestingly, however, we also see that the number of idioms that could successfully be retrieved from the children’s book corpus increases with target age. These observations suggest two things. First, children are indeed exposed to idioms in children’s literature, from the earliest ages onwards. Most probably, our estimates form a lower bound for idiom exposure, as writers may very well have chosen to use other idioms that are not part of our item set. As a follow-up, it would be interesting to investigate idiom use in our corpus by means of a bottom-up approach, to see how many and what type of idioms are used in the corpus beyond our sample. Second, we see that the extent to which writers adapt their language use to their audience comprises the use of figurative language, with idiom use seemingly becoming more adult-like with target age. As writers (and, maybe even more so, editors) are strongly aware of their target audience, their use of idioms suggests that they expect their readers to be able to understand this type of figurative language, and that this understanding develops with age. Their expectations are in line with findings in the literature showing that children learn literal multiword expressions from a young age (e.g.,
Bannard and Matthews 2008), but also add the figurative dimension. To our knowledge, however, idiom knowledge in young children has not yet been studied systematically.
Another interesting conclusion from Study 1 is that frequencies of idiom occurrence in the children’s books did not correlate with occurrence frequencies in the adult corpus. One consideration here is that the adult corpus consists of mixed sources, including fiction, spoken language, newspapers, Wikipedia entries, manual descriptions, and the annual speeches of the former Dutch queen, whereas the children’s books corpus only consists of fiction texts, albeit written by 50 different authors. We nevertheless think that the difference between adult frequencies and the frequencies from the children’s books reflects a difference in language exposure that we would also have found when including other sources of children’s language input, such as television programs and educational texts. Adults may also apply idioms to specific situations in which children are not involved: certain idioms may be commonly used in politics, others in newspaper headlines or in business environments, none of which are part of children’s contexts. For example, the Dutch idiom ‘iets/iemand in de arm nemen’ (literal translation to take something/someone in the arm, meaning to recruit someone or a company) is an idiom that is typically used in the context of hiring lawyers, detectives, construction companies, or gardeners. This idiom has a relatively high frequency in the adult corpus but was not found in the children’s books.
Apart from idiom frequency, Study 1 also investigated idiom decomposability. We found that the frequency in the children’s books corpus increases with the idiom’s decomposability. Put differently, children’s authors seem to have a strong preference for decomposable idioms. This effect may be explained by the (adult) authors deliberately selecting idioms that are easier to interpret for children, who are developing the skills for interpreting figurative language. An alternative explanation could be that idioms that are relevant for children typically describe more concrete situations and require less specific world knowledge, and that these idioms are more decomposable, or are perceived as more transparent. In contrast to the children’s book data, we did not find a similar effect of decomposability on the adult frequency counts. Interestingly, this finding mirrors observations by
Carrol (
2023), who collected transparency ratings from adults aged 18–77 years. Note that their transparency ratings are equivalent to our decomposability ratings. The transparency ratings were influenced by idiom frequency counts: the more frequent an idiom, the higher the transparency ratings, irrespective of age. In other words, idioms that were rated as more transparent were found more frequently in the corpus. The frequency counts were retrieved from a corpus of recent web-based newspapers and magazines aimed at adult readers, the Corpus of News on the Web (NOW;
Davies 2016). The two corpora—the children’s books corpus and the NOW corpus—have in common that the texts involve careful editing. Therefore, it seems a likely explanation that language users actually use more transparent idioms when carefully writing and editing their text. The alternative hypothesis, that language users perceive high-frequency idioms as more transparent, seems less likely, because the adult frequencies in Study 1, which are retrieved from the 700-million-word Lassy Large corpus (
Van Noord et al. 2013), do not show the same effect.
In our second study, we attempted to fill the gap with respect to idiom knowledge in young children. Our aim was to test whether children’s familiarity increased with idiom exposure. We found that the familiarity ratings of the young children (age 7) were not influenced by idiom frequency. However, the familiarity ratings of older children (from age 9) increased significantly with increasing frequencies from the children’s books corpus, while no effect of the adult frequencies was found. The results of the nine-year-old children signal that a reliable estimate of children’s exposure is necessary for measuring an influence of the frequency of occurrence. The frequencies from the children’s books corpus may not have been a good estimate of the idiom exposure of seven-year-old children, because books suited for children younger than seven years old made up less than 9% of the corpus (see
Table 1). That is, just as the adult frequencies are not representative of nine-year-olds’ idiom exposure, idiom frequencies from the whole children’s books corpus may not be representative of the youngest children’s exposure. In addition, our idiom list, which was originally created for research on adults, may not have contained enough idioms that were familiar to these younger children, as discussed before.
Besides frequency, we investigated the influence of decomposability on idiom knowledge in children. We see that the familiarity ratings of the older children are influenced by decomposability, even though the underlying decomposability ratings from
Sprenger et al. (
2019) were provided by (young) adult participants in the age range 21–26 years, and so may not be representative of children’s perception of the idioms’ decomposability. The aforementioned study by
Carrol (
2023) reports that transparency does not change with age, and we seem to see at least some of that effect in our data as well.
The absence of an effect of decomposability on the familiarity ratings of the young children (age 7) in our study may be surprising in the light of previous studies that have found effects of decomposability in seven-year-old children’s interpretation of idioms without context (e.g.,
Cain et al. 2009;
Gibbs 1991). One possible reason is that our study asked children to rate their familiarity with the idioms, rather than asking them to select the idioms’ interpretations. Because the list of idioms was not representative of young children’s idiom vocabulary, the number of familiar idioms may not have been sufficient to show an effect of decomposability. In addition, we investigated decomposability as a continuum instead of a categorical predictor (i.e., comparing highly decomposable idioms with non-decomposable idioms), which requires more observations for finding a significant trend. An alternative explanation is that the youngest age group experiences more difficulties with recognizing idioms without context, because their figurative competence skills are not sufficiently developed.
Our third study aimed to investigate the familiarity ratings of older children in more detail. We had expected to find stronger effects of frequency and decomposability with increasing age, but the limited number of observations per participant and item reduced the statistical power of the analysis, so that we did not find any differences between age groups. However, the overall results in Study 3 are quite similar to our findings for the older children in Study 2: the average familiarity ratings for the Grade 6 (9 years old) children in Study 2 and Study 3 are highly similar (0.341 in Study 2 vs. 0.354 in Study 3; see
Figure 3). Interestingly, we do see a numerical increase in average familiarity ratings with age: 0.354 for Grade 6, 0.428 for Grade 7, and 0.483 for Grade 8, but this trend is not significant. In addition, the effects of frequency and decomposability—while much weaker in Study 3 than in Study 2—go in the same direction as the results for the older children in Study 2. There are significant trends for decomposability and child frequency, but no effect of adult frequency, and higher familiarity ratings for idioms that appeared in the children’s book corpus. In other words, the ratings obtained in Study 3 support the idea that children are exposed to more transparent idioms than adults, and that the frequency with which these idioms occur predicts idiom knowledge in older (9+ years) children.
The consistent effect of decomposability on familiarity ratings in older children and adults is in line with the findings of
Sprenger et al. (
2019), who reported that young adults provide higher familiarity ratings for idioms with higher decomposability. Adults older than 40 in their study did not show this effect.
Sprenger et al. (
2019) observed that decomposability did not affect familiarity ratings once an idiom had been acquired and was highly familiar. The reason for this effect of decomposability in the current study and in the earlier study may be that decomposable idioms are more easily recognized as being idiomatic (and their idiomatic meaning more easily analyzed) than opaque idioms, and hence, they are perceived as potentially familiar. Study 1 and the study of
Carrol (
2023) provide an additional hypothesis: transparent or decomposable idioms are more frequently used in edited texts (including children’s books) than opaque idioms. This effect of decomposability may therefore be an indirect effect of idiom exposure.