“How Often Do You Encounter the Verb Obnaruzhit’?” Subjective Frequency of Russian Verbs in Heritage Speakers and Other Types of Russian–German Bilinguals

Clasmeier, Christina; Anstatt, Tanja

doi:10.3390/languages9080256

by Christina Clasmeier ¹,* and Tanja Anstatt ²,*

¹ Institut für Slavistik, Universität Münster, 48143 Münster, Germany

² Seminar für Slavistik & Lotman-Institut für Russische Kulturstudien, Ruhr-Universität Bochum, 44780 Bochum, Germany

* Authors to whom correspondence should be addressed.

Languages 2024, 9(8), 256; https://doi.org/10.3390/languages9080256

Submission received: 20 February 2024 / Revised: 17 June 2024 / Accepted: 28 June 2024 / Published: 23 July 2024

(This article belongs to the Special Issue Heritage Russian Bilingualism across the Lifespan)

Abstract

The literature shows that word frequency data obtained from corpora (corpus frequency, CF) and from L1 speakers’ estimates (subjective frequency, SF) are substantially correlated. However, little is known about languages other than English or about frequency estimation by different types of bilingual speakers. We address both issues. For 49 Russian verbs, we compare the correlations between CF and SF, as well as the SF data themselves, across four groups of Russian speakers: monolinguals (MOs), late bilinguals (LBs), heritage speakers (HSs), and foreign language learners (FLs). We obtained SF data from a frequency estimation study with 447 participants. Despite the reduced exposure to Russian in the three bilingual groups, their SF data were correlated with the CF at the same (moderate) level as the monolinguals’ SF. Interestingly, the correlations between the SF of the MOs, LBs, and HSs were very high. This indicates that SF is extremely stable across speaker groups and that HSs do not differ from other L1 speakers in this respect. Furthermore, in absolute terms, HSs rated the verbs consistently lower than LBs and MOs did, demonstrating that speakers have a finely adjusted ability to estimate how often they encounter words. The learners, by contrast, formed a clearly distinct group, with only moderate correlations with all groups of L1 speakers.

Keywords:

bilingual mental lexicon; corpus frequency; subjective frequency estimation; Russian; heritage speakers; types of bilinguals

1. Introduction

1.1. Word Frequency

Word frequency is one of the most important factors in the mental lexicon and in language acquisition (Brysbaert and New 2009; Ellis 2002, 2012). The higher the frequency of a word, the faster and more accurately it is processed. This word frequency effect has been shown in numerous studies (Balota et al. 2004; Brysbaert et al. 2011, 2018). For example, Brysbaert et al. (2011) conducted a mega-study analyzing test data and word variables for over 40,000 English words. They found that word frequency had by far the highest predictive power for lexical decision time, explaining over 40% of the variance. Most models assume that frequency is a learning effect: the more frequently a word is processed, the higher its resting activation, which in turn makes its activation easier and faster (see discussion in Baayen 2010; Brzoza 2018; Brysbaert et al. 2011, 2018). The word frequency effect interacts closely with other factors, for example, age of acquisition, but it remains robust even when these variables are accounted for (Brysbaert et al. 2018). Thus, it is assumed that frequency information is stored in the mental lexicon together with other information about a given word.

Importantly, the word frequency effect has been shown to depend on individual language exposure. Mandera (2016) carried out two large word recognition experiments for English and Dutch, with more than 50,000 words per language and almost 1.5 million participants from various demographic groups. He examined how word frequency relates to response time and accuracy measures. More precisely, he analyzed 1. the effects of education, comparing L1 speakers with different education levels; 2. the effects of age, comparing different age groups of L1 speakers; and 3. the effects of language acquisition type, comparing L1 speakers and foreign language learners of English. He found stable effects of language exposure. Individuals with high exposure to the language showed different frequency effects from those with less exposure, whether the higher exposure came from more education, older age, or having the language as their L1. Furthermore, frequency did not act as a monolithic factor: high-frequency and low-frequency words showed distinct patterns, which differed between the groups. The higher the language exposure, the more the word frequency effect was concentrated on low-frequency words. Conversely, the lower the language exposure, the greater the effect within high-frequency words. Several other studies, mainly comparing L1 and L2 speakers, have supported his findings (see Cop et al. 2015; for an overview, see Monaghan et al. 2017). Based on these results, we can assume that the frequency information stored in the mental lexicon differs between groups of speakers. This assumption is the starting point of the present article.

1.2. Subjective Frequency and Corpus Frequency

A central question in using word frequency as a background variable in psycholinguistic studies is how it can be reliably measured. Two methods are available: extraction from corpora (corpus frequency, CF) and the collection of frequency estimates from speakers, that is, subjective frequency (SF). The latter method is based on the assumption that speakers have an unconscious ability to estimate how often they encounter linguistic units (Ellis 2012). On this assumption, a whole series of studies have collected and analyzed SF data for words (for an overview, see Anstatt 2016). Most SF studies to date have been primarily concerned with the correlation between SF and CF. The older studies aimed to prove the reliability of SF by demonstrating its high correlation with CF (Carroll 1971; Shapiro 1969; for Russian, see Frumkina 1966; Frumkina and Vasilevich 1971). More recent studies have focused on which method is better suited for controlling the word frequency effect (Alderson 2007; Baayen et al. 2006; Balota et al. 2004; Brysbaert and Cortese 2011; Brzoza 2018; McGee 2008). Very few studies have so far used subjective frequency as a sui generis method to gain insights into the mental lexicon. This is so even though some authors have assumed that precisely its differences from corpus frequency are informative (Anstatt and Clasmeier 2012; Sherkina-Lieber 2004). To date, only a few studies on SF have focused on Slavic languages (on Russian, see Frumkina 1966; Frumkina and Vasilevich 1971; Anstatt and Clasmeier 2012). Miklashevsky (2018) included SF data, as the “gold standard of psycholinguistic studies”, in his normative data on 506 Russian nouns. To the best of our knowledge, however, there are no such norms for Russian verbs.¹

SF and CF data have been shown to be moderately or even highly correlated in several studies (e.g., Balota et al. 2001; Brysbaert and Cortese 2011; Desrochers and Thompson 2009; for Slavic languages, Frumkina 1966; Frumkina and Vasilevich 1971; Brzoza 2018). CF data are usually easier to obtain, but SF is the more powerful technique for controlling the word frequency effect. However, the predictive power of CF data crucially depends on the quality of the corpus. If colloquial language is sufficiently represented, as in subtitle corpora, the CF predicts the word frequency effect better than CF drawn from primarily written corpora.

Nevertheless, especially in the low-frequency range, SF outperforms CF (e.g., Brysbaert and New 2009; Desrochers and Thompson 2009; but see the reverse finding for Polish in Brzoza 2018) because low-frequency words are not equally distributed across different texts. This effect becomes even more severe when linguistically less-skilled participants, such as unskilled readers or second language learners, are involved (Kuperman and Van Dyke 2013; Chen and Dong 2019). CF data tend to overestimate low-frequency words for these speaker groups because these individuals have “less language exposure and smaller vocabulary size” (Chen and Dong 2019, p. 2). SF estimation does not show this bias and is therefore the preferred technique in studies that take low-frequency words into account.

Previous studies of subjective frequency have almost exclusively tested basic forms, i.e., infinitives for verbs and nominative singulars for nouns. They assume, usually implicitly, that frequency judgments refer to the entire lemma and not to the particular word form presented (cf. Balota et al. 2001; Brzoza 2018; Frumkina 1966; Miklashevsky 2018). Reid and Marslen-Wilson (2003) explicitly state that their “familiarity scores served as a basis for estimates of lemma frequency” (Reid and Marslen-Wilson 2003, p. 303). Anstatt and Clasmeier (2012) found systematic differences between the CF and SF of Russian aspectual forms of verbs. They compared CF with the SF estimates given by monolingual Russian L1 speakers. They found that the morphologically and functionally primary form of an aspectual pair influenced the frequency estimate of the secondary partner. They concluded that Russian speakers perceive the partners of an aspectual pair as the same verb rather than two different ones and combine the individual occurrences of both aspect partners into one lemma (Anstatt and Clasmeier 2012). On this basis, we can assume that, when estimating frequency, respondents generally interpret uninflected forms as representing whole lemmas rather than individual word forms. This is the underlying assumption of the present study. Ultimately, however, no conclusive findings are available, and this question must remain unanswered until further research has been conducted. Even if inflectional forms were tested, it would remain open whether the participants were judging the given form or the whole lemma.

However, SF entails some uncertainties as well: the results depend, for example, on the scale and the statistical methods used (overview in Anstatt 2016). In addition, it is not fully understood what information participants rely on when estimating a word’s frequency.

1.3. Subjective Frequency and the Bilingual Mental Lexicon

As mentioned above, an impressive number of psycholinguistic studies have shown that frequency information must be stored together with words in the mental lexicon. In monolingual L1 speakers, this information results from language experience across all modes and registers from early childhood on. SF in relation to CF has been extensively studied in monolingual speakers. Comparatively little, however, is known about second language learners’ ability in this respect (but see Imai et al. 2005 for Spanish L2 learners of English). Recently, Chen and Dong (2019) explored the relationship between the CF of English words in different corpora and the SF of Chinese L2 learners of English. They assessed the predictive power of both measures for L2 lexical processing (Chen and Dong 2019, p. 3). They found that, again, SF outperformed CF, especially in the low-frequency range.

Frequency has also been shown to be a highly relevant factor in bilingual lexical processing. However, only a few systematic studies on SF in early or late bilingual L1 speakers have been published to date (Cop et al. 2015; Gollan et al. 2005, 2008). Emmorey et al. (2012) studied hearing American Sign Language–English bilinguals. They addressed the widely reported observation that bilingual speakers perform more poorly than monolinguals in speech production tasks. They challenged the widespread interpretation that these disadvantages result from competition between a bilingual’s two languages. Instead, they proposed the “weaker links” hypothesis (also referred to as the “frequency lag hypothesis”). They introduced word frequency as an explanatory factor and hypothesized that the weaker performance arises because bilingual speakers divide their frequency of use between their two languages. Therefore, each word is activated less often than the same word in monolingual speakers. “As a result of reduced language experience, all words in a bilingual’s mental lexicon will be of lower experienced frequency compared to a monolingual speaker” (Schmidtke 2016, p. 3). As a consequence, the links between the semantic and phonological representations of a word are weaker in the bilingual mind. This, in turn, causes the observed slower or less accurate performance in speech production tasks. Furthermore, the slowdown for low-frequency compared with high-frequency words should be more pronounced in bilinguals than in monolinguals. The reason is that differences in frequency of use have more profound effects at the lower than at the higher end of the frequency range. This was confirmed by Gollan et al. (2008) in a picture-naming task with Spanish–English bilinguals and English monolinguals. Cop et al. (2015) confirmed it for English monolinguals and unbalanced Dutch–English bilinguals reading naturally in their L1 and L2.

However, some questions remain open concerning the concept of reduced language exposure within the scope of the weaker links hypothesis. Namely, is reduced language use in childhood or at present more relevant? A wide range of language biographies is possible among bilinguals, and each constellation is associated with different frequencies of use of the two languages across the lifespan. If the weaker links hypothesis holds, how would these differences be reflected in the bilinguals’ mental lexicons? The present study sets out to shed light on these questions by comparing SF across different groups of bilinguals, that is, Russian heritage speakers, late bilinguals, and foreign language learners.

1.4. Heritage Language Speakers and Other Types of Speakers

A heritage language is a language that children acquire through natural first language acquisition and that is a minority language in their environment. The heritage language is typically the nondominant language of the heritage speakers (HSs) (Polinsky 2018; Rothman 2009). However, proficiency in the heritage language varies along a wide continuum among HSs (Anstatt 2017; Montrul 2016). A wealth of research over the past two decades has focused on the particularities of HSs. It is becoming increasingly clear that heritage speakers’ characteristics can be studied particularly fruitfully when they are compared not only with monolinguals but also with other types of speakers (Kupisch and Rothman 2018).

Based on how a language was acquired and on its status in the surrounding society, several broad groups of language users, including HSs, can be distinguished (see, e.g., Montrul 2008):

Individuals who have acquired the given language as their only L1 and have spent their lives up to the present time in an environment where this language is the main language of society (“monolinguals”, MO);
People who acquired this language as their only L1 but changed their place of residence and, thus, the majority language after completing their first language acquisition (“late bilinguals”, LB);
Individuals who acquired the given language as their L1 but acquired a second language, which is the dominant language of the surrounding society, in parallel from an early age, before completing first language acquisition (“heritage language speakers”, HS);
People who acquired another language as an L1 and learned the given language as a foreign language in later childhood or as adults, typically at least partially by formal language instruction (“foreign language learners”, FL).

These types are idealized; between them, there are transitions and fuzzy edges (DeLuca et al. 2019). For example, “monolinguals” usually know more than one language and may use an additional language on a regular basis. Heritage speakers may have varying degrees of contact with their heritage language. However, these characteristics capture the most important conditions: 1. Was the language acquired from birth, that is, as L1? 2. Was it acquired as the only language during childhood? 3. Is this language the socially dominant language, or is this role taken by another language? It can therefore be assumed that these four groups represent different types in terms of the amount of language exposure. We can thus expect substantial differences in how often they have encountered words of the language in question. With respect to bilinguals, a further phenomenon, attrition, has to be taken into account, as it may significantly influence the strength of the representation of words in the mental lexicon (Schmid and Köpke 2009). Attrition is known to have severe consequences for heritage speakers, but it is an important factor for other bilingual groups as well. With respect to L1 attrition, puberty (around age 12) seems to be the most important turning point. In bilinguals, L1 attrition before puberty can be much more severe than after puberty (Köpke and Schmid 2004; Ahn et al. 2017).

1.5. Research Questions and Hypotheses

The present study compares the subjective frequency estimation of Russian verbs across the four groups of Russian speakers described above and addresses two main research questions. Our first research question concerns the correlation between corpus frequency (CF) and subjective frequency (SF), compared across the four groups. Does the frequency lag of Russian in the three bilingual groups result in weaker correlations between corpus and subjective frequency measures? Based on previous studies, we expect the SF measures of the Russian monolinguals to be at least moderately correlated with CF data (Hypothesis 1.1). This correlation should also be present, but weaker, in the three bilingual groups: the less exposure participants have had to Russian, the weaker the correlation should be (Hypothesis 1.2). Regarding our verbal stimuli, we expect the correlation between CF and SF to be weaker for low-frequency than for high-frequency verbs. This discrepancy should appear in monolinguals but be even more pronounced in the bilingual groups (Hypothesis 1.3).

Our second research question concerns the comparison between the SF estimation of the four groups. Are monolingual SF measures correlated with the measures of the other groups? Does less exposure to Russian (leading to the frequency lag), which is particularly characteristic of heritage speakers and foreign language learners, affect this correlation?

Based on the weaker links hypothesis, we expect the correlation between two groups’ SF estimations to be lower the more their language exposure differs (Hypothesis 2.1). Heritage speakers and foreign language learners are assumed to be more heterogeneous in their language biographies and, consequently, in their experience with Russian. We therefore expect SF measures to be more heterogeneous in these groups than among late bilinguals and monolinguals (Hypothesis 2.2). This heterogeneity should be particularly high in the lower frequency range of our verbal stimuli (Hypothesis 2.3).

2. Materials and Methods

2.1. Verb Materials and Corpus Frequency

The material consisted of 49 Russian verbs and 4 control verbs, all in the infinitive form. Each experimental verb represented the primary verb of an aspectual pair, and both perfective and imperfective verbs were included.² The verbs were selected based on their corpus frequency (CF) to cover a broad range from low- to high-frequency words. For this purpose, we used the Novyj Chastotnyj Slovar’ (New Frequency Dictionary) by Ljashevskaja and Sharov (2011) (hereinafter: NChS). It provides frequency information for 50,000 Russian lemmas in instances per million words (ipm). The selected verbs ranged from 525.8 ipm (vzjat’ ‘to take’) to 0.5 ipm (iznyt’ ‘to languish’). The NChS is based on a corpus of 100 million tokens and thus meets the requirements formulated by Brysbaert and New (2009) for the reliability of CF. The corpus consists mainly of written texts, with only about 5% oral texts (see Sharov and Ljashevskaja n.d.), which means that spoken language is strongly underrepresented.³ As discussed above, film subtitle corpora have recently proven to be a more suitable source for psycholinguistic studies (see Brysbaert and Cortese 2011; Brzoza 2018; Mandera 2016).⁴
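Instances per million (ipm) normalize raw corpus counts by corpus size, which makes frequencies comparable across corpora. The following minimal Python sketch converts between the two measures. It assumes the corpus size of 100 million tokens stated above, and the function names are illustrative only, not part of any NChS tool.

```python
CORPUS_TOKENS = 100_000_000  # corpus size stated for the NChS

def to_ipm(count: int, corpus_tokens: int = CORPUS_TOKENS) -> float:
    """Raw corpus count -> instances per million words."""
    return count / corpus_tokens * 1_000_000

def expected_count(ipm: float, corpus_tokens: int = CORPUS_TOKENS) -> float:
    """Instances per million -> approximate raw count in a corpus of the given size."""
    return ipm * corpus_tokens / 1_000_000

# Endpoints of the stimulus range reported above
for verb, ipm in [("vzjat'", 525.8), ("iznyt'", 0.5)]:
    print(f"{verb}: {ipm} ipm ~ {expected_count(ipm):,.0f} occurrences")
```

Under this assumption, the rarest verb in the set, iznyt’, corresponds to roughly 50 occurrences in the corpus, and vzjat’ to roughly 52,580. These figures are only approximate, because the published ipm values are rounded.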

The four control verbs consisted of two extremely rare verbs (atukat’ ‘to hunt hares’ and jarovizirovat’ ‘to vernalize’) and two pseudo-verbs (trul’bit’, grebljat’). Unlike the experimental verbs, the control verbs were not presented in two aspectual forms. They were included to ensure that the participants correctly understood the task and the direction of the judgment scale. The complete list of verbs and their frequencies is available in the repository at https://slavdok.slavistik-portal.de/receive/slavdok_mods_00000353 (accessed on 27 June 2024).

2.2. Subjective Frequency Data Collection

The SF survey was carried out using a questionnaire. Each experimental verb was tested twice: once in the perfective aspect and once in the imperfective aspect. For the present study, we analyzed only the morphologically and semantically primary aspectual form of each verb. A total of 104 verbs were distributed across two complementary questionnaires⁵: 100 verbs in 50 aspect pairs (1 of which had to be excluded from the analysis) and 4 control verbs. The 4 control verbs were included in both questionnaires. Thus, each participant was given a list of 54 Russian verbs in the infinitive and asked to estimate their SF. Three versions of each questionnaire, each with a different randomized item order, were generated to avoid sequence effects.
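The questionnaire design can be sketched in Python as follows. The control verbs are those named in Section 2.1, but the pair labels are hypothetical placeholders. The text does not say how the aspectual partners were distributed between the two sheets, so assigning each partner to sheet A or B at random is an assumption. The fixed seed only makes the sketch reproducible.

```python
import random

# Control verbs named in Section 2.1 (two rare verbs, two pseudo-verbs)
CONTROLS = ["atukat'", "jarovizirovat'", "trul'bit'", "grebljat'"]

# Hypothetical placeholders for the 50 aspect pairs (perfective, imperfective)
PAIRS = [(f"pf_{i:02d}", f"ipf_{i:02d}") for i in range(1, 51)]

def build_questionnaires(pairs, controls, n_versions=3, seed=42):
    """Split each aspect pair across two complementary sheets and
    create n_versions randomized item orders per sheet."""
    rng = random.Random(seed)
    sheet_a, sheet_b = [], []
    for pf, ipf in pairs:
        # Assumption: each partner goes to sheet A or B at random
        if rng.random() < 0.5:
            sheet_a.append(pf)
            sheet_b.append(ipf)
        else:
            sheet_a.append(ipf)
            sheet_b.append(pf)
    # The control verbs appear on both sheets
    sheets = {"A": sheet_a + controls, "B": sheet_b + controls}
    return {
        name: [rng.sample(items, k=len(items)) for _ in range(n_versions)]
        for name, items in sheets.items()
    }

versions = build_questionnaires(PAIRS, CONTROLS)
print(len(versions["A"][0]), len(versions["B"][0]))  # 54 items per sheet
```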

The instruction—which was offered in Russian and German—read as follows: “On the next sheet, you will see a list of Russian words. Please mark how often you encounter these words in everyday life when you use Russian. (Consider all possible situations—when you speak, read, watch TV, etc.) Please use the whole scale from 1 (never) to 7 (at every turn) to rate them”. The instruction was followed by a presentation of the 54 verbs in a table, for each of which a value on a scale of 1–7 was to be selected. Each number was accompanied by a verbalization in Russian for all participants: 1 nikogda ‘never’, 2 ochen’ redko ‘very rarely’, 3 bolee ili menee redko ‘more or less rarely’, 4 ni chasto ni redko ‘neither frequently nor rarely’, 5 bolee ili menee chasto ‘more or less frequently’, 6 ochen’ chasto ‘very frequently’, and 7 na kazhdom shagu ‘at every turn’, which we adopted from Frumkina and Vasilevich (1971). The data collection took place in four phases between 2012 and 2023 (see Table 1).

The details of the data collection differed slightly between the phases. The data collection in the first three survey phases was conducted using a paper-and-pencil task, while the 2023 survey was administered online using the SoScisurvey.de tool.⁶

2.3. Participants

Following the SF rating, the participants completed a questionnaire on their sociolinguistic background, with a focus on their language acquisition. A total of 447 questionnaires were collected, 144 of which had to be excluded (cf. Table 1 above). Ninety-three participants were excluded based on their background data. For 83 data sets, the sociolinguistic data were incomplete or unclear. In addition, the foreign language learners had to have a minimum level of experience with Russian, so we excluded 10 learners whose weighted length of learning was less than 2 points. For the remaining 354 data sets, we checked whether the SF ratings were meaningful. First, we inspected the ratings of the four control verbs and excluded data sets with a mean value above 2 for these verbs. Second, we excluded any questionnaire in which the range of the participant’s SF judgments was 0 or 1, indicating the use of only one or two levels of the estimation scale. Based on these criteria, 51 questionnaires were excluded. If frequency judgments were missing for individual verbs, the rest of the data set was still included. The distribution of the remaining 303 questionnaires by group, together with information on the participants’ sociolinguistic backgrounds, is given in Table 2.
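The two-step exclusion procedure described above can be sketched in Python as two predicate functions. This is a minimal illustration, not the authors’ script. The function names are hypothetical, and ratings are assumed to be integers on the 1–7 scale, with None for missing judgments. We also assume that the range criterion is computed over all verbs a participant rated, including the controls.

```python
from statistics import fmean

def passes_background_check(complete: bool, group: str, weighted_length=None) -> bool:
    """Step 1: drop unclear background data and FL learners below 2 points."""
    if not complete:
        return False
    if group == "FL" and (weighted_length is None or weighted_length < 2):
        return False
    return True

def passes_sf_check(control_ratings, all_ratings) -> bool:
    """Step 2: plausibility of the SF ratings (1-7 scale, None = missing)."""
    controls = [r for r in control_ratings if r is not None]
    given = [r for r in all_ratings if r is not None]  # missing items are tolerated
    if not given:
        return False
    if controls and fmean(controls) > 2:   # control verbs rated too frequent
        return False
    if max(given) - min(given) <= 1:       # only one or two scale levels used
        return False
    return True
```

The counts reported above are internally consistent: 447 − 93 = 354 data sets passed the background check, 354 − 51 = 303 remained after the SF checks, and 93 + 51 = 144 were excluded in total.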

Participants were classified as monolinguals if they had acquired Russian as their only language in childhood and had continuously lived in an environment where Russian could be used for all everyday and professional needs. Assignment to the LB or HS group was based on the participant’s age at immigration to a country where Russian cannot be used for all everyday and professional needs, so that another language had to be acquired. If this age was between 0 and 12, the participant was assigned to the HS group (where 0 means birth in the new country). If it was 13 or higher, the participant was classified as an LB. In most cases, the country of immigration of the HSs and LBs was Germany, but a few individuals from the fourth survey phase lived in another country.
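The group assignment rule can be written in Python as a small decision function. The sketch is an illustration under stated assumptions. The parameter names are hypothetical, and foreign language learners are simplified to “Russian not acquired as L1”. The label "unclassified" is our placeholder for L1 speakers who never emigrated but did not grow up with Russian as their only language, a case the criteria above do not cover.

```python
from typing import Optional

def assign_group(russian_is_l1: bool,
                 russian_only_childhood_language: bool,
                 immigration_age: Optional[int]) -> str:
    """Assign MO / LB / HS / FL following the criteria described above.

    immigration_age: age at moving to a country where Russian cannot cover
    all everyday and professional needs (0 = born there); None = never moved.
    """
    if not russian_is_l1:
        return "FL"  # simplification: Russian learned as a foreign language
    if immigration_age is None:
        # Continuously lived in a Russian-speaking environment
        return "MO" if russian_only_childhood_language else "unclassified"
    if immigration_age < 0:
        raise ValueError("immigration_age must be >= 0")
    # Cut-off from the text: 0-12 -> heritage speaker, 13+ -> late bilingual
    return "HS" if immigration_age <= 12 else "LB"

print(assign_group(True, True, 13))  # LB
```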

Foreign language learners learned Russian through formal instruction at school and/or university, and some of them had also spent a longer time in a Russian-speaking country. To account for the great heterogeneity of individual learning histories, we calculated a weighted length of learning. One point was allocated per year of learning at university and 0.5 points per year of learning at school. Longer stays in a Russian-speaking country were scored on a scale ranging from 1 point for one month to 3 points for a whole year, plus 2 points for each subsequent year.⁷
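The weighted length of learning can be sketched in Python as follows. Only the anchor points are taken from the text. The exact scale is specified in footnote 7, so three choices here are assumptions: linear interpolation between 1 and 12 months, pro-rata scoring of partial years after the first, and 0 points for stays shorter than one month.

```python
def stay_points(months: float) -> float:
    """Points for a stay in a Russian-speaking country.

    Anchors from the text: 1 point for one month, 3 points for a full year,
    2 points per subsequent year. Everything in between is an ASSUMPTION.
    """
    if months < 1:
        return 0.0                          # assumption: short stays not scored
    if months <= 12:
        return 1 + (months - 1) * 2 / 11    # 1 month -> 1.0, 12 months -> 3.0
    return 3 + 2 * (months - 12) / 12       # +2 points per additional year

def weighted_length_of_learning(university_years: float,
                                school_years: float,
                                stay_months: float = 0) -> float:
    """1 point per university year, 0.5 per school year, plus stay points."""
    return 1.0 * university_years + 0.5 * school_years + stay_points(stay_months)

print(weighted_length_of_learning(university_years=2, school_years=4))  # 4.0
```

Under these assumptions, a learner with 2 years at university and 4 years at school scores 4.0 points. A learner with only 3 school years scores 1.5 points, which falls below the 2-point minimum used for exclusion.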

Because we used two different questionnaires with complementary verbs, each verb was rated in approximately half of the questionnaires. The exact numbers per questionnaire are given in Table 3.

2.4. Data Analysis

From the SF values collected, we calculated the median of grouped data for each of the 49 verbs in each of the four groups. This measure interpolates within the rating category that contains the middle observation, so it yields values with several decimal places. It therefore provides a more finely grained SF scale than the simple median, which can only take values ending in .0 or .5. We refer to this value as the median of the subjective frequency (MSF).
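The median of grouped data treats each integer rating k as the interval from k − 0.5 to k + 0.5. It then interpolates within the interval that contains the middle observation: MSF = L + (N/2 − F)/f. Here L is the lower boundary of that interval, F the number of ratings below it, f the number of ratings in it, and the interval width is 1. The Python sketch below implements this standard formula. We assume, but cannot confirm from the text, that it matches the authors’ exact computation.

```python
def grouped_median(ratings, scale=range(1, 8)):
    """Median of grouped data for ratings on an integer scale.

    Each rating k is treated as the class [k - 0.5, k + 0.5):
    median = L + (N/2 - F) / f, with L the lower class boundary,
    F the count below the class and f the count in the class.
    Missing ratings (None) are ignored.
    """
    values = [r for r in ratings if r is not None]
    n = len(values)
    if n == 0:
        raise ValueError("no ratings")
    half = n / 2
    below = 0                          # F: ratings in classes below k
    for k in scale:
        f = values.count(k)
        if f and below + f >= half:    # k is the median class
            return (k - 0.5) + (half - below) / f
        below += f
    raise ValueError("ratings outside the scale")

ratings = [5, 6, 6, 7, 7, 6, 6]
print(grouped_median(ratings))  # 6.125 (the simple median would be 6)
```

Python’s standard library also offers `statistics.median_grouped`, which applies the same interpolation and also returns 6.125 for this example. The two can differ in edge cases where N/2 falls exactly on a class boundary.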

The ipm values of the corpus frequency were log-transformed so that distances between values better correspond to perceived frequency differences. This is a common method for compensating for biases between different levels of corpus frequency: the difference between 1 and 2 ipm is psycholinguistically much more relevant than the difference between 101 and 102 ipm. Baayen et al. (2006, p. 293) point out two relevant properties. First, reaction times in lexical decision tasks change by a constant number of milliseconds per log unit of frequency. Second, log-transformed word frequencies are approximately normally distributed.

We used the formula log10(ipm + 1), which prevents ipm values below 1 from yielding negative logarithms (Baayen et al. 2006). For the comparison of the CF and SF graphs, all values were standardized as z-scores to bring them to the same scale.
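The two transformations can be sketched in a few lines of Python. The log transform follows the formula given above. The text does not state whether the z-scores used the sample or the population standard deviation, so the sample SD used here is an assumption. The choice only rescales all z-values by a constant factor and does not change the shape of the curves.

```python
import math
import statistics

def log_cf(ipm: float) -> float:
    """log10(ipm + 1): 0 ipm maps to 0, so values below 1 ipm stay non-negative."""
    return math.log10(ipm + 1)

def z_scores(values):
    """Standardize to mean 0 and SD 1 (sample SD: an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Equal raw differences carry very different weight after the transform
print(round(log_cf(2) - log_cf(1), 3))      # 0.176
print(round(log_cf(102) - log_cf(101), 4))  # 0.0042

# ipm values mentioned in the text: iznyt', pereborshchit', polozhit', svjazat', vzjat'
cf = [0.5, 1.6, 158.1, 160.2, 525.8]
print([round(z, 2) for z in z_scores([log_cf(x) for x in cf])])
```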

The relationship between CF and the MSF scale, as well as between the MSF scales of the four groups of participants, was analyzed using Pearson’s correlation.⁸ Previous studies on the correlation of SF and CF generally found correlations of r > 0.5 for monolingual speakers, with most results considerably higher (e.g., Balota et al. 2001 for English: r > 0.8; Brzoza 2018 for English and Polish: r > 0.7; Miklashevsky 2018 for Russian: r > 0.6; Shapiro 1969 for English: r > 0.9). Very high correlations between SF and CF were also observed for groups other than monolingual speakers (Chen and Dong 2019 for learners of English: r > 0.8). Even higher values were found for the correlations between the SF scales of different groups of speakers (Shapiro 1969 for three age groups: r > 0.9; Balota et al. 2001 for two age groups: r > 0.9). Because relatively high correlations seem to be the norm, we adopted the threshold values shown in Table 4 (cf. Hinkle et al. 2003) for interpreting correlation strength. These thresholds allow a good differentiation within the high range.
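A minimal Python sketch of the correlation step and of a banded interpretation of r follows. The band limits are an assumption. They reproduce the widely used rule of thumb from Hinkle et al. (2003) and match the labels used in Section 3 (e.g., 0.624 as moderate and 0.773 as high), but they should be checked against Table 4 before reuse. Significance testing is omitted here; a library routine such as `scipy.stats.pearsonr`, which returns both r and a p-value, could be used for that step.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# ASSUMED bands (rule of thumb after Hinkle et al. 2003); check Table 4.
BANDS = [(0.9, "very high"), (0.7, "high"), (0.5, "moderate"),
         (0.3, "low"), (0.0, "negligible")]

def interpret_r(r: float) -> str:
    """Map |r| to a verbal label using the lower band limits above."""
    for lower, label in BANDS:
        if abs(r) >= lower:
            return label
    return "negligible"  # reached only if r is NaN

for r in (0.624, 0.773, 0.604):
    print(r, interpret_r(r))
```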

To analyze the dispersion of the SF ratings, we calculated the interquartile range (IQR). To assess the IQR, we used the rating scale according to Frumkina and Vasilevich (1971) and Krause (2002) (Table 5).
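The IQR of one verb’s ratings can be computed with Python’s standard `statistics.quantiles`. The text does not state which quartile definition the authors used, so the `inclusive` interpolation method below is an assumption. The sketch does not implement the rating bands of Table 5.

```python
import statistics

def iqr(ratings):
    """Interquartile range Q3 - Q1 of one verb's ratings (None = missing).

    The quartile method is an ASSUMPTION ('inclusive' linear interpolation);
    other definitions can give slightly different values for small samples.
    """
    values = [r for r in ratings if r is not None]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

print(iqr([2, 3, 3, 4, 4, 5, 6]))  # 1.5
```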

3. Results

3.1. Corpus Frequency (CF) and Subjective Frequency (SF) Estimation

In this section, we address our first research question and present the results regarding the correlation between the corpus frequency (CF) and subjective frequency (SF) of the four groups of Russian speakers. As postulated in Section 1.5, we expected the SF of the Russian monolinguals to be moderately or highly correlated with the CF. To test Hypothesis 1.1, we correlated the median of the subjective frequency (MSF) with CF values as extracted from the NChS (logarithmized to base 10).

As shown in Table 6, the CF of all verbs and the MSF rated by monolingual Russian speakers are indeed moderately correlated, as indicated by the highly significant correlation coefficient of r = 0.624 (p < 0.01).

According to Hypothesis 1.2, the CF/SF correlation should exist but be weaker in the three bilingual groups. More precisely, the less experience participants have with Russian, the weaker the correlation should be. However, as indicated in Table 6, the correlation coefficients for CF and SF in late bilinguals (LBs), heritage speakers (HSs), and foreign language learners (FLs) are at a similar level. They amount to r = 0.699 for late bilinguals, r = 0.639 for heritage speakers, and r = 0.628 for foreign language learners, and each correlation is highly significant (p < 0.01).

Figure 1 illustrates this finding. The 49 analyzed verbs are sorted by their (logarithmized and z-transformed) CF, which is represented by the black solid line. MSF measures are shown as colored lines, and their jagged course indicates that their correlation with the CF is not perfect. However, especially at the end points of the scale, MSF measures mostly move in the same direction as the CF data. Verbs that rarely occur in the corpus, located on the left side of the graph, are also rated as very rarely encountered in everyday Russian, for example, iznyt’ ‘(to) languish’ and lakomit’sja ‘(to) nibble’. Conversely, verbs that occur very often in the corpus, located on the right side of the graph, are also judged to be encountered extremely often, for example, uspet’ ‘(to) be in time’ and vzjat’ ‘(to) take’. In between, however, the MSF estimates do not follow the exact frequency rank order suggested by the corpus.

In some cases, one may speculate that differences in language use between written and spoken Russian cause the observed deviations of the MSF from the CF, as with the verb pereborshchit’ ‘(to) overdo’. Pereborshchit’ occurs very rarely in the corpus (1.6 ipm), presumably because it is typical of colloquial Russian, which is underrepresented in the Russian National Corpus. However, we can assume that participants draw heavily on their experience with colloquial language when evaluating how often they encounter verbs in everyday language. As a result, the MSF measures for pereborshchit’ are higher than its CF would predict. Further research is needed to investigate the influence of spoken and written language in more detail.

According to Hypothesis 1.3, we expected the correlation between CF and SF to be weaker among low-frequency verbs than among high-frequency verbs, and this difference to be more pronounced in the bilingual groups. To test this hypothesis, we divided our verbal stimuli into two groups: low-frequency verbs with fewer than 30 ipm (n = 20) and high-frequency verbs with 30 ipm or more (n = 29). This cut-off is close to the one proposed by Brysbaert and New (2009).⁹

The second row in Table 6 (see above) gives the coefficients of the correlation between the CF and SF in the low-frequency range. Contrary to our expectations, the correlations are not weaker but even higher than for the whole range of stimuli: the correlations for the three L1 groups are high, at r = 0.773 for the monolinguals, r = 0.788 for the late bilinguals, and r = 0.798 for the heritage speakers, whereas the MSF of the foreign language learners correlates only moderately, at r = 0.604.

Finally, and again contrary to our expectations, the correlation coefficients decrease notably when the MSF and CF of the 29 more frequent verbs are correlated, as indicated in the third row of Table 6; again, this holds for all four groups. These results are consistent with the extremely jagged lines in the right part of Figure 1. Consider, for example, the two verbs polozhit’ ‘(to) put, place’, with a CF of 158.1 ipm, and svjazat’ ‘(to) connect, link, associate’, with 160.2 ipm. Their CF values are very close to each other, and both belong to the 150 most frequent Russian verbs (ranks 144 and 141, respectively). For polozhit’, the MSF estimates reflect this high frequency as well: MSF = 6.29 (MO), 6.55 (LB), 6.35 (HS), and 5.55 (FL). For svjazat’, however, speakers in all four groups estimate that they encounter the verb in everyday Russian much less often than its CF would suggest: MSF = 4.35 (MO), 5.00 (LB), 4.55 (HS), and 3.14 (FL). This discrepancy between the CF and SF might result from a characteristic grammatical feature of svjazat’: a substantial part of its occurrences in the Russian National Corpus (RNC) is in the form of the past participle svjazannyj10, which is typical of written texts, and participants may not take such forms into account when estimating the frequency of the verb in everyday Russian. For polozhit’, participles make up a much smaller share of the RNC occurrences, so the verb’s representation in the corpus aligns with the participants’ experience of both written and colloquial language.

3.2. Correlation of the SF Ratings of the Four Groups of Speakers

This section explores our second research question: what does the relationship between the subjective frequency (SF) judgments of the four speaker groups look like? Accordingly, the following analyses consider only the SF data; corpus frequency no longer plays a role.

Our first assumption in this regard (Hypothesis 2.1) concerns the correlation of the MSF of the four groups. We expected the correlation to be stronger between groups with higher exposure, and thus strongest between the monolinguals (MOs) and the late bilinguals (LBs), weakest between the MOs and FLs, and intermediate for the HSs. This is exactly what we found when we calculated the Pearson correlation of the MSF (cf. Table 7): the correlation values mirror the expected relationships. The difference is particularly evident in the correlations between the monolingual group and the three bilingual groups (see Table 7, first row). For these, we compared the correlation coefficients of the monolingual MSF with the MSF of the three bilingual groups using Fisher’s r-to-z transformation and found each coefficient to differ significantly from the others: MO/LB (0.956) vs. MO/HS (0.901): z = 2.01 (p = 0.044); MO/LB (0.956) vs. MO/FL (0.697): z = 4.96 (p < 0.001); MO/HS (0.901) vs. MO/FL (0.697): z = 2.95 (p < 0.005).
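To retrace this comparison, the sketch below applies the textbook Fisher r-to-z test for two correlations from independent samples, assuming n = 49 verbs per correlation. This is our reconstruction, not the authors' code; with these inputs it yields z values of about 2.01, 4.97, and 2.95, matching the reported values to within rounding. Note that all three correlations involve the MO group and are therefore not strictly independent; a test for dependent (overlapping) correlations would be the stricter alternative.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation: z' = artanh(r)."""
    return math.atanh(r)


def compare_correlations(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two correlations,
    treating them as coming from independent samples."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed p from the standard normal
    return z, p


N_VERBS = 49  # each correlation is computed over the 49 verbs
pairs = {
    "MO/LB vs MO/HS": (0.956, 0.901),
    "MO/LB vs MO/FL": (0.956, 0.697),
    "MO/HS vs MO/FL": (0.901, 0.697),
}
for label, (r1, r2) in pairs.items():
    z, p = compare_correlations(r1, N_VERBS, r2, N_VERBS)
    print(f"{label}: z = {z:.2f}, p = {p:.4f}")
```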

For a closer inspection of the results, it is useful to look at a graphical representation of the distribution of the MSF of the four groups. Figure 2 presents the MSF scores of the four groups sorted by the MSF of the monolingual group (blue line). Here, the relationships between the curves of the four groups, as shown in the correlation values, can be observed very clearly: the frequency judgments of the LB group (green line) are very close to those of the MO group, while the HS group (red line) deviates somewhat more. For most verbs, the HS group gives lower frequency judgments than the MOs and LBs. The FL group (yellow line), on the other hand, deviates strongly from the other three groups, and with only a few exceptions, its MSF is consistently much lower. In only two cases are the judgments of all four groups the same: the least frequent verb iznyt’ ‘to languish’, judged by all groups with an MSF of about 1.6, and the high-frequency verb exat’ ‘to drive’, judged by all groups with an MSF of about 6.6. Obnaruzhit’ ‘to discover’ represents the typical differences between the four groups of speakers: the MSF in both the monolingual and the late bilingual group is “5” (“I encounter this verb more or less frequently”), while in the heritage speaker group, the MSF is 4.4 and thus closer to “neither frequently nor rarely”. Finally, in the learner group, the MSF is 1.7, even slightly below “I encounter this verb very rarely”.

So far, we have only looked at the median of SF judgments while ignoring dispersion. This is the subject of our next hypothesis: we assumed that the lower the exposure to Russian, the more heterogeneous the SF judgments would be within the groups (Hypothesis 2.2). To investigate this, we determined the interquartile range (IQR) for every verb in each group and assigned it to one of six levels of agreement, from 1 (very high agreement) to 6 (bimodal distribution) (cf. Table 5 above). In the MO group, half of the verbs were judged with very high or high agreement, while almost all of the other half were assessed with moderate agreement (cf. Figure 3). Only 10% of the verbs show a low level of agreement, and no verb shows very low agreement or a bimodal distribution. The distribution in the LB and HS groups is similar to that of the MOs, but the proportion of verbs with good agreement decreases, and the proportion with moderate and low agreement increases. In the HS group, as many as 50% of all verbs show low agreement. The FL group differs strongly from the other three. First, the proportion of verbs with very high to moderate agreement combined is less than 40%. Second, this is the only group with verbs judged with very low agreement or even a bimodal distribution. Therefore, Hypothesis 2.2 is fully confirmed. Another interesting detail is the proportion of verbs with very high agreement: at 16%, it is higher in the FL group than in the other groups, which can be explained by the high number of verbs that are unknown to the FL group.
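Both the MSF and the IQR can be obtained from the raw 1–7 ratings by linear interpolation within category intervals, the standard formula for percentiles of grouped data. The sketch below is our assumption of how such values could be computed: it treats each Likert value k as the interval [k − 0.5, k + 0.5] of width 1, which is why non-integer values such as an MSF of 4.4 or an IQR below 1.1 can arise from integer ratings. The function names and the example ratings are hypothetical; the mapping of IQR values onto the six agreement levels follows Table 5, which is not reproduced here.

```python
from collections import Counter


def grouped_percentile(ratings, p, width=1.0):
    """Interpolated percentile for integer Likert ratings (grouped data).

    Each rating k is treated as the class interval [k - width/2, k + width/2];
    the value is found by linear interpolation inside the class that
    contains the p-th fraction of the N ratings.
    """
    if not ratings or not 0.0 < p < 1.0:
        raise ValueError("need non-empty ratings and 0 < p < 1")
    counts = Counter(ratings)
    target = p * len(ratings)
    cum_below = 0
    for k in sorted(counts):
        f = counts[k]
        if cum_below + f >= target:
            lower = k - width / 2
            return lower + (target - cum_below) / f * width
        cum_below += f
    raise RuntimeError("unreachable: the cumulative count always reaches N")


def grouped_median(ratings):
    return grouped_percentile(ratings, 0.5)


def grouped_iqr(ratings):
    return grouped_percentile(ratings, 0.75) - grouped_percentile(ratings, 0.25)


# Hypothetical ratings of one verb by ten participants (not study data).
ratings = [5, 5, 5, 4, 6, 5, 4, 5, 6, 3]
print(round(grouped_median(ratings), 2))  # 4.9
print(round(grouped_iqr(ratings), 2))     # 1.15
```

One property of this interpolation is worth keeping in mind: even unanimous ratings yield an IQR of 0.5 rather than 0, because the quartiles are spread across the width of the single occupied category.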

In our last hypothesis (Hypothesis 2.3), we assumed that the heterogeneity in the groups with less exposure to Russian should be particularly large for the rarer verbs. To analyze this question, we split the verbs into low-frequency and high-frequency parts based on the CF, using the same procedure as in Section 3.1 (see Note 11). However, the results for the low-frequency verbs (Figure 4a) and for the high-frequency verbs (Figure 4b) are very similar to those for all 49 verbs together (Figure 3 above). In the MO and LB groups, there is slightly more heterogeneity for the low-frequency verbs than for the high-frequency verbs. Surprisingly, in the HS and FL groups, unlike in the MO and LB groups, agreement is even higher for the low-frequency verbs than for the high-frequency verbs: of the low-frequency verbs, 60% (HSs) and 50% (FLs) were rated with an agreement level of 1–3, a larger portion than for all verbs combined. Moreover, the FL group rated 35% of the low-frequency verbs with very high agreement. This can be explained by the high number of verbs unknown to the members of this group, which are consequently rated “1” (“I never encounter this verb”). On the other hand, heterogeneity in the FL group is especially high for the high-frequency verbs, with 72% of these verbs at agreement levels 4–6. Therefore, Hypothesis 2.3 was only partially confirmed.

An example of a high-frequency verb is zajavit’ ‘to declare, to claim’12 (Figure 5a). The MO and LB groups mostly judge this verb with 5 or 6, with both groups displaying high agreement. The HSs, in contrast, show large variance, with 80% of the judgments between 3 and 6, while 10% claim never to encounter this verb. The learner group, in turn, shows a bimodal distribution with two peaks at 1–2 and 4–5. Opustoshit’ ‘to devastate’ (Figure 5b) is an example of a low-frequency verb with a typical distribution of SF judgments: more than half of the FL group and a third of the HS group claim never to encounter it. The MO and LB groups as well as small parts of the HS and FL groups, however, assign it a medium frequency of about 3–5.

4. Discussion and Conclusions

We have presented the results of our subjective frequency investigation with four groups of Russian speakers: three different groups of L1 speakers (monolinguals, late bilinguals, and heritage speakers) and foreign language learners of Russian. Our participants were presented with a list of 49 Russian verbs and asked to estimate, on a 7-point Likert scale, how often they encounter these words in everyday life when using Russian. For each verb and group of Russian speakers, we calculated the median of grouped data and correlated this median of subjective frequency (MSF) with corpus frequency (CF) data obtained from the Russian National Corpus (RNC), as well as across the four groups.

We found the CF and MSF of Russian monolinguals to be moderately correlated. This finding confirms Hypothesis 1.1, which was formulated on the basis of a large number of studies addressing the relationship between CF and MSF across numerous languages and word classes. Our results for monolinguals are thus in line with previous research findings, although the correlation coefficient lies at the lower end of the reported range, which can be explained by the fact that spoken language is underrepresented in the RNC.

Because the three bilingual groups have less exposure to Russian, we expected the correlation coefficients between their MSF and CF to be weaker than for the monolingual group (Hypothesis 1.2). This expectation was based on the “weaker links” or “frequency lag” hypothesis (Gollan et al. 2005, 2008). However, this expectation was not met. Evidently, reduced language exposure does not automatically result in divergent frequency estimation, at least not concerning the basic order of given words by their frequency. This is plausible in a way because, regardless of the amount of an individual’s language contact, the proportion of occurrences of each verb in the total input remains approximately the same. This is true for late bilinguals and, to a lesser extent, for heritage speakers who acquired Russian in an environment similar to that of monolinguals and currently use it, at least in part, for the same purposes. We can assume that the communication situations faced by monolinguals and bilingual L1 speakers are, on average, similar in type and relative frequency. L1 speakers of the different groups thus encounter Russian words in roughly equal proportions, even though, in absolute terms, monolinguals encounter each word more often than bilinguals. Heritage speakers might, however, differ in this respect when it comes to specialized vocabulary used in professional situations, which are typically experienced in the majority language.

Unlike L1 speakers, learners of Russian as a foreign language encounter Russian words at systematically deviating frequencies. This is because in guided language acquisition, words are introduced to the learner in a didactically motivated order. Furthermore, over the course of learning, individual words appear disproportionately often because they are, for example, particularly suitable for explaining or practicing certain grammatical phenomena. Nonetheless, even the MSF of the foreign language learners is moderately correlated with the CF, indicating that frequency information is learned notwithstanding this limitation. Note that we included only advanced learners in our experiment, many of whom had already spent some time in a Russian-speaking environment. However, the heterogeneity of the group makes a detailed analysis of the influencing factors impossible.

Regarding the verbs’ frequency range, we expected verbs with a lower CF to correlate more weakly with the MSF than those with a higher CF (Hypothesis 1.3), since whether a very rare and possibly specialized word occurs in the corpus at all depends largely on the coincidental selection of texts during corpus building. Moreover, because this effect has been shown to be even stronger in the MSF of linguistically less skilled participants (e.g., learners or unskilled readers), we expected the difference between the correlation coefficients of low- and high-frequency verbs to be greater in the three bilingual groups. On the contrary, however, the correlation coefficients for our 20 low-frequency verbs were significantly higher than for the 29 high-frequency verbs, and this was true for all four groups.

The moderate correlation coefficients observed for the whole range of verbs in all four groups must be attributed predominantly to the good agreement between the CF and MSF for verbs at the extreme ends of the scale, especially at the lower end. In contrast, the order of the verbs in the middle range hardly agrees between the CF and MSF. This interesting result suggests that medium-frequency verbs differ systematically in their frequency of occurrence in written and spoken language. This assumption is further supported by the fact that the MSF data of the four groups are highly correlated with each other, making it unlikely that the effect is accidental. Finally, there are two possible reasons for the higher correlation for low-frequency than for high-frequency verbs. First, there are more good-agreement verbs at the low-frequency end of the scale. Second, the smaller size of the low-frequency group (20 verbs, in contrast to 29 high-frequency verbs) means that the good-agreement verbs from the extreme end of the scale make up a larger proportion of the group.
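The argument that a moderate overall correlation can be carried mainly by the extremes of the scale can be illustrated with a toy example. The numbers below are invented for illustration and are not the study's data: ten hypothetical items whose subjective ranks agree with the corpus ranks at both ends but are shuffled in the middle.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: corpus rank vs. subjective rank for ten hypothetical verbs.
corpus = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
subjective = [1, 2, 6, 3, 8, 4, 7, 5, 9, 10]  # ends agree, middle shuffled

r_all = pearson(corpus, subjective)
r_middle = pearson(corpus[2:8], subjective[2:8])  # drop two items at each end
print(round(r_all, 3), round(r_middle, 3))  # 0.806 0.086
```

Removing the two well-aligned items at each end leaves almost no linear relation among the remaining six, although nothing about those middle items has changed. This is the kind of pattern the paragraph above describes: agreement at the extremes lifts the overall coefficient while the middle range contributes little.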

The subject of our second research question was the relationship between the SF estimates of the four speaker groups. For this purpose, we determined the correlation of the MSF of the four groups. We found that the MSFs of the three L1 groups (monolinguals, late bilinguals, and heritage speakers) were highly correlated. The correlation strengths directly reflect the amount of exposure to Russian: the greater the difference in language exposure between two groups, the lower the correlation between their MSFs. Accordingly, the correlation is highest between monolinguals and late bilinguals and lowest between monolinguals and foreign language learners. This corresponds exactly to our expectations (Hypothesis 2.1). Mandera (2016) found that the word frequency effect depends on individual language exposure, which points to differences in the frequency information stored in the bilingual mental lexicon. It is precisely these differences that our study confirms. The weaker links hypothesis or frequency lag hypothesis (Gollan et al. 2005, 2008; Emmorey et al. 2012) may offer an explanation.

In the second step, we examined the dispersion of the frequency estimates within the four speaker groups by means of the interquartile range. We assumed that the lower a group’s exposure to Russian, the more heterogeneous its judgments would be (Hypothesis 2.2). This hypothesis was also fully confirmed: the dispersion in the frequency judgments was lowest among the monolinguals, slightly higher among the late bilinguals, and much higher among the heritage speakers. The likely explanation is that, with less language exposure, speakers differ more in which parts of the language they come into contact with. The foreign language learners behave quite differently, with a large dispersion on the one hand but a higher number of verbs with very high agreement on the other.

As a final sub-hypothesis (Hypothesis 2.3), we assumed that the heterogeneity in the groups with lower exposure to Russian should be particularly strong for the rare verbs. This hypothesis was not confirmed: among the heritage speakers, on the contrary, agreement was slightly higher for the low-frequency verbs than for the high-frequency verbs, and in the learner group, it was even considerably higher. This can be explained by the fact that the rarer verbs are unknown to many speakers, especially in the learner group, who therefore rate these verbs as “I never encounter this verb”. At the same time, this result is consistent with the typical patterns for high-frequency and low-frequency words reported by Mandera (2016), who found that in speaker groups with high language exposure, the word frequency effect was concentrated primarily on low-frequency words, whereas in groups with less language exposure, it was concentrated on high-frequency words. We therefore suggest a common explanation for the behavior observed by Mandera and in our investigation. Speakers with a lot of language experience differ more from each other in the low-frequency range because rare words are also generally more heterogeneously distributed in the language, while speakers with less language exposure resemble each other here because they are all unfamiliar with rare words. Conversely, speakers with high language exposure are similar in the high-frequency range because the same words occur frequently for all of them, whereas speakers with less language experience differ more because each of them deals with a smaller section of the language.

Overall, the assessment of SF provides a picture in line with the literature on other aspects of word frequency as part of the mental lexicon and is consistent with the expectations arising from the language exposure of the speaker groups. The differences in exposure are not reflected in the correlation with the corpus frequency, which is the same for all groups and surprisingly low. In contrast, exposure is very clearly reflected in the correlations between the SF values of the groups: the lower a group’s exposure, the lower its correlation with the high-exposure groups. This also allows conclusions to be drawn for the measurement of frequency as a background variable: especially in studies with bilinguals, corpus frequency should not be used unless a more suitable corpus than the RNC is available. Rather, the subjective frequency of the bilingual target group should be used; but even the subjective frequency of monolinguals is more suitable than corpus frequency. Regarding heritage speakers, although their lower exposure is clearly reflected in the results, their frequency estimation behavior is surprisingly similar to that of other L1 speakers (i.e., monolinguals and late bilinguals). That this closeness is not a methodological artifact is shown by the clearly distinct frequency estimates of the foreign language learners, the only group that substantially stands out from the others. The central factor here appears to be the acquisition process, because the greatest differences exist between the three L1 groups on the one hand and the foreign language learners on the other.

A methodological problem lies in the presentation of the stimuli as infinitives and in the instruction, which leaves open whether the infinitives should be interpreted and judged as lemmas or as infinitive tokens. However, in line with almost all other studies on subjective frequency estimation, we assume that the participants judged the frequency of lemmas, i.e., aggregated over all inflected forms. This is supported by the evaluation of high-frequency verbs such as uspet’ ‘to be in time’, exat’ ‘to drive’, vzjat’ ‘to take’, polozhit’ ‘to put’, and izmenit’ ‘to change’, each with an MSF above 6 and relatively good agreement between the respondents (IQR < 1.1). The frequency analysis by word form in the Russian National Corpus shows that the proportion of the infinitive in the total number of tokens of a lemma varies greatly: it ranges from 37.76% (izmenit’) through 32.52% (exat’), 19.97% (vzjat’), and 12.13% (polozhit’) to only 4.96% (uspet’). For the last three verbs, the most frequent form is the masculine singular preterite (vzjal, polozhil, uspel). If the majority of our participants had judged the frequency of the infinitive form, one would have expected larger differences between the MSF of these five verbs. However, individual subjects may have reflected on the type-token question while completing the questionnaire and consciously decided to judge infinitive tokens. A less ambiguous prompt (i.e., “how often do you encounter this particular form”, referring specifically to the infinitive and not its inflected forms) would probably have made the results more precise in this respect. However, considering the infinitive exclusively would have changed the whole study, resulting in reduced comparability with previous research and very limited significance of the results. Nevertheless, the issue of how different frequencies of inflected forms are reflected in subjective frequency estimation should be addressed in future research.
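To give a sense of scale, the infinitive shares reported above can be turned into a simple ratio. This is a back-of-the-envelope sketch under a loudly stated assumption, namely that the five lemmas have roughly comparable overall frequencies; the RNC lemma counts are not given here, so the ratio only indicates how far apart infinitive frequencies could be if participants had judged infinitive tokens. The variable names are hypothetical.

```python
import math

# Share of the infinitive among all tokens of the lemma (RNC), from the text.
infinitive_share = {
    "izmenit'": 0.3776,
    "exat'": 0.3252,
    "vzjat'": 0.1997,
    "polozhit'": 0.1213,
    "uspet'": 0.0496,
}

hi_verb = max(infinitive_share, key=infinitive_share.get)
lo_verb = min(infinitive_share, key=infinitive_share.get)
ratio = infinitive_share[hi_verb] / infinitive_share[lo_verb]

# ASSUMPTION: equal lemma frequencies, so infinitive frequency scales with the share.
print(f"{hi_verb} vs {lo_verb}: ratio = {ratio:.1f}")
print(f"difference on a log10 scale: {math.log10(ratio):.2f}")
```

Under that assumption, the infinitive of izmenit’ would be about 7.6 times as frequent as that of uspet’, almost an order of magnitude, which makes the similar MSF values of these verbs hard to reconcile with token-based judgments.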

In summary, we have shown that Russian behaves analogously to other languages regarding subjective frequency estimates. Furthermore, the subjective frequency survey proved to be not only a method for collecting frequency data but also a research tool in its own right, providing insights into the bilingual mental lexicon.

Author Contributions

Both authors were equally involved in all parts of this investigation. All authors have read and agreed to the published version of the manuscript.

Funding

This study received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study because in our view subjective frequency estimation does not pose a risk of unsettling or negatively influencing participants in any way. In the questionnaire, the participants provided basic information about their linguistic biography (e.g., country of birth, age of emigration, time and duration of Russian lessons), which was also considered harmless by the authors. Also, some of our data was collected in 2011, when ethical review was not yet established in empirical linguistics.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original data are available from https://slavdok.slavistik-portal.de/receive/slavdok_mods_00000353, accessed on 27 June 2024.

Acknowledgments

We would like to thank Anke Luislampe and Moritz Dettbarn (Bochum) for their support in collecting and processing the data, and Johannes Herrmann (Giessen) for his statistical advice. We presented the results at the 48th meeting of the “Konstanzer Slavistischer Arbeitskreis” in Greifswald in 2023—many thanks to the members for their constructive remarks. Finally, we would like to thank the four anonymous reviewers for their thorough and valuable comments.

Conflicts of Interest

The authors declare no conflicts of interest.

Notes

1	The list “Russian normative data for 375 action pictures and verbs” (Akinina et al. 2015) contains the parameter “action familiarity”, which refers to the degree to which the depicted action is familiar, not to word familiarity. However, word familiarity is sometimes used as a synonym for subjective word frequency.
2	The primary verb in an aspectual pair is the more basic one as opposed to the semantically and morphologically derived secondary verb (Lehmann 2009). In telic pairs, the perfective verb is the primary verb. For example, in the pair pozdravit’ (perfective)—pozdravljat’ (imperfective) ‘to congratulate’, perfective pozdravit’ is the primary verb. In atelic pairs, the imperfective verb is the primary verb: from the pair vejat’ (imperfective)—povejat’ (perfective) ‘to blow (wind)’, imperfective vejat’ is the primary verb.
3	The Russian National Corpus and the frequency lists were revised after our selection process had taken place. The proportion of spoken language is now slightly higher, but still not an adequate reflection of the speakers’ average language use.
4	Sketch Engine offers a Russian subtitle corpus (https://www.sketchengine.eu/corpora-and-languages/corpus-list/, accessed on 27 June 2024) containing frequency information on lemmas and word forms; however, word lists are limited to 1000 items (longer lists are available as a paid service).
5	Each aspect pair was represented only once per questionnaire. For example, questionnaire 1 contained napravit’ (‘to send’, perfective) and pozdravljat’ (‘to congratulate’, imperfective), while questionnaire 2 contained napravljat’ (‘to send’, imperfective) and pozdravit’ (‘to congratulate’, perfective).
6	Balota et al. (2001) compared the subjective frequency data collected in a paper-and-pencil survey with 547 university students to those from an online survey with 1590 participants and found a correlation of r = 0.95 between the two groups, so we consider the different forms of presentation to be negligible.
7	Of the 67 foreign language learners of Russian, 25 had learned another Slavic language for an average of 1.42 years (SD 0.89). However, the possible influences of cognates were not analyzed further.
8	In principle, Likert scales should be regarded as ordinal scales, which is why we calculated the median instead of the mean. The median of grouped data, however, yields a finely graded value for each verb, which has the characteristics of an interval scale, so we used the Pearson correlation for the further correlation analysis. Both Spearman and Pearson correlation analyses can be found in the literature. The Pearson correlation is used by Chen and Dong (2019), Gernsbacher (1984), Shatzman and Schiller (2004), and Sherkina-Lieber (2004, 2008), whereas the Spearman correlation is applied by Alderson (2007), Brzoza (2018), and Carroll (1971). Some papers, such as Balota et al. (2001), Brysbaert and Cortese (2011), and Miklashevsky (2018), do not specify which correlation technique was used. Pearson correlation coefficients are typically higher than Spearman coefficients.
9	Brysbaert and New (2009, p. 980) set thresholds of >20 ipm for high-frequency and <10 ipm for low-frequency words but did not provide any explanation for this division. For our sample of verbs, the Brysbaert and New approach would have resulted in highly imbalanced proportions. Therefore, we chose to set the boundary at 30 ipm.
10	A check in the RNC 2024 revealed that the past participle svjazannyj occurs 10 times more frequently than all other forms of the verb.
11	We retain the classification based on corpus frequency, even though it is not the best representation of frequency, because the question of what counts as a subjectively low- or high-frequency verb has to be answered differently for each group, and using the monolinguals’ scale would be circular.
12	Interestingly, this verb ranks 27th among our 49 verbs according to the MSF of the monolinguals but 9th according to the corpus frequency. However, it is still in the upper frequency range we have set.

References

Ahn, Sunyoung, Charles B. Chang, Robert DeKeyser, and Sunyoung Lee-Ellis. 2017. Age effects in first language attrition: Speech perception by Korean-English bilinguals. Language Learning 67: 694–733.
Akinina, Yulia, Svetlana Malyutina, Maria Ivanova, Ekaterina Iskra, Elena Mannova, and Olga Dragoy. 2015. Russian normative data for 375 action pictures and verbs. Behavior Research Methods 47: 691–707.
Alderson, J. Charles. 2007. Judging the Frequency of English Words. Applied Linguistics 28: 383–409.
Anstatt, Tanja. 2016. Subjektive Frequenz als Forschungsmethode. Wiener Slawistischer Almanach 77: 7–35.
Anstatt, Tanja. 2017. Language attitudes and linguistic skills in young heritage speakers of Russian in German. In Integration, Identity and Language Maintenance in Young Immigrants: Russian Germans or German Russians. Edited by Ludmila Isurin and Claudia Maria Riehl. Amsterdam: John Benjamins, pp. 197–224.
Anstatt, Tanja, and Christina Clasmeier. 2012. Wie häufig ist poplakat’? Subjektive Frequenz und russischer Verbalaspekt. Wiener Slawistischer Almanach 70: 129–63.
Baayen, R. Harald. 2010. Demythologizing the word frequency effect: A discriminative learning perspective. The Mental Lexicon 5: 436–61.
Baayen, R. Harald, Laurie Beth Feldman, and Robert Schreuder. 2006. Morphological influences on the recognition of monosyllabic monomorphemic words. Journal of Memory and Language 55: 290–313.
Balota, David A., Maura Pilotti, and Michael J. Cortese. 2001. Subjective frequency estimates for 2,938 monosyllabic words. Memory & Cognition 29: 639–47.
Balota, David A., Michael J. Cortese, Susan Sergent-Marshall, Daniel Spieler, and Melvin Yap. 2004. Visual word recognition for single-syllable words. Journal of Experimental Psychology: General 133: 283–316.
Brysbaert, Marc, and Boris New. 2009. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41: 977–90.
Brysbaert, Marc, and Michael J. Cortese. 2011. Do the effects of subjective frequency and age of acquisition survive better word frequency norms? The Quarterly Journal of Experimental Psychology 64: 545–59.
Brysbaert, Marc, Matthias Buchmeier, Markus Conrad, Arthur M. Jacobs, Jens Bölte, and Andrea Böhl. 2011. The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology 58: 412–24.
Brysbaert, Marc, Paweł Mandera, and Emmanuel Keuleers. 2018. The word frequency effect in word processing: An updated review. Current Directions in Psychological Science 27: 45–50.
Brzoza, Bartosz. 2018. Word frequency counts: Linking corpus data to user’s perception in linguistic research. Lingvisticæ Investigationes 41: 224–39.
Carroll, John B. 1971. Measurement properties of subjective magnitude estimates of word frequency. Journal of Verbal Learning and Verbal Behavior 10: 722–29.
Chen, Xiaocong, and Yanping Dong. 2019. Evaluating objective and subjective frequency measures in L2 lexical processing. Lingua 230: 102738.
Cop, Uschi, Emmanuel Keuleers, Denis Drieghe, and Wouter Duyck. 2015. Frequency effects in monolingual and bilingual natural reading. Psychonomic Bulletin & Review 22: 1216–34.
DeLuca, Vincent, Jason Rothman, Ellen Bialystok, and Christos Pliatsikas. 2019. Redefining bilingualism as a spectrum of experiences that differentially affects brain structure and function. Proceedings of the National Academy of Sciences of the United States of America 116: 7565–74.
Desrochers, Alain, and Glenn L. Thompson. 2009. Subjective frequency and imageability ratings for 3,600 French nouns. Behavior Research Methods 41: 546–57.
Ellis, Nick C. 2002. Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition 24: 143–88.
Ellis, Nick C. 2012. What can we count in language, and what counts in language acquisition, cognition, and use? In Frequency Effects in Language Learning and Processing. Edited by Stefan Th. Gries and Dagmar Divjak. Berlin and Boston: De Gruyter Mouton, vol. 1, pp. 7–33.
Emmorey, Karen, Jennifer A. F. Petrich, and Tamar H. Gollan. 2012. Bimodal Bilingualism and the Frequency-Lag Hypothesis. Journal of Deaf Studies and Deaf Education 18: 1–11. [Google Scholar] [CrossRef] [PubMed]
Frumkina, Revekka M. 1966. Ob”ektivnye i sub”ektivnye ocenki verojatnostej slov. Voprosy jazykoznanija 2: 90–96.
Frumkina, Revekka M., and A. P. Vasilevich. 1971. Poluchenie ocenok verojatnostej slov psixometricheskimi metodami. In Verojatnostnoe prognozirovanie v rechi. Edited by Revekka M. Frumkina. Moskva, pp. 7–28.
Gernsbacher, Morton Ann. 1984. Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General 113: 256–81.
Gollan, Tamar H., Marina P. Bonanni, and Rosa I. Montoya. 2005. Proper names get stuck on bilingual and monolingual speakers’ tip-of-the-tongue equally often. Neuropsychology 19: 278–87.
Gollan, Tamar H., Rosa I. Montoya, Cynthia Cera, and Tiffany C. Sandoval. 2008. More use almost always means a smaller frequency effect: Aging, bilingualism, and the weaker links hypothesis. Journal of Memory and Language 58: 787–814.
Hinkle, Dennis E., William Wiersma, and Stephen G. Jurs. 2003. Applied Statistics for the Behavioral Sciences, 5th ed. Boston: Houghton Mifflin.
Imai, Satomi, Amanda C. Walley, and James E. Flege. 2005. Lexical frequency and neighborhood density effects on the recognition of native and Spanish-accented words by native English and Spanish listeners. The Journal of the Acoustical Society of America 117: 896–907.
Köpke, Barbara, and Monika S. Schmid. 2004. Language attrition. The next phase. In First Language Attrition: Interdisciplinary Perspectives on Methodological Issues. Edited by Monika S. Schmid and Barbara Köpke. Amsterdam and Philadelphia: John Benjamins Publishing, pp. 1–43.
Krause, Marion. 2002. Subjektive Bewertung von Vorkommenshäufigkeiten: Methode und Ergebnisse. Glottometrics 2: 53–81.
Kuperman, Victor, and Julie A. Van Dyke. 2013. Reassessing word frequency as a determinant of word recognition for skilled and unskilled readers. Journal of Experimental Psychology: Human Perception and Performance 39: 802–23.
Kupisch, Tanja, and Jason Rothman. 2018. Terminology matters! Why difference is not incompleteness and how early child bilinguals are heritage speakers. International Journal of Bilingualism 22: 564–82.
Lehmann, Volkmar. 2009. Aspekt und Tempus. In Slavische Sprachen—Slavic Languages (Handbücher zur Sprach- und Kommunikationswissenschaft). Edited by Sebastian Kempgen, Peter Kosta, Tilman Berger and Karl Gutschmidt. Berlin and New York: Mouton De Gruyter, vol. 32.1, pp. 526–56.
Ljashevskaja, Ol’ga N., and Sergej A. Sharov. 2011. Chastotnyj slovar’ sovremennogo russkogo jazyka (na materialax Nacional’nogo korpusa russkogo jazyka). Moskva. Electronic version: Novyj chastotnyj slovar’ russkoj leksiki. Available online: http://dict.ruslang.ru/freq.php (accessed on 1 February 2024).
Mandera, Paweł. 2016. Psycholinguistics on a Large Scale: Combining Text Corpora, Megastudies, and Distributional Semantics to Investigate Human Language Processing. Doctoral dissertation, Ghent University, Ghent, Belgium. Available online: https://biblio.ugent.be/publication/7235387 (accessed on 13 January 2024).
McGee, Iain. 2008. Word frequency estimates revisited: A response to Alderson (2007). Applied Linguistics 29: 509–14.
Miklashevsky, Alex. 2018. Perceptual Experience Norms for 506 Russian Nouns: Modality Rating, Spatial Localization, Manipulability, Imageability and Other Variables. Journal of Psycholinguistic Research 47: 641–61.
Monaghan, Padraic, Ya-Ning Chang, Stephen Welbourne, and Marc Brysbaert. 2017. Exploring the relations between word frequency, language exposure, and bilingualism in a computational model of reading. Journal of Memory and Language 93: 1–21.
Montrul, Silvina. 2008. Incomplete Acquisition in Bilingualism: Re-Examining the Age Factor. Amsterdam: John Benjamins.
Montrul, Silvina. 2016. The Acquisition of Heritage Languages. Cambridge: Cambridge University Press.
Polinsky, Maria. 2018. Heritage Languages and Their Speakers. Cambridge: Cambridge University Press.
Reid, Agnieszka Anna, and William D. Marslen-Wilson. 2003. Lexical representation of morphologically complex words: Evidence from Polish. In Morphological Structure in Language Processing. Edited by R. Harald Baayen and Robert Schreuder. Berlin and New York: Mouton de Gruyter, pp. 287–336.
Rothman, Jason. 2009. Understanding the nature and outcomes of early bilingualism: Romance languages as heritage languages. International Journal of Bilingualism 13: 155–63.
Schmid, Monika S., and Barbara Köpke. 2009. L1 Attrition and the Mental Lexicon. In The Bilingual Mental Lexicon: Interdisciplinary Approaches. Edited by Aneta Pavlenko. Bristol: Multilingual Matters, pp. 209–38.
Schmidtke, Jens. 2016. The Bilingual Disadvantage in Speech Understanding in Noise is Likely a Frequency Effect Related to Reduced Language Exposure. Frontiers in Psychology 7: 678.
Shapiro, Bernard J. 1969. The subjective estimate of relative word frequency. Journal of Verbal Learning and Verbal Behavior 8: 248–51.
Sharov, Sergej A., and Olga N. Ljashevskaja. n.d. Vvedenie k novomu chastotnomu slovarju russkoj leksiki. Available online: http://dict.ruslang.ru/freq.php (accessed on 2 April 2015).
Shatzman, Keren B., and Niels O. Schiller. 2004. The word frequency effect in picture naming: Contrasting two hypotheses using homonym pictures. Brain and Language 90: 160–69.
Sherkina-Lieber, Marina. 2004. The Cognate Facilitation Effect in Bilingual Speech Processing: The Case of Russian-English Bilingualism. Cahiers linguistiques d’Ottawa 32: 108–21.
Sherkina-Lieber, Marina. 2008. The cognate facilitation effect is a frequency effect: Evidence from Russian-English bilingualism. In Formal Description of Slavic Languages: The Fifth Conference, Leipzig 2003. Edited by Gerhild Zybatow. Frankfurt: Peter Lang, pp. 192–98.

Figure 1. Corpus frequency (CF; log-transformed to base 10, then z-transformed) and median subjective frequency (MSF; z-transformed) of the 49 analyzed Russian verbs for monolinguals (MOs), late bilinguals (LBs), heritage speakers (HSs), and foreign language learners (FLs), sorted by CF (English equivalents can be found in the repository at https://slavdok.slavistik-portal.de/receive/slavdok_mods_00000353, accessed on 27 June 2024).
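The transformation described in the Figure 1 caption can be sketched as follows. This is a minimal illustration only, assuming instances-per-million (ipm) counts as CF input and the sample standard deviation for the z-transformation; the caption does not say which standard deviation was used, and the function names and example values are hypothetical.

```python
import math
import statistics


def zscore(values: list[float]) -> list[float]:
    """Standardize values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]


def log10_zscore(ipm: list[float]) -> list[float]:
    """Log-transform (base 10) positive ipm counts, then z-standardize them, as for CF."""
    return zscore([math.log10(v) for v in ipm])


# Hypothetical ipm values for three verbs, not taken from the study
cf_z = log10_zscore([10.0, 100.0, 1000.0])
# MSF values are only z-standardized (no log transform), as stated in the caption
msf_z = zscore([2.0, 4.0, 6.0])
print(cf_z, msf_z)
```

Putting both measures on the z scale lets CF and MSF be plotted on one axis even though CF is a count and MSF a median rating on the 1 to 7 scale.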

Figure 2. Median subjective frequency (MSF) of the 49 analyzed Russian verbs for monolinguals (MOs), late bilinguals (LBs), heritage speakers (HSs), and foreign language learners (FLs), sorted by the MSF of the monolingual group (English equivalents can be found in the repository at https://slavdok.slavistik-portal.de/receive/slavdok_mods_00000353).

Figure 3. Percentage of verbs at each level of agreement (see Table 5) in the frequency assessments of the four speaker groups.

Figure 4. Percentage of verbs at each level of agreement in the frequency assessments of the four speaker groups (monolinguals (MOs), late bilinguals (LBs), heritage speakers (HSs), and foreign language learners (FLs)) for (a) low-frequency verbs (ipm < 30, N = 20) and (b) high-frequency verbs (ipm ≥ 30, N = 29); ipm = instances per million words.

Figure 5. Percentage of frequency judgments by monolinguals (MOs), late bilinguals (LBs), heritage speakers (HSs), and foreign language learners (FLs) at each scale point for (a) zajavit’ ‘to declare, to claim’ and (b) opustošit’ ‘to devastate’ (scale from 1 “I never encounter this verb” to 7 “I encounter this verb at every turn”).
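The distributions in Figure 5 are percentages of judgments at each point of the 1 to 7 scale. A minimal sketch of that tabulation for one verb and one speaker group; the function name and the ratings are hypothetical, not study data.

```python
from collections import Counter

# 1 = "I never encounter this verb" ... 7 = "I encounter this verb at every turn"
SCALE = range(1, 8)


def rating_percentages(ratings: list[int]) -> dict[int, float]:
    """Share (in %) of judgments at each point of the 7-point scale."""
    if not ratings:
        raise ValueError("no ratings given")
    counts = Counter(ratings)
    out_of_range = set(counts) - set(SCALE)
    if out_of_range:
        raise ValueError(f"ratings outside 1-7: {sorted(out_of_range)}")
    n = len(ratings)
    # Counter returns 0 for scale points nobody chose
    return {point: 100 * counts[point] / n for point in SCALE}


# Hypothetical ratings of one verb by one group
example = rating_percentages([7, 7, 6, 5, 7, 6, 4, 7])
print(example)
```

Reporting every scale point, including those with 0%, keeps the bars of the four groups aligned when they are plotted side by side.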

Table 1. Data collection phases.

Data Collection Phase	Year	Location	Groups ¹	Total N of Participants	N of Included Participants
1	2012	Krasnojarsk	MO	60	58
2	2012/13	Bochum	MO, LB, HS, FL	166	140
3	2016	various German cities	MO, FL	23	12
4	2023	online	MO, LB, HS, FL	198	93
	Sum			447	303

¹ MO: monolingual, LB: late bilingual, HS: heritage speaker, FL: foreign language learner.

Table 2. Background information on included participants.

	Monolingual (MO)	Late Bilingual (LB)	Heritage Speaker (HS)	Foreign Language Learner (FL)
N	72	80	84	67
Age: mean in years (SD)	27.1 (11.2)	35.1 (11.8)	24.8 (5.5)	27.9 (11.2)
Age at immigration: mean in years (SD)	n/a	25.0 (8.5)	6.2 (3.9)	n/a
Length of stay: mean in years (SD)	n/a	11.4 (8.2)	18.6 (5.5)	n/a
Weighted length of learning: mean in points (SD)	n/a	n/a	n/a	5.4 (2.5)

Table 3. Number of data sets per questionnaire.

	Monolingual (MO)	Late Bilingual (LB)	Heritage Speaker (HS)	Foreign Language Learner (FL)
N questionnaire 1	34	43	41	30
N questionnaire 2	38	37	43	37
N total	72	80	84	67

Table 4. Interpretation of correlation coefficients.

Absolute Value of Correlation Coefficient	Interpretation
0.90 to 1.00	very high correlation
0.70 to 0.89	high correlation
0.50 to 0.69	moderate correlation
0.30 to 0.49	low correlation
0.00 to 0.29	negligible correlation
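Applying the bands in Table 4 amounts to a small lookup. This sketch assumes the bands apply to the absolute value of r and that a value falling between two printed bands (e.g. 0.895) goes to the lower band; both are assumptions, and `interpret_r` is a hypothetical helper.

```python
def interpret_r(r: float) -> str:
    """Label a correlation coefficient with the bands of Table 4, applied to |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    a = abs(r)
    if a >= 0.90:
        return "very high"
    if a >= 0.70:
        return "high"
    if a >= 0.50:
        return "moderate"
    if a >= 0.30:
        return "low"
    return "negligible"


print(interpret_r(0.697), interpret_r(0.956))
```

For example, the MO–FL coefficient of 0.697 in Table 7 falls in the moderate band, and the MO–LB coefficient of 0.956 in the very high band.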

Table 5. Interpretation of interquartile range (IQR).

IQR From	IQR To	Qualitative Evaluation	Level of Agreement
0.25	0.60	very high agreement	1
0.61	0.90	high agreement	2
0.91	1.10	moderate agreement	3
1.11	1.80	low agreement	4
1.81	2.00	very low agreement	5
2.01	2.50	bimodal distribution	6
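Assigning a level of agreement from Table 5 requires the IQR of each verb's ratings within a group. The quartile method used in the study is not stated in this section; this sketch assumes Python's `statistics.quantiles` with `method="inclusive"`, so IQRs computed with other software may differ slightly. IQR values outside 0.25 to 2.50 are not covered by the table and are rejected. The helper names and ratings are hypothetical.

```python
import statistics

# Upper bound of each IQR band in Table 5, paired with its level of agreement
IQR_BANDS = [(0.60, 1), (0.90, 2), (1.10, 3), (1.80, 4), (2.00, 5), (2.50, 6)]


def iqr(ratings: list[int]) -> float:
    """Interquartile range Q3 - Q1 (assumption: 'inclusive' quartile method)."""
    q1, _median, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    return q3 - q1


def agreement_level(iqr_value: float) -> int:
    """Map an IQR to the level of agreement in Table 5."""
    if iqr_value >= 0.25:
        for upper, level in IQR_BANDS:
            if iqr_value <= upper:
                return level
    raise ValueError(f"IQR {iqr_value} lies outside the range covered by Table 5")


# Hypothetical ratings of one verb by one group
spread = iqr([3, 4, 4, 5, 5, 5, 6, 6])
print(spread, agreement_level(spread))
```

With these ratings Q1 = 4.0 and Q3 = 5.25, so the IQR of 1.25 falls in the 1.11 to 1.80 band (level 4, low agreement).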

Table 6. Pearson correlation coefficients between the corpus frequency (CF) and the median subjective frequency (MSF) of monolinguals (MOs), late bilinguals (LBs), heritage speakers (HSs), and foreign language learners (FLs).

	MSF MO	MSF LB	MSF HS	MSF FL
CF of all verbs (n = 49)	0.624 **¹	0.699 **	0.639 **	0.628 **
CF of low-frequency verbs (n = 20)	0.773 **	0.788 **	0.798 **	0.604 **
CF of high-frequency verbs (n = 29)	0.275 n.s.	0.337 n.s.	0.375 *	0.392 *

¹ Level of significance: ** p < 0.01, * p < 0.05, n.s. = not significant.
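The significance labels in the high-frequency row of Table 6 can be checked against the reported coefficients with the usual t test for a Pearson correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. This sketch assumes a two-tailed test and uses tabled critical values for df = 27 (2.052 for p < 0.05, 2.771 for p < 0.01) rather than a t-distribution library; the test actually applied is not described in this section, and the helper names are hypothetical.

```python
import math

N_HIGH = 29  # number of high-frequency verbs in Table 6
# Two-tailed critical t values for df = N_HIGH - 2 = 27, from a standard t table
T_CRIT = {"**": 2.771, "*": 2.052}


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, given Pearson r over n pairs (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def significance_label(r: float) -> str:
    """Label r as in Table 6, assuming a two-tailed test with n = N_HIGH."""
    t = abs(t_statistic(r, N_HIGH))
    if t >= T_CRIT["**"]:
        return "**"
    if t >= T_CRIT["*"]:
        return "*"
    return "n.s."


# Coefficients reported in Table 6 for the high-frequency verbs
reported = {"MO": 0.275, "LB": 0.337, "HS": 0.375, "FL": 0.392}
for group, r in reported.items():
    print(group, r, round(t_statistic(r, N_HIGH), 2), significance_label(r))
```

Under these assumptions t ranges from about 1.49 (MO) to 2.21 (FL), which reproduces the n.s./* pattern in the table.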

Table 7. Pearson correlation coefficients between the median subjective frequency (MSF) values of each pair of speaker groups: monolinguals (MOs) (N = 34 and 38 for questionnaires 1 and 2, respectively), late bilinguals (LBs) (N = 43 and 37), heritage speakers (HSs) (N = 41 and 43), and foreign language learners (FLs) (N = 30 and 37).

	MSF LB	MSF HS	MSF FL
MSF MO	0.956 **	0.901 **	0.697 **
MSF LB	N/A	0.933 **	0.716 **
MSF HS	N/A	N/A	0.736 **

Level of significance: ** = p < 0.01.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Clasmeier, C.; Anstatt, T. “How Often Do You Encounter the Verb Obnaruzhit’?” Subjective Frequency of Russian Verbs in Heritage Speakers and Other Types of Russian–German Bilinguals. Languages 2024, 9, 256. https://doi.org/10.3390/languages9080256
