1. Introduction
Accurate metacognitive monitoring is critical for memory performance, and people often regulate their learning strategies based on the self-monitoring of their learning (Dunlosky and Ariel 2011; Kornell and Bjork 2008; Metcalfe and Finn 2008). Judgments of learning (JOLs), which are people’s predictions of the likelihood of remembering recently encoded material on future memory tests, are among the most common measures of metacognitive monitoring. JOLs were once assumed to assess levels of learning without modifying them. However, a few early researchers questioned this assumption (
Nelson and Dunlosky 1992;
Spellman and Bjork 1992). More recently, there has been accumulating evidence that JOLs often produce robust learning effects (for reviews, see
Double et al. 2018;
Double and Birney 2019). The finding that making JOLs directly modifies subsequent memory performance is termed JOL reactivity.
Multiple theoretical hypotheses have been proposed to explain JOL reactivity, such as the changed-goal hypothesis (
Mitchum et al. 2016), the cue-strengthening hypothesis (
Soderstrom et al. 2015), the item-specific processing hypothesis (
Senkova and Otani 2021), and the attention-reorienting/enhanced engagement account (
Shi et al. 2023;
Tauber and Witherby 2019). As background, we provide a summary of these hypotheses in
Table 1 and direct readers to other relevant studies that provide support for each theory. It is worth noting here that these theoretical accounts are not mutually exclusive. For instance, the cue-strengthening hypothesis proposes that JOL reactivity results from the strengthening of test-relevant cues, whereas the item-specific processing hypothesis specifies that JOLs’ strengthening effects are tied to item-specific processing. Because those two effects are not logically incompatible, both might occur.
In the current study, we focus on the item-specific processing hypothesis, which evolved from the item-and-relational-processing framework (
Einstein and Hunt 1980;
Hunt and Einstein 1981). According to this framework, there are two distinct types of processing in list encoding. One is item-specific encoding, which focuses on properties that distinguish individual items from each other (e.g., the unique orthography of a list word). The other is relational encoding, which focuses on properties that different items share (e.g., taxonomic categories and narrative themes). Item-specific and relational processing serve different functions, and memory performance is optimized when both types of processing occur.
Based on these notions,
Senkova and Otani (
2021) hypothesized that item-level JOLs resemble encoding tasks that promote item-specific processing. It follows that JOLs should produce larger learning benefits in learning materials that do not normally favor item-specific processing, such as categorized lists, which predominantly trigger relational processing. In support of this hypothesis, Senkova and Otani factorially manipulated JOL condition (JOL, no-JOL) and list type (categorized, unrelated) and found that JOLs enhanced the free recall of categorized lists but not unrelated lists. The hypothesis that making JOLs enhances item-specific processing received further support from the finding that recall for categorized lists following JOLs was comparable to recall following two other encoding manipulations that are known to induce item-specific processing: pleasantness ratings and mental imagery.
Zhao et al. (
2022) also found supporting evidence for the item-specific processing hypothesis using unrelated word lists. They reported that making JOLs improved performance on forced-choice recognition tests and simultaneously impaired performance on order reconstruction tests (i.e., reconstructing the temporal order in which words were encoded). Forced-choice recognition relies heavily on item-specific processing, whereas order reconstruction is an inherently relational task. Thus, the finding that JOLs improve the former and impair the latter suggests that they shift encoding toward item-specific features and away from relational features.
Zhao et al. (
2023) later replicated the negative effect of JOLs on relational processing with rhyming cue-target word pairs, for which the target words on consecutive pairs were exemplars of the same category. Specifically, they found that making JOLs decreased categorical clustering during the free recall of target words. This again supports the notion that making JOLs slants encoding toward item-specific processing and away from relational processing.
However, there is also evidence that runs counter to the item-specific processing hypothesis. For example,
Stevens (
2019) found no reactivity of item-level JOLs with Deese–Roediger–McDermott (DRM;
Deese 1959;
Roediger and McDermott 1995) lists. These lists are composed of words (e.g., bed, pillow, yawn, etc.) that are associated with a common missing word (e.g., sleep), which trigger high levels of relational processing. Similarly,
Stevens and Pierce (
2019) found no reactivity for item-level JOLs with categorized lists, although they did find positive reactivity for
list-level JOLs. Unlike item-level JOLs, which are made after studying each word, list-level JOLs are made after studying each categorized list. Usually, participants are asked to estimate the number of words they will be able to recall from the list.
Stevens and Pierce (2019) argued that item-level JOLs direct participants’ attention to item-specific processing, whereas list-level JOLs direct attention to relational processing. On this account, only list-level JOLs produced positive reactivity because categorized lists favor relational processing over item-specific processing.
It should be noted that there is a clear discrepancy between
Senkova and Otani’s (
2021) and
Stevens and Pierce’s (
2019) findings, as the former authors found positive reactivity of item-level JOLs for categorized lists, whereas the latter did not. A critical methodological difference between the two studies may be responsible for the discrepancy. In Stevens and Pierce’s experiments, same-category exemplars were blocked for presentation (i.e., exemplars of the same category were presented consecutively), but in Senkova and Otani’s (2021) experiments, the words’ presentation order was completely randomized. Blocked presentation is more likely to cue relational processing than randomized presentation.
With this background, the current experiment had two aims. The first was to reconcile the mixed findings of JOL reactivity with categorized lists, and the second was to conduct further tests of
Senkova and Otani’s (
2021) item-specific processing hypothesis. To achieve the aims, we factorially manipulated the organization of categorized lists (randomized vs. blocked) and JOL condition (item-JOL, list-JOL, and no-JOL), with both variables being manipulated between participants.
Regarding the first aim, we examined whether the two previous findings (positive reactivity vs. no reactivity) could both be obtained in a single experiment with standardized materials and procedures. If so, the two findings are not inconsistent with each other; instead, their discrepancy can be attributed to the level of list organization during encoding. List-level JOLs were also administered as in
Stevens and Pierce’s (
2019) study. Here, list-level JOLs were expected to produce positive reactivity on recall for blocked but not for randomized categorized lists. This is because, in the randomized list condition, exemplars from different categories are intermixed across lists, leaving no coherent categorical relations within individual lists.
Turning to the second aim, the item-specific hypothesis posits that item-level JOLs improve recall for categorized lists by enhancing item-specific processing, which complements the relational processing that normally predominates. If so, positive reactivity should be observed with both blocked and randomized presentations. In fact, positive reactivity may be stronger with blocked than with randomized presentation, as the former directs attention towards relational processing and away from item-specific processing to a larger extent.
Moreover, as
Senkova and Otani (
2021) noted, similar performance between the item-JOL conditions and the pleasantness rating and mental imagery conditions does not guarantee that the underlying processes operating in those conditions are the same or even similar. To address this uncertainty, we used the dual-retrieval model of recall (
Brainerd et al. 2009;
Chang and Brainerd 2023) to pinpoint the underlying memory processes that are responsible for JOL reactivity in the present experiment.
The dual-retrieval model was developed based on fuzzy-trace theory’s assumption that people store and retrieve dissociated verbatim and gist traces of experience, that is, the literal traces of individual items versus the traces of semantic, relational, and elaborative information in which items participate (
Brainerd and Reyna 1998;
Reyna and Brainerd 1995). The only experimental requirement for implementing the dual-retrieval model is that participants complete at least three separate recall tests for encoded items, which supplies sufficient degrees of freedom to estimate all model parameters. There are (a) two types of verbatim retrieval parameters, direct access (D) and forgetting of direct access (F), and (b) two types of gist retrieval parameters, reconstruction (R) and familiarity judgment (J). The definitions of these parameters can be found in Table 2, and the model’s mathematical machinery is summarized in Appendix A.
Returning to the item-specific processing hypothesis, if JOL reactivity can be accounted for by enhanced item-specific processing, the differences between the item-JOL and no-JOL conditions should be tied to parameters that pertain to item-specific processing, namely, direct access (D) or forgetting (F) parameters, not parameters that pertain to processing of partial identifying information such as semantic relations across items, namely, reconstruction (R) or familiarity judgment (J) parameters.
3. Results
3.1. JOL Results
To examine the effects of list organization (blocked vs. randomized) on JOLs, we conducted two separate one-way analyses of variance (ANOVAs) for item-level JOLs and list-level JOLs, respectively. The effect of list organization on item-level JOLs did not reach statistical significance, F(1, 78) = 2.96, MSE = 305.31, ηp² = 0.04, p = .089. In contrast, list-level JOLs were significantly higher for blocked lists (M = 4.89, SD = 1.19) than for randomized lists (M = 3.73, SD = 1.06), F(1, 79) = 20.89, MSE = 1.29, ηp² = 0.21, p < .001. Thus, list-level JOLs were more sensitive to list organization than item-level JOLs.
3.2. Recall Results
Three participants’ recall data were identified as outliers as they were 1.5 interquartile ranges (IQRs) above the median (
Höhne and Schlosser 2018). These outliers were removed. The removal of the outliers did not change the qualitative effects in the ANOVA results. The recall data are displayed in
Figure 2.
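As a rough illustration of the exclusion rule described above, the following sketch flags scores that lie more than 1.5 IQRs above the median. It takes the rule literally as worded here; the function name, the quartile method (Python's default "exclusive" method), and the sample proportions are assumptions made for illustration and are not taken from the study's data or analysis scripts.

```python
from statistics import median, quantiles

def flag_high_outliers(scores, k=1.5):
    """Flag scores more than k IQRs above the median (rule as worded in the text)."""
    q1, _, q3 = quantiles(scores, n=4)  # quartiles, default 'exclusive' method
    cutoff = median(scores) + k * (q3 - q1)
    return [s > cutoff for s in scores]

# Hypothetical recall proportions, not data from the experiment.
example_scores = [0.30, 0.35, 0.40, 0.42, 0.45, 0.48, 0.50, 0.95]
flags = flag_high_outliers(example_scores)
kept = [s for s, is_outlier in zip(example_scores, flags) if not is_outlier]
print(kept)
```

A more common variant of this rule (Tukey's fences) places the cutoff 1.5 IQRs above the third quartile rather than the median; swapping `median(scores)` for `q3` would implement that version.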
We first conducted a 2 (list organization: blocked, randomized) × 3 (JOL condition: item-JOL, list-JOL, no-JOL) × 3 (Test: 1, 2, 3) mixed ANOVA for recall. The ANOVA revealed a main effect of list organization, F(1, 231) = 12.71, MSE = 0.10, ηp² = 0.05, p < .001, a main effect of JOL condition, F(2, 231) = 4.86, MSE = 0.10, ηp² = 0.03, p = .009, and a main effect of test, F(2, 462) = 6.19, MSE = 0.002, ηp² = 0.03, p = .002. Least significant difference (LSD) tests showed that recall was higher for blocked lists (M = 0.45, SD = 0.20) than for randomized lists (M = 0.37, SD = 0.18), t(231) = 3.56, d = 0.41, p < .001, higher in the item-JOL condition (M = 0.45, SD = 0.17) than in the list-JOL condition (M = 0.37, SD = 0.21), t(231) = 3.12, d = 0.44, p = .002, and higher on the first recall test (M = 0.42, SD = 0.18) than on the second (M = 0.40, SD = 0.20) and third recall tests (M = 0.40, SD = 0.21), ts(231) = 3.13 and 2.45, ds = 0.20 and 0.16, ps < .015. Additionally, there was a JOL condition × Test interaction, F(4, 468) = 2.99, MSE = 0.004, ηp² = 0.03, p = .019. The JOL condition effect was significant on all three recall tests, ps < .032, but it increased slightly from test 1 to test 2 to test 3. Last, the interaction of primary interest, the JOL condition × List organization interaction, was not significant, F(2, 231) = 1.87, MSE = 0.10, ηp² = 0.02, p = .156.
Although the JOL condition × List organization interaction did not reach the conventional criterion of statistical significance, we conducted a planned one-way ANOVA to compare recall between the item-JOL, list-JOL, and no-JOL conditions for randomized lists. Because recall differed significantly between test 1 and the following tests, and because Senkova and Otani (2021) administered only a single study-test cycle, we included only test 1 data to make this analysis comparable to theirs. The one-way ANOVA showed that the effect of JOL condition was significant, F(2, 114) = 6.52, MSE = 0.03, ηp² = 0.10, p = .002. LSD tests indicated that the item-JOL condition (M = 0.44, SD = 0.16) produced higher recall for randomized lists than both the list-JOL condition (M = 0.30, SD = 0.15), t(114) = 3.58, d = 0.88, p < .001, and the no-JOL condition (M = 0.36, SD = 0.19), t(114) = 1.99, d = 0.42, p = .049. Therefore, we replicated Senkova and Otani’s result that item-level JOLs produced better recall for randomized categorized lists than the no-JOL condition.
Additionally, we conducted another planned one-way ANOVA to compare recall between the item-JOL, list-JOL, and no-JOL conditions for blocked lists. We restricted this analysis to test 1 data for the same reason stated above. The ANOVA showed that there was no difference in recall for blocked lists between the item-JOL (M = 0.47, SD = 0.17), list-JOL (M = 0.46, SD = 0.19), and no-JOL conditions (M = 0.45, SD = 0.18), F(2, 117) = 0.16, MSE = 0.03, ηp² = 0.003, p = .851. Therefore, we reproduced Stevens and Pierce’s (2019) finding of null reactivity of item-level JOLs for blocked categorized lists. In brief, we resolved the inconsistency between Senkova and Otani’s (2021) and Stevens and Pierce’s (2019) results by demonstrating that it is tied to whether the presentation of categorized lists is blocked or randomized.
3.3. Model Results
The free recall data were further analyzed with the dual-retrieval model. As can be seen in Table 3, the average G²(1) across all combinations of JOL condition (item-, list-, and no-JOL) and list organization (blocked and randomized) was 3.56. Because G²(1) is asymptotically distributed as χ²(1), goodness of fit is evaluated by comparing the observed G²(1) to the critical value of χ²(1) for rejecting the null hypothesis, which is 3.84 at the .05 significance level. Thus, the observed fit level was acceptable.
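The comparison against the χ²(1) critical value can be checked with a few lines of code. The sketch below is only an illustration of the test logic, not the software used for the model analyses. It relies on the identity that, for one degree of freedom, the upper-tail probability of χ² at x equals erfc(√(x/2)). The function names are hypothetical.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_critical_df1(alpha=0.05, tol=1e-10):
    """Critical value for 1 df, found by bisection on the tail probability."""
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_df1(mid) > alpha:
            lo = mid  # tail still too heavy: critical value lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0

critical = chi2_critical_df1(0.05)
p_fit = chi2_sf_df1(3.56)  # average G2(1) reported in the text
print(round(critical, 2), round(p_fit, 3), 3.56 < critical)
```

The same tail function applies to the single-parameter comparisons in this section, where a difference statistic with one degree of freedom is referred to χ²(1).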
For the blocked lists, the comparisons of primary interest are between the item-JOL and no-JOL conditions. There were no significant differences in any model parameter between those two conditions, which is consistent with the ANOVA results. Next, we consider the comparisons between the list-JOL condition and the other two JOL conditions. The F parameter was larger in the list-JOL condition (0.10) than in the item-JOL and no-JOL conditions (0.05 and 0.05), ∆G²s > 15.38, ps < .001. Furthermore, the J2 parameter was larger in the list-JOL condition than in the item-JOL condition, ∆G²(1) = 4.58, p = .032. This suggests that list-level JOLs stimulated more forgetting of item-specific verbatim details and that words followed by list-level JOLs felt more familiar on the later recall tests relative to those followed by item-level JOLs.
The patterns were quite different for randomized lists. We first consider the comparison between the item-JOL and no-JOL conditions. Here, the J1 parameter was larger in the item-JOL condition (0.54) than in the no-JOL condition (0.34), ∆G²(1) = 4.56, p = .033, suggesting that item-JOLs increased familiarity for reconstructed words. No other condition-wise difference in parameters reached statistical significance.
Next, we examine parameter differences between the list-JOL condition and the other two JOL conditions. Here, the D parameter was smaller in the list-JOL condition (0.25) relative to the item-JOL condition (0.37) and the no-JOL condition (0.33), ∆G²s > 11.87, ps ≤ .001, and the F parameter was again larger in the list-JOL condition than in the item-JOL condition (0.09 vs. 0.06), ∆G²(1) = 3.89, p = .046. This suggests that list-level JOLs impaired initial verbatim retrieval and increased its forgetting. Meanwhile, the J1 parameter was larger in the list-JOL condition (0.71) than in the item-JOL condition (0.54) or the no-JOL condition (0.43), ∆G²s > 8.34, ps < .004. Last, the R parameter was smaller in the list-JOL condition than in the item-JOL condition (0.10 vs. 0.20), ∆G²(1) = 5.20, p = .023. Thus, at the level of underlying memory processes, the list-JOL condition impaired both verbatim retrieval and reconstruction, but it made items seem more familiar during recall.
3.4. Follow-Up Analysis
The analyses so far show that item-level JOLs produced positive reactivity for randomized lists but not for blocked lists. Additionally, the positive reactivity of item-level JOLs was localized within familiarity judgment (J1) rather than direct access (D) or forgetting (F). Together, these results are not congruent with the notion that item-level JOLs improve the recall of randomized lists by enhancing item-specific processing. Instead, it appears that item-level JOLs may have enhanced relational processing by increasing metacognitive awareness that a randomly ordered series of words can be grouped into categories.
To test this alternative hypothesis, we conducted a follow-up one-tailed
t-test that compared category clustering between the item-JOL and no-JOL conditions for the randomized lists. According to the item-and-relational-processing framework (
Einstein and Hunt 1980;
Hunt and Einstein 1981), relational processing should enhance category clustering during recall, but item-specific processing should not. We used the adjusted ratio of clustering (ARC;
Roenker et al. 1971) as the index of category clustering, with 0 indicating chance clustering and 1 indicating perfect clustering. ARC is calculated as follows:
$$\mathrm{ARC} = \frac{R - E(R)}{\max R - E(R)}$$
Here, $R$ is the total number of category repetitions (i.e., situations where a category exemplar follows another exemplar from the same category), $E(R)$ is the expected number of category repetitions by chance, $\max R$ is the maximum possible number of category repetitions, and $\min R$ is the minimum possible number of category repetitions. $E(R)$, $\max R$, and $\min R$ are calculated as follows:
$$E(R) = \frac{\sum_{i} n_i^{2}}{N} - 1, \qquad \max R = N - k, \qquad \min R = \begin{cases} 2m - N - 1 & \text{if } 2m - N - 1 > 0 \\ 0 & \text{otherwise} \end{cases}$$
where $n_i$ is the number of items recalled from category $i$, $N$ is the total number of items recalled, $k$ is the number of categories to which recalled items belong, and $m$ is the number of items in the category with the most items recalled.
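To make the clustering index concrete, the following sketch computes ARC for a single recall protocol. It assumes the standard Roenker et al. (1971) form, ARC = (R − E(R)) / (max R − E(R)), with E(R) = Σ n_i² / N − 1 and max R = N − k. The function name and the example category labels are hypothetical and are not the study's materials or scoring script.

```python
from collections import Counter

def arc(recall_categories):
    """Adjusted ratio of clustering for one participant's recall output.

    recall_categories: category label of each recalled item, in output order.
    Returns None when ARC is undefined (empty recall, or max R equals E(R)).
    """
    n_total = len(recall_categories)
    if n_total == 0:
        return None
    counts = Counter(recall_categories)  # n_i for each recalled category
    # R: number of adjacent pairs in the output that share a category
    r_obs = sum(a == b for a, b in zip(recall_categories, recall_categories[1:]))
    e_r = sum(n * n for n in counts.values()) / n_total - 1  # chance repetitions
    max_r = n_total - len(counts)  # N - k
    if max_r == e_r:
        return None
    return (r_obs - e_r) / (max_r - e_r)

# Hypothetical protocols with two categories of three recalled items each.
clustered = ["fruit", "fruit", "fruit", "tool", "tool", "tool"]
alternating = ["fruit", "tool", "fruit", "tool", "fruit", "tool"]
print(arc(clustered), arc(alternating))
```

Under this definition, a score of 1 reflects perfect clustering and 0 reflects chance, while below-chance clustering yields negative values, as in the alternating protocol.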
Again, we confined our analyses to test 1 for randomized lists. The t-test showed that the difference in category clustering between the item-JOL condition (M = 0.35, SD = 0.36) and the no-JOL condition (M = 0.19, SD = 0.56) approached but did not reach significance, t(79) = 1.53, d = 0.34, p = .065. However, it is worth mentioning that we did not have sufficient statistical power for the follow-up analysis, as post hoc power analyses showed that with a df of 79, and a small effect size of d = 0.34, we only had a power of 0.45 to detect a significant effect in the independent t-test.
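The reported post hoc power can be roughly reproduced with a normal approximation to the one-tailed independent-samples t-test. This is a sketch under stated assumptions, not the authors' power calculation. The group sizes of 40 and 41 are hypothetical, chosen only to match df = 79, and the normal approximation stands in for the noncentral t distribution. The function name is also hypothetical.

```python
import math
from statistics import NormalDist

def approx_power_one_tailed(d, n1, n2, alpha=0.05):
    """Approximate power of a one-tailed two-sample t-test (normal approximation)."""
    z = NormalDist()
    # Expected value of the test statistic under the alternative hypothesis
    delta = d * math.sqrt(n1 * n2 / (n1 + n2))
    z_crit = z.inv_cdf(1 - alpha)
    return z.cdf(delta - z_crit)

# d = 0.34 from the text; group sizes 40 and 41 are assumed (df = 79).
power = approx_power_one_tailed(0.34, 40, 41)
print(round(power, 2))
```

With these assumed group sizes, the approximation lands close to the power of 0.45 reported above. An exact calculation based on the noncentral t distribution would differ slightly.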
4. Discussion
In the current study, supporting evidence was found for our explanation of the discrepant findings of
Stevens and Pierce (
2019) versus
Senkova and Otani (
2021): We found that the reactivity of item-level JOLs for categorized lists is controlled by list organization. Item-level JOLs produced positive reactivity when list words were randomized but not when they were blocked by category. Moreover, the dual-retrieval model revealed that the recall advantage for randomized lists in the item-JOL condition was driven by the enhancement in gist parameters rather than verbatim parameters. More specifically, making JOLs did not improve retrieval of item-specific verbatim traces, but it increased the feelings of familiarity with words that the participants had reconstructed based on gist traces.
The finding that item-level JOLs enhanced recall for randomized but not for blocked categorized lists cannot be accommodated by
Senkova and Otani’s (
2021) item-specific processing hypothesis. According to this hypothesis, categorized lists encourage relational processing, whereas unrelated lists promote item-specific processing. Consequently, if item-level JOLs enhance item-specific processing, they should improve memory for categorized lists more than memory for uncategorized lists, because encoding is largely shifted toward relational features and away from item-specific features in the former. By the same logic, however, the hypothesis predicts positive JOL reactivity for categorized lists regardless of whether they are randomized or blocked. In fact, it predicts stronger positive reactivity for blocked lists, owing to the dominance of relational processing with blocked lists (
Ackerman 1986).
Then, why did the reactivity of item-level JOLs only occur in randomized but not blocked lists? One possible explanation offered by the model analysis is that positive JOL reactivity for randomized lists mainly results from enhanced familiarity of items that are reconstructed from partial-identifying gist traces. As can be seen in
Table 3, item-level JOLs improved familiarity judgment (
J1) but not item-specific recollection (
D) for randomized lists, whereas they only affected verbatim forgetting (
F) for blocked lists. Thus, it is possible that the improvement in relational processing was a key determinant of positive reactivity of item-level JOLs in categorized lists; that is, making JOLs may heighten the awareness that individual words can be grouped together under specific categories. If that is the case, enhancement of relational processing should be more beneficial for randomized than for blocked lists. With blocked lists, high levels of relational processing are spontaneously afforded, whereas with randomized lists, the exemplars of the same category are scattered throughout the list, which hinders relational processing relative to blocked lists. Therefore, if positive reactivity of item-level JOLs for categorized lists is driven by relational processing, such benefits will be largely redundant for blocked lists but complementary for randomized lists. However, it must be acknowledged that this explanation is speculative because our experiment was not designed to test it. Our follow-up clustering analysis provided only tentative support, showing that item-level JOLs enhanced category clustering for randomized lists at the trend level. Given that the follow-up analysis was underpowered, we recommend that this result be replicated and examined further in future research.
In brief, our findings about categorized lists pose challenges to the item-specific processing hypothesis because (a) we observed positive JOL reactivity only for randomized lists but not blocked lists, and (b) positive JOL reactivity for randomized lists was localized within retrieval processes that index relational processing. Although the item-specific processing hypothesis cannot accommodate the current data, it did provide a good account of JOL reactivity data in other studies. For example, in Experiments 1 and 2 of
Chang and Brainerd (
2023), it was observed that positive JOL reactivity for related word pairs was tied to dual-retrieval model parameters that index item-specific recollection. Moreover, according to
Zhao et al. (
2022,
2023), item-level JOLs disrupt order reconstruction (a measure of relational processing) with unrelated word lists and rhyming pairs whose target words are categorical exemplars. However, it is worth pointing out that those stimuli (word pairs and unrelated lists) naturally trigger greater levels of item-specific processing as compared to the categorized lists we used. Therefore, the types of processing (item-specific or relational) that are enhanced by JOLs may depend heavily on the characteristics of the items that people encode.
Last, we had expected that list-level JOLs would not produce reactivity for randomized lists because list-level JOLs direct participants’ attention to the relations among words on the same list, and in randomized lists these words are not meaningfully related. Our results were consistent with this prediction. Additionally, the experiment showed that neither item- nor list-level JOLs enhanced the recall of blocked lists. The former finding is consistent with
Stevens and Pierce’s (
2019) results, whereas the latter is not. A possible reason why list-level JOL reactivity for blocked lists was not replicated is the difference in test format: Stevens and Pierce used a cued recall procedure that provided category labels as retrieval cues, whereas we used free recall. If list-level JOLs slant encoding toward relational processing, cued recall, which facilitates relational processing, should be more sensitive to JOL reactivity than free recall.
In summary, our results showed that JOL reactivity is a contextual memory effect that depends heavily on the interactions between learning material, JOL type, and memory test format. This notion echoes the cue-strengthening hypothesis, whose main assumption is that JOL reactivity depends on whether JOLs strengthen the cues afforded by the learning stimuli and what type of cues are favored by the memory tests. Our argument is also highly consistent with the tetrahedral model of memory (Jenkins 1979). The model posits that memory effects depend on four dimensions, namely subject characteristics (e.g., ability), learning stimuli (e.g., type of learning material), encoding task (e.g., instructions provided at encoding), and memory test (e.g., recall, recognition, etc.), as well as the interactions between them. Our findings tap the interactions between the last three dimensions in the model. Regarding the interaction between learning stimuli and encoding task, for learning materials in which exemplars of the same category are scattered (randomized) rather than blocked together, only item-JOLs pick up and strengthen the interitem relational cue. In contrast, list-JOLs would be misleading for randomized lists because they direct participants’ attention to interitem relations among consecutive words that are not meaningfully related. Moreover, whether the cue strengthening eventually transfers to positive reactivity also depends on the sensitivity of memory tests to the strengthened cues. This may explain why list-JOL reactivity was observed for blocked lists with cued recall but not free recall.