Image Generation Phase

Participants were seated facing a computer monitor and pressed the right mouse button to begin each trial. The click triggered an alerting beep, followed 250 ms later by a noun-cue displayed at the center of the screen. Participants were instructed to read the cue silently and as quickly as possible, and then immediately to generate an image corresponding to the noun-cue. When participants felt that their mental image was at its most vivid, they pressed the right mouse button again. This press triggered another alerting beep, followed 250 ms later by a horizontal array of seven buttons near the bottom of the screen. From left to right, the buttons were labeled with the vividness levels of a seven-point scale: (1) "no image"; (2) "very vague/dim"; (3) "vague/dim"; (4) "not vivid"; (5) "moderately vivid"; (6) "very vivid"; and (7) "perfectly vivid", as used in previous research [47,58]. Participants were familiarized with the rating system during pre-test practice sessions. They clicked one of the seven buttons with the mouse and were instructed to rate any failure to generate an image as "no image." There was no deadline for the response.

**Figure 2.** Schematic depicting the overall design of the imagery-generation incidental recall task. (**A**) Participants pressed the right mouse button to begin each trial (1). The click triggered an alerting beep, followed 250 ms later by a noun-cue displayed at the center of the screen (2). Participants were instructed to read the cue silently and as quickly as possible, and then immediately to generate an image corresponding to the noun-cue (3). When participants felt that their mental image was at its most vivid, they pressed the right mouse button again (4). This press triggered another alerting beep, followed 250 ms later by a horizontal array of seven buttons near the bottom of the screen (5). From left to right, the buttons were labeled with the vividness levels of a seven-point scale: (1) "no image"; (2) "very vague/dim"; (3) "vague/dim"; (4) "not vivid"; (5) "moderately vivid"; (6) "very vivid"; and (7) "perfectly vivid". After the vividness response, the array of buttons disappeared and the display reverted to a screen instructing the participant to click the mouse when ready to begin the next trial (6). At least 5 s had to elapse between the vividness response and the start of the next trial. (**B**) After completing the image generation phase, participants were told to take a break and fill out paperwork, which included a debriefing session. (**C**) Exactly 30 min after their last trial, participants were asked to recall as many of the noun cues as possible in a blank Excel spreadsheet (7).
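The timing of a single trial can be summarized in a short sketch. This is only an illustration of the timeline described above, not the software used in the study; the class name `Trial`, its field names, and the millisecond clock are assumptions introduced here.

```python
from dataclasses import dataclass

# Labels of the seven-point vividness scale, as listed in the text.
VIVIDNESS_LABELS = {
    1: "no image",
    2: "very vague/dim",
    3: "vague/dim",
    4: "not vivid",
    5: "moderately vivid",
    6: "very vivid",
    7: "perfectly vivid",
}

BEEP_TO_DISPLAY_MS = 250    # delay from each alerting beep to the display that follows
MIN_INTERTRIAL_MS = 5000    # minimum gap between vividness response and next trial


@dataclass
class Trial:
    """Hypothetical record of one trial; all times are in ms on one clock."""
    cue: str
    start_click_ms: int     # right click that begins the trial (first beep)
    vivid_click_ms: int     # right click when the image is most vivid (second beep)
    rating_click_ms: int    # click on one of the seven rating buttons
    rating: int             # chosen scale value, 1 to 7

    def __post_init__(self) -> None:
        if self.rating not in VIVIDNESS_LABELS:
            raise ValueError(f"rating must be 1-7, got {self.rating}")

    def cue_onset_ms(self) -> int:
        # The noun-cue appears 250 ms after the first beep.
        return self.start_click_ms + BEEP_TO_DISPLAY_MS

    def scale_onset_ms(self) -> int:
        # The rating buttons appear 250 ms after the second beep.
        return self.vivid_click_ms + BEEP_TO_DISPLAY_MS

    def rating_label(self) -> str:
        return VIVIDNESS_LABELS[self.rating]

    def earliest_next_trial_ms(self) -> int:
        # The next trial may start no sooner than 5 s after the rating.
        return self.rating_click_ms + MIN_INTERTRIAL_MS
```

For example, `Trial("apple", 0, 3250, 5000, 6)` has its cue onset at 250 ms, its rating scale onset at 3500 ms, and may not be followed by a new trial before 10000 ms; the cue and times are made up for illustration.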

Stimulus Familiarity Matching and Diagnostic Procedure

The third stage of the selection involved finding a complete archival match of familiarity for the sixty stimuli, again using MRC2.DCT and the same merging procedure outlined in Section 2.2.1 to consolidate the present database. MRC2.DCT is an online dictionary file provided for public research use, together with programs that can be used either to access the dictionary or as models for programs tailored to users' specific needs. The dictionary file contains no original information; it was assembled by merging a number of smaller databases published in the psycholinguistic and imagery literature [59]. The original rating procedure was a paper-and-pencil protocol similar to the one we used, although ours was computerized. In the original norms, the equivalent range of the ratings was 1.00 to 7.00. This database differs from other machine-usable dictionaries in that it includes not only syntactic information but also psychological data for the entries. The file contains 9392 words that have ratings for imagery, familiarity, and other attributes, but not for vividness. Columns 26 to 28, labeled "FAM", hold the familiarity value, where familiarity stands for 'printed familiarity'. The FAM values were derived by merging three sets of familiarity norms: Paivio, Yuille, and Madigan; Toglia and Battig; and Gilhooly and Logie [60–62]. The method by which these three sets of norms were merged is described in detail in Appendix 2 of the MRC Psycholinguistic Database User Manual [63]. FAM values are integers in the range 100 to 700, with a maximum entry of 657, a mean of 488, and a standard deviation of 99.
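As an illustration of how the FAM field might be read from a fixed-width record, the sketch below slices columns 26 to 28 and converts the integer to the equivalent 1.00 to 7.00 scale. It assumes that the column numbers are 1-indexed and inclusive, that values outside 100 to 700 mark a missing rating, and that dividing by 100 recovers the original scale; the function names `parse_fam` and `fam_to_rating` are hypothetical and are not part of the programs distributed with the dictionary.

```python
from typing import Optional

FAM_START, FAM_END = 26, 28   # column positions given in the text (assumed 1-indexed)
FAM_MIN, FAM_MAX = 100, 700   # stated range of valid FAM values


def parse_fam(record: str) -> Optional[int]:
    """Return the FAM value of one fixed-width record, or None if absent."""
    # Convert 1-indexed inclusive columns to a 0-indexed Python slice.
    field = record[FAM_START - 1:FAM_END].strip()
    if not field.isdigit():
        return None
    value = int(field)
    # Assumption: anything outside the stated range means "no rating".
    if not FAM_MIN <= value <= FAM_MAX:
        return None
    return value


def fam_to_rating(fam: int) -> float:
    """Map an integer FAM value (100-700) onto the 1.00-7.00 rating scale."""
    return fam / 100
```

With a synthetic record whose columns 26 to 28 contain `488`, `parse_fam` returns 488 and `fam_to_rating(488)` gives 4.88, which corresponds to the mean reported above.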

The fourth and final stage involved analyzing the distributions of the vividness and familiarity values to avoid range restriction, and running diagnostics to eliminate outliers. This stage narrowed the final number of stimuli in the database to fifty.
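The text does not state which diagnostics were applied, so the following is only a generic sketch of one common approach: flagging items whose standardized score exceeds a cutoff and checking how much of the possible scale the remaining values span. The z-score criterion, the cutoff of 2.5, and the function names are assumptions for illustration, not the procedure used in the study.

```python
from statistics import mean, stdev
from typing import List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize values using the sample mean and sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def flag_outliers(values: Sequence[float], cutoff: float = 2.5) -> List[bool]:
    """Mark each value whose absolute z-score exceeds the (hypothetical) cutoff."""
    return [abs(z) > cutoff for z in z_scores(values)]


def range_coverage(values: Sequence[float], lo: float, hi: float) -> float:
    """Fraction of the possible scale [lo, hi] spanned by the observed values.

    A small fraction would suggest range restriction.
    """
    return (max(values) - min(values)) / (hi - lo)
```

Applied to familiarity, `range_coverage(fam_values, 100, 700)` would report the share of the FAM scale covered by a stimulus set, and `flag_outliers(fam_values)` would mark candidate items for removal; `fam_values` here is a placeholder for the list of FAM values of the candidate stimuli.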
