## 2.1.3. Procedure

The experiment comprised three stages. In the first stage, the *improvisation* stage, participants produced gestures for each vignette individually, without communicating with another participant. In the second stage, termed the *interaction* stage, participants communicated with their partners, producing and interpreting a gesture for each vignette. In the third stage, another *improvisation* stage, participants again produced gestures individually, so that we could see whether any changes introduced in stage two were retained in stage three (Figure 2). Throughout the experiment, participants communicated using only manual gestures. Participants were instructed not to use speech when gesturing (audio was not recorded), nor to use fingerspelling of any kind. Participants were also asked to remain seated throughout the task.

**Figure 2.** Stages in experiment 1. Participants take part in three stages: first, an improvisation stage, producing gestures to describe each vignette; then an interaction stage, producing and interpreting gestures in interaction with a partner; finally, a second improvisation stage.

In the first and third stages, participants were presented with each vignette, in random order, and asked to produce gestures to communicate each scene. On each trial, one vignette was shown and a gesture was elicited. Participants were given a 3 s countdown to prepare them for the beginning of each trial. The vignette played through twice on screen before participants were instructed to communicate the scene they had watched to the camera, using only gestures. Participants were again shown a 3 s countdown, this time to prepare them for recording. When recording began, participants saw themselves onscreen (mirrored) in the VideoBox window. Instructions shown onscreen throughout the trial told participants to press the space bar to stop recording and move on to the next trial. Participants completed trials for all 24 vignettes. The procedure was identical for both improvisation stages.
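To make the trial structure concrete, the sketch below walks through one improvisation stage in Python. It is a minimal, console-based stand-in under our own assumptions; the stimulus names, helper functions, and print statements are illustrative and are not the actual experiment software.

```python
import random
import time

# Hypothetical stimulus identifiers; the experiment used 24 video vignettes.
VIGNETTES = [f"vignette_{i:02d}" for i in range(1, 25)]

def countdown(seconds=3):
    """Console stand-in for the onscreen 3 s countdown."""
    for t in range(seconds, 0, -1):
        print(t)
        time.sleep(1)

def improvisation_stage(vignettes):
    """One improvisation stage: vignettes in random order, each played
    through twice, then a gesture recorded until the space bar is pressed."""
    for vignette in random.sample(vignettes, len(vignettes)):
        countdown()                      # prepare participant for the trial
        for _ in range(2):               # the vignette plays through twice
            print(f"playing {vignette}")
        countdown()                      # prepare participant for recording
        print("recording mirrored self-view until space bar is pressed")

# improvisation_stage(VIGNETTES)  # would run one full stage of 24 trials
```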

In the intervening interaction stage, participants took turns with a partner to produce and interpret gestures, in a director–matcher task. Participants both produced and interpreted gestures for each vignette, giving a total of 48 trials in the interaction stage (i.e., each participant acted as director and receiver for all 24 vignettes). Participants switched roles at each trial, and the presentation of the scenes in each trial was randomised. Participants remained seated in individual experiment booths, and communication was enabled by streaming video between networked computers.
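One way to realise this trial schedule is sketched below. The paper specifies only that roles alternated and that presentation was randomised, so interleaving two independently shuffled orders is our own assumption:

```python
import random

def interaction_schedule(vignettes):
    """Build the 48-trial interaction schedule: roles alternate on every
    trial and each participant directs all 24 vignettes in random order."""
    order_a = random.sample(vignettes, len(vignettes))  # A's directing order
    order_b = random.sample(vignettes, len(vignettes))  # B's directing order
    schedule = []
    for v_a, v_b in zip(order_a, order_b):
        schedule.append(("A", v_a))  # A directs, B matches
        schedule.append(("B", v_b))  # B directs, A matches
    return schedule  # 48 (director, vignette) trials
```

Interleaving two shuffled orders guarantees strict role alternation while each participant still directs every vignette exactly once.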

As director, the participant was asked to produce a gesture to communicate the vignette to their partner. After a 3 s countdown, participants were shown a vignette, twice through, as in the improvisation stages. They were then instructed to communicate the scene they had just watched to their partner. A 3 s countdown prepared them for recording and streaming to their partner. The participant's gesture was streamed to the networked computer operated by the matcher; the director saw themselves mirrored onscreen at the same time. Either director or matcher could stop the recording and streaming by pressing the space bar. When streaming was terminated, the director had to wait for the matcher to guess what the gesture meant. Both participants were given feedback, and the experiment continued to the next trial.
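Reading the director's view above together with the matcher's view described next, each interaction trial amounts to both networked clients stepping through the same phase sequence. The enumeration below is our own reconstruction of that sequence, not code from the experiment software:

```python
from enum import Enum, auto

class TrialPhase(Enum):
    """Phases both networked clients step through on each interaction trial."""
    COUNTDOWN = auto()   # synchronised 3 s countdown on both machines
    VIGNETTE  = auto()   # director watches the clip twice; matcher waits
    STREAMING = auto()   # gesture streamed director -> matcher until space bar
    GUESS     = auto()   # matcher selects from the target + 3 foils
    FEEDBACK  = auto()   # both participants see full feedback for 8 s
```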

As matcher, participants were given a 3 s countdown to signal the start of the trial, but were shown text on the screen reading "Waiting for partner" whilst the director watched the vignette. The matcher then received a synchronised 3 s countdown to prepare them for the start of streaming and recording. The matcher saw their partner's gesture, unmirrored, on screen. The matcher could terminate streaming by pressing the space bar when they felt they had understood their partner's gesture. Once streaming had been terminated, the matcher saw a set of 4 vignettes and made their guess. The 4 vignettes were chosen from vignettes used throughout the experiment, and comprised the target vignette (correct response) and three foils, determined as follows:


**Figure 3.** Example of a matching trial. The participant is shown a target and 3 foil videos playing in a loop on screen, and asked to select the video they think their partner was trying to communicate.

The target and 3 foils were presented as a grid of 4 looping videos. The matcher made their guess by pressing the number (1–4) of the corresponding video, as indicated in a dummy grid presented below the videos (see Figure 3). Once the matcher responded, both participants were given full feedback. If the matcher's guess was correct, they saw the target video highlighted in green, and the director saw the target video on screen. If the matcher's guess was incorrect, the selected video and the target video were highlighted on the screen in red and green, respectively. In this case, the director saw both the target video and the selected video. Both participants also received text feedback on screen, reading either "Correct" or "Incorrect". Feedback was shown onscreen for 8 s before the experiment software automatically continued to the next trial, giving participants enough time to see both the target and the selected videos.
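A compact sketch of the response and feedback logic just described is given below. The function names are hypothetical, and the assumption that the target's grid position was randomised is ours; the 8 s feedback duration and colour scheme follow the description above.

```python
import random
import time

def matching_response(target, foils):
    """Matcher's response screen: the target and 3 foils in a grid of 4
    looping videos, selected by pressing a number key (1-4). Randomising
    the target's grid position is our assumption."""
    grid = random.sample([target] + foils, 4)
    key = 1  # stand-in for the matcher's actual key press
    return grid[key - 1]

def give_feedback(selected, target):
    """Full feedback for both participants, shown for 8 s."""
    if selected == target:
        print("Correct")    # matcher sees the target highlighted in green
    else:
        print("Incorrect")  # selection highlighted in red, target in green
    time.sleep(8)           # feedback stays onscreen before the next trial
```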

## 2.1.4. Gesture Coding

Here, we analyse the gestures produced in the two improvisation rounds: the first round (before interaction) and the final round (after interaction). Gesture sequences produced at each trial (describing single vignettes) were glossed and coded using ELAN (Sloetjes and Wittenburg 2008) by members of the research team. Each individual gesture in a sequence was given a gloss describing it (e.g., take photo), and was then assigned a category code denoting one of 4 main categories:


Following Abner et al. (2019), our goal is to analyse some of the formal features that distinguish noun and verb signs across natural sign languages (e.g., size, number of repetitions), in gestures that share a similar underlying form. For example, in Figure 4a, the participant produces two different gestures for typical and atypical scenes featuring the target item egg: in the left-hand panel, she gestures the target action of cracking an egg; in the right-hand panel, she positions her right hand as if holding an egg. Because the participant has chosen two distinct forms to represent the egg, we cannot compare features of the gestures in the typical and atypical contexts. In contrast, in Figure 4b, the participant produces gestures that have the same underlying form for typical and atypical scenes featuring the target item hammer: the participant's hand (or hands) moves as if manipulating a hammer in both cases. By comparing gestures with the same underlying form, we can examine whether, across typical and atypical contexts, participants selectively use different features to distinguish productions in contexts designed to elicit noun forms vs. contexts designed to elicit verb forms. Therefore, we take the **target action (TA)** gestures produced in a sequence to be the participant's representation of the intended target (e.g., camera), and we compare TA gestures for the same object that the participant produced in its typical and atypical contexts. We code these TA gestures for the following formal features known to distinguish nominal and verbal signs in natural sign languages:


**Figure 4.** Examples of gestures representing targets: (**a**) the target item egg, where the two forms (left, right) have different underlying representations; (**b**) the target item hammer, where the two forms (left, right) have the same underlying representation.
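The comparison logic described above can be sketched as follows; the record fields (`participant`, `item`, `context`, `form`) are our own illustration of the coding scheme, not its actual field names.

```python
def comparable_ta_pairs(ta_gestures):
    """Pair a participant's TA gestures for the same target item across
    typical and atypical contexts, keeping only pairs that share the
    same underlying form (e.g., hammer in Figure 4b)."""
    index = {}
    for g in ta_gestures:
        # g is a dict like {"participant": "P01", "item": "hammer",
        #                   "context": "typical", "form": "manipulate"}
        index[(g["participant"], g["item"], g["context"])] = g

    pairs = []
    for (pid, item, context), gesture in index.items():
        if context != "typical":
            continue
        partner = index.get((pid, item, "atypical"))
        if partner is not None and gesture["form"] == partner["form"]:
            pairs.append((gesture, partner))  # comparable typical/atypical pair
    return pairs
```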

Two coders completed coding for data from study 1. A subset of 20% of the data (spanning data from each coder) was second-coded by KM and reliability between this sample and the original coding was calculated using Cohen's Kappa (Cohen 1960) for target action coding and for each of the formal parameters. We found very high agreement for our variables of interest: first target action (κ = 0.93), base hand (κ = 0.85), gesture size (κ = 0.89), gesture location (κ = 0.88) and repetitions (κ = 0.88). The full coding scheme can be found at https://osf.io/qzgjt (accessed on 21 March 2022).
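For reference, Cohen's kappa for this kind of inter-coder agreement can be computed with scikit-learn; the toy labels below are illustrative only, not our data.

```python
from sklearn.metrics import cohen_kappa_score

# Toy example: original codes vs. second coder's codes for the same gestures.
original_codes = ["TA", "TA", "other", "TA", "other", "TA"]
second_codes   = ["TA", "TA", "other", "other", "other", "TA"]

kappa = cohen_kappa_score(original_codes, second_codes)  # chance-corrected agreement
print(f"kappa = {kappa:.2f}")
```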
