#### *2.2. Results*

We analyse our measures using mixed effects models, implemented in R (R Core Team 2013) with lme4 (Bates et al. 2015), including context (typical/atypical) and round (first/final) as deviation-coded binary predictors, as well as their interaction. We use the maximal model (including all random slopes and intercepts) that allows convergence, with random intercepts for item and for participant nested within pairs. Where models do not converge, we (i) test model fit with different optimizers, (ii) remove correlations between random slopes and intercepts, and (iii) remove the random slopes with the lowest variance. The full specification for each model can be found at https://osf.io/qzgjt (accessed on 21 March 2022).
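To make the deviation coding concrete (a sketch only, with hypothetical helper names; the analyses themselves were run in R with lme4), each binary predictor can be coded as ±0.5, so that main effects are estimated at the grand mean and the interaction term is simply the product of the two codes:

```python
# Illustrative sketch of deviation coding for two binary predictors
# (context: typical/atypical; round: first/final). Function and variable
# names are hypothetical, not taken from the analysis scripts.
def dev_code(level, levels):
    """Map a binary factor level to -0.5/+0.5 (deviation coding)."""
    return -0.5 if level == levels[0] else 0.5

CONTEXTS = ("typical", "atypical")
ROUNDS = ("first", "final")

rows = []
for context in CONTEXTS:
    for rnd in ROUNDS:
        c = dev_code(context, CONTEXTS)
        r = dev_code(rnd, ROUNDS)
        # The interaction predictor is the product of the two codes.
        rows.append({"context": c, "round": r, "context:round": c * r})

for row in rows:
    print(row)
```

Under this coding, each main effect estimates the average difference between factor levels collapsed across the other factor, which is why the intercepts reported below can be read as grand means.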

**Sequence length.** First, we analyse the overall length of gesture sequences for typical and atypical scenes (Figure 5a), using a mixed effects Poisson regression model for count data. A model including both round and scene context, as well as their interaction, demonstrated a better fit than a reduced model (χ<sup>2</sup> = 8.47, *p* = 0.003). The model revealed a significant main effect of context, such that sequences for typical scenes were shorter than those for atypical scenes (β = −0.39, SE = 0.08, z = −5.14, *p* < 0.001), and an interaction between round and context (β = −0.21, SE = 0.07, z = −2.92, *p* = 0.003). That is, participants produce longer gesture sequences for atypical than for typical scenes, but this difference diminishes over rounds as participants converge on conventional ways to communicate targets in the atypical contexts.
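The model comparisons reported throughout are likelihood-ratio tests between nested models. For models differing by a single parameter, the statistic is twice the difference in log-likelihoods, referred to a χ² distribution with one degree of freedom. A minimal sketch (the log-likelihood values below are hypothetical, chosen only to yield a χ² of 8.47 as in this comparison; the real tests were run on the fitted lme4 models in R):

```python
import math

def lrt_pvalue_1df(ll_reduced, ll_full):
    """Likelihood-ratio test for nested models differing by one parameter.

    The statistic 2 * (ll_full - ll_reduced) follows a chi-square
    distribution with 1 df; for 1 df the survival function reduces to
    erfc(sqrt(x / 2)), so no external stats library is needed.
    """
    chi2 = 2.0 * (ll_full - ll_reduced)
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical log-likelihoods for a reduced and a fuller model:
chi2, p = lrt_pvalue_1df(-512.0, -507.765)
print(round(chi2, 2), round(p, 3))  # prints: 8.47 0.004
```

The same logic underlies the χ² values quoted for each model comparison below.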

**Figure 5.** Mean gesture sequence length in terms of the number of individual gestures produced in a sequence (**a**) and proportion of gesture sequences with a target action (TA) in final position (**b**), shown for each round and each context (typical/atypical).

**Target action position.** We assess differences in how target actions are positioned in a gesture sequence, using a logistic mixed effects model to analyse how often target action gestures appear in the final position in a sequence (Figure 5b). For example, in a camera event, does the target action gesture (taking a photo with a camera) appear at the end of a gesture sequence or elsewhere in the sequence? We present here a model including only context as a fixed effect, as including round did not improve model fit (χ<sup>2</sup> = 1.21, *p* = 0.27). Participants showed a strong preference for producing target actions at the end of the sequence in typical contexts, and rarely produced target actions at the end of the sequence in atypical contexts (β = 10.97, SE = 1.72, z = 6.36, *p* < 0.001).

In our remaining analyses, we focus on gestures that are directly comparable across typical and atypical scenes—those coded as TA gestures. Though some responses did include multiple TA gestures, we include only the first instance of each TA gesture produced in a sequence (only ~11% of all trials contained more than one TA gesture within the same sequence).

In the following measures, we analyse how often participants' productions differ between typical and atypical contexts based on the four formal properties of gestures we coded: base hand use, gesture location, gesture size, and repetitions. If participants produce distinctions based on scene type, we expect typical contexts to elicit verb-like gestures and atypical contexts to elicit noun-like gestures, varying the gesture properties in ways similar to those found in natural sign languages (i.e., more base hand use for verbs, more repetitions for nouns).

**Base hand use.** The proportion of scenes in which participants use a base hand for each round and context is illustrated in Figure 6a. We analysed the presence of base hand gestures at each trial using a logistic mixed effects model; the model including round did not show improved fit over the model including only context (χ<sup>2</sup> = 0.79, *p* = 0.37). The model revealed a significant main effect of context, with base hand use more common in typical than atypical scenes (β = 3.19, SE = 0.96, z = 3.32, *p* < 0.001).

**Figure 6.** Gesture form analyses for experiment 1, showing base hand use (**a**), proportion of target action gestures produced in the same location for typical and atypical contexts (**b**), gesture size (**c**) and repetitions shown for iterable and non-iterable target actions (**d**).

**Location.** We used a logistic mixed effects model to analyse whether, at each trial, participants gestured target actions in the *same* location across typical and atypical contexts (see Figure 6b). Model comparison indicated that including round did not improve fit compared to the null model (χ<sup>2</sup> = 0.49, *p* = 0.48). The model revealed a significant intercept (β = 1.39, SE = 0.44, z = 3.17, *p* = 0.002), suggesting that, on average, participants gesture TAs in the same location across contexts.

**Size.** We analyse gesture size as how often participants produce target action gestures with path movements (shown in Figure 6c), using a logistic mixed effects model. Models including only context (χ<sup>2</sup> = 0.25, *p* = 0.61) and only round (χ<sup>2</sup> = 0.48, *p* = 0.49) did not improve fit over a null model. The grand mean from the model intercept did not suggest a reliable preference for path movements overall (β = 1.28, SE = 1.81, z = 0.71, *p* = 0.48).

**Repetitions.** We analyse how often target actions are repeated in gestures across typical and atypical contexts using a logistic mixed effects model, adding an additional deviation-coded predictor (including all interactions) of iterability. Some of the events can elicit target actions that can be, and typically are, iterated (e.g., a hammering gesture); other events typically achieve their goal with one movement and thus are not usually iterated actions (e.g., putting on a ring). Our findings are illustrated in Figure 6d. A model including all three main effects without interaction terms suggested improved fit over a reduced model without round (χ<sup>2</sup> = 5.97, *p* = 0.01). We found a significant main effect of scene type, such that gestures for typical scenes were produced more often with repetitions than gestures for atypical scenes (β = 0.92, SE = 0.35, z = 2.65, *p* = 0.008). We also found a main effect of iterability, with non-iterable items demonstrating fewer repetitions (β = −4.23, SE = 0.77, z = −5.47, *p* < 0.001).

**Convergence.** Finally, we analyse the extent to which communication between partners has affected the gestures they produce between the first and final production rounds. We compare gestures produced across pairs of participants (paired in the interaction stage) with pseudo-pairs (matched pairs of non-interacting participants) to assess the specific role communication plays in shaping the systems participants produce. We compared the pairs on the four formal properties (base hand, location, size, and repetitions) for target action gestures, and calculated the proportion of those properties that pairs converge on for each target scene (illustrated in Figure 7). We analyse the proportion of form parameters that are the same for paired participants using a logistic mixed effects model, with the proportions weighted by the number of parameters, including fixed effects of round and pair type (both deviation-coded). We include by-pair and by-item random intercepts, with a random slope of round for the by-item intercept (including a random slope for the by-pair intercept led to singular fit). The model including the interaction term did not improve fit over the model without it (χ<sup>2</sup> = 0.24, *p* = 0.63). Inspection of the model indicated main effects of round (β = 0.10, SE = 0.04, z = 2.41, *p* = 0.02) and pair type (β = 0.19, SE = 0.06, z = 3.13, *p* = 0.002)—participants produce gestures more similar to those of other participants in the final round than in the first but, importantly, similarity is greater for interacting pairs than for pseudo-pairs.
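The convergence measure itself can be sketched simply: for a given target scene, count how many of the four coded form parameters two participants' target action gestures share. The coding values below are hypothetical examples for illustration, not items from our data:

```python
# Sketch of the convergence measure: the proportion of the four coded
# form parameters (base hand, location, size, repetitions) on which two
# participants' target action gestures for the same scene agree.
FORM_PARAMETERS = ("base_hand", "location", "size", "repetitions")

def proportion_shared(gesture_a, gesture_b):
    """Proportion of form parameters coded identically for two gestures."""
    shared = sum(gesture_a[p] == gesture_b[p] for p in FORM_PARAMETERS)
    return shared / len(FORM_PARAMETERS)

# Hypothetical codings for one scene, agreeing on 3 of 4 parameters:
gesture_1 = {"base_hand": True, "location": "neutral",
             "size": "path", "repetitions": True}
gesture_2 = {"base_hand": True, "location": "neutral",
             "size": "no_path", "repetitions": True}
print(proportion_shared(gesture_1, gesture_2))  # prints: 0.75
```

Computing this proportion for real pairs and for pseudo-pairs yields the per-scene values entered into the weighted logistic model.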

**Figure 7.** Similarity in form parameters across rounds for real-paired dyads and pseudo-dyads (i.e., pairs who did not interact during the experiment).

#### *2.3. Experiment 1 Summary*

In experiment 1, we examined gestural production in contexts aiming to elicit noun-like and verb-like gestures for target objects, investigating how participants' improvised gestures change after interaction with a partner. Our findings suggest that, even in improvised gestures, participants make distinctions between descriptions of targets designed to elicit nouns and targets designed to elicit verbs. Gesture sequences describing typical scenes tended to be shorter than those describing atypical scenes. Gestures for target actions were primarily produced in final position for typical (i.e., verb-eliciting) contexts, but were rarely produced in final position for atypical (i.e., noun-eliciting) contexts. We also found that target action gestures for typical targets were more frequently produced with a base hand gesture than target action gestures for atypical targets, and typical targets were more frequently repeated than atypical targets. Our findings for target position and base hand use reflect distinctions found in ASL, NSL and those made by Nicaraguan homesigners, as reported by Abner et al. (2019). These patterns suggest that some features distinguishing nominal and predicate forms can emerge even in the earliest stages of a communication system. However, we do not find distinctions based on gesture location or gesture size, nor do we see further systematisation of the distinctions following communication. Analysing the convergence between interacting dyads and pseudo-pairs of participants reveals the role interaction plays. We find some patterns of convergence across pseudo-pairs, highlighting general pressures (i.e., iconicity) that may affect gesture similarity. However, interacting participants produce gestures that are more similar to each other's than pseudo-pairs of participants do, suggesting that similarities between the gestures produced by interacting participants cannot be attributed solely to iconic representations that would be similar across all participants.

In experiment 2, we further explore how communicative constraints affect the distinctions between gestures produced to signal typical and atypical contexts. In experiment 1, we used a constrained model of communication, a reductionist operationalisation in which participants take set turns to produce and interpret gestures, and receive comprehensive feedback on their successes and errors. As discussed by Kocab et al. (2018), it is possible that some of the constraints in operationalisations of communicative behaviour do not always map well onto natural language use, and that currently, such operationalisations do not account for the full range of behaviours that comprise communication in situated, face-to-face interactions. Such interactions in the real world involve conventions related to turn-taking (Stivers et al. 2009), alignment (Garrod and Pickering 2009) and repair (Dingemanse et al. 2015) that are not possible to enact in the reduced operationalisation we use in experiment 1. In experiment 2, we investigate the same research questions using a more ecologically valid operationalisation of communication, in which turn-taking and feedback about communicative success or failure are under the control of the interacting participants themselves. Furthermore, we contrast the interactive scenario with a condition in which individual participants repeatedly improvise gestures for our event vignettes, without interacting with a partner.

## **3. Experiment 2**
