#### *3.2. Results*

Analysis for experiment 2 largely follows the analysis for experiment 1, with the additional inclusion of group (individual vs. dyad) as a deviation-coded fixed effect, along with context, round, and all interaction terms. Model selection follows the same procedure as experiment 1 and a full specification for each model can be found at https://osf.io/qzgjt (accessed on 21 March 2022).
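The deviation coding mentioned above can be illustrated with a short sketch. This is not the authors' analysis code (the full model specifications are at the OSF link); the column and level names below are hypothetical, and only the coding scheme itself is from the text.

```python
# Minimal sketch of deviation (sum) coding for two-level factors such as
# group (individual vs. dyad) and context (typical vs. atypical).
# Column and level names are hypothetical, for illustration only.
import pandas as pd

def deviation_code(series: pd.Series, levels: list) -> pd.Series:
    """Map a two-level factor to -0.5/+0.5 so that other effects are
    estimated at the grand mean rather than at a reference level."""
    assert len(levels) == 2 and series.isin(levels).all()
    return series.map({levels[0]: -0.5, levels[1]: +0.5})

trials = pd.DataFrame({
    "group":   ["individual", "dyad", "dyad", "individual"],
    "context": ["typical", "atypical", "typical", "atypical"],
})
trials["group_dev"] = deviation_code(trials["group"], ["individual", "dyad"])
trials["context_dev"] = deviation_code(trials["context"], ["typical", "atypical"])
```

With this coding, the two levels sum to zero across a balanced design, so each main effect is interpretable as an average over the levels of the other factors.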

**Sequence length.** The model including only scene type demonstrated an improved fit over the null model (χ<sup>2</sup> = 25.20, *p* < 0.001); adding additional fixed effects did not improve model fit. Inspection of the model suggests a main effect of context (β = −0.48, SE = 0.06, z = −7.64, *p* < 0.001), with participants producing shorter sequences on average for typical contexts compared to atypical contexts (illustrated in Figure 9a).
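The chi-squared model comparisons reported throughout are likelihood-ratio tests between nested models. As a sketch only (the degrees of freedom, assumed here to be 1 for a single added parameter, are not stated in the text):

```python
# Sketch of the likelihood-ratio test behind the reported chi-squared
# model comparisons. The statistic is 2 * (logLik_full - logLik_reduced);
# its degrees of freedom equal the number of extra parameters in the
# fuller model (assumed here to be 1).
from scipy.stats import chi2

def lrt(loglik_reduced: float, loglik_full: float, df: int):
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2.sf(stat, df)

# Recovering the reported significance level for the sequence-length
# comparison (chi-squared = 25.20) under the assumed df = 1:
p = chi2.sf(25.20, df=1)  # well below 0.001
```

The same computation underlies every comparison reported in this section; only the statistic and the number of added parameters change.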

**Figure 9.** Mean gesture length (**a**) and proportion of gesture sequences with the target action (TA) in final position (**b**), shown for each round and each context (typical/atypical).

**Target action position.** Again, the model including only scene type (but no other additional fixed effects) showed improved fit over the null model (χ<sup>2</sup> = 41.62, *p* < 0.001), with target actions more likely to appear at the end of the sequence in typical contexts than in atypical contexts (Figure 9b; β = 6.00, SE = 0.49, z = 12.23, *p* < 0.001).

As in experiment 1, the remaining analyses focus on the first TA gesture found in each sequence, comparing matched TA gestures across typical and atypical trials.

**Base hand use.** We analysed the presence of base hand gestures (see Figure 10a) in each trial using a logistic mixed effects model. The model including all three main effects, as well as an interaction between round and group, showed improved fit over the model without the interaction term (χ<sup>2</sup> = 5.56, *p* = 0.02). The model revealed a significant main effect of context, with base hand use more common in typical than atypical contexts (β = 2.30, SE = 0.63, z = 3.65, *p* < 0.001), as well as a significant interaction between round and group (β = −1.12, SE = 0.46, z = −2.42, *p* = 0.02), indicating that base hand use increased between the first and final round for dyads only.

**Figure 10.** Gesture form analyses for experiment 2, showing base hand use (**a**), proportion of target action gestures produced in the same location for typical and atypical contexts (**b**), and gesture size (**c**) for both dyads and individuals.

**Location.** The proportion of typical and atypical targets gestured in the same location is shown in Figure 10b. Logistic mixed effects models including fixed effects of either group or round did not improve fit over the null model (group: χ<sup>2</sup> = 2.62, *p* = 0.11; round: χ<sup>2</sup> = 0.01, *p* = 0.90). The null model revealed a significant intercept (β = 1.88, SE = 0.49, z = 3.81, *p* < 0.001), indicating an overall preference across groups to place target action gestures in the same location in typical and atypical contexts.

**Size.** Figure 10c indicates that participants produce a high proportion of path gestures across rounds and contexts, for both dyads and individuals. Analysis using a logistic mixed effects model to predict path gesture production did not find an improved fit over the null model when including either context (χ<sup>2</sup> = 0.55, *p* = 0.46), or context and round (χ<sup>2</sup> = 1.07, *p* = 0.30) as fixed effects, suggesting no reliable changes in the preference for path gestures across contexts and rounds.

**Repetitions.** We show the proportion of trials with repeated targets in Figure 11, and use a logistic mixed effects model including context, round, group, and iterability of the target action as fixed effects, along with their interactions. The model including all fixed effects and interaction terms showed improved fit over a reduced model (χ<sup>2</sup> = 26.32, *p* = 0.006). Inspection of the model revealed a main effect of iterability (β = −4.22, SE = 0.72, z = −5.87, *p* < 0.001) and an interaction between round and iterability (β = −1.39, SE = 0.36, z = −3.92, *p* < 0.001). Participants in both groups produce more repetitions for iterable than for non-iterable target actions; for iterable items, repetitions increase between the first and final round, but for non-iterable items they decrease between rounds.

**Figure 11.** Proportion of repeated targets, shown for iterable and non-iterable target actions, and for participants in the dyadic and individual conditions.

**Convergence.** We measure convergence between pairs and pseudo-pairs on the different formal properties of gestures as in experiment 1. In addition, we include pseudo-pairs created from individual participants (who never communicate with a partner) matched with other participants in the same condition. Figure 12 shows the mean form similarity for each set of paired participants. We analyse form convergence using a logistic mixed effects model as described in experiment 1. We include round and pair type as fixed effects, with round deviation-coded, and by-pair and by-item random intercepts, each with a random slope for round. Model comparison indicated that the model with the interaction between round and pair type did not improve fit over a reduced model without the interaction (χ<sup>2</sup> = 0.12, *p* = 0.94). Inspection of the reduced model suggested a significant effect of round (β = −0.13, SE = 0.06, z = −2.19, *p* = 0.03), indicating that, across groups, similarity decreased slightly between the first and final production rounds. We also find a significant effect of pair type for the pseudo-dyads (β = −0.19, SE = 0.07, z = −2.78, *p* = 0.005), but not for the pseudo-individuals (β = −0.002, SE = 0.07, z = −0.04, *p* = 0.97). That is, participants in the dyadic condition who did not interact with each other demonstrated lower form similarity than participants who did interact with each other, while individuals who only produced gestures in isolation showed similar levels of convergence to participants who communicated together in dyads.
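The pseudo-pair construction can be sketched as a re-pairing of participants within a condition that avoids every real pairing. The function below is a hypothetical illustration using rejection sampling; the authors' actual matching procedure may differ (e.g., exhaustive rather than random pairing), and the participant IDs are invented.

```python
# Hypothetical sketch: build pseudo-pairs by re-pairing participants
# within a condition such that no pseudo-pair reproduces a real pair.
# Rejection sampling is used for simplicity; IDs are illustrative.
import random

def make_pseudo_pairs(real_pairs, seed=0):
    people = [p for pair in real_pairs for p in pair]
    # Look up each participant's real partner.
    partner = {}
    for a, b in real_pairs:
        partner[a], partner[b] = b, a
    rng = random.Random(seed)
    while True:
        shuffled = people[:]
        rng.shuffle(shuffled)
        pseudo = list(zip(shuffled[::2], shuffled[1::2]))
        # Accept only pairings in which no real pair is reproduced.
        if all(partner[a] != b for a, b in pseudo):
            return pseudo

pseudo = make_pseudo_pairs([(1, 2), (3, 4), (5, 6)])
```

For pseudo-individuals the constraint is vacuous, since participants in the individual condition never had a real partner, so any pairing within that condition qualifies.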

**Figure 12.** Similarity in form parameters across rounds for real-paired dyads (**left panel**), pseudo-dyads made up of participants in the dyadic condition paired with participants with whom they did not interact (**middle panel**), and participants in the individual condition paired with other individuals with whom they did not interact (**right panel**).

#### *3.3. Experiment 2 Summary*

In experiment 2, we investigated the emergence of distinctions between gestures communicating noun-like and verb-like meanings during improvisation by individuals and following interaction between pairs of participants. We used an operationalisation of interaction that allowed for more unconstrained and organic turn-taking and repair strategies between participants than in experiment 1. We further compared productions by dyads before and after interaction with productions by individuals who repeatedly produced gestures over three rounds but without communicating the target scenes to a partner. We replicated findings from experiment 1. Participants produced shorter gesture sequences when describing targets in a typical context than in an atypical context. Participants were also more likely to place gestures for target actions in the final position of a sequence, and to use a base hand gesture, when describing typical (i.e., verb-like) contexts than atypical (i.e., noun-like) contexts. Finally, we found that the frequency of repetitions maps iconically onto the iterability of the event, with iterable items gestured with more repetitions than non-iterable items. Notably, our findings from individuals (not in dyads) align in key ways with those from dyads and from experiment 1, suggesting that, while communication allows participants in pairs to converge on a shared system, the distinctions that do emerge are not driven by communication but can emerge through improvisation alone.

#### **4. General Discussion**

The categories of nouns and verbs are among the basic elements of human language (Bickerton 1990; Hockett 1977; Jackendoff 2002). Here, we asked whether systematic formal distinctions between noun- and verb-like forms emerge in improvised gestures, and whether those distinctions further conventionalise over time and through interactions. In particular, our work closely follows that reported by Abner et al. (2019), tracking how similar features (base hand, size of movement, and repetition) distinguish noun and verb signs in ASL, NSL and Nicaraguan homesigners. Table 1 provides a summary of our findings in comparison to those reported by Abner et al. (2019).


Across both experiments we report, participants make distinctions between gestures they produce for targets appearing in typical contexts (designed to elicit verb-like gestures) and atypical contexts (designed to elicit noun-like gestures). Gesture sequences for typical contexts are shorter than gesture sequences for atypical contexts. This difference in length is largely driven by the additional verb gesture used to describe the action in atypical contexts (e.g., dig, drop). The target object and target action can be conflated and articulated simultaneously for typical contexts (e.g., a *taking a photo* gesture contains information about both the object, a camera, and the action, taking a photo with it). In contrast, the atypical action must be specified separately from the target object (e.g., digging with a camera requires a *dig* gesture and also a *camera* gesture). The conflation of object and action in descriptions of typical contexts is not inevitable and, indeed, there are some examples of participants who produce gesture sequences where they specify object information in one gesture (e.g., tracing a rectangular shape to indicate the camera) before producing a target action gesture. However, since producing object and action information in a single gesture is sufficient to describe the typical contexts in this study, object-only information is often left out of descriptions of typical contexts, rendering those descriptions shorter than descriptions of atypical contexts.

When we focus only on gestures for target actions that capture the same property in both typical and atypical contexts (e.g., pushing the button on a camera for the *taking-a-picture* event and for the *digging-with-a-camera* event), we find that target actions tend to appear in the final position of a gesture sequence for typical contexts, but not for atypical contexts. Previous silent gesture experiments have suggested that participants from different language backgrounds show a preference for verb-final sequences for non-reversible events (Goldin-Meadow et al. 2008; Hall et al. 2013; Meir et al. 2014; Schouwstra and de Swart 2014), and verb-final order (specifically, SOV) is considered grammatical across all documented sign languages (Napoli and Sutton-Spence 2014). Finally, our results dovetail with those reported by Abner et al. (2019), who found that signers across all three groups they studied (ASL signers, NSL signers, and Nicaraguan homesigners) produced verb (but not noun) targets in the utterance-final position. Our findings are therefore consistent with an interpretation that target action gestures act like verbs in typical contexts, but like nouns in atypical contexts.

We also find that participants across experiments and conditions produce more base hand gestures for target actions in typical contexts than in atypical contexts. Abner et al. (2019) reported findings for distinctions made using base hand articulation, though their findings are somewhat complex. Their results suggested that, for NSL signers, only those who had entered the signing community relatively late (when a language model had been established) used base hand articulation more often with verb targets than noun targets. There was a tendency for a similar pattern in Nicaraguan homesigners, but only in some of the individuals. Notably, Abner et al. (2019) found that ASL signers demonstrated very limited use of base hand gestures for both verb and noun targets, suggesting that the grammatical function and role of the base hand can vary cross-linguistically. Where they are used, Abner et al. (2019) suggest, base hand gestures iconically represent additional event arguments (such as the wall being hammered against), not properties inherent to an object, and therefore we might expect them to appear more frequently in verb-like productions than in noun-like productions. Indeed, many of the strategies used to distinguish nouns and verbs cross-linguistically in sign languages reflect iconic features of objects and events. These features can then be systematised to distinguish grammatical categories (Wilbur 2008). For example, repetition can iconically represent event iterability, as our participants demonstrate: more repeated gestures are used when describing iterable events than non-iterable events. Findings from Nicaraguan homesigners and NSL cohort 1 signers indicate similar patterns—repetitions do not distinguish noun from verb targets, but do (not surprisingly) signal iterability. In contrast, ASL signers and NSL signers who entered the signing community later not only use more repetition overall for iterable items, but also use repetition to distinguish noun and verb targets. Together, these findings suggest that the grammatical use of repetitions to distinguish word classes may develop over time. Abner et al. (2019) further suggest that using repetitions as a grammatical marker may emerge from the iconic use of repetitions. Some NSL signs for objects, which were associated with iterable actions, were repeated; as a result, repetition became associated with, and a marker for, nouns. In comparison, our finding from experiment 1 that participants produce more repetitions for typical (verb) than atypical (noun) targets runs counter to this pattern, though the pattern we find is also attested in some sign languages (Kubus 2008; Schreurs 2006). This finding suggests that the grammaticalisation of repetitions into word class markers, while possibly grounded in the iconic relation to iterability, may be flexible in how it is applied to distinguish noun and verb forms. Certainly, across both experiments 1 and 2, repetitions strongly (and iconically) distinguish iterable from non-iterable events.

We do not find that participants make any distinctions based on the two remaining form properties we analysed—the size of target action gestures, or the location of target gestures. In both cases, iconic representation of events would predict that distinctions could emerge based on either property. For example, Kimmelman (2009) suggests that verb forms may be derived from embodied enactments of events, which may rely on larger, iconic movements rather than on more economical, reduced forms. Similarly, locations inherent to an event may be preserved in a verb or action sign (such as holding a camera to the face to take a photo) but produced in a neutral space for an object sign (as the location is not intrinsically linked to the object alone). That we do not find distinctions based on these parameters is not surprising for a number of reasons. Firstly, though common strategies such as size, location and repetition are used across sign languages to distinguish noun and verb forms, and have been hypothesised to have their bases in shared, iconic representations, not all languages mark grammatical categories across all parameters. Indeed, the use and perception of some distinctions, such as the size of the signing space, can vary depending on the signer's cultural or linguistic experience (Emmorey and Pyers 2017; McCaskill et al. 2011; Mirus et al. 2001). In addition, some representations may be more flexible in earlier stages of language emergence, as our experiment aims to model. For example, although natural word order preferences are widely documented in silent gestures (Goldin-Meadow et al. 2008; Hall et al. 2013; Schouwstra and de Swart 2014), and word order preferences appear early in emerging sign languages (Napoli and Sutton-Spence 2014; Sandler et al. 2005), other properties may arise later through interaction with communities and transmission to new learners forming a linguistic community.
In particular, we would expect spontaneous gestures, on the whole, to use a larger gesture space than conventionalised sign systems (Flaherty et al. 2020; Namboodiripad et al. 2016), which may obscure more fine-grained gesture size distinctions used across scene types. That is, size distinctions may first require a reduction in the gesture/signing space to be discernible. Indeed, in experiment 2, we find that participants show a strong preference to produce larger path gestures, regardless of context—there is little variability here from which a distinction based on context could emerge.

Across both experiments 1 and 2, we find that, although the distinctions participants produce may be grounded in iconic representations of events, participants who interact with each other converge on a shared system, producing gestures more similar to each other than would be expected if similarity were based on iconicity alone. In particular, interacting participants produce similar forms in both experiments 1 and 2 despite our two different operationalisations of communication, suggesting that the act of producing a communicative signal that is then interpreted by a partner is sufficient for conventionalised systems to emerge, regardless of the behaviours available in face-to-face interaction that might otherwise shape or facilitate the emerging communicative system (Healey et al. 2007; Roberts and Levinson 2017). However, the distinctions between typical and atypical targets that emerge across participants do so at the earliest stage of improvisation. These distinctions map most closely onto the findings reported by Abner et al. (2019) for Nicaraguan homesigners, who produce distinctions between noun and verb targets that are still highly variable across individuals, except for the strong preference (also found here) to place verb-like productions at the end of a sequence. Furthermore, our findings indicate that communication in itself is *not* sufficient for the further systematisation of these distinctions that we see in ASL and later cohorts of Nicaraguan Sign Language—communication in our case did *not* lead to substantial additional development of the gestures produced to signal typical vs. atypical targets. Consistent with these findings, previous work suggests that *both* using communicative signals in interaction and learning those signals by naive users of the system shape the emergence of categorical structure (Motamedi et al. 2019; Nölle et al. 2018; Raviv et al. 2019; Silvey et al. 2019). Moreover, it is the repetition of these processes over time that leads to the cultural evolution of systematic distinctions (Kirby et al. 2014; Mesoudi and Thornton 2018; Tamariz and Kirby 2016). Although communicative systems at all stages distinguish between noun-like and verb-like targets, manual communication systems evolve noun and verb *categories* marked by multiple features (see Goldin-Meadow et al. 1994, for evidence of noun–verb categories in a child homesigner in the United States). As such, future work is needed to test how preferences to distinguish noun and verb forms evolve through repeated interaction and iterated learning.

Finally, in experiments 1 and 2, we contrasted two experimental approaches to modelling communicative behaviour. In experiment 1, we operationalised interaction using a reduced director–matcher paradigm in which interacting participants took set turns to produce and interpret gestures (they selected one meaning from a restricted set of four possible interpretations), and all participants received feedback about whether their interpretation was successful. In experiment 2, the operationalisation of interaction was less restrictive, with participants free to negotiate turn-taking and repair strategies, and no limit was put on the meanings they could consider. Although there were small differences in the systems that participants produced (for example, participants in experiment 1 produced more repetitions for typical actions), our results from the two experiments closely align with each other, highlighting the robustness of the improvisation paradigm.

A final, important point is that we find similarities between the noun–verb distinctions created by participants in both experimental paradigms and the noun–verb distinctions found in the naturally emerging language studied in Nicaragua (Abner et al. 2019). For example, early distinctions were based on the order of gestures in a sequence and the use of base hand gestures to mark typical (verb-eliciting) contexts. Experimental models can rarely provide a perfect analogue of language emergence in the real world (Kocab et al. 2018), not least because the participants all know a language. Moreover, the experimental paradigm contains time- and task-related constraints that do not directly replicate language use in the real world. However, our experiments exemplify how such methods can be used alongside data from natural languages to test specific predictions about the processes and mechanisms that drive language evolution. A growing body of work uses these paradigms, informed by the available data from emerging sign languages, to explore key questions about how languages emerge (Hwang et al. 2016; Meir et al. 2014; Motamedi et al. 2019, 2021; Özyürek et al. 2015).
