**3. Results**

In this section, we first present the results of two single runs in order to explain the measures used and to give an intuition of what one run of the model looks like. Here, we explain the results of the language games—that is, for each language game, it is recorded whether the game ends in form success (step two in Figure 5), culturally salient features success (step three in Figure 5) or bit update (step four in Figure 5). Additionally, we show the mean degree of iconicity and the mean lexical variability for each run.

Following these examples, the role of shared context is investigated by altering the number of groups (*n\_groups*), and the effect of population size is investigated by altering the number of agents (*n\_agents*). We consider the effect of these parameters on the mean lexical variability and the mean degree of iconicity. Each simulation presented consists of 100 repetitions. The remaining parameter explorations, which investigate the effect of the number of concepts (*n\_concepts*), the number of bits (*n\_bits*) and the initial degree of overlap between the culturally salient features and the form (*initial\_degree\_of\_overlap*), can be found in Appendix A.

#### *3.1. Two Example Runs*

To show what one run of the model entails, we present the results from two single model runs. The two runs differ in only one parameter, the number of groups (*n\_groups*), which determines which set of culturally salient features an agent has. The first run presented consists of one group and the second of ten groups. The other parameters are the following:


#### 3.1.1. Language Game Results

First, we present a model run consisting of one group (*n\_groups* = 1), meaning that all agents belong to the same group. This results in all agents having the same set of culturally salient features.

In the language game step of the model, as shown in Figure 5, there are three ways in which the language game can end: 1. the concept associated with the sender's form matches the concept associated with the receiver's form that is closest to the sender's form (*form success*, step 2 in Figure 5); 2. the concept associated with the sender's form matches the concept associated with the receiver's culturally salient features that are closest to the sender's form (*culturally salient features success*, step 3 in Figure 5); or 3. for the receiver's form corresponding to the concept associated with the form communicated by the sender, a bit which does not match the sender's is updated (*bit update*, step 4 in Figure 5). These three possible endpoints of the language game are visualized in Figure 7 for the first 10 stages (left) and over all 2000 model stages (right). To further explain: at each time step, each of the 10 agents initiates 1 language game, which may end in form success, culturally salient features success or bit update, and the proportions of these language game results are visualized as a bar plot. For example, in the run presented in Figure 7, at stage 1, out of the 10 language games played, 8 resulted in form success, 1 resulted in culturally salient features success and 1 resulted in a bit update.
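
The three-step resolution described above can be sketched in Python. This is a minimal illustration, not the model's actual implementation; the dictionary-based agent representation, the Hamming distance and the tie-breaking behaviour of `min` are all our assumptions:

```python
import random

def hamming(a, b):
    """Number of differing bits between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def play_game(sender, receiver, concept):
    """One language game; returns how it ended.

    `sender` and `receiver` are dicts with 'forms' and 'cs_features',
    each mapping a concept to a bit tuple (hypothetical structure).
    """
    form = sender['forms'][concept]

    # Step 2: form success -- the receiver's closest stored form maps
    # to the intended concept.
    closest = min(receiver['forms'],
                  key=lambda c: hamming(receiver['forms'][c], form))
    if closest == concept:
        return 'form success'

    # Step 3: CS features success -- the receiver's closest culturally
    # salient features map to the concept (the iconic-inferential pathway).
    closest = min(receiver['cs_features'],
                  key=lambda c: hamming(receiver['cs_features'][c], form))
    if closest == concept:
        return 'culturally salient features success'

    # Step 4: bit update -- the receiver flips one mismatching bit of its
    # own form for this concept to match the sender's form.
    own = list(receiver['forms'][concept])
    mismatches = [i for i, (x, y) in enumerate(zip(own, form)) if x != y]
    if mismatches:
        i = random.choice(mismatches)
        own[i] = form[i]
        receiver['forms'][concept] = tuple(own)
    return 'bit update'
```

In this sketch, the receiver's own form for the concept moves one bit closer to the sender's form whenever neither pathway succeeds, mirroring step 4 of the game.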

**Figure 7.** The proportion of language game results (form success, culturally salient features success or bit update) with 10 agents all belonging to the same group (*n\_groups* = 1) for the first 10 model stages (**left**) and over the 2000 stages (**right**). At each stage, 10 language games were played. The x-axis starts at stage one because in stage zero there is only model set up and no language game. Across all stages of this run, the majority of the language games end with form success, with a small proportion ending with culturally salient features success (here abbreviated as CS features success). Over 2000 stages, shown on the right, the results were averaged over 50 consecutive model stages (i.e., each bar of the histogram represents the mean of 50 stages).

It is apparent that the vast majority of language games in this run of the model end after form success. In this run, as all agents share the same set of culturally salient features (*n\_groups* = 1) and all agents create their forms to be highly iconic (*initial\_degree\_of\_overlap* = 0.9), the forms of the agents are highly similar at the start of the simulation. This similarity between the agents' forms results in a majority of language games ending with form success. Thus, even though the forms stay highly iconic (they are barely changed, as there is hardly any bit updating), the agents do not use the iconicity present (language games ending in culturally salient features success), as the language game typically ends with form success. However, throughout the simulation there is still a small proportion (around 10%) of language games ending after culturally salient features success. Few language games end with a bit update.

Second, we present a model run consisting of 10 groups (*n\_groups* = 10), meaning that each agent is randomly assigned to 1 of the 10 groups. Because agents are randomly assigned to a group, this does not guarantee that all agents are in a different group. Once assigned to a group, agents are initialized with the set of culturally salient features generated for that group.
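
The set-up described for these two runs (group-level feature generation, random group assignment and iconic form creation) might look as follows. This is a sketch under our assumptions; in particular, interpreting *initial\_degree\_of\_overlap* as a per-bit copy probability is our reading, not confirmed by the text:

```python
import random

def init_population(n_agents=10, n_groups=10, n_concepts=5, n_bits=10,
                    initial_degree_of_overlap=0.9, seed=0):
    """Sketch of model set-up: each group gets its own set of culturally
    salient features; agents are randomly assigned to a group and create
    forms overlapping those features in ~90% of bits."""
    rng = random.Random(seed)
    # One independently generated feature set per group.
    group_features = {
        g: {c: tuple(rng.randint(0, 1) for _ in range(n_bits))
            for c in range(n_concepts)}
        for g in range(n_groups)
    }
    agents = []
    for _ in range(n_agents):
        g = rng.randrange(n_groups)   # random assignment: groups may repeat
        cs = group_features[g]
        forms = {}
        for c, feats in cs.items():
            # Copy each feature bit with probability
            # initial_degree_of_overlap, else flip it: a highly iconic form.
            forms[c] = tuple(b if rng.random() < initial_degree_of_overlap
                             else 1 - b for b in feats)
        agents.append({'group': g, 'cs_features': cs, 'forms': forms})
    return agents
```

With `n_groups=1`, every agent receives the same feature set, reproducing the first example run; with `n_groups=10`, feature sets (and hence initial forms) differ across agents.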

Figure 8 shows the results of the language games of 1 model run with 10 agents and 10 groups for the first 10 stages (left) and over 2000 stages (right). For example, in stage 1, 8 language games end with a bit update, 1 ends after culturally salient features success and 1 ends after form success. Over the 2000 stages, it is evident that the proportion of games ending in a bit update decreases and the proportion ending in form success increases. Over time, form success becomes the most prominent result of the language game, though a considerable number of language games still end in bit update. On the other hand, there are fewer language games ending in *culturally salient features success*; it is clearly the most infrequent result.

**Figure 8.** The proportion of language game results (form success, culturally salient features success or bit update) for a model run with 10 agents randomly assigned to 1 of 10 groups (*n\_groups* = 10) for the first 10 model stages (**left**) and over the 2000 stages (**right**). At each stage, 10 language games were played. The x-axis starts at stage one because in stage zero there is only model set up and no language game. The majority of the language games at the start of the simulation end with bit update; later, more end with form success, while a considerable number still end with bit update. Few language games end with culturally salient features success (here abbreviated as CS features success). Over 2000 stages, shown on the right, the results were averaged over 50 consecutive model stages (i.e., each bar of the histogram represents the mean of 50 stages).

In comparing these two example runs, it is evident that the results of the language games with 1 group and 10 groups differ. With 10 groups, bit updates happen much more often than in the run with one group. This is because with one group, if form success is not possible, culturally salient features success often is, as all agents share the same set of culturally salient features. With 10 groups, however, if form success is not possible, agents are likely to end the game with a bit update: because agents are unlikely to share the same culturally salient features, culturally salient features success is unlikely to occur. Thus, these two model runs demonstrate how the number of groups (which determines the set of culturally salient features of the agents) affects the results of the language games, which in turn affect the degree of lexical variability and iconicity across the population.

#### 3.1.2. Lexical Variability and Iconicity

Figure 9 shows the mean lexical variability and iconicity over the 2000 model stages for the run with 1 group (left) and with 10 groups (right). As previously mentioned, the mean lexical variability is calculated by comparing each bit of each form per pair of agents (the distance between two forms is 0 if all bits match and 1 if one or more bits differ), averaged over all agents at each stage. The mean iconicity is calculated by comparing the degree of overlap between each form and the corresponding culturally salient features in an agent's language representation, averaged over all agents at each stage.
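
The two measures can be written down directly from the descriptions above. This is a sketch; the all-pairs averaging over the population and the dict-based agent representation are our assumptions:

```python
from itertools import combinations

def mean_lexical_variability(agents):
    """Mean pairwise form distance across the population: for each pair
    of agents and each concept, the distance is 0 if the forms are
    identical and 1 otherwise."""
    total, count = 0, 0
    for a, b in combinations(agents, 2):
        for c in a['forms']:
            total += 0 if a['forms'][c] == b['forms'][c] else 1
            count += 1
    return total / count

def mean_iconicity(agents):
    """Mean bit-wise overlap between each agent's form and the
    corresponding culturally salient features."""
    total, count = 0, 0
    for a in agents:
        for c, form in a['forms'].items():
            feats = a['cs_features'][c]
            total += sum(x == y for x, y in zip(form, feats)) / len(form)
            count += 1
    return total / count
```

On this reading, a lexical variability of 0.5 means that half of all agent pairs hold differing forms for a concept, and an iconicity of 0.5 corresponds to chance-level overlap between form and features.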

First, when all 10 agents belong to 1 group (as can be seen on the left in Figure 9), the degree of iconicity remains constant throughout the run, above 0.9. The mean lexical variability drops slightly and then stabilizes around 0.5. In contrast, when agents are randomly assigned to 1 of 10 groups, the picture is drastically different; as can be seen on the right in Figure 9, both the mean lexical variability and the degree of iconicity decrease more than when all agents are assigned to the same group. Initially in this case, the lexical variability across the population is nearly 1, i.e., the maximum possible distance between the forms of agents. As the forms of agents are initialized on the basis of their culturally salient features, it makes sense that the lexical variability is maximal given that (most) agents are assigned to different groups. From there, the mean lexical variability drops sharply, indicating more lexical similarity across the population over time. The degree of iconicity also drops but stabilizes above 0.5. Given that the degree of iconicity calculation is performed on a bit-by-bit basis comparing the form to the culturally salient features, 0.5 would represent chance, i.e., an unstructured relationship between the bits of the form and the culturally salient features. Though the degree of lexical variability is initially higher when agents are assigned to 1 of 10 groups, it decreases much faster and continues to do so, whereas when all agents belong to the same group, the degree of lexical variation (after a short drop in the first 100 stages) remains relatively stable.

**Figure 9.** The mean lexical variability and iconicity over the 2000 model stages for one run with 1 group (**left**) and 10 groups (**right**). With all agents belonging to the same group (*n\_groups* = 1), the degree of iconicity remains high, and the mean lexical variability across the population remains relatively constant, with more than half of the forms across the population being different. With 10 groups that agents could be assigned to (*n\_groups* = 10), the degree of iconicity drops and then stabilizes slightly above 0.5, with 0.5 representing an unstructured relationship between the bits of the form and culturally salient features. The mean lexical variability across the population drops sharply and then continues to drop slowly, indicating that forms become more and more uniform in the population over time.

Now that two examples with just one run each have been discussed, we show the results averaged over 100 repetitions, with a focus on lexical variation.

#### *3.2. The Effect of Multiple Groups on Lexical Variation*

Figure 10 shows the mean degree of lexical variability and iconicity over 100 repetitions for different numbers of groups (*n\_groups* = 1, 2, 5 and 10). The results from the examples in the previous section are in line with what is shown here; when there is only one group (i.e., all 10 agents have the same set of culturally salient features), there is already some overlap between forms in the population at stage 0—a lexical variability value of approximately 0.6, indicating that 40% of the forms associated with a concept are identical across the population at the start of the run. Over time, the mean lexical variability does not drop below 0.5. The degree of lexical variability in the population stabilizes more quickly, and at a higher level, than in the simulations with other numbers of groups.

In populations with more groups, the mean lexical variability at the start of the run is high (between 0.8 and 1), as agents belong to different groups and their culturally salient features and hence their forms differ. From this initial point of high lexical variability, there is a sharp decrease in lexical variability. Thus, these populations move quickly towards more uniform form–concept pairings. The number of groups in the population determines at which point the mean lexical variability stabilizes. When there are more groups, the mean lexical variability stabilizes at a lower point. In other words, with more groups, there is more lexical uniformity.

In populations with more groups, and hence more sets of culturally salient features, agents cannot rely on shared culturally salient features to communicate. Thus, as shown in the previous section, agents more often update their forms to be able to communicate successfully with other agents, which results in more uniform form–concept pairings across the population.

**Figure 10.** The mean lexical variability over the 2000 model stages for 100 repetitions of a run with 10 agents, with the number of groups varying per condition. The dark line represents the mean and the shaded area the standard deviation of the 100 repetitions. It is evident that there is a relationship between the number of groups and both the speed of the decrease in lexical variability and the final amount of lexical variability in the population: The more groups in the population, the higher the initial lexical variability (at stage 0) but the lower the final lexical variability (at stage 2000). In addition, when there are fewer groups, the degree of iconicity is higher.

Additionally, there is a clear relationship between the number of groups and the degree of iconicity: With fewer groups in the population, the degree of iconicity is higher. As predicted, when there are fewer groups, iconic mappings are more useful, as more sets of culturally salient features are shared across the population, and therefore the degree of iconicity remains higher. Moreover, as the number of groups increases, each additional group lowers iconicity by a smaller amount (e.g., the difference in iconicity between 1 and 2 groups is larger than the difference between 5 and 10 groups). In contrast to the lexical variability values, the degree of iconicity stabilizes quickly, within the first few hundred stages.

#### *3.3. The Effect of Population Size on Lexical Variation*

In this section, we explore the effect of population size on lexical variation for different numbers of groups. Figure 11 shows different population sizes over time, considering populations consisting of 5, 10, 20, 50 and 100 agents.

In the early stages of the simulation, larger populations exhibit a higher degree of lexical variability than smaller populations. However, over time, larger populations exhibit a steeper decrease in lexical variability compared to smaller populations. In the final stages of the simulation, the larger population sizes exhibit the lowest degree of lexical variability (i.e., the most lexical uniformity). What can explain this?

In larger populations, there are initially more forms per concept (as forms are generated at the individual level). As agents in a larger population communicate with a larger number of agents, more bit updates result. In turn, bit updates typically decrease the degree of iconicity, thereby decreasing the chance of communicating successfully via culturally salient features success. This leads to a feedback loop whereby frequent bit updates decrease the possibility of communicating via culturally salient features success. This process is visualized in Figure 12. In smaller populations, on the other hand, there are initially fewer forms per concept. As agents communicate with a smaller number of agents, fewer bit updates occur. With fewer bit updates, a higher degree of iconicity is retained, and thus the iconic–inferential pathway (language games ending in culturally salient features success) can be used successfully.

**Figure 11.** The mean lexical variability over 4000 model stages for different population sizes (*n\_agents*), showing three different values of *n\_groups*, which determines the sets of culturally salient features of the agents. The dark line represents the mean and the shaded area the standard deviation of the 100 repetitions. Regardless of the number of groups, it is clear that larger population sizes exhibit a lower mean lexical variability than smaller population sizes. In addition, when there are more agents, the level of iconicity is lower.

Across all population sizes, the more groups there are, the lower the mean iconicity level is (see the iconicity in Figure 11 for *n\_groups* = 10 vs. *n\_groups* = 1), as discussed in the previous section. In addition, it is apparent that population size and the number of groups interact in determining iconicity levels. When all agents belong to one group (*n\_groups* = 1), there are larger differences in the mean iconicity level in the population than when agents can be assigned to different groups (*n\_groups* = 5 and *n\_groups* = 10). The explanation for this relates to the feedback loop mentioned above, where a lower degree of iconicity stems from more bit updates. When there are more groups, regardless of the population size, agents cannot rely on the iconic–inferential pathway to communicate successfully (language games ending in culturally salient features success) because their sets of culturally salient features differ. With more groups, the feedback loop is present across all population sizes: A lower degree of iconicity stems from more bit updates, here due to the inability to use the iconic–inferential pathway.

**Figure 12.** The feedback loop from bit updating to the use of culturally salient features success visualized.

## **4. Discussion**

Here, we present a first step in developing a model of how shared cultural context (allowing for the use of iconic mappings) may influence lexical variation in sign language emergence. We have shown that in a model where agents can rely on iconic mappings between a form and culturally salient features in addition to form–concept mappings, populations with a high degree of shared context (operationalized in the model as a smaller number of groups determining the culturally salient features of agents) retain a higher degree of lexical variation. In contrast, populations with many different cultural contexts do not retain the high degree of lexical variation present at language emergence; instead, because these populations cannot rely on iconic mappings between form and culturally salient features, the language becomes more uniform over time. Overall, these results provide support for the idea that shared context facilitates a high degree of lexical variation (de Vos 2011; Meir et al. 2012).

The main contribution of this model is a novel representation of iconicity, operationalized as a mapping between the bits of the culturally salient features and forms. This has allowed us to consider how iconic properties allow for the retention of lexical variation in culturally homogeneous groups. Crucially, without the iconic–inferential pathway, individuals would need to rely on the conventional link requiring memorizing the association between concepts and forms. Though not tested here, we speculate that a model with only the conventional link would predict a lower degree of lexical variability in communities with more shared context, or at least a comparable degree of lexical variability to communities with less shared context.

In addition to the degree of lexical variability, the model generates predictions about how iconicity is retained in the early stages of language evolution. In populations with a high degree of shared context (i.e., a smaller number of groups), a higher degree of iconicity is exhibited. These populations largely retain the iconicity present at language emergence because agents initially have similar forms (in the model, there was a high degree of initial overlap between forms and culturally salient features), and hence they can typically use the conventional pathway; if their forms do not match, they can often rely on the iconic–inferential pathway. As agents rarely need to update their forms, a high degree of iconicity is retained. For populations with more diverse backgrounds (i.e., a larger number of groups), the degree of iconicity in the population decreased compared to more homogeneous populations. We are unaware of studies comparing iconicity levels across signing communities with different social structures, but the model generates a prediction which could be empirically tested. It should be noted that in real life, the dynamics of the language game with respect to the two pathways are likely different, as in the model the conventional link has priority over the iconic–inferential pathway. In real life, rather, we assume there is more flexibility with regard to which route is used. We do not expect that the order of the conventional link and the iconic–inferential pathway in the language game has a strong effect on the model results, given that both occur before the form-updating step, the step which has ramifications for the degree of lexical variation and iconicity.

We have also explored how population size, in addition to the number of groups, affects lexical variation. We find that larger populations exhibit more lexical uniformity than smaller populations, as found by another computational model in which the lexical variant chosen by the sender depends on their familiarity with the receiver, as agents keep track of individual preferences as well as a group-level preference (Thompson et al. 2020). Interestingly, our model finds the same result without storing information about the frequency of interaction between agents. Instead, the group that agents belong to determines the initial similarity between forms and the ability of agents to rely on the iconic–inferential pathway. All in all, our model provides support for theories proposing that shared context and population size have an effect on lexical variation in situations of language emergence. Further work must be conducted to determine the precise contribution of each.

The current model is simple—the language model is basic, and there are few model parameters. Simple models permit us to formalize and understand the relationships present in complex systems (Smaldino 2017), such as in the emergence of language. In this way, the relationship between shared context and lexical variability can be studied with minimal confounding factors. However, the model presented here inherently lacks much of the complexity present in signing communities, factors which may have an effect on the degree of lexical variability and iconicity. This model admittedly has several shortcomings, which we discuss and either propose as future model extensions or as general limitations of the model.

One of the biggest shortcomings of this model is that agents store only one form per concept. All sign languages exhibit lexical variation, and while the nature of this variation is still being determined, it is clear that individuals sometimes use multiple forms per concept (i.e., productive synonyms) or understand multiple forms per concept (i.e., perceptual synonyms; see the Discussion in Mudd et al. 2020). With regard to productive synonyms, chaining forms has been attested in several shared sign languages, such as ABSL (Meir et al. 2010), SJQCSL (Hou 2016), the sign language of Amami Island in Japan (Osugi et al. 1999) and Kata Kolok (Lutzenberger et al. 2021; Mudd et al. 2020). In addition, compounding is a strategy that has been observed in CTSL (Ergin et al. 2021), SJQCSL (Hou 2016), the sign language of Amami Island in Japan (Osugi et al. 1999), ABSL (Meir et al. 2010) and Kenyan Sign Language (Morgan 2015). Both chaining variants together and compounding require a language representation that allows several forms to be stored per concept; since the model's representation does not, the model cannot account for productive synonyms. Perceptual synonyms, where an individual can learn a form–concept association even though they might not use it (unless retrieved using the iconic–inferential pathway), also require storing multiple forms per concept. In the model, perceptual synonyms can be accounted for when agents use the iconic–inferential pathway; the agents have not stored an additional form mapping for a concept, but they may be able to retrieve it. However, in real life, it is much more probable that individuals retain multiple forms associated with a concept even though they have a preference for one form. Thus, in order to account for these different types of synonyms, multiple forms would need to be stored per concept.
However, doing so would complicate the dynamics of the language component of the model; it would be necessary to assign weights to each form, as well as a weighting factor for taking iconic affordances into account.

In addition, the update rule in the model represents the adaptation of one's variant in an extremely simplistic, perhaps unrealistic manner. In the case of a bit update (if communication at the form and culturally salient features levels has not been successful), the receiver always adapts to the sender. There are many reasons why an individual may adapt their linguistic preferences, such as a frequency bias or a prestige bias (Boyd and Richerson 1988). Here, agents do not keep track of how many times they have heard a certain variant, nor do agents have varying levels of prestige in the community. The agents simply update if communication has failed. Currently, the language update rule in the model is most akin to explicit feedback from the sender to the receiver. Though explicit feedback is one mechanism used in repair, it is not the only avenue by which individuals come to communicate successfully. Research on repair sequences in cross-signing, where deaf signers with different native languages meet and communicate, offers insight into the process of language grounding in its initial stages (Byun et al. 2018). In short, signers anticipate difficulties in communicating and typically produce "try markers" to signal this. The individual producing a try marker essentially asks their communicative partner to produce a grounding sequence, such as an affirmation that their production was understood or a request for clarification. This example highlights that negotiation and repair are complex and nuanced. One way in which the model could be extended is to introduce more variety in who updates and why, following research on communication in contexts of language emergence and cultural evolution.

Related to this, the update rule dictates that the receiver changes one bit to match the corresponding bit of the sender. In a way, this could be akin to moving phonetically closer to the sender's form. However, this is unrealistic in cases where two forms are very different. Take the example of "sofa" and "couch", both forms referring to the same concept. In the event of communicative failure, it would not make much sense for an individual to adapt only part of the word (e.g., "couch" becomes updated to "souch"). Rather, what would make more sense in this situation is for one individual to learn, and potentially use, the form "sofa" from then on. For a more accurate model of human communication, the update rule needs to account for different situations (from learning an entirely new lexical variant to adapting one's existing form phonetically). More research into findings from language acquisition, psycholinguistics and sociolinguistics is necessary in order to adapt this element of the model.
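
A distance-sensitive update rule along these lines could, for example, switch between wholesale adoption and single-bit accommodation. The sketch below is purely hypothetical; neither the rule nor the 0.5 threshold is part of the model:

```python
import random

def hamming(a, b):
    """Number of differing bits between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def update_form(receiver_form, sender_form, threshold=0.5, rng=random):
    """Hypothetical distance-sensitive update rule (not the model's):
    adopt the sender's form wholesale when the forms differ in more than
    `threshold` of their bits (the 'sofa'/'couch' case); otherwise move
    one mismatching bit closer (phonetic accommodation)."""
    n = len(receiver_form)
    if hamming(receiver_form, sender_form) / n > threshold:
        return tuple(sender_form)        # learn the new variant outright
    mismatches = [i for i, (r, s) in enumerate(zip(receiver_form, sender_form))
                  if r != s]
    if not mismatches:
        return tuple(receiver_form)      # already identical
    i = rng.choice(mismatches)
    updated = list(receiver_form)
    updated[i] = sender_form[i]          # flip one bit toward the sender
    return tuple(updated)
```

Here the threshold acts as a crude stand-in for the phonetic-similarity judgment the text calls for; a realistic rule would need grounding in acquisition and sociolinguistic findings.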

Another unrealistic aspect of this model is that, in reality, there is likely overlap between the culturally salient features that individuals from different cultures associate with a concept, something that is not present in the model, as all culturally salient features are generated independently for each group. Returning to the example of *pig*, two individuals from different cultural contexts (e.g., one from a farming community and another from an urban area) are both likely to have salient features comprising the shape of the pig, the appearance of the animal's face with ears and a snout, and the fact that it is an animal, as well as culturally specific points. Though there is undoubtedly overlap in salient features across cultures, certain aspects may be more salient for some cultures than for others. In an urban community with less interaction with pigs, the facial features or the fact that it is food might be more salient, while for a farming community, how it is killed could be more salient. Yet another consideration is how easy it is to represent different facets of culturally salient features. It has been shown in different sign languages that certain semantic categories prompt preferences in production, called patterned iconicity (Padden et al. 2013). For example, across languages signers prefer to use personification (where the culturally salient features are mapped onto the signer's body) for animal signs (Hwang et al. 2017). In the model, as all culturally salient features are generated for specific groups, there is no relationship between the culturally salient features across groups. Given that certain aspects of culturally salient features are typically shared cross-culturally and that patterned iconicity exists, a natural extension of the model would be to model culturally salient features as related, with some degree of overlap between the groups. Better yet, the culturally salient features could even differ for each individual, though be more similar for those in the same group. One final point about the culturally salient features is that in the model, only forms can be updated. In real life, however, which features are culturally salient also changes over time, and thus this may be important to capture in the model as well. How exactly to model this remains an open question.
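
One way to implement related feature sets is to derive each group's culturally salient features from a shared cross-cultural prototype. The sketch below is hypothetical; the prototype-plus-noise scheme and the `group_noise` parameter are our own illustration, not part of the model:

```python
import random

def overlapping_features(n_groups, n_concepts, n_bits,
                         group_noise=0.2, seed=0):
    """Hypothetical extension: generate each group's culturally salient
    features by copying a shared cross-cultural prototype per concept and
    flipping each bit with probability `group_noise`, so that groups
    overlap substantially but still differ."""
    rng = random.Random(seed)
    # One shared prototype per concept (the cross-cultural core).
    prototypes = {c: [rng.randint(0, 1) for _ in range(n_bits)]
                  for c in range(n_concepts)}
    # Each group is a noisy copy of the prototypes.
    return {
        g: {c: tuple(b if rng.random() >= group_noise else 1 - b
                     for b in proto)
            for c, proto in prototypes.items()}
        for g in range(n_groups)
    }
```

With `group_noise = 0.2`, any two groups agree on roughly 68% of bits (both copies unchanged or both flipped), well above the chance level of 50% produced by fully independent generation; individual-level variation could be added as a second, smaller noise step.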

Though there are undoubtedly many more ways in which the model could be updated to more closely resemble signing communities and the interaction occurring within them, one final point to address is interaction in the model. In this version of the model, all agents have an equal probability of interacting. This is not the case in real communities—individuals are more likely to interact with some people than with others. The interaction dynamics in shared signing communities and Deaf community sign language communities may differ, or may be shaped simply by community size. As shared sign languages are typically used in small, insular communities, there is more community-wide interaction. In Deaf community sign language communities, on the other hand, which often span entire countries, individuals typically interact with those in their same city and/or school. This is reflected in the variation observed in these communities; for example, in BSL, a Deaf community sign language, there is substantial regional variation because individuals are more likely to interact with those in their own region (Stamp et al. 2014). In terms of adding this element to the model, it would be possible to have agents prefer to interact with those nearest to them. This implementation detail may have consequences for the degree and speed of change in lexical variability and should thus be the subject of future work.
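
Proximity-biased interaction could be sketched by weighting partner choice by inverse distance. This is a hypothetical illustration, not part of the model; the agent positions and the 1/d weighting are our assumptions:

```python
import random

def choose_partner(agent_positions, sender_idx, rng=random):
    """Hypothetical proximity-biased partner choice: the probability of
    interacting with another agent decays with distance (here 1/d).
    `agent_positions` is a list of scalar positions, one per agent."""
    candidates, weights = [], []
    for j, pos in enumerate(agent_positions):
        if j == sender_idx:
            continue                      # an agent does not talk to itself
        d = abs(pos - agent_positions[sender_idx])
        candidates.append(j)
        weights.append(1.0 / max(d, 1e-9))  # guard against zero distance
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Replacing uniform partner sampling with such a rule would let nearby agents converge faster than distant ones, which is one plausible route to the regional variation discussed above.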

All in all, this research is a first step in developing a model to formalize how shared context affects the degree of lexical variation in sign language emergence. It is unclear to what extent these results may extend to language emergence in our earliest language-using ancestors, who lived in small, insular communities, or esoteric communities (Wray and Grace 2007), and whose communication was likely multi-modal (Levinson and Holler 2014; Perlman 2017). It has been proposed that iconic signs are at the root of proto-language emergence (Számadó and Szathmáry 2012). In addition to the iconic affordances of the manual modality, there is ample evidence that iconicity is also possible, and used, in the vocal modality, as has been shown in spoken languages (Johansson and Zlatev 2013; for a review, see Perniss et al. 2010). As proposed by Meir et al. (2012), it seems plausible that our earliest language-using ancestors, residing in small, insular groups, had a highly variable lexicon, which may have become more systematic over time. By considering different parameter settings, this model may also provide insights for investigations into what language might have looked like in early human evolution.
