2.1. Salience: Main Characteristics, Related Theories, and Multifactorial Approach
Salience is generally characterized by three main properties:
- (i) Relational (or singling-out): the prominent status is the result of competition among language units of the same level (e.g., syllables, referents);
- (ii) Dynamic: the prominent status may change;
- (iii) Structural attraction: prominent units are structural attractors in their domain.
According to von Heusinger and Schumacher (2019), the relational principle means that an entity is salient only relative to other entities, that is, only if it is more salient than its competitors. In the process of interpreting anaphors, discourse referents (realized by referential expressions) compete with one another, and it is the most salient entity that attracts the hearer's attention and provides an anchor for the resolution of an anaphoric expression. In Example (1), the referents zhǔrèn ‘the director’ and zhè běn shū de zuòzhě ‘the author of this book’ are in competition. Once the second clause has been interpreted, it is zhǔrèn in (1a) and zhè běn shū de zuòzhě in (1b) that attracts more of the hearer's attention and becomes the salient referent in its own context.
(1) | a. | [主任]i把这本书的作者介绍给我,[Ø]i鼓励我珍惜这次难得的机会。 |
| | [Zhǔrèn]i | bǎ | zhè | běn | shū | de | zuòzhě | jièshào | gěi | wǒ, |
| | director | ba | this | clf | book | de | author | introduce | to | 1sg |
| | [Ø]i | gǔlì | wǒ | zhēnxī | zhè | cì | nándé | de | jīhuì. | |
| | | encourage | 1sg | cherish | this | clf | rare | de | opportunity | |
| ‘The director introduced me to the author of this book and encouraged me to cherish this rare opportunity.’ |
| b. | 主任把[这本书的作者]i介绍给我,[Ø]i是一个好象刚毕业的小姑娘。 |
| | Zhǔrèn | bǎ | [zhè | běn | shū | de | zuòzhě]i | jièshào | gěi | wǒ, |
| | director | ba | this | clf | book | de | author | introduce | to | 1sg |
| | [Ø]i | shì | yī | ge | hǎoxiàng | gāng | bìyè | de | xiǎo | gūniang. |
| | | is | a | clf | seem | just | graduate | de | little | girl |
| | ‘The director introduced me to the author of this book, who appeared to be a young lady who had just graduated.’ |
The second characteristic emphasizes that the degree of salience of an entity may change as the discourse progresses. A referent salient enough to be the referent of an anaphoric expression at a particular point in the discourse may later lose (or maintain) its high-salience status under the influence of salience factors (see below). The third characteristic suggests that salient units may be more central in the process of structure building and may contribute to more operations or structures. This seems to be a corollary of the special attention paid to the most salient entity and could explain why a salient referent can be more easily retrieved by a reduced linguistic form.
In the literature, several theories close to the notion of salience share the view that certain entities are more salient (or central) than others in the consciousness of the speaker and the hearer, and that there is a correspondence between linguistic form and degree of salience. In centering theory (Grosz et al. 1995), the ‘centers’ of an utterance are the entities (or semantic objects) that link that utterance to others in the discourse segment in question. According to Grosz et al. (1995), each utterance (U) has a set of forward-looking centers (Cf) that are realized by its constituent expressions. The elements of Cf are ranked according to their relative salience. Moreover, each utterance other than the initial utterance of a segment has a single backward-looking center (Cb), which is chosen from the Cf of the preceding utterance and represents the discourse entity with which the current utterance is most concerned. Various factors can influence the ranking of Cf in an utterance. Most work in centering theory emphasizes the role of syntactic function and considers the subject the most likely to be ranked highest. Other factors, such as word order, subordination, and lexical semantics, are also assumed to affect the ranking.
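To make the Cf/Cb mechanics concrete, here is a minimal Python sketch. It assumes a simplified ranking (subject > object > other) and ignores the other ranking factors just mentioned; the function names and entity labels are hypothetical. Cb(Un) is taken to be the highest-ranked element of Cf(Un-1) that is realized in Un, following Grosz et al. (1995).

```python
from __future__ import annotations

from typing import Optional

# Simplified, hypothetical ranking of grammatical functions (lower = more salient).
FUNCTION_RANK = {"subject": 0, "object": 1, "other": 2}


def rank_cf(mentions: list[tuple[str, str]]) -> list[str]:
    """Order the forward-looking centers (Cf) of one utterance by salience.

    `mentions` holds (entity, grammatical_function) pairs in textual order.
    sorted() is stable, so entities with the same function keep textual order.
    Unknown functions are ranked below "other".
    """
    ordered = sorted(mentions, key=lambda m: FUNCTION_RANK.get(m[1], len(FUNCTION_RANK)))
    cf: list[str] = []
    for entity, _function in ordered:
        if entity not in cf:  # keep each entity once, at its best rank
            cf.append(entity)
    return cf


def backward_center(prev_cf: list[str], realized: set[str]) -> Optional[str]:
    """Cb(Un): the highest-ranked element of Cf(Un-1) that is realized in Un."""
    for entity in prev_cf:
        if entity in realized:
            return entity
    return None  # no link to the previous utterance


# Example (1), with a deliberately simplified annotation of the first clause.
cf_u1 = rank_cf([("director", "subject"), ("author", "object"), ("speaker", "other")])
print(cf_u1)                                            # ['director', 'author', 'speaker']
print(backward_center(cf_u1, {"director", "speaker"}))  # (1a) → director
print(backward_center(cf_u1, {"author"}))               # (1b) → author
```

Because the sketch only looks one utterance back, it also illustrates the locality of centering theory: a link to a non-adjacent utterance simply yields no Cb.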
In accessibility theory (Ariel 1990), the speaker's choice of referential expression (or accessibility marker) indicates the cognitive accessibility of the referent in the mental representation of the discourse. A speaker will use a high (or low) accessibility marker to encode a referent that is assumed to be accessible (or inaccessible) to the hearer. Four factors are considered to determine the degree of accessibility:
- (i) Distance: The distance between the antecedent and the anaphor (relevant to subsequent mentions only);
- (ii) Competition: The number of competitors on the role of antecedent;
- (iii) Saliency: The antecedent being a salient referent, mainly whether it is a topic or a non-topic;
- (iv) Unity: The antecedent being within vs. without the same frame/world/point of view/segment or paragraph as the anaphor. (Ariel 1990, pp. 28–29)
In addition to accessibility theory, the Givenness Hierarchy (Gundel et al. 1993) also aims to associate different uses of referential expressions in discourse with the cognitive status of referents in the interlocutors' mental representation. It proposes a hierarchy of six cognitive statuses, ranging from ‘in focus’ to ‘type identifiable’:
| in focus | > activated | > familiar | > uniquely identifiable | > referential | > type identifiable |
| it | that/this/this N | that N | the N | indefinite this N | a N |
According to Gundel et al. (1993), a cognitive status higher in the hierarchy (i.e., further to the left) entails all the lower statuses, but not the reverse. For example, an entity in focus is necessarily activated, whereas an activated entity is not necessarily in focus. Because of this inclusiveness, the hierarchy allows a referring expression associated with a lower cognitive status to be used for an entity with a higher status. This differs from accessibility theory, in which the choice of a marker corresponds to a given degree on the accessibility scale.
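The inclusive (implicational) property can be sketched as a simple lookup: a referent's status licenses the form associated with that status and with every status below it. This is a minimal Python illustration under our own simplifying assumptions; the names are hypothetical, the forms are the English ones from the table above, and it models licensing only, not which form a speaker actually chooses.

```python
from __future__ import annotations

# Cognitive statuses of the Givenness Hierarchy, highest first.
STATUSES = [
    "in focus", "activated", "familiar",
    "uniquely identifiable", "referential", "type identifiable",
]

# English form conventionally associated with each status.
FORM_FOR_STATUS = {
    "in focus": "it",
    "activated": "that/this/this N",
    "familiar": "that N",
    "uniquely identifiable": "the N",
    "referential": "indefinite this N",
    "type identifiable": "a N",
}


def has_status(referent_status: str, required_status: str) -> bool:
    """True if the referent has the required status or a higher one (statuses are inclusive)."""
    return STATUSES.index(referent_status) <= STATUSES.index(required_status)


def licensed_forms(referent_status: str) -> list[str]:
    """Forms whose required status the referent meets, from most to least restrictive."""
    return [FORM_FOR_STATUS[s] for s in STATUSES if has_status(referent_status, s)]


print(has_status("in focus", "activated"))  # True: in focus entails activated
print(has_status("activated", "in focus"))  # False: not the reverse
print(licensed_forms("familiar"))           # ['that N', 'the N', 'indefinite this N', 'a N']
```

An accessibility-style model would instead map each referent to exactly one marker; the list returned here is what distinguishes the inclusive view.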
Although the cognitive status of referents is often analyzed through the observation of referential expressions, it seems more likely that the hearer establishes a referent in his or her mental representation of the discourse and relates subsequent references to this representation rather than to the original linguistic expression in the text (Brown and Yule 1983). While discourse entities are virtually present in the mental representation of the interlocutors, salience, as a property of these entities, is neither tangible nor visible. It is therefore difficult to determine the degree of salience of entities directly.
Most of the above-mentioned studies agree that the form of a referring expression may reflect the degree of salience of its referent in the immediate context, especially reduced forms, which represent salient referents. In our analysis, salience is close to, but different from, the notion of accessibility. On the one hand, accessibility theory emphasizes a one-to-one relationship between the form of an expression and the cognitive status of the entity to which it refers, from a more or less static perspective. We consider that the lexical form of an entity is only a reflection of the referent's degree of salience in its immediate context, and that in authentic texts this reflection is more complex than a one-to-one relation. Our view of this relationship is broadly consistent with that of Gundel et al. (1993), who argue that a referent with ‘in focus’ (i.e., salient) cognitive status is prototypically realized by reduced forms, and less frequently by other linguistic forms generally associated with less salient referents. Therefore, even if salient referents are not always realized by reduced referential expressions, high-salience markers (anaphoric personal pronouns and zero pronouns) necessarily encode referents that are salient in their context of occurrence.
On the other hand, the degree of salience does not depend solely on the four factors of accessibility theory. Since discourse entities are constantly updated by textual data, the characteristics of the pronoun, of the antecedent, and of other elements (e.g., verbs and grammatical constructions) of the relevant sentences (or, more broadly, of a discourse segment), as well as the inherent properties of the referent and the relational properties between the antecedent and the anaphoric expression, are all likely to influence salience; hence the importance of a multifactorial approach to salience analysis (Landragin 2004; Hou and Landragin 2019). In accessibility theory, only the distance factor has been measured quantitatively, to show that different accessibility markers (i.e., pronouns, demonstratives, and definite descriptions) differ in their distance from their antecedents. In our study, we measure and compare several factors in two languages to understand their contribution to referential salience. By adopting this quantitative and contrastive method, which goes beyond the scope of accessibility theory, we aim to provide empirical evidence for the multifactorial nature of salience. This evidence should not only enhance our understanding of salience in cognitive terms but also contribute to a better understanding of anaphora interpretation.
In our exploration of salience from a relational perspective, we consider that the salience of an entity is determined not only by the factors associated with the entity in question, but also by those arising from the contexts of its potential competitors. This relational point of view is also taken into account by centering theory (Grosz et al. 1995), which ranks the elements of Cf according to their degree of salience. However, centering theory focuses on local coherence and models the relationship between two consecutive utterances, whereas an utterance can also be linked to an earlier, non-adjacent utterance. As a result, the theory cannot account for cases where an anaphoric expression that marks high salience is not linked to an entity realized in the immediately preceding utterance (i.e., where the anaphor and its antecedent are not located in two consecutive utterances), nor for cases where two expressions that mark high salience occur in the same utterance. A focus on local coherence may also miss factors with a more global influence, such as those stemming from encyclopedic knowledge and general cognitive processes (e.g., factors associated with the inherent semantic properties of a referent). By extending the analysis beyond immediate linguistic elements to broader discourse factors, our approach offers a more nuanced understanding of the anaphora–antecedent relationship.
In our conception of salience, there is no limit to the number of salient entities in a single utterance, but whether two or more entities can retain high-salience status throughout the processing of the entire utterance is open to question, since the analysis of salience must also take into account the moment and progress of the current processing or production. An entity is salient in relation to its own context and through the properties (or factors) that belong to it. In other words, high-salience status results from an accumulation of factors related to (but not limited to) the properties of the antecedent and the anaphor, the properties of other elements (i.e., referential, verbal, or other elements) in the sentence containing the antecedent and in that containing the anaphor (or, more broadly, in a discourse segment), the inherent properties of the referent, the relational properties between the antecedent and the anaphor, the situational context, etc. In Example (2), the salience status of the referents cannot be established solely on the basis of the first sentence. Instead, the whole situation constructed by the two sentences in (2) involves a set of potential factors (such as syntactic function, syntactic parallelism, and animacy) that make ‘Susan’ and ‘Betsy’ salient enough to be the referents of elle and lui, respectively.
(2) | a. | [Susan]i | a | offert | un | hamster | à | [Betsy]j. |
| | Susan | has | given | a | hamster | to | Betsy |
| b. | [Elle]i | [lui]j | a | rappelé | que | | |
| | She | her | has | reminded | that | | |
| | les | hamsters | étaient | assez | sauvages. | | |
| | the | hamsters | were | quite | wild. | | |
| | ‘a. Susan gave Betsy a hamster. |
| | b. She reminded her that hamsters are quite wild.’ | [Cornish (2000)] |
In this article, we consider salience to be the property of a discourse entity of being more central to attention than other entities in the mental representation of the speaker and the hearer, at a specific moment and in a specific context. The notion is characterized by its relational, dynamic, and structural attraction aspects. Moreover, its complexity calls for a model that considers salience from a multifactorial perspective. According to Landragin (2004), two dimensions of salience can be distinguished: factors related to the cognitive aspect, such as perceptual intentions, the subject's attention, memory, or affect, and factors related to the physical aspect. The latter include, on the one hand, formal physical factors, such as salience due to particular syntactic constructions, syntactic function, and word order, and, on the other hand, semantic physical factors, such as salience related to the thematic role or to the theme (or topic) of the utterance. In line with this research, Hou and Landragin (2019) revisited salience factors and categorized them into syntactic, semantic, textual, and pragmatic domains:
- (i) Syntactic factors: syntactic function, grammatical constructions with salience effect, syntactic parallelism, and syntactic hierarchy;
- (ii) Semantic factors: verb semantics (in the utterance of the antecedent or of the pronoun) and referents’ semantic features;
- (iii) Textual factors: order of occurrence of the referents, recency (distance), frequency of occurrence of the referents, uniqueness, and main character;
- (iv) Pragmatic factors: pragmatic constraint and the given–new distinction.
The influence of multiple factors in salience analysis or anaphora resolution has been observed in several languages, such as French (Landragin 2004, 2015; Schnedecker 2011) and English (Chiarcos 2011), as well as among L2 learners of Spanish (Lozano 2016; Martín-Villena and Lozano 2020) and L2 learners of English (Quesada and Lozano 2020). In this study, we examine these phenomena through an original study of salience in Chinese, aiming to delineate the specific characteristics and underlying mechanisms that drive referential salience in this language, particularly from a contrastive (French/Chinese) perspective. It is within this multifactorial and contrastive approach that we analyze five salience factors: syntactic function, syntactic parallelism, animacy, mobility, and main character.
2.2. Salience Factors under Investigation
After clarifying our approach to the notion of salience, we review the literature on the factors analyzed in this study. Our aim is to examine whether these factors have a statistically significant influence on referents’ salience and whether they show similar or different effects in Chinese and French. Five representative factors were selected from among those discussed in Hou and Landragin (2019), since they were found to be influential in both languages analyzed and occur consistently across the corpus, ensuring a robust dataset for analysis. The other factors were not annotated or examined, since annotating all factors is very time consuming, and some of them, such as syntactic constructions with a salience effect, verb semantics (implicit causality), the concrete/abstract nature of referents, or pragmatic constraints, occur relatively rarely or are virtually unobservable in our corpus, which differs considerably from the materials used in psycholinguistic studies (Stevenson et al. 1994; Sun 2014). To analyze these factors quantitatively with a corpus-based approach, it would be preferable to adopt a methodology different from the one used in this research, for example searching for the targeted constructions in corpus databases or in a larger corpus built specifically for this purpose.
In the literature, it is often argued that the most salient entity in a French sentence is the one that functions as subject. This argument has been put forward especially in work on centering theory and confirmed by psycholinguistic experiments (Matthews and Chodorow 1988; Gordon and Chan 1995; Hudson-D’Zmura and Tanenhaus 1997), in which self-paced reading and reading comprehension tests showed that reading times are shorter when the antecedent is the subject. In addition to the subject, other functions (or values of the syntactic function factor) can be ranked according to their ability to contribute positively to the salience of entities (Grosz et al. 1995).
The above-mentioned research groups direct and indirect objects together and does not distinguish between them. From a cognitive point of view (Van Hoek 2007), however, when a sentence contains two objects, the degree of salience of the direct object (DO) differs from that of the indirect object (IO). While the subject is the most salient entity (the Figure, in cognitive terms) in the sentence, the DO is the second most salient entity (the primary landmark) and is more prominent than the other object (the secondary landmark), which yields the following hierarchy:
(3) | Subject > direct object > indirect object > other |
In Chinese, the topic (if the sentence has one) is considered to be the function that contributes most to a referent’s salience (Jiang 2004, 2017; Wang 2004). ‘Topic/theme’ is primarily considered a pragmatic notion (Reinhart 1981) or a notion of information structure (Lambrecht 1994), and although the ‘topic–comment’ structure is universal, languages have different formal devices to encode it. Hence the importance of distinguishing the pragmatic topic, which is what the comment is about in a ‘topic–comment’ structure, from the syntactic topic, which is the formal device that encodes a pragmatic topic (Gundel 1988). This distinction is especially important for Chinese (Li and Thompson 1976; Huang 1992; Her 1991; Shi 2000), which is considered a pragmatic language (Huang 1994, 2000) and a topic-prominent language (Li and Thompson 1976). That said, a pragmatic topic is not always encoded by a syntactic topic (it can also be encoded by a syntactic subject), whereas a syntactic topic always encodes a pragmatic topic. In Examples (4) and (5), the expressions zhè kuài jiāsù de suìpiàn (‘the accelerating fragment’) and tā (‘it’), which are not the subjects of their sentences, are the syntactic topics and also encode the pragmatic topics.
(4) | 对于[这块加速的碎片] topic, 舰队太空监测系统只发出了一个三级攻击警报, … |
| Duìyú | [zhè | kuài | jiāsù | de | suìpiàn] topic, | jiànduì | |
| as.for | this | clf | accelerating | de | fragment | fleet | |
| tàikōng | jiāncè | xìtǒng | zhǐ | fāchū | le | yī | gè |
| space | surveillance | system | only | issue | pfv | a | clf |
| sān | jí | gōngjī | jǐngbào,... | | | | |
| three | level | attack | alarm | | | | |
| ‘As for the accelerating fragment, the fleet’s space surveillance system issued only a level-three attack alarm, …’ |
[Hēi’àn sēnlín ‘The Dark Forest’, Liu Cixin (excerpt)] |
(5) | [它] topic [飞行的速度] subject 很慢,… |
| [Tā] topic | [fēixíng | de | sùdù] subject | hěn | màn, … |
| 3sg | flying | de | speed | very | slow |
| ‘Its flying speed was very slow,...’ |
[Hēi’àn sēnlín ‘The Dark Forest’, Liu Cixin (excerpt)] |
Apart from the primacy of the topic function in Chinese, Wang (2004) and Jiang (2004, 2017) propose the same ranking of the other values as in French:
(6) | Topic > subject > object(s) > other |
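Hierarchies (3) and (6) can be encoded as ordinal ranks. The following Python sketch is only illustrative: the function labels are hypothetical, direct and indirect objects share one rank under (6) because that hierarchy groups them as ‘object(s)’, and any function not listed in a hierarchy (including a topic in French, which (3) does not mention) is simply ranked as ‘other’.

```python
from __future__ import annotations

# Lower rank = greater expected contribution to salience.
HIERARCHIES = {
    # (3) Subject > direct object > indirect object > other
    "fr": {"subject": 0, "direct_object": 1, "indirect_object": 2},
    # (6) Topic > subject > object(s) > other
    "zh": {"topic": 0, "subject": 1, "direct_object": 2, "indirect_object": 2},
}


def function_rank(lang: str, function: str) -> int:
    """Rank of a syntactic function; unlisted functions fall into 'other'."""
    ranks = HIERARCHIES[lang]
    return ranks.get(function, max(ranks.values()) + 1)


def top_ranked(lang: str, mentions: list[tuple[str, str]]) -> str:
    """Referent whose function ranks highest; min() keeps the first mention on ties."""
    return min(mentions, key=lambda m: function_rank(lang, m[1]))[0]


# Example (4): the fragment is the syntactic topic, the surveillance system the subject.
print(top_ranked("zh", [("fragment", "topic"), ("surveillance system", "subject"),
                        ("alarm", "direct_object")]))                    # fragment
# Example (2a): Susan is the subject, Betsy the indirect object.
print(top_ranked("fr", [("Susan", "subject"), ("Betsy", "indirect_object")]))  # Susan
```

Syntactic function is only one factor here; in the multifactorial view adopted above, such a rank would be one input among several rather than a decision rule.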
Another essential factor is syntactic parallelism, also called structural parallelism: anaphoric pronouns prefer to co-refer with an element that has the same syntactic function in the previous clause. Unlike the previous factor, which is a syntactic property of the antecedent expression, syntactic parallelism concerns the properties of both the antecedent and the anaphor, or more precisely a relational property between the two expressions. In the literature, this phenomenon was first observed for pronouns in subject function (Grober et al. 1978; Zhu 2002), as shown in Example (7), and later for the interpretation of pronouns in object function (Chambers and Smyth 1998; Jiang 2004), as shown in Example (8). In our analysis, we consider that the antecedent and the anaphor are in a parallel relationship when both function as subject, both as DO, or both as IO.
(7) | Jean | a | critiqué | Paul, | |
| Jean | has | criticized | Paul | |
| et | il | est | parti | précipitamment. (il = Jean) |
| and | he | has | left | in.a.hurry |
| ‘Jean criticized Paul, and he left in a hurry.’ |
(8) | Jean | a | critiqué | Paul, | |
| Jean | has | criticized | Paul | |
| et | Marie | l’ | a | insulté. (l’ = Paul) |
| and | Marie | him | has | insulted |
| ‘Jean criticized Paul, and Marie insulted him.’ |
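Our parallelism criterion can be stated as a small predicate over annotated functions. This Python sketch is hypothetical (labels and function names are ours); it only flags candidate antecedents that stand in a parallel relationship with the anaphor and is not an anaphora resolution procedure.

```python
from __future__ import annotations

# Functions for which a parallel relationship is annotated in our analysis.
PARALLEL_FUNCTIONS = {"subject", "direct_object", "indirect_object"}


def is_parallel(antecedent_function: str, anaphor_function: str) -> bool:
    """Both expressions are subjects, both DOs, or both IOs."""
    return antecedent_function == anaphor_function and anaphor_function in PARALLEL_FUNCTIONS


def parallel_candidates(anaphor_function: str, candidates: list[tuple[str, str]]) -> list[str]:
    """Candidate antecedents, given as (entity, function), that are parallel to the anaphor."""
    return [entity for entity, function in candidates if is_parallel(function, anaphor_function)]


first_clause = [("Jean", "subject"), ("Paul", "direct_object")]
print(parallel_candidates("subject", first_clause))        # (7) il → ['Jean']
print(parallel_candidates("direct_object", first_clause))  # (8) l' → ['Paul']
```

Because the predicate takes the functions of both expressions, it captures the relational nature of this factor, as opposed to a property of the antecedent alone.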
In addition to syntactic properties, we analyze two semantic factors, animacy and mobility, which are inherent properties of referents. It is often argued in the literature, particularly in cognitive linguistic and psycholinguistic approaches, that animate entities are generally more salient than inanimate entities in both French and Chinese (Lyons 1980; Comrie 1989; Langacker 1991; Pattabhiraman 1992; Hou and Sun 2005; Wang 2014). The semantic feature ‘mobility’, on the other hand, is less often analyzed as a salience factor. According to Talmy (2000), Zhang (2007), and Schmid (2010), movable entities are assumed to attract more attention than immovable entities and are therefore expected to be more salient. In this article, we use corpus data in an attempt to confirm the influence of the mobility factor on salience.
To decide which non-human beings to consider animate, we adopted Yamamoto’s (1999) criterion that animate entities must have a face. Body parts of a human or of an animate being are thus treated as inanimate: although they may appear more or less animate, this characteristic is in fact transferred from the entire animate (or human) entity, and they do not possess animacy in themselves. For the mobility factor, Schmid (2010) and Talmy (2000) consider that immovable entities have a permanent location. In addition to this criterion, to distinguish movable from immovable entities, we consider movable those entities that clearly have the ability to move or that undergo a change of location in our text excerpts. In Example (9), tā (‘she’) is considered an animate and movable entity, while tā de yī zhī shǒu (‘one of his hands’) is considered an inanimate and movable entity.
(9) | 她蹲在他跟前,Ø拉起他的一只手,Ø觉得手还是热的。 |
| Tā | dūn | zài | tā | gēnqián, | | |
| 3sg | squat | at | 3sg | in.front.of | | |
| Ø | lāqǐ | tā | de | yī | zhī | shǒu, |
| | take | 3sg | de | a | clf | hand |
| Ø | juéde | shǒu | háishì | rède. | | |
| | think | hand | still | warm | | |
| ‘She crouched down in front of him, took one of his hands, and felt that it was still warm.’ |
| [Le Ventre de Paris ‘The Belly of Paris’, Émile Zola (excerpt)] |
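The animacy and mobility criteria can be written as two boolean rules over an annotation record. This Python sketch is hypothetical: the `Referent` record and its field names are ours, not part of any annotation scheme described here, and we assume for illustration that the hand in (9) counts as movable because it is lifted (lāqǐ).

```python
from dataclasses import dataclass


@dataclass
class Referent:
    """Hypothetical annotation record; field names are illustrative only."""
    label: str
    has_face: bool                  # Yamamoto's (1999) criterion for animacy
    is_body_part: bool = False      # body parts are coded as inanimate
    can_move: bool = False          # clearly able to move
    changes_location: bool = False  # undergoes a change of location in the excerpt


def is_animate(r: Referent) -> bool:
    # A body part's animacy is transferred from the whole entity, so it does not count.
    return r.has_face and not r.is_body_part


def is_movable(r: Referent) -> bool:
    # Movable = able to move, or actually relocated in the text.
    return r.can_move or r.changes_location


# Example (9)
she = Referent("tā 'she'", has_face=True, can_move=True)
hand = Referent("tā de yī zhī shǒu 'one of his hands'", has_face=False,
                is_body_part=True, changes_location=True)
print(is_animate(she), is_movable(she))    # True True
print(is_animate(hand), is_movable(hand))  # False True
```

Keeping the two factors as independent predicates reflects the fact that they cross-classify: (9) contains both an animate movable referent and an inanimate movable one.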
The last factor analyzed, main character, is categorized as a textual factor. Sanford and Garrod (1981) consider that main characters are given particular centrality when anaphors are interpreted in written texts. Lima and Bianco’s (1999) experiments show that the textual cue of the main character is crucial for anaphoric interpretation among French students: references to the main character are always easier to understand, irrespective of the referent’s syntactic function. Jiang’s (2004) corpus study found that when only one main character is involved in a Chinese discourse, a zero anaphor may even reach across clauses or sentences to refer to the main character (mentioned several clauses earlier). In our study, we identified the main character as the most frequently mentioned referent in our four text excerpts.
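Identifying the main character by mention frequency can be sketched in a few lines of Python. This is a hypothetical illustration: it assumes the count is taken within a single excerpt over an existing coreference annotation, that zero pronouns count as mentions, and that ties go to the referent mentioned first; tie handling is our own assumption.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def main_character(mentions: list[str]) -> Optional[str]:
    """Most frequently mentioned referent in one excerpt.

    `mentions` lists, in textual order, the referent ID of every referring
    expression (including zero pronouns). Counter.most_common() orders equal
    counts by first occurrence, so a tie goes to the referent introduced first.
    """
    counts = Counter(mentions)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Example (9): Tā, tā, Ø, tā (de), shǒu, Ø, shǒu
print(main_character(["she", "him", "she", "him", "hand", "she", "hand"]))  # she
```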