1. Introduction
Memory is described as a reconstructive process from reality [
1,
2], which implies a subjective interpretation of someone’s speech or writing [
3]. Not surprisingly, the assessment of eyewitness testimonies can be considered a difficult task. Moreover, when an individual intentionally lies, two types of deception might appear: (i) creating new information (primary deception); (ii) concealing the difference between the statement and the intention to narrate it credibly (secondary deception) [
4]. Intentional deception, and the credibility assessment it bears on, is of interest to fields ranging from forensics to linguistics. However, linguistics does not seem to have delved into the nature of this phenomenon until the last decade [
5].
Artificial intelligence has become a promising tool in this field. More precisely, Natural Language Processing (NLP), which is based on the premise that language is a cognitive process that underlies executive functions and memory, among others, could be of interest on this front. In this way, the current study was framed in the contributions that both psychology and linguistics can offer to witness testimony.
Witness testimony can be one of the most compelling types of evidence in a trial due to its underlying credibility, which is also the most subjective factor to be examined in a process. Several attempts have been made to assess credibility in a more objective way, ranging from psycho-physiological indicators (galvanic skin response, heart rate, sweating, brain changes, among others) to other, more subjective variables such as non-verbal cues (e.g., vocal and facial features and movements) and content criteria [
6,
7]. Vocal cues for the detection of deception have been studied extensively, highlighting the presence of interjections, speech errors, and speech rate in lies requiring a higher cognitive effort [
8].
To examine credibility, Criteria-Based Content Analysis (CBCA) [
9] is one of the most accepted techniques. This checklist is part of the Statement Validity Assessment protocol, which stipulates that the memory of a self-experienced event differs in content and quality from that of a non-experienced one. However, there is a lack of consensus on the weighting of criteria, which increases sensitivity according to the discriminatory power [
10]. The amount of detail and contextual information has been recognised throughout the literature as a predictor of deception by relating it to strategic avoidance of verifiable information [
11,
12,
13]. The revised CBCA technique [
14] classifies the criteria into cognitive and motivational. The first group includes episodic autobiographical memory (spatial and temporal details, reproduction of conversations, emotions and feelings, among others) and script-deviant information (superfluous and unusual details, unexpected complications, related external associations, etc.). Motivational criteria refer to the witness’s self-presentation efforts and the manner in which the witness presents the statement. Examples of motivational criteria are: making spontaneous corrections, admitting forgetfulness, or expressing uncertainty. The authors claim that the truth-teller takes advantage of autobiographical memory, while the liar tends to focus on appearing credible. More specifically, Maier et al. [
15] investigated the relationship between the strategies employed by liars and the occurrence of criteria. They concluded that, when lying, the strategy of including episodic details is valued as positive. However, it is to be expected that script-deviant and motivational information will be avoided, as liars find it negative to include them in their testimony. Therefore, the presence of episodic autobiographical details would not directly imply truthfulness.
2. Related Work
Contributions are limited if we focus on deception detection in Spanish via NLP in the forensics–psychological domain. However, Vogler and Pearl [
13] investigated deception content in three different domains, concluding that linguistic features are more generalisable across domains and that specific details reflect the psychological process underlying the creation of truthful or deceptive content.
Automated analysis of the content of a text allows the extraction of linguistic features. General Inquirer [
16] was one of the first systems developed for this purpose. This system explores the frequency of words according to their lexical category using sentences as the unit of analysis. Another noteworthy system is CohMetrix [
17], which, in addition to word frequency, takes into account the cohesion of words, analysing the meaning and context in which they appear. Finally, LIWC [
18] is a word-centred instrument that allows the study of language at the emotional, cognitive, and structural levels. It was used in one of the first studies [
19] that applied deception detection through NLP to forensic psychological practice, in particular in its fifth experiment, where a fictitious crime paradigm was carried out.
On the other hand, Zhou’s research group [
20] developed Linguistics-Based Cues (LBCs) through a study in which they analysed emails in which the communicator had been truthful or untruthful. To do so, they synthesised features provided by classical content analysis tools, such as CBCA or RM, and linked them to NLP levels of analysis. They found discriminant criteria between testimonies, although many were inconsistent with previous research. Therefore, they stressed the importance of taking into account context (textual versus face-to-face) and deception planning time, related to cognitive load and anxiety.
The DePaulo team’s research [
21] is credited with having studied 158 indicators of deception in English-language reports made by adults, drawn from 120 independent samples. This meta-analysis found that liars were less communicative (talking time and details) and convincing, including fewer imperfections (spontaneous corrections and admitting lack of memory) and unusual content and more complaints and negative statements. In addition, it was concluded that deception cues were more pronounced when there was a motivation for success, especially if it was identity (as opposed to material motivation).
In the same vein, Hauch and colleagues [
22] conducted a meta-analysis of 79 indicators of deception studied in 44 research studies whose sample was mainly adult speakers of different languages, mostly English. In addition, texts were collected from real cases, crime simulations, attitudes, and others, through different types of communication where the motivation varied from null to high. These indicators were grouped into six research questions explored using LIWC. Broadly speaking, they showed that lying entails higher cognitive load, negative emotions, more distance from the event, and fewer sensory–perceptual and cognitive process references. However, no greater insecurity was found in lying.
In relation to romance languages, Fornaciari and Poesio’s research [
23] is remarkable for having been carried out in a real-life context. The corpus used was collected from court hearings in Italy of persons accused mainly of slander or false testimony. Linguistic feature extraction was carried out with LIWC software, and their results showed a higher presence of “yes”, spatio-temporal information, and positive feelings in truthful testimonies. In contrast, the false testimonies were characterised by a higher frequency of the word “no”, negative feelings, expressions of lack of memory, and first-person pronouns.
Finally, in the Spanish language, the presentation of Veripol [
24] is noteworthy. This is a model that combines NLP and Machine Learning (ML), developed together with the National Police, for the detection of false reports of theft, obtaining a hit rate of over 91%. In addition, this research illustrates the differentiating characteristics between true and false texts. Morphosyntactically, they found that false reports are characterised by reflexive missing and shorter sentences, negations, common nouns, and common and non-reflexive verbs. This indicates that, in untruths, they tend to be more impersonal in their narration, creating a distance from the facts. In terms of detail, truthful texts tend to be longer and richer in detail.
In Spanish also, reference can be made to Almela’s study [
5]. This research explored, through LIWC, the linguistic keys of written opinion texts on homosexual adoption, bullfighting, and feelings about a good friend. The main results described truthful texts by a higher presence of first-person verbs in the past and future tense, sensory–perceptual words, insight words (e.g., think), tentative words (e.g., maybe), and exclusive words (e.g., but). On the other hand, the false texts presented shorter 2nd- and 3rd-person responses and words related to negative emotions.
All in all, there seems to be agreement on some aspects related to lying, such as cognitive complexity. On the other hand, there is some debate about certain indicators of lying, such as insight words or expressions of insecurity. For this reason, it is essential to take into account moderating variables. It is therefore important to emphasise that this study was conducted in Spanish, in a face-to-face interview, with no planning time and low motivation to lie, given that succeeding in the lie had no major consequences for the participants.
In sum, this study aimed to describe the differential linguistic styles between truth and intentional deception. For this purpose, the testimonies obtained from a free recall task were studied through three dimensions: lexical, linguistic, and content and speech. In this way, differences between true and simulated deception testimonies were expected. More precisely, it was hypothesised that deceptive testimonies depict higher variations in terms of lexical and linguistic features and content than true testimonies. Nevertheless, it was also hypothesised that there are no differences in speech disfluencies between truthful and deceptive testimonies, as, in deception, the aim is to produce a discourse as close to the real one as possible.
4. Results
In this section, we report the results of a set of quantitative and qualitative analyses aimed at comparing truthful and deceptive testimonies with respect to the stylometric, content, and speech indicators described above (cf.
Section 3.4). In particular, we performed a qualitative comparison of the results of frequency distribution analyses carried out on the two sets of retellings, and we accompany them with inferential statistical analyses as described below. Note that we relied on non-parametric statistical tests since the distribution of the variables under study did not satisfy the normal distribution assumption required by parametric tests.
The study investigated two complementary levels of analysis. On the one hand, we compared the two sub-corpora of retellings to identify their main differences in terms of stylometric (namely, lexical, and morpho-syntactical) and content properties. On the other hand, a participant-level analysis allowed us to assess whether the testimonies of the same participant varied with respect to the monitored indicators depending on the narrative condition. Such a more fine-grained analysis abstracts away from one’s personal linguistic style, which tends to be fairly fixed [
42], and accounts for the narrative condition as the only variable at play.
Note that we excluded punctuation from these analyses since it resulted from a manual addition on the part of the experimenter in the transcription phase.
4.1. Surface-Related Features: Lexical Variations
Our first analysis was devoted to assessing whether the use of the lexicon varies in the sub-corpora of Truthful (T) and Deceptive (SD) testimonies. To perform such an analysis, we relied on the frequency lists of lemmas acquired from T and SD as described in
Section 3.4.1. As can be noted from
Table 3, which reports the results of the analysis performed on frequency lists, unigrams grouped by POSs were split into
closed-class and
open-class words. The former refers to the category of function words (e.g., pronouns, determiners, conjunctions, and prepositions), which are the most-commonly used words in language playing a functional role in the discourse [
43,
44]. They are distinct from open-class words, which comprise content words that contribute to the meaning of the sentence in which they occur. This class includes nouns, lexical verbs, adjectives, adverbs, and proper nouns. Words that do not belong to the previous classes, such as symbols and interjections, fall in the “Other” group, while “Total” indicates the overall amount of tokens in the respective corpus.
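The grouping described above can be illustrated with a minimal sketch. It assumes that tokens are available as (lemma, POS) pairs tagged with Universal Dependencies UPOS labels; the mapping of tags to classes, the function names, and the toy tokens are illustrative assumptions and are not taken from the study.

```python
from collections import Counter

# Assumed mapping from UPOS tags to the two word classes named in the text.
# Tags not listed here (e.g. AUX, NUM, INTJ) fall into "other" in this sketch;
# the grouping actually used for Table 3 may differ.
CLOSED = {"PRON", "DET", "CCONJ", "SCONJ", "ADP"}
OPEN = {"NOUN", "VERB", "ADJ", "ADV", "PROPN"}


def word_class(upos):
    """Map a UPOS tag to 'closed', 'open', or 'other'."""
    if upos in CLOSED:
        return "closed"
    if upos in OPEN:
        return "open"
    return "other"


def pos_distribution(tagged_tokens):
    """Return absolute counts per POS, relative frequencies per POS,
    and absolute counts per word class."""
    tagged_tokens = list(tagged_tokens)
    total = len(tagged_tokens)
    pos_counts = Counter(upos for _, upos in tagged_tokens)
    class_counts = Counter(word_class(upos) for _, upos in tagged_tokens)
    relative = {pos: n / total for pos, n in pos_counts.items()} if total else {}
    return pos_counts, relative, class_counts


# Invented toy input, for illustration only.
toy = [("el", "DET"), ("ladrón", "NOUN"), ("correr", "VERB"),
       ("rápido", "ADV"), ("eh", "INTJ")]
pos_counts, relative, class_counts = pos_distribution(toy)
```

Comparing `relative` across the two sub-corpora, rather than `pos_counts`, is what separates a difference in corpus size from a difference in the proportion of each POS.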
By looking at the frequency distributions reported in
Table 3, we noticed that the amount of tokens is higher in T than in SD for all POSs and overall. This resulted in a larger sub-corpus (in terms of tokens) of truthful testimonies, and most importantly, it suggests that participants tend to produce longer narratives when telling the truth. However, if we focus on the relative frequency of the POSs in each sub-corpus, we noticed that these are quite similar in T and SD. In other words, although they may vary in quantity, the proportion of each POS with respect to the other word classes remained constant regardless of the narrative condition. Adjectives represent the only notable exception: both their absolute and relative frequencies were higher in the SD sub-corpus. As observed in past research (see, e.g., [
13,
45,
46]), also in our corpus, the distribution of adjectives was skewed in favour of deceptive testimonies. Pronouns are also typically monitored in deceptive texts, as it was found that witnesses tend to create more distance between them and the events when lying, thus using fewer personal pronouns [
19,
22]. In line with previous results, we observed a slightly higher frequency of pronouns in T.
To deepen our analysis, we investigated whether the similarity between the word frequency distributions discussed above also corresponded to a deeper similarity in the use of the lexicon. To this aim, we relied on two metrics, both reported in
Table 3, which offer complementary perspectives on the degree of lexical similarity between the two sub-corpora of truthful and deceptive retellings.
Lexical overlap indicates the rate at which the same words are used in truthful and deceptive testimonies regardless of their distribution. It was computed, for each POS and overall, as the ratio between the number of lemmas appearing at least once in both T and SD and the total number of distinct lemmas in the full corpus.
Spearman’s rank correlation coefficient, on the other hand, allows quantifying whether words in common between T and SD also have similar relevance, measured in terms of their frequency, in the two sub-corpora. Accordingly, the Spearman correlation was run for each POS on the frequency rankings of lemmas shared by truthful and deceptive retellings, ordered by decreasing frequency of occurrence in their respective corpus (e.g., we computed the correlation score between the frequency ranking of adjectives occurring in T and in SD). This metric allowed us to verify whether words having the same POS occurred with a similar relative frequency in both sub-corpora. If so, we could claim that participants refer to the robbery using a similar lexicon.
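As a hedged illustration of the two metrics, the sketch below computes the lexical overlap and a Spearman correlation over the lemmas shared by two frequency lists. The function names and the frequency counts are invented for the example, and the tie handling (average ranks) is an assumption, since the text does not state how ties were treated.

```python
from collections import Counter


def lexical_overlap(freq_t, freq_sd):
    """Lemmas attested in both lists divided by all distinct lemmas."""
    shared = set(freq_t) & set(freq_sd)
    union = set(freq_t) | set(freq_sd)
    return len(shared) / len(union) if union else 0.0


def average_ranks(values):
    """Rank by decreasing value (rank 1 = most frequent), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_shared(freq_t, freq_sd):
    """Spearman correlation (Pearson on ranks) over shared lemmas only."""
    shared = sorted(set(freq_t) & set(freq_sd))
    if len(shared) < 2:
        return float("nan")
    x = average_ranks([freq_t[w] for w in shared])
    y = average_ranks([freq_sd[w] for w in shared])
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


# Invented adjective counts, for illustration only.
adj_t = Counter({"negro": 9, "caluroso": 7, "alto": 4, "fresco": 2})
adj_sd = Counter({"blanco": 8, "alto": 5, "negro": 3, "caluroso": 1})
overlap = lexical_overlap(adj_t, adj_sd)   # 3 shared lemmas out of 5 distinct
rho = spearman_shared(adj_t, adj_sd)
```

Because the correlation is computed only on shared lemmas, a POS can combine a high overlap with a low correlation, which is the pattern reported for adverbs.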
Overall, the lexical overlap between truthful and deceptive retellings was around 50%. As can be noted, the overlap was generally higher for closed- than open-class words, with conjunctions even showing a perfect overlap (100%), which indicates that the two sub-corpora of testimonies use the exact same set of words for this POS. This fact is not particularly surprising: the closed class consists of a quite fixed set of elements with highly grammaticalised roles; thus, they may not be used as consciously as content words [
44]. The strong Spearman correlation coefficients (>0.9) observed for almost all closed-class POSs indicated also a similar frequency of occurrence in the two sub-corpora. Notably, and corroborating what was said above, pronouns showed the lowest correlation for this class (0.867), indicating a slightly different frequency of occurrence of shared words for this POS in T and SD.
Open-class words, on the other hand, showed lower correlations than closed classes, especially in the case of nominal, i.e., adjectives (0.667), and verbal modifiers, i.e., adverbs (0.797). As a further remark, it should be noted that these are the POSs showing a lower and higher lexical overlap, respectively. In the case of adjectives, this result indicates that only a few words were shared by the sub-corpora, and their rankings based on frequency were also quite different. A similar difference emerged from the correlation score obtained for adverbs, but the high lexical overlap indicated that adverbs used in truthful testimonies generally occurred also in deceptive ones. This result is even more interesting if compared to nouns, verbs, and proper nouns, which instead showed relatively high lexical overlap and correlation. Indeed, assuming that nouns and verbs are used to convey information about entities and events (in this study: the robbery, the thief, the robbed, and the witnesses), their higher correlation with respect to modifiers seems to suggest that, when lying, participants preferred to alter the qualitative properties of actions and entities rather than the core facts of the stories. It could be argued that this is a consequence of the instructions given to the participants. Indeed, in order to comply with the instructions, nouns and verbs must somehow remain the same in both narrative conditions in order to refer to the robbery and the people involved in the story, while the deception can concern qualitative elements of facts and people, or even participants’ perception of the events as witnesses, linguistically expressed by means of nominal and verbal modifiers.
To offer further evidence for this peculiar fact, consider the word lists in
Table 4, which reports the 10 most-frequent words for four open-class POSs in T and SD. When we looked at the lists of nouns and verbs, we noticed that they were in fact quite similar. However, the noun “pelo” (hair) is among the most-frequent nouns in SD, but not in T, suggesting that the physical appearance of the characters in the story was more thoroughly described in deceptive testimonies. This intuition seems confirmed by the lists of most-frequent adjectives: while in T, we found highly frequent words referring to the day of the event (i.e., “caluroso” (hot) in 2nd position in T and 7th in SD; “fresco” (cool) in 10th position in T and not appearing in SD), the most-frequent adjectives in SD were mainly used to describe qualitative physical properties, in particular the colour of the hair and skin of the thief. Interestingly, “negro” (black) and “blanco” (white) were the most-frequent adjectives in T and SD, respectively, indicating that the appearance of the thief in the two sub-corpora was diametrically opposed. For the interpretation of the results, it is important to note that the ethnicity of the characters varied across the stories (see the pictures in
Appendix A) and that, for the simulated deception condition, participants were asked to focus the lie on the identification of the thief. Taking this into account, the results indicate that, when faced with the truth condition, the skin colour of the robber was more likely to be emphasised when the robber was African-descended than when the robber was Caucasian. However, when participants were asked to lie with regard to the identity of the robber, there was a perceived tendency to change the ethnicity of the African-descended thief rather than other physical characteristics. This can be deduced from the higher frequency of the adjective “blanco” (white) in the simulated deception condition. In contrast, lying with regard to the Caucasian thief tended to change other physical traits beyond ethnicity. Furthermore, to investigate this fact in more depth, we analysed the set of non-shared adjectives (i.e., those occurring in only one of T and SD): what we noticed is that the thief was usually described as Latin-American in truthful testimonies, while he was more frequently described as “gitano” (gypsy) or “islámico” (Islamic) in deceptive retellings, and the other physical properties mentioned tended to reflect stereotypes associated with these ethnic groups (e.g., “moreno” (brown hair), “violento” (violent), “drogado” (addicted)).
As a final remark on the surface-related properties of the testimonies, it is worth mentioning that the lists of most-frequent adverbs suggest that the high lexical overlap observed for this POS was possibly caused by the frequent use in both sub-corpora of discourse markers (e.g., “entonces” (then), “también” (also), “ya” (now)) and adverbs of time and space (e.g., “detrás” (behind), “antes” (before), “después” (after)). While the latter were used to describe the events in the story, discourse markers do not have a precise meaning, but are quite frequent in speech. The correlation score observed for adverbs (0.797) suggests that the amount of information provided by the witnesses to locate the events in time and space might be quite different in the two sub-corpora. This aspect will be explored in more depth in the content analysis (cf.
Section 4.3).
4.2. Linguistic Style of Testimonies
The analysis reported in this section is aimed at assessing whether the narrative conditions affect the linguistic style of testimonies. To this aim, we relied on a set of linguistic features acquired from T and SD, exploiting the methodology described in
Section 3.4.1. The truthful and deceptive testimonies produced by the same participant were paired in order to perform a participant-level comparison of the extent of variation of the linguistic features depending on the narrative condition. This was carried out based on a two-step analysis. First, we computed the Wilcoxon signed-rank test between paired testimonies to explore which linguistic properties varied significantly between T and SD. Due to space constraints, we report the results of the Wilcoxon signed-rank test in
Appendix B.1. Specifically, we report in the table the W score, the
p-value, and the effect size score
r [
47,
48] of the test. Then, for those features showing a significant variation between sub-corpora, we checked (i) their mean values in T and SD and (ii) the impact of the narrative condition on the linguistic productions of participants. The latter analysis was carried out by monitoring the variation between the values of features in the T and SD testimonies of the same participant and specifically checking whether these values increased or decreased depending on the condition. The results of the analysis performed on significantly varying features are reported in
Table 5. To check the results obtained on the full set of features, refer to
Table A2 in
Appendix B.2.
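The first step of this analysis can be sketched as follows. This is a minimal standard-library illustration of a paired Wilcoxon signed-rank test, not the implementation used in the study. It assumes the normal approximation without continuity correction, drops zero differences, reports W as the smaller of the two signed-rank sums, and computes the effect size as r = |Z| / sqrt(N) with N the number of non-zero pairs; all of these choices are assumptions, as are the function name and the toy values.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (normal approximation).

    Returns (W, Z, two-sided p, effect size r).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t  # used for the tie correction of the variance
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    r = abs(z) / math.sqrt(n)             # assumed effect size definition
    return w, z, p, r


# Invented feature values for six paired testimonies (T vs. SD).
feature_t = [5, 7, 8, 6, 9, 4]
feature_sd = [3, 7, 5, 5, 4, 6]
w, z, p, r = wilcoxon_signed_rank(feature_t, feature_sd)
```

With samples this small an exact test would normally be preferred; the normal approximation is used here only to keep the sketch short.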
Based on our analysis, we found a selection of around 30% of features that showed a significant variation of their values in the testimonies due to the narrative conditions. These features mostly concerned Raw Text properties, the distributions of Parts-Of-Speech (POS), and dependency relations (SyntacticDep), features regarding global and local properties of the parsed dependency Tree (TreeStructure), and the use of Subordination (Subord). Lower discriminative power was assigned to the Inflectional morphology of Verbs (VerbInflection), while the Verbal Predicate structure (VerbPredicate), i.e., the number of dependents of a verbal head, and the linear Order of elements in the sentence (Order) turned out to not vary significantly between the two sub-corpora.
This linguistic analysis complemented the study on the lexicon discussed in
Section 4.1 in multiple ways. First of all, it confirmed the different distributions observed for certain grammatical categories, such as adjectives and pronouns, both with respect to their
POSs and the dependency relations linking them to their syntactic head (see the features in the
SyntacticDep group). Additionally, it allowed investigating in more depth the surface properties of the testimonies, as well as deeper linguistic structures. Concerning the former, consider for instance the length of testimonies. The greater length of truthful narratives, observed above with respect to the overall amount of tokens in the sub-corpora, was also reflected at the level of individual testimonies. In particular, significantly varying features of the
Raw Text group revealed that testimonies in T showed a higher number of sentences and tokens. If we look at this result in light of the variation analysis, we can also see that slightly more than half of the participants produced longer retellings in the truthful narrative condition, but they used a higher number of tokens in 75% of cases. This might suggest that, even when the retellings had an equal number of sentences, these tended to be longer in the truthful testimonies. However, if we look at the average number of tokens in sentences (feature
tok_per_sent, cf.
Table 5), we noticed that the average sentence length was equal to about 20 tokens in both sub-corpora, and this feature was therefore correctly treated as non-discriminative for deception. This indicates that the length of sentences was consistent across conditions, and the tendency to produce longer truthful testimonies was specific to a subset of participants, who simply produced a higher number of sentences.
Such a difference was reflected also in the local and global structure of the parsed dependency trees representing the syntactic structure of sentences. Features falling in the
TreeStructure and
Subordinate groups showed that, although all sentences had a similar length, they were characterised by a different deeper syntactic structure depending on the narrative condition. Testimonies of the T sub-corpus, for instance, showed higher use of subordination (feature
subordinate_proposition_dist) and of embedded nominal modifiers (feature
n_prepositional_chains), as shown in the following sentence excerpt acquired from the T corpus:
“tenía un poco de cara de mala hostia” (trad. “he had a bit of a grim face”), where the noun
“hostia” is syntactically dependent on
“cara”, which in turn has
“poco” as its syntactic head. Deceptive testimonies, on the other hand, showed on average longer clauses (feature
avg_token_per_clause). These elements seem to point to a higher complexity of truthful testimonies, confirming what was stated by [
21]: creating and managing misinformation are more cognitively demanding than telling the plain truth; thus, shorter and simpler testimonies are produced when lying.
A further result emerging from this analysis concerns the inflectional morphology of verbs (
VerbInflection group). As can be noted from
Table 5, the presence of verbs in the present tense seems to have a discriminative role for truthful testimonies. Investigating this fact in more depth, we noticed that this was due to a higher presence of linguistic hedges, i.e., words or phrases used to express ambiguity, probability, caution, or indecisiveness. These were primarily expressed by means of some verbs:
creer (to believe),
recordar (to remember),
saber (to know). This is in contrast with what was observed in previous studies, such as [
49], which found a higher presence of these expressions in deceptive texts. It is essential to analyse the phenomenon behind the emergence of these verbs related to cognitive processes, as detailed in the discussion.
4.3. Content and Speech Disfluencies Analysis
The analysis of content and speech characteristics was carried out on the basis of the tags manually annotated on the transcripts, using the procedure described in
Section 3.4.2. A descriptive comparison was then made for each property. As for the participant-level analysis carried out on linguistic features, we paired the retellings of participants based on the narrative condition in order to monitor whether a property is more, equally, or less present in truthful than deceptive testimonies.
Regarding the analysis of the content, which concerns
cognitive and
motivational criteria, the results displayed in
Table 6 show that the former were more highly present in T. Motivational criteria, on the other hand, were mostly equally present in the sub-corpora (see column “Variation”). However, if we look at this result in light of their absolute frequency, which indicated a higher number of motivational tags in T, we could imagine that their distribution was skewed in the sub-corpora. Consider, for instance, the tag capturing cases of lack of memory: the variation analysis showed that, in 75% of cases, the participant was not affected by the narrative condition when recalling the story, which indicated that retellings showed the exact same number of
<lm> tags. Looking at this result in more depth, we noticed that the tag was not used in almost all of these cases (97.23%), meaning that the participant did not experience any lack of memory when retelling the stories. By looking again at the tag distribution over the whole corpus, we can conclude that the lack of memory was experienced by only some participants (56.25% of subjects taking part in the study) who tended to express it in most of their retellings, but with a higher frequency in the truth condition.
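The variation analysis discussed above can be sketched as a simple tally over paired retellings. The function name, the labels, and the tag counts below are hypothetical and serve only to show how the share of "equal" pairs can be dominated by pairs in which the tag never occurs.

```python
from collections import Counter


def variation_profile(counts_t, counts_sd):
    """Share of pairs in which a tag is more, equally, or less frequent
    in T than in SD, plus the share of 'equal' pairs with zero tags."""
    labels = Counter()
    zero_ties = 0
    for t, sd in zip(counts_t, counts_sd):
        if t > sd:
            labels["more_in_T"] += 1
        elif t < sd:
            labels["less_in_T"] += 1
        else:
            labels["equal"] += 1
            if t == 0:
                zero_ties += 1  # tag absent from both retellings
    n = sum(labels.values())
    shares = {k: labels[k] / n for k in ("more_in_T", "equal", "less_in_T")}
    zero_share = zero_ties / labels["equal"] if labels["equal"] else 0.0
    return shares, zero_share


# Invented <lm> tag counts for eight paired retellings.
lm_t = [1, 0, 0, 2, 0, 0, 1, 0]
lm_sd = [0, 0, 0, 1, 0, 0, 1, 0]
shares, zero_share = variation_profile(lm_t, lm_sd)
```

Reporting `zero_share` alongside the "equal" share makes it explicit when an apparent lack of variation simply reflects the absence of the tag in both conditions.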
With respect to speech disfluencies, they did not seem to vary between the truthful and deceptive testimonies, as their distribution was quite balanced in the two sub-corpora. However, it should be noted that some tags were far more frequent than others. This was the case for repetition and, most notably, hesitation. The latter captured cases of speech fillers, namely sounds filling a pause in utterances (e.g., “uhm”, “eeh”), which indeed occur quite commonly in narrative tasks.
Figure 1 depicts error bar charts for conditions with statistically significant differences under the Wilcoxon signed-rank test. In relation to the cognitive criteria, all analyses reached the level of statistical significance. More precisely, differences between T (Mdn = 6) and SD (Mdn = 3) in contextual information reached the significance level (W = 967,
p < 0.001), as well as differences between T (Mdn = 3) and SD (Mdn = 2) in superfluous details (W = 634,
p < 0.001) and differences between T (Mdn = 9) and SD (Mdn = 5) in the quantity of details (W = 1102,
p < 0.001). The same pattern occurred for motivational criteria, where the Wilcoxon signed-rank test showed that differences between T (Mdn = 1) and SD (Mdn = 0) in admission of lack of memory reached the significance level (W = 286,
p < 0.001), as well as differences between T (Mdn = 0) and SD (Mdn = 0) in spontaneous correction of content (W = 104,
p < 0.05). Lastly, and in relation to speech disfluencies, grammatical correctness also depicted statistically significant differences between T (Mdn = 1) and SD (Mdn = 0), W = 21,
p < 0.05. Other analyses did not reach the level of statistical significance in this condition.
5. Discussion and Conclusions
In this work, we introduced a study based on a free recall task aimed at collecting 384 truthful and deceptive retellings of two stories describing a robbery. To this aim, we recruited a balanced sample of 48 volunteering participants acting as witnesses to the robbery and pretending to be testifying in a trial. Our main goal was to assess whether there was a difference in terms of linguistic style in the testimonies based on the narrative condition in which these were produced. To this aim, we created a novel corpus of truthful and deceptive testimonies in Spanish, and we proposed a methodology that integrates the psychological and NLP perspectives for studying the impact of deception on the linguistic style of the collected testimonies. In particular, we explored the differences between truthful and deceptive retellings by monitoring multiple stylometric dimensions, concerning the lexical, morpho-syntactic, and content and speech properties of the testimonies. Although relatively small, the corpus of retellings allowed us to conduct investigations on the stylistic differences driven by the truth and simulated deception narrative conditions, which we now discuss in light of the literature.
The results of our analysis highlighted stylometric differences between the two groups of retellings (i.e., truthful and deceptive) and showed how lexical, linguistic, and content and discourse features interact to shape the expressive style of each condition. One of the main results concerns the use of the lexicon, which did not seem to be strongly affected by the narrative condition, as discussed in Section 4.1. Nevertheless, we observed a slight difference in the use of nominal and verbal modifiers, as well as pronouns, in the two sub-corpora.
In accordance with the literature [5, 19, 22], such variation concerning nominal modifiers seemed to indicate a tendency to focus lies on qualitative elements rather than on the central facts of the stories, possibly creating a personal distance from the facts. In particular, it is important to highlight two aspects of this analysis. The greater presence of adjectives (hot, cool, contrary, etc.) and adverbs (behind, before, after, etc.) in the subset of truthful testimonies indicated a greater presence of descriptive content.
On the other hand, when participants had to lie in identifying the thief, a racial bias seemed to emerge. In particular, there was a tendency to emphasise the skin colour of the African-descended thief, as opposed to the Caucasian one. Furthermore, this physical trait prevailed over other characteristics that might describe the African-descended thief. At this point, it is also important to consider how stereotypes can influence the memory of witnesses [50, 51]. This can be explained by the cross-race effect, that is, a tendency to perceive the out-group in terms of general characteristics, losing the sensitivity to detect individual attributes [52, 53].
Regarding the linguistic analysis, the main differences that emerged concerned the length of the text, the distributions of parts-of-speech and syntactic relations, the structure of the parsed dependency tree, and the use of subordination. Since the linguistic properties acquired using Profiling-UD showed a high and significant correlation with the perceived complexity of texts [34], we can argue that the truthful condition results in retellings with higher syntactic complexity than the simulated-deception condition. These findings are in line with other research both in Spanish [5, 24] and in other languages [19, 22]. There seems to be a consensus that lying entails a greater cognitive load [40, 54], including, neurologically, a greater activation of the frontal lobe [55]. In particular, to lie convincingly, it is necessary to be able to multi-task (suppressing the truth, creating new information, appearing credible, etc.) [56]. Multiple resource theory [57] explains how task performance can be affected by several factors: task demand (difficulty), resource overlap (if two tasks require the same resources, interference may occur), and resource allocation (executive control prioritises tasks, with more errors occurring in tasks that are in a secondary position).
In terms of the cognitive criteria for content, our results indicated that truthful testimonies tend to include a greater amount of detail, including contextual and superfluous details. As we have seen, this conclusion is supported by the lexical analysis. In the literature, there seems to be agreement on the link between detail and truthfulness [22, 24, 58]. Furthermore, this trend can be observed in different cultures, such as British, Arabic, or Chinese [59]. This may be due to a lack of ability to provide details when lying [8] or a fear that these details will be checked [60]. In fact, the verifiability approach [61] states that false testimonies tend to provide fewer verifiable details, but more unverifiable details, such as common knowledge or general details [14]. However, it is suggested that this difference disappears if time is allowed to elapse between encoding and interview, due to memory decline in truth-tellers and stability bias in liars [40].
As regards the motivational criteria of content, a greater tendency was found to admit a lack of memory in truthful testimonies, compared with making spontaneous corrections. In this regard, it is believed that when lying, people focus more on strategic self-presentation in order to appear credible [14]. This means that liars are less likely to self-correct, admit memory lapses, or doubt their own testimony [21, 22]. This tendency is linked to the higher frequency of verbs in the present tense observed in truthful testimonies. Although the literature has associated the present tense with lying [5], in this study, the greatest use of this verb tense was concentrated on the verbs “to believe”, “to remember”, and “to know”. Specifically, they were used in expressions such as “I believe it was...”, “I can’t remember anymore”, or “I don’t know for sure whether”. Therefore, these expressions can be seen as a reflection of the admission of memory failure [5, 9, 21, 22]. However, some studies have associated lying with references to thinking, memory, and other cognitive processes [49, 62]. To resolve this inconsistency, it is important not to confuse the admission of lack of memory with self-handicapping strategies [63]. These strategies consist of justifying why certain information cannot be given. They are used in lying because, as noted above, fewer verifiable details are offered and simply admitting a memory failure without justifying the reason is considered detrimental to credibility [63]. In this way, liars tend to avoid criteria related to positive strategic self-presentation efforts (admitting a lack of memory and spontaneous corrections) and also information deviating from the script (superfluous details) [15]. This increases the validity of these criteria for detecting truthfulness.
Finally, in general terms, no differences were found in terms of speech disfluencies. This result is consistent with other studies. There is a popular belief that lying can be detected through non-verbal communication. However, despite a large body of research, there seems to be no scientific evidence for this, for various reasons: some signals have not yet been analysed; the measurements are imprecise; the non-verbal expression of lying is related to individual or contextual differences; there is a group of signals with greater discriminative power; there is an imitation of the interviewer’s behaviour; liars and truth-tellers use the same non-verbal strategies [40]. It is important to focus on this last reason, as in this study, we observed a similar pattern between the two conditions, with hesitation and repetition being the most frequent. It is proposed that speech disfluencies are fillers that are linked to the speaker’s Feeling of Knowing (FOK), that is, with the confidence expressed by the speaker [64]. Similarly, there appears to be the same relationship with the Feeling of Another’s Knowing (FOAK), or the listener’s perceived confidence [65]. As proposed by Dinkar and colleagues [66], fillers have various functions that relate in different ways to trust. In this connection, it might be interesting to study spoken language processing in court hearings, relating the different functions of fillers to FOAK.
In the study, ecological validity was sought through intentional non-coding, the distractor counting task, and verbal testimony. However, some authors agree that discrimination ability is lower in the laboratory than in reality [7]. One should bear in mind that we were dealing with a simulated deception setup. For this reason, biases related to the ecological validity of the results might occur. Future work should try to examine ecological validity from a simulation environment. In contrast, the advantage of these experiments lies in the control of variables, which increases the validity of the conclusions [5]. For this purpose, the sample was balanced in terms of the gender and age of the participants. However, these variables, along with many others, such as IQ or narrative skills, influence the ability to remember and also to fabricate false testimony [14, 67, 68, 69]. As future research on deception, it would be interesting to study the linguistic and content characteristics that vary or remain stable in relation to these variables. Other limitations are related to the task selected from the literature. Systematic replications might benefit from a more balanced set of stimuli regarding language use in testimonies.
In summary, these findings are relevant in the forensic context. Considering the existing barriers, as well as the increasing need to implement current techniques, psycho-linguistics might offer a cutting-edge approach in the field. In particular, professionals are advised to take scientific findings into account in the face of the myths associated with lying. It is also recommended to pay attention to the influence of stereotypes on testimony. Moreover, this study contributes to the development of NLP tools for deception detection in Spanish. Indeed, Spanish, along with other Romance languages, provides a less-explored scenario for detecting deception and an interesting test bed to evaluate the cross-linguistic validity of earlier studies, which were mainly conducted on English. Nevertheless, the setup of this study is potentially language-independent thanks to the multilingual syntactic representation formalism provided by UD, which guarantees the comparable encoding of language phenomena across different languages [35] and upon which we based the acquisition of the stylometric properties from the testimonies. Furthermore, the content and speech-disfluency properties are also general enough to be applied across languages, as shown by their past use on a corpus of Italian texts [41]. For future research, it is recommended to further explore deception detection in a real context and to take advantage of the synergy between computational linguistics and psychology to increase the sensitivity of the instruments.