1. Introduction
The emergence and global use of social media are the most notable manifestations of the world’s information and communication technology (ICT)-based development. In this new worldwide paradigm, data are exchanged at lightning speed over social networks, resulting in an endless number of activities, including the manipulation of public feelings and emotions. Due to the fertile ground formed by the specific ways in which social networks dominate the internet, processes such as the exchange of jeers, insults, feelings of hatred, admiration, or other human emotions have recently become more common with international events. This is despite the fact that such processes are not novel in the annals of communication. Twitter, in particular, can offer more. Twitter messaging can aid in establishing and preserving an ethical image, which lessens the impact of ethical crises [
1]. It is currently one of the most prominent platforms worldwide [
2]. It is among the best online platforms for social interaction, introducing novel methods of storytelling through tweets, for which several methods for examination are possible. First and foremost, tweets allow conversations between speakers of various languages, highlighting their critical role in removing linguistic and geographic barriers. This gives researchers access to broader and more in-depth research topics. As a result, over the past five years, a body of research on emotional communication on Twitter has emerged [
3].
Currently, the popularity of Twitter generates a huge amount of behavioral data that may be used in many different ways. Among these is to understand how Twitter reflects the emotional and sentimental cultural aspects of various nations. A further approach is to determine whether there are significant cross-national differences in how emotional and sentimental tweets are expressed and, if so, how these differences relate to the cultural differences noted by researchers. Although several academics have examined this topic, few have provided cross-national comparisons [
4]. We can therefore anticipate fewer cross-national comparisons between developing world regions and nations due to the absence of financial support for technological and cultural advancement. Additionally, this may be due to a variety of linguistic factors [
5]. In a group of languages that includes Arabic, Turkish, and Hebrew, each input word can be composed of multiple lexical and functional elements, making natural language processing (NLP) extremely difficult [
6]. Due to issues such as the grammatical complexity of the Arabic language, its variety of dialects, and the need for reliable data sources, there is a lack of Arabic-language research in NLP, particularly in summarization [
7]. Amharic is among the languages with the fewest resources in ML. Therefore, it requires cost-effective strategies and enormous quantities of annotated training data, as well as techniques distinct from those used for the English language. Consequently, in developing nations, we can expect a small number of successfully developed NLP algorithms, which are frequently built on the foundation of machine learning (ML) algorithms.
This paper utilizes a mixed-method approach, which we believe to be the most suitable technique. First, all the tweets published in Arabic and Spanish during the Argentina vs. Saudi Arabia World Cup 2022 football match were gathered. Next, they were analyzed quantitatively in terms of the number of tweets and their distribution between Arabs and Hispanics. The qualitative-content-analysis technique was then used to probe into the texts to uncover cultural differences hidden within users’ emotional and semantic traits.
The proposed model comprises successive phases, each of which builds upon the result of the previous one. First, we collected the tweets during the specified period and separated them into Arabic and Spanish groups. Next, we translated both groups into English as an intermediary language and performed an emotion and sentiment analysis of the translated material. The final step was to classify the detected emotions as low or high in arousal, based on the distribution of basic emotions in the arousal–valence space. Subsequently, the data were examined in terms of the collectivistic levels of Arabs and Hispanics, as reported in the literature, and the results are presented in tables and charts. Translating both sets of tweets into English was the only way to compare them on equal terms in the emotion and sentiment analysis. No method was available for analyzing emotions in both native texts: EmoRoberta cannot analyze Arabic text, although it can analyze Spanish text. Applying EmoRoberta to native Spanish text and to English translations of Arabic text would have made the comparison unfair, because only one of the two groups would have passed through machine translation.
In our research, we used both a sentiment analysis and an emotion analysis to compare Arabs and Hispanics regarding their tweets during the Saudi Arabia vs. Argentina match in the Fédération Internationale de Football Association (FIFA) World Cup in Qatar, in 2022. We did not choose one analysis over the other because they are two independent methodologies that generate unique insights. Emotion analysis provides a finer degree of granularity than sentiment analysis, so it can be regarded as an additional layer on top of the comparatively simple sentiment analysis, and the two kinds of model may be considered mutually supportive. Both emotions and sentiments often contribute to the building of a collectivist culture within a community. Thus, in our research, we seek to expand the scope of cross-national social media research by discussing the emotional and sentimental cultural components of Arab and Hispanic tweets. Our study exemplifies the use of Twitter data to explore research subjects that have rarely been addressed due to a lack of data and technology. It reveals how cross-cultural variances affect Twitter users’ messages and makes it easier to distinguish between Twitter users with diverse cultural origins.
To the best of our knowledge, this is the first study to systematically examine and compare emotional and sentimental indicators in the content of tweets published in various languages. In addition, although they account for a major share of the world’s population (approximately one billion people), the Arab world and Latin America are underrepresented in social and cultural psychological research. This is therefore the first study to compare the emotional and sentimental cues in the content of tweets between two groups from the developing world. In terms of generalization, since the main focus of this study is on the sentiments and emotions that language constructs contain, its approach can be generalized to all Arab countries or narrowed to a single Arab country. The same holds for the Spanish-speaking nations of Latin America. Moreover, this cross-national comparison between the two communities will help to reassess the current dimension of sentiments as positive, negative, or neutral, providing a more sophisticated approach to this question. It will also help to reassess emotional dimensions such as admiration, amusement, anger, etc., from a more sophisticated perspective.
This study was motivated by the growing practice of incorporating Twitter into cultural studies. This inspired us to investigate and compile information on the cultural distinctions between Arabs and Hispanics, as evidenced by their tweets. These two communities are both known to originate in emerging regions: the Arab world and Latin America. There are no Twitter-based cultural comparisons between these nations. This was a further research gap we were inspired to fill. The intense interest among Arabs and Hispanics in the World Cup match between Saudi Arabia and Argentina sparked this motivation. In addition, there is a perception that contemporary societies are complex and that these complexities have affected their cultures. Therefore, the categorization of Arabs and Hispanics as collectivistic or individualistic may be significantly altered. As a result, the comparison of Arabs and Hispanics will help researchers to reconsider current cultural dimensions.
Consequently, the purpose of this study is to investigate the emotional and sentimental cultural traits of viewers from Arabic- and Spanish-speaking communities, as observed in their tweets about a specific football match. We aimed to determine whether, even though they speak different languages, cultural distinctions can be seen in the online communication of the respective communities.
The remainder of this paper is structured as follows. In
Section 2, we present the literature related to our own research. Next, in
Section 3, we describe the methodology we employ. Subsequently, in
Section 4, we report the implementation of the proposed methodology, through which we obtained the relevant tweets, translated them, and performed an emotion and sentiment analysis of their content. Next, in
Section 5, we present a detailed discussion, in which we correlate the findings of the data analysis with the findings in the literature and highlight the results accordingly.
Section 6 concludes with a summary of the results, the limitations of the work, and a discussion of our contribution.
2. Related Work
Few studies have been undertaken on the cross-national emotional and sentimental characteristics of Twitter tweets. Fewer still have been carried out on developing nations [
8]. Cross-cultural emotional differences regarding urban greenspace (UGS), as disclosed in English and German tweets, were investigated. A sentiment analysis was conducted on the collected tweets to determine the sentiment values and their corresponding tweet numbers. The results indicate that different emotions are elicited by different types of UGS, and that English and German demands were distinct, as evidenced by their tweets, with the highest sentiment values in gardens and parks, respectively. The activity environment contributed most to positive emotions, regardless of cultural differences. The findings of the study indicate that human emotions can indicate whether UGS supply satisfies human needs and that particular contextual factors can promote positive human emotions to sustain their needs in a cross-cultural context [
9]. A novel sentiment-analysis technique that enables a comparison of the emotive contents of Twitter messages in the United States and Japan was used by researchers investigating how affective cultural values may influence social media use. The study revealed that Japanese users primarily produced low (vs. high)-arousal posts, whilst U.S. users mostly produced positive (vs. negative) posts, in line with their respective cultural and emotional values. However, in contrast to their affective cultural values, the Japanese users were more affected than the U.S. users by changes in others’ high-arousal positive posts (expressing feelings such as excitement) than by changes in their high-arousal negative posts (expressing feelings such as fury). When accounting for variations in baseline exposure to emotive content across various themes, these trends persisted. The authors claimed that, across cultures, social media users are affected by culturally relevant content that contradicts their affective values [
10]. An empirical study on how Twitter users employ emojis was presented. The research used a comprehensive, cross-regional data set from Twitter to perform the analysis. The authors employed distributional semantic models to express emoji semantics and contrasted country-specific emoji models. It was observed that the categories and frequencies of the emojis expressed by users could serve as rich sources of data for understanding cultural differences between Twitter users from a wide variety of demographics. The study indicated that the preferred usage of emojis conforms to Hofstede’s cultural dimensions model, in which different cultural dimensions within countries demonstrate considerably diverse uses of emojis to express emotions.
From another perspective [
11], researchers investigated how people’s use of emoticons on Twitter is influenced by cross-cultural variations. By combining Twitter-emoticon-usage patterns with Hofstede’s culture index, the authors found that people from collectivistic cultures favor vertical and eye-oriented emoticons, whereas people from individualistic cultures prefer horizontal and mouth-oriented emoticons [
12]. People’s emotions about the Russia–Ukraine war (RUW), captured from tweets, were examined. The study proposed a framework for automatically categorizing the many societal emotions on Twitter using a pre-trained ML technique, EmoRoberta. The model extracted 27 distinct emotions exhibited by Twitter users, which were then classified using machine-learning techniques. The study found that 81% of the Twitter users who participated in the survey had a neutral opinion of the RUW; however, there were hints concerning countries other than Russia and Ukraine, including Slovakia and the United States. The majority of the tweets described the RUW with terminology more closely associated with Ukraine than with Russia [
13]. Key clinical indicators of the advancement of the COVID-19 pandemic were compared with indicators of public perceptions of the pandemic revealed from 20 million related tweets in a certain period. There were signs of psychophysical characteristics: Twitter users were becoming increasingly interested in death, but their tone was shifting away from passion and towards reason. Word co-occurrences were analyzed semantically to reveal variations in the affective context of COVID-19 fatalities. Their calculated parameters agreed with the estimations from the psychological experiments. The research demonstrated that users’ tweets differ in their sensitivity to national COVID-19 mortality rates based on their country.
Few ML-based cross-national studies on the emotional and sentimental characteristics of Twitter material have been conducted. Fewer still have been conducted on emerging nations, with the majority performed on developed nations. One study [
8] focused on English and German tweets. Another [
9] focused on user tweets from the USA and Japan. The authors of [
10] compared the usage of emojis, not the text, in tweets by users from many countries, of which four were developing countries: Indonesia, Brazil, the Philippines, and Mexico. To contribute to the closing of this research gap, we based our comparison of Arab and Hispanic tweets on text rather than emojis. Furthermore, the authors of [
10,
11] used Hofstede’s cultural dimensions as a base for cultural comparison, while others did not. This was comparable to our method, in which we used only one Hofstede dimension, individualism vs. collectivism (IDV), as we found that IDV is the most frequently studied Hofstede dimension to date. The authors of [12] assessed users’ emotions in captured tweets by using the pre-trained ML model, EmoRoberta, which was identical to our strategy. The authors of [
13] examined how the public emotions relating to the COVID-19 pandemic were revealed by 20 million tweets relating to this topic during a specific time frame, and compared their findings to those of psychological experiments. This is comparable to the way in which we conducted our research, which involved mapping our findings to literary descriptions of various cultural traits. In addition, all the previous studies used Twitter text messages in English as a basis for their analyses and comparisons, except [
10], which used the emojis in the tweets examined. On the other hand, we considered textual Twitter messages in languages other than English and translated them into English before conducting the emotion analysis. While more research is required in this area, we believe that the agreement between our study’s findings and the literature regarding the collectivistic differences between Arabs and Hispanics is only one example of how the translation preserved the emotions in the Twitter content.
3. Methodology
To achieve the goal of this work, our suggested method for extracting and classifying societal emotions entails many interconnected components. The proposed method’s workflow is depicted in
Figure 1.
As mentioned above, in this paper, we utilize a mixed-method approach, which we believe to be the most suitable technique. First, all tweets published in Arabic and Spanish during the Argentina vs. Saudi Arabia World Cup 2022 football match were gathered. Next, they were analyzed quantitatively in terms of the number of tweets and their distribution between Arabs and Hispanics. The qualitative-content-analysis technique was then used to probe the text to uncover cultural differences hidden within users’ emotional and semantic traits.
Within this methodology, our research adopted a novel approach. We compared two non-English-speaking communities from the developing world (Arab and Hispanic) based on their Twitter activity. We translated their tweets from their original languages into English, and then used NLP-based ML algorithms designed for English text to culturally assess these tweets and compare the sentiments and emotions of the two groups. Furthermore, in our approach, the problem of detecting users’ physical locations from public Twitter information was addressed by using the language of the gathered tweets, which revealed whether a tweet’s author was Arab or Hispanic and therefore served as a clue to the author’s home nation. However, since the sentimental and emotional contexts of tweets may change when they are translated from one language to another, we used the study’s results as a guide to establish how effectively the translation preserved sentiments and emotions at a level suitable for cultural comparisons. Following translation, the tweets were emotionally rated using an English-based machine-learning model, and the results were presented in descriptive charts. Next, the emotions extracted from the tweets were classified in terms of their arousal level, high vs. low, in accordance with the fundamental emotion distribution in the arousal–valence space [
14,
15]. Next, we addressed how these data can be understood in terms of the nation-based cultural dimensions proposed by psychological researchers. Finally, as mentioned above, where the research results were consistent with the literature on the cultural distinctions between Arabs and Hispanics, we viewed this consistency as evidence of how effectively, to some extent, the machine translation preserved the sentimental and emotional clues in the text. We provide more details in the following sections.
It is important to highlight that translating the Arabic and Spanish tweets into English was the only way to compare them on equal terms in the emotion and sentiment analysis. No method was available for analyzing emotions in both native texts: EmoRoberta cannot analyze Arabic text, although it can analyze Spanish text. Applying EmoRoberta to native Spanish text and to an English translation of Arabic text would have made the comparison unfair, because only one of the two groups would have passed through machine translation.
3.1. Capturing Tweets
We captured all the tweets written in Arabic and Spanish that were posted during the period of the Argentina vs. Saudi Arabia football match in the FIFA World Cup in Qatar, in 2022. We chose 22 November 2022 as the starting date, which was the day of the Argentina vs. Saudi Arabia match (the day when the trend “Where is Messi” started in Arabic), and 23 November 2022 as the ending date, which was one day after the match. We developed a Python script to communicate programmatically with Twitter via its Application Programming Interface (API) for developers. The script takes as input a key phrase, a start date, and an end date, and returns all tweets containing the specified key phrase within the specified period. The tweet-gathering period was marked by a high level of tension among both Arabic-speaking and Spanish-speaking users. To facilitate the cross-cultural analysis, the script applied specific filters when retrieving the Twitter data, so that only users following the Argentina vs. Saudi Arabia match, across hundreds of countries and regions, were represented. As the aim of this work was to study emotions and sentiments on Twitter for Arabic and Spanish speakers only, we needed to rule out tweets in any other language. Therefore, we parsed all the tweets posted by unique Arabic- and Spanish-language users in the selected period. We crawled all the Arabic and Spanish Twitter data throughout the collection period with the key phrase “Where is Messi” written in either Arabic or Spanish. Messi is an Argentinian football player known to football fans worldwide.
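The selection rules described above (key phrase, date window, language, and one record per tweet) can be sketched as a small filter. This is a minimal illustration rather than the script used in the study: the record fields (`id`, `lang`, `created_at`, `text`), the function names, and the choice to treat 23 November as an inclusive end date are all assumptions, and the call to the Twitter API itself is left out.

```python
from datetime import datetime, timezone

# Collection window in UTC. The dates come from the text; treating the end
# date as inclusive (so the window closes at the start of 24 November) is an
# assumption of this sketch.
START = datetime(2022, 11, 22, tzinfo=timezone.utc)
END = datetime(2022, 11, 24, tzinfo=timezone.utc)


def in_window(created_at, start=START, end=END):
    """Return True if an ISO 8601 UTC timestamp lies in [start, end)."""
    ts = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return start <= ts < end


def select_tweets(tweets, key_phrases, allowed_langs=("ar", "es")):
    """Keep tweets in an allowed language, inside the window, that contain
    at least one key phrase. A repeated tweet id is kept only once."""
    seen, kept = set(), []
    for tweet in tweets:
        if tweet["id"] in seen or tweet["lang"] not in allowed_langs:
            continue
        if not in_window(tweet["created_at"]):
            continue
        text = tweet["text"].casefold()
        if any(phrase.casefold() in text for phrase in key_phrases):
            seen.add(tweet["id"])
            kept.append(tweet)
    return kept


# Hypothetical records; the field names are assumptions, not the API schema.
sample = [
    {"id": 1, "lang": "es", "created_at": "2022-11-22T11:05:00Z",
     "text": "Donde está Messi ahora"},
    {"id": 2, "lang": "en", "created_at": "2022-11-22T11:06:00Z",
     "text": "Where is Messi"},
    {"id": 3, "lang": "es", "created_at": "2022-11-25T09:00:00Z",
     "text": "Donde está Messi"},
]
selected = select_tweets(sample, ["donde está messi"])
print([t["id"] for t in selected])
```

In this sample, the second record is dropped because of its language and the third because it falls outside the window. The matching is a plain case-insensitive substring test, so spelling variants such as accented and unaccented forms of the same phrase would need to be listed separately.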
3.2. Splitting the Tweets
We divided the tweets into two classes based on the language of the key phrase they contained and saved them in two separate tables: the “Messi_ar” table for tweets containing the phrase “Where is Messi” in Arabic, in which it became popular (5186 tweets), and the “Messi_es” table for tweets containing the Spanish phrase “Donde está Messi”, which allowed us to observe the Spanish-speaking population’s use of the expression (398 tweets).
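A minimal sketch of this split is shown below. The table names are taken from the text, but the function name is hypothetical and the Arabic wording of the key phrase is an assumption, since the exact string used for collection is not given here.

```python
# Key phrases per table. The table names come from the text; the Arabic
# wording below is an assumed stand-in for the phrase used in the study.
PHRASES = {
    "Messi_ar": "أين ميسي",
    "Messi_es": "Donde está Messi",
}


def split_by_phrase(texts, phrases=PHRASES):
    """Assign each tweet text to the table of the first key phrase it
    contains; texts that match no phrase are left out."""
    tables = {name: [] for name in phrases}
    for text in texts:
        lowered = text.casefold()
        for name, phrase in phrases.items():
            if phrase.casefold() in lowered:
                tables[name].append(text)
                break  # a tweet goes to at most one table
    return tables


tables = split_by_phrase([
    "أين ميسي الآن؟",
    "donde está messi?",
    "Vamos Argentina",
])
print({name: len(rows) for name, rows in tables.items()})
```

Splitting on the key phrase rather than on a detected language keeps the step deterministic, at the cost of missing tweets that paraphrase the expression.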
3.3. Translation into English
Because the collected tweets were in Arabic and Spanish, they needed to be translated into English as an intermediary language so that we could compare Arabic and Spanish tweets in emotion and sentiment dimensions. To this end, we developed a Python script that interacted with Google Translate. The software received Arabic and Spanish tweets as inputs and delivered them translated into English. This allowed us to compare the tweets using English-based ML algorithms for sentiment analysis and emotion recognition that are already available.
It is known that when using translated data, translated emotions and sentiment data are likely to contain samples that are not indicative of their assigned sentiment or emotion in their source language [
16]. Although the level of preservation may be deemed sufficient for the present cross-national emotional comparison of messages on Twitter, more research is required.
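The translation step can be sketched as a thin wrapper around whatever translation client is used. In the sketch below, the translator is passed in as a plain function, so no particular Google Translate client or signature is assumed; `translate_all` and the stand-in translator are hypothetical names used for illustration only.

```python
from typing import Callable, Dict, Iterable, List


def translate_all(texts: Iterable[str],
                  translate: Callable[[str], str]) -> List[str]:
    """Translate each text into English, calling the translator only once
    for texts that are identical after whitespace normalization."""
    cache: Dict[str, str] = {}
    translated = []
    for text in texts:
        cleaned = " ".join(text.split())  # collapse newlines and repeated spaces
        if cleaned not in cache:
            # Empty tweets are passed through without calling the translator.
            cache[cleaned] = translate(cleaned) if cleaned else ""
        translated.append(cache[cleaned])
    return translated


# Stand-in translator for demonstration; a real run would wrap a call to a
# translation service here.
def fake_translate(text: str) -> str:
    return "[en] " + text


print(translate_all(["Donde está  Messi", "Donde está Messi"], fake_translate))
```

Caching repeated texts matters for a trending phrase, where many tweets are identical, and keeping the translator injectable makes it easy to rerun the later analysis steps with a different translation system when assessing how well sentiment and emotion survive translation.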
3.4. Emotion Recognition
Emotion analysis is an NLP approach that involves the extraction and analysis of emotions from text. For this work, we used the EmoRoberta model, which uses the cutting-edge Roberta approach with a few changes to its key hyperparameters [17]. Both Roberta and EmoRoberta build on Google’s well-known Bidirectional Encoder Representations from Transformers (BERT) model [18]. Roberta subsequently surpassed the BERT model as the best pre-trained model for use in text classification tasks [19]. EmoRoberta classifies text into 28 emotion categories (admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, and surprise, plus a neutral category) [
20].
We built a Python script that used this model and applied it to our translated English texts. The output of this analysis illustrated the percentage of references to each emotion in the tweets collected during and one day after Saudi Arabia and Argentina’s 2022 FIFA World Cup match in Qatar.
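Once each translated tweet has been assigned an emotion label, the percentage of references to each emotion is a simple tally. The sketch below assumes that the model output has already been reduced to one label per tweet; the function name and the sample labels are illustrative and are not results from the study.

```python
from collections import Counter


def emotion_percentages(labels):
    """Return the share of each emotion label as a percentage of all
    labelled tweets, ordered from most to least frequent."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: 100.0 * n / total for emotion, n in counts.most_common()}


# Illustrative labels only.
predicted = ["joy", "neutral", "joy", "anger"]
print(emotion_percentages(predicted))
```

Computing the percentages separately for the Arabic and Spanish tables puts the two groups on a common scale despite their very different tweet counts.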
3.5. Sentiment Analysis
Sentiment analysis is used to determine whether a text is positive, negative, or neutral. For this task, we used ‘twitter-roberta-base-sentiment-latest’ [
21], which is a machine-learning model trained over ~124 million tweets. We built a Python script that used this model and applied it to our translated English texts. When a text was input, the script generated categorization probabilities for whether it was positive, negative, or neutral.
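The categorization probabilities mentioned above are typically obtained by applying a softmax to the raw scores that the model emits for each class. The sketch below shows only that conversion; the label order (negative, neutral, positive) and the example scores are assumptions, and loading the model itself is omitted.

```python
import math

# Assumed order of the model's output classes.
LABELS = ("negative", "neutral", "positive")


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentiment_from_scores(scores, labels=LABELS):
    """Return the most probable label and the full probability mapping."""
    probs = dict(zip(labels, softmax(scores)))
    return max(probs, key=probs.get), probs


# Illustrative raw scores for one translated tweet.
label, probs = sentiment_from_scores([-1.0, 0.5, 2.0])
print(label, {name: round(p, 3) for name, p in probs.items()})
```

Keeping the full probability mapping, rather than only the top label, allows tweets with near-equal class probabilities to be inspected separately.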
3.6. Defining Emotional Arousal Level (High vs. Low)
Emotional arousal level is one of the significant culturally variable elements of emotion. Cross-cultural disparities in emotional arousal intensity have been reported consistently, and there are clear distinctions between social cultures in terms of collectivistic vs. individualistic tendencies.
The collected emotions were categorized as low- or high-arousal emotions based on the distribution of fundamental emotions in the arousal–valence space proposed in [
14,
15]. This emotional classification was used to distinguish between Arabs and Hispanics based on the distinction between their respective levels of collectivism.
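The arousal classification can be sketched as a lookup followed by a tally. The assignment of emotions to the high- and low-arousal sets below is a hypothetical example chosen for illustration; it is not the mapping used in the study, which follows the cited arousal–valence distributions. Emotions outside both sets, such as the neutral category, are ignored.

```python
# Hypothetical arousal assignment, for illustration only.
HIGH_AROUSAL = {"anger", "excitement", "fear", "joy", "surprise"}
LOW_AROUSAL = {"sadness", "relief", "disappointment", "grief"}


def arousal_shares(counts, high=HIGH_AROUSAL, low=LOW_AROUSAL):
    """Return the fractions of high- and low-arousal emotions among the
    emotions that appear in either set."""
    n_high = sum(n for emotion, n in counts.items() if emotion in high)
    n_low = sum(n for emotion, n in counts.items() if emotion in low)
    total = n_high + n_low
    if total == 0:
        return {"high": 0.0, "low": 0.0}
    return {"high": n_high / total, "low": n_low / total}


# Illustrative emotion counts for one language group.
print(arousal_shares({"anger": 3, "sadness": 1, "neutral": 10}))
```

Comparing the resulting high-arousal share between the Arabic and Spanish groups gives a single number per group that can be set against their reported levels of collectivism.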
The flowchart of the previously mentioned Python scripts is depicted in
Figure 2.
A link to the scripts can be found in [
22].
6. Conclusions
In this study, we examined how Twitter users communicate their sentiments and emotions in their tweets, as well as how they have adapted to the cultural qualities of their communities, as described in the literature. According to the data, there are certain disparities between Arabs and Hispanics. These findings are consistent with the IDV index of Hofstede’s cultural dimensions model and with the research of several other psychologists. They are also partially consistent with Hampden-Turner’s emphasis on the neutral vs. emotive cultural dimension.
According to the findings of this study, users exhibit emotions, feelings, and cultural traits that, in some respects, mirror those of their cultural peers. Although this study offers some fascinating contributions, it does feature some limitations.
- First, not all nations can benefit equally from Hofstede’s cultural model. For the sake of his research, Hofstede regarded all of the Arab countries as a single entity, and he only included a portion of the Arab countries in his model. Hofstede also treated Spanish-speaking countries as distinct countries rather than as a region. To work around this, we chose Argentina as their representative, as the match was between Argentina and Saudi Arabia in the first place.
- A second match between Arab and Hispanic teams in the 2022 World Cup would have increased the persuasiveness and representativeness of our research. However, Saudi Arabia and Argentina were the only Arab and Hispanic teams to compete against each other during the 2022 Qatar World Cup, with the exception of the match between an Arab nation, Morocco, and Spain (a Spanish-speaking country). This match was not included in our research because our purpose was to compare two developing-world populations. Spain is classified as Hispanic, but not as belonging to Latin America, meaning that it does not belong to the category of emerging nations.
- The disparities between the emotionally classified tweets may indicate that machine translation from Spanish to English, both of which use Latin characters, is more effective than translation from Arabic, which uses a different character set, to English. Hence, the results of the research were interpreted with caution.
- We sometimes could not find a score for the entire Arab world in the literature. In such cases, we deemed Saudi Arabia to be representative of the Arab world, as the match was between Saudi Arabia and Argentina in the first place.
- Although this study adds to the body of international cultural research on the usage of new media, its sample size was modest. Because the data were associated with a specific event, we only included a portion of the users in the two groups, and each group consisted only of football fans.
- We used English-based NLP and ML algorithms to extract sentiments and emotions after tweets in non-English languages had been translated into English, which may have resulted in the loss of sentimental and emotional context. This may mean that the emotional analysis was not as reliable as it would have been if the tweets had originally been written in English.
This research has numerous significant implications and contributions, despite its limitations:
- The authors believe that this study is the first cross-cultural comparison of developing regions using Twitter.
- Twitter data can be used to examine cross-cultural differences without requiring significant time or effort to determine users’ geographic location or cultural background, by using the language of tweets as a cue to define a group culture.
- We show that assessing the sentiments and emotions in non-English tweets after they are translated into English using our method may enable an evaluation of the degree of success of machine translation.
- The study also demonstrates that it is possible to recognize and classify users from various cultural origins using tweets as a basis.
- Although something vital appears to be lost in translation, the findings of this study are consistent with the findings in the literature, which we take as evidence that the machine translation preserved the contexts and implications of the sentiments and emotions to some extent.