Analyzing Large-Scale Political Discussions on Twitter: The Use Case of the Greek Wiretapping Scandal (#ypoklopes)

Dimitriadis, Ilias; Giakatos, Dimitrios P.; Karamanidis, Stelios; Sermpezis, Pavlos; Kiki, Kelly; Vakali, Athena

doi:10.3390/journalmedia5030085

Open Access Article


by Ilias Dimitriadis ¹,*, Dimitrios P. Giakatos ¹, Stelios Karamanidis ¹, Pavlos Sermpezis ¹,*, Kelly Kiki ² and Athena Vakali ¹

¹ School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

² iMEdD (Incubator for Media Education and Development), 10562 Athens, Greece

* Authors to whom correspondence should be addressed.

Journal. Media 2024, 5(3), 1348-1363; https://doi.org/10.3390/journalmedia5030085

Submission received: 29 March 2024 / Revised: 31 August 2024 / Accepted: 10 September 2024 / Published: 19 September 2024

(This article belongs to the Special Issue Data Journalism: The Power of Data in Media and Communication)


Abstract

In this paper, we study the Greek wiretapping scandal, which was revealed in 2022 and attracted significant attention from the press and citizens. Specifically, we propose a methodology for collecting data and analyzing patterns of online public discussions on Twitter. We apply our methodology to the Greek wiretapping use case and present findings related to the evolution of the discussion over time, its polarization, and the role of the Media. The methodology can be of wider use and replicated for other topics. Finally, we publicly provide an open dataset and online resources with the results.

Keywords:

data journalism; data analysis; visualization

1. Introduction

A prolonged monitoring of the mobile phones of journalists and politicians was revealed in 2022 and shook the Greek political scene to its core. The scandal, commonly known as the Greek Wiretapping Scandal, has been in the public sphere for more than a year, has attracted significant attention from Greek and international media Jazeera and Agencies (2022); Partsakoulaki (2023); Reuters (2023); Schmitz (2022); Smith (2023), and has sparked extensive discussion on social media.

In particular, the hashtags #υποκλοπες and #ypoklopes (the words in Greek and “Greeklish” for wiretapping) were among the top so-called “trending topics” on Greek-language Twitter for several months. The importance of the topic and the large public interest motivated a data journalism effort for tracking, mapping, characterizing, and analyzing the public discussions on Twitter about the scandal. Tracking online discussions has proven useful in a number of cases for gaining insights into public issues AlShehhi et al. (2019); Alsinet et al. (2017); Bruns and Burgess (2012).

The contributions of this paper are threefold: we present the methodology, dataset, and results of this effort. Although our main focus is on the Greek wiretapping scandal, our methodology can also be applied to other online discussions to reveal insights and trends. Specifically, our contributions can be summarized as follows:

We design a methodology for monitoring and analyzing large-scale political discussions on Twitter (Section 3 and Section 4). The methodology is generic (i.e., it can be replicated to other use cases) and includes the steps of data collection, political inference, bot detection, polarization quantification, and analysis of users and content. An overview of the methodology is depicted in Figure 1.
We study the Greek wiretapping scandal (see details in Section 2) and collect a dataset covering the entire discussion on Twitter over a period of more than a year (Section 3.2). Moreover, we compile a number of complementary datasets related to political attributions, accounts of the Media, and Twitter bot accounts (Section 3.3). We publicly share these datasets in Dimitriadis et al. (2023).
We present and discuss the findings of our analysis (Section 5). For example, we show how the volume of tweets changes over time and significantly intensifies upon major events or news publications (Section 5.1), we analyze the role of the Media as the main drivers and influencers of the online discussion (Section 5.2), and we quantify the participation of users attributed to the political “Left” and “Right” in the discussion (Section 5.3) and how polarized they are (Section 5.4).

Supplementary to this paper, all the results of our analysis can be accessed in an online Web portal Dimitriadis et al. (2023), and an extended critical discussion of the results can be found in the online article Kiki et al. (2023).

2. The Greek Wiretapping Scandal

At the time of writing this paper, the Greek wiretapping scandal has been in the public sphere for 16 months. For a better understanding of the scope and results of our study, we provide the background and history of the scandal. The following list enumerates, in chronological order, the main events and news publications related to the wiretapping scandal:

14 November 2021: The newspaper “Efimerida ton Syntakton” reports that the National Intelligence Service (NIS) is monitoring citizens, among them a Greek journalist covering the story of a 12-year-old refugee from Syria¹.
16 November 2021: Two days later, the reporter and member of the journalist network “Reporters United”, Stavros Malihoudis, publishes an article entitled “I am the journalist under surveillance by the NIS”², explaining that he became aware of the fact from the aforementioned story.
16 December 2021: Two studies are published by the University of Toronto’s Citizen Lab³ and Meta⁴ on the Predator spyware, unknown at that time, whose clients possibly extended to Greece.
January 2022: The journalism groups “Reporters United”⁵ and “Inside Story”⁶ publish investigations into, respectively, the passing of an amendment in the Greek parliament on 31 March 2021 that altered the rules for lifting the confidentiality of communications in Greece, and Predator and its “business in Greece”. In the months that followed, the two groups were at the forefront of the journalistic investigation into surveillance and wiretapping, which, however, took a long time to be extensively covered by the mainstream media and, consequently, widely discussed by citizens.
6 April 2022: The journalist Thanasis Koukakis files a complaint with the Hellenic Authority for Communication Security and Privacy (ADAE), requesting an investigation into the case of infection of his cell phone with Predator spyware. In the following days, three related journalistic investigations are published.
11–15 April 2022: Journalists Tasos Telloglou and Eliza Triantafyllou publish two related investigations in Inside Story, under the headlines “Who was monitoring journalist Thanasis Koukakis’s cell phone?” and “Koukakis surveillance case: The state knows”, on 11 and 14 April, respectively. Also, on 15 April, journalists Nikolas Leontopoulos and Thodoris Chondrogiannos publish an investigation in Reporters United, according to which the government was monitoring journalist Thanasis Koukakis. In the meantime, the Deputy Minister to the Prime Minister and Government Spokesperson, Yiannis Economou, had referred to the Koukakis case as surveillance by a private individual, stating, among other things, that “obviously it is unthinkable in a country like Greece, under the rule of law, for any private individual to be able to monitor another private individual”, a statement on which Thanasis Koukakis himself commented on Twitter.
19 May 2022: Google’s Threat Analysis Group announces its assessment, put forth “with high confidence”, that government-backed actors in at least eight countries, including Greece, have obtained exploit software.
26 July 2022: Nikos Androulakis, president of the socialist party PASOK and an MEP, reports the attempted wiretapping of his cell phone.
4 August 2022: Publication of a journalistic investigation by “Reporters United” and “Efimerida ton Syntakton”, according to which, certain transactions of Grigoris Dimitriadis (Secretary General of the Prime Minister) are linked to a former manager of the company Intellexa, which markets the Predator spyware.
5 August 2022: Resignation of Grigoris Dimitriadis and the head of the National Intelligence Service (NIS) Panagiotis Kontoleon.
26 August 2022: Debate in Parliament on the wiretapping case.
10 September 2022: A Greek member of parliament from SYRIZA reports the attempted interception of his mobile phone.
7 November 2022: Publication, by the newspaper “Documento”, of the first list of public figures allegedly under surveillance.
8 December 2022: Debate on the new draft law on the NIS at the Plenary Session of the Parliament.
10 January 2023: The Prosecutor of the Supreme Court opines that the Hellenic Authority for Communication Security and Privacy is not responsible for handling citizens’ requests for information on whether they have been placed under surveillance and cannot address mobile telephony providers in this regard.

The scandal took a long time to start being extensively covered by the mainstream media and, consequently, widely discussed by citizens. As can be seen in Figure 2, where we annotate the above events along with the total number of relevant tweets, the main discussion on Twitter started only in August 2022.

3. Data

3.1. Twitter API

Twitter provides access to public data on the platform (user profiles, tweets, etc.) through an official API (v2) Twitter (2023) that can be used by all registered users. The API includes the Search API, which can be used to retrieve historical data, and the Streaming API, which can collect real-time data.⁷

To narrow down the collection of tweets on a specific topic, a set of keywords and/or hashtags can be given as input in the API call. The returned tweets will need to contain at least one of the terms of the given set. In our study, we compiled the #ypoklopes dataset by using the keywords and hashtags as described in Section 3.2.

Remark: The Twitter API has some restrictions, e.g., in terms of volume of data that can be collected. In our case, we used an academic licence that enabled us to retrospectively retrieve the entire volume of tweets relevant to the targeted discussion.
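As an illustration of how such a keyword-based request can be assembled, the sketch below builds the query string and parameters for the Twitter API v2 full-archive search endpoint (`GET /2/tweets/search/all`, available under academic access at the time). It is a minimal sketch, not the authors' collection code: the phrase "Predator spyware" and the end date are hypothetical placeholders (the actual terms are listed in Table 1), and authentication, pagination, and rate limiting are omitted.

```python
from datetime import datetime, timezone

# Twitter API v2 full-archive search endpoint (academic access at the time of the study).
SEARCH_ALL_URL = "https://api.twitter.com/2/tweets/search/all"

def build_search_params(terms, start, end, lang="el", max_results=500):
    """Build request parameters matching tweets that contain at least one of `terms`.

    Terms are OR-ed together, multi-word terms are quoted as exact phrases, and
    the `lang:` operator restricts results to a single language (Greek by default).
    """
    clauses = [f'"{term}"' if " " in term else term for term in terms]
    query = f"({' OR '.join(clauses)}) lang:{lang}"
    return {
        "query": query,
        # Timestamps in RFC 3339 form, e.g. 2022-04-01T00:00:00Z.
        "start_time": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "end_time": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "max_results": max_results,  # full-archive search returns at most 500 tweets per page
    }

# The two hashtags come from the text; the phrase is a hypothetical example term.
params = build_search_params(
    ["#ypoklopes", "#υποκλοπες", "Predator spyware"],
    start=datetime(2022, 4, 1, tzinfo=timezone.utc),
    end=datetime(2023, 1, 31, tzinfo=timezone.utc),  # illustrative end date only
)
print(params["query"])
# (#ypoklopes OR #υποκλοπες OR "Predator spyware") lang:el
```

The parameters would then be sent with an authenticated GET request, following the `next_token` value returned in each response's `meta` field to page through the full result set.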

3.2. The #ypoklopes Dataset

Keywords and Hashtags. To collect the desired data, we selected a set of keywords and hashtags comprising terms (e.g., “wiretapping” or “spyware”) and names (of journalists or politicians) related to the scandal. The detailed list is given in Table 1.

The selection of these keywords was based on journalistic and technological criteria: after a thorough initial listing of terms relevant to the conversation about the wiretapping case on Twitter, the available data were quantified and subjected to quality control. For example, while the acronym “ΕΥΠ” (National Intelligence Service or “NIS”) was initially included in the terms, we found that searches with the term “ΕΥΠ” mainly returned data that were not relevant to the subject. Therefore, it was eventually excluded from the data collection terms.

Based on specific events during the wiretapping scandal period, the initial set of hashtags was enriched with new ones, solely for the period of each event, while the main set of tracked hashtags remained the same. The web application Dimitriadis et al. (2023) (https://ypoklopes.csd.auth.gr, (accessed on 10 September 2024)), which provides complementary support to this research, allows for the identification of such events and their corresponding hashtags.

Similarly, to ensure the relevance of the data to the subject, (i) the hashtag #ανδρουλακης, which refers to the member of the European Parliament (MEP) and President of PASOK-KINAL, Nikos Androulakis, was added to the data collection criteria only for tweets posted from 20 July 2022 onward (earlier mentions of him on Twitter relate mainly to his activity as an MEP), and (ii) all hashtags corresponding to the names of individuals were included in the data collection criteria only for tweets posted until 28 November 2022.

Finally, we keep only tweets in Greek.

Collection period. We collected data from the beginning of 2022 onward. In the first quarter of 2022, there was practically no conversation on Twitter, apart from a few, not very relevant, tweets. Hence, we restrict our study to the main corpus of tweets, starting on 1 April 2022. This date corresponds to the publication of Thanasis Koukakis’ surveillance case and related journalistic revelations.
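The collection rules above combine a global start date, a Greek-only constraint, and per-term validity windows (#ανδρουλακης only from 20 July 2022; name hashtags only until 28 November 2022). A minimal sketch of how such rules could be applied as a post-filter over collected tweets follows. The tweet dictionary layout, the inclusive window bounds, and the name hashtag “#κουκακης” are hypothetical choices for illustration, not details confirmed by the study.

```python
from datetime import date

COLLECTION_START = date(2022, 4, 1)

# Per-term validity windows (None = unbounded), with bounds taken as inclusive.
# "#κουκακης" is a hypothetical stand-in for the name hashtags listed in Table 1.
TERM_WINDOWS = {
    "#ypoklopes": (None, None),
    "#υποκλοπες": (None, None),
    "#ανδρουλακης": (date(2022, 7, 20), None),
    "#κουκακης": (None, date(2022, 11, 28)),
}

def term_active(term, day):
    """True if `term` is part of the collection criteria on `day`."""
    start, end = TERM_WINDOWS[term]
    return (start is None or day >= start) and (end is None or day <= end)

def keep_tweet(tweet):
    """Keep a Greek tweet, posted after the start date, that matches an active term."""
    day = tweet["created_at"]
    if tweet["lang"] != "el" or day < COLLECTION_START:
        return False
    text = tweet["text"].lower()
    return any(term in text and term_active(term, day) for term in TERM_WINDOWS)
```

Event-specific hashtags, added only for the period of each event, fit the same structure as extra entries with both bounds set.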

3.3. Complementary Datasets

Political parties and politicians. For analyzing political aspects of the discussion (see Section 5), we collected all the accounts of Greek parties, members of the Greek parliament (MPs), and Greek members of the European parliament (MEPs) on Twitter, again using the Twitter API. We retrieved the official Twitter usernames of Greek MPs, MEPs, and Parties using the relevant data on “Vouliwatch” Vouliwatch (2023), an independent, non-profit open governance initiative, and the European Parliament’s website Parliament (2023).

The six parties participating in the Greek Parliament (2019–2023) are as follows: New Democracy, SYRIZA, PASOK-KINAL, KKE, MeRA25, and Greek Solution. The government party (New Democracy) and “Greek Solution” are generally considered to be positioned on the political “Right”, while the parties SYRIZA, KKE, and MeRA25 are on the political “Left”, and PASOK-KINAL is in the political “Center”. We annotate the accounts of the parties and their affiliated politicians as “Right”, “Left”, or “Center” accordingly.

Media and journalists. We aimed to characterize the activity and role of the Media and journalists in the public discussions on Twitter. Due to the lack of relevant data sources, we relied on manual annotation of these accounts. However, since the collected dataset contains more than 33K unique users, manually annotating all of them was infeasible. Hence, we considered a set of the most “prevalent” accounts, i.e., the top 500 accounts in each of the following categories: those who (i) posted the most tweets, (ii) replied the most to third-party tweets, (iii) posted the most quotes, (iv) were quoted the most by others, and (v) were the most influential in the discussion about wiretapping (further discussed in Section 4.4). We ended up with a set of 2262 unique users, of which 407 (18%) were identified as “Media/journalists” (including mass media, newsrooms, journalists, or blogs).
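The selection of the most “prevalent” accounts amounts to taking the union of several top-k rankings; because the rankings overlap, five lists of 500 yield fewer than 2500 accounts (2262 in the study). A minimal sketch, assuming each ranking is available as a mapping from account to score (the toy rankings below are hypothetical):

```python
from collections import Counter

def prevalent_accounts(rankings, k=500):
    """Union of the top-k accounts of each ranking (tweets, replies, quotes, ...)."""
    selected = set()
    for scores in rankings.values():
        selected.update(account for account, _ in Counter(scores).most_common(k))
    return selected

# Toy example with k=2: overlapping top lists shrink the union.
rankings = {
    "tweets": {"a": 10, "b": 7, "c": 1},
    "replies": {"b": 5, "d": 4, "a": 0},
    "quotes": {"a": 3, "e": 2},
    "quoted": {"f": 9, "a": 8},
    "influence": {"a": 0.9, "b": 0.8},
}
print(sorted(prevalent_accounts(rankings, k=2)))  # ['a', 'b', 'd', 'e', 'f']
```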

Twitter bots. Taking into account the evident presence of bots (i.e., automated accounts) Cresci (2020) on Twitter and their impact, especially on political discussions Ferrara (2020); Ratkiewicz et al. (2011), we investigated which of the most prevalent accounts could be classified as bots. To this end, we used a state-of-the-art online tool for automated bot detection, namely Bot-Detective Dimitriadis et al. (2021); Kouvela et al. (2020). In contrast to other political discussions and analyses on Twitter Dimitriadis et al. (2023), only a small percentage (1.4%) of the accounts were identified as actual bots.

Organizations and individuals. Again manually, we annotated as “Organizations” the accounts corresponding to organizations (excluding media and political parties), brands, etc. Finally, the remaining accounts were annotated as “Individuals”.

The detailed statistics of the account types (among the most prevalent accounts) in our datasets are provided in Table 2.

4. Methodology: Users and Network Characteristics

Apart from quantitative metrics (e.g., volume of tweets per day, or most active users) that can be calculated directly from the collected dataset (Section 3.2), a deeper understanding of the phenomena taking place in the online discussion requires analyzing the user interactions and opinions (microscopic level), as well as the macroscopic characteristics of the entire social network of users participating in the discussion.

In this section, we present our methodology for mapping user interactions to graphs (Section 4.1), inferring the political attribution of users (Section 4.2), quantifying the polarization of the network (Section 4.3), and identifying the most influential users (Section 4.4).

4.1. Graph Generation

The collected data can be used to map the interactions between the involved users in the form of a graph. These data consist of pure tweets, retweets, quotes, and replies. Pure tweets may include mentions of other users, while all the other types necessarily involve at least one other user.

To form a graph from the collected dataset, we proceeded as follows. Let us consider a user $u$ who retweets, mentions, replies to, or quotes another user $u'$. We represent the users $u$ and $u'$ as nodes of a graph, and we add an edge between them that represents their connection.

We (i) consider the interactions $u \to u'$ and $u' \to u$ to be equivalent (i.e., the graph is undirected) and (ii) treat all types of interaction (retweets, mentions, etc.) equally: any interaction between two users contributes to a single undirected edge, regardless of the number of interactions (i.e., the graph is unweighted).⁸

Figure 3 depicts an example of graph formation from interactions on Twitter. To implement the graph generation, we used the open-source PyPoll library Giakatos et al. (2023).

For our analysis, we generate a separate graph for each day, representing the interactions between users on that specific day⁹. These graphs are further used to calculate the polarization between different groups of users in Section 4.3 and to identify the most influential users in Section 4.4.
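The graph construction of this section can be sketched with networkx, used here in place of PyPoll, whose API we do not reproduce. The sketch assumes each interaction is available as a (day, user, other user, type) record; it collapses direction, interaction type, and multiplicity into a single undirected, unweighted edge and builds one graph per day. Skipping self-interactions (e.g., replying to oneself) is our own assumption.

```python
from collections import defaultdict
from datetime import date
import networkx as nx

def daily_graphs(interactions):
    """Build one undirected, unweighted interaction graph per day.

    `interactions` yields (day, user, other_user, kind) tuples, where kind is
    "retweet", "mention", "reply", or "quote"; kind is ignored because all
    interaction types are treated equally.
    """
    graphs = defaultdict(nx.Graph)
    for day, u, v, _kind in interactions:
        if u != v:                      # assumption: ignore self-interactions
            graphs[day].add_edge(u, v)  # repeated or reverse interactions collapse into one edge
    return dict(graphs)

records = [
    (date(2022, 8, 5), "alice", "bob", "retweet"),
    (date(2022, 8, 5), "bob", "alice", "reply"),    # same undirected edge as above
    (date(2022, 8, 5), "carol", "bob", "mention"),
    (date(2022, 8, 6), "alice", "carol", "quote"),
]
g = daily_graphs(records)
print({d.isoformat(): g[d].number_of_edges() for d in sorted(g)})
# {'2022-08-05': 2, '2022-08-06': 1}
```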

4.2. Political Inference

A part of our analysis focuses on the political opinions of the users participating in the discussion and the overall polarization of the discussion. To this end, we attribute a political opinion to each of the 33 K users as follows: (i) For each political party and politician account (see Section 3.3), we collect their followers using the Twitter API. Recall that each political party and politician account is annotated as “Right”, “Left”, or “Center” (see Section 3.3). (ii) Then, for each user participating in the discussion about the wiretappings, we count how many “Right”, “Left”, and “Center” accounts they follow. (iii) If a user follows more “Right” than “Left” accounts, we attribute their opinion to the political “Right”, and vice versa for the “Left”. (iv) Users who follow an equal number of “Left” and “Right” accounts, or who follow more “Center” accounts, are annotated as “Center”. (v) Finally, users who do not follow any political accounts are classified as “Neutral”.

Remark: To examine the robustness of our political inference methodology, we considered different thresholds for the fraction of “Left”/“Right” follows; e.g., for a threshold of 0.75, an account is considered “Left” if at least 3 out of 4 of the politician accounts it follows correspond to “Left” politicians; otherwise, it is considered “Center” or “Neutral”. While the absolute numbers of “Left”/“Right” accounts differ across thresholds, the qualitative polarization results of our study do not change.
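The attribution rule, together with the threshold variant used in the robustness check, can be sketched as a function of how many “Left”, “Right”, and “Center” accounts a user follows. This is a sketch under stated assumptions, not the authors' implementation: the tie-breaking (a user is “Center” when Left and Right follows are tied, or when Center follows strictly outnumber both) and the use of `threshold=None` for the default rule are our reading of the description above.

```python
from typing import Optional

def infer_leaning(n_left: int, n_right: int, n_center: int,
                  threshold: Optional[float] = None) -> str:
    """Attribute a leaning from the numbers of Left/Right/Center political accounts followed.

    threshold=None applies the default majority rule; a value such as 0.75 requires
    that at least that fraction of the followed political accounts be Left (or Right).
    """
    total = n_left + n_right + n_center
    if total == 0:
        return "Neutral"                      # follows no political account
    if threshold is None:
        # Assumed tie-breaking: Center if Left and Right are tied, or if
        # Center follows strictly outnumber both Left and Right follows.
        if n_left == n_right or n_center > max(n_left, n_right):
            return "Center"
        return "Left" if n_left > n_right else "Right"
    if n_left > n_right and n_left / total >= threshold:
        return "Left"
    if n_right > n_left and n_right / total >= threshold:
        return "Right"
    return "Center"

print(infer_leaning(2, 1, 0))                  # Left (default rule)
print(infer_leaning(2, 1, 0, threshold=0.75))  # Center (2/3 < 0.75)
```

Raising the threshold moves borderline users from “Left”/“Right” to “Center”, which is why the absolute group sizes change with the threshold.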

4.3. Polarization Detection

Polarization is a social phenomenon that has been studied for decades Isenberg (1986). The term describes a society that is split into two or more groups based on the opinions of its individuals on a topic, where individuals in each group tend to adopt the views of their group. Many studies have measured this phenomenon on social networks such as Twitter Matakos et al. (2017).

For our analysis, we quantify the polarization of the discussion about the wiretappings using the Friedkin and Johnsen (FJ) polarization metric Matakos et al. (2017), which takes values in the interval from 0 (no polarization) to 1 (high polarization). The FJ metric measures the degree of opinion polarization within a group. It is based on an opinion dynamics model (Markov chain) that accounts for both social influence (the opinions of neighboring nodes in the graph) and individuals’ resistance to changing their initial opinions. The metric is calculated by assessing the variance in opinions among group members at equilibrium (stationary distribution), reflecting the extent to which opinions are spread out.

The calculation of the FJ polarization metric is based on the user graph (Section 4.1) and the political inference of the users (Section 4.2), using the PyPoll library Giakatos et al. (2023). In the corner case where all users attributed to the political “Right” interact only with other “Right” users, and users attributed to the political “Left” interact only with other “Left” users, the polarization metric would be 1. In contrast, if every user interacts with the same number of “Right” and “Left” users, the network would not be polarized. In practice, these corner cases never occur in online discussions, and polarization values lie between 0 and 1.
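For intuition, the sketch below computes a Friedkin-Johnsen-style polarization index with NumPy rather than PyPoll. Following the formulation of Matakos et al. (2017) as we understand it, each user has an innate opinion $s_i \in [-1, 1]$, the equilibrium opinions are $z = (I + L)^{-1} s$ with $L$ the graph Laplacian, and the index is $\lVert z \rVert^2 / n$. Mapping “Left” to -1, “Right” to +1, and all other users to 0 is an assumption for illustration; PyPoll's exact encoding may differ. The example reproduces the corner case described above (an index of 1 when the groups never interact).

```python
import numpy as np

def fj_polarization(nodes, edges, innate):
    """Polarization index ||z||^2 / n of the Friedkin-Johnsen equilibrium z.

    nodes  : list of node ids
    edges  : iterable of (u, v) pairs (undirected, unweighted; duplicates ignored)
    innate : dict mapping node -> innate opinion in [-1, 1]
    """
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    laplacian = np.zeros((n, n))
    seen = set()
    for u, v in edges:
        key = frozenset((u, v))
        if u == v or key in seen:
            continue
        seen.add(key)
        i, j = index[u], index[v]
        laplacian[i, i] += 1
        laplacian[j, j] += 1
        laplacian[i, j] -= 1
        laplacian[j, i] -= 1
    s = np.array([innate[node] for node in nodes], dtype=float)
    z = np.linalg.solve(np.eye(n) + laplacian, s)  # equilibrium opinions: (I + L) z = s
    return float(z @ z) / n

opinion = {"Left": -1.0, "Right": 1.0}  # assumed encoding; other users map to 0
users = {"a": "Left", "b": "Left", "c": "Right", "d": "Right"}
innate = {u: opinion.get(side, 0.0) for u, side in users.items()}

# Corner case: Left and Right users interact only within their own group.
print(fj_polarization(list(users), [("a", "b"), ("c", "d")], innate))  # ≈ 1.0
# Cross-group interactions pull equilibrium opinions toward each other.
print(fj_polarization(list(users), [("a", "b"), ("c", "d"), ("a", "c"), ("b", "d")], innate))  # ≈ 0.111 (= 1/9)
```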

4.4. Influencer Identification

We identify the most influential users for each daily graph using graph algorithms that rank nodes by their importance: the highest-ranked nodes are identified as the most influential users. We tested several algorithms, namely PageRank, Betweenness Centrality, and NetShield Chen et al. (2016), and obtained similar results for all of them.

We chose to proceed with NetShield due to its efficiency and its relevance to our objectives. Specifically, NetShield selects a subset of nodes (influencers) that, if immunized or removed, would minimize the spread of information, or the overall influence, in the network. This aligns with our goal of identifying users who play critical roles in information dissemination. Finally, it handles large-scale networks efficiently¹⁰, which makes it well suited to our dataset, which involves extensive interactions and a large number of users.
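The greedy NetShield procedure described in Chen et al. (2016) selects k nodes whose removal most reduces the leading eigenvalue of the adjacency matrix, scoring candidates with the leading eigenvector. The following NumPy sketch follows that greedy algorithm as we understand it; it is not the implementation used in the study, and its dense eigendecomposition is only practical for small graphs (large graphs would call for a sparse eigensolver).

```python
import numpy as np

def netshield(adjacency, k):
    """Greedy NetShield: pick k nodes whose removal most reduces the leading eigenvalue.

    adjacency: symmetric (n x n) 0/1 array of an undirected graph.
    Returns the indices of the selected nodes in selection order.
    """
    A = np.asarray(adjacency, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(A)    # eigenvalues in ascending order
    lam = eigvals[-1]                       # leading eigenvalue
    u = np.abs(eigvecs[:, -1])              # leading eigenvector, made non-negative
    v = (2 * lam - np.diag(A)) * u ** 2     # score of each node on its own
    selected = []
    for _ in range(k):
        # Penalize candidates adjacent to already selected nodes.
        b = A[:, selected] @ u[selected] if selected else np.zeros(len(u))
        score = v - 2 * b * u
        if selected:
            score[selected] = -np.inf       # never pick a node twice
        selected.append(int(np.argmax(score)))
    return selected

star = np.zeros((5, 5))
star[0, 1:] = star[1:, 0] = 1   # node 0 is the hub of a 5-node star
print(netshield(star, 1))       # [0]
```

For a daily graph built with networkx, `nx.to_numpy_array(G)` yields a dense adjacency matrix in the form expected here.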

5. Analysis and Results

5.1. Quantitative Analysis

We analyze the online discussions for a period of almost a year. Specifically, we calculate the following total and daily statistics:

number of tweets, hashtags, users, and URLs;
most liked, retweeted, and replied tweets;
most mentioned, influential, and active users;
most shared URLs, images, and videos;
most popular textual content (words, phrases, and hashtags).
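Most of these statistics are simple group-by aggregations. The following standard-library sketch covers three of them (daily tweets, unique users, and most frequent hashtags). It assumes tweets are available as dictionaries with `created_at`, `author`, and `text` fields, and it extracts hashtags with a regular expression rather than from Twitter's entity metadata. Both choices are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

HASHTAG = re.compile(r"#\w+")  # \w is Unicode-aware, so Greek hashtags match too

def daily_stats(tweets, top_n=3):
    """Per-day counts of tweets, unique users, and the most frequent hashtags."""
    by_day = defaultdict(list)
    for t in tweets:
        by_day[t["created_at"].date()].append(t)
    stats = {}
    for day, items in sorted(by_day.items()):
        tags = Counter(tag.lower() for t in items for tag in HASHTAG.findall(t["text"]))
        stats[day] = {
            "tweets": len(items),
            "users": len({t["author"] for t in items}),
            "top_hashtags": tags.most_common(top_n),
        }
    return stats

tweets = [
    {"created_at": datetime(2022, 8, 5, 10, 0), "author": "a", "text": "#υποκλοπες #ypoklopes"},
    {"created_at": datetime(2022, 8, 5, 12, 30), "author": "a", "text": "#Ypoklopes νέα"},
    {"created_at": datetime(2022, 8, 5, 18, 0), "author": "b", "text": "#ypoklopes"},
    {"created_at": datetime(2022, 8, 6, 9, 0), "author": "c", "text": "χωρίς hashtag"},
]
stats = daily_stats(tweets)
print(stats[datetime(2022, 8, 5).date()])
# {'tweets': 3, 'users': 2, 'top_hashtags': [('#ypoklopes', 3), ('#υποκλοπες', 1)]}
```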

All the results of our analysis (daily and aggregate¹¹), together with interactive visualizations, are available through a Web portal Dimitriadis et al. (2023). Here, as an example, we present in Figure 4 the total number of posts per day within the time period in which the discussion actively took place on Twitter.

Beyond this quantitative analysis, in the following sections we further investigate specific aspects of the online discussion (for an extended critical interpretation of these results, see also Kiki et al. (2023)).

5.2. The Role of the Media

Individuals vs. the Media: who drives the discussion? By analyzing the top-500 accounts of each category (Section 3.3), we observe that while 18% of the total sample are media/journalist accounts, this percentage rises to 35% when we focus solely on the 500 users that posted the most tweets, with individual participants comprising 61%. In contrast, individuals make up over 90% of the 500 accounts that posted the most replies and quotes. These findings show that a large part of the online discussion is driven by the Media, while individual users mostly share opinions rather than create original content.

This role of the Media can also be seen in Figure 2, which presents the daily number of tweets over time, annotated with the major events and publications related to the wiretapping scandal. Several of the spikes (i.e., sharp increases in the number of tweets) are correlated with Media publications (see Section 2).

Activity of the Media on Twitter. The Media plays an important role in disseminating news, both in general and on Twitter, as analyzed above. Exploring the activity of the Media/journalist accounts, we find that they mainly post content rather than interact with other users. Moreover, the majority of the top 20 accounts with the most posted tweets belong to news websites. On the other hand, the Media rarely engages in online interactions with other users; e.g., among the top 20 accounts with the highest number of replies to third parties, there is only one journalism group.

Media and journalists among the top “influencers”. Eleven of the top 20 accounts identified as the most influential (see Section 4.4) belong to media organizations or journalists. The other accounts on this list correspond to politicians (e.g., the Greek Prime Minister) or individuals with many followers. Furthermore, the top 20 websites linked in tweets include media outlets, the vast majority of which are harshly critical of the government, including those that have been at the forefront of reporting on the wiretapping case.

5.3. Political Opinions: Left vs. Right

We conduct a “political profiling” of all users participating in the online discussions about the wiretapping scandal. Recall that the government party is classified as being on the political “Right”. Our analysis shows that 70% of the total tweets were posted by accounts that tend to follow mostly left-wing parties and politicians, while only 20% of the posts came from users attributed to the political “Right”.

However, the distribution of users is less imbalanced: 42% are attributed to the political “Left”, 29% to the “Right”, 6% to the “Center”, and the remaining 23% are classified as “Neutral”. Combined with the distribution of tweets above, this shows that “Left” users make up a larger share of the users than “Right” users (42% vs. 29%), but they “talk” disproportionately more (70% vs. 20% of the tweets).
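The disproportion can be made concrete with the rounded shares reported above: dividing each group's share of tweets by its share of users gives its relative posting rate, and the ratio of the two rates compares an average “Left” user with an average “Right” user. This is a back-of-the-envelope calculation on rounded percentages, not a figure computed from the dataset:

$$
\frac{0.70 / 0.42}{0.20 / 0.29} \approx \frac{1.67}{0.69} \approx 2.4
$$

That is, on average, a user attributed to the “Left” posted roughly 2.4 times as many tweets about the scandal as a user attributed to the “Right”.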

5.4. Polarization

Overall polarization. We calculate the polarization index (Section 4.3) of the discussion on a daily basis and find that its values are consistently high, typically around or above 0.5. This indicates a high level of polarization, i.e., users form groups (of similar views) and do not actively interact with other groups.

An example of this phenomenon is depicted in Figure 5, which shows the user graph (Section 4.1) for the day of the resignation of the Secretary General of the Prime Minister and the head of NIS (5 August 2022). This day had the largest number of tweets (more than 35 K). In the graph, users classified as “Left” and “Right” are shown as red and blue dots, respectively. The graph clearly consists of two distinct, weakly connected groups.

Moreover, Figure 6 presents the user graph generated from the data of a 9-month period. Users are colored according to the specific party to which they are attributed: New Democracy (blue), SYRIZA (orange), MeRA25 (yellow), KKE (red), and PASOK-KINAL (green). Even with these more fine-grained groups, the separation of users by political attribution is clear.

The effect of different types of users on polarization. Figure 7 presents, with a blue line, how the value of the polarization index (PI) changes over time. The PI is calculated on the graph of tweets/users for each day. We observe that the PI varies around 0.5, with some days having higher values (up to 0.7) and others lower values (less than 0.3). These variations depend on the number of tweets and users participating in the discussion, but also on who those users are (i.e., their political affiliation), since discussions in different time periods may focus on different aspects that engage audiences with different political views.

Moreover, Figure 7 presents the daily values of the PI calculated on the graphs without the users corresponding to political parties and politicians (orange line), without the Media and journalists (blue line), or without influencers (red line)¹². Excluding the interactions with these users leads to an increased polarization index; in other words, these types of users act as connecting hubs between users of different political views. Removing political figures has the smallest impact, and removing influencers the largest (which is, however, expected given how these users are defined). Notably, the impact of journalists and the Media is as high as that of all influencers, indicating the role of these types of users in the public discourse around #ypoklopes.

Polarization vs. political inference. Finally, we examine the sensitivity of the polarization results to the political inference approach. As discussed in Section 4.2, if a user follows more “Right” accounts than “Left” ones, we assign them a “Right” political opinion. Here, we apply variations of this attribution by assigning an opinion only if at least $x\%$ of the accounts followed by a user are “Right” (or “Left”, respectively); e.g., for a threshold $x = 50\%$, a user is inferred as “Right” if at least half of the accounts they follow are “Right”. In Figure 8, we present the PI for the entire graph (over all days) for three different thresholds: 0 (i.e., the default methodology), 50%, and 70%. We also calculate the same PIs for the graph without the different types of users analyzed above. As expected, the stricter the assignment of political opinions (i.e., the larger the threshold), the lower the PI. However, the important finding is that the effect of the different types of users remains consistent across all thresholds: removing political figures, journalists and the Media, or influencers always increases the PI, and the effects are qualitatively similar regardless of the selected threshold.

5.5. Communities

In the concluding phase of our analysis, we applied community detection to the aggregate user graph spanning April to December. Our chosen method is the widely used Louvain algorithm Blondel et al. (2008), a heuristic designed to optimize modularity, a measure of the strength of a network's community structure. The method iteratively merges nodes into communities as long as this increases the modularity of the partition. Notably, the network is constructed from bilateral interactions among users.

The analysis uncovers eight prominent communities, alongside many smaller ones. We further explore these communities by overlaying them with the “political profiling” methodology introduced earlier, which offers insight into the internal composition of each community. The findings for the top 10 communities by size are shown in Figure 9.
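A minimal sketch of this step uses networkx, whose `louvain_communities` function (networkx 2.8 or later) implements the Louvain method, and then computes the share of each political leaning per community. The toy graph and labels are hypothetical; in the study, the input would be the aggregate user graph and the attributions of Section 4.2. Treating users without an attribution as “Neutral” is an assumption of the sketch.

```python
from collections import Counter
import networkx as nx

LEANINGS = ("Left", "Right", "Center", "Neutral")

def community_composition(graph, leaning, seed=42):
    """Louvain communities (largest first) with the share of each political leaning."""
    communities = nx.community.louvain_communities(graph, seed=seed)
    report = []
    for members in sorted(communities, key=len, reverse=True):
        # Assumption: users without an attribution count as Neutral.
        counts = Counter(leaning.get(user, "Neutral") for user in members)
        shares = {side: counts[side] / len(members) for side in LEANINGS}
        report.append((len(members), shares))
    return report

# Toy graph: two 5-cliques (nodes 0-4 and 5-9) joined by a single edge.
G = nx.barbell_graph(5, 0)
leaning = {n: ("Left" if n < 5 else "Right") for n in range(9)}  # node 9 stays unlabeled
for size, shares in community_composition(G, leaning):
    print(size, shares)
```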

Specifically, Figure 9 (left) shows the top 10 communities as circles. The position of each circle on the x-axis indicates the size of the community, and its position on the y-axis (as well as its color) indicates the fraction of Right/Left users in the community. The largest community (around 5000 users) has 50% more Left users than Right users. The second largest community leans towards the political Right. The next two communities (around 3000 users) are both Left-oriented, while the six smaller communities are either Left- or Right-leaning but less polarized (y-axis values closer to 0).

The same information is presented in Figure 9 (right), which shows the exact percentages of Left (red), Right (blue), and neutral (green) users.
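The overlay of communities and political profiles can be sketched as a per-community tally. This is an illustration under assumptions, not the authors' code. Users without a Left/Right label count as neutral. The single “lean” score, (Left − Right)/(Left + Right), is our hypothetical summary of how one-sided a community is. The exact y-axis definition of Figure 9 is not restated in this section, so the sign convention here (positive means Left-leaning) is also an assumption. The function name community_profile is hypothetical.

```python
from collections import Counter


def community_profile(communities, leaning):
    """Political composition of each community, largest community first.

    communities: iterable of sets of user ids (e.g. Louvain output).
    leaning: dict user_id -> "Left" or "Right"; any other or missing user counts as neutral.
    "lean" is a hypothetical summary: (Left - Right) / (Left + Right), in [-1, 1],
    positive for Left-leaning communities and 0.0 when no member is labeled.
    """
    rows = []
    for members in communities:
        if not members:
            continue  # skip empty communities to avoid dividing by zero
        counts = Counter(leaning.get(u, "Neutral") for u in members)
        n_left, n_right = counts["Left"], counts["Right"]
        size = len(members)
        labeled = n_left + n_right
        rows.append({
            "size": size,
            "left": n_left / size,
            "right": n_right / size,
            "neutral": (size - labeled) / size,
            "lean": (n_left - n_right) / labeled if labeled else 0.0,
        })
    return sorted(rows, key=lambda r: r["size"], reverse=True)


profile = community_profile([{5, 6}, {1, 2, 3, 4}], {1: "Left", 2: "Left", 3: "Right", 5: "Right"})
for row in profile:
    print(row)
```

The left, right, and neutral fractions correspond to the percentages in the right panel. The lean score collapses them into one value that is near 0 for mixed communities.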

We observe a distinct alignment between each community and a predominant political attribute. This corroborates our earlier observation that users predominantly engage with others who share their political beliefs, underscoring the significance of political ideology in shaping online interactions.
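One generic way to quantify this claim is to compare the share of same-leaning edges with the share that random mixing would predict. This is our illustration, not part of the authors' reported methodology. The sketch assumes a leaning dict like the one produced by political inference and skips edges that touch unlabeled users. It reports the observed share, the random-mixing baseline (the sum of squared label shares among edge endpoints), and Newman's assortativity coefficient for a categorical attribute, computed over labeled edges only. The function name edge_homophily is hypothetical.

```python
def edge_homophily(edges, leaning):
    """Observed vs. expected same-leaning edges, plus categorical assortativity.

    edges: iterable of (u, v) pairs from an undirected user graph.
    leaning: dict user_id -> "Left" or "Right"; edges touching unlabeled users are skipped.
    Returns (observed, expected, r) where r = (observed - expected) / (1 - expected).
    """
    same = total = 0
    endpoint_counts = {"Left": 0, "Right": 0}
    for u, v in edges:
        lu, lv = leaning.get(u), leaning.get(v)
        if lu not in endpoint_counts or lv not in endpoint_counts:
            continue
        total += 1
        same += int(lu == lv)
        endpoint_counts[lu] += 1
        endpoint_counts[lv] += 1
    if total == 0:
        return float("nan"), float("nan"), float("nan")
    stubs = 2 * total  # every labeled edge contributes two labeled endpoints
    observed = same / total
    expected = sum((c / stubs) ** 2 for c in endpoint_counts.values())
    r = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return observed, expected, r


# Two like-minded triangles connected by a single cross-leaning edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y"), ("y", "z"), ("x", "z"), ("c", "x")]
leaning = {"a": "Left", "b": "Left", "c": "Left", "x": "Right", "y": "Right", "z": "Right"}
print(edge_homophily(edges, leaning))
```

An r value well above 0 indicates that users interact with like-minded users more often than random mixing would produce.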

6. Related Work

Twitter has become an essential platform for political discourse, facilitating large-scale discussions on a wide range of political issues. Robertson et al. (2019) analyzed the democratic role of social media in political debates by examining Twitter activity during the first televised US presidential debate of 2016. They found that Twitter commentary was mainly humorous and negative, but that the platform also played a key role in fact-checking and information sharing, thereby influencing public discourse. The rest of this section reviews recent research on the role of the Media, bots, polarization, and influential accounts in shaping political conversations on Twitter.

The Role of the Media: The Media significantly influence political discourse on Twitter, as media organizations often act as primary drivers of discussions. Bruns and Burgess (2012) showed how media organizations use Twitter to engage with audiences and influence political narratives. They found that Twitter serves as a platform for both news dissemination and public debate. AlShehhi et al. (2019) analyzed Twitter discussions before, during, and after Ramadan. They found that media outlets played a crucial role in framing discussions and driving public engagement on political and social issues. Dagoula (2019) showed that political elites and established news organizations maintain their influential positions within Twitter's political conversations, despite the platform's potential to democratize public discourse. The same study also points to the changing role of the Media in the digital environment: although media accounts are highly active, their authority has been compromised.

The Role of Bots: Bots play a significant role in shaping political discourse on Twitter by amplifying certain narratives and influencing public opinion. Bessi and Ferrara (2016) explored how social bots distorted the online discussion during the 2016 US presidential election. They found that bots were responsible for a substantial number of tweets and thus influenced the overall tone and content of the political discourse on Twitter. Gallagher et al. (2021) examined the amplification of COVID-19-related information by elites on Twitter, noting the significant role bots play in spreading misinformation and polarizing content. Erokhin and Komendantova (2023) also focused on misinformation spread by bots, studying conspiracy content about earthquakes on Twitter. These studies underscore the need for effective detection and mitigation strategies against bot activity that distorts political discourse.

Polarization: Polarization in political discourse on Twitter has been a significant area of research, demonstrating how social media can intensify political divisions. Primario et al. (2017) investigated the dynamics of polarization on Twitter during the 2016 US presidential election. They emphasized how the platform's features can contribute to polarization by allowing users to participate in political discussions that reinforce their pre-existing beliefs and opinions. Urman (2020) extended these findings by examining political polarization on Twitter across 16 countries. The author showed that the level of polarization varies greatly with the local political context and electoral system. Two-party systems with plurality electoral rules tend to exhibit higher polarization, whereas multi-party systems with proportional voting show lower levels. Gruzd and Roy (2014) also emphasized the role of the Media in polarization, using social network analysis to investigate political polarization in Canadian Twitter discourse. They found that media organizations often acted as hubs in the network, connecting disparate user groups and influencing the overall tone and direction of political discussions.

Identification of Influential Nodes: Understanding the role of influential nodes in Twitter discourse is crucial for understanding how information spreads and how key accounts affect political discussions. Murthy et al. (2016) conducted a sociotechnical investigation into how influential users, including media organizations and bots, leverage Twitter to strengthen network connections and disseminate information. Their research provided insights into how these accounts create social capital and shape political communication through strategic use of the platform. Tien et al. (2020) studied online reactions to a far-right rally in the United States (the 2017 “Unite the Right” rally in Charlottesville). They identified key influencers and how these accounts propagate information. As the authors argue, identifying such influential nodes is critical for understanding and potentially mitigating the spread of extremist content.

Together, these studies highlight the complex roles of the Media, polarization, bots, and influential accounts in shaping political discourse on Twitter. Our study follows a methodological framework that aims to address all of these aspects. Our results agree with the main conclusions of previous research, which further supports the importance of the Media in public political dialogue, especially on Twitter.

7. Discussion

In this work, we studied how a real-world scandal was reflected in online discussions on Twitter. We collected and analyzed a large corpus of data. The resulting findings enhance our understanding of public opinion about the scandal and reveal aspects (e.g., polarization) that are difficult to identify by other means. We believe that the proposed methodology can support the investigation of other use cases and can provide useful guidelines and resources for other researchers and journalists.

The key steps for replicating the methodology for a given discussion are the following: (i) Identify a set of hashtags and keywords related to the discussion at hand; this can be done through a manual (journalistic) investigation. (ii) Collect the tweets for a selected period of time using available tools and libraries (e.g., Giakatos et al. (2023)). (iii) Identify the Twitter accounts of political parties and politicians using available datasets; Media and journalist accounts require manual annotation (focusing only on the most active accounts, which typically include these kinds of users). (iv) For graph generation, political inference, polarization inference, and community detection, apply our methodology (Section 4), which is based on open tools and libraries.
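Steps (i) and (ii) meet when the chosen keywords are turned into search queries. In Twitter API v2 search syntax, terms are combined with OR and grouped with parentheses. Long keyword lists may exceed the maximum query length and must then be split across several requests. The helper below is a hypothetical sketch. The 512-character default is an assumption based on the limit we recall for standard access; the actual limit depends on the access tier and should be checked against current API documentation (see also Note 7). Python's len counts Unicode code points, which may differ from how the API counts characters.

```python
def build_queries(terms, max_len=512):
    """Pack search terms into OR-joined queries of at most max_len characters.

    max_len=512 is an assumed limit; the real limit depends on the API access tier.
    A single term longer than max_len is emitted on its own and will still exceed it.
    """
    queries, current = [], []
    for term in terms:
        candidate = "(" + " OR ".join(current + [term]) + ")"
        if current and len(candidate) > max_len:
            # Adding this term would overflow: close the current query, start a new one.
            queries.append("(" + " OR ".join(current) + ")")
            current = [term]
        else:
            current.append(term)
    if current:
        queries.append("(" + " OR ".join(current) + ")")
    return queries


# A few of the terms from Table 1, used here purely as sample input.
sample_terms = ["υποκλοπές", "#υποκλοπες", "#ypoklopes", "predator", "#predatorgate"]
for query in build_queries(sample_terms):
    print(query)
```

The greedy packing keeps the number of requests small. The order of terms is preserved, so related terms tend to stay in the same query.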

The findings of this analysis can enrich research and standard journalistic analyses (see a detailed analysis in Kiki et al. (2023)). They can also reveal important insights into the role of social media in public discourse, specifically the following:

Similar to the work of Bruns and Burgess (2012) and Dagoula (2019), our analysis confirms that media organizations continue to hold considerable influence over the narratives within Twitter discussions. However, our study and methodology also add to this body of research by demonstrating how this influence evolves dynamically in response to real-world events, as evidenced by the intensified discussion patterns we observed in Figure 2.

In terms of political polarization, our results echo the conclusions of Primario et al. (2017) and Urman (2020), which underscore how Twitter’s platform features can exacerbate political divisions. Our analysis contributes to this understanding by revealing how these divisions manifest specifically in the context of scandal-related discourse, but mainly by offering a perspective (and the methodological steps to accomplish it) on the role of the Media and influential accounts in either bridging or widening these divides.

Lastly, in line with the findings of Murthy et al. (2016) and Tien et al. (2020), we observed that key influencers (media outlets, political figures, etc.) play a critical role in shaping online discourse. Our approach to identifying these nodes can be directly applied to future studies aiming to map influence in political discussions on social media.

Author Contributions

All authors contributed to the conceptualization of the paper; methodology, I.D., P.S., D.P.G., K.K. and A.V.; software, I.D., D.P.G., S.K. and P.S.; validation, I.D., D.P.G., P.S. and K.K.; data curation, I.D., D.P.G., S.K., P.S. and K.K.; writing, I.D., P.S., S.K., K.K. and A.V.; visualization, I.D., D.P.G., S.K. and K.K.; supervision, K.K. and A.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in the Greek Wiretappings Scandal Twitter Dataset, Dimitriadis et al. (2023), and the Greek Wiretappings Scandal Web Portal, Dimitriadis et al. (2023).

Conflicts of Interest

The authors declare no conflict of interest.

Notes

1	https://www.efsyn.gr/themata/thema-tis-efsyn/319063_polites-se-kathestos-parakoloythisis-apo-tin-eyp, (accessed on 10 September 2024).
2	https://www.reportersunited.gr/6976/eimai-o-dimosiografos-poy-parakoloythei-i-eyp/, (accessed on 10 September 2024).
3	https://citizenlab.ca/2021/12/pegasus-vs-predator-dissidents-doubly-infected-iphone-reveals-cytrox-mercenary-spyware/, (accessed on 10 September 2024).
4	https://about.fb.com/news/2021/12/taking-action-against-surveillance-for-hire/, (accessed on 10 September 2024).
5	https://www.reportersunited.gr/7359/parakoloythiseis-eyp-siopi-o-vasilias-akoyei/, (accessed on 10 September 2024).
6	https://insidestory.gr/article/neo-logismiko-kataskopeias-predator-kai-oi-doyleies-stin-ellada, (accessed on 10 September 2024).
7	Since the study was conducted (2022–2023), the cost of accessing the Twitter API has changed significantly; the cost of the data collection phase of our methodology would therefore be different today.
8	Although a directed and weighted graph could provide richer information, the sparsity and variability of our collected dataset would lead to poor graph analysis outcomes, and we would not be able to apply the polarization and influencer detection algorithms (see Section 4.3 and Section 4.4).
9	Similarly, we generate graphs for longer time periods.
10	The computational complexity of NetShield is $O(nk^{2} + m)$, where n is the number of nodes, k is the number of nodes to select, and m is the number of edges.
11	We provide date filtering as an option, so that users can explore the aggregate data and statistics that refer to a specific time period.
12	See Section 3.3 and Section 4.4, respectively, for details on these users.

References

AlShehhi, Aamna, Justin Thomas, Roy Welsch, and Zeyar Aung. 2019. Cross-linguistic twitter analysis of discussion themes before, during and after ramadan. Paper presented at the 2019 IEEE 4th International Conference on Big Data Analytics (ICBDA), Suzhou, China, March 15–18. [Google Scholar]
Alsinet, Teresa, Josep Argelich, Ramón Béjar, Cèsar Fernández, Carles Mateu, and Jordi Planes. 2017. Weighted argumentation for analysis of discussions on Twitter. International Journal of Approximate Reasoning 85: 21–35. [Google Scholar] [CrossRef]
Bessi, Alessandro, and Emilio Ferrara. 2016. Social bots distort the 2016 us presidential election online discussion. First Monday 21. [Google Scholar] [CrossRef]
Blondel, Vincent D., Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008: P10008. [Google Scholar] [CrossRef]
Bruns, Axel, and Jean Burgess. 2012. Researching news discussion on twitter: New methodologies. Journalism Studies 13: 801–14. [Google Scholar] [CrossRef]
Chen, Chen, Hanghang Tong, B. Aditya Prakash, Charalampos E. Tsourakakis, Tina Eliassi-Rad, Christos Faloutsos, and Duen Horng Chau. 2016. Node immunization on large graphs: Theory and algorithms. IEEE TKDE 28: 113–26. [Google Scholar] [CrossRef]
Cresci, Stefano. 2020. A decade of social bot detection. Communications of the ACM 63: 72–83. [Google Scholar] [CrossRef]
Dagoula, Chrysi. 2019. Mapping political discussions on twitter: Where the elites remain elites. Media and Communication 7: 225–34. [Google Scholar] [CrossRef]
Dimitriadis, Ilias, Konstantinos Georgiou, and Athena Vakali. 2021. Social botomics: A systematic ensemble ml approach for explainable and multi-class bot detection. Applied Sciences 11: 9857. [Google Scholar] [CrossRef]
Dimitriadis, Ilias, Pavlos Sermpezis, Stelios Karamanidis, Dimitrios Panteleimon Giakatos, George Vlahavas, Maria Michali, Athena Vakali, and Kelly Kiki. 2023. The Greek Wiretappings Scandal Web Portal. Available online: https://ypoklopes.csd.auth.gr (accessed on 10 September 2024).
Dimitriadis, Ilias, Stelios Karamanidis, Pavlos Sermpezis, Athena Vakali, and Kelly Kiki. 2023. The Greek Wiretappings Scandal Twitter Dataset. Available online: https://github.com/Datalab-AUTH/Greek_Wiretapping_Dataset (accessed on 10 September 2024).
Erokhin, Dmitry, and Nadejda Komendantova. 2023. The role of bots in spreading conspiracies: Case study of discourse about earthquakes on twitter. International Journal of Disaster Risk Reduction 92: 103740. [Google Scholar] [CrossRef]
Ferrara, Emilio. 2020. Bots, elections, and social media: A brief overview. In Disinformation, Misinformation, and Fake News in Social Media: Emerging Research Challenges and Opportunities. Cham: Springer, pp. 95–114. [Google Scholar]
Gallagher, Ryan J., Larissa Doroshenko, Sarah Shugars, David Lazer, and Brooke Foucault Welles. 2021. Sustained online amplification of covid-19 elites in the united states. Social Media + Society 7: 20563051211024957. [Google Scholar] [CrossRef]
Giakatos, Dimitrios P., Pavlos Sermpezis, and Athena Vakali. 2023. Pypoll: A python library automating mining of networks, discussions and polarization on twitter. ACM Web Conference. Available online: https://github.com/dpgiakatos/PyPoll (accessed on 10 September 2024).
Gruzd, Anatoliy, and Jeffrey Roy. 2014. Investigating political polarization on twitter: A canadian perspective. Policy & Internet 6: 28–45. [Google Scholar]
Isenberg, Daniel J. 1986. Group polarization: A critical review and meta-analysis. Journal of Personality and Social Psychology 50: 1141. [Google Scholar] [CrossRef]
Jazeera, Al, and News Agencies. 2022. Why You Should Care about the Greek Wiretapping Scandal. Available online: https://www.aljazeera.com/news/2022/9/6/explained-the-greece-wiretapping-scandal (accessed on 10 September 2024).
Kiki, Kelly, Ilias Dimitriadis, Stelios Karamanidis, Pavlos Sermpezis, and Athena Vakali. 2023. The Greek Wiretapping Scandal on Twitter: The Course of the Conversation, Polarization and the Role of the Media. iMEdD Lab. Available online: https://lab.imedd.org/en/greek-wiretapping-scandal-twitter-conversation-polarization-media/ (accessed on 10 September 2024).
Kouvela, Maria, Ilias Dimitriadis, and Athena Vakali. 2020. Bot-detective: An explainable twitter bot detection service with crowdsourcing functionalities. Paper presented at the International Conference on Management of Digital EcoSystems, Virtual Event, United Arab Emirates, November 2–4. [Google Scholar]
Matakos, Antonis, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31: 1480–505. [Google Scholar] [CrossRef]
Murthy, Dhiraj, Alison B. Powell, Ramine Tinati, Nick Anstead, Leslie Carr, Susan J. Halford, and Mark Weal. 2016. Bots and political influence: A sociotechnical investigation of social network capital. International Journal of Communication 10: 20. [Google Scholar]
Parliament European. 2023. Members of the European Parliament. Available online: https://www.europarl.europa.eu/meps/en/home (accessed on 1 October 2022).
Partsakoulaki, Ero. 2023. How a Wiretapping Scandal Reinforced the Need for Independent Media in Greece. Available online: https://gijn.org/stories/how-a-wiretapping-scandal-reinforced-the-need-for-independent-media-in-greece/ (accessed on 10 September 2024).
Primario, Simonetta, Dario Borrelli, Luca Iandoli, Giuseppe Zollo, and Carlo Lipizzi. 2017. Measuring polarization in twitter enabled in online political conversation: The case of 2016 us presidential election. Paper presented at the 2017 IEEE International Conference on Information Reuse and Integration (IRI), San Diego, CA, USA, August 4–6; pp. 607–13. [Google Scholar]
Ratkiewicz, Jacob, Michael Conover, Mark Meiss, Bruno Gonçalves, Alessandro Flammini, and Filippo Menczer. 2011. Detecting and tracking political abuse in social media. Paper presented at the AAAI Conference on Web and Social Media, Barcelona, Spain, July 17–21. [Google Scholar]
Reuters. 2023. Greek Government Wins No-Confidence Vote over Wiretapping Scandal. Available online: https://www.reuters.com/world/europe/greek-pm-mitsotakis-wins-no-confidence-vote-over-wiretapping-scandal-2023-01-27/ (accessed on 10 September 2024).
Robertson, Craig T., William H. Dutton, Robert Ackland, and Tai-Quan Peng. 2019. The democratic role of social media in political debates: The use of twitter in the first televised us presidential debate of 2016. Journal of Information Technology & Politics 16: 105–18. [Google Scholar]
Schmitz, Florian. 2022. Wiretapping Scandal in Greece. Available online: https://www.dw.com/en/wiretapping-scandal-in-greece/a-64128644 (accessed on 10 September 2024).
Smith, Helena. 2023. Greek PM Survives Confidence Vote But Phone-Tapping Scandal Rumbles on. Available online: https://www.theguardian.com/world/2023/jan/27/kyriakos-mitsotakis-greek-pm-survives-confidence-vote-but-phone-tapping-scandal-rumbles-on (accessed on 10 September 2024).
Tien, Joseph H., Marisa C. Eisenberg, Sarah T. Cherng, and Mason A. Porter. 2020. Online reactions to the 2017 ‘unite the right’ rally in charlottesville: Measuring polarization in twitter networks using media followership. Applied Network Science 5: 1–27. [Google Scholar] [CrossRef]
Twitter. 2023. Twitter api. Available online: https://developer.twitter.com/en/products/twitter-api (accessed on 1 February 2023).
Urman, Aleksandra. 2020. Context matters: Political polarization on twitter from a comparative perspective. Media, Culture & Society 42: 857–79. [Google Scholar]
Vouliwatch. 2023. Vouliwatch. Available online: https://vouliwatch.gr/ (accessed on 10 September 2024).

Figure 1. Overview of the methodology.

Figure 2. Total posts (including tweets, retweets, quotes, and replies) over the period of study (1 April 2022–14 January 2023). A selection of important events and publications related to the wiretapping scandal is annotated.

Figure 3. Construction of the user graph.

Figure 4. Total number of posts (y-axis) per day (x-axis), grouped by type: tweets (red), retweets (blue), quotes and replies (yellow). The total number of posts between 25 July 2022 and 14 January 2023 is 953,722.

Figure 5. Graph visualization depicting users attributed to political “Left” (red) and “Right” (blue); 5 August 2022.

Figure 6. Graph visualization depicting users attributed to different political parties (colors); 1 April 2022 to 1 December 2022.

Figure 7. Polarization index (PI) variations over time, and the role of different types of users.

Figure 8. Polarization index vs. threshold for the political inference methodology.

Figure 9. User communities decomposition based on their political profile.

Table 1. Keywords and hashtags (denoted with #) used as filters in the Twitter API requests.

Keyword/Hashtag	Comments
υποκλοπές, #υποκλοπες, #υποκλοπές, υποκλοπη, #παρακολουθήσεις, #ypoklopes	Words (in Greek or “Greeklish”) for “wiretapping”, “surveillance”, etc.
#watergate, greekwatergate	Other commonly used terms for the topic
predator, #predator, #predatorgate, #pega, #spyware	Terms related to the software used for the wiretappings
#δημητριαδης, #κοντολεων	Political figures involved in the scandal
#κουκακη	Journalist (under surveillance)
#ανδρουλακης	Politician (under surveillance)

Table 2. Distribution of the most “prevalent” users.

Category	Number	Percentage
Individuals	1688	74.6%
Media/Journalists	407	18.0%
Political Accounts	97	4.3%
Organizations	38	1.7%
Bots	32	1.4%


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Dimitriadis, I.; Giakatos, D.P.; Karamanidis, S.; Sermpezis, P.; Kiki, K.; Vakali, A. Analyzing Large-Scale Political Discussions on Twitter: The Use Case of the Greek Wiretapping Scandal (#ypoklopes). Journal. Media 2024, 5, 1348-1363. https://doi.org/10.3390/journalmedia5030085

AMA Style

Dimitriadis I, Giakatos DP, Karamanidis S, Sermpezis P, Kiki K, Vakali A. Analyzing Large-Scale Political Discussions on Twitter: The Use Case of the Greek Wiretapping Scandal (#ypoklopes). Journalism and Media. 2024; 5(3):1348-1363. https://doi.org/10.3390/journalmedia5030085

Chicago/Turabian Style

Dimitriadis, Ilias, Dimitrios P. Giakatos, Stelios Karamanidis, Pavlos Sermpezis, Kelly Kiki, and Athena Vakali. 2024. "Analyzing Large-Scale Political Discussions on Twitter: The Use Case of the Greek Wiretapping Scandal (#ypoklopes)" Journalism and Media 5, no. 3: 1348-1363. https://doi.org/10.3390/journalmedia5030085

APA Style

Dimitriadis, I., Giakatos, D. P., Karamanidis, S., Sermpezis, P., Kiki, K., & Vakali, A. (2024). Analyzing Large-Scale Political Discussions on Twitter: The Use Case of the Greek Wiretapping Scandal (#ypoklopes). Journalism and Media, 5(3), 1348-1363. https://doi.org/10.3390/journalmedia5030085
