#### *2.3. Network Analysis*

The scoring index for the network influence score (ii) builds on standard approaches from social network analysis, which draw on several independent indexes from graph theory [31], e.g., betweenness centrality, out-degree, and PageRank. To detect influential users in the network, we treated a retweet as a proxy for an endorsement of the tweet content shared by the user. The modularity [32] detection algorithm was employed to identify the communities (clusters) that compose the retweet network. The modularity algorithm divides a network into a set of clusters in which each node (user) belongs to exactly one cluster. Modularity measures the strength of this division: a high value indicates that nodes are densely connected within their clusters and only sparsely connected across clusters. The Force Atlas 2 [33] algorithm was employed to lay out the network for visualization. It is a force-directed algorithm that draws linked nodes closer together while pushing unlinked nodes farther apart, which places hubs within their clusters. The resulting visualization provides a readable representation of the entire graph.
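To make the modularity measure concrete, the following minimal sketch computes Newman's modularity Q for a given partition of a small graph. It is an illustration only and not the routine used in Gephi: it assumes an undirected, unweighted graph (retweet direction is ignored), and the user names, the edge list, and the function name `modularity` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a node partition (undirected, unweighted graph).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its cluster label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same cluster
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    total_degree = defaultdict(int)  # sum of node degrees per cluster
    for node, d in degree.items():
        total_degree[community[node]] += d

    # Q = sum over clusters of
    #     (observed share of internal edges) - (share expected at random)
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Hypothetical toy network: two triangles of users joined by a single edge.
edges = [
    ("u1", "u2"), ("u2", "u3"), ("u1", "u3"),
    ("u4", "u5"), ("u5", "u6"), ("u4", "u6"),
    ("u3", "u4"),
]
two_clusters = {"u1": 0, "u2": 0, "u3": 0, "u4": 1, "u5": 1, "u6": 1}
one_cluster = {node: 0 for node in two_clusters}

q_two = modularity(edges, two_clusters)  # the two triangles as separate clusters
q_one = modularity(edges, one_cluster)   # everything merged into one cluster
```

In this toy case, splitting the graph into its two triangles gives a higher Q than merging all nodes into one cluster, which is the sense in which modularity rewards dense within-cluster connections.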

As the score index, eigenvector centrality [34] was employed to identify the influencer nodes. Eigenvector centrality measures a node's importance in the entire network, weighted by the importance of the nodes to which it is connected. For our purposes, this was the most suitable index for identifying influencer nodes [35]. We used Gephi software [36] to compute the network measures and scores and to produce the visualizations.
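The idea behind eigenvector centrality can be sketched with a short power iteration. This is a minimal illustration under stated assumptions, not Gephi's implementation: the graph is treated as undirected, the adjacency list and node names are hypothetical, and the iteration is applied to A + I (each node keeps its own score) so that it also converges on bipartite graphs.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-10):
    """Power iteration for eigenvector centrality on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours.
    Returns a dict of scores normalized to unit Euclidean length.
    """
    nodes = list(adjacency)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # A node's new score is its own score plus the scores of its
        # neighbours, so well-connected neighbours raise a node's score.
        new = {
            n: score[n] + sum(score[nb] for nb in adjacency[n])
            for n in nodes
        }
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical toy network: a hub linked to a, b, and c; c is also linked to d.
adjacency = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}
scores = eigenvector_centrality(adjacency)
top_node = max(scores, key=scores.get)
```

Here node `a` and node `d` both have a single connection, yet `a` scores higher because its only neighbour is the hub, which illustrates how the index weights a node by the importance of its neighbours rather than by degree alone.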

#### *2.4. Text Analysis and Topic Modeling*

Topic modeling is a family of unsupervised natural language processing methods used to analyze and extract topics from a corpus of documents. This approach fits the text analysis of Twitter content quite well. Given the unsupervised nature of topic modeling, it was possible to identify the thematic structure (topics) within the set of tweet texts without any prior data preparation, such as text labeling or building a training dataset. Topic modeling allows the discovery of the thematic structure in a large corpus of text, making it possible to organize, summarize, and visualize the latent themes and patterns present in any kind of text corpus [37].

The most commonly used topic modeling approach is latent Dirichlet allocation (LDA) [38], a generative probabilistic model which assumes that each document is composed of a mixture of (latent) topics, where each topic is a probability distribution over a set of words. This approach can be thought of as a classification method that, instead of numerical features, works with collections of words that can be grouped together in a meaningful way. See Figure A1 in Appendix A for more details.
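The generative assumption behind LDA can be illustrated with a minimal sketch that produces one synthetic document. The two topics, their word probabilities, and the function names are hypothetical and chosen only for illustration. For brevity the topic-word distributions are fixed by hand, whereas full LDA also draws them from a Dirichlet prior. Fitting LDA runs this process in reverse, inferring topics and mixtures from observed texts.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document under the LDA generative assumption.

    topics: list of dicts mapping word -> probability (one dict per topic).
    alpha: Dirichlet concentration parameters, one per topic.
    Returns the document's topic mixture and its list of words.
    """
    theta = sample_dirichlet(alpha, rng)  # document-level topic mixture
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        vocab, probs = zip(*topics[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Hypothetical topics, for illustration only.
topics = [
    {"vaccine": 0.5, "health": 0.3, "doctor": 0.2},
    {"election": 0.6, "vote": 0.3, "party": 0.1},
]
rng = random.Random(0)
theta, words = generate_document(topics, alpha=[0.5, 0.5], length=8, rng=rng)
```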

A more recent approach that extends the LDA framework to extract valuable results from a large corpus of text is structural topic modeling (STM) [39]. STM makes it possible to take into account metadata associated with the text, such as the author of the tweet, an associated numerical score, and other characteristics of the overall dataset, in the form of document-level covariates. After identification of the latent topics, using the stm R package [40], we estimated the effect that the influencer score and the network influencer score, used as covariates, had on topic prevalence. This allowed us to explore whether, and which, topics had a higher probability of appearing in tweet texts and to investigate whether different topics were used in different ways. See Figure A2 in Appendix A.
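How a document-level covariate can shift topic prevalence is sketched below. In STM the topic proportions of a document follow a logistic-normal distribution whose mean is a linear function of the covariates. The sketch evaluates only that mean and omits the random variation, so it is a simplification and not the estimation procedure of the stm package. The coefficient values, the three-topic setup, the choice of the last topic as the reference category (an identification convention assumed here), and the function name are all hypothetical and are not estimates from this study.

```python
import math


def prevalence_at_mean(covariates, coefficients):
    """Topic proportions implied by the mean linear predictor.

    covariates: covariate values for one document, with a leading 1.0
        for the intercept.
    coefficients: one coefficient vector per topic except the last, which
        is treated as the reference topic with linear predictor 0.
    """
    eta = [
        sum(x * g for x, g in zip(covariates, gamma))
        for gamma in coefficients
    ]
    eta.append(0.0)  # reference topic
    peak = max(eta)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical coefficients (intercept, slope on a score) for three topics.
gamma = [
    [0.0, 1.5],   # topic 0 becomes more prevalent as the score grows
    [0.0, -0.5],  # topic 1 becomes less prevalent as the score grows
]
low_score = prevalence_at_mean([1.0, 0.0], gamma)
high_score = prevalence_at_mean([1.0, 2.0], gamma)
```

Comparing `low_score` with `high_score` shows the kind of contrast that the covariate effect on topic prevalence describes: the same set of topics, with proportions that depend on the score attached to the document.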
