#### *2.1. Proposed Methodology*

Using the Twitter public streaming API, tweets released on the 10th of May 2019 containing at least one of the following words or hashtags were collected and analyzed: #WorldLupusDay, #lupus, #SystemicLupusErythematosus, or #SLE. A total of 4434 tweets (including retweets) were collected, together with their associated information (i.e., time, location, source, retweets, retweet count, follower count, and friend count). The tweets came from 2813 unique users. R software was used for the analyses.
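The selection criterion above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the analyses in this study were run in R), and it assumes each tweet is a dictionary with hypothetical `text` and `user` fields and that hashtag matching is case-insensitive; neither assumption is stated in the text.

```python
import re

# Hashtags named in the text; case-insensitive matching is an assumption.
HASHTAGS = ("#worldlupusday", "#lupus", "#systemiclupuserythematosus", "#sle")

def matches_query(text):
    """Return True if the tweet text contains at least one tracked hashtag."""
    tokens = re.findall(r"#\w+", text.lower())
    return any(tag in tokens for tag in HASHTAGS)

def summarize(tweets):
    """Count matching tweets (retweets included) and the unique users behind them.

    `tweets` is a list of dicts with hypothetical keys "text" and "user".
    """
    kept = [t for t in tweets if matches_query(t["text"])]
    users = {t["user"] for t in kept}
    return {"n_tweets": len(kept), "n_users": len(users)}
```

Applied to the collected dataset, a summary of this kind would correspond to the tweet and unique-user counts reported above.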

A comprehensive analysis flow is presented in Figure 1. Following the scheme of social media analytics [22], it is possible to extract patterns, discover hidden information, and outline network interactions among online communities by mining the health discussions.

**Figure 1.** Framework workflow of social media Twitter analysis.

In stage one (capturing), we collected the texts and associated information of tweets containing the keywords or hashtags and released on World Lupus Day through the Twitter API. Next, data cleaning and pre-processing were applied to the entire dataset. In stage two, we performed the data analysis using two main techniques: (a) text analysis/natural language processing through word frequencies, n-grams, and topic modeling, and (b) network analysis and measurements (statistics and scores of the network under investigation). Stage three focused on the visualization of results. Visualization techniques, such as bar charts, histograms, and network graphs, played a key role in interpreting and presenting the results.
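As an illustration of the text-analysis step in stage two, the following is a minimal Python sketch of word-frequency and n-gram counting (the study itself used R). The stopword list, the tokenization rules, and the function names are assumptions made for this example, not the pre-processing actually applied.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with", "rt"}

def tokenize(text):
    """Lowercase, drop URLs and @mentions, keep words and hashtags, remove stopwords."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [w for w in re.findall(r"#?[a-z0-9']+", text) if w not in STOPWORDS]

def ngram_counts(texts, n=1):
    """Count n-grams (as tuples of tokens) over a collection of tweet texts.

    Stopwords are removed before n-grams are formed, so an n-gram may join
    words that were separated by a stopword in the original tweet.
    """
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts
```

Calling `ngram_counts(texts, 1).most_common(20)` would give the kind of ranking typically shown in a word-frequency bar chart, and `n=2` gives bigrams.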

#### *2.2. Data Cleaning and Pre-Processing*

Data were gathered using the rtweet [23] package in R. On the basis of the data collected, the influencer score and the network influence score were calculated. The influencer score is a proxy for identifying the small percentage of users who reach a large audience of followers and have established a degree of trust, such that the content they post carries perceived influence [24]. The network influence score, by contrast, is based on the number of retweets a user receives from other users and represents a sort of endorsement of the specific content or message shared. The further a tweet spreads, the more influence the user has.
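The idea behind the network influence score can be sketched as a count of retweets received per user. The Python snippet below is only an illustration under stated assumptions: the retweet data are represented as hypothetical `(retweeter, original_author)` pairs, self-retweets are ignored, and the raw count stands in for the score, whose exact formula is not given here.

```python
from collections import Counter

def retweets_received(edges):
    """Count, for each original author, the retweets received from other users.

    `edges` is an iterable of (retweeter, original_author) pairs, a
    hypothetical representation of the retweet network.
    """
    counts = Counter()
    for retweeter, author in edges:
        if retweeter != author:  # ignoring self-retweets is an assumption
            counts[author] += 1
    return counts
```

Ranking users with `retweets_received(edges).most_common()` then lists those whose content was most widely endorsed in the network.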

In summary, the first score is oriented toward the large following a user is able to attract on the basis of shared lifestyles, opinions, and textual content [25]. The second score is based on the attention and endorsement that the content of a tweet (or a set of tweets) is able to achieve by being shared throughout a user network in a certain span of time [26].

Despite the efforts and increasing interest in properly measuring an influencer's score, there is still a clear lack of widely recognized measures of a user's ability to spread content and thus shape followers' perceptions and behavior [27]. Nevertheless, some studies, especially in the marketing literature [28,29], have developed robust measures that serve as solid proxies of the social media influencers' effect. In our study, we obtained the influencer score by aggregating the Twitter performance indicators proposed by Anger and Kittl [30]. The score was calculated as the average of three different ratios: the follower/following ratio (Rf), i.e., the number of followers divided by the number of accounts followed; the retweet and mention ratio (Rrt), calculated as the total retweet count divided by the total number of tweets created; and the interaction ratio (Ri), obtained by dividing the total retweet count by the number of followers. Aggregating three independent ratios reduced the possibility of misinterpretation due to the mass-followers effect. It is nonetheless important to keep in mind that other measures exist, which could yield even more sophisticated scores [29].
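Read literally, the description above gives score = (Rf + Rrt + Ri) / 3. A minimal Python sketch of this calculation follows (the study itself used R). The function and argument names are hypothetical, and returning 0 for a ratio whose denominator is zero is an assumption of this example, since the text does not say how such cases were handled.

```python
def influencer_score(followers, following, tweets, retweets):
    """Average of the three ratios described in the text.

    Rf  = followers / following   (follower/following ratio)
    Rrt = retweets / tweets       (retweet and mention ratio)
    Ri  = retweets / followers    (interaction ratio)

    A ratio with a zero denominator is set to 0.0 (an assumption).
    """
    rf = followers / following if following else 0.0
    rrt = retweets / tweets if tweets else 0.0
    ri = retweets / followers if followers else 0.0
    return (rf + rrt + ri) / 3
```

For example, a user with 1000 followers, 100 accounts followed, 50 tweets, and 200 retweets received would have Rf = 10, Rrt = 4, and Ri = 0.2, giving a score of about 4.73.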
