**3. Results**

From the dataset of 4434 tweets, we built a directed graph involving 2813 unique users to compute the network influencer score. Each node represented a user, and an edge between two nodes was established when one user's tweet was retweeted by the other. We retained the giant component of the network and dropped the smaller disconnected components (18.3%). More details on the network analysis are provided in Appendix A (see Figures A6 and A7).
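The construction described above can be illustrated with a minimal sketch. It assumes that each retweet is stored as a pair (retweeting user, original author), that "giant component" means the largest weakly connected component, and that the dropped share is counted over users; the user names and the `giant_component` helper are hypothetical and not part of the original analysis.

```python
from collections import defaultdict, deque

def giant_component(edges):
    """Return the node set of the largest weakly connected component."""
    # Treat edges as undirected when testing connectivity.
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    seen = set()
    best = set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects one component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbors[node]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical retweets: (retweeting user, original author).
retweets = [("a", "b"), ("c", "b"), ("d", "a"), ("e", "f")]
core = giant_component(retweets)
kept_edges = [(s, t) for s, t in retweets if s in core and t in core]
all_users = {user for edge in retweets for user in edge}
dropped_share = 1 - len(core) / len(all_users)
```

In this toy example the pair ("e", "f") forms a small disconnected component and is discarded, in the same way the smaller components were dropped from the retweet network.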

The size of each node was proportional to the user's number of social connections, measured as the number of retweets that user received. Nodes and edges that were linked to each other shared the same color, which made the communities visible. Node positions were determined by a heuristic that attempted to place connected nodes closer together, thus revealing the community structure (see Figure 2).

**Figure 2.** Retweet network analysis.

The community detection algorithm found 25 communities (clusters). The top five communities accounted for more than 55% of all network connections. Applying the eigenvector centrality algorithm to detect the most influential users, five nodes emerged as the most influential. These five users received more attention, measured as the number of retweets, which allowed them to catalyze a vast amount of attention through the content of their tweets. We asked the top influencers identified for permission to display their account names. Four of them consented; for the others, we used anonymized acronyms to identify the account type.
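Eigenvector centrality can be approximated by power iteration. The sketch below is one possible implementation, not the software used in the study. It assumes that edges point from the retweeting user to the original author, so that retweeted users accumulate score; the function name and the toy edge list are hypothetical.

```python
import math

def eigenvector_centrality(edges, iterations=200, tol=1e-10):
    """Power iteration on a directed graph; score flows along edge direction."""
    nodes = sorted({user for edge in edges for user in edge})
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Start from the previous score (equivalent to iterating A + I),
        # which avoids oscillation on periodic graphs.
        new = dict(score)
        for source, target in edges:
            new[target] += score[source]
        # Normalize to unit Euclidean length so values stay bounded.
        norm = math.sqrt(sum(v * v for v in new.values())) or 1.0
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score

# Hypothetical retweet edges: (retweeting user, original author).
edges = [("a", "b"), ("c", "b"), ("d", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
scores = eigenvector_centrality(edges)
most_influential = max(scores, key=scores.get)
```

A user scores highly when retweeted by users who are themselves retweeted often, which is why this measure can rank accounts differently from a raw retweet count.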

As reported in Table 1, only one account appeared in both influencer scores. This was because the two scores were intended to measure different dynamics. Nevertheless, given the specificity of the dataset collected, it was also true in this case that the two types of influencers played different roles and showed different features in attracting attention through their posted content. Interestingly, the highest scored user was Peter Morley, whose network is weakly connected with the rest of the main users' connections. He is easily visible in Figure 2 because of his peripheral position in the network structure.


**Table 1.** Top scored influencers.

After the influencer score analysis and the network relationship measurement, tweet text analysis was employed. We adopted STM on the entire tweet text dataset.

When performing STM, several steps need to be addressed before reaching the final evaluation, including the identification of the number of topics (k) that best represents the themes in the text corpus. Different approaches exist; none is more correct than the others. In our analysis, we based the optimal number of latent topics on "Griffiths" [41] and "CaoJuan" [42], which are metric scores implemented in the ldatuning package [43] that use the log-likelihood method via Gibbs sampling. The Griffiths metric is maximized (likelihood), while the CaoJuan metric is minimized (divergence between topics). As a result, the optimal number of topics (k) for our dataset was 12. The plots for the optimal number of topics are provided in Appendix A, Figure A3.
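The selection logic can be sketched as follows, assuming the two metrics have already been computed for each candidate k. The metric values are made up for illustration and are not the study's results, and the combination rule (largest gap between the rescaled Griffiths and CaoJuan scores) is an assumption of this sketch; in practice the choice is often made by inspecting the plotted curves.

```python
def rescale(values):
    """Min-max rescale a list of numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    span = (high - low) or 1.0
    return [(v - low) / span for v in values]

def pick_k(candidates, griffiths, caojuan):
    """Pick k where rescaled Griffiths is high and rescaled CaoJuan is low."""
    g = rescale(griffiths)
    c = rescale(caojuan)
    gaps = [gi - ci for gi, ci in zip(g, c)]
    return candidates[gaps.index(max(gaps))]

# Made-up metric values for illustration only, not the study's results.
candidates = [2, 4, 6, 8, 10]
griffiths = [-9500.0, -9100.0, -8800.0, -8900.0, -9200.0]  # higher is better
caojuan = [0.40, 0.25, 0.12, 0.18, 0.30]                   # lower is better
best_k = pick_k(candidates, griffiths, caojuan)
```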

Another step in the STM that needs to be addressed before reaching the final evaluation is the choice of the model that best estimates the possible outcome. Different initialization parameters need to be evaluated, and models with low likelihood values are discarded [40]. Here, too, there is no ground truth approach. However, one of the most suitable approaches is to assess the quality of the models by considering the trade-off between semantic coherence [44] and exclusivity [39] for each topic within the model. The semantic coherence metric is related to pointwise mutual information and measures how often the most probable words in a specific topic occur together. The exclusivity measure includes information on word frequency, as employed in the FREX metric [45]. Together, these measures describe the distinctness of the topics, making it possible to compare the highest scores and ensuring the quality of the selected model. Plots and results of the selected model are provided in Appendix A, Figures A4 and A5.
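To make the coherence side of this trade-off concrete, the sketch below computes a co-document-frequency score in the style commonly used for semantic coherence: for each ordered pair of a topic's top words, it takes the logarithm of the smoothed number of documents containing both words divided by the number of documents containing the higher-ranked word. The helper name and the toy documents are hypothetical, and this is not claimed to reproduce the exact implementation used in the study.

```python
import math

def semantic_coherence(top_words, documents):
    """Sum of log co-document ratios over ordered pairs of a topic's top words."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every given word.
        return sum(all(w in doc for w in words) for doc in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            # The +1 smoothing keeps the logarithm finite for unseen pairs.
            # Top words are assumed to occur in at least one document.
            both = doc_freq(top_words[i], top_words[j])
            score += math.log((both + 1) / doc_freq(top_words[j]))
    return score

# Hypothetical tokenized tweets.
docs = [
    ["lupus", "awareness", "day"],
    ["lupus", "awareness"],
    ["lupus", "symptoms"],
    ["purple", "day"],
]
coherent = semantic_coherence(["lupus", "awareness"], docs)
scattered = semantic_coherence(["lupus", "purple"], docs)
```

Word pairs that rarely share a document pull the score down, so a topic whose top words co-occur often (here "lupus" and "awareness") scores higher than one whose top words do not.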

The results of the topic model are shown in Figure 3. Specific words were linked to specific topics according to their β (beta) probabilities of belonging to the topic. Topic labels were not automatically generated. Label selection was the stage at which the researchers analyzed the results after the parameter setting, checked what emerged from the model's execution, and decided whether the resulting allocation was coherent or whether more model executions were needed. In our case, a specific label was identified for each topic using the authors' judgment, through an open discussion until a consensus was reached. Topics were thus interpreted and labelled on the basis of the probability of each word belonging to each specific topic.
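As a small illustration of how β probabilities support manual labelling, the sketch below lists the highest-probability words per topic. The vocabulary and the β values are invented for the example, and `top_words` is a hypothetical helper rather than a function of any topic modelling package.

```python
def top_words(beta, vocabulary, n=3):
    """Return the n highest-probability words for each topic."""
    result = []
    for topic_probs in beta:
        # Pair each probability with its word and sort from high to low.
        ranked = sorted(zip(topic_probs, vocabulary), reverse=True)
        result.append([word for _, word in ranked[:n]])
    return result

# Invented values: one row of word probabilities per topic.
vocabulary = ["learn", "share", "purple", "awareness", "symptoms"]
beta = [
    [0.40, 0.30, 0.05, 0.15, 0.10],
    [0.05, 0.10, 0.45, 0.35, 0.05],
]
label_hints = top_words(beta, vocabulary, n=2)
```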

**Figure 3.** Topics and themes identified in the tweet text corpus.

In doing so, we also checked the most representative tweets for each topic, to better understand the meaning of the topics by inspecting highly correlated tweet texts. A sample of the topics and the associated tweets is reported in Table 2.

**Table 2.** Most representative tweet texts and topic label selection.

#### **Learning and sharing (topic 1):**

"To anyone with Lupus, it does get better. With time you learn your triggers, you learn to pace yourself and most importantly you learn to listen to your own body."; "Help us spread awareness for #lupus on #WorldLupusDay!" "Learn more about #lupus brain fog and get tips for coping with it in our article at."

#### **Information and advice (topic 2):**

"Do eat a healthy, balanced diet try to stay active when you're having a flare-up try walking or swimming get lots of rest try relaxation techniques to manage stress"; "stress can make symptoms worse." "For information about available support, please take a look at our article here."

#### **Feeling loneliness (topic 3):**

"Invisible. For everyone with a disability or an illness that can't be seen. YOU are not alone, WE are not alone. Today is #WorldLupusDay and we are especially thinking of everyone in the world who has #Lupus #invisibleillness #chronicpain #health #mentalhealth."

"In conjunction of special day for this invisible illness I would like to encourage everybody to appreciate your health and for all Lupus fighter in the world."

#### **Spread awareness (topic 4):**

"MAY 10 is WORLD LUPUS DAY! Spread Lupus Awareness share the Lupus In Color Butterfly Woman. Spread Lupus Awareness Today!"; "Today is World Lupus Day! Show me your purple! #LupusAwarenessMonth,"; "I chose purple, and you?"

#### **Social support (topic 5):**

"Today around the world #Lupus advocates, patients, & supporters are working hard to spread #LupusAwareness. For #WorldLupusDay we'll highlight our #LupusChat community members, advocates, caregivers, doctors, and friends who work tirelessly daily to educate others about Lupus." "Just because something doesn't directly affect you doesn't make it irrelevant. Sending out strength and encouragement to everyone battling lupus, extra love to my queen."

#### **Advocating (topic 6):**

"Government would prefer narcotics or sleep medication, which isn't natural and addictive but that's ok they get their money from the big old pharma companies #kickbacks #opioidcrisis but they're getting paid right?!?"; "#WorldLupusDay; Sen Resolution presented ( ... ) We encourage ALL our legislators to join them."; "If you think #PreExistingConditions protections aren't important, remember someone you love could have an accident, that will change how you think about this."

#### **Patient stories (topic 7):**

"My scars are my war wounds, my proof that I survived. They show me that I am..." "Lupus is a long-term condition causing inflammation to the joints, skin and other organs. There's no cure, but symptoms can improve if treatment starts early. Read about the symptoms here ... "

#### **Disease description (topic 8):**

"#Lupus is a severe + life-changing autoimmune disease that can affect any organ in the body. Yet it is also an illness where "but you don't look sick" is truly apt as the pain, suffering + heavy duty meds aren't always visible."; "Symptoms can flare up and settle down, often the disease flares up (relapses) and symptoms become worse for a few weeks, sometimes longer."

"How lupus is diagnosed? As lupus symptoms can be similar to lots of other conditions, it can take some time to diagnose."

#### **Involvement (topic 9):**

"Learn more about the disease and how you can get involved with the charity at"; "Let's Join Together to Fight Lupus! #WorldLupusDay"; "Did you know that over 1:1000 Canadian men, women and children are living with lupus? Let's join together in the fight against #lupus!"

#### **Encouraging (topic 10):**

"Keep fighting and know we are fighting with YOU!"; "to all the Lupus Warriors still fighting every day. You're amazing and you're strong. Keep the faith."; "To all those living with Lupus around the world, keep fighting and may your efforts to awareness be successful."

#### **Body symptoms (topic 11):**

"As well as the 3 main symptoms, you might also have: weight loss, swollen glands, sensitivity to light (causing rashes on uncovered skin), poor circulation in fingers and toes (Raynaud's)"; "#Lupus is a long-term autoimmune disease in which the body's immune system becomes hyperactive and attacks normal, healthy tissue."; "The immune system protects the body against infections and diseases. However, in Lupus, the immune system starts attacking the body's healthy tissue, leading to organ damage and chronic inflammation."

#### **Communities effect (topic 12):**

"lupus affects approx. 5 million people globally yet there is still a lack of awareness amongst general public and healthcare professionals? On #WorldLupusDay join us in encouraging greater understanding of this condition."; "Today is #WorldLupusDay. Lupus is a global health problem that affects people of all nationalities, races, ethnicities, genders and ages! There are about 200,000 cases diagnosed in Kenya."; "Lupus is a global health problem that affects people of all nationalities, races, ethnicities, genders and ages."

From the topic model results, latent themes behind the tweet discussion clearly emerged, revealing a hidden structure that aimed to share something more than just awareness messages or informative content. Some of the topics that emerged appeared to be similar yet covered different issues and tackled different narratives, which attracted the attention of different users. To capture the effects that different topics may have on different types of users, we employed a measurement of the covariate impact. As previously mentioned, the main difference between LDA and STM is the possibility of incorporating metadata and estimating the relationship between the selected covariates and the topics [40].

Figures 4 and 5 show the estimated proportion of each topic in tweet content as a function of the influencer score and the network influencer score, respectively. Topics whose estimates lie on the right side (corresponding to positive values of the *x*-axis) were more likely to be discussed or used by influencers, and conversely for the left side.

**Figure 4.** Estimated topic proportion to be discussed by influencer score.

**Figure 5.** Estimated topic proportion to be discussed by network influencer.

Such an approach made it possible to evaluate the uncertainty surrounding the coefficient by performing a regression in which the topic proportions were the outcome variable, based on the covariance matrix. The results allowed the estimation of topic proportion as a function of covariate data, which in turn produced confidence intervals around the estimated topic proportions [39].
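The idea of regressing topic proportions on a covariate can be illustrated with a deliberately simplified sketch: an ordinary least squares slope with a normal-approximation 95% interval. The data are hypothetical, the helper name is an assumption, and the sketch ignores the uncertainty in the topic proportions themselves, so it should be read as an illustration of the principle rather than as the estimation procedure used in the study.

```python
import math

def slope_with_ci(x, y, z=1.96):
    """Simple OLS slope of y on x with a normal-approximation 95% interval."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, (slope - z * std_error, slope + z * std_error)

# Hypothetical data: an influencer score per tweet and one topic's proportion.
influencer_score = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
topic_share = [0.05, 0.09, 0.14, 0.22, 0.27, 0.36]
slope, (low, high) = slope_with_ci(influencer_score, topic_share)
```

An interval that lies entirely above zero corresponds to a topic plotted on the right side of Figures 4 and 5, and an interval entirely below zero to a topic on the left side.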

Interestingly, the results of the estimated topic prevalence showed that the prevalence of some topics differed between the two types of influencer. In particular, topic 6, the advocacy theme, was largely associated with the network influencers' tweet content. We assumed that this kind of topic and the related discussion attracted an enormous amount of attention from a specific type of user related to network influencers. In other words, it was more probable that the topic was related to advocates' content, i.e., content in favor of new policy laws, health policy attention, or in support of specific collective actions. This can attract specific attention and spread the narrative under discussion faster and more deeply in specific communities.

Topics 8 (disease description) and 9 (involvement) received less attention from the general public and were more likely to occur in the influencers' network communities, which may be more attracted to news or information about possible new treatments or sustaining program involvement.

Instead, topic 11 (body symptoms description) was more likely to receive attention from general influencers. Thus, the public was more interested in understanding the illness and its manifestations.

The STM also allowed an exploration of the correlations between topics to evaluate which topics were more likely to be discussed in the same tweet. Figure 6 shows pairwise correlation coefficients between the identified topics. Positive correlations (in blue) indicate that both topics were more likely to be discussed in the same tweet, and vice versa for negative correlation coefficients (in red). A positive correlation appeared between topics 1 and 8, which addressed the disease description and the ways in which it was possible to learn and share information on SLE.

**Figure 6.** Correlation topics matrix.
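A pairwise correlation matrix of this kind can be sketched with plain Pearson correlations between the columns of a document-by-topic proportion matrix. The proportions below are hypothetical, the helper names are assumptions, and the sketch does not claim to match the exact correlation estimator used by the STM software.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(theta):
    """theta[d][k] is the proportion of topic k in document d."""
    columns = list(zip(*theta))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]

# Hypothetical proportions for four tweets over three topics.
theta = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.2, 0.7],
]
corr = correlation_matrix(theta)
```

In this toy matrix the first two topics tend to appear in the same tweets (positive entry), while the first and third rarely do (negative entry).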

A fairly negative correlation appeared between topics 3 and 7, which referred to loneliness and patient stories, respectively. In our opinion, these two topics were less likely to be discussed together in a tweet because patient stories tended to describe the physical symptoms of the illness, while tweets about loneliness addressed a consequence of the disease and tended to be framed as messages intended to help people feel less alone. However, there was no strong positive or negative correlation in our results, so we did not have enough information to make further assumptions. Further research could explore more deeply how topics are related to each other, how they are discussed together, and how they evolve over time.
