**6. User Evaluation**

The participants were asked to evaluate randomly assigned personas created by their peers. Each respondent evaluated two personas, one randomly assigned from each group, for a total of 318 persona evaluations. For the user evaluation of the social-media data-generated personas, this work utilized the Persona Perception Scale developed by Salminen et al. [45]. This scale applies well to our peer evaluation tasks, since it accounts for multiple aspects of interest. Given the exploratory nature of this work, the full scale was used so that the facilitators could observe possible subtle differences between the two groups.
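The assignment procedure described above can be sketched as follows. The function and group names are illustrative, not the study's actual code; the respondent count of 159 follows from the reported 318 evaluations at two personas per respondent.

```python
import random

def assign_personas(evaluators, group_a, group_b, seed=0):
    """Assign each evaluator two peer-created personas: one drawn at
    random from Group A and one from Group B."""
    rng = random.Random(seed)
    assignments = {}
    for evaluator in evaluators:
        assignments[evaluator] = (rng.choice(group_a), rng.choice(group_b))
    return assignments

# 159 respondents x 2 personas each = 318 persona evaluations
evaluators = [f"evaluator_{i}" for i in range(159)]
group_a = [f"persona_A{i}" for i in range(20)]   # illustrative pool sizes
group_b = [f"persona_B{i}" for i in range(20)]
assignments = assign_personas(evaluators, group_a, group_b)
total_evaluations = sum(len(pair) for pair in assignments.values())
```

Pairing one persona from each group per evaluator keeps the between-group comparison balanced, since every respondent rates exactly one persona per condition.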

Group A participants used our topic modelling tool to obtain the information for the personas (Figure 4). The topic analysis also provided trend information for both the topics and the keywords. Additionally, we utilized a customized model for the SentiStrength sentiment analysis tool, which included emotions [43]. The designers drew on the detailed topics presented, which included sentiment and trend information.

**Figure 4.** User-constructed topic modelling-assisted thin persona.
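SentiStrength reports a dual-polarity score per text: a positive strength from 1 to 5 and a negative strength from -1 to -5. A minimal lexicon-based sketch of this scheme is shown below; the lexicon entries are illustrative placeholders, not the customized emotion model used in the study.

```python
# Illustrative strength lexicons (NOT the actual SentiStrength model):
# positive terms score 2..5, negative terms -2..-5; unknown words are neutral.
POSITIVE = {"love": 3, "great": 2, "excited": 3, "happy": 2}
NEGATIVE = {"hate": -4, "boring": -2, "angry": -3, "sad": -2}

def dual_sentiment(text):
    """Return (positive strength, negative strength) in SentiStrength's
    dual-scale style: positive in 1..5, negative in -1..-5."""
    words = text.lower().split()
    pos = max(POSITIVE.get(w, 1) for w in words)   # floor of 1 (neutral)
    neg = min(NEGATIVE.get(w, -1) for w in words)  # ceiling of -1 (neutral)
    return pos, neg
```

Reporting both polarities separately, rather than a single net score, lets a designer see that a keyword attracts strong positive and strong negative reactions at the same time.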

Group B participants used the output of the analysis tools as a guide for validating and selecting the most prominent information to use in their persona construction (Figure 5). All sections of the persona were selectable, and the data options were editable. We observed that the designers edited the age groups based on social media data, selected the number of keywords and validated their sentiment using at least two sources of information. Picture selection was a necessary step, since variations of the same picture would otherwise have led to a few pictures dominating the selection. The users reported that the Group B personas exhibited a much greater variety of pictures.

**Figure 5.** User-constructed meta-data-assisted thin persona.
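The two-source validation that Group B performed manually can be sketched as a simple agreement check: a keyword's sentiment is kept only if at least two sources assign it the same polarity. The function name and sample data are hypothetical, chosen only to illustrate the rule.

```python
def validated_keywords(sentiment_by_source, min_sources=2):
    """Keep keywords whose sentiment polarity agrees across at least
    min_sources sources.  sentiment_by_source maps each source name to
    a {keyword: polarity} dict, with polarity in {-1, 0, 1}."""
    votes = {}
    for source, scores in sentiment_by_source.items():
        for keyword, polarity in scores.items():
            votes.setdefault(keyword, []).append(polarity)
    return {kw: ps[0] for kw, ps in votes.items()
            if len(ps) >= min_sources and len(set(ps)) == 1}

# Hypothetical example: "lineup" agrees across both sources, "parking" does not.
sources = {
    "twitter": {"lineup": 1, "parking": -1},
    "forum":   {"lineup": 1, "parking": 0},
}
```

Disagreeing keywords are dropped rather than averaged, mirroring the conservative choice of only carrying cross-validated information into a persona.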

Figure 6 depicts the average user evaluation responses per perceived persona aspect on a standardized Likert scale of 1–5.

**Figure 6.** Persona perception user study results.
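The per-aspect averages behind Figure 6 amount to grouping the 1–5 Likert responses by perceived persona aspect and taking the mean. A minimal sketch, with made-up response values purely for illustration:

```python
from statistics import mean

def aspect_means(responses):
    """Average 1-5 Likert responses per perceived persona aspect.
    responses: one {aspect: score} dict per completed evaluation."""
    scores_by_aspect = {}
    for response in responses:
        for aspect, score in response.items():
            scores_by_aspect.setdefault(aspect, []).append(score)
    return {aspect: round(mean(scores), 2)
            for aspect, scores in scores_by_aspect.items()}

# Illustrative input: two evaluations covering three of the scale's aspects.
responses = [
    {"credibility": 4, "clarity": 5, "empathy": 3},
    {"credibility": 5, "clarity": 4, "empathy": 4},
]
```

Collecting scores per aspect first (rather than per evaluator) handles the fact that each persona is rated by several different respondents.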

The Group A designers, who utilized the advanced topic information, scored higher on credibility, clarity and consistency, while Group B scored higher on completeness and empathy. This can be explained by the fact that Group A participants utilized already curated and streamlined topic information from the same source, while Group B participants accessed multiple sources that also differed per participant. The use of the same source information led to clearer and more consistent results that were also perceived as more credible. On the other hand, the use of multiple sources led to a greater number of constructed personas, some of which contained more personal (or less generic) bits of information and were therefore perceived by the evaluators as more complete as a whole, as well as more empathetic.

The two groups scored similarly on familiarity and liking, showing that the topic modelling accurately reflected familiar terms and descriptions, and that both approaches resulted in personas that the respondents liked. Personas constructed by Group B were perceived as marginally friendlier, likely also a result of the use of more personal bits of information.

The major findings of this study concern the responses on interpersonal attraction (how attractive the participants found the personas) and similarity (how similar the participants perceived themselves to be to the personas). The personas constructed by Group A participants scored much higher on both of these aspects. This was neither expected nor hypothesized before the study. One possible explanation is that the clarity and consistency of the personas constructed using the topic modelling knowledge made the users feel more attracted and similar to them. Another possible explanation, mentioned by the evaluators during the post-study discussion, is that topic modelling clustered the data into more abstract notions, thereby flattening possible extreme or outlier data that could lead to unattractive personas. Even if the number of such personas were very low, they might still affect the evaluators' perceived attractiveness and feeling of similarity.

The participants also self-reported their acceptance of, and confidence in, the personas they created. The rationale behind this metric derives from user experience evaluation, where designs are evaluated by the end users and the designers use that feedback to reflect on their designs. In our case, the participants evaluated not only other personas but also their own, on the basis of their willingness to use their personas themselves and their confidence in that response. Figure 7 shows that the participants of Group B reported a much higher acceptance with similarly high confidence. The participants of Group A reported a high acceptance of their own designs, although on average lower than that of the other group, with very high confidence. Based on the literature, this is an expected result, justified by the fact that the Group B participants were fully responsible for the data collection, analysis and persona design. Thus, they were confident that they had done their best to design personas that they would use themselves. On the other hand, the participants of Group A used the already analyzed data to the best of their abilities and were confident that they had produced very good results; however, they could not be sure that the data at hand provided maximum coverage of the requirements.

**Figure 7.** Participants' self-reported acceptance and confidence about their constructed personas.

#### **7. Conclusions and Future Work**

This paper presented a human study that aimed to examine the effect of big data utilization on persona construction. It followed the rationale, derived from earlier works, that automatically data-generated personas cannot fully replace the designer's immersion in the actual data during persona creation.

The findings showed that deep analysis and the use of data analytics, such as topic modelling, can lead to personas that are perceived as clear, consistent and complete. Furthermore, such persona designs are perceived as very appealing, and users feel quite similar to the resulting personas. This approach requires much less effort than traditional human-directed data analysis and may be especially helpful for limited-scope personas, such as those for music events, thematic museums (e.g., a war museum), or educational and medical applications.

Based on the findings of this work, an optimal approach to persona construction using big data analytics could be a combination of the two approaches examined, or even a three-way combination of automatic data-generated personas, data analytics and manual analysis for refinement. This work has basic limitations with regard to the target of the persona construction, namely a music event. This limited-scope domain was selected to demonstrate how data analytics may reveal aspects not easily discovered by designers, while also keeping the data and the scope of the experiments small enough to make an extensive human study possible.

To evaluate the findings of this work against purely automatic data-generated personas, a comparative evaluation including personas generated automatically with approaches such as that of Salminen et al. [46] would be required. However, this was not applicable to our experiments because of the shift in focus and the additional effort it would require from the participants, as well as the complexity that such a non-standard evaluation across the three approaches (automatic data generation, topic modelling, user analysis) would introduce. Moreover, existing works already compare fully automatically generated personas with traditional ones, and their results have been discussed in this paper [47].

This work has limitations bound by the tools, the data and the users. The tools and their use are a matter of the designers' personal expertise. The data used are also a designer's choice (in this case, a cultural event), as are the sources. The tools were selected for their ease of use, since the users were familiar with them. As a process, the persona construction would not have been affected by using different or additional tools; however, the content and the users' decisions could have been. For example, tweaking the LDA parameters or adding data to the analysis would possibly yield different results, and the users would have to work with those as their choices. However, an automatic persona construction would also be affected by such parameters. The same core data sources were used for all user groups in order to retain comparative fidelity.

To monitor changes, topic change information may be displayed, such as trending topics, a timeline view and sentiment-based clustering. This would allow designers to edit or amend their personas to account for major shifts. Specific situations, such as the recent global COVID-19 pandemic, may lead to specific design thinking considerations (e.g., for quarantine or online user experiences), introducing new potential users and content delivery methods that require rapid adaptation of the user design. Persona design and updating would be key to such rapid adaptation to new situations and emerging requirements, by utilizing the change information from the main differences between the personas.

For future work, we plan to include an analysis of textual information from existing social network users for the automatic adaptation of existing personas with regard to their content description, towards fully fledged persona construction [48,49]. Additionally, multiple sources, such as Facebook, could be utilized for automatic enrichment, since users reported that they found interesting information there and expected it to be a valid resource for cultural event-based user content.

**Author Contributions:** D.S., D.M. and C.V. conceived, designed and performed the experiments, analyzed the data and wrote the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.
