#### *5.4. Cluster Distance*

As ground truth is unavailable for the processed dataset, the following quality metrics were used. They rely solely on the results of the experiments and do not require tuning or hyperparameters.


Table 12 shows the values of $\bar{d}$, $d^{*}$, and $\bar{D}$ for each clustering scheme, each normalized to its respective minimum. Relative indicators were preferred because they shed more light on the comparative performance of the algorithms.

**Table 12.** Clustering metrics for each scheme (Source: Authors).


Let $\mathbb{C}\_i$ and $\mathbb{C}\_j$ be the $i$th and $j$th clusters and $\mathbb{C}\_0$ the total number of available clusters. Then,

$$d\_{i,j} \stackrel{\triangle}{=} \frac{1}{|\mathbb{C}\_i| |\mathbb{C}\_j|} \sum\_{\mathbf{w} \in \mathbb{C}\_i} \sum\_{\mathbf{w}' \in \mathbb{C}\_j} d\left(\mathbf{w}, \mathbf{w}'\right) \tag{15}$$

In (15), $|\mathbb{C}\_i|$ and $|\mathbb{C}\_j|$ are the numbers of data points in clusters $\mathbb{C}\_i$ and $\mathbb{C}\_j$, respectively. Then,

$$\bar{d} \stackrel{\triangle}{=} \frac{1}{\binom{\mathbb{C}\_0}{2}} \sum\_{(i,j)} d\_{i,j} = \frac{2}{\mathbb{C}\_0(\mathbb{C}\_0 - 1)} \sum\_{(i,j)} d\_{i,j} \tag{16}$$

Along a similar line of reasoning, $d^{*}$ is the maximum over all pairwise distances $d\_{i,j}$:

$$d^\* \stackrel{\triangle}{=} \max\_{(i,j)} \left[d\_{i,j}\right] \tag{17}$$
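The inter-cluster metrics of Equations (15)–(17) can be sketched in Python as follows, assuming the Euclidean distance for $d(\mathbf{w}, \mathbf{w}')$; the function name and the list-of-clusters representation are illustrative, not part of the original formulation:

```python
from itertools import combinations
from math import dist  # Euclidean distance assumed for d(w, w')

def inter_cluster_metrics(clusters):
    """clusters: one list of points per cluster (C_0 clusters in total)."""
    d_pair = {}
    for i, j in combinations(range(len(clusters)), 2):
        # Eq. (15): average distance over all cross-cluster point pairs
        d_pair[(i, j)] = sum(dist(w, w2)
                             for w in clusters[i]
                             for w2 in clusters[j]) / (len(clusters[i]) * len(clusters[j]))
    d_bar = sum(d_pair.values()) / len(d_pair)  # Eq. (16): mean over C_0-choose-2 pairs
    d_star = max(d_pair.values())               # Eq. (17): maximum pairwise distance
    return d_pair, d_bar, d_star
```

For instance, for the three singleton clusters $\{(0,0)\}$, $\{(1,0)\}$, and $\{(0,2)\}$, the pairwise distances are $1$, $2$, and $\sqrt{5}$, so $\bar{d} = (3 + \sqrt{5})/3$ and $d^{*} = \sqrt{5}$.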

Another figure of merit is cluster compactness, measured by the average distance $\bar{D}$ between any two points of the same cluster. The intra-cluster distance $D\_i$ for $\mathbb{C}\_i$ is defined as in Equation (18):

$$D\_i \stackrel{\triangle}{=} \frac{1}{|\mathbb{C}\_i| \left( |\mathbb{C}\_i| - 1 \right)} \sum\_{(\mathbf{w}, \mathbf{w}') \in \mathbb{C}\_i} d\left(\mathbf{w}, \mathbf{w}'\right) \tag{18}$$

Metric $\bar{D}$ is obtained by averaging the intra-cluster distances as in Equation (19):

$$
\bar{D} \stackrel{\triangle}{=} \frac{1}{\mathbb{C}\_0} \sum\_{i=1}^{\mathbb{C}\_0} D\_i \tag{19}
$$
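Equations (18) and (19) admit a similarly compact sketch, again assuming the Euclidean distance; the sum in Equation (18) runs over ordered pairs of distinct points, which matches the $|\mathbb{C}\_i|(|\mathbb{C}\_i| - 1)$ normalization:

```python
from itertools import permutations
from math import dist  # Euclidean distance assumed for d(w, w')

def intra_cluster_metrics(clusters):
    """Return the list of D_i (Eq. 18) and their average D_bar (Eq. 19)."""
    D = []
    for c in clusters:
        n = len(c)
        # Eq. (18): mean over all ordered pairs of distinct points in the cluster
        D.append(sum(dist(w, w2) for w, w2 in permutations(c, 2)) / (n * (n - 1)))
    return D, sum(D) / len(D)  # Eq. (19)
```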

From the entries of Table 12, the following observations can be made:


#### *5.5. Player Type Distribution*

Once profile clustering is complete, two player type distributions are of interest:


From the entries of Table 13, the following can be said:



**Table 13.** Player type distributions (Source: Authors).

#### *5.6. Discussion*

The results obtained earlier agree with those reported elsewhere in the recent scientific bibliography. In particular, the combination of low- and high-level player attributes to improve player experience has been proposed [19]. Moreover, in [14], it is maintained that players fit the Bartle taxonomy more easily when their collective behavior is taken into consideration. The need for an advanced player clustering scheme is highlighted in [17]. Tensor clustering fulfills these requirements.

Based on the experiment results, the inclusion of user annotations in the clustering scheme seems to make a difference both in clustering quality and in scalability. In addition, it should be highlighted that the tensor representation of player profiles and the user annotations are two different factors, each contributing in its own way to clustering. The results presented here are consistent with the findings of Yang *et al.* [49], where high-quality labeling rules for reducing annotation cost were derived through crowdsourcing. In [50], these rules are augmented with ones mined with ML. Large-scale data annotation is indispensable for other types of games, including serious games [51]. Annotations have also been proposed as a supplementary mechanism for understanding player actions in affective games [52]. Along the same lines, games intended for cultural preservation may well benefit from a dedicated module for collecting and processing annotations, such as the one described in [53]. Since human activity patterns are to be extracted from gaming activity, it makes perfect sense for mining algorithms to rely on human assistance, even partial assistance in the form of annotations.

Regarding the role of user annotations in general, when utilized properly they can be instrumental in disambiguating player activity [52]. As such, they can significantly boost the performance of mining algorithms by providing the initial information and ontological structure to start from. The primary reasons are listed below [17]:


As a specific example of the above, we plan to incorporate the strategy proposed in this work into the gamification module of the ANTIKLEIA project in order to exploit user annotations. Through a suite of open markup technologies, such as the extensible markup language (XML), the resource description framework (RDF), and JavaScript Object Notation (JSON), as well as open metadata standards such as *Dublin Core*, annotations can be delivered as ground truth data from the dedicated user interface (UI) component to the data management one. Metadata have been known to boost the performance of mining algorithms in terms of accuracy and robustness [54].
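As an illustration of the above, an annotation record could be serialized as a JSON document with Dublin Core-style keys; the field names below are hypothetical and do not reflect the actual ANTIKLEIA schema:

```python
import json

# Hypothetical annotation record; the field names are illustrative only.
# Dublin Core-style keys carry the descriptive metadata from the UI
# component to the data management one.
annotation = {
    "dc:identifier": "annot-0001",
    "dc:creator": "player-42",
    "dc:date": "2021-05-01T12:00:00Z",
    "dc:subject": "loot box",
    "body": {"emotion": "joy", "polarity": 1},
}

doc = json.dumps(annotation, sort_keys=True)  # JSON document for local storage
restored = json.loads(doc)                    # round-trips without loss
```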

In this framework, Figure 4 depicts how annotations fit into the software architecture of ANTIKLEIA. Once the raw annotations are extracted, they are first stored locally as JSON documents. The underlying storage could very well be a document database, such as MongoDB, but a relational database is suitable as well. Then, the following analysis takes place in parallel:


**Figure 4.** Annotations and the architecture of project ANTIKLEIA (Source: Authors).

Besides game designers, the above analytics may be of use to cultural enthusiasts, cultural professionals, and independent developers alike, as further described in the ANTIKLEIA use cases [3]. User annotations can even contribute to the data-driven construction of massive ontologies for cultural items [55]. The latter can reveal how the player base sees these items based on diverse criteria that clearly go beyond the scope of the present work.

#### **6. Recommendations**

With knowledge of the particular player base composition, certain recommendations can be stated for game designers and practitioners, with the explicit aim of keeping player interest unabated from an affective perspective. It is a long-running tenet of the gaming industry that a successful game should be played more than once [11]. Several criteria for this purpose have been proposed in the literature, with engagement, replayability, and immersion among them [14,20]. The strategy in this work is to maximize engagement, which will be achieved based on the following data:


The analysis will be based on two complementary factors. First, emphasis will be placed on those gaming elements which attract the majority of player types. Conversely, elements eliciting negative emotions from the majority of players will in general be ignored. To this end, the following analysis is used: each basic emotion of Figure 2 is assigned the value ±1 depending on its polarity, whereas the neutral state is assigned the value zero. For each element, the score of Equation (20) is computed:

$$
\sigma\_i \stackrel{\triangle}{=} \sum\_t e\_{i,t} p\_t \tag{20}
$$

In Equation (20), $e\_{i,t}$ is the statistical emotional response of players of type $t$ to the $i$th game element, $p\_t$ is the proportion of type $t$ in the player base, and $t$ ranges over the four player types. Given the above, Table 14 is generated for the elements used in the low- and high-level player profiles in Tables 3 and 5, respectively. This analysis can be conducted for the remaining elements of Table 2 or any other game elements for that matter.
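The score of Equation (20) reduces to a weighted sum, as the following sketch shows; the polarity and proportion values are illustrative, not measured data from the experiments:

```python
def element_score(polarity, p):
    """Eq. (20): sigma_i = sum over t of e_{i,t} * p_t.

    polarity[t] is the emotional response of player type t to the element,
    mapped to +1, -1, or 0; p[t] is the share of type t in the player base.
    """
    return sum(e * pt for e, pt in zip(polarity, p))

# Hypothetical example with four player types: three respond
# positively to an element, one negatively.
score = element_score([1, 1, -1, 1], [0.4, 0.3, 0.2, 0.1])
# 0.4 + 0.3 - 0.2 + 0.1 = 0.6, i.e., a net positive score
```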


**Table 14.** Scores for the game elements (Source: Authors).

Given the values of Table 14, for this specific player base, priority should be given to widening the in-game world, creating alternative timelines, and including loot boxes. In addition, tournaments should be cooperative instead of competitive. By contrast, writeable objects and the inclusion of more NPCs will offer little to the in-game experience, as they attract only a small number of players.
