#### *3.2. Player Types and Game Elements*

The relationship between the player types and certain low- and high-level game elements is critical to the success of a cultural game. Both element categories, taken from the literature, are shown in Table 2 and explained here. Each element belongs to one of three primary axes, namely world immersion, reward system, and engagement loop, which are affective mechanisms closely related to learning [9].

A very common player engagement system applying to all player types is the triplet of points, badges, and leaderboards (PBL). This system can be broken down to its constituent parts as follows:


Loot boxes, namely stashes with special rewards, are very common in cultural gaming. Easter eggs, namely humorous references to other games or real events, are also cases of special interest. Their use is perhaps best showcased in the film set in the Ready Player One universe [38]. Moreover, one-time items, typically available during holidays or anniversaries, increase player interest and many have been reported to be sold in both in-game and out-of-game auctions [17]. Such rewards are valuable for explorers, achievers, and conditionally for killers if they contain special equipment [16]. Writable objects such as mirrors and books are new item types where players may write something visible to others, thus acting as a local in-game chat and as a focal point for explorers and socializers.


**Table 2.** Emotions triggered by low- (upper half) and high-level (lower half) elements (Source: See text).

In-game events such as auctions and tournaments are also an integral part of storytelling in cultural games attracting mainly explorers, achievers, and socializers [14]. In auctions literally everything can be sold. Prime examples include classical, renaissance, or Victorian monuments, inscriptions, paintings, statues, and decorated columns, as well as handcraft objects including jewellery, vessels, books, and ordinary household belongings. Tournaments may also be held on a regular or sporadic basis. There, killers have plenty of opportunities to compete with each other and socializers to cooperate [9]. Carefully designed tournaments may well also be special cases of an inducement prize contest, benefiting ultimately the player, typically an achiever or killer, giving the prize.

In an open world, player characters are free to roam in a huge digital world, which is particularly appealing to explorers and killers [9]. An expanding world is in fact a strong motive for players not only to keep playing, but also to return once they have completed the game. Depending on its theme, a cultural game may well have secret rooms, namely bonus areas filled with rewards.

An open universe allows the plot to be expanded over a number of worlds or a character to be developed across various game installments. Perhaps the most well-known example is the Wing Commander space opera series [39], which is famed for its original and effective immersion mechanisms. It was possible to transfer the same player character across six game installments [40,41], allowing players to rack up an impressive score and resulting in a more continuous and coherent story. Such a universe is appealing to all four fundamental player types for different reasons.

Despite coming from the digital realm, a cultural game may have extensions to the outside world. This can be carried out in many ways including scanning quick response (QR) codes, taking pictures, or recording street noise. Moreover, tangible rewards such as material badges are strong motives for players, especially for achievers and explorers. More recently, augmented reality (AR), virtual reality (VR), and haptic systems bridge the gap between human senses and the game world, making games even more immersive. Explorers and socializers are typically fond of these types of features [6].

The role of artificial intelligence (AI) or non-playable characters (NPCs) is instrumental in most in-game worlds. Interacting with NPCs adds ways explorers can learn about the game and well designed NPCs may well capture the attention of socializers, especially if they have advanced AI. Moreover, they may provide critical hints to achievers and serve as cannon fodder for killers. A significant design principle is that of the uncanny valley stating that NPCs should look convincingly human in order to be realistic but not *too* human as in this case they may look disturbing [42].

Storytelling techniques are essential to the game world evolution and all elements must have a place in that story, especially since it influences all four player types [9,14]. The classical Aristotelian structure is considered appropriate when simplicity is sought or when a clear message is to be given. In contrast, the in medias res storytelling of the Homeric sagas is considered suitable for games with open-ended worlds or many installments with prequels and sequels. Other techniques such as traditional Japanese storytelling [43] may be instrumental in games with experimental or specialized mechanics. Occasionally, alternate timelines or crossovers may make a linear story more interesting.

The affective reaction of the four player categories to these elements is important for cultural game mechanics if player interest is to be stimulated. Said reaction will be examined in terms of the emotion wheel model of Figure 2. The basic emotions under this model are anger, anticipation, joy, trust, fear, surprise, sadness, disgust, and the neutral emotional state.

**Figure 2.** Emotion wheel (Source: Wikipedia).

Table 2 contains the emotions the most common gaming elements are most likely to trigger in the four player categories. Note that this table has been compiled under a statistical approach from reports and works (e.g., [14,16,44]). The upper half of the table has the low-level attributes, whereas the lower half has the high-level ones. Observe the rich variability of emotional reactions.

#### *3.3. Low Level Attributes*

Low-level attributes examined in this work pertain primarily to the way players interact with in-game items. Since they do not involve interaction with other players or with the broader in-game world, they are classified as low-level in the sense that no strategic thinking has to take place prior to or during interaction. Nonetheless, these kinds of features are indicative of player psychology as they relate to unconscious and almost instinctive decisions or decisions with minor cognitive effort [6]. These features have been selected based on literature recommendations [6,15,45] and are shown in Table 3.


**Table 3.** Elements and mnemonic names for the low-level player profile (Source: [6,15,45]).

The low-level player profile for the *i*th player is the numerical vector **p**<sub>*L*</sub><sup>(*i*)</sup> of length *Q*<sub>0</sub> = 8 with the structure of Equation (1). The latter also contains symbolic names for clarity and readability in Algorithms 1 and 2. The proposed methodology can be extended to any number of attributes.

$$\mathbf{p}\_{L}^{(i)} \stackrel{\triangle}{=} \begin{bmatrix} p\_{L}^{(i)}[1], & \dots, & p\_{L}^{(i)}[Q\_{0}] \end{bmatrix}^{T} \in V\_{0}^{Q\_{0} \times 1} \tag{1}$$

In Equation (1), the value set *V*<sub>0</sub> is {1/5, 2/5, 3/5, 4/5, 1}. Because of this structure, the symbolic five-star scale of Table 4 is used throughout the text to make it more legible and intuitive. To enhance legibility, each symbolic name consists of a single word without modifiers.

**Table 4.** Symbolic names for the numerical attribute scale (Source: Authors).


These symbolic values are also linearly ordered, not only because they represent actual numerical values where comparisons make perfect sense, but also because by their very context they represent distinct ranking levels. Thus, for instance, Weak < High and Medium > Low are valid comparisons.

The values of *V*<sub>0</sub> have been chosen for the following reasons:


Notice that the number of attributes *Q*<sup>0</sup> in this work is the number of low-level elements found in the recent scientific literature. This number by no means limits the generality of the proposed methodology. In fact, the only actual hard constraint regarding how many attributes may be used is that the low- and high-level player profiles have the same number of components.

The similarity metric between two low-level profiles is the Gaussian kernel of Equation (2):

$$l\left(\mathbf{p}\_{L}^{(i)},\mathbf{p}\_{L}^{(j)}\right) \stackrel{\triangle}{=} l(i,j) \stackrel{\triangle}{=} \exp\left(-\frac{\left\|\mathbf{p}\_{L}^{(i)} - \mathbf{p}\_{L}^{(j)}\right\|\_{2}^{2}}{2Q\_{0}}\right) \tag{2}$$
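The kernel of Equation (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function name `gaussian_similarity` and the two example profiles are hypothetical, and *Q*<sub>0</sub> is simply taken to be the profile length.

```python
import math

def gaussian_similarity(p, q):
    """Equation (2): exp(-||p - q||_2^2 / (2 * Q0)) for two equal-length profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    q0 = len(p)  # Q0 is taken as the number of attributes
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2 * q0))

# Hypothetical profiles with values drawn from V0 = {1/5, 2/5, 3/5, 4/5, 1}.
p_i = [0.2, 0.4, 0.6, 0.8, 1.0, 0.2, 0.4, 0.6]
p_j = [0.4, 0.4, 0.6, 0.6, 1.0, 0.2, 0.8, 0.6]

print(gaussian_similarity(p_i, p_i))  # identical profiles: maximum value
print(gaussian_similarity(p_i, p_j))  # differing profiles: strictly smaller
```

Identical profiles attain the maximum value of one and the value decays as profiles move apart, so the quantity behaves as a similarity.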

#### *3.4. High Level Attributes*

High-level game attributes pertain to player actions with higher semantic content as well as with conscious decisions. They mainly involve interaction with other players, strategic decisions, and dealing with the in-game world. As with the previous case, the high-level profile of the *i*th player is a numerical vector **p**<sub>*H*</sub><sup>(*i*)</sup> with the structure of Equation (3) and with the same length as **p**<sub>*L*</sub><sup>(*i*)</sup>:

$$\mathbf{p}\_{H}^{(i)} \stackrel{\triangle}{=} \begin{bmatrix} p\_{H}^{(i)}[1], & \dots, & p\_{H}^{(i)}[Q\_{0}] \end{bmatrix}^{T} \in V\_{0}^{Q\_{0} \times 1} \tag{3}$$

The components of each profile vector **p**<sub>*H*</sub><sup>(*i*)</sup> along with the respective mnemonic names, added for enhanced readability of Algorithm 2, are shown in Table 5. Observe that the high-level attributes can add more semantic context since they directly reflect decisions which require at least some cognitive effort and they can be described with annotations or a restricted vocabulary as explained in [46]. Moreover, they are indicative of the social intelligence of the players, especially in the way they choose to compete against or cooperate with each other [47]. For consistency, the same quantile-based scheme of the previous case and the same five-star scale are used to obtain the respective values for each player.

The similarity metric between two high level profiles is the Gaussian kernel of Equation (4):

$$h\left(\mathbf{p}\_H^{(i)}, \mathbf{p}\_H^{(j)}\right) \stackrel{\triangle}{=} h(i, j) \stackrel{\triangle}{=} \exp\left(-\frac{\left\|\mathbf{p}\_H^{(i)} - \mathbf{p}\_H^{(j)}\right\|\_2^2}{2Q\_0}\right) \tag{4}$$



#### *3.5. First- and Higher-Order Player Profiles*

From the attributes described earlier it is possible to classify players according to the Bartle taxonomy. This is a first-order player classification as it is based solely on the features of a single player. Although this approach certainly has merit, as it is based on ground truth deriving directly from player activity, higher-order methodologies tend to systematically yield more robust player classifications, as they aggregate not only local ground truth states but also information implicitly encoded in similarity matrices constructed from pairwise profile similarity metrics such as those of Equations (2) and (4). Among the reasons favoring higher-order approaches are the following [14,15]:


The mapping of profiles to the Bartle taxonomy types takes two different forms depending on whether only low-level attributes are available. This is necessitated by the form of some of the higher-order methods. When only the low-level attributes are available, Algorithm 1 applies.

Notice that in both Algorithms 1 and 2 the inequality symbols ≥ and ≤ can be applied to the symbolic values of Table 4 reserved for the profile attributes. In these cases, they are to be interpreted taking into consideration the linear order of the symbolic values. Therefore, for instance, the inequality points ≥ High is true when points has the values Strong or High and it is false otherwise.

Algorithm 1 relies on a number of observations about player activity as reflected in the low-level attributes. Achievers care for points and badges, which eventually will bring them to a prominent position in the leaderboard [14]. Explorers seek to find loot boxes, one-time boxes, secret rooms, or Easter eggs with the latter three being more important. This implies that explorers may well accumulate significant points in the process, so points alone cannot distinguish between these player types [14,44]. Socializers can be recognized by their extended use of writeable objects in order to communicate with other players as well as their drive to find secret rooms with the hope of finding more players there. These factors may result in an increased number of badges related to player activity [15]. Killers are less easy to define through a set of rigid rules since their objective may be simple, but in order to achieve it they frequently resort to activities which are also common in the other player types. For instance, they may seek secret rooms as socializers and explorers do, they may collect badges similar to achievers and socializers, they want one-time items as explorers do, and they share with achievers the excitement for points as well as for a prominent leaderboard position [6].

**Algorithm 1** Mapping of low-level attributes to Bartle taxonomy.

**Require:** Low-level profile as described in Equation (1).

**Ensure:** A first-order mapping to one of the four player types of the Bartle taxonomy.

**if** [points ≥ High] **and** [at least one of badges, points ≥ High] **then**

&emsp;type = achiever

**else if** [loot ≥ High] **and** [at least two of rooms, loot, eggs ≥ High] **then**

&emsp;type = explorer

**else if** [writeable ≥ High] **and** [at least one of badges, rooms ≥ Medium] **then**

&emsp;type = socializer

**else**

&emsp;type = killer

**end if**

**return** type
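The rules of Algorithm 1 translate almost directly into code once the symbolic scale is given an order. The sketch below is illustrative only. It assumes the ordering Weak < Low < Medium < High < Strong for the names of Table 4, and the enum `Level`, the helper `count_at_least`, the function `bartle_type_low`, and the sample profiles are hypothetical names introduced here.

```python
from enum import IntEnum

class Level(IntEnum):
    # Assumed linear order of the symbolic names (1/5 up to 1).
    WEAK = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    STRONG = 5

def count_at_least(profile, names, level):
    """Number of the named attributes whose value is at least `level`."""
    return sum(1 for name in names if profile[name] >= level)

def bartle_type_low(profile):
    """First-order mapping following the rule order of Algorithm 1."""
    if (profile["points"] >= Level.HIGH
            and count_at_least(profile, ("badges", "points"), Level.HIGH) >= 1):
        return "achiever"
    if (profile["loot"] >= Level.HIGH
            and count_at_least(profile, ("rooms", "loot", "eggs"), Level.HIGH) >= 2):
        return "explorer"
    if (profile["writeable"] >= Level.HIGH
            and count_at_least(profile, ("badges", "rooms"), Level.MEDIUM) >= 1):
        return "socializer"
    return "killer"  # default branch, as in the algorithm

# Hypothetical players; only the attributes used by the rules are listed.
roamer = {"points": Level.LOW, "badges": Level.MEDIUM, "loot": Level.HIGH,
          "rooms": Level.STRONG, "eggs": Level.LOW, "writeable": Level.WEAK}
chatter = {"points": Level.LOW, "badges": Level.MEDIUM, "loot": Level.LOW,
           "rooms": Level.LOW, "eggs": Level.LOW, "writeable": Level.STRONG}

print(bartle_type_low(roamer))
print(bartle_type_low(chatter))
```

Because the rules are evaluated in order, a profile satisfying several conditions is assigned the first matching type, and the killer type acts as the fallback.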

When both low- and high-level attributes are available in player profiles, then the more complex set of rules outlined in Algorithm 2 is used instead for the mapping to the Bartle taxonomy.

**Algorithm 2** Mapping of high-level attributes to Bartle taxonomy.

**Require:** High-level profiles as in Equation (3).

**Ensure:** A first-order mapping to one of the four player types of the Bartle taxonomy.

**if** [tours ≥ High] **and** [competition ≤ High **and** cooperation ≥ Weak] **then**

&emsp;type = achiever

**else if** [at least one of timelines, crossovers, world ≥ High] **and** [tours ≥ Weak **or** npcs ≥ Medium] **then**

&emsp;type = explorer

**else if** [at least one of players, npcs ≥ High] **and** [cooperation ≥ Medium] **then**

&emsp;type = socializer

**else**

&emsp;type = killer

**end if**

**return** type

Algorithm 2 connects high-level attributes to player types. Achievers are more likely to participate in in-game tournaments in order to accomplish game-wide objectives, possibly with the help of other players [8]. Explorers tend to investigate both the in-game world and its extensions such as crossovers and alternative timelines. In addition, they may occasionally take part in tournaments in order to find one-time boxes, loot items, or other prizes [14]. Socializers interact with other players or NPCs, and they may tend to help others in in-game events [7]. As in the previous case, killers are difficult to discern from other players as killers may join in-game events like achievers. Some killers search the game world as explorers do. In addition, both killers and socializers interact with other players and NPCs.

Higher-order methods may well use first-order profile classifications as starting points since the latter are obtained from ground truth data. Depending on the data representations, options include vector clustering, string matching, decision trees, or graph matching. Since profiles are represented here as numerical vectors, clustering techniques are more appropriate. In particular, iterative schemes derived from a template Simon–Ando scheme were selected and tested, as explained below.

#### **4. Proposed Clustering Methodology**

#### *4.1. Template Simon–Ando Clustering*

The algorithmic cornerstone for clustering player profiles is the Simon–Ando iterative scheme, which is based on the power method. The latter estimates the primary eigenvector **g** of a matrix **M** ∈ ℝ<sup>*n*×*n*</sup> through a matrix-vector computation and normalization cycle, as shown in Algorithm 3.

#### **Algorithm 3** The power method.


The main parameter of the power iteration is the initialization of vector **g**[0]. In the general case, initialization takes place with random elements or with information extracted from matrix **M** itself. In the former case, it is advisable that Algorithm 3 be executed many times as a random vector may not contain a component along the direction of **g**, especially when the dimension of the column space of **M** is large.
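The multiply-and-normalize cycle of the power method can be illustrated with the textbook formulation below. It is a sketch under stated assumptions and may differ in detail from Algorithm 3: the Euclidean norm, the stopping rule based on the largest elementwise change, and the names `power_method`, `tol`, and `max_iter` are all choices made here for illustration.

```python
import math

def mat_vec(m, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def power_method(m, g0, tol=1e-10, max_iter=1000):
    """Estimate the primary eigenvector of m starting from g0."""
    norm = math.sqrt(sum(x * x for x in g0))
    g = [x / norm for x in g0]
    for _ in range(max_iter):
        y = mat_vec(m, g)                       # matrix-vector computation
        norm = math.sqrt(sum(x * x for x in y))
        if norm == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        g_next = [x / norm for x in y]          # normalization
        change = max(abs(a - b) for a, b in zip(g_next, g))
        g = g_next
        if change < tol:                        # assumed stopping rule
            break
    return g

# Small symmetric example whose primary eigenvector is proportional to [1, 1].
m = [[2.0, 1.0], [1.0, 2.0]]
g = power_method(m, [1.0, 0.0])
print(g)
```

The deterministic start `[1.0, 0.0]` is used only to keep the example reproducible; with a random start the run would typically be repeated, as advised above.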

The Simon–Ando iteration is similar to the power method with one major difference: it terminates when the elements of **g**[*k*] are clustered. To this end, a different termination criterion *τ*<sup>0</sup> is necessary. Analysis of the power method indicates that it undergoes the stages described below [45,48]:


Algorithm 4 is the template Simon–Ando scheme from which the three iterative schemes are derived. It is a matrix-free algorithm and multiplications can be seen as passing the vector **g**[*k*] to a kernel *T*[·] and retrieving the result **g**[*k*+1]. In this work, selecting the particular form of *T*[·] uniquely determines the iterative scheme. Additionally, in the template **g**[*k*+1] is normalized by a generic norm ‖·‖<sub>*H*</sub> in a Hilbert space, but the specific norm depends on the selection of *T*[·] as well.

**Algorithm 4** The template Simon–Ando scheme.

**Require:** Functions of Equations (2) and (4).

**Ensure:** Players are clustered based on the basic Bartle types.

initialize vector **g**[0]; normalize **g**[0]

initialize matrices **L** and **H** as in Equation (9)

**if** annotations are used **then**

&emsp;scale matrix **H** once as **H** ← **HN**

**end if**

compute kernel *T*[·] as a function of **L** and **H**

**repeat**

&emsp;compute **g**[*k*+1] ← *T*[**g**[*k*]]; normalize **g**[*k*+1] ← **g**[*k*+1] / ‖**g**[*k*+1]‖<sub>*H*</sub>

**until** termination criterion *τ*<sub>0</sub> is **true**

**return**

The termination criterion *τ*<sup>0</sup> is a key component of Algorithm 4 since it determines clustering quality. The critical requirement for *τ*<sup>0</sup> is to detect the Simon–Ando phase before it is over. One way to achieve this is to compute the elementwise harmonic mean of the second order difference between three successive versions **g**[*k*−1] , **g**[*k*] , and **g**[*k*+1] as shown in Equation (5), assuming their length is *n*:

$$\tau\_0[k] \stackrel{\triangle}{=} \frac{n}{\sum\_{j=1}^n \frac{2}{|\mathbf{g}^{[k+1]}[j] - 2\mathbf{g}^{[k]}[j] + \mathbf{g}^{[k-1]}[j]|}} \ge \eta\_0 \tag{5}$$
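The left-hand side of Equation (5) can be evaluated as sketched below. This is an illustrative reading of the formula, not the authors' code. The function name `tau0` is hypothetical, and returning zero when some second-order difference vanishes is an assumption made here to avoid a division by zero (it is the limit of the harmonic mean in that case). The comparison against *η*<sub>0</sub> is left to the caller.

```python
def tau0(g_prev, g_curr, g_next):
    """Harmonic-mean statistic of Equation (5) over three successive iterates."""
    n = len(g_curr)
    total = 0.0
    for a, b, c in zip(g_prev, g_curr, g_next):
        second_diff = abs(c - 2.0 * b + a)   # |g[k+1][j] - 2 g[k][j] + g[k-1][j]|
        if second_diff == 0.0:
            return 0.0                       # assumption: limiting value
        total += 2.0 / second_diff           # summand of the denominator
    return n / total

# Hypothetical iterates of length n = 2.
g_prev = [0.1, 0.2]
g_curr = [0.2, 0.4]
g_next = [0.4, 0.8]
print(tau0(g_prev, g_curr, g_next))  # second differences are 0.1 and 0.2
```

Since a harmonic mean is dominated by its smallest terms, a single element that has almost stopped moving pulls the statistic down sharply.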

The harmonic mean has been selected in this work for the following reasons:


To understand why the eigenstructure of **M** appears in the Simon–Ando clustering, consider the following: Let **g**[*i*] be a fixed element of **g**. For the very small number of iterations of the Simon–Ando phase, the multiplication with **M** should at most perturb the clustering of the elements of **g**. By construction, the elements of **g**[*k*+1] are weighted linear combinations of the elements of **g**[*k*], as shown in Equation (6). It follows then that for each of the *n* elements of **g**[*k*+1] it holds that:

$$\mathbf{g}^{[k+1]}[i] = \sum\_{j=1}^{n} \mathbf{M}[i,j] \mathbf{g}^{[k]}[j], \qquad 1 \le i \le n \tag{6}$$

If some **g**[*k*][*j*] does not appear in (6), namely it is zero, then it can be added to both sides. Stacking the *n* equations and casting them in an equivalent matrix notation then leads to Equation (7):

$$
\begin{bmatrix}
\mathbf{g}^{[k+1]}[1] \\
\mathbf{g}^{[k+1]}[2] \\
\vdots \\
\mathbf{g}^{[k+1]}[n]
\end{bmatrix} = \begin{bmatrix}
\mathbf{M}[1,1] & \mathbf{M}[1,2] & \dots & \mathbf{M}[1,n] \\
\mathbf{M}[2,1] & \mathbf{M}[2,2] & \dots & \mathbf{M}[2,n] \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{M}[n,1] & \mathbf{M}[n,2] & \dots & \mathbf{M}[n,n]
\end{bmatrix} \begin{bmatrix}
\mathbf{g}^{[k]}[1] \\
\mathbf{g}^{[k]}[2] \\
\vdots \\
\mathbf{g}^{[k]}[n]
\end{bmatrix} \tag{7}
$$

Under the assumption that **g**[*k*+1] ≈ **g**[*k*] for two successive steps in the clustering phase of the Simon–Ando iteration, Equation (7) can be cast as a linear system or, equivalently, as an eigenvalue problem as in (8):

$$\mathbf{g}^{[k]} = \mathbf{M}\mathbf{g}^{[k]} \Leftrightarrow (\mathbf{M} - \mathbf{I}\_{n})\mathbf{g}^{[k]} = \mathbf{0} \tag{8}$$

Matrices **L** and **H** in (9) contain the pairwise low- and high-level similarity metrics of Equations (2) and (4), respectively:

$$\mathbf{L} \stackrel{\triangle}{=} \begin{bmatrix} 1 & l(1,2) & \dots & l(1,P\_0) \\ l(2,1) & 1 & \dots & l(2,P\_0) \\ \vdots & \vdots & \ddots & \vdots \\ l(P\_0,1) & l(P\_0,2) & \dots & 1 \end{bmatrix} \qquad \mathbf{H} \stackrel{\triangle}{=} \begin{bmatrix} 1 & h(1,2) & \dots & h(1,P\_0) \\ h(2,1) & 1 & \dots & h(2,P\_0) \\ \vdots & \vdots & \ddots & \vdots \\ h(P\_0,1) & h(P\_0,2) & \dots & 1 \end{bmatrix} \tag{9}$$

In Equation (9), *P*<sub>0</sub> is the total number of players. Because of the form of the Gaussian kernel, both matrices **L** and **H** are symmetric and their diagonal elements equal one.
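The construction of a matrix with the layout of Equation (9) can be sketched as follows. The function name `similarity_matrix` and the three toy profiles are hypothetical, and the profiles use only three attributes for brevity instead of the *Q*<sub>0</sub> = 8 of the text.

```python
import math

def similarity_matrix(profiles):
    """Pairwise Gaussian-kernel values between equal-length profile vectors."""
    n = len(profiles)
    q0 = len(profiles[0])
    s = [[1.0] * n for _ in range(n)]        # unit diagonal by construction
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            s[i][j] = s[j][i] = math.exp(-sq / (2 * q0))  # symmetric fill
    return s

# Hypothetical low-level profiles for three players.
low = [[0.2, 0.8, 1.0],
       [0.2, 0.6, 1.0],
       [1.0, 0.2, 0.4]]
L = similarity_matrix(low)
for row in L:
    print(row)
```

Only the upper triangle is computed explicitly; symmetry and the unit diagonal then hold by construction, matching the properties stated above. Matrix **H** would be obtained the same way from the high-level profiles.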

The user annotation weight matrix **N** is defined elementwise as in Equation (10). It is based on the annotations of Table 6. As a sign of the pairwise joint player activity between players *i* and *j*, the harmonic mean of the ratio of the annotation references to either player for the three annotation categories of Table 6 to the respective maximum is computed. This choice yields a real symmetric **N**.

$$\mathbf{N}[i,j] \stackrel{\triangle}{=} \min \left[ \mu\_0, \frac{3}{\frac{\sum\_{k=1}^{P\_0} \left(f\_k^1 + f\_k^2\right)}{\left(f\_i^1 + f\_j^1\right) + \left(f\_i^2 + f\_j^2\right)} + \frac{\sum\_{k=1}^{P\_0} f\_k^3}{f\_i^3 + f\_j^3} + \frac{\sum\_{k=1}^{P\_0} \left(f\_k^4 + f\_k^5\right)}{\left(f\_i^4 + f\_j^4\right) + \left(f\_i^5 + f\_j^5\right)}} \right], \quad 1 \le i, j \le P\_0 \tag{10}$$



In the above equation, *f*<sub>*i*</sub><sup>*k*</sup> is the frequency of annotation *a*<sub>*k*</sub> regarding player *i* in the dataset and *μ*<sub>0</sub> is a small positive constant. The rationale behind the selection of the particular weight of Equation (10) is that joint player activity should be high when players are of the same type since they have similar objectives and in-game behavior. Additionally, this choice allows the separate treatment of different annotation categories. Notice that annotations *a*<sub>4</sub> and *a*<sub>5</sub> refer to opposing behaviors, namely cooperation and competition in tournaments. In this case, for most players, either one of the frequencies *f*<sub>*i*</sub><sup>4</sup> and *f*<sub>*i*</sub><sup>5</sup> will be high but not both. Of course, both can be low as well. There are many ways of selecting the constant *μ*<sub>0</sub>. Since it represents the minimum amount of player interaction in the game, here it was set to be the minimum non-zero value of Equation (10).

The scheme named *matrix* is derived from the Simon–Ando template by selecting the kernel *T*[·] to be matrix **L**, whereas in the scheme named *comb* the kernel is the half-sum of matrices **L** and **H**. Therefore, the former iteration relies only on low-level player attributes, whereas the latter exploits both low- and high-level ones. If the matrix **N** is used, then it multiplies **H** from the right once before the iteration starts. This is called scheme *comb-a* (see Table 7 for an overview of the clustering methods).
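The three kernel choices and a single pass of the template loop can be sketched as below. This is one reading of the description above, not the authors' implementation. The names `build_kernel` and `step` are hypothetical, the Euclidean norm is an assumed choice for the generic norm of the template, the *comb-a* kernel is assumed to be the half-sum of **L** and **HN**, and the 2×2 matrices are toy values.

```python
import math

def mat_mul(a, b):
    """Dense product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def half_sum(a, b):
    """Elementwise (a + b) / 2."""
    return [[(x + y) / 2.0 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def build_kernel(scheme, L, H=None, N=None):
    """Kernel matrix for the schemes named matrix, comb, and comb-a."""
    if scheme == "matrix":
        return [row[:] for row in L]          # low-level attributes only
    if scheme == "comb":
        return half_sum(L, H)                 # both attribute levels
    if scheme == "comb-a":
        return half_sum(L, mat_mul(H, N))     # H scaled once as H <- HN
    raise ValueError("unknown scheme: " + scheme)

def step(kernel, g):
    """One computation and normalization cycle (Euclidean norm assumed)."""
    y = [sum(k_ij * g_j for k_ij, g_j in zip(row, g)) for row in kernel]
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y]

# Toy matrices for two players.
L = [[1.0, 0.5], [0.5, 1.0]]
H = [[1.0, 0.9], [0.9, 1.0]]
N = [[1.0, 0.5], [0.5, 1.0]]

K_comb = build_kernel("comb", L, H)
K_comb_a = build_kernel("comb-a", L, H, N)
g1 = step(K_comb, [1.0, 0.0])
print(K_comb, K_comb_a, g1)
```

Building the kernel once outside the loop mirrors the template, where the scaling by **N** and the combination of **L** and **H** both happen before the iteration starts.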


**Table 7.** Overview of iterative clustering schemes (Source: Authors).

#### *4.2. Tensor-Based Clustering*

Tensors allow similarity metrics modeling simultaneous linear dependencies between sets of variables. Although from a programming perspective a tensor is a multidimensional array indexed by an array of *p* integers, where *p* is the tensor order, the formal tensor definition is the following:

**Definition 1.** *A pth order tensor 𝒯, where p ∈ ℤ<sup>∗</sup>, is a linear mapping coupling p not necessarily distinct vector spaces 𝕊<sub>k</sub>, 1 ≤ k ≤ p. If 𝕊<sub>k</sub> = ℝ<sup>I<sub>k</sub></sup>, then 𝒯 ∈ ℝ<sup>I<sub>1</sub>×…×I<sub>p</sub></sup>.*

Tensor multiplication along the *k*th dimension 𝒢 = 𝒳 ×<sub>*k*</sub> 𝒴 between tensors 𝒳 of order *p* and 𝒴 of order *q* can occur if both tensors have the same number of entries *I*<sub>*k*</sub> in the *k*th dimension.

**Definition 2** (Tensor multiplication)**.** *The multiplication of* X *and* Y *along the kth dimension is defined as:*

$$\left(\mathcal{X}\times\_{k}\mathcal{Y}\right)\left[i\_{1},\ldots,i\_{k-1},i\_{k+1},\ldots,i\_{p},j\_{1},\ldots,j\_{k-1},j\_{k+1},\ldots,j\_{q}\right] \stackrel{\triangle}{=} \sum\_{i\_{k}=1}^{I\_{k}} \mathcal{X}\left[i\_{1},\ldots,i\_{k},\ldots,i\_{p}\right] \mathcal{Y}\left[j\_{1},\ldots,i\_{k},\ldots,j\_{q}\right] \tag{11}$$

Observe that, as a special case, a tensor-vector product 𝒳 ×<sub>*k*</sub> **v** along the *k*th dimension, where 𝒳 ∈ ℝ<sup>*I*<sub>1</sub>×…×*I*<sub>*p*</sub></sup> is a *p*th order tensor and **v** ∈ ℝ<sup>*I*<sub>*k*</sub></sup> is a vector, is defined elementwise as:

$$\left(\mathcal{X}\times\_{k}\mathbf{v}\right)\left[i\_{1},\ldots,i\_{k-1},i\_{k+1},\ldots,i\_{p}\right] \stackrel{\triangle}{=} \sum\_{i\_{k}=1}^{I\_{k}}\mathcal{X}\left[i\_{1},\ldots,i\_{k},\ldots,i\_{p}\right]\mathbf{v}[i\_{k}] \tag{12}$$

The result is a tensor of order *p* − 1. Therefore, for a third order tensor the result is a matrix.
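For a third-order tensor, the tensor-vector product of Equation (12) reduces to three nested loops. The sketch below stores the tensor as nested Python lists indexed as `tensor[i1][i2][i3]`; the function name `mode_product_vec`, the 1-based mode argument `k`, and the 2×2×2 example are hypothetical choices made for illustration.

```python
def mode_product_vec(tensor, v, k):
    """Contract a third-order tensor with vector v along mode k (1, 2, or 3)."""
    i1, i2, i3 = len(tensor), len(tensor[0]), len(tensor[0][0])
    dims = (i1, i2, i3)
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    if len(v) != dims[k - 1]:
        raise ValueError("vector length must match the size of mode k")
    if k == 1:   # result indexed by (i2, i3)
        return [[sum(tensor[a][b][c] * v[a] for a in range(i1))
                 for c in range(i3)] for b in range(i2)]
    if k == 2:   # result indexed by (i1, i3)
        return [[sum(tensor[a][b][c] * v[b] for b in range(i2))
                 for c in range(i3)] for a in range(i1)]
    # k == 3: result indexed by (i1, i2)
    return [[sum(tensor[a][b][c] * v[c] for c in range(i3))
             for b in range(i2)] for a in range(i1)]

# Toy 2 x 2 x 2 tensor.
t = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]
print(mode_product_vec(t, [1, 1], 3))  # sums over the third index
print(mode_product_vec(t, [1, 0], 1))  # selects the first slice along mode 1
```

In each case the contracted index disappears and the remaining two indices keep their relative order, so the result is a matrix as stated above.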

**Definition 3** (Frobenius norm)**.** *The Frobenius norm of a pth order tensor* X ∈ <sup>R</sup>*I*1×...×*Ip is:*

$$||\mathcal{X}||\_F \stackrel{\triangle}{=} \left(\sum\_{i\_1=1}^{I\_1} \dots \sum\_{i\_p=1}^{I\_p} \mathcal{X}\left[i\_1, \dots, i\_p\right]^2\right)^{\frac{1}{2}}\tag{13}$$
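Equation (13) can be evaluated for a tensor of any order with a short recursive helper. The sketch assumes the tensor is stored as nested Python lists of numbers, and the name `frobenius_norm` is hypothetical.

```python
import math

def frobenius_norm(x):
    """Square root of the sum of squared entries of a nested-list tensor."""
    def sum_of_squares(t):
        if isinstance(t, (int, float)):
            return float(t) ** 2
        return sum(sum_of_squares(e) for e in t)
    return math.sqrt(sum_of_squares(x))

print(frobenius_norm([[3, 4]]))                                # matrix, order two
print(frobenius_norm([[[1, 2], [2, 0]], [[0, 0], [0, 4]]]))    # order three
```

For a vector the same helper returns the Euclidean norm, and for a matrix it returns the usual matrix Frobenius norm.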

Given the above, the iterations named *tensor* and *tensor-a* in the experiments are built around the third-order tensor 𝒯 ∈ ℝ<sup>*P*<sub>0</sub>×*P*<sub>0</sub>×*L*<sub>0</sub></sup>, where *L*<sub>0</sub> is the number of available attribute levels, namely two in this case. If more levels were available, then the proposed approach could be extended to use them by adding one layer per level. This demonstrates the generality as well as the simplicity of this particular method. Notice that each level of 𝒯 is a proper matrix by itself. Geometrically, 𝒯 can be seen as stacking matrices **L** and **H** along the third dimension, as shown in Figure 3.

**Figure 3.** Tensor structure (Source: Authors).

The computation part of the iteration loop has the block form of Equation (14). When the annotation weight matrix **N** is used, **H** is multiplied from the right by it once during initialization.

$$\mathbf{H} \leftarrow \mathbf{H}\mathbf{N}, \qquad \mathbf{G}^{[0]} \stackrel{\triangle}{=} \begin{bmatrix} \mathbf{g}\_{L}^{[0]} & \mathbf{g}\_{H}^{[0]} \end{bmatrix}, \qquad \mathbf{Y}^{[k+1]} \stackrel{\triangle}{=} \mathcal{T} \times\_{1} \mathbf{G}^{[k]}, \qquad \mathbf{G}^{[k+1]} \stackrel{\triangle}{=} \mathbf{J}\_{2} \mathbf{Y}^{[k+1]} \tag{14}$$

The above iteration is based on running the low- and high-level cluster schemes separately and then combining them. An optional weight deriving from annotations for the high-level attributes can be added. The iteration steps of the computation part of the loop of Algorithm 4 are as follows:


In this case, the kernel *T*[·] of Algorithm 4 is an implicit function of **L**, **H**, and **J**<sub>2</sub>. Moreover, at the end of the computation part of the loop, normalization is done with the Frobenius norm. This is valid since the iterates are now matrices, which are tensors of order two.

Table 7 offers an overview of the iterative clustering schemes deriving from the template of Algorithm 4. It is based on applicable standard criteria for classifying iterative schemes [37].

#### **5. Results**

#### *5.1. Setup*

Table 8 shows the experimental setup for evaluating the proposed methodology. It lists the parameters which influence clustering indirectly and which have to be set manually by the developer.


**Table 8.** Experimental setup (Source: Authors).

The experiments are designed to answer the following questions:

• **First vs. higher order:** These tests examine the clustering quality achieved by the first-order mappings of Algorithms 1 and 2 compared to that of the higher-order iterative clustering methods. Recall that the latter aggregate local ground truth to reveal global properties.


#### *5.2. Dataset Synopsis*

The dataset serving as a benchmark to test the proposed methodology as well as the effect of user annotations was obtained from Kaggle. It comprises *L*<sub>0</sub> rows pertaining to an anonymized fantasy-style cultural game involving a number of late Roman and medieval historical elements including gladiatorial-style combat, castle building, and knight quests, with jousting tournaments being one of the main in-game events. These rows had the following fields:


The above annotations come from *N*<sub>0</sub> distinct players, they refer to the in-game events, and they were drawn from a restricted list with a total of *E*<sub>0</sub> options. The latter means that annotating users had to choose only from a pre-specified set of annotations intended to help the game designers understand how certain game elements were perceived by the player base. Therefore, there were no missing or erroneous values in the various fields of the dataset and the semantics were well defined. Observe that the number of annotators is significantly larger than the number of players the annotations are about. A possible explanation is that annotators chose to provide information about players with considerable in-game activity. Additionally, the relatively small number and the format of the *E*<sub>0</sub> options in the annotations list made them easy to remember and therefore easy to create. In turn, this is an important incentive for creating a large number of them over a short amount of time.

Table 6 has the meaning of each user annotation as well as the category it belongs to. It resulted from processing the raw dataset and extracting player as well as action information. The latter was the grouping factor for the processed *L* <sup>0</sup> rows in order to form the annotations categories *a*<sup>1</sup> to *a*5.

Table 9 contains the distribution of each of the five categories in the resulting dataset. Observe that this distribution is rather balanced, which greatly facilitates analysis. Notice that only one category probability, namely that of generic interaction, is much larger than the others.

**Table 9.** Category distribution in the processed dataset (Source: Authors).


#### *5.3. Number of Iterations and Floating Point Operations*

The primary figure of merit for each iterative algorithm is the number of iterations. For the methodologies of Table 7, Table 10 presents the number of iterations required to achieve the same level of convergence *η*<sub>0</sub>. Since the starting point is random, the number of iterations is a stochastic quantity. Therefore, each scheme was run *R*<sub>0</sub> times and the mean and variance were computed. Moreover, as an additional safeguard, a maximum of *η*<sub>1</sub> iterations was imposed.


**Table 10.** Iterations for each method (Source: Authors).

The termination criterion *τ*<sub>0</sub> used to obtain the number of iterations of Table 10 was that of Equation (5) with the parameter *η*<sub>0</sub> selected as shown in Table 8. This value is low enough to achieve convergence without allowing the power method to proceed too far. From the results in Table 10, the following can be inferred about the number of iterations for each of the clustering schemes:


Table 11 has the average number of floating point operations (flops) for each method. Again, since the clustering schemes are stochastic, the procedure of the previous case was applied.


**Table 11.** Flops for each method (Source: Authors).

From the results in Table 11, the following conclusions can be drawn regarding the scaling:

