Community Evolution Analysis Driven by Tag Events: The Special Perspective of New Tags

Yang, Jing; Wang, Jun; Gao, Mengyang

doi:10.3390/math11061361

by Jing Yang, Jun Wang * and Mengyang Gao

School of Economics and Management, Beihang University, 37 Xueyuan Road, Haidian District, Beijing 100191, China

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(6), 1361; https://doi.org/10.3390/math11061361

Submission received: 11 February 2023 / Revised: 6 March 2023 / Accepted: 9 March 2023 / Published: 10 March 2023

(This article belongs to the Special Issue Business Analytics: Mining, Analysis, Optimization and Applications)

Abstract

The type, quantity, and scale of social-tagging systems have grown constantly in recent years as users’ interest has increased. Tags have important reference value in the study of networked communities since they typically represent user preference. This paper aims to examine how a tagging community evolves and to check the impact of new tags on evolution. To this end, we proposed an improved evolution model for tag communities where tags constantly accumulate without withdrawal. Based on the model, we conducted an evolution analysis on three different tag communities with the datasets generated from the Delicious bookmarking system, CiteULike, and Douban. The results from Delicious emphasized that new individuals have an enormous influence on community evolution, for they dominate the Form event, lead the early Split event, indirectly have a hand in the Merge event, and affect existing tags’ transfer when they flood into the system. Moreover, new tags prove to be more influential in the tagging-relation data of CiteULike and Douban, where new tags dominate the Split event. The in-depth and detailed depiction of community evolution helps us understand the evolution process of tag communities and the crucial role of new tags.

Keywords:

community evolution; tagging community; event-driven; group event; individual event; social tagging

MSC:

15A99; 68Uxx

1. Introduction

Tagging describes a process in which users utilize tags according to their preferences. Specifically, users add tags to resources to describe the characteristics, properties, or categories of resources or to deliver users’ thoughts or feelings about resources [1]. Tagging helps to realize users’ personalized classifications and organization of resources more conveniently. Meanwhile, different users can share resources through tags. In this context, social tagging has formed and developed, which also supports information retrieval [2]. With networks developing rapidly and types of social-tagging systems increasing, community analysis has attracted the attention of scholars. Research on communities has matured from simple community structures to multi-level, overlapping, interactive, and nested complex network community structures. The two most prominent research topics in community research are community discovery and community evolution [3]. Currently, most studies focus on analyzing community evolution, which can be divided into two stages: (1) community discovery, an attempt to capture all the communities in one snapshot, and (2) community evolution, a comparative analysis of communities among consecutive snapshots to determine their evolution [4].

Social-tagging systems typically represent Web 2.0’s features, decentralization, openness, communion, community aggregation, etc., and lead the new Internet information pattern and user-generated content (UGC) [5,6,7]. Tags are users’ free choices of text to mark resources and reflect resources’ content, subject, and users’ preferences [8,9,10]. Researchers obtain different themes and their distribution through theme clustering of tags, such as Latent Semantic Analysis (LSA) [10,11,12,13]. Furthermore, by analyzing tags’ themes, researchers can infer users’ interests and realize personalized user recommendations [14,15,16,17].

This paper studies the evolution of the social-tagging system. Tags have the advantages of directivity, consistency, and unicity in reflecting information content/subject and users’ interest preferences. Thus, to avoid the complexity and diversity of users belonging to different communities, tags are chosen as research objects to study community evolution in the context of social tagging (i.e., tagging community evolution). To date, there have been quite a few studies on community evolution, but few scholars have paid attention to the role of new individuals in the process of community evolution [18,19,20,21,22,23]. In the field of organizational behavior, by contrast, the impact of new individuals on changes in the organizational unit has been proposed [24,25]. To bridge this gap in community evolution, this study focuses on the impact of new individuals (i.e., new tags) on tagging community evolution.

In this paper, the traditional two-stage strategy is used to study the evolution of tagging communities. First, tags are clustered in every snapshot using the LMMSK (Min–Max Similarity K-means based on LSA) algorithm, which is an improved clustering algorithm [26]. Second, based on the obtained tag clusters, an evolutionary analysis of the social community is conducted by applying the event-driven evolution model, which is improved to fit the current tagging systems: Delicious, CiteULike, and Douban. Previous research ignored the role of new tags in community evolution, and this paper aims to help bridge this gap. To this end, the current study completes evolution analyses of those tagging systems, with events at the individual and group levels thoroughly depicted; meanwhile, the role of new tags is discussed in depth along with those events. This study supplements existing research on the social-tagging community.

2. Related Works

In the field of community evolution, various research methods have been proposed, including theories based on graphs, such as average path length, clustering coefficient, degree distribution, and cluster size in the study of community discovery, evolution, structure, characterization, etc. [22,27,28]; methods based on information theory, such as normalized mutual information (NMI) with gain function Q for community division [29]; and community research based on clustering, such as hierarchical clustering, partition clustering, spectral clustering, etc. [22,30,31,32]. In addition, some researchers have proposed theoretical models for community updating/propagation, such as the local world evolving network model, forest fire model, and models of social network growth [33,34,35,36].

The focus of network community research has changed from static research to dynamic tracing. Static research methods include the topological analysis of complex networks, the discovery of key nodes or community leaders, knowledge–community discovery, and community–structure discovery [18,19,20,21,22]. Dynamic research includes community-evolution tracing in complex networks, community-evolution models, the tracing of users’ interaction [37], and abrupt group events (or abnormal changes) [23,38,39,40,41]. Many researchers have extracted evolution processes in dynamic communities to identify critical events over time. Palla et al. proposed the Clique Percolation Method (CPM) to identify community events based on continuous snapshots [42], where community evolutionary relationship is reflected by dynamic changes of different community structures between two consecutive snapshots, and the evolution model consists of six stages: growth, shrink, merge, split, form, and dissolve. Based on Markov chains, Wang et al. proposed a hybrid community detection algorithm and provided promising solutions for community detection [43]. However, this analysis of community evolution is only from an external structure [3,4,42,44]. Considering both the external structure and internal-community individual behaviors, Asur et al. proposed a community-evolution model, which classified key community events as Form, Continue, Split, Merge, and Dissolve, and individual events as appear, disappear, join, and leave [39]. This model helps to understand the community evolution more clearly but still cannot capture all events a community goes through [4,45,46]. Asur also proposed a viewpoint-based approach, which considers the perspectives of different individuals within a network to gain a more comprehensive understanding of the relationships and dynamics between them [47]. Some researchers consider new events, such as Contract and Reform, to capture more aspects of community evolution. 
For example, Takaffoli et al. proposed the community flag to support community-evolution tracking across the whole observation period, since the flag allows changes to be traced across all snapshots rather than only between two consecutive snapshots [46]. In addition, shrinking and growing events are introduced in the framework of Piotr et al., and a decision tree is presented by combining the proposed inclusion threshold [3].

The field of dynamic community research encompasses thousands of studies, making it challenging to succinctly summarize the status of the existing research. Fortunately, some researchers have proposed classification frameworks to help us understand the development of this field. According to the way that time awareness is incorporated into community detection methods, the framework proposed by Papadopoulos et al. contains three categories: longitudinal application on successive snapshots, vertex-centric time awareness, and incremental application [48]. Rossetti and Cazabet have classified cross-time community discovery approaches into four types based on the main evolving body: (1) fixed memberships and fixed properties, (2) fixed memberships and evolving properties, (3) evolving memberships and fixed properties, and (4) evolving memberships and evolving properties [49]. The classification proposed by Dakiche et al. corresponds to different methodological principles used to track community evolution, including independent community detection and matching, dependent community detection, simultaneous community detection on all snapshots, and dynamic community detection on temporal networks [50].

Many studies on dynamic communities have been summarized in the above survey articles. However, most of them focused mainly on discussion at the method level. Few studies have described the evolution of the whole community in detail or events at the individual level. We have tracked and investigated different social-tagging systems and noticed that, as those systems develop, tags increase exponentially. Although new individuals have been proven to have an important impact on dynamic teams (which may be destructive or beneficial) [51,52,53,54], their role in community evolution has hardly been examined. We assume that in dynamic communities, new individuals can also play an important role, as the structure and environmental context of dynamic communities and dynamic teams are similar. Therefore, we have collected data from three different tagging systems, Delicious, CiteULike, and Douban, to investigate the crucial impacts of new tags on the evolutionary events at the group and individual levels.

3. Materials and Methods

3.1. Datasets

To make the experimental results more convincing, the current study has collected different data sets from three tagging systems to track the evolution of the tag community.

3.1.1. Delicious

The first dataset, released by Cantador et al. [55] (RecSys 2011, http://ir.ii.uam.es/hetrec2011, accessed on 2 May 2019), was generated from the Delicious social bookmarking system, which is one of the earliest and most typical social tagging communities. This dataset contains users’ bookmarks and annotation information in Delicious from 2003 (when Delicious was started) to 2010 (when Delicious changed dramatically with a significant UI redesign after being purchased by an American Internet company AVOS). The dataset involves 53,388 tags and 69,223 resources. It fits well with the current research content to apply and verify the evolution model since user behavior has a good consistency and the system evolved freely without external interference in this period.

3.1.2. CiteULike

CiteULike allows users to tag papers that interest them and create their own collections of resources. CiteULike-t (http://www.wanghao.in/CDL.htm, accessed on 30 June 2021) contains many files, but only the tagging-relation file, ‘tag-item.dat,’ fits the current research. However, this dataset has no tagging time [56]. We therefore segmented tags into different snapshots based on their order of occurrence, as tags are sorted by tag-id. Thus, only each tag’s first tagging relation is utilized. In the end, this dataset contains 52,946 tags, 52,946 tagging relations, and 10,967 articles.

3.1.3. Douban

Douban (www.douban.com, accessed on 13 March 2020), as a collaborative-tagging system, provides thousands of resource sites related to books, movies, songs, etc., and allows users to tag those resources that interest them. In this paper, users’ book-tagging relations of 3005 book links were randomly crawled from Douban between 2017 and 2019. Meaningless tags (‘@#,’ for example) were deleted with regular expressions, and non-tagging book links, to which no user had attached any tag, were removed. After this cleaning, 16,740 book-tag tagging relations remained, with 2152 books and 4551 tags in total.

3.2. Data Processing

In the datasets, there is no training data that can be referred to, and each dataset contains so many tags and resources that the tag-resource matrix is high-dimensional, which makes it difficult for us to manually produce accurate and effective training data. Hence, we choose to use unsupervised techniques rather than supervised ones.

First, the data of each snapshot are converted into a tag-resource matrix D, in which each element is the frequency with which the tag annotates the corresponding resource. Taking D as the input, the preprocessing goes as follows:

(1): Data filtering: in D, if a tag has no common resource object with any other tag, it is called a ‘wander tag.’ After wander tags are screened out, the remaining tag-resource matrix WS is used for clustering analysis.
(2): k-SVD: first, calculate the SVD (singular value decomposition) of WS using $WS = USV^{T}$; second, set k to the minimum value that satisfies $(\sum_{i=1}^{k} \lambda_{i}) / (\sum_{j=1}^{\operatorname{rank}(WS)} \lambda_{j}) \geq \delta$, where $\lambda_{i}$ are the diagonal elements of matrix S and $\delta$ is set to 0.2; third, calculate the k-SVD of WS and obtain the k-dimensional coordinates of tags and resources using $WS^{*} = U_{k}S_{k}V_{k}^{T}$, where $WS^{*}$ is the k-order approximation of WS.
(3): K-means cluster: take $WS^{*}$ and k, obtained from (2), as the object of the cluster analysis and the number of groups, respectively, and run K-means with the initial centroids obtained by Min–Max similarity.

Here WS is clustered with the LMMSK algorithm, which is mainly represented by (2) and (3) [26]. Therefore, including wander tags, there are k + 1 tag groups in each snapshot, which are inputs in the following evolutionary analysis.
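
Steps (1) and the choice of k in step (2) can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB implementation used in this study; the function names `split_wander_tags` and `choose_k` and the dictionary-of-sets input format are assumptions made for the example, and the SVD itself and the Min–Max initialized K-means are not shown. The singular values are assumed to be sorted in descending order.

```python
from collections import defaultdict


def split_wander_tags(tag_resources):
    """Separate wander tags from tags that share a resource with another tag.

    tag_resources: dict mapping each tag to the set of resources it annotates
    (hypothetical input format, equivalent to the non-zero pattern of D).
    Returns (kept, wander): the rows that would form WS, and the wander tags.
    """
    resource_tags = defaultdict(set)
    for tag, resources in tag_resources.items():
        for resource in resources:
            resource_tags[resource].add(tag)

    kept, wander = {}, set()
    for tag, resources in tag_resources.items():
        # A tag co-occurs with another tag if any of its resources has > 1 tag.
        if any(len(resource_tags[r]) > 1 for r in resources):
            kept[tag] = resources
        else:
            wander.add(tag)
    return kept, wander


def choose_k(singular_values, delta=0.2):
    """Smallest k whose leading singular values reach a share delta of the total."""
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("singular values must have a positive sum")
    running = 0.0
    for k, value in enumerate(singular_values, start=1):
        running += value
        if running / total >= delta:
            return k
    return len(singular_values)
```

With δ = 0.2 the threshold is reached early whenever a few singular values dominate, which keeps the number of groups small relative to the rank of WS.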

3.3. Methods

3.3.1. Determinations in the Evolution Model

The four-class framework of dynamic community approaches proposed by Dakiche et al. [50] includes (1) independent community detection and matching, which breaks the evolution of systems into many steps and matches communities between consecutive steps; (2) dependent community detection, which is only suitable for networks whose community structures are relatively stable over time; (3) simultaneous community detection on all snapshots, which cannot update the community result with incoming data; and (4) dynamic community detection on temporal networks, which may result in drifting toward invalid communities. The second class does not fit the current study, as our social-tagging systems are not stable. The third lacks Merge and Split events and does not support the tracking of new individuals, so it does not meet our expectations. The fourth is ruled out by its risk of drift. Therefore, only the first class fits our purpose well. Furthermore, Rossetti and Cazabet [49] suggest giving priority to snapshot-based approaches when, as in our original datasets, the number of evolution steps is less than 10.

For the above considerations, the event-based model of evolution analysis proposed by Asur et al. [39], which is based on snapshots, is suitable for our datasets. This model is a typical two-stage approach and has been widely referred to in the literature [41,57,58]. In addition, individual events in the model are helpful for our research objectives. However, in the evolution model of Asur et al. [39], individuals can disappear from the system, and the authors assume that a group can only be divided into two subgroups. Neither assumption conforms to the reality of tag communities. For example, tags that have been used can hardly be eliminated from these systems, and a tag group can be split into more than two groups. Therefore, we have made some necessary improvements to the model. First, we take full account of cumulative tags’ growth in the datasets, and, in our design of evolution events, there is no Disappear event. Asur’s model reflects a situation in which individuals in the community can leave the group or completely disappear from the community. Nevertheless, in our social-tagging communities, once a tag appears, it is difficult and unnecessary to eliminate its traces of existence. Thus, in our improved model, the events at the individual level do not include Disappear, which is more in line with the real situation of social-tagging communities.

Second, some restrictions on events, such as Continue, Merge, and Split, are removed in order to capture more community transfers. For example, we have expanded the Split event, as we do not impose the limitation that a group can only be split into two new groups. In real-life scenarios, a group may split into more than two distinct sub-groups. This revision aligns our model with actual occurrences and leads to a more accurate representation of reality.

Lastly, and most distinctively, when clustering, this paper only pays attention to nodes (tags) and ignores the links between nodes, as the tag-resource matrix already records the collaborative tagging relations of tags. This helps to simplify the first stage, i.e., community detection.

We have changed the model to make it more suitable for application in the current social-tagging systems so as to better detect events in the communities and track the community evolution.

In the following, the specific definitions of the events are described.

$C_{i}^{q}$ is the q-th tag group in the i-th snapshot, $|C_{i}^{q}|$ is the tag number of $C_{i}^{q}$, and detailed differences between current and Asur’s studies are presented after event definitions.

Tag group events:

k-Continue: for $k \in (0, 100)$, if $|C_{i}^{q} \cap C_{i + 1}^{j}| / |C_{i}^{q}| \geq k\%$, then $C_{i + 1}^{j}$ is a k-Continue of $C_{i}^{q}$. In addition, the Continuation Degree is calculated using $k_{c} = |C_{i}^{q} \cap C_{i + 1}^{j}| / |C_{i}^{q}|$. Note that 50-Continue is the foundation of Merge, Join, and Leave.

In Asur’s research, the Continue event exists only if $C_{i + 1}^{j} = C_{i}^{q}$.
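
The k-Continue test and the Continuation Degree translate directly into set operations. The following is a minimal Python sketch, assuming each tag group is represented as a set of tag strings; the function names are illustrative and not part of the original implementation.

```python
def continuation_degree(former, later):
    """k_c: share of the former group's tags that are found in the later group."""
    return len(former & later) / len(former)


def is_k_continue(former, later, k):
    """True if `later` is a k-Continue of `former`, with k given in percent.

    Integer arithmetic is used so that boundary cases such as exactly 50%
    are not affected by floating-point rounding.
    """
    return 100 * len(former & later) >= k * len(former)


# Three of the four former tags persist, so k_c = 0.75.
former = {"web", "css", "html", "ajax"}
later = {"web", "css", "html", "jquery"}
degree = continuation_degree(former, later)
```

Here `later` is a 50-Continue of `former` but not an 80-Continue, and under the stricter definition of Asur et al. it would not count as a Continue at all because the two sets differ.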

Merge: $C_{i}^{p}$ and $C_{i}^{q}$ merge into $C_{i + 1}^{j}$ if $C_{i + 1}^{j}$ is 50-Continue of both $C_{i}^{p}$ and $C_{i}^{q}$. $k_{m}$ is taken as Merge Degree, calculated by $k_{m} = |(C_{i}^{p} \cup C_{i}^{q}) \cap C_{i + 1}^{j}| / |C_{i}^{p} \cup C_{i}^{q}|$.

In previous research, Merge demands one more condition: that there exist edges between $C_{i}^{p}$ and $C_{i}^{q}$ in timestamp i + 1.
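
A Merge detector built on 50-Continue might look like the sketch below. It is written in Python with groups stored as dictionaries from a group id to a set of tags, which is an assumed representation. The definition above is stated for two former groups; the sketch extends it to any number of former groups by taking the union of all of them when computing the Merge Degree, which is an assumption of this example.

```python
def find_merges(former_groups, later_groups):
    """Return {later_id: (former_ids, k_m)} for every detected Merge.

    former_groups, later_groups: dict mapping group id -> set of tags.
    A later group is a merge target when it is a 50-Continue of at least
    two former groups.
    """
    merges = {}
    for j, later in later_groups.items():
        sources = [p for p, former in former_groups.items()
                   if 2 * len(former & later) >= len(former)]
        if len(sources) >= 2:
            union = set().union(*(former_groups[p] for p in sources))
            merges[j] = (sources, len(union & later) / len(union))
    return merges
```

A high Merge Degree means that most tags of the former groups end up together in the resulting group.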

Split: In the (i + 1)-th snapshot, $C_{i}^{p}$ goes through a Split event if more than one tag group $C_{i + 1}^{j}$ meets the condition $|C_{i + 1}^{j} \cap C_{i}^{p}| > |C_{i + 1}^{j} \cap C_{i}| / 2$, where $C_{i}$ is the collection of all tags in the i-th snapshot, with Split degree $k_{s} = \sum_{j} |C_{i + 1}^{j} \cap C_{i}^{p}| / |C_{i}^{p}|$.

The previous definition is stricter, and $C_{i}^{j}$ splits when $|(C_{i + 1}^{m} \cup C_{i + 1}^{n}) \cap C_{i}^{j}| / \max(|C_{i + 1}^{m} \cup C_{i + 1}^{n}|, |C_{i}^{j}|) > k\%$, in the condition of $|C_{i + 1}^{m} \cap C_{i}^{j}| > |C_{i + 1}^{m}| / 2$ and $|C_{i + 1}^{n} \cap C_{i}^{j}| > |C_{i + 1}^{n}| / 2$. Asur et al. assumed communities always split into two communities, while the improved model does not have this limitation.
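
The improved Split definition can be sketched as follows, again in Python with groups as dictionaries of tag sets (an assumed representation). The sketch assumes that the sum in the Split degree runs only over the later groups that satisfy the Split condition; the definition does not spell this out, so this reading is an interpretation.

```python
def find_splits(former_groups, later_groups):
    """Return {former_id: (later_ids, k_s)} for every detected Split.

    A later group counts as a part of former group p when more than half
    of its old tags (tags already present in snapshot i) come from p.
    New tags are ignored by the condition.
    """
    old_tags = set().union(*former_groups.values())
    splits = {}
    for p, former in former_groups.items():
        parts = [j for j, later in later_groups.items()
                 if 2 * len(later & former) > len(later & old_tags)]
        if len(parts) > 1:
            shared = sum(len(later_groups[j] & former) for j in parts)
            splits[p] = (parts, shared / len(former))
    return splits
```

Because the condition places no limit on the number of qualifying later groups, a former group can be reported as splitting into three or more parts.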

Form: if $\forall n, |C_{i + 1}^{j} \cap C_{i}^{n}| < 2$, tag group $C_{i + 1}^{j}$ is involved in a Form event.

Dissolve: if $\forall n, |C_{i + 1}^{n} \cap C_{i}^{j}| < 2$, $C_{i}^{j}$ goes through a Dissolve event.

The definitions of Form, Dissolve, and individual events (Appear, Join, and Leave) are the same as Asur’s.
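
Form and Dissolve are mirror images of each other, as the following Python sketch shows. Groups are again assumed to be dictionaries from a group id to a set of tags, and the function names are illustrative.

```python
def find_forms(former_groups, later_groups):
    """Later groups that share fewer than two tags with every former group."""
    return [j for j, later in later_groups.items()
            if all(len(later & former) < 2 for former in former_groups.values())]


def find_dissolves(former_groups, later_groups):
    """Former groups that share fewer than two tags with every later group."""
    return [q for q, former in former_groups.items()
            if all(len(former & later) < 2 for later in later_groups.values())]
```

A group made up entirely of new tags always satisfies the Form condition, since it shares no tag with any former group.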

Tag individual events:

Considering tags will not disappear over time in a social-tagging system, individual events include only Appear, Join, and Leave.

Appear: if $v \notin C_{i - 1} \land v \in C_{i}$, an Appear event has occurred in the i-th snapshot.

Join: if $|C_{i}^{k} \cap C_{i + 1}^{j}| > |C_{i}^{k}| / 2 \land v \notin C_{i}^{k} \land v \in C_{i + 1}^{j}$, a Join event is marked on v from $C_{i}^{k}$ to $C_{i + 1}^{j}$.

Leave: if $|C_{i}^{k} \cap C_{i + 1}^{j}| > |C_{i}^{k}| / 2 \land v \in C_{i}^{k} \land v \notin C_{i + 1}^{j}$, a Leave event is marked on v from $C_{i}^{k}$ to $C_{i + 1}^{j}$.
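
The three individual events can be collected in one pass over the pairs of former and later groups. The Python sketch below follows the definitions literally, so a new tag that lands in a continued group is reported both as an Appear and as a Join; whether such tags should be counted twice in practice is not settled by the definitions. The group representation (dictionaries of tag sets) and the function name are assumptions of the example.

```python
def individual_events(former_groups, later_groups):
    """Return (appear, joins, leaves) between two consecutive snapshots.

    appear: set of tags present only in the later snapshot.
    joins / leaves: lists of (tag, former_id, later_id) triples.
    """
    old_tags = set().union(*former_groups.values())
    later_tags = set().union(*later_groups.values())
    appear = later_tags - old_tags

    joins, leaves = [], []
    for k, former in former_groups.items():
        for j, later in later_groups.items():
            # The later group must keep more than half of the former group.
            if 2 * len(former & later) > len(former):
                joins += [(v, k, j) for v in sorted(later - former)]
                leaves += [(v, k, j) for v in sorted(former - later)]
    return appear, joins, leaves
```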

3.3.2. CCR Matrix

A Clustering Corresponding Results Matrix (CCR Matrix) [26] is introduced to track events, combined with the current event definitions of k-Continue, Form, Split, Merge, Dissolve, Appear, Join, and Leave. Table 1 shows the CCR Matrix from 2003 to 2004 in the Delicious dataset.

Table 1 shows three tag groups in 2003 ($C_{3}^{1}$, $C_{3}^{2}$, and $C_{3}^{3}$) and four in 2004 ($C_{4}^{1}$, $C_{4}^{2}$, $C_{4}^{3}$, and $C_{4}^{4}$), including wander tags $C_{3}^{1}$ and $C_{4}^{1}$. The element in the CCR Matrix is the number of tags shared by different groups in consecutive snapshots. For example, $C_{3}^{3}$ and $C_{4}^{3}$ share five tags. Moreover, considering tags’ cumulative growth, $N_{3 - 4}$ shows new individuals’ distribution in the later snapshot, that is, 33 (1 + 16 + 12 + 4) new tags in snapshot 2004.
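
A CCR Matrix of this shape can be computed from two consecutive clusterings as sketched below. The Python function and its nested-dictionary output are assumptions made for illustration; the layout of Table 1 itself is taken from [26] and is not reproduced here.

```python
def ccr_matrix(former_groups, later_groups):
    """Build the shared-tag counts and the new-tag row for two snapshots.

    Returns (rows, new_row):
      rows[q][j]  number of tags shared by former group q and later group j
      new_row[j]  number of tags in later group j absent from the former snapshot
    """
    old_tags = set().union(*former_groups.values())
    rows = {q: {j: len(former & later) for j, later in later_groups.items()}
            for q, former in former_groups.items()}
    new_row = {j: len(later - old_tags) for j, later in later_groups.items()}
    return rows, new_row
```

Every group event defined above can be read off these counts together with the group sizes, so the matrix only needs to be built once per pair of snapshots.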

3.4. Experimental Setup

In our datasets, there are a great number of tags and resources, which makes the corresponding tag–resource matrices very large. In some matrices, both the rows and the columns number more than 100,000. It is difficult to handle such high-dimensional matrices using ordinary personal computers. Therefore, all the algorithms above are implemented on a server. The detailed experimental environment is as follows:

Hardware equipment: Server, Lenovo ThinkServer TD350; Processor, Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (Double); RAM, 256 G; Disk space, 7449.53 G.

Software environment: Operating system version, Microsoft Windows Server 2012 R2 Standard; Matlab version, MATLAB R2015b.

4. Results

In general, the communities in CiteULike and Douban evolve in much the same way as those in Delicious, and new tags play an important role in tag community evolution. As the Delicious dataset contains more detailed data than the others, this paper mainly expounds on the evolution analysis of Delicious and explicitly indicates where the results for CiteULike and Douban differ.

4.1. Descriptive Statistics

4.1.1. Snapshots

There are eight snapshots in Delicious from 2003 to 2010. Tagging relation data, “tag-resource-time,” which is segmented by years to obtain cumulative data for each snapshot, is used for evolution analysis. For example, the cumulative data of snapshot 2007 contains all tag-resource correspondence data from 2003 to 2007. The Delicious system demonstrates its rapid expansion through the exponential growth of tags and resources from 2003 to 2010, indicating that from 2003 to 2010, the Delicious community stayed in a state of continuous growth [59,60].
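
The cumulative segmentation by year can be sketched as follows. The Python function assumes the tagging relations are available as (tag, resource, year) triples, which is a hypothetical input format, and it keeps annotation frequencies so that each snapshot can be turned into a tag-resource matrix.

```python
from collections import Counter


def cumulative_snapshots(records):
    """Map each year to the cumulative (tag, resource) frequencies up to that year.

    records: iterable of (tag, resource, year) triples.
    """
    records = list(records)
    running = Counter()
    snapshots = {}
    for year in sorted({year for _, _, year in records}):
        running.update((tag, res) for tag, res, y in records if y == year)
        snapshots[year] = Counter(running)  # copy, so later years do not alter it
    return snapshots
```

Because each snapshot contains all earlier data, a tag can never vanish between snapshots, which is what makes a Disappear event unnecessary in the improved model.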

Douban has three snapshots, 2017, 2018, and 2019, in which tags and books grow exponentially, as in Delicious. CiteULike has no tagging time, so its tag number is assumed to increase exponentially, as in Delicious and Douban. Hence, the evolutionary process is divided into three snapshots, T1, T2, and T3, containing the first 1/32, the first 1/8, and the whole of the tags that appeared in CiteULike, respectively. It is assumed that the time intervals from T1 to T2 and from T2 to T3 are the same.
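
The ordinal segmentation used for CiteULike can be sketched in a few lines of Python. The function name is illustrative, and rounding the snapshot sizes down to whole tags is an assumption, since the exact snapshot sizes are not stated above.

```python
def ordinal_snapshots(tag_ids, fractions=((1, 32), (1, 8), (1, 1))):
    """Split tags, ordered by tag id, into cumulative snapshots T1, T2, T3.

    fractions: (numerator, denominator) pairs giving the share of all tags
    contained in each snapshot; sizes are rounded down.
    """
    ordered = sorted(tag_ids)
    return [ordered[:len(ordered) * num // den] for num, den in fractions]
```

Each snapshot is a prefix of the next one, so the snapshots are cumulative in the same sense as the yearly snapshots of Delicious.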

4.1.2. Wander Tags

In Delicious, the number of wander tags increased linearly from 2003, while their ratio to all tags decreased yearly: as tags and resources become richer, old wander tags come to share tagging behaviors (co-occurrence) with other tags, and such tags do not revert to wander tags. As expected, more than half of the wander tags are new tags, except in 2004 and 2005.

The change of wander tags in CiteULike is similar to that in Delicious, except in T1, when CiteULike holds a much larger proportion of wander tags. This is probably because, in CiteULike, only a tag’s first tagging relation is utilized; with many tagging relations missing, the collaborative tagging relationship among tags is relatively weak in the early stage. In addition, 370 wander tags from T1 appear in groups in T2 (28.07% conversion ratio), and over 80% of wander tags in T2 are converted into groups in T3. Thus, as new tags pour into the system, the speed of old wander tags’ conversion to groups increases dramatically.

However, there is no wander tag in Douban. One possible reason is that deleting meaningless tags and null-tag book links leads tags to co-occur more often. Meanwhile, the clustering results of users’ tags reflect not only the subject content of books but also readers’ own cognition. The absence of wander tags in Douban indicates the similarity of book content or of readers’ cognition of book content.

4.1.3. Growth of Tag, Group, and Group Scale

Delicious showed a tremendous increase in tag groups from two in 2003 to 837 in 2010. Figure 1 depicts the group size distribution.

In Figure 1, the size-distribution curve moves up and to the right, showing the growth of both the number and scale of groups. Different tag groups represent different themes, indicating two features: (1) some large-scale groups are much more popular in the community than others; (2) average-size tag groups account for the majority of groups, demonstrating that a great many less-popular themes make up the Delicious system’s mainstream environment.

In CiteULike and Douban, tags and groups also increase exponentially, and the results have verified those two features above. For example, in T1 of CiteULike, the two largest groups contained 89 and 83 tags, accounting for 67% of the total, and in Douban 2019, the average group scale is 42.14, while the largest group contains 106 tags.

4.2. Tag Group Events in Delicious

4.2.1. Form Event and Continue Event

After launching in 2003, Delicious attracted more and more users to tag bookmarks for the convenience of information exchange and sharing. Naturally, Form emerges as the first tag group event, followed by Continue, Split, Merge, and Dissolve.

New tag proportions of consequent groups are mostly below 0.6 in Continue events, but new tags constitute the main part of Form events, though its portion gradually decreases. In detail, new tags account for more than 80% of all Form groups before 2008, more than 60% of 80% of groups in 2009, and 50% of groups in 2010.

Statistical analyses of Form and Continue have one thing in common: generally, Form and Continue events increase, but their ratio to total groups decreases yearly. Note that relatively stable groups (Continue events) comprise a tiny part of the system, while Form is the main reason for community evolution initially.

Figure 2 shows the analysis of Continue events, where C-Rate is the former groups’ proportion of total groups in Continue event and $k_{c}$ is the continuation degree.

The 100-Continue event is rare. The proportion of Continue events with $k_{c}$ in [0.5, 0.6] increases, whereas that with higher $k_{c}$ decreases. Results of both Continue and Form show that groups evolve very actively, and the community system becomes more detailed and complex as new tags’ proportion in Form gradually decreases and few groups remain stable.

4.2.2. Split Event

A Split event is due to diversity expansion when the tag-group scale grows and tagging relationships change. As with Form and Continue, the number of Split events increases yearly, while their ratio to all groups decreases, which means that Split explains less and less of the Delicious community evolution. In 2010, Split had little explanatory value, as its ratio to all groups was 0.04.

The attribution analysis of Split is given in Figure 3, which shows the distribution of Split degree $k_{s}$ and the distribution of new tags in Split’s resulting groups.

In a Split event, tags from Split’s former group account for at least 50% of the resulting groups (according to the Split definition, where new tags are not considered). Smaller $k_{s}$ means that Split events are more efficient, and the differentiation of tags becomes more obvious since fewer tags from former groups are dispatched into different tag groups in the later snapshot. Furthermore, most of Split’s resulting groups contain new tags, but only from 2004 to 2008 are most of them mainly made up of new tags.

Based on the evolution analysis of Delicious, the new tag plays a leading role in Form events, has a great influence on Split events in the early period only, and has little effect on Continue events.

4.2.3. Merge Event

Based on 50-Continue, all 11 Merge events of Delicious are given in Table 2, which displays Merge using consecutive snapshots, containing former and resulting groups of Merge event and Merge degree, $k_{m}$. For example, in the 1st Merge event, the 12th, 52nd, and 54th tag groups from 2006 merged into the 18th group in 2007, with $k_{m}$ = 0.72.

Merge events are scarcer than 100-Continue events, while the Merge degree is high, which shows the high effectiveness of Merge. Similar to 100-Continue, Merge is a concise and efficient group event, and its description of tag group evolution is quite specific, whereas its description scope is limited.

Form, Split, Continue, and Merge are taken as explanations of current tag groups. Figure 4 shows to what extent the current model can explain Delicious’ evolution, where Later-T is the number of the resulting groups of these group events, and Ratio-B, the ratio of Later-T to the number of all groups in the current snapshot, is proposed to measure the effect of the current model.

Generally, Later-T increases over the years, whereas Ratio-B declines from 2006. Although better than any single group event, Later-T is much less than expected. The model needs to be improved to amplify its explanation scope, although it is difficult to maintain accuracy.
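
Later-T and Ratio-B follow directly from the event detection results. The Python sketch below assumes that the ids of all groups produced by Form, Split, Continue, and Merge events have already been collected; the function name and input format are illustrative.

```python
def later_t_and_ratio_b(event_result_ids, later_groups):
    """Return (Later-T, Ratio-B) for one snapshot.

    event_result_ids: ids of later groups that result from any group event
    (duplicates allowed, since one group may result from several events).
    later_groups: dict mapping group id -> set of tags in the current snapshot.
    """
    later_t = len(set(event_result_ids) & set(later_groups))
    return later_t, later_t / len(later_groups)
```

A declining Ratio-B therefore means that a growing share of groups in the snapshot is not the result of any detected group event.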

4.2.4. Dissolve Event

Table 3 shows 13 Dissolve events obtained from Delicious.

Groups dissolved from 2006 to 2008 mainly consisted of new tags owing to limited tagging frequency and fragile relations among inner-group tags. The period from 2009 to 2010 has the most Dissolve events because of the variety in tagging frequency and the large influx of tags. Thus, the Dissolve event supports our proposition. As an example to track further, a Dissolve event in Delicious from 2008 to 2009 is shown in Table 4.

In this Dissolve, the tag ‘constructions’ left and joined a new group in 2009. Interestingly, ‘constructions’ and another four tags, ‘sch#ze,’ ‘viewing,’ ‘recycle_mckinney,’ and ‘green-business,’ stayed in the same group in 2010. From 2008, the size of the group including ‘constructions’ grew from three to 46. By 2010, ‘constructions’ marked 59 resources, while the other tags marked only one or two. Those inactive tags were attached to ‘constructions,’ by which they were counted in a group.

Figure 5 shows the tag groups in 2010 that contain ‘constructions,’ ‘nurturing,’ and ‘rockmelt’ (the tags that originated from the Dissolve event between 2008 and 2009). The lines indicate co-occurrence relationships, and a bigger tag means more occurrences. Figure 5 explicitly shows how the Dissolve event impacts community evolution. The tags from the dissolved group, ‘constructions,’ ‘nurturing,’ and ‘rockmelt,’ each hold together a separate new group after evolution. Despite their small number, Dissolve events profoundly impact community evolution and group reconstruction.
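The tracking step behind Table 4 can be sketched as follows, assuming that each snapshot is stored as a list of tag sets. The function name `later_d` is hypothetical. The later groups are lowercased, abbreviated excerpts of the three Later-D groups in Table 4, plus one made-up unrelated group.

```python
def later_d(former_group, later_groups):
    """Return the later groups that contain at least one tag of former_group."""
    former = set(former_group)
    return [group for group in later_groups if former & set(group)]


# Former-D: the group that dissolved between 2008 and 2009 (Table 4).
former = {"constructions", "nurturing", "rockmelt"}

# Abbreviated 2009 groups; only a few tags per group are kept for brevity.
groups_2009 = [
    {"doe", "constructions", "recycle_mckinney", "green-business"},
    {"digitalfootprints", "meta", "calendar", "nurturing"},
    {"attention_economy", "packages", "rockmelt"},
    {"python", "django"},  # hypothetical group unrelated to the dissolved one
]

destinations = later_d(former, groups_2009)
for group in destinations:
    print(sorted(former & group), "->", len(group), "tags")
```

In this excerpt each former tag lands in a different later group, which matches the pattern described for Table 4.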

4.3. Tag Individual Events in the Delicious Community

4.3.1. Appear Event

Appear events, namely new tags, increased exponentially from 24 in 2003 to 22,270 in 2010. The ratio of new tags to all tags fluctuates around 50%, indicating multiplicative expansion of the system, which agrees with the previous data.

Figure 6 shows the proportion of Appear events, namely the new tag rate (NT-Rate), in each group from 2004 to 2010. For example, the last green column shows 421 groups with NT-Rate in (0.4, 0.6) in 2010. Generally, the number of tag groups in every NT-Rate interval increases over time. Initially, groups with NT-Rates in (0.8, 1) and (0.6, 0.8) are predominant; gradually, most tag groups’ NT-Rates settle between 0.4 and 0.8. Thus, new tags show their importance in tagging system evolution.
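The NT-Rate distribution can be illustrated with a short sketch. It assumes that a tag counts as new when it is absent from the previous snapshot and that the rate is binned into five equal-width intervals; the boundary handling, the function names, and the sample tags are illustrative assumptions rather than the authors' procedure.

```python
from collections import Counter

N_BINS = 5  # five equal-width NT-Rate intervals of width 0.2


def nt_rate(group_tags, previous_tags):
    """Fraction of the tags in a group that are absent from the previous snapshot."""
    group_tags = set(group_tags)
    if not group_tags:
        raise ValueError("empty group")
    return len(group_tags - set(previous_tags)) / len(group_tags)


def nt_bin(group_tags, previous_tags, n_bins=N_BINS):
    """Interval index of the NT-Rate, using integer arithmetic to avoid float edge cases."""
    group_tags = set(group_tags)
    if not group_tags:
        raise ValueError("empty group")
    new_count = len(group_tags - set(previous_tags))
    # A rate of exactly 1.0 is placed in the last interval.
    return min(new_count * n_bins // len(group_tags), n_bins - 1)


# Hypothetical data: tags known in the previous snapshot and four current groups.
previous = {"a", "b", "c", "d"}
groups = [{"a", "b", "x"}, {"c", "y", "z", "w"}, {"d"}, {"p", "q"}]

histogram = Counter(nt_bin(g, previous) for g in groups)
for index in sorted(histogram):
    low, high = index / N_BINS, (index + 1) / N_BINS
    print(f"NT-Rate from {low:.1f} to {high:.1f}: {histogram[index]} group(s)")
```

Counting groups per interval for each snapshot would give the kind of grouped columns that Figure 6 describes.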

This paper proposes Ratio-A to evaluate the model’s ability to interpret evolution. In a snapshot, Ratio-A equals the number of new tags related to group events divided by the number of all new tags. In Delicious, Ratio-A was 100% from 2003 to 2008 and later decreased to 33.56% in 2010, which means that the majority of new tags in 2010 cannot be captured by the current evolution model. In other words, when the data size is too large, the current evolution model cannot track groups and new tags completely, and its power to explain the evolution at the group level is reduced.
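A minimal sketch of Ratio-A follows, assuming that the new tags of a snapshot and the tag sets of the groups involved in group events are available. The name `ratio_a` and the sample tags are hypothetical.

```python
def ratio_a(new_tags, event_groups):
    """Share of new tags that belong to at least one group involved in a group event."""
    new_tags = set(new_tags)
    if not new_tags:
        raise ValueError("the snapshot contains no new tags")
    # Union of all tags that appear in groups touched by a group event.
    covered = set().union(*event_groups)
    return len(new_tags & covered) / len(new_tags)


# Hypothetical snapshot: four new tags, three of which sit in event-related groups.
new_tags = {"n1", "n2", "n3", "n4"}
event_groups = [{"n1", "old1"}, {"n2", "n3", "old2"}]
print(ratio_a(new_tags, event_groups))
```

Under this reading, a Ratio-A of 100% means every new tag sits in a group that some group event explains.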

4.3.2. Leave and Join Events

Variation between Leave and Join tags determines the group-scale change after Continue. A brief comparison between Leave and Join events directly reflects the scale expansion of tag groups, where the influence of Leave is weakened when there are more Join events than Leave events. Join tags play a leading role in size variation, which explains the contradictory phenomenon that size variation is positively correlated with Leave under some snapshots, as shown in Table 5.

Join events are divided into ‘new tags’ (New-Join) and ‘existing tags’ (Old-Join), with New-Joins accounting for the majority. Correlation analysis shows that scale variation and Leave are more strongly correlated with New-Join than with Old-Join. Therefore, a large influx of new tags not only impacts the tag group scale but also affects tag transfer. Thus, the critical role of new tags is convincing in terms of individual events.
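The individual events and the correlation analysis can be sketched as below. The sketch assumes that a continued group is compared before and after as two tag sets, that `known_tags` holds every tag present in the earlier snapshot, and that the coefficients in Table 5 are Pearson coefficients; the paper does not state the coefficient type in this section, so that choice, the function names, and the sample numbers are assumptions.

```python
import math


def individual_events(before, after, known_tags):
    """Split the membership change of a continued group into Leave, New-Join, Old-Join."""
    before, after, known_tags = set(before), set(after), set(known_tags)
    leave = before - after
    join = after - before
    new_join = join - known_tags  # tags that did not exist in the earlier snapshot
    old_join = join & known_tags  # existing tags transferred from other groups
    return leave, new_join, old_join


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


leave, new_join, old_join = individual_events(
    before={"a", "b", "c"},
    after={"a", "c", "d", "e"},
    known_tags={"a", "b", "c", "d"},
)
print(leave, new_join, old_join)

# Hypothetical per-group counts for one snapshot pair.
leaves = [1, 2, 1, 3]
joins = [2, 6, 3, 10]
size_change = [j - l for j, l in zip(joins, leaves)]
print(pearson(size_change, joins), pearson(size_change, leaves))
```

In the made-up counts, groups that lose more tags also gain many more, so size variation correlates positively with Leave even though leaving shrinks a group, which mirrors the situation described for Table 5.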

4.4. Community Evolution in CiteULike and Douban

Communities in CiteULike and Douban evolve in almost the same way as the community in Delicious. The differences between Delicious and the other two communities are pointed out below.

Surprisingly, the current model better explains community evolution in CiteULike and Douban than in Delicious. The lowest Ratio-B, 42.86% in Douban 2018, was much higher than the lowest Ratio-B in Delicious, and the other Ratio-Bs of CiteULike and Douban were over 55%, which means the current model can capture over 55% of group events. Moreover, in Douban 2018 and 2019, Ratio-A increased from 40.23% to 56.06%, indicating that the current model captures more new tags in group events as the Douban community expands, which is opposite to the results of Delicious. Generally, CiteULike and Douban show support for our improved evolution model in explaining tagging system evolution. Furthermore, only a few Continue events exist in CiteULike and Douban, which means most tag groups hardly remain stable. Especially in CiteULike, over 88% of tags are involved in Split.

In addition, new tags play a more important role in CiteULike and Douban than in Delicious. In Delicious, they dominate the Form event and lead the early Split event, whereas in CiteULike and Douban, new tags directly dominate the Split event rather than the Form event. The comparison is shown in Table 6, where the embedded supplementary table depicts the new tags’ powerful impact on Form and Split, taking data from Douban 2018–2019 as an example.

Table 6 shows that the consequents of Split events are mainly made up of new tags. Furthermore, new tags show an enormous influence on Form events, with new tags making up more than 89% of Form groups on average in CiteULike and Douban. Therefore, CiteULike and Douban provide clear support for our assumption.
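The averages in the supplement of Table 6 can be checked with a few lines. The percentages below are copied from that supplement (Douban, 2018–2019); treating the reported average as a simple unweighted mean over events is an assumption, although it does reproduce the reported 89.83% and 69.72% after rounding.

```python
# New-tag share (%) in the consequent groups of each Form event.
form_new_tag_share = [95.00, 95.24, 93.55, 77.78, 83.33, 95.00, 88.89]

# New-tag share (%) in the consequent groups of each Split event.
split_new_tag_share = [
    77.92, 73.66, 65.52, 71.13, 64.96, 82.86, 51.47, 67.52,
    69.66, 70.51, 72.89, 48.39, 66.67, 92.50, 70.21,
]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


print(round(mean(form_new_tag_share), 2))
print(round(mean(split_new_tag_share), 2))
```

An unweighted mean gives every event the same weight regardless of group size; a size-weighted average would need the group sizes, which the supplement does not list.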

5. Discussion and Conclusions

This paper studies community evolution based on events of tag groups and individuals with Delicious, CiteULike, and Douban datasets using the improved community evolution model. This paper especially focuses on the perspective of new tags and holds that new tags play an important role in community evolution.

Results show that as the system expands, tag groups continuously evolve, and most evolution events are not only related to but also led by new individual tags. In Delicious, the current model can capture most groups and tags before 2009 but less than 50% in 2009 and 2010. This means the current model can thoroughly explain the evolution relationships among Delicious groups initially but has less explanatory power later, when Delicious grew to a larger scale. In CiteULike and Douban, by contrast, community evolution can mainly be explained by the current model, with group events capturing the majority of tags at every stage of system expansion. Only a few groups remain stable, and tag groups and their scale constantly change. In addition, our proposition is strongly supported, even though new tags have little effect on the Continue event, which accounts for only a tiny part of tag groups. The influx of new tags drives system expansion, and most groups mainly consist of new tags. Moreover, new tags directly drive the growth of Form events and the more detailed evolution of existing tag groups; at the same time, the large influx of new tags is the principal driver of group scale growth and has a profound impact on existing tag transfer. Especially in CiteULike and Douban, new tags dominate not only the Form event but also the Split event.

In the current study, the following improvements are expected: (1) Further optimization of the evolution model is needed. The current model can hardly explain all the evolutionary relationships since it missed more than half of the groups and new tags of 2009 and 2010 in Delicious. (2) From 2003 to 2010, Delicious was in a constant state of active growth, as were CiteULike and Douban. It would be much better to have datasets covering other states, such as maturity or decay. (3) The study only examines three tagging systems, which may not be representative of all tagging systems. Future research can consider studying the community evolution of different types of tagging systems. (4) This study mainly discusses the role of new individuals in the process of community evolution but ignores other possible influencing factors, such as changes in user behavior and the external environment. (5) In addition, this study considers the possible invalid tags in tag communities and has tried to exclude their influence on group events by identifying wander tags. However, our research does not involve tag spamming. Many wander tags have become non-wander tags, which means that not all wander tags are related to tag spamming. Therefore, to reflect the real community evolution more comprehensively, future research needs (a) more abundant and detailed community evolution data, (b) more systematic consideration of the identification of tag spamming, and (c) more holistic thinking about the influencing factors of community evolution, for example, from the user behavior level to the community environment level. In addition, considering possible different types of tagging datasets, the algorithms of community discovery and community evolution may need to be improved and optimized.

The contributions of this paper include: (1) the improvement of the event-based community evolution model, (2) the new methods to evaluate the community evolution model, and (3) the detailed and comprehensive community evolution analysis with a special perspective of new tags. First, the improved event-based community evolution model is applicable to datasets that accumulate constantly without individual withdrawal. Some event definitions have been slightly changed. For example, the current definition of k-Continue lowers the selection criteria, and the definition of 50-Continue paves the way for the group event Split and the individual events Leave and Join. Second, Ratio-A and Ratio-B are proposed to evaluate the explanatory ability of the evolution model by measuring how many of the tags and groups an evolution model can capture. Third, though it is hard for a single individual to influence community evolution, a considerable accumulation of individuals might dominate the evolution, which, as expected, is supported by the current research. The rich data make this study more reliable, and the framework and methods in the current research can be extended to other collaborative-tagging systems. Moreover, with the wide application of social network theory and the increasing number of teams recently regarded as dynamic adaptive systems [61], the methods and results of this study in the tagging community can be applied to the research and practice of social network communities composed of dynamic teams. In particular, the role of new tags in community evolution can inform research on the influence of new members in dynamic teams [51].

Author Contributions

Conceptualization, J.Y. and J.W.; Methodology, J.Y.; Validation, J.Y. and M.G.; Formal analysis, J.Y.; Resources, J.W.; Data curation, J.Y. and M.G.; Writing—original draft, J.Y. and M.G.; Writing—review & editing, J.Y., J.W. and M.G.; Visualization, J.Y. and M.G.; Supervision, J.W.; Project administration, J.Y. and J.W.; Funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 72171008.

Data Availability Statement

Publicly available datasets were analyzed in this study. The Delicious dataset can be found at http://ir.ii.uam.es/hetrec2011 (accessed on 2 May 2019). The CiteULike dataset can be found at http://www.wanghao.in/CDL.htm (accessed on 30 June 2021). The Douban dataset used in this study is available on request from the corresponding author; it is not publicly available due to privacy and ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Lamere, P. Social Tagging and Music Information Retrieval. J. New Music Res. 2008, 37, 101–114.
2. Klašnja-Milićević, A.; Vesin, B.; Ivanović, M. Social Tagging Strategy for Enhancing E-Learning Experience. Comput. Educ. 2018, 118, 166–181.
3. Bródka, P.; Saganowski, S.; Kazienko, P. GED: The Method for Group Evolution Discovery in Social Networks. Soc. Netw. Anal. Min. 2013, 3, 1–14.
4. Takaffoli, M.; Sangi, F.; Fagnan, J.; Zaïane, O.R. Community Evolution Mining in Dynamic Social Networks. Procedia Soc. Behav. Sci. 2011, 22, 48–57.
5. Ghosh, S.; Srivastava, A.; Ganguly, N. Effects of a Soft Cut-off on Node-Degree in the Twitter Social Network. Comput. Commun. 2012, 35, 784–795.
6. Traud, A.L.; Mucha, P.J.; Porter, M.A. Social Structure of Facebook Networks. Phys. A Stat. Mech. Its Appl. 2012, 391, 4165–4180.
7. Hu, Q.; Lin, X.; Han, S.; Li, L. An Investigation of Cross-Cultural Social Tagging Behaviours between Chinese and Americans. Electron. Libr. 2018, 36, 103–118.
8. Yeung, C.M.A.; Gibbins, N.; Shadbolt, N. A Study of User Profile Generation from Folksonomies. In Proceedings of the SWKM’2008: Workshop on Social Web and Knowledge Management, Beijing, China, 20–24 April 2008.
9. Saari, P.; Eerola, T. Semantic Computing of Moods Based on Tags in Social Media of Music. IEEE Trans. Knowl. Data Eng. 2014, 26, 2548–2560.
10. Yu, W.; Chen, J. Enriching the Library Subject Headings with Folksonomy. Electron. Libr. 2020, 38, 297–315.
11. Deerwester, S.; Dumais, S.T.; Furnas, G.W.; Landauer, T.K.; Harshman, R. Indexing by Latent Semantic Analysis. J. Am. Soc. Inf. Sci. 1990, 41, 391–407.
12. Hofmann, T. Probabilistic Latent Semantic Indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, 15–19 August 1999; pp. 50–57.
13. Schiavi, G.S.; Behr, A.; Marcolin, C.B. Conceptualizing and Qualifying Disruptive Business Models. RAUSP Manag. J. 2019, 54, 269–286.
14. Held, C.; Kimmerle, J.; Cress, U. Learning by Foraging: The Impact of Individual Knowledge and Social Tags on Web Navigation Processes. Comput. Human Behav. 2012, 28, 34–40.
15. Sun, K.; Wang, X.; Sun, C.; Lin, L. A Language Model Approach for Tag Recommendation. Expert Syst. Appl. 2011, 38, 1575–1582.
16. Symeonidis, P.; Nanopoulos, A.; Manolopoulos, Y. A Unified Framework for Providing Recommendations in Social Tagging Systems Based on Ternary Semantic Analysis. IEEE Trans. Knowl. Data Eng. 2010, 22, 179–192.
17. AlAgha, I.; Abu-Samra, Y. Tag Recommendation for Short Arabic Text by Using Latent Semantic Analysis of Wikipedia. Jordanian J. Comput. Inf. Technol. 2020, 6, 165–180.
18. Leskovec, J.; Lang, K.J.; Dasgupta, A.; Mahoney, M.W. Statistical Properties of Community Structure in Large Social and Information Networks. In Proceedings of the 17th International Conference on World Wide Web (WWW’08), Beijing, China, 21–25 April 2008; Association for Computing Machinery: New York, NY, USA, 2008; pp. 695–704.
19. Mislove, A.; Marcon, M.; Gummadi, K.P.; Druschel, P.; Bhattacharjee, B. Measurement and Analysis of Online Social Networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, San Diego, CA, USA, 24–26 October 2007; Association for Computing Machinery: New York, NY, USA, 2007; pp. 29–42.
20. Guan, T.; He, Y.; Gao, J.; Yang, J.; Yu, J. On-Device Mobile Visual Location Recognition by Integrating Vision and Inertial Sensors. IEEE Trans. Multimed. 2013, 15, 1688–1699.
21. Li, Y.-M.; Lai, C.-Y.; Chen, C.-W. Identifying Bloggers with Marketing Influence in the Blogosphere. In Proceedings of the 11th International Conference on Electronic Commerce, Taipei, Taiwan, 12–15 August 2009; Association for Computing Machinery: New York, NY, USA, 2009; pp. 335–340.
22. Jain, L.; Katarya, R. Discover Opinion Leader in Online Social Network Using Firefly Algorithm. Expert Syst. Appl. 2019, 122, 1–15.
23. Nguyen, N.P.; Dinh, T.N.; Shen, Y.; Thai, M.T. Dynamic Social Community Detection and Its Applications. PLoS ONE 2014, 9, e91431.
24. Kaur, W.; Balakrishnan, V.; Rana, O.; Sinniah, A. Liking, Sharing, Commenting and Reacting on Facebook: User Behaviors’ Impact on Sentiment Intensity. Telemat. Inform. 2019, 39, 25–36.
25. Hopp, T.; Santana, A.; Barker, V. Who Finds Value in News Comment Communities? An Analysis of the Influence of Individual User, Perceived News Site Quality, and Site Type Factors. Telemat. Inform. 2018, 35, 1237–1248.
26. Yang, J.; Wang, J. Tag Clustering Algorithm LMMSK: Improved K-Means Algorithm Based on Latent Semantic Analysis. J. Syst. Eng. Electron. 2017, 28, 374–384.
27. Newman, M.E.J. Assortative Mixing in Networks. Phys. Rev. Lett. 2002, 89, 208701.
28. Wang, X.-G. A Network Evolution Model Based on Community Structure. Neurocomputing 2015, 168, 1037–1043.
29. Danon, L.; Díaz-Guilera, A.; Duch, J.; Arenas, A. Comparing Community Structure Identification. J. Stat. Mech. Theory Exp. 2005, P09008.
30. White, S.; Smyth, P. A Spectral Clustering Approach to Finding Communities in Graphs. In Proceedings of the 2005 SIAM International Conference on Data Mining (SDM), Newport Beach, CA, USA, 21–23 April 2005; Kargupta, H., Srivastava, J., Kamath, C., Goodman, A., Eds.; SIAM Publications Library: Newport Beach, CA, USA, 2005; pp. 274–285.
31. Hopcroft, J.; Khan, O.; Kulis, B.; Selman, B. Tracking Evolving Communities in Large Linked Networks. Proc. Natl. Acad. Sci. USA 2004, 101, 5249–5253.
32. Chakrabarti, D.; Kumar, R.; Tomkins, A. Evolutionary Clustering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; Eliassi-Rad, T., Ungar, L., Graven, M., Gunopulos, D., Eds.; Association for Computing Machinery: New York, NY, USA, 2006; pp. 554–560.
33. Li, X.; Chen, G. A Local-World Evolving Network Model. Phys. A Stat. Mech. Its Appl. 2003, 328, 274–286.
34. Graham, I.; Matthai, C.C. Investigation of the Forest-Fire Model on a Small-World Network. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2003, 68, 36109.
35. Jin, E.M.; Girvan, M.; Newman, M.E.J. The Structure of Growing Social Networks. Phys. Rev. E 2001, 64, 046132.
36. Deng, Z.-H.; Qiao, H.-H.; Song, Q.; Gao, L. A Complex Network Community Detection Algorithm Based on Label Propagation and Fuzzy C-Means. Phys. A Stat. Mech. Its Appl. 2019, 519, 217–226.
37. Garza, S.E.; Schaeffer, S.E. Community Detection with the Label Propagation Algorithm: A Survey. Phys. A Stat. Mech. Its Appl. 2019, 534, 122058.
38. Guan, T.; He, Y.; Duan, L.; Yang, J.; Gao, J.; Yu, J. Efficient BOF Generation and Compression for On-Device Mobile Visual Location Recognition. IEEE Multimed. 2014, 21, 32–41.
39. Asur, S.; Parthasarathy, S.; Ucar, D. An Event-Based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs. ACM Trans. Knowl. Discov. Data 2009, 3, 16.
40. Xu, Z.; Rui, X.; He, J.; Wang, Z.; Hadzibeganovic, T. Superspreaders and Superblockers Based Community Evolution Tracking in Dynamic Social Networks. Knowl.-Based Syst. 2020, 192, 105377.
41. Qiao, S.; Han, N.; Gao, Y.; Li, R.-H.; Huang, J.; Sun, H.; Wu, X. Dynamic Community Evolution Analysis Framework for Large-Scale Complex Networks Based on Strong and Weak Events. IEEE Trans. Syst. Man Cybern. Syst. 2020, 51, 6229–6243.
42. Palla, G.; Barabási, A.-L.; Vicsek, T. Quantifying Social Group Evolution. Nature 2007, 446, 664–667.
43. Wang, Z.; Wang, C.; Li, X.; Gao, C.; Li, X.; Zhu, J. Evolutionary Markov Dynamics for Network Community Detection. IEEE Trans. Knowl. Data Eng. 2022, 34, 1206–1220.
44. Saganowski, S.; Bródka, P.; Kazienko, P. Community Evolution. In Encyclopedia of Social Network Analysis and Mining; Alhajj, R., Rokne, J., Eds.; Springer: New York, NY, USA, 2017; pp. 1–14.
45. Takaffoli, M.; Sangi, F.; Fagnan, J.; Zaïane, O.R. Modec-Modeling and Detecting Evolutions of Communities. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, Spain, 17–21 July 2011; The AAAI Press: Menlo Park, CA, USA, 2011; pp. 626–629.
46. Takaffoli, M.; Sangi, F.; Fagnan, J.; Zaïane, O.R. A Framework for Analyzing Dynamic Social Networks. Available online: http://webdocs.cs.ualberta.ca/~zaiane/postscript/ASNA10.pdf (accessed on 6 June 2022).
47. Asur, S.; Parthasarathy, S. A Viewpoint-Based Approach for Interaction Graph Analysis. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; Association for Computing Machinery: New York, NY, USA, 2009; pp. 79–87.
48. Papadopoulos, S.; Kompatsiaris, Y.; Vakali, A.; Spyridonos, P. Community Detection in Social Media Performance and Application Considerations. Data Min. Knowl. Discov. 2012, 24, 515–554.
49. Rossetti, G.; Cazabet, R. Community Discovery in Dynamic Networks: A Survey. ACM Comput. Surv. 2018, 51, 35.
50. Dakiche, N.; Benbouzid-Si Tayeb, F.; Slimani, Y.; Benatchba, K. Tracking Community Evolution in Social Networks: A Survey. Inf. Process. Manag. 2019, 56, 1084–1102.
51. Trainer, H.M.; Jones, J.M.; Pendergraft, J.G.; Maupin, C.K.; Carter, D.R. Team Membership Change “Events”: A Review and Reconceptualization. Gr. Organ. Manag. 2020, 45, 219–251.
52. Kane, A.A.; Rink, F. How Newcomers Influence Group Utilization of Their Knowledge: Integrating versus Differentiating Strategies. Gr. Dyn. 2015, 19, 91–105.
53. Rink, F.; Kane, A.A.; Ellemers, N.; van der Vegt, G. Team Receptivity to Newcomers: Five Decades of Evidence and Future Research Themes. Acad. Manag. Ann. 2013, 7, 247–293.
54. Beus, J.M.; Jarrett, S.M.; Taylor, A.B.; Wiese, C.W. Adjusting to New Work Teams: Testing Work Experience as a Multidimensional Resource for Newcomers. J. Organ. Behav. 2013, 35, 489–506.
55. Cantador, I.; Brusilovsky, P.; Kuflik, T. 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec2011). In Proceedings of the 5th ACM Conference on Recommender Systems, Chicago, IL, USA, 23–27 October 2011; Association for Computing Machinery: New York, NY, USA, 2011; pp. 387–388.
56. Wang, H.; Chen, B.; Li, W.J. Collaborative Topic Regression with Social Regularization for Tag Recommendation. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, Beijing, China, 3–9 August 2013; Rossi, F., Ed.; AAAI Press: Palo Alto, CA, USA, 2013; pp. 2719–2725.
57. Mohammadmosaferi, K.K.; Naderi, H. Evolution of Communities in Dynamic Social Networks: An Efficient Map-Based Approach. Expert Syst. Appl. 2020, 147, 113221.
58. Ye, X.; Qiao, S.; Han, N.; Yue, K.; Wu, T.; Yang, L.; Huang, F.; Yuan, C. Algorithm for Detecting Anomalous Hosts Based on Group Activity Evolution. Knowl.-Based Syst. 2021, 214, 106734.
59. Yang, Z.; Wang, H. Evolvement Procession of Innovation Networks for Strategic Emerging Industries: Based on Life Cycle Curve and Social Network Method. Rev. Tec. Fac. Ing. Univ. Zulia 2016, 39, 231–237.
60. Gu, Y. How Long Can Facebook Survive? Complex Physics Model for Predicting the Life Cycle of Social Network. Int. J. Web Appl. 2013, 5, 46–48.
61. Park, S.; Grosser, T.J.; Roebuck, A.A.; Mathieu, J.E. Understanding Work Teams From a Network Perspective: A Review and Future Research Directions. J. Manag. 2020, 46, 1002–1028.

Figure 1. Size distribution of the tag groups of the Delicious community. Due to the small amount of early data, in order to describe it more clearly, the distribution from 2003 to 2005 is presented in the small figure in the upper right corner. Tag group size means the number of tags within a tag group.

Figure 2. Data analysis of Continue events. C-Rate equals the number of Continue events in snapshot i + 1 divided by the number of groups in snapshot i, that is, the share of groups that remain almost unchanged.

Figure 3. The attribution analysis data for Split.

Figure 4. Explanations of current tag groups. Later-T is the number of groups generated by the group events Form, Split, Continue, and Merge. Ratio-B is Later-T divided by the number of all groups in the current snapshot.

Figure 5. Three tag groups in 2010 in the Delicious system, involving ‘Constructions,’ ‘Nurturing,’ and ‘Rockmelt’.

Figure 6. The distribution of NT-Rate (new tag rate) in groups (2004–2010 from Delicious community).

Table 1. CCR Matrix (an example of Delicious from 2003 to 2004).

Group Index	$C_{4}^{1}$	$C_{4}^{2}$	$C_{4}^{3}$	$C_{4}^{4}$
$C_{3}^{1}$	4	0	0	0
$C_{3}^{2}$	0	3	1	8
$C_{3}^{3}$	0	3	5	0
N_3-4	1	16	12	4

Note. CCR Matrix is an approach to analyze the accuracy of clustering results and compare the results of two different algorithms. CCR Matrix is also an effective tool to capture the evolutions of the social tagging system. See more details in [26].
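A matrix with the layout of Table 1 can be built with a short sketch. It assumes, from the layout of Table 1 alone, that each cell counts the tags shared by an earlier group (row) and a later group (column), and that the last row counts the tags of each later group that belong to no earlier group. This reading is an assumption; the full definition is given in [26]. The function name and the sample groups are hypothetical.

```python
def ccr_matrix(earlier_groups, later_groups):
    """Return (rows, new_row): overlap counts between snapshots and new-tag counts."""
    earlier_groups = [set(g) for g in earlier_groups]
    later_groups = [set(g) for g in later_groups]
    # All tags that already belonged to some group in the earlier snapshot.
    known = set().union(*earlier_groups)
    rows = [[len(a & b) for b in later_groups] for a in earlier_groups]
    new_row = [len(b - known) for b in later_groups]
    return rows, new_row


# Hypothetical snapshots: two earlier groups and three later groups.
earlier = [{"a", "b"}, {"c", "d", "e"}]
later = [{"a", "b", "x"}, {"c", "y"}, {"d", "e", "z", "w"}]

rows, new_row = ccr_matrix(earlier, later)
for row in rows:
    print(row)
print("new:", new_row)
```

Under this reading, a row with one dominant cell suggests a group that continues, while a row spread over several columns suggests a group whose tags were redistributed.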

Table 2. Merge event.

Period	Merge Event	$k_{m}$
2006–2007	(12, 52, 54)->18	0.72
	(16, 30)->32	0.59
	(13, 25)->76	0.77
2007–2008	(42, 145)->59	0.41
2008–2009	(19, 33)->95	0.85
	(49, 204)->218	0.65
	(149, 279)->246	0.52
	(106, 170, 300)->267	0.60
2009–2010	(22, 470)->247	0.49
	(233, 428)->495	0.70
	(350, 511)->586	0.71

Table 3. Dissolve events in the Delicious system.

Period	Dissolved Groups
2006–2007	Books/remediation/opinion/caf?/society
2007–2008	online_education/keyphrase/gtd/raamattu/briefing/educacion
2008–2009	Constructions/nurturing/rockmelt
2009–2010	Heroes/supybot/visualnotetaking/vizthink/colorspace
	higher-education/facebookplaces/rss-feedservices/customer_service/html5rocks
	base_de_datos/rapidprototyping/Nings/minmal/scarf/esr/alcohol_quiz/melatonin/szerver/deleted/dabbleboard/er/primarysources/arthistory/chrismerritt/enewsletter/cthulhu
	webdev/glossy/finally/hmm/martial_arts/krav_maga/Vietnamese/location_aware/unit_testing/scrollbar/tolisten/opensso/jsync/iframe/recursosonline/httperf
	forms/omfg/migrant/venn/fractalart/twitrank/latinoaménica/sweets/rsg/meegenius/thirdspace/dickflash/photoediting/obit/fast-flux/web20_tools/lesson_ideas
	thinking/a_z_listed_resources/tables/company/drhorrible/hiring/objectives/frontend/empresa_20/servicedesign/Ogilvy/pylons/defragmentation
	glitch/gettext/localization/=-o
	oauth/informationisbeautiful
	Mod#isation/zbrush/failcamp/econmicgrowth/life_monitoring
	templates/awards/certificates

Table 4. Tracking Dissolve event from 2008 to 2009.

Former-D	Later-D
Constructions Nurturing Rockmelt	Doe/constructions/alternative_assessment/critical-infrastructure/recycle_mckinney/sch#ze/localization/naxos/several/coherence/cheapo/cipav/green-business/g13n
	Digitalfootprints/meta/calendar/facture/charles_brokoski/ical/blemnder/babilonia/element/referencias/rsstools/schemas/nurturing/charlotte/lawenforcement/Susie/redcarpet/webcal/numanuma/udelljon/fandom/icalendar/gruffrhys/hops/neil-freeman/calendar-swamp/udell/tent/anti-marketing/
	attention_economy/packages/architects_netherlands/intellisense/hal/Sherlock/fromjenblacker/Silos/firefly/debian/Barclays/agencymap/thebeatles/expertise/????/pop/sbir/rightclick/pd108munin/description/rockmelt/know-why/know-what

Note. Former-D represents the dissolved group, and Later-D includes all groups containing tags from Former-D.

Table 5. Correlation analysis of scale variation in Delicious.

Time	Correlation of scale variation with Leave	Correlation of scale variation with Join
2003–2004	0.1889822	0.9449112
2004–2005	−0.718132	−0.262071
2005–2006	−0.401565	0.9754618
2006–2007	0.1023321	0.9727505
2007–2008	0.26323	0.9496789
2008–2009	0.3019457	0.9340135
2009–2010	0.0729574	0.9816468

Table 6. New tags’ average proportion in consequent groups of Split in CiteULike and Douban.

Period		CiteULike	Douban
From T1/2017 to T2/2018		68.35%	52.64%
From T2/2018 to T3/2019		84.26%	69.72%
Supplement: New tags’ proportion in consequents of Form and Split events (2018–2019 in Douban)
Form events	New tags in Form consequents	Split events’ antecedents	New tags in Split consequents
3	95.00%	2	77.92%
7	95.24%	4	73.66%
28	93.55%	5	65.52%
78	77.78%	6	71.13%
87	83.33%	8	64.96%
92	95.00%	9	82.86%
93	88.89%	11	51.47%
Average	89.83%	12	67.52%
		13	69.66%
		15	70.51%
		16	72.89%
		17	48.39%
		19	66.67%
		22	92.50%
		24	70.21%
		Average	69.72%


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
