Outlier Detection and Prediction in Evolving Communities
Abstract
1. Introduction
2. Related Work
2.1. Community Detection
2.1.1. Static Methods
2.1.2. Dynamic Methods
2.2. Outlier Detection
2.2.1. Static Methods
2.2.2. Dynamic Methods
3. Background
Community Detection with COTILES
- Preprocessing, where labels are extracted from the attributed graph.
- For each incoming edge, corresponding timestamps and label sets are appropriately updated, then the edge is examined.
- If the edge leads a node into a community’s periphery, the node’s content is checked; if it matches the content of the community, then it is inserted into the community and its labels are inserted into the Community Label Set.
- At the end of every observation window, the graph, communities, and label sets are updated.
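The content check applied when an edge brings a node into a community's periphery can be pictured with a minimal sketch. The `Community` container, the `label_match` overlap measure, and the `min_match` threshold are illustrative assumptions; the steps above state only that the node's content must match the community's content before the node and its labels are inserted.

```python
from dataclasses import dataclass, field


@dataclass
class Community:
    """Hypothetical container: member nodes plus the Community Label Set."""
    members: set = field(default_factory=set)
    labels: set = field(default_factory=set)


def label_match(node_labels: set, community_labels: set) -> float:
    """Assumed similarity: fraction of the node's labels found in the community."""
    if not node_labels:
        return 0.0
    return len(node_labels & community_labels) / len(node_labels)


def try_peripheral_insert(node, node_labels, community, min_match=0.5):
    """Insert `node` only if its content matches the community's content."""
    if label_match(node_labels, community.labels) < min_match:
        return False
    community.members.add(node)
    # The node's labels are added to the Community Label Set.
    community.labels |= node_labels
    return True


c = Community(members={"a", "b"}, labels={"python", "graphs"})
accepted = try_peripheral_insert("x", {"python", "networks"}, c)
rejected = try_peripheral_insert("y", {"cooking"}, c)
```

In this sketch a node with no labels never matches; whether the original algorithm treats unlabeled nodes that way is not stated here.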
4. COTILES for Outlier Detection
4.1. Outlier Score
4.2. Extending COTILES
- If each node of the new edge has only one neighbor, the nodes have no other connections and cannot yet be members of any community; the algorithm therefore takes no community detection action and only computes the nodes' outlier scores (lines 8–11). These scores will be high in terms of structure, since Community Focus is 0, although Label Set Match can offset their outlierness.
- Next, the algorithm checks whether nodes u and v belong to any community core (lines 12–13). Because peripheral nodes are not allowed to propagate community membership, no action is performed if neither node is core (line 14).
- If one of the nodes is a core node of a community with a neighborhood greater than 0 and the other node is appearing for the first time, then the core node spreads its community membership to its neighbors through peripheral propagation, which includes checking the content similarity constraint before adding a node to a community. At the same time, the outlier scores of this node and its neighborhood are re-evaluated, as the update of the community structure affects all of them (lines 16–25).
- The final case is when both nodes u and v are existing core nodes in G (lines 26–46). Then, the common neighbors of the two core nodes are computed (line 27); based on this, two more scenarios are possible:
- (a) If nodes u and v do not have common neighbors, peripheral propagation takes place, as in the previous case (lines 28–30).
- (b) If u and v have common neighbors, core propagation takes place. For each common neighbor, if it shares no community with u and v, a new community is formed (lines 33–37); otherwise, each pair among these three nodes that belongs to the same communities propagates that community membership to the third node (lines 38–46).
The outlier scores of each node and its neighbors are then computed immediately (lines 47–52).
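The branching described above can be summarized in a short sketch that only decides which case applies to a new edge. The function name `edge_case`, the returned case labels, and the adjacency and core-set representations are hypothetical; the third case is simplified to "exactly one endpoint is a core node", and the propagation and scoring steps themselves are not implemented.

```python
def edge_case(u, v, neighbors, core_nodes):
    """Return which branch of the walkthrough applies to a new edge (u, v).

    neighbors: dict mapping node -> set of adjacent nodes (edge already added).
    core_nodes: set of nodes that are core members of some community.
    """
    # Both endpoints are connected only to each other: scores only.
    if len(neighbors[u]) == 1 and len(neighbors[v]) == 1:
        return "score_only"

    u_core, v_core = u in core_nodes, v in core_nodes
    # Peripheral nodes cannot propagate membership.
    if not u_core and not v_core:
        return "no_action"
    # Simplification: exactly one core endpoint triggers peripheral propagation.
    if u_core != v_core:
        return "peripheral_propagation"

    # Both endpoints are core nodes: decide on their common neighbors.
    common = neighbors[u] & neighbors[v]
    return "core_propagation" if common else "peripheral_propagation"


# Toy graph used only for illustration.
toy_neighbors = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "g"},
    "d": {"a", "g"},
    "g": {"c", "d"},
    "e": {"f"},
    "f": {"e"},
}
toy_cores = {"a", "b"}

for edge in [("e", "f"), ("c", "g"), ("a", "d"), ("a", "b")]:
    print(edge, edge_case(*edge, toy_neighbors, toy_cores))
```

In every branch except `no_action`, the outlier scores of the affected nodes and their neighbors would be recomputed afterwards, as the walkthrough states.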
Algorithm 1 COTILES for Outlier Detection in Evolving Communities.
Algorithm 2 Peripheral Propagation
Algorithm 3 Compute Outlier Score
4.3. Prediction
5. Evaluation Results
5.1. Datasets
5.2. Parameter Tuning
Weight Value (w)
5.3. Outlier Score Distribution
6. Predicting Outlying Behavior of Nodes
6.1. Exploration of Outlier Scores and Future Behavior
6.2. Classification Evaluation
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Hartmann, T.; Kappes, A.; Wagner, D. Clustering evolving networks. In Algorithm Engineering: Selected Results and Surveys; Springer: Berlin/Heidelberg, Germany, 2016; pp. 280–329.
- Jdidia, M.; Robardet, C.; Fleury, E. Communities detection and analysis of their dynamics in collaborative networks. In Proceedings of the 2007 2nd International Conference on Digital Information Management, Lyon, France, 11–13 December 2007; Volume 2, pp. 744–749.
- Giannakidou, E.; Kompatsiaris, I.; Vakali, A. Semsoc: Semantic, social and content-based clustering in multimedia collaborative tagging systems. In Proceedings of the 2008 IEEE International Conference on Semantic Computing, Santa Clara, CA, USA, 4–7 August 2008; pp. 128–135.
- Win, H.; Lynn, K. Community and Outliers Detection in Social Network. In Big Data Analysis and Deep Learning Applications: Proceedings of the First International Conference on Big Data Analysis and Deep Learning 1st; Springer: Singapore, 2019; pp. 58–67.
- Zrira, N.; Mekouar, S.; Bouyakhf, E. A novel approach for graph-based global outlier detection in social networks. Int. J. Secur. Netw. 2018, 13, 108–128.
- Sachpenderis, N.; Koloniari, G.; Karakasidis, A. COTILES: Leveraging Content and Structure for Evolutionary Community Detection. In Transactions on Large-Scale Data-and Knowledge-Centered Systems XLV; Springer: Berlin/Heidelberg, Germany, 2020; pp. 56–84.
- Chunaev, P. Community detection in node-attributed social networks: A survey. Comput. Sci. Rev. 2020, 37, 100286.
- Papadopoulos, A.; Rafailidis, D.; Pallis, G.; Dikaiakos, M. Clustering attributed multi-graphs with information ranking. In Database and Expert Systems Applications; Springer: Berlin/Heidelberg, Germany, 2015; pp. 432–446.
- Zhou, Y.; Cheng, H.; Yu, J. Graph clustering based on structural/attribute similarities. Proc. VLDB Endow. 2009, 2, 718–729.
- Yang, J.; McAuley, J.; Leskovec, J. Community detection in networks with node attributes. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7–10 December 2013; pp. 1151–1156.
- Huang, Y.; Wang, H. Consensus and multiplex approach for community detection in attributed networks. In Proceedings of the 2016 IEEE Global Conference on Signal And Information Processing (GlobalSIP), Washington, DC, USA, 7–9 December 2016; pp. 425–429.
- Sánchez, P.; Müller, E.; Korn, U.; Böhm, K.; Kappes, A.; Hartmann, T.; Wagner, D. Efficient algorithms for a robust modularity-driven clustering of attributed graphs. In Proceedings of the 2015 SIAM International Conference on Data Mining, Vancouver, BC, Canada, 30 April–2 May 2015; pp. 100–108.
- Luo, S.; Zhang, Z.; Zhang, Y.; Ma, S. Co-association matrix-based multi-layer fusion for community detection in attributed networks. Entropy 2019, 21, 95.
- Xie, J.; Chen, M.; Szymanski, B. LabelrankT: Incremental community detection in dynamic networks via label propagation. In Proceedings of the Workshop on Dynamic Networks Management and Mining, New York, NY, USA, 22–27 June 2013; pp. 25–32.
- Agarwal, M.; Ramamritham, K.; Bhide, M. Real time discovery of dense clusters in highly dynamic graphs: Identifying real world events in highly dynamic environments. Proc. VLDB Endow. 2012, 5, 980–991.
- Bu, Z.; Zhang, C.; Xia, Z.; Wang, J. A fast parallel modularity optimization algorithm (FPMQA) for community detection in online social network. Knowl.-Based Syst. 2013, 50, 246–259.
- Fournier-Viger, P.; Cheng, C.; Cheng, Z.; Lin, J.; Selmaoui-Folcher, N. Mining significant trend sequences in dynamic attributed graphs. Knowl.-Based Syst. 2019, 182, 104797.
- Rossetti, G.; Pappalardo, L.; Pedreschi, D.; Giannotti, F. Tiles: An online algorithm for community discovery in dynamic social networks. Mach. Learn. 2017, 106, 1213–1241.
- Hawkins, D. Identification of Outliers; Springer: Berlin/Heidelberg, Germany, 1980.
- Thakur, A.; Trivedi, P. An Efficient Clustering Algorithm with Enhanced MapReduce Design based Modified K Means for Outlier Detection. Int. J. Res. Appl. Sci. 2020, 8, 1085–1089.
- Muller, E.; Assent, I.; Steinhausen, U.; Seidl, T. OutRank: Ranking outliers in high dimensional data. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering Workshop, Cancún, Mexico, 7–12 April 2008; pp. 600–603.
- Dey, A.; Kumar, B.; Das, B.; Ghoshal, A. Outlier detection in social networks leveraging community structure. Inform. Sci. 2023, 634, 578–586.
- Du, X.; Zuo, E.; He, Z.; Yu, J. Fluctuation-based Outlier Detection. arXiv 2022, arXiv:2204.10007.
- Safdari, H.; De Bacco, C. Anomaly detection and community detection in networks. J. Big Data 2022, 9, 1–20.
- Li, J.; Dani, H.; Hu, X.; Liu, H. Radar: Residual Analysis for Anomaly Detection in Attributed Networks. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, VIC, Australia, 19–25 August 2017; pp. 2152–2158.
- Sánchez, P.; Müller, E.; Irmler, O.; Böhm, K. Local context selection for outlier ranking in graphs with multiple numeric node attributes. In Proceedings of the 26th International Conference on Scientific and Statistical Database Management, Berlin, Germany, 26–28 July 2014; pp. 1–12.
- Liu, K.; Dou, Y.; Zhao, Y.; Ding, X.; Hu, X.; Zhang, R.; Ding, K.; Chen, C.; Peng, H.; Shu, K.; et al. Benchmarking node outlier detection on graphs. arXiv 2022, arXiv:2206.10071.
- Li, R.; Chen, H.; Liu, S.; Li, X.; Li, Y.; Wang, B. Incomplete mixed data-driven outlier detection based on local–global neighborhood information. Inform. Sci. 2023, 633, 204–225.
- Zardi, H.; Alrajhi, H. Anomaly Discover: A New Community-based Approach for Detecting Anomalies in Social Networks. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 912–920.
- Gupta, M.; Gao, J.; Aggarwal, C.; Han, J. Outlier detection for temporal data: A survey. IEEE Trans. Knowl. Data Eng. 2013, 26, 2250–2267.
- Alghushairy, O.; Alsini, R.; Soule, T.; Ma, X. A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams. Big Data Cogn. Comput. 2021, 5, 1.
- Gupta, M.; Gao, J.; Sun, Y.; Han, J. Community trend outlier detection using soft temporal pattern mining. In Proceedings of the Joint European Conference on Machine Learning And Knowledge Discovery In Databases, Bristol, UK, 23–27 September 2012; pp. 692–708.
- Gupta, M.; Gao, J.; Sun, Y.; Han, J. Integrating community matching and outlier detection for mining evolutionary community outliers. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 859–867.
- Das, B.; Anwar, M.; Bhuiyan, M. Attribute driven temporal active local online community detection. In Proceedings of the 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), The Hague, The Netherlands, 7–10 December 2020; pp. 619–622.
- Kumar, S.; Khan, M.; Hasanat, M.; Saudagar, A.; AlTameem, A.; AlKhathami, M. An Anomaly Detection Framework for Twitter Data. Appl. Sci. 2022, 12, 11059.
- Khan, W. An exhaustive review on state-of-the-art techniques for anomaly detection on attributed networks. Turk. J. Comput. Math. Educ. (Turcomat) 2021, 12, 6707–6722.
- Friedl, L.; Jensen, D. Finding tribes: Identifying close-knit individuals from employment patterns. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 290–299.
- Akrida, E.; Gąsieniec, L.; Mertzios, G.; Spirakis, P. On temporally connected graphs of small cost. In Proceedings of the International Workshop on Approximation and Online Algorithms, Patras, Greece, 17–18 September 2015; pp. 84–96.
- Sachpenderis, N. COTILESoutlierDetection. (2023, 08). Available online: https://github.com/sachpenderis/COTILESoutlierDetection (accessed on 10 November 2023).
- Stack Exchange, Inc. Stack Exchange Data Dump. Available online: https://archive.org/details/stackexchange (accessed on 10 February 2023).
- Harper, F.; Konstan, J. The movielens datasets: History and context. ACM Trans. Interact. Intell. Syst. 2015, 5, 1–19.
- Sachpenderis, N. Datasets. Available online: https://github.com/sachpenderis/datasets (accessed on 11 November 2023).
Fusion Method | Communities | Dynamic | |||||
---|---|---|---|---|---|---|---|
Paper | Early | Simultaneous | Late | Overlap | Non-Overlap | Snapshot Based | Online |
[8,9,10] | ✓ | ✓ | |||||
[11] | ✓ | ✓ | |||||
[12] | ✓ | ✓ | |||||
[13] | ✓ | ✓ | |||||
[14] | ✓ | ✓ | ✓ | ||||
[15] | ✓ | ✓ | ✓ | ||||
[16,17] | ✓ | ✓ | ✓ | ||||
COTILES | ✓ | ✓ | ✓ |
Dimensions | Outlierness | Information | Dynamic | |||||
---|---|---|---|---|---|---|---|---|
Paper | Structure | Content | Label | Degree | Local | Global | Snapshot-Based | Online |
[21] | ✓ | ✓ | ✓ | |||||
[22,23] | ✓ | ✓ | ✓ | |||||
[24] | ✓ | ✓ | ✓ | |||||
[25] | ✓ | ✓ | ✓ | |||||
[26] | ✓ | ✓ | ✓ | |||||
[27,28] | ✓ | ✓ | ✓ | ✓ | ||||
[29] | ✓ | ✓ | ✓ | ✓ | ||||
[31] | ✓ | ✓ | ✓ | ✓ | ||||
[32,33] | ✓ | ✓ | ✓ | ✓ | ||||
[34] | ✓ | ✓ | ✓ | ✓ | ||||
[35] | ✓ | ✓ | ✓ | ✓ | ||||
[37] | ✓ | ✓ | ✓ | ✓ | ✓ | |||
ext. COTILES | ✓ | ✓ | ✓ | ✓ | ✓ |
Edge labelset | Labels of edge | |
Node labelset | Labels of node u, inherited from the edges in which it takes part | |
Community labelset | Labels describing the contents of community C at a time | |
a | alpha weight | Leverage between structure and content during community detection |
w | w weight | Leverage between structure and content during outlier detection |
Node outlier score | Outlier score of node u | |
Outlier score threshold | Threshold for a node to be assigned as outlier |
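To make the roles of the weight w and the outlier score threshold in the table above concrete, the following sketch combines a structural and a content component into a single score. The linear form, the complements `1 - community_focus` and `1 - label_set_match`, and the assumption that both inputs lie in [0, 1] are illustrative choices only; the outlier score itself is defined in Section 4.1, which is not reproduced here, and may differ from this form.

```python
def outlier_score(community_focus, label_set_match, w=0.5):
    """Assumed weighted mix of structural and content outlierness.

    community_focus and label_set_match are taken to lie in [0, 1],
    with higher values meaning the node fits its community better.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    structure_part = 1.0 - community_focus   # high when the node is structurally isolated
    content_part = 1.0 - label_set_match     # high when the node's labels do not match
    return w * structure_part + (1.0 - w) * content_part


def is_outlier(score, threshold):
    """A node is flagged as an outlier when its score exceeds the threshold."""
    return score > threshold


# A node with Community Focus 0 whose labels largely match the community:
# the content part pulls the score down from the structural maximum.
score = outlier_score(community_focus=0.0, label_set_match=0.8, w=0.5)
flagged = is_outlier(score, threshold=0.5)
```

Under this assumed form, w = 1 scores nodes on structure alone and w = 0 on content alone, which mirrors the description of w as the leverage between structure and content.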
Dataset | Edges | Nodes | Labels | Timespan (Years) |
---|---|---|---|---|
Stack Exchange | 542,120 | 87,438 | 2615 | 10 |
MovieLens | 111,621 | 6113 | 1043 | 14 |
Dataset | ttl / obs | #Coms | Mean Members | Median Labels |
---|---|---|---|---|
Stack Exchange | 30/15 | 4118 | 5.80 | 7 |
Stack Exchange | 30/30 | 2848 | 6.10 | 7 |
Stack Exchange | 60/30 | 5225 | 9.25 | 8 |
MovieLens | 30/30 | 294 | 13.45 | 20 |
MovieLens | 60/30 | 631 | 25.21 | 25 |
MovieLens | 120/60 | 656 | 32.55 | 26 |
Community Member | Not Community Member | Community Member | Not Community Member | |
---|---|---|---|---|
Low OS in | 41.48% | 58.52% | 62.04% | 37.96% |
High OS in | 17.97% | 82.03% | 27.94% | 72.06% |
Community Member | Not Community Member | Community Member | Not Community Member | |
---|---|---|---|---|
Low OS in | 66.14% | 33.86% | 67.90% | 32.10% |
High OS in | 14.86% | 85.14% | 21.10% | 78.90% |
Classifier | Structure Score | Betweenness Centrality | PageRank | Degree | Centrality + Rank + Degree | |
---|---|---|---|---|---|---|
StackExchange | ||||||
SVC | 0.5862 | 0.6872 | 0.7097 | 0.7308 | 0.7308 | |
kNN5 | 0.7094 | 0.6583 | 0.6917 | 0.7166 | 0.7352 | |
DT | 0.6838 | 0.6872 | 0.7097 | 0.7379 | 0.7205 | |
+ Content Score | ||||||
SVC | 0.7779 | 0.7454 | 0.7454 | 0.7419 | 0.7419 | |
kNN5 | 0.7939 | 0.7712 | 0.7939 | 0.7731 | 0.7607 | |
DT | 0.7565 | 0.7844 | 0.7948 | 0.7589 | 0.7744 | |
MovieLens | ||||||
SVC | 0.6231 | 0.6266 | 0.7177 | 0.7863 | 0.7863 | |
kNN5 | 0.6916 | 0.6266 | 0.6694 | 0.7562 | 0.7821 | |
DT | 0.6803 | 0.6215 | 0.7056 | 0.7863 | 0.7738 | |
+ Content Score | ||||||
SVC | 0.8701 | 0.8149 | 0.8149 | 0.8508 | 0.8508 | |
kNN5 | 0.8427 | 0.8335 | 0.8297 | 0.8347 | 0.8105 | |
DT | 0.8586 | 0.8335 | 0.8177 | 0.8056 | 0.8245 |
Betweenness Centrality + PageRank | Outlier Score | |||
---|---|---|---|---|
Chain length: 3 | Accuracy | 0.7054 | 0.7455 | |
F-score | 0.6735 | 0.6627 | ||
Avg Precision | “TRUE” | 0.68 | 0.43 | |
“FALSE” | 0.72 | 0.87 | |
Avg Recall | “TRUE” | 0.49 | 0.58 | |
“FALSE” | 0.68 | 0.79 | ||
Chain length: 4 | Accuracy | 0.7246 | 0.7591 | |
F-score | 0.6935 | 0.7065 | ||
Avg Precision | “TRUE” | 0.78 | 0.55 | |
“FALSE” | 0.71 | 0.85 | ||
Avg Recall | “TRUE” | 0.48 | 0.62 | |
“FALSE” | 0.90 | 0.81 |
Betweenness Centrality + PageRank | Outlier Score | |||
---|---|---|---|---|
Chain length: 3 | Accuracy | 0.6129 | 0.7234 | |
F-score | 0.4946 | 0.6965 | ||
Avg Precision | “TRUE” | 0.16 | 0.48 | |
“FALSE” | 0.92 | 0.89 | ||
Avg Recall | “TRUE” | 0.57 | 0.77 | |
“FALSE” | 0.62 | 0.71 | ||
Chain length: 4 | Accuracy | 0.8056 | 0.9167 | |
F-score | 0.8017 | 0.9150 | ||
Avg Precision | “TRUE” | 0.80 | 0.93 | |
“FALSE” | 0.81 | 0.90 | ||
Avg Recall | “TRUE” | 0.75 | 0.88 | |
“FALSE” | 0.85 | 0.95 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sachpenderis, N.; Koloniari, G. Outlier Detection and Prediction in Evolving Communities. Appl. Sci. 2024, 14, 2356. https://doi.org/10.3390/app14062356