SentenceLDA- and ConNetClus-Based Heterogeneous Academic Network Analysis for Publication Ranking
Abstract
1. Introduction
2. Literature Review
2.1. Publication Ranking
2.2. Factors for Evaluating Scientific Publications
2.3. Heterogeneous Network
3. Methodology
3.1. SentenceLDA for Training Topics based on the Abstract
3.2. Construction of the Heterogeneous Academic Network
- A topic that follows the multinomial distribution is generated by the author ( follows a prior distribution);
- Word ;
- Venue ;
- Keywords .
3.3. Ranking Algorithm for Heterogeneous Academic Networks
4. Experiments
4.1. Data Description
4.2. Data Processing
4.3. Evaluation Methods
4.4. Evaluation Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Notation | Interpretation |
---|---|
 | the homogeneous network |
V | the vertex set, V = {v_1, ..., v_N}; N is the total number of vertices |
E | the edge set, E = {e_1, ..., e_M}; M is the total number of edges |
 | the heterogeneous network |
C | the vertex-type set, C = {c_1, ..., c_S}; S is the total number of vertex types |
 | the type of vertex is |
 | the edge between and , the type of vertex is …and is |
No. | Vertex1 Type | Vertex2 Type | Edge |
---|---|---|---|
1 | Publication | Publication | Citation relationship |
2 | Author | Author | Co-author relationship |
3 | Topic | Topic | Co-occurrence relationship |
4 | Publication | Venue | Paper published in venue |
5 | Publication | Author | Paper created by author |
6 | Publication | Topic | Paper includes topic |
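The six edge types in the table above can be read as a schema for a typed graph. The following is a minimal sketch of that reading, not the authors' implementation: the class name `HeteroNetwork`, the relation labels, and the storage layout are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Allowed edge types, keyed by (vertex1 type, vertex2 type) as in the table.
# The short relation labels are hypothetical names chosen for this sketch.
EDGE_SCHEMA = {
    ("Publication", "Publication"): "citation",
    ("Author", "Author"): "co-author",
    ("Topic", "Topic"): "co-occurrence",
    ("Publication", "Venue"): "published-in",
    ("Publication", "Author"): "created-by",
    ("Publication", "Topic"): "includes-topic",
}


@dataclass
class HeteroNetwork:
    # vertex id -> vertex type
    vertex_type: dict = field(default_factory=dict)
    # relation label -> list of (vertex1, vertex2) pairs
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_vertex(self, vid, vtype):
        self.vertex_type[vid] = vtype

    def add_edge(self, u, v):
        """Add an edge if its endpoint types match the schema; return its relation."""
        key = (self.vertex_type[u], self.vertex_type[v])
        if key not in EDGE_SCHEMA:
            # Try the reversed orientation, e.g. (Author, Publication).
            key, u, v = key[::-1], v, u
        if key not in EDGE_SCHEMA:
            raise ValueError(f"no edge type defined between {key[1]} and {key[0]}")
        relation = EDGE_SCHEMA[key]
        self.edges[relation].append((u, v))
        return relation


net = HeteroNetwork()
net.add_vertex("p1", "Publication")
net.add_vertex("p2", "Publication")
net.add_vertex("a1", "Author")
net.add_edge("p1", "p2")  # p1 cites p2
net.add_edge("a1", "p1")  # stored as (p1, a1) under "created-by"
```

Keeping one edge list per relation makes it straightforward to treat each sub-network (citation, co-authorship, and so on) separately while still sharing a single vertex table.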
Method | Object | MAP@10 | MAP@50 | MAP@100 |
---|---|---|---|---|
BM25 | Publication | 0.176 | 0.114 | 0.108 |
BM25 | Venue | 0.163 | 0.102 | 0.094 |
BM25 | Keyword | 0.191 | 0.135 | 0.114 |
BM25 | Author | 0.225 | 0.159 | 0.143 |
BM25 | AVE. | 0.1888 | 0.1275 | 0.1148 |
BM25 + ConNetClus | Publication | 0.246 | 0.124 | 0.118 |
BM25 + ConNetClus | Venue | 0.228 | 0.119 | 0.104 |
BM25 + ConNetClus | Keyword | 0.269 | 0.149 | 0.132 |
BM25 + ConNetClus | Author | 0.319 | 0.204 | 0.183 |
BM25 + ConNetClus | AVE. | 0.2655 | 0.149 | 0.1343 |
LDA + PageRank | Publication | 0.234 | 0.172 | 0.171 |
PLSA | Publication | 0.109 | 0.094 | 0.082 |
SenLDA + ConNetClus | Publication | 0.315 | 0.274 | 0.246 |
SenLDA + ConNetClus | Venue | 0.293 | 0.257 | 0.225 |
SenLDA + ConNetClus | Keyword | 0.321 | 0.279 | 0.245 |
SenLDA + ConNetClus | Author | 0.356 | 0.300 | 0.271 |
SenLDA + ConNetClus | AVE. | 0.3213 | 0.2775 | 0.2468 |
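For readers unfamiliar with the metric reported above, MAP@k can be sketched as follows. The paper's exact formula is not shown in this section, so this uses one common variant as an assumption: average precision is normalized by min(number of relevant items, k), and each ranked list is assumed to contain no duplicates. The function names are illustrative.

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@k for one query: mean of precision values at each relevant hit in the top k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    # Normalize by the best achievable number of hits within the top k.
    return score / min(len(relevant), k)


def map_at_k(rankings, relevants, k):
    """MAP@k: mean of AP@k over all queries."""
    aps = [average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps) if aps else 0.0


# Two toy queries with hypothetical item ids.
rankings = [["a", "b", "c", "d"], ["x", "y", "z"]]
relevants = [{"a", "c"}, {"z"}]
score = map_at_k(rankings, relevants, k=4)
```

In the first toy query the relevant items appear at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2; in the second the single relevant item appears at rank 3, giving AP = 1/3.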
Method | Object | NDCG@10 | NDCG@50 | NDCG@100 |
---|---|---|---|---|
BM25 | Publication | 0.067 | 0.083 | 0.097 |
BM25 | Venue | 0.041 | 0.062 | 0.071 |
BM25 | Keyword | 0.083 | 0.104 | 0.110 |
BM25 | Author | 0.088 | 0.103 | 0.119 |
BM25 | AVE. | 0.0698 | 0.088 | 0.0993 |
BM25 + ConNetClus | Publication | 0.068 | 0.083 | 0.096 |
BM25 + ConNetClus | Venue | 0.052 | 0.071 | 0.087 |
BM25 + ConNetClus | Keyword | 0.090 | 0.111 | 0.129 |
BM25 + ConNetClus | Author | 0.091 | 0.100 | 0.138 |
BM25 + ConNetClus | AVE. | 0.0753 | 0.0913 | 0.1125 |
LDA + PageRank | Publication | 0.088 | 0.097 | 0.118 |
PLSA | Publication | 0.071 | 0.086 | 0.100 |
SenLDA + ConNetClus | Publication | 0.098 | 0.138 | 0.153 |
SenLDA + ConNetClus | Venue | 0.088 | 0.127 | 0.149 |
SenLDA + ConNetClus | Keyword | 0.102 | 0.146 | 0.171 |
SenLDA + ConNetClus | Author | 0.131 | 0.170 | 0.192 |
SenLDA + ConNetClus | AVE. | 0.1048 | 0.1453 | 0.1663 |
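The NDCG@k metric in the table above can be illustrated with a short sketch. This section does not state which discount the authors used, so the code assumes the widely used log2(rank + 1) discount, and it assumes the input list holds the graded relevance of every judged item in ranked order, so that the ideal ordering can be derived by sorting. The function names are illustrative.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top k graded relevance values."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance of a ranked result list (higher is better).
gains = [3, 2, 3, 0, 1, 2]
value = ndcg_at_k(gains, k=5)
```

A list that is already sorted by relevance scores 1.0, and any misordering within the top k lowers the value toward 0.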
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, J.; Jin, B.; Sha, J.; Chen, Y.; Zhang, Y. SentenceLDA- and ConNetClus-Based Heterogeneous Academic Network Analysis for Publication Ranking. Algorithms 2022, 15, 159. https://doi.org/10.3390/a15050159