Algorithm for the Accelerated Calculation of Conceptual Distances in Large Knowledge Graphs
Abstract
1. Introduction
2. Related Work
2.1. Semantic Similarity
- Assign to each relation an arbitrary conceptual weight, defined separately for each direction of the relationship.
- Create a directed, weighted graph (the conceptual graph) whose vertices are the concepts contained in the conceptualization and whose edges are the relationships that connect each pair of concepts. Each edge has a counterpart in the opposite direction with a different weight. Computing these weights requires the generality of each concept.
- Compute the shortest-path length between every pair of vertices (the APSP problem), i.e., propagate the conceptual distance to all concepts.
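The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the DIS-C implementation: the relation tuples, the helper names `build_conceptual_graph`, `dijkstra`, and `all_pairs_distances`, and the weight values are assumptions introduced here. In particular, the weights are arbitrary placeholders, not values derived from concept generality.

```python
import heapq


def build_conceptual_graph(relations):
    """Build a directed, weighted graph from (a, b, w_ab, w_ba) tuples.

    Each relation contributes two edges, one per direction, each with its
    own weight (hypothetical values; not derived from generality here).
    """
    graph = {}
    for a, b, w_ab, w_ba in relations:
        graph.setdefault(a, {})[b] = w_ab
        graph.setdefault(b, {})[a] = w_ba
    return graph


def dijkstra(graph, source):
    """Return shortest-path lengths from `source` to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def all_pairs_distances(graph):
    """Naive APSP: one Dijkstra run per vertex (non-negative weights only)."""
    return {u: dijkstra(graph, u) for u in graph}


# Hypothetical relations: (concept a, concept b, weight a->b, weight b->a).
relations = [
    ("dog", "mammal", 0.25, 0.75),
    ("cat", "mammal", 0.5, 1.5),
]
distances = all_pairs_distances(build_conceptual_graph(relations))
print(distances["dog"]["cat"])  # 0.25 + 1.5
print(distances["cat"]["dog"])  # 0.5 + 0.75
```

Because the two directions of a relation carry different weights, the distance from one concept to another generally differs from the distance back. Running one Dijkstra search per vertex, as done here, is the baseline whose cost grows quickly with the size of the graph.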
2.2. All Pairs Shortest Path Problem (APSP)
- Addition-comparison model: The inputs are assumed to be graphs with real-valued weights, and the only operations allowed on the real data are comparison and addition.
- Random access machine (RAM) model: The inputs are assumed to be graphs with integer weights, which can be manipulated by addition, subtraction, comparison, shift, and various logical operations [50]. This is the most commonly used model.
Reference | Year | |
---|---|---|
Dijkstra/Floyd–Warshall | 1959/1962 | |
Fredman (1976) [55] | 1976 | |
Takaoka (1992) [56] | 1991 | |
Dobosiewicz (1990) [57] | 1990 | |
Han (2004) [58] | 2004 | |
Takaoka (2004) [59] | 2004 | |
Takaoka (2005) [60] | 2005 | |
Zwick (2004) [61] | 2004 | |
Chan (2008) [62] | 2005 | |
Han (2006) [63] | 2006 | |
Chan (2010) [64] | 2007 | |
Williams (2018) [65] | 2014 |
3. Materials and Methods
3.1. The Pruned Dijkstra Algorithm
Algorithm 1: The pruned Dijkstra
Algorithm 2: APSP_PDijkstra
3.2. The Sketch-Based Algorithms
Algorithm 3: Sketches_DIS-C: offline sampling
Algorithm 4: Sketches_DIS-C: k sketches offline
Algorithm 5: Sketches_DIS-C: k sketches online
Algorithm 6: Sketches_DIS-C: k sketches online APSP
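The algorithm listings named above are not reproduced in this text, so the following is only a generic sketch of a sketch-based distance oracle in the spirit of Das Sarma et al. [71], adapted to directed graphs. It should not be read as Algorithms 3 to 6. All function names (`reverse`, `nearest_seed`, `build_sketches`, `estimate`) and the example graph are assumptions. The offline step samples seed sets of sizes 1, 2, 4, ... and records, for every vertex, its closest seed in each direction; the online step combines the two sketches through shared seeds.

```python
import heapq
import random

INF = float("inf")


def reverse(graph):
    """Return the graph with every edge direction flipped."""
    rev = {u: {} for u in graph}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            rev.setdefault(v, {})[u] = w
    return rev


def nearest_seed(graph, seeds):
    """Multi-source Dijkstra: {vertex: (distance, closest seed)}."""
    best = {s: (0.0, s) for s in seeds}
    heap = [(0.0, s, s) for s in seeds]
    heapq.heapify(heap)
    while heap:
        d, u, s = heapq.heappop(heap)
        if d > best[u][0]:
            continue  # stale queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < best.get(v, (INF, None))[0]:
                best[v] = (nd, s)
                heapq.heappush(heap, (nd, v, s))
    return best


def build_sketches(graph, k, seed=0):
    """Offline step: k rounds of random seed sets with sizes 1, 2, 4, ..."""
    rng = random.Random(seed)
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    rev = reverse(graph)
    to_seed = {u: {} for u in nodes}    # to_seed[u][s] = d(u -> s)
    from_seed = {u: {} for u in nodes}  # from_seed[u][s] = d(s -> u)
    for _ in range(k):
        size = 1
        while size <= len(nodes):
            seeds = rng.sample(nodes, size)
            # Searching the reversed graph yields distances towards the seeds.
            for u, (d, s) in nearest_seed(rev, seeds).items():
                to_seed[u][s] = d
            for u, (d, s) in nearest_seed(graph, seeds).items():
                from_seed[u][s] = d
            size *= 2
    return to_seed, from_seed


def estimate(to_seed, from_seed, u, v):
    """Online step: best detour u -> s -> v over seeds shared by both sketches."""
    if u == v:
        return 0.0
    common = to_seed[u].keys() & from_seed[v].keys()
    return min((to_seed[u][s] + from_seed[v][s] for s in common), default=INF)


# Hypothetical strongly connected graph with non-negative weights.
graph = {"a": {"b": 1.0}, "b": {"c": 2.0, "a": 3.0}, "c": {"a": 1.0}}
to_seed, from_seed = build_sketches(graph, k=2)
print(estimate(to_seed, from_seed, "a", "c"))  # an upper bound on the exact distance
```

Every estimate is the length of a real path through a seed, so by the triangle inequality it can never fall below the exact distance. Increasing k adds more seeds per vertex and therefore more candidate detours, which tends to tighten the estimates.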
4. Results
4.1. The Datasets
4.2. Evaluation Metrics
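Assuming the metrics in question are the Pearson and Spearman correlations reported in Appendix C, the following minimal sketch shows how both coefficients can be computed for two lists of distances. The function names and the sample values are hypothetical; the Spearman coefficient is computed as the Pearson coefficient of the rank vectors, with tied values sharing their average rank.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


exact = [1.0, 2.0, 3.0, 4.0]    # hypothetical exact distances
approx = [1.5, 2.0, 4.5, 9.0]   # hypothetical approximate distances
print(pearson(exact, approx), spearman(exact, approx))
```

In this example the approximate values preserve the ordering of the exact ones, so the Spearman coefficient is 1 even though the Pearson coefficient is lower. The two metrics therefore answer different questions: whether the ranking is preserved and whether the relationship is linear.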
4.3. Generated Graphs and Performance
4.4. Conceptual Distance Results
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. The Pruned Dijkstra
Appendix B. The Sketch-Based Algorithms
Appendix C. Pearson and Spearman Correlation Results
Algorithm | MC | RG | PS | Agirre | SimLex | MTurk771 | MTurk287 | WSRel | Rel | SCWS | All | Avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Dijkstra | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Pruned Dijkstra | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.997 | 1.000 |
Sketches k = 1 | 0.940 | 0.853 | 0.862 | 0.733 | 0.762 | 0.605 | 0.864 | 0.720 | 0.841 | 0.815 | 0.745 | 0.795 |
Sketches k = 2 | 0.977 | 0.947 | 0.961 | 0.870 | 0.813 | 0.731 | 0.888 | 0.842 | 0.922 | 0.898 | 0.802 | 0.877 |
Sketches k = 5 | 0.985 | 0.984 | 0.978 | 0.923 | 0.897 | 0.881 | 0.937 | 0.921 | 0.953 | 0.944 | 0.899 | 0.937 |
Sketches k = 10 | 0.991 | 0.979 | 0.995 | 0.961 | 0.955 | 0.931 | 0.967 | 0.959 | 0.976 | 0.971 | 0.942 | 0.966 |
Sketches k = 15 | 0.993 | 0.996 | 0.991 | 0.980 | 0.972 | 0.959 | 0.971 | 0.971 | 0.983 | 0.982 | 0.957 | 0.978 |
Sketches k = 20 | 0.996 | 0.992 | 0.994 | 0.983 | 0.974 | 0.971 | 0.979 | 0.975 | 0.989 | 0.986 | 0.972 | 0.983 |
Algorithm | MC | RG | PS | Agirre | SimLex | MTurk771 | MTurk287 | WSRel | Rel | SCWS | All | Avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Dijkstra | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Sketches k = 1 | 0.897 | 0.777 | 0.877 | 0.734 | 0.722 | 0.562 | 0.862 | 0.757 | 0.857 | 0.793 | 0.733 | 0.779 |
Sketches k = 2 | 0.932 | 0.961 | 0.956 | 0.893 | 0.806 | 0.725 | 0.890 | 0.827 | 0.879 | 0.896 | 0.795 | 0.869 |
Sketches k = 5 | 0.979 | 0.984 | 0.981 | 0.909 | 0.893 | 0.865 | 0.946 | 0.936 | 0.947 | 0.945 | 0.897 | 0.935 |
Sketches k = 10 | 0.987 | 0.991 | 0.992 | 0.963 | 0.952 | 0.944 | 0.960 | 0.945 | 0.979 | 0.967 | 0.939 | 0.965 |
Sketches k = 15 | 0.992 | 0.998 | 0.994 | 0.968 | 0.971 | 0.958 | 0.975 | 0.966 | 0.978 | 0.980 | 0.960 | 0.976 |
Sketches k = 20 | 0.995 | 0.996 | 0.997 | 0.979 | 0.971 | 0.972 | 0.980 | 0.983 | 0.984 | 0.988 | 0.972 | 0.983 |
Algorithm | MC | RG | PS | Agirre | SimLex | MTurk771 | MTurk287 | WSRel | Rel | SCWS | All | Avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Dijkstra | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Pruned Dijkstra | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.968 | 0.997 |
Sketches k = 1 | 0.938 | 0.876 | 0.858 | 0.702 | 0.742 | 0.605 | 0.843 | 0.671 | 0.799 | 0.831 | 0.736 | 0.782 |
Sketches k = 2 | 0.977 | 0.944 | 0.959 | 0.823 | 0.804 | 0.729 | 0.855 | 0.821 | 0.900 | 0.906 | 0.807 | 0.866 |
Sketches k = 5 | 0.978 | 0.978 | 0.984 | 0.911 | 0.887 | 0.876 | 0.925 | 0.907 | 0.950 | 0.954 | 0.897 | 0.931 |
Sketches k = 10 | 0.987 | 0.981 | 0.991 | 0.941 | 0.955 | 0.931 | 0.963 | 0.946 | 0.973 | 0.978 | 0.942 | 0.962 |
Sketches k = 15 | 0.987 | 0.993 | 0.985 | 0.968 | 0.971 | 0.969 | 0.963 | 0.960 | 0.980 | 0.986 | 0.954 | 0.974 |
Sketches k = 20 | 0.989 | 0.988 | 0.991 | 0.973 | 0.979 | 0.978 | 0.976 | 0.961 | 0.985 | 0.990 | 0.965 | 0.980 |
Algorithm | MC | RG | PS | Agirre | SimLex | MTurk771 | MTurk287 | WSRel | Rel | SCWS | All | Avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Dijkstra | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Pruned Dijkstra | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.969 | 0.997 |
Sketches k = 1 | 0.875 | 0.826 | 0.843 | 0.735 | 0.700 | 0.573 | 0.823 | 0.693 | 0.817 | 0.807 | 0.732 | 0.766 |
Sketches k = 2 | 0.925 | 0.921 | 0.932 | 0.859 | 0.816 | 0.733 | 0.876 | 0.764 | 0.835 | 0.902 | 0.803 | 0.851 |
Sketches k = 5 | 0.972 | 0.970 | 0.961 | 0.898 | 0.884 | 0.866 | 0.920 | 0.906 | 0.936 | 0.953 | 0.896 | 0.924 |
Sketches k = 10 | 0.991 | 0.980 | 0.985 | 0.953 | 0.950 | 0.947 | 0.945 | 0.928 | 0.969 | 0.974 | 0.938 | 0.960 |
Sketches k = 15 | 0.991 | 0.990 | 0.981 | 0.959 | 0.974 | 0.960 | 0.961 | 0.952 | 0.969 | 0.984 | 0.958 | 0.971 |
Sketches k = 20 | 0.996 | 0.985 | 0.990 | 0.973 | 0.971 | 0.977 | 0.971 | 0.975 | 0.982 | 0.991 | 0.966 | 0.980 |
References
- Mejia Sanchez-Bermejo, A. Similitud Semantica Entre Conceptos de Wikipedia. Bachelor’s Thesis, Universidad Carlos III de Madrid, Getafe, Madrid, Spain, 2013. [Google Scholar]
- Goldstone, R.L. Similarity, interactive activation, and mapping. J. Exp. Psychol. Learn. Mem. Cogn. 1994, 20, 3. [Google Scholar] [CrossRef]
- Quintero, R.; Torres-Ruiz, M.; Saldaña-Pérez, M.; Guzmán Sánchez-Mejorada, C.; Mata-Rivera, F. A Conceptual Graph-Based Method to Compute Information Content. Mathematics 2023, 11, 3972. [Google Scholar] [CrossRef]
- Chen, X.; Wang, Z.; Hua, Q.; Shang, W.L.; Luo, Q.; Yu, K. AI-empowered speed extraction via port-like videos for vehicular trajectory analysis. IEEE Trans. Intell. Transp. Syst. 2022, 24, 4541–4552. [Google Scholar] [CrossRef]
- Quintero, R.; Torres-Ruiz, M.; Menchaca-Méndez, R.; Moreno-Armendariz, M.A.; Guzmán, G.; Moreno-Ibarra, M. DIS-C: Conceptual distance in ontologies, a graph-based approach. Knowl. Inf. Syst. 2019, 59, 33–65. [Google Scholar] [CrossRef]
- Dreyfus, S.E. An appraisal of some shortest-path algorithms. Oper. Res. 1969, 17, 395–412. [Google Scholar] [CrossRef]
- Gallo, G.; Pallottino, S. Shortest path algorithms. Ann. Oper. Res. 1988, 13, 1–79. [Google Scholar] [CrossRef]
- Magzhan, K.; Jani, H.M. A review and evaluations of shortest path algorithms. Int. J. Sci. Technol. Res. 2013, 2, 99–104. [Google Scholar]
- Madkour, A.; Aref, W.G.; Rehman, F.U.; Rahman, M.A.; Basalamah, S. A survey of shortest-path algorithms. arXiv 2017, arXiv:1705.02044. [Google Scholar]
- Zhang, F.; Liu, J. A new shortest path algorithm for massive spatial data based on Dijkstra algorithm. J. LiaoNing Technol. Univ. Sci. Ed. 2009, 28, 554–557. [Google Scholar]
- Chakaravarthy, V.T.; Checconi, F.; Murali, P.; Petrini, F.; Sabharwal, Y. Scalable single source shortest path algorithms for massively parallel systems. IEEE Trans. Parallel Distrib. Syst. 2016, 28, 2031–2045. [Google Scholar] [CrossRef]
- Yang, Y.; Li, Z.; Wang, X.; Hu, Q. Finding the shortest path with vertex constraint over large graphs. Complexity 2019, 2019, 8728245. [Google Scholar] [CrossRef]
- Liu, J.; Pan, Y.; Hu, Q.; Li, A. Navigating a Shortest Path with High Probability in Massive Complex Networks. In Proceedings of the Analysis of Experimental Algorithms: Special Event, SEA2 2019, Kalamata, Greece, 24–29 June 2019; Revised Selected Papers. Springer: Berlin/Heidelberg, Germany, 2019; pp. 82–97. [Google Scholar]
- Ma, X.; Huo, E.; Yu, H.; Li, H. Mining truck platooning patterns through massive trajectory data. Knowl. Based Syst. 2021, 221, 106972. [Google Scholar] [CrossRef]
- Li, S.; Sun, X.; Xiao, Y.H.; Guo, X.; Zhang, J. Noncoherent space-time coding for correlated massive MIMO channel with Riemannian distance. Digit. Signal Process. 2023, 133, 103876. [Google Scholar] [CrossRef]
- Meersman, R.A. Semantic ontology tools in IS design. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1999; pp. 30–45. [Google Scholar] [CrossRef]
- Bondy, J.A. Graph Theory with Applications; Elsevier Science Publishing Co., Inc.: New York, NY, USA, 1982. [Google Scholar]
- West, D.B. Introduction to Graph Theory, 2nd ed.; Prentice Hall Inc.: Upper Saddle River, NJ, USA, 2001. [Google Scholar]
- Bollobás, B. Modern Graph Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1998; Volume 184. [Google Scholar]
- Gross, J.L.; Yellen, J.; Anderson, M. Graph Theory and Its Applications; Chapman and Hall/CRC: London, UK, 2018. [Google Scholar]
- Juel Vang, K. Ethics of Google’s Knowledge Graph: Some considerations. J. Inf. Commun. Ethics Soc. 2013, 11, 245–260. [Google Scholar] [CrossRef]
- Ehrlinger, L.; Wöß, W. Towards a definition of knowledge graphs. SEMANTiCS (Posters, Demos, SuCCESS) 2016, 48, 2. [Google Scholar]
- Fensel, D.; Şimşek, U.; Angele, K.; Huaman, E.; Kärle, E.; Panasiuk, O.; Toma, I.; Umbrich, J.; Wahler, A.; Fensel, D.; et al. Introduction: What is a knowledge graph? In Knowledge Graphs: Methodology, Tools and Selected Use Cases; Springer: Berlin/Heidelberg, Germany, 2020; pp. 1–10. [Google Scholar]
- Zou, X. A survey on application of knowledge graph. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2020; Volume 1487, p. 012016. [Google Scholar]
- Pujara, J.; Miao, H.; Getoor, L.; Cohen, W. Knowledge graph identification. In Proceedings of the Semantic Web—ISWC 2013: 12th International Semantic Web Conference, Sydney, NSW, Australia, 21–25 October 2013; Proceedings, Part I 12. Springer: Berlin/Heidelberg, Germany, 2013; pp. 542–557. [Google Scholar]
- Sanchez, D.; Batet, M.; Isern, D.; Valls, A. Ontology-based semantic similarity: A new feature-based approach. Expert Syst. Appl. 2012, 39, 7718–7728. [Google Scholar] [CrossRef]
- Rada, R.; Mili, H.; Bicknell, E.; Blettner, M. Development and application of a metric on semantic nets. IEEE Trans. Syst. Man Cybern. 1989, 19, 17–30. [Google Scholar] [CrossRef]
- Wu, Z.; Palmer, M. Verb semantics and lexical selection. arXiv 1994, arXiv:cmp-lg/9406033. [Google Scholar]
- Hirst, G.; Stonge, D. Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms. Wordnet Electron. Lex. Database 1995, 305, 305–332. [Google Scholar]
- Li, Y.; Bandar, Z.A.; Mclean, D. An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. Knowl. Data Eng. 2003, 15, 871–882. [Google Scholar] [CrossRef]
- Shenoy, M.K.; Shet, K.; Acharya, U.D. A new similarity measure for taxonomy based on edge counting. Int. J. Web Semant. Technol. 2012, 3, 23. [Google Scholar] [CrossRef]
- Tversky, A. Features of similarity. Psychol. Rev. 1977, 84, 327. [Google Scholar] [CrossRef]
- Lesk, M. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the 5th Annual International Conference on Systems Documentation, Toronto, ON, Canada, 8–11 June 1986; 1986; pp. 24–26. [Google Scholar]
- Banerjee, S.; Pedersen, T. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI, Acapulco, Mexico, 9–15 August 2003; Volume 3, pp. 805–810. [Google Scholar]
- Jiang, Y.; Zhang, X.; Tang, Y.; Nie, R. Feature-based approaches to semantic similarity assessment of concepts using Wikipedia. Inf. Process. Manag. 2015, 51, 215–234. [Google Scholar] [CrossRef]
- Resnik, P. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, IJCAI, Montreal, QC, Canada, 20–25 August 1995; Volume 1, pp. 449–453. [Google Scholar]
- Jiang, J.J.; Conrath, D.W. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the 10th Research on Computational Linguistics International Conference, ROCLING X, Taipei, Taiwan, 25–27 August 1997; pp. 19–33. [Google Scholar]
- Gao, J.B.; Zhang, B.W.; Chen, X.H. A WordNet-based semantic similarity measurement combining edge-counting and information content theory. Eng. Appl. Artif. Intell. 2015, 39, 80–88. [Google Scholar] [CrossRef]
- Jiang, Y.; Bai, W.; Zhang, X.; Hu, J. Wikipedia-based information content and semantic similarity computation. Inf. Process. Manag. 2017, 53, 248–265. [Google Scholar] [CrossRef]
- Zhou, Z.; Wang, Y.; Gu, J. A New Model of Information Content for Semantic Similarity in WordNet. In Proceedings of the 2008 Second International Conference on Future Generation Communication and Networking Symposia, Sanya, China, 13–15 December 2008; Volume 3, pp. 85–89. [Google Scholar] [CrossRef]
- Sanchez, D.; Batet, M.; Isern, D. Ontology-based information content computation. Knowl. Based Syst. 2011, 24, 297–303. [Google Scholar] [CrossRef]
- Seidel, R. On the All-Pairs-Shortest-Path Problem. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing, STOC’92, New York, NY, USA, 4–6 May 1992; pp. 745–749. [Google Scholar] [CrossRef]
- Warshall, S. A Theorem on Boolean Matrices. J. ACM 1962, 9, 11–12. [Google Scholar] [CrossRef]
- Singh, P.; Kumar, R.; Pandey, V. An Efficient Algorithm for All Pair Shortest Paths. Int. J. Comput. Electr. Eng. 2010, 2, 984–991. [Google Scholar] [CrossRef]
- Zwick, U. All Pairs Shortest Paths Using Bridging Sets and Rectangular Matrix Multiplication. J. ACM 2002, 49, 289–317. [Google Scholar] [CrossRef]
- D’alberto, P.; Nicolau, A. R-Kleene: A high-performance divide-and-conquer algorithm for the all-pair shortest path for densely connected networks. Algorithmica 2007, 47, 203–213. [Google Scholar] [CrossRef]
- Islam, M.T.; Thulasiraman, P.; Thulasiram, R.K. A parallel ant colony optimization algorithm for all-pair routing in MANETs. In Proceedings of the International Parallel and Distributed Processing Symposium, Nice, France, 22–26 May 2003. [Google Scholar]
- Katz, G.J.; Kider, J.T. All-Pairs Shortest-Paths for Large Graphs on the GPU. In Proceedings of the EUROGRAPHICS/ACM SIGGRAPH Conference on Graphics Hardware 2008, Sarajevo, Bosnia and Herzegovina, 20–21 June 2008; pp. 47–55. [Google Scholar]
- Reddy, K.R. A survey of the all-pairs shortest paths problem and its variants in graphs. Acta Univ. Sapientiae Inform. 2016, 8, 16–40. [Google Scholar] [CrossRef]
- Aho, A.V.; Hopcroft, J.E. The Design and Analysis of Computer Algorithms; Pearson Education India: Chennai, India, 1974. [Google Scholar]
- Attiratanasunthron, N.; Fakcharoenphol, J. A running time analysis of an ant colony optimization algorithm for shortest paths in directed acyclic graphs. Inf. Process. Lett. 2008, 105, 88–92. [Google Scholar] [CrossRef]
- Neumann, F.; Witt, C. Runtime analysis of a simple ant colony optimization algorithm. In Proceedings of the International Symposium on Algorithms and Computation, Kolkata, India, 18–20 December 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 618–627. [Google Scholar]
- Di Caro, G.; Dorigo, M. AntNet: Distributed stigmergetic control for communications networks. J. Artif. Intell. Res. 1998, 9, 317–365. [Google Scholar] [CrossRef]
- Horoba, C.; Sudholt, D. Running time analysis of ACO systems for shortest path problems. In Proceedings of the International Workshop on Engineering Stochastic Local Search Algorithms, Brussels, Belgium, 3–4 September 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 76–91. [Google Scholar]
- Fredman, M.L. New bounds on the complexity of the shortest path problem. SIAM J. Comput. 1976, 5, 83–89. [Google Scholar] [CrossRef]
- Takaoka, T. A new upper bound on the complexity of the all pairs shortest path problem. Inf. Process. Lett. 1992, 43, 195–199. [Google Scholar] [CrossRef]
- Dobosiewicz, W. A more efficient algorithm for the min-plus multiplication. Int. J. Comput. Math. 1990, 32, 49–60. [Google Scholar] [CrossRef]
- Han, Y. Improved algorithm for all pairs shortest paths. Inf. Process. Lett. 2004, 91, 245–250. [Google Scholar] [CrossRef]
- Takaoka, T. A faster algorithm for the all-pairs shortest path problem and its application. In Proceedings of the International Computing and Combinatorics Conference, Jeju Island, Republic of Korea, 17–20 August 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 278–289. [Google Scholar]
- Takaoka, T. An $O(n^3 \log\log n/\log n)$ time algorithm for the all-pairs shortest path problem. Inf. Process. Lett. 2005, 96, 155–161. [Google Scholar] [CrossRef]
- Zwick, U. A slightly improved sub-cubic algorithm for the all pairs shortest paths problem with real edge lengths. In Proceedings of the International Symposium on Algorithms and Computation, Hong Kong, China, 20–22 December 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 921–932. [Google Scholar]
- Chan, T.M. All-pairs shortest paths with real weights in $O(n^3/\log n)$ time. Algorithmica 2008, 50, 236–243. [Google Scholar] [CrossRef]
- Han, Y. An $O(n^3 (\log\log n/\log n)^{5/4})$ time algorithm for all pairs shortest paths. In Proceedings of the European Symposium on Algorithms, Zurich, Switzerland, 11–13 September 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 411–417. [Google Scholar]
- Chan, T.M. More algorithms for all-pairs shortest paths in weighted graphs. SIAM J. Comput. 2010, 39, 2075–2089. [Google Scholar] [CrossRef]
- Williams, R.R. Faster all-pairs shortest paths via circuit complexity. SIAM J. Comput. 2018, 47, 1965–1985. [Google Scholar] [CrossRef]
- Chou, Y.L.; Romeijn, H.E.; Smith, R.L. Approximating shortest paths in large-scale networks with an application to intelligent transportation systems. INFORMS J. Comput. 1998, 10, 163–179. [Google Scholar] [CrossRef]
- Mohring, R.H.; Schilling, H.; Schutz, B.; Wagner, D.; Willhalm, T. Partitioning graphs to speedup Dijkstra’s algorithm. J. Exp. Algorithmics 2007, 11, 2–8. [Google Scholar] [CrossRef]
- Baswana, S.; Goyal, V.; Sen, S. All-pairs nearly 2-approximate shortest paths in $O(n^2\,\mathrm{polylog}\,n)$ time. Theor. Comput. Sci. 2009, 410, 84–93. [Google Scholar] [CrossRef]
- Yuster, R. Approximate shortest paths in weighted graphs. J. Comput. Syst. Sci. 2012, 78, 632–637. [Google Scholar] [CrossRef]
- Thorup, M.; Zwick, U. Approximate distance oracles. J. ACM 2005, 52, 1–24. [Google Scholar] [CrossRef]
- Das Sarma, A.; Gollapudi, S.; Najork, M.; Panigrahy, R. A sketch-based distance oracle for web-scale graphs. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, New York, NY, USA, 3–6 February 2010; pp. 401–410. [Google Scholar]
- Wang, Y.; Wang, Q.; Koehler, H.; Lin, Y. Query-by-Sketch: Scaling Shortest Path Graph Queries on Very Large Networks. In Proceedings of the 2021 International Conference on Management of Data, Virtual Event, 20–25 June 2021; pp. 1946–1958. [Google Scholar]
- Akiba, T.; Iwata, Y.; Yoshida, Y. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 22–27 June 2013; pp. 349–360. [Google Scholar]
- Mendiola, E. Algoritmo Para el Cálculo Acelerado de Distancias Conceptuales. Master’s Thesis, Instituto Politécnico Nacional, Mexico City, Mexico, 2022. [Google Scholar]
- Robertson, N.; Seymour, P. Graph minors. III. Planar tree-width. J. Comb. Theory Ser. B 1984, 36, 49–64. [Google Scholar] [CrossRef]
- Miller, G.A.; Charles, W.G. Contextual correlates of semantic similarity. Lang. Cogn. Process. 1991, 6, 1–28. [Google Scholar] [CrossRef]
- Rubenstein, H.; Goodenough, J.B. Contextual correlates of synonymy. Commun. ACM 1965, 8, 627–633. [Google Scholar] [CrossRef]
- Pirro, G. A semantic similarity metric combining features and intrinsic information content. Data Knowl. Eng. 2009, 68, 1289–1308. [Google Scholar] [CrossRef]
- Agirre, E.; Alfonseca, E.; Hall, K.; Kravalova, J.; Pasca, M.; Soroa, A. A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. In Proceedings of the Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Boulder, Colorado, 31 May–5 June 2009; pp. 19–27. Available online: https://aclanthology.org/N09-1003 (accessed on 9 October 2023).
- Hill, F.; Reichart, R.; Korhonen, A. Simlex-999: Evaluating semantic models with (genuine) similarity estimation. Comput. Linguist. 2015, 41, 665–695. [Google Scholar] [CrossRef]
- Halawi, G.; Dror, G.; Gabrilovich, E.; Koren, Y. Large-scale learning of word relatedness with constraints. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 1406–1414. [Google Scholar]
- Radinsky, K.; Agichtein, E.; Gabrilovich, E.; Markovitch, S. A word at a time: Computing word relatedness using temporal semantic analysis. In Proceedings of the 20th International Conference on World Wide Web, Hyderabad, India, 28 March–1 April 2011; pp. 337–346. [Google Scholar]
- Finkelstein, L.; Gabrilovich, E.; Matias, Y.; Rivlin, E.; Solan, Z.; Wolfman, G.; Ruppin, E. Placing search in context: The concept revisited. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 406–414. [Google Scholar]
- Szumlanski, S.; Gomez, F.; Sims, V.K. A new set of norms for semantic relatedness measures. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013; Volume 2: Short Papers, pp. 890–895. [Google Scholar]
- Huang, E.H.; Socher, R.; Manning, C.D.; Ng, A.Y. Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Republic of Korea, 8–14 July 2012; Volume 1: Long Papers. [Google Scholar]
- Cohen, E.; Halperin, E.; Kaplan, H.; Zwick, U. Reachability and distance queries via 2-hop labels. SIAM J. Comput. 2003, 32, 1338–1355. [Google Scholar] [CrossRef]
Reference | Expression | Description |
---|---|---|
Rada et al. (1989) [27] | | Length of the path from a to b. |
Wu and Palmer (1994) [28] | | a and b are concepts within the hierarchy, and c is their least common superconcept. The expression uses the number of nodes on the path from a to c, the number of nodes on the path from b to c, and the number of nodes on the path from c to the hierarchy root. |
Hirst and Stonge (1995) [29] | | C and K are constants. The expression also uses the length of the shortest path between a and b and the number of times that path changes direction. |
Li et al. (2003) [30] | | The expression uses the length of the shortest path between a and b; h, the minimum depth of the LCS (the most specific concept that is an ancestor of both a and b) in the hierarchy; and two parameters that scale the contributions of the path length and the depth, respectively. |
Shenoy et al. (2012) [31] | | L is the shortest distance between a and b, calculated by taking the direction of the edges into account: each vertical direction is assigned a value of 1, and one is added for every direction change. N is the depth of the entire tree. The expression also uses the distances from the root to the concepts a and b, and a factor that is 1 for adjacent concepts and 0 for the rest. |
Reference | Expression | Description |
---|---|---|
Resnik (1995) [36] | | Based on the concept of the least common subsumer (LCS): the information content (IC) is calculated when the terms share an LCS. A high IC value indicates that the term is more specific and describes a concept with less ambiguity. |
Jiang and Conrath (1997) [37] | | Focuses on determining the link strength of an edge connecting a parent node to a child node. These taxonomic links between concepts are reinforced by the difference between a concept’s IC and that of its LCS. |
Gao et al. (2015) [38] | | The expression uses a constant and the set of meanings of the concept x. |
Jiang et al. (2017) [39] | | The expression uses the set of hyponyms of a, the set of pages in category c, and the set of categories. |
 | | The second proposed approach combines IC obtained from the category structure with the extension of ontology-based methods. |
 | | Generalization of the Zhou et al. (2008) [40] approach. The expression uses an adjustment factor for the weight of the two features involved in the IC calculation, the depth of the leaf a, and the maximum depth of a leaf. |
 | | Generalization of the Sanchez et al. (2011) [41] approach. The expression uses the set of leaves of a in the category hierarchy, the set of hypernyms, and the maximum number of leaves in the hierarchy. |
Dataset | Content | Type | Word Pairs | Nodes | Edges |
---|---|---|---|---|---|
MC30 [76] | Nouns | Similarity | 30 | 5496 | 9583 |
RG65 [77] | Nouns | Similarity | 65 | 5080 | 9271 |
PS [78] | Nouns | Similarity | 65 | 5080 | 9271 |
Agirre201 [79] | Nouns | Similarity | 201 | 29,738 | 75,120 |
SimLex665 [80] | Nouns | Similarity | 665 | 44,798 | 138,122 |
MTurk771 [81] | Nouns | Relation | 771 | 55,403 | 185,523 |
MTurk287 [82] | Nouns | Relation | 287 | 25,576 | 58,637 |
WS245Rel [83] | Nouns | Relation | 245 | 24,029 | 56,983 |
Rel122 [84] | Nouns | Relation | 122 | 23,775 | 58,322 |
SCWS [85] | Nouns | Relation | 1994 | 53,052 | 183,366 |
All | - | - | 4445 | 119,034 | 818,788 |
Algorithm | MC | RG | PS | Agirre | SimLex | MTurk771 | MTurk287 | WSRel | Rel | SCWS | All | Avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Dijkstra | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 1.00 |
Pruned Dijkstra | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 1.00 |
Sketches k = 1 | 0.94 | 0.86 | 0.86 | 0.72 | 0.75 | 0.61 | 0.85 | 0.69 | 0.82 | 0.82 | 0.0 | 0.79 |
Sketches k = 2 | 0.98 | 0.95 | 0.96 | 0.85 | 0.81 | 0.73 | 0.87 | 0.83 | 0.91 | 0.90 | 0.0 | 0.88 |
Sketches k = 5 | 0.98 | 0.98 | 0.98 | 0.92 | 0.89 | 0.88 | 0.93 | 0.91 | 0.95 | 0.95 | 0.0 | 0.94 |
Sketches k = 10 | 0.99 | 0.98 | 0.99 | 0.95 | 0.95 | 0.93 | 0.97 | 0.95 | 0.97 | 0.97 | 0.0 | 0.97 |
Sketches k = 15 | 0.99 | 0.99 | 0.99 | 0.97 | 0.97 | 0.96 | 0.97 | 0.97 | 0.98 | 0.98 | 0.0 | 0.98 |
Sketches k = 20 | 0.99 | 0.99 | 0.99 | 0.98 | 0.98 | 0.97 | 0.98 | 0.97 | 0.99 | 0.99 | 0.0 | 0.98 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Quintero, R.; Mendiola, E.; Guzmán, G.; Torres-Ruiz, M.; Guzmán Sánchez-Mejorada, C. Algorithm for the Accelerated Calculation of Conceptual Distances in Large Knowledge Graphs. Mathematics 2023, 11, 4806. https://doi.org/10.3390/math11234806