Construction of Knowledge Graphs: Current State and Challenges
Abstract
1. Introduction
2. Prerequisites
2.1. Knowledge Graph
2.2. Importance of Application Domain and Use Cases
2.3. Graph Models
Terminology
2.4. Dynamic Adaptations in Knowledge Graph Construction
- Source Changes. Data can be updated continuously or at regular intervals. Sources such as relational databases receive regular updates that add new entries and revise existing ones. Platforms like Wikipedia and various websites continually expand their repositories with fresh information. At the same time, new forms of social media and multimedia content, such as text, image, and video posts, contribute to the vast amount of continuously populated data. Beyond traditional datasets, new sources emerge with different structures, access methods, and specialized domains. These changes broaden the scope of available information and require suitable methods for integrating diverse and evolving sources into knowledge graphs.
- Target Changes. Changes to the target domain or ontology are often precipitated by shifts in organizational priorities, advancements in knowledge within a particular field, or emerging trends in application requirements. These changes can be driven by evolving business strategies that necessitate a reevaluation of the scope and focus of the domain. External factors such as regulatory changes or shifts in market dynamics can also influence ontology adjustments, ensuring alignment with current standards and practices. Ultimately, the dynamic nature of the target domain or ontology reflects an ongoing effort to adapt and refine knowledge representation to best serve the evolving needs of stakeholders and the broader environment in which the system operates. For example, a Knowledge Graph initially centered on company data may later incorporate geographical information or extend into other specialized domains. Each new source brings unique opportunities and complexities, demanding flexible approaches to integration that can accommodate diverse data structures and formats. The evolution of data sources necessitates corresponding changes in the final data representation and ontology within Knowledge Graphs. As the scope of domains shifts, adjustments to the ontology become imperative to accurately reflect the underlying data. Ontologies can range from predefined structures that provide a consistent framework to flexible frameworks that evolve semi-automatically based on incoming data. This adaptability ensures that Knowledge Graphs remain relevant and effective in capturing and organizing the intricacies of evolving datasets.
- Method Changes. Integrating heterogeneous data sources into a Knowledge Graph involves a series of critical steps: extraction, resolution, fusion, completion, and quality assurance. These integration steps must change to accommodate updates in both the source data and the target domain of the Knowledge Graph, and to improve KG quality. Methods may need to be adjusted or augmented to handle new data sources effectively. Specific steps may be introduced to enhance the integration process, ensuring that the Knowledge Graph remains robust and up-to-date amidst evolving data landscapes. For instance, if the domain of the Knowledge Graph shifts to a specialized area like biomedical data, it might be necessary to incorporate a different method or tool for text extraction. In the biomedical field, specific techniques are often required to accurately extract entities such as gene names, protein interactions, or medical terminology. Additionally, as machine learning advances, updating the integration pipeline with more powerful algorithms can significantly enhance the quality and efficiency of the Knowledge Graph. For example, a new deep-learning model for entity recognition in biomedical texts could replace an older model to achieve better precision and recall.
3. Requirements
3.1. Input Data Requirements
3.2. Support for Incremental KG Updates
3.3. Pipeline and Tool Requirements
3.4. KG Quality Requirements
4. Construction Tasks
- Metadata Management: The acquisition and management of different kinds of metadata, e.g., about the provenance of entities, structural metadata, temporal information, quality reports, or process logs.
- Data Acquisition and Preprocessing: The selection of relevant sources, acquisition, and transformation of relevant source data, and initial data cleaning.
- Ontology Management: The creation and incremental evolution of a KG ontology.
- Knowledge Extraction (KE): The derivation of structured information and knowledge from unstructured or semi-structured data using techniques for named entity recognition, entity linking, and relation extraction. If necessary, this also entails canonicalizing entity and relation identifiers.
- Entity Resolution (ER) and Fusion: Identification of matching entities and their fusion within the KG.
- Knowledge Completion: Extending a given KG, e.g., by learning missing type information, predicting new relations, and enhancing domain-specific data (polishing).
- Quality Assurance (QA): Possible quality aspects, their identification, and repair strategies for data quality problems in the KG.
4.1. Metadata Management
4.1.1. Metadata Repositories
4.1.2. Graph Embedded Metadata
4.1.3. Versioning
4.2. Data Preprocessing
4.2.1. Source Selection and Filtering
4.2.2. Data Acquisition
4.2.3. Transformation and Mapping
4.2.4. Data Cleaning
4.3. Ontology Management
4.3.1. Ontology Learning
4.3.2. Ontology/Schema Matching
4.3.3. Ontology Integration
- Simple Merge. Imports all input ontologies into a new ontology and adds bridging constructs between equivalent entities, like defining OWL equivalentClass or equivalentProperty relations.
- Full Merge. Imports all source ontologies into a new ontology and merges each cluster of equivalent entities into a new unique entity with a union of all their relations.
- Asymmetric Merge. These approaches import source ontologies into a preferred target ontology, preserving all its concepts, relations, and rules by merging matching entities into existing target entities or else by creating new ones.
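The difference between a simple and a full merge can be illustrated with a minimal Python sketch that treats each ontology as a set of (subject, predicate, object) triples. The function names, the prefixed identifiers (`a:`, `b:`, `m:`), and the toy data are assumptions made for illustration only; real ontology integration tools operate on full OWL axioms rather than plain triples.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

OWL_EQUIVALENT_CLASS = "owl:equivalentClass"


def simple_merge(ontologies: List[Set[Triple]],
                 equivalences: List[Tuple[str, str]]) -> Set[Triple]:
    """Union all input triples and add a bridging axiom per matched pair."""
    merged: Set[Triple] = set().union(*ontologies)
    for left, right in equivalences:
        merged.add((left, OWL_EQUIVALENT_CLASS, right))
    return merged


def full_merge(ontologies: List[Set[Triple]],
               clusters: Dict[str, Set[str]]) -> Set[Triple]:
    """Replace all members of a cluster by one new entity.

    `clusters` maps a new (hypothetical) entity ID to the source entities
    it replaces; the new entity receives the union of their relations.
    """
    rename = {old: new for new, members in clusters.items() for old in members}
    merged: Set[Triple] = set()
    for ontology in ontologies:
        for s, p, o in ontology:
            merged.add((rename.get(s, s), p, rename.get(o, o)))
    return merged


# Toy input ontologies (illustrative data, not from any real ontology)
onto_a = {("a:Person", "rdfs:subClassOf", "a:Agent")}
onto_b = {("b:Human", "rdfs:subClassOf", "b:LivingThing")}

bridged = simple_merge([onto_a, onto_b], [("a:Person", "b:Human")])
unified = full_merge([onto_a, onto_b], {"m:Person": {"a:Person", "b:Human"}})
```

In this simplified setting, an asymmetric merge could reuse the same renaming step with an existing target entity as the cluster key, e.g., `{"a:Person": {"b:Human"}}`, so that the target ontology's identifiers are preserved.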
4.4. Knowledge Extraction
4.4.1. Named Entity Recognition
4.4.2. Linking
4.4.3. Relation Extraction
4.5. Entity Resolution
4.5.1. Incremental Entity Resolution
4.5.2. Fusion
- Conflict Ignorance: The conflict is not handled, but the different attribute values may be retained, or the problem can be delegated to the user application.
- Conflict Avoidance: It applies a single, uniform strategy to all data without inspecting the conflicting values. For example, it prioritizes data from trusted sources over others.
- Conflict Resolution: It considers all data and metadata before making a decision to apply a specified strategy, such as taking the most frequent, the most recent, or a randomly selected value.
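The three strategy classes above can be sketched as small fusion functions over competing attribute values. This is a minimal illustration under stated assumptions: the `Claim` record, the function names, and the sample values are hypothetical and not taken from any of the surveyed systems.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Claim:
    """One candidate value for an attribute, with simple provenance."""
    value: str
    source: str
    timestamp: int  # e.g., extraction time as a Unix timestamp


def ignore_conflicts(claims: List[Claim]) -> List[str]:
    """Conflict ignorance: retain all distinct values for the application."""
    return sorted({c.value for c in claims})


def avoid_conflicts(claims: List[Claim], trusted_order: List[str]) -> str:
    """Conflict avoidance: pick by source trust only, ignoring the values."""
    rank = {source: i for i, source in enumerate(trusted_order)}
    # Sources missing from trusted_order are ranked last.
    return min(claims, key=lambda c: rank.get(c.source, len(rank))).value


def resolve_most_frequent(claims: List[Claim]) -> str:
    """Conflict resolution: choose the value asserted most often."""
    return Counter(c.value for c in claims).most_common(1)[0][0]


def resolve_most_recent(claims: List[Claim]) -> str:
    """Conflict resolution: choose the value with the latest timestamp."""
    return max(claims, key=lambda c: c.timestamp).value


claims = [
    Claim("Leipzig", "source_a", 100),
    Claim("Leipzig", "source_b", 200),
    Claim("Dresden", "source_c", 300),
]
```

Here, `resolve_most_frequent(claims)` and `resolve_most_recent(claims)` disagree, which illustrates why the choice of resolution function usually depends on the attribute and the available metadata.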
4.6. Completion
4.6.1. Type Completion
4.6.2. Link Prediction
4.6.3. Data Enrichment
4.7. Quality Assurance
4.7.1. Quality Dimensions
- Accuracy indicates the correctness of facts in a KG, including type, value, and relation correctness. It can be separated into syntactic accuracy, assessing wrong value datatype/format, and semantic accuracy, assessing wrong information.
- Consistency ensures coherency and uniformity of the data within the graph. A consistent KG follows logical rules, avoids contradictions, and maintains coherence among entities, relationships, and attributes. Inconsistencies arise from conflicting information, duplicates, or rule violations.
- Timeliness in the context of KGs refers to the currency and freshness of the information present in the graph. KG timeliness can be influenced by the chosen integration approach, which may involve batch processing at specific intervals or real-time updates.
- Completeness captures and reflects knowledge coverage within a specific domain. Completeness is also a goal for KG completion as it involves generating new values or data to augment the current KG.
- Trustworthiness indicates the confidence and reliability of the KG and depends on source selection and the applied construction methods. It is strongly related to the quality dimensions of completeness, accuracy, and timeliness.
- Availability is the extent to which knowledge is convenient to use. In other words, it refers to how easily and quickly the knowledge of KGs can be retrieved concerning query complexity and data representation.
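Some of these dimensions can be approximated by simple ratios. The following sketch, assuming entities are flat property dictionaries, computes a syntactic accuracy score (share of values matching an expected format) and a completeness score (share of required properties that are filled). The format rules, property names, and sample entities are hypothetical; semantic accuracy and trustworthiness cannot be measured this way and typically need reference data or human judgment.

```python
import re
from typing import Dict, List

Entity = Dict[str, str]

# Hypothetical format rules: property name -> expected value pattern
FORMAT_RULES = {
    "birthDate": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "population": re.compile(r"\d+"),
}


def syntactic_accuracy(entities: List[Entity]) -> float:
    """Share of rule-covered values whose format matches the rule."""
    checked = valid = 0
    for entity in entities:
        for prop, value in entity.items():
            rule = FORMAT_RULES.get(prop)
            if rule is None:
                continue  # no format rule defined for this property
            checked += 1
            valid += bool(rule.fullmatch(value))
    return valid / checked if checked else 1.0


def completeness(entities: List[Entity], required: List[str]) -> float:
    """Share of required property slots that are actually filled."""
    slots = len(entities) * len(required)
    filled = sum(1 for e in entities for p in required if e.get(p))
    return filled / slots if slots else 1.0


entities = [
    {"name": "Ada", "birthDate": "1815-12-10"},
    {"name": "Bob", "birthDate": "10.12.1815"},  # wrong date format
    {"name": "Leipzig", "population": "600000"},
]

accuracy_score = syntactic_accuracy(entities)
completeness_score = completeness(entities, ["name", "birthDate"])
```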
4.7.2. Evaluation Methods
4.7.3. Quality Improvement
4.7.4. Frameworks and Benchmarks
5. KG Systems
5.1. Specific KGs
5.2. KG Frameworks
5.3. Comparison
- KG Initialization. Here, a common strategy is to manually create the initial KG either by developing it from scratch or by reusing existing KGs (ontologies). There may also be a complex pipeline to construct the initial KG by processing semi-structured data from catalogs, wikis, or category systems. All projects start with building or using some initial KG data. Most of the approaches reuse or sample existing KGs and ontologies as the initial target KG. WorldKG and HKGB semi-automatically build an initial ontology and are, therefore, more advanced than manual ontology construction. For DRKG, SAGA, and SAKA, it is unclear how the initial KG (ontology) was derived.
- Data Preprocessing. These steps cover access methods and support the filtering, normalization, or correction of noisy input data. We exclude here NLP/text pre-processing as this is normally part of Knowledge Extraction. We tried to highlight the incorporation of multiple of these steps with the filled circle. Some approaches apply a filtering step to integrate only entities of relevant types into the KG. This functionality is not always provided (or documented) and is often based on manually defined rules and filter definitions, e.g., to select properties and relationships for certain entity types. ArtistKG links entities in the data source to the current Knowledge Graph to identify entities of relevant target types. VisualSem filters out image nodes that fail its inclusion criteria, e.g., invalid images, near duplicates, and non-photographic images. YAGO filters low-coverage entities and accepts entities and their types that are transitively connected to one of the initial classes via sub-class relations, resulting in the final taxonomy of 10k classes (taxonomy enrichment). Many of the investigated tools use a custom mapping approach to convert semi-structured data into a KG representation (DBpedia, YAGO, etc.). The flexible framework Helio requires providing an RDF mapping framework implementation. WorldKG maps and constructs RDF data based on the key–value pair-based tag system of OpenStreetMap. FlexiFusion uses externally calculated entity and relation alignments and transforms the sources into an intermediate KG meta representation with IDs in the same namespace, keeping the initial source IDs as provenance. SAGA transforms input into a format in which each triple is extended with source, trust score, and one-hop relation information. SAKA employs a generic mapping approach to convert key–value pairs from the source data into RDF and then allows the assignment of entity and relation types. Lastly, some solutions also apply normalization steps during preparation, e.g., to unify date or number representations. In the case of DBpedia(-Live), the implementation recognizes value types and employs data parsers to normalize them into the same units or representations.
- Ontology Management. Most approaches have at least some basic (manual) support for evolving the KG ontology and schema data for newly structured input data. In DBpedia, the KG ontology (and data mappings) can be changed manually and need to be loaded before running a new batch update. The more freshness-oriented approach of DBpedia Live continuously watches ontology changes and immediately schedules affected entities for re-extraction. More advanced approaches rely on semi-automatic ontology evolution or enrichment. In particular, some systems can identify new entities and relation types in the input data to add to the ontology after manual confirmation (NELL, HKGB). Image2Triplets can fully automatically add newly recognized entities or relations to the KG but reserves human intervention for specific edge cases where the system alone cannot decide the manner of integration. ArtistKG uses Karma for ontology matching. While merging is mentioned, it is unclear what procedure is used. While WorldKG, for example, relies on an unsupervised ML approach for ontology alignment, most approaches still perform ontology alignment and merging manually. SAGA’s ingestion component requires mappings from new data to the internal KG ontology. This step only requires predicate mappings, as the subject and object fields can remain in their original namespace and are linked later in the process. CovidGraph performs the mapping of biomedical ontology terms based on their IDs.
- Knowledge Extraction. Many solutions use rule-based methods to extract entities and relations from semi-structured sources (DRKG, VisualSem). Some tools use machine learning approaches for extraction (AI-KG, AutoKnow, CovidGraph, dstlr, SLOGERT, NELL). For entity linking, different approaches are used, such as dictionary-based approaches relying on gathered synonyms (e.g., AI-KG), the use of human interaction (XI), or the application of entity resolution (e.g., HKGB). Plumber selects approaches from a combination of 33 different methods for NER, RE, and EL using an ML approach on dataset samples and provided metadata. A few approaches extract from multi-modal sources. Image2Triplets and VisualSem extract information from images. Image2Triplets uses computer vision techniques to extract visual relationships from images. It also determines human–object interaction in images, detecting novel objects and actions through zero-shot learning. VisualSem, on the other hand, only allows pre-defined relations. SAKA first segments audio files based on speakers and removes non-speech segments. It then transcribes the audio into text and performs Knowledge Extraction on the result. Given the focus on semi-structured data sources, Knowledge Extraction techniques are generally relatively advanced compared to other steps in KG construction. This has also been made possible by the frequent use of existing Knowledge Extraction tools, such as Stanford CoreNLP, as will be seen in the discussion of the approaches in the next subsections.
- Entity Resolution. This task is supported by only a few approaches, and the pipelines that do employ ER tend to use sophisticated methods like blocking to address scalability issues (ArtistKG, SAGA), and machine-learning-based matchers (SAGA). HKGB’s description of their ER solution is too vague to make a definite statement, and for SLOGERT, it is mentioned that in some cases, ER might be necessary but should be performed with an external tool. Similarly, Helio could enable any ER method through its underlying plugin architecture, but this possibility is only briefly mentioned in its requirements. CovidGraph relies on string similarities and global identifiers to identify matches. For textual data, the identification and matching of entities to KG elements are already covered by entity linking in the Knowledge Extraction step (Section 4.4). Only SAGA and ArtistKG use blocking methods to scale the matching process.
- Entity Fusion. This is the least supported task among the considered solutions. None of the dataset-specific KGs perform classical (sophisticated) entity fusion, consolidating possible value candidates and selecting final entity IDs or values. Instead, the final KG often contains a union of all extracted values, either with or without provenance, leaving final consolidation/selection to the targeted applications. The DRKG project uses a simple form of entity fusion to normalize entity identifiers. Even for the discussed toolsets, this task’s coverage is relatively low. FlexiFusion allows the application of specific fusion functions, leverages provenance information, and performs a stable ID assignment for entity and property clusters. SAGA refers to the usage of truth-discovery- and source-reliability-based fusion methods.
- Quality Assurance. Human-in-the-loop strategies have been applied to varying degrees, with some solutions, such as HKGB or XI, relying heavily on user interaction. In contrast, others, like NELL, require only final user approval of the correctness of extracted values or patterns. In the WorldKG approach, all class and predicate matches to the external ontologies are verified manually. Further, SAGA tries to detect potential errors or results of vandalism automatically. It quarantines them for human curation, where changes are treated directly in the live graph and later applied to the stable graph. AI-KG bases the validity of a triple on the trustworthiness of the extraction tool, the frequency of that triple being extracted reaching a certain threshold, or a specifically trained classifier deciding that it is valid. DBpedia and YAGO perform an automatic consistency check. The Helio paper mentions that in a specific use case, their approach was extended to use a validation mechanism, which they do not specify in more detail. Additionally, YAGO guarantees ontological consistency by applying a logical reasoner, and DBpedia checks for dataset completeness and measures quality against the former version. In our study, only dstlr supports validating extracted facts against an external knowledge base.
- Knowledge Completion. DBpedia attaches additional entity-type information based on current ontology and relation data. Three approaches (DRKG, HKGB, SAGA) presented ML-based link prediction on graph embeddings to find further knowledge. In the case of the DRKG and HKGB approaches, it is unclear if the newly predicted information flows back into the KG or is stored separately. AutoKnow uses a learning-based approach to categorize product types. Regarding enrichment with external knowledge, dstlr links entities to Wikidata and fetches stored properties from this external source. However, SLOGERT only adds links to external information based on previously extracted identifiers (PIDs).
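To illustrate why blocking helps Entity Resolution scale, as noted in the Entity Resolution item above, the following minimal sketch groups records by a cheap blocking key and compares names only within each block. It does not reproduce the method of SAGA or ArtistKG; the blocking key (first three letters of the name), the similarity measure (Python's `difflib.SequenceMatcher`), the threshold, and the sample records are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

Record = Dict[str, str]


def blocking_key(record: Record) -> str:
    """Hypothetical blocking key: first three letters of the lowercased name."""
    return record["name"].lower()[:3]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(records: List[Record], threshold: float = 0.85) -> List[Tuple[str, str]]:
    """Return ID pairs whose names are similar, comparing only within blocks."""
    blocks: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)

    matches = []
    for block in blocks.values():
        # Pairwise comparison is quadratic only in the block size,
        # not in the total number of records.
        for left, right in combinations(block, 2):
            if similarity(left["name"], right["name"]) >= threshold:
                matches.append((left["id"], right["id"]))
    return matches


records = [
    {"id": "a1", "name": "Vincent van Gogh"},
    {"id": "b7", "name": "Vincent Van Gogh"},
    {"id": "c3", "name": "Vincent Price"},
    {"id": "d9", "name": "Claude Monet"},
]

matched_pairs = match(records)
```

With these sample records, "Claude Monet" is never compared against the three "Vincent" records because it falls into a different block. The trade-off is that a poorly chosen blocking key can separate true matches, which is why blocking keys are usually designed per domain.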
6. Discussion and Open Challenges
7. Related Work
8. Conclusions
Author Contributions
Funding
Conflicts of Interest
- Rodriguez, M.A. The Gremlin graph traversal machine and language (invited talk). In Proceedings of the 15th Symposium on Database Programming Languages (SPLASH), Pittsburgh, PA, USA, 25–30 October 2015. [Google Scholar] [CrossRef]
- van Rest, O.; Hong, S.; Kim, J.; Meng, X.; Chafi, H. PGQL: A property graph query language. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, Redwood Shores, CA, USA, 24 June 2016; p. 7. [Google Scholar] [CrossRef]
- Francis, N.; Green, A.; Guagliardo, P.; Libkin, L.; Lindaaker, T.; Marsault, V.; Plantikow, S.; Rydberg, M.; Selmer, P.; Taylor, A. Cypher: An Evolving Query Language for Property Graphs. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, 10–15 June 2018; pp. 1433–1445. [Google Scholar] [CrossRef]
- Deutsch, A.; Francis, N.; Green, A.; Hare, K.; Li, B.; Libkin, L.; Lindaaker, T.; Marsault, V.; Martens, W.; Michels, J.; et al. Graph Pattern Matching in GQL and SQL/PGQ. In Proceedings of the SIGMOD’22: International Conference on Management of Data, Philadelphia, PA, USA, 12–17 June 2022; pp. 2246–2258. [Google Scholar] [CrossRef]
- Chiba, H.; Yamanaka, R.; Matsumoto, S. Property Graph Exchange Format. arXiv 2019, arXiv:1907.03936. [Google Scholar] [CrossRef]
- Tomaszuk, D.; Angles, R.; Szeremeta, L.; Litman, K.; Cisterna, D. Serialization for Property Graphs. In Proceedings of the Beyond Databases, Architectures and Structures. Paving the Road to Smart Data Processing and Analysis—15th International Conference, BDAS 2019, Ustroń, Poland, 28–31 May 2019; pp. 57–69. [Google Scholar] [CrossRef]
- Neelam, S.; Sharma, U.; Bhatia, S.; Karanam, H.; Likhyani, A.; Abdelaziz, I.; Fokoue, A.; Subramaniam, L.V. Expressive Reasoning Graph Store: A Unified Framework for Managing RDF and Property Graph Databases. arXiv 2022, arXiv:2209.05828. [Google Scholar] [CrossRef]
- Angles, R.; Bonifati, A.; Dumbrava, S.; Fletcher, G.; Hare, K.; Hidders, J.; Lee, V.E.; Li, B.; Libkin, L.; Martens, W.; et al. PG-Keys: Keys for Property Graphs. In Proceedings of the 2021 International Conference on Management of Data, Shanxi, China, 3–5 June 2021. [Google Scholar]
- Bonifati, A.; Dumbrava, S.; Fletcher, G.; Hidders, J.; Li, B.; Libkin, L.; Martens, W.; Murlak, F.; Plantikow, S.; Savkovi’c, O.; et al. PG-Schema: Schemas for Property Graphs. Proc. ACM Manag. Data 2022, 1, 1–25. [Google Scholar]
- Rost, C.; Fritzsche, P.; Schons, L.; Zimmer, M.; Gawlick, D.; Rahm, E. Bitemporal Property Graphs to Organize Evolving Systems. arXiv 2021, arXiv:2111.13499. [Google Scholar] [CrossRef]
- Besta, M.; Fischer, M.; Kalavri, V.; Kapralov, M.; Hoefler, T. Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism. arXiv 2019, arXiv:1912.12740. [Google Scholar] [CrossRef]
- Lassila, O.; Schmidt, M.; Hartig, O.; Bebee, B.; Bechberger, D.; Broekema, W. The OneGraph Vision: Challenges of Breaking the Graph Model Lock-In. Semant. Web 2022. [Google Scholar] [CrossRef]
- Tian, Y. The World of Graph Databases from An Industry Perspective. SIGMOD Rec. 2022, 51, 60–67. [Google Scholar] [CrossRef]
- Ilyas, I.F.; Rekatsinas, T.; Konda, V.; Pound, J.; Qi, X.; Soliman, M. Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale. In Proceedings of the 2022 International Conference on Management of Data, SIGMOD ’22, New York, NY, USA, 12–17 June 2022; pp. 2259–2272. [Google Scholar] [CrossRef]
- Hartig, O. Reconciliation of RDF* and Property Graphs. arXiv 2014, arXiv:1409.3288. [Google Scholar] [CrossRef]
- Abuoda, G.; Dell’Aglio, D.; Keen, A.; Hose, K. Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches. In Proceedings of the QuWeDa 2022: 6th Workshop on Storing, Querying and Benchmarking Knowledge Graphs at ISWC, Online, 23 October 2022; Volume 3279, pp. 17–32. [Google Scholar]
- Taelman, R.; Sande, M.V.; Verborgh, R. GraphQL-LD: Linked Data Querying with GraphQL. In Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks Co-Located with 17th International Semantic Web Conference (ISWC 2018), Monterey, CA, USA, 8–12 October 2018; Volume 2180. [Google Scholar]
- Cudré-Mauroux, P. Leveraging Knowledge Graphs for big data integration: The XI pipeline. Semant. Web 2020, 11, 13–17. [Google Scholar] [CrossRef]
- Madnick, S.E.; Wang, R.Y.; Lee, Y.W.; Zhu, H. Overview and Framework for Data and Information Quality Research. ACM J. Data Inf. Qual. 2009, 1, 1–22. [Google Scholar] [CrossRef]
- Zaveri, A.; Rula, A.; Maurino, A.; Pietrobon, R.; Lehmann, J.; Auer, S. Quality assessment for linked data: A survey. Semant. Web 2016, 7, 63–93. [Google Scholar] [CrossRef]
- Wang, X.; Chen, L.; Ban, T.; Usman, M.; Guan, Y.; Liu, S.; Wu, T.; Chen, H. Knowledge Graph Quality Control: A Survey. Fundam. Res. 2021, 1, 607–626. [Google Scholar] [CrossRef]
- Narayan, A.; Chami, I.; Orr, L.J.; Ré, C. Can Foundation Models Wrangle Your Data? Proc. VLDB Endow. 2022, 16, 738–746. [Google Scholar] [CrossRef]
- Trummer, I. From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management. Proc. VLDB Endow. 2022, 15, 3770–3773. [Google Scholar] [CrossRef]
- Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 1–9. [Google Scholar] [CrossRef] [PubMed]
- Kricke, M.; Grimmer, M.; Schmeißer, M. Preserving Recomputability of Results from Big Data Transformation Workflows. Datenbank-Spektrum 2017, 17, 245–253. [Google Scholar] [CrossRef]
- Greenberg, J. Understanding metadata and metadata schemes. Cat. Classif. Q. 2005, 40, 17–36. [Google Scholar] [CrossRef]
- Neto, C.B.; Kontokostas, D.; Kirschenbaum, A.; Publio, G.C.; Esteves, D.; Hellmann, S. IDOL: Comprehensive & complete LOD insights. In Proceedings of the 13th International Conference on Semantic Systems (SEMANTiCS), Amsterdam, The Netherlands, 11–14 September 2017; pp. 49–56. [Google Scholar]
- Duval, E.; Hodgins, W.; Sutton, S.; Weibel, S.L. Metadata principles and practicalities. D-Lib Mag. 2002, 8, 1–10. [Google Scholar] [CrossRef]
- Arora, S.; Yang, B.; Eyuboglu, S.; Narayan, A.; Hojel, A.; Trummer, I.; Ré, C. Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes. Proc. VLDB Endow. 2023, 17, 92–105. [Google Scholar] [CrossRef]
- Chen, Z.; Cao, L.; Madden, S.; Kraska, T.; Shang, Z.; Fan, J.; Tang, N.; Gu, Z.; Liu, C.; Cafarella, M. SEED: Domain-Specific Data Curation With Large Language Models. arXiv 2023, arXiv:2310.00749. [Google Scholar] [CrossRef]
- Kadioglu, D.; Breil, B.; Knell, C.; Lablans, M.; Mate, S.; Schlue, D.; Serve, H.; Storf, H.; Ückert, F.; Wagner, T.O.; et al. Samply. MDR-A Metadata Repository and Its Application in Various Research Networks. In Proceedings of the GMDS, Osnabrück, Germany, 2–6 September 2018; pp. 50–54. [Google Scholar]
- Frey, J.; Götz, F.; Hofer, M.; Hellmann, S. Managing and Compiling Data Dependencies for Semantic Applications Using Databus Client. In Proceedings of the Research Conference on Metadata and Semantics Research, London, UK, 29 November–3 December 2022; pp. 114–125. [Google Scholar]
- Frey, J.; Hofer, M.; Obraczka, D.; Lehmann, J.; Hellmann, S. DBpedia FlexiFusion the best of Wikipedia> Wikidata> your data. In Proceedings of the 18th International Semantic Web Conference, Auckland, New Zealand, 26–30 October 2019; pp. 96–112. [Google Scholar]
- Meester, B.D.; Dimou, A.; Verborgh, R.; Mannens, E. Detailed Provenance Capture of Data Processing. In Proceedings of the SemSci@ISWC, Vienna, Austria, 21 October 2017. [Google Scholar]
- Meester, B.D.; Seymoens, T.; Dimou, A.; Verborgh, R. Implementation-independent function reuse. Future Gener. Comput. Syst. 2020, 110, 946–959. [Google Scholar] [CrossRef]
- Fernández, J.D.; Polleres, A.; Umbrich, J. Towards Efficient Archiving of Dynamic Linked Open Data. In Proceedings of the First DIACHRON Workshop on Managing the Evolution and Preservation of the Data Web Co-Located with 12th European Semantic Web Conference (ESWC 2015), Portorož, Slovenia, 31 May 2015; Volume 1377, pp. 34–49. [Google Scholar]
- Taelman, R.; Mahieu, T.; Vanbrabant, M.; Verborgh, R. Optimizing storage of RDF archives using bidirectional delta chains. Semant. Web 2022, 13, 705–734. [Google Scholar] [CrossRef]
- Hofer, M.; Hellmann, S.; Dojchinovski, M.; Frey, J. The new dbpedia release cycle: Increasing agility and efficiency in Knowledge Extraction workflows. In Proceedings of the International Conference on Semantic Systems, Amsterdam, The Netherlands, 7–10 September 2020; pp. 1–18. [Google Scholar]
- Zhang, H.; Wang, X.; Pan, J.; Wang, H. SAKA: An intelligent platform for semi-automated Knowledge Graph construction and application. Serv. Oriented Comput. Appl. 2023, 17, 201–212. [Google Scholar] [CrossRef]
- Graube, M.; Hensel, S.; Urbas, L. R43ples: Revisions for Triples—An Approach for Version Control in the Semantic Web. In Proceedings of the 1st Workshop on Linked Data Quality Co-Located with 10th International Conference on Semantic Systems, LDQ@SEMANTiCS 2014, Leipzig, Germany, 2 September 2014; Volume 1215. [Google Scholar]
- Arndt, N.; Naumann, P.; Radtke, N.; Martin, M.; Marx, E. Decentralized Collaborative Knowledge Management Using Git. J. Web Semant. 2019, 54, 29–47. [Google Scholar] [CrossRef]
- Anderson, J.; Bendiken, A. Transaction-Time Queries in Dydra. In Proceedings of the Joint Proceedings of the 2nd Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW 2016) and the 3rd Workshop on Linked Data Quality (LDQ 2016) Co-Located with 13th European Semantic Web Conference (ESWC 2016), Heraklion, Greece, 30 May 2016; Volume 1585, pp. 11–19. [Google Scholar]
- Debrouvier, A.; Parodi, E.; Perazzo, M.; Soliani, V.; Vaisman, A.A. A model and query language for temporal graph databases. VLDB J. 2021, 30, 825–858. [Google Scholar] [CrossRef]
- Dong, X.L.; Gabrilovich, E.; Murphy, K.; Dang, V.; Horn, W.; Lugaresi, C.; Sun, S.; Zhang, W. Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources. Proc. VLDB Endow. 2015, 8, 938–949. [Google Scholar] [CrossRef]
- Amsterdamer, Y.; Cohen, M. Automated Selection of Multiple Datasets for Extension by Integration. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Queensland, Australia, 1–5 November 2021; pp. 27–36. [Google Scholar]
- Fetahu, B.; Dietze, S.; Pereira Nunes, B.; Antonio Casanova, M.; Taibi, D.; Nejdl, W. A scalable approach for efficiently generating structured dataset topic profiles. In Proceedings of the European Semantic Web Conference (ESWC), Crete, Greece, 25–29 May 2014; pp. 519–534. [Google Scholar]
- Blei, D.M.; Lafferty, J.D. A correlated topic model of science. Ann. Appl. Stat. 2007, 1, 17–35. [Google Scholar] [CrossRef]
- Nentwig, M.; Rahm, E. Incremental clustering on linked data. In Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore, 17–20 November 2018; pp. 531–538. [Google Scholar]
- Saeedi, A.; Peukert, E.; Rahm, E. Incremental Multi-source Entity Resolution for Knowledge Graph Completion. In Proceedings of the European Semantic Web Conference (ESWC), Athens, Greece, 2–6 November 2020; pp. 393–408. [Google Scholar]
- Hertling, S.; Paulheim, H. Order Matters: Matching Multiple Knowledge Graphs. In Proceedings of the K-CAP ’21: Knowledge Capture Conference, Virtual, 2–3 December 2021; pp. 113–120. [Google Scholar] [CrossRef]
- Giese, M.; Soylu, A.; Vega-Gorgojo, G.; Waaler, A.; Haase, P.; Jiménez-Ruiz, E.; Lanti, D.; Rezk, M.; Xiao, G.; Özçep, Ö.L.; et al. Optique: Zooming in on Big Data. Computer 2015, 48, 60–67. [Google Scholar] [CrossRef]
- Civili, C.; Console, M.; Giacomo, G.D.; Lembo, D.; Lenzerini, M.; Lepore, L.; Mancini, R.; Poggi, A.; Rosati, R.; Ruzzi, M.; et al. MASTRO STUDIO: Managing Ontology-Based Data Access applications. Proc. VLDB Endow. 2013, 6, 1314–1317. [Google Scholar] [CrossRef]
- Mami, M.N.; Graux, D.; Scerri, S.; Jabeen, H.; Auer, S.; Lehmann, J. Squerall: Virtual Ontology-Based Access to Heterogeneous and Large Data Sources. In Proceedings of the 18th International Semantic Web Conference (ISWC), Auckland, New Zealand, 26–30 October 2019; Volume 11779, pp. 229–245. [Google Scholar] [CrossRef]
- Banavar, G.; Chandra, T.; Mukherjee, B.; Nagarajarao, J.; Strom, R.E.; Sturman, D.C. An efficient multicast protocol for content-based publish-subscribe systems. In Proceedings of the 19th IEEE International Conference on Distributed Computing Systems (Cat. No. 99CB37003), Austin, TX, USA, 5 June 1999; pp. 262–272. [Google Scholar]
- Völkel, M.; Groza, T. SemVersion: An RDF-based Ontology Versioning System. In Proceedings of the IADIS International Conference on WWW/Internet, IADIS, Murcia, Spain, 5–8 October 2006. [Google Scholar]
- Im, D.H.; Lee, S.W.; Kim, H.J. A Version Management Framework for RDF Triple Stores. Int. J. Softw. Eng. Knowl. Eng. 2012, 22, 85–106. [Google Scholar] [CrossRef]
- Sande, M.V.; Colpaert, P.; Verborgh, R.; Coppens, S.; Mannens, E.; de Walle, R.V. R&Wbase: Git for triples. In Proceedings of the LDOW, Rio de Janeiro, Brazil, 14 May 2013; 14 May 2013. [Google Scholar]
- Neumann, T.; Weikum, G. X-RDF-3X: Fast Querying, High Update Rates, and Consistency for RDF Databases. Proc. VLDB Endow. 2010, 3, 256–263. [Google Scholar] [CrossRef]
- Stefanidis, K.; Chrysakis, I.; Flouris, G. On Designing Archiving Policies for Evolving RDF Datasets on the Web. In Proceedings of the Conceptual Modeling: 33rd International Conference, ER 2014, Atlanta, GA, USA, 27–29 October 2014; Volume 8824, pp. 43–56. [Google Scholar]
- Taelman, R.; Colpaert, P.; Mannens, E.; Verborgh, R. Generating public transport data based on population distributions for RDF benchmarking. Semant. Web 2019, 10, 305–328. [Google Scholar] [CrossRef]
- Lancker, D.V.; Colpaert, P.; Delva, H.; de Vyvere, B.V.; Meléndez, J.A.R.; Dedecker, R.; Michiels, P.; Buyle, R.; Craene, A.D.; Verborgh, R. Publishing Base Registries as Linked Data Event Streams. In Proceedings of the Web Engineering—21st International Conference, ICWE 2021, Biarritz, France, 18–21 May 2021; pp. 28–36. [Google Scholar] [CrossRef]
- Assche, D.V.; Oo, S.M.; Rojas, J.A.; Colpaert, P. Continuous generation of versioned collections’ members with RML and LDES. In Proceedings of the 3rd International Workshop on Knowledge Graph Construction (KGCW 2022) Co-Located with 19th Extended Semantic Web Conference (ESWC 2022), Hersonissos, Greek, 30 May 2022; Volume 3141. [Google Scholar]
- Aebeloe, C.; Keles, I.; Montoya, G.; Hose, K. Star Pattern Fragments: Accessing Knowledge Graphs through Star Patterns. arXiv 2020, arXiv:2002.09172. [Google Scholar] [CrossRef]
- Polleres, A.; Kamdar, M.R.; Fernández, J.D.; Tudorache, T.; Musen, M.A. A More Decentralized Vision for Linked Data. In Proceedings of the 2nd Workshop on Decentralizing the Semantic Web Co-Located with the 17th International Semantic Web Conference, DeSemWeb@ISWC 2018, Monterey, CA, USA, 8 October 2018; Volume 2165. [Google Scholar]
- Verborgh, R.; Sande, M.V.; Hartig, O.; Herwegen, J.V.; Vocht, L.D.; Meester, B.D.; Haesendonck, G.; Colpaert, P. Triple Pattern Fragments: A low-cost Knowledge Graph interface for the Web. J. Web Semant. 2016, 37–38, 184–206. [Google Scholar] [CrossRef]
- Aebeloe, C.; Montoya, G.; Hose, K. A Decentralized Architecture for Sharing and Querying Semantic Data. In Proceedings of the Semantic Web—16th International Conference, ESWC 2019, Portorož, Slovenia, 2–6 June 2019; Volume 11503, pp. 3–18. [Google Scholar] [CrossRef]
- Aebeloe, C.; Montoya, G.; Hose, K. Decentralized Indexing over a Network of RDF Peers. In Proceedings of the Semantic Web—ISWC 2019—18th International Semantic Web Conference, Auckland, New Zealand, 26–30 October 2019; Volume 11778, pp. 3–20. [Google Scholar] [CrossRef]
- Cai, M.; Frank, M.R. RDFPeers: A scalable distributed RDF repository based on a structured peer-to-peer network. In Proceedings of the 13th International Conference on World Wide Web, WWW 2004, New York, NY, USA, 17–20 May 2004; pp. 650–657. [Google Scholar] [CrossRef]
- Azzam, A.; Fernández, J.D.; Acosta, M.; Beno, M.; Polleres, A. SMART-KG: Hybrid Shipping for SPARQL Querying on the Web. In Proceedings of the WWW ’20: The Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 984–994. [Google Scholar] [CrossRef]
- Hartig, O.; Aranda, C.B. Bindings-Restricted Triple Pattern Fragments. In Proceedings of the on the Move to Meaningful Internet Systems: OTM 2016 Conferences—Confederated International Conferences: CoopIS, C&TC, and ODBASE 2016, Rhodes, Greece, 24–28 October 2016; Volume 10033, pp. 762–779. [Google Scholar] [CrossRef]
- Minier, T.; Skaf-Molli, H.; Molli, P. SaGe: Préemption Web pour les services publics d’évaluation de requêtes SPARQL. In Proceedings of the IC 2019: 30es Journées Francophones d’Ingénierie des Connaissances (Proceedings of the 30th French Knowledge Engineering Conference), Toulouse, France, 2–4 July 2019; p. 141. [Google Scholar]
- Montoya, G.; Aebeloe, C.; Hose, K. Towards Efficient Query Processing over Heterogeneous RDF Interfaces. In Proceedings of the Emerging Topics in Semantic Technologies—ISWC 2018 Satellite Events [Best Papers from 13 of the Workshops Co-Located with the ISWC 2018 Conference], Monterey, CA, USA, October 2018; Demidova, E., Zaveri, A., Simperl, E., Eds.; IOS Press: Amsterdam, The Netherlands, 2018; Volume 36, pp. 39–53. [Google Scholar] [CrossRef]
- Azzam, A.; Aebeloe, C.; Montoya, G.; Keles, I.; Polleres, A.; Hose, K. WiseKG: Balanced Access to Web Knowledge Graphs. In Proceedings of the WWW ’21: The Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; Leskovec, J., Grobelnik, M., Najork, M., Tang, J., Zia, L., Eds.; ACM/IW3C2. 2021; pp. 1422–1434. [Google Scholar] [CrossRef]
- Junior, A.C.; Debruyne, C.; Brennan, R.; O’Sullivan, D. FunUL: A method to incorporate functions into uplift mapping languages. In Proceedings of the 18th International Conference on Information Integration and Web-Based Applications and Services, Singapore, 28–30 November 2016. [Google Scholar]
- Dimou, A. R2RML and RML Comparison for RDF Generation, their Rules Validation and Inconsistency Resolution. arXiv 2020, arXiv:2005.06293. [Google Scholar] [CrossRef]
- Dimou, A.; Vander Sande, M.; Colpaert, P.; Verborgh, R.; Mannens, E.; Van de Walle, R. RDF mapping language (RML). Specif. Propos. Draft. 2014. Available online: https://rml.io/specs/rml/ (accessed on 18 August 2024).
- Iglesias, E.; Jozashoori, S.; Chaves-Fraga, D.; Collarana, D.; Vidal, M.E. SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Event, 19–23 October 2020. [Google Scholar]
- Knoblock, C.A.; Szekely, P.A.; Ambite, J.L.; Goel, A.; Gupta, S.; Lerman, K.; Muslea, M.; Taheriyan, M.; Mallick, P. Semi-automatically Mapping Structured Sources into the Semantic Web. In Proceedings of the Semantic Web: Research and Applications—9th Extended Semantic Web Conference, ESWC 2012, Heraklion, Greece, 27–31 May 2012; Simperl, E., Cimiano, P., Polleres, A., Corcho, Ó., Presutti, V., Eds.; Lecture Notes in Computer Science. Springer: Cham, Switzerland, 2012; Volume 7295, pp. 375–390. [Google Scholar] [CrossRef]
- Jain, N.; Liao, G.; Willke, T.L. Graphbuilder: Scalable graph ETL framework. In Proceedings of the First International Workshop on Graph Data Management Experiences and Systems (GRADES), New York, NY, USA, 23 June 2013. [Google Scholar]
- Kricke, M.; Peukert, E.; Rahm, E. Graph data transformations in Gradoop. In Proceedings of the Conference on Database Systems for Business, Technology and Web (BTW), Rostock, Germany, 4–8 March 2019. [Google Scholar] [CrossRef]
- Angles, R.; Thakkar, H.; Tomaszuk, D. Mapping RDF databases to property graph databases. IEEE Access 2020, 8, 86091–86110. [Google Scholar] [CrossRef]
- Lefrançois, M.; Zimmermann, A.; Bakerally, N. A SPARQL Extension for Generating RDF from Heterogeneous Formats. In Proceedings of the Extended Semantic Web Conference (ESWC), Portoroz, Slovenia, 28 May–1 June 2017. [Google Scholar]
- de Medeiros, L.F.; Priyatna, F.; Corcho, Ó. MIRROR: Automatic R2RML Mapping Generation from Relational Databases. In Proceedings of the International Conference on Web Engineering (ICWE), Rotterdam, The Netherlands, 23–26 June 2015. [Google Scholar]
- Sicilia, Á.; Nemirovski, G. AutoMap4OBDA: Automated Generation of R2RML Mappings for OBDA. In Proceedings of the International Conference Knowledge Engineering and Knowledge Management (EKAW), Bologna, Italy, 19–23 November 2016. [Google Scholar]
- Jiménez-Ruiz, E.; Kharlamov, E.; Zheleznyakov, D.; Horrocks, I.; Pinkel, C.; Skjæveland, M.G.; Thorstensen, E.; Mora, J. BootOX: Practical Mapping of RDBs to OWL 2. In Proceedings of the International Workshop on the Semantic Web (ISWC), Bethlehem, PA, USA, 11–15 October 2015. [Google Scholar]
- Rahm, E.; Do, H.H. Data cleaning: Problems and current approaches. IEEE Data Eng. Bull. 2000, 23, 3–13. [Google Scholar]
- Abedjan, Z.; Chu, X.; Deng, D.; Fernandez, R.C.; Ilyas, I.F.; Ouzzani, M.; Papotti, P.; Stonebraker, M.; Tang, N. Detecting Data Errors: Where are we and what needs to be done? Proc. VLDB Endow. 2016, 9, 993–1004. [Google Scholar] [CrossRef]
- Ilyas, I.F.; Chu, X. Data Cleaning; Morgan & Claypool: San Rafael, CA, USA, 2019; ISBN 978-1-4503-7152-0. [Google Scholar]
- Fiorelli, M.; Stellato, A. Lifting Tabular Data to RDF: A Survey. In Proceedings of the Metadata and Semantic Research (MTSR), Virtual, 2–4 December 2020; Garoufallou, E., Ovalle-Perandones, M.A., Eds.; Springer: Cham, Switzerland, 2020; pp. 85–96. [Google Scholar] [CrossRef]
- Abedjan, Z.; Golab, L.; Naumann, F.; Papenbrock, T. Data profiling. Synth. Lect. Data Manag. 2018, 10, 1–154. [Google Scholar]
- Beskales, G.; Ilyas, I.F.; Golab, L. Sampling the Repairs of Functional Dependency Violations under Hard Constraints. Proc. VLDB Endow. 2010, 3, 197–207. [Google Scholar] [CrossRef]
- Beskales, G.; Ilyas, I.F.; Golab, L.; Galiullin, A. On the relative trust between inconsistent data and inaccurate constraints. In Proceedings of the 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, 8–12 April 2013; Jensen, C.S., Jermaine, C.M., Zhou, X., Eds.; IEEE Computer Society: Washington, DC, USA, 2013; pp. 541–552. [Google Scholar] [CrossRef]
- Khayyat, Z.; Ilyas, I.F.; Jindal, A.; Madden, S.; Ouzzani, M.; Papotti, P.; Quiané-Ruiz, J.; Tang, N.; Yin, S. BigDansing: A System for Big Data Cleansing. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 31 May–4 June 2015; Sellis, T.K., Davidson, S.B., Ives, Z.G., Eds.; ACM: New York, NY, USA, 2015; pp. 1215–1230. [Google Scholar] [CrossRef]
- Kolahi, S.; Lakshmanan, L.V.S. On approximating optimum repairs for functional dependency violations. In Proceedings of the Database Theory—ICDT 2009, 12th International Conference, St. Petersburg, Russia, 23–25 March 2009; Fagin, R., Ed.; ACM: New York, NY, USA, 2009; Volume 361, pp. 53–62. [Google Scholar] [CrossRef]
- Bohannon, P.; Fan, W.; Geerts, F.; Jia, X.; Kementsietsidis, A. Conditional Functional Dependencies for Data Cleaning. In Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, 15–20 April 2007; pp. 746–755. [Google Scholar] [CrossRef]
- Fan, W.; Geerts, F.; Jia, X.; Kementsietsidis, A. Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Database Syst. 2008, 33, 48. [Google Scholar] [CrossRef]
- Geerts, F.; Mecca, G.; Papotti, P.; Santoro, D. The LLUNATIC Data-Cleaning Framework. Proc. VLDB Endow. 2013, 6, 625–636. [Google Scholar] [CrossRef]
- Chu, X.; Ilyas, I.F.; Papotti, P. Holistic data cleaning: Putting violations into context. In Proceedings of the 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, 8–12 April 2013; pp. 458–469. [Google Scholar] [CrossRef]
- Heidari, A.; McGrath, J.; Ilyas, I.F.; Rekatsinas, T. HoloDetect: Few-Shot Learning for Error Detection. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, 30 June–5 July 2019; pp. 829–846. [Google Scholar] [CrossRef]
- Lopatenko, A.; Bravo, L. Efficient Approximation Algorithms for Repairing Inconsistent Databases. In Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, 15–20 April 2007; pp. 216–225. [Google Scholar] [CrossRef]
- Rekatsinas, T.; Chu, X.; Ilyas, I.F.; Ré, C. HoloClean: Holistic Data Repairs with Probabilistic Inference. Proc. VLDB Endow. 2017, 10, 1190–1201. [Google Scholar] [CrossRef]
- Krishnan, S.; Wang, J.; Wu, E.; Franklin, M.J.; Goldberg, K. ActiveClean: Interactive Data Cleaning For Statistical Modeling. Proc. VLDB Endow. 2016, 9, 948–959. [Google Scholar] [CrossRef]
- Mahdavi, M.; Abedjan, Z.; Fernandez, R.C.; Madden, S.; Ouzzani, M.; Stonebraker, M.; Tang, N. Raha: A Configuration-Free Error Detection System. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, 30 June–5 July 2019; pp. 865–882. [Google Scholar] [CrossRef]
- Milani, M.; Zheng, Z.; Chiang, F. CurrentClean: Spatio-Temporal Cleaning of Stale Data. In Proceedings of the 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, 8–11 April 2019; pp. 172–183. [Google Scholar] [CrossRef]
- Assadi, A.; Milo, T.; Novgorodov, S. DANCE: Data Cleaning with Constraints and Experts. In Proceedings of the 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, 19–22 April 2017; pp. 1409–1410. [Google Scholar] [CrossRef]
- Chu, X.; Ouzzani, M.; Morcos, J.; Ilyas, I.F.; Papotti, P.; Tang, N.; Ye, Y. KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing. Proc. VLDB Endow. 2015, 8, 1952–1955. [Google Scholar] [CrossRef]
- He, J.; Veltri, E.; Santoro, D.; Li, G.; Mecca, G.; Papotti, P.; Tang, N. Interactive and Deterministic Data Cleaning. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–1 July 2016; pp. 893–907. [Google Scholar] [CrossRef]
- Thirumuruganathan, S.; Berti-Équille, L.; Ouzzani, M.; Quiané-Ruiz, J.; Tang, N. UGuide: User-Guided Discovery of FD-Detectable Errors. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, 14–19 May 2017; pp. 1385–1397. [Google Scholar] [CrossRef]
- Tong, Y.; Cao, C.C.; Zhang, C.J.; Li, Y.; Chen, L. CrowdCleaner: Data cleaning for multi-version data on the web via crowdsourcing. In Proceedings of the IEEE 30th International Conference on Data Engineering, ICDE 2014, Chicago, IL, USA, 31 March–4 April 2014; pp. 1182–1185. [Google Scholar] [CrossRef]
- Yakout, M.; Elmagarmid, A.K.; Neville, J.; Ouzzani, M.; Ilyas, I.F. Guided data repair. Proc. VLDB Endow. 2011, 4, 279–289. [Google Scholar] [CrossRef]
- Wang, R.; Li, Y.; Wang, J. Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Los Alamitos, CA, USA, 3–7 April 2023; pp. 1502–1515. [Google Scholar] [CrossRef]
- Neutatz, F.; Chen, B.; Abedjan, Z.; Wu, E. From Cleaning before ML to Cleaning for ML. IEEE Data Eng. Bull. 2021, 44, 24–41. [Google Scholar]
- Hao, S.; Tang, N.; Li, G.; Li, J.; Feng, J. Distilling relations using knowledge bases. VLDB J. 2018, 27, 497–519. [Google Scholar] [CrossRef]
- Ge, C.; Gao, Y.; Weng, H.; Zhang, C.; Miao, X.; Zheng, B. KGClean: An Embedding Powered Knowledge Graph Cleaning Framework. arXiv 2020, arXiv:2004.14478. [Google Scholar] [CrossRef]
- Noy, N.F.; McGuinness, D.L. Ontology Development 101: A Guide to Creating Your First Ontology; Stanford Knowledge Systems Laboratory Technical Report KSL-01-05; Stanford Knowledge Systems Laboratory: Stanford, CA, USA, 2001. [Google Scholar]
- Al-Aswadi, F.N.; Chan, H.Y.; Gan, K.H. Automatic ontology construction from text: A review from shallow to deep learning trend. Artif. Intell. Rev. 2020, 53, 3901–3928. [Google Scholar] [CrossRef]
- Browarnik, A.; Maimon, O. Ontology learning from text: Why the ontology learning layer cake is not viable. Int. J. Signs Semiot. Syst. (IJSSS) 2015, 4, 1–14. [Google Scholar] [CrossRef]
- Wong, W.; Liu, W.; Bennamoun, M. Ontology learning from text: A look back and into the future. ACM Comput. Surv. (CSUR) 2012, 44, 1–36. [Google Scholar] [CrossRef]
- Giglou, H.B.; D’Souza, J.; Auer, S. LLMs4OL: Large Language Models for Ontology Learning. In Proceedings of the Semantic Web-ISWC 2023—22nd International Semantic Web Conference, Athens, Greece, 6–10 November 2023; Volume 14265, pp. 408–427. [Google Scholar] [CrossRef]
- Funk, M.; Hosemann, S.; Jung, J.C.; Lutz, C. Towards Ontology Construction with Language Models. In Proceedings of the Joint proceedings of the 1st workshop on Knowledge Base Construction from Pre-Trained Language Models (KBC-LM) and the 2nd challenge on Language Models for Knowledge Base Construction (LM-KBC) Co-Located with the 22nd International Semantic Web Conference (ISWC 2023), Athens, Greece, 6 November 2023; Volume 3577. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 1877–1901. [Google Scholar]
- Kommineni, V.K.; König-Ries, B.; Samuel, S. From human experts to machines: An LLM supported approach to ontology and Knowledge Graph construction. arXiv 2024, arXiv:2403.08345. [Google Scholar] [CrossRef]
- Zhang, B.; Carriero, V.A.; Schreiberhuber, K.; Tsaneva, S.; González, L.S.; Kim, J.; de Berardinis, J. OntoChat: A Framework for Conversational Ontology Engineering using Language Models. arXiv 2024, arXiv:2403.05921. [Google Scholar] [CrossRef]
- da Silva, L.M.V.; Köcher, A.; Gehlhoff, F.; Fay, A. On the Use of Large Language Models to Generate Capability Ontologies. arXiv 2024, arXiv:2404.17524. [Google Scholar] [CrossRef]
- Ma, C.; Molnár, B. Ontology learning from relational database: Opportunities for semantic information integration. Vietnam J. Comput. Sci. 2022, 9, 31–57. [Google Scholar] [CrossRef]
- De Virgilio, R.; Maccioni, A.; Torlone, R. R2G: A Tool for Migrating Relations to Graphs. In Proceedings of the International Conference on Extending Database Technology (EDBT), Athens, Greece, 24–28 March 2014; pp. 640–643. [Google Scholar]
- Petermann, A.; Junghanns, M.; Müller, R.; Rahm, E. BIIIG: Enabling business intelligence with integrated instance graphs. In Proceedings of the 2014 IEEE 30th International Conference on Data Engineering Workshops, Chicago, IL, USA, 31 March–4 April 2014; pp. 4–11. [Google Scholar]
- Lehmann, J.; Auer, S.; Bühmann, L.; Tramp, S. Class expression learning for ontology engineering. J. Web Semant. 2011, 9, 71–81. [Google Scholar] [CrossRef]
- Bühmann, L.; Lehmann, J.; Westphal, P. DL-Learner—A framework for inductive learning on the Semantic Web. J. Web Semant. 2016, 39, 15–24. [Google Scholar] [CrossRef]
- Obraczka, D.; Saeedi, A.; Rahm, E. Knowledge Graph Completion with FAMER (DI2KG Challenge Winner). In Proceedings of the 1st International Workshop on Challenges and Experiences from Data Integration to Knowledge Graphs Co-Located with the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2019), Anchorage, AK, USA, 5 August 2019. [Google Scholar]
- Suchanek, F.M.; Abiteboul, S.; Senellart, P. PARIS: Probabilistic Alignment of Relations, Instances, and Schema. Proc. VLDB Endow. 2011, 5, 157–168. [Google Scholar] [CrossRef]
- Rahm, E.; Bernstein, P.A. A survey of approaches to automatic schema matching. VLDB J. 2001, 10, 334–350. [Google Scholar] [CrossRef]
- Euzenat, J.; Shvaiko, P. Ontology Matching; Springer: Cham, Switzerland, 2007; Volume 18, ISBN 978-3-642-38721-0. [Google Scholar]
- Bellahsene, Z.; Bonifati, A.; Rahm, E. Schema Matching and Mapping; Springer: Cham, Switzerland, 2011; ISBN 978-3-642-16518-4. [Google Scholar]
- Rahm, E. Towards Large-Scale Schema and Ontology Matching. In Schema Matching and Mapping; Springer: Cham, Switzerland, 2011; pp. 3–27. ISBN 978-3-642-16518-4. [Google Scholar] [CrossRef]
- Otero-Cerdeira, L.; Rodríguez-Martínez, F.J.; Gómez-Rodríguez, A. Ontology matching: A literature review. Expert Syst. Appl. 2015, 42. [Google Scholar] [CrossRef]
- Do, H.H.; Rahm, E. COMA—A system for flexible combination of schema matching approaches. In Proceedings of the 28th International Conference on Very Large Databases (VLDB), Hong Kong, China, 20–23 August 2002; pp. 610–621. [Google Scholar]
- Zhang, Y.; Wang, X.; Lai, S.; He, S.; Liu, K.; Zhao, J.; Lv, X. Ontology Matching with Word Embeddings. In Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data—13th China National Conference, CCL 2014, and Second International Symposium, NLP-NABD 2014, Wuhan, China, 18–19 October 2014; Volume 8801, pp. 34–45. [Google Scholar] [CrossRef]
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 3111–3119. [Google Scholar]
- Ayala, D.; Hernández, I.; Ruiz, D.; Rahm, E. LEAPME: Learning-based Property Matching with Embeddings. Data Knowl. Eng. 2022, 137, 101943. [Google Scholar] [CrossRef]
- Portisch, J.; Costa, G.; Stefani, K.; Kreplin, K.; Hladik, M.; Paulheim, H. Ontology Matching Through Absolute Orientation of Embedding Spaces. In Proceedings of the Semantic Web: ESWC 2022 Satellite Events, Hersonissos, Greece, 29 May–2 June 2022; Volume 13384, pp. 153–157. [Google Scholar] [CrossRef]
- Portisch, J.; Hladik, M.; Paulheim, H. RDF2Vec Light—A Lightweight Approach for Knowledge Graph Embeddings. arXiv 2020, arXiv:2009.07659. [Google Scholar] [CrossRef]
- Qiang, Z.; Wang, W.; Taylor, K. Agent-OM: Leveraging Large Language Models for Ontology Matching. arXiv 2023, arXiv:2312.00326. [Google Scholar] [CrossRef]
- Hertling, S.; Paulheim, H. OLaLa: Ontology Matching with Large Language Models. In Proceedings of the 12th Knowledge Capture Conference 2023, Pensacola, FL, USA, 5–7 December 2023; ACM: New York, NY, USA; pp. 5–7. [Google Scholar] [CrossRef]
- Pottinger, R.A.; Bernstein, P.A. Merging models based on given correspondences. In Proceedings of the 2003 VLDB Conference, Berlin, Germany, 9–12 September 2003; Elsevier: Amsterdam, The Netherlands, 2003; pp. 862–873. [Google Scholar]
- Raunich, S.; Rahm, E. Target-driven merging of taxonomies with ATOM. Inf. Syst. 2014, 42, 1–14. [Google Scholar] [CrossRef]
- Osman, I.; Yahia, S.B.; Diallo, G. Ontology integration: Approaches and challenging issues. Inf. Fusion 2021, 71, 38–63. [Google Scholar] [CrossRef]
- Usbeck, R.; Ngonga Ngomo, A.C.; Auer, S.; Gerber, D.; Both, A. AGDISTIS—Graph-Based Disambiguation of Named Entities using Linked Data. In Proceedings of the 13th International Semantic Web Conference, Riva del Garda, Italy, 19–23 October 2014. [Google Scholar]
- Ferragina, P.; Scaiella, U. TAGME: On-the-fly annotation of short text fragments (by wikipedia entities). In Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM 2010, Toronto, ON, Canada, 26–30 October 2010; pp. 1625–1628. [Google Scholar] [CrossRef]
- Piccinno, F.; Ferragina, P. From TagME to WAT: A New Entity Annotator. In Proceedings of the First International Workshop on Entity Recognition & Disambiguation, New York, NY, USA, 11 July 2014; ERD ’14. pp. 55–62. [Google Scholar] [CrossRef]
- Manning, C.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S.; McClosky, D. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Baltimore, MD, USA, 23–24 June 2014; pp. 55–60. [Google Scholar] [CrossRef]
- Goldberg, Y. Neural Network Methods for Natural Language Processing; Synthesis Lectures on Human Language Technologies; Morgan & Claypool Publishers: San Rafael, CA, USA, 2017; ISBN 978-3-031-01037-8. [Google Scholar] [CrossRef]
- Li, J.; Sun, A.; Han, J.; Li, C. A Survey on Deep Learning for Named Entity Recognition. IEEE Trans. Knowl. Data Eng. 2022, 34, 50–70. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; Volume 1: (Long and Short Papers). Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
- Harnoune, A.; Rhanoui, M.; Mikram, M.; Yousfi, S.; Elkaimbillah, Z.; El Asri, B. BERT Based Clinical Knowledge Extraction for Biomedical Knowledge Graph Construction and Analysis. Comput. Methods Programs Biomed. Update 2021, 1, 100042. [Google Scholar] [CrossRef]
- Caufield, J.H.; Hegde, H.; Emonet, V.; Harris, N.L.; Joachimiak, M.P.; Matentzoglu, N.; Kim, H.; Moxon, S.A.T.; Reese, J.T.; Haendel, M.A.; et al. Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. Bioinformatics 2024, 40, btae104. [Google Scholar] [CrossRef] [PubMed]
- Moon, S.; Neves, L.; Carvalho, V. Multimodal Named Entity Recognition for Short Social Media Posts. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA, 1–6 June 2018; pp. 852–860. [Google Scholar] [CrossRef]
- Yu, J.; Jiang, J.; Yang, L.; Xia, R. Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual, 5–10 July 2020; pp. 3342–3352. [Google Scholar] [CrossRef]
- Pezeshkpour, P.; Chen, L.; Singh, S. Embedding Multimodal Relational Data for Knowledge Base Completion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium, 31 October–4 November 2018; pp. 3208–3218. [Google Scholar] [CrossRef]
- Li, M.; Zareian, A.; Lin, Y.; Pan, X.; Whitehead, S.; Chen, B.; Wu, B.; Ji, H.; Chang, S.F.; Voss, C.; et al. GAIA: A Fine-Grained Multimedia Knowledge Extraction System. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Virtual, 5–10 July 2020; pp. 77–86. [Google Scholar] [CrossRef]
- Ding, Y.; Yu, J.; Liu, B.; Hu, Y.; Cui, M.; Wu, Q. MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering. arXiv 2022, arXiv:2203.09138. [Google Scholar] [CrossRef]
- Martinez-Rodriguez, J.L.; Hogan, A.; Lopez-Arevalo, I. Information extraction meets the Semantic Web: A survey. Semant. Web 2020, 11, 255–335. [Google Scholar] [CrossRef]
- Kulkarni, S.; Singh, A.; Ramakrishnan, G.; Chakrabarti, S. Collective annotation of Wikipedia entities in web text. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining–KDD, Paris, France, 28 June–1 July 2009; ACM Press: New York, NY, USA, 2009. [Google Scholar] [CrossRef]
- Milne, D.; Witten, I.H. Learning to link with wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Mining—CIKM, Napa Valley, CA, USA, 26–30 October 2008; ACM Press: New York, NY, USA, 2008. [Google Scholar] [CrossRef]
- Han, X.; Sun, L.; Zhao, J. Collective entity linking in web text: A graph-based method. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, 25–29 July 2011; Ma, W., Nie, J., Baeza-Yates, R., Chua, T., Croft, W.B., Eds.; ACM: New York, NY, USA, 2011; pp. 765–774. [Google Scholar] [CrossRef]
- Medelyan, O.; Witten, I.H.; Milne, D. Topic Indexing with Wikipedia. In Proceedings of the First AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI 2008), Washington, DC, USA, 13–14 July 2008. [Google Scholar]
- Hoffart, J.; Milchevski, D.; Weikum, G.; Anand, A.; Singh, J. The Knowledge Awakens: Keeping Knowledge Bases Fresh with Emerging Entities. In Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, QC, Canada, 11–15 April 2016; Companion Volume. Bourdeau, J., Hendler, J., Nkambou, R., Horrocks, I., Zhao, B.Y., Eds.; ACM: New York, NY, USA, 2016; pp. 203–206. [Google Scholar] [CrossRef]
- Mudgal, S.; Li, H.; Rekatsinas, T.; Doan, A.; Park, Y.; Krishnan, G.; Deep, R.; Arcaute, E.; Raghavendra, V. Deep Learning for Entity Matching: A Design Space Exploration. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, 10–15 June 2018; Das, G., Jermaine, C.M., Bernstein, P.A., Eds.; ACM: New York, NY, USA, 2018; pp. 19–34. [Google Scholar] [CrossRef]
- Hearst, M.A. Automatic Acquisition of Hyponyms from Large Text Corpora. In Proceedings of the 14th International Conference on Computational Linguistics, COLING 1992, Nantes, France, 23–28 August 1992; pp. 539–545. [Google Scholar]
- Agichtein, E.; Gravano, L. Snowball: Extracting relations from large plain-text collections. In Proceedings of the Fifth ACM Conference on Digital Libraries, San Antonio, TX, USA, 2–7 June 2000; ACM: New York, NY, USA, 2000; pp. 85–94. [Google Scholar] [CrossRef]
- Brin, S. Extracting Patterns and Relations from the World Wide Web. In Proceedings of the World Wide Web and Databases, International Workshop WebDB’98, Valencia, Spain, 27–28 March 1998; Selected Papers; Lecture Notes in Computer Science. Atzeni, P., Mendelzon, A.O., Mecca, G., Eds.; Springer: Cham, Switzerland, 1998; Volume 1590, pp. 172–183. [Google Scholar] [CrossRef]
- Zhou, G.; Zhang, M.; Ji, D.H.; Zhu, Q. Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, 28–30 June 2007; pp. 728–736. [Google Scholar]
- Nguyen, T.H.; Grishman, R. Relation Extraction: Perspective from Convolutional Neural Networks. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, VS@NAACL-HLT 2015, Denver, CO, USA, 5 June 2015; pp. 39–48. [Google Scholar] [CrossRef]
- Zeng, D.; Liu, K.; Chen, Y.; Zhao, J. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, 17–21 September 2015; Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y., Eds.; ACL: Stroudsburg, PA, USA, 2015; pp. 1753–1762. [Google Scholar] [CrossRef]
- Baldini Soares, L.; FitzGerald, N.; Ling, J.; Kwiatkowski, T. Matching the Blanks: Distributional Similarity for Relation Learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2895–2905. [Google Scholar] [CrossRef]
- Wu, S.; He, Y. Enriching Pre-trained Language Model with Entity Information for Relation Classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, 3–7 November 2019; pp. 2361–2364. [Google Scholar] [CrossRef]
- Han, X.; Gao, T.; Lin, Y.; Peng, H.; Yang, Y.; Xiao, C.; Liu, Z.; Li, P.; Zhou, J.; Sun, M. More Data, More Relations, More Context and More Openness: A Review and Outlook for Relation Extraction. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Suzhou, China, 4–7 December 2020; pp. 745–758. [Google Scholar]
- Chung, H.W.; Hou, L.; Longpre, S.; Zoph, B.; Tay, Y.; Fedus, W.; Li, Y.; Wang, X.; Dehghani, M.; Brahma, S.; et al. Scaling instruction-finetuned language models. J. Mach. Learn. Res. 2024, 25, 1–53. [Google Scholar]
- Chen, X.; Zhang, N.; Xie, X.; Deng, S.; Yao, Y.; Tan, C.; Huang, F.; Si, L.; Chen, H. KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction. In Proceedings of the WWW ’22: The ACM Web Conference 2022, Lyon, France, 25–29 April 2022; Laforest, F., Troncy, R., Simperl, E., Agarwal, D., Gionis, A., Herman, I., Médini, L., Eds.; ACM: New York, NY, USA, 2022; pp. 2778–2788. [Google Scholar] [CrossRef]
- Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526. [Google Scholar] [CrossRef] [PubMed]
- Hu, C.; Yang, D.; Jin, H.; Chen, Z.; Xiao, Y. Improving Continual Relation Extraction through Prototypical Contrastive Learning. In Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 1885–1895. [Google Scholar]
- Zhao, K.; Xu, H.; Yang, J.; Gao, K. Consistent Representation Learning for Continual Relation Extraction. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 22–27 May 2022; pp. 3402–3411. [Google Scholar] [CrossRef]
- Vashishth, S.; Jain, P.; Talukdar, P.P. CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, 23–27 April 2018; pp. 1317–1327. [Google Scholar] [CrossRef]
- Daiber, J.; Jakob, M.; Hokamp, C.; Mendes, P.N. Improving Efficiency and Accuracy in Multilingual Entity Extraction. In Proceedings of the 9th International Conference on Semantic Systems (I-Semantics), Graz, Austria, 4–6 September 2013. [Google Scholar]
- Clancy, R.; Ilyas, I.F.; Lin, J. Scalable Knowledge Graph Construction from Text Collections. In Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER), Hong Kong, China, 3 November 2019; pp. 39–46. [Google Scholar] [CrossRef]
- Han, X.; Gao, T.; Yao, Y.; Ye, D.; Liu, Z.; Sun, M. OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, Hong Kong, China, 3 November 2019; pp. 169–174. [Google Scholar] [CrossRef]
- Elliott, D.; Keller, F. Image Description using Visual Dependency Representations. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1292–1302. [Google Scholar]
- Zheng, C.; Feng, J.; Fu, Z.; Cai, Y.; Li, Q.; Wang, T. Multimodal Relation Extraction with Efficient Graph Alignment. In Proceedings of the 29th ACM International Conference on Multimedia, New York, NY, USA, 20–24 October 2021; pp. 5298–5306. [Google Scholar] [CrossRef]
- Köpcke, H.; Rahm, E. Frameworks for entity matching: A comparison. Data Knowl. Eng. 2010, 69, 197–210. [Google Scholar] [CrossRef]
- Christen, P. The data matching process. In Data-Centric Systems and Applications; Springer: Cham, Switzerland, 2012; pp. 23–35. ISBN 978-3-642-31164-2. [Google Scholar]
- Nentwig, M.; Hartung, M.; Ngomo, A.N.; Rahm, E. A survey of current Link Discovery frameworks. Semant. Web 2017, 8, 419–436. [Google Scholar] [CrossRef]
- Barlaug, N.; Gulla, J.A. Neural networks for entity matching: A survey. ACM Trans. Knowl. Discov. Data (TKDD) 2021, 15, 1–37. [Google Scholar] [CrossRef]
- Christophides, V.; Efthymiou, V.; Palpanas, T.; Papadakis, G.; Stefanidis, K. An Overview of End-to-End Entity Resolution for Big Data. ACM Comput. Surv. 2020, 53, 1–42. [Google Scholar] [CrossRef]
- Papadakis, G.; Skoutas, D.; Thanos, E.; Palpanas, T. Blocking and filtering techniques for entity resolution: A survey. ACM Comput. Surv. (CSUR) 2020, 53, 1–42. [Google Scholar] [CrossRef]
- Saeedi, A.; Peukert, E.; Rahm, E. Using link features for entity clustering in Knowledge Graphs. In Proceedings of the European Semantic Web Conference (EWSC) 2018, Heraklion, Crete, Greece, 3–7 June 2018; Springer: Cham, Switzerland, 2018; pp. 576–592. [Google Scholar]
- Papadakis, G.; Tsekouras, L.; Thanos, E.; Pittaras, N.; Simonini, G.; Skoutas, D.; Isaris, P.; Giannakopoulos, G.; Palpanas, T.; Koubarakis, M. JedAI3: Beyond batch, blocking-based Entity Resolution. In Proceedings of the 23th EDBT, Copenhagen, Denmark, 30 March–2 April 2020; pp. 603–606. [Google Scholar]
- Ebraheem, M.; Thirumuruganathan, S.; Joty, S.R.; Ouzzani, M.; Tang, N. DeepER—Deep Entity Resolution. arXiv 2017, arXiv:1710.00597. [Google Scholar] [CrossRef]
- Sun, Z.; Zhang, Q.; Hu, W.; Wang, C.; Chen, M.; Akrami, F.; Li, C. A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs. Proc. VLDB Endow. 2020, 13, 2326–2340. [Google Scholar] [CrossRef]
- Obraczka, D.; Schuchart, J.; Rahm, E. Embedding-Assisted Entity Resolution for Knowledge Graphs. In Proceedings of the 2nd International Workshop on Knowledge Graph Construction Co-Located with 18th Extended Semantic Web Conference (ESWC 2021), Online, 6 June 2021; Volume 2873. [Google Scholar]
- Leone, M.; Huber, S.; Arora, A.; García-Durán, A.; West, R. A Critical Re-Evaluation of Neural Methods for Entity Alignment. Proc. VLDB Endow. 2022, 15, 1712–1725. [Google Scholar] [CrossRef]
- Papadakis, G.; Ioannou, E.; Thanos, E.; Palpanas, T. The Four Generations of Entity Resolution. Synthesis Lectures on Data Management; Springer: Cham, Switzerland, 2021; ISBN 978-3-031-00750-7. [Google Scholar] [CrossRef]
- Wang, Y.; Cui, Y.; Liu, W.; Sun, Z.; Jiang, Y.; Han, K.; Hu, W. Facing Changes: Continual Entity Alignment for Growing Knowledge Graphs. In Proceedings of the Semantic Web-ISWC 2022—21st International Semantic Web Conference, Virtual Event, 23–27 October 2022; Volume 13489, pp. 196–213. [Google Scholar] [CrossRef]
- Gruenheid, A.; Dong, X.L.; Srivastava, D. Incremental record linkage. Proc. VLDB Endow. 2014, 7, 697–708. [Google Scholar] [CrossRef]
- Gazzarri, L.; Herschel, M. End-to-end Task Based Parallelization for Entity Resolution on Dynamic Data. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; pp. 1248–1259. [Google Scholar]
- Saeedi, A.; Nentwig, M.; Peukert, E.; Rahm, E. Scalable matching and clustering of entities with FAMER. Complex Syst. Inform. Model. Q. 2018, 16, 61–83. [Google Scholar] [CrossRef]
- Ramadan, B.; Christen, P. Forest-Based Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (CIKM), Shanghai, China, 3–7 November 2014; pp. 1787–1790. [Google Scholar] [CrossRef]
- Ramadan, B.; Christen, P.; Liang, H.; Gayler, R.W. Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution. J. Data Inf. Qual. 2015, 6. [Google Scholar] [CrossRef]
- Karapiperis, D.; Gkoulalas-Divanis, A.; Verykios, V.S. Summarization Algorithms for Record Linkage. In Proceedings of the EDBT, Vienna, Austria, 26–29 March 2018; pp. 73–84. [Google Scholar]
- Brasileiro Araújo, T.; Stefanidis, K.; Santos Pires, C.E.; Nummenmaa, J.; Pereira da Nóbrega, T. Incremental blocking for entity resolution over web streaming data. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, Thessaloniki, Greece, 14–17 October 2019; pp. 332–336. [Google Scholar]
- Araújo, T.B.; Stefanidis, K.; Santos Pires, C.E.; Nummenmaa, J.; Da Nóbrega, T.P. Schema-agnostic blocking for streaming data. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, Brno, Czech Republic, 30 March–3 April 2020; pp. 412–419. [Google Scholar]
- Javdani, D.; Rahmani, H.; Allahgholi, M.; Karimkhani, F. DeepBlock: A Novel Blocking Approach for Entity Resolution using Deep Learning. In Proceedings of the 2019 5th International Conference on Web Research (ICWR), Tehran, Iran, 24–25 April 2019; pp. 41–44. [Google Scholar]
- Zhang, W.; Wei, H.; Sisman, B.; Dong, X.L.; Faloutsos, C.; Page, D. AutoBlock: A Hands-off Blocking Framework for Entity Matching. In Proceedings of the WSDM ’20: The Thirteenth ACM International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 744–752. [Google Scholar] [CrossRef]
- Thirumuruganathan, S.; Li, H.; Tang, N.; Ouzzani, M.; Govind, Y.; Paulsen, D.; Fung, G.; Doan, A. Deep Learning for Blocking in Entity Matching: A Design Space Exploration. Proc. VLDB Endow. 2021, 14, 2459–2472. [Google Scholar] [CrossRef]
- Hassanzadeh, O.; Chiang, F.; Lee, H.C.; Miller, R.J. Framework for evaluating clustering algorithms in duplicate detection. Proc. VLDB Endow. 2009, 2, 1282–1293. [Google Scholar] [CrossRef]
- Saeedi, A.; Peukert, E.; Rahm, E. Comparative evaluation of distributed clustering schemes for multi-source entity resolution. In Proceedings of the European Conference on Advances in Databases and Information Systems (ADBIS), Nicosia, Cyprus, 24–27 September 2017; Springer: Cham, Switzerland, 2017; pp. 278–293. [Google Scholar]
- Welch, J.M.; Sane, A.; Drome, C. Fast and accurate incremental entity resolution relative to an entity knowledge base. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIRK 2012), Maui, HI, USA, 29 October–2 November 2012; pp. 2667–2670. [Google Scholar]
- Brunner, U.; Stockinger, K. Entity Matching with Transformer Architectures—A Step Forward in Data Integration. In Proceedings of the 23rd International Conference on Extending Database Technology, EDBT 2020, Copenhagen, Denmark, 30 March–2 April 2020; pp. 463–473. [Google Scholar] [CrossRef]
- Li, Y.; Li, J.; Suhara, Y.; Doan, A.; Tan, W. Deep Entity Matching with Pre-Trained Language Models. Proc. VLDB Endow. 2020, 14, 50–60. [Google Scholar] [CrossRef]
- Peeters, R.; Bizer, C. Dual-Objective Fine-Tuning of BERT for Entity Matching. Proc. VLDB Endow. 2021, 14, 1913–1921. [Google Scholar] [CrossRef]
- Ge, C.; Wang, P.; Chen, L.; Liu, X.; Zheng, B.; Gao, Y. CollaborEM: A Self-Supervised Entity Matching Framework Using Multi-Features Collaboration. IEEE Trans. Knowl. Data Eng. 2023, 35, 12139–12152. [Google Scholar] [CrossRef]
- Yao, D.; Gu, Y.; Cong, G.; Jin, H.; Lv, X. Entity Resolution with Hierarchical Graph Attention Networks. In Proceedings of the SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, 12–17 June 2022; Ives, Z.G., Bonifati, A., Abbadi, A.E., Eds.; ACM: New York, NY, USA, 2022; pp. 429–442. [Google Scholar] [CrossRef]
- Tu, J.; Fan, J.; Tang, N.; Wang, P.; Chai, C.; Li, G.; Fan, R.; Du, X. Domain Adaptation for Deep Entity Resolution. In Proceedings of the SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, 12–17 June 2022; Ives, Z.G., Bonifati, A., Abbadi, A.E., Eds.; ACM: New York, NY, USA, 2022; pp. 443–457. [Google Scholar] [CrossRef]
- Tang, J.; Zuo, Y.; Cao, L.; Madden, S. Generic entity resolution models. In Proceedings of the NeurIPS 2022 First Table Representation Workshop, New Orleans, LA, USA, 2 December 2022. [Google Scholar]
- Zhang, R.; Su, Y.; Trisedya, B.D.; Zhao, X.; Yang, M.; Cheng, H.; Qi, J. AutoAlign: Fully Automatic and Effective Knowledge Graph Alignment Enabled by Large Language Models. IEEE Trans. Knowl. Data Eng. 2024, 36, 2357–2371. [Google Scholar] [CrossRef]
- Li, Q.; Ji, C.; Guo, S.; Liang, Z.; Wang, L.; Li, J. Multi-Modal Knowledge Graph Transformer Framework for Multi-Modal Entity Alignment. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; pp. 987–999. [Google Scholar] [CrossRef]
- Bai, L.; Song, X.; Zhu, L. Joint Multi-Feature Information Entity Alignment for Cross-Lingual Temporal Knowledge Graph with BERT. IEEE Trans. Big Data 2024, 1–13. [Google Scholar] [CrossRef]
- Fanourakis, N.; Lekbour, F.; Efthymiou, V.; Renton, G.; Christophides, V. HybEA: Hybrid Attention Models for Entity Alignment. arXiv 2024, arXiv:2407.02862. [Google Scholar] [CrossRef]
- Bleiholder, J.; Naumann, F. Data fusion. ACM Comput. Surv. (CSUR) 2009, 41, 1–41. [Google Scholar] [CrossRef]
- Bizer, C.; Becker, C.; Mendes, P.N.; Isele, R.; Matteini, A.; Schultz, A. Ldif—A framework for large-scale Linked Data integration. In Proceedings of the (WWW) 2012 Developer Track, Lyon, France, 18–20 April 2012. [Google Scholar]
- Mendes, P.N.; Mühleisen, H.; Bizer, C. Sieve: Linked data quality assessment and fusion. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, Berlin, Germany, 30 March 2012; pp. 116–123. [Google Scholar]
- Dong, X.; Berti-Équille, L.; Srivastava, D. Data Fusion: Resolving Conflicts from Multiple Sources. In Proceedings of the Interational Conference on Web-Age Information Management (WAIM 2013), Beidaihe, China, 14–16 June 2013. [Google Scholar]
- Angles, R.; Thakkar, H.; Tomaszuk, D. RDF and Property Graphs Interoperability: Status and Issues. In Proceedings of the 13th Alberto Mendelzon International Workshop on Foundations of Data Management, Asunción, Paraguay, 3–7 June 2019; Volume 2369. [Google Scholar]
- Paulheim, H.; Bizer, C. Type Inference on Noisy RDF Data. In Proceedings of the Semantic Web-ISWC 2013—12th International Semantic Web Conference, Sydney, Australia, 21–25 October 2013; Volume 8218, pp. 510–525. [Google Scholar] [CrossRef]
- Paulheim, H.; Bizer, C. Improving the Quality of Linked Data Using Statistical Distributions. Int. J. Semant. Web Inf. Syst. 2014, 10, 63–86. [Google Scholar] [CrossRef]
- Lutov, A.; Roshankish, S.; Khayati, M.; Cudré-Mauroux, P. StaTIX—Statistical Type Inference on Linked Data. In Proceedings of the IEEE International Conference on Big Data, Big Data 2018, Seattle, WA, USA, 10–13 December 2018; pp. 2253–2262. [Google Scholar] [CrossRef]
- Zhao, Y.; Zhang, A.; Xie, R.; Liu, K.; Wang, X. Connecting Embeddings for Knowledge Graph Entity Typing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online Event, 5–10 July 2020; pp. 6419–6428. [Google Scholar] [CrossRef]
- Aprosio, A.P.; Giuliano, C.; Lavelli, A. Extending the Coverage of DBpedia Properties Using Distant Supervision over Wikipedia. In Proceedings of the NLP & DBpedia Workshop Co-Located with the 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia, 21–25 October 2013; Springer: Cham, Switzerland, 2013. [Google Scholar]
- Gerber, D.; Hellmann, S.; Bühmann, L.; Soru, T.; Usbeck, R.; Ngonga Ngomo, A.C. Real-time RDF extraction from unstructured data streams. In Proceedings of the International Semantic Web Conference (ISWC), Sydney, Australia, 21–25 October 2013; Springer: Cham, Switzerland, 2013; pp. 135–150. [Google Scholar]
- Gerber, D.; Ngomo, A.C.N. Bootstrapping the Linked Data web. In Proceedings of the 1st Workshop on Web Scale Knowledge Extraction@ ISWC, Bonn, Germany, October 2011; Volume 2011, p. 61. [Google Scholar]
- Mintz, M.; Bills, S.; Snow, R.; Jurafsky, D. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore, 2–7 August 2009; pp. 1003–1011. [Google Scholar]
- West, R.; Gabrilovich, E.; Murphy, K.; Sun, S.; Gupta, R.; Lin, D. Knowledge base completion via search-based question answering. In Proceedings of the 23rd International World Wide Web Conference, WWW ’14, Seoul, Republic of Korea, 7–11 April 2014; pp. 515–526. [Google Scholar] [CrossRef]
- Lange, D.; Böhm, C.; Naumann, F. Extracting structured information from Wikipedia articles to populate infoboxes. In Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM 2010, Toronto, ON, Canada, 26–30 October 2010; pp. 1661–1664. [Google Scholar] [CrossRef]
- Fields, C.R. Probabilistic models for segmenting and labeling sequence data. In Proceedings of the ICML 2001, San Francisco, CA, USA, 28 June–1 July 2001. [Google Scholar]
- Blevins, T.; Zettlemoyer, L. Moving Down the Long Tail of Word Sense Disambiguation with Gloss Informed Bi-encoders. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual, 5–10 July 2020; ACM: New York, NY, USA; pp. 1006–1017. [Google Scholar] [CrossRef]
- Munoz, E.; Hogan, A.; Mileo, A. Triplifying Wikipedia’s tables. In Proceedings of the First International Conference on Linked Data for Information Extraction (LD4IE), Sydney, Australia, 21 October 2013; pp. 26–37. [Google Scholar]
- Ritze, D.; Lehmberg, O.; Bizer, C. Matching HTML tables to DBpedia. In Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics (WIMS), Larnaca, Cyprus, 13–15 July 2015; pp. 1–6. [Google Scholar]
- Paulheim, H.; Ponzetto, S.P. Extending DBpedia with Wikipedia List Pages. In Proceedings of the 2013 International Conference on NLP & DBpedia, Sydney, Australia, 22 October 2013; pp. 85–90. [Google Scholar]
- Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 2787–2795. [Google Scholar]
- Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 1112–1119. [Google Scholar]
- Kolyvakis, P.; Kalousis, A.; Kiritsis, D. HyperKG: Hyperbolic Knowledge Graph Embeddings for Knowledge Base Completion. arXiv 2019, arXiv:1908.04895. [Google Scholar] [CrossRef]
- Ali, M.; Berrendorf, M.; Hoyt, C.T.; Vermue, L.; Galkin, M.; Sharifzadeh, S.; Fischer, A.; Tresp, V.; Lehmann, J. Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models under a Unified Framework. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8825–8845. [Google Scholar] [CrossRef]
- Teru, K.K.; Denis, E.G.; Hamilton, W.L. Inductive Relation Prediction by Subgraph Reasoning. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Virtual, 13–18 July 2020; Volume 119, pp. 9448–9457. [Google Scholar]
- Galkin, M.; Denis, E.; Wu, J.; Hamilton, W.L. NodePiece: Compositional and Parameter-Efficient Representations of Large Knowledge Graphs. In Proceedings of the International Conference on Learning Representations (ICLR), Online Event, 25–29 April 2022. [Google Scholar]
- Galárraga, L.A.; Teflioudi, C.; Hose, K.; Suchanek, F. AMIE: Association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the 22nd International Conference on World Wide Web (WWW), Rio de Janeiro, Brazil, 13–17 May 2013; pp. 413–422. [Google Scholar] [CrossRef]
- Cheng, K.; Ahmed, N.K.; Sun, Y. Neural Compositional Rule Learning for Knowledge Graph Reasoning. In Proceedings of the Eleventh International Conference on Learning Representations, (ICLR) 2023, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Romero, A.A.; Grau, B.C.; Horrocks, I. MORe: Modular Combination of OWL Reasoners for Ontology Classification. In Proceedings of the Semantic Web–ISWC 2012—11th International Semantic Web Conference, Boston, MA, USA, 11–15 November 2012; Volume 7649, pp. 1–16. [Google Scholar] [CrossRef]
- Wang, C.; Feng, Z.; Zhang, X.; Wang, X.; Rao, G.; Fu, D. ComR: A combined OWL reasoner for ontology classification. Front. Comput. Sci. 2019, 13, 139–156. [Google Scholar] [CrossRef]
- Yao, L.; Mao, C.; Luo, Y. KG-BERT: BERT for Knowledge Graph Completion. arXiv 2019, arXiv:1909.03193. [Google Scholar] [CrossRef]
- Choi, B.; Jang, D.; Ko, Y. MEM-KGC: Masked Entity Model for Knowledge Graph Completion with Pre-Trained Language Model. IEEE Access 2021, 9, 132025–132032. [Google Scholar] [CrossRef]
- Veseli, B.; Singhania, S.; Razniewski, S.; Weikum, G. Evaluating Language Models for Knowledge Base Completion. In Proceedings of the Semantic Web—20th International Conference, ESWC 2023, Hersonissos, Greece, 28 May–1 June 2023; Volume 13870, pp. 227–243. [Google Scholar] [CrossRef]
- Omeliyanenko, J.; Zehe, A.; Hotho, A.; Schlör, D. CapsKG: Enabling Continual Knowledge Integration in Language Models for Automatic Knowledge Graph Completion. In Proceedings of the Semantic Web—ISWC 2023—22nd International Semantic Web Conference, Athens, Greece, 6–10 November 2023; Volume 14265, pp. 618–636. [Google Scholar] [CrossRef]
- Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-Train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv. 2023, 55, 195:1–195:35. [Google Scholar] [CrossRef]
- Sun, M.; Zhou, K.; He, X.; Wang, Y.; Wang, X. GPPT: Graph Pre-training and Prompt Tuning to Generalize Graph Neural Networks. In Proceedings of the KDD ’22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 1717–1727. [Google Scholar] [CrossRef]
- Sun, X.; Cheng, H.; Li, J.; Liu, B.; Guan, J. All in One: Multi-Task Prompting for Graph Neural Networks. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, 6–10 August 2023; pp. 2120–2131. [Google Scholar] [CrossRef]
- Wang, R.Y.; Strong, D.M. Beyond Accuracy: What Data Quality Means to Data Consumers. J. Manag. Inf. Syst. 1996, 12, 5–33. [Google Scholar] [CrossRef]
- Tartir, S.; Arpinar, I.B.; Moore, M.; Sheth, A.P.; Aleman-Meza, B. OntoQA: Metric-based ontology quality analysis. In Proceedings of the IEEE ICDM Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources, Houston, TX, USA, 27 November 2005. [Google Scholar]
- McDaniel, M.; Storey, V.C. Evaluating domain ontologies: Clarification, classification, and challenges. ACM Comput. Surv. (CSUR) 2019, 52, 1–44. [Google Scholar] [CrossRef]
- Bizer, C.; Cyganiak, R. Quality-driven information filtering using the WIQA policy framework. J. Web Semant. 2009, 7, 1–10. [Google Scholar] [CrossRef]
- Acosta, M.; Zaveri, A.; Simperl, E.; Kontokostas, D.; Auer, S.; Lehmann, J. Crowdsourcing Linked Data quality assessment. In Proceedings of the International Semantic Web Conference (ISWC), Sydney, Australia, 21–25 October 2013; Springer: Cham, Switzerland, 2013; pp. 260–276. [Google Scholar]
- Senaratne, A.; Omran, P.G.; Williams, G.J. Unsupervised Anomaly Detection in Knowledge Graphs. In Proceedings of the 10th International Joint Conference on Knowledge Graphs (IJCKG), Virtual, 6–8 December 2021. [Google Scholar]
- Ma, Y.; Gao, H.; Wu, T.; Qi, G. Learning Disjointness Axioms With Association Rule Mining and Its Application to Inconsistency Detection of Linked Data. In Proceedings of the China Semantic Web Symposium (CSWS), Changsha, China, 5–7 September 2014. [Google Scholar]
- Li, F.; Dong, X.L.; Langen, A.; Li, Y. Knowledge verification for long-tail verticals. Proc. VLDB Endow. 2017, 10, 1370–1381. [Google Scholar] [CrossRef]
- Lehmann, J.; Gerber, D.; Morsey, M.; Ngonga Ngomo, A.C. Defacto-deep fact validation. In Proceedings of the International Semantic Web Conference (ISWC), Boston, MA, USA, 11–15 November 2012; Springer: Cham, Switzerland, 2012; pp. 312–327. [Google Scholar]
- Tufek, N.; Saissre, A.; Hanbury, A. Validating Semantic Artifacts With Large Language Models. In Proceedings of the 21st European Semantic Web Conference (ESWC), Crete, Greece, 24–30 May 2024. [Google Scholar]
- Chen, H.; Cao, G.; Chen, J.; Ding, J. A Practical Framework for Evaluating the Quality of Knowledge Graph. In Proceedings of the China Conference on Knowledge Graph and Semantic Computing (CCKS), Hangzhou, China, 24–27 August 2019. [Google Scholar]
- Kontokostas, D.; Zaveri, A.; Auer, S.; Lehmann, J. TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data. In Proceedings of the International Conference on Knowledge Engineering and the Semantic Web (KESW), St. Petersburg, Russia, 7–9 October 2013. [Google Scholar]
- Carlson, A.; Betteridge, J.; Kisiel, B.; Settles, B.; Hruschka, E.; Mitchell, T. Toward an Architecture for Never-Ending Language Learning. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, GA, USA, 11–15 July 2010. [Google Scholar]
- Kontokostas, D.; Westphal, P.; Auer, S.; Hellmann, S.; Lehmann, J.; Cornelissen, R.; Zaveri, A. Test-driven evaluation of Linked Data quality. In Proceedings of the 23rd International Conference on World Wide Web (WWW), Seoul, Republic of Korea, 7–11 April 2014. [Google Scholar]
- Röder, M.; Kuchelev, D.; Ngomo, A.N. HOBBIT: A platform for benchmarking Big Linked Data. Data Sci. 2020, 3, 15–35. [Google Scholar] [CrossRef]
- Hertling, S.; Paulheim, H. Gollum: A Gold Standard for Large Scale Multi Source Knowledge Graph Matching. In Proceedings of the 4th Conference on Automated Knowledge Base Construction, AKBC 2022, London, UK, 3–5 November 2022. [Google Scholar]
- Safavi, T.; Koutra, D. CoDEx: A Comprehensive Knowledge Graph Completion Benchmark. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Virtual, 16–20 November 2020. [Google Scholar]
- Li, Z.; Zhu, H.; Lu, Z.; Yin, M. Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, 6–10 December 2023; pp. 10443–10461. [Google Scholar]
- Mihindukulasooriya, N.; Tiwari, S.; Enguix, C.F.; Lata, K. Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text. In Proceedings of the Semantic Web-ISWC 2023—22nd International Semantic Web Conference, Athens, Greece, 6–10 November 2023; Volume 14266, pp. 247–265. [Google Scholar] [CrossRef]
- Meyer, L.; Frey, J.; Junghanns, K.; Brei, F.; Bulert, K.; Gründer-Fahrer, S.; Martin, M. Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering. In Proceedings of the Posters and Demo Track of the 19th International Conference on Semantic Systems Co-Located with 19th International Conference on Semantic Systems (SEMANTiCS 2023), Leipzig, Germany, 20–22 September 2023; Volume 3526. [Google Scholar]
- Galkin, M.; Auer, S.; Vidal, M.E.; Scerri, S. Enterprise Knowledge Graphs: A Semantic Approach for Knowledge Management in the Next Generation of Enterprise Information Systems. In Proceedings of the 19th International Conference on Enterprise Information Systems (ICEIS), Porto, Portugal, 26–29 April 2017; pp. 88–98. [Google Scholar] [CrossRef]
- Färber, M. The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data. In Proceedings of the Semantic Web-ISWC 2019—18th International Semantic Web Conference, Auckland, New Zealand, 26–30 October 2019; Volume 11779, pp. 113–129. [Google Scholar] [CrossRef]
- Bollacker, K.; Cook, R.; Tufts, P. Freebase: A shared database of structured general human knowledge. In Proceedings of the AAAI, Vancouver, BC, Canada, 22–26 July 2007; Volume 7, pp. 1962–1963. [Google Scholar]
- Suchanek, F.M.; Kasneci, G.; Weikum, G. Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web (WWW), Banff, AB, Canada, 8–12 May 2007. [Google Scholar]
- Pellissier Tanon, T.; Weikum, G.; Suchanek, F. Yago 4: A reasonable knowledge base. In Proceedings of the European Semantic Web Conference (ESWC), Virtual, 2–5 June 2020; pp. 583–596. [Google Scholar]
- Vrandečić, D.; Krötzsch, M. Wikidata: A free collaborative knowledgebase. Commun. ACM 2014, 57, 78–85. [Google Scholar] [CrossRef]
- Morsey, M.; Lehmann, J.; Auer, S.; Stadler, C.; Hellmann, S. DBpedia and the live extraction of structured data from Wikipedia. Program 2012, 46, 157–181. [Google Scholar] [CrossRef]
- Gawriljuk, G.; Harth, A.; Knoblock, C.A.; Szekely, P. A scalable approach to incrementally building Knowledge Graphs. In Proceedings of the International Conference on Theory and Practice of Digital Libraries (TPDL), Hannover, Germany, 5–9 September 2016; Springer: Cham, Switzerland, 2016; pp. 188–199. [Google Scholar]
- Auer, S.; Oelen, A.; Haris, M.; Stocker, M.; D’Souza, J.; Farfar, K.E.; Vogt, L.; Prinz, M.; Wiens, V.; Jaradeh, M.Y. Improving access to scientific literature with Knowledge Graphs. Bibl. Forsch. Prax. 2020, 44, 516–529. [Google Scholar] [CrossRef]
- Dessì, D.; Osborne, F.; Reforgiato Recupero, D.; Buscaldi, D.; Motta, E.; Sack, H. AI-KG: An automatically generated Knowledge Graph of artificial intelligence. In Proceedings of the International Semantic Web Conference (ISWC), Athens, Greece, 2–6 November 2020; pp. 127–143. [Google Scholar]
- Alberts, H.; Huang, N.; Deshpande, Y.; Liu, Y.; Cho, K.; Vania, C.; Calixto, I. VisualSem: A high-quality Knowledge Graph for vision and language. In Proceedings of the 1st Workshop on Multilingual Representation Learning, Punta Cana, Dominican Republic, 11 November 2021; Ataman, D., Birch, A., Conneau, A., Firat, O., Ruder, S., Sahin, G.G., Eds.; ACL: Stroudsburg, PA, USA, 2021; pp. 138–152. [Google Scholar] [CrossRef]
- Dsouza, A.; Tempelmeier, N.; Yu, R.; Gottschalk, S.; Demidova, E. WorldKG: A World-Scale Geographic Knowledge Graph. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM), Virtual, 1–5 November 2021; ACM: New York, NY, USA, 2021. [Google Scholar]
- Pellissier Tanon, T.; Vrandečić, D.; Schaffert, S.; Steiner, T.; Pintscher, L. From freebase to wikidata: The great migration. In Proceedings of the 25th International Conference on World Wide Web, Montréal, QC, Canada, 11–15 April 2016; ACM: New York, NY, USA, 2016; pp. 1419–1428. [Google Scholar]
- Piscopo, A.; Kaffee, L.A.; Phethean, C.; Simperl, E. Provenance information in a collaborative Knowledge Graph: An evaluation of Wikidata external references. In Proceedings of the International Semantic Web Conference (ISWC) 2017, Vienna, Austria, 21–25 October 2017; Springer: Cham, Switzerland, 2017; pp. 542–558. [Google Scholar]
- Zhang, Y.; Sheng, M.; Zhou, R.; Wang, Y.; Han, G.; Zhang, H.; Xing, C.; Dong, J. HKGB: An Inclusive, Extensible, Intelligent, Semi-auto-constructed Knowledge Graph Framework for Healthcare with Clinicians’ Expertise Incorporated. Inf. Process. Manag. 2020, 57, 102324. [Google Scholar] [CrossRef]
- Jaradeh, M.Y.; Singh, K.; Stocker, M.; Both, A.; Auer, S. Better Call the Plumber: Orchestrating Dynamic Information Extraction Pipelines. In Proceedings of the Web Engineering—21st International Conference, ICWE 2021, Biarritz, France, 18–21 May 2021; Volume 12706, pp. 240–254. [Google Scholar] [CrossRef]
- Pan, Z.; Su, C.; Deng, Y.; Cheng, J.C.P. Image2Triplets: A computer vision-based explicit relationship extraction framework for updating construction activity Knowledge Graphs. Comput. Ind. 2022, 137, 103610. [Google Scholar] [CrossRef]
- Cimmino, A.; García-Castro, R. Helio: A framework for implementing the life cycle of knowledge graphs. Semant. Web 2024, 15, 223–249. [Google Scholar] [CrossRef]
- Kazakov, Y.; Klinov, P. Incremental Reasoning in OWL EL without Bookkeeping. In Proceedings of the Semantic Web—ISWC 2013—12th International Semantic Web Conference, Sydney, Australia, 21–25 October 2013; Volume 8218, pp. 232–247. [Google Scholar] [CrossRef]
- Jagvaral, B.; Wangon, L.; Park, H.; Jeon, M.; Lee, N.; Park, Y. Large-scale incremental OWL/RDFS reasoning over fuzzy RDF data. In Proceedings of the 2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017, Jeju Island, Republic of Korea, 13–16 February 2017; pp. 269–273. [Google Scholar] [CrossRef]
- Bhattarai, P.; Ghassemi, M.; Alhanai, T. Open-Source Code Repository Attributes Predict Impact of Computer Science Research. In Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, Cologne, Germany, 20–24 June 2022; pp. 1–7. [Google Scholar] [CrossRef]
- Mahdavi, M.; Neutatz, F.; Visengeriyeva, L.; Abedjan, Z. Towards automated data cleaning workflows. Mach. Learn. 2019, 15, 16. [Google Scholar]
- Liang, K.; Meng, L.; Liu, M.; Liu, Y.; Tu, W.; Wang, S.; Zhou, S.; Liu, X.; Sun, F.; He, K. A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multi-Modal. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 1–20. [Google Scholar] [CrossRef] [PubMed]
- Zhao, X.; Jia, Y.; Li, A.; Jiang, R.; Song, Y. Multi-source knowledge fusion: A survey. World Wide Web 2020, 23, 2567–2592. [Google Scholar] [CrossRef]
- Shenoy, K.; Ilievski, F.; Garijo, D.; Schwabe, D.; Szekely, P.A. A study of the quality of Wikidata. J. Web Semant. 2022, 72, 100679. [Google Scholar] [CrossRef]
- Nuzzolese, A.G.; Gentile, A.L.; Presutti, V.; Gangemi, A.; Garigliotti, D.; Navigli, R. Open Knowledge Extraction Challenge. In Proceedings of the SemWebEval (ESWC 2015), Portorož, Slovenia, 31 May–4 June 2015. [Google Scholar]
- Rodríguez, J.M.; Merlino, H.D.; Pesado, P.; García-Martínez, R. Performance Evaluation of Knowledge Extraction Methods. In Proceedings of the International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE), Morioka, Japan, 2–4 August 2016. [Google Scholar]
- Zhang, Y.; Zhong, V.; Chen, D.; Angeli, G.; Manning, C.D. Position-aware Attention and Supervised Data Improve Slot Filling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, 7–11 September 2017; ACL: Stroudsburg, PA, USA, 2017. [Google Scholar]
- Euzenat, J.; Meilicke, C.; Stuckenschmidt, H.; Shvaiko, P.; dos Santos, C.T. Ontology Alignment Evaluation Initiative: Six Years of Experience. J. Data Semant. 2011, 15, 158–192. [Google Scholar]
- Köpcke, H.; Thor, A.; Rahm, E. Evaluation of entity resolution approaches on real-world match problems. Proc. VLDB Endow. 2010, 3, 484–493. [Google Scholar] [CrossRef]
- Galkin, M.; Berrendorf, M.; Hoyt, C.T. An Open Challenge for Inductive Link Prediction on Knowledge Graphs. arXiv 2022, arXiv:2203.01520. [Google Scholar] [CrossRef]
- Hu, W.; Fey, M.; Ren, H.; Nakata, M.; Dong, Y.; Leskovec, J. OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, Online Event, 6–14 December 2021. [Google Scholar]
- Portisch, J.; Hladik, M.; Paulheim, H. Background knowledge in ontology matching: A survey. Semant. Web 2022, 1–55. [Google Scholar] [CrossRef]
- Oliveira, I.L.; Fileto, R.; Speck, R.; Garcia, L.P.; Moussallem, D.; Lehmann, J. Towards holistic Entity Linking: Survey and directions. Inf. Syst. 2021, 95, 101624. [Google Scholar] [CrossRef]
- Pan, J.Z.; Razniewski, S.; Kalo, J.; Singhania, S.; Chen, J.; Dietze, S.; Jabeen, H.; Omeliyanenko, J.; Zhang, W.; Lissandrini, M.; et al. Large Language Models and Knowledge Graphs: Opportunities and Challenges. TGDK 2023, 1, 38. [Google Scholar] [CrossRef]
- Hofer, M.; Frey, J.; Rahm, E. Towards self-configuring Knowledge Graph Construction Pipelines using LLMs—A Case Study with RML. In Proceedings of the 5th International Workshop on Knowledge Graph Construction Co-Located with 21st Extended Semantic Web Conference (ESWC 2024), Hersonissos, Greece, 27 May 2024; Volume 3718. [Google Scholar]
- Sansford, H.J.; Richardson, N.; Maretic, H.P.; Saada, J.N. GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework. arXiv 2024, arXiv:2407.10793. [Google Scholar] [CrossRef]
- Wu, Z.; Qiu, L.; Ross, A.; Akyürek, E.; Chen, B.; Wang, B.; Kim, N.; Andreas, J.; Kim, Y. Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Mexico City, Mexico, 16–21 June 2024; ACL: Stroudsburg, PA, USA, 2024; pp. 1819–1862. [Google Scholar]
- Tamašauskaitė, G.; Groth, P. Defining a Knowledge Graph Development Process Through a Systematic Review. ACM Trans. Softw. Eng. Methodol. 2022, 32, 1–40. [Google Scholar] [CrossRef]
- Simsek, U.; Angele, K.; Kärle, E.; Opdenplatz, J.; Sommer, D.; Umbrich, J.; Fensel, D. Knowledge Graph Lifecycle: Building and maintaining Knowledge Graphs. In Proceedings of the 2nd International Workshop on Knowledge Graph Construction (KGC) Co-Located with 18th Extended Semantic Web Conference (ESWC 2021), Virtual, 6–10 June 2021. [Google Scholar]

Aspect | Resource Description Framework (RDF) | Property Graph Model (PGM) |
---|---|---|
base constructs | triples <subject, predicate, object> | labeled vertices and edges and their properties |
entity identity | IRI-based | local (implementation-specific) |
node classification | rdf:type triples | type labels |
ontology support | RDFS, OWL2 vocabularies | limited, e.g., schema graph |
reasoning/inference | supported, RDFS/OWL-based and other languages | limited, custom queries and procedures |
integrity constraints | SHACL, ShEx | PG-Keys, PG-Schema |
query language | SPARQL(-Star) | Cypher, Gremlin, G-Core, PGQL |
exchange format | N-Triples, N-Quads, (RDF/XML, JSON-LD) | application-specific, e.g., PGEF, GDL |
meta information | reification, singleton-property, (RDF-Star) | dedicated properties |
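
The contrast summarized in the table above can be illustrated with a minimal sketch that encodes the same statement in both models using plain Python data structures. The namespace `http://example.org/`, the entities `alice` and `acme`, the `worksFor` relation, and the `since` annotation are hypothetical and chosen only for illustration; a real system would more likely rely on an RDF library or a graph database than on sets and dataclasses.

```python
from dataclasses import dataclass, field

EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# RDF: every statement is a <subject, predicate, object> triple.
# Entities are identified by IRIs; node classification uses rdf:type triples.
triples = {
    (EX + "alice", RDF + "type", EX + "Person"),
    (EX + "acme", RDF + "type", EX + "Company"),
    (EX + "alice", EX + "worksFor", EX + "acme"),
}


def reify(stmt_iri, triple, meta):
    """Attach meta information to a triple via standard RDF reification."""
    s, p, o = triple
    out = {
        (stmt_iri, RDF + "type", RDF + "Statement"),
        (stmt_iri, RDF + "subject", s),
        (stmt_iri, RDF + "predicate", p),
        (stmt_iri, RDF + "object", o),
    }
    # One extra triple per piece of meta information (hypothetical keys).
    out |= {(stmt_iri, EX + key, value) for key, value in meta.items()}
    return out


triples |= reify(
    EX + "stmt1",
    (EX + "alice", EX + "worksFor", EX + "acme"),
    {"since": "2020"},
)


# PGM: vertices and edges carry type labels and key-value properties directly.
@dataclass
class Vertex:
    vid: int  # local, implementation-specific identifier
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    label: str
    properties: dict = field(default_factory=dict)


vertices = [
    Vertex(1, {"Person"}, {"name": "Alice"}),
    Vertex(2, {"Company"}, {"name": "ACME"}),
]
# Meta information is stored as a dedicated property on the edge itself.
edges = [Edge(1, 2, "worksFor", {"since": 2020})]

# Node classification: rdf:type triples (RDF) vs. type labels (PGM).
rdf_types = {o for s, p, o in triples if s == EX + "alice" and p == RDF + "type"}
pgm_types = vertices[0].labels
print(rdf_types, pgm_types)
print(len(triples), "triples vs.", len(vertices), "vertices and", len(edges), "edge")
```

In this sketch, annotating one RDF statement costs four reification triples plus one triple per annotation, whereas the property graph keeps the annotation on the edge. This overhead is one motivation for the RDF-Star extension listed in the table.
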

Terms | Description |
---|---|
entity, instance, subject/object/resource (RDF), individual | A KG node that represents a specific real-world or abstract thing. |
relation, property (RDF) | A relationship (edge, link) between two KG entities. |
type, class, label, concept | An identifier for a kind or group of entities or relations. |
property (PGM), attribute (RDF) | An identifier of an entity feature that points to a value. |
property value, literal, attribute value | Any value that cannot be referenced as an entity. |

Knowledge Graph | Year | Domain | Sources | Model | Entities | Relations | Types | Relation Types | Versions | Update | |
---|---|---|---|---|---|---|---|---|---|---|---|
Closed KG | |||||||||||
Google KG [31] | 2012 | Cross, MLang | 1 | Custom, RDF | 1B | >100B | ? | ? | ? | ? | |
Diffbot.com | 2019 | Cross | 1 | RDF | 5.9B | >1T | ? | ? | ? | ? | |
Amazon PG [40] | 2020 | Products | >1 | Custom | 30M | 1B | 19K | 1K | ? | ? | |
Open Access KG | |||||||||||
* Freebase [357] | 2007 | Cross | 1 | RDF | 22M | 3.2B | 53K | 70K | >1 | 2016 | |
DBpedia [108] | 2007 | Cross, MLang | 140 | RDF | 50M | 21B | 1.3K | 55K | >20 | 2023 | |
YAGO [358,359] | 2007 | Cross | 2–3 | RDF(-Star) | 67M | 2B | 10K | 157 | 5 | 2020 | |
NELL [347] | 2010 | Cross | ≥1 | Custom, RDF | 2M | 2.8M | 1.2K | 834 | >1100 | 2018 | |
* Wikidata [360] | 2012 | Cross, MLang | 1 | Custom, RDF | 100M | 14B | 300K | 10.3K | >100 | 2023 | |
DBpedia-EN Live [361] | 2012 | Cross | 1 | RDF | 7.6M | 1.1B | 800 | 1.3K | 1 | 2023 | |
Artist-KG [362] | 2016 | Artists | 4 | Custom | 161K | 15M | >1 | 18 | 1 | 2016 | |
* ORKG [363] | 2019 | Research | 1 | RDF | 130K | 870K | 1.3K | 6.3K | >1 | 2023 | |
AI-KG [364] | 2020 | AI Science | 3 | RDF | 820K | 1.2M | 5 | 27 | 2 | 2020 | |
CovidGraph [35] | 2020 | COVID-19 | 17 | PGM | 36M | 59M | 128 | 171 | >1 | 2020 | |
DRKG [34] | 2020 | BioMedicine | >7 | CSV | 97K | 5.8M | 17 | 107 | 1 | 2020 | |
VisualSem [365] | 2020 | Cross, MLang | 2 | Custom | 90K | 1.5M | (49K) | 13 | 2 | 2020 | |
WorldKG [366] | 2021 | Geographic | 1 | RDF | 113M | 829M | 1176 | 1820 | 1 | 2021 |
Consumed Data | (Meta) Data | Performed Construction Tasks | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Name of System | System Version/Year | Open Implementation | Unstructured Data | Semi-Structured Data | Structured Data | (Event-)Stream Data | Supplementary Data | (Deep) Provenance | Version/Temporal Data | Additional Metadata | KG Initialization | Data Preprocessing | Ontology Management | Knowledge Extraction | Entity Resolution | * Entity/Value Fusion | Quality Assurance | Knowledge Completion | Incremental Integration | |
Dataset Specific | ||||||||||||||||||||
DBpedia | 2019 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||||||
YAGO4 | 2020 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||||||
DBpedia-Live | 2012 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||||||
NELL | 2011 | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||||||||||
Artist-KG | 2016 | ✓ | ✓ | ✓ | ||||||||||||||||
AI-KG | 2020 | ✓ | ✓ | ✓ | ✓ | ? | ||||||||||||||
CovidGraph | 2020 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ? | ? | |||||||||||
DRKG | 2020 | ✓ | ✓ | ✓ | ✓ | ? | ||||||||||||||
VisualSem | 2020 | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||||||||||
WorldKG | 2021 | ✓ | ✓ | ✓ | ||||||||||||||||
Toolset/Strategy | ||||||||||||||||||||
FlexiFusion [103] | 2019 | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||||||||||
dstlr [255] | 2019 | ✓ | ✓ | ✓ | ? | |||||||||||||||
XI [88] | 2020 | ✓ | ✓ | ? | ? | ? | ? | ? | ||||||||||||
AutoKnow [40] | 2020 | ✓ | ✓ | ✓ | ||||||||||||||||
HKGB [369] | 2020 | ✓ | ✓ | ✓ | ✓ | ? | ||||||||||||||
SLOGERT [47] | 2021 | ✓ | ✓ | ✓ | ✓ | ? | ? | |||||||||||||
SAGA [84] | 2022 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ? | |||||||||||
Plumber [370] | 2023 | ✓ | ✓ | ✓ | ||||||||||||||||
Image2Triplets [371] | 2023 | ✓ | ✓ | ✓ | ||||||||||||||||
SAKA [109] | 2023 | ✓ | ✓ | ✓ | ? | |||||||||||||||
Helio [372] | 2024 | ✓ | ✓ | ✓ | ✓ | ? | ? | ? | ? | ? |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hofer, M.; Obraczka, D.; Saeedi, A.; Köpcke, H.; Rahm, E. Construction of Knowledge Graphs: Current State and Challenges. Information 2024, 15, 509. https://doi.org/10.3390/info15080509