Semantics for Big Data Integration

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Systems".

Deadline for manuscript submissions: closed (15 September 2018) | Viewed by 27212

Special Issue Editors


Guest Editor
Department of Information Engineering, University of Modena and Reggio Emilia, 41121 Modena, Italy
Interests: Big Data integration; information processing; application integration

Guest Editor
Dipartimento di Ingegneria "Enzo Ferrari" – DIEF, Università di Modena e Reggio Emilia, Via Pietro Vivarelli 10 - int. 1 - 41125 Modena, Italy
Interests: database; data integration; data fusion; linked open data; big data

Special Issue Information

Dear Colleagues,

In recent years, there has been a great deal of interest in big data. Much of the work on big data has focused on volume and velocity in order to handle dataset size, but the problems of variety and veracity are equally important in dealing with the heterogeneity, diversity, and complexity of data. Semantic technologies can be a means to address these issues.

Therefore, the purpose of this Special Issue is to publish high-quality research, from academic and industrial stakeholders, disseminating innovative solutions that explore how big data can leverage semantics, i.e., the challenges and opportunities arising from adapting and transferring semantic technologies to the big data context.

Original, high-quality contributions that have not been published and are not currently under review by other journals or peer-reviewed conferences are sought.

Topics of interest include, but are not limited to:

  • interplay of semantics and big data
  • semantic methods and technologies applied to big data dimensions
  • scalability of semantic methods and technologies
  • the use of semantic metadata, linked open data, and ontologies for big data
  • semantics for big data extraction, transformation, and integration
  • knowledge integration from big data on the Web

Prof. Maurizio Vincini
Prof. Domenico Beneventano
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Big data dimensions
  • Big data technology
  • Big data integration
  • Semantic data design
  • Information visibility

Published Papers (6 papers)


Editorial


3 pages, 164 KiB  
Editorial
Foreword to the Special Issue: “Semantics for Big Data Integration”
by Domenico Beneventano and Maurizio Vincini
Information 2019, 10(2), 68; https://doi.org/10.3390/info10020068 - 18 Feb 2019
Cited by 2 | Viewed by 2682
Abstract
In recent years, a great deal of interest has been shown toward big data. Much of the work on big data has focused on volume and velocity in order to consider dataset size. Indeed, the problems of variety, velocity, and veracity are equally important in dealing with the heterogeneity, diversity, and complexity of data, where semantic technologies can be explored to deal with these issues. This Special Issue aims at discussing emerging approaches from academic and industrial stakeholders for disseminating innovative solutions that explore how big data can leverage semantics, for example, by examining the challenges and opportunities arising from adapting and transferring semantic technologies to the big data context.
(This article belongs to the Special Issue Semantics for Big Data Integration)

Research


26 pages, 1555 KiB  
Article
Integration of Web APIs and Linked Data Using SPARQL Micro-Services—Application to Biodiversity Use Cases
by Franck Michel, Catherine Faron Zucker, Olivier Gargominy and Fabien Gandon
Information 2018, 9(12), 310; https://doi.org/10.3390/info9120310 - 6 Dec 2018
Cited by 7 | Viewed by 5396
Abstract
In recent years, Web APIs have become a de facto standard for exchanging machine-readable data on the Web. Despite this success, however, they often fail in making resource descriptions interoperable, due to the fact that they rely on proprietary vocabularies that lack formal semantics. The Linked Data principles similarly seek the massive publication of data on the Web, yet with the specific goal of ensuring semantic interoperability. Given their complementary goals, it is commonly admitted that cross-fertilization could stem from the automatic combination of Linked Data and Web APIs. Towards this goal, in this paper we leverage micro-service architectural principles to define a SPARQL Micro-Service architecture, aimed at querying Web APIs using SPARQL. A SPARQL micro-service is a lightweight SPARQL endpoint that provides access to a small, resource-centric, virtual graph. In this context, we argue that full SPARQL Query expressiveness can be supported efficiently without jeopardizing server availability. Furthermore, we demonstrate how this architecture can be used to dynamically assign dereferenceable URIs to Web API resources that do not have URIs beforehand, thus literally “bringing” Web APIs into the Web of Data. We believe that the emergence of an ecosystem of SPARQL micro-services published by independent providers would enable Linked Data-based applications to easily glean pieces of data from a wealth of distributed, scalable, and reliable services. We describe a working prototype implementation, and we finally illustrate the use of SPARQL micro-services in the context of two real-life use cases related to the biodiversity domain, developed in collaboration with the French National Museum of Natural History.
(This article belongs to the Special Issue Semantics for Big Data Integration)
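The core idea in the abstract above is to map a Web API response onto a small, resource-centric virtual graph that can then be answered with SPARQL triple patterns. A minimal, dependency-free sketch of that idea follows; the JSON shape, predicate names, and taxon URI are invented for illustration and are not taken from the paper's implementation:

```python
# Sketch: wrap a (mocked) Web API record as a tiny virtual RDF-like graph
# and evaluate one SPARQL-style triple pattern against it.

def api_response_to_triples(doc):
    """Map a mocked JSON Web API record onto (subject, predicate, object) triples."""
    s = doc["uri"]
    triples = [(s, "rdf:type", "dwc:Taxon")]
    for name in doc["commonNames"]:
        triples.append((s, "dwc:vernacularName", name))
    return triples

def match(triples, pattern):
    """Evaluate one triple pattern; None plays the role of a SPARQL variable."""
    ps, pp, po = pattern
    return [t for t in triples
            if (ps is None or t[0] == ps)
            and (pp is None or t[1] == pp)
            and (po is None or t[2] == po)]

# Mocked Web API payload for one taxon resource (illustrative data).
doc = {"uri": "http://example.org/taxon/Delphinus_delphis",
       "commonNames": ["common dolphin", "short-beaked common dolphin"]}

g = api_response_to_triples(doc)
# Corresponds to: SELECT ?name WHERE { <taxon> dwc:vernacularName ?name }
names = [o for (_, _, o) in match(g, (doc["uri"], "dwc:vernacularName", None))]
print(names)
```

A real SPARQL micro-service would additionally fetch the JSON over HTTP and expose a full SPARQL endpoint; the sketch only shows the graph-virtualization step.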

17 pages, 2896 KiB  
Article
Chinese Microblog Topic Detection through POS-Based Semantic Expansion
by Lianhong Ding, Bin Sun and Peng Shi
Information 2018, 9(8), 203; https://doi.org/10.3390/info9080203 - 10 Aug 2018
Cited by 3 | Viewed by 4028
Abstract
A microblog is a new type of social media for information publishing, acquiring, and spreading. Finding the significant topics of a microblog is necessary for popularity tracing and public opinion following. This paper puts forward a method to detect topics from Chinese microblogs. Since traditional methods showed low performance on a short text from a microblog, we put forward a topic detection method based on the semantic description of the microblog post. The semantic expansion of the post supplies more information and clues for topic detection. First, semantic features are extracted from a microblog post. Second, the semantic features are expanded according to a thesaurus. Here TongYiCi CiLin is used as the lexical resource to find words with the same meaning. To overcome the polysemy problem, several semantic expansion strategies based on part-of-speech are introduced and compared. Third, an approach to detect topics based on semantic descriptions and an improved incremental clustering algorithm is introduced. A dataset from Sina Weibo is employed to evaluate our method. Experimental results show that our method can bring about better results both for post clustering and topic detection in Chinese microblogs. We also found that the semantic expansion of nouns is far more efficient than for other parts of speech. The potential mechanism of the phenomenon is also analyzed and discussed.
(This article belongs to the Special Issue Semantics for Big Data Integration)
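As a rough illustration of the pipeline the abstract describes, the sketch below expands noun features through a toy synonym table (a stand-in for TongYiCi CiLin, restricted to nouns, which the paper found most effective) and then groups posts with a simple single-pass incremental clustering. The thesaurus, POS tags, and similarity threshold are all invented for illustration:

```python
# Sketch: POS-restricted semantic expansion followed by incremental clustering.

SYNONYMS = {"quake": {"earthquake"}, "film": {"movie"}}  # toy thesaurus

def expand(features):
    """Add same-meaning words, but only for noun features ('n')."""
    out = set()
    for word, pos in features:
        out.add(word)
        if pos == "n":
            out |= SYNONYMS.get(word, set())
    return out

def jaccard(a, b):
    return len(a & b) / len(a | b)

def incremental_cluster(posts, threshold=0.25):
    """Assign each post to the most similar existing cluster, or start a new one."""
    clusters = []  # each cluster: (merged feature set, list of post indices)
    for i, feats in enumerate(posts):
        best, best_sim = None, threshold
        for c in clusters:
            sim = jaccard(feats, c[0])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append((set(feats), [i]))
        else:
            best[0].update(feats)  # grow the cluster's feature description
            best[1].append(i)
    return clusters

posts = [expand([("quake", "n"), ("sichuan", "n")]),
         expand([("earthquake", "n"), ("rescue", "n")]),
         expand([("film", "n"), ("festival", "n")])]
clusters = incremental_cluster(posts)
print([c[1] for c in clusters])  # posts 0 and 1 meet via the "quake" expansion
```

Without the expansion step, posts 0 and 1 share no surface words; the synonym link is what lets the clustering connect them, which is the effect the paper measures.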

20 pages, 1263 KiB  
Article
LOD for Data Warehouses: Managing the Ecosystem Co-Evolution
by Selma Khouri and Ladjel Bellatreche
Information 2018, 9(7), 174; https://doi.org/10.3390/info9070174 - 17 Jul 2018
Cited by 3 | Viewed by 4329
Abstract
For more than 30 years, data warehouses (DWs) have attracted particular interest both in practice and in research. This success is explained by their ability to adapt to their evolving environment. One of the latest challenges for DWs is their ability to open their frontiers to external data sources in addition to internal sources. The development of linked open data (LOD) as external sources is an excellent opportunity to create added value and enrich the analytical capabilities of DWs. However, the incorporation of LOD into the DW must be accompanied by careful management. In this paper, we are interested in managing the evolution of DW systems that integrate internal sources and external LOD datasets. The particularity of LOD is that they contribute to evolving the DW at several levels: (i) the source level, (ii) the DW schema level, and (iii) the DW design-cycle constructs. In this context, we have to ensure this co-evolution, as conventional evolution approaches are adapted neither to this new kind of source nor to the semantic constructs underlying LOD sources. One way of tackling this co-evolution issue is to ensure the traceability of DW constructs throughout the whole design cycle. Our approach is tested using the LUBM (Lehigh University Benchmark), different LOD datasets (DBpedia, YAGO, etc.), and the Oracle 12c database management system (DBMS) used for the DW deployment.
(This article belongs to the Special Issue Semantics for Big Data Integration)

33 pages, 1604 KiB  
Article
High Performance Methods for Linked Open Data Connectivity Analytics
by Michalis Mountantonakis and Yannis Tzitzikas
Information 2018, 9(6), 134; https://doi.org/10.3390/info9060134 - 3 Jun 2018
Cited by 10 | Viewed by 5868
Abstract
The main objective of Linked Data is linking and integration, and a major step for evaluating whether this target has been reached is to find all the connections among the Linked Open Data (LOD) Cloud datasets. Connectivity among two or more datasets can be achieved through common entities, triples, literals, and schema elements, while more connections can occur due to equivalence relationships between URIs, such as owl:sameAs, owl:equivalentProperty, and owl:equivalentClass, since many publishers use such relationships to declare that their URIs are equivalent to URIs of other datasets. However, no connectivity measurements (or indexes) are available that involve more than two datasets and cover the whole content (e.g., entities, schema, triples) or “slices” (e.g., triples for a specific entity) of datasets, although these can be of primary importance for several real-world tasks, such as information enrichment, dataset discovery, and others. Generally, it is not an easy task to find the connections among the datasets, since there exists a large number of LOD datasets and the transitive and symmetric closure of equivalence relationships must be computed so as not to miss connections. For this reason, we introduce scalable methods and algorithms (a) for computing the transitive and symmetric closure of equivalence relationships (since they can produce more connections between the datasets); (b) for constructing dedicated global semantics-aware indexes that cover the whole content of datasets; and (c) for measuring the connectivity among two or more datasets. Finally, we evaluate the speedup of the proposed approach, while we report comparative results for over two billion triples.
(This article belongs to the Special Issue Semantics for Big Data Integration)
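The transitive and symmetric closure of owl:sameAs-style equivalence relationships, which the abstract identifies as a prerequisite for not missing connections, can be sketched with a union-find structure. The URIs and links below are illustrative, and this is a minimal sketch rather than the authors' optimized, large-scale implementation:

```python
# Sketch: equivalence classes of URIs from owl:sameAs pairs via union-find.

def closure(pairs):
    """Group URIs into equivalence classes given symmetric owl:sameAs pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in pairs:
        union(a, b)

    classes = {}
    for x in parent:
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())

links = [
    ("dbpedia:Aristotle", "yago:Aristotle"),
    ("yago:Aristotle", "wikidata:Q868"),   # transitivity joins all three
    ("dbpedia:Plato", "wikidata:Q859"),
]
classes = closure(links)
print(classes)
```

Symmetry is implicit (union ignores pair order) and transitivity falls out of the shared root, so each resulting class contains every URI reachable through any chain of owl:sameAs links.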

14 pages, 1691 KiB  
Article
A Hybrid Information Mining Approach for Knowledge Discovery in Cardiovascular Disease (CVD)
by Stefania Pasanisi and Roberto Paiano
Information 2018, 9(4), 90; https://doi.org/10.3390/info9040090 - 12 Apr 2018
Cited by 9 | Viewed by 4223
Abstract
The healthcare domain is usually perceived as “information rich” yet “knowledge poor”. Nowadays, an unprecedented effort is underway to increase the use of business intelligence techniques to solve this problem. Heart disease (HD) is a major cause of mortality in modern society. This paper analyzes the risk factors that have been identified in cardiovascular disease (CVD) surveillance systems. The Heart Care study identifies attributes related to CVD risk (gender, age, smoking habit, etc.) and other dependent variables that include a specific form of CVD (diabetes, hypertension, cardiac disease, etc.). In this paper, we combine Clustering, Association Rules, and Neural Networks for the assessment of heart-event-related risk factors, targeting the reduction of CVD risk. With the use of the K-means algorithm, significant groups of patients are found. Then, the Apriori algorithm is applied in order to understand the kinds of relations between the attributes within the dataset, first looking within the whole dataset and then refining the results through the subsets defined by the clusters. Finally, both results allow us to better define patients’ characteristics in order to make predictions about CVD risk with a Multilayer Perceptron Neural Network. The results obtained with the hybrid information mining approach indicate that it is an effective strategy for knowledge discovery concerning chronic diseases, particularly for CVD risk.
(This article belongs to the Special Issue Semantics for Big Data Integration)
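One stage of the hybrid pipeline described above, Apriori-style frequent itemset mining over patient attribute sets (as would be run first on the whole dataset and then on each K-means cluster), can be sketched as follows. The attributes, tiny dataset, and support threshold are invented for illustration and are not from the Heart Care study:

```python
# Sketch: Apriori frequent itemset mining over binary patient attributes.
from itertools import combinations

def apriori(transactions, min_support=0.5):
    """Return frequent itemsets (frozensets) mapped to their support."""
    n = len(transactions)
    level = {frozenset([i]) for t in transactions for i in t}  # 1-itemsets
    frequent = {}
    while level:
        counts = {s: sum(1 for t in transactions if s <= t) for s in level}
        kept = {s for s, c in counts.items() if c / n >= min_support}
        frequent.update({s: counts[s] / n for s in kept})
        # candidate generation: join frequent k-itemsets into (k+1)-itemsets
        level = {a | b for a, b in combinations(kept, 2)
                 if len(a | b) == len(a) + 1}
    return frequent

patients = [
    {"smoker", "hypertension", "male"},
    {"smoker", "hypertension", "diabetes"},
    {"hypertension", "male"},
    {"smoker", "male"},
]
freq = apriori(patients, min_support=0.5)
print(sorted(map(sorted, freq)))
```

The frequent pairs found here (e.g., smoker with hypertension) are the raw material for association rules; in the paper's pipeline, their per-cluster variants feed the feature definition used by the Multilayer Perceptron.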
