Quality Management in Big Data

A special issue of Informatics (ISSN 2227-9709).

Deadline for manuscript submissions: closed (30 August 2017)

Special Issue Editors

Guest Editor
Dr. Mouzhi Ge
Faculty of Informatics, Bundeswehr University Munich, 85579 Neubiberg, Germany
Interests: data quality management; big data; business intelligence; information system success

Co-Guest Editor
Prof. Vlastislav Dohnal
Department of Machine Learning and Data Processing, Faculty of Informatics, Masaryk University, 60200 Brno, Czech Republic
Interests: database systems; similarity searching; storage structures for non-traditional data; similarity-based data analytics

Special Issue Information

Dear Colleagues,

This Special Issue of Informatics invites submissions on quality issues in Big Data. In the era of Big Data, organizations deal with tremendous amounts of data. These data are fast-moving and originate from various sources, such as social networks, unstructured content from websites, or raw sensor feeds. Big Data solutions are used to optimize business processes and shorten decision-making times, thereby improving operational effectiveness. Big Data practitioners, however, encounter a large number of data quality problems, which can be time-consuming to solve or can even lead to incorrect analytics. Managing quality in Big Data has therefore become challenging, and research so far addresses only limited aspects of the problem. In particular, given the complex nature of Big Data, traditional data quality management cannot simply be applied to Big Data quality management, which creates new challenges for researchers and practitioners. We therefore encourage authors to submit original research articles, work-in-progress papers, surveys, reviews, and viewpoint articles in this field. Topics of interest include, but are not limited to:

  • Big Data Quality Metrics, Measures, and Models
  • Big Data Governance
  • Big Data Quality in Business Processes
  • Master Data Management in Big Data
  • Quality Assessment for Big Data
  • Determining the Value of Big Data
  • Data and Information Quality
  • Risk Management in Big Data
  • Data Integration in Big Data
  • Data Cleansing in Big Data
  • Big Data Quality and Analytics
  • Big Data Quality and Internet of Things
  • Big Data Quality Management Processes, Frameworks, and Models

Dr. Mouzhi Ge
Guest Editor
Prof. Vlastislav Dohnal
Co-Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as accepted) and are listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Informatics is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (4 papers)

Editorial

Editorial
Quality Management in Big Data
by Mouzhi Ge and Vlastislav Dohnal
Informatics 2018, 5(2), 19; https://doi.org/10.3390/informatics5020019 - 16 Apr 2018
Cited by 11
Abstract
Given the importance of quality issues in Big Data, Big Data quality management has attracted significant research attention on how to measure, improve, and manage the quality of Big Data. This Special Issue of Informatics thus aims to address quality problems in Big Data and to promote further research on Big Data quality. Our editorial describes state-of-the-art research challenges in Big Data quality research and highlights the contributions of each paper accepted in this Special Issue. Full article
(This article belongs to the Special Issue Quality Management in Big Data)

Research

Article
A Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis
by Ben Evans, Kelsey Druken, Jingbo Wang, Rui Yang, Clare Richards and Lesley Wyborn
Informatics 2017, 4(4), 45; https://doi.org/10.3390/informatics4040045 - 13 Dec 2017
Cited by 5
Abstract
To ensure seamless, programmatic access to data for High Performance Computing (HPC) and analysis across multiple research domains, it is vital to have a methodology for standardizing both data and services. At the Australian National Computational Infrastructure (NCI), we have developed a Data Quality Strategy (DQS) that currently provides processes for: (1) consistency of the data structures needed for a High Performance Data (HPD) platform; (2) Quality Control (QC) through compliance with recognized community standards; (3) benchmarking cases of operational performance tests; and (4) Quality Assurance (QA) of data through demonstrated functionality and performance across common platforms, tools, and services. By implementing the NCI DQS, we have seen progressive improvement in the quality and usefulness of the datasets across the different subject domains, and demonstrated the ease with which modern programmatic methods can access the data, either in situ or via web services, for uses ranging from traditional analysis methods to emerging machine learning techniques. To help increase data re-usability by broader communities, particularly in high performance environments, the DQS is also used to identify the need for extensions to the relevant international standards for interoperability and/or programmatic access. Full article
(This article belongs to the Special Issue Quality Management in Big Data)
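The QC step above centers on checking datasets for compliance with recognized community metadata standards before publication. As a minimal sketch of that idea in Python (not the NCI implementation; the required attribute names are hypothetical examples, not the actual DQS rule set):

```python
# Sketch of a metadata-compliance check in the spirit of the paper's
# QC process. REQUIRED_ATTRS is a hypothetical example rule set.
REQUIRED_ATTRS = {"title", "institution", "source", "license", "Conventions"}

def qc_report(dataset_attrs: dict) -> dict:
    """Report which required attributes are missing or empty."""
    missing = sorted(REQUIRED_ATTRS - dataset_attrs.keys())
    empty = sorted(k for k in REQUIRED_ATTRS & dataset_attrs.keys()
                   if not str(dataset_attrs[k]).strip())
    return {"missing": missing, "empty": empty,
            "compliant": not missing and not empty}

if __name__ == "__main__":
    attrs = {"title": "Ocean temperature grid",
             "institution": "Example centre",
             "Conventions": "CF-1.6",
             "license": ""}
    print(qc_report(attrs))
    # {'missing': ['source'], 'empty': ['license'], 'compliant': False}
```

In a real pipeline, a check of this kind would read attributes from the dataset's self-describing metadata rather than a hand-built dictionary, and failures would block publication until fixed.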

Article
Relative Quality and Popularity Evaluation of Multilingual Wikipedia Articles
by Włodzimierz Lewoniewski, Krzysztof Węcel and Witold Abramowicz
Informatics 2017, 4(4), 43; https://doi.org/10.3390/informatics4040043 - 8 Dec 2017
Cited by 25
Abstract
Despite the fact that Wikipedia is often criticized for its poor quality, it continues to be one of the most popular knowledge bases in the world. Articles in this free encyclopedia on various topics can be created and edited independently in about 300 language versions. Our research has shown that for language-sensitive topics, the quality of information can be relatively better in the relevant language versions. However, in most cases, it is difficult for Wikipedia readers to determine the language affiliation of the described subject. Additionally, each language edition of Wikipedia can have its own rules for manually assessing content quality. There are also differences in grading schemes between language versions: some use a 6–8 grade system to assess articles, while others are limited to 2–3 grades. This makes automatic quality comparison of articles between various languages a challenging task, particularly given the large number of unassessed articles; some Wikipedia language editions have over 99% of articles without a quality grade. The paper presents the results of a relative quality and popularity assessment of over 28 million articles in 44 selected language versions. A comparative analysis of the quality and popularity of articles on popular topics was also conducted. Additionally, the correlation between the quality and popularity of Wikipedia articles on selected topics in various languages was investigated. The proposed method allows us to find articles with better-quality information that can be used to automatically enrich other language editions of Wikipedia. Full article
(This article belongs to the Special Issue Quality Management in Big Data)
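A central difficulty the abstract highlights is that grading schemes differ in length across language editions. One way to make grades comparable, sketched below purely for illustration (the scale sizes and page-view figures are invented, not data from the paper), is to map each k-grade scheme onto a common 0-1 scale and then correlate normalized quality with popularity:

```python
# Sketch of grade normalization across language editions with
# different grading scales, plus a quality-popularity correlation.
# SCALE_SIZE and the article data are made-up examples.
from statistics import correlation  # Pearson's r, Python 3.10+

SCALE_SIZE = {"en": 7, "de": 3, "ru": 7}  # grades per language (hypothetical)

def normalized_quality(lang: str, grade: int) -> float:
    """Map a 1..k grade in a k-grade scheme onto [0, 1]."""
    k = SCALE_SIZE[lang]
    return (grade - 1) / (k - 1)

articles = [  # (language, grade, monthly page views): toy data
    ("en", 6, 120_000),
    ("de", 3, 15_000),
    ("ru", 2, 4_000),
    ("en", 2, 30_000),
]
quality = [normalized_quality(lang, g) for lang, g, _ in articles]
views = [v for _, _, v in articles]
print(correlation(quality, views))  # quality-popularity correlation
```

The paper's actual model is richer than this; the sketch only shows why a common scale is needed before any cross-language comparison is meaningful.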

Other

Concept Paper
Big Data in the Era of Health Information Exchanges: Challenges and Opportunities for Public Health
by Janet G. Baseman, Debra Revere and Ian Painter
Informatics 2017, 4(4), 39; https://doi.org/10.3390/informatics4040039 - 10 Nov 2017
Cited by 3
Abstract
Public health surveillance of communicable diseases depends on timely, complete, accurate, and useful data that are collected across a number of healthcare and public health systems. Health Information Exchanges (HIEs), which support electronic sharing of data and information between health care organizations, are recognized as a source of ‘big data’ in healthcare and have the potential to provide public health with a single stream of data collated across disparate systems and sources. However, given that these data are not collected specifically to meet public health objectives, it is unknown whether a public health agency’s (PHA’s) secondary use of the data supports, or presents additional barriers to, meeting disease reporting and surveillance needs. To explore this issue, we conducted an assessment of the big data available to a PHA through its participation in an HIE: laboratory test results and clinician-generated notifiable condition report data. Full article
(This article belongs to the Special Issue Quality Management in Big Data)
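The assessment the authors describe rests on data quality dimensions such as completeness and timeliness of notifiable condition reports. A minimal sketch of how such metrics might be computed, assuming hypothetical field names and a three-day reporting window that are not taken from the study:

```python
# Sketch of completeness and timeliness metrics for notifiable
# condition reports. Field names and the 3-day window are hypothetical.
from datetime import date

REQUIRED_FIELDS = ["patient_id", "condition", "specimen_date", "report_date"]
TIMELINESS_TARGET_DAYS = 3  # hypothetical reporting window

def completeness(records: list[dict]) -> dict:
    """Fraction of records with a non-empty value, per required field."""
    n = len(records)
    return {f: sum(1 for r in records if r.get(f)) / n for f in REQUIRED_FIELDS}

def timeliness(records: list[dict]) -> float:
    """Fraction of reports filed within the target window."""
    on_time = sum(
        1 for r in records
        if r.get("specimen_date") and r.get("report_date")
        and (r["report_date"] - r["specimen_date"]).days <= TIMELINESS_TARGET_DAYS
    )
    return on_time / len(records)

records = [
    {"patient_id": "a1", "condition": "pertussis",
     "specimen_date": date(2017, 3, 1), "report_date": date(2017, 3, 2)},
    {"patient_id": "a2", "condition": "",  # missing condition value
     "specimen_date": date(2017, 3, 1), "report_date": date(2017, 3, 9)},
]
print(completeness(records))  # condition completeness = 0.5
print(timeliness(records))    # 0.5: one of two reports within 3 days
```

Metrics like these give a PHA a concrete way to judge whether HIE-sourced data meet surveillance needs, which is exactly the question the paper explores.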