Using Computational Provenance

A special issue of Informatics (ISSN 2227-9709).

Deadline for manuscript submissions: closed (30 November 2017) | Viewed by 36823

Special Issue Editor


Guest Editor
Computer & Information Sciences Dept., UMass Dartmouth, 285 Old Westport Rd., Dartmouth, MA 02747, USA
Interests: computational provenance; data science environments; visualization

Special Issue Information

Dear Colleagues,

This Special Issue of Informatics welcomes submissions on the topic of provenance. Provenance is information about how a particular entity, such as a computed result, was generated. The study of provenance encompasses the means and frameworks for capture, the ways the information is modeled, organized, filtered, and queried, and the uses for the data. In addition, provenance may be tailored to specific scientific domains and applications, each with its own unique challenges including security, privacy, and semantics.

While there is widespread agreement that capturing and storing provenance is important, the value of that information is linked to how efficiently we can understand and use it after it is created. Thus, provenance is enriched by techniques to query, summarize, and visualize it, as well as methods to connect it to domain-specific information. Provenance may be analyzed or mined to help inform future decisions or exploration. Finally, when capturing or storing provenance, representation and granularity often impact the ability to understand and use provenance.

We encourage authors to submit their original research articles, work in progress, surveys, and position papers in this area. The special issue welcomes applications, models, case studies, and frameworks that are connected to the use of provenance. A list of potential topics includes:

  • Provenance Visualization

  • Provenance Capture

  • Provenance Granularity

  • Distributed Provenance

  • Secure Provenance

  • Privacy Concerns in Provenance

  • Provenance Analysis

  • Applications of Provenance

  • Domain-specific Provenance

  • Provenance Models

  • Provenance Interoperability

  • Provenance for Streaming

  • Reasoning over Provenance

  • Personalization of Provenance

  • Mining Provenance

  • Querying Provenance

  • Evaluations of Provenance Utility

Prof. Dr. David Koop
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Informatics is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (4 papers)


Research

50 pages, 38014 KiB  
Article
Data Provenance for Agent-Based Models in a Distributed Memory
by Delmar B. Davis, Jonathan Featherston, Hoa N. Vo, Munehiro Fukuda and Hazeline U. Asuncion
Informatics 2018, 5(2), 18; https://doi.org/10.3390/informatics5020018 - 09 Apr 2018
Cited by 1 | Viewed by 9600
Abstract
Agent-Based Models (ABMs) assist with studying emergent collective behavior of individual entities in social, biological, economic, network, and physical systems. Data provenance can support ABM by explaining individual agent behavior. However, there is no provenance support for ABMs in a distributed setting. The Multi-Agent Spatial Simulation (MASS) library provides a framework for simulating ABMs at fine granularity, where agents and spatial data are shared application resources in a distributed memory. We introduce a novel approach to capture ABM provenance in a distributed memory, called ProvMASS. We evaluate our technique with traditional data provenance queries and performance measures. Our results indicate that a configurable approach can capture provenance that explains coordination of distributed shared resources, simulation logic, and agent behavior while limiting performance overhead. We also show the ability to support practical analyses (e.g., agent tracking) and storage requirements for different capture configurations.
(This article belongs to the Special Issue Using Computational Provenance)

25 pages, 1576 KiB  
Article
Utilizing Provenance in Reusable Research Objects
by Zhihao Yuan, Dai Hai Ton That, Siddhant Kothari, Gabriel Fils and Tanu Malik
Informatics 2018, 5(1), 14; https://doi.org/10.3390/informatics5010014 - 08 Mar 2018
Cited by 21 | Viewed by 8324
Abstract
Science is conducted collaboratively, often requiring the sharing of knowledge about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. Computational provenance is often the key to enable such reuse. In this paper, we show how reusable research objects can utilize provenance to correctly repeat a previous reference execution, to construct a subset of a research object for partial reuse, and to reuse existing contents of a research object for modified reuse. We describe two methods to summarize provenance that aid in understanding the contents and past executions of a research object. The first method obtains a process-view by collapsing low-level system information, and the second method obtains a summary graph by grouping related nodes and edges with the goal to obtain a graph view similar to application workflow. Through detailed experiments, we show the efficacy and efficiency of our algorithms.

18 pages, 379 KiB  
Article
Using Introspection to Collect Provenance in R
by Barbara Lerner, Emery Boose and Luis Perez
Informatics 2018, 5(1), 12; https://doi.org/10.3390/informatics5010012 - 01 Mar 2018
Cited by 10 | Viewed by 9836
Abstract
Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to details on inputs, outputs, and the computing environment collected by most provenance tools, RDataTracker also records a detailed execution trace and intermediate data values. It does this using R’s powerful introspection functions and by parsing R statements prior to sending them to the interpreter so it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.

38 pages, 2623 KiB  
Article
LabelFlow Framework for Annotating Workflow Provenance
by Pinar Alper, Khalid Belhajjame, Vasa Curcin and Carole A. Goble
Informatics 2018, 5(1), 11; https://doi.org/10.3390/informatics5010011 - 23 Feb 2018
Cited by 6 | Viewed by 8620
Abstract
Scientists routinely analyse and share data for others to use. Successful data (re)use relies on having metadata describing the context of analysis of data. In many disciplines the creation of contextual metadata is referred to as reporting. One method of implementing analyses is with workflows. A stand-out feature of workflows is their ability to record provenance from executions. Provenance is useful when analyses are executed with changing parameters (changing contexts) and results need to be traced to respective parameters. In this paper we investigate whether provenance can be exploited to support reporting. Specifically, we outline a case-study based on a real-world workflow and set of reporting queries. We observe that provenance, as collected from workflow executions, is of limited use for reporting, as it supports queries partially. We identify that this is due to the generic nature of provenance, its lack of domain-specific contextual metadata. We observe that the required information is available in implicit form, embedded in data. We describe LabelFlow, a framework comprised of four Labelling Operators for decorating provenance with domain-specific Labels. LabelFlow can be instantiated for a domain by plugging it with domain-specific metadata extractors. We provide a tool that takes as input a workflow, and produces as output a Labelling Pipeline for that workflow, comprised of Labelling Operators. We revisit the case-study and show how Labels provide a more complete implementation of reporting queries.