Peer-Review Record

Pricing Personal Data Based on Data Provenance

Appl. Sci. 2019, 9(16), 3388; https://doi.org/10.3390/app9163388

by Yuncheng Shen¹,², Bing Guo¹,*, Yan Shen³, Fan Wu⁴, Hong Zhang¹, Xuliang Duan¹ and Xiangqian Dong¹

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 22 July 2019 / Revised: 14 August 2019 / Accepted: 14 August 2019 / Published: 17 August 2019

(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

There is still a need for improving/checking the readability of the text, for example:
- Line 10: with select-joint query and complex query → with a select-joint query and a complex query
- Line 21: personal data have different values for different objects → personal data have different values for different objectives
- Line 77: stated that relational database has → stated that a relational database has
- Line 97: the newly added sentence needs refinement (“… a result tuple …”), and what is “the simplest”?
- Line 133: is individual → in individual?
- Lines 174-175 are unclear.
- …

Please check Property 2 on lines 245-246 and whether the upper bound is correct (especially given the statement “… on any query”).

The effectiveness parameter \Alpha defined in Relation (21) does not coincide with the parameter reported on lines 466 and 480.

The captions of Figures 8 and 9 are the same; in my opinion, they should be differentiated.

The section on related work (i.e., Section 6) does not provide any insight into how the presented works differ from the current work.

Line 560: the “importance” of source tuples is mentioned throughout the paper, but “the source tuple difference” is not elaborated upon enough: how is it determined, and where and how is its influence applied? The point is to make this more explicit than it is now.

Review concern #4: the authors reply that “… the output is the same even …”. The idea behind the proposed adaptation was to make the authors’ point clearer. This suggestion is optional, so the authors may reconsider it if they agree that it improves the readability and understandability of the paper.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper devises a new pricing model based on price setting and pricing, defines minimum provenance, and proposes exact and approximate pricing algorithms. In my view, the proposed idea and algorithms are creative in addressing the problems of existing research. I only ask the authors to address the following points in a revision:

1) Some English sentences are awkward, e.g., verbs that do not agree with a singular or plural subject, subjects or objects that should be plural nouns, and so on.

2) In lines 89-91, please give a clear definition of the symbols.

3) In Figure 2, is it correct that the prices are almost the same for Exact and Approximate?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

The paper presents a pricing model based on provenance for the minimum set of tuples that contribute to a query. The paper presents two algorithms to calculate the price of a query: an exact algorithm and an approximate algorithm. The authors show that the former has exponential complexity and the latter has polynomial complexity. Using two queries on two datasets from practice, the authors illustrate that their pricing model is feasible.
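
To make the exact-versus-approximate trade-off summarized above concrete, here is a minimal Python sketch. It is not the authors' algorithms, which are not reproduced in this record. It assumes a simplified model in which every output tuple has a list of witnesses (sets of input tuples, each sufficient to produce it), every input tuple has a price, and the query price is the cheapest set of input tuples that covers one witness per output tuple. The names `exact_price`, `greedy_price`, `witnesses`, and `price`, as well as the sample data, are hypothetical.

```python
from itertools import product


def exact_price(witnesses, price):
    """Exact price under the assumed model: try every combination of one
    witness per output tuple and keep the cheapest union of input tuples.
    The number of combinations grows exponentially with the output size."""
    return min(
        sum(price[t] for t in frozenset().union(*choice))
        for choice in product(*witnesses.values())
    )


def greedy_price(witnesses, price):
    """Approximate price: for each output tuple in turn, buy the witness with
    the lowest extra cost given what is already bought. Runs in polynomial
    time but may overpay compared with exact_price."""
    bought = set()
    for options in witnesses.values():
        cheapest = min(options, key=lambda w: sum(price[t] for t in w - bought))
        bought |= cheapest
    return sum(price[t] for t in bought)


# Hypothetical input-tuple prices and witnesses for two output tuples.
price = {"r1": 4, "r2": 1, "r3": 2, "s1": 2}
witnesses = {
    "o1": [frozenset({"r1"}), frozenset({"r2", "s1"})],
    "o2": [frozenset({"r1"}), frozenset({"r3"})],
}

print(exact_price(witnesses, price))   # 4: buying r1 alone covers both outputs
print(greedy_price(witnesses, price))  # 5: greedy buys r2, s1 and then r3
```

In this toy instance the greedy result is an upper bound on the exact one, which is the kind of gap a comparison between exact and approximate prices would be expected to expose.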

The paper considers a relevant problem, especially for those cases where privacy can be traded for monetary incentives. However, there are a number of shortcomings that need to be addressed, as follows.

— The authors present their pricing model in the context of privacy protection, but the link to privacy protection is rather weak (the authors mention it only in the abstract and introduction). As the pricing model is generic, it makes more sense either not to start with the privacy aspect or to deepen the relation to privacy protection. For the latter in particular, the statements on lines 21-23 (trading in monetary terms) and on line 48 (fairly) must be supported with more arguments and scientific evidence.
— Section 2.3 does not discuss in enough detail the added value of the proposed pricing model compared to existing ones (e.g., the minimal why-provenance), even though in lines 323-325 the authors refer to and rely on this difference, which they claim is presented in Section 2.
— Section 3 starts by grounding the proposed pricing model in three important properties, but

(a) without naming these properties in the first paragraph and

(b) without elaborating on why these properties are necessary and sufficient for a valid pricing model.

— Example 1 is interesting; nevertheless, it can be extended with another query where <n, cn> constitutes the output tuples (right now the output tuple is just <n>). In this way, Tom and John may appear twice in the output tuples. Based on this variant, the authors can show and argue whether the corresponding input tuples (like those of John/Tom in Relation R) should be compensated twice, compared with how it is done now (once).
— At some points in the middle (perhaps line 148 and line 227), the authors introduce the term “base tuple” without defining it. As these base tuples play an important role in the proposed pricing model, it becomes difficult to follow the rest of the reasoning of the paper. Further, the concept of base tuple can be illustrated via Example 1.
— The paper suffers from a number of typos and unclear passages. These make it confusing to follow the proofs and reasoning. Some of these typos/issues are:
— — lines 100-101 unclear
— — definition 1 is not self-contained, what is \theta or other symbols in lines 113-116?
— — lines 124-125 unclear
— — lines 234-235 unclear
— — the motivation for some lemmas and theorems is mentioned after presenting them (e.g., see line 179). This way of presentation reduces readability.
— — It improves readability if the terms “input tuple” and “output tuple” are used consistently, rather than using the bare term “tuple” to refer to either of them here and there.
— — there are some typos in formulas, e.g., on lines 219 and 221, where the letters T and O are used to refer to the same items, or in formula 13, where the min should be over V(Q, I).

Based on these observations, a major revision is suggested.
