Peer-Review Record

Reasoning about Confidence in Goal Satisfaction

Algorithms 2022, 15(10), 343; https://doi.org/10.3390/a15100343
by Malak Baslyman 1, Daniel Amyot 2,* and John Mylopoulos 2
Reviewer 1:
Reviewer 3: Anonymous
Submission received: 2 June 2022 / Revised: 19 September 2022 / Accepted: 21 September 2022 / Published: 23 September 2022
(This article belongs to the Special Issue Algorithms in Decision Support Systems Vol. 2)

Round 1

Reviewer 1 Report

Short summary:
============
The paper presents a decision-making framework based on goal-oriented requirements engineering that takes into account the level of trust that the values used as input parameters of the model are correct. These parameters are estimates of the properties of leaf tasks.
To this end, the approach proposes a scale for classifying the inputs (Data Quality Tagging) and a method for deriving the trust level of calculated values (the satisfaction levels of high-level goals) from the trust level of the inputs. The paper presents a case study in which the proposal was used.

Strengths:
=======
    •    The proposal is interesting and solves a practical problem within a decision-making context
    •    The paper seems sufficiently simple to be useful in practice

Weaknesses:
==========
    •    In its current form, the paper does not read very well. In particular, the section describing the background is confusing and needs significant revision.
    •    The case study has a long description of the scenario, but almost nothing about the adopted methodology or how the case study provided evidence to support the proposal.
    •    The argument in favor of basing the propagation on the averaging of sub-nodes in the goal model is not very convincing. I would like to see an experimental evaluation of it.

Detailed review:
============

Introduction: The introduction lacks a summary of how the approach was evaluated.

Section 2: In general, I found section 2 to be very difficult to understand. For instance:
    •    Figure 2 is very confusing. The figure highlights values that are not explained in this section, leaving the reader wondering about their meaning.
    •    In l.89, the text left me wondering which part of the notation relates to functional requirements and which part relates to non-functional requirements.
    •    I wonder why Indicators were used in Figure 2 and not in Figure 3. Moreover, it is not clear why Figure 3 has two goals. Are these two different examples? What is each one illustrating?

Section 3: I miss a more thorough positioning of the paper relative to the selected papers.

Section 4: I was not convinced by the argumentation in favor of the proposed equations. I think there should be some experimental validation of the proposed formulas. For instance, my intuition is that for the AND-decomposition, the propagation of the confidence should be weighted by the impact of a sub-node on the satisfaction level.

Section 6: It is interesting that the proposal was applied in a real-world scenario in which complex decisions were required. However, for me, the description lacks experimental rigor. For instance, I miss a discussion of the evaluation methodology, i.e., the goals of the evaluation, how they were measured, what data was collected, and how the data supports (or not) the claim that the proposal is valuable. Moreover, this section is too long and not to the point.

 


Minor issues:
===========
“As other proposed approaches suffer from the same complexity and data availability concerns” which? I suggest including citations.
“… situations where available data is scarce or unavailable.” → check the phrasing
sounder → more robust
“2. Research Baseline” → I suggest using the more conventional section name “Background”
l.80 - “in order to to”
l.100 - not clear what is referenced in Figure 1.
l.93,96 - the relation between Satisfaction values and the color code is confusing.
l.95,97 - the background color behind the words looks distracting to me. I suggest removing it.
Figure 1 introduces modeling elements that are neither explained nor used until very late in the paper.
l.110 - not sure why the notation related to "initialized in a strategy" is explained at this point, since it is not present in Figure 2 and its semantics have not been presented up to this point.
l.192 - "numerical" could be removed.
l.242 - I do not see the referred trade-off. Please clarify.
l.328 - “the hospital really needs”. I suggest rephrasing.
l.335 - (other monitoring purposes) - ??
l.340 - The role of urgency in this context should be introduced before.

Author Response

Thank you for your review. Please find our answers (and answers to the other reviewers) in the file attached.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors of the manuscript propose using the Goal-oriented Requirement Language for user requirements specification. It is interesting; however, the manuscript has some drawbacks:

1. Figures 1-10 are too small and not legible.

2. Sections 4 and 5 should be extended to present the contribution of the manuscript.

3. Sections 7 and 8 should be joined.

4. There are a lot of typos. The English has to be improved.

Author Response

Thank you for your review. Please find our answers (and answers to the other reviewers) in the file attached.

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper proposes a Data Quality Tagging and Propagation Mechanism to compute the confidence level of a goal’s satisfaction. The approach is implemented within the Goal-oriented Requirement Language (GRL). A case study about tracking the position of lab samples from the Emergency Room (ER) to the lab unit is presented. The work is interesting, but the presentation and evaluation must be improved.

1) GRL is a Domain-Specific Language (DSL). However, this fact is not presented to the readers. It would be good to also mention some general references about DSLs.

2) The related work section is shallow. Several different approaches are mentioned. However, the authors have not discussed how their approach differs from related work.

3) The section on implementation (Section 5) lacks sufficient detail. Were guidelines similar to the PRISE guidelines for extending iStar used for the GRL extension?

Singh et al. 2022: Modelling human-centric aspects of end-users with iStar. Journal of Computer Languages, Volume 68, February 2022, 101091

4) The evaluation of the proposed approach is rather weak. Only one case study is presented. The approach has not been validated with controlled experiments.

5) What is the difference among references [29-31]? It seems that only one reference would be sufficient.

Author Response

Thank you for your review. Please find our answers (and answers to the other reviewers) in the file attached.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have addressed most of the issues I previously raised. However, two major issues are still not addressed in the current version:

  1. The authors have not provided, either in the paper or in the letter, arguments in favor of the correctness of the provided equations. Apparently, these were design decisions that were not explicitly validated, either formally or empirically. Also, the fact that “the confidence of the parent goal satisfaction is equal to the average of its sub-goals’ confidence values” does imply that the sub-goals have weights, albeit equal ones. This contradicts the authors’ reply in the letter to reviewers, which states that “For the AND-decomposition, there is no weighting involved”.
  2. I still find the evaluation of the proposed approach quite weak. The paragraph introduced in Section 6 does not clearly address if/how the evaluation has been reached. It still seems to me that the evaluation was actually an application, instead of an *evaluation*, of the approach. The description of the results is rather fuzzy ('positive' in the Introduction, 'good option' in the Evaluation, etc.), which makes it hard to grasp the real value behind the proposed approach. How do the authors define a “good option”, or establish that they reached “positive results”? What are the goals of the evaluation itself, and what are the metrics to know whether those goals were achieved? Is there a baseline approach against which to check that the proposed approach meets its intention of tagging data in a goal model with an accurate representation of the analysis? How can one rely on the approach to deliver reliable results for important decision-making? The presented Evaluation Section is still too weak to deliver such a fundamental message and to report accordingly throughout the paper, starting from the Introduction.
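
The equal-weights observation in the first issue can be written out explicitly. The following is only a sketch in notation assumed here (a parent goal g with n sub-goals g_1, ..., g_n and a confidence function conf); these symbols are not taken from the manuscript.

```latex
% Assumed notation: g is the parent goal, g_1..g_n its AND-decomposed sub-goals.
\[
  \mathrm{conf}(g)
  \;=\; \frac{1}{n}\sum_{i=1}^{n}\mathrm{conf}(g_i)
  \;=\; \sum_{i=1}^{n} w_i\,\mathrm{conf}(g_i),
  \qquad w_i = \frac{1}{n},
  \qquad \sum_{i=1}^{n} w_i = 1 .
\]
```

Read this way, the plain average is the special case of a weighted average in which every sub-goal receives the same weight 1/n, which is the sense in which averaging "does imply" weights.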

Author Response

Dear Reviewer 1,

The authors have addressed most of the issues I have previously raised.

Answer: Thank you!

However, two major issues are still not addressed in the current version:

  1. The authors have not provided, either in the paper or in the letter, arguments in favor of the correctness of the provided equations. Apparently, these were design decisions that were not explicitly validated, either formally or empirically. Also, the fact that “the confidence of the parent goal satisfaction is equal to the average of its sub-goals’ confidence values” does imply that the sub-goals have weights, albeit equal ones. This contradicts the authors’ reply in the letter to reviewers, which states that “For the AND-decomposition, there is no weighting involved”.

 

Answer:

In the second part of the comment, the reviewer is right in pointing out these implicit (equal) weights, as we are using an average function. In our letter, we meant that there is no explicit weighting on AND-decomposition links. As for the first part of the comment, we have now included a discussion of alternative functions that could have been used instead of the average. This is now made clearer in Section 4.2, before equation (5) for the AND-decomposition, where we added:

“As there are no explicit weights associated with AND-decomposition links, we assume here equal weights by using an average function. Note also that the average is preferable here to the maximum and minimum functions as the maximum of the confidence values is too optimistic in an AND-decomposition context whereas the minimum is too pessimistic, especially as the actual confidence level of the propagated satisfaction value might be ignored in both these alternative functions. Additionally, only propagating the confidence level of the selected (and lowest) satisfaction values is deemed insufficient here as this solution would ignore the confidence levels of the other considered nodes, which might be much lower or higher.”

For the XOR/OR-decomposition, we also added:

“This is different from the AND-decomposition, where the average of all confidence values was considered, because the XOR/OR-decomposition is an optimistic decomposition operator by nature, and is concerned with selecting some options while ignoring others.”

For contributions, we refined an existing sentence:

“This is a mechanism similar to the propagation of satisfaction values in GRL, where all the contributions and their weights are considered and truncated whenever necessary.”

Similarly, for dependencies, we refined another existing sentence:

“This is a conservative propagation of confidence, and this choice is again dictated by the nature of the dependency link in GRL, which aims to identify locations for conservative evaluations in GRL models.”
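
The argument quoted above for preferring the average in an AND-decomposition can be illustrated with a small sketch. It assumes a hypothetical 0 to 100 confidence scale and made-up child values; the function name and the numbers are illustrative only and are not taken from the paper or from any GRL tool.

```python
from statistics import mean


def and_confidence_candidates(child_confidences):
    """Compare candidate parent-confidence functions for an AND-decomposition.

    `child_confidences` holds the confidence values of the sub-goals on a
    hypothetical 0-100 scale (an assumption made for this sketch).
    """
    if not child_confidences:
        raise ValueError("an AND-decomposition needs at least one sub-goal")
    return {
        # Equal-weight average: the function the authors say they use.
        "average": mean(child_confidences),
        # Alternatives the authors describe as too pessimistic / too optimistic.
        "minimum": min(child_confidences),
        "maximum": max(child_confidences),
    }


# Hypothetical sub-goal confidence values.
children = [90, 60, 30]
result = and_confidence_candidates(children)
print(result)
```

With these values the average lies between the two extremes, while the minimum and the maximum each report the confidence of a single sub-goal and ignore the other two.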

 

  2. I still find the evaluation of the proposed approach quite weak. The paragraph introduced in Section 6 does not clearly address if/how the evaluation has been reached. It still seems to me that the evaluation was actually an application, instead of an *evaluation*, of the approach. The description of the results is rather fuzzy ('positive' in the Introduction, 'good option' in the Evaluation, etc.), which makes it hard to grasp the real value behind the proposed approach. How do the authors define a “good option”, or establish that they reached “positive results”? What are the goals of the evaluation itself, and what are the metrics to know whether those goals were achieved? Is there a baseline approach against which to check that the proposed approach meets its intention of tagging data in a goal model with an accurate representation of the analysis? How can one rely on the approach to deliver reliable results for important decision-making? The presented Evaluation Section is still too weak to deliver such a fundamental message and to report accordingly throughout the paper, starting from the Introduction.

Answer:

Thanks for suggesting that the main messages be carried through from the abstract/introduction to the conclusion (which we think we are now achieving better). In this revised version, we avoided the term “evaluation” when talking about the case study, which is indeed an application of the approach. We improved the text in the following way to address the above questions.

1) In the introduction, we removed a sentence about evaluation and added this paragraph, which refines the initial “positive results”:

“The proposed approach was applied to a real-world case study in the healthcare domain and informally assessed for feasibility and usefulness, with positive results related to supporting decision making among alternatives and identifying areas where higher-quality data would be needed to sufficiently increase relevant levels of confidence.”

2) The first paragraph of the case study was augmented with:

“This case study enables applying the mechanism to a real-life situation, in situ, and collect evidence of usefulness regarding support for organizational decision-making and the detection of low levels of confidence that could help identify needs for higher-quality data that feed indicators.”

3) In the Discussion section, we reformatted our limitations and threats as an enumerated list, and expanded on several of them. This should help answer some of the above concerns. The last four items are particularly new:

“3. The functions used to propagate confidence levels (Section 4.2), although they did not generate complaints from the case study participants, are currently only justified through arguments. They could be further validated empirically, especially against alternative propagation functions.

4. The illustrative example may not reflect the complexity of other real-world cases and contexts, especially outside the healthcare domain. Additional cases studies in other domains would help raise our confidence in the suitability and generalizability of the mechanism proposed here. More formal experiments could also provide more reliable empirical evidence, for instance by comparing the outputs of the uncertainty reasoning proposal with those of domain experts, or by studying the scalability and usability of the reasoning as goal models get larger.

5. The approach was currently implemented for GRL models. Although we do not see major issues in porting it to other goal-oriented modeling languages, whether specific semantics of these languages will require major adaptations to some of the confidence propagation functions remains a research topic.

6. As the creators of the Data Quality Tagging and Propagation Mechanism also led the development and analysis of the case study, real or perceived biases represent another potential threat to the internal validity of our work. One potential mitigation would be to have people other than us lead experiments and case studies about the usefulness of the approach.”

 

4) The first paragraph of the conclusion was augmented with:

“… who perceived value in this approach. Not only does it provide confidence levels for all intentional elements of a (GRL) goal model, which can influence decision making, but it also helps identify locations where higher-quality data would help increase confidence in the goal-oriented analysis.”

5) We also made many smaller changes, highlighted in blue in the PDF file (especially in sections 5 to 7), to clarify a few points and better qualify our claims.

 

Thank you for this opportunity to further improve our paper.

Reviewer 3 Report

My comments have been addressed, more or less, and the paper can be accepted now.

Author Response

Thank you very much for your support and your previous comments!

Round 3

Reviewer 1 Report

Although quite succinctly, the authors have addressed my concerns. 
