Article
Peer-Review Record

Integrating Sustainability Metrics into Project and Portfolio Performance Assessment in Agile Software Development: A Data-Driven Scoring Model

Sustainability 2023, 15(17), 13139; https://doi.org/10.3390/su151713139
by Cristian Fagarasan 1,*, Ciprian Cristea 2, Maria Cristea 2, Ovidiu Popa 1 and Adrian Pisla 1,*
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 28 July 2023 / Revised: 28 August 2023 / Accepted: 30 August 2023 / Published: 31 August 2023

Round 1

Reviewer 1 Report

1. The abstract is well presented but is missing the last section of a standard abstract, the results. The abstract needs to be rewritten to include the following: an introduction statement, the problem statement, the main objective of the research, the methodology used to achieve the objective, the results, and the conclusions.

2. The methodology section needs to be restructured to show the detailed steps of the methodology (it would be good to present the methodology with a flowchart).

3. The conclusion section needs to be rewritten (it would be good to present the conclusions in steps).

4. The research paper needs proofreading.
Author Response

Dear Editorial Office and Reviewers, enclosed you can find the revision of the manuscript No. sustainability-2556405

Integrating Sustainability Metrics into Project and Portfolio Performance Assessment in Agile Software Development: A Data-Driven Scoring Model, submitted for publication in the journal Sustainability.

We appreciate the feedback provided and the insights from the reviewers regarding our manuscript. The comments received were invaluable in guiding the revisions to our paper. Each observation was meticulously considered, and changes were made in hopes of addressing the concerns raised. The manuscript has been updated based on the recommendations and specific suggestions from the reviewers.

Below are the corrections made to the paper and our responses to the reviewer's comments.

 

Answers to reviewer’s comments

  1. The abstract is well presented but is missing the last section of a standard abstract, the results. The abstract needs to be rewritten to include the following: an introduction statement, the problem statement, the main objective of the research, the methodology used to achieve the objective, the results, and the conclusions.

Reply: Thank you for the constructive feedback on our manuscript's abstract. We appreciate your guidance, which has provided us with an enhanced perspective on structuring the abstract to encapsulate all critical elements. We've taken your feedback to heart and have reformulated sections of our abstract to incorporate an introductory statement, a problem statement, the research's primary objective, the employed methodology, results, and our conclusions. We believe this revised version provides a comprehensive overview of our study while adhering to the conventional structure of an abstract.

We trust this modification addresses your concerns, and we are hopeful that the refined abstract conveys the essence of our research more effectively.

  2. The methodology section needs to be restructured to show the detailed steps of the methodology (it would be good to present the methodology with a flowchart).

Reply: Thank you for pointing out the potential enhancement in the methodology section. We acknowledge the importance of clearly illustrating the steps of our methodology for improved comprehension. In response to your feedback, we have adjusted the methodology section to delineate the steps in greater detail. Additionally, to provide a visual representation of the process, a flowchart has been incorporated. This should offer readers a straightforward understanding of the methodology's progression and components. We trust that these revisions will address your concerns and enhance the clarity of our research approach.

Further enhancements that were applied:

Research Questions Articulation: At the beginning of Section 2, we have explicitly stated the research questions guiding our study. This provides readers with a clearer understanding of the objectives and intended outcomes of our investigation.

Detailing the Methodological Approach: We've made sure to flesh out the specifics of our overall research methodology. This encompasses both our rationale for the chosen approach and a step-by-step account of how we went about our investigation, ensuring that readers can follow our process with ease.

Incorporation of the Goal-Question-Metric (GQM) Method: We've structured our methodological approach around the GQM paradigm. We elaborated on our primary aim of enhancing project and portfolio performance assessment by seamlessly embedding sustainability metrics in agile software environments. The questions were derived from our research goal and provide a structured framework for our inquiry. The section now includes details about the main questions answered in the paper and how the defined metrics align with our goals and questions.

  3. The conclusion section needs to be rewritten (it would be good to present the conclusions in steps).

Reply: Thank you for the valuable feedback on the conclusion section. We recognize the importance of presenting our findings in a succinct and structured manner. We have taken your suggestion and restructured the conclusions, presenting them stepwise for better clarity and comprehension.

  4. The research paper needs proofreading.

Reply: Thank you for highlighting the importance of ensuring the manuscript is clear and error-free. We appreciate the diligence in assessing the clarity and coherence of our work. The manuscript was thoroughly proofread to address any grammatical, syntactic, and structural issues.

Thank you again for your helpful comments on our manuscript. We look forward to your positive response.

Reviewer 2 Report

This is a well-founded and well-researched piece of work that deserves merit, as it provides a sensible scoring model with potential for broader application. The engineering field is not the only field that could benefit from this model, and this adds further weight to the work. It would be useful to consider giving applied examples of how the model can be used in a medium-sized business where management is not familiar with sustainability considerations or where this is their first foray into the field.

Author Response

Dear Editorial Office and Reviewers, enclosed you can find the revision of the manuscript No. sustainability-2556405

Integrating Sustainability Metrics into Project and Portfolio Performance Assessment in Agile Software Development: A Data-Driven Scoring Model, submitted for publication in the journal Sustainability.

We appreciate the feedback and the reviewers' insights regarding our manuscript. The comments received were invaluable in guiding the revisions to our paper. Each observation was meticulously considered, and changes were made to address the concerns raised. The manuscript has been updated based on the recommendations and specific suggestions from the reviewers.

Below are the corrections made to the paper and our responses to the reviewer's comments.

 

Answers to reviewer’s comments

  1. This is a well-founded and well-researched piece of work that deserves merit, as it provides a sensible scoring model with potential for broader application. The engineering field is not the only field that could benefit from this model, and this adds further weight to the work. It would be useful to consider giving applied examples of how the model can be used in a medium-sized business where management is not familiar with sustainability considerations or where this is their first foray into the field.

Reply: Firstly, we'd like to express our gratitude for your positive appraisal of our work and the feedback you've provided. We are glad to note that you recognize the broader potential of our scoring model beyond just the engineering field.

You've rightly pointed out that while our research might cater more to a specialized audience, its applicability could be enhanced by offering examples in a context relatable to medium-sized businesses, especially those unfamiliar with sustainability considerations.

We understand the importance of presenting our model as approachable and actionable for businesses of varying sizes and backgrounds. We have discussed among the co-authors how to expand on this idea and plan to incorporate your suggestion in a future article, showcasing real-life scenarios where our model can be implemented in medium-sized enterprises beyond engineering. This will provide readers with a step-by-step guide to applying the model, taking into account the unique challenges and advantages that come with businesses of that scale. The learning curve can be quite steep for companies taking their initial steps into sustainability. We intend to provide a simplified overview of the scoring model to cater to this demographic, highlighting the fundamental principles and benefits. This would serve as a primer for any business or individual new to sustainability considerations.

Your feedback has indeed been valuable in pointing us toward refining our approach to make the model more accessible and applicable to a diverse set of readers. We appreciate the insight and will ensure to incorporate these considerations into our future presentations and publications.

Reviewer 3 Report

The submitted manuscript is thorough and certainly justifies the authors’ research findings regarding ‘Integrating Sustainability Metrics into Project and Portfolio Performance Assessment in Agile Software Development’, in the context of a data-driven scoring model analysed through statistical examination of sustainability factors via agile methodology in the software/IT industry. However, this manuscript requires clarification to precisely address whether this study concerns strategic portfolio management of interrelated projects within a programme, or a diverse mix of projects and programmes under a portfolio in the chosen industry, with respect to sustainability.

The study presents an interesting statistical analysis of project time lapses when calculating project performance, specifically planning, executing, and completing projects on time for portfolio success. However, this needs some justification, supported by a selected case study, relating portfolio strategy to organisational sustainability beyond time management; this would be advantageous for identifying the key factors through which sustainable industrial developments are promoted and regulated in the IT sector. Nevertheless, this is an interesting manuscript, and I approve of its publication with minor revision.

The manuscript is well written, though it may require a quick proofread to avoid any typos and/or grammatical errors.

Author Response

Dear Editorial Office and Reviewers, enclosed you can find the revision of manuscript No. sustainability-2556405

Integrating Sustainability Metrics into Project and Portfolio Performance Assessment in Agile Software Development: A Data-Driven Scoring Model, submitted for publication in the journal Sustainability.

We appreciate the feedback and the reviewers' insights regarding our manuscript. The comments received were invaluable in guiding the revisions to our paper. Each observation was meticulously considered, and changes were made to address the concerns raised. The manuscript has been updated based on the recommendations and specific suggestions from the reviewers.

Below are the corrections made to the paper and our responses to the reviewer's comments.

Answers to reviewer’s comments

  1. The submitted manuscript is thorough and certainly justifies the authors’ research findings regarding ‘Integrating Sustainability Metrics into Project and Portfolio Performance Assessment in Agile Software Development’, in the context of a data-driven scoring model analysed through statistical examination of sustainability factors via agile methodology in the software/IT industry. However, this manuscript requires clarification to precisely address whether this study concerns strategic portfolio management of interrelated projects within a programme, or a diverse mix of projects and programmes under a portfolio in the chosen industry, with respect to sustainability.

Reply: Thank you for your insightful feedback and for recognizing the depth of our manuscript on 'Integrating Sustainability Metrics into Project and Portfolio Performance Assessment in Agile Software Development'. We understand your concern about the precise context of our study in terms of its focus on strategic portfolio management. A short paragraph was introduced in Section 2 to add more details about how a portfolio is defined in this specific context.

Our study primarily focuses on integrating sustainability metrics within the broader context of portfolio performance assessment in agile software development. The objective was to create a framework that can be universally applied, regardless of whether the portfolio consists of interrelated projects within a specific program or a diverse mix of projects and programs. We appreciate your observation, as it indicates the breadth of applicability of our proposed metrics.

However, for the purpose of this manuscript and to maintain a clear scope, our analysis was primarily centered on projects that are not necessarily related but are managed under the same portfolio. These can be, for example, all the software projects initiated for a specific company function (Sales, Marketing, etc.) or all the software projects initiated in a company. This choice was made to provide tangible, actionable insights for organizations within the software/IT industry.

That said, the inherent design of our metrics and the proposed framework is adaptable and can be applied to both contexts, whether it is for strategic portfolio management encompassing interrelated projects or for a broader range of diverse projects and programs.

  2. The study presents an interesting statistical analysis of project time lapses when calculating project performance, specifically planning, executing, and completing projects on time for portfolio success. However, this needs some justification, supported by a selected case study, relating portfolio strategy to organisational sustainability beyond time management; this would be advantageous for identifying the key factors through which sustainable industrial developments are promoted and regulated in the IT sector. Nevertheless, this is an interesting manuscript, and I approve of its publication with minor revision.

Reply: In our research, we emphasized time management as it is often a primary and tangible metric that organizations, especially in the IT sector, consider crucial for project and portfolio success. While we understand the importance of other factors related to organizational sustainability, we intended to shed light on this often-overlooked aspect and its profound impact on portfolio performance, key results, and company objectives.

We appreciate your recommendation for publication with minor revision, and your insights will be invaluable as we progress in our research journey. Thank you again for your helpful comments on our manuscript. We look forward to your positive response.

Reviewer 4 Report

Dear authors,

While the paper covers a novel, interesting and relevant topic, I have several major concerns with how the research was conducted and reported:

• Paper structure – Section 2, Materials and Methods, should describe the methodological aspects of your paper, including the research questions, the overall research methodology, and the individual research methods that you used to devise the new metric. Instead, in your paper, this section lacks some of these important methodological aspects. The paper would benefit from using an established method for devising software metrics (perhaps the Goal-Question-Metric method).

• Heavy reliance on estimations – Your calculation model and the results rely heavily on multiple different parameters and weights that have to be estimated. In order to apply your calculation model, we need to estimate (or in some other way calculate) the steepness parameter, the weights for the DH, LT, and PT scores, the KPI weights, the min and max values for the code metrics, and the weights for PDP and SS. Now, I do understand that you wanted to make your calculation model customizable. However, I am concerned about the number of parameters that need to be estimated correctly in order to get representative results. For example, how did you come up with the values in Table 2 and Table 3?

• Choice of code metrics – Overall, this is in my opinion the biggest issue in the paper.

  • In order to calculate the Sustainability score, you chose 6 code metrics which you stated have “the most positive impact on sustainability” (lines 347-348). The problem here is that only Test coverage has the potential to affect sustainability in a positive manner. The other chosen metrics might be negatively correlated (your formulas, however, correctly acknowledge this).

  • I was unable to find where in the cited paper (line 348) the chosen code metrics are mentioned, or where it was reported that they have the most positive impact on sustainability. If we do not have evidence that the 6 chosen code metrics are the right metrics to model the sustainability metric, then how can we be sure of the correctness of the sustainability metric?

  • The 6 chosen code metrics significantly overlap. For example, code duplication is literally one of the code smells reported in Fowler’s book on refactoring. Technical debt is the estimated time required to fix code quality issues, including code smells. Therefore, if a project has code duplication, your calculation model would penalize that duplication at least 3 times, i.e., through the Code duplication metric, the Code smells metric, and the Technical debt metric.

• Case study

  • Some of the normalized values in Table 9 are negative, which is in conflict with what you prescribed in your calculation model (line 365), and again raises the question of estimating the min and max values for the code metrics.

  • When trying to replicate the results from your case study, I was unable to get the same results as in Table 10. I might have missed something, but please check whether your calculations are correct.

  • The heading “3.4. Calculating the SS scores for the projects under review” appears two times in the paper (lines 499 and 534).

• No evaluation of metrics – While the reported case study shows an example of applying the metrics in the context of 7 projects, there is no real evaluation of the proposed metrics. How do we know that, for example, the results in Table 10 make sense at all and accurately describe/predict sustainability?

Some of the concerns I raised, such as the paper structure, typos, and possible errors in calculations, can be easily addressed. Evaluating your metrics will require more effort, but in my opinion this should be done when proposing new metrics, especially when submitting to Q1/Q2 journals. The largest challenge you will have is resolving the issues related to the choice of code metrics. I am not sure whether this can be done post hoc.

Best regards!

Author Response

Dear Editorial Office and Reviewers, enclosed you can find the revision of manuscript No. sustainability-2556405

Integrating Sustainability Metrics into Project and Portfolio Performance Assessment in Agile Software Development: A Data-Driven Scoring Model, submitted for publication in the journal Sustainability.

We appreciate the feedback and the reviewers' insights regarding our manuscript. The comments received were invaluable in guiding the revisions to our paper. Each observation was meticulously considered, and changes were made to address the concerns raised. The manuscript has been updated based on the recommendations and specific suggestions from the reviewers.

Below are the corrections made to the paper and our responses to the reviewer's comments.

Answers to reviewer’s comments

  1. Paper structure – Section 2, Materials and Methods, should describe the methodological aspects of your paper, including the research questions, the overall research methodology, and the individual research methods that you used to devise the new metric. Instead, in your paper, this section lacks some of these important methodological aspects. The paper would benefit from using an established method for devising software metrics (perhaps the Goal-Question-Metric method).

 

Reply: We deeply appreciate the feedback on the structure and content of Section 2. In line with the recommendations, the following changes and additions were made to enhance the clarity and rigor of our methodological exposition.

Research Questions Articulation: At the beginning of Section 2, we have now explicitly stated the research questions guiding our study. This provides readers with a clearer understanding of our investigation's objectives and intended outcomes.

Detailing the Methodological Approach: We've made sure to flesh out the specifics of our overall research methodology. This encompasses our rationale for the chosen approach and a step-by-step account of how we went about our investigation, ensuring readers can easily follow our process.

Incorporation of the Goal-Question-Metric (GQM) Method: Considering your advice, we've structured our methodological approach around the GQM paradigm. We elaborated on our primary aim of enhancing project and portfolio performance assessment by seamlessly embedding sustainability metrics in agile software environments. The questions were derived from our research goal and provide a structured framework for our inquiry. The section now includes details about the main questions answered in the paper and how the defined metrics align with our goals and questions.
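For illustration, a GQM hierarchy of the kind described above can be represented schematically as follows. This is a sketch only: the metric names come from the manuscript and the review, but the goal and question wordings are illustrative placeholders, not the exact text used in the paper.

```python
# Illustrative GQM (Goal-Question-Metric) structure.
# ASSUMPTION: the goal and question wordings below are placeholders,
# not the exact formulations from the manuscript.
gqm = {
    "goal": "Enhance project and portfolio performance assessment by "
            "embedding sustainability metrics in agile software environments",
    "questions": {
        "Q1: How predictably do projects plan and deliver?": [
            "delivery health (DH)", "lead time (LT)", "planned time (PT)",
        ],
        "Q2: How sustainable is the produced codebase?": [
            "test coverage", "code smells", "technical debt",
            "code complexity", "code duplication", "security vulnerabilities",
        ],
    },
}

# Walk the hierarchy: each question is answered by its associated metrics.
for question, metrics in gqm["questions"].items():
    print(question, "->", ", ".join(metrics))
```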

 

  2. Heavy reliance on estimations – Your calculation model and the results rely heavily on multiple different parameters and weights that have to be estimated. In order to apply your calculation model, we need to estimate (or in some other way calculate) the steepness parameter, the weights for the DH, LT, and PT scores, the KPI weights, the min and max values for the code metrics, and the weights for PDP and SS. Now, I do understand that you wanted to make your calculation model customizable. However, I am concerned about the number of parameters that need to be estimated correctly in order to get representative results. For example, how did you come up with the values in Table 2 and Table 3?

 

Reply: Thank you for sharing your thoughts and concerns regarding the parameters presented in the proposed model. We appreciate the opportunity to clarify.

Customizability and Estimations: The intention behind introducing multiple parameters was to provide a customizable model, recognizing the variability across projects and organizational contexts. We understand that the sheer number of customizable parameters may raise concerns about the representativeness and reliability of the results. To this end, we aimed for a balance where a generalized approach could be tailored to unique project circumstances.

Weighting Values in Tables 2 and 3: The weighting in Table 2 was derived from a combination of a literature review, expert consultations with the technical teams involved in the projects where the thresholds were implemented, and evidence from historical data on similar projects. A paragraph was added in the section to explain each metric and its relative impact (High, Medium, Low) on sustainability in more depth. The values in Table 3, specifically the maximum values, were general assumptions based on our observations of medium to large projects. We also consulted several industry experts to validate these assumptions. As you rightly pointed out, these values should be adjusted according to the specific context of organizations and industry standards. Even SonarQube does not provide specific industry guidelines for the maximum values of these metrics. Instead, it provides a flexible framework for teams to define their own quality gates and thresholds based on their specific needs and context. Reference: SonarQube Documentation.
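To make the role of these weights concrete, here is a minimal sketch of the kind of weighted aggregation the scoring model performs. The metric names are those discussed in the paper, but the High/Medium/Low weight values and the example scores are illustrative assumptions, not the actual entries of Table 2 or Table 3.

```python
# Minimal sketch of a weighted sustainability aggregation.
# ASSUMPTION: weights and scores below are illustrative only; they are
# NOT the actual values of Table 2 or Table 3 in the manuscript.
weights = {  # hypothetical relative impact: High=3, Medium=2, Low=1
    "test_coverage": 3, "technical_debt": 3, "code_smells": 2,
    "code_complexity": 2, "code_duplication": 1, "security_vulnerabilities": 3,
}
total = sum(weights.values())
weights = {k: v / total for k, v in weights.items()}  # normalize to sum to 1

normalized_scores = {  # each metric already normalized to [0, 1]
    "test_coverage": 0.80, "technical_debt": 0.55, "code_smells": 0.60,
    "code_complexity": 0.70, "code_duplication": 0.90,
    "security_vulnerabilities": 0.65,
}

# Sustainability score is the weight-normalized sum of the metric scores.
ss = sum(weights[m] * normalized_scores[m] for m in weights)
print(f"Sustainability score (SS): {ss:.3f}")
```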

Addressing the Estimation Concerns: We acknowledge that relying heavily on estimations can introduce variability. To address this concern, we will be gathering more empirical data and collaborating with industry partners to refine our weights and thresholds. Moreover, we're considering the introduction of adaptive algorithms to adjust parameters based on real-time feedback, thus reducing the need for manual estimations.

In summary, while we aimed to balance customizability and precision, we recognize the pitfalls of over-reliance on estimations. We genuinely appreciate your feedback, as it will guide our efforts in further refining our model to strike a better balance and ensure its widespread applicability and reliability in future research. Thank you again for your invaluable insights.

 

  3. Choice of code metrics – Overall, this is in my opinion the biggest issue in the paper.

 

  • In order to calculate the Sustainability score, you chose 6 code metrics which you stated have “the most positive impact on sustainability” (lines 347-348). The problem here is that only Test coverage has the potential to affect sustainability in a positive manner. The other chosen metrics might be negatively correlated (your formulas, however, correctly acknowledge this).

 

Reply: Thank you for your insightful feedback. We acknowledge your concerns regarding the selection of code metrics and their impact on sustainability. Here's our response to your observations:

  • Correlation with Sustainability: We appreciate your observation that, among the metrics we chose, only Test coverage has an overtly positive effect on sustainability. We intended to select metrics that, when improved, can enhance the sustainability of a software project. While some metrics, like Test coverage, have a direct positive correlation, others, like Code smells or Technical debt, are negatively correlated. We chose to include them because addressing these negative aspects can produce a more sustainable codebase in the long term. Decision: we removed the word “positive”, as it introduced more confusion than clarity.

 

  • I was unable to find where in the cited paper (line 348) the chosen code metrics are mentioned, or where it was reported that they have the most positive impact on sustainability. If we do not have evidence that the 6 chosen code metrics are the right metrics to model the sustainability metric, then how can we be sure of the correctness of the sustainability metric?

 

Reply: Upon careful examination of the mentioned metrics in the cited paper, here are our conclusions:

  • Code Smells: This is directly mentioned in the cited paper, which states that refactoring code smells reduces energy consumption. Hence, this metric has a clear correlation with sustainability.
  • Technical Debt: SonarQube's evaluation of technical debt is highlighted as an indicator of quality attributes, thus establishing its importance and relevance to our research.
  • Code Complexity: While "Code Complexity" is not mentioned directly, related terminology such as "cyclomatic complexity" is present in the text. This metric is closely associated with code complexity and serves as an implicit reference.
  • For the metrics not directly present:
    • Code Duplication: No specific mention was found in the text, and we acknowledge this omission. However, this metric is closely related to Code smells and Technical Debt, as we will underline in the next point.
    • Test Coverage: The provided text does outline metrics related to the testing phase. Although the "Total number of test cases," "Number of failed test cases," and "Number of passed test cases" do not translate directly to test coverage, they offer a perspective on the testing effort. While not a direct measure, a comprehensive testing suite can imply good test coverage indirectly.
    • Security Vulnerabilities: The text references SonarQube, renowned for evaluating various quality attributes, including "security". Even though the specifics of "security vulnerabilities" are not detailed in the excerpt, the broader notion of "security", as evaluated by SonarQube, provides an indirect association.
  • To address your main concern: while some metrics are not directly referenced, their presence is intimated through closely associated or related metrics. The nuanced and interconnected nature of these metrics highlights their collective role in gauging software sustainability, and this web of direct and indirect metrics presents a holistic view of sustainability.
  • We understand the importance of transparency and will endeavor to make these connections more explicit in our future iterations. Your feedback has been instrumental, and we truly appreciate it.

 

  • The 6 chosen code metrics significantly overlap. For example, code duplication is literally one of the code smells reported in Fowler’s book on refactoring. Technical debt is the estimated time required to fix code quality issues, including code smells. Therefore, if a project has code duplication, your calculation model would penalize that duplication at least 3 times, i.e., through the Code duplication metric, the Code smells metric, and the Technical debt metric.

 

Reply: While we understand the concern raised regarding the overlaps among Code Duplication, Code Smells, and Technical Debt, it is worth noting that SonarQube categorizes these as separate metrics, each with a unique dimension and purpose.

  • SonarQube, a widely recognized tool in the industry, distinguishes between these metrics for granularity and to offer software developers more detailed insight into their code's health. Here is why:
    • Code Smells: While "code duplication" is a type of code smell, the umbrella term "code smells" covers a vast array of potential issues, not limited to duplication. It encapsulates other non-optimal coding practices that might not immediately cause defects but signal deeper problems in the code.
    • Code Duplication: Even though it is a subset of code smells, duplication is a significant enough concern to merit its own metric. Duplication can lead to larger codebases, making them harder to maintain, and can introduce bugs if updates are made in one location but missed in duplicated code.
    • Technical Debt: This metric is more about the "cost" of fixing the issues present in the code. It quantifies the effort required to fix all maintainability issues, including code smells and duplication. But it also encapsulates broader concerns, such as architectural decisions that might make future changes more challenging.
  • Conclusion: While there is overlap, each metric provides specific insights. From a sustainability standpoint, it is valid to argue that addressing these metrics can lead to a more maintainable (and thus more sustainable) software product. Given your feedback, we will be more judicious in ensuring that future versions of the proposed model do not inadvertently penalize projects multiple times for interconnected issues; one possible adjustment is sketched below.
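As one illustration of the kind of adjustment we have in mind (hypothetical counts and field names; this correction is not part of the current model), duplication-related findings could be removed from the code-smells count before aggregation, so the same issue is not scored more than once:

```python
# Hypothetical de-overlap adjustment (NOT part of the current model):
# subtract duplication-flagged findings from the code-smells count so a
# duplicated block is not penalized again under the code-smells metric.
raw = {
    "code_smells": 120,                # total smells reported by the analyzer
    "duplication_flagged_smells": 35,  # smells that are duplication findings
}

adjusted_code_smells = raw["code_smells"] - raw["duplication_flagged_smells"]
print("code smells counted after de-overlap:", adjusted_code_smells)  # 85
```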
  4. Case study – Some of the normalized values in Table 9 are negative, which is in conflict with what you prescribed in your calculation model (line 365), and again raises the question of estimating the min and max values for the code metrics.

 

Reply: Thank you for your insightful feedback regarding the normalized values in Table 9. We acknowledge the discrepancy: some of the normalized values fall outside the prescribed range of [0, 1] because certain KPIs exceed the initially defined maximum values.

Interpretation of Negative Values: A negative value indicates that the metric's actual KPI value exceeded our initially defined "worst-case" scenario, i.e., the maximum threshold. With our current normalization formula, this results in values less than 0. The presence of such values suggests that, for certain projects, those metrics are worse than our defined worst-case scenario. We clarified this in the paper so that it is clear to our readers; thank you for this remark, as it helps us explain this scenario better.
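A short sketch illustrates the mechanics. This uses the standard min-max normalization scheme for a "lower is better" metric; the specific min and max bounds are illustrative assumptions, not the thresholds from Table 3.

```python
# Min-max normalization for a "lower is better" metric; values beyond the
# assumed [min, max] band produce normalized scores outside [0, 1].
def normalize_lower_is_better(value: float, min_val: float, max_val: float) -> float:
    # 1.0 at the best case (value == min_val), 0.0 at the assumed worst case.
    return (max_val - value) / (max_val - min_val)

print(normalize_lower_is_better(10, 0, 100))   # 0.9  -> within [0, 1]
print(normalize_lower_is_better(120, 0, 100))  # -0.2 -> value exceeds the
                                               # assumed worst case, hence negative
```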

 

  • When trying to replicate results from your case study, I was unable to get the same results as in Table 10. I might have missed something, but please check if your calculations are correct.

 

Reply: Thank you for taking the time to go through the formula and case study calculations. We appreciate the time put into this to guarantee the quality and correctness of the proposed mathematical model.

Actions taken: all formulas used in the model were carefully reviewed and validated, and no errors were discovered in the formulas themselves. Moreover, all the calculations were redone to ensure the correctness of the results, and small revisions were made to the scores where computational errors were identified. We have shared the calculations done for the case studies here. Also, adjustments were made to the text to clarify that when CD = 0, DH will always be 1.
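To illustrate the boundary property clarified above (DH = 1 whenever CD = 0), the sketch below uses a simple exponential-decay shape with a steepness parameter k. This shape is an assumption chosen only to demonstrate the property and the role of the steepness parameter; it is not the exact formula from the manuscript.

```python
import math

# Illustrative delivery-health (DH) curve with steepness parameter k.
# ASSUMPTION: exponential decay is used here only to demonstrate the
# boundary property DH(0) = 1; it is NOT the manuscript's exact formula.
def dh_score(cd: float, k: float = 0.5) -> float:
    return math.exp(-k * cd)

print(dh_score(0))  # 1.0 -> when CD = 0, DH is always 1
print(dh_score(4))  # decays toward 0 as CD grows; k controls the steepness
```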

  • The heading “3.4. Calculating the SS scores for the projects under review” appears two times in the paper (lines 499 and 534).

Reply: Thank you for highlighting the duplication of heading 3.4; this was missed during our internal review process. The duplicate heading has now been removed.

 

  5. No evaluation of metrics – While the reported case study shows an example of applying the metrics in the context of 7 projects, there is no real evaluation of the proposed metrics. How do we know that, for example, the results in Table 10 make sense at all and accurately describe/predict sustainability?

 

Reply: Thank you for the insightful feedback regarding the evaluation of our metrics. We recognize the importance of validating metrics, especially when they serve as indicators for abstract constructs such as software sustainability. Here are our approach and the future directions we will take to address your concerns:

Methodological Choice: Our choice of KPIs and their respective weights was grounded in theory, prior literature, and expert consultation. While we understand that this approach has limitations, we believe it offers a starting point for gauging software sustainability.

Ground Truthing: We acknowledge the need to correlate our SS scores with real-world indicators of software sustainability. In our future work, we plan to compare our metrics with established indicators like project longevity, user base stability, and maintenance efforts to better validate our scores.

External Expert Evaluation: To further strengthen the validity of our SS scores, we are looking to organize expert panels. These panels, consisting of industry and academic professionals, will be tasked with independently evaluating software sustainability. We aim to see how their evaluations align with our scores, providing an external validation point.

Iterative Process: We view our metric system as iterative, and feedback like yours helps refine it. In the future, we will run sensitivity analyses to understand the influence of individual KPIs and tweak our system based on the insights gathered.

Future Collaboration: Recognizing the importance of diverse viewpoints in metric validation, we are open to collaborations. We invite scholars and professionals to apply our metrics in different settings and provide feedback, which can help refine our metric system.

In conclusion, while our current study is a preliminary exploration into quantifying software sustainability, we are fully committed to refining and validating our metrics in subsequent studies. We appreciate your constructive feedback, which points us in the right direction for future work.

 

  6. Some of the concerns I raised, such as the paper structure, typos, and possible errors in calculations, can be easily addressed. Evaluating your metrics will require more effort, but in my opinion this should be done when proposing new metrics, especially when submitting to Q1/Q2 journals. The largest challenge you will have is resolving the issues related to the choice of code metrics. I am not sure whether this can be done post hoc.

 

Reply: We believe these revisions make the article considerably more robust, comprehensive, and aligned with academic standards. Your feedback was invaluable in driving these improvements, and we are grateful for the insights.

Thank you again for your helpful comments on our manuscript. We look forward to your positive response.

 

Round 2

Reviewer 4 Report

Dear authors,

Thank you for taking my suggestions and requests into consideration, as well as for thorough elaboration of introduced changes. I read your responses thoroughly, and my comments are as follows:

Paper structure – I agree with introduced changes.

Heavy reliance on estimations – Thank you for the clarification. As I noted, I understand the intent, but I think it should be made clear to readers that such an approach could have some issues.

Choice of code metrics

a. The removal of the word “positive” now makes the role of the individual metrics clearer. Thank you.

b. After reading your response related to the selection of metrics, I still do not think that you provide evidence for why precisely these 6 metrics are “the” metrics for evaluating sustainability. From your response it is clear that even the paper you cited does not do that; rather, the choice of these metrics is your interpretation. Also, with regard to the overlap between metrics, I understand SonarQube’s intention to provide a more detailed look into some specific aspects of technical debt (such as code smells), or some specific code smells (such as code duplication). However, in my opinion this is done so that developers can more easily identify and rectify issues in their code. It does not justify treating these metrics as separate metrics in a mathematical expression for calculating a compound sustainability metric. Again, code duplication is already taken into account in code smells, and code smells are taken into account in technical debt.

Case study and evaluation – Thank you for your clarification and for double-checking the individual calculations. With regard to metrics evaluation, in my opinion a journal paper should have a more rigorous evaluation than the case study you presented. Perhaps a reasonable approach would be to choose several finished projects and apply your metrics to data from the intermediate phases of these projects. After that, you would analyze the final outcomes of the projects (e.g., quantitative data related to project success, the opinions of project managers and other stakeholders, etc.). Finally, you would compare the results you got from running your metrics with the project success analysis. In this way you would get some sense of whether your metrics were good or not at assessing and predicting project sustainability. Currently, your case study only proves that your metrics can be mathematically calculated.

Thank you again for the clarifications and for addressing my concerns related to your paper. Overall, I think that the paper covers an interesting and relevant topic. As I described above, I still have serious concerns about whether your metrics can accurately assess and predict sustainability. However, since I am probably going to be outvoted by the other reviewers, I would like to suggest that you include “Study limitations” and “Future research” paragraphs in your paper (Conclusion). In these paragraphs you could list the current limitations of your metrics (such as the heavy reliance on estimations, the not entirely scientifically grounded choice of the 6 particular metrics, the overlap between metrics, and the lack of rigorous evaluation) and how these limitations are planned to be addressed in your future work. I think that this would be fair to future readers of your work.

Best regards!

Author Response

Dear Reviewer,

Thank you for your thoughtful and detailed comments. We are pleased that you find the structural changes and clarifications to the paper agreeable. Your continued feedback is valuable to us in improving the quality of our work.

Case study and evaluation: We understand that the current case study is not exhaustive enough to validate the proposed metrics rigorously. Your suggestion to apply the metrics to data from the intermediate phases of finished projects and then compare the results with project success outcomes is well received. We aim to incorporate this more rigorous evaluation into our future work.

Your suggestion to include “Study limitations” and “Future research” paragraphs in the Conclusions of the paper is excellent. We have stated the limitations, such as the choice of metrics, their potential overlap, and the lack of exhaustive evaluation, and we have outlined how these limitations will be addressed in future research.

Thank you once again for your time and insights. We greatly appreciate your constructive feedback and are committed to working diligently to address your concerns in future work.

Best regards! 
