Article
Peer-Review Record

An Expert Judgment in Source Code Quality Research Domain—A Comparative Study between Professionals and Students

Appl. Sci. 2020, 10(20), 7088; https://doi.org/10.3390/app10207088
by Luka Pavlič *, Marjan Heričko and Tina Beranič
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 6 September 2020 / Revised: 25 September 2020 / Accepted: 2 October 2020 / Published: 12 October 2020
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

This paper presents a study that confirms what could probably be supposed without any research.

I have some remarks.

Line 42 gives:

We asked them to enter their perceived level of knowledge on programming language and provide numbers of years of their professional experiences.

I can imagine that most of us can judge our own skills in many areas. Usually, we have a natural tendency to overstate our skills. For instance, most of us believe we are very good car drivers because we have held a licence for more than 10 years and have driven more than 100,000 km in total. But this is our own opinion, and it is much better to verify it with special tests or at least to be judged by others. I think this is the weakest part of this research: it would be much better to score students' and professionals' skills with a kind of peer-review method (each student scores other students) or with special tests. A short justification of the solution proposed by the authors is needed.

 

I suggest:

  1. To expand Section 2 giving a discussion about expert judgments and possible inconsistency of their assessments.
  2. Show how the experts' assessments inconsistency can influence final judgment about software quality.
  3. How the inconsistency can be avoided or reduced. This is a kind of pairwise comparison problem.
  4. Are there any measures expressed in numbers to score the software quality?

What do the assessments in Table 1 mean?

source code size | very poor  poor  good  very good

How can the code size be poor or good, rather than too long vs. too short, or something like this?

 

Explain the details in Table 2. Why do we have

#poor #very good -> #very good
#poor #very good -> #poor

I suppose that this is related to the sentence:

In addition to the performed individual evaluations, participants were asked to coordinate their evaluation with the assigned co-assessor and provide a coordinated and agreed evaluation. An example of a software class assessment is shown in Table 2.

How was it done, who was the co-assessor, and what was his/her influence on the final results?

 

Please check the language and English errors.

Write the authors' contributions.

Author Response

Dear reviewer,

 

Thank you for your valuable review, expressing your concerns and improvement suggestions. We reviewed your comments and carefully addressed all of the open issues. We appreciate your contribution, which enabled us to raise the quality of the manuscript.

Our answers and changes are as follows.

 

Your comment:

I have some remarks.

Line 42 gives:

We asked them to enter their perceived level of knowledge on programming language and provide numbers of years of their professional experiences.

I can imagine that most of us can judge our own skills in many areas. Usually, we have a natural tendency to overstate our skills. For instance, most of us believe we are very good car drivers because we have held a licence for more than 10 years and have driven more than 100,000 km in total. But this is our own opinion, and it is much better to verify it with special tests or at least to be judged by others. I think this is the weakest part of this research: it would be much better to score students' and professionals' skills with a kind of peer-review method (each student scores other students) or with special tests. A short justification of the solution proposed by the authors is needed.

Our answer:

Thank you for this comment.

We completely understand and agree with your doubt; we had similar concerns when we designed the experiment. This is why, in addition to collecting the subjectively perceived level of knowledge, we also included a question on the years of professional experience in order to objectify the profile. When we designed the questionnaire for collecting participant profiles, we looked at existing related studies. We based our questionnaire on the work by Zhifeia et al., 2018 (Understanding metric-based detectable smells in Python software: A comparative study – now added as reference 20). We also added an explanation to the manuscript, which can be seen in Section 2, paragraph 2, starting at line 132.

Please note that we experimented with students and professionals separately. The data were also analysed separately, so it was not possible for data gathered while experimenting with, e.g., students to end up in the professionals' dataset just because a student self-assessed himself or herself as highly experienced. As shown in Table 4, perceived knowledge (the subjective part) correlates with years of experience (the objective part of the profile), with which we demonstrate that the "student group" contained actual students, and the "professional group" contained professionals, the majority having more than a decade of development experience.

Furthermore, we also supplemented paragraph 2 in Section 3 (starting with line 152) with an additional sentence describing the designed questionnaire and addressing the doubts connected to assessment subjectivity: "Since the knowledge self-assessment can be biased and subjective, the years of experience criterion was added in order to objectify participant's experiences."

In addition, we also included a related work to the manuscript:

Within expert judgements, evaluators form their opinion based on past experiences and knowledge, which can result in the subjectivity of the provided assessments [Rosqvist2003, Hughes1996]. [Rosqvist2003] claims that each expert judgement is based on a participant's mental model that is used to interpret the assessed quality aspect. However, when the mentioned challenges are considered and adequately addressed, experts' assessments can – despite being based on participants' personal experience – constitute a good and valuable supplement to empirical evidence [Rosqvist2003].

 

Your comment:

I suggest:

  1. To expand Section 2 giving a discussion about expert judgments and possible inconsistency of their assessments.
  2. Show how the experts' assessments inconsistency can influence final judgment about software quality.
  3. How the inconsistency can be avoided or reduced. This is a kind of pairwise comparison problem.

Our answer:

Thank you for expressing your concerns. Let us address the possible inconsistencies (points 1–3) in a single answer. Since readers might raise the same question, we have now included an explanation in the manuscript as well – please see paragraph 4, Section 3 (Research method), line 176: "We forced participants to coordinate their assessments as a measure to get as objective assessments as possible on one hand, and to address possible inconsistencies in the assessor's subjective views."

Possible inconsistencies in the individual assessments are addressed with group coordination to agree on a joint assessment. An example of this can be seen in Table 2, and we advocate our approach later in this document as well. This is how we addressed possible divergent views of individuals; as a consequence, an individual misunderstanding or a superficial review of the source code would not affect the comparison of student and professional views on source code quality.

 

Your comment:

  1. Are there any measures expressed in numbers to score the software quality?

Our answer:

We appreciate your comment. The most basic approach for measuring software quality (i.e., assigning a numeric value to a quality aspect) is using software metrics, which represent the basis for many of the available quality evaluation processes. In the paper, we described the measures of software quality in Section 1, paragraph 1: "Several formal [1] and de-facto standards list different software quality attributes and prescribe procedures and metrics for software quality evaluation. They range from relatively simple indirect methods to measure source code in order to reason on expected quality (e.g. comment-to-code ratio) to relatively complex processes." We refer to the standards ISO 25000 (ref. no. 2) and SQALE (ref. no. 1) – they are based on measuring (internal) quality while assigning numbers to quality aspects. An example of such a metric is cyclomatic complexity, with well-known values (1 for the least complex code; values around 10 and above indicate code that needs to be simplified).
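To illustrate how such a metric assigns a number to a quality aspect, here is a minimal, hypothetical sketch (not tooling from the study) that approximates cyclomatic complexity by counting decision points in a Python AST; dedicated analysers compute this far more rigorously.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points.

    Counts common branching constructs in the parsed AST. This is a
    rough illustration only; real tools handle many more cases.
    """
    decision_nodes = (ast.If, ast.For, ast.While,
                      ast.IfExp, ast.ExceptHandler, ast.BoolOp)
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, decision_nodes) for node in ast.walk(tree))

straight = "def f(x):\n    return x + 1"
branchy = "def g(x):\n    if x > 0:\n        return 1\n    return 0"
print(cyclomatic_complexity(straight))  # 1: no branches
print(cyclomatic_complexity(branchy))   # 2: one decision point
```

A value of 1 thus corresponds to straight-line code, while each additional branch raises the score, matching the interpretation given above.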

However, in our experiment we deliberately wanted students and engineers to judge the quality on their own, to see whether, and to what extent, professionals (experience) would differ from students (formal education with minor experience).

Also, please note the trend of source code reviews gaining popularity (as explained in the manuscript, starting with line 35), where the code reviewer's impression overrules numeric values; this gives our manuscript additional importance in terms of who can be trusted to do source code reviews. In this context, it is not a coincidence that industry source code reviews are by default carried out by senior developers.

 

Your comment:

What do the assessments in Table 1 mean?

source code size | very poor  poor  good  very good

How can the code size be poor or good, rather than too long vs. too short, or something like this?

Our answer:

Thank you for pointing out the deficient explanation. We supplemented paragraph 3 in Section 3 (starting with line 161), providing a detailed description of the used scale and depicting its meaning with a practical example: "The participants evaluated each aspect using the scale: very poor, poor, good, very good. The scale aims at gathering their opinion about the quality of the assessed software entity, e.g. during the source code size assessment, the evaluators assessed if, in their opinion, the size of a software class is poor (i.e. inappropriate) or good (i.e. appropriate). A software class where the source code size is evaluated as poor contains too many or too few lines of code, resulting in unmanageable size and opacity or, on the other hand, in inappropriately short content. Contrarily, a software class that is assessed as good in terms of source code size is composed of a manageable and acceptable number of lines of code." Please note that this was also clearly explained to the experiment participants before their tasks.

 

Your comment:

Explain the details in Table 2. Why do we have

#poor #very good -> #very good
#poor #very good -> #poor

I suppose that this is related to the sentence:

In addition to the performed individual evaluations, participants were asked to coordinate their evaluation with the assigned co-assessor and provide a coordinated and agreed evaluation. An example of a software class assessment is shown in Table 2.

How was it done, who was the co-assessor, and what was his/her influence on the final results?

Our answer:

We once again thank you for pointing out the lack of explanation.

You guessed well – the coordinated assessment was assigned automatically if all assessors gave the same individual assessments – see lines 2, 3, and 4 in Table 2. When the individual assessments differed (lines 1 and 5 in Table 2), a face-to-face conversation took place (explaining the rationale behind each assessment) and the assessors had to agree on a common assessment.
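The rule above can be sketched as follows; this is a hypothetical illustration of the decision rule, not code from the study:

```python
from typing import List, Optional

def coordinated_assessment(individual: List[str]) -> Optional[str]:
    """Return the joint assessment automatically when all assessors agree;
    otherwise return None, signalling that a face-to-face discussion is
    needed before a common assessment can be recorded."""
    if len(set(individual)) == 1:
        return individual[0]  # unanimous: assign automatically
    return None  # divergent views: assessors must discuss and agree

print(coordinated_assessment(["very good", "very good"]))  # very good
print(coordinated_assessment(["poor", "very good"]))       # None
```

Only the disagreement cases (the `None` outcomes here) triggered the face-to-face coordination described above.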

Paragraph 4 in Section 3 (starting with line 185) describes the evaluation process in detail, explaining the content of Tables 2 and 3 and emphasising the role of the coordination between the assessors.

 

Your comment:

Please check the language and English errors.

Our answer:

Thank you! A professional native English proofreader proofread the revised version of the manuscript.

In addition to minor language errors (a, the, which, past vs. present tense, etc.), some statements were improved in terms of style while preserving the meaning. Some examples:

 

They are popular to employ in scientific research.

-->

They are an obvious choice when it comes to scientific research.

 

Professionals are participants that represent experiences in contrast to students, where education is usually on high level, but they lack of experience.

-->

Professionals are participants that boast higher level of experience, in contrast to students, whose education level might be high but who are often missing a comparable level of experience.

 

Your comment:

Write the authors contribution.

Our answer:

Done; the contributions start at line 362. Thank you!

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors present a comparative study oriented towards expert judgment in the source code quality research domain, where two groups are involved – students and experts. The authors followed the research question of whether students are a comparable substitute for professionals in experiments in the source code quality research domain.

The results of the case study are fruitful and promising and can serve as a background for future research.

The research purpose and the statement of the problems are clear. The tables and figures are precisely prepared, and they are very helpful for readers to interpret and follow the findings.

Some general (major) remarks:

  • the number of participants (54) in the case study seems rather small – why?
  • the paper needs one more proofreading
  • some statements in the Conclusions section should be provided in the past tense (we showed in this paper ...)
  • the motivation (in the Introduction) and the novelty (in the Conclusions) should be more emphasized

 

Minor remarks:

  • keywords should not begin with capitals; they could be alphabetically ordered
  • Table 1: change the article An -> A in the caption (A program ...)
  • lines 172-173: maybe some leading text should be inserted to avoid a double heading
  • sec 4.2, lines 197-198: please use a unified format for decimal numbers (first a comma was used for the decimal part; now a period symbol is used)
  • Table 5 has its caption above, not below

 

Typography (for LaTeX):

  • use \usepackage{csquotes}, for text in double quotes use \enquote{...}, for a single quote use \enquote*{}
  • lines 220 and 227: either use \noindent or do not break the line after \end{displaymath}, optionally put "comma" at the end of formula
  • distinguish between hyphen and dash:

- (hyphen) between the elements of compound words

-- (en-dash) for ranges

--- (em-dash) punctuation for digressions in a sentence

Author Response

Dear reviewer,

 

Thank you for your valuable review, expressing your concerns and improvement suggestions. We reviewed your comments and carefully addressed all of the open issues. We appreciate your contribution, which enabled us to raise the quality of the manuscript.

Our answers and changes are as follows.

 

Your comment:

The authors present a comparative study oriented towards expert judgment in the source code quality research domain, where two groups are involved – students and experts. The authors followed the research question of whether students are a comparable substitute for professionals in experiments in the source code quality research domain.

The results of the case study are fruitful and promising and can serve as a background for future research.

The research purpose and the statement of the problems are clear. The tables and figures are precisely prepared, and they are very helpful for readers to interpret and follow the findings.

 

Our answer:

Thank you! We appreciate your positive opinion and recognized contribution.

 

Your comment:

Some general (major) remarks:

  • the number of participants (54) in the case study seems rather small – why?

Our answer:

Thank you for your remark. We agree that the number of participants seems low. However, considering that we implemented an expert judgement in the software quality domain, we believe it can be justified. Firstly, it was a classroom study that needed guidance and supervision. The students were divided into random groups, allowing them to coordinate their assessments. Since the groups were not constant, the coordination could take place with as many different participants as there were entities. In order to reach a reliable decision, an in-depth conversation and exchange of views had to take place. Finally, the number of entities assessed by each participant was quite high – on average, a few hours were needed to provide the quality assessment of all of the assigned entities. The mentioned reasons contributed to a sample size that seems quite low. In addition, we selected an appropriate statistical method to verify the data, one that is also adequate for the sample size.

 

Your comment:

  • the paper needs one more proofreading

Our answer:

Thank you! A professional native English proofreader proofread the revised version of the manuscript.

In addition to minor language errors (a, the, which, past vs. present tense, etc.), some statements were improved in terms of style while preserving the meaning. Some examples:

 

They are popular to employ in scientific research.

-->

They are an obvious choice when it comes to scientific research.

 

Professionals are participants that represent experiences in contrast to students, where education is usually on high level, but they lack of experience.

-->

Professionals are participants that boast higher level of experience, in contrast to students, whose education level might be high but who are often missing a comparable level of experience.

 

Your comment:

  • some statements in the Conclusions section should be provided in the past tense (we showed in this paper ...)

Our Answer:

Thank you! The Conclusions section was improved during proofreading, including the tenses.

 

Your comment:

  • the motivation (in the Introduction) and the novelty (in the Conclusions) should be more emphasized

 

Our answer:

Thank you for your observation. While re-reading our manuscript, we also noticed that the Conclusions, in particular, could be better articulated. This is why we included an additional paragraph on motivation (starting with line 73), and we also grounded the Conclusions in the data – an additional paragraph was included, starting with line 340, and paragraphs 3 and 4 in the Conclusions section were also rewritten.

 

Your comment:

Minor remarks:

  • keywords should not begin with capitals; they could be alphabetically ordered
  • Table 1: change the article An -> A in the caption (A program ...)
  • lines 172-173: maybe some leading text should be inserted to avoid a double heading
  • sec 4.2, lines 197-198: please use a unified format for decimal numbers (first a comma was used for the decimal part; now a period symbol is used)
  • Table 5 has its caption above, not below

Our Answer:

We appreciate the remarks. We addressed all of the bullets. The keywords are now formatted correctly, the decimal symbol was unified using ".", and the caption of Table 5 was moved. The complete manuscript was once again checked by a native English speaker.

However, we did not insert text between the heading and subheading (4 and 4.1), since that would break the structure of Subsection 4.2. The rationale is that we wanted to clearly separate the subsection about the tool for data acquisition from the subsection about the results. If you think this is not acceptable, we can join the complete section, but in that case the structure would, in our opinion, be harmed.

 

Your comment:

Typography (for LaTeX):

  • use \usepackage{csquotes}, for text in double quotes use \enquote{...}, for a single quote use \enquote*{}
  • lines 220 and 227: either use \noindent or do not break the line after \end{displaymath}, optionally put "comma" at the end of formula
  • distinguish between hyphen and dash:

- (hyphen) between the elements of compound words

-- (en-dash) for ranges

--- (em-dash) punctuation for digressions in a sentence

 

Our answer:

Thank you for your suggestions and help. All of the above suggestions were considered. We used the csquotes package and consequently present quotes using \enquote{...}. We also added a \noindent in order to remove the line indent. The comma was added, and hyphens and dashes are now distinguished.
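For readers less familiar with these LaTeX conventions, the changes above can be combined in a minimal fragment (a hypothetical sketch, not taken from the manuscript):

```latex
\documentclass{article}
% csquotes provides language-aware quotation marks
\usepackage{csquotes}

\begin{document}
\enquote{double-quoted text} and \enquote*{single-quoted text}.

The comment-to-code ratio uses a hyphen; lines 220--227 use an
en-dash for the range; a digression --- like this one --- uses
an em-dash.
\end{document}
```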

 

Thank you again for the in-depth review, which gave us the opportunity to improve the manuscript.

Author Response File: Author Response.pdf
