Article
Peer-Review Record

Impact of Computer-Based Assessments on the Science’s Ranks of Secondary Students

Appl. Sci. 2021, 11(13), 6169; https://doi.org/10.3390/app11136169
by Eduardo A. Soto Rodríguez, Ana Fernández Vilas * and Rebeca P. Díaz Redondo
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 23 March 2021 / Revised: 29 June 2021 / Accepted: 29 June 2021 / Published: 2 July 2021
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Studies that take on the issue of differences between assessment modes are essential to moving the assessment industry forward. I wanted this article to be a part of that work, but I'm concerned that there are fundamental issues with the design of the study that will prohibit it from providing generalizable findings. I encourage review by an individual with expertise in psychometrics and resubmission.

Author Response

Attached

Author Response File: Author Response.pdf

Reviewer 2 Report

The study investigates what effects may appear when the testing mode shifts from paper-based to computer-based assessment. The focus is on showing whether CBA produces equivalent results. The study contributes to the body of knowledge by comparing percentile ranking scores instead of raw scores.

The authors identify validity, reliability, and fairness as the prime measures of test and instrument quality. Inappropriate design of CBA (multimedia or interactive elements) is one of the main issues raised in the literature; how do the authors address this issue in the study? I would like to hear more about the digital items, especially since they are said to be constructed by the teachers. Are the CBA and PBA items used in the study identical? If they differ, how can the authors describe "test difficulty" levels? Was an item-level analysis done to compare CBA and PBA items?

Author Response

Attached

Author Response File: Author Response.pdf

Reviewer 3 Report

This study reports on a comparison of EOC scores awarded to two cohorts across four years of secondary school, exploring the impact of testing mode: digital or paper-based. The importance of the study lies in exploring this effect in a school-based setting and in presenting the mode effect by several explanatory variables: gender, cohort, and achievement level. The strength of this study is its methodology and mode of data analysis, which are very clear and well justified, and which therefore give a clear view of the investigated mode effect in the results section. However, I do find weakness in the theoretical basis of the study, which also has implications for the discussion. In particular, I believe that every study should state a clear theoretical contribution, whereas in this study the authors mainly suggest an (important but only) practical contribution. Below I detail my review with respect to the various sections of the paper, together with suggestions for improvement.

General comments:

  • There are some editing errors that should be taken care of throughout the paper. Some of them are minor; for example, on p. 3, at the beginning of Section 2.1, the word “it” is missing before “is important”: “To understand the benefits and risks of using CBA is important to check”. But some of them impair the understanding of the text; for example, in the same section, the authors claim: “An alleged advantage of CBA is that would allow to measure new constructs”… it is not clear what the authors want to say in this sentence. I strongly suggest that the paper undergo linguistic editing.
  • The use of citations within the text is not accurate in some places; for example, on p. 3, they cite “B Fishbein, 2018”. It should not contain the letter B.
  • The numbering of the tables should be corrected. Table 1, which appears on page 9, should be Table 3.

Introduction and literature review:

  • There is intensive use of the acronym LSA; however, it is not explained anywhere in the paper.
  • I don’t understand the title: “0. How to Use This Template”.
  • The authors claim that: “First, describing how a secondary school adopted CBA to assess students in science through digital high-stakes exams made by teachers.” This is what the literature review should focus on, but in this current version of the paper this is not the case.
  • P. 2: the paragraph that begins with “The rest of the content will be distributed as follows:…” is not necessary; it merely describes the obvious structure of a paper. Instead, it would be helpful to provide the logical sequence of the literature review.
  • I suggest adding the words “of CBA” to the title of “2.1 Advantages & Risks”.
  • Section 2.2 is extremely significant to the literature review. I think it should not be a sub-section but rather a section of its own. Also, the authors should further describe the conditions for choosing the eight studies described in this section, i.e. justify the following: “To contextualize this work, we gathered the outcomes of eight studies publishing since 2005, covering the effect of mode on achievement in school-aged students and related to the educational field of science.” Finally, this section should also refer to the mode effect observed, for example, in the PISA international test. The OECD began to use CBA for the PISA test only recently. It is important to mention the comparison of results between CBA and PBA test modes as revealed in this international test – either between countries, or within a specific country, comparing years in which students were tested in PBA mode with years in which they were tested in CBA mode.
  • Another issue missing from the literature review is the treatment of assessment – particularly CBA – in different grades of high school, with reference to gender differences and to different achievement levels. This is necessary to justify the research questions. It is also lacking in the discussion of the paper. For example, the authors claim that “our results do not show evidence that CBA had harmed more the performance of female students and favored male students as it has been suggested for some cases (Gallagher, Bridgeman, & Cahalan, 2002; Jeong, 2014).” But the first cited study is not discussed in the literature review, and the second is only briefly mentioned, and not in reference to gender.

Methodology:

  • “3. The School: A Case Study”. On pages 9-10 the authors explain why a case study is sufficient for this study. However, they do not give any information about why this specific school was selected. Is this school representative of other schools in Spain?
  • Table 1 on p. 8 should be displayed in a form that better demonstrates the shift from PBA to CBA mode.
  • P. 10: “In this context, we will have to assume, at least initially, that the burden of proof is on the CBA side to demonstrate that it produces equivalent scores.” This sentence should be backed up by the literature review. The authors claim at the end of this paragraph that “This is also what the studies about the testing mode in PISA and TIMSS aimed to know”. As I mentioned before, the shift to CBA mode in recent years should be discussed in the literature review of this paper. In this respect, the uniqueness of this study is its investigation of this shift in school-based tests, which are more valuable to teachers as they are part of the regular assessment conducted in school.
  • Figure 1: in STEP 3, the sentence that deals with CBA should end with the word “CBA” instead of “PBA”.

Research questions:

  • I wonder whether there should be another RQ referring to differences between the various scientific fields… and if not, the authors should justify why they treat all these fields as equivalent. With this in mind, I would also note that the authors claim the EOC raw grades are a product of measuring “a common construct, namely the achievement in scientific literacy”. However, it is not at all trivial that an EOC test would measure students’ scientific literacy. Although this is briefly justified on p. 6, it is not enough.

Results:

  • I think it is better to display Tables 5-7 at the end of Section 5.1.
  • P. 16: there is repetition of the data analysis section. I suggest the authors reconsider whether it is necessary to mention it here.
  • I wonder whether it is necessary to display findings that also exemplify the comparison across the years investigated in this study, since it is very impressive that the authors have such extensive data.

Discussion:

  • More citations should be provided in this section to back up the authors’ claims.
  • The first two paragraphs mention some of the study’s limitations. I suggest putting these in a designated section, or moving them later in this section. Instead, I suggest beginning the discussion by stating the goal of the study.
  • More emphasis should be placed on discussing the findings for RQ1 & RQ2. I believe that after the revisions I suggested for the literature review, the discussion could be improved as well.

Author Response

Attached

Author Response File: Author Response.pdf

Reviewer 4 Report

Thanks for this opportunity to review this work - it is very informative, thought-provoking, and highly relevant to my area of work in policy-making on school-based assessment.

I will begin with a quick clarification question before going into substantive comments.

  1. The methodology and analysis hinge upon an important assumption (quoting from lines 318-321):

"the six science disciplines taught across secondary (see table 1) share a common “linking construct” called in PISA “scientific literacy (OECD, 2013, p. 13)”, are “sufficiently unidimensional” to be compared and, share a similar rank of difficulty for students of secondary education (Coe et al., 2008, Chapters 4–6; OFQUAL, 2018).

Hence the authors go on to "limit the effects of the aforementioned biases", stating that "raw EOC grades will be converted into PRS and compared exclusively at within-pupil (paired-tests) level."
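For concreteness, here is how I read that analysis: a minimal sketch in Python of converting raw EOC grades into percentile rank scores and then running a within-pupil paired test. The pupils, grades, and column names are hypothetical illustrations of my own, not the authors' code or data.

import pandas as pd
from scipy import stats

# Hypothetical long-format data: one row per pupil per assessment mode.
df = pd.DataFrame({
    "pupil":   [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "mode":    ["PBA"] * 5 + ["CBA"] * 5,
    "eoc_raw": [6.1, 7.4, 5.0, 8.8, 6.9, 5.8, 7.9, 4.6, 9.0, 7.1],
})

# Convert raw EOC grades to percentile rank scores (PRS) within each
# assessment group (simplified here to one group per mode).
df["prs"] = df.groupby("mode")["eoc_raw"].rank(pct=True) * 100

# Within-pupil (paired) comparison of PRS between the two modes.
wide = df.pivot(index="pupil", columns="mode", values="prs")
t_stat, p_value = stats.ttest_rel(wide["CBA"], wide["PBA"])
print(f"paired t = {t_stat:.3f}, p = {p_value:.3f}")

Is this, broadly, the intended procedure?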

My questions are:

  1. How are these assumptions treated as 'biases' in this analysis?
  2. Do we therefore assume that different cohorts of students (in this case Y9 and Y10) should perform similarly across CBA/PBA, such that inter/within-student analysis makes sense? I am not clear how and why the results of Y9 and Y10 students can be compared meaningfully, such that, for example, the mean grades in Y9 and Y10 (and across CBA/PBA) should be similar. Maybe I misread this. Or is the comparison only made within Y9 (CBA vs PBA) and within Y10 (CBA vs PBA)?
  3. The authors allude to a possible 'similarity in demands' of the science subjects, presumably suggesting that the demands of the science subjects in, say, Y9 should be more or less similar, so students' rank order of performance should be similar? Or should we instead compare 'apple' with 'apple', e.g. PC3_17 versus PC4_17?
  4. I also understand that the EOC is a composite of many test/exam scores, so can I confirm that an EOC gathered under CBA is based ENTIRELY on computer-based assessment, and likewise that a PBA EOC is based entirely on PBA?

 

Going into more queries on sections of the manuscript:

1. I am not sure how the wondering quoted below is carried through the rest of the manuscript:

Line 27-28:

Though that situation might be changing since the arrival  of the current Covid-19 crisis we will wonder across this paper to what extent CBA is actually an affordable technology to regular schools?

2. The parameters of the literature search should be spelled out more clearly. I hope it is not the case that the authors relied solely on a Google Scholar search.

Line 200-201:

Unfortunately, a search for related literature in Google Scholar using the terms “CBA” (or any of its other equivalent terms), “implementation” (or deployment) and “secondary school” (or secondary education)…

Line 207-209:

In sum, nothing was found in literature about long term “mode effect” studies under non-standardized conditions, with school grades as the main outcome variable and implemented with the existing resources of average schools.

3. These are not reasons to conclude that the current references are not conclusive:

Line 267-270:

"Third, because in our study we aim to compare the effects of mode on ranks instead of on raw scores, so we will convert the EOC grades into percentile rank scores (PRS). Fourth, because the entire implementation, this is, the development of items, the construction of test and the administration of these assessments were not led by assessment experts but for secondary teachers across school years."

4. Can the authors give more details of how the teachers go about designing items in CBA – how different is it from PBA, aside from the fact that it is delivered online and marked by Moodle? If not, what is the mechanism of marking? It would be useful to comment on this using the same subjects, e.g. PC3_17 versus PC4_17.

5. Check that the formatting is consistent with the requirements of the journal (esp. citations, use of italics and upper case).

Author Response

Attached

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

This paper has improved significantly. You clearly put significant effort into the revisions and it shows. I'm excited to learn more about the future work. I've included a few notes below for further improvements.

18 – great to include Grieff here!

22 – maybe state that it is uncommon in many education systems. It is very common in the US, for example, where a lot of your audience might be.

23 – great quote by Bennett

29 – It seems that your argument is not really about whether CBA is ‘affordable’, as you state here, but rather to what extent the use of CBA is a worthwhile investment in terms of effectiveness, efficiency and cost as compared to PBA?

119 – Section 2.1 is much improved!

484 – Methodology section is much improved!

697 – Discussion section – Again here, I don’t think your argument is about affordability, but rather whether CBA is a worthwhile investment. I would reframe this argument to lay out the pros and cons that you have effectively identified in your work, as you do here.

880 – The Digital Education Research Network (DERN) by ACER is a very valuable and current resource for information about this space.

Author Response

Attached

Author Response File: Author Response.pdf

Reviewer 2 Report

All parts of the article are significantly improved.

Author Response

Attached

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors responded carefully and thoughtfully to all of the reviewers' comments. In particular, the extensive changes they made to the paper in response to my previous comments have ultimately improved it.

In the discussion, one important note needs to be made: it should contain references that were previously highlighted in the paper. I strongly suggest that the authors consider adding references from the introduction or theoretical background that can further back up the findings in the discussion, or alternatively place some of the citations mentioned in the discussion in other relevant sections of the paper.

Author Response

Attached

Author Response File: Author Response.pdf
