4.6.1. Comparing Students’ Estimations
The next step in examining the results was to compare how students evaluated the difficulty of each test question.
Because the test contained a large number of questions, the question-level analysis was not conducted on the whole set of questions (the full scale). Instead, the five questions with the highest percentage of correct answers and the five questions with the lowest percentage of correct answers were selected.
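The selection step can be illustrated with a short sketch that ranks questions by their correct-answer rate and keeps both ends of the ranking. The function name and the correct-answer rates below are hypothetical and serve only to show the procedure; they are not the study's data.

```python
from typing import Dict, List, Tuple


def select_extremes(correct_rate: Dict[int, float], k: int = 5) -> Tuple[List[int], List[int]]:
    """Return the k questions with the highest and the k with the lowest correct-answer rate."""
    # Sort question numbers from the lowest to the highest share of correct answers.
    ranked = sorted(correct_rate, key=correct_rate.__getitem__)
    highest = ranked[::-1][:k]  # highest rate first
    lowest = ranked[:k]         # lowest rate first
    return highest, lowest


# Hypothetical correct-answer rates (question number -> share correct); not the study's data.
rates = {1: 0.62, 2: 0.48, 3: 0.91, 4: 0.55, 5: 0.30, 6: 0.22,
         7: 0.88, 8: 0.73, 9: 0.40, 10: 0.95, 11: 0.35, 12: 0.67}
top, bottom = select_extremes(rates, k=3)
print(top, bottom)  # [10, 3, 7] [6, 5, 11]
```

With `k=5`, the same call returns the two sets of five questions used in the analysis.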
The responses were recorded on a three-point Likert scale, where easy questions were coded as 1, medium questions as 2, and hard questions as 3. Although a Likert scale is strictly a ranking (ordinal) scale, it is generally accepted practice to treat it as an interval scale in order to take advantage of more complex statistical procedures. According to Selltiz et al. [21], the scores can then be added and averaged.
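A minimal sketch of this coding and averaging, assuming each student's rating is stored as one of the strings "easy", "medium", or "hard" (the storage format and the sample ratings are hypothetical). Taking the mean of the codes relies on the interval-scale treatment described above.

```python
from statistics import mean

# Coding used in the text: easy = 1, medium = 2, hard = 3.
LIKERT_CODES = {"easy": 1, "medium": 2, "hard": 3}


def mean_difficulty(ratings):
    """Average coded difficulty of one question, treating the ordinal scale as interval."""
    return mean(LIKERT_CODES[r] for r in ratings)


# Hypothetical ratings of one question by five students.
ratings = ["easy", "easy", "medium", "easy", "hard"]
print(mean_difficulty(ratings))  # 1.6
```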
The following analysis presents the estimated difficulty of the five questions with the highest correct-answer rates and the five questions with the lowest correct-answer rates, for students from all years of study. The estimations for the full scale are also shown.
As already indicated, the students could rate every single question as easy, medium, or hard.
Based on the averaged responses of students from all three years of study, the five questions with the highest correct-answer rates were Questions 3, 30, 37, 39, and 42. For these questions, most estimations fell into the easy category, which is consistent with the high number of correct answers to them (Figure 2).
The descriptive results were analyzed with a one-way ANOVA. For three of the five highest-scoring questions (Q30, Q39, and Q42), the students’ estimations differed significantly across years of study (Table 7). The results indicate that Year 3 students estimated these questions as significantly harder than Year 1 and Year 2 students did, while Year 2 students estimated Q39 as harder than Year 1 and Year 3 students did. The question-level results are less significant than those obtained for the whole scale, and they partially support hypothesis H2.
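The one-way ANOVA compares the mean coded estimations of the year groups for a single question. The sketch below computes only the F statistic and its degrees of freedom from hypothetical ratings (1 = easy, 2 = medium, 3 = hard); the p-value would then be read from the F distribution with (k − 1, n − k) degrees of freedom, and `scipy.stats.f_oneway` returns both values. Identifying which pairs of years differ requires a post hoc comparison, which this sketch does not perform; the text does not name the software or post hoc procedure the authors used.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Between-group variation: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: spread of ratings around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical coded estimations of one question by students in each year.
year1 = [1, 1, 1, 2, 1]
year2 = [1, 2, 1, 1, 2]
year3 = [2, 2, 3, 2, 2]
f_stat, df_b, df_w = one_way_anova_f(year1, year2, year3)
print(round(f_stat, 2), df_b, df_w)  # 6.0 2 12
```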
Based on the averaged responses of students from all three years of study, the five worst-performing questions were Questions 6, 13, 14, 20, and 25. For these questions, most estimations fell into the medium or hard category (Figure 3). Based on these results, a review of Q6 is strongly recommended: the majority of students rated it as an easy question, yet it received one of the lowest scores (one of the lowest rates of correct answers).
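The reasoning behind the recommendation for Q6 (most students rate a question as easy, yet few answer it correctly) can be expressed as a simple screening rule. The 0.5 threshold, the function names, and the sample data below are hypothetical and only illustrate the idea; the text does not define a numeric cut-off.

```python
from collections import Counter


def modal_rating(ratings):
    """Most frequent rating; ties go to the rating encountered first."""
    return Counter(ratings).most_common(1)[0][0]


def flag_for_review(ratings_by_question, correct_rate, threshold=0.5):
    """Questions whose modal rating is 'easy' although the correct-answer rate is below threshold."""
    return [
        q for q, ratings in ratings_by_question.items()
        if modal_rating(ratings) == "easy" and correct_rate[q] < threshold
    ]


# Hypothetical ratings and correct-answer rates for three questions.
ratings_by_question = {
    "A": ["easy", "easy", "medium"],
    "B": ["hard", "medium", "hard"],
    "C": ["easy", "easy", "easy"],
}
correct_rate = {"A": 0.20, "B": 0.25, "C": 0.90}
print(flag_for_review(ratings_by_question, correct_rate))  # ['A']
```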
For four of the five lowest-scoring questions (Q6, Q14, Q20, and Q25), the students’ estimations differed significantly across years of study (Table 8). Q6 was estimated as significantly harder by Year 3 students than by Year 1 and Year 2 students. Q14 showed the opposite pattern: this question was “harder” for Year 1 students. The result for Q14 supports hypothesis H2. The estimations for Q20 and Q25 differ from those of the preceding questions.
These results do not support hypothesis H2; they show only that the students’ estimations were close to the actual difficulty of the questions: when students estimated a question as hard, they achieved a lower score on it.
The remainder of this section presents the results of the analysis addressing the following questions:
Is the lowest-scoring question estimated at the same level by all students?
Is the highest-scoring question estimated at the same level by all students?
What score did the students achieve on these questions?
Here, score refers to the rate of correct answers.
Across students from all three years of study, Q25 had the lowest rate of correct answers (low score) and Q42 had the highest (high score). The highest-scoring question (Q42) was also rated as the easiest question by Year 1 and Year 2 students. Without exception, all Year 1 students estimated its difficulty as easy, and only 12% of Year 2 students and 10% of Year 3 students estimated it as medium (Figure 4).
For the question with the lowest score (Q25), the students’ estimations varied considerably. The majority of Year 1 students found the question moderately difficult (medium), whereas the majority of Year 2 and Year 3 students found it hard (Figure 5). The same trend can be observed in the points achieved on these two questions: Year 1 students received the most points, while Year 2 and Year 3 students received the fewest. At the same time, Year 2 and Year 3 students also felt that the task was more difficult to solve, which supports hypothesis H2.
When estimating the difficulty of the full scale (the whole test), the majority of students rated it as being of medium difficulty (Table 9).
The differences in students’ estimations by year of study are illustrated in Figure 6.
The results were tested with a chi-square test (Table 10).
The data are consistent with the descriptive statistics: the majority of students from all three years of study rated the test as being of medium difficulty. The results show a significant association between the two variables (performance and estimated task difficulty) (χ² = 10.73, p = 0.03). When students in a higher year of study estimated a question as medium or hard, they achieved a lower score on it. This result supports hypothesis H2.
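The chi-square test of independence compares observed counts in a contingency table with the counts expected if the two variables were unrelated. The sketch below computes the Pearson statistic and, for an even number of degrees of freedom, its exact upper-tail p-value from the closed form of the chi-square distribution. Assuming Table 10 is a 3 × 3 table (three years of study by easy/medium/hard), df = (3 − 1)(3 − 1) = 4, and χ² = 10.73 then gives p ≈ 0.030, in line with the reported value; the table layout is an assumption, since the text does not give it. For tables larger than 2 × 2, `scipy.stats.chi2_contingency` should return the same statistic together with its p-value.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi_square_sf_even_df(x, df):
    """Upper-tail p-value P(X >= x) for a chi-square variable with a positive even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Check the reported statistic under the assumed 3 x 3 layout (df = 4).
print(round(chi_square_sf_even_df(10.73, 4), 3))  # 0.03
```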
The authors also examined the five best- and worst-performing questions, together with the associated estimations, separately for each year of study. For these questions, they compared the difficulty estimations given by the students with those assigned by the instructors.
In addition to Questions 42 and 25, analyzed above, several questions yielded similar results across years of study. Question 37 received a high score from Year 1 and Year 2 students, and Q39 received a high score from Year 2 and Year 3 students. Year 1 and Year 2 students achieved low scores on Q14, and on Questions 5 and 16, Year 1 and Year 3 students gave fewer correct answers.
For the five highest-scoring questions, the difficulty estimations of Year 1 students and the instructor were very similar (Figure 7). The estimations were also compared with a nonparametric statistical test; the result of the Mann–Whitney test likewise supported the agreement of the estimations.
For the lowest-scoring questions, the estimations of the students and the instructor were less consistent (Figure 8). For Questions 6 and 16, the majority of students and the instructor were of the same opinion. Question 25 was rated as medium by the students and hard by the instructor, and Question 43 was rated as hard by the students but medium by the instructor. Statistically, these differences are not significant. However, for Question 14, which the students estimated as medium and the instructor as easy, the Mann–Whitney test showed a significant difference in estimation (Z = −1.97, p = 0.04). This result supports hypothesis H3: it indicates that Year 1 students estimated the difficulty differently from the instructor.
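The Mann–Whitney comparisons reported in this section can be sketched as below. This is a minimal illustration that assumes two independent groups of coded ratings (1 = easy, 2 = medium, 3 = hard); the text does not describe how the instructor’s estimations were arranged into a comparison group, so the function names and sample data are hypothetical. The sketch uses the tie-corrected normal approximation without a continuity correction, so its z can differ slightly from the default output of `scipy.stats.mannwhitneyu`, which applies a continuity correction unless `use_continuity=False`.

```python
import math
from collections import Counter


def midranks(values):
    """Ranks 1..n, with tied values receiving the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U for x, z, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of equal values.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p_two_sided


# Hypothetical coded ratings for two groups.
u, z, p = mann_whitney([1, 1, 2], [2, 3, 3])
print(u, round(z, 3), round(p, 3))
```

The sign of z depends only on which group is passed first; the two-sided p-value does not.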
The same two analyses were conducted for the estimations of Year 2 students. Among the questions with the highest scores, four (Q3, Q4, Q37, and Q42) were estimated at the same level by both the students and the instructors (Figure 9). Question 27 was rated as hard by the instructor, while most of the students rated it as medium. This difference appears only in the descriptive statistics; the Mann–Whitney test does not show a statistically significant difference. This result supports hypothesis H3. The questions with the lowest scores are shown in Figure 10.
For Year 3 students, among the questions that received the most points, the estimations of the students and the teacher differed for Question 23 (Figure 11): the students rated the question as easy, whereas the teacher rated it as medium. The estimations for the other questions did not differ much: the majority of students, as well as the teachers, found these questions easy.
For the questions with the lowest scores (Figure 12), the estimations of the students and the instructor were the same for Questions 25 and 26, as both parties rated them as hard. Question 6 was considered hard by the students but easy by the teacher. Question 7 was considered easy by the majority of students and of medium difficulty by the teacher. Question 16 was estimated as hard by the students and of medium difficulty by the teacher. In all cases, the Mann–Whitney test showed no significant difference between the estimations of the Year 3 students and the teacher. This result also supports hypothesis H3 for Year 3 students.
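The descriptive side of these student–instructor comparisons (the majority student rating versus the instructor’s rating for each question) can be sketched as below. The function name and data are hypothetical, and ties between equally frequent ratings are broken by first occurrence, which is an arbitrary choice; significance is assessed separately with the Mann–Whitney test, as in the text.

```python
from collections import Counter


def compare_with_instructor(student_ratings, instructor_ratings):
    """Map each question to (modal student rating, instructor rating, whether they agree)."""
    result = {}
    for question, ratings in student_ratings.items():
        modal = Counter(ratings).most_common(1)[0][0]
        teacher = instructor_ratings[question]
        result[question] = (modal, teacher, modal == teacher)
    return result


# Hypothetical ratings for two questions.
students = {"A": ["hard", "hard", "medium"], "B": ["easy", "easy", "medium"]}
instructor = {"A": "hard", "B": "medium"}
for question, row in compare_with_instructor(students, instructor).items():
    print(question, row)
```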