1. Introduction
The world has probably seen more changes in recent decades than at any time since the Industrial Revolution. These changes demand substantial transformation in society in general, and particularly in our educational institutions. This transformation is characterised, above all, by increasingly complex problems, caused in large part by the changes brought about by information and communication technologies (ICT) and by economic competitiveness in a globalised world. All this means that today’s issues are a mix of the social, the professional and the personal. There used to be a clear line separating these types of issues and situations, but today it is very much blurred.
These changes mean that people today must increasingly develop their abilities to make sound decisions and solve problems effectively. At the same time, today’s complex real-world problems are not solved by following a model of intelligence based primarily on classic IQ or logical reasoning [1,2,3,4]. Socio-emotional dimensions of intelligence, along with competences in creative and critical thinking, are fundamental today in the cognitive exercises of problem-solving and problem-finding [5,6,7,8,9].
Tackling the problems of today’s world requires us to employ more critical thinking (CT) skills, as they seem almost purpose-built for such problems: they involve skills and strategies that are more generalisable or transferable than others [10]. Because CT skills are mainly horizontal competences that adapt to the context at hand, we can use them equally effectively in many situations. The flexibility and adaptability of CT to different situations and contexts makes it an excellent candidate for handling today’s changes and new demands. In this sense, education and training at all levels need to adapt much more quickly to these changes. In particular, university education is called upon to change further in order to train young people in their transition to adulthood [11]. Universities need to transform themselves more than other institutions in society if they are to progress, lead the necessary change and produce professionally trained, mature, socially responsible adults.
However, there seems to be a worrying disconnect between academia and the real world. Companies are increasingly demanding more transversal or horizontal competences. For example, they need not only qualified biologists or engineers, but also professionals who, from their specialist area (vertical competences), can solve problems in different work contexts, make decisions individually and collectively, and communicate their results with argumentative precision [12]. These general skills in solving decontextualised problems, making good decisions and arguing persuasively are not the focus of today’s higher education. As we noted above, developing these competences requires a different conception of our cognitive machinery and of the skills that are really fundamental. Additionally, and no less important, it requires an awareness of the deficiencies and limitations of that machinery.
Higher education today suffers from three significant ills [12]. As implied earlier, the first stems from the fact that the real world has changed faster than universities have. The second is that the predominant model is still the administration of accumulated knowledge rather than the management of learning, understood as the process of acquiring knowledge and developing competences. The third is that training is still mainly vertical or disciplinary and rarely horizontal or cross-disciplinary. We have already noted that in the real world—the world of work—companies and some institutions require qualified people who can perform in very different contexts and solve problems equally well in any of them. Our students lack sufficient preparation for working with real problems in different contexts; in other words, the widespread application of knowledge and its inter-domain practice is far more limited than it needs to be. With these three dominant characteristics, graduates are unlikely to be qualified to perform and solve problems in complex, new and changing contexts.
New times and new problems require new tools and new strategies. Today’s world has become so complicated that the need for lifelong learning is now a given. One real and challenging change is therefore the ongoing preparation of professionals and citizens, which raises the question of how institutions can offer courses throughout people’s lives. This brings us to a reflection that has yet to be properly addressed: being aware of the differences between teaching, learning and training helps us to tackle this issue. If it is not possible to offer education for all ages, yet lifelong learning is needed, how can this be achieved? In our view, the answer lies primarily in training, and not just in learning. Training implies student autonomy and almost complete independence: students know where to go for documentation, know how to apply reliable criteria when evaluating what they find, and are able to reach their own conclusions and explanations about the problems or questions that interest them. This reflection therefore relates to the question of where training can or should be given today.
However, it is important to note that some strategies are particularly effective in lifelong or ongoing training. People must learn how to learn, and learn to develop behavioural, cognitive and motivational strategies that regulate their ongoing acquisition of knowledge and skills. At the same time, with a greater emphasis on training than on learning, people need to be involved in processes that encourage their autonomy, initiative and responsibility at personal, professional and social levels. Three UNESCO reports, spaced roughly two decades apart [13,14,15], show attempts to adapt to the changes that have taken place at each point in time.
The latest UNESCO report “retires the Delors report and redraws a new horizon for education” [16]. According to Sobhi Tawil, a leading member of the commission that drafted the report: “In the current global debate on the future of education, there are two parallel currents. The first concerns the role of education in the post-2015 international development agenda. The second relates to the way in which the transformation of society globally impacts on our approach to education and learning...” [17]. It is this second current that today requires training in terms of learning to understand, to do, to be, and to be with others.
If changes are made in educational methodologies by giving greater prominence to learning processes and focusing on wider application and cross-domain practice, then higher education can take on the development of critical thinking as its main objective [12]. Indeed, today’s changes and demands are greater than in past decades, and we must meet them with competences and strategies capable of handling this complexity.
CT offers what is needed in terms of how to progress, train and generalise. Being able to change faster entails developing collaborative communities of enquiry that work together on solving real projects. In training, it is important to consider different methods that facilitate the acquisition processes, such as working on tasks involving not only comprehension but also production, aimed above all at acquiring knowledge based on explanation, and applying this knowledge in different contexts. In order to generalise across different situations or contexts, it is essential to develop CT’s fundamental competences—such as explanation (the search for causality), decision-making and problem-solving. These skills are domain-independent: they are needed in virtually any situation or context [12].
Before specifying the research problem and the hypotheses, it is worth recalling the conceptual approach published in different works, some of which have been cited above. We understand that “to think critically is to reach the best explanation for a fact, phenomenon or problem, in order to gain insight and solve it effectively” ([18], p. 27). Knowing or solving a problem requires knowing which causes are responsible for certain events or problems. If the explanation is sound, we can choose the best course of action or the best option to resolve the situation effectively by bringing about the desired change. Therefore, it is the explanation that determines the decision and the solution and, finally, the change and the resulting well-being or achievement (a full description of our approach can be found in [19]). If improving critical thinking is our goal, we believe that this is the best way to achieve it. Our intervention therefore aims to achieve change through the development of the aforementioned skills. This is the intervention framework for the development of such skills (for more information about the intervention, see [18,20,21,22]).
To ascertain whether CT skills are being properly developed or can effectively predict performance in higher education, we need reliable, valid assessment tools. Fortunately, such an assessment is available. We used a CT test—PENCRISAL [23,24]—to assess a number of first-term psychology undergraduates during the academic years 2011–2012 to 2015–2016, as part of the CT content taught within one of their subjects. PENCRISAL assesses five dimensions of critical thinking: deductive reasoning (DR), inductive reasoning (IR), practical reasoning (PR) or argumentation, decision making (DM) and problem solving (PS). We also obtained students’ average academic grades and average university entrance grades. In addition, we collected some of the main dimensions of achievement motivation, using the Manassero Achievement Motivation Scale [25], although these were not used in this study beyond the procedures for monitoring course performance. With this information, we organised this article around our study of PENCRISAL’s structural and criterion validity as a tool for assessing critical thinking in higher education students.
As we described at the beginning, the complex problems of today’s world cannot be solved with a classical model of intelligence; these changes force the citizens of this century to increasingly develop their ability to make sound decisions and to solve problems effectively. These competencies are precisely part of the fundamental CT competencies, widely studied in [18,19,26,27,28]. However, there is still a lack of studies proving that the CT model is better than the classic model of intelligence at solving the problems of today’s world [27]. This study aims to fill this gap. Therefore, one of the objectives of our work is to demonstrate that CT predicts academic performance well, something that has not yet been proven. The other objective is to demonstrate the structural validity of our CT assessment test. We must be sure that our measurement instrument measures what it claims to measure and that, in addition, it predicts the performance of our university students. To this end, we used a sample of more than 600 students to test the construct validity, and a sample of more than 200 students who underwent a CT instructional programme to test the predictive validity.
Thus, the purpose of our study is to test the structural and criterion validity of our CT assessment test and, therefore, to show the degree to which CT can account for academic performance. It is not our intention here to deal with conceptual developments or intervention procedures. However, we have cited our own works and those of relevant authors so that the interested reader can delve into these topics there. Let us now describe the methodology used in our study.
3. Results
Table 1 shows the distribution of students’ scores on the five PENCRISAL subtests, obtained by adding together the marks for their respective items. Along with the minimum and maximum values, we show the mean, standard deviation, and the skewness and kurtosis coefficients of the distributions. These values cover the two timepoints when PENCRISAL was applied, i.e., at the beginning (pre) and at the end (post) of the course.
As Table 1 shows, the students’ scores on all five critical thinking subtests were low, with minimum scores of zero or just one point. The post-course scores were higher than the pre-course scores, meaning that by the end of the course, some students had improved their performance on the subtests. The only exception was the problem-solving (PS) subtest, where the averages at the two timepoints were very similar, and even slightly higher in the pre-course measurement (before starting the subject). In addition, the skewness and kurtosis indices for the distribution of results in the sample fit a Gaussian distribution, as they were always less than unity. Lastly, the standard deviation of the results for the five subtests rose in the post-course measurements. This means there was more heterogeneity in the students’ scores at the end of the course, which may be because some students benefitted from the critical thinking lessons and others did not. This did not occur in the PS subtest, where the variance values were similar in the pre-course and post-course measurements.
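As an illustration of how the descriptive screening in Table 1 could be reproduced, the sketch below computes the same statistics with pandas. It is only a sketch under assumptions: the file and column names are hypothetical, and pandas reports excess kurtosis, which matches the less-than-unity heuristic used above.

```python
import pandas as pd

# Hypothetical layout: one row per student, one column per subtest
# and timepoint, e.g. "dr_pre", "dr_post", ..., "ps_pre", "ps_post".
scores = pd.read_csv("pencrisal_scores.csv")  # assumed file name

for col in scores.columns:
    s = scores[col].dropna()
    print(
        f"{col}: min={s.min()}, max={s.max()}, "
        f"M={s.mean():.2f}, SD={s.std(ddof=1):.2f}, "
        f"skew={s.skew():.2f}, kurt={s.kurt():.2f}"
    )
    # Normality screen used in the text: |skewness| and |excess
    # kurtosis| below 1 are read as compatible with a Gaussian shape.
```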
To analyse the changes in students’ scores on each subtest, pre- and post-course, we calculated the mean differences using the t-test for dependent samples (paired-samples t-test). To gauge the magnitude of the difference between the two timepoints, Cohen’s d was estimated. Most of the differences are statistically significant: PencriDR (t = −15.209, df = 681, p < 0.001, d = −0.582); PencriIR (t = −12.067, df = 681, p < 0.001, d = −0.483); PencriPR (t = −18.237, df = 681, p < 0.001, d = −0.698); and PencriDM (t = −15.847, df = 681, p < 0.001, d = −0.607). In all four of these subtests, Cohen’s d is close to 0.50 in absolute value, a medium effect size [31]. The p and d values indicate a clear improvement in the scores on these four PENCRISAL subtests. On the problem-solving subtest, the mean was slightly higher in the pre-course test, but the difference was not statistically significant and the effect size was very small (t = 1.326, df = 681, p = 0.093, d = 0.051).
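As a minimal sketch of how these contrasts could be replicated (the reported d values appear consistent with the dependent-samples formulation d = t/√n), assuming the hypothetical scores data frame from the sketch above:

```python
import numpy as np
from scipy import stats

def paired_change(pre, post):
    """Paired-samples t-test plus a paired Cohen's d.

    With n pairs, d = t / sqrt(n), i.e. the mean of the pre-post
    differences divided by their standard deviation.
    """
    t, p = stats.ttest_rel(pre, post)  # tests mean(pre - post) = 0
    n = len(pre)
    return t, n - 1, p, t / np.sqrt(n)  # t, df, p, d

# Hypothetical usage for the deductive reasoning subtest:
# t, df, p, d = paired_change(scores["dr_pre"], scores["dr_post"])
```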
Bearing in mind one of the central objectives of our study, Table 2 shows the results for PENCRISAL’s structural validity. Following the test authors’ theoretical model, we present the indices of fit for the unidimensional model, estimated with the AMOS programme [32,33]. This is because each of the five tests assesses a different dimension or cognitive process relevant to critical thinking, all adding up to one overall score. In other words, the PENCRISAL test proposes a measure of general critical thinking ability covering the diversity of cognitive aspects represented in the five subtests: deductive reasoning, inductive reasoning, practical reasoning, decision making and problem solving. As pre-course and post-course data were available, Table 2 shows the indices of fit for these two test timepoints (PENCRISAL_M1 and PENCRISAL_M2), indicating the confidence intervals (CI) for the RMSEA coefficient.
There were good indices of fit for both applications of PENCRISAL, slightly better at the end of the course (timepoint 2). In both cases, CMIN/DF was below 5.0, the TLI and CFI indices were above or very close to 0.95, and RMSEA was below 0.08 (in the pre-course measurement, the upper bound of the RMSEA 90% confidence interval rose to 0.09). These indices are within the required parameters [34]. The pre-course PENCRISAL model can be improved in one respect: if we correlate the errors of the deductive reasoning and inductive reasoning subtests, the upper bound of the RMSEA confidence interval falls below 0.08. We will return to this in the discussion.
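For reference, the fit indices reported in Table 2 have standard definitions, sketched below in the usual notation, where χ²_M and df_M belong to the tested model, χ²_0 and df_0 to the baseline (null) model, and N is the sample size:

\[
\mathrm{CMIN/DF} = \frac{\chi^2_M}{df_M},
\qquad
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2_M - df_M,\, 0)}{df_M\,(N - 1)}},
\]
\[
\mathrm{CFI} = 1 - \frac{\max(\chi^2_M - df_M,\, 0)}{\max(\chi^2_0 - df_0,\, 0)},
\qquad
\mathrm{TLI} = \frac{\chi^2_0/df_0 - \chi^2_M/df_M}{\chi^2_0/df_0 - 1}.
\]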
These indices highlight the fact that the five subtests combine to give one overall score—understood here as critical thinking. The contribution of each subtest to the overall factor is shown in Figure 1, with the two timepoints presented separately.
The factor weightings of some subtests increased between the start and end of the course, although in two cases, they remained the same (deductive reasoning) or decreased (practical reasoning). The deductive reasoning subtest was less strongly related to the latent critical thinking variable at both assessment timepoints. Both the inductive reasoning and deductive reasoning subtests were less strongly related to the general critical thinking factor than the other three subtests (practical reasoning, decision making and problem solving).
Turning to the second objective of this study—criterion validity—Table 3 shows the distribution of students’ results from the pre-course and post-course PENCRISAL tests, admission scores and final average course scores. In addition to maximum and minimum values, the table includes the means, standard deviations, skewness and kurtosis, and the correlations between variables. The PENCRISAL score is the sum of the students’ scores on the five subtests; these results come from a subsample of 242 students.
All of the correlations were statistically significant—which is in line with our second objective. The values for the correlations between the criterion variables were the highest, which may indicate intellectual baseline differences and cognitive improvements over the years. These data require further consideration, which we cover in the following section.
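As a final sketch, the criterion correlations in Table 3 could be computed as follows; the column names for the CT totals and the two criterion variables are again hypothetical:

```python
import pandas as pd

# Hypothetical columns: total PENCRISAL score at each timepoint
# plus the two criterion variables.
data = pd.read_csv("pencrisal_totals.csv")  # assumed file name
cols = ["pencrisal_pre", "pencrisal_post", "admission_score", "course_mean"]

# Pearson correlation matrix between the CT totals and the criteria.
print(data[cols].corr(method="pearson").round(2))
```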
4. Discussion and Conclusions
Overall, the results are in line with our initial proposals. Critical thinking (CT) can account reasonably well for academic performance. Additionally, the test used to measure CT shows strong unidimensional structural validity, with an overall CT factor supported by the core dimensions of CT (deductive reasoning, inductive reasoning, practical reasoning, decision making and problem solving). In addition, the statistically significant correlations between the total scores in the pre-course test and post-course test and the criterion variables support our belief that CT can be a good predictor of academic performance.
Having an assessment of the level of CT before and after studying the topic provides useful information on the potential for improving these basic CT competences. We observed an increase in post-course versus pre-course scores in every factor or subtest except one: problem solving, where the post-course scores were no better than the pre-course scores. This reinforces the idea that CT can be improved with training and practice. Bearing in mind the cohorts measured (2011–2016), the lack of improvement in problem solving is probably due to the way this component was handled at the time. Decision making and problem solving both employ general strategies that are hard to separate, and at the time the study took place, the activities used to improve these skills were not yet able to distinguish between them sufficiently well. We have since managed to eliminate this overlap.
There are some less robust findings that need to be substantiated. A correction was made to the pre-course measurement to reduce the RMSEA index so that it better reflected the nature of the sample. The DR and IR subtests tap more formal skills than the others, which makes them less sensitive to change because they are more difficult to apply and generalise; this results in a weaker correlation with the overall CT factor. It is important to remember that the CT test items are all problems that must be answered by using, applying and generalising those specific processes; formal processes are less flexible and less easily adapted for use outside their essentially algorithmic domain.
The indices of fit support a unidimensional model, which also worked better at the end of the course; this improvement may reflect the reinforcement of all dimensions of CT as a whole. Good performance is not possible without all the CT core competencies working together, which may explain the unidimensional nature of the PENCRISAL test in terms of its structural validity.
In terms of criterion validity, we found significant relationships between CT and academic performance, and an even stronger relationship between this criterion variable and the university admissions measurement. As we noted above, these data go in the direction we expected, based on our initial approach. However, there was no relationship between CT and the university entrance scores. One way to interpret this lack of relationship is that the level of CT measured at the start of the course was fairly low relative to the test’s reference standards. Moreover, whilst the post-course measurements showed an increase in the level of CT, it was still too limited to capture the relationship that should exist with the entrance scores. CT skills are complex and require significant levels of expertise before correlations can emerge with measures of a different intellectual nature—such as those that may underlie a university entrance score, which is the result of several years of schooling. In a study of how permanent the change in CT is following teaching, measured four years later, we saw greater improvement than immediately after the intervention. That improvement is attributable not only to the CT programme, but also to the experience and education gained throughout those four years at university [20], although this is not always the case (see [35]). We can therefore say that in this study, the relationships between CT and academic performance are easier to see (as both are the result of experience and education) than those between CT and the cut-off or selection measures for university admission, which capture a more stable, experience-independent threshold.
A number of implications emerge from our study. First, in order to use CT levels as a predictor, we need levels that are above the average of the test scales used. Second, it is possible to improve CT competencies with the right instructional tools and sufficient practice and procedural work. Lastly, we need to take CT measurements in order to ascertain our starting point and thus see what has been achieved. If we believe that today’s world requires these complex skills, we also need to ascertain the degree to which they are available from the outset. Without such an assessment, we cannot see how far we have come.