1. Introduction
Educational assessment is a key aspect of high-quality education [1,2]. In the United States, assessment is seen as an essential lever for facilitating high-quality education. States across the U.S. mandate the use of standardized content tests to assess, monitor, and support English learner (EL) student learning. When standardized assessments are administered to ELs, questions of validity arise due to a variety of factors. An EL is “an individual who has sufficient difficulty speaking, reading, writing, or understanding the English language to be denied the opportunity to learn successfully in classrooms where the language of instruction is English or to participate fully in the larger U.S. society” [3]. English learners can be found across the globe, but this article focuses on English learners in one state in the United States.
Standardized testing and questions of validity are exemplified in the state of Colorado in the U.S. In 2018, the percentage of ELs enrolled in public schools was 10% or higher in eight U.S. states, including Colorado [3]. In the 2016–2017 school year, ELs represented approximately 12% of students in Colorado public schools and were the fastest-growing student population. The majority of ELs in Colorado, between 77% and 80%, are native Spanish speakers [4]. Colorado law requires every student to take standardized content assessments in English, regardless of how long they have been enrolled in U.S. schools. The Colorado Measures of Academic Success (CMAS) is Colorado’s standards-based assessment designed to measure the Colorado Academic Standards. The CMAS Science test reports four performance levels: distinguished, strong, moderate, and limited. The 2015 CMAS Science results point to a disparity between ELs and their non-EL peers: statewide, 29% of non-ELs earned distinguished or strong scores on the 8th grade CMAS Science test, whereas only 6% of ELs scored at those two performance levels. Moreover, 61% of ELs received the lowest rating [5].
A significant body of research reveals that high-stakes assessments present a challenge for ELs [6,7,8,9,10,11,12,13,14,15,16,17]. Research points to three significant issues that impact the assessment of ELs. First, ELs’ test performance may be negatively impacted by the high language demands of high-stakes assessments [6]. This assertion is supported by research that established a strong association between increasing linguistic complexity and decreasing test performance [7]. Second, a growing body of research demonstrates that ELs’ test performance reflects their English language attainment rather than their content knowledge [8,9,10]. As a case in point, in a study of 1700 ELs who were tested in English and Spanish on a standardized math achievement test, the ELs answered more items correctly in their home language [11]. Third, researchers assert that assessment results for EL students are not valid; that is, the assessments are not measuring the intended construct [12,13]. For example, in the case of standardized science assessments, the intended construct is science content knowledge; however, when an assessment is given to ELs in English, the assessment measures their English language proficiency instead [14].
Researchers have established that there is a persistent disparity in test scores between ELs and non-ELs in academic content areas due to bias in testing [9,15]. However, there is a dearth of research on the factors that impact EL test performance in general science and in content-specific strands, and, subsequently, on the implications for the validity of high-stakes science tests [16]. It is vital to build a body of research that examines the extent to which current science testing practices adequately capture ELs’ academic potential. The misuse of tests can lead to marginalization and discrimination toward immigrant and minority groups [17]. For instance, if testing practices do not adequately capture ELs’ academic potential, the “problem” of EL test performance may be centered on ELs themselves. This study builds on current research on persistent disparities in test scores by examining EL science test performance in general and in content-specific strands based on factors known to impact EL test performance (e.g., socioeconomic status, English language proficiency) and by extending this research to less commonly examined factors (e.g., home language, productive and receptive elements of language).
The purpose of this study is to examine factors that are predictive of ELs’ performance on general and content-specific science standardized tests and to identify implications for the construct validity of high-stakes science assessments. Specifically, we, the researchers, collected and analyzed data from eighth-grade ELs’ performance on the statewide CMAS Science general and content-specific (physical, life, and earth sciences) exams, as well as on the English language proficiency exam, Assessing Comprehension and Communication in English State-to-State (ACCESS) for English Language Learners 2.0. We examined the following variables in relation to CMAS test performance: socioeconomic status, home language, English language proficiency, and receptive and productive elements of language. Socioeconomic status (SES) is defined as “the social standing or class of an individual or group” [18]. Home language (HL) is the native language spoken in the home [19]. In this research, home language was designated as HL Spanish or HL Other because Spanish is the dominant home language; approximately 80% of ELs in the state of Colorado speak Spanish. There is no consensus on the definition of English language proficiency (ELP); the definition depends on a variety of contextual factors [20]. Receptive language and productive language are defined as the ability to comprehend language and the ability to produce language, respectively [21]. The research question is as follows: What factors influence the performance of ELs on a standardized science assessment, including overall performance and content-specific domains? The section that follows provides a synthesis of literature related to factors that impact test performance.
3. Method
This study examines the factors that influence the academic achievement of ELs in high-stakes science assessments, both general and content-based. The research question is as follows: What factors influence the performance of ELs on a standardized science assessment, including overall performance and content-specific domains? We, the researchers, used hierarchical multiple regression to conduct the analysis. This method allows predictors to be tested in a particular order of theoretical interest to determine the predictive relationships between variables. Following standard practice in hierarchical multiple regression, variables already known to predict achievement were entered into the model first to account for their share of the variance; the variable(s) of interest were then added to determine whether they predicted additional variance. Research has shown that SES and ELP are predictors of academic achievement for ELs; therefore, these were included in the hierarchical regression to account for some of the variance. Home language and the receptive and productive elements of language were used in the model to see whether they explained any variance above and beyond that of SES and ELP. We obtained the secondary data used for this analysis through Colorado Department of Education (CDE) databases, including the Student Biographical Data Grid used in the two state-level assessments, CMAS Science and ACCESS for ELLs 2.0. The individual data were masked using a unique student identifier, and matched student identifiers were used in the secondary data analysis.
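To make the block-entry logic concrete, the sketch below shows how a hierarchical multiple regression of this kind could be run. It is a minimal illustration rather than the authors' SPSS procedure; the dataset and column names (ses, hl_spanish, elp, cmas_score) are hypothetical stand-ins for the CDE records.

```python
# Minimal sketch of hierarchical multiple regression with block entry.
# The data below are synthetic placeholders; the study used CDE records.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "ses": rng.integers(0, 2, n),         # 1 = FRL, 0 = non-FRL
    "hl_spanish": rng.integers(0, 2, n),  # 1 = HL Spanish, 0 = HL Other
    "elp": rng.normal(350, 20, n),        # overall ACCESS scale score
})
df["cmas_score"] = 500 + 0.8 * df["elp"] - 5 * df["ses"] + rng.normal(0, 15, n)

blocks = [
    ["ses"],                       # block 1: control variable
    ["ses", "hl_spanish"],         # block 2: + home language
    ["ses", "hl_spanish", "elp"],  # block 3: + English language proficiency
]

prev_r2 = 0.0
for i, predictors in enumerate(blocks, start=1):
    X = sm.add_constant(df[predictors])  # add the intercept term
    model = sm.OLS(df["cmas_score"], X).fit()
    # The change in R^2 is the variance explained beyond the earlier blocks.
    print(f"Block {i}: R^2 = {model.rsquared:.3f}, "
          f"delta R^2 = {model.rsquared - prev_r2:.3f}, "
          f"F = {model.fvalue:.2f}, p = {model.f_pvalue:.4g}")
    prev_r2 = model.rsquared
```

Entering the blocks in a fixed order mirrors the study's design: each block's change in R² isolates the contribution of the newly added predictor(s).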
3.1. Sample
The state of Colorado designates ELs into the following subgroups: NEP (Non-English Proficient), LEP (Limited English Proficient), FEP (Fluent English Proficient), FELL (Former English Language Learner), and PHLOTE (Primary Home Language Other Than English). NEP, LEP, and FEP are part of the Colorado Revised Statutes as official language designations for students who are learning English as a second language and who are receiving extra language support. FELL and PHLOTE are used for students who are not receiving extra support services but whose language development is influenced by a home language other than English [4].
In 2015, 64,104 EL students took the CMAS Science assessment. The sample for this study was 6402 eighth-grade EL students who took the 2015 CMAS Science assessment and ACCESS for ELLs 2.0. Only the ELs coded as NEP and LEP who took both the CMAS Science and ACCESS exam were included in the sample. Per Colorado law, only ELs designated as NEP and LEP are required to take the ACCESS exam. Approximately 92% of the EL students who took the tests were identified as Spanish speakers.
3.2. Instruments
Colorado law requires that all students enrolled in public schools take the CMAS Science at the eighth-grade level, and students identified as NEP and LEP take an annual assessment of English language proficiency—ACCESS for ELLs 2.0. Further information on these assessments is provided below.
3.2.1. CMAS Science Tool
The CMAS is Colorado’s standards-based assessment designed to measure the Colorado Academic Standards in science. For the 2015 CMAS, each assessment consisted of three sections. All sections contained a combination of selected-response items (28), technology-enhanced items (15), and constructed-response items (17). A subset of the science assessment includes simulation-based item sets, which are groups of items that relate to a scientific investigation or experiment. CMAS scores were validated using various sources of validity evidence, including test content, response processes, internal structure, and fairness [5]. Reliability, using Cronbach’s alpha, was reported at 0.93 for the overall assessment, and 0.82, 0.81, and 0.83 for the respective content domains: physical science, life science, and earth science [5]. The assessment design follows a universal design for learning approach that specifically decreases the language load and removes extraneous language. Additionally, the integration of technology-enhanced items and simulations decreases the language load with regard to text.
3.2.2. ACCESS for ELLs 2.0 Tool
ACCESS is an English language proficiency test designed by the World-Class Instructional Design and Assessment (WIDA) consortium [33]. ACCESS for ELLs is the collective name for WIDA’s suite of English language proficiency assessments. Colorado uses this instrument as its state English language proficiency assessment, administered annually to ELs. ACCESS assesses academic language in language arts, mathematics, science, and social studies. Four composite scores are reported for the assessment: oral (listening and speaking domains), literacy (reading and writing domains), comprehension (listening and reading domains), and overall (listening, speaking, reading, and writing). In 2016, the ACCESS test for eighth-grade students had a composite reliability score of α = 0.930. Evidence for the reliability and validity of the ACCESS exam is provided through the Center for Applied Linguistics (CAL) Validation Framework [33].
3.3. Analysis and Procedures
We, the researchers, were interested in the variables that influence the academic achievement of ELs in high-stakes science assessments, both general and content-based. This study includes both performance and demographic variables. The dependent variable was ELs’ performance on the 8th grade CMAS Science assessment (overall scale score and scale score by content domain, along a continuous scale). The four independent variables were socioeconomic status (SES), home language (HL), English language proficiency (ELP), and receptive and productive language (R&P). ELP and R&P are composite scores, and a stratified Cronbach’s alpha coefficient was used to compute and weight the contribution of each domain score to the composite. SES was a control variable, since research has established it as a predictor of achievement [34]. IBM SPSS 20 was used for all analyses in this research study.
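For readers unfamiliar with stratified alpha, the sketch below shows the standard computation of a composite reliability built from domain scores, where stratified α = 1 − Σσ²ᵢ(1 − αᵢ)/σ²ₓ. The domain variances and reliabilities shown are hypothetical placeholders, not values from the study.

```python
# Stratified Cronbach's alpha for a composite built from domain scores:
#   alpha_strat = 1 - sum(var_i * (1 - alpha_i)) / var_total
# All numbers below are hypothetical placeholders.
domains = {
    # domain: (score variance, Cronbach's alpha for that domain)
    "listening": (120.0, 0.88),
    "speaking":  (150.0, 0.85),
    "reading":   (130.0, 0.90),
    "writing":   (140.0, 0.87),
}
var_total = 1600.0  # variance of the weighted composite score (hypothetical)

error_variance = sum(var * (1 - alpha) for var, alpha in domains.values())
alpha_stratified = 1 - error_variance / var_total
print(f"Stratified alpha = {alpha_stratified:.3f}")  # ~0.957 with these inputs
```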
The variables were entered as “blocks”. This allowed for the testing of two models and the analysis of the predictability of each individual variable. The first block contained SES, coded as free and reduced lunch (FRL) versus non-FRL. The second block added primary home language (HL Spanish and HL Other). The third block added either overall English language proficiency (ELP), based on performance on the overall scale score, or the receptive and productive elements of language, entered as separate variables in combination. Receptive language (RL) is based on the receptive domains of language (reading and listening), and productive language (PL) on the productive domains (writing and speaking). These variables were computed from the ACCESS data. WIDA reports the receptive composite as the comprehension score but does not calculate or report a productive composite. Therefore, we created this variable using the same procedure as WIDA, combining the speaking and writing domains of language with the weights that WIDA used for each domain per their 2015 technical manual (speaking = 30% and writing = 70%) [33]. The calculation of the production score is thus consistent with the procedure WIDA used to calculate comprehension.
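A short sketch of the composite computation follows. The 30/70 speaking/writing weights for the productive score come from the passage above; the mirrored 30/70 listening/reading weighting for the receptive (comprehension) score is our assumption, based on the statement that the production score follows WIDA's comprehension procedure. The domain scores shown are hypothetical.

```python
import pandas as pd

# Hypothetical ACCESS domain scale scores, one row per student.
access = pd.DataFrame({
    "listening": [370, 355, 390],
    "speaking":  [340, 360, 375],
    "reading":   [365, 350, 385],
    "writing":   [355, 345, 380],
})

# Productive composite per the weights stated above:
# speaking = 30%, writing = 70% (WIDA 2015 technical manual).
access["productive"] = 0.3 * access["speaking"] + 0.7 * access["writing"]

# Receptive (comprehension) composite, assuming the analogous weighting
# WIDA uses for comprehension: listening = 30%, reading = 70%.
access["receptive"] = 0.3 * access["listening"] + 0.7 * access["reading"]

print(access[["receptive", "productive"]])
```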
We used hierarchical multiple regression to evaluate the relationship between the independent variables and the dependent variable while controlling for the impact of the other independent variables. Variables were entered in blocks in a fixed order of entry to control for the effects of covariates and to test the effects of certain predictors independent of the influence of others.
The multiple regression assumptions checked were linearity, homoscedasticity, independence of errors, and absence of multicollinearity. The minimum sample-size rule of five cases per predictor was met; the sample was large. Analysis of residual plots revealed that the assumption of linearity was met. Outliers were examined using standardized residuals, and the plot of these residuals was examined using the +/−3 rule to check for homoscedasticity. The Durbin–Watson test for correlation of residuals was used to check for independence of errors, using values between 1.5 and 2.5 [35]. Tolerance levels for all independent variables were checked against the 0.10 minimum to assess multicollinearity; this assumption was not met. The section that follows presents the results of the hierarchical multiple regression analysis.
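The diagnostics described above can be reproduced along the following lines. This is a sketch using standard statsmodels utilities, with synthetic stand-in data in place of the study's records.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic stand-in data mirroring the block-three model (hypothetical).
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "ses": rng.integers(0, 2, n),
    "hl_spanish": rng.integers(0, 2, n),
    "elp": rng.normal(350, 20, n),
})
df["cmas_score"] = 500 + 0.8 * df["elp"] - 5 * df["ses"] + rng.normal(0, 15, n)

X = sm.add_constant(df[["ses", "hl_spanish", "elp"]])
model = sm.OLS(df["cmas_score"], X).fit()

# Independence of errors: Durbin-Watson near 2 (rule of thumb 1.5-2.5).
print(f"Durbin-Watson = {durbin_watson(model.resid):.2f}")

# Outliers / homoscedasticity: standardized residuals within +/-3.
std_resid = model.get_influence().resid_studentized_internal
print(f"Residuals outside +/-3: {int((np.abs(std_resid) > 3).sum())}")

# Multicollinearity: tolerance = 1/VIF; flag tolerance below 0.10.
for i, name in enumerate(X.columns):
    if name != "const":
        vif = variance_inflation_factor(X.values, i)
        print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")
```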
4. Results
4.1. Predictors of Overall Science Achievement
We used hierarchical multiple regression analysis to examine overall achievement on the CMAS Science test as the criterion variable. The multiple regression assumptions of linearity and homoscedasticity were met. Multicollinearity was evaluated using a minimum tolerance level of 0.10 [35] and a maximum Variance Inflation Factor (VIF) of 10 [36]; the VIF recommendation of 10 corresponds to the tolerance recommendation of 0.10 (i.e., 1/0.10 = 10). This assumption was violated for block four in the full regression model. The violation occurred between the overall ELP score and the R&P elements of language scores, because the R&P elements of language are inherently contained within the overall ELP. Therefore, to correct this violation, we ran two three-block models: the first used the original three blocks as outlined in the method, and the second used the original first two blocks but replaced the overall ELP in block three with the R&P elements of language.
Table 1 displays the effect size measures (R²), change in R², and adjusted R² for the full model, and Table 2 displays the pooled unstandardized regression coefficients (B) and standardized regression coefficients (β). The changes in R² for each block suggest that, in both models one and two, SES and primary home language combined accounted for 1.0% of the variability. Adding overall English language proficiency in model one brought the variance accounted for to 44%, while adding the receptive and productive elements of language in model two brought it to 48% in predicting overall achievement on the CMAS Science test.
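For reference, the significance of a change in R² between nested blocks is assessed with the F-test for R² change; a standard formulation (stated here for the reader, not quoted from the study) is:

```latex
% F-test for the change in R^2 from a reduced block with k_1 predictors
% to a fuller block with k_2 predictors, given sample size n:
\[
F_{\mathrm{change}} =
  \frac{(R_2^2 - R_1^2)/(k_2 - k_1)}
       {(1 - R_2^2)/(n - k_2 - 1)},
\qquad
R_{\mathrm{adj}}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}.
\]
```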
Table 2 displays the two models. In the first model, block one, SES was a statistically significant predictor of academic achievement on the CMAS Science test, F(1, 6400) = 47.49, p < 0.001. In block two, SES and HL were statistically significant predictors, F(2, 6399) = 25.28, p < 0.001. In block three, SES, HL, and ELP were statistically significant predictors, F(3, 6398) = 1703.47, p < 0.001. These variables accounted for 44% of the variance in academic achievement on the CMAS Science test.
In the second model, block one, SES was a statistically significant predictor of academic achievement on the CMAS Science test, F(1, 6400) = 47.49, p < 0.001. In block two, SES and HL were statistically significant predictors, F(2, 6399) = 25.28, p < 0.001. In block three, SES, HL, and the R&P elements of language were statistically significant predictors, F(3, 6398) = 1445.37, p < 0.001. These variables accounted for 48% of the variance in academic achievement on the CMAS Science test. Therefore, the R&P elements of language increased the predictability of science achievement by an additional 4% over overall English language proficiency. It is important to note that the productive elements of language were more strongly predictive than the receptive elements. All predictor variables had statistically significant correlations with overall CMAS Science achievement.
4.2. Predictors of Content Domains: Physical, Life, and Earth Science Achievement
Three separate hierarchical multiple regressions were calculated to predict academic achievement in the three strands of science (physical, life, and earth) within the CMAS Science test. For each strand, the first block contained SES (FRL and non-FRL), the second block added HL (Spanish and Other), and the third block added overall ELP. An additional hierarchical regression was performed for each strand, with blocks one and two as stated above and with block three containing the R&P elements of language as the predictor variables; a sketch of this procedure follows.
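The sketch below illustrates how the two three-block models could be looped over the three strands. The strand outcome columns (physical_score, life_score, earth_score) and all data are hypothetical stand-ins for the study's records.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: predictors plus one outcome column per science strand.
rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "ses": rng.integers(0, 2, n),
    "hl_spanish": rng.integers(0, 2, n),
    "elp": rng.normal(350, 20, n),
    "receptive": rng.normal(350, 20, n),
    "productive": rng.normal(345, 20, n),
})
strands = ["physical_score", "life_score", "earth_score"]
for strand in strands:
    df[strand] = 500 + 0.7 * df["elp"] + rng.normal(0, 15, n)

# Model 1 ends with overall ELP in block three; model 2 replaces it with
# the receptive and productive elements of language.
model_blocks = {
    "model 1": [["ses"], ["ses", "hl_spanish"], ["ses", "hl_spanish", "elp"]],
    "model 2": [["ses"], ["ses", "hl_spanish"],
                ["ses", "hl_spanish", "receptive", "productive"]],
}

for strand in strands:
    for label, blocks in model_blocks.items():
        final_fit = None
        for predictors in blocks:  # enter blocks in fixed order
            X = sm.add_constant(df[predictors])
            final_fit = sm.OLS(df[strand], X).fit()
        print(f"{strand} / {label}: R^2 = {final_fit.rsquared:.3f}")
```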
4.2.1. Physical Science
With the physical science strand on the CMAS Science test as the criterion variable, Table 3 displays the effect size measures (R²), change in R², and adjusted R² for the full model, and Table 4 displays the pooled unstandardized regression coefficients (B) and standardized regression coefficients (β). The changes in R² in each block suggest that, for both models one and two, SES and HL combined accounted for 1.0% of the variability.
Block one, SES, was a statistically significant predictor of physical science achievement on the CMAS Science test, F(1, 6400) = 47.16, p < 0.001. Block two (SES and HL) was a statistically significant predictor, F(2, 6399) = 26.88, p < 0.001. In the first model, block three (SES, HL, and ELP) was a statistically significant predictor, F(3, 6398) = 1043.89, p < 0.001; these variables accounted for 33% of the variance. In the second model, block three (SES, HL, and the R&P elements of language) was a statistically significant predictor, F(3, 6398) = 891.71, p < 0.001; these variables accounted for 36% of the variance. Therefore, the R&P elements of language increased the predictability of the variability of physical science achievement by an additional 3% over overall ELP, and the productive elements of language were more strongly predictive than the receptive elements.
4.2.2. Life Science
With the life science strand on the CMAS Science test as the criterion variable, Table 5 displays the effect size measures (R²), change in R², and adjusted R² for the full models, and Table 6 displays the pooled unstandardized regression coefficients (B) and standardized regression coefficients (β). The changes in R² in each block suggest that, in models one and two, SES and HL combined accounted for 1.1% of the variability.
Block one, SES, was a statistically significant predictor of life science achievement on the CMAS Science test, F(1, 6400) = 32.78, p < 0.001. Block two (SES and HL) was a statistically significant predictor, F(2, 6399) = 19.30, p < 0.001. In the first model, block three (SES, HL, and ELP) was a statistically significant predictor, F(3, 6398) = 1129.70, p < 0.001; these variables accounted for 35% of the variance. In the second model, block three (SES, HL, and the R&P elements of language) was a statistically significant predictor, F(3, 6398) = 932.48, p < 0.001; these variables accounted for 37% of the variance. Therefore, the R&P elements of language increased the predictability of the variability of life science achievement by an additional 2% over overall ELP, with the productive elements of language more strongly predictive than the receptive elements.
4.2.3. Earth Science
With the earth science strand on the CMAS Science test as the criterion variable, Table 7 displays the effect size measures (R²), change in R², and adjusted R² for the full models, and Table 8 displays the pooled unstandardized regression coefficients (B) and standardized regression coefficients (β). The changes in R² in each block suggest that, in both models one and two, SES and HL combined accounted for 1.0% of the variability.
Block one, SES, was a statistically significant predictor of earth science achievement on the CMAS Science test, F(1, 6400) = 33.02, p < 0.001. Block two (SES and HL) was a statistically significant predictor, F(2, 6399) = 16.62, p < 0.001. In the first model, block three (SES, HL, and ELP) was a statistically significant predictor, F(3, 6398) = 1032.71, p < 0.001; these variables accounted for 33% of the variance. In the second model, block three (SES, HL, and the R&P elements of language) was a statistically significant predictor, F(3, 6398) = 870.20, p < 0.001; these variables accounted for 35% of the variance. Therefore, the R&P elements of language increased the predictability of the variability of earth science achievement by an additional 2% over overall ELP, with the productive elements of language being more strongly predictive than the receptive elements.