**7. Discussion**

The aim of the present study was to design an item pool for assessing the word recognition skills of elementary school students. The importance of word recognition for reading fluency was established. Overall, word recognition can be seen as a potential indicator of early reading skills [12,13]. To make assessment more economical in school practice, the items were designed as silent reading tasks. This allows a whole class to be assessed at once. It is assumed that the results of such tasks are largely related to oral reading skills [53,54].

Another aim of the study was to verify the psychometric suitability of these items. For this purpose, the items were distributed across different test booklets, which were linked to one another by anchor items. Thus, not every child in the sample had to work on all the items, yet item response theory still made it possible to estimate item parameters for every item and person parameters for all students. Of the original 1277 items, a total of 1071 met the previously set criteria. The other items were dropped because of unfavorable item discrimination or because they overfit or underfit the computed Rasch model. Overall, the reliability of the individual test booklets is high (lowest average α = .85), which speaks for the homogeneity of the items. The correlation with other reading tests is also high (r = .64). This shows that word recognition skills are related to reading speed and reading comprehension.
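A design in which test templates share anchor items can only be calibrated on a common scale if every template is connected to every other one, directly or through a chain of shared items. The following is a minimal sketch of such a connectivity check, not the procedure used in this study; the function name `booklets_are_linked`, the template labels, and the item identifiers are hypothetical.

```python
from collections import deque


def booklets_are_linked(booklets):
    """Return True if all booklets are connected through shared (anchor) items.

    `booklets` maps a booklet label to the set of item ids it contains.
    Two booklets count as directly linked when they share at least one item.
    """
    names = list(booklets)
    if not names:
        return True
    seen = {names[0]}
    queue = deque([names[0]])
    # Breadth-first search over booklets, following shared items as edges.
    while queue:
        current = queue.popleft()
        for other in names:
            if other not in seen and booklets[current] & booklets[other]:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(names)


# Hypothetical design: three booklets chained by the anchor items "a1" and "a2".
design = {
    "A": {"w01", "w02", "a1"},
    "B": {"a1", "w03", "a2"},
    "C": {"a2", "w04", "w05"},
}
linked = booklets_are_linked(design)
```

Booklets A and C share no item here, but both are linked to B, so the design as a whole is connected.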

Although there are indications that some of the items differ in difficulty for girls and boys (in the sense of test fairness), these differences can be regarded as minor. No items measure in the upper performance range, which is another limitation of this study. However, it can be argued that although word recognition is highly correlated with other reading skills, such as passage comprehension, throughout primary school [47], it is particularly important in the first years of primary school [34]. From the third grade onwards, it can be assumed that students have largely acquired word recognition skills [35,36]. In this respect, ceiling effects are to be expected in higher grades. The items of the item pool therefore differentiate particularly in the lower performance range. Against this background, however, they can be used for screening purposes. Overall, the targeting of the items appears to be adequate. Although word recognition skills seem to be a potential indicator of reading skills, they are not sufficient to diagnose higher reading skills. The use of further test instruments should be considered here.

Based on the item pool, CBM with parallel forms of the same structure were developed, which can be used every four weeks during a school year (10 parallel forms in each grade level). A proportion of easier (−2.5 ≤ σ < −1), medium (−1 ≤ σ < 0), and more difficult items (0 ≤ σ ≤ 1) was selected to map different areas of competence. In further investigations, the suitability of the developed CBM for progress monitoring will be investigated. In addition to classical quality criteria (objectivity, reliability, and validity), progress monitoring criteria must also be fulfilled (Fuchs, 2004). The study presented here only uses results from a cross-sectional study. Thus, no information can be derived on the suitability of the items for status diagnostic purposes. The scaling according to the item response theory, however, is to be seen as a meaningful addition to the classical test theory, which allows first statements about the suitability for progress monitoring (high reliability, unidimensionality of the measured construct, constant item difficulty, and high test fairness) [67–69]. In a further step, the measurement invariance over different test times should be investigated. In addition, the sensitivity to change as well as the applicability and effectiveness in the school context should be examined [42].
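The assembly of parallel forms from difficulty bands can be illustrated as follows. This is only a sketch under stated assumptions: the band limits are taken from the text, whereas the function names, the number of items per band, the random assignment, and the rule that forms do not share items are hypothetical and not reported in this study.

```python
import random


def band(sigma):
    """Map an item difficulty (sigma, in logits) to the band named in the text."""
    if -2.5 <= sigma < -1:
        return "easy"
    if -1 <= sigma < 0:
        return "medium"
    if 0 <= sigma <= 1:
        return "hard"
    return None  # outside the range used for the forms


def assemble_forms(difficulties, n_forms, per_band, seed=0):
    """Build n_forms forms with the same number of items from each band.

    Assumption: items are assigned at random and no item appears in two forms.
    """
    rng = random.Random(seed)
    pools = {"easy": [], "medium": [], "hard": []}
    for item, sigma in sorted(difficulties.items()):
        name = band(sigma)
        if name is not None:
            pools[name].append(item)
    for name, pool in pools.items():
        if len(pool) < n_forms * per_band[name]:
            raise ValueError(f"not enough {name} items for {n_forms} forms")
        rng.shuffle(pool)
    forms = []
    for f in range(n_forms):
        form = []
        for name, pool in pools.items():
            k = per_band[name]
            form.extend(pool[f * k:(f + 1) * k])  # disjoint slice per form
        forms.append(form)
    return forms


# Hypothetical calibrated difficulties for a handful of items.
pool = {
    "i01": -2.2, "i02": -1.8, "i03": -1.5, "i04": -1.1,
    "i05": -0.9, "i06": -0.7, "i07": -0.4, "i08": -0.1,
    "i09": 0.0, "i10": 0.3, "i11": 0.6, "i12": 1.0, "i13": 1.7,
}
forms = assemble_forms(pool, n_forms=2, per_band={"easy": 2, "medium": 2, "hard": 2})
```

Because every form draws the same number of items from each band, the forms have the same structure by construction; whether they are also equally difficult in practice would still have to be checked empirically.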

A calibrated item pool, as described in this study, provides many advantages. Different instruments can be flexibly developed from such a pool. It is also possible to realize adaptive testing, in which the selection of items during the test depends on the ability of the child, so that measurement is more precise at the child's ability level. In this context, the use of digital media appears to be particularly useful [35,70]. In addition, the time taken to process the items can be measured with the aid of a computer. This makes it possible to dispense with a time limit, which can put increased pressure on the students. A further advantage is the possible combination of diagnostic information and training material. Computer-aided training programs can react adaptively to the results of a preceding diagnosis. Digital technologies offer the potential to support struggling readers; however, little systematic research has focused on the effect of technology on reading skills [71]. With regard to silent reading, research has so far been even sparser.
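Adaptive item selection from a Rasch-calibrated pool can be sketched as follows. Under the Rasch model, the probability of a correct response is 1 / (1 + exp(−(θ − σ))) for ability θ and item difficulty σ, and an item is most informative when its difficulty is close to the current ability estimate. The code below is an illustrative sketch, not an implementation from this study; the function names and the example difficulties are hypothetical.

```python
import math


def rasch_probability(theta, sigma):
    """Probability of a correct response for ability theta and difficulty sigma."""
    return 1.0 / (1.0 + math.exp(-(theta - sigma)))


def item_information(theta, sigma):
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = rasch_probability(theta, sigma)
    return p * (1.0 - p)


def next_item(theta, difficulties, administered):
    """Pick the not-yet-administered item with maximum information at theta."""
    candidates = {i: s for i, s in difficulties.items() if i not in administered}
    if not candidates:
        return None
    return max(candidates, key=lambda i: item_information(theta, candidates[i]))


# Hypothetical pool of three calibrated items (difficulties in logits).
example_pool = {"a": -2.0, "b": -0.5, "c": 0.8}
first = next_item(-0.4, example_pool, administered=set())
second = next_item(-0.4, example_pool, administered={"b"})
```

For a child with an estimated ability of −0.4, the item with difficulty −0.5 is selected first because it lies closest to the ability estimate. A complete adaptive test would additionally re-estimate θ after each response and apply a stopping rule, both of which are omitted here.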

Future research will concentrate on factors that influence the difficulty of the items in the item pool. Possible variables in this context are structural features of the words (word type, word length, number of syllables or graphemes in a word), phonological, morphological, and orthographic characteristics, and occurrence in textbooks by grade level.

**Author Contributions:** The authors contributed equally to the conceptualization, writing, and revision of this article. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Ministry of Education, Science and Culture of Mecklenburg-Western Pomerania/Germany.

**Acknowledgments:** We acknowledge the financial support of Deutsche Forschungsgemeinschaft and Universität Rostock within the funding program Open Access Publishing.

**Conflicts of Interest:** The authors declare no conflicts of interest.
