*5.1. Replicability of Scientific Methods and Findings*

Demonstrating that methods and findings can be replicated across studies is critical for the self-correcting mechanism of the scientific method to function properly, and is imperative for generating solutions that can be applied widely. Hyperspectral data are commonly used in precision agriculture for predicting biochemical properties of vegetation and nutrients [18,26,51–53], yet the replicability of findings is rarely tested across different study sites. Scalable science is needed to develop solutions that can be applied globally. The adoption of PLS regression has addressed some of the computational challenges of working with high-dimensional hyperspectral data, both in remote sensing in general and in precision agriculture more specifically, but whether these solutions are transferable has not been widely explored. This study used hyperspectral data and in situ samples to build PLS models that predict plant and grain nutrients for tef, and to test how well those models replicate across different environments.

We found significant differences in model fits for both the PLS-Full and PLS-Wave models, along with differences in the number and locations of wavebands deemed important for prediction with the PLS-Wave models. Differences in the optimal number and location of wavebands for predicting nutrients via plant canopy measurements may be influenced by varying management practices, such as differing irrigation practices, which may lead to variable water content in the crops [41]. The ET fields were rain fed while the US fields were irrigated through to harvest; therefore, variable plant water content may have influenced which wavebands were selected, as water can cause excess noise in spectral signatures. This noise is particularly prevalent in wavebands that are sensitive to O-H bonds, including the spectral range 971–1400 nm [54–56]. While we removed a portion of this range from our data (see Figure 4), it is possible that the remaining wavelengths were affected by noise, making replication difficult.

The number of wavebands selected in the PLS models for ET was often smaller than the number selected for the US (Figure 5). This difference may reflect how the PLS models incorporate water-induced noise that may have been present in the US samples. These findings suggest that understanding how hyperspectral remote sensing methods, such as PLS regression, replicate across agricultural environments may require greater control over the conditions under which crops are cultivated. In addition to irrigation and other management differences, changes in latitude and sun angle could have led to differences in scattering and light absorption during field data collection [57], which would affect replication comparisons. However, hyperspectral readings for the grain samples were collected in a controlled laboratory setting with the same halogen lamp, so we can assume that any differences in wavebands for the grain models were not the result of such external factors (e.g., sun angle and latitude).
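To make the comparison of selected wavebands between sites concrete, the following is a minimal sketch of one possible overlap measure, the Jaccard index, computed on two sets of selected wavelengths. The function name `waveband_jaccard` and the example wavelengths are hypothetical and are not taken from this study; the measure is offered only as an illustration, not as the procedure used to produce Figure 5.

```python
def waveband_jaccard(bands_a, bands_b):
    """Jaccard index of two collections of selected wavebands (in nm).

    Returns |A intersect B| / |A union B|, a value between 0 (no shared
    wavebands) and 1 (identical selections). Wavelengths are rounded to the
    nearest nanometre before comparison, which is an assumption of this sketch.
    """
    set_a = {round(b) for b in bands_a}
    set_b = {round(b) for b in bands_b}
    union = set_a | set_b
    if not union:
        # Neither model selected any waveband; report zero overlap by convention.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical selections for two sites (illustrative values only).
site_one = [550, 680, 720]
site_two = [680, 720, 1650, 2200]
overlap = waveband_jaccard(site_one, site_two)
print(f"Jaccard overlap: {overlap:.2f}")
```

A low value indicates that the two models rely on largely different spectral regions. An exact-match measure such as this one is strict; matching wavebands within a tolerance of a few nanometres would be a reasonable variation where sensors or resampling differ.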

When comparing the nutrient content of the grains, there were clear differences between the study areas (Table 2). These large differences likely result in varying chemical property relationships within the grain, which in turn can result in differential absorption and scattering of electromagnetic energy. The large variance among biochemicals within the grain may result in noise for some nutrients, as nutrient reflectance properties are often associated with adjacent or similar portions of the electromagnetic spectrum [41].

In short, had we completed this study only in the US, the PLS regression method (both with and without waveband selection) would have produced favorable findings for all three nutrients at the grain level, and for protein at the plant level. Similarly, had we completed this study only in ET, PLS regression would have also produced favorable findings for all three nutrients at the grain level, and for Mg at the plant level. Yet, even where favorable results were found in both environments (all three nutrients at the grain level), the model fits were statistically different, and the wavebands selected were not similar. This finding serves as a cautionary tale that urges researchers to refrain from placing too much confidence in *R*<sup>2</sup> values, which may not translate to different areas. Even when the *R*<sup>2</sup> values were superficially high at both sites, they differed statistically from one another. This is not to suggest that hyperspectral investigations with PLS regression are not useful or valuable, but rather that caution should be used when translating findings from one site to another.
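The point that two high *R*<sup>2</sup> values can still differ statistically can be illustrated with a short sketch. One common approach for two independent samples is a Fisher *z*-test on the correlations between observed and predicted values, taking *r* as the square root of *R*<sup>2</sup>. This is an assumption made for illustration: the test used to compare model fits in this study is not restated here, and the function name `compare_r2_fisher`, the *R*<sup>2</sup> values, and the sample sizes below are hypothetical.

```python
import math


def compare_r2_fisher(r2_a, n_a, r2_b, n_b):
    """Two-sided Fisher z-test for a difference between two independent fits.

    Each R^2 is treated as the squared correlation between observed and
    predicted values (an assumption of this sketch). Returns (z, p_value).
    """
    for r2, n in ((r2_a, n_a), (r2_b, n_b)):
        if not 0.0 <= r2 < 1.0:
            raise ValueError("R^2 must lie in [0, 1)")
        if n <= 3:
            raise ValueError("each sample needs more than 3 observations")
    # Fisher transformation of each correlation coefficient.
    z_a = math.atanh(math.sqrt(r2_a))
    z_b = math.atanh(math.sqrt(r2_b))
    # Standard error of the difference between two independent z values.
    se = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (z_a - z_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical example: both fits look strong, yet they may still differ.
z, p = compare_r2_fisher(r2_a=0.90, n_a=60, r2_b=0.60, n_b=60)
print(f"z = {z:.2f}, p = {p:.4g}")
```

Under these assumptions, a small *p*-value indicates that the two fits differ by more than sampling variability alone would explain, even when both *R*<sup>2</sup> values would be judged favorable in isolation.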
