**K. Colton Flynn <sup>1,\*,†</sup>, Amy E. Frazier <sup>2</sup> and Sintayehu Admas <sup>3</sup>**


Received: 29 July 2020; Accepted: 30 August 2020; Published: 4 September 2020

**Abstract:** Achieving reproducibility and replication (R&R) of scientific results is paramount for science to progress, and it is also necessary for ensuring the self-correcting mechanism of the scientific method. Topics of R&R have recently moved to the forefront of research agendas in many fields but have received less attention in remote sensing in general and specifically for studies utilizing hyperspectral data. Given the extremely local environments in which many hyperspectral studies are conducted (e.g., agricultural field plots), purposeful attention to the repeatability of findings across study locales can help ensure methods are generalizable. This study undertakes an investigation of the nutrient content of tef (*Eragrostis tef*), an understudied plant that is growing in importance due to both food and forage benefits, but does so within the context of the replicability of methods and findings across two study sites situated in different international and environmental contexts. The aims are to (1) determine whether calcium, magnesium, and protein of both the plant and grain can be predicted using hyperspectral data with partial least squares (PLS) regression with waveband selection, and (2) compare the replicability of models across differing environments. Results suggest the method can produce high nutrient prediction accuracy for both the plant and grain in individual environments, but selection of wavebands for nutrient prediction was not comparable across study areas. The findings suggest that the method must be calibrated in each location, thereby reducing the potential to extrapolate methods to different areas. Our findings highlight the need for greater attention to methods and results replication in remote sensing, specifically hyperspectral analyses, in order for scientific findings to be repeatable beyond the plot level.

**Keywords:** reproducibility; replicability; hyperspectral; waveband selection; partial least squares; Ethiopia; *Eragrostis tef*

#### **1. Introduction**

The reproducibility and replication (R&R) of scientific findings have recently moved to the forefront of research agendas in many fields [1–5] since it has been discovered that findings often cannot be reproduced or replicated [5,6]. While the two "R's" (reproducibility and replicability) are intertwined, there are key differences between their goals. Adopting the definitions from the National Science Foundation [7] and the National Academies of Sciences, Engineering, and Medicine [8], we define reproducibility as the ability of a researcher to duplicate the results of a prior study using the same data and methods as the original investigator. In short, if a researcher makes the data and methods/code available, another researcher should be able to produce the exact same results. In comparison, replicability is the ability of a researcher to duplicate results using similar methods but with new data.

Achieving R&R is critical for advancing scientific discoveries, yet neither topic has received much attention in geography and the spatial sciences, where investigations tend to be observational instead of experimental or theoretical [9]. R&R has received even less attention in remote sensing (but see [10] for an early take), even though the field is uniquely positioned to contribute to R&R on several fronts. First, there is a rich archive of publicly available remote sensing datasets (e.g., Landsat), supporting opportunities for reproducibility [9,11]. Second, remote sensing studies, and in particular hyperspectral studies, are often situated in extremely local contexts (e.g., agricultural plots) due to the need for ground reference data and the high labor and time costs of operating equipment. Yet, an implicit goal of science is to develop widely applicable methodologies and generalizable findings that can be applied in different contexts. Thus, working toward the replicability of methods and findings across different study areas is important for advancing remote sensing science.

Despite the myriad opportunities for remote sensing scientists to explore R&R issues, very few formal efforts have been documented. One likely reason is that remote sensing scientists often work with large datasets and perform complex spectral and spatial manipulations [12–16], which makes R&R difficult if processing code is not made available. Until recently, many scientific publications did not require code to be submitted as part of the manuscript review process, although this is changing. Replication in remote sensing is also hindered by attributes of local environments, which make the transfer of results from one landscape to another difficult. However, if we are to develop methodologies that are transferable across space, it is necessary to begin developing and implementing protocols for testing the R&R of remote sensing studies. One way to do this is to incorporate multi-field, multi-environment analyses into studies to self-test the replicability of methods and results.

Precision agriculture is one field where immediate gains can be made toward testing the replicability of methods while also contributing to a broader understanding of the extent of R&R issues in remote sensing. Since the overall goal of precision agriculture is to decrease the ambiguity of decisions required on agricultural lands that are often highly variable [17], the ability to transfer methods and findings from one environment or location to another requires them to be replicable [18]. However, most studies capture data in a single region or location (often in a single crop field) under uniform conditions [12,15], thus limiting their generalizability across environmental or geographical contexts. Furthermore, the implicit assumption is that methods and findings are extendable beyond the single field in which they were tested, but in most cases, no such evidence is provided. Many studies lack a basic explanation of environmental variation, such as in soil, hydrology, and topography, that can cause reflectance variations, thereby altering results across space [16]. Ultimately, remote sensing methodologies are of little practical value for precision agriculture if they are developed, tested, and applicable only in a single location where these multiple and often confounding factors are held constant.

Partial least squares (PLS) regression has become an accepted technique in vegetation studies using hyperspectral data for estimating a range of biophysical and biochemical properties [19–23]. In situations where the number of independent variables is large and the variables are collinear, which is common with hyperspectral data, multiple linear regression will often overfit the model [24,25]. PLS regression standardizes model construction from the preprocessed hyperspectral data via latent variables, from which the predictive capabilities of the model can be tested. Recently, variations of PLS regression using a waveband selection procedure [13] have been proposed and adopted, but there has been little effort to test the replicability of these methods across environments to determine whether results might be transferable.

The objective of this study is to investigate the replicability of PLS regression methods, including PLS with waveband selection, for predicting nutrient content in plant and grain material across multiple environments. This study addresses gaps in the remote sensing R&R literature by replicating a methodological workflow using hyperspectral data and PLS regression for predicting nutrients in a single crop but in two varying environments in different international contexts to determine the degree to which the methods are replicable. The focus is on *Eragrostis tef* (tef), a cereal crop primarily grown in Ethiopia, although production has been expanding to other parts of the world due to its versatility and resistance to drought. Tef is ideal for studying replicability because it is grown in different international contexts and is known for being successfully cultivated across differing environments. Additionally, very few hyperspectral analyses have been performed on non-milled grains [26], so this study contributes knowledge in that realm as well.

Tef is a grass (Family: Poaceae) that has received little attention from the remote sensing and precision agricultural communities despite its versatile cultivation characteristics. Tef is thought to be one of the earliest domesticated plants [27], with the center of origin and diversity in Ethiopia/Horn of Africa [28]. It is drought and heat resistant, has a high nutrient content, and is grown for animal feed as well as a staple food crop [29]. While tef can be cultivated across many environments, it is primarily grown in Ethiopia, where it is the most commonly harvested crop, popular for its highly nutritious, gluten-free grain [30–35]. Recently, cultivation has been spreading outside the region; in the United States, tef is planted as a sequential forage crop for livestock feed but is currently only grown in a handful of locations [29,36].

#### **2. Data Collection and Processing**

#### *2.1. Study Sites*

This study focuses on four sites in two countries. The two US sites (US1: 21.45 ha; US2: 24.19 ha) are located in Hydro, Oklahoma, which is part of the Central Great Plains ecoregion. The region experiences cold winters (average temperature minimums from 4 to −12 °C) and hot summers (temperatures greater than 38 °C). Precipitation is variable, and temperature changes can be considerable across all seasons. The US sites are located within three kilometers of each other, so the environmental characteristics are similar. Both sites have similar soils (vertisols) and are located at the same elevation (474 m). The two Ethiopia sites (ET1: 0.77 ha; ET2: 1.23 ha) are also in close geographic proximity (Figure 1). The first site is located in the warm, sub-moist lowlands, while the second site is in the warm, humid lowlands. Soil composition at both sites is similar (vertisols), but the sites are at different elevations (1919 m and 2201 m, respectively).

**Figure 1.** Locations of the four study sites.
