#### *2.2. Data Analysis*

A representativeness analysis provides a robust statistical test for investigating potential geographic biases within a collection of primary data observations (e.g., case studies) [96]. In this approach, for a given global variable of interest (e.g., average annual precipitation), the frequency distribution of the variable within a user-specified geographic extent was compared with the frequency distribution of the observations in the sample collection, and the degree to which the sample's distribution represented the distribution of the global variable was quantified [96,97]. The null hypothesis was that the frequency distributions of the global variable and the sample collection do not differ statistically. If the null hypothesis could be rejected with a low probability of type I error, the sample was declared significantly biased. To make values of the global variable comparable with the sample observations, which might include case study geographies of diverse extents, the analysis used the standardized, hexagonal, equal-area geographic units of the GLOBE system, known as GLOBE land units (GLUs). The degree of representedness (r) was then computed from a chi-squared (χ²) test and characterized as follows:

$$
r = \begin{cases}
0 & \text{if } f_e(g_v) = f_o(g_v) \\
-(1 - p) & \text{if } f_e(g_v) > f_o(g_v) \\
1 - p & \text{if } 0 < f_e(g_v) < f_o(g_v) \\
\text{undefined} & \text{if } f_e(g_v) = 0 \text{ and } f_o(g_v) \neq 0
\end{cases}
$$

where $f_e(g_v)$ is the expected frequency of the bin to which GLU $g$ belongs (calculated from the population set), $f_o(g_v)$ is the observed frequency of that bin (calculated from the sample set), and $p$ is the $p$-value of the χ² test. Values of $r$ range from −1 to 1, with 0 indicating perfect representedness, negative values indicating under-representedness, and positive values indicating over-representedness [96].
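The piecewise definition of r can be sketched in Python as follows. This is a minimal illustration, not the GLOBE implementation: it assumes that per-bin counts for the population (global GLUs) and the sample (case-study GLUs) are already available, that the expected frequency of a bin is the population share of that bin scaled to the sample size, and that the χ² p-value is passed in. `representedness` is a hypothetical helper name, and `NaN` stands in for "undefined".

```python
import numpy as np

def representedness(observed, population, p):
    """Per-bin representedness r (hypothetical helper).

    observed   -- sample (case-study GLU) counts per bin
    population -- global (population GLU) counts per bin
    p          -- p-value of the chi-squared test
    Returns an array of r values; NaN marks the "undefined" case.
    """
    observed = np.asarray(observed, dtype=float)
    population = np.asarray(population, dtype=float)
    # Assumption: expected frequency = population share of the bin x sample size
    expected = population / population.sum() * observed.sum()

    close = np.isclose(expected, observed)        # f_e == f_o (up to rounding)
    r = np.full(observed.shape, np.nan)           # default: undefined
    r[close] = 0.0
    r[~close & (expected > observed)] = -(1.0 - p)                # under-represented
    r[~close & (expected > 0) & (expected < observed)] = 1.0 - p  # over-represented
    return r

# Each GLU g then takes the r value of the bin its variable value falls in.
```

For example, with hypothetical counts `observed=[2, 10, 8]`, `population=[50, 30, 20]`, and `p=0.01`, the expected counts are 10, 6, and 4, so the first bin is under-represented (r ≈ −0.99) and the other two are over-represented (r ≈ 0.99).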

Several data preparation steps were followed to produce the sample and global (~population) datasets. Table 1 describes all the datasets used for this analysis. After the case studies were shortlisted, the locations of the study sites (total = 53) mentioned in the 50 selected articles were mapped in ArcGIS Pro using administrative boundary shapefiles from the GADM dataset (see Figure 2). Next, the global GLU feature layer obtained from GLOBE was filtered using several context variables (see Table 2) to restrict the global dataset to the expected geographic extent of agricultural areas. The case study locations were also intersected with the filtered GLU layer to form the sample dataset, so that both layers shared the same unit of analysis. For each GLU, values of three variables were calculated: average annual precipitation (mm/year), percent crop area, and market access index. For the fourth variable, area equipped for irrigation (%), mean values were computed for each GLU in both feature layers using zonal statistics. The ranges of the four selected variables in the global and sample layers are shown in Figures 3–6. For each of the four variables, both datasets were divided into intervals (bins). The binning scheme of each source dataset was retained (see Table 1 for dataset details), except for average annual precipitation, for which geometric intervals were used. Finally, Pearson's χ² test of independence was conducted to compare the frequency distributions of the sample and global datasets for each of the four variables, in order to determine the geographic representativeness of the assembled case studies on irrigation adoption and to test the first hypothesis.
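Two of the steps above, geometric-interval binning and the Pearson χ² test of independence between global and sample bin counts, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `geometric_bins` builds edges that grow by a constant ratio between the minimum and maximum global value, which is a simplification of the geometric-interval classification offered by GIS software (that tool may compute breaks differently), and it requires strictly positive values. Both helper names are hypothetical. The test itself uses SciPy's `scipy.stats.chi2_contingency` on a 2 × k table whose rows are the global and sample counts.

```python
import numpy as np
from scipy.stats import chi2_contingency

def geometric_bins(values, n_bins):
    """Bin edges whose widths grow by a constant ratio (assumes values > 0)."""
    values = np.asarray(values, dtype=float)
    return np.geomspace(values.min(), values.max(), n_bins + 1)

def compare_distributions(global_values, sample_values, edges):
    """Pearson's chi-squared test of independence on global vs. sample bin counts."""
    global_counts, _ = np.histogram(global_values, bins=edges)
    sample_counts, _ = np.histogram(sample_values, bins=edges)
    table = np.vstack([global_counts, sample_counts])
    # Bins empty in both rows would produce zero expected counts, so drop them.
    table = table[:, table.sum(axis=0) > 0]
    # correction=False: plain Pearson statistic (Yates' correction only affects 2x2 tables).
    chi2, p, dof, _expected = chi2_contingency(table, correction=False)
    return global_counts, sample_counts, chi2, p, dof
```

Run once per variable, a small `p` indicates that the sample's bin distribution departs from the global one, and the per-bin counts can then be compared bin by bin.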

**Figure 2.** The map shows the location and distribution of selected cases.


**Figure 3.** (**a**) Global extent for % Area Equipped for Irrigation variable. (**b**) Sample extent for % Area Equipped for Irrigation variable.

**Figure 4.** (**a**) Global extent for Avg Annual Precipitation variable. (**b**) Sample extent for Avg Annual Precipitation variable.

**Figure 5.** (**a**) Global extent for Percent Cropland variable. (**b**) Sample extent for Percent Cropland variable.

**Figure 6.** (**a**) Global extent for Market Access variable. (**b**) Sample extent for Market Access variable.


**Table 2.** Description of all the filters applied to the GLU layer obtained from GLOBE.

To test the second hypothesis, a list of the factors reported to influence farmers' irrigation adoption decisions was first compiled from the selected case studies. Factors affecting farmers' adoption decisions are often grouped into broad clusters such as financial/economic, physical, institutional, and individual characteristics, but this categorization can vary with researchers' preferences and disciplinary backgrounds [57,70]. For our study, based on the background literature, the influential factors were grouped into seven broad categories: biophysical, demographic, geographic, technology-specific, social capital, farm enterprise, and institutional (Figure 7). Each individual factor was coded with one of these categories for the frequency analysis. Next, the relationships between the seven factor categories and their geographic contexts were examined using correspondence analysis. Correspondence analysis (CA) is a multivariate statistical technique and a useful visualization tool for summarizing, examining, and displaying the relationships between categorical variables in a contingency table [100,101]. It requires no underlying distributional assumptions and therefore accommodates any type of categorical variable, whether binary, ordinal, or nominal [102]. Moreover, the row and column points of the contingency table are displayed together on a multi-dimensional map called a biplot, which makes the associations among variables easier to see [103,104]. CA uses the chi-square statistic to measure the distance between points on the map, but it does not reveal whether these associations are statistically significant; it is therefore used only as an exploratory method [104].
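The core CA computation can be illustrated with a short NumPy sketch. The authors used the Prince library; the code below is not Prince's implementation but a from-scratch version of the standard algorithm (singular value decomposition of the matrix of standardized residuals), written so that each step is visible. The function name and the test table are hypothetical, and the signs of the axes are arbitrary, so a biplot drawn from this sketch may be mirrored relative to one produced by Prince.

```python
import numpy as np

def correspondence_analysis(table, n_components=2):
    """Classical CA of a contingency table (rows: regions, columns: factor categories).

    Returns row principal coordinates, column principal coordinates, the share of
    total inertia explained by each retained component, and the total inertia.
    Assumes every row and column has a non-zero total.
    """
    N = np.asarray(table, dtype=float)
    n = N.sum()
    P = N / n                      # correspondence matrix
    r = P.sum(axis=1)              # row masses
    c = P.sum(axis=0)              # column masses
    expected = np.outer(r, c)
    # Standardized residuals; their squared sum equals chi2 / n (the total inertia).
    S = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = s ** 2               # principal inertias (eigenvalues)
    total = inertia.sum()
    rows = U * s / np.sqrt(r)[:, None]       # row principal coordinates
    cols = Vt.T * s / np.sqrt(c)[:, None]    # column principal coordinates
    k = n_components
    return rows[:, :k], cols[:, :k], inertia[:k] / total, total
```

Given a region × factor-category count table, the two columns of `rows` and `cols` give biplot positions for regions and categories, and `share.sum()` gives the inertia explained by the two plotted components.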

All of the statistical analyses described above were implemented in Python in the PyCharm IDE (Integrated Development Environment) using the pandas, Matplotlib, Prince, and SciPy (scipy.stats) libraries.

**Figure 7.** Categorization of different factors influencing farmers' irrigation adoption decision-making.

#### **3. Results**

#### *3.1. Geographic Representativeness of Irrigation Adoption Studies*

Geographic representativeness analyses were conducted for four variables: the percentage of GLU area equipped for irrigation, the percentage of GLU area in cropland, average market accessibility, and average annual precipitation. For each of the four variables, Pearson's χ² test of independence (Tables 3–6) showed that the observed (~sample) distribution differed significantly from the expected distribution.

**Table 3.** Pearson's χ<sup>2</sup> test results with percentage of area equipped for irrigation variable.


\* At the 0.05 significance level; \*\* r-values calculated using the criteria defined in Section 2.2.


**Table 4.** Pearson's χ<sup>2</sup> test results with percentage of cropland variable.

\* At the 0.05 significance level; \*\* r-values calculated using the criteria defined in Section 2.2.

**Table 5.** Pearson's χ<sup>2</sup> test results with market accessibility variable.


\* At the 0.05 significance level; \*\* r-values calculated using the criteria defined in Section 2.2.

**Table 6.** Pearson's χ<sup>2</sup> test results with average annual precipitation (mm/year) variable.


\* At the 0.05 significance level; \*\* r-values calculated using the criteria defined in Section 2.2.

The observed frequencies of the two bins with the lowest percentages of area equipped for irrigation were significantly lower than their expected frequencies (see Figure 8), and these bins were highly under-represented (Table 3). In contrast, the remaining seven bins were highly over-represented in this collection, as their observed frequencies exceeded their corresponding expected frequencies. Case studies of irrigation adoption were thus biased toward areas with existing irrigation, and over-representation generally increased with the percentage of area equipped for irrigation.

**Figure 8.** Percentage of Observed (~Sample) vs. Expected Counts for Irrigation Variable.

Similarly, for the percent cropland variable (Table 4 and Figure 9), four of the ten bins (those with very low or very high cropland cover) were highly under-represented. Irrigation adoption studies were more frequently conducted in areas with moderate extents of agricultural land use and were thus biased against areas with low or high cropland cover. This bias likely shaped which irrigation adoption decisions were studied. Locations that were dominantly or exclusively agricultural likely had better support services and infrastructure and faced no competition from other land uses, which would presumably facilitate irrigation adoption. Conversely, farmers in areas with little cropland face the opposite conditions and may encounter more barriers to irrigation adoption.

**Figure 9.** Percentage of Observed (~Sample) vs. Expected Counts for Cropland Variable.

For the market access index, most of the bins (8 of 10) were highly over-represented (Table 5 and Figure 10), with a bias toward areas of moderate to high market access. In areas with low market accessibility, market signals that might favor irrigation adoption were likely dampened and may not have been strong enough to overcome economic barriers to adoption. Additionally, remote areas are generally understudied because they are difficult for researchers to access [30]. As a result, irrigation adoption studies were skewed toward more accessible locations, and the most accessible locations were well represented in the sample.

**Figure 10.** Percentage of Observed (~Sample) vs. Expected Counts for Market Accessibility Variable.

Finally, regions receiving moderate average annual rainfall (463–1219 mm/year) were highly over-represented, while regions with very low or very high average annual rainfall were under-represented (Table 6 and Figure 11). The under-representation of low-rainfall areas was surprising; such areas may be neglected by irrigation adoption studies because irrigation is a necessity there, leaving little variability in adoption decisions to study. The limited sampling of high-precipitation areas was less surprising, since areas receiving high average annual precipitation are more likely to be associated with rainfed agriculture. However, such areas may also include places where seasonal drought is a concern despite high annual rainfall (e.g., the humid southeastern United States), which potentially have unique sets of adoption decision factors.

**Figure 11.** Percentage of Observed (~Sample) vs. Expected Counts for Average Annual Precipitation Variable.

#### *3.2. Similarity of Irrigation Adoption Factors across Geographic Contexts*

Most of the studies conducted in regions with low percentages of irrigated area, which were highly under-represented in this collection, came from countries in Africa and Latin America (see Table 7 and Figure 12). Further, Table 8 lists the clusters of factors affecting irrigation adoption identified in the case studies, broken down by world region. Frequencies are given in this table as absolute counts (a frequency analysis approach based on the Geist & Lambin (2004) study). Only two case studies identified a single variable (factor category) that explained farmers' irrigation adoption decisions, which suggests that the decision to adopt irrigation (or not) is best explained by a combination of factors (see Table 8). The most frequent cluster was the combination of biophysical, demographic, farm enterprise, and social capital factors (B, D, F, S), followed by the cluster of biophysical, demographic, farm enterprise, institutional, and social capital factors (B, D, F, I, S). Both clusters show clear regional variation, as they appear mainly in case studies from Asia and Africa, and cases from these two regions share more factors in common than cases from other regions. The demographic category, which includes factors such as age, gender, and household size (see Figure 7 for more details), featured most often, while the institutional and technology-specific categories were the least frequently observed in these case studies. Further, demographic and social capital factors together formed the most robust combination, although this pair often occurred alongside other factor categories.
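The frequency analysis behind Table 8 amounts to coding each case study with the set of factor categories it reports and counting identical combinations per region. The sketch below illustrates that tallying with Python's standard library; the function names and the test cases are hypothetical, and it assumes each case has already been coded with the single-letter category codes used in Table 8.

```python
from collections import Counter, defaultdict

def cluster_frequencies(cases):
    """Count identical combinations of factor categories per region.

    cases -- iterable of (region, categories) pairs, where categories holds
             single-letter codes such as "B", "D", "F" (duplicates are ignored).
    Returns {region: Counter({"B, D, F, S": count, ...})}.
    """
    freq = defaultdict(Counter)
    for region, categories in cases:
        key = ", ".join(sorted(set(categories)))   # e.g. "B, D, F, S"
        freq[region][key] += 1
    return freq

def category_totals(cases):
    """Number of cases in which each individual category appears."""
    return Counter(cat for _, cats in cases for cat in set(cats))

def single_category_cases(cases):
    """Number of cases explained by only one factor category."""
    return sum(1 for _, cats in cases if len(set(cats)) == 1)
```

Sorting the codes before joining them makes, for instance, a case coded as S, B, D, F count toward the same "B, D, F, S" cluster as one coded in alphabetical order.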

**Table 7.** Distribution of number of cases based on percentage of irrigation.


**Figure 12.** Distribution of study regions based on the percentage of area equipped for irrigation.


**Table 8.** Frequency of broad clusters of factors affecting irrigation adoption.

B = Biophysical; D = Demographic; F = Farm Enterprise; G = Geographic; I = Institutional; S = Social Capital; T = Technology-specific.

Additionally, a CA biplot of the study regions and the set of causal factors (Figure 13) was prepared to visually identify and understand these regional variations. In this symmetric scatterplot, component 0 is represented by the horizontal axis and component 1 by the vertical axis. Together, the two components explained about 45.68% of the total variance (inertia) in this dataset. Europe had high positive values along component 0 (horizontal axis), while Australia had high positive values along component 1 (vertical axis). North America had high negative values along the vertical axis and low positive values along the horizontal axis. Visual inspection of the biplot shows that the sets of factors influencing farmers' irrigation adoption in cases from Europe, Australia, and North America differed considerably from one another, as these regions were placed in separate quadrants and far from the origin. The Australia and Latin America study regions were placed in the same quadrant and thus shared similar profiles; that is, similar combinations of causal factors were observed in both regions, compared with Europe or other regions (see Table 8 for more details). The biplot also revealed that irrigation adoption by farmers in the European case studies was associated with a combination of only demographic, social capital, institutional, and technology-specific attributes, whereas in North America the strongest associations were with demographic, social capital, farm enterprise, institutional, and biophysical factors.
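For reference, in standard CA the share of inertia attributed to the plotted components is the sum of their principal inertias $\lambda_k = \sigma_k^2$ (the squared singular values of the standardized residual matrix) divided by the total inertia, which equals the table's Pearson χ² statistic divided by the grand total $n$. The 45.68% reported above presumably corresponds to this ratio for components 0 and 1; the individual principal inertias are not reported here.

$$
\text{explained inertia}_{0,1} = \frac{\lambda_0 + \lambda_1}{\sum_{k} \lambda_k}, \qquad \sum_{k} \lambda_k = \frac{\chi^2}{n}
$$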

**Figure 13.** 2-D Correspondence Analysis biplot of Study Regions and Factors affecting Irrigation Adoption.
