### *2.2. Data Review*

Review of the mango dataset began by separating the dependent variables from the independent variables, which are referred to as factors in this study. All numerical variables expressed in units of mango fruit, such as mangos consumed, mangos sold, and mangos lost in different ways, were designated as potential dependent variables. This identified twenty-five (25) potential dependent variables among the dataset's 697 variables. The remaining 672 variables were designated as factors that could potentially affect the dependent variables.
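The split can be sketched as follows, assuming each column carries a unit label. The column names, units, and the `partition_variables` helper are hypothetical and only illustrate the rule; in the study the same rule yielded 25 potential dependent variables and 672 factors (25 + 672 = 697).

```python
# Hypothetical column metadata: column name -> unit of measurement.
# Names and units are illustrative, not taken from the YWI dataset.
variables = {
    "mangos_sold": "mango fruit",
    "mangos_consumed": "mango fruit",
    "mangos_lost_harvest": "mango fruit",
    "farmer_age": "years",
    "harvest_method": "category",
}


def partition_variables(variables, target_unit="mango fruit"):
    """Split columns into potential dependent variables (counted in
    mango fruit) and potential factors (all other columns)."""
    dependent = [name for name, unit in variables.items() if unit == target_unit]
    factors = [name for name, unit in variables.items() if unit != target_unit]
    return dependent, factors


dependent, factors = partition_variables(variables)
print(len(dependent), len(factors))  # 3 2
```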

#### 2.2.1. Independent Variables

The 672 potential factors were first screened by removing every factor with one or more missing entries, except for the "production and PHL practices" factor. This step reduced the dataset to 61 factors.
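A minimal sketch of this screening step, assuming the data are held column-wise with `None` marking a missing entry. The table contents and the `drop_incomplete_factors` helper are hypothetical; only the exempt factor name comes from the text.

```python
# The one factor kept despite missing entries (named in the text).
EXEMPT = "production and PHL practices"


def drop_incomplete_factors(table, exempt=EXEMPT):
    """Keep factors with no missing (None) entries, plus the exempt factor."""
    return {
        name: values
        for name, values in table.items()
        if name == exempt or all(v is not None for v in values)
    }


# Hypothetical column-oriented table: factor name -> entries per farmer.
table = {
    "harvest method": ["hand", "pole", "hand"],
    "farm size": [1.5, None, 2.0],
    EXEMPT: ["fruit fly traps", None, "pruning"],
}
kept = drop_incomplete_factors(table)
print(list(kept))  # ['harvest method', 'production and PHL practices']
```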

Next, factors containing the respondent farmers' identifying information, such as name, contact details, and survey start and end times, were removed. All factors containing true/false (Boolean) entries were also removed. In addition, some numerical factors were strongly positively correlated, such as the "total number of mango trees" and the "number of productive mango trees" owned by a farmer. In such cases, one of the two factors (here, the number of productive mango trees) was removed to avoid collinearity [20].

Finally, listwise deletion of rows was performed for the "production and PHL practices" factor. As noted in the first step, this was the only factor retained despite having missing entries, because Technoserve experts indicated that "fruit fly traps," a subset of this factor, played a crucial role in reducing insect infestation of mangos before harvest. Retaining the factor therefore made it possible to compare the importance of fruit fly traps in reducing preharvest insect infestation with their importance in preserving quality and reducing loss after harvest. Listwise deletion removed the rows (respondents) with randomly missing entries for this factor. Although listwise deletion is a commonly used technique for handling missing data [21], it was applied only to the "production and PHL practices" factor and not to the entire dataset: applying it to the entire raw dataset would have removed every row, a 100% loss of information, because missing entries occurred in every row.
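Restricted to a single column, listwise deletion reduces to a row filter. The sketch below assumes rows are stored as dictionaries with `None` for a missing entry; the rows and the `listwise_delete` helper are hypothetical.

```python
def listwise_delete(rows, column):
    """Drop rows (respondents) whose entry for `column` is missing (None).
    Only this one column is checked; missing values elsewhere are ignored."""
    return [row for row in rows if row.get(column) is not None]


# Hypothetical respondent rows for illustration only.
rows = [
    {"farmer": 1, "production and PHL practices": "fruit fly traps", "farm size": None},
    {"farmer": 2, "production and PHL practices": None, "farm size": 2.0},
    {"farmer": 3, "production and PHL practices": "pruning", "farm size": 1.2},
]
kept = listwise_delete(rows, "production and PHL practices")
print([row["farmer"] for row in kept])  # [1, 3]
```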

The final set of factors comprised 21 factors grouped into nine sections (Table 1): 19 categorical factors, each containing at least two subsets, and two numerical factors. Of these, four factors contain technology subsets, as specified in Table 1: harvest methods, type of storage used after harvest, type of package for sale, and production and PHL practices. Their effect on mango PHL is evaluated in this study. In addition, some factors and subsets were renamed for clarity, and some subsets were merged into fewer categories to facilitate evaluation of their effect on PHL.
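Renaming and merging subsets, together with the rule from Table 1 that farmers reporting more than one subset are assigned "Other", might look like the sketch below. The raw labels and descriptors in `DESCRIPTORS` are hypothetical. Applying the "more than one" test after merging, so that two raw labels mapping to the same descriptor still count as one, is also an assumption.

```python
# Hypothetical mapping from raw subset labels to subset descriptors.
DESCRIPTORS = {
    "by hand": "Hand",
    "hand picking": "Hand",
    "picking pole": "Pole",
    "harvester with net": "Pole",
}


def assign_descriptor(reported, mapping=DESCRIPTORS):
    """Map a farmer's reported subsets to a single descriptor.
    Unmapped labels are kept as-is; more than one distinct descriptor
    (after merging) becomes "Other"; nothing reported returns None."""
    descriptors = {mapping.get(label, label) for label in reported}
    if not descriptors:
        return None
    if len(descriptors) == 1:
        return descriptors.pop()
    return "Other"


print(assign_descriptor(["by hand", "hand picking"]))  # Hand
print(assign_descriptor(["by hand", "picking pole"]))  # Other
```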

Following factor review and summarization, the four factors that contained postharvest technologies are listed in Table 2, along with their subsets, subset descriptors, and descriptions.

#### 2.2.2. Dependent Variables

The 25 potential dependent variables were also screened to identify the types of mango loss along the value chain. The first step removed variables (columns) with at least one missing entry. The second step identified all mango PHL along the value chain. Although all 25 potential dependent variables were numerical quantities of mango fruit (sold, given to family, used as payment-in-kind, consumed by farmers, or lost along the value chain), not all of them were PHL variables. PHL variables represent the loss hotspots that together make up total PHL [22]. Therefore, in this study, only losses that occurred during harvest and losses that occurred after harvest were considered PHL variables.

**Table 1.** A summary of the dataset showing sections, factors, subsets of factors, and respondent farmers: Column (a) lists the nine sections to which each factor belongs. Column (b) lists all 21 factors, including the 19 categorical (C), two numerical (N), and four containing postharvest technologies (T). Column (c) expands each factor into subsets. Subsets with the superscript PHT are identified as postharvest technologies. Subsets with the superscript PRHT are identified as pre-harvest technologies. Numerical factors consist of values estimated by each respondent farmer. Column (d) renames subsets and merges them into fewer categories to facilitate subsequent analysis. Subset descriptors with the superscript YWI are identified as technologies promoted by the YWI. Column (e) indicates the number of respondent farmers belonging to each subset. For each factor, respondent farmers who reported more than one subset were assigned the subset Other \*\*.




Selecting the mango PHL variables reduced the raw dataset's 25 potential dependent variables to nine types of mango PHL (Table 3). These nine types were then grouped by the value chain stage at which they occurred (Table 3).
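Grouping loss types by stage amounts to summing each farmer's losses under a type-to-stage mapping. The loss-type names below are hypothetical placeholders (Table 3 is not reproduced here), and only six of the nine types are shown.

```python
# Hypothetical loss types mapped to the three value chain stages.
STAGE_OF_LOSS = {
    "fell during harvest": "harvest",
    "damaged during harvest": "harvest",
    "bruised in transit": "transportation",
    "rotted in transit": "transportation",
    "unsold at market": "point of sale",
    "rotted at market": "point of sale",
}


def losses_by_stage(record, stage_of_loss=STAGE_OF_LOSS):
    """Sum one farmer's losses (in mango fruit) per value chain stage;
    loss types absent from the record count as zero."""
    totals = {}
    for loss_type, stage in stage_of_loss.items():
        totals[stage] = totals.get(stage, 0) + record.get(loss_type, 0)
    return totals


record = {"fell during harvest": 3, "damaged during harvest": 2,
          "bruised in transit": 4, "unsold at market": 1}
print(losses_by_stage(record))
```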

The third step identified and removed outliers [23] from the dependent variables. To do so, mango gross production was calculated for each farmer by summing all variables that contribute to it: mangos sold, given to family, used as payment-in-kind, and consumed by farmers, plus all PHL variables shown in Table 3. The distribution of the calculated gross production was skewed and contained outliers. Removing the rows (farmers) with gross production outliers also eliminated the outliers from the PHL distributions at each value chain stage.
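The text does not state which outlier rule was used. The sketch below assumes Tukey's fences (1.5 × IQR around the quartiles) applied to each farmer's gross production, with hypothetical component names. Rows flagged on gross production are dropped whole, so their PHL values leave every stage distribution at once.

```python
from statistics import quantiles


def gross_production(record, components):
    """Sum every quantity that contributes to a farmer's gross production."""
    return sum(record[c] for c in components)


def remove_outlier_rows(rows, components, k=1.5):
    """Drop rows whose gross production lies outside the Tukey fences
    [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed, conventional value."""
    gross = [gross_production(r, components) for r in rows]
    q1, _, q3 = quantiles(gross, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r, g in zip(rows, gross) if low <= g <= high]


# Hypothetical components and farmers; the last farmer is an extreme value.
components = ["sold", "lost"]
rows = [{"sold": g - 10, "lost": 10} for g in (100, 110, 120, 130, 140, 150, 5000)]
kept = remove_outlier_rows(rows, components)
print(len(kept))  # 6
```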

The last step consisted of expressing mango PHL at all three value chain stages as percentages of gross production (Figure 1) for all 753 respondent farmers.
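Expressing each stage's loss as a percentage of the same farmer's gross production is a single division per stage. In this sketch the stage totals and gross production are hypothetical inputs, and a non-positive gross production is rejected because the percentage would be undefined.

```python
def phl_percentages(stage_losses, gross):
    """PHL (%) per stage = 100 * stage loss / gross production."""
    if gross <= 0:
        raise ValueError("gross production must be positive")
    return {stage: 100 * loss / gross for stage, loss in stage_losses.items()}


print(phl_percentages({"harvest": 50, "transportation": 25, "point of sale": 25}, 1000))
# {'harvest': 5.0, 'transportation': 2.5, 'point of sale': 2.5}
```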

**Table 2.** Summarizing and describing factors containing postharvest technology subsets: Column (a) shows the four factors containing postharvest technologies. Column (b) shows the subset descriptors, which are renamed subsets defined to reduce the raw data to fewer categories and facilitate subsequent analysis. Column (c) shows the subsets of each factor as initially recorded in the raw data. Column (d) describes the purpose of each subset. The superscript **YWI** in Columns (b) and (c) refers to technologies that the YWI promoted.



**Table 3.** Types of mango PHL within the dataset of dependent variables.

**Figure 1.** Distributions of mango PHL (%) during harvest (**a**), transportation (**b**), and at point of sale (**c**).

The PHL data summarized in Figure 1 were then combined with the factors listed in Table 1 to create the YWI mango dataset (summarized in Table 4), which forms the basis for the analysis and results presented in this study.
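Combining the two tables amounts to a join on the respondent. The sketch assumes a shared farmer identifier and an inner join (farmers missing from either table are dropped). Both the identifier and the join type are assumptions, because the text does not say how rows were matched.

```python
def combine(phl_by_farmer, factors_by_farmer):
    """Inner join on a (hypothetical) farmer identifier: each output row
    holds that farmer's factors followed by their PHL values."""
    return {
        farmer_id: {**factors_by_farmer[farmer_id], **phl}
        for farmer_id, phl in phl_by_farmer.items()
        if farmer_id in factors_by_farmer
    }


phl = {"F1": {"harvest PHL %": 5.0}, "F2": {"harvest PHL %": 2.0}}
factors = {"F1": {"harvest method": "Hand"}, "F3": {"harvest method": "Pole"}}
print(combine(phl, factors))  # {'F1': {'harvest method': 'Hand', 'harvest PHL %': 5.0}}
```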

**Table 4.** Summary of all factors, subsets of factors, respondent farmers, and the seven types of mango PHL: Column (a) lists all 21 factors, including the 19 categorical (C), two numerical (N), and four containing postharvest technologies (T). Column (b) expands each factor into subsets, which were referred to as subset descriptors in Table 1. The superscript **YWI** identifies technologies promoted by the YWI. "Other" refers to the combination of multiple subsets as reported by respondent farmers. Column (c) indicates the number of respondent farmers belonging to each subset. Column (d) gives mango PHL at harvest, during transportation, at point of sale, and as a total over all three value chain stages. PHL averages are not reported for numerical factors because these factors cannot be grouped into subsets; such entries are marked n/a.




In addition to the summary of the YWI mango dataset in Table 4, each stage's PHL was expressed as a proportion of total PHL (Figure 2) by dividing the stage's average PHL by the average PHL of the entire value chain. An online interactive mango PHL dashboard was also created (https://phldashboard.shinyapps.io/phldashboard/ (accessed on 2 June 2021)) to explore average mango PHL as a function of each factor in Table 4, Column (a), and of selected combinations of factors.
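The proportion calculation can be sketched as follows, assuming each farmer's total PHL is the sum of their three stage losses (hypothetical records). Because the mean is linear, the average total then equals the sum of the stage averages, so the three proportions add up to 1.

```python
from statistics import mean


def stage_proportions(records, stages):
    """Average PHL of each stage divided by the average total PHL,
    where a farmer's total is the sum of their stage losses."""
    stage_means = {s: mean(r[s] for r in records) for s in stages}
    total_mean = mean(sum(r[s] for s in stages) for r in records)
    return {s: m / total_mean for s, m in stage_means.items()}


stages = ["harvest", "transportation", "point of sale"]
records = [
    {"harvest": 2, "transportation": 1, "point of sale": 1},
    {"harvest": 4, "transportation": 3, "point of sale": 1},
]
print(stage_proportions(records, stages))  # harvest 0.5, transportation ~0.333, point of sale ~0.167
```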

### *2.3. Statistical Analysis*

Identifying the four factors that contain postharvest technology subsets (Table 2) and quantifying the mango losses associated with each subset (Table 4) provided a basis for comparing average PHL per subset and quantifying effect sizes among postharvest technology subsets. However, to test whether PHL differed significantly among subsets (technologies), a preliminary analysis of the subset data was conducted to select an appropriate statistical test for comparing groups. This analysis checked the main assumptions required by parametric tests: normality, homogeneity of variance, and independence [24].

**Figure 2.** Proportion of PHL at each value chain stage.

The assumption of normality was considered violated: the PHL distributions per subset were skewed, and Shapiro–Wilk tests indicated that they departed significantly (*p* < 0.05) from a normal distribution. The assumption of homogeneity of variance was not violated, as Levene's tests were not significant (*p* > 0.05) for the subsets of any of the four factors, providing no evidence of unequal variances. Similarly, the assumption of independent and identically distributed observations was not considered violated: PHL distributions per subset had the same right-skewed shape for all four factors, and observations within each subset were assumed to be independent, although sampling bias is possible because YWI farmers were not randomly selected.
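The two checks can be sketched with SciPy on simulated right-skewed samples (not study data). The `check_assumptions` helper and subset names are hypothetical. The text does not say which Levene variant was used: SciPy's default `center="median"` is the Brown–Forsythe form, and `center="mean"` gives the classic statistic.

```python
import numpy as np
from scipy import stats


def check_assumptions(groups, alpha=0.05):
    """Shapiro-Wilk for each subset and Levene's test across the subsets
    of one factor."""
    shapiro_p = {name: stats.shapiro(x).pvalue for name, x in groups.items()}
    # center="median" is SciPy's default (Brown-Forsythe variant).
    levene_p = stats.levene(*groups.values(), center="median").pvalue
    return {
        # True means normality is rejected for that subset at level alpha.
        "non_normal": {name: p < alpha for name, p in shapiro_p.items()},
        # p > alpha gives no evidence against equal variances.
        "levene_p": levene_p,
        "unequal_variance": levene_p < alpha,
    }


# Simulated right-skewed PHL (%) samples for three hypothetical subsets.
rng = np.random.default_rng(0)
groups = {name: rng.lognormal(mean=1.0, sigma=1.0, size=200)
          for name in ("Hand", "Pole", "Other")}
result = check_assumptions(groups)
print(result["non_normal"], round(result["levene_p"], 3))
```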

Consequently, the Kruskal–Wallis test was selected to evaluate the effect of the YWI-promoted technologies on mango PHL at the three stages of the value chain. The Kruskal–Wallis test, a nonparametric analog of one-way ANOVA, does not assume normality [25] and is robust to outlying observations [24]. When the Kruskal–Wallis test was significant, it was followed by Dunn's test with the Benjamini–Hochberg adjustment for multiple comparisons.
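A sketch of this procedure is given below. It calls `scipy.stats.kruskal` for the omnibus test and writes out Dunn's pairwise z-tests (mean-rank differences with the usual tie correction) and the Benjamini–Hochberg step-up adjustment by hand, so every step is visible. It is an illustration on simulated data, not the software used in the study. The helper names are hypothetical.

```python
import math
from itertools import combinations

import numpy as np
from scipy import stats


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def dunn_test(groups):
    """Two-sided Dunn's test with tie correction: {(a, b): (z, p)}."""
    names = list(groups)
    samples = [np.asarray(groups[name], dtype=float) for name in names]
    pooled = np.concatenate(samples)
    ranks = stats.rankdata(pooled)  # average ranks for ties
    n = len(pooled)
    _, ties = np.unique(pooled, return_counts=True)
    variance = n * (n + 1) / 12 - np.sum(ties**3 - ties) / (12 * (n - 1))
    mean_rank, start = {}, 0
    for name, sample in zip(names, samples):
        mean_rank[name] = ranks[start:start + len(sample)].mean()
        start += len(sample)
    sizes = {name: len(s) for name, s in zip(names, samples)}
    out = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(variance * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        out[(a, b)] = (z, 2 * stats.norm.sf(abs(z)))
    return out


def compare_subsets(groups, alpha=0.05):
    """Kruskal-Wallis; if significant, Dunn's test with BH-adjusted p-values."""
    h, p = stats.kruskal(*groups.values())
    posthoc = None
    if p < alpha:
        pairs = dunn_test(groups)
        adjusted = benjamini_hochberg([pv for _, pv in pairs.values()])
        posthoc = dict(zip(pairs, adjusted))
    return {"H": h, "p": p, "posthoc": posthoc}


# Simulated PHL samples: subset C is shifted upward.
rng = np.random.default_rng(1)
groups = {
    "A": rng.lognormal(0.0, 0.5, 60),
    "B": rng.lognormal(0.0, 0.5, 60),
    "C": rng.lognormal(1.5, 0.5, 60),
}
print(compare_subsets(groups))
```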

In addition to these tests, the effect size of the reduction or increase in PHL was calculated whenever PHL differences were significant (*p* < 0.05). The effect size for the Kruskal–Wallis test was calculated with the epsilon-squared method [26] and interpreted using the rules for measures of association [27]. Because epsilon-squared is a squared quantity, the lower and upper bounds of each bin in [27] were squared, yielding the following effect size rule: 0.00 to under 0.01 = negligible; 0.01 to under 0.04 = weak; 0.04 to under 0.16 = moderate; 0.16 to under 0.36 = relatively strong; 0.36 to under 0.64 = strong.
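Epsilon-squared for a Kruskal–Wallis statistic is commonly computed as E²_R = H / ((n² − 1)/(n + 1)), which simplifies to H/(n − 1), where n is the total number of observations. The sketch pairs this with the squared bins listed above, whose bounds are the squares of 0.1, 0.2, 0.4, 0.6, and 0.8. The "very strong" label for values of 0.64 and above is an assumption that extends the same pattern, because the rule in the text stops at 0.64.

```python
def epsilon_squared(h, n):
    """Epsilon-squared effect size for a Kruskal-Wallis H statistic:
    H / ((n^2 - 1) / (n + 1)), equivalent to H / (n - 1)."""
    return h / ((n * n - 1) / (n + 1))


# Exclusive upper bounds of the squared bins stated in the text.
BINS = [
    (0.01, "negligible"),
    (0.04, "weak"),
    (0.16, "moderate"),
    (0.36, "relatively strong"),
    (0.64, "strong"),
]


def interpret(e2):
    for upper, label in BINS:
        if e2 < upper:
            return label
    # Assumption: label for e2 >= 0.64, which the text does not list.
    return "very strong"


e2 = epsilon_squared(h=10.0, n=101)
print(e2, interpret(e2))  # 0.1 moderate
```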

Lastly, because YWI interventions were not randomly assigned to farmers and the farmers who benefited from them were not randomly selected, causal inferences from these results and their generalization to a larger population of SHF are somewhat speculative. However, interpreting the *p*-values as approximate *p*-values for permutation tests supports the conclusion that the observed differences are larger than can be explained by chance alone [24].
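To illustrate the permutation interpretation, the sketch below approximates a permutation *p*-value for the Kruskal–Wallis H statistic. It repeatedly shuffles the pooled observations across subsets of fixed size and counts how often the shuffled H is at least as large as the observed one. This is an illustration of the idea, not the analysis performed in the study. The helper names, the 2000 permutations, and the seed are assumptions. H is computed here without the tie correction, which does not change the permutation ranking because the pooled data are the same in every permutation.

```python
import random


def kruskal_h(groups):
    """Kruskal-Wallis H (no tie correction), using average ranks for ties."""
    pooled = sorted(x for g in groups for x in g)
    rank_of, i = {}, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n = len(pooled)
    s = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * s - 3 * (n + 1)


def permutation_pvalue(groups, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for H, with the +1 correction."""
    rng = random.Random(seed)
    observed = kruskal_h(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for k in sizes:
            shuffled.append(pooled[start:start + k])
            start += k
        if kruskal_h(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


separated = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
print(kruskal_h(separated), permutation_pvalue(separated))
```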
