#### 2.2.2. Structural Equation Model of Farmers' Willingness to Remediate

To identify the main factors influencing farmers' willingness to participate in soil remediation, a structural equation model (SEM) was constructed using IBM SPSS Amos 24.0. SEM is widely used in studies of farmers' behaviors and attitudes and is mainly suited to continuous variables. It models the relationships among latent variables, which makes it effective for constructs such as farmers' cognition that are difficult to observe directly, and it clearly depicts the processes underlying farmers' behaviors and attitudes [23,31]. An SEM consists of two components: a measurement model and a structural model. The measurement model describes how each latent variable is measured by its corresponding observable variables and is based on factor analysis, whereas the structural model describes the relationships among the latent variables and is based on path analysis.

The matrix equations of the measurement model are:

$$X = \Lambda_{x} \xi + \delta \tag{1}$$

$$Y = \Lambda_{y} \eta + \varepsilon \tag{2}$$

where ξ and η are the exogenous and endogenous latent variables, respectively; X is the vector of exogenous observable variables corresponding to ξ; Y is the vector of endogenous observable variables corresponding to η; $\Lambda_x$ is the loading matrix of X on ξ; $\Lambda_y$ is the loading matrix of Y on η; and δ and ε are the measurement errors of X and Y, respectively.
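As a minimal illustration of the measurement model, the Python sketch below simulates one latent variable ξ (variance fixed at 1) measured by three indicators, following the form of Eq. (1). The loadings and the error standard deviation are hypothetical values, not estimates from this study, and the errors are assumed to be independent of ξ and of each other. Under these assumptions, the covariance between two indicators should approach λ<sub>i</sub>λ<sub>j</sub>Var(ξ), which is the covariance structure the measurement model imposes on the observed data.

```python
import random


def simulate_indicators(loadings, n=100_000, error_sd=0.5, seed=1):
    """Simulate Eq. (1) for one latent variable: x_i = lambda_i * xi + delta_i.

    xi ~ N(0, 1) and each delta_i ~ N(0, error_sd^2), all drawn independently.
    Returns one list of n simulated values per indicator.
    """
    rng = random.Random(seed)
    columns = [[] for _ in loadings]
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)  # latent variable (unobserved in practice)
        for column, lam in zip(columns, loadings):
            # observed indicator = loading * latent variable + measurement error
            column.append(lam * xi + rng.gauss(0.0, error_sd))
    return columns


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


# Hypothetical loadings for three indicators of the same latent variable.
x1, x2, x3 = simulate_indicators([0.8, 0.6, 0.7])
print(covariance(x1, x2))  # should be close to 0.8 * 0.6 * Var(xi) = 0.48
```

The variance of each indicator likewise approaches λ<sub>i</sub>² + Var(δ<sub>i</sub>), e.g. 0.64 + 0.25 = 0.89 for the first indicator.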

The matrix equation of the structural model is:

$$\eta = \Gamma \xi + \zeta \tag{3}$$

where Γ is the coefficient matrix of the effects of ξ on η, and ζ is the vector of residuals (disturbances) of the structural equation.
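Taken together, Eqs. (1) to (3) imply a covariance structure for the observed variables. The derivation below assumes, as is standard in covariance-based SEM, that ξ, ζ, δ, and ε have zero means and are mutually uncorrelated; Φ, Ψ, Θ<sub>δ</sub>, and Θ<sub>ε</sub> denote their covariance matrices and are notation introduced here for illustration, not symbols defined elsewhere in this section. Substituting Eq. (3) into Eq. (2) and taking expectations gives the model-implied covariance blocks:

$$
\begin{aligned}
\Sigma_{xx} &= \Lambda_x \Phi \Lambda_x^{\top} + \Theta_\delta \\
\Sigma_{yx} &= \Lambda_y \Gamma \Phi \Lambda_x^{\top} \\
\Sigma_{yy} &= \Lambda_y \left( \Gamma \Phi \Gamma^{\top} + \Psi \right) \Lambda_y^{\top} + \Theta_\varepsilon
\end{aligned}
$$

Estimation then chooses the free parameters θ so that the implied matrix Σ(θ) reproduces the sample covariance matrix S as closely as possible. Under maximum likelihood, for example, the discrepancy minimized for p observed variables is

$$
F_{\mathrm{ML}}(\theta) = \ln\lvert \Sigma(\theta) \rvert + \operatorname{tr}\!\left( S\, \Sigma(\theta)^{-1} \right) - \ln\lvert S \rvert - p
$$

and the χ² statistic used to evaluate the model is typically (N − 1) times its minimum for a sample of size N (some programs use N instead).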

In general, farmers' individual characteristics (e.g., age, gender, and education), household endowments (e.g., household income and farmland area), and technical characteristics may influence their attitudes toward soil remediation [21,27,32]. Perceived benefits represent farmers' subjective evaluation of soil remediation. Based on the relevant literature [33,34] and local conditions, we designed questions to evaluate farmers' perceived benefits, such as whether remediation affects the food supply and whether it increases household income. Because we surveyed farmers in areas where soil remediation projects had already been carried out, their current participation status may have been shaped by factors such as individual characteristics and household endowments, and may in turn affect their future intentions. We therefore constructed a framework with individual characteristics, household endowments, technical characteristics, participation status, and perceived benefits as exogenous latent variables and farmers' willingness as the endogenous latent variable (Figure 3). The 18 observable variables under the 6 latent variables were treated as continuous and were all measured on a 5-point Likert scale. Variable descriptions and descriptive statistics are presented in Supplementary Table S3.

Model fit was evaluated with several fit indices: the chi-square statistic (χ²), the ratio of chi-square to degrees of freedom (χ²/df), the goodness-of-fit index (GFI), the adjusted goodness-of-fit index (AGFI), and the root mean square error of approximation (RMSEA). GFI and AGFI generally range from 0 to 1, and values above 0.9 indicate acceptable fit. An RMSEA value of 0.08 or less indicates acceptable model fit [35].
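The indices that can be computed directly from summary output are sketched below in Python. The RMSEA formula with N − 1 in the denominator and the GFI-to-AGFI adjustment are the commonly cited forms and are stated here as assumptions; individual software packages may differ slightly. GFI itself requires the fitted covariance matrices, so it is taken as an input. The cut-offs mirror those in the text, and the inputs are hypothetical rather than results of this study.

```python
import math


def chi2_ratio(chi2, df):
    """Chi-square divided by its degrees of freedom (chi^2/df)."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def agfi(gfi, df, p):
    """AGFI = 1 - [p(p + 1) / (2 df)] * (1 - GFI), with p observed variables."""
    return 1.0 - (p * (p + 1) / (2.0 * df)) * (1.0 - gfi)


def acceptable_fit(gfi_value, agfi_value, rmsea_value):
    """Cut-offs from the text: GFI and AGFI above 0.9, RMSEA no greater than 0.08."""
    return gfi_value > 0.9 and agfi_value > 0.9 and rmsea_value <= 0.08


# Hypothetical output values (18 observed variables), not results of this study.
chi2, df, n_obs, p_obs, gfi = 150.0, 120, 301, 18, 0.95
print(chi2_ratio(chi2, df))              # 1.25
print(round(rmsea(chi2, df, n_obs), 3))  # 0.029
print(round(agfi(gfi, df, p_obs), 3))    # 0.929
print(acceptable_fit(gfi, agfi(gfi, df, p_obs), rmsea(chi2, df, n_obs)))  # True
```

The `max(..., 0)` term means that any model with χ² ≤ df receives an RMSEA of exactly 0.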

**Figure 3.** Framework for analyzing the factors influencing farmers' willingness to participate in soil remediation.

#### 2.2.3. Random Forest Model of Farmers' Technology Preferences

To explore the main factors influencing farmers' technology preferences for soil remediation, a random forest (RF) binary classification algorithm was used. RF is an ensemble classification model built from classification and regression trees (CART). It is a supervised machine learning algorithm that tends to achieve low error and high classification accuracy even with a limited training sample, and it can be used to evaluate the importance of influencing factors [36]. In RF, each node is split using the best split among a subset of predictor variables randomly selected at that node. This somewhat counterintuitive strategy performs well compared with many other data mining techniques, including discriminant analysis, support vector machines, and neural networks, and is robust against overfitting [36].
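The node-splitting rule described above can be illustrated with a minimal Python sketch. It computes Gini impurity and searches only a random subset of mtry = ⌊√p⌋ predictors for the threshold that most reduces impurity, using a bootstrap sample of the data as happens at the root of one tree. The choice of mtry (the default of the R randomForest package for classification), the numeric coding of predictors, and the toy data are assumptions for illustration; a full RF applies this rule recursively within each tree and grows many trees.

```python
import math
import random


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels, feature_ids):
    """Best threshold split over the given features only.

    Returns (feature, threshold, impurity decrease), or None if no split exists.
    The decrease is the parent Gini minus the size-weighted Gini of the children.
    """
    n = len(rows)
    parent = gini(labels)
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        for t in values[:-1]:  # rows with value <= t go left, the rest go right
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            decrease = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or decrease > best[2]:
                best = (f, t, decrease)
    return best


def root_split_of_one_tree(rows, labels, rng):
    """Root node of one RF tree: bootstrap the data (drawn once per tree), then
    split on the best of mtry = floor(sqrt(p)) randomly chosen predictors."""
    n, p = len(rows), len(rows[0])
    idx = [rng.randrange(n) for _ in range(n)]  # n rows sampled with replacement
    boot_rows = [rows[i] for i in idx]
    boot_labels = [labels[i] for i in idx]
    mtry = max(1, int(math.sqrt(p)))
    features = rng.sample(range(p), mtry)
    return best_split(boot_rows, boot_labels, features)


# Hypothetical toy data: P = phytoremediation, S = passivation.
rows = [(1, 5), (2, 5), (3, 1), (4, 1)]
labels = ["P", "P", "S", "S"]
print(best_split(rows, labels, [0]))  # (0, 2, 0.5)
print(root_split_of_one_tree(rows, labels, random.Random(0)))
```

Splitting the first predictor at x ≤ 2 separates the two classes perfectly, so the impurity drops from 0.5 to 0.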

The RF model was constructed in R 4.2.1. The dependent variable was farmers' binary technology preference (phytoremediation or passivation), and the independent variables were farmers' individual characteristics (gender, age, and education), household endowments (e.g., household size, farmland area, and income), participation status, and technological characteristics. The Gini index method was then applied to evaluate the importance of the variables in classifying farmers' technology preferences. MeanDecreaseGini is the total decrease in Gini impurity (the heterogeneity of the observations at a node) produced by splits on a given variable, averaged over all classification trees in the forest, which allows the importance of the variables to be compared: the larger the value, the more important the variable. In this study, all variable importance values were normalized to sum to 100% for presentation [37]. A total of 18 independent variables from the original survey data were used. Variable descriptions and descriptive statistics are presented in Supplementary Table S7.
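The importance measure and the normalization described above can be sketched in Python. Each split is assumed to contribute the drop in size-weighted Gini impurity, n·G − n<sub>L</sub>·G<sub>L</sub> − n<sub>R</sub>·G<sub>R</sub>; contributions are summed per variable within each tree, averaged over trees, and finally rescaled to sum to 100%. This follows the usual definition of mean decrease in Gini, but the exact internal scaling of the R randomForest package (where the values can be obtained with `importance(model, type = 2)`) may differ. The split records below are hypothetical.

```python
from collections import defaultdict


def mean_decrease_gini(forest_splits):
    """Average, over trees, of the total Gini decrease from splits on each variable.

    forest_splits holds one list per tree; each entry describes one split as
    (variable, n, gini_node, n_left, gini_left, n_right, gini_right).
    """
    totals = defaultdict(float)
    for tree in forest_splits:
        for var, n, g, n_left, g_left, n_right, g_right in tree:
            # drop in size-weighted impurity produced by this split
            totals[var] += n * g - n_left * g_left - n_right * g_right
    return {var: total / len(forest_splits) for var, total in totals.items()}


def to_percent(importance):
    """Rescale importance values so that they sum to 100 (%)."""
    grand_total = sum(importance.values())
    return {var: 100.0 * value / grand_total for var, value in importance.items()}


# Two hypothetical trees; variable names echo the text, numbers are invented.
forest = [
    [("age", 100, 0.5, 60, 0.3, 40, 0.2), ("income", 60, 0.3, 30, 0.1, 30, 0.2)],
    [("income", 100, 0.5, 50, 0.2, 50, 0.3)],
]
importance = mean_decrease_gini(forest)
for var, pct in sorted(to_percent(importance).items(), key=lambda kv: -kv[1]):
    print(f"{var}: {pct:.1f}%")
```

Here age contributes 24 in the first tree and income contributes 9 and 25 across the two trees, so the averaged decreases are 12 and 17, or about 41.4% and 58.6% after normalization.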

#### 2.2.4. Farmer Features Extraction

To further extract the characteristics of farmers who chose each remediation technology, principal component analysis (PCA) was performed separately on the data of farmers who chose phytoremediation and those who chose passivation (IBM SPSS Statistics 24.0). The validity of each variable was assessed by its communality, which indicates the proportion of the variable's variance explained by the common factors; variables with communalities greater than 0.7 are generally considered well expressed by the common factors. We therefore took the variables with communalities greater than 0.7 as those that best summarize the characteristics of the farmers (households) who selected a given remediation technology. The frequency distributions of these variables were then examined to identify the characteristics of farmers who chose that technology.
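A minimal Python sketch of this screening step follows. It assumes an orthogonal (unrotated or varimax-rotated) component solution, for which a variable's communality equals the sum of its squared loadings on the retained components, and applies the 0.7 cut-off from the text; the frequency distribution of a retained variable is then tabulated as response shares. The loadings, variable names, and responses are hypothetical, and the number of retained components is an assumption because the text does not state it.

```python
from collections import Counter


def communalities(loadings):
    """Communality of each variable: sum of squared loadings on the retained
    components (valid for orthogonal, e.g. unrotated or varimax, solutions)."""
    return [sum(loading ** 2 for loading in row) for row in loadings]


def well_expressed(names, loadings, cutoff=0.7):
    """Variables whose communality exceeds the cut-off used in the text."""
    return [name for name, h2 in zip(names, communalities(loadings)) if h2 > cutoff]


def frequency_table(responses):
    """Share of respondents at each response level of one variable."""
    counts = Counter(responses)
    return {level: counts[level] / len(responses) for level in sorted(counts)}


# Hypothetical loadings on two retained components (rows = variables).
names = ["age", "education", "farmland area"]
loadings = [[0.8, 0.3], [0.5, 0.4], [0.9, 0.1]]
print(well_expressed(names, loadings))  # ['age', 'farmland area']
print(frequency_table([1, 2, 2, 3, 3, 3]))
```

With these loadings the communalities are 0.73, 0.41, and 0.82, so only age and farmland area pass the 0.7 threshold.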

Data calculations and statistical analyses were performed using Microsoft Excel 2019 and IBM SPSS Statistics 24.0. Figures were drawn using OriginLab Origin 2022.
