3.3.2. Machine Learning Methods

Machine learning models are gaining widespread attention for their ability to handle large volumes of input data from multiple platforms and to solve nonlinear problems. Artificial Neural Networks (ANN) and Back Propagation Neural Networks (BPNN), which can automatically extract relevant features from data, are commonly used for remote sensing estimation of crop N status. In practical applications, however, they require a large training dataset, and the number and size of hidden layers, training efficiency, and overfitting must all be considered. Yang et al. [96] used a Gaussian radial basis function as the hidden layer of the neural network to avoid the tedious calculation and overfitting of BPNN; the resulting network is structurally adaptive, generalizes well, converges quickly during learning, and is more stable and reliable in application. When neural network models are constructed directly, results differ little among different types of input parameters, but accuracy improves significantly after PCA is applied to the model input parameters [136–139]. The combined use of PCA and machine learning methods therefore shows unique advantages and promising applications.
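
One plausible reason PCA helps is that neighboring spectral bands are strongly correlated, so a few principal components can carry most of the variance while giving the network far fewer inputs. The following is a minimal sketch of that compression step, not the procedure used in the cited studies. It uses only the Python standard library; the three-band reflectance values, the latent factor, and all function names are hypothetical and chosen purely for illustration.

```python
import math

def center(rows):
    """Subtract the column mean from every value."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=200):
    """Power iteration for the leading eigenpair of a covariance matrix."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v

# Hypothetical reflectance in three bands, all driven by one latent factor t
# (an assumption standing in for canopy N status), plus small perturbations.
latent = [i / 9.0 for i in range(10)]
bands = [[0.10 + 0.50 * t + 0.01 * math.sin(7 * i),
          0.20 + 0.40 * t + 0.01 * math.cos(5 * i),
          0.30 - 0.30 * t + 0.01 * math.sin(3 * i)]
         for i, t in enumerate(latent)]

centered = center(bands)
cov = covariance(centered)
eigenvalue, direction = first_component(cov)

# Share of total variance captured by the first principal component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# One score per sample: this single column would replace the three bands
# as the model input.
scores = [sum(x * w for x, w in zip(row, direction)) for row in centered]
print(f"variance explained by PC1: {explained:.3f}")
```

In practice a numerical library would normally be used for the decomposition, and more than one component would usually be retained; the single-component case is shown only to keep the sketch short.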

Support Vector Machine (SVM) is highly effective for high-dimensional data problems. Yao et al. [140] compared traditional regression analysis, ANN, and SVM in terms of prediction accuracy, computational efficiency, and complexity for the inversion of wheat LNC. The results showed that the machine learning models were more accurate, with the SVM method being more stable in dealing with potential confounding factors for most varieties, ecological niches, and growth stages. The kernel function is the central design choice in SVM, and multiple-kernel support vector regression (MK-SVR) has an advantage in estimating N status at different growth stages because it combines the strengths of a local kernel function and a global kernel function [141]. However, complex optimization algorithms can reduce the computational efficiency of SVM. Least squares SVM (LS-SVM), which combines least squares with SVM, can solve linear or nonlinear multivariate estimation problems relatively quickly and thus significantly improves computational efficiency [17,141,142].
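
The two ideas in this paragraph can be illustrated together in a short sketch: a mixed kernel formed as a weighted sum of a local and a global kernel, and the LS-SVM formulation, which replaces the quadratic program of standard SVM with a single linear system. This is a hedged illustration rather than the implementation used in the cited studies. The choice of an RBF kernel as the local kernel and a polynomial kernel as the global kernel, the kernel weight, the regularization value `gamma`, and the synthetic one-dimensional data are all assumptions.

```python
import math

def rbf_kernel(x, z, sigma=0.5):
    """Local kernel: similarity decays quickly with distance."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

def poly_kernel(x, z, degree=2):
    """Global kernel: distant samples still influence each other."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** degree

def mixed_kernel(x, z, weight=0.5):
    """Convex combination of the local and global kernels (assumed form)."""
    return weight * rbf_kernel(x, z) + (1.0 - weight) * poly_kernel(x, z)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def lssvm_fit(xs, ys, gamma=100.0):
    """LS-SVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [mixed_kernel(xs[i], xs[j]) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the kernel diagonal
        a.append(row)
    sol = solve(a, [0.0] + list(ys))
    return sol[0], sol[1:]  # bias, alphas

def lssvm_predict(x, xs, bias, alphas):
    """Prediction is the bias plus a kernel-weighted sum over training samples."""
    return bias + sum(al * mixed_kernel(x, xi) for al, xi in zip(alphas, xs))

# Synthetic data: a smooth nonlinear response to one hypothetical predictor.
xs = [[i / 10.0] for i in range(11)]
ys = [1.0 + 2.0 * x[0] - 1.5 * x[0] ** 2 for x in xs]

bias, alphas = lssvm_fit(xs, ys)
fitted = [lssvm_predict(x, xs, bias, alphas) for x in xs]
max_error = max(abs(f - y) for f, y in zip(fitted, ys))
print(f"largest training residual: {max_error:.4f}")
```

Because training reduces to one linear solve, there is no iterative optimization loop, which is the source of the speed advantage noted above. The trade-off is that every training sample receives a nonzero coefficient, so the solution is not sparse as in standard SVM.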

When a priori knowledge is weak, Gaussian Process Regression (GPR), a flexible probabilistic Bayesian model with relatively simple parameter optimization, can adaptively fit complex datasets nonlinearly and has been applied to crop N status inversion [11,93,143]. Random Forest (RF), an ensemble method with decision trees as the basic unit, can rank the importance of variables, reduce redundancy in high-dimensional datasets, and is highly stable, giving it broad application prospects [37,144]. When the individual bands across the entire spectral range are used as input variables, the accuracy of RF inversion (R<sup>2</sup> = 0.89) is higher than that of univariate regression with existing VIs; when VIs are used as input features, model accuracy improves further, with an R<sup>2</sup> of 0.95 [145]. Determining the appropriate input dataset is therefore key to exploiting the predictive power of the model.
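
The variable ranking attributed to RF above rests on a simple idea: each tree is grown on a bootstrap sample with a random subset of candidate features, and a feature earns importance in proportion to the error reduction of the splits that use it. The sketch below is a deliberately stripped-down stand-in that uses depth-one trees only, whereas real RF implementations grow much deeper trees. The synthetic features, the target, and all function names and parameter values are hypothetical.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, targets, features):
    """Return (gain, feature, threshold) of the best single split, if any."""
    parent = sse(targets)
    best = (0.0, None, None)
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, f, thr)
    return best

def forest_importance(rows, targets, n_trees=50, mtry=2, seed=0):
    """Normalized error reduction per feature over bagged depth-one trees."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    totals = [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(d), mtry)           # random feature subset
        gain, f, _ = best_split([rows[i] for i in idx],
                                [targets[i] for i in idx], feats)
        if f is not None:
            totals[f] += gain
    total = sum(totals)
    return [t / total for t in totals] if total > 0 else totals

# Synthetic inputs: feature 0 drives the target strongly, feature 1 weakly,
# and feature 2 not at all (all values are assumptions for illustration).
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random(), data_rng.random()]
        for _ in range(40)]
targets = [3.0 * r[0] + 0.3 * r[1] for r in rows]

importance = forest_importance(rows, targets)
ranking = sorted(range(3), key=lambda f: importance[f], reverse=True)
print("features ranked by importance:", ranking)
```

A ranking of this kind is one way to screen candidate bands or VIs before model fitting, which connects to the point above that the choice of input dataset largely determines predictive power.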

Machine learning techniques can be used to reveal the physiological and structural characteristics of plants and can respond to dynamic physiological differences caused by environmental influences [144]. Research is no longer limited to conventional machine learning models but improves them through input datasets [138,145], model parameters [96,144], and functions and structures [17,96,141]. These improvements not only raise the efficiency of data analysis but also enable more accurate N status analysis, making N monitoring more efficient and flexible. The input variables have diversified from single spectral information to mathematically transformed spectral data, spectral indices, and texture information, while machine learning methods are developing toward greater efficiency, accuracy, and speed.
