*2.4. Sample-Set Division Algorithm According to Sample-Set Partitioning Based on the Joint X–Y Distance (SPXY)*

In this study, the SPXY algorithm proposed by Galvão et al. was used to divide the samples into modeling and verification sets [44]. SPXY partitions the sample set from a statistical perspective: when selecting the modeling set, it accounts for the differences between samples in both the spectra and the property parameters. The algorithm first calculates the Euclidean distance between the spectra of every pair of samples using Equation (5). It then selects the two samples separated by the largest distance as the first two members of the modeling set.

$$d_x(p,q) = \sqrt{\sum_{i=1}^{I} \left[ x_p(i) - x_q(i) \right]^2}; \quad p, q \in [1, N], \tag{5}$$

where $x_p(i)$ and $x_q(i)$ are the spectral values of samples $p$ and $q$ at the $i$th wavelength, respectively; $I$ is the number of wavelengths in the spectrum; and $N$ is the number of samples.
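
As an illustration of Equation (5) and of the choice of the first two modeling samples, the following minimal Python sketch computes the pairwise spectral distances and locates the farthest pair. The function names (`spectral_distances`, `farthest_pair`) and the toy spectra are hypothetical and are not taken from the original study.

```python
import math

def spectral_distances(spectra):
    """Return the N x N matrix of Euclidean distances d_x(p, q), as in Equation (5)."""
    n = len(spectra)
    dist = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            d = math.dist(spectra[p], spectra[q])  # sqrt of summed squared differences
            dist[p][q] = dist[q][p] = d            # the matrix is symmetric
    return dist

def farthest_pair(dist):
    """Return the indices (p, q), with p < q, of the two most distant samples."""
    n = len(dist)
    pairs = ((p, q) for p in range(n) for q in range(p + 1, n))
    return max(pairs, key=lambda pq: dist[pq[0]][pq[1]])

# Toy data (illustrative values only): 4 samples measured at 3 wavelengths.
spectra = [
    [0.10, 0.20, 0.30],
    [0.12, 0.21, 0.29],
    [0.50, 0.60, 0.70],
    [0.90, 0.80, 0.95],
]
dist = spectral_distances(spectra)
first_two = farthest_pair(dist)  # samples 0 and 3 are the farthest apart here
```

For real spectra with hundreds of wavelengths, a vectorized array library would normally replace the nested loops; the pure-Python form is used here only to keep the correspondence with Equation (5) explicit.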

Next, the Euclidean distances between each remaining sample and the samples already selected were calculated. The remaining sample whose smallest distance to the selected samples was largest, i.e., the sample farthest from the current modeling set, was selected as the third sample in the modeling set. These steps were repeated until the number of selected samples reached the predetermined number.
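
The iterative selection described above can be sketched as follows. This is a minimal illustration that assumes the max-min rule of the Kennard–Stone procedure (each new sample is the one whose nearest selected neighbor is farthest away); the function name `maxmin_select` and the toy distance matrix are hypothetical.

```python
def maxmin_select(dist, n_select):
    """Select n_select sample indices from a symmetric distance matrix (max-min rule)."""
    n = len(dist)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Step 1: start with the two samples that are farthest apart.
    p0, q0 = max(
        ((p, q) for p in range(n) for q in range(p + 1, n)),
        key=lambda pq: dist[pq[0]][pq[1]],
    )
    selected = [p0, q0]
    remaining = [i for i in range(n) if i not in (p0, q0)]
    # Step 2: repeatedly add the candidate whose distance to its nearest
    # already-selected sample is the largest.
    while len(selected) < n_select:
        best = max(remaining, key=lambda r: min(dist[r][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: 5 samples lying on a line at positions 0, 1, 4, 9, and 10.
positions = [0, 1, 4, 9, 10]
dist = [[abs(a - b) for b in positions] for a in positions]
modeling = maxmin_select(dist, 3)
verification = [i for i in range(len(positions)) if i not in modeling]
```

In this toy case the two end points are chosen first, followed by the sample at position 4, which is the remaining sample farthest from both; the unselected samples form the verification set.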

SPXY extends this procedure by also considering the distance between samples in the property space, $d_y(p,q)$, which is defined analogously in Equation (6):

$$d_y(p,q) = \sqrt{\left(y_p - y_q\right)^2} = \left| y_p - y_q \right|; \quad p, q \in [1, N], \tag{6}$$

where $y_p$ and $y_q$ are the property parameters of samples $p$ and $q$, respectively.

The distances $d_x(p,q)$ and $d_y(p,q)$ were each divided by their maximum value over the dataset so that the spectral and property spaces carried equal weight in the sample selection. The normalized joint distance $d_{xy}(p,q)$, which replaces $d_x(p,q)$ in the selection procedure described above, was calculated as follows:

$$d_{xy}(p,q) = \frac{d_x(p,q)}{\max_{p,q \in [1,N]} d_x(p,q)} + \frac{d_y(p,q)}{\max_{p,q \in [1,N]} d_y(p,q)}. \tag{7}$$
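
To make the normalization in Equation (7) concrete, the short Python sketch below builds the joint distance matrix from spectra and property values. The function name `joint_distances` and the toy data are hypothetical, and a single scalar property per sample is assumed, as in Equation (6).

```python
import math

def joint_distances(spectra, y):
    """Return the N x N matrix of normalized joint distances d_xy(p, q), as in Equation (7)."""
    n = len(spectra)
    # Spectral distances (Equation (5)) and property distances (Equation (6)).
    dx = [[math.dist(spectra[p], spectra[q]) for q in range(n)] for p in range(n)]
    dy = [[abs(y[p] - y[q]) for q in range(n)] for p in range(n)]
    dx_max = max(max(row) for row in dx)
    dy_max = max(max(row) for row in dy)
    if dx_max == 0 or dy_max == 0:
        raise ValueError("all samples coincide in X or in y; cannot normalize")
    # Each term is scaled to [0, 1], so both spaces contribute equally.
    return [
        [dx[p][q] / dx_max + dy[p][q] / dy_max for q in range(n)]
        for p in range(n)
    ]

# Toy data (illustrative values only): 3 samples, 2 wavelengths, 1 property.
spectra = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
y = [1.0, 5.0, 2.0]
dxy = joint_distances(spectra, y)
```

In this toy case samples 0 and 2 are the farthest apart in the spectral space alone, whereas samples 0 and 1 have the largest joint distance because their property values differ most. This shows how the property term can change which samples are selected first.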
