*3.6. Model Evaluation*

Five parameters, namely accuracy (ACC), sensitivity (SE), specificity (SP), efficiency (EFF), and Matthews correlation coefficient (MCC), were applied to evaluate the identification ability of the RF and OPLS-DA models. The ruggedness of the OPLS-DA model was investigated through permutation tests with 200 permutations. Furthermore, the cumulative prediction ability (Q<sup>2</sup>), cumulative interpretation ability (R<sup>2</sup>), root mean square error of estimation (RMSEE), root mean square error of cross-validation (RMSECV), and root mean square error of prediction (RMSEP) served as evaluation indexes for the predictive power of the OPLS-DA model [33,66].

Values of TP (correctly identified samples of the positive class), TN (correctly identified samples of the negative class), FN (positive-class samples incorrectly identified as negative), and FP (negative-class samples incorrectly identified as positive) were obtained from the confusion matrices of the classification models. Subsequently, ACC, SE, SP, EFF, and MCC were calculated using Equations (1)–(5), and the values of Q<sup>2</sup>, R<sup>2</sup>, RMSEE, RMSECV, and RMSEP were computed with SIMCA 14.1 software.

$$\text{ACC} = \frac{(\text{TN} + \text{TP})}{(\text{TP} + \text{TN} + \text{FP} + \text{FN})} \tag{1}$$

$$\text{SE} = \frac{\text{TP}}{(\text{TP} + \text{FN})} \tag{2}$$

$$\text{SP} = \frac{\text{TN}}{(\text{TN} + \text{FP})} \tag{3}$$

$$\text{EFF} = \sqrt{\text{SE} \times \text{SP}} \tag{4}$$

$$\text{MCC} = \frac{(\text{TP} \times \text{TN} - \text{FP} \times \text{FN})}{\sqrt{(\text{TP} + \text{FP})(\text{TP} + \text{FN})(\text{TN} + \text{FP})(\text{TN} + \text{FN})}} \tag{5}$$
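
As an illustration of Equations (1)–(5), the following minimal Python sketch computes the five parameters from the four confusion-matrix counts. It is not the workflow used in this study. The function name `classification_metrics` and the example counts are hypothetical, EFF is taken here as the geometric mean (square root) of SE × SP, and returning 0 for MCC when its denominator is zero is a common convention assumed for this sketch.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, SE, SP, EFF, and MCC from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0);
    otherwise SE or SP is undefined and a ZeroDivisionError is raised.
    """
    acc = (tn + tp) / (tp + tn + fp + fn)   # Equation (1)
    se = tp / (tp + fn)                     # Equation (2)
    sp = tn / (tn + fp)                     # Equation (3)
    eff = math.sqrt(se * sp)                # Equation (4), geometric mean (assumed)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Equation (5)
    return {"ACC": acc, "SE": se, "SP": sp, "EFF": eff, "MCC": mcc}


# Hypothetical counts, for illustration only (not data from this study).
metrics = classification_metrics(tp=45, tn=40, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a multi-class model, the same function could be applied per class in a one-versus-rest manner, with the class of interest treated as positive.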

For model performance, lower values of RMSEE, RMSECV, and RMSEP indicate better predictive ability. Conversely, the closer the values of ACC, SE, SP, EFF, MCC, Q<sup>2</sup>, and R<sup>2</sup> are to 1, the better the model performs.
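
To make the error indexes concrete, the sketch below shows a generic root mean square error calculation. It is an assumption-laden illustration and not the SIMCA 14.1 implementation: the function name `rmse` and the example values are hypothetical, and the software may use different denominators (for example, a degrees-of-freedom correction for RMSEE). Under this simplified view, RMSEE, RMSECV, and RMSEP differ only in which predictions are compared with the observed values: fitted values for the calibration set, cross-validated predictions, and predictions for an external test set, respectively.

```python
import math


def rmse(observed, predicted):
    """Generic root mean square error between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one sample is required")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical dummy class labels (0/1) and model outputs, for illustration only.
y_observed = [1, 0, 1, 0]
y_predicted = [0.9, 0.2, 0.8, 0.1]
print(f"RMSE: {rmse(y_observed, y_predicted):.4f}")
```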
