*2.6. Model Evaluation*

There were 400 spectra across the four classes of mixed samples. Thirty percent of the spectra were randomly selected as the prediction dataset, and the remaining 70% were divided into training and validation datasets in a 3:1 ratio; the training and validation datasets were then used to tune the network hyperparameters. Model performance was evaluated using the accuracy of the training, validation, and prediction datasets (*ACCT*, *ACCV*, and *ACCP*), together with the *Precision*, *Recall*, and *F*1-*score* on the prediction dataset. *Precision* is the fraction of predicted positives that are true positives; *Recall* is the fraction of actual positives that are correctly predicted as positive; the *F*1-*score* is the harmonic mean of *Precision* and *Recall*. The *ACC*, *Precision*, *Recall*, and *F*1-*score* are calculated using the following expressions:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$Precision = \frac{TP}{TP + FP} \tag{2}$$

$$Recall = \frac{TP}{TP + FN} \tag{3}$$

$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{4}$$

where *TP*, *TN*, *FP*, and *FN* denote true positives, true negatives, false positives, and false negatives, respectively.
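As a minimal sketch, the four metrics defined in Eqs. (1)–(4) can be computed directly from the confusion-matrix counts for a given class (in a multi-class setting such as the four mixed-sample classes, the counts would be obtained one-vs-rest per class). The counts below are illustrative placeholders, not results from this study.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int):
    """Compute ACC, Precision, Recall, and F1-score from
    true/false positive/negative counts, following Eqs. (1)-(4)."""
    acc = (tp + tn) / (tp + tn + fp + fn)               # Eq. (1)
    precision = tp / (tp + fp)                          # Eq. (2)
    recall = tp / (tp + fn)                             # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (4)
    return acc, precision, recall, f1

# Illustrative example with hypothetical counts:
acc, precision, recall, f1 = classification_metrics(tp=45, tn=40, fp=5, fn=10)
print(f"ACC={acc:.3f}  Precision={precision:.3f}  "
      f"Recall={recall:.3f}  F1={f1:.3f}")
```

Note that *Precision* and *Recall* share the numerator *TP* but differ in the denominator (predicted positives vs. actual positives), which is why the harmonic mean in Eq. (4) penalizes a large gap between the two.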
