*2.6. Performance Evaluation Indicators for the Model*

The confusion matrix, accuracy, precision, recall, and F1-score are commonly used to evaluate the performance of models on single-label image classification problems [37]. The confusion matrix compares the ground-truth labels with the predicted labels when evaluating the recognition accuracy of a model on the images. Accuracy is the proportion of correctly predicted samples among all samples. Precision is the proportion of samples predicted as positive that are actually positive. Recall is the proportion of actual positive samples that are correctly predicted. In practice, precision and recall mutually "restrict" each other. Therefore, the F1-score, the harmonic mean of precision and recall, is needed to comprehensively evaluate model performance. The higher the F1-score, the better the model performs. Each indicator is calculated as follows.

$$\text{Precision (P)} = \frac{TP}{TP + FP} \tag{5}$$

$$\text{Recall (R)} = \frac{TP}{TP + FN} \tag{6}$$

$$\text{Accuracy (Acc)} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

$$\text{F1-score} = \frac{2 \times \text{P} \times \text{R}}{\text{P} + \text{R}} \tag{8}$$

Here, *TP* is the number of samples that are actually positive and predicted as positive, and *TN* is the number of samples that are actually negative and predicted as negative. Likewise, *FP* is the number of samples that are actually negative but predicted as positive, and *FN* is the number of samples that are actually positive but predicted as negative. All four counts can be read directly from the confusion matrix.
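The four counts and Equations (5)–(8) can be sketched in a few lines of code. The following is a minimal illustration for binary labels (1 = positive, 0 = negative); the function names are illustrative and not part of the paper's method, and in practice library routines (e.g., those in scikit-learn) would typically be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Compute precision, recall, accuracy, and F1-score per Eqs. (5)-(8)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)                           # Eq. (5)
    recall = tp / (tp + fn)                              # Eq. (6)
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. (7)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (8)
    return precision, recall, accuracy, f1
```

For example, with `y_true = [1, 1, 1, 0, 0, 0, 1, 0]` and `y_pred = [1, 0, 1, 0, 1, 0, 1, 0]`, the counts are TP = 3, TN = 3, FP = 1, FN = 1, so all four metrics equal 0.75.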
