2.4.2. Model Training and Validation Environment

The mixed dataset was divided into training, validation, and test sets at a ratio of 64:16:20. Images in the training and validation sets were resized to 640 × 640 pixels before being input to the network for training and validation; the test set was reserved for network assessment and testing. Model training was carried out on an Ubuntu 18.04 cloud server (CUDA 11.0, cuDNN 8.0.4, Python 3.8, PyTorch 1.8, 4 × NVIDIA RTX 3090). Model assessment was performed on a Windows 11 machine (CUDA 11.3, cuDNN 8.2.1, Python 3.8, PyTorch 1.10.2, 1 × NVIDIA GTX 1650). During the experiments, a freeze–unfreeze training schedule of 100:200 epochs was used, with primary batch sizes of 128:32, and 16:16 when the SAHI algorithm was not utilized. The other critical hyperparameters are listed in Table 3.
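The 64:16:20 split described above can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline; the sample count and variable names are invented, and a seeded shuffle is assumed for reproducibility:

```python
import random

# Hypothetical index list standing in for the mixed dataset of image samples.
num_samples = 1000
indices = list(range(num_samples))
random.Random(42).shuffle(indices)  # reproducible shuffle before splitting

n_train = int(num_samples * 0.64)        # 64% for training
n_val = int(num_samples * 0.16)          # 16% for validation
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]     # remaining 20% for testing

print(len(train_idx), len(val_idx), len(test_idx))  # 640 160 200
```

Each index subset would then be used to build the corresponding dataset, with the training and validation images resized to 640 × 640 before entering the network.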

**Table 3.** The primary hyperparameters of the model training process.


### 2.4.3. Evaluation Indicators

Precision (P), recall (R), average precision (AP), and several variants of mean average precision (*mAP*) are the main indicators used to assess how well the model detects apple flowers. These measures are computed from combinations of entries in the confusion matrix: a true positive (TP) is a positive sample predicted correctly, a true negative (TN) is a negative sample predicted correctly, a false positive (FP) is a negative sample incorrectly predicted as positive, and a false negative (FN) is a positive sample incorrectly predicted as negative. In addition, the precision–recall (P–R) curve plots recall (R) on the horizontal axis against precision (P) on the vertical axis (with an IoU threshold equal to 0.5). Different intersection over union (IoU) values were obtained by requiring a given degree of overlap between the prediction and the ground truth. The other specific metrics were calculated as follows:
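Since the TP/FP assignment above hinges on the overlap between a predicted box and the ground truth, the IoU of two axis-aligned boxes can be sketched as follows (the `(x1, y1, x2, y2)` corner format is an assumption for illustration):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2 x 2 boxes overlapping in a 1 x 2 strip: IoU = 2 / 6.
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))  # ≈ 0.333
```

A prediction is then counted as a TP when its IoU with a ground-truth box of the same class meets the chosen threshold (0.5 for the P–R curve above), and as an FP otherwise.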

AP: Average precision for a single category (IoU thresholds from 0.5 to 0.95 in steps of 0.05), i.e., for bud, half-open, fully open, and end-open apple flowers;

*mAP*ALL: Mean average precision of apple flowers of the four stages (objects of all sizes);

*mAP*S: *mAP* for small objects whose area is smaller than 32² pixels; *mAP*M: *mAP* for medium objects whose area is between 32² and 96² pixels; *mAP*L: *mAP* for large objects whose area is larger than 96² pixels.
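The COCO-style size buckets behind *mAP*S, *mAP*M, and *mAP*L can be sketched as follows (the function name is illustrative; boundary handling at exactly 32² and 96² follows the common COCO convention, which the source does not specify):

```python
def size_bucket(area):
    """Classify an object by its pixel area into small/medium/large buckets."""
    if area < 32 ** 2:    # area < 1024 px²: small object
        return "small"
    if area < 96 ** 2:    # 1024 px² <= area < 9216 px²: medium object
        return "medium"
    return "large"        # area >= 9216 px²: large object

print(size_bucket(500), size_bucket(5000), size_bucket(20000))
# small medium large
```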

The *P*, *R*, *AP*, and *mAP* for a given IoU threshold are calculated as defined in Equations (1)–(4).

$$P = \frac{TP}{TP + FP} \tag{1}$$

$$R = \frac{TP}{TP + FN} \tag{2}$$

$$AP = \sum\_{k=1}^{N} \max\_{\overline{k} \ge k} P\left(\overline{k}\right) \left[ R(k) - R(k-1) \right] \tag{3}$$

$$mAP = \frac{\sum\_{i=1}^{M} AP\_i}{M} \tag{4}$$

where *k* and *k̄* represent the point serial numbers before and after interpolation; *N* is the number of P–R points for a category; *M* is the number of categories; *i* is the category label; *P*(*k*) and *R*(*k*) are the precision and recall of the *k*th point; and *APi* is the average precision of class *i*.
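The interpolated AP of Equation (3) and the averaging of Equation (4) can be sketched as follows. The P–R points are invented toy values, not measured results; the code assumes recall is non-decreasing along the point sequence:

```python
def average_precision(precisions, recalls):
    """Interpolated AP per Eq. (3): at each point k, precision is replaced
    by the maximum precision over all points k̄ >= k, then weighted by the
    recall increment R(k) - R(k-1)."""
    ap = 0.0
    prev_recall = 0.0
    for k in range(len(precisions)):
        p_interp = max(precisions[k:])       # interpolation: max over k̄ >= k
        ap += p_interp * (recalls[k] - prev_recall)
        prev_recall = recalls[k]
    return ap

# Toy P–R points from a hypothetical detector.
ap = average_precision([1.0, 0.5], [0.5, 1.0])
print(ap)  # 0.75

# Eq. (4): mAP is the mean of the per-class APs over the M categories
# (here M = 4 flowering stages in the paper; values below are invented).
aps = [ap, 0.6, 0.8, 0.7]
print(sum(aps) / len(aps))
```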
