#### *3.2. Validity Results of Mask R-CNN*

The model training results (Section 2.2.1) showed that the average precision (AP), AP0.50, and AP0.75 of the ResNet50-based Mask R-CNN model, trained on the self-built NIR field *P. rapae* image dataset, reached 94.24%, 98.74%, and 96.79%, respectively.

Manual annotation was performed on the 70 images in the test set, which contained a total of 158 *P. rapae* larvae, with at least one larva per image.

The test set images were then input into the trained model. The detection results for the larvae in the test set images are shown in Table 1. The precision, recall, and *F*<sub>1</sub> score were 96.65%, 97.47%, and 96.55%, respectively, demonstrating the effectiveness of the proposed model.

**Table 1.** Identification results for the *P. rapae* larvae in the test set.


<sup>1</sup> *N* is the total number of larvae in the test set. *TP*, *FP*, and *FN* are the numbers of true positives (correctly detected larvae), false positives (false detections), and false negatives (missed larvae), respectively.
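The metrics in Table 1 follow the standard definitions in terms of *TP*, *FP*, and *FN*. A minimal sketch (not the authors' code) of how they are computed; the example reuses the reported counts of 154 detected and 4 missed larvae out of *N* = 158:

```python
# Hedged sketch: precision, recall, and F1 from TP, FP, FN counts.

def precision(tp, fp):
    # Fraction of detections that are real larvae
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of annotated larvae that were detected
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Recall with the test-set counts reported in Section 4.1
# (154 correct detections, 4 missed larvae out of N = 158):
print(round(recall(154, 4) * 100, 2))  # → 97.47
```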

#### *3.3. 3D Localization Results of Field Pests*

Camera calibration and stereo rectification of the binocular stereo vision system were completed; the results are shown in Table 2. The reprojection error was 0.36 pixels, and the calibration results met the test requirements [37].


**Table 2.** The internal and external parameters of the binocular stereo vision system.
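The reprojection error quoted above is the RMS image-plane distance between detected calibration-pattern corners and the same 3D points projected back through the estimated camera model, as typically reported by calibration tools such as OpenCV's `calibrateCamera`. A hedged sketch of the metric, with hypothetical corner positions (not the paper's data):

```python
import numpy as np

# Hedged sketch (not the authors' code): RMS reprojection error in pixels.
# `observed` are detected corner positions; `reprojected` are the same 3D
# points projected back through the estimated camera model.

def rms_reprojection_error(observed, reprojected):
    observed = np.asarray(observed, dtype=float)
    reprojected = np.asarray(reprojected, dtype=float)
    # Per-point Euclidean distance in the image plane
    d = np.linalg.norm(observed - reprojected, axis=1)
    return float(np.sqrt(np.mean(d ** 2)))

# Illustrative values (hypothetical corners):
obs = [[100.0, 200.0], [150.0, 250.0]]
rep = [[100.3, 200.0], [150.0, 250.4]]
print(rms_reprojection_error(obs, rep))
```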

#### 3.3.1. *X*-Axis and *Y*-Axis Location Error

The ratio of the image positioning deviation of the laser strike point to the pixel width of the *P. rapae* body at different scales was used as the *X*-axis and *Y*-axis location error; the results are shown in Figure 16. In the sample images of the whole test set (*N* = 30), all larvae were correctly recognized and segmented, and the average image location errors in the *x* and *y* coordinates of the laser strike point were 0.09 and 0.07, respectively. The maximum errors across the different scenarios were 0.23 and 0.16.

**Figure 16.** The location errors of the laser strike point on the *X*-axis, the *Y*-axis, and the *Z*-axis. *d* denotes the pixel width of the *P. rapae* body in images of different scales. The location error is expressed as the ratio of the *x*- and *y*-coordinate deviations (*e<sub>x</sub>*, *e<sub>y</sub>*) to *d*.

In the experiment, the same *P. rapae* larva was placed at different locations in the vegetable field; its body width was 4.16 mm (measured manually). Accordingly, the average absolute error of the laser strike point on the *X*-axis was 0.40 mm, with a maximum of 0.98 mm; on the *Y*-axis, the average absolute error was 0.30 mm, with a maximum of 0.68 mm.

Considering the Euclidean distance between the real and located points, the average absolute location error in the *X*–*Y* plane was 0.53 mm, and the maximum error was 1.03 mm. All located points fell within the effective strike range in the middle of the pest abdomen (Figure 6).
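The millimetre-scale errors above follow from scaling the normalized image-plane ratios by the measured body width and combining both axes. A minimal sketch of that conversion, assuming the 4.16 mm body width stated in the text (the example inputs are illustrative, not per-image data from the paper):

```python
import math

# Hedged sketch: converting normalized image-plane errors to millimetres.
BODY_WIDTH_MM = 4.16  # manually measured larval body width (Section 3.3.1)

def axis_error_mm(ratio, body_width_mm=BODY_WIDTH_MM):
    # Normalized deviation (e.g. e_x / d) scaled by the physical body width
    return ratio * body_width_mm

def planar_error_mm(ex_mm, ey_mm):
    # Euclidean distance between real and located strike points in the X-Y plane
    return math.hypot(ex_mm, ey_mm)

# Example: an X error of 0.40 mm and a Y error of 0.30 mm combine to 0.50 mm
print(planar_error_mm(0.40, 0.30))  # → 0.5
```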

#### 3.3.2. *Z*-Axis Location Error

Figure 16 also shows the visual location error of the system in the depth direction for working depths between 400 and 600 mm. The average absolute error was 0.51 mm, and the maximum error was 1.15 mm. The root mean square error and the mean absolute percentage error were 0.58 mm and 0.10%, respectively, indicating a strong correlation between the estimated and actual depths.
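The two depth-accuracy metrics can be sketched as follows from paired estimated and ground-truth depths; the arrays below are hypothetical values within the 400–600 mm working range, not the paper's measurements:

```python
import numpy as np

# Hedged sketch: RMSE and MAPE for depth estimation (Section 3.3.2 metrics).

def rmse(est, true):
    # Root mean square error, in the same units as the inputs (mm)
    est, true = np.asarray(est, float), np.asarray(true, float)
    return float(np.sqrt(np.mean((est - true) ** 2)))

def mape(est, true):
    # Mean absolute percentage error, in percent
    est, true = np.asarray(est, float), np.asarray(true, float)
    return float(np.mean(np.abs((est - true) / true)) * 100)

# Hypothetical depths in mm within the 400-600 mm working range:
true_mm = [400.0, 500.0, 600.0]
est_mm = [400.5, 499.4, 600.6]
print(rmse(est_mm, true_mm), mape(est_mm, true_mm))
```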

#### **4. Discussion**

An automatic laser strike point localization system based on the multi-constraint stereo matching method was established in this study, providing a basis for laser-based pest control. Three aspects of the proposed approach are discussed in this section: the effect of the segmentation model, the effect of the localization method, and the effect of the stereo matching method. Further improvements to the 3D localization system are also pointed out.

#### *4.1. Analyses of the Instance Segmentation Results*

Experiment 1 showed that the segmentation results (AP, AP0.50, and AP0.75) of the ResNet50-based Mask R-CNN model all exceeded 94% on the self-built NIR image dataset of *P. rapae*. The good segmentation performance of the network demonstrates that near-infrared imaging is feasible for identifying pests with protective (cryptic) coloration in multi-interference field scenes.

In the sample images of the whole test set, the numbers of correctly predicted, incorrectly predicted, and unrecognized *P. rapae* were 154, 3, and 4, respectively. No *P. rapae* were incorrectly predicted or unrecognized in images containing a single larva. The main causes of error were as follows: (1) When two or more *P. rapae* larvae overlap, parts of their bodies are occluded. This increases the difficulty of identification, so that multiple pests may be identified as a whole or a single pest may be only partially segmented (Figure 17a). (2) In the near-infrared image, the soil color is close to that of the larvae. When a leaf hole exposes the soil in a long, strip-like shape, the model misjudges it as a *P. rapae* larva (Figure 17b). Furthermore, the complicated network structure lengthens the training time of Mask R-CNN, and the detection time for a single image in the segmentation network was 460 ms.

**Figure 17.** False identification results. (**a**) Two *P. rapae* larvae overlap each other and (**b**) leaf holes mistakenly identified as *P. rapae*.
