*2.1. Binocular NIR Vision Unit*

The 3D locating system consisted of a binocular vision system, a light source module, and host computer software, as shown in Figure 2a. The binocular vision system comprised two gigabit industrial cameras produced by Hangzhou Haikang Robot Technology Co., Ltd. (Zhejiang, China). The camera model was MV-CA060-10GC, fitted with an MVL-HF0628M-6MPE lens and an 850 nm near-infrared filter. Each camera had a resolution of 3072 (H) × 2048 (V) and a frame rate of 15 fps, and the lens focal length was 6 mm. The two cameras were mounted in parallel on the camera frame with a baseline length of 50 mm. The scene was illuminated by an 850 nm diffuse light bar, which emits light evenly without casting shadows. Images were processed on a Lenovo ThinkPad P1 notebook (Intel Core i7-8750H @ 2.20 GHz, 24 GB RAM, 64-bit Windows 10). The software was based mainly on the OpenCV vision library and the TensorFlow deep learning framework.
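To illustrate how the stated baseline and focal length relate to depth in a parallel binocular setup, the following minimal sketch applies the standard pinhole relation Z = f·B/d for a rectified image pair. The pixel pitch of 2.4 µm and the example disparity are assumptions introduced only for illustration and are not given in the text; in practice the focal length in pixels would be taken from the calibration result rather than from nominal values.

```python
def focal_length_px(focal_mm, pixel_pitch_mm):
    """Convert a lens focal length in mm to pixels using the sensor pixel pitch."""
    return focal_mm / pixel_pitch_mm


def depth_from_disparity(disparity_px, focal_px, baseline_mm=50.0):
    """Depth (mm) of a point seen by a rectified, parallel stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_mm / disparity_px


# Assumed value, NOT stated in the text: pixel pitch of 2.4 um (0.0024 mm).
PIXEL_PITCH_MM = 0.0024

f_px = focal_length_px(6.0, PIXEL_PITCH_MM)   # 6 mm lens from the text
depth_mm = depth_from_disparity(250.0, f_px)  # 250 px is a hypothetical disparity
print(f"focal length: {f_px:.0f} px, depth: {depth_mm:.0f} mm")
```

Under these assumptions, depth is inversely proportional to disparity, so a short 50 mm baseline favors close-range targets such as insects on a leaf directly below the cameras.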

Before image acquisition, a chessboard calibration board with 30 mm × 30 mm squares was used to perform stereo calibration and rectification of the binocular camera [29]. During image acquisition, the acquisition device was positioned directly above the cabbage leaves under natural illumination to collect images of *P. rapae* in the field. The collected images are shown in Figure 2b–e.
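A chessboard-based stereo calibration of this kind could be sketched with OpenCV roughly as follows. This is an illustrative outline, not the authors' procedure: the inner-corner count `pattern=(9, 6)` and the helper names are hypothetical, and only the 30 mm square size comes from the text. OpenCV and NumPy are imported inside the calibration function so that the geometry helper can be used on its own.

```python
def chessboard_object_points(cols, rows, square_mm=30.0):
    """3D coordinates (mm) of the inner chessboard corners on the Z = 0 plane, row by row."""
    return [(c * square_mm, r * square_mm, 0.0) for r in range(rows) for c in range(cols)]


def calibrate_and_rectify(left_images, right_images, pattern=(9, 6), square_mm=30.0):
    """Calibrate a stereo pair from grayscale chessboard views and compute rectification.

    `pattern` (inner corners per row, per column) is a hypothetical value;
    the text specifies only the 30 mm square size.
    """
    import cv2
    import numpy as np

    board = np.array(chessboard_object_points(*pattern, square_mm), dtype=np.float32)
    obj_pts, left_pts, right_pts = [], [], []
    for left, right in zip(left_images, right_images):
        ok_l, corners_l = cv2.findChessboardCorners(left, pattern)
        ok_r, corners_r = cv2.findChessboardCorners(right, pattern)
        if ok_l and ok_r:  # keep only views where both cameras see the full board
            obj_pts.append(board)
            left_pts.append(corners_l)
            right_pts.append(corners_r)

    size = left_images[0].shape[::-1]  # (width, height) of a grayscale image
    # Intrinsics of each camera, then the rotation R and translation T between them.
    _, K1, D1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
    _, K2, D2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
    _, K1, D1, K2, D2, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, left_pts, right_pts, K1, D1, K2, D2, size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    # Rectification transforms; Q reprojects disparity to 3D coordinates.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    return K1, D1, K2, D2, R, T, R1, R2, P1, P2, Q
```

Because the object points are expressed in millimetres, the recovered translation `T` is also in millimetres, so its magnitude can be compared against the nominal 50 mm baseline as a sanity check.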
