#### *3.1. Head-Mounted Interactive System*

A head-mounted interactive system is used to acquire live scenes and voice from the field, so that the operator can easily manipulate the robot to interact with the in-field environment. This gives breeding experts a nearly immersive operating experience.

The structure of the head-mounted interactive system is described in Section 2.1. The operator wears a Royole VR standalone headset to receive the live video, and can thus remotely observe the scene in the robot's current field of view in real time. The microphone mounted on the robot head records the sound around the robot. The operator end and the robot end communicate through the VR standalone headset and the robot's onboard computer, with a Raspberry Pi acting as the server. In this way, the operator can hear the real-time audio that is "heard" by the robot and better monitor the in-field situation.
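
As a rough sketch of this client–server arrangement, the fragment below shows how a Raspberry Pi could relay the robot's microphone stream to the operator's headset over TCP. The port number, chunk size, and connection order are illustrative assumptions for this sketch, not the actual implementation.

```python
import socket

# Hypothetical sketch of the relay role played by the Raspberry Pi: it
# accepts one connection from the robot's microphone process and one from
# the operator's headset, then forwards audio bytes between them.
HOST, PORT = "0.0.0.0", 9000   # assumed listening address/port on the Pi
CHUNK = 4096                   # assumed audio chunk size in bytes

def main():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((HOST, PORT))
        server.listen(2)
        robot_conn, _ = server.accept()    # robot microphone stream connects first (assumed)
        headset_conn, _ = server.accept()  # operator headset connects second (assumed)
        while True:
            data = robot_conn.recv(CHUNK)  # audio bytes captured at the robot
            if not data:
                break
            headset_conn.sendall(data)     # forwarded unchanged to the operator

if __name__ == "__main__":
    main()
```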

#### *3.2. Motion Interactive System Based on Perception Neuron (PN) Sensor*

To conveniently control the complex movements of the multi-degree-of-freedom robot and improve control precision, a wearable sensor system is adopted to map the operator's movements onto the robot. The robot can then mimic the operator's phenotyping operations.

A Perception Neuron (PN) sensor system produced by Noitom® [28] is used. This system comprises thirty-two inertial measurement units, each containing a three-axis gyroscope, a three-axis accelerometer, and a three-axis magnetometer.

The PN sensor system exports a BioVision Hierarchy (BVH) file after acquiring human motion data. BVH is a widely used format for describing human motion features and is often employed in skeletal animation [29]. It models the human skeleton as the joint diagram shown in Figure 4a, where each joint expresses its motion through three rotation parameters, yielding a complete description of the human motion. After the BVH data collected by the PN sensors are transmitted to the robot controller through the TCP/IP protocol, the Euler angles in the BVH data must be converted into joint angles and sent to the lower computer.
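
For illustration, the sketch below shows one way the robot controller might unpack a BVH motion frame arriving over TCP into per-joint ZYX Euler angles. The newline-delimited framing, server address, and channel offsets are assumptions made for this sketch; in practice, the offsets are determined by the HIERARCHY section of the exported BVH file.

```python
import socket

# Hypothetical sketch: unpack one BVH motion frame received over TCP into
# ZYX Euler angles for selected joints. The channel offsets below are
# illustrative; the real offsets come from the BVH HIERARCHY section.
CHANNEL_OFFSET = {
    "RightShoulder": 21,  # assumed index of this joint's Zrotation channel
    "RightElbow": 24,
}

def read_frame(sock_file):
    """Read one newline-terminated frame of space-separated floats."""
    line = sock_file.readline()
    return [float(v) for v in line.split()] if line else None

def euler_zyx(frame, joint):
    """Return (alpha_z, beta_y, gamma_x) in degrees for one joint."""
    i = CHANNEL_OFFSET[joint]
    return frame[i], frame[i + 1], frame[i + 2]

with socket.create_connection(("192.168.1.10", 7001)) as sock:  # assumed PN stream address
    f = sock.makefile("r")
    while (frame := read_frame(f)) is not None:
        alpha_z, beta_y, gamma_x = euler_zyx(frame, "RightElbow")
        # ... convert to robot joint angles and forward to the lower computer
```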

**Figure 4.** Motion interaction. (**a**) BVH joint diagram. (**b**) URDF visualization. (**c**) Motion interactive experiments.

However, the actual movement of the human body is physiologically constrained. Not every joint has three degrees of freedom, and some degrees of freedom are not independent of each other, so there are substantial differences between the BVH model and the actual human body. Therefore, mapping the Euler angles to the robot's joint angles requires a reasonable algorithm. For example, the human shoulder joint has three degrees of freedom, similar to the shoulder of the robot body, so the Euler angles of the shoulder joint can be mapped directly to the robot body through a rotation matrix. Since the elbow joint of the robot body has only one bending degree of freedom and lacks a rotational one, the elbow bending angle is obtained by computing the angle between the direction vectors of the upper arm and the forearm. The rotation angle of the robot's wrist joint is mapped from the rotation angle of the human elbow. We denote by $\vec{r}_1$ and $\vec{r}_2$ the vectors along the upper arm and the forearm, respectively, with unit direction vectors $\hat{r}_1$ and $\hat{r}_2$, and align $\vec{r}_1$ with the positive direction of the X-axis. The elbow bending angle can then be calculated as

$$
\theta = \pi - \left\langle \vec{r}_1, \vec{r}_2 \right\rangle = \pi - \arccos\left( \hat{r}_1 \cdot \hat{r}_2 \right). \tag{1}
$$

We assume that the two rotational degrees of freedom are about the Y- and Z-axes, respectively. The PN sensor acquires the ZYX Euler angles of the human arm, i.e., $\alpha_z$, $\beta_y$, $\gamma_x$. Since the rotational degree of freedom about the X-axis does not exist in the human arm, $\gamma_x \approx 0$. The rotation matrix of the elbow is formulated as

$$R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos \beta_y & 0 & \sin \beta_y \\ 0 & 1 & 0 \\ -\sin \beta_y & 0 & \cos \beta_y \end{pmatrix} \begin{pmatrix} \cos \alpha_z & \sin \alpha_z & 0 \\ -\sin \alpha_z & \cos \alpha_z & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{2}$$

The unit direction vector of $\vec{r}_1$ is $\hat{r}_1 = (1, 0, 0)^T$. Therefore, the unit direction vector of $\vec{r}_2$ is

$$\begin{split} \hat{r}_2 = R\hat{r}_1 &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\beta_y & 0 & \sin\beta_y \\ 0 & 1 & 0 \\ -\sin\beta_y & 0 & \cos\beta_y \end{pmatrix} \begin{pmatrix} \cos\alpha_z & \sin\alpha_z & 0 \\ -\sin\alpha_z & \cos\alpha_z & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \\ &= \begin{pmatrix} \cos\beta_y\cos\alpha_z \\ -\sin\alpha_z \\ -\sin\beta_y\cos\alpha_z \end{pmatrix}. \end{split} \tag{3}$$

Finally, the elbow bending angle can be obtained by

$$\theta = \pi - \arccos\left( \hat{r}_1 \cdot \hat{r}_2 \right) = \pi - \arccos\left( \cos\beta_y \cos\alpha_z \right). \tag{4}$$
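
Equations (2)–(4) can be checked numerically. The following sketch composes the elbow rotation matrix, applies it to $\hat{r}_1$, and confirms that the resulting bending angle matches the closed form of Equation (4); the sample angle values are arbitrary.

```python
import numpy as np

def elbow_angle(alpha_z, beta_y):
    """Elbow bending angle from the ZY Euler angles of the forearm,
    following Equations (2)-(4); angles in radians, gamma_x assumed ~0."""
    # Rotation about Z (rightmost factor of Equation (2))
    Rz = np.array([[np.cos(alpha_z),  np.sin(alpha_z), 0],
                   [-np.sin(alpha_z), np.cos(alpha_z), 0],
                   [0,                0,               1]])
    # Rotation about Y (middle factor); the X factor is identity since gamma_x ~ 0
    Ry = np.array([[np.cos(beta_y),  0, np.sin(beta_y)],
                   [0,               1, 0],
                   [-np.sin(beta_y), 0, np.cos(beta_y)]])
    r1 = np.array([1.0, 0.0, 0.0])   # unit vector along the upper arm
    r2 = Ry @ Rz @ r1                # Equation (3): forearm direction
    return np.pi - np.arccos(np.clip(r1 @ r2, -1.0, 1.0))  # Equation (4)

# Compare against the closed form of Equation (4) for one sample pose
alpha_z, beta_y = np.radians(30.0), np.radians(45.0)
assert np.isclose(elbow_angle(alpha_z, beta_y),
                  np.pi - np.arccos(np.cos(beta_y) * np.cos(alpha_z)))
```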

The robot hand has only one degree of freedom. To map human hand motion as faithfully as possible, this degree of freedom is driven by the flexion angle of the operator's middle finger. Since the human neck has a high number of degrees of freedom, the robot's left and right rotational degree of freedom is mapped directly from the left and right rotation angle of the operator's neck.

A Unified Robot Description Format (URDF) file is constructed in the Robot Operating System (ROS) running on the robot's industrial computer. It contains the joint relations of the robot's mechanical parts, and a real-time simulation of the robot can be rendered from the URDF file, as shown in Figure 4b. ROS transmits the mapped joint angle data in real time to the lower computer through the serial port at a 10 Hz update rate. The lower computer then drives the joint servos to the corresponding angles. In this way, the operator's motion is mapped onto the robot body. Some motion interactive experiments are shown in Figure 4c.
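
As a minimal sketch of this transmission step, the fragment below sends mapped joint angles over a serial port at 10 Hz using pyserial. The port name, baud rate, packet framing, and the `get_mapped_angles` placeholder are all hypothetical assumptions; only the 10 Hz rate comes from the text.

```python
import struct
import time

import serial  # pyserial

PORT, BAUD = "/dev/ttyUSB0", 115200  # assumed serial link to the lower computer
RATE_HZ = 10                          # update rate stated in the text

def get_mapped_angles():
    """Hypothetical placeholder for the BVH-to-joint-angle mapping above."""
    return [0.0] * 8  # assumed number of controlled joints

def send_joint_angles(ser, angles):
    """Pack joint angles (radians) as little-endian floats behind a one-byte
    header; this framing is illustrative, not the robot's actual protocol."""
    packet = b"\xAA" + struct.pack("<%df" % len(angles), *angles)
    ser.write(packet)

with serial.Serial(PORT, BAUD, timeout=0.1) as ser:
    while True:
        send_joint_angles(ser, get_mapped_angles())
        time.sleep(1.0 / RATE_HZ)  # hold the assumed-fixed 10 Hz update rate
```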

#### *3.3. Bio-Inspired Operation*

Through the head-mounted interactive system and the PN-based motion interaction system, the operator can remotely control the robot in an immersive, interactive way. Wearing the full headset linked to the interactive system in the control room, the breeding expert can observe the real-time environment around the robot simply by moving his or her head. The operator locates the plants to be measured, moves the robot to the appropriate position, and then simply repeats the procedures and actions of the traditional manual phenotyping process; the robot is thereby controlled to mimic these actions and interact with the plant. The phenotype is then measured by the machine vision system. This naturally instructive paradigm is user-friendly and particularly efficient with a first-person view (FPV), enabling efficient phenotyping operations [30]. Because the robot directly mimics the interactive operations of breeding experts, this form of interaction offers high efficiency and strong adaptability. With the help of the automated vision system, high-efficiency, high-precision phenotyping is achieved through this interactive cognition method.

Regularized phenotyping procedures emerge from the bio-inspired operations based on the HRI technique. During HRI, the typical operation schedules and actions of the breeding experts are recorded. Over the long term, a large amount of such data accumulates into a manual teaching dataset. With a sufficiently large dataset, the automation of interactive cognition can be continuously improved through training with machine learning algorithms. We have conducted various studies on human-in-the-loop imitation control to improve robot adaptability to uncertain environments, although realizing full task autonomy in the short term remains challenging [31]. Eventually, fully automated bio-inspired phenotyping systems can be implemented to replace the traditional manual phenotyping pattern.

### **4. In-Field Rice Tiller Counting Method**

#### *4.1. Image Acquisition*

When the occlusion is removed through the interactive method illustrated above, images of the rice plant can be captured by the camera for tiller counting. However, since the tillers have colors similar to the background, it is difficult to recognize individual tillers in an RGB image without depth information. To provide depth information for the images captured by the RGB camera, we use a horizontal line laser to scan the tillers. As the structured light system scans up and down, multiple images covering different heights of the plant are recorded for subsequent tiller number recognition.

To reduce the influence of natural light on the laser light spots, we capture images with a small aperture to limit the incoming light. Under these conditions, the laser light spots remain clearly identifiable while the rest of the image is relatively dark. The images are then converted to grayscale to reduce computation, and the grayscale images are resized to 256 × 256 pixels through bilinear interpolation to further improve computational efficiency.
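
These preprocessing steps map directly onto standard OpenCV calls, as in the sketch below; the input and output file names are placeholders.

```python
import cv2

def preprocess(image_bgr):
    """Grayscale conversion and bilinear resize to 256x256, as described in
    the text; the input is assumed to be a BGR frame from the camera."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # INTER_LINEAR selects bilinear interpolation
    return cv2.resize(gray, (256, 256), interpolation=cv2.INTER_LINEAR)

frame = cv2.imread("scan_frame.png")   # placeholder path to one captured scan image
processed = preprocess(frame)
cv2.imwrite("scan_frame_256.png", processed)
```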
