2.1.1. Garlic Clove Image Sample Collection

Deep learning requires that the training data and test data be independent and identically distributed to ensure the generalization ability of the model. In the practical application stage, the input data must likewise be independent and identically distributed with the training set for the model to work effectively. Therefore, considering the practical application of a deep learning model in a garlic planter, the training samples should cover a wide distribution of morphological diversity to enhance the robustness of the model. When selecting training samples, one should not only cover the range of garlic clove sizes, weights, and appearances, but also consider the influence of garlic seed production technology and other factors, such as skin residue, on garlic seed morphology.

In this paper, the binary contour image was used as the model input; morphological features such as color and texture were discarded in the process of extracting the contours, while some edge features were preserved. Commercial garlic seeds are prone to having residual garlic skins and abnormal spikes. These skin residues have a great impact on the extraction of contour images, and the extracted contours may sometimes deviate seriously from the standard shape of garlic seeds. Therefore, the selection of training samples should also account for the garlic skins carried during seeding. Because the individual shape of garlic cloves of hybrid garlic breeds is the most diverse, we randomly selected *Jinxiang* garlic and divided it into cloves, retained all the garlic cloves without screening, and obtained a total of 735 garlic cloves as image samples. When dividing the garlic cloves, about 1/2 of the skin residue was retained to ensure consistency with real sowing of garlic seeds, as shown in Figure 1a.

**Figure 1.** (**a**) Garlic clove samples and (**b**) image acquisition device.

Sample Acquisition Device and Image Preprocessing

In order to directly obtain the contour images of the garlic cloves, a garlic seed shooting device was designed that uses a transparent clamping belt to clamp and convey the garlic cloves and adopts back illumination with an area light source. The area light source is placed below the transparent clamping belt, and the image sensor is placed above it. The clamping transmission module is wrapped in an opaque shell to avoid the influence of external light on image acquisition. The light emitted by the area light source passes through the transparent clamping belt to form a clear garlic clove shadow image on the vision sensor, as shown in Figure 1b. An image collected under ideal conditions is shown in Figure 2a. However, because reflection inside the shell cannot be completely eliminated, some reflected light is still cast on the upper surface of the garlic clove, and the continuous conveying of garlic cloves brings adhering dust into the shell, reducing the contrast between the shadow area of the garlic clove and the background, as shown in Figure 2b.

The above situation increases the difficulty of binarizing the shadow image. Because the shadow in the garlic seed image is too dark, direct binarization yields poor contours. Manually adjusting the binarization threshold can alleviate the misclassification of the area around the shadow, but part of the garlic clove shadow is then lost, and the approach cannot be applied automatically, as shown in Figure 2c,d. An extremely low-computation pixel compensation method is proposed to solve this problem. The control system records an image of the empty conveyor belt without cloves, calculates the pixel difference matrix between this image and a pure white image, and saves it as a pixel compensation matrix. When a garlic clove shadow image frame is captured, the pixel compensation matrix is added to it, and the Otsu binarization method [14] is then used to obtain a high-quality binarized image, as shown in Figure 3. The calculation rules are shown in Equations (1) and (2), where *O* represents the no-load image captured when the device is initialized; *C* stands for the pixel compensation matrix; *X* represents the image frame collected in real time; *X'* represents the image frame after compensation; and *m* and *n* represent the number of rows and columns of the pixel matrix, respectively.

$$C = 255 - O = \left[255 - o_{ij}\right]_{m \times n} \tag{1}$$

$$X' = X + C = \left[\min\left\{x_{ij} + c_{ij},\ 255\right\}\right]_{m \times n} \tag{2}$$
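The pixel compensation of Equations (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names are assumptions, grayscale `uint8` frames are assumed, and a small self-contained Otsu implementation stands in for the library routine cited as [14].

```python
import numpy as np

def build_compensation_matrix(empty_frame):
    """Equation (1): C = 255 - O, computed once from a no-load frame."""
    return 255 - empty_frame.astype(np.int16)

def otsu_threshold(img):
    """Minimal Otsu's method [14]: pick the threshold maximizing
    the between-class variance of the grayscale histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                  # probability of class 0 up to level k
    mu = np.cumsum(p * np.arange(256))    # cumulative mean up to level k
    mu_t = mu[-1]                         # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

def compensate_and_binarize(frame, comp):
    """Equation (2): X' = min(X + C, 255), then Otsu binarization."""
    x_prime = np.clip(frame.astype(np.int16) + comp, 0, 255).astype(np.uint8)
    t = otsu_threshold(x_prime)
    binary = (x_prime > t).astype(np.uint8) * 255
    return x_prime, binary
```

Adding *C* lifts every background pixel to exactly 255, so only the clove shadow remains below the Otsu threshold, which is why the subsequent binarization becomes robust without manual tuning.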

**Figure 3.** Contour extraction after pixel compensation. (**a**) Background of conveyor, no clove, (**b**) Pixel compensation matrix, (**c**) Compensated garlic clove shadow, (**d**) Binary shadow image, (**e**) Outline of garlic seed, (**f**) Contour sampling points.

Dataset Acquisition Method

During sample image acquisition, the mechanical device introduced in the previous section was used to convey the garlic seeds, the vision sensor on it recorded a video, and image frames were then extracted from the acquired video. A total of 1470 original image samples were obtained. Among them, 1172 images were randomly selected as the training set, and the remaining 298 images were used as the validation set. Since the length–width ratio of most image sensors is 4:3, when applied to the seeder, the long side of the picture was kept parallel to the travel direction of the garlic seeds to obtain a larger observation field. To meet this demand, the image samples used for model training were processed with the same length–width ratio and were finally saved at an image size of 640 × 480 by cropping or expanding the image boundary (Figure 4). Image rotation does not change the shape of garlic cloves; in this study, all original image samples were rotated so that the garlic clove bud pointed upward, and image samples with other orientations were generated by rotation in the data enhancement stage.
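The crop-or-pad step that standardizes frames to 640 × 480 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name is invented, frames are assumed grayscale, and white padding (255) is assumed because the back-lit background is near-white after pixel compensation.

```python
import numpy as np

def fit_to_640x480(img, pad_value=255):
    """Center-crop or pad a grayscale frame to 640x480 (4:3).

    pad_value=255 assumes a white, back-lit background."""
    target_h, target_w = 480, 640
    h, w = img.shape[:2]
    # Center-crop any excess in either dimension
    y0 = max((h - target_h) // 2, 0)
    x0 = max((w - target_w) // 2, 0)
    img = img[y0:y0 + min(h, target_h), x0:x0 + min(w, target_w)]
    # Pad symmetrically if the frame is smaller than the target
    h, w = img.shape[:2]
    pad_y, pad_x = target_h - h, target_w - w
    return np.pad(img,
                  ((pad_y // 2, pad_y - pad_y // 2),
                   (pad_x // 2, pad_x - pad_x // 2)),
                  constant_values=pad_value)
```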

**Figure 4.** Part of the original sample.

2.1.2. Data Enhancement for Datasets

Because the original images were all adjusted to the bud-upward state, the key task in data enhancement was to generate image samples with left, bottom, and right orientations. In addition, some image transformations needed to be performed on the image samples to make the dataset more diverse and thus ensure the generalization ability of the model. In order to keep the training data and validation data independent and identically distributed, the same data augmentation operations were performed on both, and the samples of the training set and validation set remained isolated from each other throughout this process. The image enhancement methods include horizontal flipping, stretching, shearing, translation, rotation, and motion blur. All of these methods except motion blur can be realized by two-dimensional geometric transformation, which is completed by multiplying the homogeneous pixel coordinates of the image by a homogeneous transformation matrix. The mathematical expression of this process is shown in Equation (3). In order to enhance generalization to the image acquisition environment in the garlic seeder, these transformations must follow a certain logical order.

$$X' = M_f \cdot M_s \cdot M_d \cdot M_t \cdot M_r \cdot X \tag{3}$$

$$M_f = \begin{bmatrix} -1 & 0 & w \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix};\ M_s = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix};\ M_d = \begin{bmatrix} 1 & d_x & 0 \\ d_y & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix};\ M_t = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix};\ M_r = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

where *X* represents the original image, and *X'* represents the image after transformation; *Mf*, *Ms*, *Md*, *Mt*, and *Mr* represent the transformation matrices of horizontal flipping, stretching, shearing, translation, and rotation, respectively; *w* represents the width of the image; *sx* and *sy* represent the stretching ratios in the two directions; *dx* and *dy* represent the shearing amplitudes in the two directions; *tx* and *ty* represent the translation distances in the two directions; and *θ* represents the rotation angle of the image.
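The composition in Equation (3) can be verified numerically on homogeneous pixel coordinates. The sketch below is an illustration only: the function names and default parameter values are assumptions, and the matrices are applied to coordinate vectors rather than to a full image warp.

```python
import numpy as np

def flip_matrix(w):
    """M_f: horizontal flip about the vertical center of an image of width w."""
    return np.array([[-1, 0, w], [0, 1, 0], [0, 0, 1]], dtype=float)

def stretch_matrix(sx, sy):
    """M_s: anisotropic stretching by sx, sy."""
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def shear_matrix(dx, dy):
    """M_d: shearing with amplitudes dx, dy."""
    return np.array([[1, dx, 0], [dy, 1, 0], [0, 0, 1]], dtype=float)

def translate_matrix(tx, ty):
    """M_t: translation by (tx, ty)."""
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def rotate_matrix(theta):
    """M_r: rotation by theta radians about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def compose(w, sx=1, sy=1, dx=0, dy=0, tx=0, ty=0, theta=0):
    """Equation (3): compose in the stated order M_f . M_s . M_d . M_t . M_r."""
    return (flip_matrix(w) @ stretch_matrix(sx, sy)
            @ shear_matrix(dx, dy) @ translate_matrix(tx, ty)
            @ rotate_matrix(theta))
```

Because matrix multiplication is non-commutative, changing the order of the factors changes the result, which is why the text notes that the transformations must follow a fixed logical order.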
