#### Morphological Diversity

First, a horizontal flip operation was performed on the images. Because garlic cloves are irregular in shape, the external contour usually differs depending on which side of the clove's abdomen faces vertically downward, so horizontal flipping increases the diversity of the dataset (Figure 5). After this operation, the sample size doubled to 2940.

**Figure 5.** Samples amplified by horizontal flipping.
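The doubling step can be sketched as follows; this is a minimal numpy illustration, and the function and array names are assumptions, not the paper's implementation:

```python
import numpy as np

def augment_with_hflip(images):
    """Double the sample set by appending a horizontally flipped copy
    of every image (axis 1 is the image width)."""
    flipped = [np.fliplr(img) for img in images]
    return images + flipped

# A toy 2x3 "image": horizontal flipping reverses each row.
imgs = [np.array([[1, 2, 3],
                  [4, 5, 6]])]
aug = augment_with_hflip(imgs)  # sample count doubles
```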

Then, stretch, shear, and translation transformations were applied. These three operations effectively increase the morphological diversity of the images and remain effective after image rotation. The stretch amplitude was a random value in the range of 0–20%, the shear strength a random value in the range of 0–10, and the translation amplitude a random value in the range of 0–10%. By superimposing these three transformations, the image samples were amplified to 29,400. The amplified samples are shown in Figure 6.
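A combined affine transformation with the amplitude ranges above can be sketched with scipy; this is an illustrative stand-in (the shear unit is assumed to be degrees, and the parameter names are not from the paper):

```python
import numpy as np
from scipy import ndimage

def random_affine(img, rng):
    """Apply one random stretch / shear / translation with the amplitude
    ranges described in the text (interpretation of units is assumed)."""
    h, w = img.shape
    sy = 1.0 + rng.uniform(0.0, 0.20)           # stretch: 0-20 %
    shear = np.deg2rad(rng.uniform(0.0, 10.0))  # shear strength: 0-10 (deg, assumed)
    ty = rng.uniform(0.0, 0.10) * h             # translation: 0-10 % of size
    tx = rng.uniform(0.0, 0.10) * w
    # Row-major 2x2 affine matrix acting on (row, col) coordinates.
    matrix = np.array([[sy,            0.0],
                       [np.tan(shear), 1.0]])
    return ndimage.affine_transform(img, matrix, offset=(ty, tx), order=1)

rng = np.random.default_rng(0)
out = random_affine(np.zeros((120, 160)), rng)  # output keeps the input size
```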

#### Image Rotation and Class Generation

When the plane is divided into four equal regions (upper, left, lower, and right), each region spans 90°. To ensure that the model generalizes to irregular orientations, a random small-amplitude rotation was applied to the image samples before the image classes were generated. Ideally the rotation amplitude would be ±45°; however, because the original images were righted manually and may contain subtle, hard-to-detect deflection, and because the data-enhancement operations already include shear transformations of random amplitude, a rotation in the range of ±30° was applied instead. The rotation acted directly on each original image without generating new image samples. The transformed samples are shown in Figure 7.

**Figure 6.** Stretching, shearing, and translation.

**Figure 7.** Image samples after rotation in the range of ±30°.
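The in-place small rotation can be sketched with scipy; `reshape=False` keeps the original image size, matching the statement that no new samples are generated (the helper name is illustrative):

```python
import numpy as np
from scipy import ndimage

def random_small_rotation(img, rng):
    """Rotate in place by a random angle in [-30, 30] degrees.
    reshape=False preserves the original image dimensions."""
    angle = rng.uniform(-30.0, 30.0)
    return ndimage.rotate(img, angle, reshape=False, order=1)

rng = np.random.default_rng(0)
rotated = random_small_rotation(np.ones((120, 160)), rng)
```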

After completing the above operations, each original image was rotated by 90°, 180°, and 270° counterclockwise to obtain the standard left, lower, and right images. At this time, the sample size expanded to 117,600. The samples of each orientation class are shown in Figure 8.
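Generating the four orientation classes from one "up" image amounts to three exact 90° rotations, which are lossless in numpy (the class labels here mirror the text; the function name is an assumption):

```python
import numpy as np

def generate_orientation_classes(img):
    """From one 'up' image, derive the four orientation classes by
    counterclockwise rotation: up (0), left (90), down (180), right (270)."""
    return {
        "up":    img,
        "left":  np.rot90(img, 1),
        "down":  np.rot90(img, 2),
        "right": np.rot90(img, 3),
    }

classes = generate_orientation_classes(np.array([[1, 2],
                                                 [3, 4]]))
```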

#### Motion Blur and Contour Extraction

Seeding speed is an important performance index of garlic seeders. To achieve high-speed seeding, garlic seed images must be collected while in motion, which may introduce motion blur into the collected images. Because uncertain motion blur affects contour extraction, the data-enhancement pipeline should also apply motion blur with a certain probability and amplitude. In this study, image samples were randomly selected with a probability of 50%, and motion blur of random amplitude was applied in the direction parallel to the long edge of the image (Figure 9).

**Figure 8.** Samples of each orientation class.

**Figure 9.** Motion blur in different direction classes.
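A simple linear motion-blur kernel along the long (width) axis can be sketched as below; the averaging-kernel formulation is an assumption standing in for whatever blur the paper's pipeline used:

```python
import numpy as np

def motion_blur(img, length):
    """Average `length` neighbouring pixels along the width axis
    (parallel to the long edge of the image)."""
    kernel = np.ones(length) / length
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, img)

rng = np.random.default_rng(0)
img = np.ones((4, 8))
# Each sample is blurred with 50 % probability and a random amplitude.
maybe = motion_blur(img, int(rng.integers(3, 9))) if rng.random() < 0.5 else img
out = motion_blur(img, 3)  # deterministic example
```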

After all the data enhancement operations were completed, the contours of the image samples were extracted one by one, finally forming the garlic seed contour dataset used for model training (Figure 10).

**Figure 10.** Final generated outer contour image samples.
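The contour-extraction step can be sketched dependency-free on a binarized image: a pixel belongs to the outer contour if it is foreground and has at least one background 4-neighbour. This is a minimal stand-in for a library routine such as OpenCV's contour finder, not the paper's exact method:

```python
import numpy as np

def outer_contour_mask(binary):
    """Mark foreground pixels (value 1) that touch the background
    under 4-connectivity; those pixels form the outer contour."""
    padded = np.pad(binary, 1)
    core = padded[1:-1, 1:-1]
    neighbour_min = np.minimum.reduce([padded[:-2, 1:-1], padded[2:, 1:-1],
                                       padded[1:-1, :-2], padded[1:-1, 2:]])
    return (core == 1) & (neighbour_min == 0)

# Toy example: a 3x3 foreground square inside a 5x5 image.
binary = np.zeros((5, 5), dtype=int)
binary[1:4, 1:4] = 1
mask = outer_contour_mask(binary)  # all 8 boundary pixels, not the centre
```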

#### Logical Sequence of Data Enhancement Operations

If the rotation operation used to generate the orientation classes is performed before the zoom, stretch, shear, and translation operations, image samples with greater morphological differences can be generated, theoretically promoting the generalization ability of the deep-learning model during training. However, experiments showed that datasets in which the four orientation classes share the same morphologies performed better. A possible reason is that when the four classes contain samples of identical morphology, the deep-learning model can suppress the influence of morphology on classification and attend more to the high-level semantic feature of "orientation".

Because motion blur is directional, the motion-blur transformation must be carried out after the rotation transformations; and because motion blur may affect the edge contour, contour extraction must be carried out after the motion-blur operation.

#### Storage of Dataset

The lightweight CNN models discussed here have low computational complexity: a large batch size can be used when training on a PC, and the training/inference time per batch is very short. In initial practice, it was found that the transmission speed of training samples was often lower than the processing speed of the model, so the dataset storage scheme was improved. In the experimental environment of this paper, storing the dataset in the TFRecord format defined by TensorFlow [15] raised the sample input speed to more than 10 times that of reading image files in batches, which is faster than the inference speed of all deep-learning models introduced in this paper. This format was therefore adopted as one of the storage schemes for training and testing the several CNN models.
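A minimal TFRecord round trip for one image/label pair might look as follows; the feature keys and file name are illustrative, not the paper's schema:

```python
import os
import tempfile
import tensorflow as tf

path = os.path.join(tempfile.mkdtemp(), "garlic_sample.tfrecord")

def serialize(image_bytes, label):
    """Pack one raw image buffer and its orientation label into a tf.train.Example."""
    feats = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feats)).SerializeToString()

with tf.io.TFRecordWriter(path) as writer:
    writer.write(serialize(b"\x00" * (120 * 160), 2))

spec = {"image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64)}
ds = tf.data.TFRecordDataset(path).map(
    lambda x: tf.io.parse_single_example(x, spec))
rec = next(iter(ds))
```

In practice the speedup comes from the records being a single sequential file that `tf.data` can stream and prefetch, instead of many small image files opened one by one.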

The fully connected model proposed in Section 2.2.3 takes the pixel coordinates of garlic clove contours as input. When the image dataset is converted into an array of pixel coordinates, its volume is further reduced, and the whole dataset can be loaded into memory during training. The Pandas DataFrame format [16] was used to store the contour-point coordinate arrays of all samples, which were then saved in H5 format for loading in each training task. For both dataset formats, a shuffle operation was applied in each training epoch to obtain better training results.
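A sketch of the coordinate-table layout and the per-epoch shuffle; the column names are assumptions for illustration, and the `to_hdf`/`read_hdf` step is indicated in a comment rather than executed:

```python
import numpy as np
import pandas as pd

# Contour points for all samples as one flat table: a sample id plus the
# (x, y) pixel coordinates of each contour point.
points = pd.DataFrame({
    "sample": [0, 0, 0, 1, 1, 1],
    "x":      [10, 12, 11, 40, 42, 41],
    "y":      [5, 6, 7, 5, 6, 7],
})
labels = np.array([0, 3])  # one orientation label per sample

# In practice: points.to_hdf("contours.h5", key="points") once, then
# pd.read_hdf(...) loads the whole dataset into memory per training run.

def shuffled_epoch(n_samples, rng):
    """Return a fresh random sample order for one training epoch."""
    return rng.permutation(n_samples)

rng = np.random.default_rng(0)
order = shuffled_epoch(len(labels), rng)
```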

### *2.2. Lightweight Recognition Model of Garlic Clove Bud Orientation*

For the garlic seed orientation recognition method studied in this paper, accuracy is the most important criterion. Second, improving the running speed of the recognition model, on the premise of ensured accuracy, is of practical significance. The hardware cost of deploying the algorithm is also considered, as low cost is a necessary condition for wide applicability. Low hardware cost implies low computing performance, so the complexity of the recognition model must be greatly reduced; the keys to this are the choice of input features and a lightweight model. The main contribution of this paper is therefore a deep-learning model that improves recognition rate and running speed, giving priority to accuracy and lightening the model on that basis to suit low-cost embedded platforms. Both convolutional networks and a fully connected neural network were tried for garlic-clove orientation recognition in this study. The convolutional networks included MobileNetV3 [17], with a relatively complex structure, and a naive CNN composed only of stacked convolution and pooling layers; these take the garlic contour image samples directly as input and complete feature extraction automatically through image convolution. The fully connected model takes as input a set of contour-point coordinates sampled from the image samples, and this contour-point sampling can be regarded as a feature-extraction method.

#### 2.2.1. Transfer Learning Based on MobileNetV3

MobileNetV3 is an excellent lightweight deep-learning model with two versions, large and small, which suit different levels of hardware performance. The TensorFlow framework provides a Keras-based MobileNetV3 implementation together with six groups of weights pre-trained on the ImageNet dataset, corresponding to three forms of each of the large and small models: standard width, 0.75 width, and standard-width minimalistic mode. Transfer learning was tested on this basis.

The input size of the model directly determines the amount of computation required; provided recognition performance is maintained, the smaller the input, the better. Experiments showed that when the image samples were scaled to 120 × 160, recognition performance did not decrease significantly, so the input size of the MobileNetV3 model was modified to (120, 160, 1). The orientation of garlic clove buds is divided into four categories, so the corresponding model output is a 4-dimensional vector. Because the input and output of the model were redefined, when loading the pre-trained weights only the intermediate layers whose parameter structure matched the original model were initialized from them; layers with a different structure were in effect trained from scratch.
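Redefining the input and output around the Keras MobileNetV3-Large backbone can be sketched as below. With a single-channel input the ImageNet stem weights no longer match shape-for-shape, so this sketch builds with `weights=None`; the paper's selective loading of the structurally matching intermediate layers is assumed to happen separately:

```python
import tensorflow as tf

# MobileNetV3-Large backbone with the redefined 120x160x1 input and a
# 4-way orientation classification head.
backbone = tf.keras.applications.MobileNetV3Large(
    input_shape=(120, 160, 1), include_top=False, weights=None)
x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
out = tf.keras.layers.Dense(4, activation="softmax")(x)
model = tf.keras.Model(backbone.input, out)
```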

Twelve model configurations, six of them with pre-trained weights, were trained. The training results (Table 1) show that transfer learning is effective: using weights pre-trained on the ImageNet dataset for transfer learning on the garlic seed contour image dataset yielded higher accuracy than training from scratch. Overall, the large model performed better than the small model. The minimalistic mode performed below the non-minimalistic mode, but this gap was not noticeable when pre-trained weights were not used. Comparing the parameter counts and computation of the different models shows that reducing the width factor mainly reduces the amount of computation required, which improves running speed, while the minimalistic mode mainly reduces the number of parameters, which lowers memory consumption. All model forms reached an accuracy above 0.96, so all have certain application value. The structure of MobileNetV3-Large is shown in Figure 11.


**Table 1.** Overview of the performance of the transfer learning model.

Note: - indicates that the test could not be performed due to a lack of pre-trained weights.

**Figure 11.** Structure of MobileNetV3.
