2.2.2. Dataset Production

To improve training effectiveness and increase sample diversity, the collected images were first screened and then augmented. During screening, low-definition images were removed, leaving 1320 walnut kernel images stored in JPG format. The images were processed in MATLAB to a resolution of 512 × 512 pixels. The dataset was then augmented by adaptive contrast adjustment, rotation, translation, cropping and other methods, which expanded it to 5732 images [21]. The dataset contains four label categories: walnut shell, small impurities (diameter less than 5 mm), foreign impurities and metamorphic walnut kernels, as shown in Figure 2. The gray-value range of a walnut kernel serves as the basis for identifying its degree of deterioration. All walnut kernel images were converted to grayscale, and testing and statistical analysis showed that the gray values of metamorphic walnut kernels fall in the range of 20 to 35. The images were annotated with LabelImg, which was used to label the ground-truth bounding boxes and their categories [22]. The augmented images were then divided into training, validation and test sets at a ratio of approximately 3:1:1, giving 3439, 1146 and 1146 images, respectively.
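The 3:1:1 division can be illustrated with a minimal sketch. It is not the authors' actual procedure: the file names, the random seed and the choice to round each share down are assumptions made for illustration only. Under that rounding assumption, 5732 images give subsets of 3439, 1146 and 1146 images, with one image left unassigned.

```python
import random


def split_dataset(items, ratios=(3, 1, 1), seed=0):
    """Shuffle items and split them by integer ratios.

    Each share is rounded down (an assumption), so a few items
    may remain unassigned; they are returned as `leftover`.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    total = sum(ratios)
    sizes = [len(shuffled) * r // total for r in ratios]
    subsets, start = [], 0
    for size in sizes:
        subsets.append(shuffled[start:start + size])
        start += size
    leftover = shuffled[start:]
    return subsets, leftover


# Hypothetical file names standing in for the 5732 augmented images.
image_names = [f"walnut_{i:04d}.jpg" for i in range(5732)]
(train, val, test), leftover = split_dataset(image_names)
print(len(train), len(val), len(test), len(leftover))  # 3439 1146 1146 1
```

Shuffling before slicing keeps the augmented variants of different source images mixed across the three subsets. If augmented copies of the same original image must stay in one subset, the split would need to be made at the level of source images instead.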

**Figure 2.** Walnut kernel impurity type labeling.
