#### 2.2.1. Image Smoothing and Binarization

We first converted each color image to grayscale. The grayscale image was then smoothed with a median filter, which suppresses image noise while preserving edge information, before image segmentation [26]. Binarization used a fixed threshold, which avoided the separation of rice endosperm and bran that can occur with other thresholding methods. Finally, we applied morphological opening and closing operations to the binarized image to smooth it and fill the holes inside the target rice grains.
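The steps above can be sketched in plain Python on a small grayscale image stored as nested lists. This is only an illustration under assumptions not stated in the text: the 3 × 3 window, the replicated border, the threshold of 128, the order of opening before closing, and all function names are hypothetical.

```python
from statistics import median


def _window(img, y, x):
    """Return the 3x3 neighbourhood of (y, x), replicating border pixels."""
    h, w = len(img), len(img[0])
    return [
        img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    ]


def _apply(img, reducer):
    """Apply `reducer` to the 3x3 neighbourhood of every pixel."""
    return [
        [reducer(_window(img, y, x)) for x in range(len(img[0]))]
        for y in range(len(img))
    ]


def median_filter(gray):
    """Median smoothing: removes isolated noise while keeping edges sharp."""
    return _apply(gray, median)


def binarize(gray, threshold):
    """Fixed-threshold binarization: 1 for foreground, 0 for background."""
    return [[1 if v > threshold else 0 for v in row] for row in gray]


def erode(mask):
    """A pixel stays 1 only if its whole 3x3 neighbourhood is 1."""
    return _apply(mask, min)


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    return _apply(mask, max)


def opening(mask):
    """Erosion followed by dilation: removes small bright specks."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation followed by erosion: fills small holes in the foreground."""
    return erode(dilate(mask))


def preprocess(gray, threshold=128):
    """Smooth, binarize, then open and close (threshold value is assumed)."""
    smoothed = median_filter(gray)
    mask = binarize(smoothed, threshold)
    return closing(opening(mask))
```

In practice these operations would more likely come from an image-processing library; the pure-Python version only makes each step explicit.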

#### 2.2.2. Segmentation of Single-Grain Rice Images

The Canny edge detection algorithm was used to detect the contour of each rice grain. The minimum circumscribed rectangle of each grain was then drawn, and its four vertex coordinates and rotation angle were obtained. Next, the original rice image was rotated by this angle. Finally, each grain was segmented in a vertical orientation by extending the vertex coordinates of the rotated rectangle outward by 5 pixels to form the crop boundary. Figure 2 shows sample single-grain images of rice of three kinds of DOM after segmentation.
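The geometry of the final cropping step can be illustrated as follows. This is a minimal sketch rather than the authors' code: `rotate_points` and `padded_crop_box` are hypothetical helpers, the angle is assumed to be in degrees with the standard rotation-matrix sign convention, and clamping the box to the image bounds is an added assumption.

```python
import math


def rotate_points(points, center, angle_deg):
    """Rotate (x, y) points about `center` using the standard rotation matrix."""
    cx, cy = center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (
            cx + (x - cx) * cos_t - (y - cy) * sin_t,
            cy + (x - cx) * sin_t + (y - cy) * cos_t,
        )
        for x, y in points
    ]


def padded_crop_box(vertices, margin, width, height):
    """Axis-aligned box around `vertices`, grown by `margin` pixels per side.

    Returns (left, top, right, bottom), clamped to the image of the given
    width and height.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    left = max(math.floor(min(xs)) - margin, 0)
    top = max(math.floor(min(ys)) - margin, 0)
    right = min(math.ceil(max(xs)) + margin, width)
    bottom = min(math.ceil(max(ys)) + margin, height)
    return left, top, right, bottom
```

For an upright rectangle with corners (40, 30) and (60, 90) in a 100 × 120 image, a 5-pixel margin gives the crop box (35, 25, 65, 95).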

**Figure 2.** Single-grain rice images of three kinds of DOM. (**a**,**b**) Well-milled. (**c**,**d**) Reasonably well-milled. (**e**,**f**) Substandard.

#### *2.3. Data Augmentation*

A dataset was established from the segmented single-grain rice images. It contained 5800 valid images each of well-milled, reasonably well-milled, and substandard rice, for a total of 17,400 images. Each category was divided into a training set, validation set, and test set at a ratio of 6:2:2, giving 3480 training images, 1160 validation images, and 1160 test images per class. The training set is used to train the model, the validation set is used to tune the model structure and hyperparameters, and the test set is used only to evaluate the performance and generalization ability of the model.
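The split sizes follow directly from the 6:2:2 ratio. The sketch below reproduces the arithmetic and draws a shuffled index split for one class. The helper names, the fixed seed, and the random shuffling are assumptions, since the text does not say how images were assigned to each subset.

```python
import random


def split_counts(n_images, ratios=(6, 2, 2)):
    """Return (train, val, test) sizes for the given integer ratios."""
    total = sum(ratios)
    counts = [n_images * r // total for r in ratios]
    # Any remainder from integer division goes to the training set.
    counts[0] += n_images - sum(counts)
    return tuple(counts)


def split_indices(n_images, ratios=(6, 2, 2), seed=0):
    """Shuffle image indices and cut them into train/val/test lists."""
    n_train, n_val, _ = split_counts(n_images, ratios)
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test
```

With 5800 images per class, `split_counts(5800)` returns `(3480, 1160, 1160)`, matching the per-class figures given above.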

Augmenting the training set is essential to reduce overfitting when data are limited. First, each rice image was center-cropped to the same size (224 × 224 pixels) for input into the CNN model. Second, 30% of the training data were randomly selected for horizontal and vertical flipping, respectively. Then, each image was randomly rotated by an angle between 35° and 135°. Finally, the mean and standard deviation of the three color channels were calculated over all training set images and used to normalize each image. The training set was expanded following these steps to provide sufficient data for model training.
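The cropping, flipping, and normalization steps can be sketched in plain Python, with each image stored as rows of (R, G, B) tuples. This is an illustration only: the function names are hypothetical, the population standard deviation is an assumed choice, every channel is assumed to be non-constant (so no standard deviation is zero), and random rotation is left out to keep the example short.

```python
import math


def center_crop(img, size):
    """Crop a size x size patch from the image centre (image must be >= size)."""
    top = (len(img) - size) // 2
    left = (len(img[0]) - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def hflip(img):
    """Horizontal flip: reverse the pixel order within each row."""
    return [row[::-1] for row in img]


def vflip(img):
    """Vertical flip: reverse the order of the rows."""
    return img[::-1]


def channel_stats(images):
    """Per-channel mean and population std over all pixels of all images."""
    sums, squares, n = [0.0] * 3, [0.0] * 3, 0
    for img in images:
        for row in img:
            for px in row:
                for c in range(3):
                    sums[c] += px[c]
                    squares[c] += px[c] * px[c]
                n += 1
    means = [s / n for s in sums]
    stds = [math.sqrt(max(q / n - m * m, 0.0)) for q, m in zip(squares, means)]
    return means, stds


def normalize(img, means, stds):
    """Subtract the channel mean and divide by the channel std, per pixel."""
    return [
        [tuple((px[c] - means[c]) / stds[c] for c in range(3)) for px in row]
        for row in img
    ]
```

Computing the statistics on the training set only, and reusing them for the validation and test sets, keeps information from the held-out images out of training.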

#### *2.4. Proposed Approach*

The CNN is one of the most popular deep learning models and is widely used in image classification tasks. It can extract features of target objects in images automatically and comprehensively, and its weight sharing reduces the number of trainable parameters and simplifies the model [27]. We constructed an IRBOA model, which fuses multi-scale information by integrating the Inception-v3 structure with the ResNet model, to classify rice of three kinds of DOM. The model is described below.
