*Article* **Monitoring of Soybean Maturity Using UAV Remote Sensing and Deep Learning**

**Shanxin Zhang 1, Hao Feng 1, Shaoyu Han 1,2, Zhengkai Shi 1, Haoran Xu 1, Yang Liu 2,3, Haikuan Feng 2,4, Chengquan Zhou 2,5 and Jibo Yue 1,\***


**Abstract:** Soybean breeders must develop early-maturing, standard, and late-maturing varieties for planting at different latitudes to ensure that soybean plants fully utilize solar radiation. Therefore, timely monitoring of soybean breeding line maturity is crucial for soybean harvesting management and yield measurement. Currently, the widely used deep learning models focus more on extracting deep image features, whereas shallow image feature information is ignored. In this study, we designed a new convolutional neural network (CNN) architecture, called DS-SoybeanNet, to improve the performance of unmanned aerial vehicle (UAV)-based soybean maturity information monitoring. DS-SoybeanNet can extract and utilize both shallow and deep image features. We used a high-definition digital camera on board a UAV to collect high-definition soybean canopy digital images. A total of 2662 soybean canopy digital images were obtained from two soybean breeding fields (fields F1 and F2). We compared the soybean maturity classification accuracies of (i) conventional machine learning methods (support vector machine (SVM) and random forest (RF)), (ii) current deep learning methods (InceptionResNetV2, MobileNetV2, and ResNet50), and (iii) our proposed DS-SoybeanNet method. Our results show the following: (1) The conventional machine learning methods (SVM and RF) had faster calculation times than the deep learning methods (InceptionResNetV2, MobileNetV2, and ResNet50) and our proposed DS-SoybeanNet method. For example, the computation speed of RF was 0.03 s per 1000 images. However, the conventional machine learning methods had lower overall accuracies (field F2: 63.37–65.38%) than the proposed DS-SoybeanNet (field F2: 86.26%). (2) The performances of the current deep learning and conventional machine learning methods notably decreased when tested on a new dataset. For example, the overall accuracies of MobileNetV2 for fields F1 and F2 were 97.52% and 52.75%, respectively. (3) The proposed DS-SoybeanNet model can provide high-performance soybean maturity classification results. It showed a computation speed of 11.770 s per 1000 images and overall accuracies for fields F1 and F2 of 99.19% and 86.26%, respectively.

**Keywords:** unmanned aerial vehicle; soybean; convolutional neural network; deep learning

**Citation:** Zhang, S.; Feng, H.; Han, S.; Shi, Z.; Xu, H.; Liu, Y.; Feng, H.; Zhou, C.; Yue, J. Monitoring of Soybean Maturity Using UAV Remote Sensing and Deep Learning. *Agriculture* **2023**, *13*, 110. https://doi.org/10.3390/agriculture13010110

Academic Editor: Josef Eitzinger

Received: 22 November 2022; Revised: 27 December 2022; Accepted: 28 December 2022; Published: 30 December 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Soybeans are a high-quality source of plant protein and raw materials for the production of hundreds of chemical products [1,2]. China's soybean-growing areas include the Northeast China Plain [3] and the North China Plain [4] (ranging from 30° N to 48° N). Soybean breeders must develop early-maturing, standard, and late-maturing varieties for planting at different latitudes to ensure that soybean plants fully utilize solar radiation. Therefore, timely and accurate monitoring of soybean breeding line maturity can facilitate soybean breeding decision-making and agricultural management [5–8].

Traditional methods for measuring field breeding line maturity are time-consuming and labor-intensive [7]. Meanwhile, the expertise and bias of the investigators can affect the accuracy of field surveys. Breeding fields have thousands of breeding lines with different maturation times. Manual surveys cannot quickly provide high-frequency breeding line maturity information to meet harvesting and yield measurement scheduling requirements. Unmanned aerial vehicle (UAV) remote sensing technology can be used to collect high-resolution crop canopy images and has thus been widely used in precision agricultural crop trait monitoring [9–12]. Compared with satellite and airborne remote sensing technologies, UAV remote sensing technology is relatively inexpensive and flexible in its operation, and it requires less space for landing and takeoff [13]. More importantly, the digital images obtained by low-altitude UAVs have a high ground spatial resolution (centimeter-scale or finer); thus, they contain rich crop-canopy surface information for crop phenotypic research [14,15]. In recent years, UAV remote sensing technology has been widely used to collect crop trait information [9–12,16,17]. UAVs equipped with high-definition digital cameras can acquire digital images of the soybean canopy with ultrahigh ground spatial resolution at the field scale [14,15]. Many UAV-based methods have been proposed for monitoring various types of crop trait information, including the leaf area index (LAI) [18], leaf chlorophyll content [18–21], biomass [15,22], and crop height [23].

Machine learning has been successfully applied in several areas, such as image classification, target recognition, and language translation [24–26]. In recent years, machine learning techniques have been widely used to recognize various crop traits based on remote sensing images [27]. Gniewko et al. [28] used an artificial neural network (ANN), growing degree days, and total precipitation to estimate soybean yields. Letícia et al. [29] conducted a study to identify nematode damage to soybeans through the use of UAV remote sensing and a random forest (RF) model. The results obtained by Eugenio et al. [30] and Paulo et al. [31] indicated that machine learning techniques are efficient and flexible for remote sensing monitoring of soybean yields. Abdelbaki et al. [32] conducted a study to predict the soybean LAI and fractional vegetation cover (FVC) based on the RF model and UAV remote sensing. Compared with traditional machine learning methods (e.g., SVM and RF), deep learning methods such as long short-term memory (LSTM) [33,34], deep convolutional neural networks (CNNs) [26,35], and transformers [14] have been applied to image recognition, medical image analysis, climate change, and Weiqi game analysis, where they can provide results with similar or even higher precision than human experts. Deep learning uses multiple layers to extract higher-level features from the raw input. In recent years, deep learning techniques have been widely used to recognize various crop traits in remote sensing images, e.g., in leaf disease identification, weed identification, and crop trait recognition [1,26,33–37]. Wang et al. [34] developed an LSTM model by integrating MODIS LAI data to predict crop yields in China. Khan et al. [37] used a YOLOv4 model to identify apple leaf diseases in digital images captured by mobile phones. Zhang et al. [26] used a YOLOv4 model to identify weeds in digital photos of a peanut field. Khalied et al. [38] proposed a model based on MobileNetV2 for fruit identification and classification. Yonis et al. [39] proposed a CNN model adopting the VGG16 architecture for seed identification and classification. Notably, most of these widely used networks (e.g., YOLOv4 [40], ResNet50 [41], MobileNet [42], VGG16 [39], and InceptionResNetV2 [43]) did not take full advantage of shallow features. Shallow features derived from the shallow layers of CNNs are rich in image details, which are generally used in areas such as fine texture detection or small target detection [44,45]. Fusing the deep and shallow features of CNNs may improve performance in soybean maturity classification [44–46].

The objective of this work was to monitor soybean maturity using UAV remote sensing and deep learning. We designed a new convolutional neural network architecture (DS-SoybeanNet) to extract and utilize both shallow and deep image features to improve the performance of UAV-based soybean maturity information monitoring. We used a high-definition digital camera on board a UAV to collect high-definition soybean canopy digital images from two soybean breeding fields. We compared the UAV-based soybean maturity information monitoring performances of conventional machine learning methods (support vector machine (SVM) and random forest (RF)), current deep learning methods (InceptionResNetV2, MobileNetV2, and ResNet50), and our proposed DS-SoybeanNet method. Our results indicate that the proposed DS-SoybeanNet method can extract both shallow and deep image feature information and can realize high-performance soybean maturity classification.

#### **2. Materials**

#### *2.1. Study Area*

The study area was located at the Shengfeng Experimental Station (116°22′10″–116°22′20″ E, 35°25′50″–35°26′20″ N; Figure 1) of the National Center for Soybean Improvement, Jiaxiang County, Jining City, Shandong Province, China. Jiaxiang County is situated on the North China Plain, with a warm continental monsoon climate, concentrated precipitation, and an average annual sunshine duration of 2405.2 h. The average annual temperature is 13.9 °C.

**Figure 1.** Study area (**a**) and experimental soybean field (**b**).

#### *2.2. UAV Flights and Soybean Canopy Image Collection*

We used a high-definition digital camera on board an eight-rotor electric UAV to collect high-resolution soybean canopy remote sensing images (Table 1). In the soybean breeding experimental field, the size of each planting area was approximately 2.5 m × 5 m. As shown in Figure 1, we selected two independent soybean planting fields (fields F1 and F2) in the study area to obtain soybean canopy digital images and maturity information.

**Table 1.** Parameters of the UAV and digital camera used in this study.


For field F1, we conducted five UAV flights (29 July, 13 August, 31 August, 17 September, and 28 September 2015). A total of 2116 soybean canopy digital images and their maturity information were obtained, which were used to calibrate the DS-SoybeanNet model. For field F2, we made only one observation, on 30 September 2015. On that date, field F2 contained immature, near-mature, mature, and harvested soybean breeding lines. A total of 546 planting areas were set up in field F2 for the mapping and independent evaluation of the DS-SoybeanNet model.

The soybean image collection and image stitching process mainly included the following three steps:


#### *2.3. Soybean Canopy Image Labeling*

In this study, soybean maturity information was manually labeled. The labeling method was based on the standards of soybean harvesting and is described in Table 2. So that workers can plan harvesting schedules for the soybean planting plots, four categories were used: immature (L0), near-mature (L1), mature (L2), and harvested (L3). L2 plots have the highest harvesting priority and need to be harvested as soon as possible. L1 plots have a high priority because the soybeans will mature in less than a week. L0 and L3 plots have a lower priority: L0 plots generally need more time to grow, and no outdoor work is required for L3 plots.
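The priority rule described above can be sketched as a simple ordering of plots by label. This is only an illustration: the numeric ranks, the plot identifiers, and the function name `harvest_order` are assumptions introduced here, and harvested (L3) plots are left out of the schedule because the text states that they need no outdoor work.

```python
# Illustrative ranks (an assumption): a lower number means "harvest sooner".
# L3 (harvested) is deliberately absent because no field work remains for it.
HARVEST_PRIORITY = {"L2": 0, "L1": 1, "L0": 2}

def harvest_order(plot_labels):
    """Return plot IDs that still need work, most urgent first.

    Plots with the same rank keep their original (insertion) order because
    Python's sort is stable.
    """
    pending = [plot for plot, label in plot_labels.items()
               if label in HARVEST_PRIORITY]
    return sorted(pending, key=lambda plot: HARVEST_PRIORITY[plot_labels[plot]])

# Hypothetical plot IDs with their maturity labels.
plots = {"A1": "L0", "A2": "L2", "A3": "L1", "A4": "L3", "A5": "L2"}
schedule = harvest_order(plots)
print(schedule)
```

In such a scheme, the labels predicted for each plot could be fed directly into the ordering to produce a harvesting list.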


**Table 2.** Standards used for labeling the soybean plots.

Since different soybean breeding lines have different maturation times, the numbers of images corresponding to the four labels varied between the two fields. Sixty percent of the images of each type in the dataset were randomly chosen to train the model, and the remaining 40% were used to evaluate the model's accuracy. Table 3 shows the numbers of samples used to train and validate the DS-SoybeanNet model. Figure 2 shows the soybean images used for model calibration and validation.
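A per-class 60/40 split of this kind can be sketched as follows. The function name, the fixed random seed, and the example label counts are assumptions made for illustration; the actual per-class counts are those reported in Table 3.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.6, seed=0):
    """Split sample indices class by class so every label keeps the same share."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, validation = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        validation.extend(indices[cut:])
    return sorted(train), sorted(validation)

# Hypothetical label list, not the counts used in the study.
example_labels = ["L0"] * 50 + ["L1"] * 20 + ["L2"] * 10 + ["L3"] * 20
train_idx, val_idx = stratified_split(example_labels)
print(len(train_idx), len(val_idx))
```

Splitting within each class, rather than over the pooled dataset, keeps rare labels represented in both the calibration and validation subsets.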


**Table 3.** Numbers of soybean images for model calibration and validation.

**Figure 2.** Examples of the four labels.

#### *2.4. Data Enhancement*

In this study, we produced a digital orthophoto map (DOM) for the entire area by mosaicking together the digital images collected during each UAV flight. Since an orthoimage has a uniform scale, the ground spatial resolutions and solar angles were the main differences between the five DOMs. We used image rotation (four rotation angles: 0° (i.e., the original image), 90°, 180°, and 270°) and scaling (five scaling factors: 1.0 (i.e., the original image), 1.2, 1.5, 1.8, and 2.0) to enhance the soybean canopy image dataset collected from field F1. Image rotation and magnification helped us to obtain soybean canopy images with different resolutions and angles; in addition, they helped prevent overfitting of the model due to the small number of samples collected in the field.

After data enhancement, the number of soybean canopy images obtained from field F1 was increased 20-fold (four rotation angles × five scaling factors). The independent validation dataset obtained from field F2 was not augmented. In this study, we used the Python OpenCV and NumPy libraries to extract, rotate, and magnify the soybean canopy images.
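The rotation and magnification scheme can be sketched as below. This is a minimal illustration that uses NumPy only; the study also used OpenCV, and the centre crop with nearest-neighbour resampling in `magnify` is an assumption, because the interpolation and cropping actually used are not stated.

```python
import numpy as np

ROTATIONS = (0, 1, 2, 3)             # quarter turns: 0, 90, 180, 270 degrees
SCALES = (1.0, 1.2, 1.5, 1.8, 2.0)   # magnification factors given in the text

def magnify(image, factor):
    """Zoom into the image centre by `factor` while keeping the output size.

    Assumption: a centre crop is resampled with nearest-neighbour indexing.
    """
    height, width = image.shape[:2]
    crop_h = max(1, int(round(height / factor)))
    crop_w = max(1, int(round(width / factor)))
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    crop = image[top:top + crop_h, left:left + crop_w]
    rows = np.arange(height) * crop_h // height   # source row per output row
    cols = np.arange(width) * crop_w // width     # source column per output column
    return crop[rows][:, cols]

def augment(image):
    """Return the 5 x 4 = 20 magnified and rotated variants of one image."""
    return [np.rot90(magnify(image, s), k) for s in SCALES for k in ROTATIONS]

# Random stand-in for one 108 x 108 RGB plot image.
sample = np.random.default_rng(0).integers(0, 256, size=(108, 108, 3), dtype=np.uint8)
variants = augment(sample)
print(len(variants), variants[0].shape)
```

The first variant (factor 1.0, no rotation) is the original image, so the 20 variants include the unmodified sample.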

#### **3. Methods**

#### *3.1. Proposed DS-SoybeanNet*

CNNs were originally proposed based on the receptive field mechanism in biology, and they are a widely used deep learning technology [47]. CNNs are designed to process data with a grid-like structure, such as images. The multilayer convolution, weight sharing, and rotational-shift invariance of CNNs make them effective in image classification and feature recognition. The deep and complex features extracted by CNNs are often used to effectively describe differences between image categories and can be used to quickly and accurately complete classification tasks. Currently, widely used networks (e.g., ResNet50 and MobileNetV2) ignore shallow image feature information. We designed a network structure (Figure 3) that considers both shallow and deep image features to enhance the model's generalization ability. The advantage of DS-SoybeanNet is that the shallow and deep features are linked together by means of a concatenation module. Consequently, DS-SoybeanNet can extract and utilize both shallow and deep image features to improve the accuracy of soybean maturity information classifications. DS-SoybeanNet contains five convolutional layers, five flattening modules, one concatenation module, and four fully connected layers. The layers are described as follows:

**Figure 3.** Architecture of DS-SoybeanNet.

(1) Input layer

The input data were collected via UAV remote sensing technology in the form of soybean canopy orthophotos and were then manually labeled and cropped to produce sample data. The sample size was 108 × 108 × 3, and the sample data were divided into four types: immature, near-mature, mature, and harvested.

(2) Convolutional and pooling layers

The purpose of the convolution operation was to extract the different features of the input images. DS-SoybeanNet was designed with five convolutional layers; each convolutional layer was combined with the ReLU activation function to introduce nonlinearity. The pooling layers reduce the dimensions of the feature maps by summarizing the presence of features in patches of the feature map.

(3) Flattening and concatenation layers

A flattening layer can reshape the feature maps into the dimensions required for the subsequent layers. A concatenation layer concatenates inputs along a specified dimension.

(4) Fully connected layers and output layer

Four fully connected layers were designed, and dropout layers were attached to the first three layers to prevent overfitting and improve model generalization. The output of the model was soybean maturity information derived from the input images.
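To make the flattening and concatenation step concrete, the sketch below computes how long each flattened branch would be and how long the concatenated vector passed to the first fully connected layer would become. Only the 108 × 108 input size and the count of five convolutional layers come from the text; the filter counts, the 'same' padding, and the 2 × 2 pooling after every convolution are assumptions for illustration, not the published configuration (see Tables A1 and A2 for that).

```python
# Everything except INPUT_SIZE is an illustrative assumption.
INPUT_SIZE = 108                             # input height and width (from the text)
ASSUMED_FILTERS = (32, 64, 128, 256, 512)    # hypothetical filters per conv layer

def flattened_sizes(input_size, filters, pool=2):
    """Length of each flattened feature map, shallowest layer first.

    Assumes 'same'-padded convolutions (spatial size unchanged) followed by
    non-overlapping pool x pool pooling after every convolutional layer.
    """
    sizes = []
    side = input_size
    for n_filters in filters:
        side //= pool                        # pooling shrinks height and width
        sizes.append(side * side * n_filters)
    return sizes

branch_sizes = flattened_sizes(INPUT_SIZE, ASSUMED_FILTERS)
concatenated_length = sum(branch_sizes)      # input width of the first dense layer

for depth, size in enumerate(branch_sizes, start=1):
    print(f"conv block {depth}: {size} features")
print(f"concatenated vector: {concatenated_length} features")
```

Under these assumptions, the shallow branches contribute most of the concatenated features, so the first fully connected layer receives a very wide input. This is one plausible reason why a network that fuses shallow and deep features can have a large parameter count.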

#### *3.2. Transfer Learning Based on InceptionResNetV2, MobileNetV2, and ResNet50*

Transfer learning is a strategy for solving similar or related tasks using existing methods and data. Many deep learning networks show effective performance in image classification and target recognition from natural images (e.g., InceptionResNetV2 [43], ResNet50 [41], and MobileNetV2 [42]). Using a pretrained model to extract the features of remote sensing images can solve, to a certain extent, the problems involved with training a network for remote sensing image scene classification when there is a lack of training data. In this study, we used InceptionResNetV2, MobileNetV2, and ResNet50 as the pretrained deep learning models for transfer learning and performance comparisons with the proposed DS-SoybeanNet model.


Transfer learning requires a low learning rate for retraining because the feature extraction module of the model already has some ability to extract image feature information after pretraining. An ideal learning rate can promote model convergence, whereas an unsuitable rate can cause training oscillations or even directly lead to the "explosion" of the loss value of the objective function. In addition to transfer learning methods based on InceptionResNetV2, MobileNetV2, and ResNet50, we also tested the performance of the AlexNet [48] and VGG16 [39] models to monitor soybean maturity.
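The effect of the learning rate can be illustrated with a toy problem. The sketch below runs plain gradient descent on f(w) = w², which is not the network's loss function; the two learning rates are arbitrary values chosen only to show convergence versus divergence and are unrelated to the rates used in this study.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; return the loss per step."""
    w = start
    losses = []
    for _ in range(steps):
        w -= learning_rate * 2.0 * w   # the gradient of w**2 is 2w
        losses.append(w * w)
    return losses

small_rate_losses = gradient_descent(learning_rate=0.1)   # w shrinks by 0.8x per step
large_rate_losses = gradient_descent(learning_rate=1.1)   # w flips sign and grows 1.2x

print(f"final loss with rate 0.1: {small_rate_losses[-1]:.6f}")
print(f"final loss with rate 1.1: {large_rate_losses[-1]:.1f}")
```

With the small rate the loss decays steadily, whereas with the large rate every update overshoots the minimum, the parameter oscillates around it with growing amplitude, and the loss increases at each step.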

#### *3.3. SVM and RF*

We also compared the soybean maturity information classification accuracy of our proposed DS-SoybeanNet with those of conventional machine learning models (SVM and RF). SVM is a generalized linear classifier that performs binary data classification in supervised learning [49]. Its decision boundary is the maximum-margin hyperplane solved for the learned samples, which reduces the classification problem to a convex quadratic programming problem. SVM has a low structural risk, but its training is difficult to scale to large samples, and it is not natively suited to multiclass problems. RF is based on an ensemble learning strategy that combines multiple decision trees [50]. Under the bagging (bootstrap aggregation) strategy, each tree is trained on a subset of samples drawn randomly with replacement from the dataset, so the trees are trained independently of one another. The final output is obtained by majority voting over the trees' predictions for classification or by averaging them for regression. This strategy reduces the influence of erroneous samples and thus improves accuracy.
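The two ingredients of bagging, bootstrap sampling and vote aggregation, can be sketched in a few lines. This is not the RF implementation used in the study; the plot indices, the number of bags, and the example votes are hypothetical, and the decision trees themselves are omitted.

```python
import random
from collections import Counter

def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement (the training bag for one tree)."""
    return [rng.choice(samples) for _ in samples]

def majority_vote(predictions):
    """Return the label predicted by the most trees; ties go to the first seen."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
plots = list(range(10))                     # hypothetical plot indices
bags = [bootstrap_sample(plots, rng) for _ in range(3)]

# Hypothetical votes from five trees for a single plot.
votes = ["L1", "L2", "L1", "L1", "L0"]
predicted_label = majority_vote(votes)
print(predicted_label)
```

Because each bag is drawn with replacement, some samples appear several times and others not at all, which is what makes the individual trees differ from one another.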

#### *3.4. Accuracy Evaluation*

Figure 4 shows the experimental methodology used in this work. The canopy images of field F1 were used to calibrate and validate the models, whereas all canopy images of field F2 were used to validate the models.

**Figure 4.** Flowchart of the experimental methodology.

The confusion matrix is a widely used tool for model accuracy evaluations. Table 4 shows the confusion matrix for the binary classification problem. Accuracy and recall can be obtained based on the confusion matrix. Generally, higher accuracy and recall values indicate better classification performance.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$


**Table 4.** Confusion matrix.

*TP*, *TN*, *FP*, and *FN* represent the true-positive, true-negative, false-positive, and false-negative categories, respectively, in the confusion matrix (Table 4). Confusion matrices are not limited to binary classification but can also be used for multiclass classification. In this study, we used the confusion matrix, accuracy, and recall to evaluate the soybean maturity classification accuracy of the proposed DS-SoybeanNet model.
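For the four-class case, Equations (1) and (2) can be applied to a multiclass confusion matrix as sketched below: overall accuracy is the sum of the diagonal divided by the total number of samples, and the recall of a class is its diagonal entry divided by its row sum. The function names and the eight example labels are hypothetical and serve only as an illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Build a matrix whose rows are true classes and columns are predictions."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[predicted]] += 1
    return matrix

def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

def recall_per_class(matrix):
    """TP / (TP + FN) for each true class: diagonal entry over row sum."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

CLASSES = ["L0", "L1", "L2", "L3"]
# Hypothetical labels for eight plots, for illustration only.
y_true = ["L0", "L0", "L1", "L1", "L2", "L2", "L3", "L3"]
y_pred = ["L0", "L0", "L1", "L2", "L2", "L2", "L3", "L1"]

cm = confusion_matrix(y_true, y_pred, CLASSES)
accuracy = overall_accuracy(cm)
recalls = recall_per_class(cm)
print(accuracy, recalls)
```

In this example, one near-mature plot is predicted as mature and one harvested plot as near-mature, so the recalls of L1 and L3 drop to 0.5 while the overall accuracy is 0.75.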

#### **4. Results and Discussion**

#### *4.1. Model Calibration and Validation Based on Field F1*

We used the calibration dataset of field F1 to train the proposed DS-SoybeanNet, AlexNet, VGG16, InceptionResNetV2, MobileNetV2, ResNet50, SVM, and RF models. Each model was trained and validated three times, and the model with the highest performance was saved. The learning rates were set to 0.0005, 0.0001, and 0.00001 for the transfer learning models (InceptionResNetV2, MobileNetV2, and ResNet50), and the number of epochs was set to 100. For DS-SoybeanNet, we analyzed the model accuracy with different convolution window sizes.

#### 4.1.1. Validation of AlexNet, VGG16, SVM, and RF

We tested the SVM and RF models for monitoring soybean breeding line maturity (Table 5) based on the validation dataset from field F1. The L0, L1, and L3 classification recall values were higher than 99% for the traditional machine learning models (SVM and RF). The classification accuracies of SVM and RF were 92.31% and 94.23%, respectively. We also tested the performance of the AlexNet and VGG16 models (Table 5). The performances of AlexNet (99.44%) and VGG16 (97.99%) were higher than those of SVM (92.31%) and RF (94.23%).


**Table 5.** Classification results of AlexNet, VGG16, SVM, and RF.

Note: \* indicates the highest accuracy.

4.1.2. Validation of Transfer Learning Based on InceptionResNetV2, MobileNetV2, and ResNet50

We also tested the performance of the three deep learning models using three learning rates. Table 6 shows the accuracies of the models using different learning rates. The performances of the three deep learning models (InceptionResNetV2, MobileNetV2, and ResNet50) were similar when using different learning rates. Our results indicate that the soybean maturity classification accuracy of traditional machine learning models (RF: 94.23%; SVM: 92.31%) was lower than that of InceptionResNetV2, MobileNetV2, and ResNet50.

There were notable differences in recall among the four labels. For example, the L2 classification recall of InceptionResNetV2 was much lower than those of L0, L1, and L3 when the learning rate was 0.0005. The same was observed for MobileNetV2 and ResNet50, which had L2 classification recalls of 69.23% and 88.46%, respectively.

**Table 6.** Classification results of transfer learning based on InceptionResNetV2, MobileNetV2, and ResNet50.


Note: Rate 1 = 0.0005; Rate 2 = 0.0001; Rate 3 = 0.00001.

#### 4.1.3. Validation of the Proposed DS-SoybeanNet Model

We tested the proposed DS-SoybeanNet model in the monitoring of soybean breeding line maturity. Table 7 shows the classification results of the DS-SoybeanNet model with the convolution kernel size set to 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, 16 × 16, and 21 × 21. The results indicate that there was little difference in performance among the seven convolution kernel sizes (with classification accuracies ranging from 97.52% to 99.19%). The model had the best soybean maturity classification accuracy when the convolution kernel size was set to 5 × 5 (99.17%) or 7 × 7 (99.19%). Figure 5 shows the training accuracy and loss curves of DS-SoybeanNet with kernel sizes of 5 × 5 and 7 × 7. These results indicate that the model reached convergence at about 40 epochs. Training DS-SoybeanNet (5 × 5) for 100 epochs took approximately 40 min 5 s. Tables A1 and A2 show the model architecture and parameter information of DS-SoybeanNet with 5 × 5 and 7 × 7 kernels.

**Table 7.** Classification results of the proposed DS-SoybeanNet.


Note: \* indicates the highest accuracy.

**Figure 5.** Training accuracy (**a**) and loss (**b**) of the DS-SoybeanNet with kernel sizes of 5 × 5 and 7 × 7.

#### *4.2. Performance Comparison Based on Field F2*

We used the 546 images from field F2 to test the performance of MobileNetV2, InceptionResNetV2, ResNet50, AlexNet, VGG16, SVM, RF, and the proposed DS-SoybeanNet model in monitoring soybean maturity. Table 8 shows the confusion matrices of the soybean maturity classifications of the eight models. Table 9 shows the classification results of the eight models using the data from field F2. Our results (Tables 8 and 9) indicated that the proposed DS-SoybeanNet model exhibited a higher classification accuracy than the other models.

**Table 8.** Confusion matrices of MobileNetV2 (a), InceptionResNetV2 (b), ResNet50 (c), SVM (d), RF (e), DS-SoybeanNet with kernel sizes of 5 × 5 (f) and 7 × 7 (g), AlexNet (h), and VGG16 (i).


**Table 9.** Classification results of eight models from field F2.


The conventional machine learning models (SVM and RF) exhibited the highest classification recall (100%) in the classification of immature soybeans (L0) (see Table 9). AlexNet (96.95%) showed the highest classification recall for mature soybeans (L2). As shown in Tables 8 and 9, the conventional machine learning models (SVM and RF) and deep learning models (MobileNetV2, InceptionResNetV2, and ResNet50) showed lower recalls for near-mature soybeans (L1), which led to lower overall classification accuracies for these models. DS-SoybeanNet (84.47%) had the highest classification recall for near-mature soybeans (L1) (see Table 9).

As shown in Table 9, the ResNet50 model exhibited a high classification accuracy of 72.16%. The RF (65.38%) and SVM (63.37%) models had similar classification accuracies. The soybean classification accuracies of InceptionResNetV2 (55.86%) and MobileNetV2 (52.75%) were lower than those of the other five models. The accuracies of DS-SoybeanNet based on 5 × 5 and 7 × 7 convolution kernels, namely, 86.26% and 84.25%, respectively, were notably higher than those of the other models.

Note that the performance of all eight models decreased when they were tested on the field F2 dataset (Tables 5–7 and 9). As shown in Table 9, the top three models were DS-SoybeanNet, AlexNet, and VGG16 when monitoring soybean maturity using the field F2 dataset. Recently, the AlexNet [48] and VGG16 [39] models have been used to detect crop maturity by many researchers. Our results show that the new DS-SoybeanNet model performed better than the AlexNet and VGG16 models in the classification of immature (L0) and near-mature soybeans (L1). For the field F1 dataset, the recall of L0 for DS-SoybeanNet was 100%, which is higher than that of AlexNet (99.69%) and VGG16 (98.74%). For the field F2 dataset, the recalls of L0 and L1 for DS-SoybeanNet were 92.19% and 84.47%, respectively, which were notably higher than those of the AlexNet model (L0: 79.37%, L1: 43.89%).

To further evaluate the fusion of deep and shallow CNN features and to explore the efficiency of the proposed DS-SoybeanNet model, we set up three ablation experiments for DS-SoybeanNet, as described below. Figure 6 shows the architectures of the CNNs used for experiments 2 and 3. Each model was trained and validated three times, and the model with the highest performance was saved.


Our results (Table 10) indicate that the soybean maturity classification accuracy in experiment 2 (only shallow image features) and experiment 3 (only deep image features) was lower than that in experiment 1. This further supports the view that fusing deep and shallow CNN features [44–46] can improve the performance of the model in image classification tasks.

#### *4.3. Soybean Maturity Mapping*

For soybean maturity mapping, the following three steps were carried out:


Figure 7 shows a true-color RGB image and the maturity maps calculated for field F2 using DS-SoybeanNet with 5 × 5 and 7 × 7 convolution kernels. Our results indicate that the estimated soybean maturity information for field F2 had a high accuracy. The soybean maturity information obtained from the DS-SoybeanNet model with 5 × 5 and 7 × 7 convolution kernels was similar.

**Figure 6.** Architecture of CNNs used for experiments 2 (**a**) and 3 (**b**).

**Table 10.** Classification results of three experiments with 5 × 5 and 7 × 7 kernels.


Note: Bold and \* indicate the highest accuracy.

**Figure 7.** Maturity maps. (**a**) RGB true-color image; (**b**) DS-SoybeanNet (5 × 5); and (**c**) DS-SoybeanNet (7 × 7). Note: The red rectangle indicates the soybean plot region.

#### *4.4. Advantages and Disadvantages of UAV + DS-SoybeanNet*

As soybeans mature, the leaf chlorophyll level gradually decreases, contributing to a slow change in the leaves' color from green to yellow [51,52]. Crop leaf chlorophyll variation is asynchronous among layers of leaves [52]. For example, leaves in the top layer of a soybean canopy tend to have a younger leaf age and thus turn yellow later than the leaves in the bottom layer. Consequently, both green and yellow leaves appear in the soybean canopy when the soybeans are nearly mature (Figure 2). Breeding fields commonly have thousands of breeding lines with different maturation times. Thus, timely monitoring of soybean breeding line maturity is crucial for soybean harvesting management and yield measurements [5–8]. UAV remote sensing technology can be utilized to collect high-resolution crop canopy images and has been widely used in precision agricultural crop trait monitoring [14,15]. Many studies have evaluated the crop parameter monitoring performance of digital cameras and multispectral sensors on board lightweight UAVs [17–19]. In our study, we attempted to evaluate the potential of using UAV remote sensing to monitor soybean breeding line maturity. We developed DS-SoybeanNet, which can extract and utilize both shallow and deep image features, and which thus helps to provide soybean breeding line maturity monitoring that is more robust than that offered by conventional machine learning methods. DS-SoybeanNet achieved the best accuracy of 86.26% on field F2 (Table 9), which was notably higher than those of the conventional machine learning models (SVM and RF). However, DS-SoybeanNet has disadvantages compared with conventional machine learning methods, such as its long elapsed time and large size (Table 11). CNNs have more complex network structures, higher computational complexity, and larger model sizes than conventional machine learning models.


Table 11 shows the time required to process 1000 samples using each model and the models' sizes. The computation times of the CNN models (ranging from 6.607 s to 67.080 s) were notably higher than those of the conventional machine learning models, SVM and RF (0.003 s and 0.007 s). In addition, a high-performance device is required to calibrate CNN models. As shown in Table 11, the model sizes of DS-SoybeanNet, ResNet50, and InceptionResNetV2 were more than 300 MB. The proposed DS-SoybeanNet model had the largest size (2616 MB) compared to the other models. The DS-SoybeanNet model's large size may mean that it requires large storage when deployed on lightweight platforms (e.g., Raspberry Pi) for stationary observations. Nevertheless, DS-SoybeanNet (5 × 5) had approximately the same calculation speed as MobileNetV2 and a much higher monitoring accuracy than the other deep learning models. Therefore, we consider DS-SoybeanNet a fast and high-performance deep-learning tool for monitoring soybean maturity.

Many previous studies have used AlexNet, VGG16, Inception-V3, and VGG19 in crop maturity classifications. Faisal et al. [53] compared the performance of pre-trained VGG-19 (99.4%), Inception-V3 (99.4%), and NASNet (99.7%) in detecting fruit maturity. Atif et al. [54] used AlexNet and VGG16 to classify the maturity levels of jujube fruits (best: VGG16 = 99.17%). Sahil et al. [55] developed a method that used YOLOv3 to pinpoint the locations of tomatoes (94.67%) and used an AlexNet-like CNN model to classify their maturity levels (90.67%). In this work, we compared the results of conventional machine learning models (SVM (92.31%) and RF (94.23%)) and six CNN machine learning models (DS-SoybeanNet (99.19%), VGG16 (97.99%), AlexNet (99.44%), ResNet50 (98.97%), InceptionResNetV2 (99.49%), and MobileNetV2 (97.52%)) in soybean maturity information monitoring based on UAV remote sensing. The accuracy results reported in this study were close to those of previous studies based on AlexNet, VGG16, Inception-V3, and VGG19. Thus, our results further support the view that deep learning is a good tool for crop maturity information monitoring [48,53–56]. The combination of UAV remote sensing and deep learning can be used for high-performance soybean maturity information monitoring. However, our results indicate that the selected machine learning models' performance decreased when the field F2 dataset was used to test them (Tables 5–7 and 9). We suspect that changes in the UAV's working environment (for example, varying sunlight intensity over time) led to a direct decline in the models' performance. This is perhaps not surprising because the farmland environment is affected by varying cropland conditions (e.g., irrigation, wind). Thus, future research should focus on the factors influencing cropland images.

In this study, the performance obtained when using soybean canopy images captured by the UAV's remote sensing digital camera may have been limited by the varying sunlight intensity over time. Since DS-SoybeanNet did not normalize the image differences due to sunlight, a normalization module may improve its performance in soybean maturity classification. Therefore, future studies need to develop a normalization module to reduce the effect of varying sunlight. In addition, more experiments with different soybean varieties and regions are needed to improve the generalizability of the DS-SoybeanNet model. In this study, the proposed DS-SoybeanNet was validated using only two breeding fields from a single site; thus, further validation is required using additional fields and study sites.
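To illustrate what such a normalization step could look like, the sketch below converts each RGB pixel to chromatic coordinates, which cancel a uniform change in brightness. This is only an assumed example of a standard colour normalization, not the normalization module proposed for future work; the function name `chromatic_coordinates` is hypothetical, and the pixel values are invented for illustration.

```python
def chromatic_coordinates(pixels):
    """Map each (R, G, B) pixel to (r, g, b) = (R, G, B) / (R + G + B).

    Scaling R, G, and B by the same factor leaves (r, g, b) unchanged,
    so a uniform brightness change is removed. Shadows and changes in
    the colour of the light are NOT corrected by this step.
    """
    normalized = []
    for red, green, blue in pixels:
        total = red + green + blue
        if total == 0:
            # A black pixel carries no colour information; use neutral grey.
            normalized.append((1 / 3, 1 / 3, 1 / 3))
        else:
            normalized.append((red / total, green / total, blue / total))
    return normalized


if __name__ == "__main__":
    bright = [(120, 200, 80), (60, 90, 30)]  # invented canopy pixels
    dim = [(60, 100, 40), (30, 45, 15)]      # same scene at half brightness
    print(chromatic_coordinates(bright))
    print(chromatic_coordinates(dim))        # same coordinates as above
```

In practice, the same operation would be vectorized over whole images with an array library; the pure-Python form is used here only to keep the example self-contained.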

#### **5. Conclusions**

In this study, we designed a network, namely, DS-SoybeanNet, to extract and utilize both shallow and deep image features to improve the performance of UAV-based soybean maturity information monitoring. We compared conventional machine learning methods (SVM and RF), current deep learning methods (AlexNet, VGG16, InceptionResNetV2, MobileNetV2, and ResNet50), and our proposed DS-SoybeanNet model in terms of their soybean maturity classification accuracy. The results were as follows.


**Author Contributions:** J.Y., H.F. (Haikuan Feng), S.Z. and H.F. (Hao Feng) designed the experiments. J.Y., H.F. (Haikuan Feng), Z.S., H.X. and C.Z. collected the soybean images. J.Y. and S.Z. analyzed the data and wrote the manuscript. Y.L. and S.H. made comments and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (NO. 42101362).

**Data Availability Statement:** The data that support the findings of this study are available from the corresponding author, J.Y., upon reasonable request.

**Acknowledgments:** We thank Bo Xu and Guozheng Lu for field management and data collection.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

Figure A1 shows the attention regions of different models in the soybean canopy images. Regarding interpretability, the top three models behaved differently when their attention regions were visualized using the Grad-CAM technique. The VGG16 model focused only on luxuriant leaves for all four categories. The AlexNet model showed acceptable attention regions for L0 and L1 soybean images but focused only on branches and leaves for L2 and L3 soybean images. In contrast, DS-SoybeanNet showed acceptable attention regions for all four categories. In most cases, DS-SoybeanNet differentiated among the soybean images accurately based on the leaf, branch, and soil pixels, much as farm workers do. Tables A1 and A2 show the model architecture and parameter information of DS-SoybeanNet with 5 × 5 and 7 × 7 kernels, respectively.

**Figure A1.** The attention regions of the top 3 (Table 9) models in soybean canopy images.
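For readers unfamiliar with Grad-CAM, the core computation behind attention maps such as those in Figure A1 can be sketched as follows. This is a minimal illustration of the standard Grad-CAM formula, not the visualization code used in this study. It assumes that the feature maps of a chosen convolutional layer and the gradients of the class score with respect to them have already been obtained from a deep learning framework; the function name `grad_cam` and the toy inputs are hypothetical.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map from K feature maps and their class-score gradients.

    Both arguments are lists of K channels, each an H x W list of lists.
    Obtaining them from a real network is framework-specific and not shown.
    """
    # Channel weight: global average of the gradient over the spatial grid.
    weights = [
        sum(sum(row) for row in grad) / (len(grad) * len(grad[0]))
        for grad in gradients
    ]
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    # Weighted sum of the feature maps.
    for weight, feature_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * feature_map[i][j]
    # ReLU keeps only the regions that increase the class score.
    cam = [[max(0.0, value) for value in row] for row in cam]
    # Scaling to [0, 1] is a common display convention, not part of the formula.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


if __name__ == "__main__":
    toy_activations = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
    toy_gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
    print(grad_cam(toy_activations, toy_gradients))
```

The resulting low-resolution map is usually upsampled to the input image size and overlaid on the image for inspection.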


**Table A1.** Details of the proposed DS-SoybeanNet with 5 × 5 kernels.


**Table A2.** Details of the proposed DS-SoybeanNet with 7 × 7 kernels.

Trainable params: 228,625,876

Non-trainable params: 0
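The parameter count listed above gives a rough estimate of the storage needed for the network weights. The following back-of-the-envelope sketch assumes 32-bit floating-point parameters; the numeric precision and the file format of the saved model are not stated here, so the function `model_size_mib` and its `copies` argument are illustrative assumptions only.

```python
def model_size_mib(n_params, bytes_per_param=4, copies=1):
    """Approximate storage in MiB for `copies` arrays of `n_params` values.

    bytes_per_param=4 assumes float32 (an assumption, not stated in the text).
    copies > 1 models extra per-parameter buffers, e.g. optimizer state.
    """
    return n_params * bytes_per_param * copies / 2**20


if __name__ == "__main__":
    n_params = 228_625_876  # trainable parameters listed above
    print(f"weights only: {model_size_mib(n_params):.0f} MiB")  # about 872 MiB
    print(f"three copies: {model_size_mib(n_params, copies=3):.0f} MiB")
```

With `copies=3` (weights plus two per-parameter buffers, as an Adam-style optimizer would keep), the estimate is about 2616 MiB, which is numerically close to the 2616 MB model size discussed for DS-SoybeanNet. Whether the saved model actually includes optimizer state is not stated, so this is only one possible reading.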

#### **References**


