2.3.6. CNNs' Optimization Techniques and Hyperparameters

All of the above networks are too deep to train from scratch with our data set. Therefore, we used transfer learning, which consists of taking features learned in one context and reusing them in a new, similar problem [28]. Transfer learning is typically applied when the data set is too small to train a full-scale model from scratch, which was our case since we had only 1002 Medjool date images.

Transfer learning is commonly applied in two ways: (1) using a pretrained model, in which the last layers of a model trained on another task are replaced with new layers adapted to the characteristics of the new data set, and (2) fine-tuning the convolutional network, in which the weights of its layers are further adjusted through backpropagation.

In this study, we applied transfer learning through the pretrained-model approach. We used networks pre-trained on ImageNet, a large visual database designed for visual object recognition [26]. We removed the final classification layer, the softmax layer corresponding to the ImageNet classes, and replaced it with a new softmax layer for our image data set. A summary of the utilized CNN architectures is shown in Table 1.


**Table 1.** Characteristics of the CNNs' architectures used in this study.
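As a minimal sketch of this head-replacement approach, the snippet below loads a VGG-19 convolutional base pretrained on ImageNet and attaches a new softmax classifier with Keras, one of the libraries listed in Section 2.4. The input size, head width, and number of classes are illustrative assumptions, not the exact configuration used in this study.

```python
# Sketch: replace the ImageNet softmax head of a pretrained network with a new
# softmax layer for the Medjool date classes (illustrative values assumed).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 2            # assumed number of Medjool sorting classes
IMG_SHAPE = (224, 224, 3)  # assumed VGG-19 input size

# Convolutional base pretrained on ImageNet, without its original classifier.
base = tf.keras.applications.VGG19(weights="imagenet",
                                   include_top=False,
                                   input_shape=IMG_SHAPE)
base.trainable = False  # approach (1): keep the pretrained features fixed
# For approach (2), fine-tuning, some or all base layers would instead be set
# trainable so their weights are updated by backpropagation.

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),             # assumed head width
    layers.Dense(NUM_CLASSES, activation="softmax"),  # new softmax layer
])
model.summary()
```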

### Hyperparameters

Hyperparameters are variables that define the structure of a convolutional network and control how it is trained [29]. These hyperparameters include the learning rate, number of epochs, optimizer, batch size, number of layers, and activation functions, among others, which can be adjusted to make the CNN more efficient. In this study, we varied the optimizer, learning rate, batch size, and number of epochs. We used the Adaptive Moment Estimation (Adam) and Stochastic Gradient Descent (SGD) optimizers, since both are well-known optimizers that perform well for image classification with CNNs [30]. The learning rates for the optimizers were 0.01 and 0.001, the batch sizes were 64 and 128, the numbers of epochs were 25 and 400, and the number of layers depended on the CNN architecture used (Table 1).
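The following sketch makes the evaluated grid explicit: two optimizers, two learning rates, two batch sizes, and two epoch counts. The `build_model` stand-in, input size, and loss choice are assumptions for illustration, not code from this study.

```python
# Sketch: enumerate the hyperparameter grid described above
# (2 optimizers x 2 learning rates x 2 batch sizes x 2 epoch counts).
from itertools import product
import tensorflow as tf

def build_model():
    # Placeholder stand-in for one of the CNNs in Table 1 (assumed 2 classes).
    return tf.keras.Sequential([
        tf.keras.Input(shape=(224, 224, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])

optimizers   = ["adam", "sgd"]
learn_rates  = [0.01, 0.001]
batch_sizes  = [64, 128]
epoch_counts = [25, 400]

for opt_name, lr, batch, epochs in product(optimizers, learn_rates,
                                           batch_sizes, epoch_counts):
    if opt_name == "adam":
        optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    else:
        optimizer = tf.keras.optimizers.SGD(learning_rate=lr)

    model = build_model()
    model.compile(optimizer=optimizer,
                  loss="categorical_crossentropy",  # assumed loss
                  metrics=["accuracy"])
    # model.fit(train_data, batch_size=batch, epochs=epochs,
    #           validation_data=val_data)  # training call sketched only
```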

### *2.4. Experimental Framework*

To implement and evaluate the CNN architectures presented in Section 2.3, we used the Google Colab cloud service, based on Jupyter notebooks, which allows the free use of Google's GPUs or TPUs, with the libraries Scikit-learn, PyTorch, TensorFlow, Keras, and OpenCV [31]. The hardware specifications used in this experiment were GPU: Nvidia Tesla T4; CPU: Intel(R) Xeon(R) CPU @ 2.20 GHz; RAM: ~12.78 GB available; and hard disk: ~32.20 GB available. The software specifications were Operating System: Ubuntu 18.04.5 LTS (Bionic Beaver), with the libraries Keras 1.0.8 and TensorFlow 1.15 as the back-end.
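As a small illustration, a check of this kind can be run at the top of a Colab notebook to confirm the runtime before training; the printed values depend on the assigned runtime and are not necessarily those listed above.

```python
# Sketch: confirm the Colab runtime offers a GPU and report library versions.
import sys
import tensorflow as tf

print("Python version :", sys.version.split()[0])
print("TensorFlow     :", tf.__version__)
print("GPU device     :", tf.test.gpu_device_name() or "none detected")
```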

### *2.5. Performance Evaluation*

Accuracy is the metric used to evaluate the classification performance of the architectures proposed in this paper. This metric calculates the percentage of samples that are correctly classified and is given by Equation (1):

$$\text{Accuracy} = \frac{\text{tp} + \text{tn}}{\text{tp} + \text{tn} + \text{fp} + \text{fn}} \tag{1}$$

where tp represents true positives, those that belonged to the class and were correctly classified in that class; tn represents true negatives, those that did not belong to the class and were correctly classified in another class; fp represents false positives, those that did not belong to the class and were wrongly assigned to the class; and finally, fn represents false negatives, those that belonged to the class and were mistakenly classified in another class.
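As a minimal sketch, Equation (1) can be computed directly from the confusion-matrix counts; the scikit-learn helper returns the same value. The label arrays below are illustrative only.

```python
# Sketch: accuracy as in Equation (1), computed from confusion-matrix counts.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # illustrative predicted labels

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)  # Equation (1)

assert abs(accuracy - accuracy_score(y_true, y_pred)) < 1e-12
print(f"Accuracy = {accuracy:.4f}")
```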

### **3. Results**

Using Adam as the optimizer, Table 2 shows that, for the evaluation with 25 epochs, the highest accuracies were for VGG-16 (96.63% and 95.27%), with a learning rate of 0.001, and for VGG-19 (93.92% and 97.30%), with a learning rate of 0.01. The lowest performance was for AlexNet (64.19%) and ResNet-152 (64.17%), with a learning rate of 0.001, and for the CNN from scratch (46.62% and 53.38%), with a learning rate of 0.01. For 400 epochs, the highest accuracies were for Inception V3 (98.65%) and VGG-19 (98.75%), both with a learning rate of 0.001, and for Inception V3 (98.65%) and VGG-19 (99.32%), with a learning rate of 0.01. Likewise, the lowest performance was for ResNet-101 and ResNet-152 (both with 80.41%) and ResNet-101 (79.05%), with a learning rate of 0.001, and, finally, for AlexNet (67.57%) and the CNN from scratch (43.24%), both with a learning rate of 0.01. It can also be observed that the two best results were for VGG-19 (99.32% and 98.65%) with a batch size of 128, followed by Inception V3 (98.65%) with a batch size of 64 at both learning rates; all of these with 400 epochs.

**Table 2.** Accuracy evaluation of eight CNNs' architectures, changing the values of the hyperparameters batch size, learning rate, and epochs, using Adam as the optimizer.

Regarding the processing time in Table 2, the CNN from scratch had the lowest values overall, although some of its times were higher than those reported for the ResNet-50, ResNet-101, ResNet-152, and AlexNet architectures. The highest processing times at 25 epochs were for ResNet-152 (25 min) and Inception V3 (13 min), with a learning rate of 0.001, and for ResNet-152 and AlexNet (16 min) and ResNet-152 (15 min), with a learning rate of 0.01. For 400 epochs, the highest processing times were for Inception V3 (131 min) and ResNet-152 (54 min), both with a learning rate of 0.001, and for ResNet-152 (65 and 60 min), with a learning rate of 0.01. ResNet-152 was the architecture that required the most processing time for most hyperparameter combinations. The highest processing times were not associated with either high or low accuracy.

Table 3 reveals that, using Stochastic Gradient Descent (SGD) as the optimizer, for the evaluation with 25 epochs, the highest accuracies were for VGG-19 (87.16%) and VGG-16 (87.16%), with a learning rate of 0.001, and for Inception V3 (92.56% and 91.89%), with a learning rate of 0.01, while the lowest performance was for AlexNet (52.70%) and the CNN from scratch (51.35%), with a learning rate of 0.001, and for ResNet-50 and ResNet-152 (both with 45.94%) and ResNet-50 (45.94%), with a learning rate of 0.01. For 400 epochs, the highest accuracies were obtained by Inception V3 (95.94%) and the CNN from scratch (94.59%), both with a learning rate of 0.001, and by VGG-19 (94.59%) and Inception V3 (95.27%), with a learning rate of 0.01. Likewise, the lowest performance was obtained by AlexNet (56.08% and 60.81%), with a learning rate of 0.001, and, finally, by ResNet-50 (50% and 52.03%), with a learning rate of 0.01. It can also be observed that the two best results were for the CNN from scratch (94.59%) and Inception V3 (95.27%) with a batch size of 128, followed by Inception V3 (95.94%) and VGG-19 (94.59%) with a batch size of 64.


**Table 3.** Accuracy evaluation of eight CNNs' architectures, changing the values of the hyperparameters batch size, learning rate, and epochs, using Stochastic Gradient Descent (SGD) as the optimizer.

Table 3 shows that, for the processing time, there was no clear pattern identifying the architecture with the lowest processing time across all hyperparameter combinations. Low values mostly appeared for the CNN from scratch; however, the lowest value, 8 min, was for the ResNet-101 model with 25 epochs, a batch size of 64, and a learning rate of 0.01. Likewise, the accuracy of the CNN from scratch was better than that reported by the ResNet-50, ResNet-101, ResNet-152, and AlexNet architectures. The highest processing times at 25 epochs were for VGG-16 (14 and 23 min), with a learning rate of 0.001, and for ResNet-152 (14 min) and ResNet-101 (69 min), with a learning rate of 0.01. For 400 epochs, the highest processing times were for ResNet-152 (58 and 115 min) with a learning rate of 0.001 and (58 and 54 min) with a learning rate of 0.01. Finally, the ResNet-152 architecture required the most processing time for most hyperparameter combinations. The highest processing times were not associated with either high or low accuracy.

### **4. Discussion**

Convolutional Neural Networks (CNNs) are used in several agriculture areas, such as leaf and plant disease detection, land cover classification, crop type classification, plant recognition, segmentation of root and soil, crop yield estimation, fruit counting, obstacle detection in row crops and grass mowing, and identification of weeds, to mention a few [32,33]. For example, Mohanty et al. [34] presented the training of the CNN architectures AlexNet and GoogLeNet with the PlantVillage image data set to detect 26 types of diseases in 14 kinds of crops. Their results showed an accuracy of 99.35% in identifying healthy and diseased plants. Meanwhile, Rahnemoonfar and Sheppard [35] proposed using the Inception and Residual Network (ResNet) CNN architectures to estimate the yield of a tomato plant using synthetic images. Their results indicated that the yield could be estimated with 91% accuracy.

Another example was presented in [36], where the authors proposed training several convolutional networks to identify four fruits (mango, orange, apple, and banana) classified into two categories: fresh and rotten. The best-performing models were the Inception version 3 and the 16-layer Visual Geometry Group (VGG-16) architectures, which were trained with transfer learning. Their results showed identification and classification accuracies of 90%. A similar study was presented in [13], where the use of a VGG-16 network to classify vegetables and fruits was proposed. A total of 26 categories were classified: pumpkin, celery, cauliflower, pineapple, pomegranate, grapefruit, banana, cucumber, broccoli, onion, carrot, etc. The authors claimed 95.6% accuracy in classifying these fruits and vegetables. Regarding dates, we identified research works that proposed using CNNs to sort dates or to detect their different maturity stages [8,9,16].

Currently, determining the maturity stage of the Medjool date using traditional image processing and machine learning methods is complicated because these methods are trained to extract features from various cultivars, such as appearance, color (associated with the maturity stages), shape, and texture [7,16]. However, to our knowledge, there are no studies proposing a feature extraction or predictive model for sorting Medjool dates. Furthermore, existing models are not suited to sorting Medjool dates because this cultivar is harvested, sorted, packaged, and consumed in its Tamar stage.

To contribute a model that may be useful for sorting Medjool dates from images, we compared the performance of eight CNN architectures in this study. Additionally, the values of some hyperparameters were modified, and transfer learning was used, in order to identify and propose the CNN with the best accuracy.

As shown in Table 2, our findings indicate that, with the Adam optimizer, the VGG architectures showed the best accuracy, with the VGG-19 model reaching the highest accuracy of 99.32%. Likewise, the ResNet and CNN-from-scratch architectures showed the lowest performance, with the CNN trained from scratch achieving the lowest accuracy, at 43.24%. The highest average accuracy across the eight architectures was 89.53%, obtained with the combination of batch size 64, learning rate 0.01, and 400 epochs, with an average time of 48.71 min, while the lowest was 75%, obtained with batch size 64, learning rate 0.001, and 25 epochs, with an average time of 12.25 min.

Likewise, Table 3 indicates that no single architecture stood out with the best accuracy when SGD was used as the optimizer. However, the ResNet-50 architecture showed the lowest performance, with batch sizes 64 and 128 and a learning rate of 0.001. The highest average accuracy across the eight architectures was 80.57%, obtained with the combination of batch size 64, learning rate 0.001, and 400 epochs, with an average time of 38.13 min, while the lowest was 66.21%, obtained with batch size 128, learning rate 0.01, and 25 epochs, with an average time of 21.63 min.

It was noticeable that, as the number of epochs increased for all models, both the accuracy and the required processing time also increased. Likewise, we observed that the highest processing times corresponded to the ResNet-152 architecture, which could be associated with it having the largest number of layers. However, its accuracy never exceeded 85%.

The optimizer helps minimize the error function that measures how well the model fits the training examples. In this study, the accuracy was higher with Adam than with SGD.

Several studies have focused on identifying the CNN that offers the best accuracy for sorting dates of different cultivars in their various maturity stages [8,9,18]. However, there are currently no reported studies that use a CNN to classify the Medjool date cultivar.

Table 4 compares studies similar to ours, in which the best-performing CNN architectures have been reported. Nasiri et al., 2019 [9], worked only with VGG-16 and two hyperparameters, obtaining a highest accuracy of 96.98%. Likewise, Altaheri et al., 2019 [8], worked with two CNNs using transfer learning and fine-tuning, modifying three hyperparameters with two values each, and obtained the highest percentage for VGG-16, with an accuracy of 97.25%. Faisal et al., 2020 [16], compared the performance of four CNNs, evaluating four hyperparameters, with ResNet resulting as the best model, with an accuracy of 99.01%. Finally, our study evaluated the performance of eight CNNs, using transfer learning and modifying four hyperparameters with two values each, with the VGG-19 model showing the highest performance, with 99.32% accuracy.

**Table 4.** Comparison of studies that report CNN architectures in the detection of various stages of maturity in the date palm fruit.


One aspect to consider in this comparison is that the Medjool date is consumed only in its Tamar stage; therefore, this study used only two stages for its sorting. The number of images was also lower than in the other studies. Nevertheless, in our work, the accuracy was higher due to the application of transfer learning and the modification of various hyperparameters, which influence the architectures' performance [37,38].

In our study, the VGG-19 architecture showed the best performance when the hyperparameters were set to 400 epochs, a batch size of 128, the Adam optimizer, and a learning rate of 0.01. This architecture could be included as part of the software controlling a robotic mechanism to support date palm farmers in an automated system for sorting ripe fruit.

### **5. Conclusions**

This study evaluated the accuracy and processing time of eight CNN architectures. Seven of them were pretrained on an extensive image database designed for object recognition (ImageNet); these models were VGG-16, VGG-19, Inception V3, ResNet-50, ResNet-101, ResNet-152, and AlexNet, and they received transfer learning by having their last classification layer replaced. Additionally, a model trained from scratch was used, that is, without any transferred learning.

All CNN architectures were evaluated by modifying the epochs, batch size, optimizer, and learning rate hyperparameters, since these parameters have been reported to have positive effects on the performance of convolutional networks. The results indicated that the CNNs with the best performance for sorting Medjool dates were the architectures of the VGG group using the Adam optimizer; of these, the VGG-19 model reported the best accuracy, with 99.32%. Likewise, the ResNet group architectures reported the lowest performance with the same optimizer, with the ResNet-152 model reporting the lowest accuracy, at 64.17%. The use of the SGD optimizer did not have a significant effect on obtaining high accuracies.

Finally, it will be necessary to continue working toward the best accuracy and the shortest processing time by modifying other hyperparameters. It will also be important to include in the evaluation other fruit attributes, such as size, which gives the fruit a high commercial value and is essential in its packing process.

**Author Contributions:** Conceptualization, B.D.P.-P. and J.P.G.V.; methodology, J.P.G.V. and R.S.-T.; software, B.D.P.-P.; validation, B.D.P.-P., J.P.G.V., and R.S.-T.; formal analysis, R.S.-T. and J.P.G.V.; investigation, J.P.G.V. and R.S.-T.; resources, B.D.P.-P.; data curation, B.D.P.-P.; writing, original draft preparation, J.P.G.V. and R.S.-T.; writing, review and editing, R.S.-T.; visualization, B.D.P.-P. All authors contributed to its editing and approved the final draft. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Dataset is available on https://data.mendeley.com/datasets/872xk9npmz/1.

**Acknowledgments:** We would like to thank CONACyT for the scholarship granted to the first author (CVU-409617). We would also like to thank Ramiro Quiroz of the company Palmeras RQ for allowing us access to his plantation to take the photographs, and Emmanuel Santiago Durazo for his contributions to this work.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


