1. Introduction
Annotation involves considerable repetition when performed entirely manually, and numerous artificial-intelligence tasks require massive datasets. Although a person could annotate everything, doing so is often undesirable. Automatic image annotation (AIA) is a technique that annotates images automatically with semantic tags. Important applications of AIA include image retrieval [1], classification [2], recognition [3], and medical diagnostics [4,5]. AIA adapts to complex patterns as more training data become available and follows a common strategy to annotate a new image: first, similar images are retrieved from the training set; then, candidate labels are ranked by their frequency in the retrieval set. The most frequent labels in the neighborhood are transferred to the test image to achieve automatic image annotation [6].
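The neighbor-based label-transfer strategy described above can be sketched in a few lines (a minimal illustration of the general idea, not the implementation used in this study):

```python
from collections import Counter

def transfer_labels(neighbor_labels, top_k=3):
    """Rank labels by frequency across the retrieved neighbors and
    return the top_k most frequent as the annotation for a new image."""
    counts = Counter(label for labels in neighbor_labels for label in labels)
    return [label for label, _ in counts.most_common(top_k)]

# Labels of the five training images most similar to the test image
neighbors = [
    ["palm", "ripe"],
    ["palm", "ripe", "tree"],
    ["palm", "unripe"],
    ["palm", "ripe"],
    ["tree", "ripe"],
]
print(transfer_labels(neighbors))  # → ['palm', 'ripe', 'tree']
```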
Computer vision in agricultural automation is challenging because of the considerable variation within a class of fruit species as well as similarities in color, size, and shape. As a result, manually annotating fruit takes time and effort. The detection and accurate classification of fruit is an interesting problem associated with enhancing the quality and economic potential of fruit, especially in industrial settings. The challenge becomes more significant when automating tasks such as matching fruit quality with other information such as nutritional facts and pricing [7]. The classification of oil palm fresh fruit bunch (FFB) ripeness is important for ensuring oil quality: the ripeness of oil palm fruit dictates the quality of the palm oil produced and its overall marketability. The color of an oil palm FFB may be used to estimate its ripeness, and color is one of the most important characteristics for determining fruit ripeness [8]. The color of an object is determined by the light reflected from it, so these changes serve as a foundation for image processing and analysis. The primary components of color coding are red, green, and blue (RGB). The Malaysian Palm Oil Board [9] classifies FFB as unripe, underripe, ripe, or overripe based on color, as shown in Table 1.
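A rule-based color check conveys the idea behind such a grading scheme; the thresholds below are purely hypothetical illustrations, not the MPOB grading values from Table 1:

```python
def ripeness_from_rgb(r, g, b):
    """Illustrative ripeness estimate from the mean bunch color (0-255 RGB).
    The thresholds are hypothetical, chosen only to demonstrate the logic."""
    if r > 150 and g < 80:        # dominant red/orange surface
        return "overripe" if b > 60 else "ripe"
    if r > 100 and g > 80:        # reddish with remaining dark patches
        return "underripe"
    return "unripe"               # mostly dark purple/black bunch

print(ripeness_from_rgb(180, 60, 40))   # → ripe
print(ripeness_from_rgb(60, 50, 45))    # → unripe
```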
Manually grading oil palm fruit is the typical technique for identifying its quality, but it is time-consuming and prone to human error [10,11]. It is crucial to identify the ripeness of oil palm fruit, and computer technology incorporating artificial intelligence has provided a variety of solutions to alleviate this dependence on manual work. Many researchers have recently applied artificial intelligence techniques to object detection and classification challenges, with beneficial outcomes [12]. A reliable, fast, and accurate approach for detecting oil palm FFB ripeness is required; AIA using a deep learning approach therefore benefits both academic and commercial applications. Automatically annotating oil palm fruit for classification can assist farmers in increasing production and make their work easier. Oil palm is often used to make margarine, candles, soaps, home cooking oil, and snacks, and it is Malaysia’s main agricultural commodity export [13].
Despite the prevalent deep learning-based strategies for improving AIA framework implementation, AIA remains vulnerable to several critical issues. Among these difficulties is the need for a large amount of data to make accurate predictions; handling inconsistent keyword distribution and selecting relevant features are the other two primary AIA problems [14]. With the development of artificial intelligence, deep learning is widely used in image annotation. Deep learning, which encompasses artificial neural networks and computational models, is a subset of machine learning. Its methods are designed to replicate the topology of biological neural networks and mimic the function of the brain [14]. When the brain acquires new information, it seeks to make sense of it by comparing it to previously acquired knowledge; deep learning decodes information using the same approach the brain employs to categorize and identify items. Deep learning accelerates and simplifies this process, which is particularly beneficial to data scientists tasked with obtaining, analyzing, and interpreting huge volumes of data [15,16]. YOLO treats detection as a regression problem that combines target classification and localization: a YOLO network recognizes targets in an image through regression, without the need for a region proposal network (RPN), mirroring the human visual system’s ability to recognize objects instantly [17]. Moreover, YOLO is extremely efficient and works impressively well for real-time object detection [18]. Nowadays, there are several YOLO variants with various architectures; the original YOLO has 24 convolutional layers followed by two fully connected layers.
Annotation strategies that are fast and simple to use are recommended for effectively overcoming such obstacles. AIA approaches aim to develop a model from the training data and then use the trained model to automatically assign semantic labels to new images. Given the recent attention to and development of AIA for significant tasks, this study enhances automatic image annotation through a repetitive annotation task technique. This enhancement helps address the problem of massive image data and thus reduces the time and human effort needed to annotate images manually. A repetitive training task to annotate images, implemented with deep learning techniques, increases the accuracy and efficiency of the AIA technique. The proposed repetitive annotation technique can be applied with various deep learning methods to automatically annotate objects. To evaluate its effectiveness, this study chooses YOLOv5 as the algorithm platform, as YOLOv5 generates accurate predictions with fast performance. Annotating oil palm FFB using the repetitive annotation task assists farmers in identifying the ripeness of oil palm FFB from harvesting through milling.
2. Related Works
In recent decades, computer vision researchers have successfully endeavored to build systems capable of imitating this human skill. AIA is a step ahead in this approach, detecting each item in an image and assigning appropriate tags to explain its content. AIA has enabled breakthroughs in the agricultural industry through numerous advanced equipment systems and procedures, making the field more productive and profitable. Various works in the literature address AIA techniques in agriculture. Nemade and Sonavane [19] examined fruit annotation by deploying co-occurrence patterns, which aid in identifying fruit quality categories and the combinations of attributes that contribute to those patterns. Their findings indicate that, for the fruit categories, the co-occurrence pattern with SVM yields an overall accuracy of 97.3%. Instead of the traditional two-step procedure of acquisition followed by human annotation, Samiei et al. [20] evaluated several egocentric vision approaches for performing joint acquisition and AIA; applied to automatic apple segmentation, their machine learning approach achieved high annotation performance. A review of image annotation techniques in agriculture was presented by Mamat et al. [21], summarizing deep learning techniques, image annotation approaches, and the various applications of deep learning in the agricultural industry.
A lack of access to efficient categorization systems can be a problem for farmers. The texture, shape, and color of a fruit are used to grade its ripeness, which may lead to variation and inefficiency in grading. Many methods have been introduced to address this obstacle by applying deep learning to categorize the ripeness of oil palm fruit. Jamil et al. [22] established the first artificial intelligence system for oil palm fruit ripeness classification in 2009. Their system used a Neuro-Fuzzy model trained on color data collected from 90 images and correctly classified 45 test images with 73.3% accuracy [23]. Applying deep learning in agriculture, Khamis et al. [24] proposed YOLOv3, Elwirehardja and Prayoga [10] deployed MobileNetv1, Liu et al. [25] deployed YOLOv4-tiny, Janowski et al. [26] implemented YOLOv5 for detecting apples, and Herman [27] used DenseNet to classify the ripeness of oil palm fruit. AIA techniques are useful for improving the fruit harvesting process: a harvesting robot [28] using computer vision was employed to pluck fruit from trees according to farmers’ requirements. Such AI-enabled machines are developed using training datasets generated by image annotation. Tang et al. [29] reviewed applications of fruit-picking robots using machine vision and related developing technologies, which hold enormous promise for sophisticated agricultural applications.
YOLO version 4, commonly known as YOLOv4, was released in early 2020 by Alexey Bochkovskiy [30], continuing development of the Darknet architecture in which Joseph Redmon [31] produced the first three versions of YOLO. Glenn Jocher [32] and his Ultralytics LLC research division, who developed YOLO algorithms using the PyTorch framework, released YOLOv5 a month after YOLOv4. YOLOv5 is simple and efficient: it requires far fewer computing resources than other designs while producing equivalent results and performing significantly faster than previous YOLO versions [33]. This makes YOLOv5 widely used in agricultural applications [34,35]. Wang et al. [36] detected apple stems in real time by deploying YOLOv5. Their study first tuned the hyper-parameters and used transfer learning as a training approach to achieve stronger detection performance; next, networks of different depths and widths were trained to establish a detection baseline; finally, YOLOv5 was optimized for the task using detection head searching and layer and channel pruning. The results showed that YOLOv5 was easier to use under the same settings and could be chosen as the baseline network based on its detection performance. Other applications of YOLOv5 in agriculture have been proposed: crop detection by Yan et al. [37], classification by Wang et al. [38], disease recognition by Chen et al. [39], and counting by Lyu et al. [40].
Inspired by previous research, this study chooses YOLOv5 to investigate the proposed repetitive annotation task technique, given its excellent object detection performance. YOLOv5 is compared with two other YOLO variants, YOLOv3 and YOLOv4, to evaluate its performance.
4. Results and Discussion
Three versions of YOLO were trained to evaluate training performance on the oil palm fruit dataset. Table 4 shows the results obtained for all models after the first training, including precision, recall, mAP, and training time. The accuracy generated by YOLOv5 is higher than that of the other versions, and its training time is faster, at 0.609 h compared with 0.896 h for YOLOv3 and 0.876 h for YOLOv4. Figure 6a–c show the detection results of these three YOLO versions for classifying the ripeness of oil palm FFB. For all versions, the learning rate was set to 0.01, the training batch size to 32, and the IoU threshold to 0.2. For optimized, rapid performance, the number of training epochs was set to 100. The model was trained continuously and performed effectively. The final weights were stored after training, and a test set of 1000 images was used to assess the model’s performance. Next, the test images were deployed as a new dataset and combined with the first training dataset. This method was utilized to increase the annotation accuracy for further test images.
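The train-annotate-merge cycle described above can be sketched as follows. Here `train_fn` and `predict_fn` are hypothetical stand-ins for the detector’s training and inference routines (YOLOv5 in this study), not actual API calls:

```python
def repetitive_annotation(train_set, test_batches, train_fn, predict_fn):
    """Sketch of the repetitive annotation loop: train a detector,
    auto-annotate a new batch of images, then fold the annotated batch
    back into the training set and retrain on the enlarged dataset."""
    model = train_fn(train_set)
    for batch in test_batches:
        annotated = [(img, predict_fn(model, img)) for img in batch]
        train_set = train_set + annotated      # grow the dataset
        model = train_fn(train_set)            # retrain on enlarged set
    return model, train_set

# Toy run with stand-in training/inference functions: the "model" is
# simply the dataset size, and every prediction is "ripe"
model, data = repetitive_annotation(
    train_set=[("img0", "ripe")],
    test_batches=[["img1", "img2"], ["img3"]],
    train_fn=lambda ds: len(ds),
    predict_fn=lambda m, img: "ripe",
)
print(len(data))  # → 4
```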
The YOLOv5 model, trained on the custom dataset, was then fine-tuned. The first test dataset, consisting of 150 images, was classified for ripeness using the previously trained oil palm fruit detection models. Precision, recall, and mAP@50 were used in the comparison. Furthermore, annotation speed was measured in frames per second (FPS) for each model to investigate the feasibility of using previously trained models in real-time applications. As the test images were unfamiliar to the trained models, the metrics produced on this test dataset differ from the previously calculated metrics.
The repetitive annotation method at the second annotation achieved 98.7% for oil palm FFB and was then tested on another 1000 new images; the results are shown in Figure 7a–h. The ripeness classes of oil palm fruit were successfully annotated automatically with bounding boxes and accuracy values. The model was also trained for 20, 40, 60, 80, and 100 epochs to examine its accuracy and efficiency. The results for each epoch setting and each performance metric are shown in Figure 8a–f; the TensorBoard tool was used to visualize all of the network’s statistical data. According to the figures, mAP@50 at 100 epochs achieves a training accuracy of nearly 100%. Moreover, the validation loss decreases concurrently with the training loss. Given the higher accuracy and lower losses at 100 epochs, this study fixed the number of training epochs at 100 to achieve high efficiency in object detection and annotation tasks.
Table 5 shows the precision, recall, mAP, and time comparison for the training, second annotation, and third annotation processes using repetitive annotation tasks. Each image’s annotation time was calculated for all of the annotation methods. There were statistically significant differences between the training, second annotation, and third annotation processes. The average detection speeds for ripeness classification in the training, second annotation, and third annotation were 0.55 ms, 0.43 ms, and 0.3 ms, respectively. The training time for the annotation process increased to generate a better result; however, the test-speed (FPS) outcome was faster. A faster test speed is significant for applications involving real-time image capture and harvesting robots.
The repetitive annotation technique was then evaluated with a larger dataset covering a variety of fruits: rambutan, dragon fruit, pineapple, and mangosteen. The number of epochs was set to 30 for the training task. The annotation results with bounding boxes obtained after the second annotation process are shown in Figure 9. The performance curves for mAP, precision, recall, bounding box regression loss, and classification loss, depicted by red lines, are shown in Figure 10a–f. The precision, recall, mAP, and time comparison for the various-fruit dataset are shown in Table 6. The accuracy recorded for the second training on the variety of fruit was 99.5%. This accuracy was better than that for oil palm fruit because of the larger volume of data for the variety of fruit, producing better predictive performance; moreover, a larger dataset increases the probability that the data include relevant information. The precision and recall values are unstable. However, most detection cases are evaluated based on mAP, since its value is produced by calculating the average precision for each class and then averaging across classes. Moreover, mAP takes into consideration both false positives (FP) and false negatives (FN) and reflects the trade-off between precision and recall; based on these features, mAP is a good measure for most detection applications. There is no accuracy improvement between the first and second annotations, which may occur because the model eventually reaches a point where enlarging the dataset no longer improves accuracy; at that point, the learning rate or epoch values can be tuned instead. Even though accuracy does not improve, the time required to generate annotations for new test images decreases. This benefit may lower the time required to classify further huge numbers of images. Since the accuracy achieved is almost 100%, this result demonstrates satisfactory performance of the repetitive annotation task method. The average detection speeds for fruit classification in the training process, first annotation, and second annotation were 0.44 ms, 0.32 ms, and 0.25 ms, respectively.
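The mAP computation just described, the mean of per-class average precisions, each the area under a precision-recall curve, can be sketched minimally as follows (using simple rectangular integration rather than any particular interpolation scheme):

```python
def average_precision(preds, num_gt):
    """AP for one class. preds is a list of (confidence, is_true_positive)
    detections; num_gt is the number of ground-truth boxes of that class."""
    preds = sorted(preds, key=lambda p: -p[0])   # highest confidence first
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, is_tp in preds:
        tp, fp = tp + is_tp, fp + (not is_tp)
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)  # area under the P-R curve
        prev_recall = recall
    return ap

def mean_average_precision(per_class):
    """mAP: the mean of the per-class APs."""
    aps = [average_precision(preds, n) for preds, n in per_class]
    return sum(aps) / len(aps)

# One class, two ground-truth boxes, three detections (one false positive)
print(round(average_precision([(0.9, True), (0.8, False), (0.7, True)], 2), 3))
# → 0.833
```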
Based on these findings, the proposed repetitive annotation task approach to automatic image annotation effectively annotates new images with high accuracy. With accurately annotated data, computer vision systems can identify and classify a variety of objects in huge numbers of images. Moreover, the proposed method based on the YOLOv5 architecture performed well with the provided dataset. The classification of oil palm fruit maturity or ripeness determines the quality of the palm oil produced and its overall marketability; using the proposed method, FFB classification could be employed to address an obstacle in fruit processing for oil production.
5. Conclusions
In the agricultural sector, robotics, drones, and AI-enabled machines are employed to accomplish a variety of jobs, and all of this equipment is based on computer vision technology. When image annotation is performed for the agricultural industry, numerous crops and plants are annotated according to model requirements, such as their ripeness and diseases. Therefore, this study proposed an automatic image annotation advancement approach that employs repetitive annotation tasks to annotate objects automatically. The study’s dataset includes oil palm FFB and a variety of fruits, with a large amount of data. The YOLOv5 model, a deep learning approach, was chosen for automatically annotating images using the repetitive annotation task technique. The developed method was tested on a large dataset to determine its annotation performance and accuracy. The findings reveal that the trained network can correctly classify objects in an image. Furthermore, to demonstrate the superiority of the suggested technique, two alternative YOLO versions, YOLOv3 and YOLOv4, were trained and evaluated on the same dataset, and their results were compared with those of the proposed approach. The comparative results demonstrated the proposed method’s efficacy and superiority for the task of fruit categorization. In addition, the repetitive annotation task method increases the efficiency of automatically annotating objects in images. The accuracy for the final training dataset reaches 98.7% for oil palm fruit and 99.5% for the variety of fruits. The method is therefore demonstrably fast in annotating new images and successfully achieves high accuracy. Additionally, this automated method can greatly reduce the time required to classify fruit while also addressing the difficulty caused by massive numbers of unlabeled images.
Beyond YOLO, the proposed repetitive annotation task technique can be deployed with any deep learning technique as the field continues to evolve.