1. Introduction
Nowadays, the food industry uses vision systems for fruit and vegetable classification. These systems commonly use conveyors to transport fruits through the sorting system [1]. The classifier system uses an interface to communicate with the actuators that perform the separation task. Generally, these systems determine the size, color, ripening, and quality of fruits [2,3,4,5,6,7,8,9,10,11,12].
Image processing techniques are used in agriculture to detect diseased leaves, stems, and fruits; to quantify the area affected by disease and determine its shape and color; to estimate or evaluate productivity; to count the fruits entering a sorting machine; and to determine the size and shape of fruits, among other tasks. In this article, we propose a vision system for sorting citrus fruits by color and size, implemented on an FPGA device. The proposed FPGA implementation of the classifier and vision tasks requires minimal resources while delivering high performance. We coupled this implementation to an existing mechanical automatic citrus sorting machine.
We explored the hardware implementation of decision tree (DT) classifiers, previously described in [13,14,15]. DTs require a minimum number of combinational and sequential components for their implementation. In parallel, we implemented an algorithm that processes a text-mode representation of a DT obtained from a dataset to generate the corresponding modules in a Hardware Description Language (HDL) for later inclusion in a system that processes video in real time. We extracted the text-mode representation of the DTs using the Weka tool, and we trained each DT using features obtained from images taken from citrus fruit videos captured in the cabin of the citrus separation system. The system allows remote monitoring of the citrus fruit sorting machine’s performance.
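The text-to-HDL step described above can be illustrated with a minimal sketch. It is not the authors' generator: it assumes a simplified indented tree listing in the style printed by Weka's tree learners, with integer thresholds and only `<=` / `>` splits, and it emits a purely combinational Verilog module. The feature names, thresholds, class names, port width, and module name are all hypothetical.

```python
import re

# Assumed line shape: optional "|   " depth bars, then "feature <= N" or
# "feature > N", optionally followed by ": class (count)" for a leaf branch.
LINE = re.compile(r"^(?:\|\s+)*(\w+) (<=|>) (\d+)(?:: (\w+).*)?$")

# Hypothetical tree; the feature names, thresholds, and classes are made up.
EXAMPLE = """\
hue <= 40
|   diameter <= 60: small (10.0)
|   diameter > 60: large (5.0)
hue > 40: unripe (20.0)
"""


def parse_tree(text):
    """Return ('leaf', name) or ('node', feature, threshold, low, high)."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    pos = 0

    def parse_node():
        nonlocal pos
        feature, _, threshold, _ = LINE.match(lines[pos]).groups()
        branches = []
        for _ in range(2):  # the "<=" branch first, then the ">" branch
            label = LINE.match(lines[pos]).group(4)
            pos += 1
            branches.append(("leaf", label) if label else parse_node())
        return ("node", feature, int(threshold), branches[0], branches[1])

    return parse_node()


def leaves(node):
    """List the class names found at the leaves, left to right."""
    return [node[1]] if node[0] == "leaf" else leaves(node[3]) + leaves(node[4])


def features(node):
    """List the feature names tested by the internal nodes."""
    if node[0] == "leaf":
        return []
    return [node[1]] + features(node[3]) + features(node[4])


def to_verilog(tree, name="dt_classifier", width=8):
    """Emit a combinational module; classes are encoded by sorted index."""
    classes = sorted(set(leaves(tree)))
    bits = max(1, (len(classes) - 1).bit_length())

    def body(node, pad):
        if node[0] == "leaf":
            return [f"{pad}label = {bits}'d{classes.index(node[1])};"]
        _, feature, threshold, low, high = node
        return ([f"{pad}if ({feature} <= {width}'d{threshold}) begin"]
                + body(low, pad + "    ")
                + [f"{pad}end else begin"]
                + body(high, pad + "    ")
                + [f"{pad}end"])

    ports = [f"    input  wire [{width - 1}:0] {f}," for f in sorted(set(features(tree)))]
    ports.append(f"    output reg  [{bits - 1}:0] label")
    return "\n".join([f"module {name} ("] + ports + [");", "    always @* begin"]
                     + body(tree, "        ") + ["    end", "endmodule"])


if __name__ == "__main__":
    print(to_verilog(parse_tree(EXAMPLE)))
```

Each internal node becomes one comparator and each level one multiplexing step, which is consistent with the observation that DTs need few combinational components. A real generator would also have to handle fractional thresholds and check that every threshold fits the chosen port width.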
Among the implementations found in the state of the art, no system processes multiple lines with a low-power device. The most appropriate approach to achieve this goal is to pair a trainable classifier, such as a DT, with a computing device that has enough computational resources to achieve high-performance citrus fruit classification. The classifier must also represent its knowledge with a relatively simple structure, which rules out classical and modern neural networks.
The knowledge gap that this paper addresses is the dedicated digital hardware implementation of simple machine-learning techniques in complex problems, specifically in the issue of classifying citrus fruits by size and color, since deep-learning algorithms require vast amounts of images and classical algorithms are in disuse. We specifically chose a classifier based on DTs because their construction using digital components requires simple elements, and they represent a form of knowledge that is understandable by humans, facilitating debugging and extension.
We designed the citrus sorting system with low- and medium-sized industries in mind, where monetary resources are more constrained than in large industry. This work reports an implementation that adequately balances cost and performance. It is possible to achieve good accuracy using industrial PCs with GPUs, but power requirements and prices would increase. By using microcomputers, it is possible to lower the cost, but the classification accuracy and processing speed would be affected. The FPGA-based solution has an intermediate cost between a workstation with a GPU and a microcomputer. Existing fruit separation systems process a single grading line, and generally one computer processes each grading line. This work proposes a flexible solution that works simultaneously on multiple processing lines without compromising performance. To achieve this goal, we explored classifier simplification, which was possible by using a DT classifier trained with different datasets captured under the operating conditions of the citrus separator machine. The system guarantees real-time performance because a dedicated hardware block processes each citrus grading line. Roughly speaking, FPGAs consume much less energy than a conventional PC processor. Regarding processing speed, even though FPGAs work at lower frequencies than PC processors, they use minimal hardware with low latencies and high throughputs.
The proposed system can handle multiple processing lines without compromising performance. The processing lines in the current implementation are limited to the two video inputs available in the FPGA kit. The proposed approach can extend the processing capacity by replacing the FPGA with a higher-capacity device and adding more video inputs. In a PC-based solution, the performance is low because of the workload on the processor and the memory space needed for image storage. The designer must add one PC for each processing line to keep the same performance or use a more powerful PC.
This paper is structured as follows. In Section 2, related work is reviewed. In Section 3, the proposed system is described. In Section 4, the results are explained. Finally, in Section 5, conclusions are given and future work is addressed.
2. Related Work
In this section, we review related works found in the literature. In Section 4.6, we compare the system proposed in this work to the most relevant reviewed systems.
Video streaming and image processing are useful tasks in classification systems. However, these tasks are computationally expensive in terms of time and computational resources. Video streaming provides information about the environment or gives useful visual features in visual quality inspection. Image processing techniques use these visual features as input to classification or clustering algorithms. Many IoT applications, such as video surveillance, healthcare, face recognition, human activity understanding, and farming, use video sensors [16,17]. Some approaches use low-cost and low-power machine vision systems [16]. However, most of these embedded video processing platforms exhibit low performance in real-time classification tasks due to their low computational power and bandwidth. In this case, GPU-based or FPGA-based approaches are suitable.
Machine vision-based fruit sorting systems are capable of replacing manual labor for the inspection of fruit size. Seema et al. [18] reviewed fruit grading and classification systems. The authors summarize the most used features to identify the degree of rotting and ripening, the kind of fruits, and the machine-learning (ML) models used by the reviewed algorithms. They found two approaches. The first, multiple fruit identification systems, focuses on differentiating fruits but disregards fruit quality; training these systems requires thousands of images of a series of different fruits. The second, specific fruit classification systems, uses large image sets of a single fruit type to train and test the sorter. Although the first approach is more general, the second one is more suitable for single-type fruit sorting machines.
Concerning multiple fruit recognition approaches, we found several methods in the literature. Blasco et al. in 2003 [7] proposed a system to estimate the quality of oranges, peaches, and apples using four attributes: size, color, stem location, and detection of external blemishes. The proposed segmentation is based on Bayesian discriminant analysis, performing the correlation of fruit color using the colorimetric index values. The authors tested the classification system with apples, obtaining a blemish detection accuracy of 86% and a size accuracy of 93%. Seng and Mirisaee [19] proposed an image retrieval method that combines classification models obtained from three feature types (color-based, shape-based, and size-based) to increase recognition accuracy. The proposed system uses nearest-neighbor classification to recognize 15 different fruits from their feature values, obtaining an accuracy of 90%. Jana et al. [12] proposed a system that preprocesses images to separate the fruit in the foreground from the background. Their system extracts texture features from the Gray-level Co-occurrence Matrix (GLCM) and statistical color features from the segmented image. The system creates a single feature descriptor from the extracted features and trains a Support Vector Machine (SVM) classification model. The generated model predicts the category for an unlabeled image from the validation set. The proposed method obtains an 83.33% overall accuracy. De Goma et al. [11] proposed a system to recognize fruits using the K-nearest neighbor algorithm based on statistical values of the color moments, GLCM features, the area in pixels for the size, and the shape roundness. They used a dataset of 2633 images in 15 different categories, obtaining an 81.94% accuracy.
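Several of the systems above rely on GLCM texture features. As a rough illustration only (not the exact feature set of any cited work), the sketch below builds a normalized, non-symmetric co-occurrence matrix for one pixel offset and derives two common descriptors, contrast and energy. The tiny two-level image and the function names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of gray levels for the offset (dx, dy).

    `image` is a list of rows of integer gray levels in range(levels).
    Assumes at least one pixel pair exists for the chosen offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """Weights each pair by the squared gray-level difference."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def energy(p):
    """Sum of squared probabilities; high for uniform textures."""
    return sum(v * v for row in p for v in row)


if __name__ == "__main__":
    tiny = [[0, 0, 1],
            [1, 1, 0]]  # hypothetical 2-level image
    p = glcm(tiny, levels=2)
    print(p, contrast(p), energy(p))
```

In practice the matrix is usually computed for several offsets and the resulting descriptors are concatenated with color and shape features before training the classifier.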
Concerning orange fruit classification systems, we found several methods in the literature. Subramaniam and Balasubramanian [
20] used parallel computing techniques on a multi-core processor to grade citrus fruits. They used the Task Parallel Library to add parallelism and concurrency to applications. They extracted geometrical features such as diameter, perimeter, area, and circularity under a laboratory-simulated real-time condition without a suitable conveyor. The system demonstrated the ability to estimate the diameter of the fruit with 98% accuracy. Sirisathitkul et al. [
21] proposed an image processing technique to perform Chokun orange maturity sorting. In the training step, they captured images of 90 Chokun oranges of three different degrees of maturity with a color digital camera under normal illumination conditions. They performed an RGB to HSV color transformation for each image, using the hue colors to generate a set of decision rules. They tested the proposed model using 50 Chokun orange samples, obtaining a 98% accuracy. Chen et al. [
22] proposed an orange sorting method that uses image processing to obtain four main features of the oranges: surface color, size, surface defects, and shape. They trained a backpropagation neural network with these features and report a sorting accuracy of 94.38%. Peter et al. [
23] proposed an automatic system for disease identification in images of infected fruits. The approach was evaluated on three diseases of navel oranges, namely citrus canker, citrus melanose, and citrus black spot, achieving 93% accuracy using global color histogram, local binary pattern, and Haralick texture features. Patel et al. (2019) [
24] reported a system for orange sorting and detecting the bacteria spot defect based on four features: shape, size, color, and texture. They evaluated the SVM classification, obtaining a 67.74% overall accuracy. Behera et al. [
25] proposed a system to grade oranges and identify deformities. They used a multi-class SVM with K-means clustering to classify orange diseases with an accuracy of 90%, and they used fuzzy logic to compute the degree of disease severity. Ifmalinda and Putri [
26] proposed an orange sorting program based on diameter and skin color. They used diameter and RGB index to generate a set of rules to classify oranges, obtaining an overall accuracy of 87%. Wang et al. [
27] proposed an algorithm to predict the sugar content of citrus fruits and performed a classification of the sugar content using light in the visible spectrum. Similar approaches for sorting apples can be found in [
5,
9,
10,
28]; for tomatoes in [
23,
29]; for sorting watermelons in [
30]; for palm oil fruit sorting in [
31]; and dates in [
32].
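The hue-based decision rules mentioned above for maturity sorting can be sketched as follows. This is a minimal illustration assuming the fruit region has already been segmented; the three class names and both hue thresholds are hypothetical and are not taken from any of the cited works.

```python
import colorsys


def mean_hue_degrees(pixels):
    """Hue (0-360 degrees) of the average color of 8-bit (R, G, B) pixels."""
    n = len(pixels)
    r = sum(p[0] for p in pixels) / n
    g = sum(p[1] for p in pixels) / n
    b = sum(p[2] for p in pixels) / n
    hue, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return hue * 360


def maturity(hue_deg, ripe_below=45.0, unripe_above=70.0):
    """Map a hue to one of three maturity classes (thresholds are assumptions)."""
    if hue_deg < ripe_below:
        return "ripe"
    if hue_deg > unripe_above:
        return "unripe"
    return "turning"


if __name__ == "__main__":
    orange_patch = [(255, 165, 0)] * 4  # hypothetical fruit pixels
    print(maturity(mean_hue_degrees(orange_patch)))
```

Averaging the RGB values before the conversion avoids averaging hue angles directly, which would be unreliable for colors near the 0/360 degree wrap-around.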
Few works address high-performance implementations using FPGAs. Martínez-Usó et al. in 2005 [
8] proposed an unsupervised, multi-resolution segmentation algorithm applied to multi-spectral images of fruits for quality assessment. Lyu et al. [
33] proposed a citrus flower recognition model based on YOLOv4-Tiny lightweight neural network using software and hardware co-design patterns. They generated the dynamic link library and integrated it into the FPGA-based embedded platform. The recognition accuracy of the citrus flower recognition model deployed on the embedded platform for flowers and buds was not less than 89.30%, and the frame rate was not lower than 16 FPS.
Zhenman et al. proposed an analytical model to compare FPGAs and GPUs performance. FPGAs can provide comparable performance or even achieve better performance than a GPU while consuming an average of 28% of the power required by a GPU for most Rodinia Kernels. Even when FPGAs use a lower clock frequency than GPUs, the FPGA usually achieves a higher number of operations per cycle in each computing pipeline due to its small pipeline initiation interval and considerable pipeline depth [
34]. Zhang et al. proposed an FPGA acceleration of the generalized sparse matrix–matrix multiplication, an essential computing kernel for many algorithms in artificial intelligence [
35]. They evaluated a Huffman tree scheduler on 20 real-world benchmarks, finding that the energy efficiency and performance are increased by 6× and 4×, respectively. Qasaimeh et al. assessed the energy efficiency of CPU, GPU, and FPGA implementations of computer vision kernels. They benchmarked computer vision algorithms based on the OpenVX standard on GPU and FPGA platforms. For many simple kernels, GPUs obtain a 1.1–3.2× energy/frame reduction. Still, the FPGA outperforms GPUs for complex kernels that require a complete vision pipeline, obtaining a 1.2–22.3× energy/frame reduction [
36]. Guo et al. performed a state-of-the-art review of neural network accelerator designs. They concluded that FPGAs achieve more than 10× better speed and energy efficiency than state-of-the-art GPU [
37]. Sanaullah and Herbordt evaluated the hardware implementation of 3D Fast Fourier Transforms (FFTs) using OpenCL as Hardware Description Language. Their implementation achieves an average speedup of 29× versus the current CPU and 4.1× versus the recent GPU [
38]. Fowers et al. compared the performance and energy of sliding window applications when implemented on FPGAs, GPUs, and multicore devices. They concluded that FPGAs provide a significant performance increase in most cases, with speedups of up to 11× and 57× compared with GPUs and multicores, respectively [
39].
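The argument that a device with a lower clock can still deliver more operations per second follows from simple pipeline arithmetic: a pipeline that starts a new iteration every II cycles (the initiation interval) and performs a fixed number of operations per iteration sustains clock × operations / II operations per second. The sketch below uses this idealized steady-state model; the two configurations are illustrative inputs, not measurements from the cited studies.

```python
def ops_per_second(clock_hz, ops_per_iteration, initiation_interval):
    """Steady-state throughput of an idealized pipeline (ignores fill/drain)."""
    return clock_hz * ops_per_iteration / initiation_interval


if __name__ == "__main__":
    # Hypothetical deep pipeline at a modest clock, one new iteration per cycle.
    deep_slow = ops_per_second(200_000_000, ops_per_iteration=64, initiation_interval=1)
    # Hypothetical shallow pipeline at a high clock, one new iteration every 2 cycles.
    shallow_fast = ops_per_second(1_500_000_000, ops_per_iteration=8, initiation_interval=2)
    print(deep_slow, shallow_fast, deep_slow > shallow_fast)
```

The model omits memory bandwidth and pipeline fill time, so it only shows why initiation interval and pipeline depth can outweigh clock frequency, not how any specific device will perform.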
Recently, there have been efforts to use deep learning as an effective technique for fruit sorting. In [
4], the authors propose a real-time visual inspection system for sorting fruits using a classification model obtained from state-of-the-art deep-learning convolutional networks. They test their system using apples and bananas. During real-time testing, the system obtained an accuracy of 96.7% for apples and 93.8% for bananas. For the training stage, they used a database composed of 8791 apples and 300 bananas of both healthy and defective fruits. Kukreja and Dhiman, in 2020 [
40], proposed a dense CNN algorithm to detect the apparent defects of citrus fruit. They generated a first model without preprocessing and data augmentation on 150 images, achieving an accuracy of 67%. In a second model, they applied data augmentation and preprocessing and used 1200 images, attaining an accuracy of 89.1%. Sa et al. in 2016 [
41] proposed an approach to fruit detection using deep convolutional neural networks, with application to automated harvesting using a robotic platform, performing fruit detection on imagery obtained from two modalities: color (RGB) and near-infrared (NIR). They compute both precision and recall performances, improving from 80.7% to 83.8% for the detection of sweet peppers. They created a model to detect seven fruits; annotating and training the new model took four hours per fruit. Leelavathy et al., in 2021 [
42], proposed a CNN-based orange fruit image classifier using a binary cross-entropy loss function, obtaining an overall accuracy of 78.57%. Hossain et al., in 2019 [
43], proposed a framework based on two different deep learning architectures. The first is a proposed light model of six convolutional neural network layers, while the second is a fine-tuned Visual Geometry Group 16 (VGG-16) pre-trained deep learning model. They used two color-image datasets to evaluate their proposed framework. The first dataset contains clear fruit images, while the second dataset contains fruit images with noise, illumination, and pose variations, which are much harder to classify. Classification accuracies of 99.49% and 99.75% were achieved on dataset 1 for the first and second models, respectively. On dataset 2, the first and second models obtained accuracies of 85.43% and 96.75%, respectively.
Recently, existing solutions have used deep learning approaches to classify defects in fruits. In [
43], the authors propose a system that classifies orange images as fresh or rotten using a CNN with a SoftMax classifier and 800 orange images, achieving an accuracy of 78.57%. In [
2], the authors generated a dataset of eight different classes of date fruits and compared several CNN models, such as AlexNet, VGG16, InceptionV3, ResNet, and MobileNetV2; MobileNetV2 architecture achieved an accuracy of 99%. In [
44], the authors present a deep-learning system for multi-class fruit and vegetable categorization based on an improved YOLOv4 model that first recognizes the object type in an image before classifying it into one of two categories: fresh or rotten. Compared with the previous YOLO series, the proposed method obtained higher average precision than the original YOLOv4 and YOLOv3, with 50.4%, 49.3%, and 41.7%, respectively. In [
45], the authors proposed automatic image annotation to classify the ripeness of oil palm fruit and recognize a variety of fruits, trained with 100 images of oil palm fruit and 400 images of various fruits. Few of the previous systems focus on classifying citrus fruits by color or size; most focus specifically on fruit defects, which is a different problem from the one solved by the work reported in this paper.