In this study, a new AFC-HPODTL model was developed for the automatic identification and classification of fruits. The presented AFC-HPODTL model comprises a series of processes, namely pre-processing, DenseNet169-based feature extraction, Adam optimizer-based parameter tuning, RNN classification, and AOA-based hyperparameter optimization.
Figure 1 illustrates the overall process of the AFC-HPODTL algorithm.
3.2. Feature Extraction
To extract feature vectors from the pre-processed fruit images, the DenseNet169 model is employed. A CNN structure has two bases, namely the convolution base and the classification base. The convolution base contains three important kinds of layers, namely the convolution, activation, and pooling layers [21]. These layers are utilized for discovering the fundamental features of the input images, which are named feature maps (FMs). An FM is obtained by applying convolution operations to the input image, or to prior feature maps, using linear filters and adding a bias term. Afterward, the FM is passed through a nonlinear activation function such as the sigmoid or ReLU. Conversely, the classification base comprises dense layers integrated with activation layers that convert the FMs into a 1D vector, expediting the classification task using several neurons. Generally, one or more dropout layers are used in the classification base to minimize the overfitting that CNN structures encounter and to enhance their generalization. Adding a dropout layer to the classification base introduces a new hyperparameter named the dropout rate, which is usually fixed in the range of 0.1–0.9.
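As an illustration of this classification base, the following sketch stacks a flatten layer, a dropout layer, and a dense softmax layer on top of an arbitrary convolution base. It assumes a TensorFlow/Keras environment; the dropout rate of 0.5 and the helper name build_classification_base are illustrative choices, not values taken from the text.

```python
import tensorflow as tf

def build_classification_base(conv_base, num_classes, dropout_rate=0.5):
    """Attach a classification base (flatten + dropout + dense softmax)
    to a convolution base; dropout_rate is typically chosen in [0.1, 0.9]."""
    return tf.keras.Sequential([
        conv_base,                              # convolution base producing feature maps
        tf.keras.layers.Flatten(),              # feature maps -> 1D vector
        tf.keras.layers.Dropout(dropout_rate),  # reduces overfitting
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```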
DenseNet is one of the more recent additions to the neural networks used for visual object detection, and DenseNet169 is a member of the DenseNet family [22]. The DenseNet family is designed for image classification, and DenseNet169 performs better than the other members of the family. The model can be trained on the ImageNet image database, saved, and then tested by loading the saved model rather than retraining on ImageNet. In DenseNet, the output of every earlier layer is concatenated with the input of the later layers. DenseNet mitigates the accuracy degradation that deeper networks suffer from vanishing gradients, in which the path between the input and output layers becomes so long that the information vanishes before reaching its target. Recent results show that a convolutional network can be more efficient and accurate when the connections between layers close to the input and layers close to the output are shorter. Accordingly, DenseNet connects all the layers in a feed-forward fashion. A classical convolution network with L layers has L connections, that is, one connection between each layer and its following layer, whereas DenseNet has L(L + 1)/2 direct connections. Every layer uses the feature maps of all preceding layers as input, and its own feature maps are used as input to all following layers. Several benefits are obtained from DenseNet: it alleviates the vanishing gradient problem, strengthens feature propagation, encourages feature reuse, and decreases the number of parameters. The presented structure is evaluated on the highly competitive ImageNet image recognition benchmark and also utilizes the save and load functions. The concatenation of layers is feasible only when the feature-map dimensions are exactly the same at the time of concatenation or addition. DenseNet is therefore divided into DenseBlocks; the number of filters differs between blocks, but the feature-map dimensions within a block are the same. Between the DenseBlocks, a transition layer performs batch normalization (BN) and down-sampling, which is a vital stage for the CNN and handles the change in the number of filters and channel dimensions between blocks. The growth rate, denoted by k, plays an important role in generalizing the l-th layer: the number of feature maps that the l-th layer receives from all preceding layers is measured by k0 + k × (l − 1), where k0 is the number of channels in the input layer.
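As a sketch of the feature-extraction step, DenseNet169 pre-trained on ImageNet can be loaded without its classification base so that each image yields a single feature vector. This assumes a TensorFlow/Keras environment; the 224 × 224 input size is the library default rather than a value stated in the text.

```python
import numpy as np
import tensorflow as tf

# DenseNet169 pre-trained on ImageNet, without the top classifier; global
# average pooling turns the final feature maps into one vector per image.
extractor = tf.keras.applications.DenseNet169(
    include_top=False, weights="imagenet", pooling="avg")

def extract_features(images):
    """images: array of shape (N, 224, 224, 3) with pixel values in [0, 255]."""
    x = tf.keras.applications.densenet.preprocess_input(
        np.asarray(images, dtype="float32"))
    return extractor.predict(x)  # (N, 1664) feature vectors for DenseNet169
```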
Here, the Adam optimizer is used to fine-tune the parameter values of the DenseNet169 model. We employ Adam, an optimization approach, as a substitute for the traditional stochastic gradient descent algorithm for updating the network weights on the training dataset [23]. Adam is derived from AdaGrad and is a more adaptable technique; it can be regarded as a combination of AdaGrad and momentum. The update maintains two moment variables, m_t and v_t, where the index t specifies the present training iteration: m_t is an exponentially decaying average of the gradient, v_t is an exponentially decaying average of the squared gradient, and the parameters are adjusted using both. In this update, β1 and β2 denote the forgetting factors for the gradient and for the second moment of the gradient, respectively, and ε is a small scalar utilized for preventing division by zero.
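A minimal NumPy sketch of the standard Adam update is given below; the bias-corrected form and the default constants (learning rate 1e-3, β1 = 0.9, β2 = 0.999, ε = 1e-8) are assumptions taken from the usual formulation rather than from the text.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters `theta` given gradient `grad` at iteration t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad           # first moment: decaying mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment: decaying mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # eps prevents division by zero
    return theta, m, v
```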
3.3. Fruit Classification
In the final stage, the RNN model is utilized for the identification and classification of fruits. The presented technique makes use of the LSTM model, which is a special kind of RNN. In an RNN, the neurons are interconnected with one another through a directed cycle [24]. The RNN model processes data sequentially, since it utilizes an internal memory for processing a series of inputs or words. The RNN performs the same task for every element of the sequence, with the output being dependent on each preceding input and on the remembered data.
Figure 2 depicts the structure of the RNN. Equation (7) characterizes the typical RNN structure used for further processing, h_t = f_W(h_{t−1}, x_t), where h_t indicates the new state at time t, f_W denotes a function with weight variable W, h_{t−1} represents the older (preceding) state, and x_t signifies the input vector at time t.
Equation (8) is an alternative form of Equation (7) in which the weights are assigned explicitly, h_t = tanh(W_hh·h_{t−1} + W_xh·x_t), where the activation function is denoted as tanh, the weight of the hidden state is represented by W_hh, the weight of the input is W_xh, and the input vector is signified as x_t. Exploding or vanishing gradient problems arise when the gradient of the model is back-propagated through the network during learning. A special kind of RNN model called LSTM is utilized for handling the vanishing gradient problem. The LSTM preserves long-term dependencies in an efficient manner by utilizing three diverse gates, which are explained in the following expressions. In these expressions, b characterizes the bias vector, W is utilized for the weights, and x_t indicates the input vector at time t, whereas i_t, f_t, c_t, and o_t represent the input gate, forget gate, cell memory, and output gate, respectively.
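A compact NumPy sketch of the standard LSTM gate computations follows; the stacked parameter layout and the helper name lstm_step are illustrative assumptions, not details given in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W (4H x input dim), U (4H x H), and b (4H,) hold the stacked
    parameters for the input (i), forget (f), candidate cell (g), and output (o) gates."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b      # pre-activations for all four gates, shape (4H,)
    i = sigmoid(z[0:H])               # input gate
    f = sigmoid(z[H:2 * H])           # forget gate
    g = np.tanh(z[2 * H:3 * H])       # candidate cell memory
    o = sigmoid(z[3 * H:4 * H])       # output gate
    c = f * c_prev + i * g            # new cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c
```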
3.4. Hyperparameter Tuning
In this study, the AOA is exploited to tune the hyperparameters of the RNN model, such as the learning rate, number of hidden layers, weight initialization, and decay rate. The AOA is a modern swarm intelligence approach [25]. The Aquila has four hunting strategies; it can flexibly switch among them for dissimilar types of prey and then use its fast speed, combined with its claws and sturdy feet, to attack the prey. The mathematical expressions are summarized in the following steps.
Step 1: Extended exploration (X1): high soar with a vertical stoop
Here, the Aquila flies high above the ground and widely explores the search space, and then takes a vertical dive once it locates the prey region. In the corresponding update, X_best(t) signifies the best location obtained so far, X_M(t) represents the average location of all Aquilas in the present iteration, t and T indicate the existing iteration and the maximal number of iterations, correspondingly, N denotes the population size, and rand refers to an arbitrary number that lies within the range of zero and one.
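A NumPy sketch of this expanded-exploration update is given below; it follows the standard Aquila Optimizer formulation, which is an assumption, since the paper's own equation is not restated here.

```python
import numpy as np

def step1_expanded_exploration(X, X_best, t, T, rng=np.random.default_rng()):
    """High soar with vertical stoop. X: (N, D) population of Aquila positions,
    X_best: (D,) best location so far, t: current iteration, T: maximum iterations."""
    X_mean = X.mean(axis=0)                 # average location of all Aquilas
    rand = rng.random(X.shape)              # random numbers in [0, 1)
    # Move toward the best solution, scaled by the remaining iteration budget,
    # with a perturbation around the mean position of the population.
    return X_best * (1 - t / T) + (X_mean - X_best * rand)
```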
Step 2: Narrowed exploration (X2): contour flight with a short glide attack
This is a popular hunting methodology for the Aquila. It applies a short glide to attack the prey, descending within the designated area and flying around the prey. The updated location is given in Equation (15), where X_R(t) refers to an arbitrary location of an Aquila, D indicates the dimension size, and rand represents an arbitrary number that lies in the range of [0, 1]. Levy(D) signifies the Levy flight function, in which s and β are constant values equivalent to 0.01 and 1.5, correspondingly, and u and v stand for arbitrary numbers lying within the range [0, 1]. In addition, y and x represent the spiral shape in the search space, computed in Equation (18), where r1 is the number of search cycles within the interval of 1 and 20, D1 comprises the integer numbers from 1 to the dimension size D, and ω is equivalent to 0.005.
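A NumPy sketch of the narrowed-exploration update, together with the Levy flight and the spiral terms, is shown below. It assumes the standard Aquila Optimizer formulation: the constant U = 0.00565 and the spiral angle offset 3π/2 come from that formulation rather than from the text, and r1 = 10 is an arbitrary illustrative choice within [1, 20].

```python
import numpy as np
from math import gamma, pi, sin

def levy_flight(D, s=0.01, beta=1.5, rng=np.random.default_rng()):
    """Levy flight of dimension D with constants s = 0.01 and beta = 1.5."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = rng.random(D), rng.random(D)     # arbitrary numbers in [0, 1)
    return s * u * sigma / np.abs(v) ** (1 / beta)

def step2_narrowed_exploration(X_best, X_rand, D, r1=10, omega=0.005, U=0.00565,
                               rng=np.random.default_rng()):
    """Contour flight with short glide attack around a randomly chosen Aquila X_rand."""
    D1 = np.arange(1, D + 1)                # integer numbers 1..D
    r = r1 + U * D1                         # spiral radius, r1 fixed in [1, 20]
    theta = -omega * D1 + 3 * pi / 2        # spiral angle
    y, x = r * np.cos(theta), r * np.sin(theta)
    return X_best * levy_flight(D, rng=rng) + X_rand + (y - x) * rng.random(D)
```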
Step 3: Expanded exploitation (X3): low flight with a slow descent attack
Here, once the prey region has been roughly identified, the Aquila descends vertically to execute a preliminary attack. The AOA uses the designated region to get closer to the prey and attack it. This behavior is modeled by Equation (19), in which X_best(t) represents the optimally attained location, X_M(t) indicates the average value of the present positions, α and δ signify the exploitation fine-tuning parameters set to 0.1, UB and LB denote the upper and lower limits of the search space, and rand refers to an arbitrary value that lies in the interval of [0, 1].
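A NumPy sketch of the expanded-exploitation update, again assuming the standard Aquila Optimizer form with α = δ = 0.1 as stated above, is:

```python
import numpy as np

def step3_expanded_exploitation(X_best, X_mean, lb, ub, alpha=0.1, delta=0.1,
                                rng=np.random.default_rng()):
    """Low flight with slow descent attack inside the identified prey region.
    lb, ub: lower and upper bounds of the search space, arrays of shape (D,)."""
    D = X_best.shape[0]
    return ((X_best - X_mean) * alpha - rng.random(D)
            + ((ub - lb) * rng.random(D) + lb) * delta)
```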
Step 4: Narrowed exploitation (X4): walking and grabbing the prey
Here, the Aquila chases the prey along its escape trajectory and then attacks the prey on the ground. The behavior is expressed arithmetically in Equation (20), where X(t) indicates the present location and QF characterizes the quality function value that balances the searching strategy. G1 represents the movement parameter of the Aquila while tracking the prey, which is an arbitrary number lying within the range of [−1, 1], G2 signifies the flight slope while chasing the prey, which linearly decreases from 2 to 0, and rand denotes arbitrary numbers that lie within [0, 1].
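A NumPy sketch of the narrowed-exploitation update is shown below; it reuses the levy_flight helper from the step-2 sketch, and the exact expressions for QF, G1, and G2 are taken from the standard Aquila Optimizer formulation rather than from the text.

```python
import numpy as np

def step4_narrowed_exploitation(X, X_best, t, T, rng=np.random.default_rng()):
    """Walk-and-grab attack: chase the prey along its escape trajectory.
    X: (D,) present location of the Aquila; reuses levy_flight() from the step-2 sketch."""
    D = X_best.shape[0]
    QF = t ** ((2 * rng.random() - 1) / (1 - T) ** 2)  # quality function balancing the search
    G1 = 2 * rng.random() - 1                          # movement parameter in [-1, 1]
    G2 = 2 * (1 - t / T)                               # flight slope, decreases from 2 to 0
    return (QF * X_best - (G1 * X * rng.random())
            - G2 * levy_flight(D, rng=rng) + rng.random() * G1)
```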
The AOA computes a fitness function (FF) for achieving higher classification efficiency. It defines a positive value such that a smaller value indicates a better candidate solution. In this case, the classification error rate to be minimized is taken as the FF, as provided in Equation (21).
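As a sketch, the fitness of a candidate hyperparameter configuration could be computed as the classification error rate of the trained RNN on held-out data; expressing it as a percentage and the function name fitness are illustrative assumptions.

```python
def fitness(y_true, y_pred):
    """Classification error rate (%) to be minimized by the AOA."""
    wrong = sum(int(t != p) for t, p in zip(y_true, y_pred))
    return 100.0 * wrong / len(y_true)
```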