4.1. Crop Harvest Time Prediction Model
In recent years, there have been no comprehensive studies on the distinguishing features affecting maturity (based on Elsevier Science, Springer-Verlag, EBSCO, ProQuest, and Google search results), but there has been research on individual features affecting growth. For instance, (1) Hatfield and Prueger [28] showed that temperature and accumulated temperature significantly affect plant growth; (2) Punia et al. [29] showed that solar radiation affects plant growth; (3) Ndamani and Watanabe [30] showed the effect of rainfall on plant growth; (4) Hirai et al. [31] showed the effect of humidity on plant growth; and (5) Gardiner et al. [32] illustrated the relationship between wind speed and plant growth. According to the aforementioned literature, the considered features include cumulative accumulated temperature (according to Ref. [1]), accumulated temperature (according to Ref. [1]), accumulated sunshine hours (according to Ref. [2]), accumulated total sky radiation (according to Ref. [2]), accumulated radiation (according to Ref. [2]), cumulative rainfall (according to Ref. [3]), cumulative precipitation hours (according to Ref. [3]), average humidity (according to Ref. [4]), and average wind speed (according to Ref. [5]). The influential features associated with the crop harvest time prediction model were thus compiled as follows: (1) cumulative accumulated temperature, (2) accumulated temperature, (3) accumulated sunshine hours, (4) accumulated total sky radiation, (5) accumulated radiation, (6) cumulative rainfall, (7) cumulative precipitation hours, (8) average humidity, and (9) average wind speed.
The values of the nine selected features from the previous three days (lags) are used as input. (After testing n days of lag, where the input variables at time t use the values of the previous observations (t − n, …, t − 2, t − 1), called lags one to n, the three-day lag yielded the smallest RMSE (0.565) for the LSTM and was therefore selected for harvest time prediction.) A total of 27 variables, i.e., each of the nine features lagged by one, two, and three days (for example, cumulative accumulated temperature one, two, and three days ago), are used as the input $x_t$ at time t for the long short-term memory (LSTM) model, and the output variable $y_t$ at time t is the harvest time (the number of days from time t until harvesting). Details of the variables are compiled in Table 1.
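As an illustration, the lagged input matrix can be assembled from a daily feature table as sketched below (a minimal pandas example; the column names, the DataFrame `df`, and the `days_to_harvest` target column are hypothetical placeholders, not the paper's actual data schema).

```python
import pandas as pd

# Hypothetical daily feature table: one row per day, nine weather features,
# plus the target (days until harvest). Column names are illustrative only.
FEATURES = [
    "cum_accum_temp", "accum_temp", "accum_sunshine_hours",
    "accum_sky_radiation", "accum_radiation", "cum_rainfall",
    "cum_precip_hours", "avg_humidity", "avg_wind_speed",
]

def build_lagged_inputs(df: pd.DataFrame, n_lags: int = 3) -> pd.DataFrame:
    """Create the 9 x n_lags = 27 lagged input columns described above."""
    lagged = {}
    for feat in FEATURES:
        for lag in range(1, n_lags + 1):
            lagged[f"{feat}_lag{lag}"] = df[feat].shift(lag)
    out = pd.DataFrame(lagged, index=df.index)
    out["days_to_harvest"] = df["days_to_harvest"]  # output y_t
    return out.dropna()  # first n_lags rows lack a complete lag history
```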
The structure of the LSTM is shown in Equations (1)–(6). There are four components in the LSTM: a forget gate ($f_t$), an input gate ($i_t$), an output gate ($o_t$), and a memory cell ($\tilde{C}_t$). The cell retains values over time intervals, and the three gates are responsible for controlling the flow of information into and out of the cell. At time t, the cell is fed with the input $x_t$ and the hidden state $h_{t-1}$ at time t − 1. The forget gate $f_t$, the input gate $i_t$, the output gate $o_t$, and the memory cell $\tilde{C}_t$ are calculated as follows:

$$f_t = \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right), \tag{1}$$

$$i_t = \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \tag{2}$$

$$o_t = \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \tag{3}$$

$$\tilde{C}_t = \tanh\!\left(W_c \cdot [h_{t-1}, x_t] + b_c\right), \tag{4}$$

where $\sigma$ and $\tanh$ are the sigmoid and hyperbolic tangent activation functions, respectively. The weights and biases of the input gate, output gate, forget gate, and memory cell are denoted by $W_i$, $W_o$, $W_f$, and $W_c$ and $b_i$, $b_o$, $b_f$, and $b_c$, respectively. Then, the cell state $C_t$ and the hidden state $h_t$ at time t can be calculated as follows:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \tag{5}$$

$$h_t = o_t \odot \tanh(C_t), \tag{6}$$

where $\odot$ denotes element-wise multiplication.
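To make Equations (1)–(6) concrete, the single-cell computation can be written in a few lines of NumPy, as sketched below (the weight and bias containers and their dimensions are illustrative assumptions, not the trained model's parameters).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following Equations (1)-(6).

    x_t: input vector at time t; h_prev, c_prev: hidden and cell states
    at t-1; W, b: dicts of weight matrices and bias vectors keyed by
    'f', 'i', 'o', 'c', each W[k] acting on the concatenation [h_prev, x_t].
    """
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate, Eq. (1)
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate, Eq. (2)
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate, Eq. (3)
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell, Eq. (4)
    c_t = f_t * c_prev + i_t * c_tilde         # cell state, Eq. (5)
    h_t = o_t * np.tanh(c_t)                   # hidden state, Eq. (6)
    return h_t, c_t
```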
There are five layers for harvest time prediction in the Keras sequential model (Figure 1): an Input layer, one LSTM layer, a Dropout layer, a Dense layer, and an Output layer. In the implementation of the model, the input data $x_t$ in the Input layer comprise the 27 variables mentioned above. The LSTM layer is adopted with 30 hidden nodes, and the activation function used in this layer is a rectified linear unit (ReLU). A dropout mechanism in the Dropout layer is applied to the inputs of the Dense layer to prevent over-fitting, with the dropout rate set to 0.4. The Dense layer with a linear activation function is used to return a single continuous value. The adaptive moment estimation (Adam) function is used as the optimizer; this function defines how the weights of the neural network are updated. The output data $y_t$ in the Output layer represent the predicted harvest time (the number of days until harvesting from time t).
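A minimal Keras sketch of this five-layer architecture is given below, using the hyperparameters reported in Section 4.2.5 (epochs, batch size, and learning rate). Arranging the 27 inputs as three timesteps of the nine features is our assumption, since an LSTM layer expects a (timesteps, features) input shape.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumed input arrangement: 3 lag timesteps x 9 features = 27 variables.
model = keras.Sequential([
    keras.Input(shape=(3, 9)),              # Input layer
    layers.LSTM(30, activation="relu"),     # LSTM layer, 30 hidden nodes
    layers.Dropout(0.4),                    # Dropout layer, rate 0.4
    layers.Dense(1, activation="linear"),   # Dense/Output: days to harvest
])
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),  # Adam optimizer
    loss="mse",   # training loss assumed here; the paper reports RMSE
)
# model.fit(X_train, y_train, epochs=50, batch_size=10)  # per Section 4.2.5
```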
4.2. Feature Selection Method for the Crop Harvest Time Prediction Model
After the crop harvest time prediction model is determined, a feature (variable) selection method is used to remove irrelevant input variables in order to improve the prediction accuracy. Since the search for the best solution over all variable combinations is a combinatorial problem (with complexity $2^N$, where N is the number of input variables), the proposed method is a hybrid search method integrating a particle swarm search with a large neighborhood search (LNS, a variant of the variable neighborhood search (VNS)). First, the parameters are set. Then, the particle positions and velocities for the first iteration (generation) are generated (there are psize particles), and the adaptation values of the psize particles are calculated. LNS is executed for the new iteration (generation), and then Pbest and Gbest are updated. If iter has reached the default value (isize), the procedure stops; otherwise, the particle velocities and positions are updated, LNS is executed for Gbest, iter is incremented (iter = iter + 1), and the procedure returns to the calculation of the particle adaptation values (Figure 2).
4.2.1. Set the Parameters
Set the initial parameters: the current iteration (generation) counter, iter (=1); the current particle pointer, pindex; the maximum iteration (generation) count, isize; the number of particles, psize; the inertia weight, w; the learning factors, c1 and c2; the number of variables flipped in the LNS, LNSsize; and the number of particles selected for the LNS, M.
Generate the particle position and velocity: the position $X_{pindex}^{1}$ (representing particle pindex at the first iteration, for 1 ≤ pindex ≤ psize) is expressed as $(v_1, v_2, \dots, v_i, \dots, v_N)$ with N dimensions, where, for 1 ≤ i ≤ N, $v_i$ can be 0 (variable combination without variable i) or 1 (variable combination with variable i); psize such positions are chosen randomly. The velocity $V_{pindex}^{1}$ is randomly selected from the range $U[-V_{max}, V_{max}]$, and $V_{max}$ is set according to 15% of the range of the variables in each dimension [33].
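A sketch of this initialization in NumPy follows (binary positions and uniform velocities; with binary variables the per-dimension range is 1, so $V_{max}$ = 0.15 here under that reading of the 15% rule).

```python
import numpy as np

def init_particles(psize: int, n_vars: int, v_max: float = 0.15, seed=None):
    """Random binary positions and velocities drawn from U[-v_max, v_max]."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(psize, n_vars))        # v_i in {0, 1}
    V = rng.uniform(-v_max, v_max, size=(psize, n_vars))
    return X.astype(float), V
```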
4.2.2. Calculate the Particle Adaptation Values
The variables selected in a specific particle $X_{pindex}^{iter}$ (for 1 ≤ iter ≤ isize, 1 ≤ pindex ≤ psize) are used as the inputs of the long short-term memory (LSTM) model. After training and testing on the real data, the root mean square error (RMSE) for $X_{pindex}^{iter}$ is calculated and adopted as the particle adaptation value.
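A fitness (adaptation value) function consistent with this description is sketched below; the model construction mirrors the Section 4.1 sketch, while the data arrays, train/test split, and feature-axis layout are assumptions for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def fitness(particle, X_train, y_train, X_test, y_test):
    """RMSE of an LSTM trained only on the variables the particle selects.

    particle: binary vector over the 27 lagged variables; X_* are assumed
    shaped (samples, 27) and reshaped to (samples, 1, k) for the LSTM --
    this (timesteps, features) arrangement is an assumption of the sketch.
    """
    idx = np.flatnonzero(particle)          # indices of selected variables
    if idx.size == 0:
        return np.inf                       # an empty selection is invalid
    tr, te = X_train[:, idx], X_test[:, idx]
    model = keras.Sequential([
        keras.Input(shape=(1, idx.size)),
        layers.LSTM(30, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(0.001), loss="mse")
    model.fit(tr[:, None, :], y_train, epochs=50, batch_size=10, verbose=0)
    pred = model.predict(te[:, None, :], verbose=0).ravel()
    return float(np.sqrt(np.mean((pred - y_test) ** 2)))
```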
4.2.3. Execute LNS for the New Iteration
M particles are randomly selected from the new iteration (generation) of psize particles. For each selected particle, LNSsize of the N variables are chosen arbitrarily for diversification, and the value of each chosen variable is flipped (changed to 1 if it is 0, or to 0 if it is 1). These updated variables yield a new solution X. If the adaptation value of X is better than that of the original particle $X_{pindex}^{iter}$, then X replaces it ($X_{pindex}^{iter}$ = X); if the adaptation value of X is worse, then X replaces it with a given acceptance probability.
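A sketch of this LNS step follows; since the paper's worse-solution acceptance probability expression is not reproduced here, it is left as a parameter `accept_prob` (an assumption of this sketch).

```python
import numpy as np

def lns(X, fitness, M=10, lns_size=5, accept_prob=0.1, rng=None):
    """Large neighborhood search: flip lns_size variables in M random particles.

    accept_prob stands in for the paper's acceptance probability for
    worse solutions, which is not specified in this sketch.
    """
    rng = rng or np.random.default_rng()
    X = X.copy()
    n_particles, n_vars = X.shape
    for p in rng.choice(n_particles, size=min(M, n_particles), replace=False):
        cand = X[p].copy()
        flips = rng.choice(n_vars, size=lns_size, replace=False)
        cand[flips] = 1 - cand[flips]        # flip 0 <-> 1
        if fitness(cand) < fitness(X[p]) or rng.random() < accept_prob:
            X[p] = cand                      # accept if better, else w.p.
    return X
```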
4.2.4. Update Pbest and Gbest
We determine whether the adaptation value of each particle $X_{pindex}^{iter}$ (for 1 ≤ pindex ≤ psize) in iteration (generation) iter is better than that of $Pbest_{pindex}$ (for the first generation of particles, $Pbest_{pindex}$ is set to the particle itself). If it is, the particle replaces $Pbest_{pindex}$; thereafter, we determine whether the Gbest update condition is met (for the first generation of particles, Gbest is set to the best solution among all first-generation particles). If the particle is inferior to Gbest, then Gbest remains unchanged; if the particle is not inferior to Gbest, then the particle replaces Gbest, the particle's velocity and position are updated, and LNS is performed for the particle (see Section 4.2.3, Execute LNS for the New Iteration).
Has iter reached the default value (=isize)?
If it has, (1) stop; otherwise, (2) update the particle velocities and positions, (3) execute LNS for Gbest, (4) set iter = iter + 1, and (5) return to the calculation of the particle adaptation values.
Update the particle position and velocity
The particle position $X_{pindex}^{iter}$ and velocity $V_{pindex}^{iter}$ are updated according to the current position and velocity of each particle in the iteration (see Equations (7) and (8)), and the resulting combination of variables is checked for being out of range (each variable must be 0 or 1). If the velocity is out of range, the out-of-range value is set to the maximum (if above the maximum) or the minimum (if below the minimum) of the range. If a position (variable) component of $X_{pindex}^{iter}$ is a non-integer value between 0 and 1, the upper limit (1) or the lower limit (0) is used according to the nearest-distance principle:

$$V_{pindex}^{iter+1} = w \cdot V_{pindex}^{iter} + c_1 r_1 \left( Pbest_{pindex} - X_{pindex}^{iter} \right) + c_2 r_2 \left( Gbest - X_{pindex}^{iter} \right), \tag{7}$$

$$X_{pindex}^{iter+1} = X_{pindex}^{iter} + V_{pindex}^{iter+1}, \tag{8}$$

where $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1].
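A sketch of this update with velocity clamping and nearest-limit rounding is given below (standard inertia-weight PSO, Equations (7) and (8), applied to the binary encoding).

```python
import numpy as np

def update_velocity_position(X, V, pbest, gbest, w=0.975, c1=2.0, c2=2.0,
                             v_max=0.15, rng=None):
    """PSO update per Eqs. (7)-(8) with clamping and 0/1 rounding."""
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)  # Eq. (7)
    V = np.clip(V, -v_max, v_max)     # out-of-range velocity -> range limit
    X = X + V                         # Eq. (8)
    X = np.rint(np.clip(X, 0.0, 1.0))  # nearest-distance principle: 0 or 1
    return X, V
```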
4.2.5. Set the Related Parameters
Related parameters: the number of particles, psize; the number of iterations, isize; the inertia weight, w; the learning factors, $c_1$ and $c_2$; LNSsize; and M. Following Rabbani et al. [33], the inertia weight was set to w = 0.975 and the learning factors to $c_1$ = 2 and $c_2$ = 2. The other values were determined experimentally based on the minimal-RMSE criterion: the number of particles (psize) was tried from 50 to 150 (50, 100, and 150); the number of iterations (isize) from 100 to 300 (100, 200, and 300); LNSsize from 3 to 7 (3, 5, and 7); and M from 10 to 30 (10, 20, and 30). The following hyperparameters of the LSTM prediction model are used in this paper after the experiments: activation function: rectified linear unit (ReLU); optimizer: Adam; number of hidden layers: one; hidden nodes: 30; epochs: 50; batch size: 10; learning rate: 0.001; and dropout rate: 0.4. The following values were determined after the experiments: psize = 50, isize = 200, LNSsize = 5, and M = 10.
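Putting the selected settings together, a configuration block for the sketches above might look as follows (the values are those reported in this subsection; the dictionary grouping itself is an illustrative convention, not from the paper).

```python
# Selected settings reported in Section 4.2.5.
PSO_LNS_PARAMS = dict(psize=50, isize=200, w=0.975, c1=2.0, c2=2.0,
                      lns_size=5, M=10)
LSTM_PARAMS = dict(hidden_layers=1, hidden_nodes=30, activation="relu",
                   optimizer="adam", epochs=50, batch_size=10,
                   learning_rate=0.001, dropout_rate=0.4)
```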