**3. Methodology**

*3.1. Overview of the Proposed Deep Learning-Based Framework*

Operators of different proficiency levels can differ greatly in productivity and fuel efficiency. Consequently, deep learning is used to predict the throttle value of wheel loaders from the driving data of experienced operators, so that the driving process conforms to the decisions of experienced operators and meets the vehicle's operational requirements, even in complex driving environments, while ensuring productivity and fuel efficiency. In addition, based on the temporal features extracted by the LSTM, a BP neural network (BPNN) is added to predict the state of the wheel loader; this network makes no assumptions about the loader's internal behavior and learns the impact of the environment on the state from the data. The flowchart of the proposed framework is shown in Figure 3; it involves three parts.

Part one: Data collection and pre-processing. Neural networks require real driving data from the skilled operator to imitate the experienced operator. For the collection of wheel loader driving data, skilled drivers were required to perform a V-cycle in the actual working environment. To improve the computation speed and prediction accuracy, the driving data are normalized, and the working cycle is divided.

Part two: Sequence model. LSTM, which can extract temporal features and mitigates the vanishing gradient problem of the original RNN, is applied in this paper. Six LSTM networks with the same structure are used for the six stages of the working cycle.

Part three: Regression model. To output the final results, two BPNNs follow the LSTM and output the predictions of throttle value and state, respectively. Each part is discussed in detail as follows.

**Figure 3.** The model presented in this paper.

#### *3.2. Data Collection and Pre-Processing*

The experimental wheel loader, shown in Figure 4, is equipped with pressure sensors and GPS for data acquisition. Field data were collected at sites with dry ground. To study the adaptability of the proposed prediction approach, it is important to conduct experiments with a variety of materials. Small coarse gravel (SCG) and large coarse gravel (LCG), shown in Figure 5, were used as the operating materials for this experiment. Small coarse gravel mainly contains particles with sizes of 0–25 mm, while large coarse gravel mainly contains particles with sizes of 25–500 mm.

**Figure 4.** Experimental wheel loader.


**Figure 5.** Two operating materials.

The V-cycle of wheel loaders consists of six working phases, each with its own characteristics. To improve the prediction precision and computation efficiency, six prediction models were constructed, one for each phase of the working cycle. Normalization was used to speed up the training. According to the working characteristics of wheel loaders in the V-cycle, the cycle was divided by extracting the working condition features of the actuator and walking device, which maps the collected data to the working state, as shown in Figure 6. For each operating material, 50 sets of data were collected to train and test the prediction model.
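The normalization step can be illustrated with a minimal sketch. The text does not state which normalization was used, so min-max scaling to [0, 1] is assumed here; the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(channel):
    """Scale one signal channel to [0, 1] (assumed scheme; the paper does not name one).

    A constant channel has no range to scale by, so it is mapped to all zeros.
    """
    lo, hi = min(channel), max(channel)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in channel]
    return [(v - lo) / span for v in channel]


# Hypothetical lift cylinder pressure samples (illustrative values only)
lift_pressure = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(lift_pressure)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

In practice the minimum and maximum would normally be taken from the training data only and reused for the test data, so that both are scaled consistently.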

**Figure 6.** Schematic diagram of working condition division: (**a**) velocity of wheel loader; (**b**) lift cylinder pressure; (**c**) tilt cylinder pressure.

#### *3.3. Construction of LSTM*

The proposed deep-learning-based prediction method consists of LSTM and BPNN, as illustrated in Figure 7. LSTM can memorize the temporal relationship in time-series data. The particular gate structure of the LSTM allows the networks to learn when to store and when to forget the relationship. Thus, the temporal information of the driving data is encoded into the LSTM network. In the training process, the high-dimensional temporal information was extracted by the hidden layer from the time-series data.
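To make the gate structure concrete, the following is a minimal sketch of one time step of a standard LSTM cell, reduced to a scalar input and a scalar state for readability. It is not the authors' implementation; the `params` layout and all weight values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell with scalar input and scalar state.

    params maps each gate name ('f', 'i', 'o', 'g') to a tuple
    (input weight, recurrent weight, bias). All values are illustrative.
    """
    def pre_activation(gate):
        w, u, b = params[gate]
        return w * x + u * h_prev + b

    f = sigmoid(pre_activation('f'))    # forget gate: how much old memory to keep
    i = sigmoid(pre_activation('i'))    # input gate: how much new content to store
    o = sigmoid(pre_activation('o'))    # output gate: how much memory to expose
    g = math.tanh(pre_activation('g'))  # candidate memory content
    c = f * c_prev + i * g              # updated cell state
    h = o * math.tanh(c)                # updated hidden state
    return h, c


# Hypothetical weights and a short normalized input sequence
params = {
    'f': (0.5, 0.1, 1.0),
    'i': (0.6, 0.2, 0.0),
    'o': (0.4, 0.3, 0.0),
    'g': (0.9, 0.5, 0.0),
}
h, c = 0.0, 0.0
for x in [0.1, 0.4, 0.35, 0.8]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The forget gate `f` scales the previous cell state and the input gate `i` scales the new candidate, which is how the network learns when to store and when to forget.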

**Figure 7.** Structure of the proposed LSTM and BPNN.

The LSTM model is developed with triple-stacked LSTM units because this configuration outperformed the double-stacked and the single-stacked LSTM in the training experiment. Meanwhile, compared with quadruple-stacked LSTM, triple-stacked LSTM has similar prediction precision and requires fewer computation resources. This result implies that increasing the structural complexity of the LSTM does not always lead to an improvement in the prediction accuracy.

The predictions of throttle value and state share a single LSTM network that extracts the temporal features. An alternative is to use two LSTMs that extract temporal features and predict the throttle value and state separately. However, two LSTMs add to the cost of training and real-time computation. In the experiment, both options performed similarly. A possible explanation is that the sequence features required to predict the throttle value and state are similar.

Because the operation time of each working cycle varies, the time-series data have different lengths. For the LSTM network, sequences of different lengths must be padded to a common length before they can be batched together. However, padding influences the prediction ability of the network. Therefore, in this paper, the batch size was set to 1, so that each sequence is processed at its own length without padding, to ensure prediction accuracy.

The time-series data were taken as the input, with time steps indexed $[1, 2, \ldots, t, \ldots, n]$. Each time step has five parameters: lift cylinder pressure, tilt cylinder pressure, engine speed, vehicle velocity, and throttle. In the training process, all previous throttle values and state values were taken as the inputs to output the corresponding predicted values of the next moment via the BPNN, and the real values of the next moment were used as the labels.
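The pairing of inputs with next-moment labels can be sketched as follows. The feature order, the helper name `make_next_step_pairs`, and the sample values are assumptions for illustration; only the throttle target is shown, since this section does not specify which signal the state target corresponds to.

```python
# Assumed feature order per time step (hypothetical):
# [lift cylinder, tilt cylinder, engine speed, vehicle velocity, throttle]
THROTTLE = 4


def make_next_step_pairs(sequence, target_index):
    """Pair the features at each time step with the target value at the next step.

    sequence is a list of per-time-step feature lists for one working phase.
    Returns (inputs, targets), where targets[t] is the target at step t + 1.
    """
    if len(sequence) < 2:
        raise ValueError("need at least two time steps to form a pair")
    inputs = sequence[:-1]
    targets = [step[target_index] for step in sequence[1:]]
    return inputs, targets


# Hypothetical normalized samples for four time steps
cycle = [
    [0.10, 0.20, 0.50, 0.00, 0.30],
    [0.15, 0.22, 0.55, 0.10, 0.40],
    [0.30, 0.25, 0.60, 0.20, 0.45],
    [0.35, 0.30, 0.62, 0.25, 0.50],
]
inputs, targets = make_next_step_pairs(cycle, THROTTLE)
print(len(inputs), targets)  # 3 [0.4, 0.45, 0.5]
```

Because the LSTM carries its hidden state forward, feeding `inputs` one step at a time gives the prediction at step t access to all earlier steps, in line with the description above.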

For the LSTM, the number of output units is 64. To train the neural networks, the learning rate was 0.001, while the loss function was mean squared error (MSE) and expressed as:

$$MSE = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 \tag{7}$$

where $\hat{y}_i$ and $y_i$ are, respectively, the predicted value and the actual value of the sampling point in the test set, and *N* is the number of samples in the test set.

The solver was the Adam algorithm [39], which is one of the most common solvers and is suitable for training RNNs. To assess the quality of the training results, the root mean square error (RMSE) is taken as the criterion and expressed as:

$$RMSE = \sqrt{MSE} \tag{8}$$
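Equations (7) and (8) translate directly into code. The sketch below uses only the Python standard library; the function names and the sample values are illustrative.

```python
import math


def mse(predicted, actual):
    """Mean squared error as in Equation (7)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error as in Equation (8)."""
    return math.sqrt(mse(predicted, actual))


# Illustrative values: three exact predictions and one that is off by 4
predicted = [1.0, 2.0, 3.0, 4.0]
actual = [1.0, 2.0, 3.0, 8.0]
print(mse(predicted, actual))   # 4.0
print(rmse(predicted, actual))  # 2.0
```

RMSE is expressed in the same units as the predicted quantity, which makes it easier to interpret than MSE when comparing models.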

#### *3.4. Construction of BPNN*

Two BPNNs with 64 inputs were used to output the prediction results of throttle value and state. The temporal information extracted from all previous data was taken as the input of the BPNN at each moment, and the BPNN output the predicted values of the next moment. The two BPNNs have the same structure, with two hidden layers of 64 and 32 units, respectively. This structure was found to be effective and accurate. The BPNN part in Figure 7 depicts the network architecture. Rectified linear units (ReLU) were chosen as the activation function of the BPNN because they allow deep neural networks to be trained with acceptable speed and performance [40].
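The forward pass of one such BPNN can be sketched in plain Python. The 64 inputs and the hidden layers of 64 and 32 units with ReLU follow the description above; the single linear output unit, the random weight initialization, and all names are assumptions made for illustration.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an assumed initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def bpnn_forward(features, layers):
    """ReLU after every layer except the last, which is left linear for regression."""
    out = features
    for k, (weights, biases) in enumerate(layers):
        out = dense(out, weights, biases)
        if k < len(layers) - 1:
            out = relu(out)
    return out


rng = random.Random(0)
# 64 LSTM features -> 64 hidden -> 32 hidden -> 1 output (output size is assumed)
layers = [init_layer(64, 64, rng), init_layer(64, 32, rng), init_layer(32, 1, rng)]
lstm_features = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # stand-in for LSTM output
prediction = bpnn_forward(lstm_features, layers)
print(len(prediction))  # 1
```

Two networks of this shape, one for the throttle value and one for the state, would both read the same 64-dimensional LSTM output.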

#### **4. Results and Discussion**

TensorFlow was employed to implement the benchmark and proposed architectures. The time-series data were imported into Python as a list, and the labels were stored in a separate list. The first 40 elements of each list were used as the training set, and the last 10 elements formed the test set.
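The split described above amounts to slicing two parallel lists. In this sketch the placeholder strings stand in for the real time series and labels, and the helper name `split_cycles` is hypothetical.

```python
def split_cycles(cycles, labels, n_train=40):
    """Use the first n_train sets for training and the rest for testing."""
    if len(cycles) != len(labels):
        raise ValueError("cycles and labels must have the same length")
    train = (cycles[:n_train], labels[:n_train])
    test = (cycles[n_train:], labels[n_train:])
    return train, test


# Placeholders for the 50 recorded data sets of one material
cycles = [f"cycle_{k}" for k in range(50)]
labels = [f"label_{k}" for k in range(50)]
(train_x, train_y), (test_x, test_y) = split_cycles(cycles, labels)
print(len(train_x), len(test_x))  # 40 10
```

The split is by position rather than at random, so the test set consists of the last ten recorded sets.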

#### *4.1. Performance Analysis of Deep Learning Model for Different Materials*

To validate the adaptability of the proposed method, the experimental wheel loader was required to load different materials in the V-cycle operation mode at two different working sites, and the driving data collected at the two test sites were used as the inputs to train two LSTM networks individually. The throttle value and state of the wheel loader at the next time step were used as the outputs to train the networks. In addition to the different operating materials, the two working sites have different road surfaces: when loading small coarse gravel, the loader drove on a concrete surface, and when loading large coarse gravel, it drove on a native soil surface. For each working material, 50 sets of driving data were collected at a 200 Hz sampling frequency. The data were further divided into training data, consisting of 40 sets, and testing data, consisting of 10 sets.

Figure 8 shows the RMSE of the predicted throttle value and state for small and large coarse gravel over the six working stages and 10 groups of test data. Each boxplot represents the quartiles of the RMSE, where the current throttle value and state are used as the input and the prediction results belong to the next time step. It can be seen from Figure 8a that the RMSE of the predicted throttle value for both materials was less than 1.8 and that, compared with small coarse gravel, the RMSE for large coarse gravel had a higher mean and a wider variation range, which indicates worse prediction results. In Figure 8b, the RMSE of the predicted state for both materials is less than 5, with the same comparison results as for the predicted throttle value. A possible explanation is that the complexity of the working environment affects prediction accuracy. If large coarse gravel is used as the working material, the load of the wheel loader changes drastically during the bucket-filling stage (V2) and dumping stage (V5), which increases the difficulty of prediction. In addition, native soil pavement is more complicated than concrete pavement, so the interaction between the wheel loader and the environment has stronger randomness. The predicted results and the actual values under the two working conditions are compared in Figure 9. As shown in Figure 9, during the whole working cycle, the LSTM network can predict the throttle value and state with relatively high accuracy under different working conditions, which means that the proposed prediction model has good adaptability.

**Figure 8.** Comparison of RMSE from different materials: (**a**) prediction of throttle value (**b**) prediction of state. mean and median values are shown with '–' and '—' respectively.

#### *4.2. Comparison with Different Deep Learning Models*

The single V-cycle of wheel loaders consists of six stages, which have different operation modes and feature data. To more accurately extract unique feature data for each work stage and obtain high prediction accuracy, six LSTM prediction networks are developed for different stages. A single LSTM prediction network can also be used for this work. A single prediction network takes the complete data containing six stages as input and outputs the prediction result, which is end-to-end deep learning. End-to-end deep learning can reduce the hand-designed features and intermediate steps, but requires a considerable amount of data.

Figure 10 compares the RMSE results of the single LSTM prediction network and the multiple LSTM prediction networks using small coarse gravel (SCG) as the working material. From Figure 10, it can be seen that the RMSE obtained by the single prediction network has a higher mean and a wider variation range than the RMSE obtained by the multiple prediction networks. Particularly for the bucket-filling stage (V2) and dumping stage (V5), the multiple prediction networks significantly outperform the single prediction network. This finding is further confirmed by Figure 11, which shows the RMSE comparison using large coarse gravel (LCG) as the working material. There are two possible reasons for this result. The first is that the load of the wheel loader changes during the bucket-filling stage and the dumping stage. The second is the lift and tilt of the working device. Therefore, in the case of limited data, it is necessary to establish different prediction networks for different working stages to obtain accurate prediction results. However, it should be noted that the single prediction network may achieve the same performance as the multiple prediction networks with sufficient data.

BPNN is also used as a benchmark model for the different stages. Figures 12 and 13 compare the RMSE results of the BPNNs and the LSTM networks. The results show that the LSTM network has a better prediction effect. This can be ascribed to the fact that LSTM can extract temporal features, which lets the model capture the environment and the wheel loader more accurately. The RMSE of throttle value for different operating materials and models is shown in Table 2, and the RMSE of state is shown in Table 3.

**Figure 9.** Driving data of experienced drivers and the predicted value from different materials: (**a**) small coarse gravel (**b**) large coarse gravel.

**Figure 10.** RMSE comparison of different LSTM networks using small coarse gravel: (**a**) prediction of throttle value (**b**) prediction of state.

**Figure 11.** RMSE comparison of different LSTM networks using large coarse gravel: (**a**) prediction of throttle value (**b**) prediction of state.

**Figure 12.** RMSE comparison of BPNNs and LSTM networks using small coarse gravel: (**a**) prediction of throttle value (**b**) prediction of state.

**Figure 13.** RMSE comparison of BPNNs and LSTM networks using large coarse gravel: (**a**) prediction of throttle value (**b**) prediction of state.

#### *4.3. Performance Analysis of LSTM Networks for Different Sampling Frequencies*

Due to the high integration of the wheel loader and the high signal density, the sampling frequency is severely restricted by the storage capacity of the host. The appropriate sampling frequency should be as low as possible while ensuring prediction accuracy. A low sampling frequency reduces the amount of data, thereby reducing the cost of data storage and the consumption of computation resources. Therefore, to reduce the cost, it is necessary to study the relationship between the signal sampling frequency and the prediction accuracy. The sampling frequency is reduced from the original 200 Hz to 100, 50, 20, and 10 Hz, respectively.
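One simple way to obtain the lower-rate data sets from the 200 Hz recordings is to keep every k-th sample. The text does not say how the reduction was performed, so plain decimation without an anti-aliasing filter is assumed here; the function name `downsample` is hypothetical.

```python
def downsample(samples, source_hz, target_hz):
    """Keep every k-th sample, where k = source_hz / target_hz.

    Plain decimation is an assumption; no anti-aliasing filter is applied.
    """
    if target_hz <= 0 or source_hz % target_hz != 0:
        raise ValueError("target rate must be a positive divisor of the source rate")
    step = source_hz // target_hz
    return samples[::step]


# One second of placeholder samples recorded at 200 Hz
signal = list(range(200))
for hz in (100, 50, 20, 10):
    print(hz, len(downsample(signal, 200, hz)))
# prints: 100 100, 50 50, 20 20, 10 10 (one pair per line)
```

All four target rates divide 200 Hz evenly, so each reduced data set is an exact subset of the original samples.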


**Table 2.** The RMSE of throttle value for different operating materials and models.

**Table 3.** The RMSE of state for different operating materials and models.


Figures 14 and 15 show the relationship between prediction performance and signal sampling frequency. Table 4 shows the RMSE of throttle value and state under different sampling frequencies. It can be seen that the prediction effect improves as the signal sampling frequency increases. This result may be explained by the fact that a higher sampling frequency provides sufficient feature information in time. However, an excessively high sampling frequency may introduce more noise, making it difficult for the neural network model to learn the correct mapping from input to output. Moreover, when the sampling frequency is higher than 50 Hz, a further increase in frequency does not significantly improve the prediction performance. In practice, although increasing the sampling frequency improves the prediction accuracy, it also increases storage costs and decreases the real-time calculation rate. Therefore, a trade-off is necessary when selecting the sampling frequency. For example, if a fully automated system is required, a higher sampling frequency is necessary to reduce the prediction error. For assisted driving, however, a lower sampling frequency should be considered to reduce the storage and computing costs.

**Figure 14.** Relationship between sampling frequency and prediction performance using large coarse gravel: (**a**) prediction of throttle value (**b**) prediction of state.

**Figure 15.** Relationship between sampling frequency and prediction performance using small coarse gravel: (**a**) throttle prediction (**b**) state prediction.


**Table 4.** The RMSE of throttle value and state under different sampling frequencies.

## **5. Conclusions**

This paper proposed a deep-learning-based method to predict throttle value and state for wheel loaders. The prediction model can help achieve autonomous operation and reduce the need for remote intervention during remote operation. Additionally, the proposed model can be applied to model predictive control and energy management to achieve a good performance in terms of efficiency and fuel consumption.

The prediction model consists of three main parts, namely, data collection and preprocessing, LSTM, and BPNN. Six LSTM networks are used to extract the temporal features of the six stages of the V-cycle for wheel loaders. Based on the extracted temporal features, two BPNNs are employed to predict the throttle value and state of wheel loaders, respectively. The data obtained from two different working materials and pavements are used to train and test the proposed prediction model. The results show that the proposed prediction model achieves a good prediction effect under different working conditions and outperforms BPNNs. Moreover, compared with end-to-end deep learning, which uses only a single LSTM network for prediction, the prediction model of multiple LSTM networks shows better prediction performance. However, the prediction model of multiple neural networks requires more hand-designed features. The relationship between signal sampling frequency and prediction accuracy is also studied. In the range of 10 Hz to 200 Hz, the prediction performance improves as the frequency increases. However, once the signal sampling frequency exceeds 50 Hz, further increases yield no obvious improvement in prediction accuracy. Therefore, in engineering practice, it is necessary to weigh the prediction accuracy against the cost. Although this paper takes the wheel loader as the research object, the proposed prediction model can be adapted to other construction machinery. In the future, the prediction network will be deployed to a physical wheel loader to improve the efficiency and real-time fuel efficiency using reinforcement learning.

**Author Contributions:** Conceptualization, J.H. and X.C.; methodology, J.H. and Y.S.; software, J.H. and X.C.; validation, J.H. and Y.S.; formal analysis, D.K.; investigation, X.C.; resources, J.W.; data curation, Y.S. and X.C.; writing—original draft preparation, J.H.; writing—review and editing, J.W. and D.K.; visualization, X.C. and Y.S.; supervision, J.W.; project administration, J.W.; funding acquisition, J.W. and D.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China, grant number 51875239.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
