1. Introduction
Stock price prediction has always been a significant research topic in the field of finance. Researchers have proposed various algorithms to forecast the trends of stock prices, which are highly volatile, complex, and nonlinear [1,2,3,4]. Traditional forecasting methods include Multivariable Linear Regression, Exponential Smoothing, Autoregressive Integrated Moving Average (ARIMA), etc. However, these models are often relatively simple and may not achieve satisfactory prediction accuracy. In recent years, with the development of big data and artificial intelligence, machine learning and deep learning methods have been widely applied to stock price prediction research [5,6,7]. Examples include Random Forest, Extreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM), among others. Compared to traditional methods, these approaches have effectively improved prediction accuracy.
Deep learning is a machine learning approach based on artificial neural networks. For many applications, deep learning models outperform shallow machine learning models and traditional data analysis methods [8]. Neural networks take various forms, such as backpropagation (BP) neural networks and recurrent neural networks (RNNs). Long Short-Term Memory (LSTM), an important deep learning method, has achieved significant success in financial forecasting and has become one of the cutting-edge research methods. LSTM improves on the structure of RNNs, addressing gradient vanishing, gradient exploding, and insufficient long-term memory by using gate mechanisms to learn long-term dependencies [9]. It can more effectively combine information from earlier in a time series with the current input to make more accurate predictions about the future.
Applying LSTM to predict S&P 500 component stocks from 1992 to 2015, it was found that its prediction performance was superior to Random Forest (RAF), Deep Neural Networks (DNN), and the Logistic Regression Classifier (LOG) [10]. While LSTM can produce good predictions in some cases, it may still struggle with highly nonlinear time series [11]. One possible reason is that the time dependencies between elements in these time series are not well exploited [12]. This point is evidenced by [12], where LSTM and other deep learning models were used for runoff prediction and exploiting temporal dependencies was shown to improve predictive accuracy to some extent. In 2000, Felix Gers and his colleagues found that if the internal state of LSTM is not updated when processing input time series, the network eventually collapses [13]. Therefore, they introduced the forget gate mechanism to update and reset its state [14].
The introduction of the forget gate mechanism handles the time dependencies between elements reasonably effectively, but the forget gate in the LSTM unit acts on the input data $x_t$ and the hidden state $h_{t-1}$, without considering the long-term state $C_{t-1}$. Additionally, the experiments in [15] also demonstrated that the forget gate is the only crucial gate in LSTM. Meanwhile, Felix Gers and others [13] introduced peephole connections, giving rise to peephole LSTM. However, it was pointed out in [14] that, in most cases, peephole LSTM does not perform as well as LSTM. Yet, those experiments also indicate that the input gate and output gate are crucial gating mechanisms for finding better-performing LSTM units, and that coupling the forget gate and input gate achieves performance comparable to the standard LSTM. Therefore, considering that the forget gate is a key gating mechanism in LSTM, improving the input and output gates may lead to better results. This paper redesigns the internal structure of the LSTM unit and introduces a variant LSTM structure. Compared to the existing LSTM and peephole LSTM, the variant LSTM builds on the idea that the forget gate is the key gate: it retains the original forget gate while coupling the input gate with the forget gate. This improves the information utilization efficiency of the variant LSTM unit relative to the standard LSTM unit. Additionally, although peephole LSTM performs poorly in many experiments, its idea is still insightful. Therefore, the variant LSTM also takes the long-term state $C_{t-1}$ into account, but handles it through a "simple" forget gate. As a result, compared to peephole LSTM, the variant LSTM focuses its structural improvements on the input gate and forget gate, making the structure simpler. In the backpropagation process, the number of parameters that need to be learned is also reduced, which lowers the computational time of the model. Later, some scholars further modified LSTM: in [16], LSTM was integrated with a CNN and an attention mechanism was introduced, demonstrating the feasibility of the model and improving its accuracy.
Additionally, when there is significant data noise, the predictive accuracy of LSTM for stock prices is greatly affected. Inspired by the human visual system, attention mechanisms in machine learning have been developed for a long time [17] and are now widely applied in various deep learning tasks, such as natural language processing, speech recognition, image recognition, and the processing of time series data. An attention mechanism is an information allocation mechanism that simulates the attention of the human brain. It can assign different weights to the input data of the model, highlighting the importance of useful data and weakening the importance of less relevant data, thereby reducing the impact of irrelevant parts. It was initially proposed by Bahdanau et al. [18] and applied to machine translation, where a probability distribution over the inputs is computed to obtain the attention distribution. They later proposed the soft attention mechanism and the hard attention mechanism [19]. Other researchers constructed an LSTM with a soft attention mechanism and applied it to streamflow prediction in a Canadian basin, finding an improvement in accuracy; the authors of [20] applied an LSTM with a soft attention mechanism to the prediction of photovoltaic generation, finding the new model to be more effective and robust than a multi-layer perceptron (MLP) and traditional LSTM models. We also chose the soft attention mechanism because it is deterministic and fully differentiable. The hard attention mechanism is an uncertain mechanism; its process is stochastic, and during decoding the system does not use all hidden states but randomly selects some of them [21]. Because the soft attention mechanism is fully differentiable, its gradients can be computed directly and integrated seamlessly into gradient-based training.
In response to the low prediction accuracy and poor robustness of current stock price forecasting methods, and considering the high volatility of stock data, the AMV-LSTM stock price prediction model is proposed. This paper improves the traditional LSTM neural network into a variant and seamlessly integrates an attention mechanism with it. The attention mechanism layer is placed in front of the variant LSTM, so the input data pass through weight layers and a normalization layer to obtain scores. These scores are then multiplied element-wise with the original data vectors to assign weights, thereby forming new data vectors. Finally, these new data vectors are used as inputs to the variant LSTM, serving the purpose of noise reduction and forming a new neural network model. The model emphasizes the importance of different features in stock data, not only learning the temporal dependencies of input sequences but also effectively utilizing the interdependence among data sequences. This helps improve the accuracy and stability of stock price prediction.
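To make this weighting step concrete, the following minimal PyTorch sketch illustrates a soft attention layer of the kind described above: a weight layer produces scores, a Softmax layer normalizes them, and the normalized scores reweight the original input vectors before they enter the variant LSTM. The class and parameter names are illustrative assumptions rather than the exact formulation given later in Equations (3)–(5).

```python
import torch
import torch.nn as nn

class SoftAttentionLayer(nn.Module):
    """Illustrative soft attention layer: weight layer -> Softmax -> reweighted input."""

    def __init__(self, num_features: int):
        super().__init__()
        # Weight layer that produces one score per input feature.
        self.score_layer = nn.Linear(num_features, num_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time_steps, num_features).
        scores = self.score_layer(x)              # scores from the weight layer
        weights = torch.softmax(scores, dim=-1)   # normalization layer
        return weights * x                        # weighted (new) data vectors fed to the LSTM
```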
Based on the above description, the contributions of this paper are summarized as follows:
To better utilize the temporal dependencies between elements in time series, this paper improves the internal structure of LSTM: the forget gate is coupled with the input gate, which simplifies the LSTM and enhances its efficiency. In addition, a simple forget gate is applied to forget the long-term state $C_t$, relieving the pressure on the LSTM to transmit long-term state information by filtering new inputs and historical state elements.
To enhance the model’s robustness, an attention mechanism is integrated into the improved LSTM, proposing a variant LSTM model with integrated attention mechanism (AMV-LSTM). This improves the accuracy of stock price prediction by mitigating the impact of significant data noise. Compared to LSTM with integrated attention mechanism (AM-LSTM), it further improves the model’s generalization ability, robustness, and convergence speed.
Applying AMV-LSTM to stock price prediction yields higher accuracy, providing valuable references for predicting stock prices in the financial market.
3. AMV-LSTM Model
In this section, the structure and algorithm of the AMV-LSTM model will be fully introduced.
3.1. Variant LSTM
The structure of the variant LSTM unit is shown in Figure 4.
The design of the variant LSTM was inspired by the classic LSTM and the peephole LSTM. Since the classic LSTM is relatively sensitive and has poor noise-handling capabilities, the variant LSTM initially couples the forget gate and the input gate on the basis of the classic LSTM model. This coupling establishes a fixed connection between the input gate and the forget gate, enabling the model to no longer make separate decisions when handling old and new information, but to consider both simultaneously. In other words, information is only forgotten when new input is available, and vice versa. This means that input and forget operations occur simultaneously until the information is transmitted to the cell state. When the input gate and the forget gate make independent decisions, the input gate needs to input and transmit all the information without the ability to filter the information. However, the structure after coupling reduces the input information for the input gate, thereby alleviating the processing pressure on the input gate. This reduction is achieved through the filtering by the forget gate and is not a blind random filtration of information. This approach helps to some extent in reducing noisy data and utilizing useful information more effectively, considering the substantial impact of noisy data on stock prices. Furthermore, compared to the LSTM, the coupled structure also reduces the number of parameters, which benefits model training, aids convergence, and ultimately helps improve the accuracy of stock price predictions.
Furthermore, a "simplified" forget gate is established for the long-term cell state $C_{t-1}$, enabling selective forgetting of long-term memory during the information transmission process and thereby enhancing the data noise handling capability. It is referred to as a "simplified" forget gate because it differs slightly from the classic forget gate $f_t$: it does not contain a bias term in the activation function. This results in one fewer parameter to learn when updating the model's parameters and speeds up the convergence of the model.
Because previous researchers' experiments indicate that peephole LSTM often performs worse than the classic LSTM, the variant LSTM uses separate forget gates for the hidden state $h_{t-1}$ (together with the input $x_t$) and for the long-term state $C_{t-1}$, rather than sharing them. However, drawing on the idea of peephole LSTM, the output of the "simple" forget gate is fed into the input gate. In this way, the variant LSTM takes the information of $C_{t-1}$ into account during propagation, thereby enhancing the efficiency of information transmission in each unit.
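As a rough illustration of the structure described in this subsection, the sketch below shows one plausible reading of a single variant LSTM step: the input gate is coupled to the forget gate, a bias-free "simple" forget gate filters the long-term state $C_{t-1}$, and the filtered state is fed into the input-side (candidate) computation as a stand-in for the connection to the input gate. All names and the exact gate arithmetic are assumptions; the authoritative equations are Equations (6)–(15).

```python
import torch
import torch.nn as nn

class VariantLSTMCell(nn.Module):
    """One step of the variant LSTM (an illustrative reading, not the exact Equations (6)-(15))."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.forget_gate = nn.Linear(input_size + hidden_size, hidden_size)   # classic forget gate
        self.output_gate = nn.Linear(input_size + hidden_size, hidden_size)   # output gate
        # Candidate state also receives the filtered long-term state (peephole-like idea).
        self.candidate = nn.Linear(input_size + 2 * hidden_size, hidden_size)
        # "Simple" forget gate on C_{t-1}: a weight matrix with no bias term.
        self.simple_forget = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x_t, h_prev, c_prev):
        z = torch.cat([x_t, h_prev], dim=-1)
        f_t = torch.sigmoid(self.forget_gate(z))          # forget gate on x_t and h_{t-1}
        s_t = torch.sigmoid(self.simple_forget(c_prev))   # bias-free gate on the long-term state
        c_filtered = s_t * c_prev                         # selectively forgotten long-term memory
        i_t = 1.0 - f_t                                   # input gate coupled to the forget gate
        c_tilde = torch.tanh(self.candidate(torch.cat([z, c_filtered], dim=-1)))
        c_t = f_t * c_filtered + i_t * c_tilde            # new long-term state
        o_t = torch.sigmoid(self.output_gate(z))
        h_t = o_t * torch.tanh(c_t)                       # new hidden state
        return h_t, c_t
```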
3.2. AMV-LSTM Structure
The structure of AMV-LSTM consists of an attention mechanism layer and a variant LSTM. In the model, the attention mechanism layer is located at the front of the variant LSTM, meaning that the input data first passes through the attention mechanism layer. After being processed by the attention mechanism layer, each element of the input data is effectively assigned weights. The processed vector is then input into the variant LSTM layer, entering each variant LSTM unit. Finally, the prediction values are obtained through a fully connected layer.
The structure of the AMV-LSTM model is shown in Figure 5.
The combination of the variant LSTM with the attention mechanism is aimed at enhancing the robustness and generalization capability of the variant LSTM. Additionally, compared to AM-LSTM, the structure of AMV-LSTM becomes simpler, primarily due to the design of the variant LSTM. The information transmission is more efficient, and the number of parameters that need to be learned during the backpropagation process is reduced. This effectively addresses the shortcomings of AM-LSTM.
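Under the same illustrative assumptions as the sketches above, the overall forward pass can be assembled as follows: the attention layer reweights the input sequence, the variant LSTM cell is unrolled over the time steps, and a fully connected layer maps the final hidden state to the predicted price.

```python
import torch
import torch.nn as nn

class AMVLSTM(nn.Module):
    """Illustrative AMV-LSTM: attention layer -> variant LSTM -> fully connected layer."""

    def __init__(self, num_features: int, hidden_size: int):
        super().__init__()
        self.attention = SoftAttentionLayer(num_features)       # sketched earlier
        self.cell = VariantLSTMCell(num_features, hidden_size)  # sketched earlier
        self.fc = nn.Linear(hidden_size, 1)                     # predicts one price value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time_steps, num_features).
        x = self.attention(x)                                    # reweight the inputs first
        batch, time_steps, _ = x.shape
        h = x.new_zeros(batch, self.fc.in_features)
        c = x.new_zeros(batch, self.fc.in_features)
        for t in range(time_steps):                              # unroll the variant LSTM
            h, c = self.cell(x[:, t, :], h, c)
        return self.fc(h)                                        # predicted price
```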
3.3. Forward Propagation Algorithm of AMV-LSTM
So far, the forward propagation of the AMV-LSTM model is given by Equations (6)–(15). Passing the hidden state calculated with Equations (6)–(15) through the fully connected layer yields the predicted value in Equation (16). The loss function $L$ is calculated using the mean squared error (MSE) in Equation (17), and the total error is obtained by accumulating this loss, as in Equation (18).
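As a small illustrative stand-in for the loss and total error described above (Equations (17) and (18)), the two quantities can be computed as:

```python
import torch

def sample_loss(y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
    """Mean squared error of the predictions (stand-in for Equation (17))."""
    return torch.mean((y_pred - y_true) ** 2)

def total_error(losses: list) -> torch.Tensor:
    """Accumulate the per-batch losses into a total error (stand-in for Equation (18))."""
    return torch.stack(losses).sum()
```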
3.4. Backpropagation Algorithm of AMV-LSTM
Backpropagation Through Time (BPTT) is a backpropagation algorithm for time series that updates parameters by gradient descent [25]. By iterating over the parameters to be updated, it minimizes the error and finds the optimal parameter values. When solving for the gradients during backpropagation, the loss function $L$ is required to calculate the gradient of each parameter. The LSTM involves 14 parameters to be updated: nine weight matrices and five offset vectors. Compared with LSTM, AMV-LSTM adds three weight matrices and one parameter vector, and removes one weight vector and one parameter vector; in total, AMV-LSTM needs to update 16 parameters.
3.4.1. Gradient Formulas for the Variant LSTM Layer Parameters
The sensitivity of the loss function to the hidden variable $h$ at time $t$ is denoted by $\delta_t$ and defined as $\delta_t = \partial L/\partial h_t$. The backpropagation formulas of the loss function $L$ with respect to the weight matrices and parameter vector of the output gate are Equations (19)–(21). The backpropagation formulas of $L$ with respect to the weight matrices and parameter vector of the forget gate are Equations (22)–(24). The backpropagation formula of $L$ with respect to the weight matrix of the "simple" forget gate is Equation (25). The backpropagation formulas of $L$ with respect to the weight matrices and parameter vector of the candidate cell state are Equations (26)–(28). Finally, the partial derivatives of $L$ with respect to the weight matrix and bias vector of the output layer are given by Equations (29) and (30).
3.4.2. Derivation of the Parameter Gradient Formula of the Attention Mechanism Layer
For the three parameters involved in the attention mechanism layer, the partial derivatives of the loss function $L$ are obtained from the relationships among the parameters: the partial derivative of $L$ with respect to the first attention weight matrix is Equation (31), and the partial derivatives with respect to the other two attention parameters are Equations (32) and (33). In Equation (33), $\mathrm{diag}(a)$ denotes the diagonal matrix whose diagonal elements are the elements of the vector $a$, and the other factor is a partial derivative with respect to the attention weights obtained from the Softmax layer; its formula is Equation (34). The intermediate terms appearing in Equation (31) and Equation (33) are calculated by Equations (35) and (36), respectively. In Equation (34), the subscript $i$ indexes the $i$-th elements of the corresponding vectors, and the summation range is determined by the dimension of the input vector $x$.
At this point, all backpropagation formulas of AMV-LSTM are derived. After the corresponding gradient of each parameter is obtained, the Adam gradient descent method is used to update the parameters.
Taking the first attention parameter as an example, its gradient after the Adam update is calculated by Equations (37)–(41). In Equation (37), the raw gradient is computed from Equation (31). The initial values of the first- and second-moment estimates $m_0$ and $v_0$ are both set to 0. The parameters $\alpha$, $\beta_1$, $\beta_2$, and $\varepsilon$ are set experimentally. Usually, $\beta_1$ and $\beta_2$ are the exponential decay rates of the moment estimates, while $\varepsilon = 1 \times 10^{-8}$ is a very small number used to prevent division by zero. $\alpha$ is the learning rate of Adam, which is equal to the global learning rate in this paper [26].
The iterative update formula for the parameter at each time step is Equation (42). In Equation (42), $i$ represents the number of iterations, and the constant in the formula is set to 0.64 in this case. By using Equations (37)–(42), the iterative update of the gradient can be achieved, and similar methods are applied to update the other parameters.
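For reference, a standard Adam update of a single parameter, using the quantities described above (moment estimates initialized to zero, decay rates $\beta_1$ and $\beta_2$, $\varepsilon = 10^{-8}$, and learning rate $\alpha = 0.64$), is sketched below; this is the textbook form of Adam rather than a transcription of Equations (37)–(42).

```python
import numpy as np

def adam_step(param, grad, m, v, t, alpha=0.64, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update step (illustrative; hyperparameter values assumed)."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                  # bias-corrected second moment
    param = param - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Usage: m and v start as zero arrays, and t counts update steps starting from 1.
# W, m, v = adam_step(W, dW, m, v, t=1)
```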
3.5. Algorithm: AMV-LSTM Model
The algorithmic process of the AMV-LSTM model is as follows (a condensed code sketch appears after the steps):
Step 1: Set the batch size, dimension of the hidden layer, time steps, Adam learning rate for AMV-LSTM, and determine the dimension of the input feature vectors. Partition the experimental data into training and testing samples and preprocess the data. Provide the training samples along with their corresponding labels y.
Step 2: Feed the training samples into the attention mechanism layer and obtain the weighted inputs according to Equations (3)–(5).
Step 3: Compute the hidden states by applying the forward propagation algorithm to the weighted inputs based on Equations (6)–(16), and obtain the loss from Equation (17).
Step 4: Compute the gradients of the loss function with respect to the parameters of the variant LSTM layer using Equations (19)–(30), and obtain the gradients of the loss function with respect to the parameters of the attention mechanism layer using Equations (31)–(36).
Step 5: Use the Adam gradient descent method to iteratively update each parameter based on Equations (37)–(42).
Step 6: According to the set number of iterations, determine the termination condition. If the number of iterations is less than or equal to the specified number, return to Step 4; otherwise, terminate the iteration.
Step 7: Train the pre-trained AMV-LSTM model using the training data, make predictions on the test set, and then denormalize the predicted data to obtain the final prediction results.
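A condensed sketch of Steps 2–6, using the illustrative AMV-LSTM module sketched earlier and PyTorch's built-in Adam optimizer in place of the hand-derived gradient formulas (Equations (19)–(42)):

```python
import torch
import torch.nn as nn

def train_amv_lstm(model, train_x, train_y, epochs=100, lr=0.64, batch_size=500):
    """Illustrative training loop for the AMV-LSTM sketch."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for epoch in range(epochs):
        for start in range(0, len(train_x), batch_size):
            xb = train_x[start:start + batch_size]   # (batch, time_steps, num_features)
            yb = train_y[start:start + batch_size]   # (batch, 1)
            optimizer.zero_grad()
            pred = model(xb)                          # Steps 2-3: attention + forward pass
            loss = loss_fn(pred, yb)
            loss.backward()                           # Step 4: gradients
            optimizer.step()                          # Step 5: Adam update
    return model
```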
4. Experiments
In this section, to validate the superiority of AMV-LSTM, a simple comparison is first conducted by comparing the variant LSTM with the classical LSTM model. Subsequently, a comparison is made between AMV-LSTM and LSTM models incorporating attention mechanisms. The precision of AMV-LSTM in predicting the stock price for the next 1 day, the accuracy in predicting the next 2 days, and the convergence speed of AMV-LSTM are studied separately. Ultimately, it is concluded that the performance of AMV-LSTM is more outstanding.
The data used in the experiment are the stock data of China Ping An Bank, sourced from the Uminer platform; it is a representative stock in the financial industry. The time span of the dataset is from 19 January 2007 to 12 April 2022, comprising a total of 3590 data points. Fourteen stock features were selected, including the opening price, highest price, lowest price, and closing price, as well as the 5-day, 10-day, 20-day, 60-day, and 120-day exponential moving average indicators and the 5-day, 10-day, 20-day, 60-day, and 120-day simple moving average lines. The time step is set to 10, predicting the stock price for the 11th day based on the stock prices of the preceding 10 days.
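For illustration, the supervised samples can be constructed with a sliding window as in the following sketch (array names are assumptions; the target is the next day's closing price):

```python
import numpy as np

def make_windows(features: np.ndarray, close: np.ndarray, time_steps: int = 10):
    """Build (samples, time_steps, num_features) inputs and next-day closing-price targets."""
    x, y = [], []
    for i in range(len(features) - time_steps):
        x.append(features[i:i + time_steps])   # features of the 10 preceding days
        y.append(close[i + time_steps])        # closing price on the 11th day
    return np.array(x), np.array(y)

# features: array of shape (3590, 14); close: array of shape (3590,)
```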
During preprocessing, we first checked whether there were missing values or anomalies among the 3590 data points. Since the recorded opening price is zero on non-trading days, we removed data points with an opening price of zero to ensure data integrity. Subsequently, considering that the actual opening price of a stock on a given day may not equal the previous day's closing price because of factors such as stock dividends, we adjusted the stock data. Two methods, forward adjustment and backward adjustment, are commonly used for stock price adjustment: forward adjustment keeps the current price unchanged and lowers the pre-adjustment historical prices, while backward adjustment keeps the historical prices unchanged and raises the subsequent, adjusted prices. Both methods effectively correct stock prices. In this study, we applied backward adjustment to correct the stock prices.
The specific steps are as follows:
- (1) Determine the ex-dividend date of the stock, i.e., the date announced by the company on which the dividend takes effect.
- (2) Calculate the ex-dividend factor for each day prior to the ex-dividend date. The ex-dividend factor is the ratio of the stock price before the ex-dividend date to the price after the ex-dividend date, calculated by Equation (43), in which EDF refers to the ex-dividend factor, CD refers to the cash dividend, and BTED refers to the stock price on the day before the ex-dividend date.
- (3) Use the ex-dividend factors to adjust all prices after the ex-dividend date: multiply each day's stock price by the corresponding ex-dividend factor to obtain the adjusted price.
- (4) Based on the stock's ex-dividend history, repeat steps 2 and 3 until all ex-dividend factors have been applied to their respective dates (see the sketch after this list).
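A minimal sketch of the adjustment steps above. The factor formula is an assumption consistent with the description of Equation (43) (the ratio of the price on the day before the ex-dividend date to that price minus the cash dividend); the exact formula in Equation (43) may differ.

```python
import numpy as np

def backward_adjust(prices, ex_dates, dividends):
    """Backward adjustment of a daily price series (illustrative).

    prices: 1-D array of raw daily prices in chronological order.
    ex_dates: indices of ex-dividend dates; dividends: cash dividend (CD) for each of those dates.
    """
    prices = np.asarray(prices, dtype=float)
    adjusted = prices.copy()
    for idx, cd in zip(ex_dates, dividends):
        bted = prices[idx - 1]          # price on the day before the ex-dividend date (BTED)
        edf = bted / (bted - cd)        # assumed form of the ex-dividend factor (Equation (43))
        adjusted[idx:] *= edf           # raise all prices from the ex-dividend date onward
    return adjusted
```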
Afterwards, the dataset was split into a training set and a test set. In this study, 3000 data points were selected as the training set, and 500 data points were chosen from the remaining data as the test set. After the dataset division, due to the impact of measurement units and variance among stock features, it is necessary to normalize the data. The normalization formula is as follows:
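A sketch of the split and normalization, assuming the common min–max scaling to [0, 1] with statistics taken from the training set (an assumption; the exact normalization formula is the one referenced above):

```python
import numpy as np

def split_and_normalize(data: np.ndarray, n_train: int = 3000, n_test: int = 500):
    """Split into train/test sets and min-max normalize each feature using training statistics."""
    train, test = data[:n_train], data[n_train:n_train + n_test]
    col_min = train.min(axis=0)
    col_max = train.max(axis=0)
    scale = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard against constant columns
    train_norm = (train - col_min) / scale
    test_norm = (test - col_min) / scale
    return train_norm, test_norm, col_min, scale
```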
The evaluation criteria for the model include the Mean Squared Error (MSE), the Mean Absolute Error (MAE), and the Coefficient of Determination ($R^2$). MSE, MAE, and $R^2$ are metrics used to gauge the fitting degree and prediction accuracy of the model. For MSE and MAE, smaller values are better, while for the Coefficient of Determination ($R^2$), a value closer to 1 is preferable. Market volatility affects the model's predictive performance, as reflected in these metrics. When the model exhibits stronger robustness, even in the presence of significant market volatility, it can still capture the nonlinear relationships in stock price data. Consequently, MSE and MAE decrease, while $R^2$ increases. Comparing models, the more significant the reductions in MSE and MAE, the stronger the model's robustness and its ability to capture nonlinear data relationships, resulting in higher predictive accuracy of stock prices and greater increases in $R^2$. The formulas for calculating these three evaluation metrics are as follows:
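In code, these three metrics follow their standard definitions (function and argument names are illustrative):

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray):
    """Standard MSE, MAE, and coefficient of determination (R^2)."""
    mse = np.mean((y_true - y_pred) ** 2)
    mae = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2
```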
4.1. Comparison between Variant LSTM and Classic LSTM
The classic LSTM is highly unstable when predicting stock prices from 14-dimensional stock features. It is prone to producing ineffective predictions under the influence of data noise and, even when it can handle the data, it tends to overfit. In contrast, the variant LSTM is more stable in predicting stock prices and offers some relief from overfitting (Figure 6).
In the experiment, the Adam learning rate is set to 0.64; the model performs best at this value, and stable predictive performance is observed for learning rates between 0.6 and 0.75. Additionally, the optimal batch size is 500, and 100 rounds of iteration are conducted in each run. From the performance chart, it is evident that, compared to the classic LSTM, the variant LSTM yields more accurate predictions on the test set. To confirm that the "simple" forget gate indeed selectively forgets information, a heatmap of the weights of the "simple" forget gate is plotted (Figure 7). This visualization provides a more intuitive view of the distribution of the weight values.
From the graph, it is evident that the "simple" forget gate filters the information within $C_{t-1}$; most of the information can be selectively forgotten, thus alleviating the burden that an excess of transmitted information places on the LSTM.
Furthermore, although the classic LSTM performs reasonably well in predicting stable stock prices, it struggles to capture sudden spikes or drops in stock prices in a timely manner. While the variant LSTM shows significant improvement in both stable and unstable stock prices compared to the classic LSTM, its accuracy still does not reach a very high level.
4.2. Comparison Experiment of AMV-LSTM and AM-LSTM
4.2.1. Predicting Future Stock Prices for 1 Day
Due to the variant LSTM’s relatively low accuracy in predicting stock prices, an attention mechanism layer is added in front of the variant LSTM. This forms the AMV-LSTM, which first predicts the stock price for the next day.
Comparing the prediction performance shown in Figure 8 with that of the variant LSTM, it is evident that AMV-LSTM achieves significantly improved accuracy on both the training and test sets.
Comparing Table 1 and Table 2, it can be observed that, after 20 rounds of iteration, the accuracy of AMV-LSTM has increased by 6.7727% on the training set and by 18.8125% on the test set compared to the variant LSTM. Additionally, when compared to AM-LSTM, the performance of AMV-LSTM remains superior, especially on the test set, where the accuracy further increases by approximately 3%.
From Figure 9 and Figure 10, it can be seen that, when facing significant fluctuations in stock prices, AMV-LSTM is more adaptable to such abrupt changes than AM-LSTM. This demonstrates that the information utilization capability of AMV-LSTM is stronger.
To verify the applicability of the model to data from different industries and to highlight the effect of integrating the attention mechanism into the variant LSTM, we used AMV-LSTM to predict the stock prices of American companies such as IBM and Ford. We compared these results with those obtained using LSTM and the variant LSTM and found that the performance of the AMV-LSTM model remained superior. The comparative plots of the models' predictive performance are shown below (Figure 11 and Figure 12):
Furthermore, the comparison between AMV-LSTM and the more advanced GRU model is presented in the table below. Table 3 shows that, compared to GRU, the overall performance metrics of AMV-LSTM, including MAE, MSE, and $R^2$, are still superior on both the training and testing datasets.
4.2.2. Predicting Stock Prices for the Next 2 Days
To assess the generalization ability of AMV-LSTM, stock price predictions for the next 2 days were conducted using both AMV-LSTM and AM-LSTM (Figure 13). This involves forecasting the stock price for the 12th day using the stock prices of the preceding 10 days.
Based on the effectiveness charts of the stock price predictions for the next 2 days, both AMV-LSTM and AM-LSTM show a certain decline in predictive capability (Table 4). However, the predictive performance of AMV-LSTM is still superior to that of AM-LSTM.
The various evaluation indicators of AMV-LSTM, although showing a slight decrease in predicting the next 2 days, still maintain an accuracy of over 80%. However, the impact on the prediction accuracy of AM-LSTM is comparatively larger, especially in the performance on the test set. This validates that the generalization ability of AMV-LSTM is stronger compared to AM-LSTM. Overall, considering the comprehensive performance, AMV-LSTM still outperforms AM-LSTM.
4.2.3. Comparison of Convergence Speed
In order to compare the convergence speed of AMV-LSTM and AM-LSTM, the Mean Squared Error (MSE) in the testing set of both models is set to 0.0015. When the model’s MSE reaches 0.0015, the iteration is terminated, and the number of iterations as well as the time taken for both models are recorded. Experimental results show that AMV-LSTM requires only 11 iterations, taking 321.496 s, while AM-LSTM requires 13 iterations, taking 379.821 s.
To compare the computational complexity of the three models, we iterated LSTM, AM-LSTM, and AMV-LSTM 10 times, respectively, and recorded the time taken by each model for each iteration.
Table 5 shows that LSTM took a total of 47.2695 s for 10 iterations, with an average time of 4.72695 s per iteration. AM-LSTM took 293.034 s for 10 iterations, averaging 29.3034 s per iteration. AMV-LSTM took 291.576 s for 10 iterations, averaging 29.1576 s per iteration. Compared to LSTM, AMV-LSTM took approximately 25 s more per iteration, but achieved a nearly 77% improvement in prediction accuracy on the test set. Despite the increase in time, there was a significant improvement in accuracy. Compared to AM-LSTM, AMV-LSTM reduced the time per iteration by 0.2 s, showing a slight decrease in time, while still achieving an approximately 3% increase in accuracy on the test set. This reflects that the computational complexity of AMV-LSTM is higher than LSTM but lower than AM-LSTM.
5. Conclusions
The prediction of stock prices has always been a complex and highly challenging issue. In order to enhance the accuracy of stock price prediction as much as possible, the AMV-LSTM model was developed in this study. We first improved the structure of LSTM and designed a variant of LSTM, drawing inspiration from both classic LSTM and peephole LSTM. The variant LSTM integrates the forget gate and input gate, and introduces a “simplified” forget gate. Experimental results demonstrate that the robustness of the variant LSTM has been enhanced. The added “simplified” gate effectively filters data information, improving the efficiency of information utilization, and effectively addressing LSTM overfitting. This will directly contribute to the improvement of stock price prediction accuracy in real-world scenarios.
Additionally, an attention mechanism layer was added to the variant LSTM, resulting in the design of AMV-LSTM, for which the backpropagation formula was derived. Compared to the variant LSTM, AMV-LSTM shows overall performance improvement. Compared to AM-LSTM, AMV-LSTM demonstrates stronger robustness, excelling not only in predicting the stock price for the next day but also maintaining superior overall performance in predicting the stock price for the next two days, indicating enhanced generalization ability. This implies a longer-term reliability and predictive capability in forecasting market trends. From a decision support perspective, providing forecasts for the next two days' stock prices helps investors reduce decision-making blindness and randomness, thereby enhancing the accuracy and efficiency of investment decisions and facilitating the formulation of investment strategies. In terms of managing trading risks, it enables better avoidance of market risks, timely adjustment of investment portfolios, and higher investment returns. Compared to other benchmark models and variant models, AMV-LSTM exhibits lower Mean Squared Error (MSE) for stock price prediction than the LSTM-CNN and LSTM+CNN+CBAM models presented in [16].
Finally, in terms of convergence speed, experimental results show that AMV-LSTM requires fewer iterations than AM-LSTM to achieve the same level of accuracy, indicating that AMV-LSTM converges faster. A faster convergence speed implies that less computational resources and time are required for real-time predictions, making investors’ decisions more timely and enhancing the practicality of forecasting.
In the future, integrating the model with specific investment strategies could potentially yield higher profits for investors and enhance risk management capabilities by providing timely alerts. Additionally, applying this model to other similar fields such as runoff prediction and short-term electricity forecasting is also a viable option worth exploring. These avenues can be considered in future work.