Article

A Stock Prediction Method Based on Heterogeneous Bidirectional LSTM

School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(20), 9158; https://doi.org/10.3390/app14209158
Submission received: 19 August 2024 / Revised: 26 September 2024 / Accepted: 8 October 2024 / Published: 10 October 2024
(This article belongs to the Special Issue Advances in Neural Networks and Deep Learning)

Abstract

LSTM (long short-term memory) networks have been proven effective in processing stock data. However, LSTM is not very stable: it is strongly affected by data fluctuations and weak at capturing long-term dependencies in sequential data. BiLSTM (bidirectional LSTM) alleviates this issue to some extent; however, because information is transmitted inefficiently within the LSTM units themselves, the generalization performance and accuracy of BiLSTM remain unsatisfactory. To address this problem, this paper improves the LSTM units underlying traditional BiLSTM and proposes He-BiLSTM (heterogeneous bidirectional LSTM), together with its backpropagation algorithm. The parameters of He-BiLSTM are updated using the Adam gradient descent method. Experimental results show that, compared to BiLSTM, He-BiLSTM further improves accuracy, robustness, and generalization performance.
MSC:
37N99

1. Introduction

The exploration of stock price prediction models has long been a significant topic. From traditional forecasting methods such as the Autoregressive Integrated Moving Average (ARIMA), Generalized Autoregressive Conditional Heteroskedasticity (GARCH), and Vector Autoregressive Moving Average (VARMA) models (Atsalakis & Valavanis [1]) to Machine Learning (ML) and, nowadays, Artificial Neural Networks (ANNs) (Kao et al. [2]), numerous scholars have continually attempted to improve models for predicting stock prices. In 2019, Shahzad [3] constructed a model using Support Vector Regression (SVR), a nonlinear time series forecasting method, for short-term stock index prediction. In 2018, Ao and Zhu [4] used Support Vector Machine (SVM) techniques to build a model predicting the movements of the CSI 300 Index, demonstrating the effectiveness of SVM in stock index prediction.
Compared to traditional machine learning algorithms, neural network models can effectively address the non-stationary and nonlinear characteristics of stock price data (Zhang and Lou [5]). Liu et al. (2019) [6] used a BP neural network model to predict stock price fluctuations. Kong et al. (2021) [7] employed a GRU neural network model to predict Amazon spot prices, verifying the applicability of neural network models in the financial domain [8].
In particular, LSTM networks are highly favored for their ability to address the vanishing gradient problem common in RNNs. This is attributed to their internal hidden memory cells [9], which store past information and use it, together with the latest input, to predict sequences, and which automatically decide when to delete information from memory (Gers, Schraudolph, & Schmidhuber, 2003 [10]; Hochreiter & Schmidhuber, 1997 [11]), thus maintaining long-term dependencies.
BiLSTM, an updated variant of LSTM, has been utilized in both classification and regression problems (Peng et al., 2021 [12]). BiLSTM combines forward and backward LSTM, achieving additional training by traversing the input data from left to right and from right to left. Some studies (Cheng, Ding, Zhou, & Ding, 2019 [13]; Kulshrestha, Krishnaswamy, & Sharma, 2020 [14]) have demonstrated its effectiveness in handling long-term dependencies. It primarily achieves this through back-to-back encoding, allowing for the extraction of more hidden information from the data.
However, due to certain issues with the gating mechanism of LSTM units in processing data information, some gating mechanisms are not designed to be sufficiently streamlined, which may limit the efficiency of BiLSTM in processing data. Experimental results indicate that the input gate and output gate mechanisms in LSTM are crucial, and coupling the forget gate and input gate can achieve performance comparable to LSTM. The coupled structure can enhance the data processing capability of LSTM units. The He-BiLSTM takes advantage of the BiLSTM architecture and a modified neural network unit, replacing the traditional LSTM units in the BiLSTM framework with the modified neural network units, thereby influencing the overall model through these modified units and subsequently optimizing it.
Based on this situation, this paper optimizes the LSTM design and proposes a He-BiLSTM, aiming to improve the data processing efficiency of BiLSTM and enhance the overall model performance.
The contributions of this paper are summarized as follows:
  • To further enhance the robustness, accuracy, and generalization capability of the BiLSTM model, a novel LSTM unit structure is adopted to reconstruct BiLSTM. By replacing the traditional neural units in BiLSTM, this paper constructs a novel bidirectional neural network structure (He-BiLSTM) with higher data transmission efficiency, enabling the model to better handle the non-stationary and nonlinear characteristics of stock data. Consequently, the model’s ability to handle noise is improved, enhancing overall performance. This paper also presents the backpropagation algorithm for He-BiLSTM.
  • Applying the He-BiLSTM to stock price prediction results in higher accuracy, providing significant reference value for predicting stock prices in financial markets.

2. Related Work

Since the construction of the He-BiLSTM is based on the improved design of LSTM and BiLSTM, this section will first introduce these two aspects.

2.1. LSTM (Long Short-Term Memory) Neural Networks

LSTM, proposed by Hochreiter and Schmidhuber [11] in 1997, is a variant of the RNN. It was introduced because training standard Recurrent Neural Networks (RNNs) on problems with long-term temporal dependencies presents numerous challenges. Recent studies (e.g., Abe and Nakayama, 2018 [15]; Fischer & Krauss, 2018 [16]; Kraus & Feuerriegel, 2017 [17]; Minami, 2018 [18]; Choi, 2018 [19]) have used deep learning methods to predict financial time series data, involving techniques such as LSTM and BiLSTM, with these models achieving notable predictive performance.
The significance of LSTM neural networks lies in the gate mechanisms present in each of their units: the forget gate, input gate, and output gate. These three gates play different roles in processing data information. The forget gate determines which information from the input should be discarded, the input gate decides which data information will be modified, and the output gate determines which data information will be passed to the next neural unit. The specific structure of an LSTM neural unit is shown in Figure 1:
The relevant gating mechanism equations are as follows:
f_t = \sigma(W_{xf} \cdot x_t + W_{hf} \cdot h_{t-1} + b_f)    (1)
i_t = \sigma(W_{xi} \cdot x_t + W_{hi} \cdot h_{t-1} + b_i)    (2)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    (3)
\tilde{c}_t = \tanh(W_{xc} \cdot x_t + W_{hc} \cdot h_{t-1} + b_c)    (4)
o_t = \sigma(W_{xo} \cdot x_t + W_{ho} \cdot h_{t-1} + b_o)    (5)
h_t = o_t \odot \tanh(c_t)    (6)
Among these, Equation (1) is the calculation formula for the forget gate f_t, Equation (2) for the input gate i_t, and Equations (3) and (4) for the cell state c_t, which carries the long-term memory of the data. Equation (5) is the calculation formula for the output gate o_t, and Equation (6) gives h_t, the hidden-state output passed to the next unit. In Equations (1)–(6), \sigma denotes the sigmoid activation function, \odot the Hadamard product, and x_t the input vector of the current unit. W_{x\cdot} and W_{h\cdot} denote the weight matrices for the input vector x_t and the previous hidden state h_{t-1}, respectively, and b_\cdot denotes the bias vectors.
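As a concrete illustration, the forward step of Equations (1)–(6) can be sketched in NumPy (the language the paper's implementation uses). The parameter names and dictionary layout here are illustrative assumptions, not the authors' code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM forward step implementing Equations (1)-(6).
    p is a dict of weights (assumed shapes: W_x*: (H, D), W_h*: (H, H), b_*: (H,))."""
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])      # forget gate (1)
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])      # input gate (2)
    c_tilde = np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])  # candidate (4)
    c_t = f_t * c_prev + i_t * c_tilde                                   # cell state (3)
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])      # output gate (5)
    h_t = o_t * np.tanh(c_t)                                             # hidden state (6)
    return h_t, c_t
```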

2.2. BiLSTM (Bidirectional LSTM) Network

The core of the BiLSTM architecture lies in utilizing two LSTM layers to process the input sequence in both forward and backward directions, enabling bidirectional information flow. This design allows the model to simultaneously capture historical and future information within the sequence. In non-financial data domains, scholars (Fan, Qian, Xie, & Soong, 2014 [20]; Graves, Jaitly, & Mohamed, 2013 [21]; Graves & Schmidhuber, 2005 [22]; Schuster & Paliwal, 1997 [23]; Siami-Namini, Tavakoli, & Namin, 2019 [24]) have found that its performance surpasses that of LSTM. Later, Ren et al. found that the characteristics of the BiLSTM model make it more suitable than LSTM for nonlinear data such as stock prices.
Figure 2 illustrates the structure of the BiLSTM framework. Since forward propagation proceeds in two directions, we call the front-to-back pass the positive direction and the back-to-front pass the negative direction; this distinguishes the reversed pass of forward propagation from the concept of backpropagation. Compared to the LSTM, the structure of the BiLSTM is more complex. An LSTM unit has 12 internal parameters involved in backpropagation updates: W_{x\cdot}, W_{h\cdot}, and b_\cdot. The BiLSTM, however, carries this set of parameters in both the positive and the negative direction, making a total of 24. For clarity, the parameters of the positive-direction unit are denoted \overrightarrow{W}_{x\cdot}, \overrightarrow{W}_{h\cdot}, and \overrightarrow{b}_\cdot, and those of the negative-direction unit \overleftarrow{W}_{x\cdot}, \overleftarrow{W}_{h\cdot}, and \overleftarrow{b}_\cdot.
The forward propagation calculations of the BiLSTM are likewise divided into two parts: the positive-direction LSTM and the negative-direction LSTM. The calculation formulas for the positive-direction LSTM are Equations (7)–(12).
\overrightarrow{f}_t = \sigma(\overrightarrow{W}_{xf} \cdot x_t + \overrightarrow{W}_{hf} \cdot \overrightarrow{h}_{t-1} + \overrightarrow{b}_f)    (7)
\overrightarrow{i}_t = \sigma(\overrightarrow{W}_{xi} \cdot x_t + \overrightarrow{W}_{hi} \cdot \overrightarrow{h}_{t-1} + \overrightarrow{b}_i)    (8)
\overrightarrow{\tilde{c}}_t = \tanh(\overrightarrow{W}_{xc} \cdot x_t + \overrightarrow{W}_{hc} \cdot \overrightarrow{h}_{t-1} + \overrightarrow{b}_c)    (9)
\overrightarrow{c}_t = \overrightarrow{f}_t \odot \overrightarrow{c}_{t-1} + \overrightarrow{i}_t \odot \overrightarrow{\tilde{c}}_t    (10)
\overrightarrow{o}_t = \sigma(\overrightarrow{W}_{xo} \cdot x_t + \overrightarrow{W}_{ho} \cdot \overrightarrow{h}_{t-1} + \overrightarrow{b}_o)    (11)
\overrightarrow{h}_t = \overrightarrow{o}_t \odot \tanh(\overrightarrow{c}_t)    (12)
The calculation formulas for the negative direction LSTM are (13)–(18):
\overleftarrow{f}_t = \sigma(\overleftarrow{W}_{xf} \cdot x_t + \overleftarrow{W}_{hf} \cdot \overleftarrow{h}_{t+1} + \overleftarrow{b}_f)    (13)
\overleftarrow{i}_t = \sigma(\overleftarrow{W}_{xi} \cdot x_t + \overleftarrow{W}_{hi} \cdot \overleftarrow{h}_{t+1} + \overleftarrow{b}_i)    (14)
\overleftarrow{\tilde{c}}_t = \tanh(\overleftarrow{W}_{xc} \cdot x_t + \overleftarrow{W}_{hc} \cdot \overleftarrow{h}_{t+1} + \overleftarrow{b}_c)    (15)
\overleftarrow{c}_t = \overleftarrow{f}_t \odot \overleftarrow{c}_{t+1} + \overleftarrow{i}_t \odot \overleftarrow{\tilde{c}}_t    (16)
\overleftarrow{o}_t = \sigma(\overleftarrow{W}_{xo} \cdot x_t + \overleftarrow{W}_{ho} \cdot \overleftarrow{h}_{t+1} + \overleftarrow{b}_o)    (17)
\overleftarrow{h}_t = \overleftarrow{o}_t \odot \tanh(\overleftarrow{c}_t)    (18)
In BiLSTM, the input vector at time t is x_t (where t = 1, 2, ..., \tau), with the initial hidden states of both directions being zero vectors. The final transmitted states \overrightarrow{h}_\tau and \overleftarrow{h}_0 from the two directions are combined via the Hadamard product to obtain h_\tau. The calculation formula is given by Equation (19), as follows:
h_\tau = \overrightarrow{h}_\tau \odot \overleftarrow{h}_0    (19)
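A minimal sketch of this bidirectional pass and the Hadamard combination of Equation (19), with hypothetical step functions standing in for the two LSTM directions:

```python
import numpy as np

def bilstm_forward(xs, step_fwd, step_bwd, H):
    """Run a sequence through positive- and negative-direction LSTMs and combine
    the final hidden states elementwise, as in Equation (19).
    step_fwd/step_bwd are assumed step functions with signature (x_t, h, c) -> (h, c);
    xs is the list of input vectors x_1..x_tau; H is the hidden dimension."""
    h, c = np.zeros(H), np.zeros(H)
    for x_t in xs:                       # positive direction: t = 1..tau
        h, c = step_fwd(x_t, h, c)
    h_fwd_final = h                      # corresponds to h_tau (forward)
    h, c = np.zeros(H), np.zeros(H)
    for x_t in reversed(xs):             # negative direction: t = tau..1
        h, c = step_bwd(x_t, h, c)
    h_bwd_final = h                      # corresponds to h_0 (backward)
    return h_fwd_final * h_bwd_final     # Hadamard combination, Equation (19)
```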

3. Methodology Proposal

Although BiLSTM networks alleviate some of the issues related to insufficient dependency modeling between information compared to traditional LSTM, they still inherit many of the inherent problems of LSTM units. One of these issues lies in the imperfect design of the gating mechanisms within LSTM units. In 2000, Felix Gers [25] observed that LSTM units needed to update their internal states while processing time series data, leading to the introduction of the forget gate mechanism. In the study referenced in [18], it was also found that within LSTM units, the forget gate is the critical gate, while improving the input and output gates is crucial for enhancing LSTM unit performance. Therefore, it is advisable to retain the LSTM forget gate while improving the other gating mechanisms.
In the LSTM unit structure, due to the limited gating mechanisms involved in controlling the long-term memory c t , handling long-term memory information may be inadequate. Additionally, LSTM units require learning 12 parameters, and when applied to BiLSTM, this number increases to 24. This increase in model parameters significantly impacts the model’s computational speed.
Given these issues, the concept of He-BiLSTM is proposed to address the inefficiencies in information processing. This approach aims to improve accuracy, enhance robustness, strengthen generalization capabilities, and mitigate the slow computational speed caused by inefficient information handling in traditional models.

3.1. Structure and Forward Propagation Algorithm of He-BiLSTM

The structure diagram of He-BiLSTM is shown in Figure 3:
The He-BiLSTM consists of two parts: the BiLSTM architecture and the V-LSTM (Variant LSTM) unit on which it is built. The V-LSTM unit internally adds a new forget gate aimed at the long-term memory c_t. Since the gating mechanism of the traditional LSTM unit makes little use of the information in c_t, excessive noise may affect the unit's ability to process information during transmission. Therefore, selectively forgetting information in the long-term memory c_t while retaining useful information can improve the performance of the LSTM unit. On the other hand, many researchers have shown that reasonably simplifying the gating mechanism of the LSTM does not degrade its performance; instead, the simplified structure reduces the number of parameters to be learned inside the unit, thereby improving its information processing efficiency and, at the level of the overall model, increasing running speed while preserving accuracy. The V-LSTM was designed based on these two considerations. Experiments in [26] also show that the V-LSTM unit effectively alleviates some of the problems of the classical LSTM unit and, to a certain extent, improves the model's robustness, accuracy, and generalization performance.
The BiLSTM integrates the structures of LSTM in two directions, establishing relationships between units. Therefore, in the BiLSTM, the advantages and disadvantages of the LSTM are both magnified. The disadvantage of the traditional LSTM, which is the inefficiency in processing information that leads to slower model operation, is even more pronounced in the BiLSTM. Based on this situation, considering the improvement of the traditional BiLSTM, the V-LSTM is seamlessly embedded into the BiLSTM to design a He-BiLSTM, thereby enhancing the overall performance of the BiLSTM.
Here, the forward propagation formula of He-BiLSTM is as follows:
\overrightarrow{f}_t = \sigma(\overrightarrow{W}_{xf} \cdot x_t + \overrightarrow{W}_{hf} \cdot \overrightarrow{h}_{t-1} + \overrightarrow{b}_f)    (20)
\overrightarrow{g}_t = \sigma(\overrightarrow{W}_{cg} \cdot \overrightarrow{c}_{t-1})    (21)
\overrightarrow{i}_t = (1 - \overrightarrow{f}_t) \odot \overrightarrow{g}_t    (22)
\overrightarrow{\tilde{c}}_t = \tanh(\overrightarrow{W}_{xc} \cdot x_t + \overrightarrow{W}_{hc} \cdot \overrightarrow{h}_{t-1} + \overrightarrow{b}_c)    (23)
\overrightarrow{c}_t = \overrightarrow{f}_t \odot \overrightarrow{c}_{t-1} + \overrightarrow{i}_t \odot \overrightarrow{\tilde{c}}_t    (24)
\overrightarrow{o}_t = \sigma(\overrightarrow{W}_{xo} \cdot x_t + \overrightarrow{W}_{ho} \cdot \overrightarrow{h}_{t-1} + \overrightarrow{b}_o)    (25)
\overrightarrow{h}_t = \overrightarrow{o}_t \odot \tanh(\overrightarrow{c}_t)    (26)
\overleftarrow{f}_t = \sigma(\overleftarrow{W}_{xf} \cdot x_t + \overleftarrow{W}_{hf} \cdot \overleftarrow{h}_{t+1} + \overleftarrow{b}_f)    (27)
\overleftarrow{g}_t = \sigma(\overleftarrow{W}_{cg} \cdot \overleftarrow{c}_{t+1})    (28)
\overleftarrow{i}_t = (1 - \overleftarrow{f}_t) \odot \overleftarrow{g}_t    (29)
\overleftarrow{\tilde{c}}_t = \tanh(\overleftarrow{W}_{xc} \cdot x_t + \overleftarrow{W}_{hc} \cdot \overleftarrow{h}_{t+1} + \overleftarrow{b}_c)    (30)
\overleftarrow{c}_t = \overleftarrow{f}_t \odot \overleftarrow{c}_{t+1} + \overleftarrow{i}_t \odot \overleftarrow{\tilde{c}}_t    (31)
\overleftarrow{o}_t = \sigma(\overleftarrow{W}_{xo} \cdot x_t + \overleftarrow{W}_{ho} \cdot \overleftarrow{h}_{t+1} + \overleftarrow{b}_o)    (32)
\overleftarrow{h}_t = \overleftarrow{o}_t \odot \tanh(\overleftarrow{c}_t)    (33)
After obtaining \overrightarrow{h}_\tau and \overleftarrow{h}_0 at time \tau, compute their Hadamard product using Formula (34) to obtain h_\tau as the final result.
h_\tau = \overrightarrow{h}_\tau \odot \overleftarrow{h}_0    (34)
After computing h τ , pass it through a fully connected layer to compute the predicted value y ^ , using Formula (35), as follows:
\hat{y} = W_y \cdot h_\tau + b_y    (35)
The loss function here is computed using mean squared error (MSE), denoted as E, with the following calculation formula:
E = \frac{1}{2} (\hat{y} - y)^2    (36)
The formula for calculating the total error (total loss) is
E_{total} = \frac{1}{2} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2    (37)
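To make the variant unit concrete, the positive-direction forward step defined by the formulas above can be sketched in NumPy. The V-LSTM internals (coupled input gate, new memory gate g_t) follow the paper's formulas, while the parameter names and dictionary layout are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def vlstm_step(x_t, h_prev, c_prev, p):
    """One V-LSTM step: the input gate is replaced by the coupled term
    (1 - f_t) * g_t, where g_t is a new forget gate acting on the long-term
    memory c_{t-1}. Assumed shapes: W_x*: (H, D), W_h*/W_cg: (H, H), b_*: (H,)."""
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])      # forget gate
    g_t = sigmoid(p["W_cg"] @ c_prev)                                    # memory gate on c_{t-1}
    i_t = (1.0 - f_t) * g_t                                              # coupled input gate
    c_tilde = np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                                   # long-term memory
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                             # hidden state
    return h_t, c_t
```

Note that, compared with the standard LSTM step, the separate input-gate weights are gone; only the single matrix `W_cg` is added, which is where the parameter saving comes from.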

3.2. Backpropagation Algorithm of He-BiLSTM

BPTT (Backpropagation Through Time) is the backpropagation algorithm used for time series models. Combined with gradient descent, it iteratively adjusts the parameters to reduce the error and ultimately finds good parameter values.
In a traditional BiLSTM model, 24 parameters must be learned through backpropagation: 12 for the positive direction (\overrightarrow{W}_{x\cdot}, \overrightarrow{W}_{h\cdot}, and \overrightarrow{b}_\cdot) and another 12 for the negative direction (\overleftarrow{W}_{x\cdot}, \overleftarrow{W}_{h\cdot}, and \overleftarrow{b}_\cdot). For He-BiLSTM, owing to the improvements within the units, the positive direction has 10 parameters to learn: W_{xf}, W_{hf}, W_{cg}, W_{xc}, W_{hc}, W_{xo}, W_{ho}, b_f, b_c, and b_o; the negative direction likewise has 10, reducing the total number of learned parameters by 4 compared to traditional BiLSTM.
Let the partial derivative of the loss function E with respect to \overrightarrow{h}_\tau at the final time \tau be denoted \delta_\tau. The formula for calculating \delta_\tau is as follows:
\delta_\tau = \frac{\partial E}{\partial \overrightarrow{h}_\tau} = \frac{\partial E}{\partial h_\tau} \cdot \frac{\partial h_\tau}{\partial \overrightarrow{h}_\tau} = (W_y^T \cdot (\hat{y} - y)) \odot \overleftarrow{h}_0
After obtaining δ τ , the partial derivative of the loss function E with respect to c τ is
\frac{\partial E}{\partial c_\tau} = \delta_\tau \odot o_\tau \odot (1 - \tanh^2(c_\tau))
The partial derivative of the loss function E with respect to \overrightarrow{h}_{\tau-1} is
\frac{\partial E}{\partial \overrightarrow{h}_{\tau-1}} = \frac{\partial E}{\partial h_\tau} \cdot \frac{\partial h_\tau}{\partial \overrightarrow{h}_\tau} \cdot \frac{\partial \overrightarrow{h}_\tau}{\partial \overrightarrow{h}_{\tau-1}} = (W_y^T \cdot (\hat{y} - y)) \odot \big( W_{ho}^T \cdot [\tanh(c_\tau) \odot o_\tau (1 - o_\tau)] \odot \overleftarrow{h}_0 + W_{hf}^T \cdot [o_\tau \odot (1 - \tanh^2(c_\tau)) \odot c_{\tau-1} \odot f_\tau (1 - f_\tau)] \odot \overleftarrow{h}_0 + W_{hf}^T \cdot [o_\tau \odot (1 - \tanh^2(c_\tau)) \odot \tilde{c}_\tau \odot (-g_\tau) \odot f_\tau (1 - f_\tau)] \odot \overleftarrow{h}_0 + W_{hc}^T \cdot [o_\tau \odot (1 - \tanh^2(c_\tau)) \odot i_\tau \odot (1 - \tilde{c}_\tau^2)] \odot \overleftarrow{h}_0 \big)
The partial derivatives of the loss function E with respect to the weight matrices W h o , W h f , W h c and W c g are, respectively, as follows:
\frac{\partial E}{\partial W_{ho}} = \delta_{o,\tau} \cdot h_{\tau-1}^T
\frac{\partial E}{\partial W_{hf}} = \delta_{f,\tau} \cdot h_{\tau-1}^T
\frac{\partial E}{\partial W_{hc}} = \delta_{c,\tau} \cdot h_{\tau-1}^T
\frac{\partial E}{\partial W_{cg}} = \delta_{g,\tau} \cdot c_{\tau-1}^T
The partial derivatives of the loss function E with respect to the weight matrices W x o , W x f and W x c are, respectively, as follows:
\frac{\partial E}{\partial W_{xo}} = \delta_{o,\tau} \cdot x_\tau^T
\frac{\partial E}{\partial W_{xf}} = \delta_{f,\tau} \cdot x_\tau^T
\frac{\partial E}{\partial W_{xc}} = \delta_{c,\tau} \cdot x_\tau^T
The calculation formulas for δ o , τ , δ f , τ , δ c , τ and δ g , τ are as follows:
\delta_{o,\tau} = \delta_\tau \odot \tanh(c_\tau) \odot o_\tau (1 - o_\tau)
\delta_{f,\tau} = \delta_\tau \odot o_\tau \odot (1 - \tanh^2(c_\tau)) \odot c_{\tau-1} \odot f_\tau (1 - f_\tau)
\delta_{c,\tau} = \delta_\tau \odot o_\tau \odot (1 - \tanh^2(c_\tau)) \odot i_\tau \odot (1 - \tilde{c}_\tau^2)
\delta_{g,\tau} = \delta_\tau \odot o_\tau \odot (1 - \tanh^2(c_\tau)) \odot \tilde{c}_\tau \odot (1 - f_\tau) \odot g_\tau (1 - g_\tau)
The partial derivatives of the loss function E with respect to the bias vectors b_f, b_o, and b_c are, respectively, as follows:
\frac{\partial E}{\partial b_f} = \delta_{f,\tau}
\frac{\partial E}{\partial b_o} = \delta_{o,\tau}
\frac{\partial E}{\partial b_c} = \delta_{c,\tau}
The above are the formulas for the partial derivatives of all parameters in the positive direction. For the negative direction, the computation of \delta_\tau differs. Denoting the negative-direction counterpart of \delta_\tau as \delta'_\tau, the formula for \delta'_\tau is as follows:
\delta'_\tau = \frac{\partial E}{\partial \overleftarrow{h}_0} = \frac{\partial E}{\partial h_\tau} \cdot \frac{\partial h_\tau}{\partial \overleftarrow{h}_0} = (W_y^T \cdot (\hat{y} - y)) \odot \overrightarrow{h}_\tau
In the negative-direction formulas that follow, \delta_\tau is simply replaced by \delta'_\tau.
At this point, all the backpropagation formulas for He-BiLSTM have been derived. For the parameter update method, this paper uses the Adam optimizer to update each parameter.
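For reference, one Adam update step for a single parameter array can be sketched as below. The hyperparameter values shown are the standard Adam defaults, not values reported by the paper:

```python
import numpy as np

def adam_update(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step for one parameter array (standard Adam; hyperparameters
    are assumed defaults). t is the 1-based iteration counter."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)               # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In training, this update would be applied to each of the 20 He-BiLSTM parameter arrays after the gradients above are computed.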

3.3. The Algorithmic Process of He-BiLSTM

Accordingly, the entire algorithmic procedure of He-BiLSTM is presented in Algorithm 1:   
Algorithm 1: The algorithmic process of He-BiLSTM
    Divide the dataset into training and testing sets X_train, X_test;
    set the batch size batch_s;
    the dimension of the hidden layer h_t;
    the dimension of c_t;
    the time step t;
    the learning rate of the Adam optimizer lr;
    the number of iterations epoch;
    the random seed.
    Input: the input vectors x_t (where t = 1, 2, ..., τ)
    Output: ŷ
    for i < epoch do
          while j < len(X_train)/batch_s do
               1. During forward propagation, calculate the values of all variables.
               2. Calculate the partial derivatives of the loss function E with respect to
                 all parameters for backpropagation.
               3. Update all parameters with the Adam optimizer.
               4. Calculate the error on the training set.
               5. Calculate the error on the test set.
               6. Record the model's running time.
          end while
    end for
    return the training-set error, the test-set error, and the model's running time
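The loop in Algorithm 1 can be skeletonized in Python as follows. The `model` object with `forward`, `backward`, and `adam_step` methods is a hypothetical interface used for illustration, not the authors' implementation:

```python
def train_he_bilstm(X_train, y_train, model, epochs, batch_size):
    """Skeleton of Algorithm 1: iterate over epochs and mini-batches,
    running forward propagation, backpropagation, and an Adam update."""
    n_batches = len(X_train) // batch_size
    for _ in range(epochs):
        for j in range(n_batches):
            xb = X_train[j * batch_size:(j + 1) * batch_size]
            yb = y_train[j * batch_size:(j + 1) * batch_size]
            y_hat = model.forward(xb)           # step 1: forward propagation
            grads = model.backward(y_hat, yb)   # step 2: partial derivatives of E
            model.adam_step(grads)              # step 3: Adam parameter update
    return model
```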

4. Experiment

Since the model in this paper is an improvement based on BiLSTM, the main comparison model for He-BiLSTM is BiLSTM. The comparison involves aspects such as accuracy, robustness, generalization ability, and running speed between the two. In terms of robustness, the experiment compares the models at moments of stock price volatility, observing the models’ adaptability to sudden changes through a graphical analysis. Additionally, the experiment also considers the ability to predict stock prices for multiple future days as a factor in judging the model’s robustness. In terms of generalization ability, the same model is used in different financial datasets to evaluate the model’s generalization capability.
In the experiment, the data used were stock index data (S&P 500) and futures data (gold) to verify the performance of the model. For the S&P 500 data, 5000 data points from 26 January 2000 to 9 December 2019 were selected. For the gold data, 4960 data points from 25 January 2005 to 23 April 2024 were selected. The input vector dimensions are all four-dimensional, representing the opening price, highest price, lowest price, and closing price. The time step is 10 days, using the data from the previous 10 days to predict the prices for the 11th, 12th, and 13th days.
In the data processing section, the dataset is first divided into a training set and a test set, with the training set consisting of 3000 entries and the test set consisting of 500 entries. After that, the data are checked for missing values, specifically looking for cases where the opening price is zero. Once the check is complete, the data are normalized. The formula for normalization is as follows:
x_t^* = \frac{x_t - x_{t(min)}}{x_{t(max)} - x_{t(min)}}    (38)
Here, x_t represents the value of each element in the vector, x_{t(min)} the minimum value in each vector, and x_{t(max)} the maximum value in each vector.
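Applied to the four-dimensional price vectors (open, high, low, close), the normalization formula above can be sketched column-wise in NumPy:

```python
import numpy as np

def minmax_normalize(X):
    """Column-wise min-max scaling of a (n_samples, n_features) array,
    mapping each feature to [0, 1]."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)
```

In practice, the minimum and maximum would be computed on the training set only and reused on the test set, so that no test information leaks into the scaling.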
The evaluation metrics for the model are mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination ( R 2 ). The smaller the values of MSE and MAE, the higher the model’s predictive accuracy and performance; the closer R 2 is to 1, the closer the model’s predicted values are to the actual values. The calculation formulas for the three evaluation metrics are as follows:
MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2    (39)
MAE = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|    (40)
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}    (41)
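The three evaluation metrics can be computed directly in NumPy:

```python
import numpy as np

def evaluate(y_hat, y):
    """Compute MSE, MAE, and the coefficient of determination R^2
    for predictions y_hat against ground truth y."""
    mse = np.mean((y_hat - y) ** 2)
    mae = np.mean(np.abs(y_hat - y))
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2
```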

4.1. Predict the Price for the Next Day

In this section, we first use the prices from the previous 10 days to predict the price on the 11th day. With the random seed set to 10, a batch size of 500 samples, and an Adam learning rate of 0.59, the BiLSTM model achieves a prediction accuracy of 93.80% on the training set and 92.79% on the test set. This represents a certain improvement in accuracy over the classical LSTM, which is less stable in prediction, easily influenced by hyperparameters, and more volatile. The BiLSTM, leveraging the advantages of bidirectional transmission, strengthens the temporal dependency of information, which significantly alleviates this issue. The He-BiLSTM model achieves even higher prediction accuracy than BiLSTM, reaching 95.41% on the training set and 94.23% on the test set. The prediction effect diagrams for both BiLSTM and He-BiLSTM are shown in Figure 4.
From Figure 4, it is evident that during periods of price volatility, the predictions of He-BiLSTM are closer to the actual values. To compare the predictive capabilities of the two models during periods of stable prices, a section of Figure 4 is enlarged for comparison. The comparative chart is shown in Figure 5.
Upon comparing the prediction effect diagrams during periods of stable prices, it is observed that the predictions of He-BiLSTM are more accurate than those of BiLSTM. This indicates that He-BiLSTM demonstrates superior robustness compared to BiLSTM, regardless of whether the prices are stable or experiencing sudden changes.
For the comparison of evaluation metrics for the two models, see Table 1:
From Table 1, it can be seen that compared to BiLSTM, He-BiLSTM has superior evaluation metrics across the board. On the test set, the MSE of He-BiLSTM is reduced by 38.36% compared to BiLSTM, and the MAE is reduced by 33.42% compared to BiLSTM. In terms of accuracy, He-BiLSTM has improved by an additional 3.87% over BiLSTM.

4.2. Predicting the Price for Multiple Future Days

To validate the generalization capability of He-BiLSTM, it is employed to predict the prices of the 12th and 13th days based on the prices of the first 10 days, and the results are compared with those of BiLSTM. The specific evaluation metrics are shown in Table 2.
When predicting the prices two days ahead, the test accuracy of He-BiLSTM still reaches 92.79%; compared with next-day prediction, the accuracy drops by less than 2% from 94.23%. For BiLSTM, although the test-set accuracy still exceeds 85%, it drops by nearly 4% from the 90.36% achieved for next-day prediction. This indicates that He-BiLSTM performs better in terms of robustness and generalization ability.
As can be seen from Table 3, when predicting the prices three days ahead, He-BiLSTM's test-set accuracy is 90.03%, still above the 90% benchmark. In contrast, BiLSTM's test-set accuracy has dropped to 74.28%, and He-BiLSTM's evaluation metrics are now significantly ahead of BiLSTM's. This demonstrates that He-BiLSTM has excellent capability for predicting prices over multiple future days, which, in the practical field of financial investment, can offer investors and decision makers a broader range of strategies.
Since the He-BiLSTM model achieves an accuracy of 94.2% in predicting the stock index price for the next day, it is considered a high-probability event. Therefore, in real market conditions (excluding the impact of sudden news events), if the predicted lowest price for the next day is within a 6% fluctuation range from the current price, buying at that point and selling when the predicted highest price is within a 6% fluctuation range from the current price can maximize profits. If the He-BiLSTM predicts a higher price within the next two days and one aims for greater returns, the accuracy decreases to 92.8%; thus, the buying and selling price fluctuation range should be around 8%. Similarly, if a higher price is predicted within the next three days, the accuracy is 90.3%, and the fluctuation range for buying and selling should be around 10%.
In individual stock price predictions, Microsoft and Kweichow Moutai were also selected for further stock price forecasting. The prediction results are shown in Figure 6 and Figure 7, respectively. The experiments showed that the overall performance of the He-BiLSTM model surpassed that of the BiLSTM.
Additionally, to further demonstrate the model's generalization capability, He-BiLSTM is applied to futures data and compared with BiLSTM; the results again show that He-BiLSTM outperforms BiLSTM. The prediction results are shown in Figure 8.

4.3. Comparison of Model Convergence Speed

In terms of model running speed, the time for one iteration of both He-BiLSTM and BiLSTM is recorded, with the first 10 iterations being counted. The statistical results of the running time are shown in Table 4.
From Table 4, it can be observed that as the number of iterations increases, the advantage of He-BiLSTM in terms of reduced time consumption becomes increasingly evident. After 10 iterations, He-BiLSTM takes a total of 159.378 s, while BiLSTM takes 172.683 s. Compared to BiLSTM, He-BiLSTM’s time consumption is reduced by 7.70%.

5. Conclusions

BiLSTM is currently a relatively advanced model. To further improve its performance and better cope with the aperiodicity and volatility of financial market prices, this paper starts from the internal units of BiLSTM to improve the overall model. The V-LSTM unit is seamlessly integrated into BiLSTM, reconstructing it into He-BiLSTM, and the corresponding backpropagation algorithm is provided. The V-LSTM unit adds a new forget gate acting on the long-term memory c_t; the information retained from c_t, coupled with the forget gate f_t, serves as the input for the next unit. This allows each unit to utilize c_t when transmitting information backward while also reducing the number of parameters each unit needs to learn, improving the efficiency of internal information processing. Applied to BiLSTM, these advantages of the V-LSTM unit further enhance the performance of He-BiLSTM.
To validate the robustness, accuracy, generalization ability, and model running speed of He-BiLSTM as superior, this paper has designed three experiments.
Firstly, He-BiLSTM and BiLSTM are applied to predict the stock price of S&P 500 for the next day. The experimental results show that compared to BiLSTM, He-BiLSTM has a smaller mean squared error, being reduced by 38.36% in comparison to BiLSTM’s mean squared error of 0.000318619, and with the accuracy rate being increased by 3.87% from 90.36%. Moreover, the improvement in He-BiLSTM’s predictive capability is not only reflected in parts where the price changes abruptly but also during periods of stable prices, where the predictive accuracy is further enhanced. This also proves the strong robustness of the model.
Secondly, to verify the generalization capability of He-BiLSTM, it was used to predict prices for multiple days in the future and compared with BiLSTM. The experiments showed that even when predicting up to 3 days into the future, He-BiLSTM can still maintain an accuracy rate of over 90% on the test set. Furthermore, He-BiLSTM not only performs better in stock index data prediction but also yields good results in the futures field. This proves that He-BiLSTM has stronger generalization capabilities.
Lastly, in terms of model running speed, He-BiLSTM operates faster. When iterating 10 times under the same conditions, He-BiLSTM can reduce the time by approximately 7.70%.
The performance characteristics of He-BiLSTM, when applied to the field of financial investment, can provide investors with more accurate price forecasts, more strategic choices for the future, and faster trading operations, thereby helping to address some of the risks inherent in the financial markets.

6. Limitations

Since stock prices are also influenced by factors such as financial reports, interest rates, economic indicators, and geopolitical events, these non-technical factors may affect price trends as well; this needs to be considered and addressed in future research. In terms of implementation, the model currently runs on custom-written NumPy code, and we will optimize the code to make the model easier to invoke. Additionally, the model will be applied to a broader range of fields to further verify its generalization capability.

Author Contributions

Conceptualization, S.S.; Methodology, S.S.; Writing—original draft, S.S.; Supervision, L.L.; Project administration, L.L.; Funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China (62173222) and Shanghai University of Engineering Science Horizontal Research Project (SJ20230195).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Atsalakis, G.S.; Valavanis, K.P. Surveying stock market forecasting techniques—Part II: Soft computing methods. Expert Syst. Appl. 2009, 36, 5932–5941. [Google Scholar] [CrossRef]
  2. Kao, L.; Chiu, C.; Lu, C.; Chang, C. A hybrid approach by integrating wavelet based feature extraction with MARS and SVR for stock index forecasting. Decis. Support Syst. 2013, 54, 1228–1244. [Google Scholar] [CrossRef]
  3. Shahzad, F. Does weather influence investor behavior, stock returns, and volatility? Evidence from the Greater China region. Phys. A Stat. Mech. Its Appl. 2019, 523, 525–543. [Google Scholar] [CrossRef]
  4. Ao, K.; Zhu, H. Predicting Trend of High Frequency CSI 300 Index Using Adaptive Input Selection and Machine Learning Techniques. J. Syst. Sci. Inf. 2018, 6, 120–133. [Google Scholar]
  5. Zhang, D.; Lou, S. The application research of neural network and BP algorithm in stock price pattern classification and prediction. Future Gener. Comput. Syst. 2021, 115, 872–879. [Google Scholar] [CrossRef]
  6. Liu, Y.Y.; He, X.S. BP Neural Network Stock Price Prediction Based on Adaptive Firefly Algorithm. J. Weinan Norm. Univ. 2019, 34, 87–96. [Google Scholar]
  7. Kong, D.W.; Liu, S.J.; Pan, L. Amazon Spot Instance Price Prediction with GRU Network. In Proceedings of the IEEE International Conference on Computer Supported Cooperative Work in Design (CSCWD), Dalian, China, 5–7 May 2021. [Google Scholar] [CrossRef]
  8. Liu, B.; Yu, Z.; Wang, Q.; Du, P.; Zhang, X. Prediction of SSE Shanghai Enterprises index based on bidirectional LSTM model of air pollutants. Expert Syst. Appl. 2022, 204, 117600. [Google Scholar] [CrossRef]
  9. Abdelhamid, A.A.; El-Kenawy, E.M.; Alotaibi, B.; Amer, G.M.; Abdelkader, M.Y.; Ibrahim, A.; Eid, M.M. Robust speech emotion recognition using CNN+LSTM based on stochastic fractal search optimization algorithm. IEEE Access 2022, 10, 49265–49284. [Google Scholar] [CrossRef]
  10. Gers, F.A.; Schraudolph, N.N.; Schmidhuber, J. Learning precise timing with LSTM recurrent networks. J. Mach. Learn. Res. 2003, 3, 115–143. [Google Scholar] [CrossRef]
  11. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  12. Peng, T.; Zhang, C.; Zhou, J.Z.; Nazir, M.S. An integrated framework of Bi-directional Long-Short Term Memory (BiLSTM) based on sine cosine algorithm for hourly solar radiation forecasting. Energy 2021, 221, 119887. [Google Scholar] [CrossRef]
  13. Cheng, H.; Ding, X.; Zhou, W.; Ding, R. A hybrid electricity price forecasting model with Bayesian optimization for German energy exchange. Int. J. Electr. Power Energy Syst. 2019, 110, 653–666. [Google Scholar] [CrossRef]
  14. Kulshrestha, A.; Krishnaswamy, V.; Sharma, M. Bayesian BILSTM approach for tourism demand forecasting. Ann. Tour. Res. 2020, 83, 102925. [Google Scholar] [CrossRef]
  15. Abe, M.; Nakayama, H. Deep learning for forecasting stock returns in the cross section. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Melbourne, VIC, Australia, 3–6 June 2018; pp. 273–284. [Google Scholar]
  16. Fischer, T.; Krauss, C. Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 2018, 270, 654–669. [Google Scholar] [CrossRef]
  17. Kraus, M.; Feuerriegel, S. Decision support from financial disclosures with deep neural networks and transfer learning. Decis. Support Syst. 2017, 104, 38–48. [Google Scholar] [CrossRef]
  18. Minami, S. Predicting equity price with corporate action events using LSTM-RNN. J. Math. Financ. 2018, 8, 58–63. [Google Scholar] [CrossRef]
  19. Choi, H.K. Stock price correlation coefficient prediction with ARIMA-LSTM hybrid model. arXiv 2018, arXiv:1808.01560. [Google Scholar]
  20. Fan, Y.; Qian, Y.; Xie, F.L.; Soong, F.K. TTS synthesis with bidirectional LSTM based recurrent neural networks. In Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association, Singapore, 14–18 September 2014. [Google Scholar]
  21. Graves, A.; Jaitly, N.; Mohamed, A.R. Hybrid speech recognition with deep bidirectional LSTM. In Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013—Proceedings, Olomouc, Czech Republic, 8–12 December 2013; pp. 273–278. [Google Scholar] [CrossRef]
  22. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef]
  23. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef]
  24. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International Conference on Big Data, Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292. [Google Scholar] [CrossRef]
  25. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with lstm. Neural Comput. 2000, 12, 2451–2471. [Google Scholar] [CrossRef]
  26. Sang, S.; Li, L. A Novel Variant of LSTM Stock Prediction Method Incorporating Attention Mechanism. Mathematics 2024, 12, 945. [Google Scholar] [CrossRef]
Figure 1. LSTM unit structure.
Figure 2. BiLSTM structure diagram.
Figure 3. He-BiLSTM structure diagram.
Figure 4. Comparison of He-BiLSTM and BiLSTM in S&P 500 forecasting.
Figure 5. Training and test set local comparison effect diagram.
Figure 6. Comparison between He-BiLSTM and BiLSTM in Microsoft stock price prediction.
Figure 7. Comparison between He-BiLSTM and BiLSTM in Kweichow Moutai stock price prediction.
Figure 8. Comparison of He-BiLSTM and BiLSTM in gold futures price forecasting.
Table 1. Comparison of evaluation metrics for predicting the next day.

| Model     | MSE (Train)    | MSE (Test)  | MAE (Train) | MAE (Test) | R² (Train) | R² (Test) |
|-----------|----------------|-------------|-------------|------------|------------|-----------|
| BiLSTM    | 9.49135 × 10⁻⁵ | 0.000318619 | 0.00809841  | 0.0136013  | 0.919083   | 0.90357   |
| He-BiLSTM | 6.12311 × 10⁻⁵ | 0.000196391 | 0.00572986  | 0.00905635 | 0.954146   | 0.942272  |
Table 2. Comparison of evaluation metrics for predicting the next two days.

| Model     | MSE (Train)    | MSE (Test)  | MAE (Train) | MAE (Test) | R² (Train) | R² (Test) |
|-----------|----------------|-------------|-------------|------------|------------|-----------|
| BiLSTM    | 9.1886 × 10⁻⁵  | 0.000422057 | 0.00730022  | 0.0153276  | 0.920295   | 0.862634  |
| He-BiLSTM | 8.09596 × 10⁻⁵ | 0.000265874 | 0.00636295  | 0.0103576  | 0.938024   | 0.927856  |
Table 3. Comparison of evaluation metrics for predicting the next three days.

| Model     | MSE (Train)    | MSE (Test)  | MAE (Train) | MAE (Test) | R² (Train) | R² (Test) |
|-----------|----------------|-------------|-------------|------------|------------|-----------|
| BiLSTM    | 0.000114456    | 0.000707841 | 0.00809543  | 0.0209139  | 0.888085   | 0.742807  |
| He-BiLSTM | 9.67334 × 10⁻⁵ | 0.000333368 | 0.00686506  | 0.012031   | 0.926151   | 0.90033   |
Table 4. Time consumed per iteration for the first 10 rounds.

| Iteration | BiLSTM  | He-BiLSTM |
|-----------|---------|-----------|
| 1         | 16.1525 | 13.6154   |
| 2         | 35.5378 | 30.074    |
| 3         | 54.527  | 46.0516   |
| 4         | 71.0515 | 62.8683   |
| 5         | 88.2446 | 78.1939   |
| 6         | 106.69  | 94.4182   |
| 7         | 123.867 | 112.488   |
| 8         | 138.934 | 128.593   |
| 9         | 156.113 | 143.171   |
| 10        | 172.683 | 159.378   |
