1. Introduction
As a typical complex system, the stock market exhibits characteristics of non-linearity, non-stationarity, and multiple components [1,2,3]. In complex real-world scenarios, accurately identifying the specific factors that influence stock market fluctuations becomes challenging. Many traditional approaches are no longer applicable for predictive research in this domain. Therefore, it is necessary to employ more sophisticated methods to uncover the complex features embedded within time series data.
Currently, methods such as logistic regression, gradient boosting models, and deep learning primarily aim to identify correlation information between variables by fitting observed data. Subsequently, variables with high correlations to the target variable are selected as input variables for prediction [4,5,6,7]. However, focusing solely on correlations and neglecting causality when choosing predictors can affect the accuracy of predictions [8]. Correlation may arise from a causal relationship, but it is not equivalent to causality and is further influenced by confounding variables. Confounding variables affect both the treatment and target variables simultaneously, producing correlations between them [9].
In most cases, the Granger causality test is used to infer causal relationships between financial time series [10,11]. Traditional methods for inferring Granger causality mainly include Vector Autoregression (VAR) [12], the Vector Error Correction Model (VECM) [13], and their respective variants [14,15].
Expanding on the ideas discussed above, Xu et al. [16] presented a novel causal decomposition approach and further applied it to investigate information flow between two financial time series on different time scales. The causal decomposition method has three main steps: decomposition, reconstruction, and causality testing. By tracking the driving factors of causal relationships from the perspective of information frequency, the causal decomposition method re-evaluates the causal relationship between stock prices and trading volume from the time-frequency perspective.
Although causal inference methods based on statistical modeling have been extensively studied and have yielded fruitful results, these methods mainly focus on evaluating the interaction between two variables, overlooking the impact of confounding variables [17,18,19,20,21,22,23,24]. Constructing more intricate causal networks, based on the examination and analysis of pairwise causal relationships, still presents significant challenges. This necessitates the exploration of novel methods and theories to tackle issues such as indirect dependence resulting from front-door paths and common driving factors caused by backdoor paths.
Moreover, most economic time series have been found to be non-stationary according to various unit root test methods, such as the Augmented Dickey-Fuller (ADF) test [25]. By applying preprocessing techniques, such as differencing or logarithmic transformation, these time series can be transformed into stationary data. VAR and the VECM are usually more effective when the input data are stationary [26]. Therefore, for the inference of Granger causality in non-stationary time series, the inputs need to be preprocessed into a stationary series in order to avoid forecast distortion. Assuming that the time series becomes stationary after differencing of order 1 or 2, static causal relationships can be inferred. However, economic data often exhibit dynamic properties, and such preprocessing steps may overlook the dynamic causal relationships within the time series: causal relationships may not only exist between the current and previous period of data. Oversimplified preprocessing can thus disregard dynamic causality and lose valuable information from the original data.
Machine learning (ML) models can capture non-linear and complex relationships better than traditional statistical models [27]. Research has been conducted on causal inference utilizing ML models to overcome limitations in conventional approaches. The use of ML models offers a more accurate and comprehensive ability to infer causal relationships. These models can handle complex datasets that involve a substantial number of variables and aim to identify and infer causal relationships within them. This would help to reveal potential causal paths in causal networks [28].
Leng et al. [29] proposed an independent component analysis (ICA) framework, inspired by the decision tree (DT) algorithm in machine learning, for measuring causal relationships by calculating feature importance. The core idea of the framework is to convert time series data into a causal network representation and to make causal inferences based on feature importance. The ICA framework is designed at the network level, serving as a connection and link between traditional mutual causality detection methods and causal network reconstruction.
While the application of machine learning in causal analysis brings forth new perspectives and approaches, it also possesses certain shortcomings. These limitations encompass sensitivity to data bias and causal confounding, as well as restrictions in handling complex non-linear relationships [30,31]. In comparison, deep learning models exhibit more robust expression and pattern learning capabilities, as they can acquire abstract computational methods from intricate and high-dimensional data [32]. For example, Chong et al. [33] explored a deep learning-based model to evaluate the efficacy of three unsupervised feature extraction methods to predict future market behavior. Similarly, other studies have made significant advancements in the financial domain through the utilization of deep learning models.
Indeed, a dual-stage attention-based recurrent neural network (RNN) model proposed by Qin et al. [34] has shown promising results in predicting stock datasets. By adaptively extracting relevant input features for prediction, the model can effectively capture important information and make accurate predictions. Zahra et al. [35] combined a convolutional neural network (CNN) and long short-term memory (LSTM) with fundamental analysis. By extracting and synthesizing features at multiple levels and dimensions, the model can capture both local and global patterns and improve prediction accuracy. Rahman et al. [36] applied the gated recurrent unit (GRU) model to predict stock data of Coca-Cola and reduced modeling errors. The GRU model is effective in overcoming the problem of vanishing gradients, which can occur in deep learning models and affect their performance. Compared to traditional econometric and ML approaches, deep learning-based models demonstrate better prediction performance. This highlights the efficiency and effectiveness of deep learning architectures in handling financial time series data.
Wei [37] proposed an interpretable deep learning architecture, known as deep learning inference (DLI), to investigate Granger causality. The main contribution of DLI is to uncover the Granger causality between Bitcoin price and the S&P index, enabling more accurate prediction of Bitcoin prices in relation to the S&P index. However, this method only provides a visual representation of the predicted results for Bitcoin prices and lacks evaluation metrics to quantify the results. Its inference of a Granger causality between the two variables is solely based on the enhanced prediction performance of Bitcoin prices after incorporating historical data from the S&P index. Studies inferring the causal relationship of individual stock-related factors based on deep learning models are gradually stimulating the interest of researchers.
Tank et al. [38] introduced the neural Granger test, enhancing the detection of Granger causality by incorporating non-linear interactions. The researchers proposed a suite of non-linear architectures, wherein each time series is modeled using either a Multi-Layer Perceptron (MLP) or an RNN. Inputs to this non-linear framework consist of past lags from all series, while the outputs predict the future values of each series. Additionally, a group lasso penalty is applied to effectively reduce the input weights to zero, refining the predictive accuracy of the model.
In order to better comprehend and explicate causal relationships in data, Donald Rubin put forward the potential outcomes framework (POF). The POF seeks to reveal dynamic causal relationships through randomized trials and natural experiments [39,40]. Additionally, Judea Pearl introduced the causal diagram model as a formalized approach for researchers to describe and infer causal relationships. Combining the potential outcomes framework and causal diagrams can lead to a better understanding and explanation of causal relationships in observed data for causal inference.
Although there has been significant progress in the investigation of causality within stock data, the Granger causality test, designed for linear and stationary data, exhibits biases when applied to non-linear and non-stationary stock data. Moreover, the majority of causal inference methodologies neglect interference caused by confounding variables and lack further validation of their results. To address these issues, this paper proposes a deep learning-based causal inference architecture and algorithm inspired by the POF, which also integrates causal diagrams and the non-linear Granger test. This framework and algorithm are specifically designed to infer the causal relationships between individual stock closing prices and their relevant factors.
The innovation of this paper lies in the utilization of causal diagrams and the establishment of a grouped architecture through deep learning networks. The primary contributions of the proposed methods are summarized as follows:
To better understand the impact of confounding variables on causal relationships, causal diagrams are employed in the stock data analysis to explore the relationships among confounding variables, treatment variables, and target variables. The application of front-door and backdoor adjustments allows for the control of confounding variables and the accurate inference of the relationship between closing prices and relevant factors.
To leverage the computational power of deep learning networks and address the deficiencies of the Granger test, a sliding window strategy is incorporated into the GRU model to achieve precise estimation of closing prices. The enhanced capability of the GRU serves to expand the applicability of the Granger test beyond the realm of linear stationary data, facilitating the direct assessment of causal linkages within non-linear time series data.
To control for confounding variables, a grouped architecture structure is built using GRUs combined with a sliding window strategy, inspired by the POF. Additionally, the non-linear Granger test is utilized to infer the causal relationships between individual stock closing prices and relevant factors, thus implementing a deep learning-based causal inference framework and algorithm.
To further validate the accuracy of the inferred causal relationships, different sets of input variables, such as individual closing prices, all related factors, and causal factors, are used when predicting stock closing prices. The results show that including causal factors as input variables significantly enhances prediction accuracy. These findings provide additional validation of the effectiveness and reliability of the proposed algorithm.
The remainder of the paper is organized as follows: Section 2 introduces causal diagrams and the Granger test and presents a deep learning-based architecture and algorithm for causal inference between stock closing prices and relevant factors; the dataset and the evaluation metrics are also presented in Section 2. The experimental results are presented in Section 3. Section 4 discusses the experimental results to validate the effectiveness of the algorithm. Some conclusions are provided in Section 5.
2. Materials and Methods
2.1. Causal Diagrams
Causal diagrams, which utilize directed acyclic graphs (DAGs), were introduced by Judea Pearl to describe the causal relationships between variables. Causal diagrams are helpful in eliminating estimation bias through conditional distributions [41]. The fundamental idea behind this approach is to estimate and test distributions while minimizing bias introduced by other variables. Due to the presence of confounding variables, three distinct paths arise in causal inference: causal paths, backdoor paths, and front-door paths. Figure 1 illustrates these three paths. Here, Z refers to the confounding variable, X represents the treatment variable, and Y represents the target variable.
The set of variables Z on a backdoor path satisfies the backdoor criterion when:
No variable in Z is a descendant of X.
Z blocks every backdoor path from X to Y (every path containing an arrow into X).
The set of variables Z on a front-door path satisfies the front-door criterion when:
Z cuts off all directed paths from X to Y.
There is no backdoor path from X to Z.
X blocks all backdoor paths from Z to Y.
The significance of the backdoor criterion and the front-door criterion lies in their ability to estimate certain causal effects using observed data, even when some variables are unobservable. These two criteria are helpful in identifying confounding variables and in designing experimental studies.
The forthcoming experimental design will employ backdoor adjustment and front-door adjustment to truncate the backdoor paths and front-door paths, respectively, based on the backdoor criterion and the front-door criterion. The influence of confounding variables on the causal paths will be eliminated by doing so, allowing the model to correctly identify and assess the causal relationships and effects among stock-related factors.
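The backdoor adjustment can be sketched on a toy discrete example. The joint distribution below is hypothetical; the point is only that the interventional quantity P(Y | do(X)) averages the conditional P(Y | X, Z) over the marginal distribution of Z, rather than over Z's distribution given X:

```python
import numpy as np

# Hypothetical binary variables: Z (confounder), X (treatment), Y (target).
# p[z, x, y] is the joint probability P(Z=z, X=x, Y=y).
p = np.array([
    [[0.20, 0.05], [0.05, 0.10]],   # Z = 0
    [[0.05, 0.10], [0.10, 0.35]],   # Z = 1
])

def p_y_do_x(p, x, y):
    """Backdoor adjustment: P(Y=y | do(X=x)) = sum_z P(Y=y | X=x, Z=z) P(Z=z)."""
    total = 0.0
    for z in range(p.shape[0]):
        p_z = p[z].sum()                           # P(Z=z)
        p_y_given_xz = p[z, x, y] / p[z, x].sum()  # P(Y=y | X=x, Z=z)
        total += p_y_given_xz * p_z
    return total

# The naive conditional P(Y=1 | X=1) mixes the causal effect with confounding by Z.
p_cond = p[:, 1, 1].sum() / p[:, 1, :].sum()
p_do = p_y_do_x(p, x=1, y=1)
print(f"P(Y=1 | X=1) = {p_cond:.3f},  P(Y=1 | do(X=1)) = {p_do:.3f}")
```

The gap between the two quantities is exactly the bias that the backdoor adjustment removes by cutting the backdoor path through Z.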
2.2. Granger Causality Test
The Granger causality test uses statistical techniques to analyze the causality of economic variables [42]. The existence and direction of causal relationships between variables are determined through the assessment of the significance of the respective prior period indicators, as reflected by the lagged variables of the economic variables. The Granger causality test commonly employs a distributed lag model to infer whether the previous level of variable X impacts the subsequent level of variable Y, which is typically represented as follows:

Y_t = α_0 + Σ_{i=1}^{p} β_i X_{t−i} + Σ_{j=1}^{p} γ_j Y_{t−j} + ε_t (1)

where X_{t−i} represents the distributed lag term of X, examining whether X has an impact on the current level of Y. The coefficient β_i reflects the magnitude of this impact, indicating the existence of causality. Y_{t−j} is the distributed lag term of Y, and the coefficient γ_j represents its impact. ε_t denotes the error term. To infer the causality of X on Y is to examine the following hypotheses:

H_0: β_1 = β_2 = ⋯ = β_p = 0 (2)

This hypothesis is generally tested by constructing the F-test statistic in Equation (3), which is defined as

F = [(SSE_0 − SSE_1)/p] / [SSE_1/(n − 2p − 1)] (3)

where SSE_0 represents the sum of squared errors under the null hypothesis H_0, SSE_1 is the sum of squared errors under the alternative hypothesis H_1 in Equation (4), p denotes the lag length, and n is the sample size. This statistic follows the F-distribution when the null hypothesis is that X is not the cause of Y. By referring to the F-distribution table, one can determine the statistical significance of the test at a specific confidence level. If the null hypothesis is rejected, this indicates that there is a causal relationship from variable X to variable Y.
The alternative hypothesis H_1 is denoted as

H_1: β_i ≠ 0 for at least one i ∈ {1, …, p} (4)
2.3. Temporal Causal Network
The stock closing price and its relevant factors are time-series data. Figure 2 depicts the causal relationships among these temporal data. In this figure, X refers to the treatment variable, Z represents the potential confounding variable, and Y denotes the target variable. The arrows in the figure indicate the direction of influence between the variables.
Specific techniques must be employed for these confounding variables that vary over time. Before using these methods, certain generalized premise assumptions must be fulfilled [43]:
1. Sequential Ignorability Assumption: for each time point, after controlling for the history of the confounding factors, the outcome of the treatment at each time point is affected only by its own treatment status and not by the treatment status at other time points; i.e., the effect of receiving the treatment on a single individual is independent of whether other individuals receive the treatment. The assumption can be expressed as

Y(T_1), Y(T_0) ⊥ X_t | Z̄_t, X̄_{t−1}

Here, for time point t, the values of the various confounding variables at each time point up to and including t are denoted as Z̄_t. The history of the treatment variable prior to t, excluding t, is written as X̄_{t−1}. The potential values of Y are represented as Y(T_0) and Y(T_1): T_0 indicates that Y does not receive the treatment and T_1 indicates that Y receives the treatment. If this assumption is not satisfied, the causal relationships at each time point are in fact confounded, and it is impossible to accurately estimate the causal effect at each time point.
2. Consistency Assumption: this assumption requires that the observed value of Y under a specific sequence of treatment variable values is equal to its potential value.
3. Positive Value Assumption: this assumption requires that the probability of an individual receiving a treatment intervention at time point t is strictly between 0 and 1 (equal to neither 0 nor 1), after controlling for the confounding variables up to and including time point t and the sequence of treatment variable values before time point t.
Introducing a time dimension in deep learning models allows for the creation of temporal structures to better handle time-varying confounding variables. For instance, models such as an RNN or LSTM are used to capture temporal dependencies. The prediction of stock closing prices using deep learning models yields the values of the potential states, satisfying the consistency assumption. In addition, neural networks can satisfy the positivity assumption through activation functions such as the Sigmoid, Tanh, and ReLU functions.
However, experimental design is still required to satisfy the sequential ignorability assumption. After fulfilling the three main assumptions, the temporal causal network can be abstracted as a neural network, where the arrows in the neural network are determined by their corresponding weights. If the weight is zero, it implies that the corresponding path does not exist.
2.4. Deep Learning-Based Causal Inference Network Architecture and Algorithms
The causal inference architecture based on deep learning is illustrated in Figure 3. This architecture employs GRU networks to capture temporal dependencies in sequence data and extract feature representations. Both LSTM and GRUs address the issues of gradient vanishing, gradient explosion, and long-term dependencies in a traditional RNN. However, compared to LSTM, GRUs possess a more concise structure, which makes them computationally faster and suitable for handling large-scale datasets. Meanwhile, GRUs exhibit higher efficiency in memory utilization, reducing the burden of storage and computation.
A sliding window strategy with a window size of 5 is used in GRUs. This strategy allows the segmentation and processing of time series data in fixed window lengths. By sliding the window, continuous subsequences can be obtained and utilized for further analysis and modeling. This approach proves to be highly effective in capturing local patterns and dynamic features within sequences, thereby enhancing the performance and effectiveness of the model.
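A minimal sketch of this sliding window segmentation is given below (window size 5, as above); the assumption that column 0 holds the closing price is illustrative:

```python
import numpy as np

def make_windows(series: np.ndarray, window: int = 5):
    """Slice a (T, F) multivariate series into overlapping input windows
    of length `window`, each paired with the next-step closing price."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])      # window of past observations
        y.append(series[t + window, 0])     # next value of column 0 (close)
    return np.stack(X), np.array(y)

# Toy example: 20 time steps, 3 features (close price in column 0).
data = np.arange(60, dtype=float).reshape(20, 3)
X, y = make_windows(data, window=5)
print(X.shape, y.shape)   # (15, 5, 3) (15,)
```

Each window becomes one training sample, so consecutive subsequences share most of their content and local dynamics are seen many times during training.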
The fully connected layer converts the feature extraction and representation from the previous layer into the final output result. By learning the weights of each connection, the fully connected layer can adjust these weights during training to minimize the loss function, allowing the model to make accurate predictions.
To satisfy the assumption of sequential ignorability and eliminate the interference brought by confounding variables in analyzing causal relationships, backdoor adjustment and front-door adjustment were conducted. The input data were divided into two groups: an experimental group and a control group. When one test factor is selected as the treatment variable, the other test factors are the confounding variables, and the stock closing price serves as the target variable.
Control group: The historical information of the confounding variables Z and the target variable Y is utilized to predict the target variable Y.
Experimental group: The historical information of the treatment variable X, confounding variables Z, and the target variable Y is utilized to predict the target variable Y.
The distribution of the confounding variables Z remains unchanged between the control group and the experimental group, with the target variable consistently being the closing price. According to the backdoor criterion and the front-door criterion, it can be inferred that all the backdoor paths and front-door paths between X and Y are cut off. This experimental design also satisfies the MB-by-MB (Markov Blanket by Markov Blanket) algorithm in the local learning of causal networks [44], and the optimal stepwise intervention design in the active learning of causal networks [45].
The corresponding network model for the causal inference architecture is shown in Figure 4. The innovation of this model is to build a grouped architecture using two GRU networks, which achieves the control of confounding variables and the accurate prediction of the closing price of the target variable under different circumstances. The model is coupled with a sliding window strategy so that it satisfies the assumption of sequential ignorability.
Specifically, when a factor is selected as the treatment variable, the other factors are potential confounding variables, and the closing price of an individual stock is the target variable. The data underwent initial processing and grouping to obtain the experimental and control groups. The two groups of data were then used as inputs, the potential values of the closing price in the different situations were obtained by sliding window and GRU calculations, and the causal relationship between the closing prices of individual stocks and the relevant factors was inferred by the non-linear Granger test.
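A minimal PyTorch sketch of such a grouped architecture is given below; the hidden size, branch names, and feature counts are illustrative assumptions, not the exact configuration used in the experiments:

```python
import torch
import torch.nn as nn

class GroupedGRU(nn.Module):
    """Two independent GRU branches: the control branch sees only the
    confounders Z and past target Y; the experimental branch additionally
    sees the treatment variable X. Hidden size 32 is an illustrative choice."""
    def __init__(self, n_control_feats: int, n_exp_feats: int, hidden: int = 32):
        super().__init__()
        self.gru_control = nn.GRU(n_control_feats, hidden, batch_first=True)
        self.gru_exp = nn.GRU(n_exp_feats, hidden, batch_first=True)
        self.fc_control = nn.Linear(hidden, 1)
        self.fc_exp = nn.Linear(hidden, 1)

    def forward(self, win_control, win_exp):
        # Use the last hidden state of each branch to predict the close price.
        _, h_c = self.gru_control(win_control)
        _, h_e = self.gru_exp(win_exp)
        y0 = self.fc_control(h_c[-1])   # potential outcome without treatment
        y1 = self.fc_exp(h_e[-1])       # potential outcome with treatment
        return y0, y1

model = GroupedGRU(n_control_feats=11, n_exp_feats=12)
# Batch of 8 sliding windows of length 5.
y0, y1 = model(torch.randn(8, 5, 11), torch.randn(8, 5, 12))
print(y0.shape, y1.shape)   # torch.Size([8, 1]) torch.Size([8, 1])
```

The two predicted series y0 and y1 play the roles of the potential closing prices under the null and alternative hypotheses in the non-linear Granger test below.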
Closing prices were predicted using data from the experimental group and the control group. Since the computational process of the GRU network is non-linear, the formula of the lagged distribution model under the H_1 assumption is rewritten as

Y_t^(1) = α_0 + f(Σ_{i=1}^{p} β_i X_{t−i} + Σ_{j=1}^{p} γ_j Y_{t−j} + Σ_{k=1}^{p} δ_k Z_{t−k}) + ε_t

where α_0 represents the baseline value; f denotes the composite function; p indicates the lag length; β_i, γ_j, and δ_k are the coefficients of the corresponding distributional lag terms; X_{t−i}, Y_{t−j}, and Z_{t−k} are distributional lags of X, Y, and Z, respectively; and ε_t stands for the error term. The output under the H_0 assumption is obtained as:

Y_t^(0) = α_0 + f(Σ_{j=1}^{p} γ_j Y_{t−j} + Σ_{k=1}^{p} δ_k Z_{t−k}) + ε_t
Subsequently, a Granger causality test is conducted by calculating the F-value. The formula is as follows:

F = [(SSE_0 − SSE_1)/p] / [SSE_1/(n − 2p − 1)], with SSE_0 = Σ_t (Y_t − Y_t^(0))² and SSE_1 = Σ_t (Y_t − Y_t^(1))²

where Y_t is the true value of the closing price. If F > F_α (where F_α is obtained by querying the F-distribution table using the p and n − 2p − 1 degrees of freedom), the null hypothesis H_0 is rejected. Thus, it is inferred that there is a causal relationship between the treatment variable and the target variable, i.e., the treatment variable is the cause of the target variable.
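The F-value computation above can be sketched as follows, comparing the squared prediction errors of the two groups; the toy predictions are purely illustrative:

```python
import numpy as np
from scipy.stats import f as f_dist

def granger_f_test(y_true, y_pred_h0, y_pred_h1, p, alpha=0.05):
    """F-test comparing the control-group predictions (H0, no treatment
    history) against the experimental-group predictions (H1, with
    treatment history)."""
    n = len(y_true)
    sse0 = np.sum((y_true - y_pred_h0) ** 2)   # SSE under H0
    sse1 = np.sum((y_true - y_pred_h1) ** 2)   # SSE under H1
    dof = n - 2 * p - 1
    f_value = ((sse0 - sse1) / p) / (sse1 / dof)
    f_crit = f_dist.ppf(1 - alpha, p, dof)     # critical value F_alpha
    return f_value, f_crit, f_value > f_crit

# Toy illustration: the H1 predictions are clearly closer to the truth.
rng = np.random.default_rng(1)
y = rng.normal(size=100)
f_value, f_crit, causal = granger_f_test(y, y + 1.0, y + 0.01, p=5)
print(f"F = {f_value:.1f}, F_crit = {f_crit:.2f}, causal: {causal}")
```

When including the treatment history reduces the prediction error substantially, the F-value exceeds the critical value and the null hypothesis is rejected.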
Algorithm 1 shows the algorithm for the causal inference network architecture based on deep learning.
Algorithm 1. Causal Inference Algorithm Based on Deep Learning.
Input: the experimental-group dataset containing T samples and N features; the control-group dataset containing T samples and M features; the lag length p; the number of training cycles.
Output: the Granger causality of the closing price.
1:  Initialize the parameters of the GRU model;
2:  # Storage for the results of each iteration.
3:  for each training cycle do
4:    for each candidate treatment variable do
5:      # Storage for the control-group predictions y1 of each iteration.
6:      # Storage for the experimental-group predictions y2 of each iteration.
7:      for each sliding window do
8:        predict the potential closing price y1 from the histories of Z and Y (control group);
9:        predict the potential closing price y2 from the histories of X, Z, and Y (experimental group);
10:     end for
11:     compute SSE_0 and SSE_1 from y1, y2, and the true closing prices;
12:     compute the F-value and compare it with F_α to test H_0;
13:   end for
14: end for
15: return the Granger causality of each factor
2.5. Dataset
The time series data of each stock were obtained using BaoStock. Twelve factors are considered for evaluating the causal relationships of the closing price of a stock: opening price, highest price, lowest price, trading volume, trading amount, turnover rate, percentage change, price-earnings (P/E) ratio, price-to-book (P/B) ratio, price-to-sales (P/S) ratio, price-to-cash flow (P/CF) ratio, and the Shanghai Stock Exchange (SSE) Composite Index (SHCOMP).
The SHCOMP is a comprehensive stock index that reflects the performance of the overall A-share market. If the SHCOMP undergoes significant fluctuations, the majority of stock prices will be affected. Based on historical data and market performance, certain industries are more sensitive to changes in the SHCOMP, including the following:
Financial industry: Due to the significant impact of government policies and regulations on the financial market, the volatility of financial stocks is more pronounced relative to other industries. China Taibao (sh.601601) was chosen as a representative. It is a Chinese insurance company with a substantial market capitalization and considerable influence.
Real estate industry: The real estate market carries a large weighting in the Shanghai stock market. The relaxation or tightening of property market policies has a considerable impact on the volatility of stock prices. Poly Real Estate (sh.600048), a renowned real estate developer in China involved in diverse sectors such as residential, commercial real estate, and office buildings, has been selected as a representative.
Energy and raw materials industry: The profitability of these industries is affected by factors such as changes in the global supply of and demand for raw materials, international oil prices, and the policy environment, among others. China Petroleum & Chemical Corporation (sh.601857) has been selected. It is one of the largest oil and gas producers in China, with abundant energy resources and a significant market share.
There are also some industry stocks that are relatively less affected. Based on historical data and market performance, the following are some of the industries that are less affected by the volatility of the SHCOMP:
Public utility companies: Public utility companies typically exhibit a relatively stable earnings model and decent cash flow. As a result, they may be relatively less affected by the volatility of the SHCOMP. China Guodian (sh.601985) is chosen because it is one of the largest power companies in China.
Food & beverage industry: As a general trend, food and beverage enterprises tend to have a stable income and profit model, with a relatively stable market and less volatility in comparison to other industries. Yingjia Gongjiu (sh.603198), a well-known Chinese Baijiu brand, was chosen as a representative.
Banking industry: Although the financial industry as a whole tends to be volatile, the stock prices of the banking industry are relatively less affected by the SHCOMP because banks have substantial cash flows and a stable asset-liability structure. Shanghai Pudong Development Bank (sh.600000) was chosen due to its relatively stable business model and income.
To ensure sufficient data support and incorporate diverse market conditions, the date range of the selected stocks is from 1 July 2017 to 1 July 2022. This approach also aims to minimize the disturbance of structural changes, allowing for a comprehensive observation and analysis of long-term trends and cyclical fluctuations in the stock market, ultimately enhancing the reliability and generalization of the findings.
2.6. Evaluation Metrics
- (1) Root mean square error (RMSE)
The RMSE is the square root of the mean of the squared differences between predicted values and true values, which is defined as

RMSE = √[(1/n) Σ_{i=1}^{n} (ŷ_i − y_i)²]

where ŷ_i denotes the predicted value, y_i represents the true value, and n is the number of observations. The sum of squared deviations is highly sensitive to errors that are either significantly larger or smaller, so the resulting error measure provides a reliable assessment of the predictive performance. A smaller RMSE value denotes superior prediction performance, while a larger RMSE value indicates a greater divergence from the true results.
- (2) Mean absolute error (MAE)
The MAE is the average of the absolute differences between the predicted values and true values of the model. As a result, it intuitively captures the discrepancy between predicted and true values. The MAE is determined by

MAE = (1/n) Σ_{i=1}^{n} |ŷ_i − y_i|

A smaller MAE value indicates a smaller difference between predicted values and true values, suggesting that the prediction results are closer to the true values.
- (3) Mean absolute percentage error (MAPE)
The MAPE diminishes the influence of magnitude in comparison to the previous two metrics, making it well suited for assessing the efficacy of a model in predicting various stocks. It is formulated as follows:

MAPE = (100%/n) Σ_{i=1}^{n} |(y_i − ŷ_i)/y_i|

The MAPE is also an error metric, with smaller values indicating a better performance of the predictive model.
- (4) The R2 coefficient of determination
The R2 coefficient of determination is utilized to evaluate the fitting degree of a network model, which can be expressed as

R² = 1 − [Σ_{i=1}^{n} (y_i − ŷ_i)²] / [Σ_{i=1}^{n} (y_i − ȳ)²]

where ȳ is the average of the real values. The R2 coefficient of determination serves as a measure to evaluate the fitting and predictive capabilities of the model. A higher value signifies greater predictive ability and improved accuracy.
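The four metrics can be computed directly; a minimal NumPy sketch follows (the sample arrays are illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_pred - y_true)))

def mape(y_true, y_pred):
    """Mean absolute percentage error (%); y_true must not contain zeros."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_pred = np.array([10.5, 11.5, 11.0, 13.5])
print(rmse(y_true, y_pred), mae(y_true, y_pred),
      mape(y_true, y_pred), r2(y_true, y_pred))
```

Because the MAPE normalizes by the true value, it is the metric of choice when comparing prediction quality across stocks whose prices differ in magnitude.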
4. Discussion
The causal inference experiment employs the Granger causality test to determine the causality of factors in industries that are highly and moderately influenced by the SHCOMP. The results show that, in highly influenced industries, causal factors included the open price, high price, low price, trading volume, trading amount, turnover rate, percentage change, P/E ratio, P/B ratio, and the related index itself. In contrast, in less influential industries, the causal relationship between the remaining factors and the closing price is more significant, except for the related index. These findings suggest that, in highly influential industries, individual stock closing prices are more significantly affected by index factors. Meanwhile, the causal relationship between individual stock factors is more pronounced in less influential industries.
The performance of various models was compared using identical input variables, and Table 3 illustrates the percentage improvement in performance of the optimal model compared to the other models. The data presented in the table show that the GRU model outperformed the RNN and LSTM models in predictive accuracy. This outcome supports the decision of the proposed algorithm and framework to use GRUs to compute the potential value of the target variable.
Table 4 shows the percentage improvement in the predictive performance of the models with causal factors as inputs compared to the baseline models with all potential factors as inputs. The results show that the enhanced model with causal factor inputs outperformed the corresponding benchmark model in terms of predictive performance. For example, in the dataset of stock sh.601601, the inclusion of causal factors resulted in performance improvements for the RNN, LSTM, and GRU models compared to their corresponding baseline models. For the RNN model, there was an enhancement of 12.78% in RMSE, 13.07% in MAE, 5.15% in MAPE, and 1.76% in R2. Regarding the LSTM model, RMSE saw a 14.99% enhancement, MAE improved by 16.91%, MAPE by 2.42%, and R2 by 1.03%. For the GRU model, RMSE was enhanced by 17.70%, MAE by 18.01%, MAPE by 15.65%, and R2 by 1.16%.
In addition, the table also compares the performance of GRU models predicted using only closing price data with those using causal factors. The results show that the model using causal factors as input variables performed the best. For example, in the dataset of stock sh.600000, the GRU model using causal factors showed a significant performance improvement compared to the GRU model using only the closing price: 23.86% on RMSE, 29.25% on MAE, 33.71% on MAPE, and 1.93% on R2. Furthermore, relative to the GRU model using all latent factors, the causal factor model improved by 15.08% on RMSE, 20.06% on MAE, 26.95% on MAPE, and 1.03% on R2.
The utilization of causal factors not only reduces the amount of noisy information that the model has to process, but also focuses on the most relevant information, thus improving the predictive performance of the model. The GRU prediction model using the causal factors inferred by the proposed method as inputs outperformed the model using only the closing price as input in terms of prediction accuracy, while the benchmark model with all potential factors as inputs was inferior to the model with causal factor inputs. These results further verify the correctness of the causal inference and enhance the reliability and validity of the proposed method.