1. Introduction
Futures contracts are agreements to buy or sell a commodity at a future date at a price that is agreed upon today. They are important hedging tools for investors [1]. Stock index futures, derived from index values, are highly effective in minimizing investors' exposure to market volatility and addressing systemic risk. This is especially important in the Chinese market, where the difficulty of short-selling individual stocks heightens the significance of the short-selling mechanism in the futures market as a primary risk management strategy [2]. Acknowledging the significance of fostering financial market growth and employing effective risk management tools, China officially launched its first financial futures product, the Shanghai Shenzhen 300 (CSI300) index futures, in 2010 [3]. Subsequently, the Shanghai Stock Exchange 50 (SSE50) index futures and the China Securities 500 (CSI500) index futures were also launched. The launch of these stock index futures has facilitated an increase in trading volume and market liquidity, while emphasizing the importance of anticipating changes in the stock index futures market [4,5]. However, existing research indicates that the market does not fully align with the efficient market hypothesis [6,7]. As a result, predicting stock index futures prices is not only theoretically possible but also practical and valuable in real-world applications [8].
Long-term stock index futures prediction is a popular research area. Previous mainstream methods primarily rely on traditional econometric models [9] such as logistic regression (LOGIT) [10,11], the autoregressive integrated moving average (ARIMA) [12,13], and generalized autoregressive conditional heteroscedasticity (GARCH) [14,15]. In addition, Lin [16] conducted an empirical analysis of the SSE Composite Index using GARCH-type models to examine its volatility, finding that the exponential GARCH (EGARCH) model outperforms the others, and offered suggestions for improving the stability of China's securities market. However, traditional econometric models struggle to effectively capture the nonlinear relationships in financial data, resulting in lower prediction accuracy [17].
To address the limitations of traditional models, machine learning and deep learning have been increasingly applied to financial time-series forecasting, owing to their ability to accurately capture complex nonlinear patterns in the data [18,19,20,21]. For example, Huang et al. investigated the predictability of financial movement direction, using a support vector machine (SVM) to forecast the weekly movement of the NIKKEI 225 index and demonstrating that the SVM outperforms other methods [22]. Illa et al. proposed a stock price prediction methodology using random forest and SVM algorithms to forecast market trends, demonstrating that both models perform well, with SVM yielding the highest accuracy for time-series data [23]. Furthermore, Yang et al. focused on predicting stock market trends by analyzing sentiment in news articles using natural language processing techniques and machine learning algorithms, finding that the gradient boosting decision tree (GBDT) outperforms AdaBoost, XGBoost, decision tree, and logistic regression in accuracy [24].
Predicting stock index futures is essentially a time-series forecasting problem, and deep learning-based time-series models have become one of the most active research topics in recent years [25]. For instance, Vaswani et al. proposed the Transformer model, which replaced traditional recurrent networks with a self-attention mechanism, significantly improving parallelization and performance in sequence modeling tasks such as machine translation [26]. Building upon this, Zhou et al. developed Informer, a Transformer-based model tailored for long-sequence time-series forecasting; Informer leverages ProbSparse self-attention and a generative decoder to efficiently manage long-range dependencies, further enhancing prediction accuracy [27]. While these Transformer-based models showed strong performance, Zeng et al. questioned their effectiveness for long-term forecasting and presented DLinear, a simple linear model that outperformed complex Transformer models by more effectively capturing temporal relations with a single-layer architecture, suggesting that simpler approaches may be more suitable in some cases [28]. In contrast, Nie et al. introduced PatchTST, which enhanced Transformer models for multivariate time-series forecasting by employing a patching strategy to segment time series, thereby reducing computation and memory usage while improving the model's ability to capture long-term dependencies [29]. To address the complexity of temporal variations in time-series data, Wang et al. proposed TimeMixer, a multiscale mixing architecture that decomposes time series into seasonal and trend components, enabling more accurate predictions for both short-term and long-term forecasting [30]. Extending the idea of temporal variation modeling, Wu et al. developed TimesNet, which transforms time series into 2D tensors to capture both intra- and inter-period variations, achieving state-of-the-art performance in a range of time-series analysis tasks [31]. Moreover, Chen et al. presented TSMixer, an all-MLP architecture that efficiently captures both temporal and cross-variate dependencies through a novel mixing operation along the time and feature dimensions, further advancing time-series forecasting [32]. Given these advantages, we adopt TSMixer as the backbone of the method proposed in this paper.
Time-series forecasting, particularly in the financial domain, often faces the challenge of handling complex market volatility and the need for information across multiple time scales [33]. Trading decisions in stock index futures are influenced not only by short-term fluctuations but also by long-term trends, such as weekly or monthly movements. However, most existing forecasting models primarily focus on short-term trends, neglecting cross-time-scale information, which limits their effectiveness in addressing the complex dynamics of financial markets.
Moreover, many existing models rely on the historical data of stock index futures for overall modeling and forecasting [34]. This approach overlooks the fact that stock indices are composed of multiple weighted component stocks, so such models capture only the overall trend of the index without accurately measuring the contribution of individual component stocks to index volatility. Because of the aggregate nature of stock indices, different component stocks may exhibit distinctly different price changes and volatility characteristics at different times, which undermines the model's precision and accuracy.
To address these issues, this paper proposes three improvements. First, based on the TSMixer model, this paper introduces a Multi-Scale module to better capture feature variations across different time granularities, such as 20-day and 60-day periods. This multi-path fusion strategy enables the model to retain daily details while effectively integrating weekly and monthly information, enhancing the accuracy and robustness of stock index futures forecasting. Second, considering that stock index futures are composed of multiple component stocks, this paper proposes training on the historical data of all component stocks and then inferring the future trend of the stock index futures by weighting the forecast results of the individual component stocks during the prediction phase. This approach not only captures the unique characteristics of each component stock but also estimates their contributions to the overall index volatility, significantly enhancing the interpretability of the model. Third, to further explore the differentiated contributions of each component stock to the overall index in the prediction phase, three weighted fusion methods are designed: average-based fusion, weighted-based fusion, and weighted-based decay fusion. In our experiments, the weighted-based decay fusion method, which considers both the weighted information of the component stocks and the future expectation of the index, achieves the best prediction results.
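To make the multi-path idea concrete, the following is a minimal PyTorch-style sketch of a Multi-Scale fusion block, not the paper's actual implementation: it assumes the daily feature sequence is average-pooled to coarser (for example, weekly and monthly) granularities, projected back to the daily length, and fused with the daily path by a learned linear layer. All module and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Hypothetical multi-path fusion block (illustrative only).

    Each coarse path average-pools the daily sequence to a coarser
    granularity (e.g., ~weekly and ~monthly trading-day scales),
    projects it back to the daily length, and all paths are fused
    with a learned linear mixing along the feature dimension.
    """
    def __init__(self, seq_len: int, n_features: int, scales=(5, 20)):
        super().__init__()
        self.scales = scales
        # one projection per coarse path, mapping back to the daily length
        self.up_projs = nn.ModuleList(
            [nn.Linear(seq_len // s, seq_len) for s in scales]
        )
        # fuse the daily path plus all coarse paths
        self.fuse = nn.Linear(n_features * (len(scales) + 1), n_features)

    def forward(self, x):                        # x: (batch, seq_len, n_features)
        paths = [x]
        for s, proj in zip(self.scales, self.up_projs):
            coarse = x.transpose(1, 2)           # (batch, n_features, seq_len)
            coarse = nn.functional.avg_pool1d(coarse, kernel_size=s, stride=s)
            coarse = proj(coarse).transpose(1, 2)  # back to (batch, seq_len, n_features)
            paths.append(coarse)
        return self.fuse(torch.cat(paths, dim=-1))  # (batch, seq_len, n_features)
```

In such a sketch, a TSMixer-style backbone could apply the block before or after its mixing layers; the exact placement and fusion operator used in the paper may differ.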
The main contributions of this paper are as follows: (1) A deep learning-based Multi-Scale TsMixer model is proposed, which integrates a Multi-Scale time-series data processing strategy. This model performs feature fusion across multiple time scales (short term, medium term, and long term) to more comprehensively capture the short-term fluctuations and long-term trends in stock index futures history. (2) A stock index futures trend forecasting method combining independent component stock predictions with weighted fusion is proposed. (3) Three different weighted fusion strategies (average-based fusion, weighted-based fusion, and weighted-based decay fusion) are designed and systematically evaluated for their forecasting performance. Experimental results demonstrate that the weighted-based decay fusion method significantly outperforms the other methods in terms of prediction accuracy and stability.
4. Experimental Results and Analysis
In this section, we comprehensively compare and analyze the performance of each model in both the regression and binary classification tasks, employing multiple metrics (MAPE, MAE, $R^2$, precision, specificity, and accuracy) across different sequence lengths and indices. To streamline the discussion, we provide the regression and classification results for the CSI500 stock index in Table 1 and Table 2, respectively, while supplementary findings for the CSI300 and SSE50 indices appear in Appendix B and Appendix C (Table A2, Table A3, Table A4 and Table A5).
The following conventions apply to all of the tables mentioned above (Table 1, Table 2, Table A2, Table A3, Table A4 and Table A5). Upward arrows indicate that larger values are better, and downward arrows indicate that smaller values are better. Green, red, and black mark the highlighted results for the 20-, 60-, and 80-day input sequences, respectively. Bold numbers indicate the best results, and double-underlined numbers indicate the suboptimal results.
4.1. Regression Results
In this regression task, we systematically evaluate the performance of each baseline model (TSMixer, Informer, DLinear, PatchTST, Transformer, TimesNet, TimeMixer) alongside the proposed Multi-Scale TsMixer under input sequences of 20, 60, and 80 days (see Table 1).
Generally, as the input sequence length increases, models capture more historical data, resulting in steady improvements in prediction performance. This trend is particularly evident with longer sequences (60 and 80 days). For instance, from Table 1, the MAE of DLinear is 55.147 for a 20-day sequence; when the sequence length increases to 60 and 80 days, the error decreases by 0.055 and 0.167, respectively.
It is noteworthy that the proposed Multi-Scale TsMixer demonstrates consistent stability across all input lengths, progressively leveraging the advantages of its Multi-Scale features as the input sequence length increases. For instance, at 20 and 60 days, the performance of Multi-Scale TsMixer and DLinear is relatively close, but with an 80-day input, Multi-Scale TsMixer clearly outperforms the other baseline models in terms of MAPE, MAE, and $R^2$. This further validates the efficacy of the Multi-Scale (weekly, monthly) feature fusion strategy in capturing longer-term market trends and subtle fluctuations. These findings not only highlight the importance of utilizing multi-level information in long sequences but also demonstrate that the proposed method can effectively accommodate varying time scales, offering greater applicability and robustness in predicting stock index futures price changes.
To comprehensively evaluate the performance of our model, we further compare its results across different indices. The experimental results for the CSI300 and SSE50 indices are detailed in Appendix B and Appendix C, respectively. The findings reveal that the proposed Multi-Scale TsMixer model consistently delivers the optimal performance across all these indices, demonstrating the robustness of our proposed method and its capability to yield accurate predictions across diverse market conditions.
4.2. Classification Results
Additionally, we further validate the model's effectiveness in trading strategies through the binary classification task (up/down) and evaluate its accuracy during simulated trading using metrics such as precision, specificity, and accuracy. As illustrated in Table 2 and Figure 3, the performance trends of most models remain consistent. Interestingly, we find that improvements in regression metrics (MAE, MAPE) are often accompanied by higher classification accuracy. Our proposed model consistently achieves the highest prediction accuracy across various indices.
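For reference, the classification metrics reported here can be computed from the predicted and actual up/down labels as in the following sketch; treating the upward movement (label 1) as the positive class is an assumption made for illustration.

```python
import numpy as np

def direction_metrics(pred, actual):
    """Precision, specificity, and accuracy for up(1)/down(0) labels."""
    pred, actual = np.asarray(pred), np.asarray(actual)
    tp = np.sum((pred == 1) & (actual == 1))   # correctly predicted ups
    tn = np.sum((pred == 0) & (actual == 0))   # correctly predicted downs
    fp = np.sum((pred == 1) & (actual == 0))   # predicted up, actually down
    fn = np.sum((pred == 0) & (actual == 1))   # predicted down, actually up
    precision   = tp / (tp + fp)
    specificity = tn / (tn + fp)
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    return precision, specificity, accuracy
```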
Figure 3 presents a comparison of accuracy variations across different input sequence lengths for multiple models. In the classification task, our proposed model consistently outperforms all baseline models, regardless of the input sequence length.
4.3. Weighted Fusion Results
After validating the model with overall index data, we turn our attention to the internal structure of the index, performing more detailed predictions at the constituent stock level. Specifically, we independently model each constituent stock and apply different weighting fusion strategies during the prediction phase to aggregate multiple stock forecasts into a composite prediction for the overall index. This approach enables the explicit differentiation of the contribution that each constituent stock makes to the index’s movement. By integrating this method with the previously proposed Multi-Scale TsMixer network structure, we further improve the model’s ability to capture both short-term fluctuations and long-term trends in stock index futures.
In this phase, we use the best-performing Multi-Scale TsMixer with an 80-day sequence length from the previous subsection as the baseline. We then compare three fusion methods: average-based fusion, weighted-based fusion, and weighted-based decay fusion. The experimental results in Table 3 show that the weighted-based decay fusion method delivers the best results across all evaluation metrics, indicating its superiority in capturing the dynamic relationships between constituent stocks and stock index futures.
The experimental results also show that the average-based fusion method, by failing to account for the varying importance of constituent stocks in the index composition, achieves only moderate accuracy. The weighted-based fusion method aligns better with the index structure by applying weights based on market capitalization or liquidity, which improves predictive accuracy; however, it overlooks the potential expectation gap between the futures and spot markets and therefore captures their dynamic relationship insufficiently.
In contrast, the weighted-based decay fusion method achieves a more significant breakthrough. By incorporating a short-term decay factor into the training labels or prediction process, it more accurately reflects the expectation gap between futures and spot markets, resulting in superior performance across accuracy and other metrics compared to the other methods.
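The three fusion strategies can be summarized with the following illustrative Python sketch. The function names, the constituent weights, and in particular the exact form and placement of the decay factor are assumptions made for exposition, not the paper's exact formulation.

```python
import numpy as np

def average_fusion(stock_preds):
    """Average-based fusion: equal weight for every constituent forecast.

    stock_preds: array of shape (n_stocks, horizon) with per-stock forecasts.
    """
    return np.mean(stock_preds, axis=0)

def weighted_fusion(stock_preds, index_weights):
    """Weighted-based fusion: combine forecasts with the constituents'
    index weights (e.g., capitalization- or liquidity-based)."""
    return np.average(stock_preds, axis=0, weights=index_weights)

def weighted_decay_fusion(stock_preds, index_weights, index_expectation,
                          decay=0.9):
    """Weighted-based decay fusion (hypothetical form): blend the
    weight-fused forecast with a decayed prior expectation of the index
    (index_expectation), so that the futures/spot expectation gap is
    attenuated over the forecast horizon."""
    fused = weighted_fusion(stock_preds, index_weights)
    return decay * fused + (1.0 - decay) * index_expectation
```

In this sketch the decay is applied as a simple convex blend at prediction time; the paper's method may instead embed the decay factor in the training labels.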
Further analysis of the performance of these fusion methods across the various indices (CSI500, CSI300, and SSE50) shows that the weighted-based decay fusion method outperforms the others on all indices. For instance, on the CSI500 index, the accuracy (0.61848) and specificity (0.64136) of the weighted-based decay fusion method significantly exceed those of the average-based and weighted-based fusion methods. Table 3 also reveals notable improvements in accuracy and specificity on the CSI300 and SSE50 indices. In particular, for the SSE50 index, the weighted-based decay fusion method achieves an accuracy of 0.60018, which is approximately 4% higher than the accuracy of the average-based (0.57394) and weighted-based (0.57998) fusion methods.
In conclusion, introducing the weighted decay mechanism significantly enhances the prediction synchronization and accuracy in the dynamic relationship between constituent stocks and stock index futures. The weighted-based decay fusion method demonstrates distinct advantages across different indices and fusion strategies, particularly in capturing the dynamic relationships and short-term fluctuations between constituent stocks and stock index futures, highlighting its broad potential for stock index futures prediction.
Finally, we backtest the performance of the Multi-Scale TsMixer, both when predicting index futures directly and when merging the forecasts of the individual component stocks, on the test set covering the period from 1 January 2022 to 31 October 2024. In the return calculation, we incorporate both the predicted and actual signals. Specifically, let $\hat{s}_t$ denote the predicted signal on day $t$ and $s_t$ denote the actual signal on day $t$ (with 1 corresponding to an upward movement and 0 corresponding to a downward movement). We define the indicator function as
$$\mathbb{1}_t = \begin{cases} 1, & \hat{s}_t = s_t \\ 0, & \hat{s}_t \neq s_t. \end{cases}$$
Then, we introduce the conversion factor $c_t$, defined as
$$c_t = 2\,\mathbb{1}_t - 1.$$
Thus, when the predicted signal matches the actual signal, $c_t = 1$; otherwise, $c_t = -1$. The cumulative return over the entire backtesting period is calculated as
$$R = \prod_{t=1}^{T} \left(1 + c_t\, r_t\right) - 1,$$
where the net return on day $t$, $r_t$, is defined as
$$r_t = \left|\frac{P_t - P_{t-1}}{P_{t-1}}\right| - f,$$
where $P_t$ denotes the closing price of the futures on day $t$, and $f$ is the transaction fee, set at 0.000023 (i.e., 0.23 per 10,000).
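The backtest return defined above can be computed as in the following sketch, which follows the reconstructed formulas (conversion factor of ±1, fee deducted from each day's return, daily returns compounded); the compounding convention and array layout are assumptions.

```python
import numpy as np

def backtest_return(pred_signal, actual_signal, close, fee=0.000023):
    """Cumulative backtest return following the formulas above.

    pred_signal, actual_signal: up(1)/down(0) labels for days 1..T
    close: closing prices for days 0..T (one extra leading price)
    The daily return magnitude is signed by whether the direction call
    was correct (conversion factor +1 / -1), the fee is deducted each
    day, and daily net returns are compounded.
    """
    pred = np.asarray(pred_signal)
    actual = np.asarray(actual_signal)
    close = np.asarray(close, dtype=float)

    c = np.where(pred == actual, 1.0, -1.0)            # conversion factor
    r = np.abs(np.diff(close) / close[:-1]) - fee      # net daily return
    return np.prod(1.0 + c * r) - 1.0                  # cumulative return
```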
Figure 4 shows the return comparison between the single-future input and the constituent-stock weighted approach for three index futures with an 80-day feature length.
Table 4 demonstrates that our weighted-based decay fusion method consistently achieves superior returns across the various stock index futures. In addition, the approach achieves a notably lower maximum drawdown and enhanced profitability compared with the single-future-based model, underscoring the remarkable efficacy and robustness of the proposed fusion strategy. Moreover, continuous iterations and optimizations of the fusion method lead to a steady increase in returns across the three stock index futures.