Article

Neural Network-Based CSI300 Stock Prediction: Feature Importance and Attention Mechanism Analysis

1 Department of Computer Science and Engineering, Sogang University, Seoul 04107, Republic of Korea
2 School of Computer Science, Semyung University, Jecheon-si 27136, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(9), 1729; https://doi.org/10.3390/electronics14091729
Submission received: 27 March 2025 / Revised: 17 April 2025 / Accepted: 22 April 2025 / Published: 24 April 2025

Abstract

In this study, neural networks are utilized to develop a stock price prediction model based on the constituent stocks of the China Securities Index 300 (CSI300). This research investigates various prediction methods and models through experiments, comparing their advantages, limitations, and applicability to improve the accuracy and efficiency of stock price forecasting. Furthermore, we analyze the characteristics of CSI300 constituent stocks and explore the relationships among key variables and influential factors, enhancing our understanding of stock market behavior. Additionally, we explore the impact of the attention mechanism on stock price prediction, assess its contribution to enhancing predictive performance, and propose an optimized CSI300 stock prediction model based on neural networks.

1. Introduction

Stock indices serve as key indicators of overall market trends, making stock index prediction an essential component of financial forecasting. For investors, accurately anticipating stock index movements enables strategic asset allocation, maximizing potential returns while minimizing risks. When stock index prices are expected to rise, investors may increase their equity exposure to capitalize on growth opportunities; conversely, in anticipation of a decline, they may shift toward fixed-income assets and cash to preserve capital and hedge against volatility. Beyond individual investors, stock index analysis is equally critical for corporations and policymakers. Listed companies rely on market trend insights to develop effective financing strategies, optimize capital allocation, and mitigate operational risks stemming from market fluctuations. Similarly, governments and financial regulators leverage stock market data to assess economic stability, detect financial anomalies, and implement policies that enhance market resilience.

In China, the stock market has experienced substantial growth, with the China Securities 300 Index (CSI300) emerging as a benchmark for market performance. Introduced in 2005, the CSI300 Index comprises the 300 largest and most liquid companies listed in China’s A-share market, spanning multiple industries. Accounting for approximately 60% of the total market capitalization across the Shanghai and Shenzhen stock exchanges, the index is widely used by investors and analysts as a key market indicator. Given its broad market representation, accurate CSI300 predictions can provide valuable insights for investment decision making, risk management, and economic policy formulation.

Academic research on stock market prediction has consistently aimed to enhance forecasting accuracy while mitigating risks. Traditional approaches rely on statistical models such as ARIMA [1] and GARCH [2], while more recent developments leverage machine learning and deep learning. With the advancement of neural networks, particularly Multi-Layer Perceptrons (MLPs) [3], Convolutional Neural Networks (CNNs) [4], and Recurrent Neural Networks (RNNs) [5], researchers have explored AI-driven approaches to capture complex market dynamics and improve predictive performance. However, existing studies often lack rigorous validation in terms of profitability and practical applicability, limiting their effectiveness in real-world trading environments. Additionally, many models consider stock prices as isolated variables, failing to account for the impact of feature selection and attention mechanisms on prediction accuracy.

To address these limitations, this study proposes a neural network-based CSI300 stock prediction model that integrates an attention mechanism. Through extensive experiments, we compare different prediction models, evaluate their advantages and limitations, and assess their practical usability. Furthermore, we analyze key influencing factors, such as the role of attention layers in stock price forecasting, and present an optimized deep learning model for CSI300 prediction. The main contributions of this study are as follows:
  • We propose a neural network-based prediction framework that integrates attention mechanisms for CSI300 index movement forecasting.
  • We conduct a comprehensive feature engineering process, constructing 111 features including trend, momentum, volume, and frequency-domain indicators.
  • We experimentally demonstrate the performance enhancement achieved through the inclusion of attention layers.
  • We analyze feature importance based on attention weights to improve model interpretability and offer insights into influential market indicators.

2. Related Works

2.1. The Need for Stock Price Prediction

The emergence of stock markets dates back to the early 17th century, when capitalist economies sought efficient mechanisms to raise capital and facilitate investment. Over time, stocks have evolved into critical financial instruments, enabling capital accumulation and fostering economic expansion. The continued globalization of financial markets has led to increased cross-border capital flows, further integrating national economies and accelerating financial interdependence. On 19 December 1990, China established the Shanghai Stock Exchange (SSE), marking a significant milestone in the country’s financial development. Over the past three decades, China’s stock market has played a pivotal role in national economic growth, state-owned enterprise (SOE) reform, and the modernization of its financial system. The stock market serves multiple functions across economic sectors.
  • Economic development: Facilitates efficient resource allocation, mobilizing capital for national economic growth and industrial expansion.
  • Listed companies: Provides firms with access to capital markets, supporting business expansion, production growth, and corporate governance improvement.
  • Investors: Offers diverse investment opportunities, enhances market liquidity, and allows individuals and institutions to seek higher returns.
However, despite its contributions to economic growth, the stock market also carries significant risks. Increasing global financial interconnectivity has heightened systemic vulnerabilities. Local market fluctuations can rapidly spread across economies, triggering financial instability and economic downturns. The COVID-19 pandemic exposed the fragility of global financial markets. The U.S. stock market experienced four consecutive circuit breakers within two weeks, while WTI crude oil futures prices plunged into negative territory for the first time in history. Simultaneously, the market volatility index (VIX) surged to record highs, reflecting heightened uncertainty among investors. More recently, geopolitical tensions, such as the Russia–Ukraine conflict and economic sanctions imposed by Western nations, have further destabilized global financial markets. The Russian ruble depreciated sharply against the U.S. dollar, exacerbating economic difficulties in Russia. Likewise, China’s financial market has experienced significant volatility due to macroeconomic challenges and shifting monetary policies. To counteract economic uncertainty and downward pressure, the People’s Bank of China (PBoC) has implemented a series of interest rate cuts and expansionary monetary policies. As illustrated in Figure 1, by September 2023, the one-year benchmark deposit interest rate had dropped to 1.5%, contributing to declining yields and increased investor caution.
Given these volatile market conditions, prudent investment strategies are more crucial than ever. While the stock market presents significant opportunities for high returns, uninformed or speculative investment decisions can lead to substantial financial losses. This highlights the need for accurate stock price forecasting, particularly in unstable economic environments. By leveraging advanced financial modeling techniques, investors and policymakers can better anticipate market trends, manage risk exposure, and develop adaptive strategies to navigate uncertain financial landscapes. This study seeks to address these challenges by developing a neural network-based stock price prediction model, incorporating attention mechanisms to improve accuracy and feature interpretability in complex financial markets.

2.2. Existing Studies on Stock Price Prediction

Stock price fluctuations are influenced by various factors, which can generally be analyzed using technical analysis and fundamental analysis. Technical analysis utilizes historical price and volume data to identify patterns and trends, aiding in investment decisions. Fundamental analysis examines a company’s financial statements and industry conditions to assess its long-term value [6]. However, fundamental analysis requires an in-depth evaluation of both company-specific factors and macroeconomic conditions, which can be difficult to manage in a complex stock market [7].
Among these approaches, technical analysis is more commonly utilized in algorithmic and AI-driven trading strategies due to its reliance on structured numerical data, making it more suitable for machine learning applications. Stock prices are often assumed to reflect all available market information, including supply–demand imbalances, trading volume trends, and investor sentiment. As a result, identifying chart patterns and trend-based price fluctuations can provide valuable insights into price movement predictions, independent of external economic or political events [8]. Historically, statistical models such as ARIMA, GARCH, and Support Vector Machines (SVMs) were widely used for stock market prediction. Additionally, Genetic Algorithms (GAs) have been explored for stock index forecasting, including applications to the S&P 500 index [9]. However, these traditional methods often struggle to capture the non-linear and dynamic nature of financial markets. With advancements in neural network-based deep learning, more sophisticated predictive techniques have emerged. Recent studies have demonstrated the effectiveness of deep learning in stock forecasting, with models such as Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Generative Adversarial Networks (GAN) achieving significant improvements [10,11,12]. Reinforcement learning-based trading systems have also gained traction, allowing models to adapt dynamically to changing market conditions [13]. Despite these advancements, many existing models lack robust validation in terms of return predictability, limiting their practical utility in real-world trading. Furthermore, a significant gap in research exists in feature selection and structured data segmentation, which are crucial for extracting meaningful market patterns [14,15,16]. Some studies [17] focus on predicting returns without properly comparing their results to broader market indices, making it unclear whether observed profits are derived from model performance or overall market movements. Additionally, although CNNs have been successfully repurposed for stock chart pattern recognition, achieving up to 85% accuracy, many existing studies fail to integrate all critical market-influencing factors, resulting in suboptimal prediction accuracy [18].
While previous studies have employed models such as GAN-LSTM-Attention [19], CNN-BiLSTM-Attention [20], and GRU-Attention [21] for stock index forecasting, these approaches predominantly rely on raw time-series inputs without extensive feature engineering or the integration of structured financial indicators. In contrast, our model combines deep learning techniques with domain-specific technical indicators and employs a dual-layer attention mechanism, thereby improving both interpretability and predictive performance.
This study aims to address these limitations by developing an enhanced neural network-based stock price prediction model. Specifically, the proposed model integrates an attention mechanism to improve feature selection, enhance model interpretability, and optimize both prediction accuracy and return predictability. By incorporating domain-specific financial indicators alongside deep learning architectures, this study seeks to develop a more robust and practical forecasting approach.

3. China Stock Market

The Chinese stock market has undergone significant structural advancements, primarily driven by the Shanghai Stock Exchange (SSE) and the Shenzhen Stock Exchange (SZSE). These two exchanges serve as the foundation of China’s stock market, providing investment opportunities for domestic and international investors. The structure of the Chinese stock market is illustrated in Figure 2.
In 1991, the Shanghai Composite Index and the Shenzhen Composite Index were introduced as the primary indicators of market performance. Shortly after, in 1992, the A-share and B-share markets were established to facilitate trading for domestic and foreign investors, respectively. Over the next decade, additional indices such as the Shenzhen Component Index, Shanghai 50 Index, and Shenzhen 100 Index were launched. However, these indices primarily reflected individual trends of either the Shanghai or Shenzhen stock markets, failing to provide a comprehensive market-wide benchmark. To address this fragmentation and the need for a unified market indicator, the China Securities 300 Index (CSI300) was introduced in 2005.

3.1. The CSI300 Index

The CSI300 Index consists of the 300 largest and most liquid publicly traded companies in China’s A-share market, covering a diverse range of industries. As a comprehensive market indicator, it accounts for approximately 60% of the total market capitalization of both the Shanghai and Shenzhen stock exchanges. Due to its broad representation and high trading activity, the CSI300 Index serves as a critical benchmark for institutional investors, hedge funds, and economic analysts. Since the CSI300 reflects the overall trends of China’s stock market, accurate forecasting of its movements can significantly benefit investors by improving market timing strategies, risk management, and portfolio diversification. Additionally, policymakers and regulators rely on stock index predictions to detect financial anomalies, anticipate economic shifts, and formulate macroeconomic policies. The CSI300 Index is closely linked to other major indices in China, as illustrated in Figure 3. The relationships between the CSI300 and other stock indices highlight its strong correlation with overall market performance, making it a key indicator for financial decision making.
The figure demonstrates how the CSI300 interacts with other indices, reinforcing its role as a primary market indicator. Investors and analysts frequently use the CSI300 alongside sector-specific indices to gain deeper insights into market trends and sectoral movements.

3.2. Construction and Market Influence

The CSI300 Index follows a systematic selection process to ensure it accurately represents market dynamics. Companies are initially ranked based on their trading volume over the past year, allowing for a preliminary assessment of liquidity and market activity. To refine the selection, the bottom 50% of stocks are excluded, removing illiquid securities that may not contribute to reliable market representation. Among the remaining stocks, the top 300 companies by total market capitalization are chosen to form the final composition of the CSI300 Index. To maintain market relevance, the index is rebalanced every six months, ensuring that it reflects current economic trends and sectoral shifts. Given that CSI300 stocks account for nearly 70% of the total market capitalization of China’s A-share market, the index serves as a key benchmark for domestic and global investors, providing insights into the overall performance of China’s financial markets.
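For illustration, this selection rule can be expressed as a short pandas sketch; the column names past_year_volume and total_market_cap are hypothetical placeholders rather than fields of an actual index-provider dataset.

```python
import pandas as pd

def select_csi300(universe: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the CSI300 selection rule described above (column names are hypothetical)."""
    ranked = universe.sort_values("past_year_volume", ascending=False)
    liquid = ranked.head(len(ranked) // 2)             # drop the bottom 50% by trading volume
    return liquid.nlargest(300, "total_market_cap")    # keep the top 300 by market capitalization
```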

3.3. Importance of Predicting the CSI300 Index

As a key indicator of China’s economic health, the CSI300 Index plays an essential role in investment decision making, financial stability, and macroeconomic planning.
  • For Investors: Accurate CSI300 predictions can enhance portfolio allocation strategies, helping investors identify optimal entry and exit points in the market.
  • For Policymakers: Government agencies can use stock market forecasts to develop monetary policies, fiscal regulations, and economic stimulus plans based on expected market trends.
  • For Risk Management: Hedge funds and financial institutions utilize machine learning models to predict index movements, allowing them to mitigate downside risks and maximize returns.
Given the complexity of stock market fluctuations, traditional statistical models (ARIMA, GARCH) struggle to capture non-linear trends. Thus, recent advancements in deep learning and attention-based neural networks offer a more sophisticated approach to stock index forecasting. This study aims to develop an optimized neural network model to enhance the accuracy and interpretability of CSI300 predictions.

4. Methods

4.1. Multi-Layer Perceptron (MLP)

Artificial intelligence has experienced multiple cycles of growth and decline since the 1950s, with deep learning emerging as a dominant paradigm in the 2000s. Advances in GPU acceleration, in-memory computing, and high-performance computing architectures have enabled deep learning to process large-scale financial datasets efficiently. Deep learning has significantly improved performance in pattern recognition tasks such as speech and image recognition. In financial applications, Multi-Layer Perceptrons (MLPs), characterized by multiple hidden layers, have demonstrated strong predictive capabilities. A Multi-Layer Perceptron (MLP) consists of three main components. First, the input layer receives stock market data, including the historical prices, volume, and technical indicators. Next, the hidden layers, composed of multiple neurons with non-linear activation functions [22], extract complex features and detect intricate relationships in the data. Finally, the output layer generates the final prediction, such as future stock price movement. MLPs are trained using backpropagation, an optimization method that adjusts model parameters by minimizing the difference between predicted and actual values [23]. Figure 4 illustrates the deep learning architecture used in this study for stock price prediction.

4.2. Attention Mechanism

The attention mechanism is a fundamental deep learning technique that enhances model performance by dynamically assigning different weights to input sequences. This mechanism enables the model to selectively focus on the most relevant input features, improving its ability to extract meaningful patterns from complex data. The attention layer, illustrated in Figure 5, plays a crucial role in refining the model’s understanding of stock price movements by emphasizing significant features while reducing the influence of less relevant ones.

4.2.1. Basic Principle

The attention mechanism is designed to selectively highlight important input elements, helping the model prioritize key stock-related features. The attention scores are computed based on the relationships between Queries (Q), Keys (K), and Values (V) using the scaled dot-product attention formula, expressed as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where Q (Query) represents the specific feature or position the model needs to focus on, K (Key) corresponds to the input feature or position in the sequence that helps determine relevance, and V (Value) represents the associated weight assigned to each input feature based on its importance. $d_k$ is the dimensionality of the key vectors, which ensures that the dot-product values are scaled appropriately. The softmax function normalizes the attention scores so that they sum to 1.
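For concreteness, the following NumPy sketch implements the scaled dot-product attention defined above; the toy Q, K, V matrices are arbitrary and serve only to show the shapes involved.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for 2-D arrays Q, K, V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax: each row sums to 1
    return weights @ V                                    # weighted combination of values

# Toy example with 3 positions and d_k = 4 (arbitrary numbers).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)        # (3, 4)
```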

4.2.2. Impact on Stock Prediction

By incorporating the attention mechanism, the model significantly improves its ability to analyze stock price fluctuations, particularly in highly volatile market conditions. The mechanism enhances pattern recognition by dynamically adjusting focus, enabling more accurate predictions of market trends and investor behaviors. Additionally, the ability to model complex relationships between financial indicators allows the system to capture short-term and long-term dependencies in stock data effectively.

4.3. Justification for Model Selection

Traditional stock prediction models often rely on complex sequence-based architectures such as LSTM, GRU, or Transformer. While effective in capturing temporal dependencies, these models require considerable computational resources, longer training times, and intricate hyperparameter tuning [24,25]. Additionally, models trained directly on raw time-series data often lack interpretability, which can be problematic in financial applications. In contrast, this study adopts a Multi-Layer Perceptron (MLP) architecture due to its simplicity, computational efficiency, and suitability for structured data. MLPs are well-suited to handle engineered technical indicators and can achieve stable performance with relatively few parameters [26,27]. Their fast training time enables rapid experimentation and architecture refinement while reducing the risk of overfitting, particularly in moderate-sized datasets. However, a standard MLP model may not sufficiently capture the relative importance of each feature in high-dimensional input. To address this limitation, we incorporate attention mechanisms into the MLP architecture. Attention layers dynamically adjust the weight of each input feature based on its contribution to the prediction task, thereby enhancing both interpretability and predictive power. While attention is commonly used in recurrent architectures, our approach demonstrates that attention mechanisms can also be highly effective when integrated into a feed-forward MLP framework [28]. A key challenge in the model design process was determining the optimal placement and dimensionality of the attention layers. We addressed this through iterative experimentation and validation-based tuning. To improve learning stability and address scale differences across features, we applied z-score normalization during preprocessing. In summary, the attention-based MLP model combines architectural simplicity with strong performance and interpretability. It offers a practical alternative to more resource-intensive sequential models, particularly when working with well-engineered input features in financial forecasting.

5. Experiments

This study develops a deep learning-based stock price prediction model using CSI300 constituent stock data. The prediction framework integrates technical indicators, feature engineering techniques, and machine learning-based filtering methods to capture market trends and price movement patterns. The model is implemented in Python using TensorFlow 2.4 as the core deep learning library, with Jupyter notebook 6.4.6 as the development environment. Figure 6 illustrates the process of the overall experiment.
The experiments were conducted on a Linux server located in Korea, equipped with an NVIDIA GTX 1080 Ti GPU (NVIDIA Corporation, Santa Clara, CA, USA), an Intel Core i7 CPU (Intel Corporation, Santa Clara, CA, USA), and 32GB of RAM. The implementation was carried out in Python 3.9 using standard libraries, including NumPy 1.21.6, Pandas 1.3.5, and Matplotlib 3.5.3.
The dataset was constructed by merging multiple CSV files corresponding to individual CSI300 constituent stocks. For each stock, we computed a comprehensive set of technical features, including moving averages, gradients, divergence indicators, and volume-based metrics. The data were aligned by trading date and sorted chronologically. All numerical features were standardized using z-score normalization to ensure consistent scaling across indicators, and rows with missing values were excluded during preprocessing. To preserve the temporal structure of the data, we employed a chronological split as follows: 70% of the data were used for training, 20% for validation, and 10% for testing. This approach avoids information leakage and ensures generalization testing in a realistic time-series setting. We explored multiple training configurations to identify suitable hyperparameters. Specifically, we conducted a series of empirical trials by varying the learning rate from 0.1 to 0.001, testing different batch sizes, and modifying model architecture (e.g., the number of hidden layers from one to four and neuron counts from 50 to 300). Among these, a learning rate of 0.001, a batch size of 150, and 100 training epochs consistently yielded the most stable convergence and strongest validation performance without overfitting. These values were selected as the final training configuration.
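As a rough illustration of this pipeline, the following pandas sketch merges per-stock CSV files, applies z-score normalization, and performs the 70/20/10 chronological split. The directory path and the column names trading_date, symbol, and target are placeholders, and fitting the normalization statistics on the training split only is a common convention assumed here rather than a detail stated above.

```python
import glob
import pandas as pd

# Merge per-stock CSV files (paths and column names are placeholders).
frames = [pd.read_csv(path, parse_dates=["trading_date"]) for path in glob.glob("csi300/*.csv")]
data = pd.concat(frames).sort_values("trading_date").dropna().reset_index(drop=True)
feature_cols = [c for c in data.columns if c not in ("trading_date", "symbol", "target")]

# Chronological 70/20/10 split (no shuffling, so no look-ahead leakage).
n = len(data)
train, valid, test = data.iloc[: int(0.7 * n)], data.iloc[int(0.7 * n): int(0.9 * n)], data.iloc[int(0.9 * n):]

# z-score normalization, with statistics fitted on the training split (an assumption).
mean, std = train[feature_cols].mean(), train[feature_cols].std()
train_x, valid_x, test_x = [(split[feature_cols] - mean) / std for split in (train, valid, test)]
train_y, valid_y, test_y = (split["target"].to_numpy() for split in (train, valid, test))
```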

5.1. Data Collection

The basic structure of the CSI300 dataset, consisting of daily price and volume records for the CSI300 constituent stocks, is presented in Table 1. Based on this dataset, 111 input features were developed through feature extraction techniques, which were subsequently used to train a deep learning model. The next section provides a detailed explanation of the feature engineering process and model training methodology.

5.2. Input Features and Target Vector

This section describes the 111 input features extracted from stock price data and used in the deep learning model. These features are derived through various transformations, including moving averages, gradients, divergence metrics, Fourier Transform features, and candlestick patterns. The target variable is also defined based on stock price movements.

5.2.1. Moving Averages and Gradients

To analyze stock price trends over different timeframes, moving averages are computed for various periods (5, 10, 20, 60, and 120 days). Additionally, their gradients are calculated to measure the rate of change in trend direction.

Moving Average (MA)

The moving average is calculated as follows:
$$MA_P = \frac{1}{P}\sum_{i=0}^{P-1} \mathrm{ClosePrice}_{t-i}$$
where $P$ represents the lookback period.

Gradient of the Moving Average (Grad)

The gradient of the moving average is computed as follows:
$$Grad_P = \frac{d}{dt}\, MA_P$$
This helps detect acceleration or deceleration in price trends. A total of 10 features (5 MAs + 5 Gradients) are generated from this group.
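A minimal pandas sketch of this feature group is shown below; the close series is a placeholder for the closing-price column, and the time derivative is approximated by a first difference of the moving average. The same two transformations applied to the trading-volume series yield the volume-based counterparts described next.

```python
import pandas as pd

def ma_and_grad_features(close: pd.Series, periods=(5, 10, 20, 60, 120)) -> pd.DataFrame:
    feats = {}
    for p in periods:
        ma = close.rolling(window=p).mean()     # MA_P: mean of the last P closing prices
        feats[f"MA{p}"] = ma
        feats[f"Grad{p}"] = ma.diff()           # first difference approximates d(MA_P)/dt
    return pd.DataFrame(feats)
```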

5.2.2. Volume-Based Features

Similar to price-based indicators, moving averages and gradients are also computed for trading volume data.

Volume Moving Average (VMA)

The volume moving average is calculated by averaging the trading volume over a specified lookback period P, as follows:
$$VMA_P = \frac{1}{P}\sum_{i=0}^{P-1} \mathrm{Volume}_{t-i}$$

Gradient of the Volume Moving Average (VGrad)

The gradient of the volume moving average indicates the rate of change in trading volume trends over time and is computed as follows:
$$VGrad_P = \frac{d}{dt}\, VMA_P$$
These features capture trends in trading activity, helping to identify accumulation or distribution phases. A total of 10 features are included.

5.2.3. Divergence Metrics

Divergence features compare the stock’s closing price to its moving averages, helping to identify overbought or oversold conditions.

Price Divergence (Div)

Price divergence measures the difference between the current closing price and the moving average over a lookback period P:
$$Div_P = \mathrm{ClosePrice}_t - MA_P$$
This measures how far the current price deviates from the trend.

Disparity Index (Dis)

The disparity index expresses the price divergence as a percentage, providing a normalized view of how far the price is from its moving average:
$$Dis_P = \frac{\mathrm{ClosePrice}_t}{MA_P} \times 100$$
This expresses price deviation in percentage terms.

Volume Divergence and Disparity Index (VDiv, VDis)

Similar calculations are applied to the volume data to measure deviations from typical trading activity. A total of 20 features (5 each for Div, Dis, VDiv, and VDis) are included.
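The divergence and disparity features can be sketched as follows; applying the same function to the closing price and to the trading volume produces the Div/Dis and VDiv/VDis groups, respectively. The series names are placeholders, and the percentage form of the disparity index follows the definition given above.

```python
import pandas as pd

def divergence_features(series: pd.Series, prefix: str = "", periods=(5, 10, 20, 60, 120)) -> pd.DataFrame:
    feats = {}
    for p in periods:
        ma = series.rolling(window=p).mean()
        feats[f"{prefix}Div{p}"] = series - ma          # deviation from the P-day moving average
        feats[f"{prefix}Dis{p}"] = series / ma * 100    # disparity index, in percent
    return pd.DataFrame(feats)

# price_feats = divergence_features(close)               # Div, Dis
# volume_feats = divergence_features(volume, prefix="V") # VDiv, VDis
```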

5.2.4. Fourier Transform Features

To analyze frequency-domain characteristics of stock price movements, the Fast Fourier Transform (FFT) is applied to both closing prices and trading volume.

Amplitude

Amplitude represents the magnitude of the signal in the frequency domain, computed using its real and imaginary components:
$$\mathrm{Amplitude} = \frac{\sqrt{\mathrm{Real}^2 + \mathrm{Imaginary}^2}}{N/2}$$

Phase (θ)

The phase of the signal is measured using the arctangent of the ratio between imaginary and real parts:
$$\theta = \arctan\!\left(\frac{\mathrm{Imaginary}}{\mathrm{Real}}\right)$$

Frequency Components

The dominant frequency components of the signal are extracted using the Fast Fourier Transform (FFT). The corresponding frequency values are obtained using:
FFT Frequency = np.fft.fftfreq(N, d=1)
which identifies dominant cycles in the data. A total of 6 features (Amplitude, Theta, and Frequency, for both close price and volume) are included.
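A NumPy sketch of these frequency-domain features is given below. Since six features are produced in total (amplitude, phase, and frequency for both price and volume), we read "dominant" as the single strongest non-DC component per series; that choice, and the use of arctan2 for numerical robustness, are our assumptions.

```python
import numpy as np

def fft_features(values: np.ndarray):
    """Amplitude, phase, and frequency of the strongest non-DC component."""
    N = len(values)
    spectrum = np.fft.fft(values)
    freqs = np.fft.fftfreq(N, d=1)                      # daily sampling interval
    amplitude = np.abs(spectrum) / (N / 2)              # sqrt(Re^2 + Im^2) / (N/2)
    phase = np.arctan2(spectrum.imag, spectrum.real)    # theta = arctan(Im / Re)
    pos = np.arange(1, N // 2)                          # positive frequencies, DC excluded
    k = pos[np.argmax(amplitude[pos])]                  # dominant component
    return amplitude[k], phase[k], freqs[k]
```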

5.2.5. Candlestick Type

Each trading day is categorized based on whether the closing price is higher than the opening price as follows:
$$\mathrm{CandleType} = \begin{cases} 1, & \text{if } \mathrm{ClosePrice} > \mathrm{OpenPrice} \\ 0, & \text{otherwise} \end{cases}$$
This binary feature captures daily market sentiment. A total of 1 feature is included.

5.2.6. Price Change and Target-Related Features

To track stock momentum, the percentage change in closing price over 5 days is computed as follows:
$$\mathrm{PriceChange} = \frac{\mathrm{ClosePrice}_t - \mathrm{ClosePrice}_{t-5}}{\mathrm{ClosePrice}_{t-5}} \times 100$$
This helps measure short-term trend strength.

5.2.7. Additional Features

To enhance predictive performance, additional features are derived from stock price and trading volume data, capturing trend dynamics, liquidity, and momentum indicators. Moving averages play a crucial role in trend analysis. The Simple Moving Average (SMA) smooths out short-term fluctuations, while the Exponential Moving Average (EMA) assigns greater weight to recent prices for quicker responsiveness. The Double Exponential Moving Average (DEMA) further reduces lag, allowing for earlier trend detection. Both short-term (12-day) and long-term (26-day) EMAs are included to differentiate between short- and long-term trends. The Moving Average Convergence Divergence (MACD), calculated as the difference between the short- and long-term EMAs, helps identify trend shifts. Additionally, the MACD Signal Line, derived from a 9-day moving average of MACD, and the MACD Histogram, which represents the difference between MACD and the Signal Line, provide further confirmation of trend direction.

Volume-based indicators add depth to market trend analysis. The Volume-Weighted Average Price (VWAP), computed over multiple periods (5, 10, 20, 60, and 120 days), reflects the average price at which a security has traded, weighted by volume, offering insights into institutional trading behavior. The Rate of Change (ROC) measures the percentage change in price over a given period, highlighting momentum shifts. By integrating these trend-following, momentum, and liquidity-based indicators, the model effectively captures stock price movements, improving its ability to recognize patterns and predict future trends.
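A pandas sketch of these additional indicators follows. The SMA and ROC windows and the close-price-based rolling VWAP are illustrative simplifications, since the exact windows and price basis are not specified above; the MACD signal line is computed as an exponential moving average, a standard reading of the 9-day moving average.

```python
import pandas as pd

def extra_indicators(close: pd.Series, volume: pd.Series) -> pd.DataFrame:
    ema12 = close.ewm(span=12, adjust=False).mean()                 # short-term EMA
    ema26 = close.ewm(span=26, adjust=False).mean()                 # long-term EMA
    dema12 = 2 * ema12 - ema12.ewm(span=12, adjust=False).mean()    # DEMA reduces lag
    macd = ema12 - ema26                                            # MACD line
    signal = macd.ewm(span=9, adjust=False).mean()                  # MACD signal line
    feats = {
        "SMA20": close.rolling(20).mean(),                          # SMA window assumed
        "EMA12": ema12, "EMA26": ema26, "DEMA12": dema12,
        "MACD": macd, "MACDSignal": signal, "MACDHist": macd - signal,
        "ROC10": close.pct_change(10) * 100,                        # ROC window assumed
    }
    for p in (5, 10, 20, 60, 120):                                  # rolling close-price VWAP
        feats[f"VWAP{p}"] = (close * volume).rolling(p).sum() / volume.rolling(p).sum()
    return pd.DataFrame(feats)
```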

5.3. Target Vector

The target variable is defined based on the 5-day percentage change in closing price. It is classified into two categories.
$$\mathrm{Target} = \begin{cases} 1, & \text{if } \mathrm{PriceChange} \geq 0 \\ 0, & \text{otherwise} \end{cases}$$
where 1 indicates a non-negative price change (stock price increased or remained the same), and 0 indicates a decrease. This binary classification is used to train the deep learning model for stock trend prediction.
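The trailing price change of Section 5.2.6 and the binary target can be derived as in the sketch below. Aligning the label with the forward five-day change is our reading of predicting whether the closing price will rise after five days; rows without a complete window are dropped.

```python
import pandas as pd

def price_change_and_target(close: pd.Series, horizon: int = 5) -> pd.DataFrame:
    # Trailing 5-day change, used as an input feature (Section 5.2.6).
    price_change = (close - close.shift(horizon)) / close.shift(horizon) * 100
    # Forward 5-day change, thresholded at zero to form the binary label (Section 5.3).
    future_change = (close.shift(-horizon) - close) / close * 100
    df = pd.DataFrame({"PriceChange": price_change, "FutureChange": future_change})
    df["Target"] = (df["FutureChange"] >= 0).astype(int)   # 1: rise or flat, 0: decline
    return df.dropna()                                      # drop rows without a full window
```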

5.4. Model Evaluation Metrics

To evaluate the performance of the classification model, standard evaluation metrics are used. The confusion matrix categorizes predictions into four possible outcomes. A True Positive (TP) occurs when the model correctly predicts that a stock will rise, while a True Negative (TN) occurs when the model correctly predicts that a stock will decline. On the other hand, a False Positive (FP) happens when the model incorrectly predicts that a stock will rise when it actually declines, and a False Negative (FN) occurs when the model incorrectly predicts that a stock will decline when it actually rises. The structure of these outcomes is summarized in Table 2.
An ideal model should maximize TP and TN while minimizing FP and FN to ensure higher predictive accuracy. Several key metrics are used to assess classification performance as follows:
Accuracy measures the proportion of correct predictions among all samples and is calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision evaluates the proportion of correctly predicted positive cases out of all predicted positive cases, calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
A high precision indicates that most of the predicted rising stocks are indeed correct.
Recall, also known as sensitivity, measures the model’s ability to correctly identify actual positive cases as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
A high recall means the model captures most of the rising stocks, minimizing missed opportunities.
The F1-Score provides a balanced measure by considering both precision and recall, making it a more reliable metric when dealing with imbalanced datasets, expressed as follows:
$$\mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
The F1-Score is particularly useful in stock market prediction, where imbalanced data can make accuracy a misleading metric.
Since stock market data is often imbalanced, accuracy alone is insufficient for evaluation, as it may fail to reflect the model’s effectiveness in predicting both rising and declining stocks. Instead, the F1-Score is prioritized to ensure a better balance between identifying positive cases and avoiding false alarms.
To optimize predictions, the binary cross-entropy loss function is used, which measures the difference between the actual and predicted probabilities as follows:
$$L = -\left[\, y \log(p) + (1 - y)\log(1 - p) \,\right]$$
where y represents the true label (0 or 1), and p represents the predicted probability. This function ensures that the model is trained to make confident and accurate predictions by penalizing incorrect classifications more heavily.
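For completeness, the four metrics and the (sample-averaged) binary cross-entropy can be computed from predicted probabilities as in the following NumPy sketch.

```python
import numpy as np

def classification_metrics(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5) -> dict:
    y_pred = (y_prob >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    p = np.clip(y_prob, 1e-7, 1 - 1e-7)                    # avoid log(0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "loss": float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))),
    }
```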

6. Experiment Results

This section presents the experimental results for the stock price prediction model using a Multi-Layer Perceptron (MLP) with and without attention layers. The same dataset and hyperparameters are used across both models to evaluate the impact of the attention mechanism. The MLP model takes 111 input features and predicts whether the closing price will rise after five days, producing a binary output (0 or 1). The training process is optimized using the RMSprop optimizer [30] with a learning rate of 0.001, a batch size of 150, and 100 epochs. To examine the effect of the attention layer, an alternative model is created by inserting attention layers between MLP layers. Specifically, an attention layer is applied after the input layer, and another after an intermediate MLP layer. The outputs from both attention layers are then concatenated before the final prediction layer, allowing the model to enhance feature interactions and refine important patterns in the data. All experiments were repeated five times using different random seeds to account for the stochastic nature of training. The performance metrics reported in the following tables represent the average values over these runs. To verify the statistical significance of the improvements, a two-sample t-test was conducted on the F1-Scores obtained from the repeated experiments.
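The significance test can be reproduced with scipy.stats.ttest_ind as sketched below; the per-run F1-Score lists are placeholder values, not the actual run-level results of this study.

```python
from scipy import stats

# F1-Scores from five independent runs of each model (placeholder values only,
# not the actual run-level results of this study).
f1_without_attention = [0.889, 0.891, 0.890, 0.888, 0.892]
f1_with_attention = [0.969, 0.970, 0.971, 0.969, 0.970]

# Two-sample t-test on the repeated F1-Scores; a small p-value indicates that the
# gain from the attention layers is unlikely to be due to training randomness.
t_stat, p_value = stats.ttest_ind(f1_with_attention, f1_without_attention)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```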

6.1. Model Performance Without Attention Layers

For the model without attention layers, the results for the training and validation data are shown in Figure 7. The left side of the figure illustrates the loss function, while the right side depicts accuracy over epochs. The model achieves low prediction error and high accuracy on training data, but the validation error remains unstable, suggesting possible overfitting.
The detailed performance metrics, including loss, accuracy, precision, recall, and F1-Score, are summarized in Table 3.

6.2. Model Performance with Attention Layers

For the MLP model incorporating attention layers, the learning results are shown in Figure 8. Despite using the same dataset and hyperparameters, adding attention layers between dense layers helps reduce error divergence and significantly improves validation performance. The model structure consists of an input layer followed by an attention layer. A fully connected MLP layer with 200 neurons and ReLU activation is applied, followed by a second attention layer after another dense layer with 50 neurons. The outputs of both attention layers are then concatenated before passing through a final dense layer with a sigmoid activation function, which predicts the binary stock movement.
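A Keras sketch of this structure is given below. The text does not fix the internal form of the attention layers, so the softmax feature-weighting block used here (a Dense softmax followed by element-wise multiplication) is an assumption, as are the layer names; the optimizer, learning rate, batch size, and epoch count follow the settings stated at the start of this section.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def feature_attention(x, name):
    # Softmax weights over the incoming features, applied multiplicatively
    # (an assumed implementation of the attention block, not the verified original).
    weights = layers.Dense(x.shape[-1], activation="softmax", name=f"{name}_weights")(x)
    return layers.Multiply(name=f"{name}_out")([x, weights])

def build_attention_mlp(n_features: int = 111) -> Model:
    inputs = layers.Input(shape=(n_features,))
    att1 = feature_attention(inputs, "att1")              # attention after the input layer
    h = layers.Dense(200, activation="relu")(att1)        # dense layer with 200 neurons
    h = layers.Dense(50, activation="relu")(h)            # dense layer with 50 neurons
    att2 = feature_attention(h, "att2")                   # second attention layer
    merged = layers.Concatenate()([att1, att2])           # concatenate both attention outputs
    outputs = layers.Dense(1, activation="sigmoid")(merged)
    model = Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# model = build_attention_mlp()
# model.fit(train_x, train_y, validation_data=(valid_x, valid_y), batch_size=150, epochs=100)
```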
The detailed performance metrics for the MLP model with attention layers are presented in Table 4.
The experimental results confirm that adding attention layers improves model generalization and stability. While the MLP model without attention exhibits overfitting, with low training error but unstable validation performance, the MLP model with attention layers demonstrates better performance across all datasets, with lower validation loss and higher F1-Scores. By incorporating attention layers between dense layers and merging their outputs via concatenation, the model effectively captures dependencies across input features and enhances predictive accuracy for stock price movements.

6.3. Performance Comparison and Effect of Attention

The inclusion of attention layers significantly improved the model’s generalization performance. As shown in Table 4, the MLP with attention outperformed the baseline MLP in all key metrics. The MLP model with attention achieved a test accuracy of 97.77% and an F1-Score of 96.99%, significantly outperforming the baseline MLP without attention, which recorded a test accuracy of 91.76% and an F1-Score of 89.00% (Table 3). This indicates that the attention mechanism effectively captured the most influential features and reduced noise in the feature space. Although MLP is a relatively simple architecture, the use of 111 engineered technical indicators allowed it to model complex market behaviors. Attention layers further enhanced feature prioritization, contributing to superior performance without overfitting. While deeper architectures such as LSTM or Transformer are often adopted in sequence modeling, our results suggest that for structured financial indicators, attention-augmented MLPs can offer both efficiency and interpretability. A two-sample t-test conducted on five independent runs showed that the performance improvement in F1-Score from the attention-based model was statistically significant, with a p-value of $1.94 \times 10^{-12}$. This result confirms that the inclusion of attention layers leads to a reliable and measurable improvement in model generalization.

6.4. Attention Layer Analysis

This section analyzes which input features the attention layer focuses on during stock price prediction. The attention mechanism assigns different weights to input features, allowing the model to emphasize more relevant factors. To visualize the distribution of attention weights, a heatmap of the calculated weights for each input feature is displayed in Figure 9.
Through the learning process from Attention Layer 1 to Attention Layer 3, it can be observed that the attention mechanism gradually refines its focus on important input features. The state weights represent the importance assigned to each feature based on the similarity between the key and query vectors. This mechanism helps determine which parts of the input sequence the model should prioritize for accurate predictions. To further investigate which features contribute the most to stock price predictions, the average attention scores across 111 input features were analyzed. The results are illustrated in Figure 10, where yellow represents higher attention scores, while purple represents lower attention scores.
The results indicate that input features such as VMA5, VGrad10, DEMA10, and UPMOVE—which correspond to the short-term moving averages of the trading volume and price trends—receive the highest attention scores. This suggests that, since the prediction target in this study is the closing price in five days, the model assigns higher weights to short-term moving averages, trading volume trends, and moving average slopes. By calculating the average attention scores, the model highlights the most critical input features among the 111 available features. The heatmap visualization clearly illustrates which factors the model prioritizes, with yellow indicating the most influential features.
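The feature-importance analysis can be approximated as in the sketch below, which reads out the softmax weights of the first attention block from the Keras model sketched in Section 6.2, averages them over test samples, and renders a heatmap; the layer name att1_weights and the feature_cols list come from the earlier sketches and are assumptions, not details of the original implementation.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Auxiliary model exposing the softmax weights of the first attention block
# ("att1_weights" is the layer name used in the sketch of Section 6.2).
att_model = tf.keras.Model(model.input, model.get_layer("att1_weights").output)
weights = att_model.predict(test_x)                      # shape: (n_samples, 111)

avg_scores = weights.mean(axis=0)                        # average attention per input feature
top = np.argsort(avg_scores)[-10:]                       # ten most attended features
print([feature_cols[i] for i in top])

plt.imshow(avg_scores.reshape(1, -1), aspect="auto", cmap="viridis")  # yellow = high, purple = low
plt.yticks([]); plt.xlabel("Input feature index"); plt.colorbar(label="Average attention score")
plt.show()
```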

7. Performance Comparison with Other Deep Learning Models

To further contextualize the effectiveness of our proposed attention-based MLP model, we compared its performance with several recent deep learning models that conducted binary classification (up/down) of stock price movements using the CSI300 index. Table 5 summarizes the reported accuracy and F1-Scores of these models, drawn from the literature published between 2020 and 2025.
As shown in Table 5, recent classification models applied to the CSI300 dataset have achieved moderate accuracy levels, ranging from approximately 55% to 60%. Models such as WD-LSTM and DanSmp, which combine LSTM with denoising or dual-attention graph learning, offer incremental performance over traditional methods. Chart GCN stands out with a reported accuracy of 94.1%, though it focuses on pattern graph extraction rather than direct time-series features, and its F1-Score was not disclosed.
Our attention-based MLP outperforms these models, achieving a test accuracy of 97.77% and an F1-Score of 96.99%. While architectural differences and input features vary across studies, these results demonstrate the practical effectiveness of our model in stock movement prediction using technical indicators.
To ensure comparability, we limited this analysis to classification-based models using the CSI300 index or its constituent stocks. Studies using regression metrics (e.g., R 2 or RMSE) were excluded from the comparison.

8. Conclusions

This study examined stock price prediction for the CSI300 index using Multi-Layer Perceptron (MLP) models, both with and without attention layers. The experimental findings show that incorporating attention mechanisms substantially improves the model’s generalization ability and stability. The baseline MLP model, although highly accurate on training data, exhibited signs of overfitting and delivered less stable validation performance. In contrast, the attention-based model achieved lower validation loss and higher F1-Scores across all evaluation sets.
In addition to architectural enhancements, the inclusion of a rich set of 111 technical indicators—including moving averages, gradients, divergence measures, and Fourier Transform features—contributed to the model’s improved predictive accuracy. Attention-based feature importance analysis highlighted the significance of short-term trends, volume fluctuations, and momentum indicators in forecasting stock movements.
Despite these promising results, the challenge of stock market prediction persists due to the inherently complex and volatile nature of financial markets. External factors such as economic policy shifts, geopolitical risks, and market sentiment often exert influence beyond the scope of historical data. Moreover, like all data-driven models, neural networks can produce unexpected results, emphasizing the need for robust risk management when applying such models to real-world decision making.
Maintaining model relevance requires regular updates with new market data to reflect shifting dynamics. One notable limitation of this study is the absence of post-2023 stock data, as the proprietary data source used during this research is no longer accessible. While public APIs provide limited support, especially for A-share stocks listed on the Shanghai and Shenzhen exchanges, they do not yet enable complete and consistent data retrieval.
Future work will focus on integrating commercial data services, such as TuShare or Wind, to broaden the dataset and support evaluation under more volatile market conditions. This would also pave the way for building real-time prediction systems that can adapt continuously to market updates.
Overall, this work demonstrates the value of attention-enhanced MLPs for stock prediction, outlines key limitations, and offers practical insights for improving model robustness. The findings contribute to the growing body of research in AI-powered financial forecasting and highlight promising directions for future exploration.

Author Contributions

Conceptualization, Z.D. and K.S.; methodology, K.S.; software, K.S.; validation, K.S., Z.D. and Y.S.; formal analysis, K.S.; investigation, K.S.; resources, Y.S.; data curation, K.S.; writing—original draft preparation, K.S.; writing—review and editing, Z.D. and Y.S.; visualization, K.S.; supervision, Y.S.; project administration, Y.S.; funding acquisition, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Semyung University Research Grant (Grant No. 2023).

Data Availability Statement

The datasets and source code used in this study are openly available on GitHub at https://github.com/YoojeongSong/NNCSI300 (accessed on 23 April 2025).

Acknowledgments

This manuscript is partially based on the Master’s thesis of Zhipeng Dong, submitted to Semyung University, Republic of Korea. The content has been significantly revised for journal publication. The authors confirm that the figures and experimental results are reused from the thesis with appropriate modifications and that the thesis is publicly accessible without copyright restrictions.

Conflicts of Interest

The authors declare that they have no conflicts of interest related to this work.

References

  1. Shumway, R.H.; Stoffer, D.S. ARIMA Models. In Time Series Analysis and Its Applications: With R Examples; Springer: New York, NY, USA, 2017; pp. 75–163. [Google Scholar]
  2. Bauwens, L.; Laurent, S.; Rombouts, J.V. Multivariate GARCH Models: A Survey. J. Appl. Econom. 2006, 21, 79–109. [Google Scholar] [CrossRef]
  3. Popescu, M.C.; Balas, V.E.; Perescu-Popescu, L.; Mastorakis, N. Multilayer Perceptron and Neural Networks. WSEAS Trans. Circ. Syst. 2009, 8, 579–588. [Google Scholar]
  4. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019. [Google Scholar] [CrossRef]
  5. Medsker, L.R.; Jain, L. Recurrent Neural Networks: Design and Applications; Springer: New York, NY, USA, 2001; Volume 5, pp. 64–67. [Google Scholar]
  6. Kim, S.D. Data Mining Tool for Stock Investors Decision Support. J. Korea Contents Assoc. 2012, 12, 472–482. [Google Scholar] [CrossRef]
  7. Won, J.M.; Hwang, H.S.; Jeong, Y.H.; Park, H.D. Stock Price Prediction Using Technical Analysis Indicators and Deep Learning. In Proceedings of the KIIT Conference, Seoul, Republic of Korea, 30 November–1 December 2018; pp. 404–405. [Google Scholar]
  8. Song, Y.J.; Lee, J.W. A Design and Implementation of Deep Learning Model for Stock Prediction Using TensorFlow. KIISE Trans. Comput. Pract. 2017, 23, 799–801. [Google Scholar]
  9. Allen, F.; Karjalainen, R. Using Genetic Algorithms to Find Technical Trading Rules. J. Financ. Econ. 1999, 51, 245–271. [Google Scholar] [CrossRef]
  10. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting Stock Market Index Using Fusion of Machine Learning Techniques. Expert Syst. Appl. 2015, 42, 2162–2172. [Google Scholar] [CrossRef]
  11. Adebiyi, A.A.; Adewumi, A.O.; Ayo, C.K. Comparison of ARIMA and Artificial Neural Networks Models for Stock Price Prediction. J. Appl. Math. 2014, 2014, 614342. [Google Scholar] [CrossRef]
  12. Laboissiere, L.A.; Fernandes, R.A.; Lage, G.G. Maximum and Minimum Stock Price Forecasting of Brazilian Power Distribution Companies Based on Artificial Neural Networks. Appl. Soft Comput. 2015, 35, 66–74. [Google Scholar] [CrossRef]
  13. Rundo, F. Deep LSTM with Reinforcement Learning Layer for Financial Trend Prediction in FX High-Frequency Trading Systems. Appl. Sci. 2019, 9, 4460. [Google Scholar] [CrossRef]
  14. Zhou, X.; Pan, Z.; Hu, G.; Tang, S.; Zhao, C. Stock Market Prediction on High-Frequency Data Using Generative Adversarial Nets. Math. Probl. Eng. 2018, 2018, 4907423. [Google Scholar] [CrossRef]
  15. Gupta, A.; Chaudhary, D.K.; Choudhury, T. Stock Prediction Using Functional Link Artificial Neural Network (FLANN). In Proceedings of the 3rd International Conference on Computational Intelligence and Networks (CINE), Odisha, India, 28 October 2017; pp. 10–16. [Google Scholar]
  16. Dixon, M.; Klabjan, D.; Bang, J.H. Classification-Based Financial Markets Prediction Using Deep Neural Networks. Algorithmic Financ. 2017, 6, 67–77. [Google Scholar] [CrossRef]
  17. Ding, X.; Zhang, Y.; Liu, T.; Duan, J. Deep Learning for Event-Driven Stock Prediction. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
  18. Arora, N. Financial Analysis: Stock Market Prediction Using Deep Learning Algorithms. In Proceedings of the International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), Jaipur, India, 26–28 February 2019; Amity University Rajasthan: Rajasthan, India, 2019. [Google Scholar]
  19. Li, P.; Wei, Y.; Yin, L. Research on Stock Price Prediction Method Based on the GAN-LSTM-Attention Model. Comput. Mater. Cont. 2025, 82, 609–625. [Google Scholar] [CrossRef]
  20. Zhang, J.; Ye, L.; Lai, Y. Stock Price Prediction Using CNN-BiLSTM-Attention Model. Mathematics 2023, 11, 1985. [Google Scholar] [CrossRef]
  21. Huang, X. Stock Price Prediction Based On Neural Networks Incorporating Attention Mechanisms. In Proceedings of the 3rd International Conference on Internet Finance and Digital Economy (ICIFDE 2023), Chengdu, China, 4–6 August 2023; Atlantis Press: Dordrecht, The Netherlands, 2023; pp. 505–514. [Google Scholar] [CrossRef]
  22. Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms; Spartan Books: Washington, DC, USA, 1961. [Google Scholar]
  23. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning Representations by Back-Propagating Errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  24. Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A Survey of Transformers. AI Open 2022, 3, 111–132. [Google Scholar] [CrossRef]
  25. Duraj, A.; Szczepaniak, P.S.; Sadok, A. Detection of Anomalies in Data Streams Using the LSTM-CNN Model. Sensors 2025, 25, 1610. [Google Scholar] [CrossRef]
  26. Liu, H.; Dai, Z.; So, D.R.; Le, Q.V. Pay Attention to MLPs. Adv. Neural Inf. Process. Syst. 2021, 34, 11912–11923. [Google Scholar]
  27. Gorishniy, Y.; Kotelnikov, A.; Babenko, A. TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling. arXiv 2025, arXiv:2410.24210. [Google Scholar]
  28. Chen, P.; Dong, W.; Wang, J.; Lu, X.; Kaymak, U.; Huang, Z. Interpretable Clinical Prediction via Attention-Based Neural Network. BMC Med. Inform. Decis. Mak. 2020, 20, 131. [Google Scholar] [CrossRef]
  29. Dong, Z. Neural Network-Based Analysis and Prediction of the Chinese Stock Market. Master’s Thesis, Semyung University, Jecheon, Republic of Korea, 2024. [Google Scholar]
  30. Soydaner, D. A Comparison of Optimization Algorithms for Deep Learning. Int. J. Pattern Recognit. Artif. Intell. 2020, 34, 2052013. [Google Scholar] [CrossRef]
  31. Shi, Y.; Wang, Y.; Qu, Y.; Chen, Z. Integrated GCN-LSTM stock prices movement prediction based on knowledge-incorporated graphs construction. Int. J. Mach. Learn. Cybern. 2023, 15, 161–176. [Google Scholar] [CrossRef]
  32. Zhao, J.; Li, B.; Wang, X. DanSmp: Dual Attention Networks with Structured Market Knowledge Graphs for Stock Movement Prediction. Proc. AAAI Conf. Artif. Intell. 2022, 36, 7774–7782. [Google Scholar]
  33. Li, Q.; Zhou, Y.; Feng, Z. Chart-GCN: Stock Trend Forecasting Using Graph Convolutional Networks on Price Pattern Graphs. Expert Syst. Appl. 2022, 198, 116873. [Google Scholar]
  34. Liao, T.; Yu, M.; Chen, X. DHSTN: Dynamic Hypergraph Spatio-Temporal Network for Stock Prediction with Industry-Level Relations. Inf. Sci. 2024, 651, 119874. [Google Scholar]
  35. Peng, Y.; Huang, K.; Liu, X. RTGCN: Relation-Type Guided Graph Convolutional Network for Stock Movement Prediction. Knowl.-Based Syst. 2023, 266, 110194. [Google Scholar]
Figure 1. Trend of one-year deposit interest rates.
Figure 2. China Stock Market index system.
Figure 3. Relationship Between the CSI Index.
Figure 4. Structure of a multi-layer perceptron.
Figure 5. Computation steps in the attention mechanism.
Figure 6. Overview of the experimental process, illustrating data preparation, model training, and evaluation workflow (adapted from the Master’s thesis of Zhipeng Dong, 2024 [29]).
Figure 7. Visualization of loss and accuracy for the MLP model without attention layers.
Figure 8. Learning error and accuracy of models using attention layers.
Figure 9. Layered attention weights for each attention layer.
Figure 10. Input feature importance analysis based on attention scores.
Table 1. Sample of CSI300 daily stock price data.

Trading Date | Symbol | Symbol Name | Open Price | Close Price | High Price | Low Price | Trading Volume | Change Ratio
2016-05-02 | 000001 | Ping An Bank | 7.982 | 7.982 | 7.982 | 7.982 | 0 | 0
2016-05-03 | 000001 | Ping An Bank | 7.990 | 8.065 | 8.096 | 7.945 | 48,910,210 | 0.01040
2023-04-26 | 601857 | PetroChina | 7.500 | 7.540 | 7.600 | 7.380 | 258,525,790 | −0.01050
Table 2. Confusion matrix.

 | Actual Positive | Actual Negative
Predicted Positive | True Positive (TP) | False Positive (FP)
Predicted Negative | False Negative (FN) | True Negative (TN)
Table 3. Summary of loss, accuracy, precision, recall, and F1-Score values for the MLP model without attention layers.

Model | Dataset | Loss | Accuracy | Precision | Recall | F1-Score
MLP without attention | Train | 0.0155 | 0.9997 | 0.9997 | 0.9998 | 0.9997
MLP without attention | Validation | 0.7912 | 0.9716 | 0.9391 | 0.9999 | 0.9684
MLP without attention | Test | 0.2349 | 0.9176 | 0.9299 | 0.8826 | 0.8900
Table 4. Model performance evaluation metrics using attention layers.

Model | Dataset | Loss | Accuracy | Precision | Recall | F1-Score
MLP with attention | Train | 0.2224 | 0.9944 | 0.9950 | 0.9942 | 0.9946
MLP with attention | Validation | 0.0849 | 0.9806 | 0.9593 | 0.9986 | 0.9784
MLP with attention | Test | 0.0910 | 0.9777 | 0.9890 | 0.9588 | 0.9699
Table 5. Performance comparison of CSI300-based classification models.

Model | Accuracy (%) | F1-Score (%)
GCN-LSTM [31] | 57.81 | –
DanSmp [32] | 55.79 | –
GCRNN [33] | 96.92 | 96.91
DHSTN [34] | 51.34 | 58.30
EA-RTGCN [35] | 60.09 | –
Ours: MLP + Attention | 97.77 | 96.99
Note: Only classification-based models using the CSI300 dataset are included.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
