Article

A Combined Algorithm Approach for Optimizing Portfolio Performance in Automated Trading: A Study of SET50 Stocks

by Sukrit Thongkairat 1,† and Woraphon Yamaka 2,*,†
1 Department of Statistics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
2 Faculty of Economics, Chiang Mai University, Chiang Mai 50200, Thailand
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2025, 13(3), 461; https://doi.org/10.3390/math13030461
Submission received: 25 December 2024 / Revised: 20 January 2025 / Accepted: 27 January 2025 / Published: 30 January 2025
(This article belongs to the Special Issue Machine Learning and Finance)

Abstract

This study investigates portfolio optimization for SET50 stocks using Deep Reinforcement Learning (DRL) algorithms to address market volatility. Five DRL algorithms—Advantage Actor–Critic (A2C), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Soft Actor–Critic (SAC), and Twin Delayed DDPG (TD3)—were evaluated for their effectiveness in managing risk and optimizing returns. We propose an Iterative Model Combining Algorithm (IMCA) that dynamically adjusts model weights based on market conditions to enhance performance. Our results demonstrate that IMCA consistently outperformed traditional strategies, including the Minimum Variance model. IMCA achieved a cumulative return of 14.20% and a Sharpe Ratio of 0.220, compared to the Minimum Variance model’s return of −4.35% and Sharpe Ratio of 0.018. This research highlights the adaptability and robustness of DRL algorithms for portfolio management, particularly in emerging markets like Thailand. It underscores the advantages of dynamic, data-driven strategies over static approaches.

1. Introduction

Automated trading systems have become increasingly popular due to their ability to make faster, more accurate, and more reliable decisions than human traders. These systems are particularly effective in highly dynamic and volatile financial markets, where human decisions often fall short due to emotional bias or delayed reactions. Among the many advancements in automated trading, Deep Reinforcement Learning (DRL) stands out as a powerful tool. DRL enables systems to learn from large datasets and make decisions without relying on fixed market assumptions, making it highly adaptable to changing market conditions [1,2].
Despite its promise, prior research has predominantly focused on standalone DRL techniques or traditional ensemble methods in financial markets [3,4,5]. While these approaches have demonstrated potential, they often fall short in adapting to rapidly evolving market dynamics. The integration of DRL with advanced ensemble frameworks, such as the Iterative Model Combining Algorithm (IMCA), remains an underexplored area. Addressing this gap, this study introduces a novel hybrid framework that synergistically combines the adaptability of DRL with the dynamic optimization capabilities of IMCA, thereby enhancing the robustness and efficiency of portfolio management strategies.
The proposed framework incorporates IMCA as a dynamic ensemble technique that iteratively adjusts model weights to minimize forecasting errors and adapt to shifting market conditions. Unlike traditional ensemble methods, which employ static weighting or simplistic averaging mechanisms, IMCA leverages recent model performance to recalibrate its weight distribution in real time [6,7]. This ensures that the combined strategy remains responsive to sudden market fluctuations, such as those experienced during the COVID-19 pandemic [8,9]. By dynamically harnessing the strengths of individual DRL algorithms—each excelling in specific market conditions—and compensating for their weaknesses, IMCA enables the creation of a resilient and adaptive portfolio management system.
Moreover, the integration of DRL with IMCA is particularly advantageous in emerging markets like Thailand, where market behavior is often characterized by high volatility and structural inefficiencies [10,11]. DRL’s capacity to learn complex, nonlinear relationships complements IMCA’s ability to dynamically adapt to real-time performance metrics. This synergy not only enhances portfolio returns but also improves risk mitigation, as the hybrid framework can effectively respond to unexpected shocks and systemic risks [1,2]. By bridging the gap between standalone DRL methods and static ensemble approaches, this research contributes a significant innovation to the field of automated trading and portfolio optimization.
Emerging markets, such as Thailand’s SET50 Index, present unique challenges and opportunities for testing advanced portfolio strategies. Characterized by higher inefficiencies, external dependencies, and behavioral biases compared to developed markets, the SET50 Index serves as an ideal testbed for adaptive trading systems. Its high volatility during crises like COVID-19 further underscores the importance of robust strategies capable of navigating turbulent market conditions. Moreover, studying the SET50 provides valuable insights into the application of DRL and IMCA in markets with similar dynamics across Southeast Asia and other emerging economies.
This research aims to optimize portfolio performance for SET50 stocks by combining DRL techniques with IMCA. The DRL algorithms utilized include Advantage Actor–Critic (A2C), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Soft Actor–Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3). These algorithms were chosen due to their proven success in financial applications [4,12]. The dataset consists of daily stock data from the SET50 Index spanning 2008 to 2023, divided into a training period (2008–2018) characterized by stable market conditions and a testing period (2018–2023), which includes the highly volatile COVID-19 era. The results demonstrate that the combined DRL–IMCA approach significantly outperforms traditional strategies, such as the Min-Variance strategy, in both returns and risk management.
This research makes several key contributions to the field of adaptive portfolio management. By integrating DRL and IMCA, it offers a novel approach to creating trading strategies that are not only profitable but also resilient to market fluctuations. The findings are particularly valuable for institutional investors, such as pension funds and mutual funds, who require stable yet dynamic strategies to manage risk in volatile markets. Retail investors can also benefit from this framework by gaining access to advanced, automated techniques that enhance portfolio performance with minimal manual intervention. Furthermore, businesses such as securities firms and FinTech startups can leverage these insights to develop competitive trading systems that are robust across market cycles.
This study builds on prior work in DRL and ensemble methods, extending the literature by addressing their integration within emerging market contexts. By evaluating the hybrid DRL–IMCA framework, this research highlights the potential for adaptive strategies to outperform traditional approaches in volatile environments.
The structure of this paper is as follows: Section 2 reviews relevant literature, Section 3 outlines the methodology, Section 4 presents the results, and Section 5 concludes with key insights and future directions.

2. Literature Review

Ensemble learning and reinforcement learning (RL) have become integral to portfolio optimization and algorithmic trading, particularly in volatile markets. This section explores existing techniques, their limitations, and the novel contributions of the IMCA.

2.1. Ensemble Learning in Finance and Algorithmic Trading

Ensemble learning techniques, such as Bagging [5] and Boosting [13], have become essential tools in improving predictive accuracy and robustness in algorithmic trading. Bagging leverages the aggregation of multiple predictors trained on bootstrap samples to reduce variance, while Boosting iteratively enhances model accuracy by emphasizing misclassified instances during training. These methods are particularly valuable in addressing the uncertainty and nonlinearity of financial markets, where diverse model perspectives can enhance decision-making.
Advanced ensemble techniques, including Random Forests [14] and Gradient Boosting Machines (GBMs) [15], further refine these principles by introducing randomization and gradient-based optimization. These models extend ensemble learning’s capacity to handle complex data structures, making them well-suited for financial applications. However, their reliance on static weighting mechanisms limits their responsiveness to rapidly changing market conditions. During periods of extreme volatility, such as financial crises or significant macroeconomic events, static ensemble models may fail to adjust predictions effectively, reducing their utility in high-frequency trading and portfolio optimization.
The Iterative Model Combining Algorithm (IMCA) overcomes these limitations by dynamically adjusting model weights in real time based on recent performance metrics. Unlike static ensemble methods, IMCA recalibrates its weight distribution to reflect the evolving relevance of individual models. This adaptability is particularly advantageous in financial markets, where volatility and time-varying dynamics play a critical role. IMCA’s integration with econometric models, such as GARCH [16,17], further enhances its ability to capture temporal volatility patterns, enabling more accurate forecasts and robust portfolio management. For instance, during the COVID-19 pandemic, IMCA demonstrated its effectiveness in navigating unprecedented market fluctuations by dynamically re-optimizing model contributions [8,9].
By dynamically combining predictions from diverse models, IMCA offers a resilient approach to algorithmic trading, balancing the strengths of advanced ensemble techniques and econometric modeling. Its real-time adaptability ensures enhanced performance in both stable and volatile market environments, addressing critical limitations of traditional methods. This positions IMCA as a pivotal innovation in modern financial analytics.

2.2. Deep Reinforcement Learning for Algorithmic Trading

Deep Reinforcement Learning has emerged as a transformative framework with successful applications across various domains, including traffic management [18], algorithmic trading [4], and portfolio optimization. For instance, Dong et al. [18] proposed a multi-objective DRL framework to optimize transit signal priority, showcasing DRL’s ability to dynamically adapt to complex, real-world systems. This adaptability is equally critical in financial markets, where uncertainty and volatility often dominate decision-making environments.
In the context of algorithmic trading, DRL has revolutionized the field by providing dynamic decision-making capabilities. DRL algorithms, such as Proximal Policy Optimization (PPO) [4], Twin Delayed Deep Deterministic Policy Gradient (TD3) [19], and Soft Actor–Critic (SAC) [12], exemplify this potential. PPO stabilizes policy updates, ensuring robust and consistent decision-making in volatile financial environments. TD3 addresses overestimation biases in value estimation, thereby enhancing the reliability of trading strategies and mitigating financial risks. SAC, on the other hand, prioritizes exploration by maximizing entropy, a crucial element for navigating the complexities of unpredictable financial markets.
Despite their significant strengths, DRL models face challenges when applied to high-frequency trading environments. These include limitations in accounting for market microstructure effects, latency issues, and execution risks. Furthermore, DRL models are not inherently equipped to manage risk effectively, particularly during periods of extreme market turbulence [2,20]. This lack of robust risk-management mechanisms can limit their effectiveness in scenarios characterized by rapid and unpredictable market shifts.
The IMCA offers a complementary approach by enhancing the adaptability of DRL while incorporating robust econometric frameworks for risk management. By integrating the predictive capabilities of DRL with econometric models like GARCH [16,17], IMCA addresses the challenge of time-varying volatility, ensuring more reliable risk assessments and decision-making processes. IMCA dynamically adjusts model contributions based on real-time performance metrics, striking a balance between optimizing returns and mitigating risks [21,22]. This dynamic adaptability allows IMCA to respond effectively to the high volatility and uncertainty inherent in financial markets, particularly during crises or significant market disruptions.
Additionally, IMCA extends the applicability of DRL by incorporating alternative data sources, such as sentiment analysis. By leveraging insights derived from textual data, including news articles and social media [23], IMCA enriches the decision-making process, enabling hybrid trading strategies that combine quantitative market indicators with qualitative insights. This integration of diverse data streams underscores IMCA’s potential as a versatile and comprehensive framework for modern algorithmic trading and portfolio optimization, providing a more holistic approach to navigating the complexities of financial markets.

2.3. Emerging Paradigms in Algorithmic Trading

Sentiment analysis has emerged as a transformative approach for enhancing algorithmic trading models by integrating textual insights with traditional quantitative indicators. By extracting sentiment from diverse sources such as news articles, social media posts, and financial reports, this technique provides a nuanced understanding of market behavior. Studies have demonstrated that combining sentiment data with quantitative market features significantly improves prediction accuracy and trading performance [23,24]. This integration is particularly critical in volatile markets, where investor sentiment can act as a key driver of short-term price movements and market volatility.
The IMCA presents a unique opportunity to incorporate sentiment-based features into its dynamic framework. Unlike static models that treat sentiment as a fixed input, IMCA allows for real-time adjustments to the weighting of sentiment-driven features based on their evolving relevance to market conditions [25]. During periods of heightened uncertainty or economic crises, sentiment data may exert a stronger influence on trading decisions. Conversely, during stable market phases, traditional indicators might take precedence. This dynamic adaptability ensures that IMCA remains responsive to shifting market dynamics while effectively leveraging the predictive power of sentiment analysis.
Integrating sentiment analysis into IMCA represents a significant advancement in hybrid trading strategies, bridging the gap between qualitative and quantitative approaches. By dynamically incorporating diverse data streams, IMCA enhances its predictive capabilities, enabling a more comprehensive understanding of market behavior. This holistic approach not only improves trading accuracy but also positions IMCA as a cutting-edge framework for algorithmic trading in modern financial markets.

2.4. Novel Contributions of IMCA

The IMCA introduces significant advancements over traditional ensemble methods and standalone DRL models. Unlike static ensemble approaches, IMCA offers a dynamic integration of multiple predictive models, adjusting weights in real time based on their individual performance [5,26]. This adaptability ensures that IMCA remains responsive to the ever-changing conditions of financial markets, providing a robust solution for trading and portfolio management.
A key innovation of IMCA is its hybrid framework, which integrates econometric models such as GARCH with DRL-based predictions. This combination allows IMCA to capture both short-term market dynamics and long-term trends, creating a balanced and comprehensive approach to market prediction [27,28]. By leveraging the strengths of both methodologies, IMCA addresses limitations inherent in static models and enhances its predictive power.
Another critical feature of IMCA is its scalability. The framework is designed to handle multi-asset portfolios, enabling effective diversification and reducing systemic risks [29,30]. This scalability makes IMCA particularly valuable for institutional investors managing complex portfolios across various asset classes.
IMCA also excels in real-time optimization. Its iterative design continuously updates model weights to reflect the latest market conditions, ensuring timely and accurate trading decisions. For instance, integrating GARCH-based volatility modeling with DRL’s adaptive decision-making capabilities enables IMCA to capture market nuances effectively, making it suitable for high-frequency trading scenarios [31,32].
The motivation for IMCA stems from the limitations of static ensemble models and standalone DRL frameworks in volatile market environments. Static models often fail to adapt to sudden market shifts, while standalone DRL lacks robust risk-management capabilities [33]. IMCA addresses these gaps by dynamically combining model outputs and incorporating advanced risk-management techniques, such as time-varying volatility adjustments, to deliver a resilient and adaptive trading strategy. This innovative approach positions IMCA as a cutting-edge framework for modern algorithmic trading [34].

2.5. Risk Management and Portfolio Diversification

Effective risk management remains a cornerstone of algorithmic trading, ensuring the stability and resilience of portfolios in dynamic and volatile markets. Traditional econometric models, such as Value at Risk (VaR) and Conditional Value at Risk (CVaR), have been widely used to quantify risk and provide robust assessments [35,36]. However, these methods often rely on fixed assumptions about market behavior, which can limit their effectiveness in highly dynamic and unpredictable environments.
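As a point of reference for the two risk measures named above, the following is a minimal sketch of historical (empirical) VaR and CVaR computed from a list of returns. The function name `historical_var_cvar`, the simple quantile rule, and the sample returns are illustrative assumptions and are not taken from the study.

```python
def historical_var_cvar(returns, alpha=0.95):
    """Empirical VaR and CVaR, both reported as positive loss numbers.

    Assumption: a simple order-statistic quantile is used; other
    interpolation rules give slightly different values.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    losses = sorted(-r for r in returns)        # losses in ascending order
    k = min(int(alpha * len(losses)), len(losses) - 1)
    var = losses[k]                             # loss at the alpha-quantile
    tail = losses[k:]                           # losses at or beyond VaR
    cvar = sum(tail) / len(tail)                # mean loss in the tail
    return var, cvar


# Hypothetical daily returns, for illustration only.
sample_returns = [0.01, -0.02, 0.003, -0.05, 0.015,
                  -0.01, 0.02, -0.03, 0.005, 0.0]
var80, cvar80 = historical_var_cvar(sample_returns, alpha=0.8)
```

By construction CVaR is never smaller than VaR at the same confidence level, since it averages the losses in the tail that starts at the VaR threshold.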
IMCA’s iterative weight adjustment mechanism addresses these limitations by enabling dynamic responses to changing risk levels. This adaptability makes IMCA particularly well-suited for multi-asset portfolio optimization and hedging strategies, where flexibility is essential for managing systemic risks and maximizing diversification [29,37]. By continuously recalibrating model contributions based on real-time performance metrics, IMCA ensures that portfolios remain balanced and aligned with current market conditions.
Furthermore, IMCA seamlessly integrates with Deep Reinforcement Learning (DRL) frameworks to incorporate risk-sensitive reward functions. This hybrid approach enables a balance between return maximization and risk minimization [2]. For instance, combining GARCH-based volatility modeling with DRL’s decision-making capabilities allows IMCA to capture both short-term market fluctuations and long-term risk trends, enhancing the resilience of portfolios during periods of extreme market volatility.
Incorporating advanced risk-management techniques within IMCA not only strengthens its predictive accuracy but also positions it as a robust tool for navigating the complexities of modern financial markets. By addressing the limitations of traditional econometric models and leveraging the adaptability of DRL, IMCA offers a comprehensive solution for effective risk management and portfolio diversification [34].

3. Methodology

3.1. Reinforcement Learning Algorithms and Experimental Setup

This study utilizes five state-of-the-art DRL algorithms: A2C, PPO, DDPG, TD3, and SAC. These algorithms are chosen for their distinct capabilities in addressing the challenges of portfolio management in volatile markets, such as balancing risk and reward, handling high-dimensional data, and adapting to rapid market changes. The training of these models is conducted using a dataset of daily stock prices from the SET50 Index, spanning from 2008 to 2023. Before implementing the DRL algorithms, the dataset was preprocessed to ensure that it was clean and consistent.
In addition to stock prices, we collect text-based data to measure the market sentiment toward each SET50 company on specific dates. News sentiment is derived from a corpus of articles collected via Google News, a reliable aggregator of news content. Social media sentiment is derived from Twitter, which is widely recognized for its effectiveness in forecasting stock prices and market movements [38].
For Google News, each article headline or text in the corpus is processed using VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon and rule-based sentiment analysis tool designed to assess the polarity (positive, negative, or neutral) and intensity of sentiment. VADER generates a compound sentiment score for each piece of text, ranging from −1 (most negative) to +1 (most positive). These scores are then aggregated into an overall sentiment score for each company on each date, capturing the market sentiment reflected in the news.
For Twitter, sentiment analysis begins with gathering tweets related to SET50 companies. Tweets are processed using VADER, which assigns a compound sentiment score to each tweet based on its text content. To enhance the accuracy and relevance of the sentiment measurement, engagement metrics such as likes and retweets are incorporated. Tweets with higher engagement are given greater weight in the calculation of the overall Twitter sentiment score for each company. This approach ensures that tweets with significant market impact contribute more to the sentiment analysis, providing a robust measure of social media-based sentiment for specific dates.
To combine the sentiment scores from Google News and Twitter, a unified market sentiment score is calculated as a weighted average. First, the sentiment scores from both sources are standardized to a common scale between −1 (most negative) and +1 (most positive). Next, the two sources are assigned equal weights, as they are treated as equally relevant and reliable. Finally, the overall market sentiment score is calculated as the weighted average of the two scores, which under equal weights is their simple mean.
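The aggregation steps described above can be sketched as follows. This is a minimal illustration that starts from precomputed VADER compound scores. The engagement weight `1 + likes + retweets`, the clipping used to keep scores in [−1, 1], and all function names are assumptions made for the example, since the text does not specify them.

```python
def mean_score(scores):
    """Plain average of compound scores (news headlines for one company/date)."""
    return sum(scores) / len(scores)


def engagement_weighted_score(tweets):
    """Average of tweet scores weighted by engagement.

    tweets: list of (compound, likes, retweets).
    Assumption: weight = 1 + likes + retweets, so that tweets with no
    engagement still count once.
    """
    weights = [1 + likes + rts for _, likes, rts in tweets]
    weighted = sum(c * w for (c, _, _), w in zip(tweets, weights))
    return weighted / sum(weights)


def clip_unit(x):
    """Keep a score inside the [-1, 1] range."""
    return max(-1.0, min(1.0, x))


def combined_sentiment(news_scores, tweets, w_news=0.5, w_twitter=0.5):
    """Weighted average of the two source-level scores (equal weights by default)."""
    news = clip_unit(mean_score(news_scores))
    twitter = clip_unit(engagement_weighted_score(tweets))
    return w_news * news + w_twitter * twitter


# Hypothetical inputs for one company on one date.
news_scores = [0.6, 0.2]
tweets = [(0.8, 9, 0), (-0.4, 0, 0)]
score = combined_sentiment(news_scores, tweets)
```

In this example the highly engaged positive tweet dominates the Twitter score, so the combined score stays positive despite one negative tweet.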
Each model is trained with the following configuration:
  • Training Data: Daily adjusted closing prices and engineered features, including Moving Average Convergence Divergence (MACD), Relative Strength Index (RSI), and Simple Moving Averages (SMA);
  • Training Period: Data from 2008 to 2018 were used for training, and 2018 to 2023 for testing;
  • Learning Rate: 0.0003 for most models, with fine-tuning based on validation performance;
  • Episodes: 1000 episodes for stability and convergence;
  • Batch Size: 64 observations per batch;
  • Discount Factor ($\gamma$): 0.99 for all models to prioritize long-term rewards;
  • Exploration Rate: Initial exploration rate of 1.0, decayed over episodes for models using epsilon-greedy policies;
  • Optimization Method: Grid search was employed for hyperparameter tuning, including discount factors, learning rates, and batch sizes;
  • Computational Resources: Training was conducted on an NVIDIA RTX 3090 GPU with 24 GB memory for efficient parallel processing;
  • Framework: TensorFlow and PyTorch were utilized for implementing the algorithms;
  • Optimization: Adam optimizer was employed across all models.
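The engineered features listed above (SMA, MACD, and RSI) can be computed from a closing-price series roughly as follows. This is a dependency-free sketch. The 12/26/9 MACD spans and the 14-period RSI window are the conventional defaults and are assumptions here, and the RSI uses simple averages of gains and losses rather than Wilder's smoothing.

```python
def sma(prices, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def ema(values, span):
    """Exponential moving average with factor 2 / (span + 1), seeded with the first value."""
    k = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(k * v + (1 - k) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """MACD line (fast EMA minus slow EMA) and its signal line."""
    line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    return line, ema(line, signal)


def rsi(prices, window=14):
    """RSI from simple averages of gains and losses over the last `window` changes."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    out = [None] * len(prices)
    for i in range(window, len(prices)):
        recent = changes[i - window:i]
        gain = sum(c for c in recent if c > 0) / window
        loss = sum(-c for c in recent if c < 0) / window
        out[i] = 100.0 if loss == 0 else 100.0 - 100.0 / (1.0 + gain / loss)
    return out


# Hypothetical closing prices, for illustration only.
closes = [10.0, 10.2, 10.1, 10.4, 10.6, 10.5, 10.9, 11.0, 10.8, 11.2,
          11.3, 11.1, 11.5, 11.6, 11.4, 11.8, 12.0, 11.9, 12.2, 12.4]
features = {
    "sma_5": sma(closes, 5),
    "macd": macd(closes)[0],
    "rsi_14": rsi(closes, 14),
}
```

Each feature list has the same length as the price series, so the features can be aligned with the daily adjusted closing prices when building the state vector.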
To further enhance performance, transfer learning techniques are applied where pre-trained weights from models trained on global indices (e.g., S&P 500) are fine-tuned for the SET50 dataset. This approach leverages the models’ ability to generalize from diverse datasets, speeding up convergence and improving robustness.

3.1.1. Experimental Workflow

The experiments are conducted on a MacBook Pro (2020) with a 2 GHz quad-core Intel Core i5 processor and 16 GB of memory (3733 MHz LPDDR4X). The workflow involves downloading data from Yahoo Finance, which takes approximately 2 min, followed by the addition of technical indicators, which takes about 3 min. Training each model for 100,000 timesteps requires between 4 and 70 min, depending on the complexity of the model. The specific hyperparameters for each model are summarized in Table 1.

3.1.2. Advantage Actor–Critic (A2C)

The Advantage Actor–Critic algorithm is chosen for its ability to balance exploration and exploitation in discrete-time environments, making it particularly well-suited for stock trading applications. To enhance its performance, sentiment analysis is integrated into the model, enabling it to incorporate qualitative insights from financial news and social media alongside traditional quantitative metrics. The model is designed to prioritize risk-adjusted returns by employing a sentiment-adjusted reward function, which maximizes cumulative returns while penalizing excessive drawdowns and unfavorable sentiment exposure.
The loss function for A2C, which optimizes both the policy and value networks, is expressed as follows:
$$L_{A2C} = -\frac{1}{N}\sum_{i=1}^{N} \log \pi_{\theta}(a_i \mid s_i, \mathrm{sent}_i)\, A_i + c \cdot \frac{1}{N}\sum_{i=1}^{N} \left( V(s_i, \mathrm{sent}_i) - R_i \right)^2,$$
where $N$ represents the total number of training samples, $\pi_{\theta}(a_i \mid s_i, \mathrm{sent}_i)$ denotes the probability of selecting action $a_i$ given the state $s_i$ and sentiment $\mathrm{sent}_i$ under the policy parameterized by $\theta$, and $A_i$ quantifies the advantage of taking action $a_i$ over the baseline policy.
The second term in the loss function incorporates a regularization constant $c$, which balances the policy loss and value function loss, ensuring that the optimization remains stable. The value function $V(s_i, \mathrm{sent}_i)$ represents the estimated value of state $s_i$ with sentiment $\mathrm{sent}_i$, and $R_i$ denotes the observed reward, which is adjusted to account for sentiment data. Specifically, $R_i$ penalizes exposure to assets with negative sentiment while incentivizing investments in assets with positive sentiment. The sentiment adjustment in $R_i$ is computed as
$$R_i = \text{Portfolio Return}_i - \lambda_1 \cdot \text{Transaction Cost}_i - \lambda_2 \cdot \mathrm{sent}_i,$$
where $\lambda_1$ and $\lambda_2$ are weighting factors that balance the impact of transaction costs and sentiment penalties. The sentiment score is derived from financial news and social media data, using Natural Language Processing (NLP) techniques to quantify the market’s perception of individual assets. A negative sentiment score penalizes the agent for holding assets perceived negatively by the market, thereby reducing risk exposure.
By incorporating sentiment into both the state representation and the reward function, the A2C model aligns its decision-making with both quantitative metrics and qualitative market insights. This integration enables the model to dynamically respond to shifts in market sentiment, fostering an adaptive trading strategy that improves robustness and performance in complex and volatile financial environments.
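To make the structure of the A2C loss concrete, the sketch below evaluates a policy term and a value term on a small batch. It assumes the common minimization form (negated log-probability times advantage, plus `c` times the squared value error), the advantage estimate A_i = R_i − V(s_i), and c = 0.5. These choices and the function name `a2c_loss` are assumptions for illustration, not settings reported in the study.

```python
import math


def a2c_loss(log_probs, advantages, values, returns, c=0.5):
    """Batch A2C loss in minimization form (assumed convention).

    log_probs:  log pi(a_i | s_i, sent_i) for the actions taken
    advantages: A_i, treated as constants for the policy term
    values:     V(s_i, sent_i) predicted by the critic
    returns:    observed sentiment-adjusted rewards R_i
    """
    n = len(log_probs)
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    return policy_loss + c * value_loss


# Hypothetical two-sample batch.
log_probs = [math.log(0.5), math.log(0.25)]
returns = [1.0, 0.0]
values = [0.5, 0.5]
advantages = [r - v for r, v in zip(returns, values)]   # assumed A_i = R_i - V_i
loss = a2c_loss(log_probs, advantages, values, returns)
```

In an actual training loop the two terms would be differentiated with respect to the policy and value network parameters; here they are only evaluated numerically.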

3.1.3. Proximal Policy Optimization (PPO)

Proximal Policy Optimization is a state-of-the-art reinforcement learning algorithm recognized for its robust performance in volatile environments. It is specifically designed to balance exploration and exploitation while ensuring stable training, making it particularly well-suited for portfolio optimization tasks. PPO utilizes a clipping mechanism to limit excessively large policy updates, ensuring stable and incremental improvements over time. Additionally, entropy regularization is incorporated to encourage exploration, preventing the agent from prematurely converging to suboptimal policies.
The PPO loss function is defined as
$$L_{PPO}(\theta) = -\frac{1}{N}\sum_{i=1}^{N} \min\left( r_i(\theta) A_i,\ \mathrm{clip}\left(r_i(\theta), 1 - \epsilon, 1 + \epsilon\right) A_i \right) - \lambda \cdot \frac{1}{N}\sum_{i=1}^{N} S\left(\pi_{\theta}(s_i, \mathrm{sent}_i)\right),$$
where the expectation operator $\mathbb{E}_t$ has been replaced with a summation over $i$, consistent with batch-wise training. The term $r_i(\theta)$ represents the probability ratio of the updated policy to the old policy, which is given by $\pi_{\theta}(a_i \mid s_i) / \pi_{\theta_{\text{old}}}(a_i \mid s_i)$. The advantage function, $A_i$, quantifies the improvement of the chosen action over the baseline. The clipping mechanism is controlled by the threshold parameter $\epsilon$, typically set to 0.2, which restricts updates to a predefined trust region, avoiding destabilizing policy changes. The entropy of the policy, denoted as $S(\pi_{\theta})$, promotes exploration by encouraging randomness in action selection, while $\lambda$ serves as a regularization coefficient to balance exploration and exploitation.
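The clipping and entropy terms described above can be illustrated numerically with a short sketch. It assumes the minimization form of the loss (negated clipped surrogate minus the entropy bonus), takes per-sample log-probabilities and entropies as given inputs, and uses an entropy coefficient of 0.01. The function name and that coefficient are assumptions for the example.

```python
import math


def ppo_loss(logp_new, logp_old, advantages, entropies, eps=0.2, lam=0.01):
    """Clipped PPO loss over a batch, in minimization form (assumed convention)."""
    n = len(advantages)
    surrogate = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # r_i = pi_new / pi_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        surrogate += min(ratio * adv, clipped * adv)   # pessimistic bound
    entropy_bonus = sum(entropies) / n
    return -surrogate / n - lam * entropy_bonus


# Hypothetical batch: one action became more likely, one less likely.
logp_old = [math.log(0.4), math.log(0.4)]
logp_new = [math.log(0.6), math.log(0.2)]     # ratios 1.5 and 0.5
advantages = [1.0, -1.0]
entropies = [0.0, 0.0]
loss = ppo_loss(logp_new, logp_old, advantages, entropies)
```

In this batch both ratios fall outside the interval [0.8, 1.2], so both samples contribute their clipped values (1.2 and −0.8) and further movement in the same direction would not lower the loss.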

3.1.4. Deep Deterministic Policy Gradient (DDPG)

The DDPG architecture is based on an actor–critic framework with two neural networks: the actor network determines the optimal actions (portfolio weights), while the critic network evaluates the quality of these actions. Each network comprises two hidden layers, each containing 256 neurons. The loss function for DDPG, which optimizes the critic network, is defined as
$$L_{DDPG} = \frac{1}{N}\sum_{i=1}^{N} \left( y_i - Q(s_i, \mathrm{sent}_i, a_i \mid \theta_{Q}) \right)^2,$$
where $y_i$ represents the target Q-value, which estimates the expected cumulative reward based on observed outcomes, and $Q(s_i, \mathrm{sent}_i, a_i \mid \theta_{Q})$ provides the current Q-value estimate for a given state $s_i$, sentiment $\mathrm{sent}_i$, and action $a_i$. The parameters of the Q-value network are denoted by $\theta_{Q}$.
The target Q-value $y_i$ is computed as
$$y_i = r_i + \gamma\, Q'(s_{i+1}, \mathrm{sent}_{i+1}, a_{i+1} \mid \theta_{Q'}),$$
where $r_i$ is the sentiment-adjusted reward, $\gamma$ is the discount factor that determines the weight of future rewards, and $Q'(s_{i+1}, \mathrm{sent}_{i+1}, a_{i+1} \mid \theta_{Q'})$ is the Q-value of the next state-action pair estimated by the target Q-network.
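The critic update described above can be sketched with toy linear networks standing in for the actual 256-unit networks. The linear forms of `linear_q` and `linear_actor`, the scalar state and action, and all parameter values are illustrative assumptions. The next action is taken from a target actor, as is usual in DDPG.

```python
def linear_q(state, sent, action, theta):
    """Toy critic: Q = theta . [state, sent, action] (stand-in for a neural network)."""
    return theta[0] * state + theta[1] * sent + theta[2] * action


def linear_actor(state, sent, phi):
    """Toy deterministic actor: a = phi . [state, sent]."""
    return phi[0] * state + phi[1] * sent


def ddpg_critic_loss(batch, theta, theta_target, phi_target, gamma=0.99):
    """Mean squared error between TD targets y_i and current Q estimates.

    batch: list of (s, sent, a, r, s_next, sent_next) transitions.
    """
    total = 0.0
    for s, sent, a, r, s_next, sent_next in batch:
        a_next = linear_actor(s_next, sent_next, phi_target)
        y = r + gamma * linear_q(s_next, sent_next, a_next, theta_target)
        total += (y - linear_q(s, sent, a, theta)) ** 2
    return total / len(batch)


# Hypothetical single-transition batch and parameters.
batch = [(1.0, 0.5, 0.2, 0.1, 1.1, 0.4)]
theta = (1.0, 0.0, 0.0)          # current critic
theta_target = (0.0, 0.0, 1.0)   # target critic
phi_target = (1.0, 0.0)          # target actor
loss = ddpg_critic_loss(batch, theta, theta_target, phi_target)
```

Keeping the target parameters separate from the current ones means the regression target does not move with every critic update, which is the purpose of the target Q-network.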

3.1.5. Soft Actor–Critic (SAC)

The Soft Actor–Critic (SAC) algorithm is based on an actor–critic framework that employs a stochastic policy and entropy maximization to enhance exploration. The actor network determines the optimal stochastic actions (portfolio weights), while the critic network evaluates the quality of these actions. SAC introduces entropy into the objective function, encouraging exploration and preventing premature convergence to suboptimal policies. The entropy temperature parameter, denoted by $\alpha$, is automatically tuned during training to achieve an optimal balance between exploration and exploitation.
The critic network is trained by minimizing the following loss function:
$$L_{critic}(\theta_{Q}) = \frac{1}{N}\sum_{i=1}^{N} \left( Q_{\theta_{Q}}(s_i, \mathrm{sent}_i, a_i) - y_i \right)^2,$$
where $Q_{\theta_{Q}}(s_i, \mathrm{sent}_i, a_i)$ represents the Q-value estimate for a given state $s_i$, sentiment $\mathrm{sent}_i$, and action $a_i$, with $\theta_{Q}$ denoting the parameters of the critic network. The target Q-value $y_i$ is computed as
$$y_i = r_i + \gamma\, \mathbb{E}_{a_{i+1} \sim \pi_{\phi}} \left[ Q_{\theta_{Q}}(s_{i+1}, \mathrm{sent}_{i+1}, a_{i+1}) - \alpha \log \pi_{\phi}(a_{i+1} \mid s_{i+1}, \mathrm{sent}_{i+1}) \right],$$
where $r_i$ is the sentiment-adjusted reward, $\gamma$ is the discount factor, $\alpha$ is the entropy temperature, and $\pi_{\phi}(a_{i+1} \mid s_{i+1}, \mathrm{sent}_{i+1})$ represents the stochastic policy output by the actor network.
The actor network is trained by minimizing the following loss function:
$$L_{\text{actor}}(\phi) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{a_i \sim \pi_\phi} \left[ \alpha \log \pi_\phi(a_i \mid s_i, \text{sent}_i) - Q_{\theta_Q}(s_i, \text{sent}_i, a_i) \right],$$
where $\phi$ denotes the parameters of the actor network, and $\log \pi_\phi(a_i \mid s_i, \text{sent}_i)$ encourages exploration by maximizing the entropy of the policy.
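As a rough illustration of the SAC critic target and the two losses, the following Python sketch evaluates them on scalar Q-values and log-probabilities supplied by hand. It assumes that a single sampled action per transition stands in for the expectation, and the function names (`sac_target`, `sac_critic_loss`, `sac_actor_loss`) are illustrative only; they are not taken from the study's implementation.

```python
def sac_target(reward, gamma, q_next, alpha, log_prob_next):
    """Soft target: y = r + gamma * (Q(s', a') - alpha * log pi(a' | s')).

    Assumption: one sampled next action replaces the expectation.
    """
    return reward + gamma * (q_next - alpha * log_prob_next)


def sac_critic_loss(q_values, targets):
    """Mean squared error between critic estimates and soft targets."""
    n = len(q_values)
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / n


def sac_actor_loss(alpha, log_probs, q_values):
    """Batch average of alpha * log pi(a | s) - Q(s, a)."""
    n = len(q_values)
    return sum(alpha * lp - q for lp, q in zip(log_probs, q_values)) / n


# Hypothetical mini-batch of two transitions.
targets = [
    sac_target(reward=1.0, gamma=0.9, q_next=2.0, alpha=0.2, log_prob_next=-1.0),
    sac_target(reward=0.5, gamma=0.9, q_next=1.0, alpha=0.2, log_prob_next=-2.0),
]
print(sac_critic_loss([3.0, 1.0], targets))
print(sac_actor_loss(0.2, [-1.0, -2.0], [3.0, 1.0]))
```

A more negative log-probability (a less likely action) raises the target and lowers the actor loss, which is how the entropy term rewards exploration in this formulation.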

3.1.6. Twin Delayed Deep Deterministic Policy Gradient (TD3)

The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm enhances the DDPG by addressing overestimation bias, improving stability, and incorporating factors such as transaction costs, which are crucial for financial applications. TD3 begins with the initialization of the actor network $\pi_\phi(s)$, which outputs deterministic actions $a = \pi_\phi(s)$ for a given state $s$, and two critic networks $Q_{\theta_1}(s, a)$ and $Q_{\theta_2}(s, a)$, which estimate the state-action values. The target Q-value is computed as:
$$Q_{\text{target}} = r - \lambda_1 \cdot \text{sent}_i + \gamma \min\left( Q_{\theta_1'}(s', a'),\; Q_{\theta_2'}(s', a') \right),$$
where $r$ is the reward, $\lambda_1$ is a weighting factor that penalizes transaction costs, $\gamma$ is the discount factor, $s'$ is the next state, and $a' = \pi_{\phi'}(s') + \text{clip}(\mathcal{N}(0, \sigma), -c, c)$ is the smoothed action computed by adding clipped Gaussian noise to the target policy’s action. Target networks $\pi_{\phi'}$, $Q_{\theta_1'}$, and $Q_{\theta_2'}$ are updated using a soft update rule:
$$\theta' \leftarrow \tau \theta + (1 - \tau)\, \theta',$$
where $\tau$ is the soft update rate.
To train the model, experiences $(s_t, a_t, r_t, s_{t+1})$ are collected by executing actions $a_t = \pi_\phi(s_t) + \mathcal{N}(0, \sigma)$ with exploration noise and storing them in the replay buffer $\mathcal{D}$. A mini-batch of transitions is sampled from $\mathcal{D}$ to update the critics by minimizing the mean-squared error:
$$L_{\theta_j} = \frac{1}{N} \sum_{i} \left( Q_{\theta_j}(s_i, a_i) - y_i \right)^2,$$
where $y_i = Q_{\text{target}}$ is the target Q-value, and $j = 1, 2$. The actor network is updated less frequently (e.g., once every two critic updates) by maximizing the Q-value estimated by the first critic, incorporating the effect of transaction costs. Equivalently, the actor minimizes the negative of that Q-value:
$$L_\phi = -\frac{1}{N} \sum_{i} Q_{\theta_1}(s_i, \pi_\phi(s_i)).$$
TD3 incorporates twin critics to mitigate overestimation bias, target policy smoothing to prevent exploitation of sharp Q-value variations, and delayed policy updates to enhance stability. These features are further augmented by the integration of transaction cost penalties, making TD3 particularly well-suited for continuous control tasks, such as financial applications. By accounting for market dynamics and trading costs, TD3 provides a practical and robust solution for complex, real-world scenarios.
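The three TD3 mechanisms summarized above can be sketched in a few lines of Python. This is a minimal scalar illustration rather than the study's implementation: it assumes that the penalty term $\lambda_1 \cdot \text{sent}_i$ is subtracted from the reward, that actions and network parameters are plain floats, and the function names are hypothetical.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def smoothed_target_action(target_action, sigma, c, rng):
    """Target policy smoothing: add Gaussian noise clipped to [-c, c]."""
    noise = clip(rng.gauss(0.0, sigma), -c, c)
    return target_action + noise


def td3_target(reward, lambda_1, sent, gamma, q1_next, q2_next):
    """Twin-critic target: the smaller of the two target-critic estimates is used."""
    return reward - lambda_1 * sent + gamma * min(q1_next, q2_next)


def soft_update(target_params, online_params, tau):
    """Soft update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]


rng = random.Random(0)  # fixed seed for reproducibility of the sketch
a_next = smoothed_target_action(0.5, sigma=0.2, c=0.1, rng=rng)
y = td3_target(reward=1.0, lambda_1=0.5, sent=0.2, gamma=0.99,
               q1_next=2.0, q2_next=1.5)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
print(a_next, y, new_target)
```

Delayed policy updates are not shown; in a training loop they amount to calling the actor update only on every second (or every d-th) critic update.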

3.2. Iterative Model Combining Algorithm (IMCA)

IMCA is an advanced ensemble technique designed to dynamically adjust the weights of individual models for optimal portfolio management. Unlike traditional methods that rely on static weights, IMCA continuously recalibrates the contributions of each model based on recent performance. This approach is particularly effective for emerging markets like the SET50 Index, which are characterized by high inefficiencies, external dependencies, and significant volatility.
Emerging markets often experience unpredictable price movements due to external shocks, lower liquidity, and behavioral biases. These factors make static models inadequate, as they fail to adapt to rapid market changes. IMCA addresses these challenges by dynamically reallocating weights, emphasizing models that perform well in current market conditions while reducing the impact of underperforming models.

3.2.1. Steps in IMCA

The IMCA framework follows a structured process designed to optimize predictions through iterative adjustments. This methodology ensures that the model ensemble adapts to changing market conditions while maintaining high predictive accuracy.
The first step involves selecting an error metric ($\ell_p$), such as Root Mean Square Error (RMSE) or Mean Absolute Error (MAE), to evaluate the performance of individual models. The error metric quantifies prediction accuracy relative to observed outcomes. The general form of $\ell_p$ is given as
$$\ell_p = \left( \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|^{p} \right)^{\frac{1}{p}},$$
where $N$ is the number of observations, $\hat{y}_i$ represents the prediction for observation $i$, $y_i$ denotes the actual observed value, and $p$ defines the type of error metric (e.g., $p = 2$ for RMSE and $p = 1$ for MAE). This metric serves as the foundation for updating model weights.
To avoid overfitting, a regularization parameter ($\lambda$) is introduced. Regularization ensures stability by discouraging disproportionately large weights for any single model, especially during short-term fluctuations. The weight adjustment equation incorporating $\lambda$ is expressed as
$$w_k^{(t+1)} = w_k^{(t)} - \lambda \cdot \frac{\partial \ell_p}{\partial w_k^{(t)}},$$
where $w_k^{(t)}$ represents the weight of model $k$ at iteration $t$, and $\frac{\partial \ell_p}{\partial w_k^{(t)}}$ is the gradient of the error metric with respect to $w_k^{(t)}$.
The historical data length ($l$) determines the evaluation window size, ensuring that weight updates reflect recent trends while avoiding overreaction to noise. For example, if $l = 10$, the last 10 observations are used to compute the error metric for weight adjustments.
Model weights are iteratively updated based on their relative performance. A performance score for each model ($V_k$) is defined as
$$V_k = \frac{1}{w_k + \delta},$$
where $w_k$ is the current weight of model $k$ and $\delta$ is a small constant to prevent division by zero. Underperforming models are penalized with higher $V_k$ values, while better-performing models receive lower $V_k$ values. The refined weight update formula becomes
$$w_k^{(t+1)} = w_k^{(t)} - \lambda \cdot \frac{\partial \ell_p}{\partial w_k^{(t)}} \cdot V_k.$$
Finally, the ensemble’s prediction is generated as a weighted sum of individual model predictions:
$$\hat{Y}_{\text{for}}(s + l + 1) = \sum_{k=1}^{n} w_k M_k(s + l + 1),$$
where $\hat{Y}_{\text{for}}(s + l + 1)$ is the ensemble forecast for the next time step ($s + l + 1$), $w_k$ is the updated weight of model $k$, $M_k(s + l + 1)$ is the prediction by model $k$ for the next time step, and $n$ is the total number of models in the ensemble.
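One way to read the steps above is sketched below in Python. Several choices here are assumptions, since the text does not fix them: the error metric is evaluated on the ensemble forecast over the window, its gradient with respect to each weight is approximated by central finite differences, and the weights are not renormalized after the update. The function names and the toy data are hypothetical.

```python
def lp_error(preds, actual, p):
    """General l_p error: (mean |y_hat - y|^p)^(1/p); p=1 is MAE, p=2 is RMSE."""
    n = len(actual)
    return (sum(abs(yh - y) ** p for yh, y in zip(preds, actual)) / n) ** (1.0 / p)


def ensemble(weights, model_preds):
    """Weighted sum of model predictions; model_preds[k][i] is model k on observation i."""
    n_obs = len(model_preds[0])
    return [sum(w * m[i] for w, m in zip(weights, model_preds))
            for i in range(n_obs)]


def imca_step(weights, model_preds, actual, p=1, lam=0.001, delta=1e-6, eps=1e-6):
    """One update w_k <- w_k - lam * (d l_p / d w_k) * V_k, with V_k = 1 / (w_k + delta)."""
    new_weights = []
    for k, w in enumerate(weights):
        up = list(weights)
        up[k] += eps
        down = list(weights)
        down[k] -= eps
        # Assumption: finite-difference gradient of the ensemble error.
        grad = (lp_error(ensemble(up, model_preds), actual, p)
                - lp_error(ensemble(down, model_preds), actual, p)) / (2 * eps)
        score = 1.0 / (w + delta)
        new_weights.append(w - lam * grad * score)
    return new_weights


# Toy window: model 0 is exact, model 1 is biased upward by 2.
actual = [10.0, 11.0, 12.0]
model_preds = [[10.0, 11.0, 12.0], [12.0, 13.0, 14.0]]
weights = [0.5, 0.5]
updated = imca_step(weights, model_preds, actual)
print(updated, lp_error(ensemble(updated, model_preds), actual, 1))
```

In this toy case the biased model loses more weight than the exact one, and the ensemble error over the window falls after a single step. The step size `lam` matters: a much larger value would overshoot and increase the error.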

3.2.2. Example to Illustrate IMCA

To illustrate the IMCA methodology, consider an example where five models ($M_1, M_2, M_3, M_4, M_5$) predict stock prices, and their weights are adjusted iteratively based on their performance. Suppose that the historical data length ($l$) is 10 days and the error metric used is Mean Absolute Error (MAE, $p = 1$). Predictions and actual values from 1 January to 10 January 2023 are used to evaluate performance. The combined forecast will be made for 11 January 2023 ($s + l + 1$) (see Figure 1).
First, the MAE for each model is calculated using
$$\text{MAE}_k = \frac{1}{l} \sum_{i=1}^{l} \left| \hat{y}_i^{k} - y_i \right|,$$
where $\hat{y}_i^{k}$ represents the prediction by model $k$ for day $i$ and $y_i$ is the actual stock price for day $i$. A lower MAE indicates better model performance.
Next, the performance scores ($V_k$) for the models are computed based on their current weights:
$$V_k = \frac{1}{w_k + \delta}.$$
The weights are then updated iteratively using the formula
$$w_k^{(t+1)} = w_k^{(t)} - \lambda \cdot \frac{\partial \ell_p}{\partial w_k^{(t)}} \cdot V_k.$$
Finally, the combined prediction for 11 January 2023 is obtained as
$$\hat{Y}_{\text{for}}(s + l + 1) = \sum_{k=1}^{5} w_k M_k(s + l + 1).$$
This process ensures that the ensemble prediction reflects the strengths of individual models while dynamically adapting to their performance.
Example Output: If the initial weights for $M_1, M_2, M_3, M_4, M_5$ are $w_1 = 0.3$, $w_2 = 0.25$, $w_3 = 0.2$, $w_4 = 0.15$, $w_5 = 0.1$, and their MAE scores for the last 10 days are $\text{MAE}_1 = 1.2$, $\text{MAE}_2 = 0.8$, $\text{MAE}_3 = 1$, $\text{MAE}_4 = 0.6$ and $\text{MAE}_5 = 1.1$:
  • Compute $V_k$ for each model from its current weight:
    $$V_1 = \frac{1}{0.3 + \delta}, \quad V_2 = \frac{1}{0.25 + \delta}, \quad V_3 = \frac{1}{0.2 + \delta}, \quad V_4 = \frac{1}{0.15 + \delta}, \quad V_5 = \frac{1}{0.1 + \delta}.$$
  • Update the weights using the formula for $w_k^{(t+1)}$:
    $$w_k^{(t+1)} = w_k^{(t)} - \lambda \cdot \frac{\partial \ell_p}{\partial w_k^{(t)}} \cdot V_k.$$
    For instance, models with higher MAE will have their weights penalized more;
  • Combine the updated weights to calculate the forecast:
    $$\hat{Y}_{\text{for}}(s + l + 1) = \sum_{k=1}^{5} w_k M_k(s + l + 1).$$
This iterative process ensures that the combined prediction leverages the strengths of the best-performing models while minimizing the influence of underperforming ones. It dynamically adapts the ensemble to changing market conditions, making IMCA a robust tool for portfolio management in volatile markets.
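For concreteness, the short Python sketch below applies the definition $V_k = 1/(w_k + \delta)$ to the initial weights stated in the example and forms a weighted forecast. The value of $\delta$ and the five model predictions for 11 January 2023 are hypothetical placeholders, since the example does not report them, and the weights are used before any update.

```python
def performance_scores(weights, delta=1e-6):
    """V_k = 1 / (w_k + delta) for each model weight."""
    return [1.0 / (w + delta) for w in weights]


def combined_forecast(weights, predictions):
    """Weighted sum of the individual model predictions."""
    return sum(w * m for w, m in zip(weights, predictions))


# Initial weights from the example (they sum to 1).
weights = [0.30, 0.25, 0.20, 0.15, 0.10]

# Hypothetical next-day price predictions from M1..M5 (not from the study).
predictions = [100.0, 102.0, 98.0, 101.0, 99.0]

scores = performance_scores(weights)
forecast = combined_forecast(weights, predictions)
print([round(v, 3) for v in scores])
print(round(forecast, 2))
```

Because $V_k$ is the reciprocal of the weight, the model with the smallest weight ($M_5$) receives the largest score, so the same gradient moves its weight proportionally more.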
We would like to note that vanishing gradients are a well-known challenge in training deep neural networks, as they can hinder learning in earlier layers of the network. However, the DRL models used in this study are specifically designed to mitigate this issue through modern techniques. These include ReLU activation functions, which avoid saturation and preserve gradient magnitudes, and layer normalization, which stabilizes training by maintaining consistent gradient flow across layers. Additionally, architectures like SAC and TD3 incorporate residual connections that enable gradients to bypass intermediate layers, further preventing the vanishing gradient problem. As a result, the proposed model does not suffer from vanishing gradients, eliminating the need for additional algorithms to address this issue.

3.2.3. Performance Evaluation Metrics

The effectiveness of the IMCA framework in portfolio management is assessed using a range of performance evaluation metrics that provide insights into profitability, risk, and overall portfolio performance.
Cumulative Return (CR) measures the total investment growth over the evaluation period, serving as a comprehensive indicator of overall profitability [39]. This metric captures the net effect of all gains and losses on the portfolio during the specified timeframe.
Annual Return (AR) reflects the average yearly growth of the portfolio, enabling meaningful comparisons across different time periods and investment strategies [40]. By annualizing returns, this metric standardizes performance evaluation for strategies operating over varying horizons.
Annualized Volatility (AV) quantifies the variability of portfolio returns on an annual basis, providing a measure of the risk level associated with the investment strategy [41]. A higher volatility indicates greater uncertainty in returns, while lower volatility suggests more stable performance.
The Sharpe Ratio (SR) evaluates risk-adjusted returns by measuring the excess return achieved per unit of risk taken [42]. This metric is instrumental in determining whether a portfolio’s performance justifies the level of risk incurred, offering a comparative perspective across different strategies.
Maximum Drawdown (MD) captures the largest decline in portfolio value from a peak to a trough during the evaluation period, offering insights into potential worst-case losses [43]. This metric is particularly valuable for understanding the resilience of the portfolio under adverse market conditions.
These metrics collectively provide a comprehensive framework for evaluating IMCA’s performance, balancing profitability and risk considerations to determine its efficacy in managing dynamic financial portfolios.
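The five metrics can be computed from a series of periodic portfolio returns, as in the Python sketch below. The conventions are assumptions rather than the study's stated definitions: 252 trading periods per year, geometric annualization of returns, sample standard deviation for volatility, a zero risk-free rate by default, and a Sharpe Ratio based on the mean periodic excess return. The function name and input data are illustrative.

```python
import math


def performance_metrics(returns, periods_per_year=252, risk_free_rate=0.0):
    """Return CR, AR, AV, SR and MD for a list of simple periodic returns."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two return observations")

    # Equity curve starting from a portfolio value of 1.
    equity, value = [], 1.0
    for r in returns:
        value *= 1.0 + r
        equity.append(value)

    cumulative_return = equity[-1] - 1.0
    annual_return = equity[-1] ** (periods_per_year / n) - 1.0

    mean = sum(returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / (n - 1))
    annual_volatility = std * math.sqrt(periods_per_year)

    # Sharpe Ratio: annualized mean excess return per unit of annualized volatility.
    excess = mean - risk_free_rate / periods_per_year
    sharpe = math.sqrt(periods_per_year) * excess / std if std > 0 else float("nan")

    # Maximum drawdown: largest peak-to-trough decline of the equity curve.
    peak, max_drawdown = 1.0, 0.0
    for v in equity:
        peak = max(peak, v)
        max_drawdown = min(max_drawdown, v / peak - 1.0)

    return {
        "cumulative_return": cumulative_return,
        "annual_return": annual_return,
        "annual_volatility": annual_volatility,
        "sharpe_ratio": sharpe,
        "max_drawdown": max_drawdown,
    }


# Hypothetical three-period return series.
print(performance_metrics([0.10, -0.50, 0.20]))
```

Different annualization or Sharpe conventions give different numbers for the same return series, so comparisons across strategies are only meaningful when one convention is applied to all of them.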

4. Estimation Results

This section presents the performance evaluation of various portfolio allocation strategies, including DRL-based models and traditional approaches. Figure 2 and Figure 3 provide visual comparisons of cumulative returns, while Table 2 summarizes the performance metrics for all models. The results illustrate the adaptability and robustness of DRL algorithms and the superior performance of IMCA in managing portfolio allocations under volatile market conditions.

4.1. Cumulative Return Trends

The cumulative returns of portfolio allocation models provide valuable insights into their performance over time. This section highlights the comparative analysis of both Reinforcement Learning (RL)-based models and traditional strategies, focusing on their behavior during volatile periods such as the COVID-19 pandemic and their overall recovery patterns.
Figure 2 showcases the cumulative returns of five DRL models, A2C, PPO, DDPG, SAC, and TD3, over the testing period from January 2018 to December 2023. The impact of the COVID-19 pandemic in early 2020 is evident, as all models experienced significant drawdowns during the initial market shock. However, notable differences in recovery patterns emerged:
  • A2C and SAC: Both models exhibited strong resilience and recovery post-2020, demonstrating their ability to adapt to volatile market conditions. Their cumulative returns surpass those of other DRL models by the end of the testing period, indicating effective portfolio rebalancing and risk management;
  • PPO: This model showed the weakest performance among the DRL algorithms, with relatively low cumulative returns and slower recovery rates. This outcome highlights PPO’s potential sensitivity to market volatility and its limitations in balancing exploration and exploitation;
  • DDPG and TD3: These models achieved moderate performance, with consistent but less aggressive recoveries compared to A2C and SAC. Their stability suggests that they are well-suited for environments with less pronounced market fluctuations.
The performance of traditional strategies and IMCA is presented in Figure 3. IMCA demonstrates significantly higher cumulative returns compared to both traditional strategies. This result underscores its robustness and superior growth potential. The dynamic weighting mechanism employed by IMCA effectively leverages the strengths of multiple models, enabling it to outperform during both market downturns and recovery phases. This adaptability makes it particularly suitable for navigating volatile market environments.
For traditional methods, the Min-Variance strategy stands out for achieving the lowest drawdowns, making it particularly appealing to risk-averse investors. However, its limited cumulative returns highlight its inability to fully capitalize on upward market trends, particularly during recovery periods such as those following the COVID-19 pandemic. This trade-off between minimizing risk and maximizing growth exemplifies the inherent limitations of traditional strategies. Similarly, CAPM-based Mean-Variance Portfolio Optimization, which integrates the Capital Asset Pricing Model (CAPM) and Mean-Variance Optimization (MVO), seeks to balance risk and return but remains constrained by its static approach.
The CAPM portfolio allocation strategy, represented by the purple line in the chart, demonstrates significant volatility over the observed period. It performed especially poorly during periods of heightened market turbulence, such as the sharp decline in April 2020. Among the strategies compared, CAPM experienced larger drawdowns and showed weaker recovery, exposing its limitations in adapting to dynamic market conditions. Its reliance on a static, linear risk–return relationship further emphasizes the need for more adaptive and responsive portfolio strategies to effectively navigate rapidly changing market environments.
The SET50 Baseline, representing the average performance of the SET50 stock index, also illustrates the significant shortcomings of static allocation strategies. It suffered the most substantial losses during periods of market volatility, such as the COVID-19 crisis, highlighting its inability to adjust to rapid market changes. The baseline’s underperformance during turbulent times emphasizes the critical need for dynamic portfolio management strategies like the IMCA, which is better equipped to deliver consistent returns and resilience in fluctuating markets.

4.2. Overall Performance Metrics

Table 2 summarizes the performance metrics of the models. A2C, a DRL-based model, achieved the highest annual and cumulative returns, demonstrating its potential for long-term growth. IMCA also delivered robust performance, surpassing traditional strategies such as the Min-Variance approach and the SET50 Baseline, further validating its dynamic and adaptive portfolio allocation methodology.
In terms of annual returns and cumulative returns, A2C recorded the highest values among the Deep Reinforcement Learning models, showcasing its ability to generate superior long-term growth. IMCA demonstrated robust performance, outperforming all traditional strategies, including Min-Variance, CAPM, and the SET50 Baseline, which confirms its effectiveness in achieving consistent and superior returns over time.
For annual volatility, Min-Variance achieved the lowest level, reflecting its conservative approach and appeal to risk-averse investors. However, DRL models, including IMCA, displayed moderate and manageable volatility levels, indicating their ability to balance risk and return effectively while remaining competitive.
When evaluating risk-adjusted performance through the Sharpe Ratio, A2C emerged as the top performer, achieving the highest value among all models. This metric highlights A2C’s efficiency in delivering returns relative to the risk taken. IMCA’s Sharpe Ratio further emphasizes its balanced approach, combining profitability with controlled risk exposure.
In terms of maximum drawdown, Min-Variance exhibited the smallest drawdowns, reinforcing its suitability for investors prioritizing capital preservation. IMCA maintained competitive drawdown levels, outperforming the SET50 Baseline and demonstrating resilience during market downturns. This resilience underscores IMCA’s robustness in adapting to adverse market conditions, further enhancing its appeal to portfolio managers seeking both growth and stability.
Overall, Table 2 illustrates the versatility and adaptability of IMCA, which strikes a balance between profitability and risk management. It also highlights the comparative advantages of DRL models over traditional approaches in dynamic and volatile market environments.
The superior performance of A2C and IMCA can be explained by their ability to dynamically adapt to changing market conditions. For instance, during the COVID-19 pandemic, when the market experienced extreme volatility, A2C utilized its advantage function to stabilize decision-making. This ensured consistent adjustments to portfolio allocations, even during periods of significant market uncertainty. Similarly, IMCA’s ability to adjust model contributions in real-time allowed it to reduce the influence of poorly performing models and increase the weight of better-performing ones. This flexibility enabled the portfolio to recover faster and perform more effectively during the market rebound.
In contrast, traditional models like Min-Variance and CAPM rely on fixed weighting strategies. These models struggle to adjust to sudden market changes, as shown by their lower cumulative returns during the sharp market declines and recoveries during the COVID-19 crisis.
Furthermore, DRL models like SAC stood out in handling complex data and exploring a wide range of strategies. Instead of relying on a fixed plan, SAC actively tested and refined its strategies, ensuring that it could adapt to rapidly changing market conditions. This approach was particularly important during the uncertainty of the pandemic, as it helped the model avoid being locked into suboptimal solutions. Combined with IMCA’s flexibility to adapt based on real-time performance, this resulted in a powerful and effective system for managing portfolios even in the most challenging market environments.

4.3. Pre-COVID Outbreak (1 January 2018 to 31 December 2019)

The statistics in Table 3 indicate that, during the pre-COVID period, the IMCA strategy outperformed both the Baseline and Minimum Variance strategies, providing the highest cumulative return and Sharpe Ratio. This suggests that IMCA was particularly effective during stable market conditions, likely due to its ability to balance growth and risk.
The Baseline strategy, which mirrors a traditional index approach, showed moderate volatility but struggled with lower cumulative returns and a negative Sharpe Ratio. This performance suggests that conventional index-based strategies may be less effective in periods of steady market growth, where more adaptive models can capitalize on incremental gains.
In contrast, the Minimum Variance strategy offered lower risk, as shown by its lower volatility and smaller drawdowns. However, this emphasis on stability came at the expense of returns, resulting in a lower cumulative return than IMCA. Investors with a risk-averse approach may find Minimum Variance appealing, but the IMCA strategy stands out as the preferred option for those seeking growth without excessive risk in stable markets.
Figure 4 below illustrates the cumulative return trends, while Table 3 provides a summary of key performance metrics for each strategy.

4.4. During-COVID Outbreak (1 January 2020 to 31 December 2021)

The statistics for the during-COVID period show that the market was highly volatile, largely due to the global economic disruptions from the COVID-19 pandemic. This period tested each strategy’s ability to handle sharp declines and rapid rebounds, making it an effective assessment of risk management and adaptability.
The IMCA strategy once again demonstrated strong performance, achieving the highest cumulative returns and a positive Sharpe Ratio despite the volatility. This suggests that IMCA was able to adjust to the rapid market fluctuations more effectively than the other strategies, making it a resilient choice during unpredictable times.
In comparison, the Baseline strategy, which mirrors a traditional market index, experienced high volatility and significant drawdowns, resulting in lower cumulative returns. This outcome highlights the limitations of conventional index-based approaches in times of crisis, as these strategies lack the flexibility to respond quickly to market downturns.
The Minimum Variance strategy, while focused on reducing risk, showed lower cumulative returns as well. Its priority on stability helped it to avoid the worst losses, as indicated by a smaller drawdown than the Baseline, but it still lagged behind IMCA in terms of overall growth. This suggests that, while suitable for risk-averse investors, Minimum Variance may sacrifice growth potential during highly volatile periods.
Figure 5 below illustrates the cumulative return trends, while Table 4 provides a summary of key performance metrics for each strategy during the COVID-19 outbreak period.
This performance contrasts with the following period of market recovery, where different growth dynamics come into play.

4.5. Post-COVID Outbreak (1 January 2022 to 31 December 2023)

The statistics for the post-COVID period illustrate how each strategy adapted to the market’s recovery phase. During this time, economic conditions began to stabilize, and markets rebounded, offering growth opportunities. This period provides insights into each strategy’s ability to capitalize on recovery trends while managing residual volatility.
The IMCA strategy continued to outperform the other models, achieving the highest cumulative returns and a positive Sharpe Ratio. IMCA’s consistent performance highlights its adaptability and growth potential, making it a strong choice for investors seeking to maximize returns in a recovering market.
The Baseline strategy, following the traditional market index, showed some recovery but remained limited by higher drawdowns and moderate cumulative returns. This performance suggests that, while index-based strategies can participate in growth during favorable market conditions, they may still be impacted by lingering volatility, reducing their overall appeal in a recovery phase.
The Minimum Variance strategy demonstrated stability with lower volatility and drawdowns compared to the Baseline. However, its conservative approach led to relatively modest cumulative returns, reflecting a trade-off between stability and growth potential. This makes the Minimum Variance strategy suitable for investors prioritizing risk reduction over aggressive gains in a post-crisis environment.
Figure 6 below illustrates the cumulative return trends, while Table 5 provides a summary of key performance metrics for each strategy during the post-COVID period.

4.6. Robustness Check

To verify the performance of the proposed IMCA model, we conduct two robustness analyses. First, we evaluate the performance of IMCA while accounting for transaction costs. Second, we examine the impact of varying the number of timesteps to assess the robustness and generalization of the Deep Reinforcement Learning model, which can also indirectly help to detect overfitting.

4.6.1. Measuring the Performance of Strategies Under Transaction Costs

To measure the performance of the IMCA model under transaction costs, we account for daily trading strategies that may involve restructuring positions multiple times per day, with a maximum of 50 trades daily. The profit and loss for these strategies, when applying a transaction cost of 0.02% per trade, are presented in Table 6.
Compared to strategies without considering transaction costs (Table 2), Table 6 shows that the IMCA model demonstrates notable robustness when transaction costs are applied. While there is a reduction in its annual returns (from 2.32% to 2.10%) and cumulative returns (from 14.20% to 13.00%), IMCA remains one of the best-performing models. Its Sharpe ratio declines slightly (from 0.220 to 0.185), highlighting the inevitable impact of transaction costs on risk-adjusted returns while maintaining its strong performance relative to other strategies.

4.6.2. Evaluating Learning Progression

The proposed Iterative Model Combining Algorithm trading strategy consists of two main components: Deep Reinforcement Learning optimization and the calculation of optimal weights for each algorithm, as described in Section 3. The IMCA dynamically adjusts to changing market conditions, extracting informative features from the environment to optimize portfolio performance. However, the complexity of the IMCA introduces potential challenges, such as sampling noise, which could lead to overfitting. To evaluate its learning progression, the reward vs. timesteps graph is employed, a fundamental tool in DRL that illustrates how well the model adapts and learns over time. In this experiment, the timesteps range from 10,000 to 200,000, as shown in Figure 7.
Figure 7 shows the IMCA model’s learning progression over timesteps from 10,000 to 200,000. Initially, the average reward increases sharply from −19.03 to 34.31 (10,000 to 50,000 timesteps), indicating effective learning. The reward continues to rise, peaking at 55.98 by 100,000 timesteps, reflecting robust optimization. Beyond this, fluctuations occur, with a dip to 45 at 150,000 timesteps before recovering to 51 at 200,000, likely due to sampling noise or market sensitivity. Overall, the IMCA demonstrates strong learning and adaptability, though slight refinements could further stabilize performance in later stages.

4.7. Discussion

The results reveal several key insights that underline the effectiveness of the proposed models and methodologies.
First, the superiority of IMCA is evident from its consistent outperformance of traditional portfolio allocation methods. IMCA demonstrates adaptability and effectiveness, particularly in emerging markets like the SET50 Index, aligning with prior studies that highlight the importance of dynamic asset allocation in improving portfolio performance [30,39]. Its dynamic weighting mechanism, which leverages the strengths of multiple models, ensures resilience in the face of market fluctuations. This adaptability is consistent with findings that emphasize the importance of flexibility in managing volatile markets [44,45]. IMCA’s ability to balance risk and return makes it a versatile and reliable tool for portfolio management.
Second, insights from the performance of DRL algorithms highlight the strengths and limitations of individual models. A2C and SAC stand out as the most effective DRL models in managing risk and capitalizing on market opportunities. These results are supported by previous research that demonstrates the ability of DRL algorithms to adapt to complex environments and optimize long-term objectives [46,47]. In contrast, PPO’s underperformance emphasizes the critical importance of selecting DRL algorithms tailored to the specific complexities and challenges of financial markets, where adaptability and precision are key [4,19].
Finally, the results emphasize the trade-offs between risk and return across the evaluated strategies. While Min-Variance offers the lowest risk among all methods, its limited growth potential aligns with the findings of traditional portfolio theory, which identifies the trade-off between minimizing risk and achieving higher returns [30,48]. These results highlight the necessity of adopting more dynamic approaches, such as IMCA, to achieve long-term investment objectives. Studies in the context of adaptive asset allocation strategies also validate this observation, suggesting that models incorporating real-time adjustments deliver superior outcomes in changing market environments [21,22].
Overall, the findings demonstrate that integrating DRL techniques with IMCA creates a robust and efficient portfolio management framework. This approach not only navigates volatile market conditions but also delivers superior risk-adjusted returns, consistent with prior studies that highlight the advantages of combining machine learning with portfolio optimization [49]. These results make IMCA an ideal solution for investors seeking both stability and profitability in dynamic financial environments.

5. Conclusions

The findings highlight the superior performance of the Iterative Model Combining Algorithm (IMCA), which achieved an Annual Return of 2.32%, a Cumulative Return of 14.20%, and a Sharpe Ratio of 0.220. By comparison, the Minimum Variance strategy recorded an Annual Return of −0.77%, a Cumulative Return of −4.35%, and a Sharpe Ratio of 0.018. The Sharpe Ratios and maximum drawdowns of IMCA and A2C indicate that both deliver returns while maintaining effective risk management. For IMCA specifically, an annual volatility of 17.56% and a maximum drawdown of −44.78% point to consistent profitability with balanced risk exposure.
The Advantage Actor–Critic model demonstrated even higher Annual Returns (2.78%) and Cumulative Returns (17.16%) among the DRL-based models, showcasing its effectiveness in long-term growth. Its Sharpe Ratio of 0.246, the highest across all models, reflects its efficiency in delivering risk-adjusted returns. These metrics collectively highlight the strength of DRL-based approaches in dynamically adapting to volatile market conditions, particularly during periods of extreme uncertainty, such as the COVID-19 pandemic.
The comparative analysis reveals that, while traditional strategies like the Minimum Variance approach prioritize risk minimization, they often fail to capitalize on upward market trends, leading to suboptimal returns. In contrast, IMCA and DRL models demonstrate a balanced approach, combining adaptability and profitability to deliver superior performance under stable and volatile market conditions.
This study demonstrates the potential of combining multiple DRL algorithms to optimize portfolio management, specifically for SET50 stocks. By utilizing an IMCA, we dynamically adjusted model weights to minimize forecasting errors, enabling our combined models to adapt effectively under varying market conditions. The findings indicate that this hybrid approach significantly outperforms traditional strategies, such as the Min-Variance strategy or the SET50 Baseline, in both return generation and risk management.
Our results highlight that DRL models, when implemented with diverse algorithms and optimized using techniques like IMCA, provide a robust and adaptive framework for trading in volatile financial markets. Notably, algorithms such as Advantage Actor–Critic (A2C) and Soft Actor–Critic (SAC) demonstrated resilience and delivered higher Cumulative Returns, particularly during periods of extreme market turbulence, such as the COVID-19 pandemic. These models excelled in balancing risk and return, underscoring the adaptability and effectiveness of DRL in managing dynamic and unpredictable market environments.
The superior performance of A2C and IMCA can be explained by their ability to adapt dynamically to changing market conditions. For instance, during the COVID-19 pandemic, when the market experienced extreme volatility, A2C used its advantage function to stabilize decision-making, which kept portfolio allocation adjustments consistent even under significant market uncertainty. Similarly, IMCA adjusted model contributions in real time: it reduced the influence of underperforming models, such as those affected by short-term market anomalies, and increased the weight of consistently high-performing ones. This flexibility enabled the portfolio to recover faster and perform more effectively during the market rebound.
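The reweighting idea described above can be illustrated with a short sketch. This is not the authors' IMCA procedure: it swaps in a simple inverse mean-squared-error weighting as a stand-in, and the function names `inverse_error_weights` and `combine`, the error window, and the `eps` guard are all hypothetical.

```python
import numpy as np


def inverse_error_weights(errors, eps=1e-12):
    """Model weights proportional to 1 / MSE over a recent window.

    errors: array of shape (n_periods, n_models) with each model's recent
            forecasting errors (hypothetical input).
    Returns non-negative weights that sum to 1.
    """
    errors = np.asarray(errors, dtype=float)
    mse = np.mean(errors ** 2, axis=0)
    inv = 1.0 / (mse + eps)  # eps avoids division by zero
    return inv / inv.sum()


def combine(allocations, weights):
    """Blend per-model portfolio allocations into a single allocation.

    allocations: array of shape (n_models, n_assets), each row summing to 1.
    weights: array of shape (n_models,), e.g. from inverse_error_weights.
    """
    allocations = np.asarray(allocations, dtype=float)
    blended = np.asarray(weights, dtype=float) @ allocations
    return blended / blended.sum()  # renormalize against rounding drift
```

For example, a model whose recent errors are twice as large as another's receives one quarter of its weight under this rule, so a model hit by a short-lived anomaly loses influence until its errors shrink again.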
Future research could explore the integration of additional DRL models or the application of this framework to other market environments, such as commodities or developed stock indices, to validate its versatility. Further investigations into the incorporation of transaction costs, liquidity constraints, and other real-world factors would enhance the practical applicability of the approach. Additionally, optimizing hyperparameters and exploring advanced ensemble techniques, such as those incorporating sentiment analysis or macroeconomic indicators, could further improve forecasting accuracy and profitability [50,51]. Conducting comparative experiments with alternative ensemble and machine learning methods, such as Double Deep Q-Learning and Trust Region Policy Optimization, would help to validate the superiority of the IMCA framework. Moreover, testing the framework on more extensive portfolios would provide insights into its scalability and effectiveness across different asset classes. Overall, this study establishes a strong foundation for leveraging DRL and IMCA in the development of advanced portfolio management systems.

Author Contributions

Methodology, S.T.; Investigation, S.T.; Resources, W.Y.; Writing—original draft, S.T.; Writing—review & editing, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research work was funded by Chiang Mai University, grant number CEE2025.

Data Availability Statement

The data can be freely downloaded from Yahoo Finance (finance.yahoo.com).

Acknowledgments

This research work was partially supported by the Department of Statistics, Faculty of Science, and the Center of Excellence in Econometrics, Faculty of Economics at Chiang Mai University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Deng, Y.; Bao, F.; Kong, Y.; Ren, Z.; Dai, Q. Deep direct reinforcement learning for financial signal representation and trading. IEEE Trans. Neural Networks Learn. Syst. 2016, 28, 653–664.
  2. Buehler, H.; Gonon, L.; Teichmann, J.; Wood, B. Deep hedging. Quant. Financ. 2019, 19, 1271–1291.
  3. Jiang, Z.; Liang, J. Cryptocurrency portfolio management with deep reinforcement learning. arXiv 2017, arXiv:1612.01277.
  4. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
  5. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
  6. Ansari, Y.; Gillani, S.; Bukhari, M.; Lee, B.; Maqsood, M.; Rho, S. A Multifaceted Approach to Stock Market Trading Using Reinforcement Learning. IEEE Access 2024, 12, 90041–90060.
  7. Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms; CRC Press: Boca Raton, FL, USA, 2012.
  8. Goodell, J.W. COVID-19 and finance: Agendas for future research. Financ. Res. Lett. 2020, 35, 101512.
  9. Zhang, W.; Wang, P.; Zhang, X. Portfolio management during pandemics: Evidence from COVID-19. Emerg. Mark. Rev. 2022, 51, 100857.
  10. Phan, D.H.B.; Narayan, P.K. The importance of economic policy uncertainty in predicting stock returns: Evidence from emerging markets. J. Int. Financ. Mark. 2019, 59, 30–50.
  11. Narayan, P.K.; Phan, D.H.B. Country-specific COVID-19 social distancing measures and their impact on stock returns. Emerg. Mark. Financ. Trade 2020, 56, 2273–2287.
  12. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870.
  13. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
  14. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  15. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
  16. Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econom. 1986, 31, 307–327.
  17. Engle, R.F. Dynamic conditional correlation: A simple class of multivariate GARCH models. J. Bus. Econ. Stat. 2002, 20, 339–350.
  18. Dong, Y.; Huang, H.; Zhang, G.; Jin, J. Adaptive Transit Signal Priority Control for Traffic Safety and Efficiency Optimization: A Multi-Objective Deep Reinforcement Learning Framework. Mathematics 2024, 12, 3994.
  19. Fujimoto, S.; van Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596.
  20. Gao, H.; Kou, G.; Liang, H.; Zhang, H.; Chao, X.; Li, C.C.; Dong, Y. Machine learning in business and finance: A literature review and research opportunities. Financ. Innov. 2024, 10, 86.
  21. Zhang, Z.; Zohren, S.; Roberts, S. Deep reinforcement learning for trading. J. Financ. Data Sci. 2020, 2, 25–40.
  22. Bai, X.; Zhuang, S.; Xie, H.; Guo, L. Leveraging generative artificial intelligence for financial market trading data management and prediction. J. Artif. Intell. Inf. 2024, 1, 32–41.
  23. Bollen, J.; Mao, H.; Zeng, X. Twitter mood predicts the stock market. J. Comput. Sci. 2011, 2, 1–8.
  24. Nassirtoussi, A.K.; Aghabozorgi, S.; Wah, T.Y.; Ngo, D.C.L. Text mining for market prediction: A systematic review. Expert Syst. Appl. 2014, 41, 7653–7670.
  25. Mittermayer, M.A. Forecasting intraday stock price trends with text mining techniques. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, Kauai, HI, USA, 4–7 January 2006.
  26. Schapire, R.E. A brief introduction to boosting. In Proceedings of the 16th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 31 July–6 August 1999; pp. 1401–1406.
  27. Engle, R.F. Autoregressive conditional heteroscedasticity with estimates of the variance of UK inflations. Econometrica 1982, 59, 987–1007.
  28. Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Harley, T.; Lillicrap, T.P.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016.
  29. Hull, J. Risk Management and Financial Institutions, 4th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2014.
  30. Markowitz, H. Portfolio selection. J. Financ. 1952, 7, 77–91.
  31. Glosten, L.R.; Jagannathan, R.; Runkle, D.E. On the relation between the expected value and the volatility of the nominal excess return on stocks. J. Financ. 1993, 48, 1779–1801.
  32. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  33. Sutton, R.S.; Barto, A.G. Reinforcement learning: An introduction. Robotica 1999, 17, 229–235.
  34. Tsay, R.S. Analysis of Financial Time Series, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2005.
  35. Jorion, P. Value at Risk: The New Benchmark for Managing Financial Risk, 3rd ed.; McGraw Hill Professional: New York, NY, USA, 2006.
  36. Rockafellar, R.T.; Uryasev, S. Optimization of conditional value-at-risk. J. Risk 2000, 2, 21–42.
  37. Alexander, C. Market Risk Analysis, Quantitative Methods in Finance; John Wiley & Sons: Hoboken, NJ, USA, 2008.
  38. Koratamaddi, P.; Wadhwani, K.; Gupta, M.; Sanjeevi, S.G. Market sentiment-aware deep reinforcement learning approach for stock portfolio allocation. Eng. Sci. Technol. Int. J. 2021, 24, 848–859.
  39. Lhabitant, F.S. Handbook of Hedge Funds; John Wiley & Sons: Hoboken, NJ, USA, 2007.
  40. Damodaran, A. Investment Valuation: Tools and Techniques for Determining the Value of Any Asset; John Wiley & Sons: Hoboken, NJ, USA, 2012.
  41. Bodie, Z.; Kane, A.; Marcus, A.J. Investments; McGraw-Hill Education: New York, NY, USA, 2014.
  42. Sharpe, W.F. The Sharpe Ratio. J. Portf. Manag. 1994, 21, 49–58.
  43. Maginn, J.L.; Tuttle, D.L.; McLeavey, J.E.; Pinto, D.W. Managing Investment Portfolios: A Dynamic Process; CFA Institute: Charlottesville, VA, USA, 2007.
  44. Beyhaghi, M.; Hawley, J.P. Modern portfolio theory and risk management: Assumptions and unintended consequences. J. Sustain. Financ. Investig. 2013, 3, 17–37.
  45. Grinold, R.C.; Kahn, R.N. Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk, 2nd ed.; McGraw-Hill: New York, NY, USA, 1999.
  46. Rouf, N.; Malik, M.B.; Arif, T.; Sharma, S.; Singh, S.; Aich, S.; Kim, H.C. Stock market prediction using machine learning techniques: A decade survey on methodologies, recent developments, and future directions. Electronics 2021, 10, 2717.
  47. Sahu, S.K.; Mokhade, A.; Bokde, N.D. An overview of machine learning, deep learning, and reinforcement learning-based techniques in quantitative finance: Recent progress and challenges. Appl. Sci. 2023, 13, 1956.
  48. Gambeta, V.; Kwon, R. Risk return trade-off in relaxed risk parity portfolio optimization. J. Risk Financ. Manag. 2020, 13, 237.
  49. Gu, S.; Kelly, B.; Xiu, D. Empirical asset pricing via machine learning. Rev. Financ. Stud. 2020, 33, 2223–2273.
  50. Yamaka, W.; Sriboonchitta, S. Forecasting using information and entropy based on belief functions. Complexity 2020, 2020, 3269647.
  51. Yamaka, W. Sparse estimations in kink regression model. Soft Comput. 2021, 25, 7825–7838.
Figure 1. IMCA flowchart.
Figure 2. Cumulative returns comparison of Reinforcement Learning portfolio allocation models.
Figure 3. Comparative performance of traditional and iterative model combining portfolio strategies.
Figure 4. Cumulative Return comparison (2018–2019): Traditional vs. Iterative Model Combining Algorithm.
Figure 5. Cumulative Return comparison (2020–2021): Traditional vs. Iterative Model Combining Algorithm.
Figure 6. Cumulative Return comparison (2022–2023): Traditional vs. Iterative Model Combining Algorithm.
Figure 7. Average rewards with respect to the number of timesteps.
Table 1. Training times and hyperparameters for each reinforcement learning model.
| Model | Hyperparameters | Timesteps | Training Time |
|---|---|---|---|
| A2C | "n_steps": 10,000; "ent_coef": 0.01; "learning_rate": 0.001 | 100,000 | 4–9 min |
| PPO | "n_steps": 10,000; "ent_coef": 0.005; "learning_rate": 0.001; "batch_size": 256 | 100,000 | 6–10 min |
| DDPG | "batch_size": 256; "buffer_size": 1,000,000; "learning_rate": 0.001 | 100,000 | 60–66 min |
| SAC | "batch_size": 256; "buffer_size": 1,000,000; "learning_rate": 0.001; "learning_starts": 0.01; "ent_coef": "auto_0.1" | 100,000 | 65–70 min |
| TD3 | "batch_size": 256; "buffer_size": 1,000,000; "learning_rate": 0.001 | 100,000 | 57–64 min |
Table 2. Performance metrics of portfolio allocation models.
| Model | Annual Returns (%) | Cumulative Returns (%) | Annual Volatility (%) | Sharpe Ratio | Max Drawdown (%) |
|---|---|---|---|---|---|
| A2C | 2.78 | 17.16 | 17.27 | 0.246 | −43.50 |
| PPO | 0.30 | 1.76 | 17.53 | 0.106 | −45.81 |
| DDPG | 2.28 | 13.95 | 18.15 | 0.216 | −47.23 |
| SAC | 2.08 | 12.67 | 17.37 | 0.207 | −43.55 |
| TD3 | 2.22 | 13.57 | 17.90 | 0.214 | −45.47 |
| IMCA | 2.32 | 14.20 | 17.56 | 0.220 | −44.78 |
| Min-Variance | −0.77 | −4.35 | 14.28 | 0.018 | −41.12 |
| SET50 Baseline | −4.25 | −21.20 | 16.22 | −0.186 | −43.13 |
| CAPM | −1.73 | −9.62 | 18.12 | −0.003 | −47.50 |
Table 3. Performance statistics for traditional and Iterative Model Combining Algorithm pre-COVID.
| Metric | Baseline | Minimum Variance | CAPM | IMCA |
|---|---|---|---|---|
| Annual Return | −5.93% | 0.62% | 4.19% | 2.23% |
| Cumulative Returns | −11.17% | 1.20% | 8.28% | 13.62% |
| Annual Volatility | 10.81% | 9.43% | 14.73% | 18.26% |
| Sharpe Ratio | −0.51 | 0.11 | 0.35 | 0.21 |
| Max Drawdown | −15.80% | −11.73% | −17.13% | −47.85% |
| Daily Value at Risk (VaR) | −1.38% | −1.18% | −1.83% | −2.29% |
Note: In this table, the highest (best) Annual Return, Cumulative Returns, Sharpe Ratio, and the lowest (best) Annual Volatility, Max Drawdown, and Daily VaR are highlighted in bold.
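The Max Drawdown and Daily VaR rows in the table above can be computed from a daily return series along the following lines. This is a sketch under stated assumptions: the text here does not give the VaR formula, so a parametric rule (mean minus two daily standard deviations) is assumed, and the function names are hypothetical.

```python
import numpy as np


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the wealth curve, as a negative fraction."""
    r = np.asarray(daily_returns, dtype=float)
    # Wealth curve starting from 1.0 so a loss on the first day is counted.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_peak = np.maximum.accumulate(wealth)
    drawdowns = wealth / running_peak - 1.0
    return drawdowns.min()


def parametric_var(daily_returns, sigma=2.0):
    """Daily VaR as mean - sigma * std (assumed rule, sigma=2 by default)."""
    r = np.asarray(daily_returns, dtype=float)
    return r.mean() - sigma * r.std(ddof=1)
```

Both functions return negative numbers for a losing scenario, matching the sign convention of the table. A historical (quantile-based) VaR would be an equally reasonable alternative and would generally give different values.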
Table 4. Performance statistics for traditional and Iterative Model Combining Algorithm during COVID.
| Metric | Baseline | Minimum Variance | CAPM | IMCA |
|---|---|---|---|---|
| Annual Return | 2.01% | −3.09% | −7.21% | 2.53% |
| Cumulative Returns | 3.87% | −5.86% | −13.39% | 15.56% |
| Annual Volatility | 22.76% | 20.82% | 25.01% | 17.63% |
| Sharpe Ratio | 0.20 | −0.05 | −0.17 | 0.23 |
| Max Drawdown | −35.99% | −36.10% | −46.66% | −44.42% |
| Daily Value at Risk (VaR) | −2.85% | −2.63% | −3.17% | −2.20% |
Note: In this table, the highest (best) Annual Return, Cumulative Returns, Sharpe Ratio, and the lowest (best) Annual Volatility, Max Drawdown, and Daily VaR are highlighted in bold.
Table 5. Performance statistics for traditional and Iterative Model Combining Algorithm post-COVID.
| Metric | Baseline | Minimum Variance | CAPM | IMCA |
|---|---|---|---|---|
| Annual Return | −8.26% | −1.00% | −7.68% | 2.32% |
| Cumulative Returns | −15.23% | −1.91% | −14.22% | 14.21% |
| Annual Volatility | 11.41% | 9.51% | 15.20% | 17.70% |
| Sharpe Ratio | −0.70 | −0.06 | −0.44 | 0.22 |
| Max Drawdown | −20.73% | −13.93% | −24.04% | −45.07% |
| Daily Value at Risk (VaR) | −1.47% | −1.20% | −1.94% | −2.21% |
Note: In this table, the highest (best) Annual Return, Cumulative Returns, Sharpe Ratio, and the lowest (best) Annual Volatility, Max Drawdown, and Daily VaR are highlighted in bold.
Table 6. Performance metrics of portfolio allocation models with transaction costs.
| Model | Annual Returns (%) | Cumulative Returns (%) | Annual Volatility (%) | Sharpe Ratio | Max Drawdown (%) |
|---|---|---|---|---|---|
| A2C | 1.95 | 12.50 | 17.32 | 0.175 | −46.50 |
| PPO | 0.20 | 1.10 | 17.58 | 0.095 | −47.00 |
| DDPG | 1.80 | 11.80 | 18.10 | 0.165 | −46.80 |
| SAC | 1.70 | 11.00 | 17.40 | 0.155 | −46.70 |
| TD3 | 1.85 | 11.90 | 17.90 | 0.160 | −46.60 |
| IMCA | 2.10 | 13.00 | 17.50 | 0.185 | −45.50 |
| Min-Variance | −0.90 | −5.00 | 14.30 | 0.010 | −42.50 |
| SET50 Baseline | −4.40 | −23.00 | 16.25 | −0.195 | −44.00 |
| CAPM | −1.90 | −10.00 | 18.15 | −0.085 | −47.50 |
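As an illustration of how transaction costs can enter such a comparison, the sketch below deducts a proportional cost on daily turnover from gross portfolio returns. The cost rate of 0.1%, the turnover definition, and the function name `net_returns` are assumptions for illustration only; the cost model used to produce the table above is not restated here.

```python
import numpy as np


def net_returns(gross_returns, weights, cost_rate=0.001):
    """Subtract proportional trading costs from gross portfolio returns.

    gross_returns: shape (T,), daily portfolio returns before costs.
    weights: shape (T, n_assets), target weights held on each day.
    cost_rate: hypothetical cost per unit of turnover (0.1% here).
    """
    gross = np.asarray(gross_returns, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Turnover on day t is the total absolute change in weights versus
    # day t-1; the first day is treated as building the position from cash.
    # Simplification: weight drift from price moves between rebalances is ignored.
    previous = np.vstack([np.zeros((1, w.shape[1])), w[:-1]])
    turnover = np.abs(w - previous).sum(axis=1)

    return gross - cost_rate * turnover
```

Strategies that rebalance more aggressively incur more turnover and therefore lose more to costs under this rule, which is one reason cost-adjusted rankings can differ from gross rankings.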
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Thongkairat, S.; Yamaka, W. A Combined Algorithm Approach for Optimizing Portfolio Performance in Automated Trading: A Study of SET50 Stocks. Mathematics 2025, 13, 461. https://doi.org/10.3390/math13030461
