Next Article in Journal
On the Asymptotic Behavior of the Optimal Exercise Price Near Expiry of an American Put Option under Stochastic Volatility
Previous Article in Journal
To Trust or Not to Trust? COVID-19 Facemasks in China–Europe Relations: Lessons from France and the United Kingdom
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Forecasting a Stock Trend Using Genetic Algorithm and Random Forest

1
Huizenga College of Business, Nova Southeastern University-SBE, 3301 College Avenue, Fort Lauderdale, FL 33319, USA
2
School of Arts and Sciences, Lebanese International University, Mouseitbah, Mazara P.O. Box 146404, Lebanon
3
College of Business Administration, Tripoli Campus, Beirut Arab University, Beirut P.O. Box 11-50-20, Lebanon
4
Faculty of Economics and Business Administration, Tripoli Campus, Lebanese University (UL), Beirut P.O. Box 6573/14, Lebanon
5
Faculty of Business Administration, AZM University, Tripoli P.O. Box 1010, Lebanon
*
Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2022, 15(5), 188; https://doi.org/10.3390/jrfm15050188
Submission received: 22 February 2022 / Revised: 13 April 2022 / Accepted: 14 April 2022 / Published: 19 April 2022
(This article belongs to the Section Financial Markets)

Abstract

:
This paper addresses the problem of forecasting daily stock trends. The key consideration is to predict whether a given stock will close on uptrend tomorrow with reference to today’s closing price. We propose a forecasting model that comprises a features selection model, based on the Genetic Algorithm (GA), and Random Forest (RF) classifier. In our study, we consider four international stock indices that follow the concept of distributed lag analysis. We adopted a genetic algorithm approach to select a set of helpful features among these lags’ indices. Subsequently, we employed the Random Forest classifier, to unveil hidden relationships between stock indices and a particular stock’s trend. We tested our model by using it to predict the trends of 15 stocks. Experiments showed that our forecasting model had 80% accuracy, significantly outperforming the dummy forecast. The S&P 500 was the most useful stock index, whereas the CAC40 was the least useful in the prediction of daily stock trends. This study provides evidence of the usefulness of employing international stock indices to predict stock trends.

1. Introduction

Stock market movements are influenced by many exogenous variables, such as political and geopolitical events, exchange rates, movements of other stock markets, the economic environment, firm policies, and the psychology of investors (Gidofalvi 2001; Nobel Prize Committee 2013; El-Chaarani 2016; El-Chaarani 2019).
For efficient market theory (Fama 1965; Fama 1970; Fama 1998), all relevant information must be reflected in efficient stock markets. In the weak-form market efficiency, prices of stocks reflect all information of past prices. In semi-strong market efficiency form, prices of stocks reflect all of the available public information. We employ uptrend forecasting, which contrasts with market efficiency, as it includes real time information that does not mirror the theoretical conditions.
Dynamic markets with nonlinear stock price movements, along with the multiplicity of predictors makes forecasting a stock’s trend challenging. In fact, an efficient forecasting solution in the stock market can play a crucial role in motivating people toward stock trading (Sharma et al. 2017).
Artificial Neural Networks (ANN) and the Support Vector Machine (SVM) are the most commonly used machine learning algorithms for forecasting stocks (Guresen et al. 2011; Hoseinzade 2019; Kara et al. 2011; Wang and Wang 2015). Many Artificial Neural Networks tend to predict the next day’s closing price (Li and Liao 2017). Certain Artificial Neural Networks-based models have been enriched with other algorithms to boost the accuracy of the forecast (Nair et al. 2011). Likewise, Support Vector Machine models have been developed to forecast stock trends (Reddy and Sai 2018). Some proposals combined the Support Vector Machine, or Artificial Neural Networks with preprocessing techniques, following meta-heuristics algorithms to find the optimal machine learning parameters, the Artificial Neural Networks architecture, and a set of input features (Asadi et al. 2012; Sedighi et al. 2019; Zhang and Wu 2009). Recently, deep learning techniques such as Convolutional Neural Network (CNN), Deep Multilayer Perceptron (MLP), Deep Belief Network (DBN), Recurrent neural network (RNN), Long Short-Term Memory (LSTM), and Generative Adversarial Network (GAN) have proved to be applicable to stock studies (Aloud 2020; Sang and DiPierro 2019; Selvin et al. 2017; Zhanga et al. 2019). The features selection is a key factor for the knowledge discovery process. Its importance is derived from its role in improving the accuracy and efficiency of the prediction model by selecting the relevant variables and reducing the dimensionality of the datasets (Mao et al. 2016; Sugunnasil and Somhom 2010). Yet, features selection models suffer from the drawback of using isolated features to forecast trends. There is a gap in the literature for stock trend models that not only use features selection, but include international stock indices to forecast trends, as such models capture the essence of worldwide market movements, rather than isolated features. This paper offers such a model.
Our model is as follows:
  • Select a set of international stock indices. In this work, the focus is on the NIKKEI 225, CAC 40, DAX, and S&P 500;
  • Select one stock to be forecasted, such as Apple;
  • Employ a Genetic Algorithm for feature selection. The objective here is to decide which historical prices are significant in forecasting the stock’s trend;
  • Finally, we consider the Random Forest (RF) algorithm to find hidden relationships between the selected features (from Step 3), and the stock’s trend.
This remainder of this paper is organized as follows. Section 2 is a review of literature. Section 3 describes our approach for forecasting a stock’s trend. Section 4 discusses materials and methods. Section 5 presents the results, while the conclusions are described in Section 6.

2. Review of Selected Literature

In this section, we address several studies that have attempted to develop machine learning models to predict stock prices. Artificial Neural Network prediction models have been proposed in many research works, such as Chandan et al. (2016). However, the noisy behavior of stock markets constructs an obstacle for the Artificial Neural Networks, leading to convergence to suboptimal solutions (Hoseinzade 2019). To solve this problem, Kara et al. (2011) suggested that a Support Vendor Machine preprocessing model can help in eliminating irrelevant features. With respect to predicting stock trends, Reddy and Sai (2018) proposed a Support Vendor Machine and Radial Basis Function approach to forecast stock prices in large as well as in small capitalizations. Their predictor is trained based on the available historical data to predict the next day’s data. While the obtained numerical results showed high efficiency of the algorithm, its drawback is that the solution assumes four fixed features without any specific engineering or optimization. The experimentation relies on online data without addressing its quality.
Hoseinzade (2019) suggested a model, called CNNpred, that can be applied to a collection of data from different sources, from different markets. The approach uses feature extraction to predict the next day’s trend of movement for specific indices, including the S&P 500, NASDAQ, DJI, NYSE, and the RUSSELL 3000. Similarly, Chen (2018) used a conv1D function to process 1D data in the convolutional layer. To improve the results, the proposed model used preprocessed stock data as input. The work goal was to forecast stock prices in the Chinese stock market. The proposed model was limited to four features as open, close, high, and low prices. The chief limitation was that the validation relied on a limited dataset.
Karathanasopoulos et al. (2019) proposed several optimization techniques to find the optimal Neural Network hierarchy to forecast 12 Exchange Traded Funds (ETFs). They considered three optimization approaches, namely genetic algorithm, differential evolution, and the particle swarm optimizer. They also considered three multilayer perceptron, recurrent neural networks, and radial basis function neural networks. Their results showed that differential evolution was the optimal method, with the highest forecasting accuracy.
The study most analogous to this paper is the one by Jiao and Jakubowicz (2017). This study predicted the daily direction of stock price movement. The authors considered predicting stock movement as a binary classification problem. They studied 463 stocks, constituents of the S&P 500, and 8 international indices. International indices included three Asian indices (Nikkei 225, Hang Seng, and All Ords), two European indices (DAX, FTSE 100), and three US indices (NYSE Composite, Dow Jones Industrial Average, and S&P500). Thus, when the daily return was positive (greater than zero), the mean price direction was uptrend. When the daily return was negative (less than zero), the mean price direction was downtrend. After that, they used a lag operator to extract more features from stock indices, in addition to more than 200 technical indicators, as input features into a classifier. Then, they employed genetic algorithm-based feature selection, to use the selected features as input into a classifier. The authors used four classification algorithms to compare their prediction performance. The classification algorithms used included Random Forest, Gradient Boosted Trees, Artificial Neural Networks, and logistic regression. However, Jiao and Jakubowicz (2017) did not examine the usefulness of international stock indices to forecast stock trends, which we relied on in our study. Additionally, we proposed a genetic algorithm-based approach to select only helpful features. We extracted 10 lag features from each international stock index, beside the stock historical data of the stock whose trend we wished to predict. Furthermore, they classified trends of stock into uptrend if the daily return price change was positive (greater than zero), and downtrend, if the daily return price change was negative (less than zero). In our study, we classified trends of stock as uptrend if the daily return price change is greater than five tenths percentage ( 0.5 % ). If the daily return price change was less than five tenths percentage ( 0.5 % ), the classification was Not-Uptrend. Our objective was to ensure that the trader would not sustain a loss. Our approach answered the question of whether international stock indices could be useful in forecasting stock trends, whereas the Jiao and Jakubowicz (2017) approach could not provide such insight.
Sable et al. (2017) provided a short-term prediction model, using the Genetic Algorithm, and evolutionary strategies, predicting the price of eight scripts, with six attributes for each script (Opening Price, Closing Price, Highest Price, Lowest Price, Volume, and Adjusted Closing Price). The eight scripts reflect US-based companies. This paper does not address other international markets. Neither does it mention the specific reason behind the chosen six attributes.
Shen and Shafiq (2020) proposed a prediction model for the stock market price trend. The proposal relies on a customization of feature engineering and deep learning. The pillars of the proposal are various techniques of feature engineering with a fine-tuned system, instead of just a deep learning model. The work assessment relies on only the Chinese stock market. While the paper has high scientific value, applying the same approach to other international markets is still to be assessed.
Wanjawa and Muchemi (2014) proved that the configuration model 5:21:21:1 can achieve very good prediction accuracy. The assessment was done on 1000 records trained on 130 K cycles. The training percentage was 80%. Using Artificial Neural Networks (ANN) to predict three stocks on the New York Stock Exchange (NYSE), with Encog and Neuroph for validation, the prediction was achieved with error range [0.71%–2.77%].
Soni et al. (2022), Rouf et al. (2021), Rahul et al. (2020), and Tawarish and Satyanarayana (2019) provide interesting surveys on the ML techniques used for stock market prediction. Almost all of these surveys show that a large percentage of proposals use Support Vector Machine (SVM), fewer use Genetic Algorithm, and fewer use the Random Forest. While these surveys support the novelty of our approach, they do not discuss any work that combines Genetic Algorithm and Random Forest, or international stock indices. Wang et al. (2021) did another survey, which focuses on work done using Fuzzy Cognitive Mapping, and the Deep Neural Network. This review is limited to research work with datasets either spanning several years, or those measured over short-term periods.

3. Our Approach for Forecasting a Stock’s Trend

3.1. Problem Formulation

This research focused on forecasting whether a stock would exhibit an uptrend tomorrow, with reference to today’s stock price. More formally, r t + 1 denoted tomorrow’s price change relative to today’s closing price. This value of r t + 1 is also indicated by Δ P t + 1 as expressed in Equation (1).
r t + 1 = Δ P t + 1 = P t + 1 P t P t ,
where, P t = today’s closing price.
P t + 1 = tomorrow’s closing price.
T r e n d t + 1 = {     U p t r e n d                       i f   r t + 1 0.5 % N o t U p t r e n d       i f   r t + 1 < 0.5 %             .
T r e n d t + 1 could be either Uptrend or Not-Uptrend. Tomorrow’s trend could be considered to be an ‘Uptrend’ if the relative price change, r t + 1 , exceeds a cut-off threshold of 0.5 % . The cut-off threshold of 0.5 % is employed to ensure that trading of the stock could provide positive returns after deducting transaction costs1. If the relative price change, r t + 1 is less than the anticipated transaction costs, then the trader could easily have a loss, even if the forecast is accurate.

3.2. The Forecasting Model

The objective of this research model was to forecast the variable T r e n d t + 1 .

3.2.1. Step One: Feature Engineering

The Feature Engineering step consisted of collecting and composing a set of features that would be considered in forecasting a stock’s trend. Four international stock indices are considered as the main features, including the S&P 500 (United States), NIKKEI 225 (Japan), CAC40 (France), and DAX (Germany). These indices are chosen for their prominence in the global stock markets, with the S&P 500, the index of 500 US companies with highest market capitalization, NIKKEI 225, the index of the leading 225 Japanese companies, the CAC40, with 40 of the stocks with largest market capitalization in France, and DAX for 30 major German securities.
‘Lag operation’ is employed to extract features from stock indices that contained price change data from prior time periods. Shifting the time series historical data of each stock index to prior periods constituted the ‘lag operation’. For example, in Equation (3), historical price changes of the S&P 500 are shifted to one prior period.
Δ S & P   500 t 1 = 100 × S & P 500 t S & P 500 t 1 S & P 500 t 1   ,
where:
  • S & P 500 t = closing price of S&P500,
  • S & P 500 t 1 = yesterday’s closing price,
  • Δ S & P 500 t 1 = today’s price change.
The ‘lag operation’ is employed to the time series of price changes for each stock index 10 times to get 10 lag features. More specifically, historical data of price changes of each stock index (such as the S&P 500) are shifted 10 times to get 10 lag features. To compute the 10 return lags of the S&P 500, 10 values for parameter i (1, 2, …., 10) are considered for the S&P 500 index. For instance, if i = 4 was set, then the price change of S&P500 four days ago is referred to, with reference to today’s price. In other words, the closing price four days before today’s closing price ( S & P 500 t 4 ).
Δ S & P 500 t i = 100 × S & P 500 t S & P 500 t i S & P 500 t i   .
The process of shifting the historical data of price changes for international stock indices is known as ‘distributed lags analysis’.
Table 1 illustrates the 10 lag features extracted from each stock index. For each international stock index, historical data of its price changes over the past 10 days are considered. For example, the value of ‘CAC40_1’ can be denoted as Δ C A C 40 t 1 and computed based on Equation (4). The first row of Table 1 refers to the 10 features corresponding to the 10 lags of the stock market ‘CAC40.’ For example, ‘CAC40_2’ in the first row, indicates the second feature corresponding to lag 2; that is, the price changes of CAC40 stock index two days ago. The value of CAC40_2 can be computed by replacing i with value of two. The second row denotes 10 features corresponding to the 10 lags of ‘DAX’. The same interpretation holds true regarding the third row, the US stock index, ’S&P500,’ and the Japanese stock index, and ‘NIKKEI255,’ in the fourth row.
In addition, the historical data of the stock whose trend we predicted is also considered. The daily price changes of the stock, using historical data, is computed, as in Equation (5). More precisely, the lag operation to shift the historical data of the price changes several steps back is employed. Ten lag features shifted from the historical data of price changes of the considered stock are extracted. Therefore, the 10 values of i (1, 2, …,10) are considered. Each lag feature represents price changes of the stock on the previous day, corresponding to the lag number. As an example, LAG 2, of the considered stock (such as AAPL) represents price changes of stock AAPL two days ago. LAG 2’s value, denoted as Δ AAPL t 2 , is obtained by setting the value of i to 2. Shifting the time series of a stock’s historical data is known as autocorrelation (Salkind 2010). It computes the correlation of a time series with the same series at prior time steps. It tests if the current value of the stock is affected by shifted values of the time series of the stock itself at prior time steps.
Δ AAPL t i = 100 × AAPL t AAPL t i AAPL t i .
Ten lag features extracted from the historical data of the price change of AAPL stock are presented in Table 2. The first column represents the name of the stock, whose trend is to be forecasted. The second column (LAG1) represents the lag index of the historical data of the price change of the AAPL stock the previous day. The third column (LAG2) represents the historical price change of the AAPL stock, shifted two days prior to the present. In fact, the Δ AAPL t 2 can be computed by replacing i with 2 in Equation (5). The idea is to keep shifting the historical data of price changes of the AAPL stock till it reaches 10 days prior to the present day.

3.2.2. Step Two: Feature Selection Using Genetic Algorithm

The Feature Engineering step in the previous section resulted in a total of 50 features. These 50 features are composed of 10 lag features extracted from the historical data of each international stock index in addition to the 10 lags associated with the considered stock itself. In this section, this study identifies which of these 50 features are truly useful in forecasting the stock’s trend, T r e n d t + 1 .
See Table 3 for a one sample chromosome represented as zeroes and ones. Create an initial population. The initial population is created randomly by the Genetic Algorithm. A population is a set of individuals, where an individual is composed of a subset of features. An individual, also known as a chromosome, contains a number of genes, where a gene is a feature in our problem. The number of genes is equal to the number of input features (that is 50, in our case). Subsequently, binary coding is applied for the genes with a gene value of zero or one. A gene value of 1 indicates that the corresponding feature is labeled as useful in forecasting the considered stock, otherwise it will not be employed for the forecast.
All lags with value one are selected for the forecast. For instance, LAG3, LAG6, and LAG10 columns, associated with the stock index DAX (shown in bold in the first column), are labeled as useful, and are thus employed to conduct the forecast. Other LAG features of DAX stock are eliminated in this chromosome. The objective is to find the best set of features (distribution of ones and zeroes) that can forecast T r e n d t + 1 .
Calculate the objective function. The ability of each chromosome in the population to predict T r e n d t + 1 is evaluated by the objective function. The relevance of each chromosome, and its ability to predict T r e n d t + 1 is assessed. For this purpose, the Random Forest algorithm is employed. The Random Forest is a well-known machine learning algorithm, which can be used for classification and regression purposes. The Random Forest is employed to find out the relation between the features represented in each chromosome and T r e n d t + 1 .
Copy the best 10% of the chromosomes. The top 10% of the chromosomes with the highest accuracy over the validation dataset, from the previous step, are copied to the next new population.
Crossover. See Figure 1. Once the accuracy of all the chromosomes is evaluated, they are subjected to a crossover process. The crossover process is used to reproduce new chromosomes (distribution of ones and zeroes), from old chromosomes. In this process, two chromosomes, (known as the parents), are selected to be mixed, to produce two new chromosomes (known as the offspring). To select the parent chromosomes, the roulette wheel selection algorithm is used. The roulette wheel algorithm is applied twice to get the chromosomes of two parents. After that, a single point crossover is applied. The crossover point is selected randomly. This point determined the position at which each chromosome is divided. Each chromosome is divided into two parts according to a random crossover point. Then, the crossover process merged the first part of the first parent with the second part of the second parent, followed by merging the first part of the second parent with the second part of the first parent, resulting in new chromosomes (offspring), produced with new features.
Mutation. See Figure 2. Following the new generation of crossover chromosomes, random changes in the selected features of one chromosome are made by the mutation process. The objective of mutation is to avoid a local optimum in the search space. The mutation process flipped the value of some genes’ bits, for a random number of chromosomes that have been reproduced using the aforementioned crossover process.
New Population. After the crossover process, and after the mutation process are completed, the new population is ready for the next iteration of the Genetic Algorithm. This new population is composed of:
  • The top 10% of the chromosomes copied from the initial population that provided the highest accuracies;
  • The 40% of the chromosomes selected from the new offspring generated by the crossover process, subjected to the mutation process;
  • The 50% of the chromosomes selected from the new offspring generated by the crossover process, without subjection to the mutation process. These fifty percent are selected using the roulette wheel algorithm, with each offspring having selection chances as per its accuracy.
The Genetic Algorithm STOP. The optimization cycle is repeated, until the accuracy of forecasting over the validation dataset did not increase by more than 0.5% over 10 successive iterations. A single chromosome to be employed to forecast the stock’s trend T r e n d t + 1 is returned by the Genetic Algorithm feature.

3.2.3. Step Three: Conducting Forecasts

The Random Forest algorithm is applied to predict T r e n d t + 1 . Random Forest creates a forest with large number of decision trees for classification in training. Each node in a decision tree represents a feature. The root node of the tree consists of the best feature that splits the training samples. The decision node (or internal node) denotes a test on a feature. More particularly, it asks yes or no questions to test the feature and split the points for the decision into sub-nodes. The branches denote the decision rule represented as ‘yes or no’ branches (outcome of the test on the training dataset). Each leaf node (terminal node) denotes the categorical, up or down, classification decision. Leaf nodes do not split, as they are the last structure in the tree. Figure 3 represents the flowchart of a decision tree.
Each tree is built on a random selected sample and a subset of features from a training dataset. Each node in a tree splits the path into a ‘yes’ branch or ‘no’ branch. The tree started by placing the significant feature, best in splitting the samples in the training set, as a root node. The tree evaluated the importance of the feature by employing a selection algorithm, such as the Gini Index, or Information Gain. Each node in the decision tree was split into sub-nodes. The root node divided the training set samples into homogenous subsets by a ‘yes or no’ feature testing. It compared root feature samples to the records values of next node. Next, the sample set that met the criteria, followed the ‘yes’ branch, while the rest followed the ‘no’ branch. Comparison between node feature values and next node records continued until we reached leaf nodes and had a classification output. Random Forest collected all the class labels (decision outputs) from all decision trees, and selected the final decision based on majority votes as the best classification result.
The features of one sample chromosome denoted as zeroes and ones, in which this chromosome is selected randomly, is illustrated in Table 4. The CAC40 row indicates the employed lag features of CAC40, denoted as ‘1’, to forecast T r e n d t + 1 . The same interpretation applies to the remaining rows. Next, all selected features, denoted as one from the selected chromosomes, are sent to the Random Forest classifier to build decision trees. Each tree had a categorical classification decision (Uptrend or Not-Uptrend). Random Forest selected the final classification decision, according to the majority votes of the trees. This process ran recursively on each chromosome, improving the accuracy of predictions with each iteration. Finally, the resulting decision tree model is employed to an out-of-sample dataset to evaluate our model in terms of accuracy of performance.
In sum, our approach comprised the following steps:
  • The basic features extracted from historical data of four international stock indices (following the distributed lag analysis concept), and the historical data of the stock itself (following the concept of autocorrelation), are calculated;
  • A Genetic Algorithm Approach is followed to identify the most useful features to predict a stock’s trend, using a training or validation dataset;
  • A Random Forest classifier is trained, using training and validation datasets (Figure 4), to find the relationships leading to a stock’s trend. Then, the stock trend is forecasted over the out-of-sample testing dataset using the produced Random Forest decision tree.

4. Materials and Methods

4.1. Data Selection

See Table 5. Fifteen stocks listed are considered, belonging to the Technology, Finance, and Healthcare industries. The daily stock closing prices for each stock are listed, sampled from 2 January 2018 to 30 June 2019.
It is important to evaluate stock forecasts and trades using a set of assets with different trends (Prado 2011). The performance of the research forecasting model under different market scenarios is tested by such variation in the selected dataset, thereby avoiding bias towards particular patterns. The descriptive statistics of the daily price changes for all considered stocks are reported in Table 6. It should be noted that some stocks have positive means (such as, GNW, EVTC, and ALC), whereas the daily returns of other stocks have negative means (such as, EVR, AVAL, and ABC). Additionally, positive skewness is reported among certain stocks (such as, GNW, AVAL, ALC), whereas others have negative skewness (such as, ESNT, FB, ABBV). The considered stocks exhibited different trends during the period, 2 January 2018 to 30 June 2019.
Furthermore, a dataset of four international stock indices (S&P500, NIKKEI 225, CAC 40, DAX) over the period from 2 January 2018 to 30 June 2019 is selected and prepared. Distributed lag analysis and autocorrelation to extract 10 lag features from each considered stock and each stock index, are employed. The resulting 50 lag features are employed for training and testing our model to forecast T r e n d t + 1 .

4.2. Model Training and Testing Process

A rolling window approach has been proposed to improve the reliability of the trading strategy (Prado 2011) The rolling window approach is usually used for evaluating trading systems. Data is divided into overlapping training sets. In each cycle, each is moved forward through the time series. Stricter models ensue due to more frequent retraining, and large out-of-sample data sets (increasing training processing requirements, but with models that adapt more quickly to the changing market conditions). For this study, training and testing of the model is conducted using a monthly rolling window (see Figure 5).
See Table 7. The dataset in Section 4.1 is sampled on a daily basis over 17 months, between 2 January 2018 and 30 June 2019. The dataset is divided into multiple overlapping training sets. Each set is composed of two sub-sets. The time length of the first subset was five months, which is subsequently used for model training. The second set is a one-month time window for validation. In fact, prediction is possible, on a daily or weekly basis, but will require high levels of computation for testing, which are beyond the scope of this paper. By adopting monthly testing, sufficient variations in stock movements are being included, as such variations are generated by a dynamic stock market. In future work, daily or weekly testing will be employed to capture a larger number of stock prices. Excessive computation would also be required for shorter training sets, which, although challenging at present, will be considered in future research.
The training and validation dataset are employed to find the best features set, using the Genetic Algorithm. Table 8 lists the settings of the Genetic Algorithm module as employed in the experiments. The second subset is a one-month time window for testing the out-of-sample, dataset. Each set is moved forward one month in each round of this rolling window. 12 rolling window sets are composed.
Three types of experiments to test the model are employed:
  • Evaluation of the accuracy of the model;
  • Evaluation of the usefulness of international stock indices;
  • Sensitivity analysis.
Next, the details of all of the experiments are discussed.
A.
Evaluation of The Accuracy of Our Model
The objective of this experiment is to evaluate the performance of the model. Accuracy denotes the fraction of the correct classifications we obtained from our model, as measured by Equation (6). The accuracy of forecasting T r e n d t + 1 of each considered stock across all three sectors is measured. Twelve sets of rolling windows for each stock are trained and tested. In other words, for each stock, 12 datasets, with each dataset being split into training, are composed, with the testing of sub-datasets. The training phase consisted of feature selection. The testing sub-dataset is employed to measure the accuracy of prediction of T r e n d t + 1 .  
A c c u r a c y = N u m b e r   o f   c o r r e c t   p r e d i c t i o n s T o t a l   n u m b e r   o f   p r e d i c t i o n s .
Furthermore, these accuracies are compared with the accuracies obtained by a dummy forecast model. A baseline measurement of performance using simple rules and different strategies to predict are given by the dummy forecast model. The dummy forecast is based on the imbalance in the dependent variable, T r e n d t + 1 . For instance, suppose that 65% of the instances of T r e n d t + 1 are Uptrend and 35% are Not-Uptrend. In such a case, the accuracy of a dummy forecast is 65%. Furthermore, to validate this comparison, the nonparametric Wilcoxon signed rank test is employed. The Wilcoxon test is used to compare two matched samples with unknown distributions, where the median difference between the pairs of observations of this forecasting model and the dummy forecast are zero. The possible rejection of the null hypothesis is determined by the result of this comparison (Wilcoxon 1945).
B.
Evaluation of the Usefulness of International Stock Indices
The usefulness of each international stock index to predict a stock trend is highlighted. A total of 40 lag features, based on international stock indices, are extracted. In addition, the 10 lags related to the considered stock itself. Then, those 50 features are fed into the model to start the training and testing process. The optimal chromosome with the highest accuracy was returned at the end of the training and feature engineering process. In this experiment, the optimal chromosome to determine the most and least effective stock indices for the prediction process is used. To clarify this method, the best chromosome selected by the model over one sample rolling window for one particular stock is considered in Table 9. A value of one, the selected lag features extracted from each stock index and the stock itself, are presented in Table 9. Selected lag features that correspond to the CAC40 stock index are represented in the LAG 2, LAG 6, LAG 8, and LAG 10 columns, in row CAC40. In the case of row DAX, the selected lag features are denoted in columns of LAG 3, LAG 7, and LAG 10.
After obtaining the best chromosome, the number of selected lag features from each stock index and the stock itself is computed. The most and least effective stock index is determined by the number of selected features. As shown in Table 10, four lag features are selected corresponding to the CAC40 stock market. Three lag features are extracted from the DAX stock market. The highest number of selected features of seven, from all stock indices, are from the S & P 500 index, suggesting that the S&P 500 had the strongest impact on the forecasting model. Conversely, the least frequently selected lag features from all stock indices are in the DAX row.
C.
Sensitivity Analysis
In the study problem formulation, it is considered a cut-off threshold for the daily price change, r t + 1 , of 0.5%, to consider T r e n d t + 1 as an Uptrend. In this experiment, a sensitivity analysis to validate the accuracy of our model’s threshold is conducted. The question is “How would a different value for the cut-off thresholds affect the accuracy of our forecasting model?” For this purpose, Equation (8), as a generalized form of Equation (7), is introduced.
T r e n d t + 1 {     U p t r e n d                       i f   r t + 1 0.5 % N o t U p t r e n d           r t + 1 < 0.5 %       ,
T r e n d t + 1 {     U p t r e n d                       i f   r t + 1 0.5 % N o t U p t r e n d                         r t + 1 < 0.5 %               .
The impact on the accuracy of the model to different values of thresholds ( α in Equation (8)) is analyzed. The forecasting approach is applied to the identical 15 selected stocks using multiple values of α = {0.5, …, 2.4} where α rises from 0.5 to 2.4, in steps of 0.1. For each value of α , the overall average accuracy of our model for all stocks over the 12 rolling windows following the steps of Section 4.2 is computed.

5. Results

In this section we will describe and discuss the results of the three types of experiments, A, B, and C, presented in the previous section (Section 4)

5.1. Description and Discussion of the Results of Experiment A (Evaluation of the Accuracy of Our Model)

The objective of this experiment was to measure the accuracy of our forecasting model. We considered 15 stocks, covering 3 sectors. Table 11 shows the accuracy of our model for the 15 considered socks. The column, Average Our Model, is the accuracy of forecasting a stock trend by our model. The Average Our Model was calculated by dividing the sum of each five stock accuracies belong to a sector by the total number of stocks in that sector. In each column, we have listed the highest accuracies in bold, and underlined the lowest accuracies. The column Dummy Forecast was the accuracy of conducting a dummy forecast. The accuracy of dummy forecasting reflects the imbalance in the dataset.
The accuracy of our algorithm ranges between 55% (ABC stock) and 80% (EVTC stock). An accuracy of 80%, in the case of EVTC, is the highest accuracy achieved by our model. In contrast, the accuracy of the dummy forecast ranges between 50% (ABBV, FB, and JD stocks) and 65% (EVTC and AVAL stocks). By comparing our model versus the dummy forecast, we can point out that our algorithm provides better accuracies in all cases, except for the case of AVAL. The Wilcoxon test showed that the test statistic equals two, while the critical value for this experiment was 17. When the test statistic is less than the critical value, the null hypothesis is rejected. Thus, the Wilcoxon test rejected the null hypothesis that the median difference between the two models was zero. The figures in the last column ‘Average Our Model,’ is the statistical mean of the accuracy of our model when applied to each sector. The highest average achieved by our model was by the Technology Sector with an average of 71%, followed by the Finance and the Healthcare sectors with averages equal to 63%, and 62%, respectively.
We conclude that our model provides higher accuracies than the dummy forecast in 14 out of the 15 considered stocks. Furthermore, higher accuracy in forecasting stocks by our algorithm exists in the Technology sector. Finally, this experiment is statistically significant by rejecting the null hypothesis using the Wilcoxon signed rank test.

5.2. Description and Discussion of the Results of Experiment B (Evaluation of the Usefulness of International Stock Indices)

In this experiment, we wanted to determine which stock had the highest and least impact on each sector.
Table 12 lists the total number of selected lag features in all of the best chromosomes that contributed to predicting the stock trend over the 12 rolling windows. For instance, in the case of the stock symbol ESNT (shown in the first row), we note that the 72 lags of the DAX column have been selected by the Genetic Algorithm-based feature selection algorithm. In contrast, only 36 lags have been selected based on the autocorrelation of the ESNT. In Sum Finance, the first 288 number indicates the number of times that the S&P 500 was selected by our Genetic Algorithm by each stock in the finance sector over the entire 12 rolling windows. Numbers shown in bold denote the most frequently selected indices, while numbers shown in italics indicates the least selected indices.
From Table 12, the autocorrelated total of 312 for Sum Finance suggests that the stocks belonging to the Finance Sector depend on their own past data rather than stock indices. However, the numbers 288 and 252 in the same row suggest that S&P500 and NIKKEI225 contributed to forecasting stock trends in the Finance sector. In contrast, the number 197, shown in italics, suggests that CAC40 was the least selected stock index compared to other stock indices in the Finance Sector. As for Sum Technology, the most frequently selected feature belongs to the stock index S&P 500, with 178 features. On the other hand, stock CAC40 was the least useful stock index for forecasting, with the total number of selected features of 106. In the case of the Healthcare Sector, the autocorrelated total of 312 indicates that the stocks in the healthcare sector depend on their own historical price movements. In addition, the DAX was useful for forecasting in the Healthcare Sector, with a total number of selected features of 250.
We conclude from this experiment that
  • The historical price movements of a stock can be helpful in predicting stock trends, particularly for the Finance and Healthcare sectors;
  • Globally, the results of the last row ‘Overall Sum’ suggest that the S&P 500 and DAX seem to have a high impact on all the three sectors. The row Sum Finance suggest that the NIKKEI255 had an equal impact on the Finance Sector to the S&P 500. We consider this as an indication that financial markets are closely correlated;
  • The last row Overall Sum in Table 12 suggests that overall, the S&P 500 index was the most useful in predicting stock trends. The results in the same row show that CAC40 was the least useful stock index in predicting stock trends.

5.3. Description and Discussion of the Results of Experiment C: Sensitivity Analysis

The objective of this experiment was to determine the accuracy of our model with variations in the threshold value, α, that demarcates Uptrend from Not Uptrend. We re-ran our forecasting model using 20 different values of α . For each value of α , we measured the average accuracy of our model over the 15 selected stocks. The results depicted in Figure 6 provided visual indication that there was a negative correlation between the accuracy of our model and the value of α . The statistical correlation between the two sets of numbers, Accuracy (%) and α , shown at the bottom of Figure 6, was 0.9. This indicates the existence of a strong negative correlation between the accuracy of our model, and the value of α , suggesting that a trader should be cautious while using our model to predict large price changes.

6. Conclusions

6.1. Contribution to Literature

Jiao and Jakubowicz (2017) measured the performance of their forecasting model based on Area Under Curve (AUC) criteria. They reported an average AUC of 0.78. In Experiment A, we computed an average AUC of 0.75. We conclude that their forecasting model outperforms our model. However, there is one major difference between the two studies. Jiao and Jakubowicz (2017) considered more than 200 features, in addition to 8 international stock indices. Therefore, their study cannot provide any insight on the usefulness of just international stock indices to forecast stock trends. In contrast, we considered only four international stock indices. Thus, we consider the accuracy of our model as evidence that international stock indices contribute to forecasting a stock’s trend. Another difference is that Jiao and Jakubowicz (2017, p. 20) employed more than 200 technical indicators and concluded that, “Stock movement direction is hardly predictable from its own past data”. In this work, based on the results reported in the last row of Table 12, we observed that stock’s historical data contributes significantly to forecasting a stock’s direction of movement.

6.2. Implications for Stock Trend Forecasting

This paper proposes a forecasting model that predicted if a particular stock exhibited an uptrend with reference to today’s closing prices. Our model is a mixture of features selection based on a genetic algorithm and random forest classifier. We have provided evidence that international stock indices can be helpful to forecast stock trends. We considered four international stock indices as the main source for features. We considered the concept of distributed lag analysis and autocorrelation for features engineering. We adopted a genetic algorithm approach to select the most helpful set of features to predict a stock’s trend. Finally, we employed the Random Forest algorithm to forecast the next day stock’s trend based on the selected set of features.
To examine the performance of our model, we predicted the daily stock trend of 15 stocks from different sectors. The experimental results suggest that the performance of our model significantly outperforms the dummy forecast. In some cases, the accuracy of our model was up to 80%. The results also showed that S&P 500 (the US stock market) is the most useful stock index in the prediction on all sectors. This could be because the 15 considered stocks have been selected from NYSE and NASDAQ. In contrast, CAC40 (French stock index) seems to have the lowest impact on two out of the three sectors. Moreover, we find out that past historical data of the stock itself helps significantly in predicting its trend. However, the results also show that the accuracy of our model decreases considerably while trying to predict large price changes.

6.3. Future Works

In future work, we will test our model with 100 stocks. We accept that a sample of 15 stocks may demonstrate some random effects that overstate or understate stock trends, that would disappear with a larger sample size. Yet, there is a challenge in terms of the lengthy computational time required for a model with 100 stocks, but we will still undertake the test to improve the accuracy of the model.
In addition, our model was tested on monthly basis, but we can do the prediction on a daily basis, which will require high computation. We will work on decreasing the high complexity of the computation, as we extend our work to cover weekly and daily trainings.

Author Contributions

Conceptualization, H.E.-C. and M.E.S.; methodology, A.M.B.; software, A.S.; validation, H.E.-C., R.A. and A.S.; formal analysis, H.E.-C.; investigation, D.J.; resources, S.E.N.; data curation, H.E.-C.; writing—original draft preparation, H.E.-C. and R.A. review and editing, R.A.; visualization, H.E.-C. and A.M.B.; supervision, H.E.-C. and R.A.; project administration, M.E.S. All authors have read and agreed to the published version of the manuscript.

Funding

The research received no external funding.

Data Availability Statement

Data is available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

Note

1

References

  1. Aloud, Monira E. 2020. Role of attribute selection in a deep ANNs learning framework for high frequency financial trading. Intelligent Systems in Accounting Finance and Management 27: 43–54. [Google Scholar] [CrossRef]
  2. Asadi, Shahrokh, Esmaeil Hadavandi, Farhad Mehmanpazir, and Mohammad Nakhostin. 2012. Hybridization of evolutionary Levenberg-Marquardt neural networks and data pre-processing for stock market prediction. Knowledge-Based Systems 35: 245–58. [Google Scholar] [CrossRef]
  3. Chandan, Kumar, Sumathi Mahadevan, and S. N. Sivanandam. 2016. Prediction of stock market price using hybrid if wavelet transform and artificial neural network. Indian Journal of Science and Technology 9: 1–5. [Google Scholar]
  4. Chen, Sheng. 2018. Stock prediction using convolutional neural network. Paper presented at the IOP conference series on Materials Science and Engineering, Suzhou, China, June 22–24. [Google Scholar]
  5. El-Chaarani, Hani. 2016. Exploring the Impact of Emotional Intelligence on Portfolio Performance. Humanomics 32: 1–28. [Google Scholar] [CrossRef]
  6. El-Chaarani, Hani. 2019. The Impact of Oil Prices on Stocks Markets: New Evidence During and After the Arab Spring in Gulf Cooperation Council Economies. International Journal of Energy Economics and Policy 9: 1–26. [Google Scholar] [CrossRef]
  7. Fama, Eugene F. 1965. The Behavior of Stock-Market Prices. Journal of Business 38: 34–105. [Google Scholar] [CrossRef]
  8. Fama, Eugene F. 1970. Efficient capital markets: A review of theory and empirical work. Journal of Finance 25: 383–417. [Google Scholar] [CrossRef]
  9. Fama, Eugene F. 1998. Market efficiency, long-term returns, and behavioral finance. Journal of Financial Economics 49: 283–306. [Google Scholar] [CrossRef]
  10. Gidofalvi, Gyozo. 2001. Using News Articles to Predict Stock. Working Paper. San Diego: Department of Computer Science and Engineering, University of California. [Google Scholar]
  11. Guresen, Erkam, Gulgun Kayakutlu, and Tugrul U. Daim. 2011. Using artificial neural network models in stock market index prediction. Expert Systems With Applications 38: 10389–97. [Google Scholar] [CrossRef]
  12. Hoseinzade, Ehsan. 2019. CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Systems With Applications 129: 273–85. [Google Scholar] [CrossRef]
  13. Jiao, Yang, and Jeremie Jakubowicz. 2017. Predicting stock movement direction with machine learning: An extensive study on S&&P 500 stocks. Paper presented at the IEEE International Conference on Big Data (BIGDATA), Boston, MA, USA, December 11–14. [Google Scholar]
  14. Kara, Yakup, Melek A. Boyacioglu, and Omer K. Baykan. 2011. Predicting direction of stock price index movement using artificial neural networks and support vendor machines: The sample of the Istanbul Stock Exchange. Expert Systems With Applications 38: 5311–19. [Google Scholar] [CrossRef]
  15. Karathanasopoulos, Andreas, Mitra Sovan, Chia C. Lo, Adam Zaremba, and Mohammed Osman. 2019. Ensemble models in forecasting financial markets. Journal of Computational Finance 23: 101–19. [Google Scholar] [CrossRef]
  16. Li, Wei, and Jian Liao. 2017. A comparative study on trend forecasting approach for stock price time series. Paper presented at the 11th IEEE International Conference on Anti-Counterfeiting, Security, and Identification (ASID), Xiamen, China, October 27–29. [Google Scholar]
  17. Mao, Yanan, Zuoquan Zhang, and Dingyuan Fan. 2016. Hybrid feature selection based on improved Genetic Algorithm. Paper presented at the 2016 6th International Conference on Digital Home (ICDH), Guangzhou, China, December 2–4. [Google Scholar]
  18. Nair, Binoy, S. Ghana Sai, A. N. Naveen, A. Lakshmi, G. S. Venkatesh, and V. P. Mohandas. 2011. A GA artificial neural network hybrid system for financial time series forecasting. Information Technology and Mobile Communication 147: 499–506. [Google Scholar]
  19. Nobel Prize Committee. 2013. Understanding asset prices. In Nobel Prize in Economics Documents. Stockholm: Economic Sciences Prize Committee of the Royal Swedish Academy of Sciences. [Google Scholar]
  20. Prado, Robert. 2011. The Evaluation and Optimization of Trading Strategy. Hoboken: John Wiley. [Google Scholar]
  21. Rahul, Subrat Sarangi, Priyansh Kedia, and Monika. 2020. Analysis of various approaches for stock market prediction. Journal of Statistics and Management Systems 23: 285–93. [Google Scholar] [CrossRef]
  22. Reddy, Vankuru, and Kranthi Sai. 2018. Stock market prediction using machine learning. International Research Journal of Engineering and Technology (IRJET) 5: 1033–35. [Google Scholar]
  23. Rouf, Nusrat, Majid Bashir Malik, Tasleem Arif, Sparsh Sharma, Saurabh Singh, Satyabrata Aich, and Hee-Cheof Kim. 2021. Stock market prediction using Machine Learning techniques: A decade survey on methodologies, Recent developments, and future directions. Electronics 10: 2717. [Google Scholar] [CrossRef]
  24. Sable, Sonal, Ankita Porwal, and Upendra Singh. 2017. Stock price prediction using genetic algorithms and evolution strategies. Paper presented at the International conference of Electronics, Communication and Aerospace Technology (ICECA) Proceedings, Coimbatore, India, April 20–22; pp. 240–53. [Google Scholar]
  25. Salkind, Neil J. 2010. Encyclopedia of Research Design. Thousand Oaks: Sage Publications. [Google Scholar]
  26. Sang, Chengjie, and Massimo DiPierro. 2019. Improving trading technical analysis with TensorFlow long short-term memory (CSTM) neural network. The Journal of Finance and Data Science 5: 1–11. [Google Scholar] [CrossRef]
  27. Sedighi, Mojtaba, Hossein Jahangirnia, Mohsen Gharakhani, and Saeed F. Fard. 2019. A novel hybrid model for stock price forecasting based on metaheuristics and Support Vendor Machine. Data, Special Issue on Data Analysis for Financial Markets 4: 1–20. [Google Scholar]
  28. Selvin, Sreelekshmy, R. Vinayakumar, E. A. Gopalakrishnan, Vijay Krishna Menon, and K. P. Soman. 2017. Stock price prediction using LSTM, RNN, and CNN-sliding window model. In Paper presented at the 2017 Proceedings of the International Conference on Advances in Computing, Communications, and Informatics (ICACCI), Udupi, India, September 13–16; pp. 1643–47. [Google Scholar]
  29. Sharma, Ashish, Dinesh Bhuriya, and Upendra Singh. 2017. Survey of stock market prediction using machine learning approach. Paper presented at the International conference of Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, April 20–22; Volume 2021, p. 1. [Google Scholar]
  30. Shen, Jingyi, and M. Omair Shafiq. 2020. Short-term stock market price trend prediction using a comprehensive deep learning system. Journal of Big Data 7: 1–33. [Google Scholar] [CrossRef]
  31. Soni, Payal, Yogya Tewari, and Deepa Krishnan. 2022. Machine Learning approaches in stock price prediction: A systematic review. In Journal of Physics: Conference Series. Bristol: IOP Publishing, p. 2161012065. [Google Scholar]
  32. Sugunnasil, Prompong, and Samerkae Somhom. 2010. Feature selection for neural network based stock prediction. Paper presented at the International Conference on Advances in Information Technology, Bangkok, Thailand, November 4–5. [Google Scholar]
  33. Tawarish, M., and K. Satyanarayana. 2019. A review on pricing prediction on stock market by different techniques in the field of data mining and genetic algorithm. International Journal of Advanced Trends in Computer Science and Engineering 8: 23–26. [Google Scholar]
  34. Wang, Jie, and Jun Wang. 2015. Forecasting stock market indexes using Principal Component Analysis and stochastic time effective neural networks. Neurocomputing 156: 68–78. [Google Scholar] [CrossRef]
  35. Wang, Wenjian Liu, Linkai Zhu, Rujie Luo, Guang Li, and Shugeng Dai. 2021. Stock price prediction methods based on FCM and DNN Algorithms. Mobile Information Systems 2021. [Google Scholar] [CrossRef]
  36. Wanjawa, Barack Wamkaya, and Lawrence Muchemi. 2014. ANN model to predict stock prices at stock exchange markets. arXiv arXiv:1502.06434. Available online: https://arxiv.org/ftp/arxiv/papers/1502/1502.06434.pdf (accessed on 10 January 2022).
  37. Wilcoxon, Frank. 1945. Individual comparisons by ranking methods. Biometrics Bulletin 1: 80–83. [Google Scholar] [CrossRef]
  38. Zhang, Yudong, and Lenon Wu. 2009. Stock market prediction of S P 500 via combination of improved BCO approach and BP neural network. Expert Systems With Applications 36: 8849–54. [Google Scholar] [CrossRef]
  39. Zhanga, Kang, Guoquiang Zhonga, Junyun Donga, Shengke Wanga, and Yong Wanga. 2019. Stock market prediction based on Generative Adversarial Network. Procedia Computer Science 147: 400–6. [Google Scholar] [CrossRef]
Figure 1. A sample single-point crossover, with the point’s index set randomly to two.
Figure 1. A sample single-point crossover, with the point’s index set randomly to two.
Jrfm 15 00188 g001
Figure 2. Mutation example with 4 genes to be mutated (genes at index 3, 5, 7, and 10).
Figure 2. Mutation example with 4 genes to be mutated (genes at index 3, 5, 7, and 10).
Jrfm 15 00188 g002
Figure 3. A sample random forest decision tree structure.
Figure 3. A sample random forest decision tree structure.
Jrfm 15 00188 g003
Figure 4. Random forest classifier.
Figure 4. Random forest classifier.
Jrfm 15 00188 g004
Figure 5. Illustration of 12 rolling windows for the training, validation, and testing sets.
Figure 5. Illustration of 12 rolling windows for the training, validation, and testing sets.
Jrfm 15 00188 g005
Figure 6. Variation of the accuracy of our mdel as a fucntion of α .
Figure 6. Variation of the accuracy of our mdel as a fucntion of α .
Jrfm 15 00188 g006
Table 1. The 10 lag features extracted from each global stock index in the past 10 days.
Table 1. The 10 lag features extracted from each global stock index in the past 10 days.
12345678910
CAC40CAC 40_1CAC 40_2CAC 40_3CAC 40_4CAC 40_5CAC 40_6CAC 40_7CAC 40_8CAC 40_9CAC 40_10
DAXDAX_1DAX_2DAX_3DAX_4DAX_5DAX_6DAX_7DAX_8DAX_9DAX_10
S&P500S&P 500_1S&P 500_2S&P 500_3S&P 500_4S&P 500_5S&P 500_6S&P 500_7S&P 500_8S&P 500_9S&P 500_10
NIKKEI225NIKKEI 225_1NIKKEI 225_2NIKKEI 225_3NIKKEI 225_4NIKKEI 225_5NIKKEI 225_6NIKKEI 225_7NIKKEI 225_8NIKKEI 225_9NIKKEI 225_10
Each feature consists of price changes of the stock index at the prior day corresponding to the lag number.
Table 2. The 10 lag features extracted from AAPL stock in the past 10 days.
Table 2. The 10 lag features extracted from AAPL stock in the past 10 days.
Stock NameLAG1LAG2LAG3LAG4LAG5LAG6LAG7LAG8LAG9LAG10
AAPLAAPL_1AAPL_2AAPL_3AAPL_4AAPL_5AAPL_6AAPL_7AAPL_8AAPL_9AAPL_10
The price changes of AAPL stock are denoted by each lag feature on the prior day corresponding to the lag number.
Table 3. One sample chromosome of population represented as a set of zeroes and ones.
Table 3. One sample chromosome of population represented as a set of zeroes and ones.
Name of Index or StockLAG1LAG2LAG3LAG4LAG5LAG6LAG7LAG8LAG9LAG10
CAC 401110011000
DAX0010010001
S&P5000010101110
NIKKEI2251001001101
Stock (AAPL)1101000111
Table 4. The features of a one sample chromosome, represented as zeroes and ones.
Table 4. The features of a one sample chromosome, represented as zeroes and ones.
Stock NameLAG1LAG2LAG3LAG4LAG5LAG6LAG7LAG8LAG9LAG10
CAC 400100010001
DAX0010001001
S&P5000011101010
NIKKEI2251001011001
Stock (APPL)0001000010
Table 5. The distribution of the 15 selected stocks across 3 sectors.
Table 5. The distribution of the 15 selected stocks across 3 sectors.
SectorSelected Stocks
TechnologyFBAPPLBABAJDEVTC
FinanceESNTGNWGTYEVRAVAL
HealthcareHCAWATALCABBVABC
Table 6. Descriptive statistics of the daily rate of returns r t + 1 of the selected stocks.
Table 6. Descriptive statistics of the daily rate of returns r t + 1 of the selected stocks.
SectorStock SymbolMeanStandard DeviationSkewnessKurtosis
FinanceESNT0.0002414790.020302896−1.7224770511.43985993
GNW0.0010798410.0321417332.39330687416.36510994
GTY0.0005697290.013049528−0.2666081062.501580582
EVR−0.000194150.0196820990.3075579252.626563921
AVAL−0.0002676560.0166102930.8125910736.70001155
TechnologyFB0.0003020660.023904315−1.33529105514.05632121
APPL0.0006891750.0184068840.1290205293.017014388
BABA−0.000396280.0228332730.0174424680.446496642
JD−0.0012978270.027757257−0.1537934180.80117389
EVTC0.0024662150.0210554821.29837630910.40076212
HealthcareHCA0.0009185860.016395269−0.289913278.3983388
WAT−0.00007833160.017568363−0.24041228515.40554447
ALC0.0750331140.4381048825.90484956834.22098852
ABBV−0.0010943730.02099967−2.11708273514.22452989
ABC−0.0002964820.01893862−0.2131776293.212242095
Table 7. Dataset splitting into a rolling window from 2 January 2018 to 30 June 2019.
Table 7. Dataset splitting into a rolling window from 2 January 2018 to 30 June 2019.
Rolling Windows IDTraining (5 Months)Testing (1 Month)
1From 1/2/2018 to 30/6/2018From 1/7/2018 to 31/7/2018
2From 1/3/2018 to 31/7/2018From 1/8/2018 to 31/8/2018
3From 1/4/2018 to 31/8/2018From 1/9/2018 to 30/9/2018
4From 1/5/2018 to 30/9/2018From 1/10/2018 to 31/10/2018
5From 1/6/2018 to 31/10/2018From 1/11/2018 to 30/11/2018
6From 1/7/2018 to 30/11/2018From 1/12/2018 to 31/12/2018
7From 1/8/2018 to 31/12/2018From 1/1/2019 to 31/1/2019
8From 1/9/2018 to 31/1/2019From 1/2/2019 to 28/2/2019
9From 1/10/2018 to 28/2/2019From 1/3/2019 to 31/3/2019
10From 1/11/2018 to 31/3/2019From 1/4/2019 to 30/4/2019
11From 1/12/2018 to 30/4/2019From 1/5/2019 to 31/5/2019
12From 1/1/2019 to 31/5/2019From 1/6/2019 to 30/6/2019
Table 8. The genetic algorithm settings for feature selection.
Table 8. The genetic algorithm settings for feature selection.
SettingValue
Size of population500
Crossover typeSingle point
Crossover rate100%
Mutation rate40%
Number of genes subjected for mutation within each chromosome 7
Stop criteria10 successive iterations with less than 0.5% increasing in accuracy
Objective functionAccuracy
Number of genes in one chromosome 50
Table 9. A sample of one chromosome returned by the genetic algorithm feature to forecast AAPL for a single rolling window.
Table 9. A sample of one chromosome returned by the genetic algorithm feature to forecast AAPL for a single rolling window.
Stock NameLAG 1LAG 2LAG 3LAG 4LAG 5LAG 6LAG 7LAG 8LAG 9LAG 10
CAC 400100010101
DAX0010001001
S&P5001011101110
NIKKEI2251001010001
Stock (AAPL)0101010110
Table 10. Number of selected lag features from each stock index according to the chromosome selected from Table 9.
Table 10. Number of selected lag features from each stock index according to the chromosome selected from Table 9.
StocksSelected Lags with Value 1
CAC 404
DAX3
S&P5007
NIKKEI2254
Stock (APPL)5
Table 11. Accuracy of our forecasting model.
Table 11. Accuracy of our forecasting model.
Stock SectorStock SymbolOur ModelDummy Forecast Average of Our Model
HealthcareHCA655362.8
WAT7152
ALC6154
ABBV6250
ABC5554
TechnologyFB735170.8
AAPL6459
BABA7460
JD6352
EVTC8066
FinanceESNT675664.4
GNW6854
GTY6662
EVR6261
AVAL6065
Table 12. The count of the total number of features selected per stock.
Table 12. The count of the total number of features selected per stock.
Stock SectorStock SymbolS&P500NIKKEI225CAC40DAXAuto Correlation
FinanceESNT4848487236
GNW72123624108
GTY3672122412
EVR9610896120120
AVAL362451236
Sum Finance288264197252312
TechnologyFB361241224
AAPL6048245935
BABA31091123
JD5771594673
EVTC22101093
Sum Technology178151106137158
HealthcareHCA453361373
WAT2325242549
ALC73628394105
ABBV4533348110
ABC3510613775
Sum Healthcare221163208250312
Overall Sum687578511639782
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Abraham, R.; Samad, M.E.; Bakhach, A.M.; El-Chaarani, H.; Sardouk, A.; Nemar, S.E.; Jaber, D. Forecasting a Stock Trend Using Genetic Algorithm and Random Forest. J. Risk Financial Manag. 2022, 15, 188. https://doi.org/10.3390/jrfm15050188

AMA Style

Abraham R, Samad ME, Bakhach AM, El-Chaarani H, Sardouk A, Nemar SE, Jaber D. Forecasting a Stock Trend Using Genetic Algorithm and Random Forest. Journal of Risk and Financial Management. 2022; 15(5):188. https://doi.org/10.3390/jrfm15050188

Chicago/Turabian Style

Abraham, Rebecca, Mahmoud El Samad, Amer M. Bakhach, Hani El-Chaarani, Ahmad Sardouk, Sam El Nemar, and Dalia Jaber. 2022. "Forecasting a Stock Trend Using Genetic Algorithm and Random Forest" Journal of Risk and Financial Management 15, no. 5: 188. https://doi.org/10.3390/jrfm15050188

Article Metrics

Back to TopTop