**7. Experimental Results and Analysis**

*7.1. Model Performance and Feature Importance*

The performance of the random forest (RF) and the SVM is compared to a simple random approach based on the two class probabilities. In particular, for each observation, a uniform random number is generated; if its value is below or equal to the first class's probability, the observation is assigned to that class, and otherwise it is assigned to the second class. This approach makes it possible to compare the random forest and the SVM with a random benchmark that still accounts for the class sizes (especially for the Year-Highest target, which has a higher share of observations with the positive target class). The average classification accuracy, precision, and recall for the three models are displayed for each of the two targets ("Year-End" and "Year-Highest") in Table 5. The results are based on 20 runs of a nested cross-validation (10-fold cross-validation splits for both the external and the nested cross-validation).
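The random benchmark can be illustrated with a minimal sketch. It assumes that the class probability is the share of positive observations; the function names, the seed, and the toy labels are hypothetical and are not taken from the study's data.

```python
import random

def random_baseline(n_obs, p_positive, seed=0):
    """Assign each observation to the positive class with probability p_positive."""
    rng = random.Random(seed)
    # Draw u ~ U[0, 1) and predict the first (positive) class when u <= p_positive.
    return [1 if rng.random() <= p_positive else 0 for _ in range(n_obs)]

def accuracy(y_true, y_pred):
    """Share of observations whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy labels with 30% positives (illustration only).
y_true = [1] * 300 + [0] * 700
p_pos = sum(y_true) / len(y_true)

y_rand = random_baseline(len(y_true), p_pos, seed=42)
observed_acc = accuracy(y_true, y_rand)

# For such a benchmark the expected accuracy is p^2 + (1 - p)^2.
expected_acc = p_pos ** 2 + (1 - p_pos) ** 2
```

Under these assumptions the expected accuracy depends only on the class shares, which is why the benchmark reflects class imbalance rather than defaulting to 50%.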


**Table 5.** Model results for the Year-End and the Year-Highest targets.

The notation '\*\*\*' refers to the 0.1% significance level of a one-sided Welch's test comparing the accuracy of the RF and the SVM, respectively, with the accuracy of the Random model for a specific target.
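As an illustration of the test behind the significance marking, the following sketch computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for the alternative that the first sample has the larger mean. The accuracy values are hypothetical and are not the values behind Table 5; the function name `welch_t` is likewise an assumption.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for H1: mean(a) > mean(b)."""
    # Squared standard errors of the two sample means (unequal variances allowed).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-run accuracies (illustration only).
acc_model = [0.73, 0.74, 0.72, 0.73, 0.75]
acc_random = [0.52, 0.51, 0.53, 0.50, 0.52]

t_stat, dof = welch_t(acc_model, acc_random)
```

The one-sided p-value is the upper-tail probability of a t distribution with `dof` degrees of freedom at `t_stat`. With SciPy 1.6 or later, `scipy.stats.ttest_ind(acc_model, acc_random, equal_var=False, alternative="greater")` should return the same statistic together with that p-value.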

The results for the Year-End target show that the random forest is, with an average accuracy of 73.24%, the most accurate model. The linear SVM model performs noticeably worse than the random forest. However, the one-sided Welch's test shows that both the random forest and the SVM are highly significantly (\*\*\*) more accurate than the random model (*p*-value < 0.001). The average precision and recall are also the highest for the random forest model, with both values being around 70%. This indicates that the model correctly predicts around 70% of the actual target price hits (recall) and that about 70% of the positive predictions are actual hits (precision). For the Year-Highest target, the ranking of the methods is the same: the random forest performs best in terms of accuracy, and both the random forest and the SVM show average accuracies that are highly significantly higher than that of the random model (*p*-value < 0.001). It is noteworthy that all metrics (average accuracy, average precision, and average recall) are higher for all methods for the Year-Highest target than for the Year-End target. This is likely because it is an easier classification task to determine whether a certain target price is exceeded at some point during a time period than at only one point in time (year-end).

*Sustainability* **2021**, *13*, x FOR PEER REVIEW 18 of 29

The next question investigated is that of feature importance, that is, which variables are relevant and used by each of the two machine learning algorithms for their models. The relevance of the features (i.e., variables) for these two models for both targets is displayed in Figure 9.

**Figure 9.** Feature importance by model and target.

The feature importance scores illustrate that, for both the Year-End and the Year-Highest targets, the most relevant variable in the random forest and SVM models is the mean target price of the stock. This may not be surprising given that (1) the mean target was the target price used to set up both of the targets and (2) it represents a consensus of analysts about the expected (average) stock price in the future. For the random forest models, the number of target prices was the second most relevant variable, whereas for the SVM models it was only the third most relevant one.

In order to analyze the model performance in more detail and to understand for which types of observations the models work particularly well, the overall accuracy is broken down by the mean target price and the number of target prices. This breakdown for the random forest and SVM models with the Year-End target is presented in Figure 10. The categories for the number of targets were created using the 33rd and 67th percentiles of the number of analysts covering a stock as cut-off points. Thus, the number of targets is considered "Small" when an observation is covered by 1–6 analysts, "Medium" for 7–14 analysts, and "Large" when 15 or more analysts' target prices are available.
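The grouping described above can be sketched as follows. The coverage cut-offs (1–6, 7–14, 15 or more) come from the text; the handling of the return group boundaries, the function names, and the toy records are assumptions made for illustration.

```python
from collections import defaultdict

def coverage_group(n_analysts):
    """Map the number of analyst target prices to the categories named in the text."""
    if n_analysts <= 6:
        return "Small"
    if n_analysts <= 14:
        return "Medium"
    return "Large"

def return_group(target_return):
    """Map the implied target return (in %) to a return group (boundary handling assumed)."""
    if target_return < 0:
        return "Under 0%"
    if target_return < 10:
        return "0% to 9.9%"
    if target_return < 30:
        return "10% to 29.9%"
    if target_return <= 70:
        return "30% to 70%"
    return "Above 70%"

def accuracy_by_group(records):
    """Accuracy per (return group, coverage group) cell."""
    hits, totals = defaultdict(int), defaultdict(int)
    for n_analysts, target_return, y_true, y_pred in records:
        key = (return_group(target_return), coverage_group(n_analysts))
        totals[key] += 1
        hits[key] += int(y_true == y_pred)
    return {key: hits[key] / totals[key] for key in totals}

# Hypothetical records: (number of analysts, target return in %, actual label, predicted label).
records = [
    (3, -5.0, 1, 1),
    (5, -2.0, 1, 0),
    (10, 15.0, 0, 0),
    (20, 85.0, 0, 0),
    (16, 90.0, 1, 0),
]
breakdown = accuracy_by_group(records)
```

Each cell of `breakdown` corresponds to one combination of return group and coverage group, which is the kind of breakdown the accuracy figures report.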

**Figure 10.** Model accuracy by mean target and number of targets for the Year-End target.

The results show that for both the random forest and the SVM model, the average accuracies tend to be the highest for the very high and high mean target prices ("Above 70%" and "30% to 70%"), followed by the lowest mean target prices ("Under 0%"), which imply a decrease from the current stock price. Both models rarely predict the positive class (target price met) for observations with very high and high mean target prices ("Above 70%", "30% to 70%"), but the SVM is more extreme in that case, almost never predicting a "hit" for these return groups (see Figure A3 in Appendix A). Moreover, the precision of the random forest for these return groups tends to be rather high, indicating that when it predicts a hit (which it does not do often), the prediction is often correct (see Figure A2 in Appendix A). This holds true especially for stocks with high target returns ("30% to 70%", "Above 70%") that are highly covered, meaning that 15 or more (recent) analyst target prices are available for them at that time. These two subgroups show a precision of 84.95% and 93.06%, respectively, indicating that positive predictions are correct in the vast majority of cases. It should be pointed out that the random forest model can also be considered prudent, since its recall is not high (37.53% and 25.97% for these subgroups), highlighting that observations for stocks that hit their target prices are often not predicted as positive.

These results are very different for the SVM model for the Year-End target, which almost never predicts a positive outcome for the high return groups, and even when it does, the precision is generally low. Thus, the high accuracies achieved with the SVM for the high return groups are almost exclusively based on predicting a negative outcome (which is the majority class label for these return groups). This likely makes the SVM model less attractive for potential investors, since correctly predicting a hit of a target price usually provides more information than predicting a miss. In particular, a hit implies that a minimum return (the target return) was achieved, whereas a miss provides no information other than that the return is lower than the target return, which can still be positive or negative (with the exception of the "Under 0%" group).
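To make the distinction between accuracy, precision, and recall in this discussion concrete, the following sketch contrasts a cautious classifier with one that always predicts the negative class. All labels are hypothetical and only mimic the qualitative pattern described above; they are not the study's data.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class; None when a ratio is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall

def accuracy(y_true, y_pred):
    """Share of observations whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical subgroup: 8 actual hits and 12 misses.
y_true = [1] * 8 + [0] * 12

# Cautious model: few positive predictions, most of them correct.
y_cautious = [1, 1, 1, 0, 0, 0, 0, 0] + [1] + [0] * 11

# Model that never predicts a hit (majority class only).
y_negative = [0] * 20

cautious_pr = precision_recall(y_true, y_cautious)
negative_pr = precision_recall(y_true, y_negative)
cautious_acc = accuracy(y_true, y_cautious)
negative_acc = accuracy(y_true, y_negative)
```

In this toy setting the always-negative classifier still reaches 60% accuracy, yet its recall is zero and its precision is undefined, which is why accuracy alone does not show whether a model is useful for identifying hits.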

The two models are also very accurate on observations with a mean target that is below the current stock price ("Under 0%"). For these observations, the models tend to predict the positive class (target price met) in 90% to 100% of the cases and thus, unsurprisingly, correctly predict most observations that are actually positive. The "Under 0%" observations have a high share of stocks that are at or above the target price after one year, which may indicate that the mean target price is accurate or even too pessimistic. Investors should keep in mind that the target price is below the current price, so a hit does not necessarily reflect an investment opportunity. Nevertheless, the average actual return associated with these observations is over 26% (within 12 months), with 63.9% of observations in that group showing a positive return instead of a decline over the 12-month period.


The corresponding breakdown for the random forest and SVM models with the Year-Highest target is presented in Figure 11.


**Figure 11.** Model accuracy by mean target and number of targets for the Year-Highest target.

The average accuracy of both models is not just higher for the Year-Highest target than for the Year-End target (see Table 5); there also seems to be clearly less variation among the average accuracy values for the different subgroups. It is interesting to note that for both models there are more positive predictions for the high return groups, but the recall for them tends to be lower (see Figures A4 and A5 in Appendix A). However, the opposite is true for the moderate return groups such as "10% to 29.9%" or "0% to 9.9%", which tend to have the same or a larger share of positive predictions for the Year-Highest than for the Year-End target but have a higher recall. This means that for these moderate return groups the share of positive predictions that turn out to be correct is higher. The simple reason for the higher accuracy and precision on these moderate return groups is likely that the magnitude of the estimated increase is not that high, and the stock price has an entire year to reach it at least at a single point in time. Since stock prices tend to fluctuate over a year, it appears plausible that especially low to moderate increases can happen at least temporarily during that time period.

This also highlights the main problem of models using the Year-Highest target: investors do not know at which time and for how long targets may be met, so strict and continuous monitoring of the stock prices and optimal market timing are required to accomplish the results suggested by the Year-Highest model. However, if this is possible for an investor, then the predictions, especially for the moderate target groups, may be of interest due to the high precision.
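The difference between the two targets can be illustrated with a small sketch. It assumes that a target counts as met when the price is at or above the target price, evaluated at the last observation for the Year-End target and at any observation within the year for the Year-Highest target; the exact target construction is defined elsewhere in the paper, and the price path below is hypothetical.

```python
def year_end_hit(prices, target):
    """Assumed Year-End label: the final price of the period is at or above the target."""
    return prices[-1] >= target

def year_highest_hit(prices, target):
    """Assumed Year-Highest label: the highest price of the period is at or above the target."""
    return max(prices) >= target

# Hypothetical price path over one year and a target price implying a 10% increase.
prices = [100, 108, 113, 104, 99]
target = 110

end_hit = year_end_hit(prices, target)
highest_hit = year_highest_hit(prices, target)
```

Under these assumptions every Year-End hit is also a Year-Highest hit, but not the other way around. In the example the target is reached only temporarily, which is the situation that would require continuous monitoring and timing by an investor.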
