Article
Peer-Review Record

Predicting Healthcare Mutual Fund Performance Using Deep Learning and Linear Regression

Int. J. Financial Stud. 2024, 12(1), 23; https://doi.org/10.3390/ijfs12010023
by Anuwat Boonprasope 1,2 and Korrakot Yaibuathet Tippayawong 2,3,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 31 December 2023 / Revised: 10 February 2024 / Accepted: 20 February 2024 / Published: 29 February 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

General Evaluation:

The authors conduct extensive analysis, which is generally well done, and the study is well written and structured. However, the main problem with the study is its contribution. The authors examine only a few months of return data for a single mutual fund. In this context, I find it to be a purely statistical exercise with no broader research context or implications. I see no substantive contribution to the academic literature, either on machine learning or on asset pricing.

Other Comments:

1. The introduction should be completely restructured: it says nothing about the results and is unconvincing about the contributions.

2. The methodology could be improved. For example, it is now standard to use rolling or expanding training windows, whereas the authors use only a single fixed window, which makes the study highly susceptible to data mining and overfitting.
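A rolling-window evaluation of the kind described above could look like the sketch below; the series, predictors, and window lengths are hypothetical placeholders, not the authors' actual setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical daily data: X holds predictor columns, y the fund's return series.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X @ np.array([0.3, -0.2, 0.1, 0.05]) + rng.normal(scale=0.1, size=500)

train_len, test_len = 250, 20                  # fixed-length rolling window
errors = []
for start in range(0, len(y) - train_len - test_len + 1, test_len):
    tr = slice(start, start + train_len)                         # training window
    te = slice(start + train_len, start + train_len + test_len)  # next block, out of sample
    model = LinearRegression().fit(X[tr], y[tr])
    errors.append(np.sqrt(np.mean((y[te] - model.predict(X[te])) ** 2)))

print(f"mean out-of-sample RMSE over {len(errors)} windows: {np.mean(errors):.4f}")
# An expanding window is the same loop with tr = slice(0, start + train_len).
```

Averaging errors across windows, rather than relying on one fixed split, makes the reported performance far less sensitive to the particular period chosen.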

3. The choice of return predictors lacks a clear theoretical basis, or at least the rationale is not explained.

4. To improve the positioning, the authors should link their study to seminal or general applications of machine learning to stock return predictability - in both cross-sectional and time-series contexts, e.g., Gu et al. (2020), Brogaard and Zareei (2023), Zhou et al. (2023). Currently, the study is not properly rooted in the relevant literature. 

5. Figures and tables are usually not self-contained and lack relevant notes.

6. Data are not adequately described. For example, do the authors use price or total return series? In what currency? etc.

7. The title could be shortened.

References:

Brogaard, J., & Zareei, A. (2023). Machine learning and the stock market. Journal of Financial and Quantitative Analysis, 58(4), 1431-1472. https://doi.org/10.1017/S0022109022001120

Gu, S., Kelly, B., & Xiu, D. (2020). Empirical asset pricing via machine learning. The Review of Financial Studies, 33(5), 2223-2273. https://doi.org/10.1093/rfs/hhaa009 

Zhou, X., Zhou, H., & Long, H. (2023). Forecasting the equity premium: Do deep neural network models work? Modern Finance, 1(1), 1–11. https://doi.org/10.61351/mf.v1i1.2

Comments on the Quality of English Language

Satisfactory; the language quality is not the biggest problem of this paper.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

1) In the abstract, the first sentence is not clear. What is meant by "push"? A new policy? The rest of the sentence is not grammatically correct. The entire abstract needs careful revision.

"...Multiple Linear Regression (MLR) algorithm. Remove algorithm here. model should be preferred. 

It is stated that "The LSTM performs better than MLR, demonstrating an RMSE Overall of 0.0596, MSE overall of 0.0035, and R2 Overall of 0.7809." 

LSTM is for time-series problems with a specific focus on forecasting, and for forecasting, out-of-sample performance is what matters. The criteria stated above: are they for in-sample fit, one-step-ahead forecasts, or out-of-sample forecasts, say 1 month (30 days) ahead? The problem and focus of the paper are not clear in this respect. Further, as seen on the following pages, the sample is divided into training, validation, and test sets, which confirms my concern that no out-of-sample exercise is conducted.

Overall, revise the abstract to show the difference, contribution, and highlights of the paper, and where it stands in the existing literature.

2) Is "AI" necessary in the keywords?

3) The referencing style does not follow the journal's standards, which require citations in the form [number]. But this is an issue for the final stage.

4) Revise the introduction. It currently focuses on discussing ML and deep learning, which is not central to the paper. The actual problem should be well discussed: a forecasting problem aimed at overcoming difficulties related to COVID-19 and forming better policy measures. The health sector is only explained towards the later sections. A revision is needed to bring this focus to the center.

5) A literature review is missing. A dedicated section should be added as Section 2 to position the paper within the literature and to show its contribution compared to others.

6) "The dataset is divided into Training Data and Test Data in an 129

80:20 ratio. Within the Training Data, a further split is made into Training Data and Valida- 130

tion Data in a 90:10 ratio."

Which split is it, and how many divisions are there: 80-10-10? What about out-of-sample forecasting? A portion should be held out beyond the test sample for an out-of-sample forecasting exercise; a sketch of such a division follows.
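A chronological division with a final hold-out block, in the spirit of the request above, might look like this; the sample size, proportions, and 30-day hold-out are illustrative assumptions, not the paper's actual split.

```python
import numpy as np

n = 1000                      # hypothetical number of daily observations
idx = np.arange(n)            # time-ordered index: no shuffling for time series

# 80:20 split, then 90:10 inside the first block, with a final segment
# reserved purely for out-of-sample forecasting and never used for modelling.
oos_len = 30
cut = int(0.8 * (n - oos_len))
train_val, test = idx[:cut], idx[cut : n - oos_len]
train, val = train_val[: int(0.9 * len(train_val))], train_val[int(0.9 * len(train_val)) :]
out_of_sample = idx[n - oos_len :]

print(len(train), len(val), len(test), len(out_of_sample))   # 698 78 194 30
```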

7) The MLR in Eq. (1) has no dynamic structure; the model is static. With no dynamic relation, no forecasting can be conducted, because the problem at hand becomes a prediction problem. This undermines the overall claim that the paper focuses on forecasting, especially if no out-of-sample forecasts are produced from the model; a lagged specification is sketched below.
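One common way to give the regression a dynamic structure, shown here as an illustration rather than the authors' specification, is to regress the value at time t on its own lag and lagged predictors, so that a forecast for t uses only information dated t-1; the variable names below are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical frame: 'nav' is the fund value, 'x1' and 'x2' are predictors.
rng = np.random.default_rng(1)
df = pd.DataFrame({"nav": rng.normal(size=300).cumsum(),
                   "x1": rng.normal(size=300),
                   "x2": rng.normal(size=300)})

# Static:  nav_t = b0 + b1*x1_t + b2*x2_t                 (describes, does not forecast)
# Dynamic: nav_t = b0 + b1*nav_{t-1} + b2*x1_{t-1} + b3*x2_{t-1}
lagged = pd.concat({"nav_lag1": df["nav"].shift(1),
                    "x1_lag1": df["x1"].shift(1),
                    "x2_lag1": df["x2"].shift(1)}, axis=1).dropna()
y = df["nav"].loc[lagged.index]
model = LinearRegression().fit(lagged, y)

# A genuine one-step-ahead forecast for T+1 uses only information available at T.
latest = pd.DataFrame({"nav_lag1": [df["nav"].iloc[-1]],
                       "x1_lag1": [df["x1"].iloc[-1]],
                       "x2_lag1": [df["x2"].iloc[-1]]})
print(model.predict(latest))
```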

8) After line 176, check the formatting of the text.

9) Why is PCA used and explained under Section 2.3? The purpose is not stated. Is the idea to take all these variables and produce principal components, i.e., new time series to be used as LSTM inputs for forecasting? Please clarify.
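If the intent is what is guessed above, i.e. compressing the 12 predictor series (mentioned later in comment 15) into a few principal-component series that feed the LSTM, a minimal sketch would be the following; the feature matrix and variance threshold are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
features = rng.normal(size=(500, 12))               # hypothetical 12 predictor series

scaled = StandardScaler().fit_transform(features)   # PCA is sensitive to scale
pca = PCA(n_components=0.96)                        # keep components explaining >= 96% variance
components = pca.fit_transform(scaled)

print(components.shape)                             # (500, k): k component series for the LSTM
print(pca.explained_variance_ratio_.sum())          # total variance retained
```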

10) R² plummeting to -2.7617? What? R² is a measure between 0 and 1. This should be corrected, also in the table.
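For context on the disputed figure: the test-set R² is typically computed as 1 - SS_res/SS_tot, and that quantity falls below zero whenever the model's squared errors exceed those of simply predicting the evaluation sample's mean, as the toy numbers below (assumed, not taken from the paper) illustrate.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([4.0, 1.0, 5.0, 0.0])         # predictions worse than the sample mean

ss_res = np.sum((y_true - y_pred) ** 2)         # 30.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 5.0
print(1 - ss_res / ss_tot)                      # -5.0: negative because ss_res > ss_tot
```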

11) What is included in the input layer? It is not stated in the relevant section. 

12) Do not use the verbs "predict" and "forecast" interchangeably: "The use of a 10-day window size for predicting future prices allows the model to forecast trends rather than capturing noise in the data." (manuscript line 436). As stated above, forecasting refers to out-of-sample use (or, in some cases, one-step-ahead), whereas prediction refers to in-sample estimation.

13) Please place the figures of y and ŷ for the MLR and LSTM next to each other, and likewise the relevant tables. This will bring the improvement to the fore.

14) Conduct Diebold-Mariano tests to compare the predictive accuracy of the two models and to provide confirmatory analysis of the MLR and LSTM results; a rough sketch follows.
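For one-step-ahead forecasts under squared-error loss, a Diebold-Mariano comparison can be sketched roughly as follows; the two error vectors are hypothetical stand-ins for the MLR and LSTM forecast errors.

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2):
    """DM test of equal predictive accuracy, one-step horizon, squared-error loss."""
    d = e1 ** 2 - e2 ** 2                             # loss differential
    dm = d.mean() / np.sqrt(d.var(ddof=1) / len(d))   # approx. standard normal under H0
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))
    return dm, p_value

rng = np.random.default_rng(3)
errors_mlr = rng.normal(scale=0.08, size=200)    # hypothetical MLR forecast errors
errors_lstm = rng.normal(scale=0.06, size=200)   # hypothetical LSTM forecast errors

dm_stat, p = diebold_mariano(errors_mlr, errors_lstm)
print(f"DM statistic = {dm_stat:.3f}, p-value = {p:.3f}")
```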

15) "And in the part of LSTM, this study employed PCA to reduce the dimensionality 476

of the data, making it more manageable for faster processing and avoiding overfitting 477

issues associated with capturing data noise. The data dimensionality was reduced from 12 478

features to 6 features, retaining up to 96.23 percent of the information."

This information should be presented earlier, as I noted in previous comments. Overall, in the methodology section, the authors could present a flowchart of the methods to show what is done.

16) Future directions appear under the final heading; however, they are not satisfactorily presented.

17) R² values are mostly used for comparisons in the text and in the conclusion, along with RMSE and MSE. However, in addition to the raw numbers, the percentage reduction in RMSE could be reported to show the gain from the LSTM.
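The percentage reduction is a one-line calculation; the MLR figure below is a placeholder, since only the LSTM's overall RMSE (0.0596) is quoted in this record.

```python
rmse_mlr = 0.0850     # placeholder, not taken from the paper
rmse_lstm = 0.0596    # overall LSTM RMSE quoted in the review

print(f"RMSE reduced by {(rmse_mlr - rmse_lstm) / rmse_mlr * 100:.1f}%")   # ~29.9% here
```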

18) Beyond these points, the remaining issues are: referencing could be augmented in the LSTM section, which could also be reduced in size; the section is too long and can be made more compact while preserving the information.

The application section could also be shortened, although its length may partly be due to the formatting used instead of the journal's font and spacing format.

19) The literature on applications should be strengthened with references such as:

https://doi.org/10.3390/math11081785

https://doi.org/10.1016/j.chaos.2020.109864

Lastly, the paper has important merits, and my recommendation is positive so far. However, the contribution of the paper is not well presented, sectioning problems exist, the presentation of the results is not clear, and there is no discussion placing the paper within the existing literature. My suggestion is to improve the paper along the lines of the critiques above.

Best Regards. 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The subject of the work, predicting post-COVID-19 trends in healthcare, is potentially interesting for readers and very current. The authors conducted the research using known, well-established methods, and, in my opinion, the conclusions are correct.
The results of the research should, however, be examined more thoroughly. The work requires corrections: actual values should be marked on Figure 1; on Figures 3 and 6, all axes should be labeled; and the bibliography should be expanded.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I must commend the authors for the extensive changes they have made. That said, I do have a few minor comments, mostly editorial.

1. I suggest that the title be shortened to "Predicting Healthcare Mutual Fund Performance Using Deep Learning and Linear Regression".

2. After reading the introduction, the reader can still learn very little about the foundations of the study. I suggest that the authors elaborate more on the results in the introduction, devoting at least one or two distinct paragraphs to them.

3. It would be good to end the introduction with a paragraph outlining the structure of the paper. 

4. In the literature review, I still think that the authors can improve the positioning of the paper by linking their study to seminal or general applications of machine learning to stock return predictability, such as Gu et al. (2020), Brogaard and Zareei (2023), Zhou et al. (2023). This comment of mine has been largely ignored.

5. The authors should ensure that their citation style conforms to the journal's requirements.

References

Brogaard, J., & Zareei, A. (2023). Machine learning and the stock market. Journal of Financial and Quantitative Analysis, 58(4), 1431-1472. https://doi.org/10.1017/S0022109022001120

Gu, S., Kelly, B., & Xiu, D. (2020). Empirical asset pricing via machine learning. The Review of Financial Studies, 33(5), 2223-2273. https://doi.org/10.1093/rfs/hhaa009 

Zhou, X., Zhou, H., & Long, H. (2023). Forecasting the equity premium: Do deep neural network models work? Modern Finance, 1(1), 1–11. https://doi.org/10.61351/mf.v1i1.2

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The quality of English has moved away from academic English after the revision. Examples: "Covid-19 situation" in the abstract; "...see and learn" in the results.

After revision, the title is not informative about what is done. Also, there is no need for a full stop at the end of the title.

In the introduction, regarding the passage "investing ... two forms: direct to stocks, indirect through mutual funds": this sentence and its paragraph are entirely wrong and should be revised.

The model has been revised and made dynamic in terms of Y, but not in terms of the X's, which is the central issue. It is therefore questionable how forecasting is possible in this setting, other than by plugging in the X's at each t to produce fitted values. In forecasting, the future is unknown, so we do not know the values of the x's at T+1, T+2, ..., where T denotes the end of the period and also the end of the test sample. Combined with the insufficient and overly complex explanation of the data division after revision, which I warned about in the last round, the authors still did not include an out-of-sample period for forecasting. The figure for data splitting is the same, training-validation-test; a test sample is not out-of-sample forecasting.
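One way a genuine multi-step forecast could be produced without knowing future X's, sketched here purely as an assumption and not as a description of the paper, is to forecast recursively from lags of the dependent variable alone, feeding each forecast back in as the next lag.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
y = rng.normal(scale=0.05, size=400).cumsum() + 10.0    # hypothetical fund value series

p = 5                                                   # number of own lags as regressors
X = np.column_stack([y[i : len(y) - p + i] for i in range(p)])   # rows: (y_{t-p}, ..., y_{t-1})
model = LinearRegression().fit(X, y[p:])

# Recursive forecasts for T+1 ... T+30: no future X's are ever required.
history = list(y[-p:])
forecasts = []
for _ in range(30):
    next_val = model.predict(np.array(history[-p:]).reshape(1, -1))[0]
    forecasts.append(next_val)
    history.append(next_val)                            # feed the forecast back in as a lag

print(np.round(forecasts[:5], 3))
```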

The R² issue persists, and the explanation is not satisfactory. I am sorry to state this, but it is quite problematic to report a value of around -2 for R². Not to mention that R² is not a good metric for evaluating forecast or prediction accuracy.

DM tests are provided, but no tables are given.

The suggested sources, which were offered as examples of COVID + forecasting + LSTM studies, have not been added to the literature review. The authors stated that they benefited from them and added them to the literature section; contrary to their statement, they did neither.

I regret to inform the authors that, given the empirical concerns about this study, my decision is negative for this version. Efforts have been made and have improved the paper; however, the central empirical issues persist.

Comments on the Quality of English Language

English was fine except for minor typos. It became worse after the revisions, not in terms of grammar but in terms of academic wording.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

The corrections have been made, and my decision is positive for this final version (v3) of the manuscript.
