Peer-Review Record

A Method to Explore the Best Mixed-Effects Model in a Data-Driven Manner with Multiprocessing: Applications in Public Health Research

Eur. J. Investig. Health Psychol. Educ. 2024, 14(5), 1338-1350; https://doi.org/10.3390/ejihpe14050088

by Hyemin Han

Reviewer 1: Anonymous

Reviewer 2: Daniel W. K. Tse

Reviewer 3: Anonymous

Reviewer 4: Tim Hulsen

Submission received: 13 February 2024 / Revised: 6 May 2024 / Accepted: 8 May 2024 / Published: 10 May 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This is a well-written manuscript. The only thing I would recommend is expanding the introduction to include other methods commonly used in model selection, such as model averaging. A more thorough overview of alternative methods and their limitations would strengthen the proposal for this method (as is done for stepwise selection algorithms, already)

Author Response

Dear Reviewer 1,

Thank you very much for your invaluable comments to improve my paper. You can find how I revised the manuscript in response to your comments below:

Comment

Response

Thanks a lot for your suggestion. In the revised manuscript, I added additional alternative model search methods following your suggestion:

There are several alternative approaches to generating plausible prediction models that are not completely suitable for identifying the best prediction models. First, we may consider BMA. BMA averages the most probable prediction models based on their Bayesian posterior probability [30]. Previous studies reported that BMA shows improved prediction accuracy, particularly cross-validation accuracy, and addresses uncertainty existing in model selection processes [14]. Second, variable selection and regularization methods, such as LASSO and elastic-net regression, can also be employed [31]. These methods are suitable for selecting variables and regularizing coefficients to minimize cross-validation errors in prediction to prevent potential overfitting [13,32]. Although these methods perform effectively in generating prediction models, compared with the model exploration method I will propose, they have several limitations in being used in health and psychological research. The result from BMA does not suggest one specific best model; instead, it demonstrates coefficients from averaging multiple candidate models [33]. Furthermore, gathering information for statistical inference, such as significance, by performing regularization is more difficult than the conventional analysis methods [34]. From the practical side, I could not find any available R packages implementing these methods within the context of mixed-effect analysis. (pp. 4-5)

Reviewer 2 Report

Comments and Suggestions for Authors

This article is a revolutionary approach in improving the quantitative research method which has some important weaknesses already. The best credit is the showing of program codes and other useful materials for readers to trace and drill down the details. There are some minor issues which need to be solved:

1) It is not so common to use first-person writing style to describe the work done in academic world.

2) In line #191, why are the variable names the same for fixed effects and random slopes? It is quite confusing.

3) From lines #208-210, why is there no case for Y~X1 + X2? The same question applies to lines #213-214.

4) In line #223, why is there no need to include 'number of cores'?

5) There are so many results mentioned. Should the author show them using tables?

6) In line #412, why not consider quantum computing? Large-scale cluster computing is not cheap.

The writing style and the presentation method need to be revamped.

Author Response

Dear Reviewer 2,

Thank you very much for your invaluable comments to improve my paper. You can find how I revised the manuscript in response to your comments below:

Comment 1.

1) It is not so common to use first-person writing style to describe the work done in academic world.

Response 1.

Thanks a lot for your comment about using the first-person writing style. I understand that it is not recommended in some fields. However, in the field of psychology, the writing convention, APA style, encourages the use of first-person expressions. The style guidelines suggest that authors avoid unnecessary use of the third person or the passive voice. For additional information, please refer to these documents:

https://owl.purdue.edu/owl/research_and_citation/apa6_style/apa_formatting_and_style_guide/apa_stylistics_basics.html

https://apastyle.apa.org/style-grammar-guidelines/grammar/first-person-pronouns

https://apastyle.apa.org/blog/first-person-myth

Comment 2.

2) In line #191, why are the variable names the same for fixed effects and random slopes? It is quite confusing.

Response 2.

I appreciate your request for further clarification. Because ordinary multilevel modeling requires that variables included as random slopes also be included as fixed effects, some variable names overlap between the two. Users can designate which fixed-effect variables are to be included as random slopes.

Comment 3.

3) From lines #208-210, why is there no case for Y~X1 + X2? The same question applies to lines #213-214.

Response 3.

Thanks for your question. The call

explore.models(data, Y ~ X1 + X2 + X3, 'G', c('X1', 'X2'), 'X3', 4)

requires that X3 be included in all candidate models, so X3 appears in every combination. As a result, "Y ~ X1 + X2" did not appear in the results.
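To illustrate the point in Response 3, here is a minimal R sketch of how making one predictor mandatory removes every candidate that lacks it. This is not the package's actual implementation: the helper name enumerate_fixed and its arguments are hypothetical, and only the fixed-effects part of the formula is enumerated (random slopes are ignored).

```r
# Hypothetical helper (NOT part of explore.models): list fixed-effect
# formulas in which every `required` term is always present.
enumerate_fixed <- function(response, optional, required) {
  # Start with the empty subset, then add every non-empty subset
  # of the optional predictors.
  subsets <- list(character(0))
  for (k in seq_along(optional)) {
    subsets <- c(subsets, combn(optional, k, simplify = FALSE))
  }
  # Append the required terms to each subset and build the formula string.
  vapply(subsets, function(s) {
    paste(response, "~", paste(c(s, required), collapse = " + "))
  }, character(1))
}

# X1 and X2 are optional; X3 must appear in every candidate model.
candidates <- enumerate_fixed("Y", optional = c("X1", "X2"), required = "X3")
print(candidates)
# "Y ~ X3" "Y ~ X1 + X3" "Y ~ X2 + X3" "Y ~ X1 + X2 + X3"
```

Because X3 is appended to every subset, a formula such as "Y ~ X1 + X2" can never be generated under these assumptions, which matches the explanation given above.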

Comment 4.

4) In line #223, why is there no need to include 'number of cores'?

Response 4.

Thanks for your comment. I revised the sentence accordingly:

When calling explore.models, three parameters, the group variable, the list of random slopes, the list of variables, are required. The number of cores to be employed is optional (default = 1). (p. 6)

Comment 5.

5) There are so many results mentioned. Should the author show them using tables?

Response 5.

I appreciate your question. Because many numbers, including model complexity and processing time, must be reported in the results section, I added tables even though many representative numbers already appear in the main text.

Comment 6.

6) In line #412, why not consider quantum computing? Large-scale cluster computing is not cheap.

The writing style and the presentation method need to be revamped.

Response 6.

Thanks for your suggestion. However, I decided not to add information about quantum computing, because quantum computing is not considered ideal for exploring candidate models with big data when addressing classical problems. Please refer to this document for further details about the discussion:

https://spectrum.ieee.org/quantum-computing-skeptics

Reviewer 3 Report

Comments and Suggestions for Authors

Abstract

The abstract introduces a creative new model for multiprocessing public health research in a timely manner. Additionally, the abstract provides a detailed outline of how this model was created and acknowledges that it can be used for future research. However, this abstract is not appropriate for publication in its current state and requires major revisions throughout the paragraph. The abstract has major grammatical errors that must be addressed. There is an evident language barrier, which is noticeable throughout the abstract. The addition of another author with English as a first language could improve the flow of the article. Additionally, the abstract provides a detailed explanation of the methods, and could benefit from more background information, or conclusions that the author developed from their module’s data. It is difficult to understand what the author developed, what makes the author’s model different from existing models, and what the implications of this new model are from this abstract.

Line 1: explore the best models

Line 3: compared; the sentence is a run-on, does not flow grammatically, and is hard to understand as currently written.

Line 4: Provide abbreviations for Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), they are used later in the abstract but never defined.

Line 5: Can we elaborate more on the three previous studies in public health? More information on these is needed. From its current state, the abstract does not have any novelty because what this R module is being compared to is unknown or not understood by the reader

Line 7-8: “After conducting model exploration with explore.models, I calculated the model Bayes Factors of the nominated best models for validation.” This whole sentence needs to be rewritten, unclear to readers, not easy to comprehend what the author is trying to say here.

Line 8-9: “The results suggested that explore.models using AIC and BIC was able to nominate best candidate models that also demonstrated superior model Bayes Factors compared with competitors, the full models in particular.” This whole sentence is grammatically incorrect, unreadable, and appears to be written by someone who does not use English as a first language. I am unable to understand what the result is when given this sentence. Consider a complete revision: “The results suggested that explore.models using AIC and BIC successfully identified full candidate models that also demonstrated superior model Bayes Factors when compared with competitors.”

Line 11-12: Suggested revision: “explore.models required shorter processing time in comparison to complete model Bayes Factor calculations.”

Line 12: “I discussed the implications of this R module for future research in the field.” This needs additional information. What are the implications? This is not an appropriate summary of the results or future directions.

Introduction

The introduction includes interesting data from a study regarding COVID-19 vaccination intent across many countries and suggests ways to improve the analysis of this data through more sophisticated models. The author explains that there is a gap for their style of models, identifying other literature that supports the development of their model. Also, the author gives a thorough explanation of the reasoning behind why they designed their study the way they did. But this may be more of a beneficial addition to the Methods section. Despite defining the gap and having some interesting data, major edits must be made to this introduction. One important detail that is missing from this introduction is the objectives of this article and what the researchers’ goals are. A lack of objectives gives the article a lack of purpose. Another major flaw of the introduction is that it requires major grammatical revisions. The introduction is hard to follow due to the abundance of run-on sentences and wordy explanations. Additionally, many different concepts are introduced but not thoroughly explained, as if there is an assumption that the reader is already an expert in this subject. Many of the studies and ideas mentioned would benefit from further explanation from the author, so the reader can comprehend why these are relevant to the introduction and the article.

Line 14: Why is there a 1. in front of introduction? There are not any other sections that are numbered, suggest removing this.

Line 17-20: This is a very long, run-on sentence. Consider breaking up into two sentences, for example. “The mixed-effects model method enables us to examine associations between predictors and the dependent variable of interest at the population level (fixed effects). It also allows us to assess how the intercepts (random intercepts) and the aforementioned associations vary across different groups (random slopes), especially when observations are nested within groups [1]."

Line 18: “(fixed effects)” what does it mean?

Line 19: “(random intercepts)” what does it mean?

Line 20: “(random slopes)” what does it mean?

Line 21-23: “in analysis, analysis” Consider revision of this entire sentence, putting the same word twice is hard for readers to comprehend. Consider the following revision: When data exists in multiple groups, failing to consider group-level factors in analysis can lead to misleading results.

Line 24: so likely resulting in to end up overconfident estimates

Line 25: For instance, we may consider A global public health study across 43 countries conducted in the field of global public health,

Line 26: which explored the relationship between people’s trust in government and science, and

Line 27: COVID-19 vaccine intent across 43 countries [4].

Line 28: Replace “get” with a more professional word, such as receive.

Line 30: to be biased.

Line 31: countries, while assuming predictors are the same [6], this possibility warrants warranting the necessity

Line 33: in different across various countries. Consider using another word since “different” was utilized in the Line above.

Line 35: Adding a semi colon makes this sentence too long. Consider ending the sentence in Line 34 and starting the next sentence: “This suggests that…” In fact, the analysis results in

Line 36: regression models, including random intercepts and slopes,

Line 37: significantly better predicted outcome variables significantly better in comparison compared with the simpler models that only

Line 38: possessed with fixed effects [4].

Line 41: “what shall we be supposed to do?” does not make grammatical sense, must be rewritten. It is an interesting way to propose a question in the article to make readers guess what comes next. Consider “we should consider what actions to take next”

Line 43: examined

Line 44-47: This is a very long sentence that is hard to follow. Consider breaking up into two sentences for better flow for readers.

Line 52: might not be able may be unable to accurately predict

Line 53: data accurately [12].

Line 56: Suggested revision: “If researchers are genuinely interested in searching for the model that best explains”

Line 58-59: Is the intent of this sentence to lead into the following paragraph, Methods for Model Exploration? If not, consider elaborating on these methods and employed, because this sentence seems like an unrelated ending to the paragraph.

Line 62: researchers or the researcher

Line 68-70: Consider combining these two sentences and removing such as forward and backward selection. “First, the variable selection process can be arbitrary, for example, different stepwise methods may suggest different outcome models”

Line 90: to others when given data

Line 93-94: This is not a complete sentence “However, there are several practical limitations warrant further considerations.” Consider adding limitations that warrant

Lines 96-99: Second, in the case of MLM, which is the main interest of this paper, the existing R libraries, such as BayesFactor, implementing diverse model exploration with feasibility only allowing exploring models with random intercepts, but without random slopes [21].

Line 101: Abbreviations for AIC and BIC should be introduced earlier as they are utilized prior to being defined here.

Line 103: above-mentioned should be edited, for example to mentioned above

Line 117-121: This is a very long sentence, consider breaking into multiple sentences for better flow for the reader

Line 122: In addition, in general,

Line 124: Compared with what criteria? What is the criteria?

Line 125: “despite their calculation is computationally heavier.” is grammatically incorrect. Consider revision, for example despite their heavier computational calculation

Line 126: “Some argued for BIC” must be clarified. Who argued for BIC? What was the basis for this argument? More detail is necessary.

Line 135: time,

Line 150: “particularly when a complex model is examined” Consider removing this to improve the flow of the sentence.

Line 157-158: Author refers to Blackburn et al., Han, and Ntontis et al. as if they have been introduced to the reader, but they have not

Line 162: Overuse of the word abovementioned, consider utilizing another word

Methods

The author provides tutorials in a working link at the beginning of the section so readers can utilize the codes for themselves. Also, the author provides very detailed lists and explanations of possible candidate models, while walking the reader step-by-step through their methods. The author addresses a gap in literature with an explanation of how they created this novel model with a detailed tutorial. However, there are major grammatical errors that persist into the methods section and take away from the author’s hard work. These run-on sentences and grammatical errors must be revised in order to allow readers to fully understand the methods section. In conclusion, the reader is unable to follow the methods to replicate the author’s model because the methods section is grammatically flawed.

Line 182-185: Consider revising the structure of this sentence, utilization of semi-colons is excessive

Line 222-224: “When calling explore.models, four parameters, the group variable, the list of random slopes, the list of variables that must be included, and the number of cores to be are not required.” Are all of these elements not required? Or just the number of cores? Consider clarifying and revising this sentence

Line 275: models was 97, indicating that

Line 288: Consider removing (7)

Line 304-305: Consider removing “as well.” It is repetitive after starting the sentence with “In addition to”

Line 305-306: Must consider rewriting the sentence “The processing time was analyzed to examine whether my model exploration method can complete comparing all possible candidate models more quickly than Bayesian MLMs”

Line 308: above-mentioned… We must be consistent with how we are spelling this word; there are times where it is spelled abovementioned, and here it is hyphenated. Please edit for consistency.

Line 309: Because Bayesian MLMs requires an extremely long time

Results

The results section contains major flaws. It is recommended to refrain from using I (first person) in scholarly writing. It was noted several times throughout this section. Additionally, the results section should solely focus on the results. The interpretation of the results and their significance should go in the discussion section. Please find the edits for this section below.

Line 316: Instead of I conducted, it is recommended to say “this research was conducted with”

Line 317: “with explore.models with models suggested by explore.models” Using with twice in a row sounds redundant, also said models frequently in this sentence

Line 326-327: “Only included one random slope, primary stressors” It is suggested to say “only included one random slope, which was primary stressors”

Line 332: Instead of “suggesting”, use “suggests”

Line 335: It is recommended to say “which was similar to” after the word “predictors”

Line 337: There should not be a comma after the and, also suggest deleting the word and

Line 339: “When the BIC-best model as” Use “was” instead of “as”

Line 343: “,10” It is recommended to say which was 10

Line 343: “Thus, I shall conclude that the models” → “It can be concluded that the models”

Line 346: “was not significant, 2(4297.25 − 4296.58) = 1.34” It is suggested to say was 2(4297.25 − 4296.58) = 1.34 and not significant

Line 347: Instead of which, start new sentence with this was below

Line 351: Instead of saying as expected, say “as predicted”

Line 353: “One note is that when” it is recommended to say of note, when Blackburn

Line 359: “Even if we assume” it is recommended to say even if one assumes

Line 362: “,the actual processing time would be” It is suggested to say this means the actual processing time

Line 363: “estimate, 43637.75 seconds” instead of the comma, say estimate of 43637.75 seconds

Line 363: Do not say as expected, it is suggested to say “similar to the predictions of this study”

Line 364: Suggest taking out “the” before the word shorter

Line 365: “The same trends was also found from my analysis of” Instead of was, use were. Also instead of saying my, say “found from the analysis of”

Line 363-366: This part of the results likely belongs in the discussion section.

Discussion:

The discussion succeeded in comparing the model to other models and explaining the use for the model. The limitations were noted well with specific examples. Since the paper did not include a conclusion, some of the text from the discussion section should be added to the conclusion. The conclusion should restate the main points and discuss next steps. A weakness of this paper was the lack of further direction for future studies. Please find the edits for this section below.

Line 369-371: “explore.models, which I invented, allows its users to explore the best prediction model among candidate models; the candidate models are generated by the combination of candidate predictors at the population and group levels following the users’ directions.”

· The sentence should not start with Explore.models

· “Which I invented” → It is recommended to say “invented by the author” to refrain from using the first person

· Instead of a semicolon highlighted in red, it is suggested to put a period and make the second part a different sentence.

Line 373: “outcomes and to inflate false positives” → It is suggested to delete the word “to”

Line 380: Should the E be capital in explore.Models because it is the start of a sentence?

Line 381: What is brms? It is not defined anywhere else in the paper.

Line 385: “Stringent models are less likely susceptible” Recommend saying either “less susceptible” or “less likely to be susceptible”

Line 387-389: “In general, I found that researchers who intend to conduct data-driven model exploration with multilevel models will be able to employ explore.models to save their time while maintaining the credibility of the model selection process.”

· Rewording suggestion: Researchers who intend to conduct data-driven model exploration with multilevel models can use explore.models to save time, while also maintaining credibility of the model selection process.

Lines 389-390: “Also, since I composed the R codes, which include the customized functions, and tutorials available to the public via GitHub, researchers will be able to employ explore.models in their research projects feasibly.”

· Rewording suggestion: The R codes include customized functions and tutorials available to the public via GitHub, meaning researchers will be able to feasibly employ explore.models in their research projects.

Line 393: “information criteria are” Say “is” in the place of “are”

Line 397: “been regarded practically acceptable” Rewording suggestion: Has been regarded as acceptable

Line 398: “with proper prior distributions coherent with data” Consider rewording

Line 399: Recommend changing “that suggests” to “this suggests”

Line 402-403: “It might allow researchers to compromise between computational complexity and credibility of model recommendation reasonably.” Suggest changing to: It could allow researchers to reasonably compromise between computational complexity and credibility of model recommendation.

Line 404-406: “Second, although I was able to boost the processing time via use of information criteria and multiprocessing, it still requires a long time to complete model exploration with a complex model.” Rewording suggestion: Although this study was able to boost the processing time via use of information criteria and multiprocessing, it still requires a lengthy amount of time to complete model exploration with a complex model.

Line 407-408: “13 models with 2 predictors → 97 models with 4 predictors → 2315 models with 7 predictors” Recommend writing this out in a sentence instead of using arrows.

Line 409: “3.06 seconds → 18.83 seconds → 3314.29 seconds” Recommend writing this out in a sentence instead of using arrows.

Line 411-412: “setting more restrictions (e.g., specifying more "should be included" variables or less candidate random slopes” It is suggested to say “setting more restrictions, such as”

Line 414: “ Third, although” It is recommended to say Even though instead

Line 415-416: “the models do not necessarily provide theoretically and conceptually relevant and meaningful results” Rewording suggestion: Theoretically and conceptually relevant results that are meaningful.

Line 418: “Possibly” → This word can be deleted

Line 420: “Also, researchers will need to” → Rewording suggestion: Researchers will also need to consider

Line 423-424: “spending an unnecessarily long time for computation.” Rewording suggestion: spending an excessive amount of time computing the data

Line 424: “they may consider” It is suggested to say “they must consider”

Line 426: Instead of (e.g.,) Recommend making this a separate sentence and start it with for example

Line 430: Used also twice in this sentence, suggest deleting the also at the end of the sentence

Line 430: “Apparently data” → Is this a term used in the field? The sentence reads oddly because of the word “apparently”.

Line 439: Do not start a sentence with “of course”; you could delete this and start with the word “ideally”.

Conclusion: This article did not have a conclusion section; author should add one at the end of the discussion section.

Comments on the Quality of English Language