Bayesian Learning Strategies for Reducing Uncertainty of Decision-Making in Case of Missing Values
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper presents a well-motivated and methodologically sound study addressing the critical challenge of missing data in predictive modeling for liquidity crisis forecasting. The integration of Bayesian Model Averaging (BMA) with Decision Trees (DTs) via RJ MCMC sampling is innovative, and the proposed "sweeping strategy" effectively mitigates overfitting while maintaining model interpretability. The novel Ext preprocessing technique (extending features with binary missing-value indicators) demonstrates significant empirical advantages over established baselines, particularly in handling non-random missingness. The real-world financial application enhances practical relevance, and the rigorous validation (synthetic benchmarks, AUC-PRC, Hosmer-Lemeshow tests) strengthens credibility.
1. The RJ MCMC birth/death/change moves (Algorithms 1–2) lack sufficient pseudocode detail. How is the Metropolis-Hastings acceptance ratio calculated (e.g., which proposal distributions are used for the parameters)? How are the priors (e.g., on tree size and node parameters) defined beyond uniform sampling? Please include mathematical formulations for key transition probabilities and a complete RJ MCMC sampling flowchart.
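For readers of this report, the Ext idea summarized above (adding binary missing-value indicators to the features) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `extend_with_missing_indicators`, the use of NaN to mark missing entries, and the constant `fill_value` placed in the original columns are all assumptions made for the example.

```python
import math

def extend_with_missing_indicators(rows, fill_value=0.0):
    """Append one binary indicator column per feature that has missing values.

    Assumptions (not taken from the paper): missing entries are encoded as
    NaN, and the original column is filled with a constant placeholder.
    """
    n_cols = len(rows[0])
    # Features that contain at least one missing entry get an indicator.
    missing_cols = [j for j in range(n_cols)
                    if any(math.isnan(r[j]) for r in rows)]
    extended = []
    for r in rows:
        filled = [fill_value if math.isnan(v) else v for v in r]
        flags = [1.0 if math.isnan(r[j]) else 0.0 for j in missing_cols]
        extended.append(filled + flags)
    return extended

nan = float("nan")
example = [[1.0, nan, 3.0],
           [4.0, 5.0, nan],
           [7.0, 8.0, 9.0]]
extended = extend_with_missing_indicators(example)
for row in extended:
    print(row)
```

In this toy input, two of the three features have missing entries, so each row gains two indicator columns. A tree-based model can then split on an indicator directly, which is one way missingness that is itself informative can be exploited.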
2. The computational cost of RJ MCMC (burn-in: 100k samples; post-burn: 5k samples) is non-trivial but underexplored. Please benchmark runtime against simpler methods (e.g., single DT, Random Forest) and discuss scalability. Could parallelization or variational approximations expedite sampling?
3. Only one synthetic (XOR) and one real-world dataset were tested. Please validate on additional UCI/standard datasets with controlled missingness mechanisms (MCAR, MAR, MNAR) to generalize claims beyond finance. The importance of Ext’s 14 binary indicators (Fig. 4) is not analyzed. Please discuss whether specific indicators (e.g., for debt ratios) drive predictions, enhancing model interpretability.
4. Table 2 (30+ rows) is overwhelming. Please condense Table 2 to critical thresholds (e.g., max F1, Youden’s index) and move full data to supplementary material.
5. Limited comparison with modern missing-data techniques (e.g., MICE, GAIN, or deep learning imputers). Please add a baseline using XGBoost/Random Forest with built-in missing-value handling to contextualize the 92.2% accuracy gain.
6. Uncertainty quantification (Fig. 5) is insightful but lacks guidance on translating posterior distributions into actionable decisions (e.g., risk thresholds for intervention). Please include a case study showing how uncertainty estimates directly impact a financial decision pipeline.
7. Critical implementation details (e.g., the tuning protocol for the RJ MCMC proposal variances) are omitted. Please publish code/data (or a synthetic data generator) and specify hyperparameter search ranges.
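To make the request in comment 3 concrete, the three missingness mechanisms can be simulated on synthetic data along the following lines. This is only a sketch under stated assumptions: the function `apply_missingness`, the two-feature Gaussian data, and the specific masking probabilities are hypothetical choices for illustration and are not taken from the manuscript.

```python
import random

def apply_missingness(rows, mechanism, rate=0.3, seed=0):
    """Mask the second feature of each (x0, x1) row; None marks a missing value.

    MCAR: masking probability is constant.
    MAR:  masking probability depends only on the always-observed x0.
    MNAR: masking probability depends on the value being masked (x1).
    The probabilities below are illustrative assumptions.
    """
    rng = random.Random(seed)
    masked = []
    for x0, x1 in rows:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            p = 2 * rate if x0 > 0 else 0.0
        elif mechanism == "MNAR":
            p = 2 * rate if x1 > 0 else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        masked.append((x0, None if rng.random() < p else x1))
    return masked

# Hypothetical synthetic data: two independent standard normal features.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]

masked = {m: apply_missingness(data, m) for m in ("MCAR", "MAR", "MNAR")}
for name, rows in masked.items():
    n_missing = sum(1 for _, x1 in rows if x1 is None)
    print(name, "missing fraction:", n_missing / len(rows))
```

Running each preprocessing strategy on the same dataset under all three mechanisms would show whether the reported advantage holds only when missingness is informative.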
Author Response
Please see attached
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Please see the attached PDF file.
Comments for author File:
Comments.pdf
Author Response
Please see attached
Author Response File:
Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I congratulate the authors for the excellent improvements made. They have addressed all the points raised in the first review.
I would like to point out the following minor issues for your consideration:
- Lines 86–97 contain information that also appears in lines 98–106 regarding the definition of μj and M. I kindly ask you to check whether this is a typo or an unintended rephrasing.
- Line 125. One sentence ends with "in this paper" and the next one begins with the same phrase. If possible, please rephrase the beginning of the following sentence.
- Figure 1. The rectangles overlap, and the arrows are small and in some parts not clearly visible. Please check Figure 1.
- Line 516. A truncated sentence reads "Tuning protocol". Is this a header or something else?
Author Response
Please see attached.
Author Response File:
Author Response.pdf