Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP
Round 1
Reviewer 1 Report
The paper covers an important topic: the lack of sufficient medical data for developing AI algorithms. However, some issues should be explained and clarified before the paper is suitable for publication.
1. This work lacks a literature review; thus, its novelty and contribution to the field cannot be assessed.
2. The structure of the database used in the experiments should be clearly defined (number of classes, number of samples in each class, etc.).
3. After the artificial data were generated, how were the classification experiments performed and evaluated? Which validation scheme was used, and what was the proportion of real to augmented data in the training sets?
4. What is the purpose of data clustering, given that many other supervised classifiers were already tested?
5. What is the content of Tables 4, 6, and 7?
6. Based on Table 7, the SMOTE approach seems to outperform the WCGAN-GP-based method. What, then, is the reason for using both augmentation techniques?
7. Fig. 4 shows that the PCA distributions of the three analyzed datasets are rather different. Please provide a quantitative measure comparing the similarity of these distributions.
8. What is the purpose of Section 4.5? How does the bivariate analysis contribute to the presented results?
9. Please perform classification experiments in which the training set consists of augmented data only and the test set contains only real data.
Minor editing of English language required.
Author Response
Please find the authors' response to the reviewer comments in the attached file.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments
1. The keywords should be arranged in alphabetical order.
2. Separate the discussion from the results.
3. This section provides a comprehensive outline of the experimental procedures. To improve it further, consider giving more explicit details on the rationale for the choices made during data pre-processing and hyperparameter tuning, and including visualizations to aid interpretation of data similarity. Addressing these points would enhance the clarity and reproducibility of the research and strengthen the overall scientific contribution of the manuscript.
4. Discuss the novelty of the results, particularly in terms of the hybrid approach combining SMOTE augmentation and WCGAN-GP training. Highlight how this approach overcomes limitations and provides a more diverse and realistic representation of the original data compared to existing methods.
5. Strengthen the discussion of the research's novelty and contributions to the field. Discuss the implications of the findings for future research and potential areas of improvement in the hybrid approach. Specifically, highlight how the hybrid approach provides a novel solution for augmenting small tabular datasets, particularly in clinical trial research settings.
Good.
Author Response
Please find the authors' response to the reviewer comments in the attached file.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Thank you for properly addressing all the issues raised in my review. The paper is now suitable for publication.