Peer-Review Record

Generative Adversarial Networks for Synthetic Data Generation in Finance: Evaluating Statistical Similarities and Quality Assessment

AI 2024, 5(2), 667-685; https://doi.org/10.3390/ai5020035

by Faisal Ramzan¹, Claudio Sartori², Sergio Consoli³ and Diego Reforgiato Recupero¹,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Reviewer 4: Jan Lansky

Submission received: 17 April 2024 / Revised: 2 May 2024 / Accepted: 9 May 2024 / Published: 13 May 2024

(This article belongs to the Special Issue AI in Finance: Leveraging AI to Transform Financial Services)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this manuscript, the authors propose a new FinGAN algorithm for synthetic dataset generation and compare its characteristics with those of the given input stock dataset. The topic is interesting and worth investigating. The provided GitHub code, data, and description facilitate the reproducibility of the study.

My remarks are as follows:

In the abstract and “1. Introduction” section, the novelty and authors’ contributions should be clearly explained.

The presentation of the proposed algorithm is too brief – only the last three paragraphs of the “4.1. Model Setting and Parameters” subsection. Please add a flowchart or pseudocode and explain your improvements in detail.

In the “5. Performance Evaluation” section, the choice of the particular input dataset should be motivated. Please describe its origin, main characteristics and deficiencies. In the analytical subsection, the obtained results are only visualised (in tabular and chart form). However, they should be analysed and compared, and the differences should be commented on. Another drawback is the limited verification – only one input dataset and only one alternative GAN algorithm (TabularGAN). The complexity of the proposed algorithm could be compared with that of existing analogs.

In the last section, study limitations should be added.

Technical remark:

Figure 1: The flowchart does not align with the rules for algorithm description (for example, the if block contains only one output). Please copy the original from the cited source or redraw it.

Author Response

Dear Editor,

We would like to thank you and the reviewers for your work and the constructive comments we received that led us to improve the paper substantially.

We completed the requested revision of our manuscript “Generative Adversarial Networks for Synthetic Data Generation in Finance: Evaluating Statistical Similarities and Quality Assessment”, Manuscript ID ai-2994232, submitted to the Journal of Artificial Intelligence.

In the following, each reviewer’s observations and our related responses are organized as a series of questions and answers. All the major changes in the paper have been highlighted in blue.

Thanks again for all your feedback!

Best Regards,

Faisal Ramzan, Claudio Sartori, Sergio Consoli, Diego Reforgiato Recupero.

Reviewer #1:

Comment 1. In this manuscript, the authors propose a new FinGAN algorithm for synthetic dataset generation and compare its characteristics with those of the given input stock dataset. The topic is interesting and worth investigating. The provided GitHub code, data, and description facilitate the reproducibility of the study.

Response 1. We thank the reviewer for his/her feedback.

Comment 2. In the abstract and “1. Introduction” section, the novelty and authors’ contributions should be clearly explained.

Response 2. As indicated, we have better explained the innovations and contributions we bring in the paper.

Comment 3. The presentation of the proposed algorithm is too brief – only the last three paragraphs of “4.1. Model Setting and Parameters'' subsection. Please, add a flowchart or pseudocode and explain your improvements in detail.

Response 3. As indicated, we have added a flowchart of our methodology in Section 4.1 and have detailed each step in the text description within the section.

Comment 4. In “5. Performance Evaluation” section, the choice of the particular input dataset should be motivated. Please, describe its origin, main characteristics and deficiencies. In the analytical subsection, the obtained results are only visualized (in tabular and chart form). However, they should be analyzed, compared and the differences should be commented.

Response 4. The motivations for choosing the input dataset used for evaluating the performance of FinGAN have been described in detail in Section 3, highlighting the dataset's origin, main characteristics, and deficiencies. In particular, the input dataset was carefully selected to address the common challenges encountered in the field of finance, particularly those related to data scarcity, privacy concerns, and the need for high-quality synthetic data generation. The input dataset's characteristics make it a suitable testbed for evaluating FinGAN's ability to generate synthetic data that is statistically similar to real-world financial data, despite the constraints posed by the original data.

Next, in the reported Performance Evaluation section we aimed to demonstrate the efficacy of FinGAN in generating synthetic data that retains the statistical characteristics of this real-world financial dataset. In particular, our detailed evaluation reported in Section 5.2 has been expanded to further analyze and compare the obtained results. In detail, we have now included a statistical analysis of the algorithm's performance using the Friedman test and the Nemenyi post-hoc test, to corroborate our analysis and findings further.

We believe that our results provide statistical evidence that our novel GAN approach, FinGAN, consistently outperforms the classic TabularGAN approach, emphasizing the unique effectiveness of our model. As carefully explained in the section, FinGAN can effectively produce synthetic data that closely resembles the original dataset in statistical terms, thus addressing the issue of data scarcity and enabling robust financial analysis and machine learning model training.

Comment 5. Another drawback is the limited verification – only one input dataset and only one alternative GAN algorithm (TabularGAN). The complexity of the proposed algorithm could be compared with that of existing analogs.

Response 5. We appreciate the reviewer's comment regarding the scope of verification in our study. The input dataset used for evaluating the performance of our proposed Generative Adversarial Network model was indeed carefully chosen to reflect the complexities and challenges specific to the financial domain. We recognize that a broader evaluation incorporating multiple datasets and comparative analysis with several GAN algorithms would provide a more extensive validation of FinGAN's capabilities. However, it's important to note that the current scope of the paper is self-contained, and due to the constraints of the maximum page allowance set by the AI journal, it is not practically feasible to further expand the experimental evaluation within this publication. In this way, we also guarantee that the paper maintains a good focus and conciseness for the reader. We plan to address the comparative complexity and broader verification of FinGAN against various datasets and alternative GAN algorithms in our future work. This upcoming research will aim to further substantiate the efficacy and robustness of FinGAN in a wider range of contexts and against a more diverse set of benchmarks.

Comment 6. In the last section, study limitations should be added.

Response 6. As suggested by the reviewer, we have added Section 5.3 about the limitations of the proposed approach.

Comment 7. Technical remark: Figure 1: The flowchart does not align with the rules for algorithm description (for example, the if block contains only one output). Please copy the original from the cited source or redraw it.

Response 7. We thank the reviewer for his/her observation. As suggested we have fixed the corresponding flowchart.

Reviewer 2 Report

Comments and Suggestions for Authors

This paper introduces FinGAN, a sophisticated Generative Adversarial Network (GAN) model designed specifically for synthesizing continuous data in financial contexts. By employing techniques such as adjusting layer count, configuring neurons, implementing early stopping criteria, and fine-tuning hyperparameters like learning rates and activation functions, the model effectively captures intricate patterns present in original data. Nonetheless, there exist areas ripe for improvement:

1. Further elaboration is needed regarding the rationale for integrating Generative Adversarial Networks, particularly from a data-driven perspective.

2. The comparative analysis within this paper could broaden its scope to encompass a wider range of deep learning models, particularly the latest iterations driven by deep learning techniques.

3. Incorporating a comprehensive algorithmic framework into the paper would prove beneficial, providing a detailed overview of the design rationale underlying this paper's framework.

Comments on the Quality of English Language

Minor editing of English language required.

Author Response

Dear Editor,

We would like to thank you and the reviewers for your work and the constructive comments we received that led us to improve the paper substantially.

Thanks again for all your feedback!

Best Regards,

Faisal Ramzan, Claudio Sartori, Sergio Consoli, Diego Reforgiato Recupero.

Reviewer #2:

Comment 1. This paper introduces FinGAN, a sophisticated Generative Adversarial Network (GAN) model designed specifically for synthesizing continuous data in financial contexts. By employing techniques such as adjusting layer count, configuring neurons, implementing early stopping criteria, and fine-tuning hyperparameters like learning rates and activation functions, the model effectively captures intricate patterns present in original data. Nonetheless, there exist areas ripe for improvement:

Response 1. We thank the reviewer for his/her feedback. We have carefully revised our manuscript following the provided comments, and in the following we give answers to each of the raised points.

Comment 2. Further elaboration is needed regarding the rationale for integrating Generative Adversarial Networks, particularly from a data-driven perspective.

Response 2. As recommended, we have further elaborated on the rationale of integrating Generative Adversarial Networks (GANs) within our approach. In particular, we have incorporated a detailed description of this at the end of Section 4.

Comment 3. The comparative analysis within this paper could broaden its scope to encompass a wider range of deep learning models, particularly the latest iterations driven by deep learning techniques.

Response 3. We acknowledge the reviewer's comment regarding the scope of comparative analysis. While we agree that extending the analysis to include a range of deep learning models would enhance the validation of our proposed FinGAN's capabilities, the current manuscript has been designed to be self-contained. Given the constraints of the maximum page allowance set by the AI journal, expanding the experimental evaluation within this publication is not feasible. We believe that maintaining focus and conciseness is crucial for the readability and coherence of the paper.

Nevertheless, we recognize the importance of a broader comparative analysis, and we plan to explore this in our future work. Upcoming research will involve a more extensive verification of FinGAN against various datasets and alternative deep-learning techniques, which will allow us to provide a more comprehensive assessment of FinGAN's performance. This will enable us to better understand the model's efficacy and robustness across different contexts and against a wider array of benchmarks.

Comment 4. Incorporating a comprehensive algorithmic framework into the paper would prove beneficial, providing a detailed overview of the design rationale underlying this paper's framework.

Response 4. We appreciate the reviewer for the valuable feedback provided. In response to this suggestion, we have included a flowchart of the proposed methodology in Figure 2, which is elaborated upon in detail within Section 4.1 of the paper.

Comment 5. Comments on the Quality of English Language

Minor editing of English language required.

Response 5. We have checked the manuscript for compilation errors and minor typos and corrected them. We have also further refined the English usage in the text and the presentation of the paper. We believe the paper now meets the high standards required for publication in a leading research journal like AI.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper introduces FinGAN, an advanced Generative Adversarial Network (GAN) model specifically designed for synthesizing continuous data in the financial sector. The model effectively captures intricate data patterns by employing techniques such as adjusting layer count, configuring neurons, implementing early stopping criteria, and fine-tuning hyperparameters like learning rates and activation functions. While the paper offers theoretical and practical insights, there are areas that could benefit from further refinement:

1. For the comparative analysis, the authors are encouraged to delve into statistical methods such as the Friedman test, followed by the Nemenyi test, p-test, and Wilcoxon test. These analyses can help emphasize the uniqueness of this paper's model compared to others.

2. The quality of images in the paper needs enhancement, including using clearer image versions and larger font sizes.

3. The abstract and introduction should clearly articulate research gaps and motivations. For instance, stating that "The limited availability of data can hinder the training and testing of machine learning models" only addresses a common field deficiency. The authors should meticulously outline these points in the introduction.

4. In Table 1's Evaluation Methods, aside from KL Divergence and Wasserstein Distance, the authors should consider including a broader range of divergence and distance measurement formulas for a comprehensive evaluation.

Comments on the Quality of English Language

Minor editing of English language required.

Author Response

Dear Editor,

We would like to thank you and the reviewers for your work and the constructive comments we received that led us to improve the paper substantially.

Thanks again for all your feedback!

Best Regards,

Faisal Ramzan, Claudio Sartori, Sergio Consoli, Diego Reforgiato Recupero.

Reviewer #3:

Comment 1. This paper introduces FinGAN, an advanced Generative Adversarial Network (GAN) model specifically designed for synthesizing continuous data in the financial sector. The model effectively captures intricate data patterns by employing techniques such as adjusting layer count, configuring neurons, implementing early stopping criteria, and fine-tuning hyperparameters like learning rates and activation functions. While the paper offers theoretical and practical insights, there are areas that could benefit from further refinement:

Response 1. We thank the reviewer for his/her feedback.

Comment 2. For the comparative analysis, the authors are encouraged to delve into statistical methods such as the Friedman test, followed by the Nemenyi test, p-test, and Wilcoxon test. These analyses can help emphasize the uniqueness of this paper's model compared to others.

Response 2. We appreciate the reviewer's suggestion to utilize statistical methods for a comparative analysis. We have indeed performed the recommended Friedman and Nemenyi tests. Our results provide statistical evidence that our novel GAN approach, FinGAN, consistently outperforms the classic TabularGAN approach. Specifically, according to the Nemenyi test, the average rank difference is statistically significant at the 1% level. We believe that these results strongly emphasize the unique effectiveness of our model compared to others. The detailed results of these tests have been incorporated in Section 5.2 of the revised manuscript.
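For readers unfamiliar with the procedure named in this response, the following is a minimal sketch of a Friedman test statistic followed by a Nemenyi critical-difference check. It is not the authors' code. The three method columns and all scores are hypothetical placeholders (lower is better), the statistic omits any tie correction, and `q_alpha = 2.343` is assumed to be the commonly tabulated Studentized-range-based value for three methods at the 5% level.

```python
import math

def average_ranks(scores):
    """Rank methods within each row (1 = lowest score), averaging tied ranks,
    then return the mean rank of each method over all rows."""
    ranked_rows = []
    for row in scores:
        order = sorted(range(len(row)), key=lambda j: row[j])
        ranks = [0.0] * len(row)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for pos in range(i, j + 1):
                ranks[order[pos]] = tied_rank
            i = j + 1
        ranked_rows.append(ranks)
    n, k = len(ranked_rows), len(ranked_rows[0])
    return [sum(r[j] for r in ranked_rows) / n for j in range(k)]

def friedman_statistic(mean_ranks, n):
    """Friedman chi-square statistic (no tie correction) from mean ranks."""
    k = len(mean_ranks)
    sum_sq = sum(r * r for r in mean_ranks)
    return 12 * n / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)

def nemenyi_critical_difference(k, n, q_alpha):
    """Two methods differ significantly if their mean ranks differ by more than this."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# Hypothetical distances for 3 methods (columns) on 6 features (rows).
scores = [
    [0.10, 0.30, 0.20],
    [0.12, 0.35, 0.25],
    [0.08, 0.28, 0.18],
    [0.15, 0.40, 0.22],
    [0.11, 0.33, 0.21],
    [0.09, 0.31, 0.19],
]
n, k = len(scores), len(scores[0])
mean_ranks = average_ranks(scores)
chi2 = friedman_statistic(mean_ranks, n)
cd = nemenyi_critical_difference(k, n, q_alpha=2.343)  # assumed table value
significant_pairs = [
    (a, b)
    for a in range(k)
    for b in range(a + 1, k)
    if abs(mean_ranks[a] - mean_ranks[b]) > cd
]
print(mean_ranks, chi2, cd, significant_pairs)
```

In this toy setup only the first and second methods are separated by more than the critical difference; with real data, the rows would be the evaluation cases over which the methods are ranked.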

Comment 3. The quality of images in the paper needs enhancement, including using clearer image versions and larger font sizes.

Response 3. As suggested by the reviewer, we have enhanced the resolution of all the images present in the paper.

Comment 4. The abstract and introduction should clearly articulate research gaps and motivations. For instance, stating that "The limited availability of data can hinder the training and testing of machine learning models" only addresses a common field deficiency. The authors should meticulously outline these points in the introduction.

Response 4. We appreciate the reviewer's feedback. In response, we have enhanced the abstract by providing additional details regarding the innovative contributions outlined in our paper. Additionally, we have incorporated two examples in the introduction illustrating the potential consequences of unreliable data within the finance and health sectors.

Comment 5. In Table 1's Evaluation Methods, aside from KL Divergence and Wasserstein Distance, the authors should consider including a broader range of divergence and distance measurement formulas for a comprehensive evaluation.

Response 5. To respond to the reviewer’s request, we believe that the Energy Distance and the Maximum Mean Discrepancy (MMD) are two additional metrics well-suited for comparing the similarity between the generated synthetic data and the original dataset. Furthermore, in Section 5.2 (Analysis of the Results), we have provided a statistical analysis of the results using the Friedman test and the Nemenyi post-hoc test, to corroborate our analysis and findings further. These enhancements to our methodology and analysis serve to strengthen the validity of our findings and offer a more detailed evaluation of the FinGAN approach. We have included this in the revised manuscript, providing further evidence that supports the efficacy of the FinGAN model in generating high-quality synthetic financial data.
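As an illustration of the two additional metrics mentioned in this response, the sketch below computes a squared energy distance and a squared Maximum Mean Discrepancy for one-dimensional samples. It is an assumption-laden toy, not the evaluation code from the paper: the sample values are invented, both estimators are the simple biased (V-statistic) forms, and the Gaussian kernel with `bandwidth=1.0` is an arbitrary choice.

```python
import math
from itertools import product

def mean_pairwise(xs, ys, fn):
    """Average of fn(x, y) over all pairs drawn from xs and ys."""
    return sum(fn(x, y) for x, y in product(xs, ys)) / (len(xs) * len(ys))

def energy_distance_sq(xs, ys):
    """Squared energy distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    def dist(a, b):
        return abs(a - b)
    return (2 * mean_pairwise(xs, ys, dist)
            - mean_pairwise(xs, xs, dist)
            - mean_pairwise(ys, ys, dist))

def mmd_sq(xs, ys, bandwidth=1.0):
    """Squared MMD with a Gaussian kernel (biased estimate)."""
    def kernel(a, b):
        return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))
    return (mean_pairwise(xs, xs, kernel)
            + mean_pairwise(ys, ys, kernel)
            - 2 * mean_pairwise(xs, ys, kernel))

# Invented samples: "near" mimics "real" closely, "far" is shifted away.
real = [0.0, 1.0, 2.0, 3.0]
near = [0.1, 1.1, 1.9, 3.2]
far = [5.0, 6.0, 7.0, 8.0]

for name, synthetic in (("near", near), ("far", far)):
    print(name, energy_distance_sq(real, synthetic), mmd_sq(real, synthetic))
```

Both quantities are zero when the two samples coincide and grow as the synthetic sample drifts away from the real one, which is why they can complement KL divergence and Wasserstein distance as similarity measures.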

Comment 6. Comments on the Quality of English Language

Minor editing of English language required.

Response 6. We have checked the manuscript for compilation errors and minor typos and corrected them. We have also further refined the English usage in the text and the presentation of the paper. We believe the paper now meets the high standards required for publication in a leading research journal like AI.

Reviewer 4 Report

Comments and Suggestions for Authors

It's a simple idea, but I like it. It can be very useful and it can generate a lot of citations, because many researchers can use it for their own work. The quality of the text is OK; a slight improvement could be made in the experiment part.

Abstract

- add some concrete examples of use

Introduction

- It's short, but contains all necessary information.

- Maybe you can write more about usage; try to find some articles for which your method will be useful.

- Line 42 - write more about [10-13]; what are the differences?

Related works

- good work, many related works 2022-23, maybe you can find some more from 2024

Chapters 3 and 4

- description is easy to read, I didn't find any problems there

Chapter 5

- the experiments look very simple, but acceptable

Conclusion

- try to promote your work more and add more about usage; it can inspire other authors to use your work

Author Response

Dear Editor,

We would like to thank you and the reviewers for your work and the constructive comments we received that led us to improve the paper substantially.

Thanks again for all your feedback!

Best Regards,

Faisal Ramzan, Claudio Sartori, Sergio Consoli, Diego Reforgiato Recupero.

Reviewer #4:

Comment 1. It's a simple idea, but I like it. It can be very useful and it can generate a lot of citations, because many researchers can use it for their own work. The quality of the text is OK; a slight improvement could be made in the experiment part.

Response 1. We thank the reviewer for his/her feedback.

Comment 2. Abstract - add some concrete examples of use.

Response 2. We value the reviewer's feedback and have made improvements accordingly. Specifically, we have enriched the abstract with more detailed explanations of the innovative contributions presented in our paper. Furthermore, we have integrated two examples into the introduction to highlight the potential impacts of unreliable data in the finance and health sectors.

Comment 3. Introduction

- It's short, but contains all necessary information.

- Maybe you can write more about usage, try to find some articles for which your method will be useful.

- Line 42 - write more about [10-13]; what are the differences?

Response 3. Thank you for your feedback. We appreciate you pointing out the issue with the introduction. When converting the paper from LaTeX to Word, we mistakenly changed the citation format from [10,13] to [10-13]. This was corrected by replacing the dash with the comma, indicating two specific references rather than a range. Additionally, we have expanded on the explanation for references [10] and [13] (which now have become [16] and [17]).

Comment 4. Related works

- good work, many related works 2022-23, maybe you can find some more from 2024

Response 4. We searched online and discovered another work from 2023 that has recently been published. Following your suggestion, we have incorporated it into the related work section.

Comment 5. Chapters 3 and 4

- description is easy to read, I didn't find any problems there.

Response 5. We are happy to read that, thanks!

Comment 6. Chapter 5

- the experiments look very simple, but acceptable

Response 6. Thanks! Please note also that we have further included in Section 5.2 a statistical analysis of the results using the Friedman test and the Nemenyi post-hoc test, in order to further corroborate our analysis and findings.

Comment 7. Conclusion

- try to promote your work more and add more about usage; it can inspire other authors to use your work

Response 7. Following this reviewer’s comment, in order to further stimulate the reader on the subject and promote our work and its usage, we have now added Section 5.4, "Benefits and Usage of Synthetic Datasets," to the manuscript. This section elucidates the advantages and potential applications of the synthetic datasets generated by FinGAN.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

In this second revision, the authors have addressed all of my concerns. My recommendation is to accept the manuscript as it is.
