Article
Peer-Review Record

The E(G)TL Model: A Novel Approach for Efficient Data Handling and Extraction in Multivariate Systems

Appl. Syst. Innov. 2024, 7(5), 92; https://doi.org/10.3390/asi7050092
by Aleksejs Vesjolijs
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 7 May 2024 / Revised: 5 September 2024 / Accepted: 24 September 2024 / Published: 26 September 2024
(This article belongs to the Section Information Systems)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. The "Related Works" section is not comprehensive enough to give readers a clear picture of the real state of the art. The author should engage with related literature on data engineering, AI integration, and digital ecosystems to situate the EGTL model within the broader research landscape and establish its relevance.

2. The author is encouraged to provide more detailed explanations or examples of how the Fusion and Alliance stores operate within the EGTL framework to enhance the reader's understanding.

3. Should consider including case studies or practical examples to demonstrate the real-world application and effectiveness of the EGTL model in data integration scenarios.

4. Should discuss potential challenges or limitations of implementing the EGTL model to provide a balanced perspective on its feasibility and scalability.

5. In experiments, invite feedback from experts in the field of data engineering and AI integration to validate the EGTL model's theoretical framework and practical implications.

Comments on the Quality of English Language

Too much repetition in the text.

Author Response

Comments 1: The "Related Works" section is not comprehensive enough to give readers a clear picture of the real state of the art. The author should engage with related literature on data engineering, AI integration, and digital ecosystems to situate the EGTL model within the broader research landscape and establish its relevance.

Response 1: Thank you for your insightful comments and suggestions regarding the "Related Works" section of my paper. I appreciate your feedback on the need to provide a more comprehensive overview of the state of the art and to engage with the literature on data engineering, AI integration, and digital ecosystems.

In response, I have significantly expanded the "Related Works" section to include a more thorough examination of key research in these areas. Specifically, I have included additional references to foundational works in data engineering, the integration of artificial intelligence (AI) in ETL processes, and the evolving role of digital ecosystems in modern data architectures. This literature review now more explicitly situates the EGTL model within the broader research landscape.

In particular, I have added a review of the Medallion Architecture and compared it with EGTL, ETL, and ELT, including a detailed discussion of how the EGTL model differentiates itself by incorporating generative techniques. I hope these revisions address your concerns and provide a clearer contextualization of the EGTL model within the current state of the art. I remain open to any further suggestions you may have to improve the manuscript.

Comments 2: The author is encouraged to provide more detailed explanations or examples of how the Fusion and Alliance stores operate within the EGTL framework to enhance the reader's understanding.

Response 2: Thank you for the suggestion. I agree that a more detailed explanation is warranted. I have added it and updated the corresponding tables. In addition, I have elaborated on examples of the Fusion and Alliance stores as applied to the Hyperloop Decision-Making Ecosystem in the Experiments section.

Comments 3: Should consider including case studies or practical examples to demonstrate the real-world application and effectiveness of the EGTL model in data integration scenarios.

Response 3: Thank you for your valuable suggestion regarding the inclusion of case studies or practical examples to demonstrate the real-world application of the EGTL model.

In response to your comment, I have added an Experiments section that outlines preliminary results from testing the EGTL model in a prototype web-based Hyperloop Decision-Making Ecosystem. This section aims to provide a practical demonstration of the model's capabilities in controlled scenarios.

However, I would like to respectfully clarify that detailed real-world case studies fall outside the scope of the current paper, which is primarily focused on establishing the theoretical foundations and high-level solution architecture of the EGTL model and serves as a bridge for practical implementation. Future research will include more comprehensive empirical studies and case-specific validations to demonstrate the model’s application in real-world data integration scenarios.

I appreciate your understanding, and I hope the experiments provided offer some initial insights into the model's effectiveness and limitations.

Comments 4: Should discuss potential challenges or limitations of implementing the EGTL model to provide a balanced perspective on its feasibility and scalability.

Response 4: Thank you for your feedback on the need to discuss potential challenges and limitations of implementing the EGTL model. I have taken your suggestion into account and expanded the Discussions section to address problems and limitations related to GenAI adoption in the model, and I have elaborated on potential mitigation strategies for the challenges identified.

Comments 5: In experiments, invite feedback from experts in the field of data engineering and AI integration to validate the EGTL model's theoretical framework and practical implications.

Response 5: Thank you for your insightful suggestion to invite feedback from experts in data engineering and AI integration as part of the experimental validation of the EGTL model.

I have implemented an Experiments section to provide initial insights into the model's theoretical framework and practical performance. However, I agree that gathering feedback from field experts is important for a more comprehensive validation. This is indeed a valuable suggestion, and I plan to conduct an industry survey and incorporate expert reviews in future phases of this research, which will include more extensive empirical testing and case studies.

I appreciate your forward-thinking recommendations, which will be an essential part of future research. I have also made a significant effort to improve the English language throughout the manuscript and have addressed many instances of repetition to enhance clarity and readability.

I hope these additions address your concerns, and I welcome any further feedback.

Best regards,

Aleksejs Vesjolijs

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes a novel model named EGTL, which integrates Generative Artificial Intelligence into the traditional ETL process before the loading phase to enhance data handling and extraction. EGTL can process data of different formats and handle both batch and streaming data. The paper describes the methodology and framework of EGTL throughout, but does not carry out any experiments to verify the effectiveness of the method, including evaluation indicators and concrete application examples.

1. In Section 3 paragraph four, “Domain ownership, federated governance, data as product, self-serve data infrastructure platform to enable platform-wide thinking is a backbone of data mesh Data mesh offers decentralized …”. Is there a typo in “…data mesh Data mesh …”?

2. In Section 3, paragraph five states that “The software development quality model for ETL embraces the following aspects: accuracy, efficiency, scalability, and maintainability.” However, this work does not conduct any experiment to validate EGTL against these quantitative evaluation aspects.

3. In Table 2, both Data Augmentation’s Example Applications and Enhancing Data Quality’s Description are “Generating additional data points in sparse datasets”. What is the difference between the two functions?

4. In Section 4 paragraph 10, “User interface is connected using API to Data Lake.” And in paragraph 13 “An essential component of this design is the establishment of a data lake,…”. Does that mean the target data structure generated by the EGTL model is pooled into an API system such as the Hadoop Distributed System?

5. The EGTL model aims to combine various data sources and transform them into a target data structure. It is hard to visualize a data structure, but a specific User Interface (UI) application after EGTL and ETL could be provided. Please provide a visual example of the application.

6. In Section 5, the line spacing of formula 1 and the following line is abnormal; please typeset carefully.

7. In Section 7 paragraph 3, “As IT project personnel require adaptation and training to be able to operate with 441 novel big data tools, - traditional ETL process also should be adapted.” There is a print error “-”.

8. Additionally, experiment configurations should be refined, and additional experiments conducted to bolster the conclusions drawn from the methodology.

Comments on the Quality of English Language

It can be improved.  

Author Response

Comments 1: In Section 3 paragraph four, “Domain ownership, federated governance, data as product, self-serve data infrastructure platform to enable platform-wide thinking is a backbone of data mesh Data mesh offers decentralized …”. Is there a typo in “…data mesh Data mesh …”?

Response 1:

Dear Reviewer,

Thank you for pointing out the possible typo in Section 3. I have reviewed the text and corrected the issue with the repeated phrase “data mesh.” The revised sentence now reads more clearly and without repetition.

Comments 2: In Section 3, paragraph five states that “The software development quality model for ETL embraces the following aspects: accuracy, efficiency, scalability, and maintainability.” However, this work does not conduct any experiment to validate EGTL against these quantitative evaluation aspects.

Response 2: Thank you for your valuable comment on the need for validation of the software development quality model in relation to EGTL. I have since added an Experiments section where I address the practical application of the EGTL model and provide insights into its performance. This section identifies key challenges and limitations while offering preliminary validation of aspects such as scalability, efficiency, and maintainability, on which I elaborate in the Discussions and Conclusions.

However, I acknowledge that more detailed, quantitative evaluations, particularly in terms of accuracy, will be explored in future work as part of a broader empirical validation effort.

Comments 3: In Table 2, both Data Augmentation’s Example Applications and Enhancing Data Quality’s Description are “Generating additional data points in sparse datasets”. What is the difference between the two functions?

Response 3: Thank you for your observation regarding the duplication in Table 2. I have fully revised and corrected the table to address the issue. Specifically, I have clarified the distinction between Data Augmentation and Enhancing Data Quality. Additionally, I introduced a new aspect related to code generation by GenAI to further enrich the table’s content.

I appreciate your careful review and hope this revision resolves the issue.

Comments 4: In Section 4 paragraph 10, “User interface is connected using API to Data Lake.” And in paragraph 13 “An essential component of this design is the establishment of a data lake,…”. Does that mean the target data structure generated by the EGTL model is pooled into an API system such as the Hadoop Distributed System?

Response 4: Thank you for your comment regarding the wording and potential ambiguity in Section 4. I have revised the related section to clarify the relationship between the user interface, API, and the data lake. Specifically, I clarified that while the user interface is connected via API to the data lake, the target data structure generated by the EGTL model is not simply pooled into an API system like Hadoop. Instead, the EGTL model integrates with data storage frameworks such as Hadoop Distributed File System (HDFS) for managing large-scale data, but it operates in a more comprehensive, structured process involving multiple stages, including fusion, staging, and alliance stores.

I believe these revisions address the ambiguity, and I appreciate your feedback on improving the clarity of the manuscript.

Comments 5: The EGTL model aims to combine various data sources and transform them into a target data structure. It is hard to visualize a data structure, but a specific User Interface (UI) application after EGTL and ETL could be provided. Please provide a visual example of the application.

Response 5: 

Thank you for your valuable suggestion regarding the visualization of the data structure and User Interface (UI) application following the EGTL process.

I agree that visualizing the data structure itself can be challenging, but to address your comment, I have included a visual example of the specific User Interface (UI) application used after the EGTL and ETL processes. This UI, implemented in Streamlit, demonstrates how the processed data is visualized and integrated into a decision-making dashboard, providing a practical example of the EGTL model’s real-world application.

Comments 6: In Section 5, the line spacing of formula 1 and the following line is abnormal; please typeset carefully.

Response 6: Thank you! It is fixed.

Comments 7: In Section 7 paragraph 3, “As IT project personnel require adaptation and training to be able to operate with 441 novel big data tools, - traditional ETL process also should be adapted.” There is a print error “-”.

Response 7: Thank you! I revised and fixed print errors.

Comments 8: Additionally, experiment configurations should be refined, and additional experiments conducted to bolster the conclusions drawn from the methodology.

Response 8: 

Thank you for your suggestion regarding the refinement of experiment configurations and the need for additional experiments to support the conclusions.

I have fully implemented an Experiments section in the final version of the manuscript. This section outlines the experimental setup, configurations, difficulty levels, and results obtained, which provide practical insights into the EGTL model. The experiments conducted cover a range of configurations, addressing all phases of the EGTL process.

I believe the current experiments effectively bolster the conclusions drawn from the methodology. However, I acknowledge the importance of ongoing refinement, and future research will continue to build on these initial findings with even more comprehensive experiments.

Thank you once again for your valuable feedback! I have proofread the manuscript with a native speaker and made improvements to the English language for better clarity and readability.

Best regards,
Aleksejs Vesjolijs

 

Reviewer 3 Report

Comments and Suggestions for Authors

The author presented a promising conceptual framework that extends the traditional ETL process by introducing the new "Generate" step by exploiting rapidly evolving generative AI technologies. While the framework sounds reasonable theoretically, there is a lack of practical examples. It would be nice if the author could provide an example of how it will work in practice. A practical use case example differentiating ETL and EGTL frameworks side-by-side would be nice for understanding the frameworks. 

 

Author Response

Comments 1: The author presented a promising conceptual framework that extends the traditional ETL process by introducing the new "Generate" step by exploiting rapidly evolving generative AI technologies. While the framework sounds reasonable theoretically, there is a lack of practical examples. It would be nice if the author could provide an example of how it will work in practice. A practical use case example differentiating ETL and EGTL frameworks side-by-side would be nice for understanding the frameworks. 

Response 1:

Dear Reviewer,

Thank you for your positive feedback on the conceptual framework and your suggestion to provide a practical example to better differentiate the ETL and EGTL frameworks.

In response to your comment, I have included an Experiments section that demonstrates how the EGTL model operates in practice. This section provides practical examples of data extraction, generation, and transformation using Generative AI within the EGTL process. Additionally, I offer comparisons between the traditional ETL and EGTL models to highlight the differences and advantages of the "Generate" step in real-world scenarios.
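As a minimal illustration of the side-by-side comparison requested above, the following Python sketch contrasts an ETL run with an EGTL run on a toy dataset. Everything in it is an assumption made for illustration and is not taken from the manuscript: the function names, the sample records, and in particular the `generate` step, which uses a simple mean fill as a stand-in for a generative model.

```python
from statistics import mean


def extract():
    """Return hypothetical sensor records; None marks a missing reading."""
    return [
        {"sensor": "a", "speed_kmh": 100.0},
        {"sensor": "b", "speed_kmh": None},
        {"sensor": "c", "speed_kmh": 140.0},
    ]


def generate(rows):
    """Stand-in for the Generate step (assumption, not the paper's method).

    Missing readings are filled with the mean of the known readings and
    flagged, so downstream consumers can tell generated values apart.
    A real EGTL pipeline would call a generative model at this point.
    """
    known = [r["speed_kmh"] for r in rows if r["speed_kmh"] is not None]
    fill = mean(known)
    out = []
    for r in rows:
        if r["speed_kmh"] is None:
            out.append({**r, "speed_kmh": fill, "generated": True})
        else:
            out.append({**r, "generated": False})
    return out


def transform(rows):
    """Convert km/h to m/s; records that still lack a reading are dropped."""
    return [
        {**r, "speed_ms": round(r["speed_kmh"] / 3.6, 2)}
        for r in rows
        if r["speed_kmh"] is not None
    ]


def load(rows, target):
    """Append the transformed records to an in-memory target store."""
    target.extend(rows)
    return target


def run_etl():
    # Extract -> Transform -> Load
    return load(transform(extract()), [])


def run_egtl():
    # Extract -> Generate -> Transform -> Load
    return load(transform(generate(extract())), [])


if __name__ == "__main__":
    print("ETL :", run_etl())
    print("EGTL:", run_egtl())
```

Under these assumptions, the ETL run drops the record with the missing reading, while the EGTL run keeps it and flags it as generated before loading.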

I hope this addition clarifies the practical application of the framework, and I appreciate your valuable feedback.

Best regards,
Aleksejs Vesjolijs

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

No
