Article
Peer-Review Record

Enhanced Data Processing and Machine Learning Techniques for Energy Consumption Forecasting

Electronics 2024, 13(19), 3885; https://doi.org/10.3390/electronics13193885
by Jihye Shin 1,†, Hyeonjoon Moon 2,†, Chang-Jae Chun 3, Taeyong Sim 3, Eunhee Kim 4 and Sujin Lee 3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 20 August 2024 / Revised: 23 September 2024 / Accepted: 27 September 2024 / Published: 30 September 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors
  • The manuscript is clear and relevant for the field. It is suggested to change its structure (Sections 2 and 3); see the comments in the attached PDF file.
  • The cited references are well selected and mostly refer to recent publications (within the last 5 years). Some of them are older, but important and crucial for the main topic of the paper.
  • The manuscript is scientifically sound; however, the original contribution of the paper should be better emphasized in the context of other research and approaches. The literature review should be rebuilt according to the comments in the attached PDF file.
  • The manuscript's results are reproducible based on the details given in the methods section; the method is well presented and supported by the results.
  • Figures/tables/images/schemes are appropriate and well prepared. They properly show the data and are easy to interpret and understand. However, the references to the figures should be rebuilt according to the comments in the attached PDF file. All the data is interpreted appropriately and consistently throughout the manuscript.
  • The conclusions are consistent with the evidence and arguments presented in the paper. Future works should be added according to the comments provided in the attached PDF file.

Comments for author File: Comments.pdf

Author Response

Thank you very much for taking the time to review this manuscript. We appreciate your detailed feedback and constructive suggestions. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the re-submitted files.

 

Comments 1: The manuscript is clear and relevant for the field. It is suggested to change its structure (Sections 2 and 3); see the comments in the attached PDF file.

Response 1: Thank you for your positive feedback regarding the clarity and relevance of the manuscript. We appreciate your suggestion to revise the structure. We have reviewed your detailed comments in the attached PDF file and have made the recommended changes to these sections. After carefully considering whether to merge sections 2 and 3 or to revise section 2 individually, we have decided to proceed with the revision of section 2 to improve its depth and organization. We appreciate your input, which has been invaluable in improving the clarity and organization of the manuscript.  If you have any further comments or require additional modifications, please let us know.

(Page 2, Section 2, Line 67-154) 2. Literature review

 

Comments 2: The cited references are well selected and mostly refer to recent publications (within the last 5 years). Some of them are older, but important and crucial for the main topic of the paper.

Response 2: Thank you for your valuable feedback regarding the selection of references. As you noted, the majority of the references are from recent publications, while some older references were included due to their foundational importance to the main topic. These references were carefully chosen to provide essential background and context, which we believe is crucial for supporting the paper's argument. We appreciate your comments and will consider this aspect as we refine the manuscript further.

 

Comments 3: The manuscript is scientifically sound; however, the original contribution of the paper should be better emphasized in the context of other research and approaches. The literature review should be rebuilt according to the comments in the attached PDF file.

Response 3: Thank you for acknowledging the scientific soundness of the manuscript. We appreciate your suggestion to better emphasize the original contribution of the paper within the context of existing research and approaches. In response to your feedback, we have revised Section 2 to rebuild the literature review. This updated section now provides a more comprehensive analysis of the importance of models, preprocessing, and the use of XAI in time series prediction. We have highlighted that previous studies typically focused individually on energy prediction, data preprocessing, data analysis, and XAI. In contrast, our research proposes an integrated architecture for energy prediction that combines data preprocessing, analysis, and XAI, with a particular emphasis on preprocessing. We believe this revised literature review more clearly demonstrates the novel contributions of our work. Thank you again for your valuable feedback.

(Page 2, Section 2, Line 67-154) 2. Literature review

 

Comments 4: The manuscript's results are reproducible based on the details given in the methods section; the method is well presented and supported by the results.

Response 4: Thank you for your positive feedback regarding the methods and results sections. We are pleased that you found the methodology well presented and that the results are reproducible based on the details provided. Ensuring that our methods are clear and replicable is a priority for us, and we appreciate your recognition of this aspect. We will continue to ensure that the clarity and transparency of our methods support the reliability of the results throughout the manuscript.

 

Comments 5: Figures/tables/images/schemes are appropriate and well prepared. They properly show the data and are easy to interpret and understand. However, the references to the figures should be rebuilt according to the comments in the attached PDF file. All the data is interpreted appropriately and consistently throughout the manuscript.

Response 5: Thank you for your positive feedback on the figures/tables/images/schemes. We appreciate your comments regarding the references to the figures. In line with your recommendation, we have revised the text to provide reference before the figure and table, following the guidelines mentioned in the attached PDF. We believe these changes improve the flow and clarity of the manuscript. We sincerely appreciate your thoughtful suggestions.

 

Comments 6: The conclusions are consistent with the evidence and arguments presented in the paper. Future works should be added according to the comments provided in the attached PDF file. The conclusions should be extended with future works proposed by the authors.

Response 6: Thank you for your valuable feedback regarding the conclusions of the manuscript. We appreciate your recognition that the conclusions are consistent with the evidence and arguments presented. In response to your suggestion, we have added a section on future works to the conclusion. The revised section is as follows:

(Page 24, Section 5, Paragraph 3, Line 707) Future research will focus on extending the proposed architecture to more advanced models, such as hybrid deep learning frameworks, and exploring its application in different geographical regions and energy systems. In addition, incorporating real-time data from IoT sensors and investigating the integration of renewable energy sources in energy forecasting will be key areas for further study.

We believe these additions enhance the depth of the conclusion and align the manuscript with your suggestions. Thank you again for your insightful contributions.

 

Comments 7: Original contribution? Show that this approach is unique, new, more efficient, etc. (attached PDF file). Here the original contribution is clear; show it in one sentence in the Abstract as well (lines 45-50).

Response 7: Thank you for your feedback regarding the original contribution of the manuscript. The proposed architecture enhances predictive performance through improved preprocessing and model optimization. Additionally, it demonstrates the potential for extending the architecture to various energy prediction domains beyond heat energy, thereby contributing to more efficient energy use and cost reduction. In response to your comments, we have added a concise summary of the contribution to clearly highlight the uniqueness, novelty, and efficiency of our approach.

(Page 1, Abstract, Line 4) Therefore, this study presents a new architecture that highlights the critical role of preprocessing in improving predictive performance and demonstrates its scalability across various energy domains.

We believe this addition addresses your concerns and clearly demonstrates the value of our work. Thank you once again for your valuable insights.

 

Comments 8: Add the paper structure here. Short information about the topics covered in the following sections should be provided at the end of the Introduction. (attached PDF file)

Response 8: Thank you for your valuable feedback. We acknowledge the suggestion to include an overview of the paper's structure at the end of the Introduction. To address this, we have added a summary outlining the topics covered in the subsequent sections of the manuscript. This addition aims to provide readers with a clearer roadmap of the paper's organization. We appreciate your contribution to enhancing the clarity and coherence of our manuscript.

(Page 2, Section 1, Paragraph 6, Line 60) The structure of this paper is as follows: Section 2 reviews the literature on the increasing importance of efficient energy utilization and advancements in energy prediction research. Section 3 provides an overview of the proposed architecture for energy prediction, with a focus on time series data processing. Section 4 presents empirical results, summarizing the data used in experiments with heat and electric energy sources. Finally, Section 5 concludes the paper by summarizing the proposed time series prediction architecture and providing insights into its implications and future work.

 

Comments 9: Very good analysis. Therefore, consider the suggestion mentioned in the comment at the beginning of Section 3. (lines 146-158, 174-199)

Response 9: Thank you for your positive feedback on the analysis. We have reviewed and implemented the necessary revisions based on your suggestions for the beginning of Section 3. We appreciate your guidance and believe that these changes contribute to the overall quality of the manuscript. We have revised section 3 to incorporate your suggestions.

(Page 4, Section 3, Paragraph 1, Line 160) In the process optimization methods stage, data is optimized based on three conditions that consider the patterns and characteristics of time series. These conditions include normalization, data cleaning, data split patterns, and data split ratios, which serve to enhance model performance, reduce training time, and mitigate the impact of outliers. Min-max normalization is applied to address the varying range of feature values, ensuring consistent data scaling. Additionally, eight data cleaning methods are considered to resolve issues such as missing values and duplication. The data split patterns are designed to reflect seasonal and quarterly trends inherent in time series data, while the data split ratios are set to capture the overall trends over time, ensuring a robust and accurate model.

 

Comments 10: This structure does not make sense. Two headers appear one after another (which is not recommended) without text between or below them, and then a figure with its reference appears a few lines below. It should be rebuilt according to the rules and author recommendations. (attached PDF file, 4.1-4.1.1, 4.2-4.2.1)

Response 10: Thank you for your feedback regarding the structure of the manuscript. To address your concerns, we have revised the layout to improve clarity and compliance with the recommended formatting rules. Specifically, we have added explanatory text between the headers to avoid the issue of having two headers one-by-one without intervening content. Additionally, we have adjusted the placement of the figure and its reference to ensure proper integration within the text. We appreciate your suggestions and believe these revisions enhance the manuscript's readability.

(Page 5, Section 3.1, Paragraph 1, Line 180) This section describes the data preparation process in terms of data collection and time correlation. First, it discusses the data collection for the experiment. Subsequently, it examines time correlation to analyze the impact of time on the data.

(Page 11, Section 4.1, Paragraph 1, Line 403) This section presents the experiments conducted using the proposed architecture with heat energy data. Section 4.1.1 provides a description of the data utilized in the experiments. Section 4.1.2 outlines the process optimization methods, as described in Section 3.2. Finally, Section 4.1.3 compares and evaluates five different models based on the results obtained in Section 4.1.2, across various scenarios.

(Page 17, Section 4.2, Paragraph 1, Line 511) This section details the experiments conducted using the proposed architecture with both district heat energy and electric energy to evaluate its scalability across different domains. Section 4.2.1 provides an overview of the data used in these experiments, whereas Section 4.2.2 details the process optimization techniques described in Section 3.2. Finally, Section 4.2.3 delivers a comparative evaluation of five distinct models based on the outcomes from Section 4.2.2, across different scenarios.

 

Comments 11: “The collected data was optimized using the defined process optimization methods” – it is not clear where these methods are defined. Provide references to the previous section/subsection/literature.

Response 11: Thank you for your comment regarding the clarity of the optimization methods. To clarify, we have revised the statement as follows:

(Page 13, Section 4.1.2, Paragraph 1, Line 436) The dataset of heat energy was optimized using the process optimization methods described in section 3.2.

This provides a direct reference to the section where the optimization methods are explained in detail. We appreciate your feedback and hope this resolves the issue. Please let us know if further clarification is needed.

 

Comments 12: Please consider the possibility of limiting personal forms in sentences. Instead of "we have done" or "we optimized" you can use "The authors propose/have proposed" or "The data are optimized by...". It is recommended to minimize the use of personal phrases in scientific articles. Check this question within the whole paper and change sentences appropriately. (attached PDF file)

Response 12: Thank you for your valuable suggestion regarding the use of personal forms in the manuscript. We appreciate your recommendation to use more objective language. In response, we have thoroughly reviewed the entire paper and replaced personal phrases with more formal expressions. We believe these changes enhance the objectivity of the manuscript and align with standard scientific writing practices. Thank you again for your helpful feedback.

(Page 2, Section 1, Paragraph 3, Line 37) A deep learning-based architecture is proposed that emphasizes preprocessing techniques to effectively harness time series data and address these challenges. Effective preprocessing methods are first focused on, accounting for the complex patterns and characteristics of time series data. The data is then optimized through various scenarios, considering the inherent characteristics of time series, to enhance the performance of predictive models. The generality of the proposed preprocessing methodology is validated by applying it to various machine learning (ML) algorithms and energy types. Performance comparisons and analyses of multiple machine learning models using the optimized data are conducted. Furthermore, the same methodology is applied to both thermal and electrical energy, confirming its applicability to various energy types.

(Page 2, Section 1, Paragraph 5, Line 58) It is anticipated that the study will significantly enhance energy prediction capabilities, thereby contributing to greater energy efficiency and cost reduction.

(Page 4, Section 3, Paragraph 2, Line 170) Model optimization was performed using various methods to enhance performance. In this study, the data is optimized through enhanced preprocessing and cleaning of time-series energy data, along with feature engineering using Shapley Additive Explanations (SHAP) [28], to achieve optimal performance.

(Page 7, Section 3.2, Paragraph 1, Line 226) Analyzing the data to uncover these relationships and incorporating them into the model can enhance prediction accuracy [30]. 

(Page 7, Section 3.2, Paragraph 2, Line 243) The MinMaxScaler function from the Scikit-Learn library [36] was employed to normalize the dataset features within the 0 to 1 range during both the training and testing phases of the model, following recommendations and findings from recent studies.
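To illustrate this normalization step, the following is a minimal Python sketch, not the authors' actual code; the feature matrices X_train and X_test are hypothetical placeholders for the input variables, and the common practice of fitting the scaler on the training split and reusing it for the test split is assumed.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrices standing in for the paper's input variables.
X_train = np.random.rand(100, 5) * 50.0   # training split
X_test = np.random.rand(20, 5) * 50.0     # testing split

# Fit the scaler on the training data only, then reuse it for the test data
# so that all features lie in the [0, 1] range.
scaler = MinMaxScaler(feature_range=(0, 1))
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```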

(Page 7, Section 3.2.1, Paragraph 1, Line 253) An appropriate method is selected by considering eight missing value processing techniques.

(Page 8, Section 3.2.1, Paragraph 4, Line 286) Among the eight preprocessing methods described, the most suitable method was selected based on its ability to enhance predictive performance while maintaining data integrity.
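As an illustration of this kind of data cleaning, the sketch below shows a few common missing-value handling options in pandas; these are generic examples under an assumed column name and are not necessarily the eight methods evaluated in the paper.

```python
import numpy as np
import pandas as pd

# Hypothetical daily energy series with gaps; the column name is an assumption.
df = pd.DataFrame(
    {"energy": [1.2, np.nan, 1.5, np.nan, 1.9, 2.1]},
    index=pd.date_range("2016-01-01", periods=6, freq="D"),
)

# A few common cleaning options (generic examples, not the paper's specific methods):
dropped = df.dropna()                          # remove rows containing missing values
ffilled = df.ffill()                           # forward fill from the last valid observation
interpolated = df.interpolate(method="time")   # time-based linear interpolation
mean_filled = df.fillna(df["energy"].mean())   # impute with the column mean
print(interpolated)
```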

(Page 8, Section 3.2.2, Paragraph 2, Line 306) Considering data split patterns in the way described above helps avoid being bound by the seasonal characteristics of the data.

(Page 10, Section 3.3, Paragraph 1, Line 349) In this experiment, five models—XGBoost, LightGBM, CatBoost, MLP, and LSTM—were adopted to validate the effectiveness of the proposed method on time series energy data.

(Page 10, Section 3.3, Paragraph 1, Line 365) By leveraging these proven techniques, comprehensive and reliable predictions were ensured in the study.

(Page 11, Section 3.3, Paragraph 3, Line 383) Five predictive models and four evaluation metrics are used to assess performance under data-level optimization, and a model demonstrating evenly high performance across the four evaluation indicators is selected. Finally, the application of XAI is discussed to enhance reliability by showing the analyzed patterns and which input variables influenced the AI model's results.

(Page 11, Section 3.3, Paragraph 3, Line 388) SHAP [28], one of the XAI models, is used in this study.

(Page 11, Section 4, Paragraph 1, Line 397) Experiments were conducted on the proposed architecture using district heat energy for energy prediction.

(Page 12, Section 4.1.1, Paragraph 2, Line 416) To understand the periodicity of energy usage and its correlation with the outdoor temperature, one of the key variables, the monthly average energy usage and monthly average temperature are visualized in Figure 7.

(Page 13, Section 4.1.2, Paragraph 1, Line 437) Under the proposed architecture, 1,440 scenarios were generated using eight data cleaning methods, four data split patterns, and 45 different combinations of data split ratios and years.
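For readers who want to see how this scenario count arises, a minimal sketch of the combinatorial enumeration is given below; the labels are hypothetical placeholders, since the concrete cleaning methods and split settings are defined in the manuscript itself.

```python
from itertools import product

# Hypothetical labels; the concrete methods and split settings are defined in the manuscript.
cleaning_methods = [f"cleaning_{i}" for i in range(1, 9)]     # 8 data cleaning methods
split_patterns = [f"pattern_{i}" for i in range(1, 5)]        # 4 data split patterns
ratio_year_combos = [f"combo_{i}" for i in range(1, 46)]      # 45 split ratio / year combinations

scenarios = list(product(cleaning_methods, split_patterns, ratio_year_combos))
print(len(scenarios))  # 8 * 4 * 45 = 1440
```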

(Page 17, Section 4.2.1, Paragraph 1, Line 517) In addition to heat energy, a dataset from a power generation unit on Jeju Island, Republic of Korea, was used to predict actual electric energy usage.

(Page 19, Section 4.2.2, Paragraph 3, Line 550) Then, the data cleaning was grouped, and the distribution of high-performance data split patterns was examined to select the data split patterns.

(Page 19, Section 4.2.2, Paragraph 4, Line 553) Finally, the data split ratio was determined by grouping based on the previously selected dataset3 and the 12-month pattern.

(Page 22, Section 4.3, Paragraph 2, Line 609) This section describes the input variables that influenced the model's results through the application of XAI.

(Page 23, Section 4.3, Paragraph 6, Line 638) Elucidating these influential factors through XAI clarifies how the model makes predictions and why.

(Page 23, Section 4.3, Paragraph 8, Line 655) In this study, preprocessing of time series energy data and the use of SHAP were employed to develop a robust, generalizable model.

(Page 24, Section 4.3, Paragraph 7, Line 663) The approach proposed by the authors emphasizes enhancing the model's generalizability through comprehensive experimental design and robust preprocessing techniques. Methods such as MinMaxScaler for normalization, advanced imputation strategies for handling missing data, and careful feature engineering were incorporated to ensure that the model could effectively adapt to various time series datasets.

(Page 24, Section 4.3, Paragraph 7, Line 669) Furthermore, the model was evaluated using a diverse set of prediction algorithms, including XGBoost, LightGBM, CatBoost, MLP, and LSTM.

(Page 24, Section 4.3, Paragraph 7, Line 678) However, certain limitations, such as sensitivity to extreme outliers and domain-specific nuances, are acknowledged and may require further attention in future research.

(Page 24, Section 5, Paragraph 1, Line 688) This paper focuses on exploring effective preprocessing techniques and proposes a time series prediction architecture based on these techniques.

(Page 24, Section 5, Paragraph 2, Line 698) Evaluation of five different prediction models confirms that the proposed preprocessing method enhances predictive results, regardless of the model employed.

(Page 24, Section 5, Paragraph 2, Line 701) Furthermore, electric energy prediction experiments were performed to confirm the improvement in prediction performance and to show the scalability to other types of energy prediction by using the same input variables as heat energy.

(Page 24, Section 5, Paragraph 3, Line 704) An enhanced energy prediction method enables the determination of required production volumes in advance and the selection of appropriate low-carbon energy production methods for supply.

 

Reviewer 2 Report

Comments and Suggestions for Authors

1)      Clarify the use of explainable artificial intelligence

2)      The literature review is too short, and more than two references are used at a time

3)      “Table 1 is an input variable used …” – Clarify

4)      Why different approaches for time synchronization? Different time points for input and output variables, why?

5)      Figure 4 – MAPE, etc. not defined

6)      Table 3, Table 10 – percentage of what?- clarify

7)      Table 4 MAPE (%) – 6.194e12?

8)      Table 5 – Count=45?

9)      Table 6 compares 10 years of training data with an 83.33:16.67 split ratio, using 5 years  for training – clarify

10)   Line 469 – table 5?

11)   It would be interesting to have plots with real and predicted values

Comments on the Quality of English Language

Minor editing required.

Author Response

Thank you very much for taking the time to review this manuscript. We appreciate your questions and suggestions. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the re-submitted files.

 

Comments 1: Clarify the use of explainable artificial intelligence

Response 1: Thank you for your feedback. We appreciate the time you’ve taken to review our work and provide your comments. Explainable artificial intelligence (XAI) is employed in our study to clarify how various input variables impact the results of our energy consumption models. XAI techniques are instrumental in understanding the relationships between the data features and the model's predictions, thereby enhancing the transparency and interpretability of the model's decisions. In response to your comments, we have updated Section 1 to provide a clearer explanation of how XAI is utilized in our study; more details are provided in Section 4.3. These additions aim to better articulate the role of XAI in elucidating the influence of different variables on our model's outcomes. Thank you once again for your valuable feedback.

(Page 2, Section 1, Paragraph 3, Line 46) In the final analysis, explainable artificial intelligence (XAI) is employed to identify variables that affect energy prediction and to elucidate their contributions. Using the XAI methodology, the primary factors influencing energy output were identified and analyzed.
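As a rough illustration of how SHAP can be applied in this setting, the sketch below uses shap.TreeExplainer with a LightGBM regressor on synthetic data; the feature names and values are hypothetical and only stand in for the paper's input variables.

```python
import lightgbm as lgb
import numpy as np
import pandas as pd
import shap

# Synthetic data; the feature names only stand in for the paper's input variables.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((200, 3)), columns=["temperature", "humidity", "hour"])
y = 5.0 * X["temperature"] - 2.0 * X["humidity"] + rng.normal(0, 0.1, 200)

model = lgb.LGBMRegressor(n_estimators=100).fit(X, y)

# TreeExplainer computes SHAP values for tree-based models such as LightGBM.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# The summary plot ranks input variables by their contribution to the predictions.
shap.summary_plot(shap_values, X)
```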

 

Comments 2: The literature review is too short, and more than two references are used at a time.

Response 2: Thank you for your feedback regarding the literature review and references. We appreciate your suggestion to extend this section. In response, we have extended the literature review by adding additional content and incorporating more citations to provide a more comprehensive overview of the relevant research. This updated section now provides a more comprehensive analysis of the importance of models, preprocessing, and the use of XAI in time series prediction. We have highlighted that previous studies typically focused individually on energy prediction, data preprocessing, data analysis, and XAI. In contrast, our research proposes an integrated architecture for energy prediction that combines data preprocessing, analysis, and XAI, with a particular emphasis on preprocessing. We believe these enhancements strengthen the manuscript and provide a more thorough background for our research. If you have any further recommendations or questions, please let us know. Thank you again for your valuable input.

(Page 2, Section 2, Line 67-154) 2. Literature review

 

Comments 3: “Table 1 is an input variable used …” – Clarify

Response 3: Thank you for your comment regarding Table 1. To clarify, we have revised the description as follows:

(Page 5, Section 3.1.1, Paragraph 2, Line 189) Table 1 presents the input variables used for energy prediction, selected based on their relevance to energy consumption.

This revision ensures the purpose of Table 1 is more clearly communicated. We appreciate your feedback and hope this clarification resolves any confusion. Please let us know if further clarification is needed.

 

Comments 4: Why different approaches for time synchronization? Different time points for input and output variables, why?

Response 4: Thank you for your question regarding the time synchronization approaches and the differing time points for input and output variables. The varying time points for input and output variables are indicative of the complex relationships we aim to model. In time series forecasting, understanding how historical data (inputs) relates to future data (outputs) is crucial. By analyzing these relationships, we can better align the temporal aspects of our data to improve prediction accuracy. In our study, we applied different approaches to effectively capture these time correlations. We have revised our terminology to use "time correlation" to more accurately describe this analysis. This term better reflects the process of understanding how past data points are related to future outcomes and ensures clearer communication of our methodology. We appreciate your feedback and hope this explanation clarifies our approach. If you have any further questions or need additional information, please let us know.

(Page 6, Section 3.1.2, Line 202) Time correlation

(Page 6, Section 3.1.2, Line 219) In this paper, data is collected by time for energy prediction, and the first approach is used as the method for time correlation.
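A minimal sketch of what such a time-correlation check might look like is given below, assuming a daily temperature input and an energy target; the data and the one-day lag relationship are synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: a temperature input and an energy target that depends on
# the previous day's temperature (a purely illustrative one-day lag).
rng = np.random.default_rng(1)
idx = pd.date_range("2016-01-01", periods=365, freq="D")
temperature = pd.Series(10 + 8 * np.sin(np.linspace(0, 2 * np.pi, 365)), index=idx)
energy = 50 - 1.5 * temperature.shift(1) + rng.normal(0, 1, 365)

# Correlation between today's energy usage and the temperature observed `lag` days earlier.
for lag in range(4):
    print(f"lag {lag} days: correlation = {energy.corr(temperature.shift(lag)):.2f}")
```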

 

Comments 5: Figure 4 – MAPE, etc. not defined

Response 5: Thank you for pointing out the need for definitions of the metrics used in Figure 4. We have updated the figure’s description to explicitly include the performance metrics shown: R2 score, MAE, RMSE, and MAPE. We appreciate your feedback and hope these updates clarify the information presented in Figure 4. If you have any further questions or need additional details, please let us know.

(Section 3.2.3, Paragraph 1) Figure 4 illustrates how the performance of the model trained on the heat energy dataset from 2012 to 2016 changed over time. To evaluate performance, the R-squared score (R2 score), mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were used. Detailed information about these metrics can be found in section 3.3. The figure shows a consistently decreasing trend in all performance metrics, except for 2019. Additionally, it confirms that the performance after 2 years does not exceed the performance immediately after training.
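For reference, the four metrics named above can be computed with scikit-learn as in the following sketch; the actual and predicted values are hypothetical.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)

# Hypothetical actual and predicted energy usage values.
y_true = np.array([10.2, 11.5, 9.8, 12.1, 10.9])
y_pred = np.array([10.0, 11.9, 9.5, 12.4, 10.6])

r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))            # RMSE = square root of the MSE
mape = mean_absolute_percentage_error(y_true, y_pred) * 100   # expressed as a percentage
print(f"R2={r2:.3f}, MAE={mae:.3f}, RMSE={rmse:.3f}, MAPE={mape:.2f}%")
```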

 

Comments 6: Table 3, Table 10 – percentage of what?- clarify

Response 6: Thank you for your comment regarding Tables 3 and 10. To clarify, the percentages shown in these tables represent the "Percentage of count [%]." This means that the percentages are calculated relative to the total value for the respective categories. Each individual count or value is expressed as a percentage of the category total. We have updated the table to ensure this is clearly communicated. If you have any further questions or require additional clarification, please let us know. We appreciate your attention to this detail.

(Page 14, Section 4.1.2, Table3) Percentage of count [%]

(Page 19, Section 4.2.2, Table10) Percentage of count [%]

 

Comments 7: Table 4 MAPE (%) – 6.194e12?

Response 7: Thank you for highlighting the unusual results for MAPE (mean absolute percentage error) in Table 4. To ensure the accuracy of our findings, we have re-examined the data and methodology used to calculate MAPE. We have corrected any anomalies identified in the MAPE values and reviewed other tables to check for similar issues. All necessary revisions have been made to address these concerns. We appreciate your feedback, which has been instrumental in improving the accuracy and reliability of our results.

(Page 15, Section 4.1.2, Table4) MAPE (max) 27.93 (mean) 15.21

(Page 15, Section 4.1.2, Table5) MAPE (max) 19.33 (mean) 14.10 (min) 12.07

 

Comments 8: Table 5 – Count=45?

Response 8: Thank you for your question regarding the count of 45 in Table 5. The table presents a summary of performance for a subset of cases derived from an initial pool of 1440 scenarios. Through the process optimization, which involved data cleaning and applying a specific data split pattern, the number of scenarios was reduced to 45. The title of Table 5, "A Set of Cases Performance Summary for Heat Energy Dataset Which Satisfied Selected Data Cleaning and Data Split Pattern," reflects this process. The count of 45 represents the cases that met the criteria after these preprocessing steps were applied. We hope this explanation clarifies the derivation of the count in Table 5. If you have any further questions or need additional details, please let us know. Thank you for your attention to this matter.

 

Comments 9: Table 6 compares 10 years of training data with an 83.33:16.67 split ratio, using 5 years for training – clarify

Response 9: Thank you for your comment regarding Table 6. To clarify the content, we have revised the relevant sentence and updated the table title for better understanding. The sentence in section 4.1.2, Paragraph 6 has been updated to: "Table 6 presents the results of training the LightGBM model on heat energy using two different time periods: 5 years and 10 years." Additionally, the title of Table 6 has been updated to: "Comparison of LightGBM Training Results Using 5 Years of Heat Energy Data and 10 Years of Heat Energy Data." These changes aim to provide a clearer explanation of the data split and training periods used in the analysis. We appreciate your feedback in helping us enhance the clarity of our manuscript. If you have any further questions or need additional information, please let us know.

(Page 15, Section 4.1.2, Paragraph 6, Line 488) Table 6 presents the results of training the LightGBM model on heat energy using two different time periods: 5 years and 10 years.

(Page 15, Section 4.1.2, Table6) Comparison of LightGBM Training Results Using 5 Years of Heat Energy Data and 10 Years of Heat Energy Data

 

Comments 10: Line 469 – table 5?

Response 10: Thank you for pointing out the incorrect reference to Table 5 in Line 469. We have reviewed the manuscript and corrected the reference to ensure it accurately matches the intended content. The revised text now properly reflects the correct table and information (Section 4.2.2, Paragraph 5, line 541). We appreciate your attention to this detail and your help in improving the accuracy of our manuscript. If you have any further questions or need additional clarification, please let us know.

(Page 19, Section 4.2.2, Paragraph 5, Line 571) Table 12 shows the maximum, minimum, and average performance values of each evaluation index for cases that satisfy the data cleaning method and data split pattern selected through the proposed method, which evenly considers the evaluation indicators used. While the highest performance values for each indicator in Table 12 may differ from those in Table 11, a comparison of the average values shows overall improved performance.

 

Comments 11: It would be interesting to have plots with real and predict values

Response 11: Thank you for your suggestion to include plots with real and predicted values. In response, we have added the requested visualizations to the manuscript. Specifically, Figure 9 displays the heat energy prediction results for Scenario C, with LightGBM predictions compared to the actual test data. Similarly, Figure 13 illustrates the electric energy prediction results for Scenario I, with LightGBM predictions compared to observed values. We hope these additions enhance the clarity of our results. If you have any further suggestions or require additional modifications, please let us know.

(Page 16, Section 4.1.3, Paragraph 3, Line 506) Figure 9 shows the prediction results of LightGBM on the test data. The green solid line represents the predicted values, while the red solid line denotes the real values. The x-axis and y-axis represent dates of test data and the heat energy usage [Gcal], respectively.

(Page 16, Section 4.1.3, Figure 9) Heat energy prediction results for scenario C based on LightGBM predictions using test data.

(Page 21, Section 4.2.3, Paragraph 3, Line 598) Figure 13 illustrates the electric energy prediction results for scenario I using LightGBM on test data. The green solid line denotes the predicted values, whereas the red solid line represents the observed values. The x-axis displays the dates of the test data, and the y-axis represents the electric energy usage [MWh].

(Page 22, Section 4.2.3, Figure 13) Electric energy prediction results for scenario I based on LightGBM predictions using test data.
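For readers who wish to reproduce this style of figure, a minimal matplotlib sketch with synthetic data is shown below; it follows the color convention described above (green for predictions, red for real values), but the dates and values are placeholders rather than the paper's actual data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic test-period dates, real values, and model predictions.
rng = np.random.default_rng(2)
dates = pd.date_range("2022-01-01", periods=60, freq="D")
y_real = 100 + 10 * np.sin(np.linspace(0, 6, 60)) + rng.normal(0, 2, 60)
y_pred = y_real + rng.normal(0, 3, 60)

plt.figure(figsize=(10, 4))
plt.plot(dates, y_pred, color="green", label="Predicted")
plt.plot(dates, y_real, color="red", label="Real")
plt.xlabel("Date")
plt.ylabel("Heat energy usage [Gcal]")
plt.legend()
plt.tight_layout()
plt.show()
```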

 

Reviewer 3 Report

Comments and Suggestions for Authors

The article describes data processing and machine learning techniques for energy consumption forecasting. The topic is important.

 

Below are the main comments on the reviewed article.

- Abstract: phrase "predictive energy collection" is unclear, please correct it.

- The literature review is too short in my opinion; please extend this section (add a minimum of half a page) and add more citations

- Please add a list of abbreviations

- Figure 1. Where did the set of predictive models come from, and is this the complete list? Some belong to the same category, GBDT.

- Table 1. Meteorological factors: historical values or forecasts?

- Line 144: CatBoost is faster in many experiments in regression tasks (I see it is included, but its time is longer than XGBoost's, which is a little odd – Table 8)

- Table 4. Very strange results for MAPE (max, mean)

- Table 14. The differences in computation time between the ML models are odd, very large for LSTM and MLP; it is a little strange, but the results for LSTM and MLP are very good (which is normal). Maybe the architecture is too shallow for the GBDT techniques (short computation time, too small a number of trees)?

 

The entire study looks interesting and has novel aspects. After considering the comments, I believe the article is valuable.

Author Response

Thank you very much for taking the time to review this manuscript. We are grateful for your positive feedback regarding the topic of our article. We also appreciate the comments you provided, which have been invaluable in refining our work. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the re-submitted files.

 

Comments 1: Abstract: phrase "predictive energy collection" is unclear, please correct it.

Response 1: Thank you for pointing out the unclear phrase "predictive energy collection" in the abstract. We appreciate your attention to this detail. The term was intended to describe the process of collecting time-series data for energy prediction, but we acknowledge that it was not sufficiently clear in the context. To improve clarity, we have revised the abstract by removing the ambiguous phrase and adjusting the sentence to better convey the intended meaning. Your feedback has been invaluable in enhancing the clarity of our manuscript.

(Page 1, Abstract, Line 3) In order to achieve carbon neutrality and enhance energy efficiency through a stable energy supply, it is necessary to pursue the development of innovative architectures designed to optimize and analyze time series data.

 

Comments 2: Literature review is too short in my opinion, please extend this point (add minimum half a page) and add more citations

Response 2: Thank you for your feedback regarding the literature review and references. We appreciate your suggestion to extend this section. In response, we have extended the literature review by adding additional content and incorporating more citations to provide a more comprehensive overview of the relevant research. This updated section now provides a more comprehensive analysis of the importance of models, preprocessing, and the use of XAI in time series prediction. We have highlighted that previous studies typically focused individually on energy prediction, data preprocessing, data analysis, and XAI. In contrast, our research proposes an integrated architecture for energy prediction that combines data preprocessing, analysis, and XAI, with a particular emphasis on preprocessing. We believe these enhancements strengthen the manuscript and provide a more thorough background for our research. If you have any further recommendations or questions, please let us know. Thank you again for your valuable input.

(Page 2, Section 2, Line 67-154) 2. Literature review

 

Comments 3: please add the list of abbreviations

Response 3: Thank you for your suggestion regarding the inclusion of a list of abbreviations. We have added an "Abbreviations" section to the manuscript as recommended. This section provides a comprehensive list of all abbreviations used throughout the paper, ensuring clarity and ease of understanding for the readers. We appreciate your valuable feedback and believe this addition enhances the readability of the manuscript. If you have any further suggestions or questions, please feel free to let us know.

(Page 25, Abbreviations, Line 723) Abbreviations

 

Comments 4: Figure 1. Where did the set of predictive models come from, and is this the complete list? Some belong to the same category, GBDT.

Response 4: Thank you for your question regarding the models presented in Figure 1. The set of predictive models shown includes five models that were specifically selected for our experiments. Each model was chosen based on its relevance and performance in the context of our study. Regarding your observation about the GBDT (Gradient Boosting Decision Trees) models, while some models fall under the GBDT category, we do not consider them as part of a uniform category for this analysis. Instead, we view each GBDT model as having distinct characteristics and implementation details that justify their individual inclusion. This approach allows us to evaluate a range of techniques within the broader category of boosting methods, highlighting differences in performance and applicability. If you have any further questions or require additional clarification, please let us know. Thank you again for your insightful feedback.

 

Comments 5: Table 1. Meteorological factors: historical values or forecasts?

Response 5: Thank you for your inquiry regarding Table 1. The meteorological factors listed are historical values, not forecasts. Using historical data was essential for our analysis to ensure the accuracy and reliability of our results. We appreciate your attention to this detail and your contribution to enhancing the clarity of the manuscript. If you have any further questions or require additional information, please feel free to ask.

 

Comments 6: line 144, Catboost is faster in many experiments in regression tasks (i see is included but the time is longer than XGBOOST it is a little odd – Table 8)

Response 6: Thank you for your comment regarding the performance of the model as noted in line 144 and Table 8. As indicated in our manuscript, we observed from previous research and our experiments that LightGBM consistently demonstrated the fastest computation times. Consequently, we employed LightGBM in our experiments. The results in Table 8 reflect this, with LightGBM showing the fastest speed, followed by XGBoost and then CatBoost. While CatBoost is indeed known for its efficiency and often performs well in various settings, the results from our experiments showed a different performance pattern. This discrepancy may be attributed to specific settings or conditions in our experimental setup. We appreciate your observation and will ensure that these results are accurately interpreted in the context of our study. Thank you again for your valuable feedback.

 

Comments 7: Table 4. Very strange results for MAPE (max. mean)

Response 7: Thank you for highlighting the unusual results for MAPE (mean absolute percentage error) in Table 4. To ensure the accuracy of our findings, we have re-examined the data and methodology used to calculate MAPE. We have corrected any anomalies identified in the MAPE values and reviewed other tables to check for similar issues. All necessary revisions have been made to address these concerns. We appreciate your feedback, which has been instrumental in improving the accuracy and reliability of our results.

(Page 15, Section 4.1.2, Table4) MAPE (max) 27.93 (mean) 15.21

(Page 15, Section 4.1.2, Table5) MAPE (max) 19.33 (mean) 14.10 (min) 12.07

 

Comments 8: Table 14 differences in time for ML is odd, very big for LSTM and MLP, it is a little strange but the results for LSTM i MLP are very good (it is normal), maybe too shallow architecture for GBDT techniques-short time of calculation, to small number of trees?

Response 8: Thank you for your detailed feedback regarding Table 14 and the time differences for machine learning models. We appreciate your observations about the LSTM and MLP results being quite good, as well as your concern about the performance of the GBDT techniques. Regarding your comment on the GBDT techniques, the relatively short computation time is indeed noticeable. However, we used a sufficiently large number of trees to ensure adequate performance. The architecture was optimized to balance efficiency and effectiveness, and while the computation time might appear short, it aligns with the practical constraints and goals of our study. We will review the results and methodology to ensure that all aspects are appropriately addressed. Thank you again for your valuable insights and contributions to improving our manuscript.

 

Comments 9: The entire study looks interesting and has novel aspects. After considering the comments, I believe the article is valuable.

Response 9: Thank you for your positive feedback and for recognizing the novel aspects of the study. We appreciate your thorough review and the acknowledgment of the article’s value. Your comments have been valuable in refining the manuscript and enhancing its overall contribution. Thank you again for your support and constructive feedback.
