Electrical Load Forecast by Means of LSTM: The Impact of Data Quality
Round 1
Reviewer 1 Report
This paper deals with the impact of data quality on the load forecast accuracy.
I think the paper is well written and well structured. The objectives are clear and the experiment is well designed. The work is concise and, I think, of interest to anyone forecasting an energy time series. Although there is ample literature on load forecasting, work on how to clean the data before forecasting is scarce. I think the authors propose a sound methodology with promising results that can be helpful for the journal audience.
I only have a minor concern and it is about Figure 6, since that graph compares days of different seasons and I'm not sure whether that is a fair comparison.
Therefore, I recommend the paper for publication.
Author Response
Reviewer#1, Concern # 1:
I only have a minor concern and it is about Figure 6, since that graph compares days of different seasons and I'm not sure whether that is a fair comparison.
Author response: Thank you for pointing out this critical aspect.
Author action: The figure was removed because the case study was significantly revised and the figure was no longer considered useful.
Author Response File: Author Response.docx
Reviewer 2 Report
The paper "Day-ahead load forecast by means of LSTM: the impact of data quality" evaluates the impact of several data-cleaning methods on the accuracy of load forecasting for an industrial customer. The paper largely repeats what has already been written in the paper "Data quality analysis in day-ahead load forecast by means of LSTM" by the same authors, which deals with the same subject on the same dataset; relative to that paper, it essentially adds one further data-cleaning method.
In addition to the modest innovation compared to what has already been published, the work has some weaknesses and some unclear points from the methodological point of view. In particular: the introduction is not sufficiently detailed, there are few references to other works on the same subject; the paragraph on the methodology is generic, the LSTM network is presented from a general point of view, but there is a lack of adequate information about its implementation in practice, and the description of the features used; there is no discussion of the results obtained compared to other works in the literature.
From a methodological point of view, it is not clear if the cleaning methodology has been applied also to the test portion of the dataset or only to the data used in the training. In the former case, it would be difficult to understand whether the results merely indicate the increased ease for a neural network in predicting a time series cleaned of outliers. A thorough discussion of the results should highlight where the claimed improvements were achieved. Applying a cleaning procedure also to the test set seems incorrect as it assumes the knowledge of such data to be able to do a cleaning a posteriori and does not lend itself to an operational prediction procedure.
I would also point out that:
lines 68-69, the statement needs supporting references
line 103, the dates reported do not appear correct
Author Response
Reviewer#2, Concern # 1:
The paper "Day-ahead load forecast by means of LSTM: the impact of data quality" evaluates the impact of several data-cleaning methods on the accuracy of load forecasting for an industrial customer. The paper largely repeats what has already been written in the paper "Data quality analysis in day-ahead load forecast by means of LSTM" by the same authors, which deals with the same subject on the same dataset; relative to that paper, it essentially adds one further data-cleaning method.
Author response: The reviewer's point is indeed correct. As stated in the text and reported below, the current work is an extension of the previous one, although it presents a new methodology and a different case study, which has been substantially upgraded in the current version:
This paper extends the preliminary analysis provided in [10] providing more details on the motivations, deepening the investigation of the implemented techniques that are adopted and testing them against a manual recognition. Above all, however, the most important content of this extension is represented by the provided outliers’ detection study, enforced and supported by the analysis of real industrial loads case studies.
Author action: To further differentiate the two works, additional analysis is presented. Moreover, the case study, upgraded in the newer version of the manuscript, is significantly different from the one presented in the other work due to the introduction of the Generalized ESD detection.
Reviewer#2, Concern # 2:
In addition to the modest innovation compared to what has already been published, the work has some weaknesses and some unclear points from the methodological point of view. In particular: the introduction is not sufficiently detailed, there are few references to other works on the same subject;
Author response: Thank you for the suggestion.
Author action: The introduction was revised and expanded in line with the reviewer’s suggestion.
Reviewer#2, Concern # 3:
The paragraph on the methodology is generic, the LSTM network is presented from a general point of view, but there is a lack of adequate information about its implementation in practice, and the description of the features used; there is no discussion of the results obtained compared to other works in the literature.
Author response: Thank you for pointing out this critical aspect.
Author action: A new section, “Dataset aggregation and LSTM architecture”, was added with further detail on the LSTM architecture.
Reviewer#2, Concern # 4:
From a methodological point of view, it is not clear if the cleaning methodology has been applied also to the test portion of the dataset or only to the data used in the training. In the former case, it would be difficult to understand whether the results merely indicate the increased ease for a neural network in predicting a time series cleaned of outliers. A thorough discussion of the results should highlight where the claimed improvements were achieved. Applying a cleaning procedure also to the test set seems incorrect as it assumes the knowledge of such data to be able to do a cleaning a posteriori and does not lend itself to an operational prediction procedure.
Author response: Thank you for pointing out this fundamental aspect of our work. The cleaning process has been performed only on the training dataset, in order to allow a fair comparison of the results obtained on the test dataset.
Author action: A new table, Table 2, has been provided that clearly shows the sizes of the training and test datasets. In addition, the topic has been further detailed in the text.
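The point discussed in this concern, cleaning only the training portion while leaving the test portion untouched, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy load values, the function names, and the median/MAD (modified z-score) rule with a 3.5 threshold are all assumptions chosen for illustration, standing in for the cleaning methods studied in the manuscript.

```python
from statistics import median

def chronological_split(series, test_fraction=0.2):
    """Split a time-ordered series without shuffling; the tail becomes the test set."""
    cut = len(series) - int(round(len(series) * test_fraction))
    return series[:cut], series[cut:]

def clean_training_only(train, threshold=3.5):
    """Replace training outliers with the training median.

    Uses the modified z-score (median/MAD) rule as an illustrative stand-in;
    the threshold of 3.5 is an assumption, not a value from the manuscript.
    """
    med = median(train)
    mad = median(abs(x - med) for x in train)
    if mad == 0:
        # No spread around the median: nothing can be flagged by this rule.
        return list(train)
    return [med if 0.6745 * abs(x - med) / mad > threshold else x for x in train]

# Hypothetical load samples with one anomaly in each portion.
load = [100.0, 102.0, 98.0, 101.0, 950.0, 99.0, 103.0, 97.0, 100.0, 400.0]

train, test = chronological_split(load, test_fraction=0.2)
train_clean = clean_training_only(train)  # the test portion is never modified
```

In this sketch the anomalous training sample is replaced, while the anomalous test sample is kept, so the forecast error is still measured against the raw, operationally realistic data.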
Reviewer#2, Concern # 5:
lines 68-69, the statement needs supporting references
Author response: Thank you for the suggestion.
Author action: We have added further supporting references.
Reviewer#2, Concern # 6:
line 103, the dates reported do not appear correct
Author response: Thank you for pointing out the mistake.
Author action: February 2020 was replaced with February 2019.
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
The authors responded promptly and clearly to the points raised. I suggest publication of the present revision.