Peer-Review Record

A System for Sustainable Usage of Computing Resources Leveraging Deep Learning Predictions

Appl. Sci. 2022, 12(17), 8411; https://doi.org/10.3390/app12178411
by Marius Cioca 1 and Ioan Cristian Schuszter 2,3,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 13 July 2022 / Revised: 16 August 2022 / Accepted: 20 August 2022 / Published: 23 August 2022
(This article belongs to the Collection Software Engineering: Computer Science and System)

Round 1

Reviewer 1 Report

This paper suggests using DL to estimate peaks in K8s infrastructure load, thus allowing pods to be (de)allocated so that services run efficiently. While the topic itself is important and thoroughly studied, I do not think this paper brings any major contribution to the field. Let me explain why:


1) While the authors mention CERN production, it seems to me that the proposed system is not deployed practically. It is a prototype using some production log as an input. No real experiments have been performed that would measure the impact on utilization and power consumption.

2) No novel technique/algorithm has been developed. Instead, existing DL frameworks are "tested" using a very limited set of fairly predictable data.

3) In chapter 2, the authors point to weaknesses of previous studies, e.g., [11] is criticized for "smooth data", and [10] is criticized for a "lack of dynamic aspect" because scaling decisions are taken every minute based on a limited past window. However, the data used in your paper do not look "noisy" to me. They seem very regular (roughly 2 peaks every hour). So where is the noise in your approach? Second, in your Conclusion you are happy that "The algorithm requires a very small window of data". So either this is a "double standard" or you have not properly explained to the reader why RUBAS is bad for using small windows while you are good when doing the same thing. Similarly, RUBAS rescales every minute while you are collecting data (for forecasting) every 3 minutes (see page 5). Later you mention 5 minutes (see ratio). This must be explained, because it seems that you are even less "dynamic" than RUBAS.

4) On page 9, you discuss LSTM and RNN, but in both cases you refer to Fig. 11, which is about CNN. This is probably a mistake.

5) Section 4.1 is not properly explained and is hard to understand. The problem is magnified by the fact that the sentence "By computing some statistics..." has no end (see the missing text in line 275).

6) The findings about convergence on page 11 (see line 299) seem trivial to me. In my view, this would happen with nearly every heuristic, because as the threshold increases, the chance that the predictor will miss decreases, right? I mean that with a higher "ratio" it is more likely that we will need to scale up.

7) Sentence "The ideal scaling system would" on page 11, line 313 has no end...

8) Sentence "As observed on the cumulative sum..." on page 12, line 319 has no end...

9) The font used in Fig.2 is not very readable

10) More care should be given to the readability of figures. For example, figs. 3-5 are too small, and figs. 6-8 have a weird y-axis label and no legend (we have to guesstimate which color is "real" and which is "predicted").

11) Neither the simulation data nor the implementation details (e.g., source code) are provided with the paper, thus seriously limiting the impact/reproducibility of this publication.

I believe this paper would fit a workshop (after corrections), but it is not strong and novel enough to be published in a journal.

 

Author Response

Hello, thank you very much for your valuable review. We have added comments to each of the points below (the new version of the paper is in the attachment):


1) While the authors mention CERN production, it seems to me that the proposed system is not deployed practically. It is a prototype using some production log as an input. No real experiments have been performed that would measure the impact on utilization and power consumption.

Yes, unfortunately the shortcoming is that the prediction outputs are currently not used to scale the production system, but the plan is to do so in the near future, given the potential of the results. The mention of production merely refers to the production data used in the prediction.

2) No novel technique/algorithm has been developed. Instead, existing DL frameworks are "tested" using a very limited set of fairly predictable data.

While the implemented algorithms leverage existing methods, we believe the value lies in the use of real production data and the associated discussion. We addressed the comment about the predictability of the data, which stems from the scheduled jobs that regularly cause spikes; more info below. Please check around line 181.

3) In chapter 2, the authors point to weaknesses of previous studies, e.g., [11] is criticized for "smooth data", and [10] is criticized for a "lack of dynamic aspect" because scaling decisions are taken every minute based on a limited past window. However, the data used in your paper do not look "noisy" to me. They seem very regular (roughly 2 peaks every hour). So where is the noise in your approach?

Please check line 181, where we discuss the seasonality of the spikes you noticed. Additionally, we provided another sample image of the data, from a different time than the first one, which also contains some irregular usage patterns.

Second, in your Conclusion you are happy that "The algorithm requires a very small window of data". So either this is a "double standard" or you have not properly explained to the reader why RUBAS is bad for using small windows while you are good when doing the same thing. Similarly, RUBAS rescales every minute while you are collecting data (for forecasting) every 3 minutes (see page 5). Later you mention 5 minutes (see ratio). This must be explained, because it seems that you are even less "dynamic" than RUBAS.

We tried to describe better what we mean; please look around line 415. The point is being able to learn from new data and improve the algorithm's predictions, something that RUBAS and other more static methods cannot achieve.

4) On page 9, you discuss LSTM and RNN, but in both cases you refer to Fig. 11, which is about CNN. This is probably a mistake.
-- corrected, thank you

5) Section 4.1 is not properly explained and is hard to understand. The problem is magnified by the fact that the sentence "By computing some statistics..." has no end (see the missing text in line 275).
-- fixed and added some more information about the section, around and after lines 310-320

6) The findings about convergence on page 11 (see line 299) seem trivial to me. In my view, this would happen with nearly every heuristic, because as the threshold increases, the chance that the predictor will miss decreases, right? I mean that with a higher "ratio" it is more likely that we will need to scale up.


Actually, when the ratio is higher, it means that we will only scale up once the CPU usage goes over the threshold. For example, with a ratio value of 0.5 (50%), the cumulative sum is around 100, while with 0.4 it is around 150. This means that taking a higher threshold eliminates the intervals where the CPU usage is lower and the system does not need to stay scaled up. Ideally, yes, as you mentioned, the higher the value, the fewer misses we get. However, one needs to weigh the real implications of this action: it would mean that we do not give the cluster more power until it is almost at full load.
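To make this concrete, here is a minimal Python sketch of the cumulative-sum metric as we understand it (the function name and the random stand-in data are illustrative; the exact computation in the paper may differ):

```python
import numpy as np

def scaled_up_intervals(predicted_usage: np.ndarray, ratio: float) -> int:
    """Count the intervals in which the predicted CPU usage exceeds the
    ratio threshold, i.e. intervals the system would spend scaled up.
    The running total is what the cumulative-sum charts track; a lower
    final value means less time spent needlessly scaled up."""
    return int(np.cumsum(predicted_usage > ratio)[-1])

# Hypothetical usage: a higher ratio leaves fewer over-threshold intervals.
rng = np.random.default_rng(0)
usage = rng.random(1000)  # stand-in for real CPU-usage predictions
print(scaled_up_intervals(usage, 0.4))  # larger count
print(scaled_up_intervals(usage, 0.5))  # smaller count
```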

7) Sentence "The ideal scaling system would" on page 11, line 313 has no end...
-- fixed, thank you 

8) Sentence "As observed on the cumulative sum..." on page 12, line 319 has no end...
-- fixed and added more information, thank you 

9) The font used in Fig.2 is not very readable
-- the figure was redone and the font enlarged

10) More care should be given to the readability of figures. For example, figs. 3-5 are too small, and figs. 6-8 have a weird y-axis label and no legend (we have to guesstimate which color is "real" and which is "predicted").

We redid the figures so that they would be more readable. For the "MAE and Loss" diagrams, the legends are in the top-right of the image.

11) Neither the simulation data nor the implementation details (e.g., source code) are provided with the paper, thus seriously limiting the impact/reproducibility of this publication.

Unfortunately, we could not disclose the data that we used in the experiments. We might reconsider this in the future and include the code and data with a future publication.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors propose a system for predicting and influencing computer resource usage based on historical data from real production software systems at CERN, allowing the number of machines or containers running a certain service to be scaled down during periods identified as idle. The results look encouraging and motivating, but some of the content still needs to be revised in order to meet the requirements for publication. A number of concerns are listed as follows:

(1) The abstract should be improved. Your own contribution should be further highlighted.

(2) In the introduction, the authors should clearly indicate the contributions and innovations of this paper.

(3) The choice of parameter values can be a complicated problem in itself; how did the authors determine the values of the parameters?

(4) The method/approach in the context of the proposed work should be written in detail.

(1) In order to strengthen the introduction, some recent references should be added to the paper to improve the review part and the connection with the literature. For example: https://doi.org/10.3390/agriculture12060793; https://doi.org/10.1007/s10489-022-03719-6; https://doi.org/10.1016/j.engappai.2022.105139; https://doi.org/10.1016/j.isatra.2021.07.017; and so on.

(5) There are some grammatical mistakes and typographical errors. Please have the manuscript proofread by a native speaker.

(6) The conclusion and the motivation of the work should be presented more clearly.

(7) On page one, line 30: the first use of an abbreviation should give the full name; this applies throughout the whole paper.

(8) “When discussing resource usage and the carbon footprint of services, …” -> “When resource usage and the carbon footprint of services are discussed, …”. Please carefully check.

(9) More statistical methods are recommended to analyze the experimental results.

 

Author Response

Hello, thank you very much for your valuable review! We have addressed your comments below (the new version of the paper is attached):

(1) The abstract should be improved. Your own contribution should be further highlighted.
-- some extra lines about the work done and its significance have been added to the abstract

(2) In the introduction, the authors should clearly indicate the contributions and innovations of this paper.
-- the contributions are mentioned in section 2, lines 48 and below. Additionally there is new information provided from line 35 onwards.

(3) The choice of parameter values can be a complicated problem in itself; how did the authors determine the values of the parameters?
-- we discussed the parameters and their values in more detail around line 255

(4) The method/approach in the context of the proposed work should be written in detail.
-- line 63 and below contains new information on the methods and implementations used

(1) In order to strengthen the introduction, some recent references should be added to the paper to improve the review part and the connection with the literature. For example: https://doi.org/10.3390/agriculture12060793; https://doi.org/10.1007/s10489-022-03719-6; https://doi.org/10.1016/j.engappai.2022.105139; https://doi.org/10.1016/j.isatra.2021.07.017; and so on.
-- https://doi.org/10.3390/agriculture12060793 offered an interesting view on carbon footprint and we added it to the citations; see line 166

(5) There are some grammatical mistakes and typographical errors. Please have the manuscript proofread by a native speaker.
-- we have done another proofreading pass over the document; we are confident that we fixed most of the typos and incomplete sentences.

(6) The conclusion and the motivation of the work should be presented more clearly.
-- the motivation was refactored and expanded upon in the introduction (lines 30+). The conclusion was extended as well (lines 373+)

(7) On page one, line 30: the first use of an abbreviation should give the full name; this applies throughout the whole paper.
-- the full name is now given before the abbreviation, thank you

(8) “When discussing resource usage and the carbon footprint of services, …” -> “When resource usage and the carbon footprint of services are discussed, …”. Please carefully check.
-- we corrected it, thank you 

(9) More statistical methods are recommended to analyze the experimental results.

-- we considered adding something, but the table seems sufficient to explain the performance of the model

Author Response File: Author Response.pdf

Reviewer 3 Report

I have now completed the review of the manuscript titled "A System for Sustainable Usage of Computing Resources Leveraging Deep Learning Predictions". The manuscript develops a system for predicting and influencing computer resource usage based on historical data from real production software systems at CERN, allowing the number of machines or containers running a certain service to be scaled down during periods identified as idle. I have some suggestions to further improve the quality of the manuscript.

1. The introduction section only contains the focus, goal, and novelty of the present investigation. The introduction requires some relevant articles that used ML and DL for various applications [1-3], including forecasting. Please add these in the related work section.

2. Please provide the computational complexity of all the models; see CDLSTM, SMOTEDNN, DNNBOT, PCCNN, etc.

3. What is the future scope of the proposed research? The authors have described the limitations well; I suggest that these can form the future scope of the work.

4. These days, ML and AI are utilized to solve other applications that are based on several parameters. I suggest adding a small paragraph that discusses the role of AI and ML methods; the authors can use some of the references provided in comments 1 and 2.

5. MAE and loss are not enough for performance assessment; please add NSE, AUC, etc.

References
1. Deep Learning Based Modeling of Groundwater Storage Change
2. SMOTEDNN: A Novel Model for Air Pollution Forecasting and AQI Classification
3. CDLSTM: A Novel Model for Climate Change Forecasting

Author Response

Hello, thank you for your valuable review! We did our best to answer, and our responses are inlined below (the revised paper is attached):

1. The introduction section only contains the focus, goal, and novelty of the present investigation. The introduction requires some relevant articles that used ML and DL for various applications [1-3], including forecasting. Please add these in the related work section.
-- we added some in Section 2.3; check lines 124 onwards

2. Please provide the computational complexity of all the models; see CDLSTM, SMOTEDNN, DNNBOT, PCCNN, etc.
-- these were already available in the first version of the paper; they are described in Section 3.3, "Deep learning algorithms" (lines 226 to 244)
  
3. What is the future scope of the proposed research? The authors have described the limitations well; I suggest that these can form the future scope of the work.
-- we have expanded the conclusions; please check from line 388 onwards
  
4. These days, ML and AI are utilized to solve other applications that are based on several parameters. I suggest adding a small paragraph that discusses the role of AI and ML methods; the authors can use some of the references provided in comments 1 and 2.
-- several things were added and discussed, and some of the papers above were cited. Please check the introduction and Section 2
  
  
5. MAE and loss are not enough for performance assessment; please add NSE, AUC, etc.
-- we considered that the AUC would not work well given that this is time-series data, so we would like to keep the current metrics as they are. The table with the ratios/thresholds serves as a good indicator of the performance of the time-series forecasting.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I appreciate that the authors did their best to fix the paper in the short time they had. However, I still do not find this paper convincing, because of the lack of "novelty". Let me explain why.

Unlike other tools that have been published and provided to the community (e.g., as a K8s component and/or on GitHub), your solution is not public, nor are the data available.
Moreover, your experiments only cover a single data set. If anyone else would like to use your solution, what proof do they have that it will work on different data? They cannot even check the source code. I understand that you are not allowed to publish your data. So why not also use different (public) data sets? There are plenty of them. This would make the paper much more convincing. Also, since the DNNs used in this paper are public, why not publish the experimental setup, so that other people can just "plug in" their own data sets and see the results?

Also, when conducting experiments, it is important to show that you can beat other existing approaches that already deal with this problem. Clearly, you are aware of other "static" methods that solve similar problems. Are you better than them? If yes, show it. If not, explain why your research is important despite this shortcoming. If you have not tested those existing methods, then your methodology is not sound, as you are providing "yet another tool" while you cannot rule out that other existing methods may be (much) better.

At the very least, you should add some very trivial heuristic to the comparison. For example: "if during the previous 2 measurements the ratio increased by > 0.1, then scale up (and vice versa)". Something very simple that would mimic the typical actions triggered in real resource managers when some system-level threshold is reached.
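For concreteness, such a baseline could look like the following minimal Python sketch (the function name and the 0.1 delta are illustrative, not taken from the paper):

```python
def trivial_scaler(usage_history, delta=0.1):
    """Baseline of the kind described above: compare the two most recent
    measurements and scale up if the usage ratio rose by more than
    `delta`, scale down if it fell by more than `delta`, else hold."""
    if len(usage_history) < 2:
        return "hold"
    change = usage_history[-1] - usage_history[-2]
    if change > delta:
        return "scale_up"
    if change < -delta:
        return "scale_down"
    return "hold"

# Example: 0.55 - 0.40 = 0.15 > 0.1, so the baseline scales up.
print(trivial_scaler([0.40, 0.55]))  # scale_up
```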

With the current experiments, you are only showing that you are "better than if the system always runs at high CPU usage" (which is trivial to achieve) and that "you converge to the 'best case scenario' when your ratio is over 0.5" (this is nice, but somewhat expected due to the lower chance of false positives).

What is also missing is a thorough experimental description, e.g., how time-consuming it is to run the RNN solution with respect to other available (static) solutions, etc.

Further comments:

Fig. 1 and the related discussion still do not demonstrate that your data are noisy. Why do you present a short "smooth night pattern" and not, e.g., a full 24h period, so that the reader can better understand how your usage pattern changes over the day? You seem to have collected 20 days of data, so it should be easy to show a full day (and/or compare day vs. night or work day vs. weekend, etc.).
As an example of a paper that does discuss the data reasonably, I suggest looking at other journal papers, e.g., this one: https://ieeexplore.ieee.org/document/9356582

Figures 6, 7, and 8 still have no legend (what are the orange and blue lines?) and the y-axis has the label "value", which does not explain much.

Moreover, Figures 7 & 8 occasionally show that the orange line (the predictor, I guess?) predicts negative usage (below 0.0). This is clearly nonsense and indicates that the solution is not properly polished with respect to outliers (unexpected results).

line 365 "values in 4" -> in Table 4

line 421 "recurrent aspect of he implemented"  he -> the

Author Response

Dear reviewer, thank you very much for your thorough review. We have tried to take all your comments and remarks into consideration. Please find our answers inline below.


Unlike other tools that have been published and provided to the community (e.g., as a K8s component and/or on GitHub), your solution is not public, nor are the data available.
Moreover, your experiments only cover a single data set. If anyone else would like to use your solution, what proof do they have that it will work on different data? They cannot even check the source code. I understand that you are not allowed to publish your data. So why not also use different (public) data sets? There are plenty of them. This would make the paper much more convincing. Also, since the DNNs used in this paper are public, why not publish the experimental setup, so that other people can just "plug in" their own data sets and see the results?

In the end, we were able to provide the data, as it does not contain anything that would need anonymizing. You may find the link at line 513 and below, with a description of how it should be parsed. Unfortunately, the code would still need a bit of polishing before it could be released, but we envision that this will be possible. Nevertheless, we provided extra details about the exact shape of the neural network architectures we developed (lines 315 and onwards), so they should be easy to reproduce in code. Additionally, using other public data sets would be ideal, but for now we focused on the one we collected.
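For readers who would like to reproduce the models before the code is released, a minimal Keras sketch of a window-based forecaster of this kind could look as follows (the window length, layer size, and training series are illustrative assumptions, not the exact architecture from the paper):

```python
import numpy as np
from tensorflow import keras

WINDOW = 20  # number of past measurements per training sample (assumed)

# Illustrative stand-in for the CPU-usage series from the dataset.
series = 0.5 + 0.4 * np.sin(np.linspace(0.0, 60.0, 2000))
X = np.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])[..., None]
y = series[WINDOW:]  # each window predicts the next measurement

model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(WINDOW, 1)),  # size is an assumption
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```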

Also, when conducting experiments, it is important to show that you can beat other existing approaches that already deal with this problem. Clearly, you are aware of other "static" methods that solve similar problems. Are you better than them? If yes, show it. If not, explain why your research is important despite this shortcoming. If you have not tested those existing methods, then your methodology is not sound, as you are providing "yet another tool" while you cannot rule out that other existing methods may be (much) better.

This part has been overhauled, and we have made a two-fold improvement to the paper:

  • added a more detailed analysis section to the paper and provided both weekday and weekend views of the data, with an attached discussion (lines 199 onwards)
  • added ARIMA as one of the models in the comparison, after performing a grid search to discover the best-performing parameters (a sketch of such a search is shown below); we discuss the implementation of the algorithm as well as why the deep learning solutions are better suited to the problem
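A minimal sketch of such a grid search with statsmodels (the parameter ranges and the AIC-based selection are illustrative assumptions, not the exact search used in the paper):

```python
import itertools
import warnings
from statsmodels.tsa.arima.model import ARIMA

def grid_search_arima(series, p_range=range(4), d_range=range(2), q_range=range(4)):
    """Fit every (p, d, q) combination and keep the one with the lowest
    AIC. Each candidate requires a full model fit, which is why this
    search is much slower than training a neural network once."""
    best_aic, best_order = float("inf"), None
    for order in itertools.product(p_range, d_range, q_range):
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                result = ARIMA(series, order=order).fit()
        except Exception:
            continue  # some combinations fail to converge
        if result.aic < best_aic:
            best_aic, best_order = result.aic, order
    return best_order, best_aic
```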

 

At the very least, you should add some very trivial heuristic to the comparison. For example: "if during the previous 2 measurements the ratio increased by > 0.1, then scale up (and vice versa)". Something very simple that would mimic the typical actions triggered in real resource managers when some system-level threshold is reached.

With the current experiments, you are only showing that you are "better than if the system always runs at high CPU usage" (which is trivial to achieve) and that "you converge to the 'best case scenario' when your ratio is over 0.5" (this is nice, but somewhat expected due to the lower chance of false positives).

Thank you for the very astute suggestion. We added a heuristic similar to the one you suggested to the "cumulative sums" comparison chart and discussed the relevance of its result. You may find more information from line 425 onwards.

What is also missing is a thorough experimental description, e.g., how time-consuming it is to run the RNN solution with respect to other available (static) solutions, etc.

We did not perform a thorough comparison of the training duration of each solution, but the grid search for the optimal ARIMA parameters was far slower than training any of the neural network models. It does not scale very well with large amounts of data, as mentioned in the paragraphs that we added.

Fig. 1 and the related discussion still do not demonstrate that your data are noisy. Why do you present a short "smooth night pattern" and not, e.g., a full 24h period, so that the reader can better understand how your usage pattern changes over the day? You seem to have collected 20 days of data, so it should be easy to show a full day (and/or compare day vs. night or work day vs. weekend, etc.).
As an example of a paper that does discuss the data reasonably, I suggest looking at other journal papers, e.g., this one: https://ieeexplore.ieee.org/document/9356582

As mentioned in our reply to the 2nd comment, that part of the discussion has been overhauled, and the decomposition of the data is used as a starting point before passing it through the predictors.

Figures 6, 7, and 8 still have no legend (what are the orange and blue lines?) and the y-axis has the label "value", which does not explain much.

This has been fixed, thank you. The label is now more relevant, and there is a legend for each of the graphs.

Moreover, Figures 7 & 8 occasionally show that the orange line (the predictor, I guess?) predicts negative usage (below 0.0). This is clearly nonsense and indicates that the solution is not properly polished with respect to outliers (unexpected results).

If you read around line 224, we describe how such a scenario can occur. The metrics collected from Prometheus represent the rate of change in CPU usage, not the exact value. We considered this a more relevant metric because we want to be able to predict spikes in usage based on the already aggregated rolling window of data. That is why some values below 0 appeared in the predictions. We understand that this might be confusing to other readers, so for the prediction graphs we trimmed the predictions at 0.
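A minimal sketch of these two details (the PromQL query, its 5m window, and the clipping helper are illustrative assumptions, not the exact ones used in the system):

```python
import numpy as np

# A rate-of-change CPU metric of the kind Prometheus computes over a
# rolling window; the metric name and the 5m window are illustrative.
PROMQL_QUERY = 'rate(process_cpu_seconds_total[5m])'

def trim_predictions(predictions: np.ndarray) -> np.ndarray:
    """A negative predicted rate is not meaningful for scaling
    decisions, so predictions are clipped at 0 before plotting."""
    return np.clip(predictions, 0.0, None)

print(trim_predictions(np.array([-0.05, 0.2, 0.7])))  # [0.  0.2 0.7]
```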

line 365 "values in 4" -> in Table 4

We have addressed this, thank you.

line 421 "recurrent aspect of he implemented"  he -> the

We have addressed this, thank you.

We hope that the changes we provided will be considered sufficient, especially given the short amount of time we had for the 2nd revised version of the paper.

 

Reviewer 2 Report

I appreciate the deep revision of the contents and the present form of this manuscript. There is still a little content that needs to be revised according to the reviewer's comments in order to meet the requirements for publication. A number of concerns are listed as follows:

 

(1) The authors need to interpret the meanings of the variables.

 

(2) Please highlight your contributions in the introduction.

 

(3) In line 261, how are these parameters determined? For example, 25793, 17573, … The authors should give a detailed explanation.

 (4) Conclusion: What are the advantages and disadvantages of this study compared to the existing studies in this area?

 

(5) The inspiration for your work must be further highlighted. Some suggested recent literature should be added to the revised paper according to the reviewer's previous comments.

 

(6) Further correct typographical mistakes and mathematical errors.

I hope that the authors can carefully revise this manuscript further according to the reviewer's comments in order to meet the requirements for publication.

Author Response

Dear reviewer, thank you very much for your thorough review. We have tried to take all your comments and remarks into consideration. Please find our answers inline below.

(1) The authors need to interpret the meanings of the variables.

We have further extended the discussion and the data analysis part of the paper; see lines 195 and below. Additionally, we added ARIMA as one of the static solutions among the implementations, since it is a good tool to compare with the deep learning implementations.

(2) Please highlight your contributions in the introduction.

We have added the contribution section in the introduction (see lines 35 and below).

(3) In line 261, how are these parameters determined? For example, 25793, 17573, … The authors should give a detailed explanation.

Starting from line 305, we explain how the algorithms were implemented. Furthermore, we added tables containing the full parameter breakdown for each of the models. This should suffice for anyone who would like to implement a similar model in the future.

 (4) Conclusion: What are the advantages and disadvantages of this study compared to the existing studies in this area?

We have provided a few advantages and disadvantages starting from line 481. Additionally, we open-sourced the data, as can be seen at lines 513 and below, where we describe how to use it.

(5) The inspiration for your work must be further highlighted. Some suggested recent literature should be added to the revised paper according to the reviewer's previous comments.

Thank you for the comment; https://ieeexplore.ieee.org/document/9356582 has been added as a suitable recent citation, as well as https://doi.org/10.3390/agriculture12060793 from the previous round of comments.

(6) Further correct typographical mistakes and mathematical errors.

We have gone through the text and math in the paper thoroughly, proofreading it several times. It should be correct now.

We hope that the changes we provided will be considered sufficient, especially given the short amount of time we had for the 2nd revised version of the paper.

Reviewer 3 Report

I have observed that the authors have responded well to all the comments/suggestions. I had suggested using NSE, which is a very good performance metric for time series analysis. I leave it to the authors to decide whether they want to add this metric.

 

Author Response

Dear reviewer,

Thank you very much for your comments after checking the revised paper. We have decided to keep the current metrics without including the NSE. We appreciate your feedback and will take it into consideration for future publications as well.

We have made some other modifications to the paper, as requested by the other reviewers. Some of the major changes include the addition of a heuristic for comparison with the prediction metrics (cumulative sums). Additionally, we open-sourced the dataset used and added ARIMA as a "static" comparison model.

Thank you for your review!

Round 3

Reviewer 1 Report

Dear authors,

I think that this third version is much better than the two previous versions I have reviewed. Of course, I would like to see more convincing experiments (more techniques and more data sets), but I must admit that, given the short time for fixes, this paper has been improved a lot.

Therefore, I think this paper can now be published, because its scientific value has been improved in several areas (public data, detailed log analysis, added experiments).

I hope my suggestions have been useful and I thank you for your detailed answers to my comments.

One last comment: if possible, please elaborate on what it means when the "heuristic" (Fig. 15) achieves lower cumulative sums than the "best case" solution. Is it better than "optimal"? I don't think so. I believe there is some tradeoff, e.g., it means that the system is overloaded then, right? This should be properly explained. Thank you.

Author Response

Dear reviewer,
Thank you very much for the second review. Your comments were valuable and we appreciate their thoroughness, as it has helped us greatly improve the quality of the paper. 

Regarding your last comment, we have included a short discussion paragraph explaining why cumulative sum values lower than the ideal are bad, since the ideal value should be the lower limit for scaling. Please read the explanation starting from line 451.

We wish you all the best and thank you for your peer reviews.

Reviewer 2 Report

There is still a little content that needs to be revised according to the reviewer's comments in order to meet the requirements for publication. A number of concerns are listed as follows:

(1) At line 4, “In this paper we present the benefit…” -> “In this paper, we present the benefit…”. Further correct typographical mistakes and mathematical errors.

(2) At line 19, “rnn” -> “RNN”, “lstm” -> “LSTM”, etc.

(3) At line 216, ”Figure 4 shows all the components involved in the system being monitored” -> ”Figure 4 shows all the monitored components in the system”.

(4) The conclusion should be rewritten more carefully, summarizing what has been learned and why it is interesting and useful.

(5) In order to further highlight the introduction, some suggested references should be added to the paper to improve the review part.

Author Response

Dear reviewer,
Thank you very much for your valuable second review. Regarding points 1-3, we have diligently applied your corrections.
Regarding point 4, we believe that the potential improvements of up to 80% stated around line 488 are sufficiently interesting and useful. Additionally, the dataset has been open-sourced, allowing you to benefit from it and draw your own conclusions: https://github.com/saibot94/cpu-dataset-prometheus.

Regarding point 5, as previously stated, we included some of the suggested references that you gave us.

We wish you all the best and thank you for your peer reviews.
