Article
Peer-Review Record

Renewable-Aware Frequency Scaling Approach for Energy-Efficient Deep Learning Clusters

Appl. Sci. 2024, 14(2), 776; https://doi.org/10.3390/app14020776
by Hyuk-Gyu Park 1 and Dong-Ki Kang 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 22 November 2023 / Revised: 7 January 2024 / Accepted: 9 January 2024 / Published: 16 January 2024
(This article belongs to the Special Issue Recent Applications of High-Performance Computing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The article "Renewable-aware Frequency Scaling Approach for Energy-Efficient Deep Learning Clusters" develops an alternative frequency scaling approach to improve energy utilization.

Given a fixed set of jobs in an underutilized cluster, the scheduler tries to lower the frequency at which the jobs run (i.e., to minimize energy consumption) while still meeting the overall deadlines for completing the jobs.

 

Overall, the paper contains solid and interesting research with some issues that can be fixed in the preparation of the final manuscript.

 

== Pros ==

+ Well organized paper.

+ The conducted evaluation of performance for the GPUs is valuable. 

+ Recent hardware, good scenarios (except workload specification, see below)

+ Good presentation of the figures.

 

== Cons ==

- Various equations appear slightly convoluted (too complicated) for the simplicity of their meaning - I recommend a simplification.

A couple of questions remain unanswered:

- Are the jobs considered only single node training? How about parallel jobs?

- Section 6.3: the workload specification isn't that clear to me. Are you saying that the same workload is executed 100x? Please expand here.

 

== Minor issues ==

- The authors assume that renewable energy is free of charge - this should at least be discussed in two sentences.

- As expected, and given the results in Figure 5, there exist optimal execution points for all models depending on the GPU type.

It appears quite natural to set the frequency such that the minimum energy consumption is reached.

Such a competing algorithm isn't included, I would be curious about its results.
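
A minimal sketch of such a minimum-energy baseline, for illustration only. The profile layout, the function names, and all numbers are hypothetical and are not taken from the manuscript; the sketch assumes that each frequency has a measured average power and epoch time, so that energy per epoch is their product.

```python
from typing import Dict, Optional, Tuple

# Hypothetical profile: GPU clock in MHz -> (average power in W, epoch time in s).
Profile = Dict[int, Tuple[float, float]]


def energy_wh(power_w: float, time_s: float) -> float:
    """Energy of one epoch in watt-hours, assuming constant average power."""
    return power_w * time_s / 3600.0


def min_energy_frequency(profile: Profile,
                         deadline_s: Optional[float] = None) -> int:
    """Return the frequency with the lowest energy per epoch.

    If a deadline is given, only frequencies whose epoch time meets it
    are considered.
    """
    feasible = {
        freq: (power, time)
        for freq, (power, time) in profile.items()
        if deadline_s is None or time <= deadline_s
    }
    if not feasible:
        raise ValueError("no frequency meets the deadline")
    return min(feasible, key=lambda freq: energy_wh(*feasible[freq]))


# Made-up measurements for one model on one GPU type.
example_profile: Profile = {
    900: (120.0, 100.0),
    1200: (150.0, 70.0),
    1500: (200.0, 60.0),
    1800: (260.0, 55.0),
}

print(min_energy_frequency(example_profile))                   # unconstrained
print(min_energy_frequency(example_profile, deadline_s=65.0))  # with deadline
```

With a deadline, the baseline can be forced away from the unconstrained optimum, which is the case where a comparison with the proposed approach would be most informative.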

 

 

== Detailed comments ==

* 122 and following in Section 2. - Remove () around the terms such as (Service Users) 

* RA-FS Manager label not included in Figure 1 (I suppose the green box in the center?)

* Equation 1, you can use \cdot instead of *

* 170 sinh(·)) 

* Equations 2, 3, 4: the ending "," and "." are unnecessary

* 190, 215 (... DJR) remove () -> I won't report that further

* 197 - the notion of ∝ should be clarified (proportional, I suppose)

* I wonder about Definition 2: the usual definition of "throughput" is something per time. Given that the unit of Tau isn't specified (but appears to be in seconds), the overall equation resolves to a quantity with no time in it. Overall, the presentation is a bit convoluted for a simple scenario; I recommend a rewrite. Also provide an example to demonstrate correctness.

Also 234, δi = δiF + δiB + δiO => then why not simply start with δi instead and simplify the equations?

* 222 - Tehn

* 233 p(k)

* 240 " , "

* 241 (7))

* 308 \texttt{nvidia-smi XXX}

* 327 dshow

* 408 December~1 (to prevent line break)

Author Response

We have attached the response sheet for Reviewer 1.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper addresses the important issue of energy consumption in artificial intelligence processing. The authors rightly used graphics cards, which are known to be more efficient for many applications. The layout of the work is correct. The formatting of the paper seems to be correct, but I feel that all the text is shifted too far to the right.

 

I have a few comments regarding the measurement of energy consumption.

The authors wrote: "we collected data on epoch completion times and power consumption using parsing scripts we developed." Please explain how this script measures power consumption. Is it read from the graphics card driver, i.e. programmatically? If so, you should confirm the accuracy of these values, e.g. for the same test, compare the power consumption read by the script against the consumption measured on the power line between GPU and PSU. If these values coincide, you can assume that the reading with the script is sufficient.
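
To illustrate the kind of check meant here, the following is a minimal sketch and not the authors' script. It assumes the driver readings were logged with `nvidia-smi --query-gpu=timestamp,power.draw --format=csv,noheader,nounits`, and that timestamps appear in the form `YYYY/MM/DD HH:MM:SS.mmm`; the sample log, the reference value, and the function names are made up.

```python
from datetime import datetime
from statistics import mean
from typing import List, Tuple


def parse_power_log(text: str) -> List[Tuple[datetime, float]]:
    """Parse 'timestamp, power' CSV lines into (datetime, watts) pairs.

    The timestamp layout is an assumption about the log format.
    """
    samples = []
    for line in text.strip().splitlines():
        stamp, watts = (field.strip() for field in line.split(","))
        when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S.%f")
        samples.append((when, float(watts)))
    return samples


def relative_deviation(reported_w: float, reference_w: float) -> float:
    """Relative difference between driver-reported and externally measured power."""
    return abs(reported_w - reference_w) / reference_w


# Made-up log excerpt and a made-up external meter reading (W).
sample_log = """\
2024/01/07 12:00:00.000, 148.00
2024/01/07 12:00:01.000, 152.00
2024/01/07 12:00:02.000, 150.00
"""
reported = mean(watts for _, watts in parse_power_log(sample_log))
print(reported, relative_deviation(reported, reference_w=155.0))
```

A small and stable relative deviation over the same test run would support using the driver readings alone.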

What is not clear is the value (W) in the graphs in Figure 4. Are these peak values, instantaneous, average? If the energy consumption was constant during the measurement, please provide one measurement as a function of time, which will justify Figure 4. If the energy consumption changes over time, please provide [Wh], in which case Figure 5 seems unnecessary.
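
The distinction between peak, average, and energy values can be sketched as follows. This is an illustration with made-up samples, assuming power readings taken at known times; energy is obtained by trapezoidal integration of power over time.

```python
from typing import Dict, List, Tuple


def summarize_power(samples: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize (time in s, power in W) samples sorted by time.

    Returns the energy in Wh (trapezoidal rule), the time-weighted
    average power in W, and the peak power in W.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    duration_s = samples[-1][0] - samples[0][0]
    return {
        "energy_wh": joules / 3600.0,
        "average_w": joules / duration_s,
        "peak_w": max(power for _, power in samples),
    }


# Made-up trace: constant 100 W for 30 min, then a ramp to 300 W over 30 min.
trace = [(0.0, 100.0), (1800.0, 100.0), (3600.0, 300.0)]
print(summarize_power(trace))
```

When power varies like this, the peak and the average differ noticeably, which is why the graphs should state which of the two they report.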

Please correct the notation of units: (W) and (Wh) should be in square brackets, [W] and [Wh].

It would be a good idea to state what power consumption the cards used have in a popular synthetic test, e.g. 1 minute in Furmark.

Author Response

We have attached the response sheet for Reviewer 2.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed all reported reviewer comments and even included a new algorithm based on the feedback.

The improvements are overall satisfactory; if the description of the input is improved, the article would meet the quality requirements for publication.

Suggested improvements:

For the analysis, it would be beneficial to further improve the description of the energy sources. While it is clear how the time series data is obtained, it remains unclear how much energy is provided from these sources. How much would be optimally available? How was this determined?
Also, include the (expected) overall utilization in Section 6.3. I assume it is lower than anticipated.

Author Response

Please refer to attached response letter.

Author Response File: Author Response.pdf
