Low-Overhead Reinforcement Learning-Based Power Management Using 2QoSM
Round 1
Reviewer 1 Report
This paper proposes a reinforcement-learning-based Quality-of-Service Manager (2QoSM) that takes input from both the application and the computing platform to learn a power-management policy. The 2QoSM is implemented on an autonomous robot built on an embedded single-board computer (SBC) and running a high-resolution path-planning algorithm. Well-designed experiments validate the effectiveness of the proposed framework compared to the Linux on-demand and situation-aware governors. The paper is well written and presented. In my opinion, this submission is acceptable for publication. Nevertheless, I have a few minor comments to improve the overall quality of the manuscript:
- Fig. 7 is not clear: a key defining the color coding of the reward functions is needed. The mapping from the collected metrics of power, performance, and runtime (on the left) and the Energy-Error-Delay Product (on the right) to the reward functions is unclear.
- Many figures are hard to read in a black-and-white hard copy (Fig. 7 to Fig. 14). The authors are encouraged to present the data in a way that remains legible in black and white.
Author Response
We thank the reviewer for the helpful suggestions. We have updated the figures' coloring and layout to improve black-and-white legibility. In addition, we have added a clearer legend to Figure 7.
Reviewer 2 Report
Excellent work. However, a few minor questions to clarify:
1) Could the authors provide a table summarizing the trade-off between power and accuracy of the path-finding algorithms?
2) What is the power distribution across the overall system, including the motor driver, etc.? The reported power currently covers only the SBC's algorithm computation. The authors could include the power overhead of the mechanical motor systems.
3) The Q definition involves several curve-fitting parameters. How are these factors determined to obtain a good model? What is the methodology for choosing the discount factor, learning rate, etc.? Is it done by heuristics?
4) In Figure 5, what criterion is used as the convergence threshold? How is convergence defined, i.e., as a desired minimum Q or as the variance of Q?
5) Add a table laying out the hardware and software used in the experiments.
6) It would be aesthetically desirable to show a graphic of the actual prototype robot in action, along with a breakdown of the hardware configuration.
Author Response
Thank you for the detailed questions and helpful suggestions.
- The trade-off between accuracy and power consumption is difficult to quantify directly in table form. Figure X compares the various reward functions and the error/power relationship. However, because the error of the system is a combination of path efficiency and the actual physical behavior of the robot, it is difficult to state quantitatively what the accuracy of any given path is.
- While the power consumption of the motors naturally varies with motor torque and velocity, in general the motor system draws roughly as much power as the SoC at maximum power.
- The parameters were determined heuristically from a set of commonly used values (see the first sketch after this list). However, future work will include a broader parametric exploration of these hyperparameters as well as dynamic, online updating of the parameters.
- The convergence threshold used in the experiments is a windowed average of Q-value updates: when the average update to a Q-value over an n-sample window falls below the threshold, convergence is considered adequate (see the second sketch after this list).
- Thank you for the suggestion; we will add a table.
- We will add a photo of the robot.
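As an illustration of where these hyperparameters enter, the following is a minimal sketch of a generic tabular Q-learning update in Python. The learning rate, discount factor, and exploration rate shown (0.1, 0.9, 0.1) are commonly used placeholder values, not the exact values used in our experiments, and the state/action encoding is left abstract.

    import random
    from collections import defaultdict

    ALPHA = 0.1    # learning rate (illustrative, commonly used value)
    GAMMA = 0.9    # discount factor (illustrative, commonly used value)
    EPSILON = 0.1  # exploration rate for epsilon-greedy action selection

    Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term reward

    def choose_action(state, actions):
        # Epsilon-greedy selection over the available actions (e.g., power/frequency settings).
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(state, action, reward, next_state, actions):
        # One temporal-difference update; returns the magnitude of the change,
        # which is what the convergence check below operates on.
        best_next = max(Q[(next_state, a)] for a in actions)
        delta = ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        Q[(state, action)] += delta
        return abs(delta)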
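The convergence check can then be sketched as a windowed average of these update magnitudes; the window size and threshold below are placeholders rather than the values used in the experiments.

    from collections import deque

    class ConvergenceMonitor:
        # Declares convergence once the mean magnitude of Q-value updates
        # over the last `window` samples falls below `threshold`.
        def __init__(self, window=100, threshold=1e-3):
            self.deltas = deque(maxlen=window)
            self.threshold = threshold

        def record(self, delta):
            self.deltas.append(abs(delta))

        def converged(self):
            if len(self.deltas) < self.deltas.maxlen:
                return False
            return sum(self.deltas) / len(self.deltas) < self.threshold

    # Usage: after each q_update, call monitor.record(delta); training is
    # considered converged once monitor.converged() returns True.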