An Improved Multi-Objective Deep Reinforcement Learning Algorithm Based on Envelope Update
Round 1
Reviewer 1 Report
- The need for the discount factor \gamma should be stated together with Eq. 6.
- The network in Fig. 2 acts, to some degree, like a critic network. It would be nice if the authors could discuss, in the introduction, its benefit over the networks proposed in previous works such as 10.1080/16168658.2021.1943887 or similar works. That would help clarify this contribution for readers.
- The convergence of the update law in Eq. 9 should be explained in more detail in terms of the sequence of \theta^{-} when a positive parameter \tau < 1 is required.
- To clarify the results in Fig. 5, all loss function curves could be plotted together or with the same y-axis scale.
- Section 5 should be improved so that it better serves its purpose of pointing out the advantages of the proposed scheme.
- The conclusion, and also the abstract, should be rewritten slightly to emphasize the results in line with the main contributions of this work.
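The convergence question about Eq. 9 can be illustrated with a small sketch. Eq. 9 is not reproduced in this record, so the sketch assumes it is the standard soft (Polyak) target update \theta^{-} <- \tau \theta + (1 - \tau) \theta^{-} with 0 < \tau < 1. The function names, the parameter values and the simplification that the online parameters \theta stay fixed are all hypothetical. Under those assumptions the gap between \theta^{-} and \theta shrinks by a factor (1 - \tau) at every step.

```python
# Illustrative sketch only: assumes Eq. 9 is the standard soft target update
#   theta_target <- tau * theta_online + (1 - tau) * theta_target,  0 < tau < 1,
# and that theta_online is held fixed (a simplification; it changes during training).

def soft_update(theta_target, theta_online, tau):
    """Apply one soft update step to a list of scalar parameters."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(theta_online, theta_target)]


def gap_after(steps, theta_target, theta_online, tau):
    """Largest absolute gap between target and online parameters after `steps` updates."""
    for _ in range(steps):
        theta_target = soft_update(theta_target, theta_online, tau)
    return max(abs(w - w_t) for w, w_t in zip(theta_online, theta_target))


# Hypothetical values chosen for illustration.
theta_online = [1.0, -2.0, 0.5]
theta_target = [0.0, 0.0, 0.0]
tau = 0.1

initial_gap = gap_after(0, theta_target, theta_online, tau)
gap_50 = gap_after(50, theta_target, theta_online, tau)

# With fixed online parameters the gap contracts geometrically:
# gap_k = (1 - tau) ** k * gap_0.
predicted_gap_50 = (1.0 - tau) ** 50 * initial_gap
```

With fixed \theta, any 0 < \tau < 1 gives geometric convergence of \theta^{-} to \theta. When \theta itself moves during training, \theta^{-} instead tracks an exponentially weighted average of past \theta values, which is the case the comment asks the authors to discuss.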
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Multi-objective reinforcement learning (MORL) aims to uniformly approximate the Pareto frontier in multi-objective decision-making problems, but it suffers from insufficient exploration and unstable convergence. The authors propose a multi-objective deep reinforcement learning algorithm (Envelope with Dueling structure, Noisynet and soft update, EDNs) to improve the agent's ability to learn optimal multi-objective strategies. First, the EDNs algorithm uses neural networks to approximate the value function and updates the parameters based on the convex envelope of the solution boundary. Then the DQN network structure is replaced with the dueling structure, and the state value function is split into the dominance function and the value function to make it converge faster. Second, the Noisynet method adds exploration noise to the neural network parameters to give the agent a more efficient exploration ability. Finally, the soft update method updates the target network parameters to stabilize the training procedure. Using the DST environment as a case study, the experimental results show that the EDNs algorithm has better stability and exploration capability than the EMODRL algorithm. In 1000 episodes, the EDNs algorithm improved the coverage by 5.39% and reduced the adaptation error by 36.87%.
The following references should be added to the reference section:
1. Vandana, R. Dubey, Deepmala, L.N. Mishra, V.N. Mishra, Duality relations for a class of a multiobjective fractional programming problem involving support functions, American J. Operations Research, Vol. 8 (2018), pp. 294–311. DOI: 10.4236/ajor.2018.84017.
2. R. Dubey, Deepmala, V.N. Mishra, Higher-order symmetric duality in nondifferentiable multiobjective fractional programming problem over cone constraints, Stat., Optim. Inf. Comput., Vol. 8 (March 2020), pp. 187–205. DOI: 10.19139/soic-2310-5070-601.
Recommendation: Based on the above revisions, the manuscript can be accepted in this journal after minor revision.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
MERITS
- An accurate presentation of MORL methods is provided, based on a detailed investigation of related works from the literature, with both mathematical support and technical description
- A novel MORL algorithm is proposed and tested against other ones, showing higher performance for some particular multi-objective optimization problems
- The pseudo-code is presented
- The new algorithm's performance is demonstrated in a clear way, both by graphs and by numerical values
CRITICISMS
- Since the first part of the fifth section, Related Work, presents other authors’ work, it should be placed at the beginning of the paper.
ERRORS
- There are a few editing errors: capitalized “the”, “on” and “and” in the titles of sub-sections 3.3 and 4.1
- The text “In this paper, with the number of training episodes as the horizontal coordinate and the value of loss function as the vertical coordinate. We compare the loss function curves of EMODRL, Envelope-Dueling, Envelope-Noise, Envelope-soft, Envelope-Dueling-…” is unclear.
- Some figures are oversized
RECOMMENDATIONS
- Correct the errors
- Try to reduce the size of Figures 1, 2, 3 and 4
- The references should be cited in the text in increasing numerical order
- If possible, remove the fifth section and move its content into the Background section
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The authors have done good work in revising the article.