Article
Peer-Review Record

GDR: A Game Algorithm Based on Deep Reinforcement Learning for Ad Hoc Network Routing Optimization

by Tang Hong, Ruohan Wang, Xiangzheng Ling * and Xuefang Nie
Reviewer 1: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Electronics 2022, 11(18), 2873; https://doi.org/10.3390/electronics11182873
Submission received: 7 August 2022 / Revised: 29 August 2022 / Accepted: 3 September 2022 / Published: 11 September 2022
(This article belongs to the Section Artificial Intelligence)

Round 1

Reviewer 1 Report

The paper is organized well. Please add some statistics about the obtained results in the abstract. Moreover, you must compare your approach with some existing methods (comparing with GMM is not enough).

Author Response

    \textbf{Response:} Thanks for your comments and suggestions. We have added a comparison with the existing method EBTG, and the experiments show that our method has a lower standard deviation and a slower rise rate, as follows:
    
    \begin{quote}
    \textsf{Section 4.2, paragraph 1}: \hl{To show the superiority of dynamic topology control and deep reinforcement learning, we compare GDR (proposed in this paper) with the Energy Balance Topology control Game algorithm (EBTG) and the Routing strategy based on deep reinforcement learning (PRMD) for both residual energy and survival time. }
    \end{quote}
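
    For illustration, the load-balance comparison mentioned above can be summarized by the standard deviation of the nodes' residual energy over time; the minimal sketch below (Python, with hypothetical function and variable names, not the code used in the paper) shows this metric:

    \begin{verbatim}
# Hypothetical sketch of the load-balance metric used when comparing GDR with
# EBTG and PRMD: the standard deviation of residual node energy at one instant.
# Names are illustrative, not taken from the paper's code.
import statistics

def residual_energy_std(residual_energies):
    """Residual energies (in J) of all alive nodes at one sampling instant."""
    return statistics.pstdev(residual_energies)

# A more balanced topology yields a smaller standard deviation.
print(residual_energy_std([42.0, 41.5, 43.1, 40.9]))   # balanced load
print(residual_energy_std([49.0, 12.3, 47.8, 20.1]))   # unbalanced load
    \end{verbatim}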

Author Response File: Author Response.pdf

Reviewer 2 Report

Can you please check Algorithm 1 again. I do not see how this is a 'game' per se and not a heuristic approach since it is not apparent how the nodes interact with each other to make it a 'game'.

4.1 how this is a 'game' is not certain. It appears that this could be considered an optimization instead (those 2 things are not equivalent). There has to be an interaction between the players for it to be a game.

 

Figure 4, what is valuable about the topology of Figure 4b over 4a? What has been achieved is not evident to the reader.

Algorithm 2, the pseudo code is not formal enough without the argument and return variables provided, please fix.

Where is the loss function defined? It should be defined before its mention in Algorithm 2.

The major issue, which is paramount, is what this work is achieving. It is not clear what changes in the topology are beneficial; perhaps those in this industry can recognize it directly without reading an explanation, but a newcomer cannot. The introduction is not direct and explicit about the problem statement; 'what is this addressing' is not apparent to the reader. This needs to be changed.

Also, what is novel about the work? What state of the art is being surpassed here?
Author Response

\quad \textcolor{black}{Thank you so much for the critical but helpful comments. We have revised our paper carefully according to them.}
    
    \textbf{2.1 } 
    \textit{Can you please check Algorithm 1 again. I do not see how this is a 'game' per se and not a heuristic approach since it is not apparent how the nodes interact with each other to make it a 'game'.}
    
    \textbf{Response:} Thanks for your comments and suggestions. We consider this a game because the nodes interact with each other; we give a simple example before introducing game theory in Section 3.2. Accordingly, when introducing the game model, we present the participants and the game strategies. The game strategy includes the choice of links, which explicitly represents the interaction between nodes.
    
    \textbf{2.2 }
    \textit{4.1 how this is a 'game' is not certain. It appears that this could be considered an optimization instead (those 2 things are not equivalent). There has to be an interaction between the players for it to be a game. }
    
    \textbf{Response:} Thanks for your comments and suggestions. We consider this a game because the nodes interact with each other. We gave a simple example before introducing game theory in section 3.2. The new manuscript is as follows:
    \begin{quote}
        \textsf{Section 3.2, paragraph 2}: "Figure 3(a) shows the topology generation diagram of the nodes at a certain moment, assuming that the current average lifetime is 4 and the remaining lifetimes of nodes A, B, C and D are 7, 3, 4 and 2, respectively. Node A can reach node B with the smallest transmit power and the least energy consumption, while node B must reach node C with the largest transmit power and the most energy consumption. If operation continues according to Figure 3(a), node B will surely die due to excessive energy consumption. In Figure 3(b), the remaining lifetime of node A is 7, which is longer than the average lifetime, so we consider increasing its transmit power to help the surrounding nodes reduce their energy consumption; the communication radius of node A therefore increases from reaching only node B to reaching node C, whose lifetime equals the average. In Figure 3(c), node B has a shorter-than-average lifetime, so we consider reducing its transmit power to lower its energy consumption and extend its lifetime, under the condition that network connectivity is preserved. Therefore, the communication radius of node B is reduced from reaching the distant node C to reaching only its nearest neighbor. The remaining lifetime of node C equals the average lifetime, so its transmit power remains unchanged. After the transmit powers are adjusted, the network remains connected and node B will not die prematurely."
    \end{quote}
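
    To make the adjustment rule concrete, the following minimal sketch (Python; illustrative only, not the exact Algorithm 1 from the paper, and the connectivity check is omitted) captures the lifetime-based power adjustment described in the example above:

    \begin{verbatim}
# Illustrative sketch of the power-adjustment idea in the example above
# (not the paper's Algorithm 1): a node whose remaining lifetime exceeds the
# network average raises its transmit power (larger radius), a node below the
# average lowers it, and a node at the average keeps its power unchanged.
def adjust_power(node_lifetime, avg_lifetime, power, step=1.0,
                 p_min=1.0, p_max=10.0):
    if node_lifetime > avg_lifetime:      # e.g. node A (7 > 4): help neighbours
        return min(power + step, p_max)
    if node_lifetime < avg_lifetime:      # e.g. node B (3 < 4): save energy
        return max(power - step, p_min)
    return power                          # e.g. node C (4 == 4): unchanged

lifetimes = {"A": 7, "B": 3, "C": 4, "D": 2}
avg = sum(lifetimes.values()) / len(lifetimes)        # 4.0
powers = {n: adjust_power(t, avg, power=5.0) for n, t in lifetimes.items()}
print(powers)   # A raised, B and D lowered, C unchanged
    \end{verbatim}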
    
    \textbf{2.3 }
    \textit{Figure 4, what is valuable about the topology of Figure 4b over 4a? What has been achieved is not evident to the reader. }
    
    \textbf{Response:} Thanks for your question. We have added some content to the article to make it clearer and have supplemented detailed descriptions of these two figures, as follows:
    
    \begin{quote}
        \textsf{Section 3.2.3, paragraph 2 }: "\hl{After Algorithm 1, we are able to obtain a new and more concise topology. The new topology not only reduces node energy consumption, but also has higher link quality between nodes and a lower node load.} Figure 4 shows the process of establishing the topology of the topology control game algorithm. From Figure 4(a), we can see that in the initial state of the network, the communication links constructed between nodes can cause high node energy consumption and lead to premature node death. "
    \end{quote}
    
    \textbf{2.4 }
    \textit{Algorithm 2, the pseudo code is not formal enough without the argument and return variables provided, please fix.}
    
    \textbf{Response:} Thanks for your advice. The DRL agent updates the network parameter $\theta $ and the utility function parameters $\alpha$, $\beta $, $\eta$ and $\mu$ in continuous iterations until the loss function converges; the trained parameters are the implicit output, so we do not provide explicit return variables in the pseudo-code.
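
    For clarity, the sketch below (Python, with an assumed agent/environment interface exposing rollout, compute_loss and update; not the authors' implementation) illustrates why no explicit return value is needed: the parameters are simply updated in place until the loss converges.

    \begin{verbatim}
# Minimal sketch (assumed names, not the authors' code) of why Algorithm 2 has
# no explicit return value: the network parameters theta and the utility
# weights alpha, beta, eta, mu are updated in place each iteration until the
# loss converges, so the trained state itself is the output.
def train(agent, env, tol=1e-4, max_iters=10_000):
    prev_loss = float("inf")
    for _ in range(max_iters):
        batch = env.rollout(agent)            # interact with the topology
        loss = agent.compute_loss(batch)      # e.g. the policy-gradient loss
        agent.update(loss)                    # updates theta, alpha, beta, eta, mu
        if abs(prev_loss - loss) < tol:       # convergence criterion
            break
        prev_loss = loss
    # nothing is returned: agent now holds the converged parameters
    \end{verbatim}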
    
    \textbf{2.5 }
    \textit{Where is the loss function should be defined before its mention in the algorithm 2. }
    
    \textbf{Response:} Thanks for your comments and suggestions. We have corrected it in the new manuscript, as follows:
    
    \begin{quote}
        \textsf{Section 3.3.3,paragraph 4}:"During the learning process, the agent continuously updates the policy driven by the cumulative reward function, until the best policy for routing optimization is learned.\hl{
         We optimize the cross-entropy loss and back-propagate the gradients through the policy network. The loss function is as follows:}
         \begin{equation}
         \mathcal{L}_{\theta}=\sum_{t=0}^{T}\log \pi_{\theta}\left( a_{i,t}\mid o_{i,t} \right)\left( \sum_{t'=t}^{T}\left( r\left( o_{i,t'},a_{i,t'} \right) -b_{i,t'} \right) \right)
         \end{equation}
         
         Therefore, the total loss from each small batch is given by:
         \begin{equation}
         \mathcal{L}_{\mathrm{total}}=\frac{1}{Z}\sum_{i=1}^{Z}\mathcal{L}_{\theta}
         \end{equation}
         
    \end{quote}
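
    As an illustration, the loss above can be computed as in the following sketch (plain Python with an assumed per-episode data layout; not the paper's implementation):

    \begin{verbatim}
# Sketch (assumed data layout) of the loss in the equations above: for each
# episode i, every step's log-probability is weighted by the baseline-corrected
# reward-to-go, and the total loss averages over the Z episodes in the batch.
def episode_loss(log_probs, rewards, baselines):
    T = len(log_probs)
    loss = 0.0
    for t in range(T):
        # sum_{t'=t}^{T} ( r(o_t', a_t') - b_t' )
        reward_to_go = sum(r - b for r, b in zip(rewards[t:], baselines[t:]))
        loss += log_probs[t] * reward_to_go
    return loss

def total_loss(batch):
    """batch: list of (log_probs, rewards, baselines) tuples, one per episode."""
    return sum(episode_loss(*ep) for ep in batch) / len(batch)

print(total_loss([([-0.5, -1.2], [1.0, 0.5], [0.2, 0.1])]))  # one toy episode
    \end{verbatim}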
    
    \textbf{2.6 }
    \textit{The major issue, which is paramount is what this is achieving. It is not clear what changes in the topology are beneficial, maybe those in this industry can recognize it directly without reading an explanation but it is not for a newcomer. The context for the introduction is not direct and explicit on the problem statement 'what is this addressing' is not apparent to the reader.  This needs to be changed. }
    
    \textbf{Response:} Thanks for your comments and suggestions. It can be concluded from the experiments that the game model in the algorithm reduces the node load and improves the link quality. At the same time, this results in smaller inputs and better computational performance for the reinforcement learning agent. It can be seen as follows:
    \begin{quote}
        \textsf{Section 3.2.3 paragraph 2} :"\hl{After Algorithm 1, we are able to obtain a new and more concise topology. The new topology not only reduces node energy consumption, but also has higher link quality between nodes and a lower node load.} Figure 4 shows the process of establishing the topology of the topology control game algorithm. From Figure 4(a), we can see that in the initial state of the network, the communication links constructed between nodes can cause high node energy consumption and lead to premature node death. "
    \end{quote}
    
    
    \textbf{2.7 }
    \textit{Also, what is novel about the work? What state of the art is being surpassed here? }
    
    \textbf{Response:} We have added an overview of the work in Section 1 (Introduction), paragraph 6:
    
    \hl{To obtain a stable network with a longer life cycle and lower delay, and to solve the problem of unbalanced energy consumption during dynamic networking, we propose an adaptive routing algorithm based on reinforcement learning. The method first describes the network topology and network transmission parameters in terms of a graph neural network, and then dynamically adjusts the topology using a game algorithm. In order to maintain stability when the network changes dynamically, reinforcement learning is used to generate automatic routing strategies, so that the GNN can attain approximately optimal performance without a priori information about the environment and is capable of independent exploration and optimization decisions.} 
    
    In Section 4.2 we have added a comparison experiment with other methods. We compare GDR (proposed in this paper) with the Energy Balance Topology control Game algorithm (EBTG) and the routing strategy based on deep reinforcement learning (PRMD) for both residual energy and survival time. The results show that our method outperforms both.

Author Response File: Author Response.pdf

Reviewer 3 Report

 

The paper looks great: well structured and with a high quality of scientific presentation.

However, it needs a few things to be ready for publication:

1. The abstract needs to be more informative; include the achieved results with benchmarking against the closest related works.

2. There are serious problems with citations: the first paragraph of the introduction has no reference citation; in the 4th paragraph the citations [4-7] come only at the end, whereas they could be distributed within the paragraph to be more specific; and in the last three paragraphs of the introduction only one citation, [12], is given, although they are full of information that needs to be identified and cited.

3. Related work is also not presented well. Splitting this section into subtitles (graph, deep learning, and game theory) will not make clear at the end the gap or the problem to be resolved; the literature needs to be discussed as one unit. In this format it gives the impression that the manuscript has three different contributions or addresses three different problems. A few related references need to be added to the manuscript; 21 references are too few.

4. Dynamic Graph Construction Based on Ad Hoc Network: this section has too much basic information for those in the field; it would be better to take it out and give a brief description of what needs to be done.

5. Sections 3, 4 and 5 (up to 5.3) are all well-known material and present no contribution; this makes the manuscript look like a book chapter rather than a scientific paper. These sections need to be drastically reduced and focused on the contribution. Section 5.3 should be rephrased as the proposed GDR method.

6. Section 5.3 looks too simple: the three methods are connected in a chain and not even simple feedback is given. If we delete the network graphs below the boxes, it becomes a very simple flow chart, where DGC --> Agent <-- TCGA and then Agent --> GCN, which is very simple and needs more elaboration. These boxes need to be split into many boxes with details for the proposed algorithm to be acceptable for publication.

7. Agent --> GCN, just like that? No output is needed from the GCN? Then why use it, if no output is needed?

8. In the literature review (Section 2) and Section 5.2 the network has been called a Graph Neural Network (GNN), while in Section 5.3 it has been called a GCN; there is a serious difference between the two.

9. Algorithm 2 needs more details about:

Construct the graph H = ⟨V, E, P⟩ of the current topology (give details on how and what is contributed here; if off the shelf then no need to elaborate).

10: Observe the state O(t) from the graph (give details on how and what is contributed here; if off the shelf then no need to elaborate).

11: Select routing link a_{i,t'} from A(t) and perform routing link selection (give details on how and what is contributed here; if off the shelf then no need to elaborate).

12: Use the game model to adjust the topology generation b (give details on how and what is contributed here; if off the shelf then no need to elaborate).

In this algorithm you need to show what has been contributed and stay very focused; there is no need to mention everything. A small, well-defined contribution is better than going broad without contribution.

10. Simulation parameters need to be enhanced; what has been given is quite brief for the different methods (graph theory, GCN, game theory and routing optimization). If the work is at the network layer, it needs some parameters about this layer (related to packets); if at the link layer, it needs to show the relevant parameters (about frames and links). Going back to the literature review may help to find a few examples.

 

11. The number of initial nodes was 50 in Table 1, while in Figure 8, 100 nodes were examined.

12. In Figure 11 the survival time was 700 sec, which is about 10 minutes; for a mobile ad-hoc network this time does not look feasible.

13. In Figure 10, the maximum time for residual energy was 100 sec, which means about 1 minute, which is also not practical.

14. In Figures 12 and 13 the maximum delay was up to 1500 s, which is about 22 minutes, which is not practical either.

15. In Figure 14, what does 'topology' mean? 140 topologies: does it mean that while sending the same data the network can have 140 topologies? The figure needs to be self-explanatory.
Author Response

    \quad We appreciate the valuable comments and suggestions. We modified our manuscript accordingly, as follows.
    
    \textbf{3.1 }
    \textit{Abstract needs to be more informative, put the achieved results with bench-marking with the closed related works, }
    
    \textbf{Response:} Thanks for your comments and suggestions. We have added more information to the abstract, benchmarking the achieved results against closely related work. The new manuscript is as follows:
    \begin{quote}
        \textsf{Abstract}: "Ad Hoc networks have been widely used in emergency communication tasks. To address the dynamic characteristics of Ad Hoc networks and the problems of limited node energy and unbalanced energy consumption during deployment, we propose a strategy based on game theory and deep reinforcement learning (GDR) to improve the balance of network capabilities and enhance the autonomy of the network topology. The model uses game theory to generate an adaptive topology, adjusts each node's power according to the average lifetime of the nodes, helps the node with the shortest lifetime decrease its power, and prolongs the survival time of the entire network. When nodes move in and out of the network dynamically, reinforcement learning is used to automatically generate routing policies to improve the average end-to-end latency of the network. Theoretical analysis and experimental results prove that, under the condition of ensuring connectivity, GDR achieves better load balancing, a longer network lifetime, and lower network delay. It reduces the average end-to-end delay of the network and exhibits greater robustness to topology changes. \hl{The experimental results show that, in terms of average end-to-end delay, the GDR model performs 10.5$\% $ better than existing methods on average.}"
    \end{quote}
    
    \textbf{3.2 }
    \textit{there are serious problem in citation, i.e., first paragraph in the introduction no reference citation, also 4th paragraph citation comes at the end with 4 to 7, which can be distributed among the paragraph to be more specific. last three paragraph in introduction only one citation has been given [12], which is full and rich of information need to be identified and cited, }
    
    \textbf{Response:} Thanks for your advice. We have added citations as follows:
    \begin{quote}
        \textsf{Section 1(Introduction)}.
        \textsf{paragraph 3, line 3} : "each node is on the move and can stay connected to other nodes in any way dynamically\hl{[1]}"
    
    \textsf{Paragraph 4, lines 8 and 9} : " Compared with supervised learning and unsupervised learning, which can only classify network traffic, reinforcement learning can directly generate routes by training intelligent agents in unlabeled data sets through newly connected nodes\hl{[5,6]}. Many related studies apply reinforcement learning to routing engineering or traffic flow engineering \hl{[7,8]}."
    
    \textsf{Paragraph 5}: "\hl{The application effect of reinforcement learning in routing engineering is not yet significant, because the existing methods [9] are mainly based on neural network (NN) architectures (e.g., Convolutional Neural Networks [10], Recurrent Neural Networks [11]), which are not appropriate for modeling information about graph structures. This is because changes in the topology of dynamic networks imply that the inputs and outputs of CNNs or RNNs are not fixed. Moreover, for the input or output of dynamic Ad Hoc network information, a CNN or RNN cannot generalize well. Even if the information is represented, it is very inconvenient to store, because a change of topology means modifying the whole representation. Graph neural networks (GNN) use graphs to represent network topology information, which clearly characterizes the Ad Hoc network structure and stores it more efficiently. In [12], a new multipath routing model based on GNN is proposed to explore the complexity between links, paths and MPCP connections on various topologies. Using the GNN model, it is possible to predict the expected throughput for a given network topology and multi-path routing, which can further serve as a guide to optimize multi-path routing. In [13], a packet routing strategy based on deep reinforcement learning (PRMD) was proposed to reduce data packet transmission time by learning the data information of the network to forward the packets. These reinforcement learning strategies do not focus on the consumption of network energy, which can lead to changes in the overall network topology and therefore cannot keep the network stable for a long time.}"
    \end{quote}
    
    
    
    
    
    \textbf{3.3 }
    \textit{Related works, also not presented in a good way, spilt this section to sub titles, graph, deep learning, and game theory will not give at the end the gap or the problem need to be resolved, the literature need to be discussed as one unit. in this format it gives impression that the manuscript has three different contribtion or addressing three different problems, a few related references need to be added to the manuscript. 21 references is too a few}
    
    \textbf{Response:} Thanks for your advice. In Related Work, one paragraph was added and the following three paragraphs were modified. The new manuscript reads as follows:
    \begin{quote}
       \textsf{paragraph 1}: "\hl{Early approaches to enhancing the survival time of Ad Hoc networks were mainly based on graph-theoretic topology control. Such approaches construct the minimum spanning tree of the network topology from Euclidean distances and lack adaptive capability. Currently, more promising approaches mainly use Game Theory and Deep Reinforcement Learning. This paper uses a GNN to describe the topology of the network, then uses Game Theory to adjust the topology based on the current state of the network, and finally uses DRL methods to determine a routing policy.}"
    
    \textsf{paragraph 2 line 1}:" \hl{\textbf{Graph Neural Networks.} Graph neural network is an emerging network that operates on graph structural information.}"
    
    \textsf{paragraph 3 line1}:" \hl{\textbf{Game theory based approach to network topology adjustment.} }"
    
    \textsf{paragraph 4 line1:} "\hl{\textbf{Deep reinforcement learning based routing Policy.} }"
    \end{quote}

    
    \textbf{3.4 }
    \textit{Dynamic Graph Construction Based on Ad Hoc network, this section has too many basic information for those in the field better to take them out and give brief distribution about what need to be done,}
    
    \textbf{Response:} We have simplified the content of the article and the new manuscript reads as follows:
    
    \begin{quote}
        \textsf{Section 3.1 paragraph}: "\hl{To build the graph of an Ad Hoc network with $N$ transceiver pairs, we regard the $i$th pair of transceivers as the $i$th node of the graph. Each node has a feature vector, including environmental information and direct channel state information $h_{ii}^1$, such as the remaining energy $w_i$ of the $i$th node. The feature vectors of the two directed edges between nodes $v_i$ and $v_j$ can include $h_{ij}$ and $h_{ji}$, respectively.} "
    \end{quote}
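
    As an illustration of this construction, the sketch below (Python with networkx; the node and edge attribute names are assumptions, not taken from the paper's code) builds such a graph from per-node and per-link features:

    \begin{verbatim}
# Hedged sketch of the graph construction described above: each of the N
# transceiver pairs becomes a node carrying a feature vector (e.g. residual
# energy w_i and direct channel state h_ii), and the two directed edges between
# nodes carry the cross-channel features h_ij / h_ji.
import networkx as nx

def build_adhoc_graph(nodes, links):
    """nodes: {i: {"w": residual_energy, "h_ii": channel_state}}
       links: {(i, j): {"h_ij": ..., "h_ji": ...}}"""
    G = nx.DiGraph()
    for i, feat in nodes.items():
        G.add_node(i, **feat)                 # node feature vector
    for (i, j), feat in links.items():
        G.add_edge(i, j, h=feat["h_ij"])      # directed edge i -> j
        G.add_edge(j, i, h=feat["h_ji"])      # directed edge j -> i
    return G

G = build_adhoc_graph({0: {"w": 50.0, "h_ii": 0.9}, 1: {"w": 47.5, "h_ii": 0.8}},
                      {(0, 1): {"h_ij": 0.3, "h_ji": 0.4}})
    \end{verbatim}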
    
    \textbf{3.5 }
    \textit{sections 3, 4 and 5 until 5.3 all are well known stuff and no contribution has been given. the stuff makes the manuscript looks like a book chapter rather than scientific paper. these needs to be vitally reduced and focused on the contribution. section 5.3 should be rephrased to be proposed GDR method}
    
    \textbf{Response:} Thanks for your advice. We have reorganized the structure of the article to make it more explicit. The new manuscript is as follows:
    \begin{quote}
    \textsf{Section3 GDR framework}: "\hl{In this section, we will describe GDR framework in detail. In Section 3.1, we introduce how to represent Ad Hoc networks with a dynamic graph. Section 3.2 provides the topology control game algorithm and a detailed description of the optimization problem. Section 3.3 develops the routing algorithm based on the DRL framework with GNN.}"
    \end{quote}
    
    \textbf{3.6 }
    \textit{section 5.3, looks to simple where the three methods has been connected in chain when even simple feedback has not been given, if we delete the network graphs below the boxes it would be very simple flow chart, where DGC --> Agent <--TCGA and then the agent --> GCN. which is very simple and need more elaboration. these boxes need to be split to many boxes with details for the proposed algorithm to be acceptable for publication.}
    
    \textbf{Response:} Thanks for your advice. We have revised the structure of the article by reintroducing the original section 5.3 in section 3.3. And we introduced deep reinforcement learning in section 3.3.1, graph neural networks in section 3.3.2 and DRL framework in section 3.3.3. The details of each box in the flowchart are described in detail in each subsection.
    
    \textbf{3.7 }
    \textit{Agent --> GCN , just like? that no output needed from GCN? then why we gone use it, if no output is needed. } 
    
    \textbf{Response:} Thank you for pointing out the issue. Based on your comments, we have modified the image. In this image, the GNN is the network used to extract the key features in the learning step, and it is trained by reinforcement learning. The image depicts the selection of the appropriate action strategy for routing optimization after the GNN extracts the features. The new flowchart is as follows:
    \begin{quote}
        \begin{figure}[ht]
        \centering
        \includegraphics[width=5.3in]{figs/The deep reinforcement learning 1.png}
        \label{The deep reinforcement learning framework}
        \end{figure}
    \end{quote}
    
    \textbf{3.8 }
    \textit{GCN in the literature (section 2) and section 5.2 has been called Graph Neural Networks GNN, while in section 5.3 has been called GCN where there are a serious different between the two.}
    
    \textbf{Response:} Thank you very much. We have corrected it in the new manuscript.
    
    \textbf{3.9 }
    \textit{Algorithm 2, need more details about Construct the graph $H = \left< V,E,P \right> $ of current topology (give details on how and what is been contributed here, if off the shelf then no need to elaborate)}
    
    \textbf{Response:} Thank you very much. We have corrected it in the new manuscript. The new manuscript is as follows:
    \begin{quote}
        \textsf{Section 3.3.3 paragraph 5}: "In Algorithm 2, we can see the pseudo-code describing the operation of the DRL agent. We use the data from each small batch to do the optimization. At the beginning, nodes are initialized randomly with an arbitrary position $\vartheta $ and utility information $\xi $. After that, some nodes are dropped randomly as $Z$. At the same time, we initialize the environment by randomly setting the parameters of the GNN model and the parameters of the utility function as $\theta $, including $\alpha$, $\beta $, $\eta$ and $\mu$, where $\alpha+\beta+\eta+\mu = 1$. We then operate on each node in $Z$ until all nodes have been traversed. We initialize the environment $\theta $ for each node. \hl{Then we construct the graph $H = \left< V,E,P \right> $ of the current topology, as introduced in Section 3.1 and Section 3.2.1.} We observe the state $\mathcal{O} \left( t \right)$ from the graph for each case, which is given in detail in Equation (8). Next, we select a routing link $a_{i,t^{'}}$ from $\mathcal{A} \left( t \right) $, which is given in detail in Equation (9), and perform routing link selection. After that, we use the game model to adjust the topology generation to obtain a more optimal structure, as shown in Section 3.2. In particular, the performance after each action is judged by the reward $r\left( o_{i,t^{'}} \right)$. "
    \end{quote}
    
    \begin{quote}
        \textsf{Section 3.1 paragraph 3}: "In an Ad Hoc network, each transceiver is frequently distinct from the others, and the information carried by the graph is diverse. The above representation does not describe the network well enough, so we define the graph as $H = \left< V,E,P \right> $, where $V$ represents the set of all transceiver nodes in the network, i.e., N nodes randomly deployed in a region; the link set $E$ represents the set of communication links between pairs of nodes in the node set $V$. Each node can communicate with its neighboring reachable nodes. $V_i$ stores the coordinates $d_i$ of node $i$. For convenience, we treat the graph representation as undirected, i.e., $E_{ij} = E_{ji}$. $E_i$ contains the propagation delay of the link, which is related to the quality of the link. $P$ represents the set of states of a node, including network connectivity, transmitting power, node degree and link quality. In this paper, these specific feature values are integrated into the environment information, which is introduced in Sections 3.2 and 3.3."
    \end{quote}
    
    \begin{quote}
        \textsf{Section 3.2.1 paragraph 1}: "The strategic game is defined as $\theta =\left<  N,C,U \right>$, where $N=\left\{ 1,2,\cdots ,n \right\} $ denotes the set of game participants, whose size equals the number of nodes in the graph of the Ad Hoc network. The strategy space is denoted as $ C=\left\{ C_1,C_2,\cdots ,C_n \right\} $, where $C_i$ represents the set of strategies that participant $i$ can choose; if there are $k$ alternative strategies for $i$, then $ C_i=\left\{ c_{i}\left[ 1 \right],c_{i}\left[ 2 \right],\cdots ,c_{i}\left[ k \right] \right\} $. $U=\left\{ u_1,u_2,\cdots ,u_n \right\}$ is the set of payoff values obtained by the participants after the game, and $ u_{i}\left( c_{i},c_{-i} \right) $ denotes the payoff obtained by participant $i$ under the strategy combination $\left( c_{i},c_{-i} \right) $, with $c_i$ denoting the strategy chosen by participant $i$ and $c_{-i}$ denoting the strategies chosen by the remaining participants."
    \end{quote}
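
    To make the game definition concrete, the following sketch (Python; illustrative names and a toy utility, not the paper's implementation) shows one way to represent $\left< N,C,U \right>$ and compute a participant's best response given the others' strategies:

    \begin{verbatim}
# Illustrative data-structure sketch (not the authors' implementation) of the
# strategic game <N, C, U> defined above: participants are node indices, C[i]
# is node i's finite strategy set (e.g. candidate transmit-power levels), and
# the payoff u_i(c_i, c_-i) is evaluated by a utility function.
def best_response(i, strategies, C, utility):
    """Return the strategy in C[i] maximising node i's payoff, holding the
    other nodes' strategies (c_-i) fixed."""
    best, best_u = None, float("-inf")
    for c_i in C[i]:
        trial = dict(strategies, **{i: c_i})   # the combination (c_i, c_-i)
        u = utility(i, trial)
        if u > best_u:
            best, best_u = c_i, u
    return best

# Toy utility: prefer the lowest power that still reaches the nearest neighbour.
C = {0: [1, 2, 3], 1: [1, 2, 3]}
utility = lambda i, s: -s[i] if s[i] >= 2 else float("-inf")
print(best_response(0, {0: 3, 1: 2}, C, utility))   # -> 2
    \end{verbatim}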
    
    \textbf{3.10 }
    \textit{Observe the state O(t) from the graph (give details on how and what is been contributed here, if off the shelf then no need to elaborate) }
    
    \textbf{Response:} Thank you very much. We have corrected it in the new manuscript. The new manuscript is as follows:
    \begin{quote}
        \textsf{Section 3.3.3 paragraph 5}: "In Algorithm 2, we can see the pseudo-code describing the operation of the DRL agent. We use the data from each small batch to do the optimization. At the beginning, nodes are initialized randomly with an arbitrary position $\vartheta $ and utility information $\xi $. After that, some nodes are dropped randomly as $Z$. At the same time, we initialize the environment by randomly setting the parameters of the GNN model and the parameters of the utility function as $\theta $, including $\alpha$, $\beta $, $\eta$ and $\mu$, where $\alpha+\beta+\eta+\mu = 1$. We then operate on each node in $Z$ until all nodes have been traversed. We initialize the environment $\theta $ for each node. Then we construct the graph $H = \left< V,E,P \right> $ of the current topology, as introduced in Section 3.1 and Section 3.2.1. \hl{We observe the state $\mathcal{O} \left( t \right)$ from the graph for each case, which is given in detail in Equation (8).} Next, we select a routing link $a_{i,t^{'}}$ from $\mathcal{A} \left( t \right) $, which is given in detail in Equation (9), and perform routing link selection. After that, we use the game model to adjust the topology generation to obtain a more optimal structure, as shown in Section 3.2. In particular, the performance after each action is judged by the reward $r\left( o_{i,t^{'}} \right)$. "
    \end{quote}
    
    \begin{quote}
        \textsf{Section 3.3.3 paragraph 2}: "In our work, the state mainly consists of the utility function $U\left( t \right)$ of Section 3.2 and the distance distribution matrix $D\left( t \right)$. $D\left( t \right)$ is derived from the location information carried by each node itself interacting with other nodes. Consequently, the GNN-Based DRL agent system state is defined as: 

\begin{equation}\label{ot}
    \mathcal{O} \left( t \right) =\left\{ D\left( t \right) ,U\left( t \right) \right\} 
\end{equation}"
    \end{quote}
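
    A minimal sketch of assembling this state (Python; assumed input layout, not the authors' code) is:

    \begin{verbatim}
# Sketch (with assumed inputs) of how the agent state O(t) = {D(t), U(t)}
# described above could be assembled: D(t) is the pairwise distance matrix
# derived from the positions each node exchanges, and U(t) collects the
# per-node utility values from Section 3.2.
import math

def observe(positions, utilities):
    """positions: {i: (x, y)}, utilities: {i: u_i(t)}  ->  (D, U)"""
    ids = sorted(positions)
    D = [[math.dist(positions[a], positions[b]) for b in ids] for a in ids]
    U = [utilities[i] for i in ids]
    return D, U

D, U = observe({0: (0.0, 0.0), 1: (3.0, 4.0)}, {0: 0.7, 1: 0.4})
print(D[0][1], U)   # 5.0  [0.7, 0.4]
    \end{verbatim}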
    
    \textbf{3.11 }
    \textit{Select routing link a i,t 0 form A(t) and perform routing link selection (give details on how and what is been contributed here, if off the shelf then no need to elaborate) }
    
    \textbf{Response:} Thank you very much. We have corrected it in the new manuscript. The new manuscript is as follows:
    \begin{quote}
        \textsf{Section 3.3.3 paragraph 5}: "In Algorithm 2, we can see the pseudo-code describing the operation of the DRL agent. We use the data from each small batch to do the optimization. At the beginning, nodes are initialized randomly with an arbitrary position $\vartheta $ and utility information $\xi $. After that, some nodes are dropped randomly as $Z$. At the same time, we initialize the environment by randomly setting the parameters of the GNN model and the parameters of the utility function as $\theta $, including $\alpha$, $\beta $, $\eta$ and $\mu$, where $\alpha+\beta+\eta+\mu = 1$. We then operate on each node in $Z$ until all nodes have been traversed. We initialize the environment $\theta $ for each node. Then we construct the graph $H = \left< V,E,P \right> $ of the current topology, as introduced in Section 3.1 and Section 3.2.1. We observe the state $\mathcal{O} \left( t \right)$ from the graph for each case, which is given in detail in Equation (8). \hl{Next, we select a routing link $a_{i,t^{'}}$ from $\mathcal{A} \left( t \right) $, which is given in detail in Equation (9), and perform routing link selection.} After that, we use the game model to adjust the topology generation to obtain a more optimal structure, as shown in Section 3.2. In particular, the performance after each action is judged by the reward $r\left( o_{i,t^{'}} \right)$. "
    \end{quote}
    
    \begin{quote}
        \textsf{Section 3.3.3 paragraph 3}: "At every single step, we define the actions as the allocation of the newly accessed nodes, which can be expressed as:
\begin{equation}\label{at}
    \mathcal{A} \left( t \right) =\left\{a_1\left( t \right) ,a_2\left( t \right) ,\cdots  \right\} 
\end{equation}
where $a_i(t)$ denotes a choice of action. In our approach, the action values are a matrix of information about the newly accessed nodes, including $\left< V, P \right> $ described in Section 3.1. The edge information $E$ is calculated from the information in $P$ and $V$. The utility function value of the route is calculated from the carried utility information."
    \end{quote}
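
    For illustration, the sketch below (Python; the scoring of candidate links by the policy network is assumed, not taken from the paper) shows how an action could be sampled from $\mathcal{A} \left( t \right)$:

    \begin{verbatim}
# Minimal sketch (assumed scoring function) of acting on A(t) as defined above:
# each candidate routing link gets a score from the policy network, and the
# agent samples the link to use from the resulting softmax distribution.
import math, random

def select_action(action_scores):
    """action_scores: {link: policy score}  ->  sampled link"""
    links = list(action_scores)
    exp = [math.exp(action_scores[l]) for l in links]
    total = sum(exp)
    probs = [e / total for e in exp]
    return random.choices(links, weights=probs, k=1)[0]

print(select_action({("A", "B"): 1.2, ("A", "C"): 0.3, ("A", "D"): -0.5}))
    \end{verbatim}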
    
    \textbf{3.12 }
    \textit{Use the game model to adjust the topology generation b ((give details on how and what is been contributed here, if off the shelf then no need to elaborate)). In this algorithm need to show what has been contributed and be very focus no need to mention everything, small contribution is enough rather than go abroad without contribution,}
    
    \textbf{Response:} Thank you very much. We have corrected it in the new manuscript. The new manuscript is as follows:
    \begin{quote}
        \textsf{Section 3.3.3 paragraph 5}: "In Algorithm 2, we can see the pseudo-code describing the operation of the DRL agent. We use the data from each small batch to do the optimization. At the beginning, nodes are initialized randomly with an arbitrary position $\vartheta $ and utility information $\xi $. After that, some nodes are dropped randomly as $Z$. At the same time, we initialize the environment by randomly setting the parameters of the GNN model and the parameters of the utility function as $\theta $, including $\alpha$, $\beta $, $\eta$ and $\mu$, where $\alpha+\beta+\eta+\mu = 1$. We then operate on each node in $Z$ until all nodes have been traversed. We initialize the environment $\theta $ for each node. Then we construct the graph $H = \left< V,E,P \right> $ of the current topology, as introduced in Section 3.1 and Section 3.2.1. We observe the state $\mathcal{O} \left( t \right)$ from the graph for each case, which is given in detail in Equation (8). Next, we select a routing link $a_{i,t^{'}}$ from $\mathcal{A} \left( t \right) $, which is given in detail in Equation (9), and perform routing link selection. \hl{After that, we use the game model to adjust the topology generation to obtain a more optimal structure, as shown in Section 3.2.} In particular, the performance after each action is judged by the reward $r\left( o_{i,t^{'}} \right)$. "
    \end{quote}
    
    \textbf{3.13 }
    \textit{Simulation parameters.need to be enhanced, what have been given is quite brief for three different methods graph theory, GCN, Game theory and routing optimization. if the work at the network layer need to have some parameters about this layer, (related to packets), if at the link layer level need to show the parameters (about frame and link), maybe going back to the literature review may help to see a few examples}
    
    \textbf{Response:} Thanks for your comments and suggestions. We have corrected it in the new manuscript. In the simulation parameters section, we summarize the hyperparameters of the simulation experiments. The parameters of the graph construction, GNN, game theory and routing optimization components in the DRL framework are randomly initialized and trained through iterations, as can be seen in Algorithm 2:
    \begin{quote}
        \textsf{Section 3.3.3 paragraph 6}: "In Algorithm 2, we can see the pseudo-code describing the operation of the DRL agent. We use the data from each small batch to do the optimization. At the beginning, nodes are initialized randomly with an arbitrary position $\vartheta $ and utility information $\xi $. After that, some nodes are dropped randomly as $Z$. At the same time, we initialize the environment by randomly setting the parameters of the GNN model and the parameters of the utility function as $\theta $, including $\alpha$, $\beta $, $\eta$ and $\mu$, where $\alpha+\beta+\eta+\mu = 1$. We then operate on each node in $Z$ until all nodes have been traversed. \hl{We initialize the environment $\theta $ for each node. Then we construct the graph $H = \left< V,E,P \right> $ of the current topology, as introduced in Section 3.1 and Section 3.2.1. We observe the state $\mathcal{O} \left( t \right)$ from the graph for each case, which is given in detail in Equation (8). Next, we select a routing link $a_{i,t^{'}}$ from $\mathcal{A} \left( t \right) $, which is given in detail in Equation (9), and perform routing link selection. After that, we use the game model to adjust the topology generation to obtain a more optimal structure, as shown in Section 3.2.} In particular, the performance after each action is judged by the reward $r\left( o_{i,t^{'}} \right)$. "
    \end{quote}
    
    \textbf{3.14 }
    \textit{Number of initial nodes was 50 in the table 1, while in Figure 8 100 nodes were examined.}
    
    \textbf{Response:} Thank you very much. We have corrected it in the new manuscript.
    
    \textbf{3.15 }
    \textit{In Figure 11 the survival time was 700 sec which about 10 minutes, for mobile ad-hoc network this time looks not visible. }
     
    \textbf{Response:} Thank you very much. We have conducted simulation experiments using a simulation platform with an initial number of 50 nodes and an initial energy of 50 J for each node. As introduced in paragraph 2 of Section 4.2, we consider the time of the first node death as the survival time of the network. In this case, 10 minutes is reasonable for our simulation experiments.
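
    For clarity, the survival-time definition used here can be written as a one-line computation (Python; assumed data layout):

    \begin{verbatim}
# Tiny sketch of the survival-time definition used in Section 4.2 (assumed data
# layout): the network's survival time is the instant the first node dies.
def survival_time(death_times):
    """death_times: {node_id: time in seconds at which the node's energy hits 0}"""
    return min(death_times.values())

print(survival_time({"A": 812.0, "B": 700.0, "C": 950.0}))   # 700.0
    \end{verbatim}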
    
    \textbf{3.16 }
    \textit{In Figure 10, the maximum time for residual energy was 100 sec means about 1 minutes which is also not practical.}
    
    \textbf{Response:} Thank you very much. 100 sec indicates the duration of the simulation experiment. As the simulation proceeds, the change in the standard deviation of the residual energy of the nodes reflects the load balance between the nodes.
    
    \textbf{3.17 }
    \textit{In Figure 12 and 13 the maximum delay was up to 1500 s which is bout 22 minutes which is not practical as well.}
    
    \textbf{Response:} Thank you very much. In our simulation experiments, the specific value shown on the Y-axis is the average end-to-end delay score calculated from the environmental information available at the time of each evaluation; it is not a specific number of seconds. It can be seen as follows:
    \begin{quote}
        \textsf{Section 4.2 paragraph 9}: "\textbf{Network delay.} Network delay is measured by the average end-to-end delay, which is reflected as a specific score in the algorithm of this paper. Figures 12-13 show the results of two experiments based on different numbers of nodes, 15 and 20, respectively. \hl{In each box plot, the Y-axis shows the specific score, which is the average end-to-end delay calculated from the environmental information available at the time of each evaluation.}"
    \end{quote}
    
    \textbf{3.18 }
    \textit{In Figure 14 what is topology means, 140 topology. is it means during send same data the network can have 140  topology, need the figure to be self explanatory }
    
    \textbf{Response:} Thanks for your question. We conducted experiments using 140 different real network topologies to test the generalization of the model. The figure shows that we obtain better results on several different topologies, on average 10.5\% higher than AutoGNN. It can be seen as follows:
    \begin{quote}
        \textsf{Section 4.2 paragraph 9}: "\textbf{Generalization ability.} We evaluated the ability of our GDR model to generalize to real-world network topologies obtained from a small self-study room. From the data collected in that study room, we selected topologies with more than 10 and fewer than 40 nodes. In particular, we do not consider ring and star topologies, because in these topologies the number of effective candidate paths for distribution requirements is usually very limited (in many cases a node is connected to only 1-2 nodes). \hl{By filtering, we obtained 140 real network topologies with which to perform the tests.}"
    \end{quote}
    
    \begin{quote}
        \textsf{Section 4.2 paragraph 10}: "To evaluate the generalization ability of our model, we select the best model during training for comparison experiments. On each topology, we perform 500 evaluation experiments, store the rewards achieved by the GDR and AutoGNN routing strategies, and calculate the average value. Figure 14 shows the performance of different models on different topologies (X-axis). \hl{The X-axis represents different topologies, ordered according to the GDR model scores.} The Y-axis indicates the relative performance of our model with respect to the AutoGNN model. \hl{It is shown that in 80\% of the cases our model works better than AutoGNN, and in terms of average end-to-end delay the GDR model performs 10.5$\% $ better than existing methods on average.}"
    \end{quote}
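
    For reference, the comparison reported in Figure 14 can be summarized as in the sketch below (Python; assumed data layout, not the evaluation script used in the paper):

    \begin{verbatim}
# Hedged sketch (assumed data layout) of the comparison in Figure 14: for each
# of the 140 topologies, average the rewards of 500 evaluation runs per model,
# then report GDR's relative performance over AutoGNN and the fraction of
# topologies where GDR wins.
def compare(gdr_rewards, autognn_rewards):
    """Each argument: {topology_id: [reward per evaluation run]}"""
    ratios, wins = [], 0
    for topo in gdr_rewards:
        gdr = sum(gdr_rewards[topo]) / len(gdr_rewards[topo])
        ref = sum(autognn_rewards[topo]) / len(autognn_rewards[topo])
        ratios.append(gdr / ref)
        wins += gdr > ref
    avg_gain = sum(ratios) / len(ratios) - 1.0       # e.g. ~0.105 -> 10.5%
    win_rate = wins / len(ratios)                    # e.g. ~0.80  -> 80%
    return avg_gain, win_rate
    \end{verbatim}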

Author Response File: Author Response.pdf

Reviewer 4 Report

I do not have any comments for the paper; it has novelty and contribution. Moreover, it is well written and the structure is clear.

Author Response

\textcolor{black}{
    \textit{I do not have any comments for the paper, it has novelty and contribution. Moreover, is well written and the structure is clear. }
    }

    \textbf{Response:} Thank you for acknowledging our work. We have revised the full text based on the reviewers' comments, and all the changes in response to the comments are highlighted in yellow in the revised version.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

All the comments are addressed.

Reviewer 2 Report

My concerns have been addressed

Reviewer 3 Report

The paper is good and can be published,
