#### *4.1. Simulation Platform*

The simulation experiments in this paper run in an Ubuntu 20.04 environment, using Python as the development language; the parameters of the simulation platform are shown in Table 2.

**Table 2.** Simulation platform


On the simulation platform used here, the learning process of the proposed algorithm usually takes one to two days. This duration is limited by the simulation platform, and a more powerful platform can significantly reduce the learning time. To verify the proposed algorithm, the USV working environment is 20 × 20 m², divided into an 11 × 11 grid map.
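As an illustrative sketch (not the authors' code), the discretization of the 20 × 20 m workspace into the 11 × 11 grid map can be expressed as a simple mapping from a continuous position to a grid cell:

```python
AREA_SIZE = 20.0              # workspace edge length in metres (from the paper)
GRID_N = 11                   # grid cells per side (from the paper)
CELL = AREA_SIZE / GRID_N     # cell edge length, about 1.82 m


def to_cell(x: float, y: float) -> tuple[int, int]:
    """Map a continuous USV position (in metres) to (row, col) grid indices."""
    col = min(int(x / CELL), GRID_N - 1)  # clamp so the far edge stays in-grid
    row = min(int(y / CELL), GRID_N - 1)
    return row, col
```

Each grid cell thus represents roughly a 1.82 × 1.82 m patch of water, which sets the spatial resolution of the coverage task.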

#### *4.2. Algorithm Simulation*

In the simulation, the environmental parameters and the DQN network model parameters are initialized first, and a preprocessed grid map is then constructed to record the map information and the positions covered by the USV. Before each training episode starts, the USV is placed at a random position on the grid map, ensuring that this position is not an obstacle. When the USV's task terminates during training, the current episode ends and the next one begins. A task terminates when the USV collides with the map boundary or an obstacle grid, when it completes the full-coverage task, or when its number of steps exceeds the set maximum. Each action generates a sample, which is stored in the playback memory pool; once the number of samples reaches a certain level, the neural network is trained on samples drawn from the pool. Training ends when the number of training rounds reaches the set maximum.
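The three termination conditions described above can be sketched as a single check; this is an illustrative reconstruction, and the step budget of 400 is an assumed value, not one stated in the paper:

```python
GRID_N = 11


def episode_done(pos, grid, covered, steps, max_steps=400):
    """Termination check for one training episode: boundary or obstacle
    collision, full coverage achieved, or step budget exceeded.
    `grid` uses 0 for free cells and 1 for obstacle grids;
    `covered` is the set of free cells visited so far.
    max_steps=400 is an assumed illustrative value."""
    r, c = pos
    if not (0 <= r < GRID_N and 0 <= c < GRID_N):
        return True                       # crossed the map boundary
    if grid[r][c] == 1:
        return True                       # entered an obstacle grid
    free_cells = sum(row.count(0) for row in grid)
    if len(covered) == free_cells:
        return True                       # full-coverage task completed
    return steps >= max_steps             # step budget exhausted
```

Any one of the conditions ends the episode, after which the USV is respawned at a new random free cell for the next episode.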

The simulation results are shown in Figures 12–15. Both the improved DQN proposed in this paper and the traditional boustrophedon method complete the full-coverage path planning task well, with a coverage rate of 100%. The standard DQN and the inner spiral coverage method fail to complete the task, achieving coverage rates of only 13% and 86%, respectively. As shown in Figures 13 and 15, the coverage repetition rate of the improved DQN is 0.04%, while that of the boustrophedon method is 0.13%; their final composite scores are 0.96 and 0.87, respectively. The improved DQN-based full-coverage path planning algorithm therefore outperforms the traditional boustrophedon method.
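The reported numbers are consistent with a composite score of coverage rate minus repetition rate, treating both as fractions; the sketch below is a hypothetical reconstruction of that relationship, not the paper's stated formula:

```python
def composite_score(coverage: float, repetition: float) -> float:
    """Hypothetical scoring rule (assumption, inferred from the reported
    numbers): coverage rate minus coverage repetition rate."""
    return round(coverage - repetition, 2)


# Reproduces the reported scores when the rates are read as fractions:
# improved DQN:  1.00 coverage, 0.04 repetition -> 0.96
# boustrophedon: 1.00 coverage, 0.13 repetition -> 0.87
```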

Figure 16 compares the coverage achieved by the boustrophedon-based and the improved DQN-based full-coverage path planning algorithms. The red curve shows the real-time coverage of the boustrophedon method, and the blue curve shows that of the improved DQN. The horizontal axis is the number of steps, and the vertical axis is the coverage rate. The real-time coverage curves show that the improved DQN-based full-coverage path planning algorithm is more efficient than the boustrophedon method.

The loss curve of the improved DQN is shown in Figure 17; its ordinate is the distance between the planned path and the optimal path, which can be obtained from Figures 4–7. The neural network converges after about 1000 training iterations. Note that these 1000 iterations do not mean the agent performed 1000 tasks; rather, 1000 minibatches of memories were extracted from the playback memory pool for training, so the overall efficiency is high.
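The distinction between training iterations and episodes can be made concrete with a small sketch of minibatch replay (an illustrative reconstruction; the pool size and batch size are assumed values, not from the paper):

```python
import random
from collections import deque

random.seed(0)
memory = deque(maxlen=2000)   # playback memory pool (capacity is assumed)
# Fill with dummy transitions: (state, action, reward, next_state, done).
for s in range(500):
    memory.append((s, 0, 0.0, s + 1, False))

BATCH = 32                    # assumed minibatch size
updates = 0
for _ in range(1000):         # 1000 training iterations, not 1000 episodes
    minibatch = random.sample(memory, BATCH)
    updates += 1              # one network update per sampled minibatch
print(updates)
```

One episode can contribute many transitions to the pool, and each transition can be replayed in many minibatches, which is why convergence in 1000 updates is reached with far fewer than 1000 episodes.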

**Figure 12.** Path plan of inner spiral coverage.

**Figure 13.** Path plan of boustrophedon.

**Figure 14.** Path plan of DQN.

**Figure 15.** Path plan of improved DQN.

**Figure 16.** Coverage map.

**Figure 17.** Training graph.

#### **5. Conclusions**

#### *5.1. Main Conclusions and Findings*

This paper proposes a grid map preprocessing method. Combined with the description and analysis of typical full-coverage path planning task scenarios, and according to the characteristics of full-coverage path planning, grids that cannot be covered by the USV are identified and set as obstacle grids.

To address the low learning efficiency of the traditional DQN, an improved action selection mechanism is proposed, which lowers the failure rate of full-coverage path planning in the initial training phase. In addition, the method of extracting memories for training is improved so that valuable samples are trained with a higher probability, which improves the efficiency of full coverage of the area.
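The excerpt does not give the exact form of the improved action selection mechanism; one plausible form, sketched below, masks actions that would cause an immediate boundary or obstacle collision, which directly lowers early-training failure rates:

```python
GRID_N = 11
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def valid_actions(pos, grid):
    """Return indices of actions that keep the USV on the map and off
    obstacle grids (value 1). This masking rule is an assumption used
    for illustration, not the authors' exact mechanism."""
    r, c = pos
    keep = []
    for i, (dr, dc) in enumerate(ACTIONS):
        nr, nc = r + dr, c + dc
        if 0 <= nr < GRID_N and 0 <= nc < GRID_N and grid[nr][nc] == 0:
            keep.append(i)
    return keep
```

During exploration, the agent would then choose (greedily or randomly) only among the returned indices, so early episodes are not wasted on immediate collisions.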

Simulation experiments show that the algorithm can effectively complete the full-coverage path planning task. The results show that, compared with traditional full-coverage path planning algorithms, the proposed algorithm achieves a good coverage rate; compared with the boustrophedon and DQN algorithms, it achieves a better coverage repetition rate, indicating better overall performance.

#### *5.2. Main Limitation of the Research*

In the modeling phase, the dynamic constraints and performance of the USV were not fully considered. The actual USV model is more complex than the one used in this study, so the proposed full-coverage path planning algorithm might not perform well in some situations.

#### *5.3. Future Research Prospects*

Considering economic factors, we usually want USVs to achieve full-coverage tasks efficiently at a low cost. To achieve this goal, the efficiency of path planning algorithms and their adaptability to different environments need to be improved.

Future work will focus on the USV model and the reward function, since the full-coverage path planning algorithm is affected by both when run on a simulation platform. How to extend the algorithm to complex real environments with disturbances, so as to obtain practical application value, is also worth studying.

**Author Contributions:** Methodology, B.X.; formal analysis, Q.W.; data curation, Z.L.; data analysis, L.Y. and X.W.; software, X.W.; writing—original draft preparation, L.Y.; writing—review and editing, X.W.; supervision, Q.W. and B.X.; project administration, B.X. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Shanghai Science and Technology Committee (STCSM) Local Universities Capacity-building Project (No. 22010502200), Scientific Research Project of China Three Gorges Corporation (No. 202003111).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data are available on request.

**Acknowledgments:** The authors would like to express their gratitude for the support of Fishery Engineering and Equipment Innovation Team of Shanghai High-level Local University.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
