*4.2. Learning Environment*

The experimental results compare the proposed algorithm with the conventional A\* algorithm. The proposed algorithm learns against the search time of the A\* algorithm and treats an episode as a learning success when the robot reaches the target position more quickly than the A\* search. Figure 4 shows the simulation configuration diagram of the proposed algorithm. The environmental-information image fed to the CNN is 2560 × 2000 pixels, and the network is designed with 3 × 3 filters and a total of 19 layers (16 convolutional and 3 fully connected layers), following the VGGNet architecture [42].
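The success criterion described above, where an episode counts as a learning success only when the agent reaches the target faster than the A\* baseline, can be sketched as follows (the function and variable names are illustrative, not from the paper):

```python
# Hypothetical sketch of the success criterion: an episode counts as
# a learning success only when the learned policy reaches the target
# faster than the A* baseline search on the same map.

def episode_success(agent_time: float, astar_time: float) -> bool:
    """Return True when the agent beats the A* search time."""
    return agent_time < astar_time

# Example: A* needed 1.8 s; the learned policy needed 1.5 s.
assert episode_success(1.5, 1.8) is True
assert episode_success(2.1, 1.8) is False
```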

**Figure 4.** Simulation configuration diagram of proposed algorithm.
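The 19-layer VGG-style structure described in the text can be sketched as a layer configuration list; the channel counts and fully connected widths below are assumptions in the spirit of VGGNet, not values given in the paper:

```python
# Illustrative layer configuration for the 19-layer VGG-style CNN
# (16 convolutional + 3 fully connected layers, all convolutions with
# 3x3 filters). Channel counts are assumed VGGNet-like values, not
# numbers reported in the paper.

CONV_CHANNELS = [64, 64,              # block 1
                 128, 128,            # block 2
                 256, 256, 256, 256,  # block 3
                 512, 512, 512, 512,  # block 4
                 512, 512, 512, 512]  # block 5
FC_UNITS = [4096, 4096, 1000]         # 3 fully connected layers

layers = [("conv3x3", c) for c in CONV_CHANNELS] + \
         [("fc", u) for u in FC_UNITS]

assert len(CONV_CHANNELS) == 16 and len(FC_UNITS) == 3
assert len(layers) == 19  # matches the 19-layer design in the text
```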

Figure 5 shows the Q-parameter score during learning by episode. In the graph, the red line depicts the score of the proposed algorithm, and the blue line depicts the score when the Q parameters are learned separately for each mobile robot. The results confirm that the proposed algorithm's score rises slightly more slowly at the beginning of learning but still reaches the final goal score.
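The per-episode Q-parameter update underlying these scores can be sketched with a standard tabular Q-learning rule; the learning rate, discount factor, and reward values below are generic assumptions, not the paper's settings:

```python
# Minimal tabular Q-learning update (a generic sketch; alpha, gamma,
# and the reward are assumed values, not the paper's settings).
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)  # shared Q parameters, keyed by (state, action)

def update(state, action, reward, next_state, actions):
    """One Q-learning step toward the bootstrapped target."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])

# One illustrative step on a toy grid: moving right toward the goal.
update(state=(0, 0), action="right", reward=1.0,
       next_state=(0, 1), actions=["up", "down", "left", "right"])
```

With an empty table, this single step raises `Q[((0, 0), "right")]` from 0 to `ALPHA * 1.0 = 0.1`.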

**Figure 5.** Scores of the algorithms by episode.

Figure 6 shows the rate at which each algorithm's episodes reach the target. For a similar target position, the proposed algorithm progresses through learning more slowly than the individually learned Q parameters of each model.

**Figure 6.** The rate at which episode goals are reached.
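The goal-reaching rate plotted in Figure 6 can be computed as a running fraction of successful episodes; this is a sketch, and the sliding-window size is an assumed choice rather than a detail from the paper:

```python
# Sketch of the episode goal-reaching rate: the fraction of episodes
# within a sliding window that reached the target (the window size is
# an assumption, not a value from the paper).

def goal_rate(successes, window=100):
    """Rolling success rate over the last `window` episodes."""
    rates = []
    for i in range(len(successes)):
        recent = successes[max(0, i - window + 1): i + 1]
        rates.append(sum(recent) / len(recent))
    return rates

# Toy example: 4 episodes, goal reached in episodes 2 and 4.
assert goal_rate([0, 1, 0, 1], window=2) == [0.0, 0.5, 0.5, 0.5]
```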

Figure 7 shows that the average score of each generation increases gradually, yielding progressively better results as learning continues.

**Figure 7.** The rate of reaching the goal by epochs.
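The per-generation average in Figure 7 amounts to grouping episode scores into fixed-size epochs and averaging each group; the epoch size and score values below are illustrative, not the paper's data:

```python
# Sketch of per-epoch average scores: episode scores are grouped into
# fixed-size epochs and averaged (epoch size and scores are
# illustrative values, not the paper's data).

def epoch_averages(scores, epoch_size):
    """Mean score of each consecutive group of `epoch_size` episodes."""
    return [sum(scores[i:i + epoch_size]) / epoch_size
            for i in range(0, len(scores), epoch_size)]

# Toy run of 6 episode scores grouped into epochs of 2.
assert epoch_averages([1, 3, 4, 6, 7, 9], epoch_size=2) == [2.0, 5.0, 8.0]
```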
