Article

Coordinated Ship Welding with Optimal Lazy Robot Ratio and Energy Consumption via Reinforcement Learning

1 School of Automation and Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing 210096, China
2 Advanced Ocean Institute of Southeast University, Nantong 226000, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(10), 1765; https://doi.org/10.3390/jmse12101765
Submission received: 28 August 2024 / Revised: 25 September 2024 / Accepted: 3 October 2024 / Published: 5 October 2024
(This article belongs to the Special Issue Ship Wireless Sensor)

Abstract

Ship welding is a crucial part of ship building, requiring higher levels of robot coordination and working efficiency than ever before. To this end, this paper studies the coordinated ship-welding task, in which multiple robots weld multiple weld lines consisting of synchronous ones that must be executed by a pair of robots and normal ones that can be executed by one robot. To evaluate working efficiency, the objectives of optimal lazy robot ratio and energy consumption are considered, which are tackled by the proposed dynamic Kuhn–Munkres-based model-free policy gradient (DKM-MFPG) reinforcement learning algorithm. In DKM-MFPG, a dynamic Kuhn–Munkres (DKM) dispatcher is designed based on weld line and co-welding robot position information obtained by wireless sensors, such that robots always have dispatched weld lines in real time and the lazy robot ratio is 0. Simultaneously, a model-free policy gradient (MFPG) based on reinforcement learning is designed to achieve energy-optimal motion control for all robots. The optimal lazy robot ratio of the DKM dispatcher and the network convergence of MFPG are theoretically analyzed. Furthermore, the performance of DKM-MFPG is simulated with variant settings of welding scenarios and compared with baseline optimization methods. Compared to the four baselines, DKM-MFPG holds a slight energy-consumption advantage (within 1%) and reduces the average lazy robot ratio by 11.30%, 10.99%, 8.27%, and 10.39%, respectively.

1. Introduction

In recent decades, the development of the ship-building industry has been greatly stimulated by maritime transportation, which has become a core element of global economic development. The ship-building [1], ship repair, and offshore structure [2] industries rely heavily on ship-welding techniques to construct more ships. To enhance welding efficiency, there is a trend toward applying intelligent and innovative technologies that employ a group of robots to accomplish ship-welding tasks coordinately, known as the coordinated ship-welding problem.
Early welding tasks usually consider a single robot seeking the shortest welding time or distance path. In [3], the spot-welding task is converted into the traveling salesman problem and solved by a genetic algorithm [4] to realize minimum-time path planning. In [5], an improved lazy PRM algorithm based on online collision-free path planning is proposed for the arc welding robot. On the basis of planning the shortest path from one weld line to another, task allocation methods such as market-based allocation [6] help determine the welding sequence among numerous weld lines, which greatly decreases time consumption. In [7], the welding path planning problem is treated as a combinatorial optimization problem that arranges the sequence and directions of welding seams under time and energy optimization, which is tackled by an improved discrete MOEA/D based on a hybrid environment selection. Different from the single-robot welding case, coordinated welding, in which some welding tasks are executed by a group of robots, has received more attention due to its efficiency and robustness. In [8], the double global optimum genetic algorithm–particle swarm optimization (GA-PSO) is proposed to solve the welding path planning of two robots with the shortest collision-free path. In [9], gantry welding with minimal path length and energy consumption is solved by the combination of a genetic algorithm, an adaptive reference direction-guided many-objective evolutionary algorithm, and a lazy probabilistic roadmap (PRM) trajectory planning method. Note that the coordinated welding problem requires both path planning to the desired weld lines and multi-objective optimization, such as minimal time and lazy robot ratio. Besides GA and MA for single-objective optimization, multi-objective evolutionary algorithms (MOEAs) such as MOEA/D [10] and NSGA-II [11] can be effective. To further improve search performance and computation efficiency, NSGA-III is proposed in [12], which emphasizes non-dominated population members close to a set of supplied reference points. In [13], another MOEA method, AnD, is designed, which consists of an angle-based selection strategy and a shift-based density estimation strategy to delete poor individuals one by one in the environmental selection. To our knowledge, neither the case of utilizing more than two robots nor the coordinated ship-welding requirement of executing a weld line by a pair of robots together has been solved. Furthermore, determining the welding sequence while dispatching the group of robots under multiple objectives is challenging.
According to ship-building or ship-welding tasks [14], if two or more workers/robots fail to cooperate well, the result is a heavy workload for one and idleness for the other. Note that excessive use of and workload on ship-welding robots can lead to performance degradation and even functional failure. Therefore, preventing robots from being lazy and enhancing their working efficiency are important for balancing the workload among all robots. In [15], the efficiency of ship-welding robots is significantly enhanced by using the ship’s block panel assembly process. Concerning energy consumption and efficiency, a multi-objective optimization method is proposed in [16] through parameter optimization for arc welding. When coordinated ship-welding tasks are required, where a weld line needs to be welded by two robots simultaneously, robot idleness becomes a more difficult problem, as an insufficient number of active robots cannot even finish the task, let alone optimize working efficiency.
Recently, reinforcement learning (RL) [17] has shown the potential to achieve coordinated tasks of multiple agents under optimization objectives. In [18], the RL algorithm deep deterministic policy gradient (DDPG) and convolutional neural networks are designed to achieve the dynamic coordinated scheduling of automated guided vehicles. To quickly adapt the motion based on robot feedback, model predictive control and RL controllers [19], intelligent optimization algorithms with DDPG controllers [20], and deep learning-based recurrent neural network controllers [21] are designed. In [22], an RL-based dueling deep Q-network is used for the optimization of berth allocation with multiple ships. To deal with multiple optimization objectives, RL serves as an end-to-end framework for providing effective solutions. In [23], a deep RL-based multi-objective optimization algorithm is proposed to decompose the multiple objectives and collaboratively optimize them as parameters of neural networks. Considering the dynamic environment and multi-objective optimization, RL is designed in [24] to learn better evolutionary behaviors from dynamic environment information. Nevertheless, current RL methods seldom focus on the coordinated ship-welding problem, especially when synchronous welding requirements and multiple objectives coexist.
Motivated by the above discussion, this paper focuses on the coordinated ship-welding task consisting of synchronous weld lines executed by a pair of robots and normal weld lines executed by one robot, where the objectives for the robots are the optimal lazy robot ratio and energy consumption while accomplishing the task. In the literature, most coordinated welding problems focus on the case of two robots [3,5,7,8,9], since two-robot cooperation is the basic and simplest case in coordinated ship welding. As welding tasks become increasingly complex, the cooperation of more than two robots becomes a trend and hence brings new problems in how to maintain working efficiency at scale when lazy robots occur. To this end, the DKM dispatcher in DKM-MFPG is designed to dynamically dispatch all the unemployed robots. Then, an MFPG motion planner is designed to achieve energy-optimal motion for the robots. The contributions of this paper are listed as follows:
(1)
Different from the welding tasks [3,5,7,8,9] with one or two robots and normal weld lines, the coordinated ship-welding task in this paper requires multiple robots with synchronous weld lines that need to be executed by a pair of robots together. To this end, a novel DKM-MFPG algorithm is proposed to achieve the coordinated ship-welding task composed of normal weld lines and synchronous weld lines, which is a first attempt at an effective solution for future large-scale unmanned ship-welding scenarios. In this paper, the optimal lazy robot ratio under the DKM dispatcher and the convergence of the neural networks under MFPG are analyzed theoretically in detail, which is ignored in [18,22].
(2)
Distinct from the single objective of the shortest time/distance path in [3,5,8] and the objectives of the shortest time/distance path and energy consumption in [7,9], this paper intends to achieve the minimal lazy robot ratio and energy consumption, which are tackled by the DKM dispatcher and the MFPG motion planner, respectively. Comparison experiment results show that our DKM-MFPG algorithm outperforms single- and multi-objective optimization methods on the lazy robot ratio by around 10%, while maintaining close energy consumption performance.
The rest of the article is organized as follows. Section 2 demonstrates the formulation of the problem. Section 3 is the methodology of DKM-MFPG with theoretical analysis. Section 4 numerically simulates DKM-MFPG under variant scenarios and comparison baselines with result analysis. Finally, Section 5 is the conclusion.

2. Problem Formulation

Consider that the coordinated ship-welding task in this paper is constructed on a grid-world Cartesian coordinate system $\mathcal{W} = Oxy$ with origin $O$, horizontal direction $x$, vertical direction $y$, and grid length $c \in \mathbb{R}^{+}$. The coordinated ship-welding task is composed of $n$ robots labeled by $i = 1, \ldots, n$ and $m$ weld lines for ship building labeled by $j = 1, \ldots, m$. In this task, the $n$ robots are required to weld all $m$ weld lines, with $m_s$ synchronous weld lines (SWLs) labeled by $j = 1, \ldots, m_s$ and $m - m_s$ normal weld lines (NWLs) labeled by $j = m_s + 1, \ldots, m$. Each of the SWLs is required to be synchronously executed by a pair of robots, and each of the NWLs can be finished by one robot. The execution of welding starts from the welding starting point $p_j^* = [x_j^*, y_j^*]^T$ of the $j$th weld line.
In practice, the lengths $L_j \in \mathbb{R}^{+}$ of the weld lines are not all the same. Meanwhile, according to the literature [5,9], weld lines in a ship-welding task can be placed in different directions, such as vertical (i.e., welding from the top to the bottom of the line, parallel to the $y$-axis), horizontal (i.e., welding from the right to the left of the line, parallel to the $x$-axis), and oblique (i.e., at 45° to the $x$- or $y$-axis). The ideal execution path varies depending on the direction in which the weld is placed. The ideal velocity at which a weld line $j$ is executed is $c_j \in \mathbb{R}^{2}$, where $c_j = [-c, 0]^T$ for the horizontal, $c_j = [0, -c]^T$ for the vertical, and $c_j = [c, -c]^T$ or $c_j = [-c, -c]^T$ for the oblique direction. An illustrative example of the coordinated ship-welding task is shown in Figure 1.
Assumption 1.
Each weld line can only be executed once.
The kinematics of the $i$th ship-welding robot is given by
$p_i(k+1) = p_i(k) + u_i(k)$,  (1)
where $p_i(k) = [x_i(k), y_i(k)]^T \in \mathbb{R}^2$ denotes its position at time step $k$, and $u_i(k) = [u_{xi}(k), u_{yi}(k)]^T \in \mathbb{R}^2$ is the motion input, which can be selected as move up ($u_i = [0, c]^T$), down ($u_i = [0, -c]^T$), left ($u_i = [-c, 0]^T$), right ($u_i = [c, 0]^T$), still ($u_i = [0, 0]^T$), or oblique ($u_i = [c, c]^T$, $u_i = [c, -c]^T$, $u_i = [-c, c]^T$, $u_i = [-c, -c]^T$), where the oblique motion can only be chosen during the welding procedure of oblique weld lines.
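For concreteness, the grid motion of (1) with the discrete action set above can be sketched in a few lines of Python; the dictionary keys and the value of the grid length c (taken from the simulation setting in Section 4.1) are illustrative choices, not identifiers from the authors' implementation.

```python
import numpy as np

c = 0.05  # grid length used in the simulations (Section 4.1)

# Discrete motion inputs of Eq. (1): up, down, left, right, still, and the four oblique moves.
ACTIONS = {
    "up": np.array([0.0, c]),   "down": np.array([0.0, -c]),
    "left": np.array([-c, 0.0]), "right": np.array([c, 0.0]),
    "still": np.array([0.0, 0.0]),
    "ne": np.array([c, c]),  "se": np.array([c, -c]),
    "nw": np.array([-c, c]), "sw": np.array([-c, -c]),
}

def step(p, action):
    """Grid-world kinematics p_i(k+1) = p_i(k) + u_i(k)."""
    return p + ACTIONS[action]
```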
Remark 1.
In the practical deployment of the real-world ship building environment [9], the motion of welding robots for ship building is always restricted on the gantry welding system. As shown in Figure 2, welding robots can only move up, down, left, or right along with the gantry. To address this, this paper mainly focuses on constructing the coordinated ship-welding environment and the motion of robots on the grid world to simulate this process.
Remark 2.
To focus the study on the following dispatch of the welding sequence in the coordinated ship-welding task and the optimization objectives, the robots’ kinematics take the linear grid motion as shown in (1). Noting that robots with nonlinear dynamics can generally be efficiently controlled by existing mature techniques, such as feedback linearization [25], this paper does not put effort into the robot’s complicated dynamics.
Next, two core issues of the coordinated ship-welding task will be put forward, that is, welding sequence dispatch and coordinated ship-welding motion.
Welding sequence dispatch. Due to the existence of multiple robots with multiple weld lines, how to reasonably assign the welding sequence of each robot to the weld lines is crucial to improving the task accomplishment efficiency. The dispatch of robot $i$ can be expressed as $\chi_i : \mathbb{R} \to \{0, 1, \ldots, m\}$ with $\chi_i(0) = 0$. $\chi_i(k) = j$ means that the $i$th robot is dispatched to the $j$th weld line, while $\chi_i(k) = 0$ means that it is not dispatched to any weld line.
Coordinated ship-welding motion. On the basis of the weld lines dispatched to the robots, since the initial positions of the robots are not always exactly at the ideal welding starting points of the weld lines, the robots need to first move from their initial positions to the welding starting point of each weld line before executing the welding process. Before giving the definition of the coordinated ship-welding error, the completion of welding $\bar{L}_{i,j}(k) = [\bar{L}_{i,j}^x(k), \bar{L}_{i,j}^y(k)]^T \in \mathbb{R}^2$ associated with the $i$th robot and the $j$th dispatched weld line at time $k$ is first defined, that is,
$\bar{L}_{i,j}(k) = \begin{cases} \big(p_i(k) - p_j^*\big) s_i(k) - L_j \frac{c_j}{\| c_j \|}, & \text{for NWLs}; \\ \big(p_i(k) - p_j^*\big) s_i(k) - L_j \frac{c_j}{\| c_j \|} + \big(p_i(k) - p_{i'}(k)\big) s_i(k), & \text{for SWLs}; \end{cases}$  (2)
where $s_i(k)$ reflects whether the $i$th robot has reached the welding starting point at time $k$: if so, $s_i(k) = 1$; otherwise, $s_i(k) = 0$. $L_j \frac{c_j}{\| c_j \|}$ is the weld path relative to the starting point that needs to be tracked to complete welding of the $j$th weld line. For NWLs, this value directly portrays the completion of welding, whereas for synchronous weld lines, the completion of welding is also determined by the real-time relative position between the $i$th robot and the other, $i'$th, robot that is co-welding with it. This means that the two robots are required to maintain synchronized welding at all times to complete the SWLs.
Based on the welding completion, the coordinated ship-welding error $e_i(k) \in \mathbb{R}^4$ of the $i$th robot with the dispatched weld line $j = \chi_i(k)$ at time $k$ is
$e_i(k) = \big[ (x_i(k) - x_j^*)(1 - s_i(k)),\; (y_i(k) - y_j^*)(1 - s_i(k)),\; \bar{L}_{i,j}^x(k),\; \bar{L}_{i,j}^y(k) \big]^T$,  (3)
where the first two terms correspond to the position error between the $i$th robot and the welding starting point of the dispatched $j$th weld line before the start of the welding, and the last two terms correspond to the welding completion in (2).
Based on (1), the coordinated ship-welding error dynamics can be expressed as
$e_i(k+1) = \big[ (x_i(k) + u_{xi}(k) - x_j^*)(1 - s_i(k+1)),\; (y_i(k) + u_{yi}(k) - y_j^*)(1 - s_i(k+1)),\; \bar{L}_{i,j}^x(k+1),\; \bar{L}_{i,j}^y(k+1) \big]^T$,  (4)
where $\bar{L}_{i,j}(k+1) = [\bar{L}_{i,j}^x(k+1), \bar{L}_{i,j}^y(k+1)]^T \in \mathbb{R}^2$ satisfies
$\bar{L}_{i,j}(k+1) = \begin{cases} \big(p_i(k) + u_i(k) - p_j^*\big) s_i(k+1) - L_j \frac{c_j}{\| c_j \|}, & \text{for NWLs}; \\ \big(p_i(k) + u_i(k) - p_j^*\big) s_i(k+1) - L_j \frac{c_j}{\| c_j \|} + \big(p_i(k) + u_i(k) - p_{i'}(k) - u_{i'}(k)\big) s_i(k+1), & \text{for SWLs}. \end{cases}$  (5)
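The welding-completion and error terms of (2)–(5) can be evaluated directly from the robot and weld-line positions. The following Python sketch follows the reconstruction given above; the function name and argument layout are hypothetical, and the SWL branch simply adds the relative-position term between the two co-welding robots.

```python
import numpy as np

def welding_error(p_i, u_i, p_start, L_j, c_j, s_i, p_co=None, u_co=None):
    """Sketch of the next-step error e_i(k+1) of Eqs. (4)-(5).

    p_i, u_i   -- position and motion input of robot i
    p_start    -- welding starting point p_j* of the dispatched weld line
    L_j, c_j   -- length and ideal welding velocity of weld line j
    s_i        -- 1 if robot i has reached the starting point at k+1, else 0
    p_co, u_co -- position/input of the co-welding robot (SWLs only)
    """
    p_next = p_i + u_i
    L_bar = (p_next - p_start) * s_i - L_j * c_j / np.linalg.norm(c_j)
    if p_co is not None:                        # synchronous weld line: add relative-position term
        L_bar = L_bar + (p_next - (p_co + u_co)) * s_i
    pos_err = (p_next - p_start) * (1 - s_i)    # error to the starting point before welding begins
    return np.concatenate([pos_err, L_bar])     # e_i(k+1) in Eq. (4)
```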
This paper considers the coordinated ship-welding task under the following two optimization objectives.

2.1. Lazy Robot Ratio

As shown in Figure 1, if SWLs exist in the coordinated ship-welding tasks, more robots need to be kept active such that the welding of SWLs can be finished successfully. In other words, in the most extreme cases when only one robot is active and the others are lazy, none of the SWLs can be finished and the coordinated ship-welding task has failed. Therefore, to keep as many robots as possible active and to prevent the robots from being lazy, the concept of the lazy robot ratio is introduced in this paper.
The lazy robot ratio is defined as the percentage of lazy robots among all robots, where lazy robots are defined as robots that have no dispatched weld line while weld lines that have not been dispatched still exist. The formulation of the lazy robot ratio at time step $k$ can be expressed as
$R(k) = \begin{cases} 0, & \text{if all robots or weld lines have been dispatched}; \\ \frac{\sum_{i=1}^{n} \big(1 - \mathrm{sgn}(\chi_i(k))\big)}{n}, & \text{else}; \end{cases}$  (6)
where $\mathrm{sgn}(\cdot)$ is the sign function and $\chi_i(k)$ is the index of the weld line dispatched to the $i$th robot. Note that if the $i$th robot is not dispatched to a weld line, then $\chi_i(k) = 0$; otherwise, $\chi_i(k)$ is greater than 0, corresponding to the weld line label, and then $1 - \mathrm{sgn}(\chi_i(k)) = 0$.
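A direct implementation of (6) is straightforward; the sketch below assumes a hypothetical integer array chi with chi[i] = 0 for an undispatched robot, together with a Boolean flag indicating whether undispatched weld lines remain.

```python
import numpy as np

def lazy_robot_ratio(chi, undispatched_lines_remain):
    """R(k) of Eq. (6): fraction of robots with no dispatched weld line (chi[i] == 0),
    counted only while undispatched weld lines remain."""
    if not undispatched_lines_remain or np.all(chi > 0):
        return 0.0
    return np.sum(1 - np.sign(chi)) / len(chi)
```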
To measure the lazy robot ratio, the dispatch χ i ( k ) for all robots i = 1 , , n at each time step needs to be designed and used, and the real-time states of weld lines need to be known (i.e., whether weld lines exist that have not been dispatched to robots). The measure of the lazy robot ratio helps evaluate the working efficiency of robots. When it equals 0 most of the time, it means all robots are put to work and the working efficiency can be high. Therefore, in coordinated ship-welding tasks with multiple robots (more than two), it is required to employ as many available robots as possible to improve overall efficiency.
Then, the objective for optimizing the lazy robot ratio is defined as
$\min_{\chi_i} \; \frac{\sum_{k=0}^{N_a} R(k)}{(N_a + 1)\, n}$,  (7)
where $k = N_a$ denotes the time step at which all weld lines have been dispatched to robots. The above objective measures the ratio of lazy robots among all $n$ robots over the time steps from $k = 0$ to $k = N_a$. After $k = N_a$, all weld lines have been dispatched and $R(k)$ will be 0. A smaller value of this objective means a higher ratio of robots put to work, which helps guarantee the accomplishment of the coordinated ship-welding task. Minimizing the objective in (7) not only ensures the working efficiency of the robots, but also guarantees that the scenario of not having enough robots for SWLs will be much less likely to occur.

2.2. Energy Consumption

Based on the coordinated ship-welding error dynamics (4), the optimal coordinated ship-welding energy for the $i$th robot is defined as
$J_i^*\big(e_i(k), u_i^*(k)\big) = \min_{u_i(k)} \sum_{k=0}^{\infty} C_i\big(e_i(k), u_i(k)\big)$,  (8)
where $u_i^*(k)$ is the optimal motion planner associated with $u_i(k) \in \Omega_u$, $\Omega_u$ is the admissible set,
$C_i\big(e_i(k), u_i(k)\big) = e_i^T(k) I_e e_i(k) + u_i^T(k) I_u u_i(k)$  (9)
is the cost function, and $I_e$, $I_u$ are diagonal matrices with constant diagonal elements. In the above equation, minimizing the $e_i^T(k) I_e e_i(k)$ term drives the coordinated ship-welding error to 0, i.e., the robot achieves coordinated ship welding of the weld lines; the $u_i^T(k) I_u u_i(k)$ term reflects the energy required by the planner. In the energy function (8), if the cost function $C_i$ equals 0, the energy consumption $J_i^*$ no longer increases and its value remains unchanged. To achieve $C_i = 0$, it follows from (9) that the non-negative terms $e_i^T(k) I_e e_i(k)$ and $u_i^T(k) I_u u_i(k)$ must both be 0, which requires driving the welding error $e_i$ to 0 under the designed control input $u_i$. Note that the $u_i$ under consideration is an admissible control, which means it can make $e_i$ converge to 0, after which $u_i$ also equals 0 to keep the system at the fixed equilibrium point. Therefore, minimizing energy use requires reducing the welding error to 0 under the designed $u_i$.
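As a small illustration of (9), the stage cost can be computed as follows; defaulting $I_e$ and $I_u$ to identity matrices matches the simulation setting reported in Section 4.2, but the function itself is only a sketch.

```python
import numpy as np

def stage_cost(e, u, I_e=np.eye(4), I_u=np.eye(2)):
    """Stage cost C_i of Eq. (9); I_e, I_u default to identity as in Section 4.2."""
    return e @ I_e @ e + u @ I_u @ u
```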
Remark 3.
The challenges of achieving optimal energy consumption lie in the fact that (1) the optimal motion planner u i * k needs to be designed to achieve both the convergence of e i and optimal energy consumption J i * under the assumption that the planner is in the admissible set Ω u ; and (2) the optimal planner should be solved by the corresponding discrete-time Bellman function, where its analytical solutions can be hard to obtain. Note that reinforcement learning can solve the above problem online through neural networks to obtain the optimal policy. Therefore, the reinforcement learning-based motion planner is considered in this paper to solve these problems.
Definition 1.
[Coordinated ship-welding task problem.] Consider the coordinated ship-welding task consisting of $m$ weld lines with $m_s$ SWLs and $n$ robots in (1). Suppose Assumption 1 holds. Design the welding sequence dispatch $\chi_i$ and motion planner $u_i$ for each robot such that the optimal lazy robot ratio (7) and energy consumption of the coordinated ship-welding motion (8) are achieved.
Remark 4.
The coordinated welding problem in this paper differs from the other welding problems in three aspects: (1) Different from the single-robot cases [3,5,7] and two-robot cases [8,9], more than two robots are considered to be used, which can be challenging for dispatching. (2) The coordinated ship-welding motion requires two robots welding a SWL together, which is not considered in [3,5,7,8,9] and brings difficulty to determining the optimal welding sequence. (3) Distinct from the minimal time and energy included in the multi-objective welding problems [7,9], the lazy robot ratio of more than two robots and energy consumption are to be minimized.

3. Methodology of DKM-MFPG

To solve the problem in Definition 1, the DKM-MFPG (dynamic Kuhn–Munkres model-free policy gradient) algorithm based on dispatching and reinforcement learning is proposed, where the DKM (dynamic Kuhn–Munkres) dispatcher uses the position information obtained by wireless sensors to dynamically design the welding sequence dispatch and ensures a low lazy robot ratio, while the MFPG (model-free policy gradient) optimal motion planner yields an energy-optimal planner for the coordinated ship-welding motion based on the dispatch results of the DKM. The DKM-MFPG framework is illustrated in Figure 3.

3.1. DKM Dispatcher

The design of the DKM dispatcher aims to reduce the lazy robot ratio through a dynamic dispatch mechanism that immediately dispatches a weld line that has not yet been dispatched to any robot, to a robot that has not been dispatched to any weld line in real-time. At the same time, considering that the term e i T k I e e i k of the motion energy cost (9) is affected by the welding sequence dispatch, the design of the DKM dispatcher also prioritizes this term with a smaller value, thus providing the basis for subsequent motion design.
To determine whether the ith robot should be dispatched to the jth weld line, a dispatch degree function is first designed as follows:
$D_{i,j}(k) = \frac{e_i^T(k) I_e e_i(k)}{d_i(k) + \varepsilon}$,  (10)
where $d_i(k) = 1$ when the $i$th robot is not assigned to any weld line at time $k$ and $d_i(k) = 0$ otherwise, and $\varepsilon$ is a very small positive constant. Based on the above equation, the dynamic dispatch mechanism of DKM can be described as follows: when the $i$th robot is not assigned to any weld line at time step $k$, the smaller the value of $e_i^T(k) I_e e_i(k)$ with respect to the $j$th weld line, the smaller the value of the dispatch degree $D_{i,j}(k)$, and the smaller the dispatch degree $D_{i,j}(k)$, the greater the probability that the $i$th robot is immediately dispatched to the $j$th weld line; when the $i$th robot is already dispatched to a weld line at $k$, the value of $D_{i,j}(k)$ tends to infinity, and the probability that the $i$th robot will be dispatched to another weld line before completing the welding of its current weld line is nearly zero.
Remark 5.
Note that the dispatch degree function (10) includes $e_i(k)$, which is determined by the real-time positions of the $j$th dispatched weld line and of the $i'$th co-welding robot associated with the $i$th robot. According to [26], this position information can be effectively localized and broadcast by wireless sensor devices among the robots. Therefore, the position information is directly used in the DKM dispatcher.
Based on the dispatch degree function (10), the dispatch degree matrix $D(k) \in \mathbb{R}^{n \times (m + m_s)}$ between all the robots and all the weld lines can be obtained as
$D(k) = \big[ D_{2m_s}(k) \;\; D_{m-m_s}(k) \big]$,  (11)
where $D_{2m_s}(k) \in \mathbb{R}^{n \times 2m_s}$ is the dispatch degree matrix between the $n$ robots and the $m_s$ SWLs, defined as
$D_{2m_s}(k) = \begin{bmatrix} D_{1,1}(k) & D_{1,1}(k) & \cdots & D_{1,m_s}(k) & D_{1,m_s}(k) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ D_{n,1}(k) & D_{n,1}(k) & \cdots & D_{n,m_s}(k) & D_{n,m_s}(k) \end{bmatrix}$,  (12)
which contains $n$ rows and $2m_s$ columns; this is because, when dispatching, each SWL requires a robot to be assigned in each of its two corresponding columns. $D_{m-m_s}(k) \in \mathbb{R}^{n \times (m - m_s)}$ is the dispatch degree matrix between the $n$ robots and the $m - m_s$ NWLs, defined as
$D_{m-m_s}(k) = \begin{bmatrix} D_{1,m_s+1}(k) & \cdots & D_{1,m}(k) \\ \vdots & \ddots & \vdots \\ D_{n,m_s+1}(k) & \cdots & D_{n,m}(k) \end{bmatrix}$.  (13)
In order to determine the welding sequence dispatch of n robots to m welds based on the dispatch degree matrix, the following lemma is introduced.
Lemma 1.
(Kuhn–Munkres (KM) algorithm [27].) Consider an arbitrary $n \times m$ matrix $G$ with element $g_{i,j}$ in row $i \in \{1, \ldots, n\}$ and column $j \in \{1, \ldots, m\}$. A set of elements of matrix $G$ is said to be independent if no two of them lie in the same line (the word “line” applies both to the rows and to the columns of matrix $G$). A set of $n$ independent elements $[g_1^*, \ldots, g_n^*]^T$ of matrix $G$ is chosen so that the sum of these elements is minimal. The KM algorithm $KM(\cdot): \mathbb{R}^{n \times m} \to \mathbb{R}^{n}$ can be denoted by
$KM(G) = G^*$,  (14)
where $G^* = [g_1^*, \ldots, g_n^*]^T$ is the minimum dispatch for $G$.
Based on the dispatch degree function (10), the dispatch degree matrix (11), and the KM algorithm (14) with Lemma 1, the DKM dispatcher is designed as
$\chi_i(k) = \begin{cases} j, & \text{if } D_{i,j}(k) = KM(D(k))\big|_i; \\ 0, & \text{else}; \end{cases}$  (15)
which means the $i$th robot is dispatched to the $j$th weld line if the value of its dispatch degree for the $j$th weld line equals the $i$th-row assignment in $KM(D(k))$. From (10) and (15), the relationship between the DKM dispatcher and the resulting allocation can be summarized as dynamic dispatch in real time, which guarantees that lazy robots will not exist. This is reflected in $d_i(k)$ in (10). When the $i$th robot has a dispatched weld line, $d_i = 0$, which means $D_{i,j}(k)$ is a very large value for any $j = 1, \ldots, m$. On the other hand, when the $i$th robot has just finished its previously dispatched weld line and has not yet been assigned a new one, $d_i = 1$ and $D_{i,j}(k)$ becomes a smaller value. Note that the Kuhn–Munkres algorithm always selects the minimum value of $D_{i,j}$ for the $i$th robot. That is to say, when $d_i = 0$, $D_{i,j}$ tends to infinity, such that the $i$th robot will not be dispatched to a new weld line $j$ until it finishes its current welding task; when $d_i(k) = 1$, the $i$th robot can be dispatched to a weld line according to the value of $D_{i,j}$.
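One step of the dispatching logic in (10)–(15) can be sketched with SciPy's Hungarian-algorithm routine linear_sum_assignment standing in for the KM step of (14). The array names, the column-duplication trick for SWLs, and the way already-assigned robots are skipped are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dkm_dispatch(err, assigned, m_s, eps=1e-6):
    """One step of a DKM-style dispatch, sketched from Eqs. (10)-(15).

    err[i, j] -- weighted welding error e_i^T I_e e_i of robot i w.r.t. weld line j
    assigned  -- boolean array, True if robot i currently has a dispatched weld line
    m_s       -- number of synchronous weld lines (the first m_s columns of err)
    Returns chi with chi[i] = j+1 if robot i is newly dispatched to weld line j, else 0.
    """
    n, m = err.shape
    d = (~assigned).astype(float)                 # d_i(k) in Eq. (10)
    D = err / (d + eps)[:, None]                  # dispatch degree matrix, Eq. (10)
    # Duplicate the SWL columns so each SWL can receive two robots, as in Eq. (12).
    D_full = np.hstack([np.repeat(D[:, :m_s], 2, axis=1), D[:, m_s:]])
    rows, cols = linear_sum_assignment(D_full)    # Kuhn-Munkres step, Eq. (14)
    chi = np.zeros(n, dtype=int)
    for i, col in zip(rows, cols):
        j = col // 2 if col < 2 * m_s else col - m_s   # map duplicated column back to weld index
        if not assigned[i]:                            # only idle robots accept a new line
            chi[i] = j + 1
    return chi
```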
Proposition 1.
Consider the coordinated ship-welding task consisting of n robots in (1) and m s SWLs and m m s NWLs with different lengths and placement directions. Assume Assumption 1 holds and all robots will complete the welding after being dispatched to a weld line. Then the welding sequence dispatch χ i ( k ) output by Algorithm 1 can achieve the optimal lazy robot ratio objective (7).
Algorithm 1 DKM dispatcher
  • Initialization: At $k = 0$, set the number $n$ and the initial positions $p_i(0)$ of the robots. Set the number $m_s$ of SWLs and the number $m - m_s$ of NWLs.
  • Calculate the dispatch degree matrix $D(0)$ by (11).
  • Determine $\chi_i(0)$ based on (15) and $D(0)$.
  • for time step $k = 1, \ldots, N_a$ do
  •     Let $j = \chi_i(k-1)$.
  •     Calculate the dispatch degree matrix $D(k)$ by (11).
  •     Determine $\chi_i(k)$ based on (15) and $D(k)$.
  •     Until all weld lines have been dispatched, end loop.
  • end for
  • Output: Welding sequence dispatch $\chi_i(k)$ between all robots $i = 1, \ldots, n$ and all weld lines $j = 1, \ldots, m$ at $k = 0, \ldots, N_a$.
Proof. 
When the number of robots is greater than the sum of the number of welding robots required for all weld lines, the weld sequence can be determined by one-step dispatch; otherwise, multi-step dispatch is required. For both cases, the achievement of objective (7) is analyzed here.
Case 1. The number of robots is greater than or equal to the total number of welding robots required for all weld lines, that is, $n \ge m + m_s$. In this case, the DKM dispatcher can assign all the weld lines to the robots in one step. According to the definition, lazy robots are robots that do not perform welding tasks while there still exist weld lines that have not been dispatched and are unfinished. Therefore, $R(0) = 0$ holds.
Case 2. In the case where there are fewer robots than weld lines, there are three kinds of possible states for dispatching: (1) Initial state at time step 0. At this state, all robots are dispatched with a weld line, which means no lazy robots exist and R ( 0 ) = 0 ; (2) Intermediate state between 0 and the last time step for dispatching. During this time, once a robot has finished its previously dispatched weld line, the DKM dispatcher instantly dispatches a new weld line for it. This means during this time, robots always have their dispatched weld lines; thus, no lazy robots exist and R ( k ) = 0 ; and (3) Final state when all weld lines have been dispatched at k = N a . As the last unwelded weld line has been dispatched to a robot, there is no weld line to be welded for robots just as in Case 1. To this end, lazy robots no longer exist from k = N a , and R ( k ) will remain 0 from now on. An example of the DKM dispatcher for this procedure can be found in Figure 4, where five robots and nine weld lines including a SWL labeled by 1 are dispatched, and the robots are all dispatched to a weld line until the final state with the lazy robot ratio equal to 0.
To sum up, the lazy robot ratio is always kept as R ( k ) = 0 under the DKM dispatcher, which is the minimal value for the non-negative variable. Therefore, it is concluded that the optimal lazy robot ratio can also be achieved in this case.    □

3.2. MFPG Motion Planner

In this subsection, the state, action, and cost function of each robot are set up for the MFPG (model-free policy gradient) reinforcement learning method according to the real-time weld sequence dispatch of the DKM dispatcher. Then, the Q-network (action-value network) and the actor network are constructed to obtain the energy-optimal coordinated ship-welding motion planner. Meanwhile, inspired by the literature [28,29], a convergence proof of the network weights based on a Lyapunov function is given.
State. The state of the ith robot is defined as the coordinated ship-welding error (3), where the jth weld line is dispatched to the ith robot according to the DKM dispatcher. Note that since the state values may have duplicate values under different dispatched weld lines when the same robot has participated in multiple dispatches, the above states are accompanied by a label of the dispatched weld line number to differentiate them during training.
Action. The definition of the action of robot $i$ is consistent with the coordinated ship-welding motion control input $u_i(k) \in \mathbb{R}^2$ in Equation (1).
Cost Function. The cost function for the ith robot is defined as the cost function in the optimal coordinated ship-welding motion energy (8), i.e., Equation (9).
Define $J_i(e_i(k)) = \sum_{k=0}^{\infty} C_i\big(e_i(k), u_i(k)\big)$ to be the coordinated ship-welding motion energy function corresponding to the cost function. According to Bellman’s principle of optimality [17], the optimal coordinated ship-welding motion energy function should satisfy the following discrete-time Bellman equation, that is,
$J_i^*\big(e_i(k)\big) = \min_{u_i(k)} \Big[ C_i\big(e_i(k), u_i(k)\big) + J_i^*\big(e_i(k+1)\big) \Big]$.  (16)
The above optimal value is achieved by designing the optimal action policy
$u_i^*(k) = \arg\min_{u_i(k)} \Big[ C_i\big(e_i(k), u_i(k)\big) + J_i^*\big(e_i(k+1)\big) \Big]$.  (17)
Notice that the cost function is affected by both the state and the action of the robot, so to more accurately reflect the relationship between the cost function and the coordinated ship-welding motion energy function, the optimal Q-function (action-value function) is used in place of the optimal coordinated ship-welding motion energy function in (16), i.e.,
$Q_i^*\big(e_i(k), u_i(k)\big) = C_i\big(e_i(k), u_i(k)\big) + Q_i^*\big(e_i(k+1), u_i(k+1)\big)$.  (18)
The optimal action policy to achieve (18) is
$u_i^*(k) = \arg\min_{u_i(k)} Q_i^*\big(e_i(k), u_i(k)\big)$.  (19)
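Since the motion input of (1) ranges over a small discrete set, the argmin in (19) can be evaluated by simple enumeration, as the following sketch shows for any callable Q-value approximation (e.g., the estimate of (22)); in the proposed method, the action is ultimately produced by the trained actor network of (23), so this is only an illustration of the argmin operation.

```python
import numpy as np

def greedy_action(q_value, e, actions):
    """Eq. (19): pick the admissible motion input minimizing the estimated Q-value.
    q_value(e, u) is any callable approximating Q_i(e_i, u_i); actions is the discrete set of Eq. (1)."""
    return min(actions, key=lambda u: q_value(e, np.asarray(u)))
```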
To approximate the unknown Q in (18) and u i * in (19), the Q-network and actor network are designed as follows.
Q-network. Let $W_{ci}^* \in \mathbb{R}^{h_c}$ be the optimal weight parameter of the Q-network for the $i$th robot. Then the optimal Q-function can be approximated as
$Q_i^*\big(e_i(k), u_i(k)\big) = W_{ci}^{*T} \psi_{ci}\big(\sigma_{ci}^T z_i(k)\big) + \delta_{ci}$,  (20)
where $\psi_{ci} = \tanh(\cdot)$ is the activation function of the Q-network, $h_c$ is the number of nodes in the hidden layer, $\sigma_{ci} \in \mathbb{R}^{6 \times h_c}$ maps the input layer of the network to the hidden layer, $z_i(k) = [e_i^T, u_i^T]^T$, and $\delta_{ci}$ is the network reconstruction error.
Actor network. Let $W_{ai}^* \in \mathbb{R}^{h_a \times 2}$ be the optimal weight parameter of the actor network of the $i$th robot. Then the optimal action policy can be expressed as
$u_i^*\big(e_i(k)\big) = W_{ai}^{*T} \psi_{ai}\big(\sigma_{ai}^T e_i(k)\big) + \delta_{ai}$,  (21)
where $\psi_{ai} = \tanh(\cdot)$ is the activation function of the actor network, $h_a$ is the number of nodes in the hidden layer, $\sigma_{ai} \in \mathbb{R}^{4 \times h_a}$ maps the input layer of the actor network to the hidden layer, and $\delta_{ai}$ is the reconstruction error.
Let $\hat{W}_{ci}$, $\hat{W}_{ai}$ be the estimates of $W_{ci}^*$, $W_{ai}^*$, respectively. Then the estimates of the optimal Q-function and the optimal action policy are constructed as
$\hat{Q}_i\big(e_i(k), u_i(k)\big) = \hat{W}_{ci}^T \psi_{ci}\big(\sigma_{ci}^T z_i(k)\big)$,  (22)
$\hat{u}_i\big(e_i(k)\big) = \hat{W}_{ai}^T \psi_{ai}\big(\sigma_{ai}^T e_i(k)\big)$.  (23)
To train the Q-network, the following training error function is defined:
$E_{ci} = \frac{1}{2} \varepsilon_{ci}^2$,  (24)
where $\varepsilon_{ci} = C_i\big(e_i(k), u_i(k)\big) + \hat{W}_{ci}^T \Delta\psi_{ci}$ and $\Delta\psi_{ci} = \psi_{ci}\big(\sigma_{ci}^T z_i(k+1)\big) - \psi_{ci}\big(\sigma_{ci}^T z_i(k)\big)$. To minimize the above error function, the following gradient descent update is applied to the weights of the Q-network:
$\hat{W}_{ci}(k+1) = \hat{W}_{ci}(k) - \alpha_{ci} \frac{\partial E_{ci}}{\partial \hat{W}_{ci}(k)} = \hat{W}_{ci}(k) - \alpha_{ci} \varepsilon_{ci} \Delta\psi_{ci}$,  (25)
where $\alpha_{ci}$ is a positive learning rate constant.
For the actor network, the weight parameters are updated using the policy gradient method, i.e.,
$\hat{W}_{ai}(k+1) = \hat{W}_{ai}(k) - \alpha_{ai} \frac{\partial \hat{Q}_i\big(e_i(k), u_i(k)\big)}{\partial \hat{W}_{ai}(k)} = \hat{W}_{ai}(k) - \alpha_{ai} \hat{W}_{ci}^T(k) \psi_{ci}' \sigma_{ci}^T \eta_i \psi_{ai}\big(\sigma_{ai}^T e_i(k)\big)$,  (26)
where $\psi_{ci}' = \partial \psi_{ci} / \partial z_i(k)$, $\eta_i = \partial z_i(k) / \partial \hat{u}_i(k)$, and $\alpha_{ai}$ is a positive learning rate constant.
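A minimal NumPy sketch of one training step under the update rules (25) and (26) is given below. The hidden-layer sizes, the random input-to-hidden maps, and the explicit tanh-derivative handling are assumptions made for illustration; only the update structure mirrors the equations above.

```python
import numpy as np

rng = np.random.default_rng(0)
h_c, h_a = 6, 6                               # hidden-layer sizes (illustrative values)
sigma_c = rng.standard_normal((6, h_c))       # input-to-hidden map of the Q-network, z = [e; u] in R^6
sigma_a = rng.standard_normal((4, h_a))       # input-to-hidden map of the actor network, e in R^4
W_c = rng.uniform(0, 1, h_c)                  # Q-network weights, Eq. (22)
W_a = rng.uniform(-1, 0, (h_a, 2))            # actor weights, Eq. (23)
alpha_c, alpha_a = 0.1, 0.01                  # learning rates as in Section 4.2

def mfpg_step(e, u, e_next, u_next, cost):
    """One model-free update of the Q-network (25) and actor network (26); a sketch only."""
    global W_c, W_a
    z, z_next = np.concatenate([e, u]), np.concatenate([e_next, u_next])
    psi, psi_next = np.tanh(sigma_c.T @ z), np.tanh(sigma_c.T @ z_next)
    eps_c = cost + W_c @ (psi_next - psi)             # TD-style error used in Eq. (24)
    W_c = W_c - alpha_c * eps_c * (psi_next - psi)    # gradient step of Eq. (25)
    psi_a = np.tanh(sigma_a.T @ e)
    # Policy-gradient step of Eq. (26): push the action toward a lower estimated Q-value.
    dpsi = 1.0 - psi**2                               # tanh derivative at the hidden layer
    dQ_du = ((W_c * dpsi) @ sigma_c.T)[4:]            # gradient of Q w.r.t. the action part of z
    W_a = W_a - alpha_a * np.outer(psi_a, dQ_du)
    return W_c, W_a
```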
Proposition 2.
Consider the coordinated ship-welding task consisting of n robots (1), m s SWLs, and m m s NWLs with different lengths and placement directions. Suppose Assumption 1 holds and that all the weld lines can be dispatched to the corresponding robots by Proposition 1, then the optimal coordinated ship-welding motion energy of all robots (8) can be achieved by Algorithm 2 under u ^ i . Meanwhile, the weight estimation errors W ˜ c i = W ^ c i W c i * and W ˜ a i = W ^ a i W a i * for the Q-network and the actor network are uniformly ultimately bounded (UUB).
Algorithm 2 MFPG motion planner
  • Initialize the weight parameters $\hat{W}_{ci}$, $\hat{W}_{ai}$ of the Q-network and the actor network, as well as the coordinated ship-welding error $e_i(0)$, the action $u_i(0)$, and the cost function $C_i\big(e_i(0), u_i(0)\big)$.
  • for $k = 0$ to $N$ do
  •     Obtain the welding sequence dispatch $\chi_i(k)$ by using Algorithm 1.
  •     Obtain the coordinated ship-welding error $e_i(k)$ according to (4), and the cost function value $C_i(k)$ according to (9).
  •     for $i = 1$ to $n$ do
  •         Update the weight parameters $\hat{W}_{ci}$, $\hat{W}_{ai}$ of the Q-network and the actor network according to (25) and (26), respectively.
  •     end for
  •     Obtain the position $p_i(k+1)$ at the next time step according to (1) and the action $u_i(k)$.
  • end for
  • Output: Optimal coordinated ship-welding motion planner $\hat{u}_i$ (23) for all robots $i = 1, \ldots, n$.
Proof. 
Consider the candidate Lyapunov function
$L(k) = L_1(k) + L_2(k)$,  (27)
where $L_1(k) = \sum_{i=1}^{n} \frac{1}{\alpha_{ci}} \tilde{W}_{ci}^T \tilde{W}_{ci}$ and $L_2(k) = \sum_{i=1}^{n} \frac{1}{\alpha_{ai}} \tilde{W}_{ai}^T \tilde{W}_{ai}$.
Let $\phi_{ci} = \tilde{W}_{ci}^T \Delta\psi_{ci}$. Taking the first difference of $L_1(k)$ along $\tilde{W}_{ci}$ while using Young's inequality and the update law of the Q-network (25) yields
$\Delta L_1(k) = \sum_{i=1}^{n} \frac{1}{\alpha_{ci}} \tilde{W}_{ci}^T(k+1) \tilde{W}_{ci}(k+1) - \sum_{i=1}^{n} \frac{1}{\alpha_{ci}} \tilde{W}_{ci}^T(k) \tilde{W}_{ci}(k) = \sum_{i=1}^{n} \Big[ -2 \tilde{W}_{ci}^T \Delta\psi_{ci} \big( C_i + \tilde{W}_{ci}^T \Delta\psi_{ci} + W_{ci}^{*T} \Delta\psi_{ci} \big) + \alpha_{ci} \|\Delta\psi_{ci}\|^2 \big( C_i + \tilde{W}_{ci}^T \Delta\psi_{ci} + W_{ci}^{*T} \Delta\psi_{ci} \big)^2 \Big] \le \sum_{i=1}^{n} \Big[ -\phi_{ci}^2 + \big( C_i + W_{ci}^{*T} \Delta\psi_{ci} \big)^2 \Big]$,  (28)
which holds when $0 < \alpha_{ci} \le \frac{1}{\bar{\Delta\psi}_{ci}^2}$, where $\|\Delta\psi_{ci}\|^2 \le \bar{\Delta\psi}_{ci}^2$ is bounded.
Also, let $\phi_{ai} = \tilde{W}_{ai}^T \psi_{ai}$. Taking the first difference of $L_2(k)$ along $\tilde{W}_{ai}$ while using Young's inequality and substituting the update law of the actor network (26) yields
$\Delta L_2(k) = \sum_{i=1}^{n} \frac{1}{\alpha_{ai}} \tilde{W}_{ai}^T(k+1) \tilde{W}_{ai}(k+1) - \sum_{i=1}^{n} \frac{1}{\alpha_{ai}} \tilde{W}_{ai}^T(k) \tilde{W}_{ai}(k) = \sum_{i=1}^{n} \Big[ -2 \phi_{ai} \hat{W}_{ci}^T \psi_{ci}' \sigma_{ci}^T \eta_i + \alpha_{ai} \big\| \hat{W}_{ci}^T \psi_{ci}' \sigma_{ci}^T \eta_i \psi_{ai} \big\|^2 \Big] \le \sum_{i=1}^{n} \Big[ -\big( 1 - \alpha_{ai} \|\psi_{ci}'\|^2 \|\sigma_{ci}\|^2 \|\psi_{ai}\|^2 \big) \|\hat{W}_{ci}\|^2 + 2 \|\tilde{W}_{ci}\|^2 + 2 \|W_{ci}^*\|^2 + \big\| \psi_{ci}' \sigma_{ci}^T \eta_i \big\|^2 \phi_{ai}^2 \Big]$.  (29)
Combining (28) with (29) yields
$\Delta L(k) \le \sum_{i=1}^{n} \Big[ -\phi_{ci}^2 - \big( 1 - \alpha_{ai} \|\psi_{ci}'\|^2 \|\sigma_{ci}\|^2 \|\psi_{ai}\|^2 \big) \|\hat{W}_{ci}\|^2 \Big] + \bar{M}$,  (30)
where $\bar{M} = \big( C_i + W_{ci}^{*T} \Delta\psi_{ci} \big)^2 + 2 \|\tilde{W}_{ci}\|^2 + 2 \|W_{ci}^*\|^2 + \big\| \psi_{ci}' \sigma_{ci}^T \eta_i \big\|^2 \phi_{ai}^2$ is bounded. By selecting $0 < \alpha_{ci} \le \frac{1}{\bar{\Delta\psi}_{ci}^2}$ and $0 < \alpha_{ai} \le \frac{1}{\|\psi_{ci}'\|^2 \|\sigma_{ci}\|^2 \|\psi_{ai}\|^2}$, one has $\Delta L(k) < 0$ whenever $\|\hat{W}_{ci}\| \ge \sqrt{\frac{\bar{M}}{1 - \alpha_{ai} \|\psi_{ci}'\|^2 \|\sigma_{ci}\|^2 \|\psi_{ai}\|^2}}$, which implies that the weight estimation errors $\tilde{W}_{ci}$, $\tilde{W}_{ai}$ of the Q-network and the actor network are UUB. Then the optimal motion planner $\hat{u}_i$ of the actor network output by Algorithm 2 can approximate the optimal action policy $u_i^*$, which in turn achieves the optimal Q-function value corresponding to the optimal coordinated ship-welding motion energy. Therefore, Algorithm 2 can effectively achieve the energy consumption objective of coordinated ship welding (8). □
Remark 6.
The MFPG motion planner is specifically designed for the coordinated ship-welding task problem, where the state and action of each robot are based on the DKM dispatcher-based welding error and the motion in the grid-world environment, and the cost function is defined corresponding to the energy consumption optimization objective. Compared to the other reinforcement learning methods [22,23,24] applied to the robot energy optimization problems, our method has the following advantages: (1) the state of the welding error is novelly defined such that the coordinated ship-welding motion for normal and synchronous weld lines can both be well-controlled and achieved with optimal energy consumption; (2) the model-free update of the Q-network and actor network can be faster and more efficient, as the model dynamics are not calculated for the network updating procedure; and (3) the integration of the Q-network and the policy-gradient-updated actor network can simultaneously evaluate the energy cost and optimize the motion planner, and the network convergence is theoretically guaranteed to improve the learning stability.
Remark 7.
The specific welding process requirement of synchronous weld lines originates from practical applications, which require a pair of robots in which one holds the soldering gun while, at the same time, the other holds the soldering tin so that the two cooperate with each other. To deal with this, the coordinated welding error is constructed, and its convergence and energy optimization are achieved by our MFPG motion planner.
From Propositions 1 and 2, we now give the main result as follows.
Theorem 1.
Consider the coordinated ship-welding task consisting of $n$ robots (1), $m_s$ SWLs, and $m - m_s$ NWLs with different lengths and placement directions. Suppose Assumption 1 holds. Then, the optimal lazy robot ratio (7) and coordinated ship-welding motion energy (8) objectives can be solved by the welding sequence dispatch $\chi_i(k)$ of the DKM dispatcher in Proposition 1 and the actor output $\hat{u}_i$ of the MFPG motion planner in Proposition 2.
Proof. 
According to the conclusions of Propositions 1 and 2, the welding sequence dispatch χ i ( k ) of the DKM dispatcher and the optimal coordinated ship-welding motion planner u ^ i of the MFPG can effectively achieve the optimal lazy robot ratio (7) and energy consumption (8) objectives, respectively. Therefore, the coordinated ship-welding optimal task problem in Definition 1 can be solved successfully. □
Remark 8.
The trade-off between keeping robots active and minimizing energy use is balanced by first guaranteeing the optimal lazy robot ratio via the DKM dispatcher, and then achieving the optimal energy under this dispatch for all robots via the MFPG motion planner. This is because more than two robots exist, and hence working efficiency is the prime consideration. The emphasis on the lazy robot ratio ensures that as many robots as possible are put to work, after which optimizing energy consumption becomes meaningful.

4. Numerical Simulation

In this section, two numerical simulations are set up. Simulation 1 aims to verify the effectiveness of the DKM-MFPG algorithm in multiple coordinated ship-welding scenarios, and Simulation 2 compares the DKM-MFPG algorithm with several single-objective and multi-objective optimization algorithms, so as to demonstrate the advantages of the proposed method by analyzing the values of the optimization objectives as well as the performance indexes.

4.1. Setting of Test Scenarios

The test scenarios are based on a 20 × 20 grid world where the length of each grid is $c = 0.05$. The welding execution speed is set to 0.05 for vertically and horizontally placed welds and 0.0705 for obliquely placed welds. The time step of the simulation is capped at $k = 100$. Five representative test scenarios corresponding to actual welding tasks are listed in Table 1, where the specific settings vary for different test objectives, i.e., the number of robots (S1 vs. S2–S5), the number of weld lines (S1 vs. S2, S3 and S4, S5), the line length (S1, S3 vs. S2, S4 and S5), and the placement directions (S1, S2 vs. S3, S4 and S5). Specifically, the five test scenarios cover the following three situations:
(1)
Variant number of robots and weld lines. The number of robots and weld lines is the most important factor affecting the performance of the proposed algorithm. Note that in the case where the number of robots is larger than that of the weld lines, the dispatch can be easily determined by calculating the task accomplishment time in a one-step dispatch with a deterministic lazy robot ratio, so this case can be neglected. To comprehensively verify the performance of the algorithms, two cases are considered: (I) the small-scene case where the numbers of robots and weld lines are both small, e.g., S1; (II) the case where the number of robots is smaller than that of the weld lines, e.g., S2–S5, where more than 10 weld lines are set, since 10 weld lines cover at least 20 grids and thus occupy most of the area of the 20 × 20 grid world. The comparison between S1 and S2–S5 is given to test the performance associated with the variant robot number, and at the same time, the comparisons between S1, S2 and S3, S4 and S5 are given to test the effectiveness under the variant number of weld lines.
(2)
Variant length of weld lines. Note that the finishing time of a weld line executed by one robot is calculated by the robot’s initial position toward the welding starting point and the weld length. Therefore, the variant length of weld lines can lead to numerous different dispatching solutions when other settings remain unchanged. Specifically, S1 and S3 have the same length of weld lines, whereas the different lengths are set for comparison in S2, S4, and S5.
(3)
Variant placement of weld lines. S1 and S2 show the general case of vertical placement, whereas S3 considers the case of horizontal placement. S4 and S5 test the variant placements including the combination of different directions.
The variant settings aim to cover as many of the scenarios that can occur in the coordinated ship-welding task problem in practice as possible. The settings of variant number and length of weld lines are common in [3,5,7,8,9]. The setting of variant placement of weld lines is considered in [5,9] with different starting and ending positions, which is a more complex dispatching situation. It is noted that the number of robots is set first, in accordance with the requirement of synchronous welding. This has the potential to reveal an efficient way of coordinating multiple welding robots with a low lazy robot ratio.
The test of these scenarios validates that our proposed method is applicable to various settings of coordinated ship-welding tasks, including the cases of (1) small and simple scenes with few robots and weld lines; (2) large and simple scenes with more than two robots and more than ten weld lines; (3) small but complex scenes with few robots and weld lines, where the lengths and placement of weld lines are different; and (4) large and complex scenes with numerous robots and different types of weld lines. Beyond the above coordinated ship-welding scenarios, DKM-MFPG also has the potential to be applied to problems that are constructed on the grid-world environment and require coordination between robots with optimization objectives.
Remark 9.
In the literature [3,5,7,8,9], the spot welding problem can be regarded similarly to the traveling salesman problem (TSP) [30] as the benchmark problem. However, the TSP only considers the single-salesman case with single-objective optimization, e.g., the time cost or the distance cost, which can only be used to represent single-robot welding. Furthermore, the length and placement settings of different weld lines are not considered in this benchmark. As a result, the methods in the TSP are not applicable to this paper. Another potential benchmark problem, that is, the multi-objective multiple traveling salesman problem (MOmTSP) [31], employs multiple salesmen and optimizes multiple objectives. To our knowledge, complex situations such as synchronous welding (that is, two robots are required for welding one weld line simultaneously) and welding settings of different lengths and placements have rarely been discussed in MOmTSP. Therefore, the five test scenarios are designed to justify the testing effectiveness of the proposed algorithm by taking into account the following specific settings: the variant number of robots and weld lines, variant length, and placement direction of weld lines.
The simulation software used for testing the S1 to S5 scenarios is based on Python and the OpenAI Gym environment [32], where the scenarios are constructed on the grid world. The positions of the weld lines and robots, and the dynamics of the robots, are defined and initialized within the grid-world environment. The algorithms are integrated by (1) obtaining the position and related data from the software environment; (2) calculating the welding sequence dispatch of the DKM dispatcher based on the position data; (3) determining the state, action, cost function, and network updates of the MFPG motion planner based on the welding sequence dispatch and positions; and (4) applying the planner to the environment at every step until all the required result data are obtained. The resulting output data contain (1) arrays of each robot’s trajectory in the environment; (2) arrays of the number of lazy robots; and (3) arrays of the state, action, cost function, and network weights of the robots at every time step.
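To make this pipeline concrete, the following sketch shows a Gym-style evaluation loop; ShipWeldingGridEnv, dkm_dispatch, and mfpg_policy are hypothetical stand-ins for the grid-world environment and the two modules of Section 3, not released code.

```python
# Hypothetical evaluation loop; the environment class and the two policy callables
# are illustrative stand-ins for the components described in Sections 3.1-3.2.
import numpy as np

def run_scenario(env, dkm_dispatch, mfpg_policy, max_steps=100):
    obs = env.reset()                       # positions of robots and weld lines
    trajectories, lazy_counts = [], []
    for k in range(max_steps):
        chi = dkm_dispatch(obs)             # welding sequence dispatch, Section 3.1
        actions = mfpg_policy(obs, chi)     # energy-optimal grid motions, Section 3.2
        obs, cost, done, info = env.step(actions)
        trajectories.append(info["positions"])
        lazy_counts.append(info["num_lazy_robots"])
        if done:                            # all weld lines finished
            break
    return np.array(trajectories), np.array(lazy_counts)
```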

4.2. Simulation 1: Effectiveness Validation

The parameters of DKM-MFPG are given as follows. The learning rates are $\alpha_{ci} = 0.1$ and $\alpha_{ai} = 0.01$. The number of hidden units of the Q and actor networks is 6, and their weights are initialized randomly within $[0, 1]$ and $[-1, 0]$, respectively. The constant elements of the diagonal matrices $I_e$, $I_u$ in (9) are set to 1.
The robot motion trajectories for S1–S5 are plotted in Figure 5, and the results show that the robots successfully complete the welding of all weld lines in these scenarios under the DKM-MFPG algorithm. From Figure 6, it can be seen that the number of non-lazy robots equals the initial setting number of robots in each scenario until the last dispatched time step k = N a in all scenarios, which implies that the lazy robot ratio is always 0 and the optimization objective of the lazy robot ratio in (7) as well as Proposition 1 is validated. The curves of state, action, cumulative energy values, and weights of the network for all scenarios under the DKM-MFPG algorithm are plotted in Figure 7. According to the simulation result in Figure 7, the curves of Q-network weights remain unchanged and the energy cost curves also plateau simultaneously, which validates the accomplishment of the desired training performance. It can be seen that the above curves converge gradually in all the scenarios, which verifies the conclusion of Proposition 2 that the motion energy optimization objective (8) of coordinated ship welding is accomplished. Therefore, combining the above results, as stated in Theorem 1, DKM-MFPG can effectively complete the coordinated ship-welding task problem in Definition 1.

4.3. Simulation 2: Comparison Simulation

Comparison baselines. To analyze the performance of the DKM-MFPG algorithm by comparison, four baseline algorithms are set, namely, GA [4] (genetic algorithm) and MA [6] (market-based allocation) based on single-objective optimization, and NSGA-III [12] and AnD [13] algorithms based on multi-objective optimization. It is noted that there have been many attempts to successfully apply single-objective optimization methods (e.g., some evolutionary algorithms) to multi-objective optimization problems, as indicated in the literature [33]. Single-objective optimization methods can yield solutions with better objective values in a shorter execution time, or perhaps even better solutions among multiple objectives for a given objective value. Therefore, to test the effectiveness of different algorithms more comprehensively, single-objective optimization methods such as GA and MA are also set as baselines in this section.
Consider that there are two optimization objectives in this paper, i.e., the lazy robot ratio and the energy consumption for coordinated ship welding. To deal with the problem of balancing the two optimization objectives in a single-objective optimization method, a “weighted sum method” [34] can be used to combine the objective functions into a single unified function, i.e.,
$z = w_1 z_1 + w_2 z_2$,  (31)
where $z$ is the composite function, $z_1$ and $z_2$ are the two objectives, and $w_1$ and $w_2$ are weights selected from 0.1 to 0.9 in steps of 0.1 such that $w_1 + w_2 = 1$, as performed in [35]. For GA, NSGA-III, and AnD, the population size, the crossover rate, the mutation rate, and the number of iterations are set as 200, $r_c = 0.6$, $r_m = 0.1$, and 200, respectively. The fitness function for GA and the cost of bid for MA are selected as the weighted sum of the lazy robot ratio and energy consumption as in (31).
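The weight sweep of (31) used for the single-objective baselines can be reproduced with a short helper; the function and variable names here are illustrative only.

```python
import numpy as np

# Weight sweep used with the single-objective baselines, Eq. (31): w1 in {0.1, ..., 0.9}, w2 = 1 - w1.
weights = [(w1, round(1.0 - w1, 1)) for w1 in np.arange(0.1, 1.0, 0.1)]

def scalarize(lazy_ratio, energy, w1, w2):
    """Composite objective z = w1*z1 + w2*z2 fed to GA/MA as fitness or bid cost."""
    return w1 * lazy_ratio + w2 * energy
```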
Performance metrics. In addition to the optimization objectives of lazy robot ratio and the evaluation of energy consumption, the performance metrics of the coverage rate (C-metric) and the execution time have been added to better evaluate the performance between different algorithms.
C-metric [36]: The C-metric is used to compare the rate of non-dominated solutions between algorithms A and B, and is calculated as
$C(A, B) = \frac{\big| \{ b \in B;\ \exists\, a \in A: a \succ b \} \big|}{|B|}$,  (32)
where $|\cdot|$ denotes the number of elements in a set; $a$ and $b$ denote solutions obtained by A and B, respectively; and $a \succ b$ means $a$ dominates $b$. From (32), the larger $C(A, B)$ is, the more solutions of B are dominated by at least one solution of A. If $C(A, B) = 1$, all solutions of B are dominated by at least one solution in A; if $C(A, B) = 0$, no solution of B is dominated by any solution of A. Note that $C(A, B)$ is not necessarily equal to $1 - C(B, A)$; thus, both $C(A, B)$ and $C(B, A)$ are calculated. If $C(A, B)$ is large and $C(B, A)$ is small, A is better than B, in a sense.
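The C-metric of (32) for minimization objectives can be computed as in the following sketch; the dominance test and the example arrays are illustrative.

```python
import numpy as np

def dominates(a, b):
    """a dominates b for minimization objectives: no worse in all objectives, strictly better in one."""
    return np.all(a <= b) and np.any(a < b)

def c_metric(A, B):
    """C(A, B) of Eq. (32): fraction of solutions in B dominated by at least one solution in A.
    A and B are arrays of shape (num_solutions, num_objectives)."""
    dominated = sum(any(dominates(a, b) for a in A) for b in B)
    return dominated / len(B)

# Example with two objectives (lazy robot ratio, energy):
A = np.array([[0.0, 10.2], [0.0, 10.5]])
B = np.array([[0.1, 10.3], [0.0, 10.1]])
print(c_metric(A, B))   # 0.5: only the first solution of B is dominated
```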
Execution time [37]: The execution time is defined as the time from the start of producing solutions to the final time when no more solutions are obtained or when the iteration number reaches its limit while executing an algorithm. The shorter the execution time, the more competitive the algorithm is likely to be for real-world applications.
The numerical simulation results of the Pareto front of all methods under S1–S5 are given in Figure 8, and the average objective values calculated by the solutions of the Pareto front are shown in Table 2. It is noted that the diverse solutions of single-objective optimization methods are obtained by tuning different sets of weights for calculating the sum of the two objectives as the objective/fitness function, and the non-dominated solutions can then be picked out just like the Pareto front obtained by the multi-objective optimization methods. DKM-MFPG only has one certain solution due to the fact that the DKM dispatcher has determined the minimal lazy robot ratio dispatch; meanwhile, MFPG directly drives robots to reach the corresponding path with optimal energy. The values of the C-metric and the execution time are given in Table 3 and Table 4, respectively.
Comparison of DKM-MFPG with baselines. From Figure 8a, all methods find nearly the same non-dominated solution in S1. This is because S1 only contains four robots and six weld lines, a small scenario with limited exploration space. From Figure 8b–e with more than ten weld lines in S2–S5, more than one solution can be obtained by the other four methods due to the variant number of robots and weld lines, and variant length and placement direction of weld lines. From Figure 8 and Table 2, it is indicated that the solution of DKM-MFPG outperforms most of the solutions of the other methods on the lazy robot ratio, maintaining approximate performance in terms of the energy consumption compared to the other methods. Specifically, DKM-MFPG reduces the average lazy robot ratio by 11.30%, 10.99%, 8.27%, and 10.39% compared to baselines, respectively. The reason for this lies in the fact that the DKM dispatcher can dynamically and immediately dispatch new tasks for robots that have just finished their previous tasks and tend to be lazy; thus, the lazy robot ratio can be decreased.
Moreover, from Table 3, except for S1, more than 80% of the C-metric values of DKM-MFPG with respect to the other four methods are over 0, and nearly all the C-metric values of the other methods with respect to DKM-MFPG are 0. This means that few of the solutions obtained by the other methods dominate DKM-MFPG’s. This is also due to the fact that the DKM-MFPG algorithm’s lazy robot ratio is almost zero, so that a dominated solution exists only if other baseline algorithms reach the same lazy robot ratio and have less energy consumption.
From Table 4, the average execution time of DKM-MFPG is the lowest, saving 84.53%, 53.57%, 91.20%, and 97.45% of the time by comparison. This is because the pre-training and end-to-end optimization of the reinforcement learning model effectively streamline the execution procedure.

4.4. Discussion

Compared with the baselines, DKM-MFPG achieves close performance on energy consumption and an improvement of around 10% on the lazy robot ratio. The reason is that the DKM dispatcher prioritizes the lazy robot ratio through its dynamic dispatch mechanism; therefore, the lazy robot ratio is always 0, and thus it has an advantage over the baselines. In addition, the execution time of DKM-MFPG also has an obvious advantage over the baselines, saving more than 50% of the time. This is because reinforcement learning methods train the model first and then directly output the optimal policy according to the trained model and real-time input, which streamlines the execution procedure. To sum up, our designed DKM-MFPG approach successfully achieves satisfactory results for the coordinated ship-welding task compared with the baselines.
There are some known limitations of the proposed DKM-MFPG approach. Compared with heuristic search methods such as GA, NSGA-III, and AnD, DKM-MFPG lacks solution diversity because the two objectives are handled separately and the DKM dispatcher fixes the dispatch. Furthermore, as a reinforcement learning-based approach implemented with neural networks, it requires a certain level of computing power to process large amounts of data, a limitation shared with heuristic and other learning-based methods.
When implementing DKM-MFPG in real-world ship-building environments, the following steps can be taken: (Step 1) Construct the mapping from the real-world environment to the simulator. (Step 2) Set the number of robots and the number, lengths, and placements of the weld lines in the simulator according to the real-world specifications. (Step 3) Apply the DKM-MFPG algorithm and obtain the trajectory data of all robots. (Step 4) Deploy the trajectory data on the real-world robots to guide their motion. Compared with the simulation, the following uncertainties can be encountered in experiments: (1) Sensor noise. Owing to limited sensor precision and measurement noise, the position information cannot always be exact. This uncertainty can be handled by constructing the grid world with a larger grid length, so that sensing errors within a certain range do not affect the choice of a specific grid point (see the sketch below). (2) Actuator faults. When executing the control inputs given by DKM-MFPG, actuators cannot always carry out the given commands precisely. To deal with this, actuator faults can be detected and estimated in advance, and compensation can be designed in the planner to offset their impact.
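A minimal sketch of the grid-snapping idea behind uncertainty (1): a noisy position measurement is quantized to the nearest grid point, so any error smaller than half the grid length leaves the selected cell unchanged. The grid length, origin, and coordinates below are illustrative assumptions rather than values from the paper.

```python
# Snap a noisy (x, y) measurement to the nearest point of a uniform grid so that
# sensing errors below half the grid length do not change the selected grid cell.

def to_grid(position, grid_length=0.1, origin=(0.0, 0.0)):
    """Return the integer grid indices of the cell closest to the measured position."""
    return tuple(round((p - o) / grid_length) for p, o in zip(position, origin))

true_point = (1.20, 0.50)   # intended grid point
noisy_read = (1.23, 0.47)   # 3 cm sensing error on each axis
print(to_grid(true_point))  # (12, 5)
print(to_grid(noisy_read))  # (12, 5) -- same cell despite the noise
```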

5. Conclusions

This paper deals with the coordinated ship-welding task, with the objectives of optimal lazy robot ratio and energy consumption. A DKM-MFPG reinforcement learning algorithm is designed to accomplish the task with the minimal lazy robot ratio and energy by means of the DKM dispatcher and the MFPG motion planner, respectively. Theoretical analyses of the DKM dispatching and the MFPG network weight convergence are provided, and the proposed algorithm is validated by numerical simulations under different scenarios and comparisons with different methods, showing good performance.
For ongoing research on further improving the performance of DKM-MFPG, increasing the solution diversity and operating the robots along nonlinear motion trajectories are potential directions. The solution diversity of DKM-MFPG may be improved by integrating multi-objective optimization methods with the reinforcement learning cost, so that more sets of dispatches can be obtained.
Note that the DKM-MFPG algorithm optimizes the lazy robot ratio and energy consumption, focusing on upper-level dispatch and motion planning in the grid-world environment described above. Operating welding robots in environments with more complex motion dynamics is an interesting lower-level control problem, which may be addressed by adding restriction conditions to the optimization in DKM-MFPG.
Moreover, constraints such as robot battery life, tool wear, and path overlap between robots are important considerations in the coordinated ship-welding problem; hence, future work will be devoted to designing reinforcement learning-based approaches under such constraints.

Author Contributions

Conceptualization, R.Y. and Y.-Y.C.; methodology, R.Y. and Y.-Y.C.; software, R.Y.; validation, R.Y.; formal analysis, R.Y. and Y.-Y.C.; investigation, R.Y.; resources, Y.-Y.C.; data curation, R.Y.; writing—original draft preparation, R.Y.; writing—review and editing, R.Y. and Y.-Y.C.; visualization, R.Y.; supervision, Y.-Y.C.; funding acquisition, Y.-Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 62373097 and the Research Fund for Advanced Ocean Institute of Southeast University (Key Program) under grant number KP202408.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data presented in this study were obtained from simulation experiments. The algorithms used for these simulations are transparently described in the paper, and readers can independently design algorithms to generate similar data based on the provided pseudo-code.

Acknowledgments

The authors are highly thankful to Jianhong Wang from Imperial College London, United Kingdom, for providing continuous moral and technical support.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. An example of the coordinated ship-welding task.
Figure 2. Motion of robots on the gantry.
Figure 3. The DKM-MFPG framework.
Figure 4. An example of the DKM dispatcher for Case 2.
Figure 5. Trajectories of robots for S1–S5 under DKM-MFPG.
Figure 6. Number of non-lazy robots for S1–S5 under DKM-MFPG.
Figure 7. Time histories of state, action, energy, and weights of DKM-MFPG under S1–S5.
Figure 8. Pareto front of all methods under S1–S5.
Table 1. Specifications of 5 coordinated ship-welding scenarios.

| Scenario | Robot Number | Weld Line Number | Weld Line Length | Weld Line Placement |
|---|---|---|---|---|
| S1 | 4 | 6 | 0.2 | Vertical |
| S2 | 5 | 12 | 0.1, 0.2, 0.3 | Vertical |
| S3 | 5 | 12 | 0.2 | Horizontal |
| S4 | 5 | 13 | 0.1, 0.2, 0.3 | Vertical and Horizontal |
| S5 | 5 | 13 | 0.141, 0.2, 0.282, 0.423 | Vertical and Slope |
Table 2. Average objective values of all methods under S1–S5.

Average energy consumption:

| Scenario | GA | MA | NSGA-III | AnD | DKM-MFPG |
|---|---|---|---|---|---|
| S1 | 5.31 | 5.17 | 4.96 | 5.17 | 5.66 |
| S2 | 10.11 | 10.38 | 10.27 | 9.82 | 9.52 |
| S3 | 9.84 | 10.62 | 10.21 | 9.72 | 10.01 |
| S4 | 15.53 | 15.73 | 15.09 | 15.18 | 14.69 |
| S5 | 36.90 | 35.70 | 37.20 | 37.45 | 37.44 |
| Average | 15.54 (+0.47%) | 15.52 (+0.36%) | 15.55 (+0.53%) | 15.47 (+0.03%) | 15.46 |

Average lazy robot ratio:

| Scenario | GA | MA | NSGA-III | AnD | DKM-MFPG |
|---|---|---|---|---|---|
| S1 | 9.79% | 12.50% | 23.19% | 14.86% | 0.00% |
| S2 | 21.88% | 21.22% | 6.54% | 22.02% | 0.00% |
| S3 | 4.19% | 3.25% | 1.62% | 3.24% | 0.00% |
| S4 | 9.53% | 6.86% | 7.11% | 6.69% | 0.00% |
| S5 | 11.09% | 11.12% | 2.90% | 5.17% | 0.00% |
| Average | 11.30% (+11.30%) | 10.99% (+10.99%) | 8.27% (+8.27%) | 10.39% (+10.39%) | 0.00% |
Table 3. C-metric values C(A, B) between methods under S1–S5 (rows: set A; columns: set B).

| Scenario | A | GA | MA | NSGA-III | AnD | DKM-MFPG |
|---|---|---|---|---|---|---|
| S1 | GA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| S1 | MA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| S1 | NSGA-III | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| S1 | AnD | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| S1 | DKM-MFPG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| S2 | GA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| S2 | MA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| S2 | NSGA-III | 0.33 | 0.33 | 0.00 | 0.00 | 0.00 |
| S2 | AnD | 0.33 | 0.33 | 0.00 | 0.00 | 0.00 |
| S2 | DKM-MFPG | 0.67 | 1.00 | 1.00 | 0.67 | 0.00 |
| S3 | GA | 0.00 | 0.67 | 0.33 | 0.25 | 0.00 |
| S3 | MA | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 |
| S3 | NSGA-III | 0.33 | 0.00 | 0.00 | 0.25 | 0.00 |
| S3 | AnD | 0.67 | 1.00 | 0.67 | 0.00 | 0.00 |
| S3 | DKM-MFPG | 0.33 | 0.67 | 0.33 | 0.50 | 0.00 |
| S4 | GA | 0.00 | 0.75 | 0.00 | 0.50 | 0.00 |
| S4 | MA | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 |
| S4 | NSGA-III | 1.00 | 0.75 | 0.00 | 0.75 | 0.00 |
| S4 | AnD | 0.33 | 0.75 | 0.33 | 0.00 | 0.00 |
| S4 | DKM-MFPG | 0.67 | 0.75 | 0.67 | 0.75 | 0.00 |
| S5 | GA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| S5 | MA | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 |
| S5 | NSGA-III | 0.33 | 0.00 | 0.00 | 0.33 | 0.00 |
| S5 | AnD | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 |
| S5 | DKM-MFPG | 0.33 | 0.00 | 0.33 | 0.33 | 0.00 |
Table 4. Execution time between methods under S1–S5 (unit: second).

| Scenario | GA | MA | NSGA-III | AnD | DKM-MFPG |
|---|---|---|---|---|---|
| S1 | 8.75 | 4.02 | 21.98 | 80.97 | 1.46 |
| S2 | 14.68 | 4.86 | 25.38 | 86.15 | 2.28 |
| S3 | 14.10 | 4.39 | 24.19 | 83.71 | 1.68 |
| S4 | 16.23 | 5.01 | 25.87 | 87.62 | 2.43 |
| S5 | 16.97 | 5.28 | 26.85 | 89.91 | 3.09 |
| Average | 14.15 (+84.53%) | 4.71 (+53.57%) | 24.85 (+91.20%) | 85.67 (+97.45%) | 2.19 |