Article

Automatic Alignment Method of Underwater Charging Platform Based on Monocular Vision Recognition

College of Mechanical and Electrical Engineering, Harbin Engineering University, Harbin 150001, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2023, 11(6), 1140; https://doi.org/10.3390/jmse11061140
Submission received: 13 April 2023 / Revised: 17 May 2023 / Accepted: 27 May 2023 / Published: 29 May 2023
(This article belongs to the Special Issue Autonomous Marine Vehicle Operations)

Abstract

To enhance the covertness and operational efficiency of unmanned underwater vehicle (UUV) charging, we propose an automatic alignment method for an underwater charging platform based on monocular vision recognition. The method accurately identifies the UUV number and guides the charging stake to insert smoothly into the charging port of the UUV through target recognition. To decode the UUV's identity information even under challenging imaging conditions, an encryption encoding method containing redundant information and an ArUco code reconstruction method are proposed. To address the challenge of determining the target location underwater, a target location determination method based on deep learning and the law of refraction is proposed, which determines the two-dimensional coordinates of the underwater target location from the UUV's target spray position. To meet real-time control requirements in the harsh underwater imaging environment, we propose a target recognition algorithm that guides the charging platform towards the target direction. Practical underwater alignment experiments demonstrate the method's strong real-time performance and its adaptability to underwater environments. The final alignment error is approximately 0.5548 mm, meeting the required alignment accuracy and ensuring successful alignment.

1. Introduction

Unmanned underwater vehicles (UUVs) play an irreplaceable role in various fields, serving as oceanic equipment suitable for underwater tasks. They are widely utilized for tasks including seafood fishing, subsea pipeline tracking, seafloor mapping, submarine cable laying, and marine resource exploration. However, as the scope of UUV missions continues to expand, the issue of their endurance has become a focal point. Due to the limited energy carried by UUVs, frequent charging becomes necessary. However, surface charging not only reduces the operational efficiency of UUVs and increases costs but also compromises their stealth capabilities during mission execution [1,2]. To address this challenge, underwater charging platforms have emerged, enabling UUVs to recharge without the need to surface. Currently, there is a wealth of research on UUV docking. Researchers have utilized navigation systems such as acoustics [3,4], optics [5], and electromagnetics [6,7] to guide UUVs into docking stations (DSs). However, to the best of our knowledge, there are few methods available for guiding the charging stake to accurately insert into the UUV’s charging port after docking. Achieving automatic alignment of underwater charging platforms is a current trend in the development of underwater equipment technology and holds significant research and practical value.
Although there is limited research on automatic alignment of underwater charging platforms, the process of inserting the charging stake into the UUV’s charging port can be conceptualized as a peg-in-hole assembly. The solution to this problem can be broadly categorized into contact-based and non-contact-based methods. Contact-based methods [8] typically involve the end of the shaft contacting the plane where the hole is located, followed by using a force sensor to search for the hole’s position on the plane. This method is less safe and can damage the outer surface of the UUV. Non-contact-based methods can be divided into methods based on laser alignment instruments, acoustic sensors, and vision sensors. The core components of a laser alignment instrument are a semiconductor laser that emits laser beams and a photoelectric semiconductor position detector that collects information about the position of the laser spot [9]. Therefore, precise alignment can be achieved by installing laser alignment instruments on the hole axis. However, small planktonic organisms in the water scatter the light, reducing alignment accuracy, and suspended particles can block the laser entirely, preventing the alignment process from continuing. Due to the slow attenuation of sound waves underwater, acoustic sensors are widely used in various underwater positioning and navigation tasks [4], but their alignment accuracy is lower in short-distance scenarios. Vision-based alignment, on the other hand, corrects alignment deviations through visual feedback, providing positioning and guidance for fragile or easily disturbed objects without physical contact [10]. It exhibits robustness in underwater environments and meets alignment requirements in terms of accuracy. Therefore, we adopt a vision-based non-contact alignment approach to guide the charging stake into the UUV’s charging port.
Alignment operations using vision sensors have been widely studied in various fields. For instance, Fan et al. [11] proposed a laser-vision-sensor-based method for initial point alignment of narrow weld seams by utilizing the relationship between laser streak feature points and initial points. They obtain a high-signal-to-noise-ratio image of the narrow weld seam using the laser vision sensor, and then calculate the 3D coordinates of the final image feature point and the initial point based on the alignment model. Finally, they control the actuator to achieve the initial point alignment. In another study, Chen et al. [12] developed an automatic alignment method for tracking the antenna of an unmanned aerial vehicle (UAV) using computer vision technology. The antenna angle is adjusted using the relative position between the center of the UAV image and the center of the camera image fixed on the antenna. The two image centers overlap during antenna alignment. Similarly, Kim [13] designed a vision system that uses three cameras to locate the wafer’s position for wafer alignment.
Underwater image processing techniques are of great importance for underwater charging platform alignment because underwater images suffer from low contrast, blurred edges, a blue-green color cast, and other problems. Traditional underwater image processing techniques can be divided into two categories: image enhancement and image restoration. Image enhancement algorithms [14] include histogram equalization, white balance, Retinex, wavelet transform, etc. These algorithms can render underwater objects clearer by enhancing image contrast and denoising. Image restoration techniques recover images by solving two unknown variables in the Jaffe–McGlamery [15,16] underwater imaging model: the transmission map and background light. One example is the dark channel prior (DCP) algorithm proposed by He et al. [17,18], which simplifies the Jaffe–McGlamery model by introducing the prior knowledge that the dark channel value of a clear, fog-free image is close to zero. Variants of the DCP algorithm [19,20,21,22] have been developed and optimized over time, achieving better results. In recent years, convolutional neural networks have made remarkable achievements in multiple fields such as image classification [23], object detection [24], and instance segmentation [25]. Increasingly, many networks are being used to process underwater images. Some of these networks are end-to-end [26,27,28], which output the recovered image directly after inputting the original image, while others use deep learning to derive some of the physical parameters of the underwater imaging model and then perform image restoration [29]. These methods perform well and are robust but are not suitable for situations with limited hardware resources.
Accurate pose estimation is a primary prerequisite for successful alignment. Pose estimation technology recovers the position and orientation of an object from the correspondence between its image and its features [30]; these features can be divided into three types: corners, lines, and ellipses (circles). Luckett et al. [31] compared the performance of these three features and found that the accuracy and precision of corner and line features increase as the distance decreases, but in high-noise environments, ellipse features have the strongest robustness. To address the issue of ellipse detection accuracy, Zhang et al. [32] improved the circle-based ellipse detection method and designed a sub-pixel edge-based ellipse detection method. This improved the accuracy of ellipse detection, especially when the ellipse is incomplete, and was the first work to show that more accurate ellipse edges improve ellipse detection accuracy. Huang et al. [33] proposed a universal circle and point fusion framework that can solve pose estimation problems with various feature combinations, combining the advantages of both features with high accuracy and robustness. Meng et al. [30] proposed a perspective circle and line (PCL) method that uses the perspective view of a single circle and line to recover the position and orientation of an object, which solves the duality and restores the roll angle.
We proposed an automatic alignment method for an underwater charging platform based on monocular vision recognition. After the UUV enters the underwater charging platform, the method accurately identifies its number and guides the charging platform to move towards the target direction using target recognition. This is achieved by calculating the deviation between the current position of the target keypoints and the target position, which aligns the charging stake with the UUV’s charging port. The main contributions of this paper are as follows:
1. A single-camera visual-recognition-based UUV underwater alignment method is proposed that includes an encoding and decoding method for encrypted graphic targets, a method for determining the two-dimensional coordinates of the target location, and a target recognition algorithm, which can guide the charging stake on the charging platform to smoothly insert into the UUV’s charging port.
2. The method can adapt to underwater environments and has certain robustness to partial occlusion. Additionally, this method requires less computational resources, lower hardware requirements, shorter processing times, and satisfies real-time control requirements. Moreover, the detection accuracy of this method meets the requirements for smooth alignment.
The rest of this paper is organized as follows. Section 2 describes the proposed single-camera visual-recognition-based UUV underwater alignment method in detail. Section 3 presents the experimental results of the proposed method. In Section 4, we analyze the experimental results from Section 3 and describe the shortcomings of our method. Finally, Section 5 presents our conclusions.

2. Methods

The structure of the charging platform utilized in this method is illustrated in Figure 1, where both the camera and charging stake are attached to the axial sliding table, which is fixed on the circumferential turntable. Meanwhile, the UUV is secured onto the alignment platform. Due to the positioning of the alignment platform, the camera’s distance from the UUV target at the target position is a known, fixed value. Consequently, the two-dimensional coordinates of the target’s keypoints on the UUV remain unchanged at the target position. By comparing the two-dimensional coordinates of the target keypoints at the current position and the target position, the direction of motion of the charging stake can be determined.
The proposed method comprises three stages, as shown in Figure 2. In the first stage, the UUV’s identity information is decoded to obtain its number, which serves as an index to retrieve the registered information of the UUV within the charging platform. This information includes the UUV’s charging voltage, size, and target spray position. Subsequently, the UUV is firmly clamped onto the alignment platform by the clamping device of the underwater charging platform. In the second stage, the UUV’s size and target spray position are obtained based on the retrieved information, and the target position for docking is determined. This target position is the two-dimensional coordinate on the camera imaging plane where the keypoints of the UUV’s target are located when the charging stake on the underwater charging platform can be inserted into the UUV’s charging port. In the third stage, the keypoints of the UUV’s target are recognized, and the charging stake is guided to move towards the target position by calculating the deviation between the current position and the target position. The stake is first aligned circumferentially and then aligned axially until the distance between the current position and the target position is within the allowable error range, as shown in Figure 3.
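To make the control flow concrete, the following is a minimal, hypothetical sketch of the three-stage procedure. All interface names such as `camera`, `platform`, and `registry` are ours and stand in for the real hardware and the UUV registration database; they are not from the paper.

```python
def align_charging_stake(camera, platform, registry, tol_px=2.0):
    """Hypothetical sketch of the three-stage alignment flow described above."""
    # Stage 1: decode the UUV number and retrieve its registered information
    uuv_id = decode_uuv_id(camera.capture())        # Section 2.1
    info = registry[uuv_id]                         # charging voltage, size, spray position
    platform.clamp_uuv()

    # Stage 2: determine the target keypoint coordinates on the imaging plane
    target_uv = determine_target_position(info)     # Section 2.2

    # Stage 3: circumferential alignment first, then axial alignment
    while True:
        current_uv = detect_target_keypoints(camera.capture())   # Section 2.3
        dx = current_uv[0] - target_uv[0]
        dy = current_uv[1] - target_uv[1]
        if abs(dx) > tol_px:
            platform.rotate_circumferentially(-dx)  # reduce the x deviation first
        elif abs(dy) > tol_px:
            platform.move_axially(-dy)              # then reduce the y deviation
        else:
            break                                   # within the allowable error
    platform.insert_charging_stake()
```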

2.1. Encoding and Decoding

2.1.1. Encoding

Before UUVs can charge or exchange information with underwater charging platforms, their identity information should be determined to identify their model and charging voltage, to confirm their mission type, and to ensure secure information exchange. Therefore, a UUV identity information encryption and coding method is necessary to ensure information security. Firstly, the UUV number is expanded to three digits, with leading zeros added if necessary; the UUV number is denoted as A_1A_2A_3 (A_1, A_2, A_3 ∈ [0, 9], A_1, A_2, A_3 ∈ Z). Next, four coding values are obtained: A_1A_2, A_2A_3, A_3A_1, and A_1 + A_2 + A_3. These coding values are used to query their corresponding ArUco codes, which are graphic codes obtained by converting the numeric codes. Finally, the four ArUco codes obtained from the previous step are rotated clockwise by 0°, 90°, 180°, and 270°, respectively, so that each coding value also carries position information through the ArUco code's pose. The resulting encoding pattern is shown in Figure 4. This encoding method has some redundancy; when decoding, it is only necessary to recognize the position and ID of any two of the four coding values to infer the UUV number.
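As an illustration, a minimal sketch of this encoding and decoding scheme might look as follows. Function names are ours, and rotations are assumed to be snapped to the nearest multiple of 90° before decoding.

```python
def encode_uuv_number(number: int):
    """Compute the four redundant coding values and their rotations
    for a UUV number, following the scheme described above."""
    digits = f"{number:03d}"                      # expand to three digits, e.g. 123 -> "123"
    a1, a2, a3 = (int(d) for d in digits)
    coding_values = [
        int(f"{a1}{a2}"),                         # A1A2 -> 12
        int(f"{a2}{a3}"),                         # A2A3 -> 23
        int(f"{a3}{a1}"),                         # A3A1 -> 31
        a1 + a2 + a3,                             # sum  -> 6
    ]
    rotations = [0, 90, 180, 270]                 # clockwise rotation encodes the slot
    return list(zip(coding_values, rotations))

def decode_uuv_number(detections):
    """Recover the UUV number from any two detected (aruco_id, rotation) pairs,
    with rotations already snapped to multiples of 90 degrees."""
    slots = {rot // 90: aruco_id for aruco_id, rot in detections}
    # return the first UUV number consistent with the observed slots
    for n in range(1000):
        expected = {rot // 90: v for v, rot in encode_uuv_number(n)}
        if all(expected[s] == v for s, v in slots.items()):
            return n
    return None

print(encode_uuv_number(123))                     # [(12, 0), (23, 90), (31, 180), (6, 270)]
print(decode_uuv_number([(31, 180), (6, 270)]))   # 123
```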

2.1.2. Decoding

Due to the challenging imaging conditions, the ArUco codes in the original images cannot be recognized directly. Therefore, this paper proposes a method for ArUco code detection, which involves image restoration, thresholding, and template-filling techniques to reconstruct the ArUco codes in the respective regions. The specific steps are illustrated in Figure 5.
In the first step, the original image is subjected to image restoration. The core of the image restoration method involves solving for two unknowns, t(x, λ) and B(λ), based on the underwater imaging model as represented by Equation (1). We adopt the method proposed in [19] and use the difference between the red channel and the maximum value of the blue and green channels to estimate the transmission of the red channel. The transmission map of the blue and green channels is then obtained based on statistical analysis [34], as shown in Equation (2). Furthermore, we apply the gray world assumption theory [9] in the field of image restoration to estimate the background light, as shown in Equation (3). By substituting the calculated values of t(x, λ) and B(λ) into Equation (1), the restored image can be obtained.
$$
I(x,\lambda) = J(x,\lambda)\,t(x,\lambda) + B(\lambda)\bigl(1 - t(x,\lambda)\bigr), \quad \lambda \in \{R, G, B\} \tag{1}
$$
where I(x, λ) is the original image, J(x, λ) is the restored image, t(x, λ) is the transmission map, B(λ) is the background light, and λ is the wavelength of light.
$$
\begin{aligned}
D(x) &= \max_{x \in \Omega,\ \lambda = R} I(x,\lambda) - \max_{x \in \Omega,\ \lambda \in \{B,G\}} I(x,\lambda) \\
t(x,R) &= D(x) + 1 - \max_{x} D(x) \\
c(\lambda) &= -0.00113\,\lambda + 1.62571, \quad \lambda \in \{R, G, B\} \\
t(x,G) &= t(x,R)^{c(G)/c(R)}, \qquad t(x,B) = t(x,R)^{c(B)/c(R)}
\end{aligned} \tag{2}
$$
where Ω is a local patch in the image.
$$
B(\lambda) = \frac{\overline{I(x,\lambda)} - M \cdot \overline{t(x,\lambda)}}{1 - \overline{t(x,\lambda)}}, \quad \lambda \in \{R, G, B\} \tag{3}
$$
where M is a constant value that represents the desired mean gray value of the restored image, which is set to 0.5 in this paper.
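The restoration pipeline of Equations (1)–(3) could be sketched roughly as follows. This is not the authors' implementation; the patch size, the assumed channel wavelengths, and the clipping behavior are our assumptions.

```python
import cv2
import numpy as np

def restore_underwater_image(img_bgr, patch=15, M=0.5):
    """Sketch of Equations (1)-(3): transmission from the red-channel difference
    prior, background light from the gray world assumption, then invert the model."""
    I = img_bgr.astype(np.float32) / 255.0
    b, g, r = cv2.split(I)

    # D(x): difference between the local max of red and the local max of blue/green
    kernel = np.ones((patch, patch), np.uint8)
    max_r = cv2.dilate(r, kernel)                 # local maximum filter
    max_bg = cv2.dilate(np.maximum(b, g), kernel)
    D = max_r - max_bg

    # Transmission of the red channel, then of green/blue via the c(lambda) ratios
    t_r = np.clip(D + 1.0 - D.max(), 0.1, 1.0)
    c = lambda lam: -0.00113 * lam + 1.62571
    wavelengths = {"R": 620.0, "G": 540.0, "B": 450.0}   # nm, our assumption
    t = {"R": t_r,
         "G": t_r ** (c(wavelengths["G"]) / c(wavelengths["R"])),
         "B": t_r ** (c(wavelengths["B"]) / c(wavelengths["R"]))}

    # Background light per channel from the gray world assumption (Equation (3)),
    # then invert the imaging model (Equation (1)): J = (I - B(1 - t)) / t
    restored = {}
    for name, chan in {"R": r, "G": g, "B": b}.items():
        t_mean = t[name].mean()
        B_lam = (chan.mean() - M * t_mean) / (1.0 - t_mean)
        restored[name] = np.clip((chan - B_lam * (1.0 - t[name])) / t[name], 0.0, 1.0)

    out = cv2.merge([restored["B"], restored["G"], restored["R"]])
    return (out * 255).astype(np.uint8)
```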
In the second step, the restored image is subjected to a thresholding operation. Although the contrast of the restored image has been improved, it still does not meet the recognition criteria for ArUco codes. Traditional methods utilize contrast enhancement and image binarization techniques to assist in ArUco code recognition. However, these operations can amplify the noise in the image and result in the failure of ArUco code recognition. Therefore, this paper proposes an improved approach to local thresholding, as shown in Equation (4). This method creates a small window w_1 and a large window w_2 around each pixel in the image. It compares the mode of the pixel grayscale values in w_1 with the average grayscale value of the pixels in w_2 and assigns the center pixel a new grayscale value accordingly. Compared to traditional binarization methods, this approach has the advantage of using the mode of the grayscale values in a small window centered around each pixel for comparison, which helps with noise reduction. Additionally, it categorizes all pixels into five classes instead of two (0 and 255), resulting in smoother transitions between grayscale values and enhancing the robustness of the ArUco code recovery process in the third step.
$$
d_1 = \operatorname{mode}(w_1), \qquad d_2 = \operatorname{mean}(w_2), \qquad
I(i,j) =
\begin{cases}
0, & d_2 - d_1 > 0.5 \\
60, & 0.1 < d_2 - d_1 \le 0.5 \\
180, & 0.1 < d_1 - d_2 \le 0.5 \\
255, & d_1 - d_2 > 0.5 \\
125, & \text{otherwise}
\end{cases} \tag{4}
$$
where w_1 and w_2 are sliding windows with radii d_{w1} and d_{w2}, as shown in Figure 6. In this paper, d_{w2} = 10 d_{w1}.
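A direct, unoptimized sketch of Equation (4) is given below, assuming grayscale values normalized to [0, 1] for the comparison and using a generic sliding-window filter for the mode; the window radius is an assumed default.

```python
import numpy as np
from scipy import ndimage

def five_level_threshold(gray_u8, r1=2):
    """Sketch of the improved local thresholding of Equation (4);
    r1 is the small-window radius and r2 = 10 * r1."""
    g = gray_u8.astype(np.float32) / 255.0
    r2 = 10 * r1

    def window_mode(values):
        # mode of the (quantized) grayscale values in the small window w1
        counts = np.bincount((values * 255).astype(np.int64), minlength=256)
        return counts.argmax() / 255.0

    d1 = ndimage.generic_filter(g, window_mode, size=2 * r1 + 1)   # mode of w1
    d2 = ndimage.uniform_filter(g, size=2 * r2 + 1)                # mean of w2

    out = np.full_like(gray_u8, 125)            # "otherwise" class
    diff = d2 - d1
    out[diff > 0.1] = 60
    out[diff > 0.5] = 0
    out[-diff > 0.1] = 180
    out[-diff > 0.5] = 255
    return out
```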
In the last step, the ArUco codes are reconstructed. After obtaining the thresholded image, the regions of interest (ROIs) containing the ArUco codes can be determined. Each ROI is then divided into 36 equally sized rectangles, with the central 16 rectangles containing the encoding information of the ArUco codes. The color of the corresponding blank positions in the template is determined based on the average grayscale value within each rectangle. By filling in the template, the reconstructed ArUco codes are obtained. Finally, the ID and angle of the reconstructed ArUco codes are identified.
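A minimal sketch of the template-filling step follows; the 6 × 6 grid with a 4 × 4 inner code mirrors the description above, while the mid-gray decision threshold and the cell size of the rebuilt template are our assumptions. The rebuilt pattern can then be passed to a standard ArUco detector to read its ID and angle.

```python
import numpy as np

def reconstruct_aruco(roi_thresholded, cells=6):
    """Rebuild a clean ArUco bit pattern from a thresholded ROI by averaging
    the grayscale value inside each of the 6x6 grid cells."""
    h, w = roi_thresholded.shape
    bits = np.zeros((cells, cells), dtype=np.uint8)
    for i in range(cells):
        for j in range(cells):
            cell = roi_thresholded[i * h // cells:(i + 1) * h // cells,
                                   j * w // cells:(j + 1) * w // cells]
            # a cell is treated as "white" if its mean gray value is above mid-gray
            bits[i, j] = 1 if cell.mean() > 127 else 0
    bits[0, :] = bits[-1, :] = bits[:, 0] = bits[:, -1] = 0   # black ArUco border
    # paint the clean template (here 20 px per cell) for the ArUco detector
    template = np.kron(bits * 255, np.ones((20, 20), dtype=np.uint8))
    return template
```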

2.2. Determination of Target Position

Due to the complex underwater environment, fixing all UUVs to a position that allows the charging stake to be smoothly inserted into their charging port and recording the two-dimensional coordinates of the current target keypoints would require a lot of manpower and resources. Therefore, this paper proposes a method to determine the two-dimensional coordinates of the target keypoints underwater based on the target spraying positions on the UUV. This method first determines the above-water coordinates of the target position based on the target spraying position and then uses the law of refraction to determine the underwater coordinates.

2.2.1. Above-Water Coordinates of the Target Position

The schematic diagram of the two-dimensional coordinates of the above-water target position is shown in Figure 7. The two-dimensional coordinates of the above-water target position refer to the coordinates of the keypoint of the target T in the o-uv coordinate system. During the UUV target spraying, the relative position between the keypoint of the target and the UUV charging port can be obtained, that is, the coordinates (X, Y, Z) of the keypoint of the target in the coordinate system O-XYZ. Since the target position of the underwater charging platform is the position where the charging stake can be inserted into the charging port of the UUV, the coordinates (X_W, Y_W, Z_W) of the keypoint of the target in the coordinate system O_w-X_WY_WZ_W are (X, Y, Z + L). The value of L is determined by the type of UUV and can be obtained during the decoding process, since the clamping device of the charging platform in this method fixes and moves the UUV to a specific position. The process of converting coordinates in the coordinate system O_w-X_WY_WZ_W to coordinates in the coordinate system o-uv can be regarded as the camera calibration process. The conversion is shown in Equation (5):
$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
=
\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},
\qquad
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} \alpha_x f & 0 & u_0 \\ 0 & \alpha_y f & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} \tag{5}
$$
where $\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}$ is the camera's extrinsic matrix, which represents the position relationship between the world coordinate system O_w-X_WY_WZ_W and the camera coordinate system O_c-X_cY_cZ_c, and $\begin{bmatrix} \alpha_x f & 0 & u_0 \\ 0 & \alpha_y f & v_0 \\ 0 & 0 & 1 \end{bmatrix}$ is the camera's intrinsic matrix, which represents the transformation relationship between the camera coordinate system O_c-X_cY_cZ_c and the image coordinate system o-uv. The parameters α_x and α_y are the scaling factors in the x and y directions, respectively, f is the camera's focal length, and (u_0, v_0) is the principal point, i.e., the coordinate of the camera's optical center in the image.
The camera’s intrinsic parameters can be obtained and image distortion can be corrected using Zhang’s calibration method [35]. However, due to the installation errors of the camera, it is necessary to accurately determine the camera’s extrinsic matrix through further hand–eye calibration [36].
Inspired by neural network concepts, this paper transforms the camera calibration problem into estimating the function f in Equation (6). To achieve this, the sliding table is moved to capture 25 photos, as shown in Figure 8. The coordinates of the four concentric circle centers in the world coordinate system (X_W^i, Y_W^i, Z_W^i) and their corresponding coordinates in the image coordinate system (u^i, v^i) are recorded for each photo. This resulted in 100 samples, where (X_W^i, Y_W^i, Z_W^i, 1) serve as data and (u^i, v^i) as labels. Fifty percent of the samples were used for training and the remaining fifty percent for testing. A single-hidden-layer neural network without activation functions was constructed, as shown in Figure 9. The mean squared error (MSE) loss function was employed, and the estimated extrinsic matrix was used to initialize the first layer of the network, while the calibrated intrinsic parameters were used to initialize the second layer. The initialized network showed fast convergence, small oscillations, and ultimately converged to a smaller loss value.
$$
(u, v)^{T} = f\!\left((X_W, Y_W, Z_W)^{T}\right) \tag{6}
$$
After training, the network weights consist of two matrices, Ê and Î, with dimensions of 4 × 3 and 3 × 3, respectively. Given a coordinate (X_W, Y_W, Z_W) in the world coordinate system, its corresponding coordinate in the image coordinate system can be calculated using Equation (7). It is worth noting that the trained Ê and Î do not have physical meaning, and the intermediate variables X̂_c, Ŷ_c, and Ẑ_c in Equation (7) are not the coordinates of the point in the camera coordinate system.
$$
(\hat{X}_c, \hat{Y}_c, \hat{Z}_c)^{T} = \hat{E}\,(X_W, Y_W, Z_W, 1)^{T}, \qquad
(\hat{u}, \hat{v}, 1)^{T} = \frac{\hat{I}\,(\hat{X}_c, \hat{Y}_c, \hat{Z}_c)^{T}}{\hat{Z}_c} \tag{7}
$$
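A PyTorch sketch of this calibration network and its training loop is given below. The optimizer choice and module names are our assumptions; the learning-rate schedule follows the settings reported in Section 3.2.1, and nn.Linear stores the 4 × 3 matrix Ê in transposed 3 × 4 form.

```python
import torch
import torch.nn as nn

class CalibrationNet(nn.Module):
    """Two linear layers without activation or bias, mirroring Equation (7):
    a 4->3 'extrinsic' layer and a 3->3 'intrinsic' layer, followed by
    division by the third intermediate component."""
    def __init__(self, E0=None, I0=None):
        super().__init__()
        self.extrinsic = nn.Linear(4, 3, bias=False)   # estimate of E (stored as 3x4)
        self.intrinsic = nn.Linear(3, 3, bias=False)   # estimate of I
        if E0 is not None:   # initialize from the hand-eye / Zhang calibration result
            self.extrinsic.weight.data = torch.as_tensor(E0, dtype=torch.float32)
        if I0 is not None:
            self.intrinsic.weight.data = torch.as_tensor(I0, dtype=torch.float32)

    def forward(self, xw):                      # xw: (N, 4) homogeneous world coordinates
        xc = self.extrinsic(xw)                 # (N, 3) intermediate variables
        uv1 = self.intrinsic(xc) / xc[:, 2:3]   # division by the third component
        return uv1[:, :2]                       # predicted (u, v)

def train(net, xw_train, uv_train, epochs=5000):
    """Training sketch: MSE loss, initial learning rate 0.1 halved every 1000 epochs."""
    opt = torch.optim.SGD(net.parameters(), lr=0.1)          # optimizer is our assumption
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1000, gamma=0.5)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(xw_train), uv_train)
        loss.backward()
        opt.step()
        sched.step()
    return net
```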

2.2.2. Underwater Coordinates of the Target Location

The imaging principle of the underwater camera is shown in Figure 10. In air, the light reflected by the target propagates in a straight line, and an object of size h projects onto the camera imaging plane with size a. Underwater, due to the refraction of light between different media, the same object projects onto the imaging plane with size b. According to Snell's law [37], given in Equation (8), the refractive index of water is 1.333, so α > θ and therefore b > a.
$$
n = \frac{\sin\alpha}{\sin\theta} \tag{8}
$$
where θ represents the angle of incidence, α represents the angle of refraction, and n represents the refractive index of the medium.
It can be inferred that the projection of underwater objects on the imaging plane can be obtained by magnifying the projection of the same objects above water, with the camera center as the projection center, by a certain factor. The magnification factor b/a can be obtained from Equation (9).
$$
\frac{b}{a} = \frac{D + K}{D \cdot \dfrac{\tan\theta}{\tan\alpha} + K} \approx \frac{D + K}{\dfrac{D}{n} + K} \tag{9}
$$
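For illustration, the mapping from above-water to underwater image coordinates implied by Equation (9) can be sketched as a simple scaling about the optical-center projection P; the function names and the use of the small-angle form are our assumptions.

```python
def underwater_magnification(D, K, n=1.333):
    """Small-angle form of Equation (9): b/a = (D + K) / (D/n + K)."""
    return (D + K) / (D / n + K)

def to_underwater_coords(points_uv, P_uv, D, K, n=1.333):
    """Scale above-water image coordinates about the projection P of the
    camera's optical center to predict their underwater positions."""
    r = underwater_magnification(D, K, n)
    px, py = P_uv
    return [(px + r * (u - px), py + r * (v - py)) for u, v in points_uv]

# e.g. with D = 200 mm and K = 10.618 mm the factor is about 1.31,
# consistent with the measured value of 1.308 reported in Section 3.2.2
```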

2.3. Target Recognition and Instruction Provision

This method utilizes a target designed by Tweddle et al. [38], as shown in Figure 11. The target consists of four concentric circles with area ratios of 1.44, 1.78, 2.25, and 2.94, and the keypoints of the target are the centers of the four concentric circles. During the alignment process, the camera image is preprocessed first, followed by contour detection to identify circles that may originate from the target according to their area and roundness. Then, based on the area ratios, the concentric circles are matched, and the coordinate values of each concentric circle center are obtained. Finally, based on the relative position between the visible concentric circle center at the current position and the corresponding target concentric circle center, the charging platform is guided to move until the coordinate difference between the current and target positions is less than the maximum allowable error, at which point its movement is stopped. During the alignment process, the x-coordinate of the concentric circle center is compared first to guide the charging platform to rotate tangentially for azimuthal alignment, followed by the y-coordinate of the concentric circle center to guide the charging platform to move axially for axial alignment.
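A rough OpenCV sketch of the keypoint detection step is shown below; the area, roundness, and tolerance thresholds are our assumptions, not values from the paper.

```python
import cv2
import numpy as np

TARGET_AREA_RATIOS = [1.44, 1.78, 2.25, 2.94]   # from the target design [38]

def detect_target_keypoints(binary_img, min_area=50, min_roundness=0.8, tol=0.08):
    """Sketch: find roughly circular contours, pair those that share a center,
    and identify each marker by its outer/inner area ratio."""
    contours, _ = cv2.findContours(binary_img, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        area = cv2.contourArea(c)
        perim = cv2.arcLength(c, True)
        if area < min_area or perim == 0:
            continue
        roundness = 4 * np.pi * area / (perim * perim)   # 1.0 for a perfect circle
        if roundness >= min_roundness:
            (x, y), _ = cv2.minEnclosingCircle(c)
            candidates.append((area, (x, y)))

    candidates.sort(key=lambda t: t[0])
    keypoints = {}
    for i, (a_in, c_in) in enumerate(candidates):
        for a_out, c_out in candidates[i + 1:]:
            if np.hypot(c_out[0] - c_in[0], c_out[1] - c_in[1]) > 5:
                continue                                  # centers do not coincide
            ratio = a_out / a_in
            k = int(np.argmin([abs(ratio - r) for r in TARGET_AREA_RATIOS]))
            if abs(ratio - TARGET_AREA_RATIOS[k]) < tol * TARGET_AREA_RATIOS[k]:
                keypoints[k] = ((c_in[0] + c_out[0]) / 2, (c_in[1] + c_out[1]) / 2)
    return keypoints   # marker index -> image coordinates of its center
```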
To address the problem of low contrast in underwater images, this paper employs the Niblack binary thresholding method [39]. It calculates the pixel threshold by sliding a rectangular window over the grayscale image [40], as shown in Equation (10). If the pixel value surpasses the threshold, it is set as foreground; otherwise, it is set as background. Because the Niblack binary thresholding algorithm can adaptively adjust the threshold based on local blocks, it can preserve more texture details in the image. However, because local statistics must be computed for every pixel, the Niblack method requires considerable computational resources and processing time.
$$
T = m - k \sqrt{\frac{1}{n} \sum_{i \in \Omega} (p_i - m)^2} \tag{10}
$$
where T represents the pixel threshold, Ω represents the sliding window, n represents the number of pixels in the window, p_i represents the grayscale value of each point in the window, m represents the mean grayscale value of all points in the window, and k represents the correction parameter.
To accelerate the processing speed of Niblack thresholding, this paper is inspired by the ResNet network [41] and adopts a structure similar to the bottleneck architecture that first reduces and then restores the size of the image. As the processing time required for Niblack thresholding is directly proportional to the size of the image, the size of the image is reduced to one quarter of its original size, and then Niblack thresholding is performed. Finally, the thresholded image is enlarged four times to restore its original size. Although the size of this image is the same as that of the original image, its information content is only one quarter of that of the original image, and its contour details are relatively blurred. Directly using this image for contour detection would lower the detection accuracy. Therefore, inspired by the coarse-to-fine idea in the LoFTR algorithm [42], this paper uses the image to determine the region of interest (ROI) and performs Niblack thresholding on the ROI of the original image. This significantly reduces the number of pixels processed by Niblack thresholding, ensuring the real-time performance of the algorithm.
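A sketch of this coarse-to-fine strategy follows; the window size, correction parameter k, scaling factor, and ROI margin are our assumptions.

```python
import cv2
import numpy as np

def niblack(gray, window=25, k=0.2):
    """Plain Niblack thresholding (Equation (10)) via local mean and std."""
    mean = cv2.boxFilter(gray.astype(np.float32), -1, (window, window))
    sq_mean = cv2.boxFilter(gray.astype(np.float32) ** 2, -1, (window, window))
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0))
    T = mean - k * std
    return (gray > T).astype(np.uint8) * 255      # foreground if above the threshold

def coarse_to_fine_niblack(gray, scale=2, margin=20):
    """Threshold a downscaled copy to locate the region of interest, then run
    Niblack only on that ROI of the full-resolution image."""
    small = cv2.resize(gray, None, fx=1.0 / scale, fy=1.0 / scale)
    coarse = niblack(small)

    # bounding box of the foreground pixels in the coarse result -> ROI
    ys, xs = np.nonzero(coarse)
    if len(xs) == 0:
        return niblack(gray)                      # fall back to the full image
    x0 = max(int(xs.min()) * scale - margin, 0)
    y0 = max(int(ys.min()) * scale - margin, 0)
    x1 = min((int(xs.max()) + 1) * scale + margin, gray.shape[1])
    y1 = min((int(ys.max()) + 1) * scale + margin, gray.shape[0])

    out = np.zeros_like(gray)
    out[y0:y1, x0:x1] = niblack(gray[y0:y1, x0:x1])   # fine pass on the ROI only
    return out
```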
To address the problem of partial occlusion caused by bubbles and suspended particles in underwater images, this paper enhances the robustness of the algorithm by utilizing the redundancy of target information. During the alignment process, only one of the four concentric circles in the target needs to be identified; the two-dimensional coordinates of its center are then compared with the center of the corresponding concentric circle (matched by area ratio) at the target position, thus providing the motion instructions in both the circumferential and axial directions. If multiple concentric circle centers are detected, their centroid is used for the comparison.

3. Results

3.1. Result of the Decoding Experiment

In this paper, decoding experiments were conducted underwater. During the encoding process, assuming the UUV is numbered 123, its four coding values would be 12, 23, 31, and 6. These four values were then converted into ArUco codes and rotated by 0°, 90°, 180°, and 270°, respectively, resulting in the final encoded pattern. The ArUco code recognition result is shown in Figure 12.
The proposed image restoration method is compared with image enhancement algorithms based on the gray world assumption theory [43], the UDCP algorithm [44], and the Shallow-UWnet method [28] based on end-to-end convolutional neural networks. The results are shown in Figure 13. Furthermore, the proposed thresholding method is compared with Niblack and Bernsen methods [45]. Table 1 presents the detection results of ArUco codes after being processed by different thresholding methods.

3.2. Result of the Experiment on Determining Target Position

3.2.1. Result of the Experiment on Above-Water Target Position Calibration

Figure 14 shows the loss variation of the proposed network architecture and the network architecture proposed by Cao et al. [46] during the training and testing processes.
The training configuration is as follows: there are a total of 100 samples, with labels (u^i, v^i) and data (X_W^i, Y_W^i, Z_W^i, 1); half of them form the training set and the other half the test set. The initial learning rate is set to 0.1, which decreases by half every 1000 epochs to optimize the training process. The MSE loss function is employed during the training phase. Training ends when the test loss does not decrease for 500 consecutive epochs.

3.2.2. Result of the Experiment on Underwater Target Position Calibration

Based on Figure 15, the two-dimensional coordinates of the underwater target position can be obtained by enlarging the two-dimensional coordinates of the above-water target position about the projection of the camera center on the imaging plane. The enlargement factor can be obtained from Equation (9). To determine the distance between the camera center and the refraction surface, as well as the projection of the camera center on the imaging plane, the projection of the same target on the camera imaging plane was recorded both underwater and above water, with the target located 200 mm from the refraction surface, as shown in Figure 15a. The corresponding keypoints were connected (A_2 with A_1, B_2 with B_1, C_2 with C_1, and D_2 with D_1) and the lines extended; their intersection point P was taken as the projection of the camera's optical center on the imaging plane. The average of the ratios PA_2/PA_1, PB_2/PB_1, PC_2/PC_1, and PD_2/PD_1, which was 1.308, was used as b/a in Equation (9). By substituting D = 200 mm into Equation (9), the distance between the camera center and the refraction surface, K, was obtained as 10.618 mm.
Once the value of K is calculated, the camera's field of view (FOV) can be determined. The camera model used in this study is the LI-IMX185MIPI-CS, with a 1/1.9″-type sensor and a 16:9 image aspect ratio; the corresponding sensor size is 7.3396 mm × 4.1285 mm.
In air, using Equation (11), the camera’s field of view is calculated to be 173.9073 mm × 309.1704 mm.
$$
\begin{aligned}
H_a &= h \times \frac{K + D}{f} = 4.1285 \times \frac{10.618 + 200}{5} = 173.9073\ \text{mm} \\
W_a &= w \times \frac{K + D}{f} = 7.3396 \times \frac{10.618 + 200}{5} = 309.1704\ \text{mm}
\end{aligned} \tag{11}
$$
where W_a and H_a represent the width and height of the FOV, w and h are the width and height of the sensor, respectively, and f denotes the camera's focal length. The definitions of K and D are illustrated in Figure 10.
Underwater, due to the refraction of light between different media, the camera’s FOV will be reduced. Using Equation (12), the camera’s FOV is calculated to be 132.6537 mm × 235.8289 mm.
$$
\begin{aligned}
\tan\alpha_h &= \frac{h}{f}, & H_w &= D \tan\theta_h + K \tan\alpha_h = 132.6537\ \text{mm} \\
\tan\alpha_w &= \frac{w}{f}, & W_w &= D \tan\theta_w + K \tan\alpha_w = 235.8289\ \text{mm}
\end{aligned} \tag{12}
$$
where α_h and θ_h represent the refraction angle and the incident angle in the height direction of the sensor, and α_w and θ_w represent the refraction angle and the incident angle in the width direction of the sensor. These parameters are illustrated in Figure 10.
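The following short numeric check reproduces the values reported for Equations (11) and (12) when tan θ is approximated by tan α / n, as in the small-angle form of Equation (9); it is only a verification sketch, not part of the original method.

```python
def fov_in_air(sensor_dim, f, K, D):
    """Equation (11): object-space field of view in air."""
    return sensor_dim * (K + D) / f

def fov_underwater(sensor_dim, f, K, D, n=1.333):
    """Equation (12) with tan(theta) approximated by tan(alpha) / n."""
    tan_alpha = sensor_dim / f
    return D * (tan_alpha / n) + K * tan_alpha

h, w, f, K, D = 4.1285, 7.3396, 5.0, 10.618, 200.0
print(fov_in_air(h, f, K, D), fov_in_air(w, f, K, D))          # ~173.91 mm, ~309.17 mm
print(fov_underwater(h, f, K, D), fov_underwater(w, f, K, D))  # ~132.65 mm, ~235.83 mm
```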
The accuracy of determining the underwater target position was verified as follows. The projection of the same target on the camera imaging plane was again recorded both underwater and above water, with the target located 260 mm from the refraction surface. With the value of K calculated above, b/a was determined to be 1.313. Using P and b/a, the predicted two-dimensional coordinates of the underwater target position, A̅_2, B̅_2, C̅_2, and D̅_2, were obtained, as shown in Figure 15b. The error between the predicted and true values is presented in Table 2.

3.3. Result of Image Processing

We downsized the images by different scales, applied Niblack binarization, and then resized them back to their original size. The average processing time for each frame is shown in Table 3. The fluctuations of the keypoint detection results for the target at different scaling ratios over 90 consecutive frames are illustrated in Figure 16.

3.4. Result of the Actual Alignment Experiment

During the experiment, the first stage is the circumferential alignment process, where the circumferential hydraulic cylinder moves until the difference between the x coordinates of the current position and the target position is within the allowable error range. The second stage is the axial alignment process, where the axial hydraulic cylinder moves until the difference between the y coordinates of the current position and the target position is within the allowable error range. The third stage is the charging process, where the hydraulic cylinder controls the charging stake to rise and insert into the UUV charging port. The distance between the camera and the alignment platform was 200 mm. The displacement changes of each hydraulic cylinder during the alignment process are shown in Figure 17, Figure 18 and Figure 19, and the distance changes between the current position of the target and the target position are shown in Figure 20 and Figure 21.

4. Discussion

The experimental results of underwater decoding are presented in Figure 12. Although some ArUco codes were not recognized due to occlusion, two ArUco codes were identified, with IDs 31 and 6 and yaw angles of 179° and −95°, respectively. Based on the estimated poses, it can be inferred that ID 31 represents A_3A_1; therefore, A_3 = 3 and A_1 = 1. ID 6 represents A_1 + A_2 + A_3; thus, A_2 = 2. Decoding yields the UUV number 123.
As shown in Figure 13, the original image suffers from low contrast, blurry contours, and a color shift towards blue and green due to light absorption and scattering in the underwater environment. The method based on the gray world assumption theory [43] corrects the color shift but fails to enhance the image contrast. The UDCP method [44] intensifies the color shift towards green. The Shallow-UWnet method [28] enhances the image contrast but introduces an additional color shift towards yellow. In contrast, the proposed method in this paper corrects the color shift while enhancing image details. As shown in Table 1, by applying the proposed image restoration and thresholding methods, eight out of nine ArUco codes can be detected. When only the proposed image restoration method is applied, a maximum of four codes can be detected, while applying only the proposed image thresholding method can detect up to six codes. The best result of the remaining methods is achieved by combining the Niblack and the gray world assumption theory methods, which detects five codes. Therefore, both the proposed image restoration method and thresholding method effectively improve the detection rate of ArUco codes.
As shown in Figure 14, Cao’s method [46] exhibits a fast convergence rate during the initial stages of training, but it also shows significant oscillations during the convergence process. In contrast, the proposed method in this paper has a slower convergence rate but demonstrates a stable convergence process, with the final loss value consistently reaching a smaller value. The minimum testing error achieved by Cao’s method [46] is 0.3828 pixels, while the proposed method in this paper achieves a minimum training error of 0.0036 pixels.
Figure 15b presents the difference between the predicted underwater target position and the actual underwater target position. The predicted underwater keypoints and the actual underwater keypoints overlap almost perfectly. The errors are shown in Table 2, with a maximum error of 5.21 pixels, which meets the accuracy requirements for alignment.
Table 3 shows the average processing time per frame under different scaling ratios, and Figure 16 shows the data fluctuation under different scaling ratios. The algorithm proposed in this paper produces the same detection results as processing without scaling and exhibits minimal data fluctuation, while its processing time is only 37.41% of the original. It therefore meets the real-time requirements of the alignment control process.
The displacement changes of each hydraulic cylinder during the alignment process are shown in Figure 17, Figure 18 and Figure 19. Finally, the hydraulic cylinder controls the piston rod to move up by 119 mm, indicating that the piston rod has successfully inserted into the hole on the UUV. As shown in Figure 20 and Figure 21, the error between the final alignment position and the target position in the x direction is −0.69661 pixels, and in the y direction it is −0.58738 pixels. Therefore, considering the calibration errors in Figure 14, the underwater coordinate transformation errors in Table 2, and the motion errors in Figure 20 and Figure 21, the maximum alignment error is 4.517 pixels. Based on the calculation of FOV using Equation (12), the maximum alignment error is 0.5548 mm, which meets the accuracy requirements.
Although the method proposed in this paper has successfully achieved the automatic alignment of the UUV underwater charging platform, there is still room for improvement. Firstly, for partially occluded targets, this paper uses redundant information processing methods in both the decoding and target recognition processes. However, accurately completing the occluded parts in real time would further enhance the robustness of the method. Secondly, in determining the camera’s intrinsic and extrinsic parameters, this paper still needs to use Zhang’s calibration method to determine the distortion coefficients and initialize the network parameters using the calibrated intrinsic parameters. This method is not concise enough. Finally, this paper did not explicitly express the camera’s intrinsic and extrinsic parameters. These three points will be our future research directions.

5. Conclusions

This paper presents an automatic alignment method for UUV underwater charging platforms using monocular vision recognition. This method accurately identifies the UUV’s identity information and guides the charging stake to smoothly insert into the charging port of the UUV through target recognition. To ensure the accuracy and robustness of decoding, this study introduces an encoding method based on redundant information and proposes an ArUco code reconstruction method specifically designed for underwater imaging environments for decoding purposes. Additionally, a method for determining the target position is proposed to overcome the difficulty of directly determining the underwater target position. The proposed method accurately determines the underwater two-dimensional coordinates of the target keypoints based on the location of the UUV target spray using deep learning and the law of refraction. The experimental results demonstrate that the proposed ArUco code reconstruction method can improve the detection rate of ArUco codes by at least 22.2%. The proposed target detection algorithm has an average processing time of 0.092 s per frame, meeting the requirements for real-time control. The maximum alignment error is 0.5548 mm, meeting the accuracy requirements for alignment.

Author Contributions

Conceptualization, A.Y. and Y.W.; methodology, A.Y.; software, A.Y.; validation, A.Y., Y.W. and H.L.; formal analysis, A.Y. and B.Q.; investigation, Y.W.; resources, H.L. and B.Q.; data curation, H.L.; writing—original draft preparation, A.Y.; writing—review and editing, A.Y.; visualization, A.Y.; supervision, Y.W.; project administration, Y.W.; funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zuo, M.; Wang, G.; Xiao, Y.; Xiang, G. A unified approach for underwater homing and docking of over-actuated AUV. J. Mar. Sci. Eng. 2021, 9, 884. [Google Scholar] [CrossRef]
  2. Wang, T.; Zhao, Q.; Yang, C. Visual navigation and docking for a planar type AUV docking and charging system. Ocean Eng. 2021, 224, 108744. [Google Scholar] [CrossRef]
  3. Bharti, V.; Wang, S. Autonomous Pipeline Tracking Using Bernoulli Filter for Unmanned Underwater Surveys. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 7129–7136. [Google Scholar]
  4. Zhang, Y.-X.; Zhang, Q.-F.; Zhang, A.-Q.; Chen, J.; Li, X.-G.; He, Z. Acoustics-Based Autonomous Docking for A Deep-Sea Resident ROV. China Ocean Eng. 2022, 36, 100–111. [Google Scholar] [CrossRef]
  5. Chen, Y.; Duan, Z.; Zheng, F.; Guo, Y.; Xia, Q. Underwater optical guiding and communication solution for the AUV and seafloor node. Appl. Opt. 2022, 61, 7059–7070. [Google Scholar] [CrossRef]
  6. Vandavasi, B.N.J.; Gidugu, A.R.; Venkataraman, H. Deep Learning Aided Magnetostatic Fields Based Real-Time Pose Estimation of AUV for Homing Applications. IEEE Sens. Lett. 2023, 7, 22814400. [Google Scholar] [CrossRef]
  7. Lin, R.; Zhao, Y.; Li, D.; Lin, M.; Yang, C. Underwater Electromagnetic Guidance Based on the Magnetic Dipole Model Applied in AUV Terminal Docking. J. Mar. Sci. Eng. 2022, 10, 995. [Google Scholar] [CrossRef]
  8. Chhatpar, S.R.; Branicky, M.S. Search strategies for peg-in-hole assemblies with position uncertainty. In Proceedings of the 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No. 01CH37180), Maui, HI, USA, 29 October–3 November 2001; pp. 1465–1470. [Google Scholar]
  9. Wang, X.; Zhao, X.; Liu, Z. Underwater Optical Image Enhancement Based on Color Constancy and Multiscale Wavelet. Laser Optoelectron. Prog. 2022, 59, 1601002. [Google Scholar] [CrossRef]
  10. Kong, S.; Zhou, K.; Huang, X. Online measurement method for assembly pose of gear structure based on monocular vision. Meas. Sci. Technol. 2023, 34, 065110. [Google Scholar] [CrossRef]
  11. Fan, J.; Jing, F.; Yang, L.; Long, T.; Tan, M. An initial point alignment method of narrow weld using laser vision sensor. Int. J. Adv. Manuf. Technol. 2019, 102, 201–212. [Google Scholar] [CrossRef]
  12. Chen, B.; Liu, Y. Antenna alignment by using computer vision technology. Microw. Opt. Technol. Lett. 2020, 62, 1267–1269. [Google Scholar] [CrossRef]
  13. Kim, J. New Wafer Alignment Process Using Multiple Vision Method for Industrial Manufacturing. Electronics 2018, 7, 39. [Google Scholar] [CrossRef]
  14. Ke, K.; Zhang, C.; Wang, Y.; Zhang, Y.; Yao, B. Single underwater image restoration based on color correction and optimized transmission map estimation. Meas. Sci. Technol. 2023, 34, 55408. [Google Scholar] [CrossRef]
  15. Jaffe, J.S. Computer modeling and the design of optimal underwater imaging systems. IEEE J. Ocean. Eng. 1990, 15, 101–111. [Google Scholar] [CrossRef]
  16. McGlamery, B. A computer model for underwater camera systems. In Ocean Optics VI; SPIE: Washington, DC, USA, 1980; pp. 221–231. [Google Scholar]
  17. He, K.; Sun, J.; Tang, X. Single Image Haze Removal Using Dark Channel Prior. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2341–2353. [Google Scholar] [CrossRef] [PubMed]
  18. He, K.; Sun, J.; Tang, X. Guided image filtering. In Proceedings of the European Conference on Computer Vision, Crete, Greece, 5–11 September 2010; pp. 1–14. [Google Scholar]
  19. Carlevaris-Bianco, N.; Mohan, A.; Eustice, R.M. Initial results in underwater single image dehazing. In Proceedings of the Oceans 2010 Mts/IEEE Seattle, Seattle, WA, USA, 20–23 September 2010; pp. 1–8. [Google Scholar]
  20. Li, C.; Quo, J.; Pang, Y.; Chen, S.; Wang, J. Single underwater image restoration by blue-green channels dehazing and red channel correction. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 1731–1735. [Google Scholar]
  21. Galdran, A.; Pardo, D.; Picon, A.; Alvarez-Gila, A. Automatic Red-Channel underwater image restoration. J. Vis. Commun. Image Represent. 2015, 26, 132–145. [Google Scholar] [CrossRef]
  22. Hou, G.; Pan, Z.; Wang, G.; Yang, H.; Duan, J. An efficient nonlocal variational method with application to underwater image restoration. Neurocomputing 2019, 369, 106–121. [Google Scholar] [CrossRef]
  23. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  24. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  25. Kirillov, A.; Wu, Y.; He, K.; Girshick, R. Pointrend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9799–9808. [Google Scholar]
  26. Wang, K.; Shen, L.; Lin, Y.; Li, M.; Zhao, Q. Joint Iterative Color Correction and Dehazing for Underwater Image Enhancement. IEEE Robot. Autom. Lett. 2021, 6, 5121–5128. [Google Scholar] [CrossRef]
  27. Wang, Y.; Yu, X.; An, D.; Wei, Y. Underwater image enhancement and marine snow removal for fishery based on integrated dual-channel neural network. Comput. Electron. Agric. 2021, 186, 106182. [Google Scholar] [CrossRef]
  28. Naik, A.; Swarnakar, A.; Mittal, K. Shallow-UWnet: Compressed Model for Underwater Image Enhancement (Student Abstract). Proc. AAAI Conf. Artif. Intell. 2021, 35, 15853–15854. [Google Scholar] [CrossRef]
  29. Song, W.; Wang, Y.; Huang, D.; Tjondronegoro, D. A rapid scene depth estimation model based on underwater light attenuation prior for underwater image restoration. In Proceedings of the Pacific Rim Conference on Multimedia, Hefei, China, 21–22 September 2018; pp. 678–688. [Google Scholar]
  30. Meng, C.; Li, Z.; Sun, H.; Yuan, D.; Bai, X.; Zhou, F. Satellite Pose Estimation via Single Perspective Circle and Line. IEEE Trans. Aerosp. Electron. Syst. 2018, 54, 3084–3095. [Google Scholar] [CrossRef]
  31. Luckett, J.A. Comparison of Three Machine Vision Pose Estimation Systems Based on Corner, Line, and Ellipse Extraction for Satellite Grasping; West Virginia University: Morgantown, WV, USA, 2012. [Google Scholar]
  32. Zhang, H.; Meng, C.; Bai, X.; Li, Z. Rock-ring detection accuracy improvement in infrared satellite image with sub-pixel edge detection. IET Image Process. 2019, 13, 729–735. [Google Scholar] [CrossRef]
  33. Huang, B.; Sun, Y.; Zeng, Q. General fusion frame of circles and points in vision pose estimation. Optik 2018, 154, 47–57. [Google Scholar] [CrossRef]
  34. Gould, R.W.; Arnone, R.A.; Martinolich, P.M. Spectral dependence of the scattering coefficient in case 1 and case 2 waters. Appl. Opt. 1999, 38, 2377–2383. [Google Scholar] [CrossRef]
  35. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [Google Scholar] [CrossRef]
  36. An, Y.; Wang, X.; Zhu, X.; Jiang, S.; Ma, X.; Cui, J.; Qu, Z. Application of combinatorial optimization algorithm in industrial robot hand eye calibration. Measurement 2022, 202, 111815. [Google Scholar] [CrossRef]
  37. Ma, Y.; Zhou, Y.; Wang, C.; Wu, Y.; Zou, Y.; Zhang, S. Calibration of an underwater binocular vision system based on the refraction model. Appl. Opt. 2022, 61, 1675–1686. [Google Scholar] [CrossRef]
  38. Tweddle, B.E. Computer Vision Based Navigation for Spacecraft Proximity Operations; Massachusetts Institute of Technology: Cambridge, MA, USA, 2010. [Google Scholar]
  39. Niblack, W. An Introduction to Digital Image Processing; Strandberg Publishing Company: Hovedstaden, Denmark, 1985. [Google Scholar]
  40. Khurshid, K.; Siddiqi, I.; Faure, C.; Vincent, N. Comparison of Niblack inspired binarization methods for ancient documents. In Document Recognition and Retrieval XVI; SPIE: Washington, DC, USA, 2009; pp. 267–275. [Google Scholar]
  41. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  42. Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-Free Local Feature Matching with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8918–8927. [Google Scholar]
  43. Lam, E.Y. Combining gray world and Retinex theory for automatic white balance in digital photography. In Proceedings of the 9th International Symposium on Consumer Electronics (ISCE 2005), Taipa, Macao, 14–16 June 2005; pp. 134–139. [Google Scholar]
  44. Drews, P., Jr.; do Nascimento, E.; Moraes, F.; Botelho, S.; Campos, M. Transmission Estimation in Underwater Single Images. In Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2–8 December 2013; pp. 825–830. [Google Scholar]
  45. Hashem, A.-R.A.; Idris, M.Y.I.; Ahmad, A.E.-B.A. Comparative study of different binarization methods through their effects in characters localization in scene images. Data Knowl. Eng. 2018, 117, 216–224. [Google Scholar] [CrossRef]
  46. Cao, Y.; Wang, H.; Zhao, H.; Yang, X. Neural-Network-Based Model-Free Calibration Method for Stereo Fisheye Camera. Front. Bioeng. Biotechnol. 2022, 10, 955233. [Google Scholar] [CrossRef]
Figure 1. Structure of the underwater charging platform used in this method.
Figure 2. Schematic diagram of the proposed method.
Figure 3. Schematic diagram of the charging process. (a) The UUV enters the underwater charging platform; the arrow indicates the direction of UUV movement. (b) Circumferential alignment; the arrow indicates the rotational direction of the stake. (c) Axial alignment; the arrow indicates the translational direction of the stake.
Figure 4. Encoding pattern.
Figure 5. Schematic diagram of ArUco code detection.
Figure 6. w_1 and w_2 are sliding windows centered at I(i, j) with radii d_{w1} and d_{w2}, respectively.
Figure 7. Schematic diagram of the two-dimensional coordinates of the above-water target position. T is the spray position of the target on the UUV, S is the position of the UUV charging port, and P is the charging stake of the underwater charging platform. The O-XYZ coordinate system has the center of the UUV charging port as its origin, the O_w-X_WY_WZ_W coordinate system has the center of the charging stake on the underwater charging platform as its origin, the O_c-X_cY_cZ_c coordinate system has the camera center as its origin, and the o-uv two-dimensional coordinate system has the upper left corner of the image as its origin. L is the distance between the UUV and the charging platform, which is determined by the UUV model and can be obtained during the decoding process. In this method, the UUV is fixed and moved to a specific position by the clamping device of the charging platform.
Figure 8. The dataset.
Figure 9. Structure of the neural network.
Figure 10. The imaging principle of the underwater camera. o is the camera's optical center, f is the camera's focal length, K is the distance from the refraction surface (the outer glass of the camera) to the camera's optical center, and D is the distance from the target to the refraction surface. Like L above, D is a known constant determined by the positioning of the clamp and the structure of the underwater charging platform.
Figure 11. The employed target.
Figure 12. Result of ArUco code recognition.
Figure 13. Result of image processing. (a) Original image; (b) image enhancement result of the algorithm based on the gray world assumption theory; (c) image restoration result of the UDCP algorithm; (d) image restoration result of the Shallow-UWnet method; (e) image restoration result of the proposed method.
Figure 14. Loss variation of the proposed network and the network proposed by Cao et al. [46] during training and testing.
Figure 15. Projection of the same target on the imaging plane underwater and above water. (a) When D = 200 mm, the same target's projection on the imaging plane was recorded underwater and above water and its keypoints were marked; A_1, B_1, C_1, and D_1 are keypoints of the above-water image, and A_2, B_2, C_2, and D_2 are keypoints of the underwater image. (b) When D = 260 mm, the locations of the underwater keypoints were predicted from the calculated value of K and compared with the actual values; A̅_2, B̅_2, C̅_2, and D̅_2 are the predicted keypoints of the underwater image.
Figure 16. Fluctuations of the keypoints.
Figure 17. Displacement of the circumferential hydraulic cylinder. The first dashed line marks the completion of circumferential alignment and the transition to axial alignment; the second dashed line marks the completion of axial alignment and the transition to the charging process. The same applies to the following figures.
Figure 18. Displacement of the axial hydraulic cylinder.
Figure 19. Displacement of the charging hydraulic cylinder.
Figure 20. Deviation of the x-coordinate between the current position of the target and the target position.
Figure 21. Deviation of the y-coordinate between the current position of the target and the target position.
Table 1. ArUco code detection results of different image processing methods. (Rows: thresholding method — Niblack, Bernsen, proposed. Columns: restoration method — gray world assumption theory, UDCP, Shallow-UWnet, proposed. Each cell shows the corresponding detection-result image.)
Table 2. Deviations between the predicted and true values.

x (pixel): 2.35, 0.74, 2.49, 1.79
y (pixel): 0.70, 0.17, 5.21, 1.21
Table 3. Average processing time per frame.

Scale = 1: 0.24592 s; Scale = 2: 0.08923 s; Scale = 4: 0.03511 s; Ours: 0.092 s