1. Introduction
After undergoing long-term biological evolution and natural selection, fish have developed remarkable abilities to swim rapidly and perform agile maneuvers in complex and dynamic aquatic environments [
1]. Taking inspiration from natural fish, robotic fish act as a dedicated underwater vehicle platform offering diverse potential applications, whether in a cooperative or noncooperative manner. These applications include ocean exploration, seabed mapping, water monitoring, underwater pipeline tracking, and more [
1,
2]. Compared to conventional propeller-driven underwater vehicles, robotic fish possess several favorable characteristics. Firstly, their appearance and movement closely resemble real fish, allowing effective deception and mimicry of the behavior of aquatic organisms. This characteristic facilitates easier access and observation of underwater life while ensuring minimal disturbance and impact on the natural environment during exploration or monitoring. Consequently, data collection becomes more reliable and representative. Secondly, bionic robotic fish exhibit enhanced flexibility and mobility. By mimicking the body shape and movements of real fish, they can navigate quickly through the water and perform a variety of tasks in intricate or confined spaces [
3]. Additionally, these robots replicate the streamlined shape and efficient propulsion mechanism of real fish, resulting in superior hydrodynamic performance and significantly improved energy efficiency.
Robotic fish are equipped with numerous and diverse sensors, such as depth sensors, vision sensors, and inertial measurement units, to enable precise perception and intelligent control [4]. However, if a sensor breaks down during operation, not only does the sensed information become unreliable, but the entire system may also become paralyzed, which may even cause safety accidents. To ensure the safe and reliable operation of robotic fish, it is critical to promptly and accurately diagnose sensor faults.
Fault diagnosis is a critical task in various fields, and it can be achieved through different methods, such as signal analysis-based, model-based, and data-driven methods [
5]. Recently, there has been a growing interest in intelligent data-driven fault diagnosis methods, driven by the development of deep learning algorithms [
6]. Compared to manual feature extraction, end-to-end deep learning methods can automatically extract features of the data distribution, saving time and labor [
7,
8,
9]. To achieve high precision and fast fault diagnosis, Fang et al. [
10] and Chen et al. [
11] used one-dimensional convolutional neural networks (CNNs) to extract multichannel features and thereby improve diagnostic accuracy. The former decreased the number of convolution kernels as the convolution kernel size was reduced, and the latter adopted dynamic convolution with separable convolution to classify faults. Liu et al. [
12] combined the advantages of long short-term memory (LSTM) network with statistical process analysis to predict the fault of aero-engine bearing and obtained ideal accuracy. Tang et al. [
13] proposed signal embedding to solve the problem of applying transformers to mechanical vibration signals, achieving robust and outstanding diagnostic accuracy under unknown operating conditions. Chen et al. [
14] explored the compound fault of industrial robots and proposed an efficient convolutional transformer. The proposed lightweight convolutional transformer network enhanced the meta-learning method to achieve accurate compound fault diagnosis with limited samples.
However, the methods above mainly focus on time domain features, neglecting the spatial domain features. To improve the accuracy of fault diagnosis algorithms, researchers have attempted to convert one-dimensional time series signals into two-dimensional images, and then extract spatial features from the images. For example, Wen et al. [
15] reshaped vibration signals into grey images and used LeNet-5 to classify images, leading to significant improvements compared to fault diagnosis based solely on time domain features. Yang et al. [
16] adopted the Short Time Fourier Transform (STFT) to transform the signal into the corresponding time-frequency map, which contains abundant feature information. But STFT heavily relies on the window length selected and has significant uncertainty. Xu et al. [
17] proposed the generalized S-synchroextracting transform, a new time–frequency post-processing algorithm to address this issue. Xun et al. [
18] used the Markov transfer field together with an improved deep CNN with a wide first-layer kernel, improving the sensitivity to spatial features. To further improve fault diagnosis performance, Hou et al. [
19] proposed a spatial domain image fusion method. Signals were converted into Gramian angular summation field (GASF) images and Gramian angular difference field (GADF) images using the Gramian angular field (GAF) method, and the two images were combined with equal weights. This method achieved good results for fault location, but whether 0.5 is the optimal weighted combination coefficient needs further discussion. Sun et al. [
20] adopted continuous wavelet transform to transform the nonlinear and non-stationary original vibration signal into a time–frequency image, and then used an improved AlexNet model to diagnose faults. Amiri et al. [
21] used the recurrence plot method to convert signals to images, and derived the degree of determinism in the signal to detect series arc faults. The results confirm its high accuracy, high speed, and low computing cost.
Several CNN-based methods have been proposed for recognizing spatial domain features, such as AlexNet [
20], DenseNet [
22], ResNet [
23], and so on. Although they have achieved excellent recognition results, they are difficult to apply in practice due to their high time costs and heavy demand for computing resources. Therefore, scholars have conducted research on lightweight networks. For instance, Sun et al. [20] replaced the fully connected layer with a global average pooling (GAP) layer, which improves the traditional AlexNet model and reduces its parameters. Liu et al. [
24] replaced large convolution kernels with small convolution kernels to reduce network parameters in AlexNet, which saved model training time significantly. However, these methods targeted three-channel RGB images for local parameter reduction, leaving room for optimizing global parameter reduction for single-channel images.
Although the methods above have significant advantages in terms of accuracy, they require a large amount of computation, and some new methods have started to emerge aiming to reduce the model complexity, such as Inception [
25], MobileNet [
26], ShuffleNet [
27] and so on.
To diagnose faults more accurately and quickly, this article proposes the Gramian angular field fusion with particle swarm optimization and lightweight AlexNet (GAFF-PSO-AlexNet) method. The main contributions of this article are summarized as follows:
- (1)
The one-dimensional time series sensor signals are converted into two-dimensional images by using the GAF method. The GASF and GADF images are fused by a weighted fusion method to generate Gramian angular field fusion (GAFF) images, and the particle swarm optimization (PSO) algorithm is used to optimize the weighted fusion coefficient.
- (2)
Lightweight AlexNet is proposed to diagnose six sensor fault types. To use fewer parameters and shorten the running time, the channels of the convolutional layers and the nodes of the fully connected layers are reduced compared with the original AlexNet.
2. Fault Diagnosis Method
2.1. Data Preprocessing
2.1.1. Signal to Image
The depth sensor data are one-dimensional time series signals that contain a large amount of time domain information, but it is difficult to extract spatial information directly. To take full advantage of spatial domain information, the GAF method is used to convert one-dimensional time series sensor signals into two-dimensional images, which is shown in
Figure 1.
The primary concept of the GAF method is to transform one-dimensional time series signals in the Cartesian coordinate system to the polar coordinate system, followed by using trigonometric functions to create a GAF matrix. This approach eliminates noise in the time series via spatial transformation, and retains time information via vector inner product. The GAF matrix has two types of images: GASF and GADF images. The GASF images are the cosine of the summed angles, while the GADF images are the sine of the subtracted angles. The mathematical representation of this approach can be explained as follows [
28]:
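Suppose a one-dimensional time series signal is denoted as $S = \{s_1, s_2, \ldots, s_m\}$. Firstly, we normalize S by rescaling its values so that they fall within the interval [−1, 1] using the equation below:
$$\tilde{s}_i = \frac{(s_i - \max(S)) + (s_i - \min(S))}{\max(S) - \min(S)} \tag{1}$$
where $\tilde{s}_i$ represents the normalized $s_i$.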
Secondly, the rescaled signal can be encoded into polar coordinates. The value of the time series is encoded as the angle and its corresponding timestamp as the radius. The equation is as follows:
$$\phi_i = \arccos(\tilde{s}_i), \quad -1 \le \tilde{s}_i \le 1; \qquad r_i = \frac{t_i}{n} \tag{2}$$
where $\phi_i$ indicates the polar angle of the polar coordinate, $r_i$ represents the radius of the polar coordinate, $t_i$ is the time stamp, $\tilde{s}_i$ indicates the normalized value of S, and n is a scaling coefficient to regularize the polar coordinate system.
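Thirdly, GAF can encode the time series in two different ways. One is GASF, which uses the cosine of the summed angles $\phi_i$ and $\phi_j$ of the i-th and j-th time points to mine the correlation between different time points:
$$G^{\mathrm{GASF}}_{ij} = \cos(\phi_i + \phi_j) = \tilde{s}_i \tilde{s}_j - \sqrt{1 - \tilde{s}_i^2}\,\sqrt{1 - \tilde{s}_j^2} \tag{3}$$
The GADF images are similar to the GASF images except that the GADF images are constructed using the sine of the subtracted angles as follows:
$$G^{\mathrm{GADF}}_{ij} = \sin(\phi_i - \phi_j) = \sqrt{1 - \tilde{s}_i^2}\,\tilde{s}_j - \tilde{s}_i \sqrt{1 - \tilde{s}_j^2} \tag{4}$$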
The GASF images and the GADF images have two significant advantages. Firstly, polar coordinates contain absolute time series relationships because they convert time-varying signals into angular values. Secondly, the original value and angular information can be preserved in the diagonal value, which ensures that the GAF method retains all the information about the one-dimensional time series sensor signals.
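As an illustration, the three steps above (rescaling to [−1, 1], polar encoding, and the trigonometric sum and difference) can be sketched in a few lines of standard-library Python. This is a minimal sketch rather than the implementation used in the experiments; the function names `rescale` and `gramian_angular_fields` and the toy input are hypothetical.

```python
import math

def rescale(signal):
    """Rescale a 1-D signal to [-1, 1] using its minimum and maximum."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("a constant signal cannot be rescaled")
    return [((s - hi) + (s - lo)) / (hi - lo) for s in signal]

def gramian_angular_fields(signal):
    """Return (GASF, GADF) as nested lists for a 1-D signal."""
    # Clamp to [-1, 1] to guard acos against floating-point rounding.
    scaled = [max(-1.0, min(1.0, v)) for v in rescale(signal)]
    phi = [math.acos(v) for v in scaled]  # polar angle of each sample
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf

# Toy signal (hypothetical values, not measured depth data).
gasf, gadf = gramian_angular_fields([0.0, 0.5, 1.0, 0.5, 0.0])
```

By construction the GASF matrix is symmetric with diagonal entries $\cos(2\phi_i)$, while the GADF matrix is antisymmetric with a zero diagonal.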
2.1.2. Spatial Domain Image Fusion
To leverage the benefits of both GASF and GADF images, fusing the two is a natural approach. The weighted fusion method, a transparency fusion technique commonly used in image composition and image matting, is used to generate GAFF images. The fusion process is expressed mathematically as shown in the following equation:
$$I_{\mathrm{GAFF}} = (1 - \alpha)\, I_{\mathrm{GASF}} + \alpha\, I_{\mathrm{GADF}} \tag{5}$$
The proposed method uses the GASF and GADF images to generate the GAFF images. The GASF images are considered as the foreground images, with transparency represented by $\alpha$, while the GADF images are treated as the background images with a transparency of $1 - \alpha$. The range of the transparency value is [0, 1]. By fusing the images at each pixel, the GAFF images contain information from both the GASF and GADF images. When $\alpha = 0$, the GAFF images contain only the GASF information, while $\alpha = 1$ yields images that contain only the GADF information. The optimal weighted fusion coefficient can be determined to achieve the ideal fault diagnosis performance.
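The pixel-wise fusion can be sketched as follows, assuming the convention described above in which a coefficient of 0 keeps only the GASF image and a coefficient of 1 keeps only the GADF image. The function name `fuse` and the argument name `alpha` are hypothetical, and the small matrices are toy values.

```python
def fuse(gasf, gadf, alpha):
    """Blend two equally sized images pixel by pixel.

    alpha = 0 returns the GASF image, alpha = 1 returns the GADF image
    (assumed convention, see the lead-in).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * a + alpha * d for a, d in zip(row_a, row_d)]
        for row_a, row_d in zip(gasf, gadf)
    ]

# Toy 2x2 "images" (hypothetical values).
toy_gasf = [[1.0, 0.0], [0.0, 1.0]]
toy_gadf = [[0.0, 0.5], [-0.5, 0.0]]
half = fuse(toy_gasf, toy_gadf, 0.5)
```

With `alpha = 0.5` this reduces to the equal-weight combination discussed in the Introduction.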
2.2. Lightweight AlexNet
AlexNet, a CNN model that is both deeper and wider than its predecessors, was introduced by Alex Krizhevsky and achieved outstanding performance in the ImageNet challenge for visual object recognition in 2012 [
29]. Due to its exceptional ability to perform nonlinear fitting and automatic feature extraction, it has gained significant attention. However, the AlexNet model has numerous parameters that need to be learned, requiring substantial computational resources and extending model training time. To alleviate this complexity, a lightweight version of the AlexNet model, called the lightweight AlexNet, has been proposed, as illustrated in
Figure 2. The proposed model aims to maintain efficient classification capabilities for multiple and complex scenes while reducing model complexity.
The lightweight AlexNet is structured similarly to the original AlexNet, consisting of five convolutional layers, three max-pooling layers, and three fully connected layers. The convolutional layers possess linearity and time-shift invariance properties and can extract features at different scales by employing different sizes of convolutional kernels. In this work, kernels of sizes $11 \times 11$, $5 \times 5$, and $3 \times 3$ were selected. Pooling layers, also referred to as downsampling layers, compress the feature map, reduce feature dimensionality, and help avoid overfitting without increasing the number of learned parameters. Furthermore, the fully connected layers in AlexNet use the dropout operation, which randomly sets the output of hidden layer neurons to 0 with a given probability; this is equivalent to removing some neural nodes and helps prevent overfitting.
The primary distinction between the lightweight AlexNet and the original AlexNet lies in the reduction in channels for convolutional layers and nodes for fully connected layers by a factor of
, as indicated in
Table 1. In the original AlexNet, the input images consist of red, green and blue three channels, with the corresponding channel numbers for convolutional kernels being 96, 256, and 384, and the number of nodes for fully connected layers being 4096. However, since GAFF images are single-channel grey images, such a large number of channels are not necessary. Consequently, we decreased the channel numbers to 32, 86, and 128, and the nodes to 1066, leading to a decrease in the model parameters.
The complexity of the model can be characterized by two key metrics: space complexity and time complexity. The former is assessed by the total number of parameters in the model, including the weights and biases across all layers, while the latter is reflected in the computational demands of the model, typically quantified as the number of floating point operations (FLOPs) required.
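For convolutional layers in AlexNet and lightweight AlexNet, the parameters and FLOPs can be calculated as follows:
$$P_{conv} = (K_h \times K_w \times C_{in} + 1) \times C_{out}, \qquad F_{conv} = 2 \times K_h \times K_w \times C_{in} \times C_{out} \times H \times W \tag{6}$$
where $P_{conv}$ and $F_{conv}$ refer to the parameters and FLOPs of the convolutional layer, respectively. $C_{in}$ and $C_{out}$ indicate the channel numbers of the input and output of the convolutional layer, while $K_h$ and $K_w$ represent the height and width of the kernel. Additionally, H and W denote the height and width of the output feature map.
For fully connected layers in AlexNet and lightweight AlexNet, the parameters and FLOPs can be calculated as follows:
$$P_{fc} = (C_{in} + 1) \times C_{out}, \qquad F_{fc} = 2 \times C_{in} \times C_{out} \tag{7}$$
where $P_{fc}$ and $F_{fc}$ indicate the parameters and FLOPs of the fully connected layer, respectively. $C_{in}$ and $C_{out}$ are the numbers of input and output nodes of the fully connected layer.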
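For max pooling layers in AlexNet and lightweight AlexNet, the parameters are zero and the FLOPs can be calculated as follows:
$$F_{pool} = K_h \times K_w \times C_{out} \times H \times W \tag{8}$$
where $K_h$ and $K_w$ here denote the height and width of the pooling window. In order to determine the degree of reduction in total parameters and FLOPs, we define the ratios $R_P$ and $R_F$ as follows:
$$R_P = \frac{P_L}{P_A}, \qquad R_F = \frac{F_L}{F_A} \tag{9}$$
where $P_A$ and $P_L$ indicate the total parameters of AlexNet and lightweight AlexNet, and $F_A$ and $F_L$ indicate the total FLOPs of the above two networks, respectively. As we can see in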
Table 1, the parameter ratio and the FLOPs ratio of the first convolutional layer and the last fully connected layer are approximately equal to
, the parameter ratio and the FLOPs ratio of the other convolutional layers and the fully connected layers are approximately equal to
, the parameter ratio and the FLOPs ratio of max pooling layers are approximately equal to
, and the total parameter ratio and total FLOPs ratio are approximately equal to
, which effectively reduces space complexity and time complexity.
However, alongside these advantages, the lightweight AlexNet also carries risks associated with reducing the channels and nodes, such as weaker feature representation, information loss, and model under-fitting.
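To make the effect of the channel reduction concrete, the following sketch counts convolutional parameters for the two channel configurations. It rests on several assumptions that are not stated in the text: one common counting convention (weights plus one bias per output channel, and two FLOPs per multiply-accumulate), ungrouped convolutions, the kernel sizes 11, 5, 3, 3, 3 of the standard AlexNet, the standard five-layer widths 96, 256, 384, 384, 256, and a lightweight counterpart of 32, 86, 128, 128, 86. All function and constant names are hypothetical.

```python
def conv_params(c_in, c_out, k_h, k_w):
    """Weights plus one bias per output channel (assumed convention)."""
    return (k_h * k_w * c_in + 1) * c_out

def conv_flops(c_in, c_out, k_h, k_w, h_out, w_out):
    """Two FLOPs per multiply-accumulate, bias ignored (assumed convention)."""
    return 2 * k_h * k_w * c_in * c_out * h_out * w_out

KERNELS = [11, 5, 3, 3, 3]                   # assumed square kernel sizes
ORIGINAL_WIDTHS = [96, 256, 384, 384, 256]   # standard AlexNet widths
LIGHT_WIDTHS = [32, 86, 128, 128, 86]        # assumed lightweight widths

def total_conv_params(in_channels, widths):
    """Sum the parameters of the five convolutional layers."""
    total, c_in = 0, in_channels
    for k, c_out in zip(KERNELS, widths):
        total += conv_params(c_in, c_out, k, k)
        c_in = c_out
    return total

p_original = total_conv_params(3, ORIGINAL_WIDTHS)  # RGB input
p_light = total_conv_params(1, LIGHT_WIDTHS)        # single-channel grey input
ratio = p_light / p_original
print(f"convolutional parameter ratio: {ratio:.3f}")
```

Under these assumptions, cutting both the input and output widths of a layer to about one third shrinks that layer's weights to roughly one ninth, which is why the interior layers dominate the overall saving.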
2.3. Weighted Fusion Coefficient Optimization
To improve the accuracy of fault diagnosis, we propose the utilization of intelligent optimization methods to determine the optimal weighted fusion coefficient. This study considers several heuristic optimization algorithms, including Genetic Algorithm (GA), Ant Colony Optimization (ACO), Whale Optimization Algorithm (WOA), Grey Wolf Optimizer (GWO), and PSO, among others. These algorithms employ iterative computations and evaluation functions to efficiently search for the optimal value, making them well-suited for tackling nonlinear optimization problems.
2.3.1. Optimization Algorithm Selection
Recent research has demonstrated that ACO, WOA and GWO are specific variants of the PSO algorithm [
30,
31]. PSO is an evolutionary algorithm inspired by the foraging behavior of bird flocks in search of food. It incorporates mechanisms of individual improvement, population cooperation, and competition. The fundamental concept of this algorithm is to consider particles as individual entities without volume or mass. Each particle possesses two essential attributes: velocity and position. These attributes are continuously adjusted throughout iterations, aiming to converge towards the global optimum of the particle swarm as well as the particle’s historical optimum. By evolving in this manner, the algorithm strives to discover improved values and enhance overall performance.
GA is an optimization technique that emulates the principle of survival of the fittest in biological evolution. It possesses characteristics such as self-organization, self-adaptation, and easy parallelism. The fundamental concept of GA is to transform the task of finding an optimal solution into a process of crossover and mutation among chromosomal genes. Following this principle, the algorithm retains individuals with desirable fitness values while discarding inadequate ones. This process of crossover and mutation is repeated to progressively attain superior solutions. GA shares information locally through chromosome crossover, whereas PSO shares information globally to guide all particles toward the global optimal solution.
In the sensor fault diagnosis problem based on spatial domain image fusion discussed in this paper, our primary focus lies on achieving accurate fault diagnosis. The accuracy of fault diagnosis is directly influenced by the fusion coefficient, making the selection of the optimization algorithm critical for finding the global optimal solution for this coefficient. By maximizing the utilization of global information and minimizing the risk of getting stuck in local optima, the probability of obtaining a globally optimal solution is enhanced. Considering the PSO algorithm’s advantages in terms of high utilization of global information and low risk of falling into local optima, it is the preferred choice for this study.
2.3.2. PSO Mathematical Expression
In the context of the particle swarm optimization (PSO) algorithm operating in the real number space, each potential solution within the search space can be conceptualized as an individual particle maneuvering through the hyperdimensional landscape of the given problem [
32]. The position of each particle is determined by the vector $x_i$ and its movement by the velocity of the particle $v_i$, as shown in the following equations:
$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{10}$$
$$v_i(t+1) = v_i(t) + c_1 r_1 \left(pbest_i - x_i(t)\right) + c_2 r_2 \left(gbest - x_i(t)\right) \tag{11}$$
where $c_1$, $c_2$ are two positive numbers and $r_1$, $r_2$ are two random numbers with uniform distribution in the range of [0, 1]. $pbest_i$ is the best position found by particle i and $gbest$ is the global best position. As we can see, the velocity update equation in Equation (11) has three major components, which represent three properties as follows [
33]:
- (1)
The first component is sometimes referred to as “inertia”, “momentum”, or “habit”. It models the tendency of the particle to continue in the same direction it has been traveling.
- (2)
The second component of the velocity update equation is a linear attraction towards the best position ever found by the given particle.
- (3)
The third component of the velocity update equation is a linear attraction towards the best position found by any particle.
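The three velocity components listed above can be sketched for a single scalar coefficient as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it maximizes a fitness value, it adds an inertia weight `w` (a common extension; `w = 1` recovers the basic three-component form), it clips positions to the search bounds, and it uses a toy quadratic in place of the validation accuracy. The function names and all default values are hypothetical.

```python
import random

def pso_maximize(fitness, lower, upper, n_particles=3, n_iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize a scalar fitness over [lower, upper] with a basic PSO."""
    rng = random.Random(seed)
    span = upper - lower
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.1 * rng.uniform(-span, span) for _ in range(n_particles)]
    pbest = pos[:]                                # personal best positions
    pbest_fit = [fitness(x) for x in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best], pbest_fit[best]  # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]                          # inertia
                      + c1 * r1 * (pbest[i] - pos[i])     # pull to personal best
                      + c2 * r2 * (gbest - pos[i]))       # pull to global best
            pos[i] = min(upper, max(lower, pos[i] + vel[i]))  # clip to bounds
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i], fit
    return gbest, gbest_fit

def toy_accuracy(alpha):
    """Stand-in for validation accuracy; peaks at alpha = 0.3 (toy choice)."""
    return 1.0 - (alpha - 0.3) ** 2

best_alpha, best_acc = pso_maximize(toy_accuracy, 0.0, 1.0)
```

In an actual run, `toy_accuracy` would be replaced by a function that builds the fused images for a given coefficient and returns the accuracy of the trained network on the validation set, which makes each fitness evaluation far more expensive than in this toy setting.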
2.4. Fault Diagnosis Architecture
By synthesizing the strengths of spatial domain image fusion, lightweight AlexNet and PSO, we propose a complete architecture for fault diagnosis, as shown in
Figure 3.
The architecture begins with the construction of an image dataset by converting one-dimensional time series signals into two-dimensional spatial domain images using the GAF method. Afterward, the dataset is split into training, validation, and test sets. The training set is employed to optimize the parameters of the lightweight AlexNet network. Then, the validation set is used to optimize the weighted fusion coefficient using the PSO algorithm. Finally, the test set is fed into the fault diagnosis architecture, which uses the optimal fusion coefficient and the trained network parameters, to evaluate the performance of the model.
3. Experiment
To verify the performance of the GAFF-PSO-AlexNet method in practical scenarios, validation experiments were designed on the robotic fish platform, and the depth sensor was selected for the research.
3.1. Data Collection
Our laboratory developed a robotic fish that imitates the structure of a shark in terms of its streamlined shape, which helps to minimize water resistance [
34].
Figure 4a depicts the three-link posterior body of the robot, equipped with a lunate caudal fin for thrust generation. The robot is also fitted with a pair of pectoral fins possessing two independent degrees of freedom for orientation and depth adjustment. The fore body of the robot is made of the acrylonitrile–butadiene–styrene (ABS) copolymer, while the posterior body is coated with skin, rendering it as water-resistant as possible. The final prototype of the robot has dimensions of 48.3 cm in length (maximum, including caudal fin), 20.8 cm in width (maximum, including pectoral fins), and 12.5 cm in height (maximum, including dorsal fin), with an approximate weight of 1.35 kg.
An embedded control system based on the STM32F407 micro-controller is developed to enable excellent underwater swimming performance of the robotic fish. The Central Pattern Generator (CPG)-governed control strategy is employed to achieve various shark-like movements, such as forward and backward swimming, turning, diving, and surfacing. The robotic fish is powered by 7.4 V direct current batteries that provide operational flexibility by freeing it from power cable constraints. To achieve intelligent perception and precise control, it is equipped with various sensors such as Inertial Measurement Unit (IMU), depth sensor, camera, and infrared sensor. Among these sensors, the depth sensor is installed on the bottom surface of the robotic fish, as shown in
Figure 4b, making it more vulnerable to underwater obstruction collisions than other sensors installed inside the robotic fish. Therefore, this study focuses on fault diagnosis of the depth sensor.
Aquatic experiments are conducted in a laboratory pool measuring 500 cm long, 400 cm wide, and 120 cm high. To collect data automatically, a data collection system based on the HC-12 wireless communication module is designed, as depicted in
Figure 5a. The HC-12 module operates at the 433 MHz frequency band and has a high transmitting power, making it suitable for communicating with the robotic shark. The host PC is used for remote control and monitoring of the robotic fish, and the robotic shark is responsible for sensor data collection, swimming motion, and communication with the host PC. The collected sensor information is recorded in a database.
Following the configuration of hardware and software, data are collected from the robotic fish. The process begins with the robotic fish in normal operation, and after a period of time, the occurrence of sensor faults is observed. Depth control commands are sent by the host PC, following which the robotic shark moves according to the control law. The top view and side view of the robotic shark during the depth control process are shown in
Figure 5b,c, respectively. During this process, the host PC acts as the sending port and the robotic shark as the receiving port. Real-time sensor information is recorded by the robotic shark on a Secure Digital (SD) card. Upon completion of all the motions, the robotic shark sends the information on the SD card to the host PC; in this phase, the robotic shark acts as the sending port and the host PC as the receiving port. The host PC receives the sensor data and records them in a database with labels. To minimize the impact of different robotic fish tasks on sensor fault diagnosis, the target depth value is subtracted from the sensor’s depth value, and the resulting depth data are used for fault diagnosis. Thus, the data collection work is completed.
The depth sensor has a variety of fault types. In the experiment, we only considered six types as shown in
Table 2; other fault types, such as those arising from poor signal generation by the robotic fish or poor signal transmission, were not considered due to the limitations of the experimental conditions. Our experiment included the normal type and five fault types, namely the depth sensor with no output fault, the depth sensor with intermittent output fault, the depth sensor with jumping output fault, the depth sensor with drifting output fault, and the depth sensor with constant output fault. To avoid scale issues with the values and accelerate the convergence of the neural network, the signals were normalized to the range of [0, 1]. Additionally, to eliminate the influence of unbalanced data, an equal number of samples was selected for each type from the collected data. Finally, to make full use of the collected data, the training set, validation set and test set were divided in the ratio of 6:2:2 [
35]. In practical applications, the data division ratio depends on the specific problem and the size of the data.
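The preprocessing described above, min-max normalization to [0, 1] followed by a 6:2:2 split, can be sketched as follows. The helper names `min_max_normalize` and `split_622`, the fixed random seed, and the use of shuffling before the split are illustrative assumptions rather than details given in the text.

```python
import random

def min_max_normalize(signal):
    """Rescale a 1-D signal to the range [0, 1]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]  # constant signal: map everything to 0
    return [(s - lo) / (hi - lo) for s in signal]

def split_622(samples, seed=0):
    """Shuffle samples and split them into 60% / 20% / 20% subsets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10   # integer arithmetic avoids rounding issues
    n_val = len(items) * 2 // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

normalized = min_max_normalize([2.0, 4.0, 6.0])
train, val, test = split_622(range(100))
```

Applying the split separately to each class would keep the three subsets balanced as well, in line with the equal-sample selection described above.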
The Wilcoxon rank sum test is a nonparametric test method that measures the distribution difference between two groups of data samples [36]. Because it makes no special assumptions about the distribution of the data, the Wilcoxon rank sum test can be applied to complicated distributions. Consequently, it is used here to measure the distribution difference between any two types of data. The null hypothesis $H_0$ is proposed that the two types of data have the same distribution at the significance level
.
If the hypothesis $H_0$ is accepted, the two types of data are similarly distributed. Conversely, if the hypothesis $H_0$ is rejected, there is a large difference in the distribution of the two types of data. We randomly selected one sample from each data type and performed the Wilcoxon rank sum test between each pair of the six types, and the significance level of
was achieved. The result of the Wilcoxon rank sum test is shown in
Table 3.
As we can see, the Wilcoxon rank sum test value between normal data and each type of fault data is 0, which indicates that normal data differ markedly from fault data. The values for F1 and F5 and for F2 and F5 are both greater than the significance level, so these pairs were tested as having the same distribution and are easily misclassified. The Wilcoxon rank sum test value of the F1 fault type and F2 fault type is 0.005, which is in a critical state. The values between the other fault types are less than the significance level, indicating that the data are significantly different from each other and the probability of correct classification is relatively high.
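For readers who wish to reproduce this kind of pairwise comparison, the following sketch computes a two-sided Wilcoxon rank sum p-value using the normal approximation with average ranks for ties and no tie correction. This variant is an assumption, since the text does not state which implementation was used, and the function name and sample values are hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p). Ties receive average ranks; no tie correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1            # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p

# Toy samples (hypothetical values, not the recorded depth data).
z_same, p_same = rank_sum_test([1, 2, 3, 4], [1, 2, 3, 4])
z_diff, p_diff = rank_sum_test(list(range(20)), list(range(100, 120)))
```

A p-value above the chosen significance level means the hypothesis of identical distributions is not rejected, which corresponds to the pairs described above as easily misclassified.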
3.2. Algorithm Implementation
After completing the data collection work, the next step is to use the proposed algorithm to diagnose faults. The total fault diagnosis flowchart of spatial domain image fusion with PSO and lightweight AlexNet is shown in
Figure 6, which can be divided into the following steps:
Step 1: The time series sensor signals collected from robotic fish in the depth control are inputted to the GAFF-PSO-AlexNet fault diagnosis model.
Step 2: The sliding window method is used to segment the original signal into a series of equal-sized sub-signals and regard each sub-signal as one sample, achieving the effect of data augmentation.
Step 3: The time series sensor signals are converted into GASF images and GADF images using the GAF method.
Step 4: The GASF and GADF images are fused using Equation (
5) to make full use of the information in two types of images.
Step 5: The PSO algorithm is adopted to find the optimal weighted fusion coefficient.
Step 6: With the optimal weighted fusion coefficient, lightweight AlexNet is used to diagnose fault types in the depth sensor.
Step 7: The result of fault diagnosis is output, including confusion matrix, accuracy, precision rate, recall rate and F1-Score.
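The sliding-window segmentation in Step 2 can be sketched as follows. The window length and stride used in the experiments are not given here, so the values in the example are purely illustrative, and the function name `sliding_windows` is hypothetical.

```python
def sliding_windows(signal, window, stride):
    """Cut a 1-D signal into equal-sized, possibly overlapping sub-signals."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, stride)]

# Ten samples, window of 4, stride of 2 (illustrative values only).
segments = sliding_windows(list(range(10)), window=4, stride=2)
```

A stride smaller than the window length produces overlapping sub-signals, which is what yields the data augmentation effect mentioned in Step 2.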
In Step 5 above, the PSO algorithm is employed to obtain the optimal weighted fusion coefficient. In this process, the accuracy of the lightweight AlexNet on the validation set is utilized as the fitness function, and the parameter to be optimized is the weighted fusion coefficient. To perform the optimization, a swarm of three particles is selected and the maximum number of iterations is set to 60. The specific steps of the optimization process are presented in Algorithm 1.
The proposed fault diagnosis method was implemented in a Python environment on a computer equipped with an Intel 3.8 GHz Core i7-10700K CPU and an NVIDIA RTX 3060 Ti GPU with a memory capacity of 8 GB. The PyTorch framework was utilized for training, validating and testing the GAFF-PSO-AlexNet network.
Algorithm 1 Framework of the PSO algorithm optimizing weighted fusion coefficient
Input: the maximum number of iterations N; the number of particles n; the weighted coefficient upper bound $\alpha_{max}$ and lower bound $\alpha_{min}$
Output: the optimal weighted coefficient $\alpha^{*}$
1: Initialize the PSO parameters
2: for each particle i
3:     Initialize position $x_i$ and velocity $v_i$ for particle i
4:     Evaluate particle i and set $pbest_i = x_i$
5: end for
6: $gbest = \arg\min_{pbest_i} fit(pbest_i)$
7: while not stop
8:     for i = 1 to n
9:         Update the position and velocity of particle i
10:        Evaluate particle i
11:        if $fit(x_i) < fit(pbest_i)$
12:            $pbest_i = x_i$
13:        if $fit(pbest_i) < fit(gbest)$
14:            $gbest = pbest_i$
15:    end for
16: end while
17: Save optimal weighted fusion coefficient $\alpha^{*} = gbest$
18: return $\alpha^{*}$