Article

Application of Target Detection Method Based on Convolutional Neural Network in Sustainable Outdoor Education

by Xiaoming Yang 1,2, Shamsulariffin Samsudin 1,*, Yuxuan Wang 3, Yubin Yuan 1, Tengku Fadilah Tengku Kamalden 1 and Sam Shor Nahar bin Yaakob 4

1 Department of Sports Studies, Faculty of Educational Studies, Universiti Putra Malaysia, Serdang 43400, Malaysia
2 College of Physical Education, East China University of Technology, Nanchang 330013, China
3 Sports Institute, Nanchang JiaoTong Institute, Nanchang 330100, China
4 Department of Nature Parks and Recreation, Faculty of Forestry and Environment, Universiti Putra Malaysia, Serdang 43400, Malaysia
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(3), 2542; https://doi.org/10.3390/su15032542
Submission received: 17 November 2022 / Revised: 10 January 2023 / Accepted: 28 January 2023 / Published: 31 January 2023
(This article belongs to the Section Sustainable Education and Approaches)

Abstract

In order to realize the intelligence of underwater robots, this exploration proposes a submersible vision system based on neurorobotics, which innovatively uses a convolutional neural network (CNN) to mine the target information in underwater camera data. First, the underwater functions of the manned submersible are analyzed and mined to obtain the specific objects and features of the underwater camera information. Next, the dataset of the specific underwater target images is constructed. The acquisition system for the underwater camera information of manned submersibles is designed through the Single Shot MultiBox Detector (SSD) algorithm of deep learning. Furthermore, CNN is adopted to classify the underwater target images, which realizes the intelligent detection and classification of underwater targets. Finally, the model's performance is tested through experiments, and the following conclusions are obtained. The model can recognize underwater organisms' local, global, and visual features, and different recognition methods have certain advantages in accuracy, speed, and other aspects. The design here integrates deep learning technology and computer vision technology and applies them to the underwater field, realizing the association of the identified biological information with geographic and marine information. This is of great significance for realizing the multi-information fusion of manned submersibles and the intelligent field of outdoor education. The contribution of this exploration is to provide a reasonable direction for the intelligent development of outdoor diving education.

1. Introduction

Outdoor education can be simply understood as physical education carried out in an outdoor environment. In the process of outdoor learning, students live in an outdoor environment, which has an important influence on them: it strengthens their learning impressions and perceptual experience. Before outdoor teaching formally starts, the organizers analyze the actual outdoor conditions and teaching purposes and make scientific preparations for problems that may arise in the outdoor environment so that students can truly participate in study outdoors. Outdoor education promotes sustainable development: knowledge of nature teaches learners to take care of nature, and it will influence their future choices. Diving is an outdoor sports activity conducted under high pressure. Unlike ordinary swimming, it is a technical sport. Apart from the great enjoyment that the mysterious underwater environment brings to people's spiritual lives, diving has other benefits, such as broadening people's minds and opening them up. Years of diving exercise can improve the muscle strength of all parts of the human body and improve reaction speed, patience, and flexibility. It can also cultivate courage, tenacity, cooperation, and prudent perseverance. Manned submersibles can help students learn more underwater knowledge, and intelligent manned submersibles play an important role in students' understanding of the underwater world. The ocean, which accounts for 70.8% of the world's surface area (about 3.6 × 10⁸ square kilometers), is the world's largest treasure house of resources. It is the cradle of the world's biological origin and contains massive biological resources, mineral reserves, and other resources, which are an important guarantee of future survival resources. Outdoor education often uses specific natural environment resources to guide people to experience and learn in nature, feel the beauty and mystery of nature, establish a deep connection between man and nature, stimulate people's motivation to understand nature, cultivate an attitude of respecting nature, and promote actions to protect nature. The application of diving in outdoor education can not only enhance learners' interest in exploring nature but also promote the sustainable development of outdoor education.
The manned submersible is equipment for exploring the deep-sea field and developing deep-sea resources. It can carry various electromechanical equipment and researchers to quickly and accurately reach the deep-sea environment to conduct scientific investigation and development operations [1]. Due to their flexible operation and multiple functions, manned submersibles have become one of the research hotspots in deep-sea equipment exploration, and various countries have joined in their research and development. The complex deep-sea environment makes breaking through the submersible's many key technologies a series of challenging tasks. Among them, underwater environment perception and map-building technology and underwater target detection and recognition technology have become the research focus of many researchers worldwide [2]. Underwater target information acquisition is the premise of intelligent decision-making by submersibles. Underwater target recognition is vital for humans to understand the deep-sea environment and develop deep-sea resources. So far, there is no mature theoretical system for underwater identification technology. At present, the most widely used underwater target recognition technology is the sonar sensor. Because sonar image resolution is unstable and easily disturbed by the deep-sea environment, targets recognized from sonar images often lack accuracy. Accurate and efficient underwater target recognition technology is still one of the important research directions for submersibles' underwater operation [3]. The underwater low-light imaging vision system equipped in a manned submersible is small in size and low in power consumption. It can overcome the complex changes of the underwater environment, and its imaging accuracy is much higher than that of the sonar system. Underwater camera information has increasingly become important data for humans to obtain deep-sea environmental information [4]. Deep learning (DL) and computer vision have been widely used in people's lives and play an essential part in security, transportation, industrial production, and online shopping [5]. However, computer recognition technology is rarely used in the marine field, especially in the deep-sea field. Compared with land operations, seabed operation has multiple limitations, leading to problems such as a high difficulty coefficient, low operation efficiency, and high labor intensity for scientific researchers [6,7].
The research problem is how to improve the quality of the information extracted from the underwater cameras of manned submersibles. The gap between this exploration and other studies is that the concepts of neurorobotics and DL technology are introduced to study this problem. Around this identification technology, the specific objects and characteristics of the underwater camera information obtained by manned submersibles are deeply excavated and analyzed. Next, based on the Single Shot MultiBox Detector (SSD) algorithm, an underwater camera information acquisition system for manned submersibles is designed. Finally, experiments are designed to verify the system's performance. The innovation of this research lies in outdoor diving education, for which a method of underwater camera data information mining based on deep learning is proposed. Compared with other networks, the advantage of this model is that the ResNet34 network is trained on the dataset for recognition and classification, realizing automatic recognition and classification of underwater targets and thus improving the automation and intelligence of underwater optical image detection. The function of mining the underwater camera information of the manned submersible is realized through feature extraction, feature classification, and target recognition of the specific objects in the underwater camera information obtained by the manned submersible. The efficiency of obtaining the camera information of the manned submersible is thereby improved. Neurorobotics is a comprehensive study of neuroscience, robotics, and artificial intelligence; it is a science and technology that embodies the autonomic nervous system. Such a nervous system includes brain-inspired algorithms, computational models of biological neural networks, and actual biological systems. It can be embodied in a machine with mechanical or any other form of physical actuation, including robots, artificial limbs or wearable systems, small-scale micromachines, and large-scale furniture and infrastructure. Therefore, based on neurorobotics, in order to build an intelligent underwater sports robot, this exploration integrates the idea of DL with computer vision technology, applies it to the underwater field, and realizes the automatic recognition of underwater camera information from manned submersibles. In the outdoor education of underwater robots, the acquisition of target information by intelligent robots can help students learn about and understand the underwater world better. Underwater camera information recognition plays an important role in promoting the development of intelligent manned diving. This paper is of great significance to the fields of outdoor education, intelligent manned submersibles, and research on the survival laws of marine life. Diving is a new field in outdoor education. The research contribution of this paper is to promote the application of deep learning technology in outdoor diving education and, in turn, the sustainable development of outdoor marine education.

2. Literature Review

Underwater vehicles are mainly adopted in the fields of marine data monitoring, underwater equipment maintenance, and marine target acquisition and identification. The success of these tasks is closely related to the target identification and positioning of underwater vehicles. Sonar technology is the traditional method of ship-based underwater target detection and the most commonly used detection technology. Sonar detection can be divided into sonar echo detection and sonar image detection, and it has long been used for underwater target detection. However, sonar is often unable to collect clear target images, and it is difficult to obtain accurate recognition results due to the constraints of the complex underwater environment and the diversity of underwater target attributes [8].
With the development of science and technology and the progress of the times, underwater vehicle technology has become increasingly mature, and underwater optical vision systems have improved accordingly. The visual signals of underwater vehicles have found more applications in the field of image recognition. Research on underwater machine vision technology relies on underwater vehicle development. Abroad, research on underwater robot technology began decades earlier than in China, and the research and achievements of the United States and Japan lead other countries worldwide. The Monterey Bay Aquarium Research Institute (MBARI) is the world's first international scientific research organization to apply high-resolution camera technology to underwater vehicles to detect the types and distribution of marine organisms. It has realized quantitative interface analysis of underwater images and videos at different depths. Using underwater vehicles to carry cameras to obtain images of marine organisms and then detect their population and distribution replaces the traditional method of investigating marine organisms by trawling [9]. The underwater vehicles studied by the US Naval Postgraduate School (NPS) and the University of Tokyo in Japan can monitor the underwater environment through cameras [10].
There are also many research achievements abroad in underwater target tracking and seabed detection. Among them, the US Navy's underwater search system is an underwater vehicle equipped with an early optical vision system. The carrier is mainly equipped with a sonar system, an underwater camera system, and an underwater lighting system. The control system is an embedded computer system with artificial intelligence processing, which mainly realizes the acquisition and monitoring of seabed targets. Foreign researchers have also made many achievements in target recognition algorithms. Yu et al. (2022a) established a data-driven model based on a two-dimensional convolutional neural network (CNN). In order to improve the prediction accuracy of the proposed model, the hyperparameters of the CNN are optimized in the training stage using an improved bird swarm algorithm. This method enables the recognition network to extract deeper target features, and the feature map is more abstract and representative, which is conducive to classification [11]. Yu et al. (2022b) proposed vision-based automation for surface condition recognition of concrete structures. This model combines state-of-the-art pre-trained CNNs, transfer learning, and decision-level image fusion. An improved Dempster–Shafer (DS) algorithm is designed for decision-level image fusion to improve crack detection accuracy. The robustness of the proposed method is verified using images polluted by various types and intensities of noise, and satisfactory results are obtained [12].
Research on underwater vehicle technology and underwater target recognition technology in China started relatively later than in foreign countries, but China has also made considerable achievements in these two fields. Since the 1970s, China has carried out large-scale submersible research. Many institutions and scholars in China actively study this field and have made numerous achievements and published multiple studies. The PLA Military Institute of Engineering leads the research on underwater target recognition technology in China. In the early stage, Harbin Engineering University and a research institute jointly developed an optical vision system with independent working ability to recognize underwater targets, with a recognition rate above 94% [13]. In the later stage, after the related algorithm was improved and the boundary invariant feature was adopted as the judgment basis for target recognition, the accuracy rate increased to more than 98%. With the rise of machine learning, Cheng et al. (2021) proposed an adaptive feature method based on machine learning to fuse multiple features of underwater targets. A target recognition model can be obtained by modifying the fused target features. This method adopts multiple features of the same target and can better adapt to the underwater optical environment [14].
Through the continuous efforts of researchers, great progress has been made in underwater recognition technology, but this direction is still quite a challenging subject. The main difficulty is that differences in image acquisition equipment, resolution, and acquisition environment lead to large differences in the acquired images, and the uncertain factors in the acquisition process greatly increase the difficulty of underwater target recognition. Given these difficulties, extracting effective target features has become the top priority of underwater target recognition. This exploration is mainly aimed at the above problems: based on the DL idea, the recognition algorithm is studied and improved to extract more abstract features and thus improve the target recognition performance for the underwater camera information of manned submersibles.

3. Theory of Submersible and Machine Learning and Design of Submersible Camera Model System

3.1. Overview of Neurorobotics and Related Principles of the Manned Submersible

Neurorobotics is a branch of neuroscience and robotics dedicated to the research and implementation of the science and technology of autonomic nervous systems, such as brain-inspired algorithms. The core idea of neurorobotics is to embody the brain and embed the body into the environment. Therefore, in contrast to simulated environments, most neurorobots need to operate in the real world. In addition to brain-inspired algorithms, neurorobotics may also involve designing brain-controlled robot systems. Neuroscience attempts to identify the composition and mode of action of intelligence through the study of intelligent biological systems, while artificial intelligence research attempts to rebuild intelligence through abiotic or artificial means. Neurorobotics is the overlap of the two: theories inspired by biology are tested in a grounded environment through physical implementations of the models. The success or failure of a neurorobot and its model can provide evidence to refute or support a theory and provide insight for future research. Common types of neurorobots are those used to study motion control, memory, action selection, and perception. The research scope of neurorobotics includes motion and motion control, learning and memory functions, action selection and value systems, sensory perception, and biological robots. This exploration aims to lay a technical foundation for underwater motor neurorobots. Therefore, neural network technology is introduced to construct an underwater moving target detection algorithm. This exploration belongs to the field of sensory perception in neurorobotics.
Diving refers to all underwater activities, including diving with a pressurizer or air supplied from the surface and scuba diving with a breathing system carried by the divers themselves. A Human-Occupied Vehicle (HOV) is a device for exploring underwater fields and developing underwater resources. It can carry various electromechanical equipment and researchers to reach the underwater environment quickly and accurately [15]. The complex and changeable underwater environment makes diving a challenging outdoor sport. Underwater environment cognition, map-modeling technology, underwater target detection, and bioidentification technology have become hot topics for multiple foreign researchers. There is no mature theoretical system for underwater target recognition technology. At present, the most common underwater target recognition technology is the sonar sensor. However, the resolution of sonar images is unstable and easily disturbed by the underwater environment, so sonar image recognition usually falls short in accuracy. Accurate and effective underwater target identification technology is still one of the key exploration objectives of modern submersible underwater operations. DL and computer vision have been widely used in people's lives and play an important role in outdoor education. However, computer recognition technology is rarely used in the underwater field. Unlike land sports, underwater sports have multiple limitations, leading to problems such as a high difficulty coefficient and high intensity of underwater sports.
Underwater vehicles are widely used in underwater data monitoring, underwater facility protection, and underwater target acquisition and identification. There are also multiple research achievements in underwater target tracking and seabed detection. Figure 1 shows the underwater vehicle and image acquisition.
Figure 1a shows the structure of the AUSS underwater vehicle. The US Navy's underwater search system is an early underwater vehicle equipped with a vision system. The main detection equipment that the underwater vehicle carries includes a sonar system, an underwater camera system, and an underwater lighting system. The control system is an embedded computer system with artificial intelligence processing, which mainly realizes the acquisition and monitoring of seabed targets [16]. For research on the classification of underwater biological types, the underwater vehicles shown in Figure 1b are equipped with high-resolution underwater cameras, and video acquisition of underwater organisms is conducted through motion control. Figure 1c shows the video processing process: the acquisition time is set for the camera, and the collected images are finally fused into a whole seafloor image whose resolution is good enough to show the morphological details of seafloor organisms.

3.2. Application of Machine Learning Model in Diving Sports

With the improvement of people's quality of life, more and more people pay attention to their physical and mental health. Therefore, sports events in outdoor education are gradually increasing. People adjust their sports posture and choose an appropriate sports environment by watching relevant sports images, which is very important to their physical and mental health. Image recognition is a crucial foundation of sports management, so establishing a moving image recognition method with good performance is of great significance. This exploration studies the field of sensory perception in neurorobotics, that is, the combination of diving and image recognition, to provide a basis for the development of neurorobotics. Machine learning has indeed made great breakthroughs in the past decade, and multiple critical, world-changing applications have emerged in computer vision and language processing. CNN is one of the common machine learning models. Due to its ultra-high image recognition rate and clear semantic segmentation ability for details, it can remedy the poor robustness of traditional recognition algorithms. It is therefore widely used in complex object detection and modeling and has achieved excellent image recognition effectiveness [17]. Hence, CNN is adopted to recognize and process submersible camera signals based on the machine learning model. A CNN is basically composed of five parts: the input layer, the convolution layer (responsible for extracting image features), the pooling layer, the fully connected layer, and the Softmax layer [18]. Compared with other neural networks, the training algorithm of CNN mainly uses the Back Propagation (BP) operator. By continuously adjusting the parameters, the network's final output is brought arbitrarily close to the expected output to achieve the purpose of training. Figure 2 is the input layer of the CNN.
Figure 2 displays that the input layer is an important data source of the neural network. Unlike other neural networks, the original image without image preprocessing can be directly input into the CNN input layer. When input into the network, it is often a three-dimensional pixel matrix representing an image [19]. Figure 3 shows the local connection between weight-sharing convolution neurons.
Figure 3 reveals that multiple convolution kernels can be set in the convolution layer, and each convolution kernel represents a different meaning, such as a point, a straight line, or an arc. After the output of the upper layer is convolved by a convolution kernel, a larger response value indicates that the corresponding pattern, such as a point, a straight line, or an arc, is likely present. The pooling layer mainly realizes the spatial sampling of features; some studies also call it the sampling layer. Its main purpose is to blur a feature's exact location information so that the network focuses on whether the feature appears and on its location relative to other features [20]. The fully connected layer usually follows the convolution and pooling layers. After several rounds of alternating convolution and pooling, the information contained in the input image has been abstracted into features with a higher information concentration. At this point, one to two fully connected layers are usually set to synthesize the extracted image features and conduct the final analysis [21]. The Softmax layer mainly aims to solve the multi-classification problem; it is specifically responsible for computing the probabilities of different features to obtain the probability distributions over the different classes.
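To make this five-part structure concrete, the following is a minimal sketch of such a CNN in Keras; the layer counts and sizes are illustrative assumptions for this sketch, not the exact configuration used in this paper.

```python
# Minimal sketch of the five-part CNN described above (input, convolution,
# pooling, fully connected, and Softmax layers). Layer sizes are illustrative
# assumptions, not the paper's exact configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_simple_cnn(input_shape=(224, 224, 3), num_classes=4):
    model = models.Sequential([
        layers.Input(shape=input_shape),              # input layer: 3D pixel matrix
        layers.Conv2D(32, 3, activation="relu"),      # convolution: extracts local features
        layers.MaxPooling2D(2),                       # pooling: spatial subsampling
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),         # fully connected layer
        layers.Dense(num_classes, activation="softmax"),  # Softmax: class probabilities
    ])
    # Training relies on backpropagation (BP) to pull the output toward the labels.
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```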
Since AlexNet won the championship in the ImageNet competition in 2012, CNN has entered a period of rapid development. Later, the Visual Geometry Group (VGG) network and other networks appeared, proving that combining deeper networks with small convolution kernels makes the network more sensitive to image features. However, with the deepening of the CNN hierarchy, the network becomes difficult to train, and problems such as gradient disappearance, gradient explosion, and network degradation arise. In order to solve these problems as the number of network layers increases, researchers proposed the deep residual network ResNet [22]. The residual network is composed of a series of residual blocks, which avoid problems such as vanishing gradients and thus allow deeper networks to be trained. ResNet uses two different residual blocks. In the block used by shallow networks, the left branch passes the input through two 3 × 3 convolutional layers, while the right branch outputs the input directly through a shortcut. The output of the residual block is obtained by adding the left and right outputs and then applying the ReLU activation function. In deeper networks, the bottleneck residual block is adopted: its left branch uses 1 × 1, 3 × 3, and 1 × 1 convolutional layers in turn. The first 1 × 1 convolutional layer reduces the dimension of the data; when the input feature matrix depth is 256, it changes the depth of the feature matrix to 64. The third, 1 × 1 convolutional layer increases the data dimension again, restoring the feature matrix involved in the dimension-reduction operation to the original depth of 256. Convolutional layers arranged in this way reduce the number of parameters and can be used in deeper networks. The shortcut structure adds the left and right results and obtains the final output through the ReLU activation function.
ResNet also uses Batch Normalization (BN) together with residual blocks to address network degradation. When training the network, normalization is adopted to make the convolutional layer output satisfy a mean of zero and a variance of one. When this output enters the activation function, its values fall in the region where the nonlinear activation function is sensitive to the input. This makes the loss function change substantially even for small input changes, so the gradient becomes comparatively larger and the network avoids the problem of gradient disappearance. Therefore, adding a BN layer after the convolutional layers of the residual block can solve the network degradation problem and accelerate the convergence of network training. Generally, the deeper the network is, the stronger its ability to extract image features; however, the greater the computing load of the network, the longer the network operation time. Here, on the premise of ensuring sufficient image feature perception and computing speed, the 34-layer deep residual network ResNet34 is selected to recognize and classify images taken by conventional cameras and underwater laser range-gated cameras. The ResNet34 network performs five groups of convolution operations on the input image. The last four groups contain residual modules, in which the numbers of residual blocks corresponding to block 1, block 2, block 3, and block 4 are three, four, six, and three, respectively. These residual blocks contribute 32 layers in total; together with the conv1 convolutional layer and the final fully connected layer, the ResNet34 network has 34 layers. ResNet uses a cross-entropy loss function to measure the difference between the true labels and the model's output.
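As an illustration of the two residual blocks just described, here is a hedged Keras sketch following the standard ResNet design; it assumes the shortcut already has the same depth as the block output, and the filter counts are examples rather than this paper's exact settings.

```python
# Sketch of the two residual blocks described above: a basic 3x3/3x3 block for
# shallow networks and a 1x1/3x3/1x1 bottleneck block for deeper ones, each
# with batch normalization (BN) and a shortcut added before the final ReLU.
import tensorflow as tf
from tensorflow.keras import layers

def basic_block(x, filters):
    """Basic block: two 3x3 convolutions plus an identity shortcut."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)   # BN after the convolution, as described
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(layers.Add()([shortcut, y]))

def bottleneck_block(x, filters=64, out_filters=256):
    """Bottleneck: 1x1 reduces depth (e.g., 256 -> 64), 3x3 convolves,
    1x1 restores depth (64 -> 256), cutting parameters for deep networks."""
    shortcut = x
    y = layers.Conv2D(filters, 1)(x)                  # 1x1: dimension reduction
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)  # 3x3: feature extraction
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(out_filters, 1)(y)              # 1x1: restore original depth
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(layers.Add()([shortcut, y]))
```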

3.3. Algorithm and Design of Submersible Camera Model Based on Deep Learning Technique

Based on the SSD algorithm, the information collected by the underwater camera of the manned submersible is detected. The optimal system suitable for the manned submersible’s underwater camera information recognition technology is determined by comparing different detection effects [23]. Figure 4 is the underwater camera information recognition process of the submersible.
Figure 4 displays the signal recognition process of the underwater camera of the manned submersible. With TensorFlow as the detection framework, training dataset, experimental dataset, and test result set are obtained from the dataset of the signals collected by the underwater camera of the manned submersible through data processing, such as cutting and format conversion [24]. The target recognition model for the underwater camera information of manned submersibles is generated by training the model to be identified. The collected underwater test set images are used to detect the trained model performance and, finally, realize the accurate identification of underwater organisms during the underwater navigation of manned submersibles [25].
This algorithm must perform feature extraction on the original image through a CNN. The feature map extracted by the front CNN is set to 51 × 39 × 256 (width × height × number of channels). A convolution calculation is performed on the image's features, ensuring that the height, width, and number of channels remain unchanged [26]. A 51 × 39 × 256 feature map has 51 × 39 positions. Each position in the new convolution feature undertakes the detection of "frames" of nine sizes corresponding to that position in the original image. That is, each of the 51 × 39 positions corresponds to the detection of nine "frames", resulting in a total of 51 × 39 × 9 detection boxes. The concept of "frame" corresponds to the anchor in the literature. For the 51 × 39 × 9 anchors, Figure 5 shows the specific calculation process.
Figure 5 displays that if the number of anchors corresponding to each position is k (k = 9), a 3 × 3 sliding window is adopted to convert each position into a unified 256-dimensional feature [27], which feeds two output branches for the anchors at that position: the probability of containing an object and the regression of the detection frame [28]. In the first branch, the total output length for the object probabilities is 2 × k (each anchor corresponds to two outputs: the probability of being an object and the probability of not being an object). The other branch, the regression box, outputs the four box regression parameters corresponding to each anchor, so its total output length is 4 × k. Figure 6 is the algorithm framework. The function of the Region of Interest (ROI) pooling layer is to extract feature maps of the same size from ROIs of different sizes mapped onto the convolution feature map. ROI pooling has only one pyramid layer, which turns a feature map of any size into a fixed-size feature map. It greatly accelerates training and testing while maintaining high detection accuracy. This layer has two inputs: a fixed-size feature map obtained from a deep convolutional network with multiple convolution and max-pooling layers, and an N × 5 matrix representing the list of Regions of Interest, where N is the number of ROIs. The first column represents the image index, and the remaining four columns are the coordinates of the upper-left and lower-right corners of the region. For each Region of Interest in the input list, ROI pooling takes the corresponding part of the input feature map and scales it to a predefined size (for example, 7 × 7). Scaling is performed by dividing the region proposal into equal-sized parts (the number of which matches the output dimension), finding the maximum value of each part, and copying these maxima to the output (max pooling). As a result, a list of fixed-size feature maps can be quickly obtained from rectangles of different sizes. The dimension of the ROI output does not depend on the size of the input feature map or the size of the region proposal; it is determined only by the number of parts into which the proposal is divided.
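The following is a minimal NumPy sketch of this ROI max-pooling step; it assumes each ROI spans at least output_size pixels per side and that ROI coordinates are already given on the feature-map grid.

```python
# Minimal NumPy sketch of ROI max pooling: each ROI on the feature map is
# divided into an output_size x output_size grid, and the maximum of each
# cell is kept, so ROIs of any size yield a fixed-size output.
import numpy as np

def roi_max_pool(feature_map, roi, output_size=7):
    """feature_map: (H, W, C) array; roi: (x1, y1, x2, y2) integer corners."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]
    h, w, _ = region.shape
    # Split the region into roughly equal parts along each axis.
    h_edges = np.linspace(0, h, output_size + 1).astype(int)
    w_edges = np.linspace(0, w, output_size + 1).astype(int)
    out = np.empty((output_size, output_size, region.shape[2]), region.dtype)
    for i in range(output_size):
        for j in range(output_size):
            cell = region[h_edges[i]:h_edges[i + 1],
                          w_edges[j]:w_edges[j + 1], :]
            out[i, j, :] = cell.max(axis=(0, 1))  # max pooling per cell
    return out

# Example: a random 256-channel feature map and one 20x14 ROI -> 7x7x256.
fmap = np.random.rand(39, 51, 256)
print(roi_max_pool(fmap, (10, 5, 24, 25)).shape)  # (7, 7, 256)
```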
Figure 6 reveals that the Region of Interest (ROI) pooling layer can be used to extract the features of the input region, and the obtained feature information can be directly input into the fully connected layer. Then, two kinds of signals are output through the algorithm of the fully connected layer. One is a classification signal, and the other is a regression signal [29]. If k types of objects are detected in the image, the number of final output categories should be k + 1. The reason is that the computer automatically classifies the image background into one category when classifying the objects in the image. Therefore, the last category presented is the sum of the number of object categories in the image and the number of background categories, which is k + 1 [30]. Another output parameter is the regression of the detection box. This step is to actually “calibrate” the original detection frame to some extent because there is a certain deviation when obtaining the extraction frame. The learning parameters of regression calculation are
$$\frac{x^* - x}{w}, \qquad \frac{y^* - y}{h}, \qquad \ln\frac{w^*}{w}, \qquad \ln\frac{h^*}{h}.$$
Here, x, y, w, and h are the four parameters of the extracted frame, and $x^*$, $y^*$, $w^*$, and $h^*$ are those of the real detection frame; $(x^* - x)/w$ and $(y^* - y)/h$ are translations independent of the scale, and $\ln(w^*/w)$ and $\ln(h^*/h)$ are scalings independent of the scale. Through continuous learning of classification scores and regression parameters, the model parameters can be brought arbitrarily close to the true parameters during training [31]. A matching strategy is needed during model training to match the default frame with the real target frame. In the process, the similarity is calculated by the Jaccard overlap. The specific calculation equations are as follows:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|},$$

where A is the default frame and B is the real target frame [32].
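A direct implementation of this Jaccard overlap for two axis-aligned boxes might look as follows; the corner format (x1, y1, x2, y2) is an assumption for illustration.

```python
# Jaccard overlap (IoU) of two axis-aligned boxes given as (x1, y1, x2, y2),
# used to match default frames to real target frames.
def jaccard_overlap(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # |A ∩ B|
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter                    # |A| + |B| − |A ∩ B|
    return inter / union if union > 0 else 0.0

# Example: two heavily overlapping boxes.
print(jaccard_overlap((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```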
The loss function of the Loss layer is composed of classification loss and location loss. Let $x_{ij}^{p}$ be the indicator that matches the i-th default frame to the j-th real target frame of category p. According to the matching strategy, $\sum_i x_{ij}^{p} \geq 1$. The object detection loss function is:
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right),$$
where α represents the influence factor and N represents the number of matched frames. When N = 0, the loss is set to 0. The positioning loss $L_{loc}$ is the smooth L1 loss between the matched frames l and g [33]. The positioning loss equation is:
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left(l_i^{m} - \hat{g}_j^{m}\right),$$

with the encoded regression targets

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \qquad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}},$$

and

$$\hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \qquad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}.$$
$g_j^{cx}$, $g_j^{cy}$, $g_j^{w}$, and $g_j^{h}$ represent the coordinates of the real frame; $d_i^{cx}$, $d_i^{cy}$, $d_i^{w}$, and $d_i^{h}$ represent the coordinates of the default frame; and $l_i^{m}$ and $\hat{g}_j^{m}$ represent the offsets of the predicted frame and the real frame relative to the default frame [34]. The equation of the confidence-loss function is:
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log \hat{c}_i^{p} - \sum_{i \in Neg} \log \hat{c}_i^{0}$$

and

$$\hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}.$$
c is the class confidence of the object, and the other variables have been introduced earlier. If an ROI contains an object of class c, its feature region is first subdivided into k × k subdomains. Each subdomain contains the corresponding part of the object's characteristic signal, which makes it convenient to find the average response value of each region on the position-sensitive score maps [35]. The regional sensitivity pooling method is calculated as follows:
$$r_c(i, j \mid \Theta) = \frac{1}{n} \sum_{(x, y) \in \mathrm{bin}(i, j)} z_{i, j, c}\left(x + x_0,\ y + y_0 \mid \Theta\right).$$
$r_c(i, j \mid \Theta)$ is the pooling result for the class-c object in bin $(i, j)$; $z_{i, j, c}$ is one of the $k \times k \times (c + 1)$ position-sensitive score maps; $(x_0, y_0)$ is the top-left coordinate of the object's ROI; n is the total number of pixels in the bin; and Θ denotes all the learnable parameters of the network. During regression, the ROI also needs four parameters as the regression offset, that is, the offsets of x, y, w, and h [36]. The equation for establishing the position relationship of the submersible is:
$$x_g = \frac{m_1}{m} x_T, \qquad y_g = 0, \qquad z_g = \frac{m_1}{m} z_T,$$
where $x_g$, $y_g$, and $z_g$ are the position coordinates of the submersible, m is the overall mass of the submersible, $m_1$ is the mass of the remaining parts, and $x_T$ and $z_T$ are the central position coordinates [37]. At the last level of the shared pooling layer, a group of network layers parallel to the position-sensitive score mapping layer is connected. This layer realizes the regression operation on the four detection frame parameters, and the dimension of the final output is k × k × 4. The loss function of the algorithm is:
$$L\left(s, t_{x, y, w, h}\right) = L_{cls}\left(s_{c^*}\right) + \lambda\, [c^* > 0]\, L_{reg}\left(t, t^*\right)$$

and

$$L_{cls}\left(s_{c^*}\right) = -\log s_{c^*},$$
where $c^*$ is the real category label of the ROI, $L_{cls}(s_{c^*})$ is the classification loss function, $L_{reg}(t, t^*)$ is the regression loss function, $t^*$ is the true image frame, and $[c^* > 0]$ is an indicator switch that outputs 1 if the object is real and 0 otherwise. The network obtains a classification loss and a regression loss for any ROI containing characteristic information. In the calculation process, the ROI that best matches the frame is selected first. Then, among all the remaining ROIs, those whose overlap with the frame exceeds 0.5 are selected for matching. Finally, all the remaining ROIs are classified into the background class. Each ROI has its own label, and then training and experimentation can be conducted.
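To make the encoded localization targets and the smooth L1 term above concrete, here is a hedged NumPy sketch; the (cx, cy, w, h) box format and the example numbers are assumptions for illustration, not values from the paper.

```python
# Sketch of the localization targets ghat and the smooth L1 loss defined above.
# Boxes are (cx, cy, w, h); d is a default frame, g the matched real frame.
import numpy as np

def encode_offsets(d, g):
    """Offsets the regressor is trained to predict, per the equations above."""
    dcx, dcy, dw, dh = d
    gcx, gcy, gw, gh = g
    return np.array([(gcx - dcx) / dw,
                     (gcy - dcy) / dh,
                     np.log(gw / dw),
                     np.log(gh / dh)])

def smooth_l1(x):
    """Elementwise smooth L1: 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

# Localization loss for one matched pair: sum of smooth L1 over the offsets.
pred = np.array([0.1, -0.05, 0.2, 0.0])  # l_i, the predicted offsets
target = encode_offsets((50, 50, 20, 30), (52, 49, 22, 28))
print(smooth_l1(pred - target).sum())
```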

3.4. Model Data Screening and Model Training Experiment Design

In this paper, the underwater camera data of the model identification part comes from the underwater camera information collected by China’s underwater vehicle during the diving process. The training data are to take two kinds of underwater camera information obtained by the submersible as examples. They are the underwater camera information obtained by comprehensive detection and integration in the process of sea area diving and the underwater camera information in the form of video obtained by manned submersibles in a certain sea area. Figure 7 is an image-capturing result.
Figure 7 reveals that the main goal of the manned submersible's underwater camera information recognition system is to use image processing technology to obtain the target data in all images of underwater photography and to use the DL algorithm to recognize the images and realize the purpose of environmental cognition. The number of edited images is still quite large: 6400 images can be generated from digital images after editing, and 1600 images can be generated from video signals after conversion, while the original underwater photography data from manned submersibles exceed 8000 items. Although a larger amount of data yields a more refined training model, training on every piece of data in the later stage is very time-consuming and energy-consuming, and excess noise harms the accuracy of later recognition. For these reasons, screening the original data is necessary and critical. Figure 8 shows the data-filtering process.
Figure 8 presents the data-filtering process. The raw data in the figure refer to all digital image data generated by converting the underwater camera information collected during manned submersible diving, 8000 pieces in total. After passing through the judgment conditions in the figure, the screening of the raw data is completed. Data screening refers to the quantitative screening of data without affecting the subsequent recognition results. The process is similar to filtering: the original data are input into a filter, which removes the images containing useless information and noise, leaving the images containing effective information. Data screening needs to follow certain criteria. This paper uses the following strategies to screen the original data, which mainly comprise two parts: discarding images containing useless information and discarding images containing excessive noise. Useless information refers to the blind area scanned by the camera during the diving process of a manned submersible or the blank area generated when the seabed scanning image is synthesized in the later stage. An image with excessive noise is one in which the recognition target occupies only a very small area while non-target objects occupy the vast majority; such images contribute very little to the training of the later recognition model. In order to improve the running efficiency of the computer and reduce the manual workload, filtering out images with excessive noise improves the training speed and helps the final recognition result.
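An illustrative sketch of this two-part screening strategy follows; the blank-area and noise estimators and all thresholds are assumptions for demonstration, not the paper's exact criteria.

```python
# Illustrative data screening: discard frames that are mostly blank (useless
# scanning blind area) or whose estimated noise level is too high. Estimators
# and thresholds below are assumptions for demonstration only.
import numpy as np

def is_mostly_blank(gray, blank_thresh=10, max_blank_ratio=0.6):
    """Treat near-black pixels as blind/blank area from seabed scanning."""
    return np.mean(gray < blank_thresh) > max_blank_ratio

def is_too_noisy(gray, max_noise=35.0):
    """Crude noise proxy: spread of local pixel differences."""
    g = gray.astype(float)
    dx, dy = np.diff(g, axis=1), np.diff(g, axis=0)
    return (dx.std() + dy.std()) / 2 > max_noise

def screen_images(gray_images):
    """Keep only frames that pass both checks."""
    return [img for img in gray_images
            if not is_mostly_blank(img) and not is_too_noisy(img)]
```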
TensorFlow’s object detection Application Programming Interface (API) is selected as the framework for the manned submersible’s underwater camera information target detection. The later model training and image recognition testing are conducted in the API. Figure 9 shows the training process of the model.
The parts in Figure 9 are image dataset input, target feature extraction, target feature classification, target position regression, and target detection result output. Among them, the image input part selects the API framework, which has no limit on the input image size, so it is no longer necessary to normalize the image size. Feature extraction is the core of the whole recognition task. The quality of target feature extraction is directly related to the accuracy of final target recognition. Then, the submersible camera model based on machine learning is used for the experiment. In the feature extraction of Figure 9, this paper adopts the feature extraction technology based on CNN. The main purpose of the Softmax layer is to solve the multi-classification problem, and it is specifically responsible for calculating the probabilities of different features to obtain different kinds of probability distributions. Softmax regression is a supervised learning algorithm based on the idea of logistic regression, which can achieve excellent classification results through an efficient combination of different learning algorithms. The function of the Regressor is parameter regression.
The TensorFlow framework is used as the development tool for target recognition and serves as the DL tool for the follow-up work. It is installed on a notebook computer: a Lenovo Ideapad Y700 with an NVIDIA 960M graphics card, 4 GB of running memory, and TensorFlow version 1.11. The GPU (graphics processing unit) build of TensorFlow is used here.
The experimental parameter settings are as follows:
- Extraction box scales: y = 10.0, x = 10.0, h = 5.0, w = 10.0;
- Anchors: 6 anchor layers, minimum anchor size 0.2, maximum anchor size 0.95, anchor ratio 1.0;
- Preselection frame: minimum depth 0, maximum depth 0, throw 0.8, number of cores 1;
- Matcher: matching threshold 0.5, non-matching threshold 0.5, ignore value N, below-threshold mismatch;
- Feature extractor: minimum depth 16, depth multiplier 1.0, activation function ReLU_6, weight 0.00004.
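For reference, these settings can be collected into a single structure; the key names below are descriptive labels chosen for this sketch, not the actual configuration fields of the TensorFlow Object Detection API.

```python
# The experimental settings above gathered into one structure for reference.
# Key names are descriptive labels, not the TF Object Detection API's fields.
experiment_params = {
    "box_scales": {"y": 10.0, "x": 10.0, "h": 5.0, "w": 10.0},
    "anchors": {"num_layers": 6, "min_scale": 0.2, "max_scale": 0.95,
                "aspect_ratio": 1.0},
    "preselection_frame": {"min_depth": 0, "max_depth": 0, "throw": 0.8,
                           "num_cores": 1},
    "matcher": {"matched_threshold": 0.5, "unmatched_threshold": 0.5,
                "below_threshold": "mismatch"},
    "feature_extractor": {"min_depth": 16, "depth_multiplier": 1.0,
                          "activation": "relu6", "weight": 4e-05},
}
```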
The underwater image classification evaluation indicators include accuracy, precision, recall, specificity, and the F1 value. The calculation method is shown in Equations (16)–(20):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
TP is the number of positive samples correctly predicted as positive; TN is the number of negative samples correctly predicted as negative; FN is the number of positive samples incorrectly predicted as negative; FP is the number of negative samples incorrectly predicted as positive.
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP},$$

and

$$F1\text{-}\mathrm{Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
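All five indicators follow directly from the four confusion-matrix counts; a direct implementation is shown below, with made-up counts for the example.

```python
# The five evaluation indicators in Equations (16)-(20), computed from the
# four confusion-matrix counts.
def classification_metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Example with made-up counts:
print(classification_metrics(tp=90, tn=85, fp=10, fn=15))
```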
In the classification and evaluation parts, conventional and laser-gated cameras’ underwater image datasets are established. These two datasets contain four types of images: human, fish, Autonomous Underwater Vehicle (AUV), and other objects. The image data in the conventional underwater image dataset comes from three sources: the image retrieved by the search engine, the image of the open underwater dataset, and the image actually taken by the research team in the laboratory pool. Then, the image data are expanded by rotating, flipping, and other dataset expansion methods. A conventional underwater image dataset of 1073 images is established. The dataset includes 223 person images, 258 fish images, 312 AUV images, and 280 other object images. The dataset is established by collecting the images of the corresponding objects in the laboratory pool, and the images in the laser-gated dataset are expanded by rotating, flipping, and other data enhancement methods. The dataset of underwater laser range-gated images is established, and there are 992 images in total. The dataset includes 240 human images, 254 fish images, 258 AUV images, and 240 other object images. According to 7:2:1, the dataset is divided into a training set, a verification set, and a test set.
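A minimal sketch of the 7:2:1 train/validation/test split just described follows; the shuffle seed is an assumption added for reproducibility.

```python
# 7:2:1 train/validation/test split, as described above.
import random

def split_dataset(items, seed=42):
    items = list(items)
    random.Random(seed).shuffle(items)          # fixed seed: reproducibility
    n_train = int(0.7 * len(items))
    n_val = int(0.2 * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(1073))   # conventional dataset size
print(len(train), len(val), len(test))          # 751 214 108
```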
The model pre-trained on ImageNet using ResNet34 is retrained on the two datasets established here, i.e., the conventional underwater image dataset and the underwater laser range-gated dataset. The input image first passes through a convolutional layer with a stride of 2 and a kernel size of 7 × 7 and then through a 3 × 3 max-pooling layer with a stride of 2. The image then passes through four groups of residual blocks. After the first group, the feature map size is 56 × 56 with a depth of 64, and this structure is repeated. The second group of residual blocks outputs a feature map of size 28 × 28 and depth 128, and the third group outputs a feature map of size 14 × 14 and depth 256. Finally, a 7 × 7 feature map is obtained through the last group of residual blocks, and the classification outputs are obtained through average pooling and the fully connected layer. The hyperparameters of this CNN model are set as follows: the padding is $p_h = k_h - 1$ and $p_w = k_w - 1$ (where h denotes rows, w denotes columns, and k is the convolution kernel size). Given strides $s_h$ in height and $s_w$ in width, the output shape is $\lfloor (n_h - k_h + p_h + s_h)/s_h \rfloor \times \lfloor (n_w - k_w + p_w + s_w)/s_w \rfloor$, rounded down, where $n_h \times n_w$ is the input shape.
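The output-shape rule above can be checked in a few lines; the ResNet34 values below (224 → 112 → 56) follow from the stated kernel sizes and strides, with the 224 × 224 input size assumed as the standard ImageNet resolution.

```python
# Output-shape rule stated above: with padding p = k - 1, an n-pixel side,
# k-pixel kernel, and stride s give floor((n - k + p + s) / s).
def conv_output_size(n, k, s):
    p = k - 1
    return (n - k + p + s) // s

# First ResNet34 stage: 224 -> 112 (7x7 conv, stride 2), then 112 -> 56
# (3x3 max pool, stride 2), matching the 56x56 feature map described above.
print(conv_output_size(224, 7, 2))  # 112
print(conv_output_size(112, 3, 2))  # 56
```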

4. Experimental Results and Analysis

4.1. Analysis of the Recognition Effect of the Model on a Single Target with a Simple Background

Based on image recognition in the field of sensory perception in neurorobotics, this exploration combines CNN to carry out target recognition analysis during diving. The trained model is derived, on which the underwater camera information of manned submersibles is recognized, and the recognition results are analyzed. First, a single target in the underwater image is detected. Figure 10 shows the results.
Figure 10 shows that, for a single object with simple background information, the recognition efficiency of the algorithm is outstanding, and the accuracy of target detection can reach 99%. The model's accuracy in detecting large underwater organisms is stable at 99%. It can also accurately display the object's location, an essential respect in which the model is superior to other models. In comparison, in the latest research on recognition algorithms by Lei et al. (2022), the accuracy of underwater biological detection is 96%; the results here are slightly higher than the literature data, which shows that this exploration is feasible [38].

4.2. Effect Analysis of Multi-Target Recognition with a Single Background

Figure 11 displays the recognition results of an image containing multiple targets and complex background information.
Figure 11a reveals that the model has an ideal detection effect for multiple targets of different categories in the image. The five detection frames shown in the figure are all effective detection frames, and the model has made great progress in detection accuracy. Figure 11b shows the detection of multiple seagrass plants. It suggests that the model can successfully detect the different location information of seagrass. Although the detection overlap is large, the model can still accurately detect the specific seagrass information. The problem of large detection overlap may be caused by the small training set, which can be improved by expanding the dataset in the later stage. There are multiple studies on image recognition with complex backgrounds. In the latest research of Foresti and Scagnetto (2022), the recognition results of images with complex backgrounds showed that the dataset has a certain impact on image recognition, which is basically consistent with the research conclusion here [39].

4.3. Analysis of Recognition Effect of the Model on Dense Targets

Figure 12 shows the recognition effect of the network model on dense underwater multiple targets.
The underwater targets in the figure are seagrass and coral, which are small and densely distributed. The recognition effect shows that five targets are successfully identified in Figure 12a, including two seagrass targets, two coral targets, and one reef target. In Figure 12b, eight targets are identified, including five seagrass targets and three coral targets. The recognition accuracy of the two images is above 70%. By observing the model recognition effect, it is found that the detection algorithm can better detect the targets with clear contour, and the recognition effect of individual targets is often higher than that of disorderly targets. Compared with the latest research of Estrada et al. (2022), the target detection and recognition results here are higher [40]. In addition, Zhou et al. (2022) used the same data set to study the underwater vehicle for coral reef ecological protection and obtained the underwater vehicle with a good anti-disturbance effect [41].

4.4. Classification Performance Evaluation

The accuracy rate of network classification and recognition obtained by this test set is 0.966. The classification confusion matrix results obtained for the validation set in the data set are shown in Figure 13.
The accuracy, recall, specificity, and F1-Score of each category are calculated by the confusion matrix, as shown in Figure 14. The network’s evaluation indexes of various categories show that the network has a good effect on image classification and recognition, as shown in the figure. The experiment of underwater image recognition and classification based on CNN is also added. Figure 14 shows the result.
Figure 14 reveals that, whether it is underwater conventional image classification or underwater-gated image classification from the network evaluation indicators of each category, the network has a good effect on image classification and recognition. The values of the four evaluation indicators exceed 0.9. CNN has better classification performance for underwater gated images. Because of underwater laser range-gated imaging characteristics, the targets on the image have no background interference, and the accuracy of classification and recognition is comparatively better. It suggests that using CNN to recognize and classify the images collected by camera hardware automatically realizes the automatic, efficient, and intelligent target perception of underwater image detectors to a certain extent. Yu et al. (2021) proposed a unique behavior recognition method based on simulated feature point selection. Combining feature point extraction with special behavior recognition, the accuracy of eating behavior detection is 96.02% [42]. Compared with the previous studies, the accuracy of underwater classification detection in this paper has little difference, so the CNN model can be applied to automatic image detection. Yang et al. (2021) discussed the submersible underwater vehicle using the same dataset [43]. The overall performance of the target recognition and classification performance evaluation of this model shows that this model has good underwater automatic recognition and classification performance.

5. Discussion

Through feature extraction, feature classification, and target recognition of specific objects in underwater camera information obtained by manned submersibles, and underwater image classification based on CNN, the function of mining underwater camera information of manned submersibles is realized, and the efficiency of obtaining underwater camera information of manned submersibles is improved. It is expected to provide a foundation for future research on the sensory vision of underwater robots. With the development of recognition technology, many classic recognition algorithms have been formed, including recognition methods based on local features, global features, and visual features. According to the characteristics of underwater camera information of manned submersibles, through experiments, analysis of experimental results, and comparison with other studies, it is found that the SSD algorithm has certain advantages in recognition speed and performs well in detecting small targets. Considering the accuracy and recognition speed, based on the requirements of the underwater camera information recognition system of manned submersibles for algorithm accuracy and response time, the research ideas and methods in this paper can realize underwater target detection, which is of great significance to related diving and outdoor sports fields. The optimized algorithm structure will make the recognition accuracy and speed better. Synthetic data is data created by humans rather than obtained from real life. It develops from the demand for machine learning for data collection. Initially, in order to accurately train the AI model, training data covering all possible scenarios must be obtained. If a scene does not happen or is not obtained, there is no corresponding data, and there will be a huge gap in the ability of the machine to understand the scene. By creating corresponding synthetic data through computer programs, these gaps in application scenarios can be made up. The main advantages of data synthesis include cost reduction, speed of data collection, and inclusiveness of data sets. For example, He et al. (2022) studied whether the synthesized data based on the “text-image” generation method is helpful to image classification from three angles, including zero-sample image classification, few-sample image classification, and transfer learning. It was found that the larger the scale of synthesized data, the higher the performance, and increasing the diversity of synthesized samples is also an important influencing factor [44]. If the dataset in this paper uses synthetic data, it will greatly improve the collection speed and efficiency of dataset collection.

6. Conclusions

Manned submersibles were created mainly to enable underwater activities, such as underwater surveys and seabed exploration, with divers at the center of the activity. Learning about marine biological species is one of the pleasures of students' outdoor diving education. This paper has a positive impact on the sustainable development of outdoor marine education, and its research will enhance learners' interest in exploring and understanding the ocean. To address the difficulty of extracting target information from the underwater camera data of manned submersibles, a target information mining method based on deep learning and machine vision is proposed, and an underwater detection model with good performance is obtained. Some research deficiencies remain. There is still no comparison with other algorithms in the same experimental environment (robot research could also tilt toward wearable devices, etc.), and there is no analysis of the model's robustness to noise or of the hyperparameter settings of the CNN model. The performance test experiments are not comprehensive enough: the F1-score is used, but the mAP value is not. Follow-up research will therefore focus on parallel and synchronous computation of the algorithm, the learning rate, optimization of iteration parameters, and other related technical issues. More experiments on the model, including mAP analysis (sketched below), will be added; comparisons with other research methods and results will be expanded; larger-scale database analysis will be carried out; algorithm parameter optimization will be studied further; robot-oriented analysis will be added; and the combination of the target information mining algorithm with CNN identification and classification will be strengthened, together with a detailed robustness analysis under noise and the setting of model hyperparameters. This paper is expected to be applied in the field of intelligent underwater detection, laying a foundation for the development of intelligent underwater detection.
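Since mAP is named as the metric to be added in follow-up work, the following minimal sketch shows one common way to compute COCO-style mAP for detection outputs using the torchmetrics library. The boxes, scores, and labels below are hypothetical placeholders, not results from this paper.

```python
# Minimal sketch: computing detection mAP with torchmetrics.
# All boxes, scores, and labels are hypothetical placeholders.
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# Predictions: one dict per image with boxes (xyxy), confidence scores, labels.
preds = [{
    "boxes": torch.tensor([[50.0, 40.0, 120.0, 110.0]]),
    "scores": torch.tensor([0.87]),
    "labels": torch.tensor([1]),
}]
# Ground truth for the same image.
target = [{
    "boxes": torch.tensor([[48.0, 42.0, 118.0, 108.0]]),
    "labels": torch.tensor([1]),
}]

metric = MeanAveragePrecision()      # COCO-style mAP over IoU 0.50:0.95
metric.update(preds, target)
results = metric.compute()
print("mAP:", results["map"].item(), "mAP@0.5:", results["map_50"].item())
```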

Author Contributions

Conceptualization, X.Y. and S.S.; methodology, Y.W.; software, Y.Y.; validation, T.F.T.K. and S.S.N.b.Y.; formal analysis, X.Y.; investigation, S.S.; data curation, Y.W. and Y.Y.; writing—original draft preparation, X.Y.; writing—review and editing, S.S.; visualization, S.S.N.b.Y.; supervision, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable for studies not involving humans or animals.

Informed Consent Statement

Not applicable for studies not involving humans.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors without undue reservation upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Li, Z. Consumer behavior analysis model based on machine learning. J. Intell. Fuzzy Syst. 2021, 40, 6433–6443.
2. Liu, J.; Lin, L.; Liang, X. Intelligent system of English composition scoring model based on improved machine learning algorithm. J. Intell. Fuzzy Syst. 2021, 40, 2397–2407.
3. Kim, M.J. Building a cardiovascular disease prediction model for smartwatch users using machine learning: Based on the Korea National Health and Nutrition Examination Survey. Biosensors 2021, 11, 228.
4. Chen, P.W.; Baune, N.A.; Zwir, I.; Wang, J.; Wong, A. Measuring activities of daily living in stroke patients with motion machine learning algorithms: A pilot study. Int. J. Environ. Res. Public Health 2021, 18, 1634.
5. Ho, M.C.; Shen, H.A.; Chang, Y.; Weng, J.C. A CNN-based autoencoder and machine learning model for identifying betel-quid chewers using functional MRI features. Brain Sci. 2021, 11, 809.
6. Ma, G.; Pan, X. Research on a visual comfort model based on individual preference in China through machine learning algorithm. Sustainability 2021, 13, 7602.
7. Sul, B.B.; Dhanalakshami, K. Machine learning-based self-sensing of the stiffness of shape memory coil actuator. Soft Comput. 2021, 26, 3743–3755.
8. Brandes, T.S.; Ballard, B.; Ramakrishnan, S.; Lockhart, E.; Marchand, B.; Rabenold, P. Environmentally adaptive automated recognition of underwater mines with synthetic aperture sonar imagery. J. Acoust. Soc. Am. 2021, 150, 851–863.
9. Kazimierski, W.; Zaniewicz, G. Determination of process noise for underwater target tracking with forward looking sonar. Remote Sens. 2021, 13, 1014.
10. Moghimi, M.K.; Mohanna, F. Real-time underwater image enhancement: A systematic review. J. Real-Time Image Process. 2021, 18, 1509–1525.
11. Yu, Y.; Liang, S.; Samali, B.; Nguyen, T.N.; Zhai, C.; Li, J.; Xie, X. Torsional capacity evaluation of RC beams using an improved bird swarm algorithm optimised 2D convolutional neural network. Eng. Struct. 2022, 273, 115066.
12. Yu, Y.; Samali, B.; Rashidi, M.; Mohammadi, M.; Nguyen, T.N.; Zhang, G. Vision-based concrete crack detection using a hybrid framework considering noise effect. J. Build. Eng. 2022, 61, 105246.
13. Peng, F.; Miao, Z.; Li, F.; Li, Z. S-FPN: A shortcut feature pyramid network for sea cucumber detection in underwater images. Expert Syst. Appl. 2021, 182, 115306.
14. Cheng, H.; Chu, J.; Zhang, R.; Zhang, P. Simulation and measurement of the effect of various factors on underwater polarization patterns. Optik 2021, 237, 166637.
15. Goryunov, M.N.; Matskevich, A.G.; Rybolovlev, D.A. Synthesis of a machine learning model for detecting computer attacks based on the CICIDS2017 dataset. Proc. Inst. Syst. Program. RAS 2020, 32, 81–94.
16. Park, D.; Ahn, J.; Jang, J.; Yu, W.; Yoo, I. The development of software teaching-learning model based on machine learning platform. J. Korean Assoc. Inf. Educ. 2020, 24, 49–57.
17. Atzori, M.; Müller, H. PaWFE: Fast signal feature extraction using parallel time windows. Front. Neurorobot. 2019, 13, 74.
18. Chang, M.; Canseco, J.A.; Nicholson, K.J.; Patel, N.; Vaccaro, A.R. The role of machine learning in spine surgery: The future is now. Front. Surg. 2020, 7, 54.
19. Chen, K.; Hwu, T.; Kashyap, H.J.; Krichmar, J.L.; Stewart, K.; Xing, J.; Zou, X. Neurorobots as a means toward neuroethology and explainable AI. Front. Neurorobot. 2020, 14, 570308.
20. Leins, D.P.; Gibas, C.; Brück, R.; Haschke, R. Toward More Robust Hand Gesture Recognition on EIT Data. Front. Neurorobot. 2021, 16, 110.
21. Liu, Y.; Li, Y.; Yi, X.; Hu, Z.; Zhang, H.; Liu, Y. Lightweight ViT model for micro-expression recognition enhanced by transfer learning. Front. Neurorobot. 2022, 15, 128.
22. Deeba, K.; Amutha, B. ResNet-deep neural network architecture for leaf disease classification. Microprocess. Microsyst. 2020, 17, 103364.
23. Taniguchi, T.; Ugur, E.; Ogata, T.; Nagai, T.; Demiris, Y. Machine Learning Methods for High-Level Cognitive Capabilities in Robotics. Front. Neurorobot. 2019, 13, 83.
24. Tryon, J.; Trejos, A.L. Evaluating Convolutional Neural Networks as a Method of EEG–EMG Fusion. Front. Neurorobot. 2021, 15, 692183.
25. Nair, B.B.; Sakthivel, N.R. An upper limb rehabilitation exercise status identification system based on machine learning and IoT. Arab. J. Sci. Eng. 2021, 47, 2095–2121.
26. Wang, Z.; Liu, J.; Zhang, Y.; Yuan, H.; Srinivasan, R.S. Practical issues in implementing machine-learning models for building energy efficiency: Moving beyond obstacles. Renew. Sustain. Energy Rev. 2021, 143, 110929.
27. Li, L. Software reliability growth fault correction model based on machine learning and neural network algorithm. Microprocess. Microsyst. 2021, 80, 103538.
28. Kudernatsch, S.; Wolfe, C.; Ferdowsi, H.; Peterson, D. A machine learning approach to hand-arm motion prediction for active upper extremity occupational exoskeleton devices. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2020, 64, 890–893.
29. Xiang, L.; Wang, A.; Gu, Y.; Zhao, L.; Shim, V.; Fernandez, J. Recent Machine Learning Progress in Lower Limb Running Biomechanics with Wearable Technology: A Systematic Review. Front. Neurorobot. 2022, 16, 913052.
30. Fiorentini, N.; Maboudi, M.; Leandri, P.; Losa, M.; Gerke, M. Surface motion prediction and mapping for road infrastructures management by PS-InSAR measurements and machine learning algorithms. Remote Sens. 2020, 12, 3976.
31. Little, K.; Pappachan, B.K.; Yang, S.; Noronha, B.; Accoto, D. Elbow motion trajectory prediction using a multi-modal wearable system: A comparative analysis of machine learning techniques. Sensors 2021, 21, 498.
32. Khosravikia, F.; Clayton, P. Machine learning in ground motion prediction. Comput. Geosci. 2021, 148, 104700.
33. Chen, Y.; Wang, X.; Du, X. Diagnostic evaluation model of English learning based on machine learning. J. Intell. Fuzzy Syst. 2021, 40, 2169–2179.
34. Saluja, J.; Casanova, J.; Lin, J. A supervised machine learning algorithm for heart-rate detection using Doppler motion-sensing radar. IEEE J. Electromagn. RF Microw. Med. Biol. 2020, 4, 45–51.
35. Abdollahi, M.; Ashouri, S.; Abedi, M.; Azadeh-Fard, N.; Rashedi, E. Using a motion sensor to categorize nonspecific low back pain patients: A machine learning approach. Sensors 2020, 20, 3600.
36. Teixeira, J.V.; Hai, N.; Posselt, D.J.; Su, H.; Wu, L. Using machine learning to model uncertainty for water vapor atmospheric motion vectors. Atmos. Meas. Tech. 2021, 14, 1941–1957.
37. Zheng, Y.; Song, Q.; Liu, J.; Song, Q.; Yue, Q. Research on motion pattern recognition of exoskeleton robot based on multimodal machine learning model. Neural Comput. Appl. 2020, 32, 1869–1877.
38. Lei, F.; Tang, F.; Li, S. Underwater Target Detection Algorithm Based on Improved YOLOv5. J. Mar. Sci. Eng. 2022, 10, 310.
39. Foresti, G.L.; Scagnetto, I. An integrated low-cost system for object detection in underwater environments. Integr. Comput.-Aided Eng. 2022, 29, 123–139.
40. Estrada, D.C.; Dalgleish, F.R.; Den Ouden, C.J.; Ramos, B.; Li, Y.; Ouyang, B. Underwater LiDAR image enhancement using a GAN based machine learning technique. IEEE Sens. J. 2022, 22, 4438–4451.
41. Zhou, J.; Zhou, N.; Che, Y.; Gao, J.; Zhao, L.; Huang, H.; Chen, Y. Design and Development of an Autonomous Underwater Helicopter for Ecological Observation of Coral Reefs. Sensors 2022, 22, 1770.
42. Yu, X.; Wang, Y.; An, D.; Wei, Y. Identification methodology of special behaviors for fish school based on spatial behavior characteristics. Comput. Electron. Agric. 2021, 185, 106169.
43. Yang, Y.; Xiao, Y.; Li, T. A survey of autonomous underwater vehicle formation: Performance, formation control, and communication capability. IEEE Commun. Surv. Tutor. 2021, 23, 815–841.
44. He, R.; Sun, S.; Yu, X.; Xue, C.; Zhang, W.; Torr, P.; Bai, S.; Qi, X. Is synthetic data from generative models ready for image recognition? arXiv 2022, arXiv:2210.07574.
Figure 1. Underwater vehicle and image acquisition. (a) AUSS vehicle, (b) LAUV vehicle, and (c) image acquisition diagram.
Figure 2. Schematic diagram of CNN input layer.
Figure 3. Local connection diagram of convolution neuron weight sharing.
Figure 4. Process diagram of underwater camera information recognition of the manned submersible.
Figure 5. Schematic diagram of the anchor calculation process.
Figure 6. Algorithm framework.
Figure 7. Acquired underwater camera information. (a) Digital image and (b) video screenshot.
Figure 8. Data filtering flow chart.
Figure 9. Schematic diagram of the model training process.
Figure 10. Experimental results of single-target detection. (a) The first recognition, and (b) the second recognition.
Figure 11. Image recognition results in multiple targets with complex backgrounds. (a) The first recognition, and (b) the second recognition.
Figure 12. Recognition and detection results of dense underwater multiple targets. (a) The first recognition, and (b) the second recognition.
Figure 13. Verification of the confusion matrix of the image set. (a) Conventional underwater image, and (b) underwater-gated image.
Figure 14. Evaluation of CNN classification performance. (a) Conventional underwater image, and (b) underwater-gated image.