EMTT-YOLO: An Efficient Multiple Target Detection and Tracking Method for Mariculture Network Based on Deep Learning

Lv, Chunfeng; Yang, Hongwei; Zhu, Jianping

doi:10.3390/jmse12081272


by Chunfeng Lv ¹, Hongwei Yang ² and Jianping Zhu ¹,*

¹ College of Engineering Science and Technology, Shanghai Ocean University, No. 999, Huchenghuan Rd., Shanghai 201306, China
² Department of Electronic, Information and Electrical Engineering, Shanghai Jiaotong University, No. 800, Dongchuan Road, Shanghai 200240, China
* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2024, 12(8), 1272; https://doi.org/10.3390/jmse12081272

Submission received: 5 July 2024 / Revised: 27 July 2024 / Accepted: 27 July 2024 / Published: 29 July 2024

(This article belongs to the Special Issue Motion Control and Path Planning of Marine Vehicles—2nd Edition)


Abstract

Efficient multiple target tracking (MTT) is key to achieving green, precise, and large-scale aquaculture, marine exploration, and marine farming. Traditional MTT methods based on Bayes estimation suffer from open problems such as an unknown detection probability, random target birth, and complex data association, which degrade the tracking performance. In this work, an efficient two-stage MTT method based on a YOLOv8 detector and an SMC-PHD tracker, named EMTT-YOLO, is proposed to enhance the detection probability and thereby improve the tracking performance. Firstly, in the detection stage, a YOLOv8 model equipped with several improved modules detects multiple targets and outputs features such as the bounding box coordinates, confidence, and detection probability. Secondly, in the tracking stage, particles are built from the detection results and the SMC-PHD filter is applied to track multiple targets. Thirdly, the lightweight Hungarian data association method is introduced to associate the estimates across frames and derive the trajectories of multiple targets. Comprehensive experiments verify the effectiveness of the two-stage EMTT-YOLO method, and comparisons with other multiple target detection and tracking methods demonstrate that both the detection and tracking performance are greatly improved.

Keywords:

multiple target tracking; mariculture network; SMC-PHD filter; YOLOv8

1. Introduction

With the development of the marine economy and aquaculture, the detection and tracking of edible marine creatures has recently become an important research direction [1,2,3]. Edible marine organisms are diverse, rich in biological resources, and play a significant role in both marine ecosystems and marine farming systems. They are also economically important because of their food and nutritional value. Therefore, the rational development, research, and protection of edible marine resources are of great significance. Monitoring and detecting the behaviors of marine creatures can ensure the rational use of resources, help control the water quality for healthy breeding, and assess the welfare of farmed creatures for timely rescue in the marine farming industry. In this paper, the main research object is the detection and tracking of farmed marine fish.

MTT technology has been applied in many areas, such as environmental monitoring, intelligent agriculture, disaster early warning and rescue, the Internet of Things (IoT), and military monitoring [4]. At present, there are two main types of MTT. The first is traditional MTT, which is based on Bayesian prediction and updating. Bayes-based methods face problems such as an unknown detection probability, random target birth, and complex data association. Multiple target filters based on random finite sets (RFS), such as the probability hypothesis density (PHD) filter [5], the cardinalized PHD (CPHD) filter [6], and the multi-target multi-Bernoulli (MeMBer) filter [7], have been developed to estimate the kinematic states and the number of multiple targets. To alleviate the computational complexity of the high-dimensional integrals in the Bayes recursion, the PHD and CPHD filters propagate moments and cardinality distributions, while MeMBer propagates the multiple target posterior density, both for point targets in sensor-based applications and for extended targets in image-based applications. Existing implementations of the PHD filter mainly include the Gaussian mixture PHD filter (GM-PHD) [8], the sequential Monte Carlo PHD filter (SMC-PHD) [9], and their modified versions [10]. The GM-PHD filter assumes Gaussian distributions, whereas the SMC-PHD filter requires no assumption on the target distributions and represents the PHD by a set of randomly weighted particles. However, the SMC-PHD filter suffers from missed detections and an unknown detection probability, which degrades the estimation performance.

The second type is image-based detection and tracking. With the development of computer vision technology and deep learning technology, more intuitive, more accurate, and more intelligent detection and tracking strategies are proposed, which are the future development trend. The YOLO-BYTE method [11] is proposed to address the problem of missed detection and false detection caused by complex environments in individual cow detection and tracking, which improves the feature extraction module of the YOLOv7 backbone model through adding a self-attention and convolution mixed module to account for the uneven spatial distribution and target scale variation of the cows. And an improved lightweight spatial pyramid pooling cross stage partial connections (SPPCSPC-L) module is used to reduce model complexity, and the states of the Kalman filter are improved through predicting the width and height of the bounding boxes to make the bounding boxes match the cows more precisely and accurately. The DsP-YOLO method [12] is proposed for small object detection, in which YOLOv8 is adopted as the basic framework for detection to eliminate the affections of hyperparameters related to anchors, as well as to improve the detection capability of multiscale and small-size defects. A lightweight and detail-sensitive PAN (DsPAN) is proposed for the small object detection of multiscale defects through designing an attention mechanism embedded feature transformation module (LCBHAM) and optimizing the lightweight implementation. Other detection schemes based on deep learning [13,14,15,16] are proposed to access the problems originated from detecting the circumstance or detecting the object, such as small-scale target detection, underwater small target detection, multiple scale target detection, dense and obstruct target detection, and so on.

The two-stage tracking method [17] is proposed to overcome the limitations of scale variations, random motion, and occlusion in multiple cattle tracking networks, which utilizes a detection-based tracking approach of a YOLOv5 detector for cattle detection to provide initial targets. And then, the tracking algorithm DeepSORT is implemented based on the aforementioned detection results. A U-YOLOv7 [18] is proposed to detect underwater organisms, accompanied by the BoT-SORT method for multiple target tracking. CrossConv and an efficient squeeze–excitation module is combined to increase channel information extraction while reducing the parameters and enhancing the feature fusion, and more semantic information is derived by a lightweight Content-Aware ReAssembly of FEatures (CARAFE) operator. An enhanced YOLOv7 and DeepSORT algorithm [19] is proposed to track foreign entities in the coal domain through the reduction in backbone convolutional layers, adopting the context overlap and transition network (COTN) module, and the incorporation of a compact target detection layer. And also, DeepSORT is improved by substituting the re-recognition network structure with the machine translation interface (MTL) framework and substituting the foreign object tracking method of DeepSORT with the occlusion-aware spatial attention (OSA) module. The feature [20] based on a deep CNN is derived for target tracking in moving scenes with the non-rigid deformation of the target, frequent occlusion, clutter of the target background, and interference of similar objects, in which a feature coding algorithm is proposed and the extracted DCNN features are encoded by the Fisher Vectors algorithm. GN-YOLOv5 is proposed in [21] to detect and track fish, in which GhostNetv2 is integrated into the YOLOv5 object detection algorithm, with the Coordinate Attention (CA) module. 
And then, the Generalized Intersection over Union (GIoU) method is incorporated into the StrongSORT tracking algorithm, accompanying with an established fish re-identification model. SiamFCA [22], based on the Siamese network framework, is proposed to solve the problems of similarity interference, occlusion, and scale variation among fish for single fish tracking under aquaculture environments, and the contrast-limited adaptive histogram equalization (CLAHE) is adopted to solve the noise problem and then increase the image contrast and retain more details. The FSTA [23], based on the tracking-by-detection paradigm, is proposed to track underwater objects, which adds an amendment detection module to solve the problem of low recognition accuracy. An underwater data association algorithm is introduced to recombine representation and location information to improve the data matching process for aquatic non-rigid organisms. An underwater target detection algorithm YOLO-T and a target positioning algorithm [24] are combined for underwater target detection and tracking, using the Ghost module and SE attention module to improve the calculation time of the target. And then, a positioning algorithm is presented to calculate the position and attitude of the target according to the geometric information of the designed cooperative marker. A combination of YOLOv8 and DeepSORT based on OSNet, FSA, and GFModel, named YOFGD [25], is proposed to access the issues of local occlusion ID dynamic transformation, the nonlinear condition of the pedestrian trajectory, and merging of the local and global information, respectively, for multiple target pedestrian tracking. A similar combination of YOLOv8 and DeepSORT for a multiple target tracking network is proposed in [26,27,28]. 
The combination of the improved YOLOv5s model with the optimized DeepSORT tracking algorithm [29] is proposed to detect and track vehicles on traffic roads, in which the Attention-based Intra-scale Feature Interaction (AIFI) module is adopted to detect vehicles and the Kalman filtering (KF) algorithm and the re-recognition network of DeepSORT are used to improve the accuracy. A sonar-based fish object detection and counting method using an improved YOLOv8 combined with BoT-SORT [30] is proposed to detect and track multiple fish targets, utilizing the techniques of the lightweight upsampling operator CARAFE, generalized feature pyramid network GFPN, and partial convolution.

An efficient hybridization method [31] of a deep CNN for an Underwater Object Detection and Tracking model, named HDCNN-UODT, is proposed for underwater object detection and tracking, which adopts a data augmentation process to improve detection and prediction. And then, a hybridization of two deep learning (DL) models, namely the RetinaNet and EfficientNet models, is introduced as feature extractors, and the bounding box prediction process is introduced to track the targets. A novel transfer learning algorithm with SMC-PHD [32] is proposed to automatically customize the YOLO framework with unlabeled target sequences, in which the detection probability and clutter density of the SMC-PHD filter are used to retrain the YOLO network for occluded targets and clutter. And a likelihood density with the confidence probability of the YOLO is used to set up the sample particles.

Inspired by the strategies of [17,32], an efficient two-stage multiple target tracking method based on a YOLOv8 detector and an SMC-PHD tracker is proposed to enhance the detection probability and thereby improve the tracking performance. In the first, detection stage, a YOLOv8 model equipped with several improved modules detects multiple targets and outputs features such as the bounding box coordinates, confidence, and detection probability. In a mariculture detection and tracking network, the targets include various kinds of fish, crabs, sea cucumbers, and other marine products, so features must be extracted at multiple scales. Marine targets are also constantly in motion along unpredictable paths, which causes information loss in the detection and tracking process. Moreover, background objects in the mariculture environment, such as rocks, floating kelp, and moving creatures that are not farmed targets, seriously hinder efficient identification and tracking. To solve these problems, several improved modules are introduced into the YOLOv8 model. An ADown module is introduced to reduce the information loss caused by downsampling in the neck. Then, DRC2f, an accurate and lightweight feature extraction module, is designed to enhance feature extraction. In addition, a CA attention mechanism is integrated into the SPPF to suppress background information in the feature map, which yields more accurate location information and more precise target features and consequently enhances the tracking performance.

Secondly, the particles are built based on the previous detection results, and then the SMC-PHD filter, the second tracking stage, is proposed to track the multiple targets. Then, the lightweight data association Hungarian method is introduced to set up the data relevance to derive the trajectories of multiple targets. Moreover, comprehensive experiments are presented to verify the effectiveness of this two-stage tracking method of EMTT-YOLO. The main contributions of this work are shown as follows:

An efficient two-stage MTT method based on a YOLOv8 detector and SMC-PHD tracker, EMTT-YOLO, is proposed to enhance the detection probability and then improve the tracking performance.
For the first detection stage, a YOLOv8 model equipped with several improved modules is introduced to detect multiple targets and output features such as the bounding box coordinates, confidence, and detection probability.
For the second tracking stage, particles are built from the previous detection results, and the traditional SMC-PHD filter is then applied to track multiple targets through prediction and updating.
The lightweight Hungarian data association method is introduced to associate the estimates across frames and derive the trajectories of multiple targets.

2. Materials and Methods

An efficient two-stage multiple target tracking method based on a YOLOv8 detector and SMC-PHD tracker is proposed to enhance the detection probability and then improve the tracking performance. The overview of the EMTT-YOLO scheme is shown in Figure 1, which consists of two main parts, the detection process and tracking process.

2.1. Principles of the YOLOv8 Algorithm

YOLOv8, made publicly available by Ultralytics (https://github.com/ultralytics/ultralytics, accessed on 13 March 2024) in January 2023, is an enhanced iteration of YOLOv5. The YOLOv8 model comes in five versions that differ in network depth and width: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. To strike a balance between accuracy and model size, the EMTT-YOLO scheme detects and tracks multiple mariculture targets based on the YOLOv8n model.

The YOLOv8 network architecture is illustrated in Figure 2. Data pre-processing can resize the input image to 640 × 640 × 3, and data enhancement is implemented on the image. The YOLOv8 network is divided into three main components: the backbone, neck, and detection head.

Backbone network: YOLOv8 uses Darknet-53 as its backbone network. Five downsampling operations are performed on the input feature map to generate features at five scales, P1 to P5. In place of the C3 structure (Conv1, Conv2, and Conv3) of YOLOv5, YOLOv8 draws on the efficient layer aggregation network (ELAN) idea from YOLOv7 and combines C3 with ELAN to form the C2f module (CSPDarknet53 to 2-stage FPN), which integrates features across scales, improves the network’s feature representation, and enriches the gradient flow. The CBS module applies a convolution, batch normalization (BN), and a SiLU activation function to its input to produce the output.

Neck network: At the junction between the backbone and the neck, the spatial pyramid pooling fast (SPPF) module transforms feature maps of variable size into feature vectors of consistent size. Compared with the spatial pyramid pooling (SPP) structure, SPPF reduces the computation and increases the speed by connecting three max-pooling layers in series. To improve the model’s identification performance, the neck adopts the PANet architecture, which propagates feature information, merges features of different levels, and enhances the network’s ability to fuse the features of targets at different scales.

Head network: The head of YOLOv8 has three detection modules, each consisting of two branches. Each branch contains two CBS modules and one 2D convolution, and the two branches feed the bounding box loss and the classification loss, respectively. Detection and classification are thus handled separately by a decoupled head structure. Additionally, an anchor-free scheme replaces the anchor-based scheme of the YOLOv5 network and is combined with the dynamic TaskAlignedAssigner, which significantly decreases the computation time and improves the speed without compromising the accuracy.

The YOLOv8 loss function is computed over two branches, classification and regression. The classification branch continues to employ the binary cross entropy (BCE) loss, while the regression branch uses the distribution focal loss (DFL) and the CIoU loss. The combination of the two regression losses significantly improves target identification, since it makes it possible to regress the bounding boxes of targets more accurately.
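To make the regression term concrete, the following is a minimal sketch of the standard CIoU formulation (IoU minus a normalized center-distance penalty and an aspect-ratio penalty). It is not the authors' training code: the function names `ciou` and `ciou_loss` and the corner format `(x1, y1, x2, y2)` are assumptions chosen for illustration.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU between two boxes given as (x1, y1, x2, y2) corners (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter + eps
    iou = inter / union

    # Squared center distance, normalized by the squared diagonal
    # of the smallest box that encloses both boxes.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v


def ciou_loss(pred, target):
    """Regression loss: 0 for a perfect match, larger for worse boxes."""
    return 1.0 - ciou(pred, target)
```

Unlike plain IoU, the value stays informative for non-overlapping boxes, because the center-distance term still changes as the predicted box moves toward the target.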

2.2. Improved YOLOv8 Algorithm

In a mariculture detection and tracking network, the targets include various kinds of fish, crabs, sea cucumbers, and other marine products. These targets differ in scale, so features must be extracted at multiple scales. Marine targets are also constantly in motion along unpredictable paths, which causes information loss in the detection and tracking process. Moreover, background objects in the mariculture environment, such as rocks, floating kelp, and moving creatures that are not farmed targets, seriously hinder efficient identification and tracking. To reduce the information loss caused by downsampling in the neck, an ADown module is introduced into the improved YOLOv8. Then, an accurate and lightweight feature extraction module, DRC2f, is designed to enhance feature extraction. In addition, a CA attention mechanism is integrated into the SPPF to suppress background information in the feature map, which yields more accurate location information and more precise target features and consequently enhances the tracking performance. The structure of our improved YOLOv8 is shown in Figure 3.

ADown module: Downsampling is crucial for target detection. The original YOLOv8 model downsamples with Conv layers, which can lose a large amount of fine-grained information as the feature map shrinks. Because the dataset contains a large number of small mariculture targets, which carry little redundant information, this loss of fine-grained information can seriously degrade detection and tracking in a mariculture network. Therefore, the ADown downsampling module, which is adopted in the YOLOv9 project, is introduced into this network to enhance the detection and tracking behavior, as shown in Figure 4.

The input feature map is downsampled using average pooling to reduce the size of the feature map by half. Then, the feature map is split into two parts in channel dimensions. For the first part, a 3 × 3 convolution operation is adopted to extract the features and reduce the dimensionality of the feature map. For the other part, maximum pooling and a 1 × 1 pointwise convolution operation are adopted to enhance the extraction of nonlinear features and reduce the dimensionality further. And then, after the convolution operation, these two feature maps are combined together to form the output of the ADown module.

The main improvement of the ADown module is that it combines maximum pooling with average pooling throughout the downsampling process to extract more detailed and comprehensive feature information. Moreover, its multi-branch structure makes the network more flexible and makes it easier to extract features when targets of various scales are present. This varied downsampling enhances the representation ability of the detection model.
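The data flow of an ADown-style block can be traced with a small shape calculator. This is only a sketch under stated assumptions: the kernel, stride, and padding values below follow one common public configuration of the YOLOv9 ADown block as recalled here (average pooling with kernel 2 and stride 1, then stride-2 operations in both branches), and they may differ from the variant drawn in Figure 4. The helper names `out_size` and `adown_shapes` are hypothetical.

```python
def out_size(n, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def adown_shapes(c_in, c_out, h, w):
    """Trace (channels, height, width) through an ADown-style block.

    All hyperparameters are assumptions for illustration, not values
    taken from the paper.
    """
    trace = {"input": (c_in, h, w)}

    # Average pooling (kernel 2, stride 1) smooths the map before splitting.
    h1, w1 = out_size(h, 2, 1, 0), out_size(w, 2, 1, 0)
    trace["avg_pool"] = (c_in, h1, w1)

    # The channels are split into two halves, one per branch.
    half_in, half_out = c_in // 2, c_out // 2

    # Branch 1: 3x3 convolution with stride 2 and padding 1.
    h2, w2 = out_size(h1, 3, 2, 1), out_size(w1, 3, 2, 1)
    trace["branch1_conv3x3"] = (half_out, h2, w2)

    # Branch 2: 3x3 max pooling (stride 2, padding 1), then a 1x1 pointwise convolution.
    trace["branch2_max_pool"] = (half_in, h2, w2)
    trace["branch2_conv1x1"] = (half_out, out_size(h2, 1, 1, 0), out_size(w2, 1, 1, 0))

    # The two branches are concatenated along the channel dimension.
    trace["output"] = (2 * half_out, h2, w2)
    return trace
```

For example, `adown_shapes(64, 128, 80, 80)["output"]` gives `(128, 40, 40)`: the spatial size is halved while both an average-pooled and a max-pooled view of the input contribute to the output.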

DRC2f module: The C2f module of the original YOLOv8 is composed of Bottleneck and CBS blocks and enriches the gradient flow of the model through additional gradient-flow branch connections, but it can suffer from redundant computation and inadequate feature extraction. The Dilated Reparam Block (DRB) is therefore adopted in this detection network to reduce the computational complexity and improve the feature extraction capacity. Combining it with the original C2f module yields the new DRC2f module, an accurate and lightweight feature extraction module, as shown in Figure 5a. The DRC2f module retains the lightweight character of C2f while offering a higher feature extraction capability: it enlarges the receptive field while greatly reducing the number of parameters. Thus, it can efficiently extract global information from the input image or video and improve the precision of the model at the same time. The DR block contains a Conv layer and a DRB layer, as shown in Figure 5b.

The DRB module is implemented as shown in Figure 6. It adopts a dilated convolution strategy combined with reparameterization, which decomposes a large non-dilated convolutional kernel into a smaller non-dilated kernel and several smaller dilated kernels. Dilated convolution captures sparse features and generates enhanced features according to the dilation rate, and expanding the receptive field of the convolution kernel captures a wider range of contextual information. The reparameterization process provides supplementary parameters and nonlinear functions through which the convolution output is nonlinearly transformed. This strategy greatly enhances the expressiveness of the model and its ability to capture input features. The method for eliminating the inference cost of the dilated convolution is described in [30].
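The reason a dilated branch can be folded away at inference time is that a k × k kernel with dilation rate r computes exactly the same output as a dense kernel of size (k − 1)r + 1 with zeros inserted between the taps. The sketch below illustrates only this equivalence and the resulting merge of two parallel branches on a single channel; it ignores batch normalization and multi-channel weights, which a real reparameterization would also have to fold in. The function names are hypothetical.

```python
def dilated_to_dense(kernel, rate):
    """Insert zeros so that a k x k kernel with dilation `rate` becomes a dense kernel."""
    k = len(kernel)
    size = (k - 1) * rate + 1
    dense = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            dense[i * rate][j * rate] = kernel[i][j]
    return dense


def correlate2d(image, kernel, rate=1):
    """'Valid' 2D cross-correlation, i.e. what deep learning frameworks call convolution."""
    k = len(kernel)
    span = (k - 1) * rate + 1
    rows, cols = len(image) - span + 1, len(image[0]) - span + 1
    return [[sum(kernel[i][j] * image[r + i * rate][c + j * rate]
                 for i in range(k) for j in range(k))
             for c in range(cols)] for r in range(rows)]


def add_kernels(large, small):
    """Add a smaller odd-sized kernel into the center of a larger one."""
    offset = (len(large) - len(small)) // 2
    merged = [row[:] for row in large]
    for i, row in enumerate(small):
        for j, value in enumerate(row):
            merged[offset + i][offset + j] += value
    return merged
```

Because convolution is linear, the sum of a dense 5 × 5 branch and a dilated 3 × 3 branch (rate 2) equals a single 5 × 5 convolution with `add_kernels(big, dilated_to_dense(small, 2))`, so the extra branches add training-time capacity without extra inference-time layers.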

CA module: To weaken the influence of the background, the CA module is integrated into the SPPF to suppress background information in the feature map, which effectively improves the accuracy and robustness of detection. The CA module considers both spatial and channel information: it embeds spatial location information into the channel information, which enhances feature extraction. Because it takes the relationship between the feature map channels and spatial locations fully into account, it can adaptively adjust the importance of features during fusion and ignore the interference of irrelevant background information. Moreover, the CA module achieves a higher performance at a lower computational cost. Figure 7 demonstrates the structure of the CA module.

2.3. Principles of SMC-PHD Algorithm

The PHD filter is considered an efficient multiple target tracking method that reduces the computational complexity originating from the multiple high-order integrals of the multiple target Bayes filter. The SMC-PHD filter can be introduced briefly as follows. The target states and their corresponding weights for the PHD filter can be represented as the set $\{X_{k-1}^{i}, w_{k-1}^{i}\}_{i=1}^{N_{k-1}}$ for the $i$th target among $N_{k-1}$ particles at time $k-1$, with $X_{k-1}^{i} = [x_{k-1}^{i}, y_{k-1}^{i}, b_{k-1}^{i}, l_{k-1}^{i}, \theta_{k-1}^{i}]$, in which $x_{k-1}^{i}$ and $y_{k-1}^{i}$ refer to the center location coordinates of the $i$th target at time $k-1$, and $b_{k-1}^{i}$ and $l_{k-1}^{i}$ refer to the width and length of the target at time $k-1$, respectively. $\theta_{k-1}^{i}$ refers to the angle from the central location to the image coordinate origin, an additional variable that helps to identify the target location more clearly, as shown in Figure 8. The PHD can be described as

$$D_{k-1}(X) = \sum_{i=1}^{N_{k-1}} w_{k-1}^{i} \, \delta(X - X_{k-1}^{i}), \tag{1}$$

where $\delta(\cdot)$ is the Dirac delta function. Then, the SMC-PHD filter can be briefly described as follows:

Prediction step: Considering the $N_{k-1}$ surviving targets from the previous frame and the $N_{B,k}$ newborn targets in the current frame, the predicted PHD is given by

$$D_{k|k-1}(X) = \sum_{i=1}^{N_{k-1}+N_{B,k}} w_{k|k-1}^{i} \, \delta(X - X_{k|k-1}^{i}), \tag{2}$$

where $N_{B,k}$ denotes the number of new particles in the current frame. The predicted weights $w_{k|k-1}^{i}$ are defined as follows:

$$w_{k|k-1}^{i} = \begin{cases} \dfrac{p_{S,k|k-1}(X_{k-1}^{i}) \, p_{k|k-1}(X_{k|k-1}^{i} \mid X_{k-1}^{i}) \, w_{k-1}^{i}}{q_{k}(X_{k|k-1}^{i} \mid X_{k-1}^{i}, Z_{k})}, & i = 1, \dots, N_{k-1} \\[2ex] \dfrac{\zeta_{k}(X_{k|k-1}^{i})}{N_{B,k} \, p_{k}(X_{k|k-1}^{i} \mid Z_{k})}, & i = N_{k-1}+1, \dots, N_{k-1}+N_{B,k} \end{cases} \tag{3}$$

where $p_{S,k|k-1}$ refers to the survival probability, $p_{k|k-1}(\cdot \mid \cdot)$ refers to the state transition density, $q_{k}(\cdot \mid \cdot, Z_{k})$ and $p_{k}(\cdot \mid Z_{k})$ are the proposal densities from which the surviving and newborn particles are sampled given the current measurements $Z_{k}$, and $\zeta_{k}(\cdot)$ refers to the intensity of the newborn target RFS.
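As an illustration of Equations (2) and (3), the following minimal sketch implements the prediction step for one common special case: surviving particles are sampled from the transition density itself (so the ratio in the first branch reduces to the survival probability times the old weight), and newborn particles are sampled from the normalized birth intensity (so the second branch reduces to the birth mass divided by the number of newborn particles). The random-walk motion model, the noise scales `sigmas`, and the name `predict` are assumptions for illustration, not the configuration used in the paper.

```python
import random


def predict(particles, weights, births, p_survive=0.95, birth_mass=0.2,
            sigmas=(2.0, 2.0, 0.5, 0.5, 0.01), rng=random):
    """One SMC-PHD prediction step under simplifying assumptions.

    particles:  list of states [x, y, b, l, theta] at time k-1
    weights:    matching particle weights at time k-1
    births:     newborn particle states, e.g. placed around new detections
    birth_mass: expected number of newborn targets (integral of the birth intensity)
    """
    # Survivors: sample from a random-walk transition density (assumed model).
    # With the proposal equal to the transition density, Eq. (3) gives p_S * w.
    survivors = [[v + rng.gauss(0.0, s) for v, s in zip(p, sigmas)] for p in particles]
    survivor_weights = [p_survive * w for w in weights]

    # Newborns: sampled from the normalized birth intensity, so each particle
    # carries an equal share of the birth mass.
    newborn_weights = [birth_mass / len(births)] * len(births) if births else []

    return survivors + [list(b) for b in births], survivor_weights + newborn_weights
```

The sum of the returned weights is the predicted expected number of targets: the previous expected count scaled by the survival probability, plus the birth mass.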

Updating step: Based on the measurements $m_{k} \in M_{k}$ of the current frame, the updating procedure of the PHD filter can be written as follows:

$$D_{k}(X) = \sum_{i=1}^{N_{k-1}+N_{B,k}} w_{k}^{i} \, \delta(X - X_{k|k-1}^{i}), \tag{4}$$

$$w_{k}^{i} = w_{k|k-1}^{i} \left[ 1 - p_{D}(X_{k|k-1}^{i}) + \sum_{m_{k} \in M_{k}} \frac{p_{D}(X_{k|k-1}^{i}) \, l_{k}(m_{k} \mid X_{k|k-1}^{i})}{\kappa_{k}(m_{k}) + C_{k}(m_{k})} \right], \tag{5}$$

$$C_{k}(m_{k}) = \sum_{i=1}^{N_{k-1}+N_{B,k}} w_{k|k-1}^{i} \, p_{D}(X_{k|k-1}^{i}) \, l_{k}(m_{k} \mid X_{k|k-1}^{i}), \tag{6}$$

where $p_{D}$ refers to the detection probability, which is obtained from the detection phase, $l_{k}(\cdot \mid \cdot)$ refers to the measurement likelihood, and $\kappa_{k}(\cdot)$ refers to the clutter intensity. The framework of the EMTT-YOLO for multiple target detection and tracking is shown in Algorithm 1.
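The weight update of Equations (4)–(6) can be sketched in a few lines. This is a minimal illustration under assumptions that are not taken from the paper: the detection probability and clutter intensity are constants, and the likelihood is an isotropic Gaussian on the box center only. The names `gaussian_likelihood`, `update`, and `estimated_target_count` are hypothetical.

```python
import math


def gaussian_likelihood(measurement, particle, sigma=5.0):
    """Isotropic Gaussian likelihood on the box center (x, y); a modeling assumption."""
    dx, dy = measurement[0] - particle[0], measurement[1] - particle[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def update(particles, weights, measurements, p_detect, clutter=1e-6,
           likelihood=gaussian_likelihood):
    """SMC-PHD weight update following Equations (4)-(6) with constant p_D and clutter."""
    # g[j][i] = p_D * l_k(m_j | X_i) for measurement j and particle i.
    g = [[p_detect * likelihood(m, p) for p in particles] for m in measurements]

    # Eq. (6): C_k(m_j) = sum_i w_i * p_D * l_k(m_j | X_i).
    c = [sum(w * gi for w, gi in zip(weights, row)) for row in g]

    # Eq. (5): missed-detection term plus one term per measurement.
    new_weights = []
    for i, w in enumerate(weights):
        detected = sum(g[j][i] / (clutter + c[j]) for j in range(len(measurements)))
        new_weights.append(w * (1.0 - p_detect + detected))
    return new_weights


def estimated_target_count(weights):
    """The PHD integrates to the expected number of targets, i.e. the sum of the weights."""
    return round(sum(weights))
```

With no measurements, every weight is simply scaled by the missed-detection factor 1 − p_D. With perfect detection and no clutter, each measurement contributes exactly one unit of mass, so the updated weights sum to the number of measurements.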

Algorithm 1. A framework of EMTT-YOLO algorithms

Step 1: The extraction of target feature, confidence, and detection probability
    Input: The mariculture images and videos;
    Output: $X_{k-1}^{i} = [x_{k-1}^{i}, y_{k-1}^{i}, b_{k-1}^{i}, l_{k-1}^{i}, \theta_{k-1}^{i}]$, $p_{D}$;
    for $i = 1 : N_{k-1} + N_{B,k}$
        execute the multiple mariculture target detection,
    end for
Step 2: Multiple target tracking using PHD filter
    Input: The extracted feature, confidence, and detection probability;
    Output: $X_{k}^{i} = [x_{k}^{i}, y_{k}^{i}, b_{k}^{i}, l_{k}^{i}, \theta_{k}^{i}]$, $w_{k}^{i}$;
    Prediction for time step $k-1$:
        Input: $X_{k-1}^{i} = [x_{k-1}^{i}, y_{k-1}^{i}, b_{k-1}^{i}, l_{k-1}^{i}, \theta_{k-1}^{i}]$, $w_{k-1}^{i}$, $p_{D}$, $p_{S}$;
        Output: $X_{k|k-1}^{i} = [x_{k|k-1}^{i}, y_{k|k-1}^{i}, b_{k|k-1}^{i}, l_{k|k-1}^{i}, \theta_{k|k-1}^{i}]$, $w_{k|k-1}^{i}$;
        for $i = 1 : N_{k-1}$
            use Equations (2) and (3),
        end for
    Update for time step $k$:
        Input: $X_{k|k-1}^{i} = [x_{k|k-1}^{i}, y_{k|k-1}^{i}, b_{k|k-1}^{i}, l_{k|k-1}^{i}, \theta_{k|k-1}^{i}]$, $w_{k|k-1}^{i}$, $m_{k}$, $l_{k}$;
        Output: $X_{k}^{i} = [x_{k}^{i}, y_{k}^{i}, b_{k}^{i}, l_{k}^{i}, \theta_{k}^{i}]$, $w_{k}^{i}$;
        for $i = 1 : N_{k-1} + N_{B,k}$
            use Equations (4)–(6),
        end for
Step 3: Data association
    Input: $X_{k}^{i} = [x_{k}^{i}, y_{k}^{i}, b_{k}^{i}, l_{k}^{i}, \theta_{k}^{i}]$, $w_{k}^{i}$;
    Output: the target trajectory and its confidence.
    for $k = 1 : 300$
        for $i = 1 : N_{k-1} + N_{B,k}$
            use data association model of Hungarian algorithm,
        end for
    end for
Step 4: State and trajectory extraction
    The state estimation and trajectory extraction for multiple mariculture targets based on the detector of YOLOv8 and the tracker of the SMC-PHD filter.
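The data association step solves a minimum-cost one-to-one matching between existing tracks and the states estimated in the current frame. The sketch below states that optimization problem explicitly but, to stay short and dependency-free, solves it by brute-force enumeration rather than by the Hungarian algorithm itself; both return an optimal assignment, but enumeration is only practical for a handful of targets. In practice a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` handles the same cost matrix. The center-distance cost, the gating threshold `max_cost`, and the function names are assumptions for illustration.

```python
import math
from itertools import permutations


def center_distance(track, detection):
    """Euclidean distance between box centers; an assumed association cost."""
    return math.hypot(track[0] - detection[0], track[1] - detection[1])


def associate(tracks, detections, max_cost=50.0, cost=center_distance):
    """Minimum-cost one-to-one matching between tracks and detections.

    Brute-force stand-in for the Hungarian algorithm (same optimum, factorial time).
    Returns (track_index, detection_index) pairs whose cost is at most max_cost.
    """
    n, m = len(tracks), len(detections)
    size = max(n, m)

    # Pad to a square matrix: dummy rows and columns cost max_cost, so leaving
    # a track or a detection unmatched is always possible.
    matrix = [[cost(tracks[i], detections[j]) if i < n and j < m else max_cost
               for j in range(size)] for i in range(size)]

    best = min(permutations(range(size)),
               key=lambda perm: sum(matrix[i][perm[i]] for i in range(size)))

    # Drop dummy pairings and gate out implausibly distant matches.
    return [(i, j) for i, j in enumerate(best)
            if i < n and j < m and matrix[i][j] <= max_cost]
```

Unmatched detections are natural candidates for new trajectories, and tracks that stay unmatched for several frames can be terminated; those policies are outside this sketch.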

3. Results and Analyses

The experimental equipment and software used in these experiments are as follows: the operating platform is Windows 11, the programming environment is Python 3.9.11 with PyCharm 2020.1, and the CUDA version is 12.3. The hardware consists of an Intel i5-13490F CPU, an NVIDIA GeForce RTX 4070 GPU, and 32 GB of RAM. The model training parameters are set as follows: the input image size is 640 × 640 pixels, the batch size is 8, and the number of epochs is 300.

The BrackishMOT dataset [33,34], accessed in January 2024, is used in this experiment. The dataset consists of a total of 98 sequences with 6 different classes captured in the wild. The detection dataset is made up of images extracted from 89 of the collected videos, and another 9 videos (small fish and big fish) forming the MOT dataset are added to these 89 videos to track multiple marine animals. This dataset for detecting and tracking multiple marine targets suits our mariculture network well for efficiently achieving the detection and tracking function.

3.1. Detection Results

3.1.1. Ablation Study

The efficiency of the EMTT-YOLO model proposed in this paper is evaluated through ablation experiments based on the YOLOv8n model for detecting multiple mariculture targets. The experimental results are shown in Table 1, in which the metrics mAP and mAP50-90 measure accuracy, while GFLOPs and params are the model’s computational cost and parameter count, respectively. A tick indicates that the module was added in the experiment.

As shown in Table 1, the improved ADown and DRC2f-CA modules improve the detection performance. On the one hand, combining maximum pooling with average pooling throughout the downsampling process improves multiscale feature extraction across various target scales while preserving the flexibility and ease of feature extraction. On the other hand, the DRC2f module greatly enhances the network’s expressiveness by enlarging the receptive field while greatly reducing the number of parameters, and the CA module coordinates the relationship between the feature map channels and spatial locations to enhance the ability to capture input features while ignoring the interference of irrelevant background information. The precision and recall are enhanced by 1.3% and 0.8%, respectively.

3.1.2. Comparisons for Detection Performance

Figure 9 and Figure 10 compare the detection performance for one target and for multiple targets, respectively. For single target detection, our improved model, which adds the ADOWN and DRC2f-CA modules to YOLOv8n, raises the detection score to 0.91.

The behavior of our model is similarly enhanced for multiple target detection, as demonstrated in Figure 10. Our improved model misses no targets during identification, and its detection accuracy is higher than that of the other models. Underwater images are often blurry and the mariculture targets occlude each other, which usually makes recognition time-consuming and inaccurate. Moreover, the targets are relatively small, there are many types of mariculture species, and some organisms are unlabeled. The identification method therefore needs fine feature extraction, scale diversification, and background weakening. As described in Section 2, these properties are realized through the improved ADOWN, DRC2f, and CA modules.

The identification and classification of different types of mariculture targets are shown in Figure 11. Identifying multiple species is more challenging than identifying a single species, because it requires both distinguishing the categories and recognizing each target with a high accuracy. As shown in Figure 11, all models can identify the two types of mariculture targets, but our improved model achieves a higher accuracy.

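The comparison study of multiple target detection in a mariculture network is shown in Table 2. Compared with YOLOv7tiny, the weakest baseline, the precision and recall are improved by 5.8 and 8.6 percentage points, respectively. The parameter count (2.62 M) is lower than that of every baseline except YOLOv5n, and the detection speed (66 FPS) remains comparable to that of the other models.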
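The quoted gains can be checked directly against Table 2. The short script below copies the P and R columns of Table 2 and prints the difference between the proposed model and each baseline; the variable and function names are illustrative only.

```python
# Precision (P) and recall (R) copied from Table 2.
TABLE2 = {
    "YOLOv5n": (97.8, 94.5),
    "YOLOv5s": (97.8, 96.7),
    "YOLOv7tiny": (92.6, 88.3),
    "YOLOv8n": (98.2, 95.6),
    "YOLOv8s": (98.3, 97.3),
}
OURS = (98.4, 96.9)  # YOLOv8n+ADOWN+DRC2f-CA


def gains_over(baseline):
    """Percentage-point gain of the proposed model in (P, R) over a baseline."""
    p, r = TABLE2[baseline]
    return round(OURS[0] - p, 1), round(OURS[1] - r, 1)


for name in TABLE2:
    print(name, gains_over(name))
```

The 5.8 and 8.6 point gains correspond to the YOLOv7tiny row. Against the other baselines the margins are smaller, and the recall of YOLOv8s in Table 2 is 0.4 points higher than that of the proposed model.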

3.2. Tracking Results

Our detection and tracking model aims at green, ecological, and healthy breeding: it monitors the movement status of mariculture targets in real time, captures the breeding patterns of the objects in the marine farm, and supports real-time rescue in the case of pollution or disease. Real-time operation and tracking accuracy are therefore the goals to pursue. As discussed earlier in this paper, tracking multiple targets directly on video with a YOLO-based model faces two major problems. On the one hand, the hardware requirements are relatively high, since high-quality graphics cards and other high-end hardware are needed. On the other hand, the tracker has to cope with dynamic changes in the underwater environment and to suppress background objects in underwater videos, which can make tracking inaccurate and time-consuming. In contrast, a tracking method based on the multi-target Bayes filter can achieve accurate and time-saving tracking once the targets have been accurately identified by the YOLO method.

The performance comparisons for multiple target tracking are shown in Table 3. Several standard metrics are adopted in the tracking evaluation: Higher-Order Tracking Accuracy (HOTA), Association Accuracy (AssA), the identification F1 score (IDF1), Multi-Object Tracking Accuracy (MOTA), and Multi-Object Tracking Precision (MOTP). MOTA accumulates the accuracy frame by frame, and MOTP measures the precision of the bounding boxes. They are treated here as auxiliary metrics rather than true tracking metrics, because they do not fully capture errors in which the identity assigned to the same target changes. In contrast, IDF1 computes a one-to-one mapping between the ground-truth trajectories and the predicted trajectories, and therefore reflects the tracking performance directly. Similarly, AssA measures the accuracy of the data association directly. HOTA is the geometric mean of a detection score and an association score, so it fairly combines the different aspects of the tracking evaluation.
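As a reference for these definitions, the sketch below writes out the commonly used formulas for MOTA, IDF1, and the per-threshold HOTA score. The function and argument names are illustrative, and the counts passed in are assumed to come from an evaluation tool; this is not the evaluation code used to produce Table 3.

```python
import math


def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), computed under the best
    one-to-one mapping between ground-truth and predicted trajectories."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def hota_at_threshold(det_a, ass_a):
    """Geometric mean of detection accuracy and association accuracy
    at a single localization threshold."""
    return math.sqrt(det_a * ass_a)
```

The reported HOTA value is usually the average of the per-threshold score over a range of localization thresholds. Identity switches enter MOTA only as one additive count, which is why it is less sensitive to association errors than IDF1 or AssA.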

The tracking speed is greatly improved by combining the detection strategy of YOLO with the multiple target tracking of the Bayes filter. Given accurate detection results and data association by the Hungarian algorithm, the SMC-PHD filter provides a higher tracking speed (12 FPS) and more accurate predicted paths, as seen in Table 3. The HOTA and AssA are improved by up to 3.9 and 5.7 percentage points over the compared methods, respectively. The MOTA is also the highest among the compared methods, and the MOTP (84.3) is close to the best value of 85.1, which reflects the higher detection performance that the tracking stage builds on.

The tracking results are also compared visually in Figure 12, which shows the frames at epochs 101, 103, and 114 of a tracking run with a duration of 300 epochs. On the one hand, the detection accuracy is obtained by adopting a high-performance detection mechanism, the YOLOv8 model in this work. On the other hand, the detection results, such as the detection probability, confidence, and target location information, are adopted as the input for the multi-target tracking, and the tracking results are then derived through Bayesian prediction and updating with the SMC-PHD filter.
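To make the prediction and updating steps concrete, the following is a minimal one-dimensional sketch of the particle weight recursion of a PHD filter. It is written under strong simplifying assumptions and is not the implementation used in this work: the state is a scalar position, the motion is a deterministic shift without process noise, the measurement likelihood is Gaussian, birth particles are placed directly at new detections, and resampling and state extraction are omitted. All function and parameter names are hypothetical.

```python
import math


def gaussian(z, x, sigma):
    """Likelihood g(z | x) of a 1-D position measurement z for a target at x."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def phd_predict(particles, weights, p_survive, velocity, births, birth_mass):
    """Move surviving particles and append birth particles at new detections."""
    pred_x = [x + velocity for x in particles]   # assumed motion model, no noise
    pred_w = [p_survive * w for w in weights]    # survival scales the weights
    if births:
        pred_x += list(births)
        pred_w += [birth_mass / len(births)] * len(births)
    return pred_x, pred_w


def phd_update(particles, weights, detections, p_detect, clutter, sigma):
    """PHD weight update: a missed-detection term plus one term per detection."""
    new_w = [(1.0 - p_detect) * w for w in weights]
    for z in detections:
        terms = [p_detect * gaussian(z, x, sigma) * w
                 for x, w in zip(particles, weights)]
        denom = clutter + sum(terms)             # clutter intensity at z
        if denom > 0.0:
            for i, t in enumerate(terms):
                new_w[i] += t / denom
    return new_w


# Two particle groups (near 0 and at 10) and a single detection near the first.
xs, ws = phd_predict([0.0, 0.2, 10.0], [0.5, 0.5, 1.0], p_survive=1.0,
                     velocity=0.0, births=[], birth_mass=0.0)
ws = phd_update(xs, ws, detections=[0.1], p_detect=0.9, clutter=0.0, sigma=1.0)
expected_targets = sum(ws)  # the PHD integrates to the expected number of targets
```

In this toy run the detected group keeps most of its weight, while the undetected particle at 10 is scaled down by the missed-detection factor. In a two-stage scheme of the kind described here, the detector output would supply the detections and, under the assumption made above, the detection probability.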

All tracking models can track multiple mariculture targets, and our improved mechanism improves the tracking performance. Figure 12 compares SiamFCA [22], YOLOv5+DeepSORT [17], GN-YOLOv5+StrongSORT [21], YOLOv7+DeepSORT [19], YOLOv8+DeepSORT [28], YOLOv8+BOT-SORT [30], and EMTT-YOLO+SMC-PHD. EMTT-YOLO+SMC-PHD tracks the fish correctly.

The precision and recall during tracking are shown in Figure 13a,b. At the beginning of the tracking process, all the tracking mechanisms have a low precision and recall, because some time is needed to capture a specific target and determine its location. As the tracking progresses, each algorithm quickly adapts to the moving direction of the tracked target and accurately predicts its location. At the stable stage, the precision and recall of the EMTT-YOLO scheme reach 0.89 and 0.87, respectively, and the EMTT-YOLO scheme needs less time than the other schemes to reach a stable precision and recall.

The metrics mAP0.5 and mAP0.5:0.95 are shown in Figure 13c,d. These metrics differ little among the tracking schemes once the stable stage is reached after epoch 150, whereas the differences before epoch 150 are large. The value of mAP0.5:0.95 is lower than that of mAP0.5, as shown in Figure 13c,d, because it averages over stricter IoU thresholds.
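The relation between the two metrics can be illustrated with a short sketch. Here mAP0.5:0.95 is taken, following the common convention, as the mean of the average precision over the IoU thresholds 0.50, 0.55, ..., 0.95; the AP values used below are hypothetical and serve only to show the averaging.

```python
def map_50_95(ap_by_threshold):
    """Mean of AP over the IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(ap_by_threshold[t] for t in thresholds) / len(thresholds)


# Hypothetical AP values that fall as the IoU threshold tightens.
ap = {round(0.50 + 0.05 * i, 2): 0.98 - 0.04 * i for i in range(10)}
mean_ap = map_50_95(ap)
```

Because a match at a stricter threshold is also a match at 0.5, AP does not increase as the threshold grows, so the averaged value cannot exceed the value at 0.5.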

The OSPA distance, which is described in detail in [35], is introduced to present the tracking behaviors of the different tracking schemes comprehensively, as shown in Figure 14a. The OSPA of the EMTT-YOLO scheme quickly reaches a stable value at around epoch 100, and this value is lower than that of the other tracking schemes. The GOSPA distance [36] is also introduced to present the tracking performance, as shown in Figure 14b.
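For reference, a compact sketch of the OSPA distance between a set of estimated positions and a set of true positions is given below. The cut-off c and order p are free parameters whose values here are assumptions, and the optimal assignment is found by brute force over permutations, which is only practical for a handful of targets; an assignment solver such as the Hungarian algorithm finds the same optimum for larger sets.

```python
import itertools
import math


def ospa(X, Y, c=10.0, p=2):
    """OSPA distance between two finite point sets with cut-off c and order p."""
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m > n:                      # make X the smaller set
        X, Y, m, n = Y, X, n, m
    # Best assignment of every point in X to a distinct point in Y,
    # with each base distance capped at the cut-off c.
    best = min(
        sum(min(c, math.dist(x, Y[j])) ** p for x, j in zip(X, perm))
        for perm in itertools.permutations(range(n), m)
    )
    # Each of the n - m unassigned points is penalized with the cut-off c.
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)
```

The first term penalizes localization error and the second penalizes a wrong number of targets, so a single curve per tracker summarizes both kinds of error.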

From the perspective of these traditional tracking indicators, Figure 13 and Figure 14 show that the tracking behavior of the EMTT-YOLO scheme is enhanced. On the one hand, an efficient detection mechanism provides an accurate detection rate, specific location information, and target state information (the target location, shape and size, and moving direction). On the other hand, the tracking prediction is obtained by a purely mathematical prediction calculation combined with a lightweight data association method. In contrast, multiple target recognition and tracking methods that operate on the video itself consume a lot of time and occupy many hardware resources, which results in a slow computing speed and thus a lower tracking accuracy.

After the location at each epoch is obtained, the trajectories can be derived, as shown in Figure 15 for three targets. Figure 15a shows the paths of the EMTT mechanism, and Figure 15b shows the paths of the YOLOv8+BoT-SORT model. Only two targets appear at epochs 101, 103, and 114 in the video tracking results in Figure 12. The path of the EMTT is relatively straight, even around the point where the target turns around, while the path of the YOLOv8+BoT-SORT model is somewhat distorted at certain moments. The trajectory of the YOLOv8+BoT-SORT model is derived from the images or videos as the target moves, so it is usually affected by the angle of view: a distorted path results when the targets occlude, cross, or overlap each other in the video. In contrast, the EMTT trajectory is derived through Bayes prediction and updating on the particles obtained from the detection results, and it does not use the images or videos at the stage of obtaining the trajectory.

4. Conclusions

In this work, an efficient two-stage MTT method based on a YOLOv8 detector and an SMC-PHD tracker, named EMTT-YOLO, is proposed to enhance the detection probability and thereby improve the tracking performance. In the first stage, detection, a YOLOv8 model with several improved modules detects multiple targets and outputs features such as the bounding box coordinates, confidence, and detection probability. In a mariculture detection and tracking network, the targets include various kinds of fish, crabs, sea cucumbers, and other marine products, so features must be extracted at multiple scales. Marine targets are also in motion all the time along unpredictable paths, which leads to information loss during detection and tracking. Moreover, background objects in the mariculture environment, such as rocks, floating kelp, and moving creatures that are not farmed targets, seriously hinder efficient identification and tracking. To solve these problems, several improved modules are introduced into the YOLOv8 model. Firstly, the Adown module is introduced to reduce the information loss caused by the downsampling adopted in the neck. Then, DRC2f, an accurate and lightweight feature extraction module, is designed to enhance the feature extraction. In addition, a CA attention mechanism is integrated into the SPPF to suppress the background information in the feature map, which yields more accurate location information and more precise target features, and consequently enhances the tracking performance. In the second stage, tracking, the particles are built from the detection results, and the SMC-PHD filter is then used to track multiple targets. The lightweight Hungarian data association method is introduced to associate the estimates over time and derive the trajectories of multiple targets. Moreover, comprehensive experiments are presented to verify the effectiveness of EMTT-YOLO.

In the future, we plan to extend the YOLO-based tracking scheme to take into account not only more types of objects but also the lighting conditions of the detected objects and the color of their surface [37]. Moreover, developing smart mariculture is a new trend in the development of green geoponics and in the formation of a large agricultural IoT system.

Author Contributions

Conceptualization, C.L. and J.Z.; methodology, C.L.; software, H.Y.; validation, C.L., H.Y., and J.Z.; formal analysis, H.Y.; investigation, C.L.; resources, C.L.; data curation, H.Y.; writing—original draft preparation, C.L. and J.Z.; writing—review and editing, C.L.; visualization, H.Y.; supervision, C.L.; project administration, J.Z.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research work is supported by the National Natural Science Foundation of China (No. 61362017, No. 61365007) and Starting Foundation of Shanghai Ocean University (No. A2-0203-00-100344, No. A2-0203-00-100343).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ariza-Sentís, M.; Velez, S.; Martínez-Pena, R.; Baja, H.; Valente, J. Object detection and tracking in Precision Farming: A systematic review. Comput. Electron. Agric. 2024, 219, 108757.
2. Liu, H.C.; Ma, X.; Yu, Y.N.; Wang, L.; Hao, L. Application of Deep Learning-Based Object Detection Techniques in Fish Aquaculture: A Review. J. Mar. Sci. Eng. 2023, 11, 867.
3. Xu, S.; Zhang, M.; Song, W.; Mei, H.; He, Q.; Liotta, A. A systematic review and analysis of deep learning-based underwater object detection. Neurocomputing 2023, 527, 204–232.
4. Wang, X.; Sun, Z.J.; Chehri, A.; Jeon, G.; Song, Y.C. Deep learning and multi-modal fusion for real-time multi-object tracking: Algorithms, challenges, datasets, and comparative study. Inf. Fusion 2024, 105, 102247.
5. Mahler, R.P. Multi-target Bayes Filtering via First-Order Multi-target Moments. IEEE Trans. Aerosp. Electron. Syst. 2003, 39, 1152–1178.
6. Mahler, R.P.S. PHD filters of higher order in target number. IEEE Trans. Aerosp. Electron. Syst. 2007, 43, 1523–1543.
7. Wu, S.Y.; Zhou, Y.S.; Xie, Y.; Xue, Q.T. Robust Poisson multi-Bernoulli mixture filter using adaptive birth distributions for extended targets. Digit. Signal Process. 2022, 126, 103459.
8. Tu, H.S.; Lin, R.F.; Shen, W.C.; Guo, Y.F. An Arithmetic Geometric Mixed Average GM-PHD Algorithm for Decentralized Sensor Network with Limited Field of View. IEEE Sens. J. 2024, 24, 19995–20008.
9. Xu, C.A.; Yao, L.B.; Liu, Y.; Su, H.; Wang, H.Y.; Gu, X.Q. A novel SMC-PHD filter for multi-target tracking without clustering. Displays 2022, 71, 102113.
10. Liu, Y.; Xu, Y.; Wu, P.P.; Wang, W.W. Labelled Non-Zero Diffusion Particle Flow SMC-PHD Filtering for Multi-Speaker Tracking. IEEE Trans. Multimed. 2023, 26, 2544–2559.
11. Zheng, Z.Y.; Li, J.W.; Qin, L.F. YOLO-BYTE: An efficient multi-object tracking algorithm for automatic monitoring of dairy cows. Comput. Electron. Agric. 2023, 209, 107857.
12. Zhang, Y.; Zhang, H.F.; Huang, Q.Q.; Han, Y.; Zhao, M.H. DsP-YOLO: An anchor-free network with DsPAN for small object detection of multiscale defects. Expert Syst. Appl. 2024, 241, 122669.
13. Zhao, H.Y.; Jin, J.; Liu, Y.; Gio, Y.N.; Shen, Y. FSDF: A high-performance fire detection framework. Expert Syst. Appl. 2024, 238, 121665.
14. Liu, P.Z.; Qian, W.B.; Wang, Y.L. YWnet: A convolutional block attention-based fusion deep learning method for complex underwater small target detection. Ecol. Inform. 2024, 79, 102401.
15. Ji, W.; Peng, J.Q.; Xu, B.; Zhang, T. Real-time detection of underwater river crab based on multi-scale pyramid fusion image enhancement and MobileCenterNet model. Comput. Electron. Agric. 2023, 204, 107522.
16. Xu, X.C.; Liu, Y.; Lyu, L.; Yan, P.; Zhang, J.Y. MAD-YOLO: A quantitative detection algorithm for dense small-scale marine benthos. Ecol. Inform. 2023, 75, 102022.
17. Han, S.J.; Fuentes, A.; Yoon, S.; Jeong Yong, C.; Kim, H.; Park, D.S. Deep learning-based multi-cattle tracking in crowded livestock farming using video. Comput. Electron. Agric. 2023, 212, 108044.
18. Yu, G.Y.; Cai, R.L.; Su, J.P.; Hou, M.X.; Deng, R.L. U-YOLOv7: A network for underwater organism detection. Ecol. Inform. 2023, 75, 102108.
19. Yang, D.J.; Miao, C.Y.; Liu, Y.; Wang, Y.M.; Zheng, Y. Improved foreign object tracking algorithm in coal for belt conveyor gangue selection robot with YOLOv7 and DeepSORT. Measurement 2024, 228, 114180.
20. Liu, L.; Lin, B.; Yang, Y. Moving scene object tracking method based on deep convolutional neural network. Alex. Eng. J. 2024, 86, 592–602.
21. Zhai, X.Y.; Wei, H.L.; Wu, H.D.; Zhao, Q.; Huang, M. Multi-target tracking algorithm in aquaculture monitoring based on deep learning. Ocean Eng. 2023, 289, 116005.
22. Mei, Y.P.; Yan, N.; Qin, H.X.; Yang, T.; Chen, Y.Y. SiamFCA: A new fish single object tracking method based on siamese network with coordinate attention in aquaculture. Comput. Electron. Agric. 2024, 216, 108542.
23. Liu, T.; He, S.Y.; Liu, H.Y.; Gu, Y.Z.; Li, P.L. A Robust Underwater Multiclass Fish-School Tracking Algorithm. Remote Sens. 2022, 14, 4106.
24. Li, Y.L.; Liu, W.D.; Li, L.; Zhang, W.B.; Xu, J.M.; Jiao, H.F. Vision-Based Target Detection and Positioning Approach for Underwater Robots. IEEE Photonics J. 2023, 15, 8000112.
25. Sheng, W.S.; Shen, J.H.; Huang, Q.M.; Liu, Z.X.; Ding, Z.H. Multi-objective pedestrian tracking method based on YOLOv8 and improved DeepSORT. Math. Biosci. Eng. 2024, 21, 1791–1805.
26. Zhao, J.B.; Chen, J.X. YOLOv8 Detection and Improved BOT-SORT Tracking Algorithm for Iron Ladles. In Proceedings of the 2024 7th International Conference on Image and Graphics Processing, Beijing China, 19–21 January 2024; pp. 409–415.
27. Yadav, A.; Chaturvedi, P.K.; Rani, S. Object Detection and Tracking using YOLOv8 and DeepSORT. In Advancements in Communication and Systems; Tripathi, A.K., Shrivastava, V., Eds.; Computing and Intelligent Systems; SCRS: New Delhi, India, 2023; pp. 81–90.
28. Cao, X.; Ren, L.; Sun, C.Y. Dynamic Target Tracking Control of Autonomous Underwater Vehicle Based on Trajectory Prediction. IEEE Trans. Cybern. 2023, 53, 1968–1981.
29. Bui, T.; Wang, G.H.; Wei, G.; Zeng, Q. Vehicle Multi-Object Detection and Tracking Algorithm Based on Improved You Only Look Once 5s Version and DeepSORT. Appl. Sci. 2024, 14, 2690.
30. Xing, B.W.; Sun, M.; Liu, Z.C.; Guan, L.W.; Han, J.T.; Yan, C.X.; Han, C. Sonar Fish School Detection and Counting Method Based on Improved YOLOv8 and BoT-SORT. J. Mar. Sci. Eng. 2024, 12, 964.
31. Krishnan, V.; Vaiyapuri, G.; Govindasamy, A. Hybridization of Deep Convolutional Neural Network for Underwater Object Detection and Tracking Model. Microprocess. Microsyst. 2022, 94, 104628.
32. Liu, Q.L.; Li, Y.B.; Dong, Q.H.; Ye, F. Scene-Specialized Multitarget Detector with an SMC-PHD Filter and a YOLO Network. Comput. Intell. Neurosci. 2022, 2022, 1010767.
33. Pedersen, M.; Lehotský, D.; Nikolov, I.; Moeslund, T.B. BrackishMOT: The Brackish Multi-Object Tracking Dataset. In Scandinavian Conference on Image Analysis; Springer Nature: Cham, Switzerland, 2023; Volume 4, pp. 17–33.
34. Pedersen, M.; Haurum, J.B.; Gade, R.; Moeslund, T.B.; Madsen, N. Detection of Marine Animals in a New Underwater Dataset with Varying Visibility. In Proceedings of the SCIA 2023, Levi Ski Resort, Finland, 18–21 April 2023; pp. 18–26.
35. Schuhmacher, D.; Vo, B.T.; Vo, B.N. A consistent metric for performance evaluation of multi-object filters. IEEE Trans. Signal Process. 2008, 56, 3447–3457.
36. Liu, Z.H.; Shang, Y.Y.; Li, T.M.; Chen, G.L.; Wang, Y.; Hu, Q.H.; Zhu, P.F. Robust Multi-Drone Multi-Target Tracking to Resolve Target Occlusion: A Benchmark. IEEE Trans. Multimed. 2023, 25, 1462–1476.
37. Maruschak, P.; Konovalenko, I.; Osadtsa, Y.; Medvid, V.; Shovkun, O.; Baran, D.; Kozbur, H. Surface Illumination as a Factor Influencing the Efficacy of Defect Recognition on a Rolled Metal Surface Using a Deep Neural Network. J. Mar. Sci. Eng. 2024, 14, 2591.

Figure 1. Overview of our model. This model contains two main sections, the detection process and tracking section. The tracking scheme executes target tracking based on the detection results of the confidence and detection box.

Figure 2. Overview of YOLOv8.

Figure 3. Structure of our improved YOLOv8.

Figure 4. Structure of Adown.

Figure 5. (a) Network structure of DRC2f; (b) DR Block.

Figure 6. The schematic of the DRB module.

Figure 7. Structure of CA.

Figure 8. Feature extraction of our detection scheme.

Figure 9. Comparisons for detection performance of one target. (a) YOLOv5n; (b) YOLOv5s; (c) YOLOv7tiny; (d) YOLOv8s; (e) YOLOv8n; (f) our improved model (YOLOv8n+ADOWN+DRC2f-CA).

Figure 10. Comparisons for detection performance of multiple targets. (a) YOLOv5n; (b) YOLOv5s; (c) YOLOv7tiny; (d) YOLOv8s; (e) YOLOv8n; (f) our improved model.

Figure 11. Comparisons for detection and classification performance of multiple targets. (a) YOLOv5n; (b) YOLOv5s; (c) YOLOv7tiny; (d) YOLOv8s; (e) YOLOv8n; (f) our proposed scheme.

Figure 12. Tracking result comparisons for different tracking strategies at the epochs of 101, 103, and 114. (A) The tracking performance description of SiamFCA scheme; (B) the tracking performance description of YOLOv5+DeepSORT scheme; (C) the tracking performance description of GN-YOLOv5+StrongSORT scheme; (D) the tracking performance description of YOLOv7+DeepSORT scheme; (E) the tracking performance description of YOLOv8+DeepSORT scheme; (F) the tracking performance description of YOLOv8+BOT-SORT scheme; (G) the tracking performance description of EMTT-YOLO+SMC-PHD scheme.

Figure 13. The tracking performance comparisons for these multiple target tracking strategies. (a) The metric description of precision; (b) the metric description of recall; (c) the metric description of mAP0.5; (d) the metric description of mAP0.5–0.95.

Figure 14. The tracking performance comparisons for these multiple target tracking strategies. (a) The metric description of average OSPA; (b) the metric description of average GOSPA.

Figure 15. The trajectories of the multiple target tracking strategy of EMTT-YOLO and YOLOv8+BoT-SORT based on three targets. (a) The trajectory of EMTT-YOLO; (b) the trajectory of YOLOv8+BoT-SORT.

Table 1. Ablation study for multiple target detection.

Network	Adown	DRC2f	CA	P	R	mAP	mAP50-90	GFLOPs	Params (M)	FPS
YOLOv8n				97.2	95.6	98.0	80.7	8.1	3.00	72
YOLOv8n	√			97.8	96.1	98.2	81.6	7.4	2.59	67
YOLOv8n		√		97.9	96.3	98.0	80.8	8.1	3.03	63
YOLOv8n		√	√	98.1	96.4	98.4	81.8	7.4	2.62	57
YOLOv8n	√	√	√	98.4	96.4	98.5	81.7	7.5	2.64	59

Table 2. Comparison study for multiple target detection.

Network	P	R	mAP	mAP50-90	GFLOPs	Params (M)	FPS
YOLOv5n	97.8	94.5	97.6	76	4.2	1.77	68
YOLOv5s	97.8	96.7	98.7	81.5	15.8	7.03	72
YOLOv7tiny	92.6	88.3	93.5	65.5	13.1	6.02	65
YOLOv8n	98.2	95.6	98.0	80.7	8.1	3.00	69
YOLOv8s	98.3	97.3	98.6	94.7	28.4	11.28	72
YOLOv8n+ADOWN+DRC2f-CA	98.4	96.9	98.8	81.8	7.4	2.62	66

Table 3. Comparison study for multiple target tracking.

Network	IDF1	AssA	HOTA	MOTA	MOTP	FPS
SiamFCA [22]	85.1	75.6	70.2	73.7	78.1	10
YOLOv5+DeepSORT [17]	85.7	80.3	69.5	77.6	79.4	10
GN-YOLOv5+StrongSORT [21]	84.7	79.1	70.6	76.8	85.1	8
YOLOv7+DeepSORT [19]	85.2	77.4	70.4	73.1	79.4	8
YOLOv8+DeepSORT [28]	86.2	79.5	71.9	77.9	83.9	9
YOLOv8+BOT-SORT [30]	86.3	80.8	72.8	77.5	84.5	8
EMTT-YOLO+SMC-PHD	86.9	81.3	73.4	78.2	84.3	12

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

