Article

A Deep Learning Approach to Estimate Halimeda incrassata Invasive Stage in the Mediterranean Sea

by Caterina Muntaner-Gonzalez, Miguel Martin-Abadal and Yolanda Gonzalez-Cid *
Department of Mathematics and Computer Science, University of the Balearic Islands, Carretera de Valldemossa Km. 7.5, 07122 Palma, Spain
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(1), 70; https://doi.org/10.3390/jmse12010070
Submission received: 21 November 2023 / Revised: 16 December 2023 / Accepted: 21 December 2023 / Published: 27 December 2023
(This article belongs to the Section Marine Biology)

Abstract

Invasive algae, such as Halimeda incrassata, alter marine biodiversity in the Mediterranean Sea. Monitoring these changes over time is crucial for assessing the health of coastal environments and preserving local species. However, this monitoring process is resource-intensive, requiring taxonomic experts and significant amounts of time. Recently, deep learning approaches have attempted to automate the detection of certain seagrass species like Posidonia oceanica and Halophila ovalis, following two different strategies: seagrass coverage estimation and detection. This work presents a novel approach to detect Halimeda incrassata and estimate its coverage, independently of the invasion stage of the algae. Two merging methods based on the combination of the outputs of an object detection network (YOLOv5) and a semantic segmentation network (U-net) are developed. The system achieves an F1-score of 84.2% and a Coverage Error of 5.9%, demonstrating its capability to accurately detect Halimeda incrassata and estimate its coverage independently of the invasion stage.

1. Introduction

The introduction of invasive species due to human activity in new ecosystems represents a threat to the preservation of native species. In fact, invasive species are the second leading cause of species extinction, endangering biodiversity conservation [1]. Monitoring changes in marine ecosystems is crucial for the conservation and preservation of their biodiversity, especially in the context of an increasing anthropogenic impact on the environment [2,3].
The Mediterranean Sea is undergoing an alteration process of its marine biodiversity, as many invasive species have appeared in recent years [4]. One of these invasive species is Halimeda incrassata (from now on referred to as H.i), which is a calcareous rhizophytic species of green macroalgae in the family Halimedaceae, native to the tropical western Atlantic and Indo-Pacific Oceans [5,6].
H.i was first detected on the coast of Palma (Spain) in 2011 and has been rapidly spreading on the south coast of Mallorca and the coasts of Cabrera [7]. H.i is already changing the structural characteristics of invaded habitats; thus, effects on the native ecosystems can be expected [8]. Quantifying the extension of areas invaded by H.i and its temporal evolution is key to assessing its effects on local species and evaluating the efficiency of new eradication protocols.
In this context, new technologies that automate the process of H.i detection and coverage estimation can speed up data processing and significantly reduce the need for human labelling. Nonetheless, estimating the coverage of H.i presents several challenges:
  • Recursive Shape: H.i grows in a recursive shape, making it challenging to delimit individual shoots.
  • Varied Distributions: Different stages of invasion may present varying distributions of H.i, requiring different approaches for accurate detection. Specifically, two different scenarios can be distinguished depending on the type of H.i distribution they present.
    (a) Dense Scenario: Comprises locations where the invasion process of the algae is at a late stage and, therefore, H.i completely covers the studied area.
    (b) Sparse Scenario: Comprises locations where the invasion process is in its initial phase, leading to the presence of H.i in the form of scattered individuals on the seabed.
    It is important to note that these scenarios represent the extreme cases, and H.i coverage can be found at any stage of invasion in between.
An important aspect to take into account when automating a task is the ability of the system to adapt to the environment and the information it gathers, which requires generating that information in real time. For H.i monitoring, being able to generate real-time information on its presence would allow dynamic path planning strategies to adapt and optimize the vehicle trajectory during missions in terms of duration, quality, and quantity of the gathered data.
The aim of this work is to design a system for automating the H.i coverage estimation process using deep learning. First, two scenario-based solutions are presented: one for the dense scenario, based on semantic segmentation (SS); and another for the sparse scenario, based on object detection (OD). Next, two merging methods are presented to combine the scenario-based estimations into a generic, scenario-independent solution capable of calculating H.i coverage independently of the invasive process stage. Additionally, the entire H.i coverage estimation system is implemented on an Autonomous Underwater Vehicle (AUV) and adapted to be executed online.
The remainder of this document is structured as follows. Section 2 reviews the related work on seagrass detection and coverage estimation and highlights the main contributions. Section 3 describes the adopted methodology and materials, including the neural network architectures used, their training details, and the proposed merging methods. Section 4 presents the experimental results. Section 5 explains the system implementation on an AUV. Finally, Section 6 summarizes the main conclusions and presents possible future research lines.

2. Related Work and Contributions

2.1. State of the Art

Seagrass and seaweed are crucial in marine ecosystems, and monitoring their status is essential for assessing environmental conditions. The detection and mapping of underwater vegetation have been subjects of research since the 1980s. Initially, this monitoring heavily relied on in situ observation and sampling by experts, while later advances in both sensing and processing techniques have facilitated the development of more automated and accurate monitoring pipelines [9]. In this transition towards automation, methods can be classified according to two different domains: data acquisition methods and data processing methods.
In the domain of data acquisition, three general types of data can be distinguished: acoustic images, hyper- or multi-spectral satellite images, and RGB images. Although some research has been conducted in the field of acoustic seagrass monitoring [10,11,12], its prevalence is overshadowed by detection methods based on hyperspectral or RGB images, primarily due to the high cost of the required sensors. Moreover, while successful in detecting meadow-like seagrass, acoustic monitoring faces challenges in distinguishing between seagrass types and exhibits a dependency on depth ranges. Some advantages of this type of data are that it is easier to label and that it is not significantly affected by turbidity or light conditions.
Regarding hyper-spectral images, their convenience and potential have positioned them as a particularly interesting approach, as they provide the means to study extensive regions with minimal data gathering cost. Numerous approaches have been explored in recent years, especially in the context of seagrass and macroalgae monitoring [13,14,15]. However, employing these kinds of data to monitor smaller and isolated species remains challenging, as small seaweed species like H.i are not discernible in satellite images. Consequently, underwater RGB images continue to be one of the most popular techniques for these tasks, despite the higher cost associated with the data acquisition process compared to satellite imagery. In this study, underwater RGB images will be employed, as they prove to be the most suitable for the application, given that the primary objective is to detect H.i, which is only observable from underwater images.
In the domain of data processing, seaweed population estimation and monitoring has traditionally relied on the laborious analysis of extensive imagery by biologists, making it a very time-consuming task. However, the emergence of new deep learning-based techniques is enabling the automation of these processes, significantly reducing the analysis times and increasing the temporal and spatial scope of data collection experiments [16].
Many approaches for automatically detecting or estimating seaweed and seagrass have been developed. Some studies have utilized traditional vision techniques, such as Bonin et al. [17], where Gabor features along with a Support Vector Machine are used for classifying Posidonia oceanica (P.o).
Recently, with the emergence of deep learning as the state-of-the-art for detection and classification problems, deep learning-based techniques have also been introduced in seagrass detection. Following this, two types of approaches can be distinguished.
The first approach is based on using SS networks to generate coverage maps of different species of algae that form meadows. One of the most studied seagrasses in this context is P.o. In 2018, Martin-Abadal et al. [18] used a CNN (VGG16-FCN8) that achieved an accuracy of 96.8% in P.o coverage estimation. Similarly, in 2019, Weidmann et al. [19] found that the DeepLabv3Plus network achieved a mean Intersection over Union of 87.8% on seagrass segmentation.
On the other hand, when the objective is to detect and locate individual instances of seagrass, many studies utilize state-of-the-art OD networks such as YOLO, EfficientDet, or Faster R-CNN [20,21,22]. Recently, Ranolo et al. [23] used YOLOv3 to detect seaweed, achieving an accuracy of 73.0%. Noman et al. [21] presented a YOLOv5-based one-stage detector and an EfficientDetD7-based two-stage detector to detect Halophila ovalis, improving the state-of-the-art results for this species.
In the specific case of H.i, some preliminary work has attempted to use differentiated approaches for sparse and dense cover scenarios [24] without providing a scenario-independent solution. To date, previous research has focused either on estimating the coverage of seaweed meadows or on detecting individual instances of algae. However, to the best of our knowledge, no prior work has addressed a mixed scenario where a seaweed species can exhibit vastly different distributions depending on the stage of its invasion, as is the case with H.i.
This work proposes a novel system to combine OD and SS techniques and provide an improved H.i coverage estimation, adaptable to dense and sparse scenarios, or a combination of both.

2.2. Main Contributions

The main contributions of this work are:
  • Generation of a dataset containing H.i images covering different stages of the invasion process.
  • Training and testing of an OD network and an SS network to estimate the coverage of H.i in specific stages of invasion.
  • Development of two merging methods to combine OD and SS estimations.
  • Implementation of the system on an AUV, performing an online execution.
  • Creation of a publicly available repository [25] where data, the code, and trained models are provided to the scientific community.

3. Materials and Methods

This section explains the data acquisition, labelling, and organization; presents the scenario-based solutions, indicating the used networks, their training, and hyperparameter study; and describes the proposed merging methods.

3.1. Data

This subsection explains the acquisition, labelling, and management of the data used to train and test the deep neural networks.

3.1.1. Acquisition

To collect the data needed to build the dataset, several dives were conducted at diverse locations of Mallorca and Cabrera between 2020 and 2023 (shown in Figure 1), using three different cameras: a GoPro HERO (GoPro, Inc., San Mateo, CA, USA), an Olympus Tough TG-6 (Olympus Corporation, Tokyo, Japan) and a NIKON COOLPIX S33 (Nikon Corporation, Tokyo, Japan). RGB images were gathered at different invasion stages, containing H.i from both dense and sparse scenarios, and under different conditions such as altitude, camera angle, and seabed type. Additionally, some images were obtained from the public repository of the Observadores del mar community [26]. The complete dataset comprises 881 images featuring H.i across a wide spectrum of scenarios. Figure 2 presents some data examples.

3.1.2. Ground Truth Labelling

The ground truth generation of images from the dense and sparse scenarios used different labelling methods. On the one hand, images from the dense scenario were labelled using GIMP [27] to create label maps, marking the areas containing H.i in white and the background areas in black. On the other hand, images from the sparse scenario were labelled using LabelImg [28], marking each H.i instance with a bounding box. Figure 3 shows an example of both types of images with their corresponding ground truths.
It is worth noting that, despite dividing the images into the two scenarios, the labelling process presented some difficulties arising from the inherent distribution of H.i. In some cases, the delimitation of individual H.i plants was unclear due to its recursive shape or the fact that it often grows in small plant groups. There were also images in which the invasion stage was in between the two defined scenarios.

3.1.3. Dataset Management

The complete dataset is divided into two sets: one containing the images from the dense scenario (dense dataset) and another containing the images from the sparse scenario (sparse dataset).
For the dense dataset, 452 images were selected from the complete dataset. The dense dataset was split into a train-validation partition composed of 383 images (85% of the data) and a test partition composed of 68 images (15% of the data).
For the sparse dataset, 433 images were selected from the complete dataset. The sparse dataset was split into a train-validation partition composed of 368 images (85% of the data) and a test partition composed of 65 images (15% of the data). Figure 4 illustrates this distribution.

3.2. Halimeda Coverage Estimation Modules

As previously stated, two different scenarios regarding H.i distribution (dense and sparse) are distinguished. In this section, two scenario-based modules specifically designed to meet the requirements of each scenario are presented, detailing the used neural networks, their training, and hyperparameter selection.

3.2.1. Semantic Segmentation Module

This module tackles H.i detection in the dense scenario. It uses an SS network to perform a pixel-wise segmentation, indicating whether each pixel of an image belongs to an H.i covered region or not. SS networks excel at accurately detecting and delineating irregular regions within an image, as is the case for H.i in the dense scenario.
For this module, the U-net SS network has been selected. Its architecture comprises a contracting path to capture context and a symmetric expanding path that enables precise localization [29]. This network has demonstrated outstanding performance in segmentation applications and is particularly well-suited for small datasets. Although its original purpose was biomedical image segmentation, its use has extended to other generic segmentation tasks, becoming one of the most popular networks for this purpose [30]. Moreover, it has already been used for seaweed SS from satellite images, achieving good results [31]. Figure 5 shows the network architecture.
The network was trained on the dense dataset, using early stopping to prevent overfitting, a batch size of 3, the Adam optimizer, and a binary cross-entropy loss function. To optimize the network performance, a hyperparameter study was conducted. The considered hyperparameters were:
  • Learning rate: controls the pace at which the network learns by modifying its training step.
  • Data augmentation: consists of applying transformations to the input data (e.g., rotations, translations, or crops) to increase the variety of the data and reduce overfitting.
Table 1 lists the tested hyperparameter combinations. The network was trained for each learning rate value, both with and without using data augmentation. The data augmentation techniques applied included rotations, flips, horizontal and vertical shifts, and shear and zoom transformations. These techniques were applied to the original images with a certain probability. The learning rate values were selected by altering the default learning rate value for the U-net, 0.001, by a factor of 3.
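A minimal sketch of this training and hyperparameter sweep is given below, assuming a Keras/TensorFlow setup; the build_unet builder, the x_train/y_train/x_val/y_val arrays, and the augmentation parameter values are illustrative placeholders rather than the exact configuration used in this work.

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

LEARNING_RATES = [0.001 * 3, 0.001, 0.001 / 3]   # default U-net learning rate scaled by 3
USE_AUGMENTATION = [False, True]

def make_augmenter():
    # Rotations, flips, shifts, shear and zoom, as listed in the text (ranges are assumptions).
    return ImageDataGenerator(rotation_range=90, horizontal_flip=True, vertical_flip=True,
                              width_shift_range=0.1, height_shift_range=0.1,
                              shear_range=0.1, zoom_range=0.1)

results = {}
for lr in LEARNING_RATES:
    for augment in USE_AUGMENTATION:
        model = build_unet(input_shape=(256, 256, 3))       # placeholder U-net builder
        model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="binary_crossentropy")
        generator = make_augmenter() if augment else ImageDataGenerator()
        history = model.fit(generator.flow(x_train, y_train, batch_size=3),
                            validation_data=(x_val, y_val), epochs=200,
                            callbacks=[tf.keras.callbacks.EarlyStopping(
                                patience=10, restore_best_weights=True)])
        results[(lr, augment)] = min(history.history["val_loss"])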
The network outputs a probability map, where each pixel has a value between 0 and 255 representing the likelihood of H.i being found at that position of the original image. This probability map is binarized by establishing a probability threshold, C_thr_ss. The areas with a predicted probability higher than C_thr_ss are marked in white (H.i), while the rest are marked in black (background), resulting in the H.i coverage map. The C_thr_ss value that provided the best performance was selected. The probability and coverage maps are the outputs of the SS module, depicted in Figure 6.
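The binarization step can be written compactly; the sketch below assumes the probability map is a NumPy array with values in the range 0-255 and that C_thr_ss is expressed on the same scale.

import numpy as np

def binarize_probability_map(prob_map, c_thr_ss):
    # Pixels above the threshold are marked as H.i (255), the rest as background (0).
    return np.where(prob_map > c_thr_ss, 255, 0).astype(np.uint8)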

3.2.2. Object Detection Module

This module tackles H.i detection in the sparse scenario. The module is based on an OD network to detect H.i instances. OD networks perform better when the target classes appear as isolated instances, as is the case for H.i in the sparse scenario.
For this module, YOLOv5 has been selected. YOLOv5 is an open-source, state-of-the-art deep learning model widely used for OD. Originally introduced by Redmon et al. in 2016 [32], YOLO revolutionized OD as a single-shot object detector, outperforming other networks like Faster R-CNN [33] in terms of speed and gaining significant popularity in the field. Over time, numerous improvements to the original network have been proposed, resulting in the development of new YOLO versions such as YOLOv3, YOLOv4, YOLOv5, YOLOv6, YOLOv7 and YOLOv8 [34,35,36,37,38,39]. YOLOv5 was selected for this work due to its widespread adoption and demonstrated success in previous underwater applications.
YOLOv5 is a family of models that offer scalability and computational efficiency and have obtained promising results in other underwater tasks [40]. The XL model (YOLOv5x) was selected due to its superior performance on state-of-the-art benchmarks and its suitability for detecting small objects [36]. Figure 7 shows the YOLOv5 architecture.
The network was trained on the sparse dataset, using early stopping to prevent overfitting, a batch size of 8, a stochastic gradient descent (SGD) optimizer, and transfer learning with pre-trained weights from the COCO dataset.
To select the network hyperparameters, the same procedure described in Section 3.2.1 was followed, testing diverse learning rate and data augmentation combinations and selecting the best-performing one. The tested hyperparameter combinations are listed in Table 2.
The network was trained for every learning rate value, both with and without using data augmentation. The learning rate values were selected by modifying the YOLOv5 default learning rate, 0.01, by a factor of 3. The employed data augmentation techniques included adjustments in hue, saturation, and value, image translations, scaling, horizontal flips, mosaics, mixup of images, and copy-paste augmentation.
The network outputs an array of detected H.i instances, denoted as OD_inst. Each instance in this array contains bounding box coordinates along with its associated class label and confidence score. However, as the final goal is to estimate coverage, the instances are converted into H.i coverage maps. To do so, a second instance array (OD_inst_thr_1) is generated by applying a confidence threshold (C_thr_od_1) to the initial OD_inst array, filtering out instances with lower confidence scores. Then, the OD_inst_thr_1 array is converted into a coverage map by marking the pixels within the predicted bounding boxes in white (255) and the rest in black (0). The C_thr_od_1 value that provided the best performance was selected.
Additionally, a probability map is generated by marking the pixels within the predicted bounding boxes of the OD_inst array with a grey-scale value between 0 and 255, corresponding to their instance confidence. The OD_inst array, the coverage map, and the probability map are the outputs of the OD module, depicted in Figure 8.
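As an illustration of this conversion, the sketch below builds both maps from a list of detections; the (x1, y1, x2, y2, confidence) format and the threshold value are assumptions, not the exact implementation.

import numpy as np

def od_outputs_to_maps(od_inst, image_shape, c_thr_od_1=0.25):
    # od_inst: list of (x1, y1, x2, y2, confidence) with pixel coordinates (assumed format).
    h, w = image_shape[:2]
    coverage_map = np.zeros((h, w), dtype=np.uint8)
    probability_map = np.zeros((h, w), dtype=np.uint8)
    for x1, y1, x2, y2, conf in od_inst:
        # Probability map: grey value proportional to the instance confidence.
        grey = int(round(conf * 255))
        probability_map[y1:y2, x1:x2] = np.maximum(probability_map[y1:y2, x1:x2], grey)
        # Coverage map: only instances above the confidence threshold C_thr_od_1.
        if conf >= c_thr_od_1:
            coverage_map[y1:y2, x1:x2] = 255
    return coverage_map, probability_map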

3.3. Merging Methods

As previously mentioned, the SS and OD modules are specifically designed for the dense and sparse scenarios, respectively. However, the ultimate goal of this work is to obtain a system capable of estimating the H.i coverage, independently of its distribution. Thus, it is necessary to design a method to merge both modules. This section presents the development of two different merging methods.

3.3.1. Weighted Merging

This merging pipeline combines the probability maps outputted by the SS and OD modules by weighting them, generating a new probability map, grey_merge. w_SS and w_OD represent the weights applied to the SS and OD modules, while grey_SS and grey_OD correspond to the grey values of the SS and OD probability maps, respectively. The weight values w_SS and w_OD range from 0 to 1 and satisfy the relationship w_OD = 1 − w_SS. This operation is described in Equation (1).
grey_merge = w_SS × grey_SS + w_OD × grey_OD,    (1)
Then, a threshold (C_thr_w) is applied to grey_merge to binarize it and obtain the weighted coverage map. The pixels in grey_merge with a value higher than C_thr_w are set to 1 and therefore considered as H.i, and the rest are set to 0 and considered as the background. Figure 9 offers a visual representation of the weighted merging pipeline.
The values for w_SS and w_OD were determined through a sweeping process, where w_SS was varied between 0 and 1 in steps of 0.05. Each combination of weights was evaluated at its corresponding best C_thr_w over the SS and OD validation sets. To find the best C_thr_w for a given weight combination, another sweeping process was conducted over the threshold value, ranging from 0 to 100 in steps of 1, evaluating the performance at each step. For this application, the values that provided the best results were w_SS = 0.15, w_OD = 0.85 and C_thr_w = 25.
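The weighted merging with the selected values can be sketched as follows, assuming both probability maps are uint8 NumPy arrays of the same size and that C_thr_w is expressed as a percentage of the 0-255 grey range.

import numpy as np

def weighted_merge(grey_ss, grey_od, w_ss=0.15, c_thr_w=25):
    w_od = 1.0 - w_ss
    grey_merge = w_ss * grey_ss.astype(np.float32) + w_od * grey_od.astype(np.float32)
    # Binarize: pixels above C_thr_w (scaled to grey values) are H.i, the rest background.
    threshold = c_thr_w / 100.0 * 255.0
    return np.where(grey_merge > threshold, 255, 0).astype(np.uint8)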

3.3.2. AUC Merging

This merging pipeline combines the information outputted by the SS and OD modules to exploit the strengths of each one. The main idea is to use the SS module to propose H.i covered regions and the OD module to validate or discard them and detect isolated shoots.
The AUC merging pipeline, depicted in Figure 10, consists of the following parts:
  • SS coverage map clustering.
  • OD instance thresholding.
  • Blob validation.
  • Coverage merging.

SS Coverage Map Clustering

The first step of the AUC merging pipeline is to cluster the SS coverage map pixel-wise segmentation into blobs of connected pixels. To do so, an opening morphological operation is applied to the coverage map, removing weak connections. Afterwards, the blobs are generated using a 4-connectivity connected-component algorithm. Finally, all blobs formed by fewer pixels than a determined pixel threshold (p_thr) are deleted (for this work, p_thr = 200). Figure 11 showcases an example of SS coverage map clustering.
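A possible OpenCV implementation of this clustering step is sketched below; the size of the opening kernel is an assumption, as it is not specified in the text.

import cv2
import numpy as np

def cluster_coverage_map(coverage_map, p_thr=200, kernel_size=5):
    # Morphological opening to remove weak connections (5x5 kernel assumed).
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    opened = cv2.morphologyEx(coverage_map, cv2.MORPH_OPEN, kernel)
    # 4-connectivity connected components.
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(opened, connectivity=4)
    blobs = []
    for label in range(1, n_labels):                 # label 0 is the background
        if stats[label, cv2.CC_STAT_AREA] >= p_thr:  # discard blobs smaller than p_thr pixels
            blobs.append(labels == label)            # boolean mask per blob
    return blobs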

OD Instance Thresholding

The OD_inst array is used as complementary information to accept or discard the resulting blobs from the previous step. Similar to the process described in Section 3.2.2, a confidence threshold C_thr_od_2 is applied to filter the predictions in the OD_inst array. This process generates a new array, OD_inst_thr_2, containing the instances for the blob validation process.
The reason for using an alternative threshold, C_thr_od_2, rather than C_thr_od_1, lies in its distinct purpose. While C_thr_od_1 primarily filters the output of the OD module, C_thr_od_2 determines which OD instances can be used as complementary information for validating blobs in the AUC merging. This difference justifies the choice of a more permissive threshold for C_thr_od_2, enabling the inclusion of instances with lower confidence levels, enriching the input data and ultimately enhancing system performance.
In order to determine the value of C_thr_od_2, a study is conducted utilizing images from the validation partitions of both the dense and sparse datasets. The images of these sets are forwarded into the OD module, obtaining their OD_inst instance arrays. Then, a sweeping process over the C_thr_od_2 value, ranging from 0 to 100% in steps of 1%, is performed. For each step, instances with a confidence level lower than the current C_thr_od_2 value are deleted. The remaining instances are classified as True Positives (TPs) if they overlap with a region marked as H.i in their corresponding ground truths; otherwise, they are classified as False Positives (FPs). Finally, the confidence threshold C_thr_od_2 is determined as the value that maximizes the difference, Δdiff, between the rates of change of TPs (ΔTP) and FPs (ΔFP), expressed in Equation (2). Figure 12 indicates that this maximum occurs at C_thr_od_2 = 1%; therefore, the value of C_thr_od_2 is set to 1%.
Δdiff = ΔTP − ΔFP.    (2)
Observing how TPs and FPs change over different confidence thresholds via their rate of change offers a clearer view of their behaviour. This is more practical than simply looking at their absolute numbers, as it provides a sharper perspective on the direction and continuity of changes, facilitating the identification of meaningful trends and the selection of an appropriate threshold.
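The following sketch illustrates this threshold-selection procedure; the per-image detection format and the overlap criterion are simplifying assumptions.

import numpy as np

def select_c_thr_od_2(instances_per_image, gt_masks):
    # instances_per_image: list (one entry per image) of (x1, y1, x2, y2, confidence) detections.
    # gt_masks: corresponding binary ground truth maps (H.i pixels > 0).
    thresholds = np.arange(0, 101) / 100.0
    tps, fps = [], []
    for thr in thresholds:
        tp = fp = 0
        for detections, gt in zip(instances_per_image, gt_masks):
            for x1, y1, x2, y2, conf in detections:
                if conf < thr:
                    continue
                # TP if the box overlaps any H.i pixel in the ground truth, FP otherwise.
                if (gt[y1:y2, x1:x2] > 0).any():
                    tp += 1
                else:
                    fp += 1
        tps.append(tp)
        fps.append(fp)
    # Rates of change between consecutive thresholds, and their difference (Equation (2)).
    d_tp, d_fp = np.diff(tps), np.diff(fps)
    best = int(np.argmax(d_tp - d_fp))
    return thresholds[best + 1]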

Blob Validation

Once the blobs are obtained from the SS coverage map clustering process, and the OD_inst array has been thresholded into OD_inst_thr_2 by applying the threshold C_thr_od_2, a blob validation process is carried out to determine which blobs should be accepted and which should not.
To do so, two custom metrics are calculated for each blob. The first of these metrics is the Area Under the Curve (AUC), which quantifies the area beneath the coverage-confidence curve of a blob. This curve is constructed by sweeping through a range of confidence thresholds from 0% to 100% and calculating the blob coverage value at each step. The coverage value at each step is calculated as the percentage of the blob pixels covered by the instances in OD_inst_thr_2 with a confidence value higher than the threshold of the current step.
The second metric, denoted as N_inst, is defined as the number of OD instances in OD_inst_thr_2 that intersect with the blob. For a clearer understanding of these metrics, refer to Figure 13, which illustrates how AUC and N_inst are determined for a specific blob.
In the following algorithm step, the previously mentioned metrics are used to categorize the blobs into ten groups based on their AUC values, distributed in 10% intervals. Each group is associated with a specific threshold value, denoted as N_thr. If the N_inst value of a blob is higher than the N_thr of its corresponding AUC group, it is validated; otherwise, it is discarded. Finally, the blob-filtered map is created by preserving only the validated blobs and considering the discarded blobs as a background region. The values of N_thr for each AUC range used in this work are listed in Table 3. The blob validation process, along with an application example, is presented in Figure 14.
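The sketch below shows how the AUC and N_inst metrics and the group-wise validation could be computed for a single blob; the detection format is an assumption and the AUC is approximated as the mean coverage over the confidence sweep.

import numpy as np

def blob_metrics(blob_mask, od_inst_thr_2):
    # blob_mask: boolean array; od_inst_thr_2: list of (x1, y1, x2, y2, confidence) detections.
    blob_area = blob_mask.sum()
    coverages = []
    for thr in np.arange(0, 101) / 100.0:
        covered = np.zeros_like(blob_mask)
        for x1, y1, x2, y2, conf in od_inst_thr_2:
            if conf > thr:
                covered[y1:y2, x1:x2] = True
        coverages.append((covered & blob_mask).sum() / blob_area)
    auc = float(np.mean(coverages))                  # area under the coverage-confidence curve
    n_inst = sum(bool(blob_mask[y1:y2, x1:x2].any())
                 for x1, y1, x2, y2, _ in od_inst_thr_2)
    return auc, n_inst

def validate_blob(auc, n_inst, n_thr_per_range):
    # n_thr_per_range: ten N_thr values, one per 10% AUC group (e.g., the values in Table 3).
    group = min(int(auc * 10), 9)
    return n_inst > n_thr_per_range[group]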
To find the best N_thr threshold for each AUC range, a study is conducted using the validation partitions of both the dense and sparse datasets. To do so, the images from these partitions are forwarded into the SS and OD modules, obtaining their SS coverage maps and OD_inst_thr_2 arrays. These coverage maps are then processed using the previously described clustering algorithm, obtaining their blobs of connected pixels. These blobs are then labelled as True Blobs (TBs) if more than 50% of their area is marked as H.i in their corresponding ground truth label maps; otherwise, they are labelled as False Blobs (FBs).
Afterwards, the AUC and N_inst metrics are calculated for all blobs, and the blobs are split into 10% ranges based on their AUC values. For each AUC range, a sweep of N_thr from 0 to 300 is conducted. At each step, blobs are accepted or discarded based on whether their N_inst is higher than the step N_thr. If a blob labelled as TB is accepted, it is considered a TP; otherwise, an FN. Similarly, if a blob labelled as FB is accepted, it is considered an FP; otherwise, a TN. An F1-score is calculated using this classification. Finally, the N_thr value that maximizes the F1-score for each AUC range is selected.
The rationale behind creating these distinct N_thr thresholds lies in the observation that blobs with a high AUC value are primarily covered by instances with high confidence, increasing the likelihood that they represent correct blobs. Therefore, such blobs require a lower N_thr value to be validated. Conversely, blobs with a low AUC value are characterized by sparse coverage with instances of lower confidence, making them less likely to be correct blobs. Hence, these blobs require a higher N_thr value for validation.

Coverage Merging

The final step of the AUC merging algorithm is the coverage merging that results in the final outputted coverage map. Once the blob validation is computed, the validated blobs are marked as H.i, and the discarded ones are marked as the background, generating the blob-filtered map. This map is merged with the OD coverage map by applying an OR function. If a pixel is classified as H.i in any of the mentioned maps, it is marked as H.i in the final coverage map too; otherwise, it is considered as the background. This combined coverage map is the output of the AUC merging.
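This final combination amounts to a single pixel-wise OR, sketched below for binary uint8 maps with H.i marked as 255.

import numpy as np

def merge_coverage(blob_filtered_map, od_coverage_map):
    merged = (blob_filtered_map > 0) | (od_coverage_map > 0)
    return np.where(merged, 255, 0).astype(np.uint8)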

3.4. Validation and Evaluation Metrics

To assess the robustness of the network hyperparameter selection process for the SS and OD modules, a five-fold cross-validation is performed. To do this, the training and validation data of the dense and sparse datasets are divided into five folds. Next, five training runs for each hyperparameter combination are conducted, each one using a different fold as validation data and the remaining four folds as training data. Following this process, the hyperparameter combination that achieves the best average result is selected.
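A minimal sketch of this cross-validation loop, assuming scikit-learn and a placeholder training function that returns the validation F1-score, is shown below.

import numpy as np
from sklearn.model_selection import KFold

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
samples = np.asarray(train_val_images)               # placeholder list of training/validation images
for train_idx, val_idx in kfold.split(samples):
    f1 = train_and_validate(samples[train_idx], samples[val_idx])   # placeholder training function
    scores.append(f1)
print(f"mean F1 = {np.mean(scores):.3f}, std = {np.std(scores):.3f}")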
To evaluate the performance of the SS and OD modules, and the merging methods, the coverage map output is compared at a pixel level against the ground truth label maps. To do so, the OD ground truths are converted into semantic label maps by marking the pixels within the ground truth bounding boxes in white (255) and the rest in black (0).
From this, the number of TP, TN, FP, and FN pixels is obtained. Finally, the Precision, Recall, and F1-score metrics are calculated following Equations (3), (4) and (5), respectively.
Precision = TP / (TP + FP),    (3)
Recall = TP / (TP + FN),    (4)
F1-score = 2 × (Precision × Recall) / (Precision + Recall).    (5)
This pixel-wise comparison is not the usual evaluation procedure for OD models. However, for the H.i particular case and the final goal of this work, a pixel-wise evaluation is more appropriate for several reasons.
Firstly, H.i has a recursive shape, making it difficult to define the boundaries of each shoot. This means that the OD module may detect complete individuals, isolated twigs, or sets of twigs belonging to different plants. Thus, in any of these cases, the predicted bounding boxes and the ground truth bounding boxes may differ and, under the standard IoU evaluation, would be misclassified as false positives even though H.i is present. Moreover, considering that the ultimate goal of this work is to obtain a coverage estimation, it is not necessary to determine the precise boundaries of each individual shoot, so it is desirable to regard these detections as true positives.
Secondly, OD models usually implement a non-maximum suppression to filter similar predicted instances according to an overlapping percentage and score criteria. However, for this application, eliminating overlapping detections is unnecessary, as it could lead to missing H.i shoots or parts of individual plants. Additionally, having repeated instances would not be detrimental to the system. Thus, non-maximum suppression is not integrated into this system.
Apart from the F1-score, a Coverage Error metric is computed, measuring the difference between the estimated H.i coverage and the ground truth coverage, and obtained as:
Coverage Error = (1/n) Σ_{i=1..n} |coverage_pred,i − coverage_gt,i|,
where n denotes the number of images and coverage_i is the percentage of pixels covered by H.i in image i.
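The sketch below computes these pixel-wise metrics for one prediction/ground-truth pair of binary maps; averaging the last value over all test images yields the Coverage Error.

import numpy as np

def pixel_metrics(pred_map, gt_map):
    pred, gt = pred_map > 0, gt_map > 0
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Per-image coverage difference, in percentage points of the image area.
    coverage_error = abs(pred.mean() - gt.mean()) * 100
    return precision, recall, f1, coverage_error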

4. Experimental Results and Discussion

This section presents the results obtained for each developed solution. First, the results of the OD and SS modules over their respective test sets and the combination of both (hereafter referred to as the complete dataset test set) are discussed. Then, the results for each merging method over the complete dataset test set are reported.

4.1. Semantic Segmentation Results

Table 4 presents the average F1-score obtained for each hyperparameter combination when performing the five-fold cross-validation over the SS (dense) dataset.
As can be observed, the best result is achieved when using a learning rate of 0.00033 and no data augmentation, reaching a mean validation F1-score of 87.8% with a standard deviation of 0.19% across the five folds.
The best k-fold model, which obtained an F1-score of 89.9% over the SS dataset test set, was then tested over the complete dataset test set. Its performance dropped to an F1-score of 78.8%, showing that the accuracy of the model decreases when it is evaluated with data from both scenarios. The model obtained a total Coverage Error of 10.0% for the complete dataset test set.

4.2. Object Detection Results

Table 5 presents the average F1-score obtained for each hyperparameter combination when performing the five-fold cross-validation over the OD (sparse) dataset.
The best result is achieved when using a learning rate of 0.0033 and applying data augmentation, reaching a mean validation F1-score of 77.4% with a standard deviation of 2% across the five folds.
The best k-fold model, which obtained an F1-score of 75.2% over the OD dataset test set, was then tested over the complete dataset test set, obtaining an F1-score of 35.0%. This significant decrease in performance was not unexpected; it was caused by the difficulties the OD module presented when evaluating images from the dense scenario. A Coverage Error of 19.9% was obtained when estimating the total coverage for the complete dataset test set.

4.3. Merging Method Results

As observed in the previous sections, the performance of both scenario-based modules decreased when tested against the complete dataset test set. Here, the results of applying the different merging methods to combine the best qualities of each module are studied.
For the weighted merging method, a sweep was performed over the weights assigned to the SS and the OD modules. The best results were obtained with w_SS = 0.15 and w_OD = 0.85.
Table 6 presents the results of both SS and OD modules, along with each merging method, over the complete dataset test set.
Both merging methods significantly improve the results of the individual modules. The AUC Merging achieved the highest F1-score at 84.2% and the lowest Coverage Error at 5.9%.

5. Autonomous Underwater Vehicle Online Implementation

The objectives of this work include implementing the H.i coverage estimation system on an AUV and validating its capability for online execution. This section describes the AUV characteristics, the online implementation, and its validation.
The AUV used is a SPARUS II model unit [42] (Figure 15), equipped with a pair of Manta G283 cameras (Allied Vision, Stadtroda, Germany) facing downwards and a computer to run the image acquisition and execute the H.i coverage estimation system (Intel i7 processor at 2.5 GHz, Intel Iris Graphics 6100, and 16 GB of RAM).
The system was implemented using the Robot Operating System (ROS) middleware [43]. This framework is a standard in robotics due to its versatility and powerful collection of tools and libraries that simplify creating and implementing new code and functionalities. First, the images published by the camera are grabbed, rectified, fed into the system, and processed by the OD and SS modules and the merging methods. Finally, the obtained coverage maps are published back into ROS to be accessed by other robots, sensors, or actuators.
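A simplified sketch of such a node is given below; the topic names and the estimate_coverage function are placeholders and do not correspond to the actual implementation.

#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

class HalimedaCoverageNode:
    def __init__(self):
        self.bridge = CvBridge()
        self.pub = rospy.Publisher("/halimeda/coverage_map", Image, queue_size=1)
        rospy.Subscriber("/camera/image_rect_color", Image, self.callback, queue_size=1)

    def callback(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        coverage_map = estimate_coverage(frame)      # placeholder: OD + SS + weighted merging
        out = self.bridge.cv2_to_imgmsg(coverage_map, encoding="mono8")
        out.header = msg.header                      # keep the original timestamp
        self.pub.publish(out)

if __name__ == "__main__":
    rospy.init_node("halimeda_coverage_estimation")
    HalimedaCoverageNode()
    rospy.spin()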
AUVs tend to have limited computational resources due to the constraints of underwater environments (limited space, power supply, and heat dissipation). Thus, a key aspect of an online implementation is system efficiency. Following this premise, a series of considerations were made regarding the different elements of the system. For the SS module, no changes were needed. For the OD module, given that YOLOv5 is a scalable model, the feasibility of the implementation was studied using diverse model sizes. Finally, the weighted merging pipeline was selected, as its execution time is negligible, unlike the AUC merging, whose execution time depends on the number of detected blobs and instances and can take up to thirty seconds.
When evaluating the feasibility of executing the system online, the most crucial aspect is ensuring that there is overlap between consecutive outputted coverage maps, guaranteeing full coverage of the inspected area. This overlap is obtained using the following equations:
overlap = (h_FP − d_KF) / h_FP,
d_KF = v / rate,
h_FP = (a · h_image) / f,
where v denotes the AUV velocity, rate denotes the coverage map output frame rate, a denotes the navigation altitude, f denotes the camera focal length and h_image denotes the image height in pixels.
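The overlap check follows directly from these equations; the numerical values in the example below are purely illustrative and are not taken from the paper.

def image_overlap(v, rate, a, f, h_image):
    # v: AUV velocity [m/s], rate: coverage map output rate [fps], a: altitude [m],
    # f: focal length [px], h_image: image height [px]. Returns the overlap fraction.
    h_fp = (a * h_image) / f        # footprint height on the seabed [m]
    d_kf = v / rate                 # distance travelled between consecutive outputs [m]
    return (h_fp - d_kf) / h_fp

# Illustrative values only: 0.5 m/s, 0.81 fps, 2 m altitude, 1400 px focal length, 1080 px height.
print(image_overlap(v=0.5, rate=0.81, a=2.0, f=1400.0, h_image=1080.0))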
A performance study for each YOLOv5 model size was conducted, studying their corresponding inference times when executed on board the AUV. Table 7 presents the OD module and total weighted merging inference times for each model.
As expected, smaller YOLOv5 models have much lower inference times. Since computational power is a scarce resource in underwater systems, and aiming to minimize the impact on its workload, the YOLOv5 nano (YOLOv5n) model is selected for the online implementation, achieving a coverage map output frame rate of 0.81 fps.
The weighted merging, using YOLOv5n as its OD module, was tested over the complete dataset test set, achieving an F1-score of 77.8%, slightly lower than that obtained using the YOLOv5x model. This marginal loss in performance shows that the SS module compensates for the drop caused by the reduction in model size and that the nano model is adequate for the on-board OD module.

6. Conclusions and Future Work

This paper presents a novel pipeline for the automatic estimation of H.i coverage. First, two scenario-based solutions are presented: one for densely covered scenarios, based on SS using U-net; and another for sparsely covered scenarios, based on OD using YOLOv5x. The modules achieved F1-scores of 87.8% and 77.4% on their corresponding scenarios, respectively.
Although the scenario-based solutions performed relatively well when evaluating images from their corresponding scenario, both modules experienced a loss in performance when estimating H.i coverage over the complete dataset test set. To address this limitation, this work proposed two novel merging methods to generalize the H.i coverage estimation and make it adaptable to any invasion stage. The weighted and AUC merging methods outperformed the scenario-based solutions, obtaining F1-scores of 79.7% and 84.2% over the complete dataset test set and Coverage Errors of 8.8% and 5.9%, respectively.
Also, the system was integrated into ROS and deployed on an AUV. Finally, the system was adapted to run online by selecting a lighter YOLOv5 model and merging method. The system maintained good H.i estimation results and achieved an execution time sufficiently low to ensure full coverage of the inspected area during inspection tasks.
Further work will focus on expanding the dataset by using generative methods to fine-tune the models with a larger dataset. Additionally, the models could be adapted to recognize dead H.i and other seaweed species to provide more complete information.

Author Contributions

Conceptualisation, Y.G.-C.; methodology, C.M.-G., M.M.-A. and Y.G.-C.; software, C.M.-G. and M.M.-A.; validation, C.M.-G. and M.M.-A.; investigation, C.M.-G. and M.M.-A.; resources, Y.G.-C.; data curation, C.M.-G. and M.M.-A.; writing—original draft preparation, C.M.-G. and M.M.-A.; writing—review and editing, C.M.-G., M.M.-A. and Y.G.-C.; supervision, Y.G.-C.; project administration, Y.G.-C.; funding acquisition, Y.G.-C. All authors have read and agreed to the published version of the manuscript.

Funding

Caterina Muntaner-Gonzalez was supported by the Consejería de Educación y Universidades del Gobierno de las Islas Baleares under the contract FPU-2022-010-C CAIB2022. This work has been partially sponsored and promoted by the Comunitat Autonoma de les Illes Balears through the Direcció General de Recerca, Innovació i Transformació Digital and the Conselleria de Economia, Hisenda i Innovació via Plans complementaris del Pla de Recuperació, Transformació i Resiliència (PRTR-C17-I1) and by the European Union-Next Generation UE (BIO/002A.1 and BIO/022B.1). Nevertheless, the views and opinions expressed are solely those of the author or authors, and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission are to be held responsible. This work has been partially sponsored and promoted by grant PLEC2021-007525 funded by MCIN/AEI/10.13039/50110001103 and by the European Union NextGenerationEU/PRTR.

Data Availability Statement

The data, code and trained models presented in this work can be found here (https://github.com/srv/Halimeda) (accessed on 26 December 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bellard, C.; Cassey, P.; Blackburn, T.M. Alien species as a driver of recent extinctions. Biol. Lett. 2016, 12, 20150623. [Google Scholar] [CrossRef] [PubMed]
  2. Kremen, C.; Merenlender, A.M.; Murphy, D.D. Ecological monitoring: A vital need for integrated conservation and development programs in the tropics. Conserv. Biol. 1994, 8, 388–397. [Google Scholar] [CrossRef]
  3. Chatterjee, S. An analysis of threats to marine biodiversity and aquatic ecosystems. SSRN Electron. J. 2017. [Google Scholar] [CrossRef]
  4. Bianchi, C.N. Biodiversity issues for the forthcoming tropical Mediterranean Sea. Hydrobiologia 2007, 580, 7–21. [Google Scholar] [CrossRef]
  5. Guiry, M. AlgaeBase. World-Wide Electronic Publication. 2013. Available online: http://www.algaebase.org (accessed on 9 October 2023).
  6. Van Tussenbroek, B.I.; Barba Santos, M.G. Demography of Halimeda incrassata (Bryopsidales, Chlorophyta) in a Caribbean reef lagoon. Mar. Biol. 2011, 158, 1461–1471. [Google Scholar] [CrossRef]
  7. Alós, J.; Tomas, F.; Terrados, J.; Verbruggen, H.; Ballesteros, E. Fast-spreading green beds of recently introduced Halimeda incrassata invade Mallorca island (NW Mediterranean Sea). Mar. Ecol. Prog. Ser. 2016, 558, 153–158. [Google Scholar] [CrossRef]
  8. Alós, J.; Bujosa-Homar, E.; Terrados, J.; Tomas, F. Spatial distribution shifts in two temperate fish species associated to a newly-introduced tropical seaweed invasion. Biol. Invasions 2018, 20, 3193–3205. [Google Scholar] [CrossRef]
  9. Moniruzzaman, M.; Islam, S.M.S.; Lavery, P.; Bennamoun, M.; Lam, C.P. Imaging and Classification Techniques for Seagrass Mapping and Monitoring: A Comprehensive Survey. arXiv 2019, arXiv:cs.CV/1902.11114. [Google Scholar]
  10. Miner, S.P. Application of acoustic hydrosurvey technology to the mapping of eelgrass (Zostera marina) distribution in Humboldt Bay, California. In Proceedings of the Coastal Zone’93, ASCE, New Orleans, LA, USA, 19–23 July 1993; pp. 2429–2442. [Google Scholar]
  11. Mutlu, E.; Olguner, C. Density-depended acoustical identification of two common seaweeds (Posidonia oceanica and Cymodocea nodosa) in the Mediterranean Sea. Thalass. Int. J. Mar. Sci. 2023, 39, 1155–1167. [Google Scholar] [CrossRef]
  12. Kruss, A.; Tegowski, J.; Tatarek, A.; Wiktor, J.; Blondel, P. Spatial distribution of macroalgae along the shores of Kongsfjorden (West Spitsbergen) using acoustic imaging. Pol. Polar Res. 2017, 38, 205–229. [Google Scholar] [CrossRef]
  13. Gao, L.; Li, X.; Kong, F.; Yu, R.; Guo, Y.; Ren, Y. AlgaeNet: A deep-learning framework to detect floating green algae from optical and SAR imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 2782–2796. [Google Scholar] [CrossRef]
  14. Clarke, K.; Hennessy, A.; McGrath, A.; Daly, R.; Gaylard, S.; Turner, A.; Cameron, J.; Lewis, M.; Fernandes, M.B. Using hyperspectral imagery to investigate large-scale seagrass cover and genus distribution in a temperate coast. Sci. Rep. 2021, 11, 4182. [Google Scholar] [CrossRef] [PubMed]
  15. Dierssen, H.M.; Chlus, A.; Russell, B. Hyperspectral discrimination of floating mats of seagrass wrack and the macroalgae Sargassum in coastal waters of Greater Florida Bay using airborne remote sensing. Remote Sens. Environ. 2015, 167, 247–258. [Google Scholar] [CrossRef]
  16. Raine, S.; Marchant, R.; Moghadam, P.; Maire, F.; Kettle, B.; Kusy, B. Multi-species seagrass detection and classification from underwater images. In Proceedings of the 2020 Digital Image Computing: Techniques and Applications (DICTA), Melbourne, Australia, 29 November–2 December 2020. [Google Scholar] [CrossRef]
  17. Bonin-Font, F.; Burguera, A.; Lisani, J.L. Visual discrimination and large area mapping of Posidonia oceanica using a lightweight auv. IEEE Access 2017, 5, 24479–24494. [Google Scholar] [CrossRef]
  18. Martin-Abadal, M.; Guerrero-Font, E.; Bonin-Font, F.; Gonzalez-Cid, Y. Deep semantic segmentation in an AUV for online Posidonia oceanica meadows identification. IEEE Access 2018, 6, 60956–60967. [Google Scholar] [CrossRef]
  19. Weidmann, F.; Jager, J.; Reus, G.; Schultz, S.T.; Kruschel, C.; Wolff, V.; Fricke-Neuderth, K. A Closer look at seagrass meadows: Semantic segmentation for visual coverage estimation. In Proceedings of the OCEANS 2019—Marseille, Marseille, France, 17–20 June 2019. [Google Scholar] [CrossRef]
  20. Park, J.; Baek, J.; Kim, J.; You, K.; Kim, K. Deep learning-based algal detection model development considering field application. Water 2022, 14, 1275. [Google Scholar] [CrossRef]
  21. Noman, M.K.; Islam, S.M.S.; Abu-Khalaf, J.; Jalali, S.M.J.; Lavery, P. Improving accuracy and efficiency in seagrass detection using state-of-the-art AI techniques. Ecol. Inform. 2023, 76, 102047. [Google Scholar] [CrossRef]
  22. Moniruzzaman, M.; Islam, S.M.S.; Lavery, P.; Bennamoun, M. Faster R-CNN based deep learning for seagrass detection from underwater digital images. In Proceedings of the 2019 Digital Image Computing: Techniques and Applications, DICTA 2019, Perth, Australia, 2–4 December 2019; pp. 1–7. [Google Scholar] [CrossRef]
  23. Ranolo, E.; Gorro, K.; Ilano, A.; Pineda, H.; Sintos, C.; Gorro, A.J. Underwater and coastal seaweeds detection for fluorescence seaweed photos and videos using YOLOV3 and YOLOV5. In Proceedings of the 2023 2nd International Conference for Innovation in Technology (INOCON), Bangalore, India, 3–5 March 2023; pp. 1–5. [Google Scholar]
  24. Bonin-Font, F.; Abadal, M.M.; Font, E.G.; Torres, A.M.; Nordtfeldt, B.M.; Crespo, J.M.; Tomas, F.; Gonzalez-Cid, Y. AUVs for control of marine alien invasive species. In Oceans Conference Record, Proceedings of the OCEANS 2021: San Diego—Porto, San Diego, CA, USA, 20–23 September 2021; IEEE: Piscataway, NJ, USA; pp. 1–5. [CrossRef]
  25. Systems, Robotics & Vision, University of the Balearic Islands. Halimeda. 2023. Available online: https://github.com/srv/Halimeda (accessed on 26 December 2023).
  26. Observadores del Mar. Available online: https://www.observadoresdelmar.es (accessed on 7 June 2023).
  27. The GIMP Development Team. GIMP. Available online: https://www.gimp.org/ (accessed on 26 December 2023).
  28. Tzutalin, D. tzutalin/labelImg. Free Software: MIT License. 2015. Available online: https://github.com/tzutalin/labelImg (accessed on 26 December 2023).
  29. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  30. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542. [Google Scholar] [CrossRef]
  31. Scarpetta, M.; Affuso, P.; De Virgilio, M.; Spadavecchia, M.; Andria, G.; Giaquinto, N. Monitoring of seagrass meadows using satellite images and U-Net convolutional neural network. In Proceedings of the IEEE Instrumentation and Measurement Technology Conference, Ottawa, ON, Canada, 16–19 May 2022; pp. 1–6. [Google Scholar] [CrossRef]
  32. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  33. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef]
  34. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  35. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  36. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Yifu, Z.; Wong, C.; Montes, D.; et al. Ultralytics/yolov5: v7.0 - YOLOv5 SotA Realtime Instance Segmentation. Zenodo 2022. Available online: https://zenodo.org/records/7002879 (accessed on 20 December 2023).
  37. Li, C.; Li, L.; Geng, Y.; Jiang, H.; Cheng, M.; Zhang, B.; Ke, Z.; Xu, X.; Chu, X. YOLOv6 v3.0: A Full-Scale Reloading. arXiv 2023, arXiv:cs.CV/2301.05586. [Google Scholar]
  38. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
  39. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8, 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 26 December 2023).
  40. Wang, H.; Sun, S.; Wu, X.; Li, L.; Zhang, H.; Li, M.; Ren, P. A yolov5 baseline for underwater object detection. In Proceedings of the OCEANS 2021: San Diego–Porto, Virtual, 20–23 September 2021; pp. 1–4. [Google Scholar]
  41. Xu, R.; Lin, H.; Lu, K.; Cao, L.; Liu, Y. A forest fire detection system based on ensemble learning. Forests 2021, 12, 217. [Google Scholar] [CrossRef]
  42. Carreras, M.; Hernández, J.D.; Vidal, E.; Palomeras, N.; Ribas, D.; Ridao, P. Sparus II AUV - A hovering vehicle for seabed inspection. IEEE J. Ocean. Eng. 2018, 43, 344–355. [Google Scholar] [CrossRef]
  43. Stanford Artificial Intelligence Laboratory. Robotic Operating System. Version: ROS Melodic Morenia. 2018. Available online: https://www.ros.org (accessed on 20 December 2023).
Figure 1. Locations of the coast of Mallorca and Cabrera where H.i images were taken. The red dots indicate these locations, specifically marking Ses Illetes, Cala Blava, and Colònia de Sant Jordi in Mallorca and sa Platgeta in Cabrera.
Figure 2. Dataset examples illustrating different types of scenarios in which H.i can be found depending on its invasive stage. Left: sparse scenario, corresponding with an early stage of invasion. Right: dense scenario, corresponding with an advanced stage of invasion of the algae.
Figure 3. Image annotation example. (a) Original image. (b) Ground truth label map, where the H.i is marked in white and the background in black. (c) Original image. (d) Labelled image marking the H.i instances with bounding boxes.
Figure 4. Dataset management. The complete dataset is partitioned into the sparse dataset and the dense dataset.
Figure 5. Schema of the U-net network architecture, redrawn from [29]. The left side of the U-shape is the contraction path, where each layer consists of two 3 × 3 convolutions with ReLU activation and a 2 × 2 maximum pooling layer. The right side is the expansion path, consisting of the decoding stage and an upsampling process realized via 2 × 2 deconvolutions that halve the number of input channels.
Figure 6. Diagram of the SS module. The trained U-net processes the input images and outputs a probability map that is then binarized using a threshold C_thr_ss to obtain the coverage map. The probability map and the coverage map are the outputs of the SS module.
Figure 7. Schema of the YOLOv5 network architecture, redrawn from [41]. It is composed of three main parts: Backbone (CSPDarkNet), Neck (PANet), and Head (YOLOv5 Head). The data are first input to CSPDarknet for feature extraction and then fed to PANet for feature fusion. Finally, the Head outputs the detection results (class, score, and bounding box).
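As a usage illustration (not part of the pipeline description itself), a custom-trained YOLOv5 model of this kind can be loaded and run through the Ultralytics torch.hub interface; the weight file name below is a hypothetical placeholder for a detector trained on H.i.

```python
# Hedged example: running a custom-trained YOLOv5 model via torch.hub.
# "halimeda_yolov5.pt" is a hypothetical weight file, not a released artifact.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="halimeda_yolov5.pt")
results = model("frame.jpg")     # single-image inference
detections = results.xyxy[0]     # tensor rows: [x1, y1, x2, y2, confidence, class]
```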
Figure 8. Diagram of the OD module. The trained YOLOv5 network processes the input images and outputs the $OD_{inst}$ array, which is thresholded by $C_{thr\_od\_1}$ to obtain a new $OD_{inst\_thr\_1}$ array. These instance arrays are converted to an image format, generating the probability map and the coverage map, respectively. The probability map, the coverage map, and the $OD_{inst}$ array are the outputs of the module.
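One possible reading of the "conversion to image format" described above is to paint each retained bounding box with its confidence score, keeping the per-pixel maximum. The sketch below is only an interpretation of that step; the function and parameter names are assumptions.

```python
# Illustrative rasterization of OD detections into a probability map: each
# detection above C_thr_od_1 fills its bounding box with its confidence score.
import numpy as np

def detections_to_probability_map(od_inst, image_shape, c_thr_od_1=0.05):
    """od_inst: iterable of (x1, y1, x2, y2, confidence) in pixel coordinates."""
    prob_map = np.zeros(image_shape[:2], dtype=np.float32)
    for x1, y1, x2, y2, conf in od_inst:
        if conf < c_thr_od_1:
            continue
        region = prob_map[int(y1):int(y2), int(x1):int(x2)]
        np.maximum(region, conf, out=region)  # keep the highest confidence per pixel
    return prob_map
```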
Figure 9. Weighted merging pipeline. It performs a weighted combination of the probability maps outputted by the SS and OD modules and applies a confidence threshold to obtain the weighted coverage map.
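To make the combination explicit, the following minimal sketch computes a weighted average of the two probability maps and thresholds the result; the weight and confidence threshold shown are placeholders, not the tuned values reported here.

```python
# Minimal sketch of the weighted merging step of Figure 9: weighted average of
# the SS and OD probability maps, followed by a confidence threshold.
import numpy as np

def weighted_merge(prob_ss, prob_od, w_ss=0.5, c_thr=0.5):
    prob_weighted = w_ss * prob_ss + (1.0 - w_ss) * prob_od
    return (prob_weighted >= c_thr).astype(np.uint8)  # weighted coverage map
```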
Figure 10. AUC merging pipeline. It uses OD instance information to validate areas of coverage generated from the SS coverage map. Later, this information is merged with the OD coverage map to obtain the AUC coverage map.
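The final merge is sketched below under the assumption that "merged" means a per-pixel logical OR between the validated SS blobs and the OD coverage map; the helper name is hypothetical.

```python
# Sketch of the final AUC-merging step: validated SS blobs combined (logical OR)
# with the OD coverage map to form the AUC coverage map. Assumed interpretation.
import numpy as np

def auc_merge(validated_blobs, od_coverage_map):
    merged = od_coverage_map.astype(bool)
    for blob in validated_blobs:
        merged = np.logical_or(merged, blob.astype(bool))
    return merged.astype(np.uint8)
```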
Figure 11. SS coverage map clustering. (a) SS coverage map. (b) Blobs after applying the opening operation. (c) Blobs after applying the 4-connectivity connected-component algorithm. (d) Final blobs after applying the threshold $p_{thr}$.
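An OpenCV-based sketch of this clustering step is given below: morphological opening, 4-connectivity connected-component labelling, and removal of blobs smaller than $p_{thr}$ pixels. The kernel size and the $p_{thr}$ value are illustrative assumptions, not the settings used in this work.

```python
# Sketch of the clustering of Figure 11: opening, 4-connectivity connected
# components, and removal of blobs smaller than p_thr pixels.
import cv2
import numpy as np

def cluster_coverage_map(coverage_map: np.ndarray, p_thr: int = 500):
    kernel = np.ones((5, 5), np.uint8)
    opened = cv2.morphologyEx(coverage_map, cv2.MORPH_OPEN, kernel)
    n_labels, labels = cv2.connectedComponents(opened, connectivity=4)
    blobs = []
    for label in range(1, n_labels):      # label 0 is the background
        blob = (labels == label).astype(np.uint8)
        if blob.sum() >= p_thr:           # discard small blobs
            blobs.append(blob)
    return blobs
```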
Figure 12. Curves indicating the increase in TPs ($\Delta TPs$), the increase in FPs ($\Delta FPs$), and their difference ($\Delta_{diff}$) for each $C_{thr\_od\_2}$ value.
Figure 13. AUC and $N_{inst}$ obtaining process. (a) Original image; (b) $OD_{inst\_thr\_2}$ instances from the OD module output printed over the original image; (c) blob generated from the coverage map outputted by the SS module; (d) coverage–confidence curve, indicating its AUC, and the $N_{inst}$ value, obtained as the number of $OD_{inst\_thr\_2}$ instances that intersect with the blob; (e–g) $OD_{inst\_thr\_2}$ instances with confidence > 5%, 20%, and 40%, respectively, printed over the blob, showcasing its coverage at these confidence thresholds.
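The two per-blob metrics of Figure 13 can be computed as sketched below: the coverage–confidence curve is sampled at a set of confidence thresholds and integrated to obtain the AUC, and $N_{inst}$ counts the detections intersecting the blob. The threshold sampling and helper names are assumptions made for illustration.

```python
# Illustrative computation of the per-blob AUC and N_inst metrics of Figure 13.
import numpy as np

def blob_metrics(blob_mask, od_inst_thr_2, confidence_steps=np.linspace(0.0, 1.0, 21)):
    """blob_mask: binary (0/1) array; od_inst_thr_2: list of (x1, y1, x2, y2, conf)."""
    blob_area = max(int(blob_mask.sum()), 1)     # guard against empty masks
    coverage = []
    for c in confidence_steps:
        covered = np.zeros_like(blob_mask)
        for x1, y1, x2, y2, conf in od_inst_thr_2:
            if conf > c:
                covered[int(y1):int(y2), int(x1):int(x2)] = 1
        coverage.append((covered & blob_mask).sum() / blob_area)
    auc = np.trapz(coverage, confidence_steps) * 100   # AUC as a percentage
    n_inst = sum(1 for x1, y1, x2, y2, conf in od_inst_thr_2
                 if blob_mask[int(y1):int(y2), int(x1):int(x2)].any())
    return auc, n_inst
```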
Figure 14. (Left): Flow chart of the blob validation algorithm. For each blob generated from the SS coverage map clustering, both its $N_{inst}$ and AUC metrics are computed. Afterwards, the AUC range corresponding to the AUC value of the blob is determined. If the $N_{inst}$ value of the blob surpasses the $N_{thr}$ value associated with the blob's AUC range, the blob is validated; otherwise, it is discarded. (Right): Blob validation example. The blob validation algorithm is applied to each blob from the SS coverage map clustering. In this example, the yellow and pink blobs are validated, as their $N_{inst}$ is greater than the corresponding $N_{thr}$ value within the AUC range matching their AUC value. Conversely, the blue and green blobs are discarded, as they fail to meet this condition.
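Combining this flow chart with the $N_{thr}$ values reported later in Table 3, the validation rule reduces to a small lookup; the sketch below uses assumed names and the 10-percentage-point AUC bins of that table.

```python
# Sketch of the blob validation rule: a blob is kept if its N_inst surpasses
# the N_thr assigned to its AUC range (values taken from Table 3).
N_THR_PER_AUC_RANGE = [80, 15, 1, 1, 1, 1, 1, 1, 1, 1]   # one value per 10% AUC bin

def validate_blob(auc_percent: float, n_inst: int) -> bool:
    bin_index = min(int(auc_percent // 10), 9)   # map AUC in [0, 100] to its bin
    return n_inst > N_THR_PER_AUC_RANGE[bin_index]
```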
Figure 15. SPARUS II AUV.
Table 1. Combinations of hyperparameters tested for the SS module. The network was trained for each learning rate with and without data augmentation. The learning rate values were selected by scaling the default U-net learning rate (0.001) up and down by factors of 3.

Data Augmentation | Learning Rates Tested
Yes | 0.009, 0.003, 0.001, 0.00033, 0.00011
No | 0.009, 0.003, 0.001, 0.00033, 0.00011
Table 2. Combinations of hyperparameters tested for the OD module. The network was trained for each learning rate with and without data augmentation. The learning rate values were selected by scaling the default YOLOv5 learning rate (0.01) up and down by factors of 3.

Data Augmentation | Learning Rates Tested
Yes | 0.03, 0.01, 0.0033, 0.0011, 0.00037
No | 0.03, 0.01, 0.0033, 0.0011, 0.00037
Table 3. $N_{thr}$ values to validate blobs for each AUC range.

AUC (%) | 0–10 | 10–20 | 20–30 | 30–40 | 40–50 | 50–60 | 60–70 | 70–80 | 80–90 | 90–100
$N_{thr}$ (n) | 80 | 15 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Table 4. Results of the hyperparameter study for the SS module. The table shows, for each combination of hyperparameters, the mean F1-score obtained from the five-fold cross-validation.

Data Augmentation | Learning Rate | F1-Score
No | 0.009 | 73.8%
No | 0.003 | 76.4%
No | 0.001 | 84.7%
No | 0.00033 | 87.8%
No | 0.00011 | 85.2%
Yes | 0.009 | 73.1%
Yes | 0.003 | 73.7%
Yes | 0.001 | 81.8%
Yes | 0.00033 | 85.9%
Yes | 0.00011 | 84.5%
Table 5. Results of the hyperparameter study for the OD module. The table shows, for each combination of hyperparameters, the mean F1-score obtained from the five-fold cross-validation.

Data Augmentation | Learning Rate | F1-Score
No | 0.03 | 69.3%
No | 0.01 | 72.9%
No | 0.0033 | 68.2%
No | 0.0011 | 67.8%
No | 0.00037 | 66.8%
Yes | 0.03 | 76.7%
Yes | 0.01 | 76.7%
Yes | 0.0033 | 77.4%
Yes | 0.0011 | 77.4%
Yes | 0.00037 | 76.7%
Table 6. Comparison results of SS and OD modules and merging methods over the complete dataset test set.

Approach | F1-Score | Coverage Error
SS Module | 78.8% | 10.0%
OD Module | 35.0% | 19.9%
Weighted Merging | 79.7% | 8.8%
AUC Merging | 84.2% | 5.9%
Table 7. YOLOv5 model size inference times. The table shows the inference time on the AUV's onboard computer for each YOLOv5 size. The total time is the sum of the OD module inference time and the SS module inference time.

Module | Inference Time (s)
SS | 1.13
OD (XL / large / medium / small / nano) | 1.63 / 0.93 / 0.45 / 0.20 / 0.09
Total (XL / large / medium / small / nano) | 2.75 / 2.05 / 1.60 / 1.34 / 1.23
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
