Article

ML-Net: A Multi-Local Perception Network for Healthy and Bleached Coral Image Classification

Sai Wang, Nan-Lin Chen, Yong-Duo Song, Tuan-Tuan Wang, Jing Wen, Tuan-Qi Guo, Hong-Jin Zhang, Ling Mo, Hao-Ran Ma and Lei Xiang
1 State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, Haikou 570228, China
2 School of Ecology and Environment, Hainan University, Haikou 570228, China
3 Hainan Qingxiao Environmental Testing Co., Ltd., Sanya 572024, China
4 Hainan Qianchao Ecological Technology Co., Ltd., Sanya 572024, China
5 Hainan Research Academy of Environmental Sciences, Haikou 571126, China
6 Department of Ecology, Jinan University, Guangzhou 510632, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(8), 1266; https://doi.org/10.3390/jmse12081266
Submission received: 21 June 2024 / Revised: 22 July 2024 / Accepted: 23 July 2024 / Published: 28 July 2024
(This article belongs to the Section Marine Environmental Science)

Abstract:
Healthy coral reefs provide diverse habitats for marine life, playing a crucial role in marine ecosystems. Coral health is under threat due to global climate change, ocean pollution, and other environmental stressors, leading to coral bleaching. Coral bleaching disrupts the symbiotic relationship between corals and algae, ultimately impacting the entire marine ecosystem. Processing complex underwater images manually is time-consuming and burdensome for marine experts. To rapidly locate and monitor coral health, deep neural networks are employed for identifying coral categories, which can facilitate the automated processing of extensive underwater imaging data. However, these classification networks may overlook critical classification criteria like color and texture. This paper proposes a multi-local perception network (ML-Net) for image classification of healthy and bleached corals. ML-Net focuses on local features of coral targets, leveraging valuable information for image classification. Specifically, the proposed multi-branch local adaptive block extracts image details through parallel convolution kernels. Then, the proposed multi-scale local fusion block integrates features of different scales vertically, enhancing the detailed information within the deep network. Residual structures in the shallow network transmit local information with more texture and color to the deep network. Both horizontal and vertical multi-scale fusion blocks in deep networks are used to capture and retain local details. We evaluated ML-Net using six evaluation metrics on the Bleached and Unbleached Corals Classification dataset. In particular, ML-Net achieves an ACC result of 86.35, which is 4.36 higher than ResNet and 8.5 higher than ConvNext. Experimental results demonstrate the effectiveness of the proposed modules for coral classification in underwater environments.

1. Introduction

The importance of healthy corals to marine ecosystems cannot be overstated, as they provide habitat and food for at least 25% of known marine species [1]. Corals are formed through mutualistic partnerships between cnidarian hosts, their photosynthetic algal endosymbionts, and a complex community of microbes. They are primarily known for forming the calcium carbonate skeletons that make up coral reefs. Hard corals, specifically those in the order Scleractinia, are characterized by their stony calcareous skeletons. Zooxanthellae produce energy through photosynthesis and supply most of the organic matter that corals need; in return, corals provide the symbiotic algae with shelter and with nutrients from their metabolic waste. This symbiotic relationship allows corals to obtain the energy they need and also gives them their vibrant colors [2]. However, factors such as global climate change, overfishing, and marine pollution threaten coral reefs. When sea temperature rises abnormally, corals change pigmentation or lose their symbionts, a phenomenon called coral bleaching [3]. In addition to ocean warming, other environmental pressures such as increased atmospheric carbon dioxide concentrations and solar radiation exacerbate bleaching events. Bleaching severely reduces photosynthesis by the algae, which in turn deprives corals of their daily energy supply. If corals do not recover from bleaching in time, they die [4]. Therefore, protecting and maintaining healthy corals is essential for preserving marine biodiversity and ecological balance.
Healthy corals generally appear brown or green due to the photosynthetic pigments of their algal symbionts. Additionally, corals may contain fluorescent proteins, which contribute to their vibrant hues. As the symbiotic algae are gradually expelled, the bright white calcium carbonate exoskeleton of the coral is exposed. Bleached corals tend to be white or fluorescent, with a shriveled appearance; they differ in color and appearance from healthy corals, as shown in Figure 1. Corals can recover from the symbiotic disruption caused by bleaching, provided the abnormal state does not last too long [5]. Coral bleaching is therefore a serious problem that requires early identification and action by experts to protect and restore coral health [6]. The wide coverage of corals and the complex, changing marine environment make it challenging to identify and monitor healthy corals. Researchers can obtain a relatively accurate picture of coral health through field surveys, but analyzing the raw images to extract useful information is labor-intensive and requires experts to process each image manually [7]. The efficiency of this approach in identifying bleached coral often depends on the expertise of the researcher.
In recent years, researchers have designed automated image analysis tools to aid in monitoring coral condition. Image classification algorithms focus on finding differences between categories based on image features such as color, shape, and texture. Researchers design algorithms to extract information about these differences or set parameters that favor certain features, and the classifier then outputs results based on them. To improve classification accuracy, researchers must design and combine different feature extraction methods. However, factors such as ocean depth and water turbidity degrade the quality of hand-crafted features, making algorithm design more challenging. Environmental factors such as coral cover, underwater conditions, and latitude also affect the accuracy of coral classification. Distinguishing healthy from bleached coral efficiently and accurately is therefore of great significance for marine environmental protection.
Deep neural networks (DNNs) provide technical support for automatically processing image features in complex environments. DNNs can be trained to analyze complex and diverse coral image features, greatly reducing the cost of monitoring coral status. Existing methods are often transferred directly from DNN models that have been successful in computer vision; there is no dedicated network for classifying coral health from images. Existing studies of coral classification tend to rely on local features such as color and texture [8]. As network depth increases, existing image classification models tend to lose local information when extracting abstract features through down-sampling operations. Two research questions therefore motivate this paper: (1) How should a dedicated DNN for coral image health classification be designed? (2) Given that local features are especially important in this task, how can the local feature extraction capability of such a dedicated network be enhanced?
To answer these questions, we propose ML-Net, shown in Figure 2. ML-Net operates in three steps. First, shallow features of the input coral image are extracted by the residual feature extraction block. Second, the learned shallow features are fed into the multi-branch local adaptive block, whose parallel feature extractors enhance the processing of detail information and thereby improve the perception of local features in the coral image. Third, the features learned at different levels are fed into the multi-scale local fusion block; by combining complementary information from different scales within the same local receptive field, the local features that ML-Net would otherwise lose are further reinforced. Extensive experiments on the Bleached and Unbleached Corals Classification dataset demonstrate the superior performance of ML-Net.
We propose a coral image health classification network, ML-Net, which improves the perception of local features in coral images from multiple angles. Our main contributions can be summarized as follows:
  • We propose the multi-branch local adaptive block, which can use the multi-branch structure to improve the perception of local features of coral images in the horizontal direction.
  • We propose the multi-scale local fusion block, which can use the multi-scale features of coral images in the vertical direction to supplement the expression of local features.

2. Related Work

2.1. Manual Features for Coral Classification

Manual features used for coral classification have focused on color and texture descriptions of coral images. Because corals are highly diverse in shape, shape characterization has not been the preferred angle for coral classification. Researchers have instead relied on combinations of color and texture features to identify coral types. Beijbom et al. [9] encoded coral image information through multi-scale feature extraction and color extension to construct an accurate, large-scale coral data labeling baseline. They found that coral objects lack a clear sense of shape and introduced color information to help compensate for the color attenuation of underwater organisms. Shakoor et al. [10] proposed a new Local Binary Pattern (LBP) that extracts features from non-uniform clusters and combines effective texture features with mapping methods to identify coral types more efficiently. The method extends the non-uniform pattern to improve classification accuracy and can be applied to all types of textures in coral images.

2.2. Deep Neural Networks

Deep neural networks represent images abstractly through successive layers of neurons. Their strength is the ability to automatically learn texture details such as lines or edges and to incorporate these features into overall object recognition in subsequent layers. This integration gives the model robustness in recognizing the color distribution and color changes of an object. To enable deeper network structures, ResNet [11], proposed by He et al., optimized information transmission in deep neural networks through skip connections and modular design. A skip connection bypasses one or more layers, connecting the input of a layer to the output of a later layer and forming a shortcut path for gradients to flow. By adding input and output features together, skip connections effectively avoid vanishing gradients while preserving details. However, ResNet does not further optimize the retained local information. To account for the varied distribution of image information, InceptionNet [12], proposed by Ioffe et al., used convolution kernels of different sizes to extract features under different receptive fields and then combined them into multi-scale features. Inception modules feed features into multiple branches, increasing the number of times features are processed and widening the network; InceptionNet thereby reduced computational complexity while enhancing feature representation, but it only considers vertical multi-scale features. EfficientNet [13], proposed by Tan et al., uses compound scaling to maintain a balance among the dimensions of the network, adjusting depth, width, and resolution simultaneously to optimize overall performance. However, this global optimization strategy may not fully account for the special needs of local feature extraction. The depthwise separable convolutions used in EfficientNet, while reducing parameters and FLOPs, tend not to take full advantage of modern accelerators such as GPUs, so training can be slower than expected. ViT [14], proposed by Dosovitskiy et al., splits and flattens the image into a sequence of tokens and uses self-attention to capture the relationship between each token and the others. ViT builds long-range feature dependencies through a series of spatial transformations, giving it advantages in modeling global context and strong generalization ability; it excels at mining global image information but handles local features poorly. ConvNext [15], proposed by Liu et al., is a convolutional architecture modernized with design choices borrowed from Transformers: convolutional layers embed the image into a high-dimensional feature space, and stacked ConvNext blocks, patterned after Transformer encoder blocks, encode features in that space. This design captures long-range dependencies between features and enhances the model's ability to understand complex image content.
ConvNext can capture multi-scale information in images and expresses object shapes and textures well, making it suitable for image classification. However, although ConvNext gradually expands its receptive field by stacking blocks, its initial local receptive field may be smaller than that of traditional convolutional neural networks such as ResNet, which means that in its early stages ConvNext may not adequately capture very fine local features in the image.

2.3. CNN for Coral Classification

Coral images exhibit large diversity in color, shape, and structure, and convolutional neural networks (CNNs) have the potential to handle these complex differences. Lumini et al. [16] integrated multiple CNN models to process complex and variable datasets, improving classification accuracy at the cost of increased memory complexity; the ensemble selected fine-tuned variants of different CNN models to arrive at the best fused classifier. Reshma et al. [17] classified corals at three taxonomic levels, at different scales and from different perspectives, to demonstrate the classification ability of a pre-trained ResNet. Borbon et al. [18] used a CNN to identify healthy, dead, and bleached corals and concluded from experiments that CNNs make it easier to detect coral health status. When processing coral images, a CNN can not only automatically capture fine features but also effectively integrate texture and color features, improving the understanding and analysis of image content.

3. Proposed Method

3.1. Residual Feature Extraction Block

Convolutional neural networks build multi-level deep networks by stacking convolution operations. As the network deepens, the features extracted in deep and shallow layers differ greatly: shallow layers usually extract more general features, while deep layers extract more abstract ones. Feature representations extracted early in the network can be reused by subsequent operations. ML-Net learns the general features of coral images with two residual feature extraction blocks, as shown in Figure 2a. In the residual feature extraction block, the input features are passed directly to the deeper features through skip connections, preserving the original features and making it easier for the classification model to learn identity mappings.
The block consists of two branches: the first comprises two two-dimensional convolution layers with kernel size 3, and the second comprises one two-dimensional convolution layer with kernel size 1. The outputs of the two branches are fused by element-wise addition. The process is as follows:
O_1 = \mathrm{BN}(W_2(\sigma(\mathrm{BN}(W_1(I)))))
O_2 = \mathrm{BN}(W_3(I))
O = O_1 + O_2
where W_1 and W_2 denote the first and second two-dimensional convolution layers in the first branch, W_3 denotes the two-dimensional convolution layer in the second branch, BN denotes the BatchNorm2d layer, σ denotes the GELU activation function, I is the input feature of the residual feature extraction block, O_1 and O_2 are the outputs of the two branches, and O is the output feature of the residual feature extraction block.
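For concreteness, the block can be sketched in PyTorch as follows. This is a minimal sketch following the equations above; the channel counts in_ch and out_ch are illustrative assumptions, since they are not listed here.

import torch
import torch.nn as nn

class ResidualFeatureExtraction(nn.Module):
    """Sketch of the residual feature extraction block: a two-conv 3x3 branch
    (W1, W2) plus a 1x1 shortcut branch (W3), fused by element-wise addition.
    Channel counts are illustrative assumptions."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # First branch: O1 = BN(W2(GELU(BN(W1(I)))))
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.GELU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # Second branch: O2 = BN(W3(I)), a 1x1 projection shortcut
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.branch1(x) + self.branch2(x)  # O = O1 + O2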

3.2. Multi-Branch Local Adaptive Block

The multi-branch local adaptive block (MBLA), shown in Figure 3, adopts a multi-branch structure to learn and capture local features in coral images more efficiently. The first branch contains a two-dimensional convolution layer with kernel size 1, designed to ensure the overall performance of the block in the coral image classification task. The second branch introduces a convolution layer with kernel size 3, which expands the receptive field and improves ML-Net's ability to perceive more detailed and complex local features in coral images. The third branch consists of a convolution layer with kernel size 5; by further expanding the receptive field, ML-Net can learn local features of coral images over a larger region. This multi-branch design helps the network extract local coral features under different receptive fields, improving ML-Net's performance on the coral image dataset. The process is as follows:
O_3 = \sigma(\mathrm{BN}(W_4(I)))
O_4 = \sigma(\mathrm{BN}(W_5(I)))
O_5 = \sigma(\mathrm{BN}(W_6(I)))
O = C(O_3, O_4, O_5)
where W_4, W_5, and W_6 denote the two-dimensional convolution layers in the first, second, and third branches, BN denotes the BatchNorm2d layer, σ denotes the GELU activation function, C denotes channel-wise concatenation, I is the input feature of the multi-branch local adaptive block, O_3, O_4, and O_5 are the branch outputs, and O is the output feature of the multi-branch local adaptive block.
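A minimal PyTorch sketch of MBLA under these equations is given below. The per-branch channel width branch_ch and the assumption that all branches keep the input resolution are ours, not stated in the text.

import torch
import torch.nn as nn

class MultiBranchLocalAdaptive(nn.Module):
    """Sketch of MBLA: parallel 1x1, 3x3, and 5x5 Conv->BN->GELU branches
    whose outputs are concatenated along the channel axis."""
    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        def branch(k: int) -> nn.Sequential:
            # padding = k // 2 keeps the spatial size identical across branches
            return nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, kernel_size=k, padding=k // 2),
                nn.BatchNorm2d(branch_ch),
                nn.GELU(),
            )
        self.b1, self.b3, self.b5 = branch(1), branch(3), branch(5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # O = C(O3, O4, O5): channel-wise concatenation of the three branches
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)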

3.3. Multi-Scale Local Fusion Block

The multi-scale local fusion block (MSLF), shown in Figure 4, consists of three branches, each containing a two-dimensional convolution layer with kernel size 1. Notably, the input features of the three branches have different scales. This design further improves ML-Net's local feature perception by integrating information extracted at different scales: introducing inputs at different scales lets the network adapt to the complex structures and textures of coral objects across scales and understand local features in coral images more comprehensively. By enhancing local features at different scales, ML-Net captures subtle changes and structure in coral images more effectively, improving classification accuracy and robustness. The process is as follows:
O_6 = \sigma(\mathrm{BN}(W_7(I_A)))
O_7 = \sigma(\mathrm{BN}(W_8(I_B)))
O_8 = \sigma(\mathrm{BN}(W_9(I_C)))
O = C(O_6, O_7, O_8)
where W_7, W_8, and W_9 denote the two-dimensional convolution layers in the first, second, and third branches, BN denotes the BatchNorm2d layer, σ denotes the GELU activation function, C denotes channel-wise concatenation, I_A, I_B, and I_C are the input features at the three scales (A, B, and C in Figure 4), O_6, O_7, and O_8 are the branch outputs, and O is the output feature of the multi-scale local fusion block.
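The following PyTorch sketch illustrates MSLF under the equations above. How the three scales are spatially aligned before concatenation is not specified in the text (Figure 4 defines the exact routing), so the adaptive pooling used here is an assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleLocalFusion(nn.Module):
    """Sketch of MSLF: each scale (A, B, C) gets its own 1x1 Conv->BN->GELU,
    and the three results are concatenated channel-wise."""
    def __init__(self, ch_a: int, ch_b: int, ch_c: int, out_ch: int):
        super().__init__()
        def proj(c: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(c, out_ch, kernel_size=1),
                nn.BatchNorm2d(out_ch),
                nn.GELU(),
            )
        self.pa, self.pb, self.pc = proj(ch_a), proj(ch_b), proj(ch_c)

    def forward(self, a, b, c):
        # Assumption: the larger A- and B-scale maps are pooled down to the
        # C-scale resolution so that channel-wise concatenation is well-defined.
        target = c.shape[-2:]
        oa = F.adaptive_avg_pool2d(self.pa(a), target)
        ob = F.adaptive_avg_pool2d(self.pb(b), target)
        return torch.cat([oa, ob, self.pc(c)], dim=1)  # O = C(O6, O7, O8)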
The classification layer consists of two fully connected layers with a dropout layer (drop rate 0.5) between them. The output dimension of the first layer is 256; the output dimension of the second layer is the number of categories, i.e., two. Pseudo code for the ML-Net framework is given in Table 1.
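This classification layer translates directly into PyTorch; only in_features, the flattened feature dimension, is an assumption.

import torch.nn as nn

# The classification layer described above: FC(256) -> Dropout(0.5) -> FC(2).
# `in_features` depends on the preceding feature maps and is assumed here.
def make_classifier(in_features: int, num_classes: int = 2) -> nn.Sequential:
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(in_features, 256),
        nn.Dropout(p=0.5),
        nn.Linear(256, num_classes),
    )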

4. Experiments

4.1. Datasets and Evaluation Metrics

The unbleached coral data used in this study come from "StructureRSMAS" [19], one of the most recent open-source RGB datasets, which can be downloaded from https://sci2s.ugr.es/CNN-coral-image-classification (accessed on 22 July 2024); experiments used its images of healthy corals. The open-access bleached coral classification dataset is available at https://www.kaggle.com/sonainjamil/bleached-corals-detection (accessed on 22 July 2024); only the bleached images were selected, and this set was expanded by rotation to form the bleached coral category. Images are resized to 128 × 128 before being input to the network. To better adapt to the underwater environment, raw images are fed directly into the model.
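A minimal torchvision preprocessing pipeline consistent with this setup might look as follows; the rotation augmentation used to expand the bleached class is assumed to be applied offline when building the dataset.

from torchvision import transforms

# Resize to 128 x 128 and convert to a tensor, with no further enhancement
# (raw images are fed directly to the model).
preprocess = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])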
As this is a classification problem, this paper uses six evaluation metrics to assess the performance of ML-Net: Accuracy (ACC), Area Under Curve (AUC), Precision (Prec), Recall, F1 score (F1), and Average Precision (AP). The formulas are
\mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN}
\mathrm{Prec} = \frac{TP}{TP + FP}
\mathrm{Recall} = \frac{TP}{TP + FN}
F1 = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Recall}}{\mathrm{Prec} + \mathrm{Recall}}
where TP is the number of samples that ML-Net correctly predicts as the positive class, FP is the number of samples that ML-Net incorrectly predicts as the positive class, TN is the number of samples that ML-Net correctly predicts as the negative class, and FN is the number of samples that ML-Net incorrectly predicts as the negative class.
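These four formulas translate directly into code, as in the sketch below; AUC and AP are computed from predicted scores rather than confusion counts (e.g., with sklearn.metrics.roc_auc_score and average_precision_score).

def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Compute ACC, Prec, Recall, and F1 from confusion counts, following
    the formulas above."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    prec = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * prec * recall / (prec + recall)
    return acc, prec, recall, f1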

4.2. Implementation Details

ML-Net is implemented in PyTorch 1.8.0 [20] and trained on an NVIDIA RTX 3060. PyTorch is an open-source deep learning platform offering flexibility, efficiency, and ease of use in both research and industry; its dynamic computational graph and extensive library of tools and modules make building and training neural networks straightforward compared with other frameworks. The AdamW optimizer [21] is used with weight_decay set to 0.01; training runs for 60 epochs with a batch size of 32. The loss function is cross-entropy loss. The initial learning rate is 1 × 10−4 and is subsequently decayed exponentially, with the gamma of ExponentialLR set to 0.985. Because the dataset is small, all experiments in this paper use five-fold cross-validation.
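The reported configuration can be reproduced with the following sketch; MLNet and train_loader are placeholders for the model class and data pipeline, which are assumptions here.

import torch
import torch.nn as nn

# Sketch of the reported training configuration.
model = MLNet().cuda()                       # placeholder model class
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.985)

for epoch in range(60):                      # Epoch = 60
    for images, labels in train_loader:      # Batchsize = 32 (placeholder loader)
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                         # exponential learning-rate decay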

4.3. Experimental Results

Since there are no dedicated networks to compare against for the coral classification task, we used typical image classification networks for comparison: EfficientNet [13], MobileNet [22], RegNet [23], ResNet [11], ShuffleNet [24], ViT [14], and ConvNext [15].
In the experimental results of Table 2, EfficientNet achieves an ACC of 54.25 and an AUC of 49.91, the worst classification performance; the model does not consider the local information of coral objects when scaling. MobileNet achieves an ACC of 78.64 and an AUC of 82.21. It reduces the parameter count through depthwise separable convolution, but depthwise convolution operates independently on each channel without cross-channel interaction, which may limit the extraction of complex local features. ResNet achieves an ACC of 81.99 and an AUC of 86.59, an improvement over the previous models, indicating that cross-connecting features at different levels is meaningful for the coral classification task. ShuffleNet achieves an ACC of 84.79 and a Recall of 85.03; the marked improvement in Recall indicates a strong ability to retrieve positive examples. ShuffleNet uses channel shuffling to rearrange the channels of the feature map so that different groups of channels can exchange information, and it improves the traditional residual unit by introducing pointwise group convolution and channel shuffling, raising both efficiency and performance. However, in pursuing computational efficiency, ShuffleNet may trade off feature extraction power, in some cases sacrificing local feature extraction to reduce computation. ViT achieves an ACC of 72.60 and an AUC of 72.86. The model captures global information efficiently without relying on convolution but performs poorly on the bleached coral classification task: when ViT divides the image into patches, an improperly chosen patch size or segmentation scheme can lose or damage part of the image information, impairing the model's ability to extract and recognize local features. ConvNext achieves an ACC of 77.85 and an AUC of 62.16; the large drop in AUC indicates worse classifier performance from a more comprehensive perspective, as ConvNext ignores fine differences in local features when capturing global context. ML-Net achieves an ACC of 86.35 and an AUC of 90.78 and attains the best values on all evaluation metrics; its ACC is 4.36 higher than ResNet, 1.56 higher than ShuffleNet, and 8.50 higher than ConvNext. ML-Net preserves the local features of coral images through residual structures and uses MBLA and MSLF to transfer local information between different levels of the network.

4.4. Comparison with State-of-the-Art Methods

According to the data in Table 2, ML-Net achieves the best results on all six indicators. By comparing the different classification networks, we draw the following conclusions:
  • ViT introduces patch embedding and multi-head attention when classifying coral images, improving the network's ability to capture global information. However, these mechanisms can be overly sensitive to image noise and redundant information, interfering with the model's classification judgment. ConvNext, which introduced global perception, shows that CNNs with larger convolution kernels form better global awareness, yet they lose this advantage in the coral image classification task;
  • Each classification network can achieve good results, but the lightweight CNNs perform relatively poorly. This may imply that on small-scale coral image datasets, such models are not sufficiently learned and optimized, affecting performance;
  • By comparing the results of ConvNext, ResNet, and ShuffleNet, we conclude that overly large models are susceptible to complex high-level features in the coral classification task, which degrades performance;
  • The results of ML-Net prove that improving the local perception ability of CNNs from multiple angles yields better performance in the coral image classification task.

4.5. Effectiveness of MBLA

To verify the effectiveness of the MBLA module, we conduct the following experiments. In Table 3, MBLA (w/o 1) means that the 1 × 1 convolution is not used in MBLA; MBLA (w/o 3) means that the 3 × 3 convolution is not used; MBLA (w/o 5) means that the 5 × 5 convolution is not used; MBLA (w/o 3,5) means that only the 1 × 1 convolution is used; MBLA (w/o 1,5) means that only the 3 × 3 convolution is used; MBLA (w/o 1,3) means that only the 5 × 5 convolution is used; and ML-Net (w/o MBLA) means that MBLA is not used in ML-Net. A sketch of how such branch-ablated variants can be built is shown below.
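The following hypothetical helper is one way to construct these variants, assuming MBLA branches can simply be omitted; the class name and channel widths are illustrative, not the paper's.

import torch
import torch.nn as nn

class MBLAVariant(nn.Module):
    """Hypothetical helper for the Table 3 ablations: build MBLA with only a
    chosen subset of kernel sizes, e.g. kernels=(3,) for MBLA (w/o 1,5)."""
    def __init__(self, in_ch: int, branch_ch: int, kernels=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, kernel_size=k, padding=k // 2),
                nn.BatchNorm2d(branch_ch),
                nn.GELU(),
            )
            for k in kernels
        )

    def forward(self, x):
        # Concatenate whichever branches remain after the ablation
        return torch.cat([b(x) for b in self.branches], dim=1)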
The results show that each MBLA structure performs well in ML-Net. Notably, MBLA achieves the best performance when all three branches are used simultaneously, especially compared with the variant using only the 5 × 5 convolution, whose six metrics are only 77.18, 81.77, 76.61, 78.37, 77.18, and 69.40, respectively. This shows that MBLA effectively improves ML-Net's local sensing ability through the multi-branch structure. The results also suggest that, compared with the 3 × 3 and 5 × 5 convolutions, the 1 × 1 convolution is better suited to extracting features such as texture in coral images; the 5 × 5 convolution is better at extracting features such as shape, while the 3 × 3 convolution balances both. This demonstrates that MBLA's multi-branch structure improves model performance by considering different convolution kernel sizes.
From the confusion matrix in Figure 5, we can clearly observe that each MBLA structure shows stronger attention to a certain class of images. Taking the variant using only the 5 × 5 convolution as an example, the confusion matrix shows that ML-Net achieves an accuracy of only 0.6210 on unbleached images. This indicates that, when processing unbleached images, larger convolution kernels actually yield less satisfactory results, because a larger kernel may introduce too much noise and unwanted detail when capturing local features of the image.
Among the other structures, ML-Net was found to perform poorly in recognizing bleached coral images when MBLA used only the 1 × 1 convolution kernel. Bleached coral images show a high degree of color and texture consistency across most regions, and this consistency makes a small kernel ineffective at learning and extracting image features: its small receptive field struggles to capture the broader, more complex spatial structure in the image, limiting its effectiveness on images with relatively homogeneous, widely distributed features. Such uneven focus on different categories may hinder the overall accuracy of ML-Net, leading to a decrease in the overall performance of the model.

4.6. Effectiveness of MSLF

To verify the effectiveness of the MSLF module, we conduct the following experiments. In Table 4, MSLF (w/o A) means that A-scale features are not used in MSLF; MSLF (w/o B) means that B-scale features are not used; MSLF (w/o C) means that C-scale features are not used; MSLF (w/o B,C) means that only A-scale features are used; MSLF (w/o A,C) means that only B-scale features are used; MSLF (w/o A,B) means that only C-scale features are used; and ML-Net (w/o MSLF) means that MSLF is not used in ML-Net. As indicated in Figure 4, A-scale, B-scale, and C-scale denote the large, medium, and small feature branches in MSLF, respectively.
Experiments show that each MSLF structure performs well in ML-Net. Notably, MSLF achieves the best performance when all three scales are used simultaneously, especially compared with the variant using only C-scale features, whose six metrics are 75.40, 84.56, 83.03, 83.20, 83.11, and 75.68, respectively. This shows that MSLF successfully improves ML-Net's local sensing capability through the multi-scale strategy.
The results also support the following conclusions. Combining large-scale features with the 1 × 1 convolution better learns features such as the texture of coral images, while combining smaller-scale features with the 1 × 1 convolution better learns features such as shape. This emphasizes that MSLF improves model performance by considering features of different scales together with the convolution kernel, making it better suited to coral image classification.
Based on the confusion matrix in Figure 6, each MSLF structure appears to focus more on a particular class of images. Taking the variant using only C-scale features as an example, ML-Net achieves an accuracy of only 0.7726 on unbleached images, suggesting that the model is challenged when processing unbleached coral images and yields relatively low classification performance for this category. This imbalanced focus on a single category can negatively affect the accuracy of ML-Net as a whole, as the model may suffer performance degradation on other categories.

5. Discussions

The complexity of coral reef ecosystems directly affects the image classification of bleached and healthy coral datasets. Coral reef ecosystems host a variety of coral species that differ in shape, color, texture, and other characteristics. This diversity makes it necessary to cover as many coral species as possible when building datasets to ensure the generalization ability of classification models. Coral health is affected by many environmental factors, such as water temperature, light, salinity, and pollution; these factors may vary significantly across regions and seasons, leading to changes in the appearance of corals.
The single coral dataset used in this study may not fully reflect the diversity and complexity of coral reef ecosystems on a global scale, especially the effects of different geographic locations, seasonal changes, and environmental stresses (e.g., rising temperatures, pollution) on coral health. The generalizability and universality of the results may therefore be limited. In addition, this study mainly explores the classification network itself without complex pre-processing of the datasets; the simplified data augmentation may limit the model's ability to generalize to complex variations, affecting its performance in practical applications.
Future research should collect more diverse and comprehensive coral image datasets covering reef ecosystems in different geographic locations, seasons, and environmental conditions. This will help models learn more complex patterns relating coral health to environmental factors, improving classification accuracy and generalization. Effective noise-handling strategies, such as image denoising algorithms and light correction methods, should also be explored to reduce the impact of factors like suspended ocean particles and variable lighting on image quality and classification performance; noise can additionally be introduced during training to enhance the robustness and anti-interference ability of the model. Finally, more comprehensive image enhancement techniques, such as color enhancement, texture synthesis, and shape transformation, could more realistically simulate the complex variations of coral reef images in the real environment, helping the model learn more about coral appearance changes and improving its classification performance across scenarios.

6. Conclusions

Efficient detection of coral health status provides a basis for monitoring the balance of marine ecosystems. To fully extract local information such as color and texture from coral images, this paper proposes a multi-local perception network (ML-Net) for classifying healthy and bleached corals. The model exploits local features in the shallow network through the residual feature extraction block; MBLA then improves the perception of local features through convolution layers with kernels of different sizes, and MSLF fuses features at different levels to enhance the detail carried by low-level semantic information. ML-Net thus processes image information in both the horizontal and vertical directions. In addition, the network's skip connections allow information from lower layers to be reused, and its fusion operations let different features complement one another. We compared ML-Net with seven networks on the dataset; the tables and confusion matrices verify the advantage of ML-Net in coral image classification.

Author Contributions

S.W. and N.-L.C. conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft. Y.-D.S. and T.-T.W. conceived and designed the experiments, performed the experiments, prepared figures and/or tables, and approved the final draft. H.-J.Z. analyzed the data, authored or reviewed drafts of the article, and approved the final draft. J.W. and T.-Q.G. performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft. L.M. performed the experiments, analyzed the data, prepared figures and/or tables, and approved the final draft. H.-R.M. and L.X. performed the experiments, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by National Key Research & Development Program of China (No. 2022YFD2401301), National Natural Science Foundation of China (No. 42367054), Hainan Provincial Research & Development Program (Nos. ZDYF2022SHFZ034 and ZDYF2022SHFZ032), Natural Science Foundation of Hainan Province (Nos. 421QN195, 421QN196, and 322QN227), Open Project of State Key Laboratory of Marine Resource Utilization in South China Sea (Nos. MRUKF2023005 and MRUKF2023002), Collaborative Innovation Center Project of Hainan University (No. XTCX2022HYC11), and Hainan University Start-up Funding for Scientific Research (Nos. KYQD[ZR]-21015 and KYQD[ZR]-21033).

Data Availability Statement

All data required can be requested by contacting the author.

Conflicts of Interest

Author Tuan-Qi Guo was employed by the company Hainan Qingxiao Environmental Testing Co., Ltd. Authors Hong-Jin Zhang and Hao-Ran Ma were employed by the company Hainan Qianchao Ecological Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Fisher, R.; O’Leary, R.A.; Low-Choy, S.; Mengersen, K.; Knowlton, N.; Brainard, R.E.; Caley, M.J. Species richness on coral reefs and the pursuit of convergent global estimates. Curr. Biol. 2015, 25, 500–505. [Google Scholar] [CrossRef] [PubMed]
  2. Jiang, J.; Wang, A.; Deng, X.; Zhou, W.; Gan, Q.; Lu, Y. How Symbiodiniaceae meets the challenges of life during coral bleaching. Coral Reefs 2021, 40, 1339–1353. [Google Scholar] [CrossRef]
  3. Schoepf, V.; Grottoli, A.G.; Levas, S.J.; Aschaffenburg, M.D.; Baumann, J.H.; Matsui, Y.; Warner, M.E. Annual coral bleaching and the long-term recovery capacity of coral. Proc. R. Soc. B Biol. Sci. 2015, 282, 20151887. [Google Scholar] [CrossRef]
  4. Elma, E.; Gullström, M.; Yahya, S.A.; Jouffray, J.B.; East, H.K.; Nyström, M. Post-bleaching alterations in coral reef communities. Mar. Pollut. Bull. 2023, 186, 114479. [Google Scholar] [CrossRef] [PubMed]
  5. Hoegh-Guldberg, O.; Poloczanska, E.S.; Skirving, W.; Dove, S. Coral reef ecosystems under climate change and ocean acidification. Front. Mar. Sci. 2017, 4, 158. [Google Scholar] [CrossRef]
  6. Urbina-Barreto, I.; Garnier, R.; Elise, S.; Pinel, R.; Dumas, P.; Mahamadaly, V.; Facon, M.; Bureau, S.; Peignon, C.; Quod, J.P.; et al. Which method for which purpose? A comparison of line intercept transect and underwater photogrammetry methods for coral reef surveys. Front. Mar. Sci. 2021, 8, 636902. [Google Scholar] [CrossRef]
  7. Viswambharan, D.; Sreenath, K.R.; Anto, A.; Raju, A.K.; Mohan, S.; Jasmine, S.; Joshi, K.K.; Rohit, P. New distributional record of Acroporids along the eastern Arabian Sea. Reg. Stud. Mar. Sci. 2021, 41, 101550. [Google Scholar] [CrossRef]
  8. Ani Brown Mary, N.; Dharma, D. A novel framework for real-time diseased coral reef image classification. Multimed. Tools Appl. 2019, 78, 11387–11425. [Google Scholar] [CrossRef]
  9. Beijbom, O.; Edmunds, P.J.; Kline, D.I.; Mitchell, B.G.; Kriegman, D. Automated annotation of coral reef survey images. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 1170–1177. [Google Scholar]
  10. Shakoor, M.H.; Boostani, R. A novel advanced local binary pattern for image-based coral reef classification. Multimed. Tools Appl. 2018, 77, 2561–2591. [Google Scholar] [CrossRef]
  11. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  12. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
  13. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  14. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  15. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
  16. Lumini, A.; Nanni, L.; Maguolo, G. Deep learning for plankton and coral classification. Appl. Comput. Inform. 2020, 19, 265–283. [Google Scholar] [CrossRef]
  17. Reshma, B.; Rahul, B.; Sreenath, K.R.; Joshi, K.K.; Grinson, G. Taxonomic resolution of coral image classification with Convolutional Neural Network. Aquat. Ecol. 2023, 57, 845–861. [Google Scholar] [CrossRef]
  18. Borbon, J.; Javier, J.; Llamado, J.; Dadios, E.; Billones, R.K. Coral Health Identification using Image Classification and Convolutional Neural Networks. In Proceedings of the 2021 IEEE 13th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Manila, Philippines, 28–30 November 2021; pp. 1–6. [Google Scholar]
  19. Gómez-Ríos, A.; Tabik, S.; Luengo, J.; Shihavuddin, A.S.M.; Herrera, F. Coral species identification with texture or structure images using a two-level classifier based on Convolutional Neural Networks. Knowl.-Based Syst. 2019, 184, 104891. [Google Scholar] [CrossRef]
  20. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. In Proceedings of the 2019 Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
  21. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  22. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  23. Radosavovic, I.; Kosaraju, R.P.; Girshick, R.; He, K.; Dollár, P. Designing network design spaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10428–10436. [Google Scholar]
  24. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
Figure 1. Corals are shown (a) as healthy corals and (b) as bleached corals.
Figure 2. Overall structure of ML-Net in (b), where (a) indicates the residual feature extraction block.
Figure 3. Multi-branch local adaptive block.
Figure 4. Multi-scale local fusion block, where A, B, and C represent features at three different scales.
Figure 5. Confusion matrix for the ablation experiment of the MBLA module. B represents the bleached coral category and UnB represents the healthy coral category.
Figure 6. Confusion matrix for the ablation experiment of the MSLF module. B represents the bleached coral category and UnB represents the healthy coral category.
Table 1. The pseudo code for the ML-Net framework. σ represents the activation function, Conv a convolution layer, RFE the residual feature extraction block, MBLA the multi-branch local adaptive block, and MSLF the multi-scale local fusion block.
The Pseudo Code for ML-Net Framework
           Function Model(Input):
                x = σ(Conv(Input))            # stem convolution
                x = RFE(x)                    # two residual feature extraction blocks
                x = RFE(x)
                x, p1 = Conv(x)               # p1: A-scale (large) local features
                x, p2 = MBLA(x)               # p2: B-scale (medium) local features
                x, p3 = Conv(x)               # p3: C-scale (small) local features
                x = MBLA(x)
                pp = MSLF(p1, p2, p3)         # fuse the three scales (Section 3.3)
                x = torch.cat((pp, x), dim=1)
                x = FC(x)                     # two-layer classifier with dropout
                return x
           Input_image = preprocess(Image.open(image_path))
           prediction = Model(Input_image)
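A minimal runnable counterpart to this pseudocode is sketched below, assuming a trained model instance and the preprocess pipeline from Section 4.1; image_path is a placeholder.

import torch
from PIL import Image

# Inference wrapper around the pseudocode above. `model` is a trained ML-Net
# instance; `preprocess` resizes to 128 x 128 and converts to a tensor.
model.eval()
with torch.no_grad():
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)  # add batch dimension
    logits = model(img)
    prediction = logits.argmax(dim=1).item()  # predicted class index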
Table 2. Classification results of each model on the dataset. The best results of the evaluation indicators are shown in bold.
Models       | ACC   | AUC   | F1    | Precision | Recall | AP
EfficientNet | 54.25 | 49.91 | 38.16 | 29.43     | 54.25  | 45.75
MobileNet    | 78.64 | 82.21 | 78.50 | 78.71     | 78.64  | 70.01
RegNet       | 79.87 | 83.09 | 79.89 | 79.93     | 79.87  | 70.74
ResNet       | 81.99 | 86.59 | 81.92 | 82.03     | 81.99  | 74.06
ShuffleNet   | 84.79 | 87.50 | 84.81 | 85.03     | 85.03  | 76.51
ViT          | 72.60 | 72.86 | 72.42 | 72.58     | 72.60  | 63.02
ConvNext     | 77.85 | 62.16 | 77.77 | 77.84     | 77.85  | 68.87
ML-Net       | 86.35 | 90.78 | 86.34 | 86.34     | 86.35  | 79.42
Table 3. Ablation results of the MBLA module. The best results of the evaluation indicators are shown in bold.
Variant           | ACC   | AUC   | F1    | Precision | Recall | AP
MBLA (w/o 1)      | 84.12 | 88.23 | 84.10 | 84.10     | 84.12  | 76.41
MBLA (w/o 3)      | 84.12 | 84.12 | 84.01 | 84.32     | 84.12  | 84.12
MBLA (w/o 5)      | 84.12 | 78.42 | 84.08 | 84.12     | 84.12  | 76.68
MBLA (w/o 3,5)    | 83.45 | 84.76 | 83.43 | 83.43     | 83.45  | 75.57
MBLA (w/o 1,5)    | 84.56 | 72.99 | 84.56 | 84.55     | 84.56  | 76.91
MBLA (w/o 1,3)    | 77.18 | 81.77 | 76.61 | 78.37     | 77.18  | 69.40
ML-Net (w/o MBLA) | 82.89 | 79.95 | 82.84 | 82.90     | 82.89  | 75.11
ML-Net            | 86.35 | 90.78 | 86.34 | 86.34     | 86.35  | 79.42
Table 4. Ablation results of the MSLF module. The best results of the evaluation indicators are shown in bold.
Variant           | ACC   | AUC   | F1    | Precision | Recall | AP
MSLF (w/o A)      | 83.11 | 84.90 | 83.11 | 83.12     | 83.11  | 74.88
MSLF (w/o B)      | 83.67 | 84.83 | 83.67 | 83.66     | 83.67  | 75.69
MSLF (w/o C)      | 83.67 | 86.63 | 83.66 | 83.66     | 83.67  | 75.78
MSLF (w/o B,C)    | 83.33 | 79.51 | 83.33 | 83.34     | 83.33  | 75.20
MSLF (w/o A,C)    | 83.22 | 83.16 | 83.18 | 83.22     | 83.22  | 75.49
MSLF (w/o A,B)    | 75.40 | 84.56 | 83.03 | 83.20     | 83.11  | 75.68
ML-Net (w/o MSLF) | 82.77 | 82.77 | 82.67 | 82.93     | 82.93  | 75.40
ML-Net            | 86.35 | 90.78 | 86.34 | 86.34     | 86.35  | 79.42

