1. Introduction
The Food and Agriculture Organization of the United Nations lists bananas as the fourth-largest food crop globally, ranking behind rice, wheat, and maize [1]. Bananas are grown in more than 130 countries, and China is the third-largest producer, with a growing area of 350,000 to 400,000 hectares; in 2022, China produced 1.235 million tons of bananas [2]. While various aspects of banana production, such as harvesting, cleaning, packaging, and transportation, have been mechanized, banana crown cutting still relies heavily on manual labor [3]. Manual crown cutting is labor-intensive, inefficient, and costly, so as China's banana industry continues to expand, it is crucial to explore intelligent automatic crown-cutting technology to overcome these limitations. Introducing advanced automatic crown-cutting technology for bananas can enhance post-harvest handling, reduce labor expenses, and increase income for fruit growers.
In recent years, numerous researchers have studied the identification and segmentation of fruits and vegetables for picking robots, with the primary objectives of improving the effectiveness of fruit and vegetable production and achieving greater intelligence and automation in the production process [4,5]. In these studies, the robot's ability to perform the required tasks depends heavily on precise segmentation of fruit and vegetable targets. When bananas are processed after picking, preserving the integrity of the banana crown is crucial. Because both the banana fingers and the banana crown are green, accurate segmentation is particularly challenging. Precise segmentation of the banana crown is therefore essential for automatic crown cutting.
There are currently few studies on banana crown segmentation, but the problem can be approached by examining methods for segmenting green fruits and vegetables. The two main techniques for segmenting fruits and vegetables today are deep learning methods and conventional image segmentation methods. Among the conventional methods, the most widely used are the Otsu algorithm, the K-means clustering algorithm, and the fuzzy C-means (FCM) algorithm. Cui et al. [6] compared multiple color spaces and chose the R-G color component to segment kiwifruits with the Otsu algorithm, successfully separating the fruit from the background. Wuzor et al. [7] used K-means clustering to separate the guava region from the background, followed by watershed segmentation and morphological operations to achieve single-guava segmentation. Marlinda et al. [8] used the FCM algorithm to separate mangoes from the background and assess their maturity.
Traditional image segmentation techniques are easily affected by environmental factors in real-world application scenarios. Deep learning techniques, with their high accuracy and robustness, are therefore preferred to ensure stable picking-robot operation. Deep learning methods are trained on large numbers of samples to extract deeper features, making them suitable for scenarios in which both the target and the background are green. Consequently, there has been considerable research on green fruit and vegetable segmentation using deep learning. For example, Li et al. [9] combined the edge features and high-level features of UNet with the Atrous Spatial Pyramid Pooling (ASPP) structure to segment green apples in complex orchard scenes. Hussain et al. [10] applied transfer learning to Mask R-CNN to segment green fruits and stems. Wang et al. [11] proposed a deep learning-based fruit segmentation method, SE-COTR, which achieved accurate real-time segmentation of green apples with an average segmentation accuracy of 61.6%. Liu et al. [12] proposed a DLNet model with an average accuracy of 80.9% for accurately segmenting green fruits in blurred environments. Ma et al. [13] used a deep convolutional neural network to detect cucumber disease symptoms and separate them from leaves with an accuracy of 93.4%.
DeepLabv3+ in particular has been used to segment green targets in several studies. Yan et al. [14] improved DeepLabv3+ and proposed a method for tea segmentation and picking-point localization based on lightweight convolutional neural networks to address tea bud picking in real environments, achieving a Mean Intersection over Union (MIoU) of 91.85%. Zhang et al. [15] enhanced DeepLabv3+ to perform high-precision, rapid lettuce segmentation under complex background and lighting conditions. Yu et al. [16] used the Swin Transformer as the feature extraction network and incorporated a convolutional block attention module into DeepLabv3+ to obtain the Swin-DeepLabv3+ model for weed segmentation in soybean fields, achieving an MIoU of 91.53%. Deng et al. [17] employed DeepLabv3+ to semantically segment seedlings and weeds to obtain weed location information, with the model achieving a pixel accuracy of up to 92.2%. Li et al. [18] applied a mixed attention mechanism in DeepLabv3+ to segment cucumber leaves and lesions, achieving an MIoU of 81.23%.
Significant progress has thus been made in segmenting green targets even when both the target and the background are green. The DeepLabv3+ semantic segmentation algorithm has been widely applied and shown to deliver high-precision, fast segmentation of green targets in complex backgrounds and challenging lighting conditions. DeepLabv3+ was therefore selected as the basis for improvement to achieve accurate segmentation of banana crowns.
To enhance the efficiency of banana crown cutting and enable intelligent cutting, a lightweight semantic segmentation model capable of accurately and rapidly segmenting banana crowns is necessary. Consequently, an improved DeepLabv3+ model is proposed in this study, incorporating the following enhancements:
- (1) Substituting the backbone network of the traditional DeepLabv3+ model with MobileNetV2, reducing computational requirements and training time.
- (2) Adding the Shuffle Attention mechanism to the Atrous Spatial Pyramid Pooling (ASPP) module and replacing its activation function with Meta-ACONC, yielding Banana-ASPP, a novel feature extraction module that improves the processing of high-level features.
- (3) Introducing the Multi-scale Channel Attention Module (MS-CAM) into the decoder to improve the fusion of features across different semantics and scales.
As a result, a highly accurate and robust banana crown segmentation model is generated, poised to improve the efficiency and intelligence of banana crown cutting.
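To give a concrete sense of enhancement (2), the Shuffle Attention mechanism relies on a channel-shuffle step that mixes information across channel groups after group-wise attention is applied. The following is a minimal NumPy sketch of that shuffle step only; the attention branches are omitted, and the array shapes and group count are illustrative assumptions rather than values from this study:

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Shuffle the channels of an (N, C, H, W) feature map across groups.

    This is the channel-mixing step used inside Shuffle Attention; the
    per-group spatial/channel attention branches are not shown here.
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # (N, C, H, W) -> (N, groups, C // groups, H, W)
    x = x.reshape(n, groups, c // groups, h, w)
    # swap the group axis and the per-group channel axis, then flatten back
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# Example: 8 channels in 2 groups interleave as 0, 4, 1, 5, 2, 6, 3, 7
feat = np.arange(8).reshape(1, 8, 1, 1).astype(np.float32)
shuffled = channel_shuffle(feat, groups=2)
print(shuffled.reshape(-1))
```

Interleaving the groups in this way lets subsequent layers see features from every attention group, at negligible computational cost.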
4. Conclusions
This paper proposes a banana crown segmentation method based on an improved DeepLabv3+ model, aiming to achieve accurate and rapid segmentation while enabling deployment on mobile devices. First, the traditional backbone network of DeepLabv3+ is replaced with MobileNetV2, reducing the model's weight, training time, and parameter count while improving inference speed. Then, the Atrous Spatial Pyramid Pooling (ASPP) module is enhanced by adding the Shuffle Attention mechanism and replacing the activation function with Meta-ACONC, creating a new feature extraction module, Banana-ASPP, that excels at extracting high-level features. Furthermore, the Multi-scale Channel Attention Module (MS-CAM) is incorporated into the decoder to effectively fuse features of different semantics and scales, yielding more comprehensive information on the banana crown.
According to the experimental results, the proposed banana crown segmentation method based on the improved DeepLabv3+ model achieves a Mean Intersection over Union (MIoU) of 85.75% and a Mean Pixel Accuracy (MPA) of 91.41%, with 5.881 M parameters and a processing speed of 61.05 f/s. The experiments demonstrate that the method can effectively segment the banana crown, providing substantial technological support for adaptive diameter adjustment of banana crown cutting devices.
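For reference, the MIoU and MPA figures reported above are standard per-class averages computed from a segmentation confusion matrix. The sketch below shows how these two metrics are typically derived; the tiny 2-class confusion matrix is purely illustrative and is not data from this study:

```python
import numpy as np

def miou_and_mpa(conf: np.ndarray):
    """Compute Mean IoU and Mean Pixel Accuracy from a confusion matrix.

    conf[i, j] counts pixels whose true class is i and predicted class is j.
    """
    tp = np.diag(conf).astype(np.float64)
    # IoU per class: TP / (TP + FP + FN)
    per_class_iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)
    # Pixel accuracy per class: TP / (pixels of that true class)
    per_class_acc = tp / conf.sum(axis=1)
    return per_class_iou.mean(), per_class_acc.mean()

# Illustrative 2-class case (e.g. background vs. crown)
conf = np.array([[90, 10],
                 [5, 95]])
miou, mpa = miou_and_mpa(conf)
print(f"MIoU = {miou:.4f}, MPA = {mpa:.4f}")
```

Averaging over classes rather than pixels prevents the large background class from dominating the score, which is why MIoU is the headline metric for segmentation models such as the one proposed here.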
In future studies, we plan to collect banana crown images from various locations and cultivars to create more diverse datasets, enabling the model to learn generic feature representations for banana crowns and enhancing its applicability. Additionally, we aim to explore using this model with an RGB-D camera to determine the cutting radius of the banana crown. Further modifications are needed to increase the model's speed and improve its compatibility with automated crown-cutting devices, including reducing the number of parameters and enhancing real-time segmentation performance.