#### **1. Introduction**

As an important part of ecological environment protection, wildlife protection is crucial for maintaining the balance and stability of the whole ecosystem [1]. Presently, the most widely used wildlife monitoring methods are based on GPS collar systems [2,3], infrared camera technology [4,5], and remote sensing monitoring technology [6,7]. Although GPS collar systems can obtain wildlife locations with great accuracy and precision, they cannot obtain wildlife images, which poses a problem for researchers. Monitoring images, from which species diversity, population size, and habitat distribution can be accurately estimated, are an important part of wildlife protection and provide a scientific basis for wildlife resource conservation [8]. Infrared camera technology and remote sensing monitoring technology can capture wildlife images, but these methods also have limitations. Infrared camera images need to be saved on secure digital (SD) memory cards, which results in long monitoring delays and high labor costs. Remote sensing systems can capture species diversity over a large geographic area, but cannot monitor wildlife at an individual level. To avoid these problems, the use of wireless multimedia sensor networks (WMSNs) as monitoring carriers has received extensive attention from researchers [9].

Wireless sensor nodes usually rely on batteries for their power supply, and these batteries are difficult to recharge in remote areas. Reducing energy consumption and extending the life cycle of sensor networks has therefore become a hot research topic [10]. The energy consumption of a sensor node is mainly concentrated in the data transmission process, so the progressive transmission of monitoring images in WMSNs can effectively reduce energy consumption and extend the life cycle of nodes. Image extraction, in turn, is the basis of progressive transmission [11,12]. However, current methods cannot extract wildlife monitoring images with complex backgrounds and uneven illumination. Against this background, we propose a high-accuracy extraction method for wildlife monitoring images in WMSNs, based on the combination of color space construction and Hermite transform. First, we reconstruct a color space model and use a novel color enhancement method to extract the color parameters. The color enhancement method uses bilateral filters with different kernels to process the luminance component within the Retinex framework. Then, we construct a filter through Hermite transform to acquire texture information from the wildlife monitoring images. Finally, according to the characteristics of the pixels, an adaptive mean-shift algorithm is used as the clustering model to extract the foreground area of the image. In this work, we took the wildlife of the Saihan Ula Nature Reserve in Inner Mongolia province as targets, and applied the proposed method to the wildlife images captured by the WMSN monitoring system. The experimental results showed that the proposed method achieved effective extraction results for the wildlife monitoring images and provided effective support for reducing the power consumption when transmitting wildlife monitoring images.

#### **2. Related Work**

Currently, image extraction methods mainly include edge detection, threshold extraction [13,14], clustering algorithms [15,16], saliency detection [17], and semantic segmentation [18]. Image extraction based on edge detection preserves the edge information of the input image by calculating the derivative between different pixels [19]. However, this approach is sensitive to noise, which may be misjudged as a boundary and thus reduces the accuracy of the edge position; this is a general disadvantage of edge-based methods. Threshold extraction, a region-based extraction technique, divides pixels into several categories and has the advantage of a low computational complexity [20]. However, it is only applicable to images in which the foreground and background occupy different grayscale ranges. Clustering algorithms group pixels with similar color space features into the same cluster [21]. Saliency detection algorithms extract the foreground region by simulating the visual characteristics of human beings, but their high computational complexity may prevent their use in real-time applications [22]. Semantic segmentation associates each pixel of an image with a class label [23]. Algorithms based on semantic segmentation have a high accuracy, but require a large number of labeled wildlife monitoring images, resulting in high labor costs. Moreover, their high computational complexity makes them unsuitable for WMSNs.

Wildlife images are difficult to extract because of complex backgrounds and uneven illumination, which varies significantly between seasons. To address the disadvantages of the above extraction methods, Shehu A et al. [24] proposed an edge detection algorithm for the pixel detection of wildlife images captured by sensor nodes through the calculation of the gray threshold and gradient amplitude. To enhance the accuracy of edge detection, Feng et al. [25] introduced saliency detection based on a positional saliency map. Combined with the edge detection method, this approach improves the extraction of images captured by wireless sensors and realizes the extraction of wildlife images. Tian et al. [26] segmented multi-colored wildlife images into watershed regions based on watershed transformation and then used the traditional mean-shift algorithm to cluster the watershed regions, which preserves the edge information and effectively suppresses over-segmentation. Nevertheless, the traditional mean-shift method based on color space information cannot accurately segment images captured in a complex environment, because it does not consider the texture parameters of the foreground and background regions. The traditional mean-shift image segmentation algorithm was improved by Akbulut Y et al. [27] in 2018. Their method combines texture parameters with color space information and then uses edge-preserving filtering to preserve as much edge information as possible, so that the mean-shift algorithm achieves superior image extraction. The algorithm is successful in image extraction within a certain range, but a fixed bandwidth must be set manually, which is not suitable for the real-time extraction of images, and its high computational complexity makes it unsuitable for WMSNs with a limited power supply. In this work, we use an adaptive mean-shift algorithm combined with color information and texture parameters to extract WMSN wildlife monitoring images.

#### **3. Materials**

In this study, a WMSN monitoring system designed by our laboratory was used to perform the task of capturing images of wildlife. The system was deployed in the Saihan Ula National Nature Reserve in the Inner Mongolia province, which is located in the southern mountainous area of the Greater Khingan Range. It is a forest ecological nature reserve, which has a medium-temperate, semi-humid, and warm climate. This reserve is home to 37 primary species of wildlife, including three kinds of secondary, nationally protected mammals, such as *Cervus elaphus*, *Naemorhedus goral*, and *Lynx lynx*.

The WMSN wildlife monitoring system captures wildlife images using industrial-grade cameras embedded in the terminal node equipment. The system, which is mainly composed of coordination nodes, terminal nodes, gateway nodes, and servers, provides real-time, remote acquisition of monitoring images. The detailed layout is depicted in Figure 1.

**Figure 1.** Wildlife monitoring system.

The WMSN node establishes a sensor network in a self-organizing manner by using the ZigBee network protocol. Detailed parameters are given in Table 1. When the wild animals enter the monitoring range, the camera is triggered by the infrared sensor of the terminal node to capture the images. The captured images are then sent to the coordination nodes in a multi-hop manner. After the coordinating nodes successfully receive the image data information, the information is transmitted to the server center through the gateway node in the form of a 4G signal. If there is no target shown in the monitoring region, the nodes stop working to reduce the energy consumption.


**Table 1.** Parameters of the wireless multimedia sensor network (WMSN) node.

More than 20 sensor nodes were deployed in Saihan Ula National Nature Reserve, and the distance between every two sensor nodes was 150 m. In this study, more than 2000 images of 12 species of wildlife were collected in the nature reserve using the monitoring system, including *Cervus elaphus* and *Lynx lynx*, which are nationally protected animals. The monitoring images are shown below in Figure 2.

**Figure 2.** Wildlife monitoring images: (**a**) *Capreolus pygargus*; (**b**) *Sus scrofa*; (**c**) *Cervus elaphus*.

#### **4. Experimental Methods**

A novel image extraction method is proposed to process the wildlife monitoring images captured by the WMSN monitoring system, as depicted in Figure 3. The target region that contains the wildlife is the major object of interest, whereas the background regions only provide reference information. The steps of the algorithm are as follows:


**Figure 3.** Flow-process diagram of the proposed method; LUV: CIELUV.

#### *4.1. Color Space Information Extraction*

A traditional red-green-blue (RGB) color space may not extract the desired color parameters, as RGB color space components always have strong correlations in wildlife monitoring images. Therefore, constructing the color space model to extract color parameters is a significant procedure in our proposed method. However, the weakened quality of the acquired wildlife monitoring images aggravates the difficulty of completing the color space model construction due to the different illumination variations in wild environments.

In our algorithm, the CIELUV (LUV) color space is used to output the luminance and chrominance components, as it has two distinct advantages over other color spaces. First, its color components are uncorrelated; second, it has been validated for extracting detailed edge regions in color images [28].
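As an illustration of the color space transformation, the following minimal sketch converts RGB pixels to CIELUV with NumPy. The paper does not state which RGB primaries or white point were used, so the sRGB transfer function, the sRGB-to-XYZ matrix, and the D65 white point below are assumptions, and the function name `rgb_to_luv` is hypothetical.

```python
import numpy as np

# Assumptions (not from the paper): sRGB input scaled to [0, 1], D65 white point.
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_XYZ = np.array([0.95047, 1.0, 1.08883])


def _uv_prime(xyz):
    """CIE 1976 u', v' chromaticities; set to 0 where X + 15Y + 3Z is 0."""
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    denom = x + 15.0 * y + 3.0 * z
    safe = np.where(denom > 0, denom, 1.0)
    u = np.where(denom > 0, 4.0 * x / safe, 0.0)
    v = np.where(denom > 0, 9.0 * y / safe, 0.0)
    return u, v


def rgb_to_luv(rgb):
    """Convert an (..., 3) array of sRGB values in [0, 1] to (L*, u*, v*)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB gamma to obtain linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = linear @ _RGB_TO_XYZ.T
    y_rel = xyz[..., 1] / _WHITE_XYZ[1]
    # Luminance L*: cube-root law above the threshold, linear below it.
    lum = np.where(y_rel > (6.0 / 29.0) ** 3,
                   116.0 * np.cbrt(y_rel) - 16.0,
                   (29.0 / 3.0) ** 3 * y_rel)
    u, v = _uv_prime(xyz)
    u_n, v_n = _uv_prime(_WHITE_XYZ)
    # Chrominance u*, v* relative to the white point.
    return np.stack([lum, 13.0 * lum * (u - u_n), 13.0 * lum * (v - v_n)],
                    axis=-1)
```

In this sketch the first output channel carries the luminance and the other two carry the chrominance, which is the separation the enhancement step relies on.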

To obtain more detailed color parameters, novel color enhancement based on the Retinex method was introduced into the color space model. The classic Retinex model decomposes images into reflectance and illumination:

$$\log(I(x, y)) = \log(R(x, y)) + \log(L(x, y)) \tag{1}$$

where *I*(*x*, *y*) is the observed pixel in the monitoring image at the location of (*x*, *y*) [29,30], *R*(*x*, *y*) is the reflectance, and *L*(*x*, *y*) denotes the illumination of the image.

In the theory of multi-scale Retinex, several individual convolutions with different Gaussian kernels are applied to the original *I*(*x*, *y*) and combined with different weights to approximate the component *L*(*x*, *y*), as shown in Equation (2), where *σ<sub>i</sub>* is the scale parameter of the *i*-th Gaussian kernel, and the sum of the weights *w<sub>i</sub>* is equal to 1.

$$\begin{aligned} R_{MSR}(x, y) &= \sum_{i=1}^{n} w_i \left[\log(I(x, y)) - \log\left(g(\sigma_i) * I(x, y)\right)\right] \\ g(\sigma_i) &= \frac{1}{2\pi\sigma_i^2} e^{-\frac{(x-\mu)^2 + (y-\mu)^2}{2\sigma_i^2}} \end{aligned} \tag{2}$$
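A minimal NumPy sketch of the multi-scale Retinex output is given below for illustration. It uses the standard Gaussian exponent −(x² + y²)/(2σ²) and implements the convolution separably. The scales `sigmas`, the equal default weights, the edge padding, and the small offset `eps` that keeps the logarithm finite are assumptions chosen for the example, not values reported in this paper.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian convolution with edge padding (same output size)."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def multi_scale_retinex(img, sigmas=(1.0, 2.0, 4.0), weights=None, eps=1e-6):
    """Weighted sum over scales of log(I) - log(g(sigma_i) * I)."""
    img = np.asarray(img, dtype=np.float64) + eps  # avoid log(0)
    if weights is None:
        # Hypothetical default: equal weights that sum to 1.
        weights = np.full(len(sigmas), 1.0 / len(sigmas))
    out = np.zeros_like(img)
    for w, s in zip(weights, sigmas):
        out += w * (np.log(img) - np.log(gaussian_blur(img, s)))
    return out
```

For a uniformly lit, constant image the estimated illumination equals the image itself, so the output is zero everywhere; only local deviations from the smoothed illumination survive.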

The novel color enhancement was inspired by the traditional Retinex framework, and it processes the wildlife monitoring images in the LUV color space. A color shift occurs when the traditional Retinex with a Gaussian filter simultaneously changes the luminance and chrominance. Therefore, whereas the traditional Retinex algorithm applies Gaussian filtering in the different color space channels, the proposed method applies a bilateral filter only in the luminance channel to avoid the color shift. Bilateral filters with different kernels process the L channel with equal weights, taking the uneven distribution in the spatial domain into account, as shown in Equation (3), where *σ<sub>i</sub>* is the kernel coefficient.

$$R_{\text{Bilateral-MSR}}(x, y) = \sum_{i=1}^{n} w_i \left[\log(I(x, y)) - \log\left(g_{\text{Bilateral}}(\sigma_i) * I(x, y)\right)\right] \tag{3}$$
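The bilateral variant can be sketched in the same way. The brute-force bilateral filter below is only an illustration: the spatial scales `sigmas`, the range parameter `sigma_r`, the window radius of 2σ, and the assumption that the luminance channel is scaled to [0, 1] are all hypothetical choices, since the paper does not list these settings here.

```python
import numpy as np


def bilateral_filter(img, sigma_s, sigma_r):
    """Brute-force bilateral filter on a 2-D array with edge padding."""
    img = np.asarray(img, dtype=np.float64)
    r = max(1, int(2 * sigma_s))
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            shifted = padded[r + dy:r + dy + h, r + dx:r + dx + w]
            # Spatial weight depends on the offset, range weight on the
            # intensity difference between the neighbor and the center.
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            rng_w = np.exp(-(shifted - img) ** 2 / (2.0 * sigma_r ** 2))
            num += spatial * rng_w * shifted
            den += spatial * rng_w
    return num / den  # den >= 1 because the center pixel has weight 1


def bilateral_msr(lum, sigmas=(1.0, 2.0, 3.0), sigma_r=0.1, eps=1e-6):
    """Equal-weight sum of log(L) - log(bilateral(L)) over several kernels."""
    lum = np.asarray(lum, dtype=np.float64) + eps  # avoid log(0)
    w = 1.0 / len(sigmas)  # equal weights that sum to 1
    out = np.zeros_like(lum)
    for s in sigmas:
        out += w * (np.log(lum) - np.log(bilateral_filter(lum, s, sigma_r)))
    return out
```

Because the range weight suppresses neighbors with very different intensities, a sharp luminance edge is left almost untouched by the filter, in contrast to Gaussian smoothing.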

The workflow of the proposed method for reconstructing the color space model is shown in Figure 4 and can be divided into three main parts: (a) color space transformation, (b) the color enhancement with the filter in the middle, and (c) the extraction of the color parameters. The whole process of the method consists of the following steps:


**Figure 4.** Flow-process diagram of the proposed method.

#### *4.2. Texture Information Extraction*

After extracting the color parameters of the wildlife image, we can determine that different areas of the same texture are susceptible to color changes in the image. Here, the texture information is obtained to ensure the integrity of the extraction of the foreground area. In this study, the texture information of the images is extracted by Hermite transform, and the texture information is used as a vital parameter component of the mean-shift algorithm. The continuous Hermite function is defined as follows [31,32]:

$$H_n(x) = (-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2}, \quad n = 0, 1, 2, \dots \tag{4}$$
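Equation (4) is the Rodrigues form of the probabilists' Hermite polynomials, which satisfy the three-term recurrence H<sub>k+1</sub>(x) = x·H<sub>k</sub>(x) − k·H<sub>k−1</sub>(x). The short sketch below evaluates them with that recurrence; the function name `hermite_poly` is hypothetical.

```python
import numpy as np


def hermite_poly(n, x):
    """Evaluate H_n(x) of Equation (4) via H_{k+1} = x*H_k - k*H_{k-1}."""
    x = np.asarray(x, dtype=np.float64)
    prev, curr = np.ones_like(x), x.copy()  # H_0 = 1, H_1 = x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, curr = curr, x * curr - k * prev
    return curr
```

The first few polynomials produced this way are H<sub>0</sub> = 1, H<sub>1</sub> = x, H<sub>2</sub> = x² − 1, and H<sub>3</sub> = x³ − 3x.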

Hermite transform of a signal is defined as

$$f_n(t_0) = \int f(t) H_n(t_0 - t) V^2(t_0 - t) \, dt \tag{5}$$

where *V*(*t*) is a Gaussian window function defined as follows:

$$V(t) = \frac{1}{\sqrt{\sigma \sqrt{2\pi}}} e^{-\frac{\left(t-\mu\right)^2}{2\sigma^2}}\tag{6}$$

*f<sub>n</sub>*(*t*<sub>0</sub>) is obtained by convolving the input signal *f*(*t*) with the Hermite analytic function *d<sub>n</sub>*(*t*), which is described in terms of the window and the Hermite polynomials as:

$$d_n(t) = \frac{1}{\sigma\sqrt{2\pi}} H_n(t) e^{-\frac{(t-\mu)^2}{2\sigma^2}} \tag{7}$$

Because the function is rotationally symmetric and separable, the one-dimensional Hermite space can be transformed into a two-dimensional space. The formula is as follows:

$$d_{n-m,m}(x, y) = d_{n-m}(x) \, d_m(y) \tag{8}$$

where *n* − *m* and *m* denote the order in the *x* and *y* directions, respectively [33]. The filter *d<sub>n−m,m</sub>*(*x*, *y*) attenuates continuously and is not as steep and discontinuous as the ideal filter, so it can effectively preserve the edge information of the image. The perspective of the Hermite analytic function is shown in Figure 5.

**Figure 5.** Perspective of Hermite analytic function **H12**.

Finally, the Hermite transform of the input image *I*(*x*, *y*) with the filter *d<sub>n−m,m</sub>*(*x*, *y*) is computed as [34]

$$I_{n-m,m}(x_0, y_0) = \iint I(x, y) \, d_{n-m,m}(x - x_0, y - y_0) \, dx \, dy \tag{9}$$
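The separable construction of Equation (8) and the filtering of Equation (9) can be sketched as follows. This is a discrete approximation under stated assumptions: the filters are sampled on integer pixel offsets with μ = 0, truncated at a radius of 4σ, and the image is edge-padded; `sigma`, the truncation radius, and all function names are hypothetical. `numpy.polynomial.hermite_e.hermeval` evaluates the probabilists' Hermite series that corresponds to Equation (4).

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def hermite_filter_1d(order, sigma=1.0):
    """Sample d_n(t) = H_n(t) * exp(-t^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    radius = int(np.ceil(4 * sigma))  # assumed truncation radius
    t = np.arange(-radius, radius + 1, dtype=np.float64)
    coeffs = np.zeros(order + 1)
    coeffs[order] = 1.0  # select the single polynomial H_order
    window = np.exp(-t ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
    return hermeval(t, coeffs) * window


def hermite_filter_2d(order_x, order_y, sigma=1.0):
    """Separable 2-D filter d(x, y) = d_{order_x}(x) * d_{order_y}(y)."""
    dx = hermite_filter_1d(order_x, sigma)
    dy = hermite_filter_1d(order_y, sigma)
    return np.outer(dy, dx)  # rows index y, columns index x


def hermite_response(image, order_x, order_y, sigma=1.0):
    """Discrete version of Equation (9): slide the filter over the image."""
    image = np.asarray(image, dtype=np.float64)
    kernel = hermite_filter_2d(order_x, order_y, sigma)
    r = kernel.shape[0] // 2
    padded = np.pad(image, r, mode="edge")
    h, w = image.shape
    out = np.zeros_like(image)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out
```

With order 1 in *x* and order 0 in *y*, the filter behaves like a smoothed horizontal derivative: it returns approximately the slope on a horizontal intensity ramp and zero on a constant image.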

Every Hermite analytic function is the product of the Gaussian window function and a Hermite polynomial. Because some of the polynomials were not capable of extracting valuable texture parameters, only several polynomials were chosen in order to reduce the computational complexity of the method. The steps were as follows:


Taking the input image of Figure 4 as an example, the image was convolved with different polynomials to obtain different texture parameters, as shown in Figure 6.

**Figure 6.** Polynomial convolution texture information; (**a**) H11; (**b**) H12; (**c**) H31; (**d**) H23; (**e**) H41; and (**f**) H22.

As shown in Figure 6, when these polynomials were convolved with the input image, the polynomials H12, H22, H23, H31, and H41 were found to extract useful texture parameters in different directions, and the texture parameters extracted by H12, H22, and H31 were particularly conspicuous. However, H11 did not extract any texture information; instead, it generated a parameter image similar to the grayscale version of the input image. Thus, we next constructed the filter to extract the texture parameter. The architecture of the proposed method for extracting the Hermite texture parameter is shown in Figure 7.

**Figure 7.** Flow-process diagram of texture extraction.

The extracted texture parameter was saved for further processing to obtain a texture image, as shown in Figure 8.

**Figure 8.** Texture parameters extraction; (**a**) Input image; (**b**) texture image.

#### *4.3. Adaptive Mean-Shift Algorithm*

Wildlife monitoring images of different species have different backgrounds and different light intensities. Therefore, after the color and texture parameters were received, we utilized the adaptive mean-shift algorithm to adaptively select the bandwidth according to the pixel characteristics of the wildlife images, which guarantees the extraction quality of the foreground region.

The kernel density estimate with a fixed bandwidth for a set of data points {*x<sub>i</sub>*, *i* = 1, 2, ... *n*} [35,36] in the mean-shift algorithm is defined as

$$p(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \tag{10}$$

where *K*(*x*) is a kernel function that is symmetric with respect to the origin and integrates to 1 over its domain [37]. The mean-shift algorithm usually uses a Gaussian function as the kernel function, in which *h* represents the fixed bandwidth of the kernel.
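For illustration, the fixed-bandwidth estimate of Equation (10) with a Gaussian kernel can be written in a few lines for one-dimensional data. The function names and the sample values in the usage note are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel: symmetric about 0 and integrates to 1."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def kde_fixed(x, samples, h):
    """p(x) = 1/(n h) * sum_i K((x - x_i) / h) for 1-D data."""
    x = np.atleast_1d(np.asarray(x, dtype=np.float64))
    samples = np.asarray(samples, dtype=np.float64)
    u = (x[:, None] - samples[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (len(samples) * h)
```

For example, `kde_fixed(np.linspace(-10, 12, 4401), [0.0, 1.0, 2.0], 0.5)` returns a density that sums to approximately 1 when multiplied by the grid spacing. A smaller `h` gives a spikier estimate and a larger `h` a smoother one, which is the sensitivity that motivates selecting the bandwidth adaptively.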

In this study, the adaptive mean-shift algorithm was used to cluster pixel data, which means that different sampling data *xi* adopted different bandwidths *h* = *h*(*xi*). The variable bandwidth kernel function density estimate [38] is defined as

$$p(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h(x_i)} K\left(\frac{x - x_i}{h(x_i)}\right) \tag{11}$$

$$h(x_i) = h_0 \times \sqrt{\frac{r}{f(x_i)}} \tag{12}$$

In the above function, the pixel point at the center of the grayscale image was taken as the initial center point, and the bandwidth *h*(*x*1) was calculated from this point.

$$h_0 = \frac{1}{n \times n} \sum_{x=1}^{n} \sum_{y=1}^{n} \left| M - I(x, y) \right| \tag{13}$$

where *h*<sub>0</sub> is the average absolute deviation of all pixel values from the median *M* of the image. The probability that a pixel has a gray level of *x<sub>i</sub>* is *f*(*x<sub>i</sub>*). Then, the scale factor *r* is defined as

$$\log r = \frac{1}{m} \sum_{i=1}^{m} \log\left(f(x_i)\right) \tag{14}$$

where *m* is the number of gray levels of the image. The center point iteration of the kernel function is given in Figure 9, where two-dimensional Gaussian data were randomly generated as coordinates of the data points. The mean-shift vector points toward the region where the density of sample points increases most rapidly, which is also the direction of the density gradient.
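The bandwidth selection described by Equations (12)–(14) can be sketched as below for an integer-valued grayscale image. Two points are assumptions rather than statements of the paper: gray levels that do not occur in the image are skipped so that the logarithm stays finite (so *m* is taken as the number of occupied levels), and the function name `adaptive_bandwidths` is hypothetical.

```python
import numpy as np


def adaptive_bandwidths(gray, levels=256):
    """Return (h0, r, h) where h[g] is the bandwidth for gray level g."""
    gray = np.asarray(gray)
    pixels = gray.astype(np.float64)
    # Eq. (13): mean absolute deviation of the pixels from the median M.
    h0 = np.mean(np.abs(np.median(pixels) - pixels))
    # f(x_i): empirical probability of each gray level.
    counts = np.bincount(gray.ravel().astype(np.int64), minlength=levels)
    f = counts / counts.sum()
    present = f > 0  # assumption: ignore empty levels to keep log finite
    # Eq. (14): log r is the mean of log f, so r is the geometric mean of f.
    r = np.exp(np.mean(np.log(f[present])))
    # Eq. (12): rare gray levels receive a larger bandwidth.
    h = np.full(levels, np.nan)
    h[present] = h0 * np.sqrt(r / f[present])
    return h0, r, h
```

Gray levels with a low probability obtain a wider kernel and frequent levels a narrower one, so no single bandwidth has to be tuned by hand for each image.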

**Figure 9.** The center point iteration of the kernel function.
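The center point iteration can be reproduced with a small experiment on randomly generated two-dimensional Gaussian data. The sketch below uses a Gaussian kernel with a single fixed `bandwidth` for simplicity; the adaptive variant would replace it with a per-point bandwidth. The data parameters, tolerance, and function name are hypothetical.

```python
import numpy as np


def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Move a center to the kernel-weighted mean until the shift is tiny."""
    points = np.asarray(points, dtype=np.float64)
    center = np.asarray(start, dtype=np.float64)
    for _ in range(max_iter):
        d2 = np.sum((points - center) ** 2, axis=1)
        weights = np.exp(-0.5 * d2 / bandwidth ** 2)
        new_center = weights @ points / weights.sum()
        # The mean-shift vector is new_center - center; stop when it vanishes.
        if np.linalg.norm(new_center - center) < tol:
            return new_center
        center = new_center
    return center


# Usage: 500 samples around (2, -1); the center climbs toward the density peak.
rng = np.random.default_rng(0)
data = rng.normal(loc=[2.0, -1.0], scale=0.5, size=(500, 2))
mode = mean_shift_mode(data, start=[0.0, 0.0], bandwidth=1.0)
```

Each update moves the center along the estimated density gradient, so the iteration stops near the densest part of the point cloud.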

Segmentation and extraction results with different mean-shift algorithms are given in Figure 10. The first column of Figure 10 represents the original image. Figure 10b,f are the best segmentation results obtained by multiple experiments for Figure 10a,e using the traditional mean-shift algorithm, while Figure 10c,g, and Figure 10d,h show the segmentation result and extracted foreground area of the proposed algorithm, respectively. As shown in Figure 10, the results of the proposed algorithm are very close to the best segmentation results obtained by the traditional mean-shift algorithm. The proposed method can control over-segmentation to a small extent and reduce the mean-shift algorithm debugging time. The algorithm comparison parameters are shown in Table 2.

**Figure 10.** Visual comparison of wildlife image segmentation: (**a**,**e**) the original image; (**b**,**f**) the best segmentation results of the traditional mean-shift algorithm; (**c**,**g**) the segmentation result of the proposed algorithm; and (**d**,**h**) the extraction result of the proposed algorithm.

**Table 2.** Comparison of image algorithm parameters.


The time complexity of the mean-shift algorithm is *O*(*Tn*<sup>2</sup>), where *T* is the number of iterations and *n* is the number of sample data points [39]. The larger the value of the bandwidth parameter, the fewer iterations are required to traverse all the data, which means that the time complexity *O*(*Tn*<sup>2</sup>) is reduced. However, choosing a larger bandwidth value means ignoring the detail information of the image, which can directly affect the quality of segmentation. The parameters of the algorithm proposed by this study are larger than the parameters of the best results in the comparison experiments, which means the proposed method reduces the debugging time and time complexity for image segmentation.

#### **5. Experimental Results and Discussion**

In order to verify the adaptability and effectiveness of our proposed algorithm, an extraction analysis of the captured wildlife images was conducted. The result was evaluated by several evaluation criteria, and it was compared with other conventional algorithms for image extraction.

#### *5.1. Evaluation Criteria*

The pixel accuracy, relative limit measurement accuracy, and mean intersection over the union were utilized as objective criteria to evaluate the quality of image extraction [40].

The pixel accuracy rate *PA* was used to calculate the ratio of the number of correctly segmented pixels to the number of pixels in the image:

$$PA = \frac{\sum_{i=1}^{n} n_{ii}}{\sum_{i=1}^{n} t_i} \times 100\% \tag{15}$$

where *t<sub>i</sub>* is the number of pixels belonging to category *i* in the original picture, and *n<sub>ii</sub>* is the number of pixels whose actual category is *i* and whose predicted category is also *i*.

The relative limit measurement accuracy *RLMA* indicates the deviation value between the actual value of the segmented image and the true value of the foreground region:

$$RLMA = \frac{|\alpha - \beta|}{\alpha} \times 100\% \tag{16}$$

where α is the actual number of pixels in the image to be segmented, and β is the number of pixels in the foreground region obtained by segmentation. The smaller the *RLMA*, the better the segmentation effect.
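As a small worked example, the *RLMA* can be computed from two binary masks. Here α is taken to be the number of foreground pixels in the manually labeled reference mask and β the number of foreground pixels in the extracted mask; this reading of α, and the function name, are assumptions made for the sketch.

```python
import numpy as np


def rlma(reference_mask, extracted_mask):
    """Relative deviation (in percent) between foreground pixel counts."""
    alpha = int(np.count_nonzero(reference_mask))  # reference foreground size
    beta = int(np.count_nonzero(extracted_mask))   # extracted foreground size
    if alpha == 0:
        raise ValueError("reference mask contains no foreground pixels")
    return abs(alpha - beta) / alpha * 100.0
```

With a reference foreground of 4 pixels and an extracted foreground of 5 pixels, the result is |4 − 5| / 4 × 100% = 25%. Note that the measure compares only pixel counts, so it is usually read together with *PA* and *MIoU*, which also account for pixel positions.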

The mean intersection over union *MIoU* is the ratio of the intersection to the union of the segmentation result and the true value, averaged over all categories. It reflects the accuracy and completeness of the segmentation result and is the most commonly used evaluation index:

$$MIoU = \frac{1}{n+1} \sum_{i=0}^{n} \frac{n_{ii}}{t_i + \sum_{j=0}^{n} n_{ji} - n_{ii}} \times 100\% \tag{17}$$

where *n<sub>ji</sub>* denotes the number of pixels whose actual category is *j* and whose predicted category is *i*, and *t<sub>i</sub>* is the number of pixels belonging to category *i*.
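Both *PA* and *MIoU* can be derived from a confusion matrix whose entry (*j*, *i*) counts the pixels with actual category *j* and predicted category *i*. The sketch below assumes integer label maps with values from 0 to `num_classes` − 1; the function names are hypothetical, and categories that occur in neither map are skipped when averaging.

```python
import numpy as np


def confusion_matrix(truth, pred, num_classes):
    """cm[j, i] = number of pixels with actual class j and predicted class i."""
    truth = np.asarray(truth).ravel()
    pred = np.asarray(pred).ravel()
    idx = truth * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Share of correctly labeled pixels, in percent."""
    return np.trace(cm) / cm.sum() * 100.0


def mean_iou(cm):
    """Mean over classes of n_ii / (t_i + sum_j n_ji - n_ii), in percent."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp  # row sum t_i + column sum
    valid = union > 0
    return np.mean(tp[valid] / union[valid]) * 100.0
```

For the labels `truth = [0, 0, 1, 1]` and `pred = [0, 1, 1, 1]`, three of four pixels are correct, so *PA* = 75%; the per-class intersection-over-union values are 1/2 and 2/3, giving *MIoU* ≈ 58.3%.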

#### *5.2. Experiment and Analysis*

We compared the experimental results of our algorithm with three other extraction algorithms: normalized cuts (N-cuts) [41], the aggregating super-pixels (SAS) algorithm [42], and the histogram contrast saliency detection (HCS) algorithm [25]. Experimental samples were selected from wildlife monitoring images of different species in the Saihan Ula Nature Reserve, with different backgrounds and, owing to seasonal variations, different light intensities. The comparison results are shown in Figure 11. Figure 11(1–3) shows wildlife images with diverse background complexity. Figure 11(1) has a simple background and an extreme difference between the foreground and background colors. Figure 11(2) has a higher background complexity, with a similar color between the grass and trees in the background, whereas Figure 11(3) has a uniform background in which the shadow regions resemble the wildlife. Figure 11(4–6) shows three typical images of wildlife under different light intensities. Figure 11(4,5) show wildlife captured under normal lighting conditions and weak illumination, respectively; in the latter, the overall color is dark and the details are not obvious. Figure 11(6) contains abrupt transitions between bright and dark areas under non-uniform strong illumination. All the experiments were performed using MATLAB (2014b, The MathWorks, Natick, MA, USA, 1984) on a workstation with an Intel(R) Core(TM) i5-4590 and 8 GB RAM.

**Figure 11.** Visual comparison of wildlife image extraction: (**a**) the original image; (**b**) the extraction results of the proposed algorithm; (**c**) the extraction results of the N-cuts algorithm; (**d**) the extraction results of the aggregating super-pixels (SAS) algorithm; (**e**) the extraction results of the histogram contrast saliency detection (HCS) algorithm; and (**f**) ground-truth.

The above extraction results show that the method proposed in this paper has a superior performance and that its extraction of wildlife regions is more accurate than those of the other three methods. All four algorithms (the proposed algorithm, N-cuts, SAS, and HCS) perform better on the images of Figure 11(1,3), which have simple backgrounds, than on the image of Figure 11(2), which has a higher background complexity and in which grass and trees are very similar in color with slightly different texture features. On this image, the proposed method based on adaptive mean-shift and Hermite transform still segments the image effectively and obtains satisfactory extraction results, whereas the N-cuts, SAS, and HCS algorithms show over-segmentation and even incorrect segmentation. In the monitoring images of wildlife under different illumination conditions, the SAS and HCS algorithms caused over-segmentation under weak illumination due to the influence of the shadows, as shown in Figure 11(5). Under non-uniform strong illumination, the N-cuts algorithm segmented the head and legs of the wildlife incorrectly where bright and dark areas change abruptly, as shown in Figure 11(6).

For the six wildlife monitoring images described in Figure 11, the wildlife regions were manually labeled with an image segmentation tool to serve as the true value. The pixel accuracy, relative limit measurement accuracy, and mean intersection over the union of the extraction results of the proposed algorithm, N-cuts algorithm, SAS algorithm, and HCS algorithm are shown in Figure 12.

**Figure 12.** Comparative results of each algorithm. PA: Pixel accuracy; RLMA: Relative limit measurement accuracy; MIoU: Mean intersection over the union.

Analysis of the data in Figure 12 led to four main findings: (1) compared with the N-cuts, SAS, and HCS algorithms, the relative limit measurement accuracy *RLMA* of the proposed algorithm was the lowest, which indicates that the foreground region extracted by the proposed algorithm had the least deviation from the reference true value; (2) an accurate extraction method produces a *PA* value that is close to 100%, and by this measure our proposed method yields better extraction than the other methods for all the images except Figure 11(5). Although the HCS algorithm produced the best *PA* value for Figure 11(5), it could not effectively separate the foreground from the background; (3) the mean intersection over union *MIoU* of the proposed algorithm was remarkably higher than those of the SAS and HCS algorithms, and slightly higher than that of the N-cuts algorithm, which indicates that the proposed algorithm has the best accuracy and completeness; (4) the *MIoU* of the proposed algorithm was more than 70%, which indicates that the proposed algorithm has a high applicability to wildlife monitoring images. The above results show that our algorithm is more suitable for the extraction of wildlife images captured by WMSNs.

To further verify the performance of the proposed algorithm, we randomly selected 120 images from the wildlife monitoring images and calculated the performance evaluation index values of the different algorithms. As shown in Figure 13, the proposed algorithm improved the pixel accuracy compared with the N-cuts, SAS, and HCS algorithms. Comparing the means of the experimental data, the pixel accuracy of the proposed algorithm increased by 11.25%, 5.46%, and 10.39%, respectively; the relative limit measurement accuracy improved by 1.83%, 5.28%, and 12.05%, respectively; and the average mean intersection over the union increased by 7.09%, 14.96%, and 19.14%, respectively. The above results show that the proposed algorithm consistently outperforms the other algorithms with respect to both pixel accuracy and average mean intersection over the union.

**Figure 13.** Comparative results of different algorithms. (**a**) Pixel accuracy (PA) experimental result; (**b**) relative limit measurement accuracy (RLMA) experimental result; (**c**) mean intersection over the union (MIoU) experimental result.

#### **6. Conclusions**

In this paper, we proposed a novel extraction method for wildlife images, which can achieve extraction of the foreground region and reduce the energy loss of sensor nodes in WMSNs. The method uses Hermite transform to extract image texture information and combines it with the color information obtained from the color space to achieve adaptive mean-shift clustering. This study used wildlife images captured in the Saihan Ula Nature Reserve by a WMSN monitoring system developed by our laboratory as the experimental sample. The proposed method was compared with the N-cuts algorithm, SAS algorithm, and HCS algorithm on four criteria: the extraction effect, pixel accuracy, relative limit measurement accuracy, and mean intersection over the union. The experimental results confirmed that the algorithm proposed in this paper was superior to the other three algorithms. The experimental data and results show that the proposed method can realize more accurate extraction of wildlife monitoring images and provide effective support for image transmission in WMSNs. However, uncertainty still remains as to the accurate extraction of the foreground by the threshold segmentation method in the case of irregular numbers of clusters. In future work, we are planning to construct a method for extraction based on gray histogram estimation and regional mergers for each input image.

**Author Contributions:** J.Z. proposed the algorithm; W.L. and H.L. conceived and designed the experiments; W.L., H.L., Y.W., X.Z. and J.Z. performed the experiments; Y.W. and X.Z. analyzed the data; H.L. wrote the paper.

**Funding:** This study was financially supported by the National Natural Science Foundation of China (Grant No. 31670553) and the Fundamental Research Funds for the Central Universities (Grant No. 2016ZCQ08).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
