**1. Introduction**

Mangroves form an important coastal wetland ecosystem, dominating tropical and subtropical coastlines globally [1,2]. They are crucial not only for human economic activities, but also for a diverse group of terrestrial and marine species that depend on mangrove ecosystems for habitat [3,4]. Mangroves attenuate overland flow of water and therefore act as a shield that protects both natural and human infrastructure from storm surges [5]. Threatened by global climate change, sea-level rise, and human development, mangroves respond variably, either retreating seaward or transgressing landward into other ecosystems [6–12]. To better comprehend these alternative trajectories, it is necessary to understand how mangroves are presently distributed and how their distributions have changed over time across a range of coastal environments. However, long-term monitoring of coastal and marine systems is rare [13], and deciphering changes in mangrove distribution through time is therefore a challenging task. In part, this is because mangroves typically occupy periodically inundated and remote regions that are challenging, time-consuming, and costly to survey with traditional field-based methods [2]. In contrast, remote sensing offers greater coverage at lower cost and has been used increasingly over the past few decades to map and monitor forests [14].

Mangrove retreat or expansion is likely to be observed first in ecotones, the brackish transition zones between coastal ecosystems and interior freshwater ecosystems where mangrove trees mix with freshwater marsh vegetation. We expect that the leaves of evergreen mangrove trees will absorb more light in the blue and red spectra and reflect more light in the green spectrum, resulting in a large reflectance difference between the green and red/blue bands. In contrast, partially senesced marsh vegetation, especially during the dry season, shows a relatively small reflectance difference between the green and red/blue bands. This distinct difference in spectral reflectance between mangroves and the graminoids that dominate in marshes will allow the separation of these two vegetation growth forms using remote sensing imagery.

Vegetation mapping with multispectral images is commonly applied in global studies [15,16]. Medium-resolution multispectral images (e.g., Landsat, NASA, Greenbelt, MD, USA) are free of charge, have temporal coverage dating to the late 1970s, and have spatial resolutions of tens to hundreds of meters that are adequate for detecting large-scale disturbances caused by episodic events such as hurricanes [17]. Though Giri et al. [15] mapped the global distribution of mangroves using medium-resolution Landsat images and Global Land Survey data, and mangrove-related vegetation mapping studies are becoming commonplace [18,19], we did not find any study that addressed tree crown detection, delineation, and cover estimation of mangroves at the individual patch level using true color or multispectral images. Detection of the early stages of mangrove invasion into freshwater marshes necessitates higher spatial resolution images (e.g., WorldView-2, DigitalGlobe, Westminster, CO, USA). Medium-resolution imagery from satellites such as Landsat is too coarse to detect the subtle changes occurring at the patch or individual tree scale. However, acquisition of high-resolution images over large spatial extents with commercial satellites can be prohibitively expensive. In addition, mangrove transgression is inherently a slow process, and it can take decades for mangroves to mature from small seedlings into detectable trees. As such, the short temporal coverage of high-resolution multispectral images is insufficient to study mangrove transgression in much detail [20].

At the same time, a huge repository of high-resolution aerial photographs, some dating back as far as the early 1900s, is available for many parts of the world [21]. These aerial photographs are available as true color, infrared, or panchromatic photographs in hard-copy or digital form. The most commonly used method for mapping vegetation from aerial photography is manual digitization [22,23], which is not only time-consuming but also subject to the interpretation of the digitizing analyst, making repeatability and replication at the same accuracy and precision difficult.

Therefore, a desirable goal is to use automated detection and delineation techniques to detect subtle changes in crown and patch sizes at decadal time scales using high spatial resolution (sub-meter) true color (RGB), near-infrared, and panchromatic aerial photographs acquired by conventional frame cameras. We present here an initial step toward that goal: an evaluation of the suitability of RGB aerial photography in a fully automated delineation process, differentiating tree patches against a graminoid marsh wetland matrix.

Researchers have successfully used true color (RGB) photographs to detect and delineate tree crowns with various segmentation techniques [24–29]. Segmentation techniques separate an image into target plant and background components. Three widely used segmentation techniques are (i) color-index-based segmentation, (ii) threshold-based segmentation, and (iii) learning-based segmentation [30]. Color indices, or vegetation indices, are used to enhance the contrast between vegetated and non-vegetated classes. The rationale behind using color-based vegetation indices is to outline the vegetation region of interest, e.g., crops or trees, by combining information from several bands into a single grayscale image. Many color-based indices have been developed, among others Excess Green [31], Excess Red [32], the Vegetative Index [33], the Visible Atmospheric Resistance Index [34], the Normalized Difference Index [35], the Triangular Greenness Index [36], and the Visible-band Difference Vegetation Index [28]. Other indices combine two or more vegetation indices, such as Excess Green minus Excess Red [25] and the Combined index [27].
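To illustrate how such an index collapses the three RGB bands into a single grayscale layer, the sketch below computes Excess Green (ExG = 2g − r − b) on chromatic coordinates, following the general form reported in the literature cited above; the 2 × 2 pixel values are synthetic and purely illustrative.

```python
import numpy as np

def excess_green(rgb):
    """Excess Green (ExG = 2g - r - b) on chromatic coordinates,
    i.e., each band divided by the per-pixel brightness sum."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=2)
    total[total == 0] = 1.0  # avoid division by zero on black pixels
    r, g, b = (rgb[..., i] / total for i in range(3))
    return 2.0 * g - r - b

# Synthetic 2x2 "image": pure green, neutral gray, leaf-like, and pure red pixels
img = np.array([[[0, 255, 0], [100, 100, 100]],
                [[60, 120, 40], [255, 0, 0]]], dtype=np.uint8)
exg = excess_green(img)
# Vegetation-like pixels score high, neutral gray near 0, red negative
```

The resulting grayscale image is what the thresholding step operates on: the larger the green-versus-red/blue contrast, the easier the foreground/background split.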

Despite promising outcomes, limitations of color-based indices to segment images have been reported when images are captured under variable light conditions [30]. Segmentation requires thresholding techniques, which often depend on a user-selected threshold. Selecting too high a threshold may lead to under-segmentation, thereby merging plant pixels with background pixels, while too low a threshold may lead to over-segmentation [30]. Among several thresholding techniques, Otsu's automatic thresholding method [37] is one of the most widely used. Because thresholds are determined automatically in Otsu's method, this approach is particularly applicable where several images must be processed, thereby reducing the time required to binarize the images.

Limitations of color-based vegetation indices and thresholding methods have prompted researchers to use machine learning approaches, including both unsupervised [38] and supervised methods [39,40]. However, these approaches are complex and often require substantial user input and feedback at multiple stages of the process, making them labor intensive.

Wang et al. [41] categorized several other automatic recognition algorithms for individual tree delineation into four major types: contour-based, local maximum, template matching, and 3D-model. The contour-based method relies on intensity changes which in turn are scale dependent. Therefore, the biggest challenge with contour-based methods is to find a scale that is appropriate for all individual trees in the same image [41]. Local maximum methods underperform because of varied illumination conditions and irregular background phenomena in the image [41]. Model-based template matching requires detailed *a priori* knowledge about the object and is susceptible to varying illumination and noise in the image. Some researchers have applied 3D-based methods. One such method is the watershed segmentation algorithm, a region-based approach originally proposed by Digabel et al. [42] and revised by Beucher et al. [43]. Later, Meyer et al. [44] introduced marker-controlled watershed segmentation to overcome the problem of over-segmentation due to noise in the image [14]. The underlying principle stems from the geographical concept of watersheds and catchments.

Watershed segmentation requires a grayscale input image which is viewed as a topographic surface where the intensity (gray level) of each pixel represents elevation, and local maxima represent the tree crowns. To form catchment basins and delineate watersheds, the image is inverted so that local maxima become local minima, which form valleys [41,45]. As the surface is slowly flooded with water, water will start accumulating in the valleys (local minima) until it overflows into adjacent valleys. The idea is to prevent the water in neighboring catchments from merging by building dams on the watershed lines, thereby creating the boundary of each segment, or catchment basin [45]. Thus, a catchment basin becomes the tree crown or a contiguous patch region with several clumped trees, and the watershed lines become the edge of the crowns or patches.

There are two critical steps for accurate delineation of tree crowns by the watershed method: (1) generation of markers that correctly identify each tree crown or patch, and (2) delineation of segment boundaries by flooding the image from those markers.
Various approaches have been used to implement these two steps [41,46–48]. Lamar et al. [48] developed an automated segmentation method to extract populations of hemlock trees for multi-temporal assessment from aerial images, using a spectrally classified binary image, and generated the markers by Euclidean distance map construction and Gaussian smoothing. Wang et al. [41] detected and delineated tree crowns from a high-resolution multispectral aerial image. They identified and created two sets of treetops from the first component of a principal component analysis. The two sets were created using a local non-maximum suppression method and a local maximum on morphologically transformed distance method, each producing a binary image of the treetops. The markers were generated by intersection of the two binary images based on well-defined criteria. Recently, Yin et al. [49] detected and delineated individual mangrove trees from light detection and ranging (LiDAR) data by seed region growing (SRG) and marker-controlled watershed segmentation (MCWS). The seeds/markers were assumed to be the treetops, which were detected as local maxima from the canopy height model (CHM) using a variable window filtering method. Although watershed segmentation holds the potential to use spectral imagery to differentiate and delineate tree crowns from a background matrix [48], this method has been evaluated mostly in non-mangrove forest settings.

Our objective was to fully automate an image segmentation technique to detect and delineate mangrove patches. By mangrove patches, we refer to mangroves that either occur as isolated individual trees that are large enough to be detected, or several trees that are clumped together. The mangroves were embedded in a graminoid-dominated wetland landscape with a mixture of grasses, sedges, and rushes. Since true color aerial photographs have only three spectral bands (RGB), we evaluated which vegetation indices most effectively enhanced the contrast between target pixels (i.e., mangrove patches) and their background.

The application of a fully automated delineation of mangrove patches using the watershed algorithm to high-resolution true-color aerial photography was conducted in a two-step process: (1) generation of a vegetation index and application of Otsu's thresholding method, followed by morphological operations to delineate markers; (2) delineation of tree patches with marker-controlled watershed segmentation. In this paper we present the process that identified the vegetation indices and parameter settings that best delineate markers for watershed segmentation to detect mangrove patches. The best method was selected on the basis of (1) agreement between algorithm-detected tree cover and actual cover, (2) overall and class-specific user's and producer's accuracies, and (3) object-based (patch) accuracy estimates.
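The morphological cleanup in step (1) can be sketched with a binary opening, which removes speckle smaller than the structuring element before the cleaned blobs are labeled as markers. The image below is synthetic, and the 3 × 3 structuring element is an arbitrary illustrative choice, not the parameter setting evaluated in this study.

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic thresholded image: one 20 x 20 "tree patch" plus isolated
# speckle pixels of the kind left behind by index thresholding
binary = np.zeros((60, 60), dtype=bool)
binary[20:40, 20:40] = True
binary[5, 5] = binary[50, 10] = True      # speckle

# Opening (erosion then dilation) removes features smaller than the
# structuring element while restoring the patch to its original extent
opened = ndi.binary_opening(binary, structure=np.ones((3, 3)))

# Connected-component labeling turns the cleaned blobs into markers
markers, n_patches = ndi.label(opened)
```

After opening, only the patch survives and is labeled as a single marker; without this step, every speckle pixel would seed its own spurious watershed basin.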

The remaining sections of the paper are arranged as follows: Section 2 describes the study area, the components of the watershed algorithm, and the metrics used to evaluate algorithm performance; Section 3 presents the results of the sensitivity analysis and the success of individual tree detection and extraction of tree patches; Section 4 discusses the effects of parameter selection, vegetation indices, Otsu's thresholding method, and the presence of shadows on the detection and delineation of trees; and Section 5 presents the study's conclusions.

#### **2. Materials and Methods**

#### *2.1. Study Area and Image Acquisition*

The study area is located adjacent to Everglades National Park, in Florida, USA, approximately 300 m south of the C-111 Canal and 3.6 km west of South Dixie Highway (Figure 1).

The study area consists of heterogeneous freshwater herbaceous marsh vegetation with scattered occurrences of red mangroves (*Rhizophora mangle*). A georeferenced true color aerial photograph with a spatial resolution of 0.08 m (0.25 ft), acquired during the dry season on January 24, 2017 by Miami-Dade County [50], was used. The RGB image was acquired with a Vexcel UltraCam Eagle (UCEagle) large-format aerial sensor and was processed with Inpho (Trimble, Sunnyvale, CA, USA) photogrammetry software. Each channel recorded 8-bit digital number (DN) brightness values ranging from 0 to 255. The methodology is presented as a flowchart in Figure 2, and the steps are described in detail in the following sections. Digitization and visual interpretation of reference samples were conducted in ArcGIS 10.5 [51]; index calculation, thresholding, and watershed processing were scripted in Python [52] using OpenCV [53] and scikit-learn [54]; and data analysis and accuracy assessment were performed in R [55].

**Figure 1.** Study area in the southern Everglades, adjacent to Everglades National Park, Florida, USA.

**Figure 2.** Flowchart of individual mangrove tree patches delineated from aerial photograph (blue box) using vegetation indices, Otsu's thresholding, and watershed segmentation. Parameter sensitivity analysis and accuracy assessments (red boxes) were performed on segmented images with and without shadow removal (green boxes).
