*2.1. Study Site*

The study area (10 km × 10 km) was located in the elevation range of 1100–2200 m a.s.l. in the Taita Hills (3◦25 S, 38◦19 E) in southeast Kenya (Figure 1). The Taita Hills are part of the Eastern Arc Mountains (EAM). There are two rainy seasons with long rains occurring in the March–June and short rains in October–December [22]. The potential natural vegetation for the Taita Hills is moist Afromontane forest or cloud forest [23] while most of the area is transformed to agricultural use [24,25]. The hills are a biodiversity hotspot with high endemism and exceptional loss of habitat with 80 endemic woody plant species occurring in EAM [8,26]. Only 4.2 km<sup>2</sup> of montane forests persist in 12 forest relicts [27].

In agricultural land, the most common tree species are exotic [8]. *Eucalyptus* spp. native to Australia have been planted to produce lumber, but have been reported by local communities to cause degradation of water quality and lower the availability of water [28] (Figure 2). *Eucalyptus* plantations may even dry up rivers completely [5]. *Grevillea robusta*, also native to Australia, have been planted on farmland to produce lumber since the 1980s, with positive impacts reported by locals [28]. Other studies have shown that *Grevillea robusta* trees may improve agricultural productivity by increasing rainfall utilization, while careful consideration must be given to the distribution of trees among crops to avoid mutually detrimental effects on the tree establishment and the crop growth yield [6,7]. *Acacia mearnsii*, native to Australia, was originally brought to the area to produce vegetable tannins for the tanning of leather [29], but is presently used mainly as firewood. It is considered an invasive species that may reduce local biodiversity [30]. Another commonly found species is *Cupressus lucitanica*, native to Mexico and Central America, which has been planted as fences as they grow thick and dense needle canopy that is difficult to pass through. Most common fruit bearing tree species is *Persea Americana* and less common ones include *Eriobotrya japonica* and *Mangifera indica*.

**Figure 1.** Location of the study area in Coast Province of Kenya and trees sampled in 2013 and 2015.

**Figure 2.** (**a**) Aerial view of town of Wundanyi and surrounding agroforestry landscape with forest fragments; (**b**) agroforestry landscape near Ngangao forest; and (**c**) *Grevillea robusta*; (**d**) *Eucalyptus saligna*; (**e**) *Acacia mearnsii*; (**f**) *Persea americana*; and (**g**) *Euphorbia kibwezensis* trees in a valley in the north-western corner of the study area.

## *2.2. Field Data*

The fieldwork was conducted in two stages. The first campaign was organized between 17 January and 8 February 2013. The 100 km<sup>2</sup> study area was divided into 16 tiles (2.5 km × 2.5 km), which were sampled by 100 ha clusters. Each cluster had ten circular 0.1 ha study plots (17.84 m radius). Ten clusters were selected for the detailed tree sampling, and as one plot was treeless, this resulted in 99 study plots. From each study plot, every tree that had a diameter at breast height greater than 10 cm was measured. The central point of each study plot was measured with GNSS (Trimble GeoExplorer GeoXH 6000, Trimble Inc., Sunnyvale, CA, USA). Measuring tape and compass were used to measure the relative position of each tree from the plot center. To enable the differential correction of the data, a GNSS base station (Trimble Pro 6H receiver, Trimble Inc., Sunnyvale, CA, USA) was logging in a known position during the field measurements. The plots located in the closed indigenous forest were omitted, as we were primarily interested in exotic species that have been planted on agricultural or otherwise managed land. The data from 2013 contained 531 individual trees from 55 different tree species.

The second campaign was organized during 1–30 October 2015. The study area was divided into 1 km × 1 km tiles and 30 tiles were randomly selected. Each tile was further divided into rectangular 1 ha study plots and one study plot was selected within each tile, with the exception of one tile that had two study plots. In total, there were 31 study plots. Within each study plot, nine sampling points with 33.3 m intervals were established. At each sampling point, two trees were selected using the T-square plotless sampling method [8,31]. The same GNSS receiver and base station were used as in 2013 but each tree was measured directly with GNSS. A tree was defined as any woody plant taller than five meters. The data from 2015 contained 538 trees, while we excluded 98 trees that were either located under higher trees (not visible from the air) or that had GNSS positional accuracy <4 m, which left us 440 crowns. In total, there were 950 tree measurements with sufficient positional accuracy and visible canopy from 2013 and 2015 combined.

#### *2.3. Remote Sensing Data*

A flight campaign was conducted in 3–8 February 2013 during the dry season. Two sensors were used for collecting the ALS and IS data from a mean flying height of 750 m. Optech ALTM 3100 (Teledyne Optech, Vaughan, Ontario, Canada) is an oscillating mirror laser scanner capable of recording up to four echoes (returns). The sensor was operated at a pulse rate of 100 kHz and a scan rate of 36 Hz. Scan angle was ±16◦. Achieved pulse density was 9.6 pulses/m2. Mean footprint diameter was 23 cm (Figure 3). The IS data were acquired with AisaEAGLE (Spectral Imaging Ltd., Oulu, Finland) sensor, a pushbroom scanner with an instantaneous field of view of 0.648 mrad and field of view of 36.04◦. The sensor was used with four times spectral binning mode that produced output images with 129 bands and a full width at half maximum of 4.5–5.0 nm in the spectral range of 400–1000 nm. The output pixel size was one meter (Figure 3).

**Figure 3.** An example of AisaEAGLE data (color-infrared) and point cloud derived from laser scanning data. The coordinates are in UTM37S/WGS84 coordinate system.

#### *2.4. Remote Sensing Data Preprocessing*

ALS data were preprocessed by the data vendor (Topscan Gmbh, Rheine, Germany) and delivered as a georeferenced point cloud in UTM37S/WGS84 coordinate system with ellipsoidal heights. The buildings and power lines were excluded and some erroneous measurements from steep slopes were removed using TerraScan software (Terrasolid Ltd., Helsinki, Finland). A canopy height model (CHM) was created using LAStools (version 170201, rapidlasso GmbH, Gilching, Germany) software from point cloud using a pit-free method [32].

The raw IS data were radiometrically corrected and orthorectified with CaliGeoPro 2.2 (Spectral Imaging Ltd. Oulu, Finland). Atmospheric correction was applied using ATCOR-4 (ReSe Applications Schläpfer, Wil, Switzerland), [33]. After the orthorectification, it was noted that there were geometric mismatches between IS and ALS data. As the LiDAR sensor system had higher quality inertial measurement unit and the IS data had obvious distortions, we co-registered the ALS and IS data using control points collected manually from the CHM. The processed IS scanning lines were clipped so that the side overlap was minimized to reduce the distortions on the sides of the flight lines. In total, 50–100 control points were collected for each flight line and first order polynomial transformation was applied to co-register the images. After the co-registration, RMSE was 1.06 m, which was considered appropriate for the data fusion [34].

Before the classification, we filtered the spectral data by excluding pixels with NIR (836 nm) reflectance < 20% and NDVI < 0.5. Higher NDVI threshold has been used in some studies [35], but we selected 0.5 threshold as even the brightest pixels of some of the species in the study area showed lower NDVI values (e.g., *Euphorbia kibwezensis*).

#### *2.5. Segmentation and Preparing Training Data*

Tree crowns were segmented using the dalponte2016 algorithm [36] implemented in the lidR package [37] in R software (R version 3.4.0, R Foundation for Statistical Computing) [38]. The algorithm finds local maxima from rasterized CHM, designates these as tree tops, and then uses a decision tree method to grow individual crowns around the local maxima.

We matched the 950 field measured trees to 543 tree crowns (Figure 4). If multiple field measurements from the same species were detected for the same segmented crown, only one of the observations would be kept. Thus, one segmented tree crown may consist of multiple trees from the same species, while one segmen<sup>t</sup> was considered as one sample. If a crown contained more than one species, it would be excluded. In total 61 species were observed for 543 crowns, while 19 of the species had only 1 observation and 10 had 2–3 observations. These were excluded from the classification as done in a similar setting earlier [10]. The classifications included in total 499 crowns from 31 species (Table 1). All species belonging to *Eucalyptus* and *Syzygium* genuses were labeled as *Eucalyptus* spp. and *Syzygium* spp., respectively, as we could not identify the exact species in all instances. However, the majority of the *Eucalyptuses* were *Eucalyptus saligna*.

**Figure 4.** An example of segmented tree crowns and the field measured trees on top of the canopy height model.

**Table 1.** Species with more than three samples, abbreviation used in figures, type (exotic or native), the frequency of the species (crown level) and number of pixels per species.



**Table 1.** *Cont.*

#### *2.6. Minimum Noise Fraction Transformation*

MNF transformation [20] was applied to the atmospherically corrected reflectance data to reduce dimensionality and pack coherent information in a smaller set of features. The algorithm was applied in ENVI software (version 5.0, Research Systems Inc., Boulder, CO, USA) [39]. We determined the usefulness of the MNF components by the evaluation of their eigenvalues and through visual interpretation [40]. We selected 15 first MNF components and disregarded the rest based on their low eigenvalues. The further visual inspection confirmed that the MNF bands from 16 onwards contained mostly noise. Each MNF component has corresponding eigenvectors that can be used to interpret the weight that each original reflectance band has on the component.

#### *2.7. Narrowband Vegetation Indices*

We calculated a set of narrowband vegetation indices (NVI) that have been linked in earlier studies with vegetation structure, biochemistry and plant physiology (Table S1) [41]. Some of the indices have originally been developed for broadband data but were calculated using narrowband IS data and thus referred as NVIs.

#### *2.8. Point cloud Features*

We used lidR package [37] in R to calculate features for tree crown segments from the ALS point cloud. The average density was 9.6 pulses m2, but there was notable variation in pulse density due to elevation variations across the study area (Figure 1). Especially in the valleys, there were not enough returns per tree crown to calculate complex features. Intensity values were not considered as they were uncalibrated and IS data were available. As ALS derived height information alone is of limited value [9], we calculated variance (var), minimum height (min), 95th percentile (P95), median absolute deviation around median (MADmedian) , median absolute deviation around mean (MADmean), average absolute deviation around median (AADmedian), average absolute deviation around mean (AADmean), quadratic mean (QM), the count of returns (count), maximum height divided by the maximum crown diameter (HD) and maximum height divided by the count of returns (HC).

## *2.9. Feature Selection*

We used VSURF (Variable Selection Using Random Forests) package in R [42,43] to perform the feature selection. VSURF uses the RF variable importance to identify the features that are the most important for the classification task. It was developed especially for handling high dimensional data. VSURF performs three steps: (1) irrelevant features are eliminated; (2) all features related to the response are selected (interpretation step); and (3) selection is refined by eliminating redundancy in the set of features selected in the second step (prediction step). The features retained in the final step were used to test the impact of variable selection on the classification accuracy and to identify the features that were important for the classification.
