**1. Introduction**

Globally, more than 55% of new agricultural land in the tropics was converted from forests between 1980 and 2000 [1]. In eastern Africa, agricultural land increased by 1.4% per year during 1990–2010, while the yearly deforestation rate increased from 0.2% during 1990–2000 to 0.4% during 2000–2010 [2]. Agroforestry systems are considered an option for mitigating the negative impacts of this change [3,4]. In addition, selecting proper tree species is important for a productive and environmentally sustainable agroforestry system [5–7]. However, the transformation of forests and woodlands into agroforestry might decrease biodiversity, as native tree species are often replaced with exotic species. In the Afromontane highlands of the Taita Hills (southeast Kenya), 66.5% of tree species observed in the croplands (agroforestry) are exotic, and were associated in a recent study with functional traits such as economic function and nitrogen fixation [8].

Remote sensing based tree species mapping has great potential to reduce the costs of observing changes in tree species composition in comparison to field based approaches that require a large number of field plots [9]. In their recent review, Fassnacht et al. [9] identified that common motives for tree species classification using remote sensing include biodiversity assessment and monitoring, monitoring of invasive species, hazard and stress management, wildlife habitat mapping, sustainable forest management, and resource inventory. Airborne laser scanning (ALS) and imaging spectroscopy (IS) were the most commonly used data types in the recent studies. Most studies had been conducted in temperate forests, while tropical forests had been studied in South America and savannah systems in South Africa [9]. The review did not list any studies from a diverse agroforestry landscape in Africa with patches of shrubland and native forest, where a few exotic tree species are dominant and a high number of native species occur less frequently. Such a skewed species distribution leads to imbalance in the training data used in classification [10]. Although the cost of IS and ALS based tree species mapping is low in comparison to covering the study area on foot, airborne remote sensing is more expensive than satellite-based remote sensing. However, mapping trees at the species level with satellite-based data is challenging in agroforestry landscapes, where trees are often isolated on farmland and high resolution data are needed to detect the trees at the crown level.

Previous studies with a high number of tree species have shown a decline in classification accuracy with an increasing number of classes [11], and an increase in accuracy with a greater number of samples per species [12]. However, collecting comprehensive field reference data for all the species in a high species diversity system is challenging. The negative impact of imbalanced or limited training data on tree species classification has been approached, for example, by standardizing class sizes using down-sampling [10,13], and by using semi-supervised approaches to increase the size of training data from unlabeled observations [14]. However, we did not find studies where class sizes were balanced using up-sampling, or where different approaches to dividing the species into groups based on their spectral and structural characteristics using the Jeffries–Matusita (JM) distance were compared.
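For readers unfamiliar with the two ideas named above, the following minimal sketch illustrates them. It is not the procedure used in this study. It assumes random up-sampling with replacement to the size of the largest class, and the commonly used form of the JM distance, JM = 2(1 − exp(−B)), where B is the Bhattacharyya distance, here restricted to two univariate Gaussian classes for brevity. The function names `upsample` and `jm_distance` are hypothetical.

```python
import math
import random
from collections import defaultdict


def upsample(labels, seed=0):
    """Return sample indices in which every class is as large as the biggest one.

    Each minority class keeps all of its original indices and is padded by
    drawing additional indices from that class with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def jm_distance(mean1, var1, mean2, var2):
    """JM distance between two univariate Gaussian classes (assumed form).

    Uses JM = 2 * (1 - exp(-B)), with B the Bhattacharyya distance.
    The value ranges from 0 (identical classes) to 2 (fully separable).
    """
    b = (0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
         + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2))))
    return 2.0 * (1.0 - math.exp(-b))


# Hypothetical usage: three species with 5, 2 and 1 reference crowns.
labels = ["A"] * 5 + ["B"] * 2 + ["C"]
balanced_idx = upsample(labels)
```

If up-sampling is applied, it would normally be restricted to the training portion of the data, so that duplicated samples do not also appear in the validation set.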

According to Fassnacht et al. [9], the non-parametric support vector machine (SVM) [15] and random forest (RF) [16] are the most commonly used classifiers for tree species classification. Both classifiers have performed well in remote sensing based classifications, while neither has consistently outperformed the other [17–19]. Feature extraction and/or feature selection methods are commonly used with high dimensional data to improve the classification accuracy; in particular, the minimum noise fraction (MNF) transformation [20] has performed well in previous studies [9,18,21].

The main aim of this study was to examine tree species classification in an African agroforestry landscape with high species diversity and an imbalanced training set. The specific objectives of the study were to:


#### **2. Material and Methods**
