2.1. Study Site
Namibian encroacher bush species are native, and their expanding growth is a result of multiple and complex interactions between various woody vegetation drivers. Internationally, such as in North America [
7,
8], Australia [
9], and Sub-Saharan Africa [
10], these factors are driven by climate, increased livestock production, and changes in fire patterns. The same factors remain true for Southern Africa such as Botswana [
11], Namibia, and South Africa [
10]; however, in these countries, the growth is additionally increased by a loss of browsing herbivores, wildlife and livestock pandemics, rainfall, and soil types [
12,
13].
Governmental requirements recommend a method of bush thinning, which stipulates that the bush still serves as food and shelter for wildlife. Furthermore, these stipulations acknowledge the fact that the bushes are indigenous and belong in the Namibian savannah, but due to their invasive nature, they should be harvested. Bushes with diameters greater than 18 cm and heights of greater than 4 m must be excluded from harvesting [
14,
15]. If bush harvesting occurs according to the methods as stipulated by the Forest Stewardship Council (FSC) and the forestry guidelines as described above, it is found that biodiversity across flora and fauna could increase [
16]. Current manual, mechanical, and chemical methods of de-bushing cannot successfully meet these requirements.
In the Namibian agricultural sector, three areas could benefit from de-bushing: land use, livestock feed, and charcoal production. Bush invasion results in an imbalance in the bush-to-grass ratio, as well as a decrease in biodiversity and livestock carrying capacity and is associated with economic loss [
12]. It is estimated that the area occupied by encroacher bush in Namibia is between 26 and 30 million hectares [
17]. Financially discounted over 25 years, de-bushing encroacher bush potentially has a net present value of 48 billion Namibian Dollars. This value takes into account financial gains driven by ecosystem benefits, such as groundwater restoring, charcoal production, electricity generation, and firewood collection. Restoration of groundwater levels means that this resource becomes accessible and can be used for agricultural purposes such as irrigation of crops. Charcoal from encroacher bush can be exported to other countries and thus create profits [
14]. It often occurs that employees use protected tree species instead of bushes, as tree stems are thicker, with a higher energy-density and mass. Due to these properties, trees result in higher charcoal yields. The use of trees is however prohibited by law, and a farmer might loose the FSC certification if continuous tree harvesting is detected. In this sector, a computer vision tool is desirable for two parties: the farmer can not only identify high yield areas with this tool, but can also ensure that employees harvest according to the law and certification guidelines. Furthermore, the government could use such a tool for controlling purposes. Especially for FSC, computer vision identification could be of interest, as the officials currently only originate from Europe. If certification controls have to be conducted, these employees first have to gain the necessary botanical skills to be able to distinguish between protected and harvestable species. The tool could overcome this step, and controls could be conducted without training. Electricity generated nationally from encroacher bush replaces electricity and coal imports, which in turn saves money. Financial losses in this model are incurred during the de-bushing process, as this requires capital investment and additional livestock emissions, which negatively impact the climate [
18]. This net present value is 16 % of the country’s annual gross domestic product (GDP) [
19].
From an environmental perspective, it is seen that the bush has a wide root network and leaf canopy, which suppresses the growth of grass and other plants within its proximity, as shown in
Figure 1. This results in a reduction of biodiversity and the degradation of the savannah land. Not only is potential grazing space eliminated, but the penetration of water into groundwater wells is also prohibited. It is estimated that a power plant, fired with encroacher bush, with an installed capacity of 20
could save up to
of water over its lifespan [
14]. These encroacher bushes are however not alien plants and are indigenous to the country. However, due to improved conditions, rapid growth by these species has been supported. The overall aim is not to eradicate these bushes, but to re-claim rangelands [
15]. To ensure that the thinning method is achieved, as described by the governmental institution, a bush identification program might be useful. It could help farmers to harvest within these set boundaries and thus provide a profit for the economy and ensure environmental stability.
To minimize population migration, computer vision could be utilized, which would limit the quantity of workers needed.
Figure 2a,b compares the Namibian population distribution and the extent of bush encroachment. From this figure, it can be noted that the main workforce of the country resides in the north. However, encroacher bush is distributed throughout the country. With the current harvesting methods, this would result in a large portion of the population having to migrate to other parts of Namibia, as the harvesting of encroacher bush delivers temporary work opportunities for the northern population. This would ensure that de-bushing would be conducted to achieve the greatest economic and environmental potential. Migration, however, could potentially result in cultural tension, as 13 different ethnicities reside in the country [
21]. Furthermore, this would also result in additional cost, as new infrastructure would have to be established. Due to the low population density of the country, which is estimated at 3 people/
, all major cities are spread across the country, limiting available infrastructure.
Another aspect, which favors the desirability for a plant identification tool, lies in the versatility and distribution of encroacher bush throughout the country. From
Figure 3a–g, it becomes evident that there are multiple encroacher bush species. These however have adapted to grow in different areas throughout the country, such that not all species are growing in the same location. Such a distribution and the multiplicity of species require harvesters to have botanical knowledge. They first need to be educated, regarding the selection of harvestable species, such that the correct taxonomy is used for the appropriate application. Computer vision would eliminate the educational need, as only a limited number of people would have to be able to identify the plants correctly. This knowledge would then be used to develop an accurate database from which ML-algorithms can learn.
Despite the opportunity to benefit from the removal of invasive bush, the environmental regulations need to be observed by either the government or FSC officials, and this process is currently lacking the required knowledge of identifying various plant species in order to apply government regulation. Identification of bush becomes important, as unskilled personnel might be incapable of identifying the correct bush type, e.g., for charcoal purposes. Furthermore, to make it easier for the producers to comply with governmental guidelines or FSC regulations, a method for accurate bush identification is needed. With the addition of Global Positioning System (GPS) tagging, areas with a high amount of biomass can be detected and tagged, to identify areas with a high economic impact and to protect trees at the same time. The scope of this project is set to cover two tree species (Boscia foetida and Vachellia erioloba) and multiple bush taxonomies. The aim is to classify plant families and not subspecies, i.e., classify a bush and a tree class. This should suffice as a proof of concept that current computer vision technologies are suitable for such datasets.
2.3. Method
The aim of the computer vision program was to identify foliage and wood segments within images. This would then form a tool for bush monitoring. The program consisted of a training part, a search section, and a final boundary output. For this tool, a video stream taken from a car, driving through the Namibian savannah, served as the input. From the stream, frames were extracted and analyzed by the tool. The output from the developed application showed a bar chart, indicating the occurrence of foliage and wood classes. Foliage represented bush prevalence, and wood indicated tree occurrence. Together with the GPS coordinates of the vehicle, a complete monitoring tool could be established.
The flowchart of the algorithm is displayed in
Figure 4. At first, an ML classifier called random forest (RF), developed by [
23], needed to be trained. Random forest classification makes use of multiple decision trees, where a decision tree is a stratification technique. Here, the predictor space was subdivided into a tree-like shape, and classes were identified according to the tree structure. For successful training, a large quantity of data was necessary. Due to the tedious process of capturing images of the desired taxonomies, it was decided to augment the photographic database and thus gain more training data. For this, operations such as tilt, zoom, and vertical axis flip were performed to increase the database by 200 %. These operations were taken from the Augmentor library [
24]. In the diagram below, this step is indicated by augmentation. The next step is to loop over the class labels, as indicated by label. These labels are “foliage”, indicating biomass yield areas, and “wood”, showing the occurrence of protected species.
In the images, noise in the form of grass, sand, and sky was present. This had to be filtered out, indicated by the
Filter operation. During the operation, first, the RGB channels were split. On each channel, dilation and a median blur were applied. During experimental exploration, it was found that a square dilation kernel matrix of size 600 (600 × 600) and a median blur square kernel matrix of size 21 (21 × 21) eliminated shadows satisfactorily over the widest range of images. Thereafter, the absolute difference between the current channel and the blurred channel representation was computed. This result was then subtracted from 255 (the maximum luminance value used in digital images). The final step was to merge all three channels back into one image, thereby removing all regions with unfavorable contrast. This image was then converted into a Hue Saturation and Value (HSV) representation, in which the desired color range ((0, 0, 100)–(255, 255, 255)), representing sky shades, was clipped, resulting in a perfect filter. After filtering, only the desired flora was left in the images. These filtering methods were standard image manipulation steps and widely used in the field of computer vision [
25,
26,
27]. The kernel sizes were tuned until the noise was visibly removed in 75% of all training images. The test results supported that the RF sufficiently learned the features from the desired taxonomies and that the kernel sizes were adequately chosen.
Figure 5a shows a noisy image in which the contrast was non-ideal. In
Figure 5b, the removal of undesired contrast regions is demonstrated, and finally, in
Figure 5c, the color segmented result is displayed, where all grass, sand, and sky are removed.
From the filtered images, information now needed to be extracted, such that the RF could be trained to predict the labels of classes. It was found that the histogram feature extractor worked exceptionally well on the dataset. This step is indicated by “Extraction” in
Figure 4. In computer vision, a histogram represents the pixel intensity distribution of an image. If the histogram’s bins are chosen to collect every possible intensity occurrence, a histogram would range from 0–255 [
28]. This information was used to gain knowledge about the contrast, intensity distribution (tonal range), and brightness. It was found that extraction from filtered HSV images worked well, as bushes were represented in bright green color and wood/trees in dark red. Such an unfiltered representation can be seen in
Figure 6. Furthermore, histogram bin sizes of 3, 4, and 8 were shown to extract data suitable from which the RF could learn. In short, training was done by filtering the images twice, first to remove undesirable contrast areas and a second filter serving to remove sand, grass, and sky patches in the images. With a histogram feature extractor, a feature vector was gained, which could then be used to train an RF classifier.
Scaling and encoding showed that the extracted features were scaled between zero and one. Scaling was necessary, as the ML algorithm could optimally recognize patterns if the features were within a set range. The labels were encoded from alphabetical names to numerical values, such that they were usable by the ML algorithm. All these values were then stored during the
Save Features operation under the
.h5 format, suitable for large matrices. The operation
RSCV means Randomized Search Cross-Validation, which was a hyperparameter-tuning method for the ML. Here, the hyperparameter space was randomly searched for the best cross-validation score. The hyperparameters tuned were “number of estimators”, “minimum samples to split”, “maximum forest depth”, a binary decision for “bootstrapping”, and a classification minimization “criterion”. The details of the hyperparameters were described by [
23].
After tuning the ML hyperparameters, the plant identification operation started. Here, eighty new testing images were extracted from video streams and were used to test the classifier. The video camera was mounted to a car to prove the concept of the monitoring tool, and images were extracted from the video sequence at every 10th frame. At a car speed of approximately 25 / and a frame rate of 30 frames/s, each image covered . A filter was again applied to these images, which removed all the noise. During the search, a sliding window was applied to the image. The area in the window was converted to the HSV color space before the histogram feature extractor was applied. In summary, the search method consisted of first tuning the hyperparameters of the RF with the training data. The test images had to be filtered in a similar fashion as in the training section so that comparable results could be obtained. Searching then commenced after filtering using a sliding window technique on the filtered images, but in the HSV color space, as the taxonomies showed distinct features in this color space. Search results were stored and conditioned for the Output stage.
With this information, the RF could predict the class. RF was chosen as the ML algorithm, after testing a total of six algorithms. Testing was conducted on a small dataset comprised of unfiltered images with only the two tree classes. This was done to pre-select an ML-algorithm, which best performed on a similar dataset. The preconditioning and preprocessing of the entire dataset would then only focus on a single algorithm and not on an entire range, which saved time. Algorithm pre-selection was done according to a fivefold training set cross-validation score, conducted on 1000 images, 500 images per class. All classifiers tested during this stage and their cross-validation score are listed below, where the cross-validation score is indicated in brackets. As is evident from this list, RF was at this stage already the best performing algorithm, such that all other classifiers were rejected.
Logistic regression (0.962).
Gaussian naive Bayes (0.729).
Linear discriminant analysis (0.968).
K-Nearest neighbor (0.968).
Support vector classifier (0.959).
Random forest (0.970).
If one of the two classes was found, the coordinates and the width and length of the sliding window were saved during the
Search Results operation as a .txt file. After all images were analyzed, the stored data from the .txt files were used to map out regions, where the instances found were mapped onto empty images. The output from this technique was a mixture of regions, where foliage and tree classes were identified. For example, if there was a tree in the center of an image, the program would identify the leaves as foliage and the branches and stems as wood. To obtain a uniform output, the Simple Linear Iterative Clustering (SLIC) [
29] algorithm was used. SLIC is a method where similar pixels are clustered into superpixels so that image regions that contain similar information are grouped. This simplified the image representation and reduced the runtime during the identification procedures, as the image as effectively reduced from eight mega pixels to only a few superpixels. This resulted in a homogeneous region of interest, summarizing the most prevalent class. The final region of interest can be viewed in
Figure 7, where red, vertically hatched areas indicate that the wood/tree class was found and green, diagonally hatched regions show that the foliage/bush classes were identified.
The output from this program was an area output. This was a bar chart that indicated on its ordinate the fraction of image area covered by each class. The abscissa showed the distance covered by the vehicle, calculated with the assumption that the car was traveling at 25
/
. The bar chart output of a video-stream analysis can be viewed in
Figure 8. This successfully indicated areas where a large quantity of foliage was present (green with diagonal hatching). This meant that these areas had a high biomass yield potential. Furthermore, the red carets show that there were images where wood was present, which indicated that there were trees in these images, and therefore, caution was advised when harvesting. This program sufficiently proved the concept that a monitoring tool could be created that could identify foliage and woody areas. Above the image, sample images from the sequence are shown, which corresponded to the abscissa indices, i.e., the first image was taken at 0 m distance covered, and the second image corresponded to 32 m covered. All codes were published and are publicly available under [
30]. In conclusion, the
Output stage finalized the label of the found regions of interest. For this, the SLIC algorithm was used as it pre-determined similar regions. Predominant regions were then colored with their corresponding labels, as determined by SLIC. The user then received an output chart, indicating biomass yield areas and areas with a high probability of protected species occurrence.
The developed program worked with sufficient accuracy within its environment. For the performance measure, the program decision boundaries were compared to ground truth boundaries. For this, ground truth boundaries were manually drawn on the test sequence, and the number of correlating program decision boundaries’ pixels was recorded in confusion matrices for each image. Correlating pixels were found using the accuracy score as displayed in Equations (
1) and (
2) [
31,
32]:
In the above equation, TP, TN, FP, and FN are True Positive, True Negative, False Positive, and False Negative, respectively. The computation of this metric was simple and fast and allowed for a rapid evaluation of the model’s accuracy. A final confusion matrix was computed by taking the sum of all individual matrices. Each image contained pixels, so that the entire validation sequence of 40 images contained pixels. In these images, a total of pixels represented the class, where pixels represented the foliage class and pixels represented the wood class. All remaining pixels formed areas where noise existed. Taking the total of all class-representing pixels as the basis of a confusion matrix, the following fractions could be determined: