1. Introduction
Invasive alien plant species (IAPS), spread largely through human activity, have been identified as a growing threat to global sustainability [1,2] and are considered a major driver of biodiversity loss; invasive species rank among the top five threats to biodiversity worldwide [3]. They threaten native species, their habitats, and the functioning of ecosystems, and can also adversely affect human health and the economy [4]. Monitoring programs for IAPS are needed to identify where IAPS are most abundant, most likely to spread, or most easily contained [5]. Transportation lines such as roads, railways, and trails are critical dispersal routes for IAPS, since traffic can transport seeds over long distances [6,7]. Efficient and cost-effective monitoring of IAPS along, for example, roads is therefore an important step towards mitigating their negative consequences.
Remote sensing is increasingly being used for the detection of invasive plant species, as reported by Bradley [8] and Huang and Asner [9]. Here, the spatial analysis of plant invasions is based on satellite imagery, from which distribution maps of invasive plants are created to support decision making for management and control. Bolch et al. [10] focused on remote sensing capabilities to detect and monitor IAPS across terrestrial, riparian, aquatic, and human-modified ecosystems. The identification and remote detection of alien invasive plants in commercial forests are reported by Ismail et al. [11].
Unmanned aerial vehicles (UAVs) have become popular in remote sensing applications [12]. For monitoring IAPS, they offer a valuable solution for local areas at high spatial and temporal resolution [13,14]. However, alien species distributed along roads are impractical to monitor using UAVs. Firstly, the large distances mean that the drones would have to be charged or refueled several times to cover the motorways. Secondly, UAVs are subject to regulation and may not be flown unattended in Denmark, which makes it difficult to cover thousands of kilometers every week.
In many countries, the road network is already inspected through scheduled inspection drives. Our approach is to use these inspections to record pictures of plants along the road and construct a map of the occurrence of IAPS. Baard and Kraaij [15] demonstrated that roadside surveys can facilitate early detection of invasive plant species along 530 km of roads in the Garden Route, South Africa. McDougall et al. [16] used manual surveys of roadsides in seven regions worldwide to investigate non-native plants invading mountain vegetation.
Niphadkar and Nagendra [17] surveyed research that combines remote sensing with functional plant traits for the mapping and monitoring of invasive alien plants. They reported that morphological, biochemical, phenological, or physiological plant features can improve remote sensing mapping. Carlier et al. [18] presented an application of morphological image analysis that provides an objective method for detection and accurate cover assessment of an IAPS. They used top-down images covering 1 m × 1 m, captured with a hand-held digital camera. James and Bradshaw [19] based their work on images collected using UAVs, which allowed them to cover larger areas; instead of morphological image analysis, they used the U-net convolutional neural network to segment images semantically. The use of a convolutional neural network makes the method less affected by changes in light intensity and shadows and makes it applicable to images with plant occlusion.
In our work, we investigate whether it is possible to automate the registration of IAPS along the Danish state roads, including motorways. We use high-resolution images that are processed automatically to distinguish individual plants along the road. Since the roads are already inspected from a driving vehicle every week, a camera mounted on the vehicle that automatically scans the roadside is a labor- and cost-efficient tool. This is in contrast to, for example, manual surveys or cameras on UAVs, which would add an extra workflow.
We have tested camera equipment that can provide sufficient image quality to identify selected IAPS in the collected images at normal traffic speeds on the Danish motorways. The work includes image collection and annotation, as well as the application of deep learning algorithms for the automatic detection of IAPS in the collected images. The paper demonstrates how detected IAPS can be presented on a map based on the collected data. Finally, the deep learning algorithms are evaluated and the challenges of monitoring IAPS are discussed.
In summary, this paper makes the following new contributions:
Presentation of a camera solution to collect images of plants along the roadside at normal traffic speeds on the Danish motorways.
Evaluation of deep learning algorithms for object detection and classification of invasive plants.
Steps towards an automatic computer vision system for detection and mapping of IAPS in real-time.
2. Camera and Platform for Collecting Images
A prerequisite for mapping invasive plants automatically is camera equipment capable of recording images of sufficient quality to detect the plants. The detection must take place from a vehicle that follows the traffic on Danish motorways, that is, at speeds in the range of 100–130 km/h. Therefore, the camera must support exposure times short enough to avoid motion blur, a resolution that allows plants to be recognized, and a frame rate that ensures the full roadside is covered.
The way the camera is mounted on the car also affects the object resolution, the required exposure time to avoid blur, and the required frame rate to cover the full roadside. This is because the distance to the roadside depends on the camera orientation and motion blur depends on the direction of movement relative to the direction of recording. The following sections will go through the camera choice and camera settings.
2.1. Ground Sampling Distance
The spatial resolution is defined as the distance on the object that is covered by the distance between neighboring pixels, as illustrated in Figure 1. In remote sensing, this quantity is called the ground sampling distance (GSD), since the camera is often pointed towards the ground and the spatial resolution becomes the ground distance between two pixels ([20], p. 31). The unit is meters per pixel. Three main factors affect the ground sampling distance: the image sensor's resolution and size, the focal length of the lens, and the distance from the camera to the object (roadside) along the camera's primary axis (illustrated as 'Pixel center' in Figure 1). When the camera points in the direction of travel, the image exhibits the least motion blur and covers the largest possible area, but part of the image is useless, as it does not cover the roadside. Moreover, the large coverage comes at the expense of the ground sampling distance and, thus, the level of detail.
If the camera instead points between the direction of travel and perpendicular to it, there will be a large variation in the size of a plant depending on its location in the image. This places greater demands on the subsequent image processing, which must be invariant not only to size, due to the plants' natural variation in size at different growth stages, but also to different locations in the image.
When the camera points perpendicular to the direction of travel (i.e., θ = 90° in Figure 1), the highest object resolution (lowest GSD) and the smallest variation in object size are ensured. On the other hand, the change in the content of each pixel during the exposure is greatest with this orientation, resulting in greater image blur, so higher demands are put on the exposure time. Moreover, with an orientation perpendicular to the driving direction, the area scanned by each image is the smallest. This orientation, therefore, requires both a high shutter speed and a high frame rate, which puts extra demands on the image processing platform.
Figure 2 illustrates the relationship between the GSD and a pixel's orientation relative to the direction of travel. This figure is based on a 7 m working distance, d, corresponding to 1 m from the car to the emergency lane, a 2.55 m emergency lane, 2.5 m of cut grass, and 1 m from the grass to the plant. The camera parameters are based on a Sony IMX226 sensor with an 8 mm lens, which provides a horizontal field of view of 46°. This setup is described in further detail in a following section. Since a camera has a given field of view, the ground sampling distance depends on the location in the image. However, when the distance to the object is large relative to the focal length of the lens, the difference in GSD between individual pixels is small.
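For a pinhole model, the GSD is simply the pixel pitch scaled by the ratio of working distance to focal length. The following minimal sketch reproduces the level of detail discussed above; the 1.85 µm pixel pitch is the published IMX226 value and is our assumption, as it is not stated in the text.

```python
# Back-of-the-envelope GSD check for the setup described above (a sketch).
def gsd(pixel_pitch_m: float, focal_length_m: float, distance_m: float) -> float:
    """Ground sampling distance (m/px) of a pinhole camera at a given distance."""
    return pixel_pitch_m * distance_m / focal_length_m

PIXEL_PITCH = 1.85e-6   # m/px, Sony IMX226 (assumed published value)
FOCAL_LENGTH = 8e-3     # m, lens used in this study
WORKING_DISTANCE = 7.0  # m, car to roadside plant

print(f"GSD at {WORKING_DISTANCE} m: "
      f"{gsd(PIXEL_PITCH, FOCAL_LENGTH, WORKING_DISTANCE) * 1e3:.2f} mm/px")
# -> roughly 1.6 mm/px, consistent with the 6.2 m footprint spread over
#    the sensor's ~4000 horizontal pixels (see Section 2.3)
```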
2.2. Motion Blur and Resolution
Motion blur occurs when images are captured from a moving vehicle. The amount of motion blur in an image depends on the shutter speed, the field of view, and the camera orientation relative to the object.
Figure 3 shows the relationship between blur and orientation, based on a Sony IMX226 sensor with an 8 mm lens. In the calculation, we assume that the lens is ideal and does not itself contribute to blur. The GSD is at its minimum when θ = 90°, meaning we obtain the most detail; at the same time, the amount of blur is the greatest when the camera is moving. A short exposure time is, therefore, preferable, since it reduces the amount of motion blur (at the expense of added noise if the gain is increased).
Figure 3 shows that for our particular camera setup, a shutter speed of 1/1000 s results in horizontal motion blur of more than 20 pixels. When the blur spreads over several horizontal pixels, the horizontal resolution decreases, reducing the details captured in the images and, thus, the possibility of recognizing plants along the roadside. Ideally, the blur should be kept below one pixel, but at these driving speeds, this is not practically possible without adding external lighting.
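At θ = 90°, the ground sweeps past at the driving speed, so the blur in pixels is simply the distance travelled during the exposure divided by the GSD. A minimal sketch, assuming the roughly 1.6 mm/px GSD estimated above:

```python
# Horizontal motion blur (in pixels) at theta = 90 deg: the roadside moves
# v * t_exp during the exposure, and each pixel spans one GSD.
GSD_M = 1.62e-3  # m/px at the 7 m working distance (our estimate from above)

def blur_px(speed_kmh: float, exposure_s: float, gsd_m: float = GSD_M) -> float:
    return (speed_kmh / 3.6) * exposure_s / gsd_m

for v in (110, 130):
    print(f"{v} km/h at 1/1000 s: {blur_px(v, 1 / 1000):.1f} px")
# -> roughly 19-22 px, on the order of the >20 px read off Figure 3
```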
Plants that are far away will appear smaller than plants closer to the camera. As we want to reduce the variation in plant scale to ease automatic classification, we have chosen to prioritize a camera setup that points perpendicular to the roadside (i.e., θ = 90°). Thus, the size of the plants is approximately independent of their location in the pictures.
2.3. Camera Setup
We tested a Logitech Brio, a Stereolabs ZED2, and two machine-vision cameras: a Daheng MER2-503-36U3C with a Sony IMX264 sensor and a Daheng MER-1220-32U3C with a Sony IMX226 sensor. The Stereolabs ZED2 camera was used in a study by Sousa Guedes et al. [21], where it was also mounted on a car and used for detecting road-killed amphibians and small birds. However, tests showed that the fastest exposure times of the Logitech and Stereolabs cameras are not sufficient for this application, as plants were too blurred to be recognizable when the cameras were filming perpendicular to the travel direction.
The Sony IMX264 sensor is fast enough for the application. Moreover, it is a global-shutter sensor and, therefore, does not suffer from rolling-shutter distortion. However, its resolution is only 5 megapixels, whereas the IMX226 offers 12 megapixels. By inspecting the images from the two Daheng machine-vision cameras with 8 mm and 16 mm lenses, it was estimated that the higher resolution of the IMX226 outweighed its rolling-shutter distortion. Global-shutter cameras with comparable or higher resolution exist, but the IMX226 delivered sufficient image quality for a person to easily recognize the plant species in question, at a lower price than a global-shutter camera of the same resolution. With the Daheng MER-1220-32U3C camera and an 8 mm lens, we get a field of view of 6.2 m at the 7 m working distance from the camera to the roadside. Full coverage of the roadside requires 161 images per km, and at a speed of 110 km/h, the frame rate must be at least 4.91 FPS.
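These coverage figures follow directly from the 6.2 m footprint; the short sketch below reproduces them (small rounding differences aside):

```python
# Coverage bookkeeping for the chosen setup, using the numbers from the text.
FOV_AT_ROADSIDE_M = 6.2  # horizontal footprint at the 7 m working distance

images_per_km = 1000 / FOV_AT_ROADSIDE_M
speed_ms = 110 / 3.6                      # 110 km/h in m/s
min_fps = speed_ms / FOV_AT_ROADSIDE_M    # one new footprint per frame

print(f"{images_per_km:.0f} images/km, >= {min_fps:.2f} FPS for full coverage")
# -> 161 images/km and ~4.9 FPS, in line with the figures stated above
```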
Based on these considerations, we selected the Daheng MER-1220-32U3C camera with the Sony IMX226 sensor. The camera was mounted on the roof of the car and oriented perpendicular to the direction of travel (Figure 4a). The roads of interest in this study are the Danish state roads, which typically have two to four lanes in each direction, separated by crash barriers. Therefore, a single camera pointing to the right is sufficient in this study, although an additional camera pointing to the left could be added if the system were to be used on smaller roads with a clear view of the roadside on both sides. The camera has a C-mount, which makes it possible to change the lens, and allows manual control of parameters including shutter speed, analog gain, aperture, and focusing distance. A sample image from the Daheng MER-1220-32U3C is shown in Figure 4b. The picture shows that this camera provides high sharpness, making it possible to distinguish individual blades of grass. Even though the images are sharp, they suffer from distortion due to the rolling shutter of the sensor. This phenomenon is seen in Figure 4b, where trees lean to the right even though they are vertical in reality; however, the distortion is no worse than the deformation plants exhibit when exposed to wind.
2.4. Camera Mount and Housing
The camera is mounted in a 3D-printed housing with a lens hood, shown in Figure 4a. In addition to protecting the camera, this housing has a bracket for mounting, as well as space for the GNSS receiver. The bracket is fitted with silicone pads to dampen vibrations and is mounted on a suction cup otherwise used for windshield replacement, which allows it to be attached to any car. On top of the camera sits a GNSS receiver from Navilock with a u-blox 8 chipset. This GNSS receiver has a 2.5 m positioning accuracy (circular error probable) and a maximum refresh rate of 10 Hz when using the GPS and GALILEO satellites.
2.5. Processing Platform
With a camera that covers 6.2 m of roadside per image, 612,903 images are needed to cover one side of the 3800 km of Danish state-owned roads. Before these images provide value, it is necessary to go through them manually and annotate plants in order to create a training basis for the automated detection algorithms. To facilitate the manual annotation, we developed a remote control that can be used to mark places with invasive plants while driving; images are only saved when the button is activated. When driving at approximately 110 km/h, corresponding to 30 m/s, one risks passing the invasive plants before the button is pressed. Therefore, a buffer was implemented that constantly holds the images from the last second. Pressing the button saves the buffered images back in time up until the moment of activation, so that an invasive plant species is still recorded even if the button is pressed shortly after passing it. The images are saved on a 1 TB Samsung Portable SSD.
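A minimal sketch of such a pre-trigger buffer is shown below. The function names, the fixed 5 FPS rate, and the save path are illustrative assumptions; the actual system is implemented as C++/Python ROS nodes.

```python
# Pre-trigger ring buffer: keep the last ~1 s of frames so a late button
# press still captures the plant that was just passed (a sketch).
from collections import deque

FPS = 5                     # approximate full-coverage capture rate (Section 2.3)
buffer = deque(maxlen=FPS)  # holds roughly the last second of frames

def save_image(frame, path):
    """Write a frame to the SSD (stub; e.g., cv2.imwrite(path, frame))."""
    ...

def on_new_frame(frame):
    buffer.append(frame)    # when full, the oldest frame is dropped automatically

def on_button_press(save_dir="/media/ssd"):
    # Flush the buffered frames, i.e., save "back in time" from before the press.
    for i, frame in enumerate(buffer):
        save_image(frame, f"{save_dir}/frame_{i}.png")
    buffer.clear()
```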
A processing platform is needed to save the images and synchronize them with the GNSS receiver. We used an Nvidia Jetson AGX Xavier, as it was developed for automotive use, has a built-in GPU, and runs full Ubuntu, which makes development easy. The code for synchronizing the camera, GNSS receiver, and remote control is written in C++ and Python, with the components communicating through the Robot Operating System (ROS).
4. Data Annotation
To train and evaluate the detection algorithms, the collected images were annotated according to which IAPS were present in each image. The images contained both invasive plants and background objects (non-invasive plants, road, roadside, sky, traffic signs, etc.); however, many of the collected images contained only background objects. The annotations, therefore, needed to capture both which IAPS are present in a given image and where in the image they appear.
The images were annotated using CVAT (v1.0.0, https://github.com/openvinotoolkit/cvat, accessed on 10 September 2021) according to the seven IAPS that were observed during the data collection. Each IAPS occurrence found in an image was annotated by drawing a polygon around it; depending on the plant density, multiple plants were included in the same polygon or split into separate polygons. The annotations were performed by experts at The Danish Road Directorate and took roughly 40–50 person-hours. In total, 12,263 polygons were annotated in 8387 of the 14,854 collected images (Table 1). Of the 8387 images with annotations, 209 included annotations of multiple IAPS; the remaining 6467 images did not contain any IAPS.
Training, Validation, and Test Sets
When training a deep convolutional neural network, it is important to split the data into a training set, a validation set, and a test set. The training set is used for optimizing the parameters of the network, while the validation set is used for monitoring the performance of the network during training and for comparing the performance of different networks with, for example, different hyperparameters or network structures. The test set acts as a final evaluation of a network. To ensure a proper evaluation on both the validation set and the test set, samples in each set must be independent and have similar class distributions.
The full data set can be split in various ways. The naïve approach is to randomly sample images from the full data set. However, because the images were captured in close proximity and potentially have overlapping fields of view, information may leak between the data sets under the naïve approach. Another approach would be to split the data based on image acquisition dates. Unfortunately, the annotated species are not evenly distributed across the acquisition dates (Table 1). This is especially evident for Solidago and Reynoutria, which were primarily observed on 15 September 2020 and 6 October 2020, respectively. In addition, images may be acquired from the same location across multiple dates, which would make the split data sets non-independent.
Therefore, to ensure independent data sets, the images were clustered based on the locations at which they were acquired. Images less than 40 m from each other were assigned to the same cluster (with a maximum speed of 130 km/h and a GPS update rate of 1 Hz, the maximum distance between adjacent images was 36.1 m). This distance means that images on opposite sides of the motorway could be close enough to be assigned to the same cluster. Images from the same cluster were then restricted to the same data set. To ensure similar class distributions, each cluster was weighted based on the image class distribution within the cluster. The clusters were then randomly shuffled and iteratively added to one of the data sets. Before adding a cluster to a given set, the class distribution of the updated set was compared to the expected distribution of that set using a χ²-goodness-of-fit test. The cluster was added to the set that showed the largest decrease in χ²-value when adding the cluster. If none of the sets showed a decrease in χ²-value, the cluster was added to the training set, as it was significantly larger than the validation and test sets.
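The sketch below outlines one possible realization of this greedy, cluster-level split; the helper names and the exact χ² bookkeeping are our assumptions, not the authors' code.

```python
# Greedy cluster-to-set assignment driven by a chi-square goodness-of-fit
# statistic (a sketch of the procedure described above).
import random

SPLITS = {"train": 0.70, "val": 0.15, "test": 0.15}

def chi2(observed, expected):
    """Chi-square goodness-of-fit statistic between class-count vectors."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def split_clusters(clusters, class_totals):
    """clusters: per-cluster class-count vectors; class_totals: overall counts."""
    set_counts = {name: [0] * len(class_totals) for name in SPLITS}
    assignment = {name: [] for name in SPLITS}
    order = list(range(len(clusters)))
    random.shuffle(order)
    for idx in order:
        counts = clusters[idx]
        best_name, best_delta = "train", 0.0  # default to training (per the text)
        for name, frac in SPLITS.items():
            expected = [frac * t for t in class_totals]
            delta = (chi2([s + c for s, c in zip(set_counts[name], counts)], expected)
                     - chi2(set_counts[name], expected))
            if delta < best_delta:  # largest decrease in the chi-square value
                best_name, best_delta = name, delta
        set_counts[best_name] = [s + c for s, c in zip(set_counts[best_name], counts)]
        assignment[best_name].append(idx)
    return assignment
```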
Before splitting the full data set, but after clustering, the data set was cleaned. The data cleaning procedure included removing images with Heracleum, "Multiple species", and overlapping annotations. Images with Heracleum were removed due to the low number of samples compared to the other classes. Images with "Multiple species" were removed to avoid ambiguity in the image classification algorithm. Images where two annotations had an intersection over union larger than 75% were removed to avoid ambiguity in the object detection algorithm. In total, 22 images of Heracleum, 209 images of "Multiple species", and 19 images with overlapping annotations were removed.
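As an illustration, the overlap rule could be implemented with polygon geometry as sketched below; the use of the shapely library is our assumption and not necessarily the pipeline actually used.

```python
# Flag images whose annotation polygons overlap too much (IoU > 0.75).
from shapely.geometry import Polygon

def iou(a: Polygon, b: Polygon) -> float:
    """Intersection over union of two annotation polygons."""
    inter = a.intersection(b).area
    return inter / (a.area + b.area - inter) if inter > 0 else 0.0

def has_ambiguous_overlap(polygons, threshold=0.75):
    """True if any pair of polygons in an image overlaps with IoU > threshold."""
    return any(iou(p, q) > threshold
               for i, p in enumerate(polygons)
               for q in polygons[i + 1:])
```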
The full data set was split into a training set, a validation set, and a test set with a 70%/15%/15% split based on the images in each set (Table 2). Examples of the six IAPS from the training set are shown in Figure 6; additional examples can be found in the Supplementary Materials.
7. Discussion
Recent developments in machine learning have made it possible to use automated camera-based solutions for plant ecological monitoring [30,31]. A prerequisite for the automatic recognition of images is that the images are of sufficient quality. In this work, we have tested whether the prerequisites are in place to use machine learning for the automatic recognition of invasive plants along Danish motorways.
Four cameras were evaluated, of which the two machine-vision cameras provided sufficient image quality when shooting perpendicular to the direction of travel. The selected camera allowed for short exposure times, so that images could be captured with minimal motion blur from a car driving at high speed (>110 km/h) on the motorway. As the maximum driving speed is the limiting factor for the required exposure time, the camera system can be transferred to smaller roads with slower driving speeds. On roads other than motorways, however, the distance from the camera to the plants will often be shorter, as there is rarely an emergency lane between the road and the roadside. This means that the plants will cover a larger part of the image if the current optics are retained. The classification results showed that while the majority of the plants were small (<25% of the image), the medium-sized plants (25–65% of the image) were the easiest to recognize (Figure 8). Assuming that plants are closer to the camera on smaller roads, they will cover a larger part of the image, and thus the small plants will likely become easier to recognize.
Plant species vary in appearance during their growth stages and are typically easiest to identify when they flower. A machine learning model trained on images of a plant species in one growth stage may not perform well when confronted with images of the species in another growth stage. The performance of our detection model would likely improve considerably if training and test data were restricted to images of the plant species in bloom. However, from a management point of view, it may be desirable to identify the IAPS as early as possible, e.g., before flowering, to combat the IAPS before they spread further.
However, a single inspection is not sufficient, since the plants do not flower at the same time. It will, therefore, be necessary to collect photos at different times of the season. In this study, we only collected photos from August to October. This showed that plants can be recognized, but that the detection rate decreases when the training and test images are taken at different stages, for example, during flowering in August and after flowering in October, respectively. Lupinus polyphyllus, for instance, is very distinctive with its colored flowers but far more inconspicuous after flowering. We, therefore, believe that the recognition results can be improved if the plants are in bloom both in the images used for training the system and in the images collected during monitoring.
Consequently, images should be collected for an entire season before the system is able to provide support, unless images are annotated immediately after collection and used for training the system. Still, ongoing supervision will be necessary to minimize false negatives, i.e., IAPS that are overlooked. These will typically be plants that grow under conditions not previously seen in the training material.
There are various sources of street photos besides the camera used in this study, for example, Cowi Gadefotos, Mapillary, and Google Street View. However, to increase the probability of the plants being detected, the images should be taken at the right time in terms of plant development and flowering, when the plants are most distinct. The recording must, therefore, be synchronized with the development of the plants, which cannot be ensured with existing street-photo sources. We, therefore, believe that a dedicated camera system for detecting IAPS is advantageous.
We used two different paradigms in image recognition: per-image classification (ResNet50v2 and MobileNetv2) and bounding-box detection (YOLOv3). In per-image classification, we assign a plant species to the entire image but do not consider where the plant is located. In bounding-box detection, we detect both the plant species and its location in the image. Although YOLOv3 provides the greatest level of detail, it also requires more time to annotate its training data. Moreover, only a small fraction of the images collected in 2020 contain more than one invasive species, and pointing out the location within an image that covers only a few meters of roadside is not needed when controlling the plants. In terms of detection rate, this study found no clear winner between ResNet50v2 and YOLOv3.
Plants often grow in clusters. This means that the consequences of overlooking IAPS in an image are limited if the IAPS are detected in neighboring images. Furthermore, detections in neighboring images can be used to adjust the confidence of a detection if the same species is predicted with high probability in the neighboring images.
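One simple realization of this idea, given here as an assumed sketch rather than part of the evaluated system, is to average each image's class scores with those of its immediate neighbors along the road:

```python
# Neighbor-based confidence adjustment: average per-image class scores
# over a small window of consecutive roadside images.
import numpy as np

def smooth_confidences(scores: np.ndarray, window: int = 1) -> np.ndarray:
    """scores: (n_images, n_classes) class probabilities, ordered along the
    road. Returns scores averaged with up to `window` neighbors on each side."""
    smoothed = np.empty_like(scores, dtype=float)
    n = len(scores)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        smoothed[i] = scores[lo:hi].mean(axis=0)
    return smoothed
```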
ResNet50v2 and MobileNetv2 are more resource-efficient and can run in real time on the current processing platform without further optimization. YOLOv3 can run in real time on the processing platform with scaled-down images if the entire roadside is to be scanned. If the algorithm runs in real time on the onboard computer, saving the images for later processing can be avoided; however, saving images is still necessary to improve the training material and to assess the performance.
Previous studies have focused on the recognition of invasive plants based on their spectral reflections. This includes He et al. [32] and Lass et al. [33], who used hyperspectral remote sensing to detect invasive plant species. Bradley [8] noted that a disadvantage of hyperspectral aerial photography is its price, which speaks against using hyperspectral cameras. Since 2011, when He et al. [32] published their study, hyperspectral images from satellites have become cheaper and of higher resolution. It is, therefore, likely that hyperspectral satellite images will become a competitive alternative to aerial recordings. However, a disadvantage of hyperspectral photography is that the images must be calibrated to local conditions at each recording, since the nutrient status of the plants has a great influence on their hyperspectral reflectance. This argues for including textural information, which is more robust to differences in nutrition.
Bradley [8] argued that spatial resolution is important for recognizing plants in aerial photographs based on textural features and states that the limit is 1 px per m², which is what was available from the aerial photos. This makes it possible to detect large plants and trees. However, we would argue that accurate recognition based on textural features requires significantly more than 1 px per plant unless the plants grow in distinctive clusters; a significantly higher resolution is, therefore, required. Since Bradley [8] published her work in 2014, however, the achievable resolution has become significantly finer, largely due to the availability of affordable UAVs with high-resolution cameras, which can bring the resolution above 1 px per cm². Moreover, convolutional neural networks only make sense if the plants have a spatial extent in the image. As a starting point, we argue that the resolution must be sufficient for a trained person to recognize the object, as this is required to annotate the training data and ensures that the image contains sufficient information for the plants to be identified.
The roadsides are some of the longest continuous areas in Denmark where nature is allowed to remain relatively undisturbed, except for the vegetation close to the road, which is mowed to increase roadway safety. This makes the roadsides interesting for the detection of invasive plants, as the plants can spread over long distances [6,7]. However, it is important to keep in mind that since we only scan the roadside, there is a significant geographical bias, as also noted by Wolmarans et al. [34]. The results can, therefore, only be used to monitor invasive plants on the roadside and not to determine the spread of invasive plants away from roads. Since our method was developed for roadside monitoring, it is not directly applicable to scanning large flat areas, as is possible with aerial photos. In return, our solution scans plants from a side view, in which plants are often easier to recognize than from above, since that is the viewpoint from which we usually see them. Furthermore, we assess that our method is robust to moderate plant stress, as opposed to methods based solely on spectral analysis. Although our method was developed for mounting on cars, a slight modification would make it possible to mount the equipment on a low-flying drone, whereby large areas could be scanned. This would provide significantly higher resolution than normal aerial photos and satellite images, allowing smaller plants to be detected, although the capacity would be significantly lower than that of traditional aerial and satellite imaging. For scanning along roads, we assess that our equipment has sufficient capacity, as it only requires mounting on the roof of vehicles that are already driving on the road.
8. Conclusions
This article has demonstrated a camera-assisted roadside monitoring system that makes it possible to record images of invasive alien plant species along the Danish state roads at high speed. It was subsequently possible to identify the plant species in the images with reasonable precision and recall using deep learning. The developed data collection platform, consisting of a machine-vision camera, a GNSS receiver, a remote control, and a processing platform, enabled high-resolution image acquisition at speeds up to 130 km/h on the Danish motorways. Multiple off-the-shelf deep learning models were trained and evaluated for detecting six invasive alien plant species (Cytisus scoparius, Lupinus polyphyllus, Pastinaca sativa, Reynoutria, Rosa rugosa, and Solidago) in the collected images. The models showed reasonable precision and recall in recognizing the invasive alien plant species. However, there is still room for improvement, particularly with respect to Lupinus polyphyllus and Pastinaca sativa, which were difficult to detect, probably due to the lack of images of flowering individuals. Future work may include additional invasive alien plant species, such as Heracleum, which was excluded from this work due to a lack of data material, or plant species considered invasive in countries other than Denmark. Further improvements in precision and recall may be achieved by collecting images while the plants are in flower.
Based on the results presented in this article, we believe that a real-time system that automatically maps invasive alien plants, with full coverage at driving speeds of 110 km/h, will be possible in the near future.