1. Introduction
In recent years, unmanned aerial vehicles (UAVs) have become a driving force in some of the most important sectors of the economy. Their benefits are hard to ignore, and their versatility makes them suitable for almost every industry. UAVs, easily equipped with cameras and range sensors, can cover large areas in short periods of time while inspecting, recording and building maps. Sectors such as construction [1,2] use UAVs for asset monitoring, while surveillance applications [3,4] focus on autonomous monitoring of homes and businesses. In the field of conservation and exploration, there are projects using drones to monitor the natural environment and wildlife [5] or to discover the extent of ancient buried civilizations [6].
The integration of robots into the renewable energy sector has also been growing [7]. Given the alarming concerns around global warming, researchers are now looking at ways to reduce costs and improve the performance of wind and solar plants. In this direction, the DURABLE project [8] considers the collaboration of a heterogeneous multi-robot system to automate solar panel inspection and repair tasks. In this joint project, a subset of UAVs provides fast inspection of the solar plant, while Unmanned Ground Vehicles (UGVs) act as inspectors of individual solar panels as well as charging stations for the UAVs. This functionality requires the UAV to autonomously follow and land on the UGV.
This paper proposes a vision-based system that enables a UAV to autonomously follow a moving UGV and to land on it, with the UGV serving as a landing platform. The main contributions are the following:
A custom-designed landmark pattern composed of ArUco markers [9,10] and a method to estimate the UAV relative position and heading w.r.t. the UGV are presented.
A hierarchical controller for following and landing that runs onboard the UAV and relies exclusively on the markers is developed. Concretely, an Ardupilot flight controller is employed as the low-level autopilot, and a single-board computer (SBC) implements a high-level trajectory-tracking controller for the relative 3D position and heading, with a trapezoidal-profile speed generator as feedforward and four decoupled PI controllers in the feedback loop.
Tests in a realistic heterogeneous multi-robot simulator as well as in a real-world outdoor scenario are presented.
The rest of the paper is organised as follows. Section 2 highlights the main contributions of the work with respect to the most related studies. Then, Section 3 describes the aerial and ground vehicles used in this work as well as the simulation tools and auxiliary equipment. In Section 4, the autonomous following and landing system is presented. Simulated and real experiments are discussed in Section 5. Finally, conclusions, acknowledgements, and references complete the paper.
2. Related Work
Previous work in this area has explored different markers and control strategies to safely perform the task of following and landing a UAV. Works such as Baca et al. [11] and Falanga et al. [12] have used a custom landmark consisting of a crossed circle surrounded by a rectangle and rely on range finder sensors, whereas Polvara et al. [13] used only the crossed circle as a reference point.
In the marker processing stage, Baca et al. [11] and Falanga et al. [12] apply adaptive thresholds in which the shapes are detected in a predefined order and matched against previously known templates. Past this stage, Baca et al. [11] follow a Model Predictive Control (MPC) strategy to track the moving target, whilst a commercial flight controller provides the measurements of the UAV position, velocity and orientation. These measurements are corrected by a differential RTK (Real-Time Kinematic) GPS through LKF (Linearized Kalman Filter) fusion, while the vertical position estimate is assisted by a TeraRanger range finder and the landmark detection algorithm.
On the other hand, Falanga et al. [12] follow a non-linear control strategy that drives the quadrotor towards the desired trajectory using a high- and a low-level controller. The high-level controller takes the difference between the reference and estimated position, velocity, acceleration and jerk as inputs and returns the required collective thrust and body rotations. The low-level controller takes the outputs of the high-level controller and computes the torques to apply to the rigid body. The work developed by Polvara et al. [13] has been tested only in simulation and takes a slightly different approach, as it implements a hierarchy of Deep Q-Networks (DQN) for each step of the landing phase: landmark detection, descent manoeuvre, and touchdown. It also uses a PID (Proportional–Integral–Derivative) controller to assist the final touchdown manoeuvre.
In Lange et al. [14], the authors use a sequence of rings surrounded by a hexagonal shape as the reference marker. The visual tracking system identifies the landing pad through the unique radius of each circle, making it distinguishable at high and low altitudes. The marker detection relies on image segmentation with a fixed threshold and on image invariant moments. Regarding the control actuation, the algorithm starts by correcting the measured distance to the landing pad with the current pitch and roll angles. Following this step, a PID controller takes these corrections and computes the motion commands needed to keep the UAV steady above the centre of the landing pad. In their setup, a ground station is required to process the images from the onboard camera, run the PID control loop and generate the motion commands.
In this line of thought, the works developed by Lee et al. [7], Hui et al. [15] and Cabrera-Ponce and Martinez-Carranza [16] also use custom markers for detection. Hui et al. [15] employ a white circle with a 20 radius, Cabrera-Ponce and Martinez-Carranza [16] rely on a flag and an H-shaped tag, whereas Lee et al. [7] focus on a red rectangle placed on top of the moving target.
Although tested and proven with good results, custom markers make the whole detection, tracking and landing pipeline computationally more expensive. Compared to fiducial tags, for which off-the-shelf open-source detectors are available, the entire procedure also becomes harder to implement.
To the best of our knowledge, there are different kinds of fiducial markers, among which ARTag, AprilTag, ArUco and STag are the most common [17]. The works developed by Delbene et al. [18] and Gautam et al. [19] use AprilTags to assist the landing, whereas Chang et al. [20] rely on ArUco tags.
Delbene et al. [18] propose a methodology that estimates the target's relative pose and velocity, employing not only AprilTags on the landing platform but also ultrasonic sensors on the UAV. According to the authors, the ultrasonic sensors added robustness during the final landing phase, given the unreliability of the measurements obtained with the AprilTags. Although tested in simulation with a realistic recreation of the landing platform behaviour, the work does not present tests in a real-world marine environment. Moreover, the ultrasonic sensor only provides the altitude, and it is often unreliable due to its small field of view. This constraint leads to a poor cost-performance trade-off, given that the sensory system has another input to process. As presented in our work, the markers should be enough to estimate the UAV relative position as well as the heading.
Gautam et al. [19] address the same problem by proposing a vision-based guidance approach with a log-polynomial closing velocity controller integrated with pure pursuit guidance. In their work, the landing pad detection algorithm uses a combination of colour segmentation and AprilTags to ensure flexibility and detectability from low and high altitudes. For better altitude estimates, the authors also used a LiDAR. In this work, the vision pipeline chooses a random AprilTag as the landing target centre, which it keeps tracking during the landing phase. If the camera system loses this marker, the algorithm re-initializes the tracking with a new randomly selected AprilTag. This idea seems rather unusual, as it focuses on one randomly selected tag at a time. Furthermore, the approach uses AprilTags to assist in the landing but not for pose estimation.
The work of Chang et al. [20] proposes an autonomous landing system based on the implementation of a ground-effect trajectory. Regarding the UAV position estimation, the work exploited a sensor-fusion algorithm based on a Kalman filter that combines Inertial Measurement Unit (IMU) data, stereo depth information, ArUco markers and a YOLO object detector. Although the approach minimizes the demand on the UAV payload whilst maximizing the usage of the available computational power, having the computation unit located exclusively on the ground vehicle seems feasible for landing purposes but not achievable in general application cases.
Rodriguez-Ramos et al. [21] developed a deep reinforcement learning strategy for autonomous UAV landing on a moving platform. The work focuses on indoor scenarios, employing an OptiTrack motion capture system (Mo-cap) to accurately localise both vehicles, as well as a workstation to implement the UAV controller and command it through a wireless link.
3. System Description
The presented work has been tested on a system consisting of a drone and a ground vehicle. The aerial platform is based on a DJI F550 hexacopter (see Figure 1). It is equipped with a Hex Cube Black flight controller with a vibration-damped IMU and a Here+ GNSS receiver. It runs Ardupilot and provides takeoff functionality and a guided mode to externally control the drone horizontal location, altitude and heading [22]. A Jetson Xavier NX onboard computer with a 6-core ARM CPU, a 384-core NVIDIA GPU and 8 GB of RAM running Ubuntu and ROS is used for high-level tasks, including pose estimation and speed set-point generation to command the drone. Communication between the flight controller and the companion computer is achieved using a serial interface and MAVROS, a MAVLink-to-ROS gateway with a proxy for Ground Control Stations [23]. Images are provided by the RGB monocular camera of an Intel RealSense D435 device mounted on board the drone. The UAV is powered by two lithium polymer (LiPo) batteries connected in parallel with a total capacity of 8000 mAh, allowing a flight time between 12 and 15 min.
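As an illustration of how such speed set-points can be streamed to the autopilot through MAVROS, the following minimal sketch publishes velocity commands from a ROS node. It is not the authors' implementation; the topic name and update rate are assumptions based on the standard MAVROS setpoint_velocity plugin, and the commanded values are placeholders.

```python
# Minimal sketch (not the paper's code): streaming velocity set-points to an
# Ardupilot autopilot in guided mode through MAVROS. Topic name and rate are
# assumptions based on the standard MAVROS setpoint_velocity plugin.
import rospy
from geometry_msgs.msg import TwistStamped

def make_setpoint(vx, vy, vz, yaw_rate):
    """Build one velocity set-point message [m/s, rad/s]."""
    msg = TwistStamped()
    msg.header.stamp = rospy.Time.now()
    msg.twist.linear.x = vx
    msg.twist.linear.y = vy
    msg.twist.linear.z = vz
    msg.twist.angular.z = yaw_rate
    return msg

if __name__ == "__main__":
    rospy.init_node("uav_speed_setpoints")
    pub = rospy.Publisher("/mavros/setpoint_velocity/cmd_vel",
                          TwistStamped, queue_size=1)
    rate = rospy.Rate(20)  # set-points must be streamed continuously
    while not rospy.is_shutdown():
        pub.publish(make_setpoint(0.5, 0.0, 0.0, 0.0))  # placeholder values
        rate.sleep()
```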
The ground vehicle is a modified Jackal mobile robot from Clearpath Robotics. A 50 cm wide and 56 cm long landing platform with a marker pattern has been added on top, as shown in Figure 2. Jackal can perform waypoint navigation as well as being teleoperated via a wireless gamepad.
A laptop computer with a gamepad was also employed during the tests. It connected wirelessly to the UAV onboard computer and allowed performing tasks such as:
UAV initialization and mode selection (teleoperated or autonomous),
UAV teleoperation via the gamepad, or
sending relative pose set-points or landing commands.
The Gazebo environment developed for simulation purposes is a digital twin of the actual solar farm used in the real-world tests of the DURABLE project. The simulation aggregates multiple ROS packages, of which multi-jackal, ardupilot and ardupilot-gazebo are the most relevant. These packages support multiple modified Jackal robots and a quadcopter with an Ardupilot flight controller and an onboard RGB camera, enabling Software-in-the-Loop (SiL) simulations [24] (see Figure 3).
5. Results
The proposed method was tested to evaluate its effectiveness. Concretely, we performed several experiments: (i) to verify the reliability of the estimations computed with the ArUco markers, and (ii) to evaluate the following and landing algorithms in simulation and in real-world scenarios. The system configuration regarding the localization pattern, the onboard camera and the speed set-point generator is presented next.
5.1. System Setup
The camera is configured to provide images with a resolution in pixels at frame rate. The pitch and roll angles of the onboard camera frame w.r.t. the body frame are approximately equal to rad.
The pattern used to estimate the relative localization between the vehicles is built using markers from an ArUco dictionary [30]. It is shown in Figure 9 with its center highlighted using a red cross. Table 1 provides the dimensions and positions of the markers w.r.t. the pattern frame. The number of markers and their lengths were selected to provide robust real-time detection at different heights. Given this pattern, the aruco_detect node running on the Xavier NX onboard computer provides estimations of the relative pose at a maximum frequency of 14 Hz.
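For reference, the snippet below sketches how markers from such a dictionary can be detected and used for relative pose estimation with OpenCV. It assumes the legacy cv2.aruco contrib API (pre-4.7), and the dictionary, marker length and camera intrinsics are placeholders rather than the values used in this work.

```python
# Illustrative sketch, assuming the legacy OpenCV contrib ArUco API (pre-4.7).
# Dictionary, marker length and camera intrinsics are placeholder values.
import cv2
import numpy as np

DICTIONARY = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_50)   # assumed dictionary
PARAMS = cv2.aruco.DetectorParameters_create()
CAMERA_MATRIX = np.array([[615.0, 0.0, 320.0],
                          [0.0, 615.0, 240.0],
                          [0.0, 0.0, 1.0]])                    # placeholder intrinsics
DIST_COEFFS = np.zeros(5)
MARKER_LENGTH = 0.20                                           # placeholder size [m]

def estimate_relative_pose(image):
    """Return a list of (marker_id, rvec, tvec) for every detected marker."""
    corners, ids, _ = cv2.aruco.detectMarkers(image, DICTIONARY, parameters=PARAMS)
    if ids is None:
        return []
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, MARKER_LENGTH, CAMERA_MATRIX, DIST_COEFFS)
    return list(zip(ids.flatten(), rvecs, tvecs))
```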
The parameters of the trapezoidal speed set-point trajectory generator and the PI controller gains are gathered in Table 2.
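To make the control scheme concrete, the sketch below combines a trapezoidal-profile feedforward speed with a decoupled PI correction per axis. It is a simplified stand-in, not the actual implementation: the feedforward is derived from the remaining error rather than from a pre-planned straight-line trajectory, and the gains and limits are placeholders, not the values of Table 2.

```python
# Simplified sketch of the high-level scheme: per-axis speed set-point =
# trapezoidal-profile feedforward + PI correction on the pose error.
import math

class AxisSpeedController:
    def __init__(self, kp, ki, v_max, a_max, dt):
        self.kp, self.ki = kp, ki
        self.v_max, self.a_max = v_max, a_max
        self.dt = dt
        self._integral = 0.0

    def _feedforward(self, error):
        # Speed capped by v_max and by the value that still allows stopping
        # at the set-point with deceleration a_max (trapezoidal shape).
        v = min(self.v_max, math.sqrt(2.0 * self.a_max * abs(error)))
        return math.copysign(v, error)

    def step(self, error):
        self._integral += self.ki * error * self.dt
        return self._feedforward(error) + self.kp * error + self._integral

# Four decoupled loops: relative x, y, z and yaw w.r.t. the marker pattern.
loops = {ax: AxisSpeedController(kp=0.5, ki=0.05, v_max=1.0, a_max=0.5, dt=0.05)
         for ax in ("x", "y", "z", "yaw")}
example_errors = {"x": 1.0, "y": -0.5, "z": 0.3, "yaw": 0.2}   # placeholder errors
speed_setpoints = {ax: loops[ax].step(err) for ax, err in example_errors.items()}
```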
5.2. Pose Estimation Reliability Test
A Motion Capture System (Mo-cap) based on OptiTrack Prime 41 cameras was used to verify that the position and heading estimations computed with the ArUco markers were correct. According to the manufacturer's specifications, the system 3D accuracy is high enough for its measurements to be considered ground truth. For this test, the ArUco marker pattern was fixed to the floor and several passive markers were attached to the drone so that it could be localized with the Mo-cap system.
Two experiments were performed to verify the accuracy of the measurements. Firstly, the drone was moved manually in the X and Y directions at a fixed altitude without tilting; then, a full rotation about the drone z-axis was performed.
Figure 10a shows that the average position estimated with the ArUco markers is very close to the ground truth provided by the Mo-cap system. The yaw angle estimates provided by each marker also agree with the ground truth, as seen in Figure 10b. However, the estimated roll and pitch angles show large deviations, mainly for the smallest markers (55 and 168).
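A simple way to quantify this agreement, assuming both the ArUco estimates and the Mo-cap measurements are timestamped, is to resample one stream onto the other's time base and compute a per-axis RMSE. The sketch below is illustrative only and does not reproduce the paper's analysis; variable names and data layout are assumptions.

```python
# Illustrative RMSE computation between ArUco estimates and Mo-cap ground truth.
import numpy as np

def rmse_vs_groundtruth(t_est, p_est, t_gt, p_gt):
    """p_est, p_gt: (N, 3) position arrays; timestamps in seconds, increasing."""
    p_gt_resampled = np.column_stack(
        [np.interp(t_est, t_gt, p_gt[:, k]) for k in range(3)])
    err = p_est - p_gt_resampled
    return np.sqrt(np.mean(err ** 2, axis=0))   # RMSE per axis (x, y, z)
```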
In the second test, the drone was moved manually along a circular path at around 2 m altitude, centred w.r.t. the marker pattern frame origin. During this motion, the drone was heavily tilted to simulate extreme flight conditions, and a rotation was also applied to change its heading while tilted. As can be seen in Figure 11a,b, the ArUco markers provide accurate estimations of the position and yaw angle, as well as of the roll and pitch angles, with the exception of some outliers.
Additional tests were conducted to find the maximum and minimum heights at which the markers were detected, resulting in 4 m and 0.23 m, respectively. Markers can also be lost when they fall outside the camera's field of view; however, this case has not been taken into account because, as long as the UAV can move faster than the UGV and is in autonomous mode, the high-level controller keeps the markers in view while following the UGV. If the UAV nevertheless loses the target, an additional mode could be implemented in the high-level controller to move the drone faster, for a maximum amount of time, in the direction given by the last known position of the UGV. Furthermore, if required, the field of view of the drone could be enhanced with a gimbal system or a wide-angle camera.
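The recovery mode mentioned above is only suggested, not implemented; a purely hypothetical sketch of its logic could look as follows, with all names and values invented for illustration.

```python
# Hypothetical sketch of the suggested (not implemented) recovery mode: when
# the markers are lost, command a faster speed towards the last known UGV
# position for a bounded time, then hover. All names and values are assumed.
import time

SEARCH_SPEED = 1.5     # m/s, faster than the normal following speed (assumed)
SEARCH_TIMEOUT = 5.0   # s, maximum duration of the recovery motion (assumed)

def recovery_speed_setpoint(last_known_direction, t_lost):
    """last_known_direction: unit (x, y) vector towards the last UGV fix."""
    if time.time() - t_lost > SEARCH_TIMEOUT:
        return (0.0, 0.0)                       # give up and hover
    ux, uy = last_known_direction
    return (SEARCH_SPEED * ux, SEARCH_SPEED * uy)
```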
In view of the test results, it can be concluded that the ArUco markers provide a reliable estimation of the relative position and yaw angle of the UAV w.r.t. the marker pattern, so they can be employed to control the drone autonomously.
5.3. Simulation Test
A sequence of following and landing actions was performed in simulation using the solar farm digital twin presented in Section 3. At the beginning of the experiment, Jackal is stopped and the UAV is on top of its landing platform. The actions performed are described next in chronological order and indexed with the labels included in Figure 12 (top: the speed set-points computed by the high-level controller; bottom: the estimated relative pose of the stabilized body frame w.r.t. the pattern frame). A simplified script illustrating this command sequence is sketched after the list:
- (a)
The UAV autopilot is in guided mode and is commanded to take off and reach a given altitude.
- (b)
The UAV high-level controller mode is changed from teleoperation to autonomous, and a set-point to approach the landing platform is commanded. To achieve this, speed set-points with a trapezoidal profile are generated by the high-level controller (see Figure 12, top).
- (c)
The UAV is commanded to rotate 90° and to reach a higher location over the front side of Jackal by sending a new relative pose set-point. Figure 13(1) shows the state of the UAV at the end of this motion.
- (d)
The UAV reaches the commanded set-point (see Figure 12, bottom), and Jackal starts moving, describing a circular path. The PI controllers adapt the UAV speed set-points to follow Jackal while maintaining the previously commanded relative pose set-point. This is illustrated in Figure 13(2)–(6).
- (e)
Jackal stops, so the PI controllers reduce the commanded speed set-points, as shown in Figure 12, top.
- (f)
The UAV is commanded to approach the landing platform and to rotate so that it is properly aligned for landing. Figure 13(7) shows an intermediate state of this motion.
- (g)
The UAV is commanded to land. Figure 13(8) shows the UAV approaching the landing platform.
- (h)
The UAV lands successfully, as shown in Figure 13(9).
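The following purely hypothetical sketch illustrates how the command sequence above could be scripted. The UAVCommander interface is invented for illustration (the paper does not expose the exact ROS interface of the high-level controller), and the numeric set-points are placeholders, not the values used in the experiment.

```python
# Hypothetical scripted version of the simulated sequence; the interface and
# numeric values are invented for illustration only.
class UAVCommander:
    def send_relative_setpoint(self, x, y, z, yaw):
        print(f"relative set-point: x={x} y={y} z={z} yaw={yaw}")  # stub

    def land(self):
        print("land command issued")  # stub

uav = UAVCommander()
uav.send_relative_setpoint(0.0, 0.0, 2.0, 0.0)    # (b) approach the landing platform
uav.send_relative_setpoint(1.0, 0.0, 3.0, 1.57)   # (c) rotate 90 deg and climb over Jackal's front
# (d)-(e): the PI loops hold this relative pose while Jackal moves and then stops
uav.send_relative_setpoint(0.0, 0.0, 1.5, 0.0)    # (f) re-centre and align for landing
uav.land()                                        # (g)-(h) touchdown on the platform
```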
5.4. Real-World Tests
Real-world tests were conducted in the outdoor facilities of Instituto Superior Técnico (IST) in Lisbon, namely on the football court marked in red in Figure 14.
First, the following capabilities of the UAV were tested with the marker pattern attached to the landing platform of the Jackal UGV. Figure 15 shows the speed set-points commanded to the UAV autopilot and the pose of the UAV w.r.t. the marker pattern estimated by the onboard computer during the test. A sequence of images captured by the onboard camera after being processed by the aruco_detect node is included in Figure 16. Prior to the test, a human operator performed the UAV takeoff and switched its mode to guided. Then, the experiment began with the UAV in autonomous mode at a fixed altitude and Jackal positioned at the center of the football court. The performed actions are described below using as reference the labels included in Figure 15.
- (a)
A relative pose set-point is commanded to center the UAV on top of Jackal at 2 m altitude, so speed set-points to reduce the altitude and correct the heading are generated, as shown in Figure 15, top. The relative pose set-point is then reached and maintained by the PI controllers (Figure 15, bottom). The images captured by the onboard camera while the UAV is descending and rotating are presented in Figure 16(1)–(3).
- (b)
Jackal is teleoperated to move forward (see Figure 16(4)–(6)), so the PI controllers begin to increase the corresponding speed set-point to maintain the commanded relative pose and follow Jackal, as shown in Figure 15b.
- (c)
Jackal is commanded to rotate to its left, as shown in Figure 16(7),(8), so the corresponding speed set-point is increased by the UAV high-level controller (Figure 15c, top). Next, a sequence of forward and left-turn commands leads Jackal back to the initial test position while the UAV autonomously follows it (see Figure 16(9)–(12)).
The autonomous landing experiment begins with Jackal stopped at the center of the football court, while the UAV is in autonomous mode at 2 m height over the UGV, but not centered on it. The performed actions are described below using as reference the labels included in Figure 17:
- (a)
The UAV is commanded to approach Jackal for landing by sending the corresponding set-point. The computed speed set-points reduce the altitude and center the UAV over the landing platform, as shown in Figure 17, bottom, and Figure 18(1)–(4).
- (b)
The position and heading errors are detected to be small enough (see Figure 17, bottom, and Figure 18(5)), so the UAV is commanded to land. The altitude set-point is reduced and the UAV lands on Jackal's platform (see Figure 18(6)).
Although there were external disturbances such as wind, which reduced the accuracy of the high-level controller compared to its simulated counterpart, the UAV was able to follow the Jackal UGV and land on its platform successfully.
6. Conclusions
This work has presented a vision-based method that allows an aerial robot to autonomously follow and land on a mobile ground platform. This approach uses a custom-designed landmark pattern based on ArUco markers as a guiding system for the UAV, which has been validated using a Mo-cap system. Unlike other implementations, it relies exclusively on the markers for following and landing. The developed system accepts relative position and heading set-points between the UAV and the UGV, which are reached by planning straight-line segments from the current UAV location. The UAV controller has been implemented using a hierarchical structure: an Ardupilot-based commercial flight controller has been used as the low-level controller, while the high-level controller has been implemented on the UAV onboard computer using ROS. The low-level controller accepts UAV speed set-points computed by the high-level controller using a trajectory control scheme, with a trapezoidal-profile speed generator as feedforward and four decoupled PI controllers in the feedback loop.
The proposed framework has been tested in simulation and real environments using the digital twin of a solar farm and at the outdoor facilities provided by ISR Lisboa, respectively. In both scenarios, the UAV has been able to autonomously follow, with a specific relative pose, a teleoperated ground mobile robot equipped with a landing platform and a marker pattern on top, as well as to land on it when commanded to do so.
A possible future line of work is to investigate the scalability of the proposed approach for scenarios with several aerial and ground mobile robots operating simultaneously. This could involve the development of new algorithms and protocols to enable coordination and collaboration between robots to tackle more complex tasks.