Article

Dynamic Camera Planning for Robot-Integrated Manufacturing Processes Using a UAV

Chair of Production Systems, Ruhr University Bochum, 44801 Bochum, Germany
* Author to whom correspondence should be addressed.
Robotics 2025, 14(3), 23; https://doi.org/10.3390/robotics14030023
Submission received: 13 January 2025 / Revised: 5 February 2025 / Accepted: 17 February 2025 / Published: 21 February 2025
(This article belongs to the Section Sensors and Control in Robotics)

Abstract

The optimal viewpoint for monitoring robotic production processes is crucial for maintenance, inspection, and error handling, especially in large-scale production facilities, as it maximizes visual information. This paper presents a method for dynamic camera planning using an Unmanned Aerial Vehicle (UAV), enabling collision-free operation and measurable, high perspective coverage for a user-defined Region of Interest (ROI). To this end, candidate viewpoints are identified with a greedy search algorithm, and a decision for the optimal viewpoint is derived. The method is implemented within a simulation framework in Unity and evaluated in a robotic palletizing application. The results show that using a UAV as a dynamic camera achieves up to twice the perspective coverage during continuous flight compared to the current capabilities of static cameras.

1. Introduction

For robot-integrated manufacturing processes, such as handling, painting tasks, or obstacle detection, it is essential that the process or product is inspected regularly to ensure process quality. The location of a camera to support the inspection should be chosen so that the camera image continuously maximizes the capture of visual information, is free from occlusion by scene objects, and clearly shows the target object [1,2]. Methods for the selection of this viewpoint are referred to as camera planning and focus on the selection of an optimal viewpoint, which is to be chosen depending on the application, particular constraints of the camera or the scene, and the temporal sequence of actions [3]. Within the field of cognitive science, the decisive factors for maximizing visual information are well known, extensively discussed [4], and algorithmically abstracted for optimal viewpoints on objects [5]. The optimal viewpoint captures the object or situation completely at a defined viewing angle, is free of visual obstacles, and lies at a minimum distance from the object of interest, or Region of Interest (ROI) [6,7].
The literature distinguishes between three different forms of camera systems and their planning: static, semi-static, and dynamic camera planning.
Statically installed cameras are attached to a point in the outside world, e.g., a tripod or the fence of the robot cell, and have a fixed viewing direction, such as a direct view into the robot's working area [8]. A hybrid form is represented by semi-static cameras that are attached to the manipulator to support the gripping or machining process from one perspective [9,10]. Actual optimization of the viewpoint in a running application does not take place in semi-static setups. Additionally, the choice of viewpoint is often not preceded by a careful consideration of the factors that define the quality of the viewpoint. Instead, the authors attempt to efficiently solve a handling task using the recorded camera image based on methods of visual servoing [11] and object recognition [10]. A camera attached to the manipulator is often referred to as an eye-in-hand camera; such setups aim to transfer human hand–eye coordination behavior to the robot, as in Ref. [12].
In contrast, a dynamic camera can move freely during observation and can therefore adopt a viewpoint that is considered optimal for supporting a task under restrictions, such as distance or angle to the object being observed. In terms of robot applications, the dynamic selection of the camera position requires a decoupling of the camera and gripping system, after which both objects can move freely in relation to each other and the optimum viewpoint can be adopted for each situation. Figure 1 shows multiple perspective views from the already introduced static (a), semi-static (b,c), and dynamic (d) viewpoints into a robot-integrated manufacturing process.
For the observation of robot-integrated processes, static camera systems are already widely used, whose position is defined once and which observe an ROI from a fixed perspective [13,14]. There are numerous approaches to this, within which the automatic generation of optimal viewpoints is already described, as in Ref. [14]. However, static cameras are limited in their perspective coverage, i.e., their ability to capture an ROI according to user specifications for relative image position and resolution. The reason for this is that the camera cannot be moved when the ROI is moved and can often only be adjusted by rotating, zooming, or the support of multiple cameras [9,15]. Moreover, to cover all relevant positions comprehensively, static cameras require precise calibration, can be expensive due to the number of cameras needed, and are time-consuming to install [15]. For dynamic processes, however, such as those that are increasing due to the growing use of mobile robotics, a static camera is often no longer sufficient, and the camera must match the dynamic nature of the process. A new and particularly exciting approach in this context is the use of an Unmanned Aerial Vehicle (UAV) or multicopter as a dynamic camera, already described in Ref. [16]. UAVs offer ideal conditions, as they offer 6 degrees of freedom, are particularly maneuverable, and can be controlled using open-source frameworks with sufficient path accuracy [17]. This approach stands out due to further consideration of coverage and collision problems, as well as safety aspects in a robot cell. The different camera approaches are visualized in Figure 1. However, the use of a dynamic camera that can freely and independently move to any collision-free point within a robot cell poses several challenges. These include finding the optimal, occlusion- and collision-free viewpoint for the current handling situation and collision-free path planning to this viewpoint. In addition, situation-dependent challenges such as the speed of reorientation of the UAV due to sudden changes in the defined ROI or situations without a determinable optimal viewpoint must be considered.
To date, however, there are only a few works that present an application of a fully dynamic camera in a robot application. Rather, existing systems are tailored to specific applications that guide a camera, or they rely on AI-based image recognition trained for state recognition from a static perspective [10]. What these systems lack is the complete dynamization of a camera, collision-free path planning between optimal viewpoints, and a general method that allows safe cooperation between two robots whose workspaces overlap. For the realization of this application, three fundamental questions need to be answered:
  • How can the optimal viewpoint for any robot application be determined and evaluated?
  • What methods can efficiently and accurately verify collision-free paths between two points in a continuous simulation?
  • What is the optimal system configuration, including minimum distances and collision zones, for ensuring safe cooperation between UAVs and industrial robots?
To answer these questions, we implement our approach in Unity and evaluate it in two different robotic applications. To this end, the technical requirements for an optimal viewpoint for a robot application are first compiled through a literature review in Section 2. Next, we present the conceptual framework for the technical implementation of the method in Unity. Finally, the results of the dynamic tracking of the optimal viewpoint are presented, discussed, and compared with the results of four static cameras.

2. Related Works

The challenge of determining the best viewpoint for a robot application has long been known in the literature and is referred to as Machine Vision Planning (MVP) [18], Dynamic Sensor Planning [19], or Automatic Sensor Placement [20]. In subsequent years, these terms have become more mixed, but all aim to determine the best viewpoint for an object of observation. The preliminary work presented here is divided into early work from the field of robotics, which first addressed the general benefits of and initial procedures for static sensor planning, and work that represents the current state of the art in the field of dynamic sensor planning. Works on static sensor planning include technical requirements for observation points and their dependence on the conditions and relative position of the object to be recognized. For the first time, they also present solutions that can be used to continuously determine an optimal but static viewpoint for a sequence of movements, i.e., for the execution of a robot program. This forms a basis for dynamic sensor planning in today's applications, which is, however, extended by specific requirements of the application or the robots used.

2.1. Camera Planning

Back in 1988, the authors Cowan and Kovesi [20] focused on automatically generating camera locations for observing objects while considering various constraints. Their method involves computing geometric relationships between the camera position and given constraints facilitated by robot motion. For a camera view to be considered adequate, they formulated several requirements: the view must achieve a minimum spatial resolution, encompass all surface points, ensure that all surfaces are within the sensor's field of view, and guarantee that no surface points are occluded. Their results demonstrate the feasibility of finding suitable viewpoints for different inspection tasks, ensuring resolution, focus, and field of view requirements are met.
In 1991, Tarabanis et al. addressed the challenge of automated sensor planning for achieving clear, well-proportioned views of robot tasks within a first dynamic approach [21,22,23]. In their application, two industrial robots are used to solve a task—one robot freely guides a camera, and the other robot carries out a manipulation. They named their system Machine Vision Planner (MVP) [18], which is a highly influential work in the field of sensor planning. Their methodology employs a probability-based global search technique to optimize viewpoint selection while considering constraints such as reachability, collision avoidance, and task visibility. While their method is effective in providing good viewpoints, the authors suggested future improvements, such as integrating path planning.
In 1995, Triggs et al. [13] presented their approach for determining an optimum viewpoint for a robot application in which one robot also guides a camera and another robot performs the actual handling task. The aim of the paper is to develop an automated method for choosing the most appropriate position for the camera, given the task and environmental constraints, in order to provide optimal performance for robot vision tasks. Thus, the camera planner is capable of adapting to dynamic changes in the workspace. The recursive search developed by them begins with the decomposition of a search space into sub-search spaces. Using an optimization function, a function value is formed for each search space and the search space with the lowest function value is broken down again into sub-search spaces. In this way, they approximate the solution of an ideal position or region for the camera position.
This work was followed by the method of offline planning for optimal viewpoints for an active robot cell using the swept volume method by Abrams et al. in 1999 [24]. Viewpoints, orientation, and camera settings can be determined within a time interval by knowing the industrial robot’s movements in advance. The authors identify visibility, field of view (FOV), resolution, focus, and lighting as necessary requirements for viewpoint determination. The swept volumes mapped in the simulation allow the definition of candidate volumes in the robot cell. Candidate volumes are regions of the robot cell from which the target features are visible without occlusion; all constraints are fulfilled and therefore are suitable as a search space for the optimization algorithm. In their search method, the authors use an analytical approach to model visibility, resolution, focus, and FOV. A gradient-based search is performed to maximize the distance between the viewpoint and the boundaries of the candidate volumes within which the constraints are satisfied. Although the article discusses dynamic camera placement, the cameras used are static systems.
The preliminary works of Cowan and Kovesi [20], Tarabanis et al. [21,22,23], Triggs et al. [13], and Abrams et al. [24] are fundamental contributions in the field of robotic vision, which still inform current approaches for determining ideal viewpoints in robot applications. Cowan and Kovesi [20] use a geometric approach, Tarabanis et al. [21,22,23] and Abrams et al. [24] an analytical method, whereas Triggs et al. [13] use a local search method for the optimal viewpoint.

2.2. Applications in Robot Vision Involving Dynamic Camera Planning

The optimal viewpoint for a robot task should maximize visually perceptible information through a complete and unobstructed view of the relevant ROI in the robot cell. The effort involved should be minimal, e.g., the time and path length needed. The applications of dynamic camera planning are diverse and are, therefore, divided into two application groups: Reconstruction and Inspection, and Teleoperation.
Reconstruction and Inspection: In reconstruction and inspection, also known as reverse engineering, the aim is to fully capture a building model, for example, through three-dimensional scanning with a LIDAR sensor. To achieve this, the areas of a building are visually captured by a UAV's onboard camera while optimizing the flight path for minimal length or duration, with the goal of achieving maximum coverage. The underlying planning problem is referred to as the next-best-view or view-planning problem. Challenges lie in selecting the fewest possible viewpoints that entirely cover a target object or scene and in defining the order in which the viewpoints are visited by the UAV. The viewpoints are determined from geometric model knowledge, within which characteristic points are either specified by the user or determined automatically. The literature reviews in Refs. [25,26] provide a comprehensive overview of the work in this field. Of particular note are the works by Magana et al. [8,27], which also contain detailed literature reviews.
Teleoperation: In teleoperation, a user remotely controls a robot. The control is supported by a supplied camera image which offers an egocentric or exocentric view of the handling situation. The egocentric view provides a perspective from a fixed camera positioned on the robot itself, typically aligned with the robot’s own frame of reference. This viewpoint offers a direct, first-person perspective of the robot’s actions and immediate surroundings. In contrast, the exocentric view is positioned outside the robot system, offering a third-person perspective that captures the robot and its manipulation area from an external viewpoint. This external perspective allows for a broader overview of the robot’s interactions with its environment. The use of an additional camera image—whether egocentric or exocentric—is designed to enhance task performance by providing the user with critical visual information. This supplementary view supports better process monitoring and facilitates status recognition, enabling the identification of task progress, potential errors, or obstacles in the robot’s workspace. The literature review in Ref. [28] summarizes a comprehensive state of the art of teleoperation in robot applications. In the field of teleoperation, there are multiple studies that deal with the optimal viewpoint for certain processes, the reduction of a user’s cognitive workload, or the measurability of support. The authors in Ref. [29], for example, name the evaluation of manipulability, passability, accessibility, and maneuverability as the relevant states in mobile robotics that can be resolved visually. In the case of manipulability, an external viewpoint can be taken that detects the open gripper and the object to be handled in the lateral view, and thus recognizes the relative position of the two subsystems [9,30]. This actively supports the successful gripping of an object. The navigation of a mobile robot, for example, can be supported by the top view of a mobile robot by distinguishing obstacles from available travel paths and identifying short paths to reach a target point [31]. For classic applications of industrial robotics, there are a variety of visual support possibilities, which are the tracking and readjustment of automated welding processes [32], inspection tasks [33], or in robot-integrated assembly and quality assurance [34]. Regarding contributions from the field of teleoperation, these seem to be the closest to the target application envisaged here. For this reason, work that uses a flying robot for teleoperation is being investigated further.

2.3. Visual Servoing a Dynamic Camera in Robot Tasks for Viewpoint Optimization

Gemerek et al. [35] plan the observation of a randomly moving target point in a known static environment using a directed visibility graph in two-dimensional, continuous space. The FOV of the flying robots used is represented approximately as a cone in the plan view, but its range is limited. The authors define “coverage” as the unobstructed area of the FOV (i.e., the cone area), which is reduced by objects and the shadows they cast. The aim of the observation is to capture the target point as centrally as possible in the field of view and to achieve the highest possible coverage rate. Planning is carried out continuously using an iterative Monte Carlo search tree, which determines an optimum observation point for target detection by checking random observation points in the vicinity of the current position of the flying robot.
Rakita et al. [36,37] deal with the visual support of tele-manipulation by an industrial robot that dynamically guides a camera to an optimal viewpoint. Their method aims at determining collision-free viewpoints and guiding the robot to this point so that an unobstructed view of the end effector is obtained. For this purpose, the authors model an inner and outer cone of vision that starts from the direction vector of the end effector. The aim is to guide the camera as close as possible to the inner cone of vision, as this viewpoint in the direction of view of the manipulation is assumed to be optimal by the authors. Using multi-criteria optimized real-time motion planning, the viewpoint of the camera is gradually shifted towards the inner cone of vision, considering factors such as the end effector pose, the axis speeds of the manipulating robot, and possible collisions between the two robots. If an evasive movement of the camera robot is necessary, it is carried out in the same direction in which the camera was originally moved to the viewpoint.
In their work, Jia et al. [38] develop and investigate the use of a Deep Neural Network (DNN) for the prediction of an optimal viewpoint for the support of an operator. In their approach, a camera attached to a second robot arm autonomously keeps the tool center point (TCP) of the first robot in view and thus reduces the cognitive load during teleoperation. The system continuously suggests new viewpoints that can be actively selected by the operator. Since neural networks require enormous amounts of data, they focus on efficient training strategies, within which they use an operator’s selected viewpoints for continuous online training of the DNN in a simulation environment. An optimal viewpoint is predicted by calculating a score based on the current viewpoint, the gripper pose, and the gripping movement of the last few seconds. Additional model or geometry information is not included in the evaluation. The DNN is used to determine multiple viewpoints on a spherical surface with a constant radius around the TCP, but without checking for collisions or perspective coverage on the movement path.

2.4. Using a UAV as Dynamic Camera Within Robot Tasks

The use of a UAV as a dynamic camera for the teleoperation of an arbitrary robot was first mentioned in the work of Saakes et al. in 2013 [39]. For search and rescue missions, the authors show the potential of aerial images for reducing collisions of teleoperated ground vehicles and the advantages of finding victims from a bird's eye view. The UAV follows a marker attached to the vehicle and keeps a relative position to the vehicle. The aim of the paper is to evaluate the usefulness of third-person camera images for teleoperation. The authors evaluate their assumption in a user survey of the system. Despite the advantages of the flying camera in terms of clarity and flexibility, users cite slow turns, instability, and discomfort when using the UAV as decisive factors for insufficient visual assistance.
This work is followed by the work of Claret et al. [40,41,42], in which the teleoperation of a mobile robot is supported by a UAV as a dynamic camera. The mobile robot is controlled via a haptic interface and a UAV moves in relation to the position of the mobile robot. As a safety and working area, the authors create a cylindrical envelope around the mobile robot that cannot be entered by the UAV. One focus of the authors’ contributions is avoiding occlusion by the robot’s joints during operation. For this purpose, the position of the UAV is set in relation to the joint positions, and occlusion-free configurations of the robot are determined using the force field formulation. Unfortunately, the articles lack evaluations that examine practical use in corresponding scenarios. It is questionable whether the manipulation is restricted by adopting an axis configuration of the mobile robot to avoid occlusions and whether the actual control is disturbed as a result. In addition, it appears that there is a simpler way to determine visual occlusion of joints and volumes than the authors’ theoretical approach.
In their works, Xiao et al. [1,29,43] use a UAV as flying camera for the teleoperation of an Unmanned Ground Vehicle (UGV). The UAV is tethered to the UGV for power, so the papers are heavily concerned with accessibility and collision issues of the UAV and its tether with the environment. The authors develop a risk model for path planning, in which individual risks along a path are summed up and compared with achievable rewards. An ideal viewpoint of the UAV can, therefore, provide greater value than the potential risk posed by reaching the position would cost. The determination of the optimal viewpoint to be taken is based on the Gibsonian affordances from Ref. [44], which originate from human perception and user studies. Although the authors conduct detailed experiments in interesting scenarios, relevant performance values for solving the tasks are not given. It is also unclear whether and how friction points of the tether with the environment affect the flight characteristics. Continuous flights around a robot, completely decoupled according to the definition of a dynamic camera, are unlikely. The flexibility of the UAV is extremely limited by the tether and is not a solution in environments with multiple collision volumes and interference contours. However, this is not the aim of the authors.
Another visual support for the teleoperation of a ground vehicle is considered by Gawel et al. [31], who provide a camera image from the top-down-view of a vehicle via a UAV. The UAV follows the ground vehicle at a fixed distance and height and moves in relative orientation to the vehicle during a turn or rotation so that the vehicle can be observed from a constant perspective viewpoint. The authors, though, did not consider collision problems of the UAV or the adoption of an optimal viewpoint.
Given our approach, the article by Senft et al. [45] is highly interesting. The authors use a UAV as a flying camera for the continuous tracking of a viewpoint, in this case, the end effector of the robot, and thus for teleoperation of the robot by a user. They introduce a novel approach to address the challenge of providing adaptively minimally constrained camera views to support robot teleoperation, utilizing a UAV as a dynamic camera. Their solution focuses on ensuring collision-free, unobstructed viewpoints with optimal viewing angles and distances between the camera and the target. They employ a weighted-sum nonlinear optimization approach to minimize a geometric cost function to calculate the commanded drone pose. Additionally, they utilize a second global viewpoint optimization process to generate alternate viewpoints and switch manually between them in case of occlusion or a better view. Their method demonstrates the ability to generate viewpoints required to support robot teleoperation and quickly switch between them. Although the authors present a promising approach, collisions or perspective occlusions on the way to a new viewpoint are not considered. Additionally, according to our observations, it seems insufficient to determine only two optimal viewpoints per time segment, since several occlusion problems and obstacles between optimal viewpoints can arise in complex robot cells. Rather, the comparison of multiple optimal viewpoints and consideration of the costs to change the viewpoint should also be part of an optimization method. In their evaluation, the authors implement their approach within a real-world application to observe a manipulation task. The user, though, must switch manually between observation viewpoints. It is not fully clear whether the user receives any information on when to change the viewpoint and how the UAV is controlled to the next viewpoint. From the images, it looks as if the UAV follows a pre-defined path or moves up first, then sideways to be above the next viewpoint, and then downwards to reach it. Our understanding is that to relieve the user of this task, a change of position should occur automatically, by the shortest route and with the least loss of visual information. It is also unclear to what extent multi-criteria optimization makes sense for a single-criterion optimization problem, as the only optimization target is the maximization of the perspective coverage. Due to the manual triggering of a positional change by an operator, the method presented is not completely autonomous.

2.5. Requirements for Visually Assisting Robot Applications with a UAV

Our literature review shows that none of the existing works fully meets the requirements for using a UAV as a dynamic camera to continuously and autonomously observe an ROI within a robot cell. Therefore, this paper presents a method and software implementation that enables a UAV to serve as a dynamic camera for any robot, in any workspace, and across a wide range of tasks. The main purpose is teleoperation, providing process insight by monitoring any application, or conducting fault investigations. For the implementation within an industrial robot cell, the requirements in Table 1 are derived from an analysis of the literature and of user expectations for an ideal dynamic camera.
Using our method, we determine the optimal camera viewpoint based on the positional relationship of a user-defined ROI within an industrial robot application. The optimization process identifies a viewpoint that ensures unobstructed observation of the ROI while maximizing visual information gain, considering potential UAV collisions with the robot or objects in the scene. Additionally, the method calculates multiple optimal viewpoints simultaneously, accounting for the UAV’s switching costs between them. Thereby, the ROI is represented as a square area defined within the camera frustum. In contrast to the work of other authors, the method presented is intended to allow the continuous adoption of the optimal viewpoint and, thus, the visual information acquisition achieved to be measured. This enables a final comparison of the perspective coverage of the UAV with the perspective coverage of static cameras.

3. Our Approach

For the development of our method, an initial implementation and validation is conducted in Unity. To enhance comprehensibility, Figure 2 visualizes key elements of the scene, including the robot's workspace, collision space, and the collision-free movement space available to the UAV. The ROI is represented as a square area defined by the camera frustum, with the safety sphere at its center. The inner area of the safety sphere, with radius r, is excluded from path planning, and the UAV must immediately move out of it upon entry. The radius of the safety sphere is not part of the optimization but is manually defined in a first step; its automatic selection might be part of future work. The safety sphere serves not only to prevent collisions, but also to maintain a target distance in real-world applications and to avoid entering the robot's collision zone during rapid reorientation of the robot and ROI. Additionally, a capsule-shaped collider surrounds the robot's axes, and a box collider encloses static scene objects, both acting as collision zones. The box collider is defined as the maximum bounding volume of static objects and must be avoided by the UAV. Furthermore, it is assumed that the camera on the UAV does not have an actuator for orientation and thus no further degrees of camera movement. Such an actuator would introduce an initially undesirable level of complexity and is normally not installed on small, lightweight UAVs such as those under consideration.

3.1. Measurability of Perspective Coverage from Geometric Relationship

The method we propose for evaluating the perspective coverage Θ_coverage is based on the ratio of the projected area of the region of interest, A_proj, to the total area of the Near Clip Plane, A_NCP. This value is related to the position p of the given UAV or camera position and written as a functional dependency f(p). As shown in Figure 3, which represents the camera's viewport, the scene is visualized through the camera frustum. The process begins with the manual definition of the ROI, which represents the area A_ROI the user aims to observe during operation. The user defines the dimensions of the ROI and specifies its parent object within the scene, relative to which the ROI moves. For example, in a robot-assisted welding process, the ROI can be visually positioned in front of the TCP of the welding device to ensure an optimal view into the process. Objects within the scene are projected onto the Near Clip Plane (NCP), which corresponds to the visible part of the camera's view. The Far Clip Plane (FCP) marks the boundary of the camera's visible field and determines the farthest objects visible from the camera's perspective. The area A_NCP is defined by the multiplication of the height h_NCP and width w_NCP of the NCP and remains constant. The projected area of the ROI, however, is calculated by projecting the corner points c_1–c_4 onto the NCP.
Since the camera model can be assumed to be an ideal camera system within the simulation environment, the projection of a three-dimensional point p_world = (x_c, y_c, z_c, 1) to p_proj = (x_p, y_p) can be calculated geometrically, as stated in Ref. [46]. Therefore, we use the homogeneous projection matrix M with d_NCP as the distance from the camera center to the center of the NCP.

$$M = \begin{pmatrix} d_{NCP} & 0 & 0 & 0 \\ 0 & d_{NCP} & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
Considering the position and rotation of the camera in space, the view matrix T is defined as T = (R, C) ∈ ℝ^(4×4), where R represents the rotation matrix and C is the camera's position vector. The rotation matrix R consists of the right (X-axis), up (Y-axis), and forward (Z-axis) vectors of the camera's coordinate system:

$$T = \begin{pmatrix} X_x & X_y & X_z & C_x \\ Y_x & Y_y & Y_z & C_y \\ Z_x & Z_y & Z_z & C_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
Using M and T, we can calculate the projected corner point c̄_i, as long as it lies within the FOV and is thereby visible in the camera's viewport.

$$\bar{c}_i = M \cdot T \cdot c_i$$
Since the projected surface A_proj can form a quadrilateral due to an oblique viewing angle, it is divided into two triangles. The area of each triangle is determined by computing the cross product of the difference vectors that define the triangle. The addition of both areas then results in the projected area A_proj.

$$A_{proj} = \tfrac{1}{2}\,\bigl\lVert \overrightarrow{\bar{c}_1\bar{c}_2} \times \overrightarrow{\bar{c}_1\bar{c}_3} \bigr\rVert + \tfrac{1}{2}\,\bigl\lVert \overrightarrow{\bar{c}_1\bar{c}_3} \times \overrightarrow{\bar{c}_1\bar{c}_4} \bigr\rVert$$

$$f(p) = \Theta_{coverage}\;[\%] = \frac{A_{proj}}{A_{NCP}} \cdot 100$$
By determining f(p) from a geometric relationship, it is possible to evaluate the viewpoint mathematically. The greater the value of f(p), the better the ROI can be recognized in the camera view. This means that maximizing f(p) becomes the optimization target for a given camera viewpoint p = (x, y, z) ∈ ℝ³. In addition to our presented solution, a built-in function in Unity, Camera.WorldToScreenPoint, also returns the projected point in the camera's viewport. However, the approach presented here is highly efficient and designed to be called numerous times within the search algorithm introduced later.
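As a minimal, self-contained sketch of this projection and coverage computation (written in Python with NumPy rather than the paper's Unity implementation, and using hypothetical camera parameters and ROI dimensions), the calculation could look as follows:

```python
import numpy as np

def view_matrix(cam_pos, right, up, forward):
    """Build a 4x4 view matrix T from the camera pose (world -> camera frame)."""
    R = np.vstack([right, up, forward])      # rows: X-, Y- and Z-axis of the camera
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = -R @ cam_pos                  # translation column maps world points into the camera frame
    return T

def project_point(c_world, T, d_ncp):
    """Project a 3D world point onto the Near Clip Plane of an ideal pinhole camera."""
    M = np.array([[d_ncp, 0.0, 0.0, 0.0],
                  [0.0, d_ncp, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])     # homogeneous projection matrix M
    x, y, w = M @ T @ np.append(c_world, 1.0)
    return np.array([x / w, y / w])          # NCP coordinates of the projected point

def coverage(roi_corners, cam_pos, right, up, forward, d_ncp, w_ncp, h_ncp):
    """Perspective coverage f(p): projected ROI area over the NCP area, in percent."""
    T = view_matrix(cam_pos, right, up, forward)
    c = [project_point(ci, T, d_ncp) for ci in roi_corners]   # projected corners c1..c4
    tri = lambda a, b, d: 0.5 * abs((b - a)[0] * (d - a)[1] - (b - a)[1] * (d - a)[0])
    a_proj = tri(c[0], c[1], c[2]) + tri(c[0], c[2], c[3])    # quadrilateral split into two triangles
    return a_proj / (w_ncp * h_ncp) * 100.0

# hypothetical example: a 0.2 m x 0.2 m ROI viewed frontally from 2 m distance
roi = [np.array(v) for v in [(-0.1, -0.1, 2.0), (0.1, -0.1, 2.0),
                             (0.1, 0.1, 2.0), (-0.1, 0.1, 2.0)]]
print(coverage(roi, cam_pos=np.zeros(3),
               right=np.array([1.0, 0.0, 0.0]), up=np.array([0.0, 1.0, 0.0]),
               forward=np.array([0.0, 0.0, 1.0]),
               d_ncp=0.3, w_ncp=0.5, h_ncp=0.3))
```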

3.2. Eliminating the Effect of Distortion

The effect of distortion is well known for central projection and is part of radial distortion: with increasing absolute distance from the center of the NCP, points appear shifted toward the edge of the NCP, and the calculation of a surface, as in our case, leads to an elongation or distortion of that surface (see Figure 4). As shown below, the distortion effect is caused by the depth, width, and height distortion of a point on the NCP. For real cameras, a calibration matrix is typically used to correct image distortion. However, in 3D game engines like Unity, cameras are assumed to be ideal and do not apply a calibration matrix. Nevertheless, geometric relationships still cause distortion at the image edges. To normalize the size of the projected area, we compute the distortion factor D as the ratio of d_NCP to the distance d_eff from point p to the center of the projected ROI. This distortion factor is an approximate way of determining the effect of distortion in an efficient and simple manner.
$$D = \frac{d_{NCP}}{d_{eff}}$$
The center point of the projected ROI is used for the calculation of D, as the relatively small area of the ROI makes the deviation between the center point of the distorted area and the center point of the distorted edges negligible. The effective length d_eff can be calculated by computing the diagonal middle point p_center of the projected ROI as well as the distance from this point to the camera's position p.

$$d_{eff} = \lvert p_{center} - p \rvert$$
The effective area A_eff can be calculated from the projected area A_proj and D. For correct scaling of the area, it must be considered that the distortion affects A_proj through its squared relationship to the height and width of the area, both of which are distorted as soon as the projected ROI is shifted from the center of the NCP. The calculated D thus enters the result cubically. For A_eff it follows:

$$A_{eff} = A_{proj} \cdot D^{3}$$
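As a small illustration of this correction (a sketch under the assumption that p_center denotes the center of the projected ROI lifted onto the NCP plane in camera coordinates; the names are ours and do not appear in the implementation):

```python
import numpy as np

def effective_area(a_proj, projected_corners, d_ncp):
    """Normalize the projected ROI area A_proj for off-center distortion on the NCP.

    a_proj            : projected area on the NCP
    projected_corners : projected 2D corners of the ROI on the NCP (camera frame)
    d_ncp             : distance from the camera center to the center of the NCP
    """
    center_2d = np.mean(projected_corners, axis=0)             # diagonal middle point of the projected ROI
    p_center = np.array([center_2d[0], center_2d[1], d_ncp])   # lift onto the NCP plane (camera frame)
    d_eff = np.linalg.norm(p_center)                           # distance from the camera position p
    D = d_ncp / d_eff                                          # distortion factor (D = 1 in the image center)
    return a_proj * D**3                                       # width, height and depth are all distorted

# hypothetical values: a projected area of 9e-4 near the edge of the NCP, d_NCP = 0.3 m
corners = np.array([(0.20, 0.10), (0.23, 0.10), (0.23, 0.13), (0.20, 0.13)])
print(effective_area(9e-4, corners, d_ncp=0.3))
```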

3.3. Finding Occlusion-Free Optimal Viewpoints

A suitable optimization method is selected by considering the underlying optimization problem and its requirements. For the highest f(p), an optimal viewpoint lies within a fixed distance (radius r) from the ROI, does not collide with the environment, and shows the ROI without any occlusion. With the shortest possible distance to the ROI, potential viewpoints can be found using a search method that fulfills the given criteria and offers the highest value for f(p) in comparison to neighboring points. Our search space S can thus be written as S ⊂ ℝ³. Colliders and their respective space C within the search space are described as C ⊂ S, so that p ∉ C applies for collision-free points in space. For the simplified determination of the visibility of the ROI, it was defined that the UAV can rotate around the z-axis as desired and can thereby always look towards the ROI and keep it horizontally in the center of the FOV. This is possible with the real UAV as well, as it can turn at high speed and in any orientation relative to the actual direction of flight. From this, it follows that the ROI is visible from the viewpoint when both are at the same height (z_UAV = z_ROI). For the vertical visibility test, the condition was defined that the ROI must always be within the FOV. This relationship is visualized in Figure 5. The angle φ in Figure 5 is made up of the distance and the relative height difference (Δy) of the viewpoint. If the value of φ is smaller than half of the FOV, the ROI is visible from the viewpoint.
To simplify the test, the camera is fictitiously positioned at the center point R_0 of the ROI and translated with an offset of Δr in the direction of p. The values Δr and φ can be derived from the equations:

$$\Delta r = \frac{h_{ROI}}{\tan\left(\frac{FOV}{2}\right)}$$

$$\varphi = \tan^{-1}\left(\frac{\Delta y}{r - \Delta r}\right)$$
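A small numerical sketch of this vertical visibility test (assuming the UAV always yaws to keep the ROI horizontally centered, as stated above, and that h_ROI is the ROI extent used in the Δr offset; the parameter values are hypothetical):

```python
import math

def roi_vertically_visible(delta_y, r, h_roi, fov_deg):
    """Check whether the ROI stays within the vertical FOV seen from a viewpoint.

    delta_y : height difference between the viewpoint and the ROI center
    r       : distance of the viewpoint from the ROI (radius of the safety sphere)
    h_roi   : ROI extent used for the offset Delta_r of the fictitious camera
    fov_deg : vertical field of view of the camera in degrees
    """
    fov = math.radians(fov_deg)
    delta_r = h_roi / math.tan(fov / 2.0)        # offset of the fictitious camera towards p
    phi = math.atan2(abs(delta_y), r - delta_r)  # viewing angle (meaningful for r > delta_r)
    return phi < fov / 2.0

print(roi_vertically_visible(delta_y=0.4, r=1.5, h_roi=0.1, fov_deg=60.0))
```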
To summarize the most important findings and objectives for the optimization: the optimal position p of a viewpoint maximizes the objective function f(p), which is subject to several constraints. The mathematical formulation is as follows:

$$
\begin{aligned}
\text{Maximize:} \quad & f(p) \\
\text{subject to:} \quad & p \in S && \text{(the position lies within the defined operational area)} \\
& p \notin C && \text{(the position lies outside the restricted areas } C\text{)} \\
& O(p) = 0 && \text{(no occlusions are present at point } p\text{)} \\
& \varphi < \tfrac{FOV}{2} && \text{(the viewing angle } \varphi \text{ remains within half the FOV)}
\end{aligned}
$$
Here, the binary function O(p) indicates whether the viewpoint is obstructed by obstacles (O(p) = 1) or visible (O(p) = 0). If it is obstructed and thus not visible, the respective viewpoint is not considered as an optimal viewpoint. The objective function is thus continuous, with a local maximum resulting from the non-linear constraints such as collisions and occlusions. As a search algorithm, hill climbing is chosen due to its short search time; it is executed in parallel at a frequency of 10 Hz. Hill climbing is an optimization technique from the local search family that delivers suitable solutions within a comparatively short time frame without requiring detailed knowledge of the specific application. It is related to the gradient methods, which in turn originate from the greedy algorithms [47]. It is an iterative, metaheuristic algorithm in which candidate function values are evaluated at a fixed step from the current solution. If a candidate increases the current best value of the target function, it is saved as the new local optimum. This process continues until no further improvement can be made or the termination criterion is reached. In our case, the termination criterion is defined as a maximum number of iterations of the hill climber without finding any viewpoint that improves f(p). The search step size changes with each iteration, so that a variety of viewpoints in an area can be evaluated. In terms of time complexity, finding a local optimum may take an indefinite amount of time, as the algorithm continuously makes small steps with minimal improvements in the objective function f(p). However, by using a finite search space and an appropriate step size, the hill climbing algorithm can quickly converge to a local optimum, with the speed of convergence being strongly dependent on a reasonable starting point. Similar to greedy algorithms, hill climbing is frequently used in various works focused on optimal viewpoint search, as found in Refs. [48,49,50].
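A compact sketch of such a hill climber (written in Python rather than the Unity/C# implementation; the step sizes, the shrinking schedule, and the constraint callback is_valid, which would bundle the collision, occlusion, and FOV checks, are our own assumptions):

```python
import numpy as np

def hill_climb(f, p0, is_valid, step=0.4, shrink=0.5, min_step=0.05, max_stalls=20):
    """Greedy local search for a viewpoint p that maximizes the coverage f(p).

    f        : objective function, e.g. the perspective coverage f(p)
    p0       : starting viewpoint (one of the predefined start points)
    is_valid : callback checking the constraints p in S, p not in C,
               O(p) = 0 and phi < FOV/2
    """
    p_best = np.asarray(p0, dtype=float)
    f_best = f(p_best) if is_valid(p_best) else -np.inf
    stalls = 0
    while stalls < max_stalls:
        improved = False
        for axis in range(3):                    # probe neighbors along +/- x, y and z
            for sign in (+1.0, -1.0):
                cand = p_best.copy()
                cand[axis] += sign * step
                if is_valid(cand) and f(cand) > f_best:
                    p_best, f_best, improved = cand, f(cand), True
        if improved:
            stalls = 0                           # keep searching around the better point
        else:
            stalls += 1
            step = max(step * shrink, min_step)  # vary the step size between iterations
    return p_best, f_best
```

In the method described above, one such search would be started from each of the starting points introduced in the next subsection, and the resulting local optima form the candidate viewpoints that are compared in Section 3.6.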
Figure 6 shows a schematic representation of the program sequence using the hill climber algorithm. The green box contains the restrictions to be observed for a viewpoint. However, since hill climbing only provides local maxima and the method for continuous comparison of optimal viewpoints is intended to compare several viewpoints with each other, potentially suitable starting points are determined on the basis of an investigation.

3.4. Starting Points for the Hill Climber

To define the starting viewpoints, values of f(p) are first examined for local maxima in order to determine the global minima and maxima around the ROI. The viewpoints are set to have a distance of radius r from the ROI and thereby form a spherical shell around the ROI. Positions that are not within the FOV were neither calculated nor visualized. Figure 7 shows the result of the investigation of the minimum and maximum values of f(p).
The perspective coverage decreases with a side view, meaning that f(p) converges to f(p) = 0 as the perspective rotates toward the side view of the ROI. Consequently, the global minimum is located at the exact side view. In contrast, the global maximum is found on the optical axis that extends the normal vector at the center of the plane, as indicated by the blue sphere in Figure 7. While this may seem intuitive, it can only be obtained by eliminating the distortion of the projection of the ROI on the NCP (see Figure 4). Without this correction, moving the ROI visually further to the edge of the NCP maximizes the projected area A_proj and thus f(p), so that the optimum for f(p) would lie at the edge of the region from which the ROI is visible. For later use, such edge viewpoints have the drawback of placing the ROI at the edge of the camera image, which could be problematic in dynamic scenarios. In technical applications with slow movements, such as presence detection, this issue is less relevant. However, in faster-moving scenarios, the UAV's inertia may prevent the camera from keeping the ROI fully centered, causing it to lose track of the ROI. For teleoperation tasks, this misalignment could also be uncomfortable, as a user wants to see an important object in the center of the screen.
The investigation of the global maximum distributed around the ROI, as shown in Figure 7, provides the result that the highest perspective coverage of a viewpoint is to be expected with a frontal view and on a horizontal line to the ROI, also called the optical axis. Therefore, six starting points for the local search are chosen: one starting point per side of the ROI in extension to the plane normal of the ROI and two at an angle of ±45° in the horizontal center of the ROI. Figure 8 shows the positions for the starting points.
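One plausible construction of these six start points (our reading of the layout in Figure 8: the two ROI plane normals plus ±45° horizontal rotations of each; the helper names and the Rodrigues rotation are our own):

```python
import numpy as np

def rotate_about(axis, v, deg):
    """Rodrigues rotation of vector v around a unit axis by deg degrees."""
    axis = axis / np.linalg.norm(axis)
    a = np.radians(deg)
    return (v * np.cos(a) + np.cross(axis, v) * np.sin(a)
            + axis * np.dot(axis, v) * (1.0 - np.cos(a)))

def start_points(roi_center, roi_normal, up, r):
    """Six hill-climber start points at radius r on the safety sphere around the ROI."""
    n = roi_normal / np.linalg.norm(roi_normal)
    directions = [n, -n]                            # one start point per side of the ROI plane
    for base in (n, -n):
        for angle in (45.0, -45.0):                 # two additional points at +/-45 degrees
            directions.append(rotate_about(up, base, angle))
    return [roi_center + r * d for d in directions]

for p in start_points(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                      np.array([0.0, 0.0, 1.0]), r=1.5):
    print(np.round(p, 3))
```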

3.5. Collision-Free Path Planning

Since a safety sphere encompasses the ROI, the shortest paths between opposite points inherently lie on the sphere's surface. However, traditional path planning methods using sampling-based search algorithms like Rapidly-exploring Random Tree (RRT) or A-star (A*) present significant computational challenges in three-dimensional space, particularly for dynamic scenarios requiring rapid path computation [51]. Preliminary investigations revealed substantial limitations of sampling-based search methods: the computational effort is infeasible for real-time path planning, and the generated paths differ too much between successive planning runs. Consequently, these path solutions induce unstable flight behaviors, characterized by frequent and abrupt directional changes [52]. To address these challenges, a hybrid approach was developed for determining movement paths on the safety sphere, based on tangent graph or geometric path planning, as presented in Refs. [53,54]. With the tangent graph method, visible tangential lines between corners and faces of geometric primitives (e.g., circles, polygons, ellipses) can be planned as shortest paths and converted into a robot trajectory. These primitives represent obstacles in the environment. The approach combines elements of graph theory with geometric optimization to find an efficient, collision-free trajectory that is locally smooth and globally optimized. The use of visible tangents reduces the search space and makes the method computationally efficient. As previously mentioned, the primitive around which we actively plan a path is the sphere encompassing the ROI. The boxes and capsules, being collision zones outside our application, are excluded from active path planning in the initial step for simplification. Since we could not find a comparable approach by other authors using this method for path planning around a sphere, we present our approach in detail.
To derive path points on the sphere, we employ the Great-Circle Distance method, which calculates the shortest distance between two points on a spherical surface, also known as the Orthodromic Distance [55]. The Great-Circle Distance defines the shortest path between two points on a spherical surface. This path lies within the plane formed by the starting point, the endpoint, and the center of the sphere (see Figure 9). The start point p_start and end point p_end are rotated using the plane's normal vector N_normal to align with the XZ plane, effectively setting the y-coordinate to zero. Transforming both points to lie on the XZ plane offers a critical methodological advantage: it reduces a complex three-dimensional path-finding problem to a more manageable two-dimensional representation.
We consider scenarios where p_start and p_end may not lie directly on the safety sphere's surface but instead lie at a defined distance from its boundary, as visible in Figure 10. In such cases, determining the shortest spherical path requires calculating the first tangential vector from p_start to t_1 and the second vector from p_end to t_2. By leveraging p_start and p_end, which have been previously transformed to the XZ plane, we employ the Thales circle method to derive these tangential vectors.
This geometric approach enables precise computation of the tangential points, facilitating accurate path planning for the UAV. First, the hypotenuses of the blue (r_1) and red (r_0) triangles are formed from the geometric relationships.
$$r_0^2 = a^2 + h^2$$

$$r_1^2 = (d - a)^2 + h^2$$

To obtain values for h and a, Equation (12) is then converted to h²:

$$h^2 = r_0^2 - a^2$$
Substituting this into Equation (13) and solving for a results in:

$$a = \frac{r_0^2 - r_1^2 + d^2}{2d}$$

For h, it follows that:

$$h = \sqrt{r_0^2 - a^2}$$
The variables t_x and t_y are then determined from the ratios of a and h to d, multiplied by the vector from p to R:

$$t_x = p_{start,x} + \frac{a}{d}\,(p_{start,x} - R_x)$$

$$t_y = p_{start,y} + \frac{h}{d}\,(p_{start,y} - R_y)$$
The total distance l is therefore made up of the linear length l_1 of the first tangential vector, the arc segment l_arc, and the second linear length l_2 of the second tangential vector. The angle α lies between the vectors from R to t_1 and from R to t_2.

$$l_{arc} = r \cdot \frac{\pi}{180} \cdot \alpha$$

$$l = l_1 + l_{arc} + l_2$$
Finally, the waypoints located on the linear paths can be interpolated with a specified segment length, while the waypoints on the arc segment can be systematically divided into discrete path segments, enabling precise waypoint generation for the UAV's trajectory. By introducing a specified angular parameter θ, the angle from the points t_1 and t_2 to the X-axis is obtained:

$$\theta = \operatorname{arctan2}(z, x)$$

A normalization of θ to the interval [0, 2π] is given by:

$$\theta = (\theta + 2\pi) \bmod 2\pi$$

The increment Δθ of the circle segments is then calculated using the specified number of circle segments n_segments:

$$\Delta\theta = \frac{\theta}{n_{segments}}$$

For the calculation of the angle θ_i of the respective circle point, it follows that:

$$\theta_i = \theta_{start} + i \cdot \Delta\theta$$

And for the circle points:

$$x_i = r\cos(\theta_{start} + i\,\Delta\theta), \qquad z_i = r\sin(\theta_{start} + i\,\Delta\theta)$$
The y-coordinate remains constant at zero, as the calculation is carried out in the planar XZ space. The waypoints of the linear and arc segments are then, finally, rotated back to the original orientation defined by N_normal.
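To make the in-plane construction concrete, the following sketch computes the tangent points and the waypoint sequence in the 2D plane spanned by p_start, p_end, and the sphere center (placed at the origin). The rotation into and back out of this plane, the interpolation of the linear segments, and the case in which the direct line does not intersect the sphere are omitted; the closed-form tangent-point formula used here is equivalent to the Thales-circle construction described above, and all names are our own:

```python
import numpy as np

def tangent_points(p, r):
    """Both tangent points from an external 2D point p to a circle of radius r
    centered at the origin (closed form of the Thales-circle construction)."""
    d = np.linalg.norm(p)
    tangent_len = np.sqrt(d**2 - r**2)
    along = (r**2 / d**2) * p                          # component towards p
    perp = (r * tangent_len / d**2) * np.array([-p[1], p[0]])
    return along + perp, along - perp

def arc_waypoints(t1, t2, r, n_segments):
    """Discretize the shorter arc between two points on the circle."""
    th1 = np.arctan2(t1[1], t1[0]) % (2 * np.pi)
    th2 = np.arctan2(t2[1], t2[0]) % (2 * np.pi)
    dth = (th2 - th1 + np.pi) % (2 * np.pi) - np.pi    # signed shorter angular difference
    step = dth / n_segments
    return [r * np.array([np.cos(th1 + i * step), np.sin(th1 + i * step)])
            for i in range(n_segments + 1)]

def plan_path(p_start, p_end, r, n_segments=10):
    """Linear segment to the sphere, arc around it, linear segment to the goal;
    the tangent pair giving the shortest total length is selected."""
    best_length, best_waypoints = None, None
    for t1 in tangent_points(p_start, r):
        for t2 in tangent_points(p_end, r):
            arc = arc_waypoints(t1, t2, r, n_segments)
            length = (np.linalg.norm(t1 - p_start) + np.linalg.norm(p_end - t2)
                      + sum(np.linalg.norm(arc[i + 1] - arc[i]) for i in range(len(arc) - 1)))
            if best_length is None or length < best_length:
                best_length, best_waypoints = length, [p_start] + arc + [p_end]
    return best_length, best_waypoints

length, waypoints = plan_path(np.array([-3.0, 0.5]), np.array([3.0, -0.5]), r=1.0)
print(round(length, 3), len(waypoints))
```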
After receiving the waypoints for travelling to a viewpoint, the collisions of the waypoints with the environment are analyzed. With the integrated method of the Bounds class in Unity, collisions are detected for box and capsule colliders in the scene. If any of the waypoints collide (p ∈ C), the respective viewpoint is discarded and no longer considered as a possible optimal viewpoint. For the collision-free waypoints, an average value f(p)_avg over all waypoints of a path is calculated, which is needed for the selection of the optimal viewpoint in the next step. Waypoints, indexed with i, from which the target is not visible or is occluded are assigned f(p_i) = 0. The following applies for the total number of path points n:

$$f(p)_{avg} = \frac{\sum_{i=1}^{n} f(p_i)}{n}$$
The determination of occlusions of the ROI from a respective waypoint is carried out by ray casting from the corners c_1–c_4 and the center of the ROI to the respective waypoint. Ray casting is performed with Unity's built-in function.
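A minimal sketch of this averaging step, with the coverage and occlusion evaluations abstracted into callbacks (in the Unity implementation, the occlusion test corresponds to the raycasts described above):

```python
def average_path_coverage(waypoints, coverage, occluded):
    """Average f(p) over all waypoints of a path; occluded waypoints count as zero."""
    values = [0.0 if occluded(p) else coverage(p) for p in waypoints]
    return sum(values) / len(values)

# usage with dummy callbacks: every second waypoint is treated as occluded
waypoints = list(range(6))
print(average_path_coverage(waypoints,
                            coverage=lambda p: 2.0,
                            occluded=lambda p: p % 2 == 1))
```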

3.6. Decision for the Optimal Viewpoint

The core of our method is the selection of the optimal viewpoint from the available ones, and thereby the following of a viewpoint that maximizes f(p). The basis for this is the availability of several viewpoints, each of which is itself a local optimum of f(p) and which thus represent several options for an optimal viewpoint. The central selection criterion is the cost of reaching a viewpoint. An analytical approach is chosen, in which the value for the perspective coverage of a viewpoint over a given time segment Δt is computed, written as f(p)_Δt. The value of Δt is set to 5 s, which is the maximum time needed to reach the viewpoint with the largest path length in preliminary experiments with the UAV. Here, Δt is split into two components: the dynamic portion t_dynamic, representing the time the UAV spends moving to the viewpoint, and the static portion t_static, representing the time the UAV remains stationary at the viewpoint. While t_static is multiplied by f(p) at the viewpoint, the dynamic component is multiplied by the average perspective coverage f(p)_avg for the respective path. With this method, we obtain an approximated average coverage for the path, considering the portions of changing and following a viewpoint in the application.

$$\Delta t = t_{static} + t_{dynamic}$$

$$f(p)_{\Delta t} = f(p) \cdot t_{static} + f(p)_{avg} \cdot t_{dynamic}$$
The determination of t_dynamic is based on the kinematic equations of motion for an acceleration a of ±2.5 m/s² and a fixed maximum speed v_max of 2 m/s. The value for a was determined in measurements and was found to be nearly constant for the maximum speed [17]. It should be noted that this is a strong simplification, which neglects dynamic forces, the time required for fine referencing at a viewpoint, and the delay caused by leaving the planned trajectory. The consideration of these factors would exceed the scope of this work and is therefore left for further development. Knowing the path length l, the duration t_dynamic can be calculated using the well-known kinematic formulas.
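With these values, t_dynamic follows from a symmetric trapezoidal velocity profile; the sketch below is our reading of these "common and known formulas" and is not taken from the paper:

```python
import math

def travel_time(path_length, a=2.5, v_max=2.0):
    """Travel time along a path of given length [m] with symmetric acceleration and
    deceleration (trapezoidal velocity profile, triangular if v_max is never reached)."""
    d_accel = v_max**2 / (2.0 * a)            # distance needed to accelerate to v_max
    if path_length < 2.0 * d_accel:           # triangular profile: v_max is never reached
        return 2.0 * math.sqrt(path_length / a)
    cruise = path_length - 2.0 * d_accel      # distance travelled at constant v_max
    return 2.0 * v_max / a + cruise / v_max

print(travel_time(3.0))   # hypothetical 3 m path -> 2.3 s
```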
From this consideration, it follows that the viewpoint with the highest value of f(p)_Δt within the time interval also provides the highest perspective coverage. However, as viewpoints are sometimes found at the same local maxima, the flight time and perspective coverage are similar for the respective viewpoints, and so is the value of f(p)_Δt. Small differences in f(p)_Δt, though, lead to unstable flight behavior, because the direction of movement and the acceleration change frequently as the position of the UAV changes. To select a stable viewpoint, a threshold method is implemented that provides a stable selection of a viewpoint. The threshold method keeps the current viewpoint and only replaces it with the next viewpoint if the next f(p)_Δt is at least 10% greater than the current f(p)_Δt.

$$\tau = \frac{f(p)_{\Delta t,\,next}}{f(p)_{\Delta t,\,current}}$$

$$p_{current} = \begin{cases} p_{next} & \text{if } \tau > 1.1 \\ p_{current} & \text{else} \end{cases}$$
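Putting the time-window score and the threshold rule together, the selection step can be sketched as follows (the dictionary structure, candidate names, and example numbers are ours and purely illustrative):

```python
def score(f_vp, f_avg_path, t_dynamic, dt=5.0):
    """Approximated coverage over the window dt: the UAV moves for t_dynamic with the
    average path coverage and holds the viewpoint coverage for the remaining time."""
    t_static = max(dt - t_dynamic, 0.0)
    return f_vp * t_static + f_avg_path * t_dynamic

def select_viewpoint(current, candidates, threshold=0.10):
    """Keep the current viewpoint unless a candidate scores at least 10% higher."""
    best = max(candidates, key=lambda c: c["score"])
    if current is None or best["score"] > (1.0 + threshold) * current["score"]:
        return best
    return current

current = {"name": "current viewpoint", "score": score(2.1, 1.5, 0.0)}
candidates = [current,
              {"name": "alternative viewpoint", "score": score(2.2, 1.2, 2.3)}]
print(select_viewpoint(current, candidates)["name"])   # stays at the current viewpoint
```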
The current viewpoint is then passed to the UAV controller. If no viewpoint is available, a safe position is passed to the UAV controller instead. The safe position is a manually defined static position above the robot (see Figure 11). Defining the safe position above the robot gives the decisive advantage that the UAV returns to a position from which many viewpoints can be reached. From above the robot, following the course of the spherical surface, fewer path collisions are to be expected and the UAV usually has a viewpoint available. However, this approach is probably unsuitable for complex cells, as collisions must always be expected, even on paths to the safe position. The constant availability of a viewpoint and collision-free path planning are therefore the most essential elements of our approach.
After selecting the optimal viewpoint, the corresponding waypoints are transferred to the UAV controller. The guidance logic relies on the waypoint hyperplane condition, a nonlinear method that ensures the UAV smoothly transitions between waypoints, introduced in Ref. [56]. This method works by specifying a target point, or lookahead point, from the determined waypoints. The lookahead point is the nearest waypoint to the UAV with a distance greater than the specified lookahead distance d_min = 0.3 m. The specified d_min ensures a smooth transition without abrupt directional changes for the UAV. If there is no waypoint at this distance, the viewpoint itself is approached directly.
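The lookahead-point selection itself can be sketched as below; the hyperplane-condition guidance law of Ref. [56] is not reproduced here, and the function name is ours:

```python
import numpy as np

def lookahead_point(uav_pos, waypoints, viewpoint, d_min=0.3):
    """Nearest waypoint farther than d_min from the UAV; if no such waypoint exists,
    the viewpoint itself is approached directly."""
    candidates = [w for w in waypoints if np.linalg.norm(w - uav_pos) > d_min]
    if not candidates:
        return viewpoint
    return min(candidates, key=lambda w: np.linalg.norm(w - uav_pos))

# usage with three waypoints on a straight line towards the viewpoint
path = [np.array([0.1, 0.0, 0.0]), np.array([0.5, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])]
print(lookahead_point(np.zeros(3), path, viewpoint=np.array([1.5, 0.0, 0.0])))
```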

3.7. Implementation of a State Machine

A state machine is implemented to ensure stable, predictable, and safe flight behavior, even in situations with changing or temporarily unavailable viewpoints. It distinguishes between the states following, changing, hovering, and safe position. Through clearly defined entry and exit conditions for each state, the state machine ensures stable flight behavior of the UAV without abruptly changing the viewpoint or flight direction. Detailed explanations of the states can be found in Table 2.
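A compact sketch of such a state machine is given below; the transition conditions are our simplified reading of the behavior described in this section, not a reproduction of Table 2:

```python
from enum import Enum, auto

class UAVState(Enum):
    FOLLOWING = auto()      # tracking the currently selected optimal viewpoint
    CHANGING = auto()       # flying the planned waypoints towards a newly selected viewpoint
    HOVERING = auto()       # holding the current position
    SAFE_POSITION = auto()  # retreating to the manually defined position above the robot

def next_state(state, viewpoint_available, viewpoint_changed, at_viewpoint):
    """Simplified transition logic with clearly defined entry conditions per state."""
    if not viewpoint_available:
        return UAVState.SAFE_POSITION        # no reachable, occlusion-free viewpoint
    if viewpoint_changed:
        return UAVState.CHANGING             # a better viewpoint passed the threshold test
    if state == UAVState.CHANGING and not at_viewpoint:
        return UAVState.CHANGING             # keep flying until the new viewpoint is reached
    if at_viewpoint or state == UAVState.FOLLOWING:
        return UAVState.FOLLOWING            # at the viewpoint: keep tracking the moving ROI
    return UAVState.HOVERING

print(next_state(UAVState.FOLLOWING, viewpoint_available=True,
                 viewpoint_changed=False, at_viewpoint=True))
```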

4. Experimental Results and Discussion

To validate our method, two applications are implemented in Unity: a palletizing task and a welding application for a car body. Both tasks are common for industrial robots and pose significant challenges for UAV applications, and thus for gaining insight into dynamic robot processes within a dense environment with many obstacles. The ROI is manually defined by the user and linked to the position of the TCP, following the end effector of the robot. The robot's joint data are sent to Unity via the RESTful API interface of the robot's virtual controller, which runs in ABB RobotStudio, and are synchronized with the robot's digital twin in Unity at 50 Hz. In addition to the UAV and the robot, the cell consists of manipulation objects (boxes), the ROI, and multiple box and capsule colliders. The colliders are assumed to have an offset of 40 cm from the colliding surface of the enclosed object, which seems to represent a realistic safety distance in a UAV application.

4.1. Results of the Palletizing Application

Figure 12 shows the static collision objects in the side (a) and top-down view (b) for the palletizing cell, showing the comparatively small movement space of the UAV, and thus challenges for dynamic camera planning. The turquoise sphere holds the ROI in the center and describes the minimum distance of the UAV from the ROI. The purple capsule colliders form the protective volume of the robot and are also prohibited for the movement path of the UAV.
At the beginning of the routine, the UAV is placed at a location defined by the user. In our case, this is the entrance area of the cell, visualized by a door, as sufficient space is available there. However, the starting location is irrelevant, as the UAV shows a similar orientation in the cell and similar tracking of the optimal viewpoint after a short time for different starting locations. When the program is started, the light blue viewpoint is selected as the optimal viewpoint, which lies on the optical axis of the ROI. In this application example, the choice of our take-off location merely shortens the time it takes to reach the light blue viewpoint. When placing the UAV on the other side of the ROI, our method makes the UAV switch to the side of the entrance, as the average score is higher and viewpoints there are not occluded by the conveyor belt. Figure 13a shows a screenshot from the program sequence in which the UAV selects the light blue viewpoint and films from the desired perspective. It can be seen that the UAV stays in line with the optical axis of the ROI and moves along with it as the robot palletizes boxes.
To compare the application results with traditional inspection solutions, four static cameras are attached to the fencing, as shown in Figure 14. The static cameras are parameterized identically to the camera of the UAV and thus allow a comparison of the perspective coverage. The locations of the static cameras are chosen to allow the best possible view and thus provide a meaningful comparison of the application results.
The result of our comparison is summarized in Table 3. With an average perspective coverage of 2.166%, the UAV offers significantly higher perspective coverage than camera 1, the best static camera in the scenario (0.850%). As displayed in the following diagrams, the UAV remains exclusively in the following state; there are no changes to other viewpoints and no safe position states. This result can be classified as ideal, since continuously following a viewpoint yields the highest perspective coverage, and because no viewpoint changes or safe position flights occur, the UAV maintains a view of the ROI throughout the scenario. Although an average perspective coverage of around 2.166% seems low, the actual size of visible areas in an image relative to the entire image size is deceptive. Figure 13b therefore shows the perspective on the ROI from the UAV's viewpoint as an example; the maximum theoretical coverage under the given camera constraints and safety-sphere radius is 2.513%.
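To illustrate why single-digit coverage percentages are to be expected, the following back-of-the-envelope calculation relates an ROI to the visible image plane of a pinhole camera; all numbers are assumed for illustration and are not the parameters of our setup.

```python
import math

# Illustrative numbers only -- not the parameters used in the paper.
fov_h, fov_v = math.radians(60.0), math.radians(45.0)   # assumed camera field of view
distance = 1.5                                          # assumed UAV-to-ROI distance in metres
roi_w, roi_h = 0.20, 0.20                               # assumed ROI edge lengths in metres

# Size of the visible plane at the ROI distance for a pinhole camera model.
view_w = 2.0 * distance * math.tan(fov_h / 2.0)
view_h = 2.0 * distance * math.tan(fov_v / 2.0)

coverage = (roi_w * roi_h) / (view_w * view_h)
print(f"perspective coverage = {coverage:.2%}")          # about 1.9% for these assumptions
```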
Figure 15 illustrates the increase and decrease of the UAV's perspective coverage, which is related to the UAV control steering the system to the position of the optimal viewpoint with a slight delay. While the UAV must be physically accelerated and decelerated, the viewpoint can jump instantaneously to the next local maximum. As a result, the UAV always flies slightly behind the viewpoint and thereby loses perspective coverage. It can also be seen that the viewpoint with the highest perspective coverage is not always the light blue viewpoint. However, since our method uses a threshold in the decision for a viewpoint, slight differences in perspective coverage do not lead to a viewpoint change; instead, the UAV remains at the light blue viewpoint, as can be seen in Figure 16.
Figure 16 shows a stable progression of the score with only minor changes over the entire routine. The light blue viewpoint is consistently selected as the optimal viewpoint, which is plausible for this application: it is unoccluded and reachable by the UAV at any time, and any viewpoint change would likely lower the perspective coverage. The results show that for small and slow movements of the ROI, without occlusions or rotations of the ROI, dynamic camera planning with a UAV can be implemented well with our method. By analytically determining a score, viewpoints are evaluated in terms of their cost and the achievable perspective coverage, and a strategy can be determined for a given time window that compensates for small fluctuations in the highest perspective coverage and ensures stable coverage.
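A simplified sketch of this threshold-based decision rule is shown below, using a windowed score average and an assumed relative margin of 10% (cf. the decision threshold described in the Conclusions); it is a conceptual abstraction rather than the implemented code.

```python
import statistics

SCORE_THRESHOLD = 0.10   # assumed relative margin; a change needs a clearly better score

def choose_viewpoint(current_id, windowed_scores):
    """Keep the current viewpoint unless another one is better by the threshold.

    windowed_scores maps viewpoint id -> list of scores over the decision window.
    Simplified sketch of the decision rule, not the authors' exact implementation.
    """
    averages = {vp: statistics.fmean(scores)
                for vp, scores in windowed_scores.items() if scores}
    if not averages:
        return None                       # no available viewpoint: hover / safe position
    best_id = max(averages, key=averages.get)
    if current_id not in averages:
        return best_id
    if averages[best_id] > averages[current_id] * (1.0 + SCORE_THRESHOLD):
        return best_id
    return current_id                     # small fluctuations do not trigger a change
```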

4.2. Results of the Welding Application

In addition to evaluating our approach in a palletizing application, we benchmark it in a considerably more complex welding application, presented in Figure 17. The welding application is designed such that the TCP moves along the chassis of the car body, visible in Figure 17b. The TCP starts at contact point no. 1 at the front of the car and follows the sequence of numbered contact points. At each contact point, the ROI is rotated to the displayed perspective, thereby forcing the UAV to change its viewpoint. The rotation of the ROI is executed within Unity as soon as the robot reaches a contact point. Figure 17a clearly shows the UAV performing evasive movements to the safe position above the robot during flight. While the UAV's path in the front area of the car differs only slightly, the path in the area of the vehicle's door deviates significantly and the UAV's movement appears uncontrolled.
As shown in the following diagrams, the UAV loses the optimal viewpoint at contact point no. 4 because it cannot respond quickly enough. At this point, our method outputs an optimal viewpoint above the car's roof. Consequently, the UAV must perform a significant reorientation, moving from a position beside the robot to a position above the car's roof to avoid colliding with the robot arm. However, since the direct and shortest path passes through the capsule collider, the UAV cannot find a feasible route and instead switches to the safe position state. This behavior can be observed in Figure 17a, where the vertical flight path above the robot indicates the UAV's movement to the safe position. Since there is no feedback mechanism to inform the robot whether the UAV has reached the designated viewpoint, the robot proceeds to contact point no. 5 without waiting for the UAV. As a result, the optimal viewpoint cannot be reached, causing the UAV to oscillate between different viewpoints and repeatedly return to the safe position. This behavior, caused by the inaccessibility of the target position and the relatively rapid reorientation required, significantly disrupts the UAV's flight performance: the UAV cannot reach the viewpoint quickly enough, which severely impairs its overall effectiveness.
Table 4 shows the results for this application. With an average perspective coverage of 1.667%, the UAV still gains substantial sight of the ROI. Compared to the static cameras installed on the fencing, this remains an acceptable result. The large difference between the perspective coverage of the UAV and that of the static cameras is due to the comparatively large distance of the static cameras, which are attached to the more distant fencing, from the ROI. It is debatable whether real cameras in this case would be equipped with a lens and corresponding zoom to capture the scene with higher perspective coverage despite the greater distance.
The evaluation of the perspective coverage in Figure 18 reveals two segments of about 5 and 12 s without any sight of the ROI. With constantly high perspective coverage, the UAV initially follows the ROI at the light blue viewpoint and loses sight of the ROI at about 20 s with a change to the violet position. Only at approximately 45 s does the UAV regain perspective coverage, switch back to the following state, and regain sight of the ROI.
The segments without perspective coverage appear congruently in Figure 19. Here, frequent changes of the optimal viewpoint indicate unstable flight behavior: the system does not settle on a viewpoint, which leads to frequent changes of movement direction and, ultimately, to the safe position state. Jumps from about second 22 from the light blue viewpoint to the violet viewpoint, to the blue viewpoint, and back are evidence of erratic flight behavior, which, as already explained, is reflected in the perspective coverage. Black segments in the timeline of Figure 19 indicate that no viewpoint was available at that time and the safe position was approached.
Figure 20 provides an additional overview of the resulting scores of all viewpoints from 22 to 40 s, together with the respective state of the UAV. Flights to the safe position occur in this range even though the perspective coverage is high for all viewpoints and the viewpoints themselves are available (f(p) > 0). The reason is that waypoints along the paths may overlap with the environment or, in our case, with the robot's capsule colliders; in this situation, the UAV cannot find a viable path from its current position to any of the defined viewpoints.
To sum up the findings of our method's implementation, the defined target requirements (see Table 1) are largely fulfilled. The implementation in Unity already shows a satisfactory perspective coverage for the palletizing application. Because the method is measurable in terms of perspective coverage over time, solid statements can be made about its effectiveness. The hill-climbing method used to search for local optima finds the optimal viewpoint efficiently, even for larger step sizes of the algorithm. In the welding application in particular, the starting points of the viewpoint search are often occluded by the car body; as a result, the search identifies collision-free and occlusion-free viewpoints above the body. The path planning method described here has proven very promising for the application, as a path can be found time-efficiently, without sudden changes in direction, and with collisions taken into account. Determining a consistent shortest path is particularly important in order to avoid changes in the direction of movement caused by changing paths.
After extensive analysis, our method for calculating the score provides comprehensible decision features for a specified time window despite its simplicity. By defining the score over a time window and applying a threshold, stable decision behavior for the optimal viewpoint can be observed. The behavior of the UAV is further supported by the implementation of the state machine. From the experiments conducted, it can be concluded that the definition of a safe position and the specification of states contribute significantly to dynamic camera planning.

4.3. Limitations and Future Work

Nevertheless, the limitations of our method become apparent in the welding application. The greatest influence on perspective coverage is exerted by flights to the safe position, which occur when no available viewpoint can be determined for the UAV. When approaching the safe position, the UAV completely loses sight of the ROI. We initially placed the safe position above the robot to ensure a position from which the UAV can free itself from situations in which it is stuck. Depending on the application, however, it may be worth moving the safe position outside the robot's workspace to the same height as the ROI. This would allow the UAV to observe the ROI even when the safe position state is triggered. The safe position must therefore be selected carefully so that it is reachable at minimal cost for the given application.
Since the hovering state is triggered when no path to a viewpoint is available, extending the implemented path search around the sphere with additional section planes could be an effective solution. Using our score method, multiple alternative paths could then be compared by their average perspective coverage, making it more likely to find an available path, possibly even one with a higher score than the great-circle path. It would also be interesting to extend the path search to other geometric primitives, such as the box colliders or the capsule colliders of the robot.
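A sketch of how several candidate paths could be compared by their mean perspective coverage is given below; the coverage evaluation is passed in as a callable, and all names are illustrative rather than part of the implementation.

```python
import numpy as np

def average_path_coverage(path_waypoints, coverage_at):
    """Mean perspective coverage along one candidate path.

    coverage_at(point) is a user-supplied callable assumed to return the
    perspective coverage the camera would achieve from that point.
    """
    return float(np.mean([coverage_at(np.asarray(p, dtype=float)) for p in path_waypoints]))

def pick_best_path(candidate_paths, coverage_at):
    """Among several collision-free candidate paths (e.g., generated on different
    section planes of the safety sphere), pick the one with the highest mean coverage."""
    scored = [(average_path_coverage(path, coverage_at), path) for path in candidate_paths]
    best_score, best_path = max(scored, key=lambda item: item[0])
    return best_path, best_score
```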
In addition to improving the handling of the safe position state, the application highlights the importance of minimizing sudden, short-term changes in the UAV's movement direction. A sudden change of direction with strong acceleration rotates the system, which can briefly cause the ROI to leave the field of view and also degrades flight stability, compromising visual continuity and the operator's ability to maintain a clear view. Ensuring stable and steady camera movement is therefore crucial for achieving optimal performance.
In addition to the suggestions already mentioned, we see the greatest potential for continuing our work in implementing a method that exploits future knowledge of the application. Currently, in accordance with our greedy strategy, the viewpoint with the highest score, subject to the added threshold, is always selected. This does not account for the fact that an optimal viewpoint, even if it has a high score at a given moment, may exist only briefly. This circumstance results in fundamental problems for applications such as the welding application:
  • If the viewpoint disappears shortly after the UAV switches to the changing viewpoint state, without having been reached by the UAV, the theoretical perspective coverage calculated in the score will not be achieved.
  • The UAV could reach a point in the routine where suddenly no other viewpoint is available and it gets stuck.
  • The UAV does not recognize that changing to a viewpoint that is worse at this moment may provide a higher perspective coverage in the future.
These challenges could be addressed by incorporating future knowledge of the robot’s position, the ROI, optimal viewpoints, and potential collisions. By predicting the optimal viewpoints and their perspective coverage at specific points in the robot’s routine, it would be possible to devise a flight path strategy that maximizes perspective coverage when executed by the UAV. This approach would also account for the time required to adjust the viewpoint and the temporary loss of coverage during transitions. As part of our future work, we plan to develop and implement a method that integrates this predictive knowledge into the UAV’s dynamic planning process.
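As a conceptual illustration of such a predictive strategy, the following dynamic-programming sketch selects a viewpoint sequence over a discretized robot routine, assuming that predicted per-step coverage values and transition losses are available; this is not part of the current implementation.

```python
def plan_viewpoint_sequence(predicted_coverage, transition_cost):
    """Dynamic-programming sketch for viewpoint planning with future knowledge.

    predicted_coverage[t][v] : expected perspective coverage of viewpoint v at time step t
    transition_cost[u][v]    : coverage lost while flying from viewpoint u to v (0 if u == v)
    Returns one viewpoint index per time step. Purely conceptual; the inputs would have
    to come from a prediction of the robot routine, which we leave to future work.
    """
    T = len(predicted_coverage)
    V = len(predicted_coverage[0])
    best = [[float("-inf")] * V for _ in range(T)]
    prev = [[0] * V for _ in range(T)]
    best[0] = list(predicted_coverage[0])
    for t in range(1, T):
        for v in range(V):
            for u in range(V):
                value = best[t - 1][u] - transition_cost[u][v] + predicted_coverage[t][v]
                if value > best[t][v]:
                    best[t][v], prev[t][v] = value, u
    # Backtrack the sequence with the highest accumulated coverage.
    v = max(range(V), key=lambda i: best[T - 1][i])
    sequence = [v]
    for t in range(T - 1, 0, -1):
        v = prev[t][v]
        sequence.append(v)
    return sequence[::-1]
```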

5. Conclusions

Robot-integrated production processes often occur in enclosed areas that are inaccessible to operators. The ability to monitor these processes—whether for teleoperation, maintenance, repair, or troubleshooting—without interrupting production is therefore of significant interest. To address this challenge, a method was developed that determines and evaluates optimal viewpoints and controls a UAV as a dynamic camera within a robot's workspace. An optimal viewpoint is free of occlusions and provides the maximum perspective coverage of a defined ROI. Starting the search from six predefined positions distributed with radius r around the ROI, a hill-climbing algorithm searches for a locally optimal viewpoint. The shortest paths to these viewpoints are calculated continuously, following the orthodrome (the shortest path on the surface of the safety sphere). The resulting waypoints are then verified to be collision-free. If a waypoint passes this check, a score is assigned to the corresponding viewpoint, which evaluates the perspective coverage achieved at, and on the path to, this viewpoint. When multiple viewpoints are compared, a viewpoint change is triggered only when an alternative viewpoint achieves a score at least 10% higher than the current one. Additionally, to ensure smoother transitions and decision-making, a state machine is implemented to control the UAV. This state machine stabilizes transitions between states and prevents abrupt changes in the UAV's behavior, ensuring reliable and predictable operation.
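For clarity, a small sketch of generating waypoints along the orthodrome on the safety sphere via spherical linear interpolation is shown below; the paper constructs the path geometrically in a rotated plane (cf. Figure 9), which describes the same great circle, so this is an equivalent but not identical formulation.

```python
import numpy as np

def great_circle_waypoints(p_start, p_end, center, n_points=20):
    """Waypoints along the great circle between two points on the safety sphere.

    Both points are taken relative to the sphere center and connected by spherical
    linear interpolation (slerp). Near-antipodal points would need special handling.
    """
    c = np.asarray(center, dtype=float)
    a = np.asarray(p_start, dtype=float) - c
    b = np.asarray(p_end, dtype=float) - c
    r = np.linalg.norm(a)                          # use the start radius as the sphere radius
    a_hat, b_hat = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_hat, b_hat), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return [c + r * a_hat]                     # start and end coincide on the sphere
    ts = np.linspace(0.0, 1.0, n_points)
    return [c + r * (np.sin((1 - t) * omega) * a_hat + np.sin(t * omega) * b_hat) / np.sin(omega)
            for t in ts]
```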
The evaluation of our method in a palletizing and a welding scenario suggests a significant increase in process insight when using a UAV as a dynamic camera. However, the UAV's perspective coverage is strongly related to the number of viewpoint changes and safe position flights, as shown in the welding scenario. Additionally, all paths depend heavily on the defined radius of the safety sphere, which makes the radius itself part of the optimization; this needs to be addressed in future work. The results show that every viewpoint change temporarily reduces the perspective coverage. In the safe position state, moreover, the UAV does not gain any perspective coverage with our selected system setup. This becomes even more problematic in aggressive applications, i.e., applications with fast movements and frequent changes of direction of the manipulator.
We have discussed several proposals to reduce this form of interruption and thus the number of safe position states. In such situations, the UAV either moves toward a viewpoint that suddenly becomes unavailable or remains in a position from which it cannot reach other viewpoints identified by the implemented path search. The greatest advantage appears to lie in including future knowledge of robot-integrated processes in an offline flight path planning synchronized to the movement of the manipulator. In this regard, robot-integrated processes can include both purely cyclical scenarios and acyclic scenarios with recurring motion patterns. It can be assumed that robot applications have at least partially recurring movement patterns, e.g., when performing certain actions such as tool changes or moving the TCP along identical paths. Identifying similarities in these behaviors and using them to influence future UAV control could lead to a substantial improvement in dynamic camera planning.
To address these challenges, dynamic path planning is critical. This includes the ability to predict task-specific demands and adjust the UAV’s trajectory in real-time, ensuring continuous operation even in the presence of unforeseen collisions or occlusions. Future systems should use predictions in the form of empirical data, such as past cycle patterns or event frequencies, to support the selection of optimal viewpoints with minimal change and error states. The developed methods must be evaluated in diverse application scenarios, particularly those involving frequent reorientations of the ROI and significant UAV realignments due to changing viewpoints. Additionally, scenarios should include previously unconsidered situations with potential collision risks or other critical error states to thoroughly evaluate the program logic. By addressing these challenges, such systems will not only enhance operational efficiency, but also ensure robust and flexible robot-integrated processes suitable for real-world deployment.

Author Contributions

M.B. is the corresponding author, responsible for the basic concepts, literature review, implementation, and discussion of results. B.K. provided the underlying ideas and advice and was responsible for revising the manuscript. P.K. is a former student of M.B. and supported the implementation process. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

As non-native English speakers, we partially used ChatGPT 4.0 and 3.5 as well as DeepL Write for the linguistic revision and translation of our contribution. It should be expressly noted that no content was generated by the use of an AI tool; only linguistic improvements were made.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xiao, X.; Dufek, J.; Murphy, R.R. Tethered Aerial Visual Assistance. arXiv 2020, arXiv:2001.06347. [Google Scholar]
  2. Gonzalez-Barbosa, J.-J.; Garcia-Ramirez, T.; Salas, J.; Hurtado-Ramos, J.-B.; Rico-Jimenez, J.-J. Optimal camera placement for total coverage. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 844–848. [Google Scholar] [CrossRef]
  3. Christie, M.; Machap, R.; Normand, J.M.; Olivier, P.; Pickering, J. Virtual Camera Planning: A Survey. In Smart Graphics; Butz, A., Fisher, B., Krüger, A., Olivier, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 40–52. [Google Scholar]
  4. Kaiser, D.; Quek, G.L.; Cichy, R.M.; Peelen, M.V. Object Vision in a Structured World. Trends Cogn. Sci. 2019, 23, 672–685. [Google Scholar] [CrossRef] [PubMed]
  5. Rudoy, D.; Zelnik-Manor, L. Viewpoint Selection for Human Actions. Int. J. Comput. Vis. 2012, 97, 243–254. [Google Scholar] [CrossRef]
  6. Bares, W.H.; Thainimit, S.; McDermott, S. A Model for Constraint-Based Camera Planning. In Smart Graphics, Proceedings of the 2000 AAAI Spring Symposium, Palo Alto, CA, USA, 20–22 March 2000; AAAI Press: Menlo Park, CA, USA, 2000; pp. 84–91. [Google Scholar]
  7. Halper, N.; Olivier, P. CamPlan: A Camera Planning Agent. In Smart Graphics, Proceedings of the 2000 AAAI Spring Symposium, Palo Alto, CA, USA, 20–22 March 2000; AAAI Press: Menlo Park, CA, USA, 2000; pp. 92–100. [Google Scholar]
  8. Magaña, A.; Dirr, J.; Bauer, P.; Reinhart, G. Viewpoint Generation Using Feature-Based Constrained Spaces for Robot Vision Systems. Robotics 2023, 12, 108. [Google Scholar] [CrossRef]
  9. Jangir, R.; Hansen, N.; Ghosal, S.; Jain, M.; Wang, X. Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation. IEEE Robot. Autom. Lett. 2022, 7, 3046–3053. [Google Scholar] [CrossRef]
  10. Singh, A.; Kalaichelvi, V.; Karthikeyan, R. A survey on vision guided robotic systems with intelligent control strategies for autonomous tasks. Cogent Eng. 2022, 9, 2050020. [Google Scholar] [CrossRef]
  11. Nicolis, D.; Palumbo, M.; Zanchettin, A.M.; Rocco, P. Occlusion-Free Visual Servoing for the Shared Autonomy Teleoperation of Dual-Arm Robots. IEEE Robot. Autom. Lett. 2018, 3, 796–803. [Google Scholar] [CrossRef]
  12. Chao, F.; Zhu, Z.; Lin, C.-M.; Hu, H.; Yang, L.; Shang, C.; Zhou, C. Enhanced Robotic Hand–Eye Coordination Inspired From Human-Like Behavioral Patterns. IEEE Trans. Cogn. Dev. Syst. 2018, 10, 384–396. [Google Scholar] [CrossRef]
  13. Triggs, B.; Laugier, C. Automatic camera placement for robot vision tasks. In Proceedings of the 1995 IEEE International Conference on Robotics and Automation, Nagoya, Japan, 21–27 May 1995; pp. 1732–1737. [Google Scholar] [CrossRef]
  14. Baumgärtner, J.; Bertschinger, B.; Hoffmann, K.; Puchta, A.; Sawodny, O.; Reichelt, S.; Fleischer, J. Camera Placement Optimization for a Novel Modular Robot Tracking System. In Proceedings of the 2023 IEEE SENSORS, Vienna, Austria, 29 October–1 November 2023; pp. 1–4. [Google Scholar] [CrossRef]
  15. Akinola, I.; Varley, J.; Kalashnikov, D. Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 4616–4622. [Google Scholar] [CrossRef]
  16. Boshoff, M.; Kuhlenkötter, B.; Jakschik, M.; Sinnemann, J. Dynamische Kameraverfolgung von Regions of Interest in der Produktion mit Flugrobotern. Z. Wirtsch. Fabr. 2022, 117, 733–736. [Google Scholar] [CrossRef]
  17. Boshoff, M.; Barros, G.; Kuhlenkötter, B. Performance measurement of unmanned aerial vehicles to suit industrial applications. Prod. Eng. 2024. [Google Scholar] [CrossRef]
  18. Tarabanis, K.A.; Tsai, R.Y.; Allen, P.K. The MVP sensor planning system for robotic vision tasks. IEEE Trans. Robot. Automat. 1995, 11, 72–85. [Google Scholar] [CrossRef]
  19. Abrams, S.; Allen, P.K.; Tarabanis, K.A. Dynamic sensor planning. In Proceedings of the IEEE International Conference on Robotics and Automation, Atlanta, GA, USA, 2–6 May 1993; pp. 605–610. [Google Scholar] [CrossRef]
  20. Cowan, C.K.; Kovesi, P.D. Automatic sensor placement from vision task requirements. IEEE Trans. Pattern Anal. Mach. Intell. 1988, 10, 407–416. [Google Scholar] [CrossRef]
  21. Tarabanis, K.; Tsai, R.Y. Computing viewpoints that satisfy optical constraints. In Proceedings of the 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Maui, HI, USA, 3–6 June 1991; pp. 152–158. [Google Scholar] [CrossRef]
  22. Tarabanis, K.; Tsai, R.Y.; Abrams, S. Planning viewpoints that simultaneously satisfy several feature detectability constraints for robotic vision. In Proceedings of the Fifth International Conference on Advanced Robotics ‘Robots in Unstructured Environments’, Pisa, Italy, 19–22 June 1991; Volume 2, pp. 1410–1415. [Google Scholar] [CrossRef]
  23. Tarabanis, K.; Tsai, R.Y.; Allen, P.K. Automated sensor planning for robotic vision tasks. In Proceedings of the 1991 IEEE International Conference on Robotics and Automation, Sacramento, CA, USA, 9–11 April 1991; pp. 76–82. [Google Scholar] [CrossRef]
  24. Abrams, S.; Allen, P.K.; Tarabanis, K. Computing Camera Viewpoints in an Active Robot Work Cell. Int. J. Robot. Res. 1999, 18, 267–285. [Google Scholar] [CrossRef]
  25. Peuzin-Jubert, M.; Polette, A.; Nozais, D.; Mari, J.-L.; Pernot, J.-P. Survey on the View Planning Problem for Reverse Engineering and Automated Control Applications. Comput.-Aided Des. 2021, 141, 103094. [Google Scholar] [CrossRef]
  26. Zeng, R.; Wen, Y.; Zhao, W.; Liu, Y.-J. View planning in robot active vision: A survey of systems, algorithms, and applications. Comput. Vis. Media 2020, 6, 225–245. [Google Scholar] [CrossRef]
  27. Magaña, A.; Gebel, S.; Bauer, P.; Reinhart, G. Knowledge-Based Service-Oriented System for the Automated Programming of Robot-Based Inspection Systems. In Proceedings of the 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Vienna, Austria, 8–11 September 2020. [Google Scholar] [CrossRef]
  28. Moniruzzaman, M.D.; Rassau, A.; Chai, D.; Islam, S.M.S. Teleoperation methods and enhancement techniques for mobile robots: A comprehensive survey. Robot. Auton. Syst. 2022, 150, 103973. [Google Scholar] [CrossRef]
  29. Xiao, X.; Dufek, J.; Murphy, R.R. Autonomous Visual Assistance for Robot Operations Using a Tethered UAV. In Field and Service Robotics: Results of the 12th International Conference; Springer: Singapore, 2019; Volume 16, pp. 15–29. [Google Scholar] [CrossRef]
  30. Sato, R.; Kamezaki, M.; Niuchi, S.; Sugano, S.; Iwata, H. Derivation of an Optimum and Allowable Range of Pan and Tilt Angles in External Sideway Views for Grasping and Placing Tasks in Unmanned Construction Based on Human Object Recognition. In Proceedings of the 2019 IEEE/SICE International Symposium on System Integration (SII), Paris, France, 14–16 January 2019; pp. 776–781. [Google Scholar] [CrossRef]
  31. Gawel, A.; Lin, Y.; Koutros, T.; Siegwart, R.; Cadena, C. Aerial-Ground collaborative sensing: Third-Person view for teleoperation. In Proceedings of the 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Philadelphia, PA, USA, 6–8 August 2018; pp. 1–7. [Google Scholar] [CrossRef]
  32. Haldankar, T.; Kedia, S.; Panchmatia, R.; Parmar, D.; Sawant, D. Review of Implementation of Vision Systems in Robotic Welding. In Proceedings of the 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 6–8 May 2021; pp. 692–700. [Google Scholar] [CrossRef]
  33. Glorieux, E.; Franciosa, P.; Ceglarek, D. Coverage path planning with targetted viewpoint sampling for robotic free-form surface inspection. Robot. Comput.-Integr. Manuf. 2020, 61, 101843. [Google Scholar] [CrossRef]
  34. Herakovic, N. Robot Vision in Industrial Assembly and Quality Control Processes. In Robot Vision; Ude, A., Ed.; IntechOpen: Rijeka, Croatia, 2010. [Google Scholar] [CrossRef]
  35. Gemerek, J. Active Vision and Perception. Ph.D. Thesis, Cornell University, Ithaca, NY, USA, 2020. [Google Scholar]
  36. Rakita, D.; Mutlu, B.; Gleicher, M. Remote Telemanipulation with Adapting Viewpoints in Visually Complex Environments. In Proceedings of the Robotics: Science and Systems XV, Freiburg im Breisgau, Germany, 22–26 June 2019; pp. 1–10. [Google Scholar] [CrossRef]
  37. Rakita, D.; Mutlu, B.; Gleicher, M. An Autonomous Dynamic Camera Method for Effective Remote Teleoperation. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, Chicago, IL, USA, 5–8 March 2018; Kanda, T., Ŝabanović, S., Hoffman, G., Tapus, A., Eds.; ACM: New York, NY, USA, 2018; pp. 325–333. [Google Scholar] [CrossRef]
  38. Jia, R.; Yang, L.; Cao, Y.; Or, C.K.; Wang, W.; Pan, J. Learning Autonomous Viewpoint Adjustment from Human Demonstrations for Telemanipulation. J. Hum.-Robot. Interact. 2024, 13, 32. [Google Scholar] [CrossRef]
  39. Saakes, D.; Choudhary, V.; Sakamoto, D.; Inami, M.; Igarashi, T. A teleoperating interface for ground vehicles using autonomous flying cameras. In Proceedings of the 2013 23rd International Conference on Artificial Reality and Telexistence (ICAT), Tokyo, Japan, 11–13 December 2013; pp. 13–19. [Google Scholar] [CrossRef]
  40. Claret, J.-A.; Zaplana, I.; Basanez, L. Teleoperating a mobile manipulator and a free-flying camera from a single haptic device. In Proceedings of the 2016 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Lausanne, Switzerland, 23–27 October 2016; pp. 291–296. [Google Scholar] [CrossRef]
  41. Claret, J.-A.; Basañez, L. Using an UAV to guide the teleoperation of a mobile manipulator. In Proceedings of the XXXVIII Jornadas de Automática, Gijón, Spain, 6–8 September 2017; Universidade da Coruña, Servizo de Publicacións: Oviedo, Spain, 2017; pp. 694–700. [Google Scholar] [CrossRef]
  42. Claret, J.-A.; Basañez, L. Teleoperating a mobile manipulation using a UAV camera without robot self-occlusions. In Proceedings of the XL Jornadas de Automática: Libro de Actas, Ferrol, Spain, 4–6 September 2019; pp. 694–701. [Google Scholar] [CrossRef]
  43. Xiao, X.; Dufek, J.; Murphy, R. Visual servoing for teleoperation using a tethered UAV. In Proceedings of the 15th IEEE International Symposium on Safety, Security and Rescue Robotics, Shanghai, China, 11–13 October 2017; IEEE: Piscataway, NJ, USA, 2017. [Google Scholar] [CrossRef]
  44. Dufek, J.; Xiao, X.; Murphy, R.R. Best Viewpoints for External Robots or Sensors Assisting Other Robots. IEEE Trans. Hum.-Mach. Syst. 2021, 51, 324–334. [Google Scholar] [CrossRef]
  45. Senft, E.; Hagenow, M.; Praveena, P.; Radwin, R.; Zinn, M.; Gleicher, M.; Mutlu, B. A Method for Automated Drone Viewpoints to Support Remote Robot Manipulation. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 7704–7711. [Google Scholar] [CrossRef]
  46. Hartley, R.; Zisserman, A. Camera Models. In Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2004; pp. 153–177. [Google Scholar]
  47. Cormen, T.H. Introduction to Algorithms, 3rd ed.; MIT Press: Cambridge, MA, USA, 2009. [Google Scholar]
  48. Liu, J.; Sridharan, S.; Fookes, C. Recent Advances in Camera Planning for Large Area Surveillance. ACM Comput. Surv. 2016, 49, 6. [Google Scholar] [CrossRef]
  49. Sumi Suresh, M.S.; Menon, V.; Setlur, S.; Govindaraju, V. Maximizing Coverage over a Surveillance Region Using a Specific Number of Cameras. In Pattern Recognition; Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, C.-L., Bhattacharya, S., Pal, U., Eds.; Springer Nature: Cham, Switzerland, 2025; pp. 303–320. [Google Scholar] [CrossRef]
  50. Melo, A.G.; Pinto, M.F.; Marcato, A.L.M.; Honório, L.M.; Coelho, F.O. Dynamic Optimization and Heuristics Based Online Coverage Path Planning in 3D Environment for UAVs. Sensors 2021, 21, 1108. [Google Scholar] [CrossRef] [PubMed]
  51. Chattaoui, S.; Jarray, R.; Bouallègue, S. Comparison of A* and D* Algorithms for 3D Path Planning of Unmanned Aerial Vehicles. In Proceedings of the 2023 IEEE International Conference on Artificial Intelligence & Green Energy (ICAIGE), Sousse, Tunisia, 12–14 October 2023; pp. 1–6. [Google Scholar] [CrossRef]
  52. Wang, P.; Mutahira, H.; Kim, J.; Muhammad, M.S. ABA*–Adaptive Bidirectional A* Algorithm for Aerial Robot Path Planning. IEEE Access 2023, 11, 103521–103529. [Google Scholar] [CrossRef]
  53. Liang, X.; Meng, G.; Xu, Y.; Luo, H. A geometrical path planning method for unmanned aerial vehicle in 2D/3D complex environment. Intell. Serv. Robot. 2018, 11, 301–312. [Google Scholar] [CrossRef]
  54. Yao, Z.; Wang, W. An efficient tangent based topologically distinctive path finding for grid maps. arXiv 2023, arXiv:2311.00853. [Google Scholar] [CrossRef]
  55. O’Neill, B. Elementary Differential Geometry, 2nd ed.; Revised; Elsevier Academic Press: Amsterdam, The Netherlands, 2006. [Google Scholar]
  56. Park, S.; Deyst, J.; How, J. A New Nonlinear Guidance Logic for Trajectory Tracking. In Proceedings of the AIAA Guidance, Navigation, and Control Conference and Exhibit, Providence, RI, USA, 16–19 August 2004. [Google Scholar] [CrossRef]
Figure 1. Looking from (a) the perspective of a static camera installed at the fence, (b) a semi-static camera attached at the robot's moving joint, (c) a typical hand-eye camera, and (d) the UAV's perspective.
Figure 2. Definition of object collider and safety sphere in the Unity scene.
Figure 3. Camera frustum of the UAV and the NCP, ROI, and FCP in it. The corners c_1 to c_4 of the ROI are projected onto the NCP and named c̄_1 to c̄_4 here.
Figure 4. Distortion of the projection of the ROI on the NCP when the viewpoint is moved in the vertical direction. The projected area A_proj thereby scales with the distortion, which is a well-known effect in 3D projections.
Figure 5. Relationship between the angle φ, the target point p, the FOV, and the position of the ROI. If φ < FOV/2, the viewpoint p is in sight, and vice versa, the ROI is fully visible from the viewpoint.
Figure 6. Search procedure for finding the optimal viewpoint p_opt with the hill climber algorithm. Within the green frame are the termination criteria, which end the current cycle of the algorithm if true.
Figure 7. Global minimum and maximum value for f(p). The ROI is visualized as a black plane. The blue spheres represent the global maximum, and the pink spheres are the global minimum for f(p). Shifting the sphere color from green to red indicates a rising value for f(p).
Figure 8. Starting points for the search procedure, colorized in their later formation.
Figure 9. The shortest path around the ROI safety sphere with waypoints highlighted as single spheres. In (a), the shortest path is calculated on the XZ plane, which transforms the y-value of all points to zero. In (b), the waypoints are rotated back to the original rotation of p_start and p_end.
Figure 10. Calculation of the tangential points t_1 and t_2 in the two-dimensional XZ plane to calculate the circular path points and the tangential vectors. M is the center of the Thales circle, where R is the center of the safety sphere.
Figure 11. The UAV maintains a safe position above the robot, ensuring a collision-free path to the viewpoints that represent the local maxima of perspective coverage along the optical axis of the ROI.
Figure 12. Collider in the robot cell. (a) Shows the side view, (b) is the top-down view. The transparent orange box colliders stand for colliders in the scene. Purple capsule colliders surround the robot's joints, and the sphere collider stands for the ROI's safety sphere. The colliders do not block the view for the UAV.
Figure 13. In (a), screenshot of the UI for monitoring the scene and controlling the UAV. The paths to the available viewpoints are highlighted in their respective colors, as shown for the purple target here. The path of the TCP is highlighted in yellow, and the UAV movement path is presented in white. (b) Shows the perspective from the UAV's camera.
Figure 14. Position of the four static cameras in the scene numbered from 1 to 4. In the bottom right corner, the view from camera 2 is presented.
Figure 15. Comparison of the perspective coverage of the UAV and the static cameras in the scene. The aggregated coverage is the highest perspective coverage of the viewpoint in the respective color of the timeline and, thereby, describes the highest coverage under the given constraints. The timeline below highlights the current viewpoint offering maximum perspective coverage over time.
Figure 16. The score of the current optimal viewpoint and the average score for the respective segments. Underneath is a line diagram displaying the color of the optimal viewpoint.
Figure 17. Welding application simulated in Unity (a) and ABB RobotStudio (b). The simulation in (a) shows the UAV's movement path as a white line. In (b), the robot's movement path is highlighted in yellow, and numbers visualize the sequence of contact points and the desired ROI's rotation in the path segment.
Figure 18. Perspective coverage of the UAV and the static cameras in the welding application.
Figure 19. Visualization of the score. Frequent changes of the optimal viewpoint can be seen in the bottom timeline. Black segments represent flights to the safe position, as there are no viewpoints available.
Figure 20. Recorded score, perspective coverage, and distance of the UAV from each viewpoint. The colors of the lines represent the color of the respective viewpoint, visible in Figure 8. Each viewpoint search procedure starts from a different starting point around the ROI and offers a different score, perspective coverage, or distance. The current state of the UAV is shown below.
Table 1. Requirements for using a UAV as a dynamic camera in any robotic application.

Requirement | Explanation
Determine the optimal viewpoint from defined ROI | The user should be able to define the position, orientation, and size of the ROI dynamically during a running task. The optimal viewpoint is defined from the geometric relationship of the ROI and the camera perspective to maximize process insight over the runtime.
Use UAV as dynamic camera within a control framework | Using a UAV as dynamic camera offers the highest flexibility in continuously achieving the optimal viewpoint, independent of the environment or other robot systems, which may be constrained by their reach and flexibility. A technical framework is needed to enable control while considering the specific characteristics of the system.
Unoccluded | The optimal viewpoint must grant a clear, unoccluded view onto the ROI.
Protected collision zones for path planning | Collision zones must be defined between the robot and the UAV, as well as between the UAV and its environment. Similarly, no-fly zones should be defined in the same way. These collision zones must also be considered when determining the available flight paths.
Comparison of the costs of viewpoints | The costs of getting from one viewpoint to another should be considered. The costs are defined as the length of the route and the lack of perspective coverage on the route. The path with the lowest costs is considered optimal.
Consideration of fluctuations in the selection of the optimal viewpoint | It can be assumed that there are system states in which the costs of flying a path are similar, but sudden changes in path decisions can occur. This would result in an unstable flight behavior of the UAV, as it responds to spontaneous changes in movement direction. Therefore, a decision-making method or strategy should be implemented to ensure consistent decision behavior.
Measurability of method | The entire method of visually covering an ROI should be measurable. For this purpose, perspective coverage of the ROI over time should be expressed in a numerical value and the resulting method should be assessable.
Table 2. The implemented states and their specific actions.

State | Explanation
Following State | In the following state, the UAV follows an optimal viewpoint until it is not available anymore or the threshold condition for f(p) is reached. If the optimal viewpoint changes, the UAV switches to the changing viewpoint state. Once arrived at the next viewpoint, the UAV returns to the following state.
Changing Viewpoint State | This state becomes active when a next viewpoint has been identified, either because its f(p) is better or because the last followed viewpoint is not available anymore. The UAV navigates to the next viewpoint and then returns to the following state.
Hovering State | If a viewpoint is suddenly no longer available in the following state, the UAV switches to the hovering state. In this state, it remains hovering in the air for a maximum of two seconds and waits for a viewpoint to become available. If no viewpoint becomes available within this time, the UAV switches to the safe position state.
Safe Position State | The safe position is a point manually defined by the user and located so that it can be reached safely, e.g., above the robot with a sufficient safety distance. Flying to the safe position has to be completed before changing to another viewpoint, in order to avoid abrupt changes in direction. The UAV waits at the safe position until a new viewpoint becomes available.
Table 3. Results of the palletizing task.

Final Results | Value
Theoretically maximum coverage in the scenario | 2.513%
Average perspective coverage of the UAV | 2.166%
Highest average coverage of a static camera | 0.850%
Portion of following time | 100%
Portion of changing time | 0%
Portion of safe position state time | 0%
Portion of time hovering | 0%
Number of viewpoint changes | 0
Number of safe position flights | 0
Table 4. Results for the welding application.

Results | Value
Theoretically highest possible coverage in the scene | 2.513%
Average perspective coverage of the UAV | 1.667%
Highest average coverage of a static camera | 0.298%
Portion of following time | 74.136%
Portion of changing time | 14.987%
Portion of safe position state time | 8.199%
Portion of time hovering | 2.678%
Number of viewpoint changes | 5
Number of safe position flights | 3
