**Contents**


### **Mohammad Pashaei, Michael J. Starek, Hamid Kamangir and Jacob Berryhill**
Deep Learning-Based Single Image Super-Resolution: An Investigation for Dense Scene Reconstruction with UAS Photogrammetry
Reprinted from: *Remote Sens.* **2020**, *12*, 1757, doi:10.3390/rs12111757, **193**

### **Mohamed Barakat A. Gibril, Bahareh Kalantar, Rami Al-Ruzouq, Naonori Ueda, Vahideh Saeidi, Abdallah Shanableh, Shattri Mansor and Helmi Z. M. Shafri**
Mapping Heterogeneous Urban Landscapes from the Fusion of Digital Surface Model and Unmanned Aerial Vehicle-Based Images Using Adaptive Multiscale Image Segmentation and Classification

Reprinted from: *Remote Sens.* **2020**, *12*, 1081, doi:10.3390/rs12071081, **223**

## *Editorial* **Editorial for Special Issue "UAV Photogrammetry and Remote Sensing"**

**Fernando Carvajal-Ramírez <sup>1,\*</sup>, Francisco Agüera-Vega <sup>1</sup> and Patricio Martínez-Carricondo <sup>1,2</sup>**


#### **1. Introduction**

The concept of Remote Sensing as a way of capturing information from an object without making contact with it has, until recently, been exclusively focused on the use of earth observation satellites.

The emergence of unmanned aerial vehicles (UAV) with Global Navigation Satellite System (GNSS)-controlled navigation and sensor-carrying capabilities has increased the number of publications related to new remote sensing from much closer distances. Previous knowledge about the behavior of the Earth's surface under the incidence of energy of different wavelengths has been successfully applied to the large amount of data recorded from UAVs, thereby increasing the spatial and temporal resolution of the products obtained.

More specifically, UAVs can be positioned in the air at pre-programmed coordinate points, track flight paths, and in any case record the coordinates of the sensor position at the moment of exposure, together with the pitch, yaw, and roll angles. These capabilities have opened an interesting field of applications for low-altitude aerial photogrammetry, known as UAV Photogrammetry. In addition, photogrammetric data processing has been improved by the combination of new algorithms: structure from motion (SfM), which solves the collinearity equations without the need for any control point, producing a point cloud referenced to an arbitrary coordinate system together with a full camera calibration, and multi-view stereopsis (MVS), which expands a sparse set of matched keypoints into a dense point cloud. Together, these technical advances allow geometric modeling of terrain surfaces with high accuracy, minimizing the need for topographic campaigns for the georeferencing of such products.
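The collinearity condition at the heart of SfM can be made concrete with a tiny pinhole-projection sketch (the focal length and principal point below are hypothetical, and the Z-Y-X angle convention with a nadir-looking camera at zero angles is a simplification; real SfM solvers estimate pose and calibration jointly rather than assuming them):

```python
import math

def project(point_w, cam_pos, yaw, pitch, roll, f_px, cx, cy):
    """Project a world point to pixel coordinates via the collinearity
    equations (pinhole model; camera looks straight down at zero angles)."""
    # Rotation matrix from yaw/pitch/roll (Z-Y-X convention, radians).
    cz, sz = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    R = [
        [cz * cp, cz * sp * sr - sz * cr, cz * sp * cr + sz * sr],
        [sz * cp, sz * sp * sr + cz * cr, sz * sp * cr - cz * sr],
        [-sp,     cp * sr,                cp * cr],
    ]
    # Offset from camera to point; depth is measured downward so that a
    # nadir camera at altitude h sees a ground point at depth h.
    d = [point_w[0] - cam_pos[0],
         point_w[1] - cam_pos[1],
         cam_pos[2] - point_w[2]]
    # Rotate into the camera frame (apply R transposed).
    xc = sum(R[i][0] * d[i] for i in range(3))
    yc = sum(R[i][1] * d[i] for i in range(3))
    zc = sum(R[i][2] * d[i] for i in range(3))
    # Collinearity: pixel = principal point + focal length * (ratio).
    return (cx + f_px * xc / zc, cy + f_px * yc / zc)
```

For example, a point 4 m off-nadir seen from 40 m altitude with a 1000 px focal length lands 100 px from the principal point, which is the geometry that GCPs and camera calibration constrain in practice.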

This special issue aims to compile some applications realized thanks to the synergies established between the new remote sensing from close distances and UAV Photogrammetry. The contributions are briefly described below in alphabetical order of the first author.

#### **2. Overview of Contributions**

In the paper [1], the authors carried out an interesting combination of UAV Photogrammetry and large-scale airborne lidar data to monitor snow masses in a forested region of central Arizona, United States. They observed that under low-density forest conditions both data sources deliver similar snow depth maps, whereas in high-density forest the lidar maps are more accurate. On the other hand, UAV Photogrammetry terrain models can be used for basin-scale snowpack estimation with multi-temporal information at a lower cost than airborne lidar campaigns.

In [2], the authors establish the optimal distribution and number of Ground Control Points (GCPs) to use in corridor maps for linear projects, obtained in southeast Spain. They used UAV Photogrammetry based on SfM and MVS algorithms and concluded that

**Citation:** Carvajal-Ramírez, F.; Agüera-Vega, F.; Martínez-Carricondo, P. Editorial for Special Issue "UAV Photogrammetry and Remote Sensing". *Remote Sens.* **2021**, *13*, 2327. https://doi.org/10.3390/rs13122327

Received: 4 June 2021 Accepted: 10 June 2021 Published: 13 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

9–11 GCPs distributed alternately on both sides of the road, with a pair of GCPs at each end of the road, yielded optimal results with regard to fieldwork cost.

The paper [3] presents a valuable fusion of a digital surface model (DSM) and a UAV orthomosaic in an extremely challenging, highly detailed urban environment. The authors integrated three models: adaptive hierarchical image segmentation optimization, multilevel feature selection, and multiscale supervised machine learning. They concluded that the applied methodology shows excellent potential for mapping the selected urban landscape in Malaysia.

The quality assessment of UAV photogrammetric products was the main concern of [4]. In this work, the geolocation procedure for UAV orthomosaic time series was optimized, obtaining a reproducibility of 99% in a grassland located in Germany and 75% in a forest area in the Spanish Pyrenees.

UAV Photogrammetry can model terrain surfaces with extreme or quasi-vertical morphologies [5]. In this work, several combinations of the number of GCPs, their distribution, and image orientation were tested on a dam belonging to Spain's hydraulic heritage, located in the Almería province, obtaining results similar to those of a terrestrial laser scanner (TLS). The authors noted that the results improve markedly when oblique images and breaklines are included.

Vegetation is often an obstacle to obtaining an accurate DSM. The authors of [6] applied deep learning and terrain correction models in the Chinese Loess Plateau to overcome the limitations of UAV Photogrammetry in a densely vegetated area with complex terrain, caused by reduced ground visibility and the lack of robust filtering algorithms. They detected the vegetation with an overall accuracy of 95%, and the mean square error of the final DTM was 0.024 m.

In other cases, the target cover to be detected is precisely the vegetation itself, which is frequently modelled through standard vegetation indices. In [7], the authors studied useful correlations between certain parameters from chemical analyses carried out on agricultural crops and vegetation indices obtained from UAV Photogrammetry surveys.

In the design process of a UAV photogrammetric project, the resolution of the images to be captured is established according to the size of the smallest target to be detected, and the user wonders how much information is lost when the imagery resolution decreases. In [8], the authors apply a deep convolutional neural network approach, based on single-image super-resolution, to low-resolution UAV imagery for spatial resolution enhancement. Using the enhanced high-resolution images in an SfM photogrammetric process, they observed that the dense point cloud contains about 17 times more points than the one extracted from the low-resolution image set.

In the paper [9], the authors carried out an interesting practical application of UAV Photogrammetry for the 3D reconstruction of power lines. They established a 3D corridor around the power lines and detected objects inside this volume as obstacles that could threaten the safety of the infrastructure. They compared the UAV Photogrammetry models with a total station survey and terrestrial laser scanning, concluding that the accuracy is consistent.

UAV Photogrammetry can also serve as a valuable source of data in the architectural design process [10]. The authors proposed a virtual integration of UAV Photogrammetry products and architectural design using building information modeling (BIM) technology, observing error reductions and significant time and cost savings.

The implementation of GNSS-based UAV navigation in real-time kinematic (RTK) mode has reduced or even eliminated the need for topographic campaigns to obtain GCPs. However, this technology introduces systematic errors in elevation. In [11], the authors observed a linear relationship between these errors and the deviation in the focal length adjustment, and proposed combining two flights with different image axis angles to eliminate them.

One of the difficulties that large UAV projects present is the management of large quantities of images. In [12], the authors proposed an intelligent method for image selection in UAV Photogrammetry projects that avoids the time-consuming manual selection process while maintaining the overlaps needed for point cloud extraction and preventing reductions in product accuracy.

#### **3. Conclusions**

The set of contributions to this special issue shows that the range of possible scientific applications in the field of UAV Photogrammetry and Remote Sensing from close distances is very wide.

UAVs not only take advantage of the previous knowledge in Remote Sensing acquired over the years, but they also improve and expand its possibilities thanks to the control that users have over the sensors.

UAV Photogrammetry has proven to be a preliminary step in all those remote sensing applications whose observed targets are spatially distributed over the terrain surface, providing orthomosaics and digital surface models with high spatial and temporal resolution.

**Author Contributions:** Conceptualization and investigation, F.C.-R., F.A.-V. and P.M.-C.; writing original draft preparation, F.C.-R.; writing—review and editing, F.C.-R.; visualization and supervision, F.A.-V. and P.M.-C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The guest editors of the special issue "UAV Photogrammetry and Remote Sensing" would first like to sincerely thank the authors who have contributed their innovative work. We would also like to acknowledge the valuable work of the reviewers, who have improved the quality of the published papers, and to thank the editorial staff for their support and kind invitation to be guest editors of a special issue of the prestigious journal "Remote Sensing".

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Using UAV-Based Photogrammetry to Obtain Correlation between the Vegetation Indices and Chemical Analysis of Agricultural Crops**

**Jiří Janoušek <sup>1,\*</sup>, Václav Jambor <sup>2</sup>, Petr Marcoň <sup>1</sup>, Přemysl Dohnal <sup>1</sup>, Hana Synková <sup>2</sup> and Pavel Fiala <sup>1</sup>**


**Abstract:** The optimum corn harvest time differs between individual harvest scenarios, depending on the intended use of the crop and on the technical equipment of the actual farm. It is therefore economically significant to specify the period as precisely as possible. The harvest maturity of silage corn is currently determined from the targeted sampling of plants cultivated over large areas. In this context, the paper presents an alternative, more detail-oriented approach for estimating the correct harvest time; the method focuses on the relationship between the ripeness data obtained via photogrammetry and the parameters produced by the chemical analysis of corn. The relevant imaging methodology utilizing a spectral camera-equipped unmanned aerial vehicle (UAV) allows the user to acquire the spectral reflectance values and to compute the vegetation indices. Furthermore, the authors discuss the statistical data analysis centered on both the nutritional values found in the laboratory corn samples and on the information obtained from the multispectral images. This discussion is associated with a detailed insight into the computation of correlation coefficients. Statistically significant linear relationships between the vegetation indices, the normalized difference red edge index (NDRE) and the normalized difference vegetation index (NDVI) in particular, and nutritional values such as dry matter, starch, and crude protein are evaluated to indicate different aspects of and paths toward predicting the optimum harvest time. The results are discussed in terms of the actual limitations of the method, the benefits for agricultural practice, and planned research.

**Keywords:** multispectral imaging; vegetation indices; nutritional analysis; correlation; photogrammetry; optimal harvest time; UAV

#### **1. Introduction**

Precision agriculture (or site-specific crop management) is an internationally recognized concept and term referring to land cultivation by means of nontraditional technologies that were first designed and developed at the end of the 1980s [1–3]. The aim of the concept rests in adjusting cultivating procedures to suit local conditions, the main principle being to perform the crop-growing tasks at the right place, intensity, and time [4,5].

The standard process to estimate the condition of crops during the growth phase, especially when the correct harvest time has to be defined, involves a land survey in which sample plants are manually collected and then chemically analyzed in a laboratory. Such an approach, however, is labor- and time-intensive because it relies mainly on direct human inspection inside the crop fields, which are usually inhomogeneous and thus difficult to characterize accurately through a single analysis. An effective alternative then appears to lie in remote sensing, a technique applicable in determining crop maturity degrees over large areas. The procedure yields rapid information on spatial and temporal changes in the monitored quantities [6], allowing farmers to recognize and differentiate between the

**Citation:** Janoušek, J.; Jambor, V.; Marcoň, P.; Dohnal, P.; Synková, H.; Fiala, P. Using UAV-Based Photogrammetry to Obtain Correlation between the Vegetation Indices and Chemical Analysis of Agricultural Crops. *Remote Sens.* **2021**, *13*, 1878. https://doi.org/10.3390/rs13101878

Academic Editors: Fernando Carvajal-Ramírez, Francisco Agüera-Vega and Patricio Martínez-Carricondo

Received: 31 March 2021 Accepted: 10 May 2021 Published: 11 May 2021



specific conditions that characterize individual portions of the land; this task can also be performed via other survey methods, but only with considerable difficulty. The noninvasive evaluation of crop quality by means of multispectral imaging facilitates reform steps in agricultural management [7–10]. In the discussed field, remote sensing generally offers two functional options, namely, satellite imagery [11–15] and unmanned aerial vehicle (UAV) photogrammetry [16–20].

Advancements in unmanned aerial vehicles (UAVs) and the related developments concerning their use in remote sensing have made the technology a promising tool in recent decades [21]. Most importantly, UAVs (drones) as a remote sensing platform have shown major potential in crop-growth monitoring [22,23], where they ensure a proper balance between the image quality, sensing efficiency, and operating cost. The spectral information and vegetation indices derived from UAV-delivered multispectral or hyperspectral data have now been widely tested for this purpose [24–26].

This article describes the monitoring of a selected corn hybrid within pre-defined growth intervals, to find a relationship between the variation in the nutritional values of crops and changes in UAV-based photogrammetry images. In this context, one of the main problems investigated is the connection between the data obtained from chemical analyses of sample plants and the vegetation indices calculated from spectral reflectivities, which are monitored by using a multispectral camera. The aim of the research is to establish experimentally whether the optimum corn harvest time and quantity can be predicted, with an emphasis on searching for a mathematical relationship between a variation in the content of dry matter in the sampled corn and changes in the vegetation indices. The dry matter content constitutes a significant nutritional parameter for both biogas stations and livestock production [27,28]. The optimum corn harvest time differs according to the intended use of the crop and the technical equipment and installations in the relevant works; for this reason, it is then important to specify the ideal harvest time as precisely as possible, considering the circumstances [29–31].

At present, the maturity of silage corn is normally estimated from the targeted sampling of plants over large corn-growing farm areas (usually having an acreage of dozens of hectares). The counts of samples differ markedly, depending on the desired accuracy. The proportions of dry matter, nutritional substances, and other parameters are determined by means of a 7- to 10-day laboratory analysis. Due to the cost, the heterogeneity of the samples, and the labor intensity, the actual survey can be carried out with only a limited set of samples and does not effectively cover the areal changes in the vegetation. Within the Czech Republic, moreover, the diversity of pedoclimatic conditions and the sizes of land units point to a substantial imbalance in the properties of the land managed by individual enterprises, creating preconditions for the successful implementation of the principles of precision agriculture.

Currently, systems are available that can assess comprehensively the quantity and quality of crops; to provide relevant examples, we can refer to John Deere's HarvestLab 3000 and the Evo NIR sensor by Dinamica Generale S.p.A. Such systems utilize NIR (near-infrared) cameras mounted on the harvesters, and are operated only during the harvest period [32–34]. When seeking the optimum nutritional values, agricultural technologists and researchers employ various types of prediction; the authors of source [35], for instance, exploited superspectral airborne imagery to predict corn grain yield and ear weight, and to discriminate between growth stages and irrigation treatments. The use of multispectral imaging to determine the phenophase, however, is somewhat contrasted by the fact that the authors focused solely on utilizing the normalized difference vegetation index (NDVI). Predicting the optimum harvest time is associated with diverse factors, including the volume of dry matter in the plants. The problem of dry matter in forage corn grown for silage is discussed in paper [36], with an emphasis on measuring the NDVI of the plants.

By contrast, our concept has been formulated to deliver area-specific harvest estimates that involve more input elements than merely the NDVI or chemical analyses performed with a limited number of plants from over the entire field. This article thus outlines original options for predicting the optimum harvest time, and these are based on searching for correlations between the chemical analysis of sampled corn and the images acquired with a spectral camera in the course of a UAV photogrammetry cycle. By another definition, the novelty of this article rests in that the statistical analysis is applied to reveal hitherto unexplored relationships between nutritional parameters, acquired through chemical analyses and vegetation indices yielded via processing data collected by a multispectral camera. These relationships will be utilized in estimating the optimum harvest time for the entire area of the selected cornfield.

To summarize the various views and perspectives, we can point out that this subsector still offers ample room for new approaches and interpretations.

As regards the actual structure of this article, the text is organized into three sections: Section 2 presents the chemical analysis of the samples, the remote sensing, and the data correlation methodology, Section 3 introduces the results, and Section 4 contains the discussion and conclusion.

#### **2. Materials and Methods**

The data collection and mathematical processing are characterized in the block diagram in Figure 1. The information relating to the investigated agricultural land is captured via UAV photogrammetry and manual selection.

**Figure 1.** A block diagram defining the data collection and processing.

The multispectral images delivered by the UAV-mounted camera enable us to compile relevant reflectivity maps, which then facilitate computing the vegetation indices. To obtain the nutritional indicators in the sampled corn, we performed laboratory-based chemical analysis. The vegetation indices and the results of the analysis were then correlated at various stages of the growth phenophases; this step allowed identifying the time when the crop yield is ideal for ensuring the production of silage or methane.
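The correlation step described above can be illustrated with a minimal Pearson coefficient in plain Python (the series below are made-up stand-ins for an index and a nutritional value across sampling dates, not values from the study):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series,
    e.g. a vegetation index vs. a nutritional value across phenophases."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative (invented) values: NDRE vs. dry matter over eight samplings.
ndre = [0.42, 0.40, 0.37, 0.33, 0.28, 0.24, 0.20, 0.17]
dry_matter = [180, 205, 235, 265, 295, 320, 345, 360]  # g/kg
print(round(pearson_r(ndre, dry_matter), 3))
```

A strongly negative coefficient here would indicate that the index falls as dry matter accumulates, which is the kind of linear relationship the study tests for statistical significance.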

#### *2.1. Study Site*

The experimental monitoring and sampling were carried out over agricultural land managed by the enterprise Bonagro Blažovice, a.s., the type of crop involved being the corn hybrid LG Apotheos FAO 500, delivered by Limagrain Central Europe S.E. The land is located in the vicinity of the village of Prace (Figure 2), in the South Moravian Region, Moravia, the Czech Republic; the coordinates of the test fields are 49.1472789N, 16.7701758E. In terms of the climate, the land, being situated within a temperate zone and at an altitude of 260 m, generally experiences warm to hot summers. During the monitored phenophase, the average air temperature and precipitation reached 14.7 °C and 12.7 mm a week, respectively. The average amounts of precipitation differed between the individual

sampling cases. The concrete values equaled 17.5 mm per week in the first 3 weeks, zero (no rain) in the following 3 weeks, and 45.5 mm in the last week.

**Figure 2.** The location of the fields observed in the experiment.

The experiment started with the initial corn sampling on 12 August 2020, when the crop was going through the second half of the phenological stage of growth and was still earless. Procedurally, in the area of interest, we performed imaging and collected samples for chemical analysis, invariably at weekly intervals. In total, the samples were collected at eight time intervals, and the last sampling took place on 5 October 2020; by that date, compared with corn not involved in the experiment, the condition of the plants already corresponded to a later, post-harvest stage. The sampling and imaging were regularly executed between 12 p.m. and 2 p.m.

#### *2.2. Imaging Methodology*

The initial step consisted of acquiring a sufficient quantity of various data by using a multispectral camera mounted on a UAV (Figure 3a,b). For this purpose, we employed a MicaSense RedEdge camera on a DJI Matrice 600 Pro aerial vehicle. The camera operates in 5 narrow spectral bands, and each of the sensors has a resolution of 1280 × 940 pixels. The device ensures the narrowband recording of wavelengths in regions sensitive to the human eye, namely, the range of 400 to 700 nm in blue—B, green—G, and red—R, and also within the rim of the red sector of visible light (red edge—RE); the near-infrared (NIR) range, invisible to the human eye, is recorded also. The concrete parameters of the bands are summarized in Table 1. To carry out the scanning, we preset the automatic sequential image shooting mode, based on the exact position of the aerial vehicle. The images were stored on a memory card, together with the metadata comprising the concrete GPS locations where the images were taken. Importantly, the camera contains a light sensor module to correct the exposure at varying light intensities; the sensor thus automatically adjusts the camera exposure according to the angle of the incident beams and the brightness. To compensate for the reflectivity, we calibrated the sensors before each flight by taking images of the gray calibration panel indicating the known parameters (Figure 3c). The panel ensures that the images remain stable regardless of the light conditions.

The flight path was created via the application Pix4D Capture [37]. The total area of the monitored crops exhibited a rectangular shape and dimensions of 401 m × 331 m (approximately 13.2 ha) (Figure 4). The length of the flight path equaled 4477 m, and the actual survey flight took 31 min, with an image overlap of 70%. The UAV performed the imaging at a speed and height above ground level of 8.6 km/h and 40 m, respectively. In each monitored band, we acquired 450 images with a resolution of 2.78 cm/pixel.
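As a quick consistency check, the reported flight duration follows directly from the quoted path length and ground speed:

```python
# Figures taken from the survey description above.
path_length_m = 4477       # total flight path
speed_kmh = 8.6            # ground speed of the UAV

speed_ms = speed_kmh * 1000 / 3600        # ~2.39 m/s
flight_min = path_length_m / speed_ms / 60
print(round(flight_min, 1))               # ~31 min, matching the reported survey time
```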

**Figure 3.** (**a**) The UAV in operation; (**b**) the RedEdge multispectral camera; (**c**) the calibration panel showing the known reflectivity value for each of the bands recorded.


**Table 1.** The parameters of the bands recorded by the RedEdge camera.

**Figure 4.** The area of interest visualized in Pix4D Capture.

#### *2.3. Nutrition Value Processing*

The biomass from the harvested crops comprises exclusively whole corn plants. The basic indicator relating to the phenophase of crops rests in determining the dry matter content; the term dry matter then represents the solid, waterless portion of fodder. In corn, the dry matter indicates vegetation maturity [29]. During the vegetation growing season, the chemistry of corn plants changes; in the course of the earless phases, the energy is stored particularly in the fibers. Fodder for dairy cows, however, requires ear starch. Thus, to ensure that the silage contains both the fibers and the starch, the crops are harvested at wax maturity, when the plants' dry matter content reaches 280 to 330 g/kg. In such cases, the milk line stage attains the level of 2/3 in the corn grain. Another vegetation maturity indicator lies in the corn's capability of being silaged, namely, producing the fermentation acids that conserve the silage.
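The 280–330 g/kg wax-maturity window described above can be expressed as a small helper function (an illustrative sketch for this rule of thumb, not part of the authors' toolchain):

```python
def silage_harvest_window(dm_g_per_kg, low=280.0, high=330.0):
    """Classify whole-plant dry matter content (g/kg) against the
    280-330 g/kg wax-maturity window for silage corn."""
    if dm_g_per_kg < low:
        return "too early"
    if dm_g_per_kg > high:
        return "too late"
    return "harvest window"
```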

The quality of the fermentation processes is fundamentally influenced by the harvest time and the total biomass quantity.

Every year, the properties of the crops and the silaging are highly variable, depending on the weather, the selected hybrid and its Food and Agriculture Organization (FAO) designation (the number of vegetation days), and the quality of sowing and care. During the growth, the dry matter content increases, and the fibers lignify. Furthermore, the development of the ears causes the volume of starch to rise, while the amount of sugars decreases due to their transformation into grain starch. In this manner, an easily silageable plant becomes one that can be silaged with medium difficulty, and the chopped crop then has to be shortened to allow effective packing-down and air removal. All of these changes play a major role in the specification of the harvest time [6,30].

The sampling was invariably performed at identical time intervals, together with the multispectral imaging. To monitor the quality of the corn hybrid, we opted for sampling according to the methodology recommended by the Central Institute for Supervising and Testing in Agriculture, Brno, Moravia, the Czech Republic [38]. Before commencing the inspection, we selected 3 spots in various sectors inside the area to obtain representative data of the growth homogeneity. Each of the spots provided 10 successively neighboring plants, and these were immediately transported to the Pohořelice-based laboratories operated by the company NutriVet, s.r.o. After being separated, the samples were ground and dried at 60 °C for approximately 24 h to yield a stable content of dry matter. At the sampling and measurement stage, the plants were still earless and could thus be shredded without prior disjoining. The dried mass was homogenized by grinding in a laboratory mill with 1-mm screen openings. Subsequently, each sample was analyzed twice to supply, at different stages of the procedure, information relating to the following structural, nutritional, and chemical quantities: FM—fresh matter; EW—ear weight; DM—dry matter; CP—crude protein (established from the dry matter); CF—crude fiber; starch—starch content; ash—ash content; NDF—neutral detergent fiber; DNDF—digestibility (NDF); and DOM—organic matter digestibility. The data obtained then facilitated computing the hectare yield indicators, namely, the yields of fresh matter (YFM) and dry matter (YDM).

All of the analyses involving the chemical quantities indicated above were executed by applying common techniques. The contents were determined via the methods specified by the Association of Official Analytical Chemists (AOAC), which are represented through numerical codes; here, each code stands for a method used with a particular substance. Thus, we can provide the following list: DM (# 934.01), ash (# 942.05), crude protein (# 976.05), starch (# 920.40), NDF (# 2002.04), and DNDF (# 973.18) [39–42]. The outcomes then enabled us to compute, for each sampling phase, the average values in the monitored substances. After the fifth sampling, when the corn had already developed the ears, the analysis also involved separating the ears from the parent plants and weighing them without the leaves. The procedures were completed by establishing the dry matter and starch contents.

Usually, corn sampling to assess the condition and phenophase takes place at diverse spots. At the milk line stage, the samples began to be transported to laboratories to determine the dry matter contents in both the grain and the entire plant. Based on the level of plant development and the dry matter volume, the harvest time is preliminarily specified and differentiated according to the intended use, namely, milk or methane production (the latter in biogas stations).

#### *2.4. Vegetation Reflectivity Preprocessing*

The scanned multispectral wavelength bands for the blue, green, red, red edge, and near-infrared sectors interact with the vegetation differently, depending on the solar radiation, the absorption and reflection of which result from and show dissimilarities in the overall chemical composition and the contents of water, pigments, and nutrients. The high contrast of variations in the near-infrared band ensures broad usability when setting up vegetation indices. Furthermore, the narrow red edge band also exhibits strong reflectivity changes, from the absorption of red to the considerable reflection of near-infrared radiation. Out of the monitored bands, the near-infrared spectrum has the strongest reflectivity, and, together with the red band, is the most frequently applied option when assembling vegetation indices [20].

The data sensing was carried out at eight time intervals corresponding to specific 7-day phenophases of plants. For each of the monitored spectral bands, we formed a TIF image that embodied a reflectivity map covering the entire area. In these maps, we defined the homogeneous growth subareas that matched the sampling spots. The multispectral images of the individual scanned phases were processed using the Pix4D Mapper software. We then set up a color matrix of the vegetation pixels, assigning this matrix to each image data band. The images of the gray calibrating body indicating the known reflectivity values allowed us to acquire the mean reflectivity value in the preset section of the monitored growth area.

The patterns of the scanned spectral band values will produce a spectral reflectance curve, which represents the quantity of radiation reflected over the entire range of the wavelength bands observed. The spectral reflectance, *ρ*(*λ*), defines the energy proportion between the reflected, *E*<sub>R</sub>(*λ*), and the incident, *E*<sub>i</sub>(*λ*), solar radiation at a certain wavelength; utilizing the formulas employed in sources [43–46], we then have:

$$
\rho\_{\left(\lambda\right)} = \frac{E\_{\rm R}\left(\lambda\right)}{E\_{\rm i}\left(\lambda\right)} \cdot 100\ [\%]. \tag{1}
$$
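As an illustrative sketch, Equation (1) translates directly into code; the function name and the example values below are ours, not the authors':

```python
def spectral_reflectance(e_reflected: float, e_incident: float) -> float:
    """Spectral reflectance rho(lambda) per Equation (1): the percentage of
    incident solar radiation at a given wavelength that is reflected."""
    if e_incident <= 0:
        raise ValueError("incident radiation must be positive")
    return e_reflected / e_incident * 100.0

# Illustrative values: 45 units of energy reflected out of 100 incident -> 45%
print(spectral_reflectance(45.0, 100.0))  # 45.0
```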

#### *2.5. Multispectral Indices*

The indices usually originate from computing at least two spectral images, selected in such a manner that the vegetation reflectivity changes become prominent. In the majority of cases, the indices are functionally equivalent, and more than 150 have been presented in the literature to date; however, only a small subset of these rest on a solid biophysical basis or were systematically tested [47–52]. Our experiment verifies possible correlations in three proportional indices, each computed as a normalized ratio of surface reflectivities.

Each vegetation index focuses on certain vegetation properties and has a specific applicability. To facilitate the analysis, we used the proportional indices NDVI, NDRE, and GNDVI; all three are computed identically, differing only in the spectral bands they contain. Together, the indices then embody a comprehensive cross-section through the observed wavelengths (Figure 4).

The formula for calculating the normalized vegetation indices and the spectral band reflectivities reads:

$$\text{NDVI} = \frac{\rho_{NIR} - \rho_{Red}}{\rho_{NIR} + \rho_{Red}},\tag{2}$$

$$\text{NDRE} = \frac{\rho_{NIR} - \rho_{RedEdge}}{\rho_{NIR} + \rho_{RedEdge}},\tag{3}$$

$$\text{GNDVI} = \frac{\rho_{NIR} - \rho_{Green}}{\rho_{NIR} + \rho_{Green}}.\tag{4}$$
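All three indices in Equations (2)–(4) share the same normalized-difference form; a minimal sketch might read as follows (the reflectance values are illustrative, chosen by us):

```python
def normalized_difference(rho_a: float, rho_b: float) -> float:
    """Generic normalized-difference form shared by Equations (2)-(4)."""
    return (rho_a - rho_b) / (rho_a + rho_b)

def ndvi(rho_nir: float, rho_red: float) -> float:        # Equation (2)
    return normalized_difference(rho_nir, rho_red)

def ndre(rho_nir: float, rho_red_edge: float) -> float:   # Equation (3)
    return normalized_difference(rho_nir, rho_red_edge)

def gndvi(rho_nir: float, rho_green: float) -> float:     # Equation (4)
    return normalized_difference(rho_nir, rho_green)

# Illustrative reflectances (%) for a vigorous canopy: NIR high, red very low
print(round(ndvi(45.0, 3.0), 3))    # 0.875
print(round(ndre(45.0, 25.0), 3))   # 0.286
print(round(gndvi(45.0, 10.0), 3))  # 0.636
```

Note how the much smaller NIR-versus-red-edge contrast yields the lower NDRE values reported in Table 4.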

The normalized difference vegetation index, NDVI, constitutes a numerical indicator of plant health and a source of detail on vegetation changes. The index also informs on the amount of water stress and chlorophyll in a plant, assesses the monitored vegetation surface through the proportion between the red and near-infrared sectors of the spectrum, and recognizes tiny vegetation differences thanks to the reflectivity of the near-infrared spectrum [41].

The NDVI takes values between −1 and 1; the higher values usually represent "greener" plants having a photosynthetic capacity greater than that of the other components within the area of interest. In permanent crops, grasses, and cereals, but also in some row crops at the later stages of full growth, the chlorophyll content reaches a point where the index "saturates" close to the maximum value (NDVI ≈ 1.0). In such cases, detecting differences between plants by using the NDVI becomes problematic. At the later growth stages, vegetation aging causes the NDVI values to decline [53–55].

The NDVI utilizes the red band, which is intensively absorbed by the upper portions of the overall plant surface. The lower levels of the plant thus do not contribute significantly to the actual measurement, worsening the correlation between the NDVI and the volumetric properties of the plant. This effect becomes more important in tall plants that carry multiple layers of leaves, especially at the later stages [53].

The normalized difference red edge index (NDRE) utilizes, similarly to the NDVI, the near-infrared band and the frequency band that is situated in the transition region between the visible and the infrared spectra, namely, the red edge band (*ρRedEdge*) [56].

The NDRE computation allows us to better penetrate permanent or late crops, as the absorption is not confined to the upper level of the plant as intensively as in the NDVI. Moreover, the NDRE is somewhat less sensitive to saturation in thick vegetation and therefore measures changes more effectively when the NDVI approaches +1.0 [53,56].

The green normalized difference vegetation index (GNDVI) uses the green band instead of the red one in the computation, with *ρNIR* representing the reflectivity values in the near-infrared band and *ρGreen* denoting the values in the green band [57].

The benefit of this index lies in its high correlation with the biophysical parameters of the investigated plants and its low sensitivity to the other monitored areas. At the green wavelengths, the reflectivity responds better to variation in the biomass quantity. Furthermore, the green band is more likely to capture differences caused by a lack of nutrients, which then manifest themselves in the resulting crop production. Given these advantages, the index has the potential to compensate, via the green component of the spectrum, for the insufficient sensitivity of the NDVI [58].

**Figure 5.** The maps of the investigated land at the fifth stage of scanning: (from left to right) the NDVI, NDRE, and GNDVI maps.

#### *2.6. Correlation Analysis*

To determine the relationships between the chemical analyses and the reflectivities of the spectral images, we sought the correlation coefficients. In this context, correlation does not imply causality: we only searched for a mutual linear relationship. The correlation rate was quantified through the calculated correlation coefficient, which may take a value from −1 to +1. A coefficient of +1 establishes a perfectly direct relationship: as one variable grows, so does the other. By contrast, a coefficient of −1 establishes a perfectly inverse relationship: as one variable grows, the other declines. If the coefficient equals zero, then no linear relationship exists between the monitored parameter and the reflectivity or the vegetation index.
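The Pearson coefficient described above can be sketched in a few lines; the eight-stage sample series below is hypothetical, chosen only to illustrate a strong inverse relationship:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical eight-stage series: a vegetation index declining as dry matter rises
dm_pct = [18.0, 21.5, 24.0, 28.5, 31.0, 33.5, 36.0, 38.5]
index  = [0.65, 0.63, 0.60, 0.57, 0.54, 0.50, 0.44, 0.36]
print(round(pearson_r(dm_pct, index), 3))  # strongly negative, close to -1
```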

To decide whether the correlation coefficients were large enough to enable us to plausibly assume a mutual relationship, we needed to calculate their statistical significance. The statistical significance was evaluated according to Student's *t*-distribution with *n* − 2 degrees of freedom. We used:

$$t_{\text{score}} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}},\tag{5}$$

where *r* is the Pearson correlation coefficient.

When testing for statistical significance, we selected a significance level of 2%; the corresponding Student's critical value (for *n* − 2 = 6 degrees of freedom) equals 3.143. If the calculated *t*-score is higher than the critical value, the correlation can be considered statistically significant.
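The test of Equation (5) with the stated 2% level and critical value of 3.143 can be sketched as follows; the helper names are ours:

```python
import math

T_CRITICAL = 3.143  # two-tailed Student's t at the 2% level, n - 2 = 6 d.f.

def t_score(r: float, n: int) -> float:
    """Equation (5): t statistic for a Pearson coefficient r over n samples."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def is_significant(r: float, n: int, t_crit: float = T_CRITICAL) -> bool:
    """A correlation is significant if its t-score exceeds the critical value."""
    return abs(t_score(r, n)) > t_crit

# With n = 8 sampling stages: |r| = 0.92 passes the test, |r| = 0.60 does not
print(is_significant(0.92, 8))  # True
print(is_significant(0.60, 8))  # False
```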

#### **3. Results**

This section outlines the results obtained from the nutritional analysis and presents the spectral curves of the reflectivities at the scanned wavelengths, acquired via processing the multispectral images and computing the vegetation indices. These aspects were completed with a description of the process of calculating the correlation coefficients associated with the relationships between the laboratory results' variation and the data from the multispectral images.

#### *3.1. Nutrition Analysis*

In each of the corn samples, on the individual sampling days, we invariably weighed the total mass of 10 plants. Subsequently, the FM (fresh matter) and EW (ear weight) rates were established in each sample; the latter rate, however, began to be determined only with the 5th sampling. The relevant chemical analysis then allowed us to establish the contents of structural, nutritional, and other substances. Out of all the sampled and analyzed values, we computed—invariably for one sampling stage—the average value of the given parameter. The values resulting from the individual sampling instances are summarized in Table 2.

#### *3.2. Multispectral Image Processing*

The spectral reflectivities acquired from the spectral maps capturing the monitored vegetation are minimal—in all the scanning phases—in the visible part of the spectrum as compared to the reflectivity changes in the near-infrared band (see Table 3).

Figure 6 displays the spectral curves of the reflectivities to define the condition of the plants with respect to that of the overall vegetation. The green spectrum, with a wavelength of 560 nm, forms the local reflectivity maximum in the visible sector of the spectrum; the higher reflectivity, compared to those of the blue (475 nm) and red (668 nm) bands, stems from a strong correlation with the chlorophyll contained in the plants. The intensive absorption exhibited by the chlorophyll in the photosynthesis within the blue and the red spectra causes low reflectivity; in these spectra, the chlorophyll absorbs approximately 90% of the incident radiation.


**Table 2.** The laboratory results obtained from plants collected at the different sampling stages.

<sup>1</sup> DM—dry matter; <sup>2</sup> CP—crude protein (determined from the DM); <sup>3</sup> CF—crude fiber; <sup>4</sup> starch—starch content; <sup>5</sup> Ash—ash content; <sup>6</sup> NDF—neutral detergent fiber; <sup>7</sup> DNDF—digestibility (NDF); <sup>8</sup> DOM—digestibility (organic matter); <sup>9</sup> FM—fresh matter; <sup>10</sup> EW—ear weight; <sup>11</sup> YFM—yield of fresh matter; and <sup>12</sup> YDM—yield of dry matter.

**Table 3.** Evaluation of the vegetation reflectivities at the individual sampling stages.


**Figure 6.** The reflectivity relationships at the individual wavelengths and sampling stages.

In the near-infrared band (840 nm) and in the red edge portion (717 nm), the reflectivity of the vegetation is markedly higher than in the visible spectrum. The near-infrared band allows us to clearly discern the growth variation between the individual scanning stages. With the progressing phenophase, the reflectivity in the near-infrared band exhibits a tendency to rise; however, a decrease begins after the maximum period has been reached and the plants have started to age.

To select suitable indicators for defining the vegetation changes in a time sequence, we first need to distinguish the differences in the reflectivities at the separate wavelengths and in the computed indices. The calculated average values of the indices from the investigated portions of land as related to the eight time intervals within the crop growth period are outlined in Table 4.


**Table 4.** The calculated values of the vegetation indices.

#### *3.3. Correlation Analysis*

All the measured spectral band reflectivity averages (blue, green, red, red edge, and near-infrared) and the calculated values of the vegetation indices (the NDVI, NDRE, and GNDVI) were correlated with the average nutritional values established at the individual sampling stages via laboratory analyses performed on the sampled plants (see Table 5). The linear correlation rates from Table 5 are presented in Table 6.



<sup>1</sup> DM—dry matter; <sup>2</sup> CP—crude protein (determined from the DM); <sup>3</sup> CF—crude fiber; <sup>4</sup> starch—starch content; <sup>5</sup> Ash—ash content; <sup>6</sup> NDF—neutral detergent fiber; <sup>7</sup> DNDF—digestibility (NDF); <sup>8</sup> DOM—digestibility (organic matter); <sup>9</sup> FM—fresh matter; <sup>10</sup> EW—ear weight; <sup>11</sup> YFM—yield of fresh matter; <sup>12</sup> YDM—yield of dry matter.

**Table 6.** The linear correlation rates.


In the individual correlation coefficients, we calculated the statistical significance values, and these were subsequently compared with the critical value of 3.143. The statistically significant values are highlighted in Table 7.

**Table 7.** The statistical significance of the correlation data acquired through multispectral imaging and by means of the chemical analyses.


<sup>1</sup> DM—dry matter; <sup>2</sup> CP—crude protein (determined from the DM); <sup>3</sup> CF—crude fiber; <sup>4</sup> starch—starch content; <sup>5</sup> Ash—ash content; <sup>6</sup> NDF—neutral detergent fiber; <sup>7</sup> DNDF—digestibility (NDF); <sup>8</sup> DOM—digestibility (organic matter); <sup>9</sup> FM—fresh matter; <sup>10</sup> EW—ear weight; <sup>11</sup> YFM—yield of fresh matter; and <sup>12</sup> YDM—yield of dry matter.

#### *3.4. Classifying the Vegetation Relationships*

The charts below visualize the linear relationships between the selected nutritional values and the monitored reflectivities or calculated vegetation indices (see Figure 7).

Of the established nutritional values, we selected the dry matter, nitrogen substances, and starch, due to their high statistical significance with respect to all of the imaging values, and also because they embody the most vital organic nutrients that determine the eventual quality of the corn at harvest. Using these organic nutrient values, which exhibit the most significant correlation indices, we compared the vegetation indices and major spectral bands, namely, red, green, and blue.

**Figure 7.** The linear relationships between the laboratory-established values and (left) the vegetation indices (NDVI, NDRE, and GNDVI) or (right) the reflectivity in the red, green, and blue spectral bands, relating: (**a**) the vegetation indices to the dry matter; (**b**) the reflectance to the dry matter; (**c**) the vegetation indices to the crude protein; (**d**) the reflectance to the crude protein; (**e**) the vegetation indices to the starch; (**f**) the reflectance to the starch.

#### **4. Discussion**

Analyzing multispectral images based on an exact knowledge of vegetation health is one of the procedures that support the transition from traditional agricultural methods to precision agriculture. An increase in the quality of harvested corn and a reduced fodder consumption following on from the ability to closely determine the optimum harvest time will generate novel approaches to the contactless analysis of plants at various growth stages, together with a major potential for automated and rapidly expandable applicability in most types of vegetation. In the case of corn, the optimum harvest time is established according to the content of dry matter, depending on whether the chopped crop is intended to be silaged or to produce methane in a biogas station. For an identification of the correct period, it is therefore necessary to know exactly the nutritional values of the crop on the entire land concerned.

Within the research, we achieved the preset goals, namely, defining the relationships between the nutritional parameters acquired through chemical analyses and the vegetation indices yielded via multispectral imaging of the entire area of the cornfield.

The subsections below characterize the results of the nutritional analysis and the outcomes of the multispectral image processing, including the computation of the vegetation indices. The core subsection presents the mutual correlation analysis of the relationships between the patterns of changes in the laboratory results and the data obtained from the UAV-based multispectral images. In this context, possible correlation uncertainties are also considered.

#### *4.1. Nutritional Analysis*

The evaluated nutritional indicators (Table 2) show that, in corn, the progressing phenophase is associated with an increasing content of dry matter (DM). Furthermore, the rising proportion of the grain is accompanied by a growing share of starch in the entire plant; the starch then embodies the central source of energy for the plant to be harvested. In the other parts of the organism, a decrease occurs in the nitrogen substances, and the digestibility of the fiber is markedly reduced due to lignification. Interestingly in this context, no correlation has been found to date between the fiber content and digestibility. The ideal harvest time was identified with the interval separating the 4th and 5th sampling stages. This optimum period was determined through the dry matter values, which, in the discussed phases, amount to 280–330 g/kg. More concretely, in all of the laboratory samples, the values at the 4th stage ranged from 277.9 g/kg to 349.8 g/kg, while at the 5th stage they already ranged between 291.8 g/kg and 347.4 g/kg. Another factor of importance rests in the average volumes of starch; at the 4th stage, the relevant value reached 273 g/kg/DM, and in the 5th phase it already equaled 319.8 g/kg/DM, with the ideal level of 300 g/kg/DM corresponding to 2/3 of the milk line stage. The intensive increase in the dry matter between the third and the fourth phases was induced by considerable precipitation; however, the fall of the precipitation rate down to zero then caused a sharp change in the nutritional values.

#### *4.2. Multispectral Image Processing*

The reflectivity relationships in the various portions of the spectrum confirm the expected scenario and represent the changing condition of the monitored crops over time (Figure 6). From the perspective of the reflectivity level, the spectral curves can be divided into two regions, namely, the visible part of the spectrum and the near-infrared sector. We can observe a very low reflectivity in the visible portion of the spectrum (up to 670 nm), the relevant value being not more than 14% (the green band); thus, the radiation is mostly absorbed. The reason for this rests in the large quantity of biomass in the observed area, suggesting that the radiation is consumed through photosynthesis. The other set of monitored wavelengths gradually passes into the near-infrared region. This progressive transition is accompanied by more prominent differences (the red edge band) between the sampling stages, with the reflectivity peaking at 39% in the red edge band in the 6th observed phase. After the maximum, the reflectivity values decline slightly. The most conspicuous differences characterize the NIR band (840 nm), where the greatest reflectivity divergence in the monitored growth stage reaches up to 45%. Similarly to the red edge region, the NIR band attains its maximum value in the 6th phase, which is likewise followed by a decline in the values. All of the monitored spectra are important for the subsequent computation of the vegetation indices.

Considering the calculated vegetation indices in Table 4, the tendency towards a steady minor decrease allows us to assume, even without prior knowledge of the nutritional values, a later phenological phase in the plants. Through the monitored period, the vegetation index NDVI ranged between 0.72 and 0.97, the average value being 0.85; this index exhibited higher values than its counterparts. In the NDRE, the range was 0.36–0.65, with an average of 0.52, and the GNDVI showed a range of 0.64–0.84 and an average of 0.77. The GNDVI thus possesses the smallest resolving ability. By contrast, the best sensitivity is obtained from the NDRE, where differences are discernible across a broad band of reflectivities.

The values of the NDRE enable us to observe a reflectivity shift towards lower levels, compared to the other two indices; such a scenario arises from calculating the proportion between the reflectivities with the red edge spectral component, which exhibits higher reflectivity variations.

#### *4.3. Correlation Analysis*

Within the research, we established that, as regards determining the changes through the eight investigated growth phases in the selected corn hybrid, the best correlation is found between the dry matter values and the NDRE index (Table 5). To evaluate the statistical significance of the correlation coefficients, we employed Student's *t*-test, applying a significance level of 2%; this is matched by the Student's critical value of 3.143 (Table 7).

This subsection discusses the statistically significant values of the correlation coefficients. A strong correlation can be established for the GNDVI (−0.901) and NDVI (−0.920). Furthermore, statistically significant values are also found in the correlations between the CP and the NDVI (0.922), the NDRE (0.854), and the GNDVI (0.746). The previously mentioned starch content, which also exerts a major impact on the resulting quality of the harvested corn, markedly correlates with the indices NDVI (−0.884) and NDRE (−0.867). The values of the indices NDVI, NDRE, and GNDVI also correlate very well with the calculated OM values. We can then infer from these facts that the strong correlations of the indices NDVI and NDRE are usable not only for determining the convenient harvest time but also for predicting the quantity of the organic matter (OM) obtainable from the yielded crop (correlation with the NDVI at −0.924); these steps are then prominent in establishing the organic matter yield.

The evaluated correlations in the individual narrow spectral bands lead to the assumption of a strong correlation in the red band, which correlates markedly with all the determined nutritional values, the strongest correlation being that with the CP (−0.953). By contrast, the weakest values of the correlation coefficient R are found in the NIR spectral band.

#### *4.4. Classification of the Vegetation Relationships*

The diagrams capturing the vegetation relationships (see Figure 7) allow us to derive formulas that facilitate predicting the optimum harvest values by utilizing the dry matter-, crude protein-, and starch-related data. In this context, the details outlined in the previous subsection indicate that the greatest importance for the prediction rests particularly with the NDRE, as related to the DM, and the NDVI, as related to starch. The obtained NDRE linear relationship (*y* = −0.007 *x* + 1.0836) implies that the ideal dry matter range corresponds to the NDRE value interval of 0.888–0.853. As regards the NDVI relationship (*y* = −0.0045 *x* + 0.9594), the ideal starch level of 300 g/kg DM corresponds to an NDVI value of 0.824.
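Assuming the fitted lines take the dry matter and starch contents in percent (so that 300 g/kg corresponds to *x* = 30), a small sketch reproduces the quoted index values; the function names are ours:

```python
def ndre_from_dm(dm_percent: float) -> float:
    """Fitted NDRE-vs-dry-matter line from Figure 7: y = -0.007x + 1.0836."""
    return -0.007 * dm_percent + 1.0836

def ndvi_from_starch(starch_percent_dm: float) -> float:
    """Fitted NDVI-vs-starch line from Figure 7: y = -0.0045x + 0.9594."""
    return -0.0045 * starch_percent_dm + 0.9594

# Optimum-harvest window of 280-330 g/kg DM (28-33%) -> NDRE 0.888-0.853;
# the ideal starch level of 300 g/kg DM (30%) -> NDVI 0.824
print(round(ndre_from_dm(28.0), 3))      # 0.888
print(round(ndre_from_dm(33.0), 3))      # 0.853
print(round(ndvi_from_starch(30.0), 3))  # 0.824
```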

#### *4.5. Comparing the Results with Previous Research Data*

As this article contains unique, novel results, we can refer only to research papers that relate to our experiment marginally. However, let us note that a similar investigation was described in source [40], whose authors demonstrated that the relative data of the NDVI, PH, and a combination of both are usable when predicting the DM yield of fodder corn grown for silage. By comparison, we can point out that our study includes more types of chemical analyses; moreover, we established that the NDRE index best correlates with dry matter variations and is therefore, besides the NDVI, suitable for determining the ideal harvest time in the selected corn hybrid.

Another project that marginally resembles ours is characterized in source [24], with a focus on exploiting superspectral airborne imagery to predict corn grain yield and ear weight, and to discriminate between growth stages and irrigation treatments. Although the authors of [24] utilize multispectral imaging to determine the phenophase in plants, they concentrate solely on the NDVI index and do not state any correlations with chemical analyses, as is the case with our study.

#### *4.6. Limitations and Future Work*

The use of the methods characterized herein is accompanied by uncertainties including, for example, adverse weather conditions that may impair or destroy the entire concept of the fieldwork. For the purposes of future research, some of these uncertainties can be minimized via diversifying the land to support the experiments.

The vegetation indices are unstable due to short-term changes in the weather and in the solar radiation intensity. To minimize the error in the results of the multispectral imaging, we needed to carry out relevant calibration (Figure 3c). This task was executed for the individual crops, at various vegetation periods and in diverse weather conditions, but at identical times of the day. In plant imaging, this calibration will eliminate the inaccuracies that arise from the single-use sampling.

Another drawback of our method is probably that we employ data for a given hybrid, site, and year. However, as we do not follow absolute values but instead changes in the chemical composition and vegetation indices, it is possible to assume that the results will be applicable more widely to diverse corn hybrids.

The repeatability and stability of the results are certainly limited by the initial choice of a sown hybrid. The established linear relationships between the nutritional values and vegetation indices, and possibly also the reflectivities of the individual spectral bands, relate to hybrids that exhibit common plant phenophases. Different values may be revealed in *stay-green hybrids*, characterized by prolonged vitality and lower dry matter volumes; these hybrids, however, must be distinguished separately, via the criteria of nutritional values and reflectivity in the different wavelength spectra.

An alternative to UAV-based remote sensing rests in satellite imaging; this procedure may embody a more easily available and less costly option where the reflectivities have to be computed over a very large or highly particularized land area.

The planned expansion of corn growth monitoring research involves, among other steps, assigning images to already completed measurement cycles and improving the precision of the correlation curves. Importantly, we intend to compare the individual crops at the various vegetation stages, realizing that nutrition differences constitute merely one of the sources of variations in spectral behavior; other relevant factors include, for example, marked discrepancies between hybrids of the same crop.

#### **5. Conclusions**

To characterize the main outcomes of the research in general terms, we can claim that the preset aims and objectives were met. We revealed *new mathematical relationships* between the nutritional parameters acquired through chemical analyses and the vegetation indices (such as the NDRE and NDVI) established via multispectral imaging. The defined relationships then allowed us to compute the relevant nutritional values from the multispectral images of the entire monitored cornfield, without the need to perform a chemical analysis (see Figure 7). The nutritional value data corresponded to the average value of the field and may significantly help the farmers in estimating the optimum harvest time.

Considering the applied methodology and procedural options, the optimum harvest time can be predicted solely via remote sensing with a multispectral camera and by utilizing the formulas set out in Figure 7, which enable us to compute the optimum values of the multispectral index (the *y* variable in the formula) by applying the known concrete values of the monitored substances (the *x* variable in the formula). Remarkably, given their high statistical significance (Table 7), the NDRE and NDVI indices facilitate predicting the contents of not only the dry matter, namely, the most significant value in this context, but also the starch and crude protein.

Such innovative evaluation of the discussed factors will effectively reduce the cost of additional chemical analyses, and both farmers and researchers will be able to estimate the required quantity over the entire area of the field(s). Thus, compared to a chemical analysis of a limited number of samples, it is possible to estimate the optimum harvest time in heterogeneous vegetation more precisely with respect to the nutritional values. The authors of this paper consider determining the optimum harvest time important as regards the quantity of dry matter to generate methane and the actual production chain between the fodder and the milk.

The multispectral imaging method nevertheless features certain limitations, as described in Section 4.6. The relationships indicated in Figure 7 then can (and will) be made more precise via further experimentation.

**Author Contributions:** Concept, J.J. and V.J.; methodology, J.J., V.J. and P.M.; data curation, J.J., V.J. and H.S.; original draft preparation and final writing, J.J., V.J., P.M., H.S., and P.D.; funding acquisition, P.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This paper was funded by the general student development project at Brno University of Technology.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This paper was funded from the general student development project materialized at Brno University of Technology.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Photogrammetry Using UAV-Mounted GNSS RTK: Georeferencing Strategies without GCPs**

**Martin Štroner \*, Rudolf Urban, Jan Seidl, Tomáš Reindl and Josef Brouček**

Department of Special Geodesy, Faculty of Civil Engineering, Czech Technical University in Prague, Thákurova 7, 166 29 Prague, Czech Republic; rudolf.urban@fsv.cvut.cz (R.U.); jan.seidl@fsv.cvut.cz (J.S.); tomas.reindl@fsv.cvut.cz (T.R.); josef.broucek@fsv.cvut.cz (J.B.)

**\*** Correspondence: martin.stroner@fsv.cvut.cz

**Abstract:** Georeferencing using ground control points (GCPs) is the most common strategy in photogrammetry modeling using unmanned aerial vehicle (UAV)-acquired imagery. With the increased availability of UAVs with onboard global navigation satellite system–real-time kinematic (GNSS RTK), georeferencing without GCPs is becoming a promising alternative. However, systematic elevation error remains a problem with this technique. We aimed to analyze the reasons for this systematic error and propose strategies for its elimination. Multiple flights differing in the flight altitude and image acquisition axis were performed at two real-world sites. A flight height of 100 m with a vertical (nadiral) image acquisition axis was considered primary, supplemented with flight altitudes of 75 m and 125 m with a vertical image acquisition axis and two flights at 100 m with oblique image acquisition axes (30° and 15°). Each of these flights was performed twice to produce a full double grid. Models were reconstructed from individual flights and their combinations. Individual flights or even their combinations yielded systematic elevation errors of up to several decimeters. This error was linearly dependent on the deviation of the focal length from the reference value. A combination of two flights at the same altitude (with nadiral and oblique image acquisition) was capable of reducing the systematic elevation error to less than 0.03 m. This study is the first to demonstrate the linear dependence between the systematic elevation error of the models based only on the onboard GNSS RTK data and the deviation in the determined internal orientation parameters (focal length). In addition, we have shown that a combination of two flights with different image acquisition axes can eliminate this systematic error even in real-world conditions and that georeferencing without GCPs is, therefore, a feasible alternative to the use of GCPs.

**Keywords:** drone; GNSS RTK; UAV; photogrammetry; precision; accuracy; elevation

#### **1. Introduction**

**Citation:** Štroner, M.; Urban, R.; Seidl, J.; Reindl, T.; Brouček, J. Photogrammetry Using UAV-Mounted GNSS RTK: Georeferencing Strategies without GCPs. *Remote Sens.* **2021**, *13*, 1336. https://doi.org/10.3390/rs13071336

Academic Editor: Fernando Carvajal-Ramírez

Received: 10 March 2021; Accepted: 28 March 2021; Published: 31 March 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

UAV photogrammetry combined with the structure from motion (SfM) technique is a well-established method for mapping and for the creation of digital terrain models (DTMs), digital elevation models (DEMs), and other spatial models. This technique is often used in mining [1–3], for the monitoring of various natural phenomena and geohazards [4,5], for the detection of agricultural crops/trees [6–8], for dam and riverbed erosion [9], for modeling topographic features [10], for updating cadastral data [11], for solar irradiation estimates [12], etc. SfM has become so popular that, besides ground and UAV photogrammetry, it is currently also used in mobile measurement systems [13]; even attempts at creating DTMs using a mobile phone have been reported [14–16]. UAV photogrammetry simplifies the work, makes it faster, and improves the quality, although the resulting model accuracy depends on many circumstances, such as the configuration and number of ground control points (GCPs) [17,18], camera pitch [19,20], image overlap [21], flight trajectory [22], camera calibration method [23,24], software used for reconstruction [25], or the quality of global navigation satellite system (GNSS) signal processing [26–28]. Besides the point cloud, other outputs, such as orthomosaics, are also widely used [29]. Photogrammetry outcomes are usually evaluated using independent laser scanning or control points (CPs) whose positions are measured by ground surveys; such comparisons facilitate the assessment of model deformations [30–32]. Correct determination of the elements of internal and external orientation, traditionally performed using GCPs, is crucial for creating an accurate photogrammetric model.

At present, the use of UAVs equipped with an onboard global navigation satellite system–real-time kinematic (GNSS RTK) receiver is on the rise. This equipment used to be prohibitively expensive but, lately, it has become more affordable; the DJI Phantom 4 RTK multicopter is an example of such a low-end UAV. Knowledge of the camera position during image acquisition (with centimeter accuracy) has been suggested to be able to fully substitute for GCPs in the georeferencing of the photogrammetric model [33]. Elimination of the need for GCPs would be a great advantage, potentially making the measurement simpler and/or cheaper; in inaccessible areas, it could be crucial for making the measurement possible at all. The resulting model accuracy should correspond to that of the GNSS RTK; the standard deviation should, therefore, not exceed several centimeters. Many studies (e.g., [33]) used such an approach with satisfactory results. Other studies, however, reported that this approach, combined with the calculation of internal orientation elements when solving the bundle adjustment, i.e., without a known camera calibration, yields results systematically shifted along the elevation axis, in some instances by significantly more than the expected accuracy [34–36]. The expected accuracy of the resulting point cloud is 1–2× the ground sampling distance (GSD) [37]; nevertheless, in our previous research, the systematic error was as high as tens of centimeters despite a GSD of 0.03 m [34].
Similar results were presented, for example, by Forlani et al. [38]. These studies suggested that such high inaccuracy is associated with the incorrect determination of the focal length (f). Additionally, the systematic error was shown to be more or less random, as it differed even between repeated flights at the same site under the same conditions (the same UAV, processed in the same way) [34]. A small number of GCPs [39], or even a single one [34], was, however, shown to significantly reduce this problem. Several studies also suggested methods that should suppress or even remove this phenomenon. Besides the aforementioned use of a minimum number of GCPs, the use of oblique images is another promising alternative [20,36,40].

The presented study aims to propose and test strategies enabling the acquisition of a quality photogrammetric model without the high systematic elevation error while avoiding the use of GCPs. Proof that the source of the error indeed lies in the incorrect focal length determination would show the path to resolving this problem: choosing a flight configuration that ensures a sufficiently accurate calculation of the internal orientation elements. We performed image acquisition at two study sites with various flight parameters to independently evaluate the results under different external conditions. By processing the acquired imagery in various combinations, we aimed to find a working strategy (configuration) for the safe and accurate use of the GNSS RTK-based approach without the systematic elevation error.

#### **2. Materials and Methods**

#### *2.1. Data Acquisition*

To facilitate the generalization of the results of this study, the experiment was performed at two study sites. The first site—Brownfield—had only minimal dense undergrowth, low buildings, and concrete and natural surfaces without monochromatic areas (Figure 1). The other site—Rural—was characterized by continuous rapeseed fields alternating with more or less dense forest and shrubs (Figure 2). These two sites differ with respect to the surfaces present and their properties. Figures 1 and 2 show the point clouds in colors indicating the reliability (confidence) of the individual points acquired through SfM in Agisoft Metashape, with lower numbers indicating lower reliability (possible range 1–255). The Brownfield site can, therefore, be considered highly suitable for SfM modeling, while the Rural site can be considered problematic, which is particularly true for the part of the site covered by dense tree vegetation.

**Figure 1.** Brownfield site ((**a**) orthomosaic, (**b**) point cloud color-coded according to confidence).

**Figure 2.** Rural site ((**a**) orthomosaic, (**b**) point cloud color-coded according to confidence).

A DJI Phantom 4 RTK UAV with an FC6310R camera (focal length f = 8.8 mm, resolution of 4864 × 3648 pixels, pixel size of 2.61 × 2.61 µm; total price of the system approx. EUR 6000) was used for image acquisition. The GNSS RTK receiver was connected to the CZEPOS permanent reference station network.

The primary flight altitude was set to 100 m above ground with a vertical image acquisition axis and a ground sampling distance (GSD) of 0.03 m; at the same altitude, flights with the image acquisition axis angled 15° and 30° from the vertical direction were performed. In addition, flights at altitudes of 75 m and 125 m above ground with a nadiral image acquisition axis were performed (Figure 3).
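The stated GSD follows directly from the camera geometry described above. A minimal sketch (the function name is ours; the camera parameters are those given in Section 2.1):

```python
def ground_sampling_distance(height_m: float, pixel_size_m: float, focal_length_m: float) -> float:
    """GSD = flight height x physical pixel size / focal length (all in meters)."""
    return height_m * pixel_size_m / focal_length_m

# DJI Phantom 4 RTK camera: f = 8.8 mm, pixel pitch 2.61 um
gsd_100m = ground_sampling_distance(100.0, 2.61e-6, 8.8e-3)  # ~0.0297 m, i.e., the stated 0.03 m
```

The same relation gives roughly 0.022 m and 0.037 m for the 75 m and 125 m flights, respectively.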

Each flight was performed with a gridded flight plan; two perpendicular flights were performed for each flight setup (forming a double grid, see Figure 4).

Altogether, 10 flights with 75% front and side overlaps were performed. Table 1 shows the parameters of individual flights with the hereinafter used designations and numbers of usable images.

**Figure 3.** Flight altitudes and image acquisition directions.

**Figure 4.** Flight trajectory: (**a)** Brownfield; (**b)** Rural. Flight 1 in red, Flight 2 in blue.

Points that subsequently served, depending on the calculation method, as either control points or ground control points were stabilized at each of the sites. Where possible, these were marked as a cross painted with a contrasting matte spray (Figure 5a). Where this was not possible, especially in the undergrowth, wooden boards with black and white targets were used (Figure 5b) and stabilized using a 10 cm long nail. The dimensions of the crosses/targets were approx. 0.40 m × 0.40 m.

(Ground) control points were distributed as evenly as possible throughout both areas. In all, 25 points were stabilized at the Brownfield site and 30 at the Rural site (Figure 6). The (ground) control points were surveyed using a GNSS RTK Trimble Geo XR receiver with a Zephyr 2 antenna connected to the CZEPOS permanent reference station network (czepos.cuzk.cz). Each control point was measured three times (before, between, and after the UAV flights) to detect potential errors or variations caused, e.g., by changes in the configuration or satellite availability. The expected nominal accuracy of each coordinate was 0.03 m.
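The per-coordinate precision of such triple measurements is obtained as a sample standard deviation; a minimal sketch with purely illustrative height readings (not actual survey values):

```python
from statistics import mean, stdev

def replicate_stats(replicates):
    """Mean and sample standard deviation of repeated measurements of one coordinate."""
    return mean(replicates), stdev(replicates)

# three hypothetical height readings (m) of one control point
avg, sd = replicate_stats([215.012, 215.031, 215.048])  # sd ~0.018 m, within the 0.03 m nominal accuracy
```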


**Table 1.** Image acquisition flights and their properties.

**Figure 5.** (Ground) control points; (**a**) marking using a spray; (**b**) using black and white target.

**Figure 6.** Distribution of the (ground) control points throughout the study sites—Brownfield (**a**), Rural (**b**).

#### *2.2. Data Processing*

Both GNSS RTK measurements (UAV onboard and ground receiver) were processed using the same methods. First, the terrestrial GNSS RTK measurements were exported from the receiver in the WGS 84 coordinate system (latitude, longitude, ellipsoidal height). Similarly, the spatial positions (in the same coordinate system) were extracted from the GNSS RTK data contained in the images using ExifTool. The offset between the GNSS receiver antenna reference point (ARP) and the camera center (CC) was automatically accounted for by the software. Subsequently, all data were converted into the Czech national coordinate positioning system (S-JTSK, System of the Unified Trigonometric Cadastral Network) and the Bpv vertical datum (Baltic after adjustment) using the EasyTransform software (http://adjustsolutions.cz/easytransform/, accessed on 1 March 2021) to ensure that the same algorithm was used on all data, thereby eliminating potential systematic errors that could occur as a result of different transformation algorithms. The three measurements taken for each point with the terrestrial GNSS RTK receiver were used for the calculation of the standard deviation. The images were then processed in Agisoft Metashape 1.6.1 using the structure from motion (SfM) method with the custom settings listed in Table 2.


**Table 2.** Agisoft Metashape settings used for calculations.

The UAV was not equipped with a professional metric camera. Although pre-calibration is generally recommended, the stability of the parameters of non-metric UAV-mounted cameras cannot be guaranteed, and other studies have shown that laboratory calibration does not provide better results than the method used in this study [41,42]. The interior orientation parameters were, therefore, calculated in the usual way (self-calibration during bundle adjustment).

In all, five flight configurations were each performed twice at each site (i.e., ten flights per site). The imaging geometries of the individual flights differed (the bundles intersected at different points due to different flight altitudes and/or camera angles). Each flight was processed separately. In addition, joint processing of the two duplicate flights was also performed (i.e., five joint calculations per site), which allowed us to investigate whether a simple increase in the number of images can improve the accuracy.

The primary flight was at an altitude of 100 m with nadiral image acquisition; the remaining flights were performed to acquire data for testing possible accuracy improvement strategies. Hence, all possible combinations of the 100 m altitude flight with nadiral image acquisition (the primary flight) and flights with other configurations were calculated, i.e., 100m\_1 + 75m\_1; 100m\_1 + 125m\_1; 100m\_1 + 60°\_1; 100m\_1 + 75°\_1; 100m\_1 + 75m\_2; 100m\_1 + 125m\_2; 100m\_1 + 60°\_2; 100m\_1 + 75°\_2 (oblique flights are designated by the camera angle measured from the horizontal, i.e., 60° and 75° correspond to 30° and 15° off-nadir, respectively); the same was done for the second primary flight (100m\_2). Therefore, 16 such combinations were calculated for each site.

Each of the variants described above was calculated with onboard GNSS RTK data only (All\_RTK) and with a combination of the onboard GNSS RTK receiver data with GCP coordinates (All\_Combined).

The processing sequence was the same for all flights: alignment, sparse cloud computation, system optimization, and dense cloud calculation. The dense cloud was subsequently manually cropped along the convex envelope of the GCPs and filtered in CloudCompare 2.11.0 using the CSF (cloth simulation filter) to remove trees and shrubs that could otherwise cause uncertainty in the comparisons of the dense clouds.

Subsequently, the regression parameters and the coefficient of determination describing the relationship between the focal length difference f (between the respective flight and the value obtained from all available data for the respective site, i.e., the All\_Combined calculation variant) and the (mean) systematic error in the model elevation were calculated for each site. Similarly, the correlation between the systematic error in the model elevation and the principal point offset differences (Cx, Cy) was also calculated. The aim was to show the association between the elevation error and the calculation of the internal orientation elements.

#### **3. Results**

#### *3.1. The Accuracy of the GNSS RTK Ground Geodetic Survey*

The accuracy of the GCP/checkpoint survey was evaluated through the calculation of the standard deviations of the individual coordinates S<sub>X</sub>, S<sub>Y</sub>, S<sub>H</sub> from the replicate measurements, as detailed in Table 3. The results confirm the expected accuracy of 0.03 m in each coordinate.

**Table 3.** Standard deviations of the ground GNSS RTK survey of the (ground) control points.


#### *3.2. Elevation Errors and Related Parameters*

As shown, e.g., in our previous study [34], the elevation component of the resulting model is the most problematic one when using direct georeferencing based only on onboard GNSS RTK measurements. Therefore, the mean difference in the elevation of the control points between the onboard GNSS RTK solution and the geodetic survey was calculated to obtain the systematic error (Tables 4 and 5). The residual error was then characterized by the standard deviation, and the total error by the root mean square error (RMS). Similarly, the residual systematic error between the recorded and adjusted camera elevations was calculated for the camera coordinates; the mean difference for individual flights was always equal to zero, and the standard deviation did not exceed 0.03 m (approx. 1× GSD), which confirms that the coordinates recorded during the flights were correct. Tables 4 and 5 also detail other parameters, including the focal length f and the coordinates of the principal point offset Cx and Cy. The first two rows of each table represent the reference values calculated from all available images together with the measured GCP coordinates (All\_Combined) and from the GNSS RTK coordinates only (All\_RTK).
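The error decomposition used here (mean = systematic error, standard deviation = residual error, RMS = total error) can be sketched as follows; with the population standard deviation, RMS² = mean² + SD² holds exactly. The input numbers below are illustrative, not values from the tables:

```python
from math import sqrt

def error_stats(diffs):
    """Mean (systematic), population SD (residual) and RMS (total) of elevation differences."""
    n = len(diffs)
    mean_err = sum(diffs) / n
    sd = sqrt(sum((d - mean_err) ** 2 for d in diffs) / n)
    rms = sqrt(sum(d * d for d in diffs) / n)
    return mean_err, sd, rms

# hypothetical model-minus-survey elevation differences (m): large systematic, small residual part
m, s, r = error_stats([0.26, 0.29, 0.24, 0.31])
```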

These results indicate that the systematic (average) elevation error of individual flights is highly variable, with values as high as 0.85 m in some flights (Brownfield 75°(100m)\_1). It is also apparent that even results derived from two flights at the same height and with the same image acquisition angle differ (both between sites and within the same site). The standard deviation of the elevation is approximately 0.03 m, which is in accordance with the expected GNSS RTK measurement accuracy.

The All\_Combined variants demonstrate very good agreement of all measured images and coordinates, i.e., no outliers or erroneous measurements are present. The agreement of the internal orientation elements is also very good (somewhat worse between sites than within a site).

It should also be noted that a higher systematic elevation error was observed in flights with a larger difference between the focal length f calculated for the particular flight and the most likely value determined from the All\_Combined calculation. Similar conclusions can be drawn with respect to the coordinates of the principal point offset.


**Table 4.** Results of calculation variants—individual flights—Brownfield.

**Table 5.** Results of calculation variants—individual flights—Rural.


Tables 6 and 7 show the results of calculations performed using both corresponding flights. Obviously, the increase in the number of images from the two mutually perpendicular flights led to a reduction of the systematic (mean) elevation error, which is particularly true for the Brownfield site. However, it still exceeds the expected measurement accuracy.

**Table 6.** Results of calculation variants: joint calculation of duplicate flights—Brownfield.


**Table 7.** Results of calculation variants: joint calculation of duplicate flights—Rural.


Tables 8 and 9 detail the values of the combined calculations. It is obvious that at the Brownfield site, basically any non-homogeneous combination of flight parameters improved the results to such a degree that the maximum mean elevation error did not exceed 0.05 m, with a total error (RMS) of at most 0.053 m, which is still less than two GSDs. On the other hand, no such improvement was observed when data from flights performed at two different heights were combined at the Rural site; the systematic error still reached up to 0.4 m. However, the combination of the flight at 100 m with flights with oblique image acquisition did reduce the systematic errors; the improvement increased with the increasing image acquisition angle from the nadiral direction. The standard deviations are similar in all these cases, approximately 0.03 m.

**Table 8.** Results of calculation variants: non-homogeneous combinations of the flight at 100 m with a nadiral axis of image acquisition and other flights—Brownfield.


**Table 9.** Results of calculation variants: non-homogeneous combinations of the flight at 100 m with a nadiral axis of image acquisition and other flights—Rural.


#### *3.3. Analysis of the Association between the Systematic Error in Elevation and the Deviation of the Focal Length f*

The data detailed in Tables 4–9 were used for the calculation of the differences between the determined focal lengths and the reference value from the All\_Combined calculation variant. The relationship is shown in Figures 7 and 8, along with the regression coefficients and the coefficient of determination (calculated in MS Excel).

**Figure 7.** The systematic error of elevation as a function of the deviation of the focal length f from the reference value—Brownfield.

**Figure 8.** The systematic error of elevation as a function of the deviation of the focal length f from the reference value—Rural.

Both the graphical representation and the coefficient of determination R<sup>2</sup>, which is in both cases close to 1 (0.97), prove a practically linear association between the systematic error and the focal length error, confirming the hypothesis stated, for example, in our previous paper [34].
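The trendline computed in MS Excel for Figures 7 and 8 is an ordinary least squares fit; the same slope, intercept, and R² can be reproduced in a few lines. The data below are synthetic (the real per-flight values are in Tables 4–9):

```python
def linear_fit(x, y):
    """Ordinary least squares fit y = a*x + b and coefficient of determination R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b = my - a * mx
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# synthetic example: focal length deviation dF (pix) vs systematic elevation error dH (m)
dF = [-20.0, -10.0, 0.0, 5.0, 15.0, 30.0]
dH = [0.027 * x + 0.01 for x in dF]        # noise-free line with slope ~h/f
slope, intercept, r2 = linear_fit(dF, dH)  # recovers slope 0.027, intercept 0.01, R^2 = 1
```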

The linear regression parameters can be further interpreted. The intercept represents the constant error (mutual difference) between the coordinates determined by the onboard UAV GNSS RTK receiver and the terrestrial GNSS RTK survey, independent of the focal length. The detected size of this difference (max 0.017 m) corresponds to the declared accuracy of 0.03 m. The slope can also be interpreted; using triangle similarity, the following simple equation can be derived:

$$dH = dF \cdot \frac{h}{f},\tag{1}$$

where *dH* is the systematic elevation error, *dF* is the focal length deviation, *h* is the flight altitude above the terrain, and *f* is the focal length; the geometry thus predicts a regression slope of *h*/*f*. The primary flight altitude in the performed experiments was always *h* = 100 m and the focal length *f* = 3685 pix; the ratio *h*/*f* was therefore 0.027 m/pix, which is very close to the experimentally determined slopes (0.0271 for the Brownfield site and −0.0285 for the Rural site).
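Plugging the experimental values into Equation (1) reproduces the observed error magnitudes; for example, a hypothetical focal length deviation of 10 pix at h = 100 m predicts an elevation shift of roughly 0.27 m:

```python
h = 100.0        # flight altitude above terrain (m)
f_pix = 3685.0   # focal length expressed in pixels

slope = h / f_pix  # expected regression slope: ~0.0271 m per pixel of dF

def predicted_elevation_error(dF_pix: float) -> float:
    """Eq. (1): dH = dF * h / f (dF and f in pixels, h and dH in meters)."""
    return dF_pix * slope

dh = predicted_elevation_error(10.0)  # ~0.27 m for a 10 pix focal length error
```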

The practically linear relationship between the focal length deviation and the systematic error in elevation proves that the inaccuracy of the internal orientation element calculation is indeed the source of that error. The focal length *f* determined during model calculation can, therefore, be used as an indicator of this error.

A similar approach to the calculation of the correlation coefficient was also applied to the coordinates of the principal point offset Cx and Cy. However, in this case, no reasonable correlation was found; there is, therefore, no direct relationship between the erroneous computation of the principal point offset and the systematic elevation error.

#### *3.4. Comparison of Dense Clouds*

Thus far, all comparisons and evaluations in this paper were performed using control points (CPs). It is, therefore, necessary to show that the acquired information has general validity, i.e., that it is also valid for the generated dense point clouds. For this purpose, the data were cropped, filtered to remove trees and shrubs, and the resulting point clouds were compared in CloudCompare. The data from the All\_Combined calculations (utilizing all available data) were, again, used as the reference. Below, the All\_Combined calculations are compared to the calculations from the primary flight providing the worse result at each location. The comparison of the point clouds from the 100m\_1 flight and the All\_Combined dataset for Brownfield returned a systematic elevation error of 0.268 m with a standard deviation of the elevation of 0.038 m, as illustrated by Figure 9. This systematic shift corresponds very well to the value calculated from the control points (see Table 4). The comparison of the All\_Combined data and the 100m\_2 flight for the Rural site, illustrated in Figure 10, revealed a systematic error of 0.305 m with a standard deviation of 0.029 m. Again, this systematic shift corresponds very well to the value obtained from the control points (0.315 m; see Table 5). The last presented comparison, between the All\_Combined and All\_RTK point clouds (Figure 11), demonstrates the practical agreement of the resulting point clouds; it is obvious that no deformation has occurred and the elevation difference is practically constant (the systematic shift is 0.002 m and the standard deviation is 0.006 m).

The comparisons of the point cloud elevations to the All\_Combined variant detailed in Figures 9–11 clearly show that the average difference value corresponds to that calculated using control points, and the same can be said about the standard deviation. The results of analyses of the remaining point clouds were in agreement with this statement, which proves that the difference is indeed caused by a systematic shift of the point cloud in the nadiral direction without deformation in the horizontal plane.

It is also obvious that the All\_RTK and All\_Combined calculation variants produced practically identical results, which implies that if the internal orientation elements are determined correctly, the result is correct even if only onboard RTK data are used.

**Figure 9.** Height differences between 100m\_1 RTK and All\_Combined point clouds—Brownfield (area cropped along outer control points).

**Figure 10.** Comparison of All\_Combined to the worst result of individual flights, i.e., the flight 100m\_2—Rural (area cropped along outer control points).

**Figure 11.** Comparison of All\_RTK to All\_Combined—Rural.

#### **4. Discussion**

Many studies have tested the use of UAV photogrammetry with an onboard GNSS RTK receiver for preparing a point cloud representing a DTM for subsequent calculations. Although, in principle, it is possible to perform the whole calculation without the use of GCPs, practical testing revealed that the simultaneous calculation of the internal and external orientation elements may, in some cases, lead to a systematic elevation error even though all measurements are correct. This elevation error is, in effect, random, as it differed even between duplicate flights with the same configuration and the same UAV. This is in accordance with the findings of other authors (e.g., [34,36,38,39]), regardless of whether a fixed-wing or rotary-wing UAV was used.

In 2014, James and Robson described a so-called doming effect arising when a photogrammetric model is created solely from photographs without the use of reference points. This was not observed in our study, which can be explained by the fact that the current algorithms used in Agisoft Metashape already consider the camera coordinates during the construction of the model [43].

A detailed analysis of the results revealed that incorrect calculation of the internal orientation elements, particularly of the focal length *f*, is the most likely reason for this problem (e.g., [34,38,40]). Various strategies to deal with this problem have been proposed, such as the use of oblique images (e.g., [19,36,40]), including a small number of GCPs (e.g., [36,39,40,44]) or camera pre-calibration (e.g., [34,42]). Disregarding the use of GCPs (a reliable solution which, however, negates the principal goal of this study, i.e., the simplification of the measurement by avoiding the use of GCPs), it is necessary to try to propose a measurement strategy (flight configuration, processing method, other techniques) to prevent the systematic error. In this study, we aimed to prove the association between the systematic elevation error and erroneous determination of the internal orientation elements and to test various strategies for eliminating the systematic elevation error. We have shown experimentally in two sites (33 models calculated at each site) that the systematic error in elevation and deviation of the focal length are practically linearly dependent, with a coefficient of determination of more than 0.96. The regression coefficients also correspond very well to the relationship between the flight altitude and focal length (Equation (1)). This implies that it is necessary to adopt a strategy ensuring the best possible determination of the internal orientation elements. Here, we must also note that the correct camera calibration is indeed a principal problem of photogrammetry. There are methods that can be used for this purpose, but they usually rely on the use of GCPs and a very specific image acquisition configuration. 
Additionally, it is also well known that pre/post-calibration, i.e., calibration independent of the particular flight, brings (when using UAVs with non-metric cameras) poorer results than the calibration performed within the scope of the particular flight [45]. It is, therefore, necessary to choose approaches that are actually feasible from the perspective of the camera mounted on the UAV and that can be simply implemented in the image acquisition flight. Here, we combined the primary flight (100 m altitude, nadiral image acquisition) with imagery acquired from other flights. The results of the evaluation of such combined calculations revealed that:


3. The best results were obtained from the combinations of the primary flight and flights with oblique image acquisition. This was the only strategy that worked well at both sites in all tested combinations. The variant with the higher angle (30° from the vertical direction) provided the best results, with even the worst systematic error not exceeding 0.03 m (1 GSD).

In our experiment, only relatively small camera angles (15° and 30° from the vertical direction) were used to prevent disruption of the image alignment, which could pose problems in rugged terrain (i.e., terrain with sloped surfaces). Obviously, the higher the difference in the camera angle, the greater the differences between the appearance of the same area in individual images and, therefore, the more difficult the image matching. Some studies [36,40] used a greater angle (45°), but these were performed on "ideal" flat terrain. The possible angles and their effect on the resulting accuracy should be the subject of further research.

#### **5. Conclusions**

UAVs equipped with onboard GNSS RTK receivers are becoming more affordable. Their use for SfM modeling without the need for GCPs seems to be a natural path to take; however, the usual flight configuration with nadiral image acquisition and self-calibration often produces results with a significant systematic elevation error. In this study, we have proved that the principal cause of this is the incorrect determination of the internal orientation parameters, as the resulting elevation error is practically directly proportional to the error in the determination of the focal length (coefficient of determination of 0.96). This was also confirmed by the numerical agreement with the geometrical relationship (Equation (1)).

To be able to use an onboard GNSS RTK receiver for direct georeferencing, it is, therefore, necessary to ensure the correct calibration of the internal orientation elements; where the use of GCPs and/or an accurate camera calibration is not possible or feasible, the results can be improved by adjusting the flight geometry and the calculation method. We have tested strategies of joint calculations of two flights that were (i) geometrically identical and (ii) geometrically different. A major difference was revealed between the sites; at the site with surfaces suitable for SfM, all tested non-homogeneous flight combinations yielded satisfactory results and the systematic error was reduced to approx. 1 GSD; on the contrary, at the Rural site, covered predominantly with rapeseed, shrubs, trees, and other vegetation, combining different flight altitudes did not result in sufficient improvement. However, combinations of two flights at the same altitude with different camera acquisition axes (nadiral and oblique) performed very well. The combination with the greater difference in acquisition angle (i.e., of the nadiral and the 30° off-nadir image acquisition axes) performed best, being capable of reducing the systematic elevation error to approx. 1 GSD even in the Rural area with a surface highly unsuitable for photogrammetry.

**Author Contributions:** Conceptualization: M.Š.; methodology: M.Š.; ground data acquisition: T.R. and J.S.; UAV data acquisition: J.B.; data processing of UAV imagery: T.R. and J.S.; analysis and interpretation of results: M.Š. and R.U.; manuscript drafting: M.Š. and R.U. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Grant Agency of CTU in Prague—grant number SGS21/053/OHK1/1T/11 "Optimization of acquisition and processing of 3D data for purpose of engineering surveying, geodesy in underground spaces and 3D scanning".

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


## *Article* **Quality Assessment of Photogrammetric Methods—A Workflow for Reproducible UAS Orthomosaics**

**Marvin Ludwig 1,\*, Christian M. Runge <sup>2</sup> , Nicolas Friess <sup>1</sup> , Tiziana L. Koch <sup>3</sup> , Sebastian Richter <sup>1</sup> , Simon Seyfried <sup>1</sup> , Luise Wraase <sup>1</sup> , Agustin Lobo <sup>4</sup> , M.-Teresa Sebastià 2,5, Christoph Reudenbach <sup>1</sup> and Thomas Nauss <sup>1</sup>**


Received: 21 October 2020; Accepted: 19 November 2020; Published: 22 November 2020

**Abstract:** Unmanned aerial systems (UAS) are cost-effective, flexible, and offer a wide range of applications. If equipped with optical sensors, orthophotos with very high spatial resolution can be retrieved using photogrammetric processing. The use of these images in multi-temporal analysis and their combination with other spatial data impose high demands on their spatial accuracy. The georeferencing accuracy of UAS orthomosaics is generally expressed as the checkpoint error. However, the checkpoint error alone gives no information about the reproducibility of the photogrammetric compilation of orthomosaics. This study optimizes the geolocation of UAS orthomosaic time series and evaluates their reproducibility. A correlation analysis of repeatedly computed orthomosaics with identical parameters revealed a reproducibility of 99% in a grassland and 75% in a forest area. Between time steps, the corresponding positional errors of digitized objects lie between 0.07 m in the grassland and 0.3 m in the forest canopy. The novel methods were integrated into a processing workflow to enhance the traceability and increase the quality of UAS remote sensing.

**Keywords:** unmanned aerial systems; unmanned aerial vehicle; time series; accuracy; reproducibility; orthomosaic; validation; photogrammetry

#### **1. Introduction**

Unmanned aerial systems (UAS) are widely used in environmental research. Applications encompass the retrieval of crop yield [1] or drought stress [2] in agricultural areas or the mapping of plant species [3–5], biomass [6,7] or forest structure [8–10] in nature conservation tasks. Today, UAS allow an extensive spatial coverage with high resolution that provides detailed observations on the individual plant level, e.g., for the detection of pest infections in trees [11] or rotten stumps [12]. The flexibility of UAS is also beneficial for multi-temporal observations since flights can be scheduled on short notice based on specific events like bud burst or local weather conditions. Therefore, UAS are regarded as a key component for bridging the scales between space-borne remote sensing systems and in-situ measurements in environmental monitoring systems [13].

Applications of UAS can be structured into two main components: the acquisition of individual images—including the flight planning—with an unmanned aerial vehicle (UAV) for a particular region of interest and the processing of these images with photogrammetric methods to obtain georeferenced orthophoto mosaics [8,14–16] or digital surface models [8,17,18]. Studies on the development of workflows for UAS are sparse, often exclude the flight planning and are mostly tied to a very specific application [4,19]. A generalized, flexible and commonly accepted workflow is still missing [13].

Standardized protocols and quality assessments are needed for a better understanding and appropriate use of UAS imagery. Since the final product quality depends on the initial image capturing, flight planning is one important aspect in a common workflow scheme. For example, the flight height in conjunction with the used sensor (RGB, multispectral or hyperspectral) affects the ground sampling distance (GSD, i.e., pixel size or spatial resolution) of the images and, in conjunction with the flight path, affects the overlap of the individual images, which is a key factor for successful image processing [20]. A fully reproducible study therefore must include the flight path and parameters as well as the camera configuration metadata. For ready-to-fly consumer UAVs, flight planning is usually done in software which is tied to the specific hardware (e.g., the DJIFlightPlanner for DJI drones). These commercial solutions often do not provide full control of autonomous flights and access to metadata, which makes it difficult to integrate flight planning in a generalized workflow. In this respect, open hardware/software solutions like Pixhawk-based systems and the MAVLink protocol (mavlink.io) are advantageous. To complete the metadata, environmental conditions during the flight like sun angle and cloudiness, which also have impacts on the image quality [21], should be recorded.

Equally or even more important for valid results and their use in subsequent data analysis or synthesis is a quality assessment of the resulting image products in terms of their spatial accuracy and reproducibility. The basic image processing workflow starts with the alignment of the individual images, which results in a projected 3D point cloud. Usually, this point cloud is georeferenced through the individual image coordinates or via the use of measured ground control points (GCPs) [22,23]. The point cloud is the basis for the generation of a surface model and the orthorectification and mosaicing of the individual images based on this surface. Commercial software such as Metashape (Agisoft LLC, St. Petersburg, Russia; formerly known as Photoscan) makes these complex photogrammetric methods accessible to a broad range of users and has been utilized in numerous studies [3,13,24].

The quality of photogrammetrically compiled orthomosaics is commonly expressed as their georeference accuracy [25–28] or statistical error metrics derived from the image alignment (e.g., the amount of points in the point cloud or the reprojection error [8,29]). However, these measures alone provide no comprehensive information about the quality of the orthomosaic since the subsequent steps of orthorectification and mosaicing are not taken into account. Image artifacts and distortions can occur during these processing steps that are not reflected in the georeference accuracy [30]. Especially in forest ecosystems, the complex and diverse structures and similar image patterns in the canopy can lead to erroneous imagery [20]. In addition, wind exposure leads to changes in the structure of the tree canopy and consequently causes problems in the alignment of individual images [31]. The quality assessment becomes even more important when time series are analyzed since actual changes of the observed environmental variables have to be separated from deviations which stem from the image processing itself. Here, low georeferencing errors are even more important since the errors of the individual time steps can accumulate. Studies utilizing time series relied on georeferencing errors below 10 cm for the individual time steps [26,32].

To evaluate the reproducibility and validity of orthomosaic time series, it is therefore necessary to: (1) optimize the positional accuracy of the individual orthophoto mosaics, (2) evaluate the reproducibility of the photogrammetric processing of these orthophoto mosaics and (3) evaluate the positional accuracy between features in the individual time steps of the series.

This study proposes (i) an optimization for the orthorectification of UAS images and (ii) an additional quality criterion for UAS orthomosaics that focuses on the reproducibility of the photogrammetric processing. The orthorectification is improved by an automated optimization of the checkpoint error based on an iteration of point cloud filters. The reproducibility of orthomosaics is quantified by the repeated processing of the same scene and a pixel-wise correlation analysis between the resulting orthophoto mosaics. We illustrate the importance of both methods using different orthorectification surfaces and two time series in a grassland and a forest area, respectively. To foster an error and reproducibility optimized orthomosaic processing, we incorporate the new methods into an improved UAS workflow.

#### **2. Materials and Methods**

#### *2.1. Multi-Temporal Flights*


Two multi-temporal UAV-based image datasets were acquired as test samples for this study (Table 1). The first dataset is a series of six consecutive flights over a small temperate forest patch (Wolfskaute, Hesse, Germany). The surveyed area covers 7 ha of a forested hill with an adjacent meadow and ranges from 283 m to 320 m a.s.l. (above sea level), with a canopy height of up to 37 m (Figure 1a). The six flights were performed on 2020-07-07 between 11:00 and 14:00 CEST using a 3DR Solo Quadrocopter (3D Robotics, Inc., Berkeley, CA, USA) and a GoPro Hero 7 camera (GoPro Inc., San Mateo, CA, USA; Appendix B, Table A1). The flight plan was made with QGroundControl and refined with a LiDAR-derived digital surface model (DSM, provided by the Hessian Agency for Nature Conservation, Environment and Geology (HLNUG)) using the R package uavRmp to achieve a uniform altitude of 50 m above the forest canopy (see Appendix A). For georeferencing and checkpoint error calculation, 13 ground control points (GCPs) were surveyed with the Real Time Kinematic (RTK) GNSS (Global Navigation Satellite System) device GeoMax Zenith 35 (GeoMax AG, Widnau, Switzerland). The RTK GNSS measurements had an error of between 0.9 and 1.6 cm in the horizontal direction and between 1.9 and 3.7 cm in the vertical direction. Eight GCPs were used as controlpoints and five served as independent checkpoints to evaluate the georeferencing error (Figure 1a).

The second dataset is an inter-annual time series of a grassland area (La Bertolina, Eastern Pyrenees, Spain). Terrain altitudes range from 1237 m to 1328 m a.s.l. The flights were performed in spring or early summer of 2013, 2015 and 2017 using an octocopter with a Pixhawk controller. Cameras and flight plans at a fixed altitude differ between the dates (Table 1; detailed camera settings in Appendix B, Table A1). The flights took place in the morning with sub-optimal illumination angles below 35 degrees [21] and partly scattered light conditions due to the presence of clouds. Five to eight GCPs were measured in each year with a conventional GPS device without RTK, of which three were used as checkpoints (Figure 1b).

Both datasets were used to empirically determine the georeferencing accuracy and reproducibility of the photogrammetrically retrieved orthomosaics. For a better understanding of the newly introduced approaches, the following chapters first outline the general UAS image processing workflow.

**Figure 1.** Overview of the two study areas and the location of ground control points. (**a**) Forested area in Wolfskaute, Hesse, Germany. (**b**) Grassland area in La Bertolina, Eastern Pyrenees, Spain. Both maps are projected in UTM but with geographic coordinates for a better overview.


**Table 1.** Overview of the flight missions used to acquire the two test datasets in forest and grassland environments. The cameras were triggered by time interval. In the forest flights, altitude refers to a uniform height above the canopy. In the grassland flights, altitude refers to a fixed relative height above the take-off point. Overlap refers to both forward (F) and side (S) overlap.

| Flight | Date | Time | Sun Angle (°) | Clouds | Camera | Duration (min) | GSD (cm) | Altitude (m) | Overlap F/S (%) | Images |
|---|---|---|---|---|---|---|---|---|---|---|
| Forest 01 | 2020/07/07 | 11:20 | | | GoPro Hero 7 | 7 | 2.58 | 50 | >90/75 | 630 |
| Forest 02 | 2020/07/07 | 11:42 | 44.33 | cloud free | GoPro Hero 7 | 7 | 2.58 | 50 | >90/75 | 630 |
| Forest 03 | 2020/07/07 | 12:11 | 50.33 | partially cloudy | GoPro Hero 7 | 7 | 2.58 | 50 | >90/75 | 630 |
| Forest 04 | 2020/07/07 | 12:40 | 56.13 | partially cloudy | GoPro Hero 7 | 7 | 2.58 | 50 | >90/75 | 630 |
| Forest 05 | 2020/07/07 | 13:10 | 61.76 | cloudy | GoPro Hero 7 | 7 | 2.58 | 50 | >90/75 | 630 |
| Forest 06 | 2020/07/07 | 13:43 | 67.28 | cloudy | GoPro Hero 7 | 7 | 2.58 | 50 | >90/75 | 630 |
| Grassland 2013 | 2013/06/01 | 11:17 | 41.34 | partially cloudy | Sony NEX-5N | 7.68 | 3.32 | 111 | 70/75 | 27 |
| Grassland 2015 | 2015/05/22 | 09:26 | 19.99 | cloud free | Sony NEX-7 | 14.2 | 3.68 | 169 | 75/75 | 57 |
| Grassland 2017 | 2017/05/18 | 08:53 | 13.58 | partially cloudy | Sony ILCE-7RM2 | 32.4 | 3.97 | 132 | 75/75 | 57 |

#### *2.2. Image Georeferencing*

Very high resolution orthomosaics such as those resulting from UAV flights require precise positioning to avoid the introduction of complex errors in the image processing [33]. The standard GNSS receivers that are built into cameras do not provide sufficient accuracy. There are two alternative strategies for georeferencing the UAS products: direct georeferencing of the images with an RTK device on the UAV or the use of GCPs. Direct georeferencing requires accurate time synchronization between the RTK device and the camera, which has been reported as a major source of error [33–35].

The use of GCP implies that the study area is accessible in order to install visible ground markers before the flight and precisely measure their position. Ideally, the GCP should be equally distributed over the study area to avoid distortions during processing [33]. During the orthomosaic processing, the ground markers need to be interactively identified in the images. Despite these drawbacks, georeferencing through GCP with general-purpose GNSS receiving systems—that are nowadays standard equipment for surveying—is still far more widespread and potentially more cost-effective than the direct georeferencing with RTK [34].

In any case, GCPs are also required for the independent validation of the referencing accuracy during the processing [24] and therefore essential for a proper accuracy assessment. The geolocation accuracy is usually given as the checkpoint root mean squared error (checkpoint error, Equation (1)) that quantifies the distance between the position of the measured GCP (XYZgcp) and the estimated positions of these coordinates in the photogrammetric processing (XYZest). It can be calculated for each direction individually.

$$\text{Checkpoint error} = \sqrt{\text{mean}\left(\left(XYZ_{\text{gcp}} - XYZ_{\text{est}}\right)^2\right)}\tag{1}$$
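As an illustration only (not code from the study), Equation (1) can be computed per axis with NumPy. The array shapes and the combined horizontal (XY) variant are assumptions of this sketch; the text does not spell out how the per-axis errors are merged:

```python
import numpy as np

def checkpoint_error(xyz_gcp, xyz_est):
    """Checkpoint RMSE per axis, following Equation (1).

    xyz_gcp: (n, 3) surveyed checkpoint coordinates in metres.
    xyz_est: (n, 3) coordinates estimated by the photogrammetric processing.
    Returns the (X, Y, Z) errors as a length-3 array.
    """
    d = np.asarray(xyz_gcp, dtype=float) - np.asarray(xyz_est, dtype=float)
    return np.sqrt(np.mean(d ** 2, axis=0))

def horizontal_checkpoint_error(xyz_gcp, xyz_est):
    """Combined XY error: RMSE of the horizontal point-to-point distances
    (one common convention; an assumption of this sketch)."""
    d = np.asarray(xyz_gcp, dtype=float)[:, :2] - np.asarray(xyz_est, dtype=float)[:, :2]
    return np.sqrt(np.mean(np.sum(d ** 2, axis=1)))
```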

#### *2.3. Photogrammetric Processing*

The Metashape software (Agisoft LLC, St. Petersburg, Russia; formerly known as Photoscan) is widely used for UAS image processing. The standard photogrammetric workflow includes the image alignment, the generation of a digital surface model (called Digital Elevation Model in Metashape) and the orthorectification and mosaicing (Figure 2). The image alignment starts with the automatic identification of distinct features in the individual images. This process is enhanced and requires less computation time if the individual images already contain GNSS information. Those features which appear in more than one image, the so called tie points, are matched and projected in a 3D space, forming the sparse cloud that is georeferenced using the surveyed GCPs. The georeferenced sparse cloud is subsequently used to compute a digital surface model, either through a dense pointcloud or a mesh interpolation of the sparse cloud (Figure 2a,b). The surface model is finally used for rectifying the georeferenced images.

For each processing step, a multitude of parameters and options are available that affect the results in terms of georeferencing accuracy and orthomosaic quality. While Metashape offers default values for these parameters, the methods described below aim to optimize and alter the standard workflow to obtain high quality and reproducible orthomosaics.

**Figure 2.** Overview of the workflow and reproducibility analysis. The georeferenced sparse point cloud is iteratively filtered until the checkpoint error reaches its minimum. A surface model and orthomosaic can either be retrieved through (**a**) the dense point cloud or (**b**) the creation of a mesh. For the reproducibility analysis (**c**) the whole photogrammetric process is repeated x times, leading to x orthomosaics from the same image source. Pixel-wise correlation analysis between pairs of orthomosaics leads to n correlation layers and n binary layers based on a correlation coefficient threshold of >0.95. The final reproducibility layer is the sum of all binary layers.

#### *2.4. Optimizing the Georeferencing*

Each point in the sparse cloud has four accuracy attributes: the reconstruction accuracy (RA), the reprojection error (RE), the projection accuracy (PA) and the image count [35]. In particular, the RE is suggested as the quality measure of tie points [20,36]. It is the deviation of the positions of identified features in the original image from positions of the same features in the calculated 3D space. The removal of points with a high error and the subsequent optimization of the camera positions can improve the georeferencing. However, by removing too many points in the individual sparse clouds, images no longer align and the checkpoint error increases. An iterative approach is used to find the optimal RE threshold for the dataset by filtering the sparse cloud using different RE threshold values in order to minimize the checkpoint error (Figure 2). This method was applied to each flight of the two multi-temporal datasets.

The initial pointclouds are the direct results from the feature identification and matching algorithms from Metashape. In order to account for the inherited randomness of these processes and the slight differences of the checkpoint error due to the manual alignment of the GCP, the optimization was repeated five times and the standard deviation of the checkpoint error was calculated.
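The iterative threshold search can be sketched generically as below. `checkpoint_error_fn` is a hypothetical callback standing in for one Metashape run (camera re-optimization on the filtered cloud followed by a checkpoint-error computation); it is not part of any real API:

```python
def optimize_re_threshold(tie_points, thresholds, checkpoint_error_fn):
    """Filter a sparse cloud by reprojection error (RE) and return the
    threshold that minimizes the checkpoint error.

    tie_points: iterable of (point_id, reprojection_error) pairs.
    thresholds: candidate RE thresholds, tested from loose to strict.
    checkpoint_error_fn: hypothetical callback returning the checkpoint
        error for the filtered set of tie points.
    Returns (best_threshold, best_error, n_points_kept).
    """
    points = list(tie_points)
    best = None
    for t in sorted(thresholds, reverse=True):
        kept = [pid for pid, re in points if re <= t]
        if not kept:  # filtered too aggressively: images no longer align
            break
        err = checkpoint_error_fn(kept)
        if best is None or err < best[1]:
            best = (t, err, len(kept))
    return best
```

In the study, this kind of loop is run per flight and repeated five times to quantify the spread of the resulting checkpoint error.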

#### *2.5. Orthomosaic Reproducibility*

Since the orthomosaics are the result of complex photogrammetric methods, their reproducibility has to be assessed. In this context, reproducibility is a measure of how identical individual orthomosaics are if they are computed from the same image source and with identical photogrammetric processing parameters. This way, the reproducibility of the photogrammetric process itself is evaluated without the influence of changes in the surveyed environment. For this purpose, a set amount of orthomosaics is computed with identical settings (Figure 2). To quantify the reproducibility, the pixel-wise correlation coefficient of the RGB values is calculated between each pair of the computations (*raster* R package, corLocal function). Pixels with a correlation coefficient of 0.95 or higher are considered identical between two orthomosaics. This leads to a binary layer for all pair-wise correlations marking reproducible and non-reproducible pixels. By summing up the binary layers, regions of high and low reproducibility can be identified (Figure 2c). High values then denote a high level of reproducibility of a pixel.

The more orthomosaics are computed, the more correlation layers can be calculated (Equation (2)) and the more likely a layer is to receive a non-correlating pair of pixels. Therefore, a preliminary test with an arbitrarily high number of 25 orthomosaics (x = 25) was done, which leads to n = 300 correlation layers.

$$n_{\text{correlations}} = \frac{x!}{2! \, (x-2)!}\tag{2}$$
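Equation (2) is simply the number of unordered pairs of x orthomosaics, which Python's standard library exposes directly:

```python
from math import comb

def n_correlation_layers(x):
    """Number of pairwise correlation layers for x orthomosaics,
    Equation (2): x! / (2! * (x - 2)!)."""
    return comb(x, 2)

print(n_correlation_layers(25))  # 300 layers for the preliminary test
print(n_correlation_layers(5))   # 10 layers for the reduced setup
```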

In practice, computing 25 orthomosaics is, in most cases, unreasonable regarding the computation time and processing resources. Therefore, the reproducibility analysis was also done with only 5 identical orthomosaic computations (i.e., 10 pairwise correlations). The comparison of both reproducibility layers revealed that summing up 10 correlation layers (*x* = 5) is sufficient to identify most pixels which are also denoted as non-reproducible when using 300 binary layers. The analysis of the time series and the full forest set were therefore done with only 5 identical orthomosaic computations.
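The pairwise correlation and binary-layer summation described above can be sketched with NumPy as a stand-in for the *raster* corLocal workflow. Correlating the three RGB values per pixel is this illustration's reading of the method, and the array layout is an assumption:

```python
import numpy as np
from itertools import combinations

def pixelwise_correlation(a, b):
    """Pearson correlation of the three RGB values per pixel between two
    orthomosaics of shape (rows, cols, 3)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=2, keepdims=True)
    b = b - b.mean(axis=2, keepdims=True)
    num = (a * b).sum(axis=2)
    den = np.sqrt((a ** 2).sum(axis=2) * (b ** 2).sum(axis=2))
    # Pixels with constant bands have zero variance; treat them as 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def reproducibility_layer(orthomosaics, threshold=0.95):
    """Sum of the binary (correlation >= threshold) layers over all
    pairwise combinations of identically computed orthomosaics."""
    binary = [pixelwise_correlation(x, y) >= threshold
              for x, y in combinations(orthomosaics, 2)]
    return np.sum(binary, axis=0)
```

High values in the returned layer mark pixels that stay stable across repeated computations; low values mark non-reproducible regions.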

In addition, the edges of the orthomosaic are heavily distorted and have a lower positional accuracy due to less image overlap [37]. Therefore, the orthomosaic should be cropped to the central area with a sufficient overlap. In the R package uavRmp provided with this study, this crop mask is automatically generated from spatial polygons defined by the seamlines (i.e., the outline of the individual image parts) of the mosaic. The outermost polygons are identified using a concave hull of the seamlines and are discarded from the orthomosaic.

All computations were done in R (Version 4.0.2; [38]). All presented methods are provided as the R-package uavRmp (https://gisma.github.io/uavRmp/) and the Metashape Python Scripts (https://github.com/envima/MetashapeTools).

#### *2.6. Assessing the Orthorectification Surface*

The standard workflow in Metashape suggests a DSM created from a dense pointcloud as the orthorectification surface (Figure 2a) [16]. In vegetation-free areas, this DSM is mostly equivalent to a digital elevation model (DEM) [39] or digital terrain model (DTM) and therefore suitable for the creation of orthomosaics. In areas with vegetation, the DEM requires the classification of ground points in the dense pointcloud, which is currently not viable in Metashape for structurally rich environments like forests or grasslands in the phases of maturation and flowering. Alternatively, a 2.5D mesh can be created from the sparse cloud on which the images are projected [40]. By smoothing the mesh to eliminate sharp edges, the surface can be regarded as an approximation of a DEM. This approach requires far less computational resources since the creation of a dense cloud is skipped. It is therefore more suitable for low-budget UAS setups. To demonstrate and validate its usage, the mesh surface was compared to the DSM for one of the forest scenes with respect to the reproducibility of the derived orthomosaics using the pixel-wise correlation method described above.

#### *2.7. Time Series Accuracy*

To assess the overall reproducibility of time series, reproducibility masks have been computed for each time step and overlaid to identify pixels that are reproducible over the multi-temporal data and suitable for time series analyses. To differentiate between positional errors from the photogrammetric processing and actual environmental changes between the time steps, identifiable objects and trees were digitized in each individual orthomosaic (7 geometries in the forest, 4 in the grassland). The positional shift of the bounding boxes for each digitized object was calculated between each time step. This provides a more critical assessment of time series than the individual checkpoint errors alone, since relative position differences and environmental changes between the time steps are taken into account.
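The positional shift between time steps can be sketched as the distance between the bounding-box centers of a digitized object; the (xmin, ymin, xmax, ymax) tuple format is an assumption of this illustration:

```python
from math import hypot

def bbox_center(box):
    """Center of an (xmin, ymin, xmax, ymax) bounding box."""
    xmin, ymin, xmax, ymax = box
    return (xmin + xmax) / 2, (ymin + ymax) / 2

def positional_shift(box_t1, box_t2):
    """Horizontal shift (in map units) of a digitized object between two
    time steps, measured between its bounding-box centers."""
    (x1, y1), (x2, y2) = bbox_center(box_t1), bbox_center(box_t2)
    return hypot(x2 - x1, y2 - y1)
```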

#### **3. Results**

#### *3.1. Optimized Georeferencing Accuracy*


To evaluate the optimized georeferencing approach, the sparse clouds were iteratively filtered with decreasing RE thresholds. The sparse clouds of the six forest flights originally included between 560,000 and 680,000 tie points (Figure 3b) after the image alignment with a maximum initial checkpoint error of 3 m. The checkpoint errors were minimized to values between 0.021 and 0.046 m if a RE threshold of 0.4 m was used (Figure 3a). The corresponding pointclouds consisted of about 150,000 tie points. Further reducing the RE threshold to 0.1 m increased the checkpoint error (Figure 3b) due to an insufficient number of tie points for image alignments.

**Figure 3.** (**a**) Checkpoint error in horizontal direction of the six forest flights with different reprojection error thresholds of the pointcloud filters. A reprojection error threshold of 0.4 m (red) led to the optimal checkpoint error of 0.067 m in the horizontal direction. The initial checkpoint error values of the sparse clouds without a camera optimization were 3 m on average. For better visibility, the y-axis uses a log10 scale. (**b**) Number of points in the sparse clouds of the six forest flights with different reprojection error thresholds.

In order to test the robustness of the method, the determined optimal RE threshold of 0.4 m was used to filter the sparse clouds of five identical computations of the six forest flights. The average controlpoint error in the horizontal direction was consistently below 0.02 m in all six flights and deviated less than 0.001 m in each of the five computations. The horizontal checkpoint error was between 0.02 m and 0.06 m over all six flights and deviated less than 0.01 m within the five computations. The error in the vertical direction (Z in Table 2) was up to five times higher; however, the reproducibility in each flight is still stable with a maximum deviation of 0.03 m over the five computations.

In the grassland area, the iterative point cloud filtering only marginally improved the checkpoint error since almost all tie points already had a very low RE of less than 0.4 m. The final checkpoint errors for the years 2013, 2015 and 2017 were 0.29 m, 0.18 m and 0.07 m, respectively. These errors are up to 10 times higher than in the forest time series, which is mostly due to the use of conventional GNSS measurements for the GCP in the grassland compared to the RTK GNSS measurements in the forest. Nevertheless, the five computations of the grassland time series led to very consistent checkpoint errors with standard deviations close to 0 (Table 2).

**Table 2.** Controlpoint and checkpoint error of the five computations of the six forest flights. The images from each flight were computed five times with identical settings. All errors are given in m.

| Flight | Controlpoint XY mean | Controlpoint XY sd | Controlpoint Z mean | Controlpoint Z sd | Checkpoint XY mean | Checkpoint XY sd | Checkpoint Z mean | Checkpoint Z sd |
|---|---|---|---|---|---|---|---|---|
| Forest 01 | 0.0149 | 0.0003 | 0.0207 | 0.0003 | 0.0220 | 0.0002 | 0.0591 | 0.0010 |
| Forest 02 | 0.0082 | 0.0008 | 0.0217 | 0.0006 | 0.0377 | 0.0008 | 0.1989 | 0.0019 |
| Forest 03 | 0.0140 | 0.0001 | 0.0264 | 0.0012 | 0.0565 | 0.0003 | 0.1765 | 0.0030 |
| Forest 04 | 0.0112 | 0.0004 | 0.0122 | 0.0006 | 0.0529 | 0.0034 | 0.0861 | 0.0090 |


#### *3.2. Orthomosaic Reproducibility*

To evaluate the reproducibility of orthomosaics, the images of the 4th forest flight were computed 25 times with identical photogrammetric parameters. The 300 pixel-wise correlation analyses between the 25 orthomosaics were performed within a testing area of 600 by 650 pixels showing the forest canopy (Figure 4a). Pixels with correlation coefficients higher than 0.95 were considered reproducible between the orthomosaics. Highly reproducible pixels are characterized by consistently high correlation coefficients and therefore high values in the summed up layer shown in Figure 4b. Non-reproducible regions appear mainly in forest clearings or at dead trees. The actual canopy appears stable across multiple computations.

Using only five identical orthomosaics (i.e., 10 pairwise correlation analyses, Figure 4c) revealed the same patterns as the 300 correlation layers. Hence, only five repetitions are considered enough for subsequent reproducibility analyses.

**Figure 4.** Pixel-wise correlations of the RGB values of a 600 by 650 pixels area of the canopy (**a**) of identical orthomosaic processings. (**b**) The sum of a binary classification over 25 identical computations, hence 300 pairwise correlations. For the binary classification, a pixel-wise correlation coefficient of 0.95 or greater was used. High values (yellow) denote high reproducibility of the RGB values in this pixel over the 25 images. Low values (blue) indicate non-reproducible orthomosaics since the correlation coefficient between the 25 computations is consistently below 0.95. (**c**) The results of only 5 identical computations, hence the sum of 10 correlation layers.

#### *3.3. Comparison of Mesh and DSM Surface-Based Orthomosaics*

To evaluate to which degree the reproducibility of orthomosaics depends on the underlying mesh or DSM surface in the forest environment, both surfaces were used in otherwise identical computation workflows. Using 5 identical orthomosaic computations, 82% and 85% of the pixels were considered reproducible using the DSM and mesh, respectively. All non-reproducible pixels were found in the forest clearing areas of the images. The surrounding meadow did not differ between the orthomosaics (Figure 5). When only the forested area is considered, the number of reproducible pixels decreased to 69% in the DSM and 74% in the mesh-based processing.
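The percentages of reproducible pixels can be derived from the summed correlation layer, assuming (as we read the study) that a pixel counts as reproducible only when every pairwise coefficient reached 0.95. A small helper under that assumption, with our own naming:

```python
import numpy as np

def percent_reproducible(layer, n_pairs, mask=None):
    """Share (%) of pixels classified as reproducible in every pairwise
    comparison, i.e., whose summed-layer value equals the number of pairs.
    An optional boolean mask restricts the count, e.g., to the forested area."""
    ok = layer == n_pairs
    if mask is not None:
        ok = ok[mask]
    return 100.0 * ok.mean()
```

For 5 computations, `n_pairs` is 10; restricting the mask to the forest polygon reproduces the drop from the whole-scene to the forest-only percentages.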

While being nearly identical in their amounts of reproducible pixels, there is still a large contrast between the orthomosaic reproducibility of the two surfaces. If a pixel in the DSM-based orthomosaics is non-reproducible between two images, there is a high probability that this pixel is non-reproducible in all images (Figure 5). The mesh-based orthomosaics show significantly more pixels that are non-reproducible between only one or two different orthomosaics, but are stable between the other computations.

**Figure 5.** Comparison of DSM and mesh-based orthomosaics in terms of their reproducibility. High values (yellow) denote high reproducibility of the RGB values in this pixel over the 5 orthomosaics. (**a**) RGB orthomosaic of the forested area with the mesh as its orthorectification basis. (**b**) and (**c**) show the sum of the 10 correlation layers of the 5 identical computations using the DSM (**b**) or the mesh (**c**).

#### *3.4. Forest Time Series Reproducibility*

The same canopy part of the orthomosaics as in Figure 4 was used to assess reproducibility along the time series. Figure 6 reveals that non-reproducible areas are mostly consistent between the flights. They concentrate around clearings and around the branches of dead crowns visible in Figure 4a. The actual forest canopy is reproducible. Flight conditions also seem to have an impact on the overall reproducibility: Forest 03 to 06, which were performed in cloudy conditions, show fewer deviations between computations than Forest 01 and 02, where cloud-free conditions and low solar elevations were present.

Summing up all the correlation layers of the six flights (Figure 7) leads to a quality mask for the whole time series. This confirms that the canopy region is reproducible and stable even across the time series.
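Combining the per-flight layers into a time-series quality mask can be sketched as follows; a minimal illustration (our naming), assuming each layer is the summed binary classification described above:

```python
import numpy as np

def time_series_quality_mask(layers, n_pairs):
    """Combine per-flight reproducibility layers into one boolean mask that
    is True only where every flight was reproducible in all pairwise
    comparisons (layer value equal to the number of pairs)."""
    stack = np.stack(layers)
    return (stack == n_pairs).all(axis=0)
```

Applied to the six forest flights, the resulting mask marks the canopy regions that are safe to analyze across the whole series.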


#### *3.5. Grassland Time Series Reproducibility*

In the grassland time series, the reproducibility of each orthomosaic was also tested with five identical computations. Only 1% of the pixels in the grassland area deviated between the computations of each year. In 2013 and 2015, the non-reproducible areas occur mainly around a micrometeorological station in the middle of the meadow (Figure 8). In 2017, some singular pixels also deviated in the meadow areas.

**Figure 6.** Comparison of the orthomosaic reproducibility of the six flights. High values (yellow) denote high reproducibility of the RGB values in this pixel over the 5 orthomosaics. Low values (blue) indicate non-reproducible pixels since the 10 pairwise correlation coefficients between the 5 computations are consistently below 0.95.

**Figure 7.** Combination of the orthomosaic reproducibility of the six forest flights.


**Figure 8.** Inter-annual time series of the grassland area. In red are pixels which had a correlation coefficient lower than 0.95 in one of the 10 pairwise correlation layers of 5 identical orthomosaic computations.

#### *3.6. Time Series Positional Accuracy*

To further assess the validity of UAS time series, the positional shifts of 7 digitized tree crowns in the forest and of four visible objects in the grassland were calculated. Tree crowns moved by 0.3 m on average, with a maximum shift of 0.75 m of one tree between forest flights 02 and 03. During the image acquisition of these two flights, the lighting conditions changed due to the presence of clouds and changing wind speeds. In Figure 9, the positional shift of 0.3 m to the left of the marked tree is visible, as well as slight differences in the geometry of the crown due to the different lighting conditions and wind.

The solar panel visible in Figure 6 is one of four objects which were digitized to measure the positional accuracy of the grassland time series. Between the individual time steps, the polygons differed by 0.03 m in their position on average. The largest deviation occurred between the orthomosaics of 2013 and 2017, with a maximum shift of 0.07 m for one object. Hence, environmental changes in the grassland have less impact on the time series.
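The positional shift between two digitizations of the same object reduces to a centroid distance. A self-contained sketch (the study does not publish this step; the helper names are ours), using the standard area-weighted polygon centroid:

```python
import math

def polygon_centroid(pts):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]
    vertices in map coordinates (shoelace formula)."""
    a = cx = cy = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return cx / (6 * a), cy / (6 * a)

def positional_shift(poly_t0, poly_t1):
    """Euclidean distance between the centroids of the same digitized
    object in two time steps (in map units, e.g., metres)."""
    x0, y0 = polygon_centroid(poly_t0)
    x1, y1 = polygon_centroid(poly_t1)
    return math.hypot(x1 - x0, y1 - y0)
```

Averaging `positional_shift` over the digitized tree crowns or grassland objects yields the per-pair deviations reported above.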


**Figure 9.** Example of the positional change of a tree between the forest flights 01 and 05.


#### **4. Discussion**

The increasing use of UAS imagery in science and science-related services demands operational processing and reliable validation techniques in the commonly used photogrammetric workflows. This study introduces two optimizations to the conventional photogrammetric workflow: (1) a new optimization for the georeferencing workflow and (2) a novel technique to evaluate the repeatability of photogrammetrically retrieved orthomosaics. The application of these methods demonstrated that accurately referenced UAS orthomosaic time series can be acquired with low-cost UAVs and RGB cameras for both forest and grassland environments. The reproducibility of the orthomosaics was highly dependent on the vegetation structure of the survey area.

#### *4.1. Optimized Georeferencing Accuracy*

The determination of optimal tie point filters leads to positional precisions of less than 6 cm in forested areas. Given the GSD of 2.58 cm/px, the resulting orthomosaics have a positional error of up to three pixels. This error is stable over multiple computations and different sets of images from the six flights over the forest. This suggests that the iterative filtering approach leads to robust RE thresholds and only needs to be computed once.
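The iterative tie-point filtering runs inside the photogrammetric software; schematically, though, it is a search over reprojection-error (RE) thresholds scored by checkpoint error. In the sketch below, `evaluate` is a hypothetical stand-in for re-running the bundle adjustment on the filtered tie points and measuring the checkpoint error; none of the names come from the study's code.

```python
def best_re_threshold(tie_points, evaluate, thresholds=(1.0, 0.8, 0.6, 0.4)):
    """For each candidate RE threshold, drop tie points whose reprojection
    error exceeds it and score the remaining block with a user-supplied
    checkpoint-error function; return (threshold, error) with lowest error."""
    best = None
    for t in thresholds:
        kept = [p for p in tie_points if p["re"] <= t]
        err = evaluate(kept)  # e.g., RMSE at independent checkpoints
        if best is None or err < best[1]:
            best = (t, err)
    return best
```

Because the winning threshold proved stable across the six flights, this search only has to be performed once per survey setup.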

The difference of 0.04 m in the checkpoint error between the six flights could come from different GNSS satellite constellations or cloud conditions [41] over the three hours the flights took place, but, most likely, these small differences come from slight inaccuracies during the manual alignment of the GCP. This suggests that the operational workflow consistently leads to viable orthomosaics with resolutions of less than 10 cm, which is more than sufficient for detailed spatio-temporal structural analysis of forests [9,10]. In Belmonte et al. [8], a checkpoint error of 1.4 m and a GSD of 15 cm led to validated object-based analysis even in moderately dense canopies. The accuracy in the experimental forest areas even keeps up with the checkpoint errors in the grassland time series (between 0.04 m and 0.08 m). This also compares very well with other studies in structurally sparse landscapes where checkpoint errors tend to be very low [33,42].

The grassland time series further demonstrates that the proposed methods of optimization and validation work outside of the experimental setup. Differences in flight planning, low quality GNSS measurements at the GCP and the usage of different cameras still led to consistent and accurate orthomosaics. Hence, the provided workflow can be used as a fully operational method in grassland and agricultural contexts.

#### *4.2. Pixel-Wise Reproducibility of Orthomosaics*

The pixel-wise correlation of identically computed RGB orthomosaics leads to a quantitative measurement of reproducibility. This is a necessary addition to assess the photogrammetric processing of images, especially considering the "black box" nature of non-open-source software like Metashape. With the pixel-wise approach, deviations between computations are assigned to certain spatial regions of the orthomosaic. The mesh and DSM as orthorectification surfaces in the forest time series showed similar amounts of reproducible pixels (DSM: 69%, mesh: 74%). However, the mesh is considered superior since it leads to better reproducibility in canopy areas. The calculation of the mesh is also less time-consuming than the computation of the DSM. Both digital surfaces failed to reproduce fine structures like single tree branches or forest gaps. This can be problematic, since these structures are most likely the ones researchers aim to observe with UAS imagery [11,16,43].

The results also suggest that non-reproducibility can be tracked down to uncertainties in the initial step of the photogrammetric process, the feature identification and feature matching of the individual images [31]. These uncertainties increase with the presence of fine structures in the images since they are prone to move even under light wind conditions. It is therefore more likely that their position changes in consecutive images. In particular, Döpper et al. (2020) [44] recently demonstrated that acquiring UAV data for forest, grassland and crop environments in low-light conditions, such as a low solar elevation angle or high cloud cover, causes problems for feature matching in the image alignment process.

Although this study declares these areas as not reproducible, the structures are still apparent in the orthomosaics. Image analysis methods (e.g., an object-based classification) of these areas might still lead to viable results and consistent geometries. This should be investigated in subsequent studies.

#### *4.3. Time Series*

The combination of multiple reproducibility layers enables the validation of UAS-derived orthomosaic time series. The high reproducibility of multi-temporal grassland orthomosaics confirms the valid analysis of vegetation dynamics in grassland and agricultural studies. Time series of forested areas are also possible; however, non-reproducible regions have to be considered.

The checkpoint error of each orthomosaic in the time series alone gives no insight into the positional relation between the individual time steps. Image acquisition with the UAV, analysis tools (processing software), field experiment designs and environmental conditions have a strong impact on the geometric accuracy of photogrammetric products [6,45,46]. Hence, it is essential to quantify the geometric accuracy of aerial imagery when combining UAS data from different flights, dates and sources [47]. We suggest the addition of geometry-based deviations from digitized objects. In the case of the grassland time series, a maximum positional shift of 0.07 m between the time steps is tolerable for most use cases, such as the modelling of the temporal dynamics of biophysical and biochemical variables of the meadow canopy or even the analysis of the variability in size and distribution of vegetation patterns (Lobo et al. in prep). This error also lies in the range of the individual checkpoint errors (0.04 to 0.08 m). In the forest time series, similar accuracies were achieved in the surrounding meadow areas. However, the canopy showed positional deviations of up to 0.5 m in digitized trees. A proportion of this error comes from the actual movement of the canopy due to wind and changes in the lighting conditions [44]. The non-reproducibility of some parts of the canopy, especially at forest clearings, may also contribute to this error. We suggest object-based analysis instead of pixel-based approaches when high-resolution forest time series are considered.

#### *4.4. Improved UAS Workflow*

The suggested methods of checkpoint error optimization and reproducibility validation complement the general UAS workflow. In order to make these methods more accessible to users, we provide a Python module—MetashapeTools (https://github.com/envima/MetashapeTools)—which utilizes the Metashape API for an improved photogrammetric workflow. The orthomosaic processing in the form of a script-based workflow ensures the documented parameterization of all the modules in Agisoft Metashape. The workflow is therefore shareable and can be easily integrated into a version control system, making UAS research more transparent. Apart from the manual alignment of the GCP, the photogrammetric process is fully automated. The default parameters in MetashapeTools are the results of the experimental forest flights and a starting point for a multitude of flight areas. The script-based framework provides flexibility to alter different parts of the workflow and, e.g., integrate alternative processing steps for time series like in Cook et al. [48].

In the future, the general workflow should utilize only open-source software. Currently, Agisoft Metashape is the de facto standard and the most promising software in affordable UAS image processing [3,13]. The development of open-source photogrammetry projects like OpenDroneMap is promising, and they will be integrated once they are fully operational. The transition from proprietary software towards open and transparent workflows is an ongoing trend worth supporting in spatial analyses [49]. For now, publications utilizing Metashape or other "black box" software should at least include the checkpoint error and the full parameterization of the processing modules. Ideally, the parameterization can be provided as a script, e.g., as a supplementary material or published in a repository. Although the computation of the reproducibility layer can be intensive, its inclusion in studies provides the necessary transparency about the quality and interpretation of the orthomosaics. The documented and evaluated orthomosaics are a major contribution to environmental mapping and monitoring systems [50].
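Publishing the full parameterization alongside an orthomosaic requires little more than serializing the processing settings. A generic, hypothetical helper (independent of any particular software's API; the parameter names in the usage are invented for illustration):

```python
import hashlib
import json

def record_parameters(params, path):
    """Write the complete processing parameterization to a JSON sidecar file
    and return a short content hash that can be cited with the orthomosaic."""
    text = json.dumps(params, indent=2, sort_keys=True)
    with open(path, "w") as f:
        f.write(text)
    return hashlib.sha256(text.encode()).hexdigest()[:12]
```

A call such as `record_parameters({"alignment": {"downscale": 1}, "orthomosaic": {"surface": "mesh"}}, "flight04_params.json")` produces both the shareable sidecar file and a stable identifier for the exact configuration used.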

#### **5. Conclusions**

The rising popularity of UAS imagery in all fields of spatial research has led to a variety of processing approaches. The supposed ease of use and low cost of ready-to-fly UAS opened up some pitfalls in image acquisition and processing which this study addressed. The evaluation of the orthomosaic accuracy focused on the reproducibility of the final product. The presented optimization of the georeferencing accuracy based on the checkpoint error and the quantification of the orthomosaic reproducibility enhance the UAS workflow with the necessary quality assessment. This complements the standardized acquisition of high-quality UAS time series.

In forest environments, there are still some shortcomings of UAS orthomosaic reproducibility that quantitative analyses need to consider. In grassland environments, these issues are marginal, which supports the validity of UAS in agricultural applications. The novel approaches of this study and their incorporation into a workflow are promising for validated and transparent UAS research.

#### **Supplementary Materials:** https://github.com/envima/UASreproducibility.

**Author Contributions:** M.L. analyzed data, contributed to the study design, developed the methodology, and was the main writer of the manuscript; C.M.R. analyzed data (grassland), and contributed to the manuscript; N.F. contributed to the study design (forest), and reviewed the manuscript; T.L.K. contributed to the study design, and reviewed the manuscript; S.R. provided methods; S.S. performed the flights (forest), and provided methods; L.W. provided methods; A.L. contributed to the study design (grassland), analyzed data, reviewed the manuscript, and supervised the project; M.-T.S. contributed to the grassland study design, reviewed the manuscript, and supervised the project; C.R. provided methods, contributed to the study design, contributed to the manuscript, and supervised the project; T.N. contributed to the study design, reviewed the manuscript, and supervised the project; All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Hessian State Ministry for Higher Education, Research and the Arts, Germany, as part of the LOEWE priority project Nature 4.0—Sensing Biodiversity. The grassland study was funded by the Spanish Science Foundation FECYT-MINECO through the BIOGEI (GL2013- 49142-C2-1-R) and IMAGINE (CGL2017-85490-R) projects, and by the University of Lleida; and supported by a FI Fellowship to C.M.R. (2019 FI\_B 01167) by the Catalan Government.

**Acknowledgments:** Thanks to Wilfried Wagner for letting us use his meadow as a landing site.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

Flight mission planning is the basis for all UAS-derived orthomosaics and therefore crucial for high-quality and reproducible image processing. The planning requires the consideration of hardware limitations like UAS speed or the image sampling rate of the camera, as well as the desired ground sampling distance. Further, individual images need to overlap sufficiently in order to process the orthomosaics.

The provided R-package uavRmp strives for the automated and reproducible creation of flight tracks. The package helps users by suggesting image sampling rates and UAS speeds given the camera parameters and the required overlap and GSD. Rectangular study areas can be planned directly in R. Furthermore, uavRmp provides a high-resolution surface-following mode if a digital elevation model is provided. This makes it possible to follow detailed structures like forest canopies and areas with steep terrain. The camera is also oriented in a fixed direction for the whole mission. The flight is automatically split into multiple MAVLink protocols according to a provided battery lifetime, including a safety buffer for proper operations.
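The relationship between overlap, GSD and camera triggering reduces to simple geometry: the along-track footprint is the image height in pixels times the GSD, and consecutive exposures may advance by the non-overlapping fraction of that footprint. An illustrative helper (our own, not the uavRmp implementation):

```python
def trigger_interval(gsd_m, image_h_px, forward_overlap, speed_ms):
    """Seconds between exposures that achieve the requested forward overlap.
    gsd_m: ground sampling distance in m/px; image_h_px: image height in px
    (along-track); forward_overlap: fraction in [0, 1); speed_ms: UAS ground
    speed in m/s."""
    footprint = image_h_px * gsd_m          # along-track ground coverage (m)
    spacing = footprint * (1.0 - forward_overlap)  # distance between shots
    return spacing / speed_ms
```

For example, a 4000 px image height at 2.5 cm/px GSD covers 100 m along track; at 80% forward overlap and 5 m/s the camera must fire every 4 s, which must not exceed the camera's maximum sampling rate.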

#### **Appendix B**


**Table A1.** Details about the cameras and settings.

#### **References**


50. Haase, P.; Tonkin, J.D.; Stoll, S.; Burkhard, B.; Frenzel, M.; Geijzendorffer, I.R.; Häuser, C.; Klotz, S.; Kühn, I.; McDowell, W.H.; et al. The next generation of site-based long-term ecological monitoring: Linking essential biodiversity variables and ecosystem integrity. *Sci. Total Environ.* **2018**, *613–614*, 1376–1384. [CrossRef]

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **3D Reconstruction of Power Lines Using UAV Images to Monitor Corridor Clearance**

**Elżbieta Pastucha 1,2, Edyta Puniach 1,\*, Agnieszka Ścisłowicz <sup>1</sup>, Paweł Ćwiąkała <sup>1</sup>, Witold Niewiem <sup>1</sup> and Paweł Wiącek 1,3**


Received: 1 October 2020; Accepted: 10 November 2020; Published: 11 November 2020

**Abstract:** Regular power line inspections are essential to ensure the reliability of electricity supply. The inspections of overground power transmission lines include corridor clearance monitoring and fault identification. The power line corridor is a three-dimensional space around power cables defined by a set distance. Any obstacles breaching this space should be detected, as they potentially threaten the safety of the infrastructure. Corridor clearance monitoring is usually performed either by a labor-intensive total station survey (TS), terrestrial laser scanning (TLS), or expensive airborne laser scanning (ALS) from a plane or a helicopter. This paper proposes a method that uses unmanned aerial vehicle (UAV) images to monitor corridor clearance. To maintain the adequate accuracy of the relative position of wires with regard to surrounding obstacles, the same data were used both to reconstruct a point cloud representation of a digital surface model (DSM) and a 3D power line. The proposed algorithm detects power lines in a series of images using decorrelation stretch for initial image processing, the modified Prewitt filter for edge enhancement, random sample consensus (RANSAC) with additional parameters for line fitting, and epipolar geometry for 3D reconstruction. DSM points intruding into the corridor are then detected by calculating the spatial distance between a reconstructed power line and the DSM point cloud representation. Problematic objects are localized by segmenting points into voxels and subsequent clusterization. The processing results were compared to the results of two verification methods—TS and TLS. The comparison shows that the proposed method can be used to survey power lines with an accuracy consistent with that of classical measurements.

**Keywords:** unmanned aerial vehicles; power lines; image-based reconstruction; 3D reconstruction

#### **1. Introduction**

Power lines are a typical part of urban and rural landscapes. Due to the need for power, national and regional networks cover most of the world and continue to expand. They require regular monitoring and maintenance work. Monitoring power lines features two aspects: power line components and occlusions of the line corridor. Both are important and interconnected and are thus often addressed simultaneously.

The power line corridor is a 3D buffer around the wires and is defined by the set distance from the wires. It is thus necessary to find the precise position of the wires [1]. Regular inspections of vegetation inside and near the power line corridor are needed to identify trees or branches that need to be cut due to safety concerns, as the direct proximity of trees in a line corridor might trigger, for example, a bushfire. There are monitoring methods that are used to identify branches or canopies that endanger the inviolability of the power line corridor [2] or detect and classify trees to help evaluate the impact on the line [3]. Other techniques focus on volumetric analysis in order to evaluate the impact of the vegetation and its progression by calculating a differential map of a digital surface model (DSM) for two epochs [4].
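Once the wire geometry is known, the corridor check reduces to a point-to-line distance test. A deliberately simplified sketch (our naming; real implementations, including the method described in this paper, measure distance to the fitted catenary rather than to sampled vertices): the wire is represented as a densely sampled 3D polyline and DSM points closer than the clearance are flagged.

```python
import numpy as np

def corridor_intrusions(dsm_points, wire_polyline, clearance):
    """Return the DSM points whose distance to the reconstructed wire is
    below the corridor clearance. dsm_points: (M, 3) array; wire_polyline:
    (N, 3) array of densely sampled wire vertices; clearance in metres."""
    # distance of every DSM point to its nearest wire vertex
    d = np.linalg.norm(
        dsm_points[:, None, :] - wire_polyline[None, :, :], axis=2
    ).min(axis=1)
    return dsm_points[d < clearance]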

A range of techniques have been implemented to solve the above problem. They vary from mundane and time-consuming methods of classical surveying to technologically advanced and highly expensive ones. Among the most popular inventory methods for inspecting power lines are airborne laser scanning [5–8] and mobile terrestrial scanning [9,10]. The dense point clouds generated by laser scanning can be used to form models of 3D power lines, in the context of surrounding vegetation, and a survey network. The typical workflow features classifying point clouds, creating the digital terrain model (DTM), and 3D line modeling [11]. Algorithms based on point position, the intensity of response, multiple echoes, and 2D projections enable automated data processing [6,12,13]. Although Light Detection and Ranging (LiDAR) is an effective and robust method, it has some drawbacks. Some, such as problems with suitable weather conditions, are partially shared with passive methods. Additional drawbacks include problems with identifying towers and simply the cost of the equipment, survey, and processing [14]. Some research has also dealt with a combination of unmanned aerial vehicle (UAV) technology and LiDAR for surveying power lines [5,15,16]. The availability of lighter LiDAR scanners and developments in the UAV platforms have contributed to improving efficiency. Although this technology has potential for further development, the cost of a device in combination with the high risk of failure decreases its economic efficiency.

Many applications use optical images and computer vision systems [17]. Satellite, airborne, and UAV images have been employed [18,19]. Satellite images, owing to their low resolution, are limited to providing generalized information on terrain and vegetation [20]. Aerial images rely significantly on manual stereo measurements [19]. UAV-based optical images can provide accurate and high-resolution data [21], and their use with a range of stereo-matching algorithms is a promising solution. Attempts have been made to consider photogrammetry as a source of the point cloud and to analyze and filter the data similarly to the procedure in LiDAR [22]. Automatic software for dense matching, where the geometry of power lines is reconstructed, could be a fast and convenient solution. However, in addition to the general drawbacks of optical images, such as sensitivity to changes in lighting conditions and atmospheric influence, the low radiometric contrast between lines and the background makes dense matching even less effective. The lines usually occupy a small part of a photo; otherwise, the time needed for the survey, the size of the survey data, and the processing time increase. Furthermore, the complexity and variability of the background are an obstacle for 3D line reconstruction [23]. The aforementioned reasons, together with the tendency of outlier filtering to remove such thin structures, might preclude the operation of dense matching algorithms. For these reasons, this solution is not feasible.

Other solutions have been proposed for clearance monitoring using aerial images, but methods that use UAVs as the main source of data are becoming increasingly popular. The biggest advantages to using UAVs are their low altitude of flight and the flexibility and economy of the method in comparison with airborne photogrammetry [24,25]. Both multi-rotors and fixed wings are used for this task. The former constructions are especially useful for precise surveys at a low altitude, but the latter approaches are more efficient and have a greater fly range [18].

Considerable research has been dedicated to power line monitoring using UAVs. Most of it has focused solely on wire detection in images [26–34], but a few studies have taken a more holistic approach by considering not only the position of the power line, but also the line corridor and obstacles. The means of data processing and 3D power line reconstruction differ and depend on the aim of the calculation. Such an approach was proposed in two papers [26,35]. One focused on dense matching algorithms and the automation of obstacle detection, whereas power line reconstruction was performed manually, aided by epipolar images [23]. The other study implemented the results of past work together with fully automated line detection algorithms based on images [35]. This method is based on changes in the gradient combined with the high gray response of the power line and assumes multiple thresholds. This solution provides good results, although it has yet to be proven to work with more versatile data. Similarly, research on epipolar imagery was presented in another study [36], where a real-time system was developed for obstacle avoidance by UAVs as they monitored power lines. To calculate the relative 3D position of the power lines, several steps were implemented, including the TopHat transform and the cross-based arbitrary shape support region method, to create a depth map. The study assumed that the background was not complex, and featured either the sky or some treetops. Another holistic approach used semantic segmentation based on fully convolutional neural networks to enhance depth maps and accurately reconstruct linear objects [37]. The research also used enhanced dense matching to detect obstacles in the power line corridor. Neural networks were also used successfully for line segmentation in another study [38]. The effectiveness of these algorithms is at least 80%. However, their major drawback is the demand for a vast learning dataset [3]. Additionally, there are hardly any cases in the literature of a comprehensive methodology for detecting and reconstructing power lines using neural networks [39]. Additional research on the 3D reconstruction of power lines was presented in two papers by the same authors [40,41]. In both, the lines are initially detected from epipolar images using a simple extraction template. Three-dimensional reconstruction is performed differently in each, however. One introduces a 3D grid based on the expected ground sampling distance (GSD) and the positions of the utility poles [40]. The grid is then reprojected on images to validate the detected power lines and establish their relative correspondence.
In the second paper, all combinations of wires detected in both images in a stereopair are considered and then reprojected on a third image to validate the choice [41].

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 3 of 31


This paper aims to develop a comprehensive and robust method for occlusion monitoring in the power line corridor (Figure 1). The main goals are to minimize the time needed to perform the survey and the user input in subsequent processing. The same dataset was thus used to reconstruct power lines and acquire a point cloud representation of a DSM for a possible occlusion check. However, images suitable for the creation of a dense point cloud representation of the terrain usually record power lines as barely distinguishable objects that are only a few pixels wide. Therefore, additional processing is required to successfully extract and model power lines. The remainder of this paper is structured as follows: Section 2 describes the proposed method, including the data acquisition process, initial data processing, 2D image processing, 3D reconstruction of power lines, and obstacle detection. The datasets acquired to create and test the proposed method are also presented in Section 2. Section 3 describes the results of UAV image processing along with the assessment of their accuracy. A discussion of errors and their possible sources is in Section 4. Section 5 offers the conclusions of this paper. In Appendix A, the additional results of a threshold sensitivity analysis are included.

**Figure 1.** A visualization of the goal of the research, i.e., the results of the 3D reconstruction of power lines (source: FlyTech unmanned aerial vehicle (UAV) test flights)—power line over the UAV-derived point cloud representation of the digital surface model (DSM).


#### **2. Materials and Methods**

The proposed method features the following steps (Figure 2):


**Figure 2.** A general schema of the proposed method.

All the steps are described in detail below.

#### *2.1. Data Acquisition*

Appropriate data acquisition can enable the detection of power lines and is as important as the methods for the subsequent processing of the data. Some requirements need to be satisfied to ensure the reliable operation of the algorithm. While maintaining the highest efficiency, the measurement requirements of the data should be universal enough to allow for the use of different cameras and UAVs.

First, it is important to define image quality requirements. In this case, the most significant parameter is resolution as defined by the GSD. The maximal GSD to ensure that the wires are detected must be smaller than their diameters. However, to increase the efficiency of detection, the GSD should be half the diameter. Another important aspect is the camera's exposure settings, which must ensure that the wires can be distinguished from the background (Table 1). Owing to the small size of the wires, the ISO setting should be as small as possible. To avoid blurred images, the shutter speed should be adjusted to UAV flight speed, and should not be higher than the ratio of the GSD to the speed of the UAV. If the camera settings allow for the disabling of the low-pass filter, this should be done. Moreover, for the high accuracy of the end product, the use of a global shutter camera would be advised.
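The two numeric guidelines above (GSD at half the wire diameter, and exposure time no longer than GSD divided by flight speed) can be sketched as a small calculation; the function names and example values below are illustrative, not taken from the paper:

```python
def required_gsd(wire_diameter_m: float) -> float:
    """GSD of half the wire diameter, per the detection-efficiency guideline."""
    return wire_diameter_m / 2.0

def max_exposure_time(gsd_m: float, uav_speed_ms: float) -> float:
    """Longest exposure (s) before motion blur during one exposure exceeds one GSD."""
    return gsd_m / uav_speed_ms

# e.g., a 10 mm conductor surveyed at 5 m/s:
gsd = required_gsd(0.010)           # 0.005 m (5 mm per pixel)
t = max_exposure_time(gsd, 5.0)     # 0.001 s, i.e., a shutter of 1/1000 s or faster
```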



Secondly, data acquisition must ensure an appropriate data structure for the algorithm. Because a major function of the algorithm is to transfer detection between stereopairs, the flight plan has to ensure the visibility of the wires for all subsequent images between poles. To meet this requirement, the flight path must be linear, parallel to the power line, and consist of at least two rows, placed on opposite sides of the surveyed corridor (Figure 3). Another advantage of a linear flight plan is the possibility of measuring power lines over distances ranging from a few (multi-rotor) to dozens of kilometers (fixed wing) in one flight.

**Figure 3.** A model flight plan against point cloud representation of the terrain. Power lines in red, opposite stripes (left and right) in pink and blue.


Third, image overlap and the field of view should be carefully planned. The side overlap should allow for the wires to be visible in both neighboring strips. The front overlap should not be lower than 50% (at the height of the wires, overlap on the ground is higher) to ensure the correct transfer of detection between stereopairs, but this should also not exceed the minimum duration of the acquisition of the camera. The front overlap should thus be adjusted according to flight speed and altitude.
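The front-overlap condition can be checked with a simple footprint calculation; the sensor size, focal length, and flight parameters below are hypothetical examples, not values from the paper:

```python
def footprint_along_track(sensor_mm: float, focal_mm: float, distance_m: float) -> float:
    """Along-track coverage of one image at the given object distance (pinhole model)."""
    return distance_m * sensor_mm / focal_mm

def front_overlap(footprint_m: float, speed_ms: float, interval_s: float) -> float:
    """Fractional forward overlap for a given flight speed and camera trigger interval."""
    base = speed_ms * interval_s  # distance flown between exposures
    return 1.0 - base / footprint_m

# e.g., a 24 mm lens on a 23.5 mm-wide sensor, wires 20 m below the camera,
# flying at 5 m/s with a 2 s trigger interval:
fp = footprint_along_track(23.5, 24.0, 20.0)  # ~19.6 m at wire height
p = front_overlap(fp, 5.0, 2.0)               # ~0.49, just under the 50% minimum
```

Because the wires are closer to the camera than the ground, the overlap computed at wire height is the binding constraint, which matches the remark above that overlap on the ground is higher.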

The last thing to take into consideration while planning a survey is the global accuracy of the resultant product. Photogrammetric blocks, in the case of corridor mapping, have highly unstable geometry. To maintain high accuracy and prevent errors that could occur in self-calibration due to the potentially high correlation between interior and exterior camera orientation, additional measures have to be introduced. A network of ground control points (GCPs) can be introduced to stabilize the block [43]. However, this might not be feasible in remote terrain, and it also increases the time and cost of the survey. Alternatively, equipping the UAV with a global navigation satellite system (GNSS) post-processing kinematic (PPK) receiver to achieve centimeter accuracy of the exterior orientation elements (EOE) can mitigate the problem [44], although the PPK receiver increases the overall cost of the platform. To ensure correct results, one of these measures must be introduced.

#### *2.2. Bundle Adjustment and Data Ordering*

The Agisoft Metashape Professional (version 1.6.0 build 9128) software was used to estimate both the interior orientation elements (IOE) and the exterior orientation elements (EOE) for the obtained UAV images. All images collected during the mission were processed together. The processing, which included automatic bundle adjustment with self-calibration, was mostly automated. A typical set of parameters was estimated during self-calibration [45], utilizing the Brown distortion model [46]:
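For reference, the Brown model in its commonly used form (a generic statement shown here; the exact parameter set estimated in Metashape's self-calibration may differ) corrects normalized image coordinates $(x, y)$ with radial coefficients $k_1 \ldots k_3$ and tangential coefficients $p_1, p_2$:

```latex
x' = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1\,(r^2 + 2x^2) + 2\,p_2\,x y
y' = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_2\,(r^2 + 2y^2) + 2\,p_1\,x y,
\qquad r^2 = x^2 + y^2
```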


The global accuracy was ensured either by the use of GCPs or precise EOE (GNSS PPK processing). Ultimately, the resulting data contained undistorted images (achieved using the calculated parameters of calibration), along with their corresponding EOE and calibrated focal length.

The data ordering process, as shown in Figure 4, involved first automatically assigning the images to flight lines (strips) and then assigning stereopairs, each consisting of the two closest images from opposite strips (Figure 3), which were subjected to further processing. A set of ordered stereopairs is crucial for the seamless operation of the algorithm. It was necessary to process the images in a defined order (the direction of processing). Therefore, stereopairs were selected and sorted in ascending order along the direction of processing. This direction was defined by the ordered coordinates of the poles acquired through Agisoft Metashape (or obtained from the operator of the power line).
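The pairing step can be sketched as a nearest-neighbor search across strips; the image-tuple format `(name, x, y)` below is a hypothetical simplification of the EOE records:

```python
import math

def make_stereopairs(left_strip, right_strip):
    """Pair each image in one strip with the closest image from the opposite
    strip, by horizontal distance between camera positions."""
    pairs = []
    for name_l, xl, yl in left_strip:
        closest = min(right_strip, key=lambda im: math.hypot(im[1] - xl, im[2] - yl))
        pairs.append((name_l, closest[0]))
    return pairs

left = [("L1", 0.0, 0.0), ("L2", 0.0, 10.0)]
right = [("R1", 5.0, 1.0), ("R2", 5.0, 11.0)]
make_stereopairs(left, right)  # [('L1', 'R1'), ('L2', 'R2')]
```

Sorting the resulting pairs by position along the pole line would then give the direction of processing described above.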

For the seamless operation of the algorithm, a power line fragment targeted in a UAV flight survey mission was divided into survey sections that were subjected to further processing. The section was defined by a starting utility pole, transfer utility poles, and an ending utility pole. One survey section might have consisted of several power line spans (Figure 5). The order of the poles in the survey section determined the direction of subsequent image processing, and thus had a crucial effect on the entire process of power line detection.

**Figure 4.** The chart of the first step of algorithm processing. This step of the algorithm is aimed at data ordering for further processing. The survey data (survey exterior orientation elements (EOE)) and the data acquired by means of Agisoft Metashape Professional (aligned EOE, interior orientation elements (IOE), undistorted images, and coordinates of tops of utility poles) are used.

**Figure 5.** Sketch of basic elements of a survey section.

Defining the survey sections for image processing was an important stage in data ordering. Depending on the type of power line, they were defined differently. For high-voltage lines, one span was defined as one survey section. For medium-voltage lines, the survey section included rectilinear sections of the power line. To divide the survey section, each image containing a pole was analyzed. Three-dimensional lines connecting a pole, captured within the image, and its neighbors were reprojected on the image. When the in-line direction change was greater than 1°, the pole captured within the image was assumed to mark the end of the given survey section and the start of a new one. Data thus prepared and ordered were then subjected to further processing.
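The 1° splitting rule can be sketched directly from ordered pole coordinates; the function below is an illustrative reading of that rule, not the paper's implementation:

```python
import math

def split_sections(poles, max_turn_deg=1.0):
    """Split an ordered list of pole (x, y) coordinates into survey sections
    wherever the in-line direction changes by more than max_turn_deg."""
    sections, current = [], [poles[0], poles[1]]
    for prev, mid, nxt in zip(poles, poles[1:], poles[2:]):
        a1 = math.atan2(mid[1] - prev[1], mid[0] - prev[0])
        a2 = math.atan2(nxt[1] - mid[1], nxt[0] - mid[0])
        if abs(math.degrees(a2 - a1)) > max_turn_deg:
            sections.append(current)   # the turning pole ends this section...
            current = [mid]            # ...and starts the next one
        current.append(nxt)
    sections.append(current)
    return sections

poles = [(0, 0), (10, 0), (20, 0.01), (30, 5)]  # ~0.06° turn, then ~26° turn
split_sections(poles)  # two sections, split at pole (20, 0.01)
```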


#### *2.3. Power Line Detection in Images*

Detection and reconstruction were performed separately, using dedicated algorithms. The process consists of multiple subsequent steps and was written in the Python programming language.

For effective automation, it was necessary to detect the power lines continuously through a sequence of images. The simplest and most accurate way to transfer detection between neighboring photographs is through 3D space, where the relevant geometry was reconstructed and then projected onto the next image. Such reconstruction was possible by using two across-track neighboring images. Thus, the processing unit in the detection algorithm was a single stereopair. Detection was then performed separately on both images while keeping track of the respective wires. Both images were then used to reconstruct the 3D position of the given power line.

Due to the wire's relative position in the image and its sag, the wire was almost never recorded as a straight line in an image. The discrepancies between the fitted line and the empirically captured position of each wire varied from two to dozens of pixels. Moreover, due to varying backgrounds, neither the color of the wire nor the contrast in the image remained constant. To accommodate change in both position and contrast, a local, constantly adapting approach was chosen. For each wire, the image was divided into small, ordered sections within which the power line could be approximated by a straight line. To define the position and order of the processing windows, past information about the approximate positions and directions of the power lines was needed. This was acquired either by a manual initialization procedure in the image at the beginning of the survey section or by using projected points from the preceding stereopair.

The process can be divided into several steps (Figure 6) that vary depending on the processed image: detection (Case 1—images capturing the survey sections from the starting utility pole) or continued detection (Case 2—all remaining images).

**Figure 6.** Outline of power line detection process.


#### 2.3.1. Approximation of Position and Direction of Power Line

To approximate the position and direction of each power line, two approaches were adopted according to the given case.

Case 1—starting detection required a manual input in the form of two points per power line in the image. This was sufficient to calculate the general direction and position of the given wire, and important information was obtained regarding the correspondence of the wires between image stereopairs. Owing to the different modes of construction of the utility poles, the wires could be captured in different orders between images.

To transfer detection between images (Case 2), the 3D coordinates of the power lines were used and were then projected onto images in the subsequent stereopair. The line was fitted to points that were within the boundaries of the image. The line parameters thus defined the sought-after position and direction of the power line.
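Fitting the line parameters *a*, *b* to the projected points that fall inside the image can be sketched with an ordinary least-squares fit; the function names below are illustrative:

```python
def clip_to_image(points, width, height):
    """Keep only projected points that fall inside the image boundaries."""
    return [(x, y) for x, y in points if 0 <= x < width and 0 <= y < height]

def fit_line(points):
    """Least-squares fit of y = a*x + b to the remaining projected points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

pts = clip_to_image([(10, 25), (50, 45), (90, 65), (-5, 20)], 100, 100)
a, b = fit_line(pts)  # a = 0.5, b = 20.0
```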

#### 2.3.2. Initial Image Processing and Edge Detection

To detect power lines in images, a method to enhance their visibility was needed. For this purpose, a decorrelation stretch of histograms of the images was used [47]. The enhancement algorithm was based on principal component analysis (PCA). The covariance matrix was calculated from the three RGB bands, and its eigenvalues were found to form coefficients of the transformation of the principal component. After the normalization of the transform bands, new image bands were created. The result was compared with the weighted arithmetic mean of all bands (Figure 7) [48]. The third band of the processed image was chosen as it delivered the highest visibility of the power line.


**Figure 7.** The same part of an example image showing power lines: (**a**) the band corresponding to the original blue band, obtained by the decorrelation stretching of the raw image; (**b**) weighted arithmetic mean of all bands (0.2989 × R + 0.5870 × G + 0.1140 × B).
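The PCA-based decorrelation stretch described above can be sketched with NumPy; this is a generic formulation (rotate the bands into principal components, equalize their variances, rotate back), not the exact implementation of [47]:

```python
import numpy as np

def decorrelation_stretch(img):
    """Decorrelation stretch of an RGB image (H x W x 3): decorrelate the
    bands via the eigenvectors of their covariance matrix and give each
    principal component equal variance."""
    h, w, _ = img.shape
    flat = img.reshape(-1, 3).astype(float)
    mean = flat.mean(axis=0)
    cov = np.cov(flat, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # whiten the principal components, then restore an average band variance
    whiten = np.diag(1.0 / np.sqrt(np.maximum(eigvals, 1e-12)))
    transform = eigvecs @ whiten @ eigvecs.T * np.sqrt(cov.diagonal().mean())
    return ((flat - mean) @ transform + mean).reshape(h, w, 3)
```

After the stretch, the band covariance matrix is (up to numerical error) a multiple of the identity, which is what makes low-contrast linear features such as wires easier to separate from the background.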

The edge detection operator needed to be set to enhance long, linear edges, and presumably only ones that aligned with the approximated direction of the power line. The modified Prewitt operator was used for this. The operator was expanded from its base 3 × 3 form to 31 × 31. Then, the rotation to the direction of the power line was computed using the previously acquired approximate parameters of the power line. The final operator scheme is shown in Figure 8.
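A directional, enlarged Prewitt-style operator can be sketched as follows; the exact 31 × 31 coefficients of the paper's operator are not reproduced here, so this is an illustrative construction in which each cell takes −1, 0, or +1 according to its side of the rotated line axis:

```python
import numpy as np

def directional_prewitt(size=31, angle_deg=0.0):
    """Prewitt-like edge operator of odd size, rotated so it responds to
    edges parallel to the approximated power line direction."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    theta = np.radians(angle_deg)
    # signed perpendicular distance of each cell from the line axis
    dist = xs * -np.sin(theta) + ys * np.cos(theta)
    return np.sign(np.round(dist, 6)).astype(int)

directional_prewitt(3, 0.0)
# for a horizontal line direction this reduces to the classic Prewitt kernel:
# [[-1 -1 -1]
#  [ 0  0  0]
#  [ 1  1  1]]
```

Convolving the grayscale image with such a kernel gives strong positive and negative responses on the two edges of the wire, which is exploited in the steps below.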

The convolution of the grayscale image was calculated using the modified Prewitt operator. The

resulting image was not normalized and had both negative and positive values.

unsigned integer space and subsequently binarized (Figure 9).

**Figure 8.** An excerpt from the modified Prewitt operator.

All the highest values occurred along the edges on the right side, and all the lowest values occurred along the edges on the left side. Two images were created: one to identify edges on the right side and the other to identify those on the left side. Both images were then normalized to eight-bit 2.3.2. Initial Image Processing and Edge Detection

mean of all bands (0.2989 × R + 0.5870 × G + 0.1140 × B).

parameters of the power line. The final operator scheme is shown in Figure 8.

resulting image was not normalized and had both negative and positive values.


To detect power lines in images, a method to enhance their visibility was needed. For this purpose, a decorrelation stretch of histograms of the images was used [47]. The enhancement algorithm was based on principal component analysis (PCA). The covariance matrix was calculated from the three RGB bands, and its eigenvalues were found to form coefficients of the transformation of the principal component. After the normalization of the transform bands, new image bands were created. The result was compared with the weighted arithmetic mean of all bands (Figure 7) [48]. The third band of the processed image was chosen as it delivered the highest visibility of the power line.

(**a**) (**b**)

**Figure 7.** The same part of an example image showing power lines: (**a**) the band corresponding to the original blue band, obtained by the decorrelation stretching of the raw image; (**b**) weighted arithmetic

The edge detection operator needed to be set to enhance long, linear edges, and presumably only ones that aligned with the approximated direction of the power line. The modified Prewitt operator was used for this. The operator was expanded from its base 3 × 3 form to 31 × 31. Then, the rotation to the direction of the power line was computed using the previously acquired approximate

**Figure 8.** An excerpt from the modified Prewitt operator. **Figure 8.** An excerpt from the modified Prewitt operator. —right edge coherence, a quotient of inliers in RANSAC to all positive pixels in the image

All the highest values occurred along the edges on the right side, and all the lowest values occurred along the edges on the left side. Two images were created: one to identify edges on the right The convolution of the grayscale image was calculated using the modified Prewitt operator. The resulting image was not normalized and had both negative and positive values. segment; —left edge coherence, a quotient of inliers in RANSAC to all positive pixels in the image segment;

side and the other to identify those on the left side. Both images were then normalized to eight-bit unsigned integer space and subsequently binarized (Figure 9). All the highest values occurred along the edges on the right side, and all the lowest values occurred along the edges on the left side. Two images were created: one to identify edges on the right side and the other to identify those on the left side. Both images were then normalized to eight-bit unsigned integer space and subsequently binarized (Figure 9). \_—the maximum distance between lines of the right and left edges within the image segment; —parallelism coefficient, the quotient of the minimal and maximum distances between lines of the right and left edges within the image segment.

**Figure 9.** Four stages of image processing: (**a**) original image; (**b**) pre-processed image; (**c**) normalized image; (**d**) binarized image. **Figure 9.** Four stages of image processing: (**a**) original image; (**b**) pre-processed image; (**c**) normalized image; (**d**) binarized image.

(**c**) (**d**)

#### 2.3.3. Power Line Detection

Separately, for each wire in the image, the process of detection was conducted over small image segments. The position and order of the segments were calculated according to the approximated position and direction of the wire, and they were evenly spaced, with at least a 10% overlap, along the line between the captured utility pole (if present) and the boundary of the image (Figure 10). The order of the segments was set along the direction of the power line. *Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 11 of 31

**Figure 10.** Original image with the calculated image segments in red. **Figure 10.** Original image with the calculated image segments in red.

For each segment, the detection was performed as follows. In images of both the left and the right edges, a line was fitted within the data using the RANSAC algorithm [49]. Then, detected symmetric lines were used to determine the final position of the power line. A list of parameters of detection was provided, together with the results, to assess the correctness of the process:

• *cr*—right edge coherence, a quotient of inliers in RANSAC to all positive pixels in the image segment;
• *cl*—left edge coherence, a quotient of inliers in RANSAC to all positive pixels in the image segment;
• *e_dist*—the maximum distance between lines of the right and left edges within the image segment;
• *p*—parallelism coefficient, the quotient of the minimal and maximum distances between lines of the right and left edges within the image segment.

Depending on the values of the above parameters, the detection was judged to be successful or incorrect/implausible. If the detection was accepted, the parameters of the line were calculated for the line between the right and left edge images, and involved the following:

• image coordinates of both ends of the detected line segment,
• *a*, *b* parameters of line equation *y* = *a*x + *b*.

A rejected detection was replaced by an extension of the parameters detected in the previous image segment. The process was repeated until the end of the image was reached. All parameters of the line for all segments were then saved for further processing.
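The per-segment line fit and the coherence quotient described above can be sketched as a minimal RANSAC loop. The iteration count, inlier tolerance, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ransac_line(points, n_iter=200, tol=1.0, rng=None):
    """Fit y = a*x + b to 2D edge pixels with RANSAC; return (a, b, coherence),
    where coherence is the quotient of inliers to all positive pixels."""
    pts = np.asarray(points, dtype=float)
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers, best_ab = -1, (0.0, 0.0)
    for _ in range(n_iter):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue                      # skip vertical sample pairs
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(pts[:, 1] - (a * pts[:, 0] + b))
        inliers = int((residuals < tol).sum())
        if inliers > best_inliers:
            best_inliers, best_ab = inliers, (a, b)
    a, b = best_ab
    return a, b, best_inliers / len(pts)
```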

#### 2.3.4. Three-Dimensional Geometry Reconstruction

The last stage of the power line detection process was 3D geometry reconstruction. The global coordinates of the points along the power line were determined using the spatial intersection of homologous rays. The problem was to identify homologous points on wires between the left and the right images in a given stereopair. Previous knowledge of corresponding wires and epipolar geometry was invoked to perform this task. Instead of identifying corresponding points on wires using feature descriptors, a purely geometric approach was chosen.

Each wire captured within the left and right images in a stereopair was represented by line segments established in the detection step. Two hundred evenly spaced points were chosen along the wire on the left image. Their coordinates were derived directly from the parameters of segments of the line. Then, for the nodal points, the respective epipolar lines on the right image were computed (Figure 11a) using a fundamental matrix (Equation (1)):

$$\mathbf{x}' \cdot \mathbf{F} = k'' \,, \tag{1}$$

where:

*x*′ = [x y 1]—focal coordinates of a point on the left image,

*F*—fundamental matrix, calculated from the positions and rotations of the left and right images,

*k*″ = [A B C]—parameters of line equation in a general form.
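Equation (1) and the subsequent intersection of the epipolar line with a detected wire line can be sketched as follows. The fundamental matrix `F` is assumed to be given (computed from the known positions and rotations); the values in the example are arbitrary, not from a real stereopair.

```python
import numpy as np

def epipolar_line(x_left, F):
    """Equation (1): k'' = x'.F with x' = [x, y, 1] in homogeneous form;
    the result [A, B, C] describes the line A*x + B*y + C = 0 on the right image."""
    x = np.array([x_left[0], x_left[1], 1.0])
    return x @ F

def intersect_lines(l1, l2):
    """Intersection of two lines in general form via the cross product."""
    p = np.cross(l1, l2)                  # homogeneous intersection point
    if abs(p[2]) < 1e-12:
        return None                       # lines are parallel
    return p[:2] / p[2]
```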


**Figure 11.** Three-dimensional reconstruction procedure: (**a**) calculated epipolar line in red; (**b**) the obtained key points and their resection. Symbol descriptions: O′, O″—left and right image projection centers; b—baseline; k′, k″—epipolar lines, respectively, on the left and the right image; π—epipolar plane; Q, P—points in 3D space; P′, P″, Q′, Q″—P, Q points projections on the left and the right image; r′, r″—homologous rays; x′, y′, x″, y″—image coordinate axes on the left and the right image.

The intersections of the epipolar lines and line segments representing the wire on the right image were then calculated to determine key points in it. Finally, the corresponding key points in both images were used to compute the spatial intersections and determine the terrain 3D coordinates (Figure 11b). Together, they provided a discrete representation of the wire.
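A common way to realize the spatial intersection of homologous rays r′ and r″ is to take the midpoint of the shortest segment between the two (generally skew) rays. The sketch below assumes known projection centers and ray directions; it is one standard formulation, not necessarily the paper's exact one.

```python
import numpy as np

def intersect_rays(o1, d1, o2, d2):
    """Midpoint of the shortest segment between two rays o + t*d
    (the homologous rays from the left and right projection centers)."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None                       # rays are parallel
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    return (o1 + t1 * d1 + o2 + t2 * d2) / 2.0
```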

#### 2.3.5. Catenary Curve Fitting

The wire in discrete representation was not sufficient for assessing power line diagnostics. The sag of the wire, which is acquired from a fitted catenary curve, was also needed. The expression for it describes the geometry of a wire hanging under its weight when supported only at its ends:

$$y = k \cdot \cosh\left(\frac{x}{k}\right), \tag{2}$$

where:

*k* = *Fx*/*q*—the catenary constant, where

*Fx*—horizontal force on the cable [kG/mm<sup>2</sup>],

*q*—the weight of the cable per unit arclength [kG/(m·mm<sup>2</sup>)].

The catenary curve was fitted to previously obtained points representing the wires using the classical approach. It assumes that the points are represented in the local coordinate system, where the *x*-axis runs along the wire, while the coordinates of the *y*-axis correspond to the height of points on the wire. This coordinate system was defined independently for each wire, and the catenary equation was determined in the following form:

$$y - w = k \cdot \cosh\left(\frac{x - u}{k}\right), \tag{3}$$

where:

*w*, *u*—the parallel offsets of the terrain coordinate system from that of the catenary curve.

The solution to Equation (3) was obtained by using the least squares method. Three randomly selected points from a set of representative points on the wire were used to calculate the approximate values of the unknowns (i.e., *w*0, *u*0, and *k*0). Together with the *x* coordinate of each point, they were used to calculate the deviations in the *y* coordinates and, subsequently, the adjusted parameters of the catenary curve that best fitted a series of data points. Using this, the maximum value of the sag of the wire was calculated as follows:

$$f_s = y_A + \frac{b}{a} \cdot (x_S - x_A) - k \cdot \cosh\frac{x_S}{k}, \tag{4}$$

where:

*x<sub>S</sub>* = *k*·arcsinh(*b*/*a*), *a* = *x<sub>B</sub>* − *x<sub>A</sub>*, *b* = *y<sub>B</sub>* − *y<sub>A</sub>*, and *x<sub>A</sub>*, *y<sub>A</sub>*, *x<sub>B</sub>*, *y<sub>B</sub>*—coordinates of the beginning and end of a catenary curve in the local coordinate system.

Owing to the large number of points representing each wire and the random nature of their selection to calculate approximate values of parameters of the catenary curve, unsatisfactory results of curve fitting were possible (Figure 12a). To avoid such errors and obtain the best possible result, two additional assumptions were introduced: only solutions where the value of the sag of the wire (Equation (4)) was positive were considered, and the choice of the best solution was based on the RANSAC algorithm (Figure 12b). The final curve parameters and discrete representation of the wire were saved. The latter consisted of 1000 equally spaced points along each curve per wire.
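The least-squares fit of Equation (3) and the sag of Equation (4) can be sketched as follows, here with the offsets *u*, *w* kept explicit and using `scipy.optimize.curve_fit` rather than the paper's RANSAC-wrapped adjustment; the initial guesses are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def catenary(x, k, u, w):
    """Equation (3): y = w + k*cosh((x - u)/k)."""
    return w + k * np.cosh((x - u) / k)

def fit_catenary(x, y, k0=100.0):
    """Least-squares fit; u0, w0 are rough guesses from the data extremes."""
    u0 = x[np.argmin(y)]                  # lowest point approximates the vertex
    w0 = y.min() - k0
    (k, u, w), _ = curve_fit(catenary, x, y, p0=(k0, u0, w0))
    return k, u, w

def sag(k, u, w, xA, yA, xB, yB):
    """Maximum sag in the spirit of Equation (4): the gap between the chord
    and the curve where the chord slope equals the catenary slope."""
    a, b = xB - xA, yB - yA
    xS = u + k * np.arcsinh(b / a)        # offsets made explicit here
    chord_y = yA + b / a * (xS - xA)
    return chord_y - catenary(xS, k, u, w)
```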

**Figure 12.** The catenary curve fitting based on the randomly selected points: (**a**) without control, (**b**) with the implementation of the RANSAC algorithm.


#### *2.4. Detection of Obstacles within the Power Line Corridor*

One objective of the 3D reconstruction of power lines and the creation of the DSM is to monitor the separation of the wires from elements of land cover. We had to check whether the UAV-derived data allowed for the correct identification of obstacles too close to the power lines.

The purpose of the analysis here was to detect DSM fragments (points in the point cloud) that were within the power line corridor. Cloud Compare software and its cloud-to-cloud distance function were used for this task. For each point in the point cloud constituting the DSM, the spatial distance to the nearest point representing the wire was determined.
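Outside CloudCompare, the same cloud-to-cloud query can be sketched with a k-d tree; the corridor distance below is an assumed illustrative value, not the paper's operational threshold.

```python
import numpy as np
from scipy.spatial import cKDTree

def corridor_points(dsm_pts, wire_pts, corridor=5.0):
    """For every DSM point, find the spatial distance to the nearest point of
    the reconstructed wires, and keep points closer than `corridor`."""
    tree = cKDTree(wire_pts)
    dist, _ = tree.query(dsm_pts)         # nearest-neighbor distances
    return dsm_pts[dist < corridor], dist
```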

Next, all the points recorded within the corridor distance were segmented into set-size voxels (3D pixels) and subsequently clustered into objects using neighborhood connections. Objects consisting of only one voxel were removed from the dataset. For each object, its volume, center, and bounding box were recorded for further analysis by the power line operator (Figure 13).
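The voxelization and neighborhood-connection clustering can be sketched as follows; the voxel size is an assumed value, and 6-connectivity is one possible choice of neighborhood.

```python
import numpy as np

def cluster_voxels(points, voxel=0.5):
    """Bin 3D points into fixed-size voxels, cluster occupied voxels by
    6-neighborhood connectivity, and drop single-voxel objects."""
    occupied = {tuple(v) for v in np.floor(points / voxel).astype(int)}
    clusters, seen = [], set()
    for start in occupied:
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:                      # flood fill over occupied voxels
            v = stack.pop()
            if v in cluster:
                continue
            cluster.add(v)
            seen.add(v)
            x, y, z = v
            for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                               (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                n = (x + dx, y + dy, z + dz)
                if n in occupied and n not in cluster:
                    stack.append(n)
        if len(cluster) > 1:              # remove single-voxel objects
            clusters.append(cluster)
    return clusters
```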

**Figure 13.** Detected occlusion—positioned at 49°55′09.3″N, 20°43′09.2″E with a volume of 39.25 m<sup>3</sup>. A close look at a point cloud with voxels containing occlusion (**a**), bounding box containing the object with a point cloud representation of DSM (**b**), and source image with an arrow pointing to the detected occlusion (**c**).

#### *2.5. Verification and Quality Assessment*


To assess the algorithm, three approaches were adopted. First, the quality of the power line detection and subsequent reconstruction was evaluated. All data used in this assessment were firstly processed and subsequently manually checked for correct and incorrect detection. All errors were marked, and a presumed source of error was noted.

Secondly, the global accuracy of the reconstruction of the power lines was compared to established methods—total station (TS) and terrestrial laser scanning (TLS). The comparison was conducted in three ways. For each wire, both horizontal and vertical position discrepancies were calculated. Additionally, sag values were compared between the proposed and reference methods. Though TS and TLS accuracy capabilities reach millimeter levels, this is not the case for a highly dynamic object, where small environmental influences change its geometry. Wire sag changes substantially with a change in temperature, while wind gusts cause oscillating vibrations. Considering the time needed to carry out both TS and TLS surveys, one has to expect significant changes within the surveyed power line. This notably diminishes the accuracy of both methods. Thus, the data were not treated as ground truths, but as reference, comparison data.
Thirdly, corridor occlusions detected using the proposed method and TLS data were compared.


#### *2.6. Experimental Data*

To conduct the assessment, data were collected for several fragments of medium- and high-voltage power lines. The power lines were located in Małopolskie Voivodeship (Poland), in areas with varied relief.

The data were divided into four datasets denoted as Dataset I, Dataset II, Dataset III, and Dataset IV. Dataset I was used to develop and optimize the algorithm. The effectiveness and the feasibility of the algorithm were analyzed on an independent dataset (Dataset II). The accuracy of the measurements of the power lines and corridor obstacle detection was assessed on Datasets III and IV. Because the datasets were independent, the assessments of the reliability and accuracy of the proposed method can be considered trustworthy.

#### 2.6.1. Datasets I and II

Dataset I was used for threshold sensitivity analysis and algorithmic optimization. The survey area contained 13 middle-voltage power line spans (14 utility poles (Figure 14a)), over 1.2 km in a rather flat area. The photogrammetric data were collected in March 2017 using a GRYF octocopter (FlyTech UAV, Krakow, Poland), fitted with a precision positioning system based on a single-frequency GNSS receiver Emlid Reach M+ (Table 2). Dataset I contained 225 images captured with an a6000 camera (Sony, Tokyo, Japan) (Table 3) and a Nokton 40 mm lens (Voigtlander, Braunschweig, Germany). The flight was fully autonomous and was conducted linearly along the power line in two strips. The flight altitude was 60 m above ground level and 50 m above the top of the poles, which yielded 6 and 5 mm of GSD, respectively (Table 4).

**Figure 14.** Example images of utility poles supporting power lines: (**a**) for Dataset I; (**b**) for Dataset II.

**Table 2.** UAV on-board global navigation satellite system (GNSS) receiver parameters.

| Parameter | Value |
|---|---|
| PPK vertical position accuracy | 14 mm + 1 ppm |

**Table 3.** Camera parameters.

| **Model** | **Sony a6000** | **Sony RX1RM2** |
|---|---|---|
| Image sensor | APS-C (15.6 × 23.5 mm) | FF (35.9 × 24 mm) |
| Pixel size | 15.28 μm<sup>2</sup> | 4.5 × 4.5 μm |
| Shutter | Mechanical curtain (with rolling shutter effect) | Mechanical central (without rolling shutter effect) |
| Interchangeable lens | YES | NO |
| Focusing system | mechanical | electronic |
| Aperture setting | F/5.6 | F/4.0 |
| Shutter setting | 1/1000 s | 1/1600 s |
| ISO setting | Auto 100–400 | Auto 100–400 |

**Table 4.** Basic parameters of UAV missions.

| **Dataset** | **Number of Images** | **Flight Altitude** | **GSD** | **Side/Front Overlap** |
|---|---|---|---|---|
| Dataset I | 225 | 60 m | 6 mm | 70%/50% |
| Dataset II | 238 | 70 m | 9 mm | 75%/75% |
| Dataset III | 282 | 60 m | 8 mm | 75%/75% |
| Dataset IV | 203 | 124 m | 15 mm | 75%/70% |
Dataset II was used to validate the feasibility, reliability, and efficiency of the proposed method. It consisted of 238 images captured with an a6000 camera (Sony) (Table 3) and a 30 mm Sigma lens (Sigma Corporation, Kawasaki, Japan) in November 2017. The flight was performed using a GRYF octocopter (FlyTech UAV). It was fully autonomous and was conducted in three strips parallel to the power line. The flight altitude was 70 m above ground and 40 m above the top of the poles (Table 4). Both the side and front overlap between the images were 75%. The flight area covered four spans of high-voltage power lines (five utility poles (Figure 14b)) with a total length of 1.4 km. Only the two outer strips were used in the processing.

#### 2.6.2. Datasets III and IV

Datasets III and IV were used to verify the accuracy of the proposed method. They included UAV imagery as well as the results of the survey of power lines through TLS and classic TS measurements. The 3D geometry of the wires is constantly changing as a consequence of changing weather conditions (especially changes in temperature). It was thus assumed that the data for the power lines needed to be collected using different methods simultaneously. Unfortunately, the time needed to perform each of the surveys varied greatly from a couple of hours (UAV) to a couple of days (TLS).

The data marked as Datasets III and IV concern power lines located in hilly and difficult-to-access areas. Dataset III contained data on a segment of a medium-voltage power line consisting of 10 spans, with a total length of 1.3 km. The segment was located in an area with a maximum height difference of 60 m. The power line was fitted with three transmission wires set at the same height. A 400 kV high-voltage power line was another object of research, and data related to it were collected in Dataset IV. The tested section of the power line with a total length of 1.55 km consisted of four spans. The height difference between the beginning and end of the analyzed power line section was 100 m, with a height difference of 60 m for one of the spans. It was fitted with 12 transmission wires and two ground wires, all positioned at varying heights.

The photogrammetric data in Datasets III and IV included high-resolution digital images of the power lines taken with a DSC-RX1RM2 (35 mm) non-metric camera (Sony) (Table 3) from the GRYF octocopter (FlyTech UAV) on 6 December 2018 (Table 4). The images were captured in two strips parallel to the power line that was the subject of the measurement. A network of GCPs and checkpoints (CPs) was established for each dataset. The coordinates of markers were determined by the GNSS real time network (RTN) method with a Leica GS16 receiver (Leica Geosystems AG, Sankt Gallen, Switzerland) that had a horizontal accuracy of 3 cm and a vertical accuracy of 5 cm.

The segments of power lines represented by Datasets III and IV were measured using other methods such as TLS and TS measurements for reference. Fieldwork was performed on 1–3 and 6 December 2018. The weather conditions during the measurements were variable. The temperature was between −5 °C and +5 °C, consequently changing the power line sag. In addition, on 2 and 3 December, the wind blew at a speed of up to 15 m/s (gusts), which caused the oscillating movement of the wires.

For TS measurements, a Nova MS50 (Leica Geosystems AG) was used. Each span (all its wires) was measured from a single instrument station approximately located in the middle of the span. A single wire point at its beginning and end, and several points along its entire length, were measured. The coordinates of the stations and reference points were determined by the GNSS RTN method. Using these data, the spatial coordinates of points representing the individual wires were determined.

In addition to the TS measurements, the power lines were measured using TLS. The Leica ScanStation C10 laser scanner (Leica Geosystems AG), together with a set of HDS (High Definition Survey) 6" targets (Leica Geosystems AG), were used for this purpose. The TLS measurements were carried out using the three-tripod method with a traverse workflow. The resolutions of the TLS were 10 and 7.5 mm at a distance of 10 m. This was connected with the maximum reduction in measurement time while maintaining satisfactory scanning results and was preceded by tests to determine the optimum scanning resolution depending on the type of power line and the distance between stations. The data were collected at 39 stations for Dataset III and 29 stations for Dataset IV. Finally, the point clouds from all scanner stations were registered and georeferenced (Figure 15) using Leica Cyclone v.9.4.2 (Leica Geosystems AG) software. The point cloud registration was performed using HDS 6" targets and their coordinates, as determined by the GNSS RTN method. The final root mean-squared errors (RMSEs) of the registration of the point clouds were 1.2 cm for Dataset III and 1.0 cm for Dataset IV.

**Figure 15.** High-resolution terrestrial laser scanning (TLS) point cloud of a high-voltage power line (Dataset IV). The color of the points is determined by the intensity (the strength of the reflected laser beam).

The reference data did not fully reflect the state of the power lines when measured using the UAV because different durations were needed to record the data using different means. UAV missions to survey sections for Datasets III and IV were performed over one day and lasted four hours (including preparatory work). The TLS and TS surveys were time consuming, and fieldwork using these methods lasted four days.

#### **3. Results**

This section presents the results of the validation of the proposed method for using UAV images to detect power lines, provide a 3D reconstruction, and localize obstacles in the power lines corridor.

#### *3.1. Bundle Adjustment*

The photogrammetric data were processed using Agisoft Metashape Professional. Aerotriangulation was performed at a high level of accuracy, whereby the software worked with images at their original sizes. Optimization was performed in the next stage and included a realignment of the image block and the determination of the parameters for camera calibration. In the case of Dataset I and Dataset II, the GCPs were not used in bundle adjustment; instead, only the precise coordinates of the projection centers of the images, measured using GNSS PPK, were used. The GNSS PPK calculations were done using RTKLIB software, utilizing measurements from a GNSS base station set up for the duration of the survey. However, in the bundle adjustment of images from Datasets III and IV, the GCPs were used. The a priori accuracy of the GCPs was taken into account in this process. Table 5 presents data on the accuracy of the aerotriangulation of Dataset III and Dataset IV. The last stage involved generating dense point clouds at a high level of detail, which means that the software determined the spatial coordinates for each group, consisting of four pixels in the image (2 × 2 pixels).


**Table 5.** Root mean-squared errors (RMSEs) of coordinates of the GCPs and checkpoints.

Information on the location of each pole within sections of the power lines was manually obtained through Agisoft Metashape Professional. Finally, the data necessary for further processing were exported to reconstruct the 3D geometry of the power lines based on the UAV images (i.e., final EOE, IOE, undistorted images, and coordinates of the poles and the dense point cloud).

#### *3.2. Results of Processing Datasets I and II*

The proposed method for 3D reconstruction of power lines in UAV images used multiple thresholds. To establish appropriate values, a threshold sensitivity analysis was conducted. The process is described in Appendix A. The established set of thresholds was then used in the processing of all experimental data (filter size—30, *cr*—0.4, *cl*—0.4, *e*\_*distmax*—10, *p*—10).

Owing to a lack of reference data for Datasets I and II, as well as the use of a camera with a rolling shutter, a check was performed manually to verify only the efficiency of the proposed detection algorithm. Errors were recorded whenever a detected line segment did not overlay an empirically determined line within an image segment. As a summary of the validation, the success rate was calculated. Each wire in an image was considered a case. A complete and correct case detection was regarded as a success, and any other outcome was considered a failure. The success rate here was the ratio of the number of cases of success to the total number of cases in the dataset.

Dataset I contained 225 images, covering 13 power line spans with 14 utility poles (Figure 14a). Three wires were spaced equally at the same height above ground. The survey was conducted in a rural area, where the background consisted mostly of farm fields, meadows, backyards, and occasional roads. Using Dataset I, the proposed algorithm derived three survey sections (straight sections) between the utility poles 1–6, 6–7, and 7–14. The processing was smooth and detection was uninterrupted (Figure 16). The detection was rejected in 13 images (detection parameters were below the set thresholds); upon closer inspection, six of these images showed less than perfect detection. A total of seven errors were recorded in cases where a power line was incorrectly detected: six occurred due to an unfavorable background, and the other in the vicinity of a utility pole. The calculated success rate was 98.96% (Table 6).

**Figure 16.** Detection results of the wires: (**a**) original image; (**b**) processed image with highlighted wires detected.


**Table 6.** The results of processing Datasets I and II.


Dataset II contained 238 images covering four spans of high-voltage power lines (Figure 14b), featuring six transmission wires and two ground wires. As the wires were at different heights, they were recorded in different configurations in the right- and left-side images, with some wires lying only a few pixels from one another (Figure 17). The background varied greatly, from forest to industrial. Owing to the high sag of the wires, all spans were processed separately. In this dataset, far more errors were observed. They mostly occurred due to unfavorably placed wires in the images (wires close to or overlying one another). Moreover, in a few cases, mistakes were caused by a challenging background. A success rate of 92.16% was calculated (Table 6). A lower success rate was expected because Dataset II consisted of far more challenging images than Dataset I: the wires were barely visible, while the background was very unfavorable, containing mostly industrial waste or dried vegetation.


**Figure 17.** Example problematic images: (**a**) closely recorded wires; (**b**,**c**) two wires recorded as one line.


#### *3.3. Results of Processing Datasets III and IV*

The data collected and pre-processed from Datasets III and IV enabled the assessment of the accuracy of the 3D reconstruction of the power lines using UAV imagery. Attention was paid here to the comparison of the resultant catenary curves obtained from the proposed method and reference methods.

The UAV images were processed using the method proposed in Section 2, using thresholds established in Appendix A, and the 3D geometry of the wires was saved for comparison. A detailed manual check of the detection was not performed. Processing for Dataset III was uninterrupted, while processing for Dataset IV was manually restarted once to detect the ground wires. The RMSEs of fitting the catenary curve to the photogrammetric data varied from ±1.0 cm to ±23.8 cm for Dataset III, and from ±2.1 cm to ±19.1 cm for Dataset IV, which shows sufficient accuracy for the purpose of corridor clearance monitoring [7,8]. The largest errors were noted in the detection of the ground wires in Dataset IV and were related to their diameters and the difference in sag from the transmission wires. Most of them were fixed by a second iteration of the detection for problematic images.
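The catenary model underlying these RMSE figures can be written as y = c + a(cosh((x − b)/a) − 1), with lowest point (b, c) and shape parameter a. The paper fits it with a RANSAC-based approach; the sketch below instead uses a plain grid-search least-squares fit, just to make the model and the RMSE computation concrete (all names, grids, and sample values are illustrative):

```python
import math

def catenary(x, a, b, c):
    """Catenary y = c + a*(cosh((x - b)/a) - 1)."""
    return c + a * (math.cosh((x - b) / a) - 1.0)

def fit_catenary(pts, a_grid, b_grid):
    """Grid-search least squares: for fixed (a, b) the optimal vertical
    offset c is the mean residual; keep the (a, b, c) with smallest RMSE."""
    best = None
    for a in a_grid:
        for b in b_grid:
            base = [a * (math.cosh((x - b) / a) - 1.0) for x, _ in pts]
            c = sum(y - s for (_, y), s in zip(pts, base)) / len(pts)
            err = [y - (s + c) for (_, y), s in zip(pts, base)]
            rmse = math.sqrt(sum(e * e for e in err) / len(err))
            if best is None or rmse < best[0]:
                best = (rmse, a, b, c)
    return best  # (rmse, a, b, c)

# Synthetic wire: 100 m span, shape parameter a = 1000 m, lowest point mid-span
pts = [(float(x), catenary(float(x), 1000.0, 50.0, 20.0)) for x in range(101)]
rmse, a, b, c = fit_catenary(pts, [800.0, 900.0, 1000.0, 1100.0],
                             [40.0, 45.0, 50.0, 55.0, 60.0])
```

With real, noisy wire points, the RMSE returned here corresponds to the centimetre-level fitting errors reported above.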

The catenary was also fitted to the data collected using TLS and TS. A single wire was usually represented by several thousand TLS-derived points and a few dozen points obtained from TS measurement. The RMSEs of fitting the curve to TLS-derived points varied between ±0.4 cm and ±3.9 cm for Dataset III; for Dataset IV, they varied from ±0.7 cm to ±9.3 cm. In the case of TS data, the RMSEs of the catenary fitting were between ±0.2 cm and ±9.2 cm for medium-voltage power lines (Dataset III), and between ±2 cm and ±15.1 cm for high-voltage power lines (Dataset IV). Multiple discrepancies were observed in the data, most likely due to small momentary vibrations in the wires. During two days of the survey, there was a strong wind, which led to high-amplitude vibrations in the wires. This had a direct impact on the quality of the results obtained.

The accuracy of the proposed method was verified in three ways: comparisons in both the vertical and the horizontal plane, and a comparison of the parameters of the catenary curve.

A comparison of the calculated heights of corresponding points was carried out between the proposed and the reference methods. The catenary curves were represented by 1000 points for the UAV-based method; for TS, 1000 points were calculated at the corresponding XY positions, while for TLS the source points were used. Corresponding comparative points were found by locating the nearest points between the TLS and UAV datasets.
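The pairing-and-differencing step can be sketched as follows (a pure-Python, brute-force nearest-neighbour sketch; the `max_xy` matching radius is our illustrative addition, not a parameter from the paper):

```python
import math

def height_differences(ref_pts, uav_pts, max_xy=0.5):
    """Pair each UAV point (x, y, z) with the nearest reference point in XY
    and return dz = z_ref - z_uav; max_xy caps the pairing distance."""
    dzs = []
    for xu, yu, zu in uav_pts:
        nearest = min(ref_pts, key=lambda p: math.hypot(p[0] - xu, p[1] - yu))
        if math.hypot(nearest[0] - xu, nearest[1] - yu) <= max_xy:
            dzs.append(nearest[2] - zu)
    return dzs

def rmse(values):
    """Root mean-squared error of a list of differences."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Toy example: UAV-derived wire sits 5 cm above the reference
ref = [(i * 0.1, 0.0, 10.0) for i in range(100)]
uav = [(i * 0.1, 0.0, 10.05) for i in range(100)]
dz = height_differences(ref, uav)
```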

As a measure of accuracy, the RMSEs of the differences in height for individual wires were calculated. The results of the comparison are presented in Table 7.


**Table 7.** The accuracy of wire reconstruction using the proposed method.

TS—total station measurements, TLS—terrestrial laser scanning, UAV—proposed method based on UAV imagery; TS–TLS—difference between TLS and TS; TS–UAV—difference between UAV and TS; TLS–UAV—difference between UAV and TLS.

The results of the recorded wire geometry using TS and TLS measurements were highly consistent. The mean difference in height for Dataset III was −2.0 cm, and for Dataset IV it was −3.0 cm, with RMSEs of ±2.3 cm and ±6.1 cm, respectively. After removing outliers, the mean differences in height between the reference data and the UAV-derived data varied from −13.0 to −11.0 cm for Dataset III, and from −17.2 to −14.3 cm for Dataset IV. This means that, within the vertical profile, wires reconstructed using UAV imagery were placed higher than those determined by means of the reference methods. The discrepancies can be attributed to differences in point densities on the wires and in the continuity of the TLS point clouds representing them. This was especially prominent in the longest span (500 m) in Dataset IV. The span length is one of the most important factors in changes to the sag of the wire due to temperature differences. Since the TLS survey took a significant amount of time, discrepancies were to be expected. In light of this, as well as the expected accuracy of the 3D reconstruction for the provided data, we can conclude that the proposed method achieved the expected accuracies.

To assess the proposed method in the horizontal plane, discrete descriptions of the UAV-derived catenary curves were used. For the TS and TLS, line representations of the catenary curves in the XY plane were utilized. The distances between UAV-derived points and horizontal lines fitted to TS and TLS data were calculated for each point regardless of its side of the reference line (absolute values were used).
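This horizontal comparison reduces to point-to-line distances in the XY plane; a minimal sketch (the function name is illustrative):

```python
import math

def horizontal_offsets(points_xy, p1, p2):
    """Absolute distance of each point to the line through p1 and p2 in the
    XY plane, via the cross-product formula |(p2 - p1) x (p - p1)| / |p2 - p1|."""
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    return [abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / length
            for x, y in points_xy]
```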

For Dataset III, the results of the analysis were promising. The mean horizontal distance between the UAV-derived data and the reference data was, on average, 2.8 cm, with a maximum value not exceeding 7.8 cm. A comparison of the TLS and TS data gave similar results. In the case of Dataset IV, the horizontal accuracy of the geometry of the wires determined using UAV imagery was worse. The mean value of the parameter analyzed was 12.0 cm for TS–UAV differences and 26.1 cm for TLS–UAV differences. RMSE values were ±3.8 cm for TS–UAV differences and ±12.3 cm for TLS–UAV differences (Table 7). The maximum value obtained for the ground wires reached up to 1.2 m and was significantly higher compared to the transmission wires. The discrepancies can be attributed to many factors: the complexity of the power line captured within Dataset IV, significantly longer spans (which translated into larger geometrical changes), and the height of the wires above the ground. The root of the problem probably lay within the bundle adjustment procedure, where most of the tie points were located on the ground. The longer the distance to the ground, the larger the errors to be expected in the 3D reconstruction.

Another parameter used for comparison was the maximum sag *f<sup>s</sup>* of the wire, calculated for each method according to Equation (4). This parameter is required for power line inspections in many countries. The differences in the maximum sag of the wires ∆*f<sup>s</sup>* were calculated among the measurement methods, and the RMSEs of ∆*f<sup>s</sup>* are listed in Table 7. The occurrence of outliers was connected with incomplete TS and TLS data collected during the wire measurements. The results are presented for medium-voltage (Dataset III) and high-voltage (Dataset IV) power lines.
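Equation (4) is not reproduced in this excerpt; a common working definition of maximum sag, used in the sketch below, is the largest vertical distance between the wire and the straight chord joining its attachment points (the function and sample data are illustrative, not the paper's formula):

```python
def max_sag(wire_pts):
    """Largest vertical distance between wire points (x, z) and the chord
    joining the first and last points (one common sag definition)."""
    wire_pts = sorted(wire_pts)                 # order along the span
    (x1, z1), (x2, z2) = wire_pts[0], wire_pts[-1]

    def chord(x):                               # height of the chord at x
        return z1 + (z2 - z1) * (x - x1) / (x2 - x1)

    return max(chord(x) - z for x, z in wire_pts)

# Parabolic stand-in for a shallow catenary over a 100 m span
wire = [(float(x), 0.001 * (x - 50.0) ** 2) for x in range(101)]
print(max_sag(wire))
```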

For Dataset III, the results obtained using UAV data differed on average by −3.6 cm and −6.1 cm from the results of TLS and TS data processing, respectively. The RMSEs of these differences were ±16.5 cm and ±14.5 cm, respectively.

When the maximum sag values were determined, there was no significant decrease in the accuracy of the reconstruction of the vertical geometry of the wire for spans of the medium-voltage power line (Dataset III) located in forested areas. In the case of Dataset IV, a cross-comparison of all measurement methods used gave similar results, both in terms of the mean differences and the RMSEs. Therefore, the accuracy of determining the maximum sag of the wire using UAV images did not significantly differ from the accuracy of calculating this parameter by means of TS and TLS measurements.

#### *3.4. Validation of Detection of Obstacles within the Power Line Corridor*

Reference data were used to validate obstacle detection in data processed by the created solution. The TLS data were chosen as a reference owing to the great detail of geometry reconstruction, both for the wires and in the vicinity of the power line.

Two analyses were conducted: visual and quantitative. The visual analysis included comparing data resulting from the distance analysis in CloudCompare software. A check was performed to see whether similar places were included in the resultant occlusion sets. Example results are shown in Figure 18, where points in the point cloud are colored according to their distance to the power lines.

The results obtained using the two methods (TLS, UAV) were consistent, especially in places where abnormalities related to the maintenance of an appropriate separation between the wires and elements of land cover were significant.

Then, for the quantitative analysis, occlusion points were submitted to a voxelization procedure and, lastly, clustered into separate objects (Figure 19). For each object, the approximate volume was calculated as the sum of the volumes of all voxels that formed it. Then, all the data were summed up, and the UAV and TLS results were compared (Table 8).
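The voxelization, clustering, and volume summary can be sketched as follows. This is a generic sketch: the 0.5 m voxel size and the 26-connectivity rule are our assumptions, not parameters stated in the text:

```python
from collections import deque

def voxelize(points, size=0.5):
    """Map 3D points to the set of occupied voxel indices."""
    return {(int(x // size), int(y // size), int(z // size))
            for x, y, z in points}

def cluster_voxels(voxels):
    """Group occupied voxels into separate objects using 26-connectivity (BFS)."""
    remaining, clusters = set(voxels), []
    while remaining:
        seed = remaining.pop()
        cluster, queue = {seed}, deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    for dk in (-1, 0, 1):
                        n = (i + di, j + dj, k + dk)
                        if n in remaining:
                            remaining.remove(n)
                            cluster.add(n)
                            queue.append(n)
        clusters.append(cluster)
    return clusters

def cluster_volume(cluster, size=0.5):
    """Approximate object volume: number of voxels times voxel volume."""
    return len(cluster) * size ** 3
```

Summing `cluster_volume` over all clusters gives the per-dataset volume totals of the kind listed in Table 8.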

**Table 8.** Summary of detected occlusions within the power line corridor. The corridor for Dataset III was 5 m from the power lines, while for Dataset IV it was 15 m.


Large differences can be observed between the TLS and UAV data. This was to be expected, as TLS and UAV products differ widely due to their acquisition procedures. Photogrammetric products do not penetrate greenery, while TLS does. Thus, TLS data have the advantage of a more continuous representation of lattice objects. This can cause one object in TLS data to be represented by multiple objects in UAV data (Figure 19e,f). As a consequence, the recorded volume of obstacles determined using TLS data will be far higher than that determined using UAV data.

Similarly, very small vegetation elements could have been picked up by TLS but would not appear in the dense point cloud reconstruction (Figure 19a,b). The importance of such objects is negligible. Nonetheless, this also causes differences in the total volume of occlusions. The date of the data acquisition was also not without consequence: bare trees are quite difficult to reconstruct in photogrammetric data, so, for better results, data should be acquired during the growing season.

**Figure 18.** Results of the monitoring of occlusions of a high-voltage power line corridor (Dataset IV): (**a**) based on TLS data; and (**b**) based on UAV data. The red color indicates points less than 10 m from the wires and the blue color indicates those more than 15 m away.


| | Dataset III, TLS | Dataset III, UAV | Dataset IV, TLS | Dataset IV, UAV |
|---|---|---|---|---|
| No. of objects | 264 | 98 | 214 | 709 |
| No. of voxels | 62,794 | 4217 | 228,573 | 93,451 |
| Volume (m<sup>3</sup>) | 7849.25 | 527.125 | 28,571.625 | 11,681.375 |

**Figure 19.** Example results of obstacle detection for the proposed method and TLS reference data. Points recorded within the corridor: (**a**) UAV, (**b**) TLS; calculated voxels: (**c**) UAV, (**d**) TLS; clustered objects: (**e**) UAV, (**f**) TLS.

#### **4. Discussion**

This paper presents a comprehensive method of processing UAV images to detect and reconstruct 3D power lines and subsequently compare them with a point cloud representation of the DSM to localize any objects threatening the safety of the power lines.

Power line inspections are a topical issue these days. Modeling wires in 3D space is essential for the assessment of power line safety; thus, their reconstruction has received much attention. However, unlike other studies on the subject, this paper presents the entire workflow for corridor clearance monitoring. Each step of the proposed method, starting with data acquisition requirements and ending with obstacle detection, is described comprehensively. The wire detection in UAV images was performed using a decorrelation stretch for initial image processing, the modified Prewitt filter for edge enhancement, and RANSAC with additional parameters for line fitting. This classic approach allows line extraction in the UAV images to take place in a controlled way and, if necessary, the user can modify the processing parameters (thresholds) and even manually restart the process when detection errors occur. The combination of these elements creates a solution that is robust to low-quality input data (images). Despite a variety of backgrounds and the dubious visibility of wires, the created solution managed to consistently detect power lines in a series of images. Its highest achieved success rate exceeded 98% and remained above 90% for more challenging data. Good performance in the highly changeable environment can be attributed to the complete disregard for wire color, as well as the implementation of a local, ordered approach, which made the method adaptable to both contrast changes and angle changes in the wire positions.

The algorithm does not require an extensive learning set, which makes it different from many deep learning methods [37,38], which are currently gaining popularity. Similar to some other solutions [23,35,40,41], wire reconstruction in 3D space is performed using epipolar geometry. However, the proposed RANSAC-based approach to fit the catenary curve to previously obtained points representing the wires minimizes the noise.

Despite the fact that the algorithm is not completely autonomous, it is relatively flexible and robust. The minor user intervention allows the system to be applied in a variety of cases, to both low- and high-voltage lines. However, there is room for improvement in the proposed method. Errors such as pixel discrepancies due to detection transfer are removed while fitting the catenary curve to the 3D data. Others, such as a complete loss of detection or overlay with neighboring wires, can be solved by implementing a two-step approach: after initial processing, the algorithm automatically directs the user to problematic sections, where the process can be restarted or corrected manually.

There are multiple thresholds set in the method, though it must be said that, after an initial threshold sensitivity analysis, none were changed for any of the tested sets, nor in further commercial exploitation.

Another drawback might be the mandatory initial user input. However, it must be stated that this is limited to two points per wire for the whole survey section, which takes no more than a couple of minutes. Many methods could be applied instead of initial user input. In a test phase of this research, the approximate direction based on the positions of the utility poles and the Hough transform [50] was used to detect the wires. The initial results were encouraging for medium-voltage power lines. However, with the diversification of the background and the introduction of wires at multiple heights in high-voltage lines, this method failed. The problem of the relative positioning of wires between images in stereopairs, i.e., recognizing the same wire in both images of a stereopair, is complex. As the wires are made of the same material and lie far from the background, there is no information linking the same wire across two images. An alternative here would be to include a third verification strip of images, or to create multiple 3D reconstructions containing all combinations of wires and then decide on the pairing. The first option would extend the duration of the survey and necessitate different mission planning methodologies. The second would also require either manual input or prior knowledge of the number of wires and types of utility poles.

In many studies on power line inspections, datasets have been small or numerical quality analyses have been omitted [23,40,41]. The method presented in this paper has been tested extensively beyond the scope of this study, since it has already been commercially implemented. This study demonstrates that the accuracy of the proposed method for the 3D reconstruction of power lines is consistent with that achieved using classical measurements. Similar accuracies are reported by Oh and Lee [40], but their accuracy analyses were limited to only two wires whose geometry was determined using the reference method.

It should be noted that none of the verification methods used for accuracy assessment in this study were free from errors or significantly more accurate than any other. As the true value of the measured sag of the wires or their 3D geometry was unknown, it was only possible to assess the mutual compatibility of the methods used, and not their absolute accuracy.

The results of the accuracy assessment described in this paper were also influenced by the fact that the surveys of the power lines using the three methods (UAV, TLS, TS) were not carried out simultaneously, or under the same weather conditions. This was due to the technical constraints of the surveys. For most spans, the TS and TLS measurements were performed in parallel (except for one span of a high-voltage power line) and lasted four days. In contrast, the UAV flights over the power lines in Datasets III and IV were completed in one day. For this reason, different conditions of the power lines were recorded between the methods.

The mapping of vegetation around power lines is also an issue of great importance. In this paper, we assessed whether UAV-derived point cloud representations of the DSM, generated using a classic approach, are sufficient to monitor power line corridors. In the case of the UAV-based photogrammetric products, the most common error was missing data in the DSM representation. The solution to this was to take UAV images in more than two strips. However, this did not significantly improve the tree reconstruction, which, in turn, was the main source of errors in the DSM. One way to solve this problem is to capture UAV-based photogrammetric data during the growing season. However, as a rule, in the immediate vicinity of the power line (up to ~10 m from the line axis), the quality of the UAV-derived DSM was adequate to analyze the separation between the wires and elements of the land cover.

#### **5. Conclusions**

The aim of this research was to create a novel method based on UAV imagery for occlusion monitoring in the corridor of power lines. The proposed method consists of three main parts: the reconstruction of the geometry of the wires in 3D space, the reconstruction of the point cloud representation of the DSM, and the subsequent detection of obstacles in the power line corridor. Power line reconstruction, carried out using the images captured for the DSM calculation, is its essential part. Well-known computer vision algorithms and epipolar geometry were adopted for this task. This makes the proposed method user-friendly and allows image processing to be performed in a fully controlled way. There are other merits: no training data are required, the method is robust to low-quality input data, and the RANSAC-based approach to modeling the wires reduces the influence of noise.

An integral part of the proposed method is a workflow for the detection of obstacles in the power line corridor. Obstacles are selected by calculating the distance between the power lines and each point in the point cloud representation of the DSM, then simplified into voxels and clustered into objects. The analyzed data are georeferenced; thus, the parts of the power line corridor where maintenance work has to be performed are documented with both precise location information and images.

The feasibility of the proposed UAV-based method for the 3D reconstruction of power lines and corridor clearance monitoring was confirmed by reference surveys. The method achieved results similar to those obtained using other available solutions. Its relatively high accuracy, comparable with that obtained by means of the reference measurements, was also verified. The accuracy of its 3D reconstruction for medium-voltage power lines was 15 cm; in the case of high-voltage power lines, it did not exceed 30 cm. The proposed method allows measurement data to be collected in a relatively short time and is cheaper than other methods commonly used in the field.

In the case of corridor clearance monitoring, the results are also satisfactory. Visual analysis proved that obstacles were detected in the same places for both the UAV and TLS data. However, there were large differences in the calculated obstacle volumes between the methods. This was expected due to the properties of the two data acquisition methods; however, it does not disprove the usability of the method. The crucial obstacles were identified, and detecting the presence of obstacles is of the utmost importance. In future works, more focus should be placed on the DSM.

The results of this study were implemented for commercial use by FlyTech UAV. The algorithm has already been used to measure several hundreds of kilometers of power lines.

**Author Contributions:** Conceptualization, E.P. (Elżbieta Pastucha), A.Ś., W.N. and E.P. (Edyta Puniach); methodology, E.P. (Elżbieta Pastucha), A.Ś. and E.P. (Edyta Puniach); software, E.P. (Elżbieta Pastucha) and A.Ś.; validation, E.P. (Edyta Puniach) and P.Ć.; formal analysis, E.P. (Elżbieta Pastucha), E.P. (Edyta Puniach), A.Ś., P.Ć., W.N. and P.W.; investigation, E.P. (Elżbieta Pastucha) and E.P. (Edyta Puniach); survey, E.P. (Edyta Puniach), P.Ć. and P.W.; writing—original draft preparation, E.P. (Elżbieta Pastucha), E.P. (Edyta Puniach), A.Ś., P.Ć., W.N. and P.W.; writing—review and editing, E.P. (Elżbieta Pastucha) and E.P. (Edyta Puniach); project administration, E.P. (Elżbieta Pastucha) and E.P. (Edyta Puniach); funding acquisition, P.Ć. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research for and publication of this paper were completed as part of the project "Development of an unmanned aerial vehicle with diagnostic and surveying capabilities for use in the operation of power grids, especially high and medium-voltage grids", co-financed by the European Regional Development Fund as part of the Regional Operational Programme of Małopolskie Voivodeship for 2014–2020, Priority Axis 1–Knowledge-based economy, Measure 1.2, Research and Innovation in Enterprises, and Sub-measure 1.2.1, R&D projects of enterprises. The APC was funded by AGH University of Science and Technology under the Initiative for Excellence Research University project.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

To establish appropriate threshold values, a set of seven images was chosen from Dataset I. The choice was made to maintain the maximum diversity of backgrounds within the images.


Threshold sensitivity analysis was performed on the given images. Because the thresholds were interdependent, it was not possible to perform fully separate tests and, to simplify the process, not all values were tested numerically.

Multiple thresholds were used. They can be divided into two categories. The first, and less important, consisted of thresholds that were dependent on the size of the image or had been arbitrarily chosen. This group included the segment window size, the segment overlap, and the binarization threshold. It was decided to use five segments per image, though, depending on whether the sag of the wires was big or small, the number could be increased or decreased. The overlap value was set between 0% and 10% to avoid excessive calculations. The binarization threshold depended on the width of the wires in the images and, as a consequence, on the approximate values of pixels in segments that captured the wire. Within the data used in this study, each wire was around two pixels in width; thus, at a segment size of 1000, the binarization threshold was set to 0.003.
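The arithmetic behind this threshold can be made explicit. A wire about two pixels wide crossing a square segment of side 1000 covers roughly 2 × 1000 of the segment's 1000² pixels, i.e. a fraction of 0.002; the chosen threshold of 0.003 sits slightly above this. The function below is our illustrative reading of that reasoning, not code from the paper:

```python
def wire_pixel_fraction(wire_width_px, segment_size_px):
    """Approximate fraction of segment pixels covered by one wire crossing a
    square segment: (width * side) / side**2 = width / side."""
    return wire_width_px / segment_size_px

print(wire_pixel_fraction(2, 1000))  # 0.002
```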

**Figure A1.** Sample images used for threshold sensitivity analysis: (**a**) ID 217, (**b**) ID 36.

The second group of thresholds consisted of thresholds that needed to be set based on the threshold sensitivity analysis. They included:

- filter size,
- *c<sup>r</sup>* and *c<sup>l</sup>*—parallelism coefficients, a quotient of the minimal and maximum distances between the right and left edge lines within the image segment,
- *e*\_*distmax*—a maximum distance between the right and the left edge lines within the image segment,
- *p*.



The filter size was tested first. Filtration was performed on all test images, and each one was then normalized. The results were compared to find the maximal difference between the wires and the background. The following values were tested: 5, 10, 20, 30, 40, 50, and 60 (Figure A2). Among the listed values, the wires became more distinguishable with filter sizes of up to 30.

**Figure A2.** Results of filtration for filter sizes: (**a**) 5 pix., (**b**) 10 pix., (**c**) 20 pix., (**d**) 30 pix., (**e**) 40 pix., (**f**) 50 pix., (**g**) 60 pix.
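The filter-size sweep described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the box filter, the background-subtraction step, the synthetic test image, and the wire/background contrast measure are all assumptions made for the example.

```python
import numpy as np

def box_filter(img, size):
    """Sliding-window mean (box) filter; slow but dependency-free."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + size, j:j + size].mean()
    return out

def normalize(img):
    """Rescale to [0, 1], as each filtered image was normalized before comparison."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo)

# Synthetic test image: noisy background with one bright horizontal "wire".
rng = np.random.default_rng(0)
img = rng.normal(0.2, 0.05, (40, 40))
img[20, :] += 0.6
wire = np.zeros(img.shape, dtype=bool)
wire[20, :] = True

# Compare filter sizes by wire/background contrast after filtration.
results = {}
for size in (5, 10, 20, 30, 40):
    hp = normalize(img - box_filter(img, size))   # background-subtracted image
    results[size] = hp[wire].mean() - hp[~wire].mean()
    print(size, round(results[size], 3))
```

Larger windows estimate the local background more smoothly, so the thin wire stands out more after subtraction, up to the point where the window no longer gains anything.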

To test the remaining thresholds, a different approach was used. The approximated positions of the wires were defined on the test images, detection was performed, and the relevant parameters were saved to a file. A manual classification was then performed to sort the results into correct and incorrect detection groups. The values of the descriptive statistics were then calculated for each group (Figure A3).

A clear difference can be seen in the results for both *c<sup>r</sup>* and *c<sup>l</sup>*. The distributions of these parameter values for incorrect detections are quite narrow and concentrated below 0.25, with a couple of outliers. The spread for correct detections is wider, but only a few values fall below 0.5. The opposite can be observed for the *e*\_*distmax* and *p* parameters: values for correct detections fall in a narrow range, while those for incorrect detections are far more widespread. For *e*\_*distmax*, only outliers exceed 10, while for *p*, only outliers exceed seven. There is an overlap between the *p* values for correct and incorrect detections; thus, based on the *p* value alone, only gross errors can be filtered out.
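The per-group descriptive statistics behind the boxplots can be sketched as below. The sample values and the `group_stats` helper are hypothetical, chosen only to mirror the manual correct/incorrect split; the real values come from the saved detection parameters.

```python
import statistics as st

# Hypothetical (c_r value, manual label) pairs mirroring the classification step.
detections = [
    (0.92, "correct"), (0.81, "correct"), (0.74, "correct"), (0.55, "correct"),
    (0.12, "incorrect"), (0.21, "incorrect"), (0.05, "incorrect"), (0.60, "incorrect"),
]

def group_stats(records):
    """Per-group summary statistics, as drawn by the boxplots in Figure A3."""
    groups = {}
    for value, label in records:
        groups.setdefault(label, []).append(value)
    return {label: {"min": min(v), "median": st.median(v), "max": max(v)}
            for label, v in groups.items()}

stats = group_stats(detections)
print(stats)
```

With well-separated groups, the two medians differ sharply, which is exactly what makes a threshold between them viable.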

Following an analysis of the results, the same threshold was chosen for *c<sup>r</sup>* and *c<sup>l</sup>*. A more rigid approach was then chosen, and the threshold was set to 0.4. Parameters *e*\_*distmax* and *p* were more challenging because their thresholds depend on the size of the wire in the image, and it was nearly impossible to keep the GSD on the power lines constant. In the case of Dataset I, the width of the power line varied from two to eight pixels. The threshold for *e*\_*distmax* and *p* was therefore set to 10 pixels, providing enough margin to pass all possible correct detections while filtering out gross errors.
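A minimal sketch of applying the chosen thresholds, assuming a candidate detection is kept only when all four conditions hold; the constant and function names are illustrative, not from the authors' code:

```python
# Thresholds from the analysis above; names are illustrative.
C_MIN = 0.4        # minimum for both correlation coefficients c_r and c_l
DIST_MAX = 10.0    # maximum for e_distmax, in pixels
P_MAX = 10.0       # maximum for the parallelism parameter p, in pixels

def passes_thresholds(c_r, c_l, e_distmax, p):
    """Accept a candidate wire detection only if all four thresholds hold."""
    return c_r >= C_MIN and c_l >= C_MIN and e_distmax <= DIST_MAX and p <= P_MAX

print(passes_thresholds(0.9, 0.8, 4.2, 3.1))   # plausible correct detection -> True
print(passes_thresholds(0.2, 0.9, 4.2, 3.1))   # low right-edge correlation -> False
```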


**Figure A3.** Boxplots of the parameter values against detection results.



**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
