1. Introduction
Unmanned aerial vehicle applications and new methods in photogrammetry [1] and remote sensing have increased rapidly in recent years [2–5]. Currently, unmanned aerial vehicles (UAVs) are used by a wide community and for applications that could not be realized in the past. Small UAVs, as a photogrammetric measurement tool, provide flexibility and reliability, are safe and easy to use, can be deployed in minutes, and can deliver initial measurements in the field. User demands are growing, both for the quality of the modeling and for the final resolution. UAVs are used in many areas where visual-spectrum or multi-spectral images, digital surface models (DSMs), and orthoimagery are derived, encompassing the following fields: geodesy [6–15], agriculture [14,16–19], forestry [20–22], archaeology and architecture [10,23–27], environment and technical infrastructure monitoring [6,7,11,17–19,21,28–33], and emergency management and traffic monitoring [34–36]. Numerous UAV applications have been realized by the author, during which some problems have been encountered [37]. One of these problems is connected with the flight level (altitude, above ground level (AGL)) and the ground sample distance (GSD) achievable in specific areas, especially within cities and industrial and construction areas. The terms flight level, altitude, and above ground level are used interchangeably in this paper and denote a height measured with respect to the underlying ground surface at the take-off position.
The problem is that the safe flight level and the camera parameters do not always meet the required GSD (geometrical quality) and the texture quality needed for interpretation (interpretation quality). The safe flight level within an industrial environment can be limited by high cranes, high-voltage power lines (which are even more dangerous for UAVs), high buildings [20,36], etc. If the required GSD demands a flight level lower than the highest objects in the area, then the required quality cannot be met. A flight level must allow for a safe separation between objects and the UAV. This separation (defined as the vertical distance between the highest point of an object and the UAV) varies with the object type and depends on many factors, such as altimeter accuracy, global navigation satellite system (GNSS) accuracy, local law regulations, and the operator's level of confidence in the known object height.
In cases where the flight level cannot be reduced and there is no technical possibility to change the UAV camera or lens, the author proposes the use of super-resolution (SR) algorithms to increase the geometrical and interpretation quality of the final photogrammetric product.
In recent years, many techniques to improve the visual quality of images and videos have been developed. The main reason this kind of technology is being developed is to satisfy user demands for high-quality multimedia content. People expect crystal-clear and visually pleasing pictures displayed on modern, high-quality viewing equipment, such as LCD (liquid-crystal display) and LED (light-emitting diode) screens. Moreover, high resolution and image quality are commercially attractive, and producers of display equipment keep increasing screen dimensions (given as the diagonal of the screen) and resolution. High-resolution content is not always available, for reasons that include down-sampling for the sake of bandwidth limitations, different types of noise, different compression techniques, different video standards, etc. [38].
The group of techniques for estimating a high-resolution (HR) image from its low-resolution (LR) counterpart is called super-resolution (SR) [38,39]. Super-resolution methods try to upscale and upsize images without sacrificing their detail and visual appearance. Consequently, the main goal of super resolution is to find the values of the missing pixels in the high-resolution image. In the context of the presented research, the idea is to find the values of the pixels in images taken from a higher altitude and make them similar to those taken from a lower altitude. Recent works have considered super-resolution methods in remote sensing [40–45], satellite imagery [41–51], medicine [52–55], and microscopy [56–59].
Generally, super-resolution methods are classified into two classes [60]: multiple-image super-resolution methods [61–63] and single-image super-resolution methods [39,64–67]. The first group enhances the spatial resolution of images based on multiple images presenting the same scene. Multiple-image super resolution is based on information fusion, which benefits from the differences (mainly subpixel shifts) between the low-resolution images [61]. From the practical point of view of photogrammetry and remote sensing, multiple images are not always available or, if they are, the scene changes slightly between acquisitions. For example, earth observation missions allow for acquisition of the same scene on a regular basis, but the scene still changes too fast in comparison to the revisit time. Such changes include shadows, cloud and snow coverage, moving objects, and seasonal changes in vegetation [65].
The second group, single-image super-resolution algorithms, is more practical for UAV photogrammetry and remote-sensing applications. An interpolation method (like bicubic interpolation) is the simplest approach to the single-image super-resolution problem; however, the results of such methods are far from ideal. Developments in the field of machine learning, and especially example-based learning techniques, use parameters learned during training to enhance the results obtained on previously unseen data. Deep-learning techniques, particularly convolutional neural networks (CNNs), are actually able to enhance the data in an information-theoretical sense [65], and for that reason, such techniques were used in the presented experiment.
2. Materials and Methods
This section describes the methodology used in the research. The main objective was to study whether super-resolution (SR) algorithms improve the geometric and interpretation quality of the final photogrammetric product, and to assess their impact on the accuracy of the photogrammetric processing and on the traditional digital photogrammetry workflow. The research concept assumes a comparative analysis of photogrammetric products obtained from data collected by small, commercial UAVs and products obtained from the same data additionally processed by the super-resolution algorithm.
In accordance with the main research intention, the super-resolution algorithm was applied to the image data before the standard processing routine (Figure 1) for the data collected at an altitude of 110 m. The data collected at the lower altitude are used as reference data for the comparison. In other words, the intention was to prove that data collected at a higher altitude can be enhanced using super-resolution algorithms so that, after standard photogrammetric processing, they are comparable to data collected at the lower altitude. In practical cases, where a flight at a lower altitude cannot be performed and the planned data quality cannot otherwise be reached, such algorithmic enhancement can be the only way, and the simplest one, to reach the planned data quality.
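To make the extended workflow concrete, the following minimal Python sketch illustrates the idea of enhancing every image before the standard processing. The directory names are hypothetical, and bicubic upscaling stands in for the trained SRGAN generator described in Section 2.3 so that the sketch runs end to end.

```python
from pathlib import Path
import cv2  # OpenCV, used here for image I/O and the stand-in upscaling

def enhance(image, scale=2):
    # Stand-in for the super-resolution step: bicubic upscaling keeps the
    # sketch runnable; in the studied workflow, the trained SRGAN generator
    # would produce the enhanced image instead.
    return cv2.resize(image, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_CUBIC)

def prepare_enhanced_dataset(src_dir, dst_dir):
    # Enhance every raw UAV image, then feed dst_dir to the standard
    # photogrammetric processing (alignment, dense cloud, DEM, ortho).
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.JPG")):
        img = cv2.imread(str(path))
        cv2.imwrite(str(dst / path.name), enhance(img))

prepare_enhanced_dataset("flight_110m_raw", "flight_110m_sr")
```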
2.1. Photogrammetric Process
The photogrammetry technique encompasses methods of image measurement and interpretation used to derive the shape and location of an object from photographs. Photogrammetric methods can be applied in any case where the object can be photographically recorded. The purpose of a photogrammetric measurement is a three-dimensional reconstruction in digital or graphical form. The measurements (images), together with a mathematical transformation between image space and object space, provide the means to model the object.
Currently, the digital photogrammetry process (Figure 1a) consists of data acquisition, processing, and export. All steps within this process are based on raw (unmodified) images. Moreover, photogrammetric software providers underline the fact that images loaded into the processing software must not be modified [68,69]. Any modification can change the internal or external orientation parameters, and the modeling software will then not be able to conduct the reconstruction process correctly. Here, a new, super-resolution-enhanced photogrammetric process was designed and tested in typical, state-of-the-art photogrammetric software [70]. In this research, Agisoft Metashape v. 1.6.1 was used.
The main purpose of the augmentation is to increase the resolution of images obtained from the flight at the higher altitude, which results in a higher geometric and interpretation quality of the final products. This approach is equivalent to reducing the flight level of the unmanned aerial vehicle or, in other words, reducing the effective distance to the object. Moreover, the research verified whether, despite the guidelines of the software developers, it is possible to modify the resolution of the images and to process them in commercial software without sacrificing the reconstruction capability.
2.2. UAV Flights
The commercial drone market is currently dominated by the Chinese company DJI (Da Jiang Innovations Science & Technology Co., Ltd., Shenzhen, China) [71,72], whose products are used by almost every company performing UAV measurements. For this research, the author used the currently most popular representatives of the commercial UAV market: the DJI Phantom 4 Pro (PH) and the DJI Mavic Pro (MP). Both represent the same class of small, commercial UAVs. Apart from different flight capabilities, the two UAVs also carry different cameras; in this class, only 13-Mpix (megapixel) and 20-Mpix sensors are available. Higher-resolution cameras require a different, larger aerial platform, typically mounted on custom constructions, and, due to their minor share of the market, were not used in this research.
In the presented research, a single-grid flight path (Figure 2) was used for both UAVs, with the parameters presented in Table 1. A single-grid flight path is usually used in cases where the main products of interest are 2D map outputs (orthophotomaps, digital surface models, or digital terrain models) of relatively flat surfaces, such as fields. Typically, the effective area that can be covered during one flight of a small commercial UAV at an altitude of 100 m using a single-grid path is limited to around 600 × 600 m, with a calculated flight time of around 19 min. The maximum flight time is calculated for no-wind conditions; due to that fact, the real coverage in windy conditions will be reduced.
During the study, 4 different UAV flights were conducted. Detailed data of the flight patterns (Figure 2) are presented in Table 1, which lists the following parameters: the width and the length of the area of interest, the distance between two strips, the distance between the perspective centers of two consecutive photos, and the image footprint across and along the flight line.
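As a rough illustration of how these flight-plan quantities relate to the camera model and the assumed overlaps, the sketch below computes them under standard flat-terrain, nadir-imaging assumptions; the camera constants are illustrative approximations of the Phantom 4 Pro camera, not values taken from Table 1.

```python
def flight_geometry(h_agl_m, focal_mm, pixel_um, img_w_px, img_h_px,
                    forward_overlap=0.8, side_overlap=0.7):
    # Single-grid flight-plan geometry for nadir images over flat terrain,
    # with the camera's long image side oriented across the flight line.
    gsd_m = (pixel_um * 1e-6) * h_agl_m / (focal_mm * 1e-3)  # m/pixel
    footprint_across = gsd_m * img_w_px  # image footprint across flight line
    footprint_along = gsd_m * img_h_px   # image footprint along flight line
    strip_spacing = footprint_across * (1.0 - side_overlap)
    photo_base = footprint_along * (1.0 - forward_overlap)
    return gsd_m, footprint_across, footprint_along, strip_spacing, photo_base

# Illustrative values only (approximate Phantom 4 Pro camera, 110 m AGL):
print(flight_geometry(110, focal_mm=8.8, pixel_um=2.4,
                      img_w_px=5472, img_h_px=3648))  # GSD ~3 cm/pixel
```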
2.3. Super Resolution
As mentioned, super-resolution methods try to upscale and upsize images without sacrificing their detail and visual appearance. This property, embedded in the classic digital photogrammetry process, should theoretically increase the accuracy of the location of ground-control points and of the photogrammetric reconstruction itself. Based on recent super-resolution review papers [73–76] and the latest available implementations [60,64,77–85], the method based on the super-resolution generative adversarial network (SRGAN) [39] was chosen. The method belongs to the group of single-image super-resolution (SISR) methods.
The SRGAN network uses high-resolution images and their low-resolution equivalents in the training process. The low-resolution images are obtained by applying a Gaussian filter followed by down-sampling. In the training process, the generator network outputs high-resolution images; the generator employs a deep residual network (ResNet) [86]. The result is evaluated by the critic network using a perceptual loss based on high-level feature maps of the VGG (Visual Geometry Group) network [87] and is then optimized. VGG is a convolutional neural network model pretrained on images from the ImageNet database [88]. The VGG network is combined here with a discriminator that encourages solutions perceptually hard to distinguish from the high-resolution (reference) images.
The aim of optimizing supervised SR algorithms is usually to minimize the mean squared error (MSE) between the recovered high-resolution image and the reference image. MSE minimization also maximizes the peak signal-to-noise ratio (PSNR), which is commonly used to evaluate and compare super-resolution algorithms [87]. However, using MSE as the critique, especially for real-world images, may provide insufficient guidance for the generator [39]. Therefore, the SRGAN method replaces the MSE-based content loss with a loss calculated on feature maps of the VGG network [89]. Since small shifts in image content lead to very poor MSE and PSNR results even when the contents are identical [90], the change to the VGG feature space makes the loss more invariant to changes in pixel space. With this approach, the generator can learn to create solutions that are highly similar to real images, and that was the main reason for choosing the SRGAN method to enhance the photogrammetric images.
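A minimal sketch of such a VGG-based content loss is given below, using the Keras VGG19 model for brevity; the specific feature layer and scaling are common SRGAN-style choices and are assumptions here, not the exact configuration of the cited implementation.

```python
import tensorflow as tf

# Fixed VGG19 feature extractor for the perceptual (content) loss.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feature_extractor = tf.keras.Model(
    inputs=vgg.input, outputs=vgg.get_layer("block5_conv4").output)
feature_extractor.trainable = False

def vgg_content_loss(hr_images, sr_images):
    # MSE computed on deep VGG feature maps instead of raw pixels;
    # inputs are assumed to be float tensors scaled to [0, 1].
    prep = tf.keras.applications.vgg19.preprocess_input
    hr_feat = feature_extractor(prep(hr_images * 255.0))
    sr_feat = feature_extractor(prep(sr_images * 255.0))
    return tf.reduce_mean(tf.square(hr_feat - sr_feat))

# In SRGAN, the generator's total loss adds a small adversarial term
# (driven by the discriminator) to this content loss.
```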
The photogrammetric image enhancement was realized using the TensorLayer framework [60]. First, the pretrained 19-layer VGG model was downloaded, and high-resolution images for training the generator network were obtained from [91]. This dataset was designed for the New Trends in Image Restoration and Enhancement (NTIRE) challenge on image super resolution. Based on the implementation [60] and the trained networks, the final image enhancement was conducted.
UAV images taken at the higher flight level (110 m) were enhanced using the SRGAN method with a 2× scaling factor. Images from the lower flight level (55 m) were used as the reference for further modeling and model comparison. Additionally, the original images were resized using bicubic interpolation with a 2× scaling factor (the output pixel value is a weighted average of the pixels in the nearest 4-by-4 neighborhood). The assessment of image quality and the evaluation of the SRGAN method in comparison with bicubic interpolation were conducted on the basis of three image quality metrics (IQM): the blind/referenceless image spatial quality evaluator (BRISQUE) [92], the natural image quality evaluator (NIQE) [93], and the perception-based image quality evaluator (PIQE) [94]. The chosen no-reference image quality metrics each return a non-negative scalar score.
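As an example of scoring an enhanced image, the hedged sketch below computes one of the three metrics, BRISQUE, using the third-party `piq` package for PyTorch; this is an assumption for illustration, as the paper does not state which implementations were used, and NIQE and PIQE implementations are available in other toolboxes (e.g., MATLAB).

```python
import cv2
import torch
import piq  # PyTorch Image Quality: pip install piq (assumed available)

# Load an enhanced image (hypothetical file name) and convert it to a
# normalized NCHW float tensor, as expected by piq metrics.
img = cv2.cvtColor(cv2.imread("DJI_0001_sr.JPG"), cv2.COLOR_BGR2RGB)
x = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float() / 255.0

# Lower BRISQUE scores indicate better perceptual quality (range 0-100).
print("BRISQUE:", piq.brisque(x, data_range=1.0).item())
```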
The BRISQUE score is in the range from 0 to 100; lower scores reflect better perceptual quality. The NIQE model is trained on a database of pristine images and can measure the quality of images with arbitrary distortion. NIQE is opinion-unaware and does not use subjective quality scores; the tradeoff is that the NIQE score of an image might not correlate with human perception of quality as well as the BRISQUE score. Lower NIQE scores reflect better perceptual quality with respect to the input model. The PIQE score is a no-reference image quality score inversely correlated with the perceptual quality of an image: a low score indicates high perceptual quality, and a high score indicates low perceptual quality. The image scores are presented in Table 2.
The PIQE quality scale of an image is based on its PIQE score, as given in Table 3. The quality scale and the respective score ranges were assigned through experimental analysis on the dataset in the database [95].
2.4. Georeferencing Accuracy
A ground-control point (GCP) can be defined as a feature with known real-world coordinates that can be clearly identified in an image. GCPs are required during the photogrammetric process to achieve results of the highest quality, both in terms of geometrical precision and georeferencing accuracy; it is therefore very important to locate and mark them correctly during photo processing.
Figure 3 and Figure 4 present a visual evaluation of two different types of GCPs marked in the area. Typically, GCPs are identified precisely at the resolution of the raw image and marked in the processing software. GCPs can be marked in the terrain, as in this case, with white spray paint (GCPs no. 1–4) or with some kind of pattern, e.g., a chessboard pattern (GCP no. 5).
The GCP positions were measured using a GNSS RTK (real-time kinematic) geodetic receiver (Trimble R8, Trimble Inc., Sunnyvale, CA, USA) with the maximum precision available for this system: 8 mm in the horizontal and 15 mm in the vertical axis. After the initial photo alignment, the GCPs were marked in the software, and then camera alignment optimization was performed.
Figure 5 presents the GCP locations and error estimates after camera alignment optimization. Table 4 and Table 5 present the detailed values of the GCP error estimates and the percentage changes between the error values calculated for the traditional photogrammetric process and those calculated for the enhanced photogrammetric process. The percentage change is calculated in accordance with the following formula:

$$\Delta = \frac{E_e - E_t}{E_t} \cdot 100\%$$

where $E_t$ is the calculated error value for the traditional process and $E_e$ is the calculated error value for the enhanced process.
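For clarity, a one-line implementation of this formula is sketched below; the example values are generic and not taken from Table 4 or Table 5.

```python
def percentage_change(e_traditional, e_enhanced):
    # Positive values mean the error grew in the enhanced process,
    # negative values mean it was reduced.
    return (e_enhanced - e_traditional) / e_traditional * 100.0

print(percentage_change(1.00, 0.50))  # -50.0: the value was halved
print(percentage_change(0.50, 1.00))  # 100.0: the value doubled
```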
3. Results
The images collected at 110 m were enhanced using the described super-resolution algorithm. As a result, new, double-sized images were processed (Table 6). The autocalibration algorithm in the processing software used the enhanced image resolution and the sensor size (in accordance with the provided camera model) to calculate the pixel size. The double-sized images resulted in a halved calculated pixel size for both UAVs.
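The derived pixel size follows directly from the camera model: the physical sensor width divided by the image width in pixels. A short sketch, with an illustrative sensor width approximating the Phantom 4 Pro's 1-inch sensor, shows why doubling the image dimensions halves the calculated pixel size.

```python
def derived_pixel_size_um(sensor_width_mm, image_width_px):
    # Pixel size as autocalibration derives it from the camera model.
    return sensor_width_mm * 1000.0 / image_width_px

# Same sensor model, original vs. double-sized super-resolved images:
print(derived_pixel_size_um(13.2, 5472))      # ~2.41 um, original images
print(derived_pixel_size_um(13.2, 2 * 5472))  # ~1.21 um, 2x SR images
```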
The processing report summary and the calculated percentage differences for the photo alignment process in all cases are presented in Table 7 and Table 8.
The analysis of the results presented in Table 7 shows that the enhanced process resulted in a 50% decrease in the ground resolution value (Mavic Pro—SR to 110 m), which is the expected result, as the pixel size of the image was reduced by 50%. The number of tie points increased by 21% for the Mavic Pro (Table 7) and by 40% for the Phantom 4 Pro (Table 8). The reprojection error increased by 23% in the case of the Mavic Pro but decreased by 15% in the case of the Phantom 4 Pro. The reprojection error is the distance, given in pixels, between a point detected in an image and the corresponding world point projected into the same image. This error depends on the quality of the camera calibration, as well as on the quality of the tie points detected in the images. In the context of the images taken by the Phantom 4 Pro camera, it can be assumed that the overall quality of its original images is better than that of the Mavic Pro images (Table 2) (excellent on the PIQE scale). The number of tie points detected in the Phantom 4 Pro super-resolution images is 19% higher than in the Mavic Pro super-resolution images; therefore, the super-resolution enhancement provides a reprojection error reduction for the Phantom 4 Pro. The SR algorithm increases the number of tie points in the processed images for both UAVs, by 21% and 40%, respectively. In extreme cases, like the forest mapping presented in [96], the increased number of tie points may result in an increase in the number of tie points over the minimum required to carry out the modeling process successfully.
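For reference, the sketch below shows how a single reprojection error is computed under a simple pinhole camera model (lens distortion omitted); all numeric values are toy examples, not calibration results from this study.

```python
import numpy as np

def reprojection_error_px(X_world, x_observed, K, R, t):
    # Distance in pixels between a detected image point and the
    # corresponding world point projected into the same image.
    x_cam = R @ X_world + t        # world -> camera coordinates
    x_img = K @ x_cam              # camera -> homogeneous image coordinates
    x_proj = x_img[:2] / x_img[2]  # perspective division
    return np.linalg.norm(x_proj - x_observed)

# Toy example: identity pose, a point 100 m in front of the camera.
K = np.array([[3666.0, 0.0, 2736.0],
              [0.0, 3666.0, 1824.0],
              [0.0, 0.0, 1.0]])
X = np.array([1.0, 2.0, 100.0])
x_detected = np.array([2773.0, 1897.0])
print(reprojection_error_px(X, x_detected, K, np.eye(3), np.zeros(3)))
```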
The processing report summary and the calculated percentage differences for the final photogrammetric products in all cases are presented in Table 9 and Table 10.
The SR significantly increased the number of points in the point clouds (Table 9 and Table 10); the number is even higher than in the reference model. The number of points in the dense point clouds increased by up to 337% (Mavic Pro—SR to 110 m). A visual examination of the dense point clouds for the same example object is presented in Figure 6. This significant improvement carried over to the further modeling quality: it can be expected that the DEM and 3D models will be generated with higher resolution and a higher level of detail.
Figure 7 presents the results of the point cloud comparisons. The point clouds generated from the 55-m images were compared to the point clouds generated from the SR images using a cloud-to-cloud (C2C) comparison technique [97]. Based on the C2C distance visualization and histograms, it can be shown that the SR point clouds are similar to the reference ones and that super-resolution enhancement can be applied in traditional photogrammetric software with no additional modification required. The differences visible in this comparison appear particularly in areas where the lower-altitude (reference) products suffered from modeling problems or where some elements of reality were missing from the model. Some objects, like trees or bushes, were modeled from the 110-m images, while from the 55-m images, theoretically with a smaller GSD, they were not reconstructed at all (Figure 8). This situation appears in both cases, for images taken by the Mavic Pro and by the Phantom 4 Pro.
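The C2C principle is straightforward: for each point of the compared cloud, take the distance to its nearest neighbor in the reference cloud. A minimal sketch using SciPy is given below; the random clouds are placeholders for the dense clouds exported from the photogrammetric software.

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud_distances(reference_cloud, compared_cloud):
    # Basic C2C comparison: for every point of the compared cloud, the
    # Euclidean distance to its nearest neighbor in the reference cloud
    # (the same principle as the C2C tool cited as [97]).
    tree = cKDTree(reference_cloud)
    distances, _ = tree.query(compared_cloud, k=1)
    return distances

# Toy usage with random points; real input would be N x 3 XYZ arrays.
rng = np.random.default_rng(0)
reference = rng.uniform(0, 10, size=(10000, 3))
compared = rng.uniform(0, 10, size=(8000, 3))
d = cloud_to_cloud_distances(reference, compared)
print(d.mean(), np.percentile(d, 95))
```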
A more detailed analysis of the problem reveals that, in the areas where bushes and trees are present, the algorithm does not find any tie points. This can result from many factors, above all the small but noticeable dynamics of the objects (bushes and trees moving in the wind). Particularly noticeable in the low-altitude flights are the small dimensions of the object elements (branches), many times smaller than the GSD. During the flight at the higher altitude, the size of the GSD allows for some generalization of the object, especially of such small elements as branches and tiny leaves, and the dynamics of the object are also less noticeable.
As far as the orthophotomap and DEM are concerned, a close-up of parts of the products is shown in Figure 9. The presented part of the products was selected deliberately: the image presents buildings that exceed the average ground level, while the flight altitude was determined relative to the mean terrain level. In the pictures presented in Figure 9, an artifact that formed on one of the structures is visible. There was a series of such artifacts in the entire project, and they occurred at the edges of the higher structures.
The artifacts presented above appear in the products created from the images of the 55-m flight. They do not occur in the products created from the 110-m flight, and the same fragments are reproduced much better in the products resulting from the images processed by the super-resolution algorithm. The analysis showed that these artifacts result from a too-low overlap, caused by the low UAV flight (55 m), but only for the higher structures exceeding the average ground level. The building fragment visible in Figure 9a was imaged on only four photographs at the 55-m flight level (Figure 10a), while points on the ground surface were visible on more than six photographs (Figure 10b).
The situation described above proves that, in some circumstances, it might not be practical to reduce the flight level to achieve the desired GSD. Naturally, in a comparable scenario, it is possible to increase the coverage, which would eliminate similar errors, but this would extend the flight time and reduce the area covered during one flight. The orthophotomap and DEM produced with the super-resolution algorithms do not exhibit similar artifacts, and their GSD is comparable to that of a flight conducted at half the altitude.
4. Conclusions
This study presents the results of increasing the resolution of photogrammetric aerial images and the effect of the super-resolution algorithm on the resulting products. As the study has shown, the photogrammetric products created as a result of the algorithmic operation show a quality very similar to the reference products and, in some cases, even improve on it. Super-resolution enhancement can be applied in traditional photogrammetric software with no additional software modification required.
The typical photogrammetric image-processing procedure can be extended by the application of super-resolution algorithms in cases where a reduction of the UAV flight altitude is not feasible, thereby providing the capability to preserve the desired quality of the processing. As has been calculated, the ground resolution in cm/pixel can remain unaffected for images acquired at double the height if super-resolution algorithms are applied.
The super-resolution algorithms in the photogrammetric process significantly increased the number of points in the point cloud. The number of points increased by up to 337% compared to the point clouds generated from the images without super resolution, resulting in a significant increase in output quality. These algorithms also do not affect the process itself or the standard functionality of the image-processing software: the applications correctly solve their tasks and model objects from photos that have been processed with the super-resolution technique.
The number of tie points increased (by 21% for the Mavic Pro and 40% for the Phantom 4 Pro). In extreme cases, this may raise the number of tie points above the minimum required to carry out the modeling process successfully.
The precision of the ground-control point positions was reduced; this reduction is higher the better the original quality of the UAV images. In the case of the Phantom 4 Pro, the reduction was 310%; however, when reported as the total error in millimeters, it grows only from 4.7 mm to 19.46 mm.
The study also showed that the products obtained from the lower-altitude images, in addition to the obviously smaller GSD, may exhibit processing issues such as artifacts, deficiencies in structures, and missing elements of reality. The use of super-resolution algorithms together with a flight at a somewhat higher altitude remarkably eliminated these shortcomings, and, as a result, the final product was complete, without gaps or artifacts.
To summarize, it can be expected that super-resolution methods will be applied with increasing frequency in modern photogrammetry and remote sensing. Their potential enables implementation on the basis of already known photogrammetric software and well-known workflows. As shown in this paper, photogrammetric products can be created based on low-cost cameras installed on common UAVs, and, at the same time, the geometrical and interpretation quality of the work can be improved by super-resolution algorithms.