1. Introduction
Augmented Reality (AR) is a “human-centered” visualization technology that differs from Virtual Reality (VR), which creates independent immersive 3D virtual environments detached from the physical world. AR can overlay virtual objects in real time onto the displayed image of the real environment through mobile device screens or head-mounted displays, and provide interactive experiences that combine both virtual and real elements according to user behavior [
1]. This allows users to enhance their spatial thinking and orientation capabilities for virtual objects without leaving the physical world. AR can be applied in various visualization scenarios, ranging from indoor to outdoor spaces. Examples include a virtual teaching sandbox [
2], geographic process simulation [
3], indoor and outdoor AR navigation [
4,
5], and underground pipeline inspection [
6].
In UAV operation and aviation-related fields, AR technology has been widely used in assisted flight and simulation. For example, the augmented reality head-up display (AR HUD) has been used to project some important parameters to the screen in front of the pilot, such as altitude, speed, and navigation information [
7], or the panel of the UAV’s image transmission system [
8]. Flightradar24, a real-time flight information service provider, uses mobile AR technology to display information such as the departure, destination, flight number, flight altitude, and airspeed of the flight in the corresponding sky orientation [
9]. In indoor environments, AR can be used to superimpose a scaled-down virtual scene onto the real environment and observe the overall situation of the virtual scene from a global perspective. For example, Liu et al. presented a reconstructed UAV flight environment for 3D visualization and specified the moving position of the UAV target through gestures; the position of the virtual UAV is displayed in the flight environment expressed by AR [
10].
In recent years, civilian UAVs have experienced explosive growth in the application market. The large number of low-altitude UAVs flying in disorder poses risks to public safety. However, the current means of UAV supervision and air traffic management cannot adapt to the millions of drones in the future sky. Hence, the UAVs’ low-altitude public air route (also known as UAVs’ Skyroad) has been proposed as a forward-looking low-altitude traffic management solution [
11]. In our study, low altitude is defined as the airspace below 300 m above ground level (AGL).
Different from the traditional road traffic, the UAVs’ Skyroad is a digital traffic infrastructure in mid-air above the ground surface. The intuitive visualization and interaction involved in Skyroad plays an important role in the effectiveness of decision-making from planning to operation. At present, the visualization of the air route is mainly based on the 2D or 3D GIS platform developed by the general graphics library. For example, He et al. [
12] used ArcGIS software to visualize the spatial distribution of a UAV logistics air route network on a 2D map. Understanding such 2D visual expression often requires strong spatial abstraction ability, which makes it unsuitable for non-professional users. Zhang et al. [
13] used a 3D WebGIS platform (i.e., Cesium) to visualize collision-free path planning for UAVs. Although computer-aided 3D rendering can display the multi-dimensional information of Skyroad, it creates the virtual scene through 3D projection onto a 2D screen; users have to change the perspective with mouse clicks and still cannot intuitively perceive the spatial relationships of Skyroad, which increases the information-processing burden.
The use of GIS in aviation has traditionally focused on combining dynamic and static map visualization with basic spatial analysis. Skyroad, a forward-looking transportation infrastructure for large-scale UAV operations, is still at an early research stage. To meet new demands, AR + GIS visualization is crucial for assisting decision-making throughout Skyroad’s lifecycle. At present, there is still a lack of research on the AR visualization of Skyroad, and traditional AR visualization methods have shortcomings when applied to it. For example, AR in large outdoor scenes is not suitable for displaying Skyroad from a global perspective, while indoor AR is mostly displayed on a miniature model or a card using marker-based tracking with quick response (QR) codes, which is easily constrained by the environment and the location of the markers. In addition, virtual objects are always rendered in front of the real scene, with little consideration given to virtual-real occlusion in the environment, which reduces the layered sense of AR 3D expression.
In order to present the UAVs’ Skyroad in the most macroscopic and realistic way, this paper proposes an AR visualization framework for Skyroad based on a physical sandbox model. We developed an innovative model-based marker-less tracking method and a virtual-real occlusion handling method, built a prototype system (the UAVs’ Skyroad AR Visualization System, SkyroadAR) on mobile devices, and verified its effectiveness and usability through comparative testing of system performance and user questionnaires.
3. Materials and Methods
3.1. SkyroadAR Framework
The planned Skyroad runs through a complex low-altitude geographical environment. To intuitively express the overall operating concept of the air routes and their relationship to the ground environment, assist stakeholders in route planning, display the planned global air routes on a physical sandbox reconstructed from the real geographical environment, and meet the simplicity, efficiency, accuracy and interactivity requirements of AR visualization, this paper proposes an overall technical framework for the development of the SkyroadAR visualization system. As shown in
Figure 1, it is divided into four main parts:
(1) The basic geographic information data of the study area are used to obtain the geographical constraint elements that affect air route planning, such as buildings, roads, and water bodies; the geometric shape, texture and color of these elements are then restored to scale through 3D-printing technology to establish a physical sandbox model of the study area. This is shown in the red block in
Figure 1; see
Section 3.2 for specific methods.
(2) Based on geographic information data, multi-level air-route planning (backbone, trunk, branch, terminal) [
38] and the iterative construction of an air route network in the urbanized region [
39] are executed. The generated waypoint data are imported into the 3D animation software in a specific file exchange format to create a multi-scene and multi-level virtual animation scene for UAVs’ low-altitude public air routes. This is shown in the blue block in
Figure 1; see
Section 3.3 for specific methods.
(3) The 3D virtual model of the sandbox is generated by multi-view 3D reconstruction. A model-based marker-less tracking and registration method is used: in the offline stage, multi-view reference images of the virtual model are collected, and the camera position and attitude information are recorded in an Extensible Markup Language (XML) file. In the online stage, the key frame image collected in real time is template-matched against the gradient information of the offline reference images, and a roughly estimated pose is obtained from the XML file. By matching the key frame image with the local natural feature descriptors of the reference image, the correspondences between 2D and 3D points are obtained to accurately calculate the camera pose. This is shown in the green block in
Figure 1; see
Section 3.4 for specific methods.
(4) Based on the depth information of the 3D virtual sandbox model, the stencil buffer of the GPU real-time rendering pipeline is used to create a transparent mask, and the fusion effect of virtual-real occlusion is realized through stencil testing and depth testing. This is shown in the yellow block in
Figure 1; see
Section 3.5 for specific methods.
Figure 1.
Methodological framework of SkyroadAR. The red block represents
Section 3.2 Physical Sandbox Construction, the blue block represents
Section 3.3 AR Scene Production for UAVs’ Skyroad, the green block represents
Section 3.4 Model-Based Marker-less Registration and Tracking, and the yellow block represents
Section 3.5 GPU-Based Virtual-Real Occlusion Handling.
3.2. Physical Sandbox Construction
Representing the environment in the form of miniature entities can present abstract geographical cognition as a 3D model and enhance multi-scale spatial imagination. With the development of 3D printing technology, the use of GIS and CAD software together with high-precision inkjet technology to quickly generate a realistic sandbox and reduce the cost of manual sandbox production has been widely welcomed. This paper takes an area of about 25 square kilometers in Shekou, Nanshan District, Shenzhen, Guangdong Province, China, as the research area, and builds a physical sandbox with a size of 3.6 m × 3 m.
3.2.1. GIS Data Preparation and Processing
The low-altitude geographical environment of UAV flight is complex and changeable, and the factors affecting route safety in this environment are defined as route-sensitive constraints, such as terrain, buildings, roads, water bodies, vegetation, and power lines (or poles). First, the geographic boundary of the planned UAV Skyroad is determined. Basic geographic information data, such as a digital elevation model (DEM), are collected from open data-sharing platforms, and building boundaries, roads, vegetation, water bodies and other information are obtained through intelligent interpretation of high-resolution remote sensing images [
40], as shown in
Figure 2.
3.2.2. CAD Data Generation
Except for the DEM, which is stored in a raster image format and mapped with 2D grayscale symbology, most of the basic GIS data are vector boundary information stored in the shapefile (.shp) or JSON format. A few sources, such as oblique photogrammetric models and laser point cloud data, are stored directly in 3D formats. Before 3D printing, all of these need to be converted into a unified stereolithography (STL) format.
The DEMto3D plug-in in QGIS is used to project the DEM raster data of the research area into the Universal Transverse Mercator coordinate system, determine the model scale according to the output printing size set by the user, and adjust the elevation exaggeration coefficient to balance the horizontal and vertical visual proportions. The STL file is then generated by calculating the grid vertex coordinates.
For the vector boundary data, SuperMap software is used to extrude 3D models according to the height attribute and export them in STL format. The specific process of converting GIS data to CAD data is shown in
Figure 3, and the generated CAD data of the study area are shown in
Figure 4.
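For illustration, the core of this DEM-to-STL conversion can be sketched in a few lines of Python. This is a minimal sketch rather than the DEMto3D plug-in itself; the synthetic elevation grid, output file name, scale and exaggeration values are placeholders chosen only to keep the example self-contained.

```python
import numpy as np

def dem_to_stl(dem, cell_size, scale, z_exaggeration, path):
    """Triangulate a regular elevation grid and write an ASCII STL file.

    dem            -- 2D array of elevations (m)
    cell_size      -- ground resolution of one DEM cell (m)
    scale          -- model scale (e.g., 1/2000 for the printed sandbox)
    z_exaggeration -- vertical exaggeration for horizontal/vertical visual balance
    """
    rows, cols = dem.shape
    xs = np.arange(cols) * cell_size * scale          # grid vertex coordinates in model space
    ys = np.arange(rows) * cell_size * scale
    zs = dem * scale * z_exaggeration
    with open(path, "w") as f:
        f.write("solid sandbox\n")
        for i in range(rows - 1):
            for j in range(cols - 1):
                v00 = (xs[j], ys[i], zs[i, j])
                v10 = (xs[j + 1], ys[i], zs[i, j + 1])
                v01 = (xs[j], ys[i + 1], zs[i + 1, j])
                v11 = (xs[j + 1], ys[i + 1], zs[i + 1, j + 1])
                # Two triangular facets per grid cell; normals are left for the slicer to recompute
                for tri in ((v00, v10, v11), (v00, v11, v01)):
                    f.write("  facet normal 0 0 0\n    outer loop\n")
                    for v in tri:
                        f.write(f"      vertex {v[0]:.3f} {v[1]:.3f} {v[2]:.3f}\n")
                    f.write("    endloop\n  endfacet\n")
        f.write("endsolid sandbox\n")

# Synthetic 50 x 50 DEM with 10 m cells, printed at 1:2000 with 1.5x vertical exaggeration
dem = 30.0 + 5.0 * np.random.rand(50, 50)
dem_to_stl(dem, cell_size=10.0, scale=1.0 / 2000.0, z_exaggeration=1.5, path="sandbox_terrain.stl")
```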
3.2.3. Sandbox Printing
The 3D printer prints one cross-section with a certain micro-thickness and a specific shape at a time and bonds the sections layer by layer to form a 3D model. Since the STL file format only stores the discrete triangular patch information of the CAD model surface, its simple data structure is well suited to slicing-based 3D printers. Specific materials (such as foam boards) are used to print the various geographical constraint elements as a white model, onto which the texture and color extracted from remote sensing images are painted; sound, light, and electrical systems are then added to produce a vivid physical sandbox, as shown in
Figure 5.
3.3. AR Scene Production for UAVs’ Skyroad
In order to express the concept of low-altitude public routes on the physical sandbox, it is necessary to plan air routes based on GIS data and create a virtual air route scene by simulating the flight of the UAVs along the Skyroad. By integrating the virtual scene with the physical sandbox, the Skyroad planning and application effect of UAVs can be realistically displayed.
3.3.1. Low-Altitude Flight Environmental Modeling for UAVs
For the complex and changeable low-altitude geographical environment, Skyroad planning needs to comprehensively consider the specific geographical elements to be avoided (such as buildings and power lines) and important flight conditions (such as noise and privacy), so as to set different altitude levels and safety intervals for the route. Intelligent interpretation technology is used to extract information on the various sensitive constraints, combined with risk assessment to determine the boundaries of geo-fences, while parametric modeling methods are used to establish a 3D representation model of the low-altitude flight environment. The geo-fence is divided into four levels, the airworthiness zone, the buffer zone, the warning zone, and the no-fly zone, in which the permitted level of UAV activity is progressively reduced, as shown in
Figure 6.
3.3.2. Multi-Level and Regional Air Route Network Planning
According to the IEEE standard for planning UAVs’ low-altitude public air routes based on massive geographic information, the construction of a national-scale UAV Skyroad is divided into four levels according to operating capability (backbone routes, trunk routes, branch routes, and terminal (regional) routes [
11]), which can be obtained from multiple geographic constraints using a path search algorithm [
38]. At the regional scale, the initial regional airway network is generated by vertically lifting the ground road network, and the airway network is constructed and iteratively improved in five steps under the condition of utilizing or avoiding geographical constraint elements [
39], as shown in
Figure 7.
3.3.3. Animation of Virtual Skyroad Scene
In order to dynamically express the Skyroad scene of UAVs’ operation, a multi-level route network is designed according to the application requirements, and used to verify altitude conversion and approach rules [
For normal operation scenarios, such as urban passenger or cargo transport and logistics distribution, a general air route with two-way operation lanes is set. For special scenarios, such as sightseeing and tourism, power (or road) inspection, surveying and mapping, and cross-sea logistics transportation, dedicated and public air routes are both set up to achieve staggered-peak operation, so that information technology can be used to share the Skyroad’s right of way and make rational use of airspace resources. As shown in
Figure 8, the scale-transformed virtual sandbox model can be used to establish the UAV flight animation in the scene, and 3D models of points of interest (POIs) and geo-fences can be added to vividly display a virtual Skyroad scene.
3.4. Model-Based Marker-less Registration and Tracking
Since the Skyroad scene is represented in a 3D Cartesian coordinate system that is not referenced to the real world, superimposing the virtual Skyroad on the physical sandbox requires aligning the scale of the virtual scene with that of the real scene and obtaining, in real time, the spatial relationship between the camera and the target as the target position changes in the real scene. This relationship is represented by Formula (1).
$$ s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \mid t]\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \qquad (1) $$
Here, $s$ denotes the scale factor, and a 3D point $(X_w, Y_w, Z_w)$ corresponds to the homogeneous coordinates $(u, v, 1)$ of a 2D point on the image through the pose matrix, which is equal to $[R \mid t]$, where $R$ is a 3 × 3 rotation matrix and $t$ denotes the translation vector (a 3 × 1 column matrix); together, they constitute the camera’s extrinsic parameter matrix, while $K$ denotes the intrinsic parameter matrix, which is independent of camera motion, as shown in Formula (2).
$$ K = \begin{bmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \qquad (2) $$
Here, $f_x$ and $f_y$ depend on the focal length and the size of the photosensitive element, $(c_x, c_y)$ is the principal point of the camera, and the camera tilt (skew) factor $\gamma$ approximately equals zero.
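The projection in Formulas (1) and (2) can be written out directly in NumPy. The following sketch is only a worked example of the equations; the intrinsic and extrinsic values are placeholders rather than the calibrated parameters of our camera.

```python
import numpy as np

# Hypothetical intrinsic matrix K (Formula (2)): focal terms fx, fy, principal point (cx, cy), zero skew
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical extrinsic parameters [R | t]: a small rotation about the y axis plus a translation
theta = np.deg2rad(10.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([[0.05], [0.0], [1.2]])

def project(point_world):
    """Apply Formula (1): s [u, v, 1]^T = K [R | t] [X, Y, Z, 1]^T."""
    p_cam = R @ point_world.reshape(3, 1) + t   # world coordinates -> camera coordinates
    uvw = K @ p_cam                             # camera coordinates -> homogeneous image coordinates
    return (uvw[:2] / uvw[2]).ravel()           # divide out the scale factor s

print(project(np.array([0.1, -0.05, 0.0])))     # pixel coordinates (u, v) of a point on the sandbox
```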
The intrinsic parameters are stable after camera calibration, so accurately recovering the camera’s extrinsic parameters is the primary concern of AR registration and tracking. In this paper, a model-based tracking method is adopted. The physical sandbox is reconstructed to generate a virtual 3D model. In the offline stage, reference images from various viewpoints with different angles and distances are collected by simulating real camera movement with a virtual camera; the pose parameters of the virtual camera are recorded, together with the edges and natural feature points in each reference image. In the online stage, the extrinsic parameter matrix of the camera is recovered by template-matching and feature-matching between the real-time 2D image and the reference images.
3.4.1. 3D Reconstruction of Physical Sandbox
The multi-view images are collected by shooting around the sandbox with the camera, and the feature points of each image are extracted using the scale-invariant feature transformation (SIFT) operator. Sparse point-line feature-matching is employed to calculate the internal and external orientation elements, and they are optimized using bundle adjustment. The 3D point cloud of the target is acquired based on dense matching, and finally a 3D model is generated through triangulation and texture mapping. We add real scale constraints in the reconstruction process so that the ratio of the reconstructed virtual sandbox to the real physical sandbox is 1:1, which is used as the scale benchmark of the virtual Skyroad scene to achieve virtual-real scale matching. The sandbox 3D reconstruction is shown in
Figure 9.
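The sketch below illustrates only the SIFT feature-matching step that feeds the photogrammetric pipeline (relative orientation, bundle adjustment and dense matching are handled by dedicated software). It assumes OpenCV, and the image file names are hypothetical.

```python
import cv2

# Two overlapping photos taken while moving around the physical sandbox (hypothetical file names)
img1 = cv2.imread("sandbox_view_01.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("sandbox_view_02.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT keypoints and descriptors for each view
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Nearest-neighbour matching with Lowe's ratio test to keep reliable tie points
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} tie points between the two views")

# These correspondences are the input to relative orientation, bundle adjustment and dense
# matching, which produce the textured 3D sandbox model used as the scale benchmark.
```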
3.4.2. Offline Training of Gradient-Based Contour Directions and ORB Features
In the offline training phase, by simulating the real camera motion parameters, the 3D world coordinates of the virtual sandbox model are projected to the image plane to generate reference views under different view angles, and the gradient directions of the contours in the reference views and the oriented FAST and rotated BRIEF (ORB) features are extracted, saved and encoded.
Assume that the virtual sandbox is located at the center of the spherical coordinate system formed by the camera orbit motion, and that the optical axis of the camera always passes through this center. By specifying sampling intervals for the longitude, latitude and distance of the spherical parameters, the viewing range covered by the camera under different view angles and different distances is limited to a certain spherical hexahedron, as shown in
Figure 10.
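As an illustration of how the offline viewpoints can be enumerated, the sketch below samples virtual camera poses on the viewing sphere, with the optical axis always passing through the sandbox center. The longitude, latitude and distance ranges are hypothetical values, not the ones actually used for our sandbox.

```python
import numpy as np

def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """World-to-camera rotation R and translation t with the optical axis through the target."""
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    R = np.vstack([right, true_up, forward])   # rows: camera x, y, z axes expressed in the world frame
    t = -R @ eye
    return R, t

# Hypothetical sampling intervals: longitude every 15 deg, latitude 20-70 deg every 10 deg,
# four viewing distances (metres); each pose is used to render one reference view offline.
poses = []
for lon in np.deg2rad(np.arange(0, 360, 15)):
    for lat in np.deg2rad(np.arange(20, 71, 10)):
        for dist in (1.5, 2.0, 2.5, 3.0):
            eye = dist * np.array([np.cos(lat) * np.cos(lon),
                                   np.cos(lat) * np.sin(lon),
                                   np.sin(lat)])
            poses.append(look_at(eye))

print(f"{len(poses)} reference camera poses for offline view rendering")
```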
We draw on the hierarchical tree approach proposed by Wiedemann [
42]; the views are roughly sampled at higher image pyramid levels to speed up recognition in the online stage. View sampling starts from the lowest image pyramid level, and calculates the similarity between views at adjacent camera positions by applying a contour similarity measurement (Formula (5)). The pair of views with the highest similarity is merged into a new view, and the similarity between its neighbors is calculated. This process is repeated until the similarity is below a certain threshold. As shown in
Figure 11a, the views remaining after merging are stored at the lowest (original) hierarchical level. In order to derive the view of the next higher hierarchical level, the similarity constraint is relaxed by reducing the image resolution while continuing to merge according to the similarity measure, as shown in
Figure 11b–d. The subviews are views of the lower pyramid level that have been merged to obtain views at the current pyramid level, or views that could not be merged. Each view stores references to all of its subviews via the hierarchical tree structure. This information is used during the online phase to trace a view matched at a higher pyramid level down to the lower pyramid levels to refine the match.
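The following toy sketch conveys the idea of the similarity-driven view merging: the most similar pair of views is merged greedily until no pair exceeds a threshold, while the merged subviews are recorded for the hierarchical tree. The similarity function and threshold here are stand-ins (a simple cosine similarity on dummy descriptors), not the contour measure of Formula (5).

```python
import numpy as np

def merge_views(views, similarity, threshold):
    """Greedily merge the most similar pair of views; each surviving view keeps its subviews."""
    views = [dict(v, subviews=[v]) for v in views]
    while len(views) > 1:
        best, pair = -np.inf, None
        for i in range(len(views)):
            for j in range(i + 1, len(views)):
                s = similarity(views[i], views[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:                      # stop when no pair is similar enough
            break
        i, j = pair
        merged = {"descriptor": 0.5 * (views[i]["descriptor"] + views[j]["descriptor"]),
                  "subviews": views[i]["subviews"] + views[j]["subviews"]}
        views = [v for k, v in enumerate(views) if k not in (i, j)] + [merged]
    return views

# Toy example: twelve dummy "views" with random descriptors merged at one pyramid level
rng = np.random.default_rng(0)
toy_views = [{"descriptor": rng.normal(size=8)} for _ in range(12)]
cosine = lambda a, b: float(np.dot(a["descriptor"], b["descriptor"]) /
                            (np.linalg.norm(a["descriptor"]) * np.linalg.norm(b["descriptor"])))
level0 = merge_views(toy_views, cosine, threshold=0.2)
print(f"{len(level0)} views remain at this pyramid level")
```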
The outline template for each view is extracted within the hierarchical tree. Denoising and preprocessing are completed by Gaussian low-pass filtering, following the linear parallel multi-modal LINE-MOD template-matching method [
43]. The contours of the sandbox are calculated with the Sobel operator, and the phase function is used to calculate the gradient direction of each contour pixel in the three channels (red, green and blue); the direction of the channel with the maximum gradient magnitude is taken, as shown in Formula (3), where $\mathrm{ori}(I,\mathbf{x})$ represents the gradient direction of image $I$ at position $\mathbf{x}$, expressed in radians:
$$ \mathrm{ori}(I,\mathbf{x}) = \mathrm{ori}\big(\hat{C}(\mathbf{x})\big) \qquad (3) $$
where
$$ \hat{C}(\mathbf{x}) = \operatorname*{argmax}_{C \in \{R,G,B\}} \left\lVert \frac{\partial C}{\partial \mathbf{x}} \right\rVert \qquad (4) $$
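A hedged OpenCV/NumPy sketch of this per-pixel computation is given below: per-channel Sobel gradients are computed after Gaussian smoothing, and the direction of the channel with the largest gradient magnitude is kept, as in Formulas (3) and (4). The random image stands in for a rendered reference view, and the contour threshold is an arbitrary placeholder.

```python
import cv2
import numpy as np

# Stand-in for a rendered reference view of the virtual sandbox (random data keeps the sketch runnable)
img = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)
img = cv2.GaussianBlur(img, (5, 5), 0)                       # Gaussian low-pass denoising

# Per-channel Sobel derivatives; gradient magnitude and direction (radians, as cv2.phase would give)
gx = np.stack([cv2.Sobel(img[:, :, c], cv2.CV_32F, 1, 0, ksize=3) for c in range(3)], axis=-1)
gy = np.stack([cv2.Sobel(img[:, :, c], cv2.CV_32F, 0, 1, ksize=3) for c in range(3)], axis=-1)
mag = np.sqrt(gx ** 2 + gy ** 2)
ori = np.arctan2(gy, gx)

# Formulas (3)/(4): at each pixel, keep the direction of the channel with the strongest gradient
strongest = np.argmax(mag, axis=-1)
rows, cols = np.indices(strongest.shape)
ori_map = ori[rows, cols, strongest]

# Only strong-gradient (contour) pixels are retained for the outline template
contour_mask = mag.max(axis=-1) > 0.7 * mag.max()
template_positions = np.argwhere(contour_mask)
template_orientations = ori_map[contour_mask]
print(len(template_positions), "contour points in the outline template")
```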
In order to use the rich texture information of the sandbox to obtain accurate matching, the extraction of ORB feature points and the generation of feature descriptors are also performed in the offline stage, and the extracted feature points are back-projected onto the 3D model of the object using the pose of the virtual camera to determine the world coordinates of the corresponding 3D points. Finally, the pose, the contour template, the ORB feature points and feature descriptors, and the corresponding 3D world coordinates of each reference view are stored in the XML file.
Figure 12 shows the contour template and extracted ORB feature points under a certain view.
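The offline ORB stage can be sketched as follows: ORB keypoints and descriptors are extracted from a rendered reference view, and each keypoint is back-projected to a 3D world coordinate using the rendered depth map and the known virtual camera pose. The function signature, the depth-map input and the returned record layout are illustrative; the actual system serializes this information to an XML file.

```python
import cv2
import numpy as np

def train_reference_view(view_img, depth_map, K, R, t):
    """Extract ORB features from one rendered reference view and back-project them to 3D.

    view_img  -- rendered reference image of the virtual sandbox
    depth_map -- per-pixel depth of the rendered model surface (camera frame)
    K, R, t   -- intrinsic matrix and virtual camera pose used to render the view
    """
    orb = cv2.ORB_create(nfeatures=500)
    keypoints, descriptors = orb.detectAndCompute(view_img, None)

    K_inv = np.linalg.inv(K)
    points_3d = []
    for kp in keypoints:
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        z = depth_map[v, u]                                          # model depth at this pixel
        p_cam = z * (K_inv @ np.array([kp.pt[0], kp.pt[1], 1.0]))    # back-project to camera frame
        p_world = R.T @ (p_cam - t.ravel())                          # camera frame -> world frame
        points_3d.append(p_world)

    # One record per reference view: pose, ORB keypoints/descriptors, and their 3D world coordinates
    # (kept alongside the contour template; the real system writes these records to XML).
    return {"R": R, "t": t, "keypoints": cv2.KeyPoint_convert(keypoints),
            "descriptors": descriptors, "points_3d": np.array(points_3d)}
```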
3.4.3. Online Matching and Pose Optimization
In the online stage, by matching the similarities between the offline training template image and the current input image, the 3D pose of the physical sandbox relative to the camera is roughly determined, and the range of ORB feature points is narrowed down to accurately estimate the pose.
Recognition starts from the top level of the input image pyramid. All 2D contour templates at this level are searched using Formula (5) to compute the similarity measure between the input image and the contour template.
$$ \varepsilon(I, T, c) = \sum_{r \in P} \left( \max_{t \in R(c+r)} \left| \cos\big( \mathrm{ori}(O, r) - \mathrm{ori}(I, t) \big) \right| \right) \qquad (5) $$
where $\varepsilon(I, T, c)$ is the similarity between input image $I$ and template $T$ at position $c$; $\mathrm{ori}(O, r)$ denotes the gradient direction at position $r$ in the template image $O$, expressed in radians; $\mathrm{ori}(I, t)$ is the radian representation of the gradient direction at position $t$ in the input image $I$; $P$ is the list of positions $r$, and $O$ is the outline template of the object. $R(c+r)$ represents the neighborhood area centered at the position $c+r$ (position $c$ shifted by the offset $r$), with $\tau$ as the radius of the neighborhood.
The matching similarity measure is computed at each position in the input image using a sliding-window approach, and the contour template parameters (position, rotation, scale) that exceed the similarity threshold are stored in the matching candidate list. At the next lower hierarchical tree level, refinement is performed by computing the similarity measure between the 2D contour templates of all subviews and the input image at the current pyramid level, and the scope of template parameters in the candidate list is limited to the immediate neighbors of the parent match. This process is repeated until all matching candidates have been traced down to the lowest pyramid level. Through hierarchical trees and image pyramids, this approach can greatly speed up the matching.
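A NumPy sketch of Formula (5) for a single candidate position is shown below; it sums, over all template contour points, the best absolute cosine of the gradient-direction difference found within a small neighborhood of the shifted position. The neighborhood radius and the normalization by the number of contour points (so that a threshold in [0, 1] can be applied) are assumptions of this sketch.

```python
import numpy as np

def contour_similarity(input_ori, template_positions, template_ori, c, radius=2):
    """Formula (5) at one candidate position c: for each template contour offset r, take the best
    |cos(ori_template(r) - ori_input(t))| over a small neighborhood R(c + r) of the input image."""
    h, w = input_ori.shape
    score = 0.0
    for (r_y, r_x), o_t in zip(template_positions, template_ori):
        y, x = c[0] + r_y, c[1] + r_x
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        if y0 >= y1 or x0 >= x1:
            continue                                     # neighborhood falls outside the image
        neighborhood = input_ori[y0:y1, x0:x1]
        score += np.max(np.abs(np.cos(o_t - neighborhood)))
    return score / max(len(template_ori), 1)             # normalized so a threshold in [0, 1] applies

# In the sliding-window search, this measure is evaluated at every candidate position (and at every
# pyramid level), and candidates above the similarity threshold are kept for refinement.
```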
After the key frame image whose pose is most similar to that of the input image is obtained through template-matching, a more accurate camera pose is estimated using ORB natural feature point-matching [
44]. By inputting the 2D image points and the corresponding 3D feature points of the key frame, the pose matrix $[R \mid t]$ in Formula (1) for the current camera is accurately computed with the PnP algorithm [
45]. Utilizing the continuity between adjacent frames, the Lucas–Kanade sparse optical flow algorithm [
46] is used to predict the coordinates of the image feature points in the next frame during tracking, which reduces the time spent repeatedly extracting and identifying features over the entire image and improves the tracking speed and real-time performance of the algorithm. Template-matching is reactivated only when the number of matched feature points falls below a set threshold, for example, when the view angle or distance changes greatly.
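A condensed OpenCV sketch of this online stage is given below: the 2D-3D correspondences recovered from the matched key frame are passed to a PnP solver (here solvePnPRansac stands in for the PnP algorithm of [45]), and Lucas–Kanade sparse optical flow carries the 2D feature points into the next frame. Parameter values such as the window size and the fallback threshold are assumptions.

```python
import cv2
import numpy as np

def estimate_pose(points_3d, points_2d, K, dist_coeffs=None):
    """Recover the extrinsic parameters [R | t] of Formula (1) from 2D-3D correspondences."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float32), points_2d.astype(np.float32),
        K, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)                      # rotation vector -> 3 x 3 rotation matrix
    return ok, R, tvec, inliers

def track_to_next_frame(prev_gray, next_gray, prev_pts):
    """Predict feature locations in the next frame with Lucas-Kanade sparse optical flow, so that
    full-image feature extraction does not have to be repeated for every frame."""
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts.astype(np.float32), None,
        winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    return prev_pts[good], next_pts[good]

# If the number of surviving points drops below a threshold (e.g. after a large change of view
# angle or distance, or a full occlusion), the system falls back to template-matching to
# re-initialize the pose.
```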
3.5. GPU-Based Virtual-Real Occlusion Handling
After determining the spatial relationship between the camera and the sandbox, the created virtual Skyroad scene needs to be superimposed on the physical sandbox, and in order to render the virtual scene more realistically, the correct occlusion relationship between them must be obtained. The virtual sandbox model can be used to obtain the depth information of the physical sandbox in the scene and realize virtual-real occlusion handling through the stencil test and depth test of the GPU graphics-rendering pipeline. The specific process is shown in
Figure 13.
Firstly, the depth map of the reconstructed 3D sandbox model is obtained according to the current pose of the camera, and the camera projection matrix is used to transform the 3D points of the visible surface into the camera coordinate system. The rendered model is set not to output any color (that is, it is displayed transparently), thereby generating a transparent mask, and its distance from the camera (that is, the depth value) is stored.
Next, all the pixels in the current view are traversed. If a pixel is determined to lie inside the outline of the sandbox, its value is set to 1; otherwise, it is set to 0, yielding the occlusion stencil.
Then, a stencil test is performed on the pixels of the virtual scene in the fragment shader; according to the occlusion stencil, pixels with a value of 1 pass the stencil test, and the RGB color information of these pixels is stored in the color buffer. Pixels that fail the stencil test are discarded.
Finally, a depth test is performed on the pixels that have passed the stencil test, and each depth value is compared with the one already stored in the depth buffer. If the new depth value is smaller than the original one, the new pixel value replaces the original pixel value and its RGB information is stored in the color buffer, so that the virtual Skyroad scene occluded by the transparent mask is drawn, as shown in
Figure 14.
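To make the stencil/depth logic concrete, the sketch below emulates it on the CPU with NumPy: the depth map of the transparently rendered sandbox provides both the occlusion stencil (pixels covered by the sandbox outline) and the reference depth, and a virtual-scene fragment is composited over the camera image only where it passes both tests. In the actual system this runs in the GPU fragment pipeline; the toy frame size and depth values here are purely illustrative.

```python
import numpy as np

def composite_with_occlusion(camera_rgb, sandbox_depth, virtual_rgb, virtual_depth,
                             background_depth=np.inf):
    """CPU emulation of the stencil test + depth test used for virtual-real occlusion.

    camera_rgb    -- live camera frame (H x W x 3)
    sandbox_depth -- depth of the transparently rendered sandbox model (H x W);
                     background_depth where the model does not cover the pixel
    virtual_rgb   -- rendered virtual Skyroad scene (H x W x 3)
    virtual_depth -- depth of the virtual scene (H x W); background_depth where empty
    """
    # Occlusion stencil: 1 inside the sandbox outline, 0 elsewhere
    stencil = (sandbox_depth < background_depth).astype(np.uint8)

    # Stencil test: only virtual fragments inside the outline are considered further
    candidate = (stencil == 1) & (virtual_depth < background_depth)

    # Depth test: a virtual fragment is drawn only if it lies in front of the sandbox surface
    visible = candidate & (virtual_depth < sandbox_depth)

    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out

# Toy 4 x 4 frame: the sandbox covers the right half at depth 2.0; one virtual fragment sits in
# front of it (depth 1.0) and one behind it (depth 3.0)
camera = np.zeros((4, 4, 3), dtype=np.uint8)
sandbox_d = np.full((4, 4), np.inf)
sandbox_d[:, 2:] = 2.0
virtual = np.full((4, 4, 3), 255, dtype=np.uint8)
virtual_d = np.full((4, 4), np.inf)
virtual_d[1, 2], virtual_d[2, 2] = 1.0, 3.0
result = composite_with_occlusion(camera, sandbox_d, virtual, virtual_d)
print(result[1, 2], result[2, 2])   # the first fragment is drawn, the second is occluded
```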
5. Discussion
5.1. The Efficiency of Tracking
As shown in
Figure 17, the marker-less tracking method employed by SkyroadAR (red line) demonstrates faster initialization during the first stage of tracking compared to the Vuforia model target module tracking method (green line). This is because Vuforia directly adopts ORB feature point-matching, while SkyroadAR narrows down the search range of ORB feature points through hierarchical tree-based contour template-matching and thus reduces the time consumed in feature point-matching. After successful tracking, the camera moves smoothly to the right side of the sandbox. The sparse optical flow method is used for follow-up tracking in SkyroadAR, which is robust to distance and scale changes under stable lighting conditions. Vuforia employs a similar processing method, making the overall time-consumption trend of the second stage similar for both systems.
After swiftly moving to the left side of the sandbox, the algorithm loses tracking due to the significant camera movement, which leaves too few feature points. SkyroadAR reverts to template-matching for registration; because of the hierarchical tree structure, the search is only performed in the vicinity of the subviews, resulting in lower time consumption than in the first stage. This is nevertheless slightly higher than the time consumption of Vuforia, since Vuforia integrates the attitude sensor data of the mobile device for tracking and registration. However, due to the cumulative error of the attitude sensors, Vuforia’s tracking accuracy declined, while the virtual-real fusion effect of SkyroadAR remained stable.
When using the palm for full occlusion, feature points are completely lost, leading to an increase in tracking time. However, when the palm is partially removed, SkyroadAR utilizes the key frame before occlusion to narrow the search range of ORB feature points, resulting in faster registration compared to Vuforia.
5.2. The Effect of Occlusion Handling
According to the questionnaire results, users gave an average score of 4.09 for the hypothetical question Q3 on the occlusion effect. This indicates that the occlusion handling in this paper reduces users’ false spatial cognition and enhances scene realism. The occlusion handling effectively utilizes the prior depth information of the reconstructed model and GPU-based shader rendering, ensuring correct and fast occlusion without compromising the user experience. However, this method has limitations. The occlusion effect relies on the fineness of the 3D model and the tracking accuracy. Due to the limited accuracy of pre-modeling, occlusion may not be ideal in areas with fine local structure, resulting in fragmented and jagged occlusion edges. Additionally, pre-modeling limits AR flexibility and cannot handle dynamic occlusion.
5.3. System Usability and Usefulness Evaluation
The user experience questionnaire results show mean scores for seven questions that are higher than the theoretical mean, with narrow gaps around the average value (standard deviations ranging from 0.49 to 1.16). The single-sample t-test indicates a significant difference between the average score of Q1 (T = 27.12, p < 0.05) and the theoretical average score, demonstrating that most users agree with the assumption that SkyroadAR is an intuitive and effective visualization system. Q2 (T = 13.1, p < 0.05) and Q3 (T = 14.39, p < 0.05) indicate that the acceptance of tracking and occlusion handling effects surpasses the average. SkyroadAR excels in perceived usability.
In terms of perceived ease of use, Q5 (T = 0.66, p > 0.05) suggests that users perceive the ease of interaction to be slightly higher than the theoretical average score, with minimal difference. However, Q6 (T = −4.18, p < 0.05) indicates that the current software interaction has certain limitations, differing significantly from the assumption, and users desire more interactive functions. This is because the primary objective of SkyroadAR is to visualize the operating concept of UAVs’ low-altitude public air route. Due to the substantial difference between mobile device operation and traditional keyboard-mouse interaction, desired interactive functions such as a precise interactive air-route design module have not yet been developed. Q7 (T = 16.86, p < 0.05) and Q8 (T = 4.32, p < 0.05) indicate user affirmation of the useful role of SkyroadAR in the forward-looking research and development of Skyroad. Both professional and non-professional users find this innovative approach valuable for UAV regulation.
5.4. The Opportunities and Challenges of SkyroadAR
We introduced a novel AR visualization framework for UAVs’ low-altitude public air route. It is a geovisualization method that not only applies to Skyroad but also benefits other domains with spatio-temporal information, including trajectory data analysis, environmental simulation, and GIS data visualization. Furthermore, the method does not require a specific AR device; other head-mounted display (HMD) systems, such as Hololens, can also be used as a carrier for visualization. These characteristics make SkyroadAR a promising and feasible AR visualization method. On the other hand, geographic information data should not only be used to reconstruct the sandbox, but should also be integrated with IMU, RTK-GPS and other multi-sensor data to increase the intelligence of AR full-space scene registration and tracking, and advanced technologies such as cloud rendering should be used to further improve rendering authenticity, accuracy and efficiency.
Since the SkyroadAR prototype aimed to display the planned UAVs’ Skyroad on the physical sandbox, it primarily addressed markerless tracking and virtual-real occlusion, leaving room for improvement in AR interaction. In the future, alignment with domain experts’ needs and targeted user evaluations will enhance software quality. For instance, in research on urban crowdsensing for complex tasks like UAV route planning [
48] and resource scheduling [
49], using SkyroadAR visualization to verify the results offers an innovative avenue for in-depth exploration.
6. Conclusions
In this paper, we proposed an innovative AR sandbox visualization framework for UAVs’ low-altitude public air route, and developed the SkyroadAR prototype system. The framework provides an intuitive and effective visualization for the forward-looking low-altitude digital transport infrastructure. We examined the key technologies in sandbox reconstruction, Skyroad scene production, tracking and registration, and virtual-real occlusion. The system’s usability, ease of use, and user intentions were verified through system performance experiments and user questionnaires. The experimental results demonstrate that superimposing the virtual Skyroad scene on the physical sandbox offers an intuitive and efficient environment for expressing UAVs’ low-altitude public air route. The LINE-MOD template-matching method, improved with hierarchical trees and image pyramids, enhanced tracking speed and accuracy. The transparent mask created from the sandbox model handles occlusion effectively in the GPU graphics-rendering pipeline, enhancing the user’s AR experience at low computational cost.
Nevertheless, the user questionnaire feedback indicates system shortcomings in the interaction function, with the current AR primarily focusing on expressing the Skyroad concept on the sandbox. In future research, specific outdoor UAV air-route AR visualization should be conducted, and the potential of AR methods in assisting UAVs’ low-altitude management and applications should be further explored.
In addition, with the development of AI and space technology, AR and GIS visualization will not be limited to sandbox applications. An intelligent perception and understanding of real geographical scenes will be crucial for future AR in fields such as robotics and autonomous driving. AR autonomous positioning technology combined with GIS map semantics is expected to improve human–machine coupled indoor and outdoor spatial cognition. Combining AR, AI and GIS map semantics will enhance indoor and outdoor spatial cognition and accelerate the development of maps from 2D to 3D, and eventually to 4D, enabling comprehensive geographic expression.