Multi-Size Voxel Cube (MSVC) Algorithm—A Novel Method for Terrain Filtering from Dense Point Clouds Using a Deep Neural Network

Štroner, Martin; Boušek, Martin; Kučera, Jakub; Váchová, Hana; Urban, Rudolf

doi:10.3390/rs17040615

Open AccessArticle

Multi-Size Voxel Cube (MSVC) Algorithm—A Novel Method for Terrain Filtering from Dense Point Clouds Using a Deep Neural Network

by

Martin Štroner

^*

,

Martin Boušek

,

Jakub Kučera

,

Hana Váchová

and

Rudolf Urban

Department of Special Geodesy, Faculty of Civil Engineering, Czech Technical University in Prague, Thákurova 7, 166 29 Prague, Czech Republic

^*

Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(4), 615; https://doi.org/10.3390/rs17040615

Submission received: 1 December 2024 / Revised: 7 January 2025 / Accepted: 10 February 2025 / Published: 11 February 2025

(This article belongs to the Special Issue New Perspectives on 3D Point Cloud (Third Edition))

Download

Browse Figures

Versions Notes

Abstract

:

When filtering highly rugged terrain from dense point clouds (particularly in technical applications such as civil engineering), the most widely used filtering approaches yield suboptimal results. Here, we proposed and tested a novel ground-filtering algorithm, a multi-size voxel cube (MSVC), utilizing a deep neural network. This is based on the voxelization of the point cloud, the classification of individual voxels as ground or non-ground using surrounding voxels (a “voxel cube” of 9 × 9 × 9 voxels), and the gradual reduction in voxel size, allowing the acquisition of custom-level detail and highly rugged terrain from dense point clouds. The MSVC performance on two dense point clouds, capturing highly rugged areas with dense vegetation cover, was compared with that of the widely used cloth simulation filter (CSF) using manually classified terrain as the reference. MSVC consistently outperformed the CSF filter in terms of the correctly identified ground points, correctly identified non-ground points, balanced accuracy, and the F-score. Another advantage of this filter lay in its easy adaptability to any type of terrain, enabled by the utilization of machine learning. The only disadvantage lay in the necessity to manually prepare training data. On the other hand, we aim to account for this in the future by producing neural networks trained for individual landscape types, thus eliminating this phase of the work.

Keywords:

point cloud; ground filtering; neural network; machine learning

Graphical Abstract

1. Introduction

Point clouds are nowadays commonly used to describe object surfaces, including the surface of the Earth. Unlike standard geodetic methods using, e.g., total station or a geodetic GNSS-RTK receiver measuring individual significant terrain points, point clouds are typically non-selectively acquired in irregular grids using methods such as airborne laser scanning (ALS) [1], terrestrial 3D scanning [2], photogrammetry [3], or mobile laser scanning systems mounted on terrestrial vehicles [4]. More recently, unmanned airborne vehicles (UAVs) equipped with cameras [5] and/or lidar systems [6] have become widely used for this purpose. Lately, a great improvement in the quality and reliability of measurements has been observed in mobile (typically operator-carried) SLAM (simultaneous localization and mapping) scanners [7] that can be successfully used even in areas without GNSS signal and with limited visibility (such as underground spaces) [8,9].

All point clouds, regardless of their different characters (density, absolute or relative accuracy, origin, etc.), inherently include points that do not represent the surface of interest and need to be filtered out using one or more filtering methods. Extraction of the points representing the ground (ground filtering, vegetation filtering) is a typical example of such filtering. Many methods and algorithms for this process have been developed, typically based on a single attribute, allowing us to distinguish between the points of interest and other points. The selection of the filtering method depends, among other things, also on the sensor type; the suitability of individual filters for the data acquired using different types of sensors is discussed in several systematic reviews [10,11,12].

The most common filters are based on inclination/slope (e.g., [13,14,15,16,17]), 3D alpha shape [18], interpolation (e.g., [19,20,21,22]), morphology (e.g., [23,24,25]), or segmentation (e.g., [26,27]). Other types of filters include, for example, the statistics-based filter [28], the cloth simulation filter [29], a combination of the cloth simulation filter with a progressive TIN densification [30], or the MDSR filter based on iterative determination of the lowest terrain points taken from multiple perspectives [31]). In principle, all these methods assume that the terrain is relatively level, and the slope (or change in slope) does not exceed certain maximum values. Based on these assumptions, an approximation of the terrain is created (triangular network, square grid, or cloth simulation) and the points until a certain distance from it are considered terrain. However, these methods all come with some limitations, such as difficult distinguishing between rocks (ground) and buildings (non-ground).

Recently, however, the trends in new filtering methods have shifted towards the use of machine learning, which can easily be designed as multicriterial and, therefore, more universal. Here, however, a principal problem occurs—to be able to load an irregular point cloud (note that the points within the point cloud are not in any regular grid or structure) into a neural network, it is necessary to transform the point cloud into a regular structure.

Machine-learning-based methods often employ assessment algorithms based on local features. In other words, they assess each point based on the characteristics calculated for it from its close vicinity. Such algorithms can use, for example, the variance in spheres of various diameters around the point [32], colors of the points [33], or other spatial and spectral features [1,34,35,36]. Researchers also often use approaches from image analysis, transposing the point cloud into 2D, processing it through standard image analysis methods such as convolutional neural networks (CNNs, [37]), and, subsequently, reverse transposing it to a 3D point cloud ([38,39]) from one or more perspectives ([40,41,42,43]). Also, the transformer networks are used [44]. The PointNet [45] architecture and its advanced version PointNet++ [46] have brought about another concept for object recognition in the point cloud; it is, however, applicable rather to spatially limited point clouds and objects. (A more detailed overview of the algorithms is given in Table A1 and Table A2).

Moreover, the aforementioned methods have been developed for sparse ALS point clouds, which is sufficient for most remote sensing applications. Their use on dense data for engineering applications necessitating sub-decimeter resolutions and centimeter accuracies is, therefore, problematic (note that such dense point clouds contain hundreds to tens of thousands of points per m² and the noise can be higher than the actual distance between points).

To be able to perform such detailed filtering from dense point clouds, alternative approaches are necessary. Voxelization is one such possible approach, e.g., [47], that could be applied even to large point clouds. This approach is based on the transformation of the irregular point cloud into a regular structure. So far, however, papers evaluating the benefits of combining point cloud voxelization with neural networks were also applied to sparse point clouds only. Moreover, all these papers used only a single voxel size (at most, compared two sizes). In effect, the voxel size directly determined the resulting terrain resolution and accuracy (a voxel too large yielded a poor resolution, and a voxel too small was excessively computationally demanding as a very high number of the surrounding voxels must be considered to recognize the terrain shape). This limits the usability of the traditional single-size voxel solution to dense clouds.

For this reason, we have proposed an algorithm based on a progressive reduction in voxel size, starting from very large voxels and gradually removing points evaluated by a trained neural network as non-ground. After each such step, the voxel size is reduced, which is repeated until a target resolution is obtained. As such an algorithm is, to the best of our knowledge, missing so far, this paper aimed to (i) present the algorithm in detail, (ii) present its outcomes on two different dense point clouds describing highly rugged terrain, and (iii) compare its results with those obtained using the cloth simulation filter (CSF), which was previously shown to perform better than other freely available filters (SMRF, PMF) on this type of point cloud.

2. Materials and Methods

2.1. Method Principles

Figure 1 provides a 2D illustration of the basic principle of the method, showing the profile of a point cloud in a forested area. The point cloud was voxelized into voxels of a suitable size (2 × 2 × 2 m in Figure 1), and each voxel was characterized by its position in the grid and the number of points it contained. Voxels below the terrain contained no points (or very few points, e.g., noise). The evaluation of each individual voxel was performed on the basis of its surroundings (here, 9 × 9 × 9 voxels, i.e., the central voxel and the four voxels in each principal direction), forming a voxel cube (VC)—the principal assessment unit in this algorithm. It is necessary to keep in mind that there are local differences in the point cloud density associated, e.g., with the differences in the distance of the 3D scanner from the objects. This leads to the acquisition of denser point clouds at locations closer to the scanner. Such differences in the density of a point cloud describing the identical object closer and further from the scanner would impair the training of the neural network, making the recognition of such objects less reliable.

To avoid this issue, the number of points in each individual voxel within each 9 × 9 × 9 voxel cube was normalized (i.e., divided by the total number of points in the entire 9 × 9 × 9 voxel cube). Thus, each voxel was represented by the percentage of points within the voxel cube after each shift of the voxel cube. From the perspective of the neural network, each identical object was then “described” in an identical way regardless of its distance from the scanner. Subsequently, each individual voxel was classified as ground or non-ground by a neural network trained on a part of the investigated point cloud or on a point cloud from a similar area. The points in the voxels classified as ground then entered a second pass with a reduced voxel size, providing finer detail of the terrain. In this way, the voxel size (and detail of the terrain) gradually decreased until the required detail was achieved (Figure 2, Figure 3 and Figure 4).

The initial voxel size (m) and voxel cube size (number of voxels) needed to be chosen in a way to ensure that the voxel cube contained a distinguishable part of the real terrain. For this reason, it was safer to initially select a larger voxel size. A bigger voxel cube size (in the sense of the greater number of included voxels) was capable of providing better terrain identification as it gave the neural network better information on the surroundings. At the same time, however, it increased the computational costs; hence, a compromise size of 9 × 9 × 9 voxels was used for the verification of this method and its performance. In our experience, the gradual reduction in voxel size should be to approx. 75% of that in the previous step, which ensures correct detection even in highly rugged terrain.

The above-described process, however, had an inherent flaw. The red arrows in Figure 3 indicate voxels that included only marginal numbers of terrain points, which would lead to their classification as non-ground. This can be, however, easily prevented by evaluating the point cloud in two voxel grids that are mutually shifted by half of the voxel size in each axis (we will refer to them as the “regular” and “shifted”). Next, all points from the point cloud were classified as ground or non-ground based on whether or not they lay within a voxel classified as ground in at least one of the grids.

The classification of narrow and tall (or deep) terrain features that could be lost during classification using large voxel size posed another possible problem. To rectify this, the algorithm was adjusted to keep (besides the points in the voxels classified as ground) also one additional “envelope” layer of voxels for the next step, ensuring that no terrain points were removed even in the cases of slight misclassification. This envelope, therefore, helped prevent the undesirable removal of features such as narrow rocks that could be erroneously considered trees. Figure 4 shows the gradual action of the filter in 3D, and the classification scheme is visualized in Figure 5.

2.2. The Deep Neural Network and Its Training

As mentioned above, the classification itself was performed using a deep neural network (DNN). The inputs into the triangular DNN were the normalized numbers of the points in individual voxels, i.e., 729 values (9³) for each voxel cube. The first hidden layer had 1458 neurons, and subsequent hidden layers had 729, 364, 182, 91, 45, and 22 neurons, with one output neuron returning binary 0/1 values (1—ground; 0—non-ground). The network was created in Python version 3.11.9. Training and use of the neural network were performed using the TensorFlow 2.17.0 libraries and Keras interface. In training, two types of regularization were used—the L2 layer regularization in all hidden layers, and drop-out regularization, which was inserted between each of the two subsequent hidden layers. The definition of the neural network, including regularization coefficients, is shown in Appendix C. As the terrain shape for each voxel/voxel cube size slightly differed, separate training was performed for each step of the algorithm. To optimize the training process, we started the training from the manually classified data at the finest (required) resolution. As the individual steps of the algorithm used voxels of relatively similar size (75%), the patterns of voxel cubes could be rightly expected to be similar, and the DNN training was, therefore, performed stepwise, always using the previous network as the initial state.

For training, data manually classified in CloudCompare v 2.13.0 were used. For each voxel size, the training data were voxelized accordingly and if the particular voxel contained at least one point representing terrain, the voxel was considered to be a ground-representing voxel. For the training of each voxel size, the point cloud was cropped to a distance of max. 4 voxels from the nearest ground point (which corresponded to the actual classification process, as described above). The training was performed with both regular and shifted grids. Moreover, to allow the best possible training of the general shapes, the DNN was trained also on data rotated by 90°, 180°, and 270°. To account for the differences in the numbers of ground and non-ground voxels in the training data, weights were assigned to the voxel classes to compensate for this.

2.3. Training/Testing Data

2.3.1. Data 1

Data 1 describes a steep slope with rocks. The point cloud was acquired photogrammetrically using the structure from motion (SfM) technique from a manually piloted flight with the UAV DJI Phantom 4. The data were processed in Agisoft Metashape ver. 2.0.1 without any filtering to determine as many terrain points as possible. Considering the very dense vegetation, the resolution was set to ultra-high. Both these settings led to the presence of a considerable amount of noise. No further editing was performed for the purpose of testing. The data are shown in Figure 6 (the training data (area of 80 × 40 × 40 m; Figure 6a; contains 34,859,808 points), the test data (area of 54 × 32 × 47 m; Figure 6b; consists of 17,278,032 points)). In both cases, the resolution is approx. 1 cm.

2.3.2. Data 2

Data 2 were acquired in a densely forested area with rugged terrain using a lidar system DJI mounted on a UAV DJI Matrice 300 RTK. For this paper, four discrete areas (see Figure 7, Table 1) were cut out from the scanned area of approx. 340 × 250 m—one was used as a training dataset, the remaining three as test datasets. The reason that we did not use the entire area lay in the very laborious manual preparation of the reference ground surface, the location of the individual data in the area are shown in Figure A1. Data 2—Boulders (Figure 7c,d) showed a typical area of this site—relatively rugged terrain with large boulders and the predominance of higher vegetation. The area in Data 2—Tower (Figure 7e,f) contained built structures, which were not present in the training data. Data 2—Rugged (Figure 7g,h) showed the most challenging area as it was covered with lower dense vegetation (shrubs), which even lidar often fails to penetrate, and contained highly rugged terrain features.

2.4. Testing and Evaluation Procedure

Each used dataset was manually classified into two classes—ground and non-ground. One area on each site was used for the training of the DNN, and the remaining ones were used for testing. The results of the testing areas produced by DNNs were evaluated (i) visually to be able to assess the character of the potential classification errors and (ii) using standard accuracy characteristics, i.e., balanced accuracy (BA) and F-score (FS). These two characteristics were used as complementary values as BA characterizes the average classification of all (in this case, two) classes considering their representation, while FS rather focuses on the success of classification of the element of interest (in our study, ground). Considering the points classified as ground by the algorithm positives (P) and the points classified as non-ground as negatives (N), the quality of classification was then expressed as a true positive rate (% of all ground points in the point cloud that were correctly identified) and a true negative rate (% of all negatives in the point cloud that were identified as negatives). For details, see the calculation formulas in Table 2 [48].

The voxel sizes for the MSVC method were selected according to the principles described above, with an initial voxel size of 6 m. The minimum (final) voxel size was chosen in accordance with the particular dataset—based on the noise level. It was set to 6 cm for Data 1, and to 11 cm for Data 2, respectively. The entire series of steps in meters was 6.00, 4.50, 3.38, 2.53, 1.90, 1.42, 1.07, 0.80, 0.60, 0.45, 0.34, 0.25, 0.19, 0.14, 0.11, 0.08, and 0.06 m.

To compare the classification success with an established method, we used a freely available cloth simulation filter (CSF) implemented as a plugin in CloudCompare ver. 2.13.0. This filter is based on the idea of simulating a cloth that is dropped onto the inverted point cloud. The cloth will naturally drape over the surface of the point cloud, and the points that are covered by the cloth can be classified as ground points. There are only three parameters defining the behavior of the cloth—the distance of the points defining the deformation of the cloth (square grid), the pliability of the cloth (in the form of three scenarios—steep slope, relief, and flat), and the possibility of slope processing—i.e., whether the terrain can climb very steeply. Since the number (density) of points defining the spatial area of the cloth is small relative to the original point cloud, usually all points of the point cloud closer to the cloth than a defined constant (threshold) are considered as terrain points. This filter was selected for comparison because in our previous paper, in which we tested several widely used freely available filters (CSF, SMRF, PMF) [32], CSF performed the best of the three filters on the rugged rocky terrain with characteristics similar to those presented here. As there is no universal setting of the CSF, multiple settings were tested for each of the test areas and the best result was always used. Namely, the CSF parameters were set as follows: Scene processing—yes; Scene—Steep slope; Cloth resolution—0.025, 0.5, 0.1 m and threshold of 0.15, 0.20, and 0.25 m.

3. Results

3.1. Data 1—Rocks

The results of the principal evaluation, i.e., the TPR, TNR, BA, and FS characteristics, are detailed in Table 3. The MSVC outcome was presented not only for the final (minimal) voxel size, but also for the purpose of illustration of the process for the four previous steps (0.08–0.19 m).

While the true positive rate (TPR), i.e., the success rate of the identification of ground points, of the CSF was approximately 89%, MSVC reached an excellent TPR of 99.94%, i.e., it classified almost all the ground points within the point cloud as ground. The TNR (True negative rate, i.e., the success rate of identification of non-ground points) in both methods was approx. 76%. This was caused by the relatively high density of the non-ground points (low vegetation) in the immediate vicinity of the terrain; hence, neither CSF nor MSVC could fully remove them (due to the CSF threshold and the MSVC voxel size). The BA of the MSVC method was 88.3% and the FS was 98.3%. This indicated that the ground point classification success was high and the method preserved the necessary points. The CSF results were clearly inferior (BA, 82.4%; FS, 92.6%), as can be seen in Figure 8a and in detail in Figure 8c.

The CSF filter, similar to others tested, e.g., in [31,32], was not suitable for this type of terrain. MSVC, however, learned the available terrain shapes, which led to excellent terrain classification. In the top right corner of the detail in Figure 8d, a lower part of a tree (bush) that was misclassified by both algorithms was clearly visible. However, this was a very difficult detail to distinguish from the ground—the vegetation was enclosed in a thin crevice; to be fair to the MSVC, we should note that no such feature was present in the training data for the MSVC filter. In a way, therefore, we cannot consider this to be a failure of the MSVC algorithm, as it “saw” no example of this in the training data; had the training data included such an area, the result would likely have been even better.

3.2. Data 2

Due to the use of multiple locations within the Data 2 point cloud, only the best results are presented in Table 4 (complete results are shown in Appendix D, Table A3, Table A4 and Table A5). It should be, however, pointed out that many of the accuracy parameters were very close for multiple settings so it was difficult to pick a single setting in some instances.

In all cases, both BA and FS parameters were better for MSVC than for CSF, although the differences were very low and both methods provided good success. However, the differences were more apparent in data visualization (see Figure 9, Figure 10 and Figure 11). These figures also obviate that MSVC preserved more points in the rugged areas than CSF, thus better “recognizing” terrain shape. The unfiltered vegetation was low (i.e., very close to the terrain) and was approximately the same in both filtering approaches (Figure 9a,b). Unlike in Data 1, this dataset also contained terrain features that were not present in the training data (compare, e.g., Figure 7c,e, Figure 9 and Figure 10 to Figure 7a, which shows training data, and Figure 12, showing the buildings in the Tower area). Despite this lack of training data, the MSVC algorithm dealt very well with such areas, better than CSF (the data were more complete).

Data 2—Boulder (Figure 9) showed a similar rate of misclassification of low vegetation for ground for both algorithms (compare Figure 9b to Figure 9a). However, as seen in Figure 9d, MSVC was more successful in identifying ground points in problematic areas than CSF.

Figure 10 shows results from the Data 2—Tower area. Both filters suffered from noise more than in Data 2—Boulder (a, b); more importantly, however, CSF incorrectly classified the roof of the building as ground (a) while MSVC correctly recognized it as a non-ground object despite having no training data on built-up areas. The (c) and (d) panels showed a similar effect as in the previous case—the MSVC-generated terrain was more complete (with the exception of the precipice just below the building, which was not correctly recognized as terrain by MSVC).

The results of Data 2—Rugged classification (Figure 11) showed a clear difference between the performance of both filters in the area with dense shrubs (a) and (b). In such areas, CSF misclassified low vegetation for terrain more often than the MSVC filter (blue ovals). On the other hand, the yellow oval showed an area of low vegetation with the same shape as the terrain underneath that was incorrectly classified as terrain by MSVC (a), (b). Lastly, the green oval showed an area where MSVC identified significantly more terrain points (see Figure 11d); it was, however, at the cost of preserving some points belonging to low vegetation.

4. Discussion

In this paper, we developed and evaluated a novel ground filter based on a combination of point cloud voxelization and a deep neural network. The principle of the method lay in the gradual reduction in the voxel size, which gradually, step by step, refined the terrain.

A vast majority of currently used filters are constructed for airborne laser scanning data, where they serve relatively well. It is, however, necessary to point out that applications utilizing ALS data typically need a resolution of just several points per square meter. However, where dense point clouds based on UAV data acquisition (be it SfM-based or UAV-borne lidar point clouds) are concerned, these filters may fail to provide satisfactory results, particularly in rugged terrain ([32]). Such dense point clouds are often used in engineering applications (such as in quarries [49,50], or in rocky terrain where landslides and overhanging or collapsing rocks may pose a danger to human constructions or lives [51]).

It should be emphasized that to assess the performance of both algorithms, we used data that were extremely difficult for ground filtering. In everyday practice, such “tricky” terrain features concentrated in our testing areas represent only a small fraction of the point clouds. In effect, excellent overall results can be achieved even for areas containing small amounts of rugged features when using any other filter, but these problematic spots would be largely misclassified. In other words, if such difficult terrain, as shown in our datasets, normally represents a very small proportion of the landscape and the rest is correctly classified by any filter, the error caused by the incorrect classification of these features causes only a negligible overall error.

Although both the CSF and MSVC filters performed well in identifying terrain points, the MSVC filter consistently outperformed the widely used CSF filter, especially in the most rugged and problematic areas, such as the rock face in Data 1 (best balanced accuracies of 88.3% vs. 82.4%, and F-scores of 98.3 vs. 92.6% for MSVC and CSF, respectively). As shown obviously from the true positive rates (which were as high as 99.9% for MSVC and 89.2% for CSF), MSVC correctly detected almost all terrain points. Even more importantly, this was not at the cost of decreasing the true negative rate (76.6% and 75.7% for MSVC and CSF, respectively); on the contrary, MSVC performed slightly better even in this parameter. Where Data 2 was concerned, MSVC still outperformed CSF, although the difference was not as high. This was given by the fact that the Data 2 point cloud was more “standard” and contained fewer highly problematic spots. The biggest difference between the two algorithms lay in the better preservation of terrain points by MSVC even in the challenging spots (see the points highlighted in green in Figure 9d, Figure 10d and Figure 11d). In addition, MSVC was less prone to identifying low vegetation as ground (look at the blue ovals in Figure 11a,b).

We were surprised by the fact that unlike CSF, the MSVC algorithm managed to successfully remove the buildings in Data 2—Tower, although no such construction was present in the training data. This suggests that the neural network was capable of certain abstraction, distinguishing between the true terrain and the roof. The likely explanation was that rather than recognizing the roof and the wall as such (which the algorithm had no way of knowing), it only recognized that these structures did not represent the ground as it did not correspond to any terrain feature it “saw” in the training data. This finding makes the algorithm even more promising, as it appears that the filter needs to be trained only on the character of the ground, and the knowledge of the exact vegetation type (or other obstacles) might be less important for its correct function. In effect, this opens the door to the possibility of more universal application of the algorithm—if the algorithm is pre-trained to several terrain types, it may be able to correctly identify terrain without even the need for creating reference data for the particular area; rather, selecting just the terrain type (e.g., flat terrain, rocks, urban, etc.) that was previously used for training could be sufficient. This, however, needs to be verified in future research.

The MSVC algorithm brings multiple benefits to the ground filtering of dense point clouds. It does not need any computationally demanding mathematical operations, and the algorithm makes its decisions always solely based on the number of points in individual voxels and its surroundings. Last but not least, the algorithm is inherently robust against the presence of outliers and noise. Considering the principle of the method (using the relative number of surrounding points, not the absolute number), it can be assumed that the performance would be similar on clouds with any density. Still, detailed analysis of the MSVC performance on clouds of different densities could also be interesting and should be the subject of further research.

On the other hand, it is not possible to achieve terrain containing only a single layer of ground points due to the cubic character of the voxel, which always contains the remnants of low vegetation. This is, however, true for almost any filter as most of them operate with a parameter such as an offset or threshold that characterizes the allowed distance of ground points from the approximated terrain surface.

It is very difficult to compare our results to the existing literature as this is a novel filter that has never been employed before. Moreover, only a few studies investigated ground filtering on such dense point clouds characterizing highly rugged terrain at resolutions comparable to our study (with the resulting terrain resolution of approx. 1 to 5 cm).

There is ample space for future research on the use of this algorithm, besides the aforementioned construction of a neural network that would be universally trained to detect a particular terrain type (which could simplify the use of this algorithm). The simplest direction of future research is increasing the number of voxels in the voxel cube, which could allow for faster operation (by needing fewer steps). At present, this algorithm only uses a single characteristic—the number of points in the voxel; this could be built upon by the addition of other characteristics, such as the spatial variance. Another possible improvement could lie in training—in the present paper, we trained the filter on terrain rotated in steps of 90 degrees. Reduction of the rotation step could also lead to the improvement of training and, thus, to better terrain detection performance.

5. Conclusions

In this paper, we proposed a novel ground filtering method utilizing the multi-size voxel cube (MSVC) approach combined with a deep neural network. We demonstrated its effectiveness in an extremely difficult (rugged) terrain with dense vegetation. Compared to traditional filters, such as CSF, the MSVC method identified terrain points more accurately, thanks to the learning feature of the neural network. These results demonstrated that the MSVC filter can be successfully used for digital terrain extraction from dense point clouds even in highly demanding environments where other filters fail, such as steep slopes and/or rugged densely vegetated areas. In the present study, we used a part of the study area for learning, necessitating the manual creation of reference terrain, which can be considered a disadvantage of this algorithm. In the future, however, we aim to produce more generally valid neural networks trained for the individual types of landscape (such as rocky mountains, urban environment, hills, etc.), which would make the use of this algorithm much more user-friendly.

Author Contributions

Conceptualization, M.Š.; methodology, M.Š.; software, M.Š.; validation, R.U., M.B., J.K. and H.V.; formal analysis, R.U.; investigation M.B., J.K. and H.V.; resources, M.Š.; data curation, R.U.; writing—original draft preparation, M.Š.; writing—review and editing, R.U., M.B., J.K. and H.V.; visualization, M.Š.; supervision, M.Š.; project administration, R.U.; funding acquisition, R.U. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Technology Agency of the Czech Republic—grant number CK03000168, “Intelligent methods of digital data acquisition and analysis for bridge inspections”, and by the grant agency of CTU in Prague—grant number SGS24/048/OHK1/1T/11 “Data filtering and classification using machine learning methods”.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the size of the data.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Overview of geometric ground filtering algorithms (sparse point clouds—less than 50 points per m², dense point cloud—more than 50 points per m²).

Algorithm	Data Type	Description	Reference
Slope based	ALS/sparse	For each point in the point cloud, the local ground slope is calculated, and points where the slope value is higher than the selected threshold are considered non-ground points. It is assumed that the points with above-threshold slope are points on buildings, trees or are faulted/removed points.	[13]
Slope adaptive	ALS/sparse	This is an improvement on the previous method, where the slope threshold is not strictly chosen, but adaptively adjusted based on the wider environment	[14]
Multi- Directional	ALS/sparse	This method analyzes the gradients between a point and its neighbors in multiple directions, and it is based on the assumption that the neighborhood of ground points is smoother than that of non-ground points.	[15]
Adaptive Slope	ALS/sparse	This works similarly to the previous one, except that the slope threshold is adaptively adjusted based on local statistics.	[16]
Triangular Grid Filter	ALS/sparse	A Triangular grid filter based on the Slope Filter, which finds violation points of the spatial position relationship within each point in the triangulation network using improved KD-Tree-Based Euclidean Clustering	[17]
Modified 3D Alpha Shape	ALS/sparse	Preprocessing for outlier removal and potential ground point extraction; the deployment of a modified 3D alpha shape to construct multiscale point cloud layers; and the use of a multiscale triangulated irregular network (TIN) densification process for precise ground point extraction.	[18]
Weighted mean	ALS/sparse	The terrain model is roughly estimated, which may include also non-ground points. The terrain points are then iteratively determined by a weighted mean of the surrounding points, where the weight decreases with distance from the predicted terrain	[19]
TIN densification	ALS/sparse	This method uses a triangular irregular network (TIN) as a representation of the terrain. The algorithm first selects a basic approximation of the terrain defined by a small number of points (usually with the lowest elevations in different parts of the data space) and creates a TIN from these points. New points are added iteratively and the TIN is condensed. Points are added if they fit the expected terrain characteristics (e.g., slopes are not too steep). The thresholds for point selection are adaptively adjusted during the iteration to allow the method to handle variable topography.	[20]
Repetitive Interpolation	ALS/sparse	The basis of the method consists of iterative iteration between points, where points that do not meet the criteria are gradually eliminated. The iterative interpolation produces an approximated terrain, and points are eliminated based on their distance from this approximation.	[21]
Adaptive TIN densification	ALS/sparse	Improves [18] by selecting the starting points in the adaptive grid.	[22]
Progressive Morphological Filter	ALS/sparse	The progressive morphological filter gradually increases the size of the filter window and uses elevation difference thresholds to remove the points of unwanted objects (buildings, vegetation, etc.) while the terrain data is preserved.	[23]
Simple Morphological Filter	ALS/sparse	The simple morphological filter (SMRF) solves the terrain classification problem using image processing techniques. This filter uses a linearly increasing window and simple slope thresholding, along with a novel application of image completion techniques.	[24]
Morphological Multi-Gradient	ALS/sparse	A morphological filter that is based on the analysis of multiple gradients for each point.	[25]
Object-Based Land Cover Classification	ALS/sparse	This method focuses on the relative information content from the height, intensity, and shape of features found in the scene. Eight object metrics were used to classify the terrain into land cover information: average height, the standard deviation of height, height homogeneity, height contrast, height entropy, height correlation, average intensity, and compactness. A machine learning decision tree was used.	[26]
Contextual Segment-Based	ALS/sparse	This method is based on a conditional random field (CRF), which is a graphical model that can be used to model the relationship between different variables. In this case, the variables are the labels of the different segments in the point cloud. The CRF is trained on a dataset of labeled point clouds, and then it can be used to classify new point clouds.	[27]
Skewness and Kurtosis	ALS/sparse	Iterative analysis of the skewness and kurtosis of the statistical distribution around a point.	[28]
CSF	ALS/sparse	Based on the idea of simulating a cloth that is dropped onto the inverted point cloud. The cloth will naturally drape over the surface of the point cloud, and the points that are covered by the cloth can be classified as ground points. There are only three parameters defining the behavior of the cloth—the distance of the points defining the deformation of the cloth (square grid), the pliability of the cloth (in the form of three scenarios—steep slope, relief, and flat), and the possibility of slope processing—i.e., whether the terrain can climb very steeply. Since the number (density) of points defining the spatial area of the cloth is small relative to the original point cloud, usually all points of the point cloud closer to the cloth than a defined constant (threshold) are considered terrain points.	[29]
Complementary Cloth Simulation and Progressive TIN Densification	ALS/sparse	This combines the methods of cloth simulation and progressive TIN densification. This hybrid approach exploits the strengths of both methods: the accuracy of cloth simulation in detecting terrain in challenging environments and the robustness of progressive TIN densification in removing non-terrain points.	[30]
MDSR	Dense	This method is based on the idea of rasterizing the point cloud from multiple directions and with multiple shifts of the raster grid to identify ground points.	[31]

Table A2. Overview of machine learning ground filtering algorithms (sparse point clouds—less than 50 points per m², dense point cloud—more than 50 points per m²).

Algorithm	Data Type	Description	Reference
Combined Structural and geometrical filtering	Dense	This method combines structural filtering using the CANUPO tool and a geometric filter (preferably CSF) applied to the data in the horizontal position (rock wall transformed so that the fitted plane is horizontal)	[32]
RGB filtering	Dense	The method presented here performs point cloud filtering based on non-geometric point parameters—based on color (RGB). This is done using both a neural network and an approximation of the color spaces of each class, using an originally designed automatic method based on 3D Gaussian mixed models. The method has a wider scope, and it can be also used for general color-based classification.	[33]
Spatial and spectral characteristics	Dense	The method presented here uses a neural network for the ground filtering of the point clouds of coastal salt marches (essentially flat terrain with ground vegetation), the input data are not coordinates but spatial (e.g., height) and spectral characteristics (e.g., eigenvalues calculated from the spherical neighborhood). The test data were acquired using a UAV lidar system.	[34]
Spatial and spectral characteristics	ALS/sparse	Spatial and spectral features used for machine learning; testing was performed on flat terrain data with buildings and tall vegetation acquired by ALS	[35]
Local characteristics—edge convolution	ALS/sparse	A method using spatial and local characteristics here in addition to the reduced position of the point, e.g., roughness. A special feature is the use of the edge convolution operation.	[1]
Voxelization and 3D convolution	ALS/sparse	The voxelization and post-processing is performed using 3D convolutions and max-pooling; the calculation is demanding and tested again on ALS data, where the maximum number of points in the cloud is 1 million, with a density 5–10 points per square meter. The terrain is flat.	[36]
Transformation to image like structure	ALS/sparse	This uses the transformation of the point’s surroundings into an image form and its further processing by a convolutional neural network, using specially designed local characteristics instead of RGB. Again, ALS (8 points/m², 700 k points), flat terrain with buildings.	[37]
Point clouds projected to multidimensional image	ALS/sparse	The point clouds are first projected onto a horizontal plane and converted into a multidimensional image, using pixel sizes of 0.5 m and 1 m. This is then analyzed using a multiscale fully convolutional network. Flat terrain.	[38]
CNN under slope and copula correlation constraint	ALS/sparse	Farthest point sampling with slope constraints, intra-class feature enhancements via copula correlation and attention mechanisms, filter error correction using copula correlation and confidence intervals, and the refinement of filtering accuracy by adjusting for negatively correlated point sets.	[39]
Multi-Scale and Multi-View Deep Features	ALS/sparse	Elevation features, spectral features, and geometric features are used. The cloud is voxelized, and feature maps generated by projections onto three orthogonal planes are used for classification. The classification is performed using a fully convolutional network (FCN). Test dataset—point cloud showing flat terrain obtained by ALS (with all features as usual).	[40]
Iterative sequential terrain prediction	ALS/sparse	This converts the terrain filtering problem into an iterative sequential prediction problem using point profiles. Uses deep reinforcement learning (DRL): DRL optimizes the prediction sequence and sequentially acquires the bare terrain.	[41]
2D projection of 3D features	ALS/sparse	This uses point cloud transformation into voxel representation and the 2D projection of 3D features; classification is done by convolutional neural network.	[42]
Multi-Scale CNN with Attention Mechanism	ALS/sparse	This classifies the point cloud by transforming it into a 2D image, where the transformed height is used instead of colors. The classification itself is performed using a convolutional neural network (CNN).	[43]
Vertical Slice Equal Sampling	ALS/sparse	Locally samples the original point cloud, organizing the unordered sequence of points and reducing their number while maintaining the terrain’s representation, and then classification is performed using a specially designed transformer network	[44]
PointNet	All	Neural network architecture designed for the direct processing of point clouds without the need to convert them into regular 3D voxel grids or collections of images. PointNet respects the permutation invariance of points in a point cloud and provides a unified architecture for applications such as object classification.	[45]
PointNet++	All	An enhanced version of PointNet. PointNet++ focuses on learning hierarchical features on point sets in a metric space. It addresses the limitations of the original PointNet by capturing local structures induced by the metric space, improving the ability to recognize fine patterns and generalize complex scenes.	[46]
Point Cloud Binary Voxelization	ALS/Sparse	Point clouds are converted into a binary voxel-based data (BVD) model, where each voxel has a value of 1 or 0 depending on whether it contains LiDAR points. The algorithm selects the lowest voxels with a value of 1 as ground seeds and then labels them and their 3D-connected set as ground voxels.	[47]

Appendix B

Figure A1. Data 2—location of individual data in the area (a) Data 2 Training (b) Data 2 Boulders (c) Data 2 Tower (d) Data 2 Rugged.

Appendix C. Definition a Neural Network in Python Using the Tensorflow Library

import numpy
import tensorflow as tf
import keras
from keras import regularizers

kernel = 9;
k3 = numpy.power(kernel,3)
n1 = int(2*k3)
add_layers_number = 7            # number of hidden layers
regu = 0.001                     # coefficient L2 regularization
drop = 0.25                      # drop rate 25%

def CreateNetModel_T2(add_layers_number, n1, k3, regu, drop):
    model = keras.Sequential()
    model.add(keras.Input(shape=(k3,)))

    for i in range(add_layers_number):
        model.add(keras.layers.Dense(int(n1/np.power(2,i)),
        activation=’relu’, kernel_regularizer=regularizers.L2(regu))) model.add(keras.layers.Dropout(drop))

    model.add(keras.layers.Dense(1, activation=’sigmoid’))
    model.compile(loss=’binary_crossentropy’, optimizer=’adam’,
    metrics=[’accuracy’])

return model

Appendix D. Complete Classification Results for Data 2

Table A3. Classification success rate—Data 2 Boulders.

Method	Cloth Resolution/Voxel Size [m]	Threshold [m]	TPR [%]	TNR [%]	BA [%]	FS [%]
CSF	0.025	0.250	99.35	99.57	99.46	99.32
	0.050		99.00	99.67	99.34	99.23
	0.100		97.97	99.75	98.86	98.77
	0.250		90.23	99.81	95.02	94.71
	0.025	0.200	98.85	99.64	99.24	99.13
	0.050		98.13	99.77	98.95	98.87
	0.100		96.66	99.85	98.26	98.18
	0.250		86.90	99.88	93.39	92.90
	0.025	0.150	97.45	99.71	98.58	98.47
	0.050		95.71	99.85	97.78	97.68
	0.100		93.25	99.92	96.58	96.44
	0.250		80.93	99.93	90.43	89.41
MSVC	0.110	-	99.30	99.79	99.54	99.47
	0.140		99.67	99.69	99.68	99.58
	0.190		99.88	99.57	99.72	99.59
	0.250		100.00	99.16	99.58	99.32

Table A4. Classification success rate—Data 2 Tower.

Method	Cloth Resolution/Voxel Size [m]	Threshold [m]	TPR [%]	TNR [%]	BA [%]	FS [%]
CSF	0.025	0.250	99.52	97.11	98.32	96.97
	0.050		99.29	97.41	98.35	97.14
	0.100		98.61	97.56	98.08	96.93
	0.250		94.33	98.07	96.20	95.20
	0.025	0.200	99.33	97.37	98.35	97.12
	0.050		99.00	97.79	98.39	97.35
	0.100		98.09	97.97	98.03	97.06
	0.250		92.51	98.46	95.49	94.61
	0.025	0.150	98.60	97.92	98.26	97.27
	0.050		97.93	98.57	98.25	97.56
	0.100		96.41	98.78	97.59	96.98
	0.250		88.61	99.12	93.86	93.10
MSVC	0.110	-	99.78	98.00	98.89	97.95
	0.140		99.79	97.75	98.77	97.71
	0.190		99.80	97.57	98.69	97.55
	0.250		99.81	97.43	98.62	97.42

Table A5. Classification success rate—Data 2 Rugged.

Method	Cloth Resolution/Voxel Size [m]	Threshold [m]	TPR [%]	TNR [%]	BA [%]	FS [%]
CSF	0.025	0.250	99.19	97.56	98.37	97.78
	0.050		98.51	98.16	98.33	97.88
	0.100		98.17	98.60	98.38	98.03
	0.250		93.77	98.96	96.37	96.01
	0.025	0.200	99.33	98.52	97.85	98.18
	0.050		99.00	97.33	98.56	97.95
	0.100		98.09	96.38	99.03	97.70
	0.250		92.51	90.66	99.31	94.98
	0.025	0.150	96.76	98.18	97.47	97.00
	0.050		94.23	98.93	96.58	96.23
	0.100		91.96	99.37	95.66	95.34
	0.250		84.48	99.58	92.03	91.27
MSVC	0.110	-	99.20	98.82	99.01	98.71
	0.140		99.64	98.46	99.05	98.67
	0.190		99.81	98.06	98.94	98.47
	0.250		99.87	97.66	98.76	98.19

References

Ciou, T.-S.; Lin, C.-H.; Wang, C.-K. Airborne LiDAR Point Cloud Classification Using Ensemble Learning for DEM Generation. Sensors 2024, 24, 6858. [Google Scholar] [CrossRef] [PubMed]
Wegner, K.; Durand, V.; Villeneuve, N.; Mangeney, A.; Kowalski, P.; Peltier, A.; Stark, M.; Becht, M.; Haas, F. Multitemporal Quantification of the Geomorphodynamics on a Slope within the Cratére Dolomieu—At the Piton de La Fournaise (La Réunion, Indian Ocean) Using Terrestrial LiDAR Data, Terrestrial Photographs, and Webcam Data. Geosciences 2024, 14, 259. [Google Scholar] [CrossRef]
Peralta, T.; Menoscal, M.; Bravo, G.; Rosado, V.; Vaca, V.; Capa, D.; Mulas, M.; Jordá-Bordehore, L. Rock Slope Stability Analysis Using Terrestrial Photogrammetry and Virtual Reality on Ignimbritic Deposits. J. Imaging 2024, 10, 106. [Google Scholar] [CrossRef]
Treccani, D.; Adami, A.; Brunelli, V.; Fregonese, L. Mobile Mapping System for Historic Built Heritage and GIS Integration: A Challenging Case Study. Appl. Geomat. 2024, 16, 293–312. [Google Scholar] [CrossRef]
Marčiš, M.; Fraštia, M.; Lieskovský, T.; Ambroz, M.; Mikula, K. Photogrammetric Measurement of Grassland Fire Spread: Techniques and Challenges with Low-Cost Unmanned Aerial Vehicles. Drones 2024, 8, 282. [Google Scholar] [CrossRef]
Štroner, M.; Urban, R.; Křemen, T.; Braun, J. UAV DTM Acquisition in a Forested Area—Comparison of Low-Cost Photogrammetry (DJI Zenmuse P1) and LiDAR Solutions (DJI Zenmuse L1). Eur. J. Remote Sens. 2023, 56, 2179942. [Google Scholar] [CrossRef]
Marotta, F.; Teruggi, S.; Achille, C.; Vassena, G.P.M.; Fassi, F. Integrated Laser Scanner Techniques to Produce High-Resolution DTM of Vegetated Territory. Remote Sens. 2021, 13, 2504. [Google Scholar] [CrossRef]
Štroner, M.; Urban, R.; Křemen, T.; Braun, J.; Michal, O.; Jiřikovský, T. Scanning the Underground: Comparison of the Accuracies of SLAM and Static Laser Scanners in a Mine Tunnel. Measurement 2024, 242, 115875. [Google Scholar] [CrossRef]
Pavelka, K., Jr.; Běloch, L.; Pavelka, K. Modern Methods of Documentation and Visualization of Historical Mines in the Unesco Mining Region in the Ore Mountains. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, X-M-1-2023, 237–244. [Google Scholar] [CrossRef]
Meng, X.; Currit, N.; Zhao, K. Ground Filtering Algorithms for Airborne LiDAR Data: A Review of Critical Issues. Remote Sens. 2010, 2, 833–860. [Google Scholar] [CrossRef]
Qin, N.; Tan, W.; Guan, H.; Wang, L.; Ma, L.; Tao, P.; Fatholahi, S.; Hu, X.; Li, J. Towards Intelligent Ground Filtering of Large-Scale Topographic Point Clouds: A Comprehensive Survey. Int. J. Appl. Earth Obs. Geoinf. 2023, 125, 103566. [Google Scholar] [CrossRef]
Chen, C.; Guo, J.; Wu, H.; Li, Y.; Shi, B. Performance Comparison of Filtering Algorithms for High-Density Airborne LiDAR Point Clouds over Complex LandScapes. Remote Sens. 2021, 13, 2663. [Google Scholar] [CrossRef]
Vosselman, G. Slope based filtering of laser altimetry data. Int. Arch. Photogramm. Remote Sens. 2000, 33, 935–942. [Google Scholar]
Sithole, G. Filtering of laser altimetry data using a slope adaptive filter. Int. Arch. Photogramm. Remote Sens. 2001, 34, 203–210. [Google Scholar]
Meng, X.; Wang, L.; Silván-Cárdenas, J.L.; Currit, N. A Multi-Directional Ground Filtering Algorithm for Airborne LIDAR. ISPRS J. Photogramm. Remote Sens. 2008, 64, 117–124. [Google Scholar] [CrossRef]
Susaki, J. Adaptive Slope Filtering of Airborne LiDAR Data in Urban Areas for Digital Terrain Model (DTM) Generation. Remote Sens. 2012, 4, 1804–1819. [Google Scholar] [CrossRef]
Kang, C.; Lin, Z.; Wu, S.; Lan, Y.; Geng, C.; Zhang, S. A Triangular Grid Filter Method Based on the Slope Filter. Remote Sens. 2023, 15, 2930. [Google Scholar] [CrossRef]
Cao, D.; Wang, C.; Du, M.; Xi, X. A Multiscale Filtering Method for Airborne LiDAR Data Using Modified 3D Alpha Shape. Remote Sens. 2024, 16, 1443. [Google Scholar] [CrossRef]
Kraus, K.; Pfeifer, N. Determination of Terrain Models in Wooded Areas with Airborne Laser Scanner Data. ISPRS J. Photogramm. Remote Sens. 1998, 53, 193–203. [Google Scholar] [CrossRef]
Axelsson, P. DEM generation from laser scanner data using adaptive TIN models. Int. Arch. Photogramm. Remote Sens. 2000, 33, 111–118. [Google Scholar]
Kobler, A.; Pfeifer, N.; Ogrinc, P.; Todorovski, L.; Oštir, K.; Džeroski, S. Repetitive Interpolation: A Robust Algorithm for DTM Generation from Aerial Laser Scanner Data in Forested Terrain. Remote Sens. Environ. 2006, 108, 9–23. [Google Scholar] [CrossRef]
Zheng, J.; Xiang, M.; Zhang, T.; Zhou, J. An Improved Adaptive Grid-Based Progressive Triangulated Irregular Network Densification Algorithm for Filtering Airborne LiDAR Data. Remote Sens. 2024, 16, 3846. [Google Scholar] [CrossRef]
Zhang, K.; Chen, S.-C.; Whitman, D.; Shyu, M.-L.; Yan, J.; Zhang, C. A Progressive Morphological Filter for Removing Nonground Measurements from Airborne LIDAR Data. IEEE Trans. Geosci. Remote Sens. 2003, 41, 872–882. [Google Scholar] [CrossRef]
Pingel, T.J.; Clarke, K.C.; McBride, W.A. An Improved Simple Morphological Filter for the Terrain Classification of Airborne LIDAR Data. ISPRS J. Photogramm. Remote Sens. 2013, 77, 21–30. [Google Scholar] [CrossRef]
Li, Y. Filtering Airborne Lidar Data by an Improved Morphological Method Based on Multi-Gradient Analysis. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2013, XL-1/W1, 191–194. [Google Scholar] [CrossRef]
Im, J.; Jensen, J.R.; Hodgson, M.E. Object-Based Land Cover Classification Using High-Posting-Density LiDAR Data. GIScience Remote Sens. 2008, 45, 209–228. [Google Scholar] [CrossRef]
Vosselman, G.; Coenen, M.; Rottensteiner, F. Contextual Segment-Based Classification of Airborne Laser Scanner Data. ISPRS J. Photogramm. Remote Sens. 2017, 128, 354–371. [Google Scholar] [CrossRef]
Crosilla, F.; Macorig, D.; Scaioni, M.; Sebastianutti, I.; Visintini, D. LiDAR Data Filtering and Classification by Skewness and Kurtosis Iterative Analysis of Multiple Point Cloud Data Categories. Appl. Geomat. 2013, 5, 225–240. [Google Scholar] [CrossRef]
Zhang, W.; Qi, J.; Wan, P.; Wang, H.; Xie, D.; Wang, X.; Yan, G. An Easy-to-Use Airborne LiDAR Data Filtering Method Based on Cloth Simulation. Remote Sens. 2016, 8, 501. [Google Scholar] [CrossRef]
Cai, S.; Zhang, W.; Liang, X.; Wan, P.; Qi, J.; Yu, S.; Yan, G.; Shao, J. Filtering Airborne LiDAR Data Through Complementary Cloth Simulation and Progressive TIN Densification Filters. Remote Sens. 2019, 11, 1037. [Google Scholar] [CrossRef]
Štroner, M.; Urban, R.; Línková, L. Multidirectional Shift Rasterization (MDSR) Algorithm for Effective Identification of Ground in Dense Point Clouds. Remote Sens. 2022, 14, 4916. [Google Scholar] [CrossRef]
Štroner, M.; Urban, R.; Lidmila, M.; Kolář, V.; Křemen, T. Vegetation Filtering of a Steep Rugged Terrain: The Performance of Standard Algorithms and a Newly Proposed Workflow on an Example of a Railway Ledge. Remote Sens. 2021, 13, 3050. [Google Scholar] [CrossRef]
Štroner, M.; Urban, R.; Línková, L. Color-Based Point Cloud Classification Using a Novel Gaussian Mixed Modeling-Based Approach versus a Deep Neural Network. Remote Sens. 2024, 16, 115. [Google Scholar] [CrossRef]
Liu, K.; Liu, S.; Tan, K.; Yin, M.; Tao, P. ANN-Based Filtering of Drone LiDAR in Coastal Salt Marshes Using Spatial–Spectral Features. Remote Sens. 2024, 16, 3373. [Google Scholar] [CrossRef]
Nurunnabi, A.; Teferle, F.N.; Li, J.; Lindenbergh, R.C.; Hunegnaw, A. An Efficient Deep Learning Approach for Ground Point Filtering in Aerial Laser Scanning Point Clouds. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2021, XLIII-B1-2021, 31–38. [Google Scholar] [CrossRef]
Zhang, Z.; Sun, L.; Zhong, R.; Chen, D.; Zhang, L.; Li, X.; Wang, Q.; Chen, S. Hierarchical Aggregated Deep Features for ALS Point Cloud Classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 1686–1699. [Google Scholar] [CrossRef]
Chen, R.; Wu, J.; Zhao, X.; Luo, Y.; Xu, G. SC-CNN: LiDAR Point Cloud Filtering CNN under Slope and Copula Correlation Constraint. ISPRS J. Photogramm. Remote Sens. 2024, 212, 381–395. [Google Scholar] [CrossRef]
Yang, Z.; Jiang, W.; Xu, B.; Zhu, Q.; Jiang, S.; Huang, W. A Convolutional Neural Network-Based 3D Semantic Labeling Method for ALS Point Clouds. Remote Sens. 2017, 9, 936. [Google Scholar] [CrossRef]
Rizaldy, A.; Persello, C.; Gevaert, C.; Elberink, S.O.; Vosselman, G. Ground and Multi-Class Classification of Airborne Laser Scanner Point Clouds Using Fully Convolutional Networks. Remote Sens. 2018, 10, 1723. [Google Scholar] [CrossRef]
Lei, X.; Wang, H.; Wang, C.; Zhao, Z.; Miao, J.; Tian, P. ALS Point Cloud Classification by Integrating an Improved Fully Convolutional Network into Transfer Learning with Multi-Scale and Multi-View Deep Features. Sensors 2020, 20, 6969. [Google Scholar] [CrossRef]
Dai, H.; Hu, X.; Shu, Z.; Qin, N.; Zhang, J. Deep Ground Filtering of Large-Scale ALS Point Clouds via Iterative Sequential Ground Prediction. Remote Sens. 2023, 15, 961. [Google Scholar] [CrossRef]
Dai, H.; Hu, X.; Zhang, J.; Shu, Z.; Xu, J.; Du, J. Large-Scale ALS Point Clouds Segmentation via Projection-Based Context Embedding. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–16. [Google Scholar] [CrossRef]
Wang, B.; Wang, H.; Song, D. A Filtering Method for LiDAR Point Cloud Based on Multi-Scale CNN with Attention Mechanism. Remote Sens. 2022, 14, 6170. [Google Scholar] [CrossRef]
Wen, W.; Yang, R.; Tan, J.; Liu, H.; Tan, J. Vertical Slice Equal Sampling and Transformer Network for Point Cloud Ground Filtering. Int. J. Remote Sens. 2024, 45, 4710–4736. [Google Scholar] [CrossRef]
Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the Fourth International Conference on 3D Vision, Stanford, CA, USA, 25–28 October 2016; pp. 601–610. [Google Scholar]
Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the 31st Conference on Neural Information Processing System (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Wang, L.; Xu, Y.; Li, Y. Aerial Lidar Point Cloud Voxelization with Its 3D Ground Filtering Application. Photogramm. Eng. Remote Sens. 2017, 83, 95–107. [Google Scholar] [CrossRef]
You, S.-H.; Jang, E.J.; Kim, M.-S.; Lee, M.-T.; Kang, Y.-J.; Lee, J.-E.; Eom, J.-H.; Jung, S.-Y. Change Point Analysis for Detecting Vaccine Safety Signals. Vaccines 2021, 9, 206. [Google Scholar] [CrossRef]
Kovanič, Ľ.; Peťovský, P.; Topitzer, B.; Blišťan, P. Spatial Analysis of Point Clouds Obtained by SfM Photogrammetry and the TLS Method—Study in Quarry Environment. Land 2024, 13, 614. [Google Scholar] [CrossRef]
Braun, J.; Braunová, H.; Suk, T.; Michal, O.; Peťovský, P.; Kuric, I. Structural and Geometrical Vegetation Filtering—Case Study on Mining Area Point Cloud Acquired by UAV Lidar. Acta Montan. Slovaca 2022, 26, 661–674. [Google Scholar] [CrossRef]
Kovanič, Ľ.; Štroner, M.; Urban, R.; Blišťan, P. Methodology and Results of Staged UAS Photogrammetric Rockslide Monitoring in the Alpine Terrain in High Tatras, Slovakia, after the Hydrological Event in 2022. Land 2023, 12, 977. [Google Scholar] [CrossRef]

Figure 1. A 2D illustration of the point cloud (profile) and its voxelization to 2 × 2 × 2 m voxels. Individual dots represent the centers of the voxels, color-coded to represent the number of points in the voxel (see the color bar). The central red square indicates the evaluated voxel, and the large orange square indicates the entire area used for its evaluation (2D representation of the voxel cube). (a) shows the overall view, (b) detail.

Figure 2. A 2D illustration of the progressive reduction in vegetation with a gradual reduction in the voxel size (color-coding indicates the number of points in the voxel relative to the most populated voxel; grey indicates voxels with no points; and the greyed-out part of the point cloud indicates the points removed in previous steps). (a) A voxel size of 3.38 m; (b) A voxel size of 1.90 m; (c) A voxel size of 1.42 m; (d) A voxel size of 0.6 m.

Figure 3. (a) The misclassification of voxels with low numbers of points (marked with red arrows) as non-ground and (b) the solution to this problem through the use of the additional shifted grid (blue lines); voxels classified as ground in any of the grids (thick lines) are considered ground and carried forward to the next step.

Figure 4. Gradual filtering with stepwise reduction in the voxel size: (a) Original point cloud; (b) Step 2 (voxel size 4.5 m); (c) Step 5 (voxel size 1.9 m); (d) Step 15—final result (voxel size 0.11 m).

Figure 5. Flowchart of the multi-size voxel cube (MSVC) algorithm.

Figure 6. Data 1 with the vegetation color-coded according to the vegetation height: (a) Training data, (b) Test data; note that the training data contain all types of terrain as well as the vegetation character present in the test data.

Figure 7. Data 2—training area (a,b) and the testing areas Boulders (c,d), Tower (e,f), and Rugged (g,h).

Figure 8. Data 1—best classification results: (a) CSF (cloth resolution 2.5 cm; threshold 25 cm); (b) MSVC (voxel size 6 cm); (c) detail of the CSF classification; (d) detail of the same area classified by MSVC; the color-coded points indicate erroneously preserved vegetation, along with its height.

Figure 9. Classification success for Data 2—Boulder: (a) CSF classification and (b) MSVC classification, with points erroneously classified as ground highlighted in red; (c) CSF classification, with points correctly identified by CSF but not by MSVC highlighted in green (d) MSVC classification, with points correctly identified by MSVC but not by CSF highlighted in green.

Figure 10. Classification success for Data 2—Tower: (a) CSF classification and (b) MSVC classification, with points erroneously classified as ground highlighted in red; (c) CSF classification, with points correctly identified by CSF but not by MSVC highlighted in green (d) MSVC classification, with points correctly identified by MSVC but not by CSF highlighted in green. Blue ovals indicate areas with the biggest differences in the performance of the filters, where CSF identified more points falsely as ground.

Figure 11. Classification success for Data 2—Rugged: (a) CSF classification and (b) MSVC classification, with points erroneously classified as ground highlighted in red; (c) CSF classification, with points correctly identified by CSF but not by MSVC highlighted in green (d) MSVC classification, with points correctly identified by MSVC but not by CSF highlighted in green. Colored ovals indicate areas with the biggest differences in the performance of the filters.

Figure 12. The terrain model of the Data 2—Tower area with buildings shown; note that no buildings were present in the training data.

Table 1. Data 2—Dimensions and the numbers of points of the training and test areas.

Area	Dimensions [m]	Number of Points	Mean Resolution [m]
Data 2 Training	74 × 65 × 38	11,454,057	0.04
Data 2 Boulders	50 × 42 × 22	3,726,774	0.05
Data 2 Tower	85 × 72 × 26	20,941,671	0.03
Data 2 Rugged	100 × 53 × 27	7,569,811	0.05

Table 2. Overview of the success rate characteristics used (TP = true positives; FP = false positives; TN = true negatives; FN = false negatives).

Characteristics	Abbreviation	Calculation
True positive rate	TPR	TPR = TP/(TP + FN)
True negative rate	TNR	TNR = TN/(TN + FP)
Balanced accuracy	BA	BA = (TPR + TNR)/2
F-score	FS	FS = 2TP/(2TP + FP + FN)

Table 3. Data 1—classification success (TPR = True positive rate, TNR = True negative rate, BA = Balanced accuracy, FS = F-Score).

Method	Cloth Resolution/Voxel Size [m]	Threshold [m]	TPR [%]	TNR [%]	BA [%]	FS [%]
CSF	0.025	0.25	89.18	75.66	82.42	92.57
	0.050		87.88	77.04	82.46	91.93
	0.100		86.20	78.08	82.14	91.05
	0.025	0.20	87.66	78.16	82.91	91.89
	0.050		86.17	79.75	82.96	91.15
	0.100		84.07	80.97	82.52	90.01
	0.025	0.15	85.33	81.51	83.42	90.78
	0.050		83.53	83.35	83.44	89.86
	0.100		80.70	84.79	82.75	88.25
MSVC	0.060	-	99.94	76.61	88.28	98.32
	0.080		99.94	74.91	87.43	98.20
	0.110		99.97	72.31	86.14	98.03
	0.140		99.97	70.43	85.20	97.90
	0.190		99.98	68.40	84.19	97.77

Table 4. Classification success characteristics for the best results of both methods in Data 2.

Method	Data Area	Cloth Resolution/Voxel Size [m]	Threshold [m]	TPR [%]	TNR [%]	BA [%]	FS [%]
CSF	Boulders	0.025	0.25	99.35	99.57	99.46	99.32
MSVC		0.140	-	99.67	99.69	99.68	99.58
CSF	Tower	0.050	0.15	97.93	98.57	98.25	97.56
MSVC		0.110	-	99.78	98.00	98.89	97.95
CSF	Rugged	0.050	0.25	98.51	98.16	98.33	97.88
MSVC		0.110	-	99.20	98.82	99.01	98.71

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Štroner, M.; Boušek, M.; Kučera, J.; Váchová, H.; Urban, R. Multi-Size Voxel Cube (MSVC) Algorithm—A Novel Method for Terrain Filtering from Dense Point Clouds Using a Deep Neural Network. Remote Sens. 2025, 17, 615. https://doi.org/10.3390/rs17040615

AMA Style

Štroner M, Boušek M, Kučera J, Váchová H, Urban R. Multi-Size Voxel Cube (MSVC) Algorithm—A Novel Method for Terrain Filtering from Dense Point Clouds Using a Deep Neural Network. Remote Sensing. 2025; 17(4):615. https://doi.org/10.3390/rs17040615

Chicago/Turabian Style

Štroner, Martin, Martin Boušek, Jakub Kučera, Hana Váchová, and Rudolf Urban. 2025. "Multi-Size Voxel Cube (MSVC) Algorithm—A Novel Method for Terrain Filtering from Dense Point Clouds Using a Deep Neural Network" Remote Sensing 17, no. 4: 615. https://doi.org/10.3390/rs17040615

APA Style

Štroner, M., Boušek, M., Kučera, J., Váchová, H., & Urban, R. (2025). Multi-Size Voxel Cube (MSVC) Algorithm—A Novel Method for Terrain Filtering from Dense Point Clouds Using a Deep Neural Network. Remote Sensing, 17(4), 615. https://doi.org/10.3390/rs17040615

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Multi-Size Voxel Cube (MSVC) Algorithm—A Novel Method for Terrain Filtering from Dense Point Clouds Using a Deep Neural Network

Abstract

1. Introduction

2. Materials and Methods

2.1. Method Principles

2.2. The Deep Neural Network and Its Training

2.3. Training/Testing Data

2.3.1. Data 1

2.3.2. Data 2

2.4. Testing and Evaluation Procedure

3. Results

3.1. Data 1—Rocks

3.2. Data 2

4. Discussion

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A

Appendix B

Appendix C. Definition a Neural Network in Python Using the Tensorflow Library

Appendix D. Complete Classification Results for Data 2

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI