1. Introduction
Reconstruction of three-dimensional (3-D) building models in urban areas has been a hot topic in remote sensing, photogrammetry, and computer vision for more than two decades [1]. Tomographic synthetic aperture radar (TomoSAR) is a remote sensing technique that extends the conventional two-dimensional (2-D) SAR imaging principle to three-dimensional (3-D) imaging [2] (see Figure 1). SAR can operate day and night because it actively emits its own signals. Moreover, SAR is almost independent of weather conditions because radar signals use microwaves, which is a major advantage compared to sensors operating in the visible or infrared spectrum [3,4]. Very high resolution (VHR) satellite SAR imagery nowadays offers sub-meter resolution, and TomoSAR techniques introduce the idea of the synthetic aperture to the elevation direction, which makes it possible to reconstruct 3-D building models from SAR images. Due to its coherent nature and short wavelengths, SAR is capable of assessing the deformation of the ground and buildings on the order of centimeters to millimeters, and it supports various important application scenarios, such as damage assessment in urban areas after natural disasters [6]. Spaceborne TomoSAR is particularly suited for the long-term monitoring of such dynamic processes [7], and it will play a very important role in terrain monitoring and building deformation detection, especially the deformation monitoring of large man-made facilities. In urban remote sensing, building information retrieval and reconstruction from SAR images have been extensively investigated [8]. In recent years, scholars have mainly focused on TomoSAR 3-D imaging algorithms, which can be roughly divided into three categories: backward projection [9], spectral estimation [10], and compressive sensing [11,12,13,14,15,16,17,18,19,20,21]. TomoSAR 3-D imaging algorithms, such as compressive sensing and spectral-estimation-based multiple signal classification (MUSIC), are relatively mature at present, but they produce 3-D point clouds with unavoidable noise that seriously deteriorates the quality of 3-D imaging and the reconstruction of buildings over urban areas [5]. The point clouds obtained by these methods cannot be directly applied. Therefore, how to reconstruct a building from a point cloud submerged in noise is an urgent problem in the TomoSAR research field.
The reconstruction of buildings can also be carried out by relying on, or combining with, other means [21,22]. Building radar footprints can be automatically detected and reconstructed from single very high resolution SAR images [6], and 2-D and 3-D building shapes can be reconstructed from spaceborne TomoSAR point clouds using a region-growing algorithm [22]. Today, machine learning and deep learning are widely used in remote sensing [23]. Recurrent 3-D fully convolutional networks [23] are used for change detection in hyperspectral images. A deep recurrent neural network [24] is utilized for the agricultural classification of SAR images. A neural network [5] has been introduced for 3-D reconstruction from a TomoSAR point cloud, but if the data are not cleaned well, the result will not be satisfactory.
Toward solving this issue, a method combining the Hough transform and clustering is proposed in this study for filtering these interferences and retrieving the continuous outline of a building in the point cloud. The Hough transform was adopted to detect the outline of a building; however, on one hand, the outline obtained from the Hough transform is broken, and on the other hand, some of these broken lines belong to the same segment of the building outline while their line parameters are slightly different. These problems lead to a situation in which one segment of a building outline is represented by multiple different parameter sets in the Hough transform. Therefore, an unsupervised clustering method was employed to cluster these line parameters. The lines gathered in the same cluster were considered to correspond to the same segment of the building outline. In this way, the different line parameters corresponding to one segment of the building outline were integrated into one set, and the continuous outline of the building in the point cloud was obtained. The steps of the proposed data processing method were as follows. First, the Hough transform was used to detect the lines on the tomographic plane in the TomoSAR point clouds. These detected lines lay on the outline of the building, but they were broken due to the density variation of the point clouds. Second, the lines detected using the Hough transform were grouped as a data set for estimating the building outline. Unsupervised clustering was utilized to classify the lines into several clusters. The cluster number was automatically determined by the unsupervised clustering algorithm, which means the number of straight segments of the building edge was obtained. The lines in each cluster were considered to belong to the same straight segment of the building outline. Then, within each cluster, which represented a part or a segment of the building edge, a repaired straight line was constructed.
Third, between each two clusters, that is, each two segments of the building outline, the joint point was estimated by extending the two segments. In this way, the building outline was recovered as completely as possible. Finally, taking the estimated building outline as the clustering center, a supervised learning algorithm was used to classify the building point cloud and the noise (or false targets), and the building point cloud was thereby refined. Then, both the refined and unrefined data were fed into the neural network for the 3-D building reconstruction. The comparison results showed the correctness and the effectiveness of the improved method.
3. 3-D Building Reconstruction from TomoSAR Point Clouds
For simplicity, in the following analysis, we assumed that the input point clouds of our method were projected from the range direction to the ground-range using the method described in Zhou et al. [
5]. The schematic geometry of one single target building is illustrated in
Figure 5. Considering the side-look imaging principle of TomoSAR, we made a simplifying assumption that the SAR did not penetrate the building and ground, such that the surface scatterers could be “seen” only from one side of the building [
5]. Then, we directly used these surface scatterers to represent the building structure. An ideal three-dimensional reconstruction of the point clouds would return all the blue dots in
Figure 5 to the red line as much as possible, and remove the other false targets.
3.1. Hough Transform in the Tomographic Plane of 3-D Point Clouds
Typical compressive sensing (CS) algorithms, such as orthogonal matching pursuit (OMP) and regularized OMP (ROMP), are employed to invert for the scatterers on the building surface. However, due to the existence of noise, a few false targets and outliers can still be scattered in a disorderly manner in the tomographic plane, which causes the building to be submerged in these disorganized false targets. Therefore, it is necessary to detect the outline of the building in the tomographic plane. The outline of a regular building in the tomographic plane is usually composed of several connected straight line segments. A commonly used method of line detection is the Hough transform, which is widely used in computer vision and pattern recognition. The principle of straight-line detection with a Hough transform is as follows. The algorithm is essentially a voting process in which each point belonging to a pattern votes for all the possible patterns passing through that point. These votes are accumulated in an array of accumulator cells called bins, and the pattern receiving the maximum number of votes is recognized as the desired pattern [25].
Given an $M \times N$ binary edge image, straight lines are defined as:

$$x\cos\theta + y\sin\theta = \rho \tag{1}$$

where $(x, y)$ is the measurement of the position in the $x$–$y$ coordinates, $\theta$ ($\theta \in [0, 2\pi)$) denotes the angle that the norm line makes with the x-axis, and $\rho$ ($\rho \geq 0$) is the norm distance from the origin to the line. As shown in Figure 6, $(x_0, y_0)$ denotes the coordinates of the red point, and $\theta$ and $\rho$ are defined as the parameter space of the Hough transform. A straight line is mapped to a point in the parameter space of the Hough transform.
For example, every straight line through $(x_0, y_0)$, such as the red dot (see Figure 6) in the $x$–$y$ coordinates, is mapped to a point in the parameter space of the Hough transform. Then, all the possible straight lines through the red dot in the $x$–$y$ coordinates form a curved line, such as the red dashed curve, in the parameter space of the Hough transform. For all parameter cells $(\theta, \rho)$, the Hough transform algorithm calculates the parameter values and accumulates the pixels that drop in the parameter cells $(\theta, \rho)$. If enough pixels are mapped to a parameter cell $(\theta, \rho)$, then $(\theta, \rho)$ is determined to be a line in the $x$–$y$ coordinates; if not, it is determined to be noise.
The Hough transform computation consists of three parts: (1) calculating the parameter values and accumulating the pixels in the parameter space of the Hough transform, (2) finding the local maximums that represent line segments, and (3) extracting the line segments using the knowledge from the maximum positions. It visits each pixel of the image once.
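The voting scheme described above can be sketched as follows. This is a minimal illustration that operates directly on 2-D point coordinates (as would come from a TomoSAR point cloud in the tomographic plane) rather than on image pixels; the function name, bin counts, and vote threshold are our assumptions, not the authors' implementation.

```python
import numpy as np

def hough_lines(points, theta_bins=360, rho_bins=200, min_votes=2):
    """Vote-based straight-line detection on a set of 2-D points.
    Uses the parameterisation x*cos(theta) + y*sin(theta) = rho with
    theta in [0, 2*pi) and rho >= 0; returns (theta, rho, votes) for
    every accumulator cell that receives at least `min_votes` votes."""
    pts = np.asarray(points, dtype=float)
    thetas = np.linspace(0.0, 2.0 * np.pi, theta_bins, endpoint=False)
    rho_max = np.abs(pts).sum(axis=1).max() + 1e-9      # upper bound on rho
    rho_edges = np.linspace(0.0, rho_max, rho_bins + 1)
    acc = np.zeros((theta_bins, rho_bins), dtype=int)
    for x, y in pts:                    # each point votes along its sinusoid
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        valid = rho >= 0.0              # keep only the non-negative-rho half
        idx = np.clip(np.digitize(rho[valid], rho_edges) - 1, 0, rho_bins - 1)
        acc[np.flatnonzero(valid), idx] += 1
    return [(thetas[i], 0.5 * (rho_edges[j] + rho_edges[j + 1]), int(acc[i, j]))
            for i, j in np.argwhere(acc >= min_votes)]
```

For four collinear points on the vertical line $x = 1$, the cell near $(\theta, \rho) = (0, 1)$ collects all four votes, while scattered outliers never pass the vote threshold.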
3.2. Unsupervised and Supervised Clustering
After the Hough transform, the building outline is initially estimated in the tomography plane. However, on one hand, the detected lines were often broken due to the existing noise or the different density of the point cloud, as shown in
Figure 7 (red line segments). On the other hand, some of these broken lines belonged to the same segment of the building outline, but the parameters of these lines were slightly different. Taking the Hough transform parameters $(\theta, \rho)$ as the data features, which are shown in Table 1, and utilizing the K-means clustering method, which is a typical unsupervised learning algorithm, the detected lines (red line segments) were grouped into several clusters, as illustrated in Figure 7b (each color represents a cluster). Here, we used the parameter distance for clustering. The parameter distance between lines $i$ and $j$ can be written as:

$$D(i, j) = \sqrt{(\theta_i - \theta_j)^2 + (\rho_i - \rho_j)^2} \tag{2}$$
For example, for two lines in Table 1 with parameters $(\theta_i, \rho_i)$ and $(\theta_j, \rho_j)$, if $D(i, j) < \varepsilon$ (where $\varepsilon$ is very small), these two lines belong to the same straight segment. However, one should be careful when treating situations such as $\theta \approx 0$, $\theta \approx 2\pi$, and $\rho \approx 0$. This is caused by the characteristics of the Hough transform; here, we modified the distance formula to become:

$$D'(i, j) = \min\left\{\sqrt{\delta^2 + (\rho_i - \rho_j)^2},\ \sqrt{(\pi - \delta)^2 + (\rho_i + \rho_j)^2}\right\}, \quad \delta = \min\left(|\theta_i - \theta_j|,\ 2\pi - |\theta_i - \theta_j|\right) \tag{3}$$
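A wrap-aware parameter distance of this kind can be implemented as follows. This is our illustration, assuming $\theta \in [0, 2\pi)$ and $\rho \geq 0$; it treats the angle wrap-around at $0/2\pi$ and the ambiguity of lines near the origin (where $\theta$ can flip by $\pi$) as the same line.

```python
import math

def param_distance(t1, r1, t2, r2):
    """Distance between two Hough line parameterisations (theta, rho)
    that treats angle wrap-around and the theta +/- pi ambiguity of
    near-origin lines as close. Assumes theta in [0, 2*pi), rho >= 0."""
    dt = abs(t1 - t2)
    delta = min(dt, 2 * math.pi - dt)               # angular difference with wrap
    plain = math.hypot(delta, r1 - r2)              # ordinary parameter distance
    flipped = math.hypot(math.pi - delta, r1 + r2)  # same line, opposite normal
    return min(plain, flipped)
```

With this distance, two noisy detections of one vertical line, one registered near $\theta \approx 0$ and the other near $\theta \approx 2\pi$, come out close, as do two detections of a line through the origin whose angles differ by $\pi$.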
For example, in Table 1, lines 7 and 9 belong to the same straight segment even though their ordinary parameter distance $D$ is large, and the modified distance $D'$ correctly remains small. For the situation $\theta \approx \pi/2$, which corresponds to horizontal lines in the image, if there is a small error in the Hough transform, it means that $\theta_i \approx \theta_j$ and $\rho_i \approx \rho_j$. For the situation $\theta \approx 0$ (or $\theta \approx 2\pi$), which corresponds to vertical lines in the image, if there is a small error in the Hough transform, it means $\theta_i \approx 0$, $\theta_j \approx 2\pi$, and $\rho_i \approx \rho_j$. For the situation $\rho \approx 0$, which corresponds to sloped lines passing through the origin of the coordinates in the image, if there is a small error in the Hough transform, it means $\rho_i \approx \rho_j \approx 0$ while $|\theta_i - \theta_j| \approx \pi$.
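As a concrete illustration of the grouping step, the sketch below uses a simple threshold-based single-linkage grouping rather than K-means, since it pairs naturally with a wrap-aware parameter distance and yields the number of clusters automatically; treat it as a stand-in for, not a reproduction of, the clustering procedure in the text, with the threshold `eps` being our assumption.

```python
import math

def group_lines(lines, eps=0.2):
    """Greedy single-linkage grouping of (theta, rho) line parameters.
    Two lines join the same cluster when their wrap-aware parameter
    distance is below `eps`; the cluster count falls out automatically."""
    def dist(a, b):
        dt = abs(a[0] - b[0])
        delta = min(dt, 2 * math.pi - dt)
        return min(math.hypot(delta, a[1] - b[1]),
                   math.hypot(math.pi - delta, a[1] + b[1]))
    clusters = []
    for ln in lines:
        for c in clusters:              # join the first cluster within eps
            if any(dist(ln, m) < eps for m in c):
                c.append(ln)
                break
        else:                           # no cluster close enough: start one
            clusters.append([ln])
    return clusters
```

For example, four detected lines, two near-vertical and two near-horizontal, fall into two clusters, i.e., two straight segments of the building outline.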
The lines in each cluster belong to the same segment of the building outline. As shown in Figure 7b, the yellow lines, green lines, and red lines represent the lines in different segments of the building outline. Within each cluster, i.e., each segment, the parameter pair $(\hat{\theta}_k, \hat{\rho}_k)$, which represents one segment of the building outline, is estimated. The parameter $K$ is the number of clusters and it is obtained from the procedure of unsupervised clustering. Using the estimated parameters $(\hat{\theta}_k, \hat{\rho}_k)$, the continuous building outline function $f_k(x, y)$ is established. It can be written as:

$$f_k(x, y) = x\cos\hat{\theta}_k + y\sin\hat{\theta}_k - \hat{\rho}_k = 0, \quad k = 1, \ldots, K \tag{4}$$

where the estimated parameters $\hat{\theta}_k$ and $\hat{\rho}_k$ are:

$$\hat{\theta}_k = \frac{1}{|C_k|}\sum_{i \in C_k}\theta_i, \qquad \hat{\rho}_k = \frac{1}{|C_k|}\sum_{i \in C_k}\rho_i \tag{5}$$

with $C_k$ denoting the set of detected lines in the $k$-th cluster.
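One subtlety when averaging the line parameters within a cluster is that angles near the two ends of the $[0, 2\pi)$ range should average to a value near zero, not near $\pi$; a circular mean handles this. The sketch below is our illustration of such an estimator, not necessarily the exact one used in the text.

```python
import math

def cluster_params(cluster):
    """Mean (theta, rho) of a cluster of Hough lines, using a circular
    mean for theta so values near 0 and 2*pi average correctly."""
    s = sum(math.sin(t) for t, _ in cluster)
    c = sum(math.cos(t) for t, _ in cluster)
    theta = math.atan2(s, c) % (2 * math.pi)        # circular mean of angles
    rho = sum(r for _, r in cluster) / len(cluster)  # ordinary mean of distances
    return theta, rho
```

For a cluster containing lines at $\theta \approx 0.1$ and $\theta \approx 2\pi - 0.1$, the circular mean returns an angle near zero, whereas a naive arithmetic mean would wrongly give an angle near $\pi$.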
Then, taking the estimated building outline function $f_k(x, y)$ as the clustering center, supervised clustering was used to separate the false targets and the noise from the building point cloud. The method can be expressed as:

$$d(x, y) = \min_{k}\left|x\cos\hat{\theta}_k + y\sin\hat{\theta}_k - \hat{\rho}_k\right| \tag{6}$$

where $(x, y)$ is the position of an arbitrary pixel in the binary image. If $(x, y)$ is very close to the building outline, Equation (6) almost equals zero, and the pixel is determined to belong to the building point cloud. Conversely, the point is determined to be noise or a false target.
Figure 7c,d exhibits the procedure of clustering.
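The supervised separation of building points from noise described above can be sketched as follows; this is a minimal illustration, and the distance threshold `tol` is our assumption, not a value from the text.

```python
import math

def classify_points(points, outline_params, tol=0.5):
    """Split 2-D points into building points and noise by their minimum
    normal distance to the estimated outline lines (theta_k, rho_k)."""
    building, noise = [], []
    for x, y in points:
        d = min(abs(x * math.cos(t) + y * math.sin(t) - r)
                for t, r in outline_params)
        (building if d <= tol else noise).append((x, y))
    return building, noise
```

For instance, with a single estimated outline segment at $(\hat{\theta}, \hat{\rho}) = (0, 1)$ (the vertical line $x = 1$), a point just next to the line is kept as a building point while a distant point is rejected as noise.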