Author Contributions
Conceptualization, Y.Z. and C.W.; methodology, Y.Z.; validation, Y.Z.; investigation, Y.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.Z., C.W. and H.Z.; supervision, X.X., Z.Y. and M.D.; funding acquisition, C.W. and H.W. All authors have read and agreed to the published version of the manuscript.
Figure 1.
The main structure of TCPSNet.
Figure 3.
Proposed multimodal cross-attention module for deep feature fusion. First, convolution is applied to the hyperspectral data so that their feature size matches that of the LiDAR data. Second, the multi-source remote sensing features undergo secondary processing. Third, positional embeddings and learnable markers are added to the features. Finally, the output is passed to the MHSA block to capture depth information. The multiplication symbols in the figure denote multiplication operations.
Figure 4.
Structure of the cross-head attention module. The HSI data serve as a learnable categorical marker for the LiDAR data during self-attention learning, and an MLP classifier head produces the final output.
Figure 5.
Structure of the cross-pseudo-siamese learning module. The upper branch processes HSI data and the lower branch processes LiDAR data. The plus signs in the figure denote summation.
Figure 6.
Structure of heterogeneous information-induced learning module. The plus and multiplication signs in the figure represent addition and multiplication operations, respectively.
Figure 7.
Structure of the fusion module. F represents feature-level fusion and D represents decision-level fusion. The new feature is obtained from feature-level fusion of HSI and LiDAR.
Figure 8.
Trento dataset and all its feature classes. (a) Pseudo-color composite images in the 20th, 15th and 5th bands based on hyperspectral data. (b) Grayscale image for LiDAR-based DSM. (c) Ground truth map.
Figure 9.
Houston dataset and all its feature classes. (a) Pseudo-color composite images in the 59th, 40th and 23rd bands based on hyperspectral data. (b) Pseudo-color composite images of 3rd, 2nd and 1st bands based on multispectral data. (c) Grayscale image for LiDAR-based DSM. (d) Ground truth map.
Figure 10.
Augsburg dataset and all its feature classes. (a) Pseudo-color composite images in the 20th, 15th and 10th bands based on hyperspectral data. (b) Grayscale image for LiDAR-based DSM. (c) Pseudo-color composite images based on SAR data. (d) Ground truth map.
Figure 11.
Influence of different parameters on the overall accuracy in three datasets. (a) The number of bands retained after PCA dimensionality reduction. (b) Patch size. (c) The number of heads in the multi-head self-attention mechanism. (d) Related parameters of RSJLM.
Figure 12.
Classification maps by different models on the Trento dataset. (a) Ground truth map. (b) 2D-CNN (94.99%). (c) 3D-CNN (95.18%). (d) M2FNet (98.82%). (e) CALC (98.44%). (f) DSHF (98.74%). (g) Coupled CNN (98.87%). (h) Proposed (99.76%).
Figure 13.
Classification maps by different models on the Houston dataset. (a) Ground truth map. (b) 2D-CNN (79.23%). (c) 3D-CNN (87.14%). (d) M2FNet (93.53%). (e) CALC (93.50%). (f) DSHF (91.79%). (g) Coupled CNN (95.73%). (h) Proposed (99.92%).
Figure 14.
Classification maps by different models on the Augsburg dataset. (a) Ground truth map. (b) 2D-CNN (92.53%). (c) 3D-CNN (93.01%). (d) M2FNet (90.61%). (e) CALC (91.43%). (f) DSHF (90.21%). (g) Coupled CNN (92.13%). (h) Proposed (97.41%).
Figure 15.
Ablation experiments using different modules on three datasets (overall accuracy).
Figure 16.
Influence of sample size on overall accuracy for the three datasets. The Trento and Houston datasets are shown as black and blue solid lines and use the upper horizontal axis; the Augsburg dataset is shown as a red dashed line and uses the lower horizontal axis.
Table 1.
Training and test sample numbers for Trento, Houston and Augsburg datasets.
Class | Trento: Class Name | Trento: Training | Trento: Test | Houston 2013: Class Name | Houston 2013: Training | Houston 2013: Test | Augsburg: Class Name | Augsburg: Training | Augsburg: Test |
---|
C01 | Apple Trees | 129 | 3905 | Healthy Grass | 198 | 1053 | Forest | 675 | 12,832 |
C02 | Buildings | 125 | 2778 | Stressed Grass | 190 | 1064 | Residential Area | 1516 | 28,813 |
C03 | Ground | 105 | 374 | Synthetic Grass | 192 | 505 | Industrial Area | 192 | 3659 |
C04 | Woods | 154 | 8969 | Trees | 188 | 1056 | Low Plants | 1342 | 25,515 |
C05 | Vineyard | 184 | 10,317 | Soil | 186 | 1056 | Allotment | 28 | 547 |
C06 | Roads | 122 | 3052 | Water | 182 | 143 | Commercial Area | 82 | 1563 |
C07 | | | | Residential | 196 | 1072 | Water | 76 | 1454 |
C08 | | | | Commercial | 191 | 1053 | | | |
C09 | | | | Road | 193 | 1059 | | | |
C10 | | | | Highway | 191 | 1036 | | | |
C11 | | | | Railway | 181 | 1054 | | | |
C12 | | | | Parking Lot1 | 194 | 1041 | | | |
C13 | | | | Parking Lot2 | 184 | 285 | | | |
C14 | | | | Tennis Court | 181 | 247 | | | |
C15 | | | | Running Track | 187 | 473 | | | |
- | Total | 819 | 29,395 | Total | 2932 | 12,197 | Total | 3911 | 74,383 |
Table 2.
Classification accuracies (%) and Kappa coefficients of different models on the Trento dataset.
| 2D-CNN | 3D-CNN | M2FNet | CALC | DSHF | CoupledCNN | Proposed |
---|
Apple Trees | 94.29 | 94.21 | 99.23 | 98.13 | 99.62 | 99.54 | 99.90 |
Buildings | 85.24 | 87.19 | 98.38 | 99.24 | 99.68 | 97.37 | 99.24 |
Ground | 83.69 | 89.30 | 90.37 | 81.55 | 97.86 | 97.59 | 100.00 |
Woods | 99.59 | 99.91 | 99.59 | 100.00 | 98.53 | 100.00 | 100.00 |
Vineyard | 99.02 | 98.60 | 100.00 | 99.84 | 100.00 | 100.00 | 100.00 |
Roads | 79.03 | 78.93 | 93.48 | 90.20 | 90.43 | 92.40 | 97.77 |
OA (%) | 94.99 | 95.18 | 98.82 | 98.44 | 98.74 | 98.87 | 99.85 |
AA (%) | 90.14 | 91.36 | 96.84 | 94.94 | 97.76 | 97.82 | 99.49 |
Kappa × 100 | 93.29 | 93.58 | 98.42 | 97.91 | 98.31 | 98.48 | 99.58 |
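The OA, AA and Kappa rows in these accuracy tables are assumed here to follow the usual confusion-matrix definitions: overall accuracy is the fraction of correctly classified samples, average accuracy is the mean of the per-class accuracies, and Kappa corrects the overall accuracy for chance agreement. The sketch below is a minimal illustration under that assumption, not the authors' evaluation code; the function name `classification_metrics` and the toy confusion matrix are hypothetical.

```python
def classification_metrics(confusion):
    """Return (OA, AA, Kappa) as fractions in [0, 1].

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Assumes a square matrix in which every
    class has at least one true sample.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    # Overall accuracy: correctly classified samples over all samples.
    oa = correct / total

    # Average accuracy: unweighted mean of the per-class accuracies.
    per_class = [confusion[k][k] / sum(confusion[k]) for k in range(n_classes)]
    aa = sum(per_class) / n_classes

    # Chance agreement from the row (true) and column (predicted) totals.
    row_sums = [sum(confusion[k]) for k in range(n_classes)]
    col_sums = [sum(confusion[i][k] for i in range(n_classes))
                for k in range(n_classes)]
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)

    # Cohen's Kappa: agreement beyond chance, normalised.
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa


# Toy two-class example (hypothetical numbers, not taken from the paper).
toy_confusion = [[8, 2],
                 [1, 9]]
oa, aa, kappa = classification_metrics(toy_confusion)
print(f"OA={oa:.4f}  AA={aa:.4f}  Kappa={kappa:.4f}")
```

As a consistency check on the AA definition, averaging the six per-class accuracies of the Proposed column in Table 2 gives 99.485, which agrees with the reported AA of 99.49 after rounding. The tables report Kappa multiplied by 100, whereas the sketch returns it as a fraction.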
Table 3.
Classification accuracies (%) and Kappa coefficients of different models on the Houston dataset.
| 2D-CNN | 3D-CNN | M2FNet | CALC | DSHF | CoupledCNN | Proposed |
---|
Healthy Grass | 96.30 | 98.10 | 82.24 | 82.05 | 81.20 | 93.83 | 98.39 |
Stressed Grass | 95.77 | 98.68 | 98.68 | 98.59 | 98.78 | 97.27 | 100.00 |
Synthetic Grass | 96.04 | 99.80 | 94.65 | 94.46 | 100.00 | 96.24 | 99.80 |
Trees | 90.63 | 89.30 | 96.40 | 97.35 | 99.81 | 100.00 | 99.81 |
Soil | 98.96 | 99.72 | 99.43 | 100.00 | 100.00 | 100.00 | 100.00 |
Water | 97.90 | 99.30 | 95.80 | 97.90 | 84.62 | 95.80 | 100.00 |
Residential | 74.91 | 90.58 | 92.07 | 90.21 | 83.49 | 93.84 | 99.91 |
Commercial | 53.00 | 74.45 | 92.78 | 91.45 | 69.33 | 95.63 | 100.00 |
Road | 76.49 | 79.60 | 95.00 | 89.52 | 39.81 | 89.14 | 99.62 |
Highway | 74.81 | 52.41 | 89.67 | 94.88 | 80.02 | 98.65 | 100.00 |
Railway | 53.80 | 89.28 | 92.13 | 99.62 | 82.54 | 97.25 | 99.91 |
Parking Lot1 | 51.87 | 80.79 | 92.80 | 87.80 | 94.91 | 90.97 | 98.46 |
Parking Lot2 | 83.86 | 94.39 | 90.18 | 85.96 | 83.51 | 89.12 | 98.25 |
Tennis Court | 99.60 | 96.36 | 100.00 | 100.00 | 96.76 | 97.57 | 100.00 |
Running Track | 98.52 | 100.00 | 99.15 | 100.00 | 99.79 | 99.79 | 100.00 |
OA (%) | 79.23 | 87.14 | 93.53 | 93.50 | 91.79 | 95.73 | 99.92 |
AA (%) | 82.83 | 89.52 | 94.07 | 93.99 | 92.58 | 95.67 | 99.61 |
Kappa × 100 | 77.52 | 86.04 | 92.98 | 92.94 | 91.09 | 95.36 | 99.58 |
Table 4.
Classification accuracies (%) and Kappa coefficients of different models on the Augsburg dataset.
| 2D-CNN | 3D-CNN | M2FNet | CALC | DSHF | CoupledCNN | Proposed |
---|
Forest | 97.20 | 94.97 | 92.25 | 90.46 | 96.34 | 98.59 | 99.02 |
Residential Area | 96.79 | 98.20 | 96.73 | 96.13 | 94.10 | 98.36 | 99.40 |
Industrial Area | 87.87 | 75.62 | 76.89 | 91.38 | 67.18 | 63.11 | 87.65 |
Low Plants | 96.51 | 97.81 | 94.57 | 95.61 | 97.48 | 95.33 | 98.99 |
Allotment | 1.10 | 35.65 | 23.33 | 25.81 | 51.24 | 51.05 | 90.49 |
Commercial Area | 3.58 | 17.15 | 1.16 | 1.65 | 0.08 | 8.55 | 60.84 |
Water | 38.93 | 50.89 | 39.55 | 53.02 | 0.27 | 32.58 | 71.80 |
OA (%) | 92.53 | 93.31 | 90.61 | 91.43 | 90.21 | 92.13 | 97.41 |
AA (%) | 60.28 | 67.18 | 60.64 | 64.87 | 58.05 | 63.94 | 86.88 |
Kappa × 100 | 89.22 | 90.31 | 86.43 | 87.71 | 85.77 | 88.60 | 95.98 |
Table 5.
Classification performance obtained by different fusion methods (overall accuracy (%)).
| F-Sum | F-Max | F-PagFM | DF-Sum | DF-Max | DF-PagFM |
---|
Trento | 99.71 | 99.65 | 99.77 | 99.77 | 99.73 | 99.85 |
Houston | 99.72 | 99.73 | 99.78 | 99.91 | 99.83 | 99.92 |
Augsburg | 97.03 | 96.60 | 97.17 | 97.17 | 97.24 | 97.41 |
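The column names in Table 5 suggest that element-wise sum and element-wise max are two of the compared fusion operators, applied at the feature level (F) or, presumably, at both the decision and feature levels (DF). The sketch below only illustrates what sum fusion, max fusion and a simple score-averaging decision fusion could look like on plain Python lists. The function names, the equal weighting of classifier outputs and the toy vectors are assumptions for illustration; PagFM is not reproduced because its definition is not given in this section.

```python
def _check_same_length(a, b):
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def fuse_sum(hsi_feat, lidar_feat):
    """Feature-level fusion by element-wise addition."""
    _check_same_length(hsi_feat, lidar_feat)
    return [h + l for h, l in zip(hsi_feat, lidar_feat)]


def fuse_max(hsi_feat, lidar_feat):
    """Feature-level fusion by element-wise maximum."""
    _check_same_length(hsi_feat, lidar_feat)
    return [max(h, l) for h, l in zip(hsi_feat, lidar_feat)]


def fuse_decisions(score_vectors):
    """Decision-level fusion (assumed form): average the per-class
    scores produced by several classifier heads with equal weights."""
    n_classes = len(score_vectors[0])
    for scores in score_vectors:
        if len(scores) != n_classes:
            raise ValueError("score vectors must have the same length")
    n_heads = len(score_vectors)
    return [sum(s[k] for s in score_vectors) / n_heads
            for k in range(n_classes)]


def predict(scores):
    """Index of the highest fused class score."""
    return max(range(len(scores)), key=lambda k: scores[k])


# Toy inputs (hypothetical values, not taken from the paper).
hsi = [1.0, 2.0]
lidar = [3.0, 0.5]
print(fuse_sum(hsi, lidar))
print(fuse_max(hsi, lidar))
print(predict(fuse_decisions([[0.2, 0.8], [0.6, 0.4]])))
```

Sum and max need no extra parameters, which makes them natural baselines against a learned fusion module.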
Table 6.
Ablation analysis of different components in TCPSNet on the Trento dataset (overall accuracy (%)).
MCAM | CPSLM | Dy-MFEM | OA (%) |
---|
✓ | ✓ | ✓ | 99.85 |
✓ | ✓ | × | 99.75 |
× | ✓ | ✓ | 99.70 |
✓ | × | ✓ | 99.76 |
✓ | × | × | 99.55 |
× | ✓ | × | 99.69 |
× | × | ✓ | 99.67 |
× | × | × | 99.26 |
Table 7.
Training and test sample numbers for MUUFL and Berlin datasets.
Class | MUUFL: Class Name | MUUFL: Training | MUUFL: Test | Berlin: Class Name | Berlin: Training | Berlin: Test |
---|
C01 | Trees | 150 | 23,096 | Forest | 443 | 54,511 |
C02 | Mostly Grass | 150 | 4120 | Residential Area | 423 | 268,219 |
C03 | Mixed Ground Surface | 150 | 6732 | Industrial Area | 499 | 19,067 |
C04 | Dirt and Sand | 150 | 1676 | Low Plants | 376 | 58,906 |
C05 | Road | 150 | 6537 | Soil | 331 | 17,095 |
C06 | Water | 150 | 316 | Allotment | 280 | 13,025 |
C07 | Buildings Shadow | 150 | 2083 | Commercial Area | 298 | 24,526 |
C08 | Buildings | 150 | 6090 | Water | 170 | 6502 |
C09 | Sidewalk | 150 | 1235 | | | |
C10 | Yellow Curb | 150 | 33 | | | |
C11 | Cloth Panels | 150 | 119 | | | |
- | Total | 1650 | 52,037 | Total | 2820 | 461,851 |
Table 8.
Classification accuracies (%) and Kappa coefficients obtained by different combination methods.
| Houston (M + L) | Houston (H + M) | Augsburg (S + L) | Augsburg (H + S) | MUUFL (H + L) | Berlin (H + S) |
---|
OA (%) | 99.67 | 99.91 | 73.39 | 97.03 | 87.97 | 91.96 |
AA (%) | 99.65 | 99.87 | 47.31 | 85.07 | 71.88 | 92.67 |
Kappa × 100 | 99.61 | 99.81 | 62.18 | 95.74 | 85.34 | 87.86 |