**3. Dataset**

The dataset comprised hyperspectral UAV images acquired at two sites in Jeonju City, South Korea. The images were acquired on September 19, 2019, by a DJI Matrice 200 UAV equipped with a hyperspectral sensor (Corning microHSI SHARK 410). This platform provides accurate flight control and inherent stability. The spatial resolution was 15 cm, and the spectral resolution was 4 nm over 150 bands ranging from 398.78 to 996.74 nm. The UAV flew along preset waypoints at a flight height of 200 m, and the whole study area (890 m × 730 m) was covered in 15 courses. Two study sites of 600 m × 600 m, where errors associated with camera shaking and geometric problems were few, were selected from the whole area. The images were georeferenced to the WGS-84 coordinate system. The center coordinates of Sites 1 and 2 were (35°48′19″ N, 127°05′45″ E) and (35°47′16″ N, 127°07′14″ E), respectively (Figure 5). These sites included crop lands, forests, and built-up areas. Owing to the high spatial resolution of the hyperspectral UAV images, objects such as vehicles, road centerlines, and shadows could be identified in addition to buildings and trees. As such information was unnecessary for updating the cadastral map, the spatial resolution of the images was reduced to 60 cm, which limited the number of classification classes and reduced the memory requirements of deep learning. Prior to classification, the images were pre-processed with geometric and radiometric corrections based on GNSS and field spectrometry data.
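The reduction from 15 cm to 60 cm pixels corresponds to a spatial downsampling factor of 4 in each direction. A minimal sketch of one common way to do this, block averaging over the spatial axes of a (rows, cols, bands) cube, is shown below; the source does not specify the resampling method, so this is an illustrative assumption, and the synthetic cube dimensions are arbitrary:

```python
import numpy as np

def block_average(cube, factor):
    """Downsample a (rows, cols, bands) cube by averaging non-overlapping
    factor x factor spatial blocks; the spectral axis is kept intact."""
    rows, cols, bands = cube.shape
    r, c = rows // factor, cols // factor
    trimmed = cube[:r * factor, :c * factor, :]
    return trimmed.reshape(r, factor, c, factor, bands).mean(axis=(1, 3))

# Going from 15 cm to 60 cm pixels corresponds to a factor of 4.
cube = np.random.rand(8, 8, 150).astype(np.float32)   # synthetic example cube
low_res = block_average(cube, 4)                      # shape (2, 2, 150)
```

Block averaging acts as a simple anti-aliasing filter, which is preferable to naive subsampling when coarsening remote sensing imagery.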

**Figure 5.** Locations of the two study sites in South Korea, along with their UAV hyperspectral images. The background map is the ArcGIS world map [32]; ArcGIS is a geographic information system (GIS) for working with maps and geographic information, maintained by Esri. The hyperspectral UAV images were obtained on 19 September 2019.

Figure 6 shows the cadastral maps of the study sites, which contain 284 and 250 parcels at Sites 1 and 2, respectively. We obtained the most recently updated serial cadastral map, dated January 2018. In Korea, the land categories of cadastral maps are divided into 28 items. The study sites included 17 of these land category items: building sites, paddy fields, fields, park sites, school sites, roads, forests, reservoirs, miscellaneous land, sites for religious use, parking lots, ditches, factory sites, cemeteries, gas station sites, sports areas, and ranches.

**Figure 6.** Cadastral map of the study sites in South Korea: (**a**) Site 1, (**b**) Site 2.

**4. Results**

*4.1. Classification Results*

The hyperspectral UAV images were classified by the proposed hybrid CNN. The network was optimized over 30 epochs using the Adam optimizer with a learning rate of 10<sup>−3</sup> and a batch size of 256. The Adam optimizer combines stochastic gradient descent with momentum and RMSprop; it has relatively low memory requirements and is computationally efficient [33]. At the start of each training run, the network was randomly initialized. The ground-truth data were manually defined from the field data, which provided the spectral libraries and the types of surface materials. The ground truth comprised 88,567 pixels in six classes: crops, forests, roads, buildings, bare soil, and water. Classes that could be mapped to the land category items were then defined: the various crop lands and grass covers were combined into "crop land," and relatively tall trees were classified as "forest." Colored roofs, such as blue, brown, and white ones, were all classified as "building." "Bare soil" represented ground without buildings or vegetation, and "road" encompassed asphalt roadways. The ground-truth data were randomly divided into training, validation, and test samples. Sixty percent of the ground-truth data (53,140 pixels) were used as training samples, which were further subdivided into training and validation data at a ratio of 7:3 to avoid overfitting. The remaining 40% (35,427 pixels) were reserved as test samples. The performance of the proposed network was estimated from the classification accuracy on the test data.
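The two-stage split described above can be sketched as follows. The random seed and the shuffling scheme are illustrative assumptions (the paper does not specify them), and the 7:3 subdivision is taken to give the larger share to training:

```python
import numpy as np

rng = np.random.default_rng(seed=0)      # seed chosen for reproducibility
n_pixels = 88_567                        # labeled ground-truth pixels
indices = rng.permutation(n_pixels)

# 60% of the ground truth forms the training pool; 40% is held out as test.
n_pool = round(n_pixels * 0.6)           # 53,140 pixels
pool_idx, test_idx = indices[:n_pool], indices[n_pool:]

# Subdivide the pool into training and validation data at a 7:3 ratio.
n_train = round(n_pool * 0.7)
train_idx, val_idx = pool_idx[:n_train], pool_idx[n_train:]

print(len(pool_idx), len(test_idx))      # 53140 35427
```

Permuting indices once and slicing guarantees that the three subsets are disjoint, so no test pixel leaks into training or validation.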

To confirm the effectiveness of the hybrid network, the classification accuracies of the 2D- and 3D-CNNs were compared. Both networks were composed of three convolutional layers and used the same variables as the hybrid CNN. In each experiment, the performance of the network was evaluated by the F1 scores of the six classes and the overall accuracy (OA). The F1 score measures classification accuracy as the harmonic mean of precision and recall (Equation (4)). Precision is the fraction of correctly retrieved instances among all retrieved instances, and recall is the fraction of correctly retrieved instances among all relevant instances.

$$\text{F1 score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.\tag{4}$$
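For a single class, Equation (4) can be computed directly from the true-positive, false-positive, and false-negative pixel counts. The counts below are toy values for illustration, not results from Table 2:

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1 score from true-positive,
    false-positive, and false-negative pixel counts (Equation (4))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy counts for one class (hypothetical, not from Table 2):
p, r, f1 = precision_recall_f1(tp=95, fp=5, fn=10)
```

As a harmonic mean, F1 is pulled toward the smaller of precision and recall, so a class cannot score well by maximizing one at the expense of the other.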

Figure 7 shows the classification losses and accuracies in each epoch for the training and validation samples of Sites 1 and 2. The hybrid CNN achieved lower classification loss and higher accuracy than both the 2D-CNN and the 3D-CNN. Relative to the 2D-CNN, the loss reduction and accuracy improvement of the hybrid CNN became more noticeable as the number of epochs increased. Although the 3D-CNN was also more accurate than the 2D-CNN, it was less accurate and incurred higher losses than the hybrid CNN at Site 1. These results indicate that the 3D-CNN was more useful than the 2D-CNN for classifying hyperspectral images, but that combining the 2D and 3D CNNs further improved the classification performance.
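The difference between the two convolution types is that a 2D convolution treats the spectral bands only as input channels, whereas a 3D convolution also slides its kernel along the spectral axis and thus learns local spectral structure. A rough parameter-count comparison makes this concrete; the layer sizes below are assumed for illustration and are not the paper's actual configuration:

```python
def conv2d_params(in_ch, out_ch, k):
    """Weights (+ one bias per filter) of a 2-D conv layer with a
    k x k spatial kernel; spectral bands enter only as input channels."""
    return out_ch * (in_ch * k * k + 1)

def conv3d_params(in_ch, out_ch, k, k_spec):
    """Weights (+ one bias per filter) of a 3-D conv layer whose kernel
    also spans k_spec neighbouring spectral bands."""
    return out_ch * (in_ch * k * k * k_spec + 1)

# Illustrative sizes (assumed): 150 bands, 32 filters, 3x3 spatial kernel.
p2d = conv2d_params(in_ch=150, out_ch=32, k=3)          # bands as channels
p3d = conv3d_params(in_ch=1, out_ch=32, k=3, k_spec=7)  # bands as a depth axis
```

With these sizes the 3D layer is far smaller because it shares one small spectral kernel across all bands instead of learning a separate weight per band, which is one reason 3D convolutions suit hyperspectral data.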

**Figure 7.** Classification losses and accuracies in each epoch on (**a**) training samples, (**b**) validation samples at Site 1, (**c**) training samples, (**d**) validation samples at Site 2.

Figure 8 shows the classification results of the hyperspectral UAV images obtained by the hybrid CNN, and Table 2 lists the F1 scores and overall accuracies of the six classes. The OAs of the land cover classifications at Sites 1 and 2 were 99.93% and 99.75%, respectively. Because the ground-truth data did not cover the entire study area, these values represent the classification accuracy of the randomly selected test samples rather than of the entire image. According to Table 2, all six classes were well classified. As there was no water at Site 2, the results of this site were divided into five classes. Forests and roads obtained lower F1 scores than the other classes because the spectral characteristics of crop land and forest were very similar. Furthermore, roads, parking lots, and cars were all classified into the "road" class, and variously colored roofs were classified into the "building" class. Moreover, areas that appeared to be farmland with low vegetation were classified as "bare soil." Pixel-level classification errors in the results can be considered insignificant because the inconsistency comparison is conducted at the parcel level.

**Table 2.** Classification results of the South Korean sites: F1 score and overall accuracy.


**Figure 8.** Classification results of hyperspectral UAV images at the two South Korea sites: (**a**) Site 1, (**b**) Site 2.
