*3.2. Feature Extraction*

The convolutional layers extract the overall features by sliding a convolutional kernel over the image pixels with a fixed stride, a process known as feature mapping (Figure 4). Each kernel is an n × n matrix of weight values. Because spatially adjacent pixels in an image are strongly correlated [41], the high dimensional feature vector produced after multiple convolutional layers can represent the overall features of the input image.

**Figure 4.** Schematic of a convolutional kernel sweeping over an image for feature mapping.
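
The sketch below, which is an assumed illustration rather than part of the study's code, shows this sweeping operation in plain NumPy: a single 3 × 3 kernel slides over a grayscale image with a stride of 1 and produces one feature map.

```python
# Minimal sketch (illustrative only): one kernel sweeping over an image.
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `image` and compute the weighted sum at each step."""
    k = kernel.shape[0]                          # kernel is an n x n weight matrix
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            feature_map[i, j] = np.sum(patch * kernel)   # element-wise product, then sum
    return feature_map

# Example: a 6 x 6 image and a 3 x 3 edge-like kernel
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)
print(convolve2d(image, kernel).shape)           # (4, 4) feature map
```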

To quantify the urban morphological features, we used the deep convolutional neural network GoogLeNet with the Inception-v3 module. The Inception-v3 architecture contributes to the high performance of deep convolutional neural networks on image classification because it is sensitive to the context of the input images [42]. A pre-trained GoogLeNet can be used for various feature extraction tasks; here, the kernel weights were optimized by training the model on the ImageNet dataset [43]. In this way, the network maps the input image to a high dimensional feature vector through feature mapping. Figure 5 briefly illustrates the structure of GoogLeNet.

**Figure 5.** The structure of GoogLeNet; the 2048-dimensional output of the bottleneck layer is extracted as the high dimensional feature vector (HDFV).
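
The Inception design can be summarized by the simplified, assumed sketch below (not the exact Inception-v3 module): parallel branches with different kernel sizes see the same input at several scales, and their feature maps are concatenated along the channel axis, which is what makes the module responsive to image context at multiple scales.

```python
# Simplified Inception-style block (illustrative only, not the exact Inception-v3 module).
import torch
import torch.nn as nn

class SimpleInceptionBlock(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        self.branch1 = nn.Conv2d(in_channels, 16, kernel_size=1)              # 1x1 branch
        self.branch3 = nn.Conv2d(in_channels, 16, kernel_size=3, padding=1)   # 3x3 branch
        self.branch5 = nn.Conv2d(in_channels, 16, kernel_size=5, padding=2)   # 5x5 branch
        self.pool = nn.Sequential(                                             # pooling branch
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, 16, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the multi-scale responses channel-wise
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.pool(x)], dim=1)

x = torch.randn(1, 3, 32, 32)             # a dummy 3-channel input
print(SimpleInceptionBlock(3)(x).shape)   # torch.Size([1, 64, 32, 32])
```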

Passing an image through a trained convolutional neural network up to the bottleneck layer can be viewed as a feature extraction process for the image [44]. Typically, the final layer of a convolutional neural network outputs values between 0 and 1 that represent the predicted probabilities of the categories. The layer preceding this final linear layer is the so-called bottleneck layer, whose output size is 1 × 1 × 2048. The bottleneck layer's output can be considered a more concise and representative feature vector of the image, capturing the features learned by the network. We therefore take this penultimate layer's output, which maps the input image to a 1 × 1 × 2048 feature vector, and collect it as the HDFV for further comparison. We carried out a comparative study of case retrieval performance with respect to plot shape and building distribution, using the plot shape alone and the plot with its distributed buildings as independent inputs.
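
A minimal sketch of this step is given below, using torchvision's pretrained Inception-v3 as a stand-in for the GoogLeNet described above (an assumption, not the study's exact implementation): the final classification layer is replaced with an identity so that the 1 × 1 × 2048 bottleneck output is returned directly as the HDFV, and two HDFVs are compared with cosine similarity as a simple retrieval score. The file names are hypothetical.

```python
# Sketch (assumed): extract the 2048-d bottleneck output as the HDFV.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = nn.Identity()   # drop the 1000-class linear layer; keep the 2048-d bottleneck output
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((299, 299)),                      # Inception-v3 expects 299 x 299 inputs
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

def extract_hdfv(path: str) -> torch.Tensor:
    """Map an image (e.g., a plot-shape or plot-with-buildings drawing) to a 2048-d HDFV."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)  # shape: 1 x 3 x 299 x 299
    with torch.no_grad():
        return model(x).squeeze(0)                                # shape: 2048

# Hypothetical file names, for illustration only
v1 = extract_hdfv("plot_shape.png")
v2 = extract_hdfv("plot_with_buildings.png")
similarity = torch.nn.functional.cosine_similarity(v1, v2, dim=0)
print(similarity.item())   # higher value = more similar morphological features
```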
