Article

Refined Land Use Classification for Urban Core Area from Remote Sensing Imagery by the EfficientNetV2 Model

1
School of Architecture and Art, Hebei University of Engineering, Handan 056038, China
2
School of Architecture and Urban Planning, Nanjing University, Nanjing 210093, China
3
Department of Biomedical Engineering, National University of Singapore, Singapore 117583, Singapore
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 7235; https://doi.org/10.3390/app14167235
Submission received: 10 July 2024 / Revised: 10 August 2024 / Accepted: 15 August 2024 / Published: 16 August 2024

Abstract

In the context of accelerated urbanization, assessing the quality of the existing built environment plays a crucial role in urban renewal. Most existing research with deep learning models classifies land into coarse categories such as urban construction areas, forest land, and farmland. These categories are not conducive to an accurate analysis of the spatial distribution of urban green space, parking space, blue space, and squares. A small-sample data set of refined land use classifications for urban built-up areas was therefore produced from remote sensing images. Large-scale remote sensing images were classified using deep learning models, with the objective of inferring the fine land category of each tile image. In this study, Level-19 Google RGB three-channel satellite remote sensing images of four cities, Handan, Shijiazhuang, Xingtai, and Tangshan, were acquired to establish a data set containing fourteen urban land use classifications. The convolutional neural network model EfficientNetV2, a framework that performs well on computer vision tasks, is trained and validated to enable intelligent classification of urban remote sensing images. The classification performance is compared and analyzed through accuracy, precision, recall, and F1-score. The results show that the EfficientNetV2 model achieves a classification accuracy of 84.56% on the constructed data set, and the testing-set accuracy increases sequentially after transfer learning. This paper verifies that the proposed research framework has good practicality and that the land use classification results are conducive to fine-grained quantitative analysis of built-environment quality.

1. Introduction

The current development of urbanization has led to the emergence of large-scale and complex land use clusters in cities [1,2,3,4], and the information in urban remote sensing images has become more diverse [5,6]. Classification and detection of multiple urban land types through remote sensing image interpretation is a current research trend and has gradually been applied in fields such as land use [7,8,9], urban planning [10,11], and geographic information [12,13]. Therefore, more accurate and efficient remote sensing image interpretation techniques are urgently needed to make land use classification methods intelligent and automated [11]. Existing studies lack data sets and deep learning model applications for land use classification based on civil three-channel remote sensing images, and solving this problem will improve the practicality of land use classification techniques for remote sensing images.
Traditional urban remote sensing image recognition methods have limitations such as limited information extraction, long processing times, and low efficiency [14]. Therefore, how to recognize the information in urban remote sensing images with high efficiency and high accuracy has become a challenge for current research [15,16,17,18,19]. With research breakthroughs in computer vision methods, machine learning has transitioned from traditional shallow learning to deep learning [20,21,22,23]. Deep learning, an emerging machine learning technology, models the functioning of the human brain using multi-layer neural networks and can automatically learn feature representations [24,25,26]. Convolutional neural networks have demonstrated inherent advantages in autonomously extracting target features, enabling more sophisticated algorithms to automatically extract higher-level feature representations [27,28]. Convolutional neural network models have evolved from the original five-layer structure to later architectures such as AlexNet [29], VGGNet [30], ResNet [21], and EfficientNet [31,32], and their use for image classification has become a mainstream trend [33,34,35,36]. AlexNet emerged victorious in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, introducing deep CNNs into the computer vision community. Its application in remote sensing image classification has also demonstrated promising results [29,37]. VGGNet is renowned for its simplicity and the exclusive use of 3 × 3 convolutional filters. This method has been successfully applied to various remote sensing image classification tasks [30]. GoogLeNet, also known as Inception, introduced the Inception module, which enables more efficient computation and improved feature extraction.
This architecture has been utilized for land use classification in remote sensing imagery, achieving high accuracy. ResNet (Residual Networks), with its introduction of skip connections and residual blocks, allows the training of very deep networks. ResNet has achieved state-of-the-art performance in many remote sensing image classification tasks [21,38]. Despite the significant potential demonstrated by convolutional neural networks in land use classification, challenges remain, including limited labeled data, class imbalance, and model interpretability.
Compared with traditional image recognition methods, deep learning networks are more effective at extracting high-level features from image information for detection, leading to constantly improving accuracy in remote sensing image classification [13,39,40,41,42]. For example, Liu and Shi [40] used a convolutional neural network model to classify land use in remote sensing images, employing a deep architecture to extract image features and obtain better classification results. Yao et al. [12] constructed a neural network (TR-CNN) that fused time-series power data and remote sensing images to identify urban land use types. Yu et al. [43] combined U-Net and DenseASPP (Densely connected Atrous Spatial Pyramid Pooling) convolutional neural networks, using an attention mechanism to efficiently fuse multi-source semantic information from the outputs of the two networks, and realized high-precision urban land use classification. Refining land use classification from remote sensing images and inputting them into deep learning models enables these models to effectively analyze and reason about land use classification. This approach allows quick and accurate identification of construction categories within the existing built environment, ultimately leading to more precise and efficient evaluation processes. Table 1 provides a concise summary of the literature on urban land use classification using deep learning models. These studies, based on convolutional neural networks, achieved promising results in urban land use classification. Nevertheless, they also present certain drawbacks. For instance, they require a substantial amount of data for network training; if the data set does not contain sufficient images, performance may deteriorate, and training time and computational resource consumption are significant.
Moreover, these models often suffer from issues such as large model size, slow inference speed, and difficulty of deployment on mobile devices. Models that do not rely on external devices and can be deployed offline on mobile terminals are more aligned with current practical production needs. To address these challenges, a new deep learning approach, transfer learning, has emerged. Transfer learning transfers knowledge from a model pre-trained on a similar task to improve the learning of a new task, thereby significantly enhancing the network's learning efficiency and reducing training costs [16]. It avoids the need for a large amount of data for network training, significantly reducing training time, generalization error, and the computational costs of model construction. Transfer learning encompasses three main methods: full weight transfer, feature extraction, and fine-tuning [14].
Currently, China faces diverse challenges in urban land use classification. Over the past four decades, the country has undergone rapid socio-economic development. With its large population base and swift urbanization, especially in the eastern coastal regions and certain major inland cities, high population densities and scarce land resources present significant challenges. In this context, planning residential, industrial, commercial, and ecological land use within limited land resources is a formidable task. The refinement of urban land use classification requires more detailed data support. Moreover, harnessing modern technological tools such as big data and remote sensing to enhance the precision and efficiency of land use classification is among the challenges that must currently be addressed [40,45]. Chinese land use is evolving towards intensive and compact development. However, land use classification faces problems such as boundaries that are hard to identify, roads obscured by green vegetation on both sides, and ambiguous judgments in manual classification [4,46]. Compared with foreign urban land classification, which benefits from clear boundaries, accurate road delineation, and easy identification, it is evident that foreign data sets cannot be directly applied to land classification in China. The commonly used scene recognition data sets include the SIRI-WHU data set [47], the WHU-RS19 data set [48], and the UC Merced Land Use data set [49]. These data sets encompass categories such as desert, forest, and beach. Due to the differences between foreign data sets and domestic land use conditions, their applicability is limited, and they cannot be directly used for image recognition of urban land use classification in China. Therefore, it is necessary to establish a remote sensing image data set for local land use classification tasks.
The main research objectives of this paper are as follows: (1) to construct a data set based on the characteristics of land use classification in the urban core areas of northern Chinese cities; (2) to select the core built-up areas of four typical cities, namely Handan, Shijiazhuang, Tangshan, and Xingtai, to build a three-channel satellite remote sensing image data set manually classified into 14 categories; (3) to realize intelligent classification of urban land remote sensing images using the convolutional neural network EfficientNetV2 model; and (4) to provide powerful data support for precise analysis of the current structure of urban land use.
The remainder of this paper is organized as follows: Section 2 introduces the main research methodology. Section 3 presents the experimental data. Section 4 includes the results and discussion. Section 5 presents the conclusions.

2. Methods

In this study, we employ the following four steps to classify urban land using remote sensing images. These steps include (1) data set construction, (2) model training, (3) result assessment, and (4) the influence of land use classification. We train the EfficientNetV2 model using the urban land use data set of remote sensing images, classifying urban land use types and evaluating the classification performance across different data sets. This study provides a new idea for more refined urban land classification. Figure 1 shows the research method flow.

2.1. EfficientNetV2 Model

With the increasing research on neural networks in recent years, scholars have been paying attention to the speed of the network’s training and inference. Most of the previous high-precision classification algorithms were difficult to use in embedded devices and other endpoints because of the large model size. In April 2021, the convolutional neural network framework EfficientNetV2, published in CVPR, addressed the issue that most classification algorithms improve accuracy only by expanding a single dimension of the depth, width, and resolution of the neural network [31]. EfficientNetV2 employs a balanced scaling approach across these three dimensions, allowing the model to be optimized for different hardware and requirements, thereby achieving the effect of increased precision. The EfficientNetV2 model can be trained in a shorter time than the most advanced models. Gradually increasing the image size during training can further expedite this process, although it may lead to a decrease in accuracy. To address the issue of accuracy loss, Tan et al. [32] introduced a progressive learning strategy that dynamically adjusts regularization, such as data augmentation, as well as image size. The performance of EfficientNetV2 on mobile devices has also been improved, making it more suitable for deployment in resource-constrained environments [31].
In the experiment, the EfficientNetV2-S model was employed with a reduced input image size to conserve computer memory and reduce training time. The model structure is illustrated in Figure 2. The overall architecture comprises eight modules. Module 1 serves as the Stem layer, incorporating a convolutional layer with a 3 × 3 kernel and a stride of 2, a SiLU activation function layer, and a batch normalization (BN) layer for initial feature extraction. Modules 2 to 4 consist of Fused-MBConv structures, and Modules 5 to 7 comprise MBConv convolutional structures. Both use 3 × 3 convolutional units and identify deep image features through incremental scaling of the feature maps using expansion factors. Lastly, the Head module consists of a convolutional layer with a 1 × 1 kernel. Following dimensionality reduction via convolution, an image classifier is obtained through pooling and fully connected layers. The different depths of convolutional layers within the network enable feature extraction at various levels. Low-level convolutions help distinguish basic feature types in urban land remote sensing images such as roads, buildings, and water bodies. Middle-level convolutions extract complex features like geometric structures and local patterns that help distinguish classes with similar textures and colors, such as public spaces and squares. High-level convolutions capture abstract features to enhance classification accuracy for remote sensing images.
The MBConv and Fused-MBConv modules are the core components of the model structure. The MBConv convolution consists of a 1 × 1 normal convolution, a 3 × 3 depthwise-separable convolution with SiLU activation and batch normalization (BN) layers, a Squeeze-and-Excitation (SE) module, and a Dropout layer. When the expansion factor equals one, the first 1 × 1 expansion convolutional layer is omitted. The final output features of the module combine the Dropout layer and the shortcut branch (shortcut connection) weights. The Fused-MBConv module closely resembles MBConv but replaces the 1 × 1 expansion convolution and the 3 × 3 depthwise-separable convolution in the main branch with a normal 3 × 3 convolution and eliminates the SE module. The SE attention module addresses the information loss that arises because channels in different feature maps vary in importance during convolution and pooling. It is constructed from global average pooling and two fully connected layers. The structure of the MBConv and Fused-MBConv modules is shown in Figure 3.
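As a rough illustration, the two modules described above can be sketched in PyTorch. This is a hedged sketch only: the channel counts, expansion factor, and SE reduction ratio below are illustrative placeholders, not the exact EfficientNetV2-S configuration.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """SE attention: reweights channels using global average pooling
    followed by two (1x1 convolutional) fully connected layers."""
    def __init__(self, channels, reduced):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, reduced, 1), nn.SiLU(),
            nn.Conv2d(reduced, channels, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.fc(self.pool(x))

class MBConv(nn.Module):
    """1x1 expansion -> 3x3 depthwise conv -> SE -> 1x1 projection,
    with a shortcut connection when input and output shapes match."""
    def __init__(self, c_in, c_out, expand=4):
        super().__init__()
        c_mid = c_in * expand
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU(),
            nn.Conv2d(c_mid, c_mid, 3, padding=1, groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU(),
            SqueezeExcite(c_mid, max(1, c_in // 4)),
            nn.Conv2d(c_mid, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out))
        self.use_shortcut = c_in == c_out
    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out

class FusedMBConv(nn.Module):
    """A single 3x3 conv replaces the expansion and depthwise convs;
    the SE module is eliminated."""
    def __init__(self, c_in, c_out, expand=4):
        super().__init__()
        c_mid = c_in * expand
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU(),
            nn.Conv2d(c_mid, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out))
        self.use_shortcut = c_in == c_out
    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out
```

Both modules preserve the spatial resolution here; in the full model, strided variants downsample between stages.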

2.2. Transfer Learning

In the field of deep learning, transfer learning is an important technique widely applied in tasks such as image recognition, speech recognition, and natural language processing. By reusing already trained model parameters and fine-tuning them for new tasks, it can significantly improve the performance of a model in a specific domain. This method not only saves a significant amount of training time and computing resources but also helps prevent the model from getting stuck in a local optimum and failing to converge. Take image recognition as an example: once a convolutional neural network has learned to identify basic shapes and features on a large-scale data set, transfer learning allows that knowledge to be applied to new tasks. For instance, in face recognition, a pre-trained face detector can serve as the initialization point for a new model, with subsequent fine-tuning to adapt the model to the specific demands of face recognition in diverse scenarios. In this way, the model performs relatively well from the start and adapts to the new data set more quickly.
In the training of convolutional networks, the successive stacking of layers increases the number of model parameters, resulting in excessively long training times. Transfer learning can reduce the number of parameters to be trained and enhance the generalization ability of the model. It employs the weights the model learned on similar tasks, avoiding learning from scratch and giving the model better initial parameters. In this study, the pre-trained weights of the EfficientNetV2 model were used as the initial parameters for model training on the constructed data set. Specifically, part of the model's parameters was transferred, and only the convolutional, pooling, and fully connected layers in the final convolution structure were trained, significantly increasing the speed of model training. The transfer learning flowchart is shown in Figure 4.
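The partial-parameter transfer described above can be sketched as follows. This is a minimal sketch, not the study's implementation: the small networks below stand in for EfficientNetV2, where `early_layers` would carry the pre-trained weights (loaded from a checkpoint in practice) and stays frozen, while the final convolution structure and classifier are trained.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 14  # the data set's 14 land use categories

early_layers = nn.Sequential(            # frozen, transferred weights
    nn.Conv2d(3, 16, 3, padding=1), nn.SiLU())
final_stage = nn.Sequential(             # trainable final convolution stage
    nn.Conv2d(16, 32, 1), nn.SiLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(32, NUM_CLASSES)        # trainable classifier layer

for p in early_layers.parameters():
    p.requires_grad = False              # keep transferred weights fixed

model = nn.Sequential(early_layers, final_stage, head)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```

Only the parameters in `trainable` receive gradient updates, which is what shortens training relative to learning the full network from scratch.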

2.3. Model Validation

Model evaluation is an indispensable part of the experimental task. To measure the recognition performance of the model comprehensively, the accuracy, precision, recall, and F1-score commonly used for classification models are adopted as evaluation indexes. The confusion matrix is a table summarizing classification results and contains four important statistics: TP (True Positive), the number of positive samples correctly predicted as positive; FP (False Positive), the number of negative samples wrongly predicted as positive; FN (False Negative), the number of positive samples wrongly predicted as negative; and TN (True Negative), the number of negative samples correctly predicted as negative. The calculation of accuracy, precision, recall, and F1-score is shown in Formulas (1)–(4):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$F1\text{-}\mathrm{score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
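Formulas (1)–(4) translate directly into code; a minimal sketch (the function name is illustrative):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the four evaluation indexes from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```

The guards against empty denominators handle degenerate cases (e.g., a category the model never predicts).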

3. Data

3.1. Data Set Construction

The data collection area is selected within the orthophoto region of the urban core built-up areas, with a total sample coverage of approximately 4231 square kilometers. The original data consist of Google Level-19 RGB three-channel civil satellite remote sensing image maps acquired with QGIS software (version 3.38.1), with a spatial resolution of 0.25 m. During data preprocessing, the original satellite images of the four urban core built-up areas are manually cut into uniformly sized tiles of 256 × 256 pixels. Ultimately, a total of 10,640 samples are obtained in the study area, with 80% randomly allocated to the training set and the remaining 20% allocated to the test set [12,50,51]. Our data set has been uploaded to a data repository, and the download site is https://www.scidb.cn/en/s/yQJFfm (accessed on 15 May 2024).
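The tiling and the 80/20 split can be sketched as follows. The helper names and the fixed random seed are illustrative placeholders, not the study's actual preprocessing scripts.

```python
import random
import numpy as np

def tile_image(img, tile=256):
    """Cut an H x W x 3 image array into non-overlapping tile x tile patches
    (any remainder at the right/bottom edge is discarded)."""
    rows, cols = img.shape[0] // tile, img.shape[1] // tile
    return [img[r*tile:(r+1)*tile, c*tile:(c+1)*tile]
            for r in range(rows) for c in range(cols)]

def train_test_split(samples, train_frac=0.8, seed=42):
    """Randomly allocate 80% of samples to training and 20% to testing."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_frac)
    return samples[:cut], samples[cut:]
```

Applied to the study's 10,640 tiles, an 80/20 split yields 8512 training and 2128 testing samples.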
Since most open-source data sets do not meet the requirements for relatively fine land classification, the image information of the remote sensing images in our data set was identified through visual interpretation. The land use classification of the data set was divided into 14 categories: building, building_greenland, building_street, construction_vacant, greenland, greenland_street, greenland_street_parking, parking, public space, railway, river_lake_square, road_street, sport, and street_parking. These 14 categories better reflect the characteristics of urban land use in northern Chinese cities. Some of them, such as parking, green spaces, rivers, lakes, squares, streets, and sports, allow for more refined quantitative analyses of the built environment. For example, the greening rate of communities, the accessibility of neighborhoods to blue and green space and sports facilities, the quality of greening in residential neighborhoods and streets, the distribution of parking space, and the occupation of public spaces such as sidewalks by parking can be analyzed. The constructed data set contains remote sensing images from the core built-up areas of Handan, Shijiazhuang, Tangshan, and Xingtai. There are 200 remote sensing images for each category, stored in folders named after each category, and each city has 14 folders according to the classification. The data format is TIF. The file naming scheme is “city name + serial number”; for example, one image in the Handan data set is named “handan11120.TIF”. The classification samples are presented in Table 2.
The rules for manual classification of the data set images are as follows: building, parking, and road are defined as the main classification categories; if the markers of one category in a tile account for more than two-thirds of the total image area, the tile is assigned to that category, as with the building category and the parking category; if each of two categories' markers accounts for half of the image area, the tile is assigned a mix of the two categories, such as the building and green space category or the building and street category; if each of three categories' markers accounts for one-third of the image area, the tile is assigned a mix of the three categories, such as the green space, street, and parking category; and if a category's markers cover less than one-quarter of the image area, that category is not considered in the classification result for that tile.
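The area-fraction rules above can be expressed as a small decision function. This is a hedged sketch: the alphabetical joining of mixed-category names is an illustrative simplification, not necessarily the data set's exact folder naming.

```python
def classify_tile(area_fractions):
    """Apply the manual labelling rules to per-category area fractions.

    `area_fractions` maps category name -> fraction of the tile it covers.
    Returns a single or mixed-category label, or None if no marker
    covers enough of the tile.
    """
    # discard categories covering less than one quarter of the tile
    present = {k: v for k, v in area_fractions.items() if v >= 0.25}
    if not present:
        return None
    # a single dominant category: more than two-thirds of the tile
    dominant = [k for k, v in present.items() if v > 2 / 3]
    if dominant:
        return dominant[0]
    # otherwise label the tile as a mix of the remaining categories
    return "_".join(sorted(present))
```

For example, a tile half covered by buildings and half by green space would be labelled as a two-category mix.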

3.2. Experimental Environment and Parameter Configuration

The experiment's training and testing environment is the deep learning framework PyTorch 1.11.0 + CUDA 11.6, with the programming language Python 3.9.12 and the operating system Windows 10. The hardware configuration includes an Intel Core i5-9700F @ 2.90 GHz processor, 32 GB of RAM, and an NVIDIA RTX 3060 12 GB graphics card. During convolutional neural network training, the data set is divided into multiple batches, with each batch containing 64 images (batch size). The input image size is uniformly set to 384 × 480. One pass of all images through the model constitutes a single epoch, and 50 epochs are run in total. After employing the Adam optimization algorithm and several rounds of tuning, the learning rate is set to 0.001.
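A minimal sketch of this training configuration in PyTorch 1.x follows. The tiny random tensors and two-layer model are placeholders for the real tiles and the EfficientNetV2 network, and a full run would repeat `train_one_epoch` for all 50 epochs.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# hyperparameters stated in the text
BATCH_SIZE, EPOCHS, LEARNING_RATE, NUM_CLASSES = 64, 50, 1e-3, 14

images = torch.randn(128, 3, 64, 64)              # stand-in tile batch
labels = torch.randint(0, NUM_CLASSES, (128,))
loader = DataLoader(TensorDataset(images, labels),
                    batch_size=BATCH_SIZE, shuffle=True)

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.SiLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(8, NUM_CLASSES))
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()

def train_one_epoch():
    """One pass over every batch = one epoch; returns the mean loss."""
    total = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)
```

With 128 placeholder samples and a batch size of 64, each epoch here iterates over two batches.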

4. Results and Discussion

4.1. Convolutional Network Basic Model Comparison

The Handan City land use classification data set was trained and tested with multiple convolutional neural network models. Experimental comparison with several previous convolutional neural networks, as shown in Table 3, indicates that EfficientNetV2 has a clear advantage in training speed and detection accuracy, with fewer parameters and a lighter model. The testing-set results show that the EfficientNetV2 model performs best, with an accuracy of 80.97%. However, this accuracy did not reach a higher level due to several factors: the low resolution of the civil satellite remote sensing images used for classification, and confusion among the refined 14 categories of land use. In light of these challenges, adopting a transfer learning method holds promise for improving the accuracy.

4.2. Transfer Learning Process

By leveraging transfer learning techniques, we aim to capitalize on the knowledge and patterns learned from one task or domain and apply them to a related one, thereby improving the model's ability to accurately classify land use categories within Handan City. This strategic shift towards transfer learning reflects our commitment to continuously refining our approach in order to achieve more robust and reliable results in land use classification using remote sensing data. The constructed data set comprises four cities, and the weights obtained after training the model on one city's data set are applied to the next city's data set for transfer learning. Studies have shown that transfer learning is more effective in domains with higher data set similarity and requires fewer iterations for the same classification [32]. To achieve such improvements, a series of strategies was employed. Firstly, the pre-trained model was trained on the Handan data set, and the resulting weights were applied to the next city's data set for transfer learning. Then, training continued on the Shijiazhuang data set to obtain that city's model, which was applied to the next city, the Xingtai data set. Finally, further training was conducted on the Tangshan data set to obtain the Tangshan model. Through transfer learning, the testing-set accuracy of each city's land use classification improved progressively: 81.44% for the Handan testing set, 84.67% for Shijiazhuang, 86.58% for Xingtai, and 88.22% for Tangshan. The comparison of using pre-training weights with transfer learning for each data set is shown in Table 4.
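The city-to-city transfer chain can be sketched as follows; this is only a structural sketch, where `train_on` is a stub standing in for the full training procedure and the linear model is a placeholder for EfficientNetV2.

```python
import torch
from torch import nn

trained_order = []

def train_on(model, city):
    """Stub for the real per-city training loop; returns updated weights."""
    trained_order.append(city)
    return model.state_dict()

model = nn.Linear(8, 14)          # placeholder for EfficientNetV2
weights = None                    # Handan starts from the pre-trained model
for city in ["handan", "shijiazhuang", "xingtai", "tangshan"]:
    if weights is not None:
        model.load_state_dict(weights)   # transfer previous city's weights
    weights = train_on(model, city)
```

Each city after Handan starts from the previous city's trained weights rather than from scratch, which is the mechanism behind the progressive accuracy gains reported above.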
In the quantitative analysis, the accuracy, precision, recall, and F1-score are used to evaluate the recognition and classification performance of the model on each data set. For the training results on the four city data sets, each average index improves successively after transfer learning. The results are shown in Table 5.

4.3. Accuracy and Loss Analysis

The data obtained with a learning rate of 0.001 were selected, and it was found that the validation accuracy increased along with the training accuracy, and the validation loss decreased along with the training loss. In addition, the accuracy of the validation set was generally higher than that of the training set during training. The reason may be that the data set was not evenly proportioned when split, so the variance within the training set was greater than that within the validation set.
After completing the step-by-step training and testing of the land use classification data sets of the four cities, the overall data set was integrated according to the 14 classifications and applied to the EfficientNetV2 convolutional neural network model for training and testing. The classification accuracy on the overall data set is 84.56%, indicating that the model performs well for land use classification on this data set. It can be applied to identify large-scale spatial land use types and to statistical analysis of urban land use. Figure 5 shows the overall accuracy and loss graphs of the four cities, which demonstrates that the current model has good accuracy on both the training and testing sets. In this study, it was found that too few training epochs do not fully realize the learning ability of the model, while too many may lead to unstable curves or overfitting. As the number of epochs increases, the loss values of both the training and testing sets show a decreasing trend, and the loss curve converges and stabilizes after 250 epochs.

4.4. Model Confusion Matrix Analysis

In addition to using a confusion matrix for evaluating model performance, it is important to consider the impact of its results. The confusion matrix provides valuable insights into the classification errors across different categories and helps in assessing the level of confusion between them. By visually representing the distribution of misclassifications, it offers a comprehensive view of how well the model is performing. The results can be recorded as an n × n matrix in which each entry represents the probability of samples of one category being classified as another. As illustrated in the confusion matrix diagram, the similarity of visual features between different categories can result in inaccurate categorization, with some categories being particularly susceptible to confusion. For instance, distinguishing between green space and street or parking areas may pose challenges due to their shared visual characteristics. This similarity can lead to misclassifications and contribute to errors between predicted and actual categories. The standardized confusion matrix derived from the recognition results plays a crucial role in identifying these potential errors. It allows a deeper understanding of where misclassifications occur and highlights areas for improvement in model training or feature engineering. Ultimately, the insights from the confusion matrix analysis illustrated in Figure 6 can inform decisions about refining the model's performance and enhancing its capacity to accurately classify images across diverse categories.
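A row-normalized (standardized) confusion matrix like the one in Figure 6 can be computed as follows; this is a sketch using NumPy, assuming labels are integer class indices:

```python
import numpy as np

def normalized_confusion_matrix(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix: entry [i, j] is the fraction of
    class-i samples that the model predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    # avoid division by zero for classes with no samples
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
```

Off-diagonal mass in a row shows exactly which other categories that class is being confused with, e.g., green space versus street or parking.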
For the EfficientNetV2 model, the accuracy, precision, recall, and F1-score are calculated per category and overall. The per-category classification results are shown in Table 6, Table 7, Table 8 and Table 9. Because each city has different landforms and structures, the classification results differ. The training results on the Handan City data set show that the precision of construction_vacant and greenland_street_parking is the highest; on the Shijiazhuang City data set, construction_vacant has the highest precision; on the Xingtai City data set, river_lake_square and sport; and on the Tangshan City data set, building, parking, and railway. Because the F1-score combines precision and recall, it is an important consideration when evaluating classification performance. In the Handan data set, the F1-scores of construction_vacant, railway, and river_lake_square are the highest, at 0.998, 0.990, and 0.942, respectively. In the Shijiazhuang City data set, the F1-scores of construction_vacant, sport, and river_lake_square are the highest, at 0.956, 0.952, and 0.948, respectively. In the Xingtai data set, the F1-scores of river_lake_square, sport, and building_greenland are the highest, at 0.998, 0.998, and 0.985, respectively. In the Tangshan data set, the F1-scores of building, sport, and river_lake_square are the highest, at 0.989, 0.980, and 0.976, respectively.

4.5. Inference Results

Using the trained model, the land use class of each tile was inferred; inference results for sample tiles are shown in Table 10. Combining the model outputs with the remote sensing image tile data, the classification results were visualized and analyzed in GIS to produce the land use classification map of the study area shown in Figure 7. The identification results show that building and street, public space, and green space types account for a high proportion of the four urban core built-up areas. There is also a significant presence of industrial zones and agricultural land on the periphery of these core areas, and certain areas show mixed-use development combining residential buildings and commercial establishments.
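Before the classified tiles can be placed on the map in Figure 7, each tile index must be georeferenced. A minimal sketch using the standard Web-Mercator (slippy-map) tile scheme that Google imagery follows; the zoom-19 tile indices in the example are hypothetical:

```python
import math

def tile_to_lonlat(x, y, zoom):
    """Upper-left corner of a Web-Mercator tile, in degrees."""
    n = 2 ** zoom
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lon, lat

def tile_bounds(x, y, zoom):
    """Bounding box (west, south, east, north) of a tile, for placing it in GIS."""
    west, north = tile_to_lonlat(x, y, zoom)
    east, south = tile_to_lonlat(x + 1, y + 1, zoom)
    return west, south, east, north

# A zoom-19 tile roughly over the study region (indices are illustrative)
print(tile_bounds(430000, 205000, 19))
```

Each tile's predicted class can then be written to a polygon with these bounds and symbolized by category in GIS.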
In summary, Table 11 presents the proportion of each refined urban land use class in the four urban core areas, derived from the GIS analysis. This information provides valuable insights for policymakers and urban planners making informed decisions about sustainable development initiatives within these urban regions.
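Proportions of the kind reported in Table 11 follow directly from the per-tile predictions. A short sketch; the tile labels and counts below are hypothetical:

```python
from collections import Counter

def class_proportions(predicted_labels):
    """Share of each land use category among all classified tiles, in percent."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in sorted(counts.items())}

# Hypothetical per-tile predictions for one core area
tiles = (["building"] * 54 + ["greenland"] * 28
         + ["parking"] * 12 + ["river_lake_square"] * 6)
print(class_proportions(tiles))
```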
Although the introduction highlights a new data set for urban land use classification in northern China, the methodology and framework behind the data set are general. It is based on widely recognized land use classification standards and combines Geographic Information System (GIS) and remote sensing technology, both of which are used worldwide, so in principle this data set development method can be transferred to other regions or countries. However, differences in natural environment, economic development level, urban structure, and cultural background among regions or countries may hinder direct transfer. Two factors in particular must be considered. First, natural conditions such as climate and topography influence land use patterns, so data set development should be adjusted to local conditions. Second, the stage of economic development, industrial structure, and population distribution affect land use classification, so applications should take local socio-economic conditions into account. In future work, case studies in specific regions or countries can verify and adjust the data set development method and gradually extend it to wider regions; through practice and continuous improvement, the portability and generality of the method can be enhanced.

On a larger data set, the model's accuracy may remain stable or decrease, depending on the quality and diversity of the data, and accuracy may vary across cities as land use patterns change. The model's generalizability to new cities and larger data sets is therefore critical: if the model overfits the original data set, its generalizability will be compromised.
Consequently, it is essential to refine the model and establish a standardized data processing and model training protocol that facilitates rapid deployment across diverse regions and scales to larger data sets.

5. Conclusions

The refined identification of urban land classification is essential for the statistical analysis of urban land and assessment of built environment quality. In this paper, we explored an automatic method for identifying urban land classification using the EfficientNetV2 model to accurately and efficiently identify multiple urban land types in remote sensing images. The research conclusions are as follows:
(1)
Compared with traditional field surveys for identifying land use, establishing a remote sensing image data set avoids the long lead time, heavy investment of human and material resources, and low efficiency of manual information extraction. The refined land use identification method proposed in this study can be applied to the typical northern Chinese cities in the data set to obtain up-to-date urban land use classification data in a time-efficient manner, enabling real-time monitoring and scientific assessment of the built-up situation of urban land use.
(2)
The EfficientNetV2 model is used to perform large-scale land classification inference on remote sensing images, providing technical methods for more refined quantitative analysis of the built environment, such as the greening rate of communities, the quality of neighborhood blue and green space, the accessibility of sports facilities, and parking encroachment on public space.
(3)
The lightweight and fast EfficientNetV2 model allows experiments to be completed more efficiently than with traditional convolutional neural networks. The testing set accuracy improved after transfer learning, and the model gradually stabilized as iterations increased. Achieving high land use classification accuracy from freely available civilian remote sensing imagery provides a new approach to quantitative analysis of the urban built environment at the macro level. In future studies, we will collect more remote sensing satellite imagery to enlarge the training and testing data sets so that the method can be better applied to identify urban land classes.
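The transfer learning scheme summarized above — keep the pretrained backbone fixed and re-fit a lightweight classification head on each new city's data — can be illustrated with a minimal numpy sketch. This is a stand-in linear head trained on synthetic "frozen features", not the paper's actual training code:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_head(features, labels, n_classes, lr=0.5, steps=200):
    """Fit a new linear classification head on frozen backbone features."""
    n, d = features.shape
    w = np.zeros((d, n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        probs = softmax(features @ w)
        w -= lr * features.T @ (probs - onehot) / n  # cross-entropy gradient step
    return w

# Synthetic "frozen features" for two well-separated classes (illustrative only)
features = np.vstack([rng.normal(0, 1, (50, 8)) + 2, rng.normal(0, 1, (50, 8)) - 2])
labels = np.array([0] * 50 + [1] * 50)
w = fit_head(features, labels, n_classes=2)
accuracy = (softmax(features @ w).argmax(axis=1) == labels).mean()
print(accuracy)
```

In the full pipeline, the frozen features would come from the EfficientNetV2 backbone, and the head would be re-fit for each city's data set in turn.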
This study has some limitations. First, the manually constructed data set carries two constraints. On the one hand, tiles containing a mixture of multiple land use types are prone to misjudgment during manual classification, which confuses the classification results; on the other hand, different researchers may define the same image slightly differently, which also introduces errors into the classification data set. These errors can be reduced to an acceptable range through repeated checks by multiple annotators. Second, because urban structure, land use distribution, and land usage vary across regions, city data sets for more regions should be constructed and large-scale training carried out to improve the generalization ability of deep learning models.
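The inter-annotator disagreement noted above can be quantified before training, for example with Cohen's kappa between two labelers of the same tiles. A sketch with illustrative labels, not the study's data:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same tiles."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["greenland", "greenland", "parking", "street", "street", "parking"]
b = ["greenland", "street",    "parking", "street", "street", "parking"]
print(round(cohens_kappa(a, b), 3))  # → 0.75
```

A low kappa on a category (e.g. mixed greenland/street tiles) flags labels that should be re-checked before the data set is used for training.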

Author Contributions

Conceptualization, Z.W. and X.Z.; data curation, Y.L., Y.H., Y.C. and X.Z.; funding acquisition, Z.W.; methodology, Z.W., Y.L. and X.Z.; resources, Y.H.; supervision, Z.W.; visualization, Y.H.; writing—original draft, Z.W., Y.L. and Y.C.; writing—review and editing, Z.W., Y.L. and Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hebei Social Science Development Research Project in 2023 (No. 20230203044).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Alcock, I.; White, M.; Cherrie, M.; Wheeler, B.; Taylor, J.; McInnes, R.; Im Kampe, E.O.; Vardoulakis, S.; Sarran, C.; Soyiri, I. Land cover and air pollution are associated with asthma hospitalisations: A cross-sectional study. Environ. Int. 2017, 109, 29–41.
2. Chan, I.Y.; Liu, A.M. Effects of neighborhood building density, height, greenspace, and cleanliness on indoor environment and health of building occupants. Build. Environ. 2018, 145, 213–222.
3. Hassan, A.M.; Lee, H. Toward the sustainable development of urban areas: An overview of global trends in trials and policies. Land Use Policy 2015, 48, 199–212.
4. Xia, C.; Yeh, A.G.-O.; Zhang, A. Analyzing spatial relationships between urban land use intensity and urban vitality at street block level: A case study of five Chinese megacities. Landsc. Urban Plan. 2020, 193, 103669.
5. Gong, P.; Li, X.; Zhang, W. 40-Year (1978–2017) human settlement changes in China reflected by impervious surfaces from satellite remote sensing. Sci. Bull. 2019, 64, 756–763.
6. Li, W.; Dong, R.; Fu, H.; Wang, J.; Yu, L.; Gong, P. Integrating Google Earth imagery with Landsat data to improve 30-m resolution land cover mapping. Remote Sens. Environ. 2020, 237, 111563.
7. Gao, J.; O’Neill, B.C. Mapping global urban land for the 21st century with data-driven simulations and Shared Socioeconomic Pathways. Nat. Commun. 2020, 11, 2302.
8. He, J.; Li, X.; Liu, P.; Wu, X.; Zhang, J.; Zhang, D.; Liu, X.; Yao, Y. Accurate estimation of the proportion of mixed land use at the street-block level by integrating high spatial resolution images and geospatial big data. IEEE Trans. Geosci. Remote Sens. 2020, 59, 6357–6370.
9. Li, W.; Wu, C. Incorporating land use land cover probability information into endmember class selections for temporal mixture analysis. ISPRS J. Photogramm. Remote Sens. 2015, 101, 163–173.
10. Li, W.; Wu, C.; Zang, S. Modeling urban land use conversion of Daqing City, China: A comparative analysis of “top-down” and “bottom-up” approaches. Stoch. Environ. Res. Risk Assess. 2014, 28, 817–828.
11. Zhang, W.; Tang, P.; Zhao, L. Remote sensing image scene classification using CNN-CapsNet. Remote Sens. 2019, 11, 494.
12. Yao, Y.; Yan, X.; Luo, P.; Liang, Y.; Ren, S.; Hu, Y.; Han, J.; Guan, Q. Classifying land-use patterns by integrating time-series electricity data and high-spatial resolution remote sensing imagery. Int. J. Appl. Earth Obs. Geoinf. 2022, 106, 102664.
13. Zhong, Y.; Zhu, Q.; Zhang, L. Scene classification based on the multifeature fusion probabilistic topic model for high spatial resolution remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6207–6222.
14. Gomroki, M.; Hasanlou, M.; Reinartz, P. STCD-EffV2T unet: Semi transfer learning EfficientNetV2 T-unet network for urban/land cover change detection using sentinel-2 satellite images. Remote Sens. 2023, 15, 1232.
15. Afrin, S.; Gupta, A.; Farjad, B.; Ahmed, M.R.; Achari, G.; Hassan, Q.K. Development of land-use/land-cover maps using landsat-8 and MODIS data, and their integration for Hydro-Ecological applications. Sensors 2019, 19, 4891.
16. Dastour, H.; Hassan, Q.K. A comparison of deep transfer learning methods for land use and land cover classification. Sustainability 2023, 15, 7854.
17. Kavran, D.; Mongus, D.; Žalik, B.; Lukač, N. Graph neural network-based method of spatiotemporal land cover mapping using satellite imagery. Sensors 2023, 23, 6648.
18. Li, W. Mapping urban land use by combining multi-source social sensing data and remote sensing images. Earth Sci. Inform. 2021, 14, 1537–1545.
19. Yin, J.; Dong, J.; Hamm, N.A.; Li, Z.; Wang, J.; Xing, H.; Fu, P. Integrating remote sensing and geospatial big data for urban land use mapping: A review. Int. J. Appl. Earth Obs. Geoinf. 2021, 103, 102514.
20. Gong, X.; Su, H.; Xu, D.; Zhang, Z.; Shen, F.; Yang, H. An overview of contour detection approaches. Int. J. Autom. Comput. 2018, 15, 656–672.
21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
22. Qi, K.; Wu, H.; Shen, C.; Gong, J. Land-use scene classification in high-resolution remote sensing images using improved correlatons. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2403–2407.
23. Zhang, Q.; Zhu, S. Visual interpretability for deep learning: A survey. Front. Inf. Technol. Electron. Eng. 2018, 19, 27–39.
24. He, F.; Tao, D. Recent advances in deep learning theory. arXiv 2020, arXiv:2012.10931.
25. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. arXiv 2014, arXiv:1409.3215.
26. Zhang, R.; Li, W.; Mo, T. Review of deep learning. arXiv 2018, arXiv:1804.01653.
27. Wang, M.; Tan, K.; Jia, X.; Wang, X.; Chen, Y. A deep siamese network with hybrid convolutional feature extraction module for change detection based on multi-sensor remote sensing images. Remote Sens. 2020, 12, 205.
28. Wen, D.; Huang, X.; Zhang, L.; Benediktsson, J.A. A novel automatic change detection method for urban high-resolution remotely sensed imagery based on multiindex scene representation. IEEE Trans. Geosci. Remote Sens. 2015, 54, 609–625.
29. Zhang, Y.; Govindaraj, V.V.; Tang, C.; Zhu, W.; Sun, J. High performance multiple sclerosis classification by data augmentation and AlexNet transfer learning model. J. Med. Imaging Health Inform. 2019, 9, 2012–2021.
30. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
31. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
32. Tan, M.; Le, Q.V. EfficientNetV2: Smaller models and faster training. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021.
33. Li, A.; Qi, J.; Lu, H. Multi-attention guided feature fusion network for salient object detection. Neurocomputing 2020, 411, 416–427.
34. Raza, A.; Huo, H.; Fang, T. EUNet-CD: Efficient UNet++ for change detection of very high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
35. Yang, L.; Chen, Y.; Song, S.; Li, F.; Huang, G. Deep siamese networks based change detection with remote sensing images. Remote Sens. 2021, 13, 3394.
36. Zhang, M.; Shi, W. A feature difference convolutional neural network-based change detection method. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7232–7246.
37. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
38. Bello, I.; Fedus, W.; Du, X.; Cubuk, E.D.; Srinivas, A.; Lin, T.-Y.; Shlens, J.; Zoph, B. Revisiting resnets: Improved training and scaling strategies. Adv. Neural Inf. Process. Syst. 2021, 34, 22614–22627.
39. Cao, G.; Wang, B.; Xavier, H.-C.; Yang, D.; Southworth, J. A new difference image creation method based on deep neural networks for change detection in remote-sensing images. Int. J. Remote Sens. 2017, 38, 7161–7175.
40. Liu, S.; Shi, Q. Local climate zone mapping as remote sensing scene classification using deep learning: A case study of metropolitan China. ISPRS J. Photogramm. Remote Sens. 2020, 164, 229–242.
41. Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 2020, 241, 111716.
42. Zhang, X.; Du, S. A Linear Dirichlet Mixture Model for decomposing scenes: Application to analyzing urban functional zonings. Remote Sens. Environ. 2015, 169, 37–49.
43. Yu, J.; Zeng, P.; Yu, Y.; Yu, H.; Huang, L.; Zhou, D. A combined convolutional neural network for urban land-use classification with GIS data. Remote Sens. 2022, 14, 1128.
44. Seydi, S.T.; Hasanlou, M.; Amani, M. A new end-to-end multi-dimensional CNN framework for land cover/land use change detection in multi-source remote sensing datasets. Remote Sens. 2020, 12, 2010.
45. Gong, P.; Chen, B.; Li, X.; Liu, H.; Wang, J.; Bai, Y.; Chen, J.; Chen, X.; Fang, L.; Feng, S.; et al. Mapping essential urban land use categories in China (EULUC-China): Preliminary results for 2018. Sci. Bull. 2020, 65, 182–187.
46. Yao, Y.; Liu, X.; Li, X.; Zhang, J.; Liang, Z.; Mai, K.; Zhang, Y. Mapping fine-scale population distributions at the building level by integrating multisource geospatial big data. Int. J. Geogr. Inf. Sci. 2017, 31, 1220–1244.
47. Zhu, Q.; Zhong, Y.; Zhao, B.; Xia, G.-S.; Zhang, L. Bag-of-visual-words scene classifier with local and global features for high spatial resolution remote sensing imagery. IEEE Geosci. Remote Sens. Lett. 2016, 13, 747–751.
48. Dai, D.; Yang, W. Satellite image classification via two-layer sparse coding with biased image representation. IEEE Geosci. Remote Sens. Lett. 2010, 8, 173–176.
49. Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279.
50. Helber, P.; Bischke, B.; Dengel, A.; Borth, D. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2217–2226.
51. Li, J.; Lin, D.; Wang, Y.; Xu, G.; Zhang, Y.; Ding, C.; Zhou, Y. Deep discriminative representation learning with attention map for scene classification. Remote Sens. 2020, 12, 1366.
Figure 1. Flowchart of the proposed method.
Figure 2. The structure of EfficientNetV2-s, which contains eight blocks.
Figure 3. MBConv and Fused-MBConv module structure. (a) the structure of MBConv blocks; (b) the structure of Fused-MBConv blocks.
Figure 4. Flowchart of the transfer learning method.
Figure 5. Accuracy and loss of experimental results. (a) training set accuracy and testing set accuracy; (b) training set loss and testing set loss.
Figure 6. Confusion matrix. (a) Handan City data set; (b) Shijiazhuang City data set; (c) Xingtai City data set; (d) Tangshan City data set.
Figure 7. Distribution of refined urban land use classification for the four urban core areas. (a) Handan City, (b) Shijiazhuang City, (c) Xingtai City, and (d) Tangshan City.
Table 1. Literature review of land use classification based on the deep learning model.
Author | Data Source | Models | Application
Cao et al. (2017) [39] | Multispectral SPOT-5 and Landsat images, Google Earth images | Deep belief network | Urban land use and vegetation
Zhang et al. (2019) [11] | ImageNet data set | CNN-CapsNet | Scene classification
Wang et al. (2020) [27] | ZY-3 and GF-2 satellites | Hybrid convolutional feature extraction module | Change detection based on multi-sensor remote sensing images
Yao et al. (2022) [12] | Google Earth images | TR-CNN | Perceiving urban land-use patterns
Yu et al. (2022) [43] | Well-tagged, high-resolution urban land-use image data set produced from GIS data | DUA-Net | Land use classification
Gomroki et al. (2023) [14] | Sentinel-2 satellite images, OSCD data set | EfficientNetV2 T-Unet, semi-transfer learning | Urban change detection
Dastour and Hassan (2023) [16] | EuroSAT data set | ResNet50, EfficientNetV2B0, ResNet152 | Land use/land cover classification
Kavran et al. (2023) [17] | Sentinel-2 L2A imagery | EfficientNetV2-S | Land use/land cover classification
Liu et al. (2020) [40] | Sentinel-2 multispectral data | LCZNet, a deep convolutional neural network composed of residual learning and Squeeze-and-Excitation blocks | Land use classification
Seydi et al. (2020) [44] | OSCD | Multidimensional CNN | Urban land use/land cover
Table 2. Illustration of a sample land classification.
(Sample tiles for each of the fourteen categories: building, building_greenland, building_street, construction_vacant, road_street, parking, street_parking, greenland_street_parking, greenland, greenland_street, public space, river_lake_square, sport, and railway.)
Table 3. Various methods and their accuracy of testing set.
Methodologies | Accuracy (%)
AlexNet | 61.00
ResNet | 72.98
DenseNet | 75.00
Transformer | 77.96
EfficientNet | 75.99
EfficientNetV2 | 80.97
Table 4. Testing sets accuracy comparison.
Data Set | Accuracy with Pre-Training Weight (%) | Accuracy with Transfer Learning Weight (%)
Handan City data set | 81.44 | –
Shijiazhuang City data set | 79.12 | 84.67
Xingtai City data set | 79.83 | 86.58
Tangshan City data set | 80.27 | 88.22
Table 5. Accuracy, Precision, Recall, and F1-Score, which were averaged over the train and test sets.
Data Set Name | Accuracy | Precision | Recall | F1-Score
Handan City data set | 0.814 | 0.815 | 0.814 | 0.812
Shijiazhuang City data set | 0.847 | 0.862 | 0.862 | 0.860
Xingtai City data set | 0.866 | 0.890 | 0.888 | 0.889
Tangshan City data set | 0.882 | 0.892 | 0.886 | 0.886
Table 6. Precision, Recall, and F1-Score of Handan City data set by category.
Land Use Category | Precision | Recall | F1-Score
building | 0.825 | 0.800 | 0.812
building_greenland | 0.744 | 0.900 | 0.814
building_street | 0.690 | 0.594 | 0.638
construction_vacant | 1.000 | 0.980 | 0.990
greenland | 0.933 | 0.840 | 0.884
greenland_street | 0.724 | 0.636 | 0.677
greenland_street_parking | 0.581 | 0.545 | 0.563
parking | 0.766 | 0.727 | 0.746
public space | 0.884 | 0.760 | 0.817
railway | 1.000 | 1.000 | 0.998
river_lake_square | 0.907 | 0.980 | 0.942
road_street | 0.736 | 0.780 | 0.757
sport | 0.926 | 1.000 | 0.962
street_parking | 0.689 | 0.848 | 0.760
Table 7. Precision, Recall, and F1-Score of Shijiazhuang City data set by category.
Land Use Category | Precision | Recall | F1-Score
building | 0.931 | 0.960 | 0.945
building_greenland | 0.895 | 0.859 | 0.876
building_street | 0.776 | 0.918 | 0.841
construction_vacant | 0.933 | 0.980 | 0.956
greenland | 0.872 | 0.843 | 0.857
greenland_street | 0.818 | 0.909 | 0.861
greenland_street_parking | 0.750 | 0.743 | 0.746
parking | 0.897 | 0.780 | 0.834
public space | 0.772 | 0.796 | 0.784
railway | 0.932 | 0.960 | 0.946
river_lake_square | 0.958 | 0.939 | 0.948
road_street | 0.833 | 0.808 | 0.821
sport | 0.909 | 1.000 | 0.952
street_parking | 0.792 | 0.576 | 0.667
Table 8. Precision, Recall, and F1-Score of Xingtai City data set by category.
Land Use Category | Precision | Recall | F1-Score
building | 0.979 | 0.950 | 0.964
building_greenland | 0.971 | 1.000 | 0.985
building_street | 0.854 | 0.880 | 0.867
construction_vacant | 0.885 | 0.929 | 0.906
greenland | 0.979 | 0.960 | 0.969
greenland_street | 0.789 | 0.750 | 0.769
greenland_street_parking | 0.755 | 0.808 | 0.780
parking | 0.889 | 0.808 | 0.847
public space | 0.955 | 0.867 | 0.909
railway | 0.902 | 0.920 | 0.911
river_lake_square | 1.000 | 1.000 | 0.998
road_street | 0.750 | 0.750 | 0.750
sport | 1.000 | 1.000 | 0.998
street_parking | 0.752 | 0.812 | 0.781
Table 9. Precision, Recall, and F1-Score of Tangshan City data set by category.
Land Use Category | Precision | Recall | F1-Score
building | 1.000 | 0.988 | 0.989
building_greenland | 0.883 | 0.980 | 0.929
building_street | 0.851 | 0.970 | 0.907
construction_vacant | 0.952 | 1.000 | 0.976
greenland | 0.938 | 0.891 | 0.914
greenland_street | 0.710 | 0.863 | 0.779
greenland_street_parking | 0.774 | 0.650 | 0.707
parking | 1.000 | 0.828 | 0.906
public space | 0.976 | 0.837 | 0.901
railway | 1.000 | 0.940 | 0.969
river_lake_square | 0.952 | 1.000 | 0.976
road_street | 0.818 | 0.735 | 0.774
sport | 0.962 | 1.000 | 0.980
street_parking | 0.667 | 0.714 | 0.690
Table 10. Land use inference category cases.
(Sample tiles shown alongside the category inferred for each of the fourteen classes: building, building_greenland, building_street, construction_vacant, road_street, parking, street_parking, greenland_street_parking, greenland, greenland_street, public space, river_lake_square, sport, and railway.)
Table 11. The proportion of each refined urban land use classification for the four urban core areas.
Categories | Handan City (%) | Shijiazhuang City (%) | Xingtai City (%) | Tangshan City (%)
building | 10.84 | 2.56 | 6.38 | 3.37
building_greenland | 5.69 | 7.37 | 5.59 | 5.76
building_street | 18.33 | 10.07 | 3.33 | 9.76
construction_vacant | 11.33 | 1.51 | 2.09 | 3.41
greenland | 5.65 | 10.41 | 27.49 | 12.82
greenland_street | 6.03 | 6.35 | 8.83 | 5.68
greenland_street_parking | 2.34 | 1.86 | 2.48 | 6.02
parking | 4.3 | 18.15 | 12.27 | 16.66
public space | 10.4 | 15.48 | 17.26 | 16.65
railway | 10.88 | 9.62 | 4.8 | 5.6
river_lake_square | 2.28 | 3.88 | 2.33 | 2.53
road_street | 3.27 | 1.45 | 1.36 | 2.02
sport | 5.3 | 7.13 | 5.06 | 6.98
street_parking | 3.37 | 4.16 | 0.72 | 2.73
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, Z.; Liang, Y.; He, Y.; Cui, Y.; Zhang, X. Refined Land Use Classification for Urban Core Area from Remote Sensing Imagery by the EfficientNetV2 Model. Appl. Sci. 2024, 14, 7235. https://doi.org/10.3390/app14167235
