1. Introduction
Population data are one of the most direct indicators of human activity [1]. As China urbanized from 1949 to 2015, the proportion of the urban population in China increased from approximately 10% to 57.35% [2]. The spatial distribution, flow, and structure of the population are becoming increasingly important for urban development. The spatial distribution of the population influences not only the urbanization process and the living environment [3,4], but also the planning of regional public education, medical facilities, and other services that bear directly on people's vital interests [5,6,7].
The spatial distribution of the population is affected by many factors, such as geographic location, land cover, the convenience of road networks, water areas, and economic development [8,9]. Traditional research methods therefore fit spatial population distributions by studying the coupling relationship between regional population density and its influencing factors. Liao Shunbao et al. [10] examined the correlation between population density and land use in Tibet and Qinghai Province and proposed a spatial population model based on multi-source data fusion. Du Guoming et al. [11] used data from the fifth census of Shenyang City together with residential-area data to simulate the population distribution through spatial interpolation. Given the shortcomings of existing spatial methods for urban populations, Kang Tingjun et al. [12] developed a multi-agent-based urban population distribution method. Using North Korea's district-level census data, Shi Tingting et al. [13] analyzed the relationship between North Korea's population density and spatial factors, and then applied multiple regression analysis to model the spatial pattern of North Korea's population density. Dong Chun et al. [14] combined population statistics with geographical and economic data to establish a population spatialization method that examines the coupling relationship between population distribution and related factors in a given region.
Remote sensing imagery offers a new approach to population spatial modeling [15,16]. Many scholars exploit the multilevel nature and high timeliness of remote sensing imagery, combined with geographic information system (GIS) technology, to build population spatialization models at different spatial levels [17,18,19,20]. Chen Qing et al. [21] studied the correlation between nighttime remote sensing images and geographic factors and performed a population spatialization experiment in the high-efficiency eco-economic zone of the Yellow River Delta. Lo C P et al. [22] studied the relationship between the gray values of thematic mapper (TM) images in different bands and urban population density. Li Shujuan et al. [23] used high-resolution remote sensing images to extract buildings of different functions and calculated the population accommodation coefficient of each building type to map the spatial distribution of the urban population. Wang Shixin et al. [24] used three-dimensional (3D) reconstruction technology to identify urban residential housing areas and extract their elevations from ZY-3 images.
In population census data, the statistical unit is typically the administrative unit; the statistical level is therefore relatively coarse and the types of data are limited. The selection and improvement of mathematical methods are thus crucial for obtaining high-precision population spatialization results. Commonly used methods include geostatistical methods [25], spatial regression models [26,27], spatial interpolation methods [28,29], and machine learning methods [30,31]. Holt et al. [32] used an improved population weighting method to interpolate census data spatially; this method better explains the spatial distribution of the population within census administrative divisions. Wang Keijing et al. [33] studied population spatialization using multivariate statistical regression and geographically weighted regression (GWR) models. Cao Li-qin et al. [34] predicted the populations of 76 districts and counties in Hubei Province in 2002 by using a neural network model to relate the brightness of nighttime light data to the urban population.
Population spatialization research has become more comprehensive with the integration of more data sources [35,36] and technical methods [37,38]. A number of mature population spatialization data sets now cover the world, individual countries, or regions, such as LandScan [39], WorldPop [40], and GHS-POP [41]. These data sets provide detailed and accurate population maps that support studies of dynamic population flow [42,43,44], age structure change [45,46,47], urbanization [48,49,50], and building or settlement characteristics [51,52,53], and they have greatly promoted interdisciplinary research on population spatialization. Combined with other related fields, they provide important data and methodological support for guiding urban planning [42,54], assessing demographic risk [55,56], and improving quality of life [57,58].
As more data sources are fused, population spatialization methods diversify, and population research adopts different perspectives, an important direction for future research is to select suitable data and establish population spatialization methods that meet the needs of different administrative units. Many scholars have carried out research on data processing and methods to improve data source precision [54], cross-validate population spatialization methods [45,47], and evaluate experimental results [31]. However, little attention has been paid to the requirements of, and differences between, population spatialization methods under different administrative units. This paper therefore uses these data sources to establish population spatialization methods from the perspective of different administrative units and attempts to establish a reasonable system for evaluating population accuracy to verify the rationality of the experimental results.
China's first national geoinformation survey started in January 2013 and lasted three years. Its purpose was to systematically obtain authoritative, objective, and accurate information on the country's geographic conditions, providing an important data foundation for promoting ecological and environmental protection and building a resource-conserving and environmentally friendly society. By combining the global navigation satellite system (GNSS), aerospace remote sensing (RS), GIS, and other modern surveying and mapping technologies, the survey can dynamically and quantitatively characterize land surface morphology, land cover, and built-up zones, and monitor the spatial distribution and development of resources, the environment, ecology, and economic factors. The data set mainly contains three types of data: land topography data (DEM and slope data); land cover classification data (“LCA”, comprising 10 major categories, such as farmland, garden land, and woodland, and more than 100 subcategories); and social geographical units, including points of interest (“POIs”, such as educational facilities and hospitals), administrative unit categories and vector boundaries, and other integrated urban functional units (“BUCA”, “BUCP”). The greatest advantage of this data set is its highly accurate spatial information on buildings, such as location and shape, together with attributes including building type and height. Such detailed building classification provides useful data for population spatialization based on housing construction area.
The main innovations of this paper are as follows: (1) The spatial and attribute information of buildings in China's first national geoinformation survey is fully exploited. By combining different administrative divisions with thresholds on the proportion of housing construction area, this paper assigns functional attributes to all buildings and screens out residential houses. (2) A multi-level population spatialization method applicable to different administrative unit levels is established. (3) Various methods are used to study, qualitatively and quantitatively, the differences between the experimental results at different levels, and the commonalities and differences are analyzed and explained.
4. Results and Discussion
Wuchang District is one of the downtown districts of Wuhan City and is adjacent to the Yangtze River and the Han River. The district is the political center of Hubei Province and a place where universities and talent converge. Its total area is 107.76 km², and its center is located at 30°33′56″ N, 114°18′90″ E. The urbanization rate of Wuchang District reached 96.2% in 2010, and the district has consistently maintained an urbanization rate of 100% in recent years. The population has grown at approximately 1% per year since 2010, indicating steady growth. In 2015, the district had 14 street-level governments, 196 educational resources, 105 medical and health facilities, and a road network length of 338.84 km.
According to the administrative divisions, Wuchang District consists of the following 14 streets: Baishazhou, Huanghelou, Jiyuqiao, Liangdao, Luojiashan, Nanhu, Shouyi Road, Shuiguohu, Xujiapeng, Yangyuan, Zhonghua Road, Zhongnan Road, Ziyang, and Shidong. The geographic location of Wuchang District is shown in Figure 6.
Wuchang District is selected as an experimental area for the following reasons:
- (1) Wuchang District is located in the central urban area of Wuhan City, where buildings are dense and building types are complex. Therefore, if the method achieves high accuracy here, it can be considered scientifically sound and generalizable.
- (2) Wuchang District contains 14 streets and 195 communities, and the data on residential buildings are adequate.
- (3) Wuhan City has conducted the Community Demographic Census since 2013. The statistical units are fine-grained, the population data sources are adequate, and the data are sufficiently recent.
- (4) Wuchang District has rebuilt many residential buildings in recent years. If the spatial distribution of residential houses can be extracted accurately, and abandoned buildings and other building types removed so that the population spatialization results are calculated accurately, the study can provide an important reference for other rapidly urbanizing cities.
4.1. The Results of the Population Spatialization Method on the District Level
In the experiment, dividing residential houses into four types was unsatisfactory because the results did not satisfy the conditional judgment, so the residential houses were reclassified. The correlation between the street population and the population estimated with the least squares regression model is shown in Figure 7.
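The district-level method relates street populations to the residential floor area of each house type. The paper's own equations for this step are not shown in this section, so the following is only a minimal sketch under an assumed model: street population ≈ Σ c_k · A_k with no intercept, where A_k is the floor area of house type k and c_k is a persons-per-m² coefficient estimated by least squares. The `passes_condition` check (all coefficients positive) is a hypothetical stand-in for the conditional judgment, whose exact form is not given here.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: house-type areas are collinear")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_type_coefficients(areas, population):
    """areas: one row per street, one column per house type (m² of floor area).
    population: resident population per street.
    Returns one persons-per-m² coefficient per house type (assumed model)."""
    k = len(areas[0])
    # Normal equations for a no-intercept fit: (A^T A) c = A^T p
    ata = [[sum(row[i] * row[j] for row in areas) for j in range(k)] for i in range(k)]
    atp = [sum(row[i] * p for row, p in zip(areas, population)) for i in range(k)]
    return solve_linear(ata, atp)


def passes_condition(coefficients):
    # Hypothetical conditional judgment: every house type needs a positive density.
    return all(c > 0 for c in coefficients)


# Illustrative data only: three streets, two house types.
areas = [[1000, 2000], [3000, 1000], [2000, 4000]]
population = [40, 70, 80]  # constructed as 0.02 * A1 + 0.01 * A2
coefficients = fit_type_coefficients(areas, population)
```

With these constructed inputs the fit recovers 0.02 and 0.01. One plausible reading of the text is that a non-positive coefficient for some type is what prompted merging the original four house types.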
Figure 7 shows a clear linear correlation between the resident population and the estimated population. The fitting coefficient reaches 0.936 and the goodness of fit is 0.725, which not only satisfies the conditional judgment but also indicates that the experimental results have good accuracy.
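Reading the "fitting coefficient" as the slope and the "goodness of fit" as R² of an ordinary least-squares line between estimated and resident street populations (an interpretation this section does not confirm), both quantities can be computed as in this sketch:

```python
def linear_fit(x, y):
    """Ordinary least squares y = slope * x + intercept.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot
```

A slope near 1 means the estimates track resident counts one-for-one, while R² measures how tightly streets scatter around the line; the two can diverge, as the district- and street-level comparison in Section 4.4 illustrates.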
Based on the relative proportion between the estimated population and the street population, Equation (3) is used to correct the population of each type of house building. The corrected coefficients are shown in Table 3.
As shown in Table 3, the relative proportion is below 30% for every street except Nanhu. The field investigation found that the main reason is the rapid recent development of Nanhu: the area of residential buildings has increased significantly, while the occupancy rate remains relatively low. The average fitting error in Wuchang District is only 13.03%, indicating that the method is reasonable at the district level.
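The per-street check behind Table 3 can be sketched as below. This assumes the relative proportion is (estimated − resident) / resident and that the average fitting error is the mean of its absolute values; both definitions are assumptions, and the street figures are invented for illustration. Only the 30% threshold comes from the text.

```python
def relative_errors(estimated, resident):
    """Signed relative deviation of each street's estimate from its resident population."""
    return {s: (estimated[s] - resident[s]) / resident[s] for s in resident}


def average_fitting_error(errors):
    """Mean absolute relative error across streets."""
    return sum(abs(e) for e in errors.values()) / len(errors)


def streets_above(errors, threshold=0.30):
    """Streets whose absolute relative error exceeds the threshold (30% in the text)."""
    return sorted(s for s, e in errors.items() if abs(e) > threshold)


# Illustrative numbers only, not the paper's data.
resident = {"S1": 100_000, "S2": 50_000, "S3": 80_000}
estimated = {"S1": 110_000, "S2": 45_000, "S3": 120_000}
errors = relative_errors(estimated, resident)
```

Here "S3" plays the role that Nanhu plays in Table 3: a street whose housing stock grew faster than its occupancy, so the area-based estimate overshoots.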
According to the district-level population spatialization method, Wuchang District contains 1300 population grid cells with a total population of approximately 1.21 million. The overall accuracy of the experiment reaches 99.98%, with an actual resident population of 1.182 million. The 250-m gridded population of Wuchang District is shown in Figure 8.
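A minimal dasymetric sketch of how building-level estimates could be summed into the 250-m grid, assuming each building is represented by a single point (for example its centroid) with a residential floor area and a type-specific persons-per-m² coefficient. The record layout and the point-in-cell simplification are assumptions; buildings straddling cell edges are not split here.

```python
from collections import defaultdict

CELL_SIZE = 250.0  # metres; matches the 250-m grid described in the text


def grid_population(buildings, density_by_type, cell_size=CELL_SIZE):
    """Sum building-level population estimates into square grid cells.

    buildings: iterable of (x, y, floor_area_m2, house_type), x/y in a projected
    CRS in metres (hypothetical record layout).
    density_by_type: persons per m² of floor area for each residential type.
    Buildings whose type is missing from density_by_type are treated as
    non-residential and skipped.
    Returns {(column, row): population}.
    """
    cells = defaultdict(float)
    for x, y, area, house_type in buildings:
        if house_type not in density_by_type:
            continue  # screened out: not a residential house type
        key = (int(x // cell_size), int(y // cell_size))
        cells[key] += area * density_by_type[house_type]
    return dict(cells)


buildings = [
    (10.0, 10.0, 1000.0, "A"),
    (200.0, 200.0, 500.0, "B"),
    (300.0, 10.0, 800.0, "A"),
    (50.0, 50.0, 400.0, "office"),
]
cells = grid_population(buildings, {"A": 0.02, "B": 0.03})
```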
As shown in Figure 8a, the densely populated areas are mainly in the northern and central parts of Wuchang District, including Yangyuan, Zhongnan Road, Zhonghua Road, Huanghelou, and Liangdao. The local spatial autocorrelation result of the population spatialization is shown in Figure 8b. Most areas of Wuchang District show no obvious spatial correlation of population, especially Xujiapeng, Jiyuqiao, Luojiashan, Baishazhou, and Nanhu, where the population is small and residents are scattered. Where the correlation is obvious, the vast majority of regions meet the “High-High” condition, and they are concentrated in Yangyuan, Shuiguohu, Zhonghua Road, Huanghelou, Liangdao, and Zhongnan Road. Residential houses in these areas are relatively compact, and large residential quarters are adjacent to one another. In terms of regional planning, these areas therefore mainly provide living space that meets residents' needs.
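The “High-High” and “Low-Low” labels come from a local spatial autocorrelation analysis. The sketch below computes local Moran's I on a population raster, assuming rook (four-neighbour) contiguity with row-standardised weights. It omits the permutation significance test that GIS tools normally apply, so every cell receives a quadrant label, whereas a tested analysis would leave non-significant cells unlabeled.

```python
def local_moran(grid):
    """Local Moran's I on a raster of cell populations (list of equal-length rows).

    Rook contiguity, row-standardised weights:
    I_i = (z_i / m2) * mean(z_j over neighbours j), with m2 = sum(z^2) / n.
    No significance test is run. Assumes at least two cells and non-constant values.
    Returns (local_i, quadrant) as grids of the same shape.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    z = [[v - mean for v in row] for row in grid]  # deviations from the mean
    m2 = sum(v * v for row in z for v in row) / n
    local_i = [[0.0] * cols for _ in range(rows)]
    quadrant = [[""] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                z[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            lag = sum(neighbours) / len(neighbours)  # spatial lag of deviations
            local_i[r][c] = z[r][c] / m2 * lag
            own = "High" if z[r][c] > 0 else "Low"
            nbr = "High" if lag > 0 else "Low"
            quadrant[r][c] = f"{own}-{nbr}"
    return local_i, quadrant


local_i, quadrant = local_moran([[10, 10, 0], [10, 10, 0], [0, 0, 0]])
```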
As shown in Figure 8c, Figure 9, and Table 4, as the population increases, the service distance to these features clearly decreases, and the experimental results fit better. Medical and health resources are the most sensitive to population concentration; compared with the other features, they cover a higher percentage of the population in the nearest buffers. With urban expansion, the areas surrounding government offices have largely been converted to commercial land, and the population coverage in the different buffer zones shows that the population tends to move outward from the street centers, although government locations remain highly attractive. The well-distributed and well-developed educational resources and road networks have less influence on population levels. As the overlay shows, the central region with a larger population has clear advantages in location and in public service facilities. The southern parts of Zhongnan Road and Shuiguohu lie on the main road of Wuchang District, where the road network is well developed and traffic is convenient. Although these areas are far from the government, they are also highly attractive to the population, which has led to the development of education and medical facilities.
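The buffer analysis asks what share of the population lies within successive distances of each feature type. A minimal sketch, assuming grid-cell centroids with populations and facilities represented as points; the radii are hypothetical, and road networks, being lines, would need point-to-segment distances instead of the point distance used here.

```python
import math


def buffer_population_shares(cells, facilities, radii):
    """Cumulative share of total population whose nearest facility is within each radius.

    cells: list of (x, y, population) grid-cell centroids in metres.
    facilities: list of (x, y) points for one feature type (e.g. hospitals).
    radii: increasing buffer distances in metres (hypothetical values).
    """
    total = sum(p for _, _, p in cells)
    nearest = [
        (min(math.hypot(x - fx, y - fy) for fx, fy in facilities), p)
        for x, y, p in cells
    ]
    return [sum(p for d, p in nearest if d <= r) / total for r in radii]


cells = [(0.0, 0.0, 100), (300.0, 0.0, 50), (1000.0, 0.0, 50)]
shares = buffer_population_shares(cells, [(0.0, 0.0)], [250, 500, 1500])
```

Comparing these cumulative curves across feature types is one way to express "a higher percentage of the population in the nearest buffers".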
4.2. The Results of the Population Spatialization Method on the Street Level
According to the street-level population spatialization method, Wuchang District contains 28,599 population grid cells with a total population of approximately 1.22 million. The overall accuracy of the experiment reaches 99.97%, with an actual resident population of 1.182 million. The experimental results are shown in Figure 10.
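The street-level method works from community populations and the residential houses inside each community. This section does not give its formula, so the sketch assumes simple proportional allocation by residential floor area, a standard dasymetric rule swapped in here as an assumption. The resulting building populations could then be gridded in the same way as at the district level.

```python
def allocate_by_area(community_population, building_areas):
    """Split each community's residents across its buildings by floor-area share.

    community_population: {community: residents}.
    building_areas: {community: {building_id: residential floor area in m²}}.
    Returns {building_id: population}; each community total is preserved.
    """
    result = {}
    for community, residents in community_population.items():
        areas = building_areas.get(community, {})
        total_area = sum(areas.values())
        if total_area == 0:
            continue  # no residential floor area to receive the population
        for building_id, area in areas.items():
            result[building_id] = residents * area / total_area
    return result


allocation = allocate_by_area(
    {"c1": 300, "c2": 100},
    {"c1": {"b1": 1000, "b2": 2000}, "c2": {"b3": 500}},
)
```

Because each community total is preserved, errors stay local to the community, which is consistent with the text's point that the street-level method reflects small-area detail.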
As shown in Figure 10a, the densely populated areas of Wuchang District are mainly in the central region, including parts of Luojiashan, Zhongnan Road, Huanghelou, Shouyi Road, and Shuiguohu. In contrast to the results presented in Section 3.1, these results show that although the road networks in parts of Luojiashan and Shuiguohu are only moderately developed, the population there is relatively large. This is mainly because colleges and universities are nearby and educational and medical resources are abundant. Sparsely populated areas are also more concentrated, mainly in Xujiapeng, Baishazhou, and Nanhu, where educational and medical and health resources are relatively scarce.
Similarly, the local spatial autocorrelation result of the population spatialization is shown in Figure 10b. Most areas of Wuchang District show no obvious aggregation. The “High-High” and “Low-Low” regions reflect a strong correlation of the population distribution in some areas of Wuchang District, and the distinction between high-aggregation and low-aggregation regions is clear. Comparing the spatial autocorrelation results at the district and street levels, the distributions of “High-High” aggregation areas obtained by the two methods are very similar, except in parts of Luojiashan, Xujiapeng, Shouyi Road, and Nanhu. The “Low-Low” clustering is more pronounced at the street scale than at the district scale.
As shown in Table 5, more than half of the grid cells in Wuchang District have no population. Most grid cells have fewer than 25 people, while cells with more than 300 people account for only approximately 3% of all grid cells. The number of grid cells decreases markedly as the population per cell increases, indicating that living space in Wuchang District is relatively dispersed and that population-concentrated areas are few.
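A Table 5-style frequency count can be sketched as follows. Only the 25 and 300 limits appear in the text; the intermediate class limit of 100 used in the example is hypothetical.

```python
import bisect


def population_class_shares(cell_populations, bounds):
    """Count grid cells per population class.

    bounds: increasing class limits, e.g. [25, 100, 300] (100 is hypothetical).
    Classes: empty cells, (0, b1], (b1, b2], ..., and > last bound.
    Returns (counts, shares).
    """
    counts = [0] * (len(bounds) + 2)
    for p in cell_populations:
        if p <= 0:
            counts[0] += 1  # unpopulated cell
        else:
            # bisect_left puts a value equal to a limit into the class ending there
            counts[1 + bisect.bisect_left(bounds, p)] += 1
    n = len(cell_populations)
    return counts, [c / n for c in counts]


counts, shares = population_class_shares([0, 0, 0, 10, 25, 50, 150, 400], [25, 100, 300])
```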
Compared with the district-level method, which shows a strong correlation between the population results and these features, the street-level method better reflects geospatial uncertainty. Of the examined features, medical and health resources are the most sensitive to population concentration, and, compared with the other features, they cover a higher percentage of the population in the nearest buffers. Government locations show no apparent sensitivity to the degree of population aggregation: distance increases as the number of people increases and then decreases rapidly in areas with more than 1200 people. Educational resources and road networks have a relatively small impact on the population and cover nearly 80% of the population in the first buffer zone. As the overlay shows, most densely populated areas are close to government locations, medical and health and educational resources, and well-developed road networks. Although road networks are not well developed in parts of Luojiashan and Shuiguohu, the nearby universities and abundant educational and medical resources mean that the population there is also large. Baishazhou and Nanhu streets are far from the urban center, have poor road networks, and have small populations.
4.3. The Results of Cross-Validation Analysis
The average deviations between the two methods in the population covered by the buffer zones of government, educational resources, medical and health resources, and road networks were 7.98%, 0.91%, 3.68%, and 7.56%, respectively, and the correlation coefficient between the results of the two methods was 0.59. This analysis shows that the differences in the population covered by each impact factor are relatively small and that the results of the two methods are highly consistent. A thematic map of the population of Wuchang District at the two levels is shown in Figure 11.
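The two cross-validation statistics can be sketched as below: a Pearson correlation between the two levels' results, and the mean absolute difference between their buffer coverage shares expressed in percentage points. That the 7.98%, 0.91%, 3.68%, and 7.56% figures are computed this way is an assumption; the example inputs are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_abs_deviation(shares_a, shares_b):
    """Average absolute difference between two sets of buffer coverage shares
    (fractions in [0, 1]), returned in percentage points."""
    return sum(abs(a - b) for a, b in zip(shares_a, shares_b)) / len(shares_a) * 100


deviation = mean_abs_deviation([0.5, 0.75, 1.0], [0.45, 0.8, 1.0])
```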
The differences between the population results at the two levels are summarized in Table 6.
Table 6 shows that the population results at the two levels are not significantly different: 61.84% of the population difference values lie between −0.4 and 0.4, and only approximately 9% are less than −1 or greater than 1. Large population differences are concentrated in the marginal areas of Wuchang District, such as Xujiapeng, Zhongnan Road, Baishazhou, Shidong, and Luojiashan. Considering the residential house areas, residential house types, and community residents, the communities far from the center of Wuchang District have few people but large residential house areas, so the district-level estimate is larger than the street-level estimate. In Luojiashan, because of large terrain undulations, the population and residential houses are much more concentrated than in the other streets. In addition, several communities there have many residents but small house areas, so the street-level estimate is greater than the district-level estimate.
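The band shares quoted from Table 6 can be reproduced with a sketch like the one below. This section does not state the unit of the difference values, so the sketch simply assumes both estimates are expressed in the same unit per unit area (for example, thousands of people per community) and takes the difference as district minus street; the sample values are invented.

```python
def difference_bands(district, street, inner=0.4, outer=1.0):
    """Share of units whose difference d = district - street lies in [-inner, inner],
    and share with |d| > outer. Both inputs: {unit: estimate} in the same unit."""
    diffs = [district[k] - street[k] for k in district]
    n = len(diffs)
    within = sum(1 for d in diffs if abs(d) <= inner) / n
    beyond = sum(1 for d in diffs if abs(d) > outer) / n
    return within, beyond


within, beyond = difference_bands(
    {"a": 1.0, "b": 2.0, "c": 3.0, "d": 0.0},
    {"a": 0.8, "b": 2.5, "c": 1.5, "d": 0.1},
)
```

Positive differences correspond to the peripheral communities described above (district level larger), negative ones to cases such as the dense Luojiashan communities (street level larger).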
The following conclusions can be drawn from the above analysis. The district-level population spatialization method better highlights the spatial distribution of the population from a macro perspective. It focuses on the impact of different types of residential houses on population density and is suitable where house areas are large and the population distribution and house types are diverse. The street-level method better displays the current spatial distribution of the population from a micro perspective. It focuses on the impact of the residential houses within a small region on the community population and is suitable for areas where residential house types are simple and house areas are small.
4.4. The Evaluation of Population Fit Accuracy
This paper uses the 1-km population grid data set of China from the National Earth System Science Data Sharing Infrastructure (http://www.geodata.cn). This data set was produced by establishing multivariate statistical models for the 2010 population of China in 1-km pixels, based on the correlations between population and land use types. Urban population density, traffic conditions, DEM, and other factors were used for model correction, and forty counties with township-level population data from eastern, western, and central China were selected for precision verification.
To resolve the grid-size mismatch between the two results, the street-level and district-level population results are spatially aggregated so that the merged grid size matches the 1-km population grid data set. The number of valid population grid cells was limited to 66 by excluding cells on the periphery of Wuchang District, which avoids large population errors caused by the lack of housing construction data for the neighboring urban areas. A fit analysis was then performed on the district-level and street-level population estimates; the results are shown in Figure 12.
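Aggregating a fine population grid to the 1-km reference grid amounts to summing non-overlapping blocks of cells. A minimal sketch, assuming both grids share the same projection and origin (with 250-m cells, `factor=4` gives 1-km cells); the street-level cell size is not stated in this section, so the factor is left as a parameter.

```python
def aggregate_blocks(grid, factor):
    """Sum non-overlapping factor x factor blocks of a fine grid (list of rows).

    Assumes the fine and coarse grids are aligned at the same origin.
    Edge rows/columns that do not fill a whole block form partial blocks.
    """
    rows, cols = len(grid), len(grid[0])
    out_rows = -(-rows // factor)  # ceiling division
    out_cols = -(-cols // factor)
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for r in range(rows):
        for c in range(cols):
            out[r // factor][c // factor] += grid[r][c]
    return out


coarse = aggregate_blocks([[1] * 3 for _ in range(3)], 2)
```

Summing, rather than averaging, preserves total population, so the aggregated cells can be compared directly with the 1-km counts in the fit analysis.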
As shown in Figure 12, the goodness of fit at the street level is slightly lower than at the district level, whereas the fitting coefficient at the street level is closer to 1. The district-level results therefore have higher accuracy, while the street-level results have lower coefficient sensitivity. Both levels give highly accurate results, each with its own advantages.