1. Introduction
The pace of global urbanization has increased since the Industrial Revolution. The global average urbanization rate reached 55% in 2018, and urbanization has been accompanied by widespread regional imbalance. Regional imbalance is an inevitable result of urbanization and a driving force for the further reorganization of regional urban space. Developed countries with high urbanization rates generally have regional imbalances. These imbalances vary within countries and across spatial scales, including national, regional, urban, and rural-urban scales [
1]. The United States has obvious regional development imbalances caused by urban sprawl [
2]. The Paris region master plan (SDAU) was launched in the mid-1960s and the structure of the largest French urban agglomeration was changed, leading to regional inequalities [
3]. In London, UK, the imbalance is shown as a distinct east–west divide [
4]. In New Zealand, the development of rural areas on the east coast of the North Island has gradually lagged behind that of other regions [
5]. In Canada, there has been a serious development gap between urban and rural areas, in terms of skills, manpower, etc. [
6]. In developing countries undergoing rapid urbanization, the imbalance of regional development has also become increasingly apparent. In the post-reform period, India has faced very high regional disparity in its development [
7,
8]. Similar disparities exist between West and East Malaysia [
9]. In eastern Poland, the imbalance is evident between counties [
10]. In Paraguay, the imbalance between rural and urban development is even more striking [
11]. Regional imbalance in China has attracted considerable scholarly attention [
12,
13,
14,
15]. The imbalance of regional development aggravates the gap between the rich and the poor in a region, which may bring a series of problems, such as difficulty in balancing infrastructure construction, unequal provision of regional public services, and population loss in lagging areas. In other words, it threatens the sustainable development of the region [
1]. Therefore, characterizing the imbalance, clarifying its driving mechanism, and exploring methods to promote coordinated regional development have become both a research hotspot and an urgent practical need.
Research on the imbalance of regional spatial development focuses mainly on the unbalanced characteristics and dynamic mechanisms of regional urban spatial form. The characteristics of urban spatial form imbalance are commonly considered using three metrics: the urban spatial expansion size, urban spatial development intensity, and urban spatial distribution aggregation. Among these, more studies have focused on urban spatial expansion size when analyzing the imbalance of regional urban spatial form. Wei et al. [
14] found that the existence of regional inequality in urban land expansion was led by the more rapid growth of urban land. Xu and Hou [
12] constructed an index of population, economy, and land for a comprehensive evaluation of urbanization, which indicated a regional imbalance in the Yangtze River Delta, China. Bonilla-Bedoya et al. [
16] analyzed the interactive relationship between the uneven expansion size of different urban spatial patches and its urbanization process. There are relatively few studies that describe the characteristics of urban spatial form using the urban spatial development intensity and distribution aggregation. Wang et al. [
17] and Wang [
18] both presented an index system for the assessment of the level of urban development intensity from the perspective of land-use intensity, economic intensity, and population intensity. Hu et al. [
19] used spatial point pattern analysis to characterize the spatial agglomeration of different land uses in Ningbo city. Overall, few studies consider urban spatial expansion size, urban spatial development intensity, and urban spatial distribution aggregation together and analyze the relationships among them. As a result, the characteristics of regional imbalance in urban spatial form remain poorly understood.
Natural resources [
20], the economy [
21], infrastructure, and population [
22] are commonly considered to be the driving forces in research on the regional urban spatial imbalance. Jones and Henderson [
23] demonstrated that the distribution of emerging industries would further expand the gap of urban spatial expansion size between the relatively prosperous coastal zone and the industrial hinterland in the Cardiff City-Region in South Wales. Farmer [
24] studied Chicago, USA, and demonstrated that the level of regional infrastructure service, especially public transportation facilities, led to the uneven development of urban spatial development intensity. Ye et al. [
25] found that a Chinese urban agglomeration is a capital-intensive region, but planning and governance have more influence than the market in the evolving process of urban agglomeration. Ebeke and Ntsama Etoundi [
20] demonstrated that an increase in the share of natural resources led to a rapid increase in urbanization and urban concentration in Africa. They used correlation analysis to show that the spatial pattern of cities in underdeveloped areas mainly depended on natural resources. In general, the current research on the driving mechanism of unbalanced regional urban spatial form is mostly focused on single variables, such as natural, economic or social variables, and there is still a lack of systematic and comprehensive analysis on the selection and comparison of driving variables.
Common analysis techniques include correlation analysis, linear regression (ordinary least squares (OLS)), geographic information systems (GIS), mapping, and geographically weighted regression (GWR). Salvati et al. [
26] used principal component analysis and GIS techniques to explore regional differences in northern, central, and southern Italy. Sangkasem and Puttanapong [
27] used OLS and Moran’s I statistics and concluded that regional imbalances in Bangkok have declined. Ansong et al. [
28] used GWR to explore the correlation between educational resource input and policies and regional development imbalance in Ghana. Moreover, Oduro et al. [
29] used two-stage least-square regression to test the socioeconomic effects of urbanization levels, ecological factors, proximity to national capitals, and proximity to interregional highway systems in Ghana. Falzetti and Sacco [
30] used GWR and k-means clustering to study the spatial variability of the impact of educational resources on regional disparities in Italy. The heterogeneity of spatial units within a region may lead to different degrees of influence on regional development. Therefore, it may be difficult to model the formation mechanism of regional urban imbalance in space through traditional regression analysis and to put forward differentiated strategies to promote coordinated regional development. Spatial statistics provides modern techniques that can be used to study the spatial heterogeneity of individual variables [
31,
32] and to study spatial variability in the relationship between two or more variables [
33,
34,
35].
In this paper, we take the counties of Jiangsu Province, China (
Figure 1), a typical coastal plain region, as the basic research unit. We aim to explore the unbalanced development characteristics of regional urban spatial forms. The objectives of this paper are (1) to identify the unbalanced development characteristics and compare the differences among the urban spatial expansion size, development intensity, and distribution aggregation degree; (2) to identify their different driving mechanisms by using modern spatial analysis tools and data on physical geography, economy, and society; and (3) to put forward a differentiated regional optimization adjustment strategy.
2. Methods and Data Sources
Since districts and counties have the same administrative level in China, we took the districts and counties of Jiangsu Province as the basic research unit, with a total of 99 spatial samples. Jiangsu is located in the Yangtze River Delta, with flat terrain (see
Figure A23 for digital elevation model (DEM) in
Appendix A), and covers an area of 107,200 km². The latest accessible year of land use data and point of interest (POI) data is 2015, and other statistical data are also released with a certain lag. In order to match spatial data with social and economic data, we selected 2015 as the study year. In 2015, Jiangsu Province had a total population of 80 million, with a GDP of 7012 billion Chinese yuan (CNY). The area of the built-up area, the proportion of built-up area, and the global Moran’s I of the built-up area of each district and county were used to characterize urban spatial expansion size, development intensity, and distribution aggregation degree. First, spatial autocorrelation analysis was used to analyze the spatial distribution pattern and characteristics of urbanization in Jiangsu Province. Then, traditional statistical methods and spatial statistical methods were combined to analyze 30 commonly considered potential driving variables related to physical geography, economy, and society [
12,
36]. Pearson correlation analysis was used to screen out the variables that were significantly related to the spatial pattern of urbanization. Finally, linear regression (ordinary least squares: OLS) and geographically weighted regression (GWR) were used to identify the driving variables that led to the difference in urbanization spatial form.
2.1. Data
The data used in the study were obtained from the following sources. The physical geography data and the remote sensing monitoring data on land use status in Jiangsu Province in 2015, with a resolution of 1 km, come from the Resource and Environment Science Data Center of the Chinese Academy of Sciences (
http://www.resdc.cn/; accessed date: 20 March 2020). It is based on Landsat 8 remotely sensed images and was generated by human visual interpretation. The social and economic data came from the 2015 “Statistical Yearbook of Jiangsu Province” (
http://stats.jiangsu.gov.cn/2015/indexc.htm; accessed date: 20 March 2020) and the statistical yearbook of each city. The population data came from the sixth national census of the National Bureau of Statistics (
http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/indexch.htm; accessed date: 20 March 2020). The point of interest (POI) data came from the 2015 Gaode map. These data are listed in
Table 1. We divided potential influencing variables of urban spatial form into three types. The variables of physical geography were mainly dependent on the natural background environmental conditions, related to geographical location and formed naturally; the economic variables were related to economic income and industrial development; the societal variables were related to social and historical development, influenced by human beings and driven by human development needs. This categorization and choice of variables reflect common practice among researchers working on related studies [
19,
20,
21,
22,
23,
24,
25,
28,
36].
We extracted the urban land from the remote sensing monitoring data of Jiangsu Province in 2015 as the built-up area. The value of built-up area and non-built-up area was recorded as 1 and 0, respectively. We then calculated the area of the built-up area, the proportion of the built-up area, and the global Moran’s I (see
Section 2.2.2) of the built-up area within each district or county. We used these three indicators to represent urban spatial expansion size, development intensity, and distribution aggregation degree, respectively.
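The three indicators can be illustrated with a minimal Python sketch. It assumes that a county is represented as a small rectangular grid of 0/1 cells with 1 km² per cell and that adjacency means sharing an edge (rook contiguity). The function name `built_up_indicators`, the toy grids, and the contiguity rule are illustrative assumptions and not the workflow used in this study.

```python
def built_up_indicators(grid, cell_area_km2=1.0):
    """Return (area, proportion, Moran's I) for one county.

    grid: list of rows, each cell 1 (built-up) or 0 (not built-up).
    Assumes a rectangular grid and rook (edge-sharing) adjacency.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    built = sum(values)

    area = built * cell_area_km2   # expansion size
    proportion = built / n         # development intensity
    mean = built / n

    # Denominator of Moran's I: sum of squared deviations from the mean.
    ss = sum((v - mean) ** 2 for v in values)
    if ss == 0:
        return area, proportion, None  # undefined when all cells are equal

    cross = 0.0  # sum over adjacent pairs of products of deviations
    w_sum = 0    # sum of all binary weights (each pair counted twice)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w_sum += 1
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)

    morans_i = (n / w_sum) * (cross / ss)  # aggregation degree
    return area, proportion, morans_i


# Two hypothetical counties with the same built-up area but different patterns.
clustered = [[1, 1, 0, 0]] * 4
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(built_up_indicators(clustered))
print(built_up_indicators(checkerboard))
```

Both toy grids have the same area and proportion, so only the third indicator separates the clustered pattern from the dispersed one.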
2.2. Methods
In this section, we present the statistical methods used in this paper for correlation and regression analysis. We first summarize the conventional, non-spatial approaches (correlation, regression) and then the associated spatial methods (Moran’s I, GWR) and explain their relevance in this study. We also provide references that readers can consult for further details on the methods and applications.
2.2.1. Pearson Correlation Analysis
The Pearson product moment correlation coefficient [37,38] was used to evaluate the linear correlation between two continuous variables:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (1) $$

where $\bar{x}$ and $\bar{y}$ are the sample mean values of two continuous variables $x$ and $y$, respectively, $n$ is the number of observations, and the value range of $r$ is [−1,1]. If $r > 0$, it means that the two variables are positively correlated, if $r < 0$, then the two variables are negatively correlated, and if $r = 0$, it means that there is no linear correlation between the two variables. Since $r$ is estimated from a sample, a hypothesis test is used to evaluate whether the true correlation, $\rho$, is significantly different from zero.
As shown in
Table 1, 30 potential variables that might have an impact on the regional urban spatial form were selected. We then performed Pearson correlation analysis in SPSS 24.0 to quantify the relationship between the three urban metrics and the 30 potential driving variables. The units being evaluated were the districts and counties of Jiangsu province.
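As a minimal illustration of the correlation screening step, the Python sketch below computes Pearson's r and the associated t statistic with n − 2 degrees of freedom. The data are hypothetical, the function names are assumptions, and the study itself used SPSS rather than this code.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing rho = 0; requires |r| < 1 and n > 2."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical values of one driving variable and one urban metric.
driver = [1.0, 2.0, 3.0, 4.0, 5.0]
metric = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(driver, metric)
print(r, t_statistic(r, len(driver)))
```

The t statistic is compared with a Student t distribution with n − 2 degrees of freedom to decide whether the variable passes the screening.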
2.2.2. Spatial Autocorrelation Analysis
Spatial autocorrelation is used to quantify the association between the attribute values of nearby units [
39]. In this study, the units were the counties and districts of Jiangsu Province. Positive spatial autocorrelation indicates that counties with similar attribute values are located close to each other and negative spatial autocorrelation indicates that counties with different attribute values are located close to each other. If the spatial autocorrelation is close to zero, then there is no spatial association. As with Pearson’s correlation, this can be assessed using a hypothesis test. Spatial autocorrelation can be evaluated using the Moran’s I index [
31,
39,
40]. The value of I lies in [−1,1]: −1 indicates perfect negative autocorrelation, 1 indicates perfect positive autocorrelation, and 0 means no spatial autocorrelation. The formula for the global Moran’s I is shown below.

$$ I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,\bigl(x(s_i) - \bar{x}\bigr)\bigl(x(s_j) - \bar{x}\bigr)}{\sum_{i=1}^{n} \bigl(x(s_i) - \bar{x}\bigr)^2} \qquad (2) $$

where $s$ indicates location and $i$, $j$ identify specific spatial units; $w_{ij}$ is the spatial connectivity matrix. If two geographic units are adjacent, the value is 1, and if the two geographic units are not adjacent, the value is 0; $x(s_i)$ and $\bar{x}$ are the value of the attribute in the $i$-th geographic unit and the mean value of the study area; $n$ is the total number of geographic units. The formulation is similar to Pearson’s $r$; however, we only consider one variable rather than two and evaluate how observations of that variable are related to observations of the same variable at adjacent locations.
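The global statistic can be sketched in a few lines of Python. The example assumes a binary adjacency matrix as described above and uses four hypothetical units arranged in a chain; the function name `global_morans_i` and the data are illustrative and not taken from the study.

```python
def global_morans_i(values, w):
    """Global Moran's I for attribute values and a binary adjacency matrix w."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]              # deviations from the mean
    s0 = sum(sum(row) for row in w)             # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


# Four hypothetical units in a chain: 0-1, 1-2, 2-3 are adjacent.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(global_morans_i([1, 2, 3, 4], chain))   # smooth gradient: positive
print(global_morans_i([1, 4, 1, 4], chain))   # alternating values: negative
```

A smooth gradient places similar values next to each other and gives a positive statistic, whereas alternating values give a negative one.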
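There is also a local version of Moran’s I, which calculates the statistic at location $s_i$:

$$ I(s_i) = \frac{x(s_i) - \bar{x}}{m_2} \sum_{j=1}^{n} w_{ij}\,\bigl(x(s_j) - \bar{x}\bigr), \qquad m_2 = \frac{1}{n}\sum_{k=1}^{n} \bigl(x(s_k) - \bar{x}\bigr)^2 \qquad (3) $$

Global statistics evaluate the spatial autocorrelation over the whole study area. They assume that the autocorrelation is constant over the study area but, in reality, it can vary in space. Local statistics calculate the spatial autocorrelation around a specific spatial unit [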
31,
32,
41]. These local statistics are called local indicators of spatial association (LISA) [
41] and they have been used in both environmental and social-science studies [
27,
31,
32] to evaluate local patterns. Here, we used ArcGIS 10.6 to calculate the local Moran’s I and identify high-value clusters (H-H), low-value clusters (L-L), outliers with high values mainly surrounded by low values (H-L), and outliers with low values mainly surrounded by high values (L-H). We implemented spatial autocorrelation analysis in the GeoDa software [
42].
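The cluster and outlier labels can be illustrated with a short Python sketch. It assumes the commonly used form of the local statistic, in which the deviation of a unit from the mean is scaled by the variance and multiplied by the sum of the deviations of its neighbours, and it uses binary, non-standardized weights. The significance test that ArcGIS and GeoDa apply before assigning labels is omitted, so the labels here are purely descriptive. Function name and data are hypothetical.

```python
def local_morans_i(values, w):
    """Return a list of (local I, label) for each unit.

    Labels follow the H-H / L-L / H-L / L-H convention; no significance
    test is applied in this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n           # variance of the attribute
    results = []
    for i in range(n):
        lag = sum(w[i][j] * z[j] for j in range(n))  # neighbours' deviations
        local_i = z[i] / m2 * lag
        if z[i] > 0 and lag > 0:
            label = "H-H"
        elif z[i] < 0 and lag < 0:
            label = "L-L"
        elif z[i] > 0 and lag < 0:
            label = "H-L"
        elif z[i] < 0 and lag > 0:
            label = "L-H"
        else:
            label = "none"
        results.append((local_i, label))
    return results


# Four hypothetical units in a chain: 0-1, 1-2, 2-3 are adjacent.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
for local_i, label in local_morans_i([10, 8, 1, 2], chain):
    print(round(local_i, 3), label)
```

In practice, only units whose local statistic is statistically significant would be mapped as clusters or outliers.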
We used the global Moran’s I, calculated for the built-up area within each county or district, to define the urban aggregation degree.
2.2.3. Linear Regression (Ordinary Least Squares Regression: OLS)
Linear regression analysis is often used to study the relationship between the variable of interest (the response variable) and one or more covariates. It can also be used for prediction [
37,
43]. It can be expressed as:

$$ y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i \qquad (4) $$

where $y_i$ is the value of the response variable associated with the $i$th observation and $\beta_0$ is the constant (intercept) term; $\beta_k$ is the regression coefficient of the $k$-th covariate, $x_{ik}$, and $\varepsilon_i$ is the residual, which represents the difference between the fitted value and the true value. In the most simple case, $p = 1$ and there is only one covariate, $x_{i1}$. In this paper, we evaluated multiple covariates (
Table 1) for predicting the three characteristics of urban form: urban spatial expansion size, urban spatial development intensity, and urban spatial distribution aggregation. The analysis was restricted to linear regression. OLS refers to the method used to estimate the regression coefficients from the data. It is based on minimizing the sum of squares of the residuals (hence “least squares”) [
43]. We used IBM SPSS Statistics 24.0 to establish an OLS regression equation with the variables that were significantly related to urban spatial expansion size, development intensity, and distribution aggregation degree, and used the stepwise method to automatically eliminate variables with strong collinearity to obtain the final OLS regression equation.
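For the simplest case of a single covariate, the least-squares estimates have a closed form, which the following Python sketch illustrates. It does not reproduce the multi-covariate stepwise procedure run in SPSS; the function name `fit_simple_ols` and the data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one covariate, plus residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx                      # minimizes the sum of squared residuals
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return intercept, slope, residuals


# Hypothetical covariate and response values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1, res = fit_simple_ols(x, y)
print(b0, b1)
print(sum(r * r for r in res))  # residual sum of squares
```

With an intercept in the model, the residuals sum to zero, which is a quick sanity check on any OLS fit.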
2.2.4. Geographically Weighted Regression
Geographically weighted regression (GWR) [
44] extends the linear regression model and uses the locally weighted least squares method to calculate the regression coefficients. In other words, the weight is determined according to the Euclidean distance between the spatial position of the estimated point and the spatial positions of the other observation points, so that the regression coefficient of the model is no longer a single global value, but can vary in geographic space [34]. The estimated parameter values at different geographical locations describe the spatially varying nature of the relationship between $y$ and $x$. The structure of the model is as follows:

$$ y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \qquad (5) $$

where $y_i$ is the response variable of the $i$-th sample, located at $(u_i, v_i)$; $x_{ik}$ is the value of the $k$-th covariate at the $i$-th location; $(u_i, v_i)$ is the coordinate of the $i$-th point; $\beta_k(u_i, v_i)$ is the local regression coefficient at $(u_i, v_i)$; and $\varepsilon_i$ is the residual. The key difference between GWR and OLS is that the regression coefficients can vary in space. Hence, it is necessary to indicate the location $(u_i, v_i)$ of each observation and the associated regression coefficient (compare Equations (4) and (5)).
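The idea of locally weighted least squares can be sketched in Python for a single covariate. The sketch assumes a Gaussian distance-decay kernel with a fixed bandwidth; the kernel, the bandwidth value, the function name `gwr_simple`, and the data are all illustrative assumptions, since the settings used in ArcGIS are not described here.

```python
import math


def gwr_simple(coords, x, y, bandwidth):
    """Local (intercept, slope) at each observation for one covariate.

    Weights decay with Euclidean distance via a Gaussian kernel
    (an assumed choice): w = exp(-0.5 * (d / bandwidth) ** 2).
    """
    coefficients = []
    for ui, vi in coords:
        w = [
            math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
            for uj, vj in coords
        ]
        sw = sum(w)
        mean_x = sum(wj * xj for wj, xj in zip(w, x)) / sw   # weighted means
        mean_y = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mean_x) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mean_x) * (yj - mean_y)
                  for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx                  # weighted least-squares slope
        intercept = mean_y - slope * mean_x
        coefficients.append((intercept, slope))
    return coefficients


# Hypothetical units along a west-east line; the x-y relationship is
# steeper in the east than in the west.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]
for (u, v), (b0, b1) in zip(coords, gwr_simple(coords, x, y, bandwidth=1.0)):
    print((u, v), round(b0, 2), round(b1, 2))
```

A small bandwidth lets the local slope follow the regional difference, while a very large bandwidth gives all observations nearly equal weight and reproduces the single global OLS coefficient.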
GWR has been used to model spatially varying relationships in both the social [
44] and environmental sciences [
34]. This gives more flexibility compared to linear regression (OLS) because the regression coefficients ($\beta_k(u_i, v_i)$ in Equation (5)) can vary in space. Exploring the spatial variability in the relationship between the response variable and covariates can give more insight into the process [33]. Note that GWR does not automatically lead to an improvement compared to OLS; this needs to be evaluated.
We used covariates from the OLS model and put them into the GWR model to explore the spatial structure in the driving forces of regional imbalance in urban spatial form. GWR was implemented in ArcGIS 10.6.
We computed the following statistics for the OLS and GWR models: the standard error of the residuals, $\hat{\sigma}$; the coefficient of determination, R²; the adjusted R²; and the corrected Akaike information criterion (AICc) [37]. The standard error of the residuals quantifies the variability of the residuals around the fitted regression line. The coefficient of determination quantifies the proportion of the variability in the response variable that is explained by the model and takes a value between 0 and 1 (larger is better). The adjusted R² is the R² adjusted for the number of covariates. Adding covariates to a linear regression model will not reduce the R², but does increase the model complexity. Hence, the adjusted R² supports evaluating whether adding a covariate is sufficiently useful to justify the increase in model complexity. AICc is commonly used to compare models. It gives a trade-off between goodness-of-fit and model complexity. A lower AICc indicates a better model.
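The four statistics can be computed from observed and fitted values, as in the following Python sketch for an OLS model. It assumes one common least-squares form of AICc, n·ln(RSS/n) + 2k + 2k(k + 1)/(n − k − 1), with k counting the covariates, the intercept, and the error variance, and with additive constants dropped. For GWR, software typically replaces the parameter count with an effective number of parameters, which this sketch does not cover. The function name and data are hypothetical.

```python
import math


def fit_statistics(y, fitted, n_covariates):
    """Residual standard error, R2, adjusted R2, and AICc for an OLS fit."""
    n = len(y)
    p = n_covariates
    mean_y = sum(y) / n
    rss = sum((a - b) ** 2 for a, b in zip(y, fitted))   # residual sum of squares
    tss = sum((a - mean_y) ** 2 for a in y)              # total sum of squares

    sigma = math.sqrt(rss / (n - p - 1))
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

    k = p + 2  # covariates + intercept + error variance (assumed convention)
    aic = n * math.log(rss / n) + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)           # requires n > k + 1
    return {"sigma": sigma, "r2": r2, "adj_r2": adj_r2, "aicc": aicc}


# Hypothetical observed values and fitted values from a one-covariate model.
observed = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
print(fit_statistics(observed, fitted, n_covariates=1))
```

Only differences in AICc between models fitted to the same response data are meaningful, so the dropped constant does not affect model comparison.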
5. Conclusions
Based on the general concern about unbalanced inter-regional development, this study aimed to reveal the characteristics and driving mechanisms of unbalanced regional urban spatial form using multiple indices. In terms of indicators of the unbalanced development characteristics of urban spatial form, most previous studies still use a single indicator and lack multiple-indicator analysis. As for driving mechanisms, comparison of the variables influencing multiple unbalanced development characteristics is rare. Furthermore, in terms of research methods, traditional statistical analysis does not allow the full exploration of spatial patterns in the data. To fill these research gaps, we used three indicators: urban spatial expansion size, development intensity, and distribution aggregation degree. In addition, spatial analysis tools and traditional statistical analysis tools were combined in this study. First, spatial autocorrelation analysis was used to analyze the characteristics of the unbalanced spatial form of towns in Jiangsu Province. It was found that there is a positive spatial correlation between the urban spatial expansion size and development intensity. Specifically, the regions with large values of both were mainly in southern Jiangsu, while the regions with small values were mainly in northern Jiangsu. By contrast, the spatial distribution of cities and towns showed no agglomeration. Second, Pearson correlation analysis, OLS, and GWR were applied to reveal the correlations and differences between various driving mechanisms, namely, economy, urbanization quality, urbanization level, and natural landform. It was found that urbanization level can lead to inter-regional imbalances of urban spatial development intensity and distribution aggregation degree at the same time. Finally, optimization strategies were formulated to promote balanced development between regions in Jiangsu Province.
Southern Jiangsu should focus on improving the urbanization quality and promote regional integration. Central Jiangsu should improve the urbanization level and develop along the axis relying on rapid transportation. Northern Jiangsu should expand the economic scale and build the urban agglomeration with central cities as the core.
Many variables affect the unbalanced development of inter-regional urban space and this study could not cover all possible variables. The significantly correlated variables could change over time. These two points should be considered in future studies.