1. Introduction
According to the seventh national census of China, the urban population has reached 902 million, accounting for 63.89% of the total population. Over the next five years, China’s urbanization rate will continue to grow steadily, and the population will become further concentrated in cities. Economic agglomeration has played an essential role in raising incomes [1,2,3,4]; however, wage disparities within cities are also large [5,6,7]. Rising earnings inequality is a recurring concern in policy debates in China. At the same time, China’s 14th Five-Year Plan proposes to promote common prosperity more effectively and attends to both efficiency and fairness, which together form the essence of inclusive growth. In brief, if a factor raises income and relatively poor people benefit more from it, then that factor brings about inclusive growth. Given China’s accelerating urbanization and large spatial wage disparities, it is particularly important to explore the impact of urbanization on inclusive growth; however, this question has barely been studied.
However, the major issue that must be addressed to give a causal interpretation to the impact of economic agglomeration is how to identify low-, middle-, and high-income workers. Different strands of the literature classify workers by a single income indicator. This tends to assign a larger role to agglomeration effects, because the estimated effect also absorbs the influence of other individual differences on wages, and it cannot reflect the heterogeneity of multidimensional characteristics across groups [8]. To address this, we exploited unsupervised machine learning algorithms to cluster urban workers on the basis of 21 individual characteristics, that is, high-dimensional indicators. On this basis, we estimated the effect of economic agglomeration on wages and compared these estimates with those obtained when workers are classified by income.
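To make the clustering step concrete, the sketch below implements a basic version of PAM (partitioning around medoids, i.e. k-medoids), the algorithm named later in the paper, in plain Python. It is a minimal illustration under stated assumptions: the three indicators and nine workers are hypothetical (the paper uses 21 indicators), indicators are z-score standardized and compared by Euclidean distance (the paper’s preprocessing and dissimilarity measure may differ), and the SWAP phase accepts the first improving swap rather than searching for the best swap at each pass.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column so indicators measured on different scales are comparable."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(rows, k):
    """k-medoids via a greedy BUILD phase followed by a first-improvement SWAP phase."""
    n = len(rows)
    dist = [[math.dist(a, b) for b in rows] for a in rows]

    # BUILD: start from the most central point, then add medoids greedily.
    medoids = [min(range(n), key=lambda c: sum(dist[c]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))

    # SWAP: replace a medoid by a non-medoid whenever this lowers the total cost.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True

    # Assign every worker to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical workers: [log wage, years of schooling, social-insurance coverage share]
workers = [
    [7.6, 6, 0.1], [7.7, 7, 0.2], [7.5, 6, 0.0],
    [8.4, 12, 0.6], [8.5, 11, 0.5], [8.3, 12, 0.6],
    [9.4, 16, 0.9], [9.6, 17, 1.0], [9.5, 16, 0.9],
]
medoids, labels = pam(standardize(workers), k=3)
print("medoids:", medoids)
print("labels:", labels)
```

Because medoids are actual observations, each cluster can be summarized by a representative worker, which helps when interpreting the resulting low, middle, and high socioeconomic groups.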
For many years, studies have focused on income inequality caused by unbalanced development between urban and rural areas or between regions [9,10]. In fact, there are also income gaps among farmers [11,12] and among urban residents; with rapid urbanization, the imbalance of income distribution among urban residents has become more prominent [13,14]. The reason is that although urban agglomeration can promote economic growth and, through externalities, raise incomes, differences in skills, work experience, and other individual characteristics across groups lead to inequality. As the urban industrial structure is upgraded, the demand for skill-intensive talent increases, and highly skilled, better-educated individuals are more likely to obtain relatively high wages [15,16]. The learning-by-doing effect also causes work experience to accumulate, producing wage differences between groups. Household registration (hukou) discrimination [17,18,19] and information asymmetry in the job market [20,21] reduce the income of urban migrants. Existing studies mainly measure the income gap between groups defined by a single indicator, which cannot identify group differences caused by other characteristics and may therefore, to some extent, bias the group classification. This study uses the ability of machine learning algorithms to process high-dimensional data and re-examines the characteristics of different groups from multiple dimensions, which is more objective than classification based on a single factor and provides a new perspective on the study of inequality.
Another source of inequality lies in the positive externalities generated by economic agglomeration. Economic agglomeration improves labor productivity and generates positive income externalities [22,23]. Coordinated development between economic agglomeration and urban characteristics such as commuting facilities, human capital levels, and industrial structure helps release agglomeration effects. Workers differ in their preferences over urban and industrial characteristics, so the benefits are unequal. Studies of income gaps arising from city and industry characteristics show that, through agglomeration and endowment effects, larger cities offer clear wage premiums compared with smaller cities [24,25]. In addition, spatial form is an important dimension of urban spatial externalities, and improving infrastructure can alleviate the negative impact of inferior urban forms on workers’ wages [26,27]. The rationalization of the industrial structure significantly affects income distribution: after controlling for time trends and regional characteristics, it has been found to significantly reduce income inequality [28,29]. A larger stock of human capital produces positive effects, including knowledge spillovers and shared learning. A higher concentration of human capital raises regional income levels [30,31], while an uneven distribution of human capital may widen regional income gaps. The existing research thus shows that heterogeneity in urban and industrial characteristics is an important factor in the formation of income gaps; however, few studies discuss the endogenous relationship between economic agglomeration and industrial characteristics.
This paper aimed to contribute to the literature by proposing a general framework that incorporates economic agglomeration and evaluates the unequal wage gains it generates across workers. We also explored whether these gains were similar across workers and whether different classification methods led to different results. We tested for these effects by identifying low- and middle-income workers and low and middle socioeconomic workers simultaneously and by comparing the magnitudes of the estimates. Understanding whether and how different groups of workers benefit from spatial concentration, and the presence of low and middle socioeconomic workers (low- and middle-income workers) in cities, has key policy implications for China’s future urban growth, including whether to accelerate or slow down its tremendous urbanization process.
Based on the above analysis, we adopted a machine learning algorithm to analyze the effects of economic agglomeration on different groups of urban workers. Unlike existing methods that divide groups by a single index, we used an unsupervised machine learning algorithm to divide the sample into three groups: a low socioeconomic group, a middle socioeconomic group, and a high socioeconomic group. Distinguishing workers on the basis of high-dimensional data provides a new perspective for analyzing the inclusiveness of China’s urbanization.
The possible contributions of this paper lie mainly in three aspects. First, we assessed the inclusiveness of China’s urbanization, covering both economic and non-economic factors, which provides a new perspective for the study of China’s urbanization. Second, we introduced an unsupervised clustering algorithm into economic analysis, which provides a new basis for future labor research and makes it possible to observe more economic phenomena. Third, in terms of theory, we expanded the research boundary of wage disparities by using an unsupervised clustering algorithm to classify workers. In addition, building on the analytical framework of Combes et al. (2020), we constructed a theoretical model of wage differences between groups and described the relationship between economic agglomeration and wage disparities through comparative static analysis.
The remainder of the paper is organized as follows. Section 2 establishes a theoretical model and discusses the reasons why wage gaps form between groups. Section 3 describes the data and the cluster analysis. Section 4 presents the model setting and variable definitions. Section 5 reports the empirical results and analysis, while Section 6 presents the mechanism analysis. Finally, Sections 7 and 8 contain the discussion and conclusions, respectively.
2. Theoretical Analysis
We aimed to estimate the impact of economic agglomeration on the wages of low- and middle-income groups. We therefore divided urban workers, including migrant workers, into three groups: low-income, middle-income, and high-income. When workers are clustered by the machine learning algorithm, the resulting groups are referred to as low, middle, and high socioeconomic groups. Building on the analytical framework of Combes et al. (2020), we then constructed a theoretical model.
2.1. Production Process
We assume that workers within the same group share the same production function. A city’s output is produced using a labor input composed of the three groups. To capture substitutability between groups, we assume that the main mechanisms can be represented by a constant elasticity of substitution (CES) production function, as shown in Equation (1):
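where the left-hand side is the output of a city and the three efficiency terms represent the efficiency of workers in the high-, middle-, and low-income groups, respectively. Because the low-income group should be more substitutable for the middle-income group than for the high-income group, we also expect the elasticity of substitution between the low- and middle-income groups to exceed that between these two groups and the high-income group.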
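As an illustration only, one specification consistent with this description is a nested CES in which middle- and low-income labor first combine into an aggregate Z, which then combines with high-income labor. The symbols Y, H, M, L, A_H, A_M, A_L, ρ, and η are assumptions introduced for this sketch, and the form need not coincide with the authors’ Equation (1). With ρ < η < 1, the elasticity of substitution inside the nest, 1/(1 − η), exceeds the elasticity between the nest and high-income labor, 1/(1 − ρ), which captures the expectation stated above.

```latex
\begin{aligned}
Y &= \left[ (A_H H)^{\rho} + Z^{\rho} \right]^{1/\rho}, \\
Z &= \left[ (A_M M)^{\eta} + (A_L L)^{\eta} \right]^{1/\eta}, \qquad \rho < \eta < 1, \\
\sigma_{HZ} &= \frac{1}{1-\rho} \;<\; \sigma_{ML} = \frac{1}{1-\eta}.
\end{aligned}
```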
Assuming that the price of the final product is 1, we can solve for group-level wages using the first-order conditions that determine the optimal use of each type of labor under perfect competition. For instance, the wage of high-income workers can be expressed as Equation (2):
where the left-hand side is the wage of the high-income group. Under the same assumption, the average wage of the middle-income group is given by Equation (3):
The wage of the low-income group can be expressed by Equation (4):
2.2. Transmission Mechanism of Economic Agglomeration and Wage
Economic agglomeration enables workers to earn higher wages by improving production efficiency. Drawing on the analytical framework of Combes et al. (2020), we assume that the labor efficiency of each group of workers is a function of economic agglomeration; the specific functional form is shown in Equation (5):
where the agglomeration variable measures the level of economic agglomeration and the group-specific parameter represents the elasticity of that group’s production efficiency with respect to economic agglomeration. Economic agglomeration ultimately affects total labor output by affecting the productivity of different workers.
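Under illustrative assumptions, namely a nested CES in which the high-income group (H) forms one branch with substitution parameter ρ ∈ (0, 1) while the middle- (M) and low-income (L) groups form the other, plus a power-law efficiency function with hypothetical baseline efficiencies a_i and elasticities γ_i, the comparative statics for the high-income wage take the form below. None of these symbols come from the original equations; N_i denotes the labor input of group i and π_i its share of output.

```latex
\begin{aligned}
A_i &= a_i E^{\gamma_i}, \qquad i \in \{H, M, L\}, \\
w_H &= A_H^{\rho} H^{\rho-1} Y^{1-\rho}, \\
\frac{\partial \ln w_H}{\partial \ln E}
  &= \rho\,\gamma_H + (1-\rho) \sum_{i \in \{H, M, L\}} \pi_i \gamma_i,
  \qquad \pi_i = \frac{w_i N_i}{Y}.
\end{aligned}
```

When every γ_i is positive and 0 < ρ < 1, both terms are positive, so agglomeration raises the high-income wage, in line with the positive spillover described for Equation (6).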
By combining the wage functions (Equations (2)–(4)) with the function describing how economic agglomeration is transmitted to wages (Equation (5)), we obtain the impact of economic agglomeration on the wages of the high-income group, as shown in Equation (6):
Importantly, Equation (6) directly matches the specification we estimated and shows that economic agglomeration has a positive spillover effect on the wages of the high-income group. Economic agglomeration has a similar effect on the wages of the middle- and low-income groups. The theoretical analysis therefore shows that economic agglomeration has an overall positive wage spillover effect. These conclusions matter from a policy perspective: economic agglomeration can enhance labor productivity and thereby affect wage gaps between workers. Based on the above analysis, we put forward Hypothesis 1:
Hypothesis 1. The development of economic agglomeration can increase the wages of workers.
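Under an illustrative nested CES technology (high-income labor in one branch, middle- and low-income labor in a nest) with efficiency A_i = a_i·E^γ_i, the elasticity of the high-income wage with respect to agglomeration works out to ργ_H + (1 − ρ)Σ_i π_i γ_i, where π_i is group i’s output share. The sketch below checks this numerically; the functional forms and all parameter values (RHO, ETA, GAMMA, BASE, labor inputs) are hypothetical assumptions, not the paper’s estimates. It also confirms that the closed-form wages are marginal products and that wage payments exhaust output under constant returns to scale.

```python
import math

# Illustrative, ASSUMED nested CES technology (not necessarily the paper's Eq. (1)):
#   Y = [(A_H*H)^rho + Z^rho]^(1/rho),  Z = [(A_M*M)^eta + (A_L*L)^eta]^(1/eta)
# with efficiency A_i = a_i * E**gamma_i. All numbers below are hypothetical.
RHO, ETA = 0.3, 0.6                        # eta > rho: low/middle are closer substitutes
GAMMA = {"H": 0.08, "M": 0.05, "L": 0.04}  # agglomeration elasticities gamma_i
BASE = {"H": 2.0, "M": 1.2, "L": 1.0}      # baseline efficiencies a_i


def efficiency(group, E):
    """A_i = a_i * E**gamma_i."""
    return BASE[group] * E ** GAMMA[group]


def nest(M, L, E):
    """Inner CES aggregate Z of middle- and low-income labor."""
    return ((efficiency("M", E) * M) ** ETA + (efficiency("L", E) * L) ** ETA) ** (1 / ETA)


def output(H, M, L, E):
    """Outer CES combining high-income labor with the nest Z."""
    return ((efficiency("H", E) * H) ** RHO + nest(M, L, E) ** RHO) ** (1 / RHO)


def wages(H, M, L, E):
    """Closed-form marginal products, i.e. wages when the output price is 1."""
    y, z = output(H, M, L, E), nest(M, L, E)
    w_h = efficiency("H", E) ** RHO * H ** (RHO - 1) * y ** (1 - RHO)
    w_m = efficiency("M", E) ** ETA * M ** (ETA - 1) * z ** (RHO - ETA) * y ** (1 - RHO)
    w_l = efficiency("L", E) ** ETA * L ** (ETA - 1) * z ** (RHO - ETA) * y ** (1 - RHO)
    return w_h, w_m, w_l


H, M, L, E = 1.0, 2.0, 3.0, 5.0  # hypothetical labor inputs and agglomeration level
y = output(H, M, L, E)
w_h, w_m, w_l = wages(H, M, L, E)

# 1) The closed-form high-income wage equals a finite-difference marginal product.
step = 1e-6
mp_h = (output(H + step, M, L, E) - output(H - step, M, L, E)) / (2 * step)

# 2) Constant returns to scale: wage payments exhaust output (Euler's theorem).
euler_gap = w_h * H + w_m * M + w_l * L - y

# 3) Elasticity of w_H w.r.t. E: rho*gamma_H + (1 - rho) * sum_i share_i * gamma_i.
shares = {"H": w_h * H / y, "M": w_m * M / y, "L": w_l * L / y}
predicted = RHO * GAMMA["H"] + (1 - RHO) * sum(shares[g] * GAMMA[g] for g in "HML")
d = 1e-5  # central-difference step in ln E
numeric = (math.log(wages(H, M, L, E * math.exp(d))[0])
           - math.log(wages(H, M, L, E * math.exp(-d))[0])) / (2 * d)

print(f"marginal product {mp_h:.6f} vs wage {w_h:.6f}")
print(f"Euler residual {euler_gap:.2e}")
print(f"elasticity: numeric {numeric:.6f}, formula {predicted:.6f}")
```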
On this basis, we use the ratio of the wages of different types of workers to measure the wage gap, defining three ratios: the wage gap between the urban low-income and high-income groups, the wage gap between the urban low-income and middle-income groups, and the wage gap between the urban middle-income and high-income groups. Combining these definitions with the equations above, the response to economic agglomeration of the wage gap between the low- and high-income groups is shown in Equation (7):
where the output-ratio parameter denotes the output of the low-income group relative to that of the middle-income group and is assumed to be less than 1. Therefore, the sign of Equation (7) is determined by the difference in production efficiency, that is, by the difference between the low- and high-income groups in the elasticity of production efficiency with respect to economic agglomeration. According to the theoretical model, economic agglomeration generates wage spillover effects by improving production efficiency, and the group-specific elasticity of production efficiency is the source of wage disparities: groups with relatively high elasticity share more of the development dividend from economic agglomeration. Based on the above analysis, we put forward Hypothesis 2:
Hypothesis 2. The characteristics of structural transformation will affect the distribution of wages.
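For illustration, suppose the technology is a nested CES in which high-income labor H combines (parameter ρ) with an aggregate of middle-income labor M and low-income labor L (parameter η, with ρ < η < 1), and group efficiency is A_i = a_i E^{γ_i}. These functional forms and symbols are assumptions, not the authors’ equations. Differentiating the log wage ratios then gives the expressions below, where r is the low-income group’s wage bill (its contribution to output) relative to that of the middle-income group.

```latex
\begin{aligned}
\frac{\partial \ln (w_L / w_M)}{\partial \ln E} &= \eta\,(\gamma_L - \gamma_M), \\
\frac{\partial \ln (w_L / w_H)}{\partial \ln E}
  &= \eta\,\gamma_L - \rho\,\gamma_H + (\rho - \eta)\,\frac{\gamma_M + r\,\gamma_L}{1 + r},
  \qquad r = \frac{w_L L}{w_M M} = \left( \frac{A_L L}{A_M M} \right)^{\eta}.
\end{aligned}
```

The low-to-middle gap depends only on γ_L − γ_M, and when γ_L = γ_M the low-to-high expression reduces to ρ(γ_L − γ_H), so under these assumptions the direction of both gaps turns on which group’s productivity is more elastic to agglomeration.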
The derivative of the wage gap between the low- and middle-income groups with respect to economic agglomeration is shown in Equation (8); the conclusion is consistent with that of Equation (7):
Interestingly, at the theoretical level it is difficult to determine for which labor group economic agglomeration has the greater effect on labor productivity. If the effect of economic agglomeration on the production efficiency of low-income workers is greater than that on high-income workers, low-income workers share a greater dividend of urbanization and urbanization is inclusive; otherwise, it lacks the characteristics of inclusive growth. Moreover, different results may be obtained if different indicators are used to classify labor. Because we wish to evaluate whether China’s urbanization is inclusive, we put forward Hypothesis 3, which states that different classification methods lead to different results; this hypothesis must be verified empirically.
Hypothesis 3. Different classification methods will influence the results with respect to the effect of economic agglomeration on wages.
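Under an assumed nested CES technology (high-income labor H in one branch with parameter ρ, middle- and low-income labor M and L in a nest with parameter η) and efficiency A_i = a_i·E^γ_i, the wage-gap elasticities are η(γ_L − γ_M) for the low-to-middle ratio and ηγ_L − ργ_H + (ρ − η)(γ_M + rγ_L)/(1 + r) for the low-to-high ratio, with r = w_L·L/(w_M·M). The sketch below checks both expressions numerically by differentiating the log wage ratios with respect to ln E. The functional forms and all parameter values are hypothetical, not the paper’s.

```python
import math

RHO, ETA = 0.3, 0.6                        # hypothetical substitution parameters, eta > rho
GAMMA = {"H": 0.08, "M": 0.05, "L": 0.04}  # hypothetical agglomeration elasticities
BASE = {"H": 2.0, "M": 1.2, "L": 1.0}      # hypothetical baseline efficiencies a_i


def wages(H, M, L, E):
    """Closed-form wages under the assumed nested CES with A_i = a_i * E**gamma_i."""
    a = {g: BASE[g] * E ** GAMMA[g] for g in "HML"}
    z = ((a["M"] * M) ** ETA + (a["L"] * L) ** ETA) ** (1 / ETA)
    y = ((a["H"] * H) ** RHO + z ** RHO) ** (1 / RHO)
    w_h = a["H"] ** RHO * H ** (RHO - 1) * y ** (1 - RHO)
    w_m = a["M"] ** ETA * M ** (ETA - 1) * z ** (RHO - ETA) * y ** (1 - RHO)
    w_l = a["L"] ** ETA * L ** (ETA - 1) * z ** (RHO - ETA) * y ** (1 - RHO)
    return w_h, w_m, w_l


def gap_elasticity(ratio, E, d=1e-5):
    """d ln(ratio) / d ln E by a central difference in ln E."""
    up, down = ratio(E * math.exp(d)), ratio(E * math.exp(-d))
    return (math.log(up) - math.log(down)) / (2 * d)


H, M, L, E = 1.0, 2.0, 3.0, 5.0  # hypothetical labor inputs and agglomeration level
w_h, w_m, w_l = wages(H, M, L, E)
r = (w_l * L) / (w_m * M)  # low-to-middle wage-bill (output) ratio

low_mid = gap_elasticity(lambda e: wages(H, M, L, e)[2] / wages(H, M, L, e)[1], E)
low_high = gap_elasticity(lambda e: wages(H, M, L, e)[2] / wages(H, M, L, e)[0], E)

pred_low_mid = ETA * (GAMMA["L"] - GAMMA["M"])
pred_low_high = (ETA * GAMMA["L"] - RHO * GAMMA["H"]
                 + (RHO - ETA) * (GAMMA["M"] + r * GAMMA["L"]) / (1 + r))

print(f"low/middle: numeric {low_mid:.6f}, formula {pred_low_mid:.6f}")
print(f"low/high:   numeric {low_high:.6f}, formula {pred_low_high:.6f}")
```

With these hypothetical values γ_L < γ_M, so the low-to-middle wage ratio falls as agglomeration rises; reversing the ordering of the elasticities reverses the sign, which is the ambiguity that Hypothesis 3 leaves to the empirical analysis.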
6. Mechanism Analysis
We have shown that economic agglomeration has spillover effects on the wages of the low and middle socioeconomic groups; these conclusions are robust and consistent with the theoretical analysis. How, then, does economic agglomeration generate spillover effects on the wages of the low and middle socioeconomic groups, and what is the intermediate mechanism? Our theoretical analysis showed that the group-specific elasticity of production efficiency is the source of wage disparities, and groups with relatively high elasticity share more of the development dividend from economic agglomeration. Unfortunately, we cannot directly observe group differences in this elasticity at the individual level. However, existing research indicates that the urban industrial structure generates externalities that affect the production efficiency of different workers. We therefore sought to explain the mechanism from the perspective of urban industrial characteristics.
Column (1) in Table 7 shows the regression results based on the PAM algorithm. The coefficients on the interaction between industrial specialization and economic agglomeration are significantly positive for the low and middle socioeconomic groups. Column (3) shows the results for the advanced industrial structure, which also has a positive effect. Columns (2) and (4) in Table 7 show the regression results when workers are classified according to the income standard published in the China Statistical Yearbook. These results show that the wages of the low- and middle-income groups are also positively affected by the urban industrial structure. Taken together, the industrial structure helps explain the wage spillover effect of economic agglomeration.
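The moderation test in Table 7 rests on an interaction between agglomeration and an industrial-structure measure. The sketch below assumes a hypothetical specification ln w = b0 + b1·ln E + b2·S + b3·(ln E × S) + ε for one group of workers, uses simulated data, and omits the controls and fixed effects that the paper’s full specification may include; it recovers the coefficients by ordinary least squares solved from the normal equations.

```python
import random


def ols(X, y):
    """OLS via the normal equations (X'X) b = X'y, solved by Gaussian
    elimination with partial pivoting."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    aug = [row + [b] for row, b in zip(xtx, xty)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        beta[i] = (aug[i][k] - sum(aug[i][j] * beta[j] for j in range(i + 1, k))) / aug[i][i]
    return beta


# Simulated data for ONE hypothetical worker group.
rng = random.Random(2024)
true_beta = [1.5, 0.10, 0.20, 0.05]  # b0, b1 (agglomeration), b2 (structure), b3 (interaction)
X, y = [], []
for _ in range(500):
    ln_e = rng.uniform(0.0, 3.0)  # hypothetical log agglomeration measure
    s = rng.uniform(0.0, 1.0)     # hypothetical industrial-structure index
    row = [1.0, ln_e, s, ln_e * s]
    X.append(row)
    y.append(sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0.0, 0.01))

b0, b1, b2, b3 = ols(X, y)
print(f"b1 = {b1:.4f}, b3 (interaction) = {b3:.4f}")
print(f"marginal effect of ln E at S = 0.8: {b1 + b3 * 0.8:.4f}")
```

A positive b3 means that the marginal effect of agglomeration on log wages, b1 + b3·S, rises with the industrial-structure measure, which is how a positive interaction coefficient is read in the mechanism analysis.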
We now turn to the inclusiveness of urbanization in China. As shown in Table 7, the estimates based on the PAM classification were relatively small: inclusiveness was reduced when differences between workers were taken into account. A possible reason is that urbanization does increase workers’ income when only wages are considered, but once social security and other factors are taken into account, its positive effect may be less pronounced. In summary, economic agglomeration has raised the income of the low- and middle-income groups through the industrial structure effect; however, the results obtained by dividing the groups according to a single indicator of income were overestimated to some extent.
7. Discussion
Different strands of the literature have shown that urbanization is an important factor in reducing poverty in China; in particular, scholars have reached a consensus that it narrows the urban–rural income gap [39,40]. With economic and social development, social security and employment benefits have become important indicators of the quality of employment. Because disparities between workers are not reflected only in the income gap, assessing the impact of economic agglomeration on wage distribution purely from the perspective of income may overestimate or underestimate the role of China’s urbanization. For this reason, we used a machine learning algorithm to identify workers along multiple dimensions.
Several striking results can be observed in Table 8. First, the wages, education, and health of low- and middle-income workers were lower; this result held for all classification methods. However, the difference was smaller in the multi-indicator classification results. This shows that China’s urbanization is unbalanced [41,42,43]. The classification results of the machine learning algorithm show that this imbalance manifests in many aspects, such as educational attainment. In particular, health status was also unbalanced, so more attention should be paid to the health of low and middle socioeconomic workers. The second notable observation from Table 8 is that low and middle socioeconomic workers faced harsher employment conditions, their work was relatively unstable, and population aging was more pronounced among them. In general, we conclude that the imbalance in China’s urbanization is multifaceted.
We now turn to the magnitude of the estimated effects across groups of workers. Based on our classification results, we found that economic agglomeration alleviated the imbalance of urbanization in China and significantly changed wage inequality, in agreement with previous studies [7,44]. The effect was 20.3% when only wages were considered and 3.9% when multiple dimensions were considered. One possible explanation is that the marginal welfare gain generated by economic agglomeration is smaller than its marginal effect on wages; this is also supported by existing research. These results show that although China’s rapid urbanization has reduced income inequality, it still lags behind in labor market norms [45,46]. The impact is therefore smaller when China’s urbanization is evaluated from multiple perspectives; this is a novel finding of this study.
From the perspective of structural transformation, we explained the mechanism through two channels: the specialization externality effect and the advanced externality effect. Specialization externalities concern how a region develops industries with comparative advantage, while advanced externalities concern how agglomeration economies can be stimulated by optimizing the spatial distribution of industry [47,48,49]. Our theoretical analysis and empirical results suggest that structural transformation, whether through industrial specialization or an advanced industrial structure, has a positive effect on the wage elasticity of the low and middle socioeconomic groups (low- and middle-income groups). Meanwhile, the literature also reports evidence that individual characteristics affect workers’ returns from urbanization [50,51]. As our theoretical analysis shows, the positive effect of economic agglomeration on the wage elasticity of the low and middle socioeconomic groups can be mainly attributed to individual characteristics: the transformation of the industrial structure changes the demand for different workers, which in turn produces different wage spillover effects from economic agglomeration.
A clear limitation of this study is that although the clustering incorporated more dimensions of individual characteristics, the clusters were generated purely from the data. This made it difficult to describe the economic meaning of each cluster precisely, as deeper explanations of reasons and motives were missing. Future research could explore this deeper reasoning, for instance by holding focus group discussions with the different types of workers. Another limitation is that although the sample was large and covered a wide range of regions in China, it was still only a subset of the population, so its representativeness could be improved.
8. Conclusions
Unlike previous studies, we considered the multidimensional characteristics of workers, so our results are relatively comprehensive. This is important for understanding the patterns of urbanization. In addition, we applied machine learning algorithms to the study of wage disparities and provided a method of group classification that does not fix the classification indicators in advance; the classification is determined entirely by the data, which supports a greater variety of research perspectives. We used the PAM algorithm to cluster urban workers into three groups (a low socioeconomic group, a middle socioeconomic group, and a high socioeconomic group) on the basis of 21 individual indicators. We then compared the regression results based on the PAM classification with those based on classification by income.
Our conclusions have three aspects. First, the clustering results showed that, in addition to significant differences in wages, there are considerable differences in social security and living standards among urban workers. We also found that migrant workers are mainly concentrated in the low socioeconomic group, accounting for 56.85%. The health and social security levels of this group are relatively low, and, more seriously, population aging is particularly pronounced. Second, the wages of the low and middle groups show the spillover effect of economic agglomeration, and groups divided by a single income indicator overestimate this spillover effect; the effects of economic agglomeration become smaller when multidimensional indicators are considered. Third, the industrial structure positively influences the effects of economic agglomeration on the low and middle groups, amplifying these effects. Different industrial characteristics generate different returns to urbanization, which helps shape wage gaps in urban China.
Our findings have important policy implications. First, the government should continue to promote the transfer of the agricultural population to cities, enable more people to share the fruits of economic development, and strengthen its role in raising incomes and narrowing the income gap. In particular, it should pay more attention to improving the human capital, social capital, and psychological capital of the low and middle socioeconomic groups. Second, the Chinese government should promote a people-oriented new urbanization strategy that attends to disparities beyond income, speed up reform of the household registration system, and raise the social security level of the low and middle socioeconomic groups, paying attention to both the efficiency and the fairness of development. Third, local governments should optimize their industrial structure to release more structural dividends and accelerate industrial transformation and upgrading by improving industrial infrastructure, so as to continually raise wage income and welfare. Fourth, it is necessary to address the aging of the low socioeconomic group and actively respond to the entrenchment of income classes caused by aging; this includes strengthening the equalization of public services for this group and paying more attention to equal educational opportunities for their children.