#### *3.1. Migration Attention Index Based on Baidu Search Query*

To verify the hypothesis that migration-related search queries from individual users can positively reflect population migration, three issues must be addressed: (1) what the main driving factors of population migration are; (2) how those factors are expressed in cyber space through search query data; and (3) how to synthesize those search query data so that they comprehensively express public attention on migration in cyber space. For the first issue, we computed the percentage of the migrant population for each migration reason, based on the 2015 dynamic monitoring survey of China's migrant population, to identify the main factors driving population migration. For the second issue, a series of search keywords expressing the different migration reasons was selected, and the Baidu Index of each keyword combined with a city name was collected to reflect public attention on migration in cyber space. For the third issue, migration attention indexes (MAIs) were constructed to integrate the public attention generated by the different migration reasons.

#### 3.1.1. Confirmation of Main Migration Driving Forces

To select search keywords that specifically capture public attention on migration, we first identify the main reasons for population migration based on the 2015 dynamic monitoring survey of China's migrant population. The percentages of the migrant population for each migration reason in the three urban agglomerations are shown in Table 1. As Table 1 shows, work and trade, study and training, accompanying family members, and relocation are the main migration factors in the study area. Together, these four reasons account for 75.70%, 85.39%, and 89.77% of the migrant population in the Beijing-Tianjin-Hebei metropolitan region, the Yangtze River Delta, and the Pearl River Delta, respectively.

Because accompanying family members always move as part of a family relocation [36], we treat these two reasons as a single category labeled relocation. Therefore, the three main reasons for population migration are confirmed as *work and trade*, *study and training*, and *relocation*.
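The merging step above can be sketched as follows. This is a minimal illustration under stated assumptions: the reason labels in `MERGE_MAP` are hypothetical stand-ins for the survey's categories, and the function names are ours rather than the authors'. Given one urban agglomeration's per-reason percentages, it folds accompanying family members into relocation and reports the combined share of the main reasons.

```python
from typing import Dict

# Hypothetical reason labels; the survey's actual category codes may differ.
# Reasons absent from this map (e.g. "other") are treated as non-main reasons.
MERGE_MAP = {
    "work and trade": "work and trade",
    "study and training": "study and training",
    "accompanying family members": "relocation",  # merged, following [36]
    "relocation": "relocation",
}


def merge_reasons(percentages: Dict[str, float]) -> Dict[str, float]:
    """Collapse survey reasons into the three main migration reasons."""
    merged: Dict[str, float] = {}
    for reason, pct in percentages.items():
        target = MERGE_MAP.get(reason)
        if target is not None:
            merged[target] = merged.get(target, 0.0) + pct
    return merged


def main_reason_share(percentages: Dict[str, float]) -> float:
    """Total percentage of migrants covered by the main reasons."""
    return sum(merge_reasons(percentages).values())
```

Fed with the Table 1 percentages of one agglomeration, `main_reason_share` should reproduce the combined share quoted above for that region, which serves as a quick consistency check.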


**Table 1.** Percentage of the migrant population by migration reason in the three urban agglomerations.

#### 3.1.2. Selection of Search Keywords from Baidu Index

To make search query data easier to explore and use, services built on such data have been released, typically Google Trends (www.google.com/trends/) and Baidu Index (http://index.baidu.com/). A number of studies have analyzed data from Google Trends and Baidu Index and assessed their robustness and effectiveness [37–39]. In China, Baidu holds a larger share of the internet search engine market than Google, the largest search engine in the world. In 2016, China had 731 million netizens, and the number of search engine users reached 602 million [34]. Of this market, Baidu held 77.07%, more than Google China. In particular, Vaughan and Chen [40] collected and compared data from Google and Baidu and found that Baidu Index offers more search volume data for China than Google Trends does. In this context, Baidu Index is employed in this paper to obtain public search attention in cyber space.

Focusing on the three main migration reasons, we aim to identify the search keywords that reflect public attention on migration. The keywords are determined in five steps. First, according to the principle of least effort in online information retrieval, users tend to choose brief, straightforward search keywords from their everyday language [21,36,41–44]. We therefore kept the candidate keywords short and expressed them in Chinese. Second, the specific content of the candidate keywords was derived from the three main migration reasons: relevant search terms were selected by brainstorming common words used when searching for migration-related information and by reviewing the related literature [21,45–47]. Third, we compared the average daily search volume of each candidate keyword with that of similar words over the same period to confirm that the selected keywords are the most popular in their respective aspects. For example, "租房 (house renting)" was compared with "出租 (rent)" and "租赁 (lease)"; their average daily Baidu Index values show that "house renting" (11,795) receives much more attention than "rent" (477) and "lease" (636). Fourth, we screened the candidate words so that the search query data for each keyword in each city form a sequential time series with yearly resolution. Fifth, a correlation analysis was conducted among the remaining candidate keywords, and any keyword highly correlated with the others was removed to reduce data redundancy. As a result of this selection process, each final keyword can be viewed as representing not only its own meaning but also clues to other potential keywords. Finally, six Chinese keywords from Baidu Index were confirmed to express public attention on migration in cyber space, as listed in Table 2.
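The third and fifth screening steps can be sketched as follows. This is a hedged illustration, not the authors' procedure: the greedy keep-first ordering and the `threshold=0.9` cutoff are assumptions, since the paper does not state how "high correlation" was defined or in which order keywords were removed.

```python
from math import sqrt
from typing import Dict, List, Sequence


def most_popular(avg_daily: Dict[str, float]) -> str:
    """Step 3: among near-synonyms, keep the one with the highest average daily Baidu Index."""
    return max(avg_daily, key=avg_daily.get)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def prune_redundant(series: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[str]:
    """Step 5 (assumed greedy rule): keep a keyword only if |r| with every
    already-kept keyword stays below `threshold` (a hypothetical cutoff)."""
    kept: List[str] = []
    for keyword, values in series.items():
        if all(abs(pearson(values, series[k])) < threshold for k in kept):
            kept.append(keyword)
    return kept
```

With the averages quoted above, `most_popular({"租房": 11795, "出租": 477, "租赁": 636})` returns `"租房"`. The greedy rule keeps whichever of two redundant keywords appears first, so in practice the input would be ordered by popularity.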


**Table 2.** Selection of search keywords.

#### 3.1.3. Construction of MAIs

The migration attention indexes (MAIs) are designed to comprehensively express public attention on migration in cyber space. They are constructed in four steps. First, we combine the selected search keywords with the names of the objective cities to obtain city-specific migration keywords, such as "school + Beijing", "house price + Shanghai", and "recruitment + Shenzhen". Second, the average daily search volume of these city-specific keywords is acquired from Baidu Index for the period from 1 January 2015 to 31 December 2015. Third, the population percentages of the different migration reasons are used as index weights to synthesize the corresponding Baidu Index values into MAIs. Fourth, according to the location from which the searches originate, the *local-MAI*, *external-MAI*, and *intercity-MAI* are constructed separately to express public migration attention on the objective cities from within the objective cities themselves, from external areas, and from other specific cities, respectively. The relationships among these indexes can be expressed as follows:

$$\text{MAI}\_{i} = \text{External\\_MAI}\_{i} + \text{Local\\_MAI}\_{i} \tag{1}$$

$$\text{External\\_MAI}\_i = \sum\_{j \neq i} \text{Intercity\\_MAI}\_{ij} \tag{2}$$

where *i* is the objective (destination) city and *j* is the origin city; *MAI<sub>i</sub>* is the total migration attention that city *i* receives from all regions; *Local-MAI<sub>i</sub>* and *External-MAI<sub>i</sub>* are the total migration attention that city *i* receives from within the city and from external areas, respectively; and *Intercity-MAI<sub>ij</sub>* is the public migration attention directed from city *j* to city *i*. These indexes are calculated as follows:

$$\text{Local\\_MAI}\_i = \sum\_{n=1}^{3} W\_{in} \times \text{BI}\_n / \text{MAI}\_{\max} \tag{3}$$

$$\text{External\\_MAI}\_i = \sum\_{j \neq i} \sum\_{n=1}^{3} W\_{ijn} \times \text{BI}\_n / \text{MAI}\_{\max} \tag{4}$$

$$\text{Intercity\\_MAI}\_{ij} = \sum\_{n=1}^{3} W\_{ijn} \times \text{BI}\_n / \text{MAI}\_{\max}, \quad i \neq j \tag{5}$$

where *BI<sub>n</sub>* is the average daily Baidu Index volume of the search keywords under migration reason *n*; *W<sub>in</sub>* and *W<sub>ijn</sub>* are the weights of *BI<sub>n</sub>*, defined as the proportion of people who migrate into city *i* for reason *n*; and *MAI<sub>max</sub>* is the maximum absolute value of the MAI indicators, which normalizes the indexes.
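Equations (1) to (5) can be sketched in code as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the dictionary layout keyed by `(i, j)` pairs is hypothetical, each `BI` value is assumed to be the one measured for searches from origin *j* about destination *i*, and *MAI<sub>max</sub>* is assumed to be the largest absolute unnormalized total MAI, so that the most-attended city receives MAI = 1.

```python
from typing import Dict, Hashable, List, Tuple

City = Hashable
# Hypothetical data layout (an assumption, not taken from the paper):
#   bi[(i, j)][n] : average daily Baidu Index of the reason-n keywords combined
#                   with the name of destination city i, searched from origin j
#                   (j == i gives the local searches used in Eq. 3)
#   w[(i, j)][n]  : weight W_ijn (W_in when j == i)
Table = Dict[Tuple[City, City], Dict[str, float]]


def weighted_sum(bi_row: Dict[str, float], w_row: Dict[str, float]) -> float:
    """Inner sum over migration reasons n of W_n * BI_n (Eqs. 3-5, unnormalized)."""
    return sum(w_row[n] * bi_row[n] for n in bi_row)


def compute_mais(bi: Table, w: Table, cities: List[City]) -> Dict[str, dict]:
    # Unnormalized local attention: Eq. (3) without the 1 / MAI_max factor.
    local = {i: weighted_sum(bi[(i, i)], w[(i, i)]) for i in cities}
    # Unnormalized attention flowing from origin j to destination i: Eq. (5).
    inter = {
        (i, j): weighted_sum(bi[(i, j)], w[(i, j)])
        for i in cities
        for j in cities
        if j != i
    }
    # Eqs. (2) and (4): external attention sums the intercity flows into city i.
    external = {i: sum(inter[(i, j)] for j in cities if j != i) for i in cities}
    # Eq. (1): total attention received by city i.
    total = {i: local[i] + external[i] for i in cities}
    # Assumption: MAI_max is the largest absolute unnormalized total MAI.
    mai_max = max(abs(v) for v in total.values())
    return {
        "local": {i: v / mai_max for i, v in local.items()},
        "external": {i: v / mai_max for i, v in external.items()},
        "intercity": {k: v / mai_max for k, v in inter.items()},
        "mai": {i: v / mai_max for i, v in total.items()},
    }
```

Because every index is divided by the same *MAI<sub>max</sub>*, the normalized values still satisfy Eq. (1) exactly, which is why a single shared denominator was chosen here.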

#### *3.2. Correlation Analysis between MAIs and Population Migration*

#### 3.2.1. Correlation with Urban Migrants

To investigate the relationship between public migration attention in cyber space and population migration in geographical space, we conduct a correlation analysis between the MAIs and urban migrants. In cyber space, local-MAI, external-MAI, and intercity-MAI were selected to represent public migration attention on the objective cities from different origins; in geographical space, data on floating population, inflow population, and intercity population flow were collected. Given the different kinds of migration and the different definitions of the MAIs, the correlation analysis is conducted from three aspects: (1) the correlation between local-MAI and floating population, which reflects the relationship between migration attention generated within the local city and the actual floating population inside the city; (2) the correlation between external-MAI and inflow population, which explores the relationship between migration attention received from external areas and the actual inflow population of the objective city; and (3) the correlation between intercity-MAI and intercity population flow, which investigates the relationship between migration attention flows in cyber space and actual population flows in geographical space. The Pearson correlation coefficient is employed to test these correlations; its formula is as follows:

$$r = \frac{1}{n-1} \sum\_{i=1}^{n} \left( \frac{X\_i - \overline{X}}{\delta\_X} \right) \left( \frac{Y\_i - \overline{Y}}{\delta\_Y} \right) \tag{6}$$

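where *r* is the correlation coefficient between the two indexes, *n* is the number of cities, *X<sub>i</sub>* and *Y<sub>i</sub>* are the values of the two indexes for city *i*, $\overline{X}$ and $\overline{Y}$ are their means, and *δ<sub>X</sub>* and *δ<sub>Y</sub>* are their sample standard deviations.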
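Equation (6) can be checked with a short sketch. It computes *r* as the sum of products of z-scores divided by *n* − 1, using sample standard deviations to match that denominator; the function name `pearson_r` and the pairing of, say, local-MAI with floating population per city are illustrative assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r following Eq. (6): z-score products summed and divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mx, my = mean(x), mean(y)
    # Sample standard deviations (delta_X, delta_Y), consistent with 1 / (n - 1).
    sx, sy = stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)
```

For the city-level analyses, `x` would hold one MAI variant and `y` the matching migrant figure, both ordered by the same list of cities.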
