3.1. Selection of Variables
Based on the preceding review, the empirical analysis combines data on total tourism revenue, road mileage, civil aviation flights, the number of travel agencies, the total number of tourists, the disposable income of urban residents, the net income of rural residents, employees in the tertiary industry, and the scale of foreign direct investment in Guizhou Province (Table 1).
Gross tourism receipts (R): Gross tourism receipts comprise revenue from both domestic and inbound tourism. The volume of tourism revenue directly reflects the growth of the tourism sector, which makes it a crucial indicator of the health of China’s tourism industry [27]. It serves as the dependent variable in this paper.
Road route mileage (T): A destination’s appeal to travellers depends directly on its accessibility. As living standards rise, private cars are becoming more common, and self-drive travel has become a preferred mode of transport. The better developed a destination’s road network, the more appealing it is to tourists [28]. Notably, Guizhou Province has continuously expanded its road network, which has greatly aided the growth of its tourism industry. Therefore, this paper takes road mileage as one of the variables affecting the tourism economy.
Civil flight flow (M): Air travel becomes the usual means of transport when the distance between the source market and the destination is large. Travelling by air significantly reduces the time spent in transit and improves the time efficiency of a trip [29]. As a result, the growth of the tourism industry is influenced by the volume of civil aviation traffic.
Total tourism arrivals (P): The total number of visitors counts both inbound and domestic tourists. As more people travel, the labour-intensive tourism sector expands to serve them, which raises local tourism spending and overall economic output [30].
The number of travel agencies (A): The number of travel agencies in an area reflects the current state of the region’s tourism business: visitors purchase tourism-related services directly through travel agencies, which generates tourism-related spending [31]. It also indicates the state of a region’s essential auxiliary services and the quality of its primary services.
Disposable income per urban resident (D): Disposable income is the income that urban residents actually have available for everyday living. Disposable income per urban resident is the most important and most widely used measure of urban residents’ income levels and living conditions [32]. Additionally, Guizhou Province’s urban population is the tourism industry’s largest customer segment, and its disposable income directly influences the growth of the industry.
Net income per capita of rural residents (I): Net income per capita of rural residents is defined as net income “estimated based on the rural population and reflecting the average income level of rural people in a nation or area”. Net income is the sum of all annual revenue earned by rural residents from all sources, minus the costs incurred in earning it. As society develops, rural residents earn more, and their quality of life progressively improves [33]. Rural residents have also become important customers of China’s tourism industry. As a result, the net income of rural residents is one of the key factors influencing the growth of Guizhou Province’s tourism industry.
Tertiary sector employees (E): The expansion of the tourism sector requires a large workforce as well as investment in infrastructure and other physical facilities. Because tourism services are service- and labour-intensive, the sector needs many employees [34]. Fostering the high-quality growth of tourism will ultimately require the support of well-qualified people in the tertiary sector.
The scale of foreign direct investment (F): Foreign direct investment (FDI) is a fundamental form of international capital movement and an effective way of using foreign capital. In general, greater use of foreign investment signals better prospects for economic development and often an improvement in a nation’s balance of payments position, which stimulates the domestic economy. An influx of foreign capital raises overall demand, and businesses with spare capacity raise production to meet it. Conversely, economic development is harmed if inflows of foreign capital decline. Foreign capital inflows can boost a nation’s foreign reserves and foreign currency supply, reduce the current account deficit, and enlarge the balance of payments surplus [35].
3.2. ADF Stationarity Test of the Data Series
The following conclusions can be drawn from Table 2:
(1) For R, the ADF test statistic for the level series is 7.051, with a p-value of 1.000 and critical values of −5.500, −4.072, and −3.493 at the 1%, 5%, and 10% levels, respectively. Because p = 1.000 > 0.1, the null hypothesis of a unit root cannot be rejected, and the series is non-stationary. After first-order differencing, the ADF test gives p = 0.982 > 0.1, so the null hypothesis still cannot be rejected, and the differenced series is non-stationary. After second-order differencing, the ADF test gives p = 0.041 < 0.05, so the null hypothesis is rejected at the 5% significance level, and the series is stationary.
(2) For A, the ADF test statistic for the level series is −0.621, with a p-value of 0.978 and critical values of −5.118, −3.918, and −3.411 at the 1%, 5%, and 10% levels, respectively. Because p = 0.978 > 0.1, the null hypothesis of a unit root cannot be rejected, and the series is non-stationary. After first-order differencing, the ADF test gives p = 0.012 < 0.05, so the null hypothesis is rejected at the 5% significance level, and the series is stationary.
(3) For P, the ADF test statistic for the level series is 9.272, with a p-value of 1.000 and critical values of −5.500, −4.072, and −3.493 at the 1%, 5%, and 10% levels, respectively. Because p = 1.000 > 0.1, the null hypothesis of a unit root cannot be rejected, and the series is non-stationary. After first-order differencing, the ADF test gives p = 0.124 > 0.1, so the null hypothesis still cannot be rejected, and the differenced series is non-stationary. After second-order differencing, the ADF test gives p = 0.021 < 0.05, so the null hypothesis is rejected at the 5% significance level, and the series is stationary.
(4) For T, the ADF test statistic for the level series is −1.646, with a p-value of 0.774 and critical values of −5.118, −3.918, and −3.411 at the 1%, 5%, and 10% levels, respectively. Because p = 0.774 > 0.1, the null hypothesis of a unit root cannot be rejected, and the series is non-stationary. After first-order differencing, the ADF test gives p = 0.568 > 0.1, so the null hypothesis still cannot be rejected, and the differenced series is non-stationary. After second-order differencing, the ADF test gives p = 0.001 < 0.01, so the null hypothesis is rejected at the 1% significance level, and the series is stationary.
(5) For M, the ADF test statistic for the level series is −0.684, with a p-value of 0.974 and critical values of −5.118, −3.918, and −3.411 at the 1%, 5%, and 10% levels, respectively. Because p = 0.974 > 0.1, the null hypothesis of a unit root cannot be rejected, and the series is non-stationary. After first-order differencing, the ADF test gives p = 0.001 < 0.01, so the null hypothesis is rejected at the 1% significance level, and the series is stationary.
(6) For D, the ADF test statistic for the level series is 14.647, with a p-value of 1.000 and critical values of −5.500, −4.072, and −3.493 at the 1%, 5%, and 10% levels, respectively. Because p = 1.000 > 0.1, the null hypothesis of a unit root cannot be rejected, and the series is non-stationary. After first-order differencing, the ADF test gives p = 0.055 < 0.1, so the null hypothesis is rejected only at the 10% significance level. After second-order differencing, the ADF test gives p = 0.000 < 0.01, so the null hypothesis is rejected at the 1% significance level, and the series is stationary.
(7) For I, the ADF test statistic for the level series is −1.348, with a p-value of 0.876 and critical values of −4.884, −3.822, and −3.359 at the 1%, 5%, and 10% levels, respectively. Because p = 0.876 > 0.1, the null hypothesis of a unit root cannot be rejected, and the series is non-stationary. After first-order differencing, the ADF test gives p = 0.137 > 0.1, so the null hypothesis still cannot be rejected, and the differenced series is non-stationary. After second-order differencing, the ADF test gives p = 0.006 < 0.01, so the null hypothesis is rejected at the 1% significance level, and the series is stationary.
(8) For E, the ADF test statistic for the level series is −0.364, with a p-value of 0.988 and critical values of −5.118, −3.918, and −3.411 at the 1%, 5%, and 10% levels, respectively. Because p = 0.988 > 0.1, the null hypothesis of a unit root cannot be rejected, and the series is non-stationary. After first-order differencing, the ADF test gives p = 0.000 < 0.01, so the null hypothesis is rejected at the 1% significance level, and the series is stationary.
(9) For F, the ADF test statistic for the level series is −3.515, with a p-value of 0.038 and critical values of −5.118, −3.918, and −3.411 at the 1%, 5%, and 10% levels, respectively. Because p = 0.038 < 0.05, the null hypothesis of a unit root is rejected at the 5% significance level, and the series is stationary in levels.
In summary, every variable is stationary either in levels or after first- or second-order differencing, and the analysis can proceed to the next step of the empirical study.
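The decision rule applied to each variable above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it takes the p-values quoted in this section as given (it does not run an ADF test itself), and the names `ADF_P_VALUES`, `difference`, and `integration_order` are hypothetical helpers introduced here.

```python
# p-values quoted in Section 3.2, ordered as
# [level series, first difference, second difference].
ADF_P_VALUES = {
    "R": [1.000, 0.982, 0.041],
    "A": [0.978, 0.012],
    "P": [1.000, 0.124, 0.021],
    "T": [0.774, 0.568, 0.001],
    "M": [0.974, 0.001],
    "D": [1.000, 0.055, 0.000],
    "I": [0.876, 0.137, 0.006],
    "E": [0.988, 0.000],
    "F": [0.038],
}


def difference(series, order=1):
    """Apply simple differencing y[t] - y[t-1] `order` times."""
    values = list(series)
    for _ in range(order):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


def integration_order(p_values, alpha=0.05):
    """Return the lowest differencing order whose p-value is below alpha.

    Returns None if the unit-root null is never rejected at level alpha.
    """
    for order, p_value in enumerate(p_values):
        if p_value < alpha:
            return order
    return None


# Differencing order needed for each variable at the 5% level.
orders = {name: integration_order(p) for name, p in ADF_P_VALUES.items()}
for name, order in orders.items():
    print(f"{name}: stationary after differencing of order {order}")
```

At the 5% level this rule classifies F as stationary in levels, A, M, and E as stationary after first differencing, and the remaining series as stationary after second differencing; relaxing `alpha` to 0.10 moves D to first differencing, which matches the two results reported for D.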
3.3. Exploring: Tourism Revenue and Tourism Visitor Numbers
The ADF tests show that the time series for total tourism revenue (R), number of travel agencies (A), total number of tourists (P), road route mileage (T), number of civil flights (M), per capita disposable income of urban residents (D), per capita net income of rural residents (I), employees in the tertiary industry (E), and scale of foreign direct investment (F) are stationary in levels or after differencing, so further analysis can be conducted. The relationship between tourism income and the number of tourists is explored below.
(1) Literature support.
First, in “The impact of tourism numbers on domestic tourism income in China”, Lu Liu examined the impact of domestic tourist numbers on China’s domestic tourism income using a simple (one-variable) linear regression model [36]. The author first described how the number of domestic tourists and domestic tourism income in China developed over time, then estimated the linear regression function from the data with the EViews software to quantify the relationship between the two variables, and finally tested the credibility of the model and the significance of the variables statistically [36]. Based on these conclusions, the author proposed countermeasures and suggestions for improving China’s domestic tourism revenue and achieving its steady growth [36].
Following this well-documented view, this paper includes the number of tourists among the variables affecting tourism revenue and studies it empirically.
(2) A brief description of the relationship between the two.
Tourism is an activity in which people move and consume across regions, and the essential element in this process is the participation of people; in short, the number of tourists plays a crucial role [37]. Without tourists, a region cannot generate tourism income: the various tourism functions cannot operate or deliver their utility, and so cannot produce direct economic benefits, i.e., tourism revenue, as most scholars have argued [38]. If the direction of the relationship between tourist numbers and tourism revenue is debated too narrowly, the discussion falls into an endless “chicken and egg” cycle, which does not deepen tourism research and is not in line with conventional thinking.
Therefore, this paper insists on including the number of tourists in the variables that affect tourism revenue for empirical analysis.
3.5. Factor Analysis
Table 5 reports the factor extraction and the amount of information extracted by each factor. Three factors were extracted; after rotation they explain 40.953%, 32.122%, and 25.964% of the variance, respectively, for a cumulative explained variance of 99.039%.
The data in this study were rotated using the maximum variance rotation method (varimax) to find the correspondence between the factors and the study items. The rotated loading table shows how well the factors extract information from the study items and how the factors correspond to the study items. All of the study items have a communality above 0.4, which means that the study items are strongly related to the factors and that the factors extract their information effectively. After confirming that the factors extract most of the information from the research items, the correspondence between the factors and the research items was analysed (an absolute factor-loading coefficient greater than 0.4 means that the item corresponds to the factor).
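The two checks described above (communality above 0.4, absolute loading above 0.4) can be sketched as follows. The rotated variances are the figures reported for Table 5; the loadings in `EXAMPLE_LOADINGS` are hypothetical values invented for illustration and are not the entries of Table 6, and the helper names are likewise assumptions.

```python
# Rotated variance explained by the three factors (percent), as reported for Table 5.
ROTATED_VARIANCE = [40.953, 32.122, 25.964]
cumulative_variance = round(sum(ROTATED_VARIANCE), 3)

# HYPOTHETICAL rotated loadings (NOT taken from Table 6): item -> (F1, F2, F3).
EXAMPLE_LOADINGS = {
    "T": (0.85, 0.35, 0.30),
    "P": (0.30, 0.88, 0.32),
    "D": (0.55, 0.45, 0.68),  # cross-loading item: attribution needs judgement
}


def communality(loadings):
    """Share of an item's variance explained by the factors (sum of squared loadings)."""
    return sum(value * value for value in loadings)


def corresponding_factors(loadings, threshold=0.4):
    """Return 1-based indices of factors whose absolute loading exceeds the threshold."""
    return [index + 1 for index, value in enumerate(loadings) if abs(value) > threshold]


print("cumulative rotated variance:", cumulative_variance)
for item, loadings in EXAMPLE_LOADINGS.items():
    print(item, round(communality(loadings), 4), corresponding_factors(loadings))
```

An item such as the hypothetical "D" row, which exceeds the threshold on more than one factor, is the case in which the attribution has to be settled by subject knowledge rather than by the threshold alone.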
Table 6 shows that the two variables road mileage (T) and civil flights (M) converge on the first common factor; given the characteristics of these two variables, the first common factor can be named the infrastructure influence factor. The number of travel agencies (A) and the total number of tourists (P) converge on the second common factor; given the characteristics of these two variables, the second common factor can be named the tourism flow influence factor. The four variables of urban disposable income per capita (D), rural net income per capita (I), tertiary industry employees (E), and foreign direct investment (F) converge on the third common factor; given the characteristics of these four variables, the third common factor can be named the investment and consumption influence factor. After extracting the three common factors, it is necessary to consider the linear relationship between each common factor and the variables, which can be obtained from the component score coefficient matrix, as shown in Table 7.
[Tips]
1: If a research item corresponds to more than one factor, professional knowledge should be used to decide which factor the item belongs to.
2: If a research item does not correspond to any factor, consider deleting the research item.
3: If a factor does not correspond to any research item, consider reducing the number of factors by one.
If factor analysis is used only to condense information, the component score coefficient matrix can be ignored. If factor analysis is used to calculate weights, the component score coefficient matrix (Table 7) is used to build the relationship between the factors and the study items, expressed on standardised data, as given in Equation (3).
[Equation (3): factor scores expressed as linear combinations of the standardised study items, with coefficients from Table 7]
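As a sketch of the general shape such a relationship takes (not a reproduction of Equation (3)), each factor score is a weighted sum of the standardised study items. The symbols are notation assumed here: $Z_i$ is the standardised value of item $i$, and $w_{ij}$ is the entry of the component score coefficient matrix for item $i$ and factor $j$. The numerical coefficients are those of Table 7 and are not restated.

```latex
F_j = \sum_{i=1}^{8} w_{ij}\, Z_i , \qquad j = 1, 2, 3,
\qquad (Z_1, \dots, Z_8) = (Z_T,\, Z_M,\, Z_A,\, Z_P,\, Z_D,\, Z_I,\, Z_E,\, Z_F)
```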
In the scree plot (Figure 6), the point at which the curve changes from steep to flat gives a reference number of factors to extract. The scree plot only assists the decision on the number of factors; in practice, the number is settled mainly by professional judgement, weighing how well the factors correspond to the study items.
3.6. Linear Regression Analysis
A linear regression analysis was carried out with F1, F2, and F3 as the independent variables and SN_R (total tourism revenue) as the dependent variable (Table 8); from this, model Equation (4) can be derived.
[Equation (4): linear regression of SN_R on F1, F2, and F3]
The model R-squared value of 0.985 implies that F1, F2, and F3 explain 98.5% of the variation in SN_R (total tourism revenue). The model passes the F-test (F = 221.321, p = 0.000 < 0.05), which means that at least one of F1, F2, and F3 has a significant relationship with SN_R. The regression coefficient of F1 is 0.035 (t = 11.989, p = 0.000 < 0.01), that of F2 is 0.060 (t = 20.886, p = 0.000 < 0.01), and that of F3 is 0.026 (t = 9.166, p = 0.000 < 0.01), so each factor has a significant positive effect on SN_R.
To summarise the analysis, it can be seen that F1, F2, and F3 significantly positively affect SN_R (total tourism revenue).
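Putting the reported coefficients together, the fitted model has the following form. This is a sketch assembled from the coefficient values quoted in this section; the intercept is not stated in the text, so it is left symbolic as $\beta_0$ (an assumed symbol).

```latex
\widehat{\mathrm{SN\_R}} = \beta_0 + 0.035\,F_1 + 0.060\,F_2 + 0.026\,F_3
```

Because all three coefficients are positive, a one-unit increase in any factor score raises the predicted standardised revenue, with F2 having the largest marginal effect.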
Figure 7 and Table 9 summarise the linear regression with F1, F2, and F3 as the independent variables and SN_R as the dependent variable. Table 9 shows that the model R-squared value is 0.985, implying that F1, F2, and F3 explain 98.5% of the variation in SN_R.
The model also passes the F-test (F = 221.321, p = 0.000 < 0.05; Table 10), which means that the model construction is meaningful.
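The link between the reported R-squared and the overall F-statistic can be illustrated with the standard identity for a regression with an intercept, F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below is illustrative only: the sample size is not stated in this section, so `ASSUMED_N_OBS = 14` is purely an assumed value, and the function name is hypothetical.

```python
def f_statistic_from_r2(r_squared, n_obs, n_predictors):
    """Overall F-statistic of an OLS model with an intercept, computed from R^2."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    if df_resid <= 0:
        raise ValueError("need more observations than estimated parameters")
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


# ASSUMPTION: the number of observations is not given in this section;
# 14 is used here only to illustrate the calculation.
ASSUMED_N_OBS = 14

# R-squared = 0.985 and three predictors (F1, F2, F3) are taken from the text.
f_value = f_statistic_from_r2(0.985, ASSUMED_N_OBS, 3)
print(f"F computed from rounded R-squared: {f_value:.3f}")
```

Under that assumed sample size, the computed value is close to the reported F = 221.321; the remaining gap is of the size that rounding R-squared to three decimals would produce.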