#### 4.1.2. Statistical Models

This study uses a fixed-effects regression approach to estimate the influence of education finance, economic indicators, and educational systems on enrollment in short-cycle tertiary vocational education. Applied to cross-country time-series (panel) data, fixed-effects models capture country-specific, unobservable, time-invariant effects (e.g., national history and culture, socio-political structure, educational values, and the financing pattern for higher education). Additionally, our fixed-effects models include a time dummy variable for each year to detrend variables that tend to increase or decrease over time (e.g., enrollment ratios, total population). Controlling for the time trend in this way helps the study avoid the "spurious regression problem" [70]. The following fixed-effects model was applied to estimate the effects of education finance variables and other predictors of tertiary vocational education while accounting for country and time effects:

$$y_{it} = \alpha + \beta_1 \mathbf{x}_{1it} + \dots + \beta_k \mathbf{x}_{kit} + \mu_i + \gamma_t + \varepsilon_{it} \tag{1}$$

where $\varepsilon_{it} \sim \mathrm{IID}(0, \sigma_\varepsilon^2)$ and $\mu_i \sim \mathrm{IID}(0, \sigma_\mu^2)$.

In this model, $i$ $(= 1, 2, \dots, N)$ represents the $i$th country; $t$ $(= 2000, 2001, \dots, 2018)$ denotes the year, with 2000 treated as the reference category; $\alpha$ is the intercept of the model; $\beta_k$ is the coefficient associated with the independent variable $\mathbf{x}_{kit}$; $\gamma_t$ denotes the time effect; $\mu_i$ represents country-varying, time-invariant effects (country fixed effects); and $\varepsilon_{it}$ denotes the country-varying and time-varying error term. In a fixed-effects model, the country dummies $\mu_i$ are considered part of the intercept. By assumption, $E(\varepsilon_{it}) = 0$ and $\mathrm{Var}(\varepsilon_{it}) = \sigma_\varepsilon^2$, while $\varepsilon_{it} \sim \mathrm{IID}(0, \sigma_\varepsilon^2)$ denotes that the errors are independent and identically distributed (IID).
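To make the structure of Equation (1) concrete, the following is a minimal sketch of a two-way fixed-effects estimate obtained with least-squares dummy variables. It is an illustration under stated assumptions and not the estimation code used in this study: the function name `two_way_fe`, the synthetic panel, and the true slope of 0.5 are all hypothetical, and only NumPy is assumed.

```python
import numpy as np

def two_way_fe(y, X, country, year):
    """Least-squares dummy-variable estimate of a two-way fixed-effects model.

    y: (n,) outcome; X: (n,) or (n, k) predictors; country, year: (n,) labels.
    Returns the k slope coefficients (beta_1, ..., beta_k).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    countries = np.unique(country)
    years = np.unique(year)
    # One dummy per country: these columns absorb alpha + mu_i,
    # so no separate intercept column is added.
    C = (np.asarray(country)[:, None] == countries[None, :]).astype(float)
    # Year dummies gamma_t; the earliest year is dropped as the reference category.
    T = (np.asarray(year)[:, None] == years[None, 1:]).astype(float)
    design = np.hstack([X, C, T])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[: X.shape[1]]

# Synthetic panel (hypothetical): 30 countries observed yearly from 2000 to 2018.
rng = np.random.default_rng(0)
n_countries, n_years = 30, 19
country = np.repeat(np.arange(n_countries), n_years)
year = np.tile(np.arange(2000, 2000 + n_years), n_countries)
mu = rng.normal(size=n_countries)[country]   # country effect mu_i
gamma = 0.1 * (year - 2000)                  # common time effect gamma_t
x = rng.normal(size=country.size) + mu       # predictor correlated with mu_i
y = 1.0 + 0.5 * x + mu + gamma + rng.normal(scale=0.1, size=country.size)

beta_hat = two_way_fe(y, x, country, year)   # slope estimate, true value 0.5
```

Because the predictor is constructed to correlate with the country effect, a pooled regression without country dummies would be biased in this example, which is the situation the fixed-effects specification is meant to address.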

This study uses the above equation to estimate four fixed-effects models on two dependent variables: Models 1 and 3 exclude, and Models 2 and 4 include, the interaction terms between a country's development level and the predictor variables. We report the results from all models and highlight results that are consistent across two or more models. Because at least one of the interaction terms in Models 2 and 4 is statistically significant, we present both the regression coefficients and the interaction effects.
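One way to write the specification with interactions is sketched below. The notation is an assumption introduced for illustration: $D_i$ is a hypothetical indicator of the country's development level (e.g., 1 for developed and 0 otherwise), $\delta_k$ is the corresponding interaction coefficient, and the set of predictors that are actually interacted is not specified in this section.

$$y_{it} = \alpha + \sum_{k} \beta_k \mathbf{x}_{kit} + \sum_{k} \delta_k \left( D_i \times \mathbf{x}_{kit} \right) + \mu_i + \gamma_t + \varepsilon_{it}$$

Under this form, the effect of predictor $k$ is $\beta_k$ when $D_i = 0$ and $\beta_k + \delta_k$ when $D_i = 1$. If $D_i$ does not change over time, its main effect is absorbed by the country fixed effect $\mu_i$ and therefore does not appear as a separate term.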

#### 4.1.3. Data Limitations

This study has some limitations. First, some factors that may influence vocational education are either unavailable from the WDI dataset (e.g., tuition fees and financial aid programs) or not broken down into the necessary components. For example, government expenditure on tertiary education does not distinguish between vocational and academic programs.

Second, except for the two economic indicators and three population variables, the variables contain considerable missing values. Assuming that the missing-data mechanism is random and that variables change slowly over time, the study imputed a small share of the missing data (only 5 percent of all data points across the 18 variables), using values adjusted to the average growth rate of the variable concerned.
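The imputation rule is described only briefly here, so the following is one possible reading rather than the procedure actually applied. The sketch assumes a single country's yearly series with strictly positive values, estimates a compound average growth rate from the first and last observed values, and projects missing years from their neighbors. The function name `impute_by_average_growth` and the example series are hypothetical.

```python
import numpy as np

def impute_by_average_growth(values):
    """Fill NaN gaps in one yearly series using its average growth rate.

    Assumes strictly positive observed values (e.g., ratios or counts).
    """
    filled = np.asarray(values, dtype=float).copy()
    observed = np.flatnonzero(~np.isnan(filled))
    if observed.size < 2:
        return filled  # too few observations to estimate a growth rate
    first, last = observed[0], observed[-1]
    # Compound average annual growth rate between first and last observed years.
    rate = (filled[last] / filled[first]) ** (1.0 / (last - first)) - 1.0
    # Forward pass: project each missing year from the preceding year.
    for t in range(first + 1, len(filled)):
        if np.isnan(filled[t]):
            filled[t] = filled[t - 1] * (1.0 + rate)
    # Backward pass: fill any years before the first observation.
    for t in range(first - 1, -1, -1):
        filled[t] = filled[t + 1] / (1.0 + rate)
    return filled

# Hypothetical series with two gaps; the implied growth rate is 10 percent per year.
example = impute_by_average_growth([10.0, np.nan, 12.1, np.nan])
# example is approximately [10.0, 11.0, 12.1, 13.31]
```

A rule of this kind is only defensible under the assumptions stated in the text, namely that values are missing at random and that the variable changes slowly over time.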

Third, the countries in the statistical analysis are primarily middle- or upper-income countries, which are more likely to report data to the World Bank. Moreover, the developed countries in the dataset are mostly members of the European Union, plus New Zealand. The United States, which is referred to in many places in the paper, is missing due to a lack of data, as are China, Russia, India, Turkey, Argentina, Venezuela, France, Australia, and other nations. Thus, it is inappropriate to generalize beyond the countries included in the study.

Fourth, because data after 2018 are unavailable for most countries in the World Development Indicators dataset, caution is advised when generalizing the results to the coronavirus pandemic and post-pandemic periods.

Finally, as noted above (Section 4.1.1), we took steps to reduce the risk of multicollinearity. We first reviewed pairwise correlations between all independent variables and confirmed that no severe collinearity affected the estimation. Furthermore, our panel dataset has a long time series, with a maximum of 19 years of homogeneous entries (2000–2018, when no data were missing) clustered within each country, which produces higher correlations than a single cross-country dataset would [97]. Thus, the correlation table is not provided (but is available on request).
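A pairwise correlation screen of the kind described above could look like the following sketch. It is illustrative only: the function name `high_correlation_pairs`, the variable names, and the cutoff of 0.8 are assumptions, since the threshold used to judge "severe" collinearity is not stated here. The sketch also assumes complete data with no missing values.

```python
import numpy as np

def high_correlation_pairs(X, names, threshold=0.8):
    """Return (name_a, name_b, r) for variable pairs with |r| >= threshold.

    X: (n, k) array with one column per independent variable, no missing values.
    threshold: assumed cutoff for flagging a pair; not taken from the study.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Hypothetical data: "b" is nearly a copy of "a", while "c" is independent.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = a + 0.1 * rng.normal(size=200)
c = rng.normal(size=200)
flagged = high_correlation_pairs(np.column_stack([a, b, c]), ["a", "b", "c"])
```

In this example only the pair `("a", "b")` is flagged. In a pooled panel, repeated observations of the same country tend to raise such correlations, which is the caveat the paragraph above raises.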
