Undoubtedly, some of the factors responsible for the dynamics of GDP in individual economies are associated with their specific characteristics.
One of the questions that has to be asked is whether such determinants can be identified. The MC³ algorithm used in the BMA method, presented in the following section, makes it possible to “capture” the models and variables with the greatest explanatory power.
2.2. Bayesian Methods Used in the Study
Model-building strategies based on theoretical and statistical assumptions always include elements of uncertainty about the determinants. One of the most significant challenges of contemporary economic theory and economic policy is to accurately identify the factors determining economic growth. The economic growth literature, e.g., Sala-i-Martin et al. [19] and Cuaresma et al. [38], encompasses a range of studies that point to various factors and groups of factors responsible for the processes of economic growth. These studies provide the foundation for the considerations below. There is consensus in the literature that methods developed on the basis of Bayesian econometrics are well suited to the analysis of such a complex economic phenomenon as the determination of the sources of economic growth.
From a statistical point of view, one has to face the problem of choosing the proper set of independent variables during model construction, and the goodness of fit of a statistical model has to be evaluated. Moreover, with a large number of variables and different selection procedures, it is difficult to decide which model and which variables are the most appropriate for the analysis of the dependencies. For example, if we take into account a set of twenty independent variables, we obtain more than one million ($2^{20} = 1{,}048{,}576$) possible linear combinations of determinants in a simple regression model. Therefore, it is really hard to find the optimal set of variables in terms of goodness-of-fit measures. Additionally, Raftery et al. [61] showed that different modeling approaches lead to different estimates and conflicting conclusions. From a Bayesian point of view, model uncertainty is a natural aspect of the model-building strategy and can be incorporated into the construction process. For example, Zellner [62] showed that we can calculate the posterior odds ratio between two competing models and obtain a posterior probability for each of them. Using Bayesian inference, we can obtain not only the posterior probability of the model, but also the posterior characteristics of the parameters, such as the mean, variance, and quantiles (see Koop [63]). Since we have these characteristics for all models, we can calculate measures of interest across the whole model space instead of making inferences based on a single model.
Consider the normal linear regression $M_j$ for a dependent variable $y$:

$$ y = \alpha \iota_N + X_j \beta_j + \varepsilon, \qquad (1) $$

where $\alpha$ is a constant, $\iota_N$ denotes an $N \times 1$ vector of ones, $X_j$ is an $N \times k_j$ matrix of regressors in model $M_j$, and $\beta_j$ is a $k_j \times 1$ vector of parameters. $\varepsilon$ is a vector of dimensions $N \times 1$ with a normal distribution $N(0, \sigma^2 I_N)$, where $\sigma^2$ is the variance of random error $\varepsilon$ and $I_N$ is an identity matrix of size $N$. Data are taken from $N$ objects.
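To make the notation concrete, the following minimal NumPy sketch (purely illustrative; the sample size, the selected regressors, and the parameter values are hypothetical and not taken from the study) simulates data from the regression in Equation (1):

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 100, 20                       # illustrative sample size and number of candidate regressors
X_full = rng.normal(size=(N, K))     # pool of all K candidate regressors

# Model M_j uses a subset of the candidate regressors (here columns 0, 3, and 7).
X_j = X_full[:, [0, 3, 7]]
alpha, beta_j, sigma = 1.0, np.array([0.5, -0.3, 0.8]), 0.5

# Equation (1): y = alpha * iota_N + X_j beta_j + eps, with eps ~ N(0, sigma^2 I_N)
y = alpha * np.ones(N) + X_j @ beta_j + sigma * rng.normal(size=N)
```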
To illustrate Bayesian model averaging, we can calculate a posterior mean of the regression parameters across the whole model space using the following equations:

$$ E(\beta \mid y) = \sum_{j=1}^{2^K} P(M_j \mid y)\, E(\beta \mid y, M_j), \qquad (2) $$

with the variance:

$$ Var(\beta \mid y) = \sum_{j=1}^{2^K} P(M_j \mid y)\, Var(\beta \mid y, M_j) + \sum_{j=1}^{2^K} P(M_j \mid y) \big[ E(\beta \mid y, M_j) - E(\beta \mid y) \big]^2, \qquad (3) $$

where $P(M_j \mid y)$ denotes the posterior probability of model $M_j$, $E(\beta \mid y, M_j)$ and $Var(\beta \mid y, M_j)$ are the expected value and the variance of the parameters, and $2^K$ is the total number of all linear combinations in the regression model, $K$ being the number of candidate regressors. From Equations (2) and (3), it is clear that the posterior mean and variance calculated across the whole model space are weighted averages of the posterior means and variances of the individual models.
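As a simple illustration of Equations (2) and (3), the sketch below computes the model-averaged mean and variance of a single coefficient; the per-model posterior probabilities, means, and variances are hypothetical numbers, not results from the study:

```python
import numpy as np

# Hypothetical per-model quantities for one regression coefficient:
post_model_prob = np.array([0.6, 0.3, 0.1])    # P(M_j | y), summing to one
post_mean       = np.array([0.50, 0.45, 0.0])  # E(beta | y, M_j); zero if the variable is excluded
post_var        = np.array([0.02, 0.03, 0.0])  # Var(beta | y, M_j)

# Equation (2): model-averaged posterior mean
bma_mean = np.sum(post_model_prob * post_mean)

# Equation (3): within-model variances plus the between-model spread of the means
bma_var = (np.sum(post_model_prob * post_var)
           + np.sum(post_model_prob * (post_mean - bma_mean) ** 2))
```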
The calculation of the posterior model probability and the estimation of parameters in the linear regression model are well-known topics in the Bayesian statistics literature, so here we provide only a brief overview of the main steps, especially those related to the model averaging framework.
For computational simplicity, we use a natural conjugate normal-Gamma prior for the regression parameters (see DeGroot [64], Koop [63]); thus, we assume standard noninformative priors for $\sigma$ and intercept $\alpha$, which are common parameters in all regression models:

$$ p(\alpha) \propto 1, \qquad p(\sigma) \propto \sigma^{-1}, \qquad (4) $$

and for regression coefficients $\beta_j$, we assume a normal prior distribution with mean $0$ and covariance matrix $\sigma^2 (g\, X_j' X_j)^{-1}$:

$$ \beta_j \mid \sigma, M_j \sim N\big(0,\; \sigma^2 (g\, X_j' X_j)^{-1}\big). \qquad (5) $$
From Equation (5), it is clear that the covariance of the prior distribution of $\beta_j$ depends on $\sigma^2$. Additionally, note that the prior covariance matrix is proportional to the data-based covariance matrix $(X_j' X_j)^{-1}$ and the $g$-prior parameter $g$. The basic idea of the $g$-prior, underlined by Zellner [65], is to assume a common prior distribution for the regression coefficients, both because of the computational speed with which the posterior distributions can be obtained and because of its convenience in the model selection framework. In this case, we used the “benchmark” prior, which is popular in the Bayesian model averaging framework and was recommended by Fernández et al. [17] and Ley and Steel [23]. In our approach, we use $g = 1/K^2$ for a large number of regressors, i.e., when $K^2 > N$, and $g = 1/N$ when $N \geq K^2$.
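In code, the benchmark choice amounts to $g = 1/\max\{N, K^2\}$; a small helper (the function name is ours, introduced only for the later sketches) makes the rule explicit:

```python
def benchmark_g(N: int, K: int) -> float:
    """Benchmark g-prior: g = 1/K^2 when K^2 > N, and g = 1/N otherwise."""
    return 1.0 / max(N, K ** 2)

g = benchmark_g(N=100, K=20)   # K^2 = 400 > N = 100, so g = 1/400
```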
We assume that the residuals in the regression model are normally distributed; therefore, the likelihood function has the following form:

$$ p(y \mid \alpha, \beta_j, \sigma, M_j) = (2\pi)^{-\frac{N}{2}} \sigma^{-N} \exp\left\{ -\frac{1}{2\sigma^2} (y - \alpha \iota_N - X_j \beta_j)'(y - \alpha \iota_N - X_j \beta_j) \right\}. \qquad (6) $$
It is well known from the Bayesian literature that, within the natural conjugate framework and after integrating out intercept $\alpha$, the posterior for $\beta_j$ follows a multivariate Student-$t$ distribution, where the posterior mean and covariance matrix of the regression coefficients can be written as follows (see Fernández et al. [17], Koop [63]):

$$ E(\beta_j \mid y, M_j) = \frac{1}{1+g}\, \hat{\beta}_j, \qquad (7) $$

$$ Var(\beta_j \mid y, M_j) = \frac{\nu s_j^2}{\nu - 2} \cdot \frac{1}{1+g}\, (X_j' X_j)^{-1}, \qquad (8) $$

where:

$$ \nu s_j^2 = \frac{1}{1+g}\, (y - \bar{y}\iota_N - X_j \hat{\beta}_j)'(y - \bar{y}\iota_N - X_j \hat{\beta}_j) + \frac{g}{1+g}\, (y - \bar{y}\iota_N)'(y - \bar{y}\iota_N), \qquad (9) $$

and $\hat{\beta}_j = (X_j' X_j)^{-1} X_j' (y - \bar{y}\iota_N)$ is the OLS estimator, with $\nu = N - 1$ degrees of freedom and the regressors $X_j$ taken in deviations from their means. After integrating out all parameters, we know that the density of the marginal distribution of the vector $y$ is given by:

$$ p(y \mid M_j) \propto \left( \frac{g}{1+g} \right)^{\frac{k_j}{2}} \left[ \frac{1}{1+g}\, (y - \bar{y}\iota_N - X_j \hat{\beta}_j)'(y - \bar{y}\iota_N - X_j \hat{\beta}_j) + \frac{g}{1+g}\, (y - \bar{y}\iota_N)'(y - \bar{y}\iota_N) \right]^{-\frac{N-1}{2}}. \qquad (10) $$
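A NumPy sketch of Equation (10) is given below (again only illustrative: the function name is ours, the regressors are taken in deviations from their means so that the intercept drops out, and the constant common to all models is omitted):

```python
import numpy as np

def log_marginal_density(y, X_j, g):
    """Log of p(y | M_j) from Equation (10), up to a constant common to all models."""
    N, k_j = X_j.shape
    y_dm = y - y.mean()                              # y - ybar * iota_N
    tss = np.sum(y_dm ** 2)                          # (y - ybar iota_N)'(y - ybar iota_N)
    if k_j == 0:                                     # null model: intercept only
        ssr = tss
    else:
        X_dm = X_j - X_j.mean(axis=0)                # regressors in deviations from means
        beta_hat = np.linalg.lstsq(X_dm, y_dm, rcond=None)[0]   # OLS estimator
        ssr = np.sum((y_dm - X_dm @ beta_hat) ** 2)  # OLS residual sum of squares
    quad = ssr / (1.0 + g) + g / (1.0 + g) * tss
    return 0.5 * k_j * np.log(g / (1.0 + g)) - 0.5 * (N - 1) * np.log(quad)
```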
Since we have the marginal data density $p(y \mid M_j)$ in Equation (10), the posterior probability of any variant of regression model $M_j$ can be calculated by the following formula, which is essential for Bayesian model averaging:

$$ P(M_j \mid y) = \frac{p(y \mid M_j)\, P(M_j)}{\sum_{i=1}^{2^K} p(y \mid M_i)\, P(M_i)}, \qquad (11) $$

where the expressions $P(M_j)$ denote the prior probabilities of the competing models. In our work, we take the very simple assumption that all linear combinations are equally probable: $P(M_1) = P(M_2) = \dots = P(M_{2^K}) = 1/2^K$ and $\sum_{j=1}^{2^K} P(M_j) = 1$. Therefore, Equation (11) can be simplified to:

$$ P(M_j \mid y) = \frac{p(y \mid M_j)}{\sum_{i=1}^{2^K} p(y \mid M_i)}. \qquad (12) $$
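For a small model space, Equation (12) can be evaluated by brute-force enumeration. The sketch below reuses the hypothetical log_marginal_density and benchmark_g helpers and the simulated y and X_full from the earlier sketches, restricting attention to the first five candidate regressors so that only 2⁵ = 32 subsets have to be visited, and normalizes the marginal data densities into posterior model probabilities:

```python
import numpy as np
from itertools import chain, combinations

def posterior_model_probs(y, X_full, g, K_small):
    """Equation (12): exact posterior model probabilities over all subsets of the
    first K_small regressors, assuming equal prior probabilities for every model."""
    subsets = list(chain.from_iterable(
        combinations(range(K_small), r) for r in range(K_small + 1)))
    log_md = np.array([log_marginal_density(y, X_full[:, list(s)], g) for s in subsets])
    weights = np.exp(log_md - log_md.max())          # stabilize before normalizing
    return subsets, weights / weights.sum()

subsets, probs = posterior_model_probs(y, X_full, benchmark_g(N=len(y), K=5), K_small=5)
```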
The estimation of parameters in the linear regression model and the computation of the marginal data density are very well-known issues in the Bayesian literature, and in most cases they do not require advanced computation techniques (see Koop [63]). On the other hand, we have to face the problem of obtaining posterior quantities for a large set of exogenous regressors. For example, if we consider $K = 20$ independent variables, we have to estimate $2^{20} = 1{,}048{,}576$, i.e., more than one million, linear combinations, which requires tremendous computational CPU time. From both a practical and a computational point of view, this does not seem reasonable. If we decide to choose only the “best” model, we will probably neglect much information from the other potentially interesting competing models. On the other hand, if we need information based on the whole model space, we will have to estimate a tremendous number of combinations, some of them with very low posterior probability, and we will have to spend much CPU time obtaining the estimation results for all linear combinations. A much better idea is to use a “smart” algorithm that finds the most probable models and ignores low-probability models within a reasonable CPU time.
One such procedure is the MC³ (Markov Chain Monte Carlo Model Composition) algorithm, which was developed by Madigan et al. [66] on the basis of the Markov chain Monte Carlo method. This method facilitates easy “capturing” of the models with the greatest explanatory power. This means that we focus on the most probable variables and models, while neglecting the least likely ones. We use an atheoretical approach for a large number of combinations of determinants, which is why the usage of BMA with MC³ is crucial for our study. The candidate model $M'$ is accepted with the probability:

$$ P_{\text{accept}} = \min\left\{ 1,\; \frac{p(y \mid M')\, P(M')}{p(y \mid M)\, P(M)} \right\} = \min\left\{ 1,\; \frac{p(y \mid M')}{p(y \mid M)} \right\}, \qquad (13) $$

where $M$ denotes the previously-accepted model in the Markov chain of models.
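The following NumPy sketch of the MC³ step (a simplified, illustrative implementation, not the exact code used in the study) proposes a candidate model by adding or dropping one randomly chosen regressor and accepts it according to Equation (13); it reuses the hypothetical log_marginal_density and benchmark_g helpers defined earlier:

```python
import numpy as np

def mc3(y, X_full, g, n_iter=20_000, seed=1):
    """Sketch of the MC3 sampler: a random walk over the model space driven by Equation (13)."""
    rng = np.random.default_rng(seed)
    K = X_full.shape[1]
    current = rng.integers(0, 2, size=K).astype(bool)        # random starting model
    current_lmd = log_marginal_density(y, X_full[:, current], g)
    visits = []
    for _ in range(n_iter):
        candidate = current.copy()
        flip = rng.integers(K)
        candidate[flip] = ~candidate[flip]                    # add or drop one regressor
        cand_lmd = log_marginal_density(y, X_full[:, candidate], g)
        # Equation (13): accept with probability min{1, p(y | M') / p(y | M)}
        if np.log(rng.uniform()) < cand_lmd - current_lmd:
            current, current_lmd = candidate, cand_lmd
        visits.append(current.copy())
    return np.array(visits)                                   # one visited model per row

visits = mc3(y, X_full, g=benchmark_g(N=len(y), K=X_full.shape[1]))
```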
After a sufficient number of iterations, we get an equilibrium distribution of the posterior model probabilities $P(M_j \mid y)$, and the posterior mean and variance are calculated across the whole model space. Using Monte Carlo simulation, we can also derive additional posterior characteristics that are useful for the Bayesian averaging approach. One of them is the posterior inclusion probability $P(x_i \mid y)$, i.e., the probability that, conditional on the data, but unconditional with respect to the model space, the independent variable $x_i$ is relevant for explaining the dependent variable $y$; it is obtained as the sum of the posterior probabilities of all models that contain $x_i$. The value of the posterior inclusion probability indicates the importance of an independent variable in the regression model. Another useful posterior characteristic is the jointness measure defined by Ley and Steel [21], which is the posterior odds ratio of the models including both $x_i$ and $x_l$ versus the models that include them only individually. It has the following form:

$$ J_{il} = \frac{P(x_i \wedge x_l \mid y)}{P(x_i \mid y) + P(x_l \mid y) - 2\, P(x_i \wedge x_l \mid y)}, \qquad (14) $$
where $P(x_i \wedge x_l \mid y)$ denotes the sum of the posterior probabilities of those models that contain both variables $x_i$ and $x_l$. Using the jointness measure, we can identify three types of variables in the regression model: independent, substitute, and complementary. Using the interpretation of the posterior odds ratio, we can classify the strength of jointness, namely, strong substitutes ($J_{il} < 1/100$), significant substitutes ($1/100 \leq J_{il} < 1/10$), not significantly related ($1/10 \leq J_{il} \leq 10$), significant complements ($10 < J_{il} \leq 100$), and strong complements ($J_{il} > 100$) (Doppelhofer and Weeks [22], Madigan and Raftery [67]).