4.1. Multiple Logistic Regression Model
The objective of binary logistic regression is to find a parsimonious model that provides a good fit to the data. What distinguishes the logistic regression model from the linear model is that the dependent variable, called the response variable, is dichotomous. In the case under study, the dependent variable business digitalization (BD) takes only two values: one if the answer constitutes a “success”, that is, if the digitalization of the business is considered important, very important, or extremely important; and zero if the answer constitutes a “failure”, that is, if the digitalization of the business is considered not important or not very important. The logistic model [71], for the case in which there are p independent variables, is as follows (1):
$$\hat{\pi} = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}$$
where $\hat{\pi}$ is the vector of estimated probabilities (P(Y = 1)) and β is the vector of logistic regression coefficients. To linearize this function, the logit function, Logit($\hat{\pi}$) (link function) (2), is used.
$$\operatorname{Logit}(\hat{\pi}) = \ln\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$$
The values of the regression coefficients, β, can be challenging to interpret directly. Therefore, it is common practice to examine the exponential of these coefficients, known as odds ratios, for interpretation purposes (3):
$$\mathrm{OR}_i = \operatorname{Exp}(\beta_i) = e^{\beta_i}$$
The odds ratio estimates the ratio of the odds of “success” versus “failure” per unit change in the independent variable i. An Exp(β) value greater than one (β > 0) indicates an increase in the odds, whereas an Exp(β) value less than one (β < 0) indicates a decrease in the odds, when the independent variable passes from the reference class (in our case, the first class) to the class under test (in our study, the independent variables are all qualitative).
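The relationship between the linear predictor, the probability, and the odds ratio can be illustrated with a minimal Python sketch. The function names are illustrative assumptions and are not part of the original analysis, which was carried out in IBM SPSS; the two odds ratios used at the end (0.131 and 1.360) are values reported later in this section.

```python
import math


def logistic(eta: float) -> float:
    """Map a linear predictor eta = b0 + b1*x1 + ... + bp*xp to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1 (inverse of logistic)."""
    return math.log(p / (1.0 - p))


def odds_ratio(beta: float) -> float:
    """Exp(beta): multiplicative change in the odds relative to the reference class."""
    return math.exp(beta)


def percent_change_in_odds(or_value: float) -> float:
    """(Exp(beta) - 1) x 100%: percentage change in the odds."""
    return (or_value - 1.0) * 100.0


# Odds ratios quoted in the interpretation of the final model.
change_ner = percent_change_in_odds(0.131)   # decrease in the odds
change_ser = percent_change_in_odds(1.360)   # increase in the odds
```

A coefficient of zero gives an odds ratio of exactly one, which is why Exp(β) ≅ 1 is read as "no effect" relative to the reference class.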
In this work, logistic regression was fitted first with the “Enter” method (in which all independent variables are included), followed by the Forward-LR method (a stepwise selection method based on the likelihood ratio). The significance level for adding an independent variable to the model was α = 0.10 (the variable enters the model if its p-value in the addition test is less than or equal to 0.10). The significance level for removal was 0.15 (the variable is removed if its p-value in the removal test is greater than 0.15).
To assess the significance of the complete model, we used the likelihood ratio test (omnibus tests of model coefficients, Table 2), noting that the greatest significance occurs for the 11th step model (G² = 199.206, p-value < 0.001). This result allows us to conclude that at least one of the independent variables of the complete model has predictive power (significant influence) for the dependent variable (BD).
Next, to assess the significance of the independent variables (indicated in Table 1) on the probability of a company considering business digitalization (BD) to be at least important, the Wald test was used (test for the significance of the model coefficients). This test is constructed based on the null hypothesis (H0) that a coefficient (βi, i = 1, …, p) associated with a particular variable is null, that is, that this variable is not significant, against the alternative hypothesis (H1) that it is non-zero.
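As a hedged illustration of the Wald test for a single coefficient, the sketch below computes the statistic W = (β / SE)² and its p-value from the chi-square distribution with one degree of freedom. The coefficient and standard error are hypothetical values chosen for the example, not estimates from this study, and the function name is an assumption.

```python
import math


def wald_test(beta: float, std_error: float) -> tuple:
    """Return (W, p_value) for H0: beta = 0 against H1: beta != 0.

    W = (beta / std_error)^2 follows, under H0, a chi-square distribution
    with 1 degree of freedom, whose upper tail is erfc(sqrt(W / 2)).
    """
    z = beta / std_error
    w = z * z
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# Hypothetical coefficient and standard error (not taken from the paper).
w_stat, p_val = wald_test(beta=0.5, std_error=0.25)
is_significant_at_5_percent = p_val < 0.05
```

For coded qualitative variables with several classes, statistical packages also report an overall Wald test with more than one degree of freedom; the sketch covers only the single-coefficient case.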
In the first application of the test to the complete model, a non-significant p-value = 0.301 was obtained for the variable “Information on the macroeconomic and fiscal framework” for the usual levels of significance, so it was decided to remove this variable from the model and refit the model with only the remaining significant independent variables. After re-estimating the model and re-evaluating the significance of the variables, two more variables emerged, “Distribution channels” and “Leadership confused about what to do”, with high p-values (p-value = 0.115 and p-value = 0.376, respectively), so they were also removed from the model, and the model was refitted again. This resulted in a simplified model consisting of only eight significant variables (overall p-values all below 5%, as will be shown later).
To apply the logistic regression model, it was necessary, since the independent variables are qualitative, to choose the reference class that is left out of the model for each of them. For example, for the variable “Need to explore new resources (NER)”, the reference class is the “not important” class, with “not very important” being class 1, “important” class 2, “very important” class 3, and “extremely important” class 4. The reference class is always the first class for all qualitative variables in the final model. This information is essential for interpreting the odds ratio (Exp(βi)).
In summary, the independent variables included in the final model to assess the probability of a company considering business digitalization (BD) to be at least important were SER, CR, NER, ANT, ADM, C, R, and IB. These variables had a statistically significant effect on the probability of business digitalization (BD) being at least important.
To assess the significance of the adjusted model, we again used the likelihood ratio test, obtaining the test statistic G² = 181.401 with a p-value < 0.001, which allows us to conclude that at least one independent variable in the model has predictive power over the dependent variable (BD). To evaluate the quality of the model fit, we used the −2LL statistic (−2 log-likelihood): the p-value corresponding to −2LL, χ²(277) = 248.35 (Table 3), is 0.89. Given this value, the hypothesis H0 cannot be rejected: the model fits the data. The table also presents the pseudo-R² values of Cox and Snell (R² = 0.443) and Nagelkerke (R² = 0.591). These values reveal a model with adequate quality.
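The two pseudo-R² measures can be related to the likelihood ratio statistic and −2LL with the standard Cox and Snell and Nagelkerke formulas. The sketch below is an illustration under one stated assumption: the sample size n = 310 is not given in this paragraph and is taken here as the total number of companies in the classification table (216 + 94). Function names are illustrative.

```python
import math


def cox_snell_r2(g2: float, n: int) -> float:
    """Cox and Snell pseudo-R2: 1 - (L0 / L1)^(2/n) = 1 - exp(-G2 / n)."""
    return 1.0 - math.exp(-g2 / n)


def nagelkerke_r2(g2: float, neg2ll_model: float, n: int) -> float:
    """Nagelkerke pseudo-R2: Cox and Snell R2 divided by its maximum value.

    The maximum is 1 - L0^(2/n), where -2LL of the baseline model is
    recovered as -2LL(model) + G2.
    """
    neg2ll_null = neg2ll_model + g2
    max_r2 = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell_r2(g2, n) / max_r2


G2 = 181.401        # likelihood ratio statistic reported in the text
NEG2LL = 248.35     # -2LL of the adjusted model reported in the text
N = 310             # ASSUMPTION: 216 + 94 companies from the classification table

r2_cs = cox_snell_r2(G2, N)
r2_nk = nagelkerke_r2(G2, NEG2LL, N)
```

Under that assumption, both functions round to the values quoted in the text (0.443 and 0.591), which suggests the reported figures are internally consistent.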
Table 4 presents the Hosmer–Lemeshow fit test. Given that χ² = 9.210 and p-value = 0.325, we can conclude that the values estimated by the model are close to the observed values; that is, the model fits the data.
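The p-value of the Hosmer–Lemeshow statistic can be checked with the closed-form upper tail of the chi-square distribution, which exists when the degrees of freedom are even. The sketch assumes 8 degrees of freedom (the usual 10 risk groups minus 2); the text does not state the degrees of freedom, so this is an assumption, as is the function name.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square with an even df.

    For df = 2m the tail is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0      # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k        # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


HL_STATISTIC = 9.210   # reported in the text
HL_DF = 8              # ASSUMPTION: 10 groups - 2

hl_p_value = chi2_sf_even_df(HL_STATISTIC, HL_DF)
fits_at_5_percent = hl_p_value > 0.05   # H0 (good fit) is not rejected
```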
Next, to assess whether the model classifies companies well in terms of the importance they attribute to the digitalization of business in the internationalization process, we turn to Table 5, which provides the classification of the responses observed and predicted by the adjusted model.
The model’s sensitivity is 198/216 = 0.917; that is, the model correctly classifies 91.7% of companies that consider business digitalization (BD) to be at least important (successes). The model’s specificity is 57/94 = 0.606; that is, the model correctly classifies 60.6% of companies that do not consider business digitalization important (failure). This model correctly classifies 82.3% of cases (of companies). Given these specificity and sensitivity measures, the model has acceptable predictive capabilities.
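The sensitivity, specificity, and overall accuracy quoted above follow directly from the four cells of the classification table. In the sketch below, the false negatives (216 − 198 = 18) and false positives (94 − 57 = 37) are derived from the counts given in the text; the function name is illustrative.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, and accuracy from a 2x2 classification table."""
    return {
        "sensitivity": tp / (tp + fn),            # successes correctly classified
        "specificity": tn / (tn + fp),            # failures correctly classified
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts from the text: 198 of 216 successes and 57 of 94 failures are correct.
metrics = classification_metrics(tp=198, fn=216 - 198, tn=57, fp=94 - 57)
```

The gap between sensitivity and specificity shows that the model is better at identifying companies that value digitalization than those that do not.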
At the same time, the ROC curve was constructed (Figure 3) and the respective area under the curve (AUC) was calculated, given that this is another measure widely used to evaluate the model’s ability to discriminate between “companies that consider that the digitalization of business (BD) is at least important” and “companies that do not consider it important”.
Table 6 gives the area under the ROC curve (AUC = 0.878), which is significantly higher than 0.5 (p-value < 0.001), indicating that the adjusted model has an excellent discriminating capacity.
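The AUC can be read as the probability that a randomly chosen "success" receives a higher predicted probability than a randomly chosen "failure". The sketch below computes it by direct pairwise comparison; the predicted probabilities are toy values invented for the illustration, since the individual predictions of the study are not available here.

```python
def auc_from_scores(pos_scores: list, neg_scores: list) -> float:
    """AUC as the share of (success, failure) pairs ranked correctly.

    Ties between a success and a failure count as half a correct pair.
    """
    concordant = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos_scores) * len(neg_scores))


# HYPOTHETICAL predicted probabilities, for illustration only.
toy_successes = [0.9, 0.8, 0.4]
toy_failures = [0.7, 0.3, 0.2]
toy_auc = auc_from_scores(toy_successes, toy_failures)
```

A value of 0.5 corresponds to random ranking, which is why the reported AUC is tested against 0.5.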
Finally, we analyze the residuals and diagnose influential cases.
The standardized residuals graph (Figure 4) was used to identify outliers. We identified some potential outlier observations (|r| > 2). However, they were kept in the final model because their removal did not improve the significance or the quality of fit of the logistic model.
Regarding the diagnosis of influential cases (observations that influence the adjustment), a graphical representation (Figure 5) was used, which indicates both the influence of observations on the quality of the model and on the estimates of the model coefficients [69].
Only two cases influence the quality of the model (DX2 ≥ 4). However, these cases present a Cook’s distance smaller than 1, meaning that none of the observations significantly influences the model coefficients (so they were not eliminated).
Table 7 summarizes information about the independent variables in the entire model. Since the variables are qualitative, the numbers in parentheses indicate the classes (codes) that participate in the model.
Thus, the final model that allows estimating the probability of a company considering business digitalization (BD) to be at least important is then (according to Table 7) (4):
That is (5),
which is equivalent to (6)
According to this model, we can state the following:
The odds of considering business digitalization at least important are approximately 0.179 times as high in companies that classify the need to explore new resources (NER) as not very important, and 0.131 times as high in those that classify it as extremely important, as in those that classify it as not important. In the latter case, the odds change by (0.131 − 1) × 100% = −86.9% when NER goes from the not important classification (reference class) to the extremely important classification.
The odds of considering business digitalization at least important are approximately 5.450 times as high in companies that rate allowing access to new technologies or resources (ANT) as not very important, 9.758 times as high in companies that rate it as important, and around 27.988 times as high in companies that rate it as extremely important, as in those that classify it as not important. The odds of classifying business digitalization as at least important increase as the importance of allowing access to new technologies or resources increases.
The odds of considering business digitalization at least important are approximately 1.475 times as high in companies that rate a strong entrepreneurial and risk-taking propensity of the main employees (SER) as not very important, 1.371 times as high in companies that rate it as important, and 1.360 times as high in companies that rate it as extremely important, as in those that classify it as not important (even though these effects are not statistically significant). The odds of classifying business digitalization as at least important increase by approximately (1.360 − 1) × 100% = 36% when strong entrepreneurial and risk-taking propensity goes from not important to extremely important.
The odds of considering business digitalization at least important are approximately 1.602 times as high in companies that rate autonomy in decision-making (ADM) as important, and 2.209 times as high in companies that rate it as very important, as in those that classify it as not important. The odds of classifying business digitalization as at least important increase by approximately (1.651 − 1) × 100% = 65.1% when autonomy in decision-making goes from not important to extremely important.
The odds of considering business digitalization at least important are approximately 1.334 times as high in companies that rate the counselling partnership (C) as important, 5.359 times as high in companies that rate it as very important, and 7.911 times as high in companies that rate it as extremely important, as in those that classify it as not important. The odds of classifying business digitalization as at least important increase as the importance of the counselling partnership increases.
The odds of considering business digitalization at least important are approximately 6.035 and 6.108 times as high in companies that classify the credibility partnership (CR) as not very important and important, respectively, as in those that classify it as not important. However, when this partnership is considered very important or extremely important, the odds are only about 1.731 and 1.154 times as high, respectively.
The odds of considering business digitalization at least important are essentially unaffected when managers’ resistance (R) is reported as occurring almost always or sometimes, because Exp(β) ≅ 1 (0.983 and 0.920, respectively). The odds are approximately 4.492 times as high in companies that report that managers never resist as in companies that classify resistance as always. When there is no resistance from managers, the importance attributed to business digitalization is greater than when there is always resistance.
The odds of considering business digitalization at least important are approximately 0.075 times as high in companies that almost always perceive their budget as inadequate (IB), 0.040 times as high in companies that sometimes perceive it as inadequate, 0.079 times as high in companies that rarely perceive it as inadequate, and 0.035 times as high in companies that never perceive it as inadequate, all compared to those that always perceive the budget as inadequate.
The importance that companies attach to business digitalization regarding internationalization decreases as budgets are perceived as inadequate less often. Digitalization is most important for companies that consider their budgets inadequate.
4.2. Multiple Linear Regression Model
Multiple linear regression with the stepwise variable selection method (with criteria where significance level α = 0.10 for the entry value and α = 0.15 for the removal value) was used to obtain a parsimonious model that allows for the prediction of the degree of the importance of the “digitization of business” (Y) in affecting the internationalization of the company depending on the independent variables (IPP, ANT, ADM, C, R, and IB).
We began by analyzing whether the model’s applicability assumptions (the normal distribution of errors, homogeneity, and independence of errors) were verified. The first assumption was validated graphically (Figure 6) together with the Kolmogorov–Smirnov test (Table 8).
In this graph, the abscissa axis shows the cumulative observed probability of the errors and the ordinate axis shows the cumulative probability that would be observed if the errors had a normal distribution. Since the points lie mostly along the main diagonal, we conclude that the errors are normally distributed. This assumption is also supported by the Kolmogorov–Smirnov test (p-value = 0.200).
The second assumption (the homogeneity of errors) was also validated graphically. Finally, the third assumption (the independence of errors) was validated using the Durbin–Watson test. Given that IBM SPSS does not produce the p-value associated with the Durbin–Watson test statistic, we use an empirical decision rule: do not reject H0 (there is no autocorrelation between the residuals) if d_obs ≈ 2 ± 0.2. As d_obs = 1.867 is close to 2 (Table 9), H0 is not rejected; the residuals are independent.
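The Durbin–Watson statistic and the empirical rule used above (a value within 2 ± 0.2) can be sketched as follows. The residual series is a toy example with strong negative autocorrelation, invented for the illustration; only the value 1.867 comes from the text, and the function names are assumptions.

```python
def durbin_watson(residuals: list) -> float:
    """d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, computed on ordered residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def looks_independent(d: float, tolerance: float = 0.2) -> bool:
    """Empirical rule from the text: do not reject H0 if d is within 2 +/- tolerance."""
    return abs(d - 2.0) <= tolerance


# HYPOTHETICAL alternating residuals (negative autocorrelation pushes d above 2).
toy_d = durbin_watson([1.0, -1.0, 1.0, -1.0])

# Value reported in the text.
reported_ok = looks_independent(1.867)
```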
Next, to diagnose the possible existence of multicollinearity (association between independent variables), the ratio k, designated as the condition index (Table 10), was used. As the values obtained for this ratio for each dimension (the number of model parameters) are all lower than 15 [69], we conclude there is no multicollinearity between the independent variables.
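The formula for the condition index is not reproduced in the text. The sketch below assumes the usual definition, in which the index of each dimension is the square root of the ratio between the largest eigenvalue and the eigenvalue of that dimension (eigenvalues of the scaled cross-product matrix of the predictors). The eigenvalues are hypothetical, and the threshold of 15 is the one cited in the text.

```python
import math


def condition_indices(eigenvalues: list) -> list:
    """Condition index per dimension, ASSUMING k_i = sqrt(lambda_max / lambda_i)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]


def has_multicollinearity(eigenvalues: list, threshold: float = 15.0) -> bool:
    """Flag multicollinearity when any condition index reaches the threshold."""
    return any(k >= threshold for k in condition_indices(eigenvalues))


# HYPOTHETICAL eigenvalues, for illustration only.
toy_eigenvalues = [4.0, 1.0, 0.25]
toy_indices = condition_indices(toy_eigenvalues)
toy_flag = has_multicollinearity(toy_eigenvalues)
```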
To assess the existence of influential observations in the sense that there are observations that affect the values of the estimated parameters, the effects of leverage and residuals are graphically represented. As we can see in Figure 7, there are no outliers because no centered leverage value is close to 0.5.
Having verified all of the applicability assumptions of the model and given that there was no association between the independent variables, the application of the multiple linear regression model made it possible to identify the variables IPP (β = 0.247, t = 3.367, p-value < 0.001), ANT (β = 0.203, t = 4.031, p-value < 0.001), ADM (β = 0.253, t = 4.020, p-value < 0.001), C (β = 0.193, t = 2.813, p-value = 0.005), R (β = 0.191, t = 3.225, p-value = 0.001), and IB (β = −0.108, t = −2.012, p-value = 0.045) as significant predictors of the dependent variable Y (business digitalization). A type I error probability of α = 0.05 was considered for all analyses.
As we can see in Table 12, the final adjusted model is highly significant (F = 426.160, p-value < 0.001) and explains a high proportion of the variability in variable Y: with a coefficient of determination of about 0.9, we can state that 90% of the total variability in Y is explained by the independent variables present in the adjusted linear regression model (see Table 9 and Table 11).
The final fitted model (Table 11) that allows for the estimation of the “Importance of business digitalization (Y)” for the internationalization of a company is then as follows:
As all independent variables are expressed in the same units, regression coefficients can be used to assess the importance of each independent variable in the model (note that all regression coefficients are significant).
The higher the IPP, ANT, ADM, and C, the greater the importance of business digitalization. The lower the resistance from managers, the greater the degree of importance attributed to the digitalization of business (inverted Likert scale as explained above). Finally, with less effect on the prediction of Y, we can state that the more inadequate the available budgets are, the less importance is attributed to the digitalization of business.
The multiple linear regression model, widely used in practice and straightforward to interpret, made it possible to identify the factors (independent variables) most valued by entrepreneurs (survey respondents) for internationalization, based on the importance they attribute to the digitalization of the business (Y). The significant variables in both models (multiple logistic and multiple linear regression) are nearly identical, which supports the use of the linear model, the most common in practice, and lends confidence to the results.