3.1. Defining the Sigma Convergence of the Digital Processes
Given that data are a new economic resource of the 21st century and that digital development is an engine of economic development, it is reasonable to test for sigma convergence in the number of people using Internet services worldwide (measured in billions of persons). The data are taken from the study results of the International Telecommunication Union [43]. The study period covers 20 years, from 2000 to 2020.
The countries under study are Albania, Austria, Bahrain, Belarus, Belgium, Bolivia, Bosnia and Herzegovina, Bulgaria, Cambodia, Chad, China, Costa Rica, Croatia, Cyprus, Czech Republic, Denmark, Egypt, Estonia, Finland, Georgia, Germany, Greece, Hong Kong, China, Hungary, Indonesia, Iran, Ireland, Kazakhstan, Korea, Kuwait, Latvia, Lithuania, Luxembourg, Malaysia, Malta, Mauritius, Mexico, Mongolia, Montenegro, Morocco, Netherlands, North Macedonia, Norway, Oman, Paraguay, Peru, Poland, Portugal, Qatar, Romania, Russian Federation, Saudi Arabia, Serbia, Seychelles, Singapore, Slovakia, Slovenia, Spain, Sweden, Taiwan, Thailand, Turkey, Ukraine, United Arab Emirates, Great Britain, and Vietnam. The sample covers countries with both high and low levels of economic development.
Inequality indicators such as the Herfindahl–Hirschman index, the Theil index, and the Gini index are most often used to test the hypothesis of sigma convergence (or sigma divergence) in economic growth. Here, however, we propose using the coefficient of variation, which does not depend on the input sample size, and transferring the logic of sigma convergence to a digitization indicator, the number of Internet service consumers. Based on the coefficient of variation (Figure 4), we can conclude that there is $\sigma$-convergence if this indicator falls over time. Formula (4) is used to calculate the coefficient of variation:
$$CV = \frac{\sigma}{\bar{x}}, \qquad \sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \tag{4}$$
where $\sigma$ is the standard deviation of the sample of 66 countries, $\bar{x}$ is the mean, $n$ is the number of data points, and $x_i$ is the number of Internet users in country $i$.
Formula (4) uses the sample variance, calculated for the sample of 66 countries.
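The sigma-convergence check based on Formula (4) can be sketched in a few lines of Python. This is only an illustration: the user counts below are hypothetical, not ITU data, and the function name `coefficient_of_variation` is introduced here for the example.

```python
import math

def coefficient_of_variation(values):
    """Sample CV: standard deviation (n - 1 denominator) divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance) / mean

# Hypothetical Internet-user counts (millions) for four countries in two years.
users_by_year = {
    2000: [1.0, 4.0, 10.0, 25.0],
    2010: [6.0, 9.0, 14.0, 21.0],
}

cv_by_year = {year: coefficient_of_variation(v) for year, v in users_by_year.items()}

# Sigma convergence is indicated when the CV falls over time.
sigma_convergence = cv_by_year[2010] < cv_by_year[2000]
print(cv_by_year, sigma_convergence)
```

With real data, the same function would be applied to the 66-country sample for each year from 2000 to 2020 and the resulting series inspected for a downward trend.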
The vertical and horizontal segments around the values of the coefficient of variation (Figure 4) show the allowable limits of error. The decline in the coefficient of variation indicates strong convergence among the studied countries in the degree of Internet use by individuals in 2009–2010; for the study sample, the coefficient of variation is lowest in these years. From 2011 to 2020, the CV gradually increased, but the degree of sigma convergence in the number of people using the Internet remains relatively high for the studied countries. The increase in variation relates to the peculiarities of organizing Internet communication and the financial capabilities of citizens of the studied countries. Thus, if we compare the digital development infrastructure and the features of network access across countries [39], the dynamics for some of them differ significantly. A comparative description of indicators of infrastructure, access, opportunities, and barriers for Poland, Ukraine, Germany, and Cyprus is given in Table 2.
A further detailed analysis is needed to establish which factors affect a country’s digital development and to what extent, to identify risk factors for the use of financial institutions for money laundering, and to assess how well cybersecurity and anti-fraud measures are organized in countries.
3.2. Multiple Regression Model Development
The input indicators for the regression model describing the digitization level are taken for 2021 and cover 104 countries: digital development level (DDL) [40], National Cyber Security Index (NCSI) [40], ease of getting electricity (TINY) [41], ease of doing business (SEES) [41], and Basel AML Index [42]. These indicators are already aggregated according to the methodologies of the institutions that officially determine and publish statistical reports on them. DDL and NCSI are determined according to the methodology of the e-Governance Academy (EGA) [40], a non-profit consulting organization founded in 2002 that develops a knowledge base of best practices in e-government.
DDL values are calculated as the arithmetic mean of the ICT Development Index (IDI), determined by the International Telecommunication Union, and the Networked Readiness Index (NRI) [44] (an indicator that characterizes the development of information technology and the network economy in the world):
$$DDL = \frac{IDI + NRI}{2} \tag{5}$$
The generalized value of the NCSI is formed from the scores of 46 indicators, grouped into 12 factors across three categories (Table 3, Table 4 and Table 5). An example of the distribution by factors for Ukraine is given in Figure 5.
Thus, indicators that determine the factor of cybersecurity policy development are given by Formula (6)
For example, as of 6 September 2021, the analytical reports of the e-Governance Academy Foundation [45] give the following indicators for Ukraine: population, 42.7 million; area (km²), 603,700; GDP per capita (USD), 8700; National Cyber Security Index, 24th; Global Cybersecurity Index, 78th; ICT Development Index, 79th; Network Readiness Index, 53rd.
The TINY indicator is derived from the following indicators: procedures (number), time (days), cost (% of income per capita), and the reliability of supply and transparency of tariff index (0–8) [41].
The SEES indicator is likewise aggregated according to the World Bank’s Doing Business methodology and is formed by nine categories measured on a 100-point scale (0 is the worst value of the categorical indicator, 100 is the best), namely: ease of starting a business, ease of dealing with construction permits, ease of registering property, ease of getting credit, ease of protecting minority investors, ease of paying taxes, ease of trading across borders, ease of enforcing contracts, and ease of resolving insolvency. The category ease of starting a business comprises the indicators procedures for men (number), time for men (days), cost for men (% of income per capita), procedures for women (number), time for women (days), cost for women (% of income per capita), and paid-in minimum capital (% of income per capita). The category ease of dealing with construction permits is based on procedures (number), time (days), cost (% of warehouse value), and the building quality control index (0–15). The next category, ease of registering property, is defined using procedures (number), time (days), cost (% of property value), and the quality of land administration index (0–30). The indicators credit information index, legal rights index, and sum getting credit determine the category ease of getting credit. The disclosure index (0–10), director liability index (0–10), shareholder suits index (0–10), shareholder rights index (0–6), ownership and control index (0–7), corporate transparency index (0–7), and strength of minority investors protection index (0–50) make up the category ease of protecting minority investors.
The score of the ease of paying taxes category is determined by the values of such indicators as payments (number), time (hours), total tax and contribution rate (% of profit), time to comply with VAT refund (hours), time to obtain VAT refund (weeks), time to comply with corporate income tax audit (hours), time to complete a corporate income tax audit (weeks), and postfiling index (0–100). The category ease of trading across borders is formed by indicators of time to export: border compliance (hours), time to export: documentary compliance (hours), cost to export: border compliance (USD), cost to export: documentary compliance (USD), time to import: border compliance (hours), time to import: documentary compliance (hours), cost to import: border compliance (USD), cost to import: documentary compliance (USD). The category ease of enforcing contracts is defined by the values of indicators time (days), cost (% of claim), and quality of judicial processes index (0–18). The category ease of resolving insolvency is defined by the recovery rate index (cents on the dollar) and strength of insolvency framework index (0–16).
As we can see, many of the indicators underlying the ease of doing business (SEES) indicator are financial inclusion indicators [46], i.e., they relate to access to financial services and financial literacy.
The Basel AML Index [42,47] is a comprehensive composite indicator defined by the Basel Institute on Governance to identify and assess the risks of countries’ financial institutions being used for money laundering and terrorist financing. The Basel AML Index is measured on a 10-point scale: 0 is the best value (minimum risk), indicating that risks of corruption and money laundering are absent; 10 is the worst value (maximum risk), indicating that the country is at high risk of money laundering. The index value is determined from the shares of five domains, which comprise 17 indicators, namely [47]: quality of the AML/CFT framework (anti-money laundering and countering the financing of terrorism) (65%), corruption and bribery risk (10%), financial transparency and standards (10%), public transparency and accountability (5%), and political and legal risk (10%).
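To make the domain shares concrete, the sketch below combines five domain scores into a single value on the 0 to 10 scale. It assumes that the shares act as weights in a simple weighted average; the full methodology of the index is not reproduced here, and the domain scores and the names `WEIGHTS` and `weighted_risk_score` are hypothetical.

```python
# Domain weights as listed in the text (shares of the overall score).
WEIGHTS = {
    "aml_cft_framework": 0.65,
    "corruption_and_bribery": 0.10,
    "financial_transparency": 0.10,
    "public_transparency": 0.05,
    "political_and_legal_risk": 0.10,
}

def weighted_risk_score(domain_scores):
    """Weighted average of domain scores, each on the 0 (best) to 10 (worst) scale."""
    return sum(WEIGHTS[name] * score for name, score in domain_scores.items())

# Hypothetical domain scores for one country (not real data).
example_scores = {
    "aml_cft_framework": 6.0,
    "corruption_and_bribery": 4.0,
    "financial_transparency": 5.0,
    "public_transparency": 2.0,
    "political_and_legal_risk": 3.0,
}

score = weighted_risk_score(example_scores)
print(round(score, 2))
```

Because the AML/CFT framework domain carries 65% of the weight, it dominates the resulting score in this illustration.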
Thus, this set of composite indicators allows a comprehensive analysis of how the factors of digitalization of social and economic transformation affect a state’s digital development.
The input indicators are already composite, and different methodologies were used to aggregate them, combining indices, relative and absolute values, and scores; moreover, they mix level values (DDL, NCSI) with indices (TINY, SEES, Basel AML Index). They therefore need to be normalized so that further calculations yield significant and adequate results. The final values also depend on the quality of normalization. Many researchers worldwide [48,49,50] suggest normalization that takes into account weights, stimulant indicators (an increase in which has a positive effect on the studied indicator), and destimulant indicators (an increase in which has a negative effect). The smallest value of a stimulant or destimulant indicator therefore need not correspond to its best value; this depends directly on the content and essence of the indicator. The following weighting coefficients of normalization functions can be used: (1) weights based on measures of the central tendency of the indicator (median, mode, mean) or measures of variability (variance, minimum and maximum values of the variable, range, skewness, and kurtosis); (2) weighted indicators; and (3) scales formed from expert opinions.
In the normalization formula, the result is the standardized value of indicator j for country i; its two parameters are the value of the indicator at which the transformation function is at least 0.95 and the value of the indicator at which the transformation function is 0.5 [51] (Table 6).
Before building a regression model in which digital development depends on the NCSI, TINY, SEES, and Basel AML Index, it is reasonable to determine the strength of the relationship between them. We propose using Spearman rank correlation coefficients, in which the ranks of the variables (not their numerical values) are used to assess the strength of the monotonic relationship between variables [52]:
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\left(n^2 - 1\right)}$$
where $n$ is the number of observations, $R(x_i)$ is the rank of observation $x_i$ in the series of the variable $x$, $R(y_i)$ is the rank of observation $y_i$ in the series of the variable $y$, and $d_i = R(x_i) - R(y_i)$.
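The rank-based calculation can be illustrated with a short Python sketch. It is not the Statgraphics procedure used in the study: the data are hypothetical, the helper names `ranks` and `spearman` are introduced here, and the closed-form expression with squared rank differences is exact only when there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation via squared rank differences (no ties)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical normalized indicator values for five countries.
ncsi = [0.2, 0.5, 0.9, 0.4, 0.7]
ddl = [3.1, 4.0, 9.5, 3.0, 5.2]

print(spearman(ncsi, ddl))
```

Because only ranks enter the calculation, the outlying value 9.5 influences the result no more than any other largest observation would.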
Practical calculations were performed in Statgraphics 19 using the Describe/Multiple Variable Analysis function. The results are presented in Table 7.
Table 7 shows Spearman rank correlations between each pair of variables. These correlation coefficients range between −1 and +1 and measure the strength of the association between the variables. In contrast to the more common Pearson correlations, the Spearman coefficients are computed from the ranks of the data values rather than from the values themselves. Consequently, they are less sensitive to outliers than the Pearson coefficients. In addition, the number of pairs of data values used to compute each coefficient is shown in parentheses. The third number in each cell of the table is a p-value, which tests the statistical significance of the estimated correlations; p-values below 0.05 indicate statistically significant non-zero correlations at the 95.0% confidence level. The following pairs of variables have p-values below 0.05: NCSI and DDL; NCSI and TINY; NCSI and SEES; NCSI and Basel AML Index; DDL and TINY; DDL and SEES; DDL and Basel AML Index; TINY and SEES; TINY and Basel AML Index; SEES and Basel AML Index.
The Basel AML Index is inversely related to all other indicators, which is logically justified by the essence of this indicator and its measurement scale. The weakest correlation is observed between the Basel AML Index and TINY (−0.3782), whereas the correlation of the Basel AML Index with DDL, the dependent variable of the regression equation, is moderate to high. The correlation between digital development and all other explanatory indicators is also relatively high, ranging from 0.6 to 0.8.
Next, we consider the regression model. We use the statistical package Statgraphics 19, namely the Multiple Regression dialog box with the Backward Stepwise Selection option, which checks for multicollinearity among the explanatory variables and, if any is found, proposes rejecting insignificant variables according to the Student and Fisher statistical tests. The calculations yield the following econometric regression model:
Since the p-value in the ANOVA table (Table 8) is less than 0.05, there is a statistically significant relationship between the variables at the 95.0% confidence level. In addition, the statistical significance of model (9) is confirmed by Student’s t-test, the p-values (Table 9), the R-squared statistic, and the Durbin–Watson test.
The R-squared statistic (the coefficient of determination) indicates that the model explains 80.084% of the variability of the dependent indicator, the digital development level. The adjusted R-squared statistic is 79.4865% and indicates the adequacy and statistical significance of the econometric multiple linear regression model (9). The coefficient of determination, which gives the fraction of the variance of the dependent variable explained by the regression model and is calculated as the ratio of the regression sum of squares (SSR) to the total sum of squares (SST), allows us to estimate how well the theoretical model agrees with real data even if the dependent variable is not normally distributed. Thus, the developed model (9) agrees very well with the initial data. The standard error of the estimate shows that the standard deviation of the residuals is 0.148. The mean absolute error (MAE) is 0.107 and characterizes the average absolute value of the residuals. The Durbin–Watson (DW) test checks the residuals for significant autocorrelation based on the order in which the observations are entered into the model. The calculated value of the Durbin–Watson statistic (2.372) lies in the range from 0.584 to 2.464, which corresponds to the zone of uncertainty. Further study of the autocorrelation of the residuals using the von Neumann test shows its absence [54].
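Two of the diagnostics above are easy to reproduce by hand, as the following sketch shows. The adjusted R-squared uses the standard degrees-of-freedom correction; the number of regressors (three) is an assumption made here because it reproduces the reported adjusted value for 104 observations. The residuals passed to `durbin_watson` are hypothetical, and both function names are introduced for this example.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """R-squared adjusted for degrees of freedom (model with an intercept)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)

def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Reported R-squared of 80.084% with 104 countries; 3 regressors is an assumption.
adj = adjusted_r_squared(0.80084, 104, 3)
print(round(adj * 100, 4))

# Hypothetical residuals that alternate in sign (negative autocorrelation).
dw = durbin_watson([0.1, -0.2, 0.15, -0.05])
print(round(dw, 3))
```

Values of the Durbin–Watson statistic near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.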
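The absence of multicollinearity between the independent variables of the econometric model (9) was verified using the variance inflation factor (VIF) test:
$$VIF = \frac{1}{1 - R^2} \tag{10}$$
where $R^2$ is the coefficient of determination obtained by regressing the given independent variable on the remaining independent variables.
Under a strict criterion, VIF should be below 3.0; under a moderate criterion, it should be below 5.0.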
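The VIF formula and the two thresholds can be expressed as a small helper. This is a sketch, not the Excel calculation from the study: the R-squared values are hypothetical, and the labels returned by `classify_vif` are wording introduced here.

```python
def vif(r_squared):
    """Variance inflation factor for an auxiliary-regression R-squared."""
    return 1.0 / (1.0 - r_squared)

def classify_vif(value):
    """Apply the strict (< 3.0) and moderate (< 5.0) thresholds from the text."""
    if value < 3.0:
        return "passes strict criterion"
    if value < 5.0:
        return "passes moderate criterion only"
    return "multicollinearity suspected"

# Hypothetical R-squared values from regressing each regressor on the others.
for r2 in (0.5, 0.75, 0.9):
    print(r2, round(vif(r2), 2), classify_vif(vif(r2)))
```

An auxiliary R-squared of 0.5 already doubles the variance of the corresponding coefficient estimate, which is why the thresholds are expressed on the VIF scale rather than on the R-squared scale.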
The VIF test was calculated in Excel (Table 10) and confirmed the absence of multicollinearity between the independent variables of model (9).
3.3. Development of Quantile Regression Models
In the third step, we conduct a quantile analysis by developing quantile regressions. In this way, we describe the impact of NCSI and SEES on DDL for countries with a high level of digital development (quantile of order 0.9) [54,55] and for countries with a low level of digital development (quantile of order 0.1) [56,57], to provide a comprehensive analysis of how digitization affects inclusive economic growth [58].
The proposed logic for developing quantile regressions for different values of quantiles is based on the following steps.
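Step 1. Determining the estimates of the regression coefficients for the quantile of order 0.5 using Formula (11) and nonlinear optimization by the gradient descent method:
$$\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^{\top}\beta\right) \tag{11}$$
where $y_i$ is the observed value of the dependent variable, $x_i$ is the vector of explanatory variables, $\beta$ is the vector of coefficients, and $\rho_\tau$ is the “check” loss function, a weight coefficient whose value is calculated by Formula (12):
$$\rho_\tau(a) = \begin{cases} \tau a, & a \ge 0, \\ (\tau - 1)\,a, & a < 0, \end{cases} \tag{12}$$
where $\tau$ is the value of the quantile and $a$ is the model error value.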
Step 2. Assessing the error of the model using the covariance matrix and kernel estimation of error density.
Step 3. Determining the standard error, Student’s criterion, and level of significance of the p-value based on the covariance matrix values, considering the kernel estimation of the model error density.
The loss function of simple linear regression is quadratic: we minimize the sum of squared deviations from the actual value of the response variable and thereby estimate the conditional mean, the center point of linear regression. Koenker has shown that minimizing absolute deviations instead yields an estimate of the conditional median. More generally, minimizing the so-called “check” loss function yields an estimate of the conditional quantile of order tau, where tau is any quantile from zero to one (zero being the lowest realization, one being the highest possible realization, and 0.5, the 50th percentile, being the median).
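The link between the “check” loss and quantiles can be seen in the simplest possible case, a model that consists of a constant only. The sketch below is an illustration under that simplifying assumption, not the Solver-based procedure of the study; the data and the names `check_loss`, `total_loss`, and `best_constant` are hypothetical.

```python
def check_loss(a, tau):
    """Check (pinball) loss for error a at quantile tau."""
    return tau * a if a >= 0 else (tau - 1) * a

def total_loss(y, forecast, tau):
    """Objective function: sum of check losses over all observations."""
    return sum(check_loss(yi - fi, tau) for yi, fi in zip(y, forecast))

def best_constant(y, tau):
    """Constant forecast, chosen among the observed values, with minimal loss."""
    return min(y, key=lambda c: total_loss(y, [c] * len(y), tau))

# Hypothetical sample with one extreme value.
y = [1, 2, 3, 4, 100]

print(best_constant(y, 0.5))  # the sample median
print(best_constant(y, 0.1))  # a low quantile
print(best_constant(y, 0.9))  # a high quantile
```

For tau = 0.5 the loss is half the absolute error, so positive and negative errors are penalized equally; for tau = 0.9 under-prediction costs nine times as much as over-prediction, which pushes the fitted value toward the upper part of the distribution.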
The quantile regression coefficients for quantile values of 0.5, 0.9, and 0.1 are computed in MS Excel with the Solver add-in. Before using Solver, the objective function (11) must be calculated directly in the worksheet [57].
A fragment of the implementation is presented in Appendix B, Table A3. The Forecast column (Appendix B, Table A3) is the sum of the products of the quantile regression coefficient estimates and the corresponding observed values of the explanatory variables. The error is calculated as the difference between the true value of the digitization level indicator and the predicted value. The values of the “Loss” column (Appendix B, Table A3) are calculated by Formula (12).
Using the Solver add-in with the gradient descent nonlinear optimization method [59], the conditional median regression equation is obtained:
Then, we proceed to step 2 and estimate the error of the model using the covariance matrix (14) and the kernel estimate of the distribution density ((15) and (16)) [60]. These formulas involve the kernel density estimate (KDE) of the errors, the bandwidth h, the sample size n, the kernel K (weight function), and the interquartile range (a robust measure of scatter calculated using percentiles).
Note that Student’s distribution is used as the kernel K (15). However, depending on the purpose of the study, other kernel functions (uniform, triangular, triweight, normal, etc.) can be used. The parameter h is a free smoothing parameter. It strongly affects the estimation result, so it is usually calculated with dedicated formulas; the smaller the bandwidth value, the better. An alternative for determining the bandwidth is Formula (17), which is based on the mean integrated squared error and the kernel density estimate.
The intermediate values, calculated using built-in MS Excel functions to find the covariance matrix and then determine the statistical significance of the conditional median Equation (13), are presented in Table 11.
The kernel function for the studied countries is computed using Formula (16) and the built-in MS Excel function T.DIST((B$11-H21)/B$10; B$9-3; 0).
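For readers working outside Excel, the kernel term can be sketched in Python. The density function mirrors T.DIST with the cumulative flag set to 0, and the degrees of freedom follow the spreadsheet expression (sample size minus 3). Averaging the kernel values and dividing by the bandwidth is the textbook form of a kernel density estimate and is an assumption here, since the exact formulas of the study are not reproduced; the errors, bandwidth, and function names are hypothetical.

```python
import math

def student_t_pdf(x, df):
    """Density of Student's t distribution (Excel: T.DIST(x, df, 0))."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def kde_at(point, errors, bandwidth, n_params=3):
    """Kernel density estimate at `point` with a Student-t kernel (assumed form)."""
    n = len(errors)
    df = n - n_params  # mirrors the spreadsheet's B$9-3
    total = sum(student_t_pdf((point - e) / bandwidth, df) for e in errors)
    return total / (n * bandwidth)

# Hypothetical model errors and bandwidth; density evaluated at the error quantile 0.
errors = [-0.2, -0.1, 0.0, 0.05, 0.15, 0.3]
density_at_zero = kde_at(0.0, errors, bandwidth=0.1)
print(round(density_at_zero, 4))
```

The estimated density at the error quantile then enters the covariance matrix, so a poorly chosen bandwidth propagates directly into the standard errors.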
An error quantile equal or close to zero confirms the correctness of the calculations that determine the NCSI and SEES estimates with Solver and gradient descent. Next, the covariance matrix (14) must be calculated using an array formula and built-in MS Excel functions. The covariance matrix has dimension 3 × 3, determined by the values of Constant, NCSI, and SEES for the 104 studied countries (range D21:F124). The formula to be entered in the MS Excel formula bar is as follows:
The keyboard shortcut Ctrl + Shift + Enter is used to obtain the resulting covariance matrix. The covariance matrix, which is used to estimate the errors of the model with KDE, is presented in Table 12.
The third step is to verify the significance of the quantile regression of order 0.5 (13). The test results are presented in Table 13.
The covariance matrix (Table 12) makes it possible to quickly determine the standard errors as the square roots of the elements of the main diagonal and the values of Student’s t-statistic (t-stat) as the ratios of the model coefficients (13) to the standard errors. The p-value is calculated using the T.DIST.2T function:
The results show that the p-value for the intercept (free term) exceeds the maximum allowable level of 5%, so the intercept estimate is not statistically significant.
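The significance check described above (standard errors from the diagonal of the covariance matrix, t-statistics as coefficient over standard error, two-sided p-values) can be sketched as follows. The coefficients and covariance matrix are hypothetical, not the values from Table 12 or Table 13; the two-sided p-value approximates what T.DIST.2T returns by numerically integrating the t density, and the degrees of freedom (104 observations minus 3 parameters) are an assumption.

```python
import math

def student_t_pdf(x, df):
    """Density of Student's t distribution."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate two-sided p-value via Simpson integration (steps must be even)."""
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = 2 * b / steps
    total = student_t_pdf(-b, df) + student_t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * student_t_pdf(-b + k * h, df)
    central_mass = total * h / 3  # probability of |T| <= |t|
    return max(0.0, 1 - central_mass)

def significance(coefficients, covariance, df):
    """Return (standard error, t-stat, p-value) for each coefficient."""
    rows = []
    for j, coef in enumerate(coefficients):
        se = math.sqrt(covariance[j][j])
        t_stat = coef / se
        rows.append((se, t_stat, two_sided_p_value(t_stat, df)))
    return rows

# Hypothetical estimates for Constant, NCSI, SEES and a hypothetical covariance matrix.
coefficients = [0.05, 0.40, 0.50]
covariance = [
    [0.0025, 0.0, 0.0],
    [0.0, 0.0016, 0.0],
    [0.0, 0.0, 0.0025],
]
rows = significance(coefficients, covariance, df=104 - 3)
for name, (se, t_stat, p) in zip(["Constant", "NCSI", "SEES"], rows):
    print(name, round(se, 4), round(t_stat, 2), p < 0.05)
```

In this hypothetical example the constant has a t-statistic of about 1 and is not significant at the 5% level, while the two slope coefficients are.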
The proposed methodology is then used to develop quantile regressions of orders 0.9 and 0.1. They characterize the groups of countries with high (quantile 0.9) and low (quantile 0.1) levels of digital development and show how the NCSI and SEES indicators affect the formation of digital development.
The general results of the study are presented in Table 14.