A Case Study on Hierarchical Linear Models Applied to the UN’s Sustainable Development Goals (SDGs): A Perspective Using the World and Brazil’s Data

Lemes, Murilo; Belfiore, Patrícia; Fávero, Luiz Paulo

doi:10.3390/su15108304

by Murilo Lemes ¹,*, Patrícia Belfiore ¹ and Luiz Paulo Fávero ²

¹ Engineering, Modeling and Applied Social Science Center, Federal University of ABC, São Bernardo do Campo 09606-045, SP, Brazil

² School of Economics, Business and Accounting, University of São Paulo, São Paulo 05508-900, SP, Brazil

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(10), 8304; https://doi.org/10.3390/su15108304

Submission received: 16 March 2023 / Revised: 29 April 2023 / Accepted: 9 May 2023 / Published: 19 May 2023


Abstract

This study analyzed the statistical relationship between the Sustainable Development Goals and their respective indicators for the UN’s 2030 Agenda through the implementation of a two-level linear hierarchical model (HLM2) using STATA/SE 16 statistical software. The objective of this model was to set priorities by indicating how much and where each country should invest to achieve these goals by the end of the decade. Implicitly, it was also checked whether the indicators listed by the UN are statistically capable of describing the expected output. After analyzing the results, SDGs 8, 9 and 7, in that order, were identified as priorities. The HLM2 also pointed out that economic growth is the most important variable among all those considered. Finally, it was concluded that a generic answer cannot address the complexities found worldwide, and therefore, it would be more appropriate to direct actions on a case-by-case basis.

Keywords:

hierarchical linear model; sustainable development goals; data-driven decision making; agenda for sustainable development; toward sustainable economic development

1. Introduction

Conceptually, “Sustainable Development” was introduced to the world by the United Nations (UN) in the 1980s, with the Brundtland report. However, it was only after the Rio-92 Conference, a turning point for climate issues, that it gained prominence and became recognized as the principal guideline for sustainability matters [1,2,3].

In 2000, during the Millennium Summit, the UN proposed the Millennium Development Goals (MDGs) for the period from 2000 to 2015. Composed of eight broad objectives, the MDGs marked the beginning of an era in which universal goals to achieve sustainable development became a reality. The MDGs also represent the first attempt to quantify the sustainability of development [4]. Later, in 2015, in New York, building on the success achieved with the MDGs, the UN presented a “2.0” version, the 2030 Agenda for Sustainable Development, built around the Sustainable Development Goals (SDGs). This ambitious agenda comprises 17 goals, associated with 169 targets and 95 indicators, which were approved by the 193 Member States gathered in the General Assembly [5].

According to [6,7], achieving sustainable development goes beyond meeting goals and deadlines, as it entails maintaining certain desired and necessary conditions for people, their communities and organizations, and the surrounding ecosystem over a long time horizon. However, even though the 2030 Agenda was established and considered a priority, progress towards it, whether due to government issues or external factors, has stalled worldwide, leaving it in a state of stagnation less than a decade before the deadline [5,8].

Thus, the objective of this work is to analyze the statistical relationship between the performance of the SDGs of the UN 2030 Agenda and their respective indicators through a linear hierarchical model. The SDGs were analyzed from the perspective of two groupings (country and year), and therefore, the model has two levels. A second objective is to analyze Brazil’s individual performance in this context, adding details specific to the national reality.

The motivation for this study lies in the importance of sustainable development for the prosperity of future generations beyond 2030, as well as in the need to direct international actions and mobilize the general public towards a common goal. Moreover, to date, no study has proposed a hierarchical model that takes temporal variations into account and helps guide each country’s progress towards the SDGs. This points to a gap in the literature, which this work aims to fill.

2. Challenges Related to the Sustainable Development Goals

The 2030 Agenda represents the UN’s effort to align ideals worldwide so that humanity progresses towards peace, diplomacy and international cooperation (Figure 1). However, the COVID-19 pandemic outbreak in 2020, the current war in Ukraine and other military conflicts, as well as the resurgent shadow of the nuclear threat between conflicting states, are some of the most serious examples of recent humanitarian tragedies that divert attention from long-term issues, such as progress on the SDGs, to short-term ones [3].

This deviation can be quantified through the targets and indicators linked to each SDG, which are published annually in the UN’s Sustainable Development Report (SDR). The number of indicators included grows every year, but the current 95 represent only two-fifths ($\frac{2}{5}$) of what was initially envisioned, which can lead to inaccurate decision making due to incomplete information.

Among the SDGs, two seem to be more severely affected: SDG 1 (eradicating poverty) and SDG 8 (decent work and economic growth), which remain below their prepandemic levels in many low-income countries. Sustained and inclusive economic growth can drive progress, create decent jobs for all and improve living standards; however, this is the biggest setback ever observed, especially considering that in the initial assessment period before the pandemic (2015–2019), the world was progressing in the indicators at a rate of 0.33 points/year, which was already insufficient to achieve the 2030 goals [3,9].

Despite these difficult times, the SDGs must remain the main road map towards 2030, as ecological concerns reach more and more spaces in society [10,11]. However, for the second year in a row, the world has not made progress towards them. In fact, the average index declined slightly in 2021, due to the slow, or even non-existent, recovery of poor and vulnerable countries [3,11].

Details about each SDG and its indicators can be found in Appendix A.

Brazil’s Performance against the Goals

Brazil currently ranks 53rd out of the 165 countries analyzed, with an overall average score of 72.80. This is 13.71 points below first place, Finland (86.51), and 33.75 points above last place, South Sudan (39.05). In addition, Brazil’s score is 3.3 points above the average for Latin America and the Caribbean (69.5), its own region [12,13].

It is important to note that, although the Agenda was approved by all 193 UN Member States in 2015, the ranking currently comprises only 165 countries. The remaining 28 countries are listed, but no data are available for them; the reasons for this are beyond the scope of this study.

Individually, Brazil still faces major challenges in SDG 10 (reduced inequalities) compared with the other goals, especially in remote states, which is explained by the historical class disparity and income concentration in Brazilian society [14,15]. On the other hand, there are four SDGs in which Brazil stands out positively: SDG 1 (eradication of poverty), SDG 4 (quality education), SDG 7 (affordable and clean energy) and SDG 13 (combating climate change) [13,16].

Chronologically, Brazil regressed in the overall average during 2016 (−1.43% y/y, or −1.03 points in absolute terms), mostly because of internal political issues, lack of control over public accounts and the economic recession in the country. It is also observed that it took three years to recover from the 2016 setback; not even the COVID-19 pandemic was as harmful to Brazil’s SDG performance [17].

In summary, Brazil’s performance is average, leaning positive, with ample room for improvement, since its performance on some goals is considerably worse than on others. Chronologically, the country is progressing (Figure 2), but it is still taking short steps that will not allow it to reach the goals by 2030 [18]. More is required.

3. Nested Data Structures

Hierarchical models make it possible to investigate the behavior of a variable Y, which represents the phenomenon of interest, as a function of explicative variables whose variation may occur between instances and between the groups to which such instances belong, in the case of grouped data, or over time, in the case of data with repeated measures. In short, there must be variables whose values change between individuals at a given level but remain constant within certain groups of individuals, and these groups represent a higher level [19,20].

The absence of mutable explicative variables characterizes the existence of fixed effects components in a model. In many applications, a fixed effects model refers to a model in which the means of each group are fixed (not random), as opposed to a random effects model, in which the group means are a random sample from a population [21,22].

Now, imagine a database with data from n individuals, where each individual $i = 1, \dots, n$ belongs to one of $j = 1, \dots, J$ groups, where, obviously, $J < n$. In this way, the data structure can present certain explicative variables $X_{1}, \dots, X_{Q}$ referring to each individual i, and other explicative variables $W_{1}, \dots, W_{S}$ referring to each group j, which are invariant for the individuals of a given group. Table 1 presents the general layout of a database with a nested structure of data grouped into two levels (individual and group); the layout for higher levels follows by analogy [23].
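To make the Level 1 / Level 2 distinction concrete, the Python sketch below checks, on a small hypothetical long-format table (one row per individual), that every Level 2 variable W takes a single value within each group, which is the property Table 1 describes. The column names `group`, `X1` and `W1`, the helper name and the data are illustrative assumptions, not the study's data.

```python
from collections import defaultdict


def level2_is_invariant(rows, group_key, w_keys):
    """Return True if every variable in w_keys takes a single value
    within each group identified by group_key (hypothetical helper)."""
    seen = defaultdict(set)  # (group, variable) -> set of observed values
    for row in rows:
        for w in w_keys:
            seen[(row[group_key], w)].add(row[w])
    return all(len(values) == 1 for values in seen.values())


# Hypothetical data: X1 varies between individuals, W1 only between groups.
rows = [
    {"group": 1, "X1": 2.0, "W1": 10},
    {"group": 1, "X1": 3.5, "W1": 10},
    {"group": 2, "X1": 1.2, "W1": 7},
    {"group": 2, "X1": 4.8, "W1": 7},
]
print(level2_is_invariant(rows, "group", ["W1"]))  # True: W1 behaves as a Level 2 variable
print(level2_is_invariant(rows, "group", ["X1"]))  # False: X1 is a Level 1 variable
```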

Based on Table 1, it can be seen that $X_{1}, \dots, X_{Q}$ are Level 1 variables (data change between individuals) and $W_{1}, \dots, W_{S}$ are Level 2 variables (data change between groups, but not between individuals within each group). Furthermore, the numbers of observations in groups $1, 2, \dots, J$ are equal, respectively, to $n_{1}, n_{2} - n_{1}, \dots, n - n_{J-1}$. It is also possible to verify the nesting of Level 1 units (individuals) within Level 2 units (groups), which characterizes grouped data.

Table 1 can also be represented as a diagram, as shown in Figure 3.

If $n_{1} = n_{2} - n_{1} = \dots = n - n_{J-1}$, the nested data structure is said to be balanced.
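The relationship between the cumulative row indices $n_{1}, n_{2}, \dots, n$ and the group sizes can be sketched in a few lines of Python. The helper names below are hypothetical; the sketch only assumes that rows are sorted by group, as in Table 1.

```python
def group_sizes(cumulative_ends):
    """Group sizes n_1, n_2 - n_1, ..., n - n_{J-1} from the index of the
    last row of each group (rows assumed sorted by group)."""
    sizes, previous = [], 0
    for end in cumulative_ends:
        sizes.append(end - previous)
        previous = end
    return sizes


def is_balanced(cumulative_ends):
    """Balanced nested structure: every group has the same number of rows."""
    return len(set(group_sizes(cumulative_ends))) == 1


print(group_sizes([22, 44, 66]))  # [22, 22, 22]
print(is_balanced([22, 44, 66]))  # True
print(is_balanced([2, 5, 9]))     # False (sizes 2, 3, 4)
```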

3.1. Two-Level Hierarchical Linear Model (HLM2)

According to [24], to understand how the general expression of a hierarchical linear model with data grouped into two levels is defined, it is useful to start from a multiple linear regression model, whose expression is given by

Y_{i} = b_{0} + b_{1} X_{1 i} + b_{2} X_{2 i} + \dots + b_{Q} X_{Q i} + r_{i}

(1)

where

i: subscript for each individual under review;
$Y_{i}$: phenomenon under study (dependent variable);
$b_{0}$: intercept, i.e., the expected value of Y when all explicative variables are equal to zero;
$b_{1}, \dots, b_{Q}$: coefficients of each explicative variable;
$X_{1i}, \dots, X_{Qi}$: explicative variables (metric or dummy);
$r_{i}$: regression error term.
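As a point of reference for expression (1), the ordinary least squares estimates of $b_{0}, \dots, b_{Q}$ can be obtained with NumPy's least-squares solver. This is a minimal sketch on simulated, noise-free data (an assumption made so that the true coefficients are recovered exactly), not the estimation routine used in the study; `fit_mlr` is a hypothetical helper name.

```python
import numpy as np


def fit_mlr(X, y):
    """OLS estimates (b_0, b_1, ..., b_Q) for expression (1).
    X has shape (n, Q); a column of ones is prepended for the intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                # two hypothetical explicative variables
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1]     # noise-free, so r_i = 0
print(np.round(fit_mlr(X, y), 6))           # recovers b_0 = 1.5, b_1 = 2.0, b_2 = -0.5
```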

Expression (1) assumes homogeneous instances, that is, instances that do not come from different groups that could, for some reason, directly influence the behavior of the variable Y [23,24]. However, it is possible to consider two groups of instances, from which two different models would be estimated, as follows:

\begin{matrix} Y_{i 1} = b_{01} + b_{11} X_{11 i} + b_{21} X_{21 i} + \dots + b_{Q 1} X_{Q 1 i} + r_{1 i} \end{matrix}

(2)

\begin{matrix} Y_{i 2} = b_{02} + b_{12} X_{12 i} + b_{22} X_{22 i} + \dots + b_{Q 2} X_{Q 2 i} + r_{2 i} \end{matrix}

(3)

in which

$b_{01}$ and $b_{02}$: respective intercepts for groups 1 and 2;
$b_{11}, b_{21}, \dots, b_{Q1}$: coefficients of variables $X_{1}, \dots, X_{Q}$ for group 1;
$b_{12}, b_{22}, \dots, b_{Q2}$: coefficients of variables $X_{1}, \dots, X_{Q}$ for group 2;
$r_{1}$ and $r_{2}$: regression error terms specific to each model.

Therefore, for $j = 1, \dots, J$ groups, one obtains the general expression of a regression model for grouped data, which can be considered a first-level model:

Y_{ij} = b_{0j} + b_{1j} X_{1ij} + b_{2j} X_{2ij} + \dots + b_{Qj} X_{Qij} + r_{ij}

(4)

Y_{ij} = b_{0j} + \sum_{q=1}^{Q} b_{qj} X_{qij} + r_{ij}

(5)

For didactic and illustrative purposes, the expected values of Y, that is, $\hat{Y}$, can be written for each instance i belonging to each group j when there is a single explicative variable X in the proposed model:

\text{Group } 1: \hat{Y}_{i1} = β_{01} + β_{11} X_{i1}

(6)

\text{Group } 2: \hat{Y}_{i2} = β_{02} + β_{12} X_{i2}

(7)

\text{Group } J: \hat{Y}_{iJ} = β_{0J} + β_{1J} X_{iJ}

(8)

The graph in Figure 4 conceptually presents the plot of expressions (6) to (8) (the β parameters are estimates of the b coefficients, following the pattern described above). It shows that the individual models representing the instances of each group can have different intercepts and slopes, which can occur due to certain characteristics of the groups themselves.
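The situation in Figure 4 can be reproduced by fitting expressions (6) to (8) separately for each group. The sketch below assumes two hypothetical groups with noise-free data and uses `np.polyfit` for each straight-line fit; it illustrates how intercepts and slopes can differ between groups, not how the hierarchical model itself is estimated.

```python
import numpy as np


def per_group_lines(x, y, groups):
    """Fit Y_hat = beta_0j + beta_1j * X separately for each group j.
    Returns {group: (intercept, slope)}."""
    lines = {}
    for g in np.unique(groups):
        mask = groups == g
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)  # highest degree first
        lines[g] = (intercept, slope)
    return lines


x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
groups = np.array([1, 1, 1, 2, 2, 2])
# Hypothetical groups: intercept 1.0 / slope 2.0 versus intercept 4.0 / slope 0.5.
y = np.where(groups == 1, 1.0 + 2.0 * x, 4.0 + 0.5 * x)
for g, (b0, b1) in per_group_lines(x, y, groups).items():
    print(f"group {g}: intercept {b0:.2f}, slope {b1:.2f}")
```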

Therefore, still according to [23,24], there must be group characteristics (Level 2), invariant for the instances belonging to each group (as explained in Table 1), that can explain the differences in the intercepts and slopes of the models that represent these groups. In this sense, consider the regression model with one explicative variable X and instances nested in $j = 1, \dots, J$ groups:

Y_{i j} = b_{0 j} + b_{1 j} X_{i j} + r_{i j}

(9)

Hence, the intercepts $b_{0j}$ and the slopes $b_{1j}$ can be expressed as functions of an explicative variable W, which represents a certain characteristic of the j groups:

Intercepts:

\text{Group } 1: b_{01} = γ_{00} + γ_{01} W_{1} + μ_{01}

(10)

\text{Group } 2: b_{02} = γ_{00} + γ_{01} W_{2} + μ_{02}

(11)

\text{Group } J: b_{0J} = γ_{00} + γ_{01} W_{J} + μ_{0J}

(12)

\text{In general}: b_{0j} = γ_{00} + γ_{01} W_{j} + μ_{0j}

(13)

in which

$γ_{00}$: expected value of the dependent variable for a given instance i belonging to a group j when $X = W = 0$ (general intercept);
$γ_{01}$: change in the expected value of the dependent variable for a given instance i of a given group j when there is a unitary change in characteristic W of this group, ceteris paribus;
$μ_{0j}$: error term of the model, which also indicates the existence of randomness in the intercepts, which can be generated by the presence of instances from different groups in the data structure.

Slopes:

\text{Group } 1: b_{11} = γ_{10} + γ_{11} W_{1} + μ_{11}

(14)

\text{Group } 2: b_{12} = γ_{10} + γ_{11} W_{2} + μ_{12}

(15)

\text{Group } J: b_{1J} = γ_{10} + γ_{11} W_{J} + μ_{1J}

(16)

\text{In general}: b_{1j} = γ_{10} + γ_{11} W_{j} + μ_{1j}

(17)

in which

$γ_{10}$: change in the expected value of the dependent variable for a given instance i belonging to a group j when there is a unitary change in characteristic X of individual i, ceteris paribus (change in slope due to X);
$γ_{11}$: change in the expected value of the dependent variable for a given instance i of a given group j when there is a unitary change in the product $W \times X$, also ceteris paribus (change in slope due to $W \times X$);
$μ_{1j}$: error term of the model, which also indicates the existence of randomness in the slopes, which can also be generated by the presence of instances from different groups in the data structure.

Substituting (13) and (17) into (9), one obtains the following general expression of random effects:

Y_{ij} = {(γ_{00} + γ_{01} W_{j} + μ_{0j})}_{IRE} + {(γ_{10} + γ_{11} W_{j} + μ_{1j})}_{SRE} X_{ij} + r_{ij}

(18)
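Expression (18) can also be read as a data-generating process. The following Python sketch simulates $Y_{ij}$ from it under stated assumptions: the parameter values and the helper name are hypothetical, and $μ_{0j}$ and $μ_{1j}$ are drawn independently (that is, $σ_{01} = 0$) purely for simplicity. Setting all standard deviations to zero leaves only the fixed part of the model, which makes the result easy to check by hand.

```python
import numpy as np


def simulate_hlm2(x, w, group, gamma, sd_u0, sd_u1, sd_r, rng):
    """Draw Y_ij from expression (18):
    Y_ij = (g00 + g01*W_j + u0j) + (g10 + g11*W_j + u1j) * X_ij + r_ij.
    x, group: length-n arrays (group holds indices 0..J-1);
    w: length-J array with the Level 2 characteristic W_j;
    sd_u0, sd_u1, sd_r: standard deviations of u0j, u1j and r_ij
    (u0j and u1j drawn independently here, for simplicity)."""
    g00, g01, g10, g11 = gamma
    J = len(w)
    u0 = rng.normal(0.0, sd_u0, size=J)        # random intercept deviations
    u1 = rng.normal(0.0, sd_u1, size=J)        # random slope deviations
    r = rng.normal(0.0, sd_r, size=len(x))     # Level 1 errors
    intercept = g00 + g01 * w[group] + u0[group]
    slope = g10 + g11 * w[group] + u1[group]
    return intercept + slope * x + r


rng = np.random.default_rng(1)
x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
group = np.array([0, 0, 0, 1, 1, 1])           # two hypothetical groups
w = np.array([0.0, 1.0])                       # hypothetical W_j
gamma = (1.0, 2.0, 0.5, 1.5)                   # (g00, g01, g10, g11), hypothetical
y_fixed = simulate_hlm2(x, w, group, gamma, 0.0, 0.0, 0.0, rng)   # fixed part only
y_random = simulate_hlm2(x, w, group, gamma, 2.0, 0.5, 1.0, rng)  # with random effects
print(y_fixed)  # group 1: 1 + 0.5x; group 2: 3 + 2x
```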

In summary, therefore, hierarchical modeling represents a set of techniques that, in addition to estimating the parameters of the proposed model, allow for estimation of the variance components of the error terms (for example, in expression (18), $μ_{0j}$, $μ_{1j}$ and $r_{ij}$), as well as their respective statistical significance, in order to verify whether randomness in fact occurs in the intercepts and slopes due to the presence of higher levels in the analysis [21,25]. If the variances of the error terms $μ_{0j}$ and $μ_{1j}$ in model (18) are not found to be statistically significant, that is, if both are statistically equal to zero, estimating a linear model using traditional methods becomes appropriate, since the existence of randomness in the intercepts and slopes has not been proven [22,23].

Assuming that the random effects $μ_{0j}$ and $μ_{1j}$ follow a multivariate normal distribution with means equal to zero and variances equal to $τ_{00}$ and $τ_{11}$, respectively, and that the error terms $r_{ij}$ follow a normal distribution with mean equal to zero and variance equal to $σ^{2}$, the following variance–covariance matrices of the error terms can be defined:

\operatorname{var}[u] = \operatorname{var}\begin{bmatrix} μ_{0j} \\ μ_{1j} \end{bmatrix} = G = \begin{bmatrix} τ_{00} & σ_{01} \\ σ_{01} & τ_{11} \end{bmatrix}

\operatorname{var}[r] = \operatorname{var}\begin{bmatrix} r_{1j} \\ \vdots \\ r_{nj} \end{bmatrix} = σ^{2} I_{n} = \begin{bmatrix} σ^{2} & 0 & \dots & 0 \\ 0 & σ^{2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \dots & 0 & σ^{2} \end{bmatrix}
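The covariance matrix G can be illustrated numerically: drawing many pairs $(μ_{0j}, μ_{1j})$ from a bivariate normal distribution with covariance G and computing their sample covariance should approximately recover G, while the Level 1 errors have the diagonal covariance $σ^{2} I_{n}$. The values of $τ_{00}$, $τ_{11}$, $σ_{01}$, $σ^{2}$ and n below are hypothetical.

```python
import numpy as np

# Hypothetical variance components.
tau00, tau11, sigma01, sigma2 = 4.0, 1.0, 0.6, 2.0

# G: covariance matrix of the random intercept and random slope.
G = np.array([[tau00, sigma01],
              [sigma01, tau11]])
np.linalg.cholesky(G)  # raises LinAlgError if G is not positive definite

# Level 1 errors are independent with common variance: var[r] = sigma^2 * I_n.
n = 4
R = sigma2 * np.eye(n)

rng = np.random.default_rng(42)
u = rng.multivariate_normal(mean=[0.0, 0.0], cov=G, size=200_000)  # rows: (mu_0j, mu_1j)
G_hat = np.cov(u, rowvar=False)
print(np.round(G_hat, 2))  # close to G for a large number of draws
```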

From these matrices, one can establish the relationship between the variances of the error terms, known as the intraclass correlation:

ρ = \frac{τ_{00} + τ_{11}}{τ_{00} + τ_{11} + σ^{2}}

(19)

This correlation represents the proportion of the total variance that is due to Level 2. If it is equal to zero, there is no variance between the Level 2 groups. However, if it is considerably different from zero, owing to at least one significant error term arising from the presence of Level 2 in the analysis, traditional procedures for estimating the model parameters, such as ordinary least squares, are not adequate [26]. At the limit, a value equal to 1, that is, $σ^{2} = 0$, indicates that there are no differences between individuals within groups, that is, all are identical, which is very unlikely to happen. This correlation is also called the Level 2 intraclass correlation.
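Expression (19) is a one-line computation. The sketch below (function name hypothetical) also shows that, without random slopes ($τ_{11} = 0$), it reduces to $τ_{00}/(τ_{00} + σ^{2})$, the form used for the null model; the inputs are illustrative values, not estimates from this study.

```python
def intraclass_correlation(tau00, sigma2, tau11=0.0):
    """Expression (19): share of the total variance attributable to Level 2.
    With tau11 = 0 (no random slopes) it reduces to tau00 / (tau00 + sigma2)."""
    level2 = tau00 + tau11
    return level2 / (level2 + sigma2)


print(intraclass_correlation(tau00=9.0, sigma2=1.0))             # 0.9
print(intraclass_correlation(tau00=6.0, tau11=2.0, sigma2=2.0))  # 0.8
```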

Rearranging expression (18) to separate the fixed effects component, in which the parameters of the model are estimated, from the random effects component, from which the variances of the error terms are estimated, we have:

Y_{ij} = {(γ_{00} + γ_{10} X_{ij} + γ_{01} W_{j} + γ_{11} W_{j} X_{ij})}_{FE} + {(μ_{0j} + μ_{1j} X_{ij})}_{RE} + r_{ij}

(20)
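The step from (18) to (20) is only a regrouping of terms: expanding the slope term and collecting the parameters γ separately from the random terms μ gives

```latex
\begin{aligned}
Y_{ij} &= (\gamma_{00} + \gamma_{01} W_{j} + \mu_{0j}) + (\gamma_{10} + \gamma_{11} W_{j} + \mu_{1j}) X_{ij} + r_{ij} \\
       &= \gamma_{00} + \gamma_{01} W_{j} + \mu_{0j} + \gamma_{10} X_{ij} + \gamma_{11} W_{j} X_{ij} + \mu_{1j} X_{ij} + r_{ij} \\
       &= \underbrace{(\gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} W_{j} + \gamma_{11} W_{j} X_{ij})}_{FE}
        + \underbrace{(\mu_{0j} + \mu_{1j} X_{ij})}_{RE} + r_{ij}
\end{aligned}
```

The interaction $γ_{11} W_{j} X_{ij}$ carries a fixed parameter and therefore belongs to the fixed effects component, while only the terms involving $μ_{0j}$, $μ_{1j}$ and $r_{ij}$ contribute variance components.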

In general, starting from expression (5), one can define the general equation of a model with two levels of analysis, in which the first level includes the explicative variables $X_{1}, \dots, X_{Q}$ referring to each individual i, and the second level includes the explicative variables $W_{1}, \dots, W_{S}$ referring to each group j:

\text{Level 1}: Y_{ij} = b_{0j} + \sum_{q=1}^{Q} b_{qj} X_{qij} + r_{ij}

(21)

\text{Level 2}: b_{qj} = γ_{q0} + \sum_{s=1}^{S_{q}} γ_{qs} W_{sj} + μ_{qj}

(22)

Regarding model estimation, while the fixed effects parameters are traditionally estimated by maximum likelihood estimation (MLE), the variance components of the error terms can be estimated both by maximum likelihood and by restricted maximum likelihood (REML).

Parameter estimation by MLE or REML is computationally intensive, which is why it will not be elaborated algebraically in this study. Both, however, require the numerical optimization of an objective function, which generally starts from initial parameter values and uses a sequence of iterations to find the parameters that maximize the previously defined likelihood function [23,24].

3.2. HLM2 vs. Traditional Multiple Linear Regressions (MLR)

A hierarchical linear model takes into account the fact that individuals belonging to the same group are more similar and, therefore, present correlated responses. Thus, unlike conventional linear regression models, which have five basic assumptions (linearity, additivity, normality, homoscedasticity and independence), hierarchical linear models seek to preserve the correlation structure present in the data more reliably [27].

In summary, hierarchical linear models differ from models that consider only one level in three main aspects:

The coefficients (both intercept and slope) may vary across higher-level units, which violates the basic assumption of independence between observations;
They include additional statistical parameters, corresponding to the variances of the intercepts and slope coefficients between higher-level units;
When the random coefficients of the model are not null, explicative variables referring to the second hierarchy level are included in the model, helping to explain the variation between the units of this level.

It is also important to note that, although they are different models, hierarchical linear models can be roughly reduced to conventional multiple linear regressions, preferably when the Level 2 variables $W_{1}, \dots, W_{S}$ are equal to zero. Using the nested data structure in Table 1 as an example, such a simplification would result in Φ multiple linear regression models, where Φ represents the number of levels analyzed $(i, j, k, \dots)$.

4. Methodology

4.1. Sample Data

The data structure used to create the linear hierarchical model originally has dimensions of 3895 rows × 119 columns and is published annually, with back-dated series, by the Sustainable Development Report (SDR) [13] in its SDG Index section. It includes data on the overall and individual average performance for each of the 17 SDGs and all 95 associated indicators for 165 Member States and 12 other regional geopolitical classifications (E_S_Asia, E_Euro_Asia, _HIC, _LMIC, _LIC, MENA, Oceania, OECD, SIDS, Africa, -UMIC, World), over 22 uninterrupted years (2000–2021).

Further details on regional geopolitical classifications can be found in the same report.

4.2. Variable Selection

Since no outliers are present in the SDR data, it is not necessary to remove or treat them prior to using the data [28].

Of all 119 variables available, 50 will not be considered for model creation. Twenty of them are [Country, Income Group, Region] and the 17 Goal Scores (the Goal Scores will not be analyzed individually, only the final score they compose; otherwise, 17 other models would be required before obtaining sdg_indexscore). The Goal Scores, however, are not discarded altogether, as they remain relevant for discussing specific topics in the evaluation of the model.

In addition to the variables excluded above, the 30 variables listed below are also not included among the first-level explicative variables, either due to lack of data availability or to pairwise collinearity with similar variables. They are, in sequence: sdg2_obesity, sdg2_snmi, sdg2_pestexp, sdg3_matmort, sdg3_ncds, sdg3_tb, sdg3_births, sdg6_wastewat, sdg6_scarcew, sdg8_slavery, sdg8_impacc, sdg12_msw, sdg12_ewaste, sdg12_SO2prod, sdg12_SO2import, sdg12_N2prod, sdg12_N2import, sdg12_explastic, sdg13_CO2export, sdg14_biomar, sdg15_redlist, sdg15_biofrwater, sdg16_detain, sdg16_prs, sdg16_u5reg, sdg16_cpi, sdg16_weaponsexp, sdg17_cohaven, sdg17_oda, sdg17_govex.

Thus, the post-selection data structure has dimensions of 3895 × 69 ($q = 1, \dots, 65$).

One last consideration refers to the second-level variable ($W_{sj}$), which will also be disregarded in the model for reasons of computational performance [29]. If it were considered, it would be necessary to evaluate $W \times X$ for each instance, which would add complexity to the model and, consequently, time to the iterative procedure of estimating the coefficients. The option not to use it is a simplification, which is why the proposed model is called a simplified two-level linear hierarchical model (HLM2s).

As shown in Table 2, even though more indicators are added each year, there are still gaps in the data at both levels, due to measurement difficulties and/or a lack of reports from official sources. Additionally, since the numbers of individuals in groups $1, 2, \dots, J$ are the same, the nested data structure is characterized as balanced.

Table 2 can also be represented as a diagram, as shown in Figure 5.

4.3. Methodology Applied in STATA/SE 16

The methodology developed here aims to create the HLM2 model based on Section 4.2, and its formulation is an adaptation of Equations (21) and (22):

\text{Year } (t),\ \text{Level 1}: Y_{tj} = b_{0j} + \sum_{q=1}^{65} b_{qj} X_{qtj} + r_{tj}

(23)

\text{Country } (j),\ \text{Level 2}: b_{qj} = γ_{q0} + \sum_{s=1}^{S_{q}} γ_{qs} W_{sj} + μ_{qj}

(24)

Strategically, a step-up configuration is used, in which the final model is built as the aggregated result of successive null hypothesis ($H_{0}$) rejection tests. For some authors, this strategy is the opposite of the procedure most commonly used in statistical methods, the stepwise procedure (Figure 6) [30].

4.4. Null Hierarchical Model

Once the previous verification steps have been completed, the next step is to build the null hierarchical model (or unconditional model). This is simply a particular, simplified case of the HLM that disregards the explicative variables at all levels; that is, in Equations (23) and (24), $\sum b_{qj} X_{qtj} = \sum γ_{qs} W_{sj} = 0$. Therefore, the null model allows one to check whether sdg_indexscore varies between countries over the years, since no explicative variable is inserted in the model, which only considers an intercept and the error terms $μ_{0j}$ and $r_{tj}$, whose variances are equal to $τ_{00}$ and $σ^{2}$, respectively [22,23].

\text{Year } (t),\ \text{Level 1}: Y_{tj} = b_{0j} + r_{tj}

(25)

\text{Country } (j),\ \text{Level 2}: b_{0j} = γ_{00} + μ_{0j}

(26)

\text{which results in}: Y_{tj} = γ_{00} + μ_{0j} + r_{tj}

(27)

In STATA/SE 16, a null model can be created with the command `mixed sdg_indexscore || country:, var nolog reml`, where the variable whose behavior will be studied (sdg_indexscore) is passed as a parameter, together with the level variable responsible for the random effects (country). The `var` option reports the random-effects parameters as variances, `nolog` suppresses the iteration log, and `reml` requests restricted maximum likelihood estimation, which, according to [22], is the best and least demanding method available to date.
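For readers without Stata, the variance components of the null model (27) can be illustrated in Python. The sketch below is a swapped-in technique rather than the study's procedure: for a complete, balanced panel, the REML estimates of $τ_{00}$ and $σ^{2}$ coincide with the classical one-way ANOVA (method-of-moments) estimates whenever the between-country estimate is positive, so no iterative optimization is needed. The SDR panel has data gaps (Section 4.2), which is why an iterative REML fit such as `mixed` is needed in practice; the simulated panel, its parameters and the helper name are hypothetical.

```python
import numpy as np


def null_model_balanced(y):
    """Estimate gamma_00, tau_00 and sigma^2 of Y_tj = gamma_00 + mu_0j + r_tj
    from a complete, balanced panel y of shape (J countries, T years).
    Matches REML when the between-country estimate is positive."""
    J, T = y.shape
    country_means = y.mean(axis=1)
    gamma00 = y.mean()
    ssw = ((y - country_means[:, None]) ** 2).sum()   # within countries
    ssb = T * ((country_means - gamma00) ** 2).sum()  # between countries
    msw = ssw / (J * (T - 1))
    msb = ssb / (J - 1)
    tau00 = max((msb - msw) / T, 0.0)                 # truncated at zero
    return gamma00, tau00, msw


# Hypothetical panel mirroring the sample description: 165 countries x 22 years.
rng = np.random.default_rng(7)
J, T = 165, 22
mu0 = rng.normal(0.0, np.sqrt(100.0), size=J)    # true tau_00 = 100
r = rng.normal(0.0, np.sqrt(6.0), size=(J, T))   # true sigma^2 = 6
y = 64.0 + mu0[:, None] + r
gamma00, tau00, sigma2 = null_model_balanced(y)
print(gamma00, tau00, sigma2, tau00 / (tau00 + sigma2))  # last value: intraclass correlation
```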

4.5. Hierarchical Model with Random Intercepts and Slopes

In this model, the random intercept model is extended by adding random slope effects for $X_{qtj}$, so that the fitted lines are no longer parallel across countries.

In terms of equations, rewriting (23) and (24) gives:

\text{Year } (t),\ \text{Level 1}: Y_{tj} = b_{0j} + \sum_{q=1}^{65} b_{qj} X_{qtj} + r_{tj}

(28)

\text{Country } (j),\ \text{Level 2}: b_{qj} = γ_{q0} + μ_{qj}, \quad q = 0, \dots, 65

(29)

\text{which results in}: Y_{tj} = γ_{00} + \sum_{q=1}^{65} γ_{q0} X_{qtj} + μ_{0j} + \sum_{q=1}^{65} μ_{qj} X_{qtj} + r_{tj}

(30)

In STATA/SE 16, the model can be created as shown in Figure 7.

4.6. Final Random Coefficients and Predictions

Once the models from the previous steps have been created and the most suitable one for the nested data structure studied has been chosen according to the relevant statistical tests, it is possible to generate the final random intercept/slope coefficients of the model and to estimate the values of the $Y_{tj}$ variable from them.

In STATA/SE 16, the series of commands for creating the final random intercept/slope coefficients and for estimating $Y_{tj}$, in that order, is shown in Figure 8.

Conceptually, the random effects $μ_{qj}$, with $q = 0, \dots, 65$, are obtained with `predict`, and the fitted values with `predict sdg_predicted, fitted`, where sdg_predicted plays the role of $\hat{Y}_{tj}$.
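The idea behind the predicted random effects can be sketched for the simplest case, the null model with a balanced panel: each country's predicted intercept deviation is its mean deviation from $γ_{00}$, shrunk towards zero by the factor $τ_{00}/(τ_{00} + σ^{2}/T)$, where T is the number of years. This is a minimal sketch with known variance components and hypothetical inputs and helper names; it does not reproduce Stata's computations for the full model with 65 random slopes.

```python
import numpy as np


def predict_random_intercepts(y, gamma00, tau00, sigma2):
    """Empirical Bayes (BLUP) predictions of mu_0j for the null model,
    given a balanced panel y of shape (J countries, T years)."""
    T = y.shape[1]
    shrinkage = tau00 / (tau00 + sigma2 / T)  # reliability of each country mean
    return shrinkage * (y.mean(axis=1) - gamma00)


def fitted_values(y, gamma00, tau00, sigma2):
    """Fitted Y_tj = gamma_00 + mu_0j_hat, repeated for every year t."""
    u0_hat = predict_random_intercepts(y, gamma00, tau00, sigma2)
    return np.repeat((gamma00 + u0_hat)[:, None], y.shape[1], axis=1)


y = np.array([[1.0, 3.0],
              [5.0, 7.0]])  # hypothetical: 2 countries x 2 years
u0_hat = predict_random_intercepts(y, gamma00=4.0, tau00=1.0, sigma2=2.0)
y_hat = fitted_values(y, gamma00=4.0, tau00=1.0, sigma2=2.0)
print(u0_hat)  # country means 2 and 6, deviations -2 and +2, shrunk by 0.5
print(y_hat)
```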

5. Results

5.1. Null Hierarchical Model

Although STATA/SE 16 does not directly show the results of the z tests, with the respective significance levels, for the random effects parameters (Figure 9), the estimated variance component corresponding to the random intercept $μ_{0j}$ is considerably higher than its standard error ($τ_{00}$ = 102.4911, reported as var(_cons), against a standard error of 10.9531), which indicates significant variation in sdg_indexscore across countries. Statistically, $z = \frac{102.4911}{10.9531} \approx 9.36 \gg 1.96$, with 1.96 being the critical value of the standard normal distribution for a significance level of 5%.
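The z ratio above can be checked directly. The sketch below uses only the estimate and standard error reported in this section (helper names are hypothetical); note that a Wald-type z for a variance component is only a rough guide, because variances are bounded below by zero, which is one reason the likelihood ratio test is also reported.

```python
import math


def wald_z(estimate, std_error):
    """Ratio of an estimate to its standard error."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided p-value from the standard normal: P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z = wald_z(102.4911, 10.9531)  # tau_00 estimate and its standard error
print(round(z, 2))             # 9.36
print(z > 1.96)                # True: exceeds the 5% critical value
```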

This result is essential to support the choice of hierarchical modeling over traditional OLS regression modeling, and it is the main reason why a null model is always estimated when preparing hierarchical analyses. This can be confirmed by the likelihood ratio test, whose significance (Sig. $χ^{2}$ = 0.0000) implies rejecting the null hypothesis that the random intercepts are equal to zero ($H_{0}: μ_{0j} = 0$), so that the estimation of a traditional linear regression model is discarded for the nested data used in the model.
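When the null hypothesis is $τ_{00} = 0$, the parameter lies on the boundary of its space, and the likelihood ratio statistic is commonly referred to a 50:50 mixture of $χ^{2}_{0}$ and $χ^{2}_{1}$ distributions, so the p-value is half the usual $χ^{2}_{1}$ tail (Stata reports this as a chibar2(01) test). The Python sketch below assumes the two log-likelihoods are available; the helper name and the values in the usage check are hypothetical.

```python
import math


def lr_test_random_intercept(loglik_mixed, loglik_ols):
    """LR test of H0: tau_00 = 0 (no random intercept).
    The statistic is referred to a 50:50 mixture of chi2(0) and chi2(1),
    so the p-value is half the chi2(1) upper tail."""
    lr = max(2.0 * (loglik_mixed - loglik_ols), 0.0)
    chi2_1_upper_tail = math.erfc(math.sqrt(lr / 2.0))  # P(chi2_1 > lr)
    return lr, 0.5 * chi2_1_upper_tail


print(lr_test_random_intercept(-100.0, -100.0))  # (0.0, 0.5): no evidence of a random intercept
print(lr_test_random_intercept(0.0, -50.0))      # LR = 100, p-value practically zero
```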

Furthermore, using the `estat icc` command, the intraclass correlation at the country level is estimated at approximately 0.947 ($ρ = \frac{τ_{00}}{τ_{00} + σ^{2}}$). In other words, approximately 95% of the total variance in sdg_indexscore lies between countries rather than between years within a country, reaffirming country as largely responsible for its behavior (Figure 10).

Finally, substituting the estimated coefficient of the null model into (29) and (30) gives:

\text{Year } (t)\text{, Level 1: } Y_{tj} = b_{0j} + r_{tj}

(31)

\text{Country } (j)\text{, Level 2: } b_{0j} = 63.74597 + \mu_{0j}

(32)
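Substituting (32) into (31) gives the composite form of the null model. The distributional assumptions written here (normal, mutually independent $\mu_{0j}$ and $r_{tj}$) are the usual ones for HLM2 and are stated as an assumption, since they are not repeated in this section.

```latex
Y_{tj} = 63.74597 + \mu_{0j} + r_{tj},
\qquad \mu_{0j} \sim N(0, \tau_{00}),
\qquad r_{tj} \sim N(0, \sigma^{2})

\operatorname{Var}(Y_{tj}) = \tau_{00} + \sigma^{2},
\qquad \rho = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
```

Under this form, the fixed part predicts 63.74597 points for every country-year, and each departure from it is split between the country term $\mu_{0j}$ and the year-level residual $r_{tj}$.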

5.2. Hierarchical Model with Random Intercepts and Slopes

Interestingly, the parameter and variance estimates of the model with random intercepts and slopes are practically identical to those of the model with random intercepts only (not shown in this article, but obtainable analogously). This is because the estimated variances $\tau_{qj}$ of the random slope terms $\mu_{qj}$ are statistically equal to zero (Figure 11).

5.3. Final Random Coefficients and Predictions

The final random intercepts, obtained by summing all the individual random terms ($M = \sum_{q=1}^{65} \mu_{qj}$), are shown in Table 3. They must be checked separately, because the model outputs shown in the figures of the previous sections do not report them explicitly.
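Computing $M$ for one country is a plain sum over the random terms $q = 1, \dots, 65$. The sketch below assumes those terms have already been extracted from the fitted model (Figure 8 shows the STATA/SE 16 command used for this) and stored in a dictionary keyed by q; the dictionary layout, the helper name, and the numeric values are hypothetical.

```python
def final_random_coefficient(mu_by_term: dict[int, float], q_max: int = 65) -> float:
    """Sum the random terms mu_qj for q = 1..q_max (missing terms count as 0)."""
    return sum(mu_by_term.get(q, 0.0) for q in range(1, q_max + 1))


# Hypothetical random terms for one country j (not values from the study)
mu_country = {1: 0.5, 2: -0.2, 3: 1.0, 70: 9.9}  # q = 70 lies outside 1..65 and is ignored
print(round(final_random_coefficient(mu_country), 6))  # 1.3
```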

The resulting values of $sdg_{predicted}$ are shown in Figure 12 for the World and in Figure 13 for Brazil.
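For reference, a prediction from a two-level model with random intercepts and slopes combines, for each country, the fixed coefficients with that country's random terms. Equations (29) and (30) are not reproduced in this section, so the sketch below assumes the standard HLM2 form $\hat{Y}_{tj} = (\gamma_{00} + \mu_{0j}) + \sum_{q} (\gamma_{q0} + \mu_{qj}) X_{qtj}$; the function name, the $\gamma$ notation, and all numeric inputs are hypothetical.

```python
from typing import Sequence


def predict_hlm2(
    gamma00: float,
    mu0j: float,
    gammas: Sequence[float],
    mu_j: Sequence[float],
    x_tj: Sequence[float],
) -> float:
    """Predicted Y for year t in country j under a random intercept-and-slope model."""
    if not (len(gammas) == len(mu_j) == len(x_tj)):
        raise ValueError("gammas, mu_j and x_tj must have the same length")
    intercept = gamma00 + mu0j  # country-specific intercept
    slopes = sum((g + u) * x for g, u, x in zip(gammas, mu_j, x_tj))  # country-specific slopes
    return intercept + slopes


# Hypothetical inputs: two explanatory variables for one country-year
y_hat = predict_hlm2(60.0, 1.0, gammas=[0.5, -0.2], mu_j=[0.1, 0.0], x_tj=[2.0, 3.0])
print(round(y_hat, 6))  # 61.6
```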

5.4. HLM2 vs. Traditional Multiple Linear Regressions (MLR)

For didactic purposes, the HLM2 results can also be compared with those of a traditional MLR. The algebraic development of the latter and its implementation in STATA are beyond the scope of this work and are not detailed here.

For World (Figure 14):

Mean absolute error ($\sqrt{\frac{\sum \Delta}{n - 2}}$): HLM2 = ±0.08; MLR = ±0.13

For Brazil (Figure 15):

Mean absolute error ($\sqrt{\frac{\sum \Delta}{n - 2}}$): HLM2 = ±0.09; MLR = ±0.15
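As written, $\sqrt{\sum \Delta / (n - 2)}$ has the form of a residual standard error when $\Delta$ denotes squared residuals, whereas the conventional mean absolute error is $\sum |\Delta| / n$. Since the text does not define $\Delta$, the sketch below computes both under that assumption; the series are hypothetical, not the study's data.

```python
import math
from typing import Sequence


def residual_std_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """sqrt(sum of squared residuals / (n - 2))."""
    n = len(actual)
    if n <= 2 or n != len(predicted):
        raise ValueError("need equal-length series with more than two points")
    ss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(ss / (n - 2))


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of absolute residuals divided by n."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need non-empty, equal-length series")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical SDG index scores and predictions for four years
actual = [66.0, 66.3, 66.5, 66.8]
predicted = [66.1, 66.2, 66.6, 66.7]
print(round(residual_std_error(actual, predicted), 4))   # 0.1414
print(round(mean_absolute_error(actual, predicted), 4))  # 0.1
```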

6. Conclusions

The Sustainable Development Goals offer a detailed and realistic perspective on the vast and complex range of challenges facing the world. They bring different regions, cultures and histories to a common numerical footing, which allows egalitarian goals to be set for collective needs.

This study applied a two-level hierarchical linear model to the SDGs to quantify, with mathematical transparency, the gap between where we are and where we need to be. The hierarchical model with random intercepts and slopes proved to be the best candidate for this purpose.

Beyond the mathematics, the resulting model offers guidance in areas where little knowledge is yet available, for instance when it indicates that one objective should be fulfilled ahead of another, contrary to what the majority commonly assumed to be correct. In this sense, it also illustrates how the model can break with previously established biases.

Indeed, it is through this model, with the selected Level 1 explanatory variables, that SDG 8 (decent work and economic growth), SDG 9 (industry, innovation and infrastructure) and SDG 7 (affordable and clean energy) emerge as priorities: together, their random slope coefficients explain 28.1% (9.8%, 9.7% and 8.6%, respectively, computed as $\frac{b_{qj}}{\sum b_{qj}}$) of the variation in $sdg_{indexscore}$. Conversely, SDG 16 (peace, justice and strong institutions), SDG 17 (partnerships for the goals) and SDG 15 (life on land) were the least significant, with a combined share of approximately 9.5% (1.8%, 3.8% and 4%, respectively).
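The percentages above follow the share formula $b_{qj} / \sum b_{qj}$. A minimal sketch, assuming the coefficients are positive so that each share lies between 0% and 100%; the coefficient values and function names are hypothetical, and only the final check uses the figures reported for SDGs 8, 9 and 7.

```python
def coefficient_shares(coefs: dict[str, float]) -> dict[str, float]:
    """Percentage share of each coefficient in the sum of all coefficients."""
    total = sum(coefs.values())
    if total == 0:
        raise ValueError("coefficients sum to zero")
    return {name: 100.0 * b / total for name, b in coefs.items()}


def rank_by_share(shares: dict[str, float]) -> list[tuple[str, float]]:
    """Sort (name, share) pairs from largest to smallest share."""
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Hypothetical coefficients, for illustration only
shares = coefficient_shares({"sdg_a": 2.0, "sdg_b": 1.0, "sdg_c": 1.0})
print(rank_by_share(shares))  # [('sdg_a', 50.0), ('sdg_b', 25.0), ('sdg_c', 25.0)]

# Reported shares of SDGs 8, 9 and 7 add up to the stated 28.1%
print(round(9.8 + 9.7 + 8.6, 1))  # 28.1
```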

Achieving SDG 8, one of the goals identified for prioritization, would mean establishing sustained and inclusive economic growth, which can contribute to improved livelihoods for people around the world. Economic growth can lead to new and better employment opportunities and provide greater economic security for all. Moreover, rapid growth, especially among the least developed and developing countries, can help them reduce the wage gap relative to developed countries, thereby diminishing glaring inequalities between rich and poor.

From the perspective of the explanatory variables, $sdg8_{adjgrowth}$, $sdg1_{wpc}$, $sdg10_{gini}$, $sdg10_{palma}$ and $sdg1_{320pov}$ have the random slope coefficients with the largest explanatory effects: 5.8%, 4.1%, 3.9%, 3.5% and 3.4%, respectively. At the other end are $sdg3_{fertility}$, $sdg16_{justice}$, $sdg11_{pipedwat}$, $sdg3_{traffic}$, $sdg3_{swb}$ and $sdg2_{stunting}$, with the weakest explanatory capability; added together, these variables explain only 3.55% of the expected output.

Furthermore, according to the $P > |z|$ test at the 5% significance level, the variables $sdg3_{hiv}$, $sdg3_{uhc}$, $sdg3_{neonat}$, $sdg3_{pollmort}$, $sdg6_{toilet}$, $sdg11_{pipedwat}$, $sdg14_{cleanwater}$, $sdg15_{cpta}$ and $sdg16_{clabor}$ were not statistically significant in explaining $sdg_{indexscore}$. If these variables were disregarded in a future analysis, SDG 3 (good health and wellbeing) might also lose its position in the priority ranking.
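The $P > |z|$ screening described above can be sketched as follows: each coefficient is divided by its standard error, and the two-sided normal p-value is compared with 0.05. The variable names, estimates, and helper functions are hypothetical placeholders, not output from the study.

```python
import math


def p_value(coef: float, se: float) -> float:
    """Two-sided p-value of the z statistic coef / se."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def not_significant(estimates: dict[str, tuple[float, float]], alpha: float = 0.05) -> list[str]:
    """Names whose coefficients are not significant at level alpha."""
    return [name for name, (coef, se) in estimates.items() if p_value(coef, se) >= alpha]


# Hypothetical (coefficient, standard error) pairs
estimates = {
    "var_a": (1.0, 0.2),   # z = 5.0, clearly significant
    "var_b": (0.1, 0.2),   # z = 0.5, not significant
    "var_c": (-0.3, 0.1),  # z = -3.0, significant (sign does not matter)
}
print(not_significant(estimates))  # ['var_b']
```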

Although some SDGs and variables make a low contribution (or none at all) to the HLM2 output, this work does not propose discarding them, only prioritizing among them. Over time, prioritization should translate into effectiveness, with resources directed to the right places and execution held to a standard of excellence.

For the specific cases analyzed, considering the priorities identified for the period from 2022 to 2030 and targets of approximately 80 points on average for each SDG, the following progress is required:

World:

SDG 8: 17.3% (≈1.5 points/year);
SDG 9: 33.1% (≈2.5 points/year);
SDG 7: 18.7% (≈1.6 points/year).

On average, this amounts to progress of $\approx 1.9$ points/year. Given the World's current progression of 0.30 points/year, reaching the targets for these SDGs would require roughly 6.3 times the current pace (about 533% more effort). In addition, achieving them would raise the $sdg_{indexscore}$ to 68.8 points, 2.8 above the current level, which shows that reaching them is a priority for the World, but not a sufficient condition.
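The per-year figures can be reproduced under one reading of the percentages: as the increase each SDG's current score needs to reach a target of 80 points over eight years (2022 to 2030). This reading is an assumption not stated in the text, but it recovers the reported values after rounding; `annual_progress` is a hypothetical helper.

```python
def annual_progress(target: float, pct_increase: float, years: int = 8) -> float:
    """Points per year needed when the current score must rise by pct_increase% to reach target."""
    current = target / (1.0 + pct_increase / 100.0)  # implied current score
    return (target - current) / years


# Required increases reported for the World
world = {"SDG 8": 17.3, "SDG 9": 33.1, "SDG 7": 18.7}
for sdg, pct in world.items():
    print(sdg, round(annual_progress(80.0, pct), 1))
# SDG 8 1.5
# SDG 9 2.5
# SDG 7 1.6
```

The same reading also reproduces the values reported for Brazil (23.5% gives 1.9 points/year and 26.6% gives 2.1 points/year).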

Brazil:

SDG 8: 23.5% (1.9 points/year);
SDG 9: 26.6% (2.1 points/year);
SDG 7: Goal already achieved.

Similarly, this corresponds to an average progress of 2 points/year. Given Brazil's current progression of 0.25 points/year, reaching the targets for these SDGs would require 8 times the current pace (700% more effort). In addition, achieving them would raise the $sdg_{indexscore}$ to 73.6 points, 0.8 above the current level.
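Comparing the required pace with the current one is a simple ratio, and a ratio r corresponds to (r − 1) × 100% more effort than today. A minimal sketch using the average required and current progressions reported for the World and Brazil; the function names are hypothetical.

```python
def effort_ratio(required_per_year: float, current_per_year: float) -> float:
    """How many times the current annual progress is needed."""
    return required_per_year / current_per_year


def extra_effort_pct(required_per_year: float, current_per_year: float) -> float:
    """Additional effort relative to the current pace, in percent."""
    return (effort_ratio(required_per_year, current_per_year) - 1.0) * 100.0


for label, required, current in [("World", 1.9, 0.30), ("Brazil", 2.0, 0.25)]:
    ratio = effort_ratio(required, current)
    extra = extra_effort_pct(required, current)
    print(f"{label}: {ratio:.2f}x current pace, {extra:.0f}% more")
# World: 6.33x current pace, 533% more
# Brazil: 8.00x current pace, 700% more
```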

Comparing the results for Brazil and the World reveals significant differences, which illustrate how complex and challenging it is to achieve sustainable development uniformly. Although the World aggregate offers a broad view, the individuality of each country cannot be ignored; relying on the aggregate alone would oversimplify very heterogeneous cultures, histories and economies. Statistically, this is supported by the error term $r_{tj}$, which is considerably larger for aggregate geopolitical units such as the World.

Regarding SDGs 7, 8 and 9, the World has more uneven gaps to close, whereas Brazil's gaps are more evenly distributed across the goals. In addition, Brazil has already achieved the target of one of the priority SDGs (SDG 7), so the next step can already begin. According to the priority ranking, the next SDGs are 1 and 10, in that order (SDG 4 has also already been achieved). SDG 10, in particular, is expected to be the biggest challenge of all by 2030.

Comparing the performance of the two models, HLM2 and MLR both produced satisfactory outputs (i.e., mean absolute error $\approx 0$), showing that either technique is applicable. On closer inspection, however, the MLR mean error was, on average, 64% higher than that of HLM2 in both the World and Brazil scenarios; HLM2 is therefore better suited to the proposed problem.
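The 64% figure can be approximated from the rounded errors reported for the World (±0.08 vs. ±0.13) and Brazil (±0.09 vs. ±0.15). With two-decimal inputs the average comes out near 64.6%, so the small difference from the stated value presumably reflects rounding; `pct_higher` is a hypothetical helper.

```python
def pct_higher(mlr_error: float, hlm_error: float) -> float:
    """How much larger the MLR error is than the HLM2 error, in percent."""
    return 100.0 * (mlr_error - hlm_error) / hlm_error


world = pct_higher(0.13, 0.08)   # errors reported for the World
brazil = pct_higher(0.15, 0.09)  # errors reported for Brazil
average = (world + brazil) / 2
print(f"World {world:.1f}%, Brazil {brazil:.1f}%, average {average:.1f}%")
# World 62.5%, Brazil 66.7%, average 64.6%
```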

The intraclass correlation of HLM2 offers another way to explain this outcome. The null model had already shown that the variance of $sdg_{indexscore}$ at the country level was much larger than at the year level; since the MLR model was built with country as a static variable, the two models produced very similar outputs.

In conclusion, sustainable development, in its full and universal form, is perhaps the greatest challenge of the modern world. It is dynamic, and achieving it requires resources, focus and, most importantly, the willingness of governments, individuals and all other layers of society. Despite all the obstacles, we cannot fail to pursue it, whether for ourselves or for those who will come after us. Progress and sustainability will ultimately become synonymous.

Author Contributions

Conceptualization, M.L. and P.B.; Methodology, M.L., P.B. and L.P.F.; Software, M.L.; Validation, M.L. and P.B.; Formal analysis, P.B.; Investigation, P.B.; Resources, M.L. and L.P.F.; Data curation, M.L.; Writing—original draft, M.L.; Writing—review & editing, M.L.; Visualization, M.L.; Supervision, P.B.; Project administration, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

FE/RE	Fixed/Random Effects
HLM	Hierarchical Linear Model
HLM2	Two-level Hierarchical Linear Model
IRE/SRE	Intercept/Slope with Random Effects
MDG	Millennium Development Goals
MLE	Maximum Likelihood Estimation
MLR	Multiple Linear Regression
OLS	Ordinary Least-Squares
REML	Restricted Maximum Likelihood Estimation
SDG	Sustainable Development Goal
UN	United Nations

Appendix A

Table A1. SDG Indicators Description.

	Indicators: Description
SDG 1	`_wpc`: Number of people living on less than 1.90 USD/day (%) `_320_pov`: Number of people living on less than 3.20 USD/day (%)
SDG 2	`_undernsh`: Prevalence of undernourishment (%) `_stunting`: Prevalence of stunting (dwarfism) in children under 5 years of age (%) `_wasting`: Prevalence of wasting in children under 5 years of age (%) `_obesity`: Prevalence of obesity, BMI ≥ 30 (% of adult population) `_trophic`: Human trophic level (2—best, 3—worst) `_crlyld`: Cereal yield or harvest effectiveness (tons per hectare of harvested land) `_snmi`: Sustainable nitrogen management index (0—best, 1.41—worst) `_pestexp`: Export of hazardous pesticides (tons per million inhabitants)
SDG 3	`_matmort`: Maternal death rate at birth (per 100,000 live births) `_neonat`: Neonatal mortality rate (per 1000 live births) `_u5mort`: Under-5 mortality rate (per 1000 live births) `_tb`: Incidence of tuberculosis (per 100,000 inhabitants) `_hiv`: New HIV infections (per 1000 uninfected population) `_ncds`: Death rate due to cardiovascular disease, cancer, diabetes, or chronic respiratory disease in adults aged 30 to 70 years (%) `_pollmort`: Age-standardized death rate attributable to ambient air pollution (per 100,000 population) `_traffic`: Traffic deaths (per 100,000 inhabitants) `_lifee`: Life expectancy at birth (years) `_fertility`: Teenage fertility rate (births per 1000 women aged 15 to 19) `_births`: Rate of births attended by a qualified health professional (%) `_vac`: Surviving infants who received at least 2 WHO-recommended vaccines (%) `_uhc`: Index of universal health coverage as an essential service (0—worst, 100—best) `_swb`: Subjective perception of wellbeing (avg score; 0—worst, 10—best)
SDG 4	`_earlyedu`: Participation rate in preprimary organized learning (% of 4–6 years old) `_primary`: Net enrollment rate in primary education (%) `_second`: Lower high school completion rate (%) `_literacy`: Literacy rate (% of population aged 15–24)
SDG 5	`_familypl`: Demand for family planning satisfied by modern methods (% of women aged 15 to 49) `_edat`: Proportion of average years of education received between men and women (%) `_lfpr`: Female-to-male labor force participation ratio (%) `_parl`: Seats held by women in the national parliament (%)
SDG 6	`_water`: Population with access to basic drinking water services (%) `_sanita`: Population with access to basic sanitation services (%) `_freshwat`: Freshwater abstraction (% of available freshwater resources) `_wastewat`: Anthropogenic effluents receiving treatment (%) `_scarcew`: Scarce consumption of water incorporated in imports (m$^{3}$ H$_{2}$O equivalent/capita)
SDG 7	`_elecac`: Population with access to electricity (%) `_cleanfuel`: Population with access to clean fuels and cooking technology (%) `_CO2TWh`: CO$_{2}$ emissions from fuel combustion per total electricity production (MtCO$_{2}$/TWh) `_ren`: Participation of renewable energy in the total supply of primary energy (%)
SDG 8	`_adjgrowth`: Annual adjusted GDP growth (%) `_slavery`: Victims of modern slavery (per 1000 inhabitants) `_accounts`: Adults with a bank or other financial institution account or mobile money service provider (% of population aged 15 and over) `_unemp`: Unemployment rate (% of total workforce) `_rights`: Labor rights are effectively guaranteed (0—worst, 1—best) `_impacc`: Fatal work accidents incorporated into imports (per 100,000 inhabitants)
SDG 9	`_intuse`: Population with Internet access (%) `_mobuse`: Mobile broadband subscriptions (per 100 inhabitants) `_lpi`: Logistics performance index: quality of trade and transport infrastructure (1—worst, 5—best) `_uni`: Performance of the top three universities in the country in the Times Higher Education Universities Ranking (0—worst, 100—best) `_articles`: Articles published in academic journals (per 1000 inhabitants) `_rdex`: Government spending on research and development (% of GDP)
SDG 10	`_gini`: Gini coefficient (%) `_palma`: Palma ratio (%)
SDG 11	`_slums`: Proportion of urban population living in slums (%) `_pm25`: Annual average concentration of particulate matter with less than 2.5 micrometers in diameter in the air (μg/m$^{3}$) `_pipedwat`: Access to treated and piped water (% of urban population) `_transport`: Satisfaction with public transport (%)
SDG 12	`_msw`: Municipal solid waste (Kg/capita/day) `_ewaste`: Electronic waste (Kg/capita) `_SO2prod`: SO$_{2}$ emissions based on production (Kg/capita) `_SO2import`: SO$_{2}$ emissions incorporated in imports (Kg/capita) `_N2prod`: N$_{2}$ emissions based on production (Kg/capita) `_N2import`: N$_{2}$ emissions incorporated in imports (Kg/capita) `_explastic`: Exports of plastic waste (Kg/capita)
SDG 13	`_CO2gcp`: CO$_{2}$ emissions from fossil fuels and cement produced (tCO$_{2}$/capita) `_CO2import`: CO$_{2}$ emissions incorporated in imports (Kg/capita) `_CO2export`: CO$_{2}$ emissions embodied in fossil fuel exports (Kg/capita)
SDG 14	`_cpma`: Average area protected in marine sites important for biodiversity (%) `_cleanwat`: Ocean Health Index: clean water score (0—worst, 100—best) `_fishstocks`: Fish caught in overexploited or collapsing stocks (% of total catch) `_trawl`: Fish caught by trawling or dredging (%) `_discard`: Fish caught that are discarded (%) `_biomar`: Threats to marine biodiversity embodied in imports (per million population)
SDG 15	`_cpta`: Average area protected in terrestrial sites important for biodiversity (%) `_cpfa`: Average area protected in freshwater sites important for biodiversity (%) `_redlist`: Species survival red list index (0—worst, 1—best) `_forchig`: Permanent deforestation (% of forest area, 5-year average) `_biomafrwter`: Threats to terrestrial and freshwater biodiversity embodied in imports (per million inhabitants)
SDG 16	`_homicides`: Homicide rate (per 100,000 inhabitants) `_detain`: Unsentenced inmates (% of prison population) `_safe`: Population that feels safe walking alone at night or in the area where they live (%) `_prs`: Property rights (1—worst, 7—best) `_u5reg`: Birth registrations with civil authority (% of children under 5) `_cpi`: Corruption perception index (0—worst, 100—best) `_clabor`: Children involved in child labor (% of population aged 5–14) `_weaponexp`: Exports of major conventional weapons (Constant TIV in millions of dollars per 100,000 population) `_rsf`: Press freedom index (0—best, 100—worst) `_justice`: Access and accessibility of justice (0—worst, 1—best)
SDG 17	`_govex`: Government spending on health and education (% of GDP) `_oda`: Concerning OECD Members: international grant of public finance, including official development assistance (% of GNI) `_govrev`: Government revenue, excluding subsidies (% of GDP) `_cohaven`: Corporate tax haven score (0—best, 100—worst) `_statperf`: Statistical performance index (0—worst, 100—best)


Figure 1. UN’s Agenda 2030 Sustainable Development Goals.

Figure 2. SDG over the Years—Brazil and the World.

Figure 3. General diagram model of a database with nested structure of data grouped in two levels.

Figure 4. Individual models representing the instances in each of the J groups.

Figure 5. Structured diagram of data grouped in two levels applied to the SDR dataset.

Figure 6. Stepwise procedure methodology.

Figure 7. Command in STATA/SE 16 for the random intercepts and slope model.

Figure 8. Command in STATA/SE 16 to obtain the final intercept/slope coefficients.

Figure 9. Null hierarchical model output in STATA/SE 16.

Figure 10. Intraclass correlation.

Figure 11. Hierarchical model with random intercept and slope outputs in STATA/SE 16.

Figure 12. SDG over time—actual and HLM2, World.

Figure 13. SDG over time—actual and HLM2, Brazil.

Figure 14. SDG over time—HLM2 vs. MLR, World.

Figure 15. SDG over time—HLM2 vs. MLR, Brazil.

Table 1. General tabular model of a database with nested structure of data grouped in two levels.

Individual i (Level 1)	Group j (Level 2)	$Y_{ij}$	$X_{1ij}$	$X_{2ij}$	⋯	$X_{Qij}$	$W_{1j}$	$W_{2j}$	⋯	$W_{Sj}$
1	1	$Y_{11}$	$X_{111}$	$X_{211}$	⋯	$X_{Q11}$	$W_{11}$	$W_{21}$	⋯	$W_{S1}$
2	1	$Y_{21}$	$X_{121}$	$X_{221}$	⋯	$X_{Q21}$	$W_{11}$	$W_{21}$	⋯	$W_{S1}$
⋮	⋮	⋮	⋮			⋮				⋮
$n_{1}$	1	$Y_{n_{1}1}$	$X_{1n_{1}1}$	$X_{2n_{1}1}$	⋯	$X_{Qn_{1}1}$	$W_{11}$	$W_{21}$	⋯	$W_{S1}$
$n_{1}+1$	2	$Y_{(n_{1}+1)2}$	$X_{1(n_{1}+1)2}$	$X_{2(n_{1}+1)2}$	⋯	$X_{Q(n_{1}+1)2}$	$W_{12}$	$W_{22}$	⋯	$W_{S2}$
⋮	⋮	⋮	⋮			⋮				⋮
$n_{2}$	2	$Y_{n_{2}2}$	$X_{1n_{2}2}$	$X_{2n_{2}2}$	⋯	$X_{Qn_{2}2}$	$W_{12}$	$W_{22}$	⋯	$W_{S2}$
⋮	⋮	⋮	⋮			⋮				⋮
$n_{J-1}+1$	J	$Y_{(n_{J-1}+1)J}$	$X_{1(n_{J-1}+1)J}$	$X_{2(n_{J-1}+1)J}$	⋯	$X_{Q(n_{J-1}+1)J}$	$W_{1J}$	$W_{2J}$	⋯	$W_{SJ}$
$n_{J-1}+2$	J	$Y_{(n_{J-1}+2)J}$	$X_{1(n_{J-1}+2)J}$	$X_{2(n_{J-1}+2)J}$	⋯	$X_{Q(n_{J-1}+2)J}$	$W_{1J}$	$W_{2J}$	⋯	$W_{SJ}$
⋮	⋮	⋮	⋮			⋮				⋮
N	J	$Y_{NJ}$	$X_{1NJ}$	$X_{2NJ}$	⋯	$X_{QNJ}$	$W_{1J}$	$W_{2J}$	⋯	$W_{SJ}$

Table 2. Nested structure of data grouped in two levels applied to the SDR dataset.

Year i (Level 1)	Country j, ISO3 code (Level 2)	SDG Index Score$_{ij}$	sdg1_wpc$_{ij}$	sdg1_320pov$_{ij}$	⋯	sdg15_cpfa$_{ij}$	⋯	Population$_{ij}$
2000	AFG	44.8	-	-	⋯	0.0	⋯	20,779,957
2001	AFG	45.1	-	-	⋯	0.0	⋯	21,606,992
⋮	⋮	⋮	⋮	⋮			⋮
2000	BRA	67.4	92.5	74.0	⋯	27.2	⋯	174,790,339
2000	BRB	67.3	96.7	88.0	⋯	-	⋯	271,511
⋮	⋮	⋮	⋮	⋮			⋮
2001	PRT	70.9	99.8	99.4	⋯	48.6	⋯	10,297,117
⋮	⋮	⋮	⋮	⋮			⋮
2021	USA	74.5	99.4	98.9	⋯	31.6	⋯	332,915,074
2021	UZB	69.6	92.6	51.4	⋯	13.4	⋯	33,935,765
⋮	⋮	⋮	⋮	⋮			⋮
2021	ZWE	56.5	-	-	⋯	65.8	⋯	15,092,171

Table 3. Final random coefficients by country.

Country	…	Brazil	…	World
M	…	17.0338796	…	17.0209741

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lemes, M.; Belfiore, P.; Fávero, L.P. A Case Study on Hierarchical Linear Models Applied to the UN’s Sustainable Development Goals (SDGs): A Perspective Using the World and Brazil’s Data. Sustainability 2023, 15, 8304. https://doi.org/10.3390/su15108304


