2.2. Basic Definitions and Notation
We summarized information on the educational attainment of parents and children through a 2 × 2 (correspondence) matrix, denoted by $P$. The rows of the matrix indicate the education level of the parents, whereas the columns represent the same variable for the children. In particular, $A$ denotes the event ‘to have at least one parent tertiary-educated’; $\bar{A}$, the complement of $A$, represents the event ‘to have neither parent tertiary-educated’; $B$ denotes the event ‘to have achieved tertiary education’, and $\bar{B}$, the complement of $B$, indicates the event ‘to have not achieved tertiary education’. Consequently, the columns display the percentage of children who have achieved tertiary education ($B$) or not ($\bar{B}$). Likewise, the rows indicate the percentage of parents (at least one) who have achieved tertiary education ($A$) or not ($\bar{A}$). We use $p$ to denote the percentage of children who have not achieved a tertiary education level, given that their parents have not achieved it either, $p = \Pr(\bar{B} \mid \bar{A})$. In the same way, $q$ represents the probability that a child has not achieved a tertiary education level given that he/she has at least one parent who has achieved it, $q = \Pr(\bar{B} \mid A)$. Formally:

$$P = \begin{pmatrix} p & 1-p \\ q & 1-q \end{pmatrix},$$

where the first row refers to $\bar{A}$ and the second to $A$, and the first column refers to $\bar{B}$ and the second to $B$.
This specification can be interpreted as a kind of transition matrix whose rows sum to one. In fact, it is a particular case of an $n \times n$ transition matrix with $n = 2$. Thus, starting from the definition of intergenerational mobility reported in [8], we consider two groups (parents and children) stratified according to their level of education, each consisting of two states (‘with tertiary education’ and ‘without tertiary education’). Again, the parents’ status is reported by rows, and the columns denote the children’s status. Based on these premises, the correspondence matrix can be interpreted as an intergenerational transition matrix, since we are analyzing the educational status of parents (at time $t - 1$) and children (at time $t$).
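As an illustration of how the conditional shares in the correspondence matrix arise, the following minimal Python sketch row-normalizes a 2 × 2 table of joint counts. The counts are hypothetical (not survey data) and the function name `transition_matrix` is illustrative only; rows are ordered as (neither parent tertiary-educated, at least one parent tertiary-educated) and columns as (child without, child with tertiary education), the same ordering as the Italian example below.

```python
def transition_matrix(counts):
    """Row-normalize a 2 x 2 table of joint counts into conditional shares.

    counts[r][c] is the number of children whose parental status is r and
    whose own status is c. Row 0: neither parent tertiary-educated;
    row 1: at least one parent tertiary-educated. Column 0: child without
    tertiary education; column 1: child with tertiary education.
    """
    return [[cell / sum(row) for cell in row] for row in counts]

# Hypothetical counts, for illustration only.
counts = [[800, 200],
          [150, 350]]
P = transition_matrix(counts)
p = P[0][0]  # share of children without tertiary education | neither parent tertiary
q = P[1][0]  # share of children without tertiary education | at least one parent tertiary
print(p, q)  # 0.8 0.3
```

Dividing each cell by its row total is exactly what makes every row sum to one, whatever the raw counts are.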
2.3. ‘Mobility’ and ‘Opportunities’: Two Sides of the Same Coin?
We assumed intergenerational mobility to be strongly related to income levels, and we thus divided the income distribution of both parents and children into $n$ quantiles. Moreover, here we do not require the matrix $P$ to be bistochastic (i.e., with both rows and columns summing to one), since a row-stochastic matrix fully satisfies this key assumption. The following example clarifies our definition. The matrix calculated for Italy is:

$$P_{\text{Italy}} = \begin{pmatrix} 0.861 & 0.139 \\ 0.323 & 0.677 \end{pmatrix}.$$
The upper-left value (86.1%) indicates the percentage of adults not achieving tertiary education among those from families where neither parent has achieved tertiary education; the remaining 13.9% have achieved tertiary education. Likewise, the bottom row refers to adults from families where at least one parent has achieved tertiary education: 32.3% of these children have not achieved tertiary education, and the remaining 67.7% have. Since the numbers in each row sum to one, $P$ is a row-stochastic matrix.
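The distinction between a row-stochastic and a bistochastic matrix can be checked directly on the Italian values. A minimal Python sketch; the helper names are illustrative, and the tolerance is an arbitrary choice meant to absorb rounding in the published percentages.

```python
import math

def is_row_stochastic(matrix, tol=1e-9):
    """True if all entries are non-negative and every row sums to one."""
    return all(min(row) >= 0 and math.isclose(sum(row), 1.0, abs_tol=tol)
               for row in matrix)

def is_bistochastic(matrix, tol=1e-9):
    """True if the matrix is row-stochastic and every column also sums to one."""
    columns = [list(col) for col in zip(*matrix)]
    return is_row_stochastic(matrix, tol) and is_row_stochastic(columns, tol)

# Italy, from the text. Rows: (neither parent, at least one parent tertiary);
# columns: (child without, child with tertiary education).
P_italy = [[0.861, 0.139],
           [0.323, 0.677]]
print(is_row_stochastic(P_italy))  # True
print(is_bistochastic(P_italy))    # False: the columns sum to 1.184 and 0.816
```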
Looking at the distribution of $q$, we find that the observed values range from 20.7% to 57.1%. Half of the countries lie in the 30–40% interval; Poland, Korea, and Israel have the lowest values (20.7%, 21.0%, and 21.3%, respectively). This means that only about 20% of children whose parents have tertiary education do not achieve it themselves, while the remaining 80% graduate. Austria (57.1%), Sweden (44.2%), and Estonia (40.0%) rank bottom.
Figure 1 compares the $p$ and $q$ values for all OECD countries, showing a moderate correlation between these two dimensions. Countries with the percentage of individuals whose educational status does not change with respect to their parents are on the diagonal. Without loss of generality, we assume $p > q$. Empirical data confirm this assumption: the means of $p$ and $q$ are 0.69 and 0.30, respectively, and $p$ and $q$ range between 0.497 and 0.878 and between 0.207 and 0.571, respectively, meaning that the majorization holds.
2.4. Composing Individual Indexes into a Summary Indicator
Literature on composite indicators in the social, economic, and environmental domains has consolidated over the last decades [34] and is expected to develop further thanks to the rising availability of digital data in every research field [35]. Multivariate indicators that collapse information into a single metric have attracted increasing attention in recent times [36], likely because they allow the performance of a given unit (e.g., a country or region) to be measured (and compared) over time and space in a fast and intuitive way [37]. A single number, rather than multiple measures, supports both political decision-making and public communication [38]. Although collapsing a multivariate set of information into a single number might hide some interesting aspects [39], in this kind of aggregation the advantages largely outweigh the disadvantages. Conceptually, aggregating the two indexes into a single measure aims to delineate a latent dimension of long-term sustainability in tertiary education [40]. In other words, we consider the outcomes of tertiary education in a given economic system to be sustainable (sensu [41]) if the two dimensions of ‘mobility’ and ‘opportunities’ reach the highest scores [42], i.e., if the system gives a student the best chance of achieving a satisfactory education level in the most favorable socioeconomic context [43]. The highest ranks of the composite indicator thus delineate a condition of long-term sustainability of the tertiary education system [44], in turn reflecting the intrinsic efficiency of the system itself [45].
With this perspective in mind, we investigated the appropriateness of different methodologies for aggregating the two elementary indexes introduced above into a composite index: (i) the arithmetic mean (EW); (ii) the Adjusted Mazziotta–Pareto index (AMPI); (iii–vi) four modifications thereof based on weighting systems that depend on the Gini coefficient or its reciprocal (GW, iG, GAMPI, and iGAMPI); and (vii) the geometric mean (GM). Ciommi et al. [46] provided a detailed description of these methodologies. We represent our data with a matrix $X = \{x_{ij}\}$, whose entry $x_{ij}$ represents the value of the $j$-th elementary indicator for the $i$-th country, with $j = 1, 2$ and $i = 1, 2, \ldots, 26$. As shown by [46], the following formula summarizes six of these methods (all except GM) for building the composite indicator for a given country $i$:

$$CI_i(\alpha, \beta) = \frac{\sum_{j=1}^{2} G_j^{\alpha}\, r_{ij}}{\sum_{j=1}^{2} G_j^{\alpha}} - \beta\, S_i\, cv_i,$$
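where $\alpha \in \{-1, 0, 1\}$, $\beta \in \{0, 1\}$, and

$$r_{ij} = 70 + 60\, \frac{x_{ij} - \mathrm{Min}_{x_j}}{\mathrm{Max}_{x_j} - \mathrm{Min}_{x_j}}$$

denotes the normalized indicators obtained according to the Mazziotta and Pareto method [47] (their values fall approximately in the interval (70; 130)); $\mathrm{Min}_{x_j} = \mathrm{Ref}_{x_j} - (\mathrm{Sup}_{x_j} - \mathrm{Inf}_{x_j})/2$ and $\mathrm{Max}_{x_j} = \mathrm{Ref}_{x_j} + (\mathrm{Sup}_{x_j} - \mathrm{Inf}_{x_j})/2$ are the lower and upper goalposts for $x_j$, respectively. $\mathrm{Sup}_{x_j}$ and $\mathrm{Inf}_{x_j}$ are the respective maximum and minimum values of the two indicators (denoted by $x_j$) across all OECD countries, whereas $\mathrm{Ref}_{x_j}$ denotes the reference value, that is, the average value of each indicator. $G_j$ represents the Gini index computed for indicator $x_j$ across all countries. Finally, $S_i$ and $cv_i$ denote, respectively, the standard deviation and coefficient of variation of the normalized indicators $r_{ij}$ of country $i$. The six indices were obtained by choosing different combinations of $\alpha$ and $\beta$. The simplest index, EW, is obtained by fixing $\alpha = 0$ and $\beta = 0$. The resulting index is

$$\mathrm{EW}_i = CI_i(0, 0) = \frac{r_{i1} + r_{i2}}{2},$$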
which represents the simple mean. With the simple mean, we assume that the two indexes have the same weight and thus the same importance. This is a reasonable requirement when we have no a priori information about the relative importance of the dimensions characterizing the phenomenon under investigation. The second method is the Adjusted Mazziotta–Pareto index [13], hereafter AMPI, computed as $CI_i(0, 1)$, that is, by assuming $\alpha = 0$ and $\beta = 1$. Thus, for a given country $i$ we have

$$\mathrm{AMPI}_i = M_i - S_i\, cv_i, \qquad M_i = \frac{r_{i1} + r_{i2}}{2}, \qquad cv_i = \frac{S_i}{M_i}.$$
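The normalization, EW, and AMPI steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the standard AMPI goalposts (reference value minus/plus half the range, with the reference value set to the indicator mean, as stated in the text) and the population standard deviation for $S_i$. The data matrix is hypothetical and has three countries instead of 26.

```python
import statistics

def ampi_normalize(column):
    """Mazziotta-Pareto normalization of one indicator across countries.

    Assumed goalposts: Ref -/+ Delta, where Ref is the indicator mean and
    Delta = (Sup - Inf) / 2. Ref maps to 100; values fall roughly in [70, 130].
    """
    sup, inf = max(column), min(column)
    ref = statistics.fmean(column)
    delta = (sup - inf) / 2
    low, high = ref - delta, ref + delta  # the two goalposts
    return [70 + 60 * (x - low) / (high - low) for x in column]

def normalize(X):
    """Normalize every column (indicator) of X; rows are countries."""
    columns = [ampi_normalize(list(col)) for col in zip(*X)]
    return [list(row) for row in zip(*columns)]

def ew(r_row):
    """EW: simple arithmetic mean of a country's normalized scores."""
    return statistics.fmean(r_row)

def ampi(r_row):
    """AMPI: mean minus the penalty S_i * cv_i (population std. deviation)."""
    m = statistics.fmean(r_row)
    s = statistics.pstdev(r_row)
    return m - s * (s / m)

# Hypothetical raw data: 3 countries (rows) x 2 indicators (columns).
X = [[1.0, 5.0],
     [2.0, 3.0],
     [3.0, 4.0]]
R = normalize(X)
print(R)                               # [[70.0, 130.0], [100.0, 70.0], [130.0, 100.0]]
print([round(ew(r), 2) for r in R])    # [100.0, 85.0, 115.0]
print([round(ampi(r), 2) for r in R])  # [91.0, 82.35, 113.04]
```

The first country has the same mean as a perfectly balanced country with scores (100, 100), but its unbalanced profile costs it 9 points under AMPI; this is the penalty for horizontal variability at work.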
As stressed by Mazziotta and Pareto, the AMPI belongs to the non-compensatory composite measures based on a penalty function ($S_i\, cv_i$). The starting point for constructing the AMPI is the arithmetic mean of the elementary indexes, which is then adjusted for the horizontal variability of the indexes themselves. The third method, GW, captures vertical inequality. It is a modification of the EW method where, instead of assigning the same weight to both indicators, we compute a weighted average of $r_{i1}$ and $r_{i2}$ with weights based on the Gini index of inequality. With this system of weights, we account for the distribution of the indicators [48]. Formally, we have

$$\mathrm{GW}_i = CI_i(1, 0) = \frac{G_1\, r_{i1} + G_2\, r_{i2}}{G_1 + G_2}.$$
GW weights unequal distributions more heavily [49]; therefore, if we believe that a more homogeneous distribution of an indicator should imply a higher weight for that indicator, we use the inverse Gini index as the weighting system. Thus, the definition of the iG index is as follows:

$$\mathrm{iG}_i = CI_i(-1, 0) = \frac{G_1^{-1}\, r_{i1} + G_2^{-1}\, r_{i2}}{G_1^{-1} + G_2^{-1}}.$$
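The contrast between Gini and inverse-Gini weights is easiest to see on a small example. The minimal Python sketch below assumes that the Gini index is computed on the raw indicator values with the mean-absolute-difference formula $\sum_i \sum_k |x_i - x_k| / (2 n^2 \bar{x})$; the indicator values and the country's normalized scores are hypothetical, and the helper names are illustrative.

```python
def gini(values):
    """Gini index: sum of |x_i - x_k| over all ordered pairs / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean)

def weighted_mean(r_row, weights):
    """Weighted average of a country's normalized scores."""
    return sum(w * r for w, r in zip(weights, r_row)) / sum(weights)

def gw(r_row, ginis):
    """GW: each indicator weighted by its Gini index."""
    return weighted_mean(r_row, ginis)

def ig(r_row, ginis):
    """iG: each indicator weighted by the reciprocal of its Gini index."""
    return weighted_mean(r_row, [1 / g for g in ginis])

# Hypothetical raw indicators (one list per indicator, one value per country).
x1 = [1.0, 2.0, 3.0]
x2 = [5.0, 3.0, 4.0]
ginis = [gini(x1), gini(x2)]   # 2/9 and 1/9: x1 is the more unequal indicator
r_row = [70.0, 130.0]          # hypothetical normalized scores of one country
print(round(gw(r_row, ginis), 2))  # 90.0: pulled toward the more unequal x1
print(round(ig(r_row, ginis), 2))  # 110.0: pulled toward the more homogeneous x2
```

With equal weights this country would score 100; GW and iG move it in opposite directions, which is why the choice between them should reflect a prior view on whether unequal or homogeneous indicators deserve more weight.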
The fifth and sixth methods combine GW with AMPI and iG with AMPI, respectively. In detail, for the former, the starting point is GW, penalized as in the AMPI method. The resulting method is the so-called GAMPI, defined as

$$\mathrm{GAMPI}_i = CI_i(1, 1) = \mathrm{GW}_i - S_i\, cv_i,$$

whereas, for the latter, iGAMPI, the starting point is iG, which we penalize according to the AMPI method. The resulting index is computed as follows:

$$\mathrm{iGAMPI}_i = CI_i(-1, 1) = \mathrm{iG}_i - S_i\, cv_i.$$

Finally, the last method is a fully non-compensatory method, namely the geometric mean. It is defined as

$$\mathrm{GM}_i = \sqrt{r_{i1}\, r_{i2}}.$$
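Because six of the seven methods differ only in the pair $(\alpha, \beta)$, they can be implemented by a single function, with the geometric mean added separately. The Python sketch below is a minimal illustration under stated assumptions: the penalty uses the unweighted population standard deviation $S_i$ and $cv_i = S_i / M_i$ for every penalized variant, and the normalized scores and Gini indexes are hypothetical. The names `composite`, `METHODS`, and `gm` are illustrative.

```python
import math
import statistics

def composite(r_row, ginis, alpha, beta):
    """CI_i(alpha, beta): Gini**alpha-weighted mean minus beta * S_i * cv_i.

    Assumption: the penalty uses the unweighted population standard
    deviation S_i and cv_i = S_i / M_i for every penalized variant.
    """
    weights = [g ** alpha for g in ginis]  # alpha = 0 gives equal weights
    weighted = sum(w * r for w, r in zip(weights, r_row)) / sum(weights)
    s = statistics.pstdev(r_row)
    cv = s / statistics.fmean(r_row)
    return weighted - beta * s * cv

# (alpha, beta) pairs for the six methods covered by the general formula.
METHODS = {"EW": (0, 0), "AMPI": (0, 1), "GW": (1, 0),
           "iG": (-1, 0), "GAMPI": (1, 1), "iGAMPI": (-1, 1)}

def gm(r_row):
    """GM: geometric mean of a country's normalized scores."""
    return math.prod(r_row) ** (1 / len(r_row))

# Hypothetical inputs: one country's normalized scores and two Gini indexes.
r_row = [70.0, 130.0]
ginis = [2 / 9, 1 / 9]
scores = {name: composite(r_row, ginis, a, b) for name, (a, b) in METHODS.items()}
scores["GM"] = gm(r_row)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

For this unbalanced country, every penalized variant (AMPI, GAMPI, iGAMPI) and the geometric mean score below the corresponding unpenalized mean, which is the non-compensatory behavior described in the text.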
The idea behind the use of a geometric mean is that if a country reaches the minimum value in one dimension [50], this component should not be compensated by a high performance in the second dimension, as occurs with the arithmetic mean [51,52,53]. A comparison of the corresponding rankings helps to identify the countries with the best and worst performance [54], delineating optimal aggregation rules for our data (e.g., [55]). Finally, we compare the ranking obtained using this aggregation rule with expenditure on educational institutions as a percentage of Gross Domestic Product (GDP). Following [56], this exercise aims to verify the possible relationship between input (expenditure side) and output (namely the ‘opportunities’ and ‘mobility’ dimensions characteristic of tertiary education).
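The text does not specify the statistic used to compare rankings. One common choice, assumed here purely for illustration, is Spearman's rank correlation. The minimal Python sketch below ranks countries (1 = highest value) and applies the no-ties formula $1 - 6\sum d^2 / (n(n^2 - 1))$; the composite scores and expenditure shares are hypothetical, and tied values would require average ranks, which this sketch does not handle.

```python
def ranks(values):
    """Rank positions, 1 = highest value (ties are not handled in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman(x, y):
    """Spearman rank correlation, 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical values for four countries, for illustration only.
composite_scores = [91.0, 82.4, 113.0, 101.5]
expenditure_pct_gdp = [1.1, 0.9, 1.6, 1.0]
print(ranks(composite_scores))                          # [3, 4, 1, 2]
print(spearman(composite_scores, expenditure_pct_gdp))  # 0.8
```

The same `spearman` helper can also compare the rankings produced by two aggregation methods, which is one way to quantify how sensitive the country ordering is to the aggregation rule.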