1. Introduction and Motivation
The purpose of the study described in the article was to assess social cohesion of the provinces of Poland in 2018. The study was based on classic metric data and symbolic interval-valued data. Interval-valued variables describe objects of interest more precisely than classical metric variables. For classical metric data an observation on each variable in a data matrix is expressed as one real number (atomic approach). In contrast, for symbolic interval-valued data, observations on each variable are expressed as intervals $[x_{ij}^{l}, x_{ij}^{u}]$ ($x_{ij}^{l} \le x_{ij}^{u}$), where $x_{ij}^{l}$ denotes the lower bound and $x_{ij}^{u}$ the upper bound of the interval. Studies by [1, 2] provide different examples of data that in real life are of interval type.
Social cohesion is often measured by means of various composite indicators. Duhaime et al. [3] attempt to measure the level of social cohesion in the Canadian Arctic using six sets of indices: presence of social capital, demographic stability, social and economic inclusion, community quality of life, individual quality of life. Based on the definition of social cohesion by Bernard [4] and Chan et al. [5], the VALCOS (VALeurs et COhésion Sociale) index of social cohesion was developed for European countries [6]. It covers the political and socio-cultural domains of life in their formal and substantial relations. Langer et al. [7] developed two social cohesion indices: a national average SCI and a Social Cohesion Index Variance-Adjusted (SCIVA) to measure the national level of social cohesion for 19 African countries in 2005, 2008 and 2012. The indices represent social cohesion as a triangle composed of the three components of societal relationships and attitudes: inequalities, trust, and identities. In the Polish literature, for example, [8] analysed social cohesion in EU countries using a synthetic measure of development put forward by Z. Hellwig [9]. There are a number of other publications that have proposed composite indicators of social cohesion (e.g., [10, 11]).
In some studies the index-based analysis is extended by the inclusion of additional statistical methods. Janmaat [12] considered 14 indicators describing 8 components of social cohesion and used exploratory factor analysis (EFA) to map 41 countries in the world in a two-dimensional space showing the relationship between two factors (solidarity, participation). In another study, Bottoni [13] used 24 indicators describing 7 dimensions of social cohesion for 29 European countries to build a multilevel CFA (confirmatory factor analysis) model of social cohesion. Dickes and Valentova [14] used multidimensional scaling (MDS) and a CFA model of social cohesion to map 47 European countries in a two-dimensional MDS space. The results served as the basis for comparing levels of social cohesion in six broader geographical regions in Europe. An analogous study involving the same methods (MDS, CFA) for 33 European countries was described in Dickes et al. [15]. Using factor analysis and standardisation, Rajulton et al. [16] created an overall index of social cohesion across 49 Census Metropolitan Areas of Canada based on three dimensions: political (voting and volunteering), economic (occupation, income, labour force participation) and social (social interactions, informal volunteering). Lafuente et al. [17] attempt to assess the sustainability of social cohesion in the EU using a nonlinear time-varying factor model and analysing the level of convergence across EU countries.
In Dehnel et al. [18] the level of social cohesion was assessed using a hybrid approach combining multidimensional scaling and linear ordering, applied separately to classic metric data and symbolic interval-valued data (1st and 3rd quartile); the results were then compared.
The novelty of the study presented in this article consists in jointly mapping, by means of multidimensional scaling, classic metric data and symbolic interval-valued data (three data types: min-max, 1st decile and 9th decile, 2nd decile and 8th decile) in one chart. In the next step, all objects (provinces) were ordered according to the level of social cohesion determined by means of an aggregate measure (composite indicator) based on the Euclidean distance from the pattern object. The application of a consistent research method ensures comparability of rankings of Poland’s provinces in terms of social cohesion. In the following step, assessments of social cohesion based on 4 different types of data were compared using two criteria: results of cluster analysis involving a distance based on two correlation coefficients (Spearman’s $\rho$ and Kendall’s $\tau$) and the analysis of the degree of compatibility between rankings of provinces based on individual variables and the overall ranking based on the aggregate measure.
It is worth noting that existing studies aimed at assessing the level of social cohesion are usually based on primary data from surveys. The use of these sources is associated with a certain limitation: primary data do not make it possible to include data for lower-level units (districts) when assessing the level of social cohesion in higher-level territorial units (e.g., provinces). Only secondary sources can provide this possibility by introducing interval-valued data into the analysis. The empirical study described in this article was based on secondary data from official statistical sources. They were obtained from the Local Data Bank (BDL) using the bdl R package [19] and its API (Application Programming Interface). BDL is the Polish acronym of the Local Data Bank (Bank Danych Lokalnych).
2. Overview of Social Cohesion Concepts
Social cohesion is a concept which is frequently mentioned in various projects and analyses, both in research and in government policies. The measurement of social cohesion and comparative analyses of its level in different territorial units are far from easy. This is because no clear definition or conceptualization of social cohesion has been proposed so far. There is still no consensus on how to define values and factors related to the construct of social cohesion. The complexity of the concept of social cohesion can be illustrated by the variety of approaches that can be found in the literature [4, 5, 7, 15, 20, 21, 22, 23, 24, 25, 26, 27]. All of them refer, to a varying degree, to six dimensions: social relations, identification, orientation towards the common good, shared values, quality of life, and (in)equality, though it is often indicated that the last three are antecedents or consequences of social cohesion rather than its core dimensions [28]. Although various approaches involve different areas of social cohesion, refer to different political views, are informed by different ideologies or concerns of policy makers, the majority of them overlap, covering similar dimensions [29] (p. 23). Nonetheless, each of the approaches proposed in the literature can be viewed as representing one of two discourses: the academic or the policy discourse [5].
The first category, academic discourse, is closely connected with social sciences, such as sociology and psychology. Studies referring to the academic discourse focus on processes of social integration and stability and social exclusion, while ignoring dilemmas associated with the definition of social cohesion [30, 31, 32].
The second category, policy discourse, refers to policies undertaken by governments and various national and international institutions (the European Union, Council of Europe, World Bank or OECD). In this case, social cohesion is viewed as a prerequisite of economic well-being. This goal can be achieved by overcoming numerous economic and social problems resulting from unequal income distribution, employment, housing issues, limited access to health care and education, participation in political and public life. Thus policy discourse can be described as problem-driven [5, 15]. The list of major problems includes unemployment, poverty and social exclusion.
Policy-oriented studies are initiated and conducted by many national and international socio-political entities (governments, think tanks, foundations, organizations). Policy-makers usually address problems of social cohesion with a focus on their own concerns and from particular policy fields. Sometimes they even try to use the term social cohesion to promote their own agendas. As a result, policy-oriented analyses need to be treated with caution. For one thing, such studies often include too many social indicators, many of which do not properly capture components of social cohesion. Moreover, because territorial units face a complex mixture of socio-economic problems, every entity conducting research tends to create its own definition of social cohesion. Another problem is associated with the lack of distinction between factors affecting social cohesion and its actual components [5, 13, 15, 28].
Both types of discourse on social cohesion described above are multi-faceted, but in each case the concept of social cohesion is different. The academic discourse is focused on a conceptual and analytic understanding of social cohesion. The policy discourse is rather social and economic problem-oriented. This is the main cause of existing discrepancies in measurement, and consequently in the assessment of the level of social cohesion. An additional difficulty of the academic approach is that it is largely based on subjective, qualitative assessments. The need to standardize the measurement method used to monitor the level and changes in social cohesion across time and societies is not new. However, the development of universal, assessable indicators would require the establishment of a single definitional framework of social cohesion. It should be based on the core elements of existing approaches to social cohesion presented and systematically mentioned in the literature of the past decades. Most approaches proposed so far refer to the policy discourse, which focuses on all types of social challenges faced by society [5, 15, 32]. A synthetic review of key approaches representing both the policy and academic discourse is presented below.
One of the first and most frequently cited concepts of social cohesion was formulated by Jane Jenson and was based on results of her own studies [20]. It represents the perspective found in the policy discourse. Rather than being a single definition, it consists of five dimensions:
Belonging vs. isolation—which refers to the existence or lack of shared values and a sense of identity,
Inclusion vs. exclusion—which refers to equal opportunities and citizens’ access to economic institutions and the market,
Participation vs. non-involvement—which refers to political and social participation at various levels of government, especially the local level,
Recognition vs. rejection—which refers to respect and tolerance of diversity in a diverse society,
Legitimacy vs. illegitimacy—which addresses the question of respect for existing social norms and laws, the legitimacy of the main political and social institutions, especially the state, as mediators between different stakeholders.
Bernard [4] developed Jenson’s definition by adding another dimension, equality vs. inequality, as an important element of the economic sphere. He introduced a typology based on two aspects. The first one comprised spheres of activity undertaken by society (economic, political and socio-cultural), while the second referred to social relations in the strict sense [6].
In 2002, Beauvais and Jenson [22], drawing on Jenson’s pluralistic approach [20], put forward a definition of social cohesion consisting of five elements: 1. Common values and a civic culture, 2. Social order and social control, 3. Social solidarity and reduction in wealth disparities, 4. Social networks and social capital, 5. Place attachment and identity [5]. The authors indicate that the way social cohesion is defined should depend on the main aspects investigated by the researcher, on a specific policy and should also address the most urgent social problems faced by territorial units, such as unemployment, poverty, discrimination, exclusion or any other problems that the researcher deems relevant [5].
In the literature a lot of attention has been paid to concepts of social cohesion that are defined in terms of instruments enabling the achievement of cohesion, which is known as the means-end approach. In this case, the definition contains conditions that need to be met in order to ensure social cohesion [15]. Such concepts, representing the political approach, have been proposed, among others, by Berger-Schmitt [33, 34] and Noll [35]. They demonstrated that social cohesion consists of two analytically different dimensions: inequality and social capital. The first refers to the question of promoting equal opportunities and reducing disparities and inequalities. The goal of the second one involves the strengthening of social relations, interactions and ties and comprises all aspects generally regarded as social capital [34]. The concepts have been criticized for defining social cohesion in terms of conditions that can foster its development. An alternative proposal was advanced by Duhaime [5], who identified two components of social cohesion: access to formal economic and governmental institutions and access to family and community-based, face-to-face relations. Indicators identified in this concept (in line with the means-end approach) in many cases made reference to conditions favouring the development of social cohesion [3].
Another concept worth mentioning was formulated in 2006 by Chan et al. [5]. Following Bernard’s definition, social cohesion was described as “a state of affairs concerning both vertical and the horizontal interactions among members of society as characterized by a set of attitudes and norms that includes trust, sense of belongingness and the willingness to participate and help as well as their behavioural manifestations” [5]. While retaining the political and sociocultural sphere of social cohesion, the authors decided to exclude the economic dimension, arguing for a minimalist definition which ignores all characteristics regarded as factors or determinants of social cohesion, such as equality of opportunities, equality and social integration [6]. It should be emphasized that both Bernard [4] and Chan et al. [5] assert that social cohesion is a property of a group or society, not an individual. This means that even if it is measured at individual level, ultimately such data are aggregated and social cohesion is described at the level of different groups, regions or communities [15].
Following the studies of Bernard [4] and Chan et al. [5], Dickes et al. [15] and Dickes and Valentova [14] proposed their own definitions of social cohesion, which also did not account for the economic dimension. Four elements of social cohesion were distinguished: institutional trust (i.e., legitimacy vs. illegitimacy), solidarity and concern for the common good (i.e., acceptance vs. rejection), political participation and socio-cultural participation.
When one analyses the conceptualisation of social cohesion, it is possible to track the direction of changes which reflect the increasing role of socio-cultural and political indicators and the omission of the economic sphere. In contrast, a similar review of social cohesion research reveals that the emphasis is shifting towards spatial analysis of social cohesion, which takes into account not only the national but also regional and local level [36, 37, 38, 39]. Moreover, a given society’s level of social cohesion can only be properly assessed when it can be compared across territorial units and over time. This can be achieved using methods of measuring social cohesion applied in the EU or proposed by OECD (Organisation for Economic Cooperation and Development). In the case of studies relating to countries and regions of the European Union, the EU regional Social Progress Index (EU-SPI) has been used since 2016 (https://ec.europa.eu/regional_policy/en/information/maps/social_progress). For purposes of the index, social progress is defined as “a society’s capacity to meet the basic human needs of its citizens, to establish the basis for people and communities to improve and sustain their quality of life and to create the conditions for people to reach their full potential” [40] (p. 91). The EU-SPI is calculated on the basis of variables representing the socio-cultural and political dimension. The economic dimension is deliberately excluded, which facilitates the assessment of the level of social cohesion, as economic indicators make it difficult to distinguish between causes and effects of this level in the final analysis. The EU-SPI is consistent with the overall framework of the global Social Progress Index and is based on fifty indicators, primarily from Eurostat. It covers three dimensions of social progress [41]:
basic human needs (nutrition and basic medical care, water and sanitation, shelter housing, personal safety),
foundations of well-being (access to basic knowledge, access to information and communication, health and wellness, environmental quality),
opportunity (personal rights, personal freedom and choice, tolerance and inclusion, access to advanced education).
Given the nature of statistical (symbolic interval-valued) data used in the analysis, the empirical research described in the article was based on the approach adopted in studies conducted by EU countries using the EU-SPI. In our study this concept was applied at a lower level of spatial aggregation, namely at province level. These territorial units (Pol. województwo) vary considerably in economic and social terms, reflecting different historical developments in three parts of Poland annexed by three neighbouring countries in the 18th century. As a result, western provinces are generally characterised by a higher level of economic development than the ones in the eastern part of the country. The purpose of the analysis was to determine whether this differentiation is reflected in the assessment of the level of social cohesion.
3. Research Methodology
The assessment of social cohesion in the provinces of Poland was performed using four datasets: classic metric data and interval-valued data (three types: min-max, 1st and 9th deciles, 2nd and 8th deciles) by applying a hybrid approach involving multidimensional scaling and linear ordering. Multidimensional scaling made it possible to map 4 datasets describing 16 provinces onto a two-dimensional space; and then results of linear ordering were used to compare rankings of provinces in terms of social cohesion.
The research methodology is a modified approach proposed by Walesiak [42] and Dehnel et al. [18]. The research procedure, which makes it possible to present four types of data in one study, consists of the following steps:
1. Select a complex phenomenon that cannot be measured directly. In this study, it was the level of social cohesion.
2. Select a set of objects and a set of variables closely related with the complex phenomenon of interest. The study involves measuring characteristics of $n$ objects ($i = 1, \ldots, n$ is the object number) described by means of $m$ variables ($j = 1, \ldots, m$ is the variable number). Collected information comprises classic metric data and three types of interval-valued data (min-max, 1st and 9th deciles, 2nd and 8th deciles). Metric data converted into interval-valued data are arranged in the form of a data table of dimension $n \times m \times 2$. The three types of interval-valued data (min-max, 1st and 9th deciles, 2nd and 8th deciles) are arranged in three data tables, each of dimension $n \times m \times 2$.
3. Combine the data in the form of a single data table of dimension $4n \times m \times 2$ containing the four data tables.
4. Add a pattern and anti-pattern object to the set of objects. Variables of interest can be divided into three types of preference variables: stimulants (where higher values are preferred), destimulants (where lower values are preferred), nominants (where the preferred value lies somewhere within the variable range). Formal definitions of stimulants and destimulants can be found in [43] (p. 48) while nominants are defined in [44] (p. 118). These definitions are also provided in [42]. Owing to the structure of the anti-pattern object nominants need to be converted into stimulants. Coordinates of the pattern object represent the most favourable values of preference variables (maximum values for stimulants and minimum values for destimulants). Coordinates of the anti-pattern object represent the least favourable values (minimum values for stimulants and maximum values for destimulants). In the case of symbolic interval-valued variables, coordinates are calculated separately for the lower and upper value of the interval. After including the pattern and anti-pattern object, the joint data table has dimension $(4n + 2) \times m \times 2$.
5. Normalise interval-valued variables and arrange the data in the form of a normalised data table of dimension $(4n + 2) \times m \times 2$, whose entries $[z_{ij}^{l}, z_{ij}^{u}]$ are the normalised observations for symbolic interval-valued variables. The purpose of normalization is to ensure comparability of variables (cf. [45]). This is achieved by removing units from measurement results and standardizing their orders of magnitude. Symbolic interval-valued data require special normalization treatment. The lower and upper bound of the interval of the $j$-th variable for $4n + 2$ objects ($4n$ objects for 4 types of data, pattern and anti-pattern) are combined into one vector containing $2(4n + 2)$ observations. This approach makes it possible to apply normalization methods used for classic metric data. Metric data were normalized using the interval_normalization function from the clusterSim package implemented in the R program [46].
Normalisation methods can be represented by the following formula (cf. [47]):
$$z_{ij} = \frac{x_{ij} - A_j}{B_j},$$
where: $x_{ij}$ is the value of the $j$-th variable for the $i$-th object, $z_{ij}$ is the normalised value, $A_j$ is the shift parameter and $B_j$ is the scale parameter for the $j$-th variable.
6. Select a measure of distance for symbolic interval-valued data (see Table 2), calculate distances and arrange them into a distance matrix $[\delta_{ik}]$.
7. Conduct multidimensional scaling (MDS): $f: \delta_{ik} \to d_{ik}$ for all pairs $(i, k)$, where $f$ denotes distance mapping from $m$-dimensional space $\delta_{ik}$ into corresponding distances $d_{ik}$ in $q$-dimensional space ($q < m$). To enable graphic presentation of results $q$ is set to 2. Distances $d_{ik}$ are unknown. The iterative procedure, implemented in the smacof algorithm, used to find configuration $\mathbf{V}$ (given $q$ dimensions) and calculate distance matrix $[d_{ik}]$, is presented in [50] (pp. 204–205).
The solution used in the study makes it possible to select an optimal procedure of multidimensional scaling (MDS) for a given normalization method (Table 1), distance measure for symbolic interval-valued data (Table 2) and scaling models (ratio, interval, spline: polynomial function of second and third degree), available in the mdsOpt R package [46], which uses the smacofSym function from the smacof package [51]. Two criteria were used to choose the optimal MDS procedure: the value of Kruskal’s $\mathit{Stress}\text{-}1$ goodness-of-fit function and the Hirschman-Herfindahl Index ($HHI$), calculated for percentage shares of objects in the value of the $\mathit{Stress}\text{-}1$ function (stress per point). Out of MDS procedures for which $\mathit{Stress}\text{-}1_p \le s$ ($s$ is the acceptable value of the goodness-of-fit measure), we select one for which $HHI_p$ is the lowest ($p$ is the number of the MDS procedure). More information about the selection of the optimal MDS procedure can be found in the mdsOpt package vignette.
8. In the end, as a result of applying multidimensional scaling, we obtain a two-dimensional data matrix $\mathbf{V} = [v_{ij}]$ of dimension $(4n + 2) \times 2$. Depending on the location of the pattern and anti-pattern object in the two-dimensional scaling space the coordinate system needs to be rotated by an angle of $\varphi$ according to the formula:
$$[v'_{ij}] = [v_{ij}] \times \mathbf{D},$$
where:
$[v'_{ij}]$ is the data matrix in a two-dimensional scaling space after rotating the coordinate system by an angle of $\varphi$,
$\mathbf{D}$ is the rotation matrix.
The rotation does not change the arrangement of objects relative to one another but makes it possible to position the set axis connecting the pattern and anti-pattern along the identity line, which improves the visualization of results.
9. Visualise and interpret the results (of multidimensional scaling) in a two-dimensional space. This is done by first joining two points, representing the anti-pattern and pattern, by a straight line to form the so-called set axis in the diagram. Then isoquants of development (curves of equal development) are drawn from the pattern point. Objects located between the isoquants represent a similar level of development. The same level can be achieved by objects located at different points along the same isoquant of development (due to a different configuration of variable values).
10. Order objects according to the value of the aggregate measure $d_i$ based on the Euclidean distance from the pattern object [43]:
$$d_i = 1 - \frac{\sqrt{\sum_{j=1}^{2} (v_{ij} - v_{+j})^2}}{\sqrt{\sum_{j=1}^{2} (v_{+j} - v_{-j})^2}}, \tag{3}$$
where:
$v_{ij}$ is the $j$-th coordinate for the $i$-th object in the two-dimensional MDS space,
$v_{+j}$ ($v_{-j}$) is the $j$-th coordinate for the pattern (anti-pattern) object in the two-dimensional MDS space.
Values of the aggregate measure $d_i$ belong to the interval $[0, 1]$. The higher the value of $d_i$, the higher the social cohesion of the objects. The objects are arranged according to descending values of the aggregate measure (3).
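The rotation and ordering steps that close the procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used in the study (which relies on R packages): the coordinates are made up, the function names `rotate`, `align_set_axis` and `aggregate_measure` are hypothetical, and the rotation angle is chosen so that the set axis becomes parallel to the identity line.

```python
from math import atan2, cos, dist, pi, sin

def rotate(points, phi):
    """Rotate 2-D points counter-clockwise about the origin by phi radians."""
    c, s = cos(phi), sin(phi)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def align_set_axis(points, pattern, anti_pattern):
    """Rotate all points so the anti-pattern -> pattern axis has a 45-degree slope."""
    dx = pattern[0] - anti_pattern[0]
    dy = pattern[1] - anti_pattern[1]
    phi = pi / 4 - atan2(dy, dx)  # angle needed to reach the identity-line direction
    return rotate(points, phi), phi

def aggregate_measure(point, pattern, anti_pattern):
    """d_i = 1 - dist(object, pattern) / dist(pattern, anti-pattern)."""
    return 1.0 - dist(point, pattern) / dist(pattern, anti_pattern)

# Made-up MDS coordinates (assumption): anti-pattern, pattern and three objects.
anti_pattern, pattern = (0.0, 0.0), (4.0, 0.0)
objects = {"A": (3.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 1.5)}

names = list(objects)
rotated, phi = align_set_axis(
    [anti_pattern, pattern] + [objects[k] for k in names], pattern, anti_pattern
)
rot_anti, rot_pattern = rotated[0], rotated[1]
rot_objects = dict(zip(names, rotated[2:]))

# Rotation preserves distances, so d_i is the same before and after it.
scores = {k: aggregate_measure(p, rot_pattern, rot_anti) for k, p in rot_objects.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # descending d_i
print(ranking)
```

Because a rotation is an isometry, it affects only the picture and not the ranking; the sketch computes the measure on rotated coordinates merely to make that point visible.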
5. Results of the Empirical Study
5.1. Results for Metric and Interval-Valued Data
In line with the procedure described in Section 3, four types of data (classic metric data and three types of symbolic interval-valued data: min-max, 1st and 9th deciles, 2nd and 8th deciles) were mapped into a two-dimensional space and then rankings of provinces in Poland were compared in terms of social cohesion.
Data on social cohesion in the 16 provinces of Poland, described by 25 variables, were arranged in a data matrix of dimension $16 \times 25$. Because the data were to be combined with symbolic interval-valued data, they had to be put in a data table of dimension $16 \times 25 \times 2$.
Poland has a three-tier system of administrative division, consisting of 16 provinces (Pol. województwo), 380 districts (Pol. powiat) and 2477 communes (Pol. gmina). In order to obtain symbolic interval-valued data, classic metric data on social cohesion in 380 districts described by 25 variables were aggregated at province level. The lower and upper bound of the interval for each variable in each province was obtained by calculating the minimum and maximum, 1st and 9th deciles, 2nd and 8th deciles, using district-level data. Interval-valued data (min-max, 1st and 9th deciles, 2nd and 8th deciles) were arranged in three data tables, each of dimension $16 \times 25 \times 2$.
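The aggregation of district-level values into province-level intervals can be illustrated with a short Python sketch. It is only an illustration under assumptions: the district values are invented, the function name `interval_bounds` is hypothetical, and the decile definition (the "inclusive" method of `statistics.quantiles`) is a choice made here, since the text does not state which quantile estimator was used.

```python
from statistics import quantiles

def interval_bounds(values, kind="min-max"):
    """Return (lower, upper) for one variable within one province."""
    if kind == "min-max":
        return min(values), max(values)
    # quantiles(..., n=10) returns the nine decile cut points D1..D9
    deciles = quantiles(values, n=10, method="inclusive")
    if kind == "d1-d9":
        return deciles[0], deciles[8]
    if kind == "d2-d8":
        return deciles[1], deciles[7]
    raise ValueError(f"unknown interval type: {kind}")

# Hypothetical values of one variable for the districts of one province
district_values = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
for kind in ("min-max", "d1-d9", "d2-d8"):
    print(kind, interval_bounds(district_values, kind))
```

The three interval types nest inside one another, which is why the interval width grows from the 2nd and 8th deciles to the minimum and maximum.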
The four data tables were combined into one data table of dimension $64 \times 25 \times 2$. After adding the pattern and anti-pattern object the final dataset was a data table of dimension $66 \times 25 \times 2$.
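Before scaling, the joint table is normalised variable by variable and converted into a distance matrix, as described in Section 3. The Python sketch below illustrates this on a toy table. It rests on assumptions: the data are invented, the function names are hypothetical, normalisation n5 is taken to be $(x - \bar{x}) / \max|x - \bar{x}|$ computed on the pooled lower and upper bounds, and the Euclidean Hausdorff distance is taken to be the square root of the summed squared per-variable Hausdorff distances. The study itself uses the clusterSim and mdsOpt R packages.

```python
from math import sqrt

def normalise_n5(lower, upper):
    """Normalise one interval-valued variable on the pooled bounds (assumed n5 form)."""
    combined = list(lower) + list(upper)  # one vector of 2 * (number of objects) values
    mean = sum(combined) / len(combined)
    scale = max(abs(x - mean) for x in combined)
    if scale == 0:
        raise ValueError("variable is constant; n5 is undefined")
    z = [(x - mean) / scale for x in combined]
    k = len(lower)
    return z[:k], z[k:]

def euclidean_hausdorff(a, b):
    """a, b: lists of (lower, upper) intervals, one per variable."""
    return sqrt(sum(max(abs(al - bl), abs(au - bu)) ** 2
                    for (al, au), (bl, bu) in zip(a, b)))

# Hypothetical toy table: 3 objects x 2 variables, each cell is (lower, upper).
table = [
    [(1.0, 3.0), (10.0, 20.0)],
    [(2.0, 6.0), (20.0, 40.0)],
    [(4.0, 8.0), (30.0, 60.0)],
]
n_obj, n_var = len(table), len(table[0])

normalised = [[None] * n_var for _ in range(n_obj)]
for j in range(n_var):
    lo = [table[i][j][0] for i in range(n_obj)]
    up = [table[i][j][1] for i in range(n_obj)]
    z_lo, z_up = normalise_n5(lo, up)
    for i in range(n_obj):
        normalised[i][j] = (z_lo[i], z_up[i])

distances = [[euclidean_hausdorff(normalised[i], normalised[k]) for k in range(n_obj)]
             for i in range(n_obj)]
print(distances)
```

Pooling both bounds before normalising keeps every interval inside the same range and preserves the order of lower and upper bounds.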
The optimal scaling procedure was selected after testing combinations of seven normalization methods (n1, n2, n3, n3a, n5, n5a, n12a; see Table 1), four distance measures for interval-valued data (Ichino-Yaguchi, Euclidean Ichino-Yaguchi, Hausdorff, Euclidean Hausdorff; see Table 2) and four MDS models (ratio, interval, polynomial function of second and third degree), yielding a total of 112 MDS procedures. Values of Kruskal’s
STRESS-1 belong to the interval
. Of the MDS procedures for which the value of $\mathit{Stress}\text{-}1$ did not exceed the acceptable value of the goodness-of-fit measure (calculated as a median), we selected the combination (using the optSmacofSymInterval function from the mdsOpt R package) with the lowest value of the $HHI$ index. This procedure involves normalisation n5 (normalisation to the $[-1, 1]$ range), the scaling model based on a polynomial function of the 3rd degree and the Euclidean Hausdorff distance. For this MDS procedure
.
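The size of the search grid and the two-criterion selection rule can be sketched in Python as follows. The stress values and stress-per-point shares are invented for illustration, and the names `hhi` and `select_procedure` are hypothetical; the study itself uses the optSmacofSymInterval function from the mdsOpt R package.

```python
from itertools import product
from statistics import median

normalisations = ["n1", "n2", "n3", "n3a", "n5", "n5a", "n12a"]
distances = ["Ichino-Yaguchi", "Euclidean Ichino-Yaguchi",
             "Hausdorff", "Euclidean Hausdorff"]
models = ["ratio", "interval", "polynomial degree 2", "polynomial degree 3"]

grid = list(product(normalisations, distances, models))
print(len(grid))  # 7 * 4 * 4 combinations

def hhi(stress_per_point):
    """Hirschman-Herfindahl index of the percentage shares of stress per point."""
    total = sum(stress_per_point)
    return sum((100.0 * s / total) ** 2 for s in stress_per_point)

def select_procedure(results):
    """Keep procedures with Stress-1 <= median, then take the lowest HHI."""
    threshold = median(r["stress1"] for r in results)
    acceptable = [r for r in results if r["stress1"] <= threshold]
    return min(acceptable, key=lambda r: hhi(r["spp"]))

# Made-up results for three procedures (assumption, not values from the study).
results = [
    {"id": 1, "stress1": 0.05, "spp": [10.0, 10.0, 10.0, 70.0]},
    {"id": 2, "stress1": 0.08, "spp": [25.0, 25.0, 25.0, 25.0]},
    {"id": 3, "stress1": 0.20, "spp": [25.0, 25.0, 25.0, 25.0]},
]
best = select_procedure(results)
print(best["id"])
```

In this toy case procedure 1 has the lowest stress, but its misfit is concentrated in a single object, so the rule prefers procedure 2, whose stress is spread evenly across objects.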
The Shepard diagram (Figure 1a) and the Stress Plot (Figure 1b) confirm the correctness of the selected multidimensional scaling procedure.
Figure 2 shows results of multidimensional scaling of 16 provinces of Poland for four types of data according to the level of social cohesion in 2018. In the diagram the anti-pattern (AP) object and the pattern (P) object are connected by a straight line, known as the set axis. Six isoquants of development (curves of equal development) were arbitrarily identified, which divided the set axis into six equal parts. Isoquants located further away from the pattern object represent a lower level of social cohesion.
The results made it possible to assess the level of social cohesion in the provinces using four types of data simultaneously. One thing worth noting in Figure 2 is the arrangement of provinces in relation to the set axis. Sets of provinces based on symbolic interval-valued data are located increasingly further away from the set based on metric data as the width of the interval for each set increases (2nd and 8th deciles, 1st and 9th deciles, minimum and maximum).
Table 4 shows a ranking of 16 provinces for four types of data according to the level of social cohesion in 2018. Calculations were made using the clusterSim package.
Moreover, the dispersion of provinces, measured by the standard deviation and median absolute deviation (Table 4), increases as one moves from results based on metric data to those based on symbolic interval-valued data (with increasing interval width). The width of each set of provinces with respect to the set axis increases as its interval width increases (it is the smallest for metric data, where the lower and upper bounds coincide, and largest for symbolic interval-valued data comprising 100% of observations, from minimum to maximum).
5.2. Comparative Analysis of the Results in the Assessment of Social Cohesion
The rankings of provinces of Poland according to the level of social cohesion were compared on the basis of the aggregate measure $d_i$ for four types of data (see Table 4): metric data, symbolic interval-valued data comprising 2nd and 8th decile (60% of observations), 1st and 9th decile (80% of observations) as well as minimum and maximum (100% of observations).
The results of the assessment of social cohesion were compared using two criteria. The first one was based on coefficients of correlation (Spearman’s $\rho$ and Kendall’s $\tau$) between aggregate measures for the four types of data in order to determine similarities and differences between different rankings of provinces according to social cohesion. The second criterion was the degree of compatibility between rankings of provinces based on individual variables and that based on the aggregate measure for the four types of data. The results of this comparison were used to choose the ranking providing the best reflection of the level of social cohesion in the provinces of Poland. The reason why this second criterion was used is that the overall ranking is the result of rankings obtained for individual variables.
Spearman’s $\rho$ and Kendall’s $\tau$ coefficients of correlation between aggregate measures $d_i$ calculated for the four types of data are shown in Table 5.
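For readers who want to reproduce this kind of comparison, the two coefficients can be computed with a few lines of Python. The sketch assumes there are no tied values (so the simple textbook formulas apply), uses invented aggregate-measure values, and the function names are hypothetical.

```python
def ranks(values):
    """Rank 1 goes to the largest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_rho(x, y):
    """Spearman's rho from squared rank differences (no-ties formula)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    """Kendall's tau as (concordant - discordant) / number of pairs (no ties)."""
    n = len(x)
    s = 0
    for i in range(n):
        for k in range(i + 1, n):
            prod = (x[i] - x[k]) * (y[i] - y[k])
            s += (prod > 0) - (prod < 0)
    return 2 * s / (n * (n - 1))

# Invented aggregate-measure values for four objects under two data types.
d_metric = [0.9, 0.7, 0.5, 0.3]
d_interval = [0.8, 0.9, 0.4, 0.1]
print(spearman_rho(d_metric, d_interval), kendall_tau(d_metric, d_interval))
```

With real data that contain ties, a library routine that applies tie corrections would be the safer choice.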
For the purpose of cluster analysis, correlation coefficients were converted into distances:
and
. Cluster analysis was used to identify similarities and differences in the rankings of provinces according to the level of social cohesion based on the aggregate measure
$d_i$. The results were presented in the form of dendrites (Figure 3), following the Wrocław taxonomic method [52].
Results of the symbolic interval-valued approach differ considerably from the approach based on metric data. Assessments of social cohesion obtained using metric data are closest to results obtained for symbolic interval-valued data comprising 2nd and 8th decile.
The next step involved analysing the degree of compatibility between rankings of provinces based on individual variables and that based on the aggregate measure for the four types of data, using the following procedure:
1. The 16 provinces of Poland for 4 datasets (metric, min-max, 1st and 9th decile, 2nd and 8th decile) are linearly ordered according to a set of $m$ variables to produce 4 rankings based on aggregate measures $d_i$ (see Table 4).
2. For each variable ($j = 1, \ldots, m$) a distance between each object and the pattern object is calculated according to the formula (the Ichino-Yaguchi distance for one variable):
where:
(
) interval (
, min-max, 1st and 9th decile, 2nd and 8th decile);
interval length;
;
;
(
) interval of the pattern (anti-pattern) object for
j-th variable.
This yields m values of measures
The general rankings based on the aggregate measures (step 1) are compared with individual rankings based on measures (step 2), separately for each data type using Spearman’s and Kendall’s correlation coefficients.
For each data type, the median of results obtained in step 3 is calculated. A higher value of the median represents a higher degree of compatibility between rankings based on individual variables and the ranking based on the aggregate measure. The results are displayed in
Table 6.
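The procedure above can be sketched as follows. The single-variable Ichino-Yaguchi distance is written here in its standard textbook form, |A ⊕ B| − |A ∩ B| + γ(2|A ∩ B| − |A| − |B|), where ⊕ is the smallest interval covering both A and B; the value γ = 0.5, the ranking direction (rank 1 = closest to the pattern), the function names and the sample data are all assumptions made for illustration and may differ from the variant used by the authors. Only Spearman’s coefficient is shown; Kendall’s would be handled analogously.

```python
from statistics import median


def ichino_yaguchi(a, b, gamma=0.5):
    """Single-variable Ichino-Yaguchi distance between intervals a=(lo, hi), b=(lo, hi)."""
    join = max(a[1], b[1]) - min(a[0], b[0])            # length of the covering interval
    meet = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))  # length of the overlap
    return join - meet + gamma * (2 * meet - (a[1] - a[0]) - (b[1] - b[0]))


def ranking(scores):
    """Rank 1 = smallest score (closest to the pattern); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        rank[i] = position
    return rank


def spearman(rank_a, rank_b):
    """Spearman's coefficient for two tie-free rankings."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def compatibility(data, pattern, aggregate_rank):
    """Median correlation between single-variable rankings and the aggregate ranking.

    data[i][j] is the interval of object i on variable j,
    pattern[j] is the interval of the pattern object on variable j.
    """
    correlations = []
    for j in range(len(pattern)):
        distances = [ichino_yaguchi(obj[j], pattern[j]) for obj in data]
        correlations.append(spearman(ranking(distances), aggregate_rank))
    return median(correlations)


# Hypothetical example: 4 objects, 2 interval-valued variables.
data = [
    [(8, 10), (7, 9)],
    [(5, 9), (6, 8)],
    [(2, 6), (6, 9)],
    [(1, 3), (2, 4)],
]
pattern = [(9, 10), (8, 9)]
aggregate_rank = [1, 2, 3, 4]  # assumed ranking from the aggregate measure
score = compatibility(data, pattern, aggregate_rank)
```

For metric data each interval degenerates to a single point, in which case the distance above reduces to the absolute difference between the two values.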
The greatest degree of compatibility between the rankings of objects based on individual variables and the ranking created using the aggregate measure was obtained for intervals based on the 2nd and 8th deciles. The use of deciles is an example of a robust approach, since it helps to eliminate the effect of outliers.
The ranking of provinces obtained on the basis of metric data does not account for the variation in the level of social cohesion between the districts of a given province (lower level units). Mazowieckie (16) and Podlaskie (15) are characterised by a large disparity in the level of social cohesion between the leading district (Warsaw and Białystok, respectively) and the remaining ones. The leading districts have a strong impact on the province-level values of the variables (metric data) and, consequently, lead to a more favourable assessment of social cohesion (see Table 4, columns 3 and 4). When metric data are replaced with symbolic interval-valued data, which account for the variation between districts with respect to the variables of interest, these two provinces are located much closer to the anti-pattern (see Figure 2). A switch from metric data to interval-valued data based on the 2nd and 8th deciles leads to the biggest decline in the values of the aggregate measure for Mazowieckie and Podlaskie. This is also reflected in the ranking drops registered by these provinces: down 9 places (Mazowieckie) and down 5 places (Podlaskie).
The results of the study indicate that the differences in the rankings depend on the degree of variation between districts within a given province. The biggest changes in the ranking of provinces after switching from metric to interval-valued data were observed for those objects (provinces) that were characterised by the greatest variation among districts in terms of the study variables (see Figure 4). The use of interval-valued data based on the 2nd and 8th deciles made it possible to eliminate the impact of outliers. Provinces in which districts were not very different from one another occupied the same or a similar position in both rankings (see Figure 5).
Figure 6 shows the spatial distribution of Polish provinces in 2018 in terms of the level of social cohesion based on the values of the aggregate measure for metric data and interval-valued data (the approach involving the 2nd and 8th deciles). After the second method of measurement was applied, the positions of the provinces changed significantly. The provinces were assessed not merely on the basis of mean values, which can easily be affected by outliers, but also by accounting for how the variables varied across districts. Within provinces, districts classified as spatial poles of growth behave like extreme observations and can strongly influence the measurement for the entire province they belong to. This phenomenon is exemplified by the measurements for the province of Mazowieckie, which are strongly affected by the district of Warsaw. The proposed modification of the method (the use of interval-valued data) has made it possible to avoid this problem.
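The effect of a single dominant district can be illustrated with a small numerical sketch. The district values below are invented, the function name is hypothetical, and the quantile convention (Python’s "inclusive" method) is an assumption; the article does not state how the deciles were computed.

```python
from statistics import mean, quantiles


def interval_from_districts(values, lower_decile=2, upper_decile=8):
    """Return the (lower, upper) decile bounds of district-level values."""
    cuts = quantiles(values, n=10, method="inclusive")  # 9 cut points: deciles 1..9
    return cuts[lower_decile - 1], cuts[upper_decile - 1]


# Hypothetical district values for one province; the last district is a growth pole.
with_pole = [10, 11, 12, 12, 13, 13, 14, 15, 60]
without_pole = [10, 11, 12, 12, 13, 13, 14, 15, 16]

# The province-level mean shifts markedly when the dominant district is present...
mean_shift = mean(with_pole) - mean(without_pole)

# ...whereas the interval between the 2nd and 8th deciles is unchanged.
interval_with = interval_from_districts(with_pole)
interval_without = interval_from_districts(without_pole)
```

In this toy example the interval covers the central 60% of districts, so replacing the extreme value changes the mean but leaves both interval bounds where they were.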
The assessment of the level of social cohesion across provinces on the basis of interval-valued data (the approach involving the 2nd and 8th deciles) is clearly consistent with the assessment of the level of economic development. Provinces situated in Western Poland, which are characterised by a higher level of economic development, also have a higher level of social cohesion. It can be assumed that this pattern of spatial differentiation results to a large extent from historical conditions, mainly related to the period of partitions (1795–1918) and the changes after World War II.
6. Discussion and Conclusions
As a result of economic changes there is a growing demand for the assessment of social cohesion. It plays an important role in both the economic and the social sphere, not only for the country as a whole but, more importantly, at the regional level. This is why there is a need to develop a methodology dedicated to this area of interest.
In the study the level of social cohesion in provinces of Poland in 2018 was assessed using a hybrid approach combining multidimensional scaling and linear ordering. The traditional approach is conducted using classic metric data and it does not account for the variation between lower level units (i.e., districts). The authors propose a methodology which makes this possible. The dataset containing the classic metric data was extended to symbolic interval-valued data (three data types: min-max, 1st decile and 9th decile, 2nd decile and 8th decile) and used in the analysis. In addition to producing a ranking of provinces according to the level of social cohesion, the results of assessment were presented in a two-dimensional space.
The results of the symbolic interval-valued approach differ considerably from those obtained using metric data (see Table 4 and the dendrite diagrams in Figure 3). The wider the intervals, the more evident the differences become. The greatest degree of similarity between the rankings of objects based on individual variables and the ranking based on the aggregate measure was obtained for intervals based on the 2nd and 8th deciles, which comprise 60% of observations at district level. In this case, social cohesion in the provinces was assessed not on the basis of atomic variable values, but using information from 60% of districts (the interval between the 2nd and 8th deciles). This approach made it possible to take into account the degree of variation in social cohesion between lower level units (districts) within a given province. The use of the 2nd and 8th deciles is an example of a robust approach, which is used to eliminate the effect of outliers.
The proposed modification makes it possible to assess social cohesion in provinces not only on the basis of a single real number (metric data), but also by taking into account characteristics of particular districts within each province (interval-valued data). It was found that the assessment of social cohesion in provinces based on interval-valued data is strongly affected by lower level units (districts). The overall assessment of a given province does not depend on one or two districts but on all of them. Decision makers at province level should therefore ensure that all districts develop more or less uniformly.
The novelty of the study presented in this article consists in jointly mapping, by means of a hybrid approach combining multidimensional scaling and linear ordering, classic metric data and symbolic interval-valued data (three data types: min-max, 1st decile and 9th decile, 2nd decile and 8th decile) in one chart. This approach makes it possible to take into account the degree of variation in social cohesion between lower level units (districts) within a given province.
It is worth adding that the method of assessing the level of social cohesion in territorial units proposed in the article does not require an additional survey, as it relies on secondary data sources. However, its application is limited by the availability of appropriate data. Not all phenomena associated with social cohesion are measured by official statistics. The situation looks different regarding the assessment of social cohesion based on primary data. In such cases, the researcher can include all dimensions of social cohesion when designing the survey questionnaire.
All calculations were performed using R scripts written by the authors.