1. Introduction
Sustainable development is a crucial trend in today’s world, on which the well-being of future generations depends. Achieving the goals outlined in the SDG agenda is a complex process that requires careful planning and execution. Analyzing progress towards these goals, especially at the regional level, calls for additional sustainability indicators based on ESG principles.
The transition to sustainable development requires the efforts of not only national governments but also the actions of many micro-level actors, such as firms, households, and individuals.
Quantitative assessments of regional development are traditionally based on the statistical data of various levels. This inevitably leads to a delay due to the complex procedures of collecting, processing, and publishing a wide range of indicators.
However, the lack or incompleteness of available data can be a significant barrier to evaluating progress towards these goals. This can make it difficult to measure progress accurately and to make organizational decisions. Issues related to data availability call for wider use of innovative approaches, such as Big Data, to overcome these challenges [1].
Analysis of news content provides a more prompt understanding of socio-economic processes. The use of social networks as a source of information is associated with a number of problems, including the presence of fake news and information chaos [2]. Therefore, the emphasis in our study is on using reliable news sources, such as official regional pages and large online communities affiliated with media.
The set of potential indicators of regional sustainable development is very extensive for empirical analysis. However, it should be noted that it is not always possible to collect statistical information on all aspects of sustainability in a timely manner. The use of text mining methods provides additional opportunities to solve problems related to the systemic assessment of regional processes.
The motivation for this study is to expand the important areas of analysis of territorial and temporal features of sustainable development, taking into account additional sources of information. The aim of the research is to expand the tools for monitoring changes in the environmental, social, and governance (ESG) agenda of Russian regions. The novelty of this approach is determined by the development of a methodology for tracking systemic changes, assessing positive and negative processes in the development of regions from the point of view of sustainability.
The following main hypotheses were considered:
H1. There is a high degree of heterogeneity in the implementation of the Sustainable Development Goals of the regions;
H2. There are insignificant dynamics in the expansion of strategic directions for sustainable regional development.
The existing classification of the Sustainable Development Goals by relevant areas is aimed at solving various social, economic, and environmental problems [3]. The Sustainable Development Goals (SDGs) are designed to reflect the most important and urgent challenges facing the global community [4]. Moreover, these goals are interconnected, necessitating a comprehensive approach when assessing the outcomes of their implementation [5]. Our research proposes to systematically assess the sustainability of regional development taking into account all three ESG dimensions.
The concept of sustainable development based on the formulated goals represents the joint influence of business, government, and civil society. Additional clarification is required regarding the regions’ characteristics and contributions to the Sustainable Development Goals. Several studies have been conducted on regional development practices, considering the necessary adjustments to indicators [6,7,8].
Science-based approaches to indicator-based assessment of SDG implementation are an area in need of additional research [9,10,11].
To understand the range of factors contributing to sustainable development, it is essential to consider the diverse characteristics of different countries and regions [12].
Considering the rather abstract nature of the Sustainable Development Goals, it is important to formulate them correctly to search for relevant news reports [13,14].
Regression models are traditionally used to quantify the impact of time and macro-regional factors. Analysis of panel data and surveys of respondents made it possible to determine the impact of efforts related to the Sustainable Development Goals on the perception of residents of regions [15].
It is proposed to determine the corresponding weighting factors in order to form a comprehensive assessment through sustainable development indicators based on the Sustainable Development Goals in accordance with their priorities [16].
To evaluate the trends and progress of selected indicators of sustainable development, a method is employed that calculates the distance from the indicator’s current value to its target value for 2030 [17,18].
The use of factor analysis for the selection of indicators and hierarchical methods of cluster analysis partially avoids the disadvantages of combining a large number of indicators into a single index [19].
For the purpose of cause-and-effect analysis, we propose to consider various combinations of factors for achieving the Sustainable Development Goals. Methods based on fuzzy logic and qualitative comparative analysis (QCA), in particular fsQCA (fuzzy set qualitative comparative analysis), make it possible to identify indicators for specific Sustainable Development Goals that may be sufficient for different regions [20].
Data Envelopment Analysis (DEA) has been applied to assess the sustainable development of cities. The method of variable selection and the DEA models made it possible to identify five social, five economic, and three environmental key indicators for evaluation [21].
A relevant approach for evaluating Russian regions is the ESG (Environmental, Social, and Governance) City and Regional Index, which comprises 60 indicators across 16 comprehensive factors grouped into three categories related to sustainable development [22]. The final assessment is formed by the additive method, which uses official public statistical data and other ratings. Open Internet sources were not used in that work.
Approaches based on artificial intelligence techniques are employed to analyze the implementation processes of the Sustainable Development Goals [23,24].
When text analysis was applied to sustainability assessment, machine learning methods, particularly modified convolutional neural networks, were used to classify companies according to their sustainability level [25].
The use of news reports to form a corpus of documents for further text analysis is due to the presence of a cause-and-effect relationship between the relevance of social problems among the population and the thematic priorities of the media [26]. Some studies use news articles to assess individual sustainability indicators related to climate change [27,28].
Assessing news sentiment during crises can serve as an operational indicator of the economy’s state and can be used in combination with more traditional explanatory variables [29].
Improving the tools for monitoring sustainable development provides an opportunity to gain insight into the extent to which local and national territories are moving towards Sustainable Development Goals.
2. Materials and Methods
The proposed methodology includes a number of stages (Figure 1):
The creation of a news articles corpus based on the extraction of data from media resources using a developed application, considering the subject matter, selected region, and period;
Preprocessing and stemming of the obtained corpus of documents, the removal of stop words;
Construction of a word cloud and analysis of keyword frequencies in document sets for different regional topics and periods;
Comparative analysis of news topics for various regions and different periods using text mining techniques;
Sentiment analysis of the generated data slices for individual regions and periods.
Web scraping makes it possible to solve various analytical tasks, in particular, automating the collection of data from websites in accordance with the user’s needs [30]. An application that parses VKontakte content according to user requests has been developed.
Currently, VKontakte is the most popular, publicly accessible, and widespread social network in the Russian Federation, which is confirmed by expert assessments of professional communities (https://mediascope.net/data/ (accessed on 4 March 2025)). The official pages of the regions are presented in this network, which ensures that the sample contains information from official sources. Regional mass media are also represented in VKontakte communities. The activity of regions in news communities may vary, but selecting a certain number of news communities in this way ensures the relevance of the sample. The results obtained by applying the developed methodology are therefore relevant to the regional news agenda.
Various tools are available for implementing parsing, including on the Python, Java, Ruby, and JavaScript platforms. This application uses the BeautifulSoup library and the Scrapy framework on Python 3.9, which offers an extensive range of text mining libraries.
The need to create a representative sample of textual data led to a search for information sources that have dedicated APIs and contain a sufficient number of news articles. VKontakte, the most popular Russian-language network, is used as such a source. The VK API is based on the HTTP protocol and utilizes the JSON data transfer format [31].
The news search was carried out for each region using a special VK API groups.search method, which can be used to specify a search query. The search query was specified in a format that included the name of the region and the keyword “news”.
This method provides a list of relevant news communities using the internal algorithms of the VKontakte network. A limited number of communities were selected for each region. The results of the expert assessment confirm that the final list of received network resources includes news from regional media and official sources that maintain their communities on VKontakte.
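As an illustration of the community search step, the following minimal sketch builds a groups.search request for one region. It is not the authors' application: the helper name `build_group_search_url`, the `count` value, the API version string, and the placeholder token are assumptions made for this example, and the actual request is only constructed, not sent.

```python
from urllib.parse import urlencode

# Public VK API endpoint for the groups.search method.
VK_API_URL = "https://api.vk.com/method/groups.search"


def build_group_search_url(region: str, token: str, count: int = 10,
                           api_version: str = "5.131") -> str:
    """Build a groups.search request URL for one region (hypothetical helper).

    The query combines the region name with the keyword "news", as described
    in the text. `count`, `api_version`, and the token are assumptions.
    """
    params = {
        "q": f"{region} news",   # region name + keyword "news"
        "count": count,          # limit on the number of communities returned
        "access_token": token,   # placeholder; a real token is required
        "v": api_version,        # API version (assumed value)
    }
    return f"{VK_API_URL}?{urlencode(params)}"


url = build_group_search_url("Samara", token="YOUR_TOKEN", count=5)
print(url)
```

In a real run the URL would be requested over HTTP and the JSON response parsed for community identifiers; in the study the query would presumably be issued in Russian.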
Then, for each region, the news in the resulting list of communities was selected by year and a list of keywords converted into a regular expression. The news data slices obtained in this way include a fairly wide range of news from regional authorities and other official sources.
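The selection of news by year and keyword list can be sketched as follows. This is a simplified illustration rather than the developed parser: posts are assumed to be dictionaries with a Unix-time `date` field and a `text` field, and the function names and sample posts are hypothetical.

```python
import re
from datetime import datetime, timezone


def keywords_to_regex(keywords):
    """Convert a keyword list into one case-insensitive regular expression."""
    # Longer phrases first, so multi-word phrases are tried before shorter ones.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = "|".join(re.escape(k) for k in ordered)
    # Only a leading word boundary: inflected endings after the keyword still match.
    return re.compile(rf"\b(?:{pattern})", re.IGNORECASE)


def filter_posts(posts, keywords, year):
    """Keep posts published in `year` whose text contains at least one keyword."""
    rx = keywords_to_regex(keywords)
    selected = []
    for post in posts:
        post_year = datetime.fromtimestamp(post["date"], tz=timezone.utc).year
        if post_year == year and rx.search(post["text"]):
            selected.append(post)
    return selected


# Hypothetical posts (timestamps fall in March 2023, April 2022, March 2023).
posts = [
    {"date": 1678000000, "text": "New renewable energy plant opens"},
    {"date": 1650000000, "text": "Food security programme extended"},
    {"date": 1678100000, "text": "Local football team wins"},
]
slice_2023 = filter_posts(posts, ["renewable energy", "food security"], 2023)
print([p["text"] for p in slice_2023])
```

Matching on a keyword prefix is a design choice made here because Russian is highly inflected; a stricter match would add a trailing word boundary.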
To optimize the execution of a large number of queries, VK Script, a language for automating various tasks in VK, was used. The developed parser was designed to extract news reports. It provides a comprehensive set of features to perform data preparation, including stemming, specifying a list of keywords for searching, setting the period, specifying specific news sources to apply territorial restrictions to the generated sample, collecting data according to the specified parameters, and saving the collected documents in convenient file formats (e.g., .xlsx). In addition to extracting posts, it is also possible to generate a set of associated comments, significantly expanding the possibilities for analyzing textual information. The document corpus generated in this way provides additional solutions to various text mining tasks, including word cloud generation, sentiment analysis, clustering, and text categorization.
During the pre-processing of the collected news documents, tokenization was performed, i.e., the text was divided into sentences, and the sentences were divided into words. Stemming was then applied, i.e., each word was normalized by reducing it to its stem. Removing stop words, which are common in the language and carry little substantive information, is an important element of text pre-processing. Stop word dictionaries used to process texts in Russian require expansion or the creation of separate dictionaries for use in the analysis.
A word cloud is a weighted dictionary representation showing the frequency of words in a text, which provides a convenient visualization of the document. As a method of text compression, a word cloud makes it easy to obtain the most significant information about the content of the text and to highlight subtopics.
One of the main benefits of a word cloud is its ability to visually represent large volumes of data that are difficult to describe quickly and precisely using the original text. Additional information, such as the relative importance of each keyword, can be added to the word cloud.
In the received news texts, not only stop words from the standard dictionary were removed. The list of stop words was extended: for example, Russian proper names, place names, and other such terms were additionally removed. During the computational experiments, our own list was formed by adding the names of regions and territorial entities, special characters, and certain frequently used abbreviations, which made it possible to obtain a more relevant set of keywords. The modified list of stop words includes numbers, punctuation marks, emoji, names of months, names of regions and their capitals, and the names of heads of regions. These words do not carry semantic meaning in this context.
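A minimal sketch of tokenization with an extended stop-word list is given below. The stop-word sets are tiny hypothetical stand-ins for the standard dictionary and the custom list described above, and stemming is deliberately omitted; in practice it would be applied by a dedicated Russian stemmer after this step.

```python
import re

# Stand-in for a standard Russian stop-word dictionary (assumption).
BASE_STOP_WORDS = {"в", "и", "на", "по"}
# Hypothetical custom additions: place names and month names.
CUSTOM_STOP_WORDS = {"казань", "казани", "татарстан", "март", "марте"}
STOP_WORDS = BASE_STOP_WORDS | CUSTOM_STOP_WORDS

# Runs of letters only: digits, punctuation, and emoji are dropped at this stage.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def preprocess(text):
    """Lowercase, tokenize, and remove stop words from one news text."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


sample = "В Казани в марте 2023 открыли новую школу! 🎉"
print(preprocess(sample))  # → ['открыли', 'новую', 'школу']
```

Handling numbers, punctuation, and emoji in the tokenizer rather than in the stop-word list is a simplification chosen for this sketch; the effect on the resulting token list is the same.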
Various methods are used for vector representation of texts, including bag-of-words, tf–idf (tf: term frequency, idf: inverse document frequency), Word2Vec, GloVe, and BERT [32]. The two most commonly used methods are frequency analysis and word weight analysis. Frequency analysis entails counting the number of occurrences of a word in a text and making inferences based on the frequency of the most common words.
The statistical approach tf–idf can be used as a preliminary step to assess document similarity [33].
To determine the relevance of keywords, we calculate the frequency of occurrence of a word within a selected data subset (slice). In this context, a slice refers to a specific fragment of information. For the purpose of evaluating news reports, we used slices that consisted of texts related to a selected region in a given time period. A higher occurrence frequency of a keyword within a data subset indicates a higher level of relevance. The top-ranking keywords are presented in a list in order of decreasing relevance.
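The ranking of keywords within a slice can be sketched as follows. The function name `top_keywords` and the toy slice are hypothetical, and normalizing counts by the total number of tokens in the slice is an assumption of this example; documents are assumed to be already preprocessed into token lists.

```python
from collections import Counter


def top_keywords(slice_docs, k=15):
    """Return the k most frequent tokens in a slice with their relative frequency."""
    counts = Counter()
    for tokens in slice_docs:
        counts.update(tokens)
    total = sum(counts.values())
    # most_common sorts by decreasing count, i.e., by decreasing relevance.
    return [(word, count / total) for word, count in counts.most_common(k)]


# Hypothetical slice: two preprocessed news texts for one region and period.
slice_docs = [
    ["education", "school", "education"],
    ["health", "education"],
]
ranking = top_keywords(slice_docs, k=2)
print(ranking)
```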
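Let $t$ be a word, $d$ a document, and $D$ a collection of documents.
The term frequency $\mathrm{tf}(t, d)$ is defined as follows:
$$\mathrm{tf}(t, d) = \frac{n_t}{\sum_k n_k},$$
where $n_t$ is the number of occurrences of word $t$ in document $d$, and the denominator is the total number of words in document $d$.
The inverse document frequency $\mathrm{idf}(t, D)$ is presented in the following way:
$$\mathrm{idf}(t, D) = \log \frac{|D|}{|\{d_i \in D : t \in d_i\}|},$$
where $|D|$ is the number of documents in the collection $D$; $|\{d_i \in D : t \in d_i\}|$ is the number of documents from collection $D$ in which word $t$ occurs.
For each word $t_i$ from document $d_j$, the value $\mathrm{tf\text{-}idf}(t_i, d_j, D) = \mathrm{tf}(t_i, d_j) \cdot \mathrm{idf}(t_i, D)$ is calculated. We obtain a matrix $V$ consisting of elements $v_{ij} = \mathrm{tf\text{-}idf}(t_i, d_j, D)$, $i = 1, \dots, n$, $j = 1, \dots, m$, where $n$ is the number of unique words in all documents from the document collection $D$, $m = |D|$. To estimate the similarity measure of documents transformed into a vector representation, a cosine measure is determined. Each document is described by a vector, each component of which corresponds to a word. Let $R = (r_1, \dots, r_n)$ and $P = (p_1, \dots, p_n)$ be vectors corresponding to some two documents. Then, the cosine similarity measure is calculated as follows:
$$\cos(R, P) = \frac{\sum_{i=1}^{n} r_i p_i}{\sqrt{\sum_{i=1}^{n} r_i^2}\,\sqrt{\sum_{i=1}^{n} p_i^2}}.$$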
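The tf–idf weighting and cosine measure described in this section can be illustrated with a short pure-Python sketch. It assumes the standard definitions (relative term frequency, natural-logarithm idf); the function names and toy documents are hypothetical, and the study itself may rely on library implementations.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return the vocabulary and one tf-idf vector per tokenized document."""
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    # Document frequency: number of documents in which each word occurs.
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        counts = Counter(d)
        total = len(d)
        vectors.append([
            (counts[t] / total) * math.log(n_docs / df[t]) if total else 0.0
            for t in vocab
        ])
    return vocab, vectors


def cosine(r, p):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(r, p))
    norm = math.sqrt(sum(a * a for a in r)) * math.sqrt(sum(b * b for b in p))
    return dot / norm if norm else 0.0


# Hypothetical slices, each collapsed into one token list.
docs = [
    ["education", "school", "health"],
    ["education", "school", "energy"],
    ["climate", "energy", "waste"],
]
vocab, vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Note that a word occurring in every document receives an idf of zero and therefore does not contribute to the similarity; slices with no shared terms have a cosine of zero.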
Sentiment analysis is a crucial component in natural language processing (NLP), particularly in the fields of review analysis, social media monitoring, and many other areas. The selection of an appropriate model for this task is of paramount importance, as it significantly impacts the accuracy of tonality assessments.
Various approaches are used for sentiment analysis; namely, lexicon methods, machine learning methods, deep learning and neural network approaches, and combined methods.
Classical machine learning methods, such as logistic regression, SVM, and ensemble models also remain relevant due to their simplicity.
Among the deep learning methods, Transformer models, such as Bidirectional Encoder Representations from Transformers (BERT) and GPT (Generative Pre-trained Transformer), stand out, as they account for the contextual and semantic aspects of language [34]. The main problem in the case of news content in Russian is the lack of available high-quality samples for training such multiparameter models.
In recent years, transformers such as RuBERT, FinBERT, Multilingual BERT, and XLM-R have become popular.
The use of transformers in this case is limited by the need for a large amount of data for fine-tuning. Although RuBERT can be adapted to specific tasks using fine-tuning, this process requires a fairly large amount of labeled data. In this study, fine-tuning of the pre-trained RuBERT model was carried out for a specific task; namely, sentiment analysis. This process allows using the knowledge gained by the model during pre-training on a large corpus of texts and applying it to highly specialized data.
To compare the effectiveness of different models, it is advisable to use samples focused on news texts. In particular, FinBERT is a specialized BERT-based model developed for text analysis in the financial sector [35]. It was trained on a corpus of financial documents, including company reports, news, and analytical articles, which makes it especially effective for tasks related to financial analytics, including sentiment analysis of financial texts. This model has a limitation typical of other transformers; namely, high resource requirements. In addition, FinBERT has a fairly narrow specialization focused specifically on the financial sector. To analyze Russian-language financial texts, the model needs to be adapted, for example, by retraining on Russian-language data and using machine translation.
Multilingual BERT is a version of BERT with the ability to process texts for multilingual tasks [36]. It should be noted that for the Russian language, the accuracy may be low due to the peculiarities of training.
The XLM-R model is an improved version of the Cross-lingual Language Model (XLM), based on the RoBERTa architecture. The model demonstrates high efficiency for tasks related to languages with a small amount of available training data. This is due to its improved pre-training process. However, it requires additional training for tonality analysis using a larger sample size.
Dictionary methods are characterized by the interpretability of results and, unlike machine learning, do not require large volumes of labeled data for training. In addition, they are less susceptible to the problem of overfitting. An argument in favor of the dictionary approach was the lack of large Russian-language samples that allow for training classification models for news content.
The publicly available tonal dictionary LinisCrowd (https://linis-crowd.org/ (accessed on 4 March 2025)) is focused on solving the fundamental linguistic problem of the lack of a Russian-language dictionary of tonal vocabulary for user texts on socio-political topics. The LinisCrowd dictionary includes more than 26,000 lexemes [37].
To select the most accurate sentiment analysis method for specific news texts, the RuBERT, FinBERT, and XLM-R transformer models were considered, and the LinisCrowd and KartaSlovSent dictionaries were also used. In our study, 1000 news articles on topics related to sustainable development were added to adapt transformer models.
In this study, the open linguistic dataset KartaSlovSent was used to assess the sentiment. The sentiment dictionary of the Russian language KartaSlovSent contains words and expressions of the Russian language [38]. Dictionary lexemes are supplied with a sentiment label (“positive”, “negative”, “neutral”) and a scalar value of the strength of the emotional assessment from the continuous range [−1, 1], where +1 corresponds to entries with the most positive sentiment, −1 corresponds to entries with the most negative sentiment, and 0 corresponds to entries with a neutral assessment (the same as no coloring). The total volume of the dictionary obtained as a result of expert sentiment labeling is 28,197 words and expressions of the Russian language. To analyze the sentiment of news messages, a scale was configured for sentiment classification taking into account the features of news texts with a neutral tonality of presentation.
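The dictionary approach can be sketched in a few lines. This is an assumption-laden illustration, not the configured scale used in the study: the lexicon entries are hypothetical English stand-ins, averaging the scores of matched words is one possible aggregation, and the width of the neutral band (`neutral_band`) is an invented parameter.

```python
def score_text(tokens, lexicon):
    """Mean sentiment strength of the tokens found in the lexicon (0.0 if none)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def classify(score, neutral_band=0.1):
    """Map a score in [-1, 1] to a label; the band width is an assumption."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Hypothetical lexicon with strengths in [-1, 1], mimicking the dictionary format.
lexicon = {"growth": 0.6, "support": 0.4, "accident": -0.8, "shortage": -0.5}

labels = [
    classify(score_text(["economic", "growth", "and", "support"], lexicon)),
    classify(score_text(["road", "accident"], lexicon)),
    classify(score_text(["city", "council", "meeting"], lexicon)),
]
print(labels)  # → ['positive', 'negative', 'neutral']
```

A wider neutral band would suit news texts, which tend towards a neutral tone of presentation; the appropriate value would have to be tuned on labeled data.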
Computational experiments to assess the accuracy of the constructed models were carried out on a special test data sample. It contains 430 news messages on socio-economic topics. To ensure a relevant assessment of the sentiment detection model, the test sample was marked by experts. Generated positive and negative news texts were used to expand the test sample.
Table 1 presents the Accuracy and F1-scores of models that use different approaches to analyzing the sentiment of news articles in Russian.
The F1-score of the sentiment assessment model based on the KartaSlovSent dictionary approach was 0.87. As a result, it was used to assess the tonality of news.
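For reference, Accuracy and F1 can be computed from expert labels and model predictions as in the sketch below. The source does not state which F1 averaging was used, so the macro average here is an assumption, and the label sequences are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the expert labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


# Invented example: expert labels vs. model output for six news items.
y_true = ["pos", "neg", "neu", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neu", "pos", "neu", "neu"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```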
3. Results
The proposed approach was tested on a set of Russian regions. This study collected data on the Volga Federal District, which includes 14 territorial entities, including republics and regions. These regions were selected purposefully as a research object for testing the methodology. Given the significant unevenness of regional development, it was necessary to consider regions that share a territorial and geographical community and are relatively equal in terms of development. These regions are linked by complex connections of the same level, but they differ sufficiently to illustrate the possibilities of the methodology. It is for such regions that benchmarking analysis of the current level and directions of sustainable development in neighboring regions is especially important.
Case studies of the following regions were used: Republic of Tatarstan, Udmurt Republic, Chuvash Republic, Republic of Bashkortostan, Republic of Mari El, Republic of Mordovia, Nizhny Novgorod region, Orenburg region, Kirov region, Penza region, Perm territory, Samara region, Saratov region, Ulyanovsk region. A corpus of news reports was compiled for these 14 regions.
In the developed application for assessing the sustainable development of regions, a set of keywords was used in the process of collecting data from news resources of the VKontakte social network. A selection of network regional news resources was made. Official resources representing regional authorities and additional news sources with a sufficient number of subscribers were selected. During the parsing process, regional resources in the VKontakte network were selected as sources based on the VK API popularity rating. The corresponding news slices were obtained by this application for different periods. The application allows parsing news resources for thematic selection of texts taking into account the presence of keywords and phrases. The selection was carried out based on the following keywords in accordance with the Sustainable Development Goals and taking into account Russian specifics:
Poverty, income distribution, income of the population, welfare, social security;
Food security, food sovereignty, food culture, regenerative agriculture, organic food, food price;
Mental health, public health, mental well-being, disability, health education, infectious diseases, child mortality, family planning, neonatal mortality, infant mortality, child health, road accidents, reproductive health, epidemics, health insurance;
Education, environmental education, technical and vocational education, free education, accessible education, primary education, secondary education, higher education;
Gender equality;
Reclamation, water efficiency, groundwater depletion, desertification, green infrastructure;
Renewable energy, wind, solar, geothermal, hydroelectric, fuel-efficient technologies, emissions, greenhouse effect, biofuels;
Employment, economic growth, sustainable development, wages, economic empowerment, small and medium enterprises, youth employment;
Infrastructure, investment, internet, industrial diversification;
Trade, financial market, taxation, social security, government program;
Public transport, climate change adaptation, affordable housing, pedestrian zone, public spaces;
Natural resources, recycling, industrial ecology, reuse, decarbonization, food waste;
Climate, greenhouse gas, global warming, weather, environment;
Water protection, fish stocks;
Land use, ecological land restoration, forest conservation, deforestation, reforestation;
Social justice, legal system, and fight against terrorism.
Filtering using keywords allowed us to limit the topics of messages, which provides relevant news collections following the topic of sustainable development. Corresponding data slices were formed for regions of the Volga Federal District. The number of news items by region for 2021–2023 is presented in Table 2.
After preliminary processing of the obtained news slices, word clouds were generated to select the most relevant terms. In addition to the visual representation as a word cloud, frequency estimates were obtained for keywords in the corpus of news documents by region and corresponding periods.
As a result, it is possible to quantitatively evaluate the relevant keywords to provide more objective characteristics of the main topics in the news content.
The next stage of the study is presented as ranked lists of keywords for some slices. Figure 2, Figure 3, Figure 4 and Figure 5 show the obtained results for several regions (Tatarstan, Samara, Bashkortostan, Penza) as examples. Fifteen keywords were identified for these regions in the slices according to their tf values.
It should be noted that the presented ranked lists illustrate the heterogeneity of key topics related to regional sustainable development. It can be concluded that there is no broad coverage in the areas formulated on the basis of Sustainable Development Goals. This is largely due to the limited resources of Russian regions due to the current economic situation. In the regions considered as examples, much attention is paid to pressing aspects of education at different levels.
According to the Russian ESG index ratings, Tatarstan is one of the leading regions. In addition, this region is traditionally considered a research and university region. In the resulting list of key topics, education is presented as the most relevant topic for this region. The keywords cover all three areas of the ESG assessment. In the recent period, the environmental agenda has not been so relevant for the region. Samara Oblast demonstrates increased attention to goals related to the development of education and entrepreneurship; environmental issues are not a priority. For Bashkortostan, social aspects of ensuring sustainable development (promoting social development) are presented for the most part. In the Penza region, health care and demography issues are actively considered during all the periods under study.
Similarity metrics were calculated to quantitatively assess changes in the discussed issues of sustainable development.
Table 3 presents numerical estimates of the similarity of sustainability issues discussed in the news agenda for all 14 regions of the Volga Federal District based on the cosine measure.
The numerical estimates for 2023 based on the cosine measure do not exceed 0.6, which indicates significant differences in regional approaches to increasing sustainability (H1). To assess the change in the topic in news reports for individual periods, term-document matrices were obtained, and the cosine metric was applied. Using the developed methodology, it is possible to assess the sustainability of regional development in dynamics. For this purpose, similarity matrices were constructed based on news content for all periods under consideration. For example, Table 4 shows similarity metrics for Tatarstan for 2021, 2022, and 2023. Table 5 shows similarity metrics for the Samara region, Table 6 contains data for Bashkortostan, and data for the Penza region are available in Table 7.
The presented numerical estimates of the cosine measure show a relatively high similarity between the key topics discussed in news reports in 2021, 2022, and 2023 in Tatarstan, Samara region, and Penza region. Bashkortostan is characterized by a more significant revision of the target areas for increasing the region’s resilience.
Tatarstan is included in the sustainable development rankings as one of the leaders in Russia during the periods under review. According to the values of the cosine metric, this region demonstrates a high degree of compliance with the ESG goals that have already been formulated earlier in a number of development programs.
Based on the obtained similarity estimates, it can be concluded that news reports devoted to issues of regional sustainability change insignificantly during the selected periods (H2). This is largely due to the adoption of federal strategic programs for increasing resilience at the regional level.
The final stage of the study is the sentiment analysis of news reports related to the goals of sustainable regional development. News in slices is assessed in accordance with tonality. The relative number of positive and negative documents in slices for the period from 2021 to 2023 is presented below in Figure 6 and Figure 7.
The share of positive news on sustainable development is decreasing on average across regions (in 2021, the percentage of positive news in the Volga Federal District was 32.47%, in 2022 it was 30.24%, and in 2023 it was 29.75%). However, for certain regions (Nizhny Novgorod Region, Penza Region, Bashkortostan, Mordovia, Udmurtia), there is a tendency for positive news to increase.
The proposed methodology can be applied in a similar way for different regions. The expansion of the list of regions will be associated with the parsing of relevant news sources, which is a rather complicated process. The methodology is a means of forming an up-to-date cross-section of news content based on regional ESG factors for further expert work, comparison with other regions, and identification of development directions.
4. Discussion and Conclusions
Various aspects of the sustainable development of regions have long remained in the spotlight. An attempt to comprehensively apply mathematical methods for operational support of the decision-making process at the regional level expands traditional approaches to sustainability management [39,40]. The contribution of our research is the application of intelligent text analysis methods to the processing of news texts related to the sustainability of Russian regions.
Textual information presented in various reports was used to solve individual sustainability assessment tasks, particularly at the enterprise level [41]. In our research on sustainable development, an important aspect is the regional level with coverage of online news resources.
The currently existing methodology for forming the rating assessment of Russian regions includes a large set of indicators of sustainable development, both from official statistics and from other consolidated ratings. Such a systematic approach is carried out based on the results of the period, once the initial data are reflected in full in official sources. Such a methodology is therefore retrospective in nature [22].
To increase the efficiency of the assessment, we used additional sources of information in the form of news reports. This will also increase the objectivity of the trends analysis of sustainable development concerning Russian regions.
The differences in news topics for regional sections revealed using the proposed quantitative assessment are associated with various external and internal features. With a variety of factors influencing the change in the news agenda on the goals of sustainable development of regions, the presence of regional strategic programs plays an important role. Limited regional resources and high key rates in the current situation led to a change in priority tasks in increasing sustainability. Factors that determine the change in the agenda relating to sustainable development include economic, financial, and reputational incentives. It is also important to keep in mind the interest of regional governments and government structures in decision-making, associated with the emergence of sustainability criteria in the distribution of subsidies and the provision of budget loans.
Sustainability issues at the macro level are studied in sufficient detail, but the proposed methodology is aimed at the regional level. This enables key stakeholders to assess the dynamics of sustainable development in regions.
The text mining approach is innovative, providing the opportunity to assess and compare the current level of sustainability for different regions in each period of time based on an operational assessment of the news context. Comparative analysis of news content can be performed as a benchmarking stage of the strategic development of regions.
The proposed methodology will enable stakeholders at both the regional and federal levels to identify key trends in the sustainable development of regions. This will allow for the identification of common goals and strategies among different regions, as well as the comparison of innovative solutions and institutional support measures for enhancing sustainability. Taking into account a wide range of sustainable development objectives, this process will also contribute to the development of strategic planning and corrective actions to ensure successful outcomes. The tonality assessment, as part of this process, provides additional opportunities for comparative analysis among regions.
System analysis will make it possible to more quickly identify the dynamics of regional sustainability in order to identify competitive advantages of the regions. The developed approach can be applied not only at the regional level but also to individual enterprises. For enterprises, this is an additional opportunity to integrate with ESG goals at the regional level.
A limited list of keywords and phrases was formed based on commonly accepted formulations to form a cross-section of news data related to various SDGs. The direction of future research involves identifying a specific SDG for differentiated assessment of regions. In this case, an expansion of the list of keywords will be required to scrape suitable documents.
Comparison with official statistics on sustainable development is possible, taking into account the time lag caused by a significant delay in the publication of a block of similar indicators. The construction of models that take into account a multifactorial set of statistical indicators of the sustainable development of regions and additional aspects taking into account news content is the subject of further research. It involves a comprehensive analysis of statistical indicators of sustainable development and the construction of predictive models with additional information from news sources.