# **Cultural Heritage Storytelling, Engagement and Management in the Era of Big Data and the Semantic Web**

Edited by Charalampos Dimoulas Printed Edition of the Special Issue Published in *Sustainability*

www.mdpi.com/journal/sustainability

Editor

**Charalampos Dimoulas**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editor* Charalampos Dimoulas School of Journalism and Mass Communications, Multidisciplinary Media and Mediated Communication research group Aristotle University of Thessaloniki Thessaloniki Greece

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Sustainability* (ISSN 2071-1050) (available at: www.mdpi.com/journal/sustainability/special_issues/cultural_heritage_storytelling_engagement_management_era_big_data_semantic_web).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-3069-7 (Hbk) ISBN 978-3-0365-3068-0 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **About the Editor**

#### **Charalampos Dimoulas**

Dr. Charalampos A. Dimoulas was born in Munich, Germany on August 14, 1974. He received his diploma and PhD from the School of Electrical and Computer Engineering of Aristotle University of Thessaloniki (AUTh) in 1997 and 2006, respectively. In 2008, he received a post-doctoral research scholarship on audiovisual processing and content management techniques for intelligent analysis of prolonged multi-channel recordings at the Laboratory of Electronic Media (School of Journalism and Mass Communications, AUTh). He was elected Lecturer (November 2009), Assistant Professor (June 2014), and Associate Professor (October 2018) of Electronic Media in the School of Journalism and Mass Communications, AUTh, where he is currently serving.

Dr. Dimoulas was the program chair of the AudioMostly 2015 conference on "Sound, semantics and social interaction" and co-editor of the JAES Special Issue(s) on "Intelligent audio processing, Semantics, and interaction" (Vol. 64, no 7/8 & 9, 2016). He is one of the authors of the IGI book Cross-Media Authentication and Verification: Emerging Research & Opportunities (https://www.igi-global.com/book/cross-media-authentication-verification/190476). He is a member of the Multidisciplinary Media & Mediated Communication (M3C) Research Group (http://m3c.web.auth.gr/) and participates in the initiative for the Greek media misinformation observatory (https://ellpap.gr/). He is also serving as the Secretary-General of the Hellenic Institute of Acoustics–Helina (https://helina.gr/en/the-institute/administration/). Dr. Dimoulas has participated in over thirty (30) national and international research projects and has authored many related scientific publications. His current scientific interests include media technologies, audiovisual signal processing, machine learning, multimedia semantics, cross-media authentication, digital audio, audiovisual forensics, and more.

## **Preface to "Cultural Heritage Storytelling, Engagement and Management in the Era of Big Data and the Semantic Web"**

Cultural heritage (CH) refers to a highly multidisciplinary research and application field that collects, archives, and disseminates traditions, monuments/artworks, and overall civilization legacies that have been preserved throughout the years of humankind. This effort is considered very important for historical and educational purposes, which can be deployed in schooling and training sessions, science and social/humanistic studies, artistic expression, and everyday entertaining environments. Today's digital media landscape offers innumerable ways to expedite the above processes at both ends, i.e., CH content production and "consumption," taking advantage of the contemporary networking utilities with the associated capabilities for augmented interaction. For instance, many museums and other art/cultural organizations have invested in the development of featured digital applications with appealing storytelling and their online dissemination to engage the audience in featured CH projects. Likewise, the proliferation of mobile devices and services and the vast expansion of the so-called User Generated Content (UGC) has fueled the digitization of personal CH artifacts and their progressive organization into larger-scale databases. A typical example in that direction is the Europeana project, which formed specific media archiving and metadata standardization rules for CH institutions and sole users to follow. At the same time, urgent needs for better documentation and management of CH documents have emerged, making it difficult for the average user to be part of such large-scale undertakings.

Today, Semantic Web and Big Data technologies promise the facilitation of more straightforward data analysis, information classification, semantic conceptualization, and management automation of multimodal content, which could also be applied for the benefit of the sensitive CH sectors. Specifically, these automation layers could work as mediated communication and collaboration mechanisms between corporations and individuals to accelerate the proper launching, maintenance, and sustainability of suitable CH repositories in favor of all the participants. For instance, many significant personal collections have not yet been detected, captured, fully restored, and documented (e.g., photos, films/movies, other private items, etc.). The development of sophisticated digital crowdsourcing procedures with the necessary technological/interdisciplinary cooperation and support would allow mining and shaping such unique CH masterpieces, in addition to making them available to the public. Thereafter, suitable engaging audience practices and models are welcomed to enhance the impacts of heritage initiatives, amplifying the environmental, cultural, economic, and social sustainability of human beings.

The current Special Issue was launched with the aim of further enlightening important CH areas, inviting researchers to submit original/featured multidisciplinary research works related to heritage crowdsourcing, documentation, management, authoring, storytelling, and dissemination. Audience engagement is considered very important at both sides of the CH production–consumption chain (i.e., push- and pull-ends). At the same time, sustainability factors are placed at the center of the envisioned analysis. A total of eleven (11) contributions were finally published within this Special Issue, enlightening various aspects of contemporary heritage strategies in today's society.

*Editor*

## *Editorial* **Cultural Heritage Storytelling, Engagement and Management in the Era of Big Data and the Semantic Web**

**Charalampos A. Dimoulas**

Multidisciplinary Media & Mediated Communication (M3C) Research Group, Aristotle University of Thessaloniki, 54636 Thessaloniki, Greece; babis@eng.auth.gr; Tel.: +30-2310-994245

Cultural heritage (CH) refers to a highly multidisciplinary research and application field, intending to collect, archive, and disseminate the traditions, monuments/artworks, and overall civilization legacies that have been preserved throughout the years of humankind [1–4]. This effort is considered very important for historical and educational purposes, which can be deployed in schooling and training sessions, science and social/humanistic studies, artistic expression, and everyday entertaining environments [5–8]. Today's digital media landscape offers innumerable ways of expediting the above processes at both ends, i.e., CH content production and "consumption," taking advantage of the contemporary networking utilities with the associated augmented interaction capabilities [9]. For instance, many museums and other art/cultural organizations have invested in the development of featured digital applications with appealing storytelling and their online dissemination to engage the audience in featured CH projects [10–13]. Likewise, the proliferation of mobile devices and services and the vast expansion of the so-called user-generated content (UGC) have fueled the digitization of personal CH artifacts and their progressive organization into larger-scale databases [14–19]. A typical example in that direction is the Europeana project, which formed specific media archiving and metadata standardization rules for CH institutions and sole users to follow [20,21]. At the same time, urgent needs for better documentation and management of CH documents have emerged, making it difficult for the average user to be part of such large-scale undertakings.

Today, Semantic Web and Big Data technologies promise to facilitate more straightforward data analysis, information classification, semantic conceptualization, and management automation of multimodal content, which could also be applied for the benefit of the sensitive CH sectors [22–26]. Specifically, these automation layers could work as mediated communication and collaboration mechanisms between corporations and individuals to accelerate the proper launching, maintenance, and sustainability of suitable CH repositories, in favor of all the participants. For instance, many significant personal collections have not yet been detected, captured, fully restored, and documented (e.g., photos, films/movies, other private items, etc.). The development of sophisticated digital crowdsourcing procedures with the necessary technological/interdisciplinary cooperation and support would allow mining, shaping, and making available to the public such unique CH masterpieces [14–18]. Thereafter, suitable engaging audience practices and models are welcomed to enhance the impacts of heritage initiatives, amplifying the environmental, cultural, economic, and social sustainability of human beings [8–13].

The current Special Issue was launched with the aim of further enlightening the above areas, inviting researchers to submit original/featured multidisciplinary research works related to heritage crowdsourcing, documentation, management, authoring, storytelling, and dissemination. Audience engagement is considered very important at both sides of the CH production-consumption chain (i.e., push- and pull-ends). At the same time, sustainability factors are placed at the center of the envisioned analysis. A total of eleven contributions (*C<sub>i</sub>*, *i* = 1, ..., 11) were finally published within this Special Issue, enlightening various aspects of contemporary heritage strategies placed in today's ubiquitous society. Table 1 outlines the scientific focus and contribution of the listed articles, prior to the associated description/presentation of the conducted works and their research outcomes.

**Citation:** Dimoulas, C.A. Cultural Heritage Storytelling, Engagement and Management in the Era of Big Data and the Semantic Web. *Sustainability* **2022**, *14*, 812. https://doi.org/10.3390/su14020812

Received: 19 December 2021 Accepted: 5 January 2022 Published: 12 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Table 1.** Contributions by research areas, involved technologies, and proposed solutions.


The first paper presents an approach for leveraging the abundance of images posted on social media and specifically Twitter for large-scale 3D reconstruction of cultural heritage landmarks. Doulamis, Voulodimos, Protopapadakis, Doulamis, and Makantasis (2021) elaborated on an automatic solution for tweets' content identification, image retrieval/filtering, and 3D reconstruction. The proposed approach extracts key events from unstructured tweet messages and identifies cultural activities and landmarks. Next, content-based filtering selects a representative portion of cultural images to support fast 3D reconstruction. The presented methods are experimentally evaluated using real-world data to verify the effectiveness of the proposed scheme (Contribution 1).

The second paper focuses on the effective communication of cultural heritage initiatives which, in the era of big data and the intense environment of social media, is considered of equal or, in some cases, even greater importance than heritage data themselves. Maniou (2021) assesses the role of media and journalists in propagating cultural heritage news through social media platforms and the narratives they tend to create in the digital public sphere. The study employs a qualitative approach to examine in depth specific narratives, their meaning(s) and connotation(s), using semantic analysis (Contribution 2).

The third paper deals with the environmental impact of green hosting services, bringing forward new insights regarding green websites and sustainability. Karyotakis and Antonopoulos (2021) investigate how green hosting websites tend to communicate their green services through a qualitative content analysis approach. Therefore, new perspectives on supporting environmental heritage are highlighted, including the education toward sustainable development and a broader green cultural tradition (Contribution 3).

The fourth paper focuses on the case of open-source multimedia tools on cultural storytelling. Papamichail and Symeonidis (2021) present data-driven analytics towards evaluating the extent to which software components are maintainable based on the evolution of static analysis metrics that quantify primary source code properties (Contribution 4).

The fifth paper casts light on cultural heritage storytelling in the context of interactive documentary, a hybrid media genre that employs a full range of multimedia tools to document reality, provide sustainability of the production and successful engagement of the audience. Podara, Giomelakis, Nicolaou, Matsiola, and Kotsakis (2021) explore the usability of the interactive documentary genre for the sustainability of cultural heritage, analyzing web metrics from a seven-year database. They conclude that interactivity affordances of this genre enhance the social dimension of cultural storytelling, presenting three main factors that enhance audience engagement (Contribution 5).

The sixth paper addresses sustainability, heritage, management, and communication from UNESCO's Marine World Heritage (MWH) perspective. Kenterelidou and Galatsopoulou (2021) analyze its digital narrative footprint through social media (Instagram), examining whether it is framed with sustainability and biocultural values. The study contributes to setting the ground rules for strengthening marine heritage management and communication in light of the United Nations Sustainable Development Goals (SDGs) and the Ocean Literacy Decade (2021–2030) (Contribution 6).

The seventh paper focuses on the development of an enhanced Mobile Journalism (MoJo) model for soundscape heritage crowdsourcing, data-driven storytelling, and management in the era of big data and the semantic web. Stamatiadou, Thoidis, Vryzas, Vrysis, and Dimoulas (2021) elaborate on previous/baseline MoJo tools, deploying Machine and Deep Learning solutions on sound semantics, driven by a thorough analysis of the audience, the technological framework, and the desired heritage crowdsourcing model. Hence, primary algorithmic backend services are implemented and positively validated, providing convincing proof of concept of the proposed model (Contribution 7).

The eighth paper (Mouzakis et al., 2021) provides a requirements engineering study for designing new eXtended Reality (XR) experience authoring systems according to the IEEE 830 methodology. The study reviewed 30 existing authoring environments and proposed 10 candidate scenarios for new tools. Six of them were evaluated by 47 individuals in the fields of media, arts, architecture, and informatics. The evaluation results and collected comments can be helpful in future systems design, democratizing XR-media authoring, including the sensitive areas of intangible culture and heritage storytelling (Contribution 8).

The ninth paper presents a study on semantic indexing of 19th-century Greek literature, incorporating transformer-based models and text summarization techniques in a joint fashion. Dimitriadis, Zapounidou, and Tsoumakas (2021) elaborate on the role of literature on cultural heritage to help people understand the cultural context of an era, a nation, a monument, a place, etc., enabling them to adopt more inclusive and equitable attitudes and behaviors. The proposed language understanding automations can help humans classify literature faster and more consistently (Contribution 9).

The tenth paper focuses on the smart evolution of historical cities, integrating innovative solutions that support the energy transition while respecting cultural heritage. Tsoumanis et al. (2021) carried out a study for the implementation of Building-Integrated Photovoltaics (BIPV) solutions in the Historic Centre of Évora, within the framework of the European project POCITYF (Project H2020). The proposed solutions aim at fulfilling all the guidelines for preserving the historic center and achieving the positivity metrics agreed with the European Commission on the challenging and indispensable path to the decarbonization of European cities (Contribution 10).

The eleventh paper emanates from the fact that a vital part of humanity's cultural heritage resides in its literature, a rich body of interconnected works revealing the history and workings of human civilization across the eras. In this context, Christou and Tsoumakas (2021) implement a deep-learning-based approach to discover semantic relationships in literary texts (19th-century Greek literature), thus helping to sustain critical cultural insights and facilitating the analysis, organization, and management of collections through the automation of metadata extraction (Contribution 11).

Based on the provided insights, a holistic data-driven CH approach is envisioned in Figure 1, depicting a generic strategy for engaging the audience in collecting, preserving, sharing, and managing digital heritage. Among others, the diagram projects the relation and complementarity of the eleven scientific contributions (*C<sub>i</sub>*) to the different model phases. Hence, the research works included in this Special Issue are highly representative, appropriately demonstrating the main processes of the end-to-end chain. Nevertheless, future multidisciplinary research and collaborations are also highlighted and anticipated, augmenting the outcomes and the impact of the current *Sustainability* volume on *Cultural Heritage Storytelling, Engagement and Management in the Era of Big Data and the Semantic Web*.

**Figure 1.** Cultural Heritage Storytelling, Engagement and Management in the Era of Big Data and the Semantic Web: A holistic end-to-end model for engaging the audience in collecting, preserving, retrieving, authoring, sharing, documenting, experiencing, and managing digital heritage.

Table 2 lists all eleven (11) contributions incorporated in this Special Issue with their associated citations.

**Table 2.** List of contributions with their associated citations.



**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Data supporting this article can be found in the listed contributions and their associated Data Availability Statements.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


## *Article* **Automatic 3D Modeling and Reconstruction of Cultural Heritage Sites from Twitter Images**

**Anastasios Doulamis <sup>1</sup>, Athanasios Voulodimos <sup>2,\*</sup>, Eftychios Protopapadakis <sup>1,2</sup>, Nikolaos Doulamis <sup>1</sup> and Konstantinos Makantasis <sup>3</sup>**


Received: 16 April 2020; Accepted: 19 May 2020; Published: 21 May 2020

**Abstract:** This paper presents an approach for leveraging the abundance of images posted on social media like Twitter for large scale 3D reconstruction of cultural heritage landmarks. Twitter allows users to post short messages, including photos, describing a plethora of activities or events, e.g., tweets are used by travelers on vacation, capturing images from various cultural heritage assets. As such, a great number of images are available online, able to drive a successful 3D reconstruction process. However, reconstruction of any asset, based on images mined from Twitter, presents several challenges. There are three main steps that have to be considered: (i) tweets' content identification, (ii) image retrieval and filtering, and (iii) 3D reconstruction. The proposed approach first extracts key events from unstructured tweet messages and then identifies cultural activities and landmarks. The second stage is the application of a content-based filtering method so that only a small but representative portion of cultural images are selected to support fast 3D reconstruction. The proposed methods are experimentally evaluated using real-world data and comparisons verify the effectiveness of the proposed scheme.

**Keywords:** 3D modeling; 3D reconstruction; event detection; Twitter; spectral clustering; cultural heritage

#### **1. Introduction**

According to UNESCO [1], cultural heritage (CH) is the legacy of physical artifacts and intangible attributes of a group or society that are inherited from past generations, maintained in the present and bestowed for the benefit of future generations. Tangible CH encompasses natural and archaeological sites, historic cities and buildings, monuments, artworks and a plethora of artifacts, located in museums around the world. CH is an important resource for economic growth, employment, new skills, and social cohesion, offering the potential to revitalize urban and rural areas and promote sustainable tourism [2]. Furthermore, CH provides significant information on our origin, knowledge, practices, and traditions. Therefore, CH preservation is an important task in both its tangible and intangible facets [3,4]. Three-dimensional (3D) reconstruction is a widely adopted technique in the case of tangible CH content.

Three-dimensional reconstruction from multiple images is the process of capturing the shape and appearance of real objects [5]. Other definitions include the creation of a virtual model of a lost or partially lost building/artifact [6–8], and virtual reconstruction, which replicates an environment that simulates physical presence [9]. In this paper, we focus on the former case, i.e., 3D reconstruction from multiple images. The image sets required for such a task are available online in either repositories (e.g., Flickr) or social media (e.g., Facebook, Twitter). There is one major drawback: the sources are unrefined. Limited/incorrect tags, noise, occlusions, missing angles of view, and low resolution are the main problems. In our case, we address all these problems using Twitter as the image repository.

Twitter is a popular and widely used microblogging service, as manifested by the hundreds of millions of new tweet microposts that are posted online daily, providing various research opportunities in data extraction and information retrieval [10–12]. In addition, tweets may contain photos. Thus, the Twitter medium can be seen as a network of "distributed cameras." Ideally, a set of images supporting an adequate 3D reconstruction can be retrieved from Twitter. Subsequently, the Structure from Motion (SfM) algorithm can be exploited to provide the 3D model [13]. Yet, before drawing any early conclusions, we should consider the limitations of such an approach.

The 3D reconstruction of a CH asset from images in the wild has already been investigated [14]. However, relying on Twitter-sourced images is challenging, mainly because the images were collected for purposes other than digitization [14]. Therefore, they (i) are inadequate as raw data, since several parts of the CH objects are missing; (ii) present a lot of noise (e.g., occlusions of any type); and (iii) were captured under quite different conditions, i.e., varying resolution, image calibration, and registration. Although SfM is able to handle noisy images, computational complexity is a problem. As such, content-based retrieval and filtering tools need to be implemented to select only a small subset from the available images, i.e., the ones that contribute most to the 3D modeling.

In this paper, we propose a novel mechanism for automatic 3D reconstruction of important cultural sites and monuments from photos originating from tweet messages. Nowadays, users often post photos from sites they visit and comment on them using Twitter. Therefore, a significant number of images are available online, allowing for 3D reconstruction of CH objects. The core idea is to see the tweets as "distributed social sensors" and, thus, exploit the huge volume of such information for fast—and approximate—digitization of CH content.

The remainder of the paper is organized as follows: Section 2 describes previous works. An overview of the proposed approach is presented in Section 3. Section 4 describes the image collection process as an event detection problem, whereas Section 5 presents the content-based filtering step for representative image selection for 3D reconstruction. Section 6 provides an experimental evaluation of the proposed methods, and Section 7 concludes the paper with a summary of findings.

#### **2. Related Work**

Previous work reviewed here can be grouped into two main categories: one referring to event detection on Twitter and one regarding content-based filtering.

#### *2.1. Event Detection on Twitter*

Any tweet event detection algorithm could be efficiently implemented in two main steps. Firstly, we have the textual characterization of tweet content, which is accomplished through the extraction of textual features that categorize the significance of a word in tweets. The second step involves the development of learning strategies for event retrieval by analyzing the fluctuations in the appearance count of certain words.

Text characterization is usually implemented using the term frequency–inverse document frequency (TF–IDF) metric [15] or distributional features [16]. Yet, neither approach is appropriate for the Twitter case. Tweets are short messages, usually fewer than 280 characters [17], leading to statistical inaccuracies in the traditional document metrics estimated from them. Two Twitter-specific features should also be considered: following and retweeting. The former allows users to subscribe to other users' tweets, i.e., follow the tweets. The latter gives users the ability to forward an interesting tweet to their followers.
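To make the TF–IDF metric concrete, the following is a minimal sketch of the textbook formulation (term frequency normalized by document length, multiplied by the logarithm of the inverse document frequency). The exact weighting used in the cited works may differ, and the function name `tf_idf` and the toy tweets are purely illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Textbook variant (an assumption, not necessarily the cited one):
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy tweets, already tokenized.
tweets = [
    ["acropolis", "sunset", "athens"],
    ["athens", "traffic"],
    ["acropolis", "museum", "athens"],
]
scores = tf_idf(tweets)
```

Because each tweet holds only a handful of words, nearly every term occurs exactly once per message, so the term-frequency factor carries little information. This is one way to see the statistical inaccuracy mentioned above.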

A family of approaches trying to structure tweet content is called news content aggregators. The work of [18] proposes a spatial–temporal algorithm to classify tweets into geographical clusters and then proceed to produce local news, while the work of [19] introduces a TweetSieve demo, in which a user can submit textual queries. However, both approaches are in fact indirect event detection mechanisms that return tweet posts containing words of a submitted query instead of identifying the most important events. Ranking the tweets according to predefined importance scores is also considered, by using either a combination of tweet properties (e.g., retweets, followers) [20] or applying additional features, such as URL links [21].

Elaborating more on the event-detection algorithms for Twitter, two main approaches exist [22]: document-based and feature-based. The first approach detects events by clustering tweet posts, while the second exploits the distributions of words by extracting a pool of key words that present similar temporal behavior [23]. The proposed event detection method in this paper exploits feature-based metrics. As such, the following paragraphs focus on the specific topic.

The most commonly used static feature-based methods use latent Dirichlet allocation (LDA) [24]. In particular, [25–28] combined LDA with graph models, appropriate metrics (exploiting Twitter-specific parameters, such as followers), and features able to identify the most important events within tweets. Geographic-based topic analyses using LDA models were presented in [29,30]. The works of [31,32] modify the LDA metric to cover sequential summarization of trendy stories over Twitter, and the location–time constrained topic (LTT) for capturing the spatial and temporal content properties, respectively.

The work of [23] introduces a frequent pattern mining method. Alternatively, the authors of [33] analyze temporal feature trajectories for event detection and then apply the discrete Fourier transformation (DFT) on the time–word signals. In this case, events are detected by analyzing spikes in the frequency domain. However, when using DFT, time information is lost. In the same direction, the work of [22] exploits a clustering methodology based on wavelet-transformed word's distribution signals in order to identify events from a collection of tweet posts.
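The loss of time information under the DFT can be illustrated with a short sketch. It assumes a hypothetical hourly count signal for a single word and uses a naive DFT written out explicitly. The same burst placed at two different times yields identical magnitude spectra, so the spectrum alone cannot tell when the burst happened.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitude spectrum of a real signal via the naive O(n^2) DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


# Hypothetical hourly counts of one word: the same burst at two times.
early_burst = [0, 5, 9, 5, 0, 0, 0, 0]
late_burst = [0, 0, 0, 0, 0, 5, 9, 5]

early_spectrum = dft_magnitudes(early_burst)
late_spectrum = dft_magnitudes(late_burst)

# A circular time shift changes only the phase, so the magnitudes coincide.
same_spectrum = all(
    abs(a - b) < 1e-9 for a, b in zip(early_spectrum, late_spectrum)
)
```

Here `same_spectrum` evaluates to true even though the two signals peak four hours apart, which is why a time-localized representation such as a wavelet transform is attractive for event detection.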

Recently, an event photo mining approach was proposed [34], where tweets, including geotags and photos, were mined to detect various events. Spatial and temporal analysis is presented in [35], while one of the first approaches of visual detection of events in Twitter messages is given in [36]. Finally, clustering methods are proposed in [37–39]. In [40], 3D reconstruction of cultural images is accomplished exploiting tweet messages. This work, however, focuses only on a simple information-theoretic metric, exploits limited advances in graph theory regarding the analysis of the content, and introduces a small set of results.

#### *2.2. Content-Based Filtering*

Many works in the content-based filtering field start from an initial retrieved image set and then apply hierarchical clustering to filter out images. However, the efficacy of such an approach inherently depends on the visual properties of the reference image(s) and the way the data were captured, e.g., camera parameters, lighting conditions, and existence of occlusions. In [41], a content-based retrieval system was presented with the objective to recall all views (instances) of an object in a large database upon a query exploiting visual similarities. Video content-based retrieval was introduced in [42,43], while a mobile agent was presented in [44]. All the methods, however, need a query to get the retrievals, something which is not relevant in our approach.

In this context, [45] uses image signatures to form clusters of photos that share similar properties. The grouping exploits color information. On the other hand, the work of [46] uses fuzzy support vector machines for the unsupervised clustering. These approaches share the same limitation: they use global image features to encode visual content. Global visual representation fails to describe the different view instances of an object, since both the geometry of the foreground, as projected onto the 2D image plane, and the content of the background are quite dissimilar. A fast content-based image retrieval system that relies on hierarchical embedded image clouds is presented in [47]. The work of [48] integrates wavelet transforms, local binary patterns, and moments for an efficient content-based image retrieval system.

In this paper, we need to identify and cluster different views of a CH monument. Such an approach was initially introduced by [14]. The key difference between that approach and the present one is the use of a Twitter-based event detection scheme to extract key textual events upon which the clustering is accomplished. This constitutes one of the main original contributions of this paper: a social network is exploited as a medium for rapid 3D reconstruction of cultural heritage items.

#### **3. An Overview of the Proposed Approach**

Our approach towards 3D reconstruction over Twitter resources is decomposed into two innovative steps, plus one for reconstruction. The first step refers to the application of a Twitter-oriented event detection algorithm [49] so as to classify tweets with respect to their content and then extract images describing a particular CH site. The second step involves the application of an automatic algorithm [14] that clusters the selected images into main groups with respect to the orientation of the cultural site they view, removing outliers so as to reduce the time of 3D reconstruction. The final step involves the use of the SfM algorithm, which is a photogrammetric range imaging method for calculating three-dimensional structures from two-dimensional image sequences, possibly coupled with local motion signals [50].

At first, an event detection algorithm from tweets, based on new information characterization metrics, is utilized. The new metrics are based on the TF–IDF metric [15], appropriately modified to fit tweet characteristics such as number of followers and retweets, as in [49]. Then, a wavelet transformation is exploited to localize the signals in time and frequency domain, simultaneously. Recall that tweets are not synchronized messages. Thus, localization is important prior to any event–cluster approach implemented.

A graph–cut approach, over the calculated metrics' scores, is applied to identify the various events. In order to create the graph, the cross-correlation criterion serves as the distance metric; it can express the similarity of two feature vectors, while being invariant to scale and translation. Once the graph over tweets is completed, clustering can be interpreted as a graph partitioning problem. The adopted graph cut problem is formulated in a way that an optimal solution can be found so that intracluster elements present the maximum coherence, while the intercluster elements present the minimum one. As such, no feedback is required for the definition of the number of clusters. In this graph partitioning problem, we contribute by allowing for a multiassignment clustering since, in our context, one word can belong to several clusters (events).

Multiassignment clustering is achieved by introducing a modification of the spectral clustering algorithm. In our approach, we estimate the membership function of a word as the degree to which it may belong to an event, and then we obtain a discrete approximation to this continuous solution that satisfies the multiassignment clustering property. Finally, tweets of similar content (i.e., located at the same cluster) are organized further to create 3D models by "structuring" image content of the same object/place.

The creation of any 3D model is straightforward. Initially, we analyze the image content through suitable features and locate correspondences between similar images. Then, to remove image outliers from the retrieved tweet photos, we consider each image as a point onto a multidimensional hyperspace manifold, as in [14]. The coordinates of each image point on the manifold express the position of the images on the hyperspace. Therefore, they constitute a clear indicator of how close the images are, allowing for the implementation of a density-based approach for the outlier detection.

The tweet photos considered for a given reconstruction share similar textual information. In our case, this information is derived from the detected tweet events, in contrast to [12], where simple textual information is exploited to discard unrelated images. This is one of the main innovative elements of this work, since we exploit the properties of Twitter for 3D reconstruction from unstructured, uncalibrated, and not properly annotated images. The advantages of Twitter as the selected social network for the discussed application pertain to its API, as well as the brevity of the posts (tweets), which facilitates the application of event detection methods, in contrast with other social media, e.g., Facebook, where posts are far more text-heavy. One of the drawbacks of this approach (not just of using Twitter, but of crowdsourcing for images) is that crowdsourced images do not equally cover all surfaces of a cultural monument, since the majority of the photos posted on social media depict the facades of monuments that are most accessible by visitors (e.g., front face vs. roofs). This limitation unavoidably leads to an imbalanced reconstruction density, whereby the more popular and accessible parts of a monument are reconstructed in a more precise manner compared to the remaining ones. However, creating a fully accurate 3D model with equal reconstruction density necessarily requires sophisticated equipment that would allow acquisition of images from all parts to be processed using photogrammetric techniques; such an approach entails a far larger scale of resources and expertise and has a different scope compared to the proposed work.

In the sequel, we leverage a density-based spatial clustering algorithm to eliminate outliers. In particular, we select the density-based spatial clustering of applications with noise (DBSCAN) algorithm, because of its robustness to outliers. For visual analysis, the ORB (Oriented FAST and Rotated BRIEF) descriptor is utilized. In the next phase, the compact subspace is partitioned into regions including salient geometric perspectives of each asset. The goal is to find representative views, which are then fed as input to the SfM algorithm for 3D reconstruction.

#### **4. Image Collection as a Twitter Event Detection Problem**

#### *4.1. Tweet-Post Characterization*

There are specific research challenges that have to be addressed for an adequate tweet-post characterization. First, tweet messages are often unstructured, meaning that words of an event do not appear absolutely synchronized, requiring new forms of presentation to compensate for these temporal variations. Second, multiassignment clustering approaches are needed since one word can belong to several events. This is in contrast to conventional one-class assignment clustering methods. Third, since words belong to several events, clusters are not well-separable, requiring advanced methods, such as graph partitioning.

The adopted information theoretic metrics, for tweet-post characterization, are based on modifications of the TF–IDF score. TF–IDF score is a statistical measure that evaluates how important a word is to a document in a collection of documents. The importance of a word proportionally increases with the number of times the word appears in a document but is logarithmically offset by the frequency of the word in the corpus of documents. For the sake of completeness, consider the following example.

We are given a document containing 100 words wherein the word "Cat" appears 3 times. The term frequency (TF) for the word "Cat" is then 0.03. Now, assume that we have a corpus of 10 million documents and the word "Cat" appears in one thousand documents. Then, the inverse document frequency (IDF) is calculated as log(10,000,000/1,000) = 4, and the TF–IDF score is the product of these quantities: 0.03 × 4 = 0.12. Thus, TF expresses how often a word appears within the tweets extracted at the current time instance k, while IDF expresses the information contained in the word based on tweets gathered at previous time instances. If a given word is common, IDF takes low values to compensate for the high values expected for TF. On the contrary, if the word is rare (thus the reader gets a surprise, or "more information," when he/she sees it), IDF is high and strengthens the TF term. Consequently, TF–IDF takes high values for words that are not statistically frequent in the previous collections of tweets, but present high appearance rate at the current time instances k.
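The arithmetic of this example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `tf_idf` and its parameter names are introduced here for illustration, and the base-10 logarithm is the one implied by the numbers in the example.

```python
import math

def tf_idf(term_count, doc_length, corpus_size, docs_with_term):
    """Return (TF, IDF, TF-IDF) for one word, using a base-10 logarithm."""
    tf = term_count / doc_length                    # share of the document taken by the word
    idf = math.log10(corpus_size / docs_with_term)  # rarity of the word across the corpus
    return tf, idf, tf * idf

# The "Cat" example: 3 occurrences in 100 words, 1,000 of 10,000,000 documents.
tf, idf, score = tf_idf(3, 100, 10_000_000, 1_000)
print(tf, idf, round(score, 2))
```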

#### *4.2. Twitter-Based Information Metrics*

As explained in Section 4.1, TF–IDF is not suitable for the Twitter case. Consequently, the same modifications as in [49] are adopted. The first theoretic metric, θ1(k, w), is defined as:

$$\theta_1(k, w) = \frac{N^{(w)}(k)}{N(k)} \cdot \log \frac{\sum_{i=1}^{p} N(k - i)}{\sum_{i=1}^{p} N^{(w)}(k - i)} \tag{1}$$

where N(k) is the number of tweets posted within the k-th time interval and N<sup>(w)</sup>(k) is the number of those tweets that contain a specific word w. Thus, θ1(k, w) measures the fraction of tweets that contain w within the k-th time interval, compensated by the fraction of tweets that include this word over the p previous time intervals. The second metric θ2(k, w) measures the frequency of the word w in the tweet posts, compensated by its frequency over the p previous time intervals:
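As an illustration of how the first metric responds to a burst, the following sketch evaluates Eq. (1) on two toy count series. The function name `theta1`, the list-based inputs, the toy numbers, and the choice of a base-10 logarithm (carried over from the example of Section 4.1) are all assumptions made here, not part of the original method description.

```python
import math

def theta1(n_word, n_total, k, p):
    """Sketch of Eq. (1).

    n_word[i]  -- number of tweets in interval i that contain the word
    n_total[i] -- number of tweets posted in interval i
    """
    if k < p:
        raise ValueError("need at least p earlier intervals before k")
    past_total = sum(n_total[k - i] for i in range(1, p + 1))
    past_word = sum(n_word[k - i] for i in range(1, p + 1))
    if past_word == 0:
        # Eq. (1) is undefined for a word never seen in the previous p intervals.
        raise ValueError("word unseen in the previous p intervals")
    return (n_word[k] / n_total[k]) * math.log10(past_total / past_word)

n_total = [1000, 1000, 1000, 1000]   # hypothetical tweet volumes
bursty = [10, 10, 10, 300]           # rare word that suddenly appears often
steady = [100, 100, 100, 100]        # word with a constant appearance rate
score_bursty = theta1(bursty, n_total, k=3, p=3)
score_steady = theta1(steady, n_total, k=3, p=3)
print(score_bursty, score_steady)
```

The bursty word receives the higher score, as intended. The same function evaluates the second metric if it is fed word-occurrence counts and total word counts instead of tweet counts.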

$$\theta_2(k, w) = \frac{C^{(w)}(k)}{C(k)} \cdot \log \frac{\sum_{i=1}^{p} C(k - i)}{\sum_{i=1}^{p} C^{(w)}(k - i)} \tag{2}$$

Variable C<sup>(w)</sup>(k) counts how many times a word w appears in the N(k) tweets, and C(k) is the total number of words that appear within the N(k) tweets.

The third metric θ3(k, w) takes into consideration the significance of a tweet as expressed either by the number of followers or by the number of retweets it produces. The number of followers, denoted as fm(k), with m = 1, . . . , N(k), indicates the credibility of the author of the tweet. The number of retweets rm(k) is a metric for ranking the importance of the textual content posted on the tweets. The variable θ3(k, w) is defined as:

$$\theta_3(k, w) = \frac{\sum_{m=1}^{N(k)} p_m^f(k) \cdot p_m^r(k) \cdot i_m(w, k)}{\sum_{m=1}^{N(k)} p_m^f(k) \cdot p_m^r(k)} \cdot \log \frac{\sum_{j=1}^{p} \sum_{m=1}^{N(k-j)} p_m^f(k - j) \cdot p_m^r(k - j)}{\sum_{j=1}^{p} \sum_{m=1}^{N(k-j)} p_m^f(k - j) \cdot p_m^r(k - j) \cdot i_m(w, k - j)} \tag{3}$$

Variable im(w, k) is defined as:

$$i_m(w, k) = \begin{cases} 1 & \text{if the } m\text{-th tweet contains } w \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Variables p<sub>m</sub><sup>f</sup>(k) and p<sub>m</sub><sup>r</sup>(k) are the normalized versions of the number of followers and retweets over the k-th time interval. They are defined as:

$$p_m^f(k) = f_m(k) \Big/ \sum_{m=1}^{N(k)} f_m(k) \tag{5}$$

and

$$p_m^r(k) = r_m(k) \Big/ \sum_{m=1}^{N(k)} r_m(k) \tag{6}$$
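To make Eqs. (3)–(6) concrete, the sketch below evaluates the third metric on a toy two-interval dataset. The data layout (one `(followers, retweets, words)` tuple per tweet), the function names, the toy values, and the base-10 logarithm are assumptions made for illustration; the sketch also assumes that every interval has non-zero follower and retweet totals and that the word occurs at least once in the previous p intervals.

```python
import math

def normalise(values):
    """Eqs. (5)-(6): divide each value by the total over the interval."""
    total = sum(values)
    return [v / total for v in values]

def weighted_share(tweets, word):
    """Return (weight carried by tweets containing `word`, total weight)
    for one interval, where a tweet's weight is p^f * p^r."""
    pf = normalise([followers for followers, _, _ in tweets])
    pr = normalise([retweets for _, retweets, _ in tweets])
    weights = [a * b for a, b in zip(pf, pr)]
    hit = sum(w for w, (_, _, words) in zip(weights, tweets) if word in words)
    return hit, sum(weights)

def theta3(intervals, word, k, p):
    """Sketch of Eq. (3): weighted term frequency times weighted log ratio."""
    hit_k, all_k = weighted_share(intervals[k], word)
    past = [weighted_share(intervals[k - j], word) for j in range(1, p + 1)]
    past_hit = sum(h for h, _ in past)
    past_all = sum(a for _, a in past)
    return (hit_k / all_k) * math.log10(past_all / past_hit)

# Hypothetical data: "acropolis" moves from a minor account to an influential one.
intervals = [
    [(100, 10, {"acropolis"}), (900, 90, {"museum"})],   # interval k - 1
    [(900, 90, {"acropolis"}), (100, 10, {"museum"})],   # interval k
]
score_acropolis = theta3(intervals, "acropolis", k=1, p=1)
score_museum = theta3(intervals, "museum", k=1, p=1)
print(score_acropolis, score_museum)
```

In this toy case the word that is now carried by the influential, heavily retweeted tweet scores far higher than the word that has moved to the minor account.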

The outputs of the information theoretic metrics, over a long period of K intervals, are utilized for the calculation of a signal x<sub>w</sub><sup>(k)</sup>, which measures the impact of the word w on the posted tweet content. Please note that K and p are different variables: K is a time window over which we evaluate the time series signal of a word w, whereas p refers to a time period over which we take statistics for calculating the IDF term of Equations (1)–(3).

Noise variations of a word's signal, due to time shifts (i.e., temporal variations), are observed. Such variations are caused by temporal delays and by the fact that messages posted on Twitter are not synchronized. The discrete wavelet transform (DWT) is applied on the raw signal of the information theoretic metrics, as in [49], to mitigate such noise effects. DWT is chosen since it can localize a signal in both the time and frequency domains simultaneously. In what follows, let us denote s<sub>w</sub><sup>(k)</sup> as the wavelet-transformed signal of x<sub>w</sub><sup>(k)</sup>.
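The effect of the transform on small time shifts can be illustrated with a single-level Haar DWT. This is only a sketch: the text does not state which wavelet family or decomposition depth is used, so the Haar wavelet, the single level, and the toy signals below are assumptions.

```python
import math

def haar_dwt(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1 / math.sqrt(2)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * scale for a, b in pairs]   # coarse, low-frequency part
    detail = [(a - b) * scale for a, b in pairs]   # fine, high-frequency part
    return approx, detail

# Two hypothetical word signals with the same burst, posted one interval apart.
x_a = [0, 0, 4, 0, 0, 0, 0, 0]
x_b = [0, 0, 0, 4, 0, 0, 0, 0]
approx_a, detail_a = haar_dwt(x_a)
approx_b, detail_b = haar_dwt(x_b)
print(approx_a == approx_b, detail_a == detail_b)
```

Here the two coarse (approximation) signals coincide although the raw signals differ, because the one-interval delay falls inside a single Haar window; the delay is pushed into the detail coefficients. A shift that straddles two windows would not be absorbed at this level.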

#### *4.3. Defining the Similarity Metric*

Euclidean distance is a commonly used approach in many cases, e.g., [51–53]. Yet, the main limitation of such a distance is that it does not directly express the similarity between two feature vectors, since it is sensitive to feature vector scaling and/or translation. For this reason, normalized cross-correlation has been widely used, as it remains unchanged with respect to feature vector scaling or translation. For example, adding a constant value to, or multiplying by a constant value, all elements of a feature vector affects the Euclidean distance but not the normalized cross-correlation (for an added constant, this holds when the vectors are mean-centered before the correlation is computed).

The normalized cross-correlation D<sub>cc</sub>(w<sub>i</sub>, w<sub>j</sub>) between the two words' signals s<sub>w<sub>i</sub></sub> and s<sub>w<sub>j</sub></sub> is given by:

$$D_{cc}(w_i, w_j) = \frac{s_{w_i}^{T} \cdot s_{w_j}}{\sqrt{s_{w_i}^{T} \cdot s_{w_i}} \cdot \sqrt{s_{w_j}^{T} \cdot s_{w_j}}} \tag{7}$$

Thus, in the following, we adopt the normalized cross-correlation distance as the similarity metric. Initially, we discuss how the signal x<sub>w</sub><sup>(k)</sup> behaves on the cross-correlation metric. We recall that signal x<sub>w</sub><sup>(k)</sup> refers to one of the three metrics (see Equations (1)–(3)).
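The invariance properties of Eq. (7) can be checked numerically. In the sketch below, the function names and toy vectors are illustrative assumptions. Eq. (7) as written is unchanged by a positive rescaling of either signal; an additive offset is removed only if the signals are mean-centered first, which the sketch treats as an optional pre-processing step that is not stated explicitly in the text.

```python
import math

def normalized_cross_correlation(s_i, s_j):
    """Eq. (7): inner product divided by the product of the two norms."""
    dot = sum(a * b for a, b in zip(s_i, s_j))
    norm_i = math.sqrt(sum(a * a for a in s_i))
    norm_j = math.sqrt(sum(b * b for b in s_j))
    return dot / (norm_i * norm_j)

def mean_centered(s):
    """Subtract the mean (assumed pre-processing for offset invariance)."""
    mean = sum(s) / len(s)
    return [v - mean for v in s]

s = [1.0, 2.0, 3.0, 4.0]
scaled = [10 * v for v in s]       # same shape, different amplitude
shifted = [v + 5 for v in s]       # same shape, constant offset

d_scaled = normalized_cross_correlation(s, scaled)
d_shifted_raw = normalized_cross_correlation(s, shifted)
d_shifted_centered = normalized_cross_correlation(mean_centered(s), mean_centered(shifted))
print(d_scaled, d_shifted_raw, d_shifted_centered)
print(math.dist(s, scaled), math.dist(s, shifted))   # Euclidean distance changes in both cases
```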

#### *4.4. Multiassignment Graph Partitioning Approach to Event Detection in Twitter*

For the detection of the events, we exploit concepts of the graph partitioning problem under a multiassignment perspective. More specifically, let G = {V, E} denote the graph whose vertex set V = {w<sub>1</sub>, . . . , w<sub>L</sub>} corresponds to the set of L different words and whose edges e<sub>ij</sub> = (w<sub>i</sub>, w<sub>j</sub>) carry the distances between two words w<sub>i</sub>, w<sub>j</sub>, as described in Section 4.3. Our goal is then to divide the graph into M partitions (clusters), each of which contains words that were extracted from the tweet-post collection and present similar burst patterns. Since one word can belong to more than one event, a multiassignment graph partitioning is adopted [49].

More specifically, the partitioning is carried out using concepts of spectral clustering [54], properly modified to cope with the multiassignment graph partitioning problem. Spectral clustering finds the most suitable clusters so as to maximize the coherence of all nodes belonging to a partition while simultaneously minimizing the coherence of the nodes assigned to different partitions. For the minimization, the graph is represented as a matrix and then the Ky Fan theorem [55] is applied to decompose the matrix into optimal solutions based on an eigenvalue decomposition approach. The key problem of such an optimization is that it gives the optimal solution in a continuous form [56]. In our tweet-based event detection problem, each message belongs to a limited number of events. To address this, we need to approximate the solution under a multiassignment clustering framework. This is achieved by allowing each row of the matrix to have more than one unit element. In the sequel, an iterative methodology is carried out to recover the optimum discrete approximate solution. This methodology recursively estimates an optimal rotation matrix so that the continuous solution is as close as possible to the discrete one. This optimization can be solved through a singular value decomposition problem, as described in [49].
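For reference, the continuous relaxation mentioned above can be written in generic notation. The symbols A (a symmetric L × L matrix built from the word similarities) and Z (an L × M relaxed assignment matrix) are introduced here only for illustration; the exact matrices used by the method are those of [49,54]. In this notation, the Ky Fan theorem states that

$$\max_{Z \in \mathbb{R}^{L \times M},\; Z^{T} Z = I_M} \operatorname{tr}\left(Z^{T} A Z\right) = \sum_{m=1}^{M} \lambda_m(A),$$

where λ<sub>1</sub>(A) ≥ . . . ≥ λ<sub>L</sub>(A) are the eigenvalues of A, and the maximum is attained when the columns of Z are the eigenvectors of the M largest eigenvalues. Any rotated solution Z · R, with R an orthogonal M × M matrix, attains the same value; this freedom is what allows a rotation to be sought that brings the continuous solution as close as possible to a discrete one.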

Spectral clustering is selected in lieu of other partitioning methods such as k-means and min-cut graph partitioning. More specifically, in [49], we compared the effect of different clustering methods on event detection performance from tweets on the basis of precision–recall scores on a ground truth dataset. The analysis concludes that the multiassignment graph partitioning algorithm outperforms other simpler forms of partitioning that use objective metric criteria. This is the reason for selecting this clustering scheme over other approaches, as it seems to better "model" the structure of tweet messages.

#### **5. Representative Image Selection as a Content-Based Filtering Problem**

Having detected the events from the tweet messages, the next step of our approach is to identify the most relevant images on a given CH asset, removing the outliers, and finding the most representative views of the monument. To achieve this objective, the following steps are incorporated in our research.

#### *5.1. Visual Modeling and Image Representation onto Multidimensional Manifolds*

Local visual descriptors are used to capture the different geometric perspectives of an object. The calculated local similarities, among the selected images, are required for a 3D reconstruction.

In this paper, we constrain the problem so that the selected images belong to the same event, which represents the same touristic location and monument. The ORB descriptor [57] is utilized since it is fast and, simultaneously, robust. Let us now assume that we have two images named A and B, selected from different tweets that refer to the same cultural object. In order to find the corresponding points for these two images, multiprobe locality sensitive hashing [58] is used for nearest-neighbor search exploiting the Hamming distance. This approach is used because ORB keypoints are described by a binary pattern.
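The nearest-neighbor search over binary descriptors can be sketched as follows. For brevity, the sketch uses an exhaustive search rather than multiprobe locality sensitive hashing, represents each descriptor as a Python integer (8 bits here, whereas ORB descriptors are much longer), and keeps only mutual nearest neighbors as one common way of obtaining reliable correspondences. All names and values are illustrative assumptions.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def nearest(query, candidates):
    """Index of the candidate descriptor closest to `query` in Hamming distance."""
    return min(range(len(candidates)), key=lambda j: hamming(query, candidates[j]))

def two_way_matches(desc_a, desc_b):
    """Keep pairs (i, j) that are each other's nearest neighbour in both directions."""
    a_to_b = [nearest(d, desc_b) for d in desc_a]
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

# Hypothetical 8-bit descriptors of keypoints in images A and B.
desc_a = [0b11110000, 0b00001111, 0b10101010]
desc_b = [0b00001110, 0b11110001, 0b01010101]
matches = two_way_matches(desc_a, desc_b)
print(matches)
```

The third keypoint of image A has no counterpart in image B and is discarded by the mutual check.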

For a given dataset, all possible pairs of images are considered for a two-way matching of the corresponding points. The corresponding points allow for the calculation of a similarity metric among all images, as presented in [14]. If two images are visually similar, the respective values in the similarity matrix will be small. This means that if images are represented as points on a multidimensional manifold, then visually similar images will belong to high spatial density subspaces, while image outliers will be spread out. If we define as X a matrix containing the coordinates of all N images in the dataset, the classical multidimensional scaling (cMDS) [59] can be used to establish a connection between the space of the distances and the space of the Gram matrix B = X · X<sup>T</sup>, as proven in [60]. Figure 1 depicts the position of inliers and outliers in a 2D space when applying cMDS on a set of image data.
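The link between pairwise distances and the Gram matrix that cMDS relies on can be verified on a toy example. The sketch below assumes Euclidean distances between known, mean-centered 2D points purely so that the result can be checked; in the described pipeline, the distances would come from the image matching step. Function names and coordinates are illustrative, and the final eigendecomposition of B that yields the coordinates is omitted to keep the sketch dependency-free.

```python
def squared_distances(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]

def double_center(d2):
    """B = -1/2 * J * D2 * J, with J the centering matrix (classical MDS step)."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]
    col_mean = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]

def gram(points):
    """Gram matrix X * X^T computed directly from the coordinates."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]

# Hypothetical mean-centered coordinates of four images.
X = [(-1.0, 0.0), (1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
B_from_distances = double_center(squared_distances(X))
B_direct = gram(X)
print(B_from_distances[0], B_direct[0])
```

The two matrices agree, which shows that the Gram matrix, and hence a set of coordinates, can be recovered from distances alone.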

**Figure 1.** Distribution of inliers/outliers using images projected onto a 2D space when applying the classical multidimensional scaling (cMDS) algorithm. Illustration is for a given monument.

#### *5.2. Additional Content Refinement and Extraction of Representative Views*

Outlier removal exploits density properties of the projected images. In particular, we assume that outliers lie in isolated regions. Then, we extract these outliers by identifying the corresponding areas, as in [14]. To further improve the outlier removal accuracy, we implement ST–DBSCAN [61], a variation of the density-based spatial clustering of applications with noise (DBSCAN) algorithm. Then, we apply a graph-partitioning algorithm for finding the most diverse views of a cultural heritage asset. The partitioning is applied on the detected compact subset of relevant images, as explained in the previous paragraph. The main goal of this partitioning is to separate the image data into clusters of representative views of an object.
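The density-based outlier removal step can be illustrated with a compact implementation of plain DBSCAN. This is a sketch of the basic algorithm, not of the ST–DBSCAN variation used here, and the parameter values `eps` and `min_pts` as well as the toy coordinates are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        # A point counts as its own neighbour, as in the usual definition.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # noise reachable from a core point: border point
            if labels[j] is not None:
                continue                   # already assigned, do not expand again
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, so the cluster grows
                queue.extend(reachable)
    return labels

# Hypothetical 2D image coordinates: a dense group of inliers and two isolated outliers.
coords = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (3.0, 3.0), (-2.0, 4.0)]
labels = dbscan(coords, eps=0.3, min_pts=3)
inliers = [p for p, label in zip(coords, labels) if label != -1]
print(labels)
```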

The discrimination among representative views of an object is achieved by using spectral clustering. In this way, clustering is addressed as a graph partitioning problem, hence no assumptions are made regarding the generated clusters. Let us, again, assume a graph G = (V, E), where V denotes the vertices of the graph and E the edges of the graph. In this case, vertex set V includes the images of the detected compact subset as extracted by DBSCAN. We also denote as w<sub>i,j</sub> the weight of the edge connecting the i-th with the j-th vertex. In this representation, edge weight w<sub>i,j</sub> is equal to the similarity distance between the descriptors of the images corresponding to the vertices i and j. The spectral clustering approach is similar to the one adopted for event detection. However, in this case the conventional spectral approach is followed since we do not need a multiassignment partitioning.

#### *5.3. Twitter Content Copyright Issues*

An important issue pertains to social media content and copyright. Today, a vast amount of data is available on social media and posted daily on Twitter, but there is also confusion as to whether these data are in the public domain, covered by fair use, licensed under Creative Commons, or copyrighted [62]. The key question is: "will a short description of 140 characters cause intellectual property rights infringement?" This question cannot be answered in the same manner for all cases and for all countries.

In the US, short quotations or a public Twitter message would most likely qualify as fair use [63]. This is in accordance with the 1961 Report of the Register of Copyrights on the General Revision of the US Copyright Law that cites examples of activities that courts have regarded as fair use, such as, for instance, "quotation of short passages in a scholarly or technical work, for illustration or clarification of the author's observations; reproduction by a teacher or student of a small part of a work to illustrate a lesson; reproduction of a work in legislative or judicial proceedings or reports; incidental and fortuitous reproduction, in a newsreel or broadcast, of a work located in the scene of an event being reported."

In other words, the fair use principle allows others to use copyrighted material in a reasonable manner without the owner's consent, mainly for teaching, scholarship, and research purposes [62]. This restriction does not refer to a certain number of words; rather, it is applied on a case-by-case basis. It all depends on the circumstances [63].

#### **6. Experimental Results**

In the following, we first evaluate the event detection algorithms from tweets (Section 4) and then the proposed content-based filtering method (Section 5) that eventually lead to 3D reconstruction after employing SfM techniques. In this work, VisualSFM was used, a GUI application for 3D reconstruction using SfM, exploiting multicore parallelism for feature detection, feature matching, and bundle adjustment [64].

#### *6.1. Evaluation of Event Detection on Twitter*

#### 6.1.1. Evaluation under a Controlled Environment

We assume that at each time interval *k*, 3000 tweets are posted, which are equally categorized as being of high, medium, or low importance. We also assume that 10 events take place, each consisting of five words, in order to form the ground truth set *W<sub>gt</sub>*. To generate the words within an event, we develop a word generator. This generator produces tweets that contain a particular word based on a probability that indicates the percentage of the tweets of a particular category (high, medium, low importance) that have posted the specific word. In our experiments, this probability follows a Gaussian distribution; the mean value corresponds to the probability of the event this word is assigned to (average number of tweet-post words of this event over the total number of tweets), while the standard deviation σ regulates the degree of coherence that the respective event has in terms of word appearance. Small values of the standard deviation mean that almost all the words within an event are synchronized in time, since their appearance probability is quite similar for every time interval. The opposite holds for high standard deviation values.
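A word generator of this kind can be sketched as follows for a single event and a single importance category. The function names, the clipping of probabilities to [0, 1], the fixed random seed, and the toy event profile are assumptions made for illustration; in particular, σ is expressed in probability units here, whereas the scaling of σ in the reported experiments is not specified in this section.

```python
import random

def word_probabilities(event_prob, sigma, words_per_event, rng):
    """Draw one appearance probability per word around the event's probability."""
    return [min(1.0, max(0.0, rng.gauss(event_prob, sigma))) for _ in range(words_per_event)]

def simulate_counts(event_profile, sigma, tweets_per_category=1000, words_per_event=5, seed=0):
    """Simulate, for one event and one tweet category, how many tweets contain each word.

    event_profile[k] is the event's probability in time interval k.
    Returns counts[w][k] for word w and interval k.
    """
    rng = random.Random(seed)
    counts = [[0] * len(event_profile) for _ in range(words_per_event)]
    for k, event_prob in enumerate(event_profile):
        probs = word_probabilities(event_prob, sigma, words_per_event, rng)
        for w, prob in enumerate(probs):
            # Each tweet of the category independently contains the word with probability prob.
            counts[w][k] = sum(rng.random() < prob for _ in range(tweets_per_category))
    return counts

profile = [0.0, 0.2, 0.0]                      # hypothetical event active only in interval 1
synchronized = simulate_counts(profile, sigma=0.0)
dispersed = simulate_counts(profile, sigma=0.1)
print(synchronized)
print(dispersed)
```

With σ = 0, all five words of the event burst in the same interval with similar counts; a larger σ spreads the appearance rates of the words, which is what makes the event harder to detect.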

Figure 2 shows an example of the probability distribution for three out of the ten events generated under the controlled environment being posted by the most important tweets over 20 time intervals (periods). We observe that the three events have a different time period of occurrence. To reduce the statistical noise, 500 experiments are conducted and then we take the average.

**Figure 2.** Distribution probability over 20 time intervals (periods) for three different events regarding the most important tweets.

Figure 3 presents the precision–recall results for the three metrics θi(k, w), i = 1, 2, 3 when the wavelet representation described in Section 4.2 is applied (wavelet) or not (nonwavelet). We observe that metric θ3(k, w) yields higher precision values for the same recall than the other two metrics, while the lowest precision values are achieved for metric θ2(k, w). A slight improvement in the precision values is also noticed for the wavelet representation described in Section 4.2 and for all metrics due to a better compensation of the temporal variations of the words' signals.

**Figure 3.** Precision–recall curve for the three proposed metrics using both wavelet and nonwavelet representation and for a standard deviation σ equal to 1.

In Figure 4, precision values are plotted against the standard deviation σ for a recall value RE equal to 0.7. We notice that as σ increases, the precision decreases for all metrics, since a higher standard deviation means that the words of an event are not posted in a synchronized manner. Additionally, from Figure 4, we can see that the use of the wavelet representation makes the proposed event detection algorithm more robust to noise. In contrast, for small values of noise (standard deviation), the wavelet has practically no effect on precision, since the words are posted in a synchronized way. Similar conclusions are drawn from Figure 5 for RE = 0.4.

**Figure 4.** Precision versus standard deviation σ for the three tweet-post characterization metrics when we (do not) apply the wavelet representation for a recall value RE = 0.7.

**Figure 5.** Precision versus standard deviation σ for the three tweet-post characterization metrics when we (do not) apply the wavelet representation for a recall value RE = 0.4.

#### 6.1.2. Evaluation on Real-World Data

We extracted real-world tweet data using a publicly available API of Twitter. Using this API, we downloaded tweets spanning a one-month period, with six-hour duration time intervals. We filtered the results using cultural heritage and touristic domains.

Figure 6 shows the precision–recall curve on real-life data. Again, the weighted conditional word tweet frequency metric θ3(k, w) yields the highest precision for a given recall value, while word frequency θ2(k, w) provides the lowest precision accuracy. In Figure 6, we also depict the effect of the wavelet representation on the three information theoretic metrics, verifying again that the use of wavelet representation increases precision scores.

**Figure 6.** Precision–recall curve for the three tweet-post characterization metrics when we (do not) apply the wavelet representation of Section 4.2. The results were obtained using cross-correlation distance on real-life tweet posts.


In Figure 7, we compare several methods proposed in the literature for tweet-based event detection with ours in terms of precision–recall, including (a) the LDA approach, (b) the method of [31], and (c) the method of [26], where event detection on Twitter is based on a combination of LDA with the PageRank algorithm. As is observed, our method outperforms the other methods that are compared, since it better models the dynamic behavior of Twitter data while simultaneously compensating for the time vagueness in an event's appearance.

**Figure 7.** Comparison of the proposed method with other approaches for extracting events in tweets.

#### 6.1.3. Computational Complexity

The computational complexity of the proposed algorithm includes (i) the computational cost needed to construct the word signals and their wavelet representation, and (ii) the clustering computational cost. The complexity for constructing the word signals is of order O(p) for θi(k, w), i = 1, 2, where p is the number of previously examined time intervals, and of order O(p · N) for θ3(k, w), where N is the maximum number of tweets over the p periods. The wavelet representation adds a small cost to the word signal construction. The main bottleneck of the multiassignment graph partitioning algorithm is the eigenvalue decomposition. The fastest implementation for the eigenvalue decomposition problem is through the Lanczos method [65], whose complexity is O(M · L<sup>2</sup> · τ), where M is the number of eigenvalues that have to be computed (the number of events), L is the number of words, and τ is the number of iterations of the algorithm. Typically, M is several times smaller than L and the complexity is of order O(L<sup>2</sup>), where L is bounded since it indicates the number of words.

By adopting a simpler clustering algorithm, like k-means, we can get slightly lower complexity than with the proposed multiassignment graph partitioning. In particular, k-means has O(M · L · τ · q) complexity, where again τ bounds the iterations of the algorithm and q is the size of vector s<sub>w</sub><sup>(k)</sup>. Usually, q is smaller than L, resulting in slightly faster convergence of k-means than of the proposed multiassignment graph partitioning. However, in the case of event detection from tweet posts, the number of words L is often of the order of some hundreds or thousands, meaning that the actual extra running time is of the order of some milliseconds, which makes the computational advantage of k-means practically negligible compared to what we lose in terms of event detection performance. This implies that it is more important to have a better clustering algorithm than to try to improve the running time by a few milliseconds.

#### 6.1.4. Reconstruction Efficiency

In this section, we evaluate the efficiency of 3D reconstruction with and without the use of the proposed Twitter-based algorithm. In particular, Figure 8a depicts the reconstruction obtained with the SfM scheme from 100 inliers and 0 outliers, with all images showing the Monument to the Discoveries (Padrão dos Descobrimentos) in Lisbon, Portugal. These images were randomly selected from tweet posts of users visiting this monument. Figure 8b shows that the reconstruction accuracy is severely affected by the inclusion of 10% of outliers in the image data, that is, 90 inliers and 10 outliers. The results degrade further when the number of outliers is significantly large, reaching 30% (Figure 8c) or even 40% (Figure 8d) of the total number of images. In the last case, the reconstruction is so poor that we are unable to recognize which monument it refers to.

**Figure 8.** The effect of the number of the outliers in reconstruction performance using the SfM algorithm. (**a**) 100 inliers, 0 outliers. (**b**) 90 inliers, 10 outliers. (**c**) 70 inliers, 30 outliers. (**d**) 60 inliers, 40 outliers.

What deteriorates the situation in Figure 9 is that the selected images are not captured using a proper geometry. Instead, they are randomly selected from tweet messages and thus do not correspond to a proper geometry of the whole space. This demonstrates the significance of our scheme: we remove the outliers and keep only the most geometrically representative images. So, although we use a small number of images, we get very high reconstruction accuracy compared with just feeding all the images into the SfM structure, which would also incur a much higher computational cost. More specifically, if we remove the outliers, we get the times depicted in Table 1.

**Figure 9.** The effect of the number of the outliers in reconstruction performance using the SfM algorithm. (**a**) 100 inliers, 0 outliers. (**b**) 90 inliers, 10 outliers. (**c**) 70 inliers, 30 outliers. (**d**) 60 inliers, 40 outliers.

**Table 1.** Execution time for 3D reconstruction with respect to the number of fed images.


As a result, in our approach about 100 images are considered adequate to provide a satisfactory reconstruction of the monument. However, these images have been selected by removing the plethora of outliers and keeping only the most representative data, as automatically extracted by our algorithms.

Another example of the effect of the number of outliers/inliers on 3D reconstruction performance is shown in Figure 9, depicting the most popular monument of Paris, the Eiffel Tower. Similar results are obtained, also demonstrating the scalability properties of the proposed approach. This means that our method can be useful for different types of users and different information needs regarding the 3D reconstruction accuracy.

#### **7. Conclusions**

The rapid expansion of Twitter as a social messaging medium stimulated a series of new services that can leverage its power. A prominent space of interest is cultural heritage since nowadays, millions of users share opinions and comments over Twitter on cultural matters, especially when they visit new places for tourism. In other words, Twitter can be seen as a "distributed camera" sensor in the hands of users who can capture and comment on (thus implicitly annotate) cultural content when they are in a place of interest. This is presented in this paper by combining algorithms for event detection and content-based filtering.

Experiments were conducted using both a controlled environment and real-world data. The experiments showed that the proposed Twitter-oriented information metric outperforms the conventional TF–IDF schemes with respect to different objective criteria such as precision, recall, and F1-score. Comparisons with other tweet-post event detection algorithms indicate the superior performance of our approach over previous approaches. Finally, we investigated the computational complexity of our proposed methods.

Regarding the content-based filtering, we demonstrate that our scheme is able to perform adequate 3D reconstruction using a small set of images. This significantly reduces the time for 3D modeling while allowing for a massive digitalization of world-rich cultural identity. The final results show that we can significantly accelerate the time needed for a 3D reconstruction exploiting only a small but representative number of image views for a particular monument. This promises massive reconstruction of the 3D geometry of cultural sites of interest, though without high precision. Such a massive 3D reconstruction can be significantly useful for preservation purposes.

Regarding future research, it would be interesting to investigate dynamic aspects of graph partitioning construction and the use of incremental learning algorithms. A combination of a multilevel graph partitioning technique with the proposed eigenvalue decomposition could be examined. Finally, metrics that rank the importance of a tweet according to previous statistics based on a variety of criteria could also be applied.

**Author Contributions:** Conceptualization, A.D. and N.D.; Formal analysis, A.D., A.V., and K.M.; Funding acquisition, A.D.; Methodology, A.D., A.V., and K.M.; Project administration, A.D. and N.D.; Software, A.V., E.P., and K.M.; Supervision, A.D. and N.D.; Validation, E.P. and K.M.; Writing—original draft, A.D., A.V., and E.P.; Writing—review and editing, A.V., E.P., and N.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This paper is supported by the research project "4DBeyond: 4D Analysis Beyond the Visible Spectrum in Real-Life Engineering Applications", project No. HFRI-FM17-2972 funded by the Hellenic Foundation for Research & Innovation.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Semantic Analysis of Cultural Heritage News Propagation in Social Media: Assessing the Role of Media and Journalists in the Era of Big Data**

**Theodora A. Maniou**

Department of Social and Political Sciences, Journalism Faculty, University of Cyprus, Aglantzia Campus, Nicosia 1678, Cyprus; maniou.theodora@ucy.ac.cy

**Abstract:** In the era of big data, within the intense environment of social media, the effective communication of cultural heritage initiatives is considered of equal or—in some cases—even greater importance than heritage data themselves. Media and journalists play a critical and in some cases conflicting role in audience engagement and the sustainable promotion of cultural heritage narratives within the social media environment. The aim of this study was to assess the role of media and journalists in propagating cultural heritage news through social media platforms, and the narratives they tend to create in the digital public sphere. A qualitative approach is employed as a means of examining in-depth specific narratives, their meaning(s) and connotation(s), using semantic analysis.

**Keywords:** cultural heritage; social media; news; journalism; semantic analysis

#### **1. Introduction**

In the era of big data, multimodal content production and distribution processes have been revolutionized, propelling the emergence of novel mediated communication services [1]. In recent years, technologies and techniques have been developed that harvest, organize and analyze data, providing knowledge and insights into the structure and behavior of online activity. Among other functions, such techniques include conceptual network analysis, which can provide insights into the structure and dynamics of concepts (words, ideas, phrases, symbols, web pages, etc.) [2]. Through this type of analysis, the content produced and disseminated on social media platforms can be interpreted as an indicator of people's attitudes towards a product/service/event [3]. Information released on social media can thus affect people's perceptions of it and the framework within which it is presented.

In recent years, cultural heritage seems to be among the sectors whose popularity on social media platforms is rapidly rising, as cultural organizations acknowledge that the proper communication of cultural heritage initiatives is of equal or, in some cases, even greater importance than heritage data themselves [4]. Media and journalists themselves are neither distant from nor ignorant of such practices. As the intermediate aggregators of this public information, they can play a critical role in audience engagement and the sustainable promotion of cultural heritage narratives in social media. Especially in the era of big data, the significance of this role is expected to increase because, among other reasons, big data generated within the global social media environment can often produce narratives that are ambivalent about and/or contradictory to the initial posts.

The era of the Semantic Web brought forward a series of new challenges. The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. It is generally understood to be an evolution of conventional Web technology towards an incorporation of semantics, facilitating the automated processing of and reasoning about Web content [5]. Since its inception, the term has come to encompass a spectrum of technologies and standards for formally describing, structuring, querying and processing semantically enriched information. With regard to social media platforms, it was identified as early as 2004 that a formal, web-based representation of social networks is both a necessity in terms of infrastructure and a prominent application for the Semantic Web [6]. As the years passed and communication technologies rapidly evolved, it soon became evident that, due to the distinct nature of social media platforms, a new 'Social Semantic Web' or Web 2.5 had emerged, providing a formal representation of knowledge based on the meaning of data. When social data meets semantics, social intelligence can be formed in the context of a semantic environment in which the user and community profiles and interactions are semantically represented [7]. This study uses a semantic approach to analyze the social media data retrieved.

**Citation:** Maniou, T.A. Semantic Analysis of Cultural Heritage News Propagation in Social Media: Assessing the Role of Media and Journalists in the Era of Big Data. *Sustainability* **2021**, *13*, 341. https://doi.org/10.3390/su13010341

Received: 19 October 2020 Accepted: 16 December 2020 Published: 1 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The aim of this study was to assess the role of media and journalists in propagating cultural heritage information through social media platforms, and the way(s) they can affect the digital public sphere. The Amphipolis tomb excavations in Greece during 2014 were selected as a case-study for studying cultural heritage information through social media platforms. A qualitative content analysis approach was employed as a means of applying semantic analysis and examining in-depth specific narratives, their meaning(s) and connotation(s). The study builds on the findings of Fouseki and Dragouni (2017) [8], who conducted an analysis of newspaper content about the Amphipolis excavations to identify the ways in which cultural heritage was used in traditional media narratives. This work attempts to identify the relevant narratives that emerge from media and journalists' posts on social media using semantic analysis.

#### **2. Cultural Heritage Information and Social Media Platforms**

Cultural heritage (CH) was initially understood in a range of different ways until the 1972 World Heritage Convention, which codified a detailed definition: "*The following will be considered as CH: monuments (architectural works, works of monumental sculpture and painting, elements or structures of an archaeological nature, inscriptions, cave dwellings and combinations of features, which are of outstanding universal value from the point of view of history, art or science); groups of buildings (groups of separate or connected buildings which, because of their architecture, their homogeneity or their place in the landscape, are of outstanding universal value from the point of view of history, art or science); sites (works of man or the combined works of nature and man, and areas including archaeological sites which are of outstanding universal value from the historical, aesthetic, ethnological or anthropological point of view)*" [9]. At national and regional levels, the scope of the term was broadened after 1972 to include gardens, landscapes and environments, and later reinterpreted and defined quite differently in Europe, Australia, New Zealand, Canada and China. Although the scope of heritage, in general, is agreed internationally to include 'tangible' and 'intangible' as well as 'environments', the finer terminology of 'heritage' has not been streamlined or standardized, and thus no uniformity exists between countries [10].

More recently, Barrère (2016) [11] offered a wider definition, by arguing that heritages do not only include 'official' and 'institutional' forms of heritage such as museums, libraries, archaeological and historic sites, and archives, but also all heritages resulting from the accumulation and sedimentation of creativity, i.e., by the history that develops and passes culture through to society. This includes a whole range of elements—from the heritage of know-how found in a *maison de couture* to the recipes typical of a culinary culture, or a common language used within a territory. Individuals, families, companies, industries, territories, societies, and humanity all inherit these resources from the past. Parallel to this extension process, the selection criteria for assessing CH have also changed; while initially historic and artistic values were the only parameters, additional ones now include cultural values, the value of identity and the capacity of the object/monument to interact with memory [12].

A technology-driven alternative to promoting CH has emerged through the advent of Web 2.0, as multimedia platforms have the potential to move the state of the art of promotion beyond static displays, capturing in interactive forms the social, cultural and human aspects of CH and the societies who inherit it. The term 'new heritage' was adopted after the introduction of new media in the mid-2000s, in an effort to broaden the field and address the complexity of both tangible and intangible CH and the related social, political and economic issues surrounding aspects of CH [13].

Whereas the problem 20 years ago was the scarceness of information (precious original documents only accessible in major libraries), the Semantic Web era's problem is that of information overload: many online databases are available, each with their own search forms and attributes [14]. With more and more digital content being added to the enormous collection of online news, social media, archives, etc. every day, making sense of this mass of information is becoming increasingly important and challenging. Not only is new content being generated continuously, but existing legacy information in analogue formats is being converted into the digital realm for the purposes of preservation and sustainability [15].

The rapid rise of social media platforms brought forward new challenges in promoting CH information, as these platforms function with completely different norms than traditional media, since users take the lead in content publication and dissemination. On the one hand, huge amounts of user-generated content have become a valuable source of news for the mainstream media [16]. On the other hand, mainstream media and journalists themselves increasingly embrace semantic tools to improve the ways in which news materials are gathered from a variety of sources (social media being among them), to provide a machine-readable data structure and facilitate information integration and presentation [17].

#### **3. The Role of Media and Journalists in Disseminating CH News within the Social Media Environment**

In recent years, as information regarding CH and related topics has gained increasing importance in social media investigations, several relevant studies have emerged. Most of these focus on CH as a participatory culture [18], the connection between place and heritage, the possible threats and opportunities that social media offer [19], and wider issues with regard to the relation between CH and the social media environment. However, to date the specific area of CH news propagation in social media and the role of media and journalists in this process remains largely understudied.

A new, 'semantic' form of journalism that emerged in the Semantic Web era [17,20] seems to be gradually altering not only the way(s) in which news is disseminated on the Web, but also the practices for finding, compiling, aggregating and validating newsworthy material posted on social media platforms. This aims to engage audience members individually, validating their involvement and positively reinforcing personal participation in the narration of news/events/information [21]. These semantic units of journalistic and media information can be viewed as existing on a continuum ranging from the simple annotation of articles and annotated textemes to fully structured statements and single posts. They are constrained by the symbols that represent them, and especially, by the availability of a shared semantic grounding system within which those symbols can reference actual things, concepts or events in the world [22]. Furthermore, for any given CH news topic, relevant information scattered across various social media platforms seems to be heterogeneous (in regard to the collaboration between human subjects having different cultural and technical biases and backgrounds, ranging from humanistic domains to multiple scientific and analytic areas), highly unstructured and—in some cases—incomplete [3].

More importantly, CH news propagation within social media by media and journalists themselves may lead to the formation of various and even conflicting narratives and connotations for the users. After all, the Social Web is an ecosystem of participation, where value is created by the aggregation of many individual user contributions [23]. In the case of media and journalists, aggregation seems to be a key issue in the era of big data, along with practices of information filtering and categorization. Anderson (2013) [24] argues that the line between aggregation and original reporting is not entirely clear, despite rhetorical attempts at category purification and boundary-drawing. The real conflict between aggregation and journalism lies in the type of objects from which they build their stories and that they take as their criteria of evidence (p. 1021).

Journalists compile facts, quotations, documents, and links together in order to create narrative-driven news stories. In the social media environment, the narratives that tend to emerge are not always based on media and journalists' posts, but in several cases, these may be generated by comments and reactions to these posts. From this perspective, whereas in some cases the created narratives may be in line with the content of media and journalists' posts, in other cases they may be conflicting and contradictory.

As the specific topic of CH news content in social media appears to be a rather understudied area, this study uses as a starting point the findings of Fouseki and Dragouni (2017) [8], who conducted an analysis of newspaper content about the Amphipolis excavations to identify the ways in which cultural heritage was used in traditional media narratives as a means to negotiate national identities. Their study identified six main narrative categories: (a) Amphipolis as a reality show of agony and thrills; (b) the political use of Amphipolis to distract the public from dystopia (politics of distraction); (c) an orchestrated attempt to further feed the myth of Alexander the Great; (d) the use of Amphipolis to foster national pride and a sense of national euphoria; (e) an emphasis on the uniqueness of the discovery and the sacredness of the objects; and (f) the use of Amphipolis as an inspiration to discuss everyday social and political issues.

Following a combined deductive and inductive approach and in relation to the existing literature review as presented and analyzed here, this study sought to answer the following research questions:

**RQ1.** How can journalists' posts affect constructed narratives in the process of cultural heritage news propagation within the social media environment?

**RQ2.** How can media posts affect constructed narratives in the process of cultural heritage news propagation within the social media environment?

**RQ3.** How does the role of news propagator differ among media and journalists on social media platforms?

#### **4. The Case Study of the Amphipolis Tomb in Crisis-Ridden Greece**

The media frenzy of Amphipolis started on 11 August 2014 when the Greek media announced that a group of state archaeologists had reached the entrance of a tomb, surrounded by a 497-metre tumulus at Kasta Hill and guarded by two marble sphinxes that framed its entry arch [8]. Official announcements generated hopes that this could be the tomb of Alexander the Great and if so, archaeologists were on the verge of one of the greatest discoveries of the century. Immediately, journalists from Greek media as well as foreign correspondents arrived on site to start the coverage of the excavations and their subsequent daily news feeds on the excavations transformed the archaeological research into a media spectacle.

Meanwhile, following the Great Recession of 2007, Greece was faced with a prolonged sovereign debt crisis that became evident in late 2009, connected to protracted and deepening social and political crises. As such, everyday modes of living were decisively altered and all societal aspects were deeply affected [25]. Media narratives were centered on the day-to-day aspects of the crisis, as successive governments enforced several rounds of tax increases, spending cuts (including pensions and public sector employees' salaries) and structural reforms, often triggering social protests and riots around the country [26]. All of this produced a turbulent political scene with successive electoral procedures, resulting in an unstable social and political environment.

In this context, the archaeological discovery was the first piece of positive news Greeks had received in a long time. The coalition government of Nea Dimokratia and PASOK, and the Prime Minister Antonis Samaras rushed to grasp the opportunity and alter the dominant narrative in the national public sphere: in the space of one day, indebted Greece changed from the 'outcast' of Europe to, once again, the cradle of global culture, the land of the great and proud Greeks. As Fouseki and Dragouni (2017) [8] argue, the media coverage of the archaeological excavations in Amphipolis cultivated a fertile ground for politically maneuvering the focus away from depressing economic developments.

#### **5. Method and Research Sample**

This study was based on a qualitative analysis of the findings from a conceptual network analysis of the Facebook posts of journalists who covered the Amphipolis tomb excavations and the posts of the media outlets these journalists were working for. The sample examined includes various posts (articles, news beats, pictures, personal posts) and their comments. The core of this study is based on these comments and the narratives they create within the social media environment, since social network analysis relates to understanding connections between humans as they interact [27]. A social network is a constellation of nodes and their respective links. A node (also known as an actor or a vertex) is the fundamental unit of any network, social or otherwise [28]. The node, in this study, is the individual Facebook user (journalist or media outlet).
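As an illustration of the node-and-link terminology above, the following minimal sketch represents a small undirected network in which each node is a Facebook account. It is not part of the study's tooling; the class `SocialNetwork`, the account names, and the inclusion of commenting users as nodes are assumptions made purely for the example.

```python
from collections import defaultdict

class SocialNetwork:
    """Undirected network: nodes are accounts, links are interactions between them."""

    def __init__(self):
        self._links = defaultdict(set)

    def add_link(self, a, b):
        # A link is stored in both directions because it is undirected.
        self._links[a].add(b)
        self._links[b].add(a)

    def nodes(self):
        return set(self._links)

    def degree(self, node):
        # Number of distinct nodes this node is linked to (0 if unknown).
        return len(self._links.get(node, ()))

# Hypothetical accounts: a journalist, the outlet they work for, two commenters.
network = SocialNetwork()
network.add_link("journalist_A", "outlet_X")
network.add_link("journalist_A", "user_1")
network.add_link("outlet_X", "user_2")
print(sorted(network.nodes()))
print(network.degree("journalist_A"))
```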

While the posts were created/produced by the media entities and journalists themselves, the comments on these posts derive from Facebook users in general. As such, media entities and journalists in this work were seen as content aggregators within the social media environment, who can initiate a 'digital discussion' within the network of their links (friends/followers) which, in turn, results in the creation of various and sometimes even contradictory narratives. These narratives emerge from the analysis and categorization of the data examined (in this case, comments as reactions to the posts). The research aimed to assess the role of media and journalists in propagating CH news on social media platforms using semantic analysis, being the process of drawing meaning from text, allowing the understanding and interpreting of sentences, paragraphs, or whole documents. This study sought to discover semantic similarity or dissimilarity among the data retrieved, in this case namely posts and comments [29].

In order to find the journalists on site who provided daily coverage of the Amphipolis excavations, initial research was conducted of the news stories presented in the country's mainstream media. The period selected was based on the timeline of the archaeological discoveries in Amphipolis (see analytical Table A1 in Appendix A). Although the data presented here do not cover the excavation's entire trajectory, the sample corresponds with the major outbreak events related to the archaeological discovery, as presented through Google Trends (see Figure 1), covering the period from 11 August to 31 December 2014.


**Figure 1.** Google Trends regarding the Amphipolis excavations.

Initial research on the media content of the Amphipolis excavation for this period indicated a total of 18 journalists working for 11 different media entities (eight were working for more than one different media outlet, e.g., for a Greek newspaper and an international news agency).

Within the framework of conceptual network analysis, the collection of posts employed a data mining technique based on the Quintly platform and the "exportcomments" software. Quintly is a platform designed for social media analysis that provides data from several social media platforms as long as these data (posts, in this case) are published in public accounts (i.e., Facebook public pages). In order to retrieve the posts for each media outlet, three specific keywords were used (custom metrics): "Amphipolis", "Amfipolis" and "Αμφίπολη" (the word "Amphipolis" spelled in two different ways in English and its respective Greek spelling). "Exportcomments" is an open source analysis software primarily used for network analysis, discovery and the exploration of particular social media spaces that allows for the extraction and import of Facebook data from specific dates and Facebook public pages onto Excel sheets. Regarding the collection of content posted by journalists themselves on their personal accounts (personal posts), the Facebook search tool was used for each account separately for the time period selected for the study.
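The keyword-based retrieval described above can be sketched in a few lines. This is plain Python and does not reproduce the Quintly or "exportcomments" interfaces; the post structure (a dictionary with a `message` field), the function names, and the case-insensitive matching are assumptions made for illustration.

```python
# The three keywords named in the text.
KEYWORDS = ("Amphipolis", "Amfipolis", "Αμφίπολη")

def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any keyword occurs in the text, ignoring letter case."""
    lowered = text.casefold()
    return any(keyword.casefold() in lowered for keyword in keywords)

def filter_posts(posts):
    """Keep only the posts whose message mentions one of the keywords."""
    return [post for post in posts if matches_keywords(post["message"])]

# Hypothetical posts exported from a public page.
posts = [
    {"id": 1, "message": "New findings in Amphipolis"},
    {"id": 2, "message": "Weather forecast for the weekend"},
    {"id": 3, "message": "Ανασκαφές στην Αμφίπολη"},
    {"id": 4, "message": "The AMFIPOLIS tomb and its sphinxes"},
]
print([post["id"] for post in filter_posts(posts)])
```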

The unit of analysis was the individual post and its comments. The criterion for selecting a given post was the relation of its content to the Amphipolis excavations and discoveries. The initial sample from August to December 2014, for all journalists and media outlets included in the research, consisted of 2368 posts and 16,400 comments. As this was too large a dataset for qualitative analysis, the final sample was narrowed down to the 10 journalists with the highest number of followers/friends and the respective media outlets they were working for. The direct links for all retrieved posts and comments were pasted into an Excel file and 30 posts (with their respective comments) from each Facebook account were randomly selected, to avoid sample bias, and were all included in the qualitative content analysis. Then, based on the date of the posts and the custom metrics, the respective posts of the media outlets were collected. This process allowed a comparative analysis to be conducted, revealing different types/categories of comments for posts of relevant content but different accounts (media accounts and journalists' accounts). The final dataset comprised 660 posts and 3684 comments in total, deriving from the Facebook accounts of 10 journalists as well as the Facebook public pages of 10 media outlets, and can be considered representative of the initial sample of 2368 posts and 16,400 comments. Table 1 shows the final sample for the research and Figure 2 depicts the sample of retrieved posts and comments for all media outlets examined.
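The random selection of a fixed number of posts per account can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the procedure actually run on the Excel file: the function name, the account names, the fixed seed, and the handling of accounts with fewer posts than requested are all hypothetical.

```python
import random

def sample_posts_per_account(posts_by_account, per_account=30, seed=2014):
    """Draw up to `per_account` posts uniformly at random from each account."""
    rng = random.Random(seed)  # fixed seed so that the draw is reproducible
    sample = {}
    for account in sorted(posts_by_account):
        posts = posts_by_account[account]
        # Accounts with fewer posts than requested contribute all of them.
        k = min(per_account, len(posts))
        sample[account] = rng.sample(posts, k)
    return sample

# Hypothetical post identifiers for two accounts.
posts_by_account = {
    "journalist_A": [f"A-{i}" for i in range(100)],
    "journalist_B": [f"B-{i}" for i in range(12)],
}
sample = sample_posts_per_account(posts_by_account)
print({account: len(posts) for account, posts in sample.items()})
```

Sampling without replacement guarantees that no post is selected twice for the same account.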


**Table 1.** Research sample.

It should be noted that the initial sample from which the final sample emerged meets the four basic 'V' criteria [30] that usually describe big data: *Volume*, in the number of data points in relation to CH that were traced within the social media environment; *Variety*, in the range of media and journalists' post types (e.g., posts, videos, photographs) found in relation to CH; *Velocity*, for the speed at which these data were generated on social media platforms; and *Veracity*, regarding the differences in data quality that may lead to differentiated narratives and meanings. That is not to say that the sample used here constitutes a typical example of big data; however, it does display the basic characteristics that usually describe samples deriving from social media networks in the era of big data.


**Figure 2.** Posts and comments retrieved for all media outlets examined (Source: Quintly).

All posts and comments were analyzed via thematic qualitative content analysis [31,32], which is a method used to help researchers reduce data, focus on selected content aspects of the data, and systematically describe it in terms of these aspects [33]. This method, according to McLamore and Uluğ (2020) [34], typically follows these steps: selection of the material (i.e., journalists' posts and respective comments for every post), building a coding frame, dividing the material into units of coding (i.e., categories of posts and comments), evaluating and modifying the coding frame, and then proceeding to analysis and interpretation (categories of narratives).

The analysis focused on identifying the specific narratives created by posts and comments on these posts. The narratives were identified through repeated readings and thematic analysis. These narratives adhere to the core signs within users' discourses that are crucial for indicating meanings and connotations. The ultimate goal of the analysis is the exploration of the new meanings and connotations these narratives can produce within the specific socio-cultural context in which they occur [32,35].

#### **6. Findings and Analysis**

The six categories of narratives identified in the study of Fouseki and Dragouni (2017) [8] were initially used in this study for the general categorization of the journalists' and media Facebook posts. However, in the course of the study, the retrieved data showed that not all narrative categories detected in traditional media content could be identified in the content posted on Facebook, whereas new narratives emerged that did not exist in the traditional media analysis of Fouseki and Dragouni, and—as such—the analysis here is both deductive and inductive. This finding indicates that media content posted on social media platforms tends to generate different narratives than content presented through traditional media. However, this finding needs further analysis of the data retrieved to be fully validated.

Four specific categories of narratives were identified in the content of all posts examined (as presented in Table 2), which, in turn, generated a series of various and, in some cases, contradictory comments. In several cases, the comments were not directly related to the theme of the posts but tended to refer to different/alternative narratives, meanings and connotations.


**Table 2.** Categories of narratives.

The first category identified is related to *notions of national pride and national achievements*. This included news that connected CH (in this case, the archaeological excavations) to the nation's history and historical personalities. For example, as the excavations in Amphipolis were progressing, there was an escalating effort to connect the discovery with any aspect of the history of Alexander the Great. Whereas media posts were mainly based on the excavation outcomes (e.g., Ant1 News, 22/12/2014, "*New findings in Amphipolis*"), journalists' posts tended to be more personalized and emotional (22/12/2014, female journalist, "*For all of us here in the excavation field, the new findings make us proud as Greeks*"; 23/10/2014, male journalist, "*The new findings are important as they pinpoint the direction of 'our' Alexander the Great*"). In addition, whereas international media outlets were indirectly referring to the connection of the Amphipolis discovery to Alexander the Great, Greek media directly recognized this relationship. For example, on 12 November 2014, a BBC post was titled "*Amphipolis skeleton from Alexander's time found in Greece*"; the next day, the Greek television channel Ant1 posted that "*BBC directly relates the Amphipolis discovery to Alexander the Great*", as shown in Figure 3.

**Figure 3.** BBC and Ant1 news posts for Alexander the Great.

For several Facebook users, such posts were characterized as *indices of disinformation* and received negative comments:

*"What was the journalist thinking? Can't he read?"*

(Facebook user, male, 13 November 2014)

*"They are disinforming the public in purpose* . . . *this is a disgrace!"*

(Facebook user, male, 13 November 2014)

In general, several posts in this category referring to national achievements seem to have generated negative comments regarding the economic crisis in Greece and members of the Greek government, which, in turn, were interpreted as disinformation. For example, while a post by kathimerini.gr was referring to the forthcoming new projects in Amphipolis ("*The next projects set to take place in the tomb*"), there were several negative comments attached to this post that related to national achievements:

*"The media need us to feel proud of our country, while all Greeks are starving* . . . *"*

(Facebook user, female, 3 December 2014)

*"Let's see what they are going to discover next* . . . *all the media are mocking us, the government is mocking us* . . . *"*

(Facebook user, male, 4 December 2014)

It seems that in this case, the nodes (media and journalists) not only failed to affect the created narratives; on the contrary, the created narratives emerged as reactions to the posts. Similar findings also characterize the second important category of narratives, which were related to *policies of distraction*, in line with the analysis by Fouseki and Dragouni (2017) [8]. However, posts in this category mainly came from media outlets with a political affiliation close to the government in office and from state-owned media, and less from the journalists. For example, the Greek ANA-MPA News Agency's main post on 6 September 2014 reflected the Greek Prime Minister's statement about Amphipolis ("*We are progressing with professionalism and responsibility in Amphipolis*"). One of the journalists, in his post regarding the PM's statement, noted: "*By looking back, we focus forward on our future*" (male journalist, 7/9/2014), while other media posts referred to Amphipolis as "*one of the top ten archaeological discoveries of the decade*". Posts in this category seem to have received the most negative comments, as several Facebook users characterized them as attempts to distract public opinion in a crucial socio-political and economic period for the country:

*"Congratulations! Now, can you tell us about the new taxes they are planning to impose on us, again???"*

(Facebook user, male, 7 September 2014)

*"Do they really believe that all Greeks are idiots? We know that you are trying to distract us from the real problems."*

(Facebook user, female, 8 September 2014)

Narratives referring to policies of distraction were mainly generated by comments on media posts rather than on journalists' own posts. In several cases, such comments tended to be ironic towards the government and members of the Greek Parliament, whereas most of them referred to the Amphipolis archaeological discoveries in relation to the economic crisis. Constructed narratives with negative connotations were based on feelings like anger, irony and distrust.

The third category identified in the analysis referred to *political and scientific conflicts* with regard to the Amphipolis discoveries. For example, a post by kathimerini.gr on 28 December 2014 ("*Disagreements regarding Amphipolis*"), referring to political and scientific conflicts regarding the excavation, received the following comment:

*"Amphipolis remains doubtful on so many levels!"*

(Facebook user, male, 28 December 2014)

Political conflicts over the archaeological discoveries often occurred between the government and the opposition parties. Conflicts also occurred among archaeologists over the nature of the discovery. In several cases, these scientific conflicts became a matter of public dispute on social media platforms and users were negatively commenting on the conflict itself, placing greater emphasis on the infotainment aspect of the news than the informational one. In this case, the nodes (journalists and media) clearly affected the links (friends/followers) and consequently, the created narratives with their posts:

*"Unfortunately, this is what Greek culture and history is about* . . . *In this country we are never going to stop fighting each other!"*

(Facebook user, female, 19 November 2014)

*"As if political disputes were not enough in this country* . . . *"*

(Facebook user, female, 2 December 2014)

In some cases, the scientific conflict tended to attract users' attention more than the political one. In other cases, comments were quite intense both towards the political and scientific conflicts:

*"Can't they agree on something? Why are they constantly fighting? This is a great moment for all Greeks* . . . *This is shameful* . . . *"*

(Facebook user, male, 23 November 2014)

*"And somewhere in these conflicts, development and progress are hidden* . . . *!"*

(Facebook user, male, 17 October 2014)

The final category of narratives identified in the analysis was mainly detected among journalists' own posts rather than those of the media outlets and referred to policies of *personal and professional self-promotion*. Policies of self-promotion have been the cornerstone of social media use since the rise of these platforms in the late 2000s. Facebook seems to be particularly focused on facilitating personal self-presentation, self-expression and self-promotion [36]. Several studies refer to the ways in which journalists use Facebook to promote their work and themselves as part of personal branding policies [37–39]. In the case of the Amphipolis discoveries, most of the journalists included in this study tended to post on a daily basis, and their posts ranged from personal comments and selfie photographs to links to their published/publicized news stories of the excavations. Most of these posts had an informal tone and referred to their day-to-day experiences in Amphipolis. Most of the positive comments on these posts came from fellow journalists and showed positive connotations with regard to the significance of their work in the field:

*"You are doing a great job all this time, we are counting on you!"*

(Facebook user-journalist, female, 3 September 2014)

*"This is what on-the-spot reporting means! Great job, all of you!"*

(Facebook user-journalist, male, 27 August 2014)

In several cases, the journalists' own posts regarding personal and professional self-promotion tended to generate more informal comments with regard to the difficulties of reporting. In this way, a new public–private sphere seems to have been generated within the social media public sphere. Journalists were having private discussions with their colleagues but within a public space, adhering to the notion of personal salience, by encompassing digital, online activities for the establishment of personal agendas [40]. This narrative, in turn, is enhanced not only by personal posts and experiences but also by personal photographs and videos that frame the posts and serve as tokens of self-constructed narratives of personal significance. In this case, the nodes (journalists), via their posts, seem to have affected their links within the network in which they tended to operate (friends and followers) by affecting the created narratives.

#### **7. Conclusions**

This study builds on earlier work on the ways in which cultural heritage is used to generate traditional media narratives, and aimed to identify the narratives through which CH is communicated in news content posted on social media platforms. The basic target was to assess the role of media and journalists in this context.

Whereas some of the narratives identified within the traditional media context were also identified within the context of social media, several others emerged, both from the news content posted online and the comments this content generated. Specifically, the analysis of retrieved posts led to the identification of four main categories of narratives: *notions of national pride and national achievements*, *policies of distraction*, *political and scientific conflicts* and *policies of personal and professional self-promotion*. However, the comments attached to these categories of posts were often contradictory to the initial posts and led to the construction of alternative narratives and meanings. These narratives, with both positive and negative connotations, were identified as *disinformation*, interpreted by the audience as media attempts to intentionally disinform; *political and scientific conflicts* that reflected infotainment rather than informational aspects of the news; and *self-promotional policies* with regard to the journalistic work of covering the archaeological discoveries. In the case of the first two categories of narratives (namely, *notions of national pride and national achievements*, and *policies of distraction*), not only did media and journalists (the nodes) fail to affect groups of users within the social media environment (links); on the contrary, the created narratives emerged as conflicting reactions (comments) to the initial posts. In the case of the other two categories of narratives identified in the study (namely, *political and scientific conflicts* and *policies of personal and professional self-promotion*), the media and journalists affected individual users and the created narratives.

CH in social media seems to be used on multiple levels as a field for news propagation. The key issues identified in this study relate to the role of media and journalists within the sustainable digital environment, where big data play a significant role. The initial sample used for the needs of the study was in accordance with the four main 'Vs' that usually describe big data: *Volume* relates to the increased number of CH data on social media platforms; *Variety* is evident in the various media and journalists' post types (e.g., posts, videos, photographs) that can be traced within the social media environment in relation to CH; *Velocity* refers to the speed at which these data are generated within social media, which may often lead to misleading information for users; and *Veracity* relates to differences in data quality that may lead to disinformation and confusing content for users.

As far as the media are concerned, this study shows that generated narratives within the social media environment are directly and/or indirectly connected to their overall role and performance within society. As such, the audience tends to be more critical towards them, whereas negative connotations are often identified in the content posted online. As far as journalists are concerned, although fewer negative connotations tend to be generated by their role as news aggregators, at the same time they do not seem to be able to denounce contradictory and/or conflicting roles, both as media professionals (part of the overall media system) and individual practitioners of public information routines. Although this analysis showed that there are certain differences between media and journalists as regards their role within the Semantic Web, there are several common characteristics they share, most notably disinformation. A future extension of this work could focus on the ways in which journalists themselves can segregate their role within the Semantic Web, among media professionals and individual users, and the impact of these roles on professional norms and practices.

**Funding:** The APC was funded by the University of Cyprus.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data used for the study were publicly available data retrieved by the author.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Appendix A**

**Table A1.** Timeline of the Amphipolis discoveries.


#### **References**


## *Article* **Web Communication: A Content Analysis of Green Hosting Companies**

**Minos-Athanasios Karyotakis 1,\* and Nikos Antonopoulos 2,3**


**Abstract:** While many studies in the field of environmental communication have focused on exploring the environmental impact of social media, this research paper takes a different turn. It investigates, through a qualitative content analysis, 391 websites that support and provide green hosting services. This study is considered the first in the field that aims to examine in-depth how these green websites tend to communicate their green services. Therefore, its contribution is to enhance the relevant bibliography and present more insights regarding green websites and sustainability. The results showed that most of the websites were trying to highlight the positive impact their services will have on the environment. In addition, many websites tried to educate their consumers concerning sustainable development and make them part of a broader green cultural tradition. Nevertheless, on many websites, green hosting seemed a supplementary factor for choosing the company's services.

**Keywords:** big data; cultural heritage; data center; digital marketing; eco-friendly; environmental communication; green websites; green culture; green hosting; sustainability

**Citation:** Karyotakis, M.-A.; Antonopoulos, N. Web Communication: A Content Analysis of Green Hosting Companies. *Sustainability* **2021**, *13*, 495. https://doi.org/10.3390/su13020495

Received: 29 September 2020 Accepted: 30 December 2020 Published: 7 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

In the last decade, new technologies have developed rapidly. The daily use of services, such as websites, social media platforms, and phone applications, has led to an increase in power consumption. Although this is not usually highlighted, the new technologies also share the blame for crucial environmental phenomena such as climate change and the depletion of natural resources. In particular, it is estimated that almost two billion websites and almost eight million web-facing computers exist around the globe [1]. Furthermore, these impressive numbers are accompanied by the constant use of certain websites, such as social media platforms, by a large number of individuals. Facebook, YouTube, and Twitter had around 4.5 billion users in 2018 [2].

It should be taken into consideration that the aforementioned social media platforms are based in the Western world, where the majority of the Earth's population does not live. For instance, China forbids the use of these platforms and thus has its own social media scene with popular platforms such as Sina Weibo. Apart from this fact, China already had the most Internet users in the world in 2012 (513 million): more than double the users in the United States of America and far more than the most populated European Union (EU) countries [3]. In addition, "China's social media users are more active than those of any other country but also, in more than 80% of all cases, have multiple social-media accounts" ([3] p. 2). It is also worth mentioning that over the last decade the news industry has focused on data journalism to expose the wrongdoings of powerful individuals, as in the case of the Panama Papers. These revelations are based on an enormous amount of data that is usually stored in electronic formats. Therefore, there is a need for more extensive hosting services [4,5].

To underline the problem of the power consumption of the services mentioned earlier, which seems to go unnoticed by the news industry, it is worth taking a closer look at some of the world's largest data centers. According to Energy Innovation [6], one of the largest data centers in the world can exceed a power capacity of 100 MW (megawatts), as it accommodates thousands of devices. This power capacity can be enough for around 80,000 households in the United States of America (USA). In addition, the growing demand for more internet and information services will further influence the energy use of these services alongside the rapid growth of CO<sub>2</sub> (carbon dioxide) emissions, which are associated with energy use. However, the actual amount of CO<sub>2</sub> emissions coming from the data centers cannot be estimated, as there is a lack of data. Only a few companies in the world publicize such data. Although the energy consumption and the CO<sub>2</sub> emissions cannot be estimated accurately, it is worth highlighting that "between 2010 and 2018, global IP traffic, the quantity of data traversing the internet, increased more than ten-fold, while global data center storage capacity increased by a factor of 25 in parallel" [6].
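The household comparison above can be made concrete with a short calculation. The following is a minimal sketch that uses only the two figures quoted in the text (100 MW and 80,000 households); the function names and the assumption of a constant, year-round draw are illustrative choices made here and do not come from [6].

```python
# Back-of-the-envelope check of the household comparison quoted in the text.
# The two inputs come from the text; everything derived below is illustrative.

DATA_CENTER_CAPACITY_MW = 100   # power capacity cited for a very large data center
HOUSEHOLDS_SUPPLIED = 80_000    # number of US households cited in the text
HOURS_PER_YEAR = 24 * 365       # assumption: constant draw over a non-leap year


def average_household_draw_kw(capacity_mw: float, households: int) -> float:
    """Average continuous power per household, in kilowatts."""
    capacity_kw = capacity_mw * 1_000  # 1 MW = 1,000 kW
    return capacity_kw / households


def annual_household_energy_kwh(draw_kw: float) -> float:
    """Energy used in one year at a constant draw, in kilowatt-hours."""
    return draw_kw * HOURS_PER_YEAR


if __name__ == "__main__":
    draw_kw = average_household_draw_kw(DATA_CENTER_CAPACITY_MW, HOUSEHOLDS_SUPPLIED)
    energy_kwh = annual_household_energy_kwh(draw_kw)
    print(f"Implied average draw per household: {draw_kw:.2f} kW")
    print(f"Implied annual energy per household: {energy_kwh:,.0f} kWh")
```

Read this way, the comparison implies an average continuous draw of 1.25 kW per household, which is the scale at which a single large facility competes with residential demand.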

The urgent need for data, accountability, and promotion of a green culture (e.g., the use of renewable resources and low power consumption solutions) concerning the growing IT services has resulted in the emergence of new terms, such as the concept of "green websites". The term "green websites" denotes the use of environmentally friendly operations and renewable resources [1,7,8]. In particular, that includes green Information Technology (IT) and green computing operations, such as energy-efficient computing, responsible recycling and disposal, design for environmental sustainability, and data centers that use renewable energy or have efficient power management. The main goal of green practices in IT services is to reduce carbon footprints and the waste of energy [8]. Green hosting is also a part of green computing and culture, as it "is the web hosting solution powered by environmentally friendly resources such as renewable energy" [9].

Despite the trend of green IT practices, there is a lack of research in this field, as more public data and research are needed to understand and investigate this phenomenon [10]. For instance, a recent study by Antonopoulos and his colleagues [1] provided alarming findings regarding green practices in news websites. The study explored whether the 500 most prominent websites in the world were using and promoting green practices. Almost none of the websites were using environmentally friendly servers or organizing actions in favor of the environment. Furthermore, the majority of the news websites did not even have a dedicated news category for environmental news, despite the fact that the more news there is concerning environmental issues, such as climate change, the more public awareness is raised [11]. Therefore, it is believed that the green tradition is not adequately disseminated with regard to IT services and operations.

All in all, this study aims to enhance the relevant literature by examining how a significant number of websites around the globe support and communicate green hosting services to their users.

#### **2. Communication, Cultural Heritage, and Green Hosting**

Environmental communication was a narrow term initially, as it did not include a number of communication practices that are considered parts of this term nowadays [8]. The field has been developed to include several actions and practices that examine in depth, for instance, state or non-state actors and how cultural products can impact society concerning environmental problems. Furthermore, the worsening of environmental phenomena such as climate change has led even the UN (United Nations) to intervene and support the actions that aim to raise awareness and protect the environment [8]. These important changes in the field of environmental communication resulted in broader approaches and studies of those practices. According to Pezzullo and Cox [12], every actor who disseminates viewpoints and solutions for dealing with environmental issues is a part of environmental communication, as these actions promote a certain consciousness for eco-friendly (green) solutions. In addition, this constant communication of that consciousness influences society, as the older generations pass on their environmental concerns and solutions to the new one. This process can be considered the creation and preservation of a culture, or at least a tradition. That is one of the reasons why studying and understanding these processes and transformations is important.

Environmental communication includes different approaches and methods for exploring the processes of disseminating information concerning the environment. Some of those methods belong to the critical approaches, such as discourse, textual, and rhetorical analysis, for understanding how communication affects the ideas, perspectives, and feelings of people regarding environmental issues. "These humanist and critical approaches are less concerned with unearthing environmental "facts", but rather focus on understanding how communication functions pragmatically and constitutively" ([13] p. 39). These approaches are based on the thorough theoretical and conceptual application of a methodology in which researchers use textual evidence to back up their arguments [13]. In addition, in the last decade, there have been calls to study topics in environmental communication that are associated with traditional sociological issues, such as inequality, power, and the socio-political aspects (i.e., financial, sustainability, and cultural issues) that significantly influence the dissemination of information about the environment [14].

Moreover, the green/environmental approaches can be core components of sustainable development, which is also associated with the dissemination of CH (Cultural heritage) and societal traditions [15]. "Cultural Heritage is an expression of the ways of living developed by a community and passed on from generation to generation, including customs, practices, places, objects, artistic expressions, and values" [16]. The United Nations Educational, Scientific, and Cultural Organization (UNESCO) has also pointed out that CH is not only the physical artifacts that are passed down from one generation to the other but also the process of preserving and improving these traditions that will benefit society in future years. Although it may not seem significant, these traditions can positively influence employment, social cohesion, and even new skills to promote sustainability and more sustainable societal and economic models [17].

The European Union (EU) and its countries have tried in recent years to connect CH more closely with sustainable development that involves implementing green practices. That is why certain initiatives have been taken to reduce, for example, the air pollution that can damage CH products. Internet services contribute to the worsening of air pollution through CO<sub>2</sub> emissions [18]. The EU directives and conventions have been progressive regarding CH since it was understood that nature is changing due to human activity. This change affects the environment, as "landscapes are characterised by a strong cultural stance" ([19] p. 617). The Netherlands is one of the most prominent examples in the EU regarding CH, as it has developed a CH management approach that includes green approaches and socioeconomic factors. Furthermore, it "recognises that Dutch cultural heritage provides invaluable traditional knowledge on managing water- and flood-related hazards and spatial and climate adaptation" ([20] p. 2). In addition, the Netherlands recognized around five years ago that climate change can be a threat to the country's CH, especially in the coming years, as weather events such as floods, heatwaves, and droughts will increase. Therefore, it must be tackled to protect the Netherlands' CH. At this point, it should be mentioned that climate change has challenged CH management approaches globally, as they seek to deal effectively with the threats caused by environmental change by promoting sustainable models [21].

For instance, one prominent example of such a model is sustainable tourism, which promotes an alternative approach that goes against the negative phenomenon of overtourism, in which an excessive number of tourists harms the local population, the social order, and the environment of the destination [22]. Sustainable tourism respects the destination, as it tries to meet the needs of the local population, the tourists, and, of course, the environment (e.g., biological and ecological diversity). In other words, sustainable tourism also wants the tourism destination's infrastructure to be efficient enough not to provoke excessive power and water (or energy) consumption [23]. The definition of sustainable development aligns with the tourism example, as it describes a model where everyone's wishes and demands are met while ensuring that the next generations will also be able to meet the same demands [24]. Regarding the definition of sustainability, it seems that it includes the same meanings as the term "green". Sustainability and green as definitions emerged due to the overconsumption of natural resources in favor of short-term economic development. Therefore, there was a need to communicate this urgent problem to the public. However, the overuse, mostly for marketing purposes, of the term sustainability and its variations has led to the public's confusion about their real meaning [25].

That is one of the reasons why the public tends to forget that the green culture must also be applied in the field of big data and Internet services if the actual goal is the sustainable development of the planet. In contrast to the above-mentioned example of sustainable tourism, it is uncommon to hear the same terms applied to IT services. Nowadays, websites are an essential part of individuals' daily routine. They are used daily to read news articles, promote and buy products, or communicate with other individuals. This paperless communication is often thought not to be harmful to the environment. Nevertheless, all the webpages that belong to each website need to consume energy to function. If they host big data, unnecessary data, many different files (e.g., videos and photos), and programs, power consumption usually increases. As a result, the goal is not only to reduce this unnecessary consumption but, firstly, to focus on operating the hosting services exclusively on renewable sources (green hosting). The second goal is to communicate this need efficiently to the public [1].

Sustainability of the Internet cannot be achieved without paying close attention to the data centers and their power consumption. Large companies, especially social media companies, have to operate 24 hours a day, 365 days a year. Furthermore, if a server malfunctions, a backup server is needed to keep the company's products and services online despite the problem. Some of these servers and data centers operate with diesel fuel. In 2013, the New York Times revealed that a data center is similar to a small town regarding electricity consumption [24]. Moreover, the data warehouses were consuming around 30 billion watts of electricity, which equals the power provided by 30 nuclear power plants. Another interesting fact is that data centers can waste about 90% of the power they draw, as they do not use more than 12% of their electricity for computational purposes and the servers. Lastly, many providers of the above-mentioned services claim that they offer green energy, but it is not clear enough whether the energy comes from renewable resources or whether something else makes the services green [24].

The analysis and understanding of data centers based on specific metrics is not an easy task. Several vital dimensions define a data center's operation, such as performance, cooling, energy efficiency, air, thermal management, financial impact, storage, and security [25]. The most prominent metric for assessing data centers is the Power Usage Effectiveness (PUE), which was proposed in 2006 by the non-profit institution called "Green Grid". PUE has become a standard practice in the field for assessing the energy efficiency of a data center [26]. Its importance lies in the fact that it presents "the proportion of energy which is actually used to operate the IT equipment with respect to the total power draw of a facility, and is defined in equation" ([26] p. 155). However, concerning the energy consumption, "it is not clear how to measure the total energy that goes into IT equipment accurately" ([25] p. 293), as "the precise values of such consumption and its future growth as projections are continuously revised and real data is difficult to acquire" ([27] p. 1015). In addition, the notion of performance seems to play a crucial role in consumers' choices regarding the usage of green products and services. Consumers tend to be skeptical about the advertising messages that promote environmental practices and the advertised company's actual green performance. A misleading communication practice can damage the company's reputation because the public's concerns regarding sustainability are not met or understood fully [28].
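For reference, PUE is conventionally expressed as a ratio of two energy measurements. The block below is a sketch of that standard form, not a reproduction of the equation in [26]; the symbols $E_{\text{total}}$ (energy drawn by the whole facility) and $E_{\text{IT}}$ (energy delivered to the IT equipment) are labels assumed here, and the numbers in the worked line are purely illustrative.

$$
\mathrm{PUE} = \frac{E_{\text{total}}}{E_{\text{IT}}} \geq 1,
\qquad
\mathrm{DCiE} = \frac{1}{\mathrm{PUE}} = \frac{E_{\text{IT}}}{E_{\text{total}}}
$$

$$
E_{\text{total}} = 1.5\ \text{MWh},\quad E_{\text{IT}} = 1.0\ \text{MWh}
\;\Rightarrow\;
\mathrm{PUE} = 1.5,\quad \mathrm{DCiE} \approx 0.67
$$

Under this reading, the share of energy that actually reaches the IT equipment is the reciprocal of PUE (often reported as DCiE), so a PUE closer to 1 indicates less overhead for cooling and power distribution.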

Green technologies and services seem to be promoted as a notion even in large global events, such as the 2022 FIFA Men's World Cup in Qatar [29], but once again, it is not clear what exactly the term green includes. For instance, it is not clear whether these services include green hosting and how much power consumption these green technologies save. According to Masanet and his colleagues [10], power consumption grew by about 6% from 2010 to 2018. Nevertheless, it is argued that important steps were taken to reduce the energy consumption of data centers, but it is unclear how long these approaches will last. In addition, the lack of transparency makes future predictions uncertain. However, there is a consensus amongst researchers that pursuing green approaches, such as green hosting, will eventually lead to a holistic sustainable approach for IT services [30–33]. Finally, accepting the consensus of experts, the UN has promoted since 2015 the *2030 Agenda for Sustainable Development*, which was adopted by all the UN Member States. It incorporates some of the main objectives of green computing and culture, such as the energy consumption and that "economic, social and technological progress occurs in harmony with nature" [34].

The need for green practices and a future environmental and cultural heritage in the Internet sector was highlighted by scholars around a decade ago [30,35]. Environmental Heritage can be considered an actual part of the CH of society, and the dissemination of its traditions about protecting Earth's natural resources seems to come in contrast with the development of modern society [36]. Consequently, the last years' efforts, such as the *2030 Agenda for Sustainable Development*, want to change this unsustainable narrative and convince the coming generations that environmental heritage and sustainability must be essential components of CH. The dissemination of this new empowered narrative with the help of digital storytelling can create a new sustainable CH that will be common for several communities worldwide [34,37].

The sustainable development of a company, including its communication practices, focuses on dealing with the current environmental problems by providing solutions that have a global impact considering the financial and environmental costs that these solutions would bring at a macroscopic level to society. The technological initiatives combined with the companies' relevant strategic decisions can emphasize their communication on the significance of not damaging the environment more. In other words, regardless of whether the company provides Internet services, it should manifest its will to protect the environment through green practices. That can be achieved by implementing a well-defined environmental plan and constant communication of its green practices to the users–consumers [38,39]. Nevertheless, the implementation and the communication of green practices through the web are also connected with the socio-political context. Some societies do not pay so much attention to sustainable development and green practices. There is a crucial difference in environmental web communication based on the company's audience. A website will have a different communication approach if it tries to reach a global audience compared to a web company, for example, that is located in a society where its members make a living by working on mining sites. Therefore, some web companies will not pay such close attention to their green services and practices in their communication campaigns [40].

In conclusion, the communication process for promoting the green culture and technologies has not been studied extensively, although sustainable development seems to be one of the most important goals on a global scale. In order to achieve sustainability in Internet services, a closer look should be taken into the promotion of green operations in the data centers, such as green hosting. The next section explains how this study researches green hosting communication by examining prominent companies around the world.

#### **3. Materials and Methods**

The current study used the *Green Web Foundation's directory*, which contained 475 green hosting companies that operated in 56 different countries of the globe. According to the *Green Web Foundation*, to register in its directory, a company has to provide evidence that the website is a real green provider and that a green provider hosts it. There are two options for proving that the websites–companies are an actual green user. The first one is the "Proof of using green energy", which means collaborating with a data center that is run by renewable energy. The directory's companies have to provide to the foundation a certificate "stating the number of MWhs that are bought in green and a period" [41]. The second option is the "Proof of accounting for the carbon emitted", which means that the services are run carbon neutral "by buying carbon offsets from projects that mitigate CO<sub>2</sub> in other projects". Similar to the first option, a certificate must be provided to the foundation for being included in the directory [41]. However, a search was conducted from 2 to 10 August 2020 to check whether every website was still online and whether the list contained duplicates. The result was that 391 companies' websites were available in 47 countries (Figure 1). A few were removed as they were not operating or were considered duplicates, because they belonged to the same company. The most prominent example of duplicates was "Amazon Web Services".
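The screening step described above (dropping sites that were no longer operating and collapsing several listings of the same company into one) can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say how the check was carried out, the `Entry` fields and the `clean_directory` helper are hypothetical names, and the rows are made up rather than taken from the directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One directory row; all field names are hypothetical."""
    company: str   # parent company name
    url: str       # listed website
    country: str   # country code
    online: bool   # whether the site responded during the check


def clean_directory(entries):
    """Drop offline sites, then keep the first entry per company."""
    seen = set()
    kept = []
    for entry in entries:
        if not entry.online:
            continue  # site was no longer operating
        key = entry.company.strip().lower()
        if key in seen:
            continue  # duplicate listing of the same company
        seen.add(key)
        kept.append(entry)
    return kept


# Made-up rows for illustration only; not the study's data.
sample = [
    Entry("Example Host", "https://example.com", "NL", True),
    Entry("Example Host", "https://example.org", "DE", True),
    Entry("Offline Host", "https://example.net", "GR", False),
    Entry("Another Host", "https://example.edu", "GR", True),
]

cleaned = clean_directory(sample)
countries = {e.country for e in cleaned}
print(len(cleaned), "companies in", len(countries), "countries")
```

Keeping the first listing per company is one possible rule; a study could equally keep the listing for the company's headquarters, which would change the per-country counts.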

**Figure 1.** The top 18 countries in green hosting websites–companies according to the *Green Web Foundation*.

The study used qualitative content analysis to examine the following research question (RQ1): How do a significant number of websites around the globe support and communicate green hosting services to their users? Qualitative content analysis can be employed for analyzing all kinds of media texts. Its purpose is not just to present the content of the examined media texts but also to identify and explain the main ideas communicated in those media texts. Qualitative content analysis is considered a common method in communication studies, and it has been used several times for analyzing communication phenomena based mostly on media texts, such as websites [42–44]. "Qualitative content analysis goes beyond merely counting words to examining language intensely for the purpose of classifying large amounts of text into an efficient number of categories that represent similar meanings" ([45] p. 1278). As a research method, it focuses on the subjective explanation of data's content to spot shared patterns and themes [45] (Figure 2).

These websites were chosen as there is an intermediate foundation that guarantees and checks that these websites–companies indeed use and provide green hosting services. *The Green Web Foundation* is located in The Netherlands, and, thus, it is not surprising that the majority of the entries in the directory are companies located in The Netherlands. In addition, the aforementioned directory includes some of the most prominent green hosting companies in the world, such as "A2 Hosting", "DreamHost", "GreenGeeks", and "HostPapa", to name a few [46–48]. As a result, the data collection was evaluated and focused on the study's research question, considering that it is not easy to identify a similar directory or if the websites–companies use green hosting [24,49].

**Figure 2.** The summary of the methodology.

The current study is inspired by previous papers [1,10,33,50] that are related to sustainability studies and employed content analysis [51–53]. The current research is highlighting the need for a better understanding of the green practices and how this green tradition is communicated through the green hosting companies in the era of big data, during which there is an urgent need for sustainable development and a reduction of the energy consumption used for the IT services. The next section presents how the websites' media texts were communicating the green hosting services to the users.

#### **4. Results**

#### *4.1. Green Hosting as an Important Part of the Services*

Among the examined websites–companies, many considered green hosting as an essential factor in choosing the services. These websites were highlighting that the services were not only eco-friendly but could also have a significant positive impact on the environment. In order to explain these arguments, different sections, or even blog posts, were focusing on presenting how the green hosting is being implemented. For example, GreenGeeks, one of the most popular companies providing green hosting, explained that it is a "green web hosting provider putting back 3 times the power we consume into the grid in the form of renewable energy" [54]. Furthermore, it was underlined that there is no waste of power and that the customers–users will be able to make a difference in the world by choosing these kinds of services. The company's website promotes its 300% renewable energy commitment through several dedicated sections. For example, one was exclusively about the data center of the company, and another one about how Internet services are actually polluting the environment. The latter section was titled "Did You Know The Internet Is One of the World's Largest Polluters?" and, amongst many arguments, it claimed that:

"Today, data centers account for 2% of the world's carbon emissions, that is as much as the AIRLINE INDUSTRY! But it doesn't stop there. Data Center pollution is expected to grow to 14% of the world's carbon emissions, as much as the United States of America, by 2040" [54].

Moreover, some websites went beyond highlighting the importance of green hosting; they tried to connect the two terms of green and sustainability, providing an in-depth analysis of the company's initiatives toward this path. The primary goal seemed to be to convince the customer that although the company has not succeeded in the 100% usage of renewable resources, it is about to achieve this goal in the coming years. Therefore, some websites–companies such as Amazon Web Services (AWS) had several sections on the relevant web page, showing what renewable resources are used for power consumption (e.g., wind and solar power). It presented the company's farms with reports and more data for the user to search more if needed [55]. Meanwhile, the company's commitment to 100% renewable energy was being highlighted on several occasions. For example, AWS chose on its website to thoroughly explain the next steps of this commitment by presenting the overall power consumption by the hour that will be achieved using renewable sources. AWS will open one new solar farm and four wind farms in the coming years. "Once complete, these wind and solar farms, combined with AWS's nine previous renewable energy projects, are expected to generate more than 2,900,000 MWh of renewable energy annually" [55].

For some websites–companies, the promotion of the term green was significant. It was used exclusively as a term throughout the website. These websites tried to educate the users and make them choose green products that consume green energy in every aspect of the company, such as the energy for the company's offices. In that way, the consumers could be assured that they contributed to a healthier and cleaner planet by choosing these services. In the meantime, through their choices, they were supporting the best practices in the field of IT services that promote holistic, sustainable business practices, and corporate responsibility. Apart from these facts, some websites also chose to underline the notion of carbon footprints and carbon-neutral in association with their employees' green life-office. For instance, HostPapa, a UK-based company, argued that it "has taken the initiative of going green by purchasing 100% green renewable energy to power our data centers, web servers, office computers, laptops, and office space" [56]. Another example is DreamHost's website, which was making several innovative claims, such as that there were "recycling bins in every office as far as the eye can see, even for single-serving coffee pods!", "ceramic cups, plates, and real silverware in every office. No disposables here!" and "generous work-from-home policies keep people off the roads and in their happy places" [57].

Adding to all these, some green hosting companies connected green and sustainable practices with an ethical aspect of the business model. For several companies, it seemed that going green was a must if the company cared about nature and the well-being of individuals. Thus, the customers must know the harmful impact that the hosting services have on the environment. Probably, that can be one of the reasons why alarming facts for the environment are linked with the services of the company. One example could be the company greneIT, which used the following paragraph on its website:

"The internet industry emissions are currently level with aviation traffic but the consumption at data centres is expected to double by 2025, producing more emissions than air transport" [58].

There seemed to be no significant differences between the promotion of those websites–companies' green services in relation to where they were headquartered. Other companies–websites based in Greece, the Netherlands, Austria, Germany, the United Kingdom, the USA, and so forth did not seem to use highly different content from the one presented above. The majority of the websites were located in Europe (Figure 3).

**Figure 3.** The number of countries green hosting websites–companies, according to the study's sample based on the world's continents (Turkey was included in the Middle East and Russia was included in Asia).

#### *4.2. Green Hosting as a Non-Important Part of the Services*

Despite several websites–companies considering green hosting as a significant aspect, there was also a large number from the *Green Web Foundation's* directory that did not include information about that kind of sustainable approach. These websites–companies paid attention to the detailed explanation of their services, such as security issues, the servers' performance regarding time, cloud applications, email, and domain services. In some cases, traces of green services could be identified by searching the websites thoroughly. There might be hints about the use of efficient data services and partners that were known for their commitment to go green, such as the Amazon Web Services (Figure 4). Nevertheless, it could not be identified if those websites–companies were using green hosting, or according to the guidelines for being in the *Green Web Foundation's* directory, to demonstrate actions that justify the "Proof of accounting for the carbon emitted". To put it differently, several websites–companies, even if they were participating in the green initiatives, did not want to highlight it, such as the above websites–companies.


**Figure 4.** A screenshot of a relevant website–company that had as a partner Amazon Web Services (AWS).

Furthermore, there were several other websites that, although they were offering green hosting and other green services, downplayed that feature, as it was not considered essential for choosing the company. Sometimes, with small banners, icons, and sentences, those websites–companies were showing to the potential customer their commitment about the green approaches, but they were not trying to explain in detail the actual meaning of the green hosting, the sustainable approaches, or the ethical and efficient support of the services (Figure 5).

**Figure 5.** A screenshot of a relevant website–company in Greek with an eco-friendly banner.

Lastly, these companies did not exploit the use of renewable resources, such as solar or wind energy, to differentiate themselves from the other competitors in the industry. Similar to the website's text, the website's design also did not try to present the green approaches as an advantage for choosing those companies. It was just a supplementary reason for choosing secured, quick, cheap, and reliable hosting services.

#### **5. Discussion**

Green hosting is an important phenomenon concerning the actual impact that it can have on the environment, as it is a sustainable model. One of its aims is to reduce the power consumption that is spent on IT and Internet services mainly through the use of renewable resources, such as solar or wind power. However, despite its significance, as there is considerable power consumption for those services, studies focusing on green hosting are on the margins of the research, especially in communication studies. In addition, until now, no research has focused on investigating how the websites and companies communicate information regarding their green hosting services to the consumers. Thus, this study situated itself in that gap to provide more evidence for understanding the process of communicating the green hosting services and the overall green tradition. It has practical implications for the fields of sustainability and environmental studies.

The findings showed that several green hosting services focused on explaining the green, ethical, and sustainable services and the beneficial impact they could have on the environment to the consumers. As a result, since they were disseminating viewpoints and solutions in favor of the environment, they were participating as actors in environmental communication [12]. That also shows that several companies try to convince the consumers–users that they have a well-defined environmental plan that includes green practices [38,39]. In addition, one of the primary purposes of the websites–companies was to differentiate themselves from the other hosting providers and communicate a green culture to the consumer. That trend has been observed in other studies of the field, in which some websites were trying to educate the users and initiate further participation of the individuals for environmental initiatives [1,50]. Apart from this communication process, it was observed that the most crucial point of the green hosting websites was the issue of power consumption, as, despite the standard PUE metric for assessing the energy efficiency of a data center, there is still uncertainty concerning the overall power consumption [26,27]. Thus, some highlighted the use of data centers that secure a low consumption or even surpass the 100% consumption by renewable resources, such as GreenGeeks that claimed that its data centers match 300% of renewable energy [54].

Moreover, several websites–companies alongside power consumption underlined CO2, carbon emissions, or carbon footprints to raise awareness for their services' benefits. Adding to these, some of the websites–companies went a step further, presenting a different working style, a green life-office that goes beyond providing services. These companies, contrary to those that considered green hosting as a supplementary service, promoted a different culture that included some of the components of sustainable development, which is also associated with the dissemination of CH (Cultural heritage) and societal traditions [15]. Furthermore, this research seems to support the arguments that eventually, the preservation or the improvement of traditions could benefit society in the future years, changing even the working conditions in a company [17]. Lastly, the promotion of green hosting seems to be an efficient digital marketing approach, as the Millennials prefer "buying from companies that help people, communities, and the environment" ([59] p. 87).

At this point, though, it should also be highlighted that there seemed to be a different communication approach concerning the promotion of the green practices from the most influential websites–companies that were included in the sample [46–48]. These companies, probably due to their global audience and impact, have chosen to highlight their green services and practices, in comparison to smaller companies whose revenue is based on a local market, in which the green practices might not be strongly supported by the local socio-political context [40].

Apart from these facts, it was startling that some of the world's most populated countries, such as China, India, and Indonesia, had almost no websites–companies in the *Green Web Foundation's* directory. In particular, China had no website, and the other two countries had one each. This finding was alarming, considering that these countries will have more Internet users in the coming years, meaning more power consumption. For example, China already has the most Internet users in the world, far more than Western countries [3]. It seemed that some European countries had adopted green hosting as a practice more than other nations. The most significant example was the Netherlands with 177 websites–companies in the directory followed by Switzerland and Greece (11 websites–companies for each country), considering the low population of these countries compared to the other countries of the directory. In addition, with 40 websites–companies, Germany seemed to play an important role in disseminating green hosting around Europe. Even though the *Green Web Foundation* is located in the Netherlands, the number of 177 websites–companies was high, considering its population (Figure 6).
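To put the reported counts in proportion, the short sketch below converts them into shares of the sample. It assumes the counts (177, 40, 11, and 11) refer to the 391 websites–companies retained after screening; if they instead refer to the original 475 directory entries, the percentages would be lower, and they may not match the values plotted in Figure 6. The `share` helper is a name introduced here for illustration.

```python
# Counts reported in the text; the total is the screened sample size.
TOTAL = 391
reported = {"Netherlands": 177, "Germany": 40, "Switzerland": 11, "Greece": 11}


def share(count, total=TOTAL):
    """Return the percentage of the sample, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {country: share(n) for country, n in reported.items()}
for country, pct in shares.items():
    print(f"{country}: {pct}%")
```

On this assumption, the Netherlands alone accounts for roughly 45% of the sample, which is the imbalance the paragraph above points to.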

**Figure 6.** The percentage of the most prominent countries in green hosting websites–companies, according to the study's sample.

These findings might be associated with the Netherlands' efforts for realizing its ambitious Sustainability Agenda that included, for instance, "the Dutch Dairy Organisation and the Dutch Agricultural and Horticultural organisation to have zero-carbon emissions in dairy chains by 2020" through green practices [60]. The Netherlands is one of the most prominent examples of European countries recognizing the urgent need for the implementation of green practices in order to protect also the CH of the country [19–21]. It is not surprising that 26 countries in Europe (see Figure 3) are represented in the study's sample. Due to the conventions and directions promoted by the EU, European countries have supported green practices, such as green hosting [18]. However, it is interesting that despite the study's straightforward findings, no other research until today has investigated the communication process of the green practices (i.e., green hosting) of these websites–companies.

#### **6. Conclusions**

The current study is considered the first in the field that provides evidence about how the green websites–companies communicate their green services through a novel methodology that can be replicated by other scholars in the field to study more in-depth environmental aspects of the websites' services. These services will be a prominent global issue in the coming years that has to be tackled, as they affect citizens' everyday lives worldwide. In addition, the use of the *Green Web Foundation's* directory offers a database and an organization that actually tries to assess the green practices of the websites and the Internet services, highlighting the need for more systematic analysis of more websites–companies around the globe, especially in Asia, where the number of users is about to increase and, thus, the relevant energy consumption and pollution will rise significantly if green practices are not followed.

Apart from these findings, the research reveals the connection between CH and green practices. It shows that the EU countries, such as the Netherlands, follow sustainable strategic decisions to promote environmental solutions concerning several aspects of everyday life, such as green hosting and data centers. Surprisingly, countries such as Greece, which is not known for its environmental initiatives, seem to follow the sustainable practices of other influential nations such as Germany and the Netherlands. Furthermore, despite their small population, the EU countries seem to take the lead in dealing with the negative environmental issues provoked by the Internet and website services. To attract the customers' interest, many companies are highlighting different aspects of their green practices, such as a green life-office, which does not seem to be strictly connected with the problems of power consumption and CO<sub>2</sub> emissions. That is a finding that underlines again the need for a more systematic assessment of the overall operations of the websites around the globe and the need for the companies to become more accountable by providing more relevant data to the public, instead of relying on other initiatives and organizations to evaluate if their services are actually following the known green practices.

Finally, future studies can take a closer look at other aspects of green practices and culture (i.e., if the green websites choose to use solar, wave, or wind power and how this usage is connected with each country's culture). That can be done by focusing more on the Asian countries (especially India and China), which due to their large population and the development of their Internet services, seem to be perceived as future global powerhouses. Therefore, a similar qualitative content analysis of such websites will enhance the relevant bibliography of green practices and culture.

**Author Contributions:** Conceptualization, M.-A.K. and N.A.; methodology M.-A.K. and N.A.; validation, M.-A.K. and N.A.; formal analysis, M.-A.K. and N.A.; investigation, M.-A.K. and N.A.; resources, M.-A.K. and N.A.; data curation, M.-A.K. and N.A.; writing—original draft preparation, M.-A.K. and N.A.; writing—review and editing, M.-A.K. and N.A.; visualization, M.-A.K. and N.A.; supervision, M.-A.K. and N.A.; project administration, M.-A.K. and N.A.; funding acquisition, M.-A.K. and N.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This article is based on research undertaken by the lead author while a doctoral student at Hong Kong Baptist University, supported by the Hong Kong PhD Fellowship Scheme (HKPFS). Apart from that scholarship, the authors received no other financial support for the research, authorship and/or publication of this article.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data used for the study were publicly available, and the authors retrieved them.

**Acknowledgments:** The authors want to thank the Editors and the Reviewers for their constructive comments that improved the current paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Data-Driven Analytics towards Software Sustainability: The Case of Open-Source Multimedia Tools on Cultural Storytelling**

**Michail D. Papamichail \* and Andreas L. Symeonidis**

Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Egnatia Str., University Campus, 54124 Thessaloniki, Greece; symeonid@ece.auth.gr **\*** Correspondence: mpapamic@issel.ee.auth.gr; Tel.: +30-2310-996-349

**Abstract:** The continuous evolution of modern software technologies combined with the deluge of available "ready-to-use" data has triggered revolutionary breakthroughs in several domains, preservation of cultural heritage included. This breakthrough is more than obvious just by considering the numerous multimedia tools and frameworks that actually serve as a means of providing enhanced cultural storytelling experiences (e.g., navigation in historical sites using VR, 3D modeling of artifacts, or even holograms), which are now readily available. In this context and inspired by the vital importance of sustainability as a concept that expresses the need to create the necessary conditions for future generations to use and evolve present artifacts, we target the software engineering domain and propose a systematic way towards measuring the extent to which a software artifact developed and applied in the cultural heritage domain is sustainable. To that end, we present a data-driven methodology that harnesses data residing in online software repositories and involves the analysis of various open-source multimedia tools and frameworks.

**Keywords:** software sustainability; multimedia tools; static analysis; evolution analytics

**Citation:** Papamichail, M.D.; Symeonidis, A.L. Data-Driven Analytics towards Software Sustainability: The Case of Open-Source Multimedia Tools on Cultural Storytelling. *Sustainability* **2021**, *13*, 1079. https://doi.org/10.3390/su13031079

Received: 20 December 2020 Accepted: 18 January 2021 Published: 21 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Given the definition provided by UNESCO [1], cultural heritage encompasses two main categories. The first is *tangible cultural heritage*, which refers to certain artifacts that survive in time such as paintings, manuscripts, sculptures, monuments, as well as cities or underwater ruins. The second is *intangible cultural heritage*, which refers to non-physical artifacts such as oral traditions, performing arts, rituals, as well as social practices and pre-existing knowledge. Preserving cultural heritage is of vital importance, since it is the only way to constitute the necessary knowledge-base upon which we as humans can evolve. Although preserving tangible cultural heritage artifacts is somehow "straightforward", preserving intangible ones requires recording, semantic annotation, and augmentation to put information in the right context. In fact, tangible artifacts are better perceived using such techniques. To that end, modern multimedia tools and frameworks provide several opportunities [2,3] as they can serve as a means for providing enhanced cultural storytelling experiences.

Living in a world of continuous digitalization [4], one could also argue that computer and software development trends carry along important cultural characteristics. Watching a "computer" or "science fiction" movie of the 1980s or 1990s, playing an arcade game, or listening to a certain type of music reveals many cultural heritage elements on how people lived, communicated, and entertained themselves [5]. Just as preservation is of vital importance for typical cultural heritage assets, the same applies in the software engineering domain, where the need to produce sustainable software has been defined as one of the key challenges in the fields of computational science and engineering [6]. This fact is more than evident considering that maintainability (the "official" term for sustainability in the "software engineering" language) is one of the most important quality characteristics according to ISO/IEC 25010:2011 [7]. Maintainability is defined as the "degree of effectiveness and efficiency with which a product or system can be modified by the intended maintainers". The importance of producing maintainable software is also indicated by the fact that maintenance-related tasks (i.e., code refactoring given the existence of updated requirements or bugs) often require up to 80% of the total effort put into the software project [8]. To that end, several research efforts are directed towards the identification and construction of efficient methodologies along with the respective tools that enable evaluating maintainability [9].

Assessing the extent to which a software component is maintainable is a multifaceted problem and is defined by the scope and internal characteristics of every software project. These characteristics greatly influence the effort required to modify and/or extend the project based on changes that occur both in terms of functional and non-functional requirements [10,11]. Various metrics have been proposed to model maintainability that quantify several primary properties of the source code such as cohesion, complexity, coupling, and degree of inheritance [12,13]. These metrics are often used as the information basis upon constructing maintainability evaluation models and predictors [8,14]. Apart from harnessing this information for assessing the extent to which a source code component is maintainable, constructing models requires defining the appropriate thresholds, which is a non-trivial task often taken by experts who manually examine the source code so as to decide the desirable values. However, considering the fact that maintainability evaluation is an adaptable procedure that requires examining various parameters over the project lifecycle along with the fact that maintenance effort is highly context-dependent, existing approaches are often restricted to certain use-case scenarios.

In this work, we argue that assessing software maintainability is not a one-off procedure, but a constantly running process throughout the development lifecycle that must adapt to the constantly changing characteristics of the software project. Given the continuously increasing size and complexity of current software projects, which makes the manual examination of the source code unattainable, we extend the data-driven methodology proposed by Papamichail et al. [8], which harnesses information residing in online code hosting facilities towards building a maintainability evaluation methodology based on the analysis of software releases. As opposed to the aforementioned approach, which involves analyzing various software projects without accounting for their scope, and given the highly context-dependent nature of maintainability, we focus on a specific domain and show how the scope of software projects indeed affects the software maintainability methodology. In order to be aligned with the scope of this Special Issue, we target the multimedia domain, a domain highly related to preserving cultural heritage. Upon formulating our benchmark dataset, we apply static analysis to more than one hundred open-source multimedia tools and frameworks in order to compute a large number of metrics along with their evolution in time. This information constitutes the basis upon which we construct four models, each evaluating maintainability from the perspective of a certain code property.

The rest of this paper is organized as follows. Section 2 reviews the current literature approaches on maintainability evaluation, while Section 3 discusses the concepts included in our maintainability evaluation methodology along with the construction of our benchmark dataset. Section 4 presents the components of our maintainability evaluation system, as well as the steps involved towards the generalization of our models. Finally, Section 5 evaluates our approach on a set of diverse axes, while Section 6 concludes the paper and provides insights for further research.

#### **2. Background Knowledge**

According to several studies [14–17], maintainability prediction is considered as one of the most challenging tasks involved in the area of software quality. To that end and given its significance as a quality attribute, several research efforts are directed towards proposing methodologies that aspire to assess the extent to which a source code component is maintainable. In this context, the majority of the proposed approaches construct models based on the values of static analysis metrics that quantify several aspects of the source code [8,18,19].

Although using metrics for evaluating maintainability has been proven efficient in certain use case scenarios, it exhibits certain inherent weaknesses. First, using metrics requires setting the appropriate thresholds and/or defining the acceptable intervals. Given that this is a multi-faceted problem that requires taking into account various parameters, this process is usually undertaken by quality experts who examine the source code and come up with the necessary quality targets [20,21]. However, the manual examination of the source code is both time- and resource-consuming, especially for large and complex projects. On top of that, this process is usually not feasible considering the fact that maintainability evaluation involves analyzing the source code on a regular basis given the changes that occur throughout the development process. These changes refer to both functional and non-functional requirements, and their frequency is highly dependent on the field of application of the software projects under evaluation. Especially in the multimedia domain, these changes are frequent considering that software needs to handle various different devices and formats, as well as the constantly updated architectures and communication protocols [22].

Given the limitations of expert-aided solutions, several approaches employ machine learning as a way to model the influence of the values of static analysis metrics on the maintainability degree of software components. In this context, Koten and Gray [23] used empirical data so as to train a Bayesian Belief Network (BBN) for assessing software maintainability, while Cong and Liu [24] applied a fuzzy C-means clustering technique as the preprocessing step towards evaluating maintainability using a Support Vector Regression (SVR) model. Additional maintainability evaluation approaches suggest the usage of Artificial Neural Network (ANN) models [25] and Multivariate Adaptive Regression Splines (MARS) [26]. Although efficient, these approaches do not account for the evolution of the software project under evaluation and thus do not provide the ability to predict non-maintainability at an early stage (before occurrence) when the required refactoring along with the respective change cost is minimal.

In an effort to overcome the aforementioned limitations and provide models that enable predicting maintainability, there are also approaches that employ release information as a way to monitor software evolution over time [27,28]. In a similar research direction, our prior work [8] suggested that the evolution of the values of static analysis metrics as reflected in their linear trends can be used as a maintainability indicator. In the context of this approach, we harness information residing in online code hosting facilities so as to identify non-maintainable code components and thus construct the ground truth upon which we build our maintainability evaluation models.

In this work, we employ the aforementioned approach and extend it in several directions in order to create an efficient maintainability evaluation methodology applicable in a specific domain (in our case, the multimedia domain). First, given that multimedia projects exhibit certain characteristics, we build a benchmark dataset that involves the analysis results of the most popular and reused multimedia tools, libraries, and frameworks. Second, in an effort to provide a more accurate evolution analysis, we refrain from using releases and instead analyze the development lifecycle at the week level. Finally, in order to reduce false positives and given that trend analysis is highly dependent on the development phase, we design a methodology that combines evolution analytics with the absolute values of the static analysis metrics under examination. The evaluation of our approach indicates that our models are able to successfully model the special characteristics of software projects that target the multimedia domain and thus efficiently predict non-maintainability along with providing interpretable results.

#### **3. Source Code Evolution as a Maintainability Indicator**

In this section, we discuss our maintainability evaluation methodology built on information originating from the evolution of the source code throughout the project lifecycle. Specifically, we present our modeling strategy towards defining the extent to which a software component that targets the multimedia domain is maintainable, inspired by the one proposed by Papamichail et al. [8]. Furthermore, we describe the construction of our benchmark dataset, which includes the values of various static analysis metrics computed for all multimedia-related tools, libraries, and frameworks included in the 1000 most starred and most forked GitHub Java projects.

#### *3.1. Towards Modeling Maintainability*

The aforementioned maintainability evaluation approach [8] suggests that the trends of several static analysis metrics can quantify the source code properties of *complexity*, *coupling*, *inheritance*, and *cohesion*. We further extend it and define the degree to which a software component is maintainable based on the combination of two key factors. The first refers to the evolution of the values of static analysis metrics as reflected in their trends, while the second refers to the absolute values of these metrics. Combining the absolute values of the static analysis metrics with their trends aims at reducing the number of false positives, especially in the cases of relatively new software projects where changes in metrics are intense. This change intensity may not always suggest that a component is becoming non-maintainable, but it should definitely act as a warning factor. The severity of this warning factor depends on the absolute values of the static analysis metrics. In addition, it is worth noting that the desirable intervals of the values of static analysis metrics do not involve expert knowledge, but originate from the benchmark dataset.

Given the above, we analyze the lifecycle of packages that have been dropped from certain multimedia software projects (considered as candidates for non-maintainability occurrence), as reflected in the progressing behavior of a series of static analysis metrics along with their absolute values. Instead of using releases for defining the frequency of our analyses and in an effort to capture the progressing behavior of metrics in a more efficient manner, we analyze projects on a weekly basis. This design choice originates from the fact that the release schedule is subject to change especially in projects with a long lifecycle.

Figure 1 illustrates the evolution of the Nesting Level (NL) metric for the package *com.eftimoff.androipathview* included in the repository *geftimov/android-pathview* (https://github.com/geftimov/android-pathview) over its full lifecycle, which consists of 73 weeks. Given the presented evolution, it is obvious that there are certain time periods where there are no changes in the respective package (for instance, the time period between Week 5 and Week 14). These idle periods refer to cases where the project appears to be inactive or the development focuses on different parts of the source code. As a result, in an effort to capture the actual evolution of each respective package, we keep only the weeks that exhibit at least one metric that has been changed. After having computed the actual change sequence for each package, we calculate the linear trend of each metric, which reflects its evolution behavior.
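The idle-week filtering and trend computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout of a weekly snapshot, and the sample values are assumptions, and the trend is taken to be the ordinary least-squares slope over the index of the retained weeks.

```python
from typing import Dict, List, Sequence


def active_weeks(snapshots: Sequence[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep the first snapshot and every later one in which at least one metric changed."""
    kept: List[Dict[str, float]] = []
    for snapshot in snapshots:
        if not kept or snapshot != kept[-1]:
            kept.append(snapshot)
    return kept


def linear_trend(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their position 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Hypothetical weekly package-level values (not taken from the paper).
weekly = [
    {"NL": 1.0, "LOC": 100.0},
    {"NL": 1.0, "LOC": 100.0},  # idle week, filtered out
    {"NL": 2.0, "LOC": 120.0},
    {"NL": 2.0, "LOC": 120.0},  # idle week, filtered out
    {"NL": 3.0, "LOC": 150.0},
]
changes = active_weeks(weekly)
trends = {metric: linear_trend([w[metric] for w in changes]) for metric in changes[0]}
```

A positive slope indicates that the metric grows across the weeks in which the package actually changed; removing idle weeks first prevents long inactive periods from flattening the slope.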

**Figure 1.** Overview of the evolution of the Lines Of Code (LOC) metric regarding the package *com.eftimoff.androipathview* of repository *geftimov/android-pathview*.

#### *3.2. Benchmark Dataset*

In an effort to create models that are tailor-made to the characteristics of the software projects that target the multimedia domain, our benchmark dataset includes the analysis results for all multimedia-related tools, libraries, and frameworks included in the 1000 most starred and forked GitHub Java projects. This selection originates from the fact that stars and forks reflect the degree of acceptance of the projects and thus their success among the community of developers. In addition, especially the high number of forks suggests that the projects adhere to certain software development principles and code writing guidelines and thus can be used as representative examples of the state-of-the-practice. Furthermore, projects that receive high traction are usually projects that exhibit a long lifecycle (usually several years) and a large number of contributors and thus are suitable for analyzing maintainability-related information.

Upon having extracted the information regarding the 1000 most starred and most forked GitHub projects, our first step involves selecting the ones that refer to the multimedia domain. To that end and in an effort to construct an automatic benchmark dataset formulation procedure, we use the GitHub API (https://api.github.com) and extract the description and the keywords of each project. Then, we check whether they contain words that are related to multimedia (such as *image*, *video*, *audio*, *view*, *sound*, *player*, and *media*, along with their synonyms). Following this process, we identified 114 projects, which constitute our benchmark dataset. Table 1 presents some general statistics regarding our benchmark dataset.
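A minimal sketch of the keyword check is given below. It assumes the description and keywords (topics) of each project have already been retrieved; the function name is hypothetical, the synonym list is not given in the text and is therefore omitted, and prefix matching on word tokens is an assumption about how the matching might be done.

```python
import re
from typing import Iterable, Set

# Keywords named in the text; the synonyms are not listed there, so none are added.
MULTIMEDIA_KEYWORDS: Set[str] = {"image", "video", "audio", "view", "sound", "player", "media"}


def is_multimedia(description: str, topics: Iterable[str]) -> bool:
    """Return True if a word in the description or topics starts with a keyword."""
    text = " ".join([description, *topics]).lower()
    tokens = set(re.findall(r"[a-z]+", text))
    return any(token.startswith(keyword) for token in tokens for keyword in MULTIMEDIA_KEYWORDS)
```

For example, `is_multimedia("A fast image loading library", [])` is true, while a project described only as "JSON parser" is rejected. Prefix matching also accepts plurals such as "images", at the cost of occasional false matches that would need manual review.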

**Table 1.** Dataset statistics.


After having constructed our benchmark dataset, we perform two types of analysis. The first involves analyzing the latest version (last commit) of all projects in order to compute a large set of static analysis metrics that quantify four primary code properties: *cohesion*, *complexity*, *coupling*, and *inheritance*. This dataset is used to extract the profiles of the static analysis metrics and thus calculate the desirable intervals based on frequency analysis. The second analysis type refers to monitoring code evolution. In this context, we select 10 projects to perform a full lifecycle analysis at the week level. Given that performing full lifecycle analysis is a highly time- and resource-consuming task and in an effort to create a dataset that represents multimedia projects that exhibit different characteristics, this selection is based on the size, the complexity, and the length of the lifecycle of the projects. Of course, although the full analysis of the 10 projects provides enough information for building our maintainability evaluation models, given that our methodology is fully incremental, we can increase this number in order to further strengthen the effectiveness of our models. Information regarding our experimental setup along with the corresponding source code for creating our benchmark dataset can be found online (https://github.com/AuthEceSoftEng/multimedia-tools-sustainability).

In the context of this analysis, we extract information regarding the packages that have been removed because they were non-maintainable. The analysis results for these packages are used to extract the trends of the static analysis metrics and thus create the training dataset for our maintainability evaluation models. The evolution analysis at the week level involves analyzing more than 50 M lines of code. Table 2 provides a full reference of the computed metrics along with their computation level (method or class). Given that all static analysis metrics are computed either at the class level or method level, we generate the value of each metric at the package level as the average of the values regarding all classes and methods included in the package. The computation of all static analysis metrics was performed using the Sourcemeter tool (https://www.sourcemeter.com/).
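The package-level aggregation can be illustrated with a short sketch. The record layout `(package, metric, value)` and the function name are assumptions made for illustration; each record stands for one class-level or method-level measurement.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def package_level(records: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Average (package, metric, value) records into one value per package and metric."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for package, metric, value in records:
        sums[package][metric] += value
        counts[package][metric] += 1
    return {p: {m: sums[p][m] / counts[p][m] for m in sums[p]} for p in sums}


# Hypothetical measurements for a single package.
records = [
    ("com.example.player", "NL", 2.0),
    ("com.example.player", "NL", 4.0),
    ("com.example.player", "McCC", 3.0),
]
averages = package_level(records)
```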


**Table 2.** Overview of the computed static analysis metrics.

#### **4. Maintainability Evaluation System**

In this section, we present our approach towards quantifying the extent to which a multimedia-related software component is maintainable. In addition, we describe the calculation of the desired intervals of the values of various static analysis metrics based on our benchmark dataset along with the construction of our models, each targeting a certain source code property.

#### *4.1. Overview*

Figure 2 provides a general overview of our maintainability evaluation system targeting multimedia tools, which involves the following steps.


metrics, while the second involves analyzing all the aforementioned commits of the 10 projects selected for lifecycle analysis. This information is going to be used to calculate the metrics' progressing behavior.


**Figure 2.** Overview of the designed system.

#### *4.2. Preprocessing*

The preprocessing step involves identifying the non-maintainable packages along with the source code property (or properties) that causes non-maintainability. This first step involves computing all metrics at the package level in order to identify their trends using linear regression. Given that we have already performed a full-scale analysis at the week level, we use the results in order to create the necessary mappings and thus identify the classes that are part of each package. In order to perform the necessary mappings, we use the package declarations located at the top of each source code file.

After having calculated all trends for all packages included in the 10 projects under evaluation, the next step involves extracting the packages that are being dropped. Upon using the commit information to sort the analyses in the correct order based on the commit timestamp, we identify the lifecycle of each unique package. The term lifecycle refers to the time period between the first week and the last week the package existed in the project. In case the index of the last week is not equal to the index of the last week of the project, then the package has been dropped and thus is considered as a candidate for being non-maintainable.
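The lifecycle rule above can be sketched as follows, assuming the weekly analyses are already sorted by commit timestamp and each week is represented by the set of package names present in it (both the representation and the function names are illustrative assumptions).

```python
from typing import Dict, List, Set, Tuple


def package_lifecycles(weekly_packages: List[Set[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each package to the (first_week, last_week) indexes in which it appears."""
    spans: Dict[str, Tuple[int, int]] = {}
    for week, packages in enumerate(weekly_packages):
        for package in packages:
            first, _ = spans.get(package, (week, week))
            spans[package] = (first, week)
    return spans


def dropped_candidates(weekly_packages: List[Set[str]]) -> List[str]:
    """Packages whose last week differs from the last week of the project."""
    last_project_week = len(weekly_packages) - 1
    spans = package_lifecycles(weekly_packages)
    return sorted(p for p, (_, last) in spans.items() if last != last_project_week)


# Hypothetical three-week project: package "b" disappears in the last week.
weeks = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
candidates = dropped_candidates(weeks)
```

The returned packages are only candidates; the quality filters described next decide which of them are kept as non-maintainable.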

After having extracted all candidates, the next step involves applying a series of quality criteria so as to eliminate false positives given that dropping a package does not necessarily originate from actions that have to do with quality control. To that end, in an effort to maintain the purity of our dataset, we apply the following filters:


After having extracted all the non-maintainable packages, the next step involves deciding the property (or properties) that is (or are) responsible for them being non-maintainable. For instance, a dropped package that appears to have a high positive trend for the Nesting Level (NL) metric is considered as non-maintainable due to complexity. Upon applying this process for all packages and code properties under evaluation (complexity, coupling, cohesion, and inheritance), we use this ground truth information for constructing our maintainability evaluation models.

#### *4.3. Metrics Behaviors' Extraction*

As already noted, our methodology involves using the analysis results for the 114 multimedia-related projects included in the 1000 most popular and reused GitHub Java projects in order to extract the general behavior of each static analysis metric. These behaviors are then used for translating the values of each static analysis metric into a score in the interval of [0, 1], which reflects the compliance of the source code component with the state-of-the-practice as extracted by the benchmark dataset. This score is used along with the metrics' trends for constructing our maintainability evaluation models. Given that each metric quantifies a certain property, the scores of all metrics that refer to a certain property are aggregated into a final score that reflects the property itself.

Our first step towards modeling the general behavior of each metric involves computing its distribution using all code components included in our benchmark dataset. In order to eliminate any introduced bias and given that different projects contain code components that exhibit high differences in terms of the values of static analysis metrics, we apply outlier detection techniques so as to eliminate extreme values. In this context, we use boxplot analysis and eliminate values that fall outside the interval [*Q*1 − 1.5 · *IQR*, *Q*3 + 1.5 · *IQR*], where *Q*1 and *Q*3 refer to the first and the third quartile, respectively, while *IQR* refers to the Interquartile Range. After having eliminated outliers, we compute the distribution of the values of each metric as reflected in their histogram. For selecting the appropriate bin size, we employ the Scott formula [30], which asymptotically minimizes the integrated mean squared error, a global error measure of a histogram estimate. Under the Scott formula, the bin width is given by:

$$
\mathit{BinWidth} = 3.49 \cdot \hat{\sigma} \cdot n^{-1/3} \tag{1}
$$

In the above equation, *σ̂* is an estimate of the standard deviation of the metric values and *n* is the size of the data sample.
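The outlier removal and bin-width selection can be sketched with the Python standard library. Two details are assumptions, since the text does not specify them: the quartiles use the default method of `statistics.quantiles`, and the standard deviation estimate is the sample standard deviation. The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def remove_outliers(values: Sequence[float]) -> List[float]:
    """Drop values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] (boxplot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]


def scott_bin_width(values: Sequence[float]) -> float:
    """Scott's rule: 3.49 * sigma_hat * n^(-1/3), with the sample standard deviation."""
    return 3.49 * statistics.stdev(values) * len(values) ** (-1 / 3)


def bin_count(values: Sequence[float], width: float) -> int:
    """Number of equal-width bins needed to cover the value range."""
    return max(1, math.ceil((max(values) - min(values)) / width))


# Hypothetical metric values with one extreme observation.
raw = [1, 2, 3, 4, 5, 6, 7, 8, 100]
clean = remove_outliers(raw)
width = scott_bin_width(clean)
bins = bin_count(clean, width)
```

Different quartile conventions give slightly different fences on small samples, so the exact set of removed values may vary with the tool used.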

Upon having extracted the generic distribution of the values of each static analysis metric following the aforementioned procedure, we use the generated bins in order to construct a set of data instances that translate the values of each metric into a compliance score. These data instances have the form [*BinCenter*, *Score*], where *BinCenter* refers to the center of each bin and *Score* refers to the normalized frequency of the bin. In that way, the bins of higher frequency receive higher scores. In an effort to model the identified behaviors, we apply polynomial regression on the set of data instances produced in the previous step, and the result for each metric is an evaluation model able to translate the values of the metric into a score given the standards of the benchmark dataset.
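A possible realization of this step with NumPy is sketched below. It assumes that "normalized frequency" means dividing each bin count by the largest count, so the most frequent bin scores 1, and it clips model outputs to [0, 1]; both choices, as well as the function names, are assumptions rather than details confirmed by the text.

```python
import numpy as np


def fit_score_model(values, bin_width, degree):
    """Fit a polynomial to [BinCenter, Score] pairs derived from the histogram."""
    values = np.asarray(values, dtype=float)
    n_bins = max(1, int(np.ceil((values.max() - values.min()) / bin_width)))
    counts, edges = np.histogram(values, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    scores = counts / counts.max()  # assumption: normalize by the most frequent bin
    model = np.poly1d(np.polyfit(centers, scores, degree))
    return model, centers, scores


def compliance_score(model, value):
    """Translate a metric value into a score, clipped to the interval [0, 1]."""
    return float(np.clip(model(value), 0.0, 1.0))


# Hypothetical metric values, most of them small.
sample = [1] * 10 + [2] * 6 + [3] * 3 + [4] * 1
model, centers, scores = fit_score_model(sample, bin_width=1.0, degree=2)
```

Values that fall in densely populated bins of the benchmark receive scores close to 1, while rarely observed values receive low scores.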

Figure 3 illustrates the aforementioned procedure for the case of the *Nesting Level* (NL) metric, which is computed at the class level. The blue bars depict the histogram of the NL values (the ones kept after the outlier detection step), while the black dashed line refers to the fitted curve that translates NL values into a complexity score. The degree of the polynomial for each metric is determined using the elbow method of the *Root-Mean-Squared-Error (RMSE)*. This ensures that the constructed models are effective and able to provide reasonable estimates, while we avoid overfitting. Given the actual scores *y*<sub>1</sub>, *y*<sub>2</sub>, …, *y*<sub>*N*</sub> and the predicted scores *ŷ*<sub>1</sub>, *ŷ*<sub>2</sub>, …, *ŷ*<sub>*N*</sub>, the RMSE is calculated as follows:

$$RMSE = \sqrt{\frac{1}{N} \cdot \sum_{i=1}^{N} (\hat{y}_i - y_i)^2} \tag{2}$$
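The error measures and a simple elbow heuristic can be sketched as follows. The text does not define how the elbow is located, so the rule used here (stop at the first degree after which the relative RMSE improvement drops below a tolerance) and the `tolerance` parameter are assumptions, as are the sample RMSE values.

```python
import math
from typing import Dict, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-squared error between actual and predicted scores."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between actual and predicted scores."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def elbow_degree(rmse_by_degree: Dict[int, float], tolerance: float = 0.05) -> int:
    """Smallest degree after which the relative RMSE gain falls below the tolerance."""
    degrees = sorted(rmse_by_degree)
    for current, following in zip(degrees, degrees[1:]):
        base = rmse_by_degree[current]
        gain = (base - rmse_by_degree[following]) / base if base > 0 else 0.0
        if gain < tolerance:
            return current
    return degrees[-1]


# Hypothetical RMSE values per polynomial degree.
errors = {1: 0.30, 2: 0.12, 3: 0.05, 4: 0.049, 5: 0.0485}
chosen = elbow_degree(errors)
```

In this example the error stops improving noticeably after degree 3, so higher degrees would add flexibility without a matching gain in fit.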

**Figure 3.** Overview of the fitting procedure based on the general distribution of the nesting level metric.

The RMSE and Mean Absolute Error (MAE) of the polynomial regression models for all metrics computed at the method and class levels are shown in Table 3.


**Table 3.** Polynomial regression results.

#### *4.4. Models' Construction*

As already noted, after having calculated the metrics' trends for the packages identified as non-maintainable along with the respective property (or properties) flagged as responsible for the non-maintainability, the next step involves training four maintainability evaluation models, each targeting a certain source code property. Given that we only have information regarding the packages identified as non-maintainable (we cannot come to a conclusion for the other packages), we employ one-class classification using Support Vector Machines (SVMs). The selection of four models instead of one (using all metrics) originates from the fact that our primary target was building a configurable and interpretable maintainability evaluation system able to adapt to the individual needs of each project under examination.

As for training each model, we use only the packages that were flagged for the respective source code property. The attributes of the training dataset are the computed trends of the static analysis metrics that quantify the respective property along with the score computed using the general behavior of metrics. Table 4 presents the number of packages identified as non-maintainable for each source code property, while Table 5 provides information regarding the selection of meta-parameters for each one-class classifier. This selection is based on the percentage of False Negatives (FNs) and optimizes the values of three meta-parameters: *nu*, which corresponds to an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, *gamma*, which is the kernel coefficient that reflects how far the influence of a single training example reaches, and *cost*, which trades off the misclassification of training examples against the simplicity of the decision surface. As shown in Table 4, coupling and complexity are the dominant properties responsible for most non-maintainable occurrences.
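A one-class classifier of this kind can be sketched with scikit-learn's `OneClassSVM`. This is an illustration under stated assumptions, not the authors' setup: the training matrix is synthetic, the RBF kernel is assumed, and scikit-learn's `OneClassSVM` exposes `nu` and `gamma` but has no cost parameter, so *cost* is left out here.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def train_property_model(features, nu, gamma):
    """Fit a one-class SVM on packages flagged as non-maintainable for one property."""
    model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
    model.fit(np.asarray(features, dtype=float))
    return model


def false_negative_rate(model, features):
    """Share of known non-maintainable packages that the model rejects (predicts -1)."""
    predictions = model.predict(np.asarray(features, dtype=float))
    return float(np.mean(predictions == -1))


# Synthetic stand-in for the training data: one row per flagged package,
# five metric trends plus one compliance score per row (hypothetical values).
rng = np.random.default_rng(0)
flagged = rng.normal(size=(200, 6))
complexity_model = train_property_model(flagged, nu=0.022, gamma=0.134)
fn_rate = false_negative_rate(complexity_model, flagged)
```

Since only non-maintainable packages are available for training, the model learns the region occupied by that class; a package predicted as `+1` falls inside it and is treated as non-maintainable for the corresponding property.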


**Table 4.** Number of non-maintainable packages per source code property.

The following paragraphs present the training results regarding the trained maintainability evaluation models, each targeting a different primary source code property.

• Complexity model:

The dataset includes the trends regarding five static analysis metrics that are related to complexity: NL, Nesting Level Else-if (NLE), Weighted Methods per Class (WMC), McCabe Cyclomatic Complexity (McCC), and Halstead Program Length (HPL). As shown in Table 5, the selected values for the nu, gamma, and cost parameters are 0.022, 0.134, and 512, respectively. The percentage of the FNs is 2.62%.

• Cohesion model:

The dataset includes the trends regarding the Lack of Cohesion in Methods (LCOM5) metric, which corresponds to the number of coherent classes into which each class could be split. In a similar manner to the aforementioned analysis, the selected values are 0.041, 0.047, and 256 for the nu, gamma, and cost parameters, respectively, while the percentage of false negatives is 3.37%.

• Coupling model:

The dataset includes the trends regarding five static analysis metrics that are related to coupling: Coupling Between Object classes (CBO), Coupling Between Object classes Inverse (CBOI), Number of Incoming Invocations (NII), Number of Outgoing Invocations (NOI), and Response set For Class (RFC). For the coupling model, the selected values are 0.03, 0.06, and 256 for the nu, gamma, and cost parameters, respectively, while the percentage of false negatives is 2.84%.

• Inheritance model:

The dataset includes the trends regarding five static analysis metrics that are related to inheritance: Depth of Inheritance Tree (DIT), Number Of Ancestors (NOA), Number Of Children (NOC), Number Of Descendants (NOD), and Number Of Parents (NOP). For the inheritance model, the selected values are 0.027, 0.12, and 32 for the nu, gamma, and cost parameters, respectively, while the percentage of false negatives is 2.67%.


**Table 5.** Statistics regarding the selection of meta-parameters for the constructed models based on the percentage of False Negatives (FNs).

#### **5. Evaluation**

The evaluation of our maintainability evaluation methodology is performed around three axes. The first evaluates our system for its ability to predict the maintainability degree at the package level for a number of randomly selected multimedia-related projects that exhibit different characteristics in terms of the size and length of the lifecycle. The second axis evaluates our system for its ability to predict non-maintainability at an earlier stage, while the third evaluates the maintainability evaluation results from a software quality perspective.

#### *5.1. Efficiency of Maintainability Evaluation*

Our first evaluation axis assesses the ability of our maintainability evaluation system to effectively identify non-maintainable packages by employing the evolution of static analysis metrics as reflected in their linear trends along with their compliance with the state-of-the-practice as reflected in their acceptable intervals based on the constructed benchmark dataset. To that end, we apply our methodology on four independent and randomly selected multimedia projects. Namely, the selected projects are alexvasilkov/GestureViews (https://github.com/alexvasilkov/GestureViews), graphhopper/graphhopper (https://github.com/graphhopper/graphhopper), janishar/PlaceHolderView (https://github.com/janishar/PlaceHolderView), and wyouflf/xUtils3 (https://github.com/wyouflf/xUtils3). Table 6 presents certain statistics regarding the evaluation repositories. As given by the provided statistics, the evaluation repositories differ both in terms of size and in their length of lifecycle. At this point, it is worth noting that the full-scale analysis for the aforementioned repositories involves analyzing more than 25 million lines of code.

**Table 6.** Statistics of evaluation repositories.


Figure 4 gives a graphical representation of our maintainability evaluation results using the janishar/PlaceHolderView project as our reference repository. The lifecycle of the PlaceHolderView project is 96 weeks, and it includes 24 packages that contain around 9000 lines of code. Upon analyzing a snapshot of the project for each one of the 96 weeks (around 500 K lines of code), we compute the trends of all static analysis metrics that quantify the properties complexity, coupling, inheritance, and cohesion. In addition, given the constructed models that enable translating the values of each static analysis metric into a compliance score (using the extracted general distributions), we compute for each property one score for each package that expresses its compliance with the acceptable intervals of the static analysis metrics that quantify the property. The computed trends along with the compliance scores are then given as input to our already constructed maintainability evaluation models, each targeting a certain code property. Each row of the heat map illustrates the maintainability evaluation results based on a different property. The green color denotes that the package is considered as maintainable regarding the respective code property, while red indicates that the package is considered as non-maintainable. The final maintainability score for each package is computed as the average of the four respective properties. This final score reflects the risk of the package becoming non-maintainable, while the disaggregation provides interpretable results regarding the properties that need improvement.
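The aggregation into a final score can be sketched as below, assuming each property model yields a binary verdict that is encoded as 1 (maintainable) or 0 (non-maintainable) before averaging. The encoding and the function names are assumptions; the text only states that the final score is the average over the four properties.

```python
from typing import Dict, List

PROPERTIES = ("complexity", "coupling", "inheritance", "cohesion")


def package_score(maintainable: Dict[str, bool]) -> float:
    """Average of the four per-property verdicts (1 = maintainable, 0 = not)."""
    return sum(1.0 if maintainable[p] else 0.0 for p in PROPERTIES) / len(PROPERTIES)


def needs_improvement(maintainable: Dict[str, bool]) -> List[str]:
    """Properties for which the package was judged non-maintainable."""
    return [p for p in PROPERTIES if not maintainable[p]]


# Hypothetical verdicts for one package (one column of the heat map).
verdicts = {"complexity": False, "coupling": True, "inheritance": True, "cohesion": True}
score = package_score(verdicts)
weak_spots = needs_improvement(verdicts)
```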

**Figure 4.** Overview of the maintainability evaluation results for the repository janishar/PlaceHolderView.

To further evaluate the ability of our models to effectively identify non-maintainable packages, we present the results for the four repositories used for evaluation. Table 7 presents the respective results based on the sensitivity criterion along with the percentage of packages identified as non-maintainable for each source code property. Sensitivity was chosen as our evaluation criterion as it expresses the proportion of actual positives (here, non-maintainable packages) that are correctly identified by our models, given that we can only come to a safe conclusion for non-maintainable packages. Given the provided results, the sensitivity (true positive rate) of our maintainability evaluation approach varies from 76.12% (janishar/PlaceHolderView project) to 92.47% (graphhopper/graphhopper project), which indicates that our models are able to effectively identify non-maintainable packages. Finally, the properties responsible for the non-maintainable packages vary among the evaluation repositories, which indicates that every project exhibits different strengths and weaknesses. This is expected as the characteristics of each project are greatly influenced by its scope. For instance, given the nature of the graphhopper project, which implements a routing engine for OpenStreetMap, along with its large size and complex functionality, it is expected to exhibit a higher percentage of non-maintainable packages based on complexity and coupling.
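For reference, sensitivity is the standard true positive rate, TP / (TP + FN). The sketch below shows the computation; the function name and the counts in the usage line are hypothetical and are not taken from Table 7.

```python
def sensitivity(true_positives, false_negatives):
    """Return the true positive rate TP / (TP + FN).

    Here a "positive" is a package known to be non-maintainable:
    a true positive is one the models flagged, and a false negative
    is one the models missed.
    """
    total_positives = true_positives + false_negatives
    if total_positives == 0:
        raise ValueError("sensitivity is undefined without actual positives")
    return true_positives / total_positives


# Hypothetical counts: 45 non-maintainable packages flagged, 5 missed.
print(f"{sensitivity(45, 5):.2%}")
```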

**Table 7.** Maintainability evaluation results.


#### *5.2. Ability to Provide Early Predictions*

Given the vital importance of predicting maintainability at an earlier stage as a way to prevent cases where major refactoring is needed, which is a highly time- and resource-consuming task, our second evaluation axis targets assessing the ability of our system to provide early predictions and thus act in a preventive rather than in a corrective manner. The early prediction refers to the percentage of lifecycles (expressed as the number of weeks) for which our models are able to correctly identify non-maintainability.

To that end, we calculated the metrics' trends for every package and for every week in the lifecycle taking into account only the previous releases. For instance, given a certain package that appears to be in the project for 60 weeks (this time period constitutes its full lifecycle) and is then being dropped, we use only the values of the first 45 weeks (as if our current timestamp was the 45th week) in order to calculate the metrics' trends and use our models to evaluate its maintainability degree. If we successfully identify the package as non-maintainable, then we have a correct prediction 15 weeks ahead, which corresponds to 25% of the package lifecycle. Using this strategy for all weeks, we were able to assess the maintainability degree of each package for every release, as if it was the current release and thus calculated the number of releases ahead for which our models provided correct evaluation. The number of releases was then transformed into the percentage of lifecycles for each package by dividing it by the total number of weeks. As already noted, we use the term lifecycle for a package in order to refer to the time period between the first and the last week it existed in the software project.
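The arithmetic of the worked example above (a 60-week lifecycle evaluated as of week 45) can be sketched as follows. The function names are hypothetical and the sketch covers only the lead-time calculation, not the trend computation or the evaluation models.

```python
def weeks_ahead(lifecycle_weeks, cutoff_week):
    """Number of weeks between the evaluation cutoff and the end of the lifecycle."""
    if not 0 < cutoff_week <= lifecycle_weeks:
        raise ValueError("cutoff_week must lie within the package lifecycle")
    return lifecycle_weeks - cutoff_week


def lead_fraction(lifecycle_weeks, cutoff_week):
    """Lead time of a prediction as a fraction of the package lifecycle."""
    return weeks_ahead(lifecycle_weeks, cutoff_week) / lifecycle_weeks


# The example from the text: 60-week lifecycle, only the first 45 weeks used.
print(weeks_ahead(60, 45))    # weeks ahead of the package being dropped
print(lead_fraction(60, 45))  # same lead time as a fraction of the lifecycle
```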

Figure 5 illustrates the ability of each evaluation model to provide early predictions. Specifically, the *y* axis corresponds to the percentage of packages correctly identified as non-maintainable for each source code property, while the *x* axis refers to the percentage of lifecycles divided into ten intervals. Each interval expresses the percentage of the lifecycle ahead for which our models are able to provide correct maintainability evaluation. For instance, the first interval refers to the time period between the current week (0% ahead) up to 10% ahead. Given a certain interval, each bar refers to a different code property. As expected, while the lifecycle ahead increases, the percentage of correctly identified packages decreases. The results indicate that all four models are able to provide correct evaluation (for almost 50% of the non-maintainable packages) at least 50% to 60% earlier.
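One way to assign a prediction to one of the ten intervals on the *x* axis is sketched below. The half-open boundaries (0% up to but excluding 10%, and so on, with 100% folded into the last interval) and the function name are assumptions; the text does not specify how boundary values are treated.

```python
def lifecycle_interval(weeks_ahead, lifecycle_weeks, n_intervals=10):
    """Return the zero-based index of the lifecycle interval for a lead time.

    Index 0 covers 0% up to (but excluding) 10% ahead, index 1 covers
    10% up to 20%, and so on. Integer arithmetic avoids floating-point
    rounding at the interval boundaries.
    """
    if lifecycle_weeks <= 0 or not 0 <= weeks_ahead <= lifecycle_weeks:
        raise ValueError("weeks_ahead must lie within the package lifecycle")
    index = (weeks_ahead * n_intervals) // lifecycle_weeks
    # A lead time of exactly 100% is folded into the last interval.
    return min(index, n_intervals - 1)


# A prediction 15 weeks ahead in a 60-week lifecycle (25% ahead)
# falls into the third interval, i.e., index 2.
print(lifecycle_interval(15, 60))
```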

**Figure 5.** Overview of the percentage of the correctly identified non-maintainable packages for the evaluation repositories.

#### *5.3. Case Study*

As for the third evaluation axis and in an effort to assess whether the compliance scores computed using the general distribution of metrics are logical from a software quality perspective, we manually examined the values of the static analysis metrics for methods and classes that received both high and low compliance scores (these scores are then aggregated into the package level and are used along with the metrics' trends for modeling). Table 8 provides an overview of the computed static analysis metrics for representative examples of methods and classes with different scores. The table contains static analysis metrics for two methods and two classes regarding each source code property that received both high and low scores.

Examining the values of the metrics, we may note that the scores regarding all four properties are reasonable from a quality perspective. Concerning the class that received a high cohesion score, it appears to be very cohesive as the LCOM5 (Lack of Cohesion in Methods 5) metric, which refers to the number of cohesive classes in which a non-cohesive class should be split, is one. From a complexity perspective, the class that received a high score appears to be very well structured, which is denoted by the low values of the nesting level (NL and NLE) metrics, along with the low value (eight) of the Weighted Methods per Class (WMC) metric. The latter is computed as the sum of the McCabe's Cyclomatic Complexity (McCC) values of its local methods. As a result, a high score for the complexity property is expected. The same applies for the case of methods, where the one that received a low score appears to be highly nested and of extreme complexity considering the value of McCC (31).
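The relation between WMC and McCC stated above can be illustrated with a short sketch. The method names and complexity values are hypothetical and were chosen only so that the sum equals eight, matching the value mentioned in the text; they are not taken from Table 8.

```python
def weighted_methods_per_class(method_complexities):
    """Compute WMC as the sum of the McCC values of a class's local methods.

    `method_complexities` maps each local method name to its
    McCabe's Cyclomatic Complexity (McCC).
    """
    return sum(method_complexities.values())


# Hypothetical class with four local methods.
mccc_per_method = {"open": 1, "read": 3, "parse": 3, "close": 1}
print(weighted_methods_per_class(mccc_per_method))
```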

As for the coupling property, the class that received a high score appears to be very well decoupled, which is denoted by the values of all five metrics. The same applies for the respective methods that exhibit low values regarding the metrics NII and NOI, which refer to the number of incoming and outgoing invocations, respectively. On the other hand, the code components that receive low scores appear to be highly coupled, which has a negative impact on their maintainability degree. Finally, given the values of the static analysis metrics that quantify inheritance, the class that received a low score appears to lie deep in the code inheritance tree, which has a negative impact on its understandability. This fact also affects its maintainability degree. On the other hand, the class that received a high score appears to be well placed in the inheritance tree following the principles of object-oriented programming. Given all of the above, our scoring mechanism appears to be able to effectively translate the values of static analysis metrics into an interpretable compliance score.


**Table 8.** Overview of the scores for methods and classes that received both high and low compliance scores.

#### **6. Conclusions and Future Work**

In this work, we propose a maintainability evaluation methodology targeting multimedia projects that harness information residing in code hosting facilities. Our methodology, applicable at the package level, employs the evolution of static analysis metrics along with the compliance of the source code with their acceptable intervals as extracted by the benchmark dataset, which contains multimedia projects that receive a high degree of acceptance by the community of developers. Upon performing a thorough code analysis on a weekly basis and in an effort to provide interpretable results, our methodology quantifies the extent to which a software component targeting the multimedia domain is maintainable by evaluating four axes, each targeting a primary source code property: complexity, coupling, inheritance, and cohesion. The evaluation of our approach denotes that our models are able to predict maintainability at an earlier stage (in many cases, more than 50% of the project lifecycle), while at the same time, the results regarding all four axes are logical from a software quality point of view. Considering all the above, we argue that our system can be a valuable tool for developers.

Future work lies in several directions. At first, we could further expand the selection of metrics to be used for the construction of our models. In addition, we could also expand the trend analysis by employing additional trend types, especially non-linear, in order to be able to identify more complex behaviors. Furthermore, we could also apply clustering techniques in order to split our code components into coherent clusters and thus construct additional models, each applying to the specific characteristics of each cluster. Finally, we could also expand our benchmark dataset with more multimedia-related projects to cover additional use case scenarios and thus strengthen the effectiveness of our system.

**Author Contributions:** Conceptualization, M.D.P. and A.L.S.; Methodology, M.D.P. and A.L.S.; Software, M.D.P. and A.L.S.; Writing—original draft, M.D.P. and A.L.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH-CREATE-INNOVATE (project code: T1EDK-02347).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data is contained within the article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Digital Storytelling in Cultural Heritage: Audience Engagement in the Interactive Documentary New Life**

**Anna Podara \* , Dimitrios Giomelakis , Constantinos Nicolaou , Maria Matsiola and Rigas Kotsakis**

Faculty of Economic and Political Sciences, School of Journalism and Mass Communications, Aristotle University of Thessaloniki, 54636 Thessaloniki, Greece; dgiomela@jour.auth.gr (D.G.); nicolaouc@jour.auth.gr (C.N.); mmat@jour.auth.gr (M.M.); rkotsakis@jour.auth.gr (R.K.)

**\*** Correspondence: apodara@jour.auth.gr; Tel.: +30-231-099-4284

**Abstract:** This paper casts light on cultural heritage storytelling in the context of interactive documentary, a hybrid media genre that employs a full range of multimedia tools to document reality, provide sustainability of the production and successful engagement of the audience. The main research hypotheses are enclosed in the statements: (a) the interactive documentary is considered a valuable tool for the sustainability of cultural heritage and (b) digital approaches to documentary storytelling can provide a sustainable form of viewing over the years. Using the Greek interactive documentary (i-doc) NEW LIFE (2013) as a case study, the users' engagement is evaluated by analyzing items from a seven-year database of web metrics. Specifically, we explore the ways in which the interactive documentary users engage with the storytelling, the depth to which they were involved along with the most popular sections/traffic sources and, finally, the differences between the first launch period and the latest years. We concluded that interactivity affordances of this genre enhance the social dimension of cultural projects, while the key factors for sustainability are mainly (a) constant promotion with a transmedia approach; (b) data-driven evaluation and reform; and (c) a good story that gathers relevant niches with specific interest in the story.

**Citation:** Podara, A.; Giomelakis, D.; Nicolaou, C.; Matsiola, M.; Kotsakis, R. Digital Storytelling in Cultural Heritage: Audience Engagement in the Interactive Documentary New Life. *Sustainability* **2021**, *13*, 1193. https://doi.org/10.3390/su13031193

Academic Editor: Asterios Bakolas Received: 19 December 2020 Accepted: 20 January 2021 Published: 23 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Keywords:** interactive documentary; cultural heritage; audience engagement; sustainability; digital storytelling; intangible heritage; media users' engagement

#### **1. Introduction**

Digitization has made a profound impact on the way we manage cultural heritage material, both online and on site. Concurrently, audio and video content transfer made possible the production of the most appealing stories to the audience [1–5]. As the technical resources that are available to the storytellers evolve, the forms of the narratives evolve too. Emerging media technologies, such as hypertexts and multimedia resources, are incorporated in the narration, adding new concepts and promoting engagement, thus creating new media cultures. In this digital era of streaming media, audiovisual content is consumed more than ever before [4], while online viewing has not displaced but has rather worked as a complement to traditional media viewing [5]. This fact has brought further potential not only for the producers but also for the audience, setting the field for more personalized content addressed to the fragmented online audience. Furthermore, the use of digital techniques (e.g., through interactive documentary, the application of digital communication techniques, or both) introduces several new opportunities in the cultural heritage field which effectively engage the audience and create an impact.

The position taken in this article is that interactive documentary (i-doc from here on) can be used for sustaining cultural heritage in the new digital era. Documentaries have a long history of playing an important role in cultural heritage documentation and preservation. As researchers admit, every film is a documentary [6], and every documentary is a cultural document [7] that influences and drives cultural response. I-doc, a relatively new genre that converges traditional documentaries and digital media (e.g., social networks, social media and platforms, audiovisual platforms, new media, etc.), can be characterized as one of the most powerful tools to explain reality and present observations of culture and people [8]. The focus is on the relationship between cultural heritage stories and their audience and the potential growth of engagement through interactive storytelling techniques. Their interactivity affordances enhance the social dimension of cultural projects, since the audiences may not just observe or watch but also engage with the story, which means that they can comment, express different points of view or share in social networks [9].

Given that there is too little audience analysis published on this new genre, our research aims to shed light on issues of engagement and sustainability of digital storytelling in cultural heritage through a theoretically and empirically rich discussion. This can be useful for filmmakers, curators and policy makers of cultural heritage who are interested in increasing the audiences' engagement through digital storytelling practices. In this paper, we employ as a case study the i-doc NEW LIFE (2013), which is one of the first interactive documentaries created in Greece. The i-doc presents the "new life" of a group of refugees who left ancient Lampsakos (present name Lapseki) in Asia Minor, Turkey, after the Asia Minor Catastrophe and settled in New Lampsakos, Greece. The story is enriched with oral history testimonies from residents of both villages in Greece and Turkey and with on-field observation and documentation of their intangible cultural practices; previously unseen archive material is also included, contributing to the preservation of the collective historical memory of the community. In that frame, we examine the engagement of the audiences/users in an interactive documentary in the field of cultural heritage through data derived from the website's Google Analytics (GA) database, drawing upon a large 80-month dataset. To contribute to the scientific discourse on the subject, the following statements enclose the main hypotheses that were initially analyzed by the researchers: (a) the interactive documentary is considered a valuable tool for the sustainability of cultural heritage (H1); and (b) according to the literature review, digital, interactive approaches to documentary storytelling can provide a sustainable form of viewing over the years, engaging online viewers of the streaming era (H2). These hypotheses were used as the basis to further develop the objective of this study. The specific research objectives were designed to serve two goals: (a) to measure the online audiences' engagement with interactive storytelling, to search for patterns of media usage and to reflect on its use in the cultural heritage area (RO1); and (b) to examine in practice the existing knowledge and aspirations about interactive documentaries, employing empirical ways of understanding provided by web analytics (e.g., GA) (RO2).

In the following sections, beginning with the Literature Review (Section 2), the significance of digital storytelling in cultural heritage is acknowledged, as it is considered a valuable tool in creating repositories of cultural material; in audiovisual form specifically, it may provide vivid representations, thus sustaining customs. In the same section, the interactive documentary as a new genre for a new-styled audience is presented both conceptually and technically. Web analytics and audience metrics complete this part of the paper, defining the terms employed and underlining their importance as tools that provide useful insights into online presence. Subsequently, the research methodology is justified in the Materials and Methods section (Section 3), where the quantitative analysis employed in the research, with data derived through web analytics from the website's GA dashboard, is thoroughly described. In addition, the Greek i-doc NEW LIFE project, which is used as the case study, is introduced step by step, following all the procedures from production to dissemination. Afterwards, the findings of the research are presented and explained in the Results section (Section 4). Next, the findings are discussed further (Section 5), while finally the authors' perceptions on the research are stated, the limitations are noted, and additional studies are suggested (Section 6).

#### **2. Literature Review**

#### *2.1. Digital Storytelling in Cultural Heritage*

Cultural heritage's significance in sustaining the evolution of humanity and maintaining social cohesion is indisputable. Digital storytelling as a form of digital narrative may encompass many alternative ways and tools (i.e., interactive stories, multimedia presentations, web-based games, etc.) of presenting a story, and nowadays it is employed to attract and engage audiences in many areas in a revolutionary way [10], cultural heritage preservation included [11]. Innovative customizable web-based authoring tools and mobile applications are deployed in that field [12,13]. In cases of tangible cultural heritage where museums or archeological sites are to be presented, specialists from many disciplines, such as archeologists, museologists, creative designers (e.g., art directors, sound engineers, etc.) and communicators (e.g., advertisers, public relations professionals, etc.), must co-operate to produce the best narrative that the audiences will comprehend [13]. In some cases, interactive elements such as quizzes may be utilized to offer an enhanced experience and further engagement, or alternatively possible connections may serve as branching points between the parts that will lead the users meaningfully to the next path [12].

Digital technology is a simple, though valuable, tool to archive and create repositories of cultural elements. Especially in the form of audiovisual narrative, it attempts to vividly perpetuate the customs, representations and artifacts as well as leave testimonies from one generation to another, thus strengthening the community ties; while through the interactive mode of an i-doc it aims at offering rich-media experiences. The cultural diversity of the world, as expressed through traditions and events is in peril by the rapid pace of contemporary life, technological and economic development and globalization [14]. As Lenzerini [15] (p. 103) argues "the cultural variety of humanity is progressively and dangerously tending towards uniformity", therefore the need for the large and younger public to experience cultural heritage in a new more engaging way in a modernized context is crucial.

The driving force for that engagement would be the perceived authenticity that the i-doc may present, along with the narrative tools that present the characters through the plot of the story, which is built from basic units. Furthermore, the access provided through interactive mechanisms, with which younger people (e.g., people of Generation Z) are familiar, and a fragmented narrative that suits the attention span of that cohort [4,5,16–20] may prove to be valuable tools in conveying the story in an entertaining way. Young audiences are accustomed to receiving information in pieces and have little patience for watching long videos [21]; therefore, they must be approached through alternative structures that give them the opportunity to choose how deep they want to get into the story [22].

#### *2.2. Interactive Documentaries: A New Genre for a New-Styled Audience*

The i-doc is a relatively new genre that applies to the basic characteristics of the online audience, mentioned above; it involves taking action and making choices in the viewing process, fragmentation and personalization [21,23]. These features emerge from the new media technologies that include hypertext, remediation, modulation, and interactivity and which at the end constitute a combination of cinema and digital technologies [24]. Castells [8] defines interactive documentaries (i-docs from here on) as "interactive online/offline applications, carried out with the intention to represent reality with their own mechanisms, which we will call navigation and interaction modalities, depending on the degree of participation under consideration". Aston and Gaudenzi [25] have employed a broad definition which does not relate interactive documentaries only to interactive platforms but also recognizes the interactivity as a part of the production process. According to them "any project that starts with the intention to document the real and that uses digital interactive technology to realize this intention can be considered an interactive documentary" [25].

I-docs are stand-alone websites, whose narrative is web-customized. They are not to be confused with cross-media project platforms, where the website serves as an accompanying database and not as the first viewing platform. It is difficult to establish specific categorization since the core of this hybrid genre is a complex and constantly changing way of expression [26]. The categorization based on four different interactivity modes to present the modern landscape of i-docs is considered in this study [25]. The first is the conversational mode, to which docu-games belong, where the viewer has the illusion of being in conversation with the computer. Another is the experiential mode, which creates experiences blurring the virtual and the physical space (i.e., 360° Virtual Reality (VR) or 3D documentaries). Nowadays, one of the most used genres of i-doc employs a hypertext mode of interactivity, where the i-doc works similarly to the interactivity logic of DVDs and Blu-ray discs (i.e., "click here, go there"). In this kind of documentary, the viewer moves in any exploratory way in a closed video archive. Another widely used contemporary genre applies a participative mode, where the audience collaborates with the producers to create an open database, which is constantly evolving. The users are asked to contribute with footage, answer questions or provide help, such as translation. This is the kind of documentary that in the early days of the genre was named "database documentary" [27,28].

In recent years, shifts have occurred in the characteristics of the new audience. Technological affordances of new media have highlighted and created new modes of viewing. There has been a switch from a passive audience to active media users, who are looking for content that they are interested in, creating many fragmented and autonomous niches [29]. Modern audience studies employ the term "media user", an individualistic term, as opposed to the collective term "audience" [30]. The change does not only involve the end-user; important changes to the distribution, exhibition and promotion of media content have arisen as well [17,31,32]. Online users discover their content through platform algorithms [33] or through their personal bubble in social networks [5,16,18]. Moreover, audience activity cannot be questioned since it is now exposed [34] through web analytics. In the new era, the screen is not only considered as a medium of projecting reality, as "another window to the world", but also serves as an interactive surface, in which the viewer is also a user (aka "viewser"). The convergent new media do not project a "singular text" on the screens [35] and, furthermore, at the other end of the communicational channel there is not a homogeneous audience but different users who have different engagement experiences with the projected content.

Nowadays, i-docs exploit the new digital technologies and present stories that document reality either in linear or nonlinear participatory ways [21]. Therefore, the result is a collaborative project, where the outcome arises from the contribution of the creator, the medium and the user. The kind of interactivity is varied for each i-doc, depending on the platform affordances, the templates' designs or the producers' choices. However, the significant element that every i-doc requires is the physical action from viewers, asking them to have an energetic role in order to "watch" the story [36]. The concept of choice is a fundamental key for audience engagement. As far as i-doc is concerned, selections are given to the audience, providing them the opportunity to be in control of the narration which is no longer guided by the producers. Audiences can choose what they want to watch, along with the ability to determine the time and place to do it [37].

The development of an engaged audience which will not get distracted and abandon the viewing is also one of the biggest challenges for i-docs. The proliferation of content on the Web has led to an easily distracted audience, and producers fight for its attention. Long-form stories lose popularity compared to shorter videos which deliver bits of information that can be rapidly consumed. Especially regarding younger generations, this can prove to be a real problem; Generation Z has only 8 s of attention span and as years go by it gets even shorter [38]. Nevertheless, the i-doc demands the user's attention and engagement in order for the story to be unfolded. The viewer cannot be passive; otherwise, the documentary will never be watched. The audience can drop out at every click since active decisions should be taken at every step. Traditional documentaries do not present such dilemmas since, once the decision to enter the cinema hall to watch a linear documentary is made, there is rarely abandonment before the end [21].

Wilson [39] defines interactivity as the ability to act in order to influence the flow of events or to modify their form. The traditional documentary genre may have a low degree of interactivity [39] but the sense of narrative is very strong [40]. On the other hand, i-docs do not present such great narrative efficiency, but they involve the viewer differently in the story: they make him/her "work" in order for the story to be revealed. Depending on the perspective, interactivity may imply diverse actions or features. Regarding the narration, interactivity could be connected to co-authorship and non-linearity, while regarding the audience, engagement and involvement in discussions could be more appropriate. Finally, through the perspective of "users", interactivity may imply new paths for accessing the content [41].

Although traditional documentaries offer a few interactive components as well (e.g., DVD choices), in the case of interactive documentaries there are more powerful features [8]. However, how much interactivity is needed? Some critical views argue that as the interactivity gets more enhanced, the producer reduces its power to convince, which increases the risk that the viewer will not be engaged by the storyline [42]. Almeida and Alvelos [43] state that the coherence of the narrative is more important than its interactivity. Indeed, the dynamic producer–audience relation is not always clear and its features may not be exploited. This echoes the young audience's reluctance to use interactive features in data-driven, informative stories [4,23]. The creation of a new author–audience relationship through the fragmentation of the story is not always clear, thus the features of i-doc are in danger of not being fully understood by the user [44].

#### *2.3. Web Analytics and Audience Metrics*

The evolution of internet technologies over the past decades has considerably changed the media landscape not only by providing new ways of audience interaction, but also by providing new ways of measuring the audience. Besides, the measurability of digital media has been deemed as one of its greatest benefits in comparison to old media. As online channels of information have become extremely significant nowadays, the interest in monitoring users' website usage and online activities has intensified [45,46]. In this context, web analytics can help website owners understand how their audience find, consume and interact with online information by providing web metrics that refer to any quantitative and aggregated measure of preferences, passive viewing or consumption of content by internet users [47,48].

In general, the term web analytics can be defined as "the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing Web usage" [49], according to the Web Analytics Association, known since 2012 as the Digital Analytics Association. These measurement tools can provide information regarding both audience exposure and audience behavior, such as the number of page views and the most popular pages, new or daily visitors and where they come from, the visit depth and what other links they are clicking, the geographic distribution of visitors, the average visit time on site and plenty of other data. The above can be compared over time, providing useful insights for website improvements or decisions about campaign effectiveness [50,51].
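A few of the metrics listed above can be derived from raw visit records, as the following sketch illustrates. The record layout (a list of visited pages and a duration in seconds per visit) and the function name `summarize_visits` are hypothetical; this is neither the export format nor the API of GA or of any other analytics tool.

```python
from collections import Counter


def summarize_visits(visits):
    """Aggregate simple audience metrics from a list of visit records.

    Each visit is a dict with (hypothetical) keys:
      "pages"      - list of page paths viewed during the visit
      "duration_s" - time spent on the site in seconds
    """
    if not visits:
        raise ValueError("at least one visit is required")
    page_counts = Counter(page for visit in visits for page in visit["pages"])
    page_views = sum(page_counts.values())
    return {
        "page_views": page_views,
        # Most viewed page, or None if no pages were recorded at all.
        "most_popular_page": page_counts.most_common(1)[0][0] if page_counts else None,
        # Visit depth: average number of pages viewed per visit.
        "avg_visit_depth": page_views / len(visits),
        "avg_visit_time_s": sum(v["duration_s"] for v in visits) / len(visits),
    }


# Two hypothetical visits to an i-doc website.
visits = [
    {"pages": ["/home", "/story"], "duration_s": 120.0},
    {"pages": ["/home"], "duration_s": 60.0},
]
print(summarize_visits(visits))
```

Computing the same summary for different periods (e.g., the launch period versus later years) gives the kind of over-time comparison mentioned above.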

Identifying who uses a website and how it is used has been of interest since 1990 when Tim Berners-Lee developed the first web browser [52] and similarly, the use of web analytics dates back to the 1990s when the first tools were developed [53]. Web analytics tools can be differentiated in many ways. Firstly, they can be classified based on their data collection method, either page tagging (e.g., GA) or transaction-web server log file analysis (e.g., AWStats). Secondly, they can be grouped considering the access of their functions. Specifically, such tools can be provided as software as a service (SaaS) through a cloud service provider (CSP) or as software installed in-house. Other ways refer to web site access devices (mobile or non-mobile web analytics) as well as the time lag between data collection and the availability of services (i.e., real-time or not) [54]. Finally, onsite web analytic tools measure the actual visitor traffic arriving on a website, the onsite journey (engagements and interactions) and the website's performance in general. In contrast, offsite tools can measure the size of a potential website audience, the visibility (share of voice) and the buzz (i.e., comments, sentiment) that is happening on the Internet [51].

Internet technologies together with the field of web analytics are constantly evolving and the range of tools and services available on the market is extensive and diverse [47,48,51,53]. Thus, many organizations employ multiple tools in order to gain useful insights for their online presence. Undeniably, GA is considered globally as the most popular web analytics package and a leading tool for sales, marketing and advertising reasons [50,55,56]. This Google service provides many built-in reports, charts and tables and it is attractive to users because of its free availability, tremendous features and ease of use [56,57]. It is preferred by many professionals or different sectors such as e-commerce, e-tourism, libraries, news industry and media websites [46,58–60] in order to measure a web site's performance, analyze user behaviors and gather technical information. Today, the evolution of the audience should be taken into account seriously in audience measurement and understanding why something is happening on a website constitutes a valuable management skill [50]. The use of web analytics enables owners to see a website from the perspective of its users [61]. In the case of cultural heritage, web analytics—what Manovich would define as cultural analytics—allow us to think of contemporary culture in new ways and helps us to question concepts and methods for studying culture that we take for granted [62].

#### **3. Materials and Methods**

This research lies within the field of quantitative research, since it applies a quantitative analysis using data derived through web analytics from the website's GA dashboard. Based on the objectives of the study, as presented in the introduction section, the research questions are:

RQ1: How did the users of the i-doc NEW LIFE engage with the storytelling according to web metrics and quantitative evaluation?

RQ2: To what depth did the viewers engage with the i-doc and which were the most popular sections/traffic sources?

RQ3: What are the differences between the first period that the i-doc was launched and latest years?

Web analytics are considered the cornerstone of audience evaluation in the new era. Monitoring dashboards is a new way of understanding audience behavior and exploring new viewing and usage practices. In most cases, GA are used to estimate the amount of "exposure" a project received and the extent to which it captured the attention of audiences [63]. However, in general, there is a lack of available empirical data since production companies rarely reveal these data [41]. The raw data were gathered from the behavior, audience and acquisition reports of GA over the course of seven years (from 4 April 2013 to 5 December 2019), before performing our own analysis.

The selection of the metrics was based on the literature [49,64–66] and the suitability of the metric to achieve the purpose of the study. More specifically, to understand the way the audience has engaged with the site we used (a) five GA metrics (quantitative variables): (i) page views (the number of pages viewed), (ii) number of sessions (visits), (iii) sessions' duration (sec), (iv) pages per session and (v) bounce rate; as well as (b) four dimensions (categorical variables): (i) traffic sources, (ii) social networks, (iii) landing pages and (iv) page depth (Table 1). According to Google, a dimension is a descriptive attribute or characteristic of an object that can be given different values [67]. Descriptive statistics were employed for the analysis of data.


**Table 1.** Dimensions and web metrics under study.

Our data are unsampled (GA employs sampling as a calculation method only for big sites with more than 500,000 sessions), based on 100% of the sessions. We depended on standard reports of GA with aggregated data and we avoided user-level and event-level data associated with cookies, user-identifiers (e.g., User-ID) and advertising identifiers (e.g., DoubleClick cookies, Android's Advertising ID, Apple's Identifier for Advertisers) because the validity of their processing was a controversial and sensitive subject before General Data Protection Regulation (GDPR) implementation in 2018.
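To make the five metrics listed above concrete, the following minimal sketch derives them from session-level records. It is an illustration only: the study relied on GA's aggregated standard reports, not on session-level data, and the `Session` class and `summarize` function are hypothetical names introduced here. A bounce is approximated as a session with exactly one page view, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical session record (not a GA export format)."""
    pages_viewed: int    # page views recorded in the session
    duration_sec: float  # session duration in seconds


def summarize(sessions):
    """Return the five metrics used in the study for a list of sessions."""
    n = len(sessions)
    if n == 0:
        raise ValueError("at least one session is required")
    page_views = sum(s.pages_viewed for s in sessions)
    # Assumption: a bounce is a session with exactly one page view.
    bounces = sum(1 for s in sessions if s.pages_viewed == 1)
    return {
        "sessions": n,
        "page_views": page_views,
        "pages_per_session": page_views / n,
        "avg_session_duration_sec": sum(s.duration_sec for s in sessions) / n,
        "bounce_rate_pct": 100 * bounces / n,
    }
```

Calling `summarize` on a list of `Session` objects returns a dictionary keyed by metric name, so the same function could be reused for any subset of sessions (for example, per traffic source or per period).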

#### *The Case Study of GREEK i-doc NEW LIFE*

The NEW LIFE i-doc at Lampsakos.Com was launched in 2013 and was followed, later, by a narrative feature-length version on the same topic. Both the interactive and the traditional linear versions communicate ethnography and recreate immaterial elements, such as historical events, social values, traditions, ceremonies and living expressions; they are therefore associated with intangible cultural heritage, which is particularly difficult to preserve [68,69]. Through documenting, an attempt is made to provide a sense of continuity of the NEW LIFE that was established in another place while protecting and retaining the identity of the society that moved, and thereby to salvage the traces of the past.

This project employs hypertext and participative modes of interactivity [25] to gain user's engagement with the story. It employs the most used and less interactive form which does not present a story direction, giving viewers the freedom of choosing their own narrative path. Furthermore, 40 short videos through hyperlinks from YouTube (which are unlisted) of maximum duration 5 min, which stand alone as short stories, are included within four larger thematics (galleries). In the comment section of each video the users are able to express their thoughts. Although the story is developed within a pre-arranged set of short videos, viewers are called to discover the story constructing their own storytelling roads. Without specific end or start the viewer holds the choice to explore the videos in their own way. Users may click on keywords and locate relevant content according to what has triggered their interest, such as more pieces of interviews or more videos on the same topic.

The four galleries are organized in chronological order following the milestones of the history of this refugee community:


The interactive genre was chosen by the producers as an experimental effort to reach younger audiences. Furthermore, they dealt with organizing and distributing an extensive amount of historical data about the community (photo archives, media archives, testimonials, books, interviews, data, etc.), all of which was previously unseen or had never been available on the Web before. The database-documentary typology [27] was preferred in order to contribute to the collective historical memory without expressing an opinion, letting the viewer form their own interpretation. Of course, this could not be totally avoided, since in every documentary or i-doc the producers' opinions are reflected in their choices, which are present via the editing and structuring of the documentary [21].

The project is built in a way that more than one visit should be included to cover the NEW LIFE story which delves into more than one thematic. Moreover, the 40 videos (as mentioned above) are in total 5 h long and it would have been very difficult for a user to binge watch them all in one visit. Audience engagement is encouraged in several ways: either starting a conversation and expressing a point of view for every issue or contributing by publishing the viewers' own raw footage or archive material about the history of the community. They can also share each video on their social networks, social media and platforms encouraging discussion with their bubble. The producers asked the viewers to contribute to the story as well, posting their own content, user-generated, or archived data.

One of the major challenges of this i-doc was to be functional for the users. The interactive documentary was an unknown term in Greece at that period, so the structure and presentation of the story were designed to be as familiar as possible to the online user. The selected content management system (CMS) was WordPress, a free and open-source software. The template of the website was carefully constructed to be user-friendly and quite simple. Particular attention was paid to a full-screen template, so that the viewers would have the familiar "lean back" experience [21], like watching it on their TV screen. As mentioned above, videos were uploaded to YouTube as unlisted and then embedded on the website to ensure quick streaming and a better user experience. Furthermore, YouTube was chosen because it provided the opportunity to enable settings such as automatic start of the video or recommendations for relevant content within a gallery.

Search engine optimization (SEO) techniques, which are techniques designed to achieve better positioning in the organic (i.e., unpaid) results of search engines [70], were applied to the website so as to be more easily discovered by users. Besides, documentary is an informative genre and according to relevant studies a large percentage of readers get informed through search engines [71–73]. Additionally, the website was added to free directories and social media accounts were created following the then suggested techniques [74–76]; both were important for the visibility of the site on the Web as part of the promotion [74].

It is worth highlighting that the i-doc itself was created in a participatory mode, since a crowd-funding campaign was employed to cover the expenses of production. More than 100 people responded, giving money so that their cultural heritage would be documented. This changed the typical producer–viewer roles from the beginning, since the funders, as co-authors of the documentary, had the privilege to be the first to acquire visibility. A database consisting of 1000 fans had already been formed before the launch of the i-doc. That campaign helped the producers to gain visibility, since there were already 1100 members in the i-doc Facebook group (the membership of the group has decreased over the years, since the account has not been very active), while the promotional trailer for the crowd-funding campaign was watched by 2000 people during the first week. Even before the documentary was launched, a large audience had already been established and was waiting for it.

Additionally, to support the i-doc, an outreach campaign was undertaken through online media posts (e.g., through various relevant social media pages and groups), with materials such as digital publications, brochures, leaflets and posters, as well as interviews in online media; the campaign was mainly focused on Greece. The producers also applied campaign methods used for traditional documentaries to make the project known to generations that do not have easy access to the Internet. For example, presentations of the project took place at relevant events, like the Refugee Memorial Event in Nea Lampsakos (14 September 2012), where audiovisual material was screened. Furthermore, leaflets and posters were distributed in print form and press conferences were held. As a result, following every dissemination event, the number of Facebook group members (mentioned above) as well as the views of the website increased gradually.

The linear, feature-length version of the i-doc which was created afterwards also further assisted the reach of the website. After every screening, either on documentary festivals or on TV, visitors reach was increased. The traditional documentary bearing the same name NEW LIFE had a successful route since it has been screened in four festivals, documentary and cultural ones (October 2013 and February 2014, December 2013 and July 2017) and it was aired for two years (2015 and 2016) by the first documentary channel in Greece, the Cosmote History Channel, of the leading Cosmote TV streaming platform, as well. Traditional documentaries' performances are easily evaluated, typically based on screenings, festivals awards, number of viewers and TV ratings. This does not apply to i-docs, which are always online, they can be watched during several visits and their impact is valued following a different logic [21]. A specific story may have greater impact on small, niche audiences.

#### **4. Results**

In order to address the research questions outlined above, the current research focuses on web analytics data that were drawn between April 2013 and December 2019 from the i-doc website. This period covers the launch of the i-doc NEW LIFE and the 80 months that followed. At this point, we should mention that the i-doc was first uploaded in a beta version in April 2013 to test its functionality, while at the end of July 2013 it made its debut to the media and the public.

Regarding the RQ1 proposed in this study, data revealed that the i-doc had, in total, 12,115 sessions and 40,465 page views in an 80-month period. The bounce rate was 61.03% and the average session duration 03.29 min. On average 3.34 pages per session were viewed. As expected, the peak of the viewership occurred during the first year after the launch of the i-doc with the highest score being in August 2013 with 948 sessions and 5444 page views in one month (Figure 1).

**Figure 1.** Audience engagement for 80 months period (April 2013 to December 2019).

Since the release of the i-doc more people were gradually using the website. The best performance was noticed from July to November of 2013. During these five months almost 1/5 of the total viewership of the seven years (14,541 page views and 2094 sessions) had occurred. Furthermore, during these months, there were sessions with high interest in the content since viewers watched almost five pages per session. Website usage wound down after the first year. As shown in Figure 1, certain peaks were noticed around the dates of the linear documentary screenings. It is worth noticing that the engagement of the audience with the i-doc has remained the same ever since. The page views/session has been stabilized around 3 min, even though the sessions gradually reduced.

In our case, the page views and session metrics provided data on how effective the site content was at keeping viewers on the site and engaged. Figure 2 shows that the longer the session duration, the higher the number of page views. We did not take into consideration sessions of 0–10 s, since most of them came from spam traffic sources. The most common pattern of viewing was sessions that lasted 181–600 s each (3–10 min), followed by sessions that lasted 601–1800 s (10–30 min). In 381 sessions, the viewer's attention was kept for more than 1800 s (30 min).

**Figure 2.** Page views per session.
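The grouping of sessions by duration described above can be sketched as follows. The bucket boundaries 0–10, 181–600, 601–1800 and 1801+ seconds come from the text; the intermediate boundaries (11–30, 31–60, 61–180) are an assumption, and `duration_bucket` and `duration_histogram` are hypothetical helper names. Durations are assumed to be whole seconds.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bounds (in seconds) of every bucket except the last one.
# Assumption: the 11-30, 31-60 and 61-180 buckets are not named in the text.
UPPER_BOUNDS = [10, 30, 60, 180, 600, 1800]
LABELS = ["0-10", "11-30", "31-60", "61-180", "181-600", "601-1800", "1801+"]


def duration_bucket(seconds):
    """Map a session duration in whole seconds to its bucket label."""
    return LABELS[bisect_left(UPPER_BOUNDS, seconds)]


def duration_histogram(durations, drop_shortest=True):
    """Count sessions per bucket, optionally ignoring 0-10 s sessions,
    which the study excluded as mostly spam traffic."""
    counts = Counter(duration_bucket(d) for d in durations)
    if drop_shortest:
        counts.pop("0-10", None)
    return dict(counts)
```

Using `bisect_left` on the inclusive upper bounds keeps the boundary cases explicit: a 600 s session falls in "181-600", while a 601 s session falls in "601-1800".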

To respond to RQ2, we chose to study GA's default dimensions of landing pages, page depth and traffic sources in tandem with the above web metrics. These descriptive data are very useful for an interactive documentary that encourages participation and updating, because they indicate the type of content the producers should promote more or the section of the story that could be enriched and improved.

According to the data presented in Figure 3, the largest share of traffic came from search engines (e.g., Google and Yahoo!), with organic traffic accounting for about 47 percent overall. Direct traffic was the second largest source (27 percent of incoming sessions), followed by social networks (mainly Facebook) and referral traffic channels (Figure 3). It is worth noting that the first two rankings remained unchanged each year and that organic traffic was consistently the largest driver of visits to the website. These results are to be expected, considering the significant impact of search engines on the distribution and dissemination of online information.

Furthermore, analytics on referral traffic (sessions that came from a link on another site) show that news websites, like lifo.gr and news247.gr where interviews of the producers were posted, are among the eight most popular sources for the i-doc (Figure 3). Viewers that came from these sources have developed a strong engagement pattern. Online readers who visited the website after certain news posts triggered their interest, had long and engaged sessions. Data has also shown that sessions reached from referral source had the minimum bounce rate of all sources (33.33% compared to the highest, 88%, from direct sources). This complies with prior studies which have shown that relevant content is a key factor for a web documentary with high engagement [23]. The story is the main reason for a greater commitment, according to i-doc users [44].

**Figure 3.** Traffic sources.
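The per-source comparison discussed above (share of sessions and bounce rate for each traffic channel) can be illustrated with a short sketch. The input format and the `channel_report` function are hypothetical, and a bounce is again approximated as a single-page session; the study itself read these values from GA's acquisition reports.

```python
from collections import defaultdict


def channel_report(sessions):
    """Summarize sessions per traffic channel.

    sessions: iterable of (channel, pages_viewed) pairs, a hypothetical
    input format. Returns, per channel, its share of all sessions and
    its bounce rate, both in percent.
    """
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for channel, pages_viewed in sessions:
        totals[channel] += 1
        # Assumption: a bounce is a session with exactly one page view.
        if pages_viewed == 1:
            bounces[channel] += 1

    all_sessions = sum(totals.values())
    return {
        channel: {
            "share_pct": 100 * count / all_sessions,
            "bounce_rate_pct": 100 * bounces[channel] / count,
        }
        for channel, count in totals.items()
    }
```

Sorting the resulting dictionary by `bounce_rate_pct` would surface the kind of contrast reported here, where referral sessions bounce far less often than direct ones.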

Moreover, web analytics data indicated that, out of the 12,115 sessions, a high percentage was generated by social traffic (Table 2). Within the explored date range, the main source of social network traffic was, by far, Facebook (94.32%). This is to be expected, because Facebook and, secondarily, Twitter were the social networks that most i-docs launched in these years employed to get in touch with their audiences [77]. Facebook, in particular, was used even before the launch of the documentary. Facebook sessions show 3.5 pages viewed on average, which is comparable to the average value for the website (3.34 pages/session). YouTube visitors are few but stand out for their remarkable engagement level: a long average session duration and more than 5 pages per session.


**Table 2.** Social network referrals.

As the data revealed, what kept viewers' attention the longest was the first page of the website, where an introductory video was shown (Table 3). More specifically, most visitors landed on the homepage of the documentary (5898 sessions). These were the most highly engaged sessions, which kept the user's attention for 280.24 s on average. Furthermore, the second landing page was the gallery "Dive in the Past" (voutia-sto-parelthon) with notably fewer sessions (395) and the third one was "New Life in New Lampsakos". All entrance pages were related to the Greek village Lampsakos, which makes sense since the language used in the i-doc was Greek. It is worth noting that the gallery about the Turkish village "New Life in Lapseki" (nea-zoi-sto-lapseki), which contains testimonials in Turkish language with Greek subtitles, scored extremely low in bounce rate (only 40%) while it had good engagement rates as far as pages/session and session duration are concerned.

**Table 3.** Landing pages.

Apart from the three galleries, the list of the top ten landing pages includes seven videos, the ones that have been frequently posted in the i-doc's Facebook group. The bounce rate in videos was higher than the bounce rate in galleries.

Subsequently, an analysis on the most visited pages of the i-doc was performed. Web analytics exposed the ten most browsed galleries which were all Greek culture related (Table 4).



Likewise, most of the top ten landing pages were included in the top ten listed popular pages (Table 4). The bounce rate was high, even after the introductory video. However, an encouraging outcome was revealed from further analysis; page view metrics demonstrated that the i-doc NEW LIFE had 40,465 page views, of which 29,565 were unique. This means that almost 1/4 of the total page views came from returning sessions. These data are in agreement with a relevant report by StoryCode, one of the few related studies that are available. This open-source community has studied the web analytics of five relevant web projects, which revealed that on average they had 75% new vs. 25% returning visits [78].
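The "almost one quarter" figure can be checked with simple arithmetic. The sketch below uses the total of 40,465 page views reported for the 80-month period and the 29,565 unique page views; `repeat_view_share` is a hypothetical helper name, and treating all non-unique page views as repeat views follows the reading given in the text.

```python
def repeat_view_share(total_page_views, unique_page_views):
    """Fraction of page views that were not unique."""
    if total_page_views <= 0:
        raise ValueError("total_page_views must be positive")
    if not 0 <= unique_page_views <= total_page_views:
        raise ValueError("unique_page_views must lie between 0 and the total")
    return (total_page_views - unique_page_views) / total_page_views


# Figures taken from the text: 40,465 total and 29,565 unique page views.
share = repeat_view_share(40_465, 29_565)
```

With these inputs the share comes out at roughly 0.27, which is in line with the 25% returning visits reported by StoryCode.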

Page depth reveals a different dimension of engagement. As we see in Figure 4, most of the sessions had only one level of page depth, which means that the users/viewers watched one video and then left the site. This category also accounts for the largest number of page views. Furthermore, the 320 most engaged sessions included more than 20 pages per session. Further analysis indicated that 260 of these 320 sessions occurred in the first 40 months, which led us to RQ3.

**Figure 4.** Page depth.

To answer RQ3, we partitioned the data into two periods. The first one starts with the launch of the i-doc on 5 April 2013 and ends on 4 August 2016. This was the 40-month period in which most of the promotional events took place and the traditional linear documentary followed a parallel route, giving a boost to the website. After August 2016, the linear documentary was no longer screened on the Cosmote TV streaming platform. The second 40-month period, which starts on 5 August 2016 and lasts until 5 December 2019, covers the "quiet" period in the documentary's life, when all promotional activities had ceased but the i-doc was still online. For this purpose, four GA metrics were used: (a) sessions; (b) page views; (c) session duration and (d) bounce rate (Table 5).

**Table 5.** Comparison between two periods of usage.


Almost 2/3 of the total sessions in the i-doc took place in the first 40 months. Although the page views were noticeably reduced in the second period, the users' behavior remained the same, as we can see in the following figures (Figures 5 and 6).

Data in Figure 5 reveal that sessions in the first period were higher than the second but what demonstrated high decrease is the number of pages that were viewed in every session. On the other hand, in Figure 6, it seems that the users who were mostly interested in the content of the i-doc visited the website when it was launched.

**Figure 5.** Comparison between sessions and page views.

**Figure 6.** Average session duration in two periods.
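The two-period comparison can be sketched as a split on the date that opens the second period (5 August 2016), followed by the same four metrics computed for each half. The tuple layout and the `compare_periods` function are hypothetical; the study obtained these values directly from GA reports for the two date ranges.

```python
from datetime import date

# First day of the second 40-month period, as stated in the text.
SECOND_PERIOD_START = date(2016, 8, 5)


def compare_periods(sessions):
    """Compute sessions, page views, average duration and bounce rate
    separately for the two periods.

    sessions: iterable of (day, pages_viewed, duration_sec, bounced)
    tuples, a hypothetical input format.
    """
    groups = {"first": [], "second": []}
    for day, pages, duration, bounced in sessions:
        key = "first" if day < SECOND_PERIOD_START else "second"
        groups[key].append((pages, duration, bounced))

    report = {}
    for key, rows in groups.items():
        n = len(rows)
        if n == 0:
            report[key] = None  # no sessions recorded in this period
            continue
        report[key] = {
            "sessions": n,
            "page_views": sum(p for p, _, _ in rows),
            "avg_session_duration_sec": sum(d for _, d, _ in rows) / n,
            "bounce_rate_pct": 100 * sum(1 for _, _, b in rows if b) / n,
        }
    return report
```

Keeping the cutoff in a single constant makes it easy to repeat the comparison with a different split, for example the first five months against the rest.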

#### *Indicative Results*

In 2016, Google had to change the way GA operated. Their main goal was to start tracking the same user across different devices (e.g., laptop, smartphone, tablet, etc). Thus, during the second period that was investigated, we had the opportunity to examine two more audience-related variables: (a) precise data for new (first time users) vs. returning users and (b) devices the users preferred. In short, users' data demonstrated that the desktop is by far the most preferred device for viewing (65.42%), followed by mobile phone (30.02%) and tablet (4.56%). This complies with data of previous studies that documentary as an informative genre favors this type of viewing—not on the move, but more often at home [5,18,23]. Furthermore, the i-doc lacked responsiveness to mobile phones which is another reason for these results. Additionally, it was reported that the session duration was doubled for returning visitors compared to new ones (about 110 s for new visitors and 240 s for returning ones). The percentage of those who return to the site is high: an indication that the advantage of web-docs being available for viewing whenever the user wants is something that the viewers make use of.

#### **5. Discussion**

This paper casts light on audience behavior in an i-doc and on the role that this new genre can play in using digitized cultural heritage material. The project, which was launched on the Internet seven years ago, provided a unique opportunity to evaluate the audience's behavior from 2013 onwards and to look for patterns in media usage.

Regarding the engagement of the viewers with the storytelling (RQ1), our analysis demonstrated in total 12,115 sessions and 40,465 page views in an 80-month period (Figure 1). The bounce rate was 61.03% and the average session duration was 03.29 min. The most engaged sessions lasted more than 30 min whereas the majority of the sessions lasted between 3 and 10 min. On average, 3.34 pages per session were viewed. These data are similar to those reported in analyses of other i-docs, which is quite satisfying for a local-level production. Although audience data about i-docs are rarely published by production companies, we have collected a few relevant studies that enabled us to get a broader, comparative point of view. Data retrieved from audience research on the Slovenian interactive documentary iOtok [39] showed that the bounce rate for all pages was 53%, with an average session duration of 3 min 47 s and an average of 1.9 pages viewed per session. Furthermore, according to a statement by a former executive of Canadian media fund (Anaphora), which supports the biggest interactive documentary production worldwide from the National Film Board (NFB) of Canada, average session lengths are between 2–3 min, whereas the most successful web docs keep the viewer online for 10 min. Indeed, the web analytics from the multi-award-winning interactive documentary "Out of my Window" (also awarded a Digital Emmy in 2011) revealed that most sessions lasted 10–12 min. If the users' attention was kept for more than 12 min, then these sessions could reach up to 45 min in duration [79]. Another documentary of ARTE (https://www.arte.tv/en) with global success, Gaza Sderot (2008), had an average session duration of about 8 min [79].

Using descriptive variables (dimensions) in tandem with web metrics, we explored to what depth the viewers engaged with the i-doc and which were the most popular sections/traffic sources (RQ2) for the given period.
Most of the sessions displayed only one level of page depth, which means that they saw one video and then left the site. The main landing page—the entrance to the website—was the homepage, followed by entrance pages all related to Greek village's history. Of course, this was anticipated since the producers' target groups were mainly Greek people and the i-doc has not been translated in other languages. What triggered our attention was a gallery that contains testimonials in Turkish language with Greek subtitles that had notably low bounce rate (only 40%) and good engagement rates as far as pages/session and session duration are concerned. These data pointed out that given the high interest, this section of the story could have been enriched and also the i-doc could have expanded its target-group, translating all video stories in the Turkish language. Apparently, the most important factor in determining the kind of storytelling people engage with is the way this triggers their interest. This is in accordance with previous researches which note that the type of storytelling is not as decisive as the story itself [4,16,21,23,44]. Whether it is enhanced with digital techniques (e.g., through Search Engine Optimization SEO, social media, social networks, etc. [20,74]), or not, a good story is the key to the success of each documentary (traditional or not). Its content and relevance to the viewer's interests emerges as a very significant viewing factor, which is also confirmed by our further analysis on traffic sources. It was noticed that sessions that occurred from media posts referrals were long and engaged and had the lowest bounce rate of all sources (33.33% compared to the highest, 88%, from direct sources). Online readers, who visited the website because certain posts have triggered their interest, had long and engaged sessions, as well.

Sessions that were initiated from the homepage of the documentary were the most highly engaged sessions that kept the user's attention almost a minute longer than the total average. However, what is interesting here is that the bounce rate was also high. This may indicate the weakness/inability of this i-doc to develop a strong narrative or stimulate the viewers for more page views. Users were landing on a gallery or video, watching its content and then immediately exited from the website, without browsing further for more content. Indeed, the interactive capacities of an i-doc may provide multiple pathways through the story, but risks may be encountered such as the difficulty in understanding a non-standard navigation system, the possibility of losing attention during the navigation or the hazard of no return after visiting an external link [44]. The presence of an explaining button or a recommendation system that would give direction through a narrative path may have decreased the bounce rate, guiding the viewer to discover more. Furthermore, as far as the content is concerned, the presence of a cliffhanger at the end of the story could also have helped to lure the viewer to continue.

A good example is the i-doc Prison Valley of ARTE, which had global success and reached about 10,000 viewers per month [79]. Several interactive techniques were employed to keep the viewers' attention and reduce the bounce rate [79]. For example, the user could start browsing by logging in through either Facebook or Twitter [74,79]. Later on, the producers sent a motivational message to these accounts, encouraging those who did not stay long to reconnect [79]. Furthermore, at the end a question popped up for the viewers, "What do you fear most?"; 60% of them answered, engaging with the producer [79].

As far as traffic sources are concerned, web analytics revealed that most of the traffic came mainly from search engines. Furthermore, direct traffic was the second largest source. The predominance of search engines is expected. Today, major search engines are deemed as the most common and trusted tool to retrieve information from the Internet, a primary method used for navigation and one of the most common online activities for hundreds of millions of users worldwide [80–83]. Studies have shown that internet traffic depends largely on them and therefore, web search is one of the best sources for every website [82–85]. Given the informative nature of documentary genre, the results are in agreement with prior studies that have shown that a large percentage of readers get informed through search engines [71–73]. As reading habits change because of the dynamic nature of the web, many people around the world prefer to access media content or news through channels such as search engines, news aggregators and social media that take on the info mediation process [86]. In this context, the implementation of SEO practices to this newly created website as part of its promotion and mainly digital marketing strategies seemed to be fruitful in order to boost the website's online presence and drive more traffic to it. The results and traffic analysis affirm the role of SEO in today's digital media landscape and that it can be applied to different types of websites (a documentary website in this case) for various search purposes [87]. It might be thought that documentary producers cannot ignore the power of web search traffic (mainly Google traffic) and should pay close attention to search engines and SEO when promoting content, especially content developed for the Internet. Given that the media industry has entirely entered the digital age, the effective use of SEO seems to be an important element for attracting more viewers online.

In addition, GA has shown that social media has also been an important source of traffic. Our analysis demonstrated that the sessions from Facebook, which accounted for the great majority of social traffic (94.32%), had a viewing duration comparable to the website's average, while the YouTube visitors, although few, had an impressive engagement level (long average session duration and more than 5 pages per session). At this point, we should mention that from 2013 until today (2020), the use of social media and the social networks themselves have changed in form as well as in the interaction they cause [20], while new promotional techniques have emerged due to constant technological changes [74], and social media optimization (SMO) and social media analytics (SMA) (e.g., Facebook analytics, etc.) now exist [20,74]. All of these are now useful tools for both the promotion and the sustainability of cultural heritage from and through the Internet.

To answer RQ3, we explored the differences between the first period after the documentary was launched and the most recent years. As expected, viewership declined as the years passed, while the peak was noticed in the first five months (almost 1/5 of the total viewership of the 7 years). The core of engagement appeared in this period, since the viewers watched almost five pages per session, well above the overall average of 3.34. Moreover, 260 of the 320 sessions that lasted more than 45 min occurred in the first 40 weeks.
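The concentration of long sessions in the early period can be expressed as a simple share. The following is a minimal sketch that uses only the two session counts reported above; the variable and function names are illustrative assumptions, not part of the study's tooling.

```python
# Figures taken from the text; all names are illustrative assumptions.
LONG_SESSIONS_TOTAL = 320   # sessions longer than 45 min over the seven years
LONG_SESSIONS_EARLY = 260   # of those, sessions that fell in the first 40 weeks


def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction, rejecting an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


early_share = share(LONG_SESSIONS_EARLY, LONG_SESSIONS_TOTAL)
print(f"Share of long sessions in the first 40 weeks: {early_share:.2%}")
```

Under these figures, roughly four out of five long sessions fall in the early period, which is consistent with the life-cycle reading of the viewership data.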

Accordingly, it seems that digital projects need a transmedia approach, which employs both conventional and interactive online promotional techniques in order to reach the public. Digital technologies along with the Internet perform a significant role in the easy dissemination of the content to a varied audience which can be engaged and therefore create a participatory culture. However, offline activities, such as live events and analogue initiatives contribute highly to the audience's perception of being emotionally connected as a part of the documentary and experience its immersion [88].

Our findings also shed light on some of the user metrics. GA's indicative results from August 2016 onwards demonstrate that the desktop was the most preferred device for viewing (65.42%). The new data from this period also gave us a clearer view of returning visitors: the percentage of those who returned to the site was high, and those who returned viewed about twice as much content as the new visitors (about 110 s for new visitors and 240 s for returning ones). This was an encouraging finding, which addresses the previous queries about the bounce rate and suggests that the viewing pattern of the database documentary was what the producers were aiming at. It also supports in practice the claim in the literature that database i-docs encourage the viewers' freedom and give them the choice to come back without feeling that they left halfway through the story or that they need to start again from the beginning. In documentaries with pre-fixed narrative paths this cannot be done. Although the average session duration of NEW LIFE was 3.29 min, the overall viewing engagement was in practice much greater, since the returning viewers/users have a long-term viewing pattern, which cannot be compared with that of the traditional documentary.
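The comparison between new and returning visitors amounts to averaging session duration per visitor type. The sketch below shows one way to do this, assuming session-level records of the form (visitor type, duration in seconds) are available. That record format, the function name and the sample values are all assumptions made for illustration; the sample is invented so that its means match the approximate figures reported above and is not the study's data.

```python
from collections import defaultdict


def mean_duration_by_visitor_type(sessions):
    """Average session duration (seconds) per visitor type.

    `sessions` is an iterable of (visitor_type, duration_seconds) pairs;
    this record shape is a hypothetical export format, not GA's own.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for visitor_type, duration in sessions:
        totals[visitor_type] += duration
        counts[visitor_type] += 1
    return {vtype: totals[vtype] / counts[vtype] for vtype in totals}


# Invented sample, chosen only to mirror the reported ~110 s and ~240 s means.
sample = [("new", 100), ("new", 120), ("returning", 230), ("returning", 250)]
means = mean_duration_by_visitor_type(sample)
ratio = means["returning"] / means["new"]
print(means, f"returning/new ratio: {ratio:.2f}")
```

A ratio of about two, as in this illustration, corresponds to the statement that returning visitors viewed roughly twice as much as new ones.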

Based on the relevant literature review, we assumed that the interactive documentary genre is a sustainable digital storytelling form that can be used for documentation of cultural heritage (H1), engaging the new-styled viewers of the streaming era (H2). We conclude that the key factors for sustainable digital storytelling are (a) constant promotion with a transmedia approach; (b) data-driven evaluation and reform and (c) a good story targeted to the right audience.

#### **6. Conclusions**

Back in 2013, when the NEW LIFE film was launched, interactive documentaries were still in their infancy. As the latest buzz in documentary production worldwide, they were heralded as the future, with interactivity being the red line that separated traditional from digital storytelling. In Greece, multiplatform media production was gradually gaining attention and terms such as interactive documentary or transmedia projects were evolving as a new vocabulary. Seven years later, the genre of interactive documentary is still considered in flux, employing different technologies and affordances of web platforms. Moreover, the expansion of mobile devices' use and the proliferation of audiovisual consumption have changed the audience's practices and preferences. Web analytics helped us not only to test our hypotheses but also to question the assumptions about the strengths and weaknesses of interactive documentary. The theoretical and practical implications of this study are discussed in this section.

Our analysis has demonstrated that i-docs engage the online audience in different ways, according to the interactivity modes that they employ. While we agree that the database documentary with hypertextual and participative modes of engagement offers great freedom to the viewer, we acknowledge that this type of database documentary may not be appropriate when a certain story needs to be told. What is emphasized as an advantage in the i-docs literature (the viewers' choice to take unexpected journeys and discover the unfolded story for themselves) is not always welcomed by audiences. The interactivity might be confusing, and we should also bear in mind the viewer's need for more lean-back, passive viewing patterns. In this case, to ensure that the storytelling power will not be lost, producers have to provide a clear narration path, either pre-fixed or just recommended.

The basis of every documentary is a good story, but in the case of i-docs, where there is no clear narrative path to follow and no one is guiding the viewers, whether they continue or not depends entirely on their interest in the content. The sustainability of digital storytelling relies heavily on this parameter. Thus, when developing digital storytelling, it is very important to know exactly which audience we are addressing and where to find it. This gives the documentary filmmaker the opportunity to deal in depth with very specialized topics and approach fragmented audiences that mass media fail to attract. The traces that online users leave may provide elements that content creators can use to acquire information about their audiences. Thus, targeted stories with engaging content may be produced to reach the relevant audience.

Specifically, producers of interactive documentaries should bear in mind that the documentaries they build need promotion as much as any website does. Apart from a new form of storytelling, new forms of promotion should also be created to find and approach the online audience. The 70 min linear documentary helped the promotion of the online documentary and vice versa. Visits to the non-linear documentary rose after viewings of the linear version, meaning that people were searching the web for additional information. An effective approach to engaging more audiences would have been to hold offline events, which can reinforce audience building. A transmedia approach also makes economic sense, since the product may target and finally attract different audiences in the different forms (e.g., text, verbal, iconic or even narrational structure) and platforms in which it is present, thus creating different groups of consumers [89,90].

It is our belief that i-docs do have a certain life cycle, although they are constantly available to viewers on the Web. As the years go by, consumption decreases. It is important, though, that they are constantly supported by offsite techniques (e.g., social media advertising, content updates, etc. [20,74–76,79]) as well as traditional documentaries, in order to continue being watched. So, the assumption that i-docs have no expiration date is disputed. On the other hand, an i-doc which is incorporated in a larger network with a constantly updated user database (i.e., in a relevant cultural organization's website or a news site) could overcome this challenge.

To conclude, the success of this documentary genre cannot be measured with traditional standards. Typically, i-docs have a small audience, but this does not mean they have less impact. Their success lies in the fact that they gather relevant audiences with a specific interest in the story rather than random, mass audiences. Furthermore, the viewing experience is different, since most of the viewers have multiple viewing experiences, which means they are more engaged, in a long-term relationship with the story. I-docs can be a useful tool for organizations experimenting with digital storytelling, since they combine several kinds of materials (photos, videos, audio, graphics, etc.) and their interactivity affordances enhance the social dimension of cultural projects. They are also a great way to diffuse this material to younger audiences, which are more familiar with digital culture [21,23], increasing their engagement with cultural heritage.

As with every study, this one comes with limitations. Despite the large baseline of web metrics available to us, which offered mainly quantitative and some qualitative data about audience behavior, we lacked knowledge about the motivations and gratifications behind audiences' usage. Conclusions made about audiences based on web analytics should be deemed working hypotheses until a more experimental approach can be taken. Moreover, web analytics tools are most helpful when used together with other methods to validate findings and develop interpretations by discovering potentially significant patterns in Internet use.

Extensions of this work could focus on comparing these data with audience analytics of similar i-docs. We would then have the opportunity to generalize the results and set out a group of guidelines, which could be used as a starting point for filmmakers, audience scholars and cultural organizations. Furthermore, more related studies would give us the opportunity to perform comparative analysis with other i-docs. Future studies may enable us to determine not only who sees the i-doc, where and when, but also how they see it. Finally, future extensions of this work could explore web metrics that were not covered by this case study as well as traffic data for longer periods. The change in GA's calculating system after 2016 did not give us the opportunity to analyze in depth the "user" metric throughout the seven years. Nevertheless, through indicative analysis we obtained promising indicators that can be explored in a secondary analysis with a different time frame.

**Author Contributions:** Conceptualization, A.P.; methodology, A.P., D.G., C.N. and R.K.; software, D.G., M.M. and R.K.; validation, D.G., C.N. and M.M.; formal analysis, A.P., D.G. and M.M.; investigation, A.P., D.G., C.N. and M.M.; resources, A.P., D.G., C.N. and M.M.; data curation, D.G., M.M. and R.K.; writing—original draft preparation, A.P., D.G. and M.M.; writing—review and editing, A.P., D.G., C.N. and M.M.; visualization, A.P., D.G., M.M. and R.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The data are not publicly accessible since they are derived from or related to the dashboard of the i-doc's website. Access was provided to the producers in a research and time limited framework.

**Acknowledgments:** The authors would like to thank the production team of the i-doc and George Kalliris who supervised the project.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Sustainable Biocultural Heritage Management and Communication: The Case of Digital Narrative for UNESCO Marine World Heritage of Outstanding Universal Value**

**Clio Kenterelidou \* and Fani Galatsopoulou \***

School of Journalism and Mass Communications, Faculty of Economic and Political Sciences, Aristotle University of Thessaloniki—Greece, 46, Egnatia street, GR-54625 Thessaloniki, Greece **\*** Correspondence: ckent@jour.auth.gr (C.K.); fgal@jour.auth.gr (F.G.)

**Abstract:** The paper addresses sustainability, heritage, management, and communication from UNESCO's Marine World Heritage (MWH) perspective, analyzing its digital narrative footprint through social media. It aims to understand how MWH is conceptualized, managed, and communicated and whether it is framed with sustainability and biocultural values facilitating interactivity, engagement, and multimodal knowledge. Hence, a content analysis of the Instagram accounts of the MWH of Outstanding Universal Value (OUV) sites and protected areas has been conducted. The study included evidence from their Instagram profile, posts, features, and reactions. The findings indicated the dearth of a management and communication strategy being shared among and across UNESCO's MWH of OUV sites and protected areas, capturing the "lifeworld" and the "voice" of the marine heritage as unified. They also revealed that nature and human, and biological and socio-ecological ecosystems of MWH of OUV sites and protected areas are not interlinked in marine heritage management and communication featuring the whole and the entirety of the marine heritage site ecosystem. The lack of this expansion of meaning and engagement does not facilitate the shift of the route in the marine-scape, from discovery and being listed as World Heritage to human-nature interaction, diversity, dynamicity, and ocean literacy. The study contributes to setting the ground rules for strengthening marine heritage management and communication in light of the United Nations Sustainable Development Goals (SDGs) and the Ocean Literacy Decade (2021–2030).

**Keywords:** marine heritage; biocultural heritage; heritage management; heritage communication; digital narrative; social media; Instagram; UNESCO; marine protected areas of outstanding universal value; sustainability

#### **1. Introduction**

UNESCO's Marine World Heritage (MWH) acknowledges unique marine biodiversity, singular ecosystems, unique geological processes, or incomparable beauty. However, the marine landscape is more than the blue environment and its beauty. It goes beyond the blue ecosystem; the sea, the underwater environment, the sea surface, the coastline, and its land [1–4]. Marine ecology is a result of interaction between all the above ecosystems and landscapes together with human cultural and societal processes. Indicative is the fact that just over 40% of the world's population lives within 100 km of the coast [5]. Yet, this dialectic multifold nature and multimodal knowledge of the marine-scape are neglected in the conceptual definitions and academic research. Thus, unlike the marine nature-scape, the maritime cultural and social landscape is poorly observed or incorporated in the sustainable development context and praxis.

**Citation:** Kenterelidou, C.; Galatsopoulou, F. Sustainable Biocultural Heritage Management and Communication: The Case of Digital Narrative for UNESCO Marine World Heritage of Outstanding Universal Value. *Sustainability* **2021**, *13*, 1449. https://doi.org/10.3390/su13031449

Received: 22 December 2020 Accepted: 26 January 2021 Published: 30 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In this paper, we argue that new approaches to heritage, conservation, management, communication, and development goals should be followed and that a broader transformative impact is needed. For this purpose, we fall in line with those peers that propose a unified, joined-up approach for culture, heritage, landscape and systems, and sustainable livelihoods. This integrated framework for heritage and the various landscapes/environments is the biocultural heritage conceptualization [6,7]. Through this optic, sustainable development outcomes derive from acknowledging that biological ecosystems are in a continuous dynamic dialogue with socio-ecological ecosystems. This means that it is recognized that human practices are being developed or originated not only by biological habitats and species, but also by other forms of outputs, like:


Further, the abovementioned outputs mean that it is also recognized that the way these outputs are transferred from the past, play out in the present, and enhance futures literacy is of great significance. Therefore, we align with those peers supporting that sustainable biocultural heritage management and communication can benefit the heritage. These benefits are beyond the heritage's immediate value (economic, cultural) [14,15] and can have the potential to be enhanced. Their enhancement can be in an indigenous, participatory, multimodal, and metafunctional manner. It can also have strong relevance for other human and social groups across the globe. In this way, the future of humanity and the Earth will be solidified [13].

Facilitating the above, this study's purpose is to explore questions about sustainability and bioculture in the context of the management and communication praxis of UNESCO's MWH of Outstanding Universal Value (OUV) sites and protected areas within the digital social media environment (particularly Instagram). By drawing from the literature on the forenamed topics, we sketch an overview framework for understanding functional terms like heritage, MWH, heritage management and communication, and, specifically, their relation to sustainability and bioculture. Next, and building on the former work, the research data are presented, and the findings are discussed in accordance with the digital narrative footprint, and the nature-human, sustainability, and biocultural frames. We conclude by identifying needs for further research and scope for improvement in this kind of approach and analysis in the era of big data and the semantic web.

#### **2. Defining and Delimiting Marine Heritage as a Typology of Heritage**

Heritage, unlike history, is dynamic, open, and changing, and its significance belongs to the public realm [16]. According to the International Charter of Venice (1964), "heritage" is multiple, tangible and intangible, things and their imbued message, that, in the present, remain as living witnesses of the "old" and are safeguarded for and handed on to future generations. Hence, heritage is the monuments, objects, artefacts, instruments, genes, species, ecosystems, cultural spaces, traditions and customs, folklore, performing arts, practices, representations, languages, artistic expressions, skills, beliefs, knowledge systems, and human values. All these derive from the past, are preserved in the present and passed on to future generations [13,17–19]. When heritage is in the ocean or sea areas, underwater, or is a marine island, it is known as marine heritage [1–4]. Marine heritage is also considered the heritage located in the coastal zone of the continent or continent island, or when it relates directly to marine resources or environment, the coastal line and land/surface, or even when it refers to the relationship with the ocean and sea [17–21]. When the heritage is of outstanding universal value to humanity, and of high cultural and natural importance, it is defined as world heritage, according to the United Nations Educational, Scientific and Cultural Organization (UNESCO). Particularly, the UNESCO Conventions referring to heritage of OUV are those concerning (a) the Protection of the World Cultural and Natural Heritage (1972), (b) the Safeguarding of the Intangible Cultural Heritage (2003), and (c) the Protection and Promotion of the Diversity of Cultural Expressions (2005).

Being of "Outstanding Universal Value" (OUV) means being of cultural and/or natural significance, which is so exceptional as to transcend national boundaries and to be of common importance for present and future generations of all humanity. Therefore, its permanent protection is "*of the highest importance to the international community as a whole*" [22–25]. The heritage is identified as being of outstanding relevance for future generations according to at least one of the ten selection criteria of UNESCO (six cultural and four natural criteria), as Table 1 shows [22].

**Table 1.** UNESCO World Heritage selection criteria (10 in total) [22,23].


When heritage is identified as being of OUV, and in order to be protected, it is inscribed in the World Heritage List of UNESCO [22,24,26,27]. This not only facilitates the preservation of the heritage identified as world heritage, and the awareness-raising about it, but it also brings prominence and monetary revenue to the related actors as well [28,29]. Being included in the World Heritage List of UNESCO raises the site's profile and brings resources and expertise to support its protection [30]. The World Heritage List is dynamic, and, nowadays, it includes a total of 1121 World Heritage sites existing across 167 countries and representing all continents; 869 cultural, 213 natural, and 39 mixed (cultural and natural) sites, with 39 being transboundary and 53 in danger ([31], data October 2020).

When applying the World Heritage Convention (1972) criteria to marine systems and connecting the UNESCO World Heritage List with marine heritage, the importance of the marine environment and its different features are revealed. Consequently, MWH of OUV, together with the marine natural values of sites, are made salient globally. This, in turn, means that they are brought under international oversight, and their protection becomes a shared responsibility of us all [16,30]. Some of the globally significant marine sites and iconic ocean places on Earth are the Great Barrier Reef in Australia, the Galápagos Islands in Ecuador, the Banc d'Arguin National Park in Mauritania, the Socotra Archipelago in Yemen, and the Ogasawara Islands in Japan. In total, there are 50 MWH of OUV sites and protected areas existing across 37 countries and representing all continents; zero cultural, 46 natural, and four mixed (cultural and natural) sites, with three (3) being transboundary and three (3) in danger ([31], data October 2020).

The three (3) that are transboundary are Kluane/Wrangell–St Elias/Glacier Bay/Tatshenshini–Alsek (Canada and the United States); the Wadden Sea (Germany and the Netherlands); and the High Coast/Kvarken Archipelago (Finland and Sweden). The three (3) that are in danger are East Rennell Island (Solomon Island group); Everglades National Park (Florida, USA); and Islands and Protected Areas of the Gulf of California (Mexico). The first marine site on the UNESCO World Heritage List, included in 1979, was the Everglades National Park in the United States of America, followed in 1981 by the Great Barrier Reef in Australia [24,27,30,32]. To ensure the application of the World Heritage Convention to marine ecosystems globally and to encourage a representative, balanced, and credible World Heritage List [25,33], the International Union for the Conservation of Nature (IUCN) issued a roadmap. This roadmap of the IUCN also serves as a navigational chart. It addresses issues like (a) introduction and interpretations of the World Heritage criteria and their relevance to and application in the marine ecosystems, and (b) biogeographic gaps and ecosystem-based approaches to address them. This initiative of the IUCN was taken in order to facilitate and affect what we recognize as MWH [20,30,34–38].

#### **3. Managing and Communicating Heritage**

Further to the need to identify heritage or marine heritage and preserve and protect it, there is also a need for its management and for understanding the key issues in it in order to facilitate heritage site development. Heritage management refers to both cultural and natural heritage resources as well as tangible, intangible, formal, official, and informal collective heritage [39]. It incorporates various actors: public, private, government, advocacy groups, non-governmental organizations, and local and indigenous communities, and is considered complicated because there is no commonly agreed-upon definition. Therefore, it is introduced in the literature in a variety of ways [40]. In an expansive phrasing, it can be said that heritage management refers to the process in which the undertaken activities aim to care for a heritage item's assets and protect the physical and natural features of its environment [40,41].

Heritage, in any form and type, is an essential element affecting sustainable development. The latter has been recognized clearly by the World Heritage Committee when, in 2002, it declared that heritage is "an instrument for the sustainable development of all societies" [42]. Moreover, besides the various policy documents highlighting the former, it is also depicted in the 2030 Agenda for Sustainable Development and the 17 Sustainable Development Goals (SDGs) in which there are explicit references to cultural aspects that emphasize the central role of heritage in sustainable development [43]. As a result, heritage management practice revolves around integrating the key sustainability dimensions. On top of that, the roles of authenticity and genuineness, values, community, and the public are much appreciated in protecting and managing heritage as they can be critical delimitating and legitimating factors and components of intertwined sustainability dimensions [39,44,45]. Indeed, sustainability connects to society's adaptability and resilience by being the equilibrium between the development that is needed and the protection of its values [46]. It encapsulates a vital well-being aspect and its maintenance over a long, or even indefinite, period [47], as well as a civilization of enhanced human well-being and environmental resilience together with value-led change for getting there [48]. Communicating the latter felicitously is fundamental for heritage.

#### *3.1. Sustainability Framework for Heritage Management and Communication*

The forenamed new challenges and new attention (sustainability dimensions; roles of authenticity and genuineness, values, community, and the public) lead to new bases and focus in the managerial and communication approaches. These new tendencies invite us to become sustainable and, therefore, to implement sustainable heritage management, stimulating developmental potential and impact [49–51]. The latter means a shift from the physical consistency of the heritage to aspects pertinent to human deeds and thoughts as well as a reconsideration of distinctions; a shift from silo thinking to more integrated approaches. Thus, heritage is not considered a collection of things and sites, but a process of meaning-making, "a way of knowing and seeing" (Smith, 2006:44 in Barrère, 2016) [39]. It is understood as a social phenomenon that strongly reflects the society in which it is created and valued. Heritage interprets, represents, and decodes the way of living of those communities that reside within the vicinity (Long, 2000 [52]). Yet, in tandem, it is strongly connected and also engaged with the global community. It is promoted and made visible for the public interest so that it delivers socio-economic and development benefits [52].

In line with the abovementioned issues, sustainable heritage management needs to integrate the economic, social, and environmental dimensions into strategic planning and actions. The latter means extending from the planetary biosphere and specifically marine ecosystems to the local and indigenous community and human and social ecosystems [53]. It needs to aim not only to preserve and restore but also to increase the knowledge about the heritage as well [54]; a heritage that is met at sea level, submerged and underwater, in the coastal area and marine environment [1–4]. Moreover, genuine sustainability exists when heritage is present anywhere and anytime in everyday life. Therefore, moving towards these new pathways in which heritage and management are harmoniously integrated leads (a) to value creation as the heritage site's outstanding universal value is recognized and (b) to proactive, future-oriented management.

Further, sustainable heritage management and communication leads to futures literacy and futures foresight, since alternative future scenarios for the heritage site and the desired future are constructed and selected. Through this process, likely outcomes are predicted, and today's planned actions define tomorrow's outlook [55]. All these result in applying participatory practices and cultivating a participatory culture [56], developing a forward-looking attitude and skills, and establishing dialogic collective action. The former challenges also tightly relate to the MWH of OUV sites, because they raise the question of how conservation of a site's irreplaceable values can be balanced with the shift to socio-economic development and use, although sustainable in nature [57]. Besides a few geographically remote marine heritage sites, which are off-limits for exploitation, the remaining MWH of OUV sites are confronted with this challenge, and durable and meaningful ways to respond to it are sought by the site managers [57]. Concurrently, the industrialization of the ocean, climate change, habitat destruction, marine pollution, overfishing, invasive species, and others threaten the irreplaceable core values of the marine heritage sites.

On top of the abovementioned challenges, it is also a fact that coastal and pelagic biogeographic provinces are under-represented in terms of MWH [25]. Therefore, being recognized as having outstanding value does not necessarily mean that nature and humans are interconnected, but rather it appears that they are dichotomized [30,55,58]. Therefore, it is made evident that efforts, plans, actions, and impact should be future oriented, in line with shared and common goals. These goals are summarized in the five Cs: credibility, conservation, capacity building, communication and outreach, and communities [59]. In addition to them, great effort is needed to be sustainable by interlinking biodiversity and ecosystems to the broader seascape.

Through the prism of the previously mentioned issue, the management objectives need to shift focus. Nowadays, they are related mainly to science, wilderness protection, ecosystem protection and recreation, conservation of specific natural features, protected seascapes, and sustainable use. Following this approach, the management objectives now need to facilitate a more ecosystem-based management approach and benefit from cooperation, partnerships, open communication, engagement, and interconnectivity between the heritage site and the surrounding marine area. The result of the latter approach will then be to reveal the "big picture": the entirety of the marine heritage site ecosystem and its dynamics [60,61]. It is also worth mentioning here the insight with regard to heritage communication and outreach from the results of a survey on management issues on MWH sites from the marine site managers' point of view. This survey was conducted by the World Heritage Marine Programme during the 1st World Heritage Marine Site Managers Meeting in Honolulu, Hawaii (1–3 December 2010). In this survey, it is noted that communication and outreach, in particular, which both relate to goals and vision formulation and, as a strategy/process/stage, run throughout the whole life cycle of the management, are not considered essential. As a result, they are not included in the identified categories of management issues and the elements of the effective management cycle by the heritage site managers. They actually ended up rating MWH sites' current management positively [60].

It is evident that we are in an era when MWH sites are acknowledged as exceptional, diverse, of highest international recognition, sharing common characteristics (at least one), and are more than the sum of their parts. We are also in an era when all these sites also share common threats and management challenges. Yet, in this era, there is additionally a need for heritage management and communication change. Indeed, heritage management and communication should act proactively, leading by example and as models of excellence [60,62]. They should assist by acting as models of a broader effort and transform MWH sites and their human community into change-facilitators and future-thinking actors towards a sustainable society and engaging community.

#### *3.2. Biocultural Framework for Heritage Management and Communication*

The sustainable shift in heritage management and communication and the aim to facilitate resilient livelihoods suggests interlinking landscapes, biodiversity, customs, cultural values, traditional knowledge, and local and indigenous communities. In other words, it is necessary, nowadays, to view the well-being of society and to develop a view of its future (futures foresight) through a biocultural lens [55,63,64]. The latter means that new approaches to heritage, nature conservation, landscape planning, and development goals are entering the conceptual framework of heritage [65]. These new approaches assist the conceptualization of biocultural heritage [8]. In turn, management goals and Sustainable Development Goals (SDGs) are connected and evaluated via resilience indicators. The response to this biocultural challenge is structured upon the adoption and application of sustainable biocultural heritage management and communication aiming at sustainable development and a broader transformative impact. This biocultural framing shapes the development of goals to be not only either nature-focused or people-focused but jointly interwoven. Additionally, it shapes evaluation and performance indicators that measure nature-human ecosystems in an integrated manner. This means that there are not only growth and wildlife indicators and indexes, and gross domestic product (GDP) rankings, but also biodiversity and social and human well-being ones (e.g., economic welfare, genuine progress, social connections, environmental quality, human capital, services to and from ecosystems, sustainable human development, local ontologies) [59,66–70].

Biocultural heritage links biodiversity with human diversity [71], the biological and cultural, the environment and the people [72]. It conceives human and ecological wellbeing as an interrelated system [59]. It focuses on the nexus between biology and culture and all that they involve [73]. In more detail, it comprises the local ecological knowledge, innovations, and practices of indigenous peoples and local communities, and their associated landscapes, ecosystems, and biological resources. Indicative paradigms range (a) "from genetic varieties of crops the communities develop to the landscapes they create", or (b) "from seeds to landscapes", and (c) from knowledge to cultural landscapes and values. All of these are interwoven with heritage, memory, experience, and living practices (e.g., traditional food/crops, medicines, handicrafts, and long-standing traditional activities related to nature, such as festivals) [8,10,11]. Biocultural heritage "encompasses the natural and cultural components of human and environment interaction, including knowledge, practices and innovation" [74]. According to UNESCO (2008) [75] (p. 8), it is defined as "living organisms or habitats whose present features are due to cultural action in time and place" [10]. Moreover, within it, there are recognized "areas of interdependencies between biological and cultural diversity" (e.g., language and linguistic diversity, material culture, knowledge and technology, modes of subsistence, which includes land use, economic relations, social relations, belief systems, etc.) [10]. According to Lindholm and Ekblom (2019) and Ekblom et al. (2019) [8,74], biocultural heritage is constituted and framed by the following five interactive elements, as shown in Table 2.


**Table 2.** Interactive elements of biocultural heritage [8,74].

The concept of biocultural heritage offers a holistic approach to conventional crosscutting boundaries, as it inextricably links social, ecological, and biophysical systems [6,7]. This means that biological and material features of the landscape interlink with memory, experience, and knowledge [8–12]. Or that they encapsulate knowledge, practices, and values that reflect more modernized communities and not only those that adopt traditional lifestyles. Or that they are outputs that rely on informativity, diversity, dynamicity, and interactivity or interactions that relate to different human groups living with biodiversity within different contexts (rural and urban areas) [13]. It further paves the path to sustainability as it enhances the impact of partnerships. The reasoning for the latter is that it stresses local collaborative initiatives in tandem with institutional incentives (state, supra-state), emphasizing collective action and participation, and therefore features new forms of management and communication [8].

Moreover, it serves as a guiding framework for collective resource management and endogenous development [76]. It can synthesize in situ and ex situ knowledge recognizing local perspectives [66]. It can be conceptualized in a multimodal and metafunctional manner with strong relevance for other human and social groups across the globe, solidifying the future of humanity and the Earth [13]. Therefore, it has become a development tool used to inform thinking about the environment, nature governance, and management (for example, by the Institute for Environment and Development, or the International Union of the Conservation of Nature) [77]. Hence, using such an ecosystem-based approach helps ensure the integrity of MWH and ensures that the conditions of integrity are maintained further and enhanced over time [32,38]. It further endures on-the-ground impact [66]. It reflects the notion that people should also be recognized as central social figures for the conservation of nature and sustainable development (anthropo-centricity), in tandem with the marine ecological environment aiming at livelihood and climate resilience (biocultural frame).

#### **4. Methodology, Materials and Methods**

This paper aims to elaborate on the value and complexity of marine heritage by focusing on UNESCO's MWH management and communication approaches, and the challenges they face in light of the United Nations Sustainable Development Goals (SDGs) and the Decade of Ocean Literacy (2021–2030), and in the era of big data and the semantic web [78–80].

The MWH of OUV (50 sites in total across 37 countries) features some of the world's most exceptional ecosystems and is globally significant and a shared responsibility of humanity. It amounts to 10% by surface area of all the world's heritage protected areas [24,27,30,32]. On top of this, it is worth noting that the MWH of OUV represents 4.7% of all sites, and 20% of natural and mixed sites. Additionally, "the area included in these marine sites is 56.5% of the area of all World Heritage sites, due to the enormous size of some marine listings, notably Papahānaumokuākea in the Hawaiian Archipelago in the United States of America, the Phoenix Islands Protected Area (PIPA) in the Republic of Kiribati in the Southern Pacific Ocean, and the Great Barrier Reef in Australia, which are, by a considerable margin, the three largest World Heritage Sites. Further, only about 40% of the world's oceans are within the jurisdiction of countries" [25–27,33]. Lastly, "currently, about 2.9% of Earth's coastal and marine areas have some form of protected status [30,81], and only 0.01% of the global area is fully protected from extractive uses" (Laffoley and Langley, 2010) [22,25,33].

Focusing on MWH of OUV, this paper aims to illuminate the importance of a more holistic and integrated heritage management and communication approach, the sustainable biocultural framing. In this way, the shifting of the route in the blue marine-scape, from discovery and being listed as a World Heritage site to engagement and expansion of meaning, including other social and ecological contexts, together with informativity, diversity, dynamicity, and interactivity, will be facilitated. This biocultural heritage conceptualization functioning as an integrated framework for heritage and the various landscapes and environments reframes and facilitates synthesis across human and ecological wellbeing [55,63,64]. As a result, this expansive biocultural framing can create a common ground to develop futures literacy and build a joint future for nature and people [55]. More particularly, when entering the blue dimension, it encapsulates the fact that marine and maritime-scapes and systems include any kind of hermeneutic human relationship to the sea and the communities living along the coastlines. Therefore, it enriches the appreciation of marine heritage [16,82,83].

By mapping and analyzing the UNESCO MWH through the lens of social media, it comes into focus how MWH is conceived, managed, and communicated. By studying how it is framed, its meaning for a sustainable future is disclosed [84]. Furthermore, by showcasing the heritage management and communication approaches, it is revealed whether they are sustainable and biocultural in nature. Hence, it is revealed if they facilitate multimodal knowledge, engagement and participation, ocean literacy, and sustainability in light of the United Nations Sustainable Development Goals (SDGs) and the Decade of Ocean Science for Sustainable Development (2021–2030) [85–88].

All the above are approached and explored by focusing on how UNESCO MWH is promoted through social media and, particularly, Instagram. Specifically, the study and methodology approach is articulated as follows:

Firstly, the conceptual framework of marine heritage and further biocultural heritage and its specific correlation to sustainable management and communication is provided. Then, the sustainable and biocultural framework is researched by using as a case study heritage sites that are blue in nature and of great importance to humanity and of outstanding universal value. Later on, through the research findings, their digital communication profile on Instagram and what it says about each one of them are analyzed, and further, how they are communicated to and experienced by the public (multimedia and user-generated content). Through this approach, via its social media footprint on Instagram, the digital "living" culture and knowledge of UNESCO MWH are mapped, as an attempt to capture (in a database) everything about MWH and explore its "lifeworld", and its "voice" [89–95]. Moreover, the digital narrative footprint of heritage is depicted as content, experiences, discourse, voice, music, video, audio, and visual messages interwoven with text on Instagram and in digital environments in general [96–98]. The digital narrative is a meaning vehicle articulated by media usage, motion, relationships, context, and communication [99]. Moreover, as genuine sustainability exists when heritage is present anywhere and anytime in everyday life [57], in the era of big data, this means that the concept of data from everywhere is applicable. The digital narrative footprint of heritage offers (a) metafunctional meanings in the flow of information, (b) indigenous, traditional, or biocultural data that indicate interests as the data travel, and (c) indigenous land, geospatial, and place-based datasets. Sets of metadata can be developed, and culturally sensitive materials can be found online. In the case of heritage, and with regard to sustainability, it is important not only to foster openness by sharing and creating knowledge and to preserve all these cultural and heritage items for future generations but also to do it in a sustainable manner, which is to respect rights and follow the norms of the communities that created them.

The purpose of this study can be identified with the following research questions:

*RQ1*: How are UNESCO's MWH of OUV protected areas and sites of Europe promoted and communicated through social media nowadays? (What is their digital narrative footprint?) Subquestion: Are multimodal knowledge, engagement and participation, ocean literacy, and sustainability triggered and facilitated?

*RQ2*: Is MWH framed through a unified, joined-up approach for culture, heritage, landscape and systems, and sustainable livelihoods?

*RQ3*: Can Europe's UNESCO MWH protected areas and sites be interlinked and brought into existence by a unified nature-human ecosystems frame, viewing nature and people as an undifferentiated whole and be promoted and communicated in "one voice" (the blue digital narrative footprint), highlighting their sustainable and biocultural value (sustainability and biocultural framework)?

The choice of the mobile and sharing social media environment and, particularly, the Instagram service, as the field of research is based on the following:

The mobile and sharing social media environment refers to a broad spectrum of digital interaction and information exchange platforms aiming at enabling the general public to contribute, disseminate, and exchange information [100–102]. It constitutes a vehicle upon which experiences and emotional connections with geographical landscapes and wildlife are created and shared with the rest of the world. Thus, it is a big data system, since it is characterized by volume, variety, velocity, and users [78–80,103], and can be analyzed through social media analytics. Further, when social media is entangled with heritage, then big datasets are created, culture is presented and communicated in a multimodal way, heritage itself and heritage data are displayed, and heritage management and communication have to exploit knowledge from multimodal cultural and heritage data analytics. Thus, social media is fertile ground for harvesting various forms of multimodal-based data (e.g., images, videos, speech data, gestures, facial expressions, location-based data, gene-based data) [104,105], and their analysis also entails social media listening and sentiment analysis.

Additionally, social media consists of the digital online space where the management and communication strategy and practice are revealed and applied. It is also the space where the public interacts with heritage through the social media profile and the generated content (user-generated content, UGC) [106,107]. How the public experiences and understands a heritage site is of the highest importance for its lifespan and conservation [108]. It is worth mentioning that today, one out of two organizations enrich and enhance their internet profile by using social media and exploit this to develop their image and communicate, interact, and facilitate collaboration and knowledge-sharing with peers and the public. It is characteristic that European entities use social media and networks mostly for activities that relate to information and communication and for the development of their image. Additionally, it is characteristic that social media and network participation in the European Union reaches 56% of people aged 16–74, with the highest participation scores found in Denmark (79%), Belgium (73%), Sweden (70%), and the United Kingdom (70%) [109–111].

Instagram, which is a social photo and video sharing service that allows users to generate content, is one of the most popular social networks worldwide, and has 1.158 billion active users (monthly). Moreover, together with other social media platforms like Facebook, WhatsApp, and Facebook Messenger, it constitutes a core family product surpassing 7.2 billion registered accounts [112]. Furthermore, almost 855 million users access the platform monthly, and it is foreseen that this will exceed 988 million users in the next two years, a 15.5% increase [112]. On top of that, Instagram is the second leading platform after Facebook that is used by marketers worldwide for promotion purposes in the digital environment due to its significant potential reach to audiences and its popularity in influencer marketing, with global spending growing to 8.08 billion US dollars [112]. Additionally, Instagram incorporates engaging tools like Instagram stories (temporary videos or sequences of photos that form a storyline in a slideshow) that boost engagement and advance strategies for creating content and building an audience. For these reasons, the number of daily active users of Instagram stories worldwide is ever increasing [112].

Sharing information and experiences and commenting and interacting on social media, and particularly Instagram, is a usual practice. It is a practice that is deployed for impacting the conceptualization of culture and heritage [113,114], and strategically engaging in communication and creating a public image of blue marine heritage-scapes. Furthermore, the social media content is often geo-tagged as coordinates or toponyms of locations, which constitute a "crop" to harvest and analyze content, revealing information on issues, cultural dynamics, and the human landscape [100]. The latter transforms social media to also be geo-social since, by studying the interaction of users and data (topics, sentiment, space, etc.), the social structure and community connections can be observed [100]. Hence, the volume and richness social media offer open research paths for understanding situations and responding to research questions' challenges that, in our case, relate to sustainable biocultural heritage management and communication.

The data source and place of analysis were the official Instagram account and profile of each of the 14 UNESCO MWH of OUV of Europe. The analysis of more than one case in the same study assists the comparison and further considers the heritage management and communication in different settings. This diversity adds value to the research and offers a holistic vision [114].

The analysis unit was the multimedia and user-generated content in the Instagram accounts, which creates the mosaics and multiples of UNESCO MWH of OUV sites of Europe while connecting land and people digitally [102].

The sample consisted of 14 UNESCO MWH of OUV sites of Europe representing 11 European countries: Denmark—1 heritage site (HS), Finland—1 HS, France—3 HSs, Germany—1 HS, Iceland—1 HS, Netherlands—1 HS, Norway—1 HS, Russian Federation—1 HS, Spain—1 HS, Sweden—1 HS, United Kingdom—2 HSs. Specifically, they include the following, grouped by country:


This study's research method is based on observation of Instagram through content analysis and comparative metrics between the official accounts of UNESCO's MWH of OUV sites of European countries [114]. The choice of the particular research method (content analysis) is because it is especially facilitative in drawing inferences from the text and visual information in social media postings through a set of procedures. Therefore, it constitutes a useful evaluation tool (Weber, 1990 [114]) and offers practical applicability [115]. In order for the researchers to strengthen the content analysis in the era of big data and the semantic web [78–80], they connected systematic rigor with contextual sensitivity and blended them with multimodal representations that reveal the communication process and the multifold role of the actors (e.g., targets of messages, producers of communication and meanings, and co-creators in meaning-making) [116–118]. The analysis of the content generated by the official account holder and the reactions or interactions it produces provides a fertile ground for understanding UNESCO's MWH management and communication and offers an overview of the heritage marine-scape and how it is conceptualized.

According to the three research questions developed, MWH is analyzed through the following:

(i) the digital metrics of the posts, such as activity traffic, likes, views, comments, posted photos and videos (visual and audiovisual representations), hashtags, tags, volume of entries, engagement, interaction, sentiment [114], and

(ii) hermeneutic themes and meaning units (i.e., cultural or biocultural heritage, ecosystems, landscapes, memory, knowledge construction, experience, activity, collaboration, informativity, diversity, dynamicity and interactivity, local and indigenous knowledge, practices, ontologies, and their synthesis with landscapes and biological ecosystems) [8–12,74] and frames related to the style of expression, critique (negative; neutral; positive/normative), rhetoric (hopeful/optimistic; alarmist/pessimistic), and generation references of looking at the present or ahead (current generation references/present/now; future generation references/resilience/sustainability) [119]. Themes and frames express data on an interpretative level and underlying meanings by answering questions like why, how, and by what means and communicating with the public on both the intellectual and emotional level [116].
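As an illustration of how the frames in (ii) could be pre-coded from post text, the following minimal Python sketch matches captions against keyword lists. The keyword lists, the names `FRAME_KEYWORDS` and `code_caption`, and the tie-breaking rule are all hypothetical; the study's own coding was carried out by the researchers through content analysis, and its coding dictionary is not reproduced here.

```python
import re

# Hypothetical keyword lists for two of the frames named in (ii).
# They are placeholders for illustration, not the study's coding dictionary.
FRAME_KEYWORDS = {
    "rhetoric": {
        "hopeful/optimistic": {"hope", "protect", "thriving", "recovery"},
        "alarmist/pessimistic": {"threat", "loss", "danger", "decline"},
    },
    "generation": {
        "current": {"today", "now", "currently"},
        "future": {"future", "resilience", "sustainability", "generations"},
    },
}


def code_caption(caption: str) -> dict:
    """Assign one label per frame by counting keyword hits in a caption.

    A frame with no hits is labelled "uncoded"; on a tie, the label listed
    first in FRAME_KEYWORDS wins (an arbitrary, assumed rule).
    """
    words = set(re.findall(r"[a-z]+", caption.lower()))
    coded = {}
    for frame, categories in FRAME_KEYWORDS.items():
        hits = {label: len(words & keywords) for label, keywords in categories.items()}
        best = max(hits, key=hits.get)
        coded[frame] = best if hits[best] > 0 else "uncoded"
    return coded


print(code_caption("We protect the Wadden Sea for future generations"))
```

Under these assumed lists, the example caption is labelled hopeful/optimistic with a future generation reference. Automatic labels of this kind would only be a first pass; a human coder would still need to review them against the context of each post.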

The data collection method and tools used were scraping and searching thoroughly through official Instagram profiles and completing a documentation scheme for data entry designed for this purpose. The form included 60 closed-ended and open-ended purpose-built questions that required inputting values and specific information and data elements. These questions were grouped into thematic units (e.g., profile information, technical information, communication and audiovisual information, interaction, metadata), as Figure 1 shows.

**Figure 1.** Research documentation scheme of data elements for data entry.

By using this research documentation scheme of data elements groups, multimodal forms of knowledge and communication based on observations are mined (e.g., opinions, sentiments/emotions, interaction), and management and communication lines of strategic thinking are captured [103].

The data were collected in April and May 2020. The database was built only from those UNESCO MWH of OUV protected areas and sites of Europe that had created an official social media account and an Instagram profile. These cases were seven out of the 14 UNESCO MWH of OUV protected areas and sites of Europe, representing six out of 11 countries, as in Figure 2: Denmark, Wadden Sea (transboundary property); Finland, High Coast/Kvarken Archipelago (transboundary property); France, Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems; Germany, Wadden Sea (transboundary property); Netherlands, Wadden Sea (transboundary property); Norway, West Norwegian Fjords—Geirangerfjord and Nærøyfjord; Sweden, High Coast/Kvarken Archipelago (transboundary property).

**Figure 2.** UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe (research sample).

The researchers retrieved from their Instagram accounts and analyzed a total of 4223 posts of UNESCO MWH of OUV protected areas and sites of Europe. These 4223 posts, created by the account holders, generated a total of 30,451 comments, and their diffusion, immersion, impact, communication, and dissemination were amplified by 53,442 hashtags (#) and 5505 tags (@). Moreover, across all posts, framing was researched regarding the style of expression based on keywords/phrases found in the text, revealing critique, rhetoric, or looking at the present or ahead [83]. Additionally, mapping was conducted with regard to audiovisual material, emoticons, and geoinformation. Furthermore, in an attempt to carry out a more in-depth analysis, the top 10 most popular posts (according to their likes) created by each account holder were identified (a total of 70 most popular posts with 2733 comments, 1250 #, and 156 @). They were studied thoroughly regarding their framing, audiovisual material, emoticons, geoinformation, metadata, and interaction.
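The totals reported above can be turned into per-post averages, which makes the full corpus and the top-70 subset directly comparable. The sketch below uses only the figures stated in this section; the dictionary and function names are illustrative, and the resulting ratios are a descriptive aid rather than results reported by the study.

```python
# Totals as stated in the text for the full corpus and the top-70 subset.
ALL_POSTS = {"posts": 4223, "comments": 30451, "hashtags": 53442, "tags": 5505}
TOP_POSTS = {"posts": 70, "comments": 2733, "hashtags": 1250, "tags": 156}


def per_post(counts: dict) -> dict:
    """Return the average number of comments, hashtags, and tags per post."""
    n_posts = counts["posts"]
    return {
        name: round(value / n_posts, 2)
        for name, value in counts.items()
        if name != "posts"
    }


print("all posts:", per_post(ALL_POSTS))
print("top 70   :", per_post(TOP_POSTS))
```

Under this calculation, the full corpus averages about 7.2 comments per post, whereas the 70 most popular posts average about 39, which is consistent with selecting them as the cases for in-depth analysis.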

#### **5. Results and Discussion**

The findings of the study demonstrate that maritime heritage should be managed and communicated through a more digitally and socially enriched internet profile (including social media and particularly Instagram). It also needs to be managed and communicated through a more active, informative, diverse, dynamic, interactive, participatory, and collaborative manner. Additionally, it needs to be managed and communicated in sustainable and biocultural terms, if it is to cultivate public multimodal knowledge and engagement, literacy, and resilient future livelihoods.

Regarding UNESCO MWH, it can be noted, as a general observation, firstly, that even the name of the site alone, which builds and represents its identity, reveals patterns of marine heritage management and communication. All 14 UNESCO MWH of OUV protected areas and sites of Europe have a blue item in their name that is traditionally linked to landscapes or, better, marine-scapes (e.g., sea, lagoon, archipelago, fjords, gulf, island). Yet, the choices of the United Kingdom, Spain, and Iceland are toponym oriented; they use the actual name of the islands (St Kilda, Scotland, GB and Gough, GB; Ibiza, Spain; Surtsey, Iceland) to refer to and promote the UNESCO MWH sites, with no significant sensitivity towards whether the names of the Scottish, Spanish, and Icelandic islands are widely known. Having the site bound to the country foregrounds the country and its ownership of the property over the site's world importance and its belonging to humanity. Additionally, besides the Wadden Sea, St Kilda, and Surtsey, all the others have a descriptive title of six to seven words on average. This makes it difficult to remember them or have them imprinted in one's mind so that one can recall them or search for them on the web through a search engine. As a result, online visibility, findability, and interaction are not facilitated or enhanced.

Secondly, with regard to the internet profile, out of the 14 UNESCO MWH of OUV protected areas and sites of Europe, only seven (7) have an official social media account and an Instagram profile, representing six out of the 11 countries: Denmark, Wadden Sea (transboundary property); Finland, High Coast/Kvarken Archipelago (transboundary property); France, Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems; Germany, Wadden Sea (transboundary property); Netherlands, Wadden Sea (transboundary property); Norway, West Norwegian Fjords—Geirangerfjord and Nærøyfjord; Sweden, High Coast/Kvarken Archipelago (transboundary property).

Thirdly, with regard to the blue marine heritage communication identity, it is observed that the visual identity/logo of the UNESCO MWH of OUV protected areas and sites of Europe that hold an Instagram account is mostly co-aligned, with one exception, as Figure 3 shows.

Specifically, the ones that are a transboundary property (Wadden Sea; High Coast/Kvarken Archipelago) chose to present themselves publicly by using the name of the heritage site in the visual identity/logo together with blue elements or graphics and with wording emphasizing that they are "world heritage", thus bringing out the sites' universal value. The one that is the property of France (Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems) follows the same line as the transboundary ones, except for the emphasis on "world heritage". Yet, the one that is the property of Norway (West Norwegian Fjords—Geirangerfjord and Nærøyfjord) differentiates itself by using only blue elements or graphics, with no text at all and without making its world heritage nature salient.

**Figure 3.** UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe (visual identity/logos). Own elaboration by authors.

#### *5.1. The Digital Narrative Footprint of UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) Protected Areas and Sites of Europe*

According to the research findings, and with regard to the social media footprint through the Instagram profiles of the researched UNESCO MWH of OUV protected areas and sites of Europe and the type of site with regard to property, the following results are observed, as Figure 4 shows.

**Figure 4.** Instagram feature distribution chart (followers, following, posts, likes, views, and comments) of the researched UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe, by property type. Own elaboration by authors.

When a UNESCO MWH of OUV protected area or site of Europe constitutes transboundary heritage (e.g., the Wadden Sea, a transboundary property of Denmark, Germany, and the Netherlands; the High Coast/Kvarken Archipelago, a transboundary property of Finland and Sweden), then its public communication is not as vivid, active, and dynamic as when the UNESCO MWH of OUV protected area or site of Europe is managed and communicated by one country and, therefore, by a central actor (e.g., Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems, France; West Norwegian Fjords—Geirangerfjord and Nærøyfjord, Norway) (Figure 4). As a general remark, it can be said that location and promotion of the local cultural authorities and tourism organizations play an essential role in the public communication footprint and the visibility and engagement of the MWH site. On top of that and, responding in part to *RQ3:* "Can Europe's UNESCO MWH protected areas and sites be interlinked in a unified nature-human ecosystems frame viewing nature and people as an undifferentiated whole and be promoted and communicated in "one voice" (the blue digital narrative footprint), highlighting their sustainable and biocultural value (sustainability and biocultural frame)?", it is apparent that shared management and communication do not lead to common strategic plans of action that have as an outcome a "common, one voice" of UNESCO MWH of OUV protected areas and sites of Europe. Hence, a broad transformative and positive blue sustainable impact cannot be identified and detected.

Furthermore, and in relation to the former finding (Figure 4), it is of significant interest that an oxymoron appears in the communication of the transboundary MWH sites, as noted via their Instagram profile information; instead of having a common voice building on their transboundary nature, they chose different communication paths. One (Wadden Sea) highlights its world and common nature and ownership. In contrast, the other one (High Coast/Kvarken Archipelago) chooses to act as country property and stand out not with its unified nature but with its Finnish ownership and geographical aspect. Additionally, combining the latter finding with the one about using geoinformation in the posts (Table 4), it is notable that when unity is in the foreground, geolocation is paired with its featured region–country–place, but when disaffiliation is noted, then geoinformation is linked to the region–place–landscape–protected area. Therefore, trying to conceptualize and depict the antinomy, it can be said that country is a secondary reference when universality is a prominent element in the public communication profile, and conversely, the landscape and protected area are given prominence only after first having the country pinpointed as a core public communication element.

The statistical results of the social media footprint through the Instagram profiles of the researched UNESCO MWH of OUV protected areas and European sites are presented in Figure 5.

**Figure 5.** Statistics (mean, standard deviation) of the core Instagram features (followers, following, posts, likes, views, and comments) of the researched UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe. Own elaboration by authors.

Figure 5 presents statistics for the important social media features of an Instagram profile. The mean values for all the features besides likes and views are considerably low, ranging from 624.5 to 27,703.5 actions. The values of the features are widely dispersed, meaning that the actions in all feature categories are spread out. This dispersion depicts the scattered, fragmented, and ad hoc communication and managerial and strategic choices for the MWH sites, as also discussed in the analysis of the findings.
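The dispersion described above can be quantified with the mean, the standard deviation, and their ratio (the coefficient of variation). The sketch below is a minimal illustration: the follower counts are invented placeholders rather than the study's data, and the use of the sample standard deviation is an assumption, since the text does not state which estimator underlies Figure 5.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    m = mean(values)
    s = stdev(values)  # sample SD with an n - 1 denominator (assumed choice)
    cv = s / m if m else float("nan")
    return m, s, cv


# Hypothetical follower counts for seven accounts (placeholders only).
followers = [310, 1250, 980, 15400, 47000, 2200, 640]
m, s, cv = summarize(followers)
print(f"mean={m:.1f}, sd={s:.1f}, cv={cv:.2f}")
```

A coefficient of variation above 1 means the standard deviation exceeds the mean, which is one simple way to express that a feature's values are spread widely across accounts.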

In analyzing the correlation and association between and among the features of the Instagram profile of the researched UNESCO MWH of OUV protected areas and sites of Europe, the following results are noticed, as Table 3 shows.

**Table 3.** Correlation matrix of the Instagram profiles of the researched UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe. Own elaboration by authors.


In a general view, strong relationships are observed among almost all variables, as anticipated. However, it is worth noting that the strongest and nearly perfect positive linear correlation is between the views and Instagram stories (SD 0.998784) and following and Instagram stories (SD 0.997221). This shows that, when a UNESCO MWH of OUV protected area or site of Europe decides to create a digital social network and to follow other Instagram accounts, those accounts are interested in the public communication profile of their follower. This choice is because they want to know how they relate to the follower, why the follower decided to follow them, and the follower's activity. A schema of multimedia content power might be detectable here and is worthy of further research [120]. The interesting aspect of the forenamed correlation is that they monitor the follower through the instant social media content created by the follower (Instagram stories that disappear after 24 h), not through the posts archived in the account's timeline and the Instagram feed. It can be interpreted that they are interested in the vivid social profile of the follower and the UNESCO MWH of OUV protected areas and sites of Europe. They are drawn explicitly to the audiovisual content and the narrative accompanying it, as it is incorporated within the framework of stories. The reason for that is that Instagram stories are a tool for representing oneself in an online world to get connected virtually, and on top of that, to boost reach and engagement. Moreover, they can be used either for inspiration or quality social media listening. They are vibrant and live and a feature of self-disclosure, self-presentation, perceived collectivism, and new relationship building [121]. Based on the latter, it can be argued that this highly focused interest and interaction reveal, in turn, highly positive attention and thus, a form of content power (influence) [120].

Another finding worth noting is the almost nonexistent linear correlation between the views and posts (SD 0.160411), and the limited correlation between the views and likes (SD 0.30538) and the views and comments (SD 0.463724). It can be argued that this suggests that interactivity is being facilitated through other features of Instagram and not the anticipated and obvious ones. It is possible that other, nonlinear types of relationship between the two can exist and are worth further research, e.g., looking through the lens of the social relevance feedback based on multimedia content power [120].
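To make the correlation figures above more concrete, the following Python sketch computes Pearson's linear correlation coefficient for views against posts, likes, and comments, using the per-site counts reported in the site profiles of this section. The function name `pearson_r` and the variable names are illustrative assumptions. The choice of Pearson's r is also an assumption, although the coefficients it yields for these counts agree closely with the values quoted in the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Counts per account, in the order: High Coast/Kvarken Archipelago, Wadden Sea,
# Lagoons of New Caledonia, West Norwegian Fjords.
views = [1652, 787, 31564, 607800]
posts = [228, 117, 2625, 1253]
likes = [8949, 10860, 1465261, 863583]
comments = [83, 255, 16985, 13128]

r_views_posts = pearson_r(views, posts)
r_views_likes = pearson_r(views, likes)
r_views_comments = pearson_r(views, comments)

print(f"views vs posts:    {r_views_posts:.6f}")
print(f"views vs likes:    {r_views_likes:.6f}")
print(f"views vs comments: {r_views_comments:.6f}")
```

Each coefficient here is computed over only the four accounts studied, so the sketch illustrates the calculation rather than extending the analysis.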

Looking more thoroughly into the research findings, the nexus of the social media footprint through the Instagram profile of the researched UNESCO MWH of OUV protected areas and sites of Europe and the social media engagement and Instagram features is depicted in Figure 6. Instagram classifies the following feature categories: posts, likes, followers, profile information, following, views, comments, and other information. In this way, it is shown which are the fundamentals in heritage communication and management regarding raising the visibility of the MWH "lifeworld" and its "voice", the digital one, via engagement and interactivity.

**Figure 6.** Instagram follower, following, post, like, view, and comment distribution chart of the researched UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe. Own elaboration by authors.

Of the total 4223 posts of UNESCO MWH of OUV protected areas and sites of Europe (Figure 6, 3rd bar), the Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems Instagram account is leading the communication praxis as they have created the most posts (62.174%), then the West Norwegian Fjords—Geirangerfjord and Nærøyfjord Instagram account follows, with 29.678% and, lastly, the High Coast/Kvarken Archipelago and the Wadden Sea Instagram accounts, with 5.377% and 2.771%, respectively.
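The distribution of posts described above can be sketched as a simple share calculation. The Python snippet below is a minimal illustration that uses the post counts reported in the site profiles of this section; the shortened account labels and variable names are assumptions made for readability.

```python
# Post counts per account, as reported in the site profiles of this section.
posts_per_account = {
    "Lagoons of New Caledonia": 2625,
    "West Norwegian Fjords": 1253,
    "High Coast/Kvarken Archipelago": 228,
    "Wadden Sea": 117,
}

total_posts = sum(posts_per_account.values())

# Share of each account in the total number of posts, in percent.
shares = {
    name: 100 * count / total_posts for name, count in posts_per_account.items()
}

for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:32s} {share:6.3f}%")
```

The counts sum to the 4223 posts quoted in the text, and the resulting shares agree with the reported percentages to within a few hundredths of a percentage point.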

A consequence of this finding, and the active presence on social media via Instagram posts, is that the public appears to interact, via its likes, mostly with the Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems marine heritage site and then with the West Norwegian Fjords—Geirangerfjord and Nærøyfjord marine heritage site, whereas the two transboundary UNESCO MWH of OUV protected areas and sites of Europe (High Coast/Kvarken Archipelago; Wadden Sea) are nonexistent in terms of public engagement and interaction, with no likes at all (Figure 6, 4th bar). Therefore, only two out of four UNESCO MWH of OUV protected areas and sites of Europe enhance the public's awareness and experience with world marine heritage. This finding also relates to the fact that social media is used little by European entities for the exchange of views or knowledge and, thus, for interaction or collaboration, and is used in a more linear way and only for obtaining or harvesting opinions [109–111,122,123].

Regarding the followers of the UNESCO MWH of OUV protected areas and sites of Europe, the picture remains the same as presented previously (Figure 6, 1st bar). The only differentiation is the interchange in the first place between the Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems and the West Norwegian Fjords—Geirangerfjord and Nærøyfjord marine heritage sites. As shown in Figure 4, from the number of followers, first is the West Norwegian Fjords—Geirangerfjord and Nærøyfjord marine heritage site, although it is not the first in terms of the number of posts or likes. One explanation for that may be that the West Norwegian Fjords—Geirangerfjord and Nærøyfjord Instagram account is run by the official tourism board of Fjord Norway, whereas the Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems Instagram account is the official account for tourism to New Caledonia. This means that being an official account holder of heritage whose official name is nature-specific (fjord), together with place specificity (Norway), and not only toponym based (New Caledonia), is more effective in communication terms, as someone has to know the name of the area and toponym to search for an Instagram account and follow it.

The communication identity sketched in the profile information in the Instagram accounts is also noteworthy. The High Coast/Kvarken Archipelago marine heritage site introduces itself to the public by emphasizing its locality, although it is a transboundary property and heritage (e.g., "we are the Finnish part"). Thus, instead of being entangled with its statement of "world heritage" in its logo (Figure 3), it moves away. The Wadden Sea marine heritage site adopts a different communication frame. In its profile information, it points out its unique marine characteristics (e.g., "the largest tidal flats system in the world"), validating its world marine heritage nature that is also stated in its logo (Figure 3). It also goes a step further by designating unity and validating its universality through saying that, although it is shared among three countries as heritage, it is one sea (e.g., "Denmark—Germany—Netherlands. ONE Wadden Sea"). Therefore, one could infer that collective framing can be identified, which can act as a fundamental feature of a "common, single voice". Yet, it should be researched further whether this principle is also transferred to and runs through the whole management and communication strategy.

According to the number of the accounts that every UNESCO MWH site follows on Instagram (following), it is evident, as Figure 6 (2nd bar) shows, that the effort to cultivate a network of relationships and influence is being made by the West Norwegian Fjords—Geirangerfjord and Nærøyfjord heritage site (68.615%). Again, one explanation could be that its Instagram account is being run by the official tourism board of Fjord Norway; this could make it easier for it to approach interlocutors and, for that purpose, to locate actors, stakeholders, and Instagram members that can relate to it and its mission.

Taking a look at the views and comments (Figure 6, 5th and 6th bar), it is discerned that followers and following, in other words, the digital social network around a heritage site, bring more views and, thus, amplify the visibility of the MWH site. However, the more conversant an MWH site is in the digital and social media world through posts, the more engagement and interactivity it cultivates, as the number of the comments and likes shows.

In a deeper quantitative and qualitative look at the research findings, the digital communication profile of each one of the researched UNESCO MWH of OUV protected areas and sites of Europe is sketched out. In tandem, the response to *RQ1*: "How are the UNESCO MWH of OUV protected areas and sites of Europe promoted and communicated through social media nowadays? (What is their digital narrative footprint?)/Subquestion: Are participation, ocean literacy, and sustainability triggered and facilitated?" is shaped as follows, and as shown in Figure 7.

**Figure 7.** Instagram profiles of the researched UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe. Own elaboration by authors.

Analyses of the digital communication profile of each one of the researched UNESCO MWH of OUV protected areas and sites of Europe, per criterion relating to more information, are shown in Table 4 (complete profile data) and Table 5 (top 10 most popular posts according to the number of likes).

In more detail, as Table 4 shows, the High Coast/Kvarken Archipelago (transboundary property: Finland, Sweden) marine heritage site does not have a significant digital social media footprint. Its Instagram account had 613 followers, 8949 likes, 1652 views, and 228 posts, with 83 comments in the research period (April–May 2020). There have been no Instagram Stories produced and published. There is very rarely a response to the followers' comments although, at first, interaction seems to be welcomed, as all three (3) buttons facilitating it (follow; send a message; send email) are there. The metadata used in the posts created by the account holder include emoticons, geoinformation with regard to the region, the place, the landscape and the protection area, hashtags (#), with an average of seven to eight hashtags per post, and tags (@), with an average of two tags per post. Yet, there is no significant network of relationships and interlinkages cultivated, as the account follows only 136 other Instagram members.


**Table 4.** Instagram profiles of the researched UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe. Own elaboration by authors.

In a more qualitative analysis of the top 10 most popular posts of the account in the research period, as Table 5 shows, it is observed that the posts are supported only by photos (an average of two images per post) that reveal the site's nature in equal terms as either nature/landscape heritage or mixed (nature and culture) heritage. There are no videos at all, the text has an average of 28–29 words, and its style and expression are framed mostly as positive/normative critique and hopeful/optimistic rhetoric. There are references to current generation/present/now, and no emoticons or geoinformation are used. The posts are accompanied by an average of 10 hashtags per post relating to the country, region, place, and activity/action message, with the message being like a statement, and almost no tags per post. Every post of the top 10 most popular ones of the account in the research period generates one comment at the most, including emoticons and a tag, and it gets almost no reaction or reply from the account holder.

Hence, to conclude, it is observed that this particular site not only does not have a significant social media footprint, but it also appears to the public as only a nature/landscape heritage site. Its digital narrative footprint states positivity and hope, yet it does not relate to multimodal knowledge, engagement and participation, ocean literacy, and sustainability. Therefore, it does not build solid marine knowledge with a sustainable view.

The Wadden Sea (transboundary property: Denmark, Germany, Netherlands) marine heritage site, as Table 4 shows, does not have a significant digital social media footprint like the other transboundary property of Finland and Sweden of the High Coast/Kvarken Archipelago, although in comparison to it, it appears to be a bit more active. On its Instagram account, there are 1301 followers, 10,860 likes, 787 views, and 117 posts with 255 comments in the research period (April–May 2020). There have been no Instagram stories produced and published. Occasionally, there is a response to the followers' comments, although, at first, interaction seems to be welcomed as all three (3) buttons that facilitate it (follow; send a message; send email) are available. The metadata used in the posts created by the account holder include emoticons, geoinformation with regard to the country, the region, and the place, hashtags (#), with an average of seven to eight hashtags per post, and tags (@), with maximum of one tag, if any, per post. Additionally, there is no significant network of relationships and interlinkages cultivated as the account follows only 234 other Instagram members.

**Table 5.** Instagram profiles of the researched UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe, according to the top 10 most popular posts (number of likes). Own elaboration by authors.


In a more qualitative analysis of the top 10 most popular posts of the account in the research period, as Table 5 shows, it is observed that the posts are supported only by photos (a maximum of one photo, if any, per post) that reveal the site's nature as a nature/landscape heritage site. There are no videos at all, the text has an average of 22–23 words, and its style and expression are framed mostly as positive/normative critique and hopeful/optimistic rhetoric. There are references to current generation/present/now, and there are some emoticons or geoinformation used, particularly related to the country, region, city, and place. The posts are accompanied by an average of three hashtags per post relating to place and feelings, a message revealing mindset, and almost no tags per post. Every post of the top 10 most popular ones of the account in the research period generates an average of 50 comments, that might include emoticons, and an average of eight hashtags and 13 tags, an average of 38 likes, and it has an interaction and exchange of comments with an average of 61 replies per post.

Therefore, it can be concluded that this particular site not only does not have a significant social media footprint, but it also appears to the public simply as a nature/landscape heritage site. Its digital narrative footprint states positivity and hope, yet it does not relate to multimodal knowledge, engagement and participation, ocean literacy, and sustainability. Therefore, it does not build solid marine knowledge with a sustainable view. However, it enhances the unity of the world marine heritage site, and it comes forward as "one voice" geographically and not geo-socially. Therefore, responding in part to *RQ3:* "Can Europe's UNESCO MWH protected areas and sites be interlinked in a unified nature-human ecosystems frame, viewing nature and people as an undifferentiated whole, and be promoted and communicated in "one voice" (the blue digital narrative footprint), highlighting their sustainable and biocultural value (sustainability and biocultural framework)?", it is apparent that "one voice" is framed and communicated in terms of property rights of the MWH and not of its sustainable and biocultural value.

The Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems (France) marine heritage site, as Table 4 shows, has a significant digital social media footprint. Its Instagram account has 40,600 followers, 1,465,261 likes, 31,564 views, and 2625 posts with 16,985 comments in the research period (April–May 2020). There were limited Instagram stories produced and published (five in total), and there were rarely responses to the comments of the followers, although, at first, interaction seems to be welcomed, as two (2) buttons facilitating it (follow; send a message; no "send email") are there. The metadata used in the posts created by the account holder include emoticons, geoinformation with regard to the region and the place, hashtags (#), with an average of 15 hashtags per post, and tags (@), with an average of one (1) tag per post. Yet, there is a minimal network of relationships and interlinkages cultivated as the account follows only 415 other Instagram members.

In a more qualitative analysis of the top 10 most popular posts of the account for the research period, as Table 5 shows, it is observed that the posts are supported by both photos (an average of one photo per post), which reveal the site's nature as only a nature/landscape heritage, and videos (an average of one video per post). The text has an average of 14–15 words, and its style and expression are framed mostly as positive/normative critique and hopeful/optimistic rhetoric. There are references to current generation/present/now, and emoticons and geoinformation are used. The posts are accompanied by an average of three (3) hashtags per post relating to the country, region, place, and activity/action message, with the message being like a statement, and there are 16–17 tags per post. Every post of the top 10 most popular ones of the account in the research period generates three to four comments, including emoticons and a tag, and they almost never get any reaction or reply from the account holder.

Therefore, in the end, it is observed that this particular site, although it does have a significant social media footprint, appears to the public only as a nature/landscape heritage site with a digital narrative footprint stating "positivity" and "hope", instead of constructing a narrative that relates and amplifies multimodal knowledge, engagement and participation, ocean literacy, and sustainability, building solid marine understanding with a sustainable view.

The West Norwegian Fjords—Geirangerfjord and Nærøyfjord (Norway) marine heritage site, as Table 4 shows, does have a significant digital social media footprint. Its Instagram account had 68,300 followers, 863,583 likes, 607,800 views, and 1253 posts with 13,128 comments in the research period (April–May 2020). There were a few Instagram stories produced and published (50 in total). There is very rarely a response to the followers' comments, although, at first, interaction seems to be welcomed, as all three (3) buttons facilitating it (follow; send a message; send email) are there. The metadata used in the posts created by the account holder include emoticons, geoinformation with regard to the region, the city, and the place, hashtags (#), with an average of seven to eight hashtags per post, and tags (@), with an average of one (1) tag per post. Yet, although there is an evident attempt to create a network of relationships and interlinkages, there is not a significant one cultivated as the account follows only 1714 other Instagram members.

In a more qualitative analysis of the top 10 most popular posts of the account in the research period, as Table 5 shows, it is observed that the posts are supported only by photos (an average of one photo per post) that reveal the site's nature mainly as a nature/landscape heritage. There are no videos at all, the text has an average of 16 words, and its style and expression are framed mostly as positive/normative critique and hopeful/optimistic rhetoric. There are references to current generation/present/now, and emoticons and geoinformation are used, particularly related to the country, region, place, and landscape. The posts are accompanied by an average of eight hashtags per post relating to country, place, and activity/action message, with the message being like a statement, and there is one tag per post. Every post of the top 10 most popular ones of the account in the research period generates two to three comments, that might include emoticons and a tag, which get only the maximum of one reaction or reply from the account holder.

Thus, in the end, it is observed that this particular site, although it has a significant social media footprint, appears to the public simply as a nature/landscape heritage site. Its digital narrative footprint states positivity and hope, but does not relate to multimodal knowledge, engagement and participation, ocean literacy, and sustainability and, therefore, does not build solid marine knowledge with a sustainable view.

Through a bird's eye view, the salient features of the social media and Instagram profile of each one of the researched UNESCO MWH of OUV protected areas and sites of Europe are as represented in Figure 8.

**Figure 8.** Salient features of Instagram profiles of the researched UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe.

Summing up and responding to RQ1, it can be argued that the mosaic of the digital communication profile of the UNESCO MWH of OUV protected areas and sites of Europe is fragmented; each one is pulling in different paths. Although they are of universal value and of common importance for humanity, this is not also the case for their management and communication; no common lines or even principles can be identified strategically, managerially, or communicatively. Instead, they present themselves to the public and the digital and social media environment (whenever they do), in a scattered manner, detached from the core universal values they represent. They do not integrate elements of informativity, diversity, dynamicity, and interactivity that relate to the big data and semantic web era. Further, they do not integrate the key sustainability dimensions, let alone facilitate the interlinkage of authenticity and genuineness, values, community, and the public [39,44,45]. Therefore, they do not constitute "an instrument for all societies' sustainable development" [42].

#### *5.2. Europe's UNESCO Marine World Heritage (MWH) and the Nature-Human Frame*

Regarding *RQ2*: "Is MWH framed through a unified, joined-up approach for culture, heritage, landscape and systems, and sustainable livelihoods?", it is made evident from the research findings, as Figure 9 shows, that the only element that unifies the MWH sites is their nature and the physical environment and landscape heritage (71%).

**Figure 9.** Instagram marine world heritage (MWH) type depiction chart of the researched UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe. Own elaboration by authors.

This is not sufficient and satisfying according to the joined-up approach being looked for to promote MWH through a holistic approach that incorporates elements of culture, heritage, and sustainable livelihoods [55,63,64]. Therefore, no ecosystem-based management approach is activated. Still, there is no capitalizing of the benefits from cooperation, partnerships, open communication, engagement, and interconnectivity that the digital and social environments offer [3–12].

Moreover, in the era of big data and the semantic web, the strategic choice, as shown below in Figure 10, is for interactivity with the public to be rare. The technological and media potentials are not exploited since audiovisual articulations and stories are not used to support the heritage sites' posts.

From a managerial and communicative perspective, the purpose of having an official Instagram (and social media in general) account and profile for blue world heritage is to improve awareness, increase interest about it, and enhance its online visibility and findability (1148. Additionally, the nature and size of Instagram (and social media in general) serve to contain a significant volume of heritage information integrated into unique images, videos, hyperlinks, and text with reviews, so that the users/the public can engage with the content while generating real-time behavioral datasets with high velocity [122]. Therefore, the result of the previous strategic choice to have minimal interactivity (Figure 10) is to lose out on all this opportunity (a) of retrofitting their account in Instagram (and social media in general) to an online databank of quantitative and qualitative information, (b) of having user-generated content, and (c) of harnessing it by exploiting big data and semantic web approaches and methods that offer the potential to gather data with volume, velocity, and veracity. Consequently, this restricted interaction and the abundance of user-generated content and data analytics and insights lead to a very limited potential to have metrics of interaction with heritage content. This leaves out not only metrics but also a chance to better understand the users/the public during the communication and interaction with social media technology and blue heritage [123]. Further, it does not allow the managers and communicators to cover their information needs [122] and to participatorily include them/it in the public communication of MWH and the cultivation of multimodal knowledge for blue heritage.

**Figure 10.** Instagram interaction chart of the researched UNESCO Marine World Heritage of Outstanding Universal Value (MWH of OUV) protected areas and sites of Europe. Own elaboration by authors.

It is also noteworthy that in the cases where the MWH site is the property of one country and not transboundary (e.g., Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems/France; West Norwegian Fjords—Geirangerfjord and Nærøyfjord/Norway), the locality and the centricity of the importance of the country owning it are salient as they choose to use hashtags that mention the country specifically, and enrich posts with hashtags that relate to the messages formulated, like statements as absolute truths. There are no hashtags revealing mindset, feelings, persons, or activity and actions. Therefore, the human element, which is so sought after and desired, is neglected [55,66–70]. The human-nature frame is not being incorporated. As a result, a shift of the route in the blue marine-scape, from discovery and being listed as a World Heritage site to engagement and expansion of meaning, including other social and ecological contexts, together with informativity, diversity, dynamicity, and interactivity, cannot be ascertained.

#### *5.3. Europe's UNESCO Marine World Heritage (MWH), and the Sustainability and Biocultural Framework*

Regarding *RQ3*: "Can Europe's UNESCO MWH protected areas and sites be interlinked in a unified nature-human ecosystems frame, viewing nature and people as an undifferentiated whole, and be promoted and communicated in "one voice" (the blue digital narrative footprint), highlighting their sustainable and biocultural value (sustainability and biocultural framework)?", the research findings reveal that the "positivity" and "hope" frame prevail in the sites' critique and rhetoric and that the narrative focus is only on the current generation/present/now. Viewing the latter realization in relation to management and communication, it can be argued that the strategic and forward thinking in the forenamed approaches is exposed as poor and rather fragmented, rather than interlinking. Nature-human ecosystems and, therefore, the sustainability and biocultural framework, are not components of the conceptualization of UNESCO MWH of OUV protected areas and sites of Europe. This is apparent since there are no references to (a) memory, experience, local and indigenous knowledge, practices, living practices, and ontologies [8–12], or (b) knowledge, practices and values that reflect more modernized communities and not only those that adopt traditional lifestyles, or (c) outputs that rely on informativity, diversity, dynamicity, and interactivity, or (d) interactions that relate to different human groups living with biodiversity within different contexts (rural and urban areas) [13]. Thus, further, these components do not also constitute components of heritage management and communication. Having management and communication aims and plans stripped of a future orientation does not facilitate shifting the heritage management and communication towards sustainability and bioculture. Consequently, Europe's UNESCO MWH protected areas and sites are not brought into existence by a unified blue digital narrative having "one voice", viewing nature and people as an undifferentiated whole (biocultural heritage ecology). They do not exploit and capitalize on big data and semantic web opportunities. They cannot be digitally and socially produced and shared in a dialectic and participatory manner, enhancing sustainable heritage management and communication design and praxis.

#### **6. Conclusions and Key Recommendations**

The current research aimed to present how MWH is managed and communicated through social media. The paper has sketched the digital narrative footprint of the UNESCO MWH of OUV sites of Europe, on Instagram. With the digital and social media environment as a vehicle and the management and communication approaches as structural guidelines, the study revealed the strategic choices made by the main actors of heritage management and communication in the blue marine environment and for future generations. While this study has not exhausted the topic of whether heritage nowadays can be managed and communicated in a sustainable and biocultural manner, it definitely maps, felicitously and clearly, the picture that Europe's UNESCO MWH protected areas and sites are drawing for themselves and the public eye. A picture and public image that is, apparently, as suggested by the research findings, not in line with ecosystem-based management and communication [8–12]. It does not depict a digital "living" culture [93] and multimodal knowledge, capturing the "lifeworld" and the "voice" of marine heritage, as unified [89–95]. Consistent with the conclusions drawn by recent studies, although the tendencies are identified and the conceptual frameworks are offered to frame the heritage management and communication approaches in a joined-up manner, it is evident that the effort being made is lean. Despite the large volume of data, information, and users/human groups available on Instagram, the utilization of big data analytics for strategic managerial and communication schemas remains in its infancy [123]. UNESCO MWH protected areas and sites of Europe upload content to their official accounts on social media (Instagram), but not enough, and manage it, but not sufficiently in terms of big data, the semantic web, sustainability, and bioculture [6,7].
It should also be pointed out that further uploading, big data management insights, and collaboration are needed in order to lead to further collective intelligence, participation, and collective action for resilient future livelihoods [55,63,64,124].

The research raises important questions about the communication praxis for MWH and its blue digital narrative footprint. It would be fruitful to pursue further research about the public's point of view and the nature-human interlinkage in order to holistically map the conceptualization of marine heritage and its relationship to the public realm, futures literacy, and sustainable livelihoods. Furthermore, in the era of big data and the semantic web, the advancement of computer processing, and the development of sophisticated applications promising multimodal data collection (e.g., electroencephalography, or EEG; eye movements; video; keystrokes and wristband data; 4D modeling and transforming intangible cultural heritage and live expressions into tangible digital objects; the i-Treasures platform incorporating multisensory technology, relevance feedback algorithms, and multimedia content power) [18,19,79,107,122], it would be effective and prolific for leading actors of heritage management and communication to act and move towards the following paths:


Then, all this could be converted into actionable insights. Hence, they can be exploited to address challenges like management and communication decision making, strategic planning and performance evaluation, and developing well-informed strategies while adopting novel principles [123] and forward-thinking and incorporating societal and educational values. Moreover, exploiting big multimodal data analytics related to sentiment and emotion could facilitate designing meaningful experiences for the public. Further, it could enable creating strong relationships and trust and engage effectively in multimodal knowledge creation and preservation, and in public communication and resilience with regard to blue heritage.

In conclusion, blue world heritage has to be representative of the social desire to preserve and cross-link the blue heritage legacy in today's world with sustainability and bioculture. Moreover, blue heritage and public interaction traces must be captured and the key aspects and their dynamics must be identified for a broader transformative impact and sustainable biocultural heritage management and communication. In this way, the joint generation of understanding of the past and public appreciation in the present may be catalyzed and be decisive for local and global sustainability [122].

**Author Contributions:** C.K. and F.G. have participated and contributed equally in the conception and design of this article, as well as in the conceptualization, methodology, validation, formal analysis, resources, data curation, writing the original draft, review, and editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** The APC was funded by the European Maritime and Fisheries Fund (EMFF) of the European Union (EU), under grant agreement No 863524, project NAUTILUS (CALL: EMFF-BlueEconomy-2018/EMFF-02-2018 Blue Careers).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding authors. The data are not publicly available due to the fact that they are part of a broader study of sustainability and tourism development in coastal and maritime areas, in the framework of the EU-funded NAUTILUS project (EMFF), as mentioned in the acknowledgements.

**Acknowledgments:** This study aims to raise awareness and build knowledge of the UNESCO MWH sites of Europe and is part of the research and communication plan for the EU-funded NAUTILUS project (GA No: 863524/CALL: EMFF-BlueEconomy-2018/EMFF-02-2018 Blue Careers) of the European Maritime and Fisheries Fund (EMFF) of the European Union (EU). The funders had no role in study design, data collection and analysis, decision to publish, or manuscript preparation.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Semantic Crowdsourcing of Soundscapes Heritage: A Mojo Model for Data-Driven Storytelling**

**Marina Eirini Stamatiadou \*, Iordanis Thoidis, Nikolaos Vryzas, Lazaros Vrysis and Charalampos Dimoulas \***

> Multidisciplinary Media & Mediated Communication Research Group (M3C), Aristotle University, 54636 Thessaloniki, Greece; ithoidis@auth.gr (I.T.); nvryzas@auth.gr (N.V.); lvrysis@auth.gr (L.V.) **\*** Correspondence: mstamat@auth.gr (M.E.S.); babis@eng.auth.gr (C.D.); Tel.: +30-2310-994245 (C.D.)

**Abstract:** The current paper focuses on the development of an enhanced Mobile Journalism (MoJo) model for soundscape heritage crowdsourcing, data-driven storytelling, and management in the era of big data and the semantic web. Soundscapes and environmental sound semantics have a great impact on cultural heritage, also affecting the quality of human life, from multiple perspectives. In this view, context- and location-aware mobile services can be combined with state-of-the-art machine and deep learning approaches to offer multilevel semantic analysis monitoring of sound-related heritage. The targeted utilities can offer new insights toward sustainable growth of both urban and rural areas. Much emphasis is also put on the multimodal preservation and auralization of special soundscape areas and open ancient theaters with remarkable acoustic behavior, representing important cultural artifacts. For this purpose, a pervasive computing architecture is deployed and investigated, utilizing both client- and cloud-wise semantic analysis services, to implement and evaluate the envisioned MoJo methodology. Elaborating on previous/baseline MoJo tools, research hypotheses and questions are stated and put to test as part of the human-centered application design and development process. In this setting, primary algorithmic backend services on sound semantics are implemented and thoroughly validated, providing a convincing proof of concept of the proposed model.

**Keywords:** soundscapes; audiovisual heritage; semantic audio; data-driven storytelling; cultural heritage; content crowdsourcing; heritage management

#### **1. Introduction**

Cultural Heritage (CH) is considered very important from multiple perspectives of everyday modern human life, including but not limited to education, history, cultivation of cultural awareness, social engagement, entertainment, and well-being. The proliferation of Information and Communication Technologies (ICTs) and especially digital mobile devices has significantly propelled CH projects and associated featured services (websites, multimedia/mobile apps, etc.). In this context, ordinary users can navigate and virtually visit places and artifacts of cultural and heritage interest, literally without time or geographical restrictions. These services can be deployed when attending a physical environment with cultural value, to augment the whole experience (before, during, and after the visit), or for general infotainment activities. Apart from the cases of digital museums and exhibitions concerning artworks, historical buildings, monuments, and other cultural items, intangible CH has flourished through the processes of information capturing, documentation, and digital synthesis of CH storytelling experiences [1–7].

Among others, the audiovisual heritage associated with places, performances, and events can benefit from this progress in recording, managing, and authoring data-driven narratives [5–9]. In this context, average users can become active participants in the processes of contributing and exploiting multimedia content by experiencing, evaluating, and reinforcing the associated services. For instance, previous works have shown that applicable media assets can be quickly and massively crowdsourced, making use of the inherent audiovisual capturing and networking capabilities that modern mobile devices offer [10–12]. Apart from the data themselves, useful context-, time-, and location-aware metadata can be extracted to facilitate semantic information management and retrieval [13–16]. Through social tagging, it is possible to gather information about emotionally pleasant or unpleasant sounds in different urban areas [17]. However, as discussed in [10], not many ICT tools and/or services have been developed to support people in contributing audiovisual data, assisting toward the design of a CH framework.

**Citation:** Stamatiadou, M.E.; Thoidis, I.; Vryzas, N.; Vrysis, L.; Dimoulas, C. Semantic Crowdsourcing of Soundscapes Heritage: A Mojo Model for Data-Driven Storytelling. *Sustainability* **2021**, *13*, 2714. https://doi.org/10.3390/su13052714

Academic Editors: Asterios Bakolas and Marc A Rosen

Received: 12 December 2020 Accepted: 25 February 2021 Published: 3 March 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Environments, either physical or artificial, carry their own acoustic profiles. Distinct sound languages can shape a recognizable identity, offering an individual experience to human sound perception [18]. The concept of the soundscape was introduced as early as 1977 by R. Murray Schafer, who made the first attempts to describe what exactly a human ear hears or listens to when in a particular and self-explained environment [19]. In 2008, the International Organization for Standardization (ISO) established the working group ISO/TC 43/SC1/WG 54 "Perceptual assessment of soundscape quality." The objective of this group was to assist and promote consistency and compatibility between both theoretical and methodological approaches of soundscape studies and practice, developing the following definition, as given in ISO 12913-1, Section 2.3 [20]:

> "*Soundscape is an acoustic environment as perceived or experienced and/or understood by a person or people, in context.*"

Therefore, when discussing soundscapes heritage, the key issue is to focus not only on the meaning of sounds, but also on their implicit impact on the everyday quality of life and the opportunity to promote genuine acoustic sustainability. Besides, the interdisciplinary field of soundscape studies and research also extends to the conservation of acoustic heritage [21,22].

Data-driven storytelling is related to the way of making stories through data, i.e., the captured audiovisual content and its associated semantic metadata. In this perspective, possible multisite monitoring (offered by multiple mobile users) can be deployed, offering the option of selecting and/or augmenting the preferred viewpoint/reproduction configuration [14]. This feature matches well the empirical and strongly personalized aspects of perceiving soundscapes, as opposed to the somewhat neutral/impersonal capturing and reproduction of the acoustic environment [18–21]. Hence, the idea is to engage the audience in sound-related CH capturing and semantic description, thus forming a mediated way of experiencing soundscapes. Clearly, there are multiple aspects that can be assembled in this direction, encompassing all spatiotemporal, acoustic, visual, and semantic levels at the reproduction site. Nonetheless, the main goal here is to attract mobile users to collect and contribute semantically enhanced media assets (i.e., audiovisual records with their pattern-related metadata), equipping them with the necessary Machine Learning (ML) capabilities for on-site sound detection and classification [15,16]. Such mobile applications would allow users to describe the associated scenes and sound-fields (both aurally and visually) and to share the soundscape experience as intangible CH storytelling. This notion of soundscapes, which is perceived through the captured content, the offered retrieval/reproduction, and the associated sound (and video) semantics, will be considered throughout the rest of the paper.

The current work focuses on the collaborative collection and documentation of soundscapes and environmental sound semantics, which apart from CH, also significantly impact human life quality in multiple perspectives (as explained in the next sections). The whole approach has many similarities with sophisticated Mobile Journalism (MoJo) services, helping professional and citizen journalists collect news-items and shape them into featured data-driven storytelling [23,24]. Relying on the so-called *MoJo-mate* platform (Mobile Journalism Machine-Assisted Reporting) [23,24], an analysis is held regarding model elaboration and adaptation for the needs of soundscape heritage purposes. In this perspective, state-of-the-art machine and deep learning services are implemented both client- (mobile) and cloud-wise. This approach allows for multilevel semantic monitoring of sound-related heritage, while offering new insights toward sustainable urban and rural growth. Much emphasis is placed on capturing, preserving, and recreating soundscapes and open ancient theater acoustics, representing important cultural artifacts.

#### *1.1. Related Work*

Based on the preceding introduction, there are multiple perspectives concerning the related work around the discussed research domain. Data-driven storytelling, as a form of digital, sensemaking narrative, has recently received significant attention. Recognizing the increasing need to support novel means for integrating data visualization into narrative stories, featured cultural and audiovisual heritage projects deploy state-of-the-art technologies to capture, manage, and publish CH data through rich-media storytelling experiences [1–9]. Among others, related services or cultural activities (tomorrow's heritage) include tourist promotion and environmental preservation/awareness for landscapes and intangible artifacts [1–3], sites modeling/reconstruction and content restoration/documentation [4,5], and multi-disciplinary collaborations in research and education innovations [6–9]. Audiovisual and soundscape-related heritage initiatives also emerge, focusing on historical sound records and landscapes preserved, re-created, and reproduced as means of intangible CH expressions [25–29]. Furthermore, the impact of environmental sounds, noise, and soundscape components is analyzed on various aspects of modern human life, i.e., examining their associations with the residents' physical/mental health, perception, and behavior, aiming to unveil factors of sustainable growth and development and overall quality of life as well [30–37]. Social media soundscape information can serve for the prediction of health effects of noise pollution in different areas [38]. In this context, cooperative smart-sensing and crowdsourcing practices have been proposed and launched to raise public awareness toward soundscape conservation, safeguarding, and overall ecological consciousness through multimodal mapping capabilities [39–46].

In recent years, mobile devices have offered significant advantages in the direction of massive harvesting of large-scale diversified audio and image data, enabling users to exploit their mobile terminals for capturing, recreating, and sharing various events [10–16,44]. Smartphone capabilities can serve citizen science projects, following a user-centered design and providing motivation factors [45]. The cultural sector has also benefited from this evolution, adopting these practices to collaboratively collect, share, and annotate heritage sites and artifacts [7,39–42,47]. These processes bear many resemblances to aspects of the MoJo paradigm and other Digital Journalism genres (i.e., Data, Multimedia, Immersive Journalism, etc.) [16–24,48]. Context- and location-aware services can be combined with (multichannel) semantic processing to offer spatiotemporal sound mapping and pattern-related visualizations. Such featured summarization techniques are encountered in generic audio detection and classification tasks, including environmental sound recognition [14,49–59]. In this view, crowdsourced audio data can offer soundscape enhancement with multiple augmentation layers in favor of documentation, data-driven storytelling, and management. The massive research progress in the domain has established multiple pattern recognition schemes and hierarchical semantic audio taxonomies to describe the sound-fields associated with the different social events [13–24,52–59]. Apart from the geographical- and time-related information that a mobile terminal can easily hold, environmental sounds and soundscapes can be classified, filtered, and highlighted based on the associated pattern classification taxonomies, various low-level audio descriptors, other semantic labels concerning the transmitted or perceived emotions, etc. [49–56,60,61]. Furthermore, recent audio and audiovisual captioning trends can offer additional semantic conceptualization meta-data [62–65]. These meta-information augmentation perspectives can accompany the above-discussed sustainable growth and well-being indicators, suggesting added-value innovative services for soundscape preservation and their engaging promotion from environmental, ecological, and heritage viewpoints.

A linked popular research topic that has significantly propelled multidisciplinary scientific projects and associated knowledge gain is the way of learning by example, through the Machine and Deep Learning (ML/DL) paradigms. Both the audio semantics and the CH domains have also benefited from the breakthroughs and progress made [4–7,13–16,52–59]. Hence, sound and acoustic scene recordings can be processed to provide event detection and recognition outcomes, offering pattern-related metadata, content-based description, and management automation (i.e., retrieval, summarization/highlighting, etc.). Coarse classification schemes (i.e., Speech, Music, Other) can be deployed for detecting human activity and other main events, which could be hierarchically extended to additional classes [13–16,54–59,66]. More complex audio patterns have been formed/adapted to the needs of environmental sound monitoring, incorporating additional classes, therefore increasing the pattern recognition difficulty (e.g., the UrbanSound classification task containing 10 environmental sound categories) [15,54–59,66]. These two taxonomies represent the primary/baseline recognition demands that the proposed system should be able to handle (i.e., to extract such class-related metadata). Hand-crafted feature extraction has been extensively used for abstracting audio information to feed ML systems, taking advantage of the perceptual human experience. Early and late integration methods were also deployed, either by temporally fusing base features or by combining multiple classifiers (both in parallel and in cascade order), also increasing the computational load demands [54–59,66]. A recent trend in the field is the use of convolutional networks and DL architectures, shifting from the feature-based representation to automatically forming audio embeddings, as part of the training process [54–59,66,67]. These latest approaches are computationally heavier (especially at the learning phase), while they also require many more labelled/ground-truth samples as inputs, so dedicated datasets are continuously formed to serve the various training and testing needs. Again, the proposed framework should be able to cope with such solutions, as well as to expedite the creation of soundscape-adapted datasets through the process of mobile crowdsourcing.
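As an illustration of the late integration idea and of the two baseline taxonomies mentioned above, the following minimal Python sketch averages the class posteriors of several classifiers and then descends a two-level hierarchy. It is not the implementation evaluated in this work: the function names, the score dictionaries, and the three sub-classes placed under Other are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical two-level taxonomy. The coarse classes follow the text;
# the sub-classes under "Other" are illustrative placeholders.
TAXONOMY: Dict[str, List[str]] = {
    "Speech": [],
    "Music": [],
    "Other": ["car_horn", "dog_bark", "siren"],
}


def late_fusion(posteriors: List[Dict[str, float]]) -> Dict[str, float]:
    """Late integration: average the posteriors of several classifiers."""
    n = len(posteriors)
    return {c: sum(p[c] for p in posteriors) / n for c in posteriors[0]}


def classify_hierarchically(
    coarse: List[Dict[str, float]], fine: List[Dict[str, float]]
) -> List[str]:
    """Pick the coarse class first, then refine it if it has sub-classes."""
    fused = late_fusion(coarse)
    top = max(fused, key=fused.get)
    if TAXONOMY[top] and fine:
        fused_fine = late_fusion(fine)
        return [top, max(fused_fine, key=fused_fine.get)]
    return [top]


# Assumed outputs of two classifiers for the same audio segment.
coarse_scores = [
    {"Speech": 0.2, "Music": 0.1, "Other": 0.7},
    {"Speech": 0.4, "Music": 0.1, "Other": 0.5},
]
fine_scores = [
    {"car_horn": 0.1, "dog_bark": 0.6, "siren": 0.3},
    {"car_horn": 0.2, "dog_bark": 0.5, "siren": 0.3},
]
label_path = classify_hierarchically(coarse_scores, fine_scores)
```

Averaging is only one of several possible fusion rules; weighted voting or a cascade of classifiers would fit the same interface.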

Summing up, the conducted literature review revealed important aspects of soundscapes, i.e., environmental monitoring, sound and intangible cultural heritage, data-driven documentation, decentralized/smart sensing, etc., with diverse extensions on human health and sustainable growth indicators. Many related publications have attempted to shed light on most of the above viewpoints by utilizing mobile terminals and collaborative mapping [17–19,38–45]. However, to the best of our knowledge, such a multi-faceted approach (like the current one) has not been reported, incorporating sophisticated on-site semantic analysis and crowdsourcing dynamics, as they are advanced in today's ubiquitous society (i.e., in the era of big data and the semantic web). The impact of the anticipated services is also strongly connected to featured projects, which have been deployed to discover and recreate sounds of the past, emanating from the perspectives of acoustic heritage, archaeo-acoustics, and historical acoustics. Such works, supported by limited historical/acoustic data, rely mainly on computational models and simulation outcomes to offer an intangible CH experience, projecting relationships between people and sound over time [46,68–72]. In this direction, we can foresee the significant impact of the proposed MoJo-adapted system, which can document today's soundscapes to be experienced as tomorrow's heritage, taking advantage of semantically enhanced data-driven storytelling. Recalling the importance of ground-truth datasets and crowdsourcing audio semantics in the age of deep learning, the launched model can easily lead to massive soundscape data and metadata. The in-depth analysis of those repositories would reveal finer pattern correlations and taxonomies, with sharper conceptualization capabilities.

#### *1.2. Project Motivation and Research Objectives*

The related work presented in the previous section indicates that the field of crowdsourcing soundscape assets is very fruitful and mature, providing significant benefits for cultural heritage preservation and urban development. Audience engagement can be feasible, given a proper framework design. The motivation of the current project emanates from the idea of incorporating proper ML/DL analysis for soundscape semantics through a cloud-based architecture. For this reason, early backend implementations for General Audio Classification and Detection are presented and evaluated. The successful implementation of *MoJo-mate*, a mobile application offering machine-assisted reporting with semantically enhanced capture and documentation MoJo facilities [23,24], justifies this approach. The encompassed audio processing and recognition layers exhibit state-of-the-art time-, context-, and location-aware ubiquitous computing services, combined with generic/hierarchical pattern classification schemes [13–16]. These content analysis perspectives are considered ideal for meta-information augmentation of environmental sounds and soundscapes, which can be massively crowdsourced as User-Generated Content (UGC) to represent essential sites or places of intangible CH. The multilevel semantic interpretation of audio (and audiovisual) streams, contributed by both experienced and average users, will allow monitoring how the formed soundscapes have evolved and/or are still evolving over time and within special areas of interest. Typical examples include sensitive ecological zones, landscapes with environmental and cultural interest, places hosting cultural activities (in ancient or modern theaters and music halls), UNESCO world heritage sites, etc.

The ultimate target is to collect the necessary volumes of data in an easy and entertaining way, provide in-situ/real-time and batch semantic analysis modes, augment the physical visiting experience, and enable data-driven storytelling through multiple auralization and visualization layers. Such techniques will allow monitoring how the acoustic comfort of historic urban and rural areas is affected by sound space components (e.g., cars, motorbikes, tourists) and, overall, the need to improve the environmental qualities. Another important aspect refers to assessing the mediated navigation experience of both physical and virtual visitors, with respect to the offered digital storytelling, derived from the recreation of soundscapes and environmental acoustics. No doubt, these perspectives are equally important for the processes of intangible CH collection, management, and preservation. In the long term, sustainable growth and well-being indicators could be systematically monitored, correlated, and predicted in relation to the associated sound-field attributes (e.g., in heritage sites and areas featuring substantial environmental, cultural, or historical interest).

The work presented here is part of a broader project, aiming to collect and document multimedia semantics of soundscape heritage, to be later used for data-driven storytelling. The Logical User-Centered Design (LUCID) [6,7,11,23] was adopted through the whole process, emphasizing the audience engagement and reinforcement part. This was also one of the principal elements that had to be answered in the early beginnings of this undertaking, i.e., the degree to which targeted users would be interested to actively participate and contribute in this effort, which is aligned with the Analysis/Communication phase of standard application development procedures. Hence, a related survey was carefully set-up and executed to serve the needs of audience analysis. The second key factor would be to investigate whether mobile terminals and the associated algorithmic backend can be adapted to the task of crowdsourcing soundscape semantics. In this perspective, ML and DL systems were implemented as the initial/piloting algorithmic solutions and were thoroughly evaluated at various levels to provide a convincing proof-of-concept of the tested scenario.

Based on the above analysis, Research Hypotheses (RH) are stated and put to test, providing a convincing proof of concept of the proposed model, its feasibility, and effectiveness, emphasizing the semantic processing part:

**Research Hypothesis 1 (RH1)**: *It is both feasible and innovative to launch a Mobile Journalism application for soundscape heritage crowdsourcing and data-driven storytelling, and there is an audience willing to use the application and contribute.*

**Research Hypothesis 2 (RH2)**: *General Audio Detection and Classification techniques can be implemented by means of Machine and Deep Learning to serve the required soundscape semantics.*

In this context, the Research Questions (RQ) arising from the listed hypotheses are as follows:

**Research Question 1 (RQ1)**: *How can the MoJo framework be configured for soundscape heritage capturing and documentation? How can the crowdsourced media assets serve the needs for data-driven storytelling?*

**Research Question 2 (RQ2)**: *What are the main classification taxonomies that can be incorporated in the initial backend implementations of soundscape recognition? What is the estimated accuracy and computational load of these algorithmic systems?*

The rest of the paper is organized as follows. The system architecture and concept, as well as the experimental procedures, are presented and justified in the Materials and Methods section. Results and discussion illustrate the corresponding outcomes (and their thorough evaluation), providing multi-perspective analysis with regard to the stated hypotheses and questions. Conclusions are finally drawn, stressing the novel aspects and the contribution of the whole project, followed by the respective Summary section.

#### **2. Materials and Methods**

#### *2.1. Integration of State-of-the-Art Audio and Soundscape Semantics on the Cloud*

The main target of the current paper is to enhance the semantic aspects of capturing, managing, and recreating soundscapes, engaging the audience in the direction of mobile crowdsourcing and sharing related audio events. In this context, crowdsourced audio data can be comprehended in various ways, one of them being monitoring encountered soundscapes. Theoretically, this can be achieved by manually matching and managing different input streams from end-users, exploiting the aspects of semantic tagging, and annotation at different levels of hierarchy. However, in real-world conditions, difficulties regarding user- and context-related heterogeneities arise, which require the employment of intelligent audio processing and interaction methods, to utilize and benefit from the underlying semantic information of audio data.

While many related processing strategies can be deployed on mobile computing environments, resources for processing and analyzing vast amounts of audio data in a mobile device are typically limited [10–24]. Thus, a strong motivation for embracing cloud-based services emerges in this scenario. In this direction, accessible and highly capable cloud-based computing environments can facilitate the binding of semantically relevant content, by incorporating previous knowledge on individual soundscape characteristics (i.e., the rules that a listener would associate with a specific soundscape) [73].

Prevailing research on intelligent audio analysis and sound recognition is highly focused on the sub-fields of General Audio Detection and Classification (GADC) and Environmental Sound Recognition (ESR). The analysis aims at the semantic description of complex acoustic scenes, relying on a system that inputs an audio signal and outputs the semantic description of that signal. Hence, in this case, the meaningful aspects of a soundscape are to be detected and identified.

State-of-the-art approaches in computer audio intelligence motivate data-driven modeling, through machine learning. A wide variety of pre-processing and classification algorithms can deliver a solid generalization performance, given large amounts of training data. Moreover, the performance of these models is strongly dependent on the quality of the utilized data. For this reason, mobile devices can offer significant advantages in the direction of large-scale diversified labeled audio data gathering and the construction of generic ground-truth semantic audio databases [15].

Efficient pre-processing and semantic monitoring techniques can also be deployed as a front-end client-based system, given the ability to adapt to the variance in the acoustic environments and the respective sound recording conditions. This process can locally interact with the input signal and map it into a latent space, allowing users to monitor soundscape semantics on-site, with the option to define patterns of interest and associate them with specific audio features, geolocations, and/or visual content [56,57]. The proposed modular architecture allows the attachment of multichannel ambisonics sensors to the client terminal (i.e., soundfield microphones), to apply more sophisticated spatiotemporal localization and mapping that could facilitate the audiovisual content description and management [49–51,74,75]. On the other side, more demanding semantic analysis can be performed in a batch processing mode, as a cloud service, making use of recent advances in Convolutional Neural Networks (CNN), Deep Learning (DL), and multimodal decision-making systems [58–65]. The focus here lies in the discrimination of time-concurrent audio events in a hierarchical classification taxonomy. This processing type is more adapted to the audio domain and may have considerable advantages over end-to-end solutions. Moreover, a soundscape crowdsourcing approach is favored in the proposed methodology for constructing big datasets, as users are encouraged to contribute new labeled data while making use of the services. This real-world soundscape intervention approach to audio management systems can offer further conceptual analysis perspectives of crowdsourced audio data, layered on top of existing semantic analysis assets.
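The discrimination of time-concurrent events can be thought of as a multi-label decision per analysis frame, where every class score is thresholded independently instead of keeping only the single best class. The sketch below illustrates this under stated assumptions: the per-frame scores, the class names, and the 0.5 threshold are hypothetical, and the code does not reproduce the CNN-based backend discussed here.

```python
from typing import Dict, List


def detect_concurrent_events(
    frame_scores: List[Dict[str, float]], threshold: float = 0.5
) -> List[List[str]]:
    """Return, for each frame, every class whose score reaches the threshold."""
    return [
        sorted(label for label, score in scores.items() if score >= threshold)
        for scores in frame_scores
    ]


# Assumed per-frame class scores, e.g., produced by a multi-label classifier.
frames = [
    {"speech": 0.9, "music": 0.7, "traffic": 0.1},
    {"speech": 0.2, "music": 0.8, "traffic": 0.6},
    {"speech": 0.1, "music": 0.3, "traffic": 0.2},
]
active = detect_concurrent_events(frames)
```

A frame can therefore carry several labels at once (or none), which a single-label argmax decision could not express.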

#### *2.2. The Implemented Sound Heritage and Storytelling Model*

Soundscapes can tell the story of spaces through time. While the acoustic scenes can characterize certain places and ecosystems, they are also in constant movement and evolution, as they change as a whole, and as temporary events occur, breaking the perceived continuity of sound. Treating environmental recordings in this scope allows the design of an interactive storytelling mode, where varying soundscapes can be in the spotlight of the narration.

When a crowdsourcing approach is adopted, definitive and linear storytelling is replaced by a collective narration, formed by combining the audio recordings and audiovisual assets provided by the users. The criteria that individual listeners follow to access the available files define different perspectives and can form a vast number of stories that emerge from the provided soundscape recordings. An intuitive design can support interactive storytelling, facilitating the exploration of the dataset in creative ways. Two main aspects of soundscapes have already been mentioned: their spatial and temporal evolution. An interactive map, with a supplementary timeline option, can provide the functionality for filtering the data, using both the geographical and temporal information of the recordings. The user can access environmental recordings through an interactive world map and can also select the time interval within which a recording was created. Context-aware content-creating applications can provide such information without manual annotation at the time of the recording [11–16,74,75].
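The map-plus-timeline filtering described above can be illustrated with a minimal sketch. The `Recording` type, the `filter_recordings` function, and the sample catalog are hypothetical names and data introduced only for illustration; they are not part of the described platform.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Recording:
    """Hypothetical catalog entry: a title plus spatiotemporal metadata."""
    title: str
    lat: float
    lon: float
    recorded_at: datetime


def filter_recordings(recordings, south, west, north, east, start, end):
    """Keep recordings inside the map bounding box and the timeline interval.

    Simplification: the box is assumed not to cross the antimeridian.
    """
    return [
        r for r in recordings
        if south <= r.lat <= north
        and west <= r.lon <= east
        and start <= r.recorded_at <= end
    ]


# Illustrative data only (coordinates and dates are made up).
catalog = [
    Recording("Village bell", 40.63, 22.95, datetime(2020, 5, 1, 9, 0)),
    Recording("Open theater rehearsal", 37.60, 23.08, datetime(2019, 8, 15, 20, 30)),
    Recording("Forest at dawn", 40.70, 23.10, datetime(2021, 1, 10, 6, 45)),
]

hits = filter_recordings(
    catalog, south=40.0, west=22.0, north=41.0, east=24.0,
    start=datetime(2020, 1, 1), end=datetime(2020, 12, 31),
)
print([r.title for r in hits])
```

In this toy catalog only the first recording satisfies both the geographic and the temporal constraint; the third lies inside the box but outside the selected year.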

Besides the straightforward spatiotemporal filtering of results, content-based retrieval can form different storytelling paths. Soundscapes that are far away in terms of distance or time may capture similar acoustic scenes, e.g., open theaters, cities, forests. Manual annotation from the content creators can provide a tagging scheme to retrieve relevant assets. By providing a data-driven analysis system on the cloud, several soundscape descriptors can be extracted automatically from the audio characteristics of the recordings. Users can form queries to browse through the dataset, based on the manual and automated tagging of data. In this approach, the integration of featured personalization and recommendation modules can push relevant content to the users, based on their queries and, overall, the monitoring of their behavior and interests.

So far, several scenarios of searching for audio content through textual input, as well as extracting textual descriptors from audio content, have been presented. However, modern trends in Human–Computer Interaction demand more intuitive query processes. In the context of soundscape storytelling, it is possible to retrieve audiovisual content through an audiovisual input query. By recording or providing a soundscape, users should be able to search the database through similarity checks (e.g., pattern matching). This will result in accessing content with audio characteristics that match the input. In the same way, by accepting not only audio recordings but also videos, or accompanying assets (e.g., users can upload photographs along with the environmental recordings), a mapping between different modalities can be created. By providing certain soundscapes, relevant content can be generated, and vice-versa. This interaction can provide great possibilities in the paths a user can follow to access different stories.
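As an illustration of the query-by-example idea, the following sketch ranks stored recordings by their similarity to a query recording. It assumes that every recording has already been reduced to a fixed-length feature vector; cosine similarity is used here only as one possible similarity check, and all names and vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_by_example(query_vec, index, top_k=2):
    """Return the names of the top_k stored items most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical three-dimensional descriptors, for illustration only.
index = {
    "theater_A": [0.9, 0.1, 0.0],
    "forest_B": [0.1, 0.8, 0.3],
    "theater_C": [0.8, 0.2, 0.1],
}
result = query_by_example([1.0, 0.1, 0.0], index)
print(result)
```

The same ranking routine could serve cross-modal retrieval if audio and visual assets were embedded in a shared vector space, which is an assumption beyond what the text specifies.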

Another meaningful parameter that can boost interactive storytelling functionality is the acoustic modeling of distinctive soundscapes, especially those related to cultural heritage (e.g., the renowned ancient open theaters). This process of defining a transfer function can be used to estimate and imitate the acoustic behavior of a scene. If such functionality is offered, users can provide studio-quality or close-miking recordings with no reverberation and simulate the reproduction of their recordings as if they had been made within various soundscapes [5,56]. It is essential to mention that related functionalities have been recently deployed on the *MoJo-mate* application, facilitating time-, context-, and location-aware audiovisual recordings with significant semantic enhancements concerning the encountered audio patterns and the surrounding acoustic behavior [11,12,16–24]. While these modalities have been successfully integrated and evaluated for the needs of MoJo capturing and publishing services, the proposed re-orientation can be even more valuable in the direction of preserving and demonstrating soundscape heritage. Furthermore, the collection of big data in a more organized manner and the gradual construction of semantically enhanced audio (and audiovisual) repositories can steer added-value services toward implementing diverse ground-truth sets and utilizing them in more sophisticated semantic conceptualization automations. As already stated, such analysis perspectives can be correlated with human well-being, cultural heritage, and sustainable development indicators, which is very important in today's rapidly changing ubiquitous society.
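A minimal sketch of the auralization idea follows, under the assumption that the scene behaves as a linear time-invariant system, so that its transfer function corresponds to an impulse response in the time domain. The dry signal and the impulse response below are toy values, not measurements of any real site.

```python
def convolve(dry, impulse_response):
    """Direct-form linear convolution: y[n] = sum_k dry[k] * h[n - k]."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


# Toy data: an anechoic ("dry") signal and an impulse response consisting of
# the direct sound plus a single reflection arriving two samples later.
dry = [1.0, 0.0, 0.5]
ir = [1.0, 0.0, 0.25]
wet = convolve(dry, ir)
print(wet)
```

For realistic impulse responses, which may span seconds of audio, an FFT-based convolution would normally replace this quadratic-time loop.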

#### *2.3. The Proposed Model Architecture*

The work presented emanates from the particularities residing in the vast increase in UGC. Apparently, mobile devices offer significant advantages in the direction of massive harvesting of large-scale diversified labeled audio data. Users' smartphones make the procedures of recording, recreating, and sharing audio and audiovisual material as simple as possible. Professional and nonprofessional users capture audiovisual content using mobile devices (smartphones and tablets) and upload it to the platform. However, multimedia data that are collected through crowdsourcing are often of low quality, due to nonprofessional hardware limitations and the lack of proper training. In this direction, mobile automations add a level of intelligence to assist the process. Difficulties regarding user- and context-related heterogeneities are overcome through the adoption of dedicated audio processing and interaction techniques for the semantic tagging and annotation of audio events.

To this end, the implementation of a 4-layer, cloud-based architecture is shown in Figure 1, offering audio-driven multimedia analysis and classification. Mobile terminals offer sensory and recording software to capture sound and audiovisual data, which can be enhanced with time, geolocation, and other context-aware metadata. The user can upload the created files on the cloud for analysis. The data handling layer is responsible for orchestrating and distributing the incoming data depending on the resource allocation, while also extracting audio tracks from audiovisual material and selecting the channels/segments to be further processed. Next, the audio processing and classification layer takes over, resulting in an assembly of salient (human-crafted) audio features, as presented on the left side of Figure 1 (terminal-wise analysis). A set of dedicated temporal feature integration processes is involved [54,57,59], attempting to classify the sounds identified in the given soundscape through typical Multi-Layer Perceptron (MLP) architectures. Apart from this on-site analysis, heavier processing is deployed on the cloud, utilizing state-of-the-art CNN architectures for machine-driven convolutional feature engines and finer pattern recognition (right side of Figure 1). Overall, these two independent flows employ different-complexity (and computational load) machine learning models, associated with the client-wise and server-wise (cloud) perspectives, as previously stated. The resulting entities are stored in a repository along with their semantic representation. Based on this information, an interactive map is created, augmented with a timeline bar and multiple semantic filtering options, taking into consideration time, location, and pattern-related tags. The captured audio streams are pinned in this multilevel information mapping so that spatiotemporal monitoring and auralization processes are offered as part of the storytelling. Hence, both the UGC contributors (displayed at the bottom of Figure 1) and the end-users/consumers (depicted at the top of Figure 1) can reproduce the evolution of sound and soundscapes over time, and in relation to the available semantic layers. The main goals of the proposed architecture concern the efficient and purposeful employment of cloud services and mobile artificial intelligence for the support of interactive soundscape exploration. More specifically, the current paper evaluates the individual and ensemble potentials of the two different semantic analysis processes (terminal- and cloud-wise), thus making a convincing proof of concept for their usefulness in the attempted CH data-driven storytelling.

**Figure 1.** The adopted semantic crowdsourcing model architecture. Terminal-site audio semantics is deployed through feature extraction, temporal integration (enhanced temporal integration (ETi)), and multi-layer perceptron (MLP)-driven pattern recognition. Server-wise semantics are applied in heavier processing modes using convolutional neural network (CNN) architectures for end-to-end content-based recognition. Captured audio (and audiovisual) data are enhanced with diverse semantic tags and pattern-related metadata, which are documented in the formed ground-truth repository. These media assets also augment the proposed data-driven cultural heritage (CH) storytelling model.

#### *2.4. Experimental Setup*

#### 2.4.1. Concept Validation: Preparation of a Questionnaire Survey

The initial hypothesis (RH1) can be examined by answering typical questions for soundscape capturing, sharing, exploration, and specific aspects regarding users' cultural interests and habits, thus retrieving vital feedback. In order to grasp and monitor users' preferences, the research utilized a quantitative survey method for data collection, with the formation of a corresponding online questionnaire.

Detailed information regarding this survey is provided in the associated results section, along with the assessment outcomes. An overview of the chosen inquiries is presented here, aiming to justify the adoption and configuration of the formed questionnaire. Hence, background-related questions (soundscape knowledge, relevance, previous use, etc.) were structured in a categorical form of potential answers, with 5-point Likert scales (1–5, from "Totally Disagree" to "Totally Agree" or from "Not at all" to "Very Often"). Binary values (i.e., gender) and higher-dimensional lists were also involved. The items were divided into three subsets: the first involved basic characteristics/demographics of the users (questions 1–4), the second comprised questions on the participants' background/knowledge of soundscapes (questions 5–10), and the third contained suggested modalities and usability characteristics of the proposed mobile application (questions 11–17, in Table 1). The test formation was validated after discussions and focus groups with representative users and authorities of various kinds. Specifically, these included journalists, cultural and soundscape heritage enthusiasts, multimedia producers/programmers, technologists and researchers in machine/deep learning, environmental sound recognition, audio semantics, etc. The survey was updated based on the received feedback, investigating the audience interest in soundscapes and soundscape heritage, while also estimating the anticipated dynamics of the proposed approach.


Table 1 synopsizes the final set of questions selected for the needs of this survey. During the survey preparation, all ethical approval procedures and rules suggested by the "Committee on Research Ethics and Conduct" of the Aristotle University of Thessaloniki were followed. The respective guidelines and information are available online at https://www.rc.auth.gr/ed/ (accessed on 2 March 2021). Moreover, the Declaration of Helsinki and the MDPI directions for the case of purely observational studies were also taken into account. Specifically, the formed questionnaire was fully anonymized, and the potential participants were informed that they agree to the stated terms upon sending their final answers, while they have the option of quitting anytime, without submitting any data.

#### 2.4.2. Configuration and Validation of the Audio-Semantic Modalities

Aiming to conduct an objective evaluation for both terminal- and server-side classification algorithms, a comparative evaluation between a lightweight feature-based method and a deep learning approach was decided. As already explained, these two approaches represent the earliest algorithmic implementations that the project should launch, so they are investigated in this first research. Specifically, an Enhanced Temporal Integration (ETi) model [57] with a fully connected neural network (i.e., MLP) and typical 2-dimensional CNN topologies [58], proposed as the terminal and server-side classification approaches, respectively, were tested on typical audio classification scenarios, utilizing common datasets. Again, the specific pattern analysis taxonomies are thought of as the minimum, though entirely adequate, pilot developments to provide a convincing proof of concept, while initiating the semantic crowdsourcing process and the gradual construction of the anticipated ground-truth repository, as well.

The classification scenarios involve two datasets, according to a 3-class generic classification and an environmental 10-class scheme. The first one is simulated using the LVLib-v3 dataset [59], which follows the Speech/Music/Other (SMO) taxonomy, while the 10-class task is based on the UrbanSound8K dataset [55]. This decision is justified by the fact that the Other class of the LVLib-v3 can be hierarchically split into more classes, which, for instance, can follow the scheme of the UrbanSound8K [15]. On the one hand, LVLib-v3 includes 1.5 h of recordings, is available online at m3c.web.auth.gr/research/datasets (accessed on 2 March 2021), and specifies a 3-fold cross-validation strategy to make the results comparable across the algorithms of different creators. On the other hand, UrbanSound8K is a standard benchmark for environmental sound recognition and contains 8.75 h of field recordings, divided into 10 environmental sound categories.
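The hierarchical relation between the two taxonomies can be sketched as a simple lookup, in which the Other class of the SMO scheme is refined by the ten UrbanSound8K categories. The `TAXONOMY` structure and the `hierarchical_label` helper are hypothetical illustrations, not the implementation evaluated here.

```python
# The ten categories of the UrbanSound8K dataset.
URBANSOUND8K_CLASSES = (
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
)

# Hypothetical two-level taxonomy: only the Other class is refined further.
TAXONOMY = {"Speech": (), "Music": (), "Other": URBANSOUND8K_CLASSES}


def hierarchical_label(smo_label, env_label=None):
    """Combine a coarse SMO decision with an optional fine-grained label."""
    children = TAXONOMY[smo_label]
    if not children or env_label is None:
        return smo_label  # leaf class, or no second-stage decision available
    if env_label not in children:
        raise ValueError(f"{env_label!r} is not a subclass of {smo_label!r}")
    return f"{smo_label}/{env_label}"


print(hierarchical_label("Speech"))
print(hierarchical_label("Other", "siren"))
```

In such a scheme the 10-class model would only need to run on segments that the 3-class model assigns to Other.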

Regarding the classification units, as aforementioned, the ETi with an MLP and a 2-dimensional CNN were deployed. It is a fact that the latest deep learning approaches can process raw waveform data [58], but the 2-dimensional topologies deliver the best balance between performance and computational cost and were selected in this case. In addition to this, the ETi method proved to be a lightweight solution for conventional feature-based classification, offering decent performance [59]. The CNN processes mel-spectrogram patches, with a shape of 84 time-steps × 56 bands. Spectral analysis is executed on a 512/256 sample size/step basis with a sampling rate of 22,050 Hz. The convolutional network consists of four consecutive CPD blocks (each one containing successive Convolutional, Pooling, and Dropout layers), a Global Average Pooling (GAP), and two Fully Connected (FC) layers with an additional Dropout layer in between. The number of filters is 16, 32, 64, and 128 for the convolutional layers with a kernel size of 3 × 3, while the pooling size is set to 2 × 2. The number of neurons of the FC layers was set to 64 and according to the number of classes, respectively. A schematic of the deployed CNN architecture is given in Figure 2. The MLP configuration takes as input 200 features, extracted on a 512/256 sample size/step basis and integrated according to the ETi method. The extracted baseline features are 12 MFCCs, Perceptual Sharpness, Perceptual Spread, Spectral Centroid, Spectral Decrease, Spectral Flatness, Spectral Flux, Spectral Kurtosis, Spectral Rolloff, Spectral Skewness, Spectral Slope, Spectral Spread, Spectral Variation, and Zero Crossing Rate. These 25 baseline features are temporally integrated using the Mean Value, Standard Deviation, Skewness, Kurtosis, Mean Absolute Sequential Difference, Mean Crossing Rate, Flatness, and Crest Factor metrics [54], yielding the 25 × 8 = 200 inputs of the MLP. A typical network setup was deployed with two hidden layers, featuring 64 and 32 neurons. Concerning the rest of the parameters, both networks follow the same configuration: The ReLU function was used as activation for all intermediate (Convolutional and Fully Connected) layers and SoftMax for the output layer, Categorical Cross-Entropy as the loss function, and Adam as the optimizer. Dropout was set to 25%.
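The temporal integration step can be sketched as follows: each frame-level feature track is summarized by the eight listed statistics, so that 25 baseline features (12 MFCCs plus 13 further descriptors) yield 25 × 8 = 200 values. The formulas below are common textbook definitions and are assumptions; the exact definitions of the ETi method are given in [54,57]. The synthetic sine tracks and the choice of 40 frames are placeholders for real frame-level features.

```python
import math


def integrate(track):
    """Summarize one frame-level feature track (len >= 2) with eight statistics."""
    n = len(track)
    mean = sum(track) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in track) / n)
    # Standardized third and fourth moments (0.0 for a constant track).
    skew = sum((x - mean) ** 3 for x in track) / (n * std ** 3) if std else 0.0
    kurt = sum((x - mean) ** 4 for x in track) / (n * std ** 4) if std else 0.0
    pairs = list(zip(track, track[1:]))
    # Mean absolute difference between consecutive frames.
    masd = sum(abs(b - a) for a, b in pairs) / (n - 1)
    # Fraction of consecutive frame pairs that cross the mean value.
    mcr = sum((a - mean) * (b - mean) < 0 for a, b in pairs) / (n - 1)
    # Flatness: geometric over arithmetic mean of magnitudes (offset avoids log(0)).
    mags = [abs(x) + 1e-12 for x in track]
    flatness = math.exp(sum(math.log(m) for m in mags) / n) / (sum(mags) / n)
    # Crest factor: peak magnitude over root-mean-square value.
    rms = math.sqrt(sum(x * x for x in track) / n)
    crest = max(abs(x) for x in track) / rms if rms else 0.0
    return [mean, std, skew, kurt, masd, mcr, flatness, crest]


N_BASELINE = 12 + 13  # 12 MFCCs plus the 13 other baseline descriptors
N_FRAMES = 40         # arbitrary integration window length (assumption)

# Synthetic placeholder tracks, one per baseline feature.
tracks = [
    [math.sin(0.1 * (i + 1) * t) for t in range(N_FRAMES)]
    for i in range(N_BASELINE)
]
vector = [stat for track in tracks for stat in integrate(track)]
print(len(vector))
```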

**Figure 2.** Schematic of the deployed CNN architecture, where the succession of the used convolutional, pooling, and dropout blocks (CPD), global average pooling (GAP), and fully connected (FC) blocks and layers is presented. The evolution of the data format along the network is also depicted.
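The evolution of the data format through the described CNN can be traced with a short calculation. The sketch assumes a single-channel input, convolutions that preserve the spatial size ("same" padding), pooling with a stride equal to its 2 × 2 size, and bias terms in every layer; none of these details is stated in the text, so the resulting shapes and parameter counts are estimates under those assumptions rather than reported values.

```python
def cnn_summary(n_classes, time_steps=84, bands=56, filters=(16, 32, 64, 128)):
    """Trace feature-map shapes and count trainable parameters."""
    h, w, in_ch = time_steps, bands, 1
    shapes = [(h, w, in_ch)]
    params = 0
    for out_ch in filters:
        params += 3 * 3 * in_ch * out_ch + out_ch  # 3x3 kernels plus biases
        h, w = h // 2, w // 2                      # 2x2 pooling halves each axis
        shapes.append((h, w, out_ch))
        in_ch = out_ch
    # Global average pooling reduces each feature map to one value, so the
    # first fully connected layer sees in_ch inputs and has 64 neurons.
    params += in_ch * 64 + 64
    # The output layer has one neuron per class.
    params += 64 * n_classes + n_classes
    return shapes, params


shapes, params_10 = cnn_summary(n_classes=10)
_, params_3 = cnn_summary(n_classes=3)
for shape in shapes:
    print(shape)
print(params_10, params_3)
```

Under these assumptions the 84 × 56 patch shrinks to 5 × 3 × 128 before global average pooling, and the model stays close to one hundred thousand parameters for both taxonomies.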

#### **3. Experimental Results**

#### *3.1. Concept Validation: Audience Analysis Results*

To examine the proposed research question regarding the usefulness of an application similar to the one proposed, we undertook an online survey (N = 171). Data collection via an online survey appeared to be the most realistic and feasible method to reach a broad audience that would lead to a representative sample. From the collected sample, 61.4% of the responders were females, 36.4% were males, while 2.3% preferred not to state their gender. Regarding the sample's distribution in the given age groups 18–25, 26–35, 36–45, 46–55, and above 55, the results are 30.4%, 48%, 15.8%, 5.3%, and 0.6%, respectively. In general, the results showed that many people are not familiar with what a soundscape is. In more detail, given six (6) common acoustic scenarios, the participants were asked to identify which of them could be considered soundscapes. The study shows that over 70% of the participants were able to identify the cases in which actual soundscapes were given (e.g., sound of a bell in a village), while on the other hand, about 40% of them had difficulty distinguishing what was rather a false-positive soundscape (e.g., a teleconference). The majority of the participants expressed their interest in the mediated soundscape experience that the current project aims to offer, as thoroughly analyzed below.

In order to balance the diversity of the sample, we selected 104 out of the 171 participants, namely those positively disposed toward soundscape heritage, who consider it an important factor for sustainability, especially in cultural places. This division was also dictated by the fact that some of the questions require a basic background and understanding of soundscapes. Thus, it would be unreliable or biased to give equal weight to the replies on soundscape heritage and semantics of those without a basic comprehension of the associated terms. The results from the selected sample (N = 104) show that only 30% of the participants explore soundscapes once a month. In addition, 30% of the participants record sounds and soundscapes frequently, while 66% of them record mostly cultural-related content. Moreover, 40% stated that they want soundscapes to be available for future reference and/or exploration. The selected sample also featured a clear interest in soundscape preservation over time, while the majority of them (69%) stated that they use their mobile devices for soundscape capturing and sharing. On the other hand, among the smaller share of participants who show no interest in sound heritage (13%) or are moderate about it (26%), almost half capture soundscapes quite often, thus constituting a group of potential application users.

It is noteworthy that although soundscape capturing, sharing, and reproduction are not that widespread, the selected participants showed a high interest in the proposed application. More specifically, 89% of the participants would use an application like the one proposed for soundscape capturing and sharing. Further, 77% would use the application for the reproduction of what was once recorded, either by themselves or other users. Finally, 87.5% of the participants believe that an application similar to the one proposed here would assist in the sustainability of soundscapes' heritage.

Figure 3 provides graph statistics for both the whole (N = 171) and the subset group (N = 104), concerning some of the important questions (namely, #12, #13, #14, and #16). It can be noticed that most users are willing to capture and contribute soundscape recordings, especially the ones belonging to the selected subset (a mean value of 4.03 is observed with a st.dev of ±1.11, compared to the 3.47 ± 1.11 respective values of the entire population). Likewise, almost all participants consider it very likely to reproduce their own or other soundscapes, appraising the impact of the application on sound and soundscape heritage (again, the mean values are higher and with slightly smaller dispersion in the case of the selected sub-group). In summary, the results of the conducted survey validate the first hypothesis (RH1) and the associated research question (RQ1) that there is an audience willing to use the suggested MoJo application, contributing to soundscape heritage crowdsourcing and the subsequent data-driven storytelling (even subjects that do not fully comprehend the underlying principles of soundscape semantics).

**Figure 3.** Results on the probability (**a**) to record and contribute soundscapes (q#12); (**b**) to reproduce recorded (own) soundscapes (q#13); (**c**) to reproduce recorded (others') soundscapes (q#14); and (**d**) on the estimated impact of the application on sound heritage (q#16). Statistical moments of mean and standard deviation (st.dev) are presented both for the entire population (N = 171) and the selected subset (N = 104).
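The statistical moments reported for the 5-point Likert items can be computed as in the following sketch. The responses are invented for illustration and are not the survey data; the population form of the standard deviation is used here as an assumption, since the text does not state which form was applied.

```python
from statistics import mean, pstdev


def summarize(responses):
    """Return (mean, population st.dev) of 5-point Likert responses."""
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("Likert responses must be integers from 1 to 5")
    return mean(responses), pstdev(responses)


# Hypothetical answers to a single question, for illustration only.
answers = [5, 4, 4, 3, 5, 2, 4, 5]
avg, sd = summarize(answers)
print(f"{avg:.2f} ± {sd:.2f}")
```

Applying the same function to the answers of the whole group and of the selected subset would give the pair of values compared in each panel.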

#### *3.2. Audio Classification Results*

Classification results are presented (Table 2) in terms of accuracy statistics (mean value/standard deviation) as they have been extracted by the associated evaluation in


**Table 2.** Classification accuracy (mean ± st.dev%) on the LVLib-v3 and UrbanSound8K Datasets.


The results show that the ETi lives up to the standards of deep learning approaches, especially when computational resources are limited [13,16,56]. This was further investigated, and a computational complexity evaluation was also executed. The additional evaluation involves the measurement of prediction times for both models, and a relative presentation of the results was decided because absolute measurements can significantly vary on different processing units. Table 3 depicts the computational cost in terms of network size and prediction times.
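A relative presentation of prediction times can be obtained by normalizing each measurement by the fastest one, as in the sketch below. The two lambda functions are stand-ins with different workloads, not the ETi and CNN models, and the helper names are hypothetical.

```python
import time


def mean_runtime(fn, repeats=50):
    """Average wall-clock time of one call, measured over several repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def relative_costs(models):
    """Express each model's mean runtime as a multiple of the fastest one."""
    times = {name: mean_runtime(fn) for name, fn in models.items()}
    fastest = min(times.values())
    return {name: t / fastest for name, t in times.items()}


# Stand-in workloads only; real use would wrap the models' predict calls.
models = {
    "light": lambda: sum(i * i for i in range(1_000)),
    "heavy": lambda: sum(i * i for i in range(100_000)),
}
costs = relative_costs(models)
print(costs)
```

Absolute timings differ from machine to machine, whereas the ratio between the two workloads is expected to be more stable, which is the motivation for the relative presentation.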

**Table 3.** Network size and relative computational complexity for the ETi and CNN models.


It can be noticed that in the case of the ETi approach, network size is significantly smaller, facilitating the deployment on devices with low processing power. Nevertheless, the size of the CNN is not so large as to make the deployment of the model impossible on modern mobile computing devices. Summing up, the CNN can equip both client- and cloud-wise semantic analysis services, while the ETi provides adequate performance at the lowest processing cost. These findings directed our decision for selecting the ETi and the CNN as client- and cloud-wise classification solutions, respectively.

Overall, based on evaluation results of the trained models, and the justification concerning the selection of these two demanding datasets, the remaining research hypothesis (RH2) and question (RQ2) are validated/positively answered. Hence, the adopted audio classification schemes, suited for pattern-related soundscape semantics, can be served through relatively light-weight (concerning the required memory and computation load) ML and DL modules. Two related systems have been successfully trained and evaluated as the initial algorithmic backend solutions. The accuracy of those models is already more than satisfactory. However, it can be further enhanced through the users' feedback (and the implicated semi-supervised learning features) deployed within the proposed MoJo framework. Furthermore, the hierarchical and/or hybrid combination of the two taxonomies, along with the initiation of the crowdsourcing process, would lead to the gradual construction of a dedicated dataset. This problem-adapted ground-truth repository would facilitate the training of more sophisticated ML and DL networks with superior performance and additional semantic conceptualization perspectives.

#### **4. Discussion**

The current paper introduces MoJo services updated and adapted to the needs of semantic soundscape crowdsourcing, management, and data-driven storytelling. Based on the conducted experiments, the stated hypotheses have been fully verified, i.e., the audience is interested in such a mobile application (RH1). Furthermore, current technology is adequately mature to reliably deliver the wanted functionalities through General Audio Detection and Classification techniques deployed through Machine and Deep Learning networks to serve the required soundscape semantics (RH2). Furthermore, specific audio processing and semantic analysis features were tested in an effort to quantify the implementation parameters set in RQ1. The configured modalities, both client- and server-wise, exhibit remarkable accuracy with acceptable computational load. Based on the previous experience with the *MoJo-mate* platform [11–24], especially for the data shaping, presentation, and publishing part, the proposed model can efficiently deploy the desired data-driven storytelling and management services, which have a heavy impact on the CH domain. Concerning the technological adequacy and reliability that RQ2 inquires, the proposed integration seems to overcome the expected difficulties and to suitably serve the desired semantic enhancement, documentation, and auralization/reproduction perspectives. Specifically, along with the above-mentioned low-level measurement modes, the software also provides long-term audio analysis capabilities, based on semantic audio processing concepts [56]. This higher-level mode brings real-time audio-pattern recognition, visually resulting in an event detection markup timeline. A dynamic audio-samples database is used as a pattern-storing matrix, which is configurable by users. Samples can be added, by making a simple recording, and deleted as well.

Relying on the *MoJo-mate* application experience, a user-friendly measurement session manager is feasible, allowing each measurement to be easily stored on the mobile terminal memory and recalled on demand. Additional session measurement data can be stored, including title, location, user's comments, etc., while the position is automatically determined utilizing the device GPS. Likewise, timestamps are easily overlaid by the device, while a handy interface allows photo and video capturing of the measurement location, i.e., the recorded soundscape. A cloud-based session manager handles all the users' data, aiming at building a user-generated, spatiotemporal digital map used for storing measurements. Users can store, update, and retrieve raw audio data and their corresponding analysis output. All measurements uploaded to the cloud are accessible by anyone who uses the application. By exploiting the GPS sensor and cellular data capabilities, the application can easily classify and group measurements by geographical location and kind. Thus, a user can instantly check and confirm the correctness of a specific measurement by comparing it to similar ones, provided by other users. They can even obtain the desired data without making a measurement.

Audio recognition usually refers to different recognition tasks, like acoustic scene detection, speech recognition, and speaker recognition. Systems that implement such models are oriented to specific scenarios of recognition. Applying audio recognition to soundscape management is a much more complicated task. The information that can be extracted from the recordings is not pre-defined. Environmental noise can contain multiple layers of audio information and includes a great variety of possible temporal audio events. In the proposed approach, an ensemble of algorithms is employed to compose a hierarchical classification scheme. For example, an algorithm for acoustic scene classification can classify an acoustic scene as "river," while an audio event detection can recognize a "speech" audio event at a certain time, triggering algorithms that extract information concerning speaker diarization and spoken language, thus triggering algorithms that transform speech-to-text, etc. This approach results in several layers or perspectives of audio monitoring, giving the user the possibility to browse through the data with different levels of information abstraction. In the context of environmental recordings, several information layers concerning acoustic characteristics, noise levels, etc. can also be included in the defined hierarchical scheme. Another interesting approach for analyzing complex scenes is automated audio (and audiovisual) captioning. This defines an end-to-end model that maps acoustic scenes to descriptive texts but can also correlate them with associated visual entities.
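The triggering logic of such a hierarchical scheme can be sketched as a small dependency table, in which a detected label activates follow-up analyses. The `TRIGGERS` table and the `expand` function are hypothetical; they mirror the "river"/"speech" example of the text and do not describe an existing implementation.

```python
# Hypothetical trigger table: each detected label lists the analyses it starts.
TRIGGERS = {
    "speech": ["speaker_diarization", "language_identification"],
    "language_identification": ["speech_to_text"],
}


def expand(detected):
    """Return the detected labels plus every analysis they transitively trigger."""
    ordered, queue = [], list(detected)
    while queue:
        label = queue.pop(0)
        if label in ordered:
            continue  # already scheduled; also guards against cyclic tables
        ordered.append(label)
        queue.extend(TRIGGERS.get(label, []))
    return ordered


# A scene classified as "river" in which a "speech" event was detected.
plan = expand(["river", "speech"])
print(plan)
```

Each entry of the resulting plan corresponds to one information layer that a user could browse at a different level of abstraction.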

#### **5. Summary**

The current work focuses on the collaborative collection and documentation of soundscapes and environmental sound semantics. The whole approach has many similarities with sophisticated Mobile Journalism services, assisting professional and citizen journalists in collecting news-items and shaping them into featured data-driven storytelling. Crowdsourcing media assets for cultural heritage is a fruitful field that can engage an audience through successful design and motivation decisions. Along with audio/multimedia content and metadata, semantic annotation can be incorporated through typical sound classification scenarios. A comparative evaluation between a lightweight feature-based machine learning network and a convolutional deep learning architecture was conducted for the terminal and server-side algorithmic approaches, employing two different classification taxonomies with applicable audio datasets. Adopting the LUCID design and development methodology, audience engagement and reinforcement was triggered through an online survey, confirming that users are willing to contribute and appraise the impact of the application to crowdsource sound semantics and soundscape heritage.

The innovation of the paper lies in the incorporation of sophisticated on-site semantic analysis and crowdsourcing dynamics, as they are advanced in today's ubiquitous society (i.e., in the era of big data and the semantic web). Specifically, one of the advantages of this approach, which also highlights one of the main novelties of our work, is that besides collecting and storing resources (recordings of soundscapes and corresponding metadata) from users, it is possible to provide semantically enhanced services on the cloud. Environmental sound recognition is addressed in the paper as one of the featured functionalities using machine learning techniques. Relying on the so-called *MoJo-mate* platform, an analysis is held regarding model elaboration and adaptation for the needs of soundscape heritage. A four-layer, cloud-based architecture was deployed, incorporating two independent flows that employ different-complexity (and computational load) ML/DL models, associated with the client-wise and server-wise (cloud) perspectives for soundscape semantics. The achieved model performance supports the feasibility of the proposed system. The impact of the proposed MoJo-adapted system lies in the ability to document today's soundscapes to be experienced as tomorrow's heritage, taking advantage of semantically enhanced data-driven storytelling.

**Author Contributions:** Conceptualization, C.D. and M.E.S.; methodology, N.V., L.V. and I.T.; software, L.V. and N.V.; validation, M.E.S., L.V. and C.D.; formal analysis, N.V. and I.T.; investigation, M.E.S.; resources, C.D. and L.V.; data curation, C.D., L.V. and N.V.; writing—original draft preparation, all authors; writing—review and editing, all authors; visualization, M.E.S.; supervision, C.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. This data can be found here: [m3c.web.auth.gr/research/datasets (accessed on 2 March 2021)].

**Conflicts of Interest:** The authors declare no conflict of interest.



## *Article* **Holistic Requirements Analysis for Specifying New Systems for 3D Media Production and Promotion**

**Christos Mouzakis <sup>1</sup>, Dimitrios Ververidis <sup>1,\*</sup>, Luis Miguel Girao <sup>2</sup>, Nicolas Patz <sup>3</sup>, Spiros Nikolopoulos <sup>1</sup> and Ioannis Kompatsiaris <sup>1</sup>**


**Abstract:** This paper presents a requirements engineering process for driving the design of new systems that enhance 3D media productivity, lower the entry barrier to 3D media creation, and enable innovative media forms across many media types. The work was carried out with a view to supporting recovery and transformation, as the pandemic has driven many professionals in culture to zero income. Toward this goal, we perform a requirements engineering process based on the IEEE 830 standard for requirements specification. It allows us to elucidate system requirements through existing (AS-IS) and envisioned (TO-BE) scenarios shaped by the latest trends in design methodologies and content promotion in social media. A total of 30 tools for content creation, promotion, and monetization are reviewed, and 10 TO-BE scenarios were engineered and validated. The validation was performed through a survey of 24 statements, rated on a 5-point Likert scale by 47 individuals from the domains of Media, Fine Arts, Architecture, and Informatics. The evaluation results and comments collected can inform future systems design.

**Keywords:** requirements engineering; authoring tools; 3D content; IEEE 830 standard; social media

#### **1. Introduction**

In system engineering and development, the most crucial part is the definition of the requirements. Many projects worldwide start from the assumption that a certain system is needed, a decision that often proves fatal. Google Poly, for example, was a 3D repository platform that was recently shut down, as there were already too many similar platforms in the market [1]. The success stories are those that are driven by wide surveys. For example, the WebVR portal provided an API key only to creators who filled in a survey questionnaire [2]. This process lasted two years, and only then did the WebXR Device API emerge [3]. The methodology followed in our research for requirements analysis is the IEEE 830 Requirements Specification Standard [4–6], which is based on an elicitation, analysis, and validation process through surveys. In this manner, we can engineer requirements in a structured way and derive conclusions more accurately.

**Citation:** Mouzakis, C.; Ververidis, D.; Girao, L.M.; Patz, N.; Nikolopoulos, S.; Kompatsiaris, I. Holistic Requirements Analysis for Specifying New Systems for 3D Media Production and Promotion. *Sustainability* **2021**, *13*, 8155. https://doi.org/10.3390/su13158155

Academic Editor: Charalampos Dimoulas

Received: 7 April 2021 Accepted: 17 July 2021 Published: 21 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Our paper can be useful to research organizations, companies, and policymakers that seek to develop new platforms related to 3D media content. We are investigating crucial features that new platforms should have in order to be attractive to experts, but also to enable non-expert citizens to participate in the creation process. The skills elevation of non-experts in programming and designing is considered by Gartner as a crucial factor for economic growth [7]. According to European Commission reports, the culture and creative ecosystem have been deeply affected by the pandemic [8]. European media SMEs face severe issues, while unemployment has increased, and many media professionals in culture, particularly those who are subject to precarious employment conditions or are freelancers, have found themselves with no income. Cinemas suffered a collapse in revenues (with losses estimated at 100,000 euros per screen per month during lockdown), whilst the shooting of new films, programs, and TV series has been in many cases halted. In parallel, the crisis has accelerated major trends in digital technology. Online platforms have strengthened their market position, launched new services, and attracted new audiences during the lockdowns. New online social media platforms, largely based on audiovisual content, have also hit records in downloads. The audiovisual industries are facing many changes.

One of the new trends is the emergence of XR (eXtended Reality) content in many media applications. XR is a term introduced to cover Virtual Reality and Augmented Reality, and XR content in most cases requires 3D models in order to be constructed. Social media are continuously embedding XR content, for example, TikTok AR filters and the Facebook Horizon platform. New authoring systems are needed that also allow non-experts (non-programmers or non-designers) to be involved in the process, so that the transformation is tackled efficiently without excluding sensitive groups from production. Art and Cultural Heritage (CH) in general also need to conform to this XR norm so as not to risk losing part of their audience. Museums, galleries, and other stakeholders in CH are in need of tools that will allow their content to be consumed through virtual experiences such as games, VR, AR, and blended versions of them, so as to attract their audiences. What more should be done to boost XR content creation by non-experts is a crucial question that we have investigated through our survey.

The paper is structured as follows. In Section 2, several surveys that focus on requirements analysis in the field of 3D media are analysed. Existing tools related to content creation are reviewed in Section 3. In Section 4, we describe the methodology adopted for requirements engineering that is based on finding candidate scenarios for evaluation. In Section 5, we present the results of the survey conducted for evaluating these scenarios. In Section 6, a discussion of the results is provided. Finally, conclusions are derived in Section 7.

#### **2. Review of Related Work**

Below, we review four surveys that perform a requirements analysis procedure for new 3D media platforms. The first survey presents the opinion of entrepreneurs regarding the potential use of XR platforms; the second concerns the issues of existing XR authoring tools in the market; the third identifies issues in the adoption of VR technologies; and the fourth concerns acquiring the opinion of "Lead-users" as a methodology for early AR media product development.

The first survey was performed by the XR Association in 2020 (before the pandemic) with nearly 200 professionals representing startups, enterprise technology firms, and investors [9]. The survey showed that growth in 3D technologies is driven by the rise of AR applications, which are software-based, leading to strong market penetration and lower costs of production. Although only 25% of respondents were working primarily in AR, 76% agreed that the AR market would overtake VR in terms of revenue. Moreover, 85% predicted that the AR market would overtake VR within the next five years. With so many daily digital touchpoints centred on and around the smartphone, it follows that developing specific content for social media channels is the priority for immersive technology companies' content creation efforts. In 2019, only 31% of respondents were creating specific content for social media, but that number reached 47% in 2020. With AR devices available to half of the global population, over two-thirds of respondents expected that businesses would invest slightly or significantly more in immersive technologies in 2020 compared to 2019. With a few clicks, consumers can download apps to visualize clothing purchases, test out new makeup looks, rearrange the furniture in their apartment, or play a game catching virtual objects around their neighbourhood. The main limitation, however, is that this survey was performed before the pandemic, during which the need for VR in telepresence applications emerged.

Several issues with XR authoring tools were identified in 2018 by an academic survey [10]. In particular, it was mentioned that non-technical designers and end-users face a lack of tools to quickly and easily prototype and test new XR user experiences. A review of 20 existing XR authoring tools was performed. Significant technical skills and programming experience were required to create XR experiences. While tools like Unity, Unreal and A-Frame have in many ways become the "standard for XR", they still present a high threshold for non-technical designers and are inaccessible to less experienced end-users. Although there is a new class of tools, most of them web-based, for creating basic XR experiences, allowing users to choose from pre-made 3D models and existing scripts for animation and interactivity, these tools usually cover only a very limited spectrum of the XR design space and still require programming for more advanced application logic.

In another academic survey in 2019 [11], 611 German consumers with different (socio-)demographic backgrounds were surveyed on user acceptance of VR devices. The findings indicate that health and privacy risks diminish adoption rates, whereas, contrary to other technologies, psychological or physical risks do not. Likewise, fashionable designs and wearable comfort (two novel constructs investigated in this research) matter in addition to established utilitarian and hedonic constructs. Finally, this study includes a novel perspective on media technologies by showing that VR-adoption intention is highest when consumers expect to experience both a strong sense of virtual embodiment (the sensation of being another person) and virtual presence (the sensation of being at another place), while the presence of only one of these conditions may even have a negative effect. However, this study is quite old by VR standards, as head-mounted displays have improved significantly in the last two years in terms of quality, comfort, and price.

In 2019, a study aimed to identify the connection between "lead-usership" and technology acceptance in the context of AR media innovation evaluation [12]. Through a sample of 273 participants, the authors tried to predict sales volumes or market shares. The model applied is a lead-usership extended version of the UTAUT2 model. Specifically, lead-users are ahead of important market trends, which lets them experience a need that other users will have at a later point in time. The results have shown that behavioural intention and the effects of influencing factors differ substantially between lead-users and other users. Using an innovative AR mobile gaming app as the object of investigation, the quantitative study found that lead-users' technology acceptance differed from that of non-lead-users. Despite the limitations on the participant group size and background diversity, the findings demonstrated that lead-users not only contribute to the early stages of the innovation management process but can also play a significant role in later stages such as acceptance predictions. This might especially benefit media companies, which often have many consumers who constantly demand innovative products and are willing to contribute to their development.

As the aforementioned surveys suggest, an accurate requirements analysis is crucial for 3D media technologies. Our survey extends them with more up-to-date results for the post-pandemic era and specifies new features that 3D media platforms should have for the authoring and promotion of 3D content in social media.

#### **3. Review of Current 3D Authoring and Promoting Platforms**

This section provides an overview of the existing tools for the creation and promotion of 3D content. These tools are listed in Table 1 and explained below.


#### **Table 1.** Authoring tools for VR experiences in Arts and Culture.



The tools presented in Table 1 can be categorized according to their use into the following categories:


These categories are explained in greater detail in the following.

**Graphics design tools:** Google Tilt Brush and Adobe Medium are useful tools for artists and designers to create visual art in a virtual world and to author virtual tours and galleries. Google Tilt Brush supports features like dynamic brushes, and it has intuitive interfaces mostly suitable for experts in artistic design. It is compatible with Vive, Oculus, Oculus Quest, Windows Mixed Reality, Valve Index, and PlayStation VR headsets. It requires downloading the software from Steam, Humble Store, Vive, Oculus, or the PlayStation Store. Recently, Google Blocks has been introduced, which is similar to Tilt Brush but much more simplified, targeting novice designers. Adobe Medium is a similar application available only for Oculus Rift and Oculus Quest headsets. It supports features like 3D editing and multiplaying. Other design tools available in the market are the "traditional" ones, that is, those based on a desktop screen interface, such as Blender, Maya, 3DS Max, and Cinema4D.

**Professional VR experiences design tools**: Many professional artists rely on programming interfaces and tools to generate content. In this category, Unreal, Unity, and Godot Graphics Engines can be found which are often used to generate VR experiences. Amazon Sumerian is another tool that is based on web technologies for VR applications. It is compatible across VR headsets such as HTC Vive, Oculus Go, Google Daydream, and Hololens. It can be used to create VR tours like virtual museums and other subjects of cultural interest, for example, a 360 video presentation that provides an immersive real-world experience to help travellers select their destination, accommodation, or adventure. It has a 3D editor, Amazon Web Services speech synthesis and recognition, ready to use templates and assets. PlayCanvas is a 3D editor for VR applications targeting novice programmers. The application has a cloud-hosted creation platform that allows multiple users to interact within the project. It supports features like asset uploader, animation control, scripts editor, multiplaying, etc. Wonda VR is an application for novice programmers based both on 3D geometries and 360 media to generate VR experiences. The scope of the application is to author social VR environments for immersive storytelling and interactive branching narratives. It supports features like 3D editor, asset uploader, animation control, scripts editor, and multiplaying.

**Amateur VR experiences design tools:** Artists, journalists, and other amateur content creators often resort to the solutions of this category where VR experiences are easily authored through 360 media. Several tools can be found in this category:


on 360-degree video or a standard video shown in a cinema-style mode. For editors, supports both 360 and 180 VR format.


**Free coding tools for Art:** Many artists prefer to create designs by coding in free programming tools. These tools have gained particular interest as they can be used to make audiovisual content for VR environments. Prominent examples are Isadora, MaxMSP, OpenFrameworks, Three.js, Processing, and VVVV.


**Education on VR design:** This category contains useful tools for educational purposes that help students or employees to easily learn subjects of interest.


like a 3D editor, an asset uploader, and multiplaying. It is also a tool to sketch ideas and explain them to others.

**Storyboard and VFX design tools**: This category contains tools that allow design experts to generate storyboards, virtual productions, and real-time XR productions, and to apply special effects to movies. Graphics Engines such as Unity, Unreal, Godot, and Blender can be found in this category. Other tools in this category are movie editing tools such as Adobe After Effects and TorusMediaLabs 360 Canvas for Adobe Premiere, which make it easy to change the background font or foreground characters with 3D graphics.

#### **4. Requirements Collection Procedure**

A requirement is defined by IEEE [4–6] as: (1) a condition or capability needed by a user or a system to solve a problem or achieve a goal; (2) a condition or capability that has to be provided by a system to fulfil a contract, a standard, a specification or any other formal documentation; and (3) a documented representation of a condition or capability. In our work, particular weight was given to the third definition, as we wanted to find a well-justified and documented representation of a capability for future developments during a Requirements Engineering (RE) process. The overall RE process is depicted in Figure 1 and consists of the following steps: (1) Preparation, (2) Elicitation, (3) Analysis, (4) Specification, and (5) Validation.

**Figure 1.** Requirement Engineering phases.

The **Preparation Phase** consists of three steps. The first is related to scenario modelling. In this step, a "Template for collection of scenarios" is defined. The aim is to distribute this form among artists and journalists so that they can contribute to the identification of art industry scenarios. Based on this input, modelling can be carried out. This step intends to provide the foundations and guidelines for the representation of various types of artists, so that the current process may be analysed and improved. Modelling or representing the current (AS-IS) situation is the basis for identifying shortcomings and potential improvements, and for designing adequate (TO-BE) models. The results of the template, namely the AS-IS and TO-BE models for art, are summarized in 10 cases in Table 2.


**Table 2.** Target user group activities and requirements engineering elicitation phase.

Both the AS-IS model, which represents the current situation as it is, and the TO-BE model, resulting from incorporating the desired improvements, are equally important. Catalysts are the new trends that appear in society and particularly in electronically generated visual arts. These will allow the proposed system to find a track in the market of creative industries.

**Elicitation:** The requirements elicitation phase covers all the actions performed to acquire raw requirements related to what the project intends to develop. The purpose of our project is to construct a platform that can help artists such as those in the first column of Table 2 to do their job more efficiently, more quickly, and at lower cost, or to publish and disseminate it better. In principle, the requirements are defined according to traditional internal expectations such as increased profits, production cost savings, streamlined processes, reduced creation times, shorter processing times, up-to-date information, better communication between production units, or minimized idle times. On the other hand, some of the common external customer and/or market-oriented socio-business expectations are higher process quality and resulting product quality, closer proximity to customers and better customer commitment, and faster communication with market partners [7]. These socio-business indicators are clearly defined so that they can later be used for progress measurement.

A formalization of the requirements according to the IEEE 830 standard characteristics is presented in Table 3 [5]. These characteristics are Unambiguity, Completeness, Consistency, Verifiability, Relevancy, and Feasibility. The most important is Unambiguity, that is, a clearly perceivable definition of the requirement. Many Requirements Specification Languages (RSLs) have existed since the early 1980s. However, we preferred to define the requirements in Attempto Controlled English (ACE), which is a controlled natural language [13]. Although not tested with an ACE parser, these preliminary descriptions confine the definition of each component and can serve as a seed for future formal tests. With respect to Completeness, a short, concise description of the requirement is provided. Consistency concerns conflicts in the implementation of the proposed requirement. For example, some requirements have inconsistencies, the most important of which relates to security. Although the WebGL standard is designed according to security standards, a poor implementation might be open to penetration attacks. As regards the use of web cameras in applications targeted at children, the users will have to grant a web page or a native program access to their camera. This may be an impediment to application adoption. As regards Verifiability, several requirements do not have a quantitative result that can be measured directly, for example, how good the visual effects on a video are or how well the borders of objects are defined in a video stream. In such cases, user evaluation quality tests are required. Relevancy is tied to the scope, the budget, and the contract terms of the project, namely, to develop a 3D-media-related system for artists. As regards Feasibility, the budget and the current state of the art were taken into consideration.
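To make the six characteristics more concrete, the following is a minimal sketch of how a single requirement could be recorded together with a note per characteristic. It is not the template used in the project: the `Requirement` class, its field names, the identifier `REQ-EXAMPLE`, and the sample statement are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# The six characteristics named in the text above.
CHARACTERISTICS = (
    "unambiguity",
    "completeness",
    "consistency",
    "verifiability",
    "relevancy",
    "feasibility",
)


@dataclass
class Requirement:
    """Hypothetical record for one requirement (illustrative field names)."""

    identifier: str
    statement: str  # e.g., a sentence written in a controlled English style
    notes: dict = field(default_factory=dict)  # characteristic -> free-text note

    def missing_characteristics(self):
        """Return the characteristics that have no documented note yet."""
        return [c for c in CHARACTERISTICS if not self.notes.get(c, "").strip()]

    def is_fully_documented(self):
        """True only when every characteristic has a non-empty note."""
        return not self.missing_characteristics()


# Hypothetical example entry, not taken from Table 3.
req = Requirement(
    identifier="REQ-EXAMPLE",
    statement="The system displays a 3D model in a web page.",
    notes={
        "unambiguity": "Stated as a single controlled-English sentence.",
        "verifiability": "No direct metric; needs a user evaluation test.",
    },
)

print(req.missing_characteristics())
print(req.is_fully_documented())
```

A record of this kind makes it easy to see which characteristics still lack documentation before a requirement is passed on to validation.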

In the second step, brainstorming based on the aforementioned scenarios was carried out among stakeholders, namely an experienced artist, an experienced journalist, and the project technical manager, to discuss and present ideas on the tools and solutions that need to be developed or implemented to accomplish the TO-BE scenarios. Requirements elicitation was an iterative activity in which brainstorming and interviews were used. A specific template form for the requirements definition process was defined as an extension of Table 3, with more project-related details that are beyond the scope of this paper. Next, in requirements analysis, user requirements were clarified, categorized, and documented to generate the corresponding specifications. A crucial step is the "Approval of end-users". Therefore, we conducted a survey to gather the opinions of the end-users, namely artists, journalists, architects, and informatics scientists, regarding the posed requirements.


#### **Table 3.** Understanding the requirements.

#### **5. Survey Results**

The main goal of the electronic survey is to allow media content creators to evaluate and validate the proposed TO-BE scenarios and the requirements that were found during the Preparation and Elicitation phases. A total of 47 individuals participated in our survey; they were reached through social media such as Facebook, Twitter, and LinkedIn. The survey consisted of four parts:


In the following, we present the acquired information. In Section 6—Discussion, the results are analysed in detail.

#### *5.1. Demographics*

#### 5.1.1. Nationality of Participants

The nationality distribution of the participants can be seen in Figure 2. The majority of the participants were located in Greece (75%) whereas the rest of the participants were located in Belgium, Bulgaria, Germany, Hungary, Italy, Malta, Spain, Switzerland, and the United Kingdom.

5.1.2. Educational Background of the Participants

As can be seen in Figure 3, 18 of the participants (39%) are from Journalism and Mass Media Communication (MMC). Significant participation also comes from Fine Arts (8 participants, 17%), Informatics (8 participants, 17%), and Architecture (6 participants, 13%). Other participants are from Advertising, Design, Education, Gaming, Humanities, Marketing, and Social Sciences, with one participant (2%) each.

**Figure 3.** Educational Background.
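The shares reported for Figure 3 can be cross-checked against the 47 respondents. The sketch below assumes that each label pairs a head count with a rounded percentage (18 and 39%, 8 and 17%, 6 and 13%, 1 and 2%); this reading is an interpretation, not something stated in the survey data. It also assumes largest-remainder rounding, which is simply one scheme that yields whole percentages summing to 100; the charting tool actually used may have rounded differently.

```python
import math


def largest_remainder_percentages(counts):
    """Round shares to whole percentages that sum to exactly 100."""
    total = sum(counts.values())
    exact = {name: 100 * n / total for name, n in counts.items()}
    rounded = {name: math.floor(x) for name, x in exact.items()}
    leftover = 100 - sum(rounded.values())
    # Give the remaining percentage points to the largest fractional parts.
    by_fraction = sorted(exact, key=lambda k: exact[k] - rounded[k], reverse=True)
    for name in by_fraction[:leftover]:
        rounded[name] += 1
    return rounded


# Assumed head counts per discipline (interpretation of the figure labels).
counts = {
    "Journalism and MMC": 18,
    "Fine Arts": 8,
    "Informatics": 8,
    "Architecture": 6,
    "Advertising": 1,
    "Design": 1,
    "Education": 1,
    "Gaming": 1,
    "Humanities": 1,
    "Marketing": 1,
    "Social Sciences": 1,
}

shares = largest_remainder_percentages(counts)
print(sum(counts.values()))  # number of respondents under this reading
for name, pct in shares.items():
    print(f"{name}: {counts[name]} participants, {pct}%")
```

Under these assumptions the counts add up to the 47 respondents of the survey and the percentages add up to 100, which supports reading the labels as count and percentage pairs.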

#### 5.1.3. Age Group of the Participants

The age group distribution of the participants can be seen in Figure 4. The majority of the participants are from the 18–25 yo group (49%), followed by the 25–35 yo group (28%), 35–45 yo group (15%), and the 45–55 yo group (8%).

**Figure 4.** Age group of participants.

5.1.4. Type of Employment Organization

As can be seen in Figure 5, most of the participants are from universities (55%), Private Companies (24%), and Research Centres (13%). Other types of employment are Freelancers (4%), Media Organizations (2%), and NGOs (2%). As regards individuals from universities, their age distribution can be seen in Figure 6. Mainly Bachelor and M.Sc. students participated with a percentage of 80%.

**Figure 5.** Organization of participants.

**Figure 6.** Age group from universities.

In general, demographics provide more insight into the survey results. As regards the nationality of the participants, all participants are European. Most participants are from Greece (75%), which makes the research mostly focused on the Greek media status quo. As regards the educational background of the participants, Journalism and Mass Media Communication, Fine Arts, Informatics, and Architecture dominate. These disciplines are flexible regarding the use of tools for both 2D and 3D design. As regards the age group of the participants, high participation of youth (18–25 yo) was observed, at about 50%, and participation roughly halves with each older group, down to the 45–55 yo group with 8% (Figure 4). Notably, no participants were above 55 yo. This can be interpreted to mean that youth is keener on using electronic tools for artistic content creation. As regards the organizations that participants stem from, most of the participants were from academia. Private companies also have a strong presence with 24%. From Figure 6, it can be observed that participants from universities are mostly students (80%).

#### *5.2. Previous Experience Collection and the AS-IS Scenarios*

#### 5.2.1. Software Used for Creation

In this question, participants provided information about their experience with existing software for creation activities. Multiple answers could be selected from the list of tools described in Section 3. The results are shown in Figure 7.

**Figure 7.** Most used software tools for creativity.

Most of the participants use Adobe Illustrator (46.8%) for graphics design, Adobe After Effects (40.4%) for applying effects on videos, and Adobe Premiere for editing videos. Next, the Unity3D graphics engine is used for 3D experience production (27.7%). Rhino3D (17%) and Blender (14.9%), both 3D design programs, are also popular. Adobe XD for mobile and web experience design, Google Blocks for 3D design in VR, and the Unreal graphics engine for 3D experience creation are also popular with 12.8%. The Cinema4D and Maya-3DSMax design tools and the Python programming language show some indication of increased usage.

The popularity of the software tools reflects not only the popularity of the tools themselves but also the popularity of each type of content. Adobe's 2D content creation tools such as Illustrator, After Effects, Premiere, and XD are very popular. This can be explained by the fact that most of the participants stem from media and fine arts. The Unity3D and Unreal graphics engines show increased popularity as well, which indicates that game creation and storytelling are popular. However, they have lower rates than 2D creation tools, which can be explained by the fact that they need programming skills and require more training to produce the final result. Rhino, Blender, Maya-3DS Max, and Cinema4D are also popular tools for designing 3D content. The increased rates for Rhino can be explained by the participation of architects in the research, as it is a solution tailored to them. Google Blocks is a surprise, as it was not expected to receive more votes than Adobe Medium and Google Tilt Brush. These are all 3D design tools that are used through VR glasses, but Google Blocks is only for low-polygon models whereas the other two can achieve more realistic models. It seems that low-polygon design is more attractive to do inside VR environments. The Python language is also popular, but this seems to be affected by the participation of the informatics experts.

#### 5.2.2. Promotion and Monetization Software

Participants voted on which software tools they use for promoting and monetizing their work. The distribution of the most voted tools is shown in Figure 8. Social media are prevalent: Instagram leads with 78.7%, Facebook follows with 72.3%, and YouTube is next with 55.3%. LinkedIn is also popular with 42.6%, WordPress personal blogs received 27.7%, whereas TikTok and Blogspot received 25.5% each. Less used are the Twitch gaming social media platform and the Wix web page creator, with 10.6%. The CGTrader and TurboSquid repositories for selling 3D models have low percentages, with 6.4%. Mozilla Hubs and PlayCanvas, as 3D space creation tools for multiplaying activities, received 4.3%.

The results of the question regarding promotion and monetization software tools and platforms indicate that major social media platforms such as Instagram, Facebook, YouTube, and LinkedIn are widely adopted by media creators to promote and monetize their work. "Facebook Creator Studio" and "YouTube Studio" allow monetization through advertisement, whereas LinkedIn is for promoting career opportunities and portfolios. Instagram is a smartphone-centred application without a desktop front-end. The latest add-ons of Instagram, such as Live Shopping, allow one to monetize non-electronic art such as handmade paintings. Another pathway for promotion and monetization is through personal websites built with WordPress, Blogspot, and Wix. All three platforms provide a simple way to make personal websites through templates without requiring programming.

**Figure 8.** Most used tools for promotion and monetization.

#### *5.3. Evaluation of the TO-BE Scenarios and Requirements*

This is the main part of the survey. Six TO-BE scenarios that are relevant to our project were selected from Table 2 [1]. Then, several statements per scenario were composed according to Table 3. Each statement can be rated on a 5-point Likert scale, where 1 corresponds to disagreement and 5 to agreement with the statement. The results for the evaluation of each of the six scenarios are presented in Figures 9–14, respectively.
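The scores quoted for each statement in this section are averages over the Likert ratings, read against the scale midpoint of 3. The sketch below illustrates that reading under stated assumptions: the `summarize` helper and the `example_ratings` list are hypothetical and do not reproduce the survey responses, and treating an average below 3 as disagreement and above 3 as agreement is one simple convention rather than a rule stated in the paper.

```python
from statistics import mean

LIKERT_MIN, LIKERT_MAX = 1, 5
MIDPOINT = (LIKERT_MIN + LIKERT_MAX) / 2  # 3.0 on a 5-point scale


def summarize(ratings):
    """Return (average score rounded to 2 decimals, leaning of the group)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in range(LIKERT_MIN, LIKERT_MAX + 1) for r in ratings):
        raise ValueError("ratings must be integers between 1 and 5")
    avg = mean(ratings)
    if avg < MIDPOINT:
        leaning = "disagreement"
    elif avg > MIDPOINT:
        leaning = "agreement"
    else:
        leaning = "neutral"
    return round(avg, 2), leaning


# Hypothetical ratings for one statement (not the survey data).
example_ratings = [1, 2, 2, 3, 2, 1, 4, 2, 3, 2]
print(summarize(example_ratings))
```

The distance of the average from the midpoint gives a rough sense of how strongly the group leans, which is how scores such as 2.15 or 3.55 are interpreted in the text.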

Scenario 1 refers to the visualization of 3D models in web pages. The participants disagreed with "Current web pages are adequate, there is no need to visualize 3D models", as indicated by the average score of 2.15 (Figure 9a). This reveals that participants consider the presence of 3D models in web pages a great need. According to the results in Figure 9b, participants believe that "The 3D models will significantly increase downloading time and will require high-end client devices". The need for 3D models in web pages stems from the fact that current web page design software does not support the insertion of 3D models in personal web pages, as can be inferred from the results in Figure 9c.

As regards Scenario 2, about the question "Do 3D special effects in web or mobile?", the results are as follows. Most participants disagreed with the statement that mobile and web software will never have the potential of desktop devices, with a score of 2.27 (Figure 10a), indicating that users strongly believe that mobile and web technologies can be incorporated in the media production chain. As regards the choice between web and mobile technologies, participants favoured web technologies, with a score of 3.19 (Figure 10b) vs. 3 for mobile (Figure 10c). Analysing the results further, 17 participants voted for web technologies against 12 for mobile technologies.

**Figure 10.** Scenario 2 validation results.

As regards Scenario 3 about "Making edits on video streams inside VR worlds", the participants agreed that it is difficult to do video edits inside a 3D environment with an average score of 3.25 (Figure 11). In general, the participants were sceptical about this scenario.

**Figure 11.** Scenario 3 validation results.

As regards Scenario 4 about "Promoting artistic media inside VR worlds", the statement about the commercial saturation of the media due to high exposure has received strong disagreement with a score of 2.34 (Figure 12a). This reveals that VR worlds are suitable for promoting artistic and media work. Almost the same was the response about the risk of the 3D models being stolen if exposed in VR environments with a score of 2.19 (Figure 12b) indicating that participants consider VR spaces as a safe place for 3D content. As regards the introduction of a fee to enter VR spaces, the users show a disagreement with a score of 2.70 (Figure 12c). As regards the exposure to and accessibility from social media, the participants have shown great agreement with a score of 3.55 (Figure 12d), indicating that collaboration with social media is an important factor for such a scenario. Even stronger was the opinion that the existing repositories do not promote and visualize content adequately with a score of 3.59 (Figure 12e).

As regards Scenario 5 about the "Theme dedicated VR experiences for personal or joint blogs", the participants disagreed with the statement "It is a bad idea since next generation social VR environments are more attractive" with an average score of 2.59 (Figure 13a). This indicates that there is innovation potential in such a scenario. In general, they found the idea interesting because the features offered can be tailored better to personal interests (average score 3.82, Figure 13b). Also, about the idea of "COVID19 VR nano-worlds", they found it interesting with the highest score observed among all questions, that is, 3.97 (Figure 13c).

**Figure 13.** Scenario 5 validation results.

As regards Scenario 6 about the capability of designing 3D models inside VR spaces, the participants disagreed with the statement "The idea is bad because traditional software allows easier to design with mouse and keyboard" with a score of 2.23 (Figure 14a). They also disagreed that such a design method will be for amateur artists, with a score of 2.59 (Figure 14b). The idea of designing 3D models inside VR spaces therefore appears to suit both amateur and professional creators. However, as can also be inferred from the comments received (see Section 5.4), the interfaces of existing approaches are not very intuitive.

**Figure 14.** Scenario 6 validation results.

#### *5.4. General Comments Received*

The overall comments received by participants were divided into four categories, namely Interfaces, Barriers, Benefits, and Personalization. The comments regarding Interfaces are:


7. VR/AR engineering and training application including detailed human–computer interaction via fingers haptic devices.

Comments 1 and 2 refer to the easiness of the interfaces and particularly the editing capability inside the 3D environment. Comment 3 refers to the accessibility of the solution mostly achieved through the cloud and to the coverage of all types of media. Comment 4 refers to media convergence, that is, to allow the combination of several types of media in order to create a new solid type of media. Freedom to create, project personal content, and expose media outside VR environments should be also possible. Comment 5 is on an online platform for VR content, namely, to make a repository for whole scenes as pre-built solutions for certain scenarios. Comment 6 is on the idea to create 3D models inside the VR environment. Comment 7 suggests that VR applications for training are also interesting if they also exploit peripheral devices such as haptic devices.

It was observed that most individual comments regarded the interfaces of VR environments. They are related to easiness, accessibility, and haptic devices interconnection. It is true that current VR systems require a process to download, install, and learn the interfaces of the design tool. As regards accessibility, many devices still do not allow the use of VR headsets with prescription glasses. Another issue is the cost, as they are still expensive for amateur creators with a cost of around 400 euros. All these seem to be the most important barriers for content creators. Another evident issue is the cloud availability and the co-design capabilities. Many creators would like a cloud-oriented application where assets, scenes, and projects can be easily created, shared, and co-edited in joint repositories. For the time being, most design applications limit the design capabilities to local use or single-user cloud repositories. The low penetration of VR design tools due to the high cost of devices might be a reason that such repositories did not gain acceptance from users.

The comments regarding Barriers are as follows:


Comment 1 refers to the high price of 3D models with respect to 2D media. Comment 2 highlights the need for training materials for using the authoring tools. Comment 3 regards the internet speed that allows one to download 3D models in VR/AR applications on the fly, thus improving the user experience. Comment 4 refers to the collaboration of many users on the creation of VR environments or 3D models, although sharing spaces for collaboration with other users is a far-fetched goal. Comment 5 focuses on a device-independent environment for publishing the results, whereas Comment 6 highlights the need for reducing the cognitive burden so that the user can focus on the creative process. It seems that users have many concerns regarding 3D content creation. Most of the Barriers refer to design inside VR environments, and only one refers to the cost of 3D content, which is for the time being too expensive to be used as an artistic improvement in various applications. As regards VR environments, it seems that users lack sufficient training material. There are limited resources for viewing what other designers are designing in VR, as the approach to creating 3D models in VR space differs from the traditional way of designing 3D models. Shared spaces for multiple users to design in seem to be a far-fetched target for the time being, as many issues are not yet solved even for a single user. A comment also refers to 5G connectivity, which will improve downloading times.

As regards the category Benefits, the comments received are as follows:


It is inferred from Comment 1 that creators should have some benefits after creating the content. Comment 2 refers to the variety of media so the space can be commercially exploitable, as well as to the interaction among the designers in order to improve the final result. Comment 3 refers to the combination of media that will allow one to enhance creativity.

As regards the Personalization category, two comments were received, namely:


Comment 1 refers to the use of VR spaces as a replacement for the traditional 2D web pages. Comment 2 is addressing the fact that VR experiences can be expanded to illusionary spaces rather than limited to the representation of real spaces. In general, the comments refer to monetization, namely the mutual professional benefits and the existence of a variety of subjects and costs. This might be interpreted as an increased need for a monetization plan that allows the provision of mutual benefits to collaborators in the VR space. Another comment states that new mediums that might be interpreted as new media can actually merge all types of media.

#### **6. Discussion**

Several dimensions of our requirements engineering approach are topics for discussion. Firstly, the demographics of the survey suggest that the tools to be developed should be tailored to two extreme poles, namely a) fit well to youth, for example, using a low-price plan, or b) to companies with a high-budget plan. As regards the creation tools and the type of software, in general, we can infer that big companies dominate with tools tailored to experts in design and in programming. Adobe desktop tools for 2D content creation are widely used by designers, a trend that is difficult to change. As regards 3D content creation, tools are not limited to certain brands, apart from the Unity3D graphics engine that dominates in programming XR applications. Promotion and monetization are correlated to the exposure of content on Instagram, Facebook, YouTube, and LinkedIn. It seems that Instagram is more popular for 3D content creation promotion than any other social media platform. Another fact is that content creators tend to have their own website in WordPress, Blogspot, and Wix as these three platforms provide a simple way to make a personal website through templates.

As regards the evaluation of scenarios, Scenario 1, namely the exposure of 3D models on websites, is ranked highly according to the responses of the users. Content creators would like to publish their 3D models on web pages as long as the downloading time and the end-user device requirements are kept low. It is also inferred that creators do not have or do not know a certain methodology to expose 3D content. If this is combined with the conclusions of the previous paragraph, it seems that content creators do not know how to embed 3D models in WordPress, Blogspot, and Wix. Exposing 3D content (or images of it) on Instagram, Facebook, YouTube, and LinkedIn is very important for dissemination purposes. Scenario 2, about achieving visual effects on web or mobile, leans towards web applications for both amateur and professional users. This might be interpreted to mean that the latest technologies such as 5G internet, WebAssembly, and the upcoming WebGPU [15] standards have increased the creators' trust in web browsers. The conclusions from Scenario 3 are rather mild. Users did not lean clearly in either direction, indicating only that the editing of videos inside VR space is difficult or not interesting to them. Scenario 4, about the promotion of artistic media inside VR worlds, has also raised the interest of participants. It is inferred that creators believe that their content is not promoted well on current repositories. The accessibility of VR spaces from social media and the exposure of content to social media are also important. As regards Scenario 5, the idea of a personal or joint blog in a VR space seems to be attractive as a means to better express personal interests. The idea of a nano VR space with information regarding biology, and most specifically COVID-19 mechanisms, seems to be more attractive than illusionary spaces of other types. Towards this direction, one of the most unexploited resources in Art, but prominent in Biology, is the Protein Data Bank (PDB) [16]. It provides rich data for visualization, such as the SARS-CoV-2 (COVID-19) protein model shown in Figure 15.

**Figure 15.** Scientific 3D models repositories can be used as a source of 3D content [16].

Scenario 6, which addresses the case for designing 3D models in the VR space was also attractive. However, it was prompted that the interfaces should be improved, the headsets should be more accessible, the software should be better promoted, and sufficient training material should be available.

According to other requirements analysis surveys as reviewed in Section 2, the future of 3D platforms is promising but certain aspects should be improved. An issue is the quality of the existing authoring tools for non-experts as they are not offering the number of options that tools for experts (e.g., Unreal or Unity) offer. Another issue is the comfort of XR headsets and their price.

The IEEE 830 requirements specification methodology has two additional steps, namely Requirements Specification, where the requirements are brought into a suitable and unambiguous form, and Requirements Validation, namely the review and validation of the requirements for clarity, consistency, and completeness. These are future steps that will be executed internally in the project with the ultimate goal of arriving at a commonly agreed collection of raw requirements. The methodology also has a mechanism for defining lower-granularity requirements, namely a role-playing theatre, also referred to as a game (see the bottom of Figure 1). This role-playing theatre is used to simulate how a project's stakeholders would interact among themselves and with the hypothetical prototype of the project. It is divided into three steps: Preparation, Execution, and Review. The Preparation step deals, as the name indicates, with the preparation and implementation of the theatrical plays (games) to be used in the next step. The Execution (gaming) step comprises the workshop, where the gaming sessions happen. The Review step consists of the analysis and documentation of the requirements in each repetition. This game will be executed multiple times to find the "hidden" requirements and to improve the results.

#### **7. Conclusions and Future Work**

The 3D media sector is booming, especially due to the innovative technologies that arise for collaboration in creation, the democratization of hardware and software, the XR capabilities of modern mobile devices, and the increased demand for XR media due to the pandemic. However, most of the content creation tools today mainly target programmers and skilled designers, leaving general users and experts in culture out of the media creation pipeline. In order for the 3D media sector to flourish, the next generation of 3D media platforms should have the following characteristics: (a) Easy-to-use interfaces for media content creation in VR; (b) Collaborative characteristics for content creation and integration of the content creation in the production pipeline for non-experts; (c) VR environments as social spaces for enhancing promotion and interaction; (d) Methodologies to export created 3D content to personal web pages/spaces; and (e) Capability of creating content in Web or mobile platforms. We foresee that open-source technologies will allow more inclusive approaches and participation of research organizations in the media production pipeline.

**Author Contributions:** Conceptualization, L.M.G.; Investigation, C.M.; Methodology, D.V.; Supervision, N.P., S.N. and I.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research leading to these results has received funding from the European Union H2020 Horizon Programme (2014–2020) under grant agreement 957252, project MEDIAVERSE (A universe of media assets and co-creation opportunities at your fingertips): https://MediaVerse-project.eu, 1 October 2020.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki, and ethical procedures approved by the Ethical Committee at the Universitat Autònoma de Barcelona (protocol code 5207 approved on 05/11/2020).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Dimitris Dimitriadis <sup>1,\*</sup>, Sofia Zapounidou <sup>2</sup> and Grigorios Tsoumakas <sup>1</sup>**


**Abstract:** Manual classification of works of literature with genre/form concepts is a time-consuming task requiring domain expertise. Building automated systems based on language understanding can help humans to achieve this work faster and more consistently. Towards this direction, we present a case study on automatic classification of Greek literature books of the 19th century. The main challenges in this problem are the limited number of literature books and resources of that age and the quality of the source text. We propose an automated classification system based on the Bidirectional Encoder Representations from Transformers (BERT) model trained on books from the 20th and 21st century. We also dealt with BERT's constraint on the maximum sequence length of the input, leveraging the TextRank algorithm to construct representative sentences or phrases from each book. The results show that BERT trained on recent literature books correctly classifies most of the books of the 19th century despite the disparity between the two collections. Additionally, the TextRank algorithm improves the performance of BERT.

**Keywords:** semantic indexing; text classification; Greek literature; TextRank; BERT

#### **1. Introduction**

The role of cultural heritage in sustainability is mainly perceived in terms of monuments and sites and their role in raising awareness of the landscape or in contributing to the touristic development of the area. Literature can have an important role in this respect, as it helps people understand the cultural context of an era, of a nation, of a monument, of a place, etc., thus enabling them to adopt more inclusive and equitable attitudes and behaviors. From an opposite perspective, the sustainability of cultural heritage through information technology is also very important, as several pieces of cultural content have not yet been fully digitized. This is particularly relevant to works of literature, especially of the past centuries.

Semantic indexing of works of literature using concepts related to their subject, such as genre or form terms, is an important process enabling the work to be searched and retrieved by such concepts. Taking into account that people often exhibit preferences for specific genres [1–3], information about genre is a useful search filter for finding literature. Moreover, without indexing with relevant metadata, works of literature are not easy to discover and eventually use, which puts the maintenance and sustainability of such cultural content at risk. A well-known literature classification scheme is the Library of Congress Genre/Form Terms for Library and Archival Materials (LCGFT) [4], which is used by most libraries around the world [5]. Manual indexing of works of literature with concepts is a time-consuming task that requires domain expertise. Automated indexing systems based on natural language understanding can help humans do this work faster and more consistently.

This paper documents our approach towards the construction of a system for automatically classifying Greek books of the 19th century with genre/form concepts. Our work is part of the ECARLE research project [6], which concerns the semantic enrichment of 19th-century Greek books with metadata, such as layout information, named entities, relations among named entities, and form/genre terms. ECARLE compiled a dataset of 57 Greek books of the 19th century, classified as essays, prose, poems, and manuals. A distinctive aspect of our work is that Greek books of the 19th century were written mainly in the *katharevousa* variety of the Greek language, a language in between ancient Greek and the modern *demotic* variety of Greek. Working with text in katharevousa is challenging, as state-of-the-art natural language processing resources, such as embeddings and pre-trained neural language models, are based on modern Greek corpora. Most of the past approaches for literature classification have focused on modern Greek [7–9].

**Citation:** Dimitriadis, D.; Zapounidou, S.; Tsoumakas, G. Semantic Indexing of 19th-Century Greek Literature Using 21st-Century Linguistic Resources. *Sustainability* **2021**, *13*, 8878. https://doi.org/10.3390/su13168878

Academic Editor: Julie Ernst

Received: 30 June 2021 Accepted: 4 August 2021 Published: 9 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Nowadays, state-of-the-art models for text classification employ deep learning and in particular pre-trained language models, which are fine-tuned on training data from the task at hand [10,11]. Following this paradigm, our approach leverages a version of the BERT model pre-trained on modern Greek corpora [12]. To the best of our knowledge, this is the first work to use BERT for classifying Greek literature in general. Most of the past studies put more emphasis on feature engineering [9,13].

One limitation of BERT is that it cannot take large pieces of text as input. To address this issue, we use the TextRank algorithm [14] to extract from each book a representative set of sentences or phrases, which are then passed as input to BERT. This has been recently proposed in [15], but for a different task (information retrieval) and domain (legal documents) than in our case. Other approaches to dealing with long documents include extending BERT [16,17], and selecting parts of the document [10,18].

As the size of the ECARLE dataset was too small for training or fine-tuning models and at the same time evaluating the accuracy of the trained models, we constructed a second annotated dataset of 755 books from the 20th and 21st century by leveraging the content of the *Open Library* (OL) [19], a repository of more than seven thousand Greek digital books distributed freely and legally on the Internet in PDF format. In our empirical study, we use a large part of the OL dataset for fine-tuning BERT and the rest of the OL dataset, as well as the ECARLE dataset, for measuring the accuracy of the fine-tuned model.

In summary, this work makes the following main contributions: (i) a case study on semantic indexing of 19th-century Greek literature using 21st-century state-of-the-art models and linguistic resources, and (ii) two collections of Greek books classified as essays, prose, poems, and manuals, which span three centuries [20].

Our work aims to answer the following research questions:


To answer the first question, we experimented with two data sets, one from the 19th century (ECARLE) and one from the 20th and 21st century (OL). We used the BERT model, which has previously achieved state-of-the-art results. We hypothesize that BERT will manage to transfer knowledge from modern Greek to katharevousa, as it models language at the sub-word level and the two Greek language varieties share common sub-words. To answer the second question, we experimented with the TextRank extractive summarizer to extract sentences/phrases, hypothesizing that it will manage to distill the necessary information for genre/form classification from each book.
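The sub-word hypothesis can be illustrated with a toy example. The sketch below implements a greedy longest-match-first split in the style of WordPiece. The vocabulary `toy_vocab` is hypothetical and is not the vocabulary of the Greek BERT model used in this work; the word pair is chosen only to show how an older inflected form can decompose into a piece shared with its modern counterpart.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first sub-word split, in the style of WordPiece."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

# Hypothetical vocabulary (assumption for illustration only).
toy_vocab = {"τέχνη", "##ν"}
modern = wordpiece("τέχνη", toy_vocab)   # modern form
older = wordpiece("τέχνην", toy_vocab)   # older form with final -ν
shared = set(modern) & set(older)        # sub-words common to both forms
print(modern, older, shared)
```

In this toy case the two forms share their main piece, which is the kind of overlap the hypothesis relies on; how often it occurs with the real vocabulary is an empirical question.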

The rest of this paper is structured as follows. Section 2 presents a comprehensive review of related work and the current state of the art in text classification. Section 3 outlines the methodological approach we used. Section 4 describes the evaluation of our approach, including the experimental setup (parameter tuning and datasets preprocessing), the results on both the OL and the ECARLE datasets, and the discussion of the results. Finally, Section 5 concludes this paper and mentions directions for future work.

#### **2. State of the Art and Related Work**

We first discuss traditional and state-of-the-art methods for text classification in general. Next, we review related work focused on semantic indexing of literature, contrasting them with the approach we adopted in this work.

#### *2.1. State of the Art*

Traditional approaches to text classification typically involve feature engineering and supervised learning. The most common machine learning algorithms used in this task are decision trees, pattern rule-based classifiers, support vector machines (SVMs), neural networks, and Bayesian classifiers [21], while in the feature selection phase, several evaluation metrics have been explored, such as term frequency, information gain, and chi-square [22].

Recent approaches work directly with the text, using complex neural networks for extracting patterns. The authors of [23] use dynamic embedding projection-gated convolutional neural networks, outperforming other approaches on four well-known text classification datasets. A hybrid approach was used by [24] incorporating predictive text embedding and graph convolutional networks for text classification to address the limitations of both methods enabling faster training and improving performance in small labeled training set scenarios. The study in [25] uses a vector representation based on a supervised codebook in document classification in Nepali. These approaches perform well in text classification, but they have not been tested in a large number of benchmarks and natural language processing tasks.

Transformer-based models generally perform well in many natural language processing tasks, including text classification. A recent study shows that BERT and its variations outperform all the previous machine and deep learning techniques, such as SVMs, naive Bayes, convolutional, and recurrent neural networks [11]. The study shows the results of many text classification tasks and benchmarks, such as sentiment analysis, question answering, and topic classification. Towards this direction, we use the BERT model to classify Greek books based on their form/genre concepts, and we build on previous research to overcome the problem of long documents by leveraging the TextRank algorithm to create appropriate inputs for the model.

#### *2.2. Related Work*

For genre classification of literary texts, the most common features are based on stylometrics [26], which are extracted by statistical analysis of the texts [27–30]. The authors of [27] used as style markers (i.e., countable linguistic features) the frequency of occurrence of the most frequent words of the entire written English language, taken from the British National Corpus, to automatically detect the genre of a given text, showing that these frequent words are reliable discriminators of text genre. In the same spirit, we use the TextRank algorithm to extract sentences/phrases, considering that the output of the algorithm can be used as a discriminator of genre/form concepts. Stylometrics, content-based features, and social features were used in [28] for genre classification of German novels. The authors showed that even though topics are orthogonal to genre, the SVMs considering topic-based features achieved the highest accuracy compared to other traditional machine learning algorithms such as k-nearest neighbor, naive Bayes, and multilayer neural networks. Our study shares the same limitations with this work in terms of the amount of data for training the models. However, we also have an extra obstacle related to the language varieties.

Other studies on analyzing literary texts place more emphasis on the representation of each document as a multidimensional vector, which is given as input to a classifier. Towards this direction, Yu [31] experimented with four different text representations (based on absence/presence of a word, frequency of words, normalized frequency of the words, and idf-weighted frequency of the words) in eroticism classification of Dickinson's poems and sentimentalism classification of chapters in early American novels. The authors of [32] propose the encoding of books as binary vectors, where each dimension corresponds to the existence or not of a character 4-gram in the document. In contrast to these approaches and other similar ones, we use the BERT model, which can both represent the given input as a dense vector and categorize it into a genre/form concept. In this way, the model itself encodes the information needed for solving the task, instead of being affected by human choices on document representation.
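To make these hand-crafted representations concrete, the following sketch builds the four bag-of-words variants and a binary character 4-gram vector for toy documents. It is not the code of [31] or [32]: whitespace tokenization, normalization by document length, and the idf form log(N/df) are assumptions, and the function names are illustrative.

```python
import math
from collections import Counter

def representations(docs):
    """Build four bag-of-words vectors per document over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = {w: sum(w in set(toks) for toks in tokenized) for w in vocab}
    out = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        out.append({
            "presence": [1 if counts[w] else 0 for w in vocab],
            "frequency": [counts[w] for w in vocab],
            "normalized": [counts[w] / total for w in vocab],
            # Assumed idf form: log(N / df); other variants exist.
            "tf_idf": [counts[w] * math.log(n_docs / df[w]) for w in vocab],
        })
    return vocab, out

def char_4gram_vector(doc, gram_index):
    """Binary vector: 1 if the character 4-gram occurs in the document, else 0."""
    grams = {doc[i:i + 4] for i in range(len(doc) - 3)}
    return [1 if g in grams else 0 for g in gram_index]

docs = ["the sea the sky", "the road home"]
vocab, reps = representations(docs)
print(vocab)
print(reps[0])
print(char_4gram_vector("the sea", ["the ", "sea!", "e se"]))
```

Each choice here (tokenizer, weighting, n-gram size) is a human design decision, which is the dependence that a fine-tuned BERT model is meant to avoid.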

In the case of very long documents such as books, the extraction of smaller parts of texts is necessary. In this context, Worsham and Kalita [18] propose different methods to select a representative text for training the learning models, such as extracting the first, last, or random 5K words of a book or 2.5K words of each chapter of the book. We similarly sliced the text into parts, but we also proposed the use of TextRank for constructing more sophisticated sets of sentences/phrases for the classifier.
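The simple slicing strategies mentioned above can be sketched as follows. This is an illustration under assumptions, not the implementation of [18]: words are taken as whitespace-separated tokens, and "random" is read here as a contiguous window starting at a random offset, which is only one possible interpretation.

```python
import random

def first_words(words, n):
    """First n words of the book."""
    return words[:n]

def last_words(words, n):
    """Last n words of the book."""
    return words[-n:] if n > 0 else []

def random_window(words, n, seed=None):
    """Contiguous window of n words at a random offset (an assumed reading of 'random')."""
    if len(words) <= n:
        return list(words)
    rng = random.Random(seed)
    start = rng.randint(0, len(words) - n)  # inclusive bounds
    return words[start:start + n]

# Toy "book" of 20 tokens; a real setting would use n = 5000 on the full text.
words = [f"w{i}" for i in range(20)]
print(first_words(words, 5))
print(last_words(words, 5))
print(random_window(words, 5, seed=0))
```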

There are also approaches which categorize books by focusing on other parts of the book, such as the book cover [33], as well as approaches which categorize web documents [34]. These approaches are out of the scope of this paper, as the first one considers text with images or only images, while the second one considers web documents that have a very different structure to literature books.

There are few studies on text classification of Greek literature. Most of them show the language independence of their approach in the genre identification task [8,9,35], or test several feature engineering approaches [7,13]. None of the existing studies dealt with the language variety problem and, as far as we know, none has experimented with text classification of Greek literature leveraging transformer-based models.

#### **3. Methodological Approach**

This section introduces the approach that we used for constructing a 19th-century Greek literature semantic indexing model, as well as the two datasets that are involved in our study.

#### *3.1. Approach*

Our approach is based on a BERT model that was trained on the Greek language [36]. Specifically, it was trained on the Greek part of Wikipedia, the Greek part of European Parliament Proceedings Parallel Corpus, and the Greek part of OSCAR, a cleaned version of Common Crawl. We fine-tune this model on the semantic indexing task of our case, using the modern Greek books of the OL dataset. We then evaluate it on the 19th-century books of the ECARLE dataset.

As BERT cannot accept very long sequences of text as input, we had to find a way to distill representative content from the books that could both fit as input to BERT and contain useful information for discriminating among the four different literature categories. A common way to deal with this problem is to select parts of a long document, e.g., randomly, up to the desired number of tokens. Another way is to use transformer-based models to represent several parts of the document, which are then used by a traditional supervised model as input. Here, we adopt a simpler method that employs TextRank [14], an extractive summarization algorithm, to distill a small set of representative sentences/phrases from each document. These pieces of text are then given as input to the BERT model for the classification task.

Figure 1 illustrates this approach. The long document is given as input to TextRank, which extracts the top *N* sentences/phrases and passes them to the BERT model. The input sentences/phrases are separated by the [SEP] special token which corresponds to the end of a sentence.

**Figure 1.** A transformer-based classifier trained on sentences/phrases extracted from TextRank.
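The pipeline of Figure 1 can be sketched in a few lines. The code below is a simplified, self-contained TextRank over sentences, using the word-overlap similarity of the original TextRank formulation and a PageRank-style iteration, followed by joining the top *N* sentences with the [SEP] token. It is not the implementation used in this study: the naive sentence splitter, the damping factor of 0.85, the number of iterations, the value of `top_n`, and keeping the selected sentences in document order are all assumptions.

```python
import math
import re

def split_sentences(text):
    # Naive splitter on sentence-final punctuation (assumption for illustration).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def similarity(a, b):
    """Shared words divided by the sum of the log sentence lengths."""
    wa, wb = re.findall(r"\w+", a.lower()), re.findall(r"\w+", b.lower())
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    if denom <= 0:
        return 0.0
    return len(set(wa) & set(wb)) / denom

def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over the similarity graph."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores

def build_bert_input(text, top_n=2):
    """Select the top_n ranked sentences and join them with the [SEP] token."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:top_n])  # document order is an assumption
    return " [SEP] ".join(sentences[i] for i in chosen)

text = ("The sea is calm today. The sea is rough and the wind is strong. "
        "Birds fly. The wind over the sea is strong today.")
print(build_bert_input(text, top_n=3))
```

The resulting string would then be tokenized and truncated to the maximum sequence length of the model before fine-tuning; that step depends on the tokenizer and is omitted here.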

#### *3.2. Data*

Literature experts selected 107 Greek books [20] from the library of the Aristotle University of Thessaloniki for the purposes of the ECARLE project. The experts classified these books into eight categories: prose, poetry, letters, essays, lexicons, encyclopedias, magazines, and manuals. The books were already in digital form through scanning. However, extracting their text via OCR proved to be very challenging. Despite employing several state-of-the-art tools, from open source libraries and commercial software to training deep learning models on sample transcribed pages of the books [37], the OCR accuracy was far from satisfactory. Some characters were not recognized correctly, e.g., πέχνην instead of τέχνην *(art)*, or were completely missed, e.g., ςρεθλόν instead of παρελθόν *(past)*, by the OCR. In some cases, the intensity of this phenomenon led to whole words and phrases missing or without meaning. In addition, sometimes the OCR process misinterpreted page headers and footers as normal page text. Due to these difficulties, the extraction was accomplished for only 57 of the books, 29 of which were essays, 15 prose, 11 poems, and 2 manuals.

As the size of the ECARLE dataset was too small for training machine learning models, we constructed a second dataset by leveraging the content of the *Open Library*, a repository of more than seven thousand Greek digital books distributed freely and legally on the Internet in PDF format. The digital books of Open Library are classified into 40 thematic categories. In addition, each book is accompanied by one or more tags and metadata, entered by the platform's administrators. The literature books of this repository are classified into 8 main categories: classic literature, novels-novellas, short stories, poems, essays, plays, children's literature, and comics. Two out of the four categories of interest exist in this categorization: poems and essays. The novels-novellas and short stories categories were jointly considered as members of the prose category. Lacking a category related to manuals, we considered the books having the word manual in their metadata.

Typically, the category of each book is also included in the metadata. This is not the case for all books, however. In addition, we observed that some books that were classified in one of the categories of our interest had another one in the metadata. For example, the book Ήσουνα κάποτε εδώ (*You were once here*) belongs to the *poetry* class, but includes prose in the metadata. To avoid noisy examples, we decided to keep books that include their category in the metadata and at the same time do not contain another member from our category set in their metadata. Furthermore, we manually removed books that did not contain any readable characters due to the PDF extraction process. The final dataset contains 124 essays, 254 prose, 177 poems, and 200 manuals from the 20th and 21st century. For extracting the plain text from the PDF files of Open Library, we used the Python library PDFMiner [38].
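The filtering rule can be written as a small predicate. This is only a sketch: how Open Library metadata is actually stored is not described here, so the representation of metadata as a list of normalised terms and the name `keep_book` are assumptions.

```python
# The four categories of interest in this study.
TARGET_CATEGORIES = {"essays", "prose", "poems", "manuals"}


def keep_book(category, metadata_terms):
    """Keep a book only if its metadata mentions its own category
    and no other category from the target set."""
    found = TARGET_CATEGORIES.intersection(metadata_terms)
    return found == {category}
```

Under this rule, a poetry book whose metadata also mentions prose is discarded, as is a book whose metadata does not mention its category at all.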

Figure 2 illustrates the workflow of creating the two datasets, including the conversion and preprocessing of the original collections. First, we collected the books from the Open Library and the library of the Aristotle University and applied PDF extraction with PDFMiner and OCR conversion, respectively (technical details about the OCR can be found in [37]). Some books of the Open Library did not have readable characters at all, so we manually removed them. Each dataset then passes through the TextRank algorithm, using the PyTextRank [39] extension for spaCy, to extract sentences/phrases for each book. The resulting texts are tokenized with the BERT tokenizer provided by the transformers [40] Python library. After tokenization, the OL and ECARLE datasets are ready for training and testing.

**Figure 2.** The workflow of creating the OL and ECARLE datasets.

Figure 3 shows the histogram of the publication year of the books in the two datasets. As we can see, there is a gap from 1900 to 1970 between the two datasets, apart from three books which were published in 1917, 1959, and 1963, respectively. All the books in the ECARLE dataset were published before 1900, while most of the books in the OL dataset were published after 2010. One important difference between the books of the two datasets, stemming from the different centuries in which they were written, is the variety of the Greek language that they use. Books in the ECARLE dataset are mainly written in the katharevousa variety, while books in the OL dataset are written in modern Greek (demotic variety).

**Figure 3.** Histogram of the publication year of the books in the OL and ECARLE datasets. The x-axis corresponds to ranges of years, the left y-axis to the number of books in the ECARLE dataset in a specific range, and the right y-axis to the number of books in the OL dataset.

Differences between the two datasets have also been observed in the number of words contained in each book (Figure 4). The books in the OL dataset have approximately three thousand words on average, while the books in the ECARLE dataset have more than four thousand. There are also some outlier books with more than seven thousand words. To find the words in the datasets, we used the el\_core\_news\_lg model of the spaCy [41] Python library and ignored tokens belonging to the PUNCT and SPACE classes, since the first one includes all punctuation marks and the second one all space characters.

**Figure 4.** Distribution of the number of words in books of the OL and ECARLE datasets. The outliers are missing for the sake of visualization.

#### **4. Evaluation**

This section describes the experimental setup and discusses the results. We first present the datasets that were used for training and testing. Next, we mention the hyperparameter tuning process and present the results on both datasets. Finally, we discuss and explain the results.

#### *4.1. Experimental Setup*

We split the OL dataset into a train and a test set, in a way such that the distribution of the classes in the test set is the same as in the ECARLE dataset. This allows for a more informative comparison between the accuracy of the model at in-sample modern Greek text and out-of-sample katharevousa text. As a result, the training set consists of 698 books and the test set 57. Table 1 presents the number of instances per class and the total number of instances for the train and test sets of the OL dataset, as well as for the ECARLE dataset.
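A split of this kind can be sketched as follows. Table 1 gives the actual counts; the sketch assumes that the test set takes exactly the ECARLE class counts reported in Section 3.2 (29 essays, 15 prose, 11 poems, 2 manuals), which is consistent with the stated sizes, since 755 − 57 = 698. The function name and the random seed are illustrative.

```python
import random
from collections import Counter

# ECARLE class counts from Section 3.2, reused as the test-set quota (assumption).
ECARLE_COUNTS = {"essays": 29, "prose": 15, "poems": 11, "manuals": 2}


def split_matching_distribution(labels, test_counts, seed=0):
    """Sample a test set with a fixed number of books per class;
    everything else becomes the training set."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    test_idx = []
    for label, quota in test_counts.items():
        test_idx.extend(rng.sample(by_class[label], quota))
    chosen = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in chosen]
    return train_idx, sorted(test_idx)


# OL class sizes as reported: 124 essays, 254 prose, 177 poems, 200 manuals.
ol_labels = ["essays"] * 124 + ["prose"] * 254 + ["poems"] * 177 + ["manuals"] * 200
train_idx, test_idx = split_matching_distribution(ol_labels, ECARLE_COUNTS)
train_counts = Counter(ol_labels[i] for i in train_idx)
```

Under this assumption the training set would hold 95 essays, 239 prose, 166 poems, and 198 manuals.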


**Table 1.** Number of instances per class and total number of instances for the train and test sets of the OL dataset, as well as for the ECARLE dataset.

TextRank was used in three different variations: we extracted phrases with a rank score greater than 0.01, as well as the top 5/10 sentences. In addition, we experimented with splitting each book into three equal parts and considering the first 256 tokens of each part, as 256 is the maximum sequence length that our BERT model can accept.
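The consecutive-parts baseline can be sketched as below. How part boundaries were computed and whether special tokens count towards the 256-token limit are not stated, so the integer-division boundaries and the name `part_windows` are assumptions.

```python
def part_windows(tokens, n_parts=3, max_len=256):
    """Split a token sequence into n_parts roughly equal parts and
    keep at most the first max_len tokens of each part."""
    n = len(tokens)
    windows = []
    for p in range(n_parts):
        start = p * n // n_parts
        end = (p + 1) * n // n_parts
        windows.append(tokens[start:min(end, start + max_len)])
    return windows
```

For a book of 900 tokens this yields three windows starting at positions 0, 300, and 600, each 256 tokens long.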

To find the appropriate hyper-parameters for fine-tuning BERT to our classification task, we used stratified 5-fold cross-validation on the train set of OL. We followed the instructions of BERT's creators [42] for the fine-tuning process, experimenting with the following set of parameters: (i) learning rate 2 × 10<sup>−5</sup>, 3 × 10<sup>−5</sup>, 5 × 10<sup>−5</sup>; (ii) batch size 16, 32; and (iii) epochs 2, 3, 4. Table 2 shows the selected hyper-parameters, with respect to each different method of input selection for the BERT model, along with the corresponding accuracy of the model.
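The search space amounts to 3 × 2 × 3 = 18 configurations, each evaluated on five folds. The sketch below enumerates the grid and builds stratified folds with the standard library only; the round-robin fold assignment and the function names are assumptions, and the fine-tuning step itself is omitted.

```python
import random
from collections import defaultdict
from itertools import product

# Search space as listed in the text.
LEARNING_RATES = [2e-5, 3e-5, 5e-5]
BATCH_SIZES = [16, 32]
EPOCHS = [2, 3, 4]


def hyperparameter_grid():
    return [
        {"learning_rate": lr, "batch_size": bs, "epochs": ep}
        for lr, bs, ep in product(LEARNING_RATES, BATCH_SIZES, EPOCHS)
    ]


def stratified_folds(labels, k=5, seed=0):
    """Assign example indices to k folds so that each fold keeps
    roughly the class proportions of the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds
```

Each configuration would then be fine-tuned five times, holding out one fold at a time, and the configuration with the highest mean validation accuracy would be selected.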

**Table 2.** The selected hyper-parameters for fine-tuning BERT with respect to each method for selecting the input to the BERT model.


To further test our assumption about the effectiveness of BERT to generalize beyond the training set, we also experimented with the most common traditional machine learning algorithms that have achieved great performance in text classification. In particular, we experimented with support vector machines (SVMs), naive Bayes (NB), and logistic regression (LG). To give appropriate inputs to the classifiers, we used count vectorization, converting the training/test sets of text documents to a matrix of token counts. As the vocabulary, we used the top 60,000 tokens ordered by term frequency across the training set. Since the entire document can be represented using this method, we did not experiment with different input methods. We used stratified 5-fold cross-validation to select the models with the highest accuracy, considering a set of parameters for each one. For SVMs, we experimented with the regularization parameter (C) with values (0.1, 1, 10, 100), the kernel coefficient for the radial basis function (rbf), polynomial, and sigmoid kernels (gamma) with values (1, 0.1, 0.01, 0.001), and the kernel type to be used in the algorithm (kernel) (rbf, polynomial, sigmoid, linear). The degree of the polynomial kernel function was fixed to 3 and the tolerance for the stopping criterion to 1 × 10<sup>−3</sup>. To support multiclass classification, we used the one-against-one scheme, which has been reported to perform better than other schemes such as one-against-all [43]. For NB, we experimented with the additive smoothing parameter (alpha) with values (0.5, 1, 2). Finally, for LG, we experimented with different solvers (newton-cg, lbfgs, sag, and saga), the norm used in penalization (L1, L2), and the inverse of regularization strength (C) (0.1, 0.5, 1.0). The tolerance for the stopping criterion was fixed to 1 × 10<sup>−4</sup>. To implement the infrastructure for the traditional machine learning algorithms, we used the Scikit-learn Python library [44].
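Count vectorization with a frequency-capped vocabulary can be illustrated without external libraries. The sketch below is not the Scikit-learn implementation used in the study: whitespace tokenization, lowercasing, dense rows, and the function names are simplifying assumptions.

```python
from collections import Counter


def build_vocabulary(train_docs, max_features=60000):
    """Keep the max_features most frequent tokens of the training set."""
    totals = Counter()
    for doc in train_docs:
        totals.update(doc.lower().split())
    kept = [token for token, _ in totals.most_common(max_features)]
    return {token: column for column, token in enumerate(sorted(kept))}


def vectorize(docs, vocabulary):
    """Turn each document into a row of token counts; tokens outside
    the vocabulary (e.g., unseen test-set words) are ignored."""
    matrix = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for token, count in Counter(doc.lower().split()).items():
            column = vocabulary.get(token)
            if column is not None:
                row[column] = count
        matrix.append(row)
    return matrix
```

Because the vocabulary is built from the training set only, words that occur only in the 19th-century test books are dropped at this step, which is one plausible reason why count-based models transfer poorly across language varieties.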

Table 3 summarizes the results of the classifiers along with the selected parameters and the mean accuracy over the five folds during validation.

**Table 3.** The best hyper-parameters for each machine learning algorithm based on the accuracy.


To evaluate the performance of the learning models we used the following measures:

1. **Accuracy** counts the correct predictions over the total number of examples.

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

where *TP*, *TN*, *FP*, *FN* correspond to the true positives, true negatives, false positives, and false negatives, respectively.

2. **Kappa coefficient** [45] indicates how much better a trained classifier is performing over the performance of a classifier that simply guesses at random according to the frequency of each class. The smaller the value, the more likely it is that the classifier would randomly classify the instances.

$$K = \frac{c \ast s - \sum_{k=1}^{C} p_k \ast t_k}{s^2 - \sum_{k=1}^{C} p_k \ast t_k} \tag{2}$$

where *c* is the total number of instances correctly predicted, *C* the total number of classes, *s* the total number of instances, *p<sub>k</sub>* the number of times that *k* was predicted, and *t<sub>k</sub>* the number of times that *k* truly occurs.

3. **F1 score** is the harmonic mean of the precision and recall for a class.

$$F1 = 2 \ast \frac{precision \ast recall}{precision + recall} \tag{3}$$

where:

$$precision = \frac{TP}{TP + FP} \tag{4}$$

and

$$recall = \frac{TP}{TP + FN} \tag{5}$$

4. **Weighted average F1 score** estimates the weighted average of the harmonic means of the precision and recall for all classes.

$$WAF1 = \frac{\sum_{i=1}^{C} n_i \ast F1_i}{\sum_{i=1}^{C} n_i} \tag{6}$$

where *n<sub>i</sub>* is the number of instances of the class *i*, *C* the total number of classes, and *F1<sub>i</sub>* is the F1 score of the class *i*.

Although accuracy is an appropriate measure for balanced datasets, in our case it can be misleading, since the dataset is significantly imbalanced. F1 score and Kappa coefficient can give us better insights on the outcomes.
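The four measures can be computed directly from lists of true and predicted labels. The following is a minimal sketch that follows Equations (1) to (6) for the multiclass case; the function names are illustrative, and degenerate cases (such as a single class that is always predicted) are not handled.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def kappa(y_true, y_pred):
    s = len(y_true)                                   # total instances
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correct predictions
    predicted, actual = Counter(y_pred), Counter(y_true)
    chance = sum(predicted[k] * actual[k] for k in actual)  # sum of p_k * t_k
    return (c * s - chance) / (s * s - chance)


def f1_per_class(y_true, y_pred):
    scores = {}
    for k in set(y_true):
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[k] = 2 * precision * recall / total if total else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    support = Counter(y_true)                         # n_i per class
    f1 = f1_per_class(y_true, y_pred)
    return sum(support[k] * f1[k] for k in support) / len(y_true)
```

For example, with true labels `[0, 0, 1, 1]` and predictions `[0, 0, 1, 0]`, the accuracy is 0.75 and K is (3·4 − 8)/(16 − 8) = 0.5.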

#### *4.2. Results*

First, we present results on the OL test set (Table 4) and the ECARLE dataset (Table 5) in terms of accuracy and kappa coefficient (K), based on the hyper-parameters selected earlier. As expected, the results on the ECARLE dataset are worse than those on the OL test set. The models trained on the first part of the book or on TextRank phrases have high performance on the OL test set. On the ECARLE dataset, the model trained on five sentences extracted from TextRank has the best accuracy (68.42%). The models trained on the second/third parts of the books have the worst performance on the OL test set, with 75.44% and 71.93%, respectively, while on the ECARLE dataset, the models trained on the second part of the book or on TextRank phrases share the worst performance, with 59.65% accuracy. Models trained on 10 sentences extracted from TextRank have high performance on the OL test set, with 85.96% accuracy. However, their performance is lower on the ECARLE dataset.
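Because both test sets contain 57 books, each reported accuracy corresponds to a whole number of correctly classified books. The following sketch recovers those counts from the percentages quoted above; it assumes only the stated test-set size, and the dictionary keys are descriptive labels chosen for illustration.

```python
TEST_SIZE = 57  # size of both the OL test set and the ECARLE dataset

# Accuracies (%) quoted in the text.
reported = {
    "ECARLE, TextRank 5 sentences": 68.42,
    "OL, second part": 75.44,
    "OL, third part": 71.93,
    "ECARLE, second part / TextRank phrases": 59.65,
    "OL, TextRank 10 sentences": 85.96,
}


def correct_books(accuracy_percent, n=TEST_SIZE):
    """Number of correctly classified books implied by an accuracy."""
    return round(accuracy_percent / 100 * n)


counts = {name: correct_books(acc) for name, acc in reported.items()}
```

With so few test books, a single additional misclassification shifts accuracy by about 1.75 percentage points, which is worth keeping in mind when comparing models.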

**Table 4.** Results on OL test set. Bold indicates the best performance.


**Table 5.** Results on ECARLE dataset. Bold indicates the best performance.


Regarding the K score, the agreement between actual and predicted values on the OL test set is greatest for the model trained on TextRank phrases (0.8140), while on the ECARLE dataset, the model trained on 5 sentences extracted from TextRank has a K of 0.5090.

Table 6 presents the results on OL test set and ECARLE dataset considering the F1 score. The models trained on TextRank phrases/sentences have the highest weighted average F1 score on the OL test set (88.32%) and on the ECARLE one (67.75%). The difference in the performance on the two datasets of the model trained on the TextRank 5 sentences, is the second-lowest (15.04%). The model trained on the first part of the book has a high weighted average F1 score on the OL test set. However, its performance in the ECARLE dataset is low (55.19%). Finally, the model trained on the third part has the lowest difference between the results on the two datasets (10.13%), but this depends on the lowest performance on the OL test set (74.81%).


**Table 6.** Results based on F1 score per class and weighted average F1 score (WAF1) on OL test set and ECARLE dataset with and without TextRank. Bold indicates the highest weighted average F1 score either on OL dataset or ECARLE one.

We further present the corresponding confusion matrices to show the predictions of the models over the true labels of the books with and without TextRank. Table 7 shows the confusion matrices of the OL test set. In all cases, the models correctly predict the manuals. We observe that only the models trained on an input constructed by TextRank correctly predict all the poems. The model trained on TextRank phrases is the only one that did not misclassify other books as manuals, while the models trained on the second/third parts of the books misclassified 10 and 7 books, respectively, as manuals. The model trained on the first part of the books is biased towards the essay class, while it is the only one that successfully classified 25/29 books as essays. The models trained on either the second or the third parts of the books are biased towards the manual and prose classes, which explains their low accuracy.

Table 8 shows the confusion matrices for the ECARLE dataset. Models trained either on the 5 or 10 sentences extracted from TextRank correctly predict 9/11 poems while the model trained on the third part of the book has the worst performance in this class (6/11). The model trained on the first part classifies 26/29 essays correctly, but misclassifies 13/15 prose books as essays. Only the model trained on TextRank phrases classifies 11/15 prose books correctly, while it is the one with the worst performance in the essay class predicting 16/29 books. Furthermore, the model trained on the third part of the book predicts 1/2 manuals.


**Table 7.** Confusion matrices of OL test set. Rows correspond to predictive values and columns to the actual ones for the four categories (essays (E), prose (Pr), poems (P), and manuals (M)).


**Table 8.** Confusion matrices of ECARLE dataset. Rows correspond to predictive values and columns to the actual ones for the four categories (essays (E), prose (Pr), poems (P), and manuals (M)).

Finally, we present the results with traditional machine learning (ML) algorithms for the OL test set (Table 9) and the ECARLE one (Table 10). As we expected, the algorithms have very good performance on the OL test set, since they have achieved great performance in a variety of text classification tasks before. The LG algorithm outperforms all models with 89.47% accuracy and a WAF1 of 89.84%. However, on the ECARLE dataset, the results are significantly worse. The NB algorithm has the worst performance (19.30% accuracy), while the best algorithm, LG, has 52.63% accuracy.

**Table 9.** Results on OL test set using ML.


**Table 10.** Results on ECARLE dataset using ML.


#### *4.3. Discussion*

The results indicate that we can build an efficient classifier for Greek books of the 19th century using resources from the 20th and 21st century since the BERT model classified most of the ECARLE books correctly. We observe that the model on both the OL test and ECARLE dataset has equally high performance. Furthermore, our assumption about the effectiveness of the BERT model has been confirmed since the alternatives, the traditional machine learning algorithms, had the worst performance in the ECARLE dataset.

All learning models had high performance during the hyperparameter tuning process with stratified 5-fold cross-validation on the OL training set. We expected good performance of the traditional machine learning algorithms since they have achieved high accuracy in text classification generally and also because all books are from the same collection and produced by the same extraction method. We also expected good performance of the BERT model since (1) it has achieved state-of-the-art results in many natural language processing tasks; (2) all books are from the same collection produced by the same extraction method and are written in modern Greek; (3) the BERT model has also been pre-trained on modern Greek texts; and (4) BERT is highly adaptable in downstream tasks.

The TextRank algorithm improves the results of the BERT model. The model fine-tuned on TextRank phrases has the highest weighted average F1 score (88.93%), the highest Kappa coefficient score (81.40%), and the equal highest overall accuracy (87.72%). An explanation for the performance of TextRank is that BERT discovered more patterns in the phrases than among the sentences. These are not syntactic and semantic patterns but maybe simple statistics such as word and punctuation frequencies. Considering also the fact that most previous approaches achieve high performance using stylometric features including word and punctuation frequencies [27–30], the results can be justified. The performance of the models fine-tuned on consecutive parts of texts is affected by the category distribution in the test set. Although the models trained on the second/third part of the text have high accuracy during hyperparameter tuning (90.12% and 86.93%, respectively), they have the lowest performance in the test set (75.44% and 71.93%, respectively).

We expected the results on the ECARLE dataset for a multitude of reasons. First, we were unable to find a plethora of 19th-century books, and the learning models were trained on modern Greek books of the 20th and 21st centuries. Thus, we expected some failures due to fundamental differences between the language used in 19th-century and 20th-century Greek texts. These failures are more apparent when we use traditional machine learning algorithms. Descriptive statistics showed that there is a chronological gap between the two datasets, as well as a difference between the distributions of the words. The differences arising from the language itself are related to the so-called *Greek language question* between the high variety of katharevousa and the low variety of demotic. This conflict regarding the dominance of one variety over the other took place in the late 19th and the 20th century.

The high variety of katharevousa retained the ancient Greek synthetic character and was established as the official national language of the Greek state in 1827. Regarding written texts, katharevousa was mostly used in official documents, essays, and prose, while the dominant language variety in poetry was the demotic [46]. In the late 19th century, many poets and scholars started defending the use of the demotic variety, establishing the movement of *demoticism*. The demotic variety was the one spoken by Greeks of the time. In contrast to katharevousa, the demotic is not a synthetic language but an analytic one. In practice, this means that the demotic uses more words and phrases than katharevousa to express a meaning. In the 20th century, the *Greek language question* took on political dimensions; the conservatives supported the use of katharevousa, while the communists supported the use of the demotic. The use of demotic in written texts expanded in the 20th century and became dominant in the 1960s. In 1976, demotic was finally recognized as the official language of the Greek state.

The above statements justify the performance of TextRank phrases in predicting the prose books with high accuracy (11/15). The input of the BERT model based on the TextRank phrases is a concatenation of non-consecutive small pieces of text, which mitigates the differences between the demotic and katharevousa. Indeed, there are not more words and phrases in demotic books to express a meaning in contrast with the books in katharevousa. Although the prose books are written in katharevousa in the 19th century and in demotic in the 20th century, this difference did not influence the performance of the model. On the other hand, the model trained on the first part of the book predicts 14/15 books as prose books in the OL test set, while it correctly classified only 2/15 prose books in the ECARLE dataset.

The performance of the models is also justified for the poems. Many books have been misclassified as poems. Models trained on sentences extracted from TextRank classify 9/11 poems correctly. Poems have been written in demotic since the 19th century. Thus, books in the OL dataset have a similar way of writing to the books of the ECARLE dataset.

One observation is that the models are biased towards essays in the ECARLE dataset, except for the model trained on TextRank phrases. This is evidence that the OL dataset can be used for training models that predict the category of books from an earlier century. The high performance of the models on the OL test set for the essay class is equally high for the ECARLE dataset, despite the fact that we have a small number of essays during training in contrast to the number of books of other classes (e.g., prose).

The corruption introduced by OCR conversion and PDF extraction does not seem to affect the performance of the learning models. The noisy and missing sentences did not significantly affect the performance of the BERT model, nor did the differences between the Greek demotic and katharevousa. Although the test sets are too small to provide a full explanation of the performance of the BERT model and the TextRank algorithm, we observe that despite the limitations and obstacles presented in the paper, the models are capable of classifying an important set of books correctly.

BERT is known to learn complex features, such as syntactic patterns and semantic dependencies [47]. Considering that previous studies have shown that stylometrics play a key role in genre identification, we believe that BERT manages to learn such types of features during fine-tuning.

#### **5. Conclusions and Future Work**

This paper addressed the problem of constructing a model for classifying Greek literature of the 19th century by genre/form concepts, under the limitations of a small collection of works of literature and the low quality of the source text. To address these challenges, we compiled a collection of modern Greek books and employed the state-of-the-art BERT model in conjunction with the TextRank algorithm for extracting significant sentences/phrases from each book. We posed two research questions and experimented with state-of-the-art algorithms for answering them. We found that recent books written in the modern Greek language helped us train efficient models that correctly classify most of the literature books in our target collection, which were written in katharevousa. The assumption that the BERT model can efficiently build such a classifier has been confirmed, considering that traditional machine learning algorithms had the worst performance in katharevousa. In addition, we found that using TextRank leads to better results compared to consecutive text parts extracted from the start, middle, or end of each book.

In future work, we aim to extend our collections of literature books to conduct more data-intensive experiments. Furthermore, we aim to experiment with several extractive and abstractive summarizers to confirm that a set of representative sentences/phrases can carry enough information for training an efficient classifier in this task. Finally, more experiments with traditional machine learning and deep learning models will give us a better perspective about the efficiency of the transformer-based models in this task and domain of interest.

**Author Contributions:** Conceptualization, D.D. and G.T.; methodology, D.D.; software, D.D.; validation, D.D.; formal analysis, D.D.; investigation, D.D.; resources, D.D., S.Z. and G.T.; data curation, D.D.; writing—original draft preparation, D.D., S.Z. and G.T.; writing—review and editing, D.D., S.Z. and G.T.; visualization, D.D.; supervision, G.T.; project administration, G.T.; funding acquisition, G.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH-CREATE-INNOVATE (project code: T1EDK-05580).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** This work was supported by computational time granted from the National Infrastructures for Research and Technology S.A. (GRNET S.A.) in the National HPC facility—ARIS under project ID pa181002-NEBULA.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **The Smart Evolution of Historical Cities: Integrated Innovative Solutions Supporting the Energy Transition while Respecting Cultural Heritage**

**Georgios Tsoumanis <sup>1,</sup>\*, João Formiga <sup>2</sup>, Nuno Bilo <sup>3</sup>, Panagiotis Tsarchopoulos <sup>1</sup>, Dimosthenis Ioannidis <sup>1</sup> and Dimitrios Tzovaras <sup>1</sup>**


**Abstract:** Building retrofitting is seen as an efficient method for improving a building's energy performance. On the other hand, when historical buildings are considered for this procedure, retrofitting gets more complicated. As historical buildings typically consist of low-performance building and energy systems, energy retrofits can be highly beneficial. However, not every retrofit technology can be installed in a historical building. In this paper, the study carried out for the implementation of Building-Integrated Photovoltaics (BIPV) solutions in the Historic Centre of Évora is provided, within the framework of the European project POCITYF (Project H2020). The study took into consideration all the observations of the Regional Directorate of Culture of Évora and the administration of the involved schools (including the Association of Parents), the needs of the Municipality of Évora, and the capabilities of technology developers ONYX and Tegola. The proposed solutions aim at fulfilling all the guidelines for preserving the historic centre and achieving the positivity metrics agreed with the European Commission on the challenging and indispensable path to the decarbonisation of European cities.

**Keywords:** smart cities; energy transition; cultural heritage; Évora; POCITYF

#### **1. Introduction**

Over the past decades, overall energy consumption demands have increased due to global population growth and rapid economic development. Among other factors, buildings' energy consumption plays a crucial role in this increase. To reduce buildings' energy consumption and greenhouse gas emissions, it is vital for cities to follow stable and long-term strategies for the transformation of their buildings and districts into Positive Energy Buildings (PEBs) and Positive Energy Districts (PEDs), respectively. A PEB is a building (or a group of buildings) that produces more on-site energy from renewable sources than it consumes [1]. The same applies to a PED, at an urban area level. Under these energy transition strategies, building materials and energy technologies must be optimised to achieve the given goals [2].

Europe has already set its goals for becoming the first climate-neutral continent and implementing integrated and innovative solutions in its buildings and districts. Under the European Green Deal [3], it aims at an economy where (i) there are no net emissions of greenhouse gases by 2050; (ii) economic growth is decoupled from resource use; (iii) natural capital is protected, sustainably managed, and restored; (iv) the health and well-being of citizens is protected from environment-related risks and impacts; and (v) no person and no place are left behind. The goal is to create many PEBs and PEDs among European countries as a step towards the goals set for 2030 [4]: (i) a 40% cut in greenhouse gas emissions (compared to 1990 levels); (ii) a 32% share for renewable energy; and (iii) a 32.5% improvement in energy efficiency.

**Citation:** Tsoumanis, G.; Formiga, J.; Bilo, N.; Tsarchopoulos, P.; Ioannidis, D.; Tzovaras, D. The Smart Evolution of Historical Cities: Integrated Innovative Solutions Supporting the Energy Transition while Respecting Cultural Heritage. *Sustainability* **2021**, *13*, 9358. https://doi.org/10.3390/su13169358

Academic Editors: Marc A. Rosen and Charalampos Dimoulas

Received: 30 June 2021 Accepted: 13 August 2021 Published: 20 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

On the one hand, building retrofitting is seen as a more efficient method for improving energy performance than building reconstruction [5]. Standard energy retrofits include installing walls or insulating roofs, renovating windows, upgrading heating systems, installing ventilation and air conditioning systems, and optimising system operation schedules [6]. Further energy retrofit actions include lighting improvements such as lamp replacement and the use of lighting control systems, the introduction of solar energy systems, improvements to mechanical equipment, and the use of renewable energy [7]. On the other hand, when historical buildings are concerned, retrofitting becomes more complicated. As historical buildings typically consist of low-performance building and energy systems, energy retrofits can be highly beneficial. However, not every retrofit technology can be installed in a historical building, as the buildings must not be damaged, especially in the case of cultural heritage buildings. In addition, several other barriers arise when considering the energy transition of historical buildings, mainly related to architectural, conservation, and cultural constraints, along with each country's existing regulations. Thus, alternative methods of retrofitting must be considered for historical buildings.

In accordance with Europe's goals and given that most European cities have buildings of historical interest, POCITYF [8] is an EU-funded smart city project that will help cities to become greener, smarter, and more liveable while respecting their cultural heritage. As cities are responsible for the majority of the global energy consumption and buildings use about 40% of global energy [9], by implementing and testing PEDs and PEBs in the involved cities, POCITYF will support Europe in the race to become the first carbon-neutral continent by 2050. In this sense, POCITYF will demonstrate innovative smart city technologies (hereafter called Innovative Elements, IEs) in two lighthouse cities, Alkmaar (NL) and Évora (PT), and replicate them in six fellow cities. POCITYF combines positive energy blocks with grid flexibility, e-mobility, innovative ICT technologies, and citizen engagement strategies while respecting the urban cultural heritage.

In this paper, an overview of POCITYF's general strategy is given, while the focus is on describing some of the proposed IEs that will be implemented in Évora's historical buildings during the project (i.e., until 2023). More specifically, this work describes the study carried out within POCITYF for the implementation of Building-Integrated Photovoltaics (BIPV) solutions in the historic centre of Évora. This study sets out the guidelines that aim at preserving the historic centre while achieving the positivity metrics agreed with the European Commission on the challenging and indispensable path to the decarbonisation of European cities. Under these guidelines, Évora and the other cities participating in POCITYF, which will follow Évora's example, will take a step towards their transformation into smart cities while preserving their cultural heritage buildings. Although Évora is only one of the two lighthouse cities and one of the eight cities participating in the project, it is the only city that fulfils the requirements for this paper, as it is a UNESCO World Heritage Site with many historical buildings and the testbed of many BIPV solutions.

In Section 2, the past related work is given and is followed by a short description of POCITYF's general strategy in Section 3. The participation of Évora as a lighthouse city in POCITYF is given in Section 4. The paper's main contribution is discussed in Section 5, where all proposed solutions are thoroughly described, along with their implementation in the selected areas of cultural interest. Conclusions are drawn in Section 6.

#### **2. Past Related Work**

#### *2.1. Research Studies*

There is a limited number of studies regarding the application of innovative solutions in heritage contexts to support the energy transition, mainly due to the presence of architectonic, conservative, and cultural barriers.

Borda and Bowen, in their 2017 work [10], review some of the developments regarding smart cities and their effects on the cultural heritage sector. In this work, the enabling technologies for smart cities and smart cultural heritage consist of: (i) Internet of Things (IoT), (ii) Cloud computing, (iii) Wireless Sensor Networks (WSNs), (iv) Mobile broadband, and (v) Short-range wireless, such as Bluetooth Low Energy (BLE) and Near-Field Communication (NFC). Moreover, they mention visualisation technologies as an important component for smart cultural heritage developments. In the same vein and year, Angelidou et al. [11] investigate how the historical and cultural heritage of cities can be enhanced by smart city tools, solutions and applications. By examining the incorporation of the historical and cultural heritage in three smart city strategies, they found that, despite the technological advancements, cultural heritage is not systematically exploited and formally incorporated in smart city initiatives.

In 2018, Allam and Newmer, in their paper [12], while reviewing the literature on the nature, challenges, and opportunities of smart cities, also propose a framework that takes into account the dimensions of culture, metabolism, and governance. This framework highlights that the economic dimension underlies each of the three dimensions (i.e., smart culture, smart metabolism, and smart governance) and it does not require its own focus in the development of smart cities.

Marsella and Marzoli [13] present the work conducted in the EU project STORM [14], a European Research and Innovation Action co-funded in the H2020 framework. Under STORM, intelligent tools were created for gathering data from libraries, sensors, and crowdsensing techniques in order to help cultural heritage stakeholders in the prevention, response, and recovery phases. The developed tools were based on a variety of sensor types (e.g., air temperature sensors, LIDAR sensors, etc.) and aimed at (i) protecting cultural heritage from climate change; and (ii) enabling smart city technologies for cultural heritage protection.

Siountri and Vergados, in their 2018 work "Smart Cultural heritage in Digital Cities" [15], mention that the Smart Cultural Heritage concept encompasses: (i) Heritage; (ii) Urban factors; and (iii) Digital technologies. In addition, they also remark that even if the technological advancements have already led to many achievements, there is a need for further promoting and understanding the connection of Smart Cities to Cultural Heritage and integrating them into the smart city plans.

Akram et al., in their 2016 work [16], evaluate the possibility of making heritage buildings more sustainable while preserving their cultural values. To achieve this goal, they compare the construction materials, the traditional and futuristic design features, as well as the environmental impact of each era and energy source. The interesting point is that heritage buildings present a great possibility of becoming greener due to the high quality of their construction materials and features. These materials and features represent many "values" from different perspectives, such as energy and materials expenditure and architectural characteristics that can no longer be replaced.

Adding to the above study regarding the materials of cultural heritage buildings and their impact on smart city applications, Tse et al., in their 2018 paper [17], present an indoor pollution study for Biblioteca Joanina in Coimbra, Portugal, a UNESCO heritage monument. More specifically, they installed a low-cost edge-based sensing platform inside the monument and then helped the library managers to understand how to modify the paths of tourists in order to reduce atmospheric particle pollution in the library. What is most interesting in this work is that the impact of outside pollution on the library was higher than expected, likely due to the ancient doors and windows, which do not act as a barrier to PM2.5 or PM10.

Pierucci et al., in their 2018 work [18], compare the performance of smart windows with PVCC glass with the traditional solar control glasses and building-integrated photovoltaic panels. They perform a comparative life-cycle assessment of two buildings with the same size and typology, but one equipped with PVCC and the other with the traditional solar control glasses and building-integrated photovoltaic panels. The comparison's results show that smart building-integrated adaptive technologies offer benefits in terms of comfort and operational energy when compared to traditional ones. In addition, they contribute to the reduction of the carbon footprint of new and existing buildings.

In a more recent (2020) work [6], Cho et al., taking into account that cultural buildings need special attention when being retrofitted in order to avoid any damage, simulate the application of six different building energy retrofit (BER) packages on a historical educational building. Each BER package consists of a different mix of technologies (e.g., photovoltaic panels, Heating, Ventilation, and Air-Conditioning (HVAC) systems, etc.), and the packages are compared with each other to find which is the most energy- and thermal-efficient (heating and cooling).

#### *2.2. Guidelines for RES Adaptation*

In many countries, sets of guidelines exist for adapting the integration of Renewable Energy Solutions (RES) in historical buildings and sites [19]. Each guideline is mainly focused on one integrated solution/system or a set of systems. Some of these guidelines are indicatively given below.

Back in 2009, the Scottish guidelines were set in [20] as a guide for fossil fuel reduction in historical buildings that use renewable energy solutions, mainly photovoltaic and solar thermal systems. For the same country, in 2014, Historic Environment Scotland (HES) [21] set the guidelines for the use of renewable energy sources, providing the readers with examples and considerations for building-integrated photovoltaics and building-integrated solar thermal systems on historical buildings and sites.

The American National Renewable Energy Laboratory (NREL), in the 2011 work by Kandt et al. [22], presented the criteria for balancing the preservation of historic sites/buildings and energy production. Their focus was on building-integrated photovoltaics and building-integrated solar thermal systems. In the same year, the Bundesdenkmalamt (BDA), in [23], set the scale of compatibility among different renewable energy source technologies. The BDA mainly refers to photovoltaic and solar thermal systems. Regarding the integration of photovoltaics and building-integrated photovoltaic systems on Italy's historical buildings, the Ministero per i Beni e le Attività Culturali (MiBACT), in [24], defines the best practices.

The Solarenergie, Dipartimento Federale dell'Interno (DFI), cantonal guideline, and Federal Office of Culture (FOC) have provided the guidelines in Switzerland. Solarenergie, in 2014 [25], describes the use of handbooks with technical solutions in order to reduce modern and historical buildings' energy consumption, mainly under building-integrated photovoltaics and building-integrated solar thermal systems. For the same systems, DFI, in [26], focuses on making clear that the installation of the systems should follow an initial clarification with the competent authorities. On the other hand, in the cantonal guideline [27], there is an approach to defining specific rules for the renewable energy sources' aesthetic and technical integration. Photovoltaics and solar thermal, building-integrated photovoltaics and building-integrated solar thermal systems are considered in the cantonal guideline. Several illustrations regarding the photovoltaic reconciling procedure and the quality of constructions are the highlight in FOC's guideline in 2019 [28].

#### *2.3. EU Research Projects*

RES integration in heritage buildings and sites has been extensively examined within international, EU, and local research projects. In Table 1, some of these projects are indicatively given, along with the funding programme, years of implementation, types of RES technologies, and the focus of each project. The projects displayed in the table were funded by one of the following programmes: (i) Intelligent Energy Europe (IEE) [29]; (ii) Fifth RTD Framework Programme (FP5) [30]; (iii) 7th Framework Programme (FP7) [31]; and (iv) Horizon 2020 [32].


**Table 1.** Research projects on RES integration in historical buildings and sites (non-exhaustive list).

#### **3. POCITYF**

POCITYF's goal is to build upon intelligent, user-driven, and demand-oriented city infrastructures and services in order to foster the cities' energy efficiency. The considered city infrastructures enhanced by e-mobility solutions can lead to a substantial increase of renewable energy merit; thus leading to a wider deployment and market uptake of PEDs. Overall, POCITYF aims to transform the cities by adding layers of smartness in their key infrastructures, technologies, and services, such as buildings, energy grid, and e-mobility. It will also form an open, collaborative ecosystem towards improving citizens' quality of life, innovation, and sustainability at a district and city level. POCITYF creates new possibilities for the cities to become safer, greener, and more responsive to the needs of their citizens, businesses, and other organisations. To this end, it brings new technologies and renewed infrastructure to cut household bills, create jobs and boost growth for achieving a sustainable, low carbon and environmentally friendly economy, putting Europe at the forefront of RES production and efforts against global warming.

#### *3.1. POCITYF's General Strategy*

In order to achieve the goals set, POCITYF's strategy has been built around four multidisciplinary and complementary Energy Transition Tracks (ETTs). Each ETT aims to propose solutions on how to increase the integration of both commercialised and innovative energy systems in current city blocks, with the ultimate goal of achieving higher self-sustainability for the cities and making them more environmentally friendly for their citizens. Each ETT consists of several Integrated Solutions (ISs), while each IS further groups several Innovative Elements (IEs) dedicated to the IS's goal. The first three ETTs enable the transition towards (1) reduced energy demand, (2) increased shares of renewables, and (3) e-mobility, while the fourth focuses on Citizen Engagement. Table 2 presents POCITYF's ETTs and their respective ISs.

More specifically, in ETT#1, the included ISs are focused on achieving significant energy savings at both the building and district levels (e.g., by reducing energy bills for citizens and enabling a high share of locally produced and consumed renewable energy). The Innovative Elements (IEs) to be demonstrated and replicated in ETT#1 include both PEB-level and PED-level retrofits. At the PEB level, the IEs include: (a) circular insulation materials, (b) solar roofs and facades, (c) PV canopy, (d) PV skylight, (e) PV thermal panels, (f) thermo-acoustic heat pumps, and (g) hybrid wind/solar generation systems. At the PED level, the IEs are: (a) District Heating and Cooling (DHC) (biomass, waste and geothermal), (b) DC lighting with charging points and smart lamp posts, (c) Peer-to-Peer (P2P) energy trading platforms, (d) smart distribution management systems, (e) community solar farms, and (f) Aquifer Thermal Energy Storage (ATES) heat/cold storage. The implementation of these two levels of retrofits also considers the principles of circular economy with the (a) utilisation of available waste streams (heat/building materials after demolishing), (b) reverse collection of waste, (c) waste management tools, and (d) the Pay-As-You-Throw system.

**Table 2.** POCITYF ETTs and ISs.


ETT#2 includes ISs that mainly focus on (1) maximising self-consumption, (2) reducing grid stress, (3) avoiding renewable generation curtailment, and (4) increasing revenue through flexibility services to the grid. The Integrated Solutions of this ETT include the following IEs: (a) low temperature waste heat and geothermal sources, (b) innovative short- and long-term (in some cases seasonal) storage solutions (i.e., hydrogen fuel cells, Vehicle-To-Grid (V2G) coupled with stationary batteries, Phase Change Material (PCM)), (c) DC to work in parallel with AC grids, (d) smart Information and Communications Technology (ICT) to interconnect an Energy Management System (EMS) on a home/building/district level, (e) Demand-Side Management (DSM) platforms for the optimisation of energy flows, (f) thermal grid controllers, (g) market-oriented building flexibility services, and (h) storage systems connected to the LV and MV grid.

In ETT#3, the focus is on individual electro-mobility elements within the energy system, with the goals of increasing the penetration of electric vehicles (EVs) that utilise RES, enhancing the potential of EVs to support grid flexibility (while reducing curtailment), promoting the decarbonisation of the mobility sector through the adoption of EVs, reducing citizens' mobility costs, and improving traffic management. Innovative elements to be demonstrated and replicated in this ETT include: (a) deployment of V2G using the batteries of EVs, (b) exploitation of EVs in local car-sharing systems, (c) district-wide dissemination of smart charging stations powered mainly by RES, (d) DC public lamp posts as charging points, (e) bidirectional smart inverters, (f) optimal charging algorithms, and (g) EV charging management platforms.

ETT#4 focuses on improving citizens' quality of life and increasing the city's efficiency by involving citizens in the early development, design, and evaluation phases. Innovative elements to be demonstrated and replicated here include (a) P2P energy transactions, (b) gamification of bidding and trading in decentralised systems, (c) infotainment apps, (d) local campaigns, (e) crowdfunding, and (f) energy ambassadors creating local energy communities with the use of platforms.

The lighthouse cities (i.e., Évora and Alkmaar) have already identified the innovative elements that they are going to demonstrate throughout POCITYF. The fellow cities, in turn, have already picked the innovative elements that they are interested in replicating.

#### *3.2. Monitoring & Evaluation*

POCITYF, as a Smart City project, incorporates a multitude of solutions that will accelerate the energy transition of its lighthouse cities and help with the replication activities in its fellow cities. Positive energy buildings and districts, grid flexibility, circular economy, e-mobility, and citizen-driven innovation are all integral parts of the POCITYF ecosystem. Therefore, the project's success can only be evaluated through specific, tailored Key Performance Indicators (KPIs) [40] that need to be defined according to the scope of the specific interventions and stakeholders' needs, as well as to provide comparability through established evaluation frameworks and monitoring databases [41].

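Each KPI in POCITYF is responsible for monitoring and evaluating one or more of the following concepts:

- Energy performance;
- Environmental performance;
- Economic performance;
- ICT performance;
- Mobility performance;
- Social performance;
- Governance performance;
- Propagation.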

Energy performance indicates the level of achieved energy efficiency in a system and is demonstrated using a set of energy-related KPIs (e.g., energy demand and consumption, degree of energetic self-supply by RES, etc.). Calculated KPIs can be used to assess the impact of the POCITYF project in facilitating the energy transition of the lighthouse cities. The impact can be assessed in terms of, for example, increasing self-sufficiency, reducing the amount of energy consumed, increasing renewable self-consumption with energy storage, or simply increasing the amount of renewable energy generated.
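As an illustration of how energy-related KPIs of this kind might be computed, the following minimal Python sketch derives self-sufficiency and self-consumption ratios from per-interval demand and on-site generation. The function name `energy_kpis`, the sample values, and the interval-matching definition of self-consumed energy are assumptions made for illustration only; they are not taken from the POCITYF KPI definitions in [40,41].

```python
from typing import Dict, Sequence


def energy_kpis(demand_kwh: Sequence[float],
                generation_kwh: Sequence[float]) -> Dict[str, float]:
    """Illustrative energy KPIs from per-interval demand and on-site RES generation."""
    if len(demand_kwh) != len(generation_kwh):
        raise ValueError("demand and generation series must have the same length")

    # Assumption: energy counts as self-consumed only when it is generated
    # and consumed within the same interval (no storage is modelled here).
    self_consumed = sum(min(d, g) for d, g in zip(demand_kwh, generation_kwh))
    total_demand = sum(demand_kwh)
    total_generation = sum(generation_kwh)

    return {
        # Share of the demand that is covered by on-site generation.
        "self_sufficiency": self_consumed / total_demand if total_demand else 0.0,
        # Share of the on-site generation that is used on site.
        "self_consumption": self_consumed / total_generation if total_generation else 0.0,
        "grid_import_kwh": total_demand - self_consumed,
        "grid_export_kwh": total_generation - self_consumed,
    }


# Hypothetical four-interval example (values in kWh).
kpis = energy_kpis([4.0, 3.0, 2.0, 5.0], [0.0, 4.0, 6.0, 1.0])
print(kpis)
```

Computing the same ratios before and after an intervention would give one possible way of expressing the impact on self-sufficiency and renewable self-consumption mentioned above.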

Environmental performance measures the level of environmental sustainability. It entails considerations on the (efficient) use of (renewable) resources, improved energy and water efficiency, the reduction of air contaminants and greenhouse gas emissions, increased reuse and recycling, and the reduction of hazardous waste and toxic pollutants. Related KPIs include, among others, greenhouse gas emissions and energy demand and consumption KPIs.

Economic performance defines the roadmap for assessing the economic viability of the lighthouse cities' interventions, together with common technical evaluation methodologies that guide the monitoring and evaluation activities needed to calculate the economic KPIs (e.g., average electricity price for companies and consumers, carbon dioxide reduction cost efficiency, etc.). The objective is to provide an evaluation plan to measure and analyse the economic performance, impacts, and effectiveness of the POCITYF interventions, focusing first on the integrated solutions, then, in a more holistic picture, on the ETT into which the ISs are aggregated, and later on the city level.

In the POCITYF project, the ICT technologies (data communication, data management platforms and analytics) aim at supporting and enhancing the innovative solutions and general efficiency measures in energy, mobility, governance, and social dimensions. These enhancements are expected in such aspects as the increased flexibility of the domain-specific systems with the support of ICT, the increased usage of open data, improved data privacy and security, as well as more effective real-time data sharing, which indirectly lead to energy efficiency improvements. The successful usage of ICT and highly efficient domain-specific technologies in the energy and mobility domains will increase the overall impact of all ISs developed in the POCITYF project. KPIs that are related to ICT solutions include, among others, the increased system flexibility for energy players and the quality of open data.

The mobility performance of district cities is measured via the respective mobility KPIs (e.g., electric vehicles and low-carbon emission vehicles deployed in the area, number of EV charging stations and solar powered V2G charging stations deployed in the area, etc.). Mobility performance will indicate the extent to which the POCITYF project was able to stimulate the transition from traditional fossil-fuel-based vehicles to low-carbon ones (i.e., EV, PHEV, and hydrogen). Apart from the usage of new types of vehicles, the implementation and usage of the new charging infrastructure should be measured as well.

Social performance is crucial to estimate the extent to which the project facilitates the involvement of citizens and social actors in the planning, decision-making, and implementation activities. Citizen groups represent groups of citizens with various activities related to POCITYF actions and objectives. They include actors such as residents, non-residential agents with a high interest, citizen associations, professional associations (e.g., engineers, taxi drivers), neighbouring cities/towns, as well as citizen ambassadors. Their perspective is of the utmost importance for the citizen-centric approach of POCITYF.

Successful governance practices contribute to the effective and progressive implementation of the project as well as to a city with an efficient administration and a well-developed local democracy. They engage the community proactively in innovative ways, which translates into increased citizen participation and a more active involvement of various user groups, the community, and professional stakeholders in city developments. Accordingly, in POCITYF, the governance performance dimension is addressed by a set of KPIs (e.g., online visits to the municipal open data portal, legal framework compatibility, etc.), which allow for the evaluation of the effective governance of smart cities from the side of the municipality administration, planning, monitoring, and evaluation. In addition, the aspects of the legal domain (regarding the regulatory framework and its compatibility with the proposed solutions and implemented policies at the project or city level) are included as supporting monitoring measures.

The success of the implementation and, consequently, the replication of smart solutions in the context of a smart city depend on a series of external and internal indicators. Propagation aims at improving the replicability and scalability of smart city solutions at a wider city scale. It is about the potential for dissemination to other locations, other contexts, and other cities. Propagation depends in the first place on the inherent characteristics of the (innovative) smart city solutions, both for transfers to other locations and countries and for up-scaling from small single projects in the same city. KPIs that are related to propagation include social compatibility and the diffusion to other locations.

#### **4. The Case of Évora**

Eight cities participate in POCITYF, either as lighthouse (LH) cities or as fellow cities (FCs). The general idea behind this categorisation is the following: the LH cities are responsible for demonstrating ISs that they have already implemented in the past. FCs will use this knowledge to implement these ISs. An exchange of knowledge will also take place between the LH and FCs. The LH cities in POCITYF are: (i) Évora (Portugal) and (ii) Alkmaar (Netherlands). The FCs are: (i) Granada, (ii) Bari, (iii) Celje, (iv) Ujpest, (v) Ioannina, and (vi) Hvidovre. Each city has set its PEBs and PEDs for participating in POCITYF. In each PEB or PED, a group of ISs will be implemented, either in terms of demonstrating the IS (LH cities) or in terms of replicating an IS after the demonstrations (LH cities and FCs).

Note that all POCITYF cities have a proven record of actions aiming at achieving the Energy Union's objective of creating a resilient energy system and an ambitious climate policy for secure, sustainable, competitive, and affordable energy, setting themselves even more ambitious emission reduction goals than the EU. For example, the two leading LH cities, Évora and Alkmaar, have approved sustainable energy action plans and are active members in European networks on energy, mobility, and ICT.

#### *4.1. Évora as a POCITYF LH City*

Évora is a middle-sized city inhabited by about 53,000 citizens. It is located in the southern inner mainland of Portugal. Due to its well-preserved old town centre, partially enclosed by medieval walls, and many monuments dating from various historical periods, Évora is a UNESCO World Heritage Site and a member of the Most Ancient European Towns Network [42]. These characteristics present peculiar conditions for the integration of ISs in its buildings and districts.

Évora is participating in POCITYF with three PEDs, where the PEBs are located: (i) Évora city centre; (ii) Valverde village; and (iii) Industrial and commercial park. These PEBs are the hosts of the demonstrations that will take place in Évora. Table 3 presents the complete list of the IEs per IS that will be demonstrated in Évora.

**Table 3.** Évora's IEs in POCITYF.


In Figure 1, the exact location on the map of each PEB and the typology of buildings in each area are depicted. In addition, there is a list of each solution that will be deployed per PEB and at the district level as well.

**Figure 1.** The 3 Positive Energy Blocks (PEBs) in Évora.

#### *4.2. Distinguish PEBs and PEDs of Cultural Interest*

Évora's well-preserved historical city centre (PEB1) is one of the richest monuments in Portugal and has earned the title of city-museum. Its monumental complexes, in harmony with the urban fabric, led to the classification of Évora as a UNESCO World Heritage Site in 1986. Building on this historical matrix, the city is re-emerging as a pole of regional development through the creation of products and services of excellence in tourism and the development of urban infrastructures that prioritise the well-being of its inhabitants. Furthermore, energy efficiency has become a policy priority for Évora due to its ability to address challenges such as reducing the dependence on foreign energy, reducing GHG emissions, and improving the competitiveness of the economy.

This World Heritage City is the first urban area in Portugal to hook up to the intelligent energy grid. By promoting energy efficiency, micro generation, and electrical mobility, Évora is a shining example of sustainability for the whole country. The city has 31,000 domestic customers with installed smart meters; it also has an improved capability of RES and EV connections. Évora was selected because it complies with a set of criteria relevant to this experiment, such as: dimension, type of grid, national and international visibility, average level of consumption, inclusion in the national pilot network of electric vehicle charging stations.

Moreover, the city centre values its environmental component and the promotion of sustainable development. It holds heritage, cultural, academic, and services vocation with environmental quality. It also has a national and international recognition of its recovery policies and heritage preservation.

#### *4.3. Distinguish ISs That Could Affect the Cultural Heritage of the PEBs–PEDs*

To support the PEBs in reaching their objectives, energy flexibility solutions allow the groups of buildings to modify energy consumption and generation profiles according to various needs, while respecting users' comfort preferences. Examples of such needs refer to the reduction of electricity costs through the matching of buildings' demand and on-site generation (increasing the benefits of ETT#1-related solutions) or the decrease in electric vehicles' charging peak loads (mitigating impacts associated with the electrification of energy demand under ETT#3). For the municipal buildings in PEB1 of Évora, 10 energy routers that are equipped with battery energy storage systems are smartly operated via flexibility control algorithms to improve self-consumption and self-sufficiency ratios and reduce electricity costs.
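To make the role of storage in such ratios more concrete, the sketch below applies a deliberately simple greedy rule to a battery: charge from PV surplus, discharge to cover a deficit. This is not the flexibility control algorithm used in POCITYF, which is not specified here; the function name `dispatch_battery`, the neglect of conversion losses and power limits, and all numerical values are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def dispatch_battery(demand_kwh: Sequence[float],
                     pv_kwh: Sequence[float],
                     capacity_kwh: float,
                     soc_kwh: float = 0.0) -> Tuple[List[float], List[float]]:
    """Greedy rule-based dispatch; returns per-interval (grid imports, grid exports).

    Assumptions: ideal battery (no losses, no power limit), one decision per interval.
    """
    imports: List[float] = []
    exports: List[float] = []
    for demand, pv in zip(demand_kwh, pv_kwh):
        net = pv - demand  # positive: PV surplus, negative: deficit
        if net >= 0:
            # Store as much surplus as the remaining capacity allows; export the rest.
            charge = min(net, capacity_kwh - soc_kwh)
            soc_kwh += charge
            imports.append(0.0)
            exports.append(net - charge)
        else:
            # Cover the deficit from the battery first; import the remainder.
            discharge = min(-net, soc_kwh)
            soc_kwh -= discharge
            imports.append(-net - discharge)
            exports.append(0.0)
    return imports, exports


# Hypothetical four-interval profile (kWh) with a 3 kWh battery.
demand = [4.0, 3.0, 2.0, 5.0]
pv = [0.0, 4.0, 6.0, 1.0]
imports, exports = dispatch_battery(demand, pv, capacity_kwh=3.0)
self_sufficiency = 1.0 - sum(imports) / sum(demand)
print(imports, exports, round(self_sufficiency, 3))
```

In this hypothetical profile, the battery shifts part of the midday surplus to the last interval, so less energy is imported from and exported to the grid than would be the case without storage.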

Regarding e-mobility, a set of new solutions will also be deployed and tested in the cultural site (PEB1): smart lamp posts, vehicle-to-grid (V2G) chargers with PV integration, and an EV charging management platform. The first is a modular lamp post, developed by the Ubiwhere company, composed of efficient LED lighting, EV charger capabilities, and telecommunication services that enable the deployment of a 5G solution (4G and Wi-Fi are also included). The second consists of EV chargers that take local PV generation into account to smart-charge EVs. The integration is done using voltage control algorithms, thus going beyond the state-of-the-art by combining the charging control with a high level of integration with the LV network control elements, resulting in possible improvements in the power quality and voltage deviation. These chargers will also allow the injection of energy in the demonstration areas. Finally, the EV charging management platform will allow a better integration of the V2G capabilities of the EV chargers, minimising the impact of integrating EVs into the power grid. This platform will also enable users to manage their charging and discharging preferences and allow for a totally remote control of charging options using web-app and mobile-app interfaces.

#### **5. Implementing ISs while Respecting Cultural Heritage**

Because the historic centre has been a UNESCO World Heritage Site since 1986, the solutions face the added challenge of conserving the city's cultural heritage (respecting the legal framework in force) while contributing to the PEB's positivity. In this sense, two different solutions aim to enable the historic centre and its citizens to take an active part in the pressing energy transition:

- A Community Solar Farm on the outskirts of the city of Évora (Section 5.1);
- A Renewable Energy Community in Évora's municipal buildings (Section 5.2).

#### *5.1. Community Solar Farm on the Outskirts of the City of Évora*

The Community Solar Farm (CSF) project aims to provide the residents of the historic centre of Évora with the opportunity to access photovoltaic generation solutions, given that they are unable to install this type of solution in their homes, due to the protection mechanisms of cultural heritage that prevent the installation of photovoltaic panels in the walled interior of the city of Évora. Thus, CSF aims to provide the inhabitants of the historic centre with the possibility of owning and/or benefiting from a part of the generation of the plant, being rewarded for the corresponding produced energy. In POCITYF, CSF is represented by the "Community Solar Farm" given in IS1.1.

A solution based on the new legislative framework for collective self-consumption and renewable energy communities could respond to the challenge that CSF proposes. Based on the new legislation, the solution is fully aligned with the innovative character of POCITYF and the same disruptive character of the Portuguese Decree-Law No. 162/2019. As an innovation project, POCITYF proposes to explore, find, and design new solutions under the aforementioned legislation and the regulation of self-consumption of electric energy. Furthermore, the consortium is available to share with the Portuguese National Authority for Energy and Geology (DGEG), during this assessment process, the lessons learned and contribute, as a pilot project, to the implementation and improvement of the provisions set out in that Decree-Law.

In practice, the intention is to take advantage of this new legislative context and implement a renewable energy community (REC) in which the role of a self-consumption production unit (SCPU) would be played by the photovoltaic plant outside the historic centre of Évora. In turn, consumption points in the historic centre would play the role of REC's user facilities (UF).

As for the land that can be used to install the photovoltaic plant, the City Council of Évora (CME) provided land that is approximately 4 km from the historic city centre (see Figure 2). The technical solution is currently being drawn up and, apart from the technical feasibility study, different business models will be presented, aiming to bring value to both citizens and the operators/promoters of the plant.

**Figure 2.** Map with the possible location of the CSF in green and relative position in relation to the centre of Évora and the substation of Caeira.

#### *5.2. Renewable Energy Community in Évora's Municipal Buildings*

The historic centre of Évora has very restrictive laws for the protection of cultural heritage, which constitute an obstacle even to the implementation of business-as-usual solutions for renewable generation (e.g., photovoltaic panels). Eight municipal buildings (Municipal Market 1 de Maio, Arena, Teatro Garcia de Resende, EB1 School of S. Mamede, Évora City Hall Building (CME), Vista Alegre School, Rossio Laboratorio Vivo School for the Electric Decarbonisation–LVpDÉ) will be provided with renewable generation capacity. To overcome the constraints, five different BIPV (Building-Integrated Photovoltaics) solutions that comply with the specifications and guidelines imposed by the Regional Culture Administration of Alentejo were designed by two entities of the consortium: ONYX, a Spanish company specialised in BIPV solutions, and Tegola, an Italian company specialised in aesthetic photovoltaic roofing. Within POCITYF, these solutions are part of the provided BIPV solutions and the Renewable Energy Community Management Platform, both described in ETT#2. Examples of BIPV solutions that will be installed in Évora are given in Figure 3.

**Figure 3.** Examples of BIPV solutions to be installed in Évora (**left**: photovoltaic tiles; **right**: aesthetic photovoltaic pergolas).

With this group of solutions, the buildings will obtain the desired and necessary positivity, as their annual renewable generation will exceed their consumption.

In Figure 4, the eight municipal buildings and the parking lot are identified, delimiting the historic centre area that is the target of the collective self-consumption proposal. It should be noted that the geographical distances between the different user facilities (UFs) do not exceed 2 km and are, in most cases, less than 1 km.

**Figure 4.** Spatial representation of the location of municipal buildings in the historic centre.

Thus, within the scope of the POCITYF project, and regulated by Decree-Law No. 162/2019 and the respective regulations for self-consumption of electric energy, the intention is to channel and take advantage of this surplus energy for other CME buildings located in the historic centre, constituting a collective self-consumption community with several buildings and not just those that will be endowed with BIPV solutions. Below is the list of buildings in the historic centre that are eligible for the community:


This high number of consumers belonging to the community could absorb surplus production, addressing not only the energy surplus but also the mismatch between generation and consumption (e.g., in the summer and weekend periods, schools will have very low consumption).

Considering that all the buildings in question are the property of CME, a collective self-consumption project is the best solution, in preference to the constitution of a renewable energy community.
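The sharing logic described above can be illustrated with a minimal sketch. It assumes that generation and consumption are known per building for a given time step and that the surplus is split among buildings with unmet demand in proportion to that demand. The building names, the figures, and the proportional rule are hypothetical and serve only as an illustration; the actual sharing coefficients would be defined under Decree-Law No. 162/2019 and the applicable self-consumption regulation.

```python
def share_surplus(generation, consumption):
    """Split one time step's surplus among buildings with unmet demand.

    generation, consumption: dicts mapping building name -> kWh.
    Returns (allocation, exported): shared kWh received per building and
    the surplus that remains once every deficit has been covered.
    Proportional sharing is an illustrative assumption, not the legal rule.
    """
    buildings = set(generation) | set(consumption)
    net = {b: generation.get(b, 0.0) - consumption.get(b, 0.0) for b in buildings}
    surplus = sum(v for v in net.values() if v > 0)
    deficits = {b: -v for b, v in net.items() if v < 0}
    total_deficit = sum(deficits.values())
    if total_deficit == 0:
        return {}, surplus
    shared = min(surplus, total_deficit)
    allocation = {b: shared * d / total_deficit for b, d in deficits.items()}
    return allocation, surplus - shared


# Hypothetical summer weekend hour: the school generates but barely consumes.
generation = {"school": 30.0, "town_hall": 10.0}
consumption = {"school": 2.0, "town_hall": 14.0, "library": 12.0}
allocation, exported = share_surplus(generation, consumption)
```

In this hypothetical hour, the school's surplus covers the deficits of the other two buildings, and only the remainder would leave the community.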

#### *5.3. Paços do Concelho (Town Hall Building)*

In the Building of Paços do Concelho, the Town Hall Building, located in Praça do Sertório, two types of solutions will be implemented: ONYX's aesthetic photovoltaic skylights and photovoltaic tiles provided by Tegola. ONYX's aesthetic photovoltaic skylight solutions will replace the existing skylight in the main building and the existing fibre cement tile roof over the praefurnium area of the Roman Baths. Regarding the skylight of the main building, the solution proposed by ONYX is illustrated in Figure 5.

**Figure 5.** Skylight of the main building in Paços do Concelho where ONYX solutions will be implemented (**a**); ONYX solution designed for said skylight (**b**).

Moreover, Figure 6 provides the aerial view of the Town Hall building, indicating the geographical orientation, a parameter of the utmost importance for optimising the BIPV design.

**Figure 6.** Town Hall building view from the top (with North orientation depicted).

Figure 5 shows that the ONYX solution will consist of the photovoltaic solution (represented as "PV glass") and a glass solution ("regular glass"). It is important to note that the aesthetic appearance of the glass solution will be similar to the photovoltaic solution. The need to include these two solutions in the skylight design lies in the impossibility of producing "incomplete" photovoltaic solutions, since they would imply cutting the crystalline solution.

This skylight from ONYX will guarantee the ideal working conditions of the building, both from the point of view of light and from the point of view of air conditioning. Figure 7 shows an ONYX aesthetic skylight implemented in a building of the Regional Government of Andalusia, whose aesthetic profile is similar to that designed for the main building in Paços do Concelho.

**Figure 7.** ONYX aesthetic skylight implemented in the Regional Government of Andalusia building.

Figure 8 presents the aesthetic photovoltaic skylight solution proposed by ONYX for covering the Roman Thermal Baths. As with the skylight of the main building of Paços do Concelho, the solution will be composed of a mixture of "PV glass" and "regular glass".

**Figure 8.** Aesthetic photovoltaic skylight solution proposed by ONYX for the Roman Thermal Baths (**a**); projection of the skylight over the thermal baths (**b**).

Figure 9 illustrates the interior aesthetic profile of an ONYX solution similar to that designed for the Baths.

**Figure 9.** ONYX aesthetic skylight solution similar to that designed for the thermal baths at the Town Hall.

Figures 10 and 11 show the new solution of photovoltaic tiles supplied by Tegola.

**Figure 10.** New Tegola photovoltaic tile.

**Figure 11.** Installation of new Tegola photovoltaic tiles.

The new solution presented by Tegola accommodates the requirements indicated by the Regional Culture Administration of Alentejo in the first interaction in 2020. Thus, the tile has an aesthetic aspect very similar to the traditional tile of the historic centre of Évora, notably its curved shape and clay colour. Additionally, the colour of the tile will be adapted to the colour of the tiles of the Building of Paços do Concelho, in order to ensure aesthetic harmony. Figure 12 shows the proposal for the location of Tegola tiles in the Building of Paços do Concelho. This location takes into account the budget limitation of the project while intending to ensure consistency between the different sections of the roof.

**Figure 12.** Location of Tegola tiles (in red) in the Municipality Palace Building.

#### *5.4. Arena d'Évora*

In Arena d'Évora, located on Av. Gen. Humberto Delgado, the Tegosolar PV solution by Tegola will be implemented. This solution consists of a lightweight covering of amorphous silicon photovoltaic cells, which provides roofing and insulation capabilities in addition to photovoltaic generation. Figure 13 illustrates the Tegosolar PV proposal to be installed in the Arena. The small coverage area results from the very strict weight requirements of the existing roof and from maximising the use of irradiance on the south slope of the building. In Figure 14, an example of a building with Tegosolar PV technology is presented.

**Figure 13.** Tegosolar PV solution to be installed on the roof of the Arena d'Évora.

**Figure 14.** Example of an installation with Tegosolar PV technology.

**Figure 15.** Arena's aerial view, with geographic representation (North).

#### *5.5. São Mamede School*

In the Basic School of São Mamede, located in Largo Dr. Evaristo Cutileiro, two types of solutions will be implemented: aesthetic photovoltaic pergolas and photovoltaic windows provided by ONYX. The ONYX photovoltaic pergola will be installed as set out in Figure 16. Figure 17 shows an example of an ONYX photovoltaic pergola installed in Tony Gallardo Park in the Canary Islands, which illustrates the aesthetic aspect of the solution proposed by ONYX for the S. Mamede School. Regarding the ONYX photovoltaic windows, a study was made for their installation on the school's ground floor. The target windows of this installation are identified in Figure 18 within the blue-outlined zone.

**Figure 16.** Aesthetic photovoltaic pergola solution proposed by ONYX for S. Mamede School.

**Figure 17.** Photovoltaic pergola solution installed in Tony Gallardo Park, in the Canary Islands.

**Figure 18.** Ground-floor (floor 0) windows that will be replaced by ONYX photovoltaic windows, marked in blue.

Figure 19 illustrates the solution presented by ONYX for the windows mentioned above. The photovoltaic window consists of triple laminated amorphous silicon glass with a semi-transparency degree of 20%.

**Figure 19.** Illustration of the ONYX study for the installation of photovoltaic windows.

The semi-transparency of 20% will improve the climate control of the ground-floor corridor while only marginally reducing the interior light, an option that meets the requirements of the school administration. From the aesthetic point of view, Figure 20 shows an installation similar to the one planned for São Mamede School.

**Figure 20.** ONYX photovoltaic window in San Francisco, USA.

#### *5.6. Rossio de São Brás School*

In the Basic School of the 1st Cycle of Rossio de São Brás (Figure 21), located on Av. Fighters of The Great War 2, two aesthetic photovoltaic pergolas by ONYX will be implemented. The pergolas, represented in Figure 22 in areas A and B, aim at providing a covered play area for children, thus offering protection against sun exposure and weather. The proposal meets the requirements of the school administration: in area A, an existing cover will be replaced, and in area B, a small pergola will be replaced, adding, however, a larger coverage area. The aesthetic aspect granted by the solutions proposed by ONYX for the Rossio School of São Brás is shown in Figures 23 and 24.

**Figure 21.** Rossio de São Brás School aerial view.

**Figure 22.** Location of pergolas to be installed by ONYX identified by A and B (**a**); Representation of ONYX photovoltaic pergolas (**b**).

**Figure 23.** Illustration of ONYX roofs to be installed at the Rossio School of São Brás (A).

**Figure 24.** Illustration of ONYX roofs to be installed at the Rossio School of São Brás (B).

#### *5.7. Parking Lot*

Regarding the parking lot located on Avenida Engenheiro Arantes e Oliveira, in the vicinity of the Garcia de Resende Theatre (see Figure 25), Tegosolar PV photovoltaic roofs will be installed in order to provide a shading area for parked cars. The installation details are illustrated in Figures 26 and 27. The aesthetic aspect of the cover will be the same as the Tegosolar PV technology presented in Figure 14.

**Figure 25.** Location of the car park targeted by BIPV solutions on Avenida Engenheiro Arantes e Oliveira, in the vicinity of the Garcia de Resende Theatre.

**Figure 26.** Identification of the type of roof to be installed (**a**) and the respective stratigraphy (**b**).

**Figure 27.** Layout of the photovoltaic roofs with Tegosolar PV to be installed in the parking lot.

#### *5.8. LVpDÉ*

The Living Laboratory for the Decarbonisation of Évora, alias LVpDÉ, located in Rua do Raimundo (see Figure 28), will house two types of BIPV solutions: ONYX's aesthetic photovoltaic pergolas and roof covers with Tegosolar PV technology.

**Figure 28.** Living Laboratory for the Decarbonisation of Évora (LVpDÉ), located in Rua do Raimundo.

As mentioned, the ONYX photovoltaic pergola will be installed in three distinct locations. The three solutions will consist of a double laminated glass of crystalline silicon and its aesthetics will be similar to that presented in Figure 16. Note that, in the case of LVpDÉ (see Figure 29), no pillars/support beams are required, and the ONYX solution is supported by the existing facades.

**Figure 29.** ONYX solutions to be installed on LVpDÉ.

Regarding Tegosolar PV technology, Figure 30 shows the proposal made for the roof of LVpDÉ. Note that Tegosolar PV technology is indicated in blue, while the dark grey areas refer to a material with the same aesthetic appearance as Tegosolar PV. The reason for applying two distinct layers lies in the impossibility of "cutting" and adapting photovoltaic material to the corners of the surfaces. As an illustrative example, Figure 31 shows the installation of the Tegosolar PV product on a residential roof.

**Figure 30.** Application of Tegosolar PV technology in the coverage of LVpDÉ.

**Figure 31.** Tegosolar PV technology in a residential penthouse.

#### *5.9. Municipal Market 1º de Maio (Fruit Market)*

Concerning the Municipal Market 1º de Maio, located in 1º de Maio Square, an ONYX aesthetic photovoltaic skylight will be installed in the blue-outlined area of Figure 32. The proposed solution for the market consists of a double laminated glass of crystalline silicon, whose aesthetic design is similar to that of the Edmonton Congress Centre, represented in Figure 33.

**Figure 32.** Municipal Market 1º de Maio, with the location of the ONYX photovoltaic skylight.

**Figure 33.** ONYX photovoltaic skylight installed at the Edmonton Congress Centre.

#### *5.10. Summary of Solutions with Renewable Generation Capacity*

In this section, a summary of the renewable generation capacity that the BIPV solutions will introduce in the historic centre is given. Table 4, which details the generation foreseen for each of the installations, shows that, with the sized powers, an annual renewable production of about 845 MWh will be achieved.


**Table 4.** Annual productions associated with BIPV solutions of municipal buildings.

Additionally, the surplus generation that will arise at certain times of the year (e.g., during weekends, schools will continue to have production but their consumption will be residual) will not be sold to the national electricity grid as usual, but rather shared with other municipal buildings of the historic centre, in light of the recent Portuguese Decree-Law No. 162/2019. Thus, BIPV solutions will not be the only disruptive solution implemented in the historic centre; they will also drive an innovative renewable energy community involving the buildings of the historic centre of the Municipality of Évora.

#### **6. Conclusions**

This paper presents the study carried out for the implementation of BIPV solutions in the historic centre of Évora within the framework of the European project POCITYF (Project H2020). The study, carried out throughout the demanding year 2020, considered all observations of the Regional Directorate of Culture of Évora and the administration of the involved schools (including the Association of Parents), the needs of the Municipality of Évora, and the capabilities of technology developers ONYX and Tegola. Thus, this study seeks to accommodate all the guidelines for preserving the historic centre, while proposing to achieve the positivity metrics agreed with the European Commission on the challenging and indispensable path to the decarbonisation of European cities.

At POCITYF, in addition to the creation of the three PEBs, we are committed to empowering the citizens of the historic centre of Évora, so that they can have the same opportunities to engage in the energy transition as any other citizen. Moreover, the solutions presented here have a high replication potential not only in Portugal but also in Europe. It is a distinctive feature of POCITYF to contribute new and disruptive solutions that can provide Europe, especially its historic cities, with tools to lead electrical decarbonisation. As POCITYF continues to bring its cities to the new "smart cities" era, more results will arise and will be presented in future works, along with other useful information, such as 3D images showing the level and performance of the integration of the new technologies in the cities.

**Author Contributions:** Conceptualisation, P.T. and G.T.; methodology, G.T. and N.B.; resources, N.B. and J.F.; writing—original draft preparation, G.T., J.F. and P.T.; writing—review and editing, D.I. and D.T.; supervision, D.I. and D.T.; project administration, D.I. and D.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the POCITYF project (A positive energy city transformation framework), Grant agreement number 864400, which received funding from the European Union's framework programme Horizon 2020 for research and innovation.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Extracting Semantic Relationships in Greek Literary Texts**

**Despina Christou \* and Grigorios Tsoumakas**

School of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece; greg@csd.auth.gr **\*** Correspondence: christoud@csd.auth.gr

**Abstract:** In the era of Big Data, the digitization of texts and the advancements in Artificial Intelligence (AI) and Natural Language Processing (NLP) are enabling the automatic analysis of literary works, allowing us to delve into the structure of artifacts and to compare, explore, manage and preserve the richness of our written heritage. This paper proposes a deep-learning-based approach to discovering semantic relationships in literary texts (19th century Greek Literature), facilitating the analysis, organization and management of collections through the automation of metadata extraction. Moreover, we provide a new annotated dataset used to train our model. Our proposed model, REDSandT\_Lit, recognizes six distinct relationships, extracting the richest set of relations up to now from literary texts. It efficiently captures the semantic characteristics of the investigated time period by fine-tuning the state-of-the-art transformer-based Language Model (LM) for Modern Greek on our corpora. Extensive experiments and comparisons with existing models on our dataset reveal that REDSandT\_Lit has superior performance (90% accuracy), manages to capture infrequent relations (100% F in long-tail relations) and can also correct mislabelled sentences. Our results suggest that our approach efficiently handles the peculiarities of literary texts, and it is a promising tool for managing and preserving cultural information in various settings.

**Citation:** Christou, D.; Tsoumakas, G. Extracting Semantic Relationships in Greek Literary Texts. *Sustainability* **2021**, *13*, 9391. https://doi.org/10.3390/su13169391

Academic Editors: Charalampos Dimoulas and Amir Mosavi

Received: 30 June 2021 Accepted: 16 August 2021 Published: 21 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Keywords:** relation extraction; distant supervision; deep neural networks; Transformers; Greek NLP; literary fiction; heritage management; metadata extraction; Katharevousa

#### **1. Introduction**

An important part of humanity's cultural heritage resides in its literature [1], a rich body of interconnected works revealing the history and workings of human civilization across the eras. Major novelists have produced their works by engaging with the spirit of their time [2] and capturing the essence of society, human thought and accomplishment.

Cultural Heritage (CH) in its entirety constitutes a "cultural capital" for contemporary societies because it contributes to the constant valorization of cultures and identities. Moreover, it is also an important tool for the transmission of expertise, skills and knowledge across generations and is closely related to the promotion of cultural diversity, creativity and innovation [3]. For this reason, proper management of the development potential of CH requires a sustainability-oriented approach, i.e., one that ensures both the preservation of the heritage from loss and its connection to the present and the future. Proper management of literary cultural heritage, therefore, requires extensive digitization of collections and procedures that allow for the automatic extraction of semantic information and metadata to ensure the organization of past collections and their linkage with present and future documents.

Until recently, engaging with a large body of literature and discovering insights and links between storytellers and cultures was a painstaking process which relied mainly on close reading [4]. Nowadays, however, the large-scale digitization of texts as well as developments in Artificial Intelligence (AI) and Natural Language Processing (NLP) are making it possible to explore the richness of our written heritage at an unprecedented scale and with methods that were not possible before, while facilitating the management and preservation of texts [5].

One of the opportunities afforded by digitization is *relation extraction* (RE): the automatic discovery of relations between entities in a document. This task plays a central role in NLP because relations can be used to populate knowledge bases (KB), to index corpora in search engines, to answer questions related to the text, to assist in the comparative analysis of texts and to understand/analyze the narration of a story. In this paper, we present a novel deep-learning model for RE that enables applications in all of the above domains by automatically identifying relations between entities in 19th century Greek literary texts. Although there are several RE approaches in the literature, the particular texts we are interested in (fiction), the language and the specific period all present significant challenges. We will have more to say about these shortly.

Most RE methods follow a supervised approach; thus, the large amount of labeled training data they require constitutes perhaps the greatest barrier for real-world applications. In order to overcome this challenge, RE research has adopted distantly supervised approaches that are based upon automatically constructed datasets. Towards that end, Reference [6] proposed to use distant supervision (DS) from a KB, assuming that if two entities in a KB exhibit a relation, then all sentences mentioning those entities express that relation. This assumption inevitably results in false positives and in distantly generated records containing incorrect labels. In order to mitigate the problem of *wrong labeling*, Reference [7] relaxed the assumption so that it does not apply to all instances and, together with [8,9], proposed multi-instance learning. In that setting, classification shifts from instance-level to bag-level, with current state-of-the-art RE methods focusing on reducing the effect of noisy instances.
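The distant supervision assumption described above can be sketched in a few lines. The snippet is a minimal illustration rather than the pipeline used in this paper: the knowledge base, the relation name `place_of_birth`, the English example sentences and the simple substring matching are all hypothetical choices made for readability.

```python
from collections import defaultdict

# Hypothetical KB: (head entity, tail entity) -> relation
KB = {
    ("Papadiamantis", "Skiathos"): "place_of_birth",
}


def distant_label(sentences, kb):
    """Label every sentence that mentions both entities of a KB pair with
    the KB relation. Returns bags of labeled sentences keyed by entity pair."""
    bags = defaultdict(list)
    for sent in sentences:
        for (head, tail), relation in kb.items():
            # Naive substring matching stands in for real entity recognition.
            if head in sent and tail in sent:
                bags[(head, tail)].append((sent, relation))
    return dict(bags)


sentences = [
    "Papadiamantis was born in Skiathos.",
    "Papadiamantis returned to Skiathos late in life.",  # noisy label
    "Skiathos is an island.",  # only one entity: not labeled
]
bags = distant_label(sentences, KB)
```

The second sentence receives the same label although it does not express the relation, which is exactly the *wrong labeling* problem that bag-level (multi-instance) learning tries to mitigate.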

At the same time, extracting relations from literary texts has been undertaken only in the broader context of people in dialogue [10–13], people in the same place [14] and event extraction [14,15] and not, thus far, in the context of predefined relations among named entities other than person and place. We also emphasize that state-of-the-art RE approaches are evaluated mostly on news corpora. This matters because literary texts put emphasis on the narrative craft and exhibit characteristics that go beyond journalistic, academic, technical or more structured forms of literature. Moreover, literary texts are characterized by creative writing peculiarities that can vary significantly from author to author and from period to period. In addition, as most works of literature have been digitized through OCR systems, the digitized versions can also suffer from character or word misspellings. All these factors make it extremely challenging to discover entity relations in literary texts.

In order to address these challenges, we propose REDSandT\_Lit (Relation Extraction with Distant Supervision and Transformers for Literature), a novel distantly supervised transformer-based RE model that can efficiently identify six distinct relationships from Greek literary texts of the 19th century, the period that "contains" the largest part of digitized Modern Greek literature. Since no related dataset exists, we undertook the construction of a new dataset including 3649 samples annotated through distant supervision with seven semantic relationships, including 'NoRel' for instances with non-labelled relation. Our dataset is in the *Katharevousa* variant of Greek, an older, more formal and more complex form of the Modern Greek language in which a great part of Modern Greek literature is written. In order to capture the semantic and syntactic characteristics of the language, we exploited the state-of-the-art transformer-based Language Model (LM) for Modern Greek (GREEK-BERT [16]), which we fine-tuned on our specific task and language. In order to handle the problem of noisy instances as well as the long sentences which are typical in literary writing, we guided REDSandT\_Lit to focus solely on a compressed form of the sentence that includes only the surrounding text of the entity pair together with their entity types. Finally, our model encodes sentences by concatenating the entity-pair type embeddings, with relation extraction occurring at bag-level as a weighted sum over the predictions of the bag's sentences. Regarding the selected transformer-based model, the reasons for choosing BERT [17] are twofold: (i) BERT is the only transformer-based model pre-trained on Modern Greek corpora [16], and (ii) BERT considers bidirectionality while training, with [18] showing that BERT captures a wider set of relations than GPT [19] under a DS setting.

Extensive experimentation and comparison of our model to several existing models for RE reveals REDSandT\_Lit's superiority. Our model captures with great precision (75–100% P) all relations, including the infrequent ones that other models failed to capture. Moreover, we will observe that fine-tuning a transformer-based model under a DS setting and incorporating entity-type side information highly boosts RE performance, especially for the relations in the long-tail of the distribution. Finally, REDSandT\_Lit manages to find additional relations that were missed during annotation.
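The bag-level prediction as a weighted sum over sentence predictions can be illustrated with a small sketch. It is written under stated assumptions and is not the authors' implementation: the per-sentence relevance scores, the use of a softmax to turn them into weights, and the three-relation example are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_prediction(sentence_probs, sentence_scores):
    """Weighted sum of per-sentence relation distributions.

    sentence_probs: one probability list over relations per sentence.
    sentence_scores: one (hypothetical) relevance score per sentence.
    """
    weights = softmax(sentence_scores)
    n_relations = len(sentence_probs[0])
    return [
        sum(w * probs[r] for w, probs in zip(weights, sentence_probs))
        for r in range(n_relations)
    ]


# Two sentences in one bag, three candidate relations.
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
bag = bag_prediction(probs, sentence_scores=[2.0, 0.0])
predicted_relation = bag.index(max(bag))
```

Because the weights sum to one, the bag-level output remains a probability distribution, and a sentence with a low weight (for example, a noisy one) contributes little to the final decision.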

Our proposed model is the first to extract semantic relationships from 19th century Greek literary texts, and the first, to our knowledge, to extract relationships between entities other than person and place; thus, we provide a broader and more diverse set of semantic information on literary texts. More precisely, we expand the boundaries of current research from narration understanding to extended metadata extraction. Even though online repositories provide several metadata that accompany digitized books to facilitate search and indexing, digitized literary texts contain rich semantic and cultural information that often goes unused. The six relationships identified by our model can further enrich the books' metadata, preserve more information and facilitate search and comparisons. Moreover, having access to a broader set of relations can boost downstream tasks, such as recommending similar books based on hidden relations. Finally, distant reading [4] is taken one step further, helping readers and storytellers understand the story setting more quickly and easily.

The remainder of this paper is organized as follows: Section 2 contains a brief literature review, Section 3 discusses our dataset and proposed methodology. Sections 4 and 5 contain our results and discussion, respectively.

#### **2. Related Work**

Our work is related to distantly supervised relation extraction, information extraction from literary texts and metadata enhancement.

#### *2.1. Distantly-Supervised Relation Extraction*

Distant supervision [20,21] plays a key role in RE, meeting its need for large amounts of training data in a simple and cost-effective manner. Mintz et al. [6] were the first to propose DS to automatically construct corpora for RE, assuming that all sentences that include an entity pair that has a relation in a KB express the same relation. Of course, this assumption is very loose and is accompanied by noisy labels. Multi-instance learning methods were proposed to alleviate the problem by performing relationship classification at the bag level, where a bag contains instances that mention the same entity pair [7,8].
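One common way to classify at the bag level is to assume that at least one sentence in the bag expresses the relation and to take, for each relation, the maximum score over the bag's sentences. The sketch below illustrates that idea only; the relation name, the probabilities and the 0.5 threshold are hypothetical, and the cited methods differ in how they aggregate.

```python
def bag_label(sentence_probs, relations, threshold=0.5):
    """Bag-level label under an 'at least one sentence' assumption.

    sentence_probs: one dict (relation -> probability) per sentence.
    relations: candidate relations, excluding 'NoRel'.
    Returns the best-scoring relation, or 'NoRel' below the threshold.
    """
    best = {r: max(p.get(r, 0.0) for p in sentence_probs) for r in relations}
    relation, score = max(best.items(), key=lambda kv: kv[1])
    return relation if score >= threshold else "NoRel"


relations = ["place_of_birth"]  # hypothetical relation name
clean_bag = [{"place_of_birth": 0.9}, {"place_of_birth": 0.2}]
noisy_bag = [{"place_of_birth": 0.3}, {"place_of_birth": 0.1}]
labels = (bag_label(clean_bag, relations), bag_label(noisy_bag, relations))
```

A single confident sentence is enough to label the first bag, whereas the second bag, in which no sentence supports the relation, falls back to 'NoRel'.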

With the training framework being typically the aforementioned, research focused on features and models that better suppress noise. Until the advent of neural networks (NNs), researchers used simple models heavily relying on handcrafted features (part-of-speech tags, named entity tags, morphological features, etc.) [7,8]. Later on the focus turned to model architecture. Initially, a method based on a convolutional neural network (CNN) was proposed by [22] to automatically capture the semantics of sentences, while piecewise-CNN (PCNN) [23] became the common architecture for embedding sentences and handling DS noise [24–29]. Moreover, Graph-CNNs (GCNN) proved an effective method for encoding syntactic information from text [30].

Pre-trained language models (LMs) that rely on the transformer architecture [31] and enable the transfer of common knowledge to downstream tasks have been shown to capture semantic and syntactic features better [32]. In particular, it has been shown that pre-trained LMs significantly improve the performance in text classification tasks, preventing overfitting and increasing sample efficiency [33]. Moreover, methods in [34,35] that fine-tune pre-trained LMs, as also observed in [18,19], which extended the GPT [32] and BERT [17] models, respectively, to the DS setting by incorporating a multi-instance training mechanism, show that pre-trained LMs provide a stronger signal for DS than specific linguistic and side-information features [30].

#### *2.2. Information Extraction from Literary Texts*

While relation extraction has been extensively studied in news and biomedical corpora, extracting semantic relationships from literary texts is a much less studied area. Existing research attempts to understand narration mostly from the viewpoint of character relationships but not to augment existing KBs or enhance a story's metadata in an online repository. An explanation based on [10] is the difficulty in automatically determining meaningful interpretations (i.e., predefined relations) and the lack of semantically annotated corpora. Therefore, most research is focused on extracting a limited set of relationships among characters, such as "interaction" [10–12], "mention" [10] and "family" [13].

The key challenges in extracting relations from literary texts are laid out in [36], an excellent survey on extracting relations among fictional characters. The authors point out that there can be significant stylistic differences among authors and grammatical irregularities in books of different periods, while the closed-world fashion of fiction, where the plot involves recurring entities, entails coreference resolution issues. This work aims at capturing relations not only between people or places but also between organizations, dates and work of art titles.

#### *2.3. Metadata Enhancement*

It was only two decades ago that book information was available solely by accessing libraries. Nowadays, by contrast, we suffer from information overload, with libraries maintaining their own databases to facilitate search [37].

With increasing digital content being added to the enormous collections of libraries, archives, etc., providing machine-readable structured information to facilitate information integration and presentation [38] is becoming increasingly important and challenging. Moreover, research has shown that the metadata provided for fiction books strongly affects readers' selection of a book and their perception of the story [39,40]. For that reason, we believe that enhancing the metadata of literary texts is crucial.

#### **3. Materials and Methods**

As discussed in the Introduction, extracting cultural information from literary texts demands either a plethora of annotations or robust augmentation techniques that can capture a representative sample of annotations and boost machine learning techniques. Meanwhile, automatically augmented datasets are always accompanied by noise, while creative writing's characteristics set an extra challenge.

In this section, we present a new dataset for Greek literary fiction from the 19th century. The dataset was created by aligning entity pair-relation triplets to a representative sample of Greek 19th century books. Even though we efficiently manage to augment the training samples, these inevitably suffer from noise and include imbalanced labels. Moreover, the special nature of the 19th century Greek language sets an extra challenge.

We present our model as follows: a distantly supervised transformer-based RE method based on [18] that has proven to efficiently suppress noise from DS using multi-instance learning and exploiting a pre-trained transformer-based LM. Our model proposes a simpler configuration for representing the embedding of the final sentence, which manages to capture a larger number of relations by using information about the entity types and the Greek BERT's [16] pre-trained model.

#### *3.1. Benchmark Dataset*

Preserving semantic information from cultural artifacts requires either extensive annotation that is rarely available or automatically augmented datasets to sufficiently capture context. In the case of literary texts, no dataset exists to train our models. Taking into account that the greatest part of digitized Modern Greek literature refers to the 19th century, we construct our dataset by aligning relation-triples from [41] to twenty-six (26) literary Greek books of the 19th century (see Table A1). Namely, we use the provided relation triplets (i.e., head-tail-relationship triplets) as an external knowledge base (KB) to automatically extract sentences that include the entity pairs, assuming that these sentences also express the same relationship (distant supervision).

The dataset's six specific relations and their statistics can be found in Table 1. Train, validation and test datasets follow an 80%-10%-10% split. We assume that a relationship can occur within a span of three consecutive sentences and only between two named entities. Sentences that include at least two named entities of different types but do not constitute a valid entity pair are annotated with a "NoRel" relation. These can reflect either sentences with no actual underlying relation or sentences for which the annotation is missing. The dataset also includes the named entity types of the sentence's entity pair. The following five entity types are utilized: person (PER), place (GPE), organization (ORG), date (DATE) and book title (TITLE). We made this dataset publicly available (Data available at: https://github.com/intelligence-csd-auth-gr/extracting-semantic-relationships-from-greek-literary-texts (accessed on 3 August 2021)) to encourage further research on 19th century Greek literary fiction.
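The alignment step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `distant_label`, the mention format, the toy English sentences and the direction of the "artAuthor" triple are all assumptions made for illustration. The sketch only shows the two stated rules, i.e., a pair is considered when its mentions fall within three consecutive sentences, and a pair of differently typed entities that is absent from the KB is labelled "NoRel".

```python
from itertools import combinations
from typing import Dict, List, Tuple

# (sentence index, entity text, entity type); a hypothetical mention format.
Mention = Tuple[int, str, str]


def distant_label(sentences: List[str], mentions: List[Mention],
                  kb: Dict[Tuple[str, str], str], window: int = 3) -> List[dict]:
    """Label entity pairs that co-occur within `window` consecutive sentences."""
    samples = []
    for (i, a, a_type), (j, b, b_type) in combinations(sorted(mentions), 2):
        if j - i >= window or a == b:
            continue  # too far apart, or the same entity mentioned twice
        text = " ".join(sentences[i:j + 1])
        if (a, b) in kb:                      # KB triple stored as (head, tail)
            head, tail, rel = (a, a_type), (b, b_type), kb[(a, b)]
        elif (b, a) in kb:
            head, tail, rel = (b, b_type), (a, a_type), kb[(b, a)]
        elif a_type != b_type:                # no KB triple: "NoRel" sample
            head, tail, rel = (a, a_type), (b, b_type), "NoRel"
        else:
            continue
        samples.append({"text": text, "head": head, "tail": tail, "relation": rel})
    return samples


# Toy data, invented purely for illustration.
toy_sentences = ["Papadiamantis was born in Skiathos.", "He wrote a lot.",
                 "The Murderess appeared later.", "Nothing here.", "It was 1903."]
toy_mentions = [(0, "Papadiamantis", "PER"), (0, "Skiathos", "GPE"),
                (2, "The Murderess", "TITLE"), (4, "1903", "DATE")]
toy_kb = {("The Murderess", "Papadiamantis"): "artAuthor"}
toy_samples = distant_label(toy_sentences, toy_mentions, toy_kb)
```

Because every co-occurrence of a KB pair is labelled with the KB relation, some of the resulting samples will not actually express it, which is the source of the DS noise discussed in this paper.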


**Table 1.** Dataset's Statistics.

The challenges of this dataset are threefold. First, similar to all datasets created via distant supervision, ours suffers from noisy labels (false positives) and is imbalanced, including relations with a varying number of samples. Second, the dataset includes misspellings stemming from the books' digitization through OCR systems. Lastly, the documents use a conservative form of the Modern Greek language, *katharevousa*, which was used between the late 18th century and 1976. Katharevousa, which covers a significant part of Modern Greek literature, is more complex than Modern Greek, including additional cases, compound words and other grammatical features that set an extra challenge for the algorithm.

#### *3.2. The Proposed Model Architecture*

In this section, we present our approach towards extracting semantic relationships from literary texts. The specific challenges that we have to address are as follows: DS noise, imbalanced relations, character misspellings due to OCR, the Katharevousa form of the Greek language and the peculiarities of creative writing. Inspired by [18,19], who showed that DS and pre-trained models can suppress noise and capture a wider set of relations, we propose an approach that efficiently handles the aforementioned challenges by using multi-instance learning, exploiting a pre-trained transformer-based language model and incorporating entity type side-information.

In particular, given a bag of sentences {*s*<sub>1</sub>, *s*<sub>2</sub>, . . . , *s*<sub>*n*</sub>} that concern a specific entity pair, our model generates a probability distribution on the set of possible relations. The model utilizes the GREEK-BERT pre-trained LM [16] to capture the semantic and syntactic features of sentences by transferring pre-trained common-sense knowledge. In order to capture the specific patterns of our corpus, we fine-tuned the model using multi-instance learning; namely, we trained our model to extract the entity pairs' underlying relation given their associated sentences.

During fine-tuning, we employ a structured, RE-specific input representation to minimize architectural changes to the model [42]. Each sentence is transformed to a structured format, including a compressed form of the sentence along with the entity pair and their entity types. We transform the input into a sub-word level distributed representation using byte-pair encoding (BPE) and positional embeddings from GREEK-BERT fine-tuned on our corpus. Lastly, we concatenate the head and tail entities' type embeddings, as shaped from BERT's last layer, to form the final sentence representation that we use to classify the bag's relation.

The proposed model can be summarized in three components: the sentence encoder, the bag encoder and model training. Components are described in the following sections with the overall architecture shown in Figures 1 and 2.

**Figure 1.** Sentence Representation in REDSandT\_Lit. The input embedding *h*<sub>0</sub> is created by summing the positional and byte-pair embeddings for each token in the structured input. States *h*<sub>*t*</sub> are obtained by self-attending over the states of the previous layer *h*<sub>*t*−1</sub>. The final sentence representation is shaped by concatenating the head entity embedding *h*<sub>*L*<sub>*h-type*</sub></sub> and the tail entity embedding *h*<sub>*L*<sub>*t-type*</sub></sub>. Head and tail entity type embeddings are marked with bold lines.

**Figure 2.** Transformer architecture (**left**) and training framework (**right**). We used the BERT transformer architecture and precisely the *bert-base-greek-uncased-v1* GREEK-BERT LM. Sentence representation *s*<sub>*i*</sub> is formed as shown in Figure 1. Reprinted with permission from [18]. Copyright 2021 Despina Christou.

#### 3.2.1. Sentence Encoder

Our model encodes sentences into a distributed representation by concatenating the head (*h*) and tail (*t*) entity type embeddings. The overall sentence encoding is depicted in Figure 1, while the following sections examine in brief the parts of the sentence encoder in a bottom-up manner.

In order to capture the relation hidden between an entity pair and its surrounding context, RE requires structured input. To this end, we encoded sentences as a sequence of tokens. This representation is shown at the very bottom of Figure 1: it starts with the head entity type and token(s) followed by the delimiter [H-SEP], continues with the tail entity type and token(s) followed by the delimiter [T-SEP] and ends with the token sequence of a compressed form of the sentence. The whole input starts and ends with the special delimiters [CLS] and [SEP], respectively, which are typically used in transformer models. In BERT, for example, [CLS] acts as a pooling token representing the whole sequence for downstream tasks, such as RE. We do not follow that convention. Furthermore, tokens refer to the sub-word tokens of each word, where each word is also lower-cased and normalized in terms of accents and other diacritics; for example, the word "Αρσάκειο" (Arsakeio) is split into the "αρ" ("ar"), "##σα" ("##sa") and "##κειο" ("##keio") sub-word tokens.
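To make the layout of the structured input concrete, the following minimal sketch assembles it at the word level, before any sub-word tokenization. The helper names `normalize` and `build_input` are hypothetical, and the bracketed surface form of the entity type tokens (e.g., `[ORG]`) is an assumption, since the text only states that five entity type tokens exist. Normalization is approximated here by lower-casing and removing combining marks with Python's standard `unicodedata` module.

```python
import unicodedata
from typing import List


def normalize(word: str) -> str:
    """Lower-case a word and strip accents and other combining diacritics."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def build_input(head: str, head_type: str, tail: str, tail_type: str,
                compressed: str) -> List[str]:
    """Lay out: [CLS] head-type head [H-SEP] tail-type tail [T-SEP] text [SEP]."""
    tokens = ["[CLS]", f"[{head_type}]"]          # assumed type-token format
    tokens += [normalize(w) for w in head.split()]
    tokens += ["[H-SEP]", f"[{tail_type}]"]
    tokens += [normalize(w) for w in tail.split()]
    tokens += ["[T-SEP]"]
    tokens += [normalize(w) for w in compressed.split()]
    tokens.append("[SEP]")
    return tokens


example = build_input("Αρσάκειο", "ORG", "Αθήνα", "GPE", "Αρσάκειο Αθήνα")
```

In this sketch the normalized word "αρσακειο" is exactly the concatenation of the three sub-word pieces given in the example above, which a sub-word tokenizer would produce in a later step.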

#### **Input Representation**

As discussed in Section 3.1, samples including a relation can span up to three sentences; thus, samples generally referred to as sentences within the document can contain information that is not directly related to the underlying relation. Moreover, creative writing's focus on narration results in long secondary sentences that further disrupt the content linking the two entities. In order to focus on the tokens that matter for the relation, we adopt two distinct compression techniques, namely the following:


Our selection is based on the fact that context closer to the entities holds the most important relational information. We experimented with two compressed versions of the text, one that keeps all text between the two entities (*trim\_text\_1*) and one that keeps only the very close context (*trim\_text\_2*), assuming that the in-between text, if long enough, typically constitutes a secondary sentence that is irrelevant to the underlying relation. Our assumption is confirmed by our experiments (see Sections 4 and 5).
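The difference between the two compressed versions can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure: entities are taken to be single word-level tokens at known positions, and the context size `k` of *trim\_text\_2* is a hypothetical parameter, as the exact window is not specified here.

```python
from typing import List


def trim_text_1(tokens: List[str], head_idx: int, tail_idx: int) -> List[str]:
    """Keep the two entities and all text between them."""
    lo, hi = sorted((head_idx, tail_idx))
    return tokens[lo:hi + 1]


def trim_text_2(tokens: List[str], head_idx: int, tail_idx: int,
                k: int = 2) -> List[str]:
    """Keep each entity plus up to `k` tokens on either side (k is assumed)."""
    keep = set()
    for idx in (head_idx, tail_idx):
        keep.update(range(max(0, idx - k), min(len(tokens), idx + k + 1)))
    return [tok for i, tok in enumerate(tokens) if i in keep]


toy_tokens = "a b H c d e f g h T i j".split()
between = trim_text_1(toy_tokens, 2, 9)   # entities and everything in between
nearby = trim_text_2(toy_tokens, 2, 9)    # long in-between text is dropped
```

When the two entities are close together, the two windows of `trim_text_2` overlap and the in-between text is kept in full, so the second variant only discards text when the gap is long.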

After compressing the sentences to a more compact form, we also incorporate the head and tail entities' text and types at the beginning of the structured input to bias the LM towards the features that are important for the entity pair. Extensive experimentation reveals that the extracted entity type embeddings hold the most significant information for extracting the underlying relation between two entities. Entity types are considered known and are also provided in the dataset.

#### **Input Embeddings**

Input embeddings to GREEK-BERT are presented as *h*<sub>0</sub> in Figure 1. Each token's embedding results from summing the positional and byte-pair embeddings for each token in the structured input.

Position embedding is an essential part of BERT's attention mechanism, while byte-pair embedding is an efficient method for encoding sub-words to account for vocabulary variability and possible new words at inference time.

To make use of sub-word information, the input is tokenized using byte-pair encoding (BPE). We use the tokenizer of the pre-trained model (35,000 BPEs) to which we added seven task-specific tokens (e.g., [H-SEP], [T-SEP] and five entity type tokens). We forced the model not to decompose the added tokens into sub-words because of their special meaning in the input representation.

#### **Sentence Representation**

The input sequence is transformed into feature vectors (*h*<sub>*L*</sub>) using GREEK-BERT's pre-trained language model fine-tuned on our task. Each sub-word token feature vector (*h*<sub>*L*<sub>*i*</sub></sub>, for token positions *i* up to *D*<sub>*t*</sub>) is the result of BERT's attention mechanism over all tokens. Intuitively, the feature vectors of specific tokens are more informative and contribute more to identifying the underlying relationship.

To the extent that each relation constrains the type of the entities involved and vice versa [30,43], we represent each sentence by concatenating the head and tail entities' type embeddings:

$$\mathbf{s}_{i} = [h_{L_{h-type}}; h_{L_{t-type}}] \tag{1}$$

where *s*<sub>*i*</sub> ∈ ℜ<sup>*d*<sub>*h*</sub>∗2</sup>.

While it is typical to encode sentences using the vector of the [CLS] token in *h*<sub>*L*</sub> [11], our experiments show that representing a sentence as a function of the types of the entity pair under examination reduces noise, improves precision and helps in capturing the infrequent relations.
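Equation (1) can be illustrated with a small, framework-free sketch. It assumes that the last-layer states are available as one vector per input token and that the entity type tokens are kept whole by the tokenizer, so that the head type token sits right after [CLS] and the tail type token right after [H-SEP]. The function name `sentence_representation` and the toy vectors are hypothetical.

```python
from typing import List, Sequence


def sentence_representation(last_layer: Sequence[Sequence[float]],
                            tokens: Sequence[str]) -> List[float]:
    """Concatenate the last-layer vectors of the head and tail type tokens."""
    head_pos = 1                              # type token right after [CLS]
    tail_pos = tokens.index("[H-SEP]") + 1    # type token right after [H-SEP]
    return list(last_layer[head_pos]) + list(last_layer[tail_pos])


toy_input = ["[CLS]", "[PER]", "x", "[H-SEP]", "[GPE]", "y", "[T-SEP]", "z", "[SEP]"]
# Toy 2-dimensional "hidden states", one per token (real states have d_h dims).
toy_states = [[float(i), float(-i)] for i in range(len(toy_input))]
s_i = sentence_representation(toy_states, toy_input)
```

The resulting vector has twice the hidden dimension, in line with the dimensionality stated after Equation (1).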

Several other representation techniques were tested: we tried additionally concatenating the [CLS] vector to embed the overall sentence's information, as well as using the sentence representation from [18], which includes relation embeddings and further attention mechanisms. The presented method outperformed both. Our intuition is that the LM was not able to efficiently capture patterns in Katharevousa, since manual observation revealed that most words were split into many sub-words. This occurs because Katharevousa differs from Modern Greek, while some words/characters were also misspelled in the OCR process.

#### *3.3. Bag Encoder*

Bag encoding, i.e., the aggregation of sentence representations in a bag, aims to reduce the noise generated by the erroneously annotated relations that accompany DS.

Assuming that not all sentences contribute equally to the bag's representation, we use selective attention [24] to highlight the sentences that better express the underlying relation.

$$B = \sum_{i} \alpha_{i} s_{i}, \tag{2}$$

As observed in the above equation, selective attention represents each bag as a weighted sum over its individual sentences. Attention *α*<sub>*i*</sub> is calculated by comparing each sentence representation against a learned representation *r*:

$$\alpha_{i} = \frac{\exp(s_{i} r)}{\sum_{j=1}^{n} \exp(s_{j} r)} \tag{3}$$

Finally, the bag representation *B* is fed to a softmax classifier in order to obtain the probability distribution over the relations:

$$p(r) = \text{Softmax}(W_{r} \cdot B + b_{r}), \tag{4}$$

where *W*<sub>*r*</sub> is the relation weight matrix, and *b*<sub>*r*</sub> ∈ ℜ<sup>*d*<sub>*r*</sub></sup> is the bias vector.
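Equations (2)–(4) can be traced numerically with the following minimal sketch, written with plain Python lists so that it stays self-contained. The function names and the toy values for the sentence vectors, the learned representation `r`, the weight matrix `W` and the bias `b` are hypothetical; in the actual model these quantities are learned during fine-tuning.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def bag_probabilities(sentences: Sequence[Sequence[float]], r: Sequence[float],
                      W: Sequence[Sequence[float]], b: Sequence[float]
                      ) -> Tuple[List[float], List[float], List[float]]:
    """Return attention weights, bag vector and relation probabilities."""
    alphas = softmax([dot(s, r) for s in sentences])              # Eq. (3)
    bag = [sum(a * s[k] for a, s in zip(alphas, sentences))       # Eq. (2)
           for k in range(len(r))]
    probs = softmax([dot(row, bag) + bias for row, bias in zip(W, b)])  # Eq. (4)
    return alphas, bag, probs


toy_bag = [[1.0, 0.0], [0.0, 1.0]]          # two sentence representations
alphas, bag, probs = bag_probabilities(toy_bag, r=[0.0, 0.0],
                                       W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
```

With a zero query vector all sentences receive equal attention; a query aligned with one sentence shifts the weight towards it, which is how noisy sentences in a bag can be down-weighted.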

#### *3.4. Training*

Our model utilizes a transformer model, precisely GREEK-BERT, which we fine-tune on our specific setup to capture the semantic features of relational sentences. Below, we briefly present the overall process.

#### **Pre-training**

For our experiments, we use the pre-trained *bert-base-greek-uncased-v1* language model [16], which consists of 12 layers, 12 attention heads and 110M parameters where each layer is a bidirectional Transformer encoder [31]. The model is trained on uncased Modern Greek texts of Wikipedia, European Parliament Proceedings Parallel Corpus (Europarl) and OSCAR (clean part of Common Crawl) with a total of 3.04B tokens. GREEK-BERT is pre-trained using two unsupervised tasks, masked LM and next sentence prediction, with masked LM being its core novelty as it allows the previously impossible bidirectional training.

#### **Fine-tuning**

We initialize our model's weights with the pre-trained GREEK-BERT model and, after experimentation, fine-tune only the last four layers under the multi-instance learning setting presented in Figure 2, using the specific input shown in Figure 1.

During fine-tuning, we optimize the following objective:

$$L(D) = \sum_{i=1}^{|B|} \log P(l_{i} \mid B_{i}; \theta) \tag{5}$$

where, for all |*B*| entity pair bags in the dataset, we want to maximize the probability of correctly predicting the bag's relation (*l*<sub>*i*</sub>) given its sentences' representation and parameters (θ).
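As a small worked illustration of Equation (5), the sketch below sums the log-probabilities that a classifier assigns to the correct relation of each bag. The function name and the toy probabilities are hypothetical. Maximizing this quantity is equivalent to minimizing its negative, i.e., the summed cross-entropy over bags.

```python
import math
from typing import Sequence


def bag_log_likelihood(probs_per_bag: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> float:
    """Sum of log P(l_i | B_i) over all bags, as in Eq. (5)."""
    return sum(math.log(p[l]) for p, l in zip(probs_per_bag, labels))


toy_probs = [[0.5, 0.5], [0.25, 0.75]]   # predicted distributions for two bags
toy_labels = [0, 1]                      # index of the correct relation per bag
objective = bag_log_likelihood(toy_probs, toy_labels)
cross_entropy = -objective               # the quantity that is minimized
```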

#### *3.5. Experimental Setup*

#### 3.5.1. Hyper-Parameter Settings

In our experiments, we utilize the *bert-base-greek-uncased-v1* model with hidden layer dimension *D*<sub>*h*</sub> = 768, while we fine-tune the model with *max\_seq\_length* *D*<sub>*t*</sub> = 128. We use the Adam optimization scheme [44] with *β*<sub>1</sub> = 0.9, *β*<sub>2</sub> = 0.999 and a cosine learning rate decay schedule with warm-up over 0.1% of training updates. We minimize the loss using the cross-entropy criterion.
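For readers unfamiliar with this kind of schedule, the following sketch shows one common form of a cosine decay with warm-up. The exact formula used in our implementation is not spelled out here, so the linear warm-up shape, the function name `lr_at` and the base learning rate of 2e-5 are assumptions for illustration only; the warm-up fraction of 0.001 corresponds to the 0.1% mentioned above.

```python
import math


def lr_at(step: int, total_steps: int, base_lr: float,
          warmup_frac: float = 0.001) -> float:
    """Linear warm-up followed by cosine decay to zero (one common variant)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp up linearly until the base learning rate is reached.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Hypothetical run of 10,000 updates, i.e., 10 warm-up steps.
schedule = [lr_at(s, 10_000, 2e-5) for s in (0, 9, 10, 5_005, 10_000)]
```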

Regarding the dataset-specific hyper-parameters of the REDSandT\_Lit model, we automatically tune them on the validation set based on F1-score. Table 2 shows the applied search space and selected values for the dataset-specific hyper-parameters.


Experiments are conducted in Python 3.6, on a PC with 32.00 GB RAM, Intel i7-7800X CPU@ 3.5 GHz and NVIDIA's GeForce GTX 1080 with 8 GB. Fine-tuning takes about 5 min for the three epochs. The implementation of our method is based on the following code: https://github.com/DespinaChristou/REDSandT (accessed on 18 May 2021).

#### 3.5.2. Baseline Models

In order to show the proposed method's effectiveness, we compare against three strong baselines in our dataset. More precisely, we compare REDSandT\_Lit to the standard feature-based [45] and NN-based [46] approaches used in the literature while also comparing to the Greek version of BERT [16]. All models were tested on both sentence compression formats presented in Section 3.2.1 and are indicated with respective (1, 2) superscripts. For the Bi-LSTM approach we also experimented with both full-word and BPE tokenization indicated with (⋆) and (⋆⋆) superscripts, respectively.

#### **Feature-based Methods**

• *SVM*<sup>1</sup>: A Support Vector Machine classifier. Sentences are encoded using the first-presented compression format.

• *SVM*<sup>2</sup>: A Support Vector Machine classifier. Sentences are encoded using the second-presented compression format.

#### **NN-based Methods**


#### **Transformer-based Methods**


#### 3.5.3. Evaluation Criteria

In order to evaluate our model against the baselines, we report accuracy, macro-P, R, F and weighted-P, R, F for all models. For a more in-depth analysis of the models' performance in each relation, we report Precision, Recall and F1-score metrics for all models and relations. Moreover, we conduct Friedman's statistical significance test to compare all presented models on our dataset, following [47,48].
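The distinction between macro and weighted averages matters on an imbalanced dataset such as ours, and the following minimal sketch shows how the two can diverge. The helper names and the toy labels are hypothetical and are not taken from our results: macro averaging treats every relation equally, whereas weighted averaging weights each relation by its number of true samples.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def per_class_prf(y_true: Sequence[str], y_pred: Sequence[str]
                  ) -> Dict[str, Tuple[float, float, float]]:
    """Precision, recall and F1 for every label seen in truth or predictions."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores


def macro_and_weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]
                          ) -> Tuple[float, float]:
    scores = per_class_prf(y_true, y_pred)
    support = Counter(y_true)                     # true samples per relation
    macro = sum(f for _, _, f in scores.values()) / len(scores)
    weighted = sum(support[c] * scores[c][2] for c in scores) / len(y_true)
    return macro, weighted


# Toy case: a model that always predicts the majority "NoRel" class.
gold: List[str] = ["NoRel"] * 8 + ["artAuthor"] * 2
pred: List[str] = ["NoRel"] * 10
macro_f1, weighted_f1 = macro_and_weighted_f1(gold, pred)
```

In this toy case the weighted F1 stays high while the macro F1 drops sharply, because the infrequent relation is never predicted.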

#### **4. Results**

In this section, we present the results of our model against the predefined baselines both overall and for each relation, separately.

#### *4.1. Overall Models Evaluation*

Table 3 compares our model to the baseline models mentioned above. We observed the following: (1) both *REDSandT*\_*Lit*<sup>1</sup> and *REDSandT*\_*Lit*<sup>2</sup> are better overall in terms of precision, recall and F1-score, followed by *SVM*<sup>2</sup> and *BiLSTM*<sup>2,⋆</sup>; (2) preserving the surrounding context of entity pairs (*trim\_text\_2*) almost always yields better results; and (3) using full-word tokenization in Bi-LSTM models shows a tremendous performance improvement over using BPE tokenization. Focusing on the *REDSandT*\_*Lit* models, a detailed investigation of their performance on each separate relation showed that the high accuracy achieved by *REDSandT*\_*Lit*<sup>1</sup> was mainly due to that model being highly accurate in identifying "NoRel" relations. This explains the differences in macro vs. weighted metrics of *REDSandT*\_*Lit*<sup>1</sup>.

Moreover, when it comes to training times, the SVM models are the clear winners, training in less than a second, with the remaining models ranging from 4 min (BERT-based, trained on GPU) to 20 min (BiLSTM, trained on CPU). It is also worth mentioning that the extra complexity added by bag training induces only 10 s of additional training time in REDSandT\_Lit compared to the simple BERT models.


**Table 3.** Baselines Comparison. We report the overall accuracy (ACC), precision (P), recall (R) and F1-score (F1) at the Test set. For P, R and F1 we present both macro-version and weighted-version of the metrics.

In order to validate the contribution of all presented models, we compare (i) all examined models and (ii) the best-performing ones using Friedman's statistical test. As observed in Table 4, the *p*-value for both compared model sets is less than 0.05 (actually close to zero); thus, we have sufficient evidence to conclude that using different models results in statistically significant differences in the predicted relations.

**Table 4.** Friedman's Statistical Test. We compare (i) all models and (ii) only the best-performing models (those highlighted in bold in Table 3).


#### *4.2. Models Evaluation on Each Relation*

Tables 5–7 compare our models to the above-mentioned baselines across all relations, reporting precision, recall and F1-score, respectively. Overall, we observed the following: (1) the *REDSandT*\_*Lit* models exhibit strong performance across all relations, while *REDSandT*\_*Lit*<sup>2</sup> best captures relations in the long tail; (2) *SVM*<sup>1</sup>, *SVM*<sup>2</sup> and *BERT*<sup>2</sup> are generally consistent, but all Bi-LSTM models exhibit significant performance variability; and (3) *SVM* models perform well regardless of the chosen sentence compression.

**Table 5.** Baselines Comparison—We report Precision (P) (in % format) at Test set for all relations.



**Table 6.** Baselines Comparison—We report Recall (R) (in % format) at Test set for all relations.

**Table 7.** Baselines Comparison—We report F1-score (F) (in % format) at Test set for all relations.


#### **5. Discussion**

#### *5.1. Error Analysis*

Figure 3 presents the confusion matrices for the *REDSandT*\_*Lit*<sup>2</sup> and *SVM*<sup>2</sup> models. Even though the SVM model seems to slightly outperform the *REDSandT*\_*Lit* approach, the confusion matrices show that this superiority comes from the "NoRel" relation. Excluding the "NoRel" relation, the *REDSandT*\_*Lit*<sup>2</sup> model performs much better across all relations, including those in the long tail. As previously discussed, the "NoRel" relation can include sentences that do not contain a relation or were not annotated. For this reason, we further analyze the performance in this class below.

**Figure 3.** Confusion matrices of *REDSandT*\_*Lit*<sup>2</sup> (**left**) and *SVM*<sup>2</sup> (**right**) models.

#### *5.2. Effectiveness on Mislabelled Sentences*

Sentences marked with the "NoRel" relation correspond to sentences that include at least two recognized entities but were not annotated with a relation. This can correspond either to no underlying relation within the sentence or to a missed annotation. In order to examine this case, we further investigate the performance of the best-performing models on the "NoRel" relation. Our goal is to reveal the model that can capture missed annotations and propose it as an efficient model that can correct mislabels and augment samples, which in our case and industry-wise is of high importance.

Table 8 compares the best two models on predicting mislabelled samples. We observe that *REDSandT*\_*Lit*<sup>2</sup> is superior to *SVM*<sup>2</sup> in this task and precisely in identifying "artAuthor" relations within sentences that were not annotated.


**Table 8.** Comparing the two best performing models (*REDSandT*\_*Lit*<sup>2</sup> , *SVM*<sup>2</sup> ) on predicting mislabelled samples in "NoRel" relation.

#### **6. Conclusions and Future Work**

We proposed a novel distantly supervised transformer-based relation extraction model, REDSandT\_Lit, that can automate metadata extraction from literary texts, thus helping to sustain important cultural insights that otherwise could be lost in unindexed raw texts. Precisely, our model efficiently captures semantic relationships from Greek literary texts of the 19th century. We constructed the first dataset for this language and period, including 3649 samples annotated through distant supervision with six semantic relationships. The dataset is in the Katharevousa variant of Greek, in which a great part of Modern Greek literature is written. In order to capture the semantic and syntactic characteristics of the language, we exploited GREEK-BERT, a language model pre-trained on Modern Greek, which we fine-tuned on our specific task and language. To handle the problem of noisy instances, as well as the long sentences that are typical in literary writing, we guided REDSandT\_Lit to focus solely on a compressed form of the sentence and the entity types of the entity pair. Extensive experiments and comparisons with existing models on our dataset revealed that REDSandT\_Lit has superior performance, manages to capture infrequent relations and can correct mislabelled sentences.

Extensions of this work could focus on augmenting our dataset to facilitate direct BERT pre-training on the Katharevousa form of the Greek language. Even though we achieve high accuracy with models pre-trained on Modern Greek and fine-tuned on the Katharevousa variant, this inconsistency suggests that augmenting the studied data and providing a model specific to these data can further improve results. Moreover, we would like to further investigate the effect of additional side-information, such as POS information and entity descriptions, as well as an end-to-end model that is not based on pre-recognized entities and extracts both entities and relations in one pass. Finally, although there is extensive research on ancient Greek philosophy, literature and culture, as well as research on Modern Greek Natural Language Processing (NLP) tools, the very important (from a cultural, literary and linguistic point of view) Katharevousa form of the Greek language has not been studied in terms of automatic NLP tools. Thus, creating automated tools specific to this form is a step towards revealing important cultural insights for the early years of the modern Greek state.

**Author Contributions:** Conceptualization, D.C. and G.T.; methodology, D.C.; software, D.C.; validation, D.C. and G.T.; formal analysis, D.C.; investigation, D.C.; resources, G.T.; data curation, D.C.; writing—original draft preparation, D.C.; writing—review and editing, D.C. and G.T.; visualization, D.C.; supervision, G.T.; project administration, G.T.; funding acquisition, G.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH–CREATE–INNOVATE (project code: T1EDK-05580) in the context of the ECARLE (Exploitation of Cultural Assets with computer-assisted Recognition, Labeling and meta-data Enrichment) project.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available at: https://github.com/intelligence-csd-auth-gr/extracting-semantic-relationships-from-greek-literary-texts (accessed on 3 August 2021).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:



#### **Appendix A**

**Table A1.** Nineteenth century Greek books catalogue.


#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Sustainability* Editorial Office E-mail: sustainability@mdpi.com www.mdpi.com/journal/sustainability
