*Review* **Using Big Data to Measure Tourist Sustainability: Myth or Reality?**

#### **Yamilé Pérez Guilarte <sup>1,</sup>\* and Daniel Barreiro Quintáns <sup>2</sup>**


Received: 25 July 2019; Accepted: 10 October 2019; Published: 13 October 2019

**Abstract:** The concern about the production of international standards to measure the sustainability of tourism is present today, especially the discourse on the introduction of new sources. This article aims to survey and describe the main approaches and methodologies to use big data to measure tourism sustainability. Successful cases are addressed by explaining the main opportunities and challenges for the creation of official tourist statistics. A comprehensive review of publications regarding this field was carried out by applying the systematic literature review technique. This contributes a knowledge base to destination management organisations to encourage the implementation of official tourism statistics systems using big data.

**Keywords:** big data; tourism sustainability; official statistics; indicators

#### **1. Introduction**

Sustainability constitutes a key element in the tourism industry's competitiveness, as destinations are appreciated according to the quality of their environment, including the local communities' attitudes. For tourist historic cities, this is a real challenge considering that the flows of visitors are continuously growing [1]. As a result, some destinations or significant tourist attractions suffer from overtourism, which worsens not only the residents' quality of life but also the visitors' experience. In order to prevent such conditions in historic cities, tourism practices should follow integral sustainable models, instead of only guaranteeing heritage protection [2].

Today, in the discussion about the role of natural and social resources to increase economic benefits, sustainable development and sustainability are fundamental [3]. In this context, various initiatives have emerged at different territorial scales to establish systems to measure tourism sustainability. However, there is no international and generally accepted statistical framework, including social and environmental dimensions, for the measurement of tourism sustainability [4]. The initiative "Measuring Sustainable Tourism (MST)" is currently under development by the United Nations World Tourism Organisation (UNWTO) with the aim of publishing a global procedure to measure tourism's effect on sustainability. It asks for a framework based not only on the use of traditional data sources, but capable of using and integrating all possible sources to provide the richest picture possible.

In this context, new sources have emerged from the use of big data technologies in the tourism sector, with real potentialities to improve the relevance and quality standards of official statistics [5]. Some examples of these new data sources include: store cashiers, mobile network operators, social media, web activity, flight reservation systems, smart mobile devices, financial transactions, traffic loops, satellite images, Wikimedia content and image collections, among others [6].

Most tourist applications have focused on recommender systems, which are software-based tools that personalise tourist products based on visitors' interests, so as to propose an experience that matches the visitors' desires [7]. However, the potential of big data to support the strategic decisions of destination management organisations has received little consideration [8,9]. The application of big data technologies to tourism planning and management is complex, as it requires technological expertise [10]. Nevertheless, this is not the only factor determining its application. Important and coordinated efforts have to be made by statistical authorities and data providers to obtain results that meet the quality standards currently achieved by official statistics [5]. Furthermore, for the integration of private and public stakeholders, organisational learning processes are fundamental, as they allow stakeholders to define their specific knowledge requirements [11,12].

From the above, two gaps can be identified: on the one hand, the lack of an up-to-date international and generally accepted statistical framework to measure tourist sustainability; on the other, the underexploited application of big data by official statistical agencies. Therefore, this study considers two hypotheses: (1) official tourism statistical systems do not include specific indicators to measure tourist sustainability (economic, social, environmental) because of the absence of practical guidelines and tools; (2) they are still generally based on the use of traditional sources, especially due to the lack of collaboration among tourism authorities, data providers, big data experts and academia.

This article aims to survey and describe the approaches and methodologies for using big data to generate official tourism statistics that support destination management organisations. A special focus is placed on measuring social, economic, and environmental sustainability. The research sets out to study the extent to which the potential of big data is exploited in the generation of official tourism statistics, as well as in the design of tourist intelligence information systems. To the authors' knowledge, this is the first systematic review of literature in the field of hospitality and tourism that focuses on the use of big data in official tourism statistics. The management of the main opportunities and challenges addressed in this study could encourage destination management organisations to use big data to optimise competitiveness and sustainability, especially in tourist historic cities.

#### **2. Literature Background**

#### *2.1. Measuring Tourism Sustainability*

Sustainability applied to tourism refers to a type of tourism that satisfies the current public's necessities without risking the possibility of future generations satisfying their own needs [13]. For the European Association of Historic Cities and Regions, sustainability encompasses social, environmental and economic issues, and in the case of cultural tourism, this means taking into account each of the components through [14]:


Sustainable tourism has been included in the agenda of some of the most important global institutions in the sector. This is the case of the Global Sustainable Tourism Council (GSTC) (https://www.gstcouncil.org/). It started in 2007 as a partnership among international institutions to promote knowledge on tourist sustainability and to agree on common rules for sustainable tourism, and became an organisation in 2010 supported by the United Nations Environment Programme, the United Nations Foundation and the United Nations World Tourism Organisation (UNWTO). The latter launched the guide "Indicators of Sustainable Development for Tourist Destinations", intended to use indicators as a main tool to optimise tourism planning and management [13]. However, since its release in 2004, new formulas have emerged to support local authorities in guaranteeing the destination's sustainability.

Furthermore, in 2015, all members of the United Nations adopted the Sustainable Development Goals (SDGs). This constitutes a set of 17 goals to encourage sustainability at a global level by setting targets to be fulfilled by 2030 in issues such as the environment, health, poverty, social rights, innovation and education (https://www.un.org/sustainabledevelopment/sustainable-development-goals/). In addition, the 70th General Assembly of the United Nations designated 2017 as the International Year of Sustainable Tourism for Development. In this favourable context, multiple initiatives have emerged. A relevant example is the UNWTO International Network of Sustainable Tourism Observatories (INSTO). It is a network of tourism observatories that, through regular monitoring, assessment and information management, provide significant instruments to support the design and application of policies on sustainable tourism [15].

In Europe, several initiatives have been promoted by the European Commission, the European Environment Agency, or the Council of Europe among other organisations. In 2010, a group of actions included in the communication, "Europe, the World's No. 1 Tourist Destination—A New Political Framework for Tourism in Europe", were launched to encourage sustainable tourism in Europe and promote competitiveness and visibility on a global scale [16]. The European Tourism Indicator System for sustainable management at a destination level (ETIS) is one of these actions. It was initiated in 2013 by the European Commission as an easy and useful toolkit for tourism stakeholders to improve sustainable tourism management. The ETIS results are supported by self-assessment, observation, data gathering and analysis, which allow the destinations to collect the necessary information to supervise sustainability and effectively manage tourism activities. The ETIS includes 43 indicators that have been divided into four categories: destination management, social and cultural impact, economic value and environmental impact. The ETIS tries to respond to the need to protect and enhance cultural heritage, local identity and resources to avoid the phenomenon of banalisation and the residents' discontent [3].

It is also worth mentioning the Barcelona Declaration, "Better Places to Live, Better Places to Visit", launched in April 2018 with the aim of delivering a legacy for Europe beyond the 2018 European Year of Cultural Heritage (EYCH 2018). This was an action initiated by the Network of European Regions for Sustainable and Competitive Tourism (NECSTouR), in collaboration with the European Cultural Tourism Network, the European Travel Commission and Europa Nostra, and supported by the European Heritage Alliance 3.3. Its main objective is to show the synergies between tourism and cultural heritage to benefit European citizens, cultural heritage, companies, visitors and destinations. In addition, it starts from the assumption that all involved sectors share a collective responsibility to achieve the SDGs. Principle 4 of the Declaration, "Balancing Place, People and Business", clearly mentions the need for efficient tools to measure tourism impacts [17].

The need to introduce new tools to measure tourism sustainability is present in several projects and actions that have been implemented worldwide. For example, the project "Models of Integrated Tourism in the MEDiterranean Plus (MITOMED+)" (https://mitomed-plus.interreg-med.eu/) financed by the Interreg Mediterranean Programme, focuses on public policies for the sustainable development of maritime and coastal tourism. It develops evaluation and planning tools to help tourist destinations to improve their sustainability levels.

In Asia, for instance, some projects have been developed through the involvement of small and medium-sized enterprises (SMEs) as part of the SWITCH-Asia programme (https://www.switch-asia.eu/). This initiative is based on the possibilities SMEs can offer in terms of innovation management, uniqueness of services, and practical solutions in the implementation of sustainable measures in the tourism industry [18]. In Latin America, more specifically in Honduras, Bolivia, Peru, México and Costa Rica, some projects aim to improve the quality of life of local people, including indigenous communities, through the development of sustainable tourism in both urban and rural areas [19]. It is also worth mentioning some initiatives in African countries such as Kenya, Zimbabwe, Egypt, Burkina Faso, South Africa and Mozambique (https://sustainabletourism.net/case-studies/austrailianz/africa/). These actions focus on eco-efficient accommodation, ecotourism, wildlife tourism regulations, instruments for economic development, and initiatives to preserve the communities' culture and the environment.

Several studies have been carried out worldwide to assess, through the use of indicators, the importance of sustainable tourism in the promotion of well-being and local development [20–22]. Indeed, the indicators to monitor tourist sustainability have been accepted as valid tools for: (1) the assessment of policies and the monitoring of destination performance [23–25]; (2) the definition of development plans and the establishment of quantitative objectives [26–29]; (3) easy communication to destination stakeholders about the present situation and upcoming scenarios [30]. However, the problems regarding the practical application of sustainability are not fully understood by all stakeholders, including policymakers, local communities, entrepreneurs, Non-Governmental Organisations (NGOs), and visitors. Hence, it still remains a challenging concept [3].

The need to measure both the performance of tourism and its impacts has led the tourism sector to focus, for the past 15 years, on sustainability indicator-based case studies [28]. However, some issues make it difficult to implement actions to measure and manage tourism, thus creating a gap [31–33]. The handling of the large number of indicators that are generally included in measuring tourism impacts, the data availability at a local level, and the incomplete quantification of indicators are some of the difficulties [26,28,34]. In addition, there are few studies specifically oriented to the use of sustainable tourism indicators at heritage destinations [26,35].

The discussion on the synergies between science and policy in choosing a set of indicators to properly monitor sustainability [34,35] expresses the relevance of incorporating both scientific principles and participatory planning processes [36–40]. Therefore, this is a political as well as a technical choice that must focus on establishing significant indicators to assess sustainability in the social, economic and environmental dimensions [41–43].

In addition, there are recurring criticisms that academics and public organisations have shown great enthusiasm towards sustainable tourism, but without achieving any major results [36]. While academia is criticised for concentrating its efforts on the production of literature instead of the production of practical tools, public agencies are accused of misusing the concept to justify tourism development. In spite of these criticisms, there is a recognition of the need to move towards a more sustainable horizon, as well as of the important role that the business sector will play in its effective implementation [18,37,38].

The concern about the production of practical international standards to measure the sustainability of tourism activity is very present today. The UNWTO has been working on a draft framework through the initiative "Measuring Sustainable Tourism" (MST). It was presented at the 6th International Conference on Tourism Statistics held in Manila in June 2017. The Secretary-General of the UNWTO expressed the relevance of the MST initiative as a framework of meaningful and feasible indicators for a real contribution of tourism to the SDGs, the 2030 Development Agenda, and a new era of sustainable and inclusive development. The Secretary-General also highlighted the need to collect more data sources, develop clear and unified concepts, and build technical capacity. More precisely, in the conference's Session 5, "Producing Data on Sustainable Tourism", the potential to use various data sources, particularly big data, for the measurement of sustainable tourism was addressed. It was concluded that it is essential that statisticians find opportunities to access and utilise new data sources to improve and extend current tourism datasets [4].

#### *2.2. Big Data: A New Source for Official Tourist Statistics*

Big data can be defined as a set of data collected from various sources with diverse formats, including texts, images, voices or rasters. They may be extracted from Instagram, Facebook, Twitter, blogs, videos and voice recordings, and also, from communication systems, business databases and sensors. Apart from the large volume of information, there are other features that characterise big data. The five main properties of big data are well-known as the 5V: variety, velocity, volume, veracity and value [39].

In exact terms, the report "Tourism Statistics: Early Adopters of Big Data?" states that big data will gradually but persistently take the place of some traditional sources or surveys [6]. It also highlights the relevant role tourism statisticians should have in rethinking statistics systems through the integration of big data. Big data allows the measurement of not only an individual's physical movements, but also monetary transactions, thereby becoming an indispensable tool for designing, implementing and disseminating innovation systems within the field of tourism. Figure 1 shows the most generally considered sources of big data. As in other categorisations, some elements may be subjectively assigned to different groups. For example, publications on social networks can be classified either as communication systems or as world wide web, while Wikipedia can be considered web-based and crowd-sourced at the same time [6].

**Figure 1.** Sources of big data [6].

Recently, the amount of available public data has increased exponentially due to: the implementation of open data initiatives worldwide by public sectors; the popularisation of collaborative tools, such as OpenStreetMap, or social networks like Twitter or Instagram, which provide data generated by users without the need for governmental or central institutions; and the broad usage of tools like TripAdvisor or Booking.com. Accordingly, this circumstance represents a new paradigm of communication and knowledge sharing between citizens, companies and public institutions. However, these data are not being fully exploited [9].

As part of the current data revolution, the concept of a smart tourism destination has gained relevance. It has been defined as the product of the interconnection between a tourist destination and the various stakeholders through dynamic platforms and knowledge-intensive communication flows, as well as with improved support systems for decision-making [40–43]. The final purpose of an intelligent tourist destination must be the improvement of the tourist experience, the maximisation of competitiveness and consumer satisfaction based on sustainability [44].

However, although sustainability is supposed to be a fundamental pillar in the smart tourism destination and smart city approaches, the models integrating smartness and sustainability still present some gaps [45–50]. The situation is particularly acute in tourist destinations, because reaching sustainability is, in general, an unsolved issue which still lacks practical actions [51]. In this context, and in order to reinforce the sustainable dimension, terms such as smart sustainability [51] and smart sustainable cities [48] have emerged.

Smart sustainability is based on a governance framework that applies technology to five fundamental pillars [51]: (i) long-term planning, the efficient management of resources; (ii) monitoring, transparency and participation, public-private cooperation, knowledge, innovation; (iii) communication, (iv) awareness raising; (v) the improvement of the tourist experience. The intensive use of technology by smart tourism destinations plays a key role through the potential interactions that may arise between the technologies and the basic elements of sustainability, which could accelerate the process of achieving it. Nevertheless, relevant modifications in the business sector at different territorial scales are required to exploit the potential of information technologies to support sustainable tourism, as well as more innovative models developed by tourism academics and solid interactions with public authorities in tourism [52].

Big data includes, on the one hand, information collected from the sensorisation of the destination through different subsystems, such as those that monitor water consumption, waste volume, energy usage, urban mobility, etc. On the other hand, it encompasses new and relevant data sources to support sustainable tourist models. This is the case of the information regarding the spatial and temporal concentration of visitors compiled from online booking [53] or from social networks [54]. A deep understanding of visitors' movements at the destination and the factors influencing them allows tourism managers to solve or prevent overcrowding situations that affect the tourists' experience and the residents' quality of life, particularly in historic centres.

As part of the expansion of the use of big data in the tourism sector, different applications have arisen, such as destination management systems (DMS) or tourist information systems (TIS), which integrate relevant statistical data collected from traditional sources or big data. They cover the demand side (for example, the visitors' profile, behaviour and opinions), the supply side (expenditure, overnight stays, seasonality), and the residents' perspective (satisfaction, involvement). The data can be accessed through platforms that generally allow easy visualisation and understanding of the information through data charts, keyword graphs, trend charts, tag clouds, etc. Furthermore, if the DMS or TIS incorporate data about environmental indicators, they will contribute not only towards reaching the economic and social sustainability of the destination, but also its environmental sustainability [55,56]. Unfortunately, this integration is still far from being in widespread use [51].

Intelligent systems are widely used in tourism to support destination management. For example, in Spain, the State Society for the Management of Innovation and Tourism Technologies, A. S. (SEGITTUR, by its Spanish acronym) is leading initiatives to use the latest technologies (big data and business intelligence) to measure and analyse how visitors and tourists actually consume the city. This is the case of the Tourism Intelligence System (SIT, by its Spanish acronym), a technological platform based on the exhaustive analysis of different sources of information selected according to the needs and idiosyncrasies of the territory and the priorities set by its managers. The system has been implemented in the cities of Las Palmas de Gran Canaria, Palma de Mallorca, and Badajoz. In the latter, the system is shared with the city of Elvas (Portugal) within a project financed by the European programme of Cross-Border Cooperation Spain-Portugal (POCTEP) [57].

The Tourist Intelligence System of Buenos Aires (https://turismo.buenosaires.gob.ar/es/observatorio) also exploits big data to generate information about the visitors' volume, origin, stay, expense and booking preferences, as well as data on the accommodation industry and the competitiveness of air connections. It also provides information about the visitors' movements in the city by neighbourhood, day and even hour. This indicator is significant for ensuring the social and environmental sustainability of the destination. As overcrowded areas and tourist attractions can be identified, measures to ensure a quality tourist experience and the preservation of the local environment and communities can be adopted in real time.
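To make the idea of identifying overcrowded areas by neighbourhood and hour more concrete, the following is a minimal sketch and not a description of how the Buenos Aires system is implemented. The observation records, the neighbourhood names, the capacity limits and the function `overcrowded_slots` are all hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical anonymised records: one (neighbourhood, hour of day) pair
# per detected visitor. Real systems would derive these from big data sources.
observations = [
    ("Old Town", 11), ("Old Town", 11), ("Old Town", 11),
    ("Old Town", 15), ("Harbour", 11), ("Harbour", 15),
]

# Hypothetical hourly capacity per neighbourhood, assumed to be set
# by the destination manager.
capacity = {"Old Town": 2, "Harbour": 3}


def overcrowded_slots(records, limits):
    """Return the (neighbourhood, hour) slots whose visitor count exceeds the limit."""
    counts = Counter(records)
    return sorted(
        slot for slot, n in counts.items()
        # Neighbourhoods without a defined limit are never flagged.
        if n > limits.get(slot[0], float("inf"))
    )


flagged = overcrowded_slots(observations, capacity)
print(flagged)
```

In this toy example only the Old Town slot at 11:00 exceeds its assumed capacity; an operational system would additionally need agreed capacity thresholds and privacy-preserving data collection.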

Sustainable tourism can benefit from the application of technologies on at least three levels. At a destination level, they provide stakeholders with a global understanding of the tourist phenomenon and its economic, social and environmental impacts, which can encourage them to adopt a responsible and proactive attitude towards sustainable goals [58]. At the visitors' level, tourists can access these platforms, learn about the sustainability levels of the destination and, as prosumers, choose one place or another to travel to, while more responsible behaviour at the destination is encouraged. At the local communities' level, residents who are interviewed participate in tourism planning and, as a result, engage in more actions supporting sustainable tourism [51].

#### **3. Materials and Methods**

The establishment of a strong theoretical frame was the base to survey and describe approaches and methodologies for using big data in the generation of tourism statistics, with a special focus on measuring sustainability. For this purpose, a comprehensive review of publications regarding this field was carried out by applying the systematic literature review (SLR), widely used in social sciences [59–62]. The SLR allows studies to be weighed against each other in terms of the confidence with which their findings can be accepted, while data integration makes it possible to reach an overall judgement from all studies. Both contribute to the communication between researchers and practitioners. They also reduce the effort required by practitioners and other service decision-makers in finding and evaluating research evidence to make their decisions [59].

According to this research objective, the systematic literature review was oriented to answer the following questions:


In order to locate high-quality studies for the research topic, databases from the Web of Science (WOS) and SCOPUS were examined. The search encompassed different types of publications, such as articles from peer-reviewed journals, books, proceedings or reviews. Furthermore, relevant publications from international organisations such as the World Tourism Organisation, UNESCO or the European Commission related to the use of big data in tourist statistics were consulted.

In order to ensure the inclusion of all significant studies needed to give a response to the research questions, several criteria were established. Firstly, both databases were searched using the following keywords: "big data & tourism", "big data & tourism & sustainability", "big data & tourism & indicators", "big data & tourism statistics". Secondly, papers were selected between the years 1999 and 2019, thus guaranteeing a wide period to observe tendencies and changes. Thirdly, both theoretical and case studies within the social science, arts and humanities fields were included.

The process of selection and exclusion of articles is shown in Figure 2. After removing duplicate articles, a total of 180 abstracts were read. However, 108 were excluded because they did not respond to the research objective and questions. Next, the full texts of the remaining 72 articles were read, which allowed a final selection of 10 articles to be analysed, representing only 15% of the 72 full texts. The selection criterion was the presence of a clear intention to use the information gathered with big data technologies to generate official statistics in tourism or to create tourist information systems to promote sustainable tourism planning. Therefore, the remaining 85% of articles were rejected for not matching this essential condition, which is the main purpose of this research.
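The arithmetic of this screening process can be retraced with a short sketch. It uses only the counts reported above; the variable names are illustrative. Note that the exact shares computed from these counts (about 14% selected and 86% rejected) are close to, but not identical with, the rounded figures of 15% and 85% quoted in the text.

```python
# Counts reported in the text for each stage of the screening process.
abstracts_read = 180
excluded_on_abstract = 108
final_selection = 10

# Full texts read are the abstracts that survived the first screening.
full_texts_read = abstracts_read - excluded_on_abstract
rejected_on_full_text = full_texts_read - final_selection

# Shares expressed relative to the full texts read.
selected_share = 100 * final_selection / full_texts_read
rejected_share = 100 * rejected_on_full_text / full_texts_read

print(f"Full texts read: {full_texts_read}")
print(f"Rejected after full-text reading: {rejected_on_full_text}")
print(f"Selected share: {selected_share:.1f}%")
print(f"Rejected share: {rejected_share:.1f}%")
```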

Most of the rejected articles (58%) cover interesting aspects of the application of big data in tourism, but the information is not used to generate official tourist statistics. They deal with understanding the visitors' profile, opinion and behaviour through user-generated content as a tool to study specific issues such as the destination image, tourist movement patterns and preferences, and visitors' satisfaction [9,59–63]. Furthermore, tourist companies' interests were present in 17% of the papers. They use big data to follow online consumers' reviews, to predict hotel demand, as well as to co-create new tourist products together with visitors [64–68].

**Figure 2.** Flowchart of the inclusion and exclusion of studies [62].

In addition, four papers (6%) related to the development of methodologies to measure tourism sustainability were identified [69–72]. However, they were not included because they did not consider big data as a source to measure tourism sustainability. Finally, three theoretical works (4%) were not included either, as they had a general approach on the use of big data in tourism [73,74] or addressed a different perspective in relation to this research [75]. In terms of research bias, there is one issue that should be taken into account. Some publications related to the research topic could have been left out of the literature review, because they are not included in Scopus or Web of Science. However, using these databases guaranteed the selection of the ones with the highest research quality.

Once selected, the studies were evaluated to identify the kinds of theoretical and conceptual contributions and advances made, the array and nature of empirical situations investigated, the methodological approaches adopted, the conclusions and recommendations outlined, and the tools, guidelines and regulations produced in reports. The synthesis was done based on a descriptive approach through registering, tabulating and integrating all of the articles' contributions.

#### **4. Results and Discussions**

This section presents the results and discussions of the information gathered from the selected papers. For the integration of the data, the following parameters had previously been defined and extracted from the papers: title, authors, year of publication, journal or editorial, objectives, methodology (research technique, setting, type of data and source, tools for collection processing and visualising data), sustainability approach, stakeholders involved, big data opportunities and challenges.

#### *4.1. Publications on Tourism Statistics and Sustainability: General Remarks*

Table 1 shows the list of the selected publications that were analysed in depth in order to identify the approaches and methodologies to use big data to generate tourism statistics, especially indicators that measure tourism sustainability. The selection includes eight articles from journals, *Tourism Management* being the only journal with two articles [63,64]. Furthermore, a chapter of a book [65] and a report based on a keynote prepared by EUROSTAT were included [6].


**Table 1.** List of publications on tourism statistics and sustainability.

In relation to the research objectives, all papers are oriented to the use of big data technologies to develop tools and methods to support strategic decision-making in tourism destination management. In particular, the incorporation of big data into official tourism statistical systems was addressed in four papers [5,6,12,53]. They all highlighted the potential relevance of big data in gathering tourism statistics, as well as its opportunities and challenges, which are further discussed in the following sections.

Although the research period covered the last 20 years, between 1999 and 2019, the first publication addressing the subject of interest of this paper was published in 2014 [11]. As shown in Figure 3, the number of papers increased in 2017 and 2018, which together account for 70% of the selected papers. This is a result of the growing recognition of big data as a complementary source for the generation of official tourism statistics [5,12]. However, compared with the rest of the articles being published on the use of big data in the tourism sector, the number of papers on this particular topic still remains low, as explained above.

**Figure 3.** Distribution of the selected papers according to their publication date.

#### *4.2. Methodologies and Approaches: Tourism Statistics and Sustainability*

The case study is the most common technique used in the analysed papers. What differs is the territorial scale, which varies from local to continental. The cities studied are Melbourne [8], Beijing [66], Noci [9], Helsinki, Oslo, Stockholm and Copenhagen [65], and Dublin and London [12]. Furthermore, regions such as Åre in Sweden [11] and Saare and Tartu counties in Estonia [64] are studied. The country level is represented by research developed in Spain [4], while research at a continental level is covered by Batista et al. [53]. In general, a prevalence of European territories can be seen, with the only exceptions being Melbourne and Beijing.

The types of data used to generate tourism statistics were grouped into three categories, following Li et al. [67]: users, devices and operations (Figure 4). User-generated content (UGC) is the prevalent source, used in 60% of the papers. This includes online textual data, mainly social and news media, and geotagged photos. A further 20% exploit the potential of devices by collecting information from mobile roaming, traffic loops and traffic control cameras. Transaction data were used by another 10%, in particular point of sales (POS) terminals, ATM withdrawals and Booking.com. Accordingly, UGC can be said to be the most relevant source for tourism statistics purposes. Li et al. [67] reached the same conclusion when researching the applications of big data in tourism studies in general, although they reported a lower dominance of UGC (47%).

**Figure 4.** Sources of data for the generation of tourism statistics [67].

#### 4.2.1. User-Generated Content

The articles based on the analyses of online textual data utilise different analytics methods. Donovan et al. [12] used the Hadoop tool to process data extracted from Wikipedia, as it is open source, relatively user-friendly and could be useful in official statistics for analysing text files, networks and sensor data. Furthermore, these authors utilised the Big Data Sandbox, a United Nations Economic Commission for Europe platform. This is an experimental area where participating statistical organisations around the world can jointly explore how big data can be best used for the production of official statistics (https://joinup.ec.europa.eu/solution/big-data-sandbox/about).

To understand the role of social big data in nurturing open innovation to define sustainable tourism strategies, Del Vecchio et al. [9] used two tools: Keyhole and Buzztrack. These allowed the examination of different social networks such as Instagram, Twitter and Facebook to extract users' preferences, behaviours and opinions in relation to a destination. These applications provide access not only to visitors demanding a conventional supply, but also to those interested in eco-friendly products and services. This market segment could therefore be better exploited, attracting visitors who can minimise environmental impacts. Issues that could affect the destination's sustainability can also be detected. For example, the identification of overcrowded areas assists in the implementation of visitor management measures, which is an urgent topic, especially in tourist historic cities. Big data thus provides tourism managers with valuable information to act on matters significant for managing sustainable destinations, such as accessibility, mobility, pricing, taxes or booking systems [68–71].

User-generated content is also useful when translated into web intelligence applications. This is the case of the Visual Analytics Dashboard (https://www.weblyzard.com/interface/) proposed by Scharl et al. [65], which monitors data posted on online media channels about a tourist destination in real time. The system, developed by WebLyzard Technology, has a visual analytics dashboard, which is an advanced information exploration and retrieval interface. It provides information about tourists' perceptions of particular destinations or events, allowing geographical patterns to be established and the volume of documents associated with the topic of interest to be tracked. From the authors' point of view, the great benefit of this technology is the use of interactive tools (trend charts, keyword graphs, tag clouds, etc.) that make the information easy to visualise and understand. This is a valuable aspect in raising awareness among tourism authorities and enterprise managers about the need to use such tools to support decision-making based on sustainable objectives.

WebLyzard Technology is also oriented to the environmental sustainability domain through the creation of different applications, such as the United Nation Environment Web Intelligence (https://unep.ecoresearch.net/weblyzard/en/). It aggregates data from Twitter and news websites on sustainable development, climate change, biodiversity, water and energy consumption, and air pollution (Figure 5). Users can filter the search by date, source, language, and country of publication. Through diverse visual resources, users can also discover connections among different institutions, places or people, which helps decision makers to keep up to date. This kind of instrument increases the visibility of environmental issues, thus raising awareness among the general public and, as a consequence, improving visitors' behaviour once at their destination. Such tools therefore contribute to mitigating the impacts of tourism on climate change in order to ensure environmental sustainability, a need that has been widely addressed [38,52,72,73].

Moreover, the Destination Management Information System Åre (DMIS-Åre) was introduced by Fuchs, Höpken and Lexhagen, 2014 [11]. This is an intelligent application developed in Åre, a mountain destination in Sweden, for monitoring tourism activity from the supply and demand perspectives. The economic indicators include bookings, occupancy rate, overnights, product and service prices, and trades. Customer behaviour is measured through website search and navigation, profiles and booking trends. In addition, customers' perceptions of destination image, satisfaction, loyalty, and value for money were included. One interesting aspect of DMIS-Åre that the authors identified was the integration of big data with traditional sources, a highly recommended practice to make the most of big data [6]. For example, the visitors' feedback integrates data from Booking.com and TripAdvisor, surveys conducted by some accommodation providers, destination surveys, and real-time feedback from an electronic registration tool.

**Figure 5.** Screenshot from the United Nation Environment Web Intelligence showing the results of a query for "Sustainable Tourism" [74,75].

The papers that used geotagged photo data extracted from Flickr follow similar methodologies, although with some particularities. A spatial clustering and text mining approach was used in both cases [8,66]. Miah et al. [8] developed a model for predicting the future trend of tourist demand with the aim of complementing the estimates from general surveys and official statistics. The method proposed by Peng and Huang [66], however, achieves higher classification accuracy, enables tourist zones or in-demand attractions to be distinguished, and is more adaptable to irregular density distributions. These two cases offer tourist managers additional ways to monitor visitor flows, which in turn allow them to carry out preventative actions to guarantee tourist sustainability.
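To make the idea of spatially clustering geotagged photos concrete, the following is a minimal sketch of a plain density-based clustering (DBSCAN) over photo coordinates. It is not the method of Miah et al. [8] or of Peng and Huang [66]; the coordinates, the 100 m radius and the minimum of three photos per cluster are hypothetical values chosen only for illustration.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))

def dbscan(points, eps_m, min_pts):
    """Plain DBSCAN: returns one cluster label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps_m of point i (including i itself).
        return [j for j, q in enumerate(points)
                if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

# Hypothetical photo locations: two dense spots and one isolated photo.
photos = [
    (-37.8180, 144.9670), (-37.8181, 144.9672), (-37.8179, 144.9669),
    (-37.8300, 144.9800), (-37.8301, 144.9801), (-37.8299, 144.9799),
    (-37.9000, 145.1000),
]
labels = dbscan(photos, eps_m=100, min_pts=3)
```

In this toy input the two dense groups receive separate cluster labels and the isolated photo is marked as noise. A real application would add the text mining step on photo tags and would need an indexed neighbour search, since the pairwise scan used here is quadratic in the number of photos.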

#### 4.2.2. Device and Transaction Data

Traffic loops and traffic control cameras were used by Cortina et al. [5] to estimate the number of foreign visitors (tourists and same-day visitors) that arrive in Spain every month by road. In addition, these authors utilised mobile phone positioning data to measure the number of tourists, both residents and non-residents, and their average stay, broken down by region of destination (NUTS 2) and region/country of origin. This source was also applied by Raun et al. [64] to measure visitor flows through spatiotemporal tracing records in Estonia. The data on foreign visitors were collected from the main national mobile operator. These techniques for quantifying the number of visitors are particularly relevant for estimating the number of excursionists at a destination, which, to the authors' knowledge, is difficult to calculate using traditional methods. They allow better estimations of the volume of visitors and, therefore, of the economic impact of tourism activity.
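The distinction between tourists and same-day visitors, and the calculation of average stay, can be sketched from positioning records as follows. This is only an illustration under stated assumptions: the record layout (device identifier, first and last time the device was seen in the destination) and the rule that a visit counts as an overnight stay when it spans at least one change of calendar date are hypothetical, and are not taken from Cortina et al. [5] or Raun et al. [64].

```python
from datetime import datetime

def classify_visits(visits):
    """Classify each visit as 'tourist' or 'same_day_visitor'.

    visits: iterable of (device_id, first_seen, last_seen), with timestamps
    given as ISO 8601 strings. Nights are counted as calendar-date changes,
    which is an assumed simplification.
    """
    result = {}
    for device_id, first_seen, last_seen in visits:
        start = datetime.fromisoformat(first_seen)
        end = datetime.fromisoformat(last_seen)
        nights = (end.date() - start.date()).days
        kind = "tourist" if nights >= 1 else "same_day_visitor"
        result[device_id] = (kind, nights)
    return result

def average_stay(classified):
    """Mean number of nights over tourists only (0.0 if there are none)."""
    nights = [n for kind, n in classified.values() if kind == "tourist"]
    return sum(nights) / len(nights) if nights else 0.0

# Hypothetical anonymised records for three roaming devices.
records = [
    ("a", "2019-07-01T09:00", "2019-07-01T18:30"),
    ("b", "2019-07-01T22:00", "2019-07-04T08:00"),
    ("c", "2019-07-02T10:00", "2019-07-03T11:00"),
]
classified = classify_visits(records)
mean_nights = average_stay(classified)
```

Device "a" never stays overnight and is counted as a same-day visitor, while "b" and "c" contribute three nights and one night to the average stay. Operational systems would also need to handle repeated entries, transit passengers and devices that are switched off, none of which this sketch attempts.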

Another source of information explored was the data recorded by the electronic payment system of the BBVA bank, one of the most important in Spain [5]. In the case of residents, the records of all payments made by the bank's clients at every point of sales (POS) terminal and of ATM withdrawals with a BBVA card were analysed. Only cash payments and those made with a card from any other entity were out of the scope of the study. For non-residents, the available information came from payments or withdrawals at POSs or ATMs in the BBVA network, so the view of their activity in Spain was more limited.

Moreover, Batista et al. [53] combined data from Eurostat official statistics with data from Booking.com and TripAdvisor, the two main online booking systems, which provide accurate location and capacity information for accommodation providers. The objective was to produce a comprehensive dataset representing tourist density in all European countries, supported by statistical software and geographical information systems. This study proposes relevant indicators to measure tourism sustainability in the EU-28, such as tourism intensity, tourism seasonality, and a regional vulnerability to tourism index. Tourism intensity measures the relative importance of tourism in the territorial context, which can be useful, for example, for the detection of overtourism. In the authors' opinion, this constitutes an invaluable resource, as this phenomenon is increasingly affecting the social sustainability of places in high demand. In the last few years, some destinations, such as Barcelona, Venice or Amsterdam, have been dealing with residents' intolerance towards tourists and with social movements demanding urgent intervention from the public authorities to control overtourism. Therefore, measuring variables such as tourism seasonality and regional vulnerability to tourism provides managers with a complete picture of the social, economic and environmental impacts of tourism activity, which should be used to support measures to ensure sustainable tourism.
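As an illustration of how indicators of this kind can be computed, the sketch below derives a tourism intensity and a seasonality measure from monthly overnight stays. The definitions used here (annual nights per resident, and the ratio of the peak month to the average month) are common simplifications assumed for this example; they are not necessarily the formulas used by Batista et al. [53], and the two regions and their figures are invented.

```python
def tourism_intensity(annual_nights, residents):
    """Overnight stays per resident over one year (assumed definition)."""
    if residents <= 0:
        raise ValueError("residents must be positive")
    return annual_nights / residents

def seasonality_ratio(monthly_nights):
    """Peak-month nights divided by the mean monthly nights (assumed definition).

    A value of 1.0 means demand is spread evenly over the year; higher values
    indicate a stronger concentration in the peak month.
    """
    mean = sum(monthly_nights) / len(monthly_nights)
    return max(monthly_nights) / mean

# Hypothetical regions: monthly overnight stays (January to December).
coastal = [10_000, 10_000, 20_000, 40_000, 80_000, 150_000,
           300_000, 320_000, 140_000, 60_000, 20_000, 10_000]
city = [50_000] * 12

coastal_intensity = tourism_intensity(sum(coastal), residents=100_000)
city_intensity = tourism_intensity(sum(city), residents=300_000)
coastal_seasonality = seasonality_ratio(coastal)
city_seasonality = seasonality_ratio(city)
```

With these invented figures the coastal region shows both a higher intensity and a much stronger seasonal peak than the city, which is the kind of contrast such indicators are meant to expose when looking for territories under pressure from tourism.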

#### *4.3. Opportunities and Challenges in Using Big Data for Tourism Statistics*

As discussed in the previous sections, big data is a useful technology to support decision-making processes in sustainable tourism planning and management. However, some challenges should be considered, particularly when contemplating its use for official statistical purposes. Figure 6 summarises some of the opportunities and challenges extracted from the analysed articles for the application of big data in tourism statistics, which are further discussed below.

**Figure 6.** Opportunities and challenges in using big data for tourism statistics.

One of the most relevant opportunities of big data is the availability of an immense volume of information. Traditional analytical practices are insufficient for the analysis of the enormous and unstructured datasets gathered from such diversified sources (social media, devices, transactions, etc.) [8]. Apart from the high volume, another opportunity is the real-time synchronisation of big data sources, which allows destination management organisations to respond promptly to breaking news [65].

Furthermore, big data allow the introduction of new indicators to measure the functioning of the destination and visitors' behaviour and experiences [11]. For instance, the number of arrivals and overnight stays can be estimated independently of the accommodation category. This possibility offers a more precise quantification of the real volume of tourists. Nowadays, with the explosion of informal accommodation systems such as Airbnb, Homestay and HomeExchange, among others, establishing the volume of visitors by considering only official accommodation is not accurate.

Big data opportunities are, however, particularly significant for measuring and promoting tourist sustainability. Monitoring creates a huge amount of data that allows tourism statistics to be supported with information that cannot be gathered by traditional methods [5,53]. This is the case of spatiotemporal analysis [5,53], which is very useful in managing tourist flows to ensure social sustainability, preserving both a quality tourist experience and the liveability of the place. In addition, the combination of big data and computational knowledge allows the creation of intelligent tourism information systems that generate meaningful information and predictive insights. Some examples of this are the Web Intelligence Application developed by WebLyzard Technology [65], the Destination Management Information System Åre [11], the Tourism Intelligence System Badajoz-Elvas (http://www.sitbadajozelvas.es/) and the Tourist Intelligence System of Buenos Aires (https://turismo.buenosaires.gob.ar/es/observatorio).

Finally, big data facilitates open data innovation practices which contribute to the sustainable development of tourism activities. Adverse tourist impacts can be minimised by raising awareness among destination stakeholders and the general public of the need to preserve environmental sustainability. For instance, the use of applications on climate change, such as the United Nation Environment Web Intelligence, provides knowledge and makes people aware of the urgency of taking an active part in addressing this situation. Visitors are increasingly demanding eco-friendly products and services and a lower consumption of natural resources. As a consequence, tourist providers are more interested in offering ecological products and services and in obtaining a quality certification that can differentiate them from their competitors. In the authors' opinion, all these changes in behaviour, on both the supply and demand sides, are what support the sustainable development of destinations.

In relation to the challenges, one important issue to consider regarding big data for official statistics purposes is the need for collaboration among public and private agents [5,11]. However, 50% of the papers that matched this research came from academia [8,9,11,64,66]. Public administrations are represented in 30% of the papers, by the National Statistics Institute of Spain [5] and the European Commission [6,53], while initiatives involving different stakeholders are only present in two papers. One case combines a technological company, a national public institution and a European organisation [12], and the other is a case of academics linked to a technological company [65].

Guaranteeing access to big data sources and their continuity over time is also a challenge. While access to some data, such as social media posts, web activity and dynamic websites, is free, other data are held by private companies, such as mobile network operators or banks, that are not always willing to share them for statistical purposes. Some of the reasons could be legal uncertainty [64], internal data monetisation projects or concern about public dissatisfaction. For these reasons, collaboration among stakeholders is a crucial factor to guarantee transparency and, as a consequence, a balanced win-win for all involved [65]. Furthermore, the fact that diverse data suppliers may be involved poses a risk to the systematicity of the data over time [6].

The complexity of the data is another disadvantage. In order to use the data as an input to produce statistics, a deep examination of the datasets and the definition of algorithms to process them are necessary [6]. Heterogeneous and large volumes of data must be aggregated, and visual dashboards must be provided to analyse patterns and relations in the extracted information [65]. Raun et al. [64] also addressed the necessity to align geographical analyses with destination marketing and development demands. Miah et al. [8] found that technologies to analyse and convert such amounts of big data to support decision-making are generally available for large companies. However, from this paper's authors' point of view, company size does not necessarily prevent an organisation from exploiting big data, as the range of available data is quite wide and even includes some freely accessible sources.

The use of big data may also introduce bias when framing populations. For example, the estimations from mobile network operators are based on market share, which can vary according to the region or socio-economic segment. In addition, the penetration levels for mobile phone ownership and utilisation are not necessarily at 100%, although this is comparable to the matter of over-coverage or under-coverage when establishing a sampling frame in the traditional application of surveys [6].
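The market-share issue can be illustrated with a small calculation. The sketch below scales the devices observed by one operator up to a population estimate by dividing by the operator's market share, and shows the error that appears when a national share is applied to a region where the operator's real share is different. All figures and names are hypothetical, and the simple division is an assumed extrapolation rule rather than a method described in the reviewed papers.

```python
def extrapolate(observed_devices, market_share):
    """Scale an operator's device count to the whole population of devices."""
    if not 0 < market_share <= 1:
        raise ValueError("market_share must be in (0, 1]")
    return observed_devices / market_share

# Hypothetical figures for one region.
observed = 1_000          # visitor devices seen by the operator in the region
national_share = 0.40     # operator's share of the national market
regional_share = 0.25     # operator's (lower) share among this region's visitors

estimate_national = extrapolate(observed, national_share)
estimate_regional = extrapolate(observed, regional_share)

# Relative error made by applying the national share to this region.
relative_error = (estimate_national - estimate_regional) / estimate_regional
```

Here the national share understates the number of visitors in the region by more than a third, which shows why share figures broken down by region or by segment matter when such data feed official statistics.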

Furthermore, as official statistics follow high-quality principles, big data also needs to meet these criteria to be used in official statistics. In this respect, Eurostat has studied methods for statisticians to evaluate the quality of big data sources [76]. In addition, a change towards different sources or methodologies can cause a considerable break in systematicity. This can put at risk the data series that guarantee comparisons through time, which is one primary objective of official statistics [6].

Another issue to consider for the statistical use of big data is that official statisticians lose full control of data production processes and depend on data providers, which are partially responsible for controlling data quality. In addition, statisticians would need to acquire some skills in data management and an understanding of computing methods [6].

A holistic perspective to integrate big data with economic, environmental and social tourism sustainability is not generally addressed in the reviewed articles. Indeed, the direct application of big data to measure sustainability remains unresolved, as stated by Perles and Ivars, 2018 [51]. In addition, the articles mostly concentrate on the analysis of visitors, disregarding local communities. In the authors' opinion, big data should also be applied from the residents' perspective to measure social sustainability. For example, indicators such as the residents' level of satisfaction with tourism, the effects on the rental housing market, situations of displacement and the degree of involvement in planning tourist policies, among others, should undoubtedly be monitored.

In terms of methodologies, the authors consider that the best way to improve the connection between big data and the measurement of sustainability is the integration of the different big data sources (users, devices and operations) in an open access tourist intelligence system. Additionally, it should be connected to traditional tourism sources (surveys, interviews, etc.) and to environmental monitoring systems, so that it is able to address the supply, demand and residents' perspectives. In the case of tourist historic cities, special attention must also be paid to the damage that tourism can cause to tangible and intangible heritage. Therefore, the tourist intelligence system should also envisage integration with other applications, such as heritage information systems for heritage preservation and management based on spatial data infrastructure [77]. This is certainly a complex issue, especially if destination management organisations fail to lead the process along with relevant stakeholders.

#### **5. Conclusions**

The research on the application of big data for tourism statistics to support destination management organisations is still relatively new, being concentrated in the last two years. The case study is the most used methodology, with examples that cover local, regional, national and European levels. However, local cases, in particular those from European cities, prevail over all others. The incorporation of big data in official tourism statistics, whether at local, regional or national level, has a favourable environment, as it has been supported by different European and international organisations [4,6,12,53].

Although big data offer a wide range of possibilities, a predominance of user-generated content for tourism statistics can be seen. It includes online textual data from sources such as Wikipedia, Facebook, Twitter or Instagram, and geotagged photo data from Flickr. This predominance can be explained by the fact that these sources are free and, in some cases, can be collected using free tools, such as Keyhole. However, the data from mobile network operators or banking entities are more difficult to access, as they belong to private companies.

In this regard, establishing win-win relationships among public and private stakeholders is crucial. However, this constitutes a critical issue that needs to be resolved due to the traditional separation among key agents such as academia, public authorities, tourist companies and technological centres. This is evidenced by the fact that only two papers overcame this barrier [12,65], while half came from academia. In summary, the initiatives to create tourist information systems using big data have been developed within the academic environment [11,65], although one example of transfer to the business world can be seen [65].

The measurement of sustainable tourism is perhaps one of the most under-researched subjects in tourism statistics, because it lacks the practical tools to guide implementation and ensure systematicity. Nevertheless, big data can cover this gap by proposing indicators, especially those contributing geographical and temporal granularity, as used by Batista et al. [53] and Cortina et al. [5]. Apart from this, some interesting initiatives, such as the ones developed by WebLyzard Technology, are examples of the potential of social and news media to encourage integral sustainable practices and open innovation [9]. These cases prove that the integration of big data with indicators to measure tourist sustainability really is possible and is not a myth, and they also show its potential to destination management organisations. Unfortunately, such integration is not yet widespread enough. For this reason, future research should be oriented towards creating mechanisms to coordinate tourism authorities with data providers, data experts, academia and business communities. Provided that all these actors understand and believe in big data as a complementary data source, they will be able to face the challenges and build official tourist statistics.

**Author Contributions:** Introduction, Y.P.G.; literature review, Y.P.G.; materials and methods, Y.P.G. and D.B.Q.; results and discussions, Y.P.G. and D.B.Q.; conclusions, Y.P.G.; references, Y.P.G. and D.B.Q.

**Funding:** This research was funded by the Xunta de Galicia and the European Union (European Social Fund, FSE) through predoctoral stage grants to universities and public research organisations in Galicia and other organisations of the Galician R+D+I System (2017), grant number ED481A-2017/230.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
