Article

Towards a Volunteered Geographic Information-Facilitated Visual Analytics Pipeline to Improve Impact-Based Weather Warning Systems

Katerina Vrotsou, Carlo Navarra, Kostiantyn Kucher, Igor Fedorov, Fredrik Schück, Jonas Unger and Tina-Simone Neset

1 Department of Science and Technology, Linköping University, 602 33 Norrköping, Sweden
2 Department of Thematic Studies-Environmental Change, Centre for Climate Science and Policy Research, Linköping University, 581 83 Linköping, Sweden
3 Forecast and Warning Service, Swedish Meteorological and Hydrological Institute, 601 76 Norrköping, Sweden
* Author to whom correspondence should be addressed.
Atmosphere 2023, 14(7), 1141; https://doi.org/10.3390/atmos14071141
Submission received: 1 June 2023 / Revised: 5 July 2023 / Accepted: 11 July 2023 / Published: 13 July 2023
(This article belongs to the Special Issue Weather and Climate Extremes: Observations, Modeling, and Impacts)

Abstract

Extreme weather events, such as flooding, are expected to increase in frequency and intensity. Therefore, the prediction of extreme weather events, assessment of their local impacts in urban environments, and implementation of adaptation measures are becoming high-priority challenges for local, regional, and national agencies and authorities. To manage these challenges, access to accurate weather warnings and to information about the occurrence, extent, and impacts of extreme weather events is crucial. As a result, in addition to official sources of information for prediction and monitoring, citizen volunteered geographic information (VGI) has emerged as a complementary source of valuable information. In this work, we propose the formulation of an approach to complement the impact-based weather warning system that was introduced in Sweden in 2021 by making use of such alternative sources of data. We present and discuss design considerations and opportunities towards the creation of a visual analytics (VA) pipeline for the identification and exploration of extreme weather events and their impacts from VGI texts and images retrieved from social media. The envisioned VA pipeline incorporates three main steps: (1) data collection, (2) image/text classification and analysis, and (3) visualization and exploration through an interactive visual interface. We envision that our work has the potential to support three processes that involve multiple stakeholders of the weather warning system: (1) the validation of previously issued warnings, (2) local and regional assessment-support documentation, and (3) the monitoring of ongoing events. The results of this work could thus generate information that is relevant to climate adaptation decision making and provide potential support for the future development of national weather warning systems.

1. Introduction

Extreme weather events, such as heavy rainfall and flooding, are expected to increase in frequency and intensity as a result of climatic changes, leading to negative societal consequences [1,2]. Preparedness for extreme weather events, including the knowledge and capacity to monitor and assess their local impacts, is an important aspect of increasing societal resilience [3,4]. Therefore, the prediction of extreme weather events, assessment of their impacts in urban environments, and implementation of adaptive actions have become a priority for local and regional authorities worldwide [5,6].
When extreme weather events strike, access to timely and accurate information about current conditions and impacts is of vital importance. The primary source of such information is commonly sensors used for monitoring and collecting observations of, for example, river flow levels, precipitation, temperature, and wind. In addition to sensor networks set up by official authorities, Volunteered Geographic Information (VGI) has gained increasing prominence over the past decades as a complementary source of valuable information [7]. VGI has been classified into participatory and opportunistic approaches, depending on the level of intentional activity of the contributing individual [8,9]. Participatory VGI includes crowd-sourced spatiotemporal data collected through, for example, individuals’ home weather stations, as well as volunteered reports and observations that are submitted intentionally and explicitly to inform about an ongoing event. Nowadays, however, individuals tend to continuously share their experiences, observations, and news online through multiple social media channels. As a consequence, social media is emerging as a possible new data source for reporting on ongoing events. These contributions can be characterized as opportunistic because, while the purpose of the observation is not primarily to provide data on a specific event, they nonetheless contain useful information that can be extracted. Due to the nature of VGI, be it participatory or opportunistic, harvesting this information opens the possibility of accessing localized first-hand information from citizens who are potentially directly affected by such events or are direct observers of them. This can be of particular value during extreme weather events, but also for long-term disaster management, to better inform and complement standardised, nationwide warnings and processes.
Traditionally, meteorological agencies issue weather warnings based on meteorological and hydrological models. These, however, frequently lack insights into local vulnerabilities and potential impacts. The challenge of better coupling weather warnings to local impacts, and to actors who can provide additional information regarding the potential effects of weather events, has therefore been addressed by the new Swedish national system for impact-based weather warnings, which was launched in October 2021 by the Swedish Meteorological and Hydrological Institute (SMHI). The new approach implies a consultation process with authorities and actors at local, regional, and national levels prior to issuing certain types of warnings. This process further demands a series of preparatory efforts to establish supporting documents for local and regional impact assessments, building on the knowledge and experiences of current and previous risks and impacts. Such collective efforts could be strengthened by VGI as a novel source of first-hand local information provided by citizens, which can inform the warning and impact assessment processes.
In this paper, we outline the design considerations, opportunities, and first steps towards formulating and implementing a Visual Analytics (VA) [10,11] pipeline based on citizen-contributed VGI to inform and verify impact-based weather warning systems. The ambition is to complement the current processes with access to supplementary actionable information. The work is conducted within an ongoing research project, AI4ClimateAdaptation (https://liu.se/en/research/ai4climateadaptation, accessed on 29 May 2023). The aim of the project is to assess the potential of combining visualization and Artificial Intelligence (AI)-based image and text analysis with the national impact-based weather warning system. To this end, we report on common practices to consider and on previous research from which we draw inspiration, we outline preliminary plans, and we describe the multiple analytical approaches we have considered and applied so far.
The remainder of this paper is structured as follows. Section 2 provides a brief overview of related work. Section 3 outlines the motivation and background of the work. Section 4 describes the design space for the envisioned VA pipeline and discusses the main facets identified in relation to each of its steps: data collection (Section 4.1), classification (Section 4.2), and visualization (Section 4.3). Section 5 includes a discussion on the limitations of the proposed approach, and finally, conclusions and future work are outlined in Section 6.

2. Related Work

In this section, we review research related to our ongoing work towards a VA pipeline for detecting and visually exploring extreme weather events from VGI texts and images.
Several visual analytics systems have been proposed for exploring and visualizing crisis events using user-generated messages from microblogging services. SensePlace2 [12] and TwitInfo [13] are examples of web-based geovisual analytics systems that use user-formulated keyword queries to identify and extract relevant tweets, log their frequencies, and display them in coordinated views for interactive exploration. These examples use classic keyword-based Natural Language Processing (NLP) approaches for the identification of relevant tweets and focus primarily on the visual exploration of the data. Later approaches increasingly incorporate data mining and machine learning methods. Chae et al. [14], for example, proposed a VA approach for emergency management and disaster preparedness that includes topic modelling for extracting and following topics from the texts. Cerutti et al. [15] used data mining and exploratory visualization to identify disaster-affected areas from Twitter data. Bosch et al. [16], in ScatterBlogs2, proposed a VA approach for monitoring microblog messages. The system makes use of filters and SVM classifiers for extracting messages and topic modelling for identifying and monitoring topics of interest.
The approaches above have made significant contributions to crisis event detection through the analysis and visualization of purely textual information generated by users, primarily microblogs such as Twitter. Such data, however, are short and noisy, which is why approaches combining multiple data sources have been investigated in the literature. Cai et al. [17] introduce STM-TwitterLDA, an approach based on generative probabilistic topic modelling, which incorporates five Twitter features (text, image, timestamp, location, and hashtags) in a joint model to identify topics on Twitter. Qian et al. [18] propose a multi-modal event topic model for identifying correlations between textual and visual modalities to extract semantic topics and their evolutionary patterns and visualize these with texts and images over time.
In our work, we envision a VA pipeline that combines NLP and computer vision techniques with interactive visualization in order to enable the identification and exploration of extreme weather events from VGI text and images—primarily posts collected from Twitter. Feng et al. [19] proposed a similar approach, which uses location filtering to collect Twitter data within a specific geographic area. They then combine a deep learning-based classification approach with spatiotemporal clustering to detect flood events. The visual exploration of the identified flood events in their work is performed via simple visual representations showing the detected flood events as markers on a map and the tweet frequency per region through a choropleth map. Our intention, however, is to provide a considerably more advanced interactive visual interface that will enable the in-depth, flexible exploration of multiple aspects characterizing the identified extreme weather events.

3. Motivation and Background

This paper outlines our design considerations for formulating a VA pipeline within an ongoing research project, AI4ClimateAdaptation. The project aims to assess the possibility of combining VGI from citizens, AI-based text and image analysis, and visualization to support weather warning processes and to increase knowledge of local impacts.
The project is tightly connected to and motivated by the new Swedish national system for impact-based weather warnings [20]. Following guidelines from the World Meteorological Organization [21], the warning issuing process builds on a direct consultation process with local and regional representatives for, e.g., first responders, municipalities, and infrastructural services. The inclusion of regional and local actors in the process aims to provide both more accurate assessments of local thresholds and risk factors and support for local and regional efforts to develop assessment-support documentation across sectors.
This localized and impact-based approach to weather warnings provides additional motivation to explore complementary sources that can further inform the weather warning processes and to validate previously issued warnings and their impacts. To this end, VGI, both in its participatory and opportunistic form, is of high interest. In this work, we use text and images retrieved from social media, in particular from Twitter [22].
We envision a final working pipeline composed of three main steps: (1) data collection, (2) image/text classification and analysis, and (3) visualization. To achieve these three generic steps of the overall pipeline, we outline the following work plans:
  • Exploration of available existing image and text data sets related to extreme weather events, particularly flooding. Development of effective data collection approaches of VGI in the form of text and images.
  • Implementation of machine learning (ML) algorithms and computational methods for the classification and analysis of VGI texts and images for the detection of extreme weather events, with a focus on flooding.
  • Design and development of a VA interface for the visualization and exploration of the classified text and image data, with a focus on their spatio-temporal and contextual characteristics, in order to detect and assess the occurrence, extent, and impacts of extreme weather events, with a focus on flooding.
Our entire process towards this final result is informed and guided by representatives from potential stakeholder groups through a co-design process [23] based on interviews and workshops. Stakeholders include climate adaptation experts and experts responsible for the impact-based weather warning system at SMHI, as well as actors at local, regional, and national levels.
In the following sections, we discuss our design considerations and implementation plans for each of these three main steps of the pipeline in more detail.

4. Design Space

Multiple facets relate to each of the steps of the envisioned VA pipeline for the identification and exploration of extreme weather events, in particular flooding, from VGI texts and images. Each of these facets involves different considerations, opportunities and challenges. In the following, we discuss the main facets we have identified for each step. An overview diagram of the design space can be seen in Figure 1.

4.1. Data Collection

Since the overall goals of the envisioned project go beyond implementing a visualization approach for the existing standard datasets, the data-related concerns constitute an important part of the design space.

4.1.1. Sources and Modalities

First of all, the data provided by the authorities (i.e., the public weather warning announcements) constitute one of the important sources. SMHI warnings are published on their website and mobile application up to three days in advance of the start date. The warnings are also available at WIS, a portal for Swedish actors to share information about civil emergencies. Information about particular events can be used to identify further relevant data (e.g., by considering the fact that warnings were issued for a particular location during a particular date/time range), but can also eventually be used for validation purposes.
Next, we consider the data available on social media. We chose to collect and to use data from Twitter since it is a widely recognized microblogging platform that facilitates the dissemination of information. In the context of disasters, it has been extensively employed to communicate evacuation strategies, disseminate warnings, and aid in the evaluation of damages [24]. One additional data modality of interest that is supported by Twitter is image data: photos relevant to the flood events and their impact would be very valuable for the analysis of the outcomes of such events and feedback towards the respective impact-based weather warnings.
Besides social media, possible data sources and collection channels include explicit data collection and submission approaches.

4.1.2. Collection Methods

For the purpose of the project, the Twitter Streaming API is utilized to extract the text and metadata of tweets by configuring a query that retrieves items containing keywords relevant to flood, heavy rain, and cloud-burst events. The temporal parameters for the query are being determined in reference to the warnings issued by SMHI. In order to exclusively obtain tweets composed in Swedish, keywords based on terms used in Swedish to refer to flood-related events were chosen. Spatial and language restrictions were not incorporated into the query, as this could potentially reduce the number of tweets acquired.
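As an illustration of this collection step, the following minimal sketch (not the project's actual collection code) queries flood-related tweets in Swedish with the Tweepy library. The paper refers to the Twitter Streaming API; the sketch instead uses the closely related recent-search endpoint so that a temporal window derived from an SMHI warning can be shown explicitly. The keyword list, time range, and requested fields are illustrative assumptions.

```python
# Minimal sketch: collecting flood-related tweets in Swedish via the Twitter API v2
# recent-search endpoint using Tweepy. Keywords, dates, and fields are illustrative.
import tweepy

client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")  # placeholder credential

# Illustrative Swedish keywords (flooding, cloudburst, downpour); no geographic or
# language operators, mirroring the paper's choice not to restrict the query
query = "översvämning OR skyfall OR störtregn -is:retweet"

# Hypothetical time window derived from an SMHI warning (recent search covers ~7 days)
response = client.search_recent_tweets(
    query=query,
    start_time="2023-08-07T00:00:00Z",
    end_time="2023-08-09T00:00:00Z",
    tweet_fields=["created_at", "geo", "lang", "entities"],
    max_results=100,
)

for tweet in response.data or []:
    # Raw text and metadata would be stored for the classification steps downstream
    print(tweet.created_at, tweet.text[:80])
```

Excluding retweets at query time, as in this sketch, is also one way to address the credibility considerations discussed in Section 5.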
For collecting additional data, we have been exploring the opportunity to make use of a custom citizen sensing application that was developed as a mobile web application in order to facilitate data collection by volunteering users.
Finally, in order to identify the relevant subsets and aspects of the data for collection and further processing, we carried out pilot studies involving existing data sets, resources, and tools. In particular, we explored the feasibility of applying lexical markers from the existing resources for social media queries or even further text data processing stages. As demonstrated in Figure 2, we applied a custom version of a previously developed visual text analytic tool, uVSAT [25], to analyze the use of lexical markers from CrisisLex [26] and EMTerms [27] (e.g., “flood”, “storm”, etc.) in Twitter data, both at the level of individual documents and as results aggregated over time (e.g., to check for particular temporal patterns, such as the use of relevant markers peaking around the time of the corresponding events and declining over the course of the next 48 h). While these preliminary analyses were conducted with the existing resources and data in English, our main application scenario assumed the exclusive use of Swedish, which affected the choice of keywords for the main data collection stage, as mentioned above.
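The kind of marker-based temporal analysis described above can be prototyped in a few lines of pandas; the sketch below uses a small hypothetical marker subset and toy data rather than the full CrisisLex/EMTerms resources and the uVSAT tool.

```python
# Illustrative sketch of lexical-marker timeline analysis with pandas.
# The marker list and the posts are hypothetical examples.
import pandas as pd

markers = ["flood", "flooding", "storm", "evacuation"]  # hypothetical subset

# Assume a table of collected posts with a timestamp and a text column
tweets = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2023-08-07 14:05", "2023-08-07 15:40", "2023-08-08 09:10"]
    ),
    "text": [
        "Major flood on the main road",
        "Storm damage and flooding downtown",
        "Cleanup after the storm",
    ],
})

# Flag posts containing at least one marker (case-insensitive substring match)
pattern = "|".join(markers)
tweets["has_marker"] = tweets["text"].str.contains(pattern, case=False)

# Aggregate marker hits per hour to inspect temporal peaks around an event
hourly = tweets.set_index("created_at")["has_marker"].resample("1h").sum()
print(hourly)
```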

4.2. Classification

In our work, we are interested in the identification of flooding events through the classification and analysis of flood-related VGI in the form of texts and images. A multitude of machine learning approaches of increasing sophistication have been emerging for the classification of such data [28,29]. Overall, the main considerations in choosing an appropriate method are the modality of the data (in our case, text and images), the availability and size of suitable training data, and the type of intended categorization. Of particular interest for the AI4ClimateAdaptation project is the increase in knowledge and awareness of local impacts of flooding events in order to inform and validate the impact-based weather warning process. To this end, the aspects that become central are (1) the possibility of identifying and/or exploring impacts and (2) the ability to associate flooding-related data entries with geographic locations at a relatively high resolution.

4.2.1. Training Data

Text and image data collections to support classification tasks are increasingly becoming publicly available. Examples of labelled dataset resources for natural disasters from social media include CrisisLex [26] and CrisisNLP (https://crisisnlp.qcri.org/, accessed on 29 May 2023) [30]. These resources focus primarily on social media text entries. On the other hand, examples of resources for labelled images of natural disasters include the MediaEval data challenges (https://multimediaeval.github.io/, accessed on 29 May 2023) and Kaggle data repository [31].
While such public datasets are of high interest for training, they need to be complemented with case-specific data to fine-tune classification models towards the specific task and potentially towards the geographic and/or contextual setting. For this purpose, focused data collection and annotation initiatives need to be explored. An alternative is downloading relevant, localized Twitter text and images and manually annotating these within the project or potentially through crowdsourcing platforms. In addition, in this project, we investigate the development of a tailored app for submitting images relating to extreme weather events (in particular flooding) and annotating these with a number of labels, as mentioned in Section 4.1.
Since collection campaigns of localized, task-specific datasets are usually of small-scale and not sufficient for training a classification model on their own, the potential of transfer learning [32] needs to be investigated. Furthermore, we acknowledge the challenges of ambiguity in the underlying real-world data that might lead to annotation quality issues, especially with text data [33,34], which will require careful consideration from both quantitative and qualitative annotation reliability/agreement analyses [35,36].

4.2.2. Image Classification

Our first objective in the context of image classification is to classify VGI images as flood- vs. non-flood-related. The task of detecting flooded and non-flooded scenes is closely related to the classical problem of supervised image classification based on a set of labelled images. In recent years, a large number of models have been created and trained by professionals using large amounts of data and extensive computing power [37]. This task resembles a real-life scenario in which a person tries to identify a place by studying its individual parts (landscape, buildings, trees, etc.).
In the ideal case, the model should receive a complete “observation” as input—a set of photographs of the same place, taken on the same day, using the same device, under the same weather conditions. However, a more realistic scenario that is relevant to our project involves a single photo attached to a social media post rather than such a set of (high-quality) images. In this context, images from various posts might be distributed within the area of interest with respect to geographic position and time, which complicates the task of recognition, since the context of the observed scene might be missing. This makes the task of image classification for efficient and accurate flood impact/damage assessment highly challenging. To address it, we will initially experiment with a Convolutional Neural Network (CNN) model trained on millions of images from a publicly available database and evaluate the resulting model using project-specific data. As base model architecture alternatives, we consider EfficientNet [38], DenseNet [39], and ResNet18 [40]. These networks have been trained on a huge number of images and are already able to recognise common objects, making them promising starting points for binary classification.
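A minimal sketch of this transfer-learning setup is shown below, assuming a pretrained ResNet18 backbone from torchvision and a hypothetical folder of flood/non-flood training images; the project may equally well build on EfficientNet or DenseNet, and the training details here are illustrative only.

```python
# Transfer-learning sketch for a binary flood/non-flood image classifier.
# Dataset layout and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing, since the backbone was pretrained on ImageNet
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Hypothetical folder layout: data/flood/*.jpg and data/non_flood/*.jpg
train_set = datasets.ImageFolder("data", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Load a pretrained backbone and replace the final layer with a 2-class head
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for images, labels in loader:   # one illustrative pass over the data
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```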
While an initial binary classification may be sufficient for simple flooding event detection, it is not necessarily adequate for the task of informing and validating the issuing of impact-based weather warnings. In this context, information regarding the surrounding infrastructure and local impacts is critical to understand the local characteristics, context and extent of an event. To this end, methods for multi-label and multi-class classification are highly relevant.

4.2.3. Text Classification and Analysis

Text classification is one of the most common tasks in NLP and has been a research topic of interest for a long time [41,42]. The main text classification and analysis aspects that are relevant for this step of the proposed VA pipeline include (1) the basic task of classifying VGI text entries as flood-relevant or flood-irrelevant in order to retrieve a dataset of interest, and (2) the continued contextual and content analysis of this dataset, for example, through automatically identifying topics that further categorize the data; extracting relevant meta-data to be explored, such as demographic information, locations, and impact-related information; and potentially also estimating the perceived severity of events through sentiment analysis.
As in the case of image classification, it becomes relevant for text classification to go beyond binary flood/non-flood labels and to also attempt to identify impact types, impact severity, and/or affected infrastructure. There are two potential paths towards achieving this. The first would be to explore multi-label classification models for text that directly try to assign a set of complementary target labels to each data item [43]. The second would be to pursue a progressive analysis approach by applying an initial binary classification to extract a relevant target dataset and then exploring complementary NLP approaches for further classification and information extraction, such as additional text classification, keyword extraction, topic modelling, and named entity recognition [44].
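As an illustration of the progressive analysis path, topic modelling over an already-filtered, flood-relevant subset can be sketched with scikit-learn; the toy corpus and the number of topics below are assumptions, not project results.

```python
# Illustrative topic-modelling step (LDA) over a tiny hypothetical corpus of
# flood-relevant posts; corpus and parameters are assumptions.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = [
    "flooded basement water damage insurance",
    "road closed flooding traffic detour",
    "river level rising evacuation shelter",
    "traffic jam flooded underpass road closed",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words of each discovered topic
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:4]]
    print(f"topic {k}: {top}")
```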
There are numerous approaches that could be appropriate for classification tasks, from traditional machine learning methods such as Naive Bayes, Support Vector Machines (SVM), and Random Forest (RF), to deep learning models such as Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), and increasingly, Transformer models [45].
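As a concrete baseline for the binary flood-relevance task, one of the traditional options listed above can be set up in a few lines with scikit-learn; the sketch below combines TF-IDF features with logistic regression on a tiny illustrative training set and is not the classifier evaluated in the project.

```python
# Baseline flood-relevance text classifier: TF-IDF features + logistic regression.
# The training examples are purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "The river has flooded the main street",   # flood-relevant
    "Basement flooding after heavy rain",      # flood-relevant
    "Great weather for a picnic today",        # irrelevant
    "New cafe opened downtown",                # irrelevant
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# Expected to be labelled as flood-relevant (1) given the toy training data
print(clf.predict(["Roads closed due to flooding near the station"]))
```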
As part of our exploration of appropriate methods, in the work of Styve et al. [22] we outlined our first steps towards a VA pipeline for the identification and exploration of flooding events from text data, particularly Twitter data. The proposed pipeline combines (1) text classification, (2) location extraction, and (3) visualization. We tested and assessed the performance of two classic (logistic regression and random forest) and two neural network-based (CNN and ULMFiT) text classification algorithms and proposed an algorithm for the geo-location extraction of the tweets. Figure 3 shows the visual interface developed in the work of Styve et al. [22] for exploring the geo-tagged tweets with respect to their spatio-temporal distribution and textual content.

4.2.4. Location Extraction

Increasing knowledge and awareness of local impacts is one of the main objectives of the envisioned VA pipeline. Therefore, the need to match VGI to geographical locations is an inherent part of the current work.
The challenge related to employing VGI from social media is the limited number of posts that explicitly include fine-grained geographical attributes. For example, only 1% of posted Twitter messages are geo-tagged [46]. To overcome this limitation, geoparsing, a well-established NLP task, can be employed to extract toponyms from a text and to associate these with real-world coordinates. The geoparsing task has several components; the prominent ones are ‘toponym recognition’, where tokens in a text referring to place names are identified using ML approaches, and ‘toponym resolution’, where geographical attributes are assigned to the detected toponyms using geocoding methods.
A well-known issue in geoparsing, however, is ‘toponym ambiguity’, which refers to the case of a toponym having multiple geographical locations [46,47]. For example, there are at least 11 cities called Paris in the world, such as Paris, France and Paris, Texas. Often, the same instance of a toponym can exist several times within a country; for example, 9 of the 11 cities called Paris are located in the USA. Several deep neural network pipelines have been proposed to mitigate this issue and improve the accuracy of the linking between toponyms and geographical coordinates [48,49].
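The two geoparsing steps can be illustrated with off-the-shelf components, for example a spaCy NER model for toponym recognition and the Nominatim geocoder (via geopy) for toponym resolution. The sketch below is a simplified, assumption-laden example: it uses an English model, it does not handle toponym ambiguity (the geocoder simply returns its top-ranked match), and a Swedish pipeline such as sv_core_news_sm would be needed for the project's target data.

```python
# Simplified geoparsing sketch: toponym recognition with spaCy NER, then
# toponym resolution with the Nominatim geocoder. Ambiguity is not handled.
import spacy
from geopy.geocoders import Nominatim

nlp = spacy.load("en_core_web_sm")          # assumes the model is installed
geocoder = Nominatim(user_agent="vgi-geoparsing-demo")

text = "Heavy flooding reported in Gävle after last night's rain."

# Step 1: toponym recognition -- keep entities tagged as geopolitical/location names
doc = nlp(text)
toponyms = [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]

# Step 2: toponym resolution -- geocode each recognized place name
for name in toponyms:
    place = geocoder.geocode(name)          # returns the geocoder's top match only
    if place is not None:
        print(name, "->", (place.latitude, place.longitude))
```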
Social media messages can also contain images or videos, and as for text, it is possible to retrieve location from this kind of data. Using geotagging methods, it is possible to analyse the content of an image or a video and retrieve their geographic location. CNNs have been employed to analyse images and predict their corresponding locations; PlaNet [50], for example, uses the Inception architecture to perform this task.

4.3. Visualization

The final part of the envisioned VA pipeline for detecting and visually exploring extreme weather events from VGI texts and images is an interactive visual interface that brings together the different components of this system. There are three main factors to be considered for the design and implementation of our VA interface: the data, the users, and the analysis tasks to be performed [51].

4.3.1. Data-Users-Tasks

The data of focus for this work, as described previously in Section 4.1, are VGI images and texts. In their unprocessed form, these data display or describe some content that can be explored; they always include a temporal reference (the time that they were posted), as well as additional meta-data such as information on the person that entered the post, and in certain cases also an explicit spatial reference (tagged geographic position). Within the AI4ClimateAdaptation project and through the envisioned VA pipeline, additional attributes will be computed that further characterize the data and which can be represented and explored in the visual interface. The main ones include a classification label and classification confidence as well as an estimation of a geographic location.
Apart from the data, the other factors defining the design requirements for the visual interface are its potential users and their analysis needs. The main stakeholders and potential users of the envisioned VA pipeline are experts at SMHI who are actively engaged in issuing impact-based warnings, as well as actors at local, regional, and national levels that participate in the local consultation process that occurs prior to or simultaneously with the issuing of weather warnings. Through a co-design process and building on individual interviews, consultations, and workshops, we are assessing the users’ needs, mapping their tasks and the way these are currently performed, and outlining the potential additional support that could be provided to them by considering VGI as an additional source of information.
We expect that the VA pipeline that we envision and propose has the potential to support the stakeholders in three ways:
  • For the validation of impact-based weather warnings. Such validation can be performed in the envisioned VA interface by allowing SMHI experts and regional actors to explore confirmed weather events and their impacts and compare them to previously issued warnings. Additional insight might also include the identification of events for which warnings were not issued.
  • For assisting local and regional assessment–support documentation by supporting local and regional actors to discover, highlight, and explore potential local impacts and vulnerabilities. These could be identified through recurring impacts to specific infrastructures during similar weather events, and thus contribute to the regional assessment documentation.
  • For monitoring during an ongoing warning by providing real-time information to support the regional consultation process as well as first responders, who potentially could feed back into the system, validating the crowdsourced information through in situ observations.
With the considerations above, the following are examples of analysis tasks of interest that we need to support:
  • Identification and exploration of when extreme weather events occur and of their temporal extent.
  • Identification and exploration of where extreme weather events occur and of their spatial extent and progression.
  • Identification and exploration of the types of impacts, including their spatio-temporal characteristics.

4.3.2. Visual Interface

To satisfy such analysis needs, the VA interface has to incorporate at least the following views:
  • Map view showing the spatial density and distribution of relevant VGI and allowing the visual detection of extreme weather events (a minimal sketch of such a map view follows this list).
  • Temporal view showing the temporal density and distribution of relevant VGI and allowing the visual detection of events.
  • Appropriate content views allowing details-on-demand [52] of the extreme weather events and relevant VGI texts and images.
  • Potentially also views of impact-related information extracted through NLP.
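As an example of the map view mentioned above, the following sketch uses the folium library to place classified, geolocated posts on an interactive map; the coordinates, confidence values, and styling are purely illustrative and do not reflect the final interface design.

```python
# Hypothetical sketch of the map view: plotting classified, geolocated posts with folium.
import folium

# Example records: (latitude, longitude, classifier confidence, short text)
posts = [
    (59.33, 18.07, 0.92, "Flooded underpass near the station"),
    (58.59, 16.19, 0.81, "Water rising along the river path"),
]

m = folium.Map(location=[59.0, 17.0], zoom_start=6, tiles="OpenStreetMap")

for lat, lon, conf, text in posts:
    folium.CircleMarker(
        location=[lat, lon],
        radius=6 + 6 * conf,   # encode classification confidence in marker size
        popup=text,            # details-on-demand via a popup
        color="crimson",
        fill=True,
    ).add_to(m)

m.save("flood_reports_map.html")   # open in a browser to explore
```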
Following the outlined co-design process (reflected in Figure 1), we aim to evaluate [53,54] the resulting VA interface and its potential to inform and provide complementary support to the impact-based weather warning process, through feedback sessions and task-based experimentation.

5. Discussion

Apart from the outlined considerations, there are also limitations of the proposed approach that should be discussed. The main limitation relates to biases that inherently characterize the opportunistic VGI data that are used. Since such data are provided by individuals through social media and not through a dedicated app with the explicit aim of reporting flooding, biases exist with regard to the (1) distribution, (2) timing and (3) location of the collected data records.
  • The coverage and distribution across the country are entirely dependent on the posts retrieved for a certain event and are thus unpredictable to a large extent. Unavoidably, areas with a higher population density might be better represented than more sparsely populated areas. Due to this, there is a risk of creating a false impression of urgency or an impression of a higher level of impacts in areas where data are denser. This needs to be carefully considered when designing representations of the data.
  • Attention is also needed in relation to the timing of the retrieved posts, since these, too, can hide biases. The quality of social media posts can degrade as hashtags relating to events gain momentum, since false posts can appear using the same tags but directing a user to unrelated content. The most representative posts are thus the ones closest to the time of the event of interest.
  • A final limitation of the data is the estimation of their geographic location. As only 1% of posted Twitter messages are geo-tagged [46], there is an evident need to extract the position of the posts using geoparsing. This increases the uncertainty of the positioning and reduces considerably the precision of the location. When the goal is to explore local effects, this factor can impede the analysis.
In addition to these data biases, the issue of data credibility needs to be highlighted. As the credibility of the posting sources cannot be verified with certainty, it is important to adopt strategies to ensure a certain quality of results. One approach could be, for example, to exclude retweets and to limit the collection of posts to a time window close to the event, as well as to allow for the visual assessment of the content of posts, as proposed above. Finally, there are also limitations relating to the use of machine learning approaches for the classification of data that should be considered. The confidence of the classification results depends entirely on the training data used, and the results can never be assumed to be complete and fully trustworthy. Therefore, in our approach, we emphasize the importance of keeping the human in the loop and the advantage of allowing for the exploration of the content of the classified posts within a VA interface. This way, a user is able to visually assess the results, which can potentially also allow for refinements of the models used for the classification.

6. Conclusions

In this paper, we have presented and discussed the design considerations and opportunities that we have outlined and are pursuing towards the creation of a VA pipeline for the identification and exploration of extreme weather events, in particular flood events, and their impacts from VGI, specifically social media texts and images. The presented work was performed as part of AI4ClimateAdaptation, a research project that aims to assess the potential of combining visualization and AI-based text and image analysis with the newly launched national impact-based weather warning system. Our goal is to provide support to experts at SMHI working on the issuing of impact-based weather warnings and to actors at local, regional, and national levels who participate in the regional consultation process in connection with a weather warning being issued. The intention is to complement their work by providing access to additional, relevant contextual data, not to replace the current warning system. We envision that our work has the potential to support these stakeholders in three domains: (1) in the validation of issued warnings, (2) for local and regional assessment-support documentation, and (3) for the monitoring of evolving events.
The outlined design space and the discussed considerations and limitations form the basis for the future work within the scope of our project. The next steps involve the implementation of the envisioned VA pipeline and its assessment with representative experts from the involved stakeholder groups. Furthermore, these considerations can also be generalized for further scenarios involving the application of computational and interactive visual analytic methods with multimodal data for addressing climate-related challenges.

Author Contributions

Conceptualization, K.V., C.N., K.K. and T.-S.N.; methodology, K.V., C.N., K.K., I.F. and J.U.; software, K.V., C.N., K.K. and I.F.; formal analysis, K.V., C.N., K.K. and I.F.; investigation, K.V., C.N., K.K., F.S. and T.-S.N.; data curation, C.N., K.K., I.F. and T.-S.N.; writing—original draft preparation, K.V., C.N., K.K., I.F. and T.-S.N.; writing—review and editing, K.V., C.N., K.K., I.F., F.S., J.U. and T.-S.N.; project administration, T.-S.N.; funding acquisition, T.-S.N., J.U. and K.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Sweden’s Innovation Agency, VINNOVA, grant number 2020-03388, ‘AI for Climate Adaptation’.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. IPCC. Summary for Policymakers. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Masson-Delmotte, V., Zhai, P., Pirani, A., Connors, S., Péan, C., Berger, S., Caud, N., Chen, Y., Goldfarb, L., Gomis, M., et al., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2021; pp. 3–32.
  2. IPCC. Summary for Policymakers. In Climate Change 2022: Impacts, Adaptation, and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Pörtner, H.O., Roberts, D., Poloczanska, E., Mintenbeck, K., Tignor, M., Alegría, A., Craig, M., Langsdorf, S., Löschke, S., Möller, V., et al., Eds.; Cambridge University Press: Cambridge, UK, 2022.
  3. Thieken, A.H.; Kienzler, S.; Kreibich, H.; Kuhlicke, C.; Kunz, M.; Mühr, B.; Müller, M.; Otto, A.; Petrow, T.; Pisi, S.; et al. Review of the Flood Risk Management System in Germany After the Major Flood in 2013. Ecol. Soc. 2016, 21, 51.
  4. Opach, T.; Navarra, C.; Rød, J.K.; Neset, T.S.; Wilk, J.; Cruz, S.S.; Joling, A. Identifying Relevant Volunteered Geographic Information About Adverse Weather Events in Trondheim Using the CitizenSensing Participatory System. Environ. Plan. Urban Anal. City Sci. 2022.
  5. Aguiar, F.C.; Bentz, J.; Silva, J.M.; Fonseca, A.L.; Swart, R.; Santos, F.D.; Penha-Lopes, G. Adaptation to Climate Change at Local Level in Europe: An Overview. Environ. Sci. Policy 2018, 86, 38–63.
  6. Schultze, L.; Johannesson, R.; Lindgren, E.; Keskitalo, C.; Kjellström, E.; Storbjörk, S.; Bohman, I.; Larsson, H.; Vulturius, G. Första Rapporten Från Nationella Expertrådet för Klimatanpassning (First Report from the National Expert Council for Climate Adaptation); Technical Report; SMHI: Norrköping, Sweden, 2022.
  7. Goodchild, M.F. Citizens as Sensors: The World of Volunteered Geography. GeoJournal 2007, 69, 211–221.
  8. Ostermann, F.O.; Spinsanti, L. A Conceptual Workflow for Automatically Assessing the Quality of Volunteered Geographic Information for Crisis Management. In Proceedings of the 14th AGILE International Conference on Geographic Information Science, Utrecht, The Netherlands, 18–21 April 2011.
  9. Sester, M.; Arsanjani, J.J.; Klammer, R.; Burghardt, D.; Haunert, J.H. Integrating and Generalising Volunteered Geographic Information. In Abstracting Geographic Information in a Data Rich World: Methodologies and Applications of Map Generalisation; Springer: Berlin/Heidelberg, Germany, 2014; pp. 119–155.
  10. Thomas, J.J.; Cook, K.A. (Eds.) Illuminating the Path: The Research and Development Agenda for Visual Analytics; IEEE Press: Hoboken, NJ, USA, 2005.
  11. Keim, D.; Andrienko, G.; Fekete, J.D.; Görg, C.; Kohlhammer, J.; Melançon, G. Visual Analytics: Definition, Process, and Challenges. In Information Visualization; Springer: Berlin/Heidelberg, Germany, 2008; pp. 154–175.
  12. MacEachren, A.M.; Jaiswal, A.; Robinson, A.C.; Pezanowski, S.; Savelyev, A.; Mitra, P.; Zhang, X.; Blanford, J. SensePlace2: GeoTwitter Analytics Support for Situational Awareness. In Proceedings of the 2011 IEEE Conference on Visual Analytics Science and Technology, VAST ’11, Providence, RI, USA, 23–28 October 2011; pp. 181–190.
  13. Marcus, A.; Bernstein, M.S.; Badar, O.; Karger, D.R.; Madden, S.; Miller, R.C. TwitInfo: Aggregating and Visualizing Microblogs for Event Exploration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, Vancouver, BC, Canada, 7–12 May 2011; pp. 227–236.
  14. Chae, J.; Thom, D.; Jang, Y.; Kim, S.; Ertl, T.; Ebert, D.S. Public Behavior Response Analysis in Disaster Events Utilizing Visual Analytics of Microblog Data. Comput. Graph. 2014, 38, 51–60.
  15. Cerutti, V.; Fuchs, G.; Andrienko, G.; Andrienko, N.; Ostermann, F. Identification of Disaster-Affected Areas Using Exploratory Visual Analysis of Georeferenced Tweets: Application to a Flood Event. In Proceedings of the 19th AGILE Conference on Geographic Information Science, Helsinki, Finland, 14–17 June 2016.
  16. Bosch, H.; Thom, D.; Heimerl, F.; Püttmann, E.; Koch, S.; Krüger, R.; Wörner, M.; Ertl, T. ScatterBlogs2: Real-time Monitoring of Microblog Messages Through User-Guided Filtering. IEEE Trans. Vis. Comput. Graph. 2013, 19, 2022–2031.
  17. Cai, H.; Yang, Y.; Li, X.; Huang, Z. What are Popular: Exploring Twitter Features for Event Detection, Tracking and Visualization. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; pp. 89–98.
  18. Qian, S.; Zhang, T.; Xu, C.; Shao, J. Multi-Modal Event Topic Model for Social Event Analysis. IEEE Trans. Multimed. 2015, 18, 233–246.
  19. Feng, Y.; Sester, M. Extraction of Pluvial Flood Relevant Volunteered Geographic Information (VGI) by Deep Learning from User Generated Texts and Photos. ISPRS Int. J. Geo-Inf. 2018, 7, 39.
  20. SMHI Introduces Impact-Based Weather Warnings in Sweden. Available online: https://www.smhi.se/en/news-archive/smhi-introduces-impact-based-weather-warnings-in-sweden-1.176502 (accessed on 29 May 2023).
  21. WMO. WMO Guidelines on Multi-Hazard Impact-Based Forecast and Warning Services; Technical Report; WMO: Geneva, Switzerland, 2015.
  22. Styve, L.; Navarra, C.; Petersen, J.M.; Neset, T.S.; Vrotsou, K. A Visual Analytics Pipeline for the Identification and Exploration of Extreme Weather Events from Social Media Data. Climate 2022, 10, 174.
  23. Steen, M. Co-Design as a Process of Joint Inquiry and Imagination. Des. Issues 2013, 29, 16–28.
  24. Zhang, C.; Fan, C.; Yao, W.; Hu, X.; Mostafavi, A. Social Media for Intelligent Public Information and Warning in Disasters: An Interdisciplinary Review. Int. J. Inf. Manag. 2019, 49, 190–207.
  25. Kucher, K.; Schamp-Bjerede, T.; Kerren, A.; Paradis, C.; Sahlgren, M. Visual Analysis of Online Social Media to Open up the Investigation of Stance Phenomena. Inf. Vis. 2016, 15, 93–116.
  26. Olteanu, A.; Castillo, C.; Diaz, F.; Vieweg, S. CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises. Proc. Int. AAAI Conf. Web Soc. Media 2014, 8, 376–385.
  27. Temnikova, I.; Castillo, C.; Vieweg, S. EMTerms 1.0: A Terminological Resource for Crisis Tweets. In Proceedings of the 12th International Conference on Information Systems for Crisis Response and Management, ISCRAM ’15, Kristiansand, Norway, 24–27 May 2015.
  28. Imran, M.; Ofli, F.; Caragea, D.; Torralba, A. Using AI and Social Media Multimodal Content for Disaster Response and Management: Opportunities, Challenges, and Future Directions. Inf. Process. Manag. 2020, 57, 102261.
  29. Feng, Y.; Huang, X.; Sester, M. Extraction and Analysis of Natural Disaster-Related VGI from Social Media: Review, Opportunities and Challenges. Int. J. Geogr. Inf. Sci. 2022, 36, 1275–1316.
  30. Zahra, K.; Imran, M.; Ostermann, F.O. Automatic Identification of Eyewitness Messages on Twitter During Disasters. Inf. Process. Manag. 2020, 57, 102107.
  31. Roadway Flooding Image Dataset. 2019. Available online: https://www.kaggle.com/datasets/saurabhshahane/roadway-flooding-image-dataset (accessed on 29 May 2023).
  32. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2020, 109, 43–76.
  33. Plank, B.; Hovy, D.; Søgaard, A. Linguistically Debatable or Just Plain Wrong? In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL’14, Dublin, Ireland, 22–27 May 2014; pp. 507–511.
  34. Aroyo, L.; Welty, C. Truth is a Lie: Crowd Truth and the Seven Myths of Human Annotation. AI Mag. 2015, 36, 15–24.
  35. Artstein, R.; Poesio, M. Inter-Coder Agreement for Computational Linguistics. Comput. Linguist. 2008, 34, 555–596.
  36. Uma, A.N.; Fornaciari, T.; Hovy, D.; Paun, S.; Plank, B.; Poesio, M. Learning from Disagreement: A Survey. J. Artif. Intell. Res. 2021, 72, 1385–1470.
  37. Rawat, W.; Wang, Z. Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review. Neural Comput. 2017, 29, 2352–2449.
  38. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 6105–6114.
  39. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR’17, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
  40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR’16, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  41. Kowsari, K.; Jafari Meimandi, K.; Heidarysafa, M.; Mendu, S.; Barnes, L.; Brown, D. Text Classification Algorithms: A Survey. Information 2019, 10, 150.
  42. Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep Learning-Based Text Classification: A Comprehensive Review. ACM Comput. Surv. 2021, 54, 1–40.
  43. Liu, J.; Chang, W.C.; Wu, Y.; Yang, Y. Deep Learning for Extreme Multi-Label Text Classification. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’17, Madrid, Spain, 11–15 July 2017; pp. 115–124.
  44. Chowdhary, K.R. Natural Language Processing. In Fundamentals of Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2020; pp. 603–649.
  45. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP ’20, Online, 16–20 November 2020; pp. 38–45.
  46. Middleton, S.E.; Kordopatis-Zilos, G.; Papadopoulos, S.; Kompatsiaris, Y. Location Extraction From Social Media: Geoparsing, Location Disambiguation, and Geotagging. ACM Trans. Inf. Syst. 2018, 36, 1–27.
  47. Buscaldi, D. Approaches to Disambiguating Toponyms. SIGSPATIAL Spec. 2011, 3, 16–19.
  48. Magge, A.; Weissenbacher, D.; Sarker, A.; Scotch, M.; Gonzalez-Hernandez, G. Deep Neural Networks and Distant Supervision for Geographic Location Mention Extraction. Bioinformatics 2018, 34, i565–i573.
  49. Xu, C.; Li, J.; Luo, X.; Pei, J.; Li, C.; Ji, D. DLocRL: A Deep Learning Pipeline for Fine-Grained Location Recognition and Linking in Tweets. In Proceedings of the World Wide Web Conference, WWW ’19, San Francisco, CA, USA, 13–17 May 2019; pp. 3391–3397.
  50. Weyand, T.; Kostrikov, I.; Philbin, J. PlaNet—Photo Geolocation with Convolutional Neural Networks. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part VIII 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 37–55.
  51. Miksch, S.; Aigner, W. A Matter of Time: Applying a Data–Users–Tasks Design Triangle to Visual Analytics of Time-oriented Data. Comput. Graph. 2014, 38, 286–290.
  52. Shneiderman, B. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In Proceedings of the IEEE Symposium on Visual Languages, VL’96, Boulder, CO, USA, 3–6 September 1996; pp. 336–343.
  53. Sedlmair, M.; Meyer, M.; Munzner, T. Design Study Methodology: Reflections from the Trenches and the Stacks. IEEE Trans. Vis. Comput. Graph. 2012, 18, 2431–2440.
  54. Isenberg, T.; Isenberg, P.; Chen, J.; Sedlmair, M.; Möller, T. A Systematic Review on the Practice of Evaluating Visualization. IEEE Trans. Vis. Comput. Graph. 2013, 19, 2818–2827.
Figure 1. Outline of the design space of the proposed VGI-facilitated visual analytic approach and envisioned VA pipeline.
Figure 2. Application of a custom version of a previously developed visual text analytic tool uVSAT for flood-related lexical marker analysis in tweets in English as part of the data collection stage. (a) Timeline view with separate and combined results of CrisisLex and EMTerms marker detection. (b) Document view with the complete set of tweets loaded for exploration and potential export of further markers.
Figure 3. Visualization dashboard used in combination with several computational models for the tasks of tweet relevance classification and location extraction [22]. Here, similar to the pilot studies described in Section 4.1, the data and the respective developed models were based on tweets in English rather than the target social media posts in Swedish.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
