Rapid Damage Estimation of Texas Winter Storm Uri from Social Media Using Deep Neural Networks

Pi, Yalong; Ye, Xinyue; Duffield, Nick; on behalf of the Microsoft AI for Humanitarian Action Group

doi:10.3390/urbansci6030062

Open Access Technical Note

¹ Institute of Data Science, Texas A&M University, College Station, TX 77843, USA

² Department of Landscape Architecture and Urban Planning, Texas A&M University, College Station, TX 77843, USA

³ Department of Electrical and Computer Engineering & Institute of Data Science, Texas A&M University, College Station, TX 77843, USA

* Authors to whom correspondence should be addressed.

† Microsoft Corporation, Redmond, WA 98052, USA.

‡ A complete list of group members is provided in the Acknowledgments.

Urban Sci. 2022, 6(3), 62; https://doi.org/10.3390/urbansci6030062

Submission received: 18 July 2022 / Accepted: 2 September 2022 / Published: 13 September 2022

(This article belongs to the Special Issue Feature Papers in Urban Science)

Abstract

Winter Storm Uri, which occurred in February 2021, affected many regions in Canada, the United States, and Mexico. The State of Texas was severely impacted due to the failure of its electricity supply infrastructure, compounded by its limited connectivity to other grid systems in the United States. The georeferenced estimation of the storm’s impact is crucial for response and recovery. However, such information was not available until several months afterward, mainly due to the time-consuming and costly assessment processes. The delay in providing timely information particularly impacted people in economically disadvantaged communities, who lack resources to ameliorate the impact of the storm. This work explores the potential for disaster impact estimation based on the analysis of instant social media content, which can provide actionable information to assist first responders, volunteers, governments, and the general public. In our prototype, a deep neural network (DNN) uses geolocated social media content (texts, images, and videos) to provide monetary assessments of the damage at zip code level caused by Uri, achieving up to 70% accuracy. In addition, the performance analysis across geographical regions shows that the fully trained model is able to estimate the damage for economically disadvantaged regions, such as West Texas. Our methods have the potential to promote social equity by guiding the deployment of recovery resources to the regions where they are needed based on damage assessment.

Keywords:

Texas Winter Storm Uri; deep neural network; damage estimation; social media; natural language processing; geographic information system

1. Introduction

The severe Winter Storm Uri occurred during 11–21 February 2021, causing power outages across the State of Texas. It was declared a national disaster in the US, with the Federal Emergency Management Agency (FEMA) coordinating the federal government response. Texas was severely impacted due to the failure in the electricity supply infrastructure compounded by its limited connectivity to other grid systems in the United States. Loss of electricity supply affected 10 million people directly, with cascading failures in heat, water, transportation, and food supply, among others [1,2]. In the aftermath, the monetary damage assessment across communities is crucial for efficient and equitable allocation of resources for response and recovery. However, this assessment has still not been widely available months after the event, due to the resource-demanding and time-consuming preliminary damage assessment (PDA) procedures including self-reporting, onsite surveys, and insurance claims [3]. Although some insurance organizations, such as the Insurance Council of Texas (ICT), collected data for building and vehicle claims, they did not publish the raw data.

Social media platforms such as Facebook and Twitter allow users to share real-time data including text, images, and videos with location information. During disasters, people can facilitate information diffusion, gain situational awareness, and request assistance to enhance disaster response through social media platforms [4,5,6,7,8,9,10]. Such digital platforms can provide a communication channel to disadvantaged communities through the accounts of local residents or volunteers [11,12,13]. For instance, one recent work utilized power outage, pipe burst, and food accessibility data from Mapbox, SafeGraph, and 311 in Harris County, Texas, during Uri, and the subsequent analysis revealed that low-income and racial/ethnic minority groups were more disrupted [14]. While social media can be useful in disaster information exchange, the representativeness of such data varies by demographic factors such as age, gender, race, and educational attainment, potentially creating bias in data analysis [15,16]. For instance, younger users in urban areas are more likely to geotag their social media messages [17,18].

The spatial difference in the social media contents from various communities can be used to identify the needs of diverse social groups and measure the spatial inequality of disaster impact. Considering the input data complexity and various formats, this work proposes a deep neural network (DNN)-based framework to represent the nonlinear relationship between the social media damage description and straightforward monetary damage. Particularly, the social media data in Winter Storm Uri and damage data collected by FEMA are employed to train, validate, and test such a DNN framework. The content of this paper is organized as follows. Section 2 reviews the related work and how this work is related to the current literature. Section 3 demonstrates the methodology, including the details of data collection, DNN architecture design, training, and testing arrangement. Following that, Section 4 shows the results analysis and the estimation products. Lastly, Section 5 provides the conclusion and lists the future research directions.

2. Related Work

There are many aspects to harnessing social media to assist disaster response and recovery [19,20]. One method is sentiment sensing based on natural language processing (NLP) techniques, which extracts people’s reactions and situation awareness by classifying the text data of social media into positive, negative, or neutral categories [21]. Another aspect is information fission and coordination. For instance, a research team from Washington created Twitris to classify emergency-related topics and visualize the results on a Geographic Information System (GIS) map for the general public [22]. Mobility patterns can also be measured from social media. For instance, georeferenced tweets were utilized to quantify and predict New York City citizens’ movements before and after Hurricane Sandy [23,24].

Besides the applications mentioned above, damage estimation is critical to efficient first response and resource delivery, and a wide range of methods have been used to address this problem. For instance, one study used social media to derive a disaster index related ratio (DIRR) and sentiment for each county in Florida during Hurricane Matthew, and suggested a positive correlation between damage claims and both DIRR and sentiment [9]. Other research studied Hurricane Sandy by utilizing a multiscale analysis of Twitter activity before, during, and after the event, concluding that negative sentiment correlates with per-capita damage [25]. Even though correlations have limited applications, they show that it is possible to infer damage from social big data.

Other than measuring such correlations, computer vision techniques have been used to measure the severity of the impact. One research team constructed convolutional neural network (CNN) models to classify hurricane and earthquake damage images from social media into three categories (none, mild, severe) with 76% accuracy [26]. One more recent work fused CNN and natural language processing (NLP) techniques to generate captions for the social media images that enable victim and building damage identification [27]. With proper aggregation and analysis, these images can provide the detailed severity description to the users, including government, volunteers, and first responders. However, this method requires images taken from cameras and angles similar to the training dataset, which are not always available on social media. Despite the works of correlation analysis, GIS mapping, and computer vision classification, there is limited work on the end-to-end monetary loss estimation from social media, especially at the household level. A deep neural network (DNN), which has many layers of neurons, hence deep, is good at modeling complex relationships between a wide range of input and output [28]. For example, a supervised DNN with enough data can model chemical and physical processes with high speed [29], recognize images via CNN architectures [30], and predict stock market returns based on historical data [31]. The reason behind this is that the high-dimensional neural connections and weights can be optimized by many iterations based on the training data. Therefore, this work attempts to utilize a DNN to map the social media data (texts, images, and videos) to the straightforward monetary loss.

3. Methodology

3.1. DNN Structure

Equation (1) shows the high-level concept of the DNN design. In this equation, z indicates zip code and DNN represents the DNN function. Variables Text, Image, and Video are all the vectorized texts, images, and videos from social media located within that zip code area z. On the output side of the DNN function is the numerical description of the impact, such as Damage and Injury, in each zip code area. It is worth noting that the data categories on both input and output sides of the equation are expandable, e.g., the number of displaced people can be appended to the output, as long as the data can be vectorized (converted to numerical values). With this end-to-end structure, the DNN model will produce the numerical estimates, such that the social media usage bias mentioned in Section 1 will be modeled in the DNN weights. Indeed, there is bias in the intermediate features; however, the accurate estimation on the output end provides valuable information regardless.

The reason for selecting zip code as the model building granularity is twofold. First, zip code areas in Texas cover similar populations, as shown in Figure 1, whereas populations vary much more widely across counties. According to this histogram, most zip codes serve fewer than 60,000 people, whereas the most populated county in Texas is Harris County, with 4.7 million residents. Second, FEMA publishes household-level damage data with zip code and county information but without precise location (latitude and longitude) for privacy protection. Therefore, the zip code is selected to align the input and output resolution. In other words, this DNN is designed to sense the disaster reaction and records in each zip code, based on the underlying similar resiliency and behavior commonalities in close communities.

$$\mathrm{DNN}\left(\begin{bmatrix} \mathit{Text}_{z} \\ \mathit{Image}_{z} \\ \mathit{Video}_{z} \\ \vdots \end{bmatrix}\right) = \begin{bmatrix} \mathit{Damage}_{z} \\ \mathit{Injury}_{z} \\ \vdots \end{bmatrix} \tag{1}$$

Figure 2 shows the details of the DNN architecture in this research with the input (green box states “text”) on the left and output (blue box states “damage data”) on the right. This work aims to test the plausibility of such a DNN design; hence, only the text data are used, rather than fully integrating multiple data formats such as images and videos. Nevertheless, image and video data can also be vectorized by using CNN, long short-term memory (LSTM), and other techniques such that they can be combined with text data as input. On the input side, all the text data in zip code (z) will be stacked together and vectorized to a 1024-dimensional vector using HashingVectorizer [32]. In the network in between, i.e., hidden layers, there are 5 fully connected layers marked with yellow color. The number of neurons of each hidden layer is 512, 256, 128, 64, and 32, respectively. Lastly, the output layer is a 7-dimensional vector representing the average economic loss in 7 categories defined by FEMA. The input and output neurons are also fully connected with the hidden layers. At the bottom of Figure 2, the dimensions of input, each hidden layer, and output are marked in solid boxes with the corresponding color.
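The input vectorization and layer dimensions described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: `hash_text` is a simplified stand-in for scikit-learn's HashingVectorizer (which uses a different hash function and, by default, signed and normalized features), and the presence of bias terms in every layer is an assumption made only to count parameters.

```python
import re
import zlib

INPUT_DIM = 1024                      # hashed text vector size from the paper
HIDDEN_DIMS = [512, 256, 128, 64, 32]  # five fully connected hidden layers
OUTPUT_DIM = 7                        # seven FEMA damage categories


def hash_text(text, dim=INPUT_DIM):
    """Map stacked tweet text to a fixed-length count vector (hashing trick).

    Simplified stand-in for HashingVectorizer: CRC32 is used here only
    because it is deterministic and available in the standard library.
    """
    vector = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        index = zlib.crc32(token.encode("utf-8")) % dim
        vector[index] += 1.0
    return vector


def layer_shapes():
    """Return (inputs, outputs) for each fully connected layer."""
    dims = [INPUT_DIM] + HIDDEN_DIMS + [OUTPUT_DIM]
    return list(zip(dims[:-1], dims[1:]))


def parameter_count(with_bias=True):
    """Count trainable parameters; bias terms are an assumption."""
    return sum(n_in * n_out + (n_out if with_bias else 0)
               for n_in, n_out in layer_shapes())


if __name__ == "__main__":
    print(layer_shapes())
    print(parameter_count())
```

Under these assumptions the network has roughly 0.7 million trainable parameters, most of them in the first layer, which helps put the 338 paired zip codes used for model building in perspective.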

The rectified linear unit (ReLU) activation function is applied between connected neurons to add non-linearity [33]. The L1 loss function, as shown in Equation (2), is used in the training process. In this equation, z indicates the zip code, c represents the FEMA damage category, $\mathit{damage}_{\mathit{true}}$ is the ground truth (FEMA) average household damage in USD, and $\mathit{damage}_{\mathit{prediction}}$ is the DNN-predicted damage. The prepared dataset is split into three parts: training (80%), validation (10%), and testing (10%). While training, the DNN weights are optimized by using the stochastic gradient descent technique [34] to minimize the L1 loss so that the DNN predictions are as close as possible to the FEMA damage data. The initial learning rate is set to 0.001 and decreases by one order of magnitude if the L1 loss does not decrease within 3 iterations. To avoid overfitting, the validation loss is monitored and the training process is terminated if the validation loss does not decrease within 10 iterations [35]. Lastly, the testing portion is used to report the DNN performance.

$$\mathrm{L1\ loss} = \sum_{i=1}^{z} \sum_{j=1}^{c} \left| \mathit{damage}_{\mathit{true}} - \mathit{damage}_{\mathit{prediction}} \right| \tag{2}$$
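To make the training rules concrete, the following sketch implements the L1 loss of Equation (2) and the two patience rules described above. It is an illustration under stated assumptions, not the authors' code: the class name `PlateauController` is hypothetical, the learning-rate rule is assumed to watch the training loss (the text does not say which loss is monitored), and the gradient descent update itself is omitted.

```python
def l1_loss(true_rows, predicted_rows):
    """Sum of absolute errors over zip codes (rows) and categories (columns)."""
    return sum(
        abs(t - p)
        for true_row, predicted_row in zip(true_rows, predicted_rows)
        for t, p in zip(true_row, predicted_row)
    )


class PlateauController:
    """Tracks learning-rate decay and early stopping (hypothetical helper)."""

    def __init__(self, lr=0.001, lr_patience=3, stop_patience=10, factor=0.1):
        self.lr = lr
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.factor = factor
        self.best_train = float("inf")
        self.best_val = float("inf")
        self.train_stall = 0
        self.val_stall = 0

    def step(self, train_loss, val_loss):
        """Update state after one iteration; return True when training should stop."""
        # Assumption: the learning rate reacts to the training L1 loss.
        if train_loss < self.best_train:
            self.best_train = train_loss
            self.train_stall = 0
        else:
            self.train_stall += 1
            if self.train_stall >= self.lr_patience:
                self.lr *= self.factor  # one order of magnitude
                self.train_stall = 0
        # Early stopping watches the validation loss.
        if val_loss < self.best_val:
            self.best_val = val_loss
            self.val_stall = 0
        else:
            self.val_stall += 1
        return self.val_stall >= self.stop_patience


if __name__ == "__main__":
    print(l1_loss([[100.0, 50.0]], [[90.0, 70.0]]))
```

In a training loop, `step` would be called once per iteration with the current losses, and the optimizer's learning rate would be read back from `controller.lr`.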

3.2. Social Media Data and Keywords Calculation

Previous studies have collected data from a variety of social media, such as Facebook, Twitter, and Flickr. However, some platforms limit data access due to privacy concerns, e.g., Facebook. This research uses Twitter because it allows users to share all types of data (e.g., texts, images, videos, polls, links, and hashtags) with geotags, which is critical to spatial damage analysis and response. Most importantly, Twitter recently launched an “academic research product track” providing access to 10 million records per month from its historical data repository [36], which was not available before. Therefore, the Twitter application programming interface (API), which is a keyword-based system, is used to collect tweets during Uri in Texas.

There are broadly three types of social media noise: constant background topics (e.g., daily traffic), overlapping events (e.g., Valentine’s Day and Uri), and random signals (e.g., advertisement robots) [37]. To mine only the winter-storm-related text data, this work introduces a keyword calculation mechanism to generate closely related keywords for the Twitter API system. In detail, a 1% sample (2 million) of all global tweets in February 2021 is collected from an online library named Internet Archive (accessed 1 November 2021 from https://archive.org/details/twitterstream). Next, all the tweets located in Texas are divided into three parts by Uri’s timeframe of 11–21 February 2021 (FEMA definition): February before (FB), February during (FD), and February after (FA), meaning the tweets before, during, and after Uri, respectively. In addition, a week’s worth of current data (from 1 October) is collected to form a subset of October current (OC). The purpose of this data arrangement is to automatically calculate Uri-related keywords using Equation (3), and then to use these keywords to search for as much data as possible via the Twitter API.

$$\mathit{keywords} = (\mathit{FD} + \mathit{FA}) - (\mathit{OC} + \mathit{FB}) \tag{3}$$

In this equation, FB, FD, FA, and OC are lists of keywords extracted from FB, FD, FA, and OC tweets using Rake [38]. The term keywords represents the desired keywords for Twitter API. The reasoning of Equation (3) is as follows. Table 1 shows the cosine similarities [39] comparison of these four portions. According to this table, OC shares the least similarity among them due to its different season, and appears to be closer to FB and FA than FD. The reason could be that OC, FB, and FA are regular times, whereas FD is undergoing a storm impact. Looking at February only, FD appears to be closer to FA than FB. This implies that the impact of Uri continues in the aftermath FA which is different from regular time FB. Therefore, in Equation (3), the operator “+” combines the keywords during and after Uri (FD + FA) and regular time keywords (OC + FB). Then, operator “−” subtracts the two and removes regular time keywords, leaving only Uri-related keywords. Given the maximum API limit of 1000 characters (roughly 92 words excluding necessary operators such as location), the top 92 keywords were used to mine the Twitter repository without retweets. The full list is below:

“sanantonio OR career OR opening OR warm OR jobs OR park OR interesting OR alerts OR cold OR event OR stones OR measured OR mexico OR heavy OR due OR group OR mobile OR manager OR security OR fun OR reports OR media OR details OR sales OR snow OR turn OR recommend OR facebook OR view OR video OR apply OR places OR blvd OR landmark OR utc OR stepping OR shreveport OR exit OR santa OR rock OR discover OR valentine OR cleburne OR tree OR dollar OR inch OR odessa OR link OR ave OR public OR atxtraffic OR nurse OR hiring OR place OR titles OR chase OR technician OR san OR blocked OR open OR case OR read OR oklahoma OR university OR gainesville OR hear OR pkwy OR bank OR engineering OR follow OR center OR west OR traffic OR ice OR stay OR antonio OR earthquake OR wsw OR cst OR weather OR latest OR wnw OR service OR click OR plano OR left OR bio OR store OR vday OR okctraffic OR created OR fort OR round”.

The operator “OR” between the words defines the search logic, i.e., tweets containing any of the keywords are defined as relevant. This list contains keywords specific to the winter storm, such as “warm”, “cold”, and “snow”. Nevertheless, other seasonal keywords, such as “valentine”, are also in this list. According to the latest Twitter features, users can geotag a tweet with a place (polygon with multiple points) or geoinformation (one point with latitude and longitude). The place polygon can be as big as Texas or as small as a park. While mining, the geotag filter is “place:Texas”, which defines a rectangle bounded by the northwesternmost and southeasternmost coordinates for content filtering. Because this rectangle covers some parts of Mexico, Oklahoma, and other neighboring regions, keywords such as “oklahoma” and “mexico” are also in the list.
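The keyword arithmetic of Equation (3) and the assembly of the OR query can be sketched as below. This is a minimal illustration with assumptions labeled: the inputs are taken to be keyword lists already ranked by the extractor, “+” is read as list union and “−” as set difference, and the query suffix (including the retweet exclusion operator) is an assumed form that is not quoted in the paper.

```python
def uri_keywords(fb, fd, fa, oc):
    """Equation (3): (FD + FA) - (OC + FB), preserving the input ranking."""
    storm = list(dict.fromkeys(fd + fa))  # union, duplicates dropped, order kept
    regular = set(oc) | set(fb)           # regular-time keywords to remove
    return [keyword for keyword in storm if keyword not in regular]


def build_query(keywords, suffix=" place:Texas -is:retweet", max_chars=1000):
    """Join as many top keywords as fit within the character limit.

    The suffix is an assumption standing in for the location filter and
    retweet exclusion; max_chars mirrors the 1000-character API limit.
    """
    chosen = []
    for keyword in keywords:
        candidate = "(" + " OR ".join(chosen + [keyword]) + ")" + suffix
        if len(candidate) > max_chars:
            break
        chosen.append(keyword)
    return "(" + " OR ".join(chosen) + ")" + suffix


if __name__ == "__main__":
    # Toy keyword lists, not taken from the dataset.
    fb = ["coffee", "game"]
    fd = ["snow", "coffee", "ice"]
    fa = ["ice", "cold"]
    oc = ["game", "concert"]
    print(build_query(uri_keywords(fb, fd, fa, oc)))
```

Truncating at the character limit rather than at a fixed word count keeps the sketch valid when keyword lengths change.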

Mining with the above-mentioned 92 keywords resulted in a dataset named Uri containing tweet text strings, image and video attachment URLs if available, user ID, time stamp, geoinformation if available, and place ID. Altogether, 16,028 tweets in the FD and FA periods were mined within Texas, of which 4934 have attachment URLs (images or videos). However, only 1143 of them have latitude–longitude coordinates, which is consistent with the fact that less than 1% of tweets tag their pinpoint locations [40]. All these coordinates were used to compute the zip code for each tweet in the dataset Uri.
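The paper does not state how coordinates were converted to zip codes. One common approach, shown here purely as an assumption, is a point-in-polygon test against zip code boundary polygons, followed by stacking the tweet text per zip code as the DNN input requires. The polygons, zip labels, and tweet dictionaries below are hypothetical toy data.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def stack_text_by_zip(tweets, zip_polygons):
    """Assign each geotagged tweet to a zip code and concatenate the texts."""
    stacked = {}
    for tweet in tweets:
        for zip_code, polygon in zip_polygons.items():
            if point_in_polygon(tweet["lon"], tweet["lat"], polygon):
                stacked.setdefault(zip_code, []).append(tweet["text"])
                break  # a point belongs to at most one zip code
    return {z: " ".join(texts) for z, texts in stacked.items()}


if __name__ == "__main__":
    # Two adjacent unit squares standing in for zip code boundaries.
    zip_polygons = {
        "zip_a": [(0, 0), (1, 0), (1, 1), (0, 1)],
        "zip_b": [(1, 0), (2, 0), (2, 1), (1, 1)],
    }
    tweets = [
        {"lon": 0.5, "lat": 0.5, "text": "no power"},
        {"lon": 1.5, "lat": 0.5, "text": "pipes burst"},
        {"lon": 0.25, "lat": 0.75, "text": "snow"},
        {"lon": 5.0, "lat": 5.0, "text": "elsewhere"},
    ]
    print(stack_text_by_zip(tweets, zip_polygons))
```

For real boundary files, a spatial index or a GIS library would replace the linear scan over polygons; the scan is kept here only for clarity.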

3.3. Damage Statistics from FEMA

The household damage was collected via the OpenFEMA API [41], which covers all historical public assistance (PA) and individual assistance (IA) approved by FEMA in all nationally declared disasters, including Uri. Each household has a record with up to 96 categories of information, including income, rent/own status, occupancy number/age, structure type, insurance, and others. Since the focus of this paper is the household-level damage, only the IA data are used. Table 2 displays the monetary economic loss categories with their description and source: individual and households program (IHP) amount (ihpAmount), housing assistance (HA) amount (haAmount), other needs assistance (ONA) amount (onaAmount), real property damage amount (rpfvl), personal property (ppfvl), rental assistance amount (rentalAssistanceAmount), and repair amount (repairAmount). The data in each category are all aggregated and averaged at the zip code level and paired with social media data.

The completed dataset Uri has social media data from 550 different zip codes and FEMA damage data from 1587 zip codes. Considering that Texas has 1930 zip codes in total, FEMA does not cover all the zip codes, especially rural regions (visualized in Section 4). Altogether, the dataset Uri has 338 paired data points with both social media and damage data as input and output for the DNN building. According to the methodology, 270 (80%) data points were used for training and the remaining 68 (20%) for validation and testing. There are 212 zip codes that have social media relevant to the winter storm but no FEMA damage reports. The fully trained DNN will be used to estimate the missing damage statistics for these zip codes, which is important for assisting communities at a disadvantage [42].
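The aggregation, pairing, and splitting steps can be sketched as follows. This is an illustration under assumptions: records are plain dictionaries, the zip code field name `zipCode` is hypothetical, missing amounts are treated as zero, and the shuffle seed is arbitrary. With 338 paired zip codes, the split reproduces the 270/34/34 partition implied by the 80%/10%/10% scheme.

```python
import random
from collections import defaultdict

CATEGORIES = [
    "ihpAmount", "haAmount", "onaAmount", "rpfvl",
    "ppfvl", "rentalAssistanceAmount", "repairAmount",
]


def average_damage_by_zip(records, zip_field="zipCode"):
    """Average the seven damage categories over households in each zip code."""
    sums = defaultdict(lambda: [0.0] * len(CATEGORIES))
    counts = defaultdict(int)
    for record in records:
        zip_code = record[zip_field]
        counts[zip_code] += 1
        for k, category in enumerate(CATEGORIES):
            # Assumption: a missing or empty amount counts as zero.
            sums[zip_code][k] += float(record.get(category) or 0.0)
    return {z: [s / counts[z] for s in sums[z]] for z in sums}


def pair_and_split(text_by_zip, damage_by_zip, seed=0):
    """Keep zip codes present in both sources, then split 80/10/10."""
    paired = sorted(set(text_by_zip) & set(damage_by_zip))
    random.Random(seed).shuffle(paired)
    n_train = int(0.8 * len(paired))
    n_val = (len(paired) - n_train) // 2
    train = paired[:n_train]
    val = paired[n_train:n_train + n_val]
    test = paired[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    records = [
        {"zipCode": "zip_a", "ihpAmount": 100, "haAmount": 50},
        {"zipCode": "zip_a", "ihpAmount": 300},
    ]
    print(average_damage_by_zip(records)["zip_a"])
```

Zip codes that appear only in `text_by_zip` are the ones for which the trained model would later supply estimates.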

4. Analysis and Results

Based on the hyperparameters defined in Section 3.1, the DNN model is trained on the training portion and terminated at roughly 9000 iterations. The fully trained model is then tested on the testing portion to compare the prediction with the ground truth. Figure 3 shows the comparison for each testing zip code, where the X-axis indicates the predicted average loss in USD, and the Y-axis represents the damage measured by FEMA. In the 100% accurate scenario, all the testing points of all categories (colors) should fall on the 45-degree line. By definition, the points in the upper left section are the underestimation cases and the points in the lower right section are the overestimation cases. According to this figure, underestimation is more common than overestimation, especially at higher values, e.g., ihpAmount.

The accuracy for each category c is calculated using Equation (4) below. Here, z represents the zip code in the testing portion, $\mathit{damage}_{\mathit{true}}$ is the ground truth FEMA damage, and $\mathit{damage}_{\mathit{prediction}}$ is the DNN-predicted damage. We choose this metric since it is equivalent to averaging relative error over each estimate, but is weighted by the ground truth value. According to Figure 3, the best-performing categories are ihpAmount, haAmount, and rentalAssistanceAmount, achieving accuracy of 68.43%, 68.39%, and 70.07%, respectively. Moreover, the smaller values, e.g., ppfvl and onaAmount, are often associated with lower accuracy (51.89% and 34.35%) compared to others.

$$\mathit{accuracy}_{c} = 1 - \sum_{i=1}^{z} \frac{\left| \mathit{damage}_{\mathit{true}} - \mathit{damage}_{\mathit{prediction}} \right|}{\mathit{damage}_{\mathit{true}}} \tag{4}$$
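A small numerical sketch may help interpret the accuracy metric. Equation (4) as printed places the sum over zip codes in front of the fraction, whereas the prose describes an average of relative errors weighted by the ground truth value. The code below follows the prose reading, which is an assumption on our part: the weighted average reduces to total absolute error divided by total true damage. The function names and numbers are illustrative only.

```python
def category_accuracy(true_values, predicted_values):
    """1 minus the truth-weighted mean relative error for one damage category."""
    total_true = sum(true_values)
    if total_true == 0:
        raise ValueError("accuracy is undefined when all true values are zero")
    total_abs_error = sum(abs(t - p) for t, p in zip(true_values, predicted_values))
    return 1.0 - total_abs_error / total_true


def weighted_relative_error(true_values, predicted_values):
    """Relative errors averaged with the true values as weights."""
    weighted = sum(t * (abs(t - p) / t)
                   for t, p in zip(true_values, predicted_values) if t != 0)
    return weighted / sum(true_values)


if __name__ == "__main__":
    true_values = [100.0, 300.0]      # toy FEMA averages in USD
    predicted_values = [80.0, 360.0]  # toy DNN predictions in USD
    print(category_accuracy(true_values, predicted_values))
    print(1.0 - weighted_relative_error(true_values, predicted_values))
```

Both calls give the same value for inputs with nonzero true damage, which is the equivalence the text refers to; zip codes with large true damage dominate the score.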

As mentioned in Section 3.3, FEMA did not survey 212 zip codes for the storm damage. However, this does not mean there is no damage in those areas. The fully trained DNN model can be used to estimate the average damage amount for them in all seven categories based on real-time social media data. To visualize such capability on a GIS map, the ihpAmount is selected because it has high values and thus clear variation. Figure 4 illustrates the ihpAmount from FEMA and the DNN model. In this map, gray areas indicate zip codes with no data from either social media or FEMA. The 338 paired data points in the dataset Uri, i.e., zip codes with both FEMA and social media data, are marked with green color representing the FEMA evaluated amount in USD. The intensity of the color shows the value variation in each legend. There are zip codes with only social media but no FEMA evaluation, and the DNN can estimate the damage for these neglected regions. The DNN-estimated ihpAmount is color-coded with red, indicating the USD values in five bins: 0–105, 105–208, 209–319, 319–429, and 429–533. Similarly, the zip codes that only have FEMA damage data are marked with blue and are visualized with different color intensity.

From this map, it is observed that many zip codes in rural West Texas do not have FEMA survey data, whereas metropolitan areas such as Houston and Dallas are fully covered (blue and green). In comparison, the DNN-estimated damage has less variation (0–522) than the FEMA damage data (0–10,070). Still, some zip codes in rural West Texas lack both social media and FEMA data. This can be addressed by localizing more users there even if they do not share geoinformation. Densely populated areas, i.e., small-sized zip codes, often have less damage than large-sized zip codes in both the DNN and FEMA damage. It is also observed that several zip codes in Oklahoma are reported by FEMA and are hence mapped in this figure.

In order to evaluate the damage prediction accuracy across geographies, Figure 5 displays the ihpAmount prediction error for each paired zip code, i.e., green in Figure 4. The error percentage is calculated by Equation (5) for each zip code, where $\mathit{damage}_{\mathit{true}}$ is the FEMA ihpAmount and $\mathit{damage}_{\mathit{prediction}}$ is the DNN-predicted value. According to this definition, the lower the error (coded with lighter color), the closer the prediction is to the FEMA survey, and the darker the color, the less accurate the DNN estimation. The color intensity is divided into 10 equal bins by data size, with the remaining marked in light blue. It is worth noting that this map only shows the overlapping (FEMA survey and DNN estimation) ihpAmount to show the accuracy as an example; the other categories can be analyzed the same way. In this map, 7 out of 10 bins have an estimation error below 43%, which is consistent with the average ihpAmount 68.42% accuracy in Figure 3. The more accurate estimations appear close to the cities (Houston, Dallas, and San Antonio) compared to rural regions. The map distribution also suggests more data collection in the centers of Houston and Dallas areas, as they are voids. Altogether, the DNN only trained on text data is capable of predicting average damage with up to 70% accuracy and is able to estimate damage for the communities that are often neglected.

$$\mathit{error} = \frac{\left| \mathit{damage}_{\mathit{true}} - \mathit{damage}_{\mathit{prediction}} \right|}{\mathit{damage}_{\mathit{true}}} \tag{5}$$
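The per-zip-code error of Equation (5) and the 10-bin color coding can be sketched as below. The reading of “10 equal bins by data size” as equal-count (quantile) bins is an assumption, and the zip labels and values are toy data.

```python
def relative_error(true_value, predicted_value):
    """Equation (5) for one zip code; true_value must be nonzero."""
    return abs(true_value - predicted_value) / true_value


def equal_count_bins(errors_by_zip, n_bins=10):
    """Assign each zip code a bin index so bins hold roughly equal counts.

    Bin 0 holds the smallest errors (lightest color), n_bins - 1 the largest.
    """
    ordered = sorted(errors_by_zip, key=errors_by_zip.get)
    n = len(ordered)
    return {zip_code: min(rank * n_bins // n, n_bins - 1)
            for rank, zip_code in enumerate(ordered)}


if __name__ == "__main__":
    # Toy data: 20 zip codes with a true ihpAmount of 200 USD each.
    predictions = {"zip_%02d" % i: 200.0 - 5.0 * i for i in range(20)}
    errors = {z: relative_error(200.0, p) for z, p in predictions.items()}
    bins = equal_count_bins(errors)
    print(bins["zip_00"], bins["zip_19"])
```

Equal-count bins keep every color class populated even when a few zip codes have very large errors, which is one reason to prefer them over equal-width bins on a choropleth map.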

5. Conclusions and Discussion

This work demonstrated a prototype that takes in georeferenced real-time social media data (texts, images, and videos) and estimates the monetary damage statistics using DNN techniques. The output estimation could be visualized and mapped for end users including government, non-governmental organizations (NGOs), first responders, news media, and the general public. To test the plausibility of such a DNN model, a keyword calculation technique was developed to mine the text data relevant to Winter Storm Uri on Twitter. The damage statistics of seven different categories were collected from FEMA individual assistance reports and paired with the social media data on the zip code level to form the dataset Uri. A fully connected DNN structure was designed and trained, validated, and tested on the dataset Uri to predict the average monetary loss in USD based on the georeferenced text data. The fully trained DNN model achieved up to 70% accuracy when tested on the testing portion, indicating successful damage estimation. Moreover, the fully trained DNN could help in estimating the damage in areas that the FEMA survey does not cover, especially for rural and low-income communities who are often at a disadvantage.

Although only 1143 tweets from 550 zip codes were used in this study, the fully trained DNN was able to achieve a 70% accuracy, which shows the potential of the proposed method. The total collected data size is 16,028 tweets (with images), with a great portion of them not having location information. Based on the common understanding of DNN research, the more training data there are, the more accurate the model becomes. Hence, the next step is to develop tweet localization techniques to enlarge the dataset Uri. This could be achieved by using the user’s previous location, social network region, or other information to derive the current geoinformation. However, this is beyond the scope of this research. Moreover, the information-rich image and video (images and audio) data can be combined with the current DNN, which is another way to increase the data size. This work is set to predict zip-code-level average damage due to the location information availability. Future work of downscaling, i.e., from zip-code-level average onto household level, could improve the granularity of this technique and, hence, greatly assist general public risk awareness, resource distribution, and volunteer arrangement. Finally, we seek to generalize our approach to investigate the extent to which our approach, and even a specific regional model, is applicable to comparable events occurring in other regions.

Author Contributions

All authors conceived and designed the study and outlined the methodology; Y.P. analyzed the data and drafted the manuscript; X.Y. and N.D. extensively updated the manuscript. Microsoft AI for Humanitarian Action Group assisted the computational solution on Azure. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This material is based upon work supporting “Rapid damage prediction from social media using historical big data and deep learning” through Microsoft AI for Humanitarian Action. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funder. We thank all members of the Microsoft AI for Humanitarian Action Group, Cameron Birge, Microsoft Philanthropies, AI for Humanitarian Action Program Lead.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Texas population histogram by zip code.

Figure 2. DNN architecture.

Figure 3. Zip code average damage ($) estimation accuracy.

Figure 4. Distribution of ihpAmount ($) from FEMA, and the DNN model prediction.

Figure 5. Map of DNN ihpAmount estimation error (%).

Table 1. Similarity (%) matrix of keywords of FB, FD, FA, and OC.

	FB	FD	FA	OC
FB	100	52.22	65.55	22.22
FD	52.22	100	56.66	15.55
FA	65.55	56.66	100	21.11
OC	22.22	15.55	21.11	100
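
A symmetric keyword similarity matrix of the kind shown in Table 1 could be computed along the following lines. This is a minimal sketch, assuming cosine similarity over binary keyword-presence vectors; the function names and the toy keyword lists are hypothetical and are not the keywords or code used in this work.

```python
import math


def cosine_similarity_pct(keywords_a, keywords_b):
    """Cosine similarity (%) between two keyword sets treated as 0/1 vectors."""
    a, b = set(keywords_a), set(keywords_b)
    if not a or not b:
        return 0.0
    # For binary vectors the dot product is the overlap size,
    # and each vector norm is the square root of the set size.
    return 100.0 * len(a & b) / math.sqrt(len(a) * len(b))


def similarity_matrix(keyword_sets):
    """Build a nested dict: matrix[row][col] = similarity in percent."""
    names = list(keyword_sets)
    return {
        row: {
            col: cosine_similarity_pct(keyword_sets[row], keyword_sets[col])
            for col in names
        }
        for row in names
    }


# Hypothetical toy keyword lists (assumption), labelled like the table columns.
toy_keywords = {
    "FB": ["power", "outage", "snow", "pipes"],
    "FD": ["power", "snow", "ice", "roads"],
    "OC": ["music", "game", "snow", "lunch"],
}

matrix = similarity_matrix(toy_keywords)
for row, cols in matrix.items():
    print(row, {col: round(value, 2) for col, value in cols.items()})
```

Under this assumption the diagonal is always 100 and the matrix is symmetric, which matches the layout of Table 1. When all keyword lists have the same length, the score reduces to the share of shared keywords.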

Table 2. Selected FEMA IA categories.

Category	Description	Source
ihpAmount	Total individual and households program (IHP) amount awarded in USD for eligible applicants.	IA owner, IA renter, IHP
haAmount	Amount awarded for housing assistance (HA) in USD from IHP.	IHP
onaAmount	Amount awarded in USD for other needs assistance (ONA) from IHP.	IHP
rpfvl	Real property damage amount.	IHP, IHP large
ppfvl	Value of disaster-caused damage to personal property components, including appliances and furniture.	IHP, IHP large
rentalAssistanceAmount	Amount of rental assistance awarded in USD.	IA renter, IHP
repairAmount	Amount of repair assistance awarded in USD.	IA owner, IHP

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
