**Spatiotemporal Changes of Urban Rainstorm-Related Micro-Blogging Activities in Response to Rainstorms: A Case Study in Beijing, China**

#### **Nan Wang 1,2, Yunyan Du 1,2,\*, Fuyuan Liang 3, Jiawei Yi 1,2 and Huimeng Wang 1,2**


Received: 18 September 2019; Accepted: 26 October 2019; Published: 31 October 2019

#### **Featured Application: This paper examines rainstorm-related micro-blogging activities in response to rainstorms in an urban environment at fine spatial and temporal scales. Results could be used in supporting disaster assessment and mitigation decision making.**

**Abstract:** Natural disasters cause significant casualties and losses in urban areas every year. Further, the frequency and intensity of natural disasters have increased significantly over the past couple of decades in the context of global climate change. Understanding how urban dwellers learn about and response to a natural hazard is of great significance as more and more people migrate to cities. Social media has become one of the most essential communication platforms in the virtual space for users to share their knowledge, information, and opinions about almost everything in the physical world. Geo-tagged posts published on different social media platforms contain a huge amount of information that can help us to better understand the dynamics of collective geo-tagged human activities. In this study, we investigated the spatiotemporal distribution patterns of the collective geo-tagged human activities in Beijing when it was afflicted by the "6-22" rainstorm. We used a variety of machine learning and statistical methods to examine the correlations between rainstorm-related microblogs and the rainstorm characteristics at a fine spatial and a fine temporal scale across Beijing. We also studied factors that could be used to explain the changes of the rainstorm-related blogging activities. Our results show that the human response to a disaster is very consistent, though with certain time lags, in the virtual and physical spaces at both the grid and city scales. Such a consistency varies significantly across our study area.

**Keywords:** social media; rainstorm event; spatiotemporal analysis; factor assessing

#### **1. Introduction**

Natural disasters such as hurricanes, floods and tornadoes can cause significant life losses, property damages, and even political instability [1–4]. In the context of global climate change, natural disasters have become more frequent and pose increasing physical, social, and economic threats to human society [5,6]. Closely monitoring how a natural disaster disturbs human activities is thus of great value [7–9], particularly in an urban environment where human-environment relationships are usually much more complicated.

Different methods and datasets have been used to study urban dwellers' response to a natural disaster. A comprehensive evaluation model was built to evaluate the macro-population vulnerability in response to an earthquake at city and county levels [10]. Questionnaire data were used to examine the impacts of bushfires on residents [11]. Qualitative studies have been conducted to examine how communities are resilient to the impacts of natural disasters using interview data [12].

Natural disasters usually pose significant threats to the dwellers being inflicted. The threats vary dynamically. Consequently, people's emotion and public opinions to such threats may also change rapidly [13]. Such dynamic changes couldn't be timely captured and examined using the afore-mentioned conventional methods and data. Unfortunately, the dynamic changes of the human response to a disaster are extremely important in disaster evaluation and mitigation.

The emerging social media data have shown their usefulness in tracking human behaviors in response to natural disasters and emergencies. Nowadays, Twitter, Facebook, Flicker and Weibo have become indispensable platforms for people to share their ideas and disseminate vital information in time [14–16], in particularly when a natural disaster, public safety event, or disease infection event occurs [17–21]. Data collected through these platforms have been used in improving situation awareness [22–24], event detection [25,26], communication analysis [27,28] and even helping governments guide the public opinions [29,30].

Social media data have also been used to detect and monitor the ongoing development of disasters such as influenza transmission, floods, typhoons, hurricanes, and terrorist attacks [31–34]. Different social media data have been examined to reveal the different development stages of a disaster and how human response to the development stages [35–37]. A research framework was constructed to extract multidimensional (time, space, content, and network) information from social data [8]. Disaster-related information derived from social media data was also used to examine the spatiotemporal impacts of a disaster on human activities and assess the actual disaster-induced damages [38–40]. A variety of machine learning methods have also been used to examine social media data to evaluate the disaster-induced damages [41,42].

Many studies have shown that public opinions in the virtual space, as reflected by social media data, is consistent with what happens in the real world at a broad geographical scale. Such consistency has not been sufficiently examined and clearly illustrated at a finer temporal and spatial scale. There is also a need to study the factors driving the human response to a disaster, particularly in an urban environment. In this study, we investigated how dwellers in a mega city react to rainstorms as revealed by blogging activities on Weibo, one of the most popular social media platforms in China. We then explored various factors that may contribute to the blogging activities in response to rainstorms.

#### **2. Study Area and Data**

#### *2.1. Study Area*

This study examined Weibo users' blogging activities in response to the rainstorms that hit Beijing from 21 to 24 June 2017, we chose the main urban area of Beijing as the study area as shown in Figure 1 (The water ponding points and major hubs in the main urban area have been marked in the Figure 1). Beijing is the capital of China. It significant grew over the past decade. The percent of the developed land in Beijing increases from 7.9% in 2005 to 16.3% in 2015. The total population has increased from 1538 million in 2005 to 2170.5 million in 2015. Beijing is also one of the most economically active cities in China with a total GDP of \$975 billion in 2005 to \$3270 billion in 2015.

Dwellers in Beijing use a variety of social media platforms on a daily base. Sina Weibo is one of the most popular platforms that allow people to stay in touch and share each other information about any on-going events. As of 2018, Sina Weibo has 462 million active users in China and every day an average 1.30 million words are posted online through it [42].

**Figure 1.** A map showing the boundary of our study area. Location of the study area in Beijing (**a**) and overview of the study area (**b**).

#### *2.2. Data*

A total of 3.32 million Weibo blogs geotagged with Beijing were crawled from the Sina Weibo platform. All posts were published from June to September 2017, a period that Beijing receives most of the rains all over the year. Every blog comes with the information of its user, publishing location, publishing time, and texts.

We also collected data of rainfall amount, points of interest (POIs), and water ponding sites. We used two precipitation data sets in this study. The hourly precipitation data at meteorological stations were collected from the China Meteorological Data Network (http://data.cma.cn/). The 1-h cumulative precipitation dataset was generated from the meteorological radar in Daxing, a town located 13 km south of the city. The radar covers the entire Beijing area and provides precipitation data with a spatial resolution of 1.051 × 1.051 km (After image processing and registration processing). In this paper, we unified the study of grid scale to this resolution, and the other data sources used are also processed to this resolution for further analysis and processing.

The POIs include the locations of businesses, educational institutions, residential areas, transportation facilities, open spaces, and others. The data set was produced mainly for navigation by the Beijing NavInfo Co., Ltd, Beijing, 110000. Each POI comes with its coordinates (latitude and longitude), type, name, address and flag (show its importance level). In this paper, we categorized the POIs into five classes (Table 1).



#### **3. Methods**

Figure 2 shows our data processing and analysis processes. We first used ArcGIS 10.5 to aggregate geotagged Weibo posts to grids with the same resolution (1.051 × 1.051 km) as of the rainfall data. We used the support vector machine (SVM) model to classify and extract rainstorm-related microblogs. We then analyzed how city dwellers respond to the rainstorms in terms of the changes of the numbers

of rainstorm-related microblogs at the grid scale and city scale. Finally, we investigated the factors that could be used to explain the city dwellers' response to the rainstorms.

**Figure 2.** A flowchart showing the data processing and analysis processes.

#### *3.1. Extraction of Rainstorm-Related Weibo Posts*

In total we crawled 3.32 million Weibo posts that were published from June to September in 2017 and geo-tagged with city Beijing. We then used the keywords such as thunderbolt, storm, water, and rainfall to filter and found around 8000 posts that are possibly related to the rainstorms.

We then randomly selected 2000 out of the 8000 posts and manually checked each post. The post was labeled with "true" if it is truly related to a rainstorm otherwise "false" if it is not. The 2000 manually labeled posts were then evenly divided into two subsets, which were used to train the SVM classifier and validate the classification results, respectively.

The SVM classifier has been used in previous studies to label the microblogs either as event-related or event-independent [18,43]. It is a nonlinear classifier that was generated using the radial basis function (RBF). Essentially, it produces an optimal hyperplane that can best separate the rainstorm-related posts from those none-related. The hyperplane is defined by two parameters, C and gamma, which represent the influencing range of a single sample and the influencing degree of the support vector, respectively. The two parameters were calculated using the GridSearchCV method [44] based on the training data subset. The validation subset data was then used to evaluate the separation accuracy using the five-fold cross validation method [45]. In this study, we obtained an F-score of 0.85, which indicates that the SVM classifier could be used to identify the truly-rainstorm-related Weibo posts. The final SVM model was then used to examine all unlabeled Weibo posts. In total we found 6072 out of the 8000 posts were truly rainstorm-related during the period of June to September 2017.

#### *3.2. Weibo Blogging Index*

We used two indexes to measure the blogging activities in response to the rainstorms. The first index, the human's event response index (*HERI*), is defined as the ratio between the standardized number of the rainstorm-related Weibo posts (RRWP) to the standardized total number of the posts within a specific cell (In the data standardization process, we normalize the total number of Weibo posts and RRWP to 0–1 for each grid.).

$$HERI = \frac{\text{Standardized number of the RRWP}}{\text{Standardized total number of Weibo Posts}} \tag{1}$$

The *HERI* could be used to measure human response intensity. A higher *HERI* value would indicate city dwellers are more active in blogging the rainstorms. A very similar index was used to estimate hazard-induced damages and monitor the post-hazard recovery speed [40,46].

The *HERI* could be significantly affected by the rainfall amount. Thus, we used another index, the event normalized response relation (ENRR), to evaluate the human response to a rainstorm by eliminating the bias introduced by the variations in rainfall amount. The ENRR is expressed as the relationship between the *HERI* and the rainfall levels per cell. Both the *HERI* and rainfall amount values were first broken into three levels (high, medium, and low) using the Jenks Natural Breaks classification method, which clusters data into different classes by seeking to reduce the variance within a class and maximize the variance between classes. The different combinations of the three *HERI* and rainfall levels would reflect how dwellers response to a rainstorm which brings different rainfall amount across our study area. Figure 3 shows the nine relationships represented by ENRR. In the study, we mapped the relationship of ENRR to each grid to reflect the relationship between *HERI* and rainfall intensity in different regions.

**Figure 3.** The 9 relationships of ENRR.

#### *3.3. Statistical Analysis*

We used a variety of conventional and spatial statistics methods to evaluate the areal difference of the blogging activities in response to the rainstorms across our study area. The hourly rainfall and the corresponding hourly RRWP were separately divided into four different groups according to their quartile levels, from which a confusion matrix was constructed. We then used the weighted Kappa coefficient to evaluate the consistency of the relationship between different levels of rainfall and the RRWP.

Quantile regression was used to estimate the conditional quantiles (0.05, 0.25, 0.50, 0.75, 0.9, and 1) of the number of posts in response to certain rainfall amount by measuring their central tendency and statistical dispersion. Quantile regression could more accurately describe the variation range of the dependent variable in response to the dependent variable. We then used the receiver operating characteristic (ROC) curves [47,48] to obtain the range within which the water ponding sites and major transportation hubs affect the blogging activities in response to the rainstorm. An optimal threshold is obtained by weighting both the sensitivity and the specificity equally, as measured by the closest distance between the points along the ROC curves and the top-left point, i.e., the perfect classification

where the sensitivity and specificity both equal to 1. In addition, we also performed hotspot analysis based on the *HERI* and the ENRR.

In this paper, we examined the blogging activities in response to the rainstorms at both city and grid levels, respectively. The city extent is defined by the administrative boundary of Beijing. Within the city, the rainfall and Weibo posts were aggregated to individual grids of 1.051 km × 1.051 km. There are 2776 grids within our study area, covering a total area of 2749.23 km2.

#### **4. Analysis and Results**

Five heavy rainstorms hit Beijing on 22 June, 6 July, 20 July, 2 August, and 22 August (Figure 4). The 22 June rainstorm brought historical record precipitation, flooded the city, and caused significant economic losses. When the city was afflicted by the 22 June rainstorm, Weibo users posted over 1000 blogs, the maximum blog number among all rainstorm events that hit Beijing in summer 2017. In this study, we mainly focus on the blogging activities in response to the 22 June rainstorm.

**Figure 4.** The time series RRWP and rainfall in Beijing from June to September 2017.

#### *4.1. The "622" Rainstorm*

At 16:00 on 21 June 2017, the Beijing Meteorological Bureau issued the first yellow lightning storm warning of a high altitude and low vortex, which later evolved into the 22 June rainstorm. The rainstorm first brought rain to the western part of Beijing from noon on 21 June and then across the whole Beijing city (Figure 5a). The storm lasted for 66 h and finally ended at 06:00 on 24 June. Heavy rains and floods were reported in Fangshan, Shunyi, Yanqing and Changping Districts. The storm flooded 131.5 hectares of agricultural land, affected 8594 people, and caused about \$3.22 million direct economic losses.

Both conventional and online medias covered this rainstorm extensively and generated a large number of news reports and Weibo posts. In total, we crawled 230,125 geotagged Weibo posts during the time period (20–27 June 2017) (Figure 5b) and 2362 out of them are RRWP (Figure 5c).

**Figure 5.** Spatial distribution of the rainfall amount (**a**), the number of Weibo posts (**b**), and the number of RRWP (**c**).

#### *4.2. Blogging Activities at City Level*

Figure 6 shows the time series hourly rainfall and the hourly number of RRWP from 20 to 28 June. No RRWP was found on the social media platform before the rainstorm hit the city. Rainstorm-related blogging activities were first detected when the first rainstorm warning was issued at 4:20 p.m. on 21 June. The blogging activities significantly intensified, particularly when the rainstorm is most intensive during the time period from 19:00 on 21 June to 04:00 on 24 June. During this time period, the city rainfall amount accounts for 90.5% of the total rainfall brought to Beijing by the "6.22" rainstorm. About 91.6% of the RRWP was posted during this time period.

Figure 6 also shows that variations in the rainfall amount are generally consistent with the blogging activities though there seems to be a 1-h time lag. The blogging activities are most intense in about 10 min before the release of the rainstorm warning. A high rainfall amount is not always accompanied by strong blogging activities, particularly when raining occurs from the late night to the early morning and the rainstorm hits suburbs with a small population flow.

**Figure 6.** Time series of hourly accumulated rainfall and Weibo posts.

The confusion matrix (Figure 7) between the different levels of rainfall amount and the RRWP shows that higher rainfall levels are always associated with more RRWP. We found 38 h with higher rainfall amount and more RRWP. Lower rainfall levels are associated with fewer RRWP. A statistically significant weighted Kappa coefficient of 0.63 indicates that the levels of blogging activities are consistent with the rainfall levels across the city.

**Figure 7.** Consistency between rainfall and RRWP levels.

Figure 8 shows the correlation coefficients between the number of RRWP and rainfall amount with a time lag up to 6 h. With increased time lags, the coefficients drop though the correlations are statistically significant at a confidence level of 0.01. The highest coefficient 0.653 was found when the time lag is 1 h, suggesting that more RRWP were posted one hour after the rainstorm. In other words, heavy rainfall usually triggers intensified blogging activities one hour later. It seems that, after 1 h of the rainstorm, the city starts to be afflicted by issues such as waterlogging and traffic congestion. Such issues tend to intensify rainstorm-related blogging activities.

**Figure 8.** Correlation coefficients between the rainfall amount and the number of RRWP with different time lags.

Quantile regression analysis between the rainfall amount and the number of RRWP (Figure 9) shows a steeper slope for the higher percentile data. In other words, increased rainfall shows a more significant impact on the number of RRWP when the rainfall is heavier. By contrast, when the rainfall is less than 30th percentile (the average rainfall of the grid in the study area is 8.6), an increase in the rainfall amount shows little impacts on the change of the RRWP. Once the rainfall exceeds the 30th percentile, the RRWP starts to increase. As the rainfall percentile increases, the regression slope becomes steeper. In other words, once the rainfall is over the 30th percentile, it tends to trigger Weibo users to post much more RRWP.

**Figure 9.** Quantile regression estimates of the relationship between the RRWP and the rainfall amount at the city scale.

#### *4.3. Human Response at Grid Scale*

In order to explore the differences in human response intensity of different time periods at grid scale, we first divided the whole day into four time periods (08–10, 11–16, 17–20, 21–07), and then map the RRWPs in different time periods by the 4 time periods' dot maps. Figure 10 shows the results. We can find that in the study area, the morning rush hours (08–10) and the evening rush hours (17–20) have the strongest human response intensity. During these two periods, important transportation hubs (Commercial business center, large jobs-housing area) and water ponding point areas have become regions with a high response in the main urban area of Beijing. In addition, there was a phenomenon in which dense points are distributed around the subway and along important roads. The occurrence of rainstorm event has caused great obstacles to traffic operation and delayed human's travel. The points in the second period (11–16) are mainly distributed near the traffic station and the more severely affected areas. The points in the last period are sparsely distributed in the study area. We also found the density distribution of four periods' points in major traffic stations such as airports, railway stations, and bus stations were relatively uniform, while large jobs-housing areas are densely distributed at points of the morning rush hours and the evening rush hours.

Figure 11 shows the correlation between the rainfall amount and the number of RRWP at the grid scale. The correlation coefficients vary between −0.14 and 0.86 with an average of 0.22. The negative or no correlation relationship is mainly found in suburbs, such as the Changping, Huairou and Miyun Districts. By contrast, higher correlation coefficients are mainly found in populated areas within the city, including populated residential communities, important transportation hubs, and areas significantly impacted by the rainstorms as shown in news reports.

**Figure 10.** Dot maps of in different time periods based on grid cells.

**Figure 11.** Correlation coefficients at grid level across our study area.

Figure 12a shows the *HERI* across our study area. Only the 2203 grids with at least 10 daily Weibo posts are selected to calculate the *HERI*. At the grid scale, *HERI* values range between 0 and 9.83 with an average value of 1.23.

The regions with a higher *HERI* value are mainly found in three places in our study area. The first are the areas with more rainstorm-induced damages, including serious house collapse, road blockage and mudslides. These areas are mainly found in the suburbs such as the Fangshan and Mentougou Districts. The densely populated regions, including Zhongguancun and the CBD, the Tongzhou residential area, an Internet technology parks also have a higher *HERI* value. Regions with important transportation hubs also have a higher *HERI* value. The transportation hubs include subway stations, train stations, and airports.

Figure 12b shows the *HERI* hotspot analysis results. Hotspots are mainly located in densely populated areas, important residential and workplaces, such as the Tongzhou residential area, CBD districts, and IT parks. It is worth noting that a large number of hotspots are found in the urban core areas. By contrast, the cold spots are mainly found in the remote suburbs of our study area. Such areas have a low population thus limited human activities.

The proportions of POIs within each hotspot identified are shown in Figure 12c. The *HERI* hotspots in the Beijing Capital Airport, Yizhuang, and Changping-Shahe Districts have the highest percent of transportation POIs. The texts of the RRWP within these hotspots show that the rainstorms may significantly delay the commute in these regions thus stimulate users to publish more RRWP to complain the traffic. The hotspots in the Tongzhou residential area and the Mentougou District show a higher proportion of residential POIs. Hotspots in Zhongguancun, Chaoyang CBD and IT Park have a highly mixture of multiple types of residential, business, and education POIs. There is no significant difference in the proportions of the POIs in other hotspots.

**Figure 12.** Spatial distribution of the *HERI* values (**a**); *HERI* hotspot analysis results (**b**); Percentage of POI types in each HERI hotspot area (**c**).

Figure 13a shows the binary relationship between the different levels of *HERI* and rainfall amounts. We classify the *HERI* and rainfall amount into three groups (high, medium, and low) using the Jenks

natural breaks, respectively. In total, there would be nine combinations between the different levels of *HERI* and rainfall amounts. The HH combination (high *HERI* and more rainfall) is located in suburbs, such as Fangshan District and Xi'erqi, which were hit by heavy rains and afflicted with serious rainstorm-induced damages and losses. The combination (HL) with a higher HERI and less rainfall is mainly found in densely populated areas, including the urban core area, Tongzhou District, CBD and the Beijing Capital Airport. The texts of the RRWP show that people in these areas complain that the rainstorms caused significant traffic jams and ruined their daily routine. Passengers trapped in the Beijing Capital Airport also published more RRWP due to the significant flight delays. The LH combination (low *HERI* and high rainfall amount) is mainly in the sparsely populated regions, where few RRWP were posted due to the fewer number of the Weibo users.

Figure 13b shows the POI types in the different combinations of the *HERI* and rainfall amount levels. The areas with a higher *HERI* value tend to have more transportation POIs, no matter what the rainfall amount is. By contrast, places with fewer RRWP and higher rainfall levels are less populous and with more green space.

**Figure 13.** The ENRR values across our study area (**a**) and the proportions of different types of POIs within the areas with different ENRR values (**b**).

#### *4.4. Factors Influencing HERI*

For all grids across our study area, the AUC values on the ROC curves are 0.767 and 0.733 for the water ponding sites and the major transportation hubs, respectively (Figure 14). The most appropriate OIDF values for the afore-mentioned two factors are 3400 m and 3200 m respectively. The bilateral Welch t test results show that the *HERI* value of the areas within the OIDF distance of a water ponding site and a major transportation hub were both significantly higher than those beyond and the difference is statistically significant at the 0.01 significance level.

Table 2 shows the number and density of water ponding sites and major transportation hubs by each level combination of rainfall amount and RRWP. The level combinations with intense blogging activities are all associated with high density of water ponding sites and major transportation hubs, no matter what the rainfall amount is. By contrast, the level combinations with inactive blogging activities are associated with a low density of water ponding sites and major transportation hubs.


**Table 2.** The statistics of water ponding sites and transportation hubs by different ENRR categories.

**Figure 14.** The ROC curves of the influencing factors.

#### **5. Conclusions**

In this study, we inferred the human activities from the rainstorm-related Weibo posts and examined how different levels of human activities are associated with different rainfall amount levels at both city and grid scales. The consistency between the rainfall amount and the human activities could be explained by the distribution of the water ponding sites and major transportation hubs. The regions with high density of water ponding sites and major transportation hubs tend to show intense human response to rainstorms in terms of the number of rainstorm-related Weibo posts. At different time periods, the intensity of human responses to rainstorm events in areas of different attributes and functions were also very different. The human response has been significantly enhanced during the early and late peak hours and is concentrated in important transportation hubs and water ponding sites. The occurrence of a rainstorm event has a huge impact on human travel. Analysis of the rainstorm-related posts suggests that there is no significant difference between the impacting ranging (~3.3 km) of a water ponding site and a major transportation hub.

We found that on the large scale, although the ground disaster space has a high consistency with the social media space, the intensity of responses at different stages, different spatial areas, and during different time periods of the disaster on social media platform were different. When looking at spatial differences on a grid scale for urban disaster events, the impacts of different types of region vary greatly due to the complexity of human–land relationships. During a rainstorm, the existence of special areas such as urban water ponding points, traffic stations, main jobs and housing areas, important line sites, and some disaster sites have led to frequent occurrence of secondary disasters and become major concentrated areas where humans respond strongly. These results show that there are time and space differences in the human response at the urban scale and grid scale under urban rainstorm events. Our research on the spatial consistency is similar to the previous research conclusions [36,39,49], but a further exploration of fine spatiotemporal process and supplementation of the factors affecting the differences in human responses give us a new understanding of the human–land relationship under the event conditions at a fine scale.

Of course, this study has some defects that can be ameliorated by additional research to improve upon our framework and further research goals. This study only examined the number of rainstorm-related Weibo posts without considering other information available in the original Weibo posts, such as the emotions, themes, and characteristics of the social media information. Other multi-source spatial data such as the nighttime lights, ambient population data, and road traffic congestion data, if successfully integrated, could provide a more comprehensive study on the human response to a natural disaster. The integration of multi-source spatial data and more comprehensive data mining methods would also significantly reduce the uncertainty of the associations between human activities in both the physical and virtual spaces.

**Author Contributions:** Conceptualization, N.W. and Y.D.; methodology, N.W.; validation, N.W., Y.D., H.W., and F.L.; formal analysis, N.W.; investigation, N.W.; resources, N.W and J.Y.; data curation, J.Y.; writing—original draft preparation, N.W.; writing—review and editing, F.L.; visualization, N.W.; supervision, Y.D.; project administration, Y.D.; funding acquisition, Y.D.

**Funding:** This research was funded by the National Key Research and Development Program of China (Grant Nos. 2017YFB0503605 and 2017YFC1503003), and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant Nos. XDA20040401 and QYZDY-SSW-DQC007-2), and the National Mountain Flood Disaster Investigation Project (SHZH-IWHR-57).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Rainfall Induced Landslide Studies in Indian Himalayan Region: A Critical Review**

#### **Abhirup Dikshit 1, Raju Sarkar 2, Biswajeet Pradhan 1,3,\*, Samuele Segoni <sup>4</sup> and Abdullah M. Alamri <sup>5</sup>**


Received: 25 February 2020; Accepted: 31 March 2020; Published: 3 April 2020

**Abstract:** Landslides are one of the most devastating and recurring natural disasters and have affected several mountainous regions across the globe. The Indian Himalayan region is no exception to landslide incidences affecting key economic sectors such as transportation and agriculture and often leading to loss of lives. As reflected in the global landslide dataset, most of the landslides in this region are rainfall triggered. The region is prone to 15% of the global rainfall-induced landslides, and thereby a review of the studies in the region is inevitable. The high exposure to landslide risk has made the Indian Himalayas receive growing attention by the landslides community. A review of landslides studies conducted in this region is therefore important to provide a general picture of the state-of-the-art, a reference point for researchers and practitioners working in this region for the first time, and a summary of the improvements most urgently needed to better address landslide hazard research and management. This article focuses on various studies ranging from forecasting and monitoring to hazard and susceptibility analysis. The various factors used to analyze landslide are also studied for various landslide zones in the region. The analysis reveals that there are several avenues where significant research work is needed such as the inclusion of climate change factors or the acquisition of basic data of highest quality to be used as input data for computational models. In addition, the review reveals that, despite the entire region being highly landslide prone, most of the studies have focused on few regions and large areas have been neglected. The aim of the review is to provide a reference for stakeholders and researchers who are currently or looking to work in the Indian Himalayas, to highlight the shortcomings and the points of strength of the research being conducted, and to provide a contribution in addressing the future developments most urgently needed to obtain a consistent advance in landslide risk reduction of the area.

**Keywords:** Indian Himalayas; landslides; GIS; remote sensing

#### **1. Introduction**

Landslides are the most frequent naturally occurring hazards that affect people and their livelihood worldwide. The frequency of occurrence in the Himalayan context is very large when compared with global events [1]. This review paper is an attempt to understand the research being undertaken to understand, assess, and mitigate landslide scenarios in the Indian Himalayan region. The need for such a review was raised from the compilation of the global landslide disaster database by Froude and Petley, [1]. In their database, a total of 5318 non-seismic landslides occurred from 2004 to 2017, of which 3285 landslides were triggered by rainfall. In the context of the Indian Himalayas, during the same period, 580 landslides occurred with 477 triggered by rainfall, thereby contributing 14.52% of the global landslides. These number could be even higher; for instance, based on NASA GLC data, the number of landslides in the Indian Himalaya during 2007–2015 is 691 with 6306 casualties (https://data.nasa.gov/Earth-Science/Global-Landslide-Catalog/h9d8-neg4).

This region covers more than 12% of India's landmass and is very prone to landslides due to the fragile lithology, the complex geological setting, the high energy of the relief with steep slopes, and the high topographic roughness. Moreover, most of the area is seismically active and subject to extreme precipitations, and the situation has been further worsened with the increase in anthropogenic activities and the advent of climate change. Since it is a well-established fact that most of the landslides in this area have been primarily triggered by rainfall [2–4], the focus of the present review is only on the studies considering rainfall triggered landslides. Moreover, one of the main practical purposes of this review is to serve as a starting point for future projects which consider implementing territorial landslide early warning systems, and rainfall triggering landslides are the only ones that at present can be forecasted with a certain confidence. Studies on earthquake-induced landslides, snow avalanches, and Glacier Lake Outburst Flood (GLOF) were not considered in this review. This work also covers the various methodologies being adopted for landslide monitoring and analysis, as well as reports on various mitigation measures being undertaken along with the role of Geographic Information Systems (GIS), remote sensing, and the recent use of computational techniques.

#### **2. Materials and Methods**

#### *2.1. Study Region*

The Indian Himalayan region is one of the most diverse and heterogeneous area in terms of geology, lithology, rainfall distribution, land use/land cover, soil properties, and road and stream networks, which makes it highly prone to landslides. The region covers 16.2% (~500,000 km2) of India's landmass as well as 10 of its states (Ministry of Environment, Forest and Climate Change, Government of India). The region extends 26◦20 –35◦40 N and 74◦50 –95◦40 E and covers the states Jammu and Kashmir, Himachal Pradesh, Uttarakhand, Sikkim, Arunachal Pradesh, Meghalaya, Nagaland, Manipur, and Mizoram, as well as the hill stations of Assam and West Bengal.

The region has been divided both horizontally and vertically into four and three divisions, respectively. In horizontal context, it has been categorized as Jammu and Kashmir Himalaya, Himachal Himalaya, Uttarakhand Himalaya, and Eastern Himalaya (Figure 1). Uttarakhand Himalaya is further divided into the Garhwal and Kumaon Himalayas. In terms of vertical divisions, it has been divided into Great Himalaya, Middle Himalaya, and Lesser Himalaya or Shivalik Ranges [5]. Geologically, the Himalayas are broadly divided into four areas across its length: (i) Foothill or Outer Himalaya; (ii) Lesser Himalaya; (iii) Higher Himalaya; and (iv) Tethyan or Trans-Himalaya [6]. The major rivers that originate from the great Himalayan Mountain Ranges are the Indus, Ganges, and their various tributaries. The region exposes diverse geology with different rock types representing the complete spectrum ranging in age from Archaean metamorphites/granitoids to the youngest Quaternary alluvium.

**Figure 1.** (**a**) Southeast Asia highlighting India; (**b**) geographical boundaries of India highlighting the Indian Himalayan region; and (**c**) location of rainfall-induced landslides in the Indian Himalayan region [1] (DEM Source: SRTM, resolution 30 m).

The region is tectonically and physiographically divided into three broad domains: the Peninsular India, the Extra-Peninsular India, and the Indo-Gangetic Plain [7]. The Indo-Gangetic plain is sandwiched between the shield area of the Peninsular India and the highly deformed suites of the Himalaya of Extra-Peninsular India, comprising essentially the younger meta-sediments. The tectonic trough (foreland basin) sandwiched between peninsular shield in the south and Himalayan Mountains in the north formed due to upliftment of the latter has been filled up by sediments derived from both sides, especially from the Himalayas. Structurally, the Himalayan Mountain chain occurring all along northern part of India can be divided into four contrasting longitudinal litho-cum-morphotectonic belts from south to north: (i) foothill belt; (ii) Main Himalayan belt; (iii) Indus-Shyok belt; and (iv) Karakoram belt.

The foothill Himalaya is a 10–50 km wide Miocene to Lower Pleistocene Molasse sequence represented by Siwalik, Murree, and Subathu Group of rocks. The belt is a domain of active tectonics, having participated in the terminal phase of the Himalayan Orogeny. This is followed to the north by the Lesser and Higher Himalayas, represented by geological sequences of Proterozoic age with a Phanerozoic cover of varying thicknesses in different parts. The foothill Himalaya is overlain by alluvium and separated from the Lesser Himalaya by the north-dipping fault commonly known as the Main Boundary Fault (MBF) or the Main Boundary Thrust (MBT) in Garhwal, Kumaon, Darjeeling, and Arunachal Pradesh Himalaya. The Main Frontal Thrust (MFT) limits the margins of the Siwalik Zone against the Ganga Plains [8].

The Lesser Himalaya is 60–80 km wide and is a discontinuous belt stretching between theMBT in the south and the Main Central Thrust (MCT) in the north [9]. It consists of autochthonous Late Proterozoic sediments, thrust over by three vast nappes that are built up successively of Palaeozoic sediments, Precambrian epimetamorphics, and mesograde metasediments [10]. The epi-metamorphic and meso-metamorphic nappes throughout their extent are characterized by Early Proterozoic (= 1900 Ma) and Early Palaeozoic granitic bodies of large dimension. The MCT separates the Lesser Himalaya from the Higher Himalaya to its north. The Higher Himalaya marks the region of the highest peaks of the Himalaya (Nunkun, Leopargial, Kedarnath, Badrinath, Nanda Devi, Api, Dhaulagiri, Mt. Everest, and Kanchanjunga), made up of 10–15 km thick Precambrian crystallines exhumed up and intruded by granites, some of which are Tertiary in age. The Indus Shyok belt/the Tethys Himalaya extend to the south of the Trans-Himalayan Karakoram belt and comprise ophiolite melange (Indus ophiolite and associated formation) and plutonic rocks (Ladakh Granitoid Complex) of the Indus Shyok belt [11]. These predominantly fossiliferous sediments range in age from Late Proterozoic to Eocene. Sporadic occurrences of chromite have been reported from the ultrabasic rocks associated with Dras volcanics from Ophiolite-Melange zone. Karakoram belt, the northernmost zone, comprises Palaeozoic and Mesozoic sedimentary sequence in a metamorphic basement of an unknown age. This Trans-Himalayan belt lies to the north of the Indus suture Zonein Ladakh region and extends eastward into Tibet. No important mineral occurrence is known from this belt. Figure 2 represents the geological map of the Indian Himalayan region.

**Figure 2.** Geology of: (**a**) South Asia highlighting the Indian Himalayan region; and (**b**) the study region (Source: USGS World Geologic Map).

#### *2.2. Data Collection*

The review was undertaken by carrying out a bibliographic search on "Web of Science" Database (1990–October 2019) for a combination of keywords: "Landslide\*", "Himalaya\*", and "India". For the current analysis, only peer-reviewed journal articles written in English were considered, as peer-reviewed journals are considered as having the best quality articles and because the English language ensures that these contributions could be fully understandable for the whole international scientific community. The filtering of the articles was carried out by the number of citations (a) articles published before 2008 with 10 or more citations; (b) articles from 2008 to 2010 with 5 or more citations; and (c) all articles thereafter independent of the number of citations received (following the approach adopted from Reichenbach et al. [12]. After performing a screening of the relevant studies, the number of selected articles was narrowed down to 226 (Figure 3).

**Figure 3.** Analysis of the literature database during 1990–October 2019 from Web of Science. The left ordinate axis represents the number of articles per year and the right ordinate axis depicts the cumulative number of articles during the entire analysis period.

The articles were published across 75 different journals, with nearly 60% of the works published in 11 journals (Figure 4). The articles were initially majorly published in Engineering Geology and Geomorphology. The focus has shifted towards other journals such as Natural Hazards, Landslides, and Geomatics, Natural Hazards and Risk. Generally, after a major landslide event, the preliminary report is published in Current Science. However, extensive analysis of landslide study misses in such journals and is mostly focused on post-landslide studies.

**Figure 4.** Distribution of the number of articles across the top 11 journals (out of 75). Horizontal bars depict the number of articles and its thickness denotes the average number of citations per article.

#### **3. Results**

The analysis reveals that the landslide studies in the region are quite biased towards Uttarakhand region, while there are few to no studies in the northeast Indian region (Figure 5).

**Figure 5.** Distribution of landslide studies across the states in Indian Himalayan region.

This bias can be attributed to the high population density in the western sector of the Indian Himalaya, leading to a larger number of casualties and a generally higher level of landslide risk. Pham et al. [13] calculated the number of casualties due to landslide incidences during 2007–2015, which came out to be 5228, a staggering 82.9% of all disaster casualties in the region. To understand the type of studies being conducted, we divided them into seven broad categories: description of main events; identification; forecasting; monitoring and investigation; river damming; extreme events and climate change; and susceptibility mapping.

#### *3.1. Main Landslide Events*

The Himalayan region has been affected by several landslides, and a few of the most researched and damaging landslides are mentioned here. Most studies have been carried out in the Uttarakhand region. Some major landslide events in the Uttarakhand region occurred in the Okhimath area in Mandakini Valley, which was significantly damaged by the August 1998 landslide event, triggered by heavy rainfall. In total, 466 landslides were triggered, leading to 103 deaths and damaging 47 villages. Thereafter, cloudburst on July 2001 in a part of Mandakini valley led to more than 200 landslides causing 27 fatalities and affecting almost 4000 people [14]. The region also suffered from a heavy rainfall of more than 200 mm during 13–14 September 2012, which led to the death of 51 people and caused 473 landslides [15]. June 2013 witnessed another heavy rainfall of 350 mm within a period of five days in the Chamoli and Rudraprayag districts of Uttarakhand [16]. Other major landslides in the Uttarakhand region were the 2009 landslide in Pittorgarh region with a loss of 43 people.

Another major landslide affected region is Himachal Pradesh, of which the area of Katropi has often been studied [17,18]. The area has a history of recurring landslides since 1977 and recently it suffered from major landside in August 2017 claiming 46 lives [17]. The Pawari landslide zone has also been an active landslide zone located in the southeastern part of the state, which expanded 7% during 2005–2014 [19]. The Luggarbhati landslide in 1995 and Dharla landslide event in 2007 claimed 65 and 62 lives, respectively. The Jammu and Kashmir region has suffered from major cloudbursts such as the Leh Nalla cloudburst (2005, 2006), Leh cloudburst (2010), and Baggar cloudburst (2011) [20].

In the northeast part of the region, most reported landslides have been in the Darjeeling area of West Bengal and Sikkim. The Darjeeling region has a history of several landslides with the first major recorded occurrence in 1899, which led to the death of 72 people. Thereafter, landslides happened in 1950 (127 people died) and 1968 (667 people died) [21]. The region has also suffered significant landslides in the late 20th and 21st century (1991, 1993, 2004, 2005, 2006, 2009, and 2015) [22]. Similarly, the Sikkim region faced numerous landslides such as the Lanta Khola landslide [23], south Sikkim area [24]. The prominent unstable sections in the northeast region of the Indian Himalaya are National Highway (NH)-40 [25] and NH-44 [26] in Meghalaya. In Mizoram, NH-44A has suffered several landslides, in 2011, 2013 and 2017 [27].

#### *3.2. Landslide Identification*

The identification, visualization and standardized classification is an important step for preand post-hazard landslide analysis, which could assist in assessing rescue and relief operations [28]. Historically, landslide damage assessment was carried out using field visits and categorizing it according to the block diagrams depicted in [29]. However, understanding landslide scenario using such simplistic illustrations fails to consider the surrounding morphometry and its contextual relationship. With the advancement of remote sensing technologies and availability of high spatial and temporal resolution data, researchers have attempted to map landslides using image classification techniques. In the case of Indian Himalaya, the identification of landslides after an event has been mostly carried out by scientists at National Remote Sensing Centre (NRSC). Vinod Kumar et al. [30] studied Varunawat Parbhat landslide in Uttarakhand following the September 2003 landslide event using post-landslide Stereoscopic Earth Observation data on a 1:10,000 scale. The availability of high-quality images enables researchers to identify the loss of vegetation and exposure of fresh rock and soil after the landslide event [31]. The methodologies to assess landslide damages using pre- and post-landslide event images can be categorized as pixel based (PB) (medium resolution images) and object based (OB) (high-resolution images). Pixel based method involves the use of pre- and post-event images, and landslides are identified using spectral information. The identification of the landslides is mainly based on change detection and image fusion techniques. However, the use of pixel-based identification approach has some limitations. Firstly, the availability of images (pre- and post-event) may not be available from the same sensor, including variation in the atmospheric conditions and bandwidth of spectral values. Secondly, the representations of landslides are based on pixel values. On the other hand, object-based approaches use spectral, shape, spatial, and contextual properties and the identification is carried out using an object rather than a pixel, which makes more sense for natural events such as landslides as the damages are of irregular size and shape. It has the potential to add morphometric information derived from Digital Elevation Model (DEM).

The work conducted in the Indian Himalayan region has been primarily conducted using an object-based image classification technique. The use of OB image classification is dependent on the data type and methodology being used. The methodology involves detection, segmentation, and classification of images. The detection procedure can further be categorized into: (i) direct object change detection; and (ii) classified object change detection. The first detection technique involves the comparison of two images for changes in geometrical and spectral information acquired at different times. Such approach is usually used when comparison is made between images acquired from same satellite sensors and generation of full change matrix is not an absolute requirement. In the latter approach, first segmentation and classification are performed independently using time series images, and then the change matrix is derived using classified images. The second approach is mostly used in land use/land cover change analysis as well as when time series data are acquired by different satellite sensors [32]. The studies conducted in the region have mainly used the first approach as a single feature, i.e., the landslide is analyzed and the changes are estimated from images acquired from the same sensor.

The capability of detection using object-based analysis depends on the segmentation technique being used, which has the ability to excerpt objects that accurately describe the relevant properties [32]. The most popular segmentation technique for classification of images is the multi-resolution segmentation (MRS). This segmentation approach is a bottom-up region-merging technique, wherein small objects are merged into bigger ones in subsequent steps, and involves three parameters: scale, shape, and compactness. The scale parameter handles the size of the image object size, whereas the shape parameter determines the degree to which shape influences segmentation vs. spectral homogeneity. The compactness defines the weight of the compactness criteria. The higher the value is, more compact the objects will be [33]. Several methods exist to select an optimal scale parameter, of which plateau objective function [31,32] and optimal scale parameter selector [34] have mostly been used. However, obtaining a desired scale for all the landslide types in an area is difficult, thus over-segmentation is preferred to under-segmentation [35].

The use of the OB technique has been used for landslide detection immediately after a landslide event, as well as for creating a landslide inventory database from historical images. Martha et al. [35] prepared landslide inventory data for the 1998 Uttarakhand landslides in Okhimath region, Uttarakhand and compared with the field data acquired after the landslide event. The study identified 73 landslides using Resourcesat-1 LISS-IV multispectral data (5.8 m) and a 10-m Cartosat-1 derived DEM. This semi-automatic approach resulted in achieving an accuracy of 76% for recognition and 69% for classification in terms of number of landslides. Thereafter, Martha et al. [31] used historical panchromatic image dataset (1998–2006) for the same region to prepare a landslide inventory database. The images were acquired from Cartosat-1 (2.5 m) and IRS-1D (5.8 m) and depicted the use of texture in cases of missing spectral information. Martha et al. [36] analyzed various time series images (1997–2009) for Okhimath region, Uttarakhand using Cartosat-1 (2.5 m), Resourcesat-1 (5.8 m), and IRS-1D panchromatic (5.8 m) data. For the entire period, the accuracy of landslide detection varied from 60% to 89%, whereas it varied from 71% to 97% in terms of landslide extent.

The identification and classification of landslides is available for two severe landslide events occurring in Uttarakhand in September 2012 [15] and June 2013 [16,32]. Martha and Vinod Kumar [15] used very-high resolution (VHR) satellite data (Resourcesat-2 (5.8 m), Cartosat-2 (1 m), Kompsat-2 (1 m), GeoEye-1 (0.5 m), and Cartosat-1 DEM (10 m)) covering the September 12 Uttarakhand landslide event and identified 473 landslides. Martha et al. [16] compared pre-disaster images (Resourcecat-2 and GeoEye-1) with post-disaster images (Resourcesat-2 and Cartosat-2a (1 m)) after the June 2013 Uttarakhand event. The study identified 6031 landslides and classified those as new (57.74%), old (18.92%), and reactivated (23.34%). Martha et al. [32] identified new landslides by comparing the preand post-landslide Resourcesat-2 images with a detection accuracy of 81%. Mohan Vamsee et al. [34] improved the scale component of MRS technique and developed the optimal scale parameter selector (OSPS) tool, which was applied to Uttarakhand region using Resourcesat-2 images.

However, the accuracy of the above-mentioned techniques largely depends on the resolution of the satellite images that are costly to be acquired. Therefore, free and high-resolution Google Earth (GE) images have also been used for landslide mapping [19,37]. The ability of such images to provide a 3D view and its free availability can serve researchers to exploit imagery for landslide detection and mapping [38]. Kumar et al. [37] utilized GE images for landslide dimension mapping along Satluj Valley in Northwest Himalaya. Further, Kumar et al. [19] conducted a study on a relatively smaller area (Pawari landslide) in the same region and analyzed the changes using GE images of 2005, 2012, and 2014.

As landslide mapping is the first key step towards conducting any landslide study or to set up recovery attempts after a landslide event, focus should be on the use of high temporal resolution dataset. The focus has primarily been on the use of semi-automatic identification approach and gradually improving the algorithms used to detect and classify landslides [16,28]. The classification and segmentation techniques need to be improved using computational techniques such as Machine Learning (ML) and Artificial Neural Networks (ANN) [39]. In addition, the studies have largely been

concentrated in a single area and need to be applied to other Himalayan regions to understand their applicability and reliability.

#### *3.3. Landslide Forecasting*

Landslide forecasting is a key element for disaster risk reduction and is also the most challenging. Most of the landslides in the Indian Himalayan region are shallow in nature, of which rainfall is the primary triggering factor [3,4]. The analysis of precipitation for landslide occurrence can be performed by estimating minimum rainfall conditions, sub surface monitoring, or slope stability analysis. The minimum rainfall conditions, also known as thresholds, can be classified as empirical and physical approaches [40–42]. Physical models assess the relationship between rainfall conditions and hydrological conditions of the soil, which affects slope stability. The model analyzes the spatial variation of several factors such as geotechnical parameters, soil depth, volumetric water content, geology, and topography to determine the pore water pressure change and estimate the factor-of-safety. Such models need a large dataset of many parameters in spatial and temporal context, which is usually not available for the Indian Himalaya. Empirical methods analyze the rainfall conditions using statistical methods to determine threshold levels of precipitation. Such models are simpler to apply because they require only the spatial and temporal dataset of precipitation and landslide events [43,44]. The threshold values can be greatly affected by the data quantity and quality, the rain gauge density, and the methodology used. The thresholds are usually determined by drawing (usually with statistical techniques) a lower line to the precipitation conditions corresponding to landslide events in log-log, semi-log, or Cartesian coordinate system [3,41]. Segoni et al. [43] reviewed the type of thresholds, data collected, and other important information for thresholds determined in the global context for 2008–2017.

The studies on rainfall thresholds for the Indian Himalayan region have been very limited. The use of empirical approach has been explored for different Himalayan regions. Sengupta et al. [45] proposed the use of EMAP for Sikkim Himalayas, which is the ratio of cumulative event rainfall (E) to mean annual rainfall (MAP) (EMAP = E⁄MAP), instead of rainfall event-duration (ED) or rainfall intensity-duration (ID) thresholds. Kanungo and Sharma [3] determined ID thresholds for Garhwal Himalayas using best fit of the lower boundary of ID plane. Dikshit and Satyam [4] developed the ID thresholds for Kalimpong region using frequentist statistical approach. The use of probabilistic [22,46] and semi-automated algorithmic approach [47] has only been attempted for Kalimpong region. Harilal et al. [48] developed both regional (Sikkim) and local (Gangtok, Sikkim) ID thresholds using statistical approach. The main concerns with these studies are the coarse spatial distribution of rain gauges, all studies being performed using a single rain gauge, and the availability of only daily rainfall data. In this regard, Gariano et al. [49] highlighted that a coarse temporal resolution dataset of precipitation could lead to an underestimation of rainfall thresholds, which culminates in a higher number of false alarms when using thresholds for operational early warning system. In addition, the thresholds determined are rainfall intensity-duration thresholds, which should be avoided, while the determination of event rainfall-duration (ED) thresholds should be encouraged. Mathew et al. [50] established ID thresholds along the Rishikesh–Mana pilgrimage route for Garhwal Himalaya using Tropical Rainfall Measuring Mission (TRMM)-based Multi-satellite Precipitation Analysis (TMPA) data. Despite using different methods to determine various aspects of rainfall characteristics, all studies were unified in the effect of antecedent rainfall in the region. Mathew et al. [50] also suggested that antecedent rainfall ranging from 15 to 30 days plays an influential role for destabilizing slopes in the Himalayan region, which leads the subsequent rainfall of short duration (24–72 h) to trigger landslides. The studies on rainfall thresholds in the Indian Himalayan region have been very minimal and more work should be conducted on the calculation and analysis of regional and local thresholds. Kumar et al. [51] highlighted that the mean rainfall threshold intensity of NW Himalayas (excluding Sikkim Himalayas) is roughly 290 mm/day. At present, defining rainfall thresholds at regional scale or at a state administrative level would be highly desirable but is hard to accomplish. The variation in the

thresholds determined also ascertains that local thresholds would perform be better than regional thresholds in setting up an operational landslide early warning system. This could be due to the heterogeneous rainfall pattern at local scales in various Himalayan pockets. Table 1 lists the various thresholds generated for different regions of the study area. Figure 6 illustrates the thresholds in ID plane.

**Table 1.** Threshold equations generated for various Himalayan regions using several methods (I is Rainfall Intensity (mm/h), while D is Duration (h)).


**Figure 6.** Comparison of various ID thresholds developed for various Himalayan regions with global thresholds defined by Caine [52].

#### *3.4. Landslide Monitoring and Investigation*

Landslide monitoring is a very significant aspect of landslide assessment, especially in the Indian Himalayan region where the slopes are generally creepy in nature, which could immediately fail during an abrupt rainfall event or seismic occurrence. Landslide monitoring is generally categorized into three types [53]:


Of the various monitoring techniques, the use of remote sensing (RS) data along with traditional monitoring instruments would prove to be very helpful especially in understanding long-term deformation and the advancement in RS technology would gradually overcome the use of traditional equipment. In the case of Indian Himalaya, monitoring studies include remote sensing techniques and ground based observation. Yhokha et al. [56] used the Persistent Scatterer Interferometry (PSI) technique using ENVISAT satellite for Lesser Himalaya, Nainital. PSI is based on InSAR technique, which utilizes several SAR images and has proved to be successful in identifying the creeping zones in Nainital. Martha et al. [57] monitored the landslide dammed lake in Zanskar Himalayas for five months (January–May 2015) using multi-temporal high-resolution satellite images after the landslide event (December 2014) and depicted the variation in the dimension of the impounded lake during the monitoring period. Dikshit et al. [54] and Dikshit and Satyam, [22] used tilt sensors at shallow depths to analyze the variation in tilting angle of the instrument, which is related to the lateral displacement of the slope. The study also validated the empirical models thereby encouraging similar studies to be conducted in other Northeast Himalayan regions, which could help in setting up a preliminary early warning system.

The investigation of unstable slopes in the Indian Himalaya has been conducted using Ground Penetrating Radar (GPR) or 2D Electrical Resistivity Tomography (ERT). Mondal et al. [58] conducted ERT investigation for Naitwar Bazar landslide in Uttarakhand for six sites, which was active after the 2004 event. Kannaujiya et al. [59] compared slide dimensions observed from satellite (IRS LISS-IV) with geophysical investigation and determined the depth and slip surface geometry using 2D ERT and GPR) for Kaliganga river valley in Uttarakhand. Falae et al. [55] used ERT to understand the subsurface movement to determine the probable failure plane of the Pakhi landslide, Uttarakhand. Sharma et al. [60] conducted a geophysical study at Lanta Khola Landslide, Sikkim using very low-frequency (VLF) electromagnetic survey to understand the subsurface structure. The results reveal the presence of a water-saturated zone at the subsurface level of the slide.

The literature review reveals that only a handful of studies have been carried out for landslide monitoring and investigation; considering the increased quantity of the available remote sensing data sources, the focus needs to shift towards the use of high spatial and temporal resolution data, which would lead to near real-time monitoring results. The framework using sub surface investigation is ideal at a local scale and would fulfill the demands of a community but would be difficult to manage in long term due to the high installation and operational costs. However, in sections where the landslide problem is immense, which is slide specific, such as Paglajhora landslide (Darjeeling), Singtam landslide (Sikkim), or Kotropi landslide (Himachal Pradesh), monitoring using instruments would still be required as the region is highly vulnerable and satellite images at currently available resolution may not be able to accurately identify failure planar sections.

#### *3.5. Lake Damming Landslides*

Landslide lakes or dams are temporary lakes in the river valleys formed after a landslide blocks the river course. Landslide dammed lakes and their outburst floods (LLOFs) are common in the Indian Himalaya. The breaching of such temporary lakes with a huge amount of accumulated water and sediments can create devastating floods in the downstream areas [61]. In the Himalayas, landslide dams commonly form in high mountains because of different mass movement types such as rock and debris avalanches, rock and soil slides, mud–debris–earth flows. The oldest recorded landslide lake was the Gohna Tal, which was formed in September 1893 as a result of heavy precipitation blocking the Birahi Ganga River situated in the Kumaon Himalaya. Gradually, the lake expanded to an approximate 4000 m length, 340 m width, and 300 m depth. On 26 August 1894, the dam collapsed, and a devastating flood hit downstream of Birahi Ganga [62].

The techniques used for assessment of damming processes are either based on geomorphic characteristics, hydraulic properties, and velocity measurement of dam material. The assessments of the damming process are dependent on the analysis of three significant factors: (a) pre-dam formation mechanism; (b) dimensional characteristics of the dam; and (c) stability analysis of dam [63]. The pre-dam formation mechanism largely depends on the slope stability analysis and landslide triggering factors, whereas dimensional characteristics are dependent on geometry of the area and landslide volume. In terms of slope stability, the techniques can be categorized as discontinuum modeling and continuum modeling [63]. Discontinuum modeling is majorly used for rock slope stability analysis, whereas continuum modeling is used for debris flows as well as rock slopes dependent on the material and geometry of the slope [19,64].

Most of the studies have focused on geomorphic analysis [15,16,57,63] using different satellite images to understand the spatiotemporal landslide changes. Gupta and Sah [65] catalogued the LLOFs developed in the Trans-Himalayan region between 2000 and 2005 and studied its impact on the stability in the region. Martha et al. [57] conducted an extensive investigation of Phutkal River landslide dammed lake for landslide occurrence on 31 December 2014 using multi-temporal Cartosat-2 images of 1 m spatial resolution and calculated slide volume using pre- and post-event datasets. Kumar et al. [63] used GE images of 1.5 m resolution and estimated landslide volume along with slope stability analysis for Urni landslide in Himachal Pradesh.

#### *3.6. Extreme Events and Climate Change*

The effect of global warming and the corresponding changes to climate and geohazards is expected to affect landslide events [66]. However, forecasting and understanding the impact of climate change on landslide activity still poses a challenge. Gariano and Guzzetti [66] in their review article on climate change studies related to landslides emphasized the need for more studies as large parts of the world suffer from a few to no studies. Although it is predicted that the Indian Himalaya region will be affected by climate change at the time of compiling this article, we could not find any article in which the effect of climate change has been considered in the Indian Himalaya for any type of rainfall-induced landslide study. However, some works exist that describe some of the most recent extreme rainfall events: Jammu and Kashmir Himalaya for the 2014 rainfall event [51], Uttarakhand Himalaya for the 2009 event [67], and Leh region for the 2010 event [68]. One of the main reasons for the lack of inclusion of climatic variation in landslide studies for the Indian Himalayan region is the unavailability of data both in spatial and temporal context, which has been highlighted by several researchers. Kumar et al. [51] estimated that the pattern of rainfall intensity along the Himalayas varies from west to east, thereby precipitation patterns being affected by western disturbances and summer monsoon. Such lack of studies in a region such as the Himalayas, which is prone to severe landslides and other geohazards, is a matter of concern. Hereon, studies should include climate aspects using remote sensing products, which could reconstruct events and provide a better understanding of climate impact.

#### *3.7. Landslide Susceptibility*

Landslide hazard is defined as the probability of landslide occurrence of a specific type ("when" or "how frequent") and magnitude ("how large") [12]. The spatial probability of landslide hazard is assessed by carrying out landslide susceptibility mapping or popularly known as landslide hazard

zonation mapping. Landslide susceptibility assessment determines potential of landslide event considering several predisposing factors and investigating their spatial distribution [12]. The spatial occurrence of landslides is controlled by several and sometimes inter-playing factors such as geological setting, rainfall, morphology, soil, and vegetation conditions, thus landslide susceptibility assessment is not a straightforward task, and many methodologies have been developed to assist the susceptibility analysis. The classification techniques used for landslide susceptibility models has also changed over time. We categorized them into four groups: Qualitative, Semi-quantitative, Quantitative, and Deterministic (Figure 7). In this section, we explore the models, the parameters used, and the trend in susceptibility analysis.

**Figure 7.** Types of modeling techniques used in landslide susceptibility studies.

Initially, studies were mostly based on qualitative methods wherein values are ascertained for landslide conditioning factors based on the knowledge about the study area and are quite subjective [69,70]. Semi-quantitative methods are logical tools and emphasize the significant factors by assigning weights or values. These include techniques such as Analytic Hierarchy Process (AHP), danger pixel approach, and a weighted linear combination [71].

The deterministic analysis involves an analysis of physical and mechanical soil properties and determines the susceptibility in the form of a factor of safety (FS) [72]. FS is the ratio between factors affecting landslide to factors preventing landslide incidences. This involves the utilization of several factors such as infiltration, soil cohesion, groundwater table, pore water pressure, geotechnical soil parameters, etc. Several deterministic models are available to determine slope stability such as SHALSTAB (Shallow Landsliding Stability model), SINMAP (Stability Index Mapping), SHETRAN, TRIGRS (Transient Rainfall Infiltration and Grid-based Regional Slope Stability) model, etc. In the case of the Indian Himalayas, Sarkar et al. [72] used SHALSTAB model to determine the critical steady-state rainfall for slope stability in Darjeeling Himalayas. A landslide inventory map was developed using various satellite images (Resourcesat -1, Cartosat -1, and Google Earth) and field surveys, soil parameters (depth, saturated soil density, engineering properties, and soil transmissivity), and slope parameters (angle, contour length, and upslope contributing area). Mathew et al. [73] determined FS for terrain stability for Garigaon watershed area of Uttarakhand by coupling an infinite slope stability model with a steady-state hydrological model (LIDA) using the spatial analysis in GIS environment. The landslide locations were mapped using LISS IV and Cartosat-1 PAN data and the input parameters used were soil texture, index properties, porosity, LULC, and terrain properties. The results indicate an increase in the unstable areas of more than 45% for rainfall intensity variation from 50 to 100 mm/day.

Quantitative models depend on the landslide density under each influencing factor and can be further classified as a bivariate and multivariate [74]. Both bivariate and multivariate statistical models compute weights; however, multivariate techniques depend on the collective effect of parameters [75,76]. Bivariate statistical methods include weights of evidence, frequency ratio, information value, and the combination of frequency ratio and fuzzy methods [77–79]. The most popular multivariate model used is logistic regression [80]. The analysis shows that most studies have used quantitative techniques, of which logistic regression has been most used [81–83]. Thereafter, frequency ratio and information value were used 16% and 9% of the studies, respectively [74,84–88]. In terms of semi-quantitative models, AHP has been the most used method (Figure 8).

**Figure 8.** Model types and the techniques used in hazard and susceptibility studies.

After understanding the various models being used to determine susceptibility, the analysis of the parameters being used was investigated. The use of parameters for susceptibility analysis would affect the results; however, they usually depend on the local factors along with data availability. In addition, the factors depend on the type of movement, scale of study and the methodology used [89]. Based on the analysis of the articles, the factors influencing landslide susceptibility can be divided into four groups: (i) geological (lithology, geological structures); (ii) geomorphological (drainage, relative relief, slope, and slope aspect); (iii) environmental (land cover); and (iv) anthropogenic (roads). In the 81 studies, 36 factors were used, where some factors such as "gradient" and "slope" were clubbed together, and a word map of the factors is illustrated in Figure 9. The most used factors were slope angle (93.8%), land use/land cover (LULC) (92.5%), aspect (80.2%), and lithology (62.9%). Apart from these factors, another set of parameters, namely drainage density, curvature, topographic wetness index (TWI), stream power index (SPI), relative relief, lineament distance, and lineament density, was also utilized. Seismic factors, fault buffer, and road buffer have been less frequently used. The trend over the years in the number of factors to be used has increased with an average of six factors being used until 2010 and an average of eight factors thereafter. This trend can be ascertained to the increase in the availability of data and remote sensing products.

**Figure 9.** Tree map of the factors used for determining landslide hazard and susceptibility.

In terms of the spatial resolution of the dataset used to determine susceptibility, the most commonly used Digital Elevation Model (DEM) is the one released by USGS, with a spatial resolution of 30 m (46%). Only 28% of the studies used a spatial resolution higher than 10 m, which was obtained from Cartosat-1 images, whereas 26% of studies used the dataset with a spatial resolution of 12.5 m from ALOS-PALSAR DEM. The scale of geological maps has generally been 1:250,000 (62%) while the remaining 38% studies were conducted using 1:50,000 scale. Anbalgan [69] formulated the guidelines for landslide hazard assessment based on Landslide Hazard Evaluation factor (LHEF) using factors such as structure, relative relief, lithology, slope morphometry, land use, and groundwater conditions. The LHEF rating system was based on determining the factors and assigning values based on the understanding/knowledge about the study region. The national standard body of the country, Bureau of Indian Standards (BIS), prepared its guidelines (BIS, 1998) based on Anbalgan's study using the same factors excluding groundwater conditions [90]. BIS recommends an indirect method for medium-scale (1:25,000–1:50,000) landslide susceptibility mapping. Such technique is a generalized method that could be applied over large regions regardless of the relationship between causative factors and landslide types, often leading to moderate to poor predictions [91].

Following this, the nodal agency, Geological Survey of India (GSI), developed its own guidelines (GSI, 2005) considering 10 factors, which is a modified version of BIS (1998). Singh et al. [90] compared the zonation maps prepared using BIS and GSI guidelines for Arunachal Pradesh and found that GSI guidelines were better than BIS guidelines. Das et al. [92] studied homogenous susceptible units (HSU)-based landslide hazard utilizing spatiotemporal data and landslide size, which could better represent homogeneous susceptible regions. The study was conducted using high-resolution data for Northern Uttarakhand region. Martha et al. [36] applied the weight of evidence model to determine the susceptibility using a semi-automatic derived landslide inventories. Sarkar et al. [93] was the first to introduce landslide intensity as a parameter for hazard determination, which was applied for Garhwal Himalayas. The intensity was determined by analyzing the volume and velocity, which depends on the landslide area, debris thickness and the types of failures.

However, lately, the rise has been in the use of computational techniques such as Machine Learning (ML), Support Vector machine (SVM), and Artificial Neural Network (ANN), which have proved to outperform traditional approaches [13,94–96]. However, the studies have largely focused on landsides in Uttarakhand Himalayas. Ramakrishnan et al. [97] used backpropagation neural network and showed prediction capability of 80%. Pham et al. [98] used various ML ensemble models using Multilayer Perceptron (MLP) Neural Networks and six different ensemble techniques, thereby depicting better capability using ensemble framework. Pham et al. [99] analyzed susceptibility for an area of 323 km<sup>2</sup> in Uttarakhand using various ML models (SMO-SVM, VFI, and LR), FT, MLP-neural networks, and NB. In the first case, SMO-SVM performed best, whereas, in the latter case, MLP neural networks and FT models provided similar accuracy compared to NB. Pham et al. [13] used four hybrid machine learning models for Uttarakhand Himalayas. Pham et al. [13,95,100] extensively analyzed the Pittorgarh region (242 km2) in Uttarakhand using various hybrid ML models (Table 2).


**Table 2.** Various computational models used for landslide susceptibility in Uttarakhand Himalayas.

Kanungo et al. [107] compared susceptibility map for Darjeeling Himalayas using traditional, ANN, fuzzy, and combined neural fuzzy weighing schemes and found the combined neural fuzzy model to provide more accurate results. Following this, two different models (combined neural certainty and fuzzy) were compared with the neural fuzzy model for the same region and the latter was found to be better [108]. Chawla et al. [109] used Genetic Programming (GP) and Particle Swarm Optimization (PSO) SVM model for the Darjeeling Himalayan region and found GP method to perform better. Meena et al. [110] conducted susceptibility analysis using a hybrid spatial multicriteria

evaluation model for Kullu, Himachal Pradesh and found it to produce better results when compared to FR and AHP.

#### **4. Discussion**

The results of the review provide researchers a comprehensive insight to understand the studies being conducted in various avenues of the Indian Himalaya, thus providing a general picture of the state-of-the-art in this region and a reference point for those (researchers and practitioners) starting to work in this area for the first time. Moreover, the analysis of the results can be useful in identifying some existing research problem and the most important research questions to be addressed in the near future.

The first gap to be urgently filled is the lack of studies in regions such as the Northeast and Jammu and Kashmir Himalayas. Since research programs have focused mainly on Uttarakhand region, which is understandable given the high population density and risk exposure, more efforts need to be made to analyze other regions as well and to help policy makers in formulating a general guideline towards mitigation. Another issue is the urgent need for high quality data, as many of the reviewed studies suffer from the coarse dataset being employed. As an instance, rainfall thresholds have been conducted using a very sparse rain gauge network that measures rainfall only at daily time steps; and most of the susceptibility studies make use of a very coarse geological map at the 1:250,000 scale. It is well established that the poor quality of input data has a negative impact on the quality of the results of any scientific model; therefore, priority should be given to the acquisition of high-resolution basic data for further research.

In the Indian Himalaya, many studies have been conducted that explore different disciplines of landslide research in various regions, and a precise framework is not evident. One of the next steps of research in the region could be the combination of existing studies in a more comprehensive and coherent framework aimed at assessing and reducing landslide risk. For instance, susceptibility maps could be used to define the most hazardous areas, rainfall thresholds to provide temporal forecasts especially for rapid movements, regional scale remote sensing systems to get a general overview of slow-moving landslides, and ground based monitoring systems could be employed for management of the hot-spots at highest risk level. Similar programs should also consider landslide hazard evolution as influenced by climate change and human modifications to the territorial setting.

Another significant gap highlighted by the review is that, even though landslides are a well-known source of risk in the area, a quantitative risk assessment has never been carried out, thus all past and present risk reduction strategies are implemented without a comprehensive framework that could optimize the efforts made. At present, landslide hazard studies are also quite limited: examples exist that encompass temporal probability of occurrence (by means of rainfall thresholds) or spatial probability of occurrence (by landslide susceptibility maps). A spatiotemporal hazard assessment would be desirable and, to obtain it the future, two options could be explored: the application of distributed physically-based models (which, in turn, require the acquisition of many input parameters with high spatial density—see, e.g., Tofani et al. [111]) or the adoption of simpler approaches based on the dynamic combination of rainfall thresholds and susceptibility maps [112].

Several techniques have been used to understand various aspects; however, the use of computational techniques has been limited to susceptibility and hazard analysis. It is encouraging to see the use of advanced machine learning techniques (ANN, SVM, and Random Forest) for landslide mitigation. However, the use of deep learning and artificial intelligence, which shows great promise in geohazards and climatic studies, is yet to be performed for the Indian Himalayan region. The use of advanced computational techniques (deep learning) especially for landslide identification is one aspect which researchers should look to exploit.

#### **5. Conclusions**

This review paper is an attempt to understand the studies being conducted in the Indian Himalayan region, which contributes almost 15% of the global rainfall-induced landslides. The article focuses only on landslide activity which was triggered due to rainfall as it reflects a significant portion to landslides in the Indian Himalaya and it is the only typology that can be managed by early warnings and forecasting models. The review reveals that there are several avenues where significant progress is required, especially in the aspect of climate change, use of high spatial and temporal resolution data, and new methodologies such as physical methods and computational approaches. The study highlights key topics in which the research has focused: susceptibility mapping, identification, and slope stability analysis. The conclusions from each sub-section are as follows:

	- a. Use of automated approaches involving the use of computational techniques.
	- b. Use of higher temporal resolution datasets and assessment of their reliability.
	- c. Application of the current techniques towards other significant landslide prone Himalayan regions. In terms of rainfall threshold studies, the number of articles were less than 10 with most of the work focused on the use of statistical models to define a single threshold, and majorly the Eastern Himalayas have been covered. The thresholds developed show large differences when calculated for regional and local scale, therefore it is suggested to develop thresholds at local scale to improve the understanding of the region and help in setting up an operational landslide early warning system. In general, the thresholds are very low if compared with other literature thresholds, thus confirming the basic assumption that the India Himalaya is very susceptible to landsliding. Additional research needs to be conducted on the use of physical models, including campaigns aimed at gathering input data for more complex models; moreover, efforts need to be made on the combination of empirical and physical models for a better understanding of landslide initiation.

**Author Contributions:** Conceptualization, A.D. and R.S.; formal analysis, A.D. and S.S.; data curation, R.S.; writing—original draft preparation, A.D. and S.S.; writing—review and editing, B.P. and A.M.A.; and supervision, B.P. and R.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS) in the University of Technology Sydney (UTS) under Grants 321740.2232335, 321740.2232357, 321740.2232424, and 321740.2232452. This research was also funded by International Science Council under Grant ISCROAP/IRDR/SG/2019/009. This research was also supported by Researchers Supporting Project number RSP-2019/14, King Saud University, Riyadh, Saudi Arabia. The research was also funded by Università degli Studi di Firenze under the Grant 58517\_INTERNAZIONALIZZAZIONE.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
