Every year, floods cause the loss of human lives and livestock and millions in economic damages. In recent years, both the intensity and the frequency of these natural disasters have increased, a trend attributed to climate change. Forecasting, damage control, and mitigation are therefore of great concern to governments. Various systems and methodologies can help in forecasting, assessing risk, and predicting damages. This paper explores the use of artificial intelligence (AI) for flood detection in remote sensing images. Such detection can support an automated system that assists human action, such as aiding and rescuing affected people in the area. AI can play a crucial role in making these systems faster, cheaper, and more robust. Through its strong predictive capabilities and its ability to discover patterns in large amounts of data, AI can help analyze historical data and build systems that outperform physical and numerical models.
Risk assessment and forecasting systems can help prevent and mitigate damage before it occurs, and they can support the efficient evacuation of people, the construction of flood-prevention structures, and the reinforcement of existing ones. Flood risk assessment systems use historical data to train AI algorithms that produce maps indicating the risk level of an area. Remote sensing data and machine learning algorithms, such as support vector regression (SVR), have been used to produce maps of flood inundation depth [1]. Similarly, in [2], several machine learning algorithms, such as random forest and linear regression, are used to assess flood risk, and the information gain ratio is used to calculate the importance of the factors used for training the model (e.g., elevation, curvature, and rainfall). The second type of system used before a flood occurs is the forecasting system. These systems usually monitor streams of data (time series) and try to predict their values in a future time window, e.g., the amount of rainfall that will occur in the next 6 h. For example, [3] uses a long short-term memory (LSTM) network to forecast the flow rate up to a 3-day horizon. This model takes as input a time series of daily discharge and rainfall. In [4], another forecasting system is described that uses a spatiotemporal LSTM, which takes as input hourly rainfall and streamflow data and outputs the outlet flow for the next 6 h. This model also focuses on the interpretability of its results, which helps in making more informed decisions.
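The forecasting setup described above, in which a model reads a window of past rainfall and flow values and predicts the flow over a future window, can be illustrated with a minimal sketch. It shows only how supervised samples might be cut from two aligned time series; it is not the preprocessing of [3] or [4]. The function name `make_windows`, the parameters `lookback` and `horizon`, and the toy values are assumptions introduced for illustration.

```python
def make_windows(rainfall, discharge, lookback, horizon):
    """Cut aligned series into (input, target) pairs for forecasting.

    Each input holds the last `lookback` steps of (rainfall, discharge);
    each target holds the discharge over the following `horizon` steps.
    """
    if len(rainfall) != len(discharge):
        raise ValueError("series must have the same length")
    samples = []
    last_start = len(discharge) - lookback - horizon
    for start in range(last_start + 1):
        split = start + lookback
        # Input window: paired past observations.
        x = list(zip(rainfall[start:split], discharge[start:split]))
        # Target window: future discharge only.
        y = discharge[split:split + horizon]
        samples.append((x, y))
    return samples


# Hypothetical daily values (units are arbitrary in this toy example).
rain = [0.0, 5.0, 12.0, 3.0, 0.0, 0.0, 8.0]
flow = [10.0, 11.0, 18.0, 16.0, 13.0, 11.0, 14.0]

samples = make_windows(rain, flow, lookback=3, horizon=2)
for x, y in samples:
    print(x, "->", y)
```

A sequence model such as an LSTM would then be trained on these pairs; the choice of `lookback` and `horizon` corresponds to the input history and the forecast window discussed above.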
Apart from the aforementioned techniques used before a flood occurs, many systems are used after a flood. A large portion of these systems focus on flood detection, damage detection, and water body detection. In [5], transformers are used to segment images containing floods, a useful method for automatically detecting water bodies, trees, and houses in aerial imagery. Convolutional neural networks (CNNs) have also been used to assess structural damage in buildings through aerial images from unmanned aerial vehicles (UAVs) [6]. Similarly, [7] uses images from UAVs and CNN models to detect damage to infrastructure after a flood. Aerial images and state-of-the-art vision models have been employed to perform scene understanding in images containing flooding events [8]. These models can successfully detect buildings, flooding, and roads, and they can distinguish between floodwater and natural water.
In flood and damage detection, AI models are very useful because they can rapidly analyze complex images collected from UAVs and satellites. The early detection of floods through remote sensing can be vital for the early deployment of help in the affected area. Also, robust models that can perform segmentation, mapping, and detection can be used to automatically label the huge amounts of data collected daily from various sources. Deep convolutional neural networks have been used to detect floods in Sentinel-2 images [9]. Sentinel-2 imagery contains multiple spectral bands, and the aforementioned study uses the green and blue bands, as well as water indices, to make detection easier. Similarly, in [10], images from both Sentinel-1 and Sentinel-2, together with CNN models, are used to detect floods, achieving an accuracy of 80%. Furthermore, [11] compares several machine learning algorithms (neural networks and SVM) with CNNs for flood detection in radar images. In [12], the performance of a multi-modal model that integrates a CNN with a transformer is compared with that of single models, such as random forest, neural network, SVM, and CNN. The results show that the transformer combined with the CNN yields better results than the single models. The authors of [13] compare several segmentation models (WVResU-Net, Swin U-Net, U-Net+++, Attention U-Net, R2U-Net, ResU-Net, TransU-Net, and TransU-Net++) for mapping flooded areas using Sentinel-1 synthetic aperture radar (SAR) images. Similarly, in [14], SAR images from Sentinel-1 are used to map the inundation extents of lakes. The method uses a CNN to extract high-dimensional features, which are used as input to a transformer, and a fully connected neural network as a classifier head. Moreover, Swin transformers have been used for wetland classification (water, forest wetland, etc.) [15]. The aforementioned study compares the performance of the transformer with CNN models, with the Swin transformer outperforming all other models. Another study [16] uses Sentinel-1 images and Swin transformers to perform water body detection for agricultural reservoirs, while [17] compares Swin transformers with CNNs for wetland classification using Sentinel-1 and Sentinel-2 images, demonstrating that the former outperform the latter. Finally, [18] combines two models, a Swin transformer and a CNN, to perform water body mapping in remote sensing images. This literature review indicates that vision models, especially vision transformers, can be used effectively for flood detection, segmentation, and mapping. Also, remote sensing data, such as satellite data, can be used to train the models because they are rich in information that conventional images cannot capture.
This study aims to investigate the possibility of combining transfer learning with vision transformers for fast and automatic detection of flooded areas. This capability is critical for flood risk management agencies and civil protection authorities that need to decide quickly on their emergency response plan after a flooding event. Moreover, although most previous studies focused on applying vision transformer (ViT) models to one type of image, the models in this study are applied and evaluated on both SAR and multispectral images from Sentinel-1 and Sentinel-2, with the objective of developing accurate and “image agnostic” flood detection methodologies that leverage available data from different sources. Additionally, this study compares the proposed methodology with state-of-the-art CNN models that have shown promise for flood detection applications.
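To make the notion of a water index, mentioned above in connection with Sentinel-2 imagery, more concrete, the sketch below computes one widely used example, the normalized difference water index, NDWI = (Green - NIR) / (Green + NIR), where NIR is the near-infrared band. The cited study does not state here which indices it uses, so the choice of NDWI is an assumption made for illustration. The function names, the toy reflectance values, and the zero threshold are likewise hypothetical.

```python
def ndwi(green, nir, eps=1e-9):
    """Per-pixel NDWI for two equally sized 2-D reflectance grids."""
    if len(green) != len(nir):
        raise ValueError("band grids must have the same shape")
    index = []
    for g_row, n_row in zip(green, nir):
        if len(g_row) != len(n_row):
            raise ValueError("band grids must have the same shape")
        # eps guards against division by zero on empty pixels.
        index.append([(g - n) / (g + n + eps) for g, n in zip(g_row, n_row)])
    return index


def water_mask(index, threshold=0.0):
    """Flag pixels whose index exceeds the (assumed) threshold."""
    return [[value > threshold for value in row] for row in index]


# Toy 2x2 reflectance grids: water reflects more green than near-infrared.
green = [[0.10, 0.08], [0.05, 0.12]]
nir = [[0.02, 0.30], [0.25, 0.03]]

index = ndwi(green, nir)
mask = water_mask(index)
print(mask)
```

In practice, a fixed threshold is only a rough heuristic, which is one reason learned models such as the CNNs and transformers reviewed above take such indices as additional input channels rather than relying on them alone.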