### **1. Introduction**

Nepal is a geographically diverse country, with plains in the south, rising hills, and the Himalayas in the north. Approximately 70% to 90% of the total annual rainfall occurs during the monsoon period, resulting in high runoff and sediment discharge that change the surface water area [1]. The country is rich in water resources, with approximately 600 rivers [2] and 5358 lakes [3]. Because of this seasonal variation and the large surface water area, it is difficult to track changes in surface water [4,5]. Furthermore, changes in streamflow due to climate change have also been predicted [6,7]. Therefore, the monitoring and estimation of surface water is an essential task.

In such cases, remote sensing technology plays a very important role in detecting, extracting, and monitoring surface water [8,9]. Open and freely accessible mid-resolution multi-spectral satellite images, such as Landsat, bring further benefits to the process [10]. Thus, the authors began to use the Landsat database to extract surface water, moving from a small case study of Phewa Lake to a full Landsat scene that covers different types of surface water along with features that resemble water, such as shadows, forests, built-up areas, snow, and clouds. In previous studies, the authors evaluated water index methods, single and combined [11,12], along with the segmentation of the scene [13]. Our latest work showed promising results for a scene in which the segmentation and the optimum threshold were manually identified based on a given reference dataset. As a next step, the automated extraction of surface water with well-known supervised classification approaches was evaluated [14].

With recent developments in computing technology, machines have become cheaper and algorithms more efficient. As a result, machine learning algorithms are now widely applied in almost every aspect of human life. Moreover, their optimized versions have outperformed classical methods. Numerous machine learning algorithms have been applied to remotely sensed imagery [15–19]. These algorithms can be divided broadly into three categories: (a) unsupervised learning; (b) supervised learning; and (c) reinforcement learning. Unsupervised learning groups a given unlabeled dataset based on implicit relationships or functions. Supervised learning uses labeled instances (a training dataset) to predict labels for similar data. Reinforcement learning does not provide a precise label; rather, it takes the next step based on the goal-oriented feedback available for each prediction. Reinforcement learning is still at a developing stage and, because it has no explicit error signal, a prediction can be wrong even when it receives a positive reward. The results of unsupervised learning cannot be ascertained and can be less accurate, whereas supervised learning provides specific classes and labels with better accuracy [20]. Some of the most common supervised algorithms are decision trees, naive Bayes (NB), neural networks (NNET), regression, support vector machines (SVM), and ensemble methods [21]. Libraries for these algorithms are well developed and implemented in reliable ecosystems of open source tools, such as the Python and R languages. Despite the availability of open access data and tools, the evaluation of such algorithms in Nepal has never been documented. Moreover, the varying conditions within a single scene pose a new challenge for testing the performance of machine learning algorithms in the extraction of surface water.

Hence, the motivation of this work is to introduce the application of the algorithms most commonly used by the remote sensing community to Nepal and to evaluate their performance for surface water extraction. The six most common algorithms, naive Bayes (NB), recursive partitioning and regression trees (RPART), neural networks (NNET), support vector machines (SVM), random forest (RF), and gradient boosted machines (GBM), were evaluated on Landsat 8 Operational Land Imager (OLI) bands. In addition, the slope, the normalized difference vegetation index (NDVI), and the normalized difference water index (NDWI) were combined with the OLI bands, one at a time and all three at once, to evaluate whether the combination can overcome the limitations of the original bands in water extraction. In the future, such an evaluation will assist in selecting proper methods to develop an effective time series database at the national scale in Nepal.
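The feature combinations described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (their workflow used R), and it rests on several assumptions: NDWI is taken in its common green/near-infrared form, NDVI in its standard near-infrared/red form, and the function names, band keys, and pixel values are hypothetical, chosen only for illustration.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """NDWI (green/NIR form, assumed here) = (Green - NIR) / (Green + NIR)."""
    return normalized_difference(green, nir)


def feature_sets(bands, slope):
    """Build the five predictor sets for one pixel.

    bands: dict of band name -> reflectance; must contain the
           hypothetical keys "green", "red", and "nir".
    slope: terrain slope for the same pixel.
    """
    base = list(bands.values())
    v = ndvi(bands["red"], bands["nir"])
    w = ndwi(bands["green"], bands["nir"])
    return {
        "bands": base,
        "bands+slope": base + [slope],
        "bands+ndvi": base + [v],
        "bands+ndwi": base + [w],
        "bands+all": base + [slope, v, w],
    }


# Illustrative (made-up) reflectances for a water-like pixel.
pixel = {"blue": 0.06, "green": 0.08, "red": 0.05, "nir": 0.02}
sets = feature_sets(pixel, slope=1.5)
for name, values in sets.items():
    print(name, [round(x, 3) for x in values])
```

For a water-like pixel such as this one, NDWI is positive and NDVI is negative, which is the contrast the added indices are meant to supply to the classifiers alongside the original bands.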

### **2. Materials and Methods**

As this study is an extension of our previous study [13], the authors used the same Landsat scene and reference dataset. Hence, details on the study area and data can be found in Acharya et al. [13]. Pre-processing of the Landsat scene was carried out in Environment for Visualizing Images (ENVI) version 5.3 (Exelis Visual Information Solutions, Boulder, CO, USA), cartographic maps were produced in ArcGIS 10.5 (Environmental Systems Research Institute, Redlands, CA, USA), and the machine learning process was carried out using the Classification And REgression Training (CARET) package in R 3.5.0 (The R Foundation, Vienna, Austria).
