*3.1. Random Forest*

Variable importance was used to identify which random forest inputs contributed most to the landcover classification of CSMR. Importance was measured as the mean decrease in Gini index (Gini value), where higher values indicate greater importance in the model (Table 3) [44]. By this measure, NDVI and green vegetation fraction were each the most important variables on three of the four dates; NDVI did not have the highest importance in January 2018, nor did green vegetation fraction in November 2020. Secondary variables with high importance were mARI, bare soil fraction, and senesced vegetation. mARI had greater importance at the recovery time steps than at the earlier dates. Shade and subtidal fractions had the lowest importance on many of the dates. The bare surface model (digital elevation) was available and used only for January 2018, where it had moderate importance. LiDAR was not used for the other dates because the surface was expected to have changed between the debris flow and the later dates. January 2018 also had the lowest mean decrease in Gini values, which could be linked to the larger number of input variables and/or the high solar zenith angle.
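
To make the importance measure concrete, the following is a minimal sketch of how a decrease in Gini impurity can be computed for a single split and then averaged over trees. It is not the implementation used in this study: the function names, class labels, and per-tree split records are hypothetical, and random forest packages differ in how they weight each split by node size.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


def mean_decrease_gini(splits_per_tree):
    """Sum the Gini decreases attributed to each variable, averaged over trees.

    `splits_per_tree` holds one list per tree of (variable_name, decrease) pairs.
    """
    totals = {}
    for tree in splits_per_tree:
        for variable, decrease in tree:
            totals[variable] = totals.get(variable, 0.0) + decrease
    n_trees = len(splits_per_tree)
    return {variable: total / n_trees for variable, total in totals.items()}


# Illustrative labels only (not data from the study).
parent = ["high_marsh"] * 4 + ["bare_soil"] * 4
left, right = parent[:4], parent[4:]
perfect_split = gini_decrease(parent, left, right)

# Hypothetical per-tree split records: (variable, Gini decrease).
example_trees = [[("NDVI", 0.5), ("shade", 0.1)], [("NDVI", 0.3)]]
importance = mean_decrease_gini(example_trees)
```

A variable that is chosen often and produces purer child nodes accumulates a larger total, which is why higher values are read as higher importance.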


**Table 3.** Variable importance across dates (mean decrease in Gini index).

Final model selection was done via k-fold cross-validation: the model with the lowest error was selected as the final model, and accuracy was reported for each mtry value tested. Here, mtry is the number of variables randomly sampled as candidates at each split within a decision tree; the mtry value with the highest accuracy is used for the final model. The final models (mtry = 2, 8, 2, and 2, respectively; Table 4) had high accuracy for all four dates (99.5%, 93%, 95.6%, and 97.1%, respectively), with similarly high kappa values (99.3%, 91.1%, 94.3%, and 96.3%).
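
The selection procedure and the kappa statistic can be sketched as follows. This is an illustrative outline under stated assumptions, not the workflow used in this study: `score_fn` is a hypothetical callable that would train a random forest with a given mtry on the training indices and return accuracy on the held-out indices, and the folds are contiguous, which assumes the samples were shuffled beforehand.

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(candidates, score_fn, n_samples, k):
    """Mean held-out accuracy for each candidate mtry value.

    `score_fn(mtry, train_idx, test_idx)` is a hypothetical callable that
    returns accuracy in [0, 1] for a model fit on `train_idx`.
    """
    folds = kfold_indices(n_samples, k)
    results = {}
    for mtry in candidates:
        scores = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(score_fn(mtry, train_idx, test_idx))
        results[mtry] = sum(scores) / len(scores)
    return results


def best_candidate(results):
    """Candidate with the highest mean cross-validated accuracy."""
    return max(results, key=results.get)


def cohen_kappa(reference, mapped):
    """Agreement beyond chance: (observed - expected) / (1 - expected).

    Undefined when expected agreement equals 1; not handled in this sketch.
    """
    n = len(reference)
    observed = sum(r == m for r, m in zip(reference, mapped)) / n
    classes = set(reference) | set(mapped)
    expected = sum(
        (reference.count(c) / n) * (mapped.count(c) / n) for c in classes
    )
    return (observed - expected) / (1 - expected)


# Illustrative labels only (not data from the study).
reference = ["a", "a", "b", "b"]
mapped = ["a", "a", "b", "a"]
kappa = cohen_kappa(reference, mapped)
```

Kappa is lower than overall accuracy whenever some agreement would be expected by chance, which is consistent with the kappa values sitting slightly below the accuracies reported above.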

**Table 4.** Class error and final model accuracy across dates.


Landcover class accuracy was measured via producer's and user's error, which allows the mapping of individual landcover classes to be assessed (Table 4). High marsh vegetation was the most accurately mapped class, with low user's and producer's error across all dates. Subtidal and mid marsh had the greatest user's and producer's error, especially in January 2018. Subtidal cover was most often confused with mid marsh vegetation and bare soil, while mid marsh was confused with bare soil and subtidal. Even so, error within the subtidal and mid marsh classes was below 10% for most dates, so classification of the two classes remained relatively accurate.
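
For readers less familiar with these measures, the sketch below derives producer's error (omission) and user's error (commission) from a confusion matrix. The matrix layout (rows as reference classes, columns as mapped classes), the function name, and the counts are assumptions chosen for illustration; the counts are not values from Table 4.

```python
def class_errors(matrix, classes):
    """Producer's and user's error per class.

    `matrix[i][j]` is the number of samples whose reference class is
    `classes[i]` and whose mapped class is `classes[j]`. Assumes every class
    has at least one reference sample and one mapped sample.
    """
    errors = {}
    for i, name in enumerate(classes):
        correct = matrix[i][i]
        reference_total = sum(matrix[i])               # row total
        mapped_total = sum(row[i] for row in matrix)   # column total
        errors[name] = {
            "producers_error": 1 - correct / reference_total,  # omission
            "users_error": 1 - correct / mapped_total,         # commission
        }
    return errors


# Illustrative counts only (not data from the study).
classes = ["high_marsh", "mid_marsh", "subtidal"]
matrix = [
    [50, 0, 0],
    [0, 45, 5],
    [0, 10, 40],
]
errors = class_errors(matrix, classes)
```

In this toy matrix, the off-diagonal counts between mid marsh and subtidal raise both error types for those two classes while high marsh stays at zero, mirroring the kind of pairwise confusion described above.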
