#### **1. Introduction**

Drought is a recurring and extreme climate event that originates in a temporary water deficit and may be related to a lack of precipitation, soil moisture, streamflow, or any combination of the three taking place at the same time [1]. Drought differs from other hazard types in several ways. First, unlike other geophysical hazards that occur along well-defined areas (e.g., floods, earthquakes, landslides), drought can occur anywhere, with the exception of desert regions and extremely cold areas, where it does not have meaning [2,3]. Second, drought develops slowly, resulting from a prolonged period (from weeks to years) of precipitation that is below the average, or expected, value at a particular location [4].

To improve drought mitigation, different indicators are used to trigger a drought warning [1,5]. While an indicator is a derived variable for identifying and assessing different drought types, a trigger is a threshold value of the indicator used to determine the onset, intensity or end of a drought, as well as the timing to implement proper drought response actions [6,7]. Since precipitation is one of the most important inputs to a watershed system and provides a direct measurement of water supply conditions over different timescales, several commonly used drought indicators rely on precipitation measurements only [4]. Among them, the Standardized Precipitation Index (SPI) of [8] is certainly the most prominent; it has been recommended by the World Meteorological Organization (WMO) for characterizing the onset, end, duration and severity of drought events deriving from precipitation deficiencies taking place at different accumulation periods and occurring at different stages of the same hydro-meteorological anomaly [9].

The immediate consequences of short-term droughts (i.e., a few weeks duration) are, for example, a fall in crop production, poor pasture growth and a decline in fodder supplies from crop residues, whereas prolonged water shortages (e.g., of several months or years duration) may, among others, lead to a reduction in hydro-electrical production and an increase of forest fire occurrences [10]. Therefore, skillful predictions of the onset and end of a drought a few months in advance will benefit a variety of sectors by allowing sufficient lead time for drought mitigation efforts. Indeed, drought forecasting is nowadays a critical component of drought hydrology science, which plays a major role in drought risk management, preparedness and mitigation.

It has been demonstrated that droughts can be forecast using stochastic models or neural networks [11,12]. While [13] demonstrated that these types of forecasts can provide "reasonably good agreement for forecasting with 1 to 2 months lead times", they do not quantify the improvement of these methods with respect to using probabilistic forecasts of the precipitation fields. Forecasts of droughts can also be produced using deterministic numerical weather prediction models. However, such forecasts are highly uncertain due to the chaotic nature of the atmosphere, which is particularly strong on a sub-seasonal timescale [14].

As an alternative, ensemble prediction systems that forecast multiple scenarios of future weather have considerably evolved over recent years. Indeed, the routine generation of global seasonal climate forecasts coupled with advances in near-real-time monitoring of the global climate has now allowed for testing the feasibility of generating global drought forecasts operationally. Systems to monitor drought around the globe are described in [7] for meteorological drought and in [15] for hydrologic and agricultural conditions. For example, Yuan et al. [16] used seasonal precipitation forecasts from the North American Multi-Model Ensemble (NMME) and other coupled ocean-land-atmosphere general circulation models (GCMs) to examine the predictability of drought onset around the globe based on the SPI. For the global domain, they found only a modest increase in the forecast probability of drought onset relative to baseline expectations when using the GCM forecasts. Hao et al. [17] described the Global Integrated Drought Monitoring and Prediction System (GIDMaPS) that uses three drought indicators. The forecasting component of their system relies on a statistical approach based on an ensemble streamflow prediction (ESP) methodology. Dutra et al. [18,19] generated global forecasts of 3-, 6-, and 12-month SPI by combining seasonal precipitation reforecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) System 4 (S4) with precipitation observations from the Global Precipitation Climatology Centre (GPCC) and, alternatively, the ECMWF Interim Reanalysis. They reported on several verification metrics for the SPI forecasts for 18 regions around the globe. Using the same definition as [16], they found that the ECMWF S4 provides useful skill in predicting drought onset in several regions, and the skill is largely derived from El Niño-Southern Oscillation (ENSO) teleconnections. 
However, they also found that in many regions it is difficult to improve on "climatological" forecasts. Recently, Spennemann et al. [20] studied the performance and uncertainties of seasonal soil moisture and precipitation anomaly (SPI) forecasts over Southern South America by means of the Climate Forecast System, version 2 (CFSv2). Their results show that the forecast skill of both the SPI and standardized soil moisture anomalies is regionally and seasonally dependent. In general, a fast degradation of the forecast skill is observed as the lead time increases, resulting in almost no added value with regard to climatology at lead times longer than 3 months. However, they note that the forecasts have a higher skill for dry events than for wet events.

In this study, we build on the work of [18,19] by considering the ECMWF S4 ensemble framework to generate seasonal forecasts of the SPI, and perform their verification against corresponding SPI from precipitation observations of the GPCC over Latin America. Drought is viewed from a meteorological perspective, and seasonal forecasts of the 3- and 6-month SPI (SPI3 and SPI6) are generated and verified on a monthly basis for the hindcast period of 1981–2010.

While the focus of the work is on the prediction of meteorological drought, the study assesses two fundamental constraints in generating reliable regional drought predictions that will arise whether using the reported method or any other approach (e.g., land surface modeling): (1) the accuracy of summary statistics (e.g., mean, median, percentile) at predicting a seasonal drought from the members of the ensemble forecasting system; and (2) the skill of probabilistic categorical predictions of seasonal drought from the members of the ensemble forecasting system.

#### **2. Study Area, Datasets and Methods**

The study area covers the whole South-Central America region (the domain of analysis is limited to land surface grid points between 56° S–35° N, 33°–128° W). South-Central America spans a vast range of latitudes and has a wide variety of climates. It is characterized largely by humid and tropical conditions, but important areas have been extremely affected by meteorological droughts in the past [21–23] and the climate change scenarios foresee an increased frequency of these events for the region [24,25]. Given the significant reliance of South-Central American economies on rainfed agricultural yields (rainfed crops contribute more than 80% of the total crop production in South-Central America), and the exposure of agriculture to a variable climate, there is a large concern in the region about present and future climate and climate-related impacts [26]. South-Central American countries have an important percentage of their GDP in agriculture (10% average, [27]), and the region is a net exporter of food globally, accounting for 11% of the global value. According to the agricultural statistics supplied by the United Nations Food and Agriculture Organization [27], 65% of the world production of corn and more than 90% of the world production of soybeans occurs in Argentina, Brazil, the United States and China. The productivity of these crops is expected to decrease in the extensive plains located in middle and subtropical latitudes of South America (e.g., Brazil and Argentina), leading to a reduction in the worldwide productivity of cattle farming and having adverse consequences for global food security [28,29].

#### *2.1. Forecasts: The ECMWF Seasonal Forecast System (S4)*

In this study, we use the ECMWF seasonal forecast system 4 (hereafter S4; [30]) to forecast 3- and 6-month SPI. The S4 is a dynamical forecast system based on an atmospheric-ocean coupled model, which has been operational at ECMWF since 2011 and is launched once a month (on the first day of the month). The 2011 version of the forecast model has 91 vertical levels, lead times up to 13 months, and a resolution of T255 (80 km). It provides back integrations (hindcasts) with a 15- or 51-member ensemble (the number depends on the month) for every month from 1980 onwards. Molteni et al. [30] provide a detailed overview of S4 performance. For the comparison with the GPCC observations, the S4 has been re-gridded to 1.0° latitude/longitude grid spacing, and daily precipitation values over its hindcast period (1981–2010) have been aggregated to monthly values. The ability of the probabilistic model to accurately forecast seasonal drought conditions has been evaluated up to 6 months of lead time. In addition to the dynamical seasonal forecasts, and in order to test whether the forecasts perform better than a benchmark, a set of climatological forecasts (CLM) was also generated by randomly sampling past years from the reference data set to match the number of ensemble members in the hindcast, as described in [19].
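The climatological benchmark described above can be illustrated with a minimal sketch. It assumes that the target year is excluded from the pool and that years are drawn with replacement (so that 51 members can be drawn from fewer than 51 years); neither detail is stated in the text. The function name `climatological_ensemble` and the toy reference series are hypothetical.

```python
import random


def climatological_ensemble(reference, target_year, n_members, seed=0):
    """Build a climatological (CLM) ensemble for one target year.

    reference: dict mapping year -> observed value (e.g. SPI or precipitation).
    Assumptions (not from the paper): the target year is excluded from the
    pool, and years are drawn with replacement so that n_members may exceed
    the number of available years.
    """
    rng = random.Random(seed)
    candidate_years = [year for year in sorted(reference) if year != target_year]
    sampled_years = rng.choices(candidate_years, k=n_members)
    return [reference[year] for year in sampled_years]


# Hypothetical reference series: one value per year of the 1981-2010 hindcast.
reference = {year: (year - 1995) / 10.0 for year in range(1981, 2011)}
clm_members = climatological_ensemble(reference, target_year=2000, n_members=15)
```

Matching the number of climatological members to the hindcast ensemble size keeps sampling effects comparable when the two sets of forecasts are scored against each other.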

#### *2.2. Observations: The GPCC Full Data Reanalysis Version 6.0*

In this study, monthly precipitation totals at 1.0° latitude/longitude grid spacing from the Full Data Reanalysis Monthly Product Version 6.0 of the GPCC are used as a reference data set (for the forecast verification). The GPCC was established in 1989 on request of the World Meteorological Organization (WMO) and provides a global gridded analysis of monthly precipitation over land from operational in situ rain-gauges based on the Global Telecommunications System (GTS) and historic precipitation data measured at global stations. The data supplies from 190 worldwide national weather services to the GPCC are regarded as primary data source, comprising observed monthly totals from 10,700 to more than 47,000 stations since 1901. The monthly gridded data sets are spatially interpolated with a spherical adaptation of the robust Shepard's empirical weighting method [31]. Validation of the original data sets for drought monitoring has been performed by [18,32], who found that GPCC data sets show higher values for extreme precipitation, and tend to over-smooth the data. This can generate some problems when analyzing intense precipitation events but appears of secondary importance in drought analysis. Therefore, to be consistent with the data provided by the ensembles from ECMWF, a common period of the hindcast that covers the period from 1981 to 2010 is used to calculate the SPI.

#### *2.3. Drought Indicator: The Standardized Precipitation Index (SPI)*

In this study, we selected the SPI [8] as a meteorological drought indicator. The SPI is a statistical indicator that compares the total precipitation received at a particular location during a period of time with the long-term precipitation distribution for the same period of time at that location. In order to allow for the statistical comparison of wetter and drier climates, the SPI is based on a transformation of the accumulated precipitation into a standard normal variable with zero mean and variance equal to one. SPI results are given in units of standard deviation from the long-term mean of the standardized precipitation distribution. Negative values, therefore, correspond to drier periods than normal and positive values correspond to wetter periods than normal. The fundamental strength of the SPI is that it can be calculated for a variety of precipitation timescales (e.g., weekly, monthly, seasonal or yearly accumulation periods) and updated on various time steps (e.g., daily, weekly, monthly), enabling water supply anomalies relevant to a range of end users to be readily identified and monitored. SPI is typically calculated on a monthly basis for a moving window of n months, where n indicates the precipitation accumulation period.
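The transformation described above can be sketched in a few lines. This is a simplified illustration, not the procedure used in the study: instead of fitting a parametric distribution to the accumulated precipitation, it assumes an empirical plotting position rank/(N + 1), and it expects the accumulations for one calendar month across all years as input. The function names `accumulate` and `empirical_spi` are hypothetical.

```python
from statistics import NormalDist


def accumulate(monthly_precip, n):
    """Return n-month moving sums; the first n-1 months have no full window."""
    return [sum(monthly_precip[i - n + 1:i + 1])
            for i in range(n - 1, len(monthly_precip))]


def empirical_spi(accumulations):
    """Map accumulated precipitation to standard normal deviates.

    Assumption: the cumulative probability is estimated with the plotting
    position rank / (N + 1) rather than with a fitted distribution.
    Tied values receive arbitrary consecutive ranks.
    """
    count = len(accumulations)
    order = sorted(range(count), key=lambda i: accumulations[i])
    spi = [0.0] * count
    for rank, index in enumerate(order, start=1):
        probability = rank / (count + 1)
        spi[index] = NormalDist().inv_cdf(probability)
    return spi


three_month_totals = accumulate([50, 60, 70, 80, 90], 3)
# Hypothetical 3-month totals for the same calendar month in three years.
spi_values = empirical_spi([120.0, 300.0, 210.0])
```

In this toy case the driest total maps to a negative SPI, the wettest to a positive SPI, and the middle one to zero, which mirrors the interpretation of the sign given above.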

The magnitude of negative SPI values corresponds to percentiles of a probability distribution that are frequently used as threshold levels (triggers) to classify drought intensity [8,33,34]. Several classification systems of meteorological drought intensity based on fixed threshold levels of the SPI have been presented in the literature. The most widely known is that proposed by [8], which maps precipitation totals below the 50th percentile into four fixed categories of drought intensity (Table A1). For example, a "moderate" drought event starts at SPI = −1.0 (units of standard deviation), which corresponds to a cumulative probability of 15.9%, that is, approximately the 16th percentile. McKee et al. [8] determined that every region is in "mild" drought 34% of the time, in "moderate" drought 9.2% of the time, in "severe" drought 4.4% of the time, and in "extreme" drought 2.3% of the time (Table A1). The threshold levels of drought intensity proposed by [8] have been used worldwide in numerous applications at different timescales of precipitation accumulation, such as to monitor drought in the United States [35,36] and Europe [37], for detecting droughts in East Africa [38], to monitor drought conditions and their uncertainty in Africa using data from the Tropical Rainfall Measuring Mission (TRMM) [32], and for improving the fire danger forecast in the Iberian Peninsula [39].
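The frequencies quoted above follow directly from the standard normal distribution. The sketch below assumes the class limits usually attributed to [8] (mild: 0 to −0.99; moderate: −1.0 to −1.49; severe: −1.5 to −1.99; extreme: −2.0 or less), since Table A1 is not reproduced here; the function names are hypothetical.

```python
from statistics import NormalDist


def classify_drought(spi):
    """Return the drought intensity class for one SPI value.

    Assumed class limits: extreme <= -2.0 < severe <= -1.5 < moderate
    <= -1.0 < mild < 0.0; values of 0.0 or more are not in drought.
    """
    if spi <= -2.0:
        return "extreme"
    if spi <= -1.5:
        return "severe"
    if spi <= -1.0:
        return "moderate"
    if spi < 0.0:
        return "mild"
    return "none"


def category_frequencies():
    """Expected fraction of time spent in each class for a standard normal SPI."""
    cdf = NormalDist().cdf
    return {
        "mild": cdf(0.0) - cdf(-1.0),
        "moderate": cdf(-1.0) - cdf(-1.5),
        "severe": cdf(-1.5) - cdf(-2.0),
        "extreme": cdf(-2.0),
    }


frequencies = category_frequencies()
```

Under these assumed limits, the computed fractions round to the 34%, 9.2%, 4.4% and 2.3% reported in the text.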

#### *2.4. Drought Detection and Verification Methods*

The methods to detect drought events from the S4 ensemble system (Table A2) were defined in [40] as 13th percentile (Q13); 23rd percentile (Q23); Median (MED); 77th percentile (Q77); 88th percentile (Q88); Large spread (SpL); Low spread (Spl); Dry spread (SpD); Flood spread (SpF); Mean (EM\_RES).
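As an illustration of the percentile, median and mean methods only, the sketch below reduces a set of ensemble SPI members to summary values and flags a drought when a summary is at or below −1. The spread-based methods (SpL, Spl, SpD, SpF) are omitted because their definitions are given in Table A2, which is not reproduced here. The linear-interpolation percentile, the "at or below" comparison and the function names are assumptions.

```python
from statistics import mean, median


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def ensemble_summaries(members):
    """Summary SPI values for the percentile, median and mean methods."""
    return {
        "Q13": percentile(members, 13),
        "Q23": percentile(members, 23),
        "MED": median(members),
        "Q77": percentile(members, 77),
        "Q88": percentile(members, 88),
        "EM_RES": mean(members),
    }


def detect_drought(members, threshold=-1.0):
    """Flag a drought for each method whose summary SPI is <= threshold."""
    return {name: value <= threshold
            for name, value in ensemble_summaries(members).items()}


# Hypothetical 7-member ensemble of SPI forecasts for one grid point and month.
members = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
flags = detect_drought(members)
```

With this toy ensemble the dry percentiles (Q13, Q23) signal a drought while the median, mean and wet percentiles do not, which shows how the choice of summary statistic alone changes the number of events detected.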

Forecast verification is the process of assessing the quality of forecasts. The usefulness of forecasts to support decision making clearly depends on their error characteristics, which are elucidated through forecast verification methods. In this study, the forecasts correspond to the monthly SPI3 and SPI6 values computed with the ECMWF S4 for the period 1987–2010; the observations correspond to the SPI3 and SPI6 values computed with the GPCC for the same historical period. The validation methods used are the percentage correct (PC), extreme dependency score (EDS), Gilbert skill score (GSS), BIAS, probability of detection (POD), and False Alarm Rate (FAR). A comprehensive description of the validation metrics can be found in the supplementary material.
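For orientation, the scores listed above can be written in terms of a 2 × 2 contingency table with hits (a), false alarms (b), misses (c) and correct negatives (d). The sketch below uses the definitions commonly found in the verification literature and computes FAR as the false alarm ratio b/(a + b); the exact formulas adopted in the study are those of the supplementary material, so this is an assumption. The function name and the counts are hypothetical.

```python
import math


def contingency_scores(hits, false_alarms, misses, correct_negatives):
    """Common categorical scores from a 2x2 contingency table.

    Assumes hits > 0 and at least one observed and one forecast event,
    otherwise some ratios are undefined.
    """
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    n = a + b + c + d
    hits_random = (a + b) * (a + c) / n  # hits expected by chance
    return {
        "PC": (a + d) / n,
        "POD": a / (a + c),
        "FAR": b / (a + b),  # false alarm ratio (assumed definition)
        "BIAS": (a + b) / (a + c),
        "GSS": (a - hits_random) / (a + b + c - hits_random),
        "EDS": 2.0 * math.log((a + c) / n) / math.log(a / n) - 1.0,
    }


# Hypothetical counts: 90 observed droughts out of 1000 cases.
scores = contingency_scores(hits=30, false_alarms=70,
                            misses=60, correct_negatives=840)
```

In this example one in three observed events is detected (POD = 1/3) and 70% of the issued warnings are false alarms, yet PC remains high (0.87) because correct negatives dominate when the event is rare.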

#### **3. Results and Discussion**

Initially, we assessed the ability of the ECMWF S4 ensemble system to seasonally forecast the spatial distribution of SPI in South-Central America by evaluating its monthly scalar accuracy and skill score at each location with 3- and 6-month lead time (respectively for the SPI3 and SPI6). Subsequently, we verified the non-probabilistic identification of drought events by means of the S4 system.

#### *3.1. Non-Probabilistic Forecasts of Continuous SPI Values*

In Figure 1, we present the monthly correlation between observed and forecast ensemble mean (a) SPI3 and (b) SPI6 at, respectively, 3- and 6-month lead time for the hindcast period of 1981–2010. The maps depicted in Figure 1 show that there is a positive correlation between SPI3 forecast and observations at all months and for most of the study area. Overall, the forecast SPI3 values follow the trends (increases or decreases) of the observed SPI3 values. Notwithstanding, the statistical significance of the correlation between observed and forecast SPI3 varies across regions and months: for example, the correlation along the East Pacific coast is almost never statistically significant during the year, it is mostly statistically significant during the whole year for the Northeast of South America, and significant patterns are only verified for Central America during the months between December and May. On the other hand, SPI6 forecasts present extensive geographic areas that are negatively correlated with SPI6 observations at 6-month lead time (Figure A1). These large forecast errors are not systematic but occur mainly for the Amazon and Central East part of South America, and are most evident during the months of January–April (end of the wet season) and June–August (dry season). Surprisingly, and similarly to the SPI3, the correlation is statistically significant during almost the whole year for the Northeast of South America and for large parts of Central America from March to May. Mo and Lyon [41] suggest that the statistically significant correlation patterns in Central America and Northeast of South America are likely contributed by the ENSO: these regions are known to have a strong ENSO signal, and the seasonal skill of precipitation forecasts contributes to the SPI3 and SPI6 seasonal forecasts. 
Moreover, in those areas and during both seasons (wet and dry), the intra-seasonal patterns of precipitation seem to be highly influenced by the activity of the Madden–Julian Oscillation [42]. Since the correlation is statistically significant for some regions in some months, this suggests that the forecast has some skill at 3- and 6-month lead times.

The scalar skill score was also analyzed to assess the ability of the forecasts to improve SPI prediction over the climatological median values (i.e., SPI = 0). The differences between the ECMWF-based forecasts and the climatological forecasts (CLM) will indicate whether there is additional skill obtained from the dynamical model forecasts. In Figure 2, we present the monthly SPI3 forecast skill (using the mean of the ensemble) at 3-month lead time relative to baseline skill for the hindcast period 1981–2010, which shows the difference in correlation between the ECMWF S4 SPI3 forecasts and the baseline SPI3 forecasts based on climatological probabilities. Our results confirm that the forecasts have higher skill than the baseline, but the differences are often not significant at the 5% level based on Fisher's Z test. Indeed, although the correlation with observations is extensively significant over the study area, it does not extensively improve over the climatological SPI values. Marked improvements are observed for Northeast Brazil during the months of April–July, Mexico during the months of December–April, and North of South America between January and April. Overall, our results are consistent with [19,41], namely, that it is still challenging to improve on SPI forecasts that are based on climatology and persistence.
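A Fisher Z comparison of two correlation coefficients can be sketched as follows. The sketch assumes the textbook form for two independent samples, in which each correlation is transformed with the inverse hyperbolic tangent and the difference is scaled by its standard error; whether the study used this form or a variant for dependent correlations is not stated. The function name and the example correlations are hypothetical.

```python
import math


def fisher_z_difference(r_forecast, r_baseline, n_forecast, n_baseline):
    """Z statistic for the difference between two correlation coefficients.

    Assumption: the two correlations are treated as coming from independent
    samples of sizes n_forecast and n_baseline (each larger than 3).
    """
    z_forecast = math.atanh(r_forecast)
    z_baseline = math.atanh(r_baseline)
    standard_error = math.sqrt(1.0 / (n_forecast - 3) + 1.0 / (n_baseline - 3))
    return (z_forecast - z_baseline) / standard_error


# Hypothetical correlations for one grid point over a 30-year hindcast.
z_statistic = fisher_z_difference(0.6, 0.2, n_forecast=30, n_baseline=30)
significant_at_5_percent = abs(z_statistic) > 1.96
```

Even a difference between correlations of 0.6 and 0.2 gives a statistic of about 1.80 with 30 years of data, below the 1.96 threshold, which helps explain why many of the improvements over the baseline are not significant.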

**Figure 1.** Monthly correlation of the observed and forecast standardized precipitation index (SPI) at 3-month lead time (SPI3) (using the mean of the ensemble) for the hindcast period (1981–2010). Values are indicated in the color bar: 0.31 (0.37) is statistically significant at the 10% (5%) significance level.

Interestingly, scalar skill score results suggest that SPI3 forecasts match the observations in dry regions mainly during the beginning of the dry seasons, while at regions with high rainfall variability and/or during the wet seasons the forecasts are usually less skillful. Therefore, we believe that the ECMWF S4 ensemble mean might underestimate monthly rainfall and thus increase the intensity of dry periods and lessen the forecast values of SPI3 for the study region.

**Figure 2.** Monthly difference in forecast skill (Pearson correlation) between the forecast SPI3 at 3-month lead time (using the mean of the ensemble) and climatological SPI for the hindcast period (1981–2010). Values are indicated in the color bar: 1.96 is the statistical significance at the 5% significance level.

On the other hand, the 6-month seasonal forecasts are less skillful than the 3-month forecasts (Figure A2). Indeed, and as expected from the correlation analysis, skill scores for the SPI6 forecasts are generally lower than for SPI3 and are rarely statistically significant at the 5% level. In Figure A2, it is perceptible that regions with meaningful SPI6 forecasts are also depicted as skillful for the SPI3. The monthly skill scores clearly show that the meaningful forecasts are concentrated over the eastern Amazon, namely in most of the states of AP (Amapá), PA (Pará) and MA (Maranhão). Molteni et al. [30] state that some important bias reductions were introduced in S4, as compared to S3, particularly in the tropical Atlantic and Indian Oceans, and some improvements over land areas, e.g., in East Asia and over the Amazon Basin. It is possible that these improvements over the bias of the ECMWF S4 precipitation forecasts will reduce the residual errors between observed and predicted seasonal SPI values.

In Figure 3, we present the Root Mean Squared Error (RMSE) values between observed and forecast SPI3 at 3-month lead time (Figure A3 for SPI6), for the hindcast period 1981–2010. The results suggest that the predicted SPI is less consistent with the observations derived from GPCC for those regions placed in the subtropical subsidence zones around 10° and 30° N/S, such as subtropical southeast and central Brazil, Paraguay and Bolivia, as well as large areas of Peru.

**Figure 3.** Root Mean Squared Error (RMSE) between the observed and forecast SPI3 at 3-month lead time (mean of the ensemble) for the hindcast period (1981–2010). Values in difference of percentile magnitude are indicated in the color bar.

The high variability of precipitation regimes within those latitudes [43,44] makes it difficult to predict drought at seasonal scale. The results based on the analysis of residual errors also suggest that locations with monthly forecast errors below 0.2 have significant skill, whereas those above 0.5 have negative correlation and are unskillful. This output is confirmed by the monthly skill score measured in terms of the RMSE (Figure 4). The RMSE skill score approximates the skill score computed with the correlation index (Figure 2) and its spatial patterns: overall, seasonal SPI3 and SPI6 forecasts are monthly skillful for a small region in the eastern part of the Amazon Basin.

**Figure 4.** Skill Score of the SPI3 at 3-month lead time forecast measured in terms of the RMSE relative to climatological RMSE for the hindcast period (1981–2010).
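An RMSE-based skill score relative to climatology can be sketched as below. The sketch assumes the common form 1 − RMSE(forecast)/RMSE(climatology), with the climatological forecast fixed at SPI = 0; the exact formulation used for Figure 4 is not given in the text. The function names and sample values are hypothetical.

```python
import math


def rmse(forecast, observed):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared_errors) / len(observed))


def rmse_skill_score(forecast, observed, climatology=0.0):
    """Skill relative to a constant climatological forecast.

    Assumed form: 1 - RMSE_forecast / RMSE_climatology, so that 1 is a
    perfect forecast, 0 matches climatology and negative values are worse.
    """
    rmse_forecast = rmse(forecast, observed)
    rmse_climatology = rmse([climatology] * len(observed), observed)
    return 1.0 - rmse_forecast / rmse_climatology


# Hypothetical forecast that captures half of the observed SPI anomaly.
skill = rmse_skill_score([-0.5, 0.0, 0.5], [-1.0, 0.0, 1.0])
```

In this toy case the forecast error is half the climatological error, giving a skill score of 0.5.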

#### *3.2. Non-Probabilistic Forecasts of Categorical SPI Values*

Figures 5 and A5 show the score values of categorical drought forecasts (i.e., below the SPI = −1 threshold), with ensemble drought detection based on the methods listed in Table A2. We have pooled together all seasons and locations in the study area in generating Figures 5 and A5. Surprisingly, the distributions of score values for SPI3 and SPI6 are alike for all methods and all verification measures. This may be due to the fact that boundary conditions of seasonal dynamical model forecasts are often characterized by low frequency variability, leading to similar predictability of medium-range climate conditions that extend from a few to several months lead time. In general, precipitation is the result of complex and interacting phenomena at different spatial and temporal scales, but regional atmospheric patterns that are actively involved in the development of long-term drought conditions are persistent and influenced by predictors that can be accurately estimated at large lead times. Therefore, precipitation anomalies over extreme peak thresholds (drought conditions) might be similarly predicted for different accumulation periods and seasonal lead times, although the accuracy of their scalar values may vary regionally and seasonally. Moreover, given the similar distribution of score values for different methods of categorical drought identification, we present the results of the SPI3 and SPI6 in a joint analysis.

**Figure 5.** Verification measures of categorical drought forecasts (i.e., below the SPI3 "-1" threshold) estimated with the methods described in Table A2.

For categorical drought events predicted with both SPI3 and SPI6, computed with the ECMWF S4 ensemble mean (EM\_RES), POD values indicate that for at least 50% of the locations in South-Central America one in three seasonal drought events is correctly predicted. This is better than the respective climatology (16% of drought events are correctly detected) and extends over a geographic area larger than that with statistically significant scalar skill scores. Although the ensemble mean performs better than the climatology, POD values are still higher for the methods Q13 (60% of detection) and SpD (80% of detection); the worst results of all the methods are given by the wettest members of the ranked distributions (Q77 and Q88). This means that drier members are better than the mean at detecting the drought onset, but also that there is a low consistency between the extreme and dry members of the ECMWF S4 ensemble set. Lavaysse et al. [40] found similar results in Europe, where the highest POD is achieved by using the 13th percentile, and the product using the Q13 and Q23 (SpD).

According to the FAR scores, when the ensemble mean SPI values (EM\_RES) are used to detect a drought, there will be on average a 70% rate of false alarms. Median FAR values are even larger for drier members (10% more for Q13 and SpD), and the inter-quantile range of the wettest members is about six times greater than that of the mean (60%), which indicates a large spread of FAR values. Based on these results, it is difficult to select the method that best balances the number of drought hits against the number of misses. Indeed, while the mean of the ensemble always shows an average number of hits and misses (similarly to Spl and SpL, which represent the mean of ensemble extreme and opposite members), the drier and wetter members of the ensemble attain, respectively, extreme numbers of hits or misses. In that sense, Lavaysse et al. [40] proposed a way to compensate for the effect of the number of events detected on POD and FAR by using specific thresholds in order to select the same number of events for the different methods.

Looking at PC, we might suggest that the Q13 of the ensemble is the worst performing method to distinguish between drought and non-drought events. On the other hand, by looking at the EDS we might suggest that Q13 is the best method to detect the onset and end of a drought. Because the EDS does not depend on the number of false alarms once the sample size is fixed, Ghelli and Primo [45] have suggested not using the EDS alone to assess a model's performance on forecasting rare events. Those authors have shown that the EDS equation results in an increased freedom of false alarms and correct negatives, which can freely vary with the only restriction that their sum has to be constant. This feature encourages hedging, that is, forecasting the event all the time to guarantee a hit and thus to ensure a higher success rate; however, this will increase the false alarm ratio and bias. Therefore, it is paramount to use the EDS in combination with other scores that include the right hand side of the contingency table, such as the false alarm rate and/or the bias. Indeed, both FAR and BIAS show that SpD is not an accurate method to detect drought, as it forecasts a large number of drought events that do not occur.
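The freedom of false alarms and correct negatives noted by [45] can be made concrete with a small numerical sketch. It assumes the usual form of the EDS, 2 ln((a + c)/n) / ln(a/n) − 1, with a hits, c misses and n the sample size, and it computes FAR as the false alarm ratio b/(a + b); the two contingency tables are hypothetical.

```python
import math


def extreme_dependency_score(hits, false_alarms, misses, correct_negatives):
    """EDS in its usual form; only hits, misses and the total enter."""
    total = hits + false_alarms + misses + correct_negatives
    return 2.0 * math.log((hits + misses) / total) / math.log(hits / total) - 1.0


def false_alarm_ratio(hits, false_alarms):
    """Fraction of forecast events that did not occur."""
    return false_alarms / (hits + false_alarms)


# Two hypothetical systems with the same hits and misses (n = 1000) that
# differ only in how false alarms and correct negatives are split.
eds_cautious = extreme_dependency_score(30, 20, 60, 890)
eds_alarmist = extreme_dependency_score(30, 400, 60, 510)
far_cautious = false_alarm_ratio(30, 20)
far_alarmist = false_alarm_ratio(30, 400)
```

Both systems receive the same EDS although the second issues twenty times more false alarms, which is why a score that uses the false alarm count, such as FAR or BIAS, is needed alongside it.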

In that sense, [40] proposes the use of the maximum Gilbert skill score (GSS) as a trigger point to find the method that best balances the number of false alarms, misses and hits of drought events identified with the SPI. Looking at Figures 5 and A5, it is noted that the ensemble mean (EM\_RES) is the best choice for discriminating between seasonal drought and non-drought events at 3- and 6-month lead time, whilst keeping a low number of false alarms. Although the SpD gives the best POD, it also increases the ratio of false alarms and diminishes the overall skill score of the method. Following the approach by [40], we suggest that the ensemble mean should be used to trigger the warning of seasonal drought events for South-Central America by means of the SPI3 and SPI6 for respectively 3- and 6-month lead times.

#### *3.3. Probabilistic Forecasts of Categorical SPI Values*

In addition to having skillful forecasts of scalar SPI3 and SPI6 derived with the ECMWF S4 ensemble mean at seasonal lead times, a second fundamental challenge to generate reliable drought forecasts for the region is associated with uncertainties in the ensemble used. Therefore, to further quantify the uncertainties arising from the spread of the ensemble when computing the SPI, we computed the overall Brier Skill Score (BSS), based on the climatological frequency of "moderate", "severe" and "extreme" drought events (Table A1). In Figures 6 and 7, we map the spatial distribution of BSS for the ECMWF S4 SPI3 and SPI6 forecasts, respectively, measured in terms of the BS relative to climatological BS at a lead time of 3 and 6 months for the hindcast period 1981–2010. We have pooled together all seasons at each grid point.
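A minimal sketch of a Brier Skill Score calculation is given below. It assumes that the forecast probability of an event is the fraction of ensemble members at or below the SPI threshold, and that the climatological reference is a constant probability equal to the standard normal cumulative probability of that threshold (about 0.159 for SPI = −1); both choices are assumptions rather than details stated in the text. All function names and sample values are hypothetical.

```python
from statistics import NormalDist


def event_probability(members, threshold):
    """Fraction of ensemble members at or below the SPI threshold."""
    return sum(1 for member in members if member <= threshold) / len(members)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and outcome (0/1)."""
    squared = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared) / len(outcomes)


def brier_skill_score(probabilities, outcomes, climatological_probability):
    """BSS = 1 - BS / BS_clim, with a constant climatological probability."""
    bs_forecast = brier_score(probabilities, outcomes)
    bs_climatology = brier_score(
        [climatological_probability] * len(outcomes), outcomes)
    return 1.0 - bs_forecast / bs_climatology


moderate_threshold = -1.0
climatological_probability = NormalDist().cdf(moderate_threshold)
example_probability = event_probability([-1.5, -0.5, -1.2, 0.3], moderate_threshold)
# Hypothetical forecast probabilities and observed drought occurrences (1 = yes).
forecast_probabilities = [0.8, 0.1, 0.0, 0.6]
observed_events = [1, 0, 0, 1]
bss = brier_skill_score(forecast_probabilities, observed_events,
                        climatological_probability)
```

A positive BSS means the ensemble-derived probabilities are closer to the observed outcomes than the constant climatological probability; zero or negative values indicate no improvement over climatology.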

The spatial distribution of BSS suggests that the skill of the forecasting system is very similar for both accumulation periods and decreases with the increasing intensity of drought. Looking at the skill for predicting "moderate" drought events, the maps introduced in Figures 6 and 7 show that the forecasting system behaves better than the climatology for large clustered points at the North of South America, Mexico, Northeast of Argentina and Uruguay. In the latter regions, where a hot spot appears over La Plata basin, local feedbacks between soil properties and precipitation variability can explain the improved skill, which is linked to the coupling strength between soil moisture, evapotranspiration, and temperature [46,47]. On the other hand, the system skill for predicting "extreme" drought events is limited to a few locations in Northeast Brazil, Northeast Mexico, Northeast Amazon, and Northeast of Argentina. These results are encouraging, but only the Northeast of Mexico shows some spatial clustering with positive BSS for extreme drought events, while positive BSS is spatially scattered for the other regions. On combining these results, it can thus be reasonably assumed that forecasting different magnitudes of meteorological drought intensity on seasonal time scales remains quite challenging, but the ECMWF S4 forecasting system does at least a promising job in capturing the drought events (i.e., "moderate" drought) for some regions.

**Figure 6.** Brier Skill Score (BSS) of the European Centre for Medium-Range Weather Forecasts (ECMWF) S4 SPI3 forecast for different probabilities of SPI occurrence, at a lead time of 3 months for the hindcast period 1981–2010. Values are indicated in the color bar; land grid points colored in white indicate that the forecasting system is no more skillful than the climatology.

**Figure 7.** Brier Skill Score (BSS) of the ECMWF S4 SPI6 forecast for different probabilities of SPI occurrence, at a lead time of 6 months for the hindcast period 1981–2010. Values are indicated in the color bar; land grid points colored in white indicate that the forecasting system is no more skillful than the climatology.

It is interesting to note that the spatial pattern of positive BSS at different SPI categories closely matches the regions that show significant skill scores for non-probabilistic drought forecasts, as well as the geographic grid points that have the lowest monthly RMSEs (Figure 3). As expected, the BSS is lower for the locations where the scalar mismatch between the forecast and observations is larger, which implies more categorical misses and/or false alarms at any SPI intensity. Notwithstanding, since the increase of SPI intensity is accompanied by a decrease of the respective cumulative probability, it was expected that the BSS would decrease with an increase of the SPI drought category because there is a larger probability for mismatching.

To finalize the evaluation of seasonal drought forecasts with the ECMWF S4 data set for South-Central America, we proceed with the analysis of the Relative Operating Characteristic (ROC) of the forecasts. In Figures 8 and 9, we present the spatial distribution of the area under the ROC curve for the probability of drought detection at different SPI frequencies. The values are estimated considering the ECMWF S4 SPI3 and SPI6 forecasts at a lead time of 3 and 6 months respectively, for the hindcast period. We have pooled together all seasons at each 1.0° grid point in generating the maps of Figures 8 and 9. For the SPI3 and SPI6, for the "moderate" drought threshold, the area under the ROC curve at all grid points in South-Central America is well above the no-skill line, indicating that, despite the poor reliability measured by the BSS, the forecasting system does have some skill. Nevertheless, similarly to the BSS, we perceive that the regions in the North of South America, Northeast of Argentina and Mexico are more skillful than the remaining locations. As the intensity of drought increases, the usefulness of the forecasting system decreases both in magnitude and area. For "extreme" drought events, the grid points located in South, Central and Northeast of South America are not skillful, as the area under the ROC curve is below 0.5.
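The area under the ROC curve can be estimated without drawing the curve by using its rank interpretation: it equals the probability that a randomly chosen event case received a higher forecast probability than a randomly chosen non-event case, with ties counted as one half. The sketch below uses this pairwise formulation, which is an assumption about the computation rather than the procedure documented in the study; the function name and sample values are hypothetical.

```python
def roc_area(probabilities, outcomes):
    """Area under the ROC curve via pairwise comparison of forecast probabilities.

    outcomes holds 1 where the drought event was observed and 0 otherwise.
    A value of 0.5 corresponds to the no-skill line.
    """
    event_probs = [p for p, o in zip(probabilities, outcomes) if o == 1]
    non_event_probs = [p for p, o in zip(probabilities, outcomes) if o == 0]
    if not event_probs or not non_event_probs:
        raise ValueError("both events and non-events are required")
    wins = 0.0
    for p_event in event_probs:
        for p_non_event in non_event_probs:
            if p_event > p_non_event:
                wins += 1.0
            elif p_event == p_non_event:
                wins += 0.5
    return wins / (len(event_probs) * len(non_event_probs))


# Hypothetical forecast probabilities and observed drought occurrences.
area = roc_area([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Because the ROC area measures only how well events are ranked above non-events, it can exceed 0.5 even where the BSS is negative, which is consistent with the contrast between the two scores described above.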

**Figure 8.** Area under the Relative Operating Characteristic (ROC) curve for the probability of drought detection at different SPI3 frequencies. Values indicated in the color bar are estimated at lead time of 3 months for the hindcast period 1981–2010.

**Figure 9.** Area under the ROC curve for the probability of drought detection at different SPI6 frequencies. Values indicated in the color bar are estimated at lead time of 6 months for the hindcast period 1981–2010.
