Article

A Random Forest Method to Forecast Downbursts Based on Dual-Polarization Radar Signatures

by
Bruno L. Medina
1,*,
Lawrence D. Carey
1,
Corey G. Amiot
1,
Retha M. Mecikalski
1,
William P. Roeder
2,
Todd M. McNamara
2 and
Richard J. Blakeslee
3
1
Department of Atmospheric Science, The University of Alabama in Huntsville, Huntsville, AL 35899, USA
2
45th Weather Squadron, Patrick Air Force Base, FL 32925, USA
3
NASA Marshall Space Flight Center, Huntsville, AL 35805, USA
*
Author to whom correspondence should be addressed.
Remote Sens. 2019, 11(7), 826; https://doi.org/10.3390/rs11070826
Submission received: 13 March 2019 / Revised: 3 April 2019 / Accepted: 3 April 2019 / Published: 6 April 2019
(This article belongs to the Special Issue Radar Meteorology)

Abstract
The United States Air Force’s 45th Weather Squadron provides wind warnings, including those for downbursts, at the Cape Canaveral Air Force Station and Kennedy Space Center (CCAFS/KSC). This study aims to provide a Random Forest model that classifies thunderstorms’ downburst and null events using a 35-knot wind threshold to separate these two categories. The downburst occurrence was assessed using a dense network of wind observations around CCAFS/KSC. Eight dual-polarization radar signatures that are hypothesized to have physical implications for downbursts at the surface were automatically calculated for 209 storms and ingested into the Random Forest model. The Random Forest model predicted null events more accurately than downburst events, with a True Skill Statistic of 0.40. Strong downburst events were better classified than those with weaker wind magnitudes. The most important radar signatures were found to be the maximum vertically integrated ice and the peak reflectivity. The Random Forest model presented a more reliable performance than an automated prediction method based on thresholds of single radar signatures. Based on these results, the Random Forest method is suggested for continued operational development and testing.

1. Introduction

A downburst is characterized by the occurrence of divergent intense winds at or near the surface, which are produced by a thunderstorm’s downdraft [1,2]. This phenomenon can produce substantial surface damage, often similar to that of tornadoes [3]. A number of observational [4,5,6,7,8,9] and modeling [10,11,12,13,14] studies have been conducted to reveal the structure, dynamics, microphysics, and environmental conditions associated with a variety of convective downbursts. Precipitation microphysical processes such as precipitation loading [10], melting hailstones [6,12,15], and evaporation of raindrops [10,14,16] are important for downburst generation. Based on this understanding, automated Doppler radar algorithms for downburst detection have been developed in prior studies [17,18]. Recently, [19] used radar and environmental variables as input to different machine learning techniques to predict surface straight-line convective winds.
In addition to Doppler radar and environmental observations of downbursts, dual-polarization meteorological radar characteristics for downbursts have been described in recent decades. For example, the differential reflectivity (Zdr)-hole [6] is caused by melting hail within a downdraft and is characterized by a region of near-zero dB Zdr and high reflectivity (Zh) that is surrounded by larger Zdr and smaller Zh values. The mixed-phase hydrometeor region caused by hail melting [20,21] and loading [22] induces a localized reduction in the co-polar correlation coefficient (ρhv). In another study [8], a hydrometeor classification algorithm based on dual-polarization radar variables was utilized to identify a graupel region that transitioned to a rain and hail mixture, descending to the surface prior to the downburst.
The prognosis of intense winds is of substantial importance for operations at the Cape Canaveral Air Force Station and the National Aeronautics and Space Administration (NASA) Kennedy Space Center (CCAFS/KSC) in Florida. The United States Air Force’s 45th Weather Squadron (45WS) provides weather warnings for CCAFS/KSC. One of the 45WS operational tasks is to forecast winds greater than or equal to 35 kt with a desired lead time of 30 min, and winds greater than or equal to 50 kt with a desired lead time of 60 min, in order to protect personnel, infrastructure, space launch vehicles, and space mission payloads [23,24,25,26,27]. Currently, the 45WS probability of detection (POD) for convective thunderstorms capable of producing such winds is considered high, but the probability of false alarm (POFA, same as false alarm ratio [28]) is also high. The goal is to maintain a high POD, as well as high scores for other performance metrics such as the True Skill Statistic (TSS), while simultaneously reducing POFA for 45WS wind warnings [27].
Using dual-polarization radar signatures that have physical implications for high surface wind production, this study aims to improve the ability to distinguish convection with the potential to produce downburst winds greater than or equal to 35 kt from convection that does not produce such winds. The downburst verification dataset is obtained from a high-density network of observation towers around CCAFS/KSC, as discussed in Section 2.1, which allows for more robust quantitative observations than wind reports from human observers [29]. The radar signatures used in this study, described in Section 2.4, are hypothesized to be related to physical processes that lead to the subsequent development of downbursts at the surface. These signatures are input into a Random Forest model in order to train the model and obtain a prediction of either a wind event (greater than or equal to 35 kt) or a null event (i.e., a wind event less than 35 kt) for each storm in the dataset. The model also provides a measure of each radar signature’s importance, thus identifying the signatures with the strongest performance in the Random Forest (more in Section 2.6). The predictability of each radar signature is also tested using a simpler and more intuitive approach that applies thresholds to each signature individually. It is important to note that the spatial extent of each wind event is not addressed in this study, and hence no distinction was made between microbursts and macrobursts [2]. To our knowledge, this study is the first to apply dual-polarization radar variables as input to a statistical learning technique to predict downbursts validated against a dense network of wind observation towers.
This manuscript is organized as follows: Section 2 presents the materials and methods used in this study. Section 3 shows the Random Forest model results, the signatures that were most relevant to the model, and results from the threshold-based method for each individual radar signature. Section 4 contains a discussion of results and a comparison to other studies, and Section 5 presents conclusions and future work.

2. Materials and Methods

2.1. Cape WINDS Towers and Soundings

Weather observation towers around the CCAFS/KSC complex are used by the 45WS to monitor weather conditions. The Cape Weather Information Network Display System (Cape WINDS) is a network of 29 towers that measures, among other variables, temperature, dew point temperature, peak wind velocity, and mean wind direction. The average station density is one tower per 29 km2 [30], and their locations around the CCAFS/KSC complex are shown in Figure 1. Most towers contain multiple sensors located at different heights above ground level [30]. In this study, the peak wind velocity in a 5-min period was used to determine whether the 35 kt wind threshold was met at any tower, and the mean wind direction during the 5-min period was used to help identify the convective cell that produced the downburst. A wind observation recorded by the Cape WINDS network was assumed to occur at the median time of 2.5 min after the start of the reporting period.
Data from KXMR soundings launched at CCAFS, typically at 00:00, 10:00, and 15:00 UTC each day, were available for this study. This dataset was primarily used to extract specific isotherm heights, such as 0 °C, −10 °C, and −40 °C, which were used in the implementation of some radar parameters, as discussed in Section 2.4. For a given storm, the isotherm heights were taken from the sounding nearest in time to the majority of the storm’s life cycle.
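As an illustration of the isotherm-height extraction, the snippet below linearly interpolates the 0 °C, −10 °C, and −40 °C heights from a sounding profile. The profile and function name are illustrative assumptions, not part of the study’s actual processing code:

```python
import numpy as np

def isotherm_height(temp_c, height_m, target_c):
    """Linearly interpolate the height (m) at which the sounding temperature
    equals target_c (deg C). Assumes temperature decreases monotonically
    with height in the layer of interest."""
    # np.interp needs ascending x, so flip the profile (temp falls with height)
    return float(np.interp(target_c, temp_c[::-1], height_m[::-1]))

# Hypothetical simplified sounding: 25 deg C at the surface, ~6.5 K/km lapse
height = np.arange(0, 16001, 500.0)      # m AGL
temp = 25.0 - 6.5e-3 * height            # deg C

h_0 = isotherm_height(temp, height, 0.0)      # ~3846 m
h_m10 = isotherm_height(temp, height, -10.0)  # ~5385 m
h_m40 = isotherm_height(temp, height, -40.0)  # ~10000 m
```

These heights then feed directly into the isotherm-bounded signatures described in Section 2.4.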

2.2. C-Band Radar and Processing

A Radtec Titan Doppler Radar, officially named Weather Surveillance Radar (herein 45WS-WSR), is a C-band dual-polarization radar operated by the 45WS to provide weather support to the CCAFS/KSC complex. It operates with a 0.95° beamwidth, 5.33 cm wavelength, 24 samples per pulse, and peak transmitted power of 250 kW [31]. The radar is located about 42 km southwest from the CCAFS/KSC launch towers, which leads to a horizontal beam width of approximately 600 m and peak vertical gap between radar beams of roughly 700 m over the CCAFS/KSC complex [31] (Figure 1). Thirteen elevation angles ranging from 0.2° to 28.3° comprise a volume scan, which takes 2.65 min to complete [32]. Quality control, such as differential attenuation correction, was applied to the raw data prior to their acquisition for this study.
The raw radar data were gridded to a Cartesian coordinate system with a 500 m grid resolution, 1 km constant radius of influence, and a Cressman weighting function [33] using the Python ARM Radar Toolkit [34]. The gridding was performed on linear Zh and Zdr, which were then converted back to logarithmic Zh and Zdr. The data were gridded out to 100 km north, south, east, and west from the 45WS-WSR and 17 km in the vertical direction. These gridding attributes were selected based on the radar beam width and vertical spacing between radar beams over CCAFS/KSC, and through an empirical analysis using different gridding techniques performed by [31].
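The Cressman scheme weights each radar gate by its distance from the grid point. A minimal numpy sketch of this weighting, including the linear-to-logarithmic Zh handling described above, might look like the following; Py-ART’s gridding routines perform this internally, so the function here is illustrative only:

```python
import numpy as np

R = 1000.0  # radius of influence in meters (1 km, as in the gridding setup)

def cressman_weight(r, R=R):
    """Cressman weight: (R^2 - r^2) / (R^2 + r^2) inside the radius of
    influence, zero outside."""
    r = np.asarray(r, dtype=float)
    w = (R**2 - r**2) / (R**2 + r**2)
    return np.where(r < R, w, 0.0)

def grid_point_dbz(dbz_gates, distances_m):
    """Weighted average of nearby radar gates, performed in linear
    reflectivity and converted back to dBZ afterward."""
    w = cressman_weight(distances_m)
    z_lin = 10.0 ** (np.asarray(dbz_gates) / 10.0)   # dBZ -> mm^6 m^-3
    z_avg = np.sum(w * z_lin) / np.sum(w)
    return 10.0 * np.log10(z_avg)                    # back to dBZ

# Three hypothetical gates at 200, 500, and 900 m from a grid point
val = grid_point_dbz([45.0, 40.0, 35.0], [200.0, 500.0, 900.0])
```

Averaging in linear units before converting back to dBZ avoids the low bias that averaging logarithmic values directly would introduce.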
The radar variables used in this study were Zh and Zdr. An evident reduction in ρhv values is typically observed from this radar, possibly because of the low number of samples per pulse in 45WS-WSR operations. Values of ρhv were often below 0.80 in mixed-phase precipitation and below 0.60 in very heterogeneous mixtures of precipitation [31]. For these reasons, ρhv data were not used in this study.

2.3. Wind and Null Events

The 2015 and 2016 warm seasons (May through September) comprised the study period. In order to identify the convective cells that caused winds ≥ 35 kt, hereafter ‘wind events’, the Cape WINDS towers were first analyzed to identify observations of wind above that threshold. It is important to note that the 45WS treats the 35 kt value as a hard threshold for its warnings, despite the sensors’ stated measurement accuracy over the 0–39 kt range; we therefore use the same hard threshold in this study. The timing of each wind observation was then compared to the radar data timing. The time of a radar volume scan was taken as the median time within the volume scan’s 2.65 min duration (i.e., approximately 1 min and 20 s after the volume scan initiation time). Next, each wind observation was associated with a single radar volume scan. The wind direction was used to help determine which convective cell was associated with an observed downburst. The convective cell had to be located within 10 km of the Cape WINDS tower that observed the wind ≥ 35 kt at the moment the downburst occurred. If these requirements were all met, the cell was manually tracked backward in time, and its life cycle had to last at least 30 min. A box was subjectively defined around the cell throughout its life cycle, ignoring its history after the downburst time. If the cell’s 40 dBZ reflectivity contour merged with another cell at any height level, both storms were considered as one. Cells were tracked until their initiation or until the radar range distance of 67 km (Figure 1), because vertical gaps in the gridded data become significant beyond this distance [31]. An example of a wind event is shown in Figure 2, with a red box representing the cell’s spatial definition resulting from manual storm tracking. High winds associated with hurricanes, as well as consistently high-biased values at a single instrument not verified by neighboring sensors, were discarded.
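The time matching described above can be sketched as follows, pairing a wind observation’s median time (2.5 min after the start of the 5-min period) with the volume scan whose median time (about 80 s after scan start) is closest. The times and helper names are hypothetical:

```python
from datetime import datetime, timedelta

def observation_time(period_start):
    """Median time of a 5-min Cape WINDS reporting period."""
    return period_start + timedelta(minutes=2.5)

def scan_time(volume_start):
    """Median time of a 2.65-min volume scan (~1 min 20 s after start)."""
    return volume_start + timedelta(seconds=2.65 * 60 / 2)

def nearest_scan(obs_period_start, volume_starts):
    """Return the volume-scan start time whose median time is closest to
    the wind observation's median time."""
    obs_t = observation_time(obs_period_start)
    return min(volume_starts, key=lambda v: abs(scan_time(v) - obs_t))

# Hypothetical scans every 3 min and an observation period starting 20:02 UTC
scans = [datetime(2015, 6, 9, 20, m) for m in (0, 3, 6, 9)]
best = nearest_scan(datetime(2015, 6, 9, 20, 2), scans)
```

Here the observation’s median time is 20:04:30, so the 20:03 scan (median ~20:04:20) is selected.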
Convective cells that did not produce such high winds (i.e., <35 kt; hereafter ‘null events’) were also identified in order to differentiate them from the wind events and to train the Random Forest model. Null cases were selected from convective cells that passed through the Cape WINDS area (within 15 km of any tower) and did not produce a Cape WINDS wind observation ≥ 35 kt. The entire life cycle of each null event was considered, and it had to last at least 25 min. An additional requirement for null event identification was that 40 dBZ Zh had to be observed at some altitude for a minimum period of 10 min.

2.4. Dual-Polarization Radar Signatures

Once the radar data were gridded and the convective cells were identified and tracked, a large number of radar parameters (i.e., signatures) were calculated for every wind and null case. This method can be referred to as ‘semi-automated analysis’, since storms were manually tracked while radar signatures were automatically and objectively calculated for all storms. About 50 signatures were initially considered, each tied to a physical process hypothesized to be directly or indirectly related to a future occurrence of a downburst, as reviewed in Section 1. A considerable fraction of the parameters represented the same process, differing only in the radar threshold used. As an example, a signature that uses both Zh and Zdr data to identify precipitation ice was tested using different thresholds of Zh. In an attempt to reduce the amount of redundant information among the numerous signatures, a correlation analysis was performed. For large correlations (i.e., 0.70 or higher) between two radar signatures, only one signature was kept for further study: the one with the lowest correlations with all other radar signatures examined. After this first reduction, a Principal Component Analysis (PCA) [35] was performed to identify the variables that explained the most variance. The signatures with relatively large correlation (i.e., 0.60 or higher) with the first principal component—which explains the most variance in the dataset—were selected as the final radar signatures. The number of radar signatures was ultimately reduced to eight, all based on the radar variables Zh and/or Zdr. The parameters are listed in Table 1 and described in detail below.
Signature #1 implies that a storm’s updraft lifts a significant amount of liquid hydrometeors, such as raindrops, above the 0 °C level, creating a column of Zdr ≥ 1 dB at sufficient reflectivity (Zh ≥ 30 dBZ). A Zdr column’s height is associated with updraft strength and storm intensity [36,37,38]. The freezing of these hydrometeors at sub-freezing environmental temperatures eventually produces ice particles, which may contribute to downburst formation. After identifying the 0 °C isotherm height using the KXMR sounding data, it was verified whether a single gridded column had continuous Zdr values ≥ 1 dB from this height upward. The maximum column top height was recorded as the storm’s Zdr column height. A 30 dBZ Zh filter was applied to avoid erroneous updraft identification at the edges of storms, where positive Zdr values are also common. It is hypothesized that a higher maximum Zdr column height leads to greater production of precipitation ice and hence a greater potential for downburst occurrence at the surface through melting and loading of these hydrometeors.
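A minimal sketch of the Zdr column-height check, assuming a 500 m gridded column and hypothetical data, could be:

```python
import numpy as np

def zdr_column_top(zdr_col, zh_col, heights_m, h_freeze_m,
                   zdr_min=1.0, zh_min=30.0):
    """Top height of a contiguous Zdr column: starting at the first grid
    level at/above the 0 deg C height, climb while Zdr >= 1 dB and
    Zh >= 30 dBZ. Returns the last qualifying height, or None."""
    start = int(np.searchsorted(heights_m, h_freeze_m))
    top = None
    for k in range(start, len(heights_m)):
        if zdr_col[k] >= zdr_min and zh_col[k] >= zh_min:
            top = heights_m[k]
        else:
            break  # column must be continuous upward from the 0 deg C level
    return top

# Hypothetical gridded column (500 m spacing) with a 0 deg C level at 4 km
heights = np.arange(0, 10001, 500.0)
zdr = np.array([2.5] * 9 + [1.8, 1.2, 0.4] + [0.0] * 9)
zh = np.full(heights.size, 45.0)
top = zdr_column_top(zdr, zh, heights, h_freeze_m=4000.0)  # 5000.0 m
```

The storm’s Signature #1 value would then be the maximum such top over all grid columns and volume scans.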
The lifted liquid hydrometeors eventually freeze near the Zdr column’s upper boundary, serving as embryos that can produce precipitation ice such as graupel and hail [39]. The increase in the amount of precipitation ice above the 0 °C level is represented by Signatures #2 and #3. Signature #2, also called the precipitation ice signature [31], is the maximum height at which −1 dB ≤ Zdr ≤ +1 dB is co-located with Zh ≥ 30 dBZ [38,40]. Signature #3 is the maximum vertically integrated ice (VII), a reflectivity-integrated signature that estimates the amount of precipitation ice between the −10 °C and −40 °C isotherms in units of kg m−2 [41,42]. It is hypothesized that a higher vertical extent of precipitation ice and a larger amount of reflectivity-integrated ice indicate sufficient precipitation ice growth in both size and quantity, as well as an increase in hydrometeor loading and negative buoyancy. The VII expression is shown in Equation (1).
VII = π ρi N0^(3/7) [(5.28 × 10^−18)/720]^(4/7) ∫ from h(−10 °C) to h(−40 °C) of zh^(4/7) dh    (1)
where ρi is the density of ice and N0 is the intercept parameter, assumed to be 917 kg m−3 and 4 × 10^6 m−4, respectively; zh is the linear reflectivity (in mm^6 m−3); and h is the height of the specified isotherms in meters [41,42].
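A discrete approximation of Equation (1) on a gridded reflectivity column, using the trapezoidal rule between the two isotherm heights, might look like this; the example column is hypothetical:

```python
import numpy as np

RHO_I = 917.0   # ice density (kg m^-3), as assumed in Equation (1)
N0 = 4.0e6      # intercept parameter (m^-4)

def vii(dbz_col, heights_m, h_m10, h_m40):
    """Discrete form of Equation (1): trapezoidal integration of zh^(4/7)
    between the -10 and -40 deg C isotherm heights (meters); kg m^-2."""
    coeff = np.pi * RHO_I * N0 ** (3.0 / 7.0) \
            * (5.28e-18 / 720.0) ** (4.0 / 7.0)
    mask = (heights_m >= h_m10) & (heights_m <= h_m40)
    z47 = (10.0 ** (np.asarray(dbz_col)[mask] / 10.0)) ** (4.0 / 7.0)
    h = heights_m[mask]
    # trapezoidal rule over the masked layer
    return coeff * 0.5 * np.sum((z47[1:] + z47[:-1]) * np.diff(h))

# Hypothetical column: 40 dBZ everywhere, isotherms at 5.5 and 10 km
heights = np.arange(0, 15001, 500.0)
dbz = np.full(heights.size, 40.0)
val = vii(dbz, heights, h_m10=5500.0, h_m40=10000.0)   # ~5 kg m^-2
```

Values of a few kg m−2 are consistent with the VII thresholds discussed in Section 3.2.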
Signatures #4 and #5 are indirectly related to the ice calculation. A higher altitude of the peak Zh (Signature #4) and the peak Zh value above the 0°C isotherm (Signature #5) are associated with the number and concentration of hydrometeors at high levels, which are usually associated with precipitation ice loading that may produce negative buoyancy [23].
Signatures #6–#8 are reflectivity-based parameters that consider the entire storm in their calculations. The number and concentration of all hydrometeor types at all height levels contribute to these signatures. A larger value for any of these three signatures is likely related to larger hydrometeor loading and an increased likelihood of downburst generation. Signature #6 is the peak Zh in the storm, which can occur at any height level, even below the 0 °C level. Similar to Signature #3, the VIL signature (Signature #7) is an integration of zh through the storm’s depth, as shown in Equation (2), in units of kg m−2 [43].
VIL = 3.44 × 10^−6 ∫ zh^(4/7) dh    (2)
Signature #8 is Density of VIL (DVIL) in units of g m−3, which is simply VIL/echotop, with echotop being defined as the storm’s maximum 18 dBZ Zh height in km [44].
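Equation (2) and the DVIL definition can be sketched in the same way; the example column below is hypothetical:

```python
import numpy as np

def vil(dbz_col, heights_m):
    """Equation (2): VIL = 3.44e-6 times the vertical integral of zh^(4/7),
    with zh the linear reflectivity (mm^6 m^-3); result in kg m^-2."""
    z47 = (10.0 ** (np.asarray(dbz_col) / 10.0)) ** (4.0 / 7.0)
    return 3.44e-6 * 0.5 * np.sum((z47[1:] + z47[:-1]) * np.diff(heights_m))

def dvil(dbz_col, heights_m, echo_dbz=18.0):
    """DVIL = VIL / echo top, where echo top is the maximum height (km)
    with Zh >= 18 dBZ; result in g m^-3."""
    idx = np.nonzero(np.asarray(dbz_col) >= echo_dbz)[0]
    echotop_km = heights_m[idx.max()] / 1000.0
    return vil(dbz_col, heights_m) / echotop_km

# Hypothetical storm column: constant 40 dBZ from 0 to 10 km AGL
heights = np.arange(0, 10001, 500.0)
dbz = np.full(heights.size, 40.0)
v = vil(dbz, heights)   # ~6.6 kg m^-2
d = dvil(dbz, heights)  # ~0.66 g m^-3
```

Note the units work out directly: kg m−2 divided by an echo top in km gives g m−3.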
Figure 3 highlights most of the aforementioned radar signatures for a wind event that occurred on 9 June 2015. It consists of a Zdr vertical cross-section at the location marked with a black line in Figure 2. A Zdr column (Signature #1) can be seen as warm colors about 10 km east of the radar, extending approximately 1.5 km above the 0 °C isotherm height, which is marked as a blue horizontal line. The precipitation ice signature (Signature #2) can be seen as Zdr ~ 0 dB (gray colors) co-located with Zh ≥ 30 dBZ (black contours). This signature reaches its maximum height of 8.5 km AGL about 11 km east of the radar. Other signatures, such as the peak Zh and its height above ground level, can also be inferred from this plot.

2.5. Random Forest

This study uses a Random Forest model for training and forecasting of wind events. Random Forest is a tree-based method that combines multiple Decision Trees [45,46,47,48]. A Decision Tree consists of a series of splitting rules that stratify observations into nodes, using the predictors that best split the observations. In our study, the radar signatures’ maximum values through a tracked storm’s life cycle are used as inputs to the model, and classification trees are used to discriminate between wind and null events. A Random Forest builds hundreds of Decision Trees, each fit to a different storm sample (about two-thirds) drawn from the total storm dataset. Each Decision Tree is a separate model, and the predictions of all trees are averaged to reduce variance, which is high for any single tree; the averaging is effective because the individual trees are not highly correlated. In addition, Random Forest considers only a small random subset of predictors as split candidates at every tree node. Using a limited number of predictors as split candidates further decorrelates the trees and usually yields smaller errors than considering all predictors (so-called bagged trees), so averaging the resulting trees leads to an even larger reduction in variance.
In order to implement the Random Forest model, the R package randomForest was used [49], in which 500 trees were built using the entire set of storms as the training dataset. Two predictors were used as split candidates, consistent with the Random Forest default of using approximately the square root of the total number of available predictors [46]. No separate testing dataset was defined because the model’s error can be obtained from the set of storms not used in each tree’s construction, called out-of-bag (OOB) storms. As previously mentioned, each tree uses approximately two-thirds of the storm sample, chosen randomly; storms not used to fit a given tree are its out-of-bag observations. As a result, each storm was out-of-bag for approximately one-third of the trees. All predictions for a given OOB storm are counted, and the majority vote among those trees is taken as the Random Forest’s single prediction for that storm. For example, a vote equal to 0.6 for a given storm means that 60% of trees predicted that storm to be a wind event, while the other 40% predicted it to be a null event. The classification is made based on whichever class receives a vote greater than 0.5. In this way, every storm has a wind/null prediction from a model that used the entire storm dataset for training, without the need for a separate testing dataset; this OOB methodology is effectively equivalent to applying the model to a held-out testing dataset.
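The study used the R package randomForest; an analogous sketch in Python with scikit-learn (an assumption for illustration, not the authors’ code) reproduces the key settings of 500 trees, roughly two split candidates per node, and OOB voting:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 209 storms x 8 radar signatures, with class
# 1 = wind event and 0 = null event (not the study's actual measurements)
X = rng.normal(size=(209, 8))
y = (X[:, 2] + X[:, 5] + 0.5 * rng.normal(size=209) > 0).astype(int)

model = RandomForestClassifier(
    n_estimators=500,     # 500 trees, as in the study
    max_features="sqrt",  # ~sqrt(8) = 2-3 predictors per split
    oob_score=True,       # evaluate on out-of-bag storms
    random_state=0,
).fit(X, y)

votes = model.oob_decision_function_[:, 1]  # fraction of OOB trees voting 'wind'
pred = (votes > 0.5).astype(int)            # majority-vote classification
oob_error = 1.0 - model.oob_score_          # OOB estimate of error rate
```

The `votes` array plays the same role as the OOB votes discussed in Section 3.1.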
A classification prediction is obtained for each storm, and a summary of all storm predictions can be displayed in a simple contingency table, or confusion matrix, from which performance metrics can be calculated [50]. The most intuitive metric for wind event predictability is the Probability of Detection (POD), the number of correct wind event forecasts divided by the total number of wind event observations. The Probability of False Alarm (POFA, same as false alarm ratio) is also used in this study, which is the number of incorrect wind forecasts divided by the total number of wind forecasts. The False Alarm Rate (F) is the number of incorrect wind forecasts divided by the total number of null observations. F is an analog of POD: it is the fraction of null events forecast incorrectly, while POD is the fraction of wind events forecast correctly. For that reason, the TSS is the main metric used in this study to evaluate the predictability of a model, since its formula simplifies to the difference between POD and F. Thus, TSS is a simple and relevant measure of model performance because it weighs the predictability of wind and null events equally, independent of the size of each dataset. A secondary metric used in this study for Random Forest predictability is the OOB estimate of error rate, which is the number of incorrect wind and null predictions divided by the total number of events. This is equivalent to 1-PC, where PC is the Proportion Correct: the number of correct wind and null predictions divided by the total number of events. This metric differs from TSS in that each event, wind or null, is weighted equally in its computation; consequently, if one class (wind or null) is larger than the other, that class is weighted more heavily in the OOB estimate of error rate (or 1-PC) calculation.
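These metrics follow directly from the 2 × 2 contingency table. The sketch below computes them, using the Random Forest OOB counts reported in Section 3.1 (49 hits, 23 false alarms, 35 misses, 102 correct nulls) as a worked check:

```python
def contingency_metrics(a, b, c, d):
    """Metrics from a 2x2 contingency table:
    a = hits (wind forecast, wind observed)
    b = false alarms (wind forecast, null observed)
    c = misses (null forecast, wind observed)
    d = correct nulls (null forecast, null observed)."""
    pod = a / (a + c)               # Probability of Detection
    pofa = b / (a + b)              # Probability of False Alarm (= FAR)
    f = b / (b + d)                 # False Alarm Rate
    tss = pod - f                   # True Skill Statistic
    pc = (a + d) / (a + b + c + d)  # Proportion Correct
    return {"POD": pod, "POFA": pofa, "F": f, "TSS": tss, "PC": pc}

m = contingency_metrics(49, 23, 35, 102)
# POD ~ 0.58, POFA ~ 0.32, F ~ 0.18, TSS ~ 0.40, 1 - PC ~ 0.28
```

These reproduce the POD, POFA, F, TSS, and OOB error values reported for the model in Section 3.1.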

2.6. Mean Decrease Accuracy and Mean Decrease Gini

Since Random Forest builds hundreds of trees during model development, it is not straightforward to determine which signatures contributed most to the model’s performance. However, two methods that quantify the signatures’ importance across all trees are available when running the model [46]. The Mean Decrease Accuracy (MDA) is obtained by recording the OOB error for a given tree, and then recording it again after permuting each signature in turn. The difference between the two results is calculated for each tree, and the differences across all trees are averaged and normalized by their standard deviation. A large MDA value indicates a significant decrease in model accuracy once the signature is permuted, identifying it as an important signature.
The Mean Decrease Gini (MDG) is the second method for obtaining the signatures’ importance. The Gini index is a measure of node purity, being small for a node with a dominant class (i.e., when either wind or null events predominate among the observations at that node). MDG is the decrease in the Gini index summed over all splits on a given signature within a tree, averaged over all trees. As with MDA, a large MDG value indicates an important predictor. Both variable-importance measures were calculated in order to evaluate the most important signatures for the Random Forest model.
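scikit-learn exposes analogous importance measures: impurity-based importances (an MDG analog) and permutation importance (an MDA analog). The sketch below uses synthetic data in which only one feature is informative by construction, so both measures should rank it first:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = (X[:, 2] > 0).astype(int)   # only feature 2 carries signal

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

gini_imp = rf.feature_importances_  # MDG analog: mean decrease in impurity
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=1)
mda_like = perm.importances_mean    # MDA analog: accuracy drop when permuted

top_feature = int(np.argmax(mda_like))  # expected: index 2
```

In the study’s terms, Table 3 reports exactly these two rankings for the eight radar signatures.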

2.7. Single Signature Predictability

A simple method to determine the predictability of each individual radar signature was performed for comparison with the Random Forest model results. The predictability of each signature in Table 1 was tested by applying different thresholds to each signature across all wind and null events. It was verified whether a given threshold was met before the downburst time for wind events, or at any time during the life cycle of null events. From these predictions, statistics were compiled in a contingency table and performance metrics were calculated. These were the same metrics presented in Section 2.5, with TSS the primary metric used to compare the single-signature and Random Forest results.
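The threshold sweep can be sketched as below, scoring each candidate threshold with TSS = POD − F over per-storm signature maxima; the peak-Zh values shown are hypothetical:

```python
import numpy as np

def threshold_sweep(sig_max_wind, sig_max_null, thresholds):
    """For each threshold, forecast 'wind' when a storm's signature maximum
    meets the threshold, then score the forecasts with POD, F, and TSS."""
    out = []
    for t in thresholds:
        a = np.sum(sig_max_wind >= t)  # hits
        c = np.sum(sig_max_wind < t)   # misses
        b = np.sum(sig_max_null >= t)  # false alarms
        d = np.sum(sig_max_null < t)   # correct nulls
        pod = a / (a + c)
        f = b / (b + d)
        out.append((t, pod, f, pod - f))  # TSS = POD - F
    return out

# Hypothetical peak-Zh maxima (dBZ) for wind and null storms
wind = np.array([55.0, 58.0, 60.0, 53.0, 49.0, 62.0, 57.0, 51.0])
null = np.array([45.0, 50.0, 48.0, 52.0, 43.0, 47.0, 55.0, 44.0, 46.0, 49.0])

best = max(threshold_sweep(wind, null, np.arange(40.0, 65.0, 1.0)),
           key=lambda r: r[3])  # threshold with the highest TSS
```

As in Figure 5, raising the threshold lowers both POD and the false-alarm metrics, and the best TSS falls at an intermediate threshold.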

3. Results

3.1. Random Forest

Using the methods described in Section 2.3, a total of 84 wind events and 125 null events were identified from the 2015 and 2016 warm seasons. Table 2 presents the Random Forest’s out-of-bag confusion matrix, or contingency table, showing the number of correct and incorrect predictions for all wind and null events. For wind events, the Random Forest model predicted 49 of the 84 events correctly, yielding a POD of 58%. For null events, the model correctly determined 102 of 125 events; that is, 82% of null events were correctly predicted, corresponding to an F of 18% (note that this is not the same as POFA). The Random Forest prediction of null events is noticeably better than the prediction of wind events. In total, 58 of all 209 events were incorrectly predicted, an OOB estimate of error rate of 28%. The POFA for the model is 32%. The resulting TSS for the Random Forest model is 0.40, which is in the range of TSS values considered marginal for operational utility by the 45WS (i.e., 0.3 to 0.5) [24].
The OOB votes for each storm can also be accessed from the Random Forest model. Votes are the fraction of trees that predicted a given storm as a wind event, considering all trees that did not use that storm for training. In a classification Random Forest, a storm with a vote greater than 0.5 is considered a wind event; in this way, votes may be interpreted as a qualitative ‘probability’ that a storm will become a wind event. Figure 4 shows every storm’s maximum wind magnitude measured by the Cape WINDS network as a function of its Random Forest vote. The vertical line depicts the wind event threshold of 35 kt, separating wind events (right) from null events (left). The horizontal line at a vote of 0.5 separates the Random Forest’s wind predictions (above) from its null predictions (below). The upper-right and lower-left portions of the plot represent the Random Forest’s correct predictions, in the same manner as Table 2. The upper-left and lower-right sections of Figure 4 represent the false alarms and misses of the model, respectively, i.e., the Random Forest’s incorrect predictions. In the lower-left quadrant, the correct negative events are numerous and spread over most of the quadrant area. Few null events were incorrectly identified by the Random Forest as wind events, as can be seen in the upper-left quadrant. A significant number of storms produced peak winds around 35 kt, near the wind magnitude threshold that separates wind events from null events. The Random Forest model struggled to predict these borderline events as either wind or null, as evidenced by the wide range of vote values. For events that produced peak winds between 35 kt and 40 kt, 38 out of 66 (58%) were correctly identified as wind events.
Storms with a maximum wind magnitude greater than 40 kt were less numerous, but the Random Forest model classified them more accurately than events with peak winds between 35 kt and 40 kt: of the eighteen storms with winds greater than 40 kt, the model correctly classified 11 (61%) as wind events. Based on these results, the POD of wind events increased with increasing downburst strength. This is consistent with the tendency for Random Forest votes to increase with wind magnitude, despite some outlier events.
The Mean Decrease Accuracy (MDA) and the Mean Decrease Gini (MDG) values for each signature are shown in Table 3. A large MDA and MDG value indicates a high importance of the radar signature for the Random Forest. The two most important signatures were VII and peak Zh over the entire cell. VII is the signature with the highest MDG and second-highest MDA, while peak Zh over the entire convective cell is the signature with the highest MDA and second-highest MDG. The two signatures with the lowest MDA and MDG are the height of precipitation ice and the height of peak Zh, with the latter yielding a negative MDA.

3.2. Single Signatures

The individual predictability of each of the eight signatures was computed by defining thresholds for each signature and verifying whether a signature value greater than a given threshold occurred at least once before a wind event’s downburst time, or at any time during a null event’s life cycle. This procedure was applied to all 209 storms, the same dataset used in the Random Forest simulation. From these predictions of the wind and null events, performance metrics were obtained to evaluate each signature’s predictability over a range of physically realistic thresholds. The main metric used for comparison with the Random Forest simulations was TSS. The quantity 1-PC was also calculated because it is equivalent to the Random Forest’s OOB estimate of error rate. Lastly, the well-known POD and POFA were calculated as well.
Figure 5 shows the performance metrics for different thresholds for all eight radar signatures. As expected, POD and POFA generally decrease as the signatures’ thresholds increase. The maximum TSS observed for each signature was between 0.35 and 0.40 for six out of the eight signatures. The highest TSS among all signatures and thresholds tested is 0.43, which was observed for a threshold of 52 dBZ for the peak storm Zh at any height (Signature #6, Figure 5f). This specific signature’s threshold presented POD, POFA, and 1-PC values equal to 0.83, 0.42, and 0.31, respectively. The signature that presented the smallest maximum TSS was the height of peak Zh (Signature #4, Figure 5d), which was 0.29 at a threshold of 1250 m above the 0 °C isotherm height.
In general, the 1-PC curves in Figure 5 are approximately anticorrelated with the TSS curves, since a lower 1-PC value means a better prediction, while for TSS a larger value indicates a better prediction. For Signatures #1 and #8, the minimum 1-PC is found at the same threshold as the maximum TSS. For the Zdr column signature (S#1), the maximum TSS and minimum 1-PC occur at a threshold of 2750 m (TSS of 0.36 and 1-PC of 0.27), but this threshold presented an undesirable POD below 50% (POD of 0.43). The DVIL signature (S#8) has a maximum TSS of 0.39 and a minimum 1-PC of 0.26 at a threshold of 1.9 kg m−2, but its POD is also below 50% (POD of 0.49).
The VII signature (S#3) presents its maximum TSS and minimum 1-PC at the same threshold of 4 kg m−2, where TSS is 0.40 and 1-PC is 0.29. However, the thresholds of 4.5 and 5.5 kg m−2 have the same minimum 1-PC but lower TSS, POD, and POFA (Figure 5c). For the other five signatures (S#2, S#4–S#7), the minimum 1-PC occurs at higher thresholds than the maximum TSS, resulting in lower TSS, POD, and POFA at the thresholds with the minimum 1-PC. In addition, VIL (S#7) presented more than one threshold with the same minimum 1-PC value, with 16 and 17 kg m−2 both having a 1-PC of 0.27.
The maximum TSS for each signature is shown in Figure 6, which is organized in terms of POD, POFA, and TSS. TSS increases toward the top left of the plot and is negative (i.e., worse than a random forecast) to the right of POFA equal to 0.6. As previously mentioned, two signatures had maximum TSS for thresholds with POD of less than 0.5. The other six signatures presented a maximum TSS for thresholds with POD of greater than 0.5, but with a relatively high POFA around 0.4.

4. Discussion

The Random Forest OOB prediction for wind and null events presents better performance metrics than most of the single-signature predictions described in Section 3.2. The Random Forest correctly depicted 58% of wind events and 82% of null events, for an overall correct prediction of 72% of all events. In this study, the main performance metric used for predictability analysis is the TSS, which weighs each storm category (winds and nulls) equally. In the TSS equation, half of the formulation comes from the wind events’ predictability (a/(a+c); see Table 2), while the other half considers the null events’ predictability (b/(b+d)). In this way, the TSS is independent of how much larger one category is than the other. The other performance metric used in this study is the Random Forest’s OOB estimate of error rate, or 1-PC for single-signature predictions. Both are computed as the number of storms incorrectly predicted divided by the total number of events, so every storm is weighted equally regardless of whether it is a wind or a null event. Since the null dataset comprises almost 60% of the entire dataset, the TSS therefore weights wind events more heavily than the OOB estimate of error rate does.
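As a concrete check, the counts in the OOB confusion matrix (Table 2) reproduce the percentages quoted above; a short Python sketch (variable names are ours):

```python
# Counts from the out-of-bag confusion matrix (Table 2):
# a = wind correctly predicted, b = null predicted as wind,
# c = wind predicted as null,  d = null correctly predicted
a, b, c, d = 49, 23, 35, 102

pod_wind   = a / (a + c)                 # wind-event predictability, ~58%
pod_null   = d / (b + d)                 # null-event predictability, ~82%
tss        = a / (a + c) - b / (b + d)   # weighs both categories equally
error_rate = (b + c) / (a + b + c + d)   # OOB estimate of error rate (1 - PC)

print(round(pod_wind, 2), round(pod_null, 2), round(tss, 2), round(error_rate, 2))
# -> 0.58 0.82 0.4 0.28
```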
The Random Forest’s TSS of 0.40 is larger than the best TSS of most single signatures. The only single-signature threshold with a larger TSS than the Random Forest OOB estimate is the maximum Zh over the entire storm (Signature #6) at the 52 dBZ threshold. This threshold presented a TSS of 0.43 due to its relatively high wind event predictability (POD of 0.83). However, its null event predictability is worse than that of the Random Forest model, since it predicted only 60% of these events correctly; accordingly, the F and the POFA were 0.40 and 0.42, respectively. Thresholds below 52 dBZ showed higher F, while thresholds above 52 dBZ presented smaller POD, with both patterns leading to smaller TSS, as shown in Figure 5f. In contrast to this single radar signature, the Random Forest model shows much better prediction of null events but poorer wind event prediction, leading to a slightly lower TSS. The single-parameter approach is simpler to apply operationally, but it does not separate null events from wind events as well as the multi-parameter Random Forest model. Also, a 1 dB variation from this signature threshold, which is within the Zh measurement error, leads to a TSS lower than the Random Forest’s. Hence, the Random Forest model is preferred because it is more robust than the simpler single-signature approach. However, the user should consider whether wind event detection is preferred over correct null event classification, or whether a low F is more important for operational applications.
A VII threshold of 4 kg m−2 presented exactly the same TSS as the Random Forest multi-parameter model. However, this signature’s POD and POFA are slightly larger (0.63 and 0.35, respectively) than those of the Random Forest. Similar to the Signature #6 case, a variation of only 0.5 kg m−2 in the VII threshold produces a poorer TSS than the Random Forest model. The other six signatures present lower TSS values than the Random Forest, indicating a worse balance between wind detection and F. As shown in Figure 6, these signatures have either high POFA (greater than 0.39) or low POD (lower than 0.49).
The Random Forest OOB estimate of error rate is 28%, i.e., the percentage of all events (winds and nulls) incorrectly predicted. As stated previously, this metric weighs null events’ performance more than wind events’ simply because null events comprise a larger fraction of the total dataset. The Random Forest model depicted null events with greater skill than wind events; therefore, this metric generally presents better results than single-signature predictions. As shown in Figure 5, single signatures reach their minimum 1-PC at higher thresholds than their maximum TSS. This is due to the low F at these thresholds, since null event predictability carries greater weight in this performance metric. The threshold associated with the minimum 1-PC also presents lower POD, since 1-PC weighs wind event predictability less than TSS does. This is the primary reason why the Random Forest OOB estimate of error rate is better (i.e., lower) than the best 1-PC of five of the single signatures (S#2–S#6). The three signatures that presented better 1-PC values than the Random Forest model achieved them at thresholds that also presented a POD below 50%, which is undesirable.
The MDA and MDG calculated for all radar signatures (Table 3) indicate that VII and peak Zh were the most important signatures for the Random Forest model. Most of the other signatures also presented positive values, indicating that they contributed to an improved discrimination between the wind and null classes. The height of the peak Zh (Signature #4) was the only signature with a negative MDA. To examine the potential effect of this signature on the performance of the Random Forest model, an additional Random Forest run was performed using only seven of the original signatures, with Signature #4 removed. The resulting predictions showed slightly worse performance metrics than the original run, with POD, POFA, and TSS equal to 0.57, 0.34, and 0.37, respectively, and positive MDA and MDG for all signatures. This implies that removing signatures is not required and can even degrade Random Forest model performance.
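The MDA and MDG importances above come from the R randomForest package [49]. Analogous quantities can be obtained in scikit-learn, as in this illustrative sketch; the feature matrix and labels below are random placeholders, not the storm dataset, and the informative columns are an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(209, 8))      # placeholder for the 8 radar signatures per storm
y = (X[:, 2] + X[:, 5] + rng.normal(size=209) > 0).astype(int)  # placeholder labels

model = RandomForestClassifier(n_estimators=500, max_features=2,
                               oob_score=True, random_state=0).fit(X, y)

# Impurity-based importance, analogous to Mean Decrease Gini
mdg = model.feature_importances_
# Permutation importance, analogous to Mean Decrease Accuracy
mda = permutation_importance(model, X, y, n_repeats=10,
                             random_state=0).importances_mean
```

Note that scikit-learn’s permutation importance is computed on a supplied dataset rather than on each tree’s out-of-bag samples, so the values are comparable in spirit but not identical to R’s MDA.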
An earlier study [31] explored downbursts at CCAFS/KSC using the same Cape WINDS tower data and some of the same storms used in this study, but with a smaller dataset. They used similar signatures and analyzed performance metrics from signature thresholds by visual, subjective analysis, in contrast to the semi-automated objective analysis used here (i.e., storms were manually tracked and radar signatures were calculated automatically). The prior study [31] assessed five dual-polarization radar signatures, three of which are common to this study: height of the Zdr column, height of the precipitation ice signature, and peak Zh. The results from the Random Forest and objective single-signature analyses herein are compared with the results from the subjective single-signature analyses in [31] in the following paragraphs.
The Zdr column signature visually identified in [31] presents better results than the semi-automated single signature method and Random Forest model herein. For any given threshold, ref. [31] shows larger POD and TSS and smaller POFA than the semi-automated single signature approach. For example, for 2000 m above the 0 °C level, [31]’s POD, POFA, and TSS values are 0.84, 0.21, and 0.63 respectively, while for the semi-automated single signature analysis, these performance metrics are 0.63, 0.40, and 0.34, respectively. In [31], the Zdr column threshold with highest TSS is 2500 m, while for the semi-automated single signature the threshold with the highest TSS is 2750 m. Signature threshold resolutions are different between these two studies for this signature, being 500 m for [31] and 250 m for this study, which may have contributed to some of the differences in these results.
A similar behavior can be seen for the other two common signatures between these two studies. For the precipitation ice signature, subjective visual analysis in [31] yielded much better results, with the best TSS being 0.75 for the thresholds of 4500 and 5000 m, while for the semi-automated single signature analysis the maximum TSS was 0.37 for the threshold of 6500 m. This signature was observed at high altitudes within null events more often using the semi-automated analysis than in the visual analysis, where it was rarely observed. For example, in this study, about 31% of null events had this signature for the threshold of 6500 m above 0 °C level. This difference is speculated to be due to the expanded null event definition used in this study. The study in [31] only considered a single updraft–downdraft cycle for null events while this study used the entire null event life cycle.
For the maximum Zh signature, the subjective visual analysis in [31] had better performance metrics than the semi-automated analysis at any given threshold. For the 50 dBZ threshold, the visual analysis in [31] had POD, POFA, and TSS values of 0.94, 0.33, and 0.47, respectively, while the semi-automated analysis results herein are 0.91, 0.52, and 0.24, respectively. For the 55 dBZ threshold, the POD, POFA, and TSS in [31] were 0.47, 0.06, and 0.44, respectively, while for the semi-automated analysis, the same metrics are 0.45, 0.25, and 0.34, respectively. As with the Zdr columns discussed above, the threshold resolution used for the maximum Zh signature differed between these studies, being 5 dBZ in [31] and 1 dBZ in this study. The threshold with the highest TSS was 50 dBZ in [31], with visual analysis yielding a TSS of 0.47, while the highest TSS for the semi-automated analysis is 52 dBZ, with a TSS of 0.43. Interestingly, wind event detection is roughly the same for both methods, since POD is similar. However, more null events exceeded these thresholds in the semi-automated method than in the visual method. As mentioned previously for the precipitation ice signature, the main reason for this difference is likely the different null event definitions used in these studies.
The Random Forest OOB approach is suitable for analysis because it produces results comparable to a method that uses one dataset to train the Random Forest model and a separate dataset to test it. To simulate the latter, one storm was removed from the original dataset and the Random Forest model was trained on the remaining 208 storms. The model was then applied to the removed storm, which became the single test storm. The same procedure was repeated for all storms, and the output of each Random Forest run was compared to the storm’s true category, wind or null. These results were then summarized using the same performance metrics used throughout this study. The results of this approach were very similar to those of the OOB approach (i.e., an approach that does not require splitting the dataset into training and testing sets), with a POD of 0.58, POFA of 0.32, TSS of 0.39, and an error rate of 0.28. In an operational setting, this leave-one-out approach would be suitable because the Random Forest can be trained in a straightforward manner and then applied to an ongoing convective cell. In addition, the OOB method used in this study generally agrees with this operational approach, attesting to its suitability for use in operations.
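The leave-one-storm-out procedure above can be sketched as follows. This is an illustrative Python/scikit-learn version (the paper itself used the R randomForest package [49]); the array inputs are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def leave_one_storm_out(X, y, **rf_kwargs):
    """Train on all storms except one, predict the held-out storm,
    and repeat until every storm has served as the test case."""
    n = len(y)
    preds = np.empty(n, dtype=int)
    for i in range(n):
        mask = np.arange(n) != i                     # hold out storm i
        model = RandomForestClassifier(**rf_kwargs)
        model.fit(X[mask], y[mask])
        preds[i] = model.predict(X[i:i + 1])[0]      # predict the held-out storm
    return preds  # compare against y with POD, POFA, TSS, etc.
```

Summarizing `preds` against the true labels with the same contingency metrics used throughout the study yields the leave-one-out performance quoted above.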

5. Conclusions

This study presented a Random Forest classification method for downburst forecasting around CCAFS/KSC. The parameters ingested into the Random Forest model are based on dual-polarization radar signatures that have physical implications for downdraft intensification and the occurrence of a strong downburst at the surface. The high-density Cape WINDS tower data provided unique quantitative wind observations, in contrast to the wind reports based on surface damage that are frequently used where such observations are not available. A Random Forest consists of hundreds of decision trees, each trained on roughly two-thirds of the total storm dataset. At each tree node, only two signatures are candidates for the split, and one signature is ultimately used. This procedure results in lower variance, and hence better results, than a single decision tree. The OOB method is then used to obtain a prediction for each storm, avoiding the need for separate training and testing datasets.
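The construction summarized above (bootstrap resampling, two candidate signatures per split, OOB voting) can be sketched in scikit-learn. The paper used the R randomForest package [49]; the data below are random placeholders for the 209-storm, 8-signature dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(209, 8))                 # placeholder 8 signatures per storm
y = (rng.random(209) < 0.4).astype(int)       # placeholder wind(1)/null(0) labels

# Bootstrap sampling leaves roughly one-third of storms out-of-bag per tree;
# max_features=2 makes two randomly chosen signatures compete at each split.
rf = RandomForestClassifier(n_estimators=500, max_features=2, bootstrap=True,
                            oob_score=True, random_state=1).fit(X, y)

oob_error_rate = 1.0 - rf.oob_score_          # analogous to the OOB error rate
votes = rf.oob_decision_function_[:, 1]       # per-storm OOB "wind" vote fraction
```

A storm with an OOB vote fraction above 0.5 would be classified as a wind event, as in Figure 4.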
The Random Forest model depicted null events better than wind events. The POD for stronger downbursts was higher than for downbursts with maximum wind magnitudes close to the 35-kt wind event threshold. This corroborates the expected tendency of wind detection improving as wind magnitude increases, as shown in Figure 4.
When compared to a threshold-based method for each single signature, the Random Forest model is preferred because of its robustness. Some single-signature thresholds presented better TSS than the Random Forest model; however, their performance degraded at neighboring thresholds within the radar measurement error. Also, some single signatures achieved their high TSS or low 1-PC at thresholds with POD below 0.5 or relatively high POFA.
The Random Forest OOB method was equivalent to an approach in which one storm at a time is withheld from training and used as testing data. The latter approach, which produced results similar to the OOB method, is suitable for adaptation in an operational forecast office. The 45WS and other users can decide among the methods presented in this study whether better wind event detection or a lower false alarm rate is desirable. Given its robust performance, the Random Forest approach is recommended for continued investigation and operational testing. Before operational implementation and testing, future work should include a storm identification and tracking algorithm such as [51,52] in order to make the proposed Random Forest method fully objective and automated.

Author Contributions

Conceptualization, L.D.C., W.P.R., T.M.M., and R.J.B.; Data curation, B.L.M., C.G.A., and R.M.M.; Formal analysis, B.L.M., C.G.A., and R.M.M.; Funding acquisition, W.P.R. and R.J.B.; Investigation, B.L.M., L.D.C., C.G.A., R.M.M., and W.P.R.; Methodology, B.L.M., L.D.C., C.G.A., and R.M.M.; Project administration, L.D.C., W.P.R., and R.J.B.; Resources, W.P.R. and T.M.M.; Software, B.L.M., C.G.A., and R.M.M.; Supervision, L.D.C.; Validation, B.L.M., C.G.A., and R.M.M.; Visualization, B.L.M.; Writing—original draft, B.L.M.; Writing—review & editing, B.L.M., L.D.C., C.G.A., R.M.M., W.P.R., T.M.M., and R.J.B.

Funding

This research was funded by NASA Marshall Space Flight Center (MSFC) and the 45th Weather Squadron (45WS) under NASA MSFC, grant number NNX15AR78G.

Acknowledgments

The authors would like to thank John Mecikalski for providing initial insight into best practices for implementing the Random Forest method for nowcasting high impact convective weather events. We also thank Jeffrey Zautner for providing the Cape WINDS and KXMR sounding data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fujita, T.T. Manual of Downburst Identification for Project NIMROD; SMRP Res. Paper 156; University of Chicago: Chicago, IL, USA, 1978; p. 104.
  2. Fujita, T.T.; Wakimoto, R.M. Microbursts in JAWS depicted by Doppler radars, PAM, and aerial photographs. In Proceedings of the 21st Conference on Radar Meteorology, Edmonton, AB, Canada, 19–23 September 1983; pp. 638–645.
  3. Fujita, T.T. Tornadoes and downbursts in the context of generalized planetary scales. J. Atmos. Sci. 1981, 38, 1511–1534.
  4. Hjelmfelt, M.R. The microbursts of 22 June 1982 in JAWS. J. Atmos. Sci. 1987, 44, 1646–1665.
  5. Hjelmfelt, M.R. Structure and life cycle of microburst outflows observed in Colorado. J. Appl. Meteorol. 1988, 27, 900–927.
  6. Wakimoto, R.M.; Bringi, V.N. Dual-polarization observations of microbursts associated with intense convection: The 20 July storm during the MIST project. Mon. Weather Rev. 1988, 116, 1521–1539.
  7. Knupp, K.R. Structure and evolution of a long-lived, microburst-producing storm. Mon. Weather Rev. 1996, 124, 2785–2806.
  8. Mahale, V.N.; Zhang, G.; Xue, M. Characterization of the 14 June 2011 Norman, Oklahoma, downburst through dual-polarization radar observations and hydrometeor classification. J. Appl. Meteorol. Climatol. 2016, 55, 2635–2655.
  9. Kuster, C.M.; Heinselman, P.L.; Schuur, T.J. Rapid-update radar observations of downbursts occurring within an intense multicell thunderstorm on 14 June 2011. Weather Forecast. 2016, 31, 827–851.
  10. Srivastava, R.C. A simple model of evaporatively driven downdraft: Application to microburst downdraft. J. Atmos. Sci. 1985, 42, 1004–1023.
  11. Proctor, F.H. Numerical simulations of an isolated microburst. Part I: Dynamics and structure. J. Atmos. Sci. 1988, 45, 3137–3160.
  12. Proctor, F.H. Numerical simulations of an isolated microburst. Part II: Sensitivity experiments. J. Atmos. Sci. 1989, 46, 2143–2165.
  13. Hjelmfelt, M.R.; Roberts, R.D.; Orville, H.D.; Chen, J.P.; Kopp, F.J. Observational and numerical study of a microburst line-producing storm. J. Atmos. Sci. 1989, 46, 2731–2744.
  14. Fu, D.; Guo, X. Numerical study on a severe downburst-producing thunderstorm on 23 August 2001 in Beijing. Adv. Atmos. Sci. 2007, 24, 227–238.
  15. Srivastava, R.C. A model of intense downdrafts driven by the melting and evaporation of precipitation. J. Atmos. Sci. 1987, 44, 1752–1774.
  16. Lolli, S.; Di Girolamo, P.; Demoz, B.; Li, X.; Welton, E.J. Rain evaporation rate estimates from dual-wavelength lidar measurements and intercomparison against a model analytical solution. J. Atmos. Ocean. Technol. 2017, 34, 829–839.
  17. Smith, T.M.; Elmore, K.L.; Dulin, S.A. A damaging downburst prediction and detection algorithm for the WSR-88D. Weather Forecast. 2004, 19, 240–250.
  18. Wolfson, M.M.; Delanoy, R.L.; Forman, B.E.; Hallowell, R.G.; Pawlak, M.L.; Smith, P.D. Automated microburst wind-shear prediction. Linc. Lab. J. 1994, 7, 399–426.
  19. Lagerquist, R.; McGovern, A.; Smith, T. Machine learning for real-time prediction of damaging straight-line convective wind. Weather Forecast. 2017, 32, 2175–2193.
  20. Kumjian, M.R.; Ryzhkov, A.V. Polarimetric signatures in supercell thunderstorms. J. Appl. Meteorol. Climatol. 2008, 47, 1940–1961.
  21. Suzuki, S.I.; Maesaka, T.; Iwanami, K.; Misumi, R.; Shimizu, S.; Maki, M. Multi-parameter radar observation of a downburst storm in Tokyo on 12 July 2008. SOLA 2010, 6, 53–56.
  22. Richter, H.; Peter, J.; Collis, S. Analysis of a destructive wind storm on 16 November 2008 in Brisbane, Australia. Mon. Weather Rev. 2014, 142, 3038–3060.
  23. Loconto, A.N. Improvements of Warm-Season Convective Wind Forecasts at the Kennedy Space Center and Cape Canaveral Air Force Station. Master’s Thesis, Department of Chemical, Earth, Atmospheric, and Physical Sciences, Plymouth State University, Plymouth, NH, USA, 2006.
  24. Rennie, J.J. Evaluating WSR-88D Methods to Predict Warm-Season Convective Wind Events at Cape Canaveral Air Force Station and Kennedy Space Center. Master’s Thesis, Department of Atmospheric Science and Chemistry, Plymouth State University, Plymouth, NH, USA, 2010.
  25. Harris, R.A. Comparing Variable Updraft Melting Layer Heights to Convective Wind Speeds Using Polarimetric Radar Data. Master’s Thesis, Department of Atmospheric Science and Chemistry, Plymouth State University, Plymouth, NH, USA, 2011.
  26. Scholten, C.A. Dual-Polarimetric Radar Characteristics of Convective-Wind-Producing Thunderstorms over Kennedy Space Center. Master’s Thesis, Department of Atmospheric Science and Chemistry, Plymouth State University, Plymouth, NH, USA, 2013.
  27. Roeder, W.P.; Huddleston, L.L.; Bauman, W.H.; Doser, K.B. Weather research requirements to improve space launch from Cape Canaveral Air Force Station and NASA Kennedy Space Center. In Proceedings of the Space Traffic Management Conference, Daytona Beach, FL, USA, 26 June 2014.
  28. Barnes, L.R.; Schultz, D.M.; Gruntfest, E.C.; Hayden, M.H.; Benight, C.C. CORRIGENDUM: False alarm rate or false alarm ratio? Weather Forecast. 2009, 24, 1452–1454.
  29. Edwards, R.; Allen, J.T.; Carbin, G.W. Reliability and climatological impacts of convective wind estimations. J. Appl. Meteorol. Climatol. 2018, 57, 1825–1845.
  30. Computer Sciences Raytheon. 45th Space Wing Eastern Range Instrumentation Handbook—CAPE WINDS; Computer Sciences Raytheon: Brevard County, FL, USA, 2015; p. 27.
  31. Amiot, C.G.; Carey, L.D.; Roeder, W.P.; McNamara, T.M.; Blakeslee, R.J. C-band Dual-Polarization Radar Signatures of Wet Downbursts around Cape Canaveral, Florida. Weather Forecast. 2019, 34, 103–131.
  32. Roeder, W.P.; McNamara, T.M.; Boyd, B.F.; Merceret, F.J. The new weather radar for America’s space program in Florida: An overview. In Proceedings of the 34th Conference on Radar Meteorology, Williamsburg, VA, USA, 5 October 2009.
  33. Cressman, G.P. An operational objective analysis system. Mon. Weather Rev. 1959, 87, 367–374.
  34. Helmus, J.J.; Collis, S.M. The Python ARM Radar Toolkit (Py-ART), a library for working with weather radar data in the Python programming language. J. Open Res. Softw. 2016, 4, e25.
  35. Lorenz, E. Empirical Orthogonal Functions and Statistical Weather Prediction; MIT Department of Meteorology: Cambridge, MA, USA, 1956; p. 49.
  36. Illingworth, A.J.; Goddard, J.W.F.; Cherry, S.M. Polarization radar studies of precipitation development in convective storms. Q. J. R. Meteorol. Soc. 1987, 113, 469–489.
  37. Tuttle, J.D.; Bringi, V.N.; Orville, H.D.; Kopp, F.J. Multiparameter radar study of a microburst: Comparison with model results. J. Atmos. Sci. 1989, 46, 601–620.
  38. Herzegh, P.H.; Jameson, A.R. Observing precipitation through dual-polarization radar measurements. Bull. Am. Meteorol. Soc. 1992, 73, 1365–1376.
  39. Hubbert, J.C.; Bringi, V.N.; Carey, L.D.; Bolen, S. CSU-CHILL polarimetric radar measurements from a severe hail storm in eastern Colorado. J. Appl. Meteorol. 1998, 37, 749–775.
  40. Straka, J.M.; Zrnić, D.S.; Ryzhkov, A.V. Bulk hydrometeor classification and quantification using polarimetric radar data: Synthesis of relations. J. Appl. Meteorol. 2000, 39, 1341–1372.
  41. Carey, L.D.; Rutledge, S.A. The relationship between precipitation and lightning in tropical island convection: A C-band polarimetric radar study. Mon. Weather Rev. 2000, 128, 2687–2710.
  42. Mosier, R.M.; Schumacher, C.; Orville, R.E.; Carey, L.D. Radar nowcasting of cloud-to-ground lightning over Houston, Texas. Weather Forecast. 2011, 26, 199–212.
  43. Greene, D.R.; Clark, R.A. Vertically integrated liquid water—A new analysis tool. Mon. Weather Rev. 1972, 100, 548–552.
  44. Amburn, S.A.; Wolf, P.L. VIL density as a hail indicator. Weather Forecast. 1997, 12, 473–478.
  45. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  46. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: New York, NY, USA, 2013; Volume 112.
  47. Mecikalski, J.R.; Williams, J.K.; Jewett, C.P.; Ahijevych, D.; LeRoy, A.; Walker, J.R. Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteorol. Climatol. 2015, 54, 1039–1059.
  48. Ahijevych, D.; Pinto, J.O.; Williams, J.K.; Steiner, M. Probabilistic forecasts of mesoscale convective system initiation using the random forest data mining technique. Weather Forecast. 2016, 31, 581–599.
  49. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
  50. Wilks, D. Statistical Methods in the Atmospheric Sciences, 3rd ed.; Academic Press: Cambridge, MA, USA, 2011; p. 676.
  51. Lakshmanan, V.; Rabin, R.; DeBrunner, V. Multiscale storm identification and forecast. Atmos. Res. 2003, 67–68, 367–380.
  52. Lakshmanan, V.; Smith, T.; Stumpf, G.; Hondl, K. The warning decision support system integrated information. Weather Forecast. 2007, 22, 596–612.
Figure 1. Cape WINDS tower locations around CCAFS/KSC (red), the 45WS-WSR radar location (blue), and the approximated 67 km range from the 45WS-WSR radar (shaded blue).
Figure 2. Zh at 5 km AGL on 06/09/2015 at 1915 UTC. The spatial definition of a cell associated with a wind event is highlighted as a red box, and the gray ‘X’s show Cape WINDS tower locations. The solid black line indicates the plane of the vertical-cross section shown in Figure 3.
Figure 3. Vertical cross-section of Zdr (shaded) and Zh (black contour every 10 dBZ, from 10 dBZ to 50 dBZ) at the location shown as black line in Figure 2. The horizontal blue line indicates the 0 °C isotherm height.
Figure 4. Random Forest vote for all events as a function of the observed maximum wind magnitude in kt. The vertical line depicts the wind event threshold of 35 kt. The horizontal line at a vote of 0.5 specifies the minimum vote value necessary for the Random Forest to predict a storm as a wind event. As such, the upper (lower) left quadrant can be interpreted as encompassing the incorrectly (correctly) forecasted null events. Similarly, the upper (lower) right quadrant can be interpreted as including the correctly (incorrectly) forecasted wind events. More details can be found in the main text.
Figure 5. POD, POFA, TSS, and 1-PC for the single signatures prediction for different thresholds applied. The optimal value for POD and TSS is 1, and for POFA and 1-PC is 0. Radar signatures are: (a) Zdr column maximum height; (b) Precipitation ice signature maximum height; (c) VII; (d) Height of peak Zh above the 0°C isotherm level; (e) Peak Zh above the 0°C isotherm level; (f) Peak Zh within the storm; (g) VIL; (h) DVIL.
Figure 6. TSS for the radar signatures’ threshold with maximum TSS (contours), presented in terms of POD and POFA. Radar signatures are S#1: Zdr column maximum height; S#2: Precipitation ice signature maximum height; S#3: VII; S#4: Height of peak Zh above the 0°C isotherm level; S#5: Peak Zh above the 0°C isotherm level; S#6: Peak Zh within the storm; S#7: VIL; S#8: DVIL.
Table 1. Radar signature numbers, physical descriptions, and units.
Signature Number | Description | Units
S#1 | vertical extent of the 1 dB Zdr contour in a Zdr column in the presence of Zh ≥ 30 dBZ at temperatures colder than 0 °C | m
S#2 | vertical extent of co-located values of Zh ≥ 30 dBZ and Zdr ~0 dB at temperatures colder than 0 °C | m
S#3 | maximum vertically integrated ice (VII) within a storm | kg m−2
S#4 | height of the peak Zh in the storm | m
S#5 | peak Zh at temperatures colder than 0 °C | dBZ
S#6 | peak Zh at any temperature within a storm | dBZ
S#7 | maximum vertically integrated liquid (VIL) within a storm | kg m−2
S#8 | maximum density of VIL (DVIL) within a storm | g m−3
Table 2. Random forest out-of-bag confusion matrix.
                        Observation
                        Null        Wind
Prediction   Wind       b = 23      a = 49
             Null       d = 102     c = 35
Total                   125         84
Table 3. Random Forest’s Mean Decrease Accuracy and Mean Decrease Gini for all radar signatures.
Signature | Mean Decrease Accuracy | Mean Decrease Gini
S#1: Height of Zdr column | 10.73 | 12.55
S#2: Height of precipitation ice | 5.39 | 10.34
S#3: VII | 12.98 | 13.86
S#4: Height of peak Zh above 0 °C | −1.17 | 10.11
S#5: Peak Zh above 0 °C | 10.34 | 13.26
S#6: Peak Zh | 14.58 | 13.56
S#7: VIL | 8.18 | 12.85
S#8: DVIL | 9.26 | 13.34
