**3. Results**

The overall evolution of the simulated L2G burn and the associated smoke plume is best visualized with a 3D animation (see Animation S1 in the Supplementary Materials). The Supplementary Materials also include an animated view of the cross-wind modeled CO2 mixing ratio (Animation S2). The latter demonstrates the ability of the LES to capture common plume behavior. As seen in the animation, the initial rise of moist buoyant air results in a temporary overshoot of the equilibrium plume height, followed by the gradual settling of the plume to its final injection height, which in this case lies near the top of the boundary layer. While the ability of WRF-SFIRE to qualitatively capture typical plume dynamics is reassuring, the following sections take a more quantitative approach to model evaluation.

#### *3.1. Fire Behavior*

Prior to evaluating the ability of WRF-SFIRE to capture plume rise and dispersion, it is important to ensure that the model is able to reasonably simulate fire behavior. Initial surface and fuel conditions have the potential to strongly impact fire growth and intensity, and, hence, affect the location and buoyancy of the smoke plume. As noted in Section 1, our approach does not constitute a comprehensive fire behavior evaluation study, but rather aims to ensure that WRF-SFIRE captures the bulk properties of combustion and supplies a reasonable surface forcing to the simulated atmosphere.

Our evaluation is based on the analysis of fire energy transport in the RxCADRE observational data for the L2G burn carried out by Butler et al. [10]. The study provides measurement-based values as well as error margins for ROS and for the peak and average heat fluxes of the fire, which we use to assess the performance of the semi-empirical fire algorithm driving our LES simulation. Figure 3a,b compares LES-derived average and peak total heat fluxes over the flaming period with observations, for both HIP1 and the entire burn area. For the HIP1 point-to-point comparison, we use output from the nearest modeled grid points. L2G average observed values include measurements from all three HIP lots. The corresponding simulated estimates are calculated using the entire burn area (roughly half of the L2G lot).

**Figure 3.** Comparison of observed (blue) and modeled (red) fire behavior. The box and whiskers span the interquartile range (IQR) and 1.5 × IQR, respectively, with the notch denoting the 95% confidence interval of the median (median ± 1.57 × IQR/n<sup>1/2</sup>). Red line and green triangle correspond to median and mean, respectively. (**a**) Average heat flux during flaming period. (**b**) Peak fire heat flux during flaming period. (**c**) Rate of spread.

The start and end times of the flaming period are defined as simulation frames at which total heat flux at the location exceeded 5 kW m<sup>−2</sup> [10]. For both burn-wide and point comparisons, the flaming period is determined separately for each individual grid point. Only ignited grid points are included in the analysis. This approach allows us to mimic the analysis performed by Butler et al. [10] in the absence of true combustion modeling in WRF-SFIRE.
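The per-point flaming-period selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the processing code used for this study. It assumes that the flaming period runs from the first to the last output frame exceeding the threshold (including any dips below it in between), and the function names and input layout are hypothetical.

```python
from statistics import mean

FLAMING_THRESHOLD = 5.0  # kW m^-2, threshold quoted in the text


def flaming_stats(heat_flux, threshold=FLAMING_THRESHOLD):
    """Return (average, peak) heat flux over the flaming period of one grid point.

    heat_flux: total heat flux values (kW m^-2), one per output frame.
    Returns None if the point never exceeds the threshold (not ignited).
    Assumption: the flaming period spans the first to the last exceedance.
    """
    above = [i for i, q in enumerate(heat_flux) if q > threshold]
    if not above:
        return None
    window = heat_flux[above[0]:above[-1] + 1]
    return mean(window), max(window)


def burn_wide_stats(series_by_point):
    """Collect per-point averages and peaks, skipping points that never ignite."""
    per_point = [s for s in map(flaming_stats, series_by_point) if s is not None]
    averages = [avg for avg, _ in per_point]
    peaks = [peak for _, peak in per_point]
    return averages, peaks
```

Because each grid point gets its own window, the statistics are not diluted by frames before the front arrives or after it has passed.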

For the entire burn area, the observed mean and peak heat fluxes associated with the fire (not the background environment) are 11 kW m<sup>−2</sup> and 20 kW m<sup>−2</sup>, compared to LES-derived values of 8.9 kW m<sup>−2</sup> and 19 kW m<sup>−2</sup>, respectively. For the HIP1 lot, the corresponding values are 11.4 kW m<sup>−2</sup> and 19.4 kW m<sup>−2</sup> (observed) versus 8.2 kW m<sup>−2</sup> and 13 kW m<sup>−2</sup> (modeled). Note that, owing to the close proximity of the HIP1 sensors to each other, four of the seven sensors fall into the same atmospheric grid cell within the modeled domain. Modeled HIP1 averages should therefore be treated with caution, as they consist of only four unique values. Moreover, the large spread of observed HIP1 heat fluxes renders the differences between model and measurements not statistically significant. Overall, the results shown in Figure 3 suggest that, on average, the surface thermal forcing to the modeled atmosphere due to the fire is reasonably captured by the model, subject to a slight negative bias (statistically significant for the average heat flux, but not for the peak heat flux).
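One common way to read significance from notched box plots such as those in Figure 3 is to check whether the notch intervals (median ± 1.57 × IQR/n<sup>1/2</sup>) of two samples overlap. The sketch below illustrates that check under stated assumptions. The quartile convention (`method="inclusive"`) is our choice and may differ from the plotting software used for the figure, and the function names are hypothetical.

```python
from math import sqrt
from statistics import median, quantiles


def notch_interval(values):
    """Approximate 95% confidence interval of the median: median +/- 1.57 * IQR / sqrt(n)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / sqrt(len(values))
    med = median(values)
    return med - half_width, med + half_width


def notches_overlap(sample_a, sample_b):
    """True if the two notch intervals overlap (medians not clearly different)."""
    lo_a, hi_a = notch_interval(sample_a)
    lo_b, hi_b = notch_interval(sample_b)
    return lo_a <= hi_b and lo_b <= hi_a
```

With small samples, such as the seven HIP1 sensors, the interval becomes wide, so overlapping notches are to be expected even for sizeable differences in the median.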

Observed rates of spread during the L2G burn were estimated using two methods in the study by Butler et al. [10]: flame arrival time from ignition and video images. The former approach takes into account the ignition time of the nearest fire line (perpendicular to the fire advance vector) and the distance to the individual HIP1 sensors. The resultant values appear to have lower associated uncertainty than those of the image-derived method. To ensure consistency, we mimicked the arrival-time methodology in our simulated domain. Using the high-resolution fire domain, we calculated the upwind distance between each HIP1 point and the ignition line and the time it took the flame to reach each sensor location. To estimate ROS for the entire burn area, we created a mid-fire cross-section of 50 point-pairs between the second and third ignition lines. Similar to the approach above, we derived the distance and flame travel time for each pair to calculate ROS. As shown in Figure 3c, mean LES-based HIP1 and L2G ROS values of 0.049 m s<sup>−1</sup> and 0.087 m s<sup>−1</sup> are significantly lower than the corresponding observed rates of spread (0.23 m s<sup>−1</sup> and 0.30 m s<sup>−1</sup>, respectively). Possible implications and sensitivity of our results to this deficiency are addressed in Section 4.
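The arrival-time estimate of ROS reduces to distance divided by flame travel time. A minimal sketch follows, assuming each sample is given as a hypothetical tuple of upwind distance (m), ignition time (s) and flame arrival time (s). The numbers in the usage note are illustrative and are not taken from the simulation.

```python
from statistics import mean


def rate_of_spread(distance_m, ignition_time_s, arrival_time_s):
    """ROS (m/s) from the distance to the ignition line and the flame travel time."""
    travel_time_s = arrival_time_s - ignition_time_s
    if travel_time_s <= 0:
        raise ValueError("flame arrival must come after ignition")
    return distance_m / travel_time_s


def mean_ros(samples):
    """Average ROS over (distance_m, ignition_time_s, arrival_time_s) tuples."""
    return mean(rate_of_spread(*sample) for sample in samples)
```

For example, `rate_of_spread(30.0, 100.0, 400.0)` corresponds to a front covering 30 m in 300 s, or 0.1 m s<sup>−1</sup>.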

#### *3.2. Plume Dynamics*

Airborne emissions data collected during the RxCADRE campaign are central to our evaluation of WRF-SFIRE's ability to capture plume rise and dispersion. The emissions dataset [17] contains smoke plume entry and exit points along the flight path, which were calculated using background CO baseline concentrations. The measurements were taken along horizontal transects passing through the plume at various vertical levels ("parking garage" profile), beginning close to the ground and moving towards the top of the plume, for a total of 9 crossings.
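To illustrate how plume entry and exit points can be derived from a background baseline, the sketch below flags contiguous stretches of a CO time series that exceed the baseline by a fixed margin. The exact criterion used to build the dataset [17] is not described here, so the `margin` parameter and the function itself are assumptions for illustration only.

```python
def plume_segments(times, co, baseline, margin):
    """Return (entry_time, exit_time) pairs where CO exceeds baseline + margin.

    times and co are equal-length sequences along the flight path.
    The exit time is the last sample still inside the plume.
    """
    segments = []
    start = None
    prev_t = None
    for t, c in zip(times, co):
        inside = c > baseline + margin
        if inside and start is None:
            start = t  # plume entry
        elif not inside and start is not None:
            segments.append((start, prev_t))  # plume exit at previous sample
            start = None
        prev_t = t
    if start is not None:
        segments.append((start, prev_t))  # series ended inside the plume
    return segments
```

The same function could be applied to model output sampled along the flight path, so that observed and simulated crossings are defined in the same way.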

The identified in-plume segments were then compared with modeled CO mixing ratios along the same flight path extracted from the geo- and time-referenced LES domain. Figure 4 shows the time series of the simulated emissions along the flight path, overlaid with observation-derived plume segments. The results suggest good overall agreement in both location and timing between the modeled and observed emissions dispersion throughout the majority of the BL depth. The coinciding model CO peaks and observed smoke segments indicate that the horizontal width of the smoke plume is well represented in the model. Potential shortcomings include excess smoke near the ground, as suggested by the early peaks (12:36 and 12:40 CST) not identified as plume crossings, as well as a slight skew of the overall smoke distribution towards higher levels. A small phase shift appears in the modeled peaks toward the later parts of the simulation (12:50 CST and beyond).

**Figure 4.** Simulated CO mixing ratio along RxCADRE flight path. Red dashed and solid black lines correspond to LES-derived and observed values, respectively. Gray shading indicates observed smoke time periods (not magnitudes) as identified from CO measurements along the flight path.

To evaluate the vertical distribution of WRF-SFIRE emissions, we compared the model-generated CO2 concentrations with airborne measurements obtained during the "parking garage" and "corkscrew" (spiral ascent or descent) maneuvers. As shown in Figure 5a, there is good overall agreement in injection heights for fire-generated emissions during the earlier "parking garage" profile, and the plume top is accurately captured. Modeled concentrations tend to have a negative bias of ∼5 ppmv throughout the bulk of the plume thickness (500–1300 m) and are slightly over-predicted at the very top and bottom of the smoke column (at 400 m and 1500 m).
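A layer-wise bias such as the one quoted above can be computed as the mean model-minus-observation difference over samples within an altitude range. The sketch below assumes the modeled values have already been paired with the observations at the same locations. The function name and inputs are hypothetical.

```python
from statistics import mean


def layer_bias(altitudes_m, observed, modeled, bottom_m, top_m):
    """Mean (modeled - observed) difference for samples with bottom_m <= z <= top_m.

    Returns None if no samples fall inside the layer.
    """
    diffs = [
        m - o
        for z, o, m in zip(altitudes_m, observed, modeled)
        if bottom_m <= z <= top_m
    ]
    return mean(diffs) if diffs else None
```

A negative value indicates under-prediction within the layer, and calling the function separately for the plume bulk and for its edges keeps opposite-signed biases from cancelling.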

**Figure 5.** Observed (black) and modeled (red) vertical CO2 emissions distribution during: (**a**) "parking garage" maneuver; and (**b**) corkscrew maneuver.

The "corkscrew" profile corresponds to a time near the very end of our simulation. As shown in Figure 5b, the band of modeled emissions appears to be very narrow, and the model severely under-predicts the smoke concentrations. We discuss possible reasons for this behavior in Section 4.
