In this simulation study, we analyze time series data representing pollutant levels, such as carbon monoxide (CO), as input variables. The pollutant values, denoted as , are combined with fixed parameters ( and ) to generate the response variable , which reflects health outcomes. We model this relationship using a GLARMA(1,0) Poisson model.
3.1.1. Large Samples
For this simulation study, three scenarios were considered:
S1: and , where .
S2: and , where is an independent random vector in time and .
S3: and , where , with the autoregressive parameter assuming values 0.2, 0.5, and 0.8.
These scenarios were selected with specific objectives. Scenario 1 uses the covariate from [12] for comparison purposes. Scenario 2 considers a covariate with no temporal dependence, allowing us to examine how the absence of temporal structure affects the bootstrap intervals. Scenario 3 incorporates a covariate with distinct levels of temporal dependence, to assess the influence of more complex temporal structures on the variables. Of the three scenarios, Scenario 3 is the closest to real-world situations, as pollutants typically exhibit temporal dependence.
The sample size considered was n = 1000. The values of
in Equation (19) were fixed at 0.2, 0.4, and 0.6, and the nominal level for the confidence intervals was fixed at 95%. The parameter
assumed the value 1.0. Although [12] showed that, for
, only in the simplest model, the process
has a stationary and ergodic distribution, the numerical simulations revealed that, even for complex models, this value of
provides better estimates. For all scenarios, the parameter values used in the simulations were chosen for simplicity, with
. Additional simulations conducted with different values of
produced similar results, indicating that the findings are not sensitive to the specific choice of parameter values.
The classic model-based bootstrap follows the three steps presented in Section 2.2.1 to construct the confidence intervals. For the sieve and INAR(1) cases, the steps shown in Section 2.2.2 and Section 2.2.3 were applied to the count time series
, and then the GLARMA Poisson model was fitted considering the bootstrap samples as the response variables. In addition, for the INAR(1) bootstrap, an INAR process having Poisson-distributed innovations was assumed with
and
, where
was obtained by Yule–Walker estimation.
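As an illustration of the INAR(1) resampling step just described, the following Python sketch generates one bootstrap series by binomial thinning with Poisson innovations. It is not the authors' R code. It assumes the usual Yule-Walker moment estimators for a Poisson INAR(1) process (thinning parameter equal to the lag-1 sample autocorrelation, innovation mean equal to one minus that parameter times the sample mean). The function names and the clipping of the thinning estimate to [0, 0.999] are hypothetical choices made for this sketch.

```python
import numpy as np


def inar1_yule_walker(y):
    """Moment (Yule-Walker) estimates for a Poisson INAR(1) process.

    Assumption: alpha is the lag-1 sample autocorrelation and the
    innovation mean is (1 - alpha) * mean(y).
    """
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    d = y - ybar
    denom = np.sum(d * d)
    rho1 = np.sum(d[1:] * d[:-1]) / denom if denom > 0 else 0.0
    # Hypothetical safeguard: keep the thinning probability in [0, 1).
    alpha = float(np.clip(rho1, 0.0, 0.999))
    lam = (1.0 - alpha) * ybar
    return alpha, lam


def inar1_bootstrap_sample(y, rng):
    """Draw one bootstrap count series of the same length as y."""
    alpha, lam = inar1_yule_walker(y)
    n = len(y)
    y_star = np.empty(n, dtype=np.int64)
    # Start from the stationary Poisson marginal with mean lam / (1 - alpha).
    y_star[0] = rng.poisson(lam / (1.0 - alpha))
    for t in range(1, n):
        # Binomial thinning: each of the previous counts survives w.p. alpha.
        survivors = rng.binomial(y_star[t - 1], alpha)
        y_star[t] = survivors + rng.poisson(lam)
    return y_star
```

Each resampled series would then serve as the response variable when the GLARMA Poisson model is refitted, as described above.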
The Monte Carlo simulations comprised 500 replications, each with 500 bootstrap replications. The asymptotic confidence interval was estimated as in Equation (12). All code was written in R and is available from the authors upon request.
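To make the evaluation design concrete, the Python sketch below shows how a coverage rate and the average interval limits can be computed from a set of Monte Carlo intervals. It is an illustration only, not the authors' R code. The percentile interval is an assumption made here for the bootstrap step (the actual construction is the one given in Section 2.2), and all function names are hypothetical.

```python
import numpy as np


def percentile_interval(boot_estimates, level=0.95):
    """Percentile interval from bootstrap estimates (assumed construction)."""
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot_estimates, [tail, 1.0 - tail])
    return float(lo), float(hi)


def summarize_coverage(lowers, uppers, true_value):
    """Coverage rate and mean limits over Monte Carlo replications."""
    lowers = np.asarray(lowers, dtype=float)
    uppers = np.asarray(uppers, dtype=float)
    # An interval covers when the true value lies between its limits.
    covered = (lowers <= true_value) & (true_value <= uppers)
    return {
        "coverage": float(covered.mean()),
        "mean_lower": float(lowers.mean()),
        "mean_upper": float(uppers.mean()),
    }
```

In a study of this kind, `summarize_coverage` would be called once per method and parameter setting, with one interval per Monte Carlo replication and the true relative risk as `true_value`.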
Table 1(a) presents the mean and standard deviation of the 500 Monte Carlo estimates of parameters
,
, and
in scenario 1 (S1). For
and
parameters, the mean of the estimates was close to the true values, especially when
or
. Conversely, the standard deviation shows a consistent increase with the value of
. Parameter
is better estimated for small values.
Table 1(b) presents the 95% confidence intervals for the RR of the covariate
. The values in square brackets are the average lower and upper limits of the computed intervals. The results indicate that the classic and sieve bootstrap methods exhibit a decline in coverage as
increases. Specifically, for the classic bootstrap, the coverage drops from 0.884 to 0.729, while for the sieve bootstrap, the decrease is even more pronounced, from 0.927 to 0.570. Notably, for
, the sieve bootstrap still achieves a coverage rate close to the nominal level of 0.95, suggesting that it may be a reasonable choice in this scenario. However, for higher values of the autoregressive parameter, both methods present considerably lower coverage, indicating potential limitations in their performance under strong dependence. In contrast, the INAR(1) bootstrap and the asymptotic confidence intervals maintained a coverage rate of approximately 95% for all values of
.
Scenario 1 studied the impact of the same covariate considered in [12], although the authors only evaluated cases where the time correlation is a moving-average process of order 1. Real data sets commonly present an autoregressive autocorrelation structure. In this case, S1 showed that even when the time correlation structure is complex (e.g.,
), for deterministic covariates (
), the asymptotic theory and the INAR(1) bootstrap presented coverage rates close to the nominal level of the confidence intervals.
For the parameter estimates presented in Table 2(a), the mean of the estimates was close to the true values of all parameters, except when
. The standard deviations were also strongly affected by the increase in
.
Table 2(b) presents the 95% confidence intervals for the RR in scenario 2 regarding the covariate
.
Table 2(b) shows that for the classic bootstrap, the coverage rate was close to 1 for all values of
, which means that almost 100% of the intervals contain the true relative risk value. Regarding the sieve bootstrap, for
and
, the coverage rate was also close to 1. Meanwhile, for
, the coverage rate decreased to 0.849. The INAR(1) bootstrap and the asymptotic intervals had similar performance, with coverages close to 0.95 for
and
and considerably below the nominal level for
. It should be pointed out that the INAR(1) bootstrap always presented coverages closer to 0.95 than the asymptotic interval, even for the
case.
Scenarios 1 and 2 showed that the coverages of the INAR(1) bootstrap and asymptotic approaches were close to the nominal level for
and
, where
was appropriately estimated. In Table 1(a) and Table 2(a), the mean of this parameter is close to the true values, and although the standard deviation is larger in Table 2(a), the coverage in scenario 2 is not affected. However, for
, the
estimates were poor, mainly in S2, and since the RR depends on
, the interval coverage was also affected. Finally, it is essential to observe that the interval coverages of the classic and sieve bootstraps are unsuitable even when the
estimates are reasonable.
In epidemiology, it is common for air pollutants to present temporal correlation. To simulate this behavior, in scenario 3, the covariate
followed an autoregressive process of order 1:
where
is an AR(1) process with autoregressive parameter
and
is defined by Equation (5). To evaluate the time structure’s impact, the covariate’s autoregressive parameter (
) assumed the values 0.2, 0.5, and 0.8.
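The data-generating mechanism of scenario 3 can be sketched in Python as follows. This is a hedged illustration, not the authors' R code. It assumes the standard GLARMA(1,0) Poisson recursion with a log link, in which the state equals the autoregressive coefficient times the sum of the previous state and the previous scaled residual, and the residual is the observation minus the mean, divided by the mean raised to a scaling power of 1.0. The standard normal innovations of the covariate, the burn-in length, the zero initial state, and the parameter names `beta0`, `beta1`, `phi`, and `lam` are assumptions introduced for this sketch.

```python
import numpy as np


def simulate_ar1(n, rho, rng, burn_in=200):
    """AR(1) covariate with standard normal innovations (assumed)."""
    eps = rng.normal(size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = rho * x[t - 1] + eps[t]
    # Discard the burn-in so the retained path is close to stationary.
    return x[burn_in:]


def simulate_glarma10_poisson(x, beta0, beta1, phi, rng, lam=1.0):
    """Simulate counts from an assumed GLARMA(1,0) Poisson recursion."""
    n = len(x)
    y = np.zeros(n, dtype=np.int64)
    z_prev, e_prev = 0.0, 0.0  # assumed zero initial state and residual
    for t in range(n):
        # State equation: Z_t = phi * (Z_{t-1} + e_{t-1}).
        z = phi * (z_prev + e_prev)
        # Log link: the conditional mean depends on covariate and state.
        mu = np.exp(beta0 + beta1 * x[t] + z)
        y[t] = rng.poisson(mu)
        # Scaled residual e_t = (Y_t - mu_t) / mu_t ** lam.
        e_prev = (y[t] - mu) / mu ** lam
        z_prev = z
    return y
```

For example, calling `simulate_ar1(1000, 0.8, rng)` and passing the result to `simulate_glarma10_poisson` with illustrative coefficients would produce one series of length 1000 under the strongest covariate dependence considered.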
Table 3(a) presents the parameter estimates for all values of
. For
, when focusing on
, the mean estimate of this parameter is close to the true value for
, while the estimate becomes less accurate for
and
, accompanied by an increase in the standard deviation.
In the case of
, shown in
Table 3(a), the mean of the
estimate remains close to the true value for
. However, compared to the results for the same value of
in
Table 3(a), there is a noticeable increase in the standard deviation. For both
and
, the
estimates deteriorate, and the standard deviation increases as the value of
rises.
Table 3(a) also shows the parameter estimates when the covariate’s autoregressive parameter (
) is
. Even for the smallest value of
considered, the mean estimate of
is poor, and the standard deviation is large. For
and
, the means of the
estimates become much worse, and the standard deviation increases even further.
Table 3(b) presents the 95% confidence intervals (CIs) for the relative risk (RR) of the covariate
. For
, the coverage rate for the classic model-based bootstrap was close to 1 for all values of
. The performance of the asymptotic approach, sieve bootstrap, and INAR(1) bootstrap was similar, with the coverage decreasing for
. For
, all methods exhibited poor performance.
For
, as shown in
Table 3(b), a similar performance was observed for the classic and sieve bootstraps, with the coverage rate equal to 100% for
, followed by a drop in coverage as
increased. Both methods also produced wide confidence intervals. The coverage rates of the INAR(1) bootstrap and asymptotic approaches were approximately 95% for
. For
, these rates decreased, with the INAR(1) bootstrap maintaining the highest coverage. Again, for
, all methods showed substantial deviations from the nominal coverage level.
For
,
Table 3(b) shows that for
, the classic model-based and sieve bootstraps had coverage rates close to 1, while the INAR(1) bootstrap and asymptotic approach exhibited coverage rates of 0.93 and 0.895, respectively. For
and
, all methods showed a marked decline in the coverage rate, with the CIs from the asymptotic approach and INAR(1) bootstrap being the most affected.
In general, the INAR(1) bootstrap and the asymptotic method performed best, as they produced narrower confidence intervals and coverage rates closer to the nominal value. In contrast, the classic and sieve bootstraps were poorly calibrated in some cases, with coverage rates close to 1 against a nominal value of 0.95. This suggests that the INAR(1) and asymptotic approaches are more reliable for estimating the relative risk in scenario 3, although their coverage rates were generally lower than those observed in scenario 2.
The comparison between scenarios 2 and 3 indicates that temporal correlation in the covariates can affect the coverage rate of the confidence intervals: as the autoregressive dependence becomes stronger, the coverage decreases.
Table 3(a) showed that the values of
strongly affect parameter estimation, and the effect worsens as this autoregressive parameter approaches the nonstationarity region, either in the covariate or in the
component. It is easy to verify that for any
,
, where the covariate
is an independent random vector in time. However, for
AR(1), the variability of the state process
increases, which inflates the model estimates, directly impacting the coverage rates of the RR.
Beyond the empirical investigations discussed here, scenarios with more complex model structures, such as bivariate time series, were also considered. As expected, the coverage rate was unsatisfactory. Therefore, we recommend applying the procedure proposed by [7] before implementing the bootstrap approaches discussed in this study in practical situations where covariates are time series. This is further explored in Section 3.2, within the real data analysis, where the covariates form a vector of time series.