Analysis of the Spread of COVID-19 in the USA with a Spatio-Temporal Multivariate Time Series Model

Rui, Rongxiang; Tian, Maozai; Tang, Man-Lai; Ho, George To-Sum; Wu, Chun-Ho

doi:10.3390/ijerph18020774


by Rongxiang Rui ¹, Maozai Tian ², Man-Lai Tang ³,*, George To-Sum Ho ⁴ and Chun-Ho Wu ⁴

¹ School of Statistics, Renmin University of China, Beijing 100872, China

² College of Medical Engineering and Technology, Xinjiang Medical University, Ürümqi 830011, China

³ Department of Mathematics, Statistics and Insurance, Hang Seng University of Hong Kong, Hong Kong, China

⁴ Department of Supply Chain and Information Management, Hang Seng University of Hong Kong, Hong Kong, China

* Author to whom correspondence should be addressed.

Int. J. Environ. Res. Public Health 2021, 18(2), 774; https://doi.org/10.3390/ijerph18020774

Submission received: 16 December 2020 / Revised: 10 January 2021 / Accepted: 13 January 2021 / Published: 18 January 2021

(This article belongs to the Section Public Health Statistics and Risk Assessment)


Abstract

The coronavirus disease 2019 (COVID-19) pandemic has spread rapidly, causing considerable mortality and morbidity worldwide and severely affecting economic development. In this article, we analyze the state-level correlation between COVID-19 risk and weather/climate factors in the USA. For this purpose, we consider a spatio-temporal multivariate time series model under a hierarchical framework, which is especially suitable for describing how virus transmission evolves across a geographic area over time. Briefly, our model decomposes the COVID-19 risk into: (i) an autoregressive component that describes the within-state COVID-19 risk effect; (ii) a spatiotemporal component that describes the across-state COVID-19 risk effect; (iii) an exogenous component that includes other factors (e.g., weather/climate) that could affect future epidemic development risk; and (iv) an endemic component that captures the function of time and other predictors mainly for individual states. Our results indicate that maximum temperature, minimum temperature, humidity, the percentage of cloud coverage, and the columnar density of total atmospheric ozone have a strong association with the COVID-19 pandemic in many states. In particular, the maximum temperature, minimum temperature, and the columnar density of total atmospheric ozone demonstrate statistically significant associations with the tendency of COVID-19 spreading in almost all states. Furthermore, our results from transmission tendency analysis suggest that community-level transmission has been relatively mitigated in the USA, and that the daily confirmed cases within a state are driven predominantly by the earlier daily confirmed cases within that state rather than by other factors. This implies that states with a large number of confirmed cases, such as Texas, California, and Florida, still need strategies like stay-at-home orders to prevent another outbreak.

Keywords:

columnar density of total atmospheric ozone; COVID-19; maximum temperature; minimum temperature; spatio-temporal multivariate time-series analysis; USA

1. Introduction

The pandemic due to the coronavirus disease 2019 (COVID-19), caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1], was the most disastrous incident in 2020, causing millions of deaths and resulting in economic activity worldwide falling sharply. According to the latest report of the World Health Organization (WHO), the cumulative cases around the world reached 28,637,952 and the cumulative deaths were 917,417 as of 13 September 2020 (https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200914-weekly-epi-update-5.pdf?sfvrsn=cf929d04_2). Furthermore, the World Bank suggested that most countries would be expected to suffer from economic recession in 2020 (https://www.worldbank.org/en/news/feature/2020/06/08/the-global-economic-outlook-during-the-covid-19-pandemic-a-changed-world).

Although some countries and regions (e.g., North America, China, and Europe) are actively developing vaccines with some showing encouraging signs [2,3,4], it is almost impossible to provide sufficient effective vaccines to every person in the next few years. Hence, this pandemic will undeniably last several months or even a few years. As one of the most developed countries in the world, the United States declared a public health emergency on 31 January 2020, and preventive and proactive measures (e.g., suspending the entry of, or quarantining, foreign nationals seeking entry) have been taken to control the spread of the virus and treat those affected. However, it has become one of the most severely affected nations: as of 13 September 2020, its confirmed cases (6,386,832) and deaths (191,809) each accounted for approximately one-fifth of the global totals. Even worse, as Chowell and Mizumoto [5] argued, states and territories with the largest proportions of older populations (such as Florida, Maine, and Puerto Rico) have become the places with the largest number of confirmed cases. The spread of this pandemic in the USA has become a global concern.

Typically, susceptible-infected-recovered (SIR) based models (e.g., SIRD [6]), first proposed by Kermack and McKendrick [7], are widely used models due to their simplicity and good performance. However, these models (such as susceptible-infected-recovered-susceptible (SIRS), susceptible-exposed-infected-recovered (SEIR), and susceptible-exposed-infected-recovered-susceptible (SEIRS)) only take into account the tendency of the related epidemic transmission corresponding to one single region, and other useful information can hardly be uncovered, which includes the impacts derived from the place itself, other areas, and other exogenous variables. Some other models that are utilized to characterize epidemic pervasion are based on time series models. For instance, seasonal autoregressive integrated moving average (SARIMA) models were employed for modeling infectious disease count data in Helfenstein [8] and Trottier et al. [9]. Recently, different time series models (e.g., auto-regressive integrated moving average (ARIMA), the Holt–Winters additive model, and HWAAS) and machine learning approaches (e.g., Prophet, DeepAR, and N-Beats) have been adopted to analyze and compare the prediction accuracy of the percentage of active cases per population based on the COVID-19 data from ten countries with the highest number of confirmed cases as of 4 May 2020 [10].

Held et al. [11] proposed a space-time multivariate time series model (denoted as the HHH model) that can be applied to model multiple-unit cases where the “unit” can be different geographical regions, different age groups, or different epidemics caused by different pathogens. Motivated by the HHH model, Paul et al. [12] and Paul and Held [13] developed a spatio-temporal framework to jointly model several epidemics by considering the spatial interaction effect, as well as the time autoregressive effect. Their models have been applied to analyze the transmission of dengue fever in Guangdong Province in China in 2014 [14], malaria and cutaneous leishmaniasis in Afghanistan [15], hemorrhagic fever with renal syndrome in Zhejiang Province of China [16], and the effect of containment measures for COVID-19 in Italy [17].

One drawback of the model proposed by Paul et al. [12] and Paul and Held [13] is that it mainly takes care of the connection between the current number of infected cases and the previous numbers of infected cases and the adjacent units/areas, which may ignore other exogenous predictors. As a result, it is limited in its interpretability and applicability for infectious diseases such as COVID-19. In fact, recent studies (e.g., [18,19]) have pointed out that some weather/climate related variables show statistically significant associations with the transmission of COVID-19.

To overcome this limitation for the analysis of state-level time series of the COVID-19 contagion effect, we consider a spatio-temporal framework based on the multivariate time series model proposed by Paul et al. [12] and Paul and Held [13], which however decomposes the COVID-19 risk additively into autoregressive, spatiotemporal, exogenous, and endemic components. Briefly, the autoregressive and spatiotemporal components respectively describe the within-state and across-state COVID-19 risk effects. The exogenous component includes other factors that could affect future epidemic development risk, while the endemic component captures the function of time and other predictors mainly about individual states.

Briefly, some weather/climate related variables are carefully selected as exogenous factors in our analyses. Indeed, some climate/weather related variables have been shown to be correlated with epidemic transmission in the relevant literature. For instance, in a study of the influence of weather on the foot-and-mouth disease epidemic spread from 1967 to 1968, Hugh-Jones and Wright [20] argued that wind and precipitation played a major role in the spread of the disease, especially wind. According to Tan et al. [21], the environmental temperature can influence the spread of SARS. Qi et al. [18] found that the daily average temperature and daily average relative humidity are significantly negatively associated with the daily confirmed cases of COVID-19 in Hubei, China. Similarly, Tosepu et al. [19] found that the average seasonal temperature was significantly correlated with COVID-19 in Jakarta, Indonesia.

The rest of this article is organized as follows. In Section 2, we elaborate our spatio-temporal multivariate time series model, including the sub-models of each coefficient in the model. In Section 3, we employ our model for analyzing the COVID-19 count data of the USA and show our main findings. In Section 4 and Section 5, the discussion and conclusions are presented, respectively. We report the technical materials for parameter estimation in Appendix A.

2. Development of the Model

Models

Let $Y_{r, t}$ denote the number of infected cases in state $r$ at time point $t$, with $r = 1, \dots, R$ and $t = 1, \dots, T$. Usually, $Y_{r, t}$ is assumed to follow a Poisson distribution [22,23,24]. Since the number of infected cases in each state is rarely fully observed (i.e., there is heterogeneity across states), employing the Poisson assumption could underestimate the underlying dispersion. Here, we adopt the negative binomial distribution [11,12,14]. That is, suppose $Y_{r, t}$ follows a conditional negative binomial distribution, i.e.,

$$Y_{r, t} \mid Y_{\cdot, t - l}, V \sim \mathrm{NegBin}(μ_{r, t}, ε_{r})$$

for $r = 1, \dots, R$ and $t = 1, \dots, T$, with conditional mean $μ_{r, t}$ and conditional variance:

$$σ_{r, t}^{2} = μ_{r, t} (1 + ε_{r} μ_{r, t}),$$

where $Y_{\cdot, t - l}$ is the vector consisting of the numbers of infected cases of all states at time point $t - l$, $l \in \{1, \dots, T - 1\}$ is the time lag, $ε_{r}$ is the overdispersion parameter of state $r$, and $V$ is a random effect vector with $V \sim N(0, Σ)$, where $Σ = \mathrm{diag}\{σ_{(λ)}^{2}, σ_{(ψ)}^{2}, σ_{(θ)}^{2}, σ_{(ζ)}^{2}\} \otimes I_{R \times R}$, $\otimes$ is the Kronecker product, and $I_{R \times R}$ is an $R \times R$ identity matrix. It is easy to see that when $ε_{r}$ equals zero, the distribution of $Y_{r, t}$ reduces to a Poisson distribution, whereas the larger the value of $ε_{r}$, the greater the overdispersion. Thus, compared with the Poisson assumption, the negative binomial assumption has wider applicability.
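The mean-variance relationship above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it evaluates the negative binomial probability mass function in the $(μ, ε)$ parameterization and sums it to recover the mean and variance. The function names and the values of `mu` and `eps` are illustrative assumptions.

```python
import math

def negbin_log_pmf(y, mu, eps):
    """Log-probability of count y under NegBin(mu, eps) with variance mu * (1 + eps * mu)."""
    size = 1.0 / eps  # 1 / eps acts as the "size" parameter
    return (
        math.lgamma(y + size) - math.lgamma(y + 1) - math.lgamma(size)
        + y * math.log(mu * eps / (1.0 + mu * eps))
        - size * math.log(1.0 + mu * eps)
    )

def negbin_moments(mu, eps, y_max=2000):
    """Sum the pmf over 0..y_max to recover (total mass, mean, variance)."""
    probs = [math.exp(negbin_log_pmf(y, mu, eps)) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

# Hypothetical values chosen only for illustration.
mu, eps = 20.0, 0.5
total, mean, var = negbin_moments(mu, eps)
print(total, mean, var, mu * (1.0 + eps * mu))
```

With a smaller `eps` the numerically recovered variance moves toward `mu`, which is the Poisson limit described in the text.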

To embed other predictors in the distribution of $Y_{r, t}$, a hierarchical modeling procedure is employed here. In the first layer, the conditional mean $μ_{r, t}$ is formulated as follows:

$$μ_{r, t} = λ_{r, t} Y_{r, t - l} + ψ_{r, t} Ψ_{r, t - l} + θ_{r, t} Θ_{r, t - l} + ζ_{r, t},$$

(1)

where:

$$Ψ_{r, t - l} = \sum_{j \to r} ω_{r, j} Y_{j, t - l}, \quad Θ_{r, t - l} = \sum_{j} η_{r, j} x_{j, k, t - l}, \quad j = 1, \dots, R, \; k = 1, \dots, K.$$

Here, $x_{j, k, t}$ denotes the observation at time $t$ of the $k$-th exogenous factor in state $j$, which could have an influence on state $r$. $η_{r, j}$ is an indicator with the value being $1 / m_{j}$ if $x_{j, k}$ has influence on state $r$ and zero otherwise, where $m_{j}$ is the number of factors that have an influence on the number of cases of the $j$-th state. $j \to r$ indicates that states $j$ and $r$ are neighbors that share a common border. $ω_{r, j}$ is an indicator with the value being $1 / n_{j}$ if state $r$ is adjacent to state $j$ and zero otherwise, where $n_{j}$ is the number of states that have a common border with state $j$. Other choices of weights (i.e., $ω_{r, j}$) are also available in [12,13,14] and the references therein.
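To make the first-layer decomposition in (1) concrete, the sketch below evaluates the neighbor term $Ψ_{r, t - l}$ and the conditional mean on a toy chain of three states. It is an illustration under stated assumptions rather than the authors' code: the adjacency structure, counts, and coefficient values are hypothetical, and the exogenous term $Θ_{r, t - l}$ is passed in precomputed because its aggregation depends on which factors are selected for each state.

```python
def neighbour_term(y_prev, neighbours):
    """Psi_r = sum over neighbours j of r of Y_j / n_j, where n_j = number of neighbours of j."""
    big_psi = []
    for r in range(len(y_prev)):
        total = 0.0
        for j in neighbours[r]:
            total += y_prev[j] / len(neighbours[j])  # omega_{r,j} = 1 / n_j
        big_psi.append(total)
    return big_psi

def conditional_mean(y_prev, big_psi, big_theta, lam, psi, theta, zeta):
    """mu_r = lam_r * Y_r + psi_r * Psi_r + theta_r * Theta_r + zeta_r, as in (1)."""
    return [
        lam[r] * y_prev[r] + psi[r] * big_psi[r] + theta[r] * big_theta[r] + zeta[r]
        for r in range(len(y_prev))
    ]

# Hypothetical toy example: three states on a line, 0 - 1 - 2.
neighbours = {0: [1], 1: [0, 2], 2: [1]}
y_prev = [100, 40, 10]        # lagged counts Y_{r, t-l}
big_theta = [0.5, 0.0, -0.5]  # precomputed standardized exogenous term
big_psi = neighbour_term(y_prev, neighbours)
mu = conditional_mean(
    y_prev, big_psi, big_theta,
    lam=[0.8] * 3, psi=[0.1] * 3, theta=[-2.0] * 3, zeta=[5.0] * 3,
)
print(big_psi, mu)
```

Because $θ_{r, t}$ may be negative, a fitted model has to keep the resulting $μ_{r, t}$ positive; the toy values here were chosen so that it is.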

According to Giuliani et al. [25], $λ_{r, t} Y_{r, t - l}$, $ψ_{r, t} Ψ_{r, t - l}$, and $ζ_{r, t}$ are respectively called the epidemic-within component, epidemic-between component, and endemic component. In this paper, we adopt the terminology from Giuliani et al. [25], and we further call $θ_{r, t} Θ_{r, t - l}$ the epidemic-boosted component. It is noteworthy that $Ψ_{r, t - l}$, which is based on the space-time dimension, mainly includes the interaction information between one state and its neighboring states, while $Θ_{r, t - l}$ contains the correlation information of other exogenous factors between one state and its neighboring states. Compared with the model proposed in Paul et al. [12] and Paul and Held [13], our proposed model in (1) improves the interpretability, as well as the applicability.

In the second layer, for parameters $λ_{r, t}$, $ψ_{r, t}$, and $ζ_{r, t}$, we adopt the same strategy as given in Paul et al. [12] and Paul and Held [13]; that is, each parameter assumes the following log-linear form:

$$\log(\cdot_{r, t}) = α_{r}^{(\cdot)} + V_{r}^{(\cdot)} + {β^{(\cdot)}}^{⊤} z_{r, t}^{(\cdot)},$$

(2)

where $V^{(\cdot)}$ is assumed to have a multivariate normal distribution with zero mean and covariance matrix $σ_{(\cdot)}^{2} I_{R \times R}$, i.e., $V^{(\cdot)} \sim N(0, σ_{(\cdot)}^{2} I_{R \times R})$. We further discuss the formulation of each parameter as follows.

We first consider the autoregressive parameter $λ_{r, t}$. As suggested in Paul et al. [12], Cheng et al. [14], Adegboye et al. [15], and Wu et al. [16], $λ_{r, t}$ is formulated by the following log-linear form:

$$\log(λ_{r, t}) = α^{(λ)} + V_{r}^{(λ)},$$

(3)

where $α^{(λ)}$ is the intercept term and $V_{r}^{(λ)} \sim N(0, σ_{(λ)}^{2})$.

For $ψ_{r, t}$, it satisfies:

$$\log(ψ_{r, t}) = α^{(ψ)} + V_{r}^{(ψ)} + β_{1}^{(ψ)} \log(Pu_{r, t}),$$

(4)

where $α^{(ψ)}$ is the intercept term and $V_{r}^{(ψ)} \sim N(0, σ_{(ψ)}^{2})$. For the choice of $Pu_{r, t}$, unlike Paul et al. [12] and Paul and Held [13], Giuliani et al. [25] argued that it should be a variable that reflects the possible heterogeneous influence for different regions, and their choice was the population of a state. However, we believe that people of different ages may contribute very differently to the population's effect on infection. In this regard, we define $Pu_{r, t}$ as the number of people under 65 years old in state $r$ at time $t$, based on the fact that this group of people is more likely to travel to other places.

For $θ_{r, t}$, the log-linear formula may not be suitable, as such exogenous variables could have a positive or negative effect on epidemic transmission, i.e., the sign of $θ_{r, t}$ could be “+” or “−”. Here, we suppose that $θ_{r, t}$ follows a normal distribution with mean $α^{(θ)}$ and variance $σ_{(θ)}^{2}$, i.e.,

$$θ_{r, t} = α^{(θ)} + V_{r}^{(θ)},$$

(5)

where $V_{r}^{(θ)} \sim N(0, σ_{(θ)}^{2})$.

Giuliani et al. [25] employed a second-order polynomial log-linear regression to describe how the number of confirmed cases fluctuates over time. This is reasonable when the epidemic is in the early stage (i.e., $t$ is relatively small). However, as the pandemic develops, the reproduction number will inevitably tend to be small because the population of a specific state is limited. Thus, we suggest using an S-shaped growth curve, the logistic growth model, which was also employed to study age-specific case-fatality rates of COVID-19 in China and Italy [26]. That is, we consider:

$$\log(ζ_{r, t}) = α^{(ζ)} + V_{r}^{(ζ)} + β_{1}^{(ζ)} \log(\mathrm{logit}(t)) + β_{2}^{(ζ)} \log(Po_{r, t}),$$

(6)

where:

$$\mathrm{logit}(t) = {(1 + \exp\{- (β_{3}^{(ζ)} + β_{4}^{(ζ)} t)\})}^{- 1}$$

is a logistic function and $Po_{r, t}$ is defined as the number of people over 65 years old in state $r$ at time $t$.
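The saturating behavior of the endemic component in (6) can be seen in a small numerical sketch. The code below is illustrative only: the function names and all parameter values are hypothetical, and it simply evaluates $ζ_{r, t}$ from the log-linear form for one state.

```python
import math

def logistic(t, b3, b4):
    """The function written logit(t) in (6): 1 / (1 + exp(-(b3 + b4 * t)))."""
    return 1.0 / (1.0 + math.exp(-(b3 + b4 * t)))

def endemic(t, pop_over_65, alpha, v_r, b1, b2, b3, b4):
    """zeta_{r,t} from (6); equal to exp(alpha + v_r) * logistic(t)**b1 * pop_over_65**b2."""
    log_zeta = (
        alpha + v_r
        + b1 * math.log(logistic(t, b3, b4))
        + b2 * math.log(pop_over_65)
    )
    return math.exp(log_zeta)

# Hypothetical parameter values for a single state.
params = dict(pop_over_65=1.0e6, alpha=-2.0, v_r=0.1, b1=1.5, b2=0.3, b3=-3.0, b4=0.05)
for day in (0, 30, 90, 180):
    print(day, endemic(day, **params))
```

With positive `b1` and `b4`, the component grows with `t` and levels off at `exp(alpha + v_r) * pop_over_65 ** b2`, which is the S-shaped pattern that motivates replacing the second-order polynomial.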

3. Results

3.1. Data of Interest

Study area: Here, we consider the 50 states plus Washington, D.C. (DC), for our COVID-19 analyses. However, American Samoa, Guam, the Northern Mariana Islands, the Commonwealth of Puerto Rico, and the Virgin Islands are excluded from our study for simplicity.

COVID-19 data: We obtained the state-level confirmed cases data on COVID-19 in the USA from Kaggle, which are available from https://www.kaggle.com/sudalairajkumar/covid19-in-usa. Here, we are mainly interested in the cumulative positive cases, as this will be used for the calculation of the number of daily increased COVID-19 cases in the USA. On 14 March 2020, the U.S. President held a coronavirus conference, one day after he declared the pandemic a national emergency (https://www.rev.com/blog/transcripts/donald-trump-coronavirus-press-conference-transcript-march-14). For this reason, we consider those data starting from March 15th, 2020.

Weather/climate data: The state-level weather and climate data we use in this paper are openly available from Kaggle, which is fully powered by Dark Sky and can be downloaded from https://www.kaggle.com/eeemonts/weatherclimate-data-covid19?select=csv. The factors included in our analyses are the maximum temperature (MaT), minimum temperature (MiT), humidity (Hu), the probability of precipitation appearance (PA), the percentage of cloud coverage (CC), sea-level air pressure (AP), wind speed (WS), and the columnar density of total atmospheric ozone (CDTAO). Consistent with the COVID-19 data, we only consider weather/climate data starting from March 15th, 2020.

Population data: Both the state-level population data of 2019 and the state-level percentage of people over 65 years old were collected from the Population Reference Bureau (PRB), available from https://www.prb.org/usdata/indicator/age65/snapshot. Since the number of people over 65 years old is not directly available, we approximated it by multiplying the 2019 state-level population by the corresponding percentage of people over 65.

Figure 1 depicts the daily confirmed cases and the cumulative confirmed cases in the USA. One can see that the growth of confirmed cases in some states, such as Connecticut (CT), New Jersey (NJ), and New York (NY), has eased; in some states, like North Dakota (ND), the situation has become increasingly severe; and in others, outbreaks have tended to recur.

Figure 2 shows state-level subplots of the cumulative confirmed cases on September 15th, the daily confirmed cases on September 15th, the total populations in 2019, the populations over 65, and the populations under 65. It appears that both the cumulative confirmed cases and the daily confirmed cases have a strong consistency with the populations under 65, as well as the populations over 65. The appearance of a high infection rate among individuals under 65 could mean that there is a trend of younger people having severe COVID-19 infections in the USA as warned by Kass et al. [27], which can cause much worse situations (e.g., infecting more older adults) [28].

3.2. Weather and Environmental Factors’ Selection

To explore the association between weather/climate based factors and COVID-19 transmission, Tosepu et al. [19] applied the Spearman-rank correlation test, while Bashir et al. [29] utilized the Kendall and Spearman-rank correlation tests. Here, we first use the Kendall and Spearman-rank correlation tests to identify factors that have a significant correlation with daily confirmed cases.
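As an illustration of the rank-based screening described above, the sketch below computes Spearman's rank correlation for one factor against daily confirmed cases. It is a simplified stand-in and not the procedure used in this paper: the p-value comes from a permutation test rather than the usual asymptotic test, Kendall's test is omitted, and the data and function names are hypothetical. In practice, a statistics library would normally be used for both tests.

```python
import random

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for rho (assumption: replaces the asymptotic test)."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: maximum temperature and daily confirmed cases in one state.
max_temp = [10, 12, 15, 18, 21, 25, 28, 30]
daily_cases = [90, 85, 80, 60, 55, 40, 42, 20]
print(spearman_rho(max_temp, daily_cases), permutation_p_value(max_temp, daily_cases))
```

A factor would then be retained for a state when its p-value falls below the chosen threshold.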

Figure 3 shows the results from the Spearman and Kendall tests. According to the scatter plots, the p-values associated with MaT, MiT, Hu, CC, and CDTAO with respect to all states are mostly below 0.10, whereas those associated with AP and PA are mostly over 0.10. On the other hand, results from bar plots suggest that no particular factor has a remarkable association with the majority of the states. Briefly, AP and PA are associated with fewer than 25 states, while MaT, MiT, and CDTAO have a strong association with more than 40 states. According to Figure 4, it can be safely concluded that MaT, MiT, and CDTAO are the major factors that contribute to the strong associations in most of the states, and they are therefore taken into account for further modeling analyses.

3.3. Optimal Parameters’ Determination

To better fit the confirmed cases in different states, the optimal time lag l needs to be determined. Because the admissible range for l grows with the length of the series, point-wise optimization over all values is tedious and inefficient. Thus, we restricted l to the range from 1 to 14 days, which is the period the CDC suggested staying at home after one’s last contact with a person who has COVID-19 (https://www.cdc.gov/coronavirus/2019-ncov/if-you-are-sick/quarantine.html). In addition, each weather/climate factor was standardized by Studentization to mitigate the impact of the different units.
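The standardization step can be sketched as follows. This is a minimal example that assumes Studentization means subtracting the sample mean and dividing by the sample standard deviation (with the n − 1 denominator); the exact variant is not stated in the text, and the temperature values are hypothetical.

```python
import statistics

def studentize(values):
    """Center by the sample mean and scale by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: n - 1 denominator
    return [(v - mean) / sd for v in values]

# Hypothetical daily maximum temperatures for one state.
max_temp = [10.0, 12.0, 15.0, 21.0]
z = studentize(max_temp)
print(z)
```

After this step every factor is unit-free, so the factors can be combined in the exogenous term without one dominating merely because of its measurement scale.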

Table 1 summarizes the penalized log-likelihood values obtained for the different time lags. When $l = 2$, the average estimate of $l(π, v)$ has the largest value (i.e., 48,510,611.518) with the smallest sd (i.e., 301,043.787). Hence, $l = 2$ will be used in all subsequent analyses.

Table 2 shows the estimates of $π$ and $σ$ based on the configuration for time lag $l = 2$. From the estimates of the dispersion parameters $ε_{r}$, we observe that the daily confirmed cases of COVID-19 show obvious overdispersion in almost every state (especially in New Mexico (NM)), which confirms that using the negative binomial distribution is a sensible choice for analyzing the transmission of COVID-19 in the USA.

The estimates of $σ$ characterize the heterogeneity of COVID-19 transmission across states. According to the results in Table 2, there is spatial variation concerning the epidemic-within component with $\hat{σ}_{(λ)} = 0.851$, the epidemic-between component with $\hat{σ}_{(ψ)} = 0.836$, the epidemic-boosted component with $\hat{σ}_{(θ)} = 0.816$, and the endemic component with $\hat{σ}_{(ζ)} = 0.853$. Therefore, we believe that there is significant spatial heterogeneity in the epidemic-within, epidemic-between, epidemic-boosted, and endemic components.

3.4. Components Analysis

Figure 5 shows the state-level estimated random effects for the epidemic-within, epidemic-between, epidemic-boosted, and endemic components. Clearly, heterogeneity appears in all components, and the random effects from the four components have a significant effect in most of the states. Here, we mainly focus on random effects from the epidemic-within component and epidemic-boosted component. From Subplot (a) in Figure 5, the estimates of $α^{(λ)} + V^{(λ)}$ for most states are smaller than zero, which implies that community-level spread in most states has been considerably alleviated from the perspective of the epidemic-within component. Similarly, the estimates of $α^{(λ)} + V^{(λ)}$ are smaller than the negative estimate of $α^{(θ)}$ (see Subplot (c) in Figure 5), which indicates that higher values of weather/climate factors correspond to fewer daily confirmed cases.

Figure 6 and Figure 7 show the state-level percentage of daily confirmed cases attributed to each component on every single day and the state-level means of fitted values based on (1). The estimated values of the epidemic-within component are the highest among all components in most of the states (e.g., CA) as time goes on. This suggests that cross-state spread, weather/climate factors, and other unobserved factors have little impact on COVID-19 transmission, whereas a state's own earlier cases dominate the fluctuation of future infections in most states. For several states such as Connecticut (CT), New Hampshire (NH), and Rhode Island (RI), no particular component demonstrates a dominant effect on daily confirmed cases. According to Figure 1, in states with relatively few confirmed cases, the endemic component seems to be more dominant, as is also seen in CT, NH, and RI.

4. Discussion

In this article, we find that MaT, MiT, and CDTAO have statistically significant associations with daily confirmed cases in almost all the states in America, based on both the Kendall and Spearman-rank correlation tests. Furthermore, from the estimated coefficients of the epidemic-boosted component, we identify that this association is negative. That is, higher MaT, MiT, and CDTAO correspond to fewer daily confirmed cases. However, further analysis shows that a state's own earlier daily confirmed cases are generally the predominant driver of its subsequent confirmed cases, which suggests that states with a large number of confirmed cases tend to see more subsequent infections.

Recent research [18,19] has shown some evidence of the correlation between weather/climate factors and COVID-19 transmission. Our work, based on an extended multivariate time series model, further confirms and quantifies the existence of a similar relationship. Unlike some existing models (e.g., [10]), our model improves interpretability and practicability through an additional term that characterizes the degree of influence of external factors. However, one obvious drawback of our model is that a large number of unknown parameters need to be estimated, and the computational cost is therefore high. As a result, the effective sample size also needs to be sufficiently large. Recently, Kimball et al. [30] investigated a COVID-19 outbreak in a long-term care skilled nursing facility (SNF) in King County, Washington, recognized on 28 February 2020, and discovered that screening the SNF residents based on the symptoms related to this epidemic could not discover all SARS-CoV-2 infections, since they found that 23 (30.3%) workers had SARS-CoV-2 positive tests even if they were asymptomatic or presymptomatic on the testing day. Such cluster-based infections dramatically intensify transmission, which poses non-trivial challenges for analyses based on our model as well as on the other models mentioned previously. More recently, Rader et al.’s analysis [31] unveiled that “epidemics in crowded cities are more spread over time, and crowded cities have larger total attack rates than less populated cities”. Such phenomena further imply that an area with a higher population density could experience a much more severe outbreak, which is also a situation that requires further investigation.

5. Conclusions

As COVID-19 has become the most disastrous health event in the world, and especially in the USA, understanding the transmission pattern of this pandemic has become more urgent. Our analyses of the COVID-19 surveillance data depict remarkably heterogeneous transmission across states during the COVID-19 outbreak in the USA from 15 March 2020 to 15 September 2020. The degree of heterogeneity is characterized by random effects parameter estimates. With the Kendall and Spearman-rank correlation tests, we explore the association between weather/climate factors and daily confirmed COVID-19 cases for each state, which is further used for the analysis of the spatial and temporal occurrence of COVID-19.

Some interesting findings are noteworthy. First, the heterogeneity of COVID-19 transmission across states is observed in all four components, which implies that there are different situations in different states and the same strategies may not work perfectly to contain this pandemic in all states.

Second, some weather/climate factors (i.e., CC, Hu, MaT, MiT, and CDTAO) demonstrate significant correlations with daily confirmed cases in many states. In particular, MaT, MiT, and CDTAO have a strong association with most states. Based on the estimated coefficients from the epidemic-boosted component, one can further find that these variables are negatively correlated with daily confirmed cases in almost all states, i.e., higher MaT, MiT, and CDTAO correlate with fewer daily confirmed cases. This phenomenon suggests that changes in weather/climate conditions in the local and adjacent areas could affect the possibility of infection in this area.

Third, since the estimates of the epidemic-within component in most states are predominant, their corresponding values can represent the fluctuation of the daily confirmed cases. Since the estimated coefficients of the epidemic-within component in most states are smaller than one ($\exp(α^{(λ)} + V_{r}^{(λ)}) < 1$), the community-level spread of COVID-19 in most states is remarkably mitigated and the transmission intensity is decreased. Furthermore, we believe that the relatively large number of daily confirmed cases in the current stage are mainly due to the previous large number of infected cases, and the number of new confirmed cases per day will gradually decrease as time goes on.

Fourth, for the future tendency of the daily confirmed cases, the influence of the previous confirmed cases is the most important among the four components. This means that for states like Texas (TX), which has a large number of confirmed cases, the risk of a sharp increase in daily confirmed cases is still higher than in states with fewer confirmed cases. Therefore, regulations such as social distancing and wearing masks in public places remain necessary.

Finally, since the endemic components for some states (e.g., Vermont (VT)) show obvious predominance, other possible variables that could influence COVID-19 transmission need to be further determined.

Author Contributions

R.R. and M.T. contributed equally to this paper. Conceptualization, R.R. and M.T.; methodology, R.R. and M.T.; formal analysis, R.R.; data curation, R.R.; writing, original draft preparation, R.R.; writing, review and editing, R.R., M.T., M.-L.T., G.T.-S.H., and C.-H.W. All authors read and agreed to the published version of the manuscript.

Funding

The authors disclose receipt of the following financial support for the research, authorship, and/or publication of this article: The work was partially supported by the National Natural Science Foundation of China (No.11861042) and the China Statistical Research Project (No.2020LZ25). The work of Man-Lai Tang was partially supported through grants from the Research Grant Council of the Hong Kong Special Administrative Region (UGC/FDS14/P01/16, UGC/FDS14/P02/18, and the Research Matching Grant Scheme (RMGS)) and a grant from the National Natural Science Foundation of China (11871124).

Acknowledgments

We are grateful to three anonymous reviewers for their helpful comments and suggestions, which greatly improved the quality of this work. We also acknowledge the support of the Public Computing Cloud, Renmin University of China. The computing facilities/software were supported by SAS Viya and the Big Data Intelligence Centre at Hang Seng University of Hong Kong.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Materials for Parameter Estimation

Appendix A.1. Inference

We consider the process of model parameter estimation. Let $σ = (σ_{(λ)}, σ_{(ψ)}, σ_{(θ)}, σ_{(ζ)})^{⊤}$, $ε = (ε_{1}, \dots, ε_{R})^{⊤}$, $α = (α^{(λ)}, α^{(ψ)}, α^{(θ)}, α^{(ζ)})^{⊤}$, and $β = (β_{1}^{(ψ)}, β_{1}^{(ζ)}, β_{2}^{(ζ)}, β_{3}^{(ζ)}, β_{4}^{(ζ)})^{⊤}$ represent all unknown parameters. Let $v = ({v^{(λ)}}^{⊤}, {v^{(ψ)}}^{⊤}, {v^{(θ)}}^{⊤}, {v^{(ζ)}}^{⊤})^{⊤}$ denote the observations of random effects $V$.

Recall that $Y_{r, t} \mid Y_{\cdot, t - l}, V \sim NegBin (μ_{r, t}, ε_{r})$, and $V \sim N (0, Σ)$. The conditional joint probability function is then given as:

$$
\begin{aligned}
f (y_{\cdot, t}, v; α, β, ε, σ \mid y_{\cdot, t - l})
&= f (y_{\cdot, t}; α, β, ε, σ \mid v, y_{\cdot, t - l}) \, f (v; σ) \\
&= \Big\{ \prod_{r} \frac{Γ (y_{r, t} + \frac{1}{ε_{r}})}{Γ (y_{r, t} + 1) \, Γ (\frac{1}{ε_{r}})} \Big( \frac{μ_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big)^{y_{r, t}} \Big( \frac{1}{1 + μ_{r, t} ε_{r}} \Big)^{\frac{1}{ε_{r}}} \Big\} \\
&\quad \cdot (2 π)^{- 2 R} \, | Σ |^{- 1 / 2} \exp \Big( - \frac{1}{2} v^{⊤} Σ^{- 1} v \Big) \\
&\propto \Big\{ \prod_{r} \frac{Γ (y_{r, t} + \frac{1}{ε_{r}})}{Γ (\frac{1}{ε_{r}})} \Big( \frac{μ_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big)^{y_{r, t}} \Big( \frac{1}{1 + μ_{r, t} ε_{r}} \Big)^{\frac{1}{ε_{r}}} \Big\} \\
&\quad \cdot | Σ |^{- 1 / 2} \exp \Big( - \frac{1}{2} v^{⊤} Σ^{- 1} v \Big),
\end{aligned}
\tag{A1}
$$

Equivalently, the logarithm of the conditional likelihood function is:

$$
\begin{aligned}
l (π, v) \propto{} & \sum_{r, t} \Big\{ \log Γ \Big( y_{r, t} + \frac{1}{ε_{r}} \Big) - \log Γ \Big( \frac{1}{ε_{r}} \Big) + y_{r, t} \log \Big( \frac{μ_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big) \\
& \quad - \frac{1}{ε_{r}} \log (1 + μ_{r, t} ε_{r}) \Big\} - \frac{1}{2} \log | Σ | - \frac{1}{2} v^{⊤} Σ^{- 1} v,
\end{aligned}
\tag{A2}
$$

where $π = (α^{⊤}, β^{⊤}, ε^{⊤})^{⊤}$, $| \cdot |$ denotes the determinant of a matrix, and $Γ (\cdot)$ represents the gamma function.
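As a sanity check on the parameterization used in (A1), the following minimal sketch evaluates the negative binomial log-probability with `math.lgamma` and confirms numerically that the probabilities sum to one, with mean $μ$ and variance $μ + ε μ^{2}$. The function names and the values $μ = 12$, $ε = 0.7$ are illustrative assumptions and are not taken from the paper.

```python
import math


def negbin_logpmf(y: int, mu: float, eps: float) -> float:
    """Log-probability of y under NegBin(mu, eps) as written in (A1).

    mu is the mean and eps the overdispersion, so the 'size' is 1 / eps.
    """
    inv_eps = 1.0 / eps
    return (
        math.lgamma(y + inv_eps)
        - math.lgamma(y + 1.0)
        - math.lgamma(inv_eps)
        + y * math.log(mu * eps / (1.0 + mu * eps))
        - inv_eps * math.log(1.0 + mu * eps)
    )


def negbin_moments(mu: float, eps: float, y_max: int = 5000):
    """Total probability, mean and variance from a truncated sum over y."""
    probs = [math.exp(negbin_logpmf(y, mu, eps)) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


# Illustrative values only (assumed, not estimates from the paper).
total, mean, var = negbin_moments(mu=12.0, eps=0.7)
```

Working on the log scale with `lgamma` avoids forming the gamma function itself, which is the same motivation as the approximation introduced later in this appendix.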

Since (A2) includes the random effects, maximizing it with respect to $π$ and $v$ does not yield maximum likelihood estimates, but rather penalized likelihood estimates, or posterior mode estimates from a Bayesian viewpoint [32]. Furthermore, not only are the observations $v$ unavailable, but the values of the parameters ${\hat{σ}}_{(λ)}^{2}, {\hat{σ}}_{(ψ)}^{2}, {\hat{σ}}_{(θ)}^{2}$, and ${\hat{σ}}_{(ζ)}^{2}$ are also unknown. To address this issue, Paul and Held [13] proposed a method based on the Laplace approximation to estimate the covariance by maximizing the marginal likelihood with respect to $π$ and $ε$. It is noteworthy that their approach can be applied to general cases. In our setting, however, the covariance matrix $Σ$ is assumed to be diagonal with $V_{r}^{(\cdot)} \sim (0, σ_{(\cdot)}^{2} I), r = 1, \dots, R$, which allows a simpler plug-in estimate. If $R$ is sufficiently large, we can replace $σ_{(\cdot)}^{2}$ by its sample variance. In our case, $R$ represents the number of states in the USA, which can be safely assumed to be sufficiently large. Specifically, we have the sample variance as:

$$
{\hat{σ}}_{(\cdot)}^{2} = \frac{{v^{(\cdot)}}^{⊤} v^{(\cdot)}}{R - 1} .
\tag{A3}
$$

The estimate of $v^{⊤} Σ^{- 1} v$ then satisfies:

$$
\widehat{v^{⊤} Σ^{- 1} v} = v^{⊤} {\hat{Σ}}^{- 1} v = 4 (R - 1),
$$

and the estimate of $| Σ |$ is thus:

$$
\begin{aligned}
| \hat{Σ} | &= \big( {\hat{σ}}_{(λ)}^{2} \, {\hat{σ}}_{(ψ)}^{2} \, {\hat{σ}}_{(θ)}^{2} \, {\hat{σ}}_{(ζ)}^{2} \big)^{R} \\
&= \frac{\big[ ({v^{(λ)}}^{⊤} v^{(λ)}) ({v^{(ψ)}}^{⊤} v^{(ψ)}) ({v^{(θ)}}^{⊤} v^{(θ)}) ({v^{(ζ)}}^{⊤} v^{(ζ)}) \big]^{R}}{(R - 1)^{4 R}} .
\end{aligned}
$$

Therefore, (A2) can be simplified as:

$$
\begin{aligned}
l (π, v) \propto{} & \sum_{r, t} \Big\{ \log Γ \Big( y_{r, t} + \frac{1}{ε_{r}} \Big) - \log Γ \Big( \frac{1}{ε_{r}} \Big) \\
& \quad + y_{r, t} \log \Big( \frac{μ_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big) - \frac{1}{ε_{r}} \log (1 + μ_{r, t} ε_{r}) \Big\} \\
& - \frac{R}{2} \sum_{λ, ψ, θ, ζ} \log \big( {v^{(\cdot)}}^{⊤} v^{(\cdot)} \big) .
\end{aligned}
\tag{A4}
$$
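The plug-in step leading from (A2) to (A4) can be checked numerically. The sketch below draws hypothetical random effects, applies (A3) to each of the four components, and confirms that the quadratic form equals $4 (R - 1)$ and that $- \frac{1}{2} \log | \hat{Σ} | - \frac{1}{2} v^{⊤} {\hat{Σ}}^{- 1} v$ differs from the penalty in (A4) only by a constant in $v$. The value $R = 50$, the random draws, and all names are assumptions made for illustration.

```python
import math
import random


def plugin_quantities(v_blocks):
    """Apply (A3) per component; return R, v' Sigma_hat^{-1} v, log|Sigma_hat|.

    v_blocks maps a component name to its list of R random-effect values.
    """
    R = len(next(iter(v_blocks.values())))
    quad_form = 0.0
    log_det = 0.0
    for v in v_blocks.values():
        ss = sum(x * x for x in v)          # v^T v for this component
        sigma2_hat = ss / (R - 1)           # sample variance, (A3)
        quad_form += ss / sigma2_hat        # each component contributes R - 1
        log_det += R * math.log(sigma2_hat)
    return R, quad_form, log_det


rng = random.Random(0)
R = 50  # placeholder number of regions (assumption)
v_blocks = {
    name: [rng.gauss(0.0, 1.0) for _ in range(R)]
    for name in ("lambda", "psi", "theta", "zeta")
}

_, quad_form, log_det = plugin_quantities(v_blocks)

# Penalty term as it appears in (A4).
penalty_a4 = -0.5 * R * sum(
    math.log(sum(x * x for x in v)) for v in v_blocks.values()
)
# What is dropped when moving from (A2) to (A4); it should not depend on v.
dropped_constant = -0.5 * log_det - 0.5 * quad_form - penalty_a4
```

Under these assumptions the dropped term equals $2 R \log (R - 1) - 2 (R - 1)$ whatever the values of $v$, which is why it can be absorbed into the proportionality sign.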

Because the gamma function overflows for large arguments (e.g., $Γ (200) = \infty$ in the R software), we replace the gamma function $Γ (u)$ by Stirling’s approximation [33], i.e., $\sqrt{2 π} \exp (- u) u^{u - 1 / 2}$. One noticeable advantage of this replacement is that the logarithm can be calculated directly, rather than first evaluating the gamma function and then taking its logarithm, which significantly mitigates numerical overflow (e.g., $\log (Γ (u)) |_{u = 200} = \infty$, but $\log (\sqrt{2 π} \exp (- u) u^{u - 1 / 2}) |_{u = 200} = \frac{1}{2} \log (2 π) - u + (u - \frac{1}{2}) \log (u) |_{u = 200} \approx 857.933$ in the R software). Accordingly, (A4) can be approximated as:

$$
\begin{aligned}
l (π, v) \propto{} & \sum_{r, t} \Big\{ \Big( y_{r, t} + \frac{1}{ε_{r}} \Big) \log \Big( \frac{1 + y_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big) \\
& \quad + y_{r, t} \log (μ_{r, t}) - \frac{1}{2} \log (1 + ε_{r} y_{r, t}) \Big\} \\
& - \frac{R}{2} \sum_{λ, ψ, θ, ζ} \log \big( {v^{(\cdot)}}^{⊤} v^{(\cdot)} \big) .
\end{aligned}
\tag{A5}
$$
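The passage from (A4) to (A5) can be illustrated with a short numerical sketch. Substituting Stirling's formula for both gamma terms, the $\log ε_{r}$ contributions cancel and a term $- y_{r, t}$ remains, which is constant in the parameters and is dropped. The code below first reproduces the overflow example in Python (where `math.gamma(200.0)` raises `OverflowError` instead of returning infinity), then compares the summand of (A4), computed with `math.lgamma`, with the summand of (A5). The test values and function names are illustrative assumptions.

```python
import math


def stirling_log_gamma(u: float) -> float:
    """Logarithm of Stirling's approximation sqrt(2*pi) * exp(-u) * u**(u - 1/2)."""
    return 0.5 * math.log(2.0 * math.pi) - u + (u - 0.5) * math.log(u)


def a4_term(y: float, mu: float, eps: float) -> float:
    """Summand of (A4), using the exact log-gamma function."""
    inv_eps = 1.0 / eps
    return (
        math.lgamma(y + inv_eps)
        - math.lgamma(inv_eps)
        + y * math.log(mu * eps / (1.0 + mu * eps))
        - inv_eps * math.log(1.0 + mu * eps)
    )


def a5_term(y: float, mu: float, eps: float) -> float:
    """Summand of (A5), i.e. (A4) after Stirling's approximation."""
    inv_eps = 1.0 / eps
    return (
        (y + inv_eps) * math.log((1.0 + y * eps) / (1.0 + mu * eps))
        + y * math.log(mu)
        - 0.5 * math.log(1.0 + eps * y)
    )


# Overflow example from the text, transposed to Python.
try:
    math.gamma(200.0)
    overflowed = False
except OverflowError:
    overflowed = True

approx_200 = stirling_log_gamma(200.0)
exact_200 = math.lgamma(200.0)

# Gap between (A4) and (A5) once the dropped constant -y is restored.
y, eps = 300.0, 0.5  # illustrative values (assumption)
gap_small_mu = a4_term(y, 10.0, eps) - (a5_term(y, 10.0, eps) - y)
gap_large_mu = a4_term(y, 500.0, eps) - (a5_term(y, 500.0, eps) - y)
```

In this sketch the remaining gap is the Stirling error alone. It is the same for both values of $μ_{r, t}$, so the approximation leaves the dependence of the objective on $μ_{r, t}$ unchanged, while it does depend on $ε_{r}$ and grows when $1 / ε_{r}$ is small.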

To obtain estimates of $π$ and $v$ by maximizing (A5), we apply the Adam algorithm, which is efficient and accessible for multi-parameter optimization and requires only first-order gradients and little memory [34]. Its pseudo-code is shown in Algorithm A1 in Appendix A.3. Here, the initial values for the $v$’s, $α$’s, and $β$’s are generated from the standard normal distribution ($N (0, 1)$), while those for the $ε$’s are generated from the uniform distribution ($U (0.01, 1)$), which guarantees their non-negativity.
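A minimal sketch of this initialization, using only the Python standard library, is given below. The function name, the seed, and the value $R = 50$ are assumptions for illustration. The total number of initialized values is $4 R + 4 + 5 + R = 5 R + 9$, which matches the normalizing factor $\sqrt{5 R + 9}$ in the convergence check of Algorithm A1.

```python
import random


def initial_values(R: int, seed: int = 0) -> dict:
    """Draw starting values as described in the text.

    v (4 components x R regions), alpha (4) and beta (5) come from N(0, 1);
    eps (R values) comes from U(0.01, 1) so that it starts non-negative.
    """
    rng = random.Random(seed)
    return {
        "v": [rng.gauss(0.0, 1.0) for _ in range(4 * R)],
        "alpha": [rng.gauss(0.0, 1.0) for _ in range(4)],
        "beta": [rng.gauss(0.0, 1.0) for _ in range(5)],
        "eps": [rng.uniform(0.01, 1.0) for _ in range(R)],
    }


R = 50  # placeholder number of regions (assumption)
start = initial_values(R)
n_params = sum(len(values) for values in start.values())
```

Repeating the draw with different seeds gives the kind of multiple random starts summarized in Tables 1 and 2.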

Appendix A.2. First-Order Partial Gradients of the Penalized Log-Likelihood Function

From (A5), we have:

$$
\begin{aligned}
\frac{\partial l (π, v)}{\partial μ_{r, t}} &= \frac{y_{r, t}}{μ_{r, t}} - \Big( y_{r, t} + \frac{1}{ε_{r}} \Big) \frac{ε_{r}}{1 + μ_{r, t} ε_{r}}, \\
\frac{\partial μ_{r, t}}{\partial λ_{r, t}} &= y_{r, t - l}, \quad \frac{\partial μ_{r, t}}{\partial ψ_{r, t}} = Ψ_{r, t - l}, \quad \frac{\partial μ_{r, t}}{\partial θ_{r, t}} = Θ_{r, t - l}, \quad \frac{\partial μ_{r, t}}{\partial ζ_{r, t}} = 1, \\
\frac{\partial λ_{r, t}}{\partial α^{(λ)}} &= \frac{\partial λ_{r, t}}{\partial v_{r}^{(λ)}} = λ_{r, t}, \quad \frac{\partial ψ_{r, t}}{\partial α^{(ψ)}} = \frac{\partial ψ_{r, t}}{\partial v_{r}^{(ψ)}} = ψ_{r, t}, \\
\frac{\partial θ_{r, t}}{\partial α^{(θ)}} &= \frac{\partial θ_{r, t}}{\partial v_{r}^{(θ)}} = 1, \quad \frac{\partial ζ_{r, t}}{\partial α^{(ζ)}} = \frac{\partial ζ_{r, t}}{\partial v_{r}^{(ζ)}} = ζ_{r, t}, \\
\frac{\partial ψ_{r, t}}{\partial β_{1}^{(ψ)}} &= ψ_{r, t} \log (P u_{r, t}), \quad \frac{\partial ζ_{r, t}}{\partial β_{1}^{(ζ)}} = ζ_{r, t} \log (logit (t)), \quad \frac{\partial ζ_{r, t}}{\partial β_{2}^{(ζ)}} = ζ_{r, t} \log (P o_{r, t}), \\
\frac{\partial ζ_{r, t}}{\partial β_{3}^{(ζ)}} &= β_{1}^{(ζ)} (1 - logit (t)), \quad \frac{\partial ζ_{r, t}}{\partial β_{4}^{(ζ)}} = t β_{1}^{(ζ)} (1 - logit (t)) .
\end{aligned}
$$

Thus,

$$
\begin{aligned}
\frac{\partial l (π, v)}{\partial ε_{r}} &= \sum_{t} \Big\{ - \frac{1}{ε_{r}^{2}} \log \Big( \frac{1 + y_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big) + \Big( y_{r, t} + \frac{1}{ε_{r}} \Big) \Big( \frac{y_{r, t}}{1 + y_{r, t} ε_{r}} - \frac{μ_{r, t}}{1 + μ_{r, t} ε_{r}} \Big) - \frac{y_{r, t}}{2 (1 + ε_{r} y_{r, t})} \Big\}, \\
\frac{\partial l (π, v)}{\partial v_{r}^{(λ)}} &= \sum_{t} \frac{\partial l (π, v)}{\partial μ_{r, t}} \frac{\partial μ_{r, t}}{\partial λ_{r, t}} \frac{\partial λ_{r, t}}{\partial v_{r}^{(λ)}} - \frac{R \, v_{r}^{(λ)}}{\| v^{(λ)} \|^{2}} \\
&= \sum_{t} λ_{r, t} \, y_{r, t - l} \Big( \frac{y_{r, t}}{μ_{r, t}} - \frac{1 + y_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big) - \frac{R \, v_{r}^{(λ)}}{\| v^{(λ)} \|^{2}}, \\
\frac{\partial l (π, v)}{\partial v_{r}^{(ψ)}} &= \sum_{t} \frac{\partial l (π, v)}{\partial μ_{r, t}} \frac{\partial μ_{r, t}}{\partial ψ_{r, t}} \frac{\partial ψ_{r, t}}{\partial v_{r}^{(ψ)}} - \frac{R \, v_{r}^{(ψ)}}{\| v^{(ψ)} \|^{2}} \\
&= \sum_{t} ψ_{r, t} \, Ψ_{r, t - l} \Big( \frac{y_{r, t}}{μ_{r, t}} - \frac{1 + y_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big) - \frac{R \, v_{r}^{(ψ)}}{\| v^{(ψ)} \|^{2}}, \\
\frac{\partial l (π, v)}{\partial v_{r}^{(θ)}} &= \sum_{t} \frac{\partial l (π, v)}{\partial μ_{r, t}} \frac{\partial μ_{r, t}}{\partial θ_{r, t}} \frac{\partial θ_{r, t}}{\partial v_{r}^{(θ)}} - \frac{R \, v_{r}^{(θ)}}{\| v^{(θ)} \|^{2}} \\
&= \sum_{t} Θ_{r, t - l} \Big( \frac{y_{r, t}}{μ_{r, t}} - \frac{1 + y_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big) - \frac{R \, v_{r}^{(θ)}}{\| v^{(θ)} \|^{2}}, \\
\frac{\partial l (π, v)}{\partial v_{r}^{(ζ)}} &= \sum_{t} \frac{\partial l (π, v)}{\partial μ_{r, t}} \frac{\partial μ_{r, t}}{\partial ζ_{r, t}} \frac{\partial ζ_{r, t}}{\partial v_{r}^{(ζ)}} - \frac{R \, v_{r}^{(ζ)}}{\| v^{(ζ)} \|^{2}} \\
&= \sum_{t} ζ_{r, t} \Big( \frac{y_{r, t}}{μ_{r, t}} - \frac{1 + y_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big) - \frac{R \, v_{r}^{(ζ)}}{\| v^{(ζ)} \|^{2}},
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial l (π, v)}{\partial π} &= \sum_{r, t} \frac{\partial l (π, v)}{\partial μ_{r, t}} \frac{\partial μ_{r, t}}{\partial π} \\
&= \sum_{r, t} \Big( \frac{y_{r, t}}{μ_{r, t}} - \frac{1 + y_{r, t} ε_{r}}{1 + μ_{r, t} ε_{r}} \Big) \big( λ_{r, t}, \; ψ_{r, t}, \; 1, \; ζ_{r, t}, \; ψ_{r, t} \log (P u_{r, t}), \; ζ_{r, t} \log (logit (t)), \\
&\qquad ζ_{r, t} \log (P o_{r, t}), \; β_{1}^{(ζ)} (1 - logit (t)), \; t β_{1}^{(ζ)} (1 - logit (t)) \big)^{⊤},
\end{aligned}
$$

where:

$$
logit (t) = \big( 1 + \exp \{ - (β_{3}^{(ζ)} + β_{4}^{(ζ)} t) \} \big)^{- 1} .
$$

Thus,

$$
\nabla_{(π, v)} l (π, v) = \Big( \Big( \frac{\partial l (π, v)}{\partial π} \Big)^{⊤}, \; \frac{\partial l (π, v)}{\partial v_{r}^{(λ)}}, \; \frac{\partial l (π, v)}{\partial v_{r}^{(ψ)}}, \; \frac{\partial l (π, v)}{\partial v_{r}^{(θ)}}, \; \frac{\partial l (π, v)}{\partial v_{r}^{(ζ)}} \Big)^{⊤} .
\tag{A6}
$$
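Analytic gradients of this kind are easy to get wrong, so a finite-difference check is a useful safeguard before running the optimizer. The sketch below compares the derivatives of a single summand of (A5) with respect to $μ_{r, t}$ and $ε_{r}$, as given in this appendix, against central differences. The evaluation point and all function names are illustrative assumptions. The component-specific chain-rule factors are not checked here because they depend on the model definition in Section 2.

```python
import math


def a5_term(y: float, mu: float, eps: float) -> float:
    """One (r, t) summand of (A5), without the random-effect penalty."""
    return (
        (y + 1.0 / eps) * math.log((1.0 + y * eps) / (1.0 + mu * eps))
        + y * math.log(mu)
        - 0.5 * math.log(1.0 + eps * y)
    )


def d_term_d_mu(y: float, mu: float, eps: float) -> float:
    """Derivative with respect to mu: y/mu - (1 + y*eps)/(1 + mu*eps)."""
    return y / mu - (1.0 + y * eps) / (1.0 + mu * eps)


def d_term_d_eps(y: float, mu: float, eps: float) -> float:
    """Derivative with respect to eps, term by term as in Appendix A.2."""
    return (
        -math.log((1.0 + y * eps) / (1.0 + mu * eps)) / eps ** 2
        + (y + 1.0 / eps) * (y / (1.0 + y * eps) - mu / (1.0 + mu * eps))
        - y / (2.0 * (1.0 + eps * y))
    )


def central_diff(f, x: float, h: float = 1e-6) -> float:
    """Second-order central finite difference of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Illustrative evaluation point (assumption).
y, mu, eps = 37.0, 25.0, 0.6
numeric_mu = central_diff(lambda m: a5_term(y, m, eps), mu)
numeric_eps = central_diff(lambda e: a5_term(y, mu, e), eps)
analytic_mu = d_term_d_mu(y, mu, eps)
analytic_eps = d_term_d_eps(y, mu, eps)
```

The derivative with respect to $μ_{r, t}$ vanishes at $μ_{r, t} = y_{r, t}$, which is the expected behaviour for a likelihood whose mean parameter is $μ_{r, t}$.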

Appendix A.3. Pseudo Algorithm

The algorithm used to optimize the parameters in (A5) is as follows.

Algorithm A1. Adam-based method for parameter optimization. Good default settings for the analyzed COVID-19 dataset are learning rate $η = 0.05$, exponential decay rates $b_{1} = 0.1$ and $b_{2} = 0.1$, and $γ = 1 \times 10^{- 8}$. Algorithm tolerance: $tol = 1 \times 10^{- 4}$. All operations are element-wise.

Initialization: maxit = 200 (maximum iteration steps), flag = 0 (convergence indicator), $e_{0}^{(1)} = 0$ (first moment vector), $e_{0}^{(2)} = 0$ (second moment vector), $ϵ = 0$ (iteration-step indicator), $(π_{0}, v_{0})$ (initial parameter values), $a_{0} = 0$, $τ_{0} = 0$.

Iteration process:

while $ϵ \le$ maxit and flag = 0 do
    $ϵ = ϵ + 1$
    $g_{ϵ} = \nabla_{(π, v)} l (π_{ϵ - 1}, v_{ϵ - 1})$ (gradients of $l (π, v)$ shown in (A6) in Appendix A.2)
    $e_{ϵ}^{(1)} = \frac{b_{1} e_{ϵ - 1}^{(1)} - (1 - b_{1}) g_{ϵ}}{1 - b_{1}^{ϵ}}$ (bias-corrected first moment estimate)
    $e_{ϵ}^{(2)} = \frac{b_{2} e_{ϵ - 1}^{(2)} + (1 - b_{2}) g_{ϵ}^{2}}{1 - b_{2}^{ϵ}}$ (bias-corrected second raw moment estimate)
    $τ_{ϵ} = (π_{ϵ - 1}, v_{ϵ - 1}) - η \frac{e_{ϵ}^{(1)}}{\sqrt{e_{ϵ}^{(2)}} + γ}$ (temporarily updated parameters)
    $(π_{ϵ}, v_{ϵ}) = b_{2} a_{ϵ - 1} + (1 - b_{2}) τ_{ϵ}$ (updated parameters)
    $a_{ϵ} = (1 - \frac{1}{ϵ}) a_{ϵ - 1} + \frac{1}{ϵ} τ_{ϵ}$ (averaged parameters for further iteration)
    if $\frac{| (π_{ϵ}, v_{ϵ}) - (π_{ϵ - 1}, v_{ϵ - 1}) |}{\sqrt{5 R + 9}} < tol$ then (convergence determination)
        flag = 1
    end if
end while
return $(π_{ϵ}, v_{ϵ})$ (optimal estimates)
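For readers who want a runnable starting point, the sketch below implements the textbook Adam update of Kingma and Ba [34], written as gradient ascent because (A5) is maximized. It is not a transcription of Algorithm A1: it omits the parameter-averaging steps ($a_{ϵ}$, $τ_{ϵ}$), applies the bias correction outside the moment recursions, and uses the decay rates 0.9 and 0.999 from [34] rather than $b_{1} = b_{2} = 0.1$. The stopping rule mirrors the step-size criterion of Algorithm A1. The toy concave objective and all names are assumptions for illustration.

```python
import math


def adam_ascent(grad, x0, lr=0.05, b1=0.9, b2=0.999, gamma=1e-8,
                max_iter=200, tol=1e-4):
    """Maximize a function with standard Adam, given its gradient `grad`.

    Stops early when the root-mean-square parameter change falls below tol.
    """
    x = list(x0)
    m = [0.0] * len(x)  # first moment estimate
    s = [0.0] * len(x)  # second raw moment estimate
    for step in range(1, max_iter + 1):
        g = grad(x)
        m = [b1 * mi + (1.0 - b1) * gi for mi, gi in zip(m, g)]
        s = [b2 * si + (1.0 - b2) * gi * gi for si, gi in zip(s, g)]
        m_hat = [mi / (1.0 - b1 ** step) for mi in m]  # bias correction
        s_hat = [si / (1.0 - b2 ** step) for si in s]
        # Ascent step: move along the (rescaled) gradient direction.
        x_new = [xi + lr * mh / (math.sqrt(sh) + gamma)
                 for xi, mh, sh in zip(x, m_hat, s_hat)]
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x)))
        x = x_new
        if change / math.sqrt(len(x)) < tol:
            break
    return x


# Toy concave objective l(x) = -sum((x_i - c_i)^2) with a known maximizer c.
TARGET = [1.0, -2.0]


def grad_toy(x):
    """Gradient of the toy objective."""
    return [-2.0 * (xi - ci) for xi, ci in zip(x, TARGET)]


# tol=0.0 disables early stopping so that the run length is fixed.
estimate = adam_ascent(grad_toy, [0.0, 0.0], max_iter=2000, tol=0.0)
```

To apply the same routine to (A5), `grad` would return the gradient in (A6) evaluated at the current $(π, v)$, and any constraint such as $ε_{r} > 0$ would need to be handled separately, for example by optimizing $\log ε_{r}$.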

References

1. Gorbalenya, A.E.; Baker, S.C. The species Severe acute respiratory syndrome-related coronavirus: Classifying 2019-nCoV and naming it SARS-CoV-2. Nat. Microbiol. 2020, 5, 536.
2. Le, T.T.; Andreadakis, Z.; Kumar, A.; Roman, R.G.; Tollefsen, S.; Saville, M.; Mayhew, S. The COVID-19 vaccine development landscape. Nat. Rev. Drug Discov. 2020, 19, 305–306.
3. Graham, B.S. Rapid COVID-19 vaccine development. Science 2020, 368, 945–946.
4. Suganya, S.; Divya, S.; Parani, M. Severe acute respiratory syndrome-coronavirus-2: Current advances in therapeutic targets and drug development. Rev. Med Virol. 2020.
5. Chowell, G.; Mizumoto, K. The COVID-19 pandemic in the USA: What might we expect? Lancet 2020, 395, 1093–1094.
6. Rui, R.; Tian, M. Joint Estimation of Case Fatality Rate of COVID-19 and Power of Quarantine Strategy Performed in Wuhan, China. Biom. J. 2020, 63, 46–58.
7. Kermack, W.O.; McKendrick, A.G. A contribution to the mathematical theory of epidemics. Proc. R. Soc. London. Ser. A Contain. Pap. Math. Phys. Character 1927, 115, 700–721.
8. Helfenstein, U. Box-Jenkins modeling of some viral infectious diseases. Stat. Med. 1986, 5, 37–47.
9. Trottier, H.; Philippe, P.; Roy, R. Stochastic modeling of empirical time series of childhood infectious diseases data before and after mass vaccination. Emerg. Themes Epidemiol. 2006, 3, 9.
10. Papastefanopoulos, V.; Linardatos, P.; Kotsiantis, S. COVID-19: A Comparison of Time Series Methods to Forecast Percentage of Active Cases per Population. Appl. Sci. 2020, 10, 3880.
11. Held, L.; Höhle, M.; Hofmann, M. A statistical framework for the analysis of multivariate infectious disease surveillance counts. Stat. Model. 2005, 5, 187–199.
12. Paul, M.; Held, L.; Toschke, A.M. Multivariate modeling of infectious disease surveillance data. Stat. Med. 2008, 27, 6250–6267.
13. Paul, M.; Held, L. Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Stat. Med. 2011, 30, 1118–1136.
14. Cheng, Q.; Lu, X.; Wu, J.T.; Liu, Z.; Huang, J. Analysis of heterogeneous dengue transmission in Guangdong in 2014 with multivariate time series model. Sci. Rep. 2016, 6, 33755.
15. Adegboye, O.; Al-Saghir, M.; Leung, D.H. Joint spatial time-series epidemiological analysis of malaria and cutaneous leishmaniasis infection. Epidemiol. Infect. 2017, 145, 685–700.
16. Wu, H.; Wang, X.; Xue, M.; Wu, C.; Lu, Q.; Ding, Z.; Zhai, Y.; Lin, J. Spatial-temporal characteristics and the epidemiology of hemorrhagic fever with renal syndrome from 2007 to 2016 in Zhejiang Province, China. Sci. Rep. 2018, 8, 1–14.
17. Dickson, M.M.; Espa, G.; Giuliani, D.; Santi, F.; Savadori, L. Assessing the effect of containment measures on the spatio-temporal dynamic of COVID-19 in Italy. Nonlinear Dyn. 2020, 101, 1833–1846.
18. Qi, H.; Xiao, S.; Shi, R.; Ward, M.P.; Chen, Y.; Tu, W.; Su, Q.; Wang, W.; Wang, X.; Zhang, Z. COVID-19 transmission in Mainland China is associated with temperature and humidity: A time-series analysis. Sci. Total Environ. 2020, 728, 138778.
19. Tosepu, R.; Gunawan, J.; Effendy, D.S.; Lestari, H.; Bahar, H.; Asfian, P. Correlation between weather and Covid-19 pandemic in Jakarta, Indonesia. Sci. Total Environ. 2020, 725, 138436.
20. Hugh-Jones, M.; Wright, P. Studies on the 1967–8 foot-and-mouth disease epidemic: The relation of weather to the spread of disease. Epidemiol. Infect. 1970, 68, 253–271.
21. Tan, J.; Mu, L.; Huang, J.; Yu, S.; Chen, B.; Yin, J. An initial investigation of the association between the SARS outbreak and weather: With the view of the environmental temperature and its variation. J. Epidemiol. Community Health 2005, 59, 186–192.
22. Verdasca, J.; Da Gama, M.T.; Nunes, A.; Bernardino, N.; Pacheco, J.; Gomes, M. Recurrent epidemics in small world networks. J. Theor. Biol. 2005, 233, 553–561.
23. Hassen, H.B.; Elaoud, A.; Salah, N.B.; Masmoudi, A. A SIR-Poisson Model for COVID-19: Evolution and Transmission Inference in the Maghreb Central Regions. Arab. J. Sci. Eng. 2021, 46, 93–102.
24. Read, J.M.; Bridgen, J.R.; Cummings, D.A.; Ho, A.; Jewell, C.P. Novel coronavirus 2019-nCoV: Early estimation of epidemiological parameters and epidemic predictions. MedRxiv 2020.
25. Giuliani, D.; Dickson, M.M.; Espa, G.; Santi, F. Modelling and Predicting the Spatio-Temporal Spread of Coronavirus Disease 2019 (COVID-19) in Italy; University of Trento: Trento, Italy, 2020.
26. Gao, X.; Dong, Q. A logistic model for age-specific COVID-19 case-fatality rates. JAMIA Open 2020, 3, 151–153.
27. Kass, D.A.; Duggal, P.; Cingolani, O. Obesity could shift severe COVID-19 disease to younger ages. Lancet 2020, 395, 1544–1545.
28. Harris, J.E. Data from the COVID-19 epidemic in Florida suggest that younger cohorts have been transmitting their infections to less socially mobile older adults. Rev. Econ. Househ. 2020, 18, 1019–1037.
29. Bashir, M.F.; Ma, B.; Komal, B.; Bashir, M.A.; Tan, D.; Bashir, M. Correlation between climate indicators and COVID-19 pandemic in New York, USA. Sci. Total Environ. 2020, 728, 138835.
30. Kimball, A.; Hatfield, K.M.; Arons, M.; James, A.; Taylor, J.; Spicer, K.; Bardossy, A.C.; Oakley, L.P.; Tanwar, S.; Chisty, Z.; et al. Asymptomatic and presymptomatic SARS-CoV-2 infections in residents of a long-term care skilled nursing facility—King County, Washington, March 2020. Morb. Mortal. Wkly. Rep. 2020, 69, 377.
31. Rader, B.; Scarpino, S.V.; Nande, A.; Hill, A.L.; Adlam, B.; Reiner, R.C.; Pigott, D.M.; Gutierrez, B.; Zarebski, A.E.; Shrestha, M.; et al. Crowding and the shape of COVID-19 epidemics. Nat. Med. 2020, 26, 1829–1834.
32. Kneib, T.; Fahrmeir, L. A mixed model approach for geoadditive hazard regression. Scand. J. Stat. 2007, 34, 207–228.
33. Pearson, K. Historical note on the origin of the normal curve of errors. Biometrika 1924, 16, 402–404.
34. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.

Figure 1. Original data of state-level daily positive test cases in the USA from https://www.kaggle.com/sudalairajkumar/covid19-in-usa (excluding Washington, D.C.). The black line indicates the related daily confirmed cases and the red line the cumulative confirmed cases. The names of all states are denoted by the related postal codes.

Figure 2. State-level population related information. We took the logarithm of the original data for better visualization.

Figure 3. State-level Spearman and Kendall test results. Scatter plots show the related p-value, and bar plots show the related number of states that are significantly influenced by related factors with $α = 0.01, 0.05$, and $0.10$, respectively, which are drawn with dotted black lines. The white dotted lines in the bar plots are equal to 25, which is approximately half of the total number of analyzed states. Abbreviations: maximum temperature (MaT), minimum temperature (MiT), humidity (Hu), the probability of precipitation appearance (PA), the percentage of cloud coverage (CC), sea-level air pressure (AP), wind speed (WS), and the columnar density of total atmospheric ozone (CDTAO).

Figure 4. State-level combination of Spearman and Kendall tests. Two-dimensional barcode plots show whether a factor is significantly associated with a state by both the Spearman and Kendall tests (with squares in dark green color). The bar plot represents the number of states that are significantly associated with each factor with $α = 0.01, 0.05$, and $0.10$, respectively, by both the Spearman and Kendall tests. The horizontal line in the top subplot is equal to 25, which is approximately half of the total number of analyzed states.

Figure 5. State-level estimated random effects in the multivariate time series model for: (a) epidemic-within component $V^{(λ)}$; (b) epidemic-between component $V^{(ψ)}$; (c) epidemic-boosted component $V^{(θ)}$; and (d) endemic component $V^{(ζ)}$. Vertical dotted lines in (a,b) are $- α^{(λ)}$ and $- α^{(θ)}$. There is a strong variation in all four components, and different random effects for different states have various influences.

Figure 6. State-level daily percentage of confirmed cases attributed to the epidemic-within, epidemic-between, epidemic-boosted, and endemic components, which are indicated by red, green, blue, and black lines, respectively.

Figure 7. State-level estimated means of daily confirmed cases and related 90% confidence intervals. The gray band represents the 90% confidence interval, the white line the observations of the daily confirmed cases, and the red line the estimated mean values.

Table 1. Penalized log-likelihood values for different time lags, reported as the mean and standard deviation (sd). For each lag, 30 random initializations were run, and the mean and sd are calculated over these 30 repetitions.

Lags	$l = 1$	$l = 2$	$l = 3$	$l = 4$	$l = 5$	$l = 6$	$l = 7$
$mean (l (π, v))$	47512882.850	48510611.518	48188861.487	47895162.927	47783167.600	47998732.406	47976026.240
$sd (l (π, v))$	1600987.450	301043.787	923652.391	849380.316	1297418.980	825285.208	1055666.220
Lags	$l = 8$	$l = 9$	$l = 10$	$l = 11$	$l = 12$	$l = 13$	$l = 14$
$mean (l (π, v))$	47747910.670	47946826.823	47493940.570	47630036.889	47306645.479	47309743.565	47115278.420
$sd (l (π, v))$	1199672.290	643524.434	1288241.920	773951.669	920626.589	673304.655	1158747.060

Table 2. Optimal parameter estimates with time lag $l = 2$. One hundred random initializations were run, and the mean and standard deviation (sd) are calculated over the 100 outcomes.

Estimates	$σ_{(λ)}$	$σ_{(ψ)}$	$σ_{(θ)}$	$σ_{(ζ)}$	$α^{(λ)}$	$α^{(ψ)}$	$α^{(θ)}$	$α^{(ζ)}$
mean	0.851	0.836	0.816	0.853	0.220	−0.136	0.173	−0.233
sd	0.121	0.113	0.107	0.124	0.911	0.929	0.958	0.991
estimates	$β_{1}^{(ψ)}$	$β_{1}^{(ζ)}$	$β_{2}^{(ζ)}$	$β_{3}^{(ζ)}$	$β_{4}^{(ζ)}$	$ε_{A L}$	$ε_{A K}$	$ε_{A Z}$
mean	−0.300	0.232	−0.064	−0.119	−0.076	0.731	0.774	0.764
sd	0.873	1.068	1.004	1.180	1.085	0.285	0.286	0.269
estimates	$ε_{A R}$	$ε_{C A}$	$ε_{C O}$	$ε_{C T}$	$ε_{D E}$	$ε_{F L}$	$ε_{G A}$	$ε_{H I}$
mean	0.739	0.697	0.754	0.776	0.744	0.732	0.730	0.693
sd	0.286	0.296	0.284	0.270	0.285	0.301	0.275	0.297
estimates	$ε_{I D}$	$ε_{I L}$	$ε_{I N}$	$ε_{I A}$	$ε_{K S}$	$ε_{K Y}$	$ε_{L A}$	$ε_{M E}$
mean	0.707	0.673	0.781	0.715	0.696	0.751	0.816	0.784
sd	0.253	0.278	0.284	0.285	0.292	0.281	0.296	0.290
estimates	$ε_{M D}$	$ε_{M A}$	$ε_{M I}$	$ε_{M N}$	$ε_{M S}$	$ε_{N J}$	$ε_{N M}$	$ε_{N Y}$
mean	0.780	0.658	0.740	0.724	0.758	0.708	0.858	0.780
sd	0.301	0.291	0.296	0.289	0.308	0.291	0.273	0.306
estimates	$ε_{M O}$	$ε_{M T}$	$ε_{N E}$	$ε_{N V}$	$ε_{N H}$	$ε_{N C}$	$ε_{N D}$	$ε_{O H}$
mean	0.716	0.757	0.732	0.780	0.776	0.793	0.769	0.710
sd	0.299	0.281	0.298	0.303	0.284	0.273	0.284	0.301
estimates	$ε_{O K}$	$ε_{O R}$	$ε_{P A}$	$ε_{R I}$	$ε_{T X}$	$ε_{S C}$	$ε_{S D}$	$ε_{T N}$
mean	0.667	0.732	0.742	0.763	0.812	0.765	0.746	0.729
sd	0.294	0.277	0.306	0.283	0.296	0.279	0.273	0.290
estimates	$ε_{U T}$	$ε_{V T}$	$ε_{V A}$	$ε_{W A}$	$ε_{W V}$	$ε_{W I}$	$ε_{W Y}$	$ε_{D C}$
mean	0.736	0.742	0.687	0.757	0.805	0.779	0.727	0.791
sd	0.291	0.287	0.327	0.283	0.287	0.300	0.289	0.296

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Rui, R.; Tian, M.; Tang, M.-L.; Ho, G.T.-S.; Wu, C.-H. Analysis of the Spread of COVID-19 in the USA with a Spatio-Temporal Multivariate Time Series Model. Int. J. Environ. Res. Public Health 2021, 18, 774. https://doi.org/10.3390/ijerph18020774
