1. Introduction
Time series models have become increasingly popular for data analysis in various scientific fields. The autoregressive moving average (ARMA) model [1] is widely used for modeling univariate time series. However, it is not suitable for all types of data: in many cases, real-world data do not satisfy the normality assumption that underlies the estimation of ARMA model parameters [2]. Consequently, recent literature has introduced non-Gaussian time series models that assume other probability distributions.
A general time series model, known as the generalized autoregressive moving average (GARMA) model, was proposed in [3] as an extension of generalized linear models [4], specifically designed for dependent variables belonging to the canonical exponential family. Building upon similar approaches, the authors of [5] developed dynamic models using the beta family distribution, while [6] introduced a dynamic class of models for double-bounded interval data following the Kumaraswamy distribution. The authors of [7] proposed a dynamic regression model based on the Conway–Maxwell–Poisson distribution, and [8] presented a new generalized autoregressive moving average model based on the Bernoulli geometric distribution. Other recent contributions in this field include [9,10]. As a comprehensive reference for non-Gaussian dynamic regression, see [11].
Although numerous time series models have been published in the literature, few are specifically designed to handle continuous, asymmetric, and non-negative data. Given these circumstances, this work proposes a dynamic model based on the Chen distribution [12]. This distribution is very flexible and has garnered attention from the scientific community, as evidenced in [13,14,15]. The Chen distribution, which is characterized by shape parameters $\lambda > 0$ and $\delta > 0$ and has support in the positive real numbers $\mathbb{R}^{+}$, is defined by its probability density function [12]:
$$f(y; \lambda, \delta) = \lambda \delta y^{\lambda - 1} \exp\left\{ \delta \left[ 1 - \exp(y^{\lambda}) \right] + y^{\lambda} \right\}, \quad y > 0.$$
The corresponding cumulative and quantile functions are respectively expressed by:
$$F(y; \lambda, \delta) = 1 - \exp\left\{ \delta \left[ 1 - \exp(y^{\lambda}) \right] \right\}$$
and
$$Q(u; \lambda, \delta) = \left[ \log\left( 1 - \frac{\log(1 - u)}{\delta} \right) \right]^{1/\lambda}, \quad 0 < u < 1.$$
The original formulation of the Chen distribution relies on the parameters $\lambda$ and $\delta$, which may not have direct interpretability. However, for the purpose of regression and/or time series modeling, it is more convenient to directly model the mean or median parameter of the distribution. Mean-based regression models are commonly employed for the modeling of response variables, but when the variable of interest exhibits asymmetric behavior, a median-based approach is the more robust alternative. Hence, in this work, we introduce a median-based reparameterization of the Chen distribution, which will serve as the foundation for a new flexible dynamic regression model for positive continuous data.
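As a minimal sketch of the three functions of the Chen distribution discussed above, the following Python code evaluates the density, the cumulative distribution function, and the quantile function in their standard form. The function names and the argument names `lam` and `delta` (for the two shape parameters) are labels chosen here for illustration and are not taken from the paper. The use of `math.expm1` and `math.log1p` is only a numerical convenience.

```python
import math


def chen_pdf(y, lam, delta):
    """Chen density at y > 0 for shape parameters lam > 0 and delta > 0."""
    yl = y ** lam
    # delta * (1 - exp(yl)) is written as -delta * expm1(yl)
    return lam * delta * y ** (lam - 1.0) * math.exp(-delta * math.expm1(yl) + yl)


def chen_cdf(y, lam, delta):
    """Chen cumulative distribution function: 1 - exp(delta * (1 - exp(y**lam)))."""
    return -math.expm1(-delta * math.expm1(y ** lam))


def chen_quantile(u, lam, delta):
    """Chen quantile function for 0 < u < 1, obtained by inverting chen_cdf."""
    return math.log1p(-math.log1p(-u) / delta) ** (1.0 / lam)


if __name__ == "__main__":
    lam, delta = 0.7, 0.5  # illustrative values only
    med = chen_quantile(0.5, lam, delta)
    print("median:", med, "cdf at median:", chen_cdf(med, lam, delta))
```

Because the quantile function has a closed form, random values can be drawn by applying `chen_quantile` to uniform random numbers.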
We introduce a novel class of dynamic regression models, the Chen autoregressive moving average (CHARMA) model, designed for asymmetric, continuous, positive, and time-dependent data. The CHARMA model assumes that the conditional distribution of the variable of interest follows the reparameterized Chen distribution. To model the conditional median, we employ a dynamic structure that includes autoregressive and moving average terms, time-varying regressors, and a strictly monotonic and twice-differentiable link function. Parameter inference is based on conditional likelihood theory. We also derive closed-form expressions for the conditional score vector and the conditional information matrix, which enable computationally efficient inference on the model parameters. Diagnostic analysis and forecasting tools are discussed to assess the model's performance and predictive capabilities. To illustrate the practical application of the proposed model, we analyze a time series of average wind speed from Rio Grande City, Brazil, and a time series of monthly maximum temperature from Teresina City, Brazil. In both applications, we compare the CHARMA model with competing models, and the empirical results support the suitability of the proposed model and theory for asymmetric, continuous, positive, and time-dependent data.
The paper unfolds as follows. Section 2 introduces a new median-based parameterization for the Chen distribution and the dynamical CHARMA model. The conditional likelihood inference is discussed in Section 3. Section 4 focuses on model selection criteria, diagnostics, and forecasting. Numerical results are discussed in Section 5, wherein Section 5.1 presents a Monte Carlo simulation study, and Sections 5.2 and 5.3 explore empirical applications to monthly average wind speed data and monthly average maximum temperature data, respectively. Concluding remarks are given in Section 6. Finally, some analytical details are presented in Appendix A.
2. The Proposed Model
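Let $\mu$ represent the $\tau$th quantile of the Chen distribution, with $0 < \tau < 1$. Solving $F(\mu) = \tau$ for the second shape parameter shows that $\delta$ can be expressed as $\delta = \log(1 - \tau) / [1 - \exp(\mu^{\lambda})]$. The probability density function and cumulative distribution function of a Chen-distributed variable $Y$ expressed in terms of its quantile-based parameterization can be given, respectively, by:
$$f(y; \lambda, \mu) = \frac{\log(1 - \tau)}{1 - \exp(\mu^{\lambda})} \lambda y^{\lambda - 1} \exp\left\{ \frac{\log(1 - \tau)}{1 - \exp(\mu^{\lambda})} \left[ 1 - \exp(y^{\lambda}) \right] + y^{\lambda} \right\}, \quad y > 0, \tag{1}$$
and
$$F(y; \lambda, \mu) = 1 - \exp\left\{ \frac{\log(1 - \tau)}{1 - \exp(\mu^{\lambda})} \left[ 1 - \exp(y^{\lambda}) \right] \right\}.$$
Note that if we set $\tau = 0.5$, the value of $\mu$ will correspond to the median of the variable $Y$, that is, $\Pr(Y \le \mu) = 0.5$. Figure 1 illustrates different shapes of the density of the reparameterized Chen distribution, considering various values for $\lambda$ and $\mu$.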
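The quantile-based reparameterization can be checked numerically. The sketch below assumes the standard Chen distribution function and writes the two shape parameters as `lam` and `delta`, with `mu` denoting the chosen quantile and `tau` its probability level. All function and argument names are hypothetical labels introduced for this illustration. It recovers `delta` from `mu` and confirms that the distribution function evaluated at `mu` returns `tau`.

```python
import math


def delta_from_quantile(mu, lam, tau=0.5):
    """Shape parameter delta implied by requiring that mu is the tau-th quantile."""
    # log(1 - tau) / (1 - exp(mu**lam)); both terms are negative, so delta > 0
    return math.log1p(-tau) / (-math.expm1(mu ** lam))


def chen_cdf_mu(y, mu, lam, tau=0.5):
    """Chen distribution function written in terms of the quantile mu."""
    d = delta_from_quantile(mu, lam, tau)
    return -math.expm1(-d * math.expm1(y ** lam))


def chen_pdf_mu(y, mu, lam, tau=0.5):
    """Chen density written in terms of the quantile mu."""
    d = delta_from_quantile(mu, lam, tau)
    yl = y ** lam
    return d * lam * y ** (lam - 1.0) * math.exp(-d * math.expm1(yl) + yl)


if __name__ == "__main__":
    mu, lam = 2.0, 0.7  # illustrative values only
    print("delta:", delta_from_quantile(mu, lam))
    print("cdf at mu:", chen_cdf_mu(mu, mu, lam))
```

With the default `tau=0.5`, the argument `mu` is the median, which is the case used for the dynamic model.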
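Let $\{Y_t\}_{t \in \mathbb{Z}}$ be a stochastic process, where each $Y_t$, conditioned on the previous information set $\mathcal{F}_{t-1}$ consisting of observations up to time $t - 1$, follows a Chen distribution, as defined in (1), with $\tau = 0.5$. Using the median-based parameterization, the conditional density function of $Y_t$ given $\mathcal{F}_{t-1}$ is given by:
$$f(y_t \mid \mathcal{F}_{t-1}) = \frac{\log(0.5)}{1 - \exp(\mu_t^{\lambda})} \lambda y_t^{\lambda - 1} \exp\left\{ \frac{\log(0.5)}{1 - \exp(\mu_t^{\lambda})} \left[ 1 - \exp(y_t^{\lambda}) \right] + y_t^{\lambda} \right\}, \tag{2}$$
where $\mu_t$ is the conditional median of $Y_t$, and the parameter $\lambda$ is considered fixed for all $t$. The dynamical structure of the proposed CHARMA($p, q$) model is written as follows:
$$\eta_t = g(\mu_t) = \alpha + x_t^{\top} \beta + \sum_{i=1}^{p} \phi_i \left[ g(y_{t-i}) - x_{t-i}^{\top} \beta \right] + \sum_{j=1}^{q} \theta_j r_{t-j}, \tag{3}$$
where $\eta_t$ represents the linear predictor, $\alpha$ is an intercept, $\beta = (\beta_1, \ldots, \beta_k)^{\top}$ is an unknown $k$-dimensional parameter vector associated with exogenous covariates, $x_t$ is the $k$-dimensional vector of explanatory covariates at time $t$, $\phi = (\phi_1, \ldots, \phi_p)^{\top}$ and $\theta = (\theta_1, \ldots, \theta_q)^{\top}$ are the vectors of autoregressive and moving average parameters, respectively, and $g(\cdot)$ is a strictly monotonic and twice-differentiable link function, where $g: \mathbb{R}^{+} \to \mathbb{R}$. In this study, the errors were considered as $r_t = g(y_t) - g(\mu_t)$ on the predictor scale, following the approach of [6]. Because $\mu_t$ must be positive, we chose the logarithm as the link function: its inverse yields positive values for $\mu_t$ regardless of the values assigned to $\eta_t$. The proposed CHARMA($p, q$) model is defined by (2) and (3), where $p$ and $q$ represent the dimensions of $\phi$ and $\theta$, respectively, indicating the order of the ARMA dynamic component.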
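To make the dynamic structure concrete, the following sketch simulates a series from a CHARMA(1,1) process with a log link and no covariates. It is an illustration under stated assumptions, not the authors' implementation. It assumes the usual ARMA-type recursion on the predictor scale with errors equal to the difference between the log observation and the linear predictor, and it draws each observation by inverting the Chen distribution function parameterized by the conditional median. The function names, the burn-in length, the starting value, and the parameter values are all hypothetical choices.

```python
import math
import random


def chen_draw(mu, lam, rng):
    """Draw one Chen variate with median mu and shape lam by inverse transform."""
    delta = math.log(0.5) / (-math.expm1(mu ** lam))
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return math.log1p(-math.log1p(-u) / delta) ** (1.0 / lam)


def simulate_charma11(n, alpha, phi, theta, lam, seed=0, burn=100):
    """Simulate n observations and their conditional medians (log link, no covariates)."""
    rng = random.Random(seed)
    log_y_prev = alpha / (1.0 - phi)  # arbitrary starting value (assumes phi != 1)
    r_prev = 0.0                      # initial moving average error set to zero
    ys, mus = [], []
    for _ in range(n + burn):
        eta = alpha + phi * log_y_prev + theta * r_prev  # linear predictor
        mu = math.exp(eta)                               # inverse log link
        y = chen_draw(mu, lam, rng)
        r_prev = math.log(y) - eta                       # error on the predictor scale
        log_y_prev = math.log(y)
        ys.append(y)
        mus.append(mu)
    return ys[burn:], mus[burn:]


if __name__ == "__main__":
    ys, mus = simulate_charma11(200, alpha=0.5, phi=0.4, theta=0.2, lam=1.2, seed=1)
    below = sum(y <= m for y, m in zip(ys, mus)) / len(ys)
    print("share of observations at or below the conditional median:", below)
```

Since each observation is drawn with conditional median `mu`, roughly half of the simulated values should fall at or below their conditional medians.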
4. Model Selection, Diagnosis, and Prediction
In this section, we introduce several diagnostic measures for assessing the adequacy and goodness-of-fit of the proposed model. For model selection, we recommend the Akaike Information Criterion (AIC) [19] and the Bayesian Information Criterion (BIC). Among competing fitted models, the preferred model is the one with the lowest AIC and BIC values.
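As a small illustration of how the two criteria are computed, the sketch below uses their usual textbook definitions, which are assumed here rather than quoted from the paper: AIC equals minus twice the maximized log-likelihood plus twice the number of parameters, and BIC replaces the factor two in the penalty with the logarithm of the sample size. The log-likelihood values in the usage example are made up for demonstration only.

```python
import math


def aic(loglik, n_params):
    """Akaike Information Criterion from a maximized log-likelihood."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion from a maximized log-likelihood."""
    return -2.0 * loglik + n_params * math.log(n_obs)


if __name__ == "__main__":
    # Hypothetical fitted models: (label, log-likelihood, number of parameters)
    candidates = [("model A", -100.0, 3), ("model B", -98.5, 5)]
    for label, ll, k in candidates:
        print(label, "AIC:", aic(ll, k), "BIC:", bic(ll, k, n_obs=120))
```

For a conditional likelihood, the sample size entering the BIC is typically the number of observations that contribute to the likelihood.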
Residual analysis is important for assessing the goodness-of-fit of a statistical model [11]. In this study, we consider the quantile residual, which is defined as follows [20]:
$$e_t = \Phi^{-1}\left( \hat{F}(y_t \mid \mathcal{F}_{t-1}) \right),$$
where $\Phi$ is the standard normal cumulative distribution function, $\Phi^{-1}$ is its inverse, and $\hat{F}(\cdot \mid \mathcal{F}_{t-1})$ is the fitted conditional cumulative distribution function. If the model is well-fitted, $e_t$ is approximately distributed as a standard normal variable, independently of the distribution of the response variable. These residuals are also expected to be independent, with a zero mean and a constant variance [1,21]. To evaluate the assumption that the residuals are not autocorrelated, we suggest using the Ljung-Box test [22]. This test is conducted under the null hypothesis that the first $m$ autocorrelations of the residuals are zero, for a chosen maximum lag $m$.
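The residual diagnostics can be sketched as follows, assuming fitted conditional medians and a fitted shape parameter are already available. The code computes quantile residuals by passing each observation through the median-parameterized Chen distribution function and then through the standard normal quantile function, and it computes the Ljung-Box statistic in its usual form. The function names and the clipping constant are assumptions made for this sketch. The p-value is omitted because the Python standard library has no chi-square distribution; the statistic would be compared with a chi-square quantile whose degrees of freedom depend on the number of lags and fitted ARMA terms.

```python
import math
from statistics import NormalDist


def chen_cdf_median(y, mu, lam):
    """Chen distribution function parameterized by its median mu and shape lam."""
    delta = math.log(0.5) / (-math.expm1(mu ** lam))
    return -math.expm1(-delta * math.expm1(y ** lam))


def quantile_residuals(y, mu_hat, lam_hat, eps=1e-12):
    """Quantile residuals: standard normal quantiles of the fitted cdf values."""
    nd = NormalDist()
    res = []
    for yt, mt in zip(y, mu_hat):
        p = chen_cdf_median(yt, mt, lam_hat)
        p = min(max(p, eps), 1.0 - eps)  # inv_cdf requires 0 < p < 1
        res.append(nd.inv_cdf(p))
    return res


def ljung_box_q(res, max_lag):
    """Ljung-Box statistic: n(n+2) * sum over lags k of rho_k^2 / (n - k)."""
    n = len(res)
    mean = sum(res) / n
    c0 = sum((e - mean) ** 2 for e in res)
    total = 0.0
    for k in range(1, max_lag + 1):
        ck = sum((res[t] - mean) * (res[t - k] - mean) for t in range(k, n))
        total += (ck / c0) ** 2 / (n - k)
    return n * (n + 2) * total


if __name__ == "__main__":
    y = [1.8, 2.4, 2.1, 1.5, 2.9, 2.2]       # made-up observations
    mu_hat = [2.0, 2.1, 2.2, 2.0, 2.1, 2.3]  # made-up fitted medians
    res = quantile_residuals(y, mu_hat, lam_hat=0.9)
    print("residuals:", res)
    print("Ljung-Box Q (2 lags):", ljung_box_q(res, 2))
```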
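A fitted model that successfully passes all diagnostic checks can be utilized for both in-sample and out-of-sample predictions. Predictions for the conditional median of the CHARMA($p, q$) model can be carried out by considering the estimation of $\mu_t$, replacing the parameters with their estimates in (3). Thus, the in-sample predictions, starting at $t = m + 1$ with $m = \max(p, q)$, are calculated as follows:
$$\hat{\mu}_t = g^{-1}\left( \hat{\alpha} + x_t^{\top} \hat{\beta} + \sum_{i=1}^{p} \hat{\phi}_i \left[ g(y_{t-i}) - x_{t-i}^{\top} \hat{\beta} \right] + \sum_{j=1}^{q} \hat{\theta}_j \hat{r}_{t-j} \right),$$
where $\hat{r}_t = g(y_t) - g(\hat{\mu}_t)$. For predictions $h$ steps ahead, with $h = 1, 2, \ldots$, the forecasts are calculated by the following equation:
$$\hat{\mu}_{n+h} = g^{-1}\left( \hat{\alpha} + x_{n+h}^{\top} \hat{\beta} + \sum_{i=1}^{p} \hat{\phi}_i \left[ g^{*}(y_{n+h-i}) - x_{n+h-i}^{\top} \hat{\beta} \right] + \sum_{j=1}^{q} \hat{\theta}_j r^{*}_{n+h-j} \right),$$
where $n$ is the length of the observed series, $g^{*}(y_t) = g(\hat{\mu}_t)$ if $t > n$ and $g^{*}(y_t) = g(y_t)$ if $t \le n$, and $r^{*}_t = 0$ if $t > n$ and $r^{*}_t = \hat{r}_t$ if $t \le n$.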
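The prediction recursions can be illustrated with a short sketch for a CHARMA(1,1) model with a log link and no covariates. It assumes the common convention for this family of models: in-sample errors are computed on the predictor scale, future errors are set to zero, and future observations in the autoregressive term are replaced by their predicted medians. The function name and the convention of starting the in-sample pass with a zero initial error are assumptions of this sketch, and the parameter values would normally come from a fitted model.

```python
import math


def forecast_charma11(y, alpha, phi, theta, h):
    """Return in-sample fitted medians (from the second observation on)
    and h out-of-sample median forecasts for a log-link CHARMA(1,1)."""
    r_prev = 0.0  # initial moving average error set to zero
    mu_in = []
    for t in range(1, len(y)):
        eta = alpha + phi * math.log(y[t - 1]) + theta * r_prev
        mu_in.append(math.exp(eta))
        r_prev = math.log(y[t]) - eta  # error on the predictor scale

    g_prev = math.log(y[-1])  # last observed value on the link scale
    preds = []
    for _ in range(h):
        eta = alpha + phi * g_prev + theta * r_prev
        preds.append(math.exp(eta))
        g_prev = eta   # future g(y) is replaced by g of the predicted median
        r_prev = 0.0   # future errors are set to zero
    return mu_in, preds


if __name__ == "__main__":
    series = [2.1, 1.7, 2.6, 2.3, 1.9]  # made-up data
    fitted, ahead = forecast_charma11(series, alpha=0.5, phi=0.4, theta=0.2, h=3)
    print("in-sample medians:", fitted)
    print("forecasts:", ahead)
```

With an autoregressive coefficient smaller than one in absolute value, the forecasts on the log scale move geometrically toward the value `alpha / (1 - phi)` as the horizon grows.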
To assess the quality of both in-sample and out-of-sample predictions, some accuracy measures can be employed. For this purpose, we recommend utilizing the mean absolute percentage error (MAPE) and mean squared error (MSE) as figures-of-merit to quantify the differences between the predicted values from the fitted model and the observed values. These measures are commonly used when comparing competing models [23,24,25].
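The two accuracy measures can be computed as in the sketch below, which uses their common definitions as an assumption: MSE is the average squared difference between observed and predicted values, and MAPE is the average absolute relative difference expressed as a percentage. Some authors report MAPE as a proportion rather than a percentage, so the scaling by 100 is a choice made here. The data in the usage example are made up.

```python
def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def mape(observed, predicted):
    """Mean absolute percentage error (in percent); observed values must be nonzero."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


if __name__ == "__main__":
    obs = [2.0, 4.0, 5.0]   # made-up observed values
    pred = [1.0, 5.0, 5.0]  # made-up predicted medians
    print("MSE:", mse(obs, pred), "MAPE:", mape(obs, pred))
```

Because the response variable is strictly positive in this setting, the division by the observed values in MAPE is well defined.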