1. Introduction
Avocado (Persea americana Mill.) is a climacteric fruit characterized by a high respiration rate [1], which limits its shelf life [2]. The respiratory phenomenon depends on the fruit's intrinsic characteristics, as well as on external factors such as temperature, air composition, moisture content, ambient illumination, and mechanical damage [3,4]. Ambient storage temperature is a key factor in the deterioration rate of fresh fruits and vegetables [5]. Numerous studies have quantified the effect of this variable on the respiration rate using modeling techniques, as demonstrated by Caleb et al. [6] in pomegranate fruits and Ravindra and Goswami [7] in mango fruits. These authors found that, due to temperature effects, the respiration rate increased from 5.67 to 18.53 mL CO2·kg−1·h−1 in the range from 5 to 15 °C and from 31.25 to 72.80 mL CO2·kg−1·h−1 in the range from 5 to 30 °C, respectively. It has also been demonstrated that postharvest fresh produce requires high relative humidity in the storage room air to avoid stress [8]. This was evident in a study conducted with nopal verdura cladodes, where storage at 65% relative humidity increased their respiration rate by up to 90% compared with storage at 90% relative humidity [9]. Lighting in storage spaces has a similar effect; for example, the senescence of asparagus is accelerated by 8 days when samples are kept under red and green light, and by 4 days when they are kept under white and blue light [10].
The physiological variable of respiration in different fruit species has been extensively studied from the perspective of deterministic modeling. However, various environmental factors classified as random events can modify the respiratory behavior of these products to varying degrees. Modeling can also be based on probability and statistics: a model is fitted to a history of data measured continuously at equidistant intervals, known as a time series, and is then used to estimate the future value of the variable with a certain level of reliability [11]. This type of model is closer to real scenarios because environmental factors are stochastic phenomena, i.e., they cannot be predicted exactly.
The Autoregressive Integrated Moving Average (ARIMA) model proposed by Box and Jenkins is a well-established methodology [12] that expresses the observation at time t as a linear function of previous observations, a current error term, and a linear combination of previous error terms [13]. This model considers a set of data, known as a time series, spaced chronologically and uniformly over time. It is denoted by ARIMA (p, d, q), which combines an Autoregressive (AR) and a Moving Average (MA) component with a differencing order (d), where p and q are the orders of AR and MA, respectively. A generalization of the ARIMA model is the Autoregressive Integrated Moving Average model with exogenous variables (ARIMAX), which can incorporate one or more input variables into the output time series through cross-correlation. In this regard, the exogenous variables are referred to as input variables, and the variable of interest is referred to as the output variable. The general structure of an ARIMAX model is shown in Equation (1) [12].
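$$y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{k=1}^{q} \theta_k \, \varepsilon_{t-k} + \sum_{j} \beta_j \, x_{t-j} + \varepsilon_t \qquad (1)$$
where $y_t$ is the value of the output variable at the current time, $c$ is a constant term, $\phi_i$ and $\theta_k$ are the coefficients of the autoregressive and moving average terms, respectively, $\beta_j$ are the coefficients of the exogenous variable, $x_{t-j}$ are the exogenous variables at time $t-j$, and $\varepsilon_t$ is the white noise.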
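As a minimal illustration of the structure described above, the following Python sketch evaluates one step of an ARIMAX-type recursion with a single exogenous input. The function name `arimax_step`, the argument layout, and the convention that the exogenous lags start at lag 0 are illustrative assumptions; none of them is taken from the fitted models of this study.

```python
def arimax_step(c, phi, theta, beta, y_hist, eps_hist, x_hist, eps_t=0.0):
    """Evaluate one ARIMAX-type step (hypothetical helper, single input).

    y_t = c + sum_i phi_i * y_{t-i} + sum_k theta_k * eps_{t-k}
            + sum_j beta_j * x_{t-j} + eps_t

    Assumed ordering of the history lists:
      y_hist[0]   is y_{t-1},   y_hist[1]   is y_{t-2}, ...
      eps_hist[0] is eps_{t-1}, eps_hist[1] is eps_{t-2}, ...
      x_hist[0]   is x_t (lag 0), x_hist[1] is x_{t-1}, ...
    """
    ar_part = sum(p * y for p, y in zip(phi, y_hist))        # autoregressive terms
    ma_part = sum(th * e for th, e in zip(theta, eps_hist))  # moving average terms
    ex_part = sum(b * x for b, x in zip(beta, x_hist))       # exogenous terms
    return c + ar_part + ma_part + ex_part + eps_t


# Hypothetical coefficients and histories, chosen only to show the call.
y_next = arimax_step(c=1.0, phi=[0.5], theta=[0.2], beta=[0.1],
                     y_hist=[2.0], eps_hist=[1.0], x_hist=[10.0])
```

Setting `eps_t` to zero, its expected value, gives the one-step-ahead point forecast under this sketch.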
The first application of ARIMAX models to physiological processes in fruits is the study reported in [4], which elucidated the influence of temperature, relative humidity, and ambient illumination on the respiration rate of peach fruits and found that the model reliably predicts the respiration rate in the presence of external factors. Therefore, the objective of this research was to develop a mathematical model using stochastic modeling techniques to explain the effect of exogenous variables, namely temperature, relative humidity, and ambient illumination, on the respiration rate in Hass avocado fruits. The obtained results will provide relevant information for the design of the transportation and storage operations of avocado fruits postharvest.
2. Materials and Methods
2.1. Experimental Material
Hass avocado fruits were harvested from an orchard owned by the Fundación Salvador Sánchez Colín, located in the municipality of Temascaltepec (19°02′40.0′′ N 99°58′35.6′′ W), Mexico.
The fruits were harvested at 8:00 a.m. in April 2022 at physiological maturity with uniform coloration, a dry matter percentage of 24.3 ± 1.2%, and a weight ranging between 170 and 200 g. They were transported on the day of harvest to the bioprocess laboratory at the Universidad Autónoma Chapingo. Fruits were selected for uniform skin color and absence of mechanical damage. The fruits were left under laboratory conditions (20 ± 2 °C; 30–50% relative humidity) for 12 h for acclimatization before measurements were taken.
2.2. Measurement of Respiratory Activity and Exogenous Variables
The respiratory rate was quantified using a modified continuous airflow system proposed by Pérez-López et al. [14] (Figure 1). This system consists of an air compressor, a collector board with micrometric valves to regulate airflow, and airtight containers where fruits of known weight (approx. 500 g) were placed. Inside these containers, HOBO-brand sensors (TELAIRE-7001, Onset Computer Corporation, Boston, MA, USA) were also placed and connected to a data acquisition system from the HOBO brand (U12-012, Onset Computer Corporation, Boston, MA, USA).
These sensors were programmed to continuously record data every 2 min on a personal computer using the HOBOware® Lite v. 3.1.0 software (Onset Computer Corporation, Boston, MA, USA). The recorded variables were as follows: CO2 (ppm), temperature (°C), relative humidity (%), and ambient illumination (lux). Measurements were taken until the fruits reached consumption maturity, which lasted for 11 days. Subsequently, calculations were performed to obtain the respiratory rate according to Equation (2) [14].
$$R = \frac{\left(\mathrm{CO_{2out}} - \mathrm{CO_{2in}}\right) \cdot F}{W} \qquad (2)$$
where R is the respiration rate in terms of CO2 (mL·kg−1·h−1), CO2out and CO2in are the CO2 concentrations at the outlet and inlet of the container, respectively, F is the airflow passing through the micrometric valve collector board (mL·h−1), and W is the weight of the fruits inside the container (kg).
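The calculation can be sketched in Python as follows. This is a minimal sketch that assumes the usual flow-through form, in which the CO2 difference between outlet and inlet is multiplied by the airflow and divided by the fruit weight, and it further assumes that the sensor readings in ppm are volume fractions that must be scaled by 10^-6. The function name `respiration_rate` and the numerical inputs are hypothetical and are not measurements from this study.

```python
def respiration_rate(co2_out_ppm, co2_in_ppm, flow_ml_per_h, weight_kg):
    """Respiration rate in mL CO2 per kg per h (illustrative sketch).

    Assumes ppm is a volume fraction (1 ppm = 1e-6 mL CO2 per mL air),
    so the CO2 enrichment of the air stream times the airflow gives the
    volume of CO2 produced per hour.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    delta_fraction = (co2_out_ppm - co2_in_ppm) * 1e-6  # mL CO2 per mL air
    return delta_fraction * flow_ml_per_h / weight_kg


# Hypothetical reading: 1400 ppm at the outlet, 400 ppm at the inlet,
# 30 L/h of airflow, and 0.5 kg of fruit in the container.
rate = respiration_rate(1400.0, 400.0, 30000.0, 0.5)
```

With these made-up inputs the function returns about 60 mL·kg−1·h−1; applying it to each 2 min record would yield the RR time series.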
2.3. Descriptive Statistical Analysis
Each of the quantified variables was considered as a time series. The respiration rate (RR) was the output variable, and the temperature (TEMP), relative humidity (RH), and ambient illumination (AI) variables were considered input variables. A descriptive statistical analysis was performed for each of these variables, along with a preliminary analysis of the avocado fruit respiration rate in order to observe its behavior over time.
2.4. ARIMA Modeling
The modeling procedure for each time series was conducted using the methodology of Box et al. [12] based on three basic steps as follows: (1) model identification, (2) parameter estimation, and (3) diagnostic model verification.
The model identification stage involves checking for stationarity, meaning that the mean, variance, and autocorrelation remain constant over time [15], which is a strict requirement for building ARIMA models. The simplest way to make a series stationary is by applying first-order differencing (d = 1) [16]; the original series is then said to be integrated of order d. The parameter d indicates the number of times a series has been differenced to make it stationary. This differencing involves subtracting successive values of the series (subtracting the first observation from the second, the second from the third, and so on), resulting in the loss of one value and the creation of a new series (Equation (3)).
$$y'_t = y_t - y_{t-1} \qquad (3)$$
where $y'_t$ is the differenced series, $y_t$ is the value of the series at time t, and $y_{t-1}$ is the value of the series at lag 1.
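First-order differencing can be sketched in a few lines of Python. The helper names `difference` and `log_difference` are illustrative; the second one combines the differencing with a logarithmic transformation, which the Results section reports applying to the RR series, and it assumes strictly positive values.

```python
import math


def difference(series):
    """First-order difference: each value minus its predecessor.

    The result has one element fewer than the input series.
    """
    return [curr - prev for prev, curr in zip(series, series[1:])]


def log_difference(series):
    """Log-transform a strictly positive series, then difference it."""
    return difference([math.log(v) for v in series])


# Hypothetical short series with an upward trend.
raw = [1, 3, 6, 10]
diffed = difference(raw)
```

Applying `difference` a second time to its own output would correspond to d = 2.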
To verify the stationarity, the augmented Dickey–Fuller unit root test (ADF) was conducted. The order of the models (p, q) was identified using the autocorrelation function (ACF) and the partial autocorrelation function (PACF) as references, and the significance of the parameters of the candidate models was evaluated using the maximum likelihood method.
The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) of the candidate models were compared to identify the optimal and most parsimonious model. To test the non-correlation of the residuals of the final model, the Ljung–Box Q test and the autocorrelation tests (ACF and PACF) were used [12].
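The diagnostic quantities named above can be illustrated with a short Python sketch using their textbook definitions: the sample autocorrelation, the Ljung–Box statistic Q = n(n + 2) Σ r_k² / (n − k), AIC = 2k − 2 ln L, and BIC = k ln n − 2 ln L. The function names are illustrative, and the sketch is not the PROC ARIMA implementation used in this work. The p-value of Q, obtained from a chi-square distribution, is not computed here.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1 .. max_lag."""
    n = len(residuals)
    r = acf(residuals, max_lag)
    return n * (n + 2) * sum(rk * rk / (n - k)
                             for k, rk in enumerate(r, start=1))


def aic(log_likelihood, n_params):
    """Akaike Information Criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood
```

A small Q relative to the chi-square critical value is consistent with uncorrelated residuals, while among candidate models the one with the lowest AIC and BIC is preferred.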
2.5. ARIMAX Modeling
To detect the relationship between the output variable and the input variables, the cross-correlation function (CCF) was utilized, which is a simple and efficient technique for detecting any temporal lag between two time series [17,18]. The Granger causality test was also employed to determine the statistical causal relationships between the different time series [19].
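How the CCF reveals a temporal lag can be sketched in Python as follows. The functions `ccf` and `best_lag` are illustrative helpers that cover only non-negative lags (input leading output), and the two impulse series are synthetic; they are not data from this study.

```python
import math


def ccf(x, y, lag):
    """Sample cross-correlation between x[t] and y[t + lag], lag >= 0.

    Both series must have the same length and non-zero variance.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have the same length")
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    cov = sum((x[t] - mx) * (y[t + lag] - my) for t in range(n - lag)) / n
    return cov / (sx * sy)


def best_lag(x, y, max_lag):
    """Lag in 0 .. max_lag with the largest absolute cross-correlation."""
    return max(range(max_lag + 1), key=lambda k: abs(ccf(x, y, k)))


# Synthetic example: the output responds to the input two steps later.
x = [0.0] * 20
y = [0.0] * 20
x[5] = 1.0
y[7] = 1.0
lag = best_lag(x, y, max_lag=5)
```

In this synthetic case the strongest cross-correlation falls at the lag by which the output impulse trails the input impulse.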
The identification of ARIMAX models was performed using the pre-whitened series of the output variable RR. Pre-whitening is a technique that helps identify which lag of the independent variable affects the dependent variable [20]. Each of the input variables (TEMP, RH, and AI) was incorporated into the output variable RR as covariates, and the autocorrelation functions ACF and PACF were used to identify the models that exhibited white noise.
The significance of each parameter of the model was evaluated using the maximum likelihood method. The model residuals were checked using autocorrelation tests (ACF, PACF), and, finally, the gain (effect) of the transfer function models was quantified from the coefficients of the obtained multivariate ARIMAX model.
2.6. Statistical Analysis
The time series modeling was conducted using PROC ARIMA from the Statistical Analysis System software for Academics (SAS® Institute Inc., Cary, NC, USA), and the Granger causality test was performed using the statistical software R version 4.1.3 integrated into the RStudio platform (RStudio, PBC, Inc., Boston, MA, USA).
3. Results and Discussion
3.1. Descriptive Statistical Analysis
A total of 7730 data points were obtained from the continuous airflow respiration measurement system. The descriptive analysis (Table 1) shows a mean temperature of 23.28 °C and a mean relative humidity of 35.14% inside the containers. The AI time series recorded a minimum value of 4.31 lux and a maximum of 697.90 lux, with a standard error of 2.41, indicating high variability around the mean due to the differences between day and night.
3.2. Respiration Rate
The mean RR recorded was 71.52 mL·kg−1·h−1, ranging from 34.17 to 125.34 mL·kg−1·h−1. Similar results were obtained by García et al. [21], who recorded values ranging from 135 to 150 mg·kg−1·h−1, equivalent to 68.5 and 76.14 mL·kg−1·h−1, over 7 days at 25 °C and 44% relative humidity. Maftoonazad and Ramaswamy recorded higher values, ranging from 120 to 160 mL·kg−1·h−1 on days 0 and 6, respectively [22]. On the other hand, Tesfay et al. [1] obtained values of 160 and 200 mg·kg−1·h−1, equivalent to 81.9 and 101.5 mL·kg−1·h−1, at 21 ± 1 °C. These values are consistent with Kader [23], who stated that avocado is a highly perishable fruit with respiration rates exceeding 60 mg·kg−1·h−1 (approx. 30.45 mL·kg−1·h−1).
An increase was recorded on day 10, coinciding with the maximum registered value of 125.34 mL·kg−1·h−1 (Figure 2). As a climacteric fruit, avocados experience a climacteric peak before ripening, where cellular respiration is high, leading to fruit softening [21,24]. Once this climacteric peak is reached, the respiration rate decreases up to day 11, at which point commercial maturity is achieved, followed by the onset of senescence.
The preliminary analysis of the original output series (RR) revealed a positive trend over time and variability of the data around the mean (Figure 2), indicating that the original series does not meet the stationarity criterion. This variability is mainly attributable to the alternation of day and night: illumination varies during the day, as does temperature and, consequently, relative humidity. These conditions simulate those experienced by a fruit in its natural environment, without any treatment to extend its shelf life. Because of this behavior, a logarithmic transformation and a first difference (d = 1) were applied (Figure 3), which eliminated the trend and resulted in a mean approaching 0.
3.3. ARIMA Modeling
Table 2 shows the parameter estimates obtained for the autoregressive (AR) and moving average (MA) components identified for each time series. From the coefficients obtained from the AR and MA components of the final ARIMA models, the equations of the mathematical models are derived.
The time series of the input variables TEMP, RH, and AI, as well as the output series RR, did not meet the stationarity requirement, so first-order differencing (d = 1) was applied to the original series. At this stage, the differenced series corresponds to an ARIMA (0,1,0) model. The parameters of the model for the RR output series were identified based on the visual analysis of significant peaks (p ≤ 0.05) in the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the logarithm-transformed and differenced series.
The AR parameters are included if the PACF shows significant peaks, while the MA parameters are included if the ACF shows significant peaks [12]. Graphically, both functions of the transformed series show significant peaks at the initial lags, indicating the need to include two or more AR and MA terms in the corresponding final model for the RR output time series (Figure 4).
The parameters obtained for the autoregressive (AR) and moving average (MA) components identified for each time series are significant (p < 0.05). Another way to evaluate the viability of the models is through their t-values. Parameters whose absolute t-value is greater than 2 are included in the model [12]. The absolute t-values of the parameters of all the models are above this value. The mean, with values very close to 0, was not significant, so it was not included in the model.
The optimal model describing the behavior of the output series RR was an ARIMA (1,1,3), meaning it includes one autoregressive term and three moving average terms with one difference. It is worth mentioning that any modification to the original time series entails building a new mathematical model, which involves repeating the ARIMA model construction methodology. The input series TEMP, RH, and AI were fitted to ARIMA (3,1,2), ARIMA (1,1,2), and ARIMA (1,1,2) models, respectively (Table 3). All these selected models met the criterion of having the lowest AIC and BIC values.
The ACF and PACF of the residuals did not show significant peaks, meaning that the residuals behave as white noise, thus ruling out autocorrelation among them. The ADF test yielded significant values (p ≤ 0.05) for the final ARIMA models, rejecting the null hypothesis of non-stationarity (presence of a unit root). The Ljung–Box test resulted in p-values greater than the significance level (p > 0.05), indicating the absence of autocorrelation in the residuals of all selected optimal ARIMA models.
Figure 5 depicts the fitting of the optimal ARIMA models to the original data and the behavior of their residuals. The residuals of the displayed ARIMA models show very low values, indicating minimal difference between the observed experimental data and the fitted values and, therefore, a good fit. The series with the best fit were TEMP and RH, followed by RR, and finally the AI series.
3.4. Causal Relationship
A lagged relationship was observed between the exogenous variables and the respiration rate through the cross-correlation function (CCF). However, relying solely on correlation to establish causality is insufficient and risky [25]. Hence, the Granger causality test was employed to determine the effective existence of a causal relationship between the input variables and the output variable. This test considers changes in the variables themselves and explores the interrelationship among them, thus avoiding any spurious correlation between these variables [26]. The test contrasts the null hypothesis that Variable 1 does not Granger-cause Variable 2, which is rejected for p-values ≤ 0.05.
The p-values of the Granger causality test show a significant causal relationship with values of p ≤ 0.05 in all three cases (Table 4), indicating a direct cause-and-effect relationship between the input variables and the output variable RR. These results confirm that it is possible to integrate the variables TEMP, RH, and AI to predict the respiration rate in Hass avocado fruits.
3.5. ARIMAX Modeling
From the pre-whitened model of the output series RR, the input variables TEMP, RH, and AI were first incorporated individually and then jointly, to detect the individual effect of each input variable and their multivariate effect. The significance of the parameters of each of the ARIMAX models found is detailed in Table 5. The equations obtained from the coefficients of these parameters are detailed in Table 6.
3.6. Transfer Function Gain
The univariate effects (gains) of the input variables found by the ARIMAX (1,1,3) model are shown in Table 7.
A contemporaneous effect was observed between the variable RH and RR, meaning that a change in RH over time implies a simultaneous change in the respiration rate. In contrast, the variables TEMP and AI showed a delayed effect. The impact detected in the transfer functions indicates that for each unit increase in TEMP, RH, and AI, respiration increases by 0.34, 1.52, and 0.99 mL·kg−1·h−1, respectively. This implies an average increase in the respiration rate of Hass avocado fruits of 0.34% due to temperature change, 1.52% due to relative humidity change, and 0.99% due to ambient illumination change. The increases are unequal because the variables are different in nature; the effect of each variable is therefore expected to differ in magnitude, and quantifying these differences is the focus of this study.
4. Conclusions
The effect of temperature, relative humidity, and ambient illumination on the respiration rate of Hass avocado fruits was adequately modeled using the Box–Jenkins methodology. The Granger causality test demonstrated a causal relationship between the input variables and the output variable, indicating that these variables were suitable for obtaining the transfer functions of the ARIMAX models. The impact detected in the transfer functions indicates that each of the exogenous variables has a directly proportional effect on the respiration rate, with proportions of 0.48%, 1.40%, and 2.12% due to a unit increase in temperature, ambient illumination, and relative humidity, respectively. ARIMAX modeling can be successfully used to explain respiratory behavior in Hass avocado fruits due to the effect of external factors after harvest. The models found were specifically for avocados under the conditions in which they were studied. In practice, this type of model could be adjusted for each particular species to determine the impact of the environmental and other random events involved. Based on this, the postharvest management conditions should be defined. In future research, it is intended to extrapolate this stochastic modeling procedure in order to measure the effect of dynamic loads on the respiratory metabolism of fruits during transportation, where there is a considerable loss in the quality of fresh products.