1. Introduction
Sugar is an essential commodity in world trade due to its extensive use in the food industry. Sugar derivatives are also used to produce biofuels, which are likely to grow in the next decade [
1,
2]. When used for food purposes, sugar is mainly produced from sugarcane, predominantly in tropical and subtropical regions. In temperate regions, the most addressed raw material is sugar beet [
3,
4].
Observing the sugar prices in different regions of the world, it is evident that the global sugar market is extremely fragmented. A high price volatility has been observed in recent years [
5]. The volatile nature of the market is further intensified by the effects of various support measures that benefit the subsector [
6]. It is one of the commodities most strongly protected by governments, as they seek to protect producers from low prices by implementing various policy instruments, such as border measures, minimum price levels, and subsidies [
7]. This product may suffer influence from the intern markets due to strict regulations applied in countries like Russia, India, and China. Note that these countries are in the top ranking of the world’s largest producers. Other factors influence the price, such as import rate and production volume control [
8].
As specified by Van der Eng [
9], sugar is a crucial commodity for economic development. The average annual production is around 1700 million tons, which covers 24 million hectares of land around the world [
10]. In addition, a 15% growth in agricultural production is expected for the next decade, which can be attributed to improvements in production and high production intensity, followed by technological innovation [
11].
According to the United States Department of Agriculture (USDA), global consumption increased from 171,582 metric tons in 2019/20 to 177,795 metric tons in 2020/21 [
12]. Similarly, the per capita sugar consumption is expected to rise due to urbanization and increased consumption of processed foods [
11]. In the USA, the consumption of food commodities is also expected to grow, mainly due to the population increase [
13].
In this sense, understanding the dynamics of sugar prices between the domestic and international markets is of paramount importance for the success of companies operating in this sector. It plays a significant role in determining the rate of inflation, wages, salaries and various policies in an economy. In the case of cash crops such as sugarcane, the level of production affects the raw material cost and the competitive advantages in the market [
14].
The literature on modeling international sugar markets and projecting their prices tends to use general equilibrium models [
6] or partial equilibrium models [
15,
16,
17]. These are usually recursive approaches that provide annual market balances for production, consumption, trade, and world prices over a projection period. Thus, the models aim to assess the effect of specific policies on the international market and their ability to incorporate a wide range of policy variables.
Another approach related to sugar price analysis is based on time series techniques [
8,
18,
19,
20,
21]. In this case, the policy simulation possibilities are relatively narrow, but allow us to use data at a much higher frequency level (e.g., daily) than partial or general equilibrium models. Time series analysis and forecasting is addressed in different fields of research, such as economics, finance, health, engineering, etc. [
22], which is the focus of the current investigation.
Several models have been suggested in the literature to improve the accuracy of sugar price forecasting. Ribeiro and Oliveira [
23] proposed a hybrid model composed of artificial neural network Kalman filters to predict the monthly price of sugar in Brazil and India. Melo, Milioni and Júnior [
24] used the Mixture of Local Experts Models (MLEM) to predict the daily and monthly price of American domestic sugar. Esam [
25] applied the Autoregressive Integrated Moving Average (ARIMA) model to estimate the annual Egyptian sugar consumption and production. Mehmood [
26] also applied the ARIMA model to predict the annual production of the sugar crop in Pakistan. Vishawajith et al. investigated the growth or decrease in the sugar crop area in India using annual data and the ARIMA model [
27].
The application, study, and development of forecasting models have been a theme very much discussed in the last few decades [
28], and such discussion has mainly focused on linear models. The models from the Box and Jenkins family stood out as some of the most used. However, due to the growing use of neural-inspired approaches, there is still room for evaluating the performance of nonlinear approaches in many fields, such as commodities prices. There is an important research gap that deserves to be filled. In this sense, this paper aims to investigate the use of feedforward and recursive Artificial Neural Networks (ANNs) as well as linear models for the price of sugar in several countries. The models addressed are the Multilayer Perceptron (MLP), Extreme Learning Machines (ELM), Elman Network, Jordan Network, Echo State Networks-ESN, Autoregressive Model (AR), and Autoregressive Moving and Average model (ARMA) [
3,
29,
30,
31]. The databases addressed are related to four of the most important markets: the European Union, United States, Brazil and the World.
Commodity prices broadly influence general price levels, which are of interest to central databases, policymakers, businesses, and consumers, in such a way that decisions depend on their expectations of future inflation. Thus, understanding the price dynamics between the national and international markets is of great importance for the strategic planning of the sugar and alcohol sector [
9]. This study may effectively help society, researchers, and scientists working in this area [
32].
This paper is organized as follows:
Section 2 presents the details of the forecasting models;
Section 3 discusses the database addressed, pre-processing stages and the performance metrics used;
Section 4 shows the results found by the models and a critical analysis; finally, this paper is concluded in
Section 5.
3. Data Analysis and Processing
This section presents the details of the databases used in this work. The historical data from the European Union, the United States, and the World are available on The World Bank website [
52]. The samples for Brazilian sugar prices were obtained at the Center for Advanced Studies in Applied Economics-CEPEA [
53].
The database from Brazil (
Figure 3) corresponds to monthly prices in Reais (R
$) (Brazilian currency) per kilogram and comprises the period from May 2003 to March 2020, totaling 203 samples. The European Union (
Figure 4), the United States (
Figure 5), and the World (
Figure 6) databases correspond to monthly prices per kilogram in US dollars (US
$) and comprise the period from January 1960 to October 2019, totaling 718 samples. The series was collected until March 2020.
In order to detect the presence of the trend component, the Kendall test was applied for each series. This component was detected only in the United States series. Thus, the differentiation method made the series approximately stationary [
28].
Seasonality can be identified using the Friedman test [
28]. It was found that all series were not stationary because they present seasonal components. In this sense, applying an adjustment that excludes the seasonal influences is required. After this process, the series becomes approximately stationary, presenting zero mean and unitary standard deviation, which are mandatory for applying the linear methods. The seasonal adjustment function used in this work was the
Z-Score, given by Equation (
5):
in which
is the new adjusted price,
represents the original sugar price in time
t,
is the mean,
is the standard deviation, and
is the seasonality index (in this case, the index of each month) [
43].
Next,
Figure 11 and
Figure 12 were expanded upon to analyze the autocorrelation function (ACF) and partial autocorrelation function (PACF) of each series, respectively. To achieve this, we considered up to 80 lags and the adjusted series.
The ACF of the E.U. and USA present a slow and constant decay, considering the decrease in the correlation of the samples. On the other hand, the ACF of the World and Brazil show an exponential decay. These formed functions present characteristics analogous to those of a combination of sine and damped exponentials, indicating that the generating process is associated with an autoregressive model [
28].
Regarding the PACF, the E.U. series presents up to 6 significant lags, the highest value. The other series are limited to 4 lags [
28].
To perform the experiments, a separation in the dataset was split into three groups- training, validation, and testing–in which the temporal order is respected. The first 66.67% were used for training, the following 17.01% for validation, and the last 16.32% for testing. Such separation is necessary for the application of Artificial Neural Networks. It is unnecessary for the AR and ARMA models to carry out validation, so the training and validation sets were used to adjust the models. This way, the test set presents the same size and samples for all experiments.
The wrapper method was used to select the most relevant inputs for the ANNs, which performs a systematic search for the best variables [
41]. An exhaustive search was carried out for the AR and ARMA models, considering up to order 6.
Table 1 presents the selected lags for the ANNs, the
p order of the AR models, and the
p and
q orders of the ARMA models, considering the four series addressed. In
Table 1, the inputs are problem-dependent, meaning that the best entries often differ when using the wrapper methodology.
4. Results and Discussion
This section presents and discusses the forecasting results achieved for Brazil, the European Union, the United States, and world sugar prices. The forecasting methods are shown in
Section 2.
To determine the number of neurons for the unique hidden layer of each neural network, a grid search considering from 5 up to 50 neurons was performed for each architecture. Also, 30 independent rounds of simulation were executed for the neural models.
Table 2,
Table 3,
Table 4 and
Table 5 show the best result of the test set for 30 rounds, considering 1-, 3-, 6- and 12 steps-ahead forecasting, respectively. In these tables,
NN denotes the number of neurons in the hidden layer of the best configuration. The values in bold represent the most minor errors for each database.
The Wilcoxon test was applied to the results regarding the MSE in the test set. The
p-values achieved were close to zero (below
). Therefore, we can assume that changing the predictor implies a significant change in the results [
29].
It is possible to notice that as the forecasting horizon increases, there is a tendency for errors to increase. This is partly due to the decreasing correlation between the desired inputs and outputs.
Figure 11 reveals this behavior, as the lags are less correlated as the temporal distance among the samples increases. Also, the best predictor for MSE is often different for MAE. This behavior can occur since the used metrics penalize the difference between the desired response and the model’s output differently. The MSE strongly penalizes the highest differences, since it uses the square operation. This was observed in previous studies on time series forecasting [
40,
41].
By analyzing the values highlighted in bold, it is possible to observe that the linear models (specifically the AR) were the winners, only considering one step ahead of the European Union and the MAE. Furthermore, the ARMA and the Jordan network did not show the best results for any scenario.
Observing
Table 2, it was noted that the Elman and Jordan networks did not present highlighted results. However, the other recurrent proposal, ESN, created by Jaeger and Ozturk et al., achieved the smallest MSE for three of four series and the best MAE for the world series. The other cases favored the MLP (best MAE for Brazil and the USA) and ELM (best MSE for the EU). In summary, considering the MSE, the most-used metric for forecasting purposes, the unorganized approaches (ELM and ESN) stood out. This is an interesting finding, since these methods are simple to implement and fast to adjust.
The results were different for three steps-ahead prediction. The ELM and Elman performed better, except for the USA, in which the MLP achieved the best errors.
Regarding the six- and twelve steps-ahead horizon, a pattern was observed. The Elman network, ESN-Ozturk et al., and MLP achieved the smallest errors for both metrics considering the Brazil, EU, and USA datasets, respectively. ELM and ESN-Jaeger were the best for the world dataset.
In order to show the general ranking of the models,
Table 6 was created based on Borda Count Methodology [
55], considering all forecasting horizons. In this case, “N. wins” means the number of wins each model achieved.
The linear models (AR and ARMA) achieved just one best result. This observation was expected since neural models are nonlinear techniques capable of approximating any dataset with a small error. The approach is more flexible than linear structures due to the presence of the nonlinear activation functions in the hidden layer, justifying their wide application in time series prediction [
41,
51,
56]. However, it is essential to mention that the ANNs must be carefully adjusted to avoid overtraining. In this case, the model may “learn” the training samples, but it becomes unable to generalize or indicate a suitable response for unknown samples (test).
Considering the general results of the ANNs, the Jordan network did not perform well. This is remarkable, since the Elman network, which presents a structure with a high similarity to the Jordan proposal, was the best for many cases. Indeed, the delayed output of the hidden layer seems to present more useful information in the composition of the formation of the response of the model than the output response.
The MLP was the architecture that achieved the best results, followed by the Elman network. As can be observed, the results are miscellaneous, but the Elman network was the best for multistep forecasting for Brazil, as well as the MLP for the USA.
The unorganized approaches performed well, too. We highlight the ESN-Jaeger for one step ahead and the ELM for three steps ahead. This is a good indication, since these methods are more accessible in adjusting the parameters, and there is no instability in the response due to a vanishing gradient, as in classic RNN approaches [
42]. These results are interesting because they corroborate the findings of previous studies that dealt with daily samples [
3].
Finally,
Figure 13,
Figure 14,
Figure 15 and
Figure 16 present the output of the best predictor according to the MSE, considering one step-ahead forecasting for the test set of Brazil, the European Union, the United States, and the world, respectively.
5. Conclusions
This study investigated the application of several models to predict the sugar prices from four databases: Brazil, the European Union, the United States, and the world databases. This is an essential task, as sugar is present in several sectors of the economy. Also, it presents a fragmented global market, which is evident from its high volatility.
For this purpose, several prediction methods were evaluated: linear methods (AR and ARMA) and Artificial Neural Networks (MLP, ELM, Elman, Jordan, ESN-Jaeger, and ESN-Ozturk et al.). In this case, the Wrapper method selected the number of inputs of each model. Furthermore, different forecast horizons were considered: P = 1, 3, 6, and 12 steps ahead.
In general, the computational results show superior performance for the neural models. This is unsurprising, since real-world problems are often complex, with linear and nonlinear components. The ability of these methods to process nonlinear information is one of the most important advantages of ANN compared to traditional statistical approaches.
For future investigations, it is suggested to investigate the application of combination models, such as Ensembles and Hybrid Models, based on Error Correction. In addition, using deep neural models is an interesting possibility, mainly applying Long Short-Term Memory (LSTM) networks due to the encouraging results in forecasting tasks reported in the literature [
57,
58,
59,
60]. Finally, a comparison of the findings of this work and the use of equilibrium models should be evaluated.