Forecasting Container Throughput of Singapore Port Considering Various Exogenous Variables Based on SARIMAX Models

Lee, Geun-Cheol; Bang, June-Young

doi:10.3390/forecast6030038

by Geun-Cheol Lee ¹ and June-Young Bang ²,*

¹ College of Business Administration, Konkuk University, Seoul 05029, Republic of Korea

² Department of Industrial and Management Engineering, Sungkyul University, Anyang 14097, Republic of Korea

* Author to whom correspondence should be addressed.

Forecasting 2024, 6(3), 748-760; https://doi.org/10.3390/forecast6030038

Submission received: 30 July 2024 / Revised: 28 August 2024 / Accepted: 29 August 2024 / Published: 30 August 2024

Abstract:

In this study, we propose a model to forecast container throughput for the Singapore port, one of the busiest ports globally. Accurate forecasting of container throughput is critical for efficient port operations, strategic planning, and maintaining a competitive advantage. Using monthly container throughput data of the Singapore port from 2010 to 2021, we develop a Seasonal Autoregressive Integrated Moving Average with Exogenous Variables (SARIMAX) model. As exogenous variables in the SARIMAX model, we consider the West Texas Intermediate (WTI) crude oil price and China’s export volume, alongside the impact of the COVID-19 pandemic measured through global confirmed cases. The predictive performance of the SARIMAX model is evaluated against a diverse set of benchmark methods, including the Holt–Winters method, linear regression, LASSO regression, Ridge regression, the Error Correction Mechanism (ECM), Support Vector Regressor (SVR), Random Forest, XGBoost, LightGBM, Long Short-Term Memory (LSTM) networks, and Prophet. The comparison is conducted by forecasting container throughput for the year 2022. The results indicate that the SARIMAX model, particularly when incorporating WTI prices and China’s export volume, outperforms the other models on forecasting accuracy metrics such as Mean Absolute Percentage Error (MAPE).

Keywords:

container throughput forecasting; port of Singapore; SARIMAX model; time series analysis; exogenous variables

1. Introduction

Today, while artificial intelligence and digital transformation are revolutionizing global industries, the growth of the global economy still fundamentally depends on the increasing volume of trade in physical goods. The global network of ports and the container transportation system that connects them play an important role in facilitating such physical flow of goods across international borders. Among the various decision-making processes related to port operations, the accurate prediction of container throughput is considered one of the most crucial. That is, container throughput forecasting plays an important role in strategic planning and operational efficiency within the maritime logistics sector. The ability to precisely anticipate container throughput not only facilitates optimal resource allocation but also enhances the overall performance of container terminals and the global supply chain [1,2].

In recent years, global container throughput has been recovering from the pandemic-induced downturn. However, this recovery has been conspicuously slow, with growth of only about 1.5% year-over-year in 2022 [3]. Container throughput is directly influenced by global economic conditions, and events such as the financial crisis, the COVID-19 pandemic, and the conflict in Ukraine have significantly disrupted global supply chain dynamics. These external factors have underscored the need for more sophisticated forecasting models that can account for such disruptive events. Consequently, research on predictive models that incorporate these external factors has increased, and several recent studies on container throughput forecasting explicitly incorporate the impact of COVID-19 [4,5,6,7].

In light of these considerations, this study addresses monthly container throughput forecasting for the Port of Singapore, one of the world’s major ports. We develop a model that takes into account exogenous variables that significantly impact global economic conditions. The Port of Singapore has consistently maintained its position as the world’s second-busiest container port, following Shanghai, in terms of container throughput [3]. This ranking has been sustained for several decades, demonstrating Singapore’s enduring significance in global maritime trade as one of the world’s busiest transshipment hubs. The port’s strategic location in the Malacca Strait positions it as a critical node connecting major shipping routes between Asia, Europe, and the Americas. Given this critical role, accurate container throughput forecasting for the Port of Singapore is of paramount importance, and a substantial body of research has continued to address it up to the present day [1,8,9,10].

As previously mentioned, the importance of container throughput forecasting has led to many studies in this field. Comprehensive reviews of methodologies employed in existing studies can be found in the works of Munim et al. [2], Huang et al. [6], and Shankar et al. [11], where they present well-structured tabular summaries of previous research categorized by methodology. Generally, existing studies can be classified into three main methodological approaches: statistical methods, machine learning-based methods, and hybrid methods that combine the two. Statistical methods, although considered somewhat traditional, continue to be widely utilized even in recent research [1,2,4,7,10,12]. Techniques such as ARIMA and Exponential Smoothing are still prevalent, demonstrating their enduring effectiveness, particularly for medium- to long-term demand forecasting where sample sizes may be limited. In machine learning-based methods, artificial neural network approaches, including LSTM and RNN, have been proposed in recent studies [8,9,11,13]. These techniques have shown promising results in capturing complex patterns in container throughput data. The most notable trend in recent research is the increasing prevalence of hybrid methods, which combine artificial intelligence techniques with statistical approaches [5,6,14]. These hybrid approaches aim to leverage the strengths of both traditional and machine learning-based methods to improve forecasting accuracy.

A common characteristic among existing container throughput forecasting studies is the prevalent use of univariate time series analysis. Many researchers still rely solely on a single time series—the container throughput of a specific port—to predict future volumes. This approach, however, may be inadequate for forecasting container throughput, which is inherently influenced by international trade volumes and, in turn, affected by various external environmental factors, such as COVID-19, the financial crisis, oil price fluctuations, and other factors. To address this limitation, this study aims to identify key factors that may influence container throughput at the Port of Singapore and propose a SARIMAX-based demand forecasting method that incorporates these exogenous variables. Furthermore, in this study, we try to empirically demonstrate that the proposed forecasting method outperforms various existing methods, including state-of-the-art machine learning models, in terms of forecasting accuracy.

The remainder of this paper is structured as follows: The next section provides insights into container throughput forecasting through data analysis of various exogenous variable time series, including the container throughput time series for the Port of Singapore. Section 3 introduces the SARIMAX model utilized in this study. Section 4 presents the sequential procedures necessary for SARIMAX model fitting, including model identification, model diagnostics, and exogenous variable selection. Additionally, we conduct comparative experiments with several well-known benchmark methods to validate the performance of the proposed forecasting model. The final section concludes this paper with a discussion of this study’s findings and directions for future research.

2. Time Series Analysis and External Influences on Container Throughput

In this section, we first analyze the time series characteristics of container throughput at the Port of Singapore. Based on this analysis, we identify potential exogenous variables that are believed to influence this time series. We then examine the relationship between container throughput and each of these exogenous variables. The monthly container throughput data for the Port of Singapore were collected from the website of the Singapore Department of Statistics (tablebuilder.singstat.gov.sg, accessed on 26 June 2024). Throughout this paper, container throughput is measured in TEU (twenty-foot equivalent units), a commonly used measure of volume expressed in twenty-foot-long containers.

Figure 1 illustrates the trend of monthly container throughput at the Port of Singapore from 1995 to 2021. While the overall trajectory demonstrates an upward trend, there are notable periods of significant fluctuations in throughput volumes. The graph reveals several key inflection points corresponding to major global economic events. In the early 2000s, a substantial decline in throughput is evident, coinciding with the Asian financial crisis. This downturn was followed by an even more pronounced decrease during the global financial crisis of 2008–2009, triggered by the collapse of Lehman Brothers. Subsequently, the data indicates further periods of throughput reduction in 2015 and 2016, attributable to various factors affecting global trade. The most recent disruption is observable in 2020, where the impact of the COVID-19 pandemic is reflected in a stagnation of throughput volumes and an increase in volatility.

Based on the observations from Figure 1, we propose to focus our data analysis on the period following the global financial crisis to capture more recent and relevant trends. The substantial fluctuations in container throughput before and after the financial crisis suggest that focusing on the data from 2010 onwards would provide a more relevant context for our study. This timeframe allows us to examine more recent trends and factors affecting container throughput. Upon closer inspection of the post-2010 period, the graph reveals notable deviations from the underlying trend and seasonal patterns in container throughput, particularly during 2015–2016 and around 2020. These anomalies warrant further investigation to identify the underlying factors contributing to these fluctuations.

The decline in container throughput at the Port of Singapore observed in 2015 and 2016 can be attributed to several interrelated factors. Primarily, this period coincided with a global economic slowdown, which was closely linked to the deceleration of the Chinese economy. Additionally, the decrease in oil prices led to an increase in direct shipping, while the restructuring of shipping alliances also contributed to the reduction in container throughput [15,16]. To quantify these factors, our study focuses on two key variables that can be objectively measured: the price of West Texas Intermediate (WTI) crude oil and China’s export volume. The data were collected from two authoritative sources: the U.S. Energy Information Administration (EIA, www.eia.gov accessed on 28 June 2024) and the Federal Reserve Bank of St. Louis’ Federal Reserve Economic Data (fred.stlouisfed.org accessed on 28 June 2024) system.

Figure 2a suggests that the WTI oil price helps explain the substantial decline in container throughput at the Port of Singapore observed in 2015. In Figure 2b, China’s export volume appears to be highly synchronized with Singapore’s container throughput in terms of seasonal patterns. A notable common characteristic of both charts is the distinct shift in trends before and after 2016. This inflection point marks a clear boundary between two different trend patterns, suggesting a potential structural change in the factors influencing both oil prices and container throughput.

Next, we aim to identify quantitative data that can illustrate the impact of the COVID-19 pandemic, which emerged in 2020, on container throughput at the Port of Singapore. While various approaches could be employed, we prepared an additional exogenous variable time series using confirmed case data. Specifically, we compare the number of confirmed COVID-19 cases worldwide with the container throughput at the Port of Singapore. The statistics for confirmed cases were collected from the WHO COVID-19 Dashboard (data.who.int, accessed on 5 July 2024).

Figure 3 illustrates the trends in container throughput at the Port of Singapore and global COVID-19 confirmed cases for the years 2020 and 2021. Contrary to initial expectations, there is no clear inverse relationship between the increase in confirmed cases and a decrease in container throughput. In fact, the correlation coefficient between container throughput and confirmed cases during this period is 0.477, indicating a moderate positive relationship.
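As a minimal sketch of how such a correlation coefficient can be computed, the following pure-Python function implements the Pearson correlation. It assumes that the reported value is a Pearson coefficient, which the text does not state explicitly, and the two short series are made-up placeholders rather than the actual throughput or case data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical placeholder values (NOT the data used in the study).
throughput = [3000, 2900, 3100, 3150, 3050, 3200]
confirmed_cases = [10, 40, 35, 80, 60, 120]
r = pearson_r(throughput, confirmed_cases)
print(f"Pearson r = {r:.3f}")
```

A positive value of r indicates that the two series tend to move in the same direction over the sample, which is the sense in which the relationship above is described as positive.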

Based on the analysis conducted thus far, this study considers the price of crude oil, China’s export volume, and the number of COVID-19 confirmed cases as quantitative exogenous variables that influence container throughput at the Port of Singapore. The following sections will detail how these selected variables are incorporated into the time series models, specifically focusing on their integration and impact on forecasting accuracy.

3. Methodology

This section describes the theoretical foundations of two time series modeling techniques: Seasonal AutoRegressive Integrated Moving Average (SARIMA) and its extension, SARIMA with eXogenous variables (SARIMAX). These models have proven invaluable in analyzing and forecasting various time series data, particularly those exhibiting seasonal patterns and influenced by external factors. We begin with an explanation of the SARIMA model, including its components and mathematical formulation. Subsequently, we introduce the SARIMAX model, which enhances the SARIMA framework by incorporating exogenous variables. Through this discussion, we aim to provide a robust theoretical basis for the analytical approaches employed in our study of container throughput at the Port of Singapore.

3.1. SARIMA Model

The SARIMA model, proposed in the 1970s, remains widely utilized in the field of time series analysis and forecasting due to its theoretical robustness. SARIMA extends the ARIMA model by incorporating seasonal components. While ARIMA models a time series using three components—autoregressive (AR), differencing (I), and moving average (MA) terms—SARIMA adds seasonal autoregressive (SAR), seasonal differencing (SI), and seasonal moving average (SMA) terms, resulting in a comprehensive model that accounts for seasonality.

To specify a SARIMA model, one must determine the order of each component, a process known as model identification. The orders of the six components (AR, I, MA, SAR, SI, and SMA) are denoted by p, d, q, P, D, and Q, respectively. For a time series \{Y_{t}\} with time index t, the SARIMA model can be represented as follows:

ϕ_{p}(B) Φ_{P}(B^{s}) (1 - B)^{d} (1 - B^{s})^{D} Y_{t} = θ_{q}(B) Θ_{Q}(B^{s}) ϵ_{t}    (1)

In this equation, s represents the seasonal period. In this study, s is set to 12 because the container throughput series is monthly. The model is written compactly using the backward shift operator B (i.e., B Y_{t} = Y_{t-1}), where ϕ_{p}(B), θ_{q}(B), Φ_{P}(B^{s}), and Θ_{Q}(B^{s}) are the following polynomials in B:

ϕ_{p}(B) = 1 - ϕ_{1} B - ϕ_{2} B^{2} - \dots - ϕ_{p} B^{p}

θ_{q}(B) = 1 + θ_{1} B + θ_{2} B^{2} + \dots + θ_{q} B^{q}

Φ_{P}(B^{s}) = 1 - Φ_{1} B^{s} - Φ_{2} B^{2s} - \dots - Φ_{P} B^{Ps}

Θ_{Q}(B^{s}) = 1 + Θ_{1} B^{s} + Θ_{2} B^{2s} + \dots + Θ_{Q} B^{Qs}

In these polynomials, ϕ_{1}, …, ϕ_{p} and θ_{1}, …, θ_{q} are the coefficients of the non-seasonal components, while Φ_{1}, …, Φ_{P} and Θ_{1}, …, Θ_{Q} are the coefficients of the seasonal components. The term ϵ_{t} represents white noise.
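To make the differencing operators in Equation (1) concrete, the following sketch applies (1 - B) and then (1 - B^{12}) to a toy monthly series. The series is an invented example (a linear trend plus a fixed 12-month pattern), not the throughput data, and the helper name difference is introduced here only for illustration.

```python
def difference(series, lag=1):
    """Apply the operator (1 - B^lag): return y_t - y_{t-lag} for t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Toy monthly series: linear trend plus a repeating 12-month pattern (made-up numbers).
pattern = [0, -5, 2, 1, 3, 2, 4, 3, 1, 2, 0, 1]
y = [100 + 2 * t + pattern[t % 12] for t in range(48)]

d1 = difference(y, lag=1)        # (1 - B) Y_t: removes the linear trend,
                                 # leaving a series that repeats every 12 months
d1_12 = difference(d1, lag=12)   # (1 - B^12)(1 - B) Y_t: removes that repeating pattern

print(len(y), len(d1), len(d1_12))
print(set(d1_12))
```

For this idealized series the double differenced values are all zero, because the trend and the seasonal pattern are exactly deterministic. With real data, what remains after differencing is the part that the AR and MA terms are meant to describe.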

3.2. SARIMAX Model

The SARIMAX model extends the SARIMA model by incorporating exogenous variables that significantly influence the time series. While SARIMA models the series using past values and seasonal effects, SARIMAX includes additional variables that can improve the accuracy of the forecasts by capturing external influences.

For a time series \{Y_{t}\} influenced by k exogenous series \{X_{1t}\}, \{X_{2t}\}, …, \{X_{kt}\}, the SARIMAX model is expressed as follows:

ϕ_{p}(B) Φ_{P}(B^{s}) (1 - B)^{d} (1 - B^{s})^{D} Y_{t} = θ_{q}(B) Θ_{Q}(B^{s}) ϵ_{t} + \sum_{i = 1}^{k} γ_{i} X_{it}    (2)

Here, γ_{i} represents the coefficient of the i-th exogenous variable, and the other symbols are consistent with those used in the SARIMA model equation.

In the preceding section, we identified three factors expected to influence container throughput at the Port of Singapore. Our subsequent empirical analysis aims to incorporate these factors as exogenous variables in the SARIMAX model to assess their efficacy in explaining variations in Singapore’s port throughput. Given that we are considering three distinct exogenous variables, it is important to note that multiple combinations of these variables can be implemented in the SARIMAX model. Through a series of experiments, we intend to determine the best combination of exogenous variables that yields the most robust results. This approach will not only validate the relevance of the selected factors but also provide insights into their relative importance in forecasting container throughput at the Port of Singapore.

The application of SARIMA and SARIMAX models in time series analysis and forecasting requires several essential steps, including ensuring the stationarity of the time series and determining the orders of the AR, MA, SAR, and SMA components. This systematic approach, known as the Box–Jenkins methodology [17], forms the foundation for robust time series modeling. In the following section, we will demonstrate the practical application of this methodology to the actual container throughput data from the Port of Singapore.

4. Empirical Analysis

This section explores the appropriate order of each component in the SARIMA model to best describe the monthly container throughput at the Port of Singapore. Then, suitable combinations of exogenous variables are suggested for the SARIMAX model. Lastly, comparative experiments are carried out to validate the predictive performance of the proposed models. For this study, data from January 2010 to December 2021 are used to fit the models (i.e., as training data), and the fitted models are then used to forecast container throughput for 2022.

4.1. Stationarity and Seasonality Test

Before fitting an ARIMA-based model to the time series, it is crucial to verify the stationarity of the series. If the series is non-stationary, appropriate transformations must be applied. As shown in Figure 1, the presence of an upward trend in the container throughput time series at Singapore Port is visually apparent. To test whether this trend leads to a violation of stationarity, the ADF (Augmented Dickey–Fuller) test was applied [18]. The null hypothesis of the ADF test is that the series has a unit root, i.e., is not stationary. Therefore, a low p-value indicates a rejection of the null hypothesis, confirming stationarity in the series.

When the ADF test was applied to the original time series data from January 2010 to December 2021, the ADF statistic was −1.28 with a p-value of 0.63, so the null hypothesis (i.e., the time series is not stationary) could not be rejected at the 0.01 significance level. Consequently, differencing was applied to the raw series to achieve stationarity. The ADF test on the differenced series yielded an ADF statistic of −3.53 and a p-value of 0.007, which is small enough to reject the null hypothesis, confirming that the differenced series is stationary. Based on these results, the differencing order d for the SARIMA model was set to 1. The ADF test was conducted using the adfuller function from the tsa.stattools module of the statsmodels library in Python.

Visual inspection of Figure 2 reveals clear seasonal patterns in container throughput, notably a significant decrease in volume during specific months (e.g., February). To further confirm the presence of seasonality, we applied the QS test, proposed by Ollech and Webel [19], to the differenced time series, using the qs() function from the seastests package in R. When we conducted the QS test on the double differenced time series (i.e., after additionally applying seasonal differencing), we obtained a QS statistic close to zero with a p-value of nearly one, so the null hypothesis of no seasonality could not be rejected. This suggests that the seasonal differencing removed the seasonal component and further improved the stationarity of the time series. Based on these findings, we set the seasonal differencing order (D) for the SARIMA model to 1.

4.2. Model Identification for SARIMA

The process of identifying the appropriate orders for the SARIMA model is essential for accurate time series forecasting. In this subsection, we determine the orders of AR (p), MA (q), SAR (P), and SMA (Q) components in the SARIMA model. Other parameters of the model, i.e., seasonal period (s), differencing order (d), and seasonal differencing order (D), have already been set to 12, 1, and 1, respectively.

The process of determining the appropriate orders for a SARIMA model often involves a trial-and-error approach, as noted in Box et al. [17] on time series analysis. We first examine the orders of the AR and SAR components, i.e., p and P. In this study, we identify potential candidate values for p and P based on the Partial Autocorrelation Function (PACF) graph of the double differenced time series. Figure 4 presents the PACF graph for the double differenced container throughput time series. As evident from the figure, a significant PACF value is observed at lag 1. Based on this observation, we select 0 and 1 as candidate values for the AR order, p. Furthermore, significant PACF values are visible at lags 12 and 24. Consequently, we designate 1 and 2 as candidate values for the SAR order, P.

Similarly, candidate orders for the MA and SMA components are identified using the Autocorrelation Function (ACF) graph. Figure 5 shows significant ACF values at lag 1, leading to candidate orders of 0 and 1 for the MA order, q. For the SMA order (Q), significant values appear at lag 12, suggesting candidate orders of 0 and 1.
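With two candidate values for each of p, P, q, and Q, there are 2^4 = 16 candidate specifications. The sketch below enumerates them and shows one possible way to rank fitted models by information criteria. The helper names score_candidates and best_spec are hypothetical, the use of statsmodels' SARIMAX class is an assumption (the text only states that Python libraries were used), and ranking by the pair (AIC, BIC) is one simple convention rather than the study's documented procedure.

```python
from itertools import product

# Candidate orders read off the PACF/ACF plots, with d = D = 1 and s = 12.
AR_ORDERS, SAR_ORDERS = (0, 1), (1, 2)
MA_ORDERS, SMA_ORDERS = (0, 1), (0, 1)

candidates = [
    ((p, 1, q), (P, 1, Q, 12))
    for p, P, q, Q in product(AR_ORDERS, SAR_ORDERS, MA_ORDERS, SMA_ORDERS)
]

def score_candidates(y, specs):
    """Fit every (order, seasonal_order) pair; return {spec: (AIC, BIC)}.

    Assumes statsmodels is installed. Fitting all specifications can take a while.
    """
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    scores = {}
    for order, seasonal_order in specs:
        fitted = SARIMAX(y, order=order, seasonal_order=seasonal_order).fit(disp=False)
        scores[(order, seasonal_order)] = (fitted.aic, fitted.bic)
    return scores

def best_spec(scores):
    """Pick the spec with the lowest AIC, using BIC to break ties."""
    return min(scores, key=scores.get)

print(len(candidates))
```

In use, one would call score_candidates on the training series and pass the result to best_spec. When AIC and BIC disagree, the choice between them is a modeling decision that this sketch does not settle.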

With two candidate values for each of the AR, SAR, MA, and SMA orders, a total of 16 combinations are possible. For each combination, the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values were calculated [20], and the combination with the lowest AIC and BIC values was considered optimal. Table 1 summarizes the AIC and BIC values for the 16 candidate models. In this study, the SARIMA(0,1,1)(1,1,1)₁₂ model was ultimately selected, as it exhibited the lowest AIC and BIC values among the candidate models. Specifically, the model achieved an AIC of 1578.906 and a BIC of 1587.532, indicating a better balance of fit and parsimony than the other specifications. This model can be interpreted as suggesting that the container throughput for the current month is most appropriately explained by incorporating the error term from one month prior, as well as the throughput and error term from twelve months prior. This formulation aligns with the inherent monthly seasonality often observed in port logistics data, while also accounting for short-term autocorrelation in the series.
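As a worked illustration of this interpretation, substituting p = 0, d = 1, q = 1, P = 1, D = 1, Q = 1, and s = 12 into Equation (1) gives the expressions below. The symbol W_{t} for the double differenced series is introduced here only for readability and is not part of the original notation.

```latex
(1 - \Phi_{1} B^{12})(1 - B)(1 - B^{12}) Y_{t} = (1 + \theta_{1} B)(1 + \Theta_{1} B^{12}) \epsilon_{t}

W_{t} = (1 - B)(1 - B^{12}) Y_{t} = Y_{t} - Y_{t-1} - Y_{t-12} + Y_{t-13}

W_{t} = \Phi_{1} W_{t-12} + \epsilon_{t} + \theta_{1} \epsilon_{t-1} + \Theta_{1} \epsilon_{t-12} + \theta_{1} \Theta_{1} \epsilon_{t-13}
```

The last line shows the dependence on the differenced throughput from twelve months earlier and on the error terms from one and twelve months earlier. The cross term at lag 13 arises from multiplying the non-seasonal and seasonal MA polynomials.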

4.3. Selecting Exogenous Variables for SARIMAX

In this subsection, we aim to validate the effects of exogenous variables identified through data analysis in Section 2 by fitting them into SARIMAX models and determining the final set of exogenous variables for demand forecasting. In Section 2, we identified three external factors influencing container throughput at the Port of Singapore: West Texas Intermediate (WTI) crude oil prices, China’s export volume, and the number of COVID-19 cases. While it is possible to include all these factors simultaneously as exogenous variables in the SARIMAX model, we also consider scenarios where these variables are incorporated individually or in pairs. This approach yields a total of seven possible combinations. We fitted SARIMAX models for each combination, and the resulting AIC and BIC values are summarized in Table 2.
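The count of seven follows from taking every non-empty subset of the three factors (3 singles, 3 pairs, and 1 triple). A minimal sketch of this enumeration is shown below. The string labels are hypothetical names chosen for illustration.

```python
from itertools import combinations

def nonempty_subsets(factors):
    """All non-empty combinations of the given factors, smallest subsets first."""
    return [
        subset
        for size in range(1, len(factors) + 1)
        for subset in combinations(factors, size)
    ]

# Hypothetical labels for the three exogenous factors.
FACTORS = ("wti_price", "china_export", "covid_cases")
exog_sets = nonempty_subsets(FACTORS)

for subset in exog_sets:
    print(subset)
```

Each subset corresponds to one SARIMAX specification, which can then be fitted and compared using AIC and BIC.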

Contrary to expectations, the results indicate that the inclusion of exogenous variables did not consistently lead to improvements in the AIC and BIC values compared to the SARIMA model selected in the previous subsection. Notably, only the SARIMAX model incorporating WTI as an exogenous variable demonstrated a lower AIC value than its SARIMA counterpart. These results suggest that among the exogenous variables considered, the WTI price has the most significant impact on model performance. Based on the quantitative criteria of AIC and BIC values, the relative importance of the three external factors can be inferred as follows: WTI prices, followed by China’s export volume, and lastly, COVID-19 confirmed cases. Furthermore, the inclusion of COVID-19 case numbers as an exogenous variable consistently yielded unfavorable outcomes across all model specifications.

Based on these findings, we have decided to exclude the COVID-19 case numbers from the set of exogenous variables in our SARIMAX models. Instead, we will proceed with three combinations of the remaining two variables (WTI price and China’s export volume) as the final set of exogenous variables for our SARIMAX models. The selected combinations will be employed in the demand forecasting process presented in the following subsection.

4.4. Comparative Experiments

To validate the predictive performance of the proposed SARIMA and SARIMAX models, we conducted an extensive comparative experiment. The models were trained using monthly time series data from January 2010 to December 2021, and their prediction capabilities were tested by generating forecasts for the year 2022. To assess the predictive accuracy of these models, we employed three widely recognized performance metrics: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). These metrics are defined as follows:

MAPE = \frac{100\%}{n} \sum_{t = 1}^{n} \frac{|A_{t} - F_{t}|}{A_{t}}

MAE = \frac{1}{n} \sum_{t = 1}^{n} |A_{t} - F_{t}|

RMSE = \sqrt{\frac{1}{n} \sum_{t = 1}^{n} (A_{t} - F_{t})^{2}}

where A_{t} represents the actual value, F_{t} denotes the forecasted value, and n is the number of observations. In this test, n is 12, i.e., the number of months in the year 2022.
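The three metrics can be computed directly from their definitions. The sketch below is a plain-Python illustration; the function names are chosen here for convenience, and the sample numbers are invented and unrelated to the results of this study.

```python
from math import sqrt

def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent; actual values must be positive."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    """Mean Absolute Error, in the same unit as the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Square Error, in the same unit as the data."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Invented example values (not results from the study).
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

MAPE is scale-free, whereas MAE and RMSE are expressed in the unit of the series. RMSE penalizes large individual errors more heavily than MAE does.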

To comprehensively evaluate the performance of our proposed approach, we employ a diverse set of benchmark methods, encompassing both traditional statistical techniques and contemporary machine learning algorithms. These include the Holt–Winters method [21], also known as triple exponential smoothing; linear, LASSO, and Ridge regression models, each based on a linear combination of relevant independent variables; the ECM (Error Correction Mechanism) [22], a method for analyzing non-stationary time series that share a long-term equilibrium relationship; SVR (Support Vector Regressor) [23], the regression version of SVM; Random Forest [24], a basic tree-based ensemble learning method; XGBoost [25], an advanced gradient boosting technique renowned for its high predictive performance; LightGBM [26], a gradient boosting framework that uses tree-based learning algorithms; Long Short-Term Memory (LSTM) [27], a type of recurrent neural network capable of learning long-term dependencies; and Prophet [28], a forecasting tool developed by Meta (formerly Facebook) and designed to handle time series data with strong seasonal effects.

In methods that require independent variables, the following predictors were used: period index, month information, WTI price, China’s export volume, container throughput from one month ago, and container throughput from one year ago. All the methods tested in this study were implemented using Python libraries.
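One way such a predictor table could be assembled is sketched below. Several details are assumptions not stated in the text: the month is encoded as an integer from 1 to 12, the exogenous values are taken from the same month as the target, and the first 12 observations are dropped because their one-year lag is unavailable. The function name build_features is hypothetical.

```python
def build_features(throughput, wti, china_export, first_month=1):
    """Build rows of [period index, month, WTI, export, lag-1, lag-12] and targets.

    first_month is the calendar month (1-12) of the first observation.
    """
    rows, targets = [], []
    for t in range(12, len(throughput)):          # lag-12 needs 12 earlier points
        month = (first_month - 1 + t) % 12 + 1    # calendar month of period t
        rows.append([
            t,                    # period index
            month,                # month information (integer encoding, assumed)
            wti[t],               # WTI price
            china_export[t],      # China's export volume
            throughput[t - 1],    # throughput one month ago
            throughput[t - 12],   # throughput one year ago
        ])
        targets.append(throughput[t])
    return rows, targets

# Invented toy inputs, 30 months long (not real data).
toy_throughput = list(range(100, 130))
toy_wti = [50] * 30
toy_export = [200] * 30
X, y = build_features(toy_throughput, toy_wti, toy_export)
print(len(X), X[0], y[0])
```

The resulting rows and targets can be passed to any regression-style benchmark. Models that are sensitive to feature scale or to the ordinal encoding of the month may need additional preprocessing, which this sketch omits.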

Table 3 illustrates the comparative performance of our proposed SARIMA and SARIMAX models against several benchmark methods, utilizing MAPE, MAE, and RMSE as evaluation metrics. Again, all the methods tested in this study utilized the same samples of the training dataset to forecast 12 months of 2022. Among the proposed models, the SARIMAX(0,1,1)(1,1,1)₁₂ incorporating both WTI price and China’s export volume as exogenous variables demonstrated superior performance, achieving the lowest MAPE (2.34%), MAE (72.27), and RMSE (92.99) across all models. This suggests that the inclusion of these specific exogenous variables significantly enhances the model’s predictive accuracy. Interestingly, the SARIMAX model with China’s export volume alone outperformed the base SARIMA model and the SARIMAX model with only WTI price, indicating that China’s export volume may be a more influential factor in predicting container throughput for this particular port. Comparing the proposed models to the benchmarks, we observe that our best-performing SARIMAX model surpasses all benchmark methods across all error metrics.

Traditional statistical methods such as the Holt–Winters method, linear regression, LASSO regression, Ridge regression, and ECM did not show superior performance. Even though the same exogenous variables considered in the SARIMAX model were included as independent variables in the regression models, these methods were not well suited to the forecasting problem considered in this study. It is noteworthy that while some well-known machine learning methods like SVR, XGBoost, and LSTM were included in the benchmark comparisons, they did not outperform the proposed SARIMAX models in this specific forecasting task. They also did not surpass simpler machine learning approaches like Random Forest and LightGBM.

For a more detailed analysis, we plotted a time series chart of the forecast results for the year 2022. Figure 6 presents a comparative visualization of the actual container throughput at the Port of Singapore for the year 2022 against the forecasts generated by our proposed models (SARIMA and SARIMAX) and two top-performing benchmark models (Random Forest and Prophet). The SARIMAX model, which incorporates WTI crude oil prices and China’s export volume as exogenous variables, demonstrates the closest alignment with the actual throughput values. It particularly captures the overall trend and seasonal fluctuations. The SARIMA model, while not accounting for external factors, still provides a reasonably accurate forecast, generally following the actual data’s pattern but with slightly larger deviations in some months. The Random Forest model’s forecasts exhibit minimal variation across months, suggesting a relatively static forecast pattern. This lack of variability, despite the model’s favorable performance in numerical metrics, may indicate that its accuracy is somewhat coincidental rather than based on capturing underlying patterns. The Prophet model, while capturing the general trend, shows more pronounced deviations from the actual values compared to the other models, especially during the middle months of the year. Notably, all models seem to struggle with accurately predicting the sharp increase in throughput observed in October. This suggests that this spike might be attributed to factors not fully captured by the models or the exogenous variables considered. Overall, this visual comparison corroborates the quantitative findings, highlighting the superior predictive capability of the SARIMAX model and the value of incorporating relevant exogenous variables in forecasting container throughput for the Port of Singapore.

5. Conclusions and Future Research Directions

This study proposed SARIMAX models for forecasting container throughput at the Port of Singapore, the second-largest container port in the world by traffic volume. To determine the orders of the SARIMAX models, we first analyzed the container throughput time series, which yielded candidate orders for a SARIMA model. We then identified the most suitable orders by experimenting with the candidate combinations and applied them to the SARIMAX model. For the exogenous variables in the SARIMAX model, we considered three external factors: the WTI crude oil price, China’s export volume, and the number of confirmed COVID-19 cases worldwide. Through a series of experiments involving different combinations of these factors, we found that the combination of WTI crude oil prices and China’s export volume yielded the best predictive performance. Our results demonstrated that including these external factors significantly enhances the predictive accuracy of the model compared to the traditional SARIMA model and the other benchmark methods. This study highlights the importance of selecting appropriate exogenous variables when forecasting container throughput, offering valuable insights for both researchers and practitioners in the field.

However, there are also limitations inherent in using exogenous variables for forecasting. One significant condition is the necessity to predict these external factors themselves when making future projections. The accuracy of our container throughput forecast would be dependent on the precision of the forecasts for oil prices and export volumes.

Several directions for future research can be considered from this study. Future studies could investigate the impact of other economic indicators or global events on container throughput, and their combinations. The development of hybrid models, combining statistical methods with artificial intelligence approaches, may yield more robust forecasting models. Another important direction for future research is the exploration of lagged effects of exogenous variables. Different economic factors may impact container throughput with varying time delays. For example, changes in oil prices might affect shipping decisions with a lag of several months. By incorporating lagged versions of our exogenous variables into the SARIMAX model, we could potentially capture these delayed effects and further improve forecasting accuracy.
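As a hedged illustration of the lagged-variable idea, the following Python sketch builds lagged copies of an exogenous series so that each row can be aligned with the throughput value of the same month. The helper `make_lagged`, the lag choices, and the sample values are hypothetical and are not taken from the study.

```python
def make_lagged(series, lags):
    """Return one row per time step t, holding series[t - lag] for each lag.

    Rows start at t = max(lags), the first step for which every lag exists.
    """
    max_lag = max(lags)
    return [[series[t - lag] for lag in lags] for t in range(max_lag, len(series))]


# Hypothetical monthly oil prices (illustrative values only).
wti = [70, 72, 75, 71, 68, 74]
lags = (0, 1, 3)  # current month, one month back, three months back

lagged_wti = make_lagged(wti, lags)
for row in lagged_wti:
    print(row)
# [71, 75, 70]
# [68, 71, 72]
# [74, 68, 75]
```

The target series has to be trimmed by the same `max(lags)` leading observations before the lagged rows are passed to a model as exogenous regressors, so longer lags shorten the usable training window.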

Author Contributions

Methodology, G.-C.L.; Software, J.-Y.B.; Validation, G.-C.L.; Formal analysis, G.-C.L.; Investigation, J.-Y.B.; Resources, G.-C.L.; Data curation, G.-C.L.; Writing, G.-C.L. and J.-Y.B. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by Konkuk University in 2024.

Data Availability Statement

The monthly container throughput data for Singapore Port were collected from the website of the Singapore Department of Statistics: tablebuilder.singstat.gov.sg (accessed on 26 June 2024). The price of West Texas Intermediate (WTI) crude oil was collected from the U.S. Energy Information Administration (EIA): www.eia.gov (accessed on 28 June 2024). China’s export volume was collected from the Federal Reserve Bank of St. Louis’ Federal Reserve Economic Data: fred.stlouisfed.org (accessed on 28 June 2024). The number of confirmed COVID-19 cases in the world was collected from the WHO COVID-19 Dashboard: data.who.int (accessed on 5 July 2024).

Acknowledgments

This paper was supported by Konkuk University in 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Monthly Container Throughput of Singapore Port from 1995 to 2021. Source: Data from the Singapore Department of Statistics.

Figure 2. Trends of Singapore container throughput vs. external factors from after the financial crisis: (a) container throughput vs. West Texas Intermediate (WTI) price (Unit: USD); (b) container throughput vs. China’s export volume (Unit: USD). Source: Data from the Singapore Department of Statistics, the U.S. Energy Information Administration, and the Federal Reserve Bank of St. Louis.

Figure 3. Container throughput vs. world COVID-19 confirmed cases from 2020 to 2021. Source: Data from the Singapore Department of Statistics and the WHO COVID-19 Dashboard.

Figure 4. Partial Autocorrelation Function graph of the double differenced time series. Source: own elaboration based on data from the Singapore Department of Statistics.

Figure 5. Autocorrelation Function graph of the double differenced time series. Source: own elaboration based on data from the Singapore Department of Statistics.

Figure 6. Actual Container Throughput vs. Forecasts by four Models in 2022. Source: Data from the Singapore Department of Statistics and own elaboration based on the data tested in this study.

Table 1. AIC and BIC values of all the candidate SARIMA models. Source: own elaboration based on the data tested in this study.

(p,d,q)(P,D,Q)s	AIC	BIC	(p,d,q)(P,D,Q)s	AIC	BIC
(0,1,0)(1,1,0)₁₂	1595.49	1601.24	(1,1,0)(1,1,0)₁₂	1588.825	1597.451
(0,1,0)(1,1,1)₁₂	1578.906	1587.532	(1,1,0)(1,1,1)₁₂	1572.067	1583.568
(0,1,0)(2,1,0)₁₂	1593.5	1602.125	(1,1,0)(2,1,0)₁₂	1586.274	1597.775
(0,1,0)(2,1,1)₁₂	1580.87	1592.371	(1,1,0)(2,1,1)₁₂	1573.979	1588.355
(0,1,1)(1,1,0)₁₂	1587.391	1596.016	(1,1,1)(1,1,0)₁₂	1589.361	1600.861
(0,1,1)(1,1,1)₁₂	1570.063	1581.564	(1,1,1)(1,1,1)₁₂	1571.929	1586.305
(0,1,1)(2,1,0)₁₂	1584.401	1595.902	(1,1,1)(2,1,0)₁₂	1586.329	1600.705
(0,1,1)(2,1,1)₁₂	1571.916	1586.292	(1,1,1)(2,1,1)₁₂	1573.762	1591.013
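The selection implied by Table 1 can be reproduced with a short Python sketch. The AIC and BIC values below are transcribed from Table 1; the dictionary layout and variable names are illustrative assumptions.

```python
# Keys are ((p, d, q), (P, D, Q, s)); values are (AIC, BIC) from Table 1.
table1 = {
    ((0, 1, 0), (1, 1, 0, 12)): (1595.49, 1601.24),
    ((0, 1, 0), (1, 1, 1, 12)): (1578.906, 1587.532),
    ((0, 1, 0), (2, 1, 0, 12)): (1593.5, 1602.125),
    ((0, 1, 0), (2, 1, 1, 12)): (1580.87, 1592.371),
    ((0, 1, 1), (1, 1, 0, 12)): (1587.391, 1596.016),
    ((0, 1, 1), (1, 1, 1, 12)): (1570.063, 1581.564),
    ((0, 1, 1), (2, 1, 0, 12)): (1584.401, 1595.902),
    ((0, 1, 1), (2, 1, 1, 12)): (1571.916, 1586.292),
    ((1, 1, 0), (1, 1, 0, 12)): (1588.825, 1597.451),
    ((1, 1, 0), (1, 1, 1, 12)): (1572.067, 1583.568),
    ((1, 1, 0), (2, 1, 0, 12)): (1586.274, 1597.775),
    ((1, 1, 0), (2, 1, 1, 12)): (1573.979, 1588.355),
    ((1, 1, 1), (1, 1, 0, 12)): (1589.361, 1600.861),
    ((1, 1, 1), (1, 1, 1, 12)): (1571.929, 1586.305),
    ((1, 1, 1), (2, 1, 0, 12)): (1586.329, 1600.705),
    ((1, 1, 1), (2, 1, 1, 12)): (1573.762, 1591.013),
}

# Lower is better for both criteria.
best_by_aic = min(table1, key=lambda orders: table1[orders][0])
best_by_bic = min(table1, key=lambda orders: table1[orders][1])

print("Lowest AIC:", best_by_aic, table1[best_by_aic][0])
print("Lowest BIC:", best_by_bic, table1[best_by_bic][1])
```

Both criteria point to the same candidate, (0,1,1)(1,1,1)₁₂, which is the order used for the SARIMA and SARIMAX models in Table 3.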

Table 2. AIC and BIC values of SARIMAX with the corresponding exogenous variables. Source: own elaboration based on the data tested in this study.

Exogenous Variables	AIC	BIC
WTI price (W)	1568.763	1583.139
China’s export volume (E)	1585.055	1599.431
COVID-19 cases (C)	1588.809	1603.185
W, E	1583.720	1600.971
W, C	1587.682	1604.933
E, C	1587.634	1604.885
W, E, C	1586.357	1606.483

Table 3. The performance of the SARIMA and SARIMAX models against benchmark methods. Source: own elaboration based on the data tested in this study.

Group	Forecasting Model	MAPE (%)	MAE	RMSE
Proposed Models	SARIMA(0,1,1)(1,1,1)₁₂	3.02	91.97	117.21
Proposed Models	SARIMAX(0,1,1)(1,1,1)₁₂ with W ¹	3.66	111.82	135.33
Proposed Models	SARIMAX(0,1,1)(1,1,1)₁₂ with E ²	2.71	84.44	99.41
Proposed Models	SARIMAX(0,1,1)(1,1,1)₁₂ with W & E	2.34	72.27	92.99
Benchmarks	Holt–Winters Method	3.83	117.32	144.23
Benchmarks	Linear Regression	4.74	145.65	155.77
Benchmarks	LASSO Regression	4.31	131.40	152.80
Benchmarks	Ridge Regression	5.40	165.22	188.41
Benchmarks	ECM	4.06	128.20	156.84
Benchmarks	SVR	3.79	115.41	140.09
Benchmarks	Random Forest	2.91	89.27	114.14
Benchmarks	XGBoost	4.19	131.67	159.97
Benchmarks	LightGBM	2.98	91.17	108.15
Benchmarks	LSTM	3.75	116.34	153.22
Benchmarks	Prophet	2.94	89.36	117.78

¹ WTI crude oil price. ² China’s export volume.
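For reference, the three accuracy measures reported in Table 3 can be computed as in the following Python sketch, which uses their standard definitions. The function names and the sample values are illustrative assumptions and are not data from the study.

```python
import math


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean Absolute Error, in the units of the series."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error, in the units of the series."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical values for illustration only.
actual = [100, 200, 400]
forecast = [110, 190, 400]

print(f"MAPE = {mape(actual, forecast):.2f}%")  # MAPE = 5.00%
print(f"MAE  = {mae(actual, forecast):.2f}")    # MAE  = 6.67
print(f"RMSE = {rmse(actual, forecast):.2f}")   # RMSE = 8.16
```

MAPE is scale-free, whereas MAE and RMSE are expressed in the units of the throughput series, and RMSE penalizes large individual errors more heavily than MAE.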


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
