Article

Development of a Time Series E-Commerce Sales Prediction Method for Short-Shelf-Life Products Using GRU-LightGBM

Department of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310014, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(2), 866; https://doi.org/10.3390/app14020866
Submission received: 20 December 2023 / Revised: 16 January 2024 / Accepted: 17 January 2024 / Published: 19 January 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Effective sales prediction for e-commerce would assist retailers in developing accurate production and inventory control plans, which would further help them to reduce inventory costs and overdue losses. This paper develops a systematic method for e-commerce sales prediction, with a particular focus on predicting the sales of products with short shelf lives. The short-shelf-life product sales prediction problem is poorly addressed in the existing literature. Unlike products with long shelf lives, short-shelf-life products such as fresh milk exhibit significant fluctuations in sales volume and incur high inventory costs, so accurate prediction is crucial for them. To address these issues, a stacking method for prediction is developed based on the integration of GRU and LightGBM. The proposed method not only inherits the ability of the GRU model to capture timing features accurately but also acquires the ability of LightGBM to solve multivariable problems. A case study is conducted to examine the accuracy and efficiency of the GRU-LightGBM model. Comparisons with other sales prediction methods, such as ARIMA and SVR, are also presented. The comparative results show that the GRU-LightGBM model is able to predict the sales of short-shelf-life products with higher accuracy and efficiency. The selected features of the GRU-LightGBM model also provide interpretability that is useful when developing sales strategies.

1. Introduction

In recent years, e-commerce has entered a new stage of rapid expansion [1,2,3]. With its development and prosperity, e-commerce has displaced many offline stores. Sales prediction is important to e-commerce: accurate sales prediction has great significance for enterprises in the coordination of production, procurement, and logistics. Many sales prediction issues are associated with time series prediction. Traditional sales prediction methods include, for example, the parameter prediction method and the Autoregressive Integrated Moving Average (ARIMA) method [4,5]. Machine learning methods such as Gradient-Boosted Regression Trees (GBRT), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost) are also used [6,7,8]. Traditional methods are mainly developed for offline retailers, where sales are affected by the weather, season, policies, and passenger flow volume. In contrast, e-commerce sales are mainly associated with activities on e-commerce platforms and network hot spots. For example, the emergence of the COVID-19 pandemic accelerated the decline of traditional offline retailing [9], but e-commerce platforms such as Taobao and Amazon still recorded large sales volumes. Hence, there is a specific need to develop sales prediction methods for e-commerce.
Furthermore, e-commerce sales of short-shelf-life products, such as fresh milk, differ from those of other products. Because of the short shelf life, higher precision is required in sales prediction to guide production and inventory. Traditional sales prediction methods such as ARIMA and SARIMA perform poorly on this type of problem, and poor sales prediction leads to poorly developed inventory plans and extensive waste. To solve the above issues, there is a need to develop a systematic prediction method for short-shelf-life product sales in e-commerce; hence, in this paper, the GRU-LightGBM prediction method is proposed. The method benefits predictions by using GRU to capture timing features and by using LightGBM for multivariable forecasting. The proposed integration is then applied to a case study for its validation and application. The results show that the GRU-LightGBM method achieves accurate predictions with an acceptable training speed.
The structure of this paper is as follows. Section 1 introduces the research. Section 2 reviews the literature related to sales prediction methods. Section 3 elaborates the methodology of GRU-LightGBM development. In Section 4, the GRU-LightGBM method is applied to a case study for examination. Section 5 provides a comparison with other prediction methods and presents the conclusions.

2. Literature Review of Sales Prediction Methods

The existing sales prediction methods are associated with parameter prediction methods based on time series, machine learning, and deep learning models.
The parameter prediction methods based on time series can be mainly divided into two categories: traditional statistical methods and autoregression methods. Traditional statistical methods assume that time series have regularity and can reasonably be extrapolated along the trend. Pongdatu used the exponential smoothing method to solve sales forecasting problems in retail stores [10]. Sidqi used the single exponential smoothing and double exponential smoothing methods in the case of product selling prediction for the XYZ store [11]. Bergmeir presented an improved exponential smoothing method in the M3 competition and achieved good results [12]. Contreras proposed that the autoregressive method is more accurate in stationary time series prediction [13]. The autoregressive method is also used for supply chain development; for example, Wang [14] proposed the ARIMA method for customer demand prediction. Babai [15] then used the ARIMA model to analyze the relationship between prediction accuracy and inventory. Later, Biswas [16] proposed hybrid models, e.g., ARIMA-RF and ARIMA-BCART, for wind power generation prediction. Bi [17] developed a combined model based on regression and time series methods for battery charging time prediction. In essence, the parameter prediction methods based on time series can only capture linear relations, and they are inefficient in analyzing nonlinear relations. They also require stationary time series data as input: in a stationary time series, the mean, variance, and covariance do not change over time, so a stationary series has no trend and no periodic changes. However, the data in e-commerce sales prediction are seasonal, which increases the complexity of using the parameter prediction methods. Additionally, the parameter prediction methods based on time series are usually only suitable for single-variable prediction, whereas in e-commerce sales scenarios the influencing factors are diverse.
Machine learning and deep learning methods have been recently used for multivariable prediction studies [18,19]. In the application of machine learning, Xia [20] proposed the ForeXGBoost model for vehicle sales prediction. Gradient Boosting Machine (GBM) and Light Gradient Boosting Machine (LightGBM) are also widely applied in sales forecasting [21]. Some other sales prediction methods are related to Support Vector Machine (SVM) [22] and Extreme Learning Machine (ELM) [23,24]. In the application of deep learning, Kuo [25] used a backpropagation neural network based on an artificial immune system to address the problem of sales prediction with the large influence of category variables and proved the accuracy of the algorithm. Ma [26] presented a meta-learning framework based on newly developed deep convolutional neural networks in retail sales prediction. Long Short-Term Memory (LSTM), the Recurrent Neural Network (RNN), and Gated Recurrent Units (GRU) are also widely used deep learning models for sales prediction [27,28,29]. Compared with other prediction methods, deep learning models have higher accuracy but a slow calculation speed and poor interpretability.
While managing complex sales prediction problems, a single prediction model does not always provide a satisfactory output; hence, many studies are developed using a combination method [30]. The combination method is able to harmonize the advantages of the combined models to improve the prediction accuracy. Deep learning models are always chosen in such combinations due to their high precision in solving complex problems [31,32]. For example, Han proposed a combined model based on the ARIMA model and the LSTM neural network to improve the accuracy of sales forecasting for manufacturing companies [33].

3. Methodology

3.1. Research Premise

Compared to traditional offline retailers, multiple factors such as platform activities and sales promotion are introduced by e-commerce, which increases the complexity of sales prediction. Moreover, the sales of short-shelf-life products require the accurate control of inventory to reduce overdue losses; hence, higher-precision prediction methods are needed. Existing prediction methods have drawbacks in solving the above issues. Thus, there is a need to develop a new prediction method for e-commerce’s short-shelf-life products.

3.2. Research Approach

In this research, a stacking model for short-shelf-life product sales prediction is proposed based on the integration of LightGBM and GRU. The short-shelf-life product chosen in our case is fresh milk. The proposed prediction method consists of four steps; see Table 1. The proposed method is further applied to a case study for its validation. The overall research workflow is illustrated in Figure 1.

3.3. Data Processing Methods

3.3.1. Data Preprocessing

Data preprocessing comprises data cleaning, data integration, data transformation, and data reduction. In this research, the data are preprocessed as follows.
(1)
One-hot encoding. The categorical features are one-hot encoded because ordinal numeric coding would impose an artificial order on the categories and introduce errors in this work.
(2)
Standardization. In this work, min–max normalization is used to map the initial data to the interval [0, 1].
(3)
Missing value processing. In this work, the regression tree model is used to fill in the missing values.
(4)
Outlier handling. The data set is scanned using the sliding window method [34]. The mean and standard deviation of each window are calculated, and data points outside the range of three standard deviations from the mean are identified as outliers. The outliers are then handled as missing values.
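The standardization and outlier-handling steps above can be sketched as follows. This is a minimal NumPy illustration; the window length, the leave-one-out window statistics, and the toy data are assumptions for the example, not the paper's exact settings.

```python
import numpy as np

def min_max_scale(x):
    # Step (2): map the initial data to the interval [0, 1].
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def sliding_window_outliers(x, window=7, k=3.0):
    # Step (4): flag points lying more than k standard deviations from the
    # mean of their sliding window; flagged points would then be re-filled
    # as missing values. The point itself is excluded from the window
    # statistics (an assumption here) so an extreme value cannot mask itself.
    x = np.asarray(x, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    for i in range(len(x)):
        lo, hi = max(0, i - window), min(len(x), i + window + 1)
        neighbours = np.delete(x[lo:hi], i - lo)
        mu, sd = neighbours.mean(), neighbours.std()
        if sd > 0 and abs(x[i] - mu) > k * sd:
            flags[i] = True
    return flags

sales = np.array([10, 12, 11, 13, 200, 12, 11, 10, 13, 12], dtype=float)
scaled = min_max_scale(sales)               # values mapped into [0, 1]
outliers = sliding_window_outliers(sales)   # flags the spike at index 4
```

In a full pipeline, the flagged positions would be set to NaN and refilled by the regression tree model of step (3).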

3.3.2. The STL Method

STL is used to analyze the fluctuations of the data. Various time series are handled using the STL method [35,36]. The STL method decomposes time series data into variation components, including seasonality, trends, and residuals, as shown using Equation (1) [37].
Y_t = T_t + S_t + R_t, (1)
where Y_t, S_t, T_t, and R_t, respectively, represent the time series data, seasonal component, trend component, and residual component, and t = 1, …, N indexes the sequence.
The iteration mechanism of STL comprises two recursive procedures, the inner and outer loops. The inner loops are used to calculate the seasonal component and the trend component. The steps of the inner loops are shown in Table 2.
The outer loops are used to adjust the robust weight ρ_v, which is used in Step 2 and Step 6 of the inner loops. In the process of smoothing via loess, the neighborhood weight is multiplied by a robust weight, calculated as follows:
ρ_v = B(|R_v| / h), (2)
h = 6 · median(|R_v|), v = 1, …, N, (3)
B(u) = (1 − u²)² for 0 ≤ u < 1; B(u) = 0 for u ≥ 1, (4)
where R_v represents the residual at data point v, h is defined as an intermediate scale variable, and B(u) represents the bisquare function.
The advantage of the STL method is its robust adaptation to outliers in the data, and it can successfully handle large volumes of highly volatile and unstable time series data [39].
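To make the decomposition of Equation (1) concrete, the following sketch performs a simplified classical additive decomposition. It is a stand-in for full STL, which would further refine the trend and seasonal components with iterated loess smoothing and the robust weights above; the weekly period and synthetic series are assumptions for the example.

```python
import numpy as np

def additive_decompose(y, period):
    # Split a series into trend, seasonal, and residual components so that
    # y = trend + seasonal + resid, mirroring Y_t = T_t + S_t + R_t.
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Trend T_t: centred moving average over one full period.
    trend = np.convolve(y, np.ones(period) / period, mode="same")
    detrended = y - trend
    # Seasonal S_t: mean of each within-period position, zero-centred.
    base = np.array([detrended[i::period].mean() for i in range(period)])
    base -= base.mean()
    seasonal = np.resize(base, n)
    # Residual R_t: whatever the other two components do not explain.
    resid = y - trend - seasonal
    return trend, seasonal, resid

t = np.arange(12 * 7)  # twelve "weeks" of synthetic daily sales
y = 0.5 * t + 10.0 * np.sin(2 * np.pi * t / 7)
trend, seasonal, resid = additive_decompose(y, period=7)
```

By construction the three components always sum back to the original series, which is exactly the additive identity of Equation (1).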

3.3.3. Feature Selection Using RFECV

The features are selected using the Recursive Feature Elimination with Cross-Validation (RFECV) method. The Recursive Feature Elimination (RFE) method is a recursive process. It ranks the features according to their importance to the algorithm [40,41]. The pseudo-code of the algorithm is shown in Algorithm 1.
Algorithm 1: Pseudo-code for the RFE algorithm
Inputs: Training set T; set of p features F = {f_1, …, f_p}; ranking method M(T, F)
Output: Final ranking R
1: for i in (1 : p) do
2:   Rank set F using M(T, F)
3:   f* ← last-ranked feature in F
4:   R(p − i + 1) ← f*
5:   F ← F − {f*}
6: end for
Based on the RFE algorithm, the RFECV method cross-validates different feature combinations; by evaluating the weight coefficient of each combination, the best combination of features is finally selected.
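As a sketch of this selection procedure, scikit-learn's `RFECV` can be applied to a synthetic regression task. The estimator, data, and fold count here are illustrative assumptions, not the paper's actual setup.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the candidate sales features: 12 columns,
# of which only 5 actually drive the target.
X, y, coef = make_regression(n_samples=200, n_features=12, n_informative=5,
                             noise=0.1, coef=True, random_state=0)

# Recursively eliminate the weakest feature (Algorithm 1) and keep the
# combination with the best cross-validated score.
selector = RFECV(LinearRegression(), step=1, cv=5).fit(X, y)
print(selector.n_features_)   # number of features retained
print(selector.support_)      # boolean mask of selected features
```

In the paper's setting, the same call with the 38 candidate features would yield the retained set analyzed in Section 4.3.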

3.4. The Development of GRU-LightGBM

3.4.1. The GRU Method

The GRU model is a variant of the RNN that was proposed by [42]. It is used to deal with the problem of vanishing or exploding gradients in long-time-series models [43]. The structure of GRU is similar to that of the standard RNN. GRU data transmission at adjacent time steps is shown in Figure 2.
The GRU has gated units: the update gate z and the reset gate r. The update gate is computed by
z = σ(ω_z · [x_t, h_{t−1}]), (5)
The reset gate is computed by
r = σ(ω_r · [x_t, h_{t−1}]), (6)
After the reset gate is computed, the candidate state h̃ is obtained by
h̃ = tanh(ω · [x_t, r ⊙ h_{t−1}]), (7)
Then, we update the transfer variable h_t:
h_t = (1 − z) ⊙ h_{t−1} + z ⊙ h̃, (8)
The calculation process of the GRU unit is shown in Figure 3.
The value range of z is [0, 1]. The closer z is to 1, the more information is left in memory; otherwise, more information is forgotten. The update gate z can perform the forgetting and memorization of gates at the same time.
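A single GRU update following the gate equations above can be written out directly. The NumPy sketch below omits bias terms and uses random weights purely for illustration; it is not the trained network of Section 4.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    # One GRU time step: each weight matrix acts on the concatenation
    # [x_t, h_{t-1}] (biases omitted for brevity).
    xh = np.concatenate([x_t, h_prev])
    z = sigmoid(W_z @ xh)                    # update gate
    r = sigmoid(W_r @ xh)                    # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))  # candidate
    return (1.0 - z) * h_prev + z * h_tilde  # new hidden state

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W_z, W_r, W_h = [rng.standard_normal((d_h, d_in + d_h)) for _ in range(3)]
h = gru_step(rng.standard_normal(d_in), np.zeros(d_h), W_z, W_r, W_h)
```

Because the candidate state passes through tanh and the update gate lies in (0, 1), the new hidden state stays bounded, which is what keeps gradients from exploding over long sequences.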

3.4.2. The LightGBM Method

LightGBM is a type of Gradient Boosting Decision Tree (GBDT) that was proposed in 2017 [44]. It was developed to address computational complexity, one of the major challenges in traditional machine learning approaches [45]. LightGBM is a distributed and highly efficient technique. It contains two novel techniques called Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) [46]. The GOSS algorithm considers that sample points with a larger gradient may have a more significant impact on the results; therefore, sample points with a larger gradient are retained, and sample points with a smaller gradient are randomly sampled. To avoid the calculation of redundant features, the EFB method groups features of high-dimensional data in a sparse feature space [47]. The histogram algorithm used in EFB aims to combine mutually exclusive features; see Figure 4. Firstly, the continuous feature values are discretized into k integers to construct a histogram of width k. Then, the mutually exclusive features are stored in different discrete histograms to construct the feature-binding group. This method reduces storage requirements. The main parameters of LightGBM are shown in Table 3.
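The histogram discretization step can be illustrated on its own. The sketch below bins continuous feature values into k integers using quantile edges; the bin count, the choice of quantile edges, and the data are assumptions for illustration, not LightGBM's internal implementation.

```python
import numpy as np

def histogram_bin(feature, k=16):
    # Discretise continuous values into k integer bins using quantile
    # edges, so that split search can work on a width-k histogram
    # instead of the raw sorted values.
    edges = np.quantile(feature, np.linspace(0, 1, k + 1)[1:-1])
    return np.searchsorted(edges, feature, side="right")

x = np.random.default_rng(1).normal(size=1000)
bins = histogram_bin(x, k=16)   # integers in the range 0..15
```

After binning, a gradient histogram over k buckets replaces a scan over every distinct feature value, which is the source of LightGBM's speed and memory savings.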

3.4.3. Harmonization of GRU-LightGBM

The structure of the GRU-LightGBM model is shown in Figure 5. The model consists of two layers. In Layer 01, the training data are evenly divided into five parts; four parts are used for training, and the remaining part is used for prediction. This cross-validation scheme, with one part held out in turn, is used to train the GRU model. Driven by the model stacking approach [49], the five out-of-fold prediction results of the first layer are combined and treated as new training data for the LightGBM model in Layer 02. The test data fed to the LightGBM model are the prediction results of the GRU model.
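The two-layer procedure can be sketched generically. In the sketch below, scikit-learn's `Ridge` stands in for both the GRU base learner and the LightGBM meta learner, so only the fold logic of the stacking is illustrated, not the paper's actual models or data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)
X_test = rng.standard_normal((20, 5))

# Layer 01: train the base learner on four folds and predict the held-out
# fold, cycling through all five folds; test-set predictions are averaged.
oof = np.zeros(len(y))
test_preds = []
for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    base = Ridge().fit(X[tr], y[tr])
    oof[va] = base.predict(X[va])
    test_preds.append(base.predict(X_test))

# Layer 02: the meta learner is trained on the combined out-of-fold
# predictions and fed the averaged base-model test predictions.
meta = Ridge().fit(oof.reshape(-1, 1), y)
y_test_hat = meta.predict(np.mean(test_preds, axis=0).reshape(-1, 1))
```

Using only out-of-fold predictions to train the second layer is the key design choice: it prevents the meta learner from seeing predictions the base model made on its own training data, which would leak information.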
Due to the characteristics of GRU and LightGBM, the following hypothesis is proposed.
Hypothesis 1. 
The prediction accuracy of GRU-LightGBM will be better than that of the traditional prediction methods.

4. Application Using GRU-LightGBM for Sales Prediction

4.1. Background of the Case Study

The case study pertains to the prediction of sales for short-shelf-life fresh milk in the context of e-commerce. In this investigation, Product A is selected as a representative example for the analysis and forecasting of sales. This choice is motivated by the consistent maintenance of the stock-keeping unit for Product A over the preceding five years, coupled with a relatively low incidence of shortages. The shelf life of this product is 14 days. This research collected the sales data for Product A for a total of 5 years, from 2017 to 2021. The sales data of the last 15 days, from 10 December 2021 to 24 December 2021, are selected as the test dataset, and the remaining data are used as the training dataset.

4.2. STL Analysis

The STL method is used to analyze the historical sales data of Product A in the last five years. The time series analysis results are shown in Figure 6. The characteristics of the historical sales data are shown in Table 4.

4.3. Result of Feature Selection

The primary features for sales prediction are selected based on the characteristics of e-commerce sales. There are primarily 38 features. The RFECV method is used to analyze the impact intensity of the features on the sales. The result of RFECV is shown in Figure 7.
The curve shows that the first 26 features have significant impacts on sales, while the inclusion of the last 12 features is less important. In the following analyses, these 12 features are eliminated. We then classify the 26 features to improve the computational efficiency and interpretability; see Table 5.

4.4. Result of Sales Prediction Using GRU-LightGBM

In the GRU-LightGBM model, the main parameters of the GRU network and LightGBM are determined as follows. For the GRU network, according to the results of the STL analysis, the parameter ‘time_step’ of GRU is set as 14, which means that the data of the past 14 days are used to predict the sales volume of the next day. Then, through experiments, the accuracy is found to be highest when the number of GRU network layers is one and the number of hidden layer neurons is 64. The structure of the GRU network is shown in Figure 8. The model training parameters are set as epochs = 50 and batch_size = 72, which means that all the data are trained 50 times and the number of samples in each training batch is 72. For the LightGBM algorithm, the parameters are set by the GridSearchCV method; see Table 6. The stability of the proposed optimal parameter set is examined by a cross-validation experiment. The results show that the mean square error is 5.83 and the standard deviation is 0.99, which means that the model has good prediction ability and stability under this parameter set.
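The grid-search step can be sketched as follows. Here `GradientBoostingRegressor` stands in for LightGBM, and the parameter names and candidate values are placeholders for illustration, not the paper's tuned set from Table 6.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)

# Exhaustively evaluate every parameter combination with 5-fold
# cross-validation and keep the one with the best (least negative) MSE.
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100],
                "max_depth": [2, 3],
                "learning_rate": [0.05, 0.1]},
    cv=5, scoring="neg_mean_squared_error",
)
grid.fit(X, y)
print(grid.best_params_)   # winning parameter combination
print(-grid.best_score_)   # its cross-validated MSE
```

The cross-validated score reported by `grid.best_score_` plays the same role as the paper's stability check: the mean and spread of the fold scores indicate how reliable the chosen parameter set is.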
The prediction results of the GRU-LightGBM model are shown in Figure 9.

5. Discussion and Conclusions

5.1. Comparison with Other Prediction Methods

A variety of prediction methods exist in the literature; typical examples include ARIMA [50], Seasonal ARIMA [51], SVR [52], and LSTM [53]. Table 7 illustrates the advantages and disadvantages of these existing prediction methods.
This paper then compares the performance of the proposed GRU-LightGBM method and the existing methods; see Figure 10. The comparison shows that the GRU-LightGBM model proposed in this paper is significantly superior to the other compared methods.
The root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are also used to analyze the prediction performance [56] of the above methods; see Equations (9)–(11). RMSE reflects the degree of dispersion of the errors, MAE reflects the true error magnitude, and MAPE is a measure of relative error, with a value closer to 0 representing a better model. The RMSE, MAE, and MAPE evaluations of the different prediction methods are shown in Table 8.
RMSE = √( (1/m) Σ_{i=1}^{m} (y_i − ŷ_i)² ), (9)
MAE = (1/m) Σ_{i=1}^{m} |y_i − ŷ_i|, (10)
MAPE = (100%/m) Σ_{i=1}^{m} |(y_i − ŷ_i) / y_i|, (11)
where m represents the total size of the test set, y_i represents the true value, and ŷ_i represents the predicted value.
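Equations (9)–(11) translate directly into code. The small worked example below uses made-up values purely to check the formulas; they are not the paper's case-study numbers.

```python
import numpy as np

def rmse(y, y_hat):
    # Root mean square error: penalises large errors, reflects dispersion.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    # Mean absolute error: the average true error magnitude.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    # Mean absolute percentage error: relative error, in percent.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

y_true = [100.0, 120.0, 110.0, 130.0]   # made-up daily sales
y_pred = [ 98.0, 125.0, 108.0, 128.0]   # made-up predictions
print(rmse(y_true, y_pred))  # ≈ 3.04
print(mae(y_true, y_pred))   # 2.75
print(mape(y_true, y_pred))  # ≈ 2.38 (%)
```

Note that MAPE is undefined when a true value y_i is zero, which is why it suits sales series with consistently positive demand.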
The comparison results indicate that the GRU-LightGBM model has the highest prediction accuracy among the compared models. The RMSE and MAE values of the GRU-LightGBM model are 8.01 and 5.41, respectively, which are significantly lower than those of the other compared models; for example, the best single-model method is GRU, with an RMSE of 11.19 and an MAE of 8.76. GRU-LightGBM also has the lowest MAPE value, indicating that it is the closest to an ideal model. The MAPE value of GRU-LightGBM is 4.11%, which is 1.63 percentage points lower than that of the GRU model and 2.49 percentage points lower than that of the LightGBM model. This means that the stacking architecture of GRU-LightGBM helps to improve the overall prediction accuracy and stability. The above indicators support Hypothesis 1.

5.2. Contribution of the Work

In this work, a stacking model developed based on GRU and LightGBM is proposed for the sales prediction of short-shelf-life products in e-commerce. Compared with the existing prediction methods, the new combination model makes several contributions. Firstly, GRU-LightGBM inherits both the ability of the GRU model to capture timing features accurately and the ability of LightGBM to solve multivariable problems, and it performs well in the case applied in this paper. Secondly, GRU-LightGBM can forecast short-shelf-life product sales accurately and efficiently. The prediction results of the GRU-LightGBM model have the smallest RMSE, MAE, and MAPE values, which means that this method has the best prediction performance among all the compared methods. Moreover, the GRU-LightGBM model's training speed is close to that of a single deep learning model and is therefore acceptable. Finally, GRU-LightGBM delivers interpretability based on feature importance, which may further assist sales strategy development.

5.3. Limitations of the Work

One of the limitations is associated with interpretability. GRU-LightGBM offers a good result in prediction, but the interpretability of some features, e.g., network layer parameters, is weak. The reason is that the network layer features extracted from the GRU model are numerical relations for model computation, and, usually, they are physically meaningless.
The limited availability of comprehensive data poses another limitation. The integrity of the dataset holds significant importance in adopting GRU-LightGBM. In this research, a crucial factor contributing to the effective performance of GRU-LightGBM in predicting the sales of Product A is the completeness of its sales data, characterized by fewer discontinuous points. It is important to note that the success of GRU-LightGBM does not imply exclusivity to the Product A dataset; rather, it highlights the model’s reliance on high-quality data. The model demonstrates its efficacy when the data exhibit completeness and minimal disruptions, emphasizing the importance of maintaining such data quality standards for optimal performance across various datasets. In future work, authors and potential researchers could replicate and validate the proposed methods with more extensive and varied datasets.

5.4. Opportunities

The proposed GRU-LightGBM method is designed for the prediction of e-commerce sales, and its potential applicability extends to traditional retail shops. This is because, although the factors influencing sales are different, such as network hotspots affecting online sales and the store location influencing sales in retail shops, there exists a parallelism in the method’s application approach. The shared principle involves the identification of pertinent features influencing sales, coupled with the utilization of machine learning for predictive analysis. However, a more comprehensive and systematic examination is needed to fully explore the adaptability to traditional retail contexts, thereby opening opportunities for further research. In order to further explore the method’s efficacy in traditional retail contexts, researchers may need to consider the unique offline variables, such as in-store customer behavior, shelf positioning and the impact of local marketing initiatives. Additionally, understanding how the model copes with challenges specific to traditional retail, like seasonality, local economic conditions and the influence of physical advertisements, would provide valuable insights.

5.5. Conclusions

The implementation of effective sales prediction models is of significance for the e-commerce sector, offering retailers a valuable tool to formulate precise production and inventory control plans. This, in turn, leads to substantial reductions in inventory costs and mitigates overdue losses. This paper develops a systematic approach to e-commerce sales prediction, specifically addressing the challenges posed by products with a short shelf life. Recognizing the inefficiencies in existing sales prediction methods for short-shelf-life products in e-commerce, this study introduces a novel stacking model that integrates LightGBM and GRU, namely GRU-LightGBM. Compared to other prediction methods, such as SARIMA and SVM, this model delivers advantages in multiple aspects, including stability, accuracy and efficiency. The interpretability of GRU-LightGBM also contributes to the formulation of sales strategies. The paper underscores the applicability of the proposed model through a comprehensive case study. Although the application of GRU-LightGBM is associated with short-shelf-life products in e-commerce, the method may also be applicable to other time series predictions.

Author Contributions

Conceptualization, Y.C., X.X. and Z.J.; methodology, Y.C. and X.X.; software, X.X. and Z.P.; validation, X.X., Y.C. and Z.J.; resources, Y.C. and W.Z.; data curation, X.X. and W.Y.; writing—original draft preparation, Y.C., X.X., C.W. and Z.J.; review and editing, all authors; supervision, Y.C. and Z.J.; funding acquisition, Y.C. and Z.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Natural Science Foundation of Zhejiang Province, China, Grant No. LGG22G010002, and the National Natural Science Foundation of China, No. 72301244.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gammon, K. Experimenting with blockchain: Can one technology boost both data integrity and patients’ pocketbooks? Nat. Med. 2018, 24, 378–381. [Google Scholar] [CrossRef] [PubMed]
  2. Peng, X.; Li, X.; Yang, X. Analysis of circular economy of E-commerce market based on grey model under the background of big data. J. Enterp. Inf. Manag. 2022, 35, 1148–1167. [Google Scholar] [CrossRef]
  3. Zhang, X.; Guo, F.; Chen, T.; Pan, L.; Beliakov, G.; Wu, J. A Brief Survey of Machine Learning and Deep Learning Techniques for E-Commerce Research. J. Theor. Appl. Electron. Commer. Res. 2023, 18, 2188–2216. [Google Scholar] [CrossRef]
  4. Rasim; Junaeti, E.; Wirantika, R. Implementation of Automatic Clustering Algorithm and Fuzzy Time Series in Motorcycle Sales Forecasting. IOP Conf. Ser. Mater. Sci. Eng. 2018, 288, 012126. [Google Scholar] [CrossRef]
  5. Dinçoğlu, P.; Aygün, H. Comparison of Forecasting Algorithms on Retail Data. In Proceedings of the 2022 10th International Symposium on Digital Forensics and Security (ISDFS), Istanbul, Turkey, 6–7 June 2022. [Google Scholar]
  6. Hewage, H.C.; Perera, H.N. Comparing Statistical and Machine Learning Methods for Sales Forecasting During the Post-promotional Period. In Proceedings of the 2021 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 13–16 December 2021. [Google Scholar]
  7. Dairu, X.; Shilong, Z. Machine Learning Model for Sales Forecasting by Using XGBoost. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15–17 January 2021. [Google Scholar]
  8. Tang, Y.M.; Chau, K.Y.; Lau, Y.-Y.; Zheng, Z. Data-Intensive Inventory Forecasting with Artificial Intelligence Models for Cross-Border E-Commerce Service Automation. Appl. Sci. 2023, 13, 3051. [Google Scholar] [CrossRef]
  9. Lashgari, Y.S.; Shahab, S. The Impact of the COVID-19 Pandemic on Retail in City Centres. Sustainability 2022, 14, 11463. [Google Scholar] [CrossRef]
  10. Pongdatu, G.A.N.; Putra, Y.H. Seasonal Time Series Forecasting using SARIMA and Holt Winter’s Exponential Smoothing. IOP Conf. Ser. Mater. Sci. Eng. 2018, 407, 012153. [Google Scholar] [CrossRef]
  11. Sidqi, F.; Sumitra, I.D. Forecasting Product Selling Using Single Exponential Smoothing and Double Exponential Smoothing Methods. IOP Conf. Ser. Mater. Sci. Eng. 2019, 662, 032031. [Google Scholar] [CrossRef]
  12. Bergmeir, C.; Hyndman, R.J.; Benítez, J.M. Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation. Int. J. Forecast. 2016, 32, 303–312. [Google Scholar] [CrossRef]
  13. Contreras, J.; Espinola, R.; Nogales, F.; Conejo, A. ARIMA models to predict next-day electricity prices. IEEE Trans. Power Syst. 2003, 18, 1014–1020. [Google Scholar] [CrossRef]
  14. Wang, S.-J.; Huang, C.-T.; Wang, W.-L.; Chen, Y.-H. Incorporating ARIMA forecasting and service-level based replenishment in RFID-enabled supply chain. Int. J. Prod. Res. 2010, 48, 2655–2677. [Google Scholar] [CrossRef]
  15. Babai, M.; Ali, M.; Boylan, J.; Syntetos, A. Forecasting and inventory performance in a two-stage supply chain with ARIMA(0,1,1) demand: Theory and empirical analysis. Int. J. Prod. Econ. 2013, 143, 463–471. [Google Scholar] [CrossRef]
  16. Biswas, A.K.; Ahmed, S.I.; Bankefa, T.; Ranganathan, P.; Salehfar, H. Performance Analysis of Short and Mid-Term Wind Power Prediction using ARIMA and Hybrid Models. In Proceedings of the 2021 IEEE Power and Energy Conference at Illinois (PECI), Champaign, IL, USA, 1–2 April 2021. [Google Scholar]
  17. Bi, J.; Wang, Y.; Sun, S.; Guan, W. Predicting Charging Time of Battery Electric Vehicles Based on Regression and Time-Series Methods: A Case Study of Beijing. Energies 2018, 11, 1040. [Google Scholar] [CrossRef]
  18. Tsoumakas, G. A survey of machine learning techniques for food sales prediction. Artif. Intell. Rev. 2019, 52, 441–447. [Google Scholar] [CrossRef]
  19. Li, Q.; Yu, M. Achieving Sales Forecasting with Higher Accuracy and Efficiency: A New Model Based on Modified Transformer. J. Theor. Appl. Electron. Commer. Res. 2023, 18, 1990–2006. [Google Scholar] [CrossRef]
  20. Xia, Z.; Xue, S.; Wu, L.; Sun, J.; Chen, Y.; Zhang, R. ForeXGBoost: Passenger car sales prediction based on XGBoost. Distrib. Parallel Databases 2020, 38, 713–738. [Google Scholar] [CrossRef]
  21. Bi, X.; Adomavicius, G.; Li, W.; Qu, A. Improving Sales Forecasting Accuracy: A Tensor Factorization Approach with Demand Awareness. INFORMS J. Comput. 2022, 34, 1644–1660. [Google Scholar] [CrossRef]
  22. Hwang, S.; Yoon, G.; Baek, E.; Jeon, B.-K. A Sales Forecasting Model for New-Released and Short-Term Product: A Case Study of Mobile Phones. Electronics 2023, 12, 3256. [Google Scholar] [CrossRef]
  23. Chaudhuri, K.D.; Alkan, B. A hybrid extreme learning machine model with harris hawks optimisation algorithm: An optimised model for product demand forecasting applications. Appl. Intell. 2022, 52, 11489–11505. [Google Scholar] [CrossRef]
  24. Zhang, B.; Tseng, M.-L.; Qi, L.; Guo, Y.; Wang, C.-H. A comparative online sales forecasting analysis: Data mining techniques. Comput. Ind. Eng. 2023, 176, 108935. [Google Scholar] [CrossRef]
  25. Kuo, R.J.; Tseng, Y.S.; Chen, Z.-Y. Integration of fuzzy neural network and artificial immune system-based back-propagation neural network for sales forecasting using qualitative and quantitative data. J. Intell. Manuf. 2016, 27, 1191–1207. [Google Scholar] [CrossRef]
  26. Ma, S.; Fildes, R. Retail sales forecasting with meta-learning. Eur. J. Oper. Res. 2021, 288, 111–128. [Google Scholar] [CrossRef]
  27. Yu, Q.; Wang, K.; Strandhagen, J.O.; Wang, Y. Application of Long Short-Term Memory Neural Network to Sales Forecasting in Retail—A Case Study. In Advanced Manufacturing and Automation VII; Springer: Singapore, 2018. [Google Scholar]
  28. Saha, P.; Gudheniya, N.; Mitra, R.; Das, D.; Narayana, S.; Tiwari, M.K. Demand Forecasting of a Multinational Retail Company using Deep Learning Frameworks. IFAC-PapersOnLine 2022, 55, 395–399. [Google Scholar] [CrossRef]
  29. Zhu, B.; Dong, H.; Zhang, J. Car Sales Prediction Using Gated Recurrent Units Neural Networks with Reinforcement Learning. In Intelligence Science and Big Data Engineering. Big Data and Machine Learning; Springer International Publishing: Cham, Switzerland, 2019. [Google Scholar]
  30. Ou-Yang, C.; Chou, S.-C.; Juan, Y.-C. Improving the Forecasting Performance of Taiwan Car Sales Movement Direction Using Online Sentiment Data and CNN-LSTM Model. Appl. Sci. 2022, 12, 1550. [Google Scholar] [CrossRef]
  31. Choi, T.-M.; Hui, C.-L.; Liu, N.; Ng, S.-F.; Yu, Y. Fast fashion sales forecasting with limited data and time. Decis. Support Syst. 2014, 59, 84–92. [Google Scholar] [CrossRef]
  32. Khandelwal, I.; Adhikari, R.; Verma, G. Time Series Forecasting Using Hybrid ARIMA and ANN Models Based on DWT Decomposition. Procedia Comput. Sci. 2015, 48, 173–179. [Google Scholar] [CrossRef]
  33. Han, Y. A forecasting method of pharmaceutical sales based on ARIMA-LSTM model. In Proceedings of the 2020 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT), Shenyang, China, 13–15 November 2020. [Google Scholar]
  34. Dong, L.; Fang, D.; Wang, X.; Wei, W.; Damaševičius, R.; Scherer, R.; Woźniak, M. Prediction of Streamflow Based on Dynamic Sliding Window LSTM. Water 2020, 12, 3032. [Google Scholar] [CrossRef]
  35. He, H.; Gao, S.; Jin, T.; Sato, S.; Zhang, X. A seasonal-trend decomposition-based dendritic neuron model for financial time series prediction. Appl. Soft Comput. 2021, 108, 107488. [Google Scholar] [CrossRef]
  36. Mohsin, A.; Hongzhen, L.; Iqbal, M.M.; Salim, Z.R.; Hossain, A.; Al Kafy, A. Forecasting e-waste recovery scale driven by seasonal data characteristics: A decomposition-ensemble approach. Waste Manag. Res. 2021, 40, 870–881. [Google Scholar] [CrossRef]
  37. Carbo-Bustinza, N.; Iftikhar, H.; Belmonte, M.; Cabello-Torres, R.J.; De La Cruz, A.R.H.; López-Gonzales, J.L. Short-Term Forecasting of Ozone Concentration in Metropolitan Lima Using Hybrid Combinations of Time Series Models. Appl. Sci. 2023, 13, 10514. [Google Scholar] [CrossRef]
  38. Qin, L.; Li, W.; Li, S. Effective passenger flow forecasting using STL and ESN based on two improvement strategies. Neurocomputing 2019, 356, 244–256. [Google Scholar] [CrossRef]
  39. Lin, C.; Weng, K.; Lin, Y.; Zhang, T.; He, Q.; Su, Y. Time Series Prediction of Dam Deformation Using a Hybrid STL–CNN–GRU Model Based on Sparrow Search Algorithm Optimization. Appl. Sci. 2022, 12, 11951. [Google Scholar] [CrossRef]
  40. Yadav, N.S.; Sharma, V.P.; Reddy, D.S.D.; Mishra, S. An Effective Network Intrusion Detection System Using Recursive Feature Elimination Technique. Eng. Proc. 2023, 59, 99. [Google Scholar] [CrossRef]
  41. Lu, Y.; Fan, X.; Zhao, Z.; Jiang, X. Dynamic Fire Risk Classification Prediction of Stadiums: Multi-Dimensional Machine Learning Analysis Based on Intelligent Perception. Appl. Sci. 2022, 12, 6607. [Google Scholar] [CrossRef]
  42. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  43. Li, S.; Zou, Y.; Shi, Z.; Tian, J.; Li, W. Performance enhancement of CAP-VLC system utilizing GRU neural network based equalizer. Opt. Commun. 2023, 528, 129062. [Google Scholar] [CrossRef]
  44. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  45. Wei, J.; Li, Z.; Pinker, R.T.; Wang, J.; Sun, L.; Xue, W.; Li, R.; Cribb, M. Himawari-8-derived diurnal variations in ground-level PM2.5 pollution across China using the fast space-time Light Gradient Boosting Machine (LightGBM). Atmos. Chem. Phys. 2021, 21, 7863–7880. [Google Scholar] [CrossRef]
  46. Sun, X.; Liu, M.; Sima, Z. A novel cryptocurrency price trend forecasting model based on LightGBM. Financ. Res. Lett. 2020, 32, 101084. [Google Scholar] [CrossRef]
  47. Ju, Y.; Sun, G.; Chen, Q.; Zhang, M.; Zhu, H.; Rehman, M.U. A Model Combining Convolutional Neural Network and LightGBM Algorithm for Ultra-Short-Term Wind Power Forecasting. IEEE Access 2019, 7, 28309–28318. [Google Scholar] [CrossRef]
  48. Microsoft Corporation. Parameters: LightGBM 3.3.5.99 Documentation. 2023. Available online: https://lightgbm.readthedocs.io/en/latest/Parameters.html (accessed on 6 December 2023).
  49. Pavlyshenko, B. Using Stacking Approaches for Machine Learning Models. In Proceedings of the 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 21–25 August 2018. [Google Scholar]
  50. Gajewski, P.; Čule, B.; Rankovic, N. Unveiling the Power of ARIMA, Support Vector and Random Forest Regressors for the Future of the Dutch Employment Market. J. Theor. Appl. Electron. Commer. Res. 2023, 18, 1365–1403. [Google Scholar] [CrossRef]
  51. Zhang, W.; Lin, Z.; Liu, X. Short-term offshore wind power forecasting—A hybrid model based on Discrete Wavelet Transform (DWT), Seasonal Autoregressive Integrated Moving Average (SARIMA), and deep-learning-based Long Short-Term Memory (LSTM). Renew. Energy 2022, 185, 611–628. [Google Scholar] [CrossRef]
  52. Zhao, Z.; Wu, C. Wheat Quantity Monitoring Methods Based on Inventory Measurement and SVR Prediction Model. Appl. Sci. 2023, 13, 12745. [Google Scholar] [CrossRef]
  53. Wu, D.; Jia, Z.; Zhang, Y.; Wang, J. Predicting Temperature and Humidity in Roadway with Water Trickling Using Principal Component Analysis-Long Short-Term Memory-Genetic Algorithm Method. Appl. Sci. 2023, 13, 13343. [Google Scholar] [CrossRef]
  54. Wang, W.; Zhang, Z.; Wang, L.; Zhang, X.; Zhang, Z. Mixed-frequency data-driven forecasting the important economies’ performance in a smart city: A novel RUMIDAS-SVR model. Ind. Manag. Data Syst. 2022, 122, 2175–2198. [Google Scholar] [CrossRef]
  55. Shankar, S.; Ilavarasan, P.V.; Punia, S.; Singh, S.P. Forecasting container throughput with long short-term memory networks. Ind. Manag. Data Syst. 2020, 120, 425–441. [Google Scholar] [CrossRef]
  56. Karunasingha, D.S.K. Root mean square error or mean absolute error? Use their ratio as well. Inf. Sci. 2022, 585, 609–629. [Google Scholar] [CrossRef]
Figure 1. Prediction workflow.
Figure 2. GRU data transfer diagram of adjacent time steps (x_t is the input variable, y_t is the output variable, and h_t is the transfer variable at time t).
Figure 3. The computing process of a GRU unit.
Figure 4. LightGBM based on histogram (‘#’ denotes a combination of multiple items).
Figure 5. Structure of GRU-LightGBM.
Figure 6. Decomposition of sales time series (from top to bottom: original data, long-term trend, periodic trend and random disturbance).
Figure 7. RFECV method feature selection curve.
Figure 8. GRU network structure of the combined model.
Figure 9. Sales prediction output (the prediction output for December is shown in the orange square region).
Figure 10. Comparison of sales prediction results by different algorithms.
Table 1. The steps of applying GRU-LightGBM in sales prediction.

Step 1 – Raw data collection: Use marketing data tools or data crawlers to collect the sales volume data.
Step 2 – Data processing: (1) data cleaning and data transformation; (2) seasonal-trend decomposition based on loess (STL) for data fluctuation analysis; (3) RFECV to select key features.
Step 3 – Prediction using GRU-LightGBM: The prediction model is a stacking model built by integrating GRU and LightGBM. GRU extracts the trends and fluctuations in the time series data, while LightGBM handles the multivariable problem.
Step 4 – Prediction output analysis: Deliver prediction outputs and generate sales strategies for the organization.
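The stacking idea behind Step 3 can be sketched in a few lines of Python. This is an illustration only: `stage1` and `stage2` are hypothetical callables standing in for the trained GRU and LightGBM models, not the paper's implementation.

```python
def stacked_forecast(history, exog_features, stage1, stage2):
    """Two-stage stacking in the spirit of GRU-LightGBM: stage1 (the
    sequence model) maps the sales history to a base forecast, which is
    appended to the exogenous features consumed by stage2 (the tree
    model). Both stages are passed in as plain callables here."""
    base_forecast = stage1(history)
    return stage2(exog_features + [base_forecast])

# Toy stand-ins: stage1 averages the history, stage2 sums its inputs.
mean_model = lambda h: sum(h) / len(h)
sum_model = lambda feats: sum(feats)
prediction = stacked_forecast([2, 4], [1], mean_model, sum_model)  # 1 + 3 = 4
```

In the actual model, the second stage would consume the Table 5 feature set alongside the first stage's output rather than a single scalar.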
Table 2. Steps of the inner loop [38] (the (k + 1)th iteration of the loop is assumed).

Step 1 – Detrending: Remove the trend component of the kth iteration, T_t^(k), from Y_t, i.e., Y_t − T_t^(k).
Step 2 – Cycle-subseries smoothing: Each cycle subseries is smoothed via loess, and the smoothing results compose the temporary seasonal series C_t^(k+1).
Step 3 – Low-pass filtering of the cycle subseries: Low-pass filtering and loess regression are applied to the temporary seasonal series C_t^(k+1). The resulting series is denoted L_t^(k+1).
Step 4 – Deseasonalization: The seasonal component of the (k + 1)th iteration is calculated as S_t^(k+1) = C_t^(k+1) − L_t^(k+1).
Step 5 – Trend smoothing: To obtain the trend component of the (k + 1)th iteration, T_t^(k+1), the output of Step 4 is smoothed via loess.
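As an illustration of the detrending and cycle-subseries averaging in the inner loop, the following simplified sketch substitutes a centred moving average for the loess smoothers of the real STL algorithm; `decompose_additive` is a hypothetical helper, not part of the paper's code.

```python
def decompose_additive(y, period):
    """Simplified additive decomposition (trend + seasonal) echoing the
    STL inner loop: a centred moving average stands in for the loess
    smoothers of the real algorithm."""
    n, half = len(y), period // 2
    # Detrending: estimate the trend with a centred moving average.
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        trend[t] = sum(window) / len(window)
    # Cycle-subseries smoothing: average the detrended values found at
    # each position within the period.
    sums, counts = [0.0] * period, [0] * period
    for t in range(half, n - half):
        pos = t % period
        sums[pos] += y[t] - trend[t]
        counts[pos] += 1
    seasonal = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    # Centre the seasonal component so it sums to (approximately) zero.
    mean_s = sum(seasonal) / period
    seasonal = [s - mean_s for s in seasonal]
    return trend, seasonal
```

For a purely linear series the detrended values vanish, so the recovered seasonal component is zero at every cycle position, as expected.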
Table 3. The main parameters of LightGBM [48].

max_depth: The maximum depth of the tree model.
min_data_in_leaf: The minimum amount of data in one leaf.
feature_fraction: The fraction of features selected randomly in each iteration for each tree.
bagging_fraction: The fraction of data selected randomly for each iteration, without resampling.
learning_rate: The shrinkage rate. In dart mode, it also affects the normalization weights of dropped trees.
num_leaves: The maximum number of leaves in one tree.
Table 4. STL analysis output.

Long-term trend: The sales volume is increasing.
Periodic trend: There is annual periodicity, but it fluctuates greatly. The peaks always occur in June and November, and the trough always occurs in March.
Random disturbance: The sales peaks show significant random disturbances.
Overall: Significant fluctuations and uncertainty.
Table 5. Set of sales forecast features.

Product time-varying features: sales volume, active price, Baidu index.
Platform activity features: activity period, activity heat, activity stage, activity first day.
Advertising features: brand zone, through train, drill show, brand special show, Youku.
Content operation features: master live, brand live, Juhuasuan, Taobao short video, micro tao.
Periodic features: weeks, holidays, months.
Statistics features: average daily sales volume (3 days), average daily sales volume (7 days), average activity price (3 days), average activity price (7 days), average Baidu index (3 days), average Baidu index (7 days).
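The statistics features in Table 5 are trailing window averages of recent observations. A minimal plain-Python sketch (the helper name `rolling_mean` is ours, for illustration):

```python
def rolling_mean(values, window):
    """Trailing moving average used for 'statistics features' such as
    the average daily sales volume over the previous 3 or 7 days; early
    positions fall back to the average of whatever history exists."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i - lo + 1))
    return out
```

The same helper, applied with windows of 3 and 7 to daily sales, activity price, and the Baidu index, would generate all six statistics features listed above.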
Table 6. Optimal parameters of the LightGBM algorithm.

boosting: gbdt
application: regression
metric: mse
learning_rate: 0.05
n_estimators: 122
max_depth: 6
num_leaves: 18
min_data_in_leaf: 11
max_bin: 175
feature_fraction: 0.8
bagging_fraction: 0.6
bagging_freq: 20
reg_lambda: 0.001
reg_alpha: 0.01
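Collected into a Python dictionary of the form LightGBM's `train` API accepts, the tuned values in Table 6 would look as follows. Only the dictionary contents come from the table; the commented training call is indicative.

```python
# Tuned LightGBM parameters from Table 6, as a plain parameter dict.
params = {
    "boosting": "gbdt",
    "application": "regression",
    "metric": "mse",
    "learning_rate": 0.05,
    "n_estimators": 122,
    "max_depth": 6,
    "num_leaves": 18,
    "min_data_in_leaf": 11,
    "max_bin": 175,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.6,
    "bagging_freq": 20,
    "reg_lambda": 0.001,
    "reg_alpha": 0.01,
}

# Indicative usage (requires the lightgbm package and training data):
# import lightgbm as lgb
# booster = lgb.train(params, lgb.Dataset(X_train, label=y_train))
```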
Table 7. Comparison among different prediction methods.

ARIMA. Advantages: performs well in short-term prediction; only the prior data are needed. Disadvantages: cannot capture the nonlinear relationships in sales.
Seasonal ARIMA. Advantages: SARIMA is a variant of ARIMA that performs well on seasonally affected time series data. Disadvantages: still has shortcomings when dealing with time series with complex periodicity.
SVR. Advantages: can execute linear and nonlinear regressions with a high degree of accuracy by utilizing various kernel functions [54]. Disadvantages: poor interpretability when solving nonlinear problems; weak in handling large datasets.
LightGBM. Advantages: efficient and robust. Disadvantages: prediction accuracy is inferior to that of deep learning methods such as LSTM and GRU.
LSTM. Advantages: designed to tackle gradient vanishing and explosion in long-sequence computation and has the capacity for long-term memory [55]. Disadvantages: slow training speed.
GRU. Advantages: same as LSTM; GRU is a lightweight implementation of LSTM. Disadvantages: slow training speed.
Table 8. Prediction performance evaluation.

Model            RMSE    MAE    MAPE
ARIMA            20.77   16.27  8.41%
Seasonal ARIMA   13.35   10.44  7.48%
SVR              13.26   10.16  6.94%
LightGBM         12.11   10.21  6.60%
LSTM             11.81   9.03   5.87%
GRU              11.19   8.76   5.74%
GRU-LightGBM     8.01    5.41   4.11%
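The three error metrics in Table 8 are standard; as a reference, a plain-Python sketch of how each is computed from true and predicted sales:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors quadratically."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the errors."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no true
    value is zero."""
    n = len(y_true)
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / n
```

Lower values are better for all three, which is why the GRU-LightGBM row (RMSE 8.01, MAE 5.41, MAPE 4.11%) dominates the comparison.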

Share and Cite

Chen, Y.; Xie, X.; Pei, Z.; Yi, W.; Wang, C.; Zhang, W.; Ji, Z. Development of a Time Series E-Commerce Sales Prediction Method for Short-Shelf-Life Products Using GRU-LightGBM. Appl. Sci. 2024, 14, 866. https://doi.org/10.3390/app14020866
