*Article* **Dynamic Model Selection Based on Demand Pattern Classification in Retail Sales Forecasting**

**Erjiang E <sup>1,\*</sup>, Ming Yu <sup>2</sup>, Xin Tian <sup>3,4,\*</sup> and Ye Tao <sup>5</sup>**


**Abstract:** Many forecasting techniques have been applied to sales forecasts in the retail industry. However, no single prediction model is applicable to all cases. For demand forecasting of the same item, the differing results of prediction models often confuse retailers. For large retail companies with a wide variety of products, it is difficult to find a suitable prediction model for each item. This study proposes a dynamic model selection approach that combines individual selection and combination forecasts based on both the demand patterns and the out-of-sample performance of each item. Firstly, based on the two metrics of the squared coefficient of variation (CV<sup>2</sup>) and the average inter-demand interval (ADI), we divide the demand patterns of items into four types: smooth, intermittent, erratic, and lumpy. Secondly, we select nine classical forecasting methods from the M-Competitions to build a pool of models. Thirdly, we design two dynamic weighting strategies, namely DWS-A and DWS-B, to determine the final prediction. Finally, we verify the effectiveness of this approach using two large datasets from an offline retailer and an online retailer in China. The empirical results show that both strategies can effectively improve the accuracy of demand forecasting. The DWS-A method is suitable for items with intermittent and lumpy demand patterns, while the DWS-B method is suitable for items with smooth and erratic demand patterns.

**Keywords:** sales forecasting; demand pattern; dynamic weighting; model selection; retail

**MSC:** 62P30

#### **1. Introduction**

Retailers are under enormous pressure to grow their sales, profit, and market share [1]. Sales forecasts play a crucial role in the operation of the retail industry. Reliable sales forecasts can significantly enhance the quality of business strategy, reduce operating expenses, and improve customer satisfaction. However, sales forecasting is not an easy task, owing to the variety of factors affecting demand and supply. For example, numerous factors, including weather, promotions, and pricing, have an impact on product sales [2]. Thus, for retailers who supply a wide range of stock-keeping units (SKUs), accurately predicting the sales of each product is a complex task.

Currently, many forecasting techniques have been applied to sales forecasts in the retail industry, ranging from simple moving averages to sophisticated machine learning algorithms. The amount of data and computational complexity required by these models varies greatly. Many academics have attempted to assess and compare the effectiveness of various forecasting techniques, for example through the M-Competitions. However, some scholars have found that a model performing well in one setting may perform poorly in another [3,4]. No single prediction model is universally applicable in all cases [5]. Moreover, for demand forecasting of the same item, the differing results of prediction models often confuse retailers.

**Citation:** E, E.; Yu, M.; Tian, X.; Tao, Y. Dynamic Model Selection Based on Demand Pattern Classification in Retail Sales Forecasting. *Mathematics* **2022**, *10*, 3179. https://doi.org/10.3390/math10173179

Academic Editor: Gheorghe Săvoiu

Received: 4 July 2022; Accepted: 25 August 2022; Published: 3 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In practice, a key issue is how managers choose the right predictive model for each product from a wide variety of forecasting techniques. The sales volume and data length of each product vary widely. For example, Haolinju, a large chain of convenience stores in Beijing, stocks more than 5000 different items in its distribution center and has more than 800 stores. The best-selling products, such as Nongfu Spring Mineral Water, can sell tens of thousands of units a day. In addition, managers need to remove low-volume products from the shelves and launch new products to meet consumer demand. Some products are sold for a short period of time or have high volatility and skewness. Forecasters rank new product forecasts among the most complex forecasting tasks they encounter, as little or no historical data are available for reference [6,7].

This study proposes a dynamic model selection approach that combines individual selection and combination forecasts based on both the demand patterns and the out-of-sample performance of each item. Firstly, we selected nine classical forecasting methods from the M-Competitions to build a model pool. The M-Competitions aim to learn how to improve prediction accuracy and how to apply this learning to advance prediction theory and practice [8]. Secondly, based on the two indicators of the squared coefficient of variation (CV<sup>2</sup>) and the average inter-demand interval (ADI), we divided the demand patterns of items into four types: smooth, intermittent, erratic, and lumpy. For instance, the smooth pattern is characterized by low CV<sup>2</sup> and short ADI, while the intermittent pattern is characterized by low CV<sup>2</sup> and long ADI. The erratic pattern is characterized by high CV<sup>2</sup> and short ADI, while the lumpy pattern is characterized by high CV<sup>2</sup> and long ADI. Thirdly, we designed two dynamic weighting strategies, namely DWS-A and DWS-B, to determine the final prediction. Finally, we demonstrated the effectiveness of this approach using two large datasets from a large offline retailer (Haolinju) and a large online retailer (JD) in China. We implemented multi-round rolling forecasts with different horizons. The results show that the proposed dynamic weighting strategies outperformed the benchmark and winning prediction models of the M-Competitions, including Naïve, Comb S-H-D, and the simple combination of univariate models (SCUM). Further, we investigated the optimal weighting strategy for each demand pattern. The analysis suggests that the DWS-A method is applicable to items with intermittent and lumpy patterns, and the DWS-B method is applicable to items with smooth and erratic patterns.

The rest of the paper is organized as follows. Section 2 presents a literature review of the forecasting methods and model selection. Section 3 describes the methodology of sales forecasting. Section 4 presents the results of a sales forecasting system for two real-world problems. Lastly, Section 5 provides a summary of the results and concludes the study.

#### **2. Literature Review**

#### *2.1. Demand Forecasting Method in Retailing*

Over the past few decades, many researchers have proposed new prediction models or revised existing models based on application requirements. Traditional quantitative prediction methods include time series models, econometric models, and machine learning. At present, scholars pay increasing attention to hybrid and combined models that integrate two or more models.

#### 2.1.1. Individual Methods

(a) Time series method. Some prediction methods, such as Naïve, seasonal Naïve, and moving averages, are very simple and effective [9]. These methods are often used as benchmarks for new demand forecasting methods. However, the performance of the Naïve model degrades in long-term predictions or when predicting series with structural breaks. Exponential smoothing is a simple and practical point prediction method in which predictions are constructed from exponentially weighted averages of past observations. Simple exponential smoothing is suitable for forecasts without significant trends or seasonal patterns. In contrast, double exponential smoothing models, such as Brown's DES and Holt's DES, were developed to deal with time series with linear trends [10,11]. Holt–Winters' model was developed to handle time series with trends or seasonal patterns [11,12], whereas the ARMA model, proposed by Box and Jenkins in 1976 [13], is one of the most widely used for predicting various time series. For instance, Ali et al. [14] found that simple time series techniques perform very well for periods without promotions.

(b) Econometric model. An econometric model is a useful tool for economic forecasting and causality analysis. As a typical example of econometric models, the traditional regression method can be used to analyze the causal relationship between product sales and the factors affecting it [15]. For example, Divakar et al. [16] proposed a sales forecasting model by using a dynamic regression model to capture the effects of such variables as past sales, trend, temperature, significant holidays, etc.

(c) Machine learning method. The artificial neural network (ANN) models are widely used in retail sales forecasts. Kong and Martin [17] found that the backpropagation neural network (BPN) is a useful tool to generate sales forecasts and outperform statistical methods. Meanwhile, Lee et al. [18] used the BPN method to establish a convenience store sales forecasting model. Furthermore, Chen and Ou [19] proposed a model that integrates grey correlation analysis and a multi-layer functional link network to predict the actual sales data in the retail industry.

#### 2.1.2. Hybrid Methods

No general predictive model is applicable to all types of problems. Some researchers have argued that hybrid models, which integrate two or more models with different capabilities, are more accurate than a single specific model with limited capabilities [2]. Aburto and Weber [20] proposed a hybrid system combining ARIMA and neural networks to predict the daily demand of a Chilean supermarket. They showed an increase in predictive accuracy and proposed a replenishment system that reduces sales failures and inventory levels compared with previous solutions. Meanwhile, Arunraj and Ahrens [2] developed a seasonal autoregressive integrated moving average model with external variables to predict the daily sales of bananas in a retail store in Germany. Furthermore, Liu et al. [21] combined time series and hidden Markov models to improve the reliability of the prediction. Rubio and Alba [22] proposed a hybrid model combining ARIMA and a support vector machine to predict Colombian shares. Wang et al. [23] proposed an error compensation mechanism to let users correct the model in practice and designed a hybrid LSTM-ARMA model for demand forecasting.

#### 2.1.3. Combination Methods

Combination forecasting refers to averaging the forecasts of component methods to reduce forecast error [24]. Makridakis and Hibon [25] proposed a combination method in the M3 competition, namely Comb S-H-D. This method is a simple arithmetic mean of single exponential smoothing (SES), Holt exponential smoothing, and damped-trend exponential smoothing, and it is more accurate than any of the three component methods. Makridakis, Spiliotis, and Assimakopoulos [8] found that of the 17 most accurate methods in the M4 competition, 12 are 'combinations' of statistical methods. Meanwhile, Aye et al. [26] found that combined forecasting models perform better in forecasting aggregate retail sales than single models and are not affected by the business cycle or time horizon.

#### *2.2. Model Selection*

The existing literature indicates that the performance of forecasting models largely depends on the choice of error measures, the models used for comparison, the forecasting horizon, and the type of data. Zhang [27] argued that no single prediction model is applicable to all cases. For instance, Aburto and Weber [20] found that neural networks are superior to ARIMA models, and their proposed additive hybrid approach yielded the best results. Lee, Chen, Chen, Chen, and Liu [18] found that logistic regression performed better than the BPN and moving averages, and Kuo [28] found that the fuzzy neural network performs better than conventional statistical methods. Thus, which forecasting techniques should retailers choose when facing complex environments in production operations and management?

Since no single model always outperforms other candidate models in all cases, it is necessary to find a model selection method for any given SKU or item. Recently, some scholars have paid more attention to the topic of forecasting model selection. Table 1 shows typical papers that have investigated forecasting model selection and presents the contribution of our study to the literature. The strategies for selecting the best prediction model according to the historical performance of candidate models can be classified into three types: individual selection, aggregate selection, and combination forecasts. Individual selection refers to finding the most suitable prediction model for each SKU or item. In contrast, aggregate selection refers to using a single forecasting model for all SKUs or items [29]. Combination forecasting combines a set of forecasting models by building a weight coefficient vector. Individual model selection is more effective than most aggregate selection methods, but it has the disadvantage of higher complexity and computational cost [30]. In the individual selection procedure, information criteria (such as the Akaike information criterion), time series features, in-sample performance, and out-of-sample performance are usually used as model selection criteria. For instance, Villegas et al. [31] proposed a model selection method that combines information criteria and in-sample performance using a support vector machine. Taghiyeh, Lengacher, and Handfield [30] developed an approach that combines both in-sample and out-of-sample performance. Ulrich, Jahnke, Langrock, Pesch, and Senge [4] treated model selection as a classification problem and proposed a model selection framework via classification based on labeled training data. Combining different models is another effective way to improve the performance of prediction [27]. However, Claeskens, Magnus, Vasnev, and Wang [3] showed that simple weighting schemes, such as the arithmetic mean, usually produce equally good or better predictions than more complex weighting schemes.

**Table 1.** Review of published literature for forecasting model selection.


The contribution of our study is to determine the corresponding model selection strategies that combine individual selection and combination forecasts based on both the demand patterns and the out-of-sample performance for each item. Further, we selected the benchmark and winning models in M-Competitions as the candidate models.

#### **3. Methodology**

In this section, we design an automatic forecasting system to address model selection for sales forecasting in the retail industry. Figure 1 shows the flowchart of the system framework, which comprises four steps: data input and pre-processing, construction of the model pool and forecasting, classification of demand patterns and model selection, and final prediction output and database update.

**Figure 1.** The system framework flowchart.

#### *3.1. Design of Forecasting Model Pool*

According to Figure 1, the sales characteristics of different items vary greatly. Moreover, the sales characteristics of each item also change over time. Therefore, no single forecasting method can maintain an advantage in demand forecasting for all items. In this study, the idea of dynamic optimization is introduced into the forecasting task. Firstly, a model pool composed of multiple prediction methods was constructed. Secondly, a vector of dynamic weight coefficients was determined based on the performance of the prediction methods in practice. Finally, the prediction for each item was determined according to the item's demand pattern and weight vector.

Based on M-Competitions, this study selected the nine most popular forecasting models to build a pool of models for predicting sales of retail products.

Sub-Model 1: Naïve. The value of the last sales is simply used for all forecasts.

Sub-Model 2: Seasonal Naïve. Considering the sales characteristics of retail products, the model uses daily sales for the previous week as the forecast for the same day of the week.

Sub-Model 3: Single exponential smoothing (SES). The SES model forms a weighted sum of the previous forecast and the actual value of historical sales via the smoothing coefficient.

Sub-Model 4: Holt's linear exponential smoothing. The Holt model considers the linear trend of the sequence on the basis of the SES model [11].

Sub-Model 5: Dampened trend exponential smoothing. The damped model considers the damping trend on the basis of the Holt model [32].

Sub-Model 6: Comb S-H-D. The 'Comb S-H-D' method is the simple arithmetic average of Models 3, 4, and 5. The Comb S-H-D model is more accurate than the three individual methods in the M3 competition [25].

Sub-Model 7: Theta. The theta model decomposes the time series into two or more curves, which are combined by theta coefficients [33].

Sub-Model 8: 4Theta. The 4Theta model accounts for nonlinear trend patterns and the strength of trend adjustment on the basis of the theta model, and introduces a multiplicative term into the model [34].

Sub-Model 9: Simple combination of univariate models (SCUM). The SCUM model combines four methods: exponential smoothing, complex exponential smoothing, ARIMA, and dynamic optimized theta, and takes the median of the predicted values of the four models as the final prediction [35]. The SCUM model outperformed most models and improved on the benchmark model by 5.6% in the M4 competition [8].
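To make the model pool concrete, the simpler sub-models can be sketched in Python as follows. This is a minimal illustration, not the paper's R implementation: the smoothing parameters are illustrative defaults rather than fitted values, and the Theta, 4Theta, and SCUM models are omitted for brevity.

```python
import statistics

def naive(history):
    # Sub-Model 1: repeat the last observed sale.
    return history[-1]

def seasonal_naive(history, season=7):
    # Sub-Model 2: use the sale from the same day of the previous week.
    return history[-season]

def ses(history, alpha=0.3):
    # Sub-Model 3: single exponential smoothing.
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def holt(history, alpha=0.3, beta=0.1):
    # Sub-Model 4: Holt's linear trend method.
    level, trend = history[0], history[1] - history[0]
    for y in history[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend

def damped(history, alpha=0.3, beta=0.1, phi=0.9):
    # Sub-Model 5: damped-trend exponential smoothing.
    level, trend = history[0], history[1] - history[0]
    for y in history[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    return level + phi * trend

def comb_shd(history):
    # Sub-Model 6: arithmetic mean of the SES, Holt, and damped forecasts.
    return statistics.mean([ses(history), holt(history), damped(history)])
```

In practice, the smoothing parameters of each sub-model would be estimated per series rather than fixed as here.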

#### *3.2. Demand Pattern Classification*

For retail stores, the sales characteristics of different items vary greatly. The coefficient of variation is an effective index for measuring the volatility of an item's demand, defined as the ratio of the standard deviation to the mean demand. The squared coefficient of variation (CV<sup>2</sup>) of the demand sizes is given by:

$$\text{CV}^2 = \frac{\sigma^2}{\mu^2} \tag{1}$$

The demand for some products may be zero in some time periods. The average inter-demand interval (ADI) is another important indicator for describing the demand characteristics of items. The ADI is calculated as follows:

$$\text{ADI} = \frac{Z}{I} \tag{2}$$

where *Z* is the number of zero-demand periods and *I* is the number of inter-demand intervals. For example, if the daily demand of an item is [3, 0, 2, 0, 0, 3, 0, 1, 0, 4], then the average inter-demand interval is 5/4.

Based on a series' average inter-demand interval and the squared coefficient of variation of the demand sizes, Syntetos et al. [36] proposed a rule to classify demand patterns into four categories: smooth (CV<sup>2</sup> < 0.49 and ADI < 1.32), intermittent (CV<sup>2</sup> < 0.49 and ADI ≥ 1.32), erratic (CV<sup>2</sup> ≥ 0.49 and ADI < 1.32), and lumpy (CV<sup>2</sup> ≥ 0.49 and ADI ≥ 1.32).
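A minimal Python sketch of this classification, following Equations (1) and (2) and the worked example above. Note that Equation (2)'s ADI (the number of zero-demand periods divided by the number of inter-demand intervals) differs from the more common definition of ADI as the average gap between nonzero demands; the code below follows the paper's definition, with CV<sup>2</sup> computed on the nonzero demand sizes as in Table 2.

```python
def classify_demand_pattern(demand):
    """Classify a demand series as smooth/intermittent/erratic/lumpy
    per Syntetos et al., using the paper's CV^2 and ADI definitions."""
    sizes = [d for d in demand if d > 0]          # nonzero demand sizes
    mean = sum(sizes) / len(sizes)
    var = sum((s - mean) ** 2 for s in sizes) / len(sizes)
    cv2 = var / mean ** 2                         # Equation (1)
    zeros = sum(1 for d in demand if d == 0)      # Z: zero-demand periods
    intervals = len(sizes) - 1                    # I: gaps between nonzero demands
    adi = zeros / intervals if intervals else float("inf")   # Equation (2)
    if cv2 < 0.49:
        return "smooth" if adi < 1.32 else "intermittent"
    return "erratic" if adi < 1.32 else "lumpy"
```

For the example series [3, 0, 2, 0, 0, 3, 0, 1, 0, 4], this yields ADI = 5/4 and a low CV<sup>2</sup>, so the item is classified as smooth.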

Figure 2 shows an example of four items sold on JD, a large B2C online retailer in China. According to CV<sup>2</sup> and ADI, these four items represent the sales characteristics of the four demand patterns, respectively. The smooth pattern is characterized by relatively stable demand volatility and a short average inter-demand interval. The lumpy pattern is characterized by high demand volatility and a long average inter-demand interval. Obviously, demand prediction for the lumpy pattern will be more difficult than for the smooth pattern.

**Figure 2.** Typical sales characteristics of retail products of the four demand patterns.

#### *3.3. Design of Dynamic Weighting Strategy*

Suppose that the model pool *M* has *m* sub-models. Sub-model *i* predicts the demand $\hat{y}\_{i,T+1}$ at *T* + 1 based on the historical observations $y = \{y\_1, \dots, y\_T\}$:

$$\hat{y}\_{i,T+1} = f\_i(y\_1, \dots, y\_T), \ i \in M \tag{3}$$

Let **w** = [*w*<sub>1</sub>, ... , *w<sub>m</sub>*] denote a weight vector. The objective of the ensemble model is to determine the weight coefficient of each sub-model (*w<sub>i</sub>*) and to obtain the final prediction value as the weighted sum of the sub-model outputs.

$$\hat{Y}\_{T+1} = \sum\_{i=1}^{m} w\_{i,T+1} \hat{y}\_{i,T+1}, \ w\_{i,T+1} \in [0, 1], \ \sum\_{i=1}^{m} w\_{i,T+1} = 1 \tag{4}$$

The weight coefficient will change with the performance of the model in multi-round rolling prediction. Let $e\_{i,t}$ denote the error metric of model *i* at time *t*, such as the root mean square error or the symmetric mean absolute percentage error, and let $E\_{i,k}$ represent the performance of model *i* over a period of time:

$$E\_{i,k} = \frac{1}{k} \sum\_{t=T+1}^{T+k} e\_{i,t}, \ t = \{T+1, T+2, \dots, T+k\} \tag{5}$$

Based on the performance of the sub-models in reality, this study proposes two dynamic weighting strategies.

Dynamic weighting strategy A (DWS-A): The final predictions of DWS-A are the forecasts of that model, which outperforms other models on historical data. The weight coefficient of sub-model *i* under the DWS-A is as follows:

$$w\_{i,k}^A = \begin{cases} \ 1, \text{if } E\_{i,k} = \min\{E\_k\}, \\ \ 0, \text{otherwise.} \end{cases} \tag{6}$$

where $E\_k = \{E\_{1,k}, \dots, E\_{i,k}, \dots, E\_{m,k}\}$ is the set of error metrics of all sub-models.

Dynamic weighting strategy B (DWS-B): The final predictions of DWS-B utilize all the sub-models, which are weighted according to their performance on historical data. The weight coefficient of sub-model *i* under DWS-B is given by the formula:

$$w\_{i,k}^B = \frac{\max\{E\_k\} - E\_{i,k}}{\sum\_{i=1}^m \left(\max\{E\_k\} - E\_{i,k}\right)}\tag{7}$$

In the real world, the values of the error metrics change dynamically as the models roll forward. Thus, the weight coefficients ($w^A\_{i,k}$ and $w^B\_{i,k}$) also change with *k*. The final predictions of DWS-A and DWS-B at *T* + *k* + 1 are given by the formula:

$$\hat{Y}\_{T+k+1}^{j} = \sum\_{i=1}^{m} w\_{i,k}^{j} \hat{y}\_{i,T+k+1}, j \in \{A, B\} \tag{8}$$
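Equations (6)–(8) can be sketched in Python as follows. This is a minimal illustration; the uniform fallback when all sub-models have identical errors is our assumption, since Equation (7) is undefined in that case.

```python
def dws_a_weights(E):
    # Equation (6): all weight on the sub-model with the lowest error metric E_{i,k}.
    best = min(range(len(E)), key=lambda i: E[i])
    return [1.0 if i == best else 0.0 for i in range(len(E))]

def dws_b_weights(E):
    # Equation (7): weight proportional to max(E_k) - E_{i,k}, normalized to sum to 1.
    worst = max(E)
    diffs = [worst - e for e in E]
    total = sum(diffs)
    if total == 0:  # all sub-models tied: fall back to equal weights (our assumption)
        return [1.0 / len(E)] * len(E)
    return [d / total for d in diffs]

def combine_forecasts(weights, forecasts):
    # Equation (8): final prediction as the weighted sum of sub-model forecasts.
    return sum(w * f for w, f in zip(weights, forecasts))
```

For example, with error metrics E = [0.2, 0.4, 0.6], DWS-A assigns all weight to the first sub-model, while DWS-B assigns weights 2/3, 1/3, and 0, so the worst sub-model is excluded under both strategies.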

#### *3.4. Model Evaluation*

Cross-validation is a primary method of measuring the predictive performance of a model. In this study, symmetric mean absolute percentage error (sMAPE), mean absolute scaled error (MASE), and overall weighted average (OWA) were used to evaluate the performance of the forecasting methods [8,25,37]. The sMAPE is defined as:

$$\text{sMAPE} = \frac{1}{h} \sum\_{t=1}^{h} \frac{2|y\_t - \hat{y}\_t|}{|y\_t| + |\hat{y}\_t|} \tag{9}$$

where $y\_t$ is the real sales at point *t*, $\hat{y}\_t$ is the forecast sales, and *h* is the forecasting horizon. Items with intermittent demand and lumpy demand are very common in retailing. The problem of large errors when the actual values $y\_t$ are close to zero can be avoided by using the symmetric MAPE.

The MASE is defined as:

$$\text{MASE} = \frac{1}{h} \cdot \frac{\sum\_{t=1}^{h} |y\_t - \hat{y}\_t|}{\frac{1}{n-r} \sum\_{t=r+1}^{n} |y\_t - y\_{t-r}|} \tag{10}$$

where *r* is the frequency of the data and *n* is the number of historical observations. The MASE is a scale-free error metric. It never yields undefined or infinite values and therefore is a good choice for intermittent demand and lumpy demand.

The OWA is computed by averaging the relative MASE and the relative sMAPE for all samples. The OWA is defined as:

$$\text{OWA}\_{i} = \frac{1}{2} \left( \frac{\sum\_{j=1}^{s} \text{sMAPE}\_{i,j}}{\sum\_{j=1}^{s} \text{sMAPE}\_{1,j}} + \frac{\sum\_{j=1}^{s} \text{MASE}\_{i,j}}{\sum\_{j=1}^{s} \text{MASE}\_{1,j}} \right), \ i \in M \tag{11}$$

where OWA<sub>*i*</sub> is the OWA of method *i*, *s* is the number of series, and sMAPE<sub>1</sub> and MASE<sub>1</sub> are the performance measures of the Naïve. The OWA is an effective metric for comparing the performance of proposed models against the benchmark model. An OWA lower than 1 means that the proposed model outperformed the benchmark model, and vice versa.
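The three metrics can be sketched in Python as follows. The handling of periods where both the actual and the forecast are zero, and the single-series OWA signature, are our assumptions; the paper aggregates OWA across all series as in Equation (11).

```python
def smape(actual, forecast):
    # Equation (9): symmetric MAPE over the forecasting horizon h.
    h = len(actual)
    return sum(2 * abs(y - f) / (abs(y) + abs(f)) if (y or f) else 0.0
               for y, f in zip(actual, forecast)) / h

def mase(actual, forecast, history, r=1):
    # Equation (10): MAE scaled by the in-sample naive MAE at frequency r.
    n = len(history)
    scale = sum(abs(history[t] - history[t - r]) for t in range(r, n)) / (n - r)
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual) / scale

def owa(smape_i, mase_i, smape_naive, mase_naive):
    # Equation (11) for one series: average of relative sMAPE and relative MASE.
    return 0.5 * (smape_i / smape_naive + mase_i / mase_naive)
```

For actuals [3, 0, 2] and forecasts [2, 1, 2], the sMAPE is (0.4 + 2.0 + 0.0)/3 = 0.8, illustrating how zero actuals dominate the symmetric error for intermittent series.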

#### **4. Empirical Analysis**

#### *4.1. Empirical Data*

We demonstrate the applicability of the sales forecasting methods using two real-world problems. The first dataset was taken from Haolinju, a large convenience store chain in Beijing, China. Haolinju has more than 800 stores and typically stocks more than 5000 different items in its distribution center. Haolinju's sales data range from 9 July 2016 to 8 July 2018 and contain 5383 items of different categories and various time horizons. The second dataset was taken from JD, also known as Jingdong, a large B2C online retailer in China. JD's sales data range from 1 January 2016 to 31 December 2017 and contain 1000 items of different categories and various time horizons. It should be noted that JD's data for June and November are excluded due to promotional activities. Since some forecasting methods require historical data for training, we removed items with fewer than 40 days of sales records. This left 4027 items in Haolinju's data and 936 items in JD's data that met the requirements.

According to the indicators of CV<sup>2</sup> and ADI, those items in both retailers were divided into four demand patterns: smooth, intermittent, erratic, and lumpy. Table 2 shows the detailed descriptive statistics of CV<sup>2</sup> and ADI on those four demand patterns. There were 1336 (33.2%) items in Haolinju's data and 34 (3.6%) items in JD's data that met the smooth demand pattern. The CV<sup>2</sup> of nonzero demand of Haolinju and JD was 0.223 and 0.376, respectively, and the ADI of both retailers was 0.096 and 0.607, respectively. There were 1211 (30.1%) items in Haolinju's data and 700 (74.8%) items in JD's data that met the lumpy demand pattern. The CV<sup>2</sup> of nonzero demand of Haolinju and JD was 1.586 and 3.408, respectively, and the ADI of Haolinju and JD was 7.096 and 3.862, respectively. In the lumpy pattern, the sales volatility of Haolinju was less than JD, but the former had a longer demand interval.


**Table 2.** Characteristics of the sales of the offline retailer (Haolinju) and the online retailer (JD).

ADI: average inter-demand interval; CV: coefficient of variation. Means (standard deviations) are presented in the table.

#### *4.2. Empirical Results*

Based on the two datasets drawn from an offline retailer and an online retailer, we examined the performance of the two dynamic weighting strategies by comparing them with benchmark models such as Naïve, Comb S-H-D, and SCUM. We implemented multi-round rolling forecasts with different horizons. The last 10 days of each series were used to test the performance of the models. We conducted an experiment with ten rounds and a horizon of one for short-term forecasting, and an experiment with four rounds and a horizon of seven for long-term forecasting. For example, suppose an item has 40 days of sales data and the forecasting horizon is set to 1. Before starting the forecasting system, we used the sales data from day 1 to day 29 to train the sub-models and forecast the demand on day 30. In round 1, based on the performance of each sub-model on day 30, the sales data of the first 30 days were used to predict the demand on day 31. In round 10, the sales data of the first 39 days were used to forecast the demand on day 40. In this study, we measured the performance of the proposed methods on Windows 10 with an Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz and 8.00 GB RAM. The forecasting process was performed using R version 4.0.5. The performance of the two dynamic weighting strategies in the four demand patterns was analyzed in turn.
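The rolling protocol described above can be sketched as follows. This is a toy illustration with two stand-in sub-models, a horizon of one, and DWS-A-style selection on the mean absolute error; it is not the paper's nine-model R implementation.

```python
def rolling_forecast(series, n_test=10):
    # Two toy sub-models standing in for the nine-model pool.
    models = {
        "naive": lambda h: h[-1],
        "mean":  lambda h: sum(h) / len(h),
    }
    errors = {name: [] for name in models}   # rolling error history e_{i,t}
    finals = []
    for t in range(len(series) - n_test, len(series)):
        history, actual = series[:t], series[t]
        preds = {name: f(history) for name, f in models.items()}
        if any(errors[n] for n in models):
            # DWS-A: select the sub-model with the lowest average past error.
            best = min(models, key=lambda n: sum(errors[n]) / len(errors[n]))
        else:
            best = "naive"                   # no error history yet: benchmark fallback
        finals.append(preds[best])
        for name in models:                  # update E_{i,k} with this round's error
            errors[name].append(abs(actual - preds[name]))
    return finals
```

On an upward-trending series, the loop quickly learns that the naive sub-model has the lower rolling error and keeps selecting it in subsequent rounds.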

#### 4.2.1. Smooth Pattern

The forecast accuracy comparisons for different methods with different forecasting horizons in the smooth pattern are shown in Table 3. The Comb S-H-D outperformed the other eight methods in the model pool for Haolinju's data when the horizon was equal to one. In the remaining three datasets, the SCUM outperformed the other eight sub-models. Surprisingly, the DWS-B outperformed all sub-models on all datasets, and the DWS-A performed better than all sub-models for Haolinju's data and JD's data when the horizon was equal to seven. For instance, for Haolinju's data when the horizon was equal to seven, the sMAPE of Naïve was 22.114%, that of Comb S-H-D was 18.749%, that of SCUM was 18.947%, while for DWS-A and DWS-B they were 17.588% and 17.387%, respectively. We also calculated the improvement in OWA of the Comb S-H-D, the SCUM, and the two proposed dynamic weighting strategies over the Naïve. According to OWA, for Haolinju's data when the horizon was equal to seven, the DWS-B was 18.4% more accurate than the Naïve and 5.23% more than the SCUM. In general, the forecast results in the smooth pattern indicate that the proposed DWS-B performed better than the DWS-A method and the other three benchmark models.

#### 4.2.2. Intermittent Pattern

The pattern of intermittent demand is characterized by a long average inter-demand interval and a low coefficient of variation. The results in Table 4 show that the Naïve model was more accurate than all other sub-models for Haolinju's data and JD's data when the horizon was equal to one. This means that forecasting intermittent demand is not an easy task. However, the DWS-A outperformed all sub-models for Haolinju's data and JD's data. For instance, for Haolinju's data when the horizon was equal to seven, the sMAPE of Naïve was 82.682%, that of Comb S-H-D was 124.918%, and that of SCUM was 126.654%, while for DWS-A it was 75.052%. According to OWA for Haolinju's data when the horizon was equal to seven, the DWS-A was 11.1% more accurate than the Naïve and 7% more than the best sub-model, sNaïve.


**Table 3.** The performance of the five methods for rolling forecast testing in the smooth pattern.

<sup>a</sup> The Comb S-H-D outperformed the other sub-models in this dataset. <sup>b</sup> The SCUM outperformed the other sub-models in these datasets.


**Table 4.** The performance of the five methods for rolling forecast testing in the intermittent pattern.

<sup>a</sup> The Naïve outperformed the other sub-models in these datasets. <sup>b</sup> The sNaïve outperformed the other sub-models in this dataset (OWA = 0.956). <sup>c</sup> The 4Theta outperformed the other sub-models in this dataset (OWA = 0.940).

#### 4.2.3. Erratic Pattern

The pattern of erratic demand is characterized by a short average inter-demand interval and a high coefficient of variation. The results in Table 5 show that the DWS-B outperformed all sub-models for Haolinju's data and JD's data. For example, for Haolinju's data when the horizon was equal to one, the sMAPE of Naïve was 32.220%, that of Comb S-H-D was 31.539%, and that of SCUM was 29.752%, while for DWS-B it was 28.731%. According to OWA for Haolinju's data, when the horizon was equal to one, the DWS-B was 8.5% more accurate than the Naïve and 4.29% more than the best sub-model, SCUM.


**Table 5.** The performance of the five methods for rolling forecast testing in the erratic pattern.

<sup>a</sup> The SCUM outperformed the other sub-models in these datasets. <sup>b</sup> The Naïve outperformed the other sub-models in this dataset.

#### 4.2.4. Lumpy Pattern

The pattern of lumpy demand, which is characterized by a long average inter-demand interval and a high coefficient of variation, is a common phenomenon in online and offline retail. The results in Table 6 show that the DWS-A was more accurate than all sub-models for Haolinju's data and JD's data. For example, for JD's data when the horizon was equal to seven, the MASE of Naïve was 75.817%, that of Comb S-H-D was 74.057%, and that of SCUM was 74.092%, while for DWS-A it was 64.560%. According to OWA for JD's data, when the horizon was equal to seven, the DWS-A was 14.7% more accurate than the Naïve and 8.38% more than the best sub-model, SCUM.


**Table 6.** The performance of the five methods for rolling forecast testing in the lumpy pattern.

<sup>a</sup> The Naïve outperformed the other sub-models in these datasets. <sup>b</sup> The SCUM outperformed the other sub-models in these datasets.

#### *4.3. Optimal Dynamic Weighting Strategy for Each Demand Pattern*

Based on the empirical results, as shown in Figure 3, we can determine an optimal dynamic weighting strategy for each demand pattern. For items with a smooth or erratic pattern, we recommend the DWS-B method for producing the final prediction; for items with an intermittent or lumpy pattern, we recommend the DWS-A method. This means that for items with intermittent or lumpy patterns, retailers only need to take the output of the optimal sub-model as the final prediction.
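The classification by ADI and CV2 and the resulting pattern-to-strategy mapping can be sketched as follows. Note the ADI = 1.32 and CV2 = 0.49 cut-offs are the widely cited Syntetos-Boylan values and are an assumption here, since the paper's exact thresholds are not restated in this section:

```python
def classify_demand(series):
    """Classify a demand series by its average inter-demand interval
    (ADI) and the squared coefficient of variation (CV^2) of demand sizes.

    Assumes the standard Syntetos-Boylan cut-offs (ADI = 1.32,
    CV^2 = 0.49); the paper's exact thresholds may differ.
    """
    nonzero = [x for x in series if x > 0]
    if not nonzero:
        raise ValueError("series contains no demand")
    adi = len(series) / len(nonzero)           # average inter-demand interval
    mean = sum(nonzero) / len(nonzero)
    var = sum((x - mean) ** 2 for x in nonzero) / len(nonzero)
    cv2 = var / mean ** 2                      # squared coefficient of variation
    if adi < 1.32:
        return "smooth" if cv2 < 0.49 else "erratic"
    return "intermittent" if cv2 < 0.49 else "lumpy"

# Recommended dynamic weighting strategy per pattern (Section 4.3)
STRATEGY = {"smooth": "DWS-B", "erratic": "DWS-B",
            "intermittent": "DWS-A", "lumpy": "DWS-A"}
```

Because demand patterns can drift over time, a retailer would re-run this classification on a rolling window and switch strategies whenever the pattern changes.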

**Figure 3.** Optimal dynamic weighting strategies in the four demand patterns.

#### **5. Conclusions**

In this paper, we proposed dynamic model selection based on demand pattern classification as a new approach in retail forecasting. This approach offers a framework for addressing the challenge of model selection under the complex demand patterns found in retail practice. Based on a series' average inter-demand interval (ADI) and the squared coefficient of variation (CV2) of its demand sizes, we divided the demand patterns of all retailers' items into four types: smooth, intermittent, erratic, and lumpy. Some studies have proposed prediction methods for specific demand patterns, such as the Syntetos-Boylan Approximation and Croston methods for intermittent demand [38]. However, the demand pattern of an item may change over time. Moreover, no single model can be the most accurate for an item in all periods. It is therefore necessary to monitor and update the demand pattern and to switch to the appropriate forecasting method. We first built a pool of models comprising nine classical methods from the M-Competitions. We then proposed two dynamic weighting strategies based on the historical performance of all candidate models, namely DWS-A and DWS-B. The DWS-A method selects only the best-performing model in the past as the final model, while the DWS-B method assigns weights according to the historical performance of the candidate models. The weights of both strategies change dynamically over time. This framework enables automatic model selection for retail demand forecasting. Furthermore, the approach has good interpretability and may therefore be more acceptable to decision makers.
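The two weighting strategies described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: DWS-A puts all weight on the historically best model, while for DWS-B the inverse-error normalization used here is an assumed, illustrative weighting function, since the paper's precise scheme is not restated in this section:

```python
def dws_weights(past_errors, strategy="DWS-B"):
    """Compute combination weights from each candidate model's
    historical out-of-sample error (e.g. sMAPE over a rolling window).

    DWS-A: all weight on the historically best model (individual selection).
    DWS-B: weights shrink as historical error grows; inverse-error
    normalization is an illustrative choice, not necessarily the
    exact scheme used in the paper.
    """
    models = list(past_errors)
    if strategy == "DWS-A":
        best = min(models, key=lambda m: past_errors[m])
        return {m: 1.0 if m == best else 0.0 for m in models}
    inv = {m: 1.0 / (past_errors[m] + 1e-12) for m in models}
    total = sum(inv.values())
    return {m: inv[m] / total for m in models}

def combine(forecasts, weights):
    """Weighted combination of the sub-models' point forecasts."""
    return sum(weights[m] * forecasts[m] for m in forecasts)
```

Recomputing the errors (and hence the weights) after every rolling-forecast round is what makes both strategies dynamic: a model that deteriorates loses weight in the next round.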

We verified the effectiveness of this approach using two large datasets from an offline retailer and an online retailer in China. We ran multi-round rolling forecasts over different demand patterns and horizons to verify the generalization ability of the approach. The pattern of smooth demand, characterized by low volatility and short inter-demand intervals, is easier to predict; for this pattern, the DWS-B was more accurate at various forecast horizons. The DWS-B also outperformed all models in the pool for the erratic pattern, which suggests that combination forecasting is better suited to items with a short ADI. The intermittent and lumpy patterns, characterized by a high proportion of zero values, are harder to predict; nevertheless, the DWS-A still outperformed all models in the pool, which suggests that individual selection is better suited to items with a long ADI. In general, the proposed dynamic weighting strategies dominated the benchmark and winning prediction models of the M-Competitions, including Naïve, Comb S-H-D, and SCUM. We therefore suggest the DWS-A method for items with intermittent or lumpy patterns and the DWS-B method for items with smooth or erratic patterns.

We did not include additional prediction methods, such as deep learning models, in the pool, since such models require extra computation time, are more complex, and do not necessarily outperform statistical models [8]. Nevertheless, the model pool and the empirical results of this study are sufficient to demonstrate the effectiveness of the proposed model selection approach. In future studies, additional models and factors that affect consumer demand should be incorporated into this forecasting system to further improve forecast accuracy.

**Author Contributions:** Conceptualization, E.E. and M.Y.; methodology, E.E. and X.T.; validation, E.E. and Y.T.; formal analysis, X.T.; data curation, Y.T.; writing—original draft preparation, E.E.; writing—review and editing, M.Y., X.T., and Y.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (72172145, 71932002), the Beijing Natural Science Foundation (9212020), and the Fundamental Research Funds for the Central Universities.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

