Sales models have become increasingly diversified, and consumers are keen on online shopping. In addition to online retail platforms such as Taobao and Jingdong, more and more manufacturing companies have opened both online and offline sales channels, so customers can shop through many channels. According to the international market research company Euromonitor International, personalized and customized products are key themes among the world’s top ten consumption trends, and the demand for customized products is gradually increasing [1]. At the same time, products are becoming more substitutable, and market demand is becoming more volatile and uncertain. Moreover, to stimulate consumption, enterprises run frequent periodic promotional activities, which further increase demand uncertainty. Under these influences, customer demand increasingly exhibits intermittence and burstiness: long runs of zero demand punctuated by islands of high counts [2].
1.1. Review of Demand Forecasting
Demand is the source of production and supply. Many scholars have shown that demand forecasting plays a key role in a company’s operational decision-making [3,4,5,6,7], and a large body of work has studied how to forecast demand accurately. From a modeling perspective, some scholars build parametric models and obtain fixed model parameters by fitting historical data in order to predict future demand. Others build nonparametric models that treat the model parameters as random variables, obtain different parameter values at different times from the historical data, and then predict future demand.
Most existing studies based on parametric models approach the problem as time series prediction. Commonly used methods include the ARIMA method [8,9,10,11,12,13,14,15] and the exponential smoothing method [16,17,18,19]. To address nonlinearities that may exist in the time series, many scholars have improved the ARIMA method for different problem settings, for example, nonlinear ARIMA time series models that consider seasonal factors [10,11,12,13] and combination forecasting methods that incorporate computational intelligence techniques [14,15]. Based on network graph theory combined with four product attributes, Haytham et al. proposed an autoregressive integrated moving average model with exogenous variables (ARIMAX) to predict cosmetics sales; the predictions were compared with the Croston and ARIMA methods and achieved better results [20].
These methods have achieved good prediction results. However, product assortments are now very large, many items sell at low volumes, and frequent periodic promotions make demand intermittent and highly volatile. Johnston and Boylan cited an example where the average number of purchases of an item by a customer was 1.32 occasions per year, and ‘‘For the slower movers, the average number of purchases was only 1.06 per item [per] customer’’ [21]. For intermittent demand forecasting, the classic method is the Croston method, which forecasts the expected demand for the next period as the ratio of the expected non-zero demand size to the expected interval between non-zero demands, both estimated by simple exponential smoothing [22]. In general, if a parametric model is misspecified, that is, the data seriously violate its underlying assumptions, the predictions may be inconsistent, ultimately leading to inappropriate inferences and suboptimal recommendations. A misspecified Bayesian model can yield an ill-behaved asymptotic posterior distribution [23,24,25]. The work in [26] finds that the advantage of nonparametric methods grows as demand variability increases.
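To make the Croston logic concrete, the following minimal Python sketch implements the updates described above; the smoothing parameter alpha, the initialization convention, and the per-period forecast convention are illustrative choices, not values taken from the cited references.

```python
import numpy as np

def croston_forecast(demand, alpha=0.1):
    """Minimal Croston-style forecaster (illustrative sketch).

    demand : sequence of non-negative values, many of them zero.
    alpha  : smoothing constant for both demand size and interval (assumed).
    Returns the per-period forecast z_hat / p_hat after the last observation.
    """
    z_hat = None   # smoothed non-zero demand size
    p_hat = None   # smoothed interval between non-zero demands
    q = 1          # periods since the last non-zero demand

    for d in demand:
        if d > 0:
            if z_hat is None:              # initialize on the first non-zero demand
                z_hat, p_hat = d, q
            else:                          # simple exponential smoothing updates
                z_hat = alpha * d + (1 - alpha) * z_hat
                p_hat = alpha * q + (1 - alpha) * p_hat
            q = 1
        else:
            q += 1                         # no update in zero-demand periods

    return np.nan if z_hat is None else z_hat / p_hat

# Example: an intermittent series with many zeros
print(croston_forecast([0, 0, 3, 0, 0, 0, 2, 0, 1, 0, 0, 4]))
```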
In theory, nonparametric models can accommodate an unbounded number of attributes and levels, which suits the characterization of complex systems. Snyder et al. proposed a state space model with distributional forecasts; the model tracks expected demand through exponential-smoothing updates, assumes that demand follows a negative binomial distribution, and estimates the model parameters by maximum likelihood [27]. Building on Snyder, Chapados proposed the Bayesian inference model H-NBSS for count-valued time series. To suit count data, H-NBSS places a counting-process prior on the data, which makes the model difficult to solve. Moreover, because of the nature of the counting process, H-NBSS is extremely sensitive to parameter tuning, and small changes in the model parameters can cause large fluctuations in the prediction results [28]. Assuming a multistage Poisson demand distribution, Seeger et al. constructed a nonparametric Bayesian model [2]. Babai et al. proposed a nonparametric Bayesian method (CPB) that assumes demand follows a compound Poisson-geometric distribution [26]; however, no known conjugate prior yields a closed-form posterior in that setting. Considering the volatility of functional time series, Guillaume et al. proposed a local autoregressive nonparametric prediction model whose parameters are inferred through a strategy based on approximate Bayesian computation [29]. Yuan Ye et al. proposed a nonparametric Bayesian forecasting method based on the empirical Bayes paradigm, which flexibly handles a variety of demand patterns, and evaluated its relative performance on the demand data of 46,272 inventory units of an auto parts distributor [30]. For port throughput prediction, Majid et al. proposed a forecasting model based on Bayesian estimation that adds mutual information analysis, treats macroeconomic variables as random variables, and quantifies the related uncertainty through the posterior distribution; however, this method requires effective quantification of the macroeconomic variables [31]. Kowsari et al. pointed out that the Bayesian approach treats regression coefficients as random variables and accounts for parameter uncertainty conditional on the data. In the model assumptions of this paper, the regression coefficient ϕ of the hidden-layer variable η_t is assumed to follow a normal distribution, and an appropriate value is learned from the data [32].
A Bayesian network is a probabilistic graphical model that combines probability theory and graph theory. It can capture latent dependencies between variables, handle uncertain causal relationships, and efficiently represent and compute the joint probability distribution of a set of random variables; this structure has attracted much attention in statistics and artificial intelligence [33,34]. Bayesian networks are widely used in energy, biology, medicine, and other fields, for example, wind-energy forecasting, power demand forecasting, and water-resources demand forecasting, and many studies apply Bayesian network models to supply-chain demand forecasting.
In practice, the Bayesian network rests on Bayesian theory. It is worth pointing out that the Bayesian method expresses prior experience and the characteristics of the demand data in quantifiable form by choosing an appropriate prior distribution, whose variance reflects the uncertainty of the demand data. As demand evolves, the observed data enter the likelihood function, which is combined with the prior to obtain the posterior distribution of demand. The purpose of Bayesian posterior estimation is to find appropriate parameter values. The prediction model in Section 2 mainly adopts the Bayesian approach: Bayes’ theorem updates the prior distribution (the probability specified before the data are collected) to the posterior distribution (the probability updated after the data are observed). In the parameter estimation step, we use maximum a posteriori (MAP) estimation and derive the functional form of the MAP estimate.
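As a minimal illustration of this updating and MAP step under a normality assumption (a sketch only, not the multilayer model of Section 2), consider a demand mean μ with a conjugate normal prior and normally distributed observations with known variance σ²:

$$
\mu \sim \mathcal{N}(\mu_0, \tau_0^2), \qquad d_1,\dots,d_n \mid \mu \sim \mathcal{N}(\mu, \sigma^2)
\;\Longrightarrow\;
\mu \mid d_{1:n} \sim \mathcal{N}\!\left(\frac{\mu_0/\tau_0^2 + \sum_{t=1}^{n} d_t/\sigma^2}{1/\tau_0^2 + n/\sigma^2},\; \Big(\tfrac{1}{\tau_0^2} + \tfrac{n}{\sigma^2}\Big)^{-1}\right).
$$

Because this posterior is normal, the MAP estimate coincides with the posterior mean, $\hat{\mu}_{\mathrm{MAP}} = \arg\max_{\mu} p(\mu \mid d_{1:n}) = \big(\mu_0/\tau_0^2 + \sum_{t=1}^{n} d_t/\sigma^2\big)\big/\big(1/\tau_0^2 + n/\sigma^2\big)$. In this paper, the same logic is applied to the network parameters, and the resulting posterior is optimized numerically with the improved PSO algorithm.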
Bayesian methods have been successful in many fields and predictive environments and are particularly promising and appealing when there are considerable changes in the data. This paper constructs a multilayer Bayesian network commodity demand-forecasting model from given historical sales data. The model captures the time series correlation of the historical data as well as the volatility of demand and is highly flexible. Maximum a posteriori estimation is used to obtain the optimal parameters, and future commodity demand is then predicted from these parameters.
1.2. Review of PSO Algorithm
The particle swarm optimization (PSO) algorithm is an evolutionary computing technique based on swarm intelligence theory, proposed by the American psychologist Kennedy and the electrical engineer Eberhart and inspired by the foraging behavior of birds and fish [35]. Because of its simple concept, few parameters, fast convergence, and easy implementation, it is widely used in function optimization, production scheduling, pattern recognition, parameter estimation, and other fields. Let M particles form a swarm in a D-dimensional search space, where each particle is regarded as a search individual. The particles search for the point that optimizes the objective function by updating their position and velocity vectors and following the individuals with higher fitness. At the t-th iteration, the position vector of particle i is $X_i(t) = (x_{i1}, x_{i2}, \dots, x_{iD})^{T}$, and its velocity vector is $V_i(t) = (v_{i1}, v_{i2}, \dots, v_{iD})^{T}$.
The PSO update equations can be expressed as follows:

$$
\begin{aligned}
v_{id}(t+1) &= w\,v_{id}(t) + c_1\,rand_1\big(pbest_{id} - x_{id}(t)\big) + c_2\,rand_2\big(gbest_{d} - x_{id}(t)\big),\\
x_{id}(t+1) &= x_{id}(t) + v_{id}(t+1),
\end{aligned}
\tag{1}
$$

where w is the inertia weight; $c_1$ and $c_2$ are acceleration coefficients; $rand_1$ and $rand_2$ are two uniformly distributed random numbers generated within [0,1]; $pbest_{id}$ is the personal best point of the i-th particle in dimension d; and $gbest_d$ is the d-th component of the global best point.
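For concreteness, a minimal NumPy sketch of this update loop is given below; the objective function, bounds, and parameter values (w, c1, c2, swarm size) are illustrative choices rather than settings used in this paper.

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, seed=0):
    """Minimal PSO sketch implementing the velocity/position update in Equation (1)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=(n_particles, dim))          # positions
    v = np.zeros((n_particles, dim))                          # velocities
    pbest = x.copy()                                          # personal best positions
    pbest_fit = np.apply_along_axis(objective, 1, x)          # personal best fitness
    gbest = pbest[np.argmin(pbest_fit)].copy()                # global best position

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)

        fit = np.apply_along_axis(objective, 1, x)
        improved = fit < pbest_fit                            # minimization convention
        pbest[improved], pbest_fit[improved] = x[improved], fit[improved]
        gbest = pbest[np.argmin(pbest_fit)].copy()

    return gbest, pbest_fit.min()

# Example: minimize the sphere function
best_x, best_f = pso(lambda z: float(np.sum(z ** 2)), dim=5)
print(best_x, best_f)
```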
The PSO algorithm has become widely popular because of its low memory requirements and few control parameters. However, it easily falls into local minima when solving complex nonlinear multimodal functions. To address this problem, many scholars have improved the inertia weight w and the particle update formulas. Shi et al. first introduced a linearly decreasing inertia weight w into the velocity update term, which better balances the global development ability and local exploration ability of the algorithm [36]. Liu et al. pointed out that the solution process of PSO is itself nonlinear, so a nonlinearly varying inertia weight can better improve performance, and proposed a nonlinear inertia weight based on a Logistic chaotic map [37]. Other studies vary w nonlinearly using a cosine function or a Sigmoid function [38,39].
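As an illustration of such schedules, the snippet below sketches a linearly decreasing inertia weight and a Logistic-chaotic-map-driven weight; the bounds w_max = 0.9 and w_min = 0.4, the map parameter 4.0, and the mapping of the chaotic value onto the weight range are common textbook choices, not values reported in the cited papers.

```python
def linear_decreasing_w(t, T, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight over T iterations (illustrative)."""
    return w_max - (w_max - w_min) * t / T

def chaotic_w(T, w_max=0.9, w_min=0.4, z0=0.37):
    """Inertia weights driven by a Logistic chaotic map z <- 4 z (1 - z)."""
    z, weights = z0, []
    for _ in range(T):
        z = 4.0 * z * (1.0 - z)                 # Logistic map stays in (0, 1)
        weights.append(w_min + (w_max - w_min) * z)
    return weights

print([round(linear_decreasing_w(t, 10), 3) for t in range(10)])
print([round(w, 3) for w in chaotic_w(5)])
```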
Deep et al. replaced the individual optimum and the group optimum in the particle velocity update formula with linear combinations of the two; that is, $(pbest_{id} + gbest_{d})/2$ and $(pbest_{id} - gbest_{d})/2$ replace $pbest_{id}$ and $gbest_{d}$, respectively [40]. The resulting mean particle swarm optimization algorithm (MeanPSO) enlarges the particles’ search space. The second term of the velocity update in Equation (1) becomes $c_1\,rand_1\big(\frac{pbest_{id} + gbest_{d}}{2} - x_{id}(t)\big)$, which attracts a particle’s current position towards the mean of the positive direction of the global best position (gbest) and the positive direction of its own best position (pbest). The third term becomes $c_2\,rand_2\big(\frac{pbest_{id} - gbest_{d}}{2} - x_{id}(t)\big)$, which attracts the current position towards the mean of the positive direction of its own best position (pbest) and the negative direction of the global best position (−gbest). The relative positions of the new points generated by PSO and MeanPSO are visualized in Figure 1.
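To show how little changes relative to the basic loop sketched earlier, the following sketch modifies only the velocity update, matching the MeanPSO terms just described; the variable names mirror the earlier pso() sketch and are illustrative.

```python
import numpy as np

def meanpso_velocity(v, x, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """MeanPSO-style velocity update: pbest/gbest are replaced by their mean
    (pbest + gbest)/2 and half-difference (pbest - gbest)/2 (sketch)."""
    rng = rng or np.random.default_rng()
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    return (w * v
            + c1 * r1 * ((pbest + gbest) / 2.0 - x)
            + c2 * r2 * ((pbest - gbest) / 2.0 - x))
```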
Figure 1 is taken from [40], which shows that MeanPSO outperforms PSO in terms of efficiency, accuracy, reliability, and robustness, especially for large-sized problems. In addition, Figure 1 shows that the particle search interval in MeanPSO is wider, which makes the algorithm more likely to find the global optimal solution in the early stage of evolution. Liu proposed a hierarchical simplified particle swarm optimization algorithm with average dimension information (PHSPSO) [41]. PHSPSO abandons the velocity update term of PSO and introduces average dimension information, that is, the average value of all dimension components of each particle, $\bar{x}_i(t) = \frac{1}{D}\sum_{d=1}^{D} x_{id}(t)$ (Equation (2)). PHSPSO decomposes the particle position update formula into three modes, namely, Equations (3)–(5), where Equation (3) contributes to the global development ability of the algorithm, Equation (4) contributes to the local exploration ability, and Equation (5) helps to improve the convergence speed. In each iteration, the algorithm probabilistically selects one of these modes to update the particle position.
The work in [37] pointed out that updating the particle position with $x = x + v$ helps to improve the local exploration ability of the algorithm, whereas $x = wx + (1 - w)v$ helps to improve its global development ability. To balance the two, an adaptive position update mechanism is proposed, in which each particle selects its position update strategy according to its current state. The adaptive position update strategy is expressed by Equation (6):

$$
p_i = \frac{fit(x_i)}{\frac{1}{N}\sum_{j=1}^{N} fit(x_j)}, \qquad
x_i(t+1) =
\begin{cases}
w\,x_i(t) + (1 - w)\,v_i(t+1), & p_i > rand,\\[2pt]
x_i(t) + v_i(t+1), & \text{otherwise},
\end{cases}
\tag{6}
$$

where fit(·) is the fitness value of a particle, N is the number of particles in the population, and rand is a uniformly distributed random number in [0,1]. In Equation (6), $p_i$ denotes the ratio of the fitness value of the current particle to the average fitness value of all particles in the population. When $p_i$ is greater than the random number, the fitness of the current particle is larger than the average fitness of the population; in this case, $x = wx + (1 - w)v$ is used to update the particle position and enhance the global development ability of the algorithm. Otherwise, $x = x + v$ is used to ensure the local exploration ability of the algorithm.
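A small Python sketch of this decision rule, in the spirit of Equation (6) as reconstructed above, is given below; the function and variable names are illustrative, and it assumes the velocities have already been updated for the current iteration.

```python
import numpy as np

def adaptive_position_update(x, v, fitness, w, rng=None):
    """Adaptive position update following the rule of Equation (6) (sketch).

    x, v    : (N, D) arrays of positions and freshly updated velocities.
    fitness : length-N array of fitness values for the current positions.
    """
    rng = rng or np.random.default_rng()
    p = fitness / fitness.mean()                    # ratio to the average fitness
    use_global = p > rng.random(len(fitness))       # p_i > rand  ->  global mode
    x_new = np.where(use_global[:, None],
                     w * x + (1.0 - w) * v,         # x = w*x + (1 - w)*v
                     x + v)                         # x = x + v
    return x_new
```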
To improve the feasibility and effectiveness of PSO, Yang et al. introduced an evolution speed factor and an aggregation factor and, by analyzing these two quantities, proposed a strategy that dynamically changes the inertia weight according to the running state and the evolution state [42]. Liang et al. proposed comprehensive learning particle swarm optimization (CLPSO), which updates a particle’s velocity using the historical best positions of all other particles [43]. Adewumi et al. made a series of improvements to the inertia weight w and proposed the swarm-success-rate descending inertia weight (SSRDIW) and the swarm-success-rate random inertia weight (SSRRIW); these two strategies improve the convergence speed, global search ability, and solution accuracy of the algorithm [44]. Particle swarm optimization is a stochastic optimization method driven mainly by the random streams used in its search mechanism. Mingchang studied the influence of controlled randomness on the particle swarm search scheme by introducing three different pseudo-random number (PRN) allocation strategies; the results show that an appropriate PRN strategy and corresponding parameters can be selected systematically to make PSO more powerful and efficient [45].
Based on the above discussion, this paper uses an improved particle swarm optimization algorithm (MPSO) to find the optimal parameters of the Bayesian network model. The algorithm combines the ideas of [37,40,41] and introduces an adaptive decision mechanism and a nonlinear inertia weight. The experimental results show that the algorithm outperforms the traditional PSO algorithm.
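Purely as an illustrative skeleton, and not the MPSO defined in Section 3, the sketch below indicates where a nonlinear inertia weight [37], MeanPSO-style velocity terms [40], and an adaptive position-update decision could slot into the basic loop; every schedule and parameter value shown is a placeholder assumption.

```python
import numpy as np

def mpso_sketch(objective, dim, n=30, iters=100, c1=1.5, c2=1.5,
                w_max=0.9, w_min=0.4, lo=-5.0, hi=5.0, seed=0):
    """Illustrative MPSO-style skeleton (placeholder design, not the paper's MPSO)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n, dim))
    v = np.zeros((n, dim))
    pbest, pbest_fit = x.copy(), np.apply_along_axis(objective, 1, x)
    gbest = pbest[np.argmin(pbest_fit)].copy()
    z = 0.37                                             # Logistic-map state (assumed)

    for _ in range(iters):
        z = 4.0 * z * (1.0 - z)                          # nonlinear (chaotic) inertia weight
        w = w_min + (w_max - w_min) * z
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        v = (w * v                                       # MeanPSO-style attraction terms
             + c1 * r1 * ((pbest + gbest) / 2.0 - x)
             + c2 * r2 * ((pbest - gbest) / 2.0 - x))

        fit = np.apply_along_axis(objective, 1, x)
        p = fit / fit.mean()                             # adaptive decision ratio
        use_global = (p > rng.random(n))[:, None]
        x = np.clip(np.where(use_global, w * x + (1.0 - w) * v, x + v), lo, hi)

        fit = np.apply_along_axis(objective, 1, x)
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = x[better], fit[better]
        gbest = pbest[np.argmin(pbest_fit)].copy()

    return gbest, pbest_fit.min()

print(mpso_sketch(lambda s: float(np.sum(s ** 2)), dim=5))
```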
This research makes the following three contributions:
Constructs a multilayer Bayesian network demand prediction model that considers demand volatility and time series correlation.
Introduces a modified PSO algorithm (MPSO) for solving the maximum a posteriori estimates of parameters.
Assesses the proposed approach’s performance via a thorough experimental evaluation.
Based on the above discussion, the motivation for this work rests on two arguments. First, Bayesian methods have been successful in many fields and predictive environments and are particularly promising when the data change considerably. Second, demand and the model parameters are assumed to follow normal distributions, which admit a conjugate prior; once the posterior distribution is derived, maximum a posteriori estimation is used to estimate the relevant parameters. Therefore, combining the Bayesian network with the PSO algorithm, we use the improved PSO algorithm to optimize the functional form of the posterior distribution. This paper proposes a more general Bayesian network demand-forecasting model and designs a novel solution method, providing a new approach to demand forecasting in the supply chain.
The remainder of the paper is organized as follows. Section 2 introduces the proposed Bayesian network prediction model. In Section 3, we describe the solution algorithm of the model. Section 4 is dedicated to the experiment and the discussion of results. Conclusions and future work are presented in Section 5.