1. Introduction
At the macro level, economic growth mainly relies on investment, exports, and consumption [1,2], while at the micro level, revenue management is the main means for enterprises to achieve revenue growth. Dynamic pricing, one of the most powerful revenue management techniques, has been widely applied in various industries, such as airlines, hotels, fashion, and retail [3]. The accurate prediction of demand is crucial for effective dynamic pricing strategies, leading many scholars to develop different demand models [4]. However, these models assume that sellers possess complete market information and fully understand the market demand, implying a deterministic demand function. For example, in the classic linear demand function, this underlying assumption implies that the seller knows the specific values of all parameters of the demand function [5,6]. In reality, acquiring complete information on the market size is impractical for sellers, many of whom have only a limited understanding of market size and product demand. This uncertainty poses pricing challenges for the seller. To address this, some scholars have proposed parametric demand learning [7,8,9]. However, most of these studies focus on one-time parameter estimation and develop pricing strategies based on a deterministic demand function, neglecting the dynamic nature of the learning process. In the retail industry, sellers are increasingly recognizing the need for gradual adjustment in demand estimation throughout the entire sales period [10]. Therefore, this paper focuses on dynamic pricing within the context of dynamic demand learning.
Whether a demand learning strategy performs well depends on three main points. First, the data source plays a crucial role, as the quality of the data directly impacts the learning outcomes. When the correlation between the data and the real demand is weak, the effectiveness of demand learning is compromised. Second, the parameter updating mechanism, or learning method, is essential. Since demand learning spans the entire sales period, each round of learning yields new parameter estimates, and it is vital to update them reasonably. Bayes’ rule, a widely employed learning method, offers many advantages. It exhibits strong convergence properties, enables learning with limited data, and achieves more accurate results as the dataset grows. Additionally, the Bayesian model enhances interpretability, facilitating a clear understanding of the underlying principles. Therefore, we adopt the Bayesian method for demand learning. Third, the construction of the demand model matters. If the established demand model fails to fully capture the factors influencing demand, the learning results will inevitably deviate from the real demand.
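As a hedged illustration of the Bayesian updating discussed above (not the paper's own formulation, which is developed in Section 3), consider a demand model that is linear in its unknown parameters, with Gaussian noise of known variance and a Gaussian prior. All symbols below are assumptions introduced for this example only.

```latex
% Assumed model: D_t = x_t^\top \theta + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2),
% with prior \theta \sim N(\mu_{t-1}, \Sigma_{t-1}) before observing period t.
\begin{aligned}
\Sigma_t^{-1} &= \Sigma_{t-1}^{-1} + \sigma^{-2}\, x_t x_t^\top, \\
\mu_t &= \Sigma_t \left( \Sigma_{t-1}^{-1} \mu_{t-1} + \sigma^{-2}\, x_t D_t \right).
\end{aligned}
```

Under these assumptions the posterior precision never decreases from one period to the next, which is one way to see why the estimates tend to sharpen as more sales data accumulate.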
Historical prices and sales are an excellent data source for demand learning. Using such data offers several advantages. First, historical sales can be regarded as price experiments in a real business environment, providing direct insights into the relationship between price and demand. Second, these data do not entail additional costs for sellers to acquire. In contrast, conducting market research or questionnaire surveys often incurs additional costs. Finally, advancements in information technology have facilitated the storage and analysis of historical data.
In addition to data, another critical aspect of demand learning is understanding the factors that influence market demand. In the realm of dynamic pricing, two key factors affecting market demand during a sales period are the current price of the product and its historical prices. The impact of the current price on demand is evident. Meanwhile, historical prices affect product demand through consumer reference price effects (hereafter referred to as reference effects). Specifically, a product’s historical prices shape consumers’ price expectations, serving as reference prices. When the current price exceeds their expectations, their willingness to purchase decreases. Conversely, when the current price is lower than expected, they are more willing to purchase the product. Numerous studies emphasize the importance of reference effects in demand estimation and revenue management for businesses, especially within the retail industry [11,12]. Mazumdar et al. [13] conducted a literature review on reference prices, examining their formation, usage, and impact on the purchase decisions of consumers. The findings indicated that reference prices have a crucial effect on enterprise management decisions. Cornelsen, Mazzocchi, and Smith [14] provided evidence of consumer reference effects by analyzing household-level data encompassing food prices and purchases. They found significant biases in pricing strategies when reference effects were not considered. Mehra, Sajeesh, and Voleti [15] studied how the reference price in the non-durable goods market affects a firm’s product positioning and pricing strategy.
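The reference effect described above can be illustrated with a small sketch. The linear form and the parameter values `a`, `b`, and `c` below are hypothetical and chosen only for illustration; they are not taken from the paper's model.

```python
def demand(price, ref_price, a=100.0, b=2.0, c=1.0):
    """Hypothetical linear demand with a symmetric reference effect.

    a: market size, b: price sensitivity, c: reference effect strength
    (all assumed values). Demand rises when the price is below the
    reference price and falls when it is above.
    """
    return max(0.0, a - b * price + c * (ref_price - price))


# Same current price, different price expectations
below_expectation = demand(price=20.0, ref_price=25.0)  # perceived "gain"
above_expectation = demand(price=20.0, ref_price=15.0)  # perceived "loss"
```

With these assumed values, the same price of 20 sells more when consumers expected 25 than when they expected 15, which is the behavior the paragraph describes.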
Based on the above analysis and findings, this paper investigates the problem of how a monopolistic seller determines the optimal prices for multiple sales periods in the presence of demand uncertainty and consumer reference price effects. Specifically, at the beginning of each period, the seller has only a rough estimate of the parameters of the demand function. Using deterministic dynamic programming, the seller solves for the optimal price at that time and conducts sales based on the solved price. At the end of the period, the seller re-estimates the demand function parameters using the Bayesian method while simultaneously initiating the process of solving for the next period’s optimal price. This iterative process continues until the end of the sales period.
The contribution of this paper is three-fold. First, it examines demand learning and the reference price effect jointly. Unlike the existing literature, this paper adopts a dynamic learning process. Specifically, sellers can use the data generated during each sales period (i.e., prices and sales) for demand learning and update previous learning results. This dynamic approach closely aligns with real-world business applications. Second, we propose an effective algorithm for approximating Bayesian dynamic programming. This enables sellers to make precise real-time price adjustments. Finally, this paper shows that, in demand learning, the reference price impact factor varies more than the price impact factor. This indicates that consumers are less heterogeneous in their price sensitivity than in their response to reference prices. Therefore, exploring mechanisms for heterogeneous reference price formation is crucial for both businesses and academia.
The remainder of this paper is organized as follows: Section 2 reviews the relevant literature. Section 3 introduces the dynamic pricing model incorporating the reference price and extends it to the case of uncertain demand, that is, dynamic pricing based on the reference effect and demand learning. Section 4 describes the approximate solution algorithm for the proposed model. Section 5 presents numerical analysis and discussions, while Section 6 concludes the study.
2. Literature Review
Two research streams are related to our work: dynamic pricing with uncertain demand and dynamic pricing with reference price effects.
Depending on the form of the demand function, the current literature on uncertain demand can be divided into two categories. One category focuses on demand functions where price is the only variable, while the other considers demand as a function of price and other non-price variables, which are collectively referred to as covariates. In the case of the former, three main research approaches are prevalent. The first approach employs statistical methods, such as classical maximum likelihood or least-squares estimators, for parameter estimation [16,17]. Keskin and Zeevi [18] used the greedy iterated least-squares (GILS) method to address a multi-product, limited inventory dynamic pricing problem with partially known demand distribution but unknown linear demand function parameters. They showed that their semi-myopic policy incurs an expected revenue loss of order log(T) and provided sufficient conditions for the asymptotic optimality of any policy. Den Boer and Zwart [19] investigated the dynamic pricing problem with uncertain demand function parameters under limited inventory. They employed maximum likelihood estimation to calculate the unknown parameters of the demand function, utilizing price, demand, and inventory data for each sales period. Due to the endogenous learning characteristics, the calculated price strategy yielded an expected revenue loss of O(log^2(T)).
The second approach employs the Bayesian framework. Aoki [20] was the first to apply this framework in the study of demand learning, and subsequent scholars have built upon his research [21,22]. Harrison et al. [23] investigated dynamic pricing in financial services, where the parameters of the underlying demand model are unknown, but the seller can learn these parameters by observing the outcomes of each sales attempt. Ghate [24] focused on the dynamic auction-design problem, where the seller faces uncertainty regarding the market response. The problem was formulated as a Bayesian Markov decision process, in which the seller learns more about the demand by observing the number of posted bids. Uncertainty in demand at the individual consumer level, reflected in the unknown consumer arrival rate, has also received scholarly attention. For example, Mason and Välimäki [25] analyzed the optimal selling of an asset when demand is uncertain, with the seller acquiring knowledge of the arrival rate through a Bayesian approach.
The third approach utilizes online learning methods. This method employs price experiments to gather data, estimates parameters based on the data, and finally solves the optimization problem using the estimated parameters. A notable characteristic of this method is the trade-off it strikes between exploration and exploitation. Keskin and Zeevi [26] studied a dynamic pricing problem where a seller encounters an unknown demand model that can change over time. Yang et al. [27] designed an algorithm to facilitate synchronized dynamic pricing, allowing competitive firms to estimate their demand functions through observations and adjust their pricing strategies accordingly.
In reality, various factors beyond price influence demand, including product attributes and customer characteristics. Qiang and Bayati [28] considered linear demand as a function of price and additional information, namely demand covariates such as marketing expenditure, geographic information, customer socio-economic characteristics, macroeconomic indicators, and weather information. For a seller offering a large number of products, Javanmard and Nazerzadeh [29] assumed that consumers make purchase decisions based on a general choice model incorporating product features and their own characteristics, with unknown model parameters. Cohen et al. [30] posited that the demand function is determined by a linear combination of multiple product attributes, with unknown coefficient values that the seller can learn through a feasible collection. Chen and Jiang [31] employed the GILS method to segment the market into high- and low-quality product markets, where the seller lacked advance knowledge of customer demand information for quality. They found that, under demand uncertainty, a high-quality firm may yield higher profits when initially setting prices. Ban and Keskin [32] examined dynamic pricing at the individual customer level, with customers encoding their personalized characteristics as a d-dimensional feature vector. The personalized demand model’s parameters depended on s out of the d features. The seller could learn the relationship between customer features and product demand through sales observations.
The above literature review demonstrates a wealth of research on demand uncertainty; however, it does not consider the impact of certain behavioral factors of consumers on demand.
In the field of dynamic pricing with reference effects, researchers have proposed various mechanisms for forming reference prices based on the different relationships between the reference price and the historical prices. Popescu and Wu [33] investigated the dynamic pricing problem for a single monopoly producer facing consumers with reference effects. They assumed that the reference price is an exponential smoothing of all historical prices. The authors theoretically proved that the price strategy converges to a steady state and provided numerical results indicating that the steady-state price decreases with customers’ memory and is sensitive to the reference effect. Yang et al. [34] extended Popescu and Wu’s [33] research by examining the dynamic pricing problem for a single monopoly producer under limited inventory and random demand, in which consumer arrivals in each period follow a Poisson distribution and the consumers’ reference price is also an exponential smoothing of historical prices. They explored how the reference effect influences the initial price, pricing trend, price dispersion, and expected revenue under the fixed pricing (FP), dynamic pricing (DP), and dynamic pricing with reference (DPR) policies. Alternative reference price update mechanisms have also been proposed. Nasiry and Popescu [35] suggested that consumers’ reference price is determined by the maximum (minimum) historical price of the product and the price of the latest period (known as the peak-end rule). They established a single-product dynamic pricing model based on the peak-end rule and found that, if loss-averse consumers are anchored at a low price, the steady-state price range widens. Bi et al. [36] extended Nasiry and Popescu’s [35] model to multiple products and showed that manufacturers can lower consumers’ reference price by reducing the price of core products, thereby attracting more customers. Additionally, Bi et al. [37] considered consumers’ reference effect as a linear combination of prices within the memory window. They assumed that these memory factors are random variables following first-order Markov processes and studied the dynamic pricing problem for monopoly sellers. According to prospect theory, consumers perceive a “gain” (“loss”) when the reference price is greater (less) than the current price, and they react asymmetrically to these two situations (i.e., they exhibit loss-aversion behavior). This aspect of reference prices has received a lot of attention in the literature [38,39,40,41]. Dynamic pricing models that integrate reference effects with other consumer behaviors have also attracted the interest of several scholars in recent years. For example, Chen et al. [42] investigated dynamic pricing where consumers exhibit both reference effects and strategic behavior.
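The two families of reference price formation surveyed above can be contrasted with a short sketch. The function names, the memory parameter `alpha`, and the weight `theta` are hypothetical, and the peak-end function is one simple reading of the rule (a weighted average of an extreme past price and the latest price), not the exact specification of any cited paper.

```python
def exp_smoothing_ref(prev_ref, last_price, alpha=0.7):
    """Exponential smoothing: r_next = alpha * r + (1 - alpha) * p.

    alpha (assumed value) is the consumers' memory: the larger it is,
    the more slowly the reference price follows recent prices.
    """
    return alpha * prev_ref + (1.0 - alpha) * last_price


def peak_end_ref(price_history, theta=0.5, peak=min):
    """Peak-end style reference price (illustrative reading).

    Combines an extreme historical price (min by default, max optional)
    with the most recent price, using an assumed weight theta.
    """
    return theta * peak(price_history) + (1.0 - theta) * price_history[-1]


history = [10.0, 8.0, 12.0]

# Exponential smoothing depends on the whole price path
r = 10.0
for p in history:
    r = exp_smoothing_ref(r, p)

# Peak-end depends only on the extreme price and the latest price
r_low_anchor = peak_end_ref(history)             # anchored at the lowest price
r_high_anchor = peak_end_ref(history, peak=max)  # anchored at the highest price
```

The design difference is what the state of a pricing model must carry: exponential smoothing needs only the previous reference price, whereas a peak-end rule needs the running extreme and the latest price.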
Some scholars have jointly explored dynamic pricing based on demand learning and reference price effects. In a discrete dynamic pricing scenario, Cao et al. [43] assumed that the customer arrival rate is unknown to the seller and is influenced by the reference price. They found that overlooking the reference price effect could result in substantial revenue losses. Den Boer and Keskin [44] designed a policy involving gradual changes in the selling price to control the evolution of the reference price. They used accumulated sales data to strike a balance between learning and earning within an online learning framework. Our study differs from both of these studies. We incorporate the reference effect into the linear demand model in a continuous dynamic pricing scenario, where the parameters of the demand function are unknown to the seller and are updated using the Bayesian method.
4. Model Analysis and Solution
In this section, we discuss a solution to the dynamic pricing model based on the reference effect and demand learning.
The Bellman equation for Model (14) is as follows:
In the last period, the seller had information
about the demand realized in the past sales period with the realized price information
and
. The seller can compute
using Equation (13). Thus, the optimal price can be solved as
In the penultimate period, the optimal prices
and
are chosen to maximize
is a function of the . can also be considered a function of . At time , the seller knows and ; thus, the optimal prices and can be deduced.
Returning to the above analysis in the first period, we find that an optimal solution exists. However, we cannot obtain a specific form of the , and it is difficult to find the optimal solution. Therefore, we propose a numerical method to obtain an approximate solution. Specifically, in each period t, the seller updates an and fixes it. Then, Model (14) becomes a non-random dynamic programming model, and we solve using inverse recursion according to the Bellman equation. Given , the seller observes and updates to and repeats the procedure until the end of the time horizon. The specific Algorithm 1 is as follows:
Algorithm 1. Compute and .
Initialization
for do
    compute ,
    for do
        compute
    end for
    update
end for
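Because Model (14) and Equation (13) are not reproduced in this section, the following is only a minimal sketch of the procedure described above, under stated assumptions rather than the paper's exact algorithm. It assumes a linear demand D = a - b*p + c*(r - p) + noise, an exponentially smoothed reference price with memory `alpha`, a Gaussian prior with known noise variance, and a finite price grid. All function names and parameters are hypothetical, and the starting reference price is assumed to lie within the price range.

```python
import numpy as np


def features(p, r):
    # Assumed demand is linear in theta = (a, b, c): D = a - b*p + c*(r - p)
    return np.array([1.0, -p, r - p])


def bayes_update(mu, Sigma, x, d, noise_var):
    # Conjugate normal update of theta after observing demand d at features x
    prec = np.linalg.inv(Sigma)
    Sigma_new = np.linalg.inv(prec + np.outer(x, x) / noise_var)
    mu_new = Sigma_new @ (prec @ mu + x * d / noise_var)
    return mu_new, Sigma_new


def plan_price(theta, r0, periods_left, prices, ref_grid, alpha):
    # Backward recursion with theta held fixed (deterministic dynamic program)
    a, b, c = theta

    def stage_values(r, V_next):
        # Revenue of each candidate price plus the interpolated continuation value
        d = np.maximum(0.0, a - b * prices + c * (r - prices))
        r_next = alpha * r + (1.0 - alpha) * prices
        return prices * d + np.interp(r_next, ref_grid, V_next)

    V = np.zeros(len(ref_grid))  # nothing is earned after the last period
    for _ in range(periods_left - 1):
        V = np.array([stage_values(r, V).max() for r in ref_grid])
    # Best price for the current period at the actual reference price r0
    return float(prices[int(np.argmax(stage_values(r0, V)))])


def run(T, r0, mu, Sigma, prices, alpha, noise_var, observe_demand):
    # Outer loop: fix the estimate, solve, sell, observe, update, repeat
    ref_grid = np.linspace(prices.min(), prices.max(), 41)
    r, path = r0, []
    for t in range(T):
        p = plan_price(mu, r, T - t, prices, ref_grid, alpha)
        d = observe_demand(p, r)  # realized sales, supplied by the caller
        mu, Sigma = bayes_update(mu, Sigma, features(p, r), d, noise_var)
        path.append((p, r, d))
        r = alpha * r + (1.0 - alpha) * p  # reference price for the next period
    return path, mu, Sigma
```

A caller would supply `observe_demand(p, r)` as the source of realized sales, for example a simulator with hidden "true" parameters, and read the price path and the final posterior from the return value. Holding the estimate fixed inside `plan_price` is what turns the problem into a non-random dynamic program in each period; learning enters only through the update between periods.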
5. Numerical Simulation
To present the results more intuitively, in this section, we conduct numerical simulations for Model (14) under the given parameters and conditions and show the simulation results. Because sellers in practice choose among a finite number of prices rather than from a continuum, we assume that the decision variable takes values only in a finite, discrete, deterministic set of prices.
5.1. Numerical Simulation Results
Given the parameters
,
,
,
,
,
,
,
,
,
and
, based on the value iteration and contraction mapping theorems, we first obtain the price and reference price set by the seller for each period. As shown in Figure 1, although the decision variables are limited to a finite set of values, the prices in this model do not converge monotonically to a single value but rather fluctuate back and forth. This is in contrast to the short-term results observed by Popescu and Wu [33]. As illustrated in Figure 2, in contrast to the price (the decision variable), the reference price (the state variable) increases monotonically over time in this simulation.
Based on the derived prices and reference prices and the two previous assumptions, this study further simulates how the demand function coefficients are updated. Figure 3, Figure 4 and Figure 5 illustrate the update of these coefficients. With a demand-learning seller, the change in the total demand impact coefficient (market size) in each period is not significant, and the gap between the first and sixth periods is only 2.5 units (see Figure 3). In other words, consumers do not react strongly to more “intelligent” sellers. The change in the price impact factor over the periods is also insignificant (see Figure 4), at only 1.5 units. However, compared with the total demand impact coefficient and the price impact factor, the change in the reference effect impact factor is significant, at 7 units. The demand-learning seller learns the demand function by observing changes in product demand caused by the reference effect impact factor. Figure 6 shows that market demand increases over time.
5.2. Discussion
In this subsection, we discuss the numerical simulation results. First, although the seller updates the estimates of market size and price sensitivity throughout the sales period, the changes are relatively flat. This indicates that the seller’s decisions are based on prior knowledge of these parameters, resulting in minimal deviation from the optimal outcome. Market segmentation theory can explain why these two parameters change so little. Before selling the product, the seller determines the target market and product positioning. According to market segmentation theory, once the target market is defined, the initial market size is established, and product positioning gives consumers a clear understanding of the product, helping them judge the substitutability of products, which, in turn, affects their price sensitivity. This process of target market determination and product positioning is how the seller obtains prior information about market size and price sensitivity. Therefore, these two parameters change only gradually during the seller’s learning process. For example, in China, Pinduoduo.com and JD.com are two well-known online shopping platforms. Although many people intuitively feel that JD.com’s retail prices are higher than those on Pinduoduo.com, consumers who have previously chosen to shop on JD.com do not tend to shift to the Pinduoduo.com platform. This implies that the demand for products on JD.com has not changed significantly. This phenomenon can be attributed to the initial differences in target audiences for the two platforms. JD.com primarily caters to high-value customers, while Pinduoduo.com focuses on attracting low-value customers.
The second result of the numerical simulation is that the variation in the reference effect impact factor surpasses the variations in market size and price sensitivity. This highlights the potential for significant decision losses if sellers make decisions based solely on prior information about this factor. The reference effect, as a behavioral characteristic of individual consumers, exhibits evident heterogeneity, which has been overlooked in the literature when constructing the reference price formation mechanism. Consequently, obtaining reasonable prior information through measures such as target market selection and product positioning is challenging. To some extent, this underscores the necessity of learning about the reference effect impact factor and prompts further exploration of heterogeneous reference price formation mechanisms.
Finally, considering supply chain management, it is necessary for sellers to engage in demand learning. By updating the parameters of the demand function at the end of each period, accurate demand forecasts for the next period can be generated. The predicted results can then be communicated to upstream production enterprises through the supply chain, enabling timely adjustments to production plans and optimizing revenue across the entire supply chain.
6. Conclusions
This study summarizes previous research on dynamic pricing problems with reference effects and proposes a model to address this problem. The existing literature has primarily assumed constant parameters in dynamic pricing problems with a reference effect, while dynamic pricing models based on seller demand learning have overlooked consumer behavioral factors. Drawing on this analysis, we first presented a dynamic pricing problem considering both seller demand learning and the consumer reference effect. In Section 4, we introduced a model that incorporates these factors and provided theoretical proof of its solution. We outlined the algorithmic steps for obtaining the numerical solution and presented a specific numerical example along with its analysis. Through the numerical analysis, we observed that, with the consumer reference effect, the total consumer demand and the effect of price on demand change only slightly in each period, whereas the reference effect impact factor changes more, indicating that the seller can learn about the change in product demand by monitoring the reference effect impact factor. The management implications of the above findings for enterprises are as follows: First, conducting market and product positioning before the formal product launch is necessary. This can help reduce demand uncertainty. Second, it is advisable for enterprises to adjust prices as early as possible. This is because increasing the frequency of price adjustments can accelerate the seller’s learning about the impact of reference effects on demand. Consequently, it leads to more stable demand predictions in subsequent periods.
While this study contributes to the field of dynamic pricing, it is not without limitations. For example, we focus on dynamic pricing based on demand learning in a monopolistic environment, while, in reality, sellers face intense competition, and their demands are influenced by competitors’ reactions. Therefore, constructing a reasonable demand learning model and solving it remains a challenging problem. In addition, this paper does not consider the loss aversion behavior in the consumers’ reference effects.
It is worth noting that this study specifically introduces the reference effect as an example of a consumer behavioral factor into the model. Researchers have the opportunity to incorporate other consumer behavioral factors, such as consumer inertia and strategic behavior. Furthermore, the demand-learning problem presented in this study is model-driven, lacking historical sales data for the product. Conversely, the existing literature has focused on data-driven problems, as seen in the work of Li and Wu [45]. Thus, exploring the integration of historical data into the problem addressed in this study warrants further investigation.