Improving the Forecasting Performance of Taiwan Car Sales Movement Direction Using Online Sentiment Data and CNN-LSTM Model

Ou-Yang, Chao; Chou, Shih-Chung; Juan, Yeh-Chun

doi:10.3390/app12031550


by Chao Ou-Yang ¹, Shih-Chung Chou ¹ and Yeh-Chun Juan ²,*

¹ Department of Industrial Management, National Taiwan University of Science and Technology (Taiwan Tech), Taipei 106, Taiwan

² Department of Industrial Engineering and Management, Ming Chi University of Technology, New Taipei City 243, Taiwan

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(3), 1550; https://doi.org/10.3390/app12031550

Submission received: 24 December 2021 / Revised: 24 January 2022 / Accepted: 28 January 2022 / Published: 31 January 2022

(This article belongs to the Collection Methods and Applications of Data Mining in Business Domains)


Abstract:

The automotive industry is the leading producer of machines in Taiwan and worldwide. Developing effective methods for forecasting car sales can allow car companies to arrange their production and sales plans. Capitalizing on the growth of social media and deep learning algorithms, this research aimed to improve the performance of forecasting the movement direction of car sales in Taiwan by using online sentiment data and the CNN-LSTM method. First, the historical sales volumes and multi-channel online sentiment data for six car brands in Taiwan were collected and preprocessed for labeling of car sales movement direction. Then, three models, namely, the classical, sentimental, and CNN-LSTM models, were constructed and trained/fitted for forecasting car sales movement directions in Taiwan. Finally, the performances of the three prediction models were compared to verify the effects of online sentiment data and the CNN-LSTM model on forecasting performance. The results showed that, from the classical model to the CNN-LSTM model, four forecasting performance indices, i.e., accuracy, precision, recall, and F1-score, improved by 27.78 percentage points (from 41.67% to 69.45%), 0.39 (from 0.38 to 0.77), 0.27 (from 0.42 to 0.69), and 0.33 (from 0.35 to 0.68), respectively. Therefore, online sentiment data and the CNN-LSTM method can indeed improve the overall performance of forecasting car sales movement direction in Taiwan.

Keywords:

automotive industry; sales forecasting; online sentiment analysis; electronic word of mouth (eWOM); Convolution Neural Networks (CNN); Long Short-Term Memory (LSTM)

1. Introduction

Forecasting, which can help managers develop more accurate and meaningful plans, has played an important role in reducing business uncertainty for companies [1]. Sales forecasting in particular is the basis of definite and reliable plans for marketing, sales management, production, procurement, and logistics, which further empower companies to provide better services and reap more benefits [2]. A successful sales forecast is therefore essential for companies to manage their business successfully.

The automobile industry, the leading producer of machines in many countries, is important for worldwide economic development. Furthermore, manufacturing a car requires iron, aluminum, plastic, steel, glass, rubber, copper, and more materials. If an automobile company can accurately predict its car sales, it can arrange effective production plans for its supply chain to prevent shortages and excesses of materials in the inventory process. In addition, when a customer decides to buy a new car, he/she generally hopes to take possession of the vehicle as soon as possible. If an automobile company can accurately predict its sales, it can develop effective sales plans to provide good service to its customers. Therefore, the development of a good car sales forecasting method is important for the automobile industry.

Unfortunately, few studies to date have focused on car sales forecasting [2,3,4,5,6,7,8]. Liu and Long [8] combined a curve-regression model, a time series decomposition model, and RBF neural networks into a single forecasting model and applied it to economic data on car production and sales, which exhibit clear time dependence and trends. Brühl et al. [3] developed a time series model consisting of additive components: trend, seasonal, calendar, and error components. The model collected the main time series of newly registered automobiles and a secondary time series of exogenous parameters which could influence the trend of the main time series. The trend component was estimated by Multiple Linear Regression and Support Vector Machine (SVM). Yearly, quarterly, and monthly data for newly registered automobiles served as the basis for the tests of the models. The outcomes showed that the quarterly data provided the most accurate results. Wang et al. [4] developed an automobile sales forecasting methodology based on monthly sales volume, coincident indicator, leading indicator, wholesale price index, and income. Then an adaptive network-based fuzzy inference system (ANFIS) was created to obtain the forecast. The automobile forecasting methodology developed by Hülsmann et al. [5] used market-specific exogenous parameters, such as gross domestic product (GDP), stock index, personal income, and unemployment rate, on a yearly, quarterly, or monthly basis as the input variables for time series analysis and classical data mining algorithms.

On the Internet, consumers enthusiastically share their opinions and reviews via news, blogs, and social media, also known as electronic word of mouth (eWOM), and increasing numbers of potential buyers habitually consult eWOM before making their purchasing decisions [9,10,11,12,13,14]. Since eWOM can be positive or negative statements about a product or company [15,16,17], researchers have proposed sentiment analysis methods for automatically distinguishing three types of eWOM: positive, negative, and neutral [18]. To simultaneously apply historical sales data and eWOM to car sales forecasting, Fan et al. [2] used a sentiment analysis method, the Naive Bayes (NB) algorithm, to extract the sentiment index from each online review, and then integrated the sentiment index into the imitation coefficient of the Bass/Norton model to improve the forecasting accuracy.

Although very little effort has been expended to examine car sales forecasting, several points can still be raised by referring to forecasting studies of both car sales and sales of other products to facilitate the improvement of car sales forecasting methods.

First, historical sales data are the major predictor variable used for sales forecasting. Several other predictor variables, such as product prices, advertising campaigns, holidays [19], and economic indicators [4], are also frequently used for sales forecasting. Recently, with the popularity of social media, studies have begun using online reviews [2], online promotional strategies [20], and sentiment analysis [2,20,21] as the predictor variables to improve the performance of sales forecasting.

Second, several linear and nonlinear models, such as the Delphi technique, exponential smoothing, regression analysis, autoregressive integrated moving average (ARIMA), the Bass diffusion model, and multinomial logistic regression (MLR), are classical methods employed for sales forecasting and other predictions [4,22,23]. However, deep learning techniques, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM), have recently been applied to sales forecasting to improve the prediction performance [4,19,20,24,25]. The CNN is usually applied to image data for solving classification problems [26], while LSTM is used to analyze time series data for solving classification, processing, and forecasting problems [27].

Third, the response variable of sales forecasting can be either the sales volume (or amount) or the sales movement direction [18,28]. Sales volume forecasting is a continuous value prediction of sales volume. In contrast, the sales movement direction transforms the sales volume into directional changes in sales, such as Up, Flat, and Down. Thus, sales movement direction forecasting is a classification problem of sales forecasting.

In Taiwan’s automobile industry, the sales volume of passenger cars in 2019 was 383,987. As consumer preferences changed, the sales of imported cars in Taiwan increased year by year. The 2019 sales volume of imported cars was 200,548, i.e., about 52% of the market share of passenger cars. The total sales volume of the top six leading imported car brands, namely, BMW, Lexus, Mazda, Mercedes-Benz (Benz), Toyota, and Volkswagen (VW), was 146,231, around 73% of the market share of imported cars [29]. Thus, accurately predicting car sales, especially for the top six leading imported car brands, could contribute to the development of Taiwan’s automobile industry. Consequently, this research aimed to improve the sales forecasting for Taiwan’s car industry and used the top six leading imported car brands as the experiment cases.

As mentioned above, the response variable of car sales forecasting can be either the sales volume (or amount) or the sales movement direction. Fantazzini and Toktamysova [7] argued that correct forecasts of car sales movement directions can still provide useful information even with large errors in the forecast car sales volumes. This is particularly important when predicting a turning point, which is a special case of directional accuracy and represents a change in the car sales movement direction. Therefore, this research selected the car sales movement direction as the response variable for the sales forecasting of Taiwan’s six leading imported car brands. In addition, because a car is a durable consumer good, potential buyers will spend more time on eWOM to aid in decision-making on the purchase. To improve the performance of car sales forecasting, in addition to historical sales data, multi-channel online sentiment data were also used as the predictor variables for car sales forecasting. Instead of treating the sentiment data as a coefficient of the Bass/Norton model, as in Fan et al.’s study [2], this research prepared and analyzed a series of daily multi-channel online sentiment data of a car brand in the form of images. Therefore, to account for both the image-like characteristics of online sentiment data and the time series characteristics of historical car sales data, a CNN-LSTM model integrating the CNN and LSTM networks was used to build a car sales prediction model with improved prediction performance.

To clarify the effects of the online sentiment data and the CNN-LSTM model on car sales predictions, three models were created for forecasting car sales movement directions in Taiwan. The first “classical” model, using the historical sales data as predictor variables and MLR as the prediction model, was created as the performance baseline of forecasting of car sales movement directions in Taiwan. The MLR is a generalized logistic regression for solving problems with more than two classes [22,30]. Then a “sentimental” model was created by adding the multi-channel online sentiment data as the predictor variables to the classical model so as to verify the effects of online sentiment data on prediction performance. Finally, a “CNN-LSTM” model was created by replacing the MLR method in the sentiment model with the CNN-LSTM method proposed in this research to verify the effects of the latter method on prediction performance.

The performance comparison of the three prediction models showed that the forecasting accuracy of car sales movement directions in Taiwan was effectively improved by the use of online sentiment data and the CNN-LSTM model.

This paper is organized as follows. Section 1 states the relevant topics of this research and reviews the literature related to the research problem. Section 2 elaborates the creation process of the three prediction models for forecasting car sales movement directions in Taiwan. In Section 3, the results of the three prediction models are compared and analyzed to verify the effects of the online sentiment data and the CNN-LSTM model on the forecasting of car sales movement directions in Taiwan. Finally, the important findings, discussions, and suggestions for further research are summarized in Section 4.

2. Methodology

Figure 1 shows the research framework for improving the forecasting of car sales movement directions in Taiwan by using online sentiment data and the CNN-LSTM model. First, the historical sales volumes and multi-channel online sentiment data of Taiwan’s top six leading imported car brands were collected and preprocessed for labeling of car sales movement directions. Then the structures of three prediction models, namely, the classical, sentimental, and CNN-LSTM models, were constructed for forecasting car sales movement directions in Taiwan. Third, the three prediction models were trained or fitted with the datasets of Taiwan’s top six leading imported car brands. Finally, the prediction performances of the three prediction models were evaluated and compared to verify the effects of online sentiment data and the CNN-LSTM model on the forecasting of car sales movement directions in Taiwan.

2.1. Data Collection and Preprocessing for Labeling

2.1.1. Data Collection

As mentioned above, this research used both online sentiment data and the CNN-LSTM method to improve the performance of predictions of the car sales movement direction of Taiwan’s top six leading imported car brands. As shown in Figure 2, for each of Taiwan’s six car brands, namely, BMW, Lexus, Mazda, Benz, Toyota, and VW, the historical car sales data and online sentiment data were collected mainly from 2014 to 2019.

The historical car sales data were collected from the Ministry of Transportation and Communications, R.O.C. (MOTC) [29]. The MOTC website is a platform for commonly used transportation statistics and is operated by Taiwan’s government. Since the MOTC website provides new car registration data on a monthly basis, as shown in Table 1, the new car registration data from 2014 to 2019 were retrieved as the historical monthly sales volumes for six car brands. In addition, the new car registration data of January 2020 were also collected for the continuing labeling work.

The online sentiment data were collected from the OpView Insight: Social Media Monitoring Tool (OpView) [31]. OpView is the largest social media monitoring service platform in Taiwan. It collects eWOM and news every day from five online media sources in Taiwan [32], including more than 6100 discussion forums (e.g., the Mobile01 and the Dcard), more than 36,000 social media (e.g., Facebook and Instagram), more than 400 Q&A websites (e.g., Yahoo! Answers), more than 1800 blogs, and more than 3600 news websites (e.g., ETtoday and Line Today) [31]. The collected daily eWOM and news are then analyzed as three types of sentiments, i.e., positive, negative, and neutral, for various products and brands. For the study, three types of daily online sentiment volumes, positive (P), negative (N), and total (T), from 2014 to 2019 for six car brands were collected. Table 2 shows the collected daily online sentiment volumes for BMW.

2.1.2. Labeling of Car Sales Movement Directions

As mentioned in Section 1, this research selected the sales movement direction as the response variable of car sales prediction models. Hence, the monthly sales movement directions were labeled with the collected monthly sales volumes for six car brands. The three types of sales movement directions, Up (U), Flat (F), and Down (D), are defined in Equation (1). Since the intent of this research was to predict, in the current month, the car sales movement direction of the next month, this equation indicates that the ith month’s label l_i is determined by the (i + 1)th month’s monthly sales growth rate, (S_{i+1} − S_i)/S_i, where S_i denotes the sales volume of the ith month, and by the predefined threshold h.

l_{i} = \begin{cases} U, & \text{if } (S_{i+1} - S_{i}) / S_{i} > h \\ F, & \text{if } -h \leq (S_{i+1} - S_{i}) / S_{i} \leq h \\ D, & \text{if } (S_{i+1} - S_{i}) / S_{i} < -h \end{cases}

(1)

Assume that the threshold h is set at 10%. Table 3 shows the labels obtained from Equation (1) for each month from 2014 to 2019, based on the collected historical car sales volumes shown in Table 1.

For example, the l₁ (i.e., the label of 2014/1) of BMW is determined by the 2nd month’s (i.e., 2014/2’s) monthly sales growth rate (624 − 1437)/1437 = −56.58% and the predefined threshold h = 10%. As the 2nd month’s monthly sales growth rate, −56.58%, is smaller than −10% (i.e., −h), the l₁ of BMW is labeled as the Down direction “D”. As mentioned in Section 2.1.1, for labeling the sales movement direction of 2019/12, i.e., l₇₂, the car sales volume of 2020/1 must be collected. For the example of the l₇₂ of Lexus, the 73rd month’s monthly sales growth rate (2687−2348)/2348 = 14.44% is greater than the threshold h = 10%; the l₇₂ of Lexus should be labeled with an Up direction “U” according to Equation (1).
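The labeling rule of Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `label_direction` and `label_series` are hypothetical, and the only sales figures used are the ones quoted in the two examples above.

```python
def label_direction(s_current, s_next, h=0.10):
    """Label month i as 'U', 'F', or 'D' following Equation (1).

    s_current is S_i, s_next is S_{i+1}, and h is the threshold
    (10% in the text). The function name is hypothetical.
    """
    growth = (s_next - s_current) / s_current
    if growth > h:
        return "U"
    if growth < -h:
        return "D"
    return "F"


def label_series(sales, h=0.10):
    """Label every month that has a successor in a list of monthly sales."""
    return [label_direction(sales[i], sales[i + 1], h)
            for i in range(len(sales) - 1)]


# BMW, 2014/1 -> 2014/2 (figures quoted in the text): growth is about -56.58%
print(label_direction(1437, 624))
# Lexus, 2019/12 -> 2020/1 (figures quoted in the text): growth is about +14.44%
print(label_direction(2348, 2687))
```

Because the label of a month depends on the following month's sales, a list of n monthly volumes yields n − 1 labels, which is why the January 2020 volume is needed to label December 2019.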

2.2. Prediction Model Structure Construction

As explained in Section 1, to verify the effects of online sentiment data and the CNN-LSTM model on Taiwan’s car sales prediction, three models, namely, the classical, sentimental and CNN-LSTM models, were created for forecasting car sales movement directions in Taiwan. The structures of these three prediction models will be described in this section.

2.2.1. The Classical Model

In this research, the classical model was created as the baseline for prediction performance for comparison with the sentimental and the CNN-LSTM models.

The classical model adopted the most frequently used predictor variables for sales forecasting, including historical sales data and seasonality data, to predict the car sales movement direction. For example, Table 4 is the dataset prepared for creating the classical model for BMW. The monthly car sales volumes and monthly labeling data were retrieved from Table 3. As for seasonality data, car companies in Taiwan usually start different sales campaigns in specific months to promote sales, so monthly car sales volumes exhibit strong seasonality. In this research, the seasonality data, namely, the month number and the same-month-last-year sales movement direction labels were added, as shown in Table 4, to improve the accuracy of predictions of car sales movement directions.

Furthermore, since three types of car sales movement directions (U, F, and D) were defined in this research, the MLR, a generalized logistic regression for solving the problems with more than two classes [22], was selected as the forecasting method of car sales movement directions. Hence, the classical model for forecasting car sales movement directions in Taiwan can be expressed by Equation (2):

y_{i} = \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \beta_{3} x_{i3} + \epsilon

(2)

where y_i refers to the car sales movement direction (U, F, or D) of the ith month; x_i1 refers to the sales volume of the ith month; x_i2 refers to the same-month-last-year sales movement direction label of the ith month; x_i3 refers to the month number of the ith month; β_0 refers to the y-intercept (constant term); β_1 to β_3 refer to the slope coefficients for x_i1 to x_i3; and ϵ refers to the model’s error term (also known as the residuals).

In multinomial logistic regression with K classes, one class is chosen as a “pivot”, and K − 1 independent binary logistic regression models are constructed against it [22]. If car sales movement direction Y = U is selected as the pivot, then the log-odds of the other two directions relative to U are modeled as:

\ln \frac{P(Y = F)}{P(Y = U)} = b_{F} \cdot x \quad \text{and} \quad \ln \frac{P(Y = D)}{P(Y = U)} = b_{D} \cdot x

(3)

where Y refers to the random variable of y_i in Equation (2); b_F and b_D refer to the set of regression coefficients in Equation (2) associated with car sales movement directions F and D, respectively (b is typically estimated by the maximum likelihood method [33]); x refers to the vector of x_i1 to x_i3 in Equation (2). Then, the probability that x belongs to car sales movement directions F and D can be expressed as Equation (4):

P(Y = F) = P(Y = U) \, e^{b_{F} \cdot x} \quad \text{and} \quad P(Y = D) = P(Y = U) \, e^{b_{D} \cdot x}

(4)

Since the sum of the probabilities that x belongs to each class is 1, the probability that x belongs to car sales movement direction U becomes:

P(Y = U) = 1 - P(Y = F) - P(Y = D) = \frac{1}{1 + e^{b_{F} \cdot x} + e^{b_{D} \cdot x}}

(5)

and Equation (4) can be rewritten as follows:

P(Y = F) = \frac{e^{b_{F} \cdot x}}{1 + e^{b_{F} \cdot x} + e^{b_{D} \cdot x}} \quad \text{and} \quad P(Y = D) = \frac{e^{b_{D} \cdot x}}{1 + e^{b_{F} \cdot x} + e^{b_{D} \cdot x}}

(6)

Given the x from Table 4’s dataset, MLR outputs a car sales movement direction label y such that:

y = \arg\max_{k \in \{U, F, D\}} P(Y = k)

(7)
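To make Equations (3)–(7) concrete, the following Python sketch computes the three class probabilities with U as the pivot and then picks the most probable direction. It is an illustration under stated assumptions: the function names, the coefficient vectors `b_F` and `b_D`, and the input vector `x` are all hypothetical, `x` is assumed to carry a leading 1 for the intercept, and the categorical predictor is assumed to be already encoded numerically. No fitted coefficients from the study are used.

```python
import math


def mlr_probabilities(x, b_F, b_D):
    """Class probabilities with U as the pivot, per Equations (5) and (6)."""
    z_F = sum(b * v for b, v in zip(b_F, x))   # b_F . x
    z_D = sum(b * v for b, v in zip(b_D, x))   # b_D . x
    denom = 1.0 + math.exp(z_F) + math.exp(z_D)
    return {
        "U": 1.0 / denom,
        "F": math.exp(z_F) / denom,
        "D": math.exp(z_D) / denom,
    }


def predict_direction(x, b_F, b_D):
    """Return the direction with the highest probability, per Equation (7)."""
    probs = mlr_probabilities(x, b_F, b_D)
    return max(probs, key=probs.get)


# Hypothetical inputs: intercept term followed by three encoded predictors.
x = [1.0, 0.5, -1.0, 0.2]
b_F = [0.1, 0.4, 0.0, -0.3]   # hypothetical coefficients for direction F
b_D = [-0.2, 1.0, 0.5, 0.0]   # hypothetical coefficients for direction D

probs = mlr_probabilities(x, b_F, b_D)
print(probs)
print(predict_direction(x, b_F, b_D))
```

In practice the coefficient vectors would be estimated by maximum likelihood, as noted above; the sketch only covers the prediction step.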

2.2.2. The Sentimental Model

As mentioned in Section 1, since the potential car buyers tend to spend more time on online sentiment data to aid in purchasing decision-making, this research intends to improve the overall performance of forecasting of car sales movement directions in Taiwan by using multi-channel online sentiment data.

To clarify the effects of multi-channel online sentiment data on car sales movement direction prediction, the sentimental model was created as shown in Equation (8) by adding the sentimental data to the classical model, shown in Equation (2), as the predictor variables for forecasting car sales movement directions in Taiwan:

y_{i} = \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \beta_{3} x_{i3} + \beta_{4} x_{i4} + \dots + \beta_{18} x_{i18} + \epsilon

(8)

The only difference between the classical model and the sentimental model was that the predictor variables of the sentimental model additionally contained the online sentiment data from x_i4 to x_i18. As shown in Table 2, x_i4 to x_i18 are the 15 online sentiment variables collected from OpView for each car brand: five channels (discussion forums, social media, Q&A websites, blogs, and news websites), each with three types of online sentiment volume, i.e., P, N, and T.

However, the online sentiment data were collected on a daily basis, whereas the forecasting model defined in Equation (8) was monthly, since the subscript i represents the ith month. Consequently, the collected daily online sentiment data needed to be converted into monthly online sentiment data. For BMW, for example, the dataset for creating the sentimental model was prepared as shown in Table 5.
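The daily-to-monthly conversion could look like the sketch below. The text does not state how the daily values were aggregated, so summing the daily volumes within each calendar month is an assumption made here purely for illustration; the function name `monthly_totals` and the sample records are likewise hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_records):
    """Sum daily P/N/T sentiment volumes per calendar month.

    daily_records is an iterable of (date, {"P": int, "N": int, "T": int}).
    Summation is an assumed aggregation rule, not one stated in the paper.
    """
    totals = defaultdict(lambda: {"P": 0, "N": 0, "T": 0})
    for day, volumes in daily_records:
        key = (day.year, day.month)
        for kind in ("P", "N", "T"):
            totals[key][kind] += volumes[kind]
    return dict(totals)


# Hypothetical daily volumes for one channel of one brand.
daily = [
    (date(2014, 1, 30), {"P": 3, "N": 1, "T": 10}),
    (date(2014, 1, 31), {"P": 2, "N": 0, "T": 7}),
    (date(2014, 2, 1), {"P": 5, "N": 2, "T": 12}),
]
monthly = monthly_totals(daily)
print(monthly)
```

Running the same aggregation for each of the five channels would give the 15 monthly sentiment predictors x_i4 to x_i18.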

As in the classical model, the MLR was used for solving the sentimental model to predict the car sales movement directions. Therefore, the fitting and forecasting processes for Equation (8) were the same as those for Equations (3)–(7).

2.2.3. The CNN-LSTM Model

In the last section, for the sentimental model, online sentiment data were added to the predictor variables of the classical model in an attempt to improve the overall performance of forecasting of car sales movement directions in Taiwan. This section presents the development of the CNN-LSTM model, which integrated the CNN and LSTM networks instead of the MLR method used in the sentimental model to improve the overall performance of forecasting of car sales movement directions in Taiwan. The basic idea of the utilization of these models is that LSTM models are appropriate for dealing with time series data, while CNN models may filter out the noise of the input data and extract more valuable features [34,35].

Figure 3 presents the architecture of the CNN-LSTM model. A sliding window [36] is first used in pre-processing to extract the data for the CNN-LSTM model. The CNN-LSTM model itself then consists, in sequence, of a one-dimensional convolutional (1D CNN) input layer, a CNN network, an LSTM network, a fully connected network, and an output layer.

Basically, the CNN-LSTM model used the same predictor variables as the sentimental model with the exception of the historical car sales volumes. After a test of prediction performance, this research chose to use the past 60 days (2 months) of predictor variable data to predict the response variable, i.e., the car sales movement direction, of the following month. Therefore, this research used a 90-day (3-month) window with a 30-day (1-month) sliding step to extract the training data for the CNN-LSTM model from the collected dataset. Each sliding window includes a set of 2-month predictor variables (X_i) and a 1-month response variable (Y_i). The predictor variable data were standardized and input via a one-dimensional convolutional neural network (1D CNN) developed for processing data arranged in a single dimension [26,37]. Since the multi-channel sentiment data were collected on a daily basis, 60 days of data were extracted directly to form the 1D 60-day input. However, because both the same-month-last-year sales movement direction label and the month number were monthly values, each of the two monthly values in a window had to be repeated for 30 days to form the 1D 60-day input.
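The window extraction described above can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: it assumes every month is exactly 30 days long (following the 30-day step in the text), it assumes that `monthly_directions[m]` holds the movement direction realized in month m, and the function names are hypothetical.

```python
def expand_monthly(values, days_per_month=30):
    """Repeat each monthly value so that it fills every day of its month."""
    return [v for v in values for _ in range(days_per_month)]


def sliding_windows(daily_rows, monthly_directions, input_days=60, step_days=30):
    """Build (X, y) samples from a 90-day window moved in 30-day steps.

    X is the first 60 days of the window; y is the direction of the month
    that follows those 60 days. The alignment between days and months is
    an assumption made for this sketch.
    """
    samples = []
    start = 0
    while True:
        end = start + input_days
        label_index = end // step_days  # month right after the input span
        if end > len(daily_rows) or label_index >= len(monthly_directions):
            break
        samples.append((daily_rows[start:end], monthly_directions[label_index]))
        start += step_days
    return samples


# Hypothetical data: 4 months (120 days) of one daily feature.
daily_rows = list(range(120))
monthly_directions = ["F", "U", "D", "U"]
samples = sliding_windows(daily_rows, monthly_directions)
print(len(samples), samples[0][1], samples[1][1])
```

With four months of data the sketch yields two samples, since the window needs two months of predictors plus one month for the response.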

After receiving data via the 1D CNN layer, a CNN network containing five convolutional and max-pooling layers filtered out the noise of the input sentiment data and extracted sentiment change features for the prediction of car sales movement directions in Taiwan. In addition, both the daily online sentiment data and the monthly car sales data were time series, and the LSTM network, which has a feedback loop for processing entire data sequences, is commonly used for classification, processing, and forecasting based on time series data [27].

Therefore, an LSTM network and a 4-layer fully connected network following the CNN network were used to generate the predicted car sales movement direction. The main hyperparameters of the CNN-LSTM model were as follows: the activation function was relu; the loss function was cross_entropy; the optimizer was adam; and the learning rate was 0.003. Finally, early stopping was used to mitigate overfitting [38,39].
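As a rough illustration of early stopping, the sketch below stops training once a monitored loss has failed to improve for a fixed number of epochs. The paper does not report which quantity was monitored or which patience was used, so the validation-loss criterion, the `patience` and `min_delta` parameters, the function name, and the loss values are all assumptions.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    Training stops after `patience` consecutive epochs in which the loss
    did not improve on the best value seen so far by more than `min_delta`.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: the loss stops improving after epoch 2.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73]
stop = early_stopping_epoch(losses, patience=3)
print(stop)
```

Deep learning libraries typically ship a ready-made callback for this, which would be the natural choice in a real training loop.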

3. Results

As mentioned in Section 2.1.1, this research collected the data of car sales volumes and multi-channel online sentiment volume from 2014 to 2019. For training and evaluating the three prediction models (classical, sentimental, and CNN-LSTM) proposed in Section 2.2, the collected data from 2014 to 2018 were used as the training dataset, and the data of 2019 were used as the test dataset. This section reviews and compares the forecasting performances of the three models to verify the effects of the online sentiment data and the CNN-LSTM model on forecasting of car sales movement directions in Taiwan.

The performances of the classical, sentimental, and the CNN-LSTM models proposed in Section 2.2 were evaluated with four indices, including accuracy, precision, recall, and F1-score. These measures are defined in Equations (9)–(14) based on the confusion matrix shown in Table 6.

\text{Accuracy} = \frac{DD + FF + UU}{N}

(9)

\text{Precision}(x) = \begin{cases} \frac{DD}{P(D)}, & \text{if } x = D \\ \frac{FF}{P(F)}, & \text{if } x = F \\ \frac{UU}{P(U)}, & \text{if } x = U \end{cases}

(10)

\text{Precision} = \frac{P(D) \times \text{Precision}(D) + P(F) \times \text{Precision}(F) + P(U) \times \text{Precision}(U)}{N}

(11)

\text{Recall}(x) = \begin{cases} \frac{DD}{A(D)}, & \text{if } x = D \\ \frac{FF}{A(F)}, & \text{if } x = F \\ \frac{UU}{A(U)}, & \text{if } x = U \end{cases}

(12)

\text{Recall} = \frac{A(D) \times \text{Recall}(D) + A(F) \times \text{Recall}(F) + A(U) \times \text{Recall}(U)}{N}

(13)

\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(14)

Accuracy is an intuitive measure of the performance of a prediction model. As defined in Equation (9), accuracy is the ratio of the correct predictions to the total number of predictions. However, accuracy is a reliable measure only when the dataset is balanced. Since the dataset for forecasting car sales trends in Taiwan collected in this research had an uneven class distribution, three other measures, namely precision, recall, and F1-score, also needed to be reviewed to evaluate the performance of the three prediction models.

Precision was used to evaluate the correctness of a model’s predictions for each kind of car sales movement direction. As shown in Equation (10), the precision of a specific direction is the ratio of the correct predictions to the total predictions of that direction. That is, it isolates the predictions of one direction (U, F, or D) and evaluates how many of them are correct. To evaluate the overall precision of a model, Equation (11) defines the precision as the weighted average of the precision of individual directions.

On the other hand, recall is used to evaluate the ability of a model to correctly identify the observations in each kind of car sales movement direction in the dataset. As defined in Equation (12), the recall of a specific direction is the ratio of the correct predictions among the observations with actual values of that direction. To evaluate the overall recall of a model, the recall is defined as Equation (13), as the weighted average of the recall of individual directions.

In some situations, precision or recall will be maximized at the expense of the other metric. To take both precision and recall into account, Equation (14) defines the F1-score as the harmonic mean of precision and recall. Therefore, the F1-score is an overall measure of a model’s accuracy that combines precision and recall [40]. The F1-score is usually more useful than accuracy, especially if the dataset has an uneven class distribution.
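The four indices can be computed from a 3 × 3 confusion matrix as in the sketch below, which follows Equations (9)–(14) as written. Since Table 6 is not reproduced here, reading P(x) as the number of predictions of direction x and A(x) as the number of actual observations of direction x is an assumption; the function name `evaluate` and the matrix values are hypothetical and are not results from the study.

```python
CLASSES = ("D", "F", "U")


def evaluate(cm):
    """Compute the indices of Equations (9)-(14); cm[actual][predicted] is a count."""
    n = sum(cm[a][p] for a in CLASSES for p in CLASSES)
    predicted = {c: sum(cm[a][c] for a in CLASSES) for c in CLASSES}  # P(x), assumed
    actual = {c: sum(cm[c][p] for p in CLASSES) for c in CLASSES}     # A(x), assumed

    accuracy = sum(cm[c][c] for c in CLASSES) / n                     # Eq. (9)
    precision_x = {c: cm[c][c] / predicted[c] if predicted[c] else 0.0
                   for c in CLASSES}                                  # Eq. (10)
    recall_x = {c: cm[c][c] / actual[c] if actual[c] else 0.0
                for c in CLASSES}                                     # Eq. (12)
    precision = sum(predicted[c] * precision_x[c] for c in CLASSES) / n  # Eq. (11)
    recall = sum(actual[c] * recall_x[c] for c in CLASSES) / n           # Eq. (13)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0                # Eq. (14)
    return {"accuracy": accuracy, "precision_x": precision_x,
            "recall_x": recall_x, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical confusion matrix: rows are actual directions, columns predicted.
cm = {
    "D": {"D": 3, "F": 1, "U": 0},
    "F": {"D": 1, "F": 2, "U": 1},
    "U": {"D": 0, "F": 1, "U": 3},
}
result = evaluate(cm)
print(result)
```

Under this reading, each product P(x) × Precision(x) or A(x) × Recall(x) reduces to the diagonal count of direction x, so the per-direction dictionaries are the more informative part of the output.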

As mentioned in Section 1 and Section 2.2, the classical, sentimental, and CNN-LSTM models were designed to verify the effects of the online sentiment data and the CNN-LSTM model on the forecasting of car sales movement directions in Taiwan. Since the sentimental model was designed by adding the sentimental data to the classical model as predictor variables, the effects of multi-channel online sentiment data on car sales prediction accuracy could be reviewed by comparing the performances of the classical and sentimental models. Furthermore, since the CNN-LSTM model adopted a deep learning method integrating the CNN and LSTM networks instead of the MLR method in the sentimental model, the effects of the CNN-LSTM model on car sales prediction accuracy could be verified by comparing the performances of the sentimental and CNN-LSTM models.

Consequently, the remainder of this section will first review the confusion matrix of the testing results. Table 7, Table 8 and Table 9 show the aggregate confusion matrices of the classical, sentimental, and the CNN-LSTM models, respectively, for the six car brands.

Based on the confusion matrices, the four performance indices of accuracy, precision, recall, and F1-score were then computed and compared model by model to evaluate the effects of online sentiment data and the CNN-LSTM model on the forecasting of the car sales movement directions.

Figure 4 shows the accuracy of the classical, sentimental, and CNN-LSTM models for the six car brands. For all brands but Toyota, the prediction accuracy of the sentimental model was higher than that of the classical model; Toyota had an accuracy of 50% in both the classical and sentimental models. On average, the sentimental model (51.39%) improved accuracy by 9.72 percentage points over the classical model (41.67%) owing to the addition of multi-channel online sentiment data as predictor variables. Furthermore, the accuracies of the CNN-LSTM model for all six brands were higher than those of the sentimental model. On average, the CNN-LSTM model (69.45%) achieved an accuracy 18.06 percentage points higher than that of the sentimental model (51.39%) owing to the adoption of the CNN-LSTM network instead of the MLR method. In total, the average accuracy of the CNN-LSTM model was 27.78 percentage points higher than that of the classical model. Consequently, according to the above comparison, the accuracy of forecasting of car sales movement directions in Taiwan was indeed improved by using online sentiment data and the CNN-LSTM model.

However, accuracy is an intuitive and reliable measure only when the dataset is balanced. The dataset collected for forecasting car sales movement directions in Taiwan had an uneven class distribution. Hence, this research further examined three other measures, i.e., precision, recall, and F1-score, to evaluate the performances of the three prediction models.

Figure 5 shows the precision of the classical, sentimental, and the CNN-LSTM models for six car brands. As mentioned above, precision is a measure of the ability of a model to correctly identify the observations from the predictions of a specific car sales movement direction. As shown in Figure 5, the precision of the sentimental model for all six brands was higher than that of the classical model. On average, the precision of the sentimental model (0.55) was 0.16 higher than that of the classical model (0.38). Furthermore, the precision of the CNN-LSTM model for all but Lexus was higher than that of the sentimental model. Lexus had high precision of 0.83 in both the sentimental and the CNN-LSTM models. On average, the precision of the CNN-LSTM model (0.77) was 0.22 higher than that of the sentimental model (0.55). In total, the precision of the CNN-LSTM model was an average of 0.39 higher than that of the classical model. Consequently, according to the above comparison, the precision of the forecasting of car sales movement directions in Taiwan was effectively improved by the use of online sentiment data and the CNN-LSTM model.

Figure 6 shows the recall of the classical, sentimental, and the CNN-LSTM models for the six car brands. As mentioned above, recall is a measure of the ability of a model to correctly identify the observations of a specific car sales movement direction within the dataset. As shown in Figure 6, the recalls of the sentimental model for all six brands but TOYOTA were higher than those of the classical model. TOYOTA had a consistent recall of 0.50 in both the classical and sentimental models. On average, the recall of the sentimental model (0.51) was 0.09 higher than that of the classical model (0.42). Furthermore, the recalls of the CNN-LSTM model for all six brands were higher than those of the sentimental model. On average, the recall of the CNN-LSTM model (0.69) was 0.18 higher than that of the sentimental model (0.51). In total, the CNN-LSTM model demonstrated an average improvement in recall of 0.27 over that of the classical model. According to the above comparison, the recall of forecasting of car sales movement directions in Taiwan was improved by the use of online sentiment data and the CNN-LSTM model.

Finally, the results of F1-scores simultaneously considering precision and recall are reviewed. Figure 7 shows the F1-scores of the classical, sentimental, and the CNN-LSTM models for the six car brands. As mentioned above, the F1-score is more useful than accuracy when the dataset has an uneven class distribution. As shown in Figure 7, the F1-scores of the sentimental model for all six brands were higher than those of the classical model. On average, the F1-scores of the sentimental model (0.46) were 0.11 higher than those of the classical model (0.35). Furthermore, the F1-scores of the CNN-LSTM model for all six brands were higher than those of the sentimental model. On average, the F1-scores of the CNN-LSTM model (0.68) were 0.22 higher than those of the sentimental model (0.46). Overall, the CNN-LSTM model demonstrated an average improvement in F1-score of 0.33 over that of the classical model. Based on the above comparison, the F1-score of forecasting of car sales movement directions was improved by the use of online sentiment data and the CNN-LSTM model.

The comparisons of the four indices in the classical, sentimental, and the CNN-LSTM models have shown that the performance of forecasting of car sales movement directions in Taiwan was improved by the use of online sentiment data and the CNN-LSTM model.

However, accuracy works best when the different kinds of false predictions have similar costs. If the costs of different false predictions are very dissimilar, it is better to look at precision or recall. In this research, the costs of classifying direction D as F or U, classifying direction F as D or U, and classifying direction U as D or F were different, so precision and recall were further explored to examine the applicability of the proposed methods.
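To make the point about unequal error costs concrete, the sketch below weights each cell of a confusion matrix by a misclassification cost. Both confusion matrices and all cost values are hypothetical and chosen only for illustration; the source does not report cost figures. The only property borrowed from the discussion is that different kinds of wrong predictions are charged differently.

```python
LABELS = ["D", "F", "U"]

# Hypothetical cost of predicting column j when the actual direction is row i.
# Diagonal entries are zero because correct predictions incur no penalty.
COST = [
    [0, 3, 5],  # actual D predicted as D, F, U
    [2, 0, 2],  # actual F predicted as D, F, U
    [2, 1, 0],  # actual U predicted as D, F, U
]

# Two hypothetical models evaluated on the same 16 observations
# (6 actual D, 4 actual F, 6 actual U); rows = actual, columns = predicted.
MODEL_A = [[4, 0, 2], [0, 4, 0], [0, 0, 6]]  # errors: D predicted as U
MODEL_B = [[6, 0, 0], [0, 4, 0], [2, 0, 4]]  # errors: U predicted as D


def accuracy(matrix):
    """Share of observations on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def total_cost(matrix, cost):
    """Sum of cell counts weighted by the cost of that kind of error."""
    return sum(
        matrix[i][j] * cost[i][j]
        for i in range(len(matrix))
        for j in range(len(matrix))
    )


for name, model in (("A", MODEL_A), ("B", MODEL_B)):
    print(f"model {name}: accuracy={accuracy(model):.3f}, "
          f"cost={total_cost(model, COST)}")
```

In this toy setting the two models have identical accuracy, yet their total costs differ because they make different kinds of errors, which is why per-direction precision and recall are examined next.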

As mentioned above, precision evaluates the ability of a model to correctly predict each car sales movement direction, whereas recall evaluates the ability of a model to correctly identify the observations of each car sales movement direction in the dataset. In practice, car companies arrange their production and sales plans according to the predicted car sales movement directions, so the correctness of the predictions, i.e., the precision, has a direct impact on car inventories and sales. Thus, in the application of car sales movement direction forecasting, precision is more important than recall. The remainder of this section discusses the precision of the three models for the six car brands in greater detail.

Figure 8 illustrates the precision for direction D, Precision(D), of the classical, sentimental, and CNN-LSTM models for the six car brands. A low Precision(D) means that most of the predictions of direction D are incorrect and should have been F or U, so the planned production quantity will fall short of market demand. Improving Precision(D) can therefore prevent losses of sales and market share due to insufficient production. As shown in Figure 8, the Precision(D) values of the sentimental model were equal to or higher than those of the classical model for all six brands. On average, Precision(D) increased by 0.39, from 0.34 in the classical model to 0.73 in the sentimental model. Furthermore, the Precision(D) values of the CNN-LSTM model were equal to or higher than those of the sentimental model for all brands except VW. On average, the CNN-LSTM model raised Precision(D) by a further 0.11, from 0.73 to 0.84. Both the online sentiment data and the CNN-LSTM method thus helped to improve Precision(D) for forecasting car sales movement directions in Taiwan. Their combined effect was an improvement of 0.50 (from an average of 0.34 in the classical model to an average of 0.84 in the CNN-LSTM model), of which the contribution of the online sentiment data (0.39) was the larger part. As for VW, although Precision(D) dropped from 1.00 in the sentimental model to 0.75 in the CNN-LSTM model, the combined effect of online sentiment data and the CNN-LSTM method was still an improvement of 0.46 (from 0.29 in the classical model to 0.75 in the CNN-LSTM model).

For the precision of direction U, Figure 9 shows the Precision(U) of the three models for the six car brands. A low Precision(U) indicates that most of the predictions of direction U are incorrect and should have been F or D, so car companies will plan a production quantity that exceeds market demand. Improving Precision(U) can therefore avoid a surplus of cars. As shown in Figure 9, the Precision(U) values of the sentimental model were equal to or higher than those of the classical model for all brands except TOYOTA. On average, Precision(U) increased by 0.06, from 0.57 in the classical model to 0.63 in the sentimental model. Furthermore, the Precision(U) values of the CNN-LSTM model were equal to or higher than those of the sentimental model for all brands except Lexus. On average, Precision(U) improved by 0.10, from 0.63 in the sentimental model to 0.73 in the CNN-LSTM model. Thus, the Precision(U) of forecasting car sales movement directions in Taiwan was also somewhat improved by the use of online sentiment data and the CNN-LSTM method. Their combined effect on Precision(U) was an increase of 0.16 (from an average of 0.57 in the classical model to an average of 0.73 in the CNN-LSTM model), and the effect of the CNN-LSTM method (0.10) was a little larger than that of the online sentiment data (0.06). However, TOYOTA and Lexus did not follow this pattern. The use of online sentiment data may have interfered with Precision(U) for TOYOTA, which dropped from 1.00 in the classical model to 0.57 in the sentimental model. Similarly, the CNN-LSTM method did not raise Precision(U) for Lexus, which dropped from 1.00 in the sentimental model to 0.70 in the CNN-LSTM model.

Finally, the precision of direction F, Precision(F), of the three models for the six car brands is shown in Figure 10. A low Precision(F) indicates that most of the predictions of direction F are incorrect and should have been D or U. Because the planned production quantity will then be lower or higher than market demand, car companies may either lose sales and market share or hold excess inventory. As shown in Figure 10, the Precision(F) values of the sentimental model were equal to or higher than those of the classical model for all brands except BENZ. On average, Precision(F) increased by 0.15, from 0.07 in the classical model to 0.22 in the sentimental model. By comparison, the Precision(F) values of the CNN-LSTM model were higher than those of the sentimental model for all six brands, and the average improvement was much larger: 0.57, from 0.22 in the sentimental model to 0.79 in the CNN-LSTM model. Although both the online sentiment data and the CNN-LSTM model contributed to the improvement of Precision(F), the effect of the CNN-LSTM method was particularly large. As for BENZ, although Precision(F) dropped from 0.43 in the classical model to 0.00 in the sentimental model, the combined effect of online sentiment data and the CNN-LSTM method raised Precision(F) from 0.43 in the classical model to 0.67 in the CNN-LSTM model.

The above analysis shows that, on average, the precisions of the three directions, Precision(D), Precision(F), and Precision(U), all improved step by step from the classical model to the sentimental model and then to the CNN-LSTM model. The results therefore indicate that the online sentiment data and the CNN-LSTM method improved the precision of forecasting directions D, F, and U of car sales in Taiwan.

However, the extent of the effects of the online sentiment data and the CNN-LSTM method on Precision(D), Precision(F), and Precision(U) differed. The contribution of the online sentiment data to Precision(U) was 0.06, while that of the CNN-LSTM method was 0.10. That is, Precision(U) gained only a little improvement from either the online sentiment data or the CNN-LSTM method. For Precision(D), in contrast, the contributions of the online sentiment data and the CNN-LSTM method were 0.39 and 0.11, respectively. Thus, Precision(D) gained a larger improvement from the online sentiment data than from the CNN-LSTM method. Furthermore, Precision(F) gained only 0.15 from the online sentiment data but 0.57 from the CNN-LSTM method, so most of the improvement in Precision(F) resulted from the CNN-LSTM method. A further discussion of how the online sentiment data and the CNN-LSTM method affected Precision(D), Precision(F), and Precision(U) follows.
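The decomposition described above is simple arithmetic on the average precisions quoted for Figures 8 to 10, and can be checked with the short Python sketch below. The variable and function names are illustrative only; the nine input values are the averages reported in the text.

```python
# Average precision per direction, as reported in the text for the
# classical, sentimental, and CNN-LSTM models (in that order).
AVERAGE_PRECISION = {
    "D": (0.34, 0.73, 0.84),
    "F": (0.07, 0.22, 0.79),
    "U": (0.57, 0.63, 0.73),
}


def decompose(classical, sentimental, cnn_lstm):
    """Split the total gain into the two incremental steps."""
    from_sentiment = round(sentimental - classical, 2)  # adding sentiment data
    from_cnn_lstm = round(cnn_lstm - sentimental, 2)    # replacing MLR by CNN-LSTM
    total = round(cnn_lstm - classical, 2)              # combined effect
    return from_sentiment, from_cnn_lstm, total


gains = {d: decompose(*v) for d, v in AVERAGE_PRECISION.items()}
for direction, (s, c, t) in gains.items():
    print(f"Precision({direction}): sentiment +{s:.2f}, "
          f"CNN-LSTM +{c:.2f}, total +{t:.2f}")
```

The two incremental gains for each direction add up to the combined improvement of the CNN-LSTM model over the classical model.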

As mentioned in Section 2.1, this research collected and analyzed three types of online sentiment volumes, i.e., positive (P), negative (N), and total (T), to predict the car sales movement direction for six car brands. Theoretically, positive sentiment consists of consumer responses indicating satisfaction, and negative sentiment consists of consumer responses suggesting dissatisfaction. A larger positive sentiment volume in the previous months indicates an increased likelihood of a sales movement direction of U in the next month. Conversely, a larger negative sentiment volume in the previous months increases the likelihood of a sales movement direction of D in the next month.

In practice, the originators and motives of eWOMs and online sentiment influence consumer intentions and decisions, which in turn affect the overall performance of sales movement direction forecasting. The main originators of eWOMs and online sentiment are customers and the industry. The online sentiments of customers who offer genuine praise (positive eWOMs) or complaints (negative eWOMs) may directly affect the sales movement directions of subsequent periods. The online sentiments from the industry, on the other hand, include eWOMs written by the companies themselves, their partners, or their competitors. To promote its products, a company and its partners may intentionally leave positive eWOMs or sentiments on social media or blogs. Conversely, competitors may write negative eWOMs as malicious attacks. Therefore, most of the eWOMs or online sentiments from the industry may be false and could interfere with sales movement direction forecasting.

In the car market, safety and after-sales service play key roles in shaping consumer attitudes and perceptions towards car purchases. Safety and after-sales service experiences also drive customer satisfaction and eWOMs. Over the years, car companies have striven to target and attract specific customer segments through automotive design and services. Meanwhile, consumers become loyal customers of a particular automotive brand on the basis of their experiences and personal preferences. These loyal customers are also willing to write positive eWOMs and share their positive sentiment and experiences with the public via news, blogs, and social media. In contrast, customers who have poor experiences with a car brand will leave negative eWOMs and sentiments on the Internet.

As a result, in normal times, a car brand will have stable volumes of positive and negative eWOMs and sentiments from satisfied and dissatisfied customers, respectively. These steady volumes of positive and negative customer sentiment should, in theory, help to predict the car sales movement direction F accurately. However, the car company may sometimes write positive eWOMs and sentiments to respond to the negative eWOMs and sentiments of customers. Because these positive eWOMs and sentiments do not come from actual customers, they interfere with the prediction of direction F. This interference may explain why the online sentiment volume did not contribute much to the improvement of Precision(F), increasing it only from 0.07 in the classical model to 0.22 in the sentimental model on average. The CNN-LSTM method used in this research, however, appears able to filter out much of the noise from industry-originated eWOMs and sentiment volumes, raising Precision(F) from an average of 0.22 in the sentimental model to an average of 0.79 in the CNN-LSTM model, because it applies the CNN's image processing ability to the online sentiment data and the LSTM's time series processing ability to the historical car sales data.

When a car manufacturer launches a new car, the company will actively stimulate online discussion of the merits of the new car. As a result, positive eWOMs and sentiment will continue to grow for a period of time. In the early stage of the introduction of a new car on the market, increases in positive eWOMs and online sentiment are usually accompanied by growth in new car sales. However, whether subsequent increases in positive eWOMs and online sentiment continue to stimulate new car sales depends on consumers’ budget constraints and acceptance of the car brand. As mentioned above, consumers usually accept and prefer only a few brands. Therefore, in the late stage of the introduction of a new car, positive eWOMs and online sentiment may continue to increase without leading to an increase in car sales. In other words, increases in positive eWOMs and online sentiment sometimes lead to increases in car sales and sometimes do not. This reduces the effectiveness of the online sentiment data and the CNN-LSTM method, and it may explain why Precision(U) gained only small improvements from the online sentiment data (0.06, from 0.57 in the classical model to 0.63 in the sentimental model) and the CNN-LSTM method (0.10, from 0.63 in the sentimental model to 0.73 in the CNN-LSTM model).

In contrast, if customers are dissatisfied with the quality or service of a car brand, the volume of negative eWOMs and online sentiment will increase, and negative experiences are usually communicated faster than positive experiences. Although the car company may write positive eWOMs and sentiments to respond to the negative eWOMs and sentiments from customers, car sales can still decline to varying degrees. If the complaints are related to serious safety issues or design defects, the car company will be more active in issuing a public recall notification instructing customers to return their cars for free repairs. At this point, the volume of negative eWOMs and online sentiment will increase significantly, and potential consumers will turn away from the car brand. This may be why adding online sentiment data as predictor variables resulted in a large improvement of 0.39 in Precision(D), from 0.34 in the classical model to 0.73 in the sentimental model. Because the online sentiment data had already contributed so much, the deep learning ability of the CNN-LSTM method added only a further 0.11 to Precision(D), from 0.73 in the sentimental model to 0.84 in the CNN-LSTM model.

To sum up, both the online sentiment data and the CNN-LSTM method improved the precision of the three car sales movement directions, D, F, and U. As to the computational burden, the three models were executed on a PC with an Intel Core i7-6700 CPU @ 3.40 GHz × 4, 24 GB of RAM, and an NVIDIA GeForce GTX 1060 6 GB GPU. The total running times of the classical, sentimental, and CNN-LSTM models for each brand were about 1, 2, and 45 min, respectively. The CNN-LSTM model takes longer to train, but once it is trained, the time required for subsequent testing and actual prediction is not much different from that of the other two models. Although the CNN-LSTM model requires more training time, its better prediction performance makes it worthwhile overall.

4. Conclusions

With the explosive growth of social media and emerging forecasting methods, the efforts to improve the performance of car sales forecasting should consider the adoption of eWOM, online sentiment data, and some deep learning techniques. The purpose of this research was to improve the overall performance of forecasting of car sales movement directions in Taiwan by using online sentiment data and the CNN-LSTM method. This research selected the car sales movement direction as the predicted object, and it was defined by the sales growth rate of the next month and the predefined threshold. Therefore, in practical application, if a threshold is set at 10% and the car sales movement direction predicted by the CNN-LSTM model is up (U) or down (D), the car company can adjust the production and sales of the next month upward or downward by 10%. To verify the effects of online sentiment data and the CNN-LSTM method on the forecasting of car sales movement directions in Taiwan, three forecasting models, namely, the classical model, the sentimental model, and the CNN-LSTM model, were constructed and compared.
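The labeling rule described above can be sketched as follows. The exact formula is an assumption made for illustration: the growth rate is taken as (next month's sales minus current month's sales) divided by current month's sales, and a month is labeled U when the rate exceeds the threshold, D when it falls below the negative threshold, and F otherwise. The sales figures are the first six BMW records of Table 1; the function name and the 10% default threshold are illustrative.

```python
def label_directions(sales, threshold=0.10):
    """Label each month by the growth rate of the following month.

    Assumed rule: U if growth > threshold, D if growth < -threshold,
    F otherwise. The last month gets no label because it has no successor.
    """
    labels = []
    for current, nxt in zip(sales, sales[1:]):
        growth = (nxt - current) / current
        if growth > threshold:
            labels.append("U")
        elif growth < -threshold:
            labels.append("D")
        else:
            labels.append("F")
    return labels


# BMW monthly sales, 2014/1 to 2014/6 (Table 1).
bmw_sales = [1437, 624, 1294, 1393, 1284, 1424]
bmw_labels = label_directions(bmw_sales)
print(bmw_labels)
```

Because each label depends on the following month's sales, the final month of a series cannot be labeled without one additional record, which is consistent with the extra 2020/1 row kept in Table 1 for labeling.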

The results showed that the use of both online sentiment data and the CNN-LSTM method led to substantial improvements over the classical model in accuracy (27.78 percentage points, from 41.67% to 69.45%), precision (0.39, from 0.38 to 0.77), recall (0.27, from 0.42 to 0.69), and F1-score (0.33, from 0.35 to 0.68). The overall performance of forecasting car sales movement directions in Taiwan can therefore be effectively improved by the use of online sentiment data and the CNN-LSTM model.

Furthermore, because car companies use predictions of car sales movement directions to arrange their production and sales plans, precision directly affects both car inventories and sales, so it was explored in more detail. The results showed that the model with the online sentiment data and the CNN-LSTM method improved Precision(U), Precision(D), and Precision(F) by 0.16 (from 0.57 to 0.73), 0.50 (from 0.34 to 0.84), and 0.72 (from 0.07 to 0.79), respectively.

In practice, if direction U is wrongly predicted as direction F or D, the planned production quantity will fall short of market demand and may incur some opportunity cost, but the impact is not significant: since loyal customers may simply delay their orders, there is little loss of sales volume or market share. In contrast, if direction D is wrongly predicted as direction F or U, the resulting cost will be high. Since the planned production quantity will exceed market demand, automobile manufacturers may hold excess inventory, which can tie up capital or force them to sell at a reduced price. Therefore, improvements in Precision(U), Precision(F), and Precision(D) will be of great benefit to car companies in developing effective production and sales plans.

In addition, many deep learning models have been developed, and some of them could be applied to this problem. For example, the LSTM-vanilla model, which only adds a peephole connection to the classic LSTM, is similar to the LSTM used in the CNN-LSTM model of this research. In the future, the LSTM-vanilla model may be used to improve the prediction performance and be compared with the CNN-LSTM model proposed in this research.

Author Contributions

Conceptualization, C.O.-Y. and Y.-C.J.; Methodology, C.O.-Y., S.-C.C., and Y.-C.J.; Software, S.-C.C.; Validation, Y.-C.J.; Formal analysis, C.O.-Y., S.-C.C., and Y.-C.J.; Investigation, S.-C.C.; Resources, C.O.-Y.; Data curation, S.-C.C.; Writing—original draft preparation, S.-C.C.; Writing—review and editing, C.O.-Y. and Y.-C.J.; Visualization, S.-C.C.; Supervision, C.O.-Y. and Y.-C.J.; Project administration, C.O.-Y.; Funding acquisition, C.O.-Y. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Ministry of Science and Technology of the Republic of China, Taiwan, under contract no. MOST 109-2221-E-011-102.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are not publicly available due to a confidentiality agreement.

Acknowledgments

The authors would like to thank the Ministry of Science and Technology of the Republic of China, Taiwan, for financially supporting this research. The authors also gratefully acknowledge the support of OpView Insight for this work.

Conflicts of Interest

The authors declare that there is no conflict of interest.

References

Stevenson, W.J. Operations Management, 12th ed.; Irwin/McGraw-Hill: New York, NY, USA, 2015.
Fan, Z.-P.; Che, Y.-J.; Chen, Z.-Y. Product sales forecasting using online reviews and historical sales data: A method combining the Bass model and sentiment analysis. J. Bus. Res. 2017, 74, 90–100.
Brühl, B.; Hülsmann, M.; Borscheid, D.; Friedrich, C.M.; Reith, D. A sales forecast model for the German automobile market based on time series analysis and data mining methods. In Proceedings of the 2009 Industrial Conference on Data Mining, Leipzig, Germany, 20–22 July 2009; pp. 146–160.
Wang, F.-K.; Chang, K.-K.; Tzeng, C.-W. Using adaptive network-based fuzzy inference system to forecast automobile sales. Expert Syst. Appl. 2011, 38, 10587–10593.
Hülsmann, M.; Borscheid, D.; Friedrich, C.M. General sales forecast models for automobile markets and their analysis. Trans. Mach. Learn. Data Min. 2012, 5, 65–86.
Pierdzioch, C.; Rülke, J.-C.; Stadtmann, G. Forecasting U.S. car sales and car registrations in Japan: Rationality, accuracy and herding. Jpn. World Econ. 2011, 23, 253–258.
Fantazzini, D.; Toktamysova, Z. Forecasting German car sales using Google data and multivariate models. Int. J. Prod. Econ. 2015, 170, 97–135.
Liu, G.; Long, B. The research on Combination forecasting model of the automobile sales forecasting system. In Proceedings of the 2009 International Forum on Computer Science-Technology and Applications, Chongqing, China, 25–27 December 2009; pp. 82–85.
López, M.; Sicilia, M. Determinants of E-WOM Influence: The Role of Consumers’ Internet Experience. J. Theor. Appl. Electron. Commer. Res. 2014, 9, 7–8.
Sa’ait, N.; Kanyan, A.; Nazrin, M.F. The effect of e-WOM on customer purchase intention. Int. Acad. Res. J. Soc. Sci. 2016, 2, 73–80.
Bataineh, A.Q. The Impact of Perceived e-WOM on Purchase Intention: The Mediating Role of Corporate Image. Int. J. Mark. Stud. 2015, 7, 126.
Singh, A.; Jenamani, M.; Thakkar, J.J.; Rana, N.P. Quantifying the effect of eWOM embedded consumer perceptions on sales: An integrated aspect-level sentiment analysis and panel data modeling approach. J. Bus. Res. 2022, 138, 52–64.
Filieri, R.; Lin, Z.; Pino, G.; Alguezaui, S.; Inversini, A. The role of visual cues in eWOM on consumers’ behavioral intention and decisions. J. Bus. Res. 2021, 135, 663–675.
Kaur, K.; Singh, T. Impact of Online Consumer Reviews on Amazon Books Sales: Empirical Evidence from India. J. Theor. Appl. Electron. Commer. Res. 2021, 16, 2793–2807.
Hennig-Thurau, T.; Gwinner, K.P.; Walsh, G.; Gremler, D.D. Electronic word-of-mouth via consumer-opinion platforms: What motivates consumers to articulate themselves on the Internet? J. Interact. Mark. 2004, 18, 38–52.
Wu, S.-J.; Chiang, R.-D.; Chang, H.-C. Applying sentiment analysis in social web for smart decision support marketing. J. Ambient. Intell. Humaniz. Comput. 2018, 1–10.
Lee, I. A study of the effect of social shopping deals on online reviews. Ind. Manag. Data Syst. 2017, 117, 2227–2240.
Ren, R.; Wu, D.D.; Liu, T. Forecasting Stock Market Movement Direction Using Sentiment Analysis and Support Vector Machine. IEEE Syst. J. 2019, 13, 760–770.
Thiesing, F.M.; Vornberger, O. Sales forecasting using neural networks. In Proceedings of the International Conference on Neural Networks (ICNN’97), Houston, TX, USA, 12–12 June 1997; Volume 2124, pp. 2125–2128.
Chong, A.Y.L.; Li, B.Y.; Ngai, E.W.T.; Ch’ng, E.; Lee, F. Predicting online product sales via online reviews, sentiments, and promotion strategies A big data architecture and neural network approach. Int. J. Oper. Prod. Manag. 2016, 36, 358–385.
Hu, N.; Koh, N.S.; Reddy, S.K. Ratings lead you to the product, reviews help you clinch it? The mediating role of online review sentiments on product sales. Decis. Support Syst. 2014, 57, 42–53.
Moon, S.-H.; Kim, Y.-H. An improved forecast of precipitation type using correlation-based feature selection and multinomial logistic regression. Atmos. Res. 2020, 240, 104928.
Upadhyay, A.; Bandyopadhyay, G.; Dutta, A. Forecasting Stock Performance in Indian Market using Multinomial Logistic Regression. J. Bus. Stud. Q. 2012, 3, 16.
Loureiro, A.L.D.; Miguéis, V.L.; Da Silva, L.F.M. Exploring the use of deep neural networks for sales forecasting in fashion retail. Decis. Support Syst. 2018, 114, 81–93.
Yu, Q.; Wang, K.; Strandhagen, J.O.; Wang, Y. Application of Long Short-Term Memory Neural Network to Sales Forecasting in Retail—A Case Study. In Lecture Notes in Electrical Engineering; Springer: Singapore, 2018; pp. 11–17.
Kiranyaz, S.; Ince, T.; Hamila, R.; Gabbouj, M. Convolutional Neural Networks for patient-specific ECG classification. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015.
Manowska, A. Using the LSTM Network to Forecast the Demand for Electricity in Poland. Appl. Sci. 2020, 10, 8455.
Huang, W.; Nakamori, Y.; Wang, S.-Y. Forecasting stock market movement direction with support vector machine. Comput. Oper. Res. 2005, 32, 2513–2522.
MOTC. Available online: https://stat.motc.gov.tw/mocdb/stmain.jsp?sys=100&funid=defjsp# (accessed on 28 September 2020).
Wang, Y. A multinomial logistic regression modeling approach for anomaly intrusion detection. Comput. Secur. 2005, 24, 662–674.
OpView Insight: Social Media Monitoring Tool. Available online: https://www.opview.com.tw/ (accessed on 15 August 2020).
Lu, H.-K.; Yang, L.-W.; Lin, P.-C.; Yang, T.-H.; Chen, A.N. A study on adoption of bitcoin in Taiwan: Using Big Data Analysis of Social Media. In Proceedings of the 3rd International Conference on Communication and Information Processing, ICCIP ‘17, Tokyo, Japan, 24–26 November 2017.
Hosmer Jr, D.W.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; John Wiley & Sons: Hoboken, NJ, USA, 2013; Volume 398.
Livieris, I.E.; Pintelas, E.; Pintelas, P. A CNN–LSTM model for gold price time-series forecasting. Neural Comput. Appl. 2020, 32, 17351–17360.
Yenter, A.; Verma, A. Deep CNN-LSTM with combined kernels from multiple branches for IMDb review sentiment analysis. In Proceedings of the 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), New York, NY, USA, 19–21 October 2017; pp. 540–546.
Sezer, O.B.; Ozbayoglu, A.M. Algorithmic financial trading with deep convolutional neural networks: Time series to image conversion approach. Appl. Soft Comput. 2018, 70, 525–538.
Kiranyaz, S.; Avci, O.; Abdeljaber, O.; Ince, T.; Gabbouj, M.; Inman, D.J. 1D Convolutional Neural Networks and Applications: A Survey. arXiv 2019, arXiv:1905.03554.
Ma, Y.; Han, R.; Wang, W. Prediction-Based Portfolio Optimization Models Using Deep Neural Networks. IEEE Access 2020, 8, 115393–115405.
Jie, Z.; YAN, J.-f.; Lu, Y.; Meng, W.; Peng, X. Customer Churn Prediction Model Based on LSTM and CNN in Music Streaming. DEStech Trans. Eng. Technol. Res. 2019, aemce, 254–261.
Hand, D.; Christen, P. A note on using the F-measure for evaluating record linkage algorithms. Stat. Comput. 2018, 28, 539–547.

Figure 1. Research framework.

Figure 2. Summary of the predicted dataset.

Figure 3. The architecture of the CNN-LSTM model.

Figure 4. The accuracy of the classical, sentimental, and the CNN-LSTM models for six car brands.

Figure 5. The precision of the classical, sentimental, and the CNN-LSTM models for six car brands.

Figure 6. The recall of the classical, sentimental, and the CNN-LSTM models for six car brands.

Figure 7. The F1-score of the classical, sentimental, and the CNN-LSTM models for six car brands.

Figure 8. The precision of direction D of the classical, sentimental, and the CNN-LSTM models for six car brands.

Figure 9. The precision of direction U of the classical, sentimental, and the CNN-LSTM models for six car brands.

Figure 10. The precision of direction F of the classical, sentimental, and the CNN-LSTM models for six car brands.

Table 1. The collected historical monthly car sales volumes.

No	Month	Brand (Car Sales)
		BMW	Lexus	Mazda	Benz	Toyota	VW
		Sales	Sales	Sales	Sales	Sales	Sales
1	2014/1	1437	1207	691	1815	2471	897
2	2014/2	624	806	231	1033	957	551
3	2014/3	1294	971	477	1588	1653	1053
4	2014/4	1393	1039	452	1692	1943	1154
5	2014/5	1284	1310	466	1518	1783	1113
6	2014/6	1424	1131	319	1620	2220	1414
7	2014/7	1469	1481	618	1616	2631	959
8	2014/8	1190	593	325	1389	983	914
9	2014/9	1454	1193	444	1784	1615	1438
10	2014/10	1630	1334	298	1695	1645	1313
11	2014/11	1565	1094	434	1754	2031	795
12	2014/12	2135	1148	517	1796	2805	1329
13	2015/1	2040	787	1970	2086	2468	1095
14	2015/2	1016	1193	609	1219	1575	720

24	2015/12	1718	1634	1859	2024	2721	773
25	2016/1	2045	1379	1697	2554	2003	980

70	2019/10	1658	2233	1418	3037	5712	1118
71	2019/11	1461	2556	1553	2555	5120	1103
72	2019/12	1927	2348	1857	2721	4405	1399
*	2020/1	1355	2687	1122	2490	5686	1232

* The 2020/1 record is included only so that the 2019/12 movement direction can be labeled.

Table 2. The collected daily online sentiment volume for BMW.

Date	Discussion Forums			Social Media			Q&A Websites			Blogs			News
	P *	N	T	P	N	T	P	N	T	P	N	T	P	N	T
2014/1/1	32	18	98	6	1	32	0	0	0	5	2	11	6	1	9
2014/1/2	31	24	128	4	2	26	0	0	0	4	2	10	19	1	42
2014/1/3	35	24	101	9	6	50	0	0	0	3	2	15	10	3	29

2014/1/29	23	26	109	4	7	43	3	1	5	2	1	9	3	16	40
2014/1/30	19	20	87	6	1	27	0	0	0	2	2	9	28	0	42
2014/1/31	40	9	73	27	3	63	2	1	4	0	2	4	13	0	35
2014/2/1	12	15	59	20	6	56	4	0	7	0	1	4	1	0	12
2014/2/2	20	18	111	7	2	27	0	0	1	0	1	3	4	0	10

2014/2/28	31	12	91	6	5	29	0	2	4	6	1	13	4	1	12
2014/3/1	37	18	119	4	12	27	0	1	5	5	1	9	3	0	5
2014/3/2	19	20	85	3	2	16	0	1	4	6	0	11	6	0	11
2014/3/3	17	17	92	6	12	43	1	1	4	6	0	23	21	24	98
2014/3/4	33	40	169	8	16	51	0	0	1	6	0	21	16	22	67

2019/12/28	40	65	244	95	29	967	0	0	0	3	0	4	86	25	154
2019/12/29	44	31	138	104	70	1483	0	0	0	0	0	1	24	13	48
2019/12/30	38	36	138	230	44	1758	0	0	0	2	0	3	49	8	90
2019/12/31	18	19	86	320	40	1911	0	0	0	3	0	3	48	0	65

* P: Positive sentiment volume, N: Negative sentiment volume, T: Total sentiment volume including the volume of positive, negative, and neutral sentiments.

Table 3. The labeling results of car sales movement directions for Table 1.

No	Month	Brand (Car Sales & Label)
		BMW		Lexus		Mazda		Benz		Toyota		VW
		Sales	Label	Sales	Label	Sales	Label	Sales	Label	Sales	Label	Sales	Label
1	2014/1	1437	D	1207	D	691	D	1815	D	2471	D	897	D
2	2014/2	624	U	806	U	231	U	1033	U	957	U	551	U
3	2014/3	1294	F	971	F	477	F	1588	F	1653	U	1053	F
4	2014/4	1393	F	1039	U	452	F	1692	D	1943	F	1154	F
5	2014/5	1284	U	1310	D	466	D	1518	F	1783	U	1113	U
6	2014/6	1424	F	1131	U	319	U	1620	F	2220	U	1414	D
7	2014/7	1469	D	1481	D	618	D	1616	D	2631	D	959	F
8	2014/8	1190	U	593	U	325	U	1389	U	983	U	914	U
9	2014/9	1454	U	1193	U	444	D	1784	F	1615	F	1438	F
10	2014/10	1630	F	1334	D	298	U	1695	F	1645	U	1313	D
11	2014/11	1565	U	1094	F	434	U	1754	F	2031	U	795	U
12	2014/12	2135	F	1148	D	517	U	1796	U	2805	D	1329	D
13	2015/1	2040	D	787	U	1970	D	2086	D	2468	D	1095	D
14	2015/2	1016	U	1193	F	609	U	1219	U	1575	F	720	U

24	2015/12	1718	U	1634	D	1859	F	2024	U	2721	D	773	U
25	2016/1	2045	D	1379	D	1697	D	2554	D	2003	D	980	D

70	2019/10	1658	D	2233	U	1418	F	3037	D	5712	D	1118	F
71	2019/11	1461	U	2556	F	1553	U	2555	F	5120	D	1103	U
72	2019/12	1927	D	2348	U	1857	D	2721	F	4405	U	1399	D
*	2020/1	1355		2687		1122		2490		5686		1232

* The 2020/1 record is included only so that the 2019/12 movement direction can be labeled.
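
The labels in Table 3 can be reproduced with a simple rule on the month-to-month relative change. The following is a minimal sketch, not the authors' code: the function name `label_directions` and the ±10% cut-off are assumptions, chosen only because they reproduce the BMW labels visible in Table 3; the labeling rule actually used is the one defined in Section 2.1.2 and may differ.

```python
from typing import List


def label_directions(sales: List[int], threshold: float = 0.10) -> List[str]:
    """Label each month by the relative change in sales to the following month.

    `threshold` is an ASSUMED cut-off (10%), not a value quoted from the paper.
    Returns one label per month except the last, which has no successor.
    """
    labels = []
    for current, following in zip(sales, sales[1:]):
        change = (following - current) / current
        if change > threshold:
            labels.append("U")  # up
        elif change < -threshold:
            labels.append("D")  # down
        else:
            labels.append("F")  # flat
    return labels


# BMW monthly sales for 2014/1 to 2015/1, copied from Table 1.
bmw_sales = [1437, 624, 1294, 1393, 1284, 1424, 1469,
             1190, 1454, 1630, 1565, 2135, 2040]
bmw_labels = label_directions(bmw_sales)
print("".join(bmw_labels))  # labels for 2014/1 to 2014/12
```

Under this assumed threshold, the twelve labels agree with the BMW column of Table 3 for 2014/1 to 2014/12, and the extra 2020/1 record plays the same role for 2019/12 as the 2015/1 value does here for 2014/12.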

Table 4. The dataset of the classical model for BMW.

* The 2013 records are included only to derive the predictor variables.

Table 5. The dataset of the sentimental model for BMW.

No (i)	Month	Historical Sales (x_i1)	Same-Month-Last Year Label (x_i2)	Month Number (x_i3)	Discussion Forums			Social Media			Q&A Websites			Blogs			News
					P (x_i4)	N (x_i5)	T (x_i6)	P (x_i7)	N (x_i8)	T (x_i9)	P (x_i10)	N (x_i11)	T (x_i12)	P (x_i13)	N (x_i14)	T (x_i15)	P (x_i16)	N (x_i17)	T (x_i18)
1	2014/1	1437	D	1	1078	725	3896	218	177	1503	20	8	80	80	49	326	451	162	1179
2	2014/2	624	U	2	1043	895	4214	274	130	1476	16	15	127	86	50	323	356	89	858
3	2014/3	1294	U	3	937	799	3763	217	169	1406	12	21	101	63	26	276	360	175	1052
4	2014/4	1393	F	4	1034	843	4003	388	291	2909	18	15	94	60	25	263	423	147	1179
5	2014/5	1284	F	5	1555	960	5082	686	387	3280	12	12	69	123	33	395	396	220	1194
6	2014/6	1424	U	6	1221	915	4336	368	301	2353	9	8	90	170	31	510	486	106	1060
7	2014/7	1469	D	7	942	792	3767	640	677	4279	11	9	73	181	36	540	420	146	1195
8	2014/8	1190	U	8	965	1013	4381	584	393	3768	14	57	115	175	29	452	413	342	1555
9	2014/9	1454	F	9	955	881	4019	372	275	2914	10	29	101	196	55	520	307	158	963
10	2014/10	1630	U	10	1090	827	4171	323	261	2237	6	15	46	216	36	547	390	274	1243
11	2014/11	1565	U	11	1194	1189	5285	365	570	2719	17	14	79	247	52	559	661	1066	3361
12	2014/12	2135	D	12	1115	956	4547	289	198	2222	16	48	151	494	83	1322	442	226	1503

70	2019/10	1658	U	10	1008	1009	5276	2839	2668	24048	0	1	3	19	8	79	780	549	3118
71	2019/11	1461	F	11	1185	1697	6615	2765	1826	26335	0	0	0	36	12	74	1308	905	3536
72	2019/12	1927	F	12	1047	1323	5040	4518	2858	33305	0	0	0	56	12	97	1521	830	3705
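
The sentiment columns in Table 5 appear to be monthly totals of the daily volumes in Table 2. The sketch below shows that aggregation step under this assumption; the function name `monthly_totals` and the tuple layout are hypothetical, and only five daily rows from Table 2 are used, so the printed sums are partial and are not expected to equal the Table 5 values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (date, positive, negative, total) for the BMW discussion-forum channel.
# These five rows are copied from Table 2; the remaining days are omitted.
daily_rows: List[Tuple[str, int, int, int]] = [
    ("2014/1/1", 32, 18, 98),
    ("2014/1/2", 31, 24, 128),
    ("2014/1/3", 35, 24, 101),
    ("2014/2/1", 12, 15, 59),
    ("2014/2/2", 20, 18, 111),
]


def monthly_totals(
    rows: List[Tuple[str, int, int, int]]
) -> Dict[str, Tuple[int, int, int]]:
    """Sum daily (P, N, T) volumes per calendar month (ASSUMED aggregation)."""
    totals = defaultdict(lambda: [0, 0, 0])
    for date, positive, negative, total in rows:
        year, month, _day = date.split("/")
        key = f"{year}/{month}"
        totals[key][0] += positive
        totals[key][1] += negative
        totals[key][2] += total
    return {month: tuple(values) for month, values in totals.items()}


for month, (p, n, t) in monthly_totals(daily_rows).items():
    print(month, p, n, t)
```

Repeating the same summation for each of the five channels would yield the fifteen sentiment predictors x_i4 to x_i18 for a given month.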

Table 6. The confusion matrix for car sales movement direction prediction.

Number of Predictions for Car Sales Movement Direction *		Predicted			Total
		D	F	U	
Actual	D	DD	DF	DU	A(D)
	F	FD	FF	FU	A(F)
	U	UD	UF	UU	A(U)
Total		P(D)	P(F)	P(U)	N

* DD, FF, UU: The number of correct predictions for car sales movement directions D, F, and U, respectively; FD: The number of actual directions F predicted as direction D; UD: The number of actual directions U predicted as direction D; DF: The number of actual directions D predicted as direction F; UF: The number of actual directions U predicted as direction F; DU: The number of actual directions D predicted as direction U; FU: The number of actual directions F predicted as direction U.
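
Using the cell names of Table 6, the four performance indices can be written out for one class. The block below gives the standard textbook definitions for direction D (F and U are analogous); it is a sketch under that assumption and does not state how the per-class values are averaged across classes or brands.

```latex
\begin{align}
\text{Accuracy} &= \frac{DD + FF + UU}{N}, \\
\text{Precision}_{D} &= \frac{DD}{P(D)}, \qquad
\text{Recall}_{D} = \frac{DD}{A(D)}, \\
F1_{D} &= \frac{2 \cdot \text{Precision}_{D} \cdot \text{Recall}_{D}}
               {\text{Precision}_{D} + \text{Recall}_{D}}.
\end{align}
```

Here P(D) = DD + FD + UD is the column total of predictions labeled D, and A(D) = DD + DF + DU is the row total of actual D cases.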

Table 7. The aggregate confusion matrix of the classical model for six car brands.

Number of Predictions for Car Sales Movement Direction		Predicted			Total
		D	F	U	
Actual	D	14	3	4	21
	F	12	3	6	21
	U	15	2	13	30
Total		41	8	23	72

Table 8. The aggregate confusion matrix of the sentimental model for six car brands.

Number of Predictions for Car Sales Movement Direction *		Predicted			Total
		D	F	U	
Actual	D	10	4	7	21
	F	5	6	10	21
	U	1	8	21	30
Total		16	18	38	72

* DD, FF, UU: The number of correct predictions for car sales movement directions D, F, and U, respectively; FD: The number of actual directions F predicted as direction D; UD: The number of actual directions U predicted as direction D; DF: The number of actual directions D predicted as direction F; UF: The number of actual directions U predicted as direction F; DU: The number of actual directions D predicted as direction U; FU: The number of actual directions F predicted as direction U.

Table 9. The aggregate confusion matrix of the CNN-LSTM model for six car brands.

Number of Predictions for Car Sales Movement Direction *		Predicted			Total
		D	F	U	
Actual	D	14	3	4	21
	F	3	10	8	21
	U	1	3	26	30
Total		18	16	38	72

* DD, FF, UU: The number of correct predictions for car sales movement directions D, F, and U, respectively; FD: The number of actual directions F predicted as direction D; UD: The number of actual directions U predicted as direction D; DF: The number of actual directions D predicted as direction F; UF: The number of actual directions U predicted as direction F; DU: The number of actual directions D predicted as direction U; FU: The number of actual directions F predicted as direction U.
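
The aggregate confusion matrices in Tables 7 to 9 can be checked numerically. The sketch below copies the three matrices and computes overall accuracy plus per-class precision, recall, and F1-score using the standard definitions; the names `MATRICES`, `accuracy`, and `per_class` are hypothetical, and the paper's own averaging across classes or brands is not reproduced here.

```python
from typing import Dict, List

LABELS = ["D", "F", "U"]

# Rows are actual directions, columns are predicted directions,
# both in the order D, F, U (values copied from Tables 7, 8, and 9).
MATRICES: Dict[str, List[List[int]]] = {
    "classical": [[14, 3, 4], [12, 3, 6], [15, 2, 13]],
    "sentimental": [[10, 4, 7], [5, 6, 10], [1, 8, 21]],
    "cnn_lstm": [[14, 3, 4], [3, 10, 8], [1, 3, 26]],
}


def accuracy(matrix: List[List[int]]) -> float:
    """Share of predictions on the diagonal (DD + FF + UU) out of N."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def per_class(matrix: List[List[int]]) -> Dict[str, Dict[str, float]]:
    """Precision, recall, and F1-score for each direction."""
    scores = {}
    for i, label in enumerate(LABELS):
        predicted = sum(row[i] for row in matrix)  # column total, e.g. P(D)
        actual = sum(matrix[i])                    # row total, e.g. A(D)
        precision = matrix[i][i] / predicted if predicted else 0.0
        recall = matrix[i][i] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


for name, matrix in MATRICES.items():
    print(f"{name}: accuracy = {accuracy(matrix):.2%}")
    for label, s in per_class(matrix).items():
        print(f"  {label}: precision={s['precision']:.2f} "
              f"recall={s['recall']:.2f} f1={s['f1']:.2f}")
```

The diagonal sums are 30, 37, and 50 out of 72 predictions for the classical, sentimental, and CNN-LSTM models, respectively, which corresponds to the rise in accuracy from 41.67% to roughly 69.4% reported in the abstract.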

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

