Article

Improving the Forecasting Performance of Taiwan Car Sales Movement Direction Using Online Sentiment Data and CNN-LSTM Model

1 Department of Industrial Management, National Taiwan University of Science and Technology (Taiwan Tech), Taipei 106, Taiwan
2 Department of Industrial Engineering and Management, Ming Chi University of Technology, New Taipei City 243, Taiwan
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(3), 1550; https://doi.org/10.3390/app12031550
Submission received: 24 December 2021 / Revised: 24 January 2022 / Accepted: 28 January 2022 / Published: 31 January 2022
(This article belongs to the Collection Methods and Applications of Data Mining in Business Domains)

Abstract

The automotive industry is the leading producer of machines in Taiwan and worldwide. Developing effective methods for forecasting car sales can allow car companies to arrange their production and sales plans. Capitalizing on the growth of social media and deep learning algorithms, this research aimed to improve the overall performance of forecasting Taiwan car sales movement directions by using online sentiment data and the CNN-LSTM method. First, the historical sales volumes and multi-channel online sentiment data for six car brands in Taiwan were collected and preprocessed for labeling of car sales movement direction. Then, three models, namely, the classical, sentimental, and CNN-LSTM models, were constructed and trained/fitted for forecasting car sales movement directions in Taiwan. Finally, the performances of the three prediction models were compared to verify the effects of online sentiment data and the CNN-LSTM model on forecasting performance. The results showed that four forecasting performance indices, i.e., accuracy, precision, recall, and F1-score, improved by 27.78 percentage points (from 41.67% to 69.45%), 0.39 (from 0.38 to 0.77), 0.27 (from 0.42 to 0.69), and 0.33 (from 0.35 to 0.68), respectively. Therefore, the online sentiment data and CNN-LSTM method can indeed improve the overall performance of forecasting car sales movement directions in Taiwan.

1. Introduction

Forecasting, which can help managers develop more accurate and meaningful plans, has played an important role in reducing business uncertainty for companies [1]. Sales forecasting in particular is the basis of definite and reliable plans for marketing, sales management, production, procurement, and logistics, which further empower companies to provide better services and reap more benefits [2]. A successful sales forecast is therefore essential for companies to manage their business well.
The automobile industry, the leading producer of machines in many countries, is important for worldwide economic development. Furthermore, manufacturing a car requires iron, aluminum, plastic, steel, glass, rubber, copper, and more materials. If an automobile company can accurately predict its car sales, it can arrange effective production plans for its supply chain to prevent shortages and excesses of materials in the inventory process. In addition, when a customer decides to buy a new car, he/she generally hopes to take possession of the vehicle as soon as possible. If an automobile company can accurately predict its sales, it can develop effective sales plans to provide good service to its customers. Therefore, the development of a good car sales forecasting method is important for the automobile industry.
Unfortunately, few studies to date have focused on car sales forecasting [2,3,4,5,6,7,8]. Liu and Long [8] assembled a curve-regression model, a time series decomposition model, and RBF neural networks as a combined forecasting model and used economic data which takes on the obvious time factor and trends in car-making and selling. Brühl et al. [3] developed a time series model consisting of additive components: trend, seasonal, calendar, and error components. The model collected the main time series of newly registered automobiles and a secondary time series of exogenous parameters which could influence the trend of the main time series. The trend component was estimated by Multiple Linear Regression and Support Vector Machine (SVM). Yearly, quarterly, and monthly data for newly registered automobiles served as the basis for the tests of the models. The outcomes showed that the quarterly data provided the most accurate results. Wang et al. [4] developed an automobile sales forecasting methodology based on monthly sales volume, coincident indicator, leading indicator, wholesale price index, and income. Then an adaptive network-based fuzzy inference system (ANFIS) was created to obtain the forecast. The automobile forecasting methodology developed by Hülsmann et al. [5] used market-specific exogenous parameters, such as gross domestic product (GDP), stock index, personal income, and unemployment rate, on a yearly, quarterly, or monthly basis as the input variables for time series analysis and classical data mining algorithms.
On the Internet, consumers enthusiastically share their opinions and reviews via news, blogs, and social media, also known as electronic word of mouth (eWOM), and increasing numbers of potential buyers habitually consult eWOM before making their purchasing decisions [9,10,11,12,13,14]. Since eWOM can be positive or negative statements about a product or company [15,16,17], researchers have proposed sentiment analysis methods for automatically distinguishing three types of eWOM: positive, negative, and neutral [18]. To simultaneously apply historical sales data and eWOM to car sales forecasting, Fan et al. [2] used a sentiment analysis method, the Naive Bayes (NB) algorithm, to extract the sentiment index from each online review, and then integrated the sentiment index into the imitation coefficient of the Bass/Norton model to improve the forecasting accuracy.
Although very little effort has been expended to examine car sales forecasting, several points can still be raised by referring to forecasting studies of both car sales and sales of other products to facilitate the improvement of car sales forecasting methods.
First, historical sales data are the major predictor variable used for sales forecasting. Several other predictor variables, such as product prices, advertising campaigns, holidays [19], and economic indicators [4], are also frequently used for sales forecasting. Recently, with the popularity of social media, studies have begun using online reviews [2], online promotional strategies [20], and sentiment analysis [2,20,21] as the predictor variables to improve the performance of sales forecasting.
Second, several linear and nonlinear models, such as the Delphi technique, exponential smoothing, regression analysis, autoregressive integrated moving average (ARIMA), the Bass diffusion model, and multinomial logistic regression (MLR), are classical methods employed for sales forecasting and other predictions [4,22,23]. However, deep learning techniques, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, have recently been applied to sales forecasting to improve the prediction performance [4,19,20,24,25]. The CNN is usually applied to image data for solving classification problems [26], while LSTM is used to analyze time series data for solving classification, processing, and forecasting problems [27].
Third, the response variable of sales forecasting can be either the sales volume (or amount) or the sales movement direction [18,28]. Sales volume forecasting is a continuous value prediction of sales volume. In contrast, the sales movement direction transforms the sales volume into directional changes in sales, such as Up, Flat, and Down. Thus, sales movement direction forecasting is a classification problem of sales forecasting.
In Taiwan’s automobile industry, the sales volume of passenger cars in 2019 was 383,987. As consumer preferences changed, the sales of imported cars in Taiwan increased year by year. The 2019 sales volume of imported cars was 200,548, i.e., about 52% of the market share of passenger cars. The total sales volume of the top six leading imported car brands, namely, BMW, Lexus, Mazda, Mercedes-Benz (Benz), Toyota, and Volkswagen (VW), was 146,231, around 73% of the market share of imported cars [29]. Thus, accurately predicting car sales, especially for the top six leading imported car brands, could contribute to the development of Taiwan’s automobile industry. Consequently, this research aimed to improve the sales forecasting for Taiwan’s car industry and used the top six leading imported car brands as the experiment cases.
As mentioned above, the response variable of car sales forecasting can be either the sales volume (or amount) or the sales movement direction. Fantazzini and Toktamysova [7] argued that correct forecasts of car sales movement directions can still provide useful information even with large errors in the forecast car sales volumes. This is particularly important when predicting a turning point, which is a special case of directional accuracy and represents a change in the car sales movement direction. Therefore, this research selected the car sales movement direction as the response variable for the sales forecasting of Taiwan’s six leading imported car brands. In addition, because a car is a durable consumer good, potential buyers will spend more time on eWOM to aid in decision-making on the purchase. To improve the performance of car sales forecasting, in addition to historical sales data, multi-channel online sentiment data were also used as the predictor variables for car sales forecasting. Instead of regarding the sentiment data as a coefficient of the Bass/Norton model, as in Fan et al.’s study [2], this research prepared and analyzed a series of daily multi-channel online sentiment data of a car brand in the form of images. Therefore, to consider the image-like characteristics of online sentiment data and the time series characteristics of historical car sales data, a CNN-LSTM model integrating the CNN and LSTM networks was used to build a car sales prediction model with improved prediction performance.
To clarify the effects of the online sentiment data and the CNN-LSTM model on car sales predictions, three models were created for forecasting car sales movement directions in Taiwan. The first “classical” model, using the historical sales data as predictor variables and MLR as the prediction model, was created as the performance baseline of forecasting of car sales movement directions in Taiwan. The MLR is a generalized logistic regression for solving problems with more than two classes [22,30]. Then a “sentimental” model was created by adding the multi-channel online sentiment data as predictor variables to the classical model so as to verify the effects of online sentiment data on prediction performance. Finally, a “CNN-LSTM” model was created by replacing the MLR method in the sentimental model with the CNN-LSTM method proposed in this research to verify the effects of the latter method on prediction performance.
The performance comparison of the three prediction models showed that the forecasting accuracy of car sales movement directions in Taiwan was effectively improved by the use of online sentiment data and the CNN-LSTM model.
This paper is organized as follows. Section 1 states the relevant topics of this research and reviews the literature related to the research problem. Section 2 elaborates the creation process of the three prediction models for forecasting car sales movement directions in Taiwan. In Section 3, the results of the three prediction models are compared and analyzed to verify the effects of the online sentiment data and the CNN-LSTM model on the forecasting of car sales movement directions in Taiwan. Finally, the important findings, discussions, and suggestions for further research are summarized in Section 4.

2. Methodology

Figure 1 shows the research framework for improving the forecasting of car sales movement directions in Taiwan by using online sentiment data and the CNN-LSTM model. First, the historical sales volumes and multi-channel online sentiment data of Taiwan’s top six leading imported car brands were collected and preprocessed for labeling of car sales movement directions. Then the structures of three prediction models, namely, the classical, sentimental, and CNN-LSTM models, were constructed for forecasting car sales movement directions in Taiwan. Third, the three prediction models were trained or fitted with the datasets of Taiwan’s top six leading imported car brands. Finally, the prediction performances of the three prediction models were evaluated and compared to verify the effects of online sentiment data and the CNN-LSTM model on the forecasting of car sales movement directions in Taiwan.

2.1. Data Collection and Preprocessing for Labeling

2.1.1. Data Collection

As mentioned above, this research used both online sentiment data and the CNN-LSTM method to improve the performance of predictions of the car sales movement direction of Taiwan’s top six leading imported car brands. As shown in Figure 2, for each of Taiwan’s six car brands, namely, BMW, Lexus, Mazda, Benz, Toyota, and VW, the historical car sales data and online sentiment data were collected mainly from 2014 to 2019.
The historical car sales data were collected from the Ministry of Transportation and Communications, R.O.C. (MOTC) [29]. The MOTC website is a platform for commonly used transportation statistics and is operated by Taiwan’s government. Since the MOTC website provides new car registration data on a monthly basis, as shown in Table 1, the new car registration data from 2014 to 2019 were retrieved as the historical monthly sales volumes for six car brands. In addition, the new car registration data of January 2020 were also collected for the continuing labeling work.
The online sentiment data were collected from the OpView Insight: Social Media Monitoring Tool (OpView) [31]. OpView is the largest social media monitoring service platform in Taiwan. It collects eWOM and news every day from five online media sources in Taiwan [32], including more than 6100 discussion forums (e.g., the Mobile01 and the Dcard), more than 36,000 social media (e.g., Facebook and Instagram), more than 400 Q&A websites (e.g., Yahoo! Answers), more than 1800 blogs, and more than 3600 news websites (e.g., ETtoday and Line Today) [31]. The collected daily eWOM and news are then analyzed as three types of sentiments, i.e., positive, negative, and neutral, for various products and brands. For the study, three types of daily online sentiment volumes, positive (P), negative (N), and total (T), from 2014 to 2019 for six car brands were collected. Table 2 shows the collected daily online sentiment volumes for BMW.

2.1.2. Labeling of Car Sales Movement Directions

As mentioned in Section 1, this research selected the sales movement direction as the response variable of car sales prediction models. Hence, the monthly sales movement directions were labeled with the collected monthly sales volumes for six car brands. The three types of sales movement directions, Up (U), Flat (F), and Down (D), are defined in Equation (1). Since the intent of this research was to predict, in the current month, the car sales movement direction of the next month, this equation indicates that the ith month’s label $l_i$ is determined by the (i + 1)th month’s monthly sales growth rate $(S_{i+1} - S_i)/S_i$ and the predefined threshold h.
$$
l_i =
\begin{cases}
U, & \text{if } (S_{i+1} - S_i)/S_i > h \\
F, & \text{if } -h \le (S_{i+1} - S_i)/S_i \le h \\
D, & \text{if } (S_{i+1} - S_i)/S_i < -h
\end{cases}
\tag{1}
$$
Assume that the threshold h is set at 10%. Table 3 shows the labels that Equation (1) assigns to each month from 2014 to 2019, based on the collected historical car sales volumes shown in Table 1.
For example, the l1 (i.e., the label of 2014/1) of BMW is determined by the 2nd month’s (i.e., 2014/2’s) monthly sales growth rate (624 − 1437)/1437 = −56.58% and the predefined threshold h = 10%. As the 2nd month’s monthly sales growth rate, −56.58%, is smaller than −10% (i.e., −h), the l1 of BMW is labeled as the Down direction “D”. As mentioned in Section 2.1.1, for labeling the sales movement direction of 2019/12, i.e., l72, the car sales volume of 2020/1 must be collected. For the example of the l72 of Lexus, the 73rd month’s monthly sales growth rate (2687−2348)/2348 = 14.44% is greater than the threshold h = 10%; the l72 of Lexus should be labeled with an Up direction “U” according to Equation (1).
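The labeling rule of Equation (1) can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation; the function names `label_direction` and `label_series` are hypothetical, and the default threshold of 0.10 mirrors the h = 10% assumed in the text.

```python
def label_direction(s_now, s_next, h=0.10):
    """Label month i from the growth rate of month i+1 relative to month i (Equation (1))."""
    growth = (s_next - s_now) / s_now
    if growth > h:
        return "U"  # Up: growth rate above the threshold
    if growth < -h:
        return "D"  # Down: growth rate below the negative threshold
    return "F"      # Flat: growth rate within [-h, h]


def label_series(volumes, h=0.10):
    """Label every month that has a following month.

    The last volume in the list is only used as the 'next month' value,
    which is why the January 2020 volume is needed to label December 2019.
    """
    return [label_direction(a, b, h) for a, b in zip(volumes, volumes[1:])]
```

Applied to the two worked examples above, `label_direction(1437, 624)` gives the Down label for BMW in 2014/1, and `label_direction(2348, 2687)` gives the Up label for Lexus in 2019/12.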

2.2. Prediction Model Structure Construction

As explained in Section 1, to verify the effects of online sentiment data and the CNN-LSTM model on Taiwan’s car sales prediction, three models, namely, the classical, sentimental and CNN-LSTM models, were created for forecasting car sales movement directions in Taiwan. The structures of these three prediction models will be described in this section.

2.2.1. The Classical Model

In this research, the classical model was created as the baseline for prediction performance for comparison with the sentimental and the CNN-LSTM models.
The classical model adopted the most frequently used predictor variables for sales forecasting, including historical sales data and seasonality data, to predict the car sales movement direction. For example, Table 4 is the dataset prepared for creating the classical model for BMW. The monthly car sales volumes and monthly labeling data were retrieved from Table 3. As for seasonality data, car companies in Taiwan usually start different sales campaigns in specific months to promote sales, so monthly car sales volumes exhibit strong seasonality. In this research, the seasonality data, namely, the month number and the same-month-last-year sales movement direction labels were added, as shown in Table 4, to improve the accuracy of predictions of car sales movement directions.
Furthermore, since three types of car sales movement directions (U, F, and D) were defined in this research, the MLR, a generalized logistic regression for solving the problems with more than two classes [22], was selected as the forecasting method of car sales movement directions. Hence, the classical model for forecasting car sales movement directions in Taiwan can be expressed by Equation (2):
yi = β0 + β1xi1 + β2xi2 + β3xi3 + ϵ
where yi refers to the car sales movement direction (U, F, or D) of the ith month; xi1 refers to the sales volume of the ith month; xi2 refers to the same-month-last-year sales movement direction label of the ith month; xi3 refers to the month number of the ith month; β0 refers to the y-intercept (constant term); β1 to β3 refer to the slope coefficients for xi1 to xi3; ϵ refers to the model’s error term (also known as the residuals).
In multinomial logistic regression with K classes, one class is chosen as a “pivot”, and K − 1 independent binary logistic regression models are constructed [22]. If car sales movement direction Y = U is selected as the pivot, then the log-odds of directions F and D relative to U are modeled as in Equation (3):
$$
\ln \frac{P(Y = F)}{P(Y = U)} = b_F \cdot x \quad \text{and} \quad \ln \frac{P(Y = D)}{P(Y = U)} = b_D \cdot x
\tag{3}
$$
where Y refers to the random variable of yi in Equation (2); bF and bD refer to the set of regression coefficients in Equation (2) associated with car sales movement directions F and D, respectively (b is typically estimated by the maximum likelihood method [33]); x refers to the vector of xi1 to xi3 in Equation (2). Then, the probability that x belongs to car sales movement directions F and D can be expressed as Equation (4):
$$
P(Y = F) = P(Y = U)\, e^{b_F \cdot x} \quad \text{and} \quad P(Y = D) = P(Y = U)\, e^{b_D \cdot x}
\tag{4}
$$
Since the sum of the probabilities that x belongs to each class is 1, the probability that x belongs to car sales movement direction U becomes:
$$
P(Y = U) = 1 - P(Y = F) - P(Y = D) = \frac{1}{1 + e^{b_F \cdot x} + e^{b_D \cdot x}}
\tag{5}
$$
and Equation (4) can be rewritten as follows:
$$
P(Y = F) = \frac{e^{b_F \cdot x}}{1 + e^{b_F \cdot x} + e^{b_D \cdot x}} \quad \text{and} \quad P(Y = D) = \frac{e^{b_D \cdot x}}{1 + e^{b_F \cdot x} + e^{b_D \cdot x}}
\tag{6}
$$
Given the x from Table 4’s dataset, MLR outputs a car sales movement direction label y such that:
$$
y = \arg\max_{k \in \{U, F, D\}} P(Y = k)
\tag{7}
$$
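As a rough illustration of Equations (3) to (7), the following Python sketch computes the three class probabilities with U as the pivot and returns the most probable direction. The function names and any coefficient values passed in are hypothetical; in practice the coefficient vectors would come from a maximum likelihood fit. The sketch also assumes that x carries a leading 1 so that the first entry of each coefficient vector acts as the intercept, and that any categorical predictor has already been encoded numerically.

```python
import math


def mlr_probabilities(x, b_F, b_D):
    """Class probabilities with U as the pivot class (Equations (5) and (6)).

    x   : list of predictor values, assumed to start with 1.0 for the intercept
    b_F : coefficients for direction F relative to U
    b_D : coefficients for direction D relative to U
    """
    score_F = math.exp(sum(b * v for b, v in zip(b_F, x)))  # e^(b_F . x)
    score_D = math.exp(sum(b * v for b, v in zip(b_D, x)))  # e^(b_D . x)
    denom = 1.0 + score_F + score_D
    return {"U": 1.0 / denom, "F": score_F / denom, "D": score_D / denom}


def mlr_predict(x, b_F, b_D):
    """Return the direction with the highest probability (Equation (7))."""
    probs = mlr_probabilities(x, b_F, b_D)
    return max(probs, key=probs.get)
```

With all coefficients set to zero, every direction receives probability 1/3, which is a quick sanity check that the three probabilities sum to 1.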

2.2.2. The Sentimental Model

As mentioned in Section 1, since the potential car buyers tend to spend more time on online sentiment data to aid in purchasing decision-making, this research intends to improve the overall performance of forecasting of car sales movement directions in Taiwan by using multi-channel online sentiment data.
To clarify the effects of multi-channel online sentiment data on car sales movement direction prediction, the sentimental model was created as shown in Equation (8) by adding the sentimental data to the classical model, shown in Equation (2), as the predictor variables for forecasting car sales movement directions in Taiwan:
yi = β0 + β1xi1 + β2xi2 + β3xi3 + β4xi4 + … + β18xi18 + ϵ
The only difference between the classical model and the sentimental model was that the predictor variables of the sentimental model additionally contained the online sentiment data from xi4 to xi18. As shown in Table 2, xi4 to xi18 refer to the 15 sentiment variables formed by five channels (discussion forums, social media, Q&A websites, blogs, and news websites), each of which contributes three types of online sentiment volume, i.e., P, N, and T, collected from OpView Insight for each of the six car brands.
However, the online sentiment data were collected on a daily basis, whereas the forecasting model defined in Equation (8) was monthly, since the subscript i represents the ith month. Consequently, the collected daily online sentiment data needed to be converted into monthly online sentiment data. For BMW, for example, the dataset for creating the sentimental model was prepared as shown in Table 5.
As in the classical model, the MLR was used for solving the sentimental model to predict the car sales movement directions. Therefore, the fitting and forecasting processes for Equation (8) were the same as those for Equations (3)–(7).
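The daily-to-monthly conversion of the sentiment data could look like the sketch below. The text does not state how the daily volumes were aggregated, so summing them within each calendar month is an assumption made here; the function name `monthly_totals` and the column names in the usage example are likewise hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_records):
    """Sum daily sentiment volumes per (year, month).

    daily_records: iterable of (date, {column_name: volume}) pairs.
    Returns a dict mapping (year, month) to {column_name: monthly_sum}.
    Summation is an assumed aggregation rule, not one stated in the paper.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for day, volumes in daily_records:
        key = (day.year, day.month)
        for column, volume in volumes.items():
            totals[key][column] += volume
    # Convert nested defaultdicts to plain dicts, ordered by month
    return {key: dict(columns) for key, columns in sorted(totals.items())}


# Usage with made-up numbers for one channel (forum) and two sentiment types
records = [
    (date(2014, 1, 1), {"forum_P": 3, "forum_N": 1}),
    (date(2014, 1, 2), {"forum_P": 2, "forum_N": 0}),
    (date(2014, 2, 1), {"forum_P": 5, "forum_N": 4}),
]
monthly = monthly_totals(records)
```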

2.2.3. The CNN-LSTM Model

In the last section, for the sentimental model, online sentiment data were added to the predictor variables of the classical model in an attempt to improve the overall performance of forecasting of car sales movement directions in Taiwan. This section presents the development of the CNN-LSTM model, which integrated the CNN and LSTM networks instead of the MLR method used in the sentimental model to improve the overall performance of forecasting of car sales movement directions in Taiwan. The basic idea of the utilization of these models is that LSTM models are appropriate for dealing with time series data, while CNN models may filter out the noise of the input data and extract more valuable features [34,35].
Figure 3 presents the architecture of the CNN-LSTM model. A sliding window [36] is first used in pre-processing to extract the data for the CNN-LSTM model. Then, the CNN-LSTM model sequentially consists of a one-dimensional convolutional neural network (1D CNN), a CNN network, an LSTM network, a fully connected network, and an output layer.
Basically, the CNN-LSTM model used the same predictor variables as the sentimental model with the exception of the historical car sales volumes. After a test of prediction performance, this research chose to use the past 60-day (2-month) data of the predictor variables to predict the next-month (1-month) response variable of car sales movement directions. Therefore, a 90-day (3-month) window with a 30-day (1-month) sliding step was used to extract the training data for the CNN-LSTM model from the collected dataset. Each sliding window includes a set of 2-month predictor variables (Xi) and a 1-month response variable (Yi). The predictor variables were standardized and fed into a one-dimensional convolutional neural network (1D CNN), which is designed for processing data arranged in a single dimension [26,37]. Since the multi-channel sentiment data were collected on a daily basis, their 60-day data were extracted directly to form the 1D 60-day input data. However, because both the same-month-last-year sales movement direction label and the month number were monthly values, each monthly value was duplicated into 30 daily values to form the 1D 60-day input data.
After the data were received via the 1D CNN layer, a CNN network containing five convolutional and max-pooling layers filtered out the noise of the input sentiment data and extracted sentiment change features for the prediction of car sales movement directions in Taiwan. In addition, both the daily online sentiment data and the monthly car sales data were time series, and the LSTM network, which has a feedback loop for processing an entire data sequence, is widely used for classification, processing, and forecasting based on time series data [27].
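The window extraction described above can be sketched as follows. This is a simplified illustration under several assumptions: every month is treated as exactly 30 days, `monthly_targets[m]` is taken to be the response associated with month m (so each window's target is the month immediately after its 60 input days), and the function names are hypothetical.

```python
def expand_monthly(monthly_values, days_per_month=30):
    """Repeat each monthly value once per day so it aligns with daily series."""
    return [value for value in monthly_values for _ in range(days_per_month)]


def sliding_windows(daily_rows, monthly_targets,
                    input_days=60, target_days=30, step_days=30):
    """Pair each 60-day block of daily feature rows with the following month's target.

    daily_rows      : list of per-day feature rows (30 rows per month assumed)
    monthly_targets : one response value per month, aligned with daily_rows
    """
    window_days = input_days + target_days  # 90-day window in total
    samples = []
    start = 0
    while start + window_days <= len(daily_rows):
        x = daily_rows[start:start + input_days]
        target_month = (start + input_days) // target_days
        if target_month >= len(monthly_targets):
            break
        samples.append((x, monthly_targets[target_month]))
        start += step_days  # slide forward by one month
    return samples
```

Under these assumptions, six months of daily data (180 rows) yield four training samples, because the first two months only ever serve as inputs.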
Therefore, an LSTM network and a 4-layer fully connected network following the CNN network were used to generate the output of car sales movement direction prediction. The main hyperparameters of the CNN-LSTM model were as follows: the activation function was relu; the loss function was cross_entropy; the optimizer was adam; the learning rate was 0.003. Finally, early stopping (earlystop) was used to mitigate overfitting [38,39].
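For readers unfamiliar with early stopping, the sketch below shows one common patience-based variant. The text does not report which quantity was monitored or how much patience was allowed, so monitoring validation loss with a patience of three epochs is purely an assumption, as is the function name `early_stopping_epoch`.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) as 0-based indices.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs; the weights from
    best_epoch would then typically be restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through every recorded epoch
    return len(val_losses) - 1, best_epoch
```

The idea is that a validation loss that stops improving while training continues is a sign that the network has begun to fit noise in the training data.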

3. Results

As mentioned in Section 2.1.1, this research collected the data of car sales volumes and multi-channel online sentiment volume from 2014 to 2019. For training and evaluating the three prediction models (classical, sentimental, and CNN-LSTM) proposed in Section 2.2, the collected data from 2014 to 2018 were used as the training dataset, and the data of 2019 were used as the test dataset. This section reviews and compares the forecasting performances of the three models to verify the effects of the online sentiment data and the CNN-LSTM model on forecasting of car sales movement directions in Taiwan.
The performances of the classical, sentimental, and the CNN-LSTM models proposed in Section 2.2 were evaluated with four indices, including accuracy, precision, recall, and F1-score. These measures are defined in Equations (9)–(14) based on the confusion matrix shown in Table 6.
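$$
\mathit{Accuracy} = \frac{DD + FF + UU}{N}
\tag{9}
$$
$$
\mathit{Precision}_x =
\begin{cases}
DD / P_D, & \text{if } x = D \\
FF / P_F, & \text{if } x = F \\
UU / P_U, & \text{if } x = U
\end{cases}
\tag{10}
$$
$$
\mathit{Precision} = \frac{P_D \times \mathit{Precision}_D + P_F \times \mathit{Precision}_F + P_U \times \mathit{Precision}_U}{N}
\tag{11}
$$
$$
\mathit{Recall}_x =
\begin{cases}
DD / A_D, & \text{if } x = D \\
FF / A_F, & \text{if } x = F \\
UU / A_U, & \text{if } x = U
\end{cases}
\tag{12}
$$
$$
\mathit{Recall} = \frac{A_D \times \mathit{Recall}_D + A_F \times \mathit{Recall}_F + A_U \times \mathit{Recall}_U}{N}
\tag{13}
$$
$$
\mathit{F1\text{-}score} = \frac{2 \times \mathit{Precision} \times \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}}
\tag{14}
$$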
Accuracy is an intuitive measure of the performance of a prediction model. As defined in Equation (9), accuracy is the ratio of the correct predictions to the total number of predictions. However, accuracy is a reliable measure only when the dataset is balanced. Since the dataset for forecasting car sales movement directions in Taiwan collected in this research had an uneven class distribution, three other measures, namely, precision, recall, and F1-score, also needed to be reviewed to evaluate the performance of the three prediction models.
Precision was used to evaluate the correctness of the prediction of a model for each kind of car sales movement direction. As shown in Equation (10), the precision of a specific direction is the ratio of the correct predictions to the total predictions of that direction. That is, it isolates the predictions made for one direction (U, F, or D) and evaluates how many of them are correct. To evaluate the overall precision of a model, Equation (11) defines the precision as the weighted average of the precision of individual directions.
On the other hand, recall is used to evaluate the ability of a model to correctly identify the observations in each kind of car sales movement direction in the dataset. As defined in Equation (12), the recall of a specific direction is the ratio of the correct predictions among the observations with actual values of that direction. To evaluate the overall recall of a model, the recall is defined as Equation (13), as the weighted average of the recall of individual directions.
In some situations, precision or recall will be maximized at the expense of the other metric. To take both precision and recall into account, Equation (14) defines the F1-score as the harmonic mean of precision and recall. Therefore, the F1-score is an overall measure of a model’s accuracy that combines precision and recall [40]. The F1-score is usually more useful than accuracy, especially if the dataset has an uneven class distribution.
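A minimal sketch of Equations (9) to (14) is given below, assuming a confusion matrix stored as `confusion[actual][predicted]`, with A_x read as the number of observations whose actual direction is x and P_x as the number of predictions of direction x. The function name `evaluate` and the `precision_weights` switch are hypothetical. One algebraic point is worth noting: with the weights exactly as written in Equations (11) and (13), each weighted average simplifies to (DD + FF + UU)/N, i.e., to accuracy. Weighting the per-class precision by the actual counts A_x instead, offered here only as an assumed alternative, gives a value that can differ from accuracy.

```python
CLASSES = ("D", "F", "U")


def evaluate(confusion, precision_weights="predicted"):
    """Compute accuracy, weighted precision, weighted recall, and F1-score.

    confusion[actual][predicted] holds the count for each pair of directions.
    precision_weights: "predicted" follows Equation (11) as written (P_x);
                       "actual" weights by A_x instead (an assumed alternative).
    """
    n = sum(confusion[a][p] for a in CLASSES for p in CLASSES)
    actual = {c: sum(confusion[c][p] for p in CLASSES) for c in CLASSES}     # A_x
    predicted = {c: sum(confusion[a][c] for a in CLASSES) for c in CLASSES}  # P_x
    correct = {c: confusion[c][c] for c in CLASSES}                          # DD, FF, UU

    # Per-class precision and recall (Equations (10) and (12)); 0 if undefined
    prec = {c: correct[c] / predicted[c] if predicted[c] else 0.0 for c in CLASSES}
    rec = {c: correct[c] / actual[c] if actual[c] else 0.0 for c in CLASSES}

    weights = predicted if precision_weights == "predicted" else actual
    accuracy = sum(correct.values()) / n
    precision = sum(weights[c] * prec[c] for c in CLASSES) / n
    recall = sum(actual[c] * rec[c] for c in CLASSES) / n
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up confusion matrix for illustration only
example = {
    "D": {"D": 4, "F": 0, "U": 0},
    "F": {"D": 2, "F": 1, "U": 1},
    "U": {"D": 0, "F": 0, "U": 4},
}
as_written = evaluate(example)
by_actual = evaluate(example, precision_weights="actual")
```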
As mentioned in Section 1 and Section 2.2, the classical, sentimental, and CNN-LSTM models were designed to verify the effects of the online sentiment data and the CNN-LSTM model on the forecasting of car sales movement directions in Taiwan. Since the sentimental model was designed by adding the sentiment data to the classical model as predictor variables, the effects of multi-channel online sentiment data on car sales prediction accuracy could be reviewed by comparing the performances of the classical and sentimental models. Furthermore, since the CNN-LSTM model adopted a deep learning method integrating the CNN and LSTM networks instead of the MLR method in the sentimental model, the effects of the CNN-LSTM model on car sales prediction accuracy could be verified by comparing the performances of the sentimental and CNN-LSTM models.
Consequently, the remainder of this section will first review the confusion matrix of the testing results. Table 7, Table 8 and Table 9 show the aggregate confusion matrices of the classical, sentimental, and the CNN-LSTM models, respectively, for the six car brands.
Based on the confusion matrices, the four performance indices of accuracy, precision, recall, and F1-score were then computed and compared model by model to evaluate the effects of online sentiment data and the CNN-LSTM model on the forecasting of the car sales movement directions.
Figure 4 shows the accuracy of the classical, sentimental, and CNN-LSTM models for the six car brands. For all brands except Toyota, the prediction accuracy of the sentimental model was higher than that of the classical model. Toyota had an accuracy of 50% in both the classical and sentimental models. On average, the sentimental model (51.39%) improved accuracy by 9.72 percentage points over the classical model (41.67%) due to the addition of multi-channel online sentiment data as predictor variables. Furthermore, the accuracies of the CNN-LSTM model for all six brands were higher than those of the sentimental model. On average, the CNN-LSTM model (69.45%) achieved an accuracy that was 18.06 percentage points higher than that of the sentimental model (51.39%) due to the adoption of the CNN-LSTM network instead of the MLR method. In total, the average accuracy of the CNN-LSTM model was 27.78 percentage points higher than that of the classical model. Consequently, according to the above comparison, the accuracy of forecasting of car sales movement directions in Taiwan was indeed improved by using online sentiment data and the CNN-LSTM model.
However, accuracy is an intuitive and reliable measure only when the dataset is balanced. The collected dataset for forecasting car sales movement directions in Taiwan had an uneven class distribution. Hence, this research further examined three other measures, i.e., precision, recall, and F1-score, to evaluate the performances of the three prediction models.
Figure 5 shows the precision of the classical, sentimental, and the CNN-LSTM models for six car brands. As mentioned above, precision is a measure of the ability of a model to correctly identify the observations from the predictions of a specific car sales movement direction. As shown in Figure 5, the precision of the sentimental model for all six brands was higher than that of the classical model. On average, the precision of the sentimental model (0.55) was 0.16 higher than that of the classical model (0.38). Furthermore, the precision of the CNN-LSTM model for all but Lexus was higher than that of the sentimental model. Lexus had high precision of 0.83 in both the sentimental and the CNN-LSTM models. On average, the precision of the CNN-LSTM model (0.77) was 0.22 higher than that of the sentimental model (0.55). In total, the precision of the CNN-LSTM model was an average of 0.39 higher than that of the classical model. Consequently, according to the above comparison, the precision of the forecasting of car sales movement directions in Taiwan was effectively improved by the use of online sentiment data and the CNN-LSTM model.
Figure 6 shows the recall of the classical, sentimental, and the CNN-LSTM models for the six car brands. As mentioned above, recall is a measure of the ability of a model to correctly identify the observations of a specific car sales movement direction within the dataset. As shown in Figure 6, the recalls of the sentimental model for all six brands but TOYOTA were higher than those of the classical model. TOYOTA had a consistent recall of 0.50 in both the classical and sentimental models. On average, the recall of the sentimental model (0.51) was 0.09 higher than that of the classical model (0.42). Furthermore, the recalls of the CNN-LSTM model for all six brands were higher than those of the sentimental model. On average, the recall of the CNN-LSTM model (0.69) was 0.18 higher than that of the sentimental model (0.51). In total, the CNN-LSTM model demonstrated an average improvement in recall of 0.27 over that of the classical model. According to the above comparison, the recall of forecasting of car sales movement directions in Taiwan was improved by the use of online sentiment data and the CNN-LSTM model.
Finally, the results of F1-scores simultaneously considering precision and recall are reviewed. Figure 7 shows the F1-scores of the classical, sentimental, and the CNN-LSTM models for the six car brands. As mentioned above, the F1-score is more useful than accuracy when the dataset has an uneven class distribution. As shown in Figure 7, the F1-scores of the sentimental model for all six brands were higher than those of the classical model. On average, the F1-scores of the sentimental model (0.46) were 0.11 higher than those of the classical model (0.35). Furthermore, the F1-scores of the CNN-LSTM model for all six brands were higher than those of the sentimental model. On average, the F1-scores of the CNN-LSTM model (0.68) were 0.22 higher than those of the sentimental model (0.46). Overall, the CNN-LSTM model demonstrated an average improvement in F1-score of 0.33 over that of the classical model. Based on the above comparison, the F1-score of forecasting of car sales movement directions was improved by the use of online sentiment data and the CNN-LSTM model.
The comparisons of the four indices in the classical, sentimental, and the CNN-LSTM models have shown that the performance of forecasting of car sales movement directions in Taiwan was improved by the use of online sentiment data and the CNN-LSTM model.
However, accuracy works best when the different kinds of false predictions have similar costs. If the costs of different false predictions are very dissimilar, it is better to look at precision or recall. In this research, the costs of classifying direction D as F or U, classifying direction F as D or U, and classifying direction U as D or F were different, so precision and recall were further explored to examine the applicability of the proposed methods.
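A minimal sketch of why unequal error costs make accuracy insufficient: two confusion matrices with identical accuracy can carry different total costs once each kind of misclassification is priced. The cost matrix and both confusion matrices below are illustrative assumptions, not values from this research.

```python
LABELS = ("D", "F", "U")

# Hypothetical cost of each outcome, COST[actual][predicted].
# Assumption for illustration only: predicting too high a direction
# (overproduction) is priced higher than predicting too low a direction.
COST = [
    [0, 2, 4],  # actual D predicted as D, F, U
    [1, 0, 2],  # actual F predicted as D, F, U
    [2, 1, 0],  # actual U predicted as D, F, U
]


def accuracy(cm):
    """Share of observations on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def total_cost(cm, cost):
    """Sum of counts weighted by the cost of each (actual, predicted) pair."""
    size = len(cm)
    return sum(cm[i][j] * cost[i][j] for i in range(size) for j in range(size))


# Two hypothetical models evaluated on the same 12 observations
# (5 actual D, 3 actual F, 4 actual U).
cm_a = [
    [3, 0, 2],
    [0, 3, 0],
    [0, 1, 3],
]
cm_b = [
    [5, 0, 0],
    [1, 2, 0],
    [1, 1, 2],
]

for name, cm in (("model A", cm_a), ("model B", cm_b)):
    print(f"{name}: accuracy={accuracy(cm):.2f}, total cost={total_cost(cm, COST)}")
```

Both hypothetical models classify 9 of 12 observations correctly, yet model A incurs more than twice the cost of model B because its errors fall in the most expensive cell.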
As mentioned above, precision is used to evaluate the ability of a model to correctly predict each car sales movement direction; recall is used to evaluate the ability of a model to correctly identify the observations of each car sales movement direction in the dataset. In practice, car companies arrange their production and sales plans according to the predictions of car sales movement directions, so the correctness of the prediction, i.e., the precision, will have a direct impact on car inventories and sales. Thus, in the application of car sales movement direction forecasting, precision is more important than recall. The following paragraphs discuss the precision of the three models for the six car brands in greater detail.
Figure 8 illustrates the precision for direction D, Precision(D), of the classical, sentimental, and CNN-LSTM models for the six car brands. A low Precision(D) means that most of the predictions of direction D are incorrect and should be F or U, so the planned production quantity will not meet the market demand. Improving Precision(D) can prevent losses of sales and market share due to insufficient production. As shown in Figure 8, the Precision(D) values of the sentimental model for all six brands were equal to or higher than those of the classical model. On average, Precision(D) increased by 0.39, from 0.34 in the classical model to 0.73 in the sentimental model. Furthermore, the Precision(D) values of the CNN-LSTM model for all six brands except VW were equal to or higher than those of the sentimental model. On average, the CNN-LSTM model raised Precision(D) by a further 0.11, from 0.73 to 0.84. Both the online sentiment data and the CNN-LSTM method therefore helped to improve Precision(D) for forecasting car sales movement directions in Taiwan. The combined effect of the online sentiment data and the CNN-LSTM method on Precision(D) was an increase of 0.50 (from an average of 0.34 in the classical model to an average of 0.84 in the CNN-LSTM model), and the effect of the online sentiment data (0.39) was particularly large. As for VW, although Precision(D) dropped from 1.00 for the sentimental model to 0.75 for the CNN-LSTM model, the combined effect of the online sentiment data and the CNN-LSTM method still improved Precision(D) by 0.46 (from 0.29 in the classical model to 0.75 in the CNN-LSTM model).
For the precision of direction U, Figure 9 shows the Precision(U) of the three models for the six car brands. A low Precision(U) indicates that most of the predictions of direction U are incorrect and should be F or D, so car companies will plan a production quantity that exceeds market demand. Improving Precision(U) can avoid a surplus of cars. As shown in Figure 9, the Precision(U) values of the sentimental model for all six brands except TOYOTA were equal to or higher than those of the classical model. On average, Precision(U) increased by 0.06, from 0.57 in the classical model to 0.63 in the sentimental model. Furthermore, the Precision(U) values of the CNN-LSTM model for all six brands except Lexus were equal to or higher than those of the sentimental model. Precision(U) showed an average improvement of 0.10, from 0.63 for the sentimental model to 0.73 for the CNN-LSTM model. Thus, the Precision(U) of forecasting car sales movement directions in Taiwan was also somewhat improved by the use of online sentiment data and the CNN-LSTM method. The combined effect of the online sentiment data and the CNN-LSTM method on Precision(U) was an increase of 0.16 (from an average of 0.57 in the classical model to an average of 0.73 in the CNN-LSTM model). The effect of the CNN-LSTM method (0.10) was a little larger than that of the online sentiment data (0.06). However, TOYOTA and Lexus were exceptions. The use of online sentiment data may have interfered with Precision(U) for TOYOTA, which dropped from 1.00 in the classical model to 0.57 in the sentimental model. Likewise, the CNN-LSTM method did not raise Precision(U) for Lexus, which dropped from 1.00 in the sentimental model to 0.70 in the CNN-LSTM model.
Finally, the precision of direction F, Precision(F), for the three models and six car brands is shown in Figure 10. A low Precision(F) value indicates that most of the predictions of direction F are incorrect and should be D or U. Since the planned production quantity will then be higher or lower than the market demand, car companies may either hold excess inventory or lose sales and market share. As shown in Figure 10, the Precision(F) values of the sentimental model for all six brands except BENZ were equal to or higher than those of the classical model. Precision(F) increased by 0.15 on average, from 0.07 for the classical model to 0.22 for the sentimental model. By comparison, the Precision(F) values of the CNN-LSTM model were higher than those of the sentimental model for all six brands, with an average improvement of 0.57, from 0.22 for the sentimental model to 0.79 for the CNN-LSTM model. Although both the online sentiment data and the CNN-LSTM model contributed to the improvement of Precision(F), the effect of the CNN-LSTM method was particularly large. As for BENZ, although Precision(F) dropped from 0.43 in the classical model to 0.00 in the sentimental model, the combined effect of the online sentiment data and the CNN-LSTM method raised Precision(F) from 0.43 for the classical model to 0.67 for the CNN-LSTM model.
From the above analysis, it can be seen that the average precisions of the three directions, Precision(D), Precision(F), and Precision(U), all improved progressively from the classical model to the sentimental model and then to the CNN-LSTM model. Therefore, the results indicated that the online sentiment data and the CNN-LSTM method improved the precision of forecasting directions D, F, and U in car sales in Taiwan.
However, the extent of the effects of the online sentiment data and the CNN-LSTM method on Precision(D), Precision(F), and Precision(U) differed. The contribution of the online sentiment data to Precision(U) was 0.06, while that of the CNN-LSTM method was 0.10. That is, Precision(U) gained only a little improvement from either. For Precision(D), in contrast, the contributions of the online sentiment data and the CNN-LSTM method were 0.39 and 0.11, respectively. Thus, Precision(D) gained a larger improvement from the online sentiment data than from the CNN-LSTM method. Furthermore, Precision(F) gained only 0.15 from the online sentiment data but 0.57 from the CNN-LSTM method, so most of the improvement in Precision(F) resulted from the CNN-LSTM method. A further discussion of how the online sentiment data and the CNN-LSTM method affected Precision(D), Precision(F), and Precision(U) follows.
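Before that discussion, the contributions can be recomputed directly from the average per-direction precisions quoted for Figures 8, 9, and 10. The snippet below is a small arithmetic check; the function and variable names are illustrative, and the "share" is simply the sentiment-data gain divided by the total gain.

```python
# Average per-direction precision quoted in the text, ordered as
# (classical model, sentimental model, CNN-LSTM model).
REPORTED_PRECISION = {
    "D": (0.34, 0.73, 0.84),
    "F": (0.07, 0.22, 0.79),
    "U": (0.57, 0.63, 0.73),
}


def decompose(classical, sentimental, cnn_lstm):
    """Split the total precision gain into its two sequential steps."""
    sentiment_gain = round(sentimental - classical, 2)  # adding sentiment data
    network_gain = round(cnn_lstm - sentimental, 2)     # replacing MLR with CNN-LSTM
    total_gain = round(cnn_lstm - classical, 2)
    return {
        "sentiment_gain": sentiment_gain,
        "network_gain": network_gain,
        "total_gain": total_gain,
        "sentiment_share": sentiment_gain / total_gain,
    }


breakdown = {d: decompose(*values) for d, values in REPORTED_PRECISION.items()}

for direction, parts in breakdown.items():
    print(
        f"Precision({direction}): sentiment +{parts['sentiment_gain']:.2f}, "
        f"CNN-LSTM +{parts['network_gain']:.2f}, "
        f"total +{parts['total_gain']:.2f}, "
        f"sentiment share {parts['sentiment_share']:.0%}"
    )
```

On these averages, the online sentiment data account for roughly 78% of the gain in Precision(D) but only about 21% of the gain in Precision(F), which is the contrast the following paragraphs set out to explain.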
As mentioned in Section 2.1, this research collected and analyzed three types of online sentiment volumes, i.e., positive (P), negative (N), and total (T), to predict car sales movement direction for six car brands. Theoretically, positive sentiment consists of consumer responses indicating satisfaction, and negative sentiment consists of consumer responses suggesting dissatisfaction. A larger positive sentiment volume in the previous months indicates an increased likelihood of a sales movement direction of U in the next month. Conversely, a larger negative sentiment volume in the previous months increases the likelihood of a sales movement direction of D in the next month.
In practice, the originators and motives of eWOMs and online sentiment will influence consumer intentions and decisions, which in turn will impact the overall performance of sales movement direction forecasting. The main originators of eWOMs and online sentiment are customers and the industry. The online sentiments of customers who have real praise (positive eWOMs) or complaints (negative eWOMs) may directly affect the sales movement directions of the subsequent periods. On the other hand, online sentiments from the industry include eWOMs written by the companies themselves, their partners, or their competitors. To promote its products, a company and its partners may intentionally leave positive eWOMs or sentiments on social media or blogs. Conversely, competitors may write negative eWOMs as malicious attacks. Therefore, most of the eWOMs or online sentiments from the industry may be false and could interfere with sales movement direction forecasting.
In the car market, safety and after-sales service play key roles in shaping consumer attitudes and perceptions towards car purchases. Safety and after-sales service experiences also drive customer satisfaction and eWOMs. Over the years, car companies have striven to target and attract specific customer segments through automotive design and services. Meanwhile, consumers will become loyal customers of a particular automotive brand under the consideration of customer experiences and personal preferences. These loyal customers are also willing to write positive eWOMs and share their positive sentiment and experiences with the public via news, blogs, and social media. In contrast, customers who have poor experiences with a car brand will leave negative eWOMs and sentiments on the Internet.
As a result, in normal times, a car brand will have stable volumes of positive and negative eWOMs and sentiments from satisfied and dissatisfied customers, respectively. This steady volume of positive and negative sentiments from customers should theoretically help to predict direction F accurately. However, the car company may sometimes write positive eWOMs and sentiments to respond to the negative eWOMs and sentiments of customers. These positive eWOMs and sentiments from the car manufacturer will interfere with the prediction of direction F because they are not from actual customers. This interference may explain why the online sentiment volume did not contribute much to the improvement of Precision(F), increasing it only from 0.07 for the classical model to 0.22 for the sentimental model on average. However, the CNN-LSTM method used in this research can effectively filter out the noise introduced by industry-originated eWOMs and sentiment volume, enhancing Precision(F) from an average of 0.22 for the sentimental model to an average of 0.79 for the CNN-LSTM model. It does so by applying the CNN’s image processing ability to the online sentiment data and the LSTM’s time series processing ability to the historical car sales data.
If it is time for a car manufacturer to launch a new car, the company will actively create an atmosphere of discussion of the merits of the new car on the Internet. As a result, the positive eWOMs or sentiment will continue to grow for a period of time. Normally, in the early stage of the introduction of a new car on the market, the increases in positive eWOMs and online sentiments are usually accompanied by growth in new car sales. However, whether the subsequent increases in positive eWOMs and online sentiment will continue to stimulate new car sales depends on consumers’ budget constraints and acceptance of the car brand. As mentioned above, consumers usually accept and prefer only a few brands. Therefore, in the late stage of the introduction of a new car, although the positive eWOMs and online sentiment continue to increase, they may not lead to an increase in car sales. This suggests that increases in positive eWOMs and online sentiment sometimes lead to increases in car sales, but sometimes they do not. This fact will reduce the effectiveness of the online sentiment data and the CNN-LSTM method, and it is the reason why the Precision(U) gained only small improvements from the online sentiment data (0.06, from 0.57 of the classical model to 0.63 of the sentimental model) and the CNN-LSTM method (0.10, from 0.63 of the sentimental model to 0.73 of the CNN-LSTM model).
In contrast, if customers are dissatisfied with the quality or service of a car brand, the volume of negative eWOMs and online sentiment will increase, and negative experiences are usually communicated faster than positive experiences. Although the car company may write positive eWOMs and sentiments to respond to the negative eWOMs and sentiments from customers, the car sales still can be affected with varying degrees of decline. If the complaints are related to serious safety issues or design defects, the car company will be more active in issuing a public recall notification to instruct customers to return their cars for free repairs. At this point, the volume of negative eWOMs and online sentiments will increase significantly, and potential consumers will turn away from the car brand. This is why adding online sentiment data as predictor variables resulted in a significant improvement of 0.39 in Precision(D), from 0.34 of the classical model to 0.73 of the sentimental model. With the significant help of the online sentiment data, the deep learning ability of the CNN-LSTM method was able to improve Precision(D) by only 0.11, from 0.73 of the sentimental model to 0.84 of the CNN-LSTM model.
To sum up, both the online sentiment data and the CNN-LSTM method had positive effects on the precision of the three car sales movement directions, D, F, and U. As to the computational burden, the three models were run on a PC with an Intel Core i7-6700 CPU @ 3.40 GHz × 4, 24 GB of RAM, and an NVIDIA GeForce GTX 1060 6 GB GPU. The total running time of the classical, sentimental, and CNN-LSTM models for each brand was about 1, 2, and 45 min, respectively. The CNN-LSTM model takes longer to train, but once it is trained, the time required for subsequent testing and actual prediction is not much different from that of the other two models. Given its better prediction performance, the additional training time is worthwhile overall.

4. Conclusions

With the explosive growth of social media and emerging forecasting methods, the efforts to improve the performance of car sales forecasting should consider the adoption of eWOM, online sentiment data, and some deep learning techniques. The purpose of this research was to improve the overall performance of forecasting of car sales movement directions in Taiwan by using online sentiment data and the CNN-LSTM method. This research selected the car sales movement direction as the predicted object, and it was defined by the sales growth rate of the next month and the predefined threshold. Therefore, in practical application, if a threshold is set at 10% and the car sales movement direction predicted by the CNN-LSTM model is up (U) or down (D), the car company can adjust the production and sales of the next month upward or downward by 10%. To verify the effects of online sentiment data and the CNN-LSTM method on the forecasting of car sales movement directions in Taiwan, three forecasting models, namely, the classical model, the sentimental model, and the CNN-LSTM model, were constructed and compared.
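The labeling rule described above can be sketched as follows. The exact rule is defined earlier in the paper; this version assumes that the growth rate is (next month − current month) / current month, that a month is labeled U or D only when the growth rate exceeds the threshold in absolute value, and that every other month is labeled F. The BMW volumes are those listed for 2019/10 to 2020/1 in Table 1; the function names are illustrative.

```python
def growth_rate(current, following):
    """Relative change in sales from the current month to the following month."""
    return (following - current) / current


def label_direction(current, following, threshold=0.10):
    """Label a month U, D, or F from the next month's sales growth rate.

    Assumption: strict inequalities at the threshold; the source may
    treat the boundary differently.
    """
    rate = growth_rate(current, following)
    if rate > threshold:
        return "U"
    if rate < -threshold:
        return "D"
    return "F"


# BMW monthly sales as listed at the end of Table 1; the 2020/1 record is
# only used to label 2019/12.
bmw_sales = [
    ("2019/10", 1658),
    ("2019/11", 1461),
    ("2019/12", 1927),
    ("2020/1", 1355),
]

labels = [
    (month, label_direction(sales, next_sales))
    for (month, sales), (_, next_sales) in zip(bmw_sales, bmw_sales[1:])
]

for month, direction in labels:
    print(month, direction)
```

Under these assumptions and a 10% threshold, 2019/10 and 2019/12 are labeled D and 2019/11 is labeled U, and a company acting on a U or D prediction would shift the next month's plan by the threshold amount in that direction.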
The results showed that the use of both online sentiment data and the CNN-LSTM method led to significant improvements over the classical model in accuracy (27.78 percentage points, from 41.67% to 69.45%), precision (0.39, from 0.38 to 0.77), recall (0.27, from 0.42 to 0.69), and F1-score (0.33, from 0.35 to 0.68). It was found that the overall performance of forecasting car sales movement directions in Taiwan can be effectively improved by the use of online sentiment data and the CNN-LSTM model.
Furthermore, because car companies use predictions of car sales movement directions to arrange their production and sales plans, precision directly affects both car inventories and sales, so it was explored in more detail. The results showed that the model with the online sentiment data and the CNN-LSTM method demonstrated improvements in Precision(U), Precision(D), and Precision(F) of 0.16 (from 0.57 to 0.73), 0.50 (from 0.34 to 0.84), and 0.72 (from 0.07 to 0.79), respectively.
In practice, if direction U is wrongly predicted as direction F or D, the planned production quantity will fall behind the market demand and may incur some opportunity cost, but the impact is not significant. Since loyal customers may just delay their orders, there is little loss of sales volume or market share. In contrast, if direction D is wrongly predicted as direction F or U, the resulting cost will be high. Since the planned production quantity will exceed the market demand, automobile manufacturers may hold excess inventory, which can lead to tied-up capital or the need to sell at a reduced price. Therefore, improvements of Precision(U), Precision(F), and Precision(D) will be of great benefit to car companies in developing effective production and sales plans.
In addition, many other deep learning models have been developed, and some of them could be applied to this problem. For example, the LSTM-vanilla model, which only adds a peephole connection to the classic LSTM, is similar to the LSTM used in the CNN-LSTM model of this research. In the future, the LSTM-vanilla model may be used to improve the prediction performance and be compared with the CNN-LSTM model proposed in this research.

Author Contributions

Conceptualization, C.O.-Y. and Y.-C.J.; Methodology, C.O.-Y., S.-C.C., and Y.-C.J.; Software, S.-C.C.; Validation, Y.-C.J.; Formal analysis, C.O.-Y., S.-C.C., and Y.-C.J.; Investigation, S.-C.C.; Resources, C.O.-Y.; Data curation, S.-C.C.; Writing—original draft preparation, S.-C.C.; Writing—review and editing, C.O.-Y. and Y.-C.J.; Visualization, S.-C.C.; Supervision, C.O.-Y. and Y.-C.J.; Project administration, C.O.-Y.; Funding acquisition, C.O.-Y. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Ministry of Science and Technology of the Republic of China, Taiwan, under contract no. MOST 109-2221-E-011-102.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are not publicly available due to a confidentiality agreement.

Acknowledgments

The authors would like to thank the Ministry of Science and Technology of the Republic of China, Taiwan, for financially supporting this research. The authors also gratefully acknowledge the support of the OpView Insight for this work.

Conflicts of Interest

The authors declare that there is no conflict of interest.

References

  1. Stevenson, W.J. Operations Management, 12nd ed.; Irwin/McGraw-Hill: New York, NY, USA, 2015. [Google Scholar]
  2. Fan, Z.-P.; Che, Y.-J.; Chen, Z.-Y. Product sales forecasting using online reviews and historical sales data: A method combining the Bass model and sentiment analysis. J. Bus. Res. 2017, 74, 90–100. [Google Scholar] [CrossRef]
  3. Brühl, B.; Hülsmann, M.; Borscheid, D.; Friedrich, C.M.; Reith, D. A sales forecast model for the German automobile market based on time series analysis and data mining methods. In Proceedings of the 2009 Industrial Conference on Data Mining, Leipzig, Germany, 20–22 July 2009; pp. 146–160. [Google Scholar]
  4. Wang, F.-K.; Chang, K.-K.; Tzeng, C.-W. Using adaptive network-based fuzzy inference system to forecast automobile sales. Expert Syst. Appl. 2011, 38, 10587–10593. [Google Scholar] [CrossRef]
  5. Hülsmann, M.; Borscheid, D.; Friedrich, C.M. General sales forecast models for automobile markets and their analysis. Trans. Mach. Learn. Data Min. 2012, 5, 65–86. [Google Scholar]
  6. Pierdzioch, C.; Rülke, J.-C.; Stadtmann, G. Forecasting U.S. car sales and car registrations in Japan: Rationality, accuracy and herding. Jpn. World Econ. 2011, 23, 253–258. [Google Scholar] [CrossRef]
  7. Fantazzini, D.; Toktamysova, Z. Forecasting German car sales using Google data and multivariate models. Int. J. Prod. Econ. 2015, 170, 97–135. [Google Scholar] [CrossRef] [Green Version]
  8. Liu, G.; Long, B. The research on Combination forecasting model of the automobile sales forecasting system. In Proceedings of the 2009 International Forum on Computer Science-Technology and Applications, Chongqing, China, 25–27 December 2009; pp. 82–85. [Google Scholar]
  9. López, M.; Sicilia, M. Determinants of E-WOM Influence: The Role of Consumers’ Internet Experience. J. Theor. Appl. Electron. Commer. Res. 2014, 9, 7–8. [Google Scholar] [CrossRef] [Green Version]
  10. Sa’ait, N.; Kanyan, A.; Nazrin, M.F. The effect of e-WOM on customer purchase intention. Int. Acad. Res. J. Soc. Sci. 2016, 2, 73–80. [Google Scholar]
  11. Bataineh, A.Q. The Impact of Perceived e-WOM on Purchase Intention: The Mediating Role of Corporate Image. Int. J. Mark. Stud. 2015, 7, 126. [Google Scholar] [CrossRef] [Green Version]
  12. Singh, A.; Jenamani, M.; Thakkar, J.J.; Rana, N.P. Quantifying the effect of eWOM embedded consumer perceptions on sales: An integrated aspect-level sentiment analysis and panel data modeling approach. J. Bus. Res. 2022, 138, 52–64. [Google Scholar] [CrossRef]
  13. Filieri, R.; Lin, Z.; Pino, G.; Alguezaui, S.; Inversini, A. The role of visual cues in eWOM on consumers’ behavioral intention and decisions. J. Bus. Res. 2021, 135, 663–675. [Google Scholar] [CrossRef]
  14. Kaur, K.; Singh, T. Impact of Online Consumer Reviews on Amazon Books Sales: Empirical Evidence from India. J. Theor. Appl. Electron. Commer. Res. 2021, 16, 2793–2807. [Google Scholar] [CrossRef]
  15. Hennig-Thurau, T.; Gwinner, K.P.; Walsh, G.; Gremler, D.D. Electronic word-of-mouth via consumer-opinion platforms: What motivates consumers to articulate themselves on the Internet? J. Interact. Mark. 2004, 18, 38–52. [Google Scholar] [CrossRef]
  16. Wu, S.-J.; Chiang, R.-D.; Chang, H.-C. Applying sentiment analysis in social web for smart decision support marketing. J. Ambient. Intell. Humaniz. Comput. 2018, 1–10. [Google Scholar] [CrossRef]
  17. Lee, I. A study of the effect of social shopping deals on online reviews. Ind. Manag. Data Syst. 2017, 117, 2227–2240. [Google Scholar] [CrossRef]
  18. Ren, R.; Wu, D.D.; Liu, T. Forecasting Stock Market Movement Direction Using Sentiment Analysis and Support Vector Machine. IEEE Syst. J. 2019, 13, 760–770. [Google Scholar] [CrossRef]
  19. Thiesing, F.M.; Vornberger, O. Sales forecasting using neural networks. In Proceedings of the International Conference on Neural Networks (ICNN’97), Houston, TX, USA, 12–12 June 1997; Volume 2124, pp. 2125–2128. [Google Scholar]
  20. Chong, A.Y.L.; Li, B.Y.; Ngai, E.W.T.; Ch’ng, E.; Lee, F. Predicting online product sales via online reviews, sentiments, and promotion strategies A big data architecture and neural network approach. Int. J. Oper. Prod. Manag. 2016, 36, 358–385. [Google Scholar] [CrossRef] [Green Version]
  21. Hu, N.; Koh, N.S.; Reddy, S.K. Ratings lead you to the product, reviews help you clinch it? The mediating role of online review sentiments on product sales. Decis. Support Syst. 2014, 57, 42–53. [Google Scholar] [CrossRef]
  22. Moon, S.-H.; Kim, Y.-H. An improved forecast of precipitation type using correlation-based feature selection and multinomial logistic regression. Atmos. Res. 2020, 240, 104928. [Google Scholar] [CrossRef]
  23. Upadhyay, A.; Bandyopadhyay, G.; Dutta, A. Forecasting Stock Performance in Indian Market using Multinomial Logistic Regression. J. Bus. Stud. Q. 2012, 3, 16. [Google Scholar]
  24. Loureiro, A.L.D.; Miguéis, V.L.; Da Silva, L.F.M. Exploring the use of deep neural networks for sales forecasting in fashion retail. Decis. Support Syst. 2018, 114, 81–93. [Google Scholar] [CrossRef]
  25. Yu, Q.; Wang, K.; Strandhagen, J.O.; Wang, Y. Application of Long Short-Term Memory Neural Network to Sales Forecasting in Retail—A Case Study. In Lecture Notes in Electrical Engineering; Springer: Singapore, 2018; pp. 11–17. [Google Scholar] [CrossRef]
  26. Kiranyaz, S.; Ince, T.; Hamila, R.; Gabbouj, M. Convolutional Neural Networks for patient-specific ECG classification. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015. [Google Scholar]
  27. Manowska, A. Using the LSTM Network to Forecast the Demand for Electricity in Poland. Appl. Sci. 2020, 10, 8455. [Google Scholar] [CrossRef]
  28. Huang, W.; Nakamori, Y.; Wang, S.-Y. Forecasting stock market movement direction with support vector machine. Comput. Oper. Res. 2005, 32, 2513–2522. [Google Scholar] [CrossRef]
  29. MOTC. Available online: https://stat.motc.gov.tw/mocdb/stmain.jsp?sys=100&funid=defjsp# (accessed on 28 September 2020).
  30. Wang, Y. A multinomial logistic regression modeling approach for anomaly intrusion detection. Comput. Secur. 2005, 24, 662–674. [Google Scholar] [CrossRef]
  31. OpView Insight: Social Media Monitoring Tool. Available online: https://www.opview.com.tw/ (accessed on 15 August 2020).
  32. Lu, H.-K.; Yang, L.-W.; Lin, P.-C.; Yang, T.-H.; Chen, A.N. A study on adoption of bitcoin in Taiwan: Using Big Data Analysis of Social Media. In Proceedings of the 3rd International Conference on Communication and Information Processing, ICCIP ‘17, Tokyo, Japan, 24–26 November 2017. [Google Scholar]
  33. Hosmer Jr, D.W.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; John Wiley & Sons: Hoboken, NJ, USA, 2013; Volume 398. [Google Scholar]
  34. Livieris, I.E.; Pintelas, E.; Pintelas, P. A CNN–LSTM model for gold price time-series forecasting. Neural Comput. Appl. 2020, 32, 17351–17360. [Google Scholar] [CrossRef]
  35. Yenter, A.; Verma, A. Deep CNN-LSTM with combined kernels from multiple branches for IMDb review sentiment analysis. In Proceedings of the 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), New York, NY, USA, 19–21 October 2017; pp. 540–546. [Google Scholar]
  36. Sezer, O.B.; Ozbayoglu, A.M. Algorithmic financial trading with deep convolutional neural networks: Time series to image conversion approach. Appl. Soft Comput. 2018, 70, 525–538. [Google Scholar] [CrossRef]
  37. Kiranyaz, S.; Avci, O.; Abdeljaber, O.; Ince, T.; Gabbouj, M.; Inman, D.J. 1D Convolutional Neural Networks and Applications: A Survey. arXiv 2019, arXiv:1905.03554. [Google Scholar] [CrossRef]
  38. Ma, Y.; Han, R.; Wang, W. Prediction-Based Portfolio Optimization Models Using Deep Neural Networks. IEEE Access 2020, 8, 115393–115405. [Google Scholar] [CrossRef]
  39. Jie, Z.; YAN, J.-f.; Lu, Y.; Meng, W.; Peng, X. Customer Churn Prediction Model Based on LSTM and CNN in Music Streaming. DEStech Trans. Eng. Technol. Res. 2019, aemce, 254–261. [Google Scholar]
  40. Hand, D.; Christen, P. A note on using the F-measure for evaluating record linkage algorithms. Stat. Comput. 2018, 28, 539–547. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Research framework.
Figure 2. Summary of the predicted dataset.
Figure 3. The architecture of the CNN-LSTM model.
Figure 4. The accuracy of the classical, sentimental, and the CNN-LSTM models for six car brands.
Figure 5. The precision of the classical, sentimental, and the CNN-LSTM models for six car brands.
Figure 6. The recall of the classical, sentimental, and the CNN-LSTM models for six car brands.
Figure 6. The recall of the classical, sentimental, and the CNN-LSTM models for six car brands.
Applsci 12 01550 g006
Figure 7. The F1-score of the classical, sentimental, and the CNN-LSTM models for six car brands.
Figure 7. The F1-score of the classical, sentimental, and the CNN-LSTM models for six car brands.
Applsci 12 01550 g007
Figure 8. The precision of direction D of the classical, sentimental, and the CNN-LSTM models for six car brands.
Figure 8. The precision of direction D of the classical, sentimental, and the CNN-LSTM models for six car brands.
Applsci 12 01550 g008
Figure 9. The precision of direction U of the classical, sentimental, and the CNN-LSTM models for six car brands.
Figure 9. The precision of direction U of the classical, sentimental, and the CNN-LSTM models for six car brands.
Applsci 12 01550 g009
Figure 10. The precision of direction F of the classical, sentimental, and the CNN-LSTM models for six car brands.
Figure 10. The precision of direction F of the classical, sentimental, and the CNN-LSTM models for six car brands.
Applsci 12 01550 g010
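Table 1. The collected historical monthly car sales volumes (sales by brand).
No | Month | BMW | Lexus | Mazda | Benz | Toyota | VW
--- | --- | --- | --- | --- | --- | --- | ---
1 | 2014/1 | 1437 | 1207 | 691 | 1815 | 2471 | 897
2 | 2014/2 | 624 | 806 | 231 | 1033 | 957 | 551
3 | 2014/3 | 1294 | 971 | 477 | 1588 | 1653 | 1053
4 | 2014/4 | 1393 | 1039 | 452 | 1692 | 1943 | 1154
5 | 2014/5 | 1284 | 1310 | 466 | 1518 | 1783 | 1113
6 | 2014/6 | 1424 | 1131 | 319 | 1620 | 2220 | 1414
7 | 2014/7 | 1469 | 1481 | 618 | 1616 | 2631 | 959
8 | 2014/8 | 1190 | 593 | 325 | 1389 | 983 | 914
9 | 2014/9 | 1454 | 1193 | 444 | 1784 | 1615 | 1438
10 | 2014/10 | 1630 | 1334 | 298 | 1695 | 1645 | 1313
11 | 2014/11 | 1565 | 1094 | 434 | 1754 | 2031 | 795
12 | 2014/12 | 2135 | 1148 | 517 | 1796 | 2805 | 1329
13 | 2015/1 | 2040 | 787 | 1970 | 2086 | 2468 | 1095
14 | 2015/2 | 1016 | 1193 | 609 | 1219 | 1575 | 720
... | ... | ... | ... | ... | ... | ... | ...
24 | 2015/12 | 1718 | 1634 | 1859 | 2024 | 2721 | 773
25 | 2016/1 | 2045 | 1379 | 1697 | 2554 | 2003 | 980
... | ... | ... | ... | ... | ... | ... | ...
70 | 2019/10 | 1658 | 2233 | 1418 | 3037 | 5712 | 1118
71 | 2019/11 | 1461 | 2556 | 1553 | 2555 | 5120 | 1103
72 | 2019/12 | 1927 | 2348 | 1857 | 2721 | 4405 | 1399
* | 2020/1 | 1355 | 2687 | 1122 | 2490 | 5686 | 1232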
* The record of 2020/1 is for ensuing labeling.
Table 2. The collected daily online sentiment volume for BMW.
Date | Discussion Forums (P *, N, T) | Social Media (P, N, T) | Q&A Websites (P, N, T) | Blogs (P, N, T) | News (P, N, T)
2014/1/132189861320005211619
2014/1/231241284226000421019142
2014/1/335241019650000321510329
...
2014/1/292326109474331521931640
2014/1/30192087612700022928042
2014/1/31409732736321402413035
2014/2/1121559206564070141012
2014/2/2201811172270010134010
...
2014/2/28311291652902461134112
2014/3/1371811941227015519305
2014/3/2192085321601460116011
2014/3/3171792612431146023212498
2014/3/43340169816510016021162267
...
2019/12/28406524495299670003048625154
2019/12/294431138104701483000001241348
2019/12/30383613823044175800020349890
2019/12/3118198632040191100030348065
* P: Positive sentiment volume, N: Negative sentiment volume, T: Total sentiment volume including the volume of positive, negative, and neutral sentiments.
Table 3. The labeling results of car sales movement directions for Table 1 (each brand cell gives sales followed by label).
No | Month | BMW | Lexus | Mazda | Benz | Toyota | VW
--- | --- | --- | --- | --- | --- | --- | ---
1 | 2014/1 | 1437 D | 1207 D | 691 D | 1815 D | 2471 D | 897 D
2 | 2014/2 | 624 U | 806 U | 231 U | 1033 U | 957 U | 551 U
3 | 2014/3 | 1294 F | 971 F | 477 F | 1588 F | 1653 U | 1053 F
4 | 2014/4 | 1393 F | 1039 U | 452 F | 1692 D | 1943 F | 1154 F
5 | 2014/5 | 1284 U | 1310 D | 466 D | 1518 F | 1783 U | 1113 U
6 | 2014/6 | 1424 F | 1131 U | 319 U | 1620 F | 2220 U | 1414 D
7 | 2014/7 | 1469 D | 1481 D | 618 D | 1616 D | 2631 D | 959 F
8 | 2014/8 | 1190 U | 593 U | 325 U | 1389 U | 983 U | 914 U
9 | 2014/9 | 1454 U | 1193 U | 444 D | 1784 F | 1615 F | 1438 F
10 | 2014/10 | 1630 F | 1334 D | 298 U | 1695 F | 1645 U | 1313 D
11 | 2014/11 | 1565 U | 1094 F | 434 U | 1754 F | 2031 U | 795 U
12 | 2014/12 | 2135 F | 1148 D | 517 U | 1796 U | 2805 D | 1329 D
13 | 2015/1 | 2040 D | 787 U | 1970 D | 2086 D | 2468 D | 1095 D
14 | 2015/2 | 1016 U | 1193 F | 609 U | 1219 U | 1575 F | 720 U
... | ... | ... | ... | ... | ... | ... | ...
24 | 2015/12 | 1718 U | 1634 D | 1859 F | 2024 U | 2721 D | 773 U
25 | 2016/1 | 2045 D | 1379 D | 1697 D | 2554 D | 2003 D | 980 D
... | ... | ... | ... | ... | ... | ... | ...
70 | 2019/10 | 1658 D | 2233 U | 1418 F | 3037 D | 5712 D | 1118 F
71 | 2019/11 | 1461 U | 2556 F | 1553 U | 2555 F | 5120 D | 1103 U
72 | 2019/12 | 1927 D | 2348 U | 1857 D | 2721 F | 4405 U | 1399 D
* | 2020/1 | 1355 | 2687 | 1122 | 2490 | 5686 | 1232
* The record of 2020/1 is for ensuing labeling.
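The labels in Table 3 appear to describe how sales move from each month to the following month, which would explain why the 2020/1 record is kept for labeling 2019/12. The sketch below reproduces the BMW labels for 2014/1 to 2015/1 under one assumption that is not stated in this section: a relative change above +10% is labeled U, below -10% is labeled D, and anything in between is labeled F. The function name `label_directions` and the `threshold` parameter are hypothetical, and the treatment of a change of exactly 10% is a guess.

```python
def label_directions(sales, threshold=0.10):
    """Label each month by the relative change in sales to the next month.

    The +/-10% threshold is an assumption inferred from the visible rows
    of Table 3; it is not given in this section.
    """
    labels = []
    for current, following in zip(sales, sales[1:]):
        change = (following - current) / current
        if change > threshold:
            labels.append("U")   # up
        elif change < -threshold:
            labels.append("D")   # down
        else:
            labels.append("F")   # flat
    return labels


# BMW sales for 2014/1 to 2015/2, copied from Table 1.
bmw_sales = [1437, 624, 1294, 1393, 1284, 1424, 1469,
             1190, 1454, 1630, 1565, 2135, 2040, 1016]

# One label per month from 2014/1 to 2015/1 (the last value only serves
# as the "next month" for 2015/1).
print("".join(label_directions(bmw_sales)))
```

The last element of the input only provides the next-month value, so a series of n months yields n - 1 labels.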
Table 4. The dataset of the classical model for BMW.
* The records of 2013 are for ensuing predictor variables.
Table 5. The dataset of the sentimental model for BMW.
No (i) | Month | Historical Sales (xi1) | Same-Month-Last-Year Label (xi2) | Month Number (xi3) | Discussion Forums P (xi4) | Discussion Forums N (xi5) | Discussion Forums T (xi6) | Social Media P (xi7) | Social Media N (xi8) | Social Media T (xi9) | Q&A Websites P (xi10) | Q&A Websites N (xi11) | Q&A Websites T (xi12) | Blogs P (xi13) | Blogs N (xi14) | Blogs T (xi15) | News P (xi16) | News N (xi17) | News T (xi18)
12014/11437D11078725389621817715032088080493264511621179
22014/2624U21043895421427413014761615127865032335689858
32014/31294U393779937632171691406122110163262763601751052
42014/41393F410348434003388291290918159460252634231471179
52014/51284F5155596050826863873280121269123333953962201194
62014/61424U61221915433636830123539890170315104861061060
72014/71469D79427923767640677427911973181365404201461195
82014/81190U89651013438158439337681457115175294524133421555
92014/91454F995588140193722752914102910119655520307158963
102014/101630U1010908274171323261223761546216365473902741243
112014/111565U1111941189528536557027191714792475255966110663361
122014/122135D1211159564547289198222216481514948313224422261503
702019/101658U101008100952762839266824048013198797805493118
712019/111461F11118516976615276518262633500036127413089053536
722019/121927F12104713235040451828583330500056129715218303705
Table 6. The confusion matrix for car sales movement direction prediction (number of predictions for car sales movement direction *).
Actual \ Predicted | D | F | U | Total
--- | --- | --- | --- | ---
D | DD | DF | DU | A(D)
F | FD | FF | FU | A(F)
U | UD | UF | UU | A(U)
Total | P(D) | P(F) | P(U) | N
* DD, FF, UU: The number of correct predictions for car sales movement directions D, F, and U, respectively; FD: The number of actual directions F predicted as direction D; UD: The number of actual directions U predicted as direction D; DF: The number of actual directions D predicted as direction F; UF: The number of actual directions U predicted as direction F; DU: The number of actual directions D predicted as direction U; FU: The number of actual directions F predicted as direction U.
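Using the notation of Table 6, the standard per-direction definitions of the performance indices can be written as follows. This is a sketch of the usual textbook formulas, with n_{cc} standing for the diagonal entry DD, FF, or UU; how the per-direction values are averaged into a single precision, recall, or F1-score is not shown in this section and is left open here.

```latex
\[
\text{Accuracy} = \frac{DD + FF + UU}{N}
\]
\[
\text{Precision}(c) = \frac{n_{cc}}{P(c)}, \qquad
\text{Recall}(c) = \frac{n_{cc}}{A(c)}, \qquad
F1(c) = \frac{2\,\text{Precision}(c)\,\text{Recall}(c)}{\text{Precision}(c) + \text{Recall}(c)},
\qquad c \in \{D, F, U\}
\]
```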
Table 7. The aggregate confusion matrix of the classical model for six car brands (number of predictions for car sales movement direction).
Actual \ Predicted | D | F | U | Total
--- | --- | --- | --- | ---
D | 14 | 3 | 4 | 21
F | 12 | 3 | 6 | 21
U | 15 | 2 | 13 | 30
Total | 41 | 8 | 23 | 72
Table 8. The aggregate confusion matrix of the sentimental model for six car brands (number of predictions for car sales movement direction *).
Actual \ Predicted | D | F | U | Total
--- | --- | --- | --- | ---
D | 10 | 4 | 7 | 21
F | 5 | 6 | 10 | 21
U | 1 | 8 | 21 | 30
Total | 16 | 18 | 38 | 72
* DD, FF, UU: The number of correct predictions for car sales movement directions D, F, and U, respectively; FD: The number of actual directions F predicted as direction D; UD: The number of actual directions U predicted as direction D; DF: The number of actual directions D predicted as direction F; UF: The number of actual directions U predicted as direction F; DU: The number of actual directions D predicted as direction U; FU: The number of actual directions F predicted as direction U.
Table 9. The aggregate confusion matrix of the CNN-LSTM model for six car brands (number of predictions for car sales movement direction *).
Actual \ Predicted | D | F | U | Total
--- | --- | --- | --- | ---
D | 14 | 3 | 4 | 21
F | 3 | 10 | 8 | 21
U | 1 | 3 | 26 | 30
Total | 18 | 16 | 38 | 72
* DD, FF, UU: The number of correct predictions for car sales movement directions D, F, and U, respectively; FD: The number of actual directions F predicted as direction D; UD: The number of actual directions U predicted as direction D; DF: The number of actual directions D predicted as direction F; UF: The number of actual directions U predicted as direction F; DU: The number of actual directions D predicted as direction U; FU: The number of actual directions F predicted as direction U.
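The aggregate confusion matrices in Tables 7 to 9 can be turned into performance indices with a few lines of code. The sketch below computes overall accuracy and per-direction precision, recall, and F1-score from a 3x3 matrix whose rows are actual directions and whose columns are predicted directions. The names `summarize`, `TABLE_7`, `TABLE_8`, and `TABLE_9` are hypothetical, and the per-direction values are not combined into a single score because the averaging scheme behind the reported indices is not given in this section.

```python
LABELS = ["D", "F", "U"]

# Rows: actual D, F, U. Columns: predicted D, F, U (from Tables 7-9).
TABLE_7 = [[14, 3, 4], [12, 3, 6], [15, 2, 13]]   # classical model
TABLE_8 = [[10, 4, 7], [5, 6, 10], [1, 8, 21]]    # sentimental model
TABLE_9 = [[14, 3, 4], [3, 10, 8], [1, 3, 26]]    # CNN-LSTM model


def summarize(cm):
    """Return (accuracy, {label: (precision, recall, f1)}) for a square matrix."""
    size = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(size)) / total
    per_class = {}
    for i, label in enumerate(LABELS):
        predicted = sum(cm[r][i] for r in range(size))  # column total P(c)
        actual = sum(cm[i])                             # row total A(c)
        precision = cm[i][i] / predicted if predicted else 0.0
        recall = cm[i][i] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = (precision, recall, f1)
    return accuracy, per_class


for name, cm in [("classical", TABLE_7),
                 ("sentimental", TABLE_8),
                 ("CNN-LSTM", TABLE_9)]:
    accuracy, per_class = summarize(cm)
    print(f"{name}: accuracy = {accuracy:.2%}")
    for label, (p, r, f1) in per_class.items():
        print(f"  {label}: precision = {p:.2f}, recall = {r:.2f}, F1 = {f1:.2f}")
```

Run on these matrices, the accuracies come out at 41.67%, 51.39%, and 69.44% for the classical, sentimental, and CNN-LSTM models, which agrees with the accuracy range quoted in the abstract up to rounding.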
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ou-Yang, C.; Chou, S.-C.; Juan, Y.-C. Improving the Forecasting Performance of Taiwan Car Sales Movement Direction Using Online Sentiment Data and CNN-LSTM Model. Appl. Sci. 2022, 12, 1550. https://doi.org/10.3390/app12031550