New Energy Vehicle Consumer Demand Mining Research Based on Fusion Topic Model: A Case in China

Wang, Xiaoguang; Lv, Tao; Fan, Lei

doi:10.3390/su14063316

by Xiaoguang Wang ¹,², Tao Lv ¹,* and Lei Fan ²

¹ School of Economics and Management, China University of Mining and Technology, Xuzhou 221116, China
² School of Mathematics and Information Engineering, Lianyungang Normal College, Lianyungang 222000, China
* Author to whom correspondence should be addressed.

Sustainability 2022, 14(6), 3316; https://doi.org/10.3390/su14063316

Submission received: 10 February 2022 / Revised: 8 March 2022 / Accepted: 9 March 2022 / Published: 11 March 2022


Abstract: This study extracted the demand preference topic words of new energy vehicle consumers with a topic model, calculated the similarity between word vectors and topic keywords to expand the topic keywords, analyzed and compared the demand topics and feature expansion words of different car models, and summarized the demand differences among consumer groups. The results show that different consumer groups share some demand dimensions, such as new energy features and brand features, and differ in others, such as application, services, and professional performance. The research findings help consumers filter valuable information from online review data and help car companies objectively and accurately obtain consumer demands, develop more reasonable marketing strategies, and achieve healthy and sustainable corporate development.

Keywords: fusion model; consumer demand; LDA; Word2vec

1. Introduction

Research shows that most consumers browse online reviews before purchasing a product, and online reviews are a vital influence on consumers’ purchase decisions [1]. For expensive durable goods like cars, the high “mismatch cost” makes consumers more inclined to obtain valuable word-of-mouth information through online reviews to assist their purchase decisions. Efficiently mining information from complex online reviews is an essential tool for auto consumers to understand and grasp the performance of cars and form a comprehensive knowledge of auto products.

On the other hand, companies increasingly focus on building and managing online reviews, paying attention to online consumers’ voices, and identifying their needs. Studies have confirmed that including consumers in the product design process is more effective than simply treating them as buyers of products. In particular, the involvement of lead users can help companies grasp market demand trends in time and obtain more sources of product innovation [2]. A brand image conveyed by consumers’ word of mouth is more readily accepted and spread than one set by the company. A good company image can enhance consumers’ recognition of the company, which directly affects the market acceptance of the product and largely determines the company’s market share and future development prospects. These factors make online reviews a “bridge” between consumers and car companies and an essential tool for car companies to project their brand image and market their products.

The high price of cars, the high “mismatch cost”, and the information asymmetry have created a “rigid demand” for car consumers to find valuable information from online reviews. This “rigid demand” makes it possible for car companies to use online reviews for online marketing. Based on the fusion topic model, this study makes full use of online reviews as a “bridge” to help consumers filter out valuable information from online review data. On the other hand, it allows car companies to obtain objective and accurate information about consumers’ needs, improve production decisions, and develop more reasonable marketing strategies to achieve healthy and sustainable corporate development.

The remainder of this paper is structured as follows. Section 2 presents a literature review of studies on consumer demands. Section 3 introduces topic models that can indicate demand topics and demand attributes, and describes the methodology used in this study. Section 4 elaborates on the principles of data set selection and presents the experimental design and results analysis. We discuss the results in Section 5. Concluding remarks appear in Section 6.

2. Related Research

2.1. Consumer Demands

Consumer demand preference generally includes product demand preference and attribute demand preference; product demand preference is expressed as consumer satisfaction with the product, and attribute demand preference is defined as the sentiment orientation of consumers towards product attributes.

Al-Mahish et al. [3] estimated consumers’ demand for personal protecting products (PPP) from COVID-19. The results show a significant difference between men and women in the mean quantity of facemasks demanded, and a significant difference in the mean quantity demanded of facemasks, gloves, and hand sanitizer based on respondents’ level of education. The findings indicate that the quantities demanded of facemasks and gloves are sensitive to changes in consumers’ income. Baarsma et al. [4] studied the effect of the pandemic on demand for online grocery shopping. They found that local hospital admissions do not correlate with the variety of groceries ordered, but online search behavior does, suggesting that hoarding behavior is driven by the general perception and impact of the virus rather than local conditions. Wu et al. [5] proposed a novel model to identify consumers’ product demands and calculate their intensity based on various online shopping behaviors. The model considers several important factors that can improve the detection of associations among co-demanded products, including the time interval between two product demands from the same consumer, the popularity of each demanded product, and the number of product demands from each consumer. Park et al. [6] proposed that the level of network governance affects retailers’ level of unilateral control under uncertain consumer demand. The empirical results indicate that when the level of network governance is low, retailers tend to increase their unilateral control over suppliers as consumer demand uncertainty increases. On the other hand, retailers who perceive a high level of network governance may rely on unilateral governance to a lesser extent regardless of consumer demand uncertainty. Pallant et al. [7] identified four distinct segments: Non-Customizers, New Customizers, Active Customizers, and Lapsing Customizers, while studying their outcome considerations in terms of new product design. They found that varying patterns of customization preference and consumption among segments provide opportunities for retail and service providers to target customization offers more effectively.

Using the mobile phone as an example, Zhang et al. [8] verified the strong correlation between users’ review information on product attributes and the direction of product improvement, and showed that using online review information for product improvement is feasible and effective. Nikumanesh et al. [9] analyzed users’ positive and negative views in the product/service usage segment and proposed a model to improve product quality. Sun and Shen [10] used mobile phones on the online shopping platforms JD.com and Tmall.com as research objects, obtained the scores of consumer demand preference dimensions based on sentiment analysis, and analyzed and compared the composition of the consumer groups of four mobile phones. By text-mining the online review data of the new food channel on Tmall.com, Zhang [11] extracted the main factors of concern to consumers of agricultural products, which fall into four categories: product quality, price value, packaging and logistics, and customer service. Zhou and Zhang [12] proposed a method for extracting product attributes from online reviews and conducted experiments on online review corpora for food, mobile phones, and books. The results show that the technique can effectively discover the attributes of the evaluated objects and mine the fine-grained aspects of reviewed products. Dertwinkel-Kalt et al. [13] found that consumers in the online shop of a cinema are more likely to select tickets for a 3D movie when the 3D surcharge is shrouded, but they also drop out more often when the overall price is shown at the checkout.

2.2. Consumer Demands for the Automotive Industry

Among studies of consumer demands in the automotive industry, Singh et al. [14] present a text analytics framework that analyzes online reviews to explore how consumer-perceived negativity about the supply chain propagates over time and how it affects car sales. Their findings suggest that consumer-voiced negativity is highest for dealers and lowest for manufacturing and assembly related features. The proposed framework can help manufacturers recognize the critical influences on car sales cited by consumers and predict sales accurately. Hee [15] analyzed the effect of consumers’ perceived cost and consumption value of cars on their satisfaction. The study found that the consumption values experienced by consumers using hybrid cars fall into economic value, social value, and symbol of distinction value. The pre-purchase perceived cost of hybrid cars was fairly high, while the after-purchase adaptation cost scored lower. Galarraga et al. [16] calculated own- and cross-price elasticities of demand for cars with efficiency labels on the Spanish car market. The results show that if consumers are concerned about the absolute energy performance of cars independently of other attributes, and thus pay attention to absolute labeling, demand for more efficient cars is more elastic than demand for less efficient cars. If consumers choose the car segment first and then the energy performance, using the relative label, the opposite result is found. Alganad et al. [17] examined the role of conditional value in the green automotive industry. Fuel prices were the most significant predictor of Malaysian consumers’ attitudes towards and intention to purchase green cars, followed by environmental consequences and government policy. However, retail sales promotions did not show a significant effect on either consumers’ attitudes or their intentions. Using a large consumer panel data set of credit and employment information, Jiang et al. [18] found that receiving a bonus increases auto loan demand by 21%. By comparing consumers with and without bonuses, the authors found that bonus payments lead to both demand expansion and demand shifting in auto loans.

Ni et al. [19] used association rules to identify customer demands for quality in the automotive industry, mainly including purchase cost, maintenance cost, material quality, service quality, and management quality. Yang et al. [20] used a modified Kano model to analyze gender-specific customer demands for automotive services. The results showed that male customers perceived the service environment as a non-differentiated customer demand, whereas female customers perceived it as a primary demand. Assuming that the sentiment distribution depends on the aspect distribution and the object distribution, Wang [21] constructed a sentiment classification model based on a joint model and used automobile word-of-mouth data to analyze the sentiment classification and review opinions of automobile consumers. To improve the performance of machine learning classification algorithms, Huang [22] compared three sentiment classification methods, namely naive Bayes, support vector machine, and decision tree, and then carried out a data mining analysis of online automobile reviews. Based on this, a car review sentiment mining system was developed to efficiently acquire positive and negative consumer reviews from car word-of-mouth data.

The discussion above indicates that extant studies have produced valuable insights and results. However, the existing methods still have some limitations. Current research on consumer demand mainly focuses on generalizing product attributes. Studies that narrow the scope to consumer demand in the automotive industry concentrate on sentiment orientation calculation and sentiment analysis of consumer reviews. In addition, related studies in the automotive sector have mainly focused on consumer demand preferences themselves while ignoring the influence of vehicle model characteristics on those preferences. Because of this, this study attempts to build a research model for automotive consumers’ demand preferences and introduces the fusion topic model into the extraction of consumer demand topics and demand attribute words. Through topic extraction and word vector calculation, we identify consumer demand dimensions, reduce consumer information asymmetry, and assist consumers in making purchase decisions. The influence of model attributes on user preferences is also considered, to provide academic support for enterprises to enhance their brand image and improve their marketing strategies in a targeted manner.

3. Methodology and Model

3.1. Demand Topic Extraction Based on the Topic Model

The topic model based on Latent Dirichlet Allocation (LDA) occupies a crucial position in data mining tasks such as text sentiment classification and information extraction, and is commonly used to mine latent topic information from corpora in big data environments. The model was proposed by Blei et al. in 2003 [23]. At its core is a three-layer Bayesian probabilistic model with a structure of documents, topics, and words, which forms document-topic and topic-word probability distributions. The LDA topic model can extract the deep semantic relationships between terms and documents and effectively extract the hidden topics in large-scale document sets and corpora. It is one of the most widely used text topic extraction models. The generation formula is as follows.

$P (w_{i} \mid d_{j}) = \sum_{s = 1}^{K} P (w_{i} \mid k = s) \times P (k = s \mid d_{j})$

(1)

where $P (w_{i} \mid k = s)$ denotes the probability that word $w_{i}$ belongs to topic $s$, and $P (k = s \mid d_{j})$ denotes the probability that topic $s$ is in text $d_{j}$ [24].
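To make Equation (1) concrete, the following minimal sketch mixes topic-word probabilities with a document's topic probabilities. The function name `word_given_doc` and the toy numbers are illustrative assumptions, not values from this study.

```python
def word_given_doc(topic_word, doc_topic):
    """Compute P(w_i | d_j) = sum_s P(w_i | k=s) * P(k=s | d_j) for every word i."""
    n_words = len(topic_word[0])
    return [
        sum(topic_word[s][i] * doc_topic[s] for s in range(len(doc_topic)))
        for i in range(n_words)
    ]

# Hypothetical toy model: K = 2 topics over a 3-word vocabulary.
topic_word = [
    [0.7, 0.2, 0.1],  # P(w | k=1)
    [0.1, 0.3, 0.6],  # P(w | k=2)
]
doc_topic = [0.25, 0.75]  # P(k | d_j) for one document

probs = word_given_doc(topic_word, doc_topic)
print(probs)
```

Because each row of `topic_word` and the vector `doc_topic` sum to one, the resulting word probabilities for the document also sum to one.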

Chen et al. [25] used the LDA topic model approach to analyze and understand the main viewpoints of online public opinion at the next level; Li et al. [26] used the LDA algorithm to construct an online opinion topic identification model and used Sina Weibo as an example to identify the viewpoint topics in online public opinion; Liu et al. [27] used the LDA topic model to mine and parse the text comment’s feature structure and semantic content. They also explored and tracked the evolutionary trends of topics; Guo et al. [28] used the LDA topic model to mine customer online reviews and found 19 dimensions related to the satisfaction of potential hotel customers. According to the results of literature research [29,30,31,32,33], LDA uses an efficient probabilistic inference algorithm to process large-scale data and excels at identifying the implied semantics of large sets of documents.

3.2. Demand Attribute Expansion Based on the Vector Model

Word2vec is a semantic computation tool that trains models through neural network algorithms. It transforms words into vectors and maps them into a high-dimensional space for vector operations, so that semantically related terms can be found. Word2vec consists of two models, CBOW and Skip-gram; the former predicts the probability of the current word from its context words, while the latter predicts the probability of the context words from the current word.
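The difference between the two models can be seen in the training examples each one draws from the same sentence. The sketch below only builds those examples and does not train a network; the helper names and the sample tokens are hypothetical.

```python
def context_of(tokens, position, window):
    """Return the words within `window` positions of tokens[position]."""
    left = tokens[max(0, position - window):position]
    right = tokens[position + 1:position + 1 + window]
    return left + right

def cbow_pairs(tokens, window=2):
    """CBOW view: (context words, current word) examples."""
    return [(context_of(tokens, i, window), tokens[i]) for i in range(len(tokens))]

def skipgram_pairs(tokens, window=2):
    """Skip-gram view: (current word, one context word) examples."""
    return [
        (tokens[i], c)
        for i in range(len(tokens))
        for c in context_of(tokens, i, window)
    ]

tokens = ["battery", "range", "is", "long"]  # hypothetical segmented review
cbow = cbow_pairs(tokens, window=1)
skipgram = skipgram_pairs(tokens, window=1)
print(cbow[1])      # context words on the left, current word on the right
print(skipgram[:2])
```

CBOW produces one example per position with the whole context as input, whereas Skip-gram produces one example per (current word, context word) pair.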

The steps to expand the demand attributes using Word2vec word vector model are as follows.

(1): Word vector training. After word segmentation, the comment corpus is used to train the Word2vec tool, which yields the word vector model and word vector representations of the chosen dimension. The context window size and the vector space dimension are essential parameters for model training; the larger the window, the more contextual information is involved and, in general, the better the vector representation.
(2): Topic word expansion. The trained model is used to calculate the cosine of the angle between the vectors of the words in the comment corpus and those of the given topic words. Several words with higher semantic similarity are selected as candidate feature words to expand the topic words. The semantic similarity is calculated as shown in Equation (2).

$\mathrm{Sim} (w_{i}, w_{j}) = \cos \theta = \frac{\sum_{i = 1}^{n} (x_{i} y_{i})}{\sqrt{\sum_{i = 1}^{n} {(x_{i})}^{2}} \times \sqrt{\sum_{i = 1}^{n} {(y_{i})}^{2}}}$

(2)

where $\mathrm{Sim} (w_{i}, w_{j})$ denotes the word vector cosine similarity between word $w_{i}$ and word $w_{j}$ [34], and $x_{i}$ and $y_{i}$ are the components of the two words’ $n$-dimensional vectors.
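A minimal sketch of Equation (2) and of the expansion step follows. It assumes word vectors are already available as plain lists; the function names, the tiny 3-dimensional vectors, and the vocabulary are hypothetical and stand in for a trained Word2vec model.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_x * norm_y)

def most_similar(topic_word, vectors, top_n=2):
    """Rank all other words by cosine similarity to `topic_word`."""
    target = vectors[topic_word]
    scored = [
        (word, cosine_similarity(target, vec))
        for word, vec in vectors.items()
        if word != topic_word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical vectors; real ones would come from the trained model.
vectors = {
    "battery": [1.0, 0.0, 0.0],
    "charging": [0.9, 0.1, 0.0],
    "range": [0.6, 0.8, 0.0],
    "seat": [0.0, 0.0, 1.0],
}
expansion = most_similar("battery", vectors, top_n=2)
print([word for word, _ in expansion])
```

The words returned for a topic word play the role of the candidate feature words described in step (2).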

3.3. Demand Preference Mining Based on Fusion Topic Model

The LDA model focuses on the co-occurrence of documents and words, extracts deep semantic relationships between terms and documents, and effectively extracts the implicit topics in the corpus. Its shortcoming is that it does not consider the contextual relationships between words. The Word2vec model, on the other hand, focuses on the co-occurrence of context and words and describes the relationships between words according to their context. The strengths and weaknesses of the two models for semantic analysis thus complement each other, which forms the basis for constructing the fusion topic model in this study.

This study collects comment data from online automotive forums and generates an online comment corpus after pre-processing, including word segmentation and stop-word removal. First, we extracted the topic words of new energy vehicle consumer demand preference with the LDA model. We then used the trained Word2vec model to calculate the similarity between word vectors and topic words and expanded the demand attributes. Combining the two in a fusion topic model is intended to improve the accuracy and stability of topic mining. The algorithm steps are as follows.

Input: pre-processed comment corpus, set the number of demand preference topics

Output: demand topic attribute expansion words

Algorithm:

(1) Define the set of documents $D = (d_{1}, d_{2}, d_{3}, \dots, d_{n})$. $\alpha$ and $\beta$ are the prior parameters of the Dirichlet distributions; $\theta$ is the multinomial distribution of topics in the documents, which follows a Dirichlet prior with hyperparameter $\alpha$; and $\phi$ is the multinomial distribution of words in the topics, which follows a Dirichlet prior with hyperparameter $\beta$. Train the LDA model and obtain the model parameters $\theta$ and $\phi$.

(2) Train the Word2vec model and set the word vector length.

(3) For each document $d_{i}$ in the training set, $d_{i} \in (d_{1}, d_{2}, d_{3}, \dots, d_{n})$:

Obtain the topic distribution of document $d_{i}$.
Obtain the word vectors of the top $n$ topic words.
Select the topic $k_{m}$ with the highest probability in $d_{i}$, take the first $k$ words under that topic, $(a_{1}, a_{2}, a_{3}, \dots, a_{k})$, and their probabilities $p_{1}, p_{2}, p_{3}, \dots, p_{k}$, and normalize the probability values as shown in Equation (3), where $q_{i}$ is the value of $p_{i}$ after normalization.

$q_{i} = \frac{p_{i}}{\sum_{a = 1}^{k} p_{a}}$

(3)

Use the Word2vec model to obtain the word vector of each word, $(A_{1}, A_{2}, A_{3}, \dots, A_{k})$, and compute the weighted sum of the $k$ word vectors to obtain the topic extension feature vector, as shown in Equation (4), where $B_{i}$ is the topic extension feature vector of document $d_{i}$.

$B_{i} = \sum_{b = 1}^{k} q_{b} \times A_{b}$

(4)
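Equations (3) and (4) can be sketched in a few lines. The example assumes the top-$k$ topic-word probabilities and their word vectors have already been obtained from the trained LDA and Word2vec models; the function names and the toy values are hypothetical.

```python
def normalize(probs):
    """Equation (3): q_i = p_i / sum_a p_a."""
    total = sum(probs)
    return [p / total for p in probs]

def topic_feature_vector(probs, word_vectors):
    """Equation (4): B = sum_b q_b * A_b, a probability-weighted sum of word vectors."""
    weights = normalize(probs)
    dim = len(word_vectors[0])
    return [
        sum(q * vec[j] for q, vec in zip(weights, word_vectors))
        for j in range(dim)
    ]

# Hypothetical top-3 words of the dominant topic of one document.
p = [0.06, 0.03, 0.01]                    # topic-word probabilities p_1..p_3
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # their 2-dimensional word vectors

q = normalize(p)                 # approximately [0.6, 0.3, 0.1]
B = topic_feature_vector(p, A)   # approximately [0.7, 0.4]
print(q, B)
```

Normalizing first keeps the weights comparable across topics whose top-$k$ words cover different shares of the topic's probability mass.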

4. Results

4.1. Data Sources

According to the report “Economic Operation of the Automotive Industry in December 2020” published by the Ministry of Industry and Information Technology [35], China’s auto sales reached 25.311 million units in 2020, ranking first in the world for 12 consecutive years. Sales of new energy vehicles reached 1.367 million units, making them a new hot spot in the auto market. A new energy vehicle is a vehicle that uses unconventional vehicle fuel, or uses conventional fuel with a new type of power unit, and integrates advanced power control and drive technologies, featuring new technology, new structure, and new principles. New energy vehicles can be broadly divided into four categories: pure electric vehicles, hybrid vehicles, fuel cell vehicles, and solar electric vehicles [36].

This study takes the online forum of Auto Home (see www.autohome.com (accessed on 9 February 2022)) as the research object. Auto Home is the world’s most visited online car forum. Consumers can post reviews of cars in terms of space, power, handling, and fuel consumption on the “Word of Mouth” channel. Given the current interest in new energy vehicles, we selected the “New Energy” section of the “Word of Mouth” channel. The channel divides new energy vehicles into four groups by price level: “below CNY100,000 (USD15,830)”, “CNY100,000–200,000 (USD15,830–31,660)”, “CNY200,000–300,000 (USD31,660–47,490)”, and “over CNY300,000 (USD47,490)”. We defined these four groups as Group A, Group B, Group C, and Group D, collected online reviews for each, and explored the demand preference characteristics of new energy vehicle consumers at different price levels. Because Auto Home is a Chinese-language website, we collected 18,484 Chinese online reviews of the four groups with a self-coded crawler, including 3910 reviews of Group A, 3157 reviews of Group B, 7141 reviews of Group C, and 4276 reviews of Group D. After stop-word removal and word segmentation, the four corpora were generated for subsequent analysis.

4.2. Demand Topic Extraction

Determining the number of topics in the LDA model addresses the applicability of the mining algorithm. The LDA topic model is generally considered optimal when the average similarity between topic structures is smallest. Too many topics are prone to overfitting and generate many topics without obvious categorical semantic information. Too few topics do not fully reflect the topic dimensions and generate coarse-grained topics, which strongly affects classification. Perplexity, a measure for evaluating probabilistic models, is a mainstream method for determining the optimal number of topics. It generally decreases as the number of latent topics increases, and a smaller perplexity value indicates a stronger generative ability of the topic model [37]. Therefore, in this study, a relatively small perplexity value combined with a relatively small number of topics is selected as the optimal model parameters for LDA topic model training [38], and the calculation is shown in Equation (5).

$\mathrm{perplexity} (D) = \exp \left\{ - \frac{\sum_{m = 1}^{M} \log P (W_{m})}{\sum_{m = 1}^{M} N_{m}} \right\}$

(5)

where $D$ denotes the document set (all words in all documents), $M$ denotes the number of documents, $W_{m}$ denotes the words in document $m$, $P (W_{m})$ denotes the probability of the words in document $m$, and $N_{m}$ represents the number of words in document $m$.
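The perplexity calculation can be sketched as follows, assuming the conventional form with a negative sign in the exponent, so that a model assigning higher probability to the documents gets a lower perplexity. The function name and inputs are hypothetical; in practice the per-document log-likelihoods would come from the trained LDA model.

```python
import math

def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(- sum_m log P(W_m) / sum_m N_m) over a set of documents."""
    total_log_prob = sum(doc_log_likelihoods)
    total_words = sum(doc_lengths)
    return math.exp(-total_log_prob / total_words)

# Sanity check with a uniform model over a hypothetical 50-word vocabulary:
# every word has probability 1/50, so the perplexity should equal 50.
vocab_size = 50
doc_lengths = [4, 6]
doc_log_likelihoods = [n * math.log(1 / vocab_size) for n in doc_lengths]

value = perplexity(doc_log_likelihoods, doc_lengths)
print(round(value, 6))
```

The uniform case is a useful check because perplexity can be read as the effective number of equally likely word choices the model faces.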

The perplexity values of Groups A, B, C, and D were calculated in Python. The curves showed that the perplexity values of the four data groups were smallest when the number of topics was 12, 10, 12, and 7, respectively. In general, there is no “perfect result” for classifying topics. It is necessary to first extract topics using the values suggested by the perplexity method and then determine the final number of topics according to their readability and interpretability. In this study, after extracting topics according to the perplexity results, we found that some topics overlapped or were too similar. We filtered, merged, and organized the topics by manually reading the comment statements in which the keywords appeared. We finally obtained 4–5 topics per group, together with the keywords corresponding to each topic. These topics have explicit content, little similarity, and low overlap. Given the article’s length, only the topics extracted from each group are summarized, as shown in Table 1.
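The first, automatic part of this selection can be sketched as picking the candidate topic number with the lowest perplexity and breaking ties in favor of fewer topics. The function name is hypothetical and the scores below are made-up numbers for illustration, not the perplexity values obtained in this study; the manual merging step is not modeled.

```python
def best_topic_count(perplexity_by_k):
    """Return the topic number with the lowest perplexity; ties go to fewer topics."""
    return min(perplexity_by_k, key=lambda k: (perplexity_by_k[k], k))

# Made-up perplexity values for candidate topic numbers (illustration only).
scores = {5: 812.4, 7: 790.1, 10: 779.3, 12: 771.8, 15: 776.0}

chosen = best_topic_count(scores)
print(chosen)
```

The chosen value is only a starting point: as described above, the extracted topics still need to be read, merged, and reduced by hand.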

4.3. Demand Attribute Expansion

Although the topics of each group of car reviews have been summarized according to the model, the keywords corresponding to each group of topics do not fit the topics perfectly. In other words, the LDA algorithm cannot obtain a “perfect” topic classification result because it does not consider contextual relationships. Given this, we vectorized the review corpus and found the words most semantically similar to specific terms in the word vector spaces of the four groups; these words constitute the feature expansion words of the demand topics. The steps are as follows.

(1): Word frequency statistics were computed separately on the comment corpus of each of the four groups, and high-frequency words were retained.
(2): For each topic of each group in Table 1, the two high-frequency words most relevant to the topic were manually selected from the word frequency statistics.
(3): The words most semantically similar to the high-frequency words were found in the word vector space and formed the feature expansion words of the consumer demand topics. The aggregated results are shown in Table 2, Table 3, Table 4 and Table 5.

In these tables, Group A has five demand topics and 82 topic feature expansion words, Group B has four demand topics and 57 topic feature expansion words, Group C has four demand topics and 68 topic feature expansion words, and Group D has five demand topics and 75 topic feature expansion words.
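Step (1) of this procedure, the word frequency statistics, can be sketched with the standard library. The function name, the `min_count` threshold, and the sample reviews are hypothetical; step (2) is manual, and step (3) would rank candidate words by the cosine similarity of Equation (2).

```python
from collections import Counter

def high_frequency_words(tokenized_reviews, min_count=2):
    """Count tokens across all reviews and keep those seen at least `min_count` times."""
    counts = Counter(tok for review in tokenized_reviews for tok in review)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]

# Hypothetical segmented reviews after stop-word removal.
reviews = [
    ["battery", "range", "commuting"],
    ["battery", "charging", "range"],
    ["battery", "seat"],
]

frequent = high_frequency_words(reviews, min_count=2)
print(frequent)
```

The retained words are the pool from which the two seed words per topic are chosen by hand before the similarity search.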

5. Discussions

Vehicle model characteristics are an essential dimension in enterprises’ differentiated segmentation strategies and personalized product design, and cannot be ignored when identifying users’ demand preferences. Comparing the demand topics and topic feature words of the online reviews of Groups A, B, C, and D, this study found the following.

(1): One of the topic dimensions discussed by all four groups is “New Energy Features”. The reviews collected in this study are all about new energy vehicles on the market, so the discussion naturally focuses on new energy features, such as the battery, electric consumption, hybrid, blade (referring to the blade battery, a battery product released by BYD on 29 March 2020), and so on. It is worth noting that more new names or proper nouns closely related to new energy vehicles appear in the demand feature words of Group C, such as EV (electric vehicle), HEV (hybrid electric vehicle), PHEV (plug-in hybrid electric vehicle), ECO (an energy-saving mode), X-pedal (energy-saving driving mode), NEDC (a range standard), etc. These feature words also suggest that consumers who buy models in the CNY200,000–300,000 (USD31,660–47,490) range are more familiar with new energy vehicles and have more specialized background knowledge about them.
(2): Another topic dimension shared by all four groups is “Basic Features”. Keywords such as “acceleration”, “braking”, and “endurance” reflect the basic features of the vehicle that are common to all consumers. As a kind of mobility tool, the essential characteristics of the car, such as material, speed, and configuration, are common topics that all people are concerned about and can easily stimulate discussion. As a new market hotspot, new energy vehicles are becoming more accepted in the consumer community. It is very natural for consumers to pay attention to the basic features of new energy vehicles as they do gasoline vehicles. As a car company, in addition to highlighting new energy features, it is more important to focus on its products’ overall quality and function.
(3): The topic dimension common to Groups A and B is “Brand Features”, indicating that consumers who buy models below CNY100,000 (USD15,830) and at CNY100,000–200,000 (USD15,830–31,660) pay more attention to the brand of the car, which is also confirmed by the presence of multiple car company or model names in the feature expansion words. “Subjective Experience” is a common topic in Groups C and D, indicating that consumers who buy models at CNY200,000–300,000 (USD31,660–47,490) and above CNY300,000 (USD47,490) focus on the consumer experience of new energy vehicles and tend to express their subjective feelings and sensations in the process of using them. In comparison, consumers who choose models above CNY300,000 (USD47,490) have a richer and more diverse range of emotional expressions.
(4): One of the topic dimensions specific to Group A models is “Application”. This indicates that consumers who buy models under CNY100,000 (USD15,830) are more concerned about how new energy vehicles are used, which shows up concretely in the corresponding feature expansion words, such as “commuting”, “grocery shopping”, “touring around”, and “shopping”. Another topic dimension is “Personalized Configuration”. In the discussion of entry-level new energy models, personalized configuration information such as “mobile phone”, “Bluetooth”, and “induction” reflects the shared demand preferences of consumers.
(5): The topic dimension specific to Group C models is “Horizontal Comparison”. By searching the online comments from consumers who bought models between CNY200,000 (USD31,660) and CNY300,000 (USD47,490), we found that the keywords “foreign”, “Tenna”, and “NIO auto” in the topic feature expansion words mainly refer to comparisons between the purchased model and other car brands or models. The discussion topics specific to Group D models are “Service” and “Professional Performance”. “Service” corresponds to keywords such as “get a car”, “change of battery”, “test driving”, “insurance”, etc. After searching the corresponding comment corpus, we found that mid-to-high-end consumers who buy models over CNY300,000 (USD47,490) pay more attention to the service content and quality during the purchase and use process. At the same time, the words “all aluminum”, “high frequency”, and “caliper” corresponding to “Professional Performance” indicate that mid-to-high-end consumers have more professional knowledge about cars and care more about the professional performance of new energy vehicles, which is also reflected in other demand feature expansion words.

6. Conclusions

Mining analysis and knowledge discovery from massive data is a common concern of academia and industry in the big data era. With the successful development of online forums, consumers share information on product demands, purchase preferences, and consumption experiences on major automotive platforms, which objectively provides a new online marketing basis for car companies. This study crawled 18,484 online consumer reviews of 37 new energy vehicles in four groups: “below CNY100,000 (USD15,830)”, “CNY100,000–200,000 (USD15,830–31,660)”, “CNY200,000–300,000 (USD31,660–47,490)”, and “over CNY300,000 (USD47,490)” in the “New Energy” section of the online forum of “Auto Home”. We used the LDA algorithm to construct a three-layer text probability model of “document–topic–word” and extracted and summarized the demand topics of new energy vehicle consumers who bought different models. Using the Word2vec algorithm and combining word vector similarity, we expanded the topic keywords and formed topic feature expansion words. We identified the focus features of online reviews about car consumption and the consumer demand dimensions hidden in the reviews. Finally, we analyzed and compared the demand topics and topic feature expansion words of different consumer groups and summarized the demand differences among them.

With the help of these research results, consumers can more easily understand the topics and priorities of each other’s discussions. Car companies, in turn, obtain information about consumer demands and feedback on product and service quality, so they can quickly adjust their business strategies, improve product and service quality, and achieve twice the result with half the effort.

Further research should compare customer demand for traditional fuel vehicles with that for new energy vehicles. On the one hand, traditional fuel vehicles are still sought after by consumers because of their endurance and service life; on the other hand, new energy vehicles are gradually becoming a new hot spot in the auto market because of their energy savings, environmental protection, low maintenance costs, and significant purchase discounts. Further study will focus on how consumers choose between fuel and new energy vehicles, the similarities and differences in consumer demand between the two types of cars, and what these differences imply for the marketing of automotive companies.

Author Contributions

Conceptualization, X.W. and T.L.; methodology, X.W.; software, X.W. and L.F.; validation, X.W., T.L. and L.F.; formal analysis, X.W.; investigation, X.W.; resources, X.W.; data curation, X.W. and L.F.; writing—original draft preparation, X.W.; writing—review and editing, X.W., T.L. and L.F.; visualization, X.W.; supervision, T.L.; project administration, X.W.; funding acquisition, X.W. and T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Foundation of China (Grant No. 20FGLB011), the Social Science Foundation Project of Lianyungang City (Grant No. 20LKT1016), the High-level Research Project of Lianyungang Normal College (Grant No. LSZGJB202004), and the Haiyan Project of Lianyungang City (Grant No. 2020).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Chevalier, A.; Mayzlin, D. The effect of word of mouth on sales: Online book reviews. J. Mark. Res. 2006, 43, 345–354.
Tuarob, S.; Tucker, S. Automated discovery of lead users and latent product features by mining large-scale social media networks. J. Mech. Des. 2015, 137, 071402.
Al-Mahish, M.; AlDossari, N.; Almarri, A. Consumer’s demand for Disinfectants and Protective Gear from COVID-19 infection in Al-Hofuf, Saudi. J. Infect. Dev. Ctries. 2021, 11, 1618–1624.
Baarsma, B.; Groenewegen, J. COVID-19 and the Demand for Online Grocery Shopping: Empirical Evidence from the Netherlands. Economist 2021, 4, 407–421.
Wu, J.; Wang, Y. Discovery of associated consumer demands: Construction of a co-demanded product network with community detection. Expert Syst. Appl. 2021, 178, 115038.
Park, M.; Kim, M.; Ryu, S. The relationship between network governance and unilateral governance in dynamic consumer demand. Ind. Mark. Manag. 2020, 84, 194–201.
Pallant, J.; Sands, S.; Karpen, I. Product customization: A profile of consumer demand. J. Retail. Consum. Serv. 2020, 54, 102030.
Zhang, H.; Rao, H.; Feng, J. Product innovation based on online review data mining: A case study of Huawei phones. Electron. Commer. Res. 2018, 3, 3–22.
Nikumanesh, E.; Bohlouli, M.; Fathi, M. Knowledge discovery from online customer reviews towards product improvement. In Proceedings of the 13th International Conference Applied Computing 2016 Proceedings, Mannheim, Germany, 28–30 October 2016; pp. 211–214.
Sun, B.; Shen, R. Product Demand Preference Discrimination and Customer Segmentation Based on Online Reviews: The Case of Smartphones. China Manag. Sci. Available online: https://doi.org/10.16381/j.cnki.issn1003-207x.2020.0164 (accessed on 6 February 2022).
Zhang, H. Factors influencing consumption satisfaction of fresh agricultural products in e-commerce: An exploratory analysis based on online reviews. Jiangsu Agric. Sci. 2019, 47, 4–8.
Zhou, Q.; Zhang, C. Fine-grained attribute extraction for online user reviews. J. Intell. 2017, 36, 484–493.
Dertwinkel-Kalt, M.; Koster, M.; Sutter, M. To buy or not to buy? Price salience in an online shopping field experiment. Eur. Econ. Rev. 2020, 130, 103593.
Singh, A.; Jenamani, M.; Thakkar, J.; Rana, N. Propagation of online consumer perceived negativity: Quantifying the effect of supply chain underperformance on passenger car sales. J. Bus. Res. 2021, 132, 102–114.
Hee, H. A Study on the Consumer Behaviors and Satisfaction Toward Hybrid Cars: Focused on Consumers’ Perceived Cost and Consumption Values. Korean J. Hum. Ecol. 2021, 5, 783–801.
Galarraga, I.; Kallbekken, S.; Silvestri, A. Consumer purchases of energy-efficient cars: How different labeling schemes could affect consumer response to price changes. Energy Policy 2020, 137, 111181.
Alganad, A.; Isa, N.; Fauzi, W. Boosting green cars retail in Malaysia: The influence of conditional value on consumer’s behavior. J. Distrib. Sci. 2021, 7, 87–100.
Jiang, Z.; Dennis, Z.; Tat, C. How Do Bonus Payments Affect the Demand for Auto Loans and Their Delinquency? J. Mark. Res. 2021, 3, 476–496.
Ni, M.; Xu, X.; Deng, S. Extended QFD and data-mining-based methods for supplier selection in mass customization. Int. J. Comput. Integr. Manuf. 2007, 20, 280–291.
Yang, Y.; Zeng, Z.; Xu, J. Customer demand classification for automobile service based on analytical Kano model. Boletín Técnico 2017, 55, 59–67.
Wang, S.; Li, D.; Li, Y. Sentiment mining of product word-of-mouth data based on joint model. J. Tsinghua Univ. 2017, 9, 926–931.
Huang, H. Research and Application of Sentiment Classification for Online Reviews of Automobiles. Master’s Thesis, Harbin Institute of Technology, Harbin, China, 2013; pp. 167–168.
Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet Allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
Wu, Y.; Huang, K.; Wang, X. Method of Emotional Classification in Short Texts Combined with LDA Models. J. Chin. Comput. Syst. 2019, 10, 2082–2086.
Chen, X.; Gao, C. An LDA Topic Modeling Approach for Online Opinion Extraction. Lib. Intell. Work. 2015, 59, 21–26.
Li, Z.; Ding, S. A study on the identification of opinion topics on the Internet. Data Anal. Knowl. Discov. 2017, 1, 18–30.
Liu, S.; Peng, X. A study of learner topic mining for MOOC course reviews. Res. Electrochem. Teach. Learn. 2017, 38, 30–36.
Guo, Y.; Barnes, S.J.; Jia, Q. Mining meaning from online ratings and reviews: Tourist satisfaction analysis using latent Dirichlet allocation. Tour. Manag. 2017, 59, 467–483.
Ha, T.; Lee, J.; Lee, C.-H.; Coh, B. The Prediction of Long-Term Survival of Artificial Intelligence Patents Based on Deep-Learning and Latent Dirichlet Allocation Modeling. Int. Telecommun. Policy Rev. 2021, 1, 27–50.
Edison, H.; Carcel, H. Text data analysis using Latent Dirichlet Allocation: An application to FOMC transcripts. Appl. Econ. Lett. 2020, 1, 38–42.
Ozyurt, B.; Akcayol, M. A new topic modeling based approach for aspect extraction in aspect based sentiment analysis: SS-LDA. Expert Syst. Appl. 2021, 168, 114231.
Ekinci, E.; Omurca, S.I. Concept-LDA: Incorporating Babelfy into LDA for aspect extraction. J. Inf. Sci. 2020, 3, 406–418.
Joung, J.; Kim, H. Automated Keyword Filtering in Latent Dirichlet Allocation for Identifying Product Attributes From Online Reviews. J. Mech. Des. 2021, 8, 084501.
Zeng, Q.; Hu, X.; Li, C. Extracting Keywords with Topic Embedding and Network Structure Analysis. Data Anal. Knowl. Discov. 2019, 7, 52–60.
Ministry of Industry and Information Technology Announced the “December 2020 Economic Operation of the Automotive Industry”. Available online: https://www.miit.gov.cn/jgsj/zbys/gzdt/art/2021/art_2f2576adb8004b07ae2711df028a14b0.html (accessed on 6 February 2022).
Zhang, T.; Ma, C.; Yong, C. Development Status and Trends of New Energy Vehicles in China. In Proceedings of the 4th International Conference on Energy Science and Applied Technology (ESAT), Chongqing, China, 29–30 December 2018; p. 020012.
Guan, P.; Wang, Y. Research on the optimal topic number determination method of LDA topic model in scientific and technological intelligence analysis. Mod. Libr. Inf. Technol. 2016, 9, 42–50.
Zeng, Z.; Wang, J. Research on microblog rumor identification based on LDA and random forest: An example of 2016 haze rumor. J. Intell. 2019, 38, 89–96.

Table 1. Summary of review topics for each group of models.

Group	Topic 1	Topic 2	Topic 3	Topic 4	Topic 5
A (below CNY100,000/USD15,830)	New Energy Features	Basic Function	Brand Features	Application	Personalized Configuration
B (CNY100,000–200,000/USD15,830–31,660)	New Energy Features	Basic Function	Brand Features	Basic Configuration	-
C (CNY200,000–300,000/USD31,660–47,490)	New Energy Features	Basic Function	Subjective Experience	Horizontal Comparison	-
D (over CNY300,000/USD47,490)	New Energy Features	Basic Function	Subjective Experience	Services	Professional Performance

Table 2. Demand topic feature expansion words (Group A).

Demand Topics	Topic Feature Expansion Words
Topic 1 New Energy Features	快充 (fast charging), 慢充 (slow charging), 费电 (power consumption), 能耗 (energy consumption), 耗电 (electricity), 线板 (line board), 频繁的 (frequent), 小时 (hour), 一度 (kWh), 钟头 (clock), 用电 (electricity consumption), 一顿饭 (one meal), 半小时 (half hour), 折算 (commutation)
Topic 2 Basic Features	续航 (endurance), 座椅 (seat), 里程 (mileage), 高度 (overall height), 靠背 (seat back), 暖气 (heat), 调整 (adjust), 腰部 (lumbar), 高个子 (tall), 驾驶座 (driver’s seat), 下调 (down), 秋天 (fall), 春夏 (spring/summer), 主驾 (driver), 副驾驶 (passenger), 身高 (height), 膝盖 (knee), 驾驶室 (cab), 前排 (front row)
Topic 3 Brand Features	欧拉 (ORA), 黑猫 (Black Cat), 实用性 (utility), 应用性 (practical), 小型 (small), 车型 (model), 认可 (acceptance), 级别 (class), 定位 (positioning), 做工 (workmanship), 同级 (same class), Star, EV
Topic 4 Application	通勤 (commuting), 代步 (mobility tools), 买菜 (grocery shopping), 上下学 (going to school), 周边游 (touring around), 逛街 (shopping), 防护 (protection), 休息日 (rest day), 日常 (daily), 远门 (travel far), 放学 (after school), 县城 (county town), 家用 (household), 周末 (weekend), 短途 (excursions), 上班 (work), 十里 (ten miles), 接送 (transfer)
Topic 5 Personalized Configuration	手机 (mobile phone), 钥匙 (key), 链接 (link), 蓝牙 (Bluetooth), 感应 (induction), 遥控 (remote control), 连接 (connection), 机械的 (mechanical), 开门 (door opening), 手机信号 (mobile phone signal), 预热 (preheat), 流量 (mobile data), 导航 (navigation), 卡片 (card), 可乐 (coke), 制动液 (brake fluid), 电桩 (charger), 支持 (support)

Table 3. Demand topic feature expansion words (Group B).

Demand Topics	Topic Feature Expansion Words
Topic 1 New Energy Features	充电 (charging), 快充 (fast charging), 广汽 (GAC), 小时 (hours), 五毛 (50 cents), 冬天 (winter), 保值 (value preservation), 经济性 (economy), 慢充 (slow charging), 充满 (full), 费用 (cost)
Topic 2 Basic Features	续航 (endurance), 加速 (acceleration), 提速 (speed raising), 起步 (start), 超车 (overtaking), 油门 (accelerator), 综合油耗 (MPG), 推背 (pushback), 输出 (output), 仪表盘 (dashboard), 公里 (kilometers), 顿挫 (stutter), 经济性 (economy), 电机 (motor), 趴窝 (stalled), NEDC
Topic 3 Brand Features	小鹏 (XPeng auto), 比亚迪 (BYD auto), 价值 (value), 国产车 (domestic auto), 大众 (VW), 车型 (model), 青春 (youth), 山脉 (mountain range), 融合 (fusion), 时代 (era), 拉风 (fashionable), 口碑 (word of mouth), 价钱 (price), 中国 (China)
Topic 4 Basic Configuration	方向盘 (steering wheel), 空调 (AC), 方向 (direction), 助力 (power-assisted), 温度 (temperature), 转向灯 (turn signal), 手感 (feel), 指向 (pointing), 解锁 (unlocking), 得心应手 (handy), 操纵 (handling), 刹车 (brake), 陡坡 (steep hill), 远程 (remote), 电门 (electric door), OTA

Table 4. Demand topic feature expansion words (Group C).

Demand Topics	Topic Feature Expansion Words
Topic 1 New Energy Features	充电 (charge), 纯电的 (electric), 慢充 (slow charging), 快充 (fast charging), 名爵 (MG), 小时 (hours), 充电站 (charging station), 纯油 (gas), 电量 (power), 车库 (garage), 加油站 (gas station), 电力 (electricity), 过电 (overcharge), 里程 (mileage), 便携式 (portable), 充满 (full), NEDC, ECO, HEV, hevs, EV, PHEV, Xpedal
Topic 2 Basic Features	加速 (accelerate), 续航 (endurance), 提速 (speed raising), 爆发 (burst), 秒杀 (crush), 加速度 (acceleration), 后程 (rear range), 马力 (horsepower), 中后 (mid-rear), 电池容量 (battery capacity), 响应 (response), 破百 (100 mph), 爬坡 (grade ability), 牛米 (Nm), 工信部 (Ministry of Industry and Information Technology), 扭矩 (torque), PHEV
Topic 3 Subjective Experience	感受 (feeling), 体验 (experience), 乐趣 (fun), 过程 (process), 环境 (environment), 巨赞 (awesome), 飙车族 (street racers), 平稳 (smooth), 交通状况 (traffic conditions), 惊喜 (surprise), 风光 (scenery), 大气 (splendid), SP
Topic 4 Horizontal Comparison	国外 (foreign), 天籁 (Tenna), 人机 (human–machine), 陆风 (land wind), 蔚来 (NIO auto), 车企 (car company), 尚酷 (Scirocco), 青年 (youth), 板块 (board), 乔丹 (Jordan), 舒马赫 (Schumacher), 较劲 (contest), 热爱 (love), 横跨 (cross), 国有企业 (state-owned enterprise)

Table 5. Demand topic feature expansion words (Group D).

Demand Topics	Topic Feature Expansion Words
Topic 1 New Energy Features	电池 (battery), 电车 (tram), 电动车 (electric car), 衰减 (battery decay), 损耗 (power loss), 汽油车 (gasoline vehicle), 太安静 (too quiet), 里程 (mileage), 蔚来 (NIO auto), 纯电 (electric), 长途跋涉 (long haul), 汽油 (gas), 续航 (endurance)
Topic 2 Basic Features	加速 (accelerate), 续航 (endurance), 里程 (mileage), 运动版 (sport), 冬季 (winter), 超车 (overtaking), 提速 (boost), 起步 (start), 红绿灯 (stoplight), 长距离 (long distance), 推背 (pushback), 马力 (horsepower), 掉电 (battery decay), 起飞 (takeoff), 电机 (motor), 油门 (accelerator), 输出 (output)
Topic 3 Subjective Experience	感受 (feeling), 豪华 (luxury), 做工 (workmanship), 科技 (technology), 用料 (materials), 古典音乐 (classical music), 科幻 (science fiction), 现代 (modern), 高端 (high-end), 念念不忘 (nostalgia), 鲨鱼 (shark), 运动感 (sportiness), 新颖 (novel), 上档次 (upscale), 前卫的 (profashionable), 营造 (create)
Topic 4 Services	换电 (change of battery), 提车 (get the car), 终身 (lifetime), 联系 (contact), 一年 (one year), 下单 (order), 试驾 (test driving), 跟踪 (tracking), 权益 (rights), 销售 (sales), 铺设 (lay-up), 保险 (insurance), 接地 (grounding), 上牌 (licensing), 维修 (maintenance)
Topic 5 Professional Performance	全铝 (all aluminum), 卡钳 (caliper), 甲醛 (formaldehyde), 气袋 (air bag), 阻尼 (damping), 制动 (brake), 滤震 (filter), 弹簧 (spring), 布雷 (Bray), 铝合金 (aluminum alloy), 举升机 (lifter), 赛车 (racing car), 座舱 (cab), 高频 (high frequency)

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

