1. Introduction
In the context of the global energy transition and pressing environmental imperatives, electric vehicles (EVs) have become a pivotal driver of sustainable mobility and a key contributor to carbon neutrality. Among vehicle types, sport utility vehicles (SUVs) are particularly favored for their spacious interiors, versatility, and comfort [1]. Studies show that the market share of electric SUVs (ESUVs) in China rose from 45% to 55% between 2019 and 2021 [2], and projections suggest that ESUVs will surpass internal combustion engine (ICE) SUVs in the European Union by 2025, reaching nearly six million units by 2030 [3]. Moreover, ICE SUVs consume more fuel and produce higher emissions owing to their greater weight and aerodynamic drag [4], whereas electric propulsion can meet performance and space requirements while improving energy efficiency. ESUVs therefore hold considerable strategic value and growth potential in the evolving automotive market. As consumer decision-making becomes increasingly driven by emotional and experiential factors, and as core product attributes become more homogeneous, automotive styling, especially front-end design, has become a key point of differentiation and emotional branding [5]. The front end is the vehicle's visual focal point: it not only forms consumers' first impressions but also conveys brand identity and cultural value, thereby influencing emotional resonance and purchasing behavior. Within this context, shape imagery, defined as the symbolic and aesthetic meaning embedded in form, plays a vital role in fostering emotional connection [6]. Furthermore, because electric powertrains have lower cooling-airflow requirements than combustion engines, electrification reduces the need for large front grilles, granting designers greater creative freedom in ESUV front-end design [7]. Thus, accurately capturing and predicting consumers' emotional preferences and integrating them into front-end styling is of both theoretical and practical significance for advancing emotion-driven EV design.
Kansei Engineering (KE) [8] is an established method for analyzing the relationship between user emotion and product form and has been widely applied in automotive styling [9,10]. It typically involves three stages: feature decomposition, affective information extraction, and predictive model construction [11]. However, traditional KE studies often rely on interviews, surveys, focus groups, or basic text analysis, which struggle to capture diverse, multilayered affective needs and suffer from limited objectivity and timeliness. With the rise of e-commerce and big data, large volumes of user-generated content (UGC) have become an important source of authentic emotional feedback [12]. Scholars have employed the Python-based Scrapy framework and natural language processing (NLP) techniques to mine online reviews and construct affective corpora. For example, Lai et al. [13] combined Scrapy with Word2Vec, a classic word-embedding technique, to identify Kansei vocabulary related to EV exterior design, thereby facilitating emotional need analysis and shape imagery prediction. However, conventional text mining techniques often fail to reveal latent topics and subtle emotional expressions in large-scale short-text UGC because such text is sparse, noisy, and fragmented [14]. To address these issues, unsupervised topic modeling has been widely adopted to extract semantic structures from unstructured text [15]. Although Latent Dirichlet Allocation (LDA) [16] is frequently applied in KE-related studies [17,18], its performance declines on short texts because word co-occurrence within each document is sparse [19]. As an alternative, the Biterm Topic Model (BTM) [20] models biterms, that is, unordered word pairs co-occurring in the same short text, across the whole corpus, which alleviates sparsity and improves topic extraction from short texts. For instance, Pan et al. [21] used BTM to analyze public perceptions in reviews of heritage districts, while Zhang et al. [22] applied BTM to identify latent user demands in intelligent product–service systems. These studies indicate that BTM is effective for affective information extraction in design contexts. In predictive modeling, KE traditionally employs Multiple Linear Regression (MLR) [23] and Quantification Theory Type I (QTT-I) [24]. For example, Liu and Yang [25] used MLR to map product features to affective responses, while Xue et al. [26] adopted QTT-I to quantify user perception of design attributes. However, affective cognition is often nonlinear, subjective, and dynamic, which limits the effectiveness of linear models [27]. In response, recent studies have explored machine learning methods such as Back Propagation Neural Networks (BPNNs) [28] and Support Vector Regression (SVR) [29]. For example, Zhu et al. [30] employed a BPNN incorporating affective and sustainable design parameters for product optimization and satisfaction prediction, whereas Yang and Shieh [31] utilized SVR to estimate consumer affective responses to styling attributes. Although these models handle nonlinearities well, their performance is sensitive to initialization and hyperparameter selection [32,33]. Optimization techniques such as evolutionary algorithms [34], random search [35], and exhaustive grid search [36] have therefore been used to improve generalization, as exemplified by the studies of Lin et al. [37], Liu et al. [38], and Yang et al. [39].
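The biterm construction that distinguishes BTM from LDA can be illustrated with a minimal Python sketch. It assumes reviews have already been segmented into tokens; the function names and example tokens are hypothetical, and the topic inference step itself (e.g., Gibbs sampling over biterms) is omitted. The point is that BTM consumes a corpus-wide pool of word pairs rather than per-document word counts.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens, window=None):
    """Return the biterms (unordered word pairs) of one tokenized short text.

    Each pair is stored as a sorted tuple so that (a, b) and (b, a) are the
    same biterm. With window=None every pair in the text is kept, which suits
    short reviews; an integer window keeps only pairs at most that many
    positions apart (a common variant for longer texts).
    """
    biterms = []
    for i, j in combinations(range(len(tokens)), 2):
        if window is not None and j - i > window:
            continue
        biterms.append(tuple(sorted((tokens[i], tokens[j]))))
    return biterms


def corpus_biterm_counts(corpus):
    """Pool biterms from all documents into one corpus-level frequency table.

    BTM fits a single corpus-level topic distribution to this pooled set,
    instead of estimating a separate topic mixture for each short document.
    """
    counts = Counter()
    for tokens in corpus:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical pre-segmented review snippets (illustrative only).
reviews = [["front", "grille", "sporty"], ["sporty", "front"]]
print(corpus_biterm_counts(reviews).most_common(1))  # [(('front', 'sporty'), 2)]
```

Even though the second review contains only two words, its pair reinforces a co-occurrence seen elsewhere in the corpus, which is how BTM sidesteps per-document sparsity.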
However, as problem dimensionality and complexity increase, these traditional optimization techniques often struggle with highly nonlinear and dynamically evolving shape imagery prediction tasks, suffering from premature convergence, limited global search capability, and high computational cost. To address these limitations, this study adopts swarm intelligence algorithms [40] to optimize BPNN and SVR, leveraging their robustness and adaptability in complex search spaces. Among them, the Seagull Optimization Algorithm (SOA) [41], proposed by Dhiman and Kumar in 2019, simulates the soaring and diving behaviors of seagulls through stochastic migration and spiral flight, enabling efficient global search in high-dimensional, multimodal spaces. SOA has shown strong performance in domains such as power systems [42], environmental modeling [43], and engineering safety [44], and has been reported to exhibit better parameter adaptability than the Whale Optimization Algorithm (WOA) and the Grey Wolf Optimizer (GWO), particularly under sparse-sample conditions. Although SOA remains underexplored in product design, these results suggest its potential for high-precision nonlinear modeling. Nonetheless, single-algorithm strategies have inherent limitations, such as restricted search diversity and susceptibility to local optima, that can compromise optimization effectiveness. This study therefore also employs Particle Swarm Optimization (PSO) [45], which simulates collective learning by dynamically updating each particle's velocity and position based on its personal best and the swarm's global best experience. Owing to its simple structure and high computational efficiency, PSO has been widely applied to nonlinear prediction tasks; for example, Fu et al. [46] used PSO to optimize SVR parameters in Ming-style furniture design, significantly enhancing model accuracy and applicability. Based on these considerations, this study constructs four predictive models, namely SOA-BPNN, SOA-SVR, PSO-BPNN, and PSO-SVR, and systematically evaluates their predictive performance through error comparison. The objective is to identify the optimal model for precisely predicting ESUV front-end shape imagery.
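The PSO update described above (velocity driven by inertia, the particle's personal best, and the swarm's global best) can be sketched in a few lines of standard-library Python. This is a generic global-best PSO with an inertia weight; the parameter values (w, c1, c2, swarm size, iteration count) are illustrative assumptions rather than the settings used in this study, and the sphere function merely stands in for a model's validation error.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO with inertia weight; returns (best_position, best_value).

    Each particle's velocity blends its previous velocity (inertia w), a pull
    toward its own best position (cognitive term c1) and a pull toward the
    swarm's best position (social term c2). Positions are clipped to bounds.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:          # update personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:         # update global best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Stand-in objective: in the models discussed here it would be a validation
# error as a function of hyperparameters (e.g., SVR's C and gamma); here it is
# the sphere function, whose minimum is 0 at the origin.
best, value = pso_minimize(lambda x: sum(v * v for v in x), [(-5, 5), (-5, 5)])
print(best, value)
```

To tune a real model, the objective would train the regressor with the candidate hyperparameters and return its cross-validated error; the search loop itself stays unchanged.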
After determining the optimal predictive model, verifying its practical applicability is crucial. Traditional manual modeling for product styling is time-consuming, prone to subjective bias, and difficult to reproduce. With the rise of Artificial Intelligence-Generated Content (AIGC), design processes are increasingly shifting toward intelligent automation, providing an efficient framework for emotional expression and generative creativity [47]. Text-to-Image (T2I) technology, in particular, synthesizes images from natural language descriptions, enabling multidimensional evaluation of shape imagery prediction models while reducing the workload and bias associated with manual design [48]. To demonstrate this feasibility, the present study employs Stable Diffusion (SD), a widely used open-source T2I model, to generate ESUV front-end styling proposals and validate the predictive model's effectiveness in KE-based applications.
In summary, this study proposes a KE-based ESUV front-end styling method that progressively integrates multiple analytical techniques. It begins by collecting frontal-view ESUV images from public platforms to build a styling dataset, from which representative samples are selected and their design features deconstructed as input for predictive modeling. Building on this, BTM is combined with the Analytic Hierarchy Process (AHP) to extract Kansei vocabulary as model output, capturing consumers' affective preferences. To further enhance model performance, SOA and PSO are each employed to optimize the parameters of both BPNN and SVR. Finally, the four resulting models are compared by prediction error to identify the optimal configuration, which is subsequently validated through SD-generated styling proposals. The main contributions are threefold: (1) introducing BTM-based affective analysis into KE to improve the extraction of latent emotional needs from short, fragmented user texts; (2) enhancing the robustness and accuracy of shape imagery prediction by optimizing BPNN and SVR with SOA and PSO; and (3) validating the model through SD-generated proposals to reduce subjectivity and improve the rigor and reproducibility of the design process. Together, these contributions strengthen the reliability of KE-based predictive modeling and offer practical guidance for intelligent, emotion-driven design.
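The AHP weighting step mentioned above can be sketched as follows. This minimal Python example assumes a reciprocal pairwise comparison matrix on Saaty's 1 to 9 scale and approximates the priority vector by normalized row geometric means (the principal-eigenvector method is the classical alternative); it also computes Saaty's consistency ratio, for which CR < 0.1 is conventionally treated as acceptable. The judgment values are hypothetical and do not come from this study.

```python
import math

# Saaty's random consistency index (RI) for matrix orders 1-10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix):
    """Return (priority weights, consistency ratio) for a pairwise matrix.

    matrix[i][j] states how much more important criterion i is than j on
    Saaty's 1-9 scale, with matrix[j][i] == 1 / matrix[i][j]. Weights are
    approximated by normalized row geometric means; lambda_max is estimated
    as the mean of (A w)_i / w_i.
    """
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    weights = [g / total for g in geo]
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return weights, cr


# Hypothetical judgments for three candidate Kansei words (illustrative only).
judgments = [[1, 2, 2],
             [1 / 2, 1, 1],
             [1 / 2, 1, 1]]
w, cr = ahp_weights(judgments)
print([round(x, 3) for x in w], round(cr, 3))
```

Because this example matrix is perfectly consistent, the weights recover the implied ratios and the consistency ratio is essentially zero; real survey judgments would typically yield a small positive CR.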
The remainder of this paper is organized as follows:
Section 2 presents the methodology and models;
Section 3 outlines the ESUV front-end design experiments;
Section 4 discusses the findings and concludes the study.
4. Discussion and Conclusions
To address common challenges in existing Kansei Engineering research on automotive front-end styling, such as untimely emotion vocabulary extraction, strong dependence on subjective judgment, and the relatively low accuracy of nonlinear prediction models, this study proposes an improved KE-based ESUV front-end styling design method. First, 34,697 consumer reviews and 156 ESUV front-end samples were collected through Python-based web scraping from platforms such as Autohome and Yiche. BTM was then applied to the preprocessed corpus, uncovering 48 emotion words across 4 topics. Subsequently, AHP-based weight ranking identified 4 representative sets of Kansei imagery vocabulary. Next, BPNN and SVR were used to construct Kansei imagery prediction models, and SOA was employed for global parameter optimization, significantly enhancing their prediction accuracy. PSO was also applied to the same models for comparison, in order to assess the relative advantages of SOA. Average error rate analysis of the predictions for 4 validation samples across the 4 vocabulary sets showed that the SOA-BPNN model achieved the highest Kansei imagery prediction accuracy; it was therefore applied to the early design phase of ESUV front-end styling. Overall, this work develops a novel, systematic emotion-driven design framework for ESUV front-end styling that gives designers a clearer development pathway and addresses the lack of scientific rigor and rationality that arises when traditional research and development are driven by subjective experience. Additionally, the use of Stable Diffusion (SD) for generative design demonstrates how cutting-edge artificial intelligence (AI) tools can be integrated into the design process.
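The average error rate comparison summarized above can be sketched as follows. The exact error definition used in the experiments is not restated in this section, so this minimal Python example assumes a mean absolute relative error (in percent) over a grid of validation samples by Kansei vocabulary sets; the model names and scores are hypothetical placeholders, not results from this study.

```python
def average_error_rate(actual, predicted):
    """Mean absolute relative error (%) over a samples x Kansei-word grid.

    actual[s][k] is the evaluated score of validation sample s on Kansei
    vocabulary set k; predicted[s][k] is a model's prediction for that cell.
    """
    errors = [abs(a - p) / abs(a)
              for row_a, row_p in zip(actual, predicted)
              for a, p in zip(row_a, row_p)]
    return 100.0 * sum(errors) / len(errors)


def select_best_model(actual, predictions):
    """Return (name of the lowest-error model, dict of all error rates)."""
    rates = {name: average_error_rate(actual, pred)
             for name, pred in predictions.items()}
    return min(rates, key=rates.get), rates


# Hypothetical scores for 2 samples x 2 vocabulary sets (not study results).
actual = [[3.2, 4.1], [2.8, 3.6]]
predictions = {
    "model_A": [[3.0, 4.0], [2.9, 3.5]],
    "model_B": [[3.6, 3.7], [2.5, 3.9]],
}
best, rates = select_best_model(actual, predictions)
print(best, {k: round(v, 2) for k, v in rates.items()})
```

With four validation samples and four vocabulary sets, each model's rate would simply average sixteen relative errors, and the model with the smallest rate is selected.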
By enabling rapid generation and evaluation of multiple design concepts, SD provides automotive designers with a powerful tool for quickly iterating on design proposals and validating their emotional resonance. This approach improves the efficiency and scientific rigor of the design process, aligning aesthetic outcomes more closely with consumer emotional preferences while reducing subjective bias.
To further improve this work, future research could enhance the emotional vocabulary extraction process with more advanced techniques, such as deep learning, to reduce manual intervention. Incorporating temporal models such as Long Short-Term Memory (LSTM) networks would also allow evolving consumer emotional preferences to be tracked over time. Finally, extending the model to additional design elements, such as color, material, and texture, would provide a more comprehensive framework for emotion-driven design. In conclusion, this study introduces a data-driven framework integrating Kansei Engineering, machine learning, and generative design, advancing emotion-driven automotive design. The findings highlight the potential of combining these technologies to better align product design with consumer emotional preferences, and the proposed framework offers both theoretical insights and practical tools for enhancing the scientific rigor and emotional resonance of ESUV front-end styling design.