A Knowledge Graph-Based Implicit Requirement Mining Method in Personalized Product Development

Mo, Zhenchong; Gong, Lin; Gao, Jun; Cui, Haoran; Lan, Junde

doi:10.3390/app14177550

by Zhenchong Mo ¹, Lin Gong ¹,²,*, Jun Gao ¹, Haoran Cui ¹ and Junde Lan ¹

¹ School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
² Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing 314000, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(17), 7550; https://doi.org/10.3390/app14177550

Submission received: 9 July 2024 / Revised: 21 August 2024 / Accepted: 23 August 2024 / Published: 26 August 2024

Abstract

In the context of crowd innovation and generative design driven by large language models, the exploration of personalized requirements has become key to significantly improving product innovation, concept feasibility, and design interaction efficiency. To mine the large number of vague and unexpressed implicit requirements of personalized products, a domain knowledge graph-based method is proposed in this research. First, based on the classical theory of design science, the characteristics and categories of personalized implicit requirements are analyzed, and the theoretical basis of implicit requirement mining is formed. Next, in order to improve the practicability and construction efficiency of the domain knowledge graph, a more informative ontology is constructed, and better-performing natural language processing (NLP) models are proposed. Then, a multi-category personalized implicit requirement mining method based on a knowledge graph is proposed. Finally, a platform was developed based on the technical solution proposed in this study, and an example verification was conducted in the field of electromechanical engineering. The efficiency improvement of the training model proposed in the research is analyzed, and the practicality of the implicit requirement mining methods is discussed.

Keywords:

implicit requirement; knowledge graph; requirement mining; patent analysis; engineering design; generative design; crowd innovation

1. Introduction

The field of engineering design, especially in product design, service system design, and complex system design, is being reshaped by two major trends: on the one hand, with the rise of large language models, generic-field generative intelligent design is gradually becoming mainstream [1]. On the other hand, driven by the Internet, human designers have formed new design models, such as crowdsourcing design and group intelligence innovative design, through collaborative organization [2,3,4]. The core goal of these development trends is to address the growing personalized requirements and achieve rapid iterative design of products and user-participation design.

In engineering design, the expression of personalized requirements has significant semi-professional characteristics, which will be elaborated in detail in Section 3 “Nature of the Problem”. This semi-professional feature makes it feasible to extract requirement elements directly from user descriptions, but it also increases the importance and difficulty of implicit requirement mining. In the field of generative intelligent design, a major challenge is how to make machines empathetic [5], so that after users input personalized requirements, machines can empathize with the design and product usage context, automatically complete users’ prompts, mine design problems that users have not anticipated or find difficult to express clearly, and improve the output quality of intelligent generative models. In terms of group intelligence innovation design, users’ limited understanding of products not only hinders the emergence of innovation but is also a fundamental cause of repetitive, chaotic, and low-quality design processes [2,3,4]. Therefore, accurately mining and supplementing implicit needs that have not been clearly expressed in personalized requirements is of great significance for improving the innovation of engineering design, the completeness of product functions, and the feasibility of solutions.

Many studies have analyzed design requirements through data-driven technologies such as machine learning [6,7,8]. Although these studies extensively use data derived from user reviews on the Internet, user-generated content (UGC) has inherent limitations in complementing user needs [9,10]. In addition, it is difficult for the review data in the field of engineering design to meet research needs in terms of quantity and quality, and the timeliness of the data does not keep pace with the development of engineering design. To overcome these limitations, this study introduces the engineering design knowledge contained in patents.

This research presents a method to identify implicit personalized requirements using knowledge graphs. First, this study provides an in-depth analysis of semi-professional, task-oriented personalized requirements from the perspective of design science and engineering theory to reveal the characteristics of implicit requirements. Then, it builds a domain knowledge graph by gathering product design knowledge from patents. By optimizing the ontology layer and stacking the pre-training model, the comprehensiveness and practicability of the knowledge graph are enhanced while improving its construction efficiency. Finally, taking the user’s personalized description as input, the explicit requirement elements are extracted and the corresponding knowledge entities are matched, and the mining of multiple personalized implicit requirements is realized based on the methods of structural similarity and link prediction.

This study makes several key contributions:

- It performs a detailed analysis of user requirements in the front-end design stage, clarifies the characteristics, forms, and types of the personalized implicit requirements that drive product innovation, and enriches the theoretical basis of research on implicit requirement mining for innovative design.
- It enriches the ontology layer of the domain patent knowledge graph, improving the comprehensiveness and practicability of the graph, and improves the accuracy and efficiency of knowledge graph construction by optimizing the pre-training model used for the entity layer.
- Targeting the different characteristics of the personalized implicit requirements that drive product innovation, it puts forward a set of targeted implicit requirement mining methods based on the domain knowledge graph.

In the upcoming sections, we systematically explore and analyze personalized implicit requirement mining methods. Section 2 reviews in detail the literature related to the topic of this study, providing readers with the research background and theoretical basis in this field. Section 3 deeply analyzes the unique nature of the research object and problem, and clarifies this research’s boundaries and purpose. In Section 4, we propose a technical solution for building a domain knowledge graph. Section 5 elaborates on the personalized implicit requirement mining methodology. Section 6 demonstrates the platform we developed, verifies the effectiveness and practicality of our method through case studies, and discusses the performance of the algorithmic model proposed during this study. Finally, in Section 7, we comprehensively summarize the entire research and propose prospects for future research directions.

2. Related Works

2.1. Analysis of User Requirements in Engineering Design

Regarding the acquisition of valuable user requirements from consumer opinion data, there is an increasing number of studies on algorithms and platforms for parsing such data, for example identifying high-quality consumer opinion data, intelligently detecting customer sentiment polarity and its corresponding opinion targets, and automatically ranking the importance of user requirements [11]. User requirement analysis theory can be divided into three categories: identification and acquisition of massive user requirements, sorting and evaluation of fuzzy front-end requirements, and classification of requirements for design improvement. The research objects of related studies are mainly online user reviews of mass-produced consumer products.

2.1.1. Identification and Acquisition of Massive User Requirements

At present, most of the research related to requirement identification and acquisition methods focuses on automation technologies based on natural language processing (NLP). NLP is a theory-driven computing technology that automatically analyzes and represents human language [12]. In recent years, neural networks based on dense vector representation have yielded excellent results in various NLP tasks. As NLP technology has continued to advance, there has been a gradual increase in related studies focused on processing user requirements using this method. Li X et al. [13] proposed a context-aware, diversity-oriented knowledge recommendation method that uses semantic analysis, context definition and perception, and user portrait modeling to address item diversity, context diversity, and user diversity, so that the recommended knowledge is both diverse and accurate and meets the requirements of the many stakeholders in the design process. To leverage critical information in online review texts for automobile design and development, Zhang Guofang et al. [14] utilized TF-IDF and dependency syntactic analysis methods to extract product features. They then used the BERT pre-trained model for text sentiment analysis and built a House of Quality to transform requirements into technical engineering features. This approach resulted in a product planning method driven by review data.

2.1.2. Sorting and Evaluation of Fuzzy Front-End Requirements

The method of requirement sorting and evaluation mainly combines fuzzy set theory, rough set theory, and data science. The theory of fuzzy sets was introduced by Zadeh LA in 1965. Fuzzy set theory is based on fuzzy mathematics, and studies imprecise phenomena by establishing appropriate membership functions [15]. Rough set theory was first proposed by Polish scientist Pawlak Z [16] in 1982 and has been widely used in rule induction, feature selection, and other applications. Rough and fuzzy sets are two important mathematical tools used to deal with uncertainty after probability theory, and play an important role in tackling the ambiguity and uncertainty of requirements. To determine the relative importance level of heterogeneous fuzzy customer requirements more objectively and accurately, Zheng P et al. [17] proposed a weighted interval rough number method. They addressed the problem of customer heterogeneity fusion by assigning different weights to customers. Chen Z et al. [18] presented a hybrid framework for identifying user service requirements and determining priorities to address the lack of systematic induction and assessment methods for intelligent service requirements in uncertain environments. Haber N et al. [19] combined the Kano model and the fuzzy analytic hierarchy process (fuzzy AHP) to analyze and evaluate customer requirements and select a product service system that can improve product value. Zhang Meixia et al. [20] used fuzzy reasoning to derive the spatiotemporal distribution of requirements based on survey data and chain theory.

2.1.3. Classification of Requirements for Design Improvement

Most of the related studies of requirement classification analysis are based on the Kano model and combined with data-driven or knowledge-driven technologies. The Kano model is a user requirement analysis method proposed by Noriaki Kano in the 1980s [21]. This method divides user requirements into basic, expected, attractive, indifferent, and reverse requirements, and shows the relationship between different types of requirements and customer satisfaction. Since its introduction, the Kano model has become one of the most commonly used requirement analysis models by marketing and management practitioners and researchers [22]. Zhou F et al. [23] used FastText technology to filter out uninformative comments from Internet product reviews from the perspective of the product ecosystem. Then, a topic modeling technique was used to extract various topics related to customer requirements, and a rule-based sentiment analysis method was applied to predict the comments’ sentiment and intensity values. Finally, based on the “dissatisfaction-satisfaction” pairs in sentiment analysis, the analytical Kano model was used to classify customer requirements related to the extracted topics. Budiarani V H et al. [24] implemented a Kano model for e-commerce. They used it to study user satisfaction with two widely used digital wallets in online shopping transactions during COVID-19. Janmejay Bhardwaj et al. [25] studied the 20 characteristics of products based on the Kano model to determine whether a specific requirement characteristic plays a decisive role in the customer’s purchasing behavior.

However, unlike general user requirements, personalized requirements do not often appear on ordinary e-commerce platforms, but more often appear in crowdsourcing or collaborative community platforms in the form of product development tasks. This article takes this type of personalized requirement as the primary research object.

2.2. Mining and Completion of Highly Personalized User Requirements

Many scholars have proposed exploring the implicit needs through Empathic Design, observing the scenarios of users using products, and letting users participate in the design process based on use-case reasoning. Zhou F et al. [26] proposed a two-layer model that combines sentiment analysis and case analogy reasoning to address the problem of potential user requirements often being hidden in user requirement semantics and being difficult to recognize by typical text mining-based methods. This model is used to identify explicit user requirements and deduce implicit features of potential user requirements. Timoshenko A et al. [27] identified customer requirements from UGC by combining machine learning methods, using convolutional neural networks to filter out non-informative content, and using interview methods to confirm requirements that could not be discovered effectively. Wang Z et al. [28] predefined ontologies such as products, services, and scenarios, extracted requirement-generated scenarios and targeted products and services from historical requirement data, constructed a requirement graph, and completed requirement elicitation based on the DeepWalk algorithm. Chen R et al. [29] proposed a domain-based requirement mining framework to promote improving mobile application quality. Zhang M et al. [30] proposed a deep learning-based method to identify and extract product innovation ideas from online review data, using the LSTM algorithm to identify sentences containing innovative ideas from reviews. Chen K et al. [31] extracted explicit and implicit features from user-generated text to obtain requirements. Then they used a multi-layer neural network to distinguish the impact of positive and negative opinions on each product feature.

From the above research, we can see that, firstly, in the process of identifying and improving requirements, it is difficult to build a stable communication channel between users and designers, so designers receive little help in clarifying and improving user requirements, let alone in further inspiring innovative design. Secondly, the above implicit requirement elicitation methods are primarily based on historical requirement data, user usage data, and other UGC, which must be available in large quantities to support use-case reasoning. However, the number of requirements for the same personalized product is generally small, which makes it challenging to support accurate requirement identification and completion. Finally, the above methods still require many subjective judgments and can hardly overcome the subjective limitations of personalized requirements for innovative design. Therefore, building on the existing analysis of personalized requirement data, there is an urgent need to integrate relevant domain knowledge to assist design.

2.3. Knowledge Graph in Engineering Design

Concept design is a knowledge-intensive process. Design knowledge is a crucial design resource in motivating designers to inspire creativity in concept design. Design knowledge in multidisciplinary fields mainly exists in the form of natural language. The rapid development of NLP technologies, such as named entity recognition, topic discovery, embedding models, pre-trained models, sentiment recognition, text similarity calculation, search recommendation, and text generation, provides opportunities for designers to process unstructured text and obtain design knowledge from it. L. Siddharth, Jianxi Luo, et al. [32] reviewed the essential applications of natural language processing technology in concept design. They assigned corresponding natural language processing technology and models to different design stages under the existing conceptual design framework. Jia J et al. [33] proposed a method for capturing and reusing implicit knowledge in the design process. The design problem and design solution are represented by constructing a design knowledge graph. The implicit design knowledge is reused through relational learning, tensor decomposition, and other technologies. Jianxi Luo et al. [34] proposed a computer-aided design creative generation method based on the InnoGPS system. In response to the problems of traditional design creative generation relying on the knowledge or intuition of human experts and high uncertainty, combined with data-driven design methods, a cloud-based computer-aided rapid creative process is provided. The InnoGPS system proposed in this study integrates an experience network diagram of all technical fields based on international patent classification, with functions such as technology spatial mapping, technology positioning, neighboring domain recommendation, map browsing and domain discovery, and concept discovery within the domain. Liu Q et al. [8] proposed a function–structure concept network construction and analysis method to support an intelligent product design system, exploring the design information association containing explicit and implicit associations as a stimulus to stimulate creativity to support design conception. Ye F et al. [35] proposed a cross-domain knowledge discovery method based on a knowledge graph and patent mining. Knowledge elements are classified through natural language processing-related technologies such as BERT (Bidirectional Encoder Representations from Transformers) and word2vec, and the correlation between cross-domain knowledge is mined. Luo J et al. [36] proposed a knowledge-based expert system to guide data-driven design conception through knowledge distance.

As a hallmark of the fourth industrial revolution, Artificial General Intelligence (AGI) has brought about a paradigm shift in design. Large-scale language models such as GPT demonstrate significant integration of cross-domain knowledge and general reasoning capabilities through learning and training on large-scale data, exhibiting empathy and creativity in extremely high dimensions and sequences, and have broad application prospects in creative generation, design questioning and answering, etc., and stimulating the possibility of design change [37]. Zhu Q et al. [38] proposed a generative design method based on a pre-trained language model to generate biological incentive design concepts in the form of natural language. Qihao Zhu et al. [39] explored an intelligent design idea generation method based on pre-trained language models, using near-field or far-field external knowledge as an inspiration to generate concise and easy-to-understand design ideas. Qihao Zhu et al. [40] continued to explore the application of natural language generation (NLG) technology in concept generation in the early design stage. They proposed a method using the GPT-3 model to transform knowledge in text data into new concepts in the form of natural language.

As mentioned above, relevant research ignores the front-end design problems to be solved, that is, the mining and completion of user requirements, and pays too much attention to the concept generation method, leading to repeated iteration of the design process, and the output concept scheme needs to be improved in terms of innovation and feasibility [3,4,5].

3. Nature of the Problem and Research Framework

In order to clarify the essence of personalized implicit requirement mining, according to the classical theory of engineering design [41], the classification and representation of elements in personalized requirements are defined, and the differences between general user requirements and personalized requirements are compared, so as to further analyze the characteristics and categories of implicit personalized requirements.

3.1. Classification and Characterization of Elements of Personalized Requirements

In engineering design, the expression of personalized requirements often has a certain degree of professionalism, and can clearly describe the core function and structure of the product. This feature provides the feasibility of directly extracting requirement elements from user description. Before information extraction, we refine the requirement elements into two categories: functional elements and structural elements based on axiomatic design theory. Further, according to the behavior theory, we subdivide the functional elements into “function operation” and “function object” to facilitate the subsequent matching and supplement process.

In most descriptions of personalized requirements, users usually describe the specific functions and partial structures of the product. Using natural language processing (NLP) technology, we can identify, classify, and extract the functional and structural requirements from the original text. Additionally, the functional elements can be further divided into FA (verb) and FO (noun). For example, in a real user requirement case shown in Figure 1, we successfully identified functional requirements such as “loading and unloading” (FA) and “paper box” (FO), as well as structural requirements such as “six-axis robotic arm”.
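The element classification above can be sketched as a small data structure. This is only an illustrative sketch, not the authors' implementation: the class names `FunctionElement` and `RequirementElements` and the method `labeled_terms` are hypothetical, and the values are the elements the text reports for the Figure 1 case.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FunctionElement:
    """A functional requirement split into an action (FA) and an object (FO)."""
    action: str  # FA, typically a verb phrase
    obj: str     # FO, typically a noun phrase


@dataclass
class RequirementElements:
    """Explicit elements extracted from one personalized requirement text."""
    functions: List[FunctionElement] = field(default_factory=list)
    structures: List[str] = field(default_factory=list)

    def labeled_terms(self) -> List[Tuple[str, str]]:
        """Flatten the elements into (label, term) pairs for later entity matching."""
        pairs = []
        for f in self.functions:
            pairs.append(("FA", f.action))
            pairs.append(("FO", f.obj))
        pairs.extend(("S", s) for s in self.structures)
        return pairs


# Elements reported in the text for the Figure 1 requirement case.
example = RequirementElements(
    functions=[FunctionElement(action="loading and unloading", obj="paper box")],
    structures=["six-axis robotic arm"],
)
print(example.labeled_terms())
```

Keeping FA and FO paired inside one functional element preserves which object an action applies to, which a flat list of terms would lose.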

3.2. Analysis of the Characteristics of Personalized Implicit Requirements

Based on the large amount of user design requirement data accumulated during the previous research and design practice, this study summarizes, extracts, and compares the characteristics of general user requirements and personalized requirements from the aspects of sources, targeted products, user characteristics, requirement characteristics, and contents. The symbolic characteristics of personalized requirements are studied.

It can be seen from Table 1 that the biggest characteristic difference between general user requirements and personalized requirements is that the former is fuzzy and emotional, while the latter is relatively more professional. Therefore, the general user requirements are more vulnerable to the user’s incomplete expression, so the implicit requirements are represented by the lack of standard function operation words, typical structure words, and standard engineering parameters. However, personalized requirements are subject to the subjective limitations of users. The expressed requirements are often limited to specific functions or structural words. Implicit requirements are mostly actions related to the time series of FAs, sub/parent, or very similar structures of FOs/structures. Based on classical theories in requirement engineering, FBS and other design science fields, and massive design requirement data, the characteristics, typical performance, and requirement examples of the above implicit requirements are presented in Table 2.

Therefore, the goal of this study is to mine the personalized implicit requirements of sub-actions of FAs, pre-/post-operations of FAs, sub/parent structures of FOs/structures, pre-/post-actions of sub-actions of FAs, and very similar structures of FOs/structures. Through the mining of these requirements, a more accurate personalized requirement completion model was constructed to improve the innovation, feasibility, and interaction efficiency of generative design or swarm intelligence design.

3.3. Research Framework

For the nature of the problem in this study, combined with the patent knowledge in the field, the research framework of this paper is proposed in Figure 2:

4. Patent-Based Domain Knowledge Graph Construction Method

The process of constructing a domain knowledge graph based on patent text is illustrated in Figure 3. It consists of four main stages: ontology layer construction, data pre-processing, model pre-training, and data layer construction. In order to enhance the entity recognition and sentence sequence classification performance of the model in the field of patent text, this study conducts incremental pre-training on a large number of collected domain-related patent text data, and constructs the domain knowledge pre-training models BERT-base-patent-fusion and BERT-wwm-ext-patent-fusion based on two base models, BERT-base and BERT-wwm-ext [42], respectively. The pre-trained models marked with * in the figure are the models proposed in this study. The numbers and letters in brackets in the data layer construction represent the data sets and training model numbers referenced in the construction process.

4.1. Ontology Layer Construction

This paper uses ontology to standardize entities, relationships, and the types and properties of entities. In the knowledge graph ontology constructed in this paper, the patent attribute part is consistent with most related studies, and the ontology is constructed based on the content contained in the patent text. The schema of the ontology is shown in Figure 4. The ontology of the design knowledge element contained in the patent (Class) is designed by the authors for the research object of this study, based on classic theories in the field of design science, including FBS, axiomatic design, and design standard word basis. The ontology layer schema of the domain knowledge graph can be expressed as <Class, Property, Relation>, where:

Class = {class_i}, which specifies the node classes in the network and the hierarchical relationships between classes. The domain knowledge graph defines three types of entities, namely, functional entities, structural entities, and patent entities; a total of 11 node classes are specified, among which function-action class and function-object class are subclasses of the function class, and power component, transport component, control component, etc. are subclasses of the structure class.

Property = {<class, property_type, value>} specifies the attributes that each entity in the knowledge graph should have and the range of its property value.

Relation = {<class_A, rel_type, class_B>} specifies the relationships between various entities. Relationships can be divided into three categories: patent has functions, patent has structures, and related relationships between patents due to similar functions or structures.
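One way to picture the <Class, Property, Relation> schema is as three small tables plus a check that a proposed edge respects the schema. The sketch below is an assumption-laden illustration: it lists only the node classes named in the text (not all 11), the relation type names `HAS_FUNCTION`, `HAS_STRUCTURE`, and `RELATED_TO` are hypothetical labels for the three relation categories, and the property value ranges are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeClass:
    name: str
    parent: Optional[str] = None  # hierarchical relationship between classes


@dataclass(frozen=True)
class PropertySpec:
    cls: str
    property_type: str
    value_range: str  # placeholder description of the allowed values


@dataclass(frozen=True)
class RelationSpec:
    class_a: str
    rel_type: str
    class_b: str


# Subset of the node classes; only those named in the text are listed.
CLASSES = (
    NodeClass("Patent"),
    NodeClass("Function"),
    NodeClass("FunctionAction", parent="Function"),
    NodeClass("FunctionObject", parent="Function"),
    NodeClass("Structure"),
    NodeClass("PowerComponent", parent="Structure"),
    NodeClass("TransportComponent", parent="Structure"),
    NodeClass("ControlComponent", parent="Structure"),
)

PROPERTIES = (
    PropertySpec("Patent", "patent_number", "string"),
    PropertySpec("Patent", "title", "string"),
    PropertySpec("Patent", "ipc", "string"),
)

# Hypothetical names for the three relation categories described above.
RELATIONS = (
    RelationSpec("Patent", "HAS_FUNCTION", "Function"),
    RelationSpec("Patent", "HAS_STRUCTURE", "Structure"),
    RelationSpec("Patent", "RELATED_TO", "Patent"),
)


def is_subclass(child: str, ancestor: str) -> bool:
    """True if `child` equals `ancestor` or inherits from it."""
    parents = {c.name: c.parent for c in CLASSES}
    current = child
    while current is not None:
        if current == ancestor:
            return True
        current = parents.get(current)
    return False


def relation_allowed(head_cls: str, rel_type: str, tail_cls: str) -> bool:
    """Check a candidate edge against the Relation part of the schema."""
    return any(
        r.rel_type == rel_type
        and is_subclass(head_cls, r.class_a)
        and is_subclass(tail_cls, r.class_b)
        for r in RELATIONS
    )


print(relation_allowed("Patent", "HAS_FUNCTION", "FunctionAction"))
print(relation_allowed("Function", "HAS_STRUCTURE", "Structure"))
```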

4.2. Data Pre-Processing

This paper takes Chinese patents as the research subject, so the first step is the structured parsing and storage of the patent data. In addition to the patent data set, three data sets are extracted and labeled for the construction of the pre-training models: the patent data set with domain-related labels, the data set with function and structural entity labels, and the data set with structural entity category labels.

The patent data set with domain-related labels is an annotation data set used to determine whether a patent is domain-related. The domain-related classification of patents is a sentence classification task. The domain features of the title and rights statement are significant. As the input data of the model, the patent data with domain-related labels is annotated to form the training and verification sets of the model.
The data set with function and structural entity labels is constructed to fine-tune the named entity recognition model of patent text. The BIO labeling method is used to label the functional and structural entities in the patent title and claim text for the model pre-training of extracting functional and structural entities.
The data set with structural entity category labels is used to build a classification model for structural entities, subdividing structural entities into subcategories such as power and transportation components.
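To make the BIO labeling scheme concrete, the following sketch converts annotated entity spans into BIO tags. It is a minimal illustration under stated assumptions: the function name `bio_encode` and the example sentence are hypothetical, spans use 0-based, end-exclusive token indices, and English word tokens are used for readability, whereas Chinese patent text would more likely be labeled per character.

```python
def bio_encode(tokens, spans):
    """Turn (start, end, label) spans into one BIO tag per token.

    `start` is inclusive and `end` is exclusive, both 0-based.
    Overlapping spans are rejected because flat BIO tags cannot express them.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Hypothetical sentence with one structure (S), one action (FA), one object (FO).
tokens = ["the", "robotic", "arm", "grips", "the", "paper", "box"]
spans = [(1, 3, "S"), (3, 4, "FA"), (5, 7, "FO")]
tags = bio_encode(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```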

4.3. Model Pre-Training

Patent text classification, information extraction, and entity semantic similarity calculation all require NLP support. At present, state-of-the-art (SOTA) models generally adopt the “pre-train + fine-tune” framework. Based on this framework, this paper adds patent text data to the pre-training process to enhance the model’s ability to recognize and understand patent text, so as to obtain better results in patent text-oriented NLP tasks. When Google released the BERT model in 2018, it included a Chinese model trained on the Chinese Wikipedia corpus. In 2019, the BERT-wwm-ext model was proposed by Cui Y et al. [43] of the Harbin Institute of Technology. On the basis of BERT, the character-level masking of the masked language model was replaced by whole word masking, and the corpus used for pre-training was expanded, which achieved significant improvements in multiple tasks.
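The difference between character-level masking and whole word masking can be shown with a toy example. This is a simplified sketch, not the actual BERT or BERT-wwm-ext pre-training code: the function names are hypothetical, the positions to mask are passed in explicitly instead of being sampled, and the word segmentation of the sample phrase is assumed to be given.

```python
MASK = "[MASK]"


def char_level_mask(words, char_positions):
    """Mask individual characters, ignoring word boundaries."""
    chars = [ch for word in words for ch in word]
    return [MASK if i in char_positions else ch for i, ch in enumerate(chars)]


def whole_word_mask(words, word_positions):
    """Mask every character of each selected word together."""
    out = []
    for i, word in enumerate(words):
        for ch in word:
            out.append(MASK if i in word_positions else ch)
    return out


# Hypothetical segmented phrase: "stone" / "grinding" / "machine".
words = ["石材", "打磨", "机"]

# Character-level masking may hide only half of the word "打磨".
print(char_level_mask(words, {2}))
# Whole word masking hides both characters of "打磨" at once.
print(whole_word_mask(words, {1}))
```

With whole word masking the model cannot recover a masked character from the other half of the same word, so it has to rely on the surrounding context.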

To improve the model’s ability to recognize entities and classify sentence sequences in domain patent texts, this study conducted incremental pre-training using a large number of domain-related patent text data with two base models: BERT-base and BERT-wwm-ext. This led to the creation of the pre-training models BERT-base-patent-fusion and BERT-wwm-ext-patent-fusion, which superimpose domain knowledge on the base models. Fine-tuning for the subsequent tasks starts from four models: BERT-base, BERT-base-patent-fusion, BERT-wwm-ext, and BERT-wwm-ext-patent-fusion.

4.4. Data Layer Construction

The data layer construction of the domain knowledge graph can be divided into five stages: domain patent screening and information extraction, functional and structural entity extraction, classification of structural entities, structural entity resolution and fusion, and relationship generation in the graph.

4.4.1. Domain Patent Screening and Information Extraction

In order to improve the information purity of domain knowledge graph, it is necessary to screen all patents related to the domain in all patent data. Since there are patents related to the domain under all IPC primary partitions, it is not possible to directly screen through the IPC partition information of patents. Therefore, through the filter based on the NLP algorithm, this paper selects the domain patents for graph construction.

Based on the NLP algorithm and data set with domain-related labels, this paper fine-tunes four pre-training models to construct the domain-related patent filtering model.

As shown in Figure 5, for the patent title and claim text, the last hidden layer vector C ∈ R^H output by the Transformer module in BERT [44] is used as the representation vector of the sentence, where H is the number of neurons in the last hidden layer. The classification layer weight W ∈ R^(K×H) is then adjusted, where K = 2 is the number of labels, and model training is completed by calculating the classification error based on the softmax output given by the following formula:

P = softmax(CW^T)

(1)
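As an illustration of Equation (1), the following minimal sketch computes P = softmax(CW^T) for a single sentence vector. The three-dimensional vector C and the weights W are toy values chosen for readability (they are assumptions, not values from this study); in practice H would be the hidden size of the BERT model.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def classify(C, W):
    """Return P = softmax(C W^T) for one sentence vector C and weights W (K x H)."""
    # logit k is the dot product of C with the k-th row of W
    logits = [sum(c * w for c, w in zip(C, row)) for row in W]
    return softmax(logits)

# Toy values (hypothetical): H = 3, K = 2 labels (domain-related or not)
C = [0.5, -1.0, 2.0]
W = [[0.1, 0.2, 0.3],
     [0.3, -0.1, 0.0]]
P = classify(C, W)
```

The label with the larger probability in P would be taken as the prediction, and the cross-entropy between P and the true label would serve as the classification error during fine-tuning.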

Domain-related patents and their attribute information, such as patent numbers, titles, and IPC, are imported into the knowledge graph to complete the construction of patent entities and extract their attribute information. An example is shown in Figure 6.

4.4.2. Functional and Structural Entity Extraction

Extracting functional and structural entities from unstructured patent text is a text sequence annotation problem [45]. Specifically, given a text sequence of length N,

S = (w₁, w₂, …, w_N)

(2)

where w_i represents a single word or token. The named entity recognition model returns a series of tuples <I_s, I_e, t> (I_s ∈ [1, N], I_e ∈ [1, N]), where I_s and I_e represent the start and end positions of a named entity, and t represents the entity category, which should be one of the entity categories predefined according to the actual problem model. The function and structure entity extraction model has three types of named entities, namely FA (function-action), FO (function-object) and S (structure), so t ∈ {FA, FO, S}.
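To make the tuple output concrete, the sketch below converts a per-token tag sequence into <I_s, I_e, t> tuples with 1-based inclusive positions. It assumes a BIO tagging scheme (B- opens an entity, I- continues it, O is outside), which is a common choice for sequence annotation but is not stated in this paper; the function name and the example tags are hypothetical.

```python
def decode_entities(tags):
    """Convert BIO tags into (start, end, type) tuples, positions 1-based and inclusive."""
    entities = []
    start, etype = None, None
    for i, tag in enumerate(tags, start=1):
        # An I- tag of the same type simply extends the currently open entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue
        if etype is not None:
            # Any other tag closes the entity that was open
            entities.append((start, i - 1, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            # Open a new entity of type FA, FO, or S
            start, etype = i, tag[2:]
    if etype is not None:
        entities.append((start, len(tags), etype))
    return entities

# Hypothetical tagging of "a kind of stone grinding machine"
title_tags = ["O", "O", "O", "B-FO", "B-FA", "O"]
title_entities = decode_entities(title_tags)

# Hypothetical three-token structural entity
claim_tags = ["B-S", "I-S", "I-S"]
claim_entities = decode_entities(claim_tags)
```

A flat tag sequence cannot represent overlapping entities, so the functional entities and the overall structural entity of a title would need separate tag sequences or separate models under this assumption.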

The patent title mainly includes the function of the patent and its overall structure. For example, “A kind of stone grinding machine” clearly states that its function is “grinding stone”, where the function-action entity is “grinding”, the function-object entity is “stone”, and its overall structure is “stone grinding machine”, which is a structural entity. The patent claims mainly contain the technical characteristics of the patent and express the scope of protection requested. The structural entities can be obtained from the claims using named entity recognition technology.

The functional and structural entity extraction model based on the patent title and the structural entity extraction model based on the patent claims are constructed. Based on the data set with patent functional and structural entity labels, we fine-tuned the four pre-trained models, selected the best training rounds and models, and constructed three entity extraction models.

4.4.3. Classification of Structural Entities

To provide more detailed entity information, the structural entities are divided into eight categories: power component, transport component, measuring component, adjusting component, raw materials, mechanical basic part or other component, electronic components, and functional modules, as shown in Table 3. Named entity recognition inevitably introduces some wrongly recognized entities, such as entities from unrelated fields, so “invalid entity” is also treated as a category. Similarly, based on the data set with structural entity category labels, the structural entity classification model is constructed by fine-tuning and testing the four pre-trained models.

4.4.4. Structural Entity Resolution and Fusion

Entity resolution is mainly used to solve the problem of multiple references actually pointing to the same entity. In the domain knowledge graph, since the product described in a patent may have multiple identical structures, they are distinguished by expressions such as “Xth structure” or “Structure X”. Therefore, it is necessary to perform coreference resolution on the structural entities and fuse the references that point to the same entity.

A rule-based method is used for the structural entity resolution: first, regular expressions are used to determine whether the structural entity is expressed in the form of “Xth structure” or “Structure X”. If so, the “Xth” prefix or “X” suffix is deleted to determine whether the new entity formed exists. If it exists, the entity is merged with the existing entity; otherwise, a new entity is created.
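The rule above can be sketched as follows. The regular expressions are illustrative assumptions for English text (ordinal prefixes such as “first” or “2nd”, and a trailing number or single capital letter); the patterns actually used on the Chinese patent text are not given in this paper. The sketch also assumes that a newly created entity takes the stripped name.

```python
import re

# Hypothetical patterns for "Xth structure" and "Structure X"
ORDINAL_PREFIX = re.compile(
    r"^(?:first|second|third|fourth|fifth|\d+(?:st|nd|rd|th))\s+", re.IGNORECASE
)
INDEX_SUFFIX = re.compile(r"\s+(?:\d+|[A-Z])$")

def canonical_name(mention):
    """Delete an ordinal prefix and an index suffix from a structural entity mention."""
    name = ORDINAL_PREFIX.sub("", mention.strip())
    return INDEX_SUFFIX.sub("", name)

def resolve(mentions):
    """Map each mention to one fused entity, creating the entity if it does not exist."""
    entities = set()
    mapping = {}
    for mention in mentions:
        key = canonical_name(mention).lower()
        entities.add(key)  # merges with an existing entity or creates a new one
        mapping[mention] = key
    return mapping, entities

mapping, entities = resolve(["first gear", "second gear", "gear", "motor 2", "2nd motor"])
```

In this example the five mentions are fused into two entities, “gear” and “motor”.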

4.4.5. Relationship Generation in the Domain Knowledge Graph

There are three types of relationships in the domain knowledge graph: “Has_Function”, “Has_Structure”, and “Is_Related”. The two “Has”-type relationships link the parent entity to its function and structure entities. They are obtained while the function and structure entities are extracted and can be constructed in the graph directly. The “Is_Related” relationship is constructed from the network structural similarity between entities. Using the highly interpretable Jaccard similarity measure, patent entities that share more functional and structural entities have a higher structural similarity, and an “Is_Related” relationship is constructed between patent entities whose similarity is above the threshold.

5. Multi-Category Personalized Implicit Requirement Mining Method Based on the Knowledge Graph

In view of the different categories of personalized requirements mentioned in Table 2, this paper implements entity-layer co-occurrence implicit requirement mining based on structural similarity, and constructs a co-reference embedding layer to realize non-co-occurrence implicit requirement mining based on link prediction, as summarized in Table 4:

Therefore, the process of mining personalized implicit requirements based on a domain knowledge graph mainly includes three stages: explicit requirement element identification and entity matching, co-occurrence implicit requirement mining of the entity layer, and non-co-occurrence implicit requirement mining of the co-reference embedding layer. The explicit requirement elements are expressed in the domain knowledge graph as the knowledge entity set

KE = \{ke_{1}^{FA}, \dots, ke_{n_{1}}^{FA}, ke_{1}^{FO}, \dots, ke_{n_{2}}^{FO}, ke_{1}^{S}, \dots, ke_{n_{3}}^{S}\}

For each element ke of this set, if a knowledge entity e_i has a co-occurrence relationship with it, the structural similarity algorithm is used to measure their relativity; otherwise, the relativity between ke and e_i is measured based on the entity embedding and link prediction algorithm. Based on the correlation R(ke, e_i) given by the following formula, the two types of entities are sorted to realize the personalized implicit requirement mining.

R(ke, e_{i}) = \begin{cases} \mathrm{GraphSim}_{\mathrm{Jaccard}}(ke, e_{i}), & \text{entity layer} \\ \mathrm{GraphSim}_{\mathrm{Node2Vec}}(ke, e_{i}), & \text{co-reference embedding layer} \end{cases}

(3)

5.1. Explicit Requirement Element Identification and Entity Matching

The model fine-tuned from the four pre-trained models is used as the identification model for personalized explicit requirement elements, completing the mapping of the unstructured RD (requirement description) to the ERE (explicit requirement entity) set, as shown in the following equation, where ere_i^FA, ere_i^FO, and ere_i^S represent the explicit FA, FO, and structural elements expressed by the user:

RD \to ERE = \{ere_{1}^{FA}, \dots, ere_{p_{1}}^{FA}, ere_{1}^{FO}, \dots, ere_{p_{2}}^{FO}, ere_{1}^{S}, \dots, ere_{p_{3}}^{S}\} \quad (p_{1} + p_{2} + p_{3} > 0)

(4)

The matching between explicit requirement elements and entities in the knowledge graph is based on the cosine similarity of word embedding vectors. Each explicit requirement entity is input into the BERT model to obtain its semantic vector representation V_ere ∈ R^H. Similarly, the semantic vector representations of the functional and structural entities in the domain knowledge graph are obtained as the matrix M ∈ R^(N×H), where N is the number of functional and structural entities in the network. The cosine similarity between V_ere and each row M_i of M is then calculated to complete the matching of demand elements and knowledge entities, as follows:

\mathrm{SemSim}(V_{ere}, M)_{i} = \frac{V_{ere} \cdot M_{i}}{\|V_{ere}\| \, \|M_{i}\|}, \quad \forall i \in [1, N], \; i \in \mathbb{N}^{*}

(5)
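The matching step of Equation (5) can be sketched as below. The three-dimensional vectors stand in for BERT embeddings and are hypothetical, as are the function names; the sketch simply returns the knowledge entity whose vector has the highest cosine similarity to the requirement element.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_entity(v_ere, M, names):
    """Score every row M_i against v_ere and return the best-matching entity name."""
    scores = [cosine(v_ere, row) for row in M]
    best = max(range(len(M)), key=scores.__getitem__)
    return names[best], scores

# Toy stand-ins for BERT vectors (H = 3, N = 3)
names = ["sorting", "conveying", "steel pipe"]
M = [[1.0, 0.0, 0.0],
     [0.6, 0.8, 0.0],
     [0.0, 0.0, 1.0]]
v_ere = [0.9, 0.1, 0.0]
best_name, scores = match_entity(v_ere, M, names)
```

Returning the full score list, rather than only the best match, would let the platform present several candidate entities for the user to confirm.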

Based on the matching process between requirement elements and knowledge entities, the mapping from the explicit requirement element set ERE to the KE (knowledge entity) set in the knowledge graph is completed, as shown in the following formula:

\begin{aligned} ERE &= \{ere_{1}^{FA}, \dots, ere_{p_{1}}^{FA}, ere_{1}^{FO}, \dots, ere_{p_{2}}^{FO}, ere_{1}^{S}, \dots, ere_{p_{3}}^{S}\} \\ &\to KE = \{ke_{1}^{FA}, \dots, ke_{n_{1}}^{FA}, ke_{1}^{FO}, \dots, ke_{n_{2}}^{FO}, ke_{1}^{S}, \dots, ke_{n_{3}}^{S}\} \\ &\quad (n_{1} + n_{2} + n_{3} \geq p_{1} + p_{2} + p_{3}) \end{aligned}

(6)

5.2. Co-Occurrence Implicit Requirement Mining of the Entity Layer

For implicit requirement elements that co-occur with explicit requirements, the structural similarity algorithm can be used to mine them. With explicit requirement entities as input, possible implicit requirement entities and their ranking are output, as shown in Figure 6:

Co-occurrence-related entities that share neighbors can be mined with node similarity measures from network theory. These node structural similarity algorithms mainly include Jaccard similarity, Sorensen–Dice similarity, Hub Promoted similarity, and Hub Depressed similarity, all of which are calculated from the neighbor node sets. This paper uses Jaccard similarity to calculate node similarity.

In the network G = (V, E), V is the vertices set and E is the edges set. For a vertex x ∈ V, N(x) represents the set of its neighbors. Then, for x, y ∈ V, the Jaccard similarity is:

\mathrm{GraphSim}_{\mathrm{Jaccard}}(x, y) = J(x, y) = \frac{|N(x) \cap N(y)|}{|N(x) \cup N(y)|}

(7)

The output of the above formula is sorted, and the co-occurrence layer implicit requirement entities are mined and recommended.
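A minimal sketch of this ranking is given below. It assumes that the neighbors of a functional or structural entity are the patent entities it is linked to, and the small neighbor table is hypothetical; entities that share no neighbor with the input entity are left to the non-co-occurrence method.

```python
def jaccard(neighbors, x, y):
    """Jaccard similarity of the neighbor sets of nodes x and y."""
    nx, ny = neighbors[x], neighbors[y]
    union = nx | ny
    return len(nx & ny) / len(union) if union else 0.0

def rank_cooccurring(neighbors, ke):
    """Rank the entities that share at least one neighbor with ke, highest score first."""
    candidates = [e for e in neighbors if e != ke and neighbors[e] & neighbors[ke]]
    scored = [(e, jaccard(neighbors, ke, e)) for e in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical graph fragment: entity -> set of patents it appears in
neighbors = {
    "sorting": {"P1", "P2", "P3"},
    "steel pipe": {"P1", "P2"},
    "conveyor belt": {"P2", "P3", "P4"},
    "welding": {"P5"},
}
ranking = rank_cooccurring(neighbors, "sorting")
```

Here “steel pipe” shares two of three patents with “sorting” (score 2/3) and ranks above “conveyor belt” (score 1/2), while “welding” is excluded because it never co-occurs with “sorting”.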

5.3. Non-Co-Occurrence Implicit Requirement Mining of the Co-Reference Embedding Layer

Some personalized implicit requirements do not have a co-occurrence relationship with the explicit requirement elements in the graph. This study uses network node embedding and link prediction algorithms to mine these implicit requirements. As in the co-occurrence case, the explicit requirement entities are taken as the input, and the possible non-co-occurrence implicit requirement entities and their ranking are output. Taking the requirement “transportation” as an example, non-co-occurrence implicit requirements such as “Preparation according to order” and “delivery” are shown in Figure 7 (in addition to the “secondary co-occurrence” implicit requirements in the figure, our method can also mine entities with other types of network structural similarity):

Link prediction is an algorithm that predicts the possibility of a new connection between two nodes in a given network. With the continuous advancement of complex network modeling technology, researchers have proposed many link prediction algorithms for different types of networks, including simple similarity-based methods, methods based on probability theory and maximum likelihood, and dimension reduction-based methods. Kumar et al. [46] found that, measured by the AUPR (area under the precision–recall curve), the dimension reduction-based network embedding algorithm Node2Vec had the best performance on the test data set. Therefore, this paper uses the Node2Vec algorithm to complete the link prediction task of the knowledge graph to assist in the discovery of non-co-occurring-related entities and support the mining of implicit requirements. The process of implicit requirement mining based on the Node2Vec algorithm includes the following three steps:

5.3.1. Build Co-Occurrence Networks of Functional and Structural Entities

If node sequences were collected directly from the knowledge graph, a walk could only start from a patent entity and the sequence length could only be 2, such as (P₁, F₁) and (P₂, S₁). Therefore, we construct entity co-occurrence networks that can generate longer node sequences, such as (F₁, F₂, F₃) and (S₁, S₃, S₂), which is conducive to capturing the relationships between entities. Two co-occurrence networks are constructed, one from the co-occurrence of functional entities and one from the co-occurrence of structural entities. The network nodes are still the functional and structural entities in the knowledge graph, and the edge weight is the number of co-occurrences. This process is shown in Figure 8.
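The construction of one co-occurrence network can be sketched as follows, assuming that two entities co-occur when they are attached to the same patent. The patent-to-entity table and the function name are hypothetical; the same function would be applied once to functional entities and once to structural entities.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(patent_entities):
    """Return undirected edge weights: (a, b) -> number of patents containing both."""
    weights = Counter()
    for entities in patent_entities.values():
        # Sorting gives each undirected edge one canonical key (a < b)
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical patents and their functional entities
patent_functions = {
    "P1": ["F1", "F2"],
    "P2": ["F2", "F3", "F1"],
    "P3": ["F3"],
}
edge_weights = build_cooccurrence(patent_functions)
```

In this example F1 and F2 appear together in two patents, so their edge has weight 2, and walks such as (F1, F2, F3) become possible even though no patent node lies on the path.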

5.3.2. Construct Node Sequences by Biased Random Walk

As shown in Figure 9, the Node2Vec algorithm draws on the method of generating word vectors. Starting from any node in the network, a biased random walk moves between connected nodes, so that node sequences of limited length are collected. The SkipGram model [47] is then trained on these sequences to obtain the embedding vector E.

The “Biased Random Walk” method proposed by Node2Vec uses two parameters to adjust the random walk process. As shown in Figure 10, nodes V₅ and V₁ belong to the same node community, showing the homogeneity of nodes; while V₅ and V₈ belong to two different communities, but have the same structure (or role) in these two communities, showing the structural equivalence of nodes. In order to adjust the weights of these two structural characteristics in the algorithm, Node2Vec generalizes DeepWalk [48]. By setting parameters, the bias of the breadth-first sampling strategy and the depth-first sampling strategy in the node sequence generation process can be selected to support the adjustment of the homogeneity and structural equivalence information reflected in the node embedding vector.

Specifically, Node2Vec defines a random walk with two parameters, p and q. Suppose a random walk has just traversed an edge (step ① of Figure 10) and the current position is at node v₄; let t denote the node visited immediately before v₄. The number on each arrow represents the step number during the search. Next, the random walk chooses its next edge by evaluating the transition probability π_{v₄x} to each neighbor x of v₄. Let the unnormalized transition probability be

\pi_{v_{4} x} = \alpha_{pq}(t, x) \cdot \omega_{v_{4} x}

(8)

\alpha_{pq}(t, x) = \begin{cases} \frac{1}{p}, & \text{if } d_{tx} = 0 \\ 1, & \text{if } d_{tx} = 1 \\ \frac{1}{q}, & \text{if } d_{tx} = 2 \end{cases}

(9)

where ω_{v₄x} is the weight of the edge between v₄ and x, and d_{tx} is the shortest-path distance between the previous node t and the candidate node x. The parameters p and q control the exploration strategy of the random walk. The parameter p controls the likelihood of the walk immediately returning to the previously visited node: a large p makes such a return unlikely, whereas a small p keeps the walk close to its starting node. The parameter q controls whether the walk stays near the previous node or moves away from it: a large q (q > 1) biases the walk toward breadth-first sampling, and a small q (q < 1) encourages the depth-first strategy.
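The walk can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation used in this study: the graph is a hypothetical weighted adjacency dictionary, and, following the original Node2Vec formulation, the distance in the bias term is measured from the previously visited node (named prev here) to the candidate node.

```python
import random

def alpha(graph, prev, x, p, q):
    """Search bias for candidate x, given the previously visited node prev."""
    if x == prev:            # distance 0: step back to the previous node
        return 1.0 / p
    if x in graph[prev]:     # distance 1: x is also a neighbor of prev
        return 1.0
    return 1.0 / q           # distance 2: x moves away from prev

def biased_walk(graph, start, length, p, q, rng):
    """Collect one node sequence of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step: no previous node, use the edge weights only
            weights = [graph[cur][x] for x in nbrs]
        else:
            prev = walk[-2]
            weights = [alpha(graph, prev, x, p, q) * graph[cur][x] for x in nbrs]
        # random.choices normalizes the weights into probabilities
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Hypothetical co-occurrence network: node -> {neighbor: edge weight}
graph = {
    "F1": {"F2": 2, "F3": 1},
    "F2": {"F1": 2, "F3": 1},
    "F3": {"F1": 1, "F2": 1, "F4": 1},
    "F4": {"F3": 1},
}
walk = biased_walk(graph, "F1", 6, p=0.25, q=4, rng=random.Random(0))
```

Repeating the walk from every node several times would yield the set of node sequences on which the SkipGram model is trained.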

5.3.3. Train the SkipGram Model to Obtain the Entity Embedding Vector, and Sort the Entities by Relevance

Based on the collected node sequence set, each node is regarded as a word and each node sequence is regarded as a sentence. Using the SkipGram method of the Word2Vec model, each node is represented as a low-dimensional dense vector E_i. The relevance between two entities x and y is then the cosine similarity of their embedding vectors:

\mathrm{GraphSim}_{\mathrm{Node2Vec}}(x, y) = \frac{E_{x} \cdot E_{y}}{\|E_{x}\| \, \|E_{y}\|}

(10)

When mining non-co-occurring implicit requirements, Equation (10) is used to calculate and sort the relevance of all entities, giving the possible implicit requirement entities.

6. Platform Development, Case Study, and Discussion

6.1. Platform Development

Based on the domain knowledge graph construction and personalized implicit requirement mining technology mentioned above, a personalized implicit requirement mining platform is designed to provide knowledge entity retrieval, query, matching, and personalized requirement identification and completion functions.

6.1.1. Platform Development and Operation Environment

The development and operation software environment and functional framework are shown in Figure 11. The programming language is Python 3.6. The structured patent data are stored in a MySQL database, while the knowledge network is stored in the graph database Neo4j. The Web backend development is mainly based on the Flask framework. The presentation layer uses the Bootstrap framework and JavaScript to provide interactive operation.

6.1.2. Construction of the Knowledge Graph in the Electromechanical Domain

The data are sourced from the experimental system of the State Intellectual Property Office of China, and consist of XML format files containing patent text information and JPEG or TIFF format files containing patent image information. The XML files are parsed into structured tables based on the format and stored in the database along with the path of the corresponding JPEG or TIFF files. Based on the technical process of constructing the domain knowledge graph, this study acquired, parsed, and stored 212,100 pieces of structured data of Chinese patents, annotated 100 domain-related tags and 3000 entities, including 1000 structural entities, forming a pre-training data set of 62.01 million words.

This study selected sentences with a length of 100 to 500 characters, delimited by Chinese periods, to create the pre-training data set. Using unlabeled patent texts, characters were masked with a probability of 15% for the masked language model training task, completing the pre-training of the model with domain-specific knowledge. The classification of structural entities in the labeled data set is shown in Table 5.
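The data preparation described above can be sketched as follows. The function names are hypothetical, and the masking is a simplification: every selected character is replaced by a [MASK] token, whereas the original BERT procedure also leaves some selected tokens unchanged or swaps them for random tokens, and the paper does not state which variant was used.

```python
import random

MASK = "[MASK]"

def select_sentences(text, min_len=100, max_len=500):
    """Split on the Chinese period and keep sentences of 100 to 500 characters."""
    return [s for s in text.split("。") if min_len <= len(s) <= max_len]

def mask_characters(chars, prob=0.15, rng=None):
    """Mask each character independently with probability `prob`."""
    rng = rng or random.Random()
    masked, labels = [], []
    for ch in chars:
        if rng.random() < prob:
            masked.append(MASK)
            labels.append(ch)      # target the model must recover
        else:
            masked.append(ch)
            labels.append(None)    # position is not scored
    return masked, labels

# Hypothetical text: one short, one suitable, and one overlong sentence
text = "短句。" + "长" * 120 + "。" + "超" * 600 + "。"
sentences = select_sentences(text)
masked, labels = mask_characters(list(sentences[0]), rng=random.Random(0))
```

Only the 120-character sentence passes the length filter, and the labels record the original character at each masked position for the masked language model loss.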

The training information of the four models is shown in Table 6. The * mark represents the model proposed in this paper.

Finally, a knowledge graph of the electromechanical domain was constructed, which included 94,379 patent entities, 27,098 functional entities, and 856,461 structural entities. A partial screenshot of the graph is shown in Figure 12.

6.1.3. Function Modules of the Platform

Functional modules of the platform include three parts: requirement element identification, requirement element–knowledge entity matching, and implicit requirement mining.

Requirement element identification

The requirement element identification interface is shown in the figure. Users can directly enter personalized requirements in the text box and then click the “Requirement Analysis” button to obtain the domain identification of requirement elements. Users can add and delete them.

Requirement elements–knowledge entity matching

The requirement elements–knowledge entity matching page is shown in the figure. Users can adjust and update the requirement elements, and add and delete matched knowledge entities. The entity-matching results are presented in three parts: “FAs”, “FOs”, and “Structures”. Users can select a knowledge entity by clicking the “Add” button or the “View” button to view the entity’s information.

Implicit requirement mining

Based on the mining methods proposed in this paper, the platform lists all the implicit requirements that have been mined, and users can click the “Add” button to select them, as shown in Figure 13 and Figure 14. The interface for implicit requirement mining will be presented in the Case Study.

6.2. Case Study

In order to verify the feasibility of the proposed personalized implicit requirement mining method, this paper takes the personalized requirement of mechanical automation equipment on a crowdsourcing platform as an example. The original requirement description proposed by the user is as follows:

“My company is engaged in building materials. I hope there will be a product that can realize the automatic sorting of steel pipes. Now, we sort them manually on the open ground and put them on the truck. We sort about ten trucks a day. I hope that the equipment can replace the manual work. Manufacturers who can do it can contact me to communicate. There is no limit on the region.”

6.2.1. Explicit Requirement Element Identification

From the explicit requirement elements identified by the algorithm, the FA element “sorting” and the FO element “steel pipe” are taken as examples.

6.2.2. Requirement Element–Knowledge Entity Matching

According to the results returned by semantic similarity, the “sorting” and “steel pipe” entities already exist in the knowledge graph, and the matching is completed directly.

6.2.3. Implicit Requirement Mining

The co-occurrence and non-co-occurrence entities of “sorting” and “steel pipe” are calculated. In the link prediction process, since the mining of implicit requirements focuses on discovering node homogeneity, the selection of parameters p and q should focus on depth priority, so p = 0.25 and q = 4 are selected as the parameters for controlling the biased random walk as shown in Table 7.

The element identification, entity matching, and requirement mining results are shown in Figure 15 and Table 8:

After exploring implicit requirements, the completed user requirements can be described as follows:

“Our company is engaged in building materials. We hope to have a product that can automatically sort building materials steel pipes. The equipment is required to classify steel pipes by detecting short pipes. In some scenarios, it is necessary to verify the marking information on the inner wall, port or surface of the steel pipe. After sorting, the head ends of the steel pipes need to be aligned and placed, and then dispatched to the vehicle for transportation. Now it is manually sorted on the open ground and sorted to the vehicle. About 10 vehicles are sorted a day. I hope that the equipment can replace manual labor. Manufacturers who can do it can contact me for communication. There is no limit on the region.”

6.3. Discussions

First, we focus on the training performance of the two NLP models proposed in this study. The models were fine-tuned at different stages of the graph construction and evaluated by five-fold cross-validation, using the F1 score as the standard.

6.3.1. Domain Patent Classification Model

The results are shown in Figure 16. When the training reached the sixth round, the F1 score of each model was stable at around 90%. Starting from the seventh round, some models began to overfit, and the F1 score on the validation set showed a downward trend. Therefore, comparing the models at six training rounds, the best training performance was that of BERT-wwm-ext-patent-fusion (F1 = 92.56%).

6.3.2. Functional and Structural Entity Identification Model

Unlike the classification model, the entity identification model is evaluated with exact matching when calculating precision and recall: a prediction is recorded as a true positive (TP) only when both its boundary and its category completely match the manually annotated entity. Additionally, Micro-F1 is used, that is, the score is calculated from the overall precision and recall across all categories.
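The exact-match, micro-averaged evaluation can be sketched as below, assuming that the reported score is the F1 measure (the harmonic mean of precision and recall). Entities are represented as hypothetical (start, end, category) tuples, and the counts are pooled over all sentences and categories before precision and recall are computed.

```python
def micro_f1(gold, pred):
    """Exact-match micro-averaged precision, recall, and F1 over all sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # boundary and category both match
        fp += len(p - g)   # predicted but not annotated
        fn += len(g - p)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical annotations for two sentences
gold = [[(4, 4, "FO"), (5, 5, "FA")], [(1, 3, "S")]]
pred = [[(4, 4, "FO"), (5, 6, "FA")], [(1, 3, "S")]]
precision, recall, f1 = micro_f1(gold, pred)
```

The second prediction of the first sentence has the right category but the wrong end boundary, so under exact matching it counts as both a false positive and a false negative.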

The results are shown in Figure 17 and Figure 18. The F1 score for identifying the functional entities from the patent titles is about 70%, mainly because patent titles are short texts that contain little context information and have obvious unstructured characteristics, which makes it harder for the model to obtain information. The F1 score of the structural entity identification model on the patent claim text is relatively high, up to 94.6%, and BERT-wwm-ext-patent-fusion achieved the best performance in training both of the above models.

6.3.3. Structural Entities Classification Model

As shown in Figure 19, the F1 score of the pre-trained model proposed in this study exceeds that of the benchmark BERT model by over 2 percentage points, and the BERT-base-patent-fusion model was used as the classification model of structural entities.

The performance of the four pre-trained models is summarized in Table 9.

The results indicate that the pre-training models proposed in this study outperformed the other models at different stages of graph construction.

In addition, the final output of this research, namely the results of personalized implicit requirement mining, should be discussed. The case study shows that the personalized requirements are complemented through the mining of implicit requirements. The complemented user requirements can improve the quality of prompts and the interaction efficiency between users and experts in the process of group intelligence innovation.

7. Conclusions

The Internet’s rapid development has led to a surge in the demand for personalized product design and development. However, meeting the implicit requirements that drive product innovation remains a challenge. To address this, this study proposed a method for mining personalized implicit requirements based on a domain knowledge graph. The method analyzes personalized requirements and the knowledge in domain patents. A patent knowledge ontology layer representation method was proposed, which expands the information contained in the knowledge graph and leads to the construction of a domain knowledge graph. The two NLP models proposed in the study improve the efficiency of graph construction. Two implicit requirement mining methods are proposed: one based on structural similarity and the other based on link prediction. These methods aim to overcome subjective professional limitations and broaden the requirement space for product innovation design.

The method we propose and the platform we develop have good application prospects in the following contexts: First, in the context of generative design driven by large language models, it can be applied to prompt engineering, mining real and deep requirements that users find difficult to express, improving the model’s understanding and empathy abilities, and further enhancing the feasibility and innovation of product development. Second, in the context of crowd innovation, it is possible to break through the subjective limitations of user requirements, improve communication efficiency between users and designers, reduce design iterations, and accelerate innovation convergence.

However, this study has some limitations that require further exploration. On the one hand, a requirement expression describes the functions that a product or component should achieve in a given context. The requirement analysis method described in this paper extracts and mines the functional and structural requirements of products, but it does not sufficiently consider the requirement context. On the other hand, the study uses patents as the data and knowledge source of the domain knowledge network; in actual engineering applications, different enterprises have different types of databases and knowledge bases, which store less data and cover a limited technical field. The effectiveness of the proposed method on this kind of knowledge base may thus be limited.

Future research could first focus on incorporating the usage context into personalized requirement analysis. Second, a general requirements graph could be constructed from comment data with the method proposed in this study, so that the implicit requirement mining methods can also be applied to general requirements, realizing a mining methodology that drives design innovation and is coupled with personalized requirements. Additionally, in practical engineering applications, how to further expand the knowledge sources and build a knowledge graph with both breadth and depth through knowledge integration remains to be studied.

Author Contributions

Conceptualization, Z.M. and L.G.; methodology, Z.M. and J.G.; software, Z.M. and J.G.; validation, Z.M. and J.G.; formal analysis, J.G.; investigation, Z.M.; data curation, J.G.; writing—original draft preparation, Z.M. and J.G.; writing—review and editing, H.C. and J.L.; visualization, Z.M.; supervision, L.G.; project administration, L.G.; funding acquisition, L.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 52375229, Theory and Method of Collective Intelligence Innovative Design Space Collaborative Exploration and Innovation Generation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Luo, J. Designing the future of the fourth industrial revolution. J. Eng. Des. 2023, 34, 779–785.
Guo, W.; Zhen, W.; Shao, H.; Lei, W.; Lin, G.; Yu, S.; Feng, Y.; Wan, X.; Liu, J. Crowdsourcing design theory and key technology development. Comput. Integr. Manuf. Syst. 2022, 28, 2650.
Guo, W.; Feng, Y.; Lei, W. Crowdsourcing Design Theory and Method; China Machine Press: Beijing, China, 2023; p. 306.
Mi, S.; Hong, Z.; Feng, Y.; Lou, S.; Fei, S.; Zhou, K.; Xiong, T.; Guo, W.; Tan, J. Process control key technologies of crowdsourcing design for product innovation, research and development. Comput. Integr. Manuf. Syst. 2022, 28, 2666.
Zhu, Q.; Luo, J. Toward artificial empathy for human-centered design. J. Mech. Des. 2024, 146, 61401.
Liu, Q.; Wang, K.; Li, Y.; Liu, Y. Data-driven concept network for inspiring designers’ idea generation. J. Comput. Inf. Sci. Eng. 2020, 20, 31004.
Li, X.; Chen, C.; Zheng, P.; Wang, Z.; Jiang, Z.; Jiang, Z. A knowledge graph-aided concept–knowledge approach for evolutionary smart product–service system development. J. Mech. Des. 2020, 142, 101403.
Liu, Q.; Wang, K.; Li, Y.; Chen, C.; Li, W. A novel function-structure concept network construction and analysis method for a smart product design system. Adv. Eng. Inf. 2022, 51, 101502.
Tan, L.; Zhang, H. An approach to user knowledge acquisition in product design. Adv. Eng. Inf. 2021, 50, 101408.
Goldberg, D.M.; Abrahams, A.S. Sourcing product innovation intelligence from online reviews. Decis. Support. Syst. 2022, 157, 113751.
Jin, J.; Liu, Y.; Ji, P.; Kwong, C.K. Review on recent advances in information mining from big consumer opinion data for product design. J. Comput. Inf. Sci. Eng. 2019, 19, 10801.
Young, T.; Hazarika, D.; Poria, S.; Cambria, E. Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 2018, 13, 55–75.
Li, X.; Chen, C.; Zheng, P.; Jiang, Z.; Wang, L. A context-aware diversity-oriented knowledge recommendation approach for smart engineering solution design. Knowl.-Based Syst. 2021, 215, 106739.
Zhang, G.; Kou, J.; Chen, L. Web review of a text-driven vehicle design planning approach. Mach. Des. 2021, 38, 139–144.
Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356.
Zheng, P.; Xu, X.; Xie, S.-Q. A weighted interval rough number based method to determine relative importance ratings of customer requirements in QFD product planning. J. Intell. Manuf. 2019, 30, 3–16.
Chen, Z.; Ming, X.; Zhou, T.; Chang, Y.; Sun, Z. A hybrid framework integrating rough-fuzzy best-worst method to identify and evaluate user activity-oriented service requirement for smart product service system. J. Clean. Prod. 2020, 253, 119954.
Haber, N.; Fargnoli, M.; Sakao, T. Integrating QFD for product-service systems with the Kano model and fuzzy AHP. Total Qual. Manag. Bus. 2020, 31, 929–954.
Zhang, M.; Yahui, C.; Xiu, Y.; Li, L. Charging demand distribution analysis method of household electric vehicles considering users’ charging difference. Electric Power Autom. Equip./Dianli Zidonghua Shebei 2020, 40, 154–161.
Kano, N.; Seraku, N.; Takahashi, F.; Tsuji, S. Attractive quality and must-be quality. J. Jpn. Soc. Qual. Control 1984, 14, 147–156.
Mikulić, J.; Prebežac, D. A critical review of techniques for classifying quality attributes in the Kano model. Manag. Serv. Qual. Int. J. 2011, 21, 46–66.
Zhou, F.; Ayoub, J.; Xu, Q.; Jessie Yang, X. A machine learning approach to customer needs analysis for product ecosystems. J. Mech. Des. 2020, 142, 11101.
Budiarani, V.H.; Maulidan, R.; Setianto, D.P.; Widayanti, I. The kano model: How the pandemic influences customer satisfaction with digital wallet services in Indonesia. J. Indones. Econ. Bus. (JIEB) 2021, 36, 61–82.
Bhardwaj, J.; Yadav, A.; Chauhan, M.S.; Chauhan, A.S. Kano model analysis for enhancing customer satisfaction of an automotive product for Indian market. Mater. Today Proc. 2021, 46, 10996–11001.
Zhou, F.; Jiao, R.J.; Linsey, J.S. Latent customer needs elicitation by use case analogical reasoning from sentiment analysis of online product reviews. J. Mech. Des. 2015, 137, 71401.
Timoshenko, A.; Hauser, J.R. Identifying customer needs from user-generated content. Mark. Sci. 2019, 38, 1–20.
Wang, Z.; Chen, C.; Zheng, P.; Li, X.; Khoo, L.P. A graph-based context-aware requirement elicitation approach in smart product-service systems. Int. J. Prod. Res. 2021, 59, 635–651.
Chen, R.; Wang, Q.; Xu, W. Mining user requirements to facilitate mobile app quality upgrades with big data. Electron. Commer. Res. Appl. 2019, 38, 100889.
Zhang, M.; Fan, B.; Zhang, N.; Wang, W.; Fan, W. Mining product innovation ideas from online reviews. Inf. Process Manag. 2021, 58, 102389.
Chen, K.; Jin, J.; Luo, J. Big consumer opinion data understanding for Kano categorization in new product development. J. Amb. Intel. Hum. Comp. 2022, 13, 2269–2288.
Siddharth, L.; Blessing, L.; Luo, J. Natural language processing in-and-for design research. Des. Sci. 2022, 8, e21.
Jia, J.; Zhang, Y.; Saad, M. An approach to capturing and reusing tacit design knowledge using relational learning for knowledge graphs. Adv. Eng. Inf. 2022, 51, 101505. [Google Scholar] [CrossRef]
Luo, J.; Sarica, S.; Wood, K.L. Computer-aided design ideation using InnoGPS. In Proceedings of the International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Anaheim, CA, USA, 18–21 August 2019; American Society of Mechanical Engineers: New York, NY, USA, 2019; p. V02AT03A011. [Google Scholar]
Ye, F.; Fu, T.; Gong, L.; Gao, J. Cross-domain knowledge discovery based on knowledge graph and patent mining. In Proceedings of the Journal of Physics: Conference Series, Xi’an, China, 28–30 October 2021; IOP Publishing: Bristol, UK, 2021; p. 42155. [Google Scholar]
Luo, J.; Sarica, S.; Wood, K.L. Guiding data-driven design ideation by knowledge distance. Knowl.-Based Syst. 2021, 218, 106873. [Google Scholar] [CrossRef]
Verganti, R.; Vendraminelli, L.; Iansiti, M. Innovation and design in the age of artificial intelligence. J. Prod. Innov. Manag. 2020, 37, 212–227. [Google Scholar] [CrossRef]
Zhu, Q.; Zhang, X.; Luo, J. Biologically inspired design concept generation using generative pre-trained transformers. J. Mech. Des. 2023, 145, 41409. [Google Scholar] [CrossRef]
Zhu, Q.; Luo, J. Generative design ideation: A natural language generation approach. In Proceedings of the International Conference On-Design Computing and Cognition, Glasgow, UK, 4–6 July 2022; Springer: Cham, Switzerland, 2022; pp. 39–50. [Google Scholar]
Zhu, Q.; Luo, J. Generative transformers for design concept generation. J. Comput. Inf. Sci. Eng. 2023, 23, 41003. [Google Scholar] [CrossRef]
Li, Z.; Tate, D.; Lane, C.; Adams, C. A framework for automatic TRIZ level of invention estimation of patents using natural language processing, knowledge-transfer and patent citation metrics. Comput. Aided Des. 2012, 44, 987–1010. [Google Scholar] [CrossRef]
Hulth, A. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, Sapporo, Japan, 11–12 July 2003; pp. 216–223. [Google Scholar]
Cui, Y.; Che, W.; Liu, T.; Qin, B.; Yang, Z. Pre-training with whole word masking for Chinese Bert. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3504–3514. [Google Scholar] [CrossRef]
Lample, G.; Ballesteros, M.; Subramanian, S.; Kawakami, K.; Dyer, C. Neural architectures for named entity recognition. arXiv 2016, arXiv:1603.01360. [Google Scholar]
Daud, N.N.; Ab Hamid, S.H.; Saadoon, M.; Sahran, F.; Anuar, N.B. Applications of link prediction in social networks: A review. J. Netw. Comput. Appl. 2020, 166, 102716. [Google Scholar] [CrossRef]
Kumar, A.; Singh, S.S.; Singh, K.; Biswas, B. Link prediction techniques, applications, and performance: A survey. Phys. A Stat. Mech. Its Appl. 2020, 553, 124289. [Google Scholar] [CrossRef]
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 26. [Google Scholar]
Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710. [Google Scholar]

Figure 1. Examples of personalized requirement characterization.

Figure 2. Research framework.

Figure 3. Construction process of the domain knowledge graph based on patent data.

Figure 4. A schematic diagram of the domain knowledge graph ontology layer.

Figure 5. The sentence sequence classification principle based on the BERT-Transformer module.

Figure 6. Implicit requirement elements that co-occur with “transport” (outermost entities/nodes).

Figure 7. A type of implicit requirement element that is relevant to “transport” but has no co-occurrence relationship with it.

Figure 8. Generation of co-occurrence networks based on the domain knowledge graph.

Figure 9. Node2Vec algorithm process.

Figure 10. Breadth-first and depth-first walk strategies.

Figure 11. The software environment and functional framework of the personalized implicit requirement mining platform.

Figure 12. Screenshot of a domain knowledge graph and details of patent entities.

Figure 13. Requirement element identification interface.

Figure 14. Requirement element–knowledge entity matching interface.

Figure 15. Personalized implicit requirement mining results of the case study.

Figure 16. Training performance of the domain patent classification model.

Figure 17. Training performance of the functional entity identification model in the patent titles.

Figure 18. Training performance of the structural entity identification model in the patent claim.

Figure 19. Training performance of the structural entity classification model.

Table 1. Comparison of general user requirements and personalized requirements.

Attributes	General User Requirements	Personalized Requirements
Requirements Sources	User comments from ordinary e-commerce platforms	Tasks from crowdsourcing/collaborative community platforms
Targeted Products	Mass-produced consumer products	Mechanical and electromechanical equipment, customized products, non-ordinary consumer products
User Characteristics	Ordinary consumers	Field enthusiasts or practitioners
Requirements Characteristics	Dynamics, complexity, concealment, unstructured description, emotion, ambiguity	Dynamics, complexity, concealment, unstructured description, semi-professional
Contents	Fuzzy expression of user requirements	Core functional requirements and even some structural requirements
Examples	I want a mobile phone with a large screen, thin body, clear pictures, especially at night, with no card, to use for more than 3 years, to support 2 days without charging.	Our company is engaged in automation. At present, we need to purchase a manipulator to cooperate with the carton-forming machine assembly line to realize automatic loading and unloading. The current manipulator is expensive, and we want to replace it with a domestic one. Requirements: purchase a six-axis manipulator, which can cooperate with the forming machine process.

Table 2. The characteristics and typical manifestations of general user implicit requirements and personalized implicit requirements.

Attributes	General User Implicit Requirements	Personalized Implicit Requirements
Kano Types	Basic and expected	Attractive
Characteristics	The incompleteness of the description	Subjective limitations of the users
Typical Manifestations	Standard function operation words, typical structure words, and standard engineering parameters	Sub-actions of FAs, pre-/post-action of FAs, sub/parent or very similar structures of FOs/structures
Example	Smooth operation, HD camera, good durability, large capacity battery, etc.	Ingredients, testing, grasping, transportation, sorting, robotic hand, robotic arm, etc.

Table 3. Structural entity classification information table.

Abbreviation	Meaning	Examples
PC	Power Component	Traveling motor, turbine device
TC	Transport Component	Mud settling pipeline, discharge belt
MC	Measuring Component	Tank pressure monitor, low temperature sensor
AC	Adjusting Component	Pump motor stop button, lock control system host
RM	Raw Materials	Columnar metal, spherical catalyst
BP	Mechanical Basic Part or Other Component	Screws, gears, boxes
EC	Electronic Component	FPGA logic controller, ripple generation circuit
FM	Functional Module	Disabled scooters, alloy production devices
IE	Invalid Entity	Triples, cells

Table 4. Methods of multi-category personalized implicit requirement mining.

Types of Personalized Implicit Requirements	Relations with the Explicit Requirement Entities	Mining Method
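Sub-actions of the FAs (Type I)	Co-occurrence mostly	Entity-layer; Structural similarity
Pre-/post-action of FAs (Type II)	Co-occurrence mostly	Entity-layer; Structural similarity
Sub/parent structures of FOs/structures (Type III)	Co-occurrence mostly	Entity-layer; Structural similarity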
Pre-/post-actions of sub-actions of FAs (Type IV)	Non-co-occurrence mostly	Co-reference embedding layer; Link prediction
Similar structures of FOs/structures (Type V)	Non-co-occurrence	Co-reference embedding layer; Link prediction

Table 5. Structural entities classification information in the domain knowledge graph.

Labels	PC	TC	MC	AC	RM	BP	EC	FM	IE
Proportion	3.10%	7.00%	1.70%	3.00%	3.40%	46.30%	12.50%	6.80%	16.20%

Table 6. Training information of four models in knowledge graph construction.

Models/Attributes	BERT-Base	BERT-Base-Patent-Fusion *	BERT-wwm-ext	BERT-wwm-ext-Patent-Fusion *
Mask	Character	Character	Whole Word	Character
Data Source	Chinese Wiki	Patent Data	Chinese Wiki and extension	Patent Data
Number of Words	40 Million	62.01 Million	5 Billion	62.01 Million
Initialization Mode	Random	BERT-base	BERT-base	BERT-wwm-ext

Table 7. Node2Vec parameter setting.

Parameters	Parameter Meaning	Values
dim	Number of dimensions of the generated representation vectors	128
number-walks	Number of random walks started from each node	10
walk-length	Length of each random walk	30
workers	Number of parallel workers used by the algorithm	10
window-size	Context window size of the SkipGram model	10
p	Return parameter of the biased random walk (controls stepping back to the previous node)	0.25
q	In-out parameter of the biased random walk (controls moving away from the previous node)	4
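
The walk settings in Table 7 can be illustrated with a minimal sketch of a Node2Vec-style biased random walk. This is not the platform's implementation, which is not shown in the article. The function names, the `PARAMS` dictionary, and the toy graph are assumptions made for illustration, the graph is treated as undirected and unweighted, and `walk-length` is interpreted here as the number of nodes per walk. The SkipGram training step (dim, window-size, workers) is left out.

```python
import random

# Walk-related values from Table 7; the key names are illustrative only.
PARAMS = {"number_walks": 10, "walk_length": 30, "p": 0.25, "q": 4.0}


def biased_walk(graph, start, walk_length, p, q, rng):
    """Generate one second-order biased random walk (Node2Vec style).

    graph maps each node to the set of its neighbours.
    """
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)      # stay adjacent to the previous node
            else:
                weights.append(1.0 / q)  # move further away from it
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, number_walks, walk_length, p, q, seed=0):
    """Start number_walks walks from every node, in shuffled node order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(number_walks):
        nodes = sorted(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(graph, node, walk_length, p, q, rng))
    return walks


# Toy co-occurrence network; the edges are invented for illustration.
toy_graph = {
    "sorting": {"transport", "detection"},
    "transport": {"sorting", "place"},
    "detection": {"sorting"},
    "place": {"transport"},
}
walks = generate_walks(toy_graph, **PARAMS)
```

With p = 0.25 and q = 4, the weight for stepping back (1/p = 4) is much larger than the weight for moving outward (1/q = 0.25), so the walks tend to stay close to their start node, which corresponds to the breadth-first-like strategy of Figure 10. The resulting node sequences would then serve as the "sentences" for SkipGram training.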

Table 8. Implicit requirements mined in the case study.

ERE	Implicit Requirements	Co-Occurrence Frequency	Jaccard Similarity	Cosine Similarity	Types	Whether to Adopt
Sorting	Transport	4	0.89%	/	II	Yes
	Classification	2	0.94%	/	I	Yes
	Detection	2	0.93%	/	I	Yes
	Marking	1	0.93%	/	II	Yes
	Check	1	0.93%	/	I	Yes
	Assignment	1	0.93%	/	I	No
	Tin dipping	1	0.93%	/	II	No
	Prevent jamming	1	0.91%	/	II	No
	Shuttle	/	/	65.52%	IV	No
	Delivery	/	/	65.12%	IV	No
	Logistics	/	/	64.85%	IV	No
	Distribution	/	/	63.99%	IV	No
	Store	/	/	63.46%	IV	No
	Place	/	/	62.91%	IV	Yes
Steel Pipe	Inner wall	3	2.24%	/	III	Yes
	Port	2	2.04%	/	III	Yes
	Surface	4	1.22%	/	III	Yes
	Transport height	1	1.14%	/	III	No
	Side formwork	1	1.14%	/	III	No
	Building materials	1	0.96%	/	III	Yes
	Steel ring	/	/	69.85%	V	No
	Short pipe	/	/	64.62%	V	Yes
	Body of ships	/	/	64.29%	III	No
	Billet rod	/	/	64.25%	V	No
	Head end	/	/	64.02%	III	Yes

Note: The specific meaning of the categories from I to V is shown in Table 4.
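
The two similarity columns of Table 8 can be illustrated with a short sketch of the standard Jaccard and cosine measures. The exact sets and vectors that the platform compares are not given in this part of the article, so the patent IDs and the three-dimensional embeddings below are hypothetical (Table 7 specifies 128 dimensions), and the function names are assumptions.

```python
import math


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical sets of patents in which each entity appears.
patents_with = {
    "sorting": {"P1", "P2", "P3", "P4"},
    "transport": {"P3", "P4", "P5"},
}

# Hypothetical low-dimensional embeddings of two entities.
embedding = {
    "sorting": [1.0, 0.0, 1.0],
    "place": [1.0, 1.0, 0.0],
}

# Co-occurring pair: 2 shared patents out of 5 distinct ones.
j = jaccard(patents_with["sorting"], patents_with["transport"])

# Non-co-occurring pair: compare the embedding vectors instead.
c = cosine(embedding["sorting"], embedding["place"])
```

In this reading, candidates that co-occur with the explicit requirement entity are ranked by a set-overlap score, while candidates without any co-occurrence are ranked by the cosine similarity of their embedding vectors.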

Table 9. Performance of each pre-trained model on each problem of knowledge graph construction.

Problems	BERT-Base (Round)	BERT-Base (F-Score)	BERT-wwm-ext (Round)	BERT-wwm-ext (F-Score)	BERT-Base-Patent-Fusion (Round)	BERT-Base-Patent-Fusion (F-Score)	BERT-wwm-ext-Patent-Fusion (Round)	BERT-wwm-ext-Patent-Fusion (F-Score)
Domain Patent Classification	6	87.71 ± 6.41%	6	89.06 ± 8.69%	6	91.61 ± 6.54%	6	92.56 ± 7.30%
Functional Entity Identification	8	65.42 ± 3.39%	9	65.65 ± 3.54%	7	65.39 ± 2.42%	10	67.58 ± 2.92%
Structural Entity Identification	7	94.28 ± 3.59%	9	94.38 ± 2.68%	10	94.48 ± 3.49%	8	94.64 ± 2.20%
Structural Entity Classification	4	74.40 ± 4.26%	9	73.13 ± 3.66%	5	76.88 ± 3.80%	9	76.60 ± 2.54%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
