1. Introduction
Prefabricated construction (PC) serves as a platform for driving reforms and facilitating the transformation of the construction industry. It presents remarkable advantages in resource and time efficiency, energy conservation, minimizing site pollution, and enhancing lifecycle benefits, all of which have garnered increasing attention. Unlike conventional on-site construction, PC involves the production of building components in factories, which are then transported to sites for assembly by workers and machinery [
In recent years, numerous technological innovations have emerged in the field of PC, serving as the cornerstone of its comprehensive performance and competitiveness and acting as key drivers for overcoming current limitations, such as the extended construction period associated with cast-in-place construction. Specifically, technological innovation in prefabricated buildings encompasses the adoption of new materials, advanced manufacturing processes, and innovative design methods to enhance building quality, reduce costs, and shorten construction durations, among other benefits [
2]. Examples, including 2D-to-3D assembling techniques [
3], automated connection devices [
4], and recycled concrete technology [
5], have garnered substantial attention from both academic and practical perspectives.
However, technological innovation knowledge in PC spans various domains, including Building Information Modeling (BIM) applications, construction technology advancements, design and material innovations, project management, and energy efficiency optimization, and most existing research focuses on a single technology within a specific field. For instance, Li et al., 2021 [6] pointed out that, although many technical studies have been published in the field of PC, most are research papers or opinion papers, and review articles are lacking. Zhang et al., 2021 [7] also noted that existing reviews each focus on a particular aspect of PC, such as supply chain management, energy performance, or structural design and performance. This fragmentation complicates knowledge acquisition for researchers, corporate investors, and policymakers and hinders the effective identification of intrinsic technological connections. Scholars have tried to address this knowledge fragmentation. For example, Masood et al., 2022 [8] selected 526 research studies on PC, summarized the current status of the field, and conducted keyword co-occurrence analysis with visualization software. Liu et al., 2021 [9] surveyed the worldwide technical literature on PC, conducted co-author, co-word, and co-citation analyses, and identified important institutions and popular keywords in this field. Loganathan et al., 2022 [10] used bibliometric methods to analyze the global knowledge of PC technology in detail and derived co-citation clusters. However, the descriptive statistics and bibliometric methods adopted by these studies are coarse-grained, so their findings cannot be effectively extended to the internal relationships among individual technologies. Moreover, because they ignore how the field changes over time, their practical value is limited. Overall, as technological innovation in PC accumulates, existing bibliometric methods can no longer meet the increasingly complex needs of the engineering field, and new methods and theoretical support are urgently needed.
Knowledge graphs, structured as “entity–relationship–entity” or “entity–attribute–attribute value” triples, have been employed to represent entities and their relationships, providing an effective approach for integrating heterogeneous knowledge from various sources. Specifically, graphs, visualized through nodes and connecting edges [
11], can depict the structure of complex systems, whether natural or abstract [
12], and have demonstrated their value in structured organization and data management across numerous fields [
13]. Furthermore, graph analysis methods enable the systematic exploration of interactions among multiple information elements [
14]. Scholars have utilized knowledge graph tools in the domain of PC. For instance, Zhou et al. [15] transformed XML (Extensible Markup Language)-structured building regulations into a Neo4j 4.0 graph database, facilitating relationship retrieval, conflict analysis, and compliance review. The knowledge graph, as a semantic network technology, presents the associations among knowledge items in a structured manner, while text mining extracts valuable information from large volumes of textual data. This paper introduces these two tools into the field of technological innovation in PC. The method adopted in this study is intended to address the issue of knowledge fragmentation and to move knowledge management beyond the limitations of descriptive statistics.
This study, focusing on technological innovation in PC, utilizes patent data along with text mining and knowledge graph methods to aggregate, correlate, and store knowledge related to PC technological innovations. Specifically, a specialized lexicon was first developed using the relevant literature, industry standards, and expert opinions, and was then used to segment patent data related to technological innovations in PC. The LDA topic model was next applied to the processed data to uncover the evolution of thematic paths, thereby forming the classification framework and entity foundation for constructing the knowledge graph. The Apriori algorithm was then employed to identify technological associations within and across themes, while Gephi 0.9.4 software was utilized to visualize relationships within the patent data, enabling the extraction of relationship types. Finally, the extracted entities and relationships were stored in the Neo4j graph database to support visual query functionalities. This study integrates knowledge on technological innovations in PC and promotes the application of knowledge graphs to advance innovation in this field.
3. Methodology
The framework of the research methodology is shown in Figure 1 and is divided into the following four key components:
Data collection and processing: Recent patent data on technological innovations in PC are collected extensively. Based on the technical framework of PC and relevant domain knowledge, natural language processing and machine learning techniques are employed to preprocess the patent data, including tasks such as data cleaning and abstract tokenization.
Text entity recognition: A text feature matrix is constructed and fed into the LDA topic model, which outputs visualization results. This step involves conducting topic clustering and evolution analysis to extract relevant entities.
Association rule mining: Using complex network theory and association rule techniques, the Apriori algorithm is applied for association rule analysis. A citation network is constructed from citation data to facilitate the extraction of relationships for the knowledge graph of PC technological innovation.
Knowledge graph construction: The data layer and schema layer of the knowledge graph are developed, with the PC technological innovation data stored in a Neo4j database. This enables visual querying and exploration of the knowledge graph.
3.1. Data Processing
To ensure the research data’s advancement, standardization, and timeliness, this study utilizes the patent database of the China National Intellectual Property Administration as the primary data source. This database provides extensive coverage of patent information in China, characterized by high-quality data and ease of access, thereby supporting the thematic modeling analysis and data-preprocessing tasks required for this study. Given the rapid advancements in PC technology in recent years, multiple retrieval iterations were performed to comprehensively gather data on technological innovations in this field.
As invention patents reflect a higher level of technological innovation and more accurately represent technological progress, this study focuses on “invention” and “utility model” patents, limited to those in the “valid” status. To capture the latest trends in PC technological innovation, the data collection period spans from 2020 to 2024. Advanced search methods in the China National Intellectual Property Administration database included a “de-duplication by application number” technique to effectively remove redundant records, culminating in the successful collection of 17,000 patents relevant to PC. This dataset provides a robust foundation for subsequent analyses of technological innovation and the construction of the knowledge graph.
The data processing procedure is divided into the following steps:
Perform word segmentation on the original data. Use the Jieba library in Python, together with the professional dictionary and the stop-word dictionary, to achieve batch word segmentation.
Divide the data by time period to facilitate cluster analysis in different time periods.
Filter the segmented data to retain technology-related words, and then convert them into a one-hot word matrix for association rule analysis.
Back up the original data, extract the publication number and cited patent fields from the original data using Python, clean the overflow data, and build a one-to-one correspondence table to ensure that each patent has a unique ID. Then, import the processed data into Gephi to visualize the citation network.
Save the result table obtained from the association rule analysis in the form of triples, and use Cypher to batch import the data into the Neo4j database. The import process first involves storing the knowledge content in an Excel file and converting it into CSV format. Subsequently, store the CSV file in the import directory of the Neo4j Community version and use the “LOAD CSV” command to import the data into the database.
A partial word list required for word segmentation is shown in Table 1:
Given that patent texts frequently involve complex professional terminology, it is essential to first develop a specialized dictionary containing these terms to ensure accurate identification and segmentation in the tokenization process. The dictionary is stored as a text file, with each term listed on a separate line. To address the prevalent issue of synonyms in PC, a synonym table is also created to standardize the text by replacing synonyms with their standard equivalents, thus ensuring consistency in terminology. This study further integrates four common stop word lists and expands them by adding domain-specific terms that have no substantive impact on the analysis, such as “the present invention”, “utility model”, and “relates to”. During practical implementation, additional stop words were iteratively included as needed. Upon completing these preparatory steps, part-of-speech filtering is performed to remove unnecessary words, refining the dataset for subsequent analysis.
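The normalization and matrix-building steps described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the text has already been segmented (for example with Jieba), and the function names, the synonym pair, and the sample tokens are all hypothetical.

```python
def normalize_tokens(tokens, synonyms, stop_words):
    """Replace synonyms with their standard term, then drop stop words."""
    standardized = [synonyms.get(tok, tok) for tok in tokens]
    return [tok for tok in standardized if tok not in stop_words]


def one_hot_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] is 1 if term j occurs in document i."""
    vocabulary = sorted({tok for doc in documents for tok in doc})
    rows = []
    for doc in documents:
        present = set(doc)
        rows.append([1 if term in present else 0 for term in vocabulary])
    return vocabulary, rows


# Hypothetical synonym table, stop words, and pre-segmented abstracts.
synonyms = {"precast": "prefabricated"}
stop_words = {"the present invention", "utility model", "relates to"}
segmented = [
    ["the present invention", "relates to", "precast", "wall panel", "bolt"],
    ["utility model", "prefabricated", "insulation layer"],
]

documents = [normalize_tokens(doc, synonyms, stop_words) for doc in segmented]
vocabulary, rows = one_hot_matrix(documents)
```

Each row of the resulting matrix can be read as one transaction (one patent) for association rule mining.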
3.2. Text Entity Recognition
A topic model is an unsupervised machine learning technique aimed at automatically uncovering latent topics from large text corpora. These models play a significant role in text processing and analysis and are widely applied in both industry and academia. Topic models facilitate semantic representation, semantic matching computations, and the presentation of model content.
The LDA (Latent Dirichlet Allocation) model assumes that a document is generated through the process shown in Figure 2:
In this process, α and β are the parameters of the Dirichlet priors from which the document–topic distribution θ and the topic–word distribution φ are drawn, respectively. For each document, the topic distribution θ is first generated using α. Subsequently, for each word position, a topic Z is selected based on θ, and a specific word W is then drawn from the word distribution φ of topic Z, which is itself generated from β. N denotes the number of words in the document, and K represents the number of topics. This process is depicted using directed edges in the figure, with each node representing a random variable. The boxes indicate repeated processes: the inner box represents the word generation process, while the outer box signifies the document generation process.
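The generative story can be made concrete with a small simulation. The sketch below uses only the Python standard library and assumes symmetric Dirichlet priors; the function names and the toy vocabulary are illustrative and are not taken from the study.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(vocab, n_topics, n_words, alpha, beta, seed=0):
    """Simulate the LDA generative process for a single document."""
    rng = random.Random(seed)
    # phi[k]: word distribution of topic k, drawn from Dirichlet(beta)
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(n_topics)]
    # theta: topic distribution of this document, drawn from Dirichlet(alpha)
    theta = sample_dirichlet(alpha, n_topics, rng)
    topics, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(n_topics), weights=theta)[0]  # topic Z drawn from theta
        w = rng.choices(vocab, weights=phi[z])[0]           # word W drawn from phi[Z]
        topics.append(z)
        words.append(w)
    return theta, topics, words


# Toy vocabulary (hypothetical), K = 3 topics, N = 10 words.
toy_vocab = ["wall", "bolt", "panel", "insulation"]
theta, topics, words = generate_document(toy_vocab, 3, 10, alpha=0.5, beta=0.5)
```

Fitting an LDA model reverses this process: θ and φ are inferred from the observed words.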
The main experimental tool used in this study is Python 3.8.3. The primary libraries include pandas and numpy for data analysis, matplotlib and pyLDAvis for visualization, sklearn for LDA modeling, and jieba for tokenization. The framework for text entity extraction is shown in Figure 3:
The text feature matrix derived in Section 3.1 is used as input for the LDA model for training. The key parameters for constructing the LDA model include the number of topics K, the hyperparameter α for the document–topic distribution, the hyperparameter β for the topic–word distribution, and the number of iterations.
To determine the number of topics K, this study employs perplexity as a metric to evaluate model accuracy, a common approach for identifying the optimal number of topics in LDA models. The perplexity function provided by Python’s sklearn library is employed to assess the perplexity at different values of K, with visualization conducted using the matplotlib library. Based on predictive analysis and empirical experience, the upper limit for K is set at 30, and perplexity is calculated for K values ranging from 1 to 30. The results show that perplexity decreases and approaches stability as the number of topics increases.
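For reference, perplexity is conventionally defined as the exponential of the negative average per-word log-likelihood. The symbols M (number of documents), w_d (the words of document d), and N_d (the number of words in document d) are notation introduced here for illustration; library implementations such as sklearn's typically report an approximation based on a variational bound rather than the exact likelihood.

```latex
\[
\mathrm{perplexity}(D) \;=\; \exp\!\left( -\,\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
\]
```

Lower values indicate a better fit. A curve that flattens as K grows means that additional topics bring little further improvement.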
For the hyperparameters α (document–topic prior) and β (topic–word prior), this study, after several experimental trials, selected commonly adopted default values, defined as α = 1/K and β = 1/K. To ensure the robustness and effectiveness of the model, a higher iteration count of 50 was chosen. Once the model parameters were finalized, the document–topic probability matrix and the topic–word distribution matrix were obtained. Visualization tools, including pyLDAvis and Sankey diagrams, facilitated an intuitive understanding of the relationships and distribution of topics.
3.3. Association Rule Mining
3.3.1. Apriori Algorithm
The association rule method is a data-mining technique designed to uncover relationships between variables within a dataset. Initially applied in market basket analysis to study customer purchasing behavior and reveal product co-occurrence relationships, this method has since found applications in marketing, medical diagnostics, web recommendations, and more.
The Apriori algorithm, introduced by Agrawal and Srikant in 1994, is a foundational algorithm for mining frequent itemsets in Boolean association rules, utilizing candidate itemsets to identify frequent itemsets. The core mechanism of Apriori is an iterative process known as level-wise search, where k-itemsets are used to search for (k + 1)-itemsets. The process begins with identifying frequent itemsets that meet a minimum support threshold. Subsequently, association rules are derived from these frequent itemsets based on a minimum confidence threshold. The detailed steps are as follows:
(1) Generate candidate itemsets: Generate all potential candidate itemsets from the dataset.
(2) Identify frequent itemsets: Calculate the support for each candidate itemset and retain those with a support value meeting or exceeding the minimum support threshold; these are the frequent itemsets.
(3) Iteratively build larger itemsets: Combine frequent k-itemsets to form (k + 1)-itemsets, repeating steps 1 and 2 until no additional frequent itemsets can be generated.
(4) Derive association rules: Create association rules from the frequent itemsets that exhibit a confidence level that meets or exceeds the minimum confidence threshold.
Thus, it is essential to predefine the minimum support and confidence thresholds before initiating the mining process. The Equations for support and confidence are as follows:
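In their standard form, the two measures can be written as below. Here N (the total number of transactions, which in this setting are patents) and count(·) (the number of transactions containing an itemset) are notation assumed for this sketch.

```latex
\[
\mathrm{Support}(A \rightarrow B) \;=\; P(A \cup B) \;=\; \frac{\mathrm{count}(A \cup B)}{N}
\]
\[
\mathrm{Confidence}(A \rightarrow B) \;=\; P(B \mid A) \;=\; \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A)}
\]
```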
In these Equations, support denotes the frequency with which an itemset appears in the dataset, and the Apriori algorithm uses it to quantify the occurrence of itemsets. Confidence refers to the conditional probability of the rule A → B, representing the likelihood that itemset B appears given the presence of itemset A, and is typically used to assess the strength of an association rule. In association rule mining, rules with a confidence level exceeding 0.75 are commonly considered significant; this study therefore adopts a stricter minimum confidence threshold of 0.8. To improve the comprehensiveness and reliability of the mining results, the Apriori algorithm is run iteratively with different settings to determine an appropriate value for the minimum support threshold.
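The level-wise search and the rule derivation step can be sketched in plain Python as follows. This is a compact illustration of the general technique rather than the implementation used in the study, and the four toy transactions and the thresholds are made up for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 1
    while current:
        # Join frequent k-itemsets into (k + 1)-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune candidates that contain an infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting the threshold."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent
                confidence = s / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical transactions: the technology terms found in four patents.
toy = [
    {"BIM", "architectural design", "quantities"},
    {"BIM", "architectural design"},
    {"BIM", "wall panel"},
    {"wall panel", "insulation"},
]
frequent = apriori(toy, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.8)
```

With these toy settings, the only rule that survives is {architectural design} → {BIM}, because every transaction containing "architectural design" also contains "BIM".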
3.3.2. Complex Network Analysis
A network consists of numerous entities, represented as nodes, and the interactions or relationships among them, represented as connections or edges. Any network can be represented and visualized as a graph. Common static metrics used to describe networks include node degree, degree distribution, and clustering coefficient, among others.
The “degree” of a node is a fundamental and critical metric, indicating the number of direct connections a node has with other nodes. A higher-degree value typically implies greater importance or centrality within the network. The degree of each node can be computed using an adjacency matrix, as detailed in the following calculation method:
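In the usual adjacency-matrix notation, which is assumed here, a_{ij} = 1 if nodes i and j are connected and 0 otherwise, and n is the number of nodes. The degree of node i in an undirected network is then:

```latex
\[
k_i \;=\; \sum_{j=1}^{n} a_{ij}
\]
```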
For directed networks, the concept of degree is further categorized based on edge direction, comprising in-degree and out-degree. The in-degree of a node represents the number of edges directed towards it, with the calculation formula as follows:
The out-degree of a node denotes the number of edges pointing from the node to others, with the calculation formula as follows:
The total degree of a node in a directed network is the sum of its in-degree and out-degree, calculated as follows:
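The three directed quantities described above can be written together as below. The convention that a_{ij} = 1 when an edge points from node i to node j is an assumption of this sketch; under the opposite convention the two sums swap roles.

```latex
\[
k_i^{\mathrm{in}} \;=\; \sum_{j=1}^{n} a_{ji}, \qquad
k_i^{\mathrm{out}} \;=\; \sum_{j=1}^{n} a_{ij}, \qquad
k_i \;=\; k_i^{\mathrm{in}} + k_i^{\mathrm{out}}
\]
```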
Using Python, the publication number (PN) and cited patent (CP) fields are extracted from the original patent data, the data are cleaned, and one-to-one relationship tables are constructed, ensuring that each patent has a unique identifier. The processed data are subsequently imported into Gephi for visualization, enabling the construction of a citation network.
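A minimal sketch of this step is given below, assuming each record pairs one PN with a list of CP values. The patent identifiers are placeholders and the function names are hypothetical. The one-to-one edge list it produces is the kind of table that can be exported for a graph tool, and the degree counts follow the definitions given above.

```python
from collections import Counter


def citation_edges(records):
    """Expand (PN, [CP, ...]) records into unique one-to-one (citing, cited) pairs."""
    edges = set()
    for pn, cited_list in records:
        for cp in cited_list:
            if cp and cp != pn:  # drop empty entries and self-citations
                edges.add((pn, cp))
    return sorted(edges)


def degrees(edges):
    """Return {node: (in_degree, out_degree, total_degree)} for a directed edge list."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    nodes = {node for edge in edges for node in edge}
    return {n: (in_degree[n], out_degree[n], in_degree[n] + out_degree[n]) for n in nodes}


# Placeholder records: P4 lists P3 twice and has one empty CP entry.
records = [("P1", ["P2", "P3"]), ("P2", ["P3"]), ("P3", []), ("P4", ["P3", "P3", ""])]
edges = citation_edges(records)
node_degrees = degrees(edges)
```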
3.4. Knowledge Graph Construction
Knowledge graph construction involves two primary components: the schema layer and the data layer. The schema layer defines the foundational architecture of the knowledge graph, detailing concepts and their relationships, while the data layer stores specific entities and factual data.
Construction methods can be categorized as either top–down or bottom–up. The top–down approach begins with defining and extracting conceptual relationships within the schema layer, subsequently storing these entities in the data layer. Conversely, the bottom–up method collects data at the data layer first and then abstracts concepts to construct the schema layer. This study employs the top–down approach. The detailed process is depicted in
Figure 4.
Figure 4 illustrates the process for constructing a knowledge graph for PC technological innovation, divided into three main stages. The first stage encompasses the processing and analysis of multi-source heterogeneous data, including the collection, preprocessing, and feature analysis of patent data. The second stage involves constructing the schema layer of the knowledge graph through a three-step process: developing a classification system for technological innovation, defining relational structures for innovation, and establishing attribute constraints. The third stage focuses on building the data layer of the knowledge graph, encompassing the identification of innovation entities, application of association rules, and mapping of technological innovation evolution paths.
The final phase involves the construction and validation of the knowledge graph, forming triples, storing them in a Neo4j database, and enabling a question-and-answer functionality within the knowledge graph.
4. Results and Discussion
4.1. Topic-Clustering Results
The findings from the topic analysis conducted in Section 3.2 on text entity recognition are summarized as follows:
As shown in Table 2 and Table 3, the first column lists the topic number, while the second column indicates the computed strength of each topic. For instance, “ID 0” denotes a prominent topic with a strength of 0.1969, focusing on insulation structures and wall technologies. The subsequent topics include concrete structures and rebar connection technology, prefabricated components and assembly methods, device adjustment and structural design, as well as threaded connections and component design. The fourth column displays the keywords associated with each topic, ranked by frequency, with 25 keywords selected for each topic.
This study employs Sankey diagrams to analyze the evolution paths of topics, as illustrated in
Figure 5. In this diagram, differently colored blocks represent distinct clusters, and these blocks from consecutive time periods are connected by flow bands, depicting the transition of clusters over time. The height of the bands reflects the number of entities within a particular technological knowledge topic. By observing the division and convergence of the bands, one can discern the evolutionary relationships among patent technology knowledge topics.
For instance, if a block in one time frame splits into two or more blocks in the following period, it suggests that the knowledge topic is branching into new subtopics. Conversely, if two or more blocks merge into one in the subsequent time frame, this may indicate topic consolidation or the formation of a new topic. The Sankey diagram facilitates an intuitive visualization and tracking of the evolutionary trajectories of patent technological innovation knowledge.
As illustrated in Figure 5, the horizontal axis of the Sankey diagram represents time windows, while the vertical axis denotes the corresponding hotspot topics. Topics are arranged from top to bottom (with colors transitioning from red to purple) to indicate their prominence within each time window, where higher positions reflect greater topic activity. The connecting lines between topics represent their evolutionary trajectories.
From the perspective of topic activity, research hotspots in PC technology have shifted from specific nodes and connection techniques to encompassing overall structural design, prefabricated components, and modular design. Throughout this period, insulation technology and energy-efficient building practices have consistently held significant positions. This shift highlights an increasing emphasis on structural safety, construction efficiency, and sustainability as PC technology matures. Such a comprehensive development trajectory supports the holistic advancement and broader adoption of PC techniques.
Research on “mechanical devices and adjustment” has evolved towards assembly precision and structural adaptability, with improvements in assembly efficiency and accuracy emerging as critical concerns. Concurrently, “reinforced concrete structures and construction” remains a key research area, underscoring the continuous need for improvements in traditional structural systems. The emergence of “prefabricated wall panel technology” as a new research focus indicates the growing use of prefabricated wall panels in construction. Wall insulation technology continues to garner attention, reinforcing the importance of energy-efficient building practices as a persistent research direction.
An analysis of topic separation and convergence, combined with a review of the keywords for different topics, indicates that as PC technology evolves, the demand for more specialized subfields becomes increasingly apparent. For instance, the topic “assembly nodes and structural design” progressively diverged into more specialized research areas over the following years, such as “bolt connections” and “structural constraints and connections”. This divergence reflects a focused examination of node connection technology aimed at enhancing assembly precision and structural safety.
As the complexity of innovation in PC technology grows, research has become more targeted toward specific challenges. For example, the initial topic of “prefabricated components and assembly” eventually diverged into distinct areas, such as “prefabricated steel structural components” and “concrete structures and connections”. This specialization meets the developmental needs of various materials and component types, driving targeted technological advancements. Conversely, topics like “assembly of prefabricated components and concrete structures” and “reinforced concrete structures and construction” have demonstrated a trend in convergence, underscoring the need for an integrated approach to combining prefabricated elements with conventional concrete structures, thus improving construction efficiency and overall structural performance.
The integration of emerging technologies has further driven the convergence of different research topics. For instance, the widespread adoption of Building Information Modeling (BIM) has allowed for a more cohesive integration of design, construction, and management phases in PC. BIM technology provides a unified platform that effectively integrates topics such as “component structure design” and “assembly nodes and structural design”, enhancing overall coordination and construction efficiency.
4.2. Association Rule Results
According to the experimental results, setting the minimum support to 0.0007 yielded 308 frequent itemsets. Multiple association rules were generated for each frequent itemset, and only the rule with the highest lift was retained, because a higher lift signifies a stronger association among the items. For instance, for the three-itemset {‘BIM’, ‘architectural design’, ‘quantities’}, the lift values of the generated rules were 117, 468, and 117, respectively, and only the rule with a lift of 468 was preserved.
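Lift is used here in its standard sense, which, in the notation of Section 3.3.1, can be written as:

```latex
\[
\mathrm{Lift}(A \rightarrow B) \;=\; \frac{\mathrm{Confidence}(A \rightarrow B)}{\mathrm{Support}(B)}
\;=\; \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A)\,\mathrm{Support}(B)}
\]
```

A lift above 1 means that A and B co-occur more often than they would if they were independent. With very small support values, as in this dataset, lift values in the hundreds are possible.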
Table 4 provides examples of selected two-dimensional association rules.
Based on the identified topics and association rules, this study developed a taxonomy to classify association rules related to technological innovation in PC. This taxonomy organizes various technical associations within a clear conceptual framework. The machine learning library Scikit-learn, integrated with Python 3.12.1, was utilized to apply the LDA model for the analysis of each association rule, producing a set of topics as output. Each topic is represented by a list of significant terms.
Table 4 presents sample outputs of the classification process.
The clustering results of the topic model encompass multiple keywords, including wall, building, insulation, staircase, structure, and exterior wall. A review of these classification outcomes reveals that connections such as curtain walls, insulation panels, and integrated designs are associated with wall structures and insulation. These are thus consolidated into a single topic termed “Wall Structure and Insulation”. Similarly, other topics generated by the LDA model are classified and merged using advanced language models to form the final association rule list for technological innovation in PC. A bottom–up approach is employed to organize the association rules and construct a structured knowledge hierarchy.
Furthermore, Python is used to extract the publication number (PN) and cited patent (CP) fields from the original patent data, remove redundant data, and establish one-to-one relationship tables, ensuring each patent has a unique ID. The processed data are then imported into Gephi to visualize the citation network. The visualization results are shown in Figure 6, with red boxes indicating the key nodes that connect topics.
Figure 6 presents the citation-based visualization map constructed in Gephi. After the data were imported, modularity-based partitioning and out-degree filtering were applied to the original network graph, which was then arranged automatically using the ForceAtlas layout. The different colors of the blocks represent distinct technological topics within the field of PC, and the size of the nodes corresponds to their out-degree (the number of outgoing connections to other nodes). The technological topics within each block were categorized based on the specific content of the patent data, as illustrated in Figure 6.
The figure demonstrates that patents with citation relationships are connected by directed lines. For example, the purple nodes (representing a major category in the visualization) correspond to technologies related to prefabricated wall panels. Overall, the PC technology field shows pronounced clustering characteristics, with the patent citation network exhibiting a significant aggregation center.
4.3. Knowledge Graph Construction Results
In summary, the entity and relationship types of the Neo4j-based knowledge graph developed in this study are presented in Table 5 and Table 6.
Among the entity types, the “Patent” entity type encompasses attributes such as the publication number, publication date, and abstract. The “Topic” entity type includes attributes such as the topic’s name, year, and additional notes, while the “Technology” entity type includes the technology name and detailed explanations.
The “belongs to” relationship type represents a hierarchical connection structured using the LDA topic model and classification system, encompassing three categories: topic–patent, topic–technology, and patent–technology. The “topic evolution” relationship captures the merging and divergence of topics over different years. The “integration” relationship represents the mutual merging and intertwining between technology nodes. The “sequence” relationship illustrates the chronological order of technologies, while the “co-occurrence” relationship highlights technologies that frequently appear together and have strong dependencies. The “cited” and “citing” relationships link patent nodes to showcase the evolution path of patent technologies.
Data are stored in the form of triples, and entities and relationships are imported in batches using Cypher. The import process begins with storing the knowledge content in an Excel file, which is converted to CSV format. This conversion involves saving the Excel file as CSV, opening it in a text editor, and saving it with UTF-8 encoding. The CSV file is then placed in the import directory of the Neo4j Community Edition, and the “LOAD CSV” command is used to import the data into the database.
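The export step can also be scripted instead of being carried out by hand in a spreadsheet and a text editor. The sketch below writes triples to CSV with a header row. The column names nameID1 and nameID2 mirror those used by the import query in this section, whereas the relation column, the function name, and the sample triple are hypothetical.

```python
import csv
import io


def triples_to_csv(triples, handle):
    """Write (head, relation, tail) triples as CSV rows preceded by a header."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["nameID1", "relation", "nameID2"])
    for head, relation, tail in triples:
        writer.writerow([head, relation, tail])


# An in-memory buffer is used for illustration. For a real file, one would open
# the path with open(path, "w", encoding="utf-8", newline="") so that the output
# is UTF-8 encoded before it is placed in the Neo4j import directory.
buffer = io.StringIO()
triples_to_csv([("BIM", "co-occurrence", "architectural design")], buffer)
csv_text = buffer.getvalue()
```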
Figure 7 depicts the hierarchical relationship between topic and patent nodes.
The entities are shown as circles, while the relationships are represented by the arrowed lines. In the example, pink indicates the topic nodes, and blue represents the selected patent nodes linked to the topic. The constructed knowledge graph in this study comprises a total of 10,200 “patent” entity nodes and 6367 “cited” relationships. The following Cypher query was used for data import:
LOAD CSV WITH HEADERS FROM 'file:///patent_patent.csv' AS row
MATCH (a:patent {nameID: row.nameID1}), (b:patent {nameID: row.nameID2})
MERGE (a)-[:RELATED_TO]->(b);
The visualization is restricted to 2000 nodes, and the resulting citation network knowledge graph is illustrated in Figure 8.
The green nodes represent patent data, encompassing attributes such as the patent number, abstract, and publication date. The gray connecting lines indicate citation relationships. The Neo4j-based knowledge graph for PC facilitates the querying of technical information. When a query command for specific entities or relationships is executed, the results are presented to the user.
Figure 9 displays the output related to precast concrete panels following the execution of the query “MATCH (pbl)-[r]-() RETURN pbl, r”.
In this figure, yellow nodes represent the “Technology” entity type, illustrating a relationship network centered on precast concrete panels (PBL). Nodes are linked by relationship lines labeled with their types (e.g., CONNECTED_TO). This diagram highlights the advantages of PBL, such as durability and resource conservation, and illustrates its interrelationships and dependencies with other associated technologies.
The system also supports the integration of new technological innovations in PC. New technologies can be added to the knowledge graph and updated in real time, which enhances its specialization and comprehensiveness in this domain.
4.4. Intelligent Question-Answering Support
The core principle of a knowledge graph-based question answering (QA) system is to apply computational logic to transform user queries into a comprehensible logical framework before searching for answers. This section outlines the framework of a Neo4j-based knowledge graph QA system, designed to facilitate the application of technological innovation in PC. The QA system for PC technological innovation is structured into three main layers: the user layer, the business layer, and the data layer. The detailed architecture is illustrated in
Figure 10.
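To make the role of the business layer concrete, the following Python sketch maps a user question to an intent and a parameterized Cypher string. It is a simplified illustration, not the implementation used in this study: the regular expressions, the intent names, the relationship types `BELONGS_TO`, `CITED`, and `SEQUENCE`, the `topic` and `technology` labels, and the `name` property are all hypothetical. Only the `patent` label and the `nameID` property appear in the import query of this study. The data layer (executing the query against Neo4j) is omitted.

```python
import re

# (intent, question pattern, hypothetical Cypher template with a $entity parameter)
TEMPLATES = [
    ("patents_for_topic",
     re.compile(r"^show patents related to (?P<entity>.+?)(?: in PC)?$", re.I),
     "MATCH (p:patent)-[:BELONGS_TO]->(t:topic {name: $entity}) RETURN p"),
    ("citations_of_patent",
     re.compile(r"^show the citation details for patent (?P<entity>\S+)$", re.I),
     "MATCH (a:patent {nameID: $entity})-[:CITED]->(b:patent) RETURN b"),
    ("evolution_path",
     re.compile(r"^show the evolution path of (?P<entity>.+?)(?: technology)?$", re.I),
     "MATCH (a:technology {name: $entity})-[:SEQUENCE*]->(b:technology) RETURN b"),
]


def to_query(question: str):
    """Business-layer step: return (intent, cypher, params) or None if unmatched."""
    text = question.strip().rstrip(".?")
    for intent, pattern, cypher in TEMPLATES:
        match = pattern.match(text)
        if match:
            return intent, cypher, {"entity": match.group("entity")}
    return None


print(to_query("Show the citation details for patent CN118148275"))
```

Passing the entity as a query parameter instead of concatenating it into the Cypher text keeps user input out of the query string itself.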
To validate the practical effectiveness of the PC technological innovation knowledge graph system, several example questions were used to showcase its intelligent question-answering capabilities.
- (1)
Basic information retrieval from the knowledge graph of PC technological innovation: To query patent information associated with a specific technology topic, a user can input the following: “Show patents related to precast concrete panels in PC”. Results: Patent 1: Number: CN118148275, Abstract: A method for manufacturing innovative precast concrete panels, Publication Date: 15 May 2020; Patent 2: Number: CN221095501, Abstract: Connection techniques for precast concrete panels, Publication Date: 30 November 2019, etc.
To query citation details of a specific patent, a user can input the following: “Show the citation details for patent CN118148275”. Cited patent 1: Number: CN221095501, Title: High-efficiency construction method involving precast concrete panels, Publication Date: 22 July 2021, etc.
To query the evolution path of a specific technology, a user can input the following: “Show the evolution path of precast concrete panel technology”. Results: Precast concrete panels (2018) -> Connection technology for precast concrete panels (2019) -> High-efficiency construction method for precast concrete panels (2020).
Enterprises can leverage the knowledge graph’s query system to gain insights into the development trajectory of “precast concrete panel” technology, thereby optimizing their R&D strategies. Additionally, the system can visualize the evolutionary relationships between various technological stages, assisting companies in recognizing trends and setting strategic R&D priorities. Citation details provide valuable data for assessing the impact and commercial potential of specific patents. Finally, by periodically incorporating new technological information or feedback on the application of certain technologies, as well as enhancement suggestions, enterprises can promote better circulation and reuse of technical knowledge.
- (2)
Intelligent Decision Support and Market Forecasting: The knowledge graph extends beyond basic information retrieval to provide intelligent decision support and market forecasting, aiding companies in making informed decisions regarding technological R&D and market strategy. For instance, when a user inputs, “Predict the development direction of precast concrete panel technology over the next five years based on current trends”, the system responds with an analysis integrating current patent data and technological trends. It forecasts that precast concrete panel technology will likely evolve towards intelligent assembly methods to enhance construction efficiency, the adoption of new eco-friendly materials, modular design of prefabricated components, and the integration of BIM technology. This intelligent decision support allows companies to proactively align with future technological developments, optimize R&D investment, and strengthen market positioning.
- (3)
Feedback Mechanism and Technological Improvement: A feedback mechanism enables companies using a particular technology to submit performance evaluations, while other companies or researchers can adjust strategies or refine technical solutions based on this feedback. For example, a user might submit, “Prefabricated steel structure beam-column connection components demonstrate good stability during construction but require improved connection strength in practical applications; optimizing the connection design is recommended”. When researchers or companies query related patents, they can access this application feedback, thereby better supporting strategic decisions and providing a foundation for technological enhancements. This contributes to increased practicality and market competitiveness of the technology.
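The feedback mechanism in item (3) can be pictured as feedback records keyed by technology and returned together with patent query results. The Python sketch below is a simplified in-memory illustration; the class `FeedbackStore`, the function `answer_with_feedback`, and the record fields are hypothetical, and in the actual system such records would presumably be stored in the graph database.

```python
from collections import defaultdict


class FeedbackStore:
    """In-memory stand-in for feedback attached to technology nodes."""

    def __init__(self):
        self._by_technology = defaultdict(list)

    def submit(self, technology, author, comment):
        # Normalize the key so lookups are case-insensitive.
        key = technology.strip().lower()
        self._by_technology[key].append({"author": author, "comment": comment})

    def lookup(self, technology):
        key = technology.strip().lower()
        # .get avoids creating empty entries for technologies without feedback.
        return list(self._by_technology.get(key, []))


def answer_with_feedback(technology, patents, store):
    """Bundle patent hits with any feedback recorded for the same technology."""
    return {
        "technology": technology,
        "patents": patents,
        "feedback": store.lookup(technology),
    }


store = FeedbackStore()
store.submit(
    "Beam-column connection",
    "user_a",  # made-up submitter
    "Good stability during construction; connection strength should be improved.",
)
print(answer_with_feedback("beam-column connection", ["<patent id>"], store))
```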
4.5. Implications
The theoretical and practical significance of this study is outlined as follows:
- (1)
Theoretical Significance:
Advancing Knowledge Management Theory: This study employs the construction of a knowledge graph to systematically and structurally store patent data related to PC technology, facilitating the efficient integration and reuse of knowledge. It provides a practical example of applying knowledge integration and sharing within knowledge management theory, demonstrating how knowledge graphs enhance knowledge management and utilization. By uncovering hidden relationships and new insights within patent data, the knowledge graph promotes knowledge discovery and innovation. This research extends the knowledge discovery domain within knowledge management theory by introducing innovative tools and methodologies for extracting knowledge concealed in large-scale datasets. Furthermore, the dynamic updating and maintenance capabilities of knowledge graphs ensure that knowledge bases evolve and remain accurate with the addition of new information. This study thus offers a new perspective on knowledge lifecycle management, illustrating how to sustain an updated and reliable knowledge base in rapidly changing environments.
Advancement of Complex Network Theory: By constructing a knowledge graph that encapsulates entities and relationships, this study transforms patent data into a complex network, presenting a novel application for complex network theory. Analyzing the network structure and its dynamic properties within the knowledge graph reveals interrelationships and technological evolution patterns, contributing valuable data and case studies to network analysis and evolutionary studies in complex network theory. As knowledge graphs are often heterogeneous networks with diverse node and edge types, this research supports the development of complex network theory, particularly in the context of multilayer and heterogeneous network analysis.
- (2)
Practical Significance:
Advancing Innovation in PC Technology: Integrating patent data from diverse sources into a unified knowledge graph enhances the systematic and comprehensive understanding of knowledge within the field, enabling researchers to gain a holistic view of current technological advancements and trends, and preventing redundant research efforts. This approach uncovers latent connections and mutual influences among various technologies, facilitating the discovery of implicit knowledge that can inspire new research directions and innovative solutions. The use of visual tools to display technological linkages and evolution paths enhances the understanding of complex interrelationships between technologies and provides insights for discovering new application scenarios and innovation opportunities.
Guidance for Technical Research, Business Investment, and Policy Making: The knowledge graph enables researchers to easily access a comprehensive overview of relevant technological fields, identify emerging trends and cutting-edge technologies, and select more promising research directions, thereby improving research efficiency and outcomes. It highlights critical technological nodes and development trajectories in PC technology, offering a scientific basis for business investment decisions, allowing for the identification of high-potential technology areas and projects, and optimizing investment strategies to enhance returns. Policymakers and government bodies can better understand the status of innovation in PC technology, facilitating the formulation of scientifically grounded industry policies, technical standards, and strategic development plans that promote sustainable growth in the construction sector.
While the data sources for this study are drawn from the Chinese patent database, its methodologies and findings possess significant international relevance. The existing literature and statistics show that the research on PC in China and other countries focuses on the same themes [
43], such as seismic risk assessment and innovative design strategies for PC structures, structural strength and safety risk management, and the performance of PC components. In addition, although prefabricated construction in China is still in its infancy, it has already reached a relatively large and representative scale and can therefore provide development experience for countries or regions at the same stage. By demonstrating the integration of patent data into a knowledge graph, this study exemplifies how regional technological innovation data can be effectively mined and leveraged. The proposed technological solutions, methods, and framework can be adapted by other countries or regions, enabling them to address similar technical challenges and fostering global technological integration and innovation within the field of PC.
5. Conclusions
This study combined knowledge graph and text mining techniques to analyze and integrate PC innovation knowledge based on patent data from the past five years. Specifically, the Latent Dirichlet Allocation (LDA) model was used to identify key research topics and their evolution. The Apriori algorithm was adopted to uncover relationships and dependencies among technologies, supporting association rule mining and the construction of the reference network. The Neo4j-based knowledge graph was then constructed, integrating entity–relationship extraction and data visualization to enable interactive queries and decision support. The main findings of this study are as follows:
The topic analysis of PC technological innovation revealed that, over the past five years, the primary areas of focus included insulation structures and wall technologies, concrete structures and rebar connection technologies, prefabricated components and assembly methods, device adjustment and structural design, and threaded connection and component design. The evolution of these technological themes over different time periods demonstrated a shift from isolated technologies toward integrated solutions, smart technologies, and sustainable development.
The analysis of association rules identified eight primary relationship types: belonging, topic evolution, integration, chronological order, co-occurrence, citation, being cited, and technological evolution. The findings highlighted that the integration of BIM and IoT in PC yielded significant benefits in terms of coordination and efficiency. The combination of novel environmentally friendly materials with energy management systems facilitated the advancement of energy-efficient building practices. Additionally, patents concerning prefabricated component technologies that enhance connection stability were frequently cited in subsequent research focused on improving assembly precision and seismic structure performance, positioning them as pivotal elements in the innovation chain.
The construction and application of the knowledge graph facilitated not only the systematic storage and visualization of knowledge related to PC technological innovation but also the development of a question-answering system based on the graph. This system provides a valuable resource for researchers, corporate investors, and policymakers, offering a detailed overview of the current state, key nodes, and trends in PC technology. The application of this knowledge graph supports the efficient extraction and utilization of implicit knowledge contained within patent data, thus fostering innovation and practical applications in the field.
In conclusion, this study enabled the integration and reuse of technological innovation knowledge through the construction of a knowledge graph for PC technology. The knowledge graph serves several groups of stakeholders. Researchers can use it to quickly identify technological hotspots, blank areas, and potential directions of innovation, thus avoiding repetitive research and optimizing research strategies. Enterprises can use it to target R&D investment and improve innovation efficiency and, based on the prediction of technological trends, formulate market strategies in advance to enhance their competitive advantages. From the perspective of government management, the knowledge graph can reveal the key nodes and bottlenecks in technological development, providing a basis for the formulation of industrial policies. It supports the optimized allocation of policy resources and drives the sustainable development of innovation in prefabricated construction technologies. It is important to note that certain limitations exist, such as the LDA topic model’s limited ability to capture contextual semantics in text processing and the potential for the Apriori algorithm to overlook valuable rules due to parameter constraints. Future research could investigate more advanced topic-modeling techniques and association-rule-mining algorithms to enhance the accuracy and depth of analysis.
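The parameter-related limitation can be illustrated with a toy example. The Python sketch below computes support and confidence for item pairs only; it is not a full Apriori implementation (candidate generation beyond pairs is omitted), and the transactions, thresholds, and the function name `pair_rules` are made up for illustration rather than taken from this study.

```python
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_count, pair_count = {}, {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue  # pruned before confidence is ever examined
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Made-up keyword sets, one per hypothetical patent.
toy = [
    {"BIM", "IoT"}, {"BIM", "IoT"}, {"BIM", "IoT"}, {"BIM", "IoT"},
    {"BIM"}, {"BIM", "connection"}, {"connection"}, {"connection"}, {"IoT"},
    {"recycled concrete", "insulation"},
]

strict = pair_rules(toy, min_support=0.3, min_confidence=0.6)
loose = pair_rules(toy, min_support=0.1, min_confidence=0.6)
print(len(strict), "rules at min_support=0.3;", len(loose), "rules at min_support=0.1")
```

In this toy data, the pair “insulation” and “recycled concrete” always occurs together (confidence 1.0) but appears in only one of ten transactions, so it is pruned at the stricter support threshold and recovered only at the lower one. Rare but consistent technology combinations can be lost in the same way.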