1. Introduction
Electric vehicles (EVs) have emerged as a pivotal solution for decarbonizing transportation, which is responsible for roughly one-quarter of global CO₂ emissions [1,2]. EVs, especially when powered by renewable energy, offer a promising path to sustainability. However, their adoption is challenged by high costs, limited range, and insufficient charging infrastructure, which demand innovative technological advancements [3]. Governments worldwide are enacting policies to accelerate the shift from combustion engines to EVs. For example, the European Union has agreed on legislation that bans the sale of new internal combustion engine vehicles running on fossil fuels from 2035 [4]. The International Energy Agency’s “Global EV Outlook 2024” forecasts that by 2035, over 25% of vehicles could be electric under current policies, with China leading growth [5]. This transition demands technological leadership, prompting companies such as Sony and Honda to collaborate in ventures like Sony Honda Mobility.
EV technology draws from multiple domains (automotive, electronics, materials, etc.), making technology convergence a key driver of innovation. In fact, EVs are a convergence product at the intersection of traditional automobile engineering and advanced electronics. Identifying where these diverse technologies could intersect next is crucial for guiding future EV development. Patent data provide a rich basis for such analysis. For example, patent classification co-occurrence networks have been widely used to reveal interrelationships between technological domains [6]. By examining which technology classes appear together in patent documents, researchers can detect nascent convergence patterns that signal new development opportunities. This process, known as strategic technology intelligence, enables firms to identify technological opportunities and potential threats that could affect their future growth [7]. It encompasses several key activities, such as monitoring technological advancements or forecasting future technological developments, which are critical for firms to stay competitive in this dynamic market.
Technology convergence, which involves the merging of two or more distinct technological domains into new interdisciplinary fields, provides the impetus for companies to explore new markets beyond their familiar technology areas [8] and is key to overcoming EV challenges. For instance, integrating automotive engineering with electronics and materials science can enhance battery efficiency and vehicle safety. Decision-making based on convergence dynamics and the forecasting of promising technologies during the R&D planning and demonstration phase is an important factor in creating a sustainable innovation ecosystem [9]. Patent information is commonly used as an indicator to analyze technology convergence [10]. The patent classification system categorizes patent documents by subject matter, providing a structured approach to understanding technological domains. By examining the co-occurrence relationships among these classifications, researchers can identify potential technology convergence patterns [11].
Previous studies in the EV field have primarily focused on trend analysis [12,13] or on tracing technology transfer networks using patent data [14]. Although these studies effectively captured the overall direction of technological evolution, such retrospective approaches rarely predict future developments, which limits their ability to guide proactive R&D. Moreover, there is a lack of methodological research that applies machine learning techniques to identify new technology development opportunities from a convergence perspective. Some efforts, such as Feng et al. (2020) [15], have used resource allocation for convergence prediction but were constrained by narrow timeframes and reliance on a single indicator. Another study used graph embedding methods to analyze R&D collaborations [16], but the application of such methods to predicting EV technology convergence remains limited. Wang and Li (2024) [17] combined link prediction with node2vec graph embedding for technology forecasting, but they applied it to a co-word network, overlooking the convergence perspective. This study addresses these gaps by introducing a dual-level prediction framework that uses node2vec graph embedding and machine learning to forecast technology convergence opportunities in the EV sector.
Consequently, this study aims to achieve the following objectives:
(1) Characterize the current EV technology landscape using patent data by identifying and summarizing the trends in patent applications, prominent technological domains (CPC classifications), and key patent assignees;
(2) Develop a predictive analytical framework (a dual-level prediction approach) for uncovering potential technology convergence opportunities within the EV sector;
(3) Empirically identify new and promising technology convergence opportunities in the EV patent landscape using the proposed dual-level predictive approach.
To achieve these objectives, this study constructed a patent co-classification network using Cooperative Patent Classification (CPC) codes and applied the node2vec algorithm to generate graph embeddings. These embeddings are used in a dual-level prediction framework that combines similarity-based scoring and machine learning classification to systematically explore latent connections across disparate technological domains. The framework is called dual-level because it is designed to enhance reliability by using node2vec embeddings in two complementary ways: (a) similarity-based link prediction and (b) link prediction framed as a binary classification problem. Finally, promising new links are suggested by applying the analysis framework.
By explicitly targeting EV technologies, our framework generates industry-specific predictive insights. The integration of unsupervised embeddings and supervised machine learning helps to filter out less relevant connections, while the machine learning-based classification provides a robust validation of these predictions. The contribution of this study lies in offering a robust methodology and in providing strategic foresight for optimizing R&D investments and policy decisions in the EV market. Hence, we expect that the proposed dual-level approach will help R&D managers to streamline technology intelligence processes by providing actionable insights and advanced tools for making more informed decisions regarding which technologies to develop or invest in. The findings can be used to prioritize new areas of technology convergence, ultimately stimulating the progress of EV technology.
The remainder of this paper is organized as follows. Section 2 describes the data collection, the analytical framework, and the applied methods in detail. In Section 3, we present the empirical findings, highlighting predicted areas of technology convergence. Section 4 discusses the main findings and their implications for the future development of EVs. In Section 5, we present the conclusions and an outlook for future research.
2. Data and Methods
2.1. Data
The analyzed patent data were extracted from the United States Patent and Trademark Office (USPTO) via WINTELIPS, a comprehensive commercial patent search and analysis platform. Various studies have used WINTELIPS for patent analysis across different fields [18,19]. Based on a review of prior research, we combined keyword searches with patent classification codes to obtain relevant patent data [13,20,21]. Both positive and negative keywords were defined: positive keywords had to appear in patent titles, abstracts, or claims, while negative keywords were used for exclusion. For example, terms such as “bike” or “bicycle” were used to filter out unrelated patents. To focus specifically on EV-related patents, we additionally restricted the search to selected International Patent Classification (IPC) codes.
Ultimately, the following search query was used for data retrieval: (((“electric” ADJ2 (car OR vehicle OR automobile OR mobility)) NOT (bike OR bicycle OR motorcycle OR “fuel cell” OR hybrid)).TI. OR ((“electric” ADJ2 (car OR vehicle OR automobile OR mobility)) NOT (bike OR bicycle OR motorcycle OR “fuel cell” OR hybrid)).AB. OR ((“electric” ADJ2 (car OR vehicle OR automobile OR mobility)) NOT (bike OR bicycle OR motorcycle OR hybrid)).CLA.) AND (B60L-003* OR B60L-011* OR B60L-013* OR B60L-015* OR B60K-001* OR B60L-050* OR B60W-010/08 OR B60W-010/24 OR B60W-010/26).IPC. AND (@AD >=20100101 <= 20241231). (Note: TI = Title, AB = Abstract, CLA = Claims, IPC = International Patent Classification, AD = Application Date).
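The Boolean logic of this query can also be reproduced locally, for example to re-check exported records. The following Python sketch is only an approximation and rests on several assumptions:

- The record fields (`title`, `abstract`, `claims`, `ipc`, `app_date`) are hypothetical names for an export.
- ADJ2 is read as "electric" followed by the target word with at most one intervening word.
- Plural forms are accepted.
- Truncated IPC codes such as B60L-003* are treated as string prefixes in the same code format as the query.

```python
import re

# Assumed reading of ADJ2: "electric" followed by the target word with at
# most one intervening word. Accepting plurals is also an assumption.
POSITIVE = re.compile(
    r"\belectric\W+(?:\w+\W+)?(?:cars?|vehicles?|automobiles?|mobility)\b",
    re.IGNORECASE,
)
# Exclusion terms for title and abstract ("fuel cell" included).
NEG_TI_AB = re.compile(
    r"\b(?:bikes?|bicycles?|motorcycles?|hybrids?)\b|\bfuel\s+cells?\b",
    re.IGNORECASE,
)
# Exclusion terms for claims (the query omits "fuel cell" here).
NEG_CLA = re.compile(r"\b(?:bikes?|bicycles?|motorcycles?|hybrids?)\b", re.IGNORECASE)

# Truncated IPC codes (ending in *) become prefixes; the others must match exactly.
IPC_PREFIXES = ("B60L-003", "B60L-011", "B60L-013", "B60L-015", "B60K-001", "B60L-050")
IPC_EXACT = {"B60W-010/08", "B60W-010/24", "B60W-010/26"}


def field_matches(text, negative):
    """Evaluate (positive NOT negative) on a single text field."""
    return bool(POSITIVE.search(text)) and not negative.search(text)


def record_matches(record):
    """Apply keyword, IPC and application-date filters to one hypothetical record."""
    keyword_hit = (
        field_matches(record["title"], NEG_TI_AB)
        or field_matches(record["abstract"], NEG_TI_AB)
        or field_matches(record["claims"], NEG_CLA)
    )
    ipc_hit = any(
        code.startswith(IPC_PREFIXES) or code in IPC_EXACT for code in record["ipc"]
    )
    date_hit = 20100101 <= record["app_date"] <= 20241231
    return keyword_hit and ipc_hit and date_hit
```

As in the query, "fuel cell" excludes a document only when it appears in the title or abstract, not in the claims.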
The search period was limited to applications filed between 2010 and 2024 to focus on the most recent developments. Data retrieval took place in January 2025 and identified 7343 unique patent documents, including both granted patents and published patent applications. These documents were exported for further processing.
2.2. Analysis Framework
In this study, we followed the six-step analysis outlined in Figure 1 to identify new technology convergence opportunities using link prediction. In the first step, patent data are collected and preprocessed for subsequent analysis. Preprocessing involves handling missing values and standardizing data formats. The second step produces descriptive statistics of the analyzed patent data, including the distribution of patent applications, the key technological domains involved, and the key patent assignees. The third step constructs patent co-classification networks from CPC codes, which serve as input for the node2vec algorithm. CPC extends the IPC scheme with considerably more subdivisions, which allows a finer-grained representation of technological domains [22]. The fourth step transforms each network into graph embeddings via node2vec. This generates low-dimensional vector representations of each node that capture the relationships and structural features within the network.

The fifth step consists of two interlinked subprocesses. The first, similarity-based link prediction, identifies potential new connections by selecting node pairs that have high similarity scores but are not linked in the original network. The second treats link prediction as a binary classification problem and uses a machine learning algorithm to predict whether a link will form between two unconnected nodes. In the sixth step, the prediction results from both subprocesses are aggregated. Potential technology convergence opportunities are identified from their intersection, so only links predicted by both subprocesses are retained.

In summary, the framework identifies technology convergence opportunities by integrating similarity-based and classification-based link prediction. This dual-level approach enhances prediction reliability by leveraging the strengths of both methods. Data analysis was conducted using Python 3.10.16. Next, we describe the theoretical aspects of the final five steps.
2.3. Descriptive Statistics
Descriptive statistics deal with summarizing the basic features of the patent dataset, including the distribution of patent applications over time, key technological domains involved, and key patent assignees. These statistics provide essential context for understanding the scope and relevance of our analysis. They lay the groundwork for the subsequent network construction.
2.4. Patent Co-Classification Network
This study adopted the concept of knowledge flow to construct the co-classification networks. In patent analysis, knowledge flows refer to the transfer or exchange of technological knowledge between distinct entities, such as technological domains. They are inferred from patterns in how patent documents are classified under multiple classification codes [23]. Co-classification networks were therefore constructed from CPC code co-occurrences, which reflect knowledge flows across technological domains. Analyzing these relationships provides important clues about how knowledge disseminates across technological fields. Previous research has indicated that merging different technological disciplines can be seen as recombinant innovation, which initiates technological transitions [24]. Moreover, technologically novel patents tend to include combinations of IPC codes that were not previously linked, fostering the development of new technological pathways [25]. Blending existing knowledge in novel ways can therefore yield higher levels of performance, because the current technological knowledge base shapes how technologies combine and diffuse. CPC codes have a hierarchical structure that allows detailed categorization of technologies. We used group-level CPC codes to construct the co-classification network. In this network, nodes denote these codes and links indicate the strength of their interactions. The strength of an interaction is measured as the frequency with which the two CPC codes co-occur within the same patent documents.
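The following is a minimal sketch of the co-occurrence counting described above. It assumes that each patent is given as a list of full CPC codes in a hypothetical "B60L-0053/14" format, where the group level is everything before the slash. Codes are deduplicated within each patent, so the weight of an edge equals the number of patents in which both group-level codes appear.

```python
from collections import Counter
from itertools import combinations


def group_level(code):
    """Truncate a CPC code to group level (assumed format 'B60L-0053/14')."""
    return code.split("/")[0]


def build_cooccurrence_network(patents):
    """Return a Counter mapping (code_a, code_b) -> co-occurrence weight.

    `patents` is an iterable of CPC code lists, one list per patent document.
    """
    edges = Counter()
    for codes in patents:
        # Deduplicate so that one patent contributes at most 1 to each pair.
        groups = sorted({group_level(c) for c in codes})
        # Sorting gives each unordered pair a canonical (a, b) order.
        for a, b in combinations(groups, 2):
            edges[(a, b)] += 1
    return edges
```

Calling `most_common(10)` on the returned `Counter` ranks the most frequently co-occurring pairs. The counter can also be passed to a graph library to build the weighted network.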
2.5. Node2vec
Node2vec is a popular algorithm for generating low-dimensional, continuous vector representations (embeddings) of the nodes in a network graph. First introduced in 2016 [26], node2vec extends the idea of the word2vec model from natural language processing (NLP) to graph-structured data. The algorithm maps each node v to a d-dimensional vector f(v) such that similar nodes are embedded close together in the vector space. Treating graph nodes as “words” and their neighborhoods as “contexts”, node2vec learns meaningful embeddings through a process analogous to training word embeddings in NLP. Because it captures structural similarities in a data-driven manner, node2vec is highly adaptable to various network types. It can also preserve both local and global structural properties of the network.
Node2vec generates node sequences through biased random walks controlled by two hyperparameters, p and q. The return parameter p controls the likelihood of immediately revisiting the previous node in the walk. The in-out parameter q controls whether the walk stays close to the previously visited node or moves outward toward more distant nodes. Specifically, a higher p value reduces the tendency to return to the previous node, promoting broader exploration. A higher q value keeps the walk within the local neighborhood, preserving local structural characteristics. This flexibility allows node2vec to generate diverse node sequences that reflect different aspects of the graph’s structure. By tuning p and q, the random walks can favor local structure, global structure, or a combination of both. Once the node sequences are generated, node2vec employs the Skip-Gram model to learn the embeddings. The Skip-Gram model is a neural network architecture originally designed to predict the context of a given word. In node2vec it is adapted to graph structures, where it maximizes the probability of observing a node’s neighbors in the walks given the node itself. In this study, we used the resulting embeddings for the link prediction task. This task estimates the likelihood of connections forming between previously unconnected nodes, based on both their local interactions and their global roles.
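To make the roles of p and q concrete, the sketch below implements the second-order transition rule from the original node2vec paper for an unweighted graph. This is a simplification: the co-classification network is weighted, and the full algorithm multiplies each bias by the edge weight. From the current node v, reached from the previous node t, a candidate neighbor x receives an unnormalized weight of:

- 1/p if x is t;
- 1 if x is also a neighbor of t;
- 1/q otherwise.

The adjacency dictionary and the function names are illustrative.

```python
import random


def node2vec_walk(adj, start, length, p, q, rng):
    """One second-order biased random walk on an unweighted graph.

    adj: dict mapping each node to the set of its neighbours.
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted for reproducibility with a seeded rng
        if not nbrs:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # The first step has no previous node, so sample uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif x in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move outward (distance 2 from prev)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, shuffling the start order each round."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for n in nodes:
            walks.append(node2vec_walk(adj, n, length, p, q, rng))
    return walks
```

The resulting walks, with nodes converted to strings, would then be passed to a Skip-Gram model. One option is gensim's `Word2Vec` with `sg=1` and `vector_size=100`; that step is not shown here. With p = q = 1 every weight equals 1, so the walk reduces to a uniform random walk.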
2.6. Link Prediction
Link prediction, the task of identifying potential connections between previously unconnected nodes in a network, has become an increasingly important research area in disciplines ranging from social networks to biological systems [27,28]. In patent analysis, link prediction has been employed for various purposes, including but not limited to predicting technological convergence patterns [29], understanding new word combinations [30], and exploring partner selection [31]. Its application in patent analysis has provided valuable insights into emerging technological trends and potential collaboration opportunities, enabling researchers and R&D specialists to make informed decisions.
This study proposes a dual-level link prediction framework, which combines the prediction results of two methodological approaches to obtain more reliable predictions. The first approach uses the node embeddings generated by node2vec directly to compute cosine similarity scores between all pairs of unconnected nodes, as an indicator of their potential for a new link. Node pairs with high similarity scores but no existing edge in the original network are identified as potential new connections. A similarity threshold can be set to filter out less relevant predictions. To uncover promising areas for future convergence, only CPC pairs whose first three characters differ (i.e., codes belonging to different CPC classes) are considered.
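The following is a minimal sketch of the similarity-based step. It assumes that embeddings are held in a plain dictionary keyed by group-level CPC code and that existing edges are stored as unordered pairs. For every unconnected pair of codes from different classes, it computes the cosine similarity cos(u, v) = u·v / (‖u‖‖v‖). It keeps pairs whose similarity exceeds the threshold; the default of 0.75 mirrors the value reported in Section 3.4.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_candidates(emb, edges, threshold=0.75):
    """Return unconnected cross-class pairs with similarity above the threshold.

    emb: dict mapping CPC code -> embedding vector.
    edges: set of frozenset({code_a, code_b}) for existing links.
    Pairs sharing the first three characters (same CPC class) are skipped.
    """
    out = []
    for a, b in combinations(sorted(emb), 2):
        if frozenset((a, b)) in edges or a[:3] == b[:3]:
            continue
        s = cosine(emb[a], emb[b])
        if s > threshold:
            out.append((a, b, s))
    # Highest-scoring candidates first.
    return sorted(out, key=lambda t: -t[2])
```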
The second approach treats link prediction as a binary classification problem. Positive links (class ‘1’) are defined as existing edges in the co-classification network: CPC code pairs that co-occur in at least one patent document, indicating a known technological relationship. Negative links (class ‘0’) are pairs of CPC codes with no co-occurrence in the dataset, i.e., unconnected nodes. The input features are edge embeddings derived from the node2vec node embeddings. Specifically, for each pair of CPC codes, the embeddings of the two nodes are combined using the Hadamard product to produce a 100-dimensional edge embedding [32]. The Hadamard product creates a single vector from two node embeddings via element-wise multiplication. This edge embedding serves as the sole input feature vector for training the machine learning models and encodes the structural and relational properties of the patent co-classification network. In total, four machine learning algorithms were trained on the derived edge embeddings to identify and predict potential links within the network. They were then compared, and we selected the most suitable model based on performance metrics such as the F1-score. The evaluation metrics are calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
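The edge features and the evaluation step can be sketched as follows. This is a minimal illustration that assumes the standard definitions of accuracy, precision, recall and F1 in terms of TP, TN, FP and FN. The four classifiers compared in this study are not reproduced, and the helper names are hypothetical.

```python
def hadamard(u, v):
    """Edge embedding: element-wise (Hadamard) product of two node embeddings."""
    if len(u) != len(v):
        raise ValueError("node embeddings must have the same dimension")
    return [a * b for a, b in zip(u, v)]


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = link, 0 = no link)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1; a zero denominator yields 0.0."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With 100-dimensional node embeddings, `hadamard` returns a 100-dimensional feature vector for each candidate pair, which is then fed to a classifier.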
To evaluate the performance of our approach, we split the patent data into training and test sets. The training set was used to learn the node embeddings and train the models, while the test set was used to assess link prediction performance. Our training and evaluation methodology comprises (1) a time-based split, (2) progressive validation and model updating, and (3) simulation.
First, we trained the models on data from 2015 to 2020 and validated them on data from 2021 to 2023, which required a chronological split of the dataset. Second, we assessed model performance separately for each year from 2021 to 2023. We incrementally updated the model with each year’s data and validated it on the following year’s data. Third, we reserved the data from 2024 for a simulation in which the best-performing model was used to predict future technological convergences.
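The evaluation schedule can be expressed as a small generator. It assumes one reading of the procedure, in which the training window grows cumulatively. The models start from the 2015 to 2020 data, and each evaluation year is added to the training data after it has been tested. The 2024 data are held out for the final simulation and do not appear in the schedule.

```python
def progressive_splits(first_train, last_train, eval_years):
    """Yield (training years, evaluation year) pairs for progressive validation.

    After each evaluation year has been tested, it is appended to the
    training window before the next evaluation (cumulative updating is an
    assumption of this sketch).
    """
    train = list(range(first_train, last_train + 1))
    for year in eval_years:
        yield list(train), year  # yield a copy so later appends do not leak
        train.append(year)


for train_years, test_year in progressive_splits(2015, 2020, [2021, 2022, 2023]):
    print(f"train {train_years[0]}-{train_years[-1]} -> test {test_year}")
```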
Both the training and test sets contain positive and negative edges. Positive edges are the actual edges of the graph. Negative edges can either comprise all non-existing edges or be randomly sampled from them. To handle class imbalance and account for structural information between nodes, we sampled negative edges using a distance-based method instead of random selection. Distance-based sampling ensures that the model encounters more challenging negative examples, such as unconnected nodes that lie close to each other in the network, which enables it to better distinguish between likely and unlikely connections. With random sampling, the model is more likely to learn less meaningful patterns, because it predominantly encounters trivial negatives that fail to capture the complex structural relationships in the network [33].
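The text does not specify a particular distance. The sketch below therefore assumes that "hard" negatives are unconnected pairs exactly two hops apart (pairs that share at least one neighbor), found by breadth-first search. Such pairs are by definition not directly linked, so they can serve as class-0 examples that are structurally close to the positives. The function names are illustrative, and nodes are assumed to be sortable, such as CPC code strings.

```python
import random
from collections import deque


def bfs_distances(adj, source, max_depth):
    """Shortest-path hop distances from source, explored up to max_depth."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def sample_hard_negatives(adj, n_samples, distance=2, seed=0):
    """Sample unconnected node pairs exactly `distance` hops apart."""
    if distance < 2:
        raise ValueError("distance must be >= 2 so that pairs are unconnected")
    candidates = set()
    for u in adj:
        for v, d in bfs_distances(adj, u, distance).items():
            if d == distance and u < v:  # keep each unordered pair once
                candidates.add((u, v))
    rng = random.Random(seed)
    return rng.sample(sorted(candidates), min(n_samples, len(candidates)))
```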
Figure 2 illustrates the conceptual process of transforming a network into node embeddings, which are then used to generate edge embeddings. These edge embeddings serve as input for a binary classification model that predicts links between previously unconnected CPC codes. The heatmap visualizes similarity scores derived from the embeddings and indicates how similar the different node pairs are in the embedding space. After training, the binary classifier predicts potential new connections. The final network visualization highlights these newly inferred connections with red dashed edges, illustrating how previously unlinked CPC codes become connected based on the learned representations.
2.7. Uncovering Technology Convergence Opportunities
In this analysis step, the prediction results from both approaches are combined to identify potential technology convergence opportunities more robustly. The integrated approach leverages both the structural information captured in the embeddings and the power of machine learning classification to uncover latent relationships and predict future connections within the technological landscape. Merging the two methodologies yields a more comprehensive picture of how different technologies may converge, enabling the identification of emerging trends. The approach is also flexible: threshold values can be adjusted, and the framework can be tailored to specific research needs and accommodate methodological variations.
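Because the final predictions are the intersection of both approaches, the aggregation step reduces to a set intersection over unordered CPC pairs. The following is a minimal sketch with hypothetical inputs given as lists of code pairs.

```python
def combine_predictions(similarity_pairs, classifier_pairs):
    """Keep only links predicted by both approaches, ignoring pair order."""
    sim = {frozenset(p) for p in similarity_pairs}
    clf = {frozenset(p) for p in classifier_pairs}
    # Return sorted tuples for a stable, readable result.
    return sorted(tuple(sorted(pair)) for pair in sim & clf)
```

Representing each link as a `frozenset` ensures that ("A", "B") and ("B", "A") count as the same prediction, because the co-classification network is undirected.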
3. Results
3.1. Descriptive Statistics
In this section, the analyzed patent data are examined to highlight the overall trends and distinctive features of the EV patent landscape. Specifically, the analysis traces patent filing trends over time, examines the most frequently occurring CPC codes, and evaluates the distribution of patent assignees. This provides a historical perspective on the growth of EV innovation, as well as insights into the key technological areas and the assignees shaping the field.
Figure 3 illustrates the trend in patent applications from 2010 to 2024, with a total of 7343 patent documents examined during this period.
Overall, the number of patent applications increased steadily until 2018. The sharp decline in 2019 could be related to COVID-19 and the associated global economic slowdown. Following the recovery from the pandemic, however, the number of patent applications resumed its gradual upward trend. The apparent decrease in 2023 and 2024, on the other hand, is likely an artifact. US patent applications are generally published only 18 months after their earliest filing date, so many applications filed in these years had not yet been published at the time of data retrieval.
Figure 4 presents the top 20 most frequently occurring CPC codes, providing insight into the dominant technological domains. Given the hierarchical structure and inherent complexity of the CPC scheme, we used group-level CPC categories to emphasize the key technology areas. (Note: Explanations of the CPC codes can be found at https://www.uspto.gov/web/patents/classification/cpc/html/cpc.html, accessed on 2 February 2025.) A total of 1453 unique CPC codes were identified, each contributing to a different extent to the EV patent landscape. Since most patent documents are classified under more than one CPC code, the total count of codes exceeds the number of patents analyzed.

The most frequently occurring codes were “Y02T-0010”, followed by “B60L-0050”, “Y02T-0090”, “B60L-0053” and “B60L-2240”. These codes belong to subclass Y02T, which stands for “Climate change mitigation technologies related to transportation”, and subclass B60L, which pertains to “Propulsion of electrically propelled vehicles”. As the EV domain is driven by advances in sustainable transportation and electric propulsion systems, it is natural that a significant portion of the associated technology areas centers on reducing the environmental impact of transportation and enhancing energy efficiency. In particular, Y02T codes capture solutions aimed at minimizing the environmental impact of transportation, while B60L codes reflect ongoing progress in electric drive systems and associated power management. In a similar vein, Y02E (“Reduction of greenhouse gas emissions, related to energy generation, transmission or distribution”) encompasses technologies that lower the carbon footprint of the energy supply on which EVs depend. Moreover, H02J codes cover circuit arrangements and systems for supplying, distributing and storing electric power, including battery charging, which are essential for effective energy management in EVs. H01M codes pertain to advances in battery systems and energy storage, which are vital for extending range and ensuring reliable power delivery. Together, these classifications offer a comprehensive view of the strategic technological priorities driving the evolution of the EV sector.
Table 1 lists the top 20 patent assignees in the EV field. As expected, most of the top assignees are well-established automotive manufacturers and technology companies actively involved in EV innovation. Ford Global Tech, which specializes in intellectual property management, leads the ranking with 632 patents. It is followed by Toyota Motor Corp, Honda Motor, Hyundai Motor and Kia, and GM Global Technology Operations. These companies have been both traditional leaders in the automotive industry and key drivers of EV development. Since Hyundai Motor and Kia both belong to the Hyundai Motor Group, they frequently collaborate and file patents jointly to achieve technological synergy, cost efficiency, and competitive advantage in the rapidly evolving EV market. For example, joint patents can help ensure unified technological standards across the organizations, fostering interoperability and streamlining innovation in production lines.

Beyond traditional automobile manufacturers, the ranking includes several technology and electronics companies, such as Murata Manufacturing Co. and Qualcomm. Their presence underscores the increasing integration of advanced battery management systems, power electronics, and wireless communication technologies in EVs. The inclusion of semiconductor and electronics manufacturers also signals the convergent nature of the EV patent landscape and highlights the critical role of digital technologies in modern EVs. Emerging players such as Thunder Power New Energy Vehicle Development Company and NIO USA Inc. demonstrate the rising influence of EV startups in pushing next-generation EV advances, including battery innovation and autonomous mobility solutions.
Interestingly, an analysis of the assignees’ countries of origin in Table 1 shows that most of the top patent holders are based in Japan, the United States, South Korea, Germany and China.
Table 2 shows the global distribution of patents by frequency for the top 10 countries. This distribution reflects the dominance of these countries in the global EV industry. Rather than each country specializing in a single technology domain, the patent landscape reveals significant overlap, with companies across different regions actively developing electric powertrain systems, battery management technologies, and smart mobility solutions. This convergence underscores the highly competitive and collaborative nature of EV innovation.
3.2. Co-Occurrence Network
Based on the patent co-classification relationships, we constructed six distinct co-classification networks, which serve as input for the node2vec algorithm in the subsequent analysis step. The first network spans the entire analysis period and is used to generate embeddings for similarity-based link prediction. It consists of 1452 unique nodes, each representing a specialized technology domain, and 35,962 edges. The second network incorporates patent data filed between 2015 and 2020; it was used to train the machine learning models and includes 1155 nodes. The third through fifth networks represent the co-classification networks for 2021, 2022 and 2023, respectively. These three networks are used to evaluate the models’ predictive performance in a progressive validation process that tests their ability to identify links in each subsequent year. Notably, the number of unique nodes decreased gradually from 501 in 2021 to 489 in 2022 and 416 in 2023. The sixth network was constructed from the 2024 patent data and comprises 312 unique nodes and 3843 edges. Overall, the networks span different periods, and their node counts vary by year. The decrease in node numbers reflects the number of patent applications in the respective years.
Figure 5 visualizes the resulting network for the entire analysis period, providing a static snapshot of interconnected technology domains. To enhance clarity, only the top 10% of nodes are displayed. The layout was computed with the Fruchterman–Reingold force-directed algorithm, which places highly connected nodes closer to the center. Larger nodes indicate higher centrality, i.e., greater importance within the network. Accordingly, Y02T-0010 (Road transport of goods or passengers) and B60L-0050 (Electric propulsion with power supplied within the vehicle) occupy prominent positions due to their high centrality. These prominent nodes coincide with the most frequently occurring CPC codes. The average degree (number of edges per node) is approximately 49.53, while the overall density is 0.034. This indicates that the network is sparse but unevenly connected, with some nodes having significantly more connections than others. The structure resembles a hub-and-spoke pattern, in which hubs act as key intermediaries for connectivity within the network.
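The reported statistics can be checked with the standard formulas for an undirected graph without self-loops: average degree 2E/N and density 2E/(N(N-1)). The check uses N = 1452 nodes and E = 35,962 edges from the first network. It assumes that "edges per node" refers to the average degree.

```python
def degree_and_density(n_nodes, n_edges):
    """Average degree 2E/N and density 2E/(N(N-1)) of an undirected simple graph."""
    avg_degree = 2 * n_edges / n_nodes
    density = 2 * n_edges / (n_nodes * (n_nodes - 1))
    return avg_degree, density


avg_deg, dens = degree_and_density(1452, 35962)
print(f"average degree = {avg_deg:.2f}, density = {dens:.3f}")
# average degree = 49.53, density = 0.034
```

Both values agree with the figures reported for the full network. A density of about 3.4% means that only a small fraction of all possible CPC pairs are linked.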
Table 3 summarizes the 10 most frequently co-occurring node pairs. The number of co-occurrences is the number of patent documents in which both codes appear. The highest number of co-occurrences was found between Y02T-0010 and Y02T-0090. Y02T relates to climate change mitigation technologies in transportation. This co-occurrence therefore indicates a trend in the transportation sector toward increasingly integrative solutions, which combine direct emission reduction strategies with systemic improvements to maximize environmental impact. Moreover, most of the top pairs involve Y02T-0010 (“Road transport of goods or passengers”), highlighting its central role in innovations that focus on reducing emissions and enhancing sustainability in road transportation.
3.3. Node2vec Embedding
To obtain node2vec embeddings, the parameters of the biased random walks and of the embedding must be set. The key parameters include “number of walks”, “walk length”, “embedding dimension”, “p (return parameter)” and “q (in-out parameter)” [26]. These settings influence the balance between capturing local and global network features, and they ultimately affect the quality of the embeddings for downstream tasks.
Number of walks: The number of random walks initiated from each node. Increasing this number provides more robust sampling of the network structure, but at a higher computational cost.
Walk length: The number of steps in each random walk. Longer walks capture more of the global network structure, whereas shorter walks focus on local connectivity.
Embedding dimension: The size of the vector representing each node. A higher dimension may capture more detailed features but also increases the risk of overfitting.
Return parameter (p): It controls the likelihood of immediately revisiting the previous node during a walk. Lower values of p increase this likelihood, thereby favoring a more localized exploration of the network.
In-out parameter (q): It balances the search between breadth-first and depth-first strategies. A lower value encourages outward exploration (depth-first), while a higher value biases the walk toward the local neighborhood (breadth-first).
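To make the roles of p and q concrete: in node2vec [26], a walk that has just moved from node t to node v chooses its next node x with an unnormalized weight equal to the edge weight multiplied by a search bias, which is 1/p if x is t itself, 1 if x is also a neighbor of t, and 1/q otherwise. A minimal sketch of one biased walk, assuming an unweighted graph stored as an adjacency dict (for a weighted co-occurrence network, each bias would additionally be multiplied by the edge weight); the function names and the toy graph are hypothetical, and practical implementations precompute alias tables instead of recomputing weights at every step:

```python
import random


def search_bias(prev, nxt, adj, p, q):
    """node2vec bias for stepping to `nxt` after having moved from `prev`."""
    if nxt == prev:          # distance 0 from prev: step straight back
        return 1.0 / p
    if nxt in adj[prev]:     # distance 1 from prev: stay in prev's neighborhood
        return 1.0
    return 1.0 / q           # distance 2 from prev: move outward


def biased_walk(adj, start, length, p, q, rng=random):
    """Return one second-order random walk containing at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])          # sorted for reproducibility with a seeded rng
        if not neighbors:
            break                             # isolated node: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))   # first step has no previous node
            continue
        prev = walk[-2]
        weights = [search_bias(prev, x, adj, p, q) for x in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical graph: triangle a-b-c with a tail c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(biased_walk(adj, "a", 6, p=1.0, q=0.5, rng=random.Random(0)))
```

With q = 0.5, moving from the triangle toward d is twice as likely as staying within the previous node's neighborhood, which is the depth-first tendency described above.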
In this study, we based our settings on prior research findings and set the embedding dimension to 100. According to the original node2vec paper [26], performance tends to saturate once the representation dimensionality reaches around 100. Lombardo and Poggi [34] investigated the relationship among “walk length”, “number of walks”, and “computation time”, demonstrating a linear increase in processing time as these parameters increased from 1 to 30. Their experiments fixed p = q = 1 to ensure a uniform exploration strategy (balancing depth-first and breadth-first approaches). Moreover, Peng et al. (2019) found that prediction performance was not significantly affected by varying p and q values [35]. Taken together, this evidence indicates that the appropriate parameter settings depend on the research objectives and priorities. For example, one may choose to optimize parameters such as walk length and number of walks while keeping p and q at their default values. We chose a neutral setting with no bias toward either exploration strategy; therefore, we adopted p = q = 1. To reduce the computational complexity, the walk length was set to 20 and the number of walks to 200.
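The chosen settings can be summarized in a small configuration object that also makes two consequences explicit: the size of the walk corpus grows linearly with both the number of walks and the walk length, in line with the linear run-time growth reported by Lombardo and Poggi [34], and with p = q = 1 every node2vec search bias equals 1, so the walks reduce to unbiased first-order random walks. A minimal sketch; the class and field names are hypothetical and not tied to any specific node2vec package:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node2VecConfig:
    """Hypothetical container for the node2vec settings used in this study."""
    dimensions: int = 100    # embedding dimension
    walk_length: int = 20    # nodes per walk
    num_walks: int = 200     # walks started from every node
    p: float = 1.0           # return parameter
    q: float = 1.0           # in-out parameter

    def corpus_size(self, n_nodes: int) -> int:
        """Upper bound on walk tokens fed to skip-gram (walks may stop early at dead ends)."""
        return n_nodes * self.num_walks * self.walk_length

    def bias_values(self) -> tuple:
        """Search biases for returning, staying local, and moving outward."""
        return (1.0 / self.p, 1.0, 1.0 / self.q)


cfg = Node2VecConfig()
print(cfg.corpus_size(1_000))   # tokens for a hypothetical 1,000-node graph
print(cfg.bias_values())        # all equal to 1 when p = q = 1
```

Doubling either the walk length or the number of walks doubles the corpus, which is why these two parameters dominate the computational cost.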
3.4. Similarity-Based Link Prediction
The first network was used to perform similarity-based link prediction. For the similarity calculation, we only considered pairs of nodes that are not directly connected and whose similarity exceeded a given threshold. Typically, a higher threshold yields a smaller set of candidate pairs with greater confidence, while a lower threshold admits a broader range of potential relationships, allowing for more exploratory analyses. Based on these considerations, we set the threshold at 0.75, which yielded 54 candidate edges between currently unconnected node pairs.
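The candidate-selection step can be sketched as follows, assuming cosine similarity between node2vec embeddings as the similarity measure; the function names are hypothetical, and the toy three-dimensional vectors stand in for the 100-dimensional embeddings:

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def candidate_links(embeddings, edges, threshold=0.75):
    """Unconnected node pairs whose embedding similarity exceeds `threshold`.

    Returns (u, v, score) tuples sorted by descending score.
    """
    existing = {frozenset(e) for e in edges}
    candidates = []
    for u, v in combinations(sorted(embeddings), 2):
        if frozenset((u, v)) in existing:
            continue                      # only score pairs that are not yet linked
        score = cosine(embeddings[u], embeddings[v])
        if score > threshold:
            candidates.append((u, v, score))
    return sorted(candidates, key=lambda t: t[2], reverse=True)


emb = {"A": [1.0, 0.1, 0.0], "B": [0.9, 0.2, 0.1], "C": [0.0, 1.0, 0.3], "D": [0.1, 0.9, 0.4]}
links = [("A", "B")]   # A and B are already connected, so only C-D is proposed
for u, v, s in candidate_links(emb, links):
    print(u, v, round(s, 3))
```

Raising the threshold shrinks the candidate list, which mirrors the confidence versus exploration trade-off described above.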
Table 4 shows an exemplary selection of these candidate edges along with their corresponding similarity scores. Notably, the highest similarity score was observed between H01L-0029 and H10D-0064, warranting further investigation. The CPC scheme has undergone multiple revisions since its introduction in 2013, reflecting technical advancements and evolving patent landscapes. While H10D-0064 pertains to “Electrodes of devices having potential barriers”, H01L-0029 can no longer be found in the current version of the scheme. Its definition is available in the original version as “Semiconductor devices adapted for rectifying, amplifying, oscillating or switching, or capacitors or resistors with at least one potential-jump barrier or surface barrier”. The observed similarity between these classifications may stem from overlapping functional or structural attributes inherent in semiconductor device design. In the context of electric vehicles, this convergence is particularly relevant, as modern EV powertrains rely on high-performance semiconductor devices for efficient energy conversion and motor control. By integrating advanced electrode designs with optimized semiconductor components, manufacturers could achieve improved thermal management, reduced energy losses, and enhanced durability of power electronics.
B60R-0017 pertains to “Arrangements or adaptations of lubricating systems or devices”, focusing on vehicle-specific lubrication mechanisms. F16N-2200, representing “Condition of lubricant”, covers broader technical details for monitoring lubricant properties (e.g., oxidation, viscosity, contamination) across industrial applications. The convergence between these two codes might involve adaptive lubrication systems for electric vehicles that assess lubricant condition in real time. EVs still require lubrication for components such as reduction gearboxes, and adaptive lubrication mechanisms enable proactive maintenance strategies that improve performance and longevity in EV applications. The identified candidate node pairs will be merged with the results from classification-based link prediction to highlight potential areas of technology convergence.
3.5. Binary Classification-Based Link Prediction
The second network was used to train and compare different machine learning models, with edge embeddings serving as the classifier inputs. To generate negative edges for training, we opted for the distance-based sampling approach described in Section 2.6. In sparse networks such as patent co-classification networks, the number of unconnected node pairs (potential negative edges) vastly exceeds the number of positive edges, leading to a severe imbalance that can overwhelm the training process if all available negatives are used. The distance-based method addresses this issue by prioritizing negative samples that are close to the positive examples; these harder negatives force the model to learn finer distinctions and support more robust performance. Limiting the number of negatives not only mitigates the imbalance but also reduces the computational burden during training, which, together with the harder negatives, helps the model generalize and predict links accurately.
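The distance criterion actually used is the one described in Section 2.6. As an illustration only, the sketch below assumes graph distance and draws negatives from unconnected node pairs exactly two hops apart (pairs that share at least one neighbor), which are harder to separate from true links than random pairs; sampling as many negatives as there are positive edges yields a balanced set. The two-hop rule and all function names are hypothetical:

```python
import random
from itertools import combinations


def two_hop_negatives(adj, k, seed=0):
    """Sample up to k unconnected node pairs that are exactly two hops apart."""
    pool = set()
    for mid, nbrs in adj.items():
        for u, v in combinations(sorted(nbrs), 2):   # u and v share the neighbor `mid`
            if v not in adj[u]:                      # skip pairs that are already linked
                pool.add((u, v))
    pool = sorted(pool)                              # deterministic order before sampling
    return random.Random(seed).sample(pool, min(k, len(pool)))


def balanced_pairs(adj, seed=0):
    """Return (positives, negatives) with equal counts where the pool allows."""
    positives = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    negatives = two_hop_negatives(adj, len(positives), seed)
    return positives, negatives


# Hypothetical graph: path a-b-c-d plus the edge b-e.
adj = {"a": {"b"}, "b": {"a", "c", "e"}, "c": {"b", "d"}, "d": {"c"}, "e": {"b"}}
pos, neg = balanced_pairs(adj)
print(len(pos), len(neg), sorted(neg))
```

If the two-hop pool is smaller than the number of positives, a fuller implementation would need to widen the distance limit or accept a mild imbalance.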
Table 5 summarizes the number of training and test samples, along with the distribution of positive and negative edges for the binary classification task. The training set is used for initial model training, while the test sets (2021–2023) are used for progressive validation; negative edges are generated with the distance-based approach to ensure a balanced dataset.
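Edge embeddings are derived from the embeddings of the two nodes in each pair. The node2vec paper [26] discusses several binary operators for this purpose (average, Hadamard, weighted-L1, weighted-L2); since the operator is not restated in this section, the sketch below simply assumes the Hadamard (element-wise) product and assembles labeled rows for the classifier. Function names and vectors are hypothetical:

```python
def hadamard(u, v):
    """Element-wise product of two node vectors, one common edge operator."""
    return [a * b for a, b in zip(u, v)]


def build_dataset(embeddings, positives, negatives, op=hadamard):
    """Return feature rows X and labels y (1 = link, 0 = no link)."""
    X, y = [], []
    for pairs, label in ((positives, 1), (negatives, 0)):
        for u, v in pairs:
            X.append(op(embeddings[u], embeddings[v]))
            y.append(label)
    return X, y


emb = {"a": [0.5, -1.0], "b": [2.0, 0.5], "c": [1.0, 1.0]}
X, y = build_dataset(emb, positives=[("a", "b")], negatives=[("a", "c")])
print(X, y)   # [[1.0, -0.5], [0.5, -1.0]] [1, 0]
```

A symmetric operator such as the Hadamard product makes the edge feature independent of the order of the two nodes, which suits an undirected co-classification network.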
Following the procedure outlined in Section 2.6, four different machine learning algorithms were employed to evaluate the effectiveness of our approach: Logistic Regression (LR), Random Forest (RF), eXtreme Gradient Boosting (XGB), and Light Gradient Boosting Machine (LGBM). Logistic Regression serves as a baseline due to its simplicity and helps identify linear relationships in the data. Random Forest, an ensemble of decision trees, is known for its robustness and ability to handle high-dimensional data. XGB and LGBM are gradient boosting algorithms that have demonstrated strong performance in various machine learning tasks. By comparing these diverse algorithms, we aim to identify the most robust and accurate model for our prediction task. To ensure a fair comparison, each model was trained and evaluated on the same dataset using the same performance metrics.
Subsequently, the models were trained on data from 2015 to 2020, and their performance was validated progressively on the co-classification networks for 2021, 2022, and 2023. This step-by-step validation approach captures emerging trends and shifts in the patent landscape over time. After each validation phase, the models were updated to incorporate the newly observed data, keeping the predictive framework dynamic and responsive to ongoing technological advancements in the field.
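The progressive validation scheme can be sketched as an expanding-window loop: fit on all years seen so far, evaluate on the next year, then fold that year into the training data before moving on. In the sketch below, `fit` and `evaluate` are hypothetical placeholders for any of the four classifiers and the metrics in Table 6, and the sketch refits from scratch at each step (whether the study refits or updates incrementally is an assumption here); the toy usage swaps in trivial stand-ins so that the control flow can be checked:

```python
def progressive_validation(data_by_year, first_test_year, last_test_year, fit, evaluate):
    """Expanding-window validation over yearly datasets.

    data_by_year maps year -> list of (features, label) samples. All years
    before `first_test_year` form the initial training set; each test year is
    evaluated with the current model and then added to the training data.
    """
    train = [s for year, rows in sorted(data_by_year.items())
             if year < first_test_year for s in rows]
    scores = {}
    for year in range(first_test_year, last_test_year + 1):
        model = fit(train)                        # (re)train on everything seen so far
        scores[year] = evaluate(model, data_by_year[year])
        train = train + data_by_year[year]        # update with the newly validated year
    return scores


# Toy stand-ins: the "model" only records how many samples it was trained on.
data = {y: [((y,), 1)] * 10 for y in range(2015, 2024)}
scores = progressive_validation(data, 2021, 2023,
                                fit=lambda rows: len(rows),
                                evaluate=lambda model, rows: model)
print(scores)   # {2021: 60, 2022: 70, 2023: 80}
```

Because each test year is evaluated before it enters the training data, no future information leaks into the model that is being scored.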
Table 6 summarizes the model performance across these validation periods, using accuracy, precision, recall, and F1-score as evaluation metrics.
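For reference, the four metrics follow directly from the confusion-matrix counts of true and false positives and negatives. A minimal sketch with hypothetical predictions (the function name is illustrative):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = link)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of predicted links that are real
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of real links that are found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical predictions on eight test edges (four positive, four negative).
print(classification_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 0]))
```

Recall is the metric most sensitive to missed links, which is why a model with low recall, such as the logistic regression baseline discussed below, may overlook relevant convergence pairs.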
The results revealed that Random Forest achieved the highest overall F1-score (89.25%) in 2023 and the best average performance on most metrics, except recall. Logistic Regression performed the worst among all models, with the lowest average accuracy, precision, recall, and F1-score. This underperformance can be attributed to its inherent limitation in capturing complex, non-linear relationships within the data. In the context of link prediction in co-classification networks, relationships between patent classifications are likely shaped by non-linear associations and hierarchical dependencies. Moreover, its relatively low recall values indicate that it struggles to identify positive instances and potentially misses many relevant links. XGBoost and LightGBM also showed competitive performance, highlighting the advantages of boosting methods in link prediction. Most models showed slight improvements in their performance metrics from 2021 to 2023, particularly in recall, with XGBoost showing the most substantial improvement in this metric.
Overall, the general improvement in evaluation metrics over time suggests that the training process effectively incorporated new information, enhancing the models' ability to identify relevant connections in the evolving patent landscape.
3.6. Final Prediction Results
In this analysis step, we combined the prediction results from Section 3.4 and Section 3.5 to derive novel technology convergence opportunities. This hybrid modeling approach, which integrates similarity-based methods with machine learning-based prediction, is intended to provide a more robust framework for identifying potential technology convergence patterns. We selected the best-performing model in 2023 based on F1-score (Random Forest) and used it to predict new technology linkages in 2024. High-confidence candidate edges were synthesized by combining the machine learning-driven predictions with the similarity-based scores. By leveraging insights from previous years, the model was able to anticipate potential linkages that had not yet been observed in historical data.
Table 7 summarizes the predicted links between non-connected nodes, together with their similarity scores and prediction probabilities. To ensure high confidence in the prediction results, only connections with a probability score greater than 0.8 were considered. In total, 18 new links were predicted as relevant for the future.
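The combination step can be sketched as a two-stage filter: candidate pairs from the similarity-based step (similarity above 0.75) are kept only if the classifier's predicted link probability exceeds 0.8. Here `predict_proba` is a hypothetical stand-in for the trained Random Forest, the scores are invented for illustration and are not the values in Table 7, and the exact way the study combined the two scores may differ:

```python
def hybrid_links(similarity_candidates, predict_proba, sim_threshold=0.75, prob_threshold=0.8):
    """Keep pairs that pass both the similarity and the probability threshold.

    similarity_candidates: iterable of (u, v, similarity) tuples.
    predict_proba: callable (u, v) -> estimated probability of a future link.
    Returns (u, v, similarity, probability) tuples sorted by probability.
    """
    kept = []
    for u, v, sim in similarity_candidates:
        if sim <= sim_threshold:
            continue                          # fails the similarity filter
        prob = predict_proba(u, v)
        if prob > prob_threshold:
            kept.append((u, v, sim, prob))
    return sorted(kept, key=lambda t: t[3], reverse=True)


# Hypothetical scores, not the study's results.
candidates = [("P", "Q", 0.95), ("R", "S", 0.80), ("T", "U", 0.78), ("V", "W", 0.70)]
toy_proba = {("P", "Q"): 0.97, ("R", "S"): 0.65, ("T", "U"): 0.85, ("V", "W"): 0.99}
print(hybrid_links(candidates, lambda u, v: toy_proba[(u, v)]))
```

Requiring both conditions trades recall for precision: a pair that the classifier favors but the embeddings do not (V-W above) is discarded, and vice versa (R-S).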
Figure 6 highlights the links predicted for the future interval in green. The converging pairs of CPC codes are summarized in Table 7.
According to Table 7, several distinct combinatory technology convergence patterns emerged. For example, the combination of H01L-0029 and H10D-0064 showed the highest confidence. This reflects a focus on semiconductor device engineering, which could result in more efficient inverters and converters, improving overall EV powertrain efficiency as well as battery management. The top 10 predictions all showed remarkably high prediction probabilities (>0.9), coupled with similarity scores above 0.75, suggesting strong potential for technology convergence. The convergence between B23K-0015 (Electron-beam welding or cutting) and F28F-2013 (Heat conductive materials) could lead to better thermal management. In particular, the combination of precise welding and advanced heat exchanger design could yield improved battery pack designs. Moreover, the convergence between B32B-0019 (Layered products comprising natural mineral fibers or particles) and D10B-2505 (Industrial textiles) could lead to advanced composite materials that offer superior passive fire protection in EV batteries. In a similar vein, the convergence between B32B-0019 and D03D-0001 (Woven fabrics designed to make specified articles) could offer enhanced thermal protection systems for battery assemblies. Further promising convergence patterns include B82Y-0030 (Nanotechnology for materials or surface science) and C09D-0127 (Coating compositions based on homopolymers or copolymers of compounds). This combination could generate specialized composite materials that are critical for protecting battery modules, power electronics, and other sensitive EV components from extreme temperatures and environmental degradation. Similarly, the combination of G01J-2001 (Photometry, e.g., photographic exposure meter) and F25D-0017 (Arrangements for circulating cooling fluids) suggests advancements in thermal management and control systems for EVs through non-invasive monitoring techniques. The integration of optical sensors into cooling systems could enable predictive maintenance and early detection of potential overheating, improving overall vehicle safety. Notably, subclass B32B (Layered products) appeared multiple times as a converging node, indicating its potential as a technology convergence hub. This subclass could play an integral role in advancing the energy efficiency, safety, and sustainability of EVs.
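The observation that B32B recurs across predicted pairs can be checked mechanically by truncating each code to its four-character CPC subclass and counting appearances. The sketch below uses only the six pairs discussed in the text, not the full list of 18 links in Table 7; the function name is illustrative:

```python
from collections import Counter


def subclass_counts(pairs):
    """Count how often each CPC subclass (first four characters) appears in predicted pairs."""
    return Counter(code[:4] for pair in pairs for code in pair)


discussed_pairs = [
    ("H01L-0029", "H10D-0064"),
    ("B23K-0015", "F28F-2013"),
    ("B32B-0019", "D10B-2505"),
    ("B32B-0019", "D03D-0001"),
    ("B82Y-0030", "C09D-0127"),
    ("G01J-2001", "F25D-0017"),
]
print(subclass_counts(discussed_pairs).most_common(1))   # [('B32B', 2)]
```

Applied to the full prediction list, the same count would rank candidate convergence hubs by how many predicted links they take part in.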
Table 8 synthesizes the key technology convergence opportunities from Table 7, grouping them by thematic areas and linking them to their potential impact on EV development.
4. Discussion
The prediction results align with the research objectives by identifying key areas of technology convergence, such as battery protection and thermal management. These results suggest that future technology convergence in the EV sector is most likely to occur at the intersection of nanotechnology, sensor systems, and advanced materials. From the perspective of technology management and R&D policy design, these convergence patterns reveal important strategic directions for innovation [36,37]. Understanding where technology convergence might emerge is crucial for guiding strategic investments, fostering innovation, and shaping research agendas that support the evolution of next-generation EV technologies. However, previous studies have focused on describing the existing technological landscape rather than predicting future convergence trends [38]. While retrospective analyses provide valuable insights, they often lack the forward-looking perspective necessary for proactively identifying synergistic technology relationships. To bridge this gap, our study proposed a dual-level predictive analytical framework to systematically forecast potential technology linkages. The similarity-based method efficiently pinpoints structurally promising pairs, while the classification method refines these predictions with data-driven precision. The findings draw on knowledge embedded in patent information and yield the following implications:
- (1) In terms of theoretical implications, this study proposed a novel analysis framework for predicting potential convergent technological linkages. By integrating similarity-based scoring with a classifier model, the framework can mitigate false positives and allow for the systematic identification of emerging technology patterns. The high flexibility in executing the method (for instance, the ability to adjust threshold values) further enhances its adaptability and enables researchers to tailor the model to various research purposes. This study adopted a rather conservative approach to setting the random-walk hyperparameters, but a more aggressive optimization strategy may be applied to explore broader network contexts. Furthermore, the framework can be adapted to other technology domains, thereby broadening its applicability for additional innovation studies [17,39].
- (2) In terms of managerial implications, this study can serve as a strategic tool for decision-makers by providing actionable insights into emerging convergent technologies. It enables them to anticipate novel combinatory innovations and to align their strategic initiatives with future technological investments. The proposed framework can help decision-makers allocate resources more efficiently and mitigate risks associated with market uncertainties. By drawing on a data-driven view of emerging trends and early signals of technology convergence, R&D managers can design targeted policies that facilitate technology buy-in or transfer. This strategic foresight translates into tangible benefits, such as a practical roadmap for organizations seeking to navigate complex technological landscapes and maintain a competitive edge at the forefront of innovation in the evolving EV sector [40].
- (3) In terms of uncovering technology convergence opportunities, the predictions highlighted critical technology areas that exhibit strong potential for convergence. The analysis suggests that successful EV technology development will increasingly depend on the ability to integrate diverse technological domains, particularly thermal management, materials engineering, and protective systems [41]. The findings align with the growing demand in the EV market for integrated solutions in battery protection and thermal management [42]. Thus, this understanding can guide R&D strategies and resource allocation for future technology development.
- (4) From an economic sustainability perspective, the emergence of new technological convergence opportunities, such as integrating advanced thermal management systems and novel composite materials, has the potential to extend vehicle lifespan and lower the total cost of ownership of EVs. Such improvements could directly affect consumer acceptance rates and adoption speed. The insights gained from our predictive framework offer R&D departments and policymakers tools to preemptively align innovation and investment priorities, helping to optimize economic returns from technology development activities. Furthermore, business strategies in the EV sector could benefit substantially from our predictive analysis of convergence. By understanding which technologies are likely to converge, automotive and technology companies can proactively manage their R&D portfolios, strategically entering new market segments or adjusting existing product lines. For instance, battery manufacturers, semiconductor producers, and automotive OEMs can leverage convergence insights to form strategic alliances or joint ventures, enabling cost-sharing, risk mitigation, and faster time-to-market for innovative EV solutions.
- (5) In terms of sustainability impacts, the findings align with the broader goals of sustainable development by fostering innovations that reduce greenhouse gas emissions and improve energy efficiency. This could lead to the development of new business models centered on green technologies, which are increasingly favored by consumers and policymakers.
- (6) The findings of this study highlight several key areas where policymakers can intervene to accelerate EV adoption and innovation. By drawing on the high-potential technology convergence opportunities identified here (e.g., thermal management systems and advanced composite materials), policymakers can prioritize funding for research programs that target these areas. For instance, directing funding toward projects that combine advanced composite materials with thermal management systems could enhance battery durability and vehicle efficiency [43]. Policymakers can also foster interdisciplinary collaboration between industries by offering tax credits, subsidies, or grants for joint R&D projects. Governments can create favorable regulatory environments for convergent technologies by updating standards for EV components. For example, mandating safety protocols for advanced battery materials or thermal management systems could accelerate their adoption [44]. In particular, technology convergence often necessitates new standards to ensure compatibility across different systems. Hence, policymakers should work with industry stakeholders to develop and implement standards that accommodate these convergent technologies, ensuring seamless integration and market adoption. Lastly, expanding programs such as the U.S. National Electric Vehicle Infrastructure (NEVI) Formula Program to include funding for convergent technologies could favor the development of innovative solutions that address multiple challenges simultaneously, such as enhancing both charging efficiency and battery longevity [45].
In sum, the proposed analysis framework can broaden the existing toolbox for generating technology intelligence and reduce uncertainty in technology convergence forecasting. This systematic approach supports strategic decision-making and R&D prioritization, ultimately fostering innovation and sustainable growth in the evolving EV sector.
5. Conclusions
This study provided a flexible framework for navigating and anticipating technology convergence opportunities through a data-driven approach. A case study in the EV field demonstrated the effectiveness of this framework, highlighting its ability to identify emerging synergies between distinct technological domains with high confidence. The derived insights enable organizations to strategically allocate R&D resources, accelerate innovation cycles, and capitalize on technology development opportunities.
Despite its contributions, this study has certain limitations that need to be considered and that offer avenues for future research. First, the predictive output may vary depending on the parameter settings used in the node2vec transformation. Hence, additional studies could compare different parameter settings and threshold values to determine under which conditions the model yields more reliable performance and to better understand the impact on the exploration–exploitation trade-off. Second, applying the analysis framework to domains beyond EVs would allow for a broader validation of its effectiveness and adaptability. Future research could therefore extend it to related areas, such as battery management systems, renewable energy, electronics, or hybrid electric vehicles, to further test and refine the methodology. Moreover, the proposed framework could be transferred to other tasks, such as R&D partner recommendation or patent citation prediction, by adjusting the input features. Third, accurately defining the analysis scope is crucial, as the quality of the patent data used can affect the accuracy of predictions. Including irrelevant patents can introduce noise, while excluding pertinent ones can omit significant data. To ensure comprehensive coverage of the EV sector, the patent retrieval strategy could be refined or extended to include patent data registered in other key patent jurisdictions. Lastly, to gain a more intuitive understanding of potential technology convergence dynamics, additional data sources beyond patent classifications, such as patent content or market data, should be incorporated to provide a more comprehensive view of emerging technological trends. Integrating patent text analysis could enhance the interpretability of the prediction results. We therefore encourage scholars to examine further application cases to facilitate the diffusion and continuous improvement of this framework.