1. Introduction
Methods for finding patterns in data and predicting the future from data have grown into an important field of research, which has brought about advances in information technology [
1,
2,
3]. LeCun et al. expected that predictive models that can analyze images, natural language, and signals will advance further in the near future [
4]. In fact, much modern decision making is carried out after insights are gained from predictive models, in applications ranging from self-driving cars to medical diagnoses [
5,
6,
7,
8]. However, the reliability and efficiency of predictive models are affected by various factors such as noise in data [
9,
10,
11,
12]. Thus, we have to consider many factors—such as
aleatoric uncertainty and
epistemic uncertainty—to secure confidence in a predictive model [
13,
14,
15].
Kendall and Gal argued that
aleatoric uncertainty and
epistemic uncertainty are, respectively, irreducible noise contained in data and reducible errors where the model cannot explain the data [
16]. Therefore, modern machine learning and deep learning models should ensure reliable confidence. In other words, confidence, the likelihood that a predicted label is correct, needs to be calibrated to reflect the ground-truth accuracy. Calibration refers to the statistical consistency between the accuracy and the probability of prediction [
17,
18]. In this respect, the model should not only provide accurate predictions but also calibrated confidence [
19]. Calibrated confidence is required for modeling fraud detection and healthcare, in addition to the self-driving cars mentioned above. This is because the uncertainty in the model can pose a direct risk to people and property.
As services that use natural language processing are becoming increasingly common, research on named entity recognition, part-of-speech tagging, and question answering is being actively conducted [
20,
21,
22]. However, research on models with calibrated confidence for text classification remains insufficient because its perceived risk is low relative to other applications. Although Technology Management (TM) using documents such as patents involves many factors that can threaten companies and research institutes, its uncertainty has received insufficient discussion.
The reasons to consider calibrated confidence in TM based on data analysis are as follows:
TM, implemented with strategies such as patent litigation, technology transfer and valuation, is a time-consuming and expensive task [
23,
24];
Patent litigation causes the proliferation of patent wars between companies because it can inflict huge losses on the accused by prohibiting manufacturing, sales, and imports [
25,
26,
27];
Technology valuation can be used for the early commercialization of excellent technology, which can offer opportunities to expand business models [
28,
29];
Technology transfer can save on the time invested in technology development and can further serve as a cornerstone for mergers and acquisitions (M&A) between companies [
30,
31];
Most technologies do not undergo TM events, so data labels can become imbalanced [
32].
Based on the above evidence, TM is potentially a high-risk application. In particular, the uncertainty of the predictive model for these activities may be high. Therefore, we need to investigate ways of lowering the uncertainty in the model.
Many previous studies have pointed out that the confidence of predictive models remains uncalibrated even though their accuracy has improved dramatically. Platt proposed a method for converting the output of a predictive model into a probability using a logistic function [
33]. He laid the groundwork for measuring the confidence of many models and comparing it with their accuracy. In particular, Guo et al., extending his research, contributed to reducing overconfidence by developing a function that returns a softened probability [
34]. Zadrozny and Elkan judged whether their model’s confidence was calibrated by visualizing the expected and observed accuracies [
35]. Since their method could express the uncertainty of a model in a graph, it enabled its intuitive evaluation. Naeini et al. devised a method to approximate and measure the expected value of the difference between confidence and accuracy [
36]. Based on this study, Nixon et al. proposed a method that could estimate the calibration error more efficiently than existing methods [
37]. However, these previous studies were limited in that they could only measure a model’s uncertainty after training was complete.
Recently, research has been conducted to develop a model that can calibrate confidence by improving the training process. Thulasidasan et al. and Zhang et al. tried to lower the uncertainty in the model by proposing a method to increase the diversity of representations through data mix-up [
38,
39]. They argued that their proposed method could reduce the empirical risk of overfitting and overconfidence in the training data. Furthermore, Ovadia et al. and Chan et al. emphasized that uncalibrated confidence can be prevented by simply shifting the data’s distribution [
40,
41]. In addition, Hendrycks and Gimpel proposed a method of measuring the calibration score for each object to determine the out-of-distribution that led to uncalibrated confidence [
42]. Pereyra et al. developed a regularized training method by assigning a penalty for overconfident predictions [
43]. Krishnan and Tickoo calibrated the confidence of their predictive model by optimizing the loss function, reflecting the relationship between accuracy and uncertainty [
44]. Jiang et al. considered model confidence as knowledge that can be obtained from data and proposed a neural network architecture that can learn it. They designed a novel learning strategy to calibrate confidence in modern predictive models with complex and deep layers, thereby lowering the uncertainty of deep neural networks [
45]. Xenopoulos et al. presented an interactive diagram that could visually represent both the uncertainty of individual observations and model confidence. In addition, they attempted various validations of the proposed method by conducting experiments on cases using both real-world and synthetic data [
46]. Furthermore, Mukdasai et al. used various measures and histograms to comprehensively consider the model’s capability, steadiness, accuracy, reliability, and fitness [
47]. These previous studies had the advantage that they could reduce the uncertainty of various applications because they calibrated confidence during the model training process.
We mainly focused on calibrating the confidence of TM by using a Variational Bayes (VB)-based generative model. Previous studies have argued that the confidence of a predictive model can be calibrated through the process of increasing the representation of the data. With this in mind, we propose a method of calibrating the confidence by (i) securing the representation of various data and (ii) generating data even when the number of training samples is small with a VB-based generative model. For that purpose, this study uses patent data to address the problems of TM, which is a label-imbalanced and highly uncertain application. The patent system, the main subject of TM, encourages industrial development by granting inventors monopoly rights in exchange for disclosure. Predictive models have been proposed for tasks such as technology transfer by extracting the features of TM contained in patents. Liu et al. developed a deep learning-based framework to predict patent litigation [
48]. In addition, Kwon argued that a machine learning model trained on patent data could accurately and selectively estimate the technology to be transferred [
49,
50]. Furthermore, Setiawan et al. proposed a method that used a graph-based algorithm to determine the most efficient technology transfer path to promote TM innovation [
51]. One limitation of these previous studies is that they did not consider potential uncertainty in patent analysis.
Patents contain aleatoric uncertainty for reasons such as decreases in value due to the time lag of research and development (R&D). That is, patent data may contain irreducible noise due to the aleatoric uncertainty arising in TM. In addition, because patent labels obtained from TM may be imbalanced, researchers need to develop predictive models that can explain the data using various training techniques. They should also be concerned about epistemic uncertainty because it is difficult to guarantee how certain the results of patent analysis are. Therefore, this paper proposes a methodology that calibrates confidence using a generative model to reduce the uncertainty of TM analysis.
In this study, our contribution is as follows:
Since our method uses a generative model, various data representations can be obtained, and the confidence can be calibrated even when the quantity of data is small;
Since a generative model can adjust the distribution of imbalanced labels, it can prevent the confidence of a specific label from becoming too large or too small;
Since the proposed methodology can obtain a disentangled representation of the data through a generative model, the results of TM can be compared in a low-dimensional space;
Since our method uses a large-scale, pre-trained language model, it can respond appropriately to patent terminology and new technologies;
This study proposes a computationally scalable method that guarantees calibrated confidence in various tasks to drive sustainable management and technological innovation.
The remainder of this paper is structured as follows.
Section 2 provides a theoretical background for VB.
Section 3 explains the proposed method and presents the research hypotheses designed to prove the methodology’s validity.
Section 4 presents a series of experiments to demonstrate the applicability of our methodology and describes statistical tests of the research hypotheses we carried out. The proposed method has several limitations, which
Section 5 discusses. Finally,
Section 6 draws conclusions and suggests future work.
2. Theoretical Background
In this study, a generative model is used to calibrate the confidence generated when a predictive model is applied to TM. For versatility in the proposed methodology, we use the Conditional-Variational AutoEncoder (C-VAE), a VB-based generative model that can selectively generate data that belong to a specific class [
52].
Let $z$ be the latent variable generated from the prior distribution $p_\theta(z)$. The input data for the C-VAE are generated by the conditional distribution $p_\theta(x \mid z)$. Furthermore, the dataset $X = \{x^{(i)}\}_{i=1}^{N}$ consists of $N$ i.i.d. samples of the variable $x$ conditioned on $z$.
Next, let $y$ be the target variable generated from the distribution $p(y)$. The variable $y$, having $M$ categories, is expressed as $y \in \{y_1, \ldots, y_M\}$. Then, the target variable with category $m$ is $y_m = e_m$. Note that $e_m$ is a standard basis vector, i.e., a vector with 1 as the $m$-th element and 0 everywhere else.
Equation (1) is the lower bound of the conditional log-likelihood of $x$ when $y = y_m$. The distribution $q_\phi(z \mid x, y)$, known as the recognition model, was introduced to approximate the actual posterior $p_\theta(z \mid x, y)$ and was reparametrized to the deterministic differentiable function $g_\phi(x, y, \epsilon)$ using the variable $x$ and the noise variable $\epsilon$ as arguments. The generative model $p_\theta$ that maximizes Equation (1) can be divided into an encoder and a decoder. When the number of latent dimensions is $d$, the encoding result of the $i$-th sample is $z^{(i)} \in \mathbb{R}^{d}$. When the category of the $i$-th target variable $y^{(i)}$ is $m$, the latent vector of the data is $z_m^{(i)}$. The function $D_{KL}(\cdot \,\|\, \cdot)$, called the Kullback–Leibler divergence, computes the difference between two input distributions:
$$\log p_\theta(x \mid y) \ge -D_{KL}\big(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid y)\big) + \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(x \mid z, y)\big] \quad (1)$$
The encoder of C-VAE represents the input data as a disentangled vector according to the label. The decoder of C-VAE receives a specific label along with the vector, and then generates data. This generative model can be utilized in various applications. In particular, VB shows excellent performance for detecting anomalies in network intrusion [
53], credit card fraud [
54,
55], and medical diagnoses [
56,
57] when labels are imbalanced. Furthermore, many previous studies have demonstrated that VB shows well-calibrated results in healthcare, which is one of the fields sensitive to uncertainty [
58,
59,
60,
61].
The generative model used in this study has the following advantages. First, it is efficient when data labels are imbalanced [
62]; the labels for patent type obtained through TM are imbalanced. For example, there are fewer transferred technologies than those that are not. Since patents have these characteristics, there is a high risk of uncertainty in predictions; therefore, VB can be effectively utilized in TM. Second, VB works well with multimodal data [
63]. Patents containing quantitative indicators, such as the number of inventors, and texts, such as abstracts, are multimodal. Since the calibration of confidence is affected by the generative model, the proposed method can be expected to show sufficient performance even with VB. Finally, VB is a computationally tractable method [
64]. VB, which approximates the true posterior in Bayesian inference, has a low computational cost for training and a low risk of gradient divergence.
3. Proposed Method
When the target label is $y_m$, $K$ random numbers $\{z^{(k)}\}_{k=1}^{K}$ are generated from the prior distribution conditioned on $y_m$. At this time, when the decoding condition for the $k$-th random number is $y_m$, the generated data are $\tilde{x}^{(k)}$. Therefore, it holds that $\tilde{X}_m = \{\tilde{x}^{(k)}\}_{k=1}^{K}$.
The result of concatenating the raw data $X$ and $\tilde{X}_m$ in the row dimension is $X'$; that is, $X'$ is the concatenation of $X$ and $\tilde{X}_m$. This study aims to measure the change in the calibrated confidence according to the label ratio $\alpha$. Equation (2) is the operation for finding $K$:
$$K = \operatorname{round}\big(\alpha N_{\neg m} - N_m\big) \quad (2)$$
$N_m$ and $N_{\neg m}$ denote the number of observations in category $m$ and those not in category $m$, respectively. When the values obtained using Equation (2) and the raw data are merged, the number of observations with label $y_m$ is $\alpha$ times that of the others ($N_m + K = \alpha N_{\neg m}$).
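Equation (2) determines how many observations to generate. Using the Valuation figures reported in Section 4.4 (566 training observations labeled Y, 5015 labeled N, and a ratio of 1.3), the reading K = round(alpha * N_neg - N_pos) reproduces the reported counts; a small Python sketch (the closed form is our reconstruction from those figures):

```python
def num_to_generate(alpha, n_pos, n_neg):
    """A reading of Equation (2): K such that (n_pos + K) / n_neg equals alpha."""
    return round(alpha * n_neg - n_pos)

k = num_to_generate(1.3, 566, 5015)  # Valuation case from Section 4.4
print(k)         # 5954 observations with label Y to generate
print(566 + k)   # 6520 positives after merging raw and generated data
```

The resulting Y:N ratio, 6520/5015, matches the target value of 1.3.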
Let the model trained on $X$ and $\tilde{X}_m$ be the classifier $f$. Assuming that the predicted value for the test data is $\hat{y}$, the confidence of the observations with the label $y_m$ is as follows:
$$\ell_m = \frac{1}{|I_m|} \sum_{i \in I_m} \log P\big(\hat{y}^{(i)} = y_m \mid x^{(i)}\big), \qquad I_m = \{\, i : y^{(i)} = y_m \,\} \quad (3)$$
In Equation (3), $\ell_m$ and $\hat{y}^{(i)}$ are the mean log-likelihood of the test data and the predicted values for data with label $y_m$, respectively.
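Equation (3) averages log predicted probabilities over the test observations whose true label is $y_m$; exponentiating the average maps it back to a probability-scale confidence. A small sketch, with illustrative probabilities (the value −1.855 appearing in the comment is the baseline Valuation mean reported in Section 4.4):

```python
import math

def mean_log_likelihood(probs):
    """Equation (3): average of log P(y_hat = y_m | x) over observations with true label y_m."""
    return sum(math.log(p) for p in probs) / len(probs)

# Illustrative predicted probabilities for the correct class of four observations
ll = mean_log_likelihood([0.9, 0.8, 0.4, 0.7])
print(math.exp(ll))       # average confidence on the probability scale, about 0.67
print(math.exp(-1.855))   # about 0.157, matching the paper's baseline Valuation confidence
```

Note that the exponentiated mean is the geometric mean of the individual probabilities, so a few very low-confidence predictions pull it down sharply.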
In this paper, we propose a VB-based method to improve the calibrated confidence in text classification.
Figure 1 shows the architecture of the proposed methodology. First, the proposed method extracts quantitative indicators, text, and labels from the collected documents. Quantitative indicators refer to the non-text information in documents. A label is a category that represents the document, such as the sentiment or subject of the text. The proposed method scales the quantitative indicators for the effective convergence of VB. The function that converts the sample space of the input data $x$ is as follows:
$$x' = \frac{x - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}} \quad (4)$$
In Equation (4), Max and Min are the maximum and minimum values of the data, respectively. Therefore, the space of the input data is normalized to between 0 and 1.
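Equation (4) is ordinary min-max scaling; a one-function sketch:

```python
def min_max_scale(values):
    """Equation (4): map each value to (x - Min) / (Max - Min), i.e., into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([2, 4, 10]))  # [0.0, 0.25, 1.0]
```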
Next, for the progress of the proposed method, the labels are one-hot encoded. The one-hot encoding of the target variable $y$ with category $m$ is as follows:
$$\mathrm{Enc}(y_m) = e_m \in \{0, 1\}^{M} \quad (5)$$
In Equation (5), $e_m$ is a standard basis vector. If the number of categories is $M$, $e_m$ denotes a vector with a 1 as the $m$-th element and 0 elsewhere.
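Equation (5) maps a category index to a standard basis vector; a sketch using 1-indexed positions, as in the text:

```python
def one_hot(m, num_categories):
    """Equation (5): the standard basis vector e_m, with a 1 at the m-th (1-indexed) position."""
    return [1 if i == m else 0 for i in range(1, num_categories + 1)]

print(one_hot(2, 3))  # [0, 1, 0]: category 2 of 3
```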
Text is embedded into $p_2$ dimensions through a large-scale, pre-trained language model. This model transforms the document into a vector in real space that reflects its context. The $p_2$-dimensional space has the advantage that the distance or correlation between document vectors can be obtained. These variables are used as inputs to the VB-based generative model, whose output is a quantitative indicator and an embedding vector that reflect the features of a specific label.
In Figure 1, the encoder encodes the ($p_1 + p_2 + M$)-dimensional input data into a $d$-dimensional vector. At this time, $d$ is smaller than ($p_1 + p_2 + M$) because the latent space of the data needs to be extracted.
Figure A1a shows the encoder architecture of the VB-based generative model. The purpose of a generative model is to generate data similar to the training data. Therefore, the encoder multiplies the standard deviation ($\sigma$) of the encoded input data by the noise ($\epsilon$) and uses the sum of this product and the mean ($\mu$) as the latent vector. Then, the input data of the decoder are the latent vector and the one-hot encoded label.
Figure A1b shows the decoder of the VB-based generative model. To calibrate the confidence, the proposed method concatenates the generated and training data and then uses the result as the input for the classifier.
We propose three research hypotheses to establish the validity of the proposed methodology. They are as follows:
Hypothesis 1. The quantitative indicators of text have different distributions depending on the document’s purpose.
The model for classifying patents uses quantitative indicators, such as the number of claims and the number of inventors, as predictors. These predictors should be able to explain events such as technology litigation, valuation, and transfer. Hypothesis 1 expresses that the quantitative indicators will have different distributions depending on the target label. If the distributions of the indicators are similar, they will be unsuitable predictors for classifying the data [
65,
66]. Therefore, we assume that the indicators used in the proposed methodology reflect the data characteristics.
Hypothesis 2. In the latent space obtained using a generative model, each document according to a label will have a disentangled representation.
The proposed method calibrates the confidence by learning the data generated through VB. When classifying technology transfer, data that reflect the features of the transferred patent should be generated. Therefore, following Hypothesis 2, the latent space of a patent according to technology transfer in the generative model should be composed of disentangled variables. It is important to secure a disentangled latent space for the generative model; if the data characteristics are generated in entangled space, such as noise, it will have a negative effect on improving the performance of the predictive model [
67]. Therefore, we need to statistically test whether the latent space obtained from the generative model is disentangled depending on the data characteristics.
Hypothesis 3. The proposed method improves the calibrated confidence of document classification.
Finally, we need to calibrate the confidence of the document classification. This study extracts various predictors and builds a VB-based generative model. Next, the generative model generates data in the disentangled space. That is, Hypothesis 3 is the basis for judging whether the proposed method helps calibrate the confidence of the model. We expect that the confidence of a classification model that has undergone this process will be calibrated. To this end, this paper not only presents the results of the proposed method intuitively through various graphs but also secures the validity of the study through statistical tests.
Therefore, our research hypothesis focuses on calibrating the confidence of document classification and improving the validity of the classification model and the efficiency of the generative model. All hypotheses are statistically tested with the experiments in
Section 4.
4. Experimental Results
4.1. Dataset and Experimental Setup
Experiments are conducted to examine the proposed method’s applicability. The data used in the experiment were 11,444 US patents. The patents were collected from the WipsOn database.
Table 1 shows the predictors extracted from the collected data. The table lists the 10 ($p_1 = 10$) variables, from the number of claims to the number of family patents (famE), that are quantitative indicators. Emb$_{p_2}$ is a 384-dimensional ($p_2 = 384$) vector in which the text of the patent document is embedded. In the experiment, we used a transformer-based document embedding model to process the natural language of the patent data [68].
The experiments have three target variables. The first is
Litigation, which indicates whether a patent is litigated. Patent litigation is a process for claiming the prohibition of sale, compensation for damages, and return of unreasonable profits from a defendant accused of infringing on the rights of the plaintiff [
69]. Thus, patent litigation can inflict huge damage on a company, and companies need to predict patent litigation risks.
The second is Valuation, which is graded in accordance with the technology’s future value. Since the number of patents being filed has rapidly increased recently, it takes a lot of time and expense to search for prior art or vacant technology for TM. To improve this, experts provide a grade that evaluates the future value of a patent. Then, researchers can utilize high-grade technology to analyze patent data. Thus, we use the grades provided by the WipsOn database. The Valuation variable used in the experiment is a binary category that denotes whether the grade of a patent is high or low.
Finally,
Transfer is a target variable that indicates whether technology is transferred. Technology transfer, a strategy that can rapidly increase the technological competitiveness of a company or research institute, means transferring patents [
70,
71,
72]. The target variables used in the experiment,
Litigation,
Valuation, and
Transfer, often have imbalanced labels due to TM. Therefore, this study applies the proposed method to confirm the practical applicability of the three TM tasks.
In
Section 4.2,
Section 4.3, and
Section 4.4, we present the statistical tests performed for the three hypotheses in this study. Experiments were conducted individually, in accordance with the purpose of document classification. All statistical hypotheses were tested at the 0.05 significance level.
4.2. Comparison of Quantitative Variables Depending on the Purpose of Document Classification
Table 2 shows the results for Hypothesis 1. Statistical tests were used to compare the differences in the predictors depending on the target variables. For example, for patents with a history of litigation, the mean and standard deviation of citeP are 223.584 and 430.433, respectively. Levene’s test for homogeneity of variance showed that there was a statistically significant difference between the variances of the numbers of cited patents that were and were not litigated. Research Hypothesis 1 was supported by the t-test, Wilcoxon rank-sum test, and Kolmogorov–Smirnov test conducted under the assumption of equal variance. This is because the results of the statistical tests mean that the quantitative indicators of text have different distributions according to the document’s purpose. Therefore, there is a statistically significant difference in the number of cited patents depending on the litigation status.
4.3. Comparison of Representations in Latent Space Depending on Labels in Documents
This subsection describes the results of the statistical tests for Hypothesis 2.
Table 3 shows the distribution of target variables. In the raw data, the patents related to litigation (
Litigation = Y) are very few at 125 cases (1.092%). The percentages of high-grade patents (
Valuation = Y) and transferred patents (
Transfer = Y) are 10.154% and 23.086%, respectively. Through this, it is evident that the patent labels are imbalanced; therefore, this study aims to compare how the proposed methodology works depending on the label ratio. To evaluate the generative model $G$ and the classifier $f$, the raw data were divided into training data and test data in a 7:3 ratio.
Figure A1 in Appendix A summarizes the architecture of $G$ used in this paper. In the experiment, we set the dimension $d$ of the latent space of $G$ to 2 for the statistical test of Hypothesis 2. The design of the statistical test for Hypothesis 2 is as follows. First, the latent vectors of $G$ are divided by label into the two-dimensional $z_Y$ and $z_N$, respectively. However, the general Kolmogorov–Smirnov test deals with the homogeneity of the distributions of one-dimensional data. Therefore, the general Kolmogorov–Smirnov test and the multidimensional version of the Kolmogorov–Smirnov test [73,74] are applied depending on the dimensions of the latent vector. Then, we can determine whether the latent space for each condition is disentangled through statistical hypothesis testing. Since the latent variables are not entangled in $G$, only data belonging to a specific label can be generated. Furthermore, we conducted experiments depending on the types of predictors and generative models to compare their results. In
Table 4, Quant and Text are the results of using only the quantitative indicators and only the text of the documents, respectively. In addition, VAE refers to a generative model that does not assume conditions for a specific label in C-VAE.
In Table 4, $z_1$ and $z_2$ are the coordinate-wise vectors of the labeled documents obtained in the latent space. In the table, $z$ denotes the two-dimensional vector obtained by merging $z_1$ and $z_2$. As a result of the experiment, when the generative model was C-VAE and both the quantitative indicators and document texts were used as predictors, Hypothesis 2 was not rejected for all target labels. Therefore, the proposed method evidently disentangles the document representation for each label in the latent space.
4.4. Comparison of Improvements in Calibrated Confidence in Document Classification
The purpose of this subsection is to verify that the proposed method can calibrate the confidence of document classification through experiments on Hypothesis 3. For this, we generate data labeled Y until the label ratio reaches $\alpha$ and merge them with the training data. The optimal value of $\alpha$ was determined as shown in Appendix B.
Table A1 in Appendix B compares the prediction performance obtained using the proposed method. The optimal values of $\alpha$ are 1.5, 1.3, and 2.0 for the cases where the target variables are Litigation, Valuation, and Transfer, respectively. For example, when the target variable is Valuation, the optimal value of $\alpha$ is 1.3, and there were 566 and 5015 observations with labels Y and N in the training data, respectively. Using Equation (2), $G$ generated 5954 observations whose label is Y. When the raw data and generated data were concatenated, the number of observations with label Y became 6520. That is, the labels in the data augmented through the proposed method had a Y:N ratio of 1.3 (the optimal value of $\alpha$).
The classifier $f$ is trained on the merged training data and generated data.
Figure 2 shows the distribution of probabilities when the labels in the test data are predicted. In the
Litigation cases, the likelihood of the proposed method increased. In the
Valuation and
Transfer cases, the likelihood was higher when it was ≥0.4, indicating that the confidence was calibrated. Therefore, the proposed method can calibrate the confidence.
Finally, Figure 3a–c shows the comparison results for $\ell_m$ obtained when the test data with the actual label Y were applied to the baseline and the proposed method (see Equation (3)). The $\ell_m$ obtained through the proposed method tends to be higher than that of the baseline in all tasks.
Figure 3d–f shows the distributions of $\ell_m$ obtained through the baseline and the proposed method. The distributions in the figure indicate how well the proposed method secures calibrated confidence compared to the baseline. Therefore, we compare the homogeneity of the two distributions to statistically test Hypothesis 3. Avg_baseline in Table 5 is the average probability that an observation with actual label Y is correctly classified as Y by the baseline model; similarly, Avg_proposed is the value measured for the proposed method. For example, in Figure 3e, the mean of $\ell_m$ for the baseline is −1.855, which is 0.157 ($= e^{-1.855}$) when converted to a probability. Similarly, the mean of $\ell_m$ for the proposed method is −1.043, which is 0.353 when converted to a probability. When $\alpha$ for the Valuation case was 1.3, a miscalibration phenomenon resulting in a large difference between the confidence and the F1-score was observed at the baseline: the baseline confidence was 0.157, and its difference from the F1-score was 0.238, which is very large. Conversely, the difference between the value obtained by the proposed method and the F1-score was very small at 0.042. Calculating with the same logic for the other tasks, the proposed method left gaps of only 0.02 to 0.04 between confidence and F1-score; that is, its confidence was at a similar level to the observed performance. Therefore, our method can calibrate the confidence better than the baseline.
Figure 4 shows the results of comparing the differences between accuracy and likelihood depending on the generative models and predictors. When the target variables were
Litigation and
Valuation, the confidence was the most calibrated when the generative model was C-VAE and the predictors were document texts. When the target variable was
Transfer, the difference between accuracy and likelihood was smallest when the quantitative indicators were used together. As such, the appropriate generative model and predictors depend on the target label. However, as a result of the statistical testing for Hypothesis 2, only the proposed methodology could disentangle the data characteristics depending on the labels in the latent space. Therefore, we need to test Hypothesis 3 on the results obtained using the proposed methodology.
The confidence obtained using the proposed method tended to be higher than the baseline for all tasks. Thus, we compared the homogeneity of the confidence obtained through the baseline and the proposed method to test Hypothesis 3.
Table 5 shows that the null hypothesis was rejected in the Kolmogorov–Smirnov test, paired
t-test, and Wilcoxon signed-rank test. Therefore, it is possible to calibrate confidence in document classification through the proposed method. The results of the experiments conducted in this paper are as follows. First, the quantitative indicators of the patents differ depending on the purpose of document classification. Second, in the latent space of the generative models, documents have disentangled representations depending on their labels. Finally, the proposed method can increase the confidence of predictions by reducing the uncertainty in document classification.
5. Discussion
Recently, studies in various applications have pointed out that the confidence of predictive models does not reflect the statistical consistency between the accuracy and the probability of prediction. TM is one field with high prediction uncertainty for reasons such as the time lag in technological development and biased expert decision making. In particular, patents that reflect TM contain a lot of noise due to various factors. For example, through technology valuation, companies can find excellent technologies that they can then apply to their business models. However, as the value of any technology changes over time, businesses should consider the uncertainty inherent in the data to make the right decisions. Therefore, this study proposes a method that reduces the uncertainty surrounding TM.
Previous studies have mainly devised visualization methods for comparison of expected and observed accuracy, to approximate expected values for the difference between confidence and accuracy or to estimate calibration errors. Recently, researchers have discovered that a model’s confidence can be calibrated through the process of increasing the diversity of data representation. Therefore, this study used a VB-based generative model to augment the document representation in various ways. In addition, we were able to intuitively grasp the degree to which the confidence was calibrated by visualizing the predictive probability and log-likelihood obtained using the proposed method.
The experiment of the proposed method was carried out by collecting actual patents. The proposed method calibrated the distribution of prediction probability to be less biased than before (see
Figure 2). In particular, the probability of the predictive model correctly predicting the ground truth occurred frequently at ≥0.4. In addition, the log-likelihood of the data with actual label Y was larger than that of the baseline in all cases (see
Figure 3). Specifically, when the target variable was
Litigation, the difference between accuracy and F1-score decreased from 0.055 to 0.025. Similarly, when the target variable was
Valuation and
Transfer, the difference between the two measures decreased significantly (see
Figure 4). We found through experiments that the degree to which confidence was calibrated decreased as the proportion of labels became imbalanced.
To reduce uncertainty, previous studies have proposed methods for measuring the confidence of models. However, their limitation was that they measured the uncertainty of a model that had already been trained. Therefore, alternative methods were suggested that reduce uncertainty by calibrating confidence while training the model; for example, some impose a penalty based on a confidence score or shift the data. Building on these studies, we proposed a method to calibrate confidence in text classification tasks using imbalanced data. To this end, the VB used in this study (i) works well in various fields such as healthcare, (ii) is suitable for multimodal data such as patents, and (iii) is computationally scalable.
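The label-conditioned generation step can be illustrated with a much simpler stand-in than the paper's VB-based model: fit a diagonal Gaussian to the minority class in an embedding space and sample synthetic vectors for that label. All names here are hypothetical, and this sketch is not the proposed architecture:

```python
import random

# Illustrative stand-in for the VB-based generative step (the paper's
# model is not reproduced here): fit a diagonal Gaussian to the
# minority-class embeddings and sample synthetic vectors for that label.
def fit_diagonal_gaussian(vectors):
    dim = len(vectors[0])
    n = len(vectors)
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    var = [sum((v[i] - mean[i]) ** 2 for v in vectors) / max(n - 1, 1)
           for i in range(dim)]
    return mean, var

def generate_minority(vectors, n_samples, seed=0):
    # Sample each dimension independently from the fitted Gaussian.
    rng = random.Random(seed)
    mean, var = fit_diagonal_gaussian(vectors)
    return [[rng.gauss(mean[i], var[i] ** 0.5) for i in range(len(mean))]
            for _ in range(n_samples)]
```

The actual method conditions a learned latent variable on the label, which preserves far more structure than an independent Gaussian; the sketch only shows where label-conditioned augmentation fits in the pipeline.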
Nevertheless, this study has the following limitations:
This paper did not present an optimization method for finding the hyperparameter in the proposed methodology. This hyperparameter, which determines how much data are generated, is expected to be related to the precursors of the data. In the experiment, we determined it using a greedy search. However, methodologies or empirical guidelines for its optimization should be proposed;
The proposed method cannot easily guarantee calibrated confidence for multi-class classification. To examine the proposed methodology’s applicability, we conducted various statistical experiments. However, these experiments covered only binary classification. Future research should consider multi-class classification to reduce uncertainty in various TM tasks.
These limitations suggest directions for future research. First, a method for searching for an optimal hyperparameter value could be developed by analyzing the precursors of the data, because the smaller the precursors, the more data generation is required. Next, we expect that the proposed methodology can be applied to multi-class classification by improving the architecture of the generative models. However, a different approach is needed to statistically test Hypothesis 2 for multi-class classification.
6. Conclusions
This paper proposed a methodology that calibrates confidence using a generative model to reduce uncertainty in TM when analyzing patents with imbalanced labels. Research hypotheses were presented to ensure the proposed method’s validity. The first hypothesis is that the quantitative indicators of patents differ depending on the purpose of document classification. Patents are data that sufficiently reflect TM activities; therefore, predicting TM from these data requires that the quantitative indicators of a patent first be able to explain the target variable. The second hypothesis is that the latent variable obtained through the generative model is disentangled in accordance with the label of the patent. The proposed method generates data conditioned on a specific label for a patent; if patents are entangled in the latent space, more noise is added, and uncertainty may increase. The final hypothesis is that the confidence of TM predictions is calibrated through the proposed method. Thus, the proposed method is effective at reducing uncertainty.
The experiment was conducted to examine the practical applicability of the proposed method and to verify the research hypotheses. For the experiment, 10 quantitative indicators were extracted from 11,444 US patents. The text of each patent was transformed into a 384-dimensional vector through a transformer-based document embedding model. Using these variables, we applied the proposed method to Litigation, Valuation, and Transfer, which are representative TM tasks. The results of testing Hypothesis 1 showed that most quantitative indicators had statistically significant differences depending on the target variable. In other words, the quantitative indicators of patents are suitable for predicting TM. Next, by testing Hypothesis 2, we confirmed that the latent vector of a patent obtained through the generative model was disentangled in accordance with its label. Therefore, the proposed method can be used to calibrate confidence by selectively generating data for only a specific label. In the experiment, when the target variable was Valuation, we confirmed that the proposed method reduced the confidence from a maximum of 0.238 to 0.042. Similarly, when the target variables were Litigation and Transfer, the confidence decreased from its maximum of 0.179 to its minimum of 0.020. It was found that the proposed method calibrated the confidence for the three TM tasks because Hypothesis 3 was statistically significant.
In the future, it will be necessary to develop an architecture that combines the generative and predictive models. The proposed method uses a generative model to increase the precursors of the training data and thereby reduce uncertainty about the prediction. A disadvantage of this approach is that the results may fluctuate depending on the predictive model. Therefore, we expect that sustainable TM will be possible through the development of a methodology that merges generative and predictive models.