4.1. Simulation to Compare the Performance of FA and BFA
We employed BFA instead of traditional FA to identify latent technologies in the patent keyword data. FA estimates each factor loading as a single point, whereas BFA estimates a probability distribution for it, which provides additional statistical measures such as credible intervals to account for parameter uncertainty. Moreover, FA does not incorporate prior information about the parameters, while BFA introduces a prior distribution, which enables the identification of latent structures even with a small dataset. In this section, we perform a simulation to compare the performance of FA and BFA. The numbers of variables and factors in the simulation were set to 6 and 2, respectively. We generated the sample data from the true factor loadings shown in
Table 3.
To compare FA and BFA, we performed two experiments with different sample sizes (500 and 2000).
Figure 5 compares the results of FA and BFA on the generated data with a sample size of 500.
The
X-axis of the graph represents the true factor loadings, while the
Y-axis represents the factor loadings estimated by FA (left) and BFA (right). A greater deviation of the factor loading points from the diagonal indicates poorer model performance, as it fails to accurately reflect the true factor loadings [
11,
32,
33]. The results show that the factor loadings estimated by FA deviated slightly more from the diagonal compared to those estimated by BFA. Subsequently, we increased the sample size to 2000 and compared the FA and BFA performance.
Consistent with the results in
Figure 5,
Figure 6 shows that the factor loadings estimated by FA deviated further from the diagonal than those estimated by BFA. Additionally, we observed that the performance gap between FA and BFA increased as the sample size grew. In both experiments, therefore, BFA reflected the true factor loadings more accurately than FA.
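The FA side of this simulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the loadings in `TRUE_LOADINGS` are hypothetical placeholders rather than the values in Table 3, classical FA is estimated by iterated principal-axis factoring, and the estimated loadings are rotated toward the true ones before measuring their distance from the diagonal. BFA is not implemented here because it requires posterior sampling.

```python
import numpy as np

# Hypothetical true loadings (6 variables, 2 factors); NOT the values in Table 3.
TRUE_LOADINGS = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])

def simulate(loadings, n, rng):
    """Draw n rows from x = L f + e, scaled so each variable has unit variance."""
    p, k = loadings.shape
    unique_var = 1.0 - np.sum(loadings ** 2, axis=1)
    factors = rng.standard_normal((n, k))
    noise = rng.standard_normal((n, p)) * np.sqrt(unique_var)
    return factors @ loadings.T + noise

def principal_axis_fa(x, k, n_iter=50):
    """Classical FA loadings by iterated principal-axis factoring."""
    corr = np.corrcoef(x, rowvar=False)
    # Start from squared multiple correlations as communalities.
    communality = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    for _ in range(n_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, communality)
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:k]
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        communality = np.sum(loadings ** 2, axis=1)
    return loadings

def align(estimated, target):
    """Rotate estimated loadings toward target (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(estimated.T @ target)
    return estimated @ (u @ vt)

def loading_rmse(estimated, target):
    """Root-mean-square vertical distance of the points from the diagonal."""
    return float(np.sqrt(np.mean((estimated - target) ** 2)))

rng = np.random.default_rng(0)
for n in (500, 2000):
    x = simulate(TRUE_LOADINGS, n, rng)
    est = align(principal_axis_fa(x, 2), TRUE_LOADINGS)
    print(n, round(loading_rmse(est, TRUE_LOADINGS), 4))
```

The rotation step is needed because factor loadings are identified only up to an orthogonal rotation; without it, a correct solution could still appear far from the diagonal.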
4.2. Experimental Data for Digital Therapy Technology
In this paper, we used patent documents related to digital remedies and therapeutics. Digital therapy is a method of treating a patient’s illness using digital technologies such as software and data [
34,
35]. We retrieved the patents from patent databases across the world [
36,
37]. After text mining and valid patent selection, we obtained 2685 patent documents and 675 terms. For this experiment, we chose 30 patent keywords highly related to digital therapy from the terms that appeared more than 100 times across all the patent documents. The chosen keywords are as follows: device, data, patient, user, control, information, monitoring, measurement, sensor, therapy, computing, remote, image, interface, signal, display, agent, analysis, network, predict, diagnostics, program, healthcare, brain, electron, machine, database, learn, software, and wireless. Thus, we constructed a matrix of 2685 documents by 30 patent keywords, in which each element is the frequency of a keyword in a patent document. The keywords were treated as the variables in our model. Next, we present the analysis results and their application to a practical domain.
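The document-by-keyword frequency matrix described above can be sketched as follows. The toy documents and the three-keyword list are hypothetical and only illustrate the layout; the study's matrix has 2685 rows and 30 columns, and its tokenization steps are not specified here.

```python
from collections import Counter

def keyword_matrix(documents, keywords):
    """Return one row per document and one column per keyword (raw counts)."""
    rows = []
    for tokens in documents:
        counts = Counter(tokens)
        rows.append([counts[k] for k in keywords])
    return rows

# Hypothetical, already-tokenized toy documents (not the study's data).
docs = [
    ["device", "sensor", "data", "device"],
    ["patient", "data", "monitoring"],
]
keywords = ["device", "data", "patient"]
matrix = keyword_matrix(docs, keywords)
```

Each row of `matrix` corresponds to one patent document and each column to one keyword, so the columns play the role of the variables in the factor model.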
4.3. Analyzing Patent Keywords of Digital Therapy Technology
Using the 30 patent keywords, we carried out BFA to understand the technology of digital therapy. First, to determine the optimal number of factors, we calculated the eigenvalue of each factor.
Figure 7 shows the eigenvalues for all the factors.
Among the 30 factors, the eigenvalues of the top two (F1 and F2) are very large compared to the others. Because a larger eigenvalue means greater explanatory power, we present the eigenvalue and the percentage of explanatory power of each factor in
Table 4.
As illustrated in
Figure 7, the eigenvalues of Factors 1 and 2 were 5.2330 and 3.2453, respectively, substantially higher than those of the remaining factors. Following the standard criterion for FA, under which factors with eigenvalues equal to or greater than 1 are retained, we selected the top 10 factors for this study. From the results in
Table 4, the 10 selected factors have a combined explanatory power of 62.28%. To define each factor as a latent variable, we list the keywords included in each factor and their loading values in
Table 5.
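The eigenvalue screening described above can be sketched as follows. This is a minimal sketch assuming the eigenvalues are taken from the correlation matrix of the keyword variables, which the text does not state explicitly; the function name `kaiser_selection` is hypothetical.

```python
import numpy as np

def kaiser_selection(x):
    """Eigenvalues of the correlation matrix, their shares, and the number >= 1."""
    corr = np.corrcoef(x, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending order
    explained = eigvals / eigvals.sum()                # share per factor
    n_keep = int(np.sum(eigvals >= 1.0))               # eigenvalue >= 1 rule
    return eigvals, explained, n_keep

# Hypothetical random data standing in for the document-keyword matrix.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 5))
eigvals, explained, n_keep = kaiser_selection(x)
cumulative = float(explained[:n_keep].sum())  # combined share of retained factors
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue of 1 corresponds to a factor that explains as much as one original variable, which is the rationale for the threshold.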
We confirmed that Factor 1 is a latent variable represented by the keyword ‘device’. Factor 2 is represented by five keywords (data, database, analysis, healthcare, and diagnostics); because ‘data’ has the largest loading among them, we defined Factor 2 around the keyword ‘data’. We defined the latent variables for the remaining factors in the same way, using their keywords and loadings. The resulting definitions for all the selected factors are shown in
Table 6.
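The factor definition step can be sketched as follows. This is an illustration only: it assumes each keyword is assigned to the factor on which it has the largest absolute loading, which the text does not state, and the loading values are hypothetical rather than taken from Table 5.

```python
def define_factors(loadings, keywords):
    """Group keywords by their dominant factor and label each factor.

    loadings: one row per keyword, each row a list of factor loadings.
    """
    n_factors = len(loadings[0])
    groups = {f: [] for f in range(n_factors)}
    for kw, row in zip(keywords, loadings):
        best = max(range(n_factors), key=lambda f: abs(row[f]))
        groups[best].append((kw, row[best]))
    labels = {}
    for f, members in groups.items():
        # The keyword with the largest loading names the factor.
        members.sort(key=lambda m: abs(m[1]), reverse=True)
        labels[f] = members[0][0] if members else None
    return groups, labels

# Hypothetical loadings for four keywords on two factors.
kws = ["device", "data", "database", "analysis"]
lam = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.6], [0.3, 0.5]]
groups, labels = define_factors(lam, kws)
```

With these toy values, the second factor collects ‘data’, ‘database’, and ‘analysis’ and is labeled by ‘data’, mirroring how Factor 2 is defined in the text.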
In our experiment, since each keyword corresponds to a detailed technology, the 10 factors defined in
Table 6 represent the technologies required for the development of digital remedy and therapeutics technology. Using the results of the BFA, we carried out social network visualizations. First, we show the social network visualization using all 30 patent keywords, without performing BFA, in
Figure 8.
In the social network shown in
Figure 8, the threshold of the correlation coefficient for connecting two keywords was set to 0.15. We analyzed various network structures by adjusting the threshold and selected the value that produced the social network that best explains digital therapeutics. Seven keywords (program, healthcare, agent, analysis, diagnostics, therapy, and predict), although required for digital therapeutic technology, were isolated and not connected to the rest of the network. To examine the connected part of the network in more detail, we repeated the visualization after excluding these seven keywords, and the results are presented in
Figure 9.
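The thresholded keyword network can be sketched as follows. This is a minimal sketch assuming an edge is drawn when the signed Pearson correlation between two keyword columns is at least the threshold; whether the study used signed or absolute correlations is not stated. The function name and toy data are hypothetical.

```python
import numpy as np

def threshold_network(x, names, threshold=0.15):
    """Build an undirected graph from a correlation threshold.

    Returns the adjacency sets and the list of isolated nodes.
    """
    corr = np.corrcoef(x, rowvar=False)
    adj = {name: set() for name in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if corr[i, j] >= threshold:
                adj[names[i]].add(names[j])
                adj[names[j]].add(names[i])
    isolated = [name for name in names if not adj[name]]
    return adj, isolated

# Hypothetical toy frequencies: columns a and b move together, c does not.
toy = np.array([
    [1, 1, 1], [2, 2, -1], [3, 3, -1],
    [4, 4, 1], [5, 5, 1], [6, 7, -1],
], dtype=float)
adj, isolated = threshold_network(toy, ["a", "b", "c"])
connected = {k: v for k, v in adj.items() if k not in isolated}
```

Dropping the isolated nodes, as in `connected`, corresponds to the step from the full network to the reduced one.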
Figure 9 is the part of
Figure 8 that remains after the seven isolated keywords are removed.
Figure 9 therefore shows the structure of the connected keyword nodes more clearly, while
Figure 8 still conveys the entire network, including the isolated keyword nodes. We observed that 12 keywords (device, data, patient, information, monitoring, computing, remote, interface, signal, network, software, and wireless) are positioned at the center of the keyword network describing digital therapeutic technologies. Additionally, five keywords (data, computing, signal, learn, and software) function as intermediaries that connect the other keywords. To confirm the importance and relevance of each keyword, we calculated three measures commonly used in social network analysis (degree, closeness centrality, and betweenness centrality) and show them in
Table 7.
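The three measures can be sketched as follows for an undirected, unweighted graph. This is an illustration under stated assumptions: closeness is computed within each node's connected component as the number of reachable nodes divided by the sum of distances to them (0 for isolated nodes), and betweenness is left unnormalized, using Brandes' algorithm. The normalization used in the study is not stated, so the values need not match Table 7.

```python
from collections import deque

def centralities(adj):
    """adj maps each node to the set of its neighbours (undirected graph)."""
    nodes = list(adj)
    degree = {v: len(adj[v]) for v in nodes}
    closeness = {}
    betweenness = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total > 0 else 0.0
        # Back-propagate path dependencies (Brandes' accumulation step).
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    for v in nodes:
        betweenness[v] /= 2.0  # each undirected pair was counted twice
    return degree, closeness, betweenness

# Hypothetical toy graph: a path a-b-c plus an isolated node d.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
degree, closeness, betweenness = centralities(graph)
```

In the toy graph, the middle node lies on the only shortest path between the two ends, so it alone has nonzero betweenness, while the isolated node scores 0 on all three measures.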
In terms of degree, the results in
Table 7 are distributed very widely, from the keywords ‘monitoring’ and ‘computing’, with the highest degree of 28, to the seven isolated keywords with a degree of 0. Closeness centrality is also widely distributed, from the keyword ‘computing’, with the highest value of 0.6207, to seven keywords with a value of 0. Finally, betweenness centrality ranges from the keyword ‘data’, with the largest value of 62.7908, to 11 keywords with a value of 0. Next,
Table 8 shows the top 10 keywords based on degree and centrality.
We found that the top 10 keyword lists for degree and closeness centrality are identical, whereas the list for betweenness centrality differs from both. From the results in
Table 7 and
Table 8, we can see that the keywords
data,
information,
computing,
signal,
device, and
software are important to developing digital therapy technology. Next, in
Figure 10, we present a social network visualization using the top 10 factors and BFA.
From the results in
Figure 10, we found that Factors 1, 3, and 9 are important and necessary for digital therapy because they are connected to many other factors and are centrally located in the network. That is, technologies based on device systems, patient monitoring, and signal processing are the core technologies of digital therapy. Factors 5 and 7, representing electronic control and sensing systems, respectively, are also central to developing digital therapy technology. We confirmed that data analysis, the most important technology in digital therapy, is directly connected to four technologies: device system, patient monitoring, software agent, and display system. To investigate the technology network of the 10 factors in more detail, we computed the degree and centrality of each factor and show the results in
Table 9.
The degree results show that Factors 1, 3, and 9 have values larger than 10. In addition, the closeness centralities of Factors 1, 3, and 9 rank in the top three. Thus, these three factors are the major technologies in digital therapy. The betweenness centrality result, however, differs slightly from those for degree and closeness centrality: Factor 2 enters the top three. To compare the importance of all the factors within the network, we rank the factors in terms of degree and centrality in
Table 10.
From the results in
Table 10, we found that the rankings by degree and closeness centrality are the same, whereas the ranking by betweenness centrality differs slightly. Among the top three factors by betweenness centrality, Factor 2 replaces Factor 9, which is in the top three by degree and closeness centrality. Using the results in
Figure 10 and
Table 6,
Table 7,
Table 8 and
Table 9, we constructed a technology diagram for digital therapy in
Figure 11.
We built the technology diagram using the factor definitions described in
Table 6 instead of the factor numbers. In
Figure 11, the four sub-technologies within the square box represent Factors 1, 2, 3, and 9, while the six sub-technologies outside the box correspond to the remaining factors. All the sub-technologies required in the field of digital therapeutics are based on data analysis technology. In addition, the technologies related to remote patient monitoring, device systems, and signal processing are the central technologies in digital therapy.