Evaluation of Online Communities for Technology Foresight: Data-Driven Approach Based on Expertise and Diversity

Kim, Youngjun; Son, Changho

doi:10.3390/su142013040

by Youngjun Kim ¹ and Changho Son ²,*

¹ Samsung SDS, Seoul 05510, Korea
² Department of Weapon System Engineering, Korea Army Academy at Yeong-Cheon, Yeongcheon-si 770-849, Korea
* Author to whom correspondence should be addressed.

Sustainability 2022, 14(20), 13040; https://doi.org/10.3390/su142013040

Submission received: 24 August 2022 / Revised: 2 October 2022 / Accepted: 10 October 2022 / Published: 12 October 2022

(This article belongs to the Special Issue Machine Learning, Data Mining and IoT Applications in Smart and Sustainable Networks)

Abstract

This study proposes a framework for selecting and validating data sources for public-based technology foresight. In other words, it identifies which of the many online communities are valuable data sources. Specifically, we evaluate the usefulness of text data from online communities for technology foresight in terms of expertise and diversity. To this end, we conduct a bibliometric analysis using metadata and apply topic modeling techniques for a semantic analysis of the texts. As a case study, we selected 20 candidate communities where discussions and predictions related to technology are made and applied the newly proposed metrics. This study is expected to provide a basis for public participation in technology foresight, rather than leaving it to a few experts.

Keywords: technology foresight; online communities; emerging technologies

1. Introduction

In traditional technology-foresight processes, research has been conducted based on experts in each technology field, and these experts have played a key role in decision- and policy-making. Their expert knowledge generates harmonized descriptions about possible future directions [1,2]. However, these traditional, closed-loop foresight activities are constrained by the various cognitive limitations of the experts involved.

In line with this, technology foresight is now acknowledged not only as an area for experts, but also as a discipline for the general public [3,4]. Instead of simply relying on experts discussing future developments, new approaches also include external sources, such as suppliers, research institutes, users, and online communities. By integrating such external sources, different points of view can be brought into the foresight process, resulting in collective intelligence [5]. Further, a large number of users and communities are involved in predicting the future society, and their influence has increased, owing to a reduction in the information gap and better accessibility [6]. Combining the concepts of corporate foresight with the research on open and user innovation leads to a recently developed process called “Open Foresight” [7,8].

Moreover, the technological development of the internet in the direction of a participative approach, the so-called Web 2.0 [9], changed usage behavior dramatically. This evolution is characterized as being user-centered and focuses on interactive forums which foster user activities, such as co-creation and communication [10,11,12]. With these developments, online communities emerged, and individuals around the world have been able to freely share their views on shared interests and common goals using an internet platform [13]. Moreover, online communities possess the required expertise for foresight and might help by contributing to a more comprehensive understanding of the future. Their knowledge makes them especially valuable for foresight processes and, therefore, a systematic integration of online communities might reduce uncertainty about future changes.

Yet, it cannot be said that the enormous quantitative expansion of the online community has accompanied the qualitative development of the information contained therein [13]. It is, therefore, necessary to consider the intrinsic quality level of the content of the actual online community. However, most studies that utilized online communities’ data as a source of technology foresight focused only on the external characteristics of the online communities. Reliable public-opinion mining requires that online communities be categorized based on actual intrinsic content, activities, and characteristics of online communities. Hence, a systematic and quantitative evaluation and classification framework for the online community is now needed for a public-based foresight process.

In response, this study proposes a new online-community assessment framework for technology foresight. Specifically, it aims to quantitatively evaluate the elements that make up the online communities where technology-related discussions take place. In addition, the focus is on two criteria, expertise and diversity, to select technology-related posts from various users.

The remainder of this paper is organized as follows: First, the Related Works section presents a literature review of the characteristics of online communities and the techniques used to evaluate them. Second, we review the existing literature on the use of the online community for foresight activities, and evaluate the criteria of the online community. Next, the online-community evaluation framework proposed in this study is explained in detail. Then, the results of applying this technique to 20 candidate technology-related online communities are illustrated. Finally, our discussion and concluding remarks are provided.

2. Related Works

2.1. Online Communities and Open Foresight

Due to the rapid development of information and communication technology, countless online websites have been created since the late 20th century, and research on utilizing them has been conducted in various fields. Several definitions of the term online community have been proposed. The most representative and accepted description of online communities is, “social aggregations that emerge from the Net when enough people carry on public discussions long enough, with sufficient human feeling” [14]. Although detailed definitions may vary depending on the size, purpose, and characteristics of members of the community, in this study, the term online community is used as the conventional concept stated above. Especially in the 2010s, with the spread of smartphones, the limitations of space and time were almost eliminated, and the volume of information has continued to grow exponentially.

Due to these changes, studies have been conducted that utilize opinions generated online for foresight and innovation activities. In other words, the online community is being used as the main source for open foresight. Zeng [15] presented a method to utilize the online community for open innovation and conducted a case study to integrate the online community into the product development process of SMEs. However, this study did not proceed with the evaluation of the contents generated in the community. Shakhovska et al. [16] introduced a model that utilizes the contents of the discussion that occurs in the virtual community for marketing activities. They considered community content in terms of topicality, proficiency, and timeliness. An indicator for evaluating the online community was presented, but it was for marketing purposes, not technology foresight, and no actual case study was conducted. Antons et al. [17] introduced a technique that uses both a bibliometric approach and text-mining technique for the purpose of innovation research based on journal articles.

These articles can be regarded as prior work for this study in that they utilize online community and web data for open foresight. Yet, to our knowledge, no studies have been conducted to determine which online communities are suitable for the specific purpose of technology foresight. In this context, we introduce evaluation criteria and methods, and conduct a case study in the following sections.

2.2. Assessment of Online Communities

Since a mass of websites exist on the internet, it is necessary to identify the “right” online communities [13]. To exploit the myriad of data from many online communities, it is essential to avoid the harm of garbage in, garbage out. In particular, more careful data screening is needed to sort online communities for the specific purpose of technology foresight.

Pioneering studies in this regard have been carried out mainly in the field of computer science. PageRank is an algorithm used by Google Search to rank websites [18]. It works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. Since the development of the PageRank algorithm, various subsequent studies have been conducted. The Expertise Network [19] was applied to produce a user expertise ranking of a Java developer bulletin board. It measures expertise scores through the question-answer network. Additionally, the SPEAR (spamming-resistant expertise analysis and ranking) algorithm [20] was developed to estimate the quality of shared items and the expertise of users with respect to a particular topic of interest at the same time. Likewise, ExpertRank [21] evaluates expertise based on both document-based relevance and one’s authority in his or her knowledge community. It modifies the PageRank algorithm to evaluate one’s authority so that the effect of certain biasing communication behavior in online communities is reduced.
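The link-counting idea behind PageRank can be illustrated with a minimal power-iteration sketch. This is only an illustration of the general technique, not the implementation used by any search engine or by the studies cited above; the damping factor of 0.85, the iteration count, and the function name `pagerank` are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Rank pages by incoming links (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    damping and iterations are assumed values, not taken from the paper.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

In a toy graph where pages "a" and "b" both link to "c" and "c" links back to "a", page "c" ends up with the highest rank and "b", which receives no links, with the lowest.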

2.3. Expertise and Diversity of Online Communities

To develop a framework for evaluating online communities, we first need to discern their characteristics. Since this study aims to screen online communities that can be a data source for public-based technology foresight, a literature review was conducted on characteristics that are considered to be able to better reflect technology-related foresights of people rather than the general characteristics of online communities.

There are several characteristics that make the online community useful in foresight activities. First, the expert knowledge available from users in various domains is essential for foresight activities [13]. The main reason why participation in future studies could be extended from a small number of experts to a large number of users is that the expertise of the public has increased due to the narrowing of the information gap. In addition, there is no time or space limitation on the communication between users in the online community. This boosts intense interaction between users and generates additional data. Since most of the discussions and articles in online communities are free of charge and accessible, this is an advantage for researchers. A further benefit is the size of online communities. The number of users in each online community can range from as few as hundreds to as many as tens of millions. Correspondingly, this scale guarantees a thematically broad spectrum from the perspective of exploring the future [22].

In light of this, this study establishes two criteria, expertise and diversity, for evaluating online communities. First, expertise in this context refers to the skills and knowledge of individuals in a particular technology [23]. Expertise is important for foresight because the factors affecting the future are manifold and complex, and thus, lead to a high degree of uncertainty. To reduce this uncertainty, sound knowledge from different areas is needed. In terms of online communities, content produced by users with high expertise can make the community more active, and these participants often become lead users in the long run [24]. Second, diversity is an indicator of how varied the themes and scope of the technologies being addressed in the community are. The complexity of today’s problems requires an inclusion of different perspectives. Individual experts often have a limited capability to solve complex and interdisciplinary problems alone and are, therefore, dependent on interactions [25]. Online communities offer a mixture of different areas promoted by the features of Web 2.0, and their diversified themes could be a promising way to achieve diversity in foresight activities.

3. Measurement of Expertise and Diversity

This section describes the online-community evaluation framework for technology foresight presented in this study. Table 1 provides a summary of the ways in which online-community diversity and expertise can be measured. As shown, both qualitative and quantitative approaches can be used to evaluate online communities. However, since this study aims to carry out data-driven measurements, we exclude the purely qualitative approaches among the criteria in Table 1. In other words, unmeasurable or external characteristics of the online community are not taken into account. Namely, demographic information such as gender and age of users, and the accessibility and structure of the community, are not considered in the metrics. Instead, a bibliometric analysis and text analysis of the data are performed, focusing on the data to be utilized in the actual foresight activity.

3.1. Expertise

The expertise of online communities can be measured with several indicators. First, members’ profiles can be used, providing information such as professional background, occupation, and hobbies. This information is intuitive and direct and can confirm the user’s expertise. However, in most communities it is illegal or impossible to collect such personal information, so it is not considered in the metrics proposed in this study.

User expertise can also be divided into user experience and technology-related knowledge [26]. User experience can be measured by the total number of posts. However, measuring expertise by simply counting the number of posts or articles generated within a community is rather one-dimensional and may involve a leap of logic. Another method is to confirm user experience by the activity of users in the community [10]. To this end, members in the community are classified into three groups: innovators/activists, tourists/crowd-followers, and lurkers. If the innovators/activists show the highest level of activity, that is, function as lead users, the user experience can be indirectly determined. In this study, the level of activity is assessed by the ratio of activists. In other words, the higher the percentage of activists, or lead users, the higher the expertise. Here, the percentage of activists is calculated by identifying the users who write high-impact articles for a topic. To this end, latent semantic analysis (LSA), a technique that analyzes relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms, is utilized [27]. Applying LSA yields the impact of each document on each topic. In this study, a user is classified as an activist when more than 50% of the articles written by that user fall within the top 20% of articles by impact on the topics.
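The activist rule can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes that a document-by-topic impact matrix has already been obtained from an LSA decomposition, and it interprets "high impact" as ranking in the top 20% of documents for at least one topic. The function name `activist_ratio` and its parameters are hypothetical.

```python
from collections import defaultdict

def activist_ratio(doc_authors, doc_impacts, top_share=0.2, min_share=0.5):
    """Return the fraction of users classified as activists.

    doc_authors: author of each document, in document order.
    doc_impacts: one list per document with its impact on each topic
                 (assumed to come from an LSA decomposition).
    top_share:   share of documents per topic treated as high impact (20%).
    min_share:   share of a user's documents that must be high impact (50%).
    """
    n_docs = len(doc_authors)
    n_topics = len(doc_impacts[0])
    n_top = max(1, int(n_docs * top_share))

    # Assumption: a document counts as high impact if it ranks in the
    # top share for at least one topic (by absolute impact).
    high_impact = set()
    for t in range(n_topics):
        ranked = sorted(range(n_docs),
                        key=lambda d: abs(doc_impacts[d][t]),
                        reverse=True)
        high_impact.update(ranked[:n_top])

    written = defaultdict(int)  # documents written per user
    hits = defaultdict(int)     # high-impact documents per user
    for d, author in enumerate(doc_authors):
        written[author] += 1
        if d in high_impact:
            hits[author] += 1

    activists = [a for a in written if hits[a] / written[a] > min_share]
    return len(activists) / len(written)
```

Other readings of the rule are possible, for example requiring high impact on a user's dominant topic only; the thresholds are kept as parameters so that such variants are easy to try.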

Furthermore, technology-related knowledge is measured by the number of technical terms used in articles. In other words, by measuring how many professional or technological keywords occur in the articles written by users, the expertise of the community is determined. In this study, the knowledge level was calculated based on keywords, and the process is as follows: High-frequency keywords are extracted from the full text data collected from each community. The analyst then scores the 50 most frequent keywords on a Likert scale. Each keyword is qualitatively evaluated for how closely it is related to emerging technologies. The average of these values is then taken as the knowledge level of each community.
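The keyword-based procedure can be sketched as below. It is a minimal illustration under stated assumptions: the text has already been tokenized and cleaned, and the analyst's Likert ratings are supplied as a dictionary. The function name `knowledge_level` and the argument `likert_scores` are hypothetical, and the range of the Likert scale is not specified in the text.

```python
from collections import Counter

def knowledge_level(tokens, likert_scores, top_k=50):
    """Average analyst rating of the top_k most frequent keywords.

    tokens:        list of (already cleaned) keywords from one community.
    likert_scores: dict mapping a keyword to the analyst's rating of how
                   closely it relates to emerging technologies.
    """
    # Extract the most frequent keywords in the community's text.
    top_keywords = [word for word, _ in Counter(tokens).most_common(top_k)]
    # Every top keyword is expected to have been rated by the analyst.
    scores = [likert_scores[word] for word in top_keywords]
    return sum(scores) / len(scores)
```

A missing rating raises a `KeyError` on purpose, so that an unrated keyword is noticed instead of silently lowering the score.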

3.2. Diversity

One of the core ideas of open foresight is collective intelligence based on the participatory structure, and the diverse scenarios and predictions derived from it [5]. Therefore, diversity can be considered an important evaluation factor for an online community, which has low entry barriers because it is not constrained by time and space. First, the number of members in a community can be an indirect indicator. However, in this case, it may include users with no actual activity, and in some communities, it may be difficult to distinguish between users who post technological articles and those who do not.

Another approach is to analyze the average number of authors per article. The average number of authors per document is calculated by dividing the number of members who have written a post in one thread by the number of posts. There is a good chance that more topics will exist in the community if many people post their opinions rather than a small number of users writing a lot. However, this approach is hard to apply to a community where the number of users cannot be tracked due to anonymity, or to a community where a small number of people post and discuss within the post. Therefore, in this study, only online communities in which the author of each article can be clearly identified were selected, and when a reply or comment was written for one article or thread, each author was also counted separately.
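The ratio described above can be sketched as follows. The per-thread ratio (distinct authors divided by posts) follows the text; averaging the per-thread ratios over a community is an assumption made for this example, as is the function name `authors_per_document`.

```python
def authors_per_document(threads):
    """Mean ratio of distinct authors to posts across threads.

    threads: list of threads, each a list with the author of every post,
             reply, or comment in that thread.
    """
    ratios = []
    for authors in threads:
        # Distinct members who wrote in the thread, divided by its posts.
        ratios.append(len(set(authors)) / len(authors))
    return sum(ratios) / len(ratios)
```

For example, a thread with three posts by two distinct authors contributes 2/3, while a single-post thread contributes 1.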

Diversity also can be measured by the number of themes, or topics, included in articles generated in a community. In other words, analyzing the contents of the articles written by users makes it possible to find out how many topics are covered in a community. To this end, topic modeling techniques, which analyze latent topics in the document corpus through the probability generation model, can be utilized.

One of the most representative and popular topic modeling techniques is latent Dirichlet allocation (LDA) [28]. However, since LDA is a parametric model, it is necessary to set the number of topics K in advance. It is crucial to select an appropriate K value because the results of LDA topic modeling vary greatly depending on this value. However, it is difficult to know in advance how many topics are covered in the data. Therefore, in general, an analysis is performed on various K values, and an appropriate K value is selected based on the perplexity value or whether the interpretation is reasonable from the analyst’s point of view. This approach has the disadvantage that trial and error is indispensable, and it may lack objectivity because it requires the analyst’s qualitative intervention.

In this study, the hierarchical Dirichlet process (HDP), a kind of topic modeling technique, is used to measure the optimal number of topics [29]. The HDP is a nonparametric Bayesian approach to clustering grouped data. Unlike other topic modeling techniques where the number of topics must be determined in advance, the HDP technique automatically derives the optimal number of topics. Therefore, it can be effectively used in a situation where the number of topics existing in the entire corpus should be checked.
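For reference, the standard generative formulation of the HDP can be written as below. The notation is the conventional one for this model and is not taken from this paper: j indexes documents, i indexes words within a document, H is a base distribution over topics, and γ and α₀ are concentration parameters.

```latex
\begin{aligned}
G_0 \mid \gamma, H &\sim \mathrm{DP}(\gamma, H), \\
G_j \mid \alpha_0, G_0 &\sim \mathrm{DP}(\alpha_0, G_0), \\
\theta_{ji} \mid G_j &\sim G_j, \\
x_{ji} \mid \theta_{ji} &\sim F(\theta_{ji}).
\end{aligned}
```

Because the corpus-level measure G₀ is discrete, all documents draw their topics from the same countable set of atoms, and the number of atoms actually used is inferred from the data rather than fixed in advance.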

4. Results

4.1. Data Description

To apply the online-community evaluation framework suggested in this study to the actual case, we investigated various websites. Since the purpose is to select an online community for technology foresight, we identified communities that provide opinions or information related to the future of emerging technologies. In order to gather information from users of various backgrounds, we targeted diverse types of online communities such as blogs, news sites, online forums, and social media.

Twenty online communities were selected as candidates, as shown in Table 2. The communities to be analyzed were selected based on three criteria. First, is the main purpose of the community to analyze and discuss the prospects of the future society? This criterion is met by communities such as Future Timeline, Singularity Weblog, and World Future Society. These communities encompass both experts’ and the general public’s opinions on technology foresight. Second, does the community deal with trends in advanced technologies and the services or products using them? These include communities such as TechCrunch, The Verge, and CNET, which mainly contain articles by experts on the possibility of various social changes caused by technology. Third, can the general public freely express their opinions without restrictions on the subject? These include communities such as Reddit, Twitter, and Quora. These communities discuss more diverse and free topics than those in the previous examples; thus, from the point of view of public foresight, they were also included as candidates.

We then crawled data related to emerging technologies or foresight from each website. Each website has a different format and type, and thus, the process of extracting the data also differed. For example, news-driven communities, such as Business Insider, classify articles into sections in advance, and thus, articles in technology categories were collected. In the case of Future Timeline, on the other hand, since the purpose of the website itself is future research, all blog posts were consistent with the purpose of this study. In the case of social media such as Twitter, results retrieved through technology-related search queries were collected. The description of the data collected from each source is shown in Appendix A, Table A1. As a result, 15,396 articles were collected from a total of 20 communities.

4.2. Data Analysis

Based on the text data collected from each community, preliminary work was performed to measure diversity and expertise. This was carried out using a mixture of a bibliometric approach and text analysis, the results of which are summarized in Table 2.

Each community’s expertise measurement results can be summarized as follows: First, in the evaluation of the knowledge level of each community, communities where various technical terms are used, such as the World Future Society and Kurzweil AI, scored high. On the other hand, Twitter and Quora, which have characteristics of social media, and Business Insider, where articles are focused on industry rather than technology, scored relatively low.

In terms of the ratio of activists, MIT Technology Review was the highest and KDnuggets and Business Insider were relatively low. In general, technology-related webzine communities scored high.

Each community’s diversity measurement results can be summarized as follows: First, the ratio of authors per document was the highest in Reddit and Twitter, and relatively low in Business Insider and Kurzweil AI. Although the deviation is not large compared to other metrics, the scores of communities actively sharing opinions through comments or citations tend to be high.

Additionally, the optimal number of topics of each community was measured based on the topic modeling technique, the HDP. As a result, Singularity Hub and Wired were the highest and TechRadar was the lowest. In some communities where articles or columns are written intensively on a specific subject or theme, fewer than 10 optimal topics were covered. Except for these, it could be seen that there are generally around 20 topics of discussion about the future of emerging technologies.

Based on the previously calculated results, the diversity and expertise values of each online community are finally derived. First, the values of the level of knowledge, ratio of activists, ratio of authors per document, and optimal number of topics are standardized. The sum of the standardized level of knowledge and ratio of activists is the value of expertise, and the sum of the standardized ratio of authors per document and optimal number of topics is the value of diversity. The result of this calculation is shown in Appendix A, Table A1 and Table A2.
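The aggregation step can be sketched as follows. The text says only that the four indicators are "standardized"; this sketch assumes z-scores computed with the population standard deviation, which is one common reading. The function names are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-scores of a list of values (population standard deviation assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def expertise_and_diversity(knowledge, activists, authors, topics):
    """Combine four indicator lists (one entry per community) into two scores.

    expertise = z(level of knowledge) + z(ratio of activists)
    diversity = z(authors per document) + z(optimal number of topics)
    """
    z_knowledge, z_activists = standardize(knowledge), standardize(activists)
    z_authors, z_topics = standardize(authors), standardize(topics)
    expertise = [k + a for k, a in zip(z_knowledge, z_activists)]
    diversity = [u + t for u, t in zip(z_authors, z_topics)]
    return expertise, diversity
```

Because each standardized indicator has zero mean, both scores are centered on zero, which is what makes the sign of a score meaningful as a cut-off.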

In addition, the result of mapping the previous results by using diversity and expertise as the two axes on the two-dimensional plane is shown in Figure 1. In this map, online communities with high diversity and expertise, which are in the first quadrant, are finally selected as a suitable source for technology foresight. A description of the nine online communities that are considered advisable for public-based technology foresight is summarized in Table 3.
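The quadrant assignment can be sketched as below, assuming that diversity is plotted on the horizontal axis and expertise on the vertical axis, and that a score of exactly zero counts as low. Both the treatment of zero and the function names are assumptions made for this example.

```python
def quadrant(expertise, diversity):
    """Quadrant of a community on the diversity (x) / expertise (y) plane."""
    if diversity > 0 and expertise > 0:
        return 1  # high diversity, high expertise
    if diversity <= 0 and expertise > 0:
        return 2  # low diversity, high expertise
    if diversity <= 0 and expertise <= 0:
        return 3  # low diversity, low expertise
    return 4      # high diversity, low expertise

def suitable_sources(scores):
    """Names of communities in the first quadrant.

    scores: dict mapping a community name to (expertise, diversity).
    """
    return [name for name, (e, d) in scores.items() if quadrant(e, d) == 1]
```

The values passed in would be the expertise and diversity sums of the standardized indicators described above.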

5. Discussion

As shown in the previous section, 9 out of a total of 20 technology-related online communities were assessed to be suitable as data sources for technology foresight. This number is close to half of the candidates because the only criterion for discrimination was whether the calculated expertise and diversity are both positive. From the perspective of open foresight, it is hard to say that using data from more diverse communities is a drawback. However, if only one or a few communities can be used, depending on the environment or conditions of the experiment, it would be desirable to select a community with a relatively higher level of expertise and diversity, for example, World Future Society.

Additionally, the correlation between expertise and diversity was tested. The Pearson correlation coefficient of expertise and diversity, calculated over the 20 communities, was 0.4, which indicates a moderate correlation. However, since the p-value of the corresponding test statistic is 0.08, which is greater than the significance level (α = 0.05), we conclude that the correlation is not statistically significant. A further study that collects more articles from each community or increases the number of target communities would be needed to analyze whether there is a significant correlation.
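The significance test can be reproduced approximately from the reported figures alone. The sketch below assumes the usual two-sided t-test for a Pearson coefficient, with t = r·sqrt((n − 2)/(1 − r²)) and n − 2 degrees of freedom, and integrates the t density numerically with Simpson's rule to avoid external dependencies. The function names are hypothetical, and |r| must be smaller than 1.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def pearson_p_value(r, n, steps=2000):
    """Return (t statistic, two-sided p-value) for a Pearson r from n pairs."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the area under the density
    # between 0 and t.
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric, so the two tails together hold 1 - 2 * area.
    return t, max(0.0, 1 - 2 * area)
```

With the rounded values r = 0.4 and n = 20, this gives t ≈ 1.85 and a p-value of roughly 0.08, in line with the figure reported above; the exact value depends on the unrounded coefficient.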

The rest of the candidate communities determined not to be suitable for technology foresight were further analyzed. Namely, the description of communities in which one or both diversity and expertise are negative is as follows: The online communities in the second quadrant of Figure 1 have high expertise but low diversity. Therefore, online communities in this category may not be useful for exploring various future alternatives. They can be used to reflect expert opinions, but not for public-based foresight. Blogs such as Kurzweil AI have a small number of participants who actually write, but each has a high level of professionalism. Thus, these communities can be considered as alternative data for existing expert-oriented foresight.

The online communities located in the fourth quadrant of Figure 1 are considered to be higher in diversity but lower in expertise. As a result, online communities classified in this quadrant may be inappropriate for technology foresight that requires more than a certain level of knowledge. However, they can be useful in identifying trends of public opinion or exploring various issues, rather than for an in-depth analysis of technology. In the case of Twitter, there are various studies using data generated from Twitter, and it is possible to analyze the network considering the connectivity of each Tweet.

Finally, the online communities in the third quadrant of Figure 1 are rated low in both expertise and diversity. In other words, these communities are considered difficult to use as data sources for technology foresight. However, there may be potential applications of the contents of these communities that are not captured by the evaluation criteria presented in this study. For example, TechRadar and Engadget, located in the third quadrant, provide a large number of detailed reviews of the latest electronics. Such review data may be used for analyzing functional features and technical characteristics of the product. Although this may not serve the purpose of exploratory foresight, it is likely to be more useful than other categories of data in strategy planning.

The causes for some communities to not be finalized as suitable data sources are considered to be as follows: First, Twitter has a large number of users and covers various topics. However, due to the regulation of the service, the length of each post is short and does not tend to include in-depth content. Therefore, Twitter’s evaluation resulted in high diversity but low expertise. TechCrunch provides news and information related to technology, and the technological level of each article is high. Yet, as the community’s focus is mainly on startups and businesses, it seems that various topics related to emerging technologies have not been derived. Lastly, in Business Insider, the weight of articles focused on companies or people related to technology was high. As a result, the focus on technology itself is relatively low, suggesting low expertise. In addition, as a relatively small number of journalists posted articles, diversity was also underestimated.

6. Conclusions

This study aimed to evaluate online communities to select data sources for public-based technology foresight. To this end, we proposed a framework for evaluating online communities that include technology-related content based on two criteria: expertise and diversity. Specifically, text data relating to the future of emerging technologies was collected from online communities of various types. After that, four indicators were calculated: the ratio of authors per document, the optimal number of topics, the level of knowledge, and the ratio of activists. Based on these, the expertise and diversity scores were finally measured. Online communities that exceeded the threshold for both criteria were ultimately identified as valuable data sources. In order to verify the measurement method presented in this study, we applied the framework to 20 actual online communities.

The contribution of this study can be divided into theoretical implications and practical implications as follows: The principal theoretical implication of this study is that the evaluation framework of the online community was newly proposed for the specific purpose of technology foresight. Traditional online-community evaluation techniques focused on individual web pages and users rather than the community itself. This is because the purpose of conventional evaluation methods is to derive results from search engines rather than to evaluate the website as a research data source. In response, this study suggested a brand-new evaluation technique for the purpose of assessing data sources for technology foresight. This is expected to contribute to future research as a new concept of open foresight emerges.

As a practical implication, this study proposed a combined evaluation technique that utilizes a bibliometric approach and text analysis. A bibliometric approach based on the statistics of documents and authors within the online community allows for an indirect reflection of the size and user characteristics of the website. In addition, the text analysis was further conducted to allow for consideration of not only the external characteristics of the online community, but also the contents contained therein. The combination of this approach is based on both qualitative and quantitative analyses. Therefore, in the long term, it is expected to automate the evaluation of online communities, and through this, new derivative research such as recommendation algorithms of data sources will be possible.

On the other hand, the following limitations exist in this study. First, the number of cases should be increased by applying the framework suggested in this study to more online communities. One of the key differentiators of public-based technology foresight from traditional expert-oriented approaches is the ease of use of big data. In order to exploit this advantage, more data sets should be utilized to realize collective intelligence through big data. In this study, we surveyed 20 online communities, but countless other online communities can be analyzed. If various websites not considered in this study are explored and analyzed, more valuable data sources can be identified for technology foresight.

Second, more detailed criteria and indicators are needed for evaluating online communities. Beyond the measures used in this study, various techniques could evaluate online communities in light of their characteristics. For example, one could measure how much discussion takes place in the form of replies within a thread: communication is more active when many posts discuss a single topic or opinion than when each post stands alone. Connectivity between communities may also be considered. Further, for online communities that provide demographic information about users, such as gender, age, occupation, and educational background, new evaluation techniques may be developed.

Third, evaluating the technical level of the terms in the articles requires qualitative analysis by an analyst. This not only costs time and money, but also means that the evaluation result can vary with the analyst’s subjectivity. A more objective analysis will be possible if more data are collected and a technique that can quantitatively evaluate the technical level of terms is applied.
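
The reply-based indicator mentioned under the second limitation could be prototyped along the following lines. This is only a minimal sketch under stated assumptions: the `(thread_id, is_reply)` record layout, the function name `mean_replies_per_thread`, and the toy data are hypothetical and are not part of this study's data or method.

```python
from collections import defaultdict

def mean_replies_per_thread(posts):
    """Average number of reply posts per thread.

    `posts` is an iterable of (thread_id, is_reply) pairs; this layout is a
    hypothetical simplification of whatever a real community exports.
    """
    replies = defaultdict(int)
    for thread_id, is_reply in posts:
        # Adding 0 for an opening post still registers the thread,
        # so threads without any reply count toward the average.
        replies[thread_id] += int(is_reply)
    if not replies:
        return 0.0
    return sum(replies.values()) / len(replies)

# Toy data (invented for illustration): three threads with 2, 0, and 1 replies.
toy_posts = [
    ("t1", False), ("t1", True), ("t1", True),
    ("t2", False),
    ("t3", False), ("t3", True),
]
print(mean_replies_per_thread(toy_posts))
```

A higher value would indicate that posts tend to attract discussion rather than stand alone; whether this should be combined with the existing expertise and diversity scores is left open here.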

Author Contributions

Conceptualization, Y.K. and C.S.; methodology, Y.K.; software, Y.K.; validation, Y.K. and C.S.; formal analysis, Y.K. and C.S.; investigation, Y.K.; resources, Y.K. and C.S.; data curation, Y.K. and C.S.; writing—original draft preparation, Y.K.; writing—review and editing, C.S.; visualization, Y.K.; supervision, Y.K.; project administration, Y.K.; funding acquisition, C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Research Foundation of Korea, grant number NRF-2022R1F1A1062959.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1 summarizes the data collected from 20 online communities related to technology foresight.

Table A1. Results of data collection.

Online Community	Number of Articles	Number of Authors	Description
Business Insider	780	54	Articles included in “TECH” category.
CNET	482	57	Articles included in technology-related topics such as “Sci-tech”, “Smart homes”, and “Drones”.
TechCrunch	822	71	Recent news articles regarding technology.
Diamandis	301	30	Technology-related blog articles.
Engadget	582	44	Articles included in “TOMORROW” category.
Future Timeline	1522	244	Blog articles on various technological fields.
Twitter	632	167	Search results for keywords related to emerging technologies, such as “driverless”, “smart home”, and “quantum-computing”.
io9	993	108	Recent blog articles regarding technology.
KDnuggets	154	25	Articles regarding data science.
Kurzweil AI	1932	121	Articles related to the latest technology trends.
MIT Technology Review	364	72	Articles related to several emerging technologies.
Quora	774	53	Articles included in “Technology Forecasting” and “Emerging Technology”.
Reddit	954	279	Articles included in technology-related subreddits such as “technology” and “selfdrivingcars”.
Singularity Hub	1095	91	Recent news articles regarding technology.
Singularity Weblog	731	122	Blog articles on various technological fields.
TechRadar	196	21	Recent news articles regarding technology.
The Verge	723	88	News articles included in “TECH” category.
Wired	1266	72	Articles related to several emerging technologies.
World Future Society	822	144	Articles included in topics related to future technologies, such as “WorldFuture” and “Resources for Future-Minded Citizens”.
ZDNet	271	71	Articles included in “Innovation” category.
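
As a worked illustration of the "ratio of author per document" measure (Table 1: total number of authors divided by total number of documents), the sketch below recomputes the ratio for a few rows of Table A1. The function and variable names are illustrative only; the counts are copied from the table.

```python
# (number of articles, number of authors), copied from Table A1 (subset for brevity).
TABLE_A1 = {
    "Business Insider": (780, 54),
    "Twitter": (632, 167),
    "Reddit": (954, 279),
    "Wired": (1266, 72),
    "ZDNet": (271, 71),
}

def authors_per_document(n_articles, n_authors):
    """Total number of authors divided by total number of documents."""
    if n_articles <= 0:
        raise ValueError("n_articles must be positive")
    return n_authors / n_articles

ratios = {name: authors_per_document(*counts) for name, counts in TABLE_A1.items()}

# Print communities from the highest to the lowest ratio.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {ratio:.2f}")
```

Rounded to two decimals, these ratios agree with the corresponding column of Table 2 (for example, 0.29 for Reddit and 0.07 for Business Insider).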

Table A2 and Table A3 show the calculated diversity and expertise values of each online community.

Table A2. Diversity of online communities.

Online Community	Standardized Ratio of Authors per Document	Standardized Optimal Number of Topics	Diversity
Business Insider	−0.95	−1.66	−2.61
CNET	−0.26	−1.66	−1.92
TechCrunch	−0.71	−0.22	−0.92
Diamandis	−0.52	−0.40	−0.92
Engadget	−0.86	−0.76	−1.62
Future Timeline	0.33	0.87	1.19
Twitter	1.78	0.14	1.92
io9	−0.39	0.87	0.47
KDnuggets	0.35	−0.94	−0.59
Kurzweil AI	−1.04	0.87	−0.17
MIT Technology Review	0.85	0.33	1.18
Quora	−0.96	−0.04	−0.99
Reddit	2.17	0.69	2.86
Singularity Hub	−0.75	1.41	0.66
Singularity Weblog	0.42	0.14	0.56
TechRadar	−0.42	−1.84	−2.26
The Verge	−0.21	0.33	0.11
Wired	−1.12	1.41	0.29
World Future Society	0.53	1.05	1.58
ZDNet	1.75	−0.58	1.17
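
The standardized columns can be reproduced from the raw metrics in Table 2. The sketch below assumes that "standardized" means a z-score computed with the sample standard deviation; this is inferred from the numbers rather than stated here, and the function name `standardize` is illustrative. The topic counts are copied from Table 2.

```python
from statistics import mean, stdev

# Optimal number of topics per community, copied from Table 2.
OPTIMAL_TOPICS = {
    "Business Insider": 8, "CNET": 8, "TechCrunch": 16, "Diamandis": 15,
    "Engadget": 13, "Future Timeline": 22, "Twitter": 18, "io9": 22,
    "KDnuggets": 12, "Kurzweil AI": 22, "MIT Technology Review": 19,
    "Quora": 17, "Reddit": 21, "Singularity Hub": 25,
    "Singularity Weblog": 18, "TechRadar": 7, "The Verge": 19,
    "Wired": 25, "World Future Society": 23, "ZDNet": 14,
}

def standardize(values):
    """Z-score each value; assumes the sample standard deviation (n - 1)."""
    mu = mean(values.values())
    sigma = stdev(values.values())
    return {name: (v - mu) / sigma for name, v in values.items()}

z_topics = standardize(OPTIMAL_TOPICS)

# Diversity is the sum of the two standardized components. The standardized
# author ratio for Reddit (2.17) is taken from Table A2 as published, because
# the ratios in Table 2 are rounded to two decimals.
reddit_diversity = 2.17 + z_topics["Reddit"]
print(f"{z_topics['Reddit']:.2f} {reddit_diversity:.2f}")
```

Under this assumption the z-scores for the topic counts match the second column of Table A2 to two decimals, and the sum for Reddit matches its diversity value. Small discrepancies elsewhere (for example, TechCrunch) are consistent with the published components having been rounded before display.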

Table A3. Expertise of online communities.

Online Community	Standardized Level of Knowledge	Standardized Ratio of Activists	Expertise
Business Insider	−1.70	−1.54	−3.24
CNET	−0.98	1.26	0.28
TechCrunch	0.50	0.94	1.44
Diamandis	−0.14	−0.57	−0.72
Engadget	−0.47	−1.33	−1.79
Future Timeline	0.13	0.51	0.64
Twitter	−1.20	−0.14	−1.34
io9	0.65	0.94	1.58
KDnuggets	−0.73	−1.65	−2.38
Kurzweil AI	1.10	0.61	1.72
MIT Technology Review	0.84	1.59	2.42
Quora	−1.90	−1.33	−3.23
Reddit	0.32	−0.14	0.18
Singularity Hub	0.46	−0.03	0.42
Singularity Weblog	0.63	0.40	1.03
TechRadar	−0.04	−0.14	−0.18
The Verge	0.21	0.08	0.28
Wired	1.13	1.26	2.39
World Future Society	2.01	0.51	2.52
ZDNet	−0.82	−1.22	−2.04
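
The two composite scores can be combined to locate each community on the diversity-expertise map (Figure 1). The sketch below assumes a simple rule, namely keeping the communities whose diversity and expertise are both above zero; the rule and the function name are an illustrative reading of the tables, not a procedure quoted from the text. The scores are copied from Tables A2 and A3.

```python
# (diversity, expertise) per community, copied from Tables A2 and A3.
SCORES = {
    "Business Insider": (-2.61, -3.24),
    "CNET": (-1.92, 0.28),
    "TechCrunch": (-0.92, 1.44),
    "Diamandis": (-0.92, -0.72),
    "Engadget": (-1.62, -1.79),
    "Future Timeline": (1.19, 0.64),
    "Twitter": (1.92, -1.34),
    "io9": (0.47, 1.58),
    "KDnuggets": (-0.59, -2.38),
    "Kurzweil AI": (-0.17, 1.72),
    "MIT Technology Review": (1.18, 2.42),
    "Quora": (-0.99, -3.23),
    "Reddit": (2.86, 0.18),
    "Singularity Hub": (0.66, 0.42),
    "Singularity Weblog": (0.56, 1.03),
    "TechRadar": (-2.26, -0.18),
    "The Verge": (0.11, 0.28),
    "Wired": (0.29, 2.39),
    "World Future Society": (1.58, 2.52),
    "ZDNet": (1.17, -2.04),
}

def upper_right_quadrant(scores):
    """Communities with above-average diversity and above-average expertise."""
    return sorted(name for name, (div, exp) in scores.items() if div > 0 and exp > 0)

selected = upper_right_quadrant(SCORES)
print(len(selected), selected)
```

With the published scores, this filter returns nine communities, which coincide with the nine described in Table 3.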

References

Schatzmann, J.; Schäfer, R.; Eichelbaum, F. Foresight 2.0-Definition, overview & evaluation. Eur. J. Futures Res. 2013, 1, 15.
Feng, L.; Wang, Q.; Wang, J.; Lin, K.Y. A Review of technological forecasting from the perspective of complex systems. Entropy 2022, 24, 787.
Masini, E.B. The past and the possible futures of Futures Studies: Some thoughts on Ziauddin Sardar’s ‘the namesake’. Futures 2010, 42, 185–189.
Janzwood, S.; Piereder, J. “Mainstreaming” foresight program development in the public sector. Foresight 2019, 21, 605–624.
Miemis, V.; Smart, J.; Brigis, A. Open foresight. J. Futures Stud. 2012, 17, 91–98.
Xenias, D.; Whitmarsh, L. Dimensions and determinants of expert and public attitudes to sustainable transport policies and technologies. Transp. Res. A Policy Pract. 2013, 48, 75–85.
Ehls, D.; Herstatt, C. Open source participation behavior-a review and introduction of a participation lifecycle model. In Proceedings of the 35th DRUID Celebration Conference, Barcelona, Spain, 17–19 June 2013.
Wiener, M.; Gattringer, R.; Strehl, F. Collaborative open foresight-a new approach for inspiring discontinuous and sustainability-oriented innovations. Technol. Forecast. Soc. Change 2020, 155, 119370.
O’Reilly, T. What is Web 2.0: Design patterns and business models for the next generation of software. Commun. Strateg. 2017, 1, 17.
Janzik, L.; Raasch, C. Online communities in mature markets: Why join, why innovate, why share? Int. J. Innov. Manag. 2011, 15, 797–836.
Li, X.; Xie, Q.; Daim, T.; Huang, L. Forecasting technology trends using text mining of the gaps between science and technology: The case of perovskite solar cell technology. Technol. Forecast. Soc. Change 2019, 146, 432–449.
Lee, C. A review of data analytics in technological forecasting. Technol. Forecast. Soc. Change 2021, 166, 120646.
Zeng, M.A. Foresight by online communities–The case of renewable energies. Technol. Forecast. Soc. Change 2018, 129, 27–42.
Malinen, S. Understanding user participation in online communities: A systematic literature review of empirical studies. Comput. Human. Behav. 2015, 46, 228–238.
Zeng, M.A. The contribution of different online communities in open innovation projects. In Proceedings of the International Symposium on Open Collaboration, Berlin, Germany, 27–29 August 2014.
Shakhovska, N.; Peleshchyshyn, O.; Myna, Z.; Bilushchak, T. Online Community Information Model for Use in Marketing Activities. In Proceedings of the COAPSN, Lviv, Ukraine, 16–17 May 2019.
Antons, D.; Grünwald, E.; Cichy, P.; Salge, T.O. The application of text mining methods in innovation research: Current state, evolution patterns, and development priorities. R D Manag. 2020, 50, 329–351.
Haveliwala, T.H. Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng. 2003, 15, 784–796.
Zhang, J.; Ackerman, M.S.; Adamic, L. Expertise networks in online communities: Structure and algorithms. In Proceedings of the 16th International Conference on World Wide Web, Banff Alberta, AB, Canada, 8 May 2007.
Yeung, C.M.A.; Noll, M.G.; Gibbins, N.; Meinel, C.; Shadbolt, N. SPEAR: Spamming-resistant expertise analysis and ranking in collaborative tagging systems. Comput. Intell. 2011, 27, 458–488.
Wang, G.A.; Jiao, J.; Abrahams, A.S.; Fan, W.; Zhang, Z. ExpertRank: A topic-aware expert finding algorithm for online knowledge communities. Decis. Support Syst. 2013, 54, 1442–1451.
Da Costa, O.; Cachia, R.; Compañó, R. Can online social networks be used in forward-looking studies. In Proceedings of the Second International Seville Seminar on Future-Oriented Technology Analysis, Seville, Spain, 28–29 September 2006.
Popper, R. Foresight methodology. In The Handbook of Technology Foresight; Edward Elgar: Broadheath, UK, 2008; pp. 44–88.
Belz, F.M.; Baumbach, W. Netnography as a method of lead user identification. Creat. Innov. Manag. 2010, 19, 304–313.
Saritas, O.; Pace, L.A.; Stalpers, S.I. Stakeholder participation and dialogue in foresight. In Participation and Interaction in Foresight: Dialogue, Dissemination and Visions; Edward Elgar: Broadheath, UK, 2013; pp. 35–69.
Brem, A.; Bilgram, V. The search for innovative partners in co-creation: Identifying lead users in social media through netnography and crowdsourcing. J. Eng. Technol. Manag. 2015, 37, 40–51.
Hofmann, T. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, 15–19 August 1999.
Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
Teh, Y.; Jordan, M.; Beal, M.; Blei, D. Sharing clusters among related groups: Hierarchical Dirichlet processes. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 17 December 2004.

Figure 1. Diversity-expertise map.

Table 1. Measurement of expertise and diversity of online community.

	Criteria	Quantitative/Qualitative	Measure
Expertise	Knowledge level of keywords	Both quantitative and qualitative	Extracting keywords with high frequency and scoring each keyword with Likert scale.
Expertise	Level of activity	Quantitative	Calculating ratio of innovator/active user: innovators are considered to be users who have written a lot of high-impact articles.
Expertise	Type of community	Qualitative	
Expertise	Profiles of authors	Qualitative	
Diversity	Ratio of author per document	Quantitative	Dividing the total number of authors by the total number of documents.
Diversity	Optimal number of topics	Quantitative	Finding the optimal number of topics by topic modeling technique: hierarchical Dirichlet process.

Table 2. Metrics for assessing online communities.

Online Community	Expertise: Degree of Keyword Expertise	Expertise: Ratio of Activist	Diversity: Ratio of Author per Document	Diversity: Optimal Number of Topics
Business Insider	1.36	0.15	0.07	8
CNET	2.34	0.41	0.12	8
TechCrunch	4.36	0.38	0.09	16
Diamandis	3.48	0.24	0.10	15
Engadget	3.04	0.17	0.08	13
Future Timeline	3.86	0.34	0.16	22
Twitter	2.0	0.28	0.26	18
io9	4.56	0.38	0.11	22
KDnuggets	2.68	0.14	0.16	12
Kurzweil AI	5.18	0.35	0.06	22
MIT Technology Review	4.82	0.44	0.20	19
Quora	1.08	0.17	0.07	17
Reddit	4.12	0.28	0.29	21
Singularity Hub	4.3	0.29	0.08	25
Singularity Weblog	4.54	0.33	0.17	18
TechRadar	3.62	0.28	0.11	7
The Verge	3.96	0.3	0.12	19
Wired	5.22	0.41	0.06	25
World Future Society	6.42	0.34	0.18	23
ZDNet	2.56	0.18	0.26	14

Table 3. Description of selected communities.

Online Community	Description
The Verge	Representative news webzine specialized in science and technology. Articles are grouped by promising technology area. Each article contains user comments.
Singularity Hub	Archived data related to emerging technology news. Focus on future-oriented technology and breakthroughs. Well categorized by technical topic or author, providing detailed information about the author.
Future Timeline	Articles are sorted by specific forecast timeline. Provides future timelines based on current trends, long-term environmental changes, technological development trends, geopolitical evolution, etc. Scientists, forecasters, and anyone interested in future trends can take part in discussions about how the timelines are created.
Singularity Weblog	An open community with news and columns on the future of technology and its changes.
MIT Technology Review	Technology analysis magazine published by MIT. Provides an analysis of future-oriented factors of innovative products such as smartwatches and electric vehicles. The articles are divided according to technical topics, and the technical knowledge level covered in each article is high.
World Future Society	An international community of futurists and future thinkers. People interested in the future freely talk about various futuristic topics.
Wired	Articles focused on how emerging technologies affect culture, the economy, and politics. Offers a wide range of quality articles covering many types of technologies.
io9	Focuses on the subjects of science fiction, fantasy, futurism, science, technology and related areas. A type of blog that covers not only professional writing on promising technologies, but also related popular culture.
Reddit	American social news aggregation, web content rating, and discussion website. In the form of “subreddits”, people with common interests can share their opinions freely.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

