Article

Analysis of the Development Trend of Sports Research in China and Taiwan Using Natural Language Processing

1 Institute of Physical Education and Health, Yulin Normal University, Yulin 573000, China
2 Center for Teacher Education and Career Development, University of Taipei, Taipei City 100, Taiwan
3 Office of Physical Education, National Chin-Yi University of Technology, Taichung City 400, Taiwan
4 Department of Leisure Industry Management, National Chin-Yi University of Technology, Taichung City 400, Taiwan
5 Department and Graduate Institute of Physical Education, University of Taipei, Taipei City 100, Taiwan
* Authors to whom correspondence should be addressed.
Appl. Sci. 2022, 12(18), 9006; https://doi.org/10.3390/app12189006
Submission received: 13 August 2022 / Revised: 1 September 2022 / Accepted: 3 September 2022 / Published: 8 September 2022

Abstract

Background: The abstract of a digital text presents the essential information of an article, and rigorous analysis of abstracts can reveal the trends and value of research and uncover latent knowledge. This study therefore focuses on the abstracts of indexed journals in China and Taiwan published from July 2010 to June 2020 (3283 abstracts in total). Methods: Drawing on the concepts of text mining and natural language processing (NLP), the study constructs a pipeline of text retrieval, text segmentation and word cloud analysis, TF-IDF weight analysis, co-word analysis, network analysis, and trend analysis, and applies it to a large amount of text data. Results: The results show that research in China covers the fields of social sports and sports science, whereas research in Taiwan covers both the natural and the social sciences. The network diagrams highlight the richness of sports-related research fields in the two regions, but research on the philosophy of sport is relatively rare. Conclusions: It is suggested that resources be reallocated equally across all disciplines/departments so as to promote balanced development and help open a new chapter in the sports academic field.

1. Introduction

China and Taiwan share the same origins in history, culture, language, and written characters. Together with the growth of networked information and the prevalence of digital collections, this has allowed sports academic exchanges between the two regions to benefit from unprecedented applications of information technology (IT), moving from the closed mode of the past to an open mode that fosters a good, mutually beneficial environment for academic development [1,2]. In the process of open academic exchange, however, researchers often face limitations in processing big data, which has driven data mining technology and extended the application of text mining [3,4,5,6,7]. Because text mining depends heavily on computation and model application, natural language processing (NLP) technology has cross-disciplinary application value, and sports-related academic research is no exception [8,9,10,11]. This also means that the content analysis method on which traditional sports research relies must be improved accordingly to raise data processing efficiency [4,8,12]. To make up for the shortcomings of content analysis, such as being time-consuming, laborious, and not objective, this study proposes a procedure for analyzing and processing text data in sports-related fields based on NLP. Unlike traditional content analysis, it offers a faster, automated analytical process and can systematically extract and analyze the importance and relevance of keywords in abstracts [5,13]. It integrates word segmentation, word cloud analysis, TF-IDF (term frequency-inverse document frequency) analysis, and co-word analysis, and then applies network visualization techniques [14,15,16,17,18] to present the distribution of core keywords and their network relationships in sports research abstracts from China and Taiwan (3283 articles in total).
This reveals possible hidden knowledge structures and provides an important reference for evaluating future development trends in the two regions.
The main function of an abstract is to help readers quickly understand the main content of an article and to screen out essential knowledge from the large body of academic literature [19,20]. Thanks to rapidly developing information technology, many scholars now make their research results available worldwide through online academic databases (such as Google Scholar), laying a good foundation for academic development [21,22,23,24]. Systematic analysis of the research abstracts accumulated over the years can therefore enhance understanding of the development process, research characteristics, and future development trends of an academic field [19]. However, data are generated and stored on network platforms far faster than people can analyze and digest them, which gives data mining and text mining an extremely important role in big data analysis [5,6,12,13,25]. Data mining is mostly applied to structured data, such as a table with a fixed structure in which each column has a clear definition and value. Depending on researchers' needs, such structured data can be used for cluster analysis, classification analysis, association rule analysis, sequential pattern analysis, time-series similarity analysis, and link analysis to uncover valuable implicit knowledge and information [4,12]. Text mining is much more complicated, because its input is unstructured text that, unlike structured data, follows no specific structure; nevertheless, these seemingly messy text data contain a great deal of information worthy of in-depth investigation by researchers [26].
NLP addresses this by applying common linguistic rules to break down and reorganize the words in a text. In this way it can capture the importance and relevance of keywords and thus roughly understand the semantic content, which makes it an important basis for exploring new knowledge [5,18].
Text is one of the main media of network information transmission [8,17,27]. In management research, most studies have used NLP to analyze structured knowledge and language relations, establish vocabulary systems for this knowledge, and further construct decision support or retrieval systems [5,13,28]. However, NLP technology has rarely been applied in the sports academic field to explore its knowledge structure and future development trends. In addition, owing to various political factors, sports exchanges between China and Taiwan focus only on sports competitions and visits, while academic exchanges are few, a problem that deserves the attention of governments and academic circles. In view of this, and given the rapid increase in demand for interpreting electronic information, the two regions urgently need to expand their relevant academic fields through faster and more objective automated data analysis to promote knowledge exchange between them. The purpose of this study is to construct a suitable NLP analysis process for Chinese text mining. Through this process, the study presents the current state of sports academic research in the two regions and the differences between them, which will help to create new opportunities and directions for research and physical education development in an online environment.

2. Methods

The process of this study was based on the concepts of text mining and NLP (Figure 1); for further details, see [5,6,7,14,15,16,18,27,28,29,30]. Following the approach of the SCI (Science Citation Index) and SSCI (Social Sciences Citation Index) articles in [31], the text data of the two regions were mined through the steps of text retrieval, word segmentation, word cloud analysis, TF-IDF analysis, co-word analysis, network analysis, and trend analysis, and their similarities and differences were compared.

2.1. Text Retrieval

In this study, the two comprehensive academic journals with the highest impact factors (IF) in the CSSCI (Chinese Social Sciences Citation Index) and in the TSSCI (Taiwan Social Science Citation Index) were selected. For China, the texts were retrieved from China Sport Science (IF = 0.326), which publishes research papers and reviews reflecting the latest achievements in sports research, and from the Journal of Shanghai University of Sport (IF = 0.221), a comprehensive journal of sports academic theory whose main task is to present the latest high-level research results in sports science. For Taiwan, the texts were retrieved from the Physical Education Journal (IF = 0.663), which aims to publish original research papers in the field of sports and exercise and to produce high-quality academic work, and from Sports & Exercises Research (IF = 0.420), which publishes original articles of reference value on sports academics in Taiwan with the aim of contributing to existing human knowledge. Important information, including titles and keywords, can be retrieved along with the short abstract texts, making full use of their value and utility [6,7,29]. Following the 10-year research interval of [16] and [31], this study retrieved the abstracts of the two regions published from July 2010 to June 2020: 1420 abstracts from China Sport Science and 1142 from the Journal of Shanghai University of Sport (a subtotal of 2562 for China), and 345 from the Physical Education Journal and 376 from Sports & Exercises Research (a subtotal of 721 for Taiwan), for a total of 3283 abstracts.
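The retrieval window and the per-journal counts can be checked with a short script. The sketch below is only an illustration: the `AbstractRecord` layout is hypothetical (the study does not describe how abstracts were stored), while the date window and the four journal counts are taken from the text above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Study window reported in the text: July 2010 to June 2020, inclusive.
WINDOW_START = date(2010, 7, 1)
WINDOW_END = date(2020, 6, 30)

# Per-journal abstract counts as reported in this subsection.
REPORTED = {
    "China Sport Science": ("China", 1420),
    "Journal of Shanghai University of Sport": ("China", 1142),
    "Physical Education Journal": ("Taiwan", 345),
    "Sports & Exercises Research": ("Taiwan", 376),
}


@dataclass(frozen=True)
class AbstractRecord:
    """Hypothetical record layout; not described in the study."""
    journal: str
    published: date
    text: str


def in_window(record: AbstractRecord) -> bool:
    """True if the abstract was published inside the study window."""
    return WINDOW_START <= record.published <= WINDOW_END


def count_by_journal(records):
    """Count retained abstracts per journal after applying the date filter."""
    return Counter(r.journal for r in records if in_window(r))


def region_totals(per_journal):
    """Roll per-journal counts up to regional subtotals and a grand total."""
    totals = Counter()
    for journal, n in per_journal.items():
        region, _ = REPORTED[journal]
        totals[region] += n
    return dict(totals), sum(totals.values())
```

Feeding the reported counts into `region_totals` reproduces the subtotals of 2562 (China) and 721 (Taiwan) and the total of 3283.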

2.2. Word Segmentation and Word Cloud Analysis

Words, phrases, and sentences are the basic units of language recognition, and their meanings affect the understanding of a text, so accurate word segmentation is very important [27,30]. In this study, the CKIP Chinese word segmentation system [17,28] was used to segment the text and automatically tag the part of speech of each word. Meaningful words, such as nouns, verbs, and adjectives, were then retained [6], while punctuation marks and function words, such as prepositions, connectives, and articles, were excluded. Finally, word frequencies were counted for subsequent analysis. Word frequency patterns were also compared using word clouds, in which the more often a word appears, the larger it is displayed [15].
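The filtering and counting steps above can be sketched as follows. This does not call the CKIP system; it assumes its output is already available as `(word, tag)` pairs. The tag prefixes (`N` nouns, `V` verbs, `A` adjectives) are a simplified, hypothetical scheme loosely modeled on CKIP-style tags, and the stopword set is illustrative only. The linear font-size mapping is one simple way to make a word-cloud term grow with its frequency.

```python
from collections import Counter

# Assumed, simplified tag scheme: "N" nouns, "V" verbs, "A" adjectives.
KEEP_PREFIXES = ("N", "V", "A")

# Illustrative stopwords (Chinese for "and", "also", "as well as", "with",
# "because of"); the study's actual list is not given.
STOPWORDS = {"和", "也", "以及", "與", "因為"}


def content_word_frequencies(tagged_docs):
    """Count nouns, verbs, and adjectives across all tagged documents."""
    counts = Counter()
    for doc in tagged_docs:
        for word, tag in doc:
            if tag.startswith(KEEP_PREFIXES) and word not in STOPWORDS:
                counts[word] += 1
    return counts


def word_cloud_sizes(counts, top_n=50, min_size=10.0, max_size=60.0):
    """Map the top-n words to font sizes that grow linearly with frequency."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    high, low = top[0][1], top[-1][1]
    span = (high - low) or 1  # avoid division by zero when all counts tie
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in top
    }
```

The `top_n=50` default mirrors the 50-word clouds described in Section 3.1; the size range is an arbitrary choice.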

2.3. TF-IDF Analysis

TF-IDF is a classical keyword retrieval method that is simple and effective [6]. It converts words into statistical data based on their frequencies. TF measures how often a word appears in a given text, while IDF reflects how important the word is across all texts, down-weighting common words and up-weighting rare ones [5,18]. Therefore, a key word that appears frequently in a given text but occurs in relatively few texts overall obtains a higher weight. The TF-IDF conversion follows the explanations in [32] and [18]. Let F be the set of all sample files and D_terms the set of all words in a file f_1; for a word w_i in f_1, the formula is as follows:
$$\mathrm{TF\text{-}IDF}(w_i) = \frac{\text{occurrences of } w_i}{\text{total number of } D_{terms}} \times \log\left(\frac{\text{total number of } F\text{s}}{\text{number of files containing } w_i + 1}\right)$$
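The TF-IDF weighting described in this subsection can be sketched in a few lines. This is a minimal illustration under two assumptions not fixed by the text: the smoothing term `+1` is read as belonging to the IDF denominator, and the natural logarithm is used. Each file is represented as a list of already-segmented tokens.

```python
import math
from collections import Counter


def tf_idf(word, doc, corpus):
    """TF-IDF weight of `word` in one file.

    doc:    token list of one file f1 (its Dterms)
    corpus: list of token lists, one per file in F
    Assumes IDF = ln(N / (df + 1)).
    """
    tf = Counter(doc)[word] / len(doc)
    df = sum(1 for d in corpus if word in d)
    idf = math.log(len(corpus) / (df + 1))
    return tf * idf


def rank_terms(doc, corpus):
    """Rank the distinct words of one file by TF-IDF, highest first."""
    scores = {w: tf_idf(w, doc, corpus) for w in set(doc)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With this smoothing, a word that occurs in every file receives a slightly negative weight, because ln(N / (N + 1)) < 0. Changing the log base rescales all weights but does not change their ranking.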

2.4. Co-Word Analysis

A single word cannot reveal the focus of a study, but the semantic features of words can be captured through their co-occurrence frequency [6,14,33]. The most suitable index for evaluating co-occurrence frequency is the equivalence index, also called association strength [34]. In other words, co-word analysis can identify the linkages between words and infer research topics from the number of co-occurring word pairs. See [14] and [35] for an explanation of the co-word calculation. Taking two words w_i and w_j among all words (D_terms) of a sentence s_1 as an example, the formula is as follows:
$$co(w_i, w_j) = \sum_{s_1 \in D} (w_i)_{s_1}\,(w_j)_{s_1}$$
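The co-word count can be sketched as follows. The sketch reads the co-word measure as the number of sentences in which both words appear, treating each word as present (1) or absent (0) in a sentence. That reading of the original formula is an assumption. The second function arranges pair counts into a symmetric keyword matrix like the 27 × 27 and 26 × 26 matrices used later.

```python
from collections import Counter
from itertools import combinations


def co_word_counts(sentences, keywords=None):
    """Count, for each unordered word pair, the sentences containing both words.

    sentences: list of token lists, one per sentence s1
    keywords:  optional set; if given, only pairs of these words are counted
    Each word counts once per sentence, however often it repeats there.
    """
    counts = Counter()
    for tokens in sentences:
        present = set(tokens)
        if keywords is not None:
            present &= set(keywords)
        for pair in combinations(sorted(present), 2):
            counts[pair] += 1
    return counts


def co_word_matrix(counts, keywords):
    """Build a symmetric co-word matrix (rows/columns in sorted order)."""
    labels = sorted(keywords)
    index = {w: i for i, w in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (a, b), c in counts.items():
        if a in index and b in index:
            i, j = index[a], index[b]
            matrix[i][j] = matrix[j][i] = c
    return labels, matrix
```

`counts.most_common(20)` then lists the 20 most frequent pairs, the same kind of ranking reported in Section 3.4.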

2.5. Network Analysis

Visualizing network diagrams can enhance the understanding of the connections and cluster relationships between words [16], much like the saying "look for a good steed according to a picture": the diagram guides the exploration of the knowledge in the text. When exploring the group structure of various types of relationships, network analysis can build a radial social network in which key words are regarded as nodes, and measures such as network density, degree centrality, and K-Core can be obtained from the combination of strong and weak links [5,36]. Density indicates how well the nodes are connected, and a high-density network means that the nodes are more cohesive, while degree centrality and K-Core describe how dispersed or concentrated the overall network is [36,37]. This study used the NetDraw function of the UCINET 6 software [16] to compute these measures and draw the network diagrams, providing important information for the analysis.
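For readers without UCINET, the three measures named above can be sketched in plain Python. This is not UCINET's implementation. It assumes the co-word matrix is first turned into a binary, undirected graph by linking two words whenever their co-occurrence reaches a chosen `threshold` (a hypothetical parameter; the study does not state how valued ties were handled). K-Core numbers are computed by the standard peeling procedure, which repeatedly removes the node with the lowest remaining degree.

```python
from itertools import combinations


def to_adjacency(labels, matrix, threshold=1):
    """Binary, undirected graph: link two words if co-occurrence >= threshold."""
    adj = {w: set() for w in labels}
    for i, j in combinations(range(len(labels)), 2):
        if matrix[i][j] >= threshold:
            adj[labels[i]].add(labels[j])
            adj[labels[j]].add(labels[i])
    return adj


def density(adj):
    """Share of all possible undirected links that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    links = sum(len(nbrs) for nbrs in adj.values()) // 2
    return links / (n * (n - 1) / 2)


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in adj.items()}


def core_numbers(adj):
    """K-core number of each node via repeated removal of the lowest-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

In a triangle a–b–c with an extra pendant node d attached to a, the density is 4/6, a has degree centrality 1.0, and a, b, c lie in the 2-core while d is only in the 1-core.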

3. Results

3.1. Preliminary Analysis

For China, 2562 abstracts (853,357 words) were retrieved from China Sport Science and the Journal of Shanghai University of Sport. For Taiwan, 721 abstracts (351,368 words) were retrieved from the Physical Education Journal and Sports & Exercises Research. The CKIP Chinese word segmentation system [17,28] was then used for word segmentation to sort out meaningful words [6]. Meaningless characters and function words, such as “and,” “also,” “as well as,” “with,” and “because of,” were not analyzed. This left 9518 meaningful words for China and 4476 for Taiwan, whose frequencies were then analyzed; the rankings are shown in Table 1. For China, the top-ranked word was “體育” (sports) (10,388 times), followed by “運動” (exercises) (4700 times), “中國” (China) (2907 times), and “發展” (development) (2736 times). For Taiwan, the top-ranked word was “運動” (exercises) (2792 times), followed by “訓練” (training) (853 times), “動作” (action) (737 times), and “身體” (body) (673 times).
Because of the volume of data, this paper presents statistical analysis only for the top 15 words in each region; to convey the overall picture at a glance, the frequencies of the top 50 words in each region are also presented as word clouds in Figure 2 and Figure 3, where the font size of a word represents how often it appears [15]. These results provide a preliminary understanding of the most frequent research words in the two regions and their general situation. Apart from differences in ranking, some words appeared in the list of only one region, such as “文化” (culture), “武術” (martial arts), “水準” (standard), “肌肉” (muscle), “量表” (scale), and “學習” (study).

3.2. Correlation Analysis

This stage focuses on TF-IDF analysis and co-word analysis, which identify key words from the relationships between words for the subsequent network analysis.

3.3. TF-IDF Analysis

Using the TF-IDF calculation method of [32] and [18], the word frequency statistics were used to calculate the weights of key words. The point at which the decline in TF-IDF levels off served as the cut-off for analysis and judgment [13]. As shown in Table 2, for China, the top-ranked word in terms of TF-IDF is “體育” (sports) (0.0044), followed by “運動” (exercises) (0.0041), “訓練” (training) (0.0039), and “武術” (martial arts) (0.0037); no. 27 is “體系” (system) (0.0021), and no. 28 is “因素” (factor) (0.0019). After this sharp decrease of 0.0002, the decline leveled off, so only the top 27 words were retained for China. For Taiwan, the top-ranked word in terms of TF-IDF was “訓練” (training) (0.0032), followed by “動作” (action) (0.0029), “身體” (body) (0.0027), and “肌肉” (muscle) (0.0025); no. 26 was “健康” (health) (0.0014), and no. 27 was “關係” (relationship) (0.0012). Again, after a sharp decrease of 0.0002, the decline leveled off, so only the top 26 words were retained for Taiwan. The co-occurrence relationships of these key words in the two regions were then explored.
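The cut-off rule described above is stated informally. One possible reading is "keep every word up to the last drop of at least a given size, after which the curve is flat". It can be sketched as follows. The function name, the `threshold` parameter, and the floating-point tolerance are all assumptions, and the scores used in the check are invented for illustration rather than taken from Table 2.

```python
def elbow_cutoff(scores, threshold, tol=1e-12):
    """Number of top-ranked words to keep under a 'last sharp drop' rule.

    scores:    TF-IDF weights sorted from highest to lowest
    threshold: smallest gap between neighbors that counts as a sharp drop
    tol:       slack for floating-point rounding in the gap comparison
    Keeps every word up to and including the one just before the last drop
    of at least `threshold`; if no such drop exists, keeps all words.
    """
    last_drop = None
    for i in range(len(scores) - 1):
        if scores[i] - scores[i + 1] >= threshold - tol:
            last_drop = i
    return len(scores) if last_drop is None else last_drop + 1
```

With the invented scores `[0.0050, 0.0046, 0.0043, 0.0030, 0.0027, 0.0025, 0.00245, 0.0024, 0.00238]` and a threshold of 0.0002, the last qualifying drop is from 0.0027 to 0.0025, so the top five words are kept.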

3.4. Co-Word Analysis

The frequency of co-words helps reveal the focus of an article [14]. Using the calculation methods of [14] and [35], as shown in Table 3, the pair with the most co-occurrences for China was “體育” (sports)–“發展” (development) (782 times), followed by “體育” (sports)–“社會” (society) (716 times), “發展” (development)–“社會” (society) (439 times), “體育” (sports)–“理論” (theory) (427 times), and “體育” (sports)–“運動” (exercises) (419 times). For Taiwan, the pair with the most co-occurrences was “運動” (exercises)–“表現” (performance) (193 times), followed by “訓練” (training)–“運動” (exercises) (163 times), “選手” (player)–“運動” (exercises) (151 times), “運動” (exercises)–“運動員” (athlete) (137 times), and “身體” (body)–“運動” (exercises) (132 times). Co-word matrices of 27 × 27 and 26 × 26 for the two regions (Table 4) were produced for network analysis; owing to space limitations, only the top 20 key-word pairs by co-occurrence count are presented. These results already reveal the different research styles of the two regions: China focuses on sports, development, society, theory, China, and culture, whereas Taiwan focuses on exercises, performance, training, athletes, the body, and related aspects. However, the closeness, linkages, cluster relationships, and overall structure of the key words still needed to be examined in the visual analysis stage.

3.5. Visual Analysis

First, the co-word matrix was converted to a UCINET-readable format [38], and the network density, degree centrality, and K-Core were then calculated [5,36,37], presenting nodes and clusters with different appearances. The larger and more central a node is, the more important it is. The density of the network diagram for China was 0.501, meaning that 50.1% of the possible links actually appeared. Figure 4 shows six clusters in the network diagram. Cluster 1 (square, K-Core 10) centers on social sports development; it has the most nodes, lies at the center of the overall structure, and occupies the largest area, so it can be regarded as the main trend of sports research in China. Cluster 2 (diamond, K-Core 9) contains only the node “behavior”, but it lies at the center directly below the overall structure, showing that issues of sports behavior have also received attention. Cluster 3 (regular triangle, K-Core 8) has three nodes near the left side of the network diagram, indicating that research on sports public policy and related topics makes up a considerable proportion. Cluster 4 (circle, K-Core 7) contains only the node “teenager”, suggesting a focus on research on teenagers, e.g., teenagers' sports activities, physical fitness, and mental health. Cluster 5 (inverted triangle, K-Core 5), directly at the top of the network diagram, includes research on rats and martial arts and is smaller than the first four clusters. Finally, Cluster 6 (cross, K-Core 3), an independent node for sports games at the lower left of the network diagram, focuses on topics related to sports games.
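The conversion of a co-word matrix into a UCINET-readable file can be sketched as follows. The sketch assumes UCINET's DL text format with a full-matrix layout; the study does not say which import route was actually used. Labels are written comma-separated, so they must not contain commas, and file encoding for Chinese labels is not addressed here.

```python
def to_ucinet_dl(labels, matrix):
    """Serialize a square co-word matrix as UCINET DL text (full-matrix layout)."""
    n = len(labels)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be n x n with one label per row")
    lines = [f"DL N={n}", "FORMAT = FULLMATRIX", "LABELS:", ",".join(labels), "DATA:"]
    lines.extend(" ".join(str(v) for v in row) for row in matrix)
    return "\n".join(lines) + "\n"
```

The resulting string can be written to a `.txt` file and opened through UCINET's DL import before drawing the diagram in NetDraw.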
The density of the network diagram for Taiwan was 0.508, meaning that 50.8% of the possible links actually appeared. Figure 5 shows that the network diagram for Taiwan also has six clusters. Cluster 1 (diamond, K-Core 10) clearly takes exercises as the center of the whole network diagram, with the largest node, and the node colors distinguish natural science on the left from social science on the right; the focus of sports research in Taiwan therefore revolves around the key word exercises. Cluster 2 (square, K-Core 9) contains only the node “index”, located at the center directly at the top of the overall structure; according to the corresponding texts, research on this node leans toward physiology-related numerical indicators and constitutes a certain proportion of the papers. Cluster 3 (regular triangle, K-Core 8) lies mainly on the right side of the network diagram and focuses on sociological issues, with Taiwan as the main research area. Cluster 4 (circle, K-Core 7) focuses on factors influencing muscles and joints as well as psychological aspects of student athletes and players. Cluster 5 (inverted triangle, K-Core 5) contains only the node “learning”, reflecting attention to motor skill learning, learning performance, etc. Cluster 6 (cross, K-Core 4), located at an edge corner of the network diagram, explores coaches' knowledge and related training abilities and is not a main trend node. Although the two regions share some identical nodes, these nodes are far apart in cluster order and structure, which shows that differences in people, local conditions, and policies give each region its own characteristics and lead to distinctly different research patterns.
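As a rough consistency check on the two density values, assume that each co-word matrix was treated as a binary, undirected graph without self-loops. The text does not say how valued co-occurrences were dichotomized. Under that assumption, density D relates the number of links L to the number of nodes n as shown below. The link counts are back-calculated here and are not reported in the study.

```latex
D = \frac{L}{n(n-1)/2}
\qquad
\begin{aligned}
\text{China: }  & n = 27,\ \tfrac{27 \cdot 26}{2} = 351,\ L \approx 0.501 \times 351 \approx 176 \\
\text{Taiwan: } & n = 26,\ \tfrac{26 \cdot 25}{2} = 325,\ L \approx 0.508 \times 325 \approx 165
\end{aligned}
```

Both estimates round back to the reported values: 176/351 ≈ 0.501 and 165/325 ≈ 0.508.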

4. Discussion and Conclusions

Based on NLP technology, this study retrieved a total of 3283 abstracts from the target journals (2562 in China; 721 in Taiwan) and then extracted 53 key words (27 in China; 26 in Taiwan) and their co-word matrices. The NetDraw function of the UCINET software was then used to convert the key words and their network relations into graphical structures. According to the network relationships of the nodes in Cluster 1 (square) in Figure 4, most related studies in China over the last 10 years focused on sports development and actively expanded two major fields: social sports and sports science. Social sports are not only the bridge for implementing national sports and competitive sports, but also the foundation for broader social development [39]. At present, China's development in this field is at a critical stage of building a well-off society in all respects, and its strategic goals, which include deepening reform of the sports sector, enhancing the core competitiveness of sports, and improving the health of all people, depend on government support and guidance [40]. The keyword network structure and attributes of the left half of Cluster 1 (square) (organization, system, China, theory, development, society, sports, culture, body) and of Cluster 3 (regular triangle) show that the topics discussed by the target journals on the development of social sports covered research on public service systems and sports policy reform. The results highlight that, against the macroscopic background of social transformation and the transformation of government functions in China, the social sports organization system and development policy should be re-planned for the new era while maintaining stability, so as to create a well-off social environment that ensures the physical and mental health of all people [39,40].
On the other hand, the keyword network structure and attributes of the right half of Cluster 1 (square) (function, training, level, athlete, sports, ability, health, activity, and competition) and of Cluster 2 (diamond) and Cluster 4 (circle) show that sports science research occupied a considerable proportion, and the topics discussed were mainly raising sports standards and improving the training efficiency of competitive athletes. Some research topics were also related to sports physiology (e.g., experimental research on physiological function) and sports behavior (e.g., promoting healthy sports behavior among teenagers). Since the 1980s, reform and opening-up has brought great vitality to China's society and economy and has also injected new energy into Chinese sports science. When China Sport Science was first published in October 1981, it set up columns on sports theory (physical fitness research), sports training, sports medicine, sports biomechanics, sports psychology, and other topics. In the article “An Analysis of Sports Science System”, published in the fourth issue of 1982, the field of sports science was more clearly subdivided into sports sociology, sports economics, sports management, sports anatomy, sports medicine, sports biomechanics, sports training technology, sports competition technology, and other disciplines [39]. Sports science is therefore a discipline between social science and natural science, and related research should not only address social science but also integrate it with natural science [40]. At present, dozens of branches and marginal branches (such as sports humanities and sports natural science) have gradually formed in the field of sports science in China, and in the future, sports science in China will move toward both the subdivision of disciplines and the integration of theories [39].
Although the history, culture, language, and written characters of China and Taiwan share the same origin, differences in sports policy orientation have led the academic development of the two regions in different directions. The results in Figure 5 suggest that most related studies in Taiwan over the last 10 years have focused on sports-related issues and actively developed two major fields: natural science and social science. The main research objects were players, athletes, and students, and the main topics were how to improve sports performance and training efficiency, as well as related social behaviors. The network structure linking the left half of Cluster 1 (diamond) (body, activity, effect, training, exercises, performance, movement, students, abilities, athlete, and contestants) with Cluster 2 (square) and Cluster 4 (circle) shows that natural science research in Taiwan focuses on sports psychology, sports physiology, and sports biomechanics, and tries to improve the sports performance, training efficiency, physical ability, and movement execution of players or athletes through various scientific methods, knowledge, and principles [41]. The network structure linking the right half of Cluster 1 (diamond) (development, mode, and factors) with Cluster 3 (regular triangle) offers a glimpse of social science issues. In the development of sports academia in Taiwan, the disciplinary development, key topics, and theoretical viewpoints of the social sciences are an indispensable knowledge base [42].
Their core value lies in critically examining the role, function, and significance of exercise and sports in society, and even in exposing the myths and psychological reactions surrounding exercise and sports through relevant research, thus promoting the sound development of healthy behaviors, lifestyles, social systems, and school education [43].
In conclusion, with the improvement in people's health awareness and competitive sports standards in China, improving the public sports service system, policies, and the foundation of sports science is the driving force for realizing the vision of a well-off society, and it is also an important element for diversifying sports academic fields in the future and enriching the connotation of Chinese social sports [39,40]. In Taiwan, by contrast, the Sports Department of the Ministry of Education set out the core spirit and direction of national sports development from 2013 to 2023 in the White Paper on Sports Policy under the slogan “Healthy Nationals, Excellent Competition, Vibrant Taiwan”, which suggests that scholars in relevant academic fields will continue to focus on the development of natural science and social science. The attributes of the keywords in Figure 4 and Figure 5 show that research on the “philosophy of sport” is relatively weak in both regions. It is suggested that all disciplines/departments develop in a balanced way so as to realize the diversity of the sports academic field. The peaceful development of cross-strait academic relations, as an interface and platform, has played an important role in enhancing mutual understanding and trust and has provided both sides with necessary policy analysis for sports academic development. The two governments should continue to remove barriers and improve the environment for academic interflow and engagement [1,2].
Although the results of this study clearly identify the research focuses and trends of mainstream journals in the two regions, various political and economic obstacles and restrictions leave the academic circles and industries of both sides with little opportunity to communicate through sports, which could otherwise ease the tense and awkward atmosphere across the Taiwan Strait. Only by overcoming the long-standing prejudice and estrangement between the people of the two regions and actively promoting dialogue between the public and academic circles on the basis of mutual trust and mutual benefit can the conditions for sustainable sports academic development be created. As for research limitations, the NLP process used here is designed to mine and analyze text data and has limitations when applied to structured data [4,12,26]. Future research could be supplemented by neural NLP techniques such as word2vec to obtain more complete results [44,45,46,47,48]. Finally, this study analyzed only two prominent academic journals each for China and Taiwan, which represent only part of the literature; researchers are encouraged to expand the scope of exploration and analysis according to their research purposes and needs to increase the breadth and depth of the research.

Author Contributions

Conceptualization, T.-K.H. and W.-Y.S.; methodology, T.-K.H. and W.-Y.S.; software, T.-K.H.; validation, T.-K.H.; formal analysis, W.-Y.S.; investigation, C.-H.H.; resources, W.-Y.K.; data curation, W.-Y.S.; writing—original draft preparation, T.-K.H.; writing—review and editing, T.-K.H.; visualization, W.-Y.K.; supervision, C.-H.H.; project administration, C.-Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. deLisle, J.; Goldstein, A.; Yang, G. The Internet, Social Media, and a Changing China; University of Pennsylvania Press: Philadelphia, PA, USA, 2016.
2. Teng, E. Taiwan and Modern China. Oxford Research Encyclopedia of Asian History. Available online: https://oxfordre.com/asianhistory/view/10.1093/acrefore/9780190277727.001.0001/acrefore-9780190277727-e-155 (accessed on 1 December 2019).
3. Hammou, B.A.; Lahcen, A.A.; Mouline, S. Towards a real-time processing framework based on improved distributed recurrent neural network variants with fastText for social big data analytics. Inf. Process. Manag. 2019, 57, 102122.
4. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques, 3rd ed.; Morgan Kaufmann: Waltham, MA, USA, 2012.
5. Kang, Y.; Cai, Z.; Tan, C.-W.; Huang, Q.; Liu, H. Natural language processing (NLP) in management research: A literature review. J. Manag. Anal. 2020, 7, 139–172.
6. Li, J.; Huang, G.; Fan, C.; Sun, Z.; Zhu, H. Key word retrieval for short text via word2vec, doc2vec, and textrank. Turk. J. Electr. Eng. Comput. Sci. 2019, 27, 1794–1805.
7. Makhija, V.; Ahuja, S. Rule-based text retrieval from a bibliographic database. DESIDOC J. Libr. Inf. Technol. 2018, 38, 5–10.
8. Eisenstein, J. Introduction to Natural Language Processing; The MIT Press: Cambridge, MA, USA, 2019.
9. Fukuoka, Y.; Lindgren, T.G.; Mintz, Y.D.; Hooper, J.; Aswani, A. Applying Natural Language Processing to Understand Motivational Profiles for Maintaining Physical Activity After a Mobile App and Accelerometer-Based Intervention: The mPED Randomized Controlled Trial. JMIR mHealth uHealth 2018, 6, e10042.
10. Patel, D.; Shah, D.; Shah, M. The Intertwine of Brain and Body: A Quantitative Analysis on How Big Data Influences the System of Sports. Ann. Data Sci. 2020, 7, 1–16.
11. Zeng, Y. Evaluation of Physical Education Teaching Quality in Colleges Based on the Hybrid Technology of Data Mining and Hidden Markov Model. Int. J. Emerg. Technol. Learn. (iJET) 2020, 15, 4–15.
12. Sammut, C.; Webb, G.I. Encyclopedia of Machine Learning and Data Mining, 2nd ed.; Springer: New York, NY, USA, 2017.
13. Chen, Y.T.; Chen, L.J.; Wu, T.Y. An investigation on the images of Hakka culture tourism destinations by using natural language processing. J. Outdoor Recreat. Study 2016, 29, 81–111.
14. Cecchini, F.M.; Riedl, M.; Fersini, E.; Biemann, C. A comparison of graph-based word sense induction clustering algorithms in a pseudoword evaluation framework. Comput. Humanit. 2018, 52, 733–770.
15. Cobb-Walgren, C.J.; Pilling, B.K.; Barksdale, H.C., Jr. Does marketing need better marketing? A creative approach to understanding student perceptions of the marketing major. e-J. Bus. Educ. Scholarsh. Teach. 2017, 11, 97–117.
16. Dobermann, D.; Hamilton, I.S. Publication patterns in developmental psychology: Trends and social networks. Int. J. Psychol. 2015, 52, 336–347.
17. Ma, W.-Y.; Chen, K.-J. Introduction to CKIP Chinese word segmentation system for the first international Chinese Word Segmentation Bakeoff. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan, 11–12 July 2003; Volume 17, pp. 168–171.
18. Vaidya, P.; Harinarayana, N.S. Social semantics and similarities from user-generated keywords to information retrieval: A case study of social tags in marine science. DESIDOC J. Libr. Inf. Technol. 2018, 38, 11–15.
19. Harvey, A.; Banerjee, A.; Tong, G.; Brezovjakova, H.; Rees, S.; Byrne, M. Anatomy of an abstract: A guide to writing a scientific abstract. J. Natl. Stud. Assoc. Med. Res. 2019, 1, 54–60.
20. O’Donoghue, T. Planning Your Qualitative Research Thesis and Project: An Introduction to Interpretivist Research in Education and the Social Sciences, 2nd ed.; Routledge: New York, NY, USA, 2018.
21. Harzing, A.-W. Two new kids on the block: How do Crossref and Dimensions compare with Google Scholar, Microsoft Academic, Scopus and the Web of Science? Scientometrics 2019, 120, 341–349.
22. Kousha, K.; Thelwall, M. Can Google Scholar and Mendeley help to assess the scholarly impacts of dissertations? J. Informetr. 2019, 13, 467–484.
23. Martín-Martín, A.; Orduna-Malea, E.; Thelwall, M.; Delgado López-Cózar, E. Google Scholar, Web of Science, and Scopus: A systematic comparison of citations in 252 subject categories. J. Informetr. 2018, 12, 1160–1177.
24. Zientek, L.R.; Werner, J.M.; Campuzano, M.V.; Nimon, K. The Use of Google Scholar for Research and Research Dissemination. New Horizons Adult Educ. Hum. Resour. Dev. 2018, 30, 39–46.
25. Qiu, Y.; Ji, W.; Zhang, C. A Hybrid Machine Learning and Population Knowledge Mining Method to Minimize Makespan and Total Tardiness of Multi-Variety Products. Appl. Sci. 2019, 9, 5286.
26. Navathe, S.B.; Elmasri, R. Data warehousing and data mining. In Fundamentals of Database Systems; Pearson Education: Singapore, 2000; pp. 841–872.
27. Newberry, K.M.; Bailey, H.R. Does semantic knowledge influence event segmentation and recall of text? Mem. Cogn. 2019, 47, 1173–1187.
28. Chen, Y.-J.; Liou, W.-C.; Wu, J.-H. Fraud detection for financial statements of business groups. Int. J. Account. Inf. Syst. 2018, 32, 1–23.
29. Hughes, G.; Musco, P.; Caine, S.; Howe, L. Lower limb asymmetry after anterior cruciate ligament reconstruction in adolescent athlete: A systematic review and meta-analysis. J. Athl. Train. 2020, 55, 811–825.
30. Singh, A.P.; Kushwaha, A.K. Analysis of segmentation methods for Brahmi script. DESIDOC J. Libr. Inf. Technol. 2019, 39, 109–116.
31. Yang, D.-H.; Wang, Y.; Yu, T.; Liu, X. Macro-level collaboration network analysis and visualization with Essential Science Indicators: A case of social science. Malays. J. Libr. Inf. Sci. 2020, 25, 121–138.
32. Aizawa, A. On the quantitative representation of term specificity based on terms and documents co-occurrences. J. Inf. Processing Soc. 2000, 41, 3332–3343. (In Japanese)
33. Hong, Y.; Yao, Q.; Yang, Y.; Feng, J.-J.; Wu, S.-D.; Ji, W.-X.; Yao, L.; Liu, Z.-Y. Knowledge structure and theme trends analysis on general practitioner research: A Co-word perspective. BMC Fam. Pract. 2016, 17, 1.
34. van Eck, N.J.; Waltman, L. Bibliometric mapping of the computational intelligence field. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2007, 15, 625–645.
35. Ohsawa, Y.; Benson, N.E.; Yachida, M. KeyGraph: Automatic indexing by segmenting and unifying co-occurrence graphs. Trans. Inst. Electron. Inf. Commun. Eng. 1999, 82, 391–400.
36. Kim, J.; Kim, K. How does local partners network embeddedness affect international joint venture survival in different subnational contexts? Asia Pac. J. Manag. 2017, 35, 1055–1080.
37. Jacobs, W.; Goodson, P.; Barry, A.E.; McLeroy, K.R.; McKyer, E.L.J.; Valente, T.W. Adolescent Social Networks and Alcohol Use: Variability by Gender and Type. Subst. Use Misuse 2016, 52, 477–487.
38. Maluleka, J.R.; Onyancha, O.B. Research collaboration among library and information science schools in South Africa (1991–2012): An informetrics study. Mousaion S. Afr. J. Inf. Stud. 2017, 34, 36–59.
39. Wang, X.K.; Liu, Y.Z. On the subject structure of the sports science. J. Phys. Educ. 2002, 9, 4–8. (In Chinese)
40. Han, H.; Zheng, J. Development of social sports organization on the 70th anniversary of the founding of the People’s Republic of China: Course review, reality consideration and future trend. China Sport Sci. 2019, 39, 3–12. (In Chinese)
41. Shih, W.Y.; Ho, T.K. Application of the text mining method to analyze the development trends of sports research in the era of big data. Phys. Educ. J. 2020, 53, 439–452. (In Chinese)
42. Hsu, S.Y. A Sociology of Sport and Sports; Lucky Book Store: Taipei, China, 2012. (In Chinese)
43. Coakley, J. Sports in Society: Issues and Controversies; McGraw-Hill: London, UK, 2007.
44. Amer, F.; Hockenmaier, J.; Golparvar-Fard, M. Learning and critiquing pairwise activity relationships for schedule quality control via deep learning-based natural language processing. Autom. Constr. 2021, 134, 104036.
45. Zhan, Z.-H.; Li, J.-Y.; Zhang, J. Evolutionary deep learning: A survey. Neurocomputing 2022, 483, 42–58.
46. Haque, R.; Islam, N.; Islam, M.; Ahsan, M. A Comparative Analysis on Suicidal Ideation Detection Using NLP, Machine, and Deep Learning. Technologies 2022, 10, 57.
47. Lauriola, I.; Lavelli, A.; Aiolli, F. An introduction to Deep Learning in Natural Language Processing: Models, techniques, and tools. Neurocomputing 2021, 470, 443–456.
48. Adewumi, T.; Liwicki, F.; Liwicki, M. Word2Vec: Optimal hyperparameters and their impact on natural language processing downstream tasks. Open Comput. Sci. 2022, 12, 134–141.
Figure 1. Research flow chart.
Figure 2. Word cloud of word frequency in China.
Figure 3. Word cloud of word frequency in Taiwan.
Figure 4. Network diagram for China.
Figure 5. Network diagram for Taiwan.
Table 1. Statistical analysis table of word frequency.
| Region | Ranking | Word (Frequency) | Ranking | Word (Frequency) | Ranking | Word (Frequency) |
|---|---|---|---|---|---|---|
| China | 1 | sports (10,388) | 6 | athlete (1638) | 11 | activity (1107) |
| | 2 | exercises (4700) | 7 | culture (1533) | 12 | standard (996) |
| | 3 | China (2907) | 8 | society (1488) | 13 | industry (988) |
| | 4 | development (2736) | 9 | theory (1265) | 14 | services (983) |
| | 5 | training (2074) | 10 | martial arts (1191) | 14 | body (983) |
| Taiwan | 1 | exercises (2792) | 6 | player (639) | 11 | behavior (394) |
| | 2 | training (853) | 7 | athlete (517) | 12 | ability (381) |
| | 3 | action (737) | 8 | muscle (454) | 13 | model (372) |
| | 4 | body (673) | 9 | activity (412) | 14 | study (369) |
| | 5 | performance (658) | 10 | scale (398) | 14 | sports (369) |
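Table 1 gives two words the same rank 14 in each region (services and body at 983 in China; study and sports at 369 in Taiwan) and skips rank 15, which is consistent with standard competition ranking. The sketch below illustrates that counting and ranking step. It assumes the abstracts have already been segmented into word tokens (segmentation is not shown), and the function names are hypothetical, not the authors' code.

```python
from collections import Counter


def word_frequencies(tokenized_docs):
    """Count how often each token occurs across all (already segmented) abstracts."""
    counts = Counter()
    for tokens in tokenized_docs:
        counts.update(tokens)
    return counts


def competition_rank(freqs):
    """Rank words by descending frequency; tied words share a rank and the
    following rank number is skipped (standard competition, "1224" ranking)."""
    # Break ties alphabetically so the output order is deterministic.
    ordered = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    ranks = {}
    previous_count = None
    current_rank = 0
    for position, (word, count) in enumerate(ordered, start=1):
        if count != previous_count:
            current_rank = position
            previous_count = count
        ranks[word] = current_rank
    return ranks


# Frequencies for China, copied from Table 1.
china_freqs = {
    "sports": 10388, "exercises": 4700, "China": 2907, "development": 2736,
    "training": 2074, "athlete": 1638, "culture": 1533, "society": 1488,
    "theory": 1265, "martial arts": 1191, "activity": 1107, "standard": 996,
    "industry": 988, "services": 983, "body": 983,
}
china_ranks = competition_rank(china_freqs)
```

Applied to the China column of Table 1, this assigns rank 11 to activity and rank 14 to both services and body, matching the published layout.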
Table 2. TF-IDF analysis table.
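| Region | Ranking | Word | TF-IDF | Ranking | Word | TF-IDF | Ranking | Word | TF-IDF |
|---|---|---|---|---|---|---|---|---|---|
| China | 1 | sports | 0.0044 | 10 | Rat | 0.0032 | 19 | health | 0.0024 |
| | 2 | exercises | 0.0041 | 11 | public | 0.0030 | 20 | theory | 0.0024 |
| | 3 | training | 0.0039 | 12 | service | 0.0030 | 21 | behavior | 0.0023 |
| | 4 | martial arts | 0.0037 | 13 | body | 0.0030 | 22 | society | 0.0023 |
| | 5 | athlete | 0.0036 | 14 | game | 0.0029 | 23 | organization | 0.0022 |
| | 6 | culture | 0.0036 | 15 | activity | 0.0028 | 24 | standard | 0.0021 |
| | 7 | China | 0.0035 | 16 | teenager | 0.0027 | 25 | ability | 0.0021 |
| | 8 | development | 0.0034 | 17 | policy | 0.0026 | 26 | function | 0.0021 |
| | 9 | industry | 0.0033 | 18 | competition | 0.0025 | 27 | system | 0.0021 |
| Taiwan | 1 | training | 0.0032 | 10 | behavior | 0.0021 | 19 | coach | 0.0017 |
| | 2 | action | 0.0029 | 11 | scale | 0.0021 | 20 | model | 0.0017 |
| | 3 | body | 0.0027 | 12 | performance | 0.0020 | 21 | factor | 0.0016 |
| | 4 | muscle | 0.0025 | 13 | self | 0.0020 | 22 | index | 0.0016 |
| | 5 | player | 0.0025 | 14 | sports | 0.0020 | 23 | Taiwan | 0.0016 |
| | 6 | athlete | 0.0025 | 15 | society | 0.0019 | 24 | development | 0.0015 |
| | 7 | study | 0.0025 | 16 | student | 0.0018 | 25 | effect | 0.0014 |
| | 8 | sports | 0.0024 | 17 | ability | 0.0018 | 26 | health | 0.0014 |
| | 9 | joint | 0.0022 | 18 | activity | 0.0017 | 27 | - | - |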
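To show how TF-IDF weights a word, here is a minimal sketch of one common textbook variant. It assumes term frequency normalized by document length, IDF = log(N / df), and averaging per-document scores into one value per word. That aggregation is an assumption, since the exact variant behind Table 2 is not given in this section. The corpus below is hypothetical, not the study's data.

```python
import math
from collections import Counter


def tfidf_scores(tokenized_docs):
    """Mean TF-IDF score of every term over a corpus of tokenized abstracts.

    TF  = count of the term in a document / number of tokens in that document
    IDF = log(N / df), where N is the number of documents and df the number
          of documents containing the term
    Averaging per-document scores into one value per term is an assumption
    made for illustration, not necessarily how Table 2 was produced.
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # each document counts once per term
    totals = Counter()
    for tokens in tokenized_docs:
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        length = len(tokens)
        for term, count in Counter(tokens).items():
            idf = math.log(n_docs / doc_freq[term])
            totals[term] += (count / length) * idf
    return {term: total / n_docs for term, total in totals.items()}


# Hypothetical toy corpus: three segmented "abstracts".
docs = [
    ["training", "muscle", "training"],
    ["training", "coach"],
    ["coach", "player"],
]
scores = tfidf_scores(docs)
```

A word that appears in every abstract gets an IDF of zero, so it scores zero however often it occurs. This down-weighting of ubiquitous words is one reason a TF-IDF ranking such as Table 2 can differ from a raw frequency ranking such as Table 1.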
Table 3. Statistics of co-occurrence times.
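| Region | Ranking | Word 1 | Word 2 | Co-occurrence | Ranking | Word 1 | Word 2 | Co-occurrence |
|---|---|---|---|---|---|---|---|---|
| China | 1 | sports | development | 782 | 11 | exercises | standard | 318 |
| | 2 | sports | society | 716 | 12 | sports | system | 302 |
| | 3 | development | society | 439 | 13 | exercises | development | 291 |
| | 4 | sports | theory | 427 | 14 | development | theory | 289 |
| | 5 | sports | exercises | 419 | 15 | theory | society | 245 |
| | 6 | sports | China | 415 | 16 | culture | development | 237 |
| | 7 | exercises | athlete | 394 | 17 | China | society | 233 |
| | 8 | exercises | training | 378 | 18 | sports | activity | 231 |
| | 9 | China | development | 327 | 19 | sports | service | 225 |
| | 10 | sports | culture | 325 | 19 | sports | organization | 225 |
| Taiwan | 1 | exercises | performance | 193 | 11 | exercises | development | 111 |
| | 2 | training | exercises | 163 | 12 | exercises | ability | 109 |
| | 3 | player | exercises | 151 | 13 | exercises | factor | 107 |
| | 4 | exercises | athlete | 137 | 14 | training | performance | 98 |
| | 5 | body | sports | 132 | 15 | exercises | index | 97 |
| | 6 | exercises | effect | 123 | 16 | exercises | scale | 86 |
| | 7 | action | exercises | 121 | 16 | exercises | Taiwan | 86 |
| | 8 | exercises | activity | 120 | 18 | muscle | exercises | 85 |
| | 9 | exercises | model | 116 | 18 | exercises | society | 85 |
| | 10 | exercises | health | 112 | 20 | action | performance | 82 |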
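Each row of Table 3 counts how often two words appear together. The co-occurrence window is not defined in this section. The sketch below assumes a document-level window, in which a pair is counted once per abstract that contains both words. The function name and toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(tokenized_docs, vocabulary):
    """Count unordered word pairs that appear together in the same document.

    A pair is counted at most once per document (document-level window);
    the window actually used for Table 3 is an assumption here.
    """
    vocab = set(vocabulary)
    pairs = Counter()
    for tokens in tokenized_docs:
        # Sort so each unordered pair has one canonical (a, b) key.
        present = sorted(vocab.intersection(tokens))
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy corpus (not the study's data).
docs = [
    ["sports", "development", "society", "sports"],
    ["sports", "development"],
    ["sports", "theory"],
]
pairs = co_occurrences(docs, ["sports", "development", "society", "theory"])
```

Sorting the matched words before pairing means "sports/development" and "development/sports" fall under one key. A sentence-level or sliding-window count would give different, usually smaller, numbers.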
Table 4. Co-word matrix table.
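| China | A. | B. | C. | D. | E. | Taiwan | A. | B. | C. | D. | E. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A. sports | 0 | 419 | 168 | 93 | 149 | A. training | 0 | 70 | 45 | 52 | 80 |
| B. exercises | 419 | 0 | 378 | 20 | 394 | B. action | 70 | 0 | 42 | 37 | 69 |
| C. training | 168 | 378 | 0 | 15 | 174 | C. body | 45 | 42 | 0 | 25 | 27 |
| D. martial arts | 93 | 20 | 15 | 0 | 1 | D. muscle | 52 | 37 | 25 | 0 | 23 |
| E. athlete | 149 | 394 | 174 | 1 | 0 | E. player | 80 | 69 | 27 | 23 | 0 |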
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ho, T.-K.; Shih, W.-Y.; Kao, W.-Y.; Hsu, C.-H.; Wu, C.-Y. Analysis of the Development Trend of Sports Research in China and Taiwan Using Natural Language Processing. Appl. Sci. 2022, 12, 9006. https://doi.org/10.3390/app12189006

AMA Style

Ho T-K, Shih W-Y, Kao W-Y, Hsu C-H, Wu C-Y. Analysis of the Development Trend of Sports Research in China and Taiwan Using Natural Language Processing. Applied Sciences. 2022; 12(18):9006. https://doi.org/10.3390/app12189006

Chicago/Turabian Style

Ho, Tu-Kuang, Wei-Yuan Shih, Wen-Yang Kao, Chin-Hsien Hsu, and Cheng-Ying Wu. 2022. "Analysis of the Development Trend of Sports Research in China and Taiwan Using Natural Language Processing" Applied Sciences 12, no. 18: 9006. https://doi.org/10.3390/app12189006

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
