Automated Construction and Mining of Text-Based Modern Chinese Character Databases: A Case Study of Fujian
Abstract
1. Introduction
- (1) It transcends the limitations of traditional databases by using the temporal and spatial attributes of persons to construct complex spatio-temporal chains.
- (2) It constructs a matrix of character state changes to identify fluctuation patterns and reveal their underlying causes.
- (3) It employs a random walk algorithm to determine the primary migration directions of characters, offering new insights into the dynamic migration patterns of historical figures.
- (4) It uses the ZhKeyBERT model to extract keywords from the time windows corresponding to historical events, thereby supporting a deeper analysis of the factors that influence character movements.
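The first three contributions can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the function names (`build_chains`, `transition_counts`, `random_walk_visits`), and the rule of restarting at dead ends are all assumptions made for illustration. The sketch sorts each person's dated locations into a spatio-temporal chain, counts moves between consecutive places, and runs a weighted random walk over those counts, so that the most frequently traversed edge suggests a primary migration direction.

```python
import random
from collections import Counter, defaultdict

# Hypothetical records: (person, year, place). Illustrative only, not from the paper's data.
records = [
    ("A", 1905, "Fuzhou"), ("A", 1911, "Guangzhou"), ("A", 1912, "Nanjing"),
    ("B", 1908, "Xiamen"), ("B", 1911, "Guangzhou"), ("B", 1913, "Fuzhou"),
    ("C", 1910, "Fuzhou"), ("C", 1912, "Guangzhou"),
]


def build_chains(records):
    """Group records by person and sort them by year to form spatio-temporal chains."""
    chains = defaultdict(list)
    for person, year, place in records:
        chains[person].append((year, place))
    return {person: sorted(events) for person, events in chains.items()}


def transition_counts(chains):
    """Count moves between consecutive, distinct places across all chains."""
    counts = Counter()
    for events in chains.values():
        for (_, src), (_, dst) in zip(events, events[1:]):
            if src != dst:
                counts[(src, dst)] += 1
    return counts


def random_walk_visits(counts, start, steps, seed=0):
    """Walk the weighted move graph and count how often each edge is traversed."""
    rng = random.Random(seed)
    outgoing = defaultdict(list)
    for (src, dst), weight in counts.items():
        outgoing[src].append((dst, weight))

    visits = Counter()
    node = start
    for _ in range(steps):
        if node not in outgoing:
            # Assumed rule: a place with no recorded departures restarts the walk.
            node = start
            continue
        destinations, weights = zip(*outgoing[node])
        nxt = rng.choices(destinations, weights=weights, k=1)[0]
        visits[(node, nxt)] += 1
        node = nxt
    return visits
```

With the toy records above, `random_walk_visits(transition_counts(build_chains(records)), "Fuzhou", 1000).most_common(1)` reports the Fuzhou to Guangzhou edge as the most traversed, because every walk that leaves Fuzhou must pass along it.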
2. Materials and Methods
2.1. Data Sources
2.2. Study Area
2.3. Construction of Spatio-Temporal Chain
2.3.1. Character Behavior Time Series
2.3.2. Processing of Character Spatial Location
2.4. Spatio-Temporal State Changes of Characters
2.4.1. Character State Change Matrix
2.4.2. Main Direction of Character Movement
2.4.3. Keyword Extraction
3. Results
3.1. Automatic Construction and Querying of Spatio-Temporal Chains
3.2. Results of Character State Changes
4. Discussion
4.1. Verification of Random Walk
4.2. In-Depth Analysis of Character State Changes
4.2.1. Stable Period
- (1) 1840–1910
- (2) 1960–2009
4.2.2. Volatile Period
- (1) 1910–1913
- (2) 1924–1927
- (3) 1937–1945
- (4) 1945–1949
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
CCEd | Clergy of the Church of England Database |
PASE | The Prosopography of Anglo-Saxon England |
FIRP | In the First Person |
CBDB | China Biographical Database Project |
OCR | Optical Character Recognition |
PRD | the Pearl River Delta |
CPC | the Communist Party of China |
KMT | Kuomintang of China |
NRA | the National Revolutionary Army |
CPLA | Chinese People’s Liberation Army |
CPPCC | the Chinese People’s Political Consultative Conference |

Keyword | Keyword (English) * | Similarity |
---|---|---|
南京临时政府 | The Nanjing Provisional Government | 0.6284 |
广州起义 | Guangzhou Uprising | 0.6083 |
武昌起义 | Wuchang Uprising | 0.582 |
中华革命 | Chinese Revolution | 0.5706 |
中国人民解放军 | Chinese People’s Liberation Army (CPLA) | 0.5681 |
广东省政府 | Guangdong Provincial Government | 0.5601 |
中华民国政府 | Government of the Republic of China | 0.5554 |
任粤军 | Appointment to the Cantonese Army | 0.5318 |
政权 | political power | 0.5214 |
革命政权 | Revolutionary regime | 0.5152 |

Keyword | Keyword (English) * | Similarity |
---|---|---|
南昌起义 | Nanchang Uprising | 0.7696 |
农民起义 | Peasant revolt | 0.6198 |
武装起义 | Armed uprising | 0.6168 |
起义 | uprising | 0.6036 |
游击区 | guerrilla zone | 0.5939 |
叛乱 | rebellion | 0.5904 |
游击队 | guerrillas | 0.5891 |
起义军 | insurrectionary army | 0.5873 |
国民革命军 | National Revolutionary Army | 0.5801 |
革命军 | Revolutionary Army | 0.5669 |

Keyword | Keyword (English) * | Similarity |
---|---|---|
中国人民解放军 | CPLA | 0.6201 |
中国人民解放军空军 | Air Force of the CPLA | 0.6082 |
北京军区 | Beijing Military Region | 0.5897 |
福州军区 | Fuzhou Military Region | 0.5788 |
华东野战军 | Eastern China Field Army | 0.5754 |
朝鲜人民军 | Korean People’s Army | 0.5719 |
驻华大使 | Ambassador to China | 0.5686 |
中国空军 | Chinese Air Force | 0.553 |
台湾人 | Taiwanese | 0.5501 |
广西军区 | Guangxi Military Region | 0.5488 |

Keyword | Keyword (English) * | Similarity |
---|---|---|
福州军区 | Fuzhou Military Region | 0.5691 |
中国民主同盟中央委员会 | China Democratic League Central Committee | 0.5578 |
中央警官 | Central Police | 0.5048 |
中国人民解放战争 | Chinese People’s War of Liberation | 0.5584 |
中国人民解放军 | CPLA | 0.5686 |
北京大学法学院 | Peking University Law School | 0.4877 |
中国人民政治协商会议 | Chinese People’s Political Consultative Conference | 0.5574 |
广州市委 | Guangzhou Municipal Party Committee | 0.5095 |
华北军区 | Northern China Military Region | 0.5347 |
广西军区 | Guangxi Military Region | 0.5587 |
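
The Similarity column in the tables above is consistent with how KeyBERT-style extractors rank candidates: by cosine similarity between a candidate phrase embedding and the document embedding. The following sketch assumes that scoring rule. The three-dimensional vectors are made up for illustration and the function names are hypothetical, so it does not reproduce ZhKeyBERT or the reported scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_keywords(doc_vec, candidates, top_n=10):
    """Score each candidate against the document vector and return the top_n by similarity."""
    scored = [(kw, round(cosine(doc_vec, vec), 4)) for kw, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Toy embeddings (assumed values); a real pipeline would obtain them from a language model.
doc_vec = [1.0, 0.0, 0.0]
candidates = {
    "起义": [1.0, 0.0, 0.0],
    "革命军": [1.0, 1.0, 0.0],
    "政权": [0.0, 1.0, 0.0],
}
ranking = rank_keywords(doc_vec, candidates)
```

Rounding to four decimals mirrors the precision used in the tables; `top_n=10` matches the ten rows shown per table.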
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jian, X.; Yuan, W.; Yuan, W.; Gao, X.; Wang, R. Automated Construction and Mining of Text-Based Modern Chinese Character Databases: A Case Study of Fujian. Information 2025, 16, 324. https://doi.org/10.3390/info16040324