Explicit and Implicit Knowledge in Large-Scale Linguistic Data and Digital Footprints from Social Networks
Abstract
1. Introduction
1.1. Neural Network Technologies and Hidden Information Extraction
1.2. The Impact of Large Language Models (LLMs)
1.3. The Use of New-Type Tools in Scientific Research
1.4. Promising Research Directions Using New-Type Tools
2. Materials and Methods
- What algorithmic approaches can be developed and empirically tested for the automated analysis of explicit information contained in user-generated content and digital traces in social media?
- What algorithms can be developed for detecting implicit information in user-generated content and digital traces in social media, and how can the validation of results be conducted?
- What are the key advantages and limitations of different algorithms in analyzing explicit and implicit information in user-generated content on social media?
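The first of these questions concerns information stated on the surface of the text. Purely as a minimal illustrative sketch (the lexicon, sample messages, and scoring below are invented for the example and are not the study's algorithm, which is described in Section 2.2.1), explicit-content analysis can be approximated by lexicon-based counting over tokenized posts:

```python
from collections import Counter
import re

# Toy sentiment lexicon -- invented for this sketch, not the study's resource.
POSITIVE = {"effective", "support", "trust", "safe"}
NEGATIVE = {"threat", "panic", "distrust", "quarantine"}

def tokenize(text: str) -> list[str]:
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def explicit_profile(messages: list[str]) -> dict:
    """Count explicit sentiment cues and the most frequent tokens."""
    tokens = [t for m in messages for t in tokenize(m)]
    counts = Counter(tokens)
    return {
        "positive_hits": sum(counts[w] for w in POSITIVE),
        "negative_hits": sum(counts[w] for w in NEGATIVE),
        "top_words": counts.most_common(5),
    }

sample = [
    "The city administration's response to avian influenza was effective.",
    "Quarantine is a threat to daily life and causes panic.",
]
print(explicit_profile(sample))
```

A production pipeline would substitute a curated lexicon or a trained classifier; the sketch only shows the defining property of explicit knowledge, namely that it is surface-observable and directly countable.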
2.1. Materials
2.2. Methods
2.2.1. Algorithm for Explicit Knowledge Analysis
2.2.2. Algorithm for Implicit Knowledge Analysis
2.2.3. Tools and Methodology
3. Results
3.1. Explicit Knowledge Analysis
1. Assessment of City Administration’s Actions
2. Perceived Threat of Avian Influenza
3. Media Coverage of the Issue
4. Impact of Quarantine on Daily Life
5. Expert Opinions
6. Social Well-Being
3.2. Analysis of Implicit Knowledge
1. Concerns Over the Escalation of the Issue
2. Distrust in the Media and Perceived Media Manipulation
3. Uncertainty About Consequences for the Public
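The three clusters above are, by definition, not stated outright: they surface through hedges, scare quotes, and epistemic markers. As a rough sketch only (the marker inventories and example post are invented for illustration; the study's actual procedure is described in Section 2.2.2), such weak implicit cues can be flagged with pattern matching:

```python
import re

# Illustrative marker inventories; a real study would use validated lexicons.
MARKERS = {
    "uncertainty": [r"\bmaybe\b", r"\bperhaps\b", r"\bwho knows\b"],
    "distrust": [r"\bso-called\b", r"\bsupposedly\b", r"\bthey claim\b"],
    "escalation": [r"\bgetting worse\b", r"\bspread(ing)?\b", r"\bnext\b"],
}

def implicit_signals(message: str) -> dict[str, int]:
    """Count weak implicit-knowledge cues per category in one message."""
    text = message.lower()
    return {
        category: sum(len(re.findall(p, text)) for p in patterns)
        for category, patterns in MARKERS.items()
    }

print(implicit_signals(
    "The so-called experts claim it is contained, but who knows what is spreading next."
))
# -> {'uncertainty': 1, 'distrust': 1, 'escalation': 2}
```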
4. Discussion
4.1. Challenges in the Application of LLMs
4.2. Methods of Modern Linguistics
4.2.1. Syntactic Analysis
4.2.2. Semantic Analysis
4.2.3. Pragmatic Analysis
4.3. Linguistic Principles and Their Influence on LLMs
The Principle of Compositionality
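In its standard formulation, the principle states that the meaning of a complex expression is determined by the meanings of its immediate constituents and the way they are syntactically combined. Schematically, for constituents α and β combined into one expression:

⟦α β⟧ = f(⟦α⟧, ⟦β⟧)

where ⟦·⟧ is the interpretation function and f is the composition rule associated with the syntactic configuration. Transformer-based LLMs do not implement f explicitly; whether their learned representations respect this schema remains an open evaluation question, which is what makes the principle relevant to their assessment.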
4.4. Integration of LLMs into Linguistic Research
- TextAnalyst 2.32
- ChatGPT (o1 and o1-mini)
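TextAnalyst 2.32 is a desktop application without a public scripting interface, but the ChatGPT side of such a workflow can be reproduced programmatically. Below is a minimal sketch using the official OpenAI Python client; the prompt wording is an assumption for illustration and is not the study's protocol:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_post(post: str) -> str:
    """Ask the model to separate explicit claims from implicit signals.

    The prompt below is an illustrative assumption, not the study's protocol.
    """
    response = client.chat.completions.create(
        model="o1-mini",  # one of the models named in Section 4.4
        messages=[
            {
                "role": "user",  # o1-series models accept user messages only
                "content": (
                    "List (1) the explicit claims and (2) the implicit "
                    f"attitudes or concerns in this social media post:\n{post}"
                ),
            }
        ],
    )
    return response.choices[0].message.content

print(classify_post("Officials say the quarantine is temporary. Sure it is."))
```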
5. Conclusions
5.1. Explicit Information Analysis
5.2. Implicit Information Analysis
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix B
Appendix C
Appendix D
References
- Hulstijn, J.H. Theoretical and empirical issues in the study of implicit and explicit second-language learning. Stud. Second Lang. Acquis. 2005, 27, 129–140. [Google Scholar]
- DeKeyser, R.M. Cognitive–psychological processes in second language learning. In The Handbook of Language Teaching; Long, M., Doughty, C., Eds.; Wiley-Blackwell: Oxford, UK, 2009; pp. 119–138. [Google Scholar]
- Dörnyei, Z. The Psychology of Second Language Acquisition; Oxford University Press: New York, NY, USA, 2009. [Google Scholar]
- Reber, A.S. Implicit Learning and Tacit Knowledge: An Essay on the Cognitive Unconscious; Clarendon Press: Oxford, UK, 1993. [Google Scholar]
- Williams, J.N. Implicit learning in second language acquisition. In The New Handbook of Second Language Acquisition; Ritchie, W., Bhatia, T.K., Eds.; Emerald Group Publishing: Bingley, UK, 2012; pp. 319–344. [Google Scholar]
- Suzuki, Y.; DeKeyser, R.M. The interface of explicit and implicit knowledge in a second language: Insights from individual differences in cognitive aptitudes. Lang. Learn. 2017, 67, 747–790. [Google Scholar] [CrossRef]
- Suzuki, Y. Validity of new measures of implicit knowledge: Distinguishing implicit knowledge from automatized explicit knowledge. Appl. Psycholinguist. 2017, 38, 1229–1261. [Google Scholar] [CrossRef]
- Dingli, A. Knowledge Annotation: Making Implicit Knowledge Explicit; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
- Halevy, A.; Norvig, P.; Pereira, F. The unreasonable effectiveness of data. IEEE Intell. Syst. 2009, 24, 8–12. Available online: https://gwern.net/doc/ai/scaling/2009-halevy.pdf (accessed on 1 December 2024). [Google Scholar]
- Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. arXiv 2017, arXiv:1707.02968v2. [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar] [CrossRef]
- Arisoy, E.; Sainath, T.N.; Kingsbury, B.; Ramabhadran, B. Deep neural network language models. In Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, Montreal, QC, Canada, 8 June 2012. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- Hu, S. Detecting concealed information in text and speech. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 402–412. [Google Scholar]
- Hadi, M.; Qasem, U.; Tashi, A.; Qureshi, R. A Survey on Large Language Models: Applications, Challenges, Limitations, and Practical Usage. TechRxiv. 2023. Available online: https://arxiv.org/pdf/2303.18223 (accessed on 1 January 2024).
- Nordling, L. How ChatGPT is transforming the postdoc experience. Nature 2023, 622, 655–657. [Google Scholar] [PubMed]
- Griffin, C.; Wallace, D.; Mateos-Garcia, J.; Schieve, H.; Kohli, P. A New Golden Age of Discovery. Seizing the AI for Science Opportunity. 2024. Available online: https://deepmind.google/public-policy/ai-for-science/ (accessed on 1 December 2024).
- Tang, J.; LeBel, A.; Jain, S.; Huth, A.G. Semantic reconstruction of continuous language from non-invasive brain recordings. Nat. Neurosci. 2023, 26, 858–866. [Google Scholar] [CrossRef] [PubMed]
- Zeng, Y.; Lin, H.; Zhang, J.; Yang, D.; Jia, R.; Shi, W. How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs. arXiv 2024, arXiv:2401.06373v2. [Google Scholar]
- Wei, J.; Yang, C.; Song, X.; Lu, Y.; Hu, N.; Huang, J.; Tran, D.; Peng, D.; Liu, R.; Huang, D.; et al. Long-form factuality in large language models. arXiv 2024, arXiv:2403.18802v4. [Google Scholar] [CrossRef]
- Kharlamov, A.A.; Pilgun, M.A. Cognitive Studies in the Interpretation of Social Media Data: TextAnalyst and ChatGPT. Pattern Recognit. Image Anal. 2024, 34, 597–609. [Google Scholar] [CrossRef]
- Pilgun, M.; Koreneva, O. Información implícita y explícita en la percepción del covid-19 en los medios de comunicación social en español, alemán y ruso [Implicit and explicit information in the perception of COVID-19 in social media in Spanish, German, and Russian]. Palabra Clave 2022, 25, e2513. [Google Scholar]
- Opitz, J.; Wein, S.; Schneider, N. Natural Language Processing RELIES on Linguistics. arXiv 2024, arXiv:2405.05966v3. [Google Scholar] [CrossRef]
- Kim, H.; Sclar, M.; Zhou, X.; Le Bras, R.; Kim, G.; Choi, Y.; Sap, M. FANToM: A benchmark for stress-testing machine theory of mind in interactions. arXiv 2023, arXiv:2310.15421. [Google Scholar]
- Kim, S.; Suk, J.; Cho, J.Y.; Longpre, S.; Kim, C.; Yoon, D.; Son, G.; Cho, Y.; Shafayat, S.; Baek, J.; et al. Fine-grained Evaluation of Language Models with Language Models. arXiv 2024, arXiv:2406.05761. [Google Scholar]
| Parameter | Data |
|---|---|
| Messages | 2061 |
| Authors | 1454 |
| Tokens | 1,316,387 |
| Engagement | 108,430 |
| Audience | 39,454,014 |
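For orientation, the table implies a few simple derived ratios (computed here for illustration; they are not figures reported in the paper): roughly 639 tokens and 53 engagement actions per message, and about 1.4 messages per author.

```python
# Derived ratios from the corpus parameters above -- illustrative only.
messages, authors = 2061, 1454
tokens, engagement = 1_316_387, 108_430

print(f"tokens per message:     {tokens / messages:.1f}")      # ~638.7
print(f"messages per author:    {messages / authors:.2f}")     # ~1.42
print(f"engagement per message: {engagement / messages:.1f}")  # ~52.6
```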
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pilgun, M. Explicit and Implicit Knowledge in Large-Scale Linguistic Data and Digital Footprints from Social Networks. Big Data Cogn. Comput. 2025, 9, 75. https://doi.org/10.3390/bdcc9040075