Leveraging Large Language Models for Sentiment Analysis and Investment Strategy Development in Financial Markets
Abstract
1. Introduction
- Can open-source LLMs provide stability and profitability for portfolios based on financial sentiment analysis?
- Can performance enhancement techniques such as chain-of-thought (CoT) prompting, SuperICL, and LLM bootstrapping improve the portfolio performance of open-source LLMs in financial sentiment analysis? (An illustrative SuperICL-style prompt is sketched after this list.)
- How do the reliability and explainability of generative LLMs’ analysis results contribute to the subsequent development of investment strategies and advancements in research?
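Of the techniques named above, SuperICL (Xu et al.; see References) augments a generative LLM's prompt with the prediction and confidence of a small supervised plug-in model, such as a fine-tuned BERT. The sketch below is an illustrative template under that assumption; the paper's exact prompt wording is not reproduced in this excerpt.

```python
def super_icl_prompt(headline: str, plugin_label: str, plugin_conf: float) -> str:
    """SuperICL-style prompt: inject a small model's prediction as context.

    `plugin_label`/`plugin_conf` would come from a fine-tuned BERT sentiment
    classifier; the wording here is hypothetical, not the paper's template.
    """
    return (
        "You are a financial sentiment analyst.\n"
        f"News headline: {headline}\n"
        f"A fine-tuned BERT classifier predicts: {plugin_label} "
        f"(confidence {plugin_conf:.2f}).\n"
        "Considering both the headline and the classifier's prediction, "
        "answer with exactly one word: positive, negative, or neutral."
    )

# Example: super_icl_prompt("Apple beats earnings estimates", "positive", 0.97)
```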
2. Related Works
2.1. Language Models
2.2. Sentiment Analysis
2.3. Prompting Performance Enhancement Techniques
3. Investment Strategy Design and Evaluation Methodology
3.1. Portfolio Construction Based on LLM Sentiment Analysis
3.2. Portfolio Performance Measurement Metrics
- Peak value: The highest asset value reached by the portfolio during a specific period.
- Trough value: The lowest asset value reached after the peak within the same period.
- Final portfolio value: The asset value of the portfolio at the end of the study period.
- Initial investment: The investment amount at the start of the study period, normalized to 1. (A minimal computation of the resulting metrics is sketched below.)
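These four quantities determine the reported metrics: MDD = (trough value − peak value) / peak value is the largest peak-to-trough loss, and the final return is the final portfolio value divided by the initial investment, minus 1. The following is a minimal Python sketch, with one assumption not stated in this excerpt: the Sharpe ratio annualizes the daily mean/standard deviation by √252 with a zero risk-free rate, which is consistent with the reported tables (e.g., 0.0423/1.5575 × √252 ≈ 0.43 for the FinBERT long portfolio).

```python
import numpy as np

def portfolio_metrics(daily_returns, trading_days=252):
    """Metrics of Section 3.2, computed from a series of daily simple returns.

    Assumption (not stated in this excerpt): the Sharpe ratio is the daily
    mean/std annualized by sqrt(252) with a zero risk-free rate, which
    reproduces the reported tables.
    """
    r = np.asarray(daily_returns, dtype=float)
    value = np.cumprod(1.0 + r)           # portfolio value; initial investment = 1
    peak = np.maximum.accumulate(value)   # running peak value
    mdd = ((value - peak) / peak).min()   # worst peak-to-trough drawdown (negative)
    return {
        "daily_avg_return_pct": 100 * r.mean(),
        "std_dev_pct": 100 * r.std(ddof=1),
        "sharpe_ratio": r.mean() / r.std(ddof=1) * np.sqrt(trading_days),
        "mdd_pct": 100 * mdd,
        "final_return_pct": 100 * (value[-1] - 1.0),  # final value / initial - 1
    }
```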
4. Experimental Design
4.1. Experimental Data
4.2. LLMs Used in Sentiment Analysis
4.3. Prompt Configuration
5. Experimental Results
6. Explainability Analysis of LLMs
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
References
- Tetlock, P.C. Giving content to investor sentiment: The role of media in the stock market. J. Financ. 2007, 62, 1139–1168.
- Alanyali, M.; Moat, H.S.; Preis, T. Quantifying the relationship between financial news and the stock market. Sci. Rep. 2013, 3, 3578.
- Bollen, J.; Mao, H.; Pepe, A. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. In Proceedings of the International AAAI Conference on Web and Social Media, Barcelona, Spain, 17–21 July 2011; Volume 5, pp. 450–453.
- Bollen, J.; Mao, H.; Zeng, X. Twitter mood predicts the stock market. J. Comput. Sci. 2011, 2, 1–8.
- Chan, W.S. Stock price reaction to news and no-news: Drift and reversal after headlines. J. Financ. Econ. 2003, 70, 223–260.
- Malo, P.; Sinha, A.; Takala, P.; Ahlgren, O.; Lappalainen, I. Learning the roles of directional expressions and domain concepts in financial news analysis. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining Workshops, Dallas, TX, USA, 7–10 December 2013; pp. 945–954.
- Brown, T.B. Language models are few-shot learners. arXiv 2020, arXiv:2005.14165.
- Deng, X.; Bashlovkina, V.; Han, F.; Baumgartner, S.; Bendersky, M. LLMs to the Moon? Reddit Market Sentiment Analysis with Large Language Models. In Companion Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 1014–1019.
- Lopez-Lira, A.; Tang, Y. Can ChatGPT forecast stock price movements? Return predictability and large language models. arXiv 2023, arXiv:2304.07619.
- Jang, E.; Choi, H.; Lee, H. Stock prediction using combination of BERT sentiment analysis and macro economy index. J. Korea Soc. Comput. Inf. 2020, 25, 47–56.
- Bendi-Ouis, Y.; Dutarte, D.; Hinaut, X. Deploying Open-Source Large Language Models: A Performance Analysis. arXiv 2024, arXiv:2409.14887.
- Konstantinidis, T.; Iacovides, G.; Xu, M.; Constantinides, T.G.; Mandic, D. FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications. arXiv 2024, arXiv:2403.12285.
- Mai, Z.; Zhang, J.; Xu, Z.; Xiao, Z. Financial sentiment analysis meets Llama 3: A comprehensive analysis. In Proceedings of the 2024 7th International Conference on Machine Learning and Machine Intelligence (MLMI), Osaka, Japan, 2–4 August 2024; pp. 171–175.
- Brown, P.F.; Della Pietra, V.J.; Desouza, P.V.; Lai, J.C.; Mercer, R.L. Class-based n-gram models of natural language. Comput. Linguist. 1992, 18, 467–480.
- Cavnar, W.B.; Trenkle, J.M. N-gram-based text categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, USA, 11–13 April 1994; pp. 161–175.
- Kondrak, G. N-gram similarity and distance. In International Symposium on String Processing and Information Retrieval; Springer: Berlin/Heidelberg, Germany, 2005; pp. 115–126.
- Li, Y.H.; Jain, A.K. Classification of text documents. Comput. J. 1998, 41, 537–546.
- Joachims, T. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the European Conference on Machine Learning, Chemnitz, Germany, 21–23 April 1998; Springer: Berlin/Heidelberg, Germany, 1998; pp. 137–142.
- Lee, J.Y.; Dernoncourt, F. Sequential short-text classification with recurrent and convolutional neural networks. arXiv 2016, arXiv:1603.03827.
- Vaswani, A. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving language understanding by generative pre-training. OpenAI 2018. Available online: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (accessed on 18 April 2025).
- Ghatoura, P.S.; Hosseini, S.E.; Pervez, S.; Iqbal, M.J.; Shaukat, N. Sentiment Analysis of Product Reviews Using Machine Learning and Pre-Trained LLM. Big Data Cogn. Comput. 2024, 8, 199.
- Wawer, M.; Chudziak, J.A.; Niewiadomska-Szynkiewicz, E. Large Language Models and the Elliott Wave Principle: A Multi-Agent Deep Learning Approach to Big Data Analysis in Financial Markets. Appl. Sci. 2024, 14, 11897.
- Delgadillo, J.; Kinyua, J.; Mutigwe, C. FinSoSent: Advancing Financial Market Sentiment Analysis through Pretrained Large Language Models. Big Data Cogn. Comput. 2024, 8, 87.
- Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; Casas, D.D.L.; Sayed, W.E. Mistral 7B. arXiv 2023, arXiv:2310.06825.
- Gemma Team; Mesnard, T.; Hardin, C.; Dadashi, R.; Bhupatiraju, S.; Pathak, S.; Kenealy, K. Gemma: Open models based on Gemini research and technology. arXiv 2024, arXiv:2403.08295.
- Onwuegbuche, F.C.; Wafula, J.M.; Mung’atu, J.K. Support Vector Machine for Sentiment Analysis of Nigerian Banks’ Financial Tweets. J. Data Anal. Inf. Process. 2019, 7, 153–170.
- Antweiler, W.; Frank, M.Z. Is all that talk just noise? The information content of internet stock message boards. J. Financ. 2004, 59, 1259–1294.
- Sun, Y.; Liu, X.; Chen, G.; Hao, Y.; Zhang, Z.J. How mood affects the stock market: Empirical evidence from microblogs. Inf. Manag. 2020, 57, 103181.
- Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 2022, 35, 22199–22213.
- Xu, C.; Xu, Y.; Wang, S.; Liu, Y.; Zhu, C.; McAuley, J. Small models are valuable plug-ins for large language models. arXiv 2023, arXiv:2305.08848.
- Wu, R. Portfolio Performance Based on LLM News Scores and Related Economical Analysis. SSRN 2024, 4709617. Available online: http://dx.doi.org/10.2139/ssrn.4709617 (accessed on 18 April 2025).
- Sahoo, A.; Chanda, R.; Das, N.; Sadhukhan, B. Comparative Analysis of BERT Models for Sentiment Analysis on Twitter Data. In Proceedings of the 2023 9th International Conference on Smart Computing and Communications (ICSCC), Kochi, India, 17–19 August 2023; pp. 658–663.
- Liu, Z.; Huang, D.; Huang, K.; Li, Z.; Zhao, J. FinBERT: A pre-trained financial language representation model for financial text mining. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 7–15 January 2021; pp. 4513–4519.
- Soleimanian, M. Board Environmental, Social, and Governance (ESG) Expertise and the Usefulness of ESG Reports. McGill University, 2024. Available online: https://www.mcgill.ca/desautels/files/desautels/board_environmental_social_and_governance.pdf (accessed on 18 April 2025).
No. | Ticker | Company Name (Location) | News Count | No. | Ticker | Company Name (Location) | News Count | No. | Ticker | Company Name (Location) | News Count |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | AAPL | Apple Inc. (Cupertino, CA, USA) | 598 | 11 | BKNG | Booking Holdings Inc. (Norwalk, CT, USA) | 763 | 21 | META | Meta Platforms, Inc. (Menlo Park, CA, USA) | 742 |
2 | ADBE | Adobe Inc. (San Jose, CA, USA) | 146 | 12 | CMCSA | Comcast Corporation (Philadelphia, PA, USA) | 77 | 22 | MSFT | Microsoft Corporation (Redmond, WA, USA) | 583 |
3 | AMAT | Applied Materials, Inc. (Santa Clara, CA, USA) | 354 | 13 | COST | Costco Wholesale Corporation (Issaquah, WA, USA) | 743 | 23 | NFLX | Netflix, Inc. (Los Gatos, CA, USA) | 255 |
4 | AMD | Advanced Micro Devices, Inc. (Santa Clara, CA, USA) | 745 | 14 | CSCO | Cisco Systems, Inc. (San Jose, CA, USA) | 212 | 24 | NVDA | NVIDIA Corporation (Santa Clara, CA, USA) | 426 |
5 | AMGN | Amgen Inc. (Thousand Oaks, CA, USA) | 81 | 15 | GOOG | Alphabet Inc. (Mountain View, CA, USA) | 339 | 25 | PDD | PDD Holdings Inc. (Shanghai, China) | 426 |
6 | AMZN | Amazon.com, Inc. (Seattle, WA, USA) | 731 | 16 | GOOGL | Alphabet Inc. (Mountain View, CA, USA) | 381 | 26 | PEP | PepsiCo, Inc. (Harrison, NY, USA) | 756 |
7 | ARM | Arm Holdings plc (Cambridge, UK) | 759 | 17 | HON | Honeywell International Inc. (Charlotte, NC, USA) | 758 | 27 | QCOM | Qualcomm Incorporated (San Diego, CA, USA) | 715 |
8 | ASML | ASML Holding N.V. (Veldhoven, Netherlands) | 460 | 18 | INTU | Intuit Inc. (Mountain View, CA, USA) | 195 | 28 | TMUS | T-Mobile US, Inc. (Bellevue, WA, USA) | 437 |
9 | AVGO | Broadcom Inc. (San Jose, CA, USA) | 122 | 19 | ISRG | Intuitive Surgical, Inc. (Sunnyvale, CA, USA) | 85 | 29 | TSLA | Tesla, Inc. (Austin, TX, USA) | 692 |
10 | AZN | AstraZeneca plc (Cambridge, UK) | 242 | 20 | LIN | Linde plc (Guildford, UK) | 766 | 30 | TXN | Texas Instruments Incorporated (Dallas, TX, USA) | 107 |
Model Type | Model Name | Manufacturer (Location) | Release Year | Number of Parameters | Advantages | Disadvantages | Features |
---|---|---|---|---|---|---|---|
Discriminative model | BERT-Sentiment | NLP Town (Lubbeek, Belgium) | 2021 | 167.36 M | Excellent for multilingual sentiment analysis and classifying sentiments as positive, negative, or neutral | Optimized for short texts such as Twitter; limited in analyzing long texts | Based on BERT, optimized for sentiment analysis of user reviews and social media texts |
Discriminative model | FinBERT-tone | Hong Kong University of Science and Technology (Kowloon, Hong Kong) | 2021 | 109.75 M | Excellent performance in English sentiment analysis specialized for the financial domain | Performance may degrade when applied to texts outside the financial field | Based on BERT, fine-tuned to analyze sentiments (positive, negative, or neutral) in financial texts |
Discriminative model | RoBERTa-Finance | Concordia University (Montreal, QC, Canada) | 2021 | 355.36 M | Optimized performance for analyzing financial news and reports | Limited when applied to texts outside the financial domain | RoBERTa model trained for financial text sentiment analysis |
Generative model | Llama 3.1/3.2 | Meta (Menlo Park, CA, USA) | 2024 | 3 B, 8 B | High performance, provided free as open source | May be inferior in performance compared with some competitive models | Uses a tokenizer with a 128,000-token vocabulary and Grouped-Query Attention |
Generative model | Mistral | Mistral AI (Paris, France) | 2023 | 7 B | High performance with a lightweight model | Limited in complex tasks compared with ultra-large models | Suitable for real-time data analysis and conversational AI applications |
Generative model | Gemma 2 | Google (Mountain View, CA, USA) | 2024 | 27 B | Excellent sentiment analysis capabilities across various languages and cultural contexts | May be limited in certain advanced analytical tasks | Outstanding performance in text data processing and long text analysis |
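For the discriminative models in the table, sentiment labels can be obtained directly from public checkpoints via the Hugging Face transformers pipeline. The snippet below is an illustrative sketch rather than the paper's exact setup; yiyanghkust/finbert-tone is a public checkpoint that matches the FinBERT-tone description above.

```python
from transformers import pipeline

# Public FinBERT-tone checkpoint (assumed equivalent to the table's model).
clf = pipeline("text-classification", model="yiyanghkust/finbert-tone")

result = clf("Quarterly revenue beat expectations, and guidance was raised.")
print(result)  # e.g., [{'label': 'Positive', 'score': 0.99}]
```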
Model Name | Portfolio Strategy | Daily Average Return (%) | Standard Deviation (%) | Sharpe Ratio | MDD (%) | Final Return (%) |
---|---|---|---|---|---|---|
FinBERT | Long | 0.0423 | 1.5575 | 0.4316 | −45.84 | 26.60 |
FinBERT | Short | −0.0759 | 1.5989 | −0.7538 | −59.93 | −50.03 |
FinBERT | Long–Short | −0.0168 | 0.8203 | −0.3250 | −34.12 | −14.57 |
BERT | Long | 0.0428 | 1.2055 | 0.5637 | −31.01 | 31.98 |
BERT | Short | −0.0444 | 1.3478 | −0.5225 | −46.23 | −34.14 |
BERT | Long–Short | −0.0008 | 0.4075 | −0.0302 | −16.47 | −1.25 |
RoBERTa-Finance | Long | 0.0459 | 1.2625 | 0.5772 | −26.39 | 34.48 |
RoBERTa-Finance | Short | −0.0328 | 1.6084 | −0.3235 | −42.60 | −30.08 |
RoBERTa-Finance | Long–Short | 0.0066 | 0.6252 | 0.1667 | −17.17 | 3.67 |
Llama 3.1 | Long | 0.0526 | 1.2975 | 0.6433 | −25.48 | 41.17 |
Llama 3.1 | Short | −0.0320 | 1.5579 | −0.3264 | −39.54 | −29.21 |
Llama 3.1 | Long–Short | 0.0103 | 0.5515 | 0.2957 | −15.80 | 7.07 |
Llama 3.2 | Long | 0.0104 | 1.3729 | 0.1208 | −39.65 | 0.79 |
Llama 3.2 | Short | −0.0586 | 1.4729 | −0.6311 | −50.25 | −41.89 |
Llama 3.2 | Long–Short | −0.0241 | 0.5859 | −0.6519 | −32.05 | −18.24 |
Mistral | Long | 0.0122 | 1.4916 | 0.1300 | −49.76 | 0.83 |
Mistral | Short | −0.0275 | 1.5135 | −0.2883 | −43.20 | −26.24 |
Mistral | Long–Short | −0.0076 | 0.6752 | −0.1795 | −24.31 | −7.45 |
Gemma 2 | Long | 0.0546 | 1.3706 | 0.6329 | −26.05 | 42.39 |
Gemma 2 | Short | −0.0430 | 1.5237 | −0.4478 | −45.01 | −34.74 |
Gemma 2 | Long–Short | 0.0058 | 0.6654 | 0.1391 | −12.37 | 2.86 |
Nasdaq 30 | Nasdaq 30 | 0.0406 | 1.1772 | 0.5476 | −21.19 | 30.10 |
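The Long, Short, and Long–Short rows aggregate per-stock sentiment into daily portfolio returns. The sketch below assumes equal-weighted positions rebalanced daily (longs in positive-sentiment stocks, shorts in negative-sentiment stocks, and a 50/50 blend for Long–Short); the paper's exact weighting and rebalancing rules are not shown in this excerpt.

```python
import pandas as pd

def daily_strategy_returns(sentiment: pd.DataFrame, returns: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-stock sentiment labels into daily strategy returns.

    `sentiment` and `returns` are date x ticker frames: sentiment holds
    "positive"/"negative"/"neutral" labels, and returns holds the next day's
    stock returns. Equal weighting and the 50/50 long-short blend are
    assumptions, not the paper's documented rules.
    """
    long_ret = returns[sentiment == "positive"].mean(axis=1)    # mean over longs
    short_ret = -returns[sentiment == "negative"].mean(axis=1)  # gains when shorts fall
    long_short = 0.5 * (long_ret + short_ret)                   # 50/50 combination
    return pd.DataFrame(
        {"Long": long_ret, "Short": short_ret, "Long-Short": long_short}
    ).fillna(0.0)  # days with no qualifying stocks earn zero
```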
Model Name | Portfolio Strategy | Daily Average Return (%) | Standard Deviation (%) | Sharpe Ratio | MDD (%) | Final Return (%) |
---|---|---|---|---|---|---|
Llama 3.2 BERT-ICL | Long | 0.0637 | 1.2440 | 0.8126 | −27.57 | 54.78 |
Llama 3.1 BERT-ICL | Long | 0.0625 | 1.2366 | 0.8025 | −22.63 | 53.49 |
Llama 3.2 CoT | Long | 0.0609 | 1.2655 | 0.7638 | −30.84 | 51.12 |
Llama 3.2 FinBERT-ICL | Long | 0.0722 | 1.5267 | 0.7511 | −32.18 | 60.50 |
Mistral BERT-ICL | Long | 0.0574 | 1.2241 | 0.7447 | −25.79 | 47.68 |
Nasdaq 30 | Nasdaq 30 | 0.0406 | 1.1772 | 0.5476 | −21.19 | 30.10 |
Model Name | Portfolio Strategy | Daily Average Return (%) | Standard Deviation (%) | Sharpe Ratio | MDD (%) | Final Return (%) |
---|---|---|---|---|---|---|
Llama 3.1 BERT-BOOTICL | Long | 0.0641 | 1.2194 | 0.8346 | −27.14 | 55.67 |
Llama 3.2 BERT-ICL | Long | 0.0636 | 1.2439 | 0.8125 | −27.56 | 54.77 |
Llama 3.1 BERT-ICL | Long | 0.0625 | 1.2366 | 0.8025 | −22.63 | 53.49 |
Llama 3.2 BERT-BOOTICL | Long | 0.0588 | 1.2169 | 0.7677 | −27.88 | 49.44 |
Nasdaq 30 | Nasdaq 30 | 0.0406 | 1.1772 | 0.5476 | −21.19 | 30.10 |
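BOOTICL denotes the LLM-bootstrapping variants in this table. The exact bootstrapping procedure is not described in this excerpt; one common pattern, sketched hypothetically below, is to let the model label a pool of headlines first and then reuse its most confident self-generated (headline, label) pairs as in-context demonstrations for later queries.

```python
def bootstrap_demonstrations(llm_label, headlines, k=4, threshold=0.9):
    """Hypothetical bootstrapping step: keep the LLM's own most confident
    labels as few-shot demonstrations. `llm_label(text)` is assumed to
    return a (label, confidence) pair."""
    scored = [(h, *llm_label(h)) for h in headlines]          # (text, label, conf)
    confident = [s for s in scored if s[2] >= threshold]      # filter by confidence
    confident.sort(key=lambda s: s[2], reverse=True)
    return [(h, lbl) for h, lbl, _ in confident[:k]]          # top-k demonstrations

def prompt_with_demonstrations(demos, query):
    """Assemble a few-shot prompt from bootstrapped demonstrations."""
    blocks = [f"Headline: {h}\nSentiment: {lbl}" for h, lbl in demos]
    blocks.append(f"Headline: {query}\nSentiment:")
    return "\n\n".join(blocks)
```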
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).