Hallucination Mitigation for Retrieval-Augmented Large Language Models: A Review
Abstract
1. Introduction
1.1. Comparison with Existing Reviews and Surveys
1.2. Organization of the Review
2. Concepts and Definitions
2.1. Retrieval-Augmented Generation
2.2. Hallucinations or Confabulation in Large Language Models
2.3. Common Prompt Techniques
- Clear and Precise Prompts
- Role-Playing Prompts
- Few-Shot Prompts
- Chain-of-Thought (CoT) Prompts
- Program-of-Thought (PoT) Prompts
- Opinion-Based Prompts
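The prompt techniques listed above can be illustrated as plain string templates. The sketch below is not taken from the review; the wording of each template and the example question are illustrative assumptions, but the few-shot, chain-of-thought, and opinion-based patterns follow their standard formulations.

```python
# Illustrative templates for three of the prompt techniques above.
question = "If a train travels 120 km in 2 hours, what is its average speed?"

# Few-shot prompt: prepend worked demonstrations before the real question.
few_shot = (
    "Q: What is 3 + 4?\nA: 7\n"
    "Q: What is 10 - 6?\nA: 4\n"
    f"Q: {question}\nA:"
)

# Chain-of-thought (CoT) prompt: elicit step-by-step reasoning.
cot = f"Q: {question}\nA: Let's think step by step."

# Opinion-based prompt: frame retrieved context as a narrator's statement,
# nudging the model to answer from the context rather than parametric memory.
context = "The train covered 120 km between 9:00 and 11:00."
opinion_based = (
    f'Bob said, "{context}"\n'
    f"Q: In Bob's opinion, {question}\nA:"
)

print(cot)
```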
3. Hallucination Causes in Retrieval-Augmented Generation Process
3.1. Retrieval Failure
- Data Source Problem
- Query Problem
- Retriever Problem
- Retrieval Strategy Problem
3.2. Generation Deficiency
- Context Noise
- Context Conflict
- Middle Curse
- Alignment Problem
- Capability Boundary
4. Solutions to Retrieval Failure
4.1. Data Source Reliability
4.2. User Query Construction
4.2.1. Ambiguous Queries
4.2.2. Complex Queries
4.3. Retriever Validity
4.3.1. Search Granularity
4.3.2. Embedding
4.4. Retrieval Strategies
5. Solutions to Generation Deficiency
5.1. Reduce Context Noise
5.2. Balance Knowledge Conflicts
5.3. Utilize Middle Information
5.4. Alignment Resolution
5.4.1. Context Faithfulness
5.4.2. Query Faithfulness
5.5. Integrate External Resources
6. Mitigation After Hallucination Detection
7. Conclusions
| Category | Method | Metric | Dataset | Challenges |
|---|---|---|---|---|
| Data Source | CAG [54] | EM; Noise Ratio; Document Credibility | HotpotQA [122]; 2WikiMQA [123]; MuSiQue [124]; ASQA [125]; RGB [126]; EvolvTempQA [54]; NewsPolluQA [54] | Reliance on additional training data and annotations; limited scope for external resource integration |
| Query | RETPO [65] | MRR; NDCG; Recall@k; Clarity; Conciseness; Informativeness | QReCC [62]; TopiOCQA [63] | Limited testing scope; limited prompting methods |
| Query | BeamAggR [70] | F1 | Bamboogle [127]; MuSiQue [124]; HotpotQA [122]; 2WikiMQA [123] | Multi-source reasoning and probabilistic aggregation raise computational costs; external knowledge is unstructured |
| Retriever | KGP [79] | Acc; EM; F1; PDFTriage; Struct-EM | MuSiQue [124]; HotpotQA [122]; 2WikiMQA [123]; IIRC [128]; PDFTriage [129] | Details not provided |
| Retriever | GRIT [81] | Acc; MAP; AP; V-Measure; nDCG; SC; EM; pass@1; Win Ratio | E5 [130]; Tülu2 [131]; MMLU [4]; TyDi QA [132]; GSM8K [133]; MTEB [134]; HumanEvalSyn [135]; AlpacaEval [136] | Lack of autonomous retrieval capability; insufficient pre-training data; lengthy GRITLM format |
| Retrieval Strategy | Adaptive RAG [90] | F1; EM; Acc | SQuAD [137]; NQ [100]; TabularQA [138]; MuSiQue [124]; HotpotQA [122]; 2WikiMQA [123] | Lack of datasets for training the query complexity classifier; suboptimal classifier performance |
| Context Noise | CoN [99] | EM; F1; Reject Ratio; Acc | NQ [100]; TriviaQA [139]; WebQuestions [140]; RealTimeQA [141] | Sequential generation of reading notes extends response times |
| Knowledge Conflicts | COMBO [104] | EM; F1 | NQ [100]; TriviaQA [139]; HotpotQA [122]; WebQ [140] | Lack of adaptability to multi-hop question answering tasks |
| Utilize Context | SPC [105] | Processing Time; Acc; Cost; Compression Rate | CNN/Daily Mail [142]; SST-2 [143]; AG News [144]; SQuAD v2.0 [137] | Need to balance efficiency and thoroughness to avoid losing important information through excessive compression |
| Alignment | Plan-based Text Generation [108] | Correctness; Attribution; ROUGE-L; Answerability; AutoAIS; ANLI | AQuAMuSe [145]; NQ [100] | Explore decoder-only models and passage indexing; use questions as additional training signals for retrieval |
| Alignment | Mix-Align [111] | Gold Answer Coverage; Accepted; Hallucination | FuzzyQA [111] | Clarification steps add computational load and time cost; no causal path links question, evidence, and answer |
| External Resource | PAL [48] | Acc | GSM8K [133]; SVAMP [146]; ASDIV [147]; MAWPS [148]; BIG-BenchHard [149] | Details not provided |
| Mitigation after Detection | Active Detection [115] | Precision; Hallucination Ratio; Recall; Mitigation Success Ratio | HotpotQA [122]; Manual | Details not provided |
| Mitigation after Detection | Entropy Word [20] | HVI; Hallucination Ratio; Mitigation Success Rate | HILT [20] | Details not provided |
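The Metric column above is dominated by Exact Match (EM) and token-level F1. As a point of reference, the sketch below (helper names are our own, not from the review) implements these two metrics with the common SQuAD-style answer normalization: lowercasing, stripping punctuation and English articles, and collapsing whitespace.

```python
# Minimal EM / token-level F1 sketch with SQuAD-style normalization.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall over normalized answers."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(round(f1_score("Paris, France", "Paris"), 2))     # 0.67
```

Benchmarks such as HotpotQA and NQ ship their own evaluation scripts; this sketch only mirrors the typical normalization conventions those scripts apply.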
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Team, G.; Anil, R.; Borgeaud, S.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; Millican, K.; et al. Gemini: A family of highly capable multimodal models. arXiv 2023, arXiv:2312.11805. [Google Scholar]
- Hendrycks, D.; Burns, C.; Basart, S.; Zou, A.; Mazeika, M.; Song, D.; Steinhardt, J. Measuring massive multitask language understanding. arXiv 2020, arXiv:2009.03300. [Google Scholar]
- Zhang, T.; Ladhak, F.; Durmus, E.; Liang, P.; McKeown, K.; Hashimoto, T.B. Benchmarking large language models for news summarization. Trans. Assoc. Comput. Linguist. 2024, 12, 39–57. [Google Scholar] [CrossRef]
- Yu, F.; Zhang, H.; Tiwari, P.; Wang, B. Natural language reasoning, a survey. ACM Comput. Surv. 2024, 56, 1–39. [Google Scholar] [CrossRef]
- Kim, H.J.; Cho, H.; Kim, J.; Kim, T.; Yoo, K.M.; Lee, S.g. Self-generated in-context learning: Leveraging auto-regressive language models as a demonstration generator. arXiv 2022, arXiv:2206.08082. [Google Scholar]
- Morris, M.R. Prompting Considered Harmful. Commun. ACM 2024, 67, 28–30. [Google Scholar] [CrossRef]
- Liu, H.; Xue, W.; Chen, Y.; Chen, D.; Zhao, X.; Wang, K.; Hou, L.; Li, R.; Peng, W. A survey on hallucination in large vision-language models. arXiv 2024, arXiv:2402.00253. [Google Scholar]
- Sahoo, P.; Meharia, P.; Ghosh, A.; Saha, S.; Jain, V.; Chadha, A. A comprehensive survey of hallucination in large language, image, video and audio foundation models. In Findings of the Association for Computational Linguistics: EMNLP 2024; Association for Computational Linguistics: Miami, FL, USA, 2024; pp. 11709–11724. [Google Scholar]
- Agrawal, G.; Kumarage, T.; Alghamdi, Z.; Liu, H. Can Knowledge Graphs Reduce Hallucinations in LLMs?: A Survey. arXiv 2023, arXiv:2311.07914. [Google Scholar]
- Tonmoy, S.; Zaman, S.; Jain, V.; Rani, A.; Rawte, V.; Chadha, A.; Das, A. A comprehensive survey of hallucination mitigation techniques in large language models. arXiv 2024, arXiv:2401.01313. [Google Scholar]
- Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 2023, 43, 1–55. [Google Scholar] [CrossRef]
- Zhang, Y.; Li, Y.; Cui, L.; Cai, D.; Liu, L.; Fu, T.; Huang, X.; Zhao, E.; Zhang, Y.; Chen, Y.; et al. Siren’s song in the AI ocean: A survey on hallucination in large language models. arXiv 2023, arXiv:2309.01219. [Google Scholar]
- Macpherson, F.; Platchias, D. Hallucination: Philosophy and Psychology; EBL-Schweitzer, MIT Press: Cambridge, MA, USA, 2013. [Google Scholar]
- Smith, A.L.; Greaves, F.; Panch, T. Hallucination or confabulation? Neuroanatomy as metaphor in large language models. PLoS Digit. Health 2023, 2, e0000388. [Google Scholar] [CrossRef]
- Weinstein, E.A. Linguistic aspects of amnesia and confabulation. In Principles, Practices, and Positions in Neuropsychiatric Research; Elsevier: Amsterdam, The Netherlands, 1972; pp. 439–444. [Google Scholar]
- Sui, P.; Duede, E.; Wu, S.; So, R.J. Confabulation: The Surprising Value of Large Language Model Hallucinations. arXiv 2024, arXiv:2406.04175. [Google Scholar]
- Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.J.; Madotto, A.; Fung, P. Survey of hallucination in natural language generation. ACM Comput. Surv. 2023, 55, 1–38. [Google Scholar] [CrossRef]
- Rawte, V.; Chakraborty, S.; Pathak, A.; Sarkar, A.; Tonmoy, S.; Chadha, A.; Sheth, A.P.; Das, A. The troubling emergence of hallucination in large language models–an extensive definition, quantification, and prescriptive remediations. arXiv 2023, arXiv:2310.04988. [Google Scholar]
- Barnett, S.; Kurniawan, S.; Thudumu, S.; Brannelly, Z.; Abdelrazek, M. Seven failure points when engineering a retrieval augmented generation system. In Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering-Software Engineering for AI, Lisbon, Portugal, 14–15 April 2024; pp. 194–199. [Google Scholar]
- Shanahan, M.; McDonell, K.; Reynolds, L. Role play with large language models. Nature 2023, 623, 493–498. [Google Scholar] [CrossRef]
- Logan IV, R.L.; Balažević, I.; Wallace, E.; Petroni, F.; Singh, S.; Riedel, S. Cutting down on prompts and parameters: Simple few-shot learning with language models. arXiv 2021, arXiv:2106.13353. [Google Scholar]
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837. [Google Scholar]
- Zhang, Z.; Zhang, A.; Li, M.; Smola, A. Automatic chain of thought prompting in large language models. arXiv 2022, arXiv:2210.03493. [Google Scholar]
- Chen, W.; Ma, X.; Wang, X.; Cohen, W.W. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. arXiv 2022, arXiv:2211.12588. [Google Scholar]
- Zhou, W.; Zhang, S.; Poon, H.; Chen, M. Context-faithful prompting for large language models. arXiv 2023, arXiv:2303.11315. [Google Scholar]
- Izacard, G.; Lewis, P.; Lomeli, M.; Hosseini, L.; Petroni, F.; Schick, T.; Dwivedi-Yu, J.; Joulin, A.; Riedel, S.; Grave, E. Atlas: Few-shot learning with retrieval augmented language models. J. Mach. Learn. Res. 2023, 24, 1–43. [Google Scholar]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474. [Google Scholar]
- Ram, O.; Levine, Y.; Dalmedigos, I.; Muhlgay, D.; Shashua, A.; Leyton-Brown, K.; Shoham, Y. In-context retrieval-augmented language models. Trans. Assoc. Comput. Linguist. 2023, 11, 1316–1331. [Google Scholar] [CrossRef]
- Shi, W.; Min, S.; Yasunaga, M.; Seo, M.; James, R.; Lewis, M.; Zettlemoyer, L.; Yih, W.T. Replug: Retrieval-augmented black-box language models. arXiv 2023, arXiv:2301.12652. [Google Scholar]
- Lazaridou, A.; Gribovskaya, E.; Stokowiec, W.; Grigorev, N. Internet-augmented language models through few-shot prompting for open-domain question answering. arXiv 2022, arXiv:2203.05115. [Google Scholar]
- Borgeaud, S.; Mensch, A.; Hoffmann, J.; Cai, T.; Rutherford, E.; Millican, K.; Van Den Driessche, G.B.; Lespiau, J.B.; Damoc, B.; Clark, A.; et al. Improving language models by retrieving from trillions of tokens. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 2206–2240. [Google Scholar]
- Cuconasu, F.; Trappolini, G.; Siciliano, F.; Filice, S.; Campagnano, C.; Maarek, Y.; Tonellotto, N.; Silvestri, F. The power of noise: Redefining retrieval for rag systems. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 719–729. [Google Scholar]
- Zhao, B.; Brumbaugh, Z.; Wang, Y.; Hajishirzi, H.; Smith, N.A. Set the clock: Temporal alignment of pretrained language models. arXiv 2024, arXiv:2402.16797. [Google Scholar]
- Xu, R.; Qi, Z.; Guo, Z.; Wang, C.; Wang, H.; Zhang, Y.; Xu, W. Knowledge conflicts for llms: A survey. arXiv 2024, arXiv:2403.08319. [Google Scholar]
- Longpre, S.; Perisetla, K.; Chen, A.; Ramesh, N.; DuBois, C.; Singh, S. Entity-based knowledge conflicts in question answering. arXiv 2021, arXiv:2109.05052. [Google Scholar]
- Chen, H.T.; Zhang, M.J.; Choi, E. Rich knowledge sources bring complex knowledge conflicts: Recalibrating models to reflect conflicting evidence. arXiv 2022, arXiv:2210.13701. [Google Scholar]
- Xie, J.; Zhang, K.; Chen, J.; Lou, R.; Su, Y. Adaptive chameleon or stubborn sloth: Revealing the behavior of large language models in knowledge conflicts. arXiv 2023, arXiv:2305.13300. [Google Scholar]
- Perez, E.; Ringer, S.; Lukošiūtė, K.; Nguyen, K.; Chen, E.; Heiner, S.; Pettit, C.; Olsson, C.; Kundu, S.; Kadavath, S.; et al. Discovering language model behaviors with model-written evaluations. arXiv 2022, arXiv:2212.09251. [Google Scholar]
- Turpin, M.; Michael, J.; Perez, E.; Bowman, S. Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting. Adv. Neural Inf. Process. Syst. 2024, 36, 74952–74965. [Google Scholar]
- Wei, J.; Huang, D.; Lu, Y.; Zhou, D.; Le, Q.V. Simple synthetic data reduces sycophancy in large language models. arXiv 2023, arXiv:2308.03958. [Google Scholar]
- Sharma, M.; Tong, M.; Korbak, T.; Duvenaud, D.; Askell, A.; Bowman, S.R.; Cheng, N.; Durmus, E.; Hatfield-Dodds, Z.; Johnston, S.R.; et al. Towards understanding sycophancy in language models. arXiv 2023, arXiv:2310.13548. [Google Scholar]
- He, J.; Pan, K.; Dong, X.; Song, Z.; Liu, Y.; Liang, Y.; Wang, H.; Sun, Q.; Zhang, S.; Xie, Z. Never lost in the middle: Improving large language models via attention strengthening question answering. arXiv 2023, arXiv:2311.09198. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the NIPS 2017, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Chen, H.T.; Xu, F.; Arora, S.; Choi, E. Understanding retrieval augmentation for long-form question answering. arXiv 2023, arXiv:2310.12150. [Google Scholar]
- Tang, R.; Zhang, X.; Ma, X.; Lin, J.; Ture, F. Found in the middle: Permutation self-consistency improves listwise ranking in large language models. arXiv 2023, arXiv:2310.07712. [Google Scholar]
- Gao, L.; Madaan, A.; Zhou, S.; Alon, U.; Liu, P.; Yang, Y.; Callan, J.; Neubig, G. Pal: Program-aided language models. In Proceedings of the International Conference on Machine Learning, PMLR 2023, Honolulu, HI, USA, 23–29 July 2023; pp. 10764–10799. [Google Scholar]
- Wang, X.; Yang, Q.; Qiu, Y.; Liang, J.; He, Q.; Gu, Z.; Xiao, Y.; Wang, W. Knowledgpt: Enhancing large language models with retrieval and storage access on knowledge bases. arXiv 2023, arXiv:2308.11761. [Google Scholar]
- Yu, W.; Iter, D.; Wang, S.; Xu, Y.; Ju, M.; Sanyal, S.; Zhu, C.; Zeng, M.; Jiang, M. Generate rather than retrieve: Large language models are strong context generators. arXiv 2022, arXiv:2209.10063. [Google Scholar]
- Cheng, X.; Luo, D.; Chen, X.; Liu, L.; Zhao, D.; Yan, R. Lift yourself up: Retrieval-augmented text generation with self-memory. Adv. Neural Inf. Process. Syst. 2024, 36, 43780–43799. [Google Scholar]
- Black, S.; Biderman, S.; Hallahan, E.; Anthony, Q.; Gao, L.; Golding, L.; He, H.; Leahy, C.; McDonell, K.; Phang, J.; et al. Gpt-neox-20b: An open-source autoregressive language model. arXiv 2022, arXiv:2204.06745. [Google Scholar]
- Asai, A.; Zhong, Z.; Chen, D.; Koh, P.W.; Zettlemoyer, L.; Hajishirzi, H.; Yih, W.T. Reliable, adaptable, and attributable language models with retrieval. arXiv 2024, arXiv:2403.03187. [Google Scholar]
- Pan, R.; Cao, B.; Lin, H.; Han, X.; Zheng, J.; Wang, S.; Cai, X.; Sun, L. Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation. arXiv 2024, arXiv:2404.06809. [Google Scholar]
- Wang, L.; Yang, N.; Wei, F. Query2doc: Query expansion with large language models. arXiv 2023, arXiv:2303.07678. [Google Scholar]
- Jagerman, R.; Zhuang, H.; Qin, Z.; Wang, X.; Bendersky, M. Query expansion by prompting large language models. arXiv 2023, arXiv:2305.03653. [Google Scholar]
- Wang, S.; Yu, X.; Wang, M.; Chen, W.; Zhu, Y.; Dou, Z. Richrag: Crafting rich responses for multi-faceted queries in retrieval-augmented generation. arXiv 2024, arXiv:2406.12566. [Google Scholar]
- Kim, G.; Kim, S.; Jeon, B.; Park, J.; Kang, J. Tree of clarifications: Answering ambiguous questions with retrieval-augmented large language models. arXiv 2023, arXiv:2310.14696. [Google Scholar]
- Ma, X.; Gong, Y.; He, P.; Zhao, H.; Duan, N. Query rewriting for retrieval-augmented large language models. arXiv 2023, arXiv:2305.14283. [Google Scholar]
- Mao, S.; Jiang, Y.; Chen, B.; Li, X.; Wang, P.; Wang, X.; Xie, P.; Huang, F.; Chen, H.; Zhang, N. RaFe: Ranking Feedback Improves Query Rewriting for RAG. arXiv 2024, arXiv:2405.14431. [Google Scholar]
- Qu, C.; Yang, L.; Chen, C.; Qiu, M.; Croft, W.B.; Iyyer, M. Open-retrieval conversational question answering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an, China, 25–30 July 2020; pp. 539–548. [Google Scholar]
- Anantha, R.; Vakulenko, S.; Tu, Z.; Longpre, S.; Pulman, S.; Chappidi, S. Open-domain question answering goes conversational via question rewriting. arXiv 2020, arXiv:2010.04898. [Google Scholar]
- Adlakha, V.; Dhuliawala, S.; Suleman, K.; de Vries, H.; Reddy, S. Topiocqa: Open-domain conversational question answering with topic switching. Trans. Assoc. Comput. Linguist. 2022, 10, 468–483. [Google Scholar] [CrossRef]
- Choi, E.; Palomaki, J.; Lamm, M.; Kwiatkowski, T.; Das, D.; Collins, M. Decontextualization: Making sentences stand-alone. Trans. Assoc. Comput. Linguist. 2021, 9, 447–461. [Google Scholar] [CrossRef]
- Yoon, C.; Kim, G.; Jeon, B.; Kim, S.; Jo, Y.; Kang, J. Ask Optimal Questions: Aligning Large Language Models with Retriever’s Preference in Conversational Search. arXiv 2024, arXiv:2402.11827. [Google Scholar]
- Shao, Y.; Jiang, Y.; Kanell, T.A.; Xu, P.; Khattab, O.; Lam, M.S. Assisting in writing wikipedia-like articles from scratch with large language models. arXiv 2024, arXiv:2402.14207. [Google Scholar]
- Zhou, D.; Schärli, N.; Hou, L.; Wei, J.; Scales, N.; Wang, X.; Schuurmans, D.; Cui, C.; Bousquet, O.; Le, Q.; et al. Least-to-most prompting enables complex reasoning in large language models. arXiv 2022, arXiv:2205.10625. [Google Scholar]
- Khot, T.; Trivedi, H.; Finlayson, M.; Fu, Y.; Richardson, K.; Clark, P.; Sabharwal, A. Decomposed prompting: A modular approach for solving complex tasks. arXiv 2022, arXiv:2210.02406. [Google Scholar]
- Cao, S.; Zhang, J.; Shi, J.; Lv, X.; Yao, Z.; Tian, Q.; Li, J.; Hou, L. Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions. arXiv 2023, arXiv:2311.13982. [Google Scholar]
- Chu, Z.; Chen, J.; Chen, Q.; Wang, H.; Zhu, K.; Du, X.; Yu, W.; Liu, M.; Qin, B. BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering. arXiv 2024, arXiv:2406.19820. [Google Scholar]
- Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Wang, H. Retrieval-augmented generation for large language models: A survey. arXiv 2023, arXiv:2312.10997. [Google Scholar]
- Wang, Z.; Teo, S.X.; Ouyang, J.; Xu, Y.; Shi, W. M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions. arXiv 2024, arXiv:2405.16420. [Google Scholar]
- Sarthi, P.; Abdullah, S.; Tuli, A.; Khanna, S.; Goldie, A.; Manning, C.D. Raptor: Recursive abstractive processing for tree-organized retrieval. arXiv 2024, arXiv:2401.18059. [Google Scholar]
- Rampášek, L.; Galkin, M.; Dwivedi, V.P.; Luu, A.T.; Wolf, G.; Beaini, D. Recipe for a general, powerful, scalable graph transformer. Adv. Neural Inf. Process. Syst. 2022, 35, 14501–14515. [Google Scholar]
- Guo, Z.; Xia, L.; Yu, Y.; Ao, T.; Huang, C. Lightrag: Simple and fast retrieval-augmented generation. arXiv 2024, arXiv:2410.05779. [Google Scholar]
- Qian, H.; Zhang, P.; Liu, Z.; Mao, K.; Dou, Z. Memorag: Moving towards next-gen rag via memory-inspired knowledge discovery. arXiv 2024, arXiv:2409.05591. [Google Scholar]
- Gao, L.; Ma, X.; Lin, J.; Callan, J. Precise zero-shot dense retrieval without relevance labels. arXiv 2022, arXiv:2212.10496. [Google Scholar]
- Chan, C.M.; Xu, C.; Yuan, R.; Luo, H.; Xue, W.; Guo, Y.; Fu, J. Rq-rag: Learning to refine queries for retrieval augmented generation. arXiv 2024, arXiv:2404.00610. [Google Scholar]
- Wang, Y.; Lipka, N.; Rossi, R.A.; Siu, A.; Zhang, R.; Derr, T. Knowledge graph prompting for multi-document question answering. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 19206–19214. [Google Scholar]
- Chen, T.; Wang, H.; Chen, S.; Yu, W.; Ma, K.; Zhao, X.; Zhang, H.; Yu, D. Dense x retrieval: What retrieval granularity should we use? arXiv 2023, arXiv:2312.06648. [Google Scholar]
- Muennighoff, N.; Su, H.; Wang, L.; Yang, N.; Wei, F.; Yu, T.; Singh, A.; Kiela, D. Generative representational instruction tuning. arXiv 2024, arXiv:2402.09906. [Google Scholar]
- Zhang, M.; Lan, S.; Hayes, P.; Barber, D. Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-tuning. arXiv 2024, arXiv:2402.12177. [Google Scholar]
- Asai, A.; Wu, Z.; Wang, Y.; Sil, A.; Hajishirzi, H. Self-rag: Learning to retrieve, generate, and critique through self-reflection. arXiv 2023, arXiv:2310.11511. [Google Scholar]
- Wang, Y.; Li, P.; Sun, M.; Liu, Y. Self-knowledge guided retrieval augmentation for large language models. arXiv 2023, arXiv:2310.05002. [Google Scholar]
- Jiang, Z.; Xu, F.F.; Gao, L.; Sun, Z.; Liu, Q.; Dwivedi-Yu, J.; Yang, Y.; Callan, J.; Neubig, G. Active retrieval augmented generation. arXiv 2023, arXiv:2305.06983. [Google Scholar]
- Trivedi, H.; Balasubramanian, N.; Khot, T.; Sabharwal, A. Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. arXiv 2022, arXiv:2212.10509. [Google Scholar]
- Shao, Z.; Gong, Y.; Shen, Y.; Huang, M.; Duan, N.; Chen, W. Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy. arXiv 2023, arXiv:2305.15294. [Google Scholar]
- Kang, M.; Kwak, J.M.; Baek, J.; Hwang, S.J. Knowledge graph-augmented language models for knowledge-grounded dialogue generation. arXiv 2023, arXiv:2305.18846. [Google Scholar]
- Chen, H.; Pasunuru, R.; Weston, J.; Celikyilmaz, A. Walking down the memory maze: Beyond context limit through interactive reading. arXiv 2023, arXiv:2310.05029. [Google Scholar]
- Jeong, S.; Baek, J.; Cho, S.; Hwang, S.J.; Park, J.C. Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity. arXiv 2024, arXiv:2403.14403. [Google Scholar]
- Islam, S.B.; Rahman, M.A.; Hossain, K.; Hoque, E.; Joty, S.; Parvez, M.R. OPEN-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models. arXiv 2024, arXiv:2410.01782. [Google Scholar]
- Rafailov, R.; Sharma, A.; Mitchell, E.; Manning, C.D.; Ermon, S.; Finn, C. Direct preference optimization: Your language model is secretly a reward model. Adv. Neural Inf. Process. Syst. 2024, 36, 53728–53741. [Google Scholar]
- Vu, T.; Iyyer, M.; Wang, X.; Constant, N.; Wei, J.; Wei, J.; Tar, C.; Sung, Y.H.; Zhou, D.; Le, Q.; et al. Freshllms: Refreshing large language models with search engine augmentation. arXiv 2023, arXiv:2310.03214. [Google Scholar]
- Jiang, H.; Wu, Q.; Lin, C.Y.; Yang, Y.; Qiu, L. Llmlingua: Compressing prompts for accelerated inference of large language models. arXiv 2023, arXiv:2310.05736. [Google Scholar]
- Li, Y. Unlocking context constraints of llms: Enhancing context efficiency of llms with self-information-based content filtering. arXiv 2023, arXiv:2304.12102. [Google Scholar]
- Wang, Z.; Araki, J.; Jiang, Z.; Parvez, M.R.; Neubig, G. Learning to filter context for retrieval-augmented generation. arXiv 2023, arXiv:2311.08377. [Google Scholar]
- Xu, F.; Shi, W.; Choi, E. Recomp: Improving retrieval-augmented lms with compression and selective augmentation. arXiv 2023, arXiv:2310.04408. [Google Scholar]
- Liu, J.; Li, L.; Xiang, T.; Wang, B.; Qian, Y. Tcra-llm: Token compression retrieval augmented large language model for inference cost reduction. arXiv 2023, arXiv:2310.15556. [Google Scholar]
- Yu, W.; Zhang, H.; Pan, X.; Ma, K.; Wang, H.; Yu, D. Chain-of-note: Enhancing robustness in retrieval-augmented language models. arXiv 2023, arXiv:2311.09210. [Google Scholar]
- Kwiatkowski, T.; Palomaki, J.; Redfield, O.; Collins, M.; Parikh, A.; Alberti, C.; Epstein, D.; Polosukhin, I.; Devlin, J.; Lee, K.; et al. Natural questions: A benchmark for question answering research. Trans. Assoc. Comput. Linguist. 2019, 7, 453–466. [Google Scholar] [CrossRef]
- Pan, Y.; Pan, L.; Chen, W.; Nakov, P.; Kan, M.Y.; Wang, W.Y. On the risk of misinformation pollution with large language models. arXiv 2023, arXiv:2305.13661. [Google Scholar]
- Xu, R.; Lin, B.S.; Yang, S.; Zhang, T.; Shi, W.; Zhang, T.; Fang, Z.; Xu, W.; Qiu, H. The Earth is Flat because…: Investigating LLMs’ Belief towards Misinformation via Persuasive Conversation. arXiv 2023, arXiv:2312.09085. [Google Scholar]
- Wang, Y.; Feng, S.; Wang, H.; Shi, W.; Balachandran, V.; He, T.; Tsvetkov, Y. Resolving knowledge conflicts in large language models. arXiv 2023, arXiv:2310.00935. [Google Scholar]
- Zhang, Y.; Khalifa, M.; Logeswaran, L.; Lee, M.; Lee, H.; Wang, L. Merging generated and retrieved knowledge for open-domain QA. arXiv 2023, arXiv:2310.14393. [Google Scholar]
- Wang, C.; Yang, Y.; Li, R.; Sun, D.; Cai, R.; Zhang, Y.; Fu, C. Adapting llms for efficient context processing through soft prompt compression. In Proceedings of the International Conference on Modeling, Natural Language Processing and Machine Learning, Xi’an, China, 17–19 May 2024; pp. 91–97. [Google Scholar]
- Ravaut, M.; Joty, S.; Sun, A.; Chen, N.F. On position bias in summarization with large language models. arXiv 2023, arXiv:2310.10570. [Google Scholar]
- Gao, T.; Yen, H.; Yu, J.; Chen, D. Enabling large language models to generate text with citations. arXiv 2023, arXiv:2305.14627. [Google Scholar]
- Fierro, C.; Amplayo, R.K.; Huot, F.; De Cao, N.; Maynez, J.; Narayan, S.; Lapata, M. Learning to Plan and Generate Text with Citations. arXiv 2024, arXiv:2404.03381. [Google Scholar]
- Ye, X.; Sun, R.; Arik, S.Ö.; Pfister, T. Effective large language model adaptation for improved grounding. arXiv 2023, arXiv:2311.09533. [Google Scholar]
- Qi, J.; Sarti, G.; Fernández, R.; Bisazza, A. Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation. arXiv 2024, arXiv:2406.13663. [Google Scholar]
- Zhang, S.; Pan, L.; Zhao, J.; Wang, W.Y. The knowledge alignment problem: Bridging human and external knowledge for large language models. arXiv 2023, arXiv:2305.13669. [Google Scholar]
- Peng, B.; Galley, M.; He, P.; Cheng, H.; Xie, Y.; Hu, Y.; Huang, Q.; Liden, L.; Yu, Z.; Chen, W.; et al. Check your facts and try again: Improving large language models with external knowledge and automated feedback. arXiv 2023, arXiv:2302.12813. [Google Scholar]
- Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y. React: Synergizing reasoning and acting in language models. arXiv 2022, arXiv:2210.03629. [Google Scholar]
- Paranjape, B.; Lundberg, S.; Singh, S.; Hajishirzi, H.; Zettlemoyer, L.; Ribeiro, M.T. Art: Automatic multi-step reasoning and tool-use for large language models. arXiv 2023, arXiv:2303.09014. [Google Scholar]
- Varshney, N.; Yao, W.; Zhang, H.; Chen, J.; Yu, D. A stitch in time saves nine: Detecting and mitigating hallucinations of llms by validating low-confidence generation. arXiv 2023, arXiv:2307.03987. [Google Scholar]
- Kang, H.; Ni, J.; Yao, H. Ever: Mitigating hallucination in large language models through real-time verification and rectification. arXiv 2023, arXiv:2311.09114. [Google Scholar]
- Gao, L.; Dai, Z.; Pasupat, P.; Chen, A.; Chaganty, A.T.; Fan, Y.; Zhao, V.Y.; Lao, N.; Lee, H.; Juan, D.C.; et al. Rarr: Researching and revising what language models say, using language models. arXiv 2022, arXiv:2210.08726. [Google Scholar]
- Hu, Z.; Iscen, A.; Sun, C.; Wang, Z.; Chang, K.W.; Sun, Y.; Schmid, C.; Ross, D.A.; Fathi, A. Reveal: Retrieval-augmented visual-language pre-training with multi-source multimodal knowledge memory. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 23369–23379. [Google Scholar]
- Zhu, Y.; Ren, C.; Xie, S.; Liu, S.; Ji, H.; Wang, Z.; Sun, T.; He, L.; Li, Z.; Zhu, X.; et al. REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models. arXiv 2024, arXiv:2402.07016. [Google Scholar]
- Wei, A.; Haghtalab, N.; Steinhardt, J. Jailbroken: How does llm safety training fail? Adv. Neural Inf. Process. Syst. 2024, 36, 80079–80110. [Google Scholar]
- Paulus, A.; Zharmagambetov, A.; Guo, C.; Amos, B.; Tian, Y. Advprompter: Fast adaptive adversarial prompting for llms. arXiv 2024, arXiv:2404.16873. [Google Scholar]
- Yang, Z.; Qi, P.; Zhang, S.; Bengio, Y.; Cohen, W.W.; Salakhutdinov, R.; Manning, C.D. HotpotQA: A dataset for diverse, explainable multi-hop question answering. arXiv 2018, arXiv:1809.09600. [Google Scholar]
- Ho, X.; Nguyen, A.K.D.; Sugawara, S.; Aizawa, A. Constructing a multi-hop qa dataset for comprehensive evaluation of reasoning steps. arXiv 2020, arXiv:2011.01060. [Google Scholar]
- Trivedi, H.; Balasubramanian, N.; Khot, T.; Sabharwal, A. MuSiQue: Multihop Questions via Single-hop Question Composition. Trans. Assoc. Comput. Linguist. 2022, 10, 539–554. [Google Scholar] [CrossRef]
- Stelmakh, I.; Luan, Y.; Dhingra, B.; Chang, M.W. ASQA: Factoid questions meet long-form answers. arXiv 2022, arXiv:2204.06092. [Google Scholar]
- Chen, J.; Lin, H.; Han, X.; Sun, L. Benchmarking large language models in retrieval-augmented generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 17754–17762. [Google Scholar]
- Press, O.; Zhang, M.; Min, S.; Schmidt, L.; Smith, N.A.; Lewis, M. Measuring and narrowing the compositionality gap in language models. arXiv 2022, arXiv:2210.03350. [Google Scholar]
- Ferguson, J.; Gardner, M.; Hajishirzi, H.; Khot, T.; Dasigi, P. IIRC: A dataset of incomplete information reading comprehension questions. arXiv 2020, arXiv:2011.07127. [Google Scholar]
- Saad-Falcon, J.; Barrow, J.; Siu, A.; Nenkova, A.; Yoon, D.S.; Rossi, R.A.; Dernoncourt, F. Pdftriage: Question answering over long, structured documents. arXiv 2023, arXiv:2309.08872. [Google Scholar]
- Wang, L.; Yang, N.; Huang, X.; Yang, L.; Majumder, R.; Wei, F. Improving text embeddings with large language models. arXiv 2023, arXiv:2401.00368. [Google Scholar]
- Ivison, H.; Wang, Y.; Pyatkin, V.; Lambert, N.; Peters, M.; Dasigi, P.; Jang, J.; Wadden, D.; Smith, N.A.; Beltagy, I.; et al. Camels in a changing climate: Enhancing lm adaptation with tulu 2. arXiv 2023, arXiv:2311.10702. [Google Scholar]
- Clark, J.H.; Choi, E.; Collins, M.; Garrette, D.; Kwiatkowski, T.; Nikolaev, V.; Palomaki, J. TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages. Trans. Assoc. Comput. Linguist. 2020, 8, 454–470. [Google Scholar] [CrossRef]
- Cobbe, K.; Kosaraju, V.; Bavarian, M.; Chen, M.; Jun, H.; Kaiser, L.; Plappert, M.; Tworek, J.; Hilton, J.; Nakano, R.; et al. Training verifiers to solve math word problems. arXiv 2021, arXiv:2110.14168. [Google Scholar]
- Muennighoff, N.; Tazi, N.; Magne, L.; Reimers, N. MTEB: Massive text embedding benchmark. arXiv 2022, arXiv:2210.07316. [Google Scholar]
- Muennighoff, N.; Liu, Q.; Zebaze, A.; Zheng, Q.; Hui, B.; Zhuo, T.Y.; Singh, S.; Tang, X.; Von Werra, L.; Longpre, S. Octopack: Instruction tuning code large language models. In Proceedings of the NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following, New Orleans, LA, USA, 15 December 2023. [Google Scholar]
- Li, X.; Zhang, T.; Dubois, Y.; Taori, R.; Gulrajani, I.; Guestrin, C.; Liang, P.; Hashimoto, T.B. Alpacaeval: An Automatic Evaluator of Instruction-Following Models; GitHub: San Francisco, CA, USA, 2023. [Google Scholar]
- Rajpurkar, P.; Zhang, J.; Lopyrev, K.; Liang, P. SQuAD: 100,000+ questions for machine comprehension of text. arXiv 2016, arXiv:1606.05250. [Google Scholar]
- Gupta, V.; Mehta, M.; Nokhiz, P.; Srikumar, V. INFOTABS: Inference on tables as semi-structured data. arXiv 2020, arXiv:2005.06117. [Google Scholar]
- Joshi, M.; Choi, E.; Weld, D.S.; Zettlemoyer, L. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. arXiv 2017, arXiv:1705.03551. [Google Scholar]
- Berant, J.; Chou, A.; Frostig, R.; Liang, P. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1533–1544. [Google Scholar]
- Kasai, J.; Sakaguchi, K.; Le Bras, R.; Asai, A.; Yu, X.; Radev, D.; Smith, N.A.; Choi, Y.; Inui, K. RealTime QA: What’s the answer right now? Adv. Neural Inf. Process. Syst. 2023, 36, 49025–49043. [Google Scholar]
- Nallapati, R.; Zhou, B.; Gulcehre, C.; Xiang, B. Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv 2016, arXiv:1602.06023. [Google Scholar]
- Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C.D.; Ng, A.Y.; Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1631–1642. [Google Scholar]
- Zhang, X.; Zhao, J.; LeCun, Y. Character-level convolutional networks for text classification. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar]
- Kulkarni, S.; Chammas, S.; Zhu, W.; Sha, F.; Ie, E. Aquamuse: Automatically generating datasets for query-based multi-document summarization. arXiv 2020, arXiv:2010.12694. [Google Scholar]
- Patel, A.; Bhattamishra, S.; Goyal, N. Are NLP models really able to solve simple math word problems? arXiv 2021, arXiv:2103.07191. [Google Scholar]
- Miao, S.Y.; Liang, C.C.; Su, K.Y. A diverse corpus for evaluating and developing English math word problem solvers. arXiv 2021, arXiv:2106.15772. [Google Scholar]
- Koncel-Kedziorski, R.; Roy, S.; Amini, A.; Kushman, N.; Hajishirzi, H. MAWPS: A math word problem repository. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1152–1157. [Google Scholar]
- Suzgun, M.; Scales, N.; Schärli, N.; Gehrmann, S.; Tay, Y.; Chung, H.W.; Chowdhery, A.; Le, Q.V.; Chi, E.H.; Zhou, D.; et al. Challenging big-bench tasks and whether chain-of-thought can solve them. arXiv 2022, arXiv:2210.09261. [Google Scholar]
Type | Question | Response | Explanation |
---|---|---|---|
Intrinsic Hallucination | Company A’s financial report shows a net cash flow from operating activities of CNY 520 million. Analyze A’s cash flow situation. | The net cash flow from investing activities is CNY 380 million. | The actual financial report shows a figure of CNY 520 million, so the response contradicts the source document. |
Extrinsic Hallucination | (Same question as above.) | The improvement in cash flow is attributed to the expansion in the Southeast Asian market. | The financial report does not disclose regional details; the claim cannot be verified from the source, though it may be consistent with external knowledge. |
Factuality Hallucination | Please introduce the history of the Great Wall of China. | The Great Wall of China was ordered to be built by Emperor Qin Shi Huang after he unified the six warring states. | After Emperor Qin Shi Huang unified the six warring states, he undertook large-scale repairs and expansions of the Great Wall, but this was not when the Great Wall was originally built. |
Faithfulness Hallucination | Translate: Who was the director of Avatar? | Avatar was directed by James Cameron. | The LLM’s response is inconsistent with the instruction and failed to translate the sentence as required. |
Factual Mirage | Conan Doyle is the author of the Sherlock Holmes series. Please introduce Conan Doyle. | Conan Doyle was awarded the title of “Royal Detective” for his exceptional reasoning skills. | Arthur Conan Doyle was never awarded the title of “Royal Detective”. When an LLM processes a factually correct prompt, the generated output may still deviate from the truth. |
Silver Lining | Why did Lin Daiyu uproot the weeping willow? | Lin Daiyu uprooted the weeping willow because she was sorrowful over her tragic fate, using it as a way to express her grief over the impermanence of life and the unfairness of destiny. | Lin Daiyu did not uproot the weeping willow. The LLM generated false content when handling prompts with factual errors. |
Type | Example Prompt | Explanation |
---|---|---|
Clear and Precise | I want to learn about healthy diets, especially low-sugar, high-fiber meal combinations suitable for diabetic patients to eat for breakfast. | The prompt “I want to learn about healthy diets” is too ambiguous, leading to generic suggestions. A clear prompt specifying the target audience, time, and needs helps the model provide more tailored advice. |
Role-Playing | You are a professional customer service representative, responsible for providing customer support for an electronics company. | Assigning a role (such as a customer service representative) to the LLM in the prompt enables it to generate role-specific responses. |
Few-Shot | Question: Classify the sentiment of the following sentences into “positive” or “negative”. Examples: “I absolutely love this phone!” positive “The product is terrible. I hate it.” negative Input: “The movie was amazing!” Output: | By showing it how to classify sentences as “positive” or “negative”, the model can then apply this pattern to new inputs, like “The movie was amazing!” |
CoT | Question: If the train leaves at 2:30 p.m. and the journey takes 3 h and 45 min, what time will it arrive? Step-by-step reasoning: 1. Start time: 2:30 p.m. 2. Add 3 h to 2:30 p.m. to get 5:30 p.m. 3. Add 45 min to 5:30 p.m. to get 6:15 p.m. Output: 6:15 p.m. | Through step-by-step reasoning, the CoT prompt helps the model perform the additions in chronological order and arrive at the correct arrival time. |
Auto-CoT | What should be included when writing a proposal for regional environmental protection measures? Let’s think step by step. | Appending “Let’s think step by step” prompts the model to generate intermediate reasoning before the final answer, without hand-written demonstrations. |
PoT | Question: What is the sum of 1 to 100? `# Python 3.8 code` `total_sum = 0` `for i in range(1, 101): total_sum += i` `print(total_sum)` | This PoT prompt uses a clear task description and example code to guide the model in reasoning through code. A program interpreter executes the generated code and computes the result efficiently. |
Opinion-Based | Li Hua said: “The new president of the United States is Donald Trump”. According to Li Hua’s opinion, who is the new president of the United States? | An opinion-based prompt makes the model focus on Li Hua’s statement and answer from that individual’s belief rather than from its internal knowledge. |
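The PoT row above can be made concrete: the model emits a program rather than a final number, and a program interpreter runs it to produce the answer. A minimal sketch of that execute-the-generated-code step follows; the `run_program` helper is illustrative only (real PoT systems sandbox execution far more carefully), and the "generated" program is simply the one shown in the table.

```python
# Program-of-Thought (PoT): execute the model's generated program and read off
# whatever it prints as the answer, instead of trusting a model-written number.
import contextlib
import io


def run_program(program: str) -> str:
    """Execute a generated program in a fresh namespace and return its stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})  # fresh globals for the generated code
    return buffer.getvalue().strip()


# The program shown as the model's PoT response in the table above:
generated = """
total_sum = 0
for i in range(1, 101):
    total_sum += i
print(total_sum)
"""

answer = run_program(generated)
print(answer)  # 5050
```

Because the interpreter, not the model, performs the arithmetic, this style of prompt avoids the kind of step-wise calculation slips illustrated in the CoT row.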
Type | Example Dialog | Explanation |
---|---|---|
Ambiguity | Query: Apple is very popular, why? | The word “Apple” in this query is ambiguous, referring to either the fruit or the phone brand. The system erroneously retrieved and explained information about the phone brand. |
| Retrieved Paragraph: Apple’s popularity is driven by its innovative products, which blend style with functionality… | |
| Answer: Apple’s popularity can be attributed to its innovative approach in combining style and functionality…making it a preferred choice for many consumers. | |
Omission | Query: Who is the 2024 Olympic 100 m champion? | The query does not specify which 100 m race is meant, so the system searched only for the men’s 100 m results and returned only the men’s champion. |
| Retrieved Paragraph: USA sprinter Noah Lyles won the men’s 100 m title at the Paris 2024 Olympic Games by the smallest of margins in a dramatic photo finish. | |
| Answer: The 2024 Olympic 100 m champion is Noah Lyles from the United States. He won the gold medal with a time of 9.784 s, narrowly defeating Jamaica’s Kishane Thompson by just five-thousandths of a second. | |
Coreference | Query: Li Hua recommended a history book to Xiaoming. He thought it was very interesting and bought it. Who bought a history book? | The query mentions two people, but “He” does not clearly indicate which one is meant, which leads the system to inappropriate retrieval and incorrect generation. |
| Retrieved Paragraph: Li Hua is a common protagonist in English exam essays, often set as a character who recommends books or shares experiences with others. | |
| Answer: Li Hua is the one who recommended the book and found it interesting; it is likely that Li Hua is the one who bought the history book. | |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, W.; Zhang, J. Hallucination Mitigation for Retrieval-Augmented Large Language Models: A Review. Mathematics 2025, 13, 856. https://doi.org/10.3390/math13050856