Optimizing Legal Text Summarization Through Dynamic Retrieval-Augmented Generation and Domain-Specific Adaptation
Abstract
1. Introduction
- The development of a Dynamic Legal RAG system tailored to the complexities of Indian legal texts.
- The implementation of a Factually-Aligned Summarization model that leverages real-time retrieval of legal context.
2. Related Work
2.1. Information Retrieval for Legal Texts
2.1.1. Traditional Retrieval Methods
2.1.2. Neural Information Retrieval in Legal Domain
2.2. Legal Text Summarization
2.2.1. Extractive Summarization for Legal Documents
2.2.2. Abstractive Summarization in Legal NLP
2.2.3. Decoder-Only Transformer Architecture for Abstractive Summarization in Legal NLP
2.2.4. Enhancing Summarization with Legal-Domain-Specific Knowledge
3. Methodology
3.1. Dynamic Legal RAG System
- Statutory Law—Constitutional provisions, legislative acts, and codified regulations.
- Case Law and Precedents—Judicial rulings that establish authoritative interpretations of legal provisions.
- Legal Commentaries and Doctrinal Writings—Expert analyses that offer nuanced interpretations of legal principles.
3.1.1. Legal Corpus Construction and Indexing
- Landmark Supreme Court Judgments (Volumes 1 and 2);
- Legal Maxims and Phrases (Volumes 1 and 2);
- The Constitution of India;
- Indian Penal Code (IPC);
- Criminal Procedure Code (CrPC);
- Civil Procedure Code (CPC);
- Indian Evidence Act;
- Legal Dictionary.
- Indexing Methodology for BM25-Based Retrieval
- Text Preprocessing: Each document undergoes tokenization, stopword removal, and stemming, ensuring that common legal terms are retained while reducing redundancy.
- Segmentation and Structuring: Given the hierarchical nature of legal texts, sections, provisions, and case citations are indexed separately, allowing for granular retrieval of specific legal clauses and precedents.
- BM25 Index Construction: The retrieval model ranks documents using the BM25 ranking function, defined as follows:
$$\mathrm{Score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot \frac{f(t, D)\,(k_1 + 1)}{f(t, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgDL}}\right)}$$
  - $\mathrm{Score}(D, Q)$ is the relevance score of document D for query Q.
  - $f(t, D)$ is the term frequency of keyword t in document D.
  - $|D|$ is the document length, and avgDL is the average document length in the corpus.
  - $k_1$ and b are hyperparameters controlling term saturation and length normalization.
  - $\mathrm{IDF}(t)$ (inverse document frequency) is given by
$$\mathrm{IDF}(t) = \log \frac{N - n_t + 0.5}{n_t + 0.5}$$
  - N is the total number of documents in the corpus.
  - $n_t$ is the number of documents containing term t.
  - The constant 0.5 in the IDF term serves two purposes:
    - It prevents division by zero when $n_t = 0$.
    - It acts as a Bayesian smoothing factor to moderate the influence of rare or overly frequent terms. This smoothing helps prevent extreme values in the score when terms appear in either all or very few documents in the corpus.
- Entity-Aware Indexing: To further enhance retrieval, Legal Named Entity Recognition (Legal NER) is employed to extract provision–statute pairs and precedents, which are indexed as queryable entities. These extracted entities are assigned higher weights within the BM25 ranking, ensuring that statutory provisions and case law references are prioritized during retrieval.
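The BM25 ranking described above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: the whitespace tokenizer, the class name `BM25Index`, and the values `k1=1.5` and `b=0.75` are assumptions chosen for illustration, and stopword removal, stemming, and entity weighting are omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for the
    # tokenization, stopword removal, and stemming described in the text.
    return text.lower().split()


class BM25Index:
    """Hypothetical in-memory BM25 index over a list of text chunks."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b  # assumed values, not taken from the paper
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.n_docs = len(docs)
        self.avgdl = sum(len(t) for t in self.tokens) / self.n_docs
        # Document frequency: number of chunks containing each term.
        self.df = Counter(term for toks in self.tokens for term in set(toks))

    def idf(self, term):
        n_t = self.df.get(term, 0)
        # The 0.5 terms keep the ratio finite even when n_t == 0.
        # This classic form can be negative for very common terms.
        return math.log((self.n_docs - n_t + 0.5) / (n_t + 0.5))

    def score(self, query, doc_id):
        dl = len(self.tokens[doc_id])
        total = 0.0
        for term in tokenize(query):
            f = self.tf[doc_id].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=3):
        # Rank all chunks by score (ties broken by index) and keep top_k.
        scored = [(self.score(query, i), i) for i in range(self.n_docs)]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(i, s) for s, i in scored[:top_k]]
```

Calling `BM25Index(chunks).search("murder ipc", top_k=3)` returns `(chunk_index, score)` pairs in descending score order; a production system would more likely rely on an established search library than on this toy index.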
- Impact of the Indexing Strategy
3.1.2. Contextualized Legal Document Retrieval
- Retrieval Mechanism for Provision–Statute Pairs
- Provision represents the specific clause or section of a legal act (e.g., “Section 304B”).
- Statute denotes the governing legal framework (e.g., “Indian Penal Code”).
- Related legal provisions are inferred using cross-references from the legal knowledge base.
- Retrieval Mechanism for Precedents
- Case Citation uniquely identifies the legal precedent (e.g., “Kesavananda Bharati v. State of Kerala (1973)”).
- Judicial references include ratio decidendi, obiter dicta, and citations appearing in related judgments.
- Dynamic Integration into Summarization
- Provision–Statute Pair Integration: Retrieved statutory texts supplement the summarization pipeline, ensuring that legal provisions are accurately represented within the generated summaries.
- Precedent-Aware Summarization: Case law retrieval provides precedential context, reinforcing the legal argumentation presented in the summarized text.
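One way to picture the integration step is as prompt assembly: retrieved statutory and precedent chunks are placed ahead of the judgment text before it is passed to the summarizer. The sketch below is an assumption about how this could be done; the function name `build_summarizer_input`, the section labels, and the character budget are hypothetical and do not come from the paper.

```python
def build_summarizer_input(judgment, statute_chunks, precedent_chunks,
                           max_context_chars=2000):
    """Prepend retrieved legal context to a judgment (illustrative only)."""
    parts = []
    budget = max_context_chars  # shared budget across both context types
    groups = (
        ("Statutory context", statute_chunks),
        ("Precedent context", precedent_chunks),
    )
    for label, chunks in groups:
        kept = []
        for chunk in chunks:
            # Stop adding chunks from this group once the budget is exhausted.
            if len(chunk) > budget:
                break
            kept.append(chunk)
            budget -= len(chunk)
        if kept:
            parts.append(label + ":\n" + "\n".join("- " + c for c in kept))
    parts.append("Judgment:\n" + judgment)
    parts.append("Summary:")
    return "\n\n".join(parts)
```

Keeping statutes before precedents is an arbitrary ordering choice here; the budget simply guards against the retrieved context crowding out the judgment within a limited context window.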
3.1.3. Role of Legal Named Entity Recognition (Legal NER)
- Entity-Aware Query Formulation: Extracted legal entities are used to dynamically generate focused search queries that guide the BM25 retriever. For example, if a provision such as “Section 197 CrPC” or a precedent like “Maneka Gandhi v. Union of India” is identified in the judgment, it is used to retrieve authoritative references and explanations from the legal corpus.
- Contextual Enrichment of Summarization: The retrieved legal information is integrated into the input passed to the summarization model. This ensures that the model has access to definitions, related precedents, or statutory explanations that may not have been seen during training.
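Entity-aware query formulation can be sketched with simple pattern matching. The paper uses a Legal NER model; the regular expressions below are only a rough stand-in for it, and the names `PROVISION_RE`, `PRECEDENT_RE`, `extract_entities`, and `formulate_queries` are hypothetical.

```python
import re

# Assumption: provisions look like "Section 197 CrPC" or "Article 21".
PROVISION_RE = re.compile(
    r"\b(Section|Article)\s+(\d+[A-Z]*)"
    r"(?:\s+(?:of\s+the\s+)?(IPC|CrPC|CPC|Constitution))?"
)
# Assumption: precedents look like "Party A v. Party B" with capitalized names.
PRECEDENT_RE = re.compile(
    r"[A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*\s+v\.\s+"
    r"[A-Z][A-Za-z]+(?:\s+(?:of\s+)?[A-Z][A-Za-z]+)*"
)


def extract_entities(text):
    """Return (provisions, precedents) found in the text, in order."""
    provisions = []
    for match in PROVISION_RE.finditer(text):
        kind, number, statute = match.group(1), match.group(2), match.group(3)
        provisions.append(f"{kind} {number}" + (f" {statute}" if statute else ""))
    precedents = [match.group(0) for match in PRECEDENT_RE.finditer(text)]
    return provisions, precedents


def formulate_queries(text):
    """Turn extracted entities into de-duplicated retrieval queries."""
    provisions, precedents = extract_entities(text)
    seen, queries = set(), []
    for query in provisions + precedents:
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries
```

Each returned string can then be issued to the retriever as a focused query. The patterns are brittle (for example, a capitalized word directly before a case name is absorbed into it), which is one reason a trained NER model is preferable.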
3.1.4. Dynamic Retrieval Design: Thresholds, Chunk Size, and Entity-Aware Queries
3.1.5. Dynamic Legal RAG System Workflow
3.2. Legal Text Summarization
3.2.1. Selected Decoder-Only Model Architectures
- Model Selection and Justification of Decoder-Only Architecture
3.2.2. Fine-Tuning Process of Large Language Models
- Dataset Preparation and Preprocessing
- Model Selection and Quantization
- Optimization Strategies and Hyperparameter Tuning
- Parameter-Efficient Fine-Tuning (PEFT) [50] with LoRA
- Summary of Fine-Tuning Outcomes
4. Experimental Procedure
4.1. Environmental Setup
Environmental Setup for Dynamic Legal RAG System and Legal Text Summarization
4.2. Dataset Preparation
4.2.1. Dataset for Legal Retrieval-Augmented Generation (Legal RAG)
4.2.2. Dataset for Legal Text Summarization
- train-data/: Contains 7030 judgment–summary pairs used for training.
  - judgement/: Raw full-text judgments.
  - summary/: Corresponding abstractive summaries.
  - stats-IN-train.txt: Word and sentence count statistics.
- test-data/: Contains 100 judgment–summary pairs for evaluation.
  - judgement/, summary/, stats-IN-test.txt.
- Temporal Validity of Legal Language and Dataset Justification
- Statistical Analysis and Filtering
- Outliers in Judgment Lengths: Approximately 7% of the dataset (499 judgments) comprises extremely long texts, with the 99th percentile averaging 38,858 tokens. To maintain consistency, these are removed.
- Very Short Judgments: Judgments in the 0.5th percentile (mean 12 tokens) are deemed too short for meaningful summarization and are excluded.
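The length-based filtering above can be sketched as a percentile cut on judgment length. This is an illustration under stated assumptions: tokens are counted by whitespace splitting rather than with the model tokenizer, the upper cut-off of the 93rd percentile is inferred from the "approximately 7%" figure, and the helper names are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def filter_by_length(pairs, low_pct=0.5, high_pct=93.0):
    """Drop (judgment, summary) pairs whose judgment is too short or too long.

    Assumption: length is the number of whitespace-separated tokens.
    """
    lengths = [len(judgment.split()) for judgment, _ in pairs]
    low = percentile(lengths, low_pct)
    high = percentile(lengths, high_pct)
    # Keep judgments strictly above the low cut and at or below the high cut.
    return [pair for pair, n in zip(pairs, lengths) if low < n <= high]
```

The nearest-rank definition is one of several percentile conventions; a different convention would shift the cut-offs slightly on small datasets.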
- Compression Ratio Filtering
$$CR = \frac{\sum_{i=1}^{n} s_i}{\sum_{j=1}^{m} d_j}$$
- $s_i$ represents the token count of the i-th token in the summary.
- $d_j$ represents the token count of the j-th token in the judgment.
- $n$ and $m$ denote the total number of tokens in the summary and judgment, respectively.
- Compression ratios > 1 indicate that summaries are longer than judgments, contradicting the fundamental principle of summarization.
- Compression ratios = 1 indicate that the summary is as long as the judgment (for example, identical to it), providing no meaningful compression.
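A compression-ratio filter along these lines can be sketched as follows. The ratio is taken as summary length divided by judgment length, consistent with the statistics reported for the filtered dataset; the bounds 0.05 and 0.5 are an assumption read off the minimum and maximum in that table rather than a stated threshold, and whitespace splitting stands in for the actual tokenizer.

```python
def compression_ratio(summary, judgment):
    """Summary length divided by judgment length (whitespace tokens)."""
    judgment_len = len(judgment.split())
    if judgment_len == 0:
        raise ValueError("judgment must contain at least one token")
    return len(summary.split()) / judgment_len


def filter_by_compression(pairs, min_cr=0.05, max_cr=0.5):
    """Keep (judgment, summary) pairs whose ratio lies in [min_cr, max_cr].

    Assumption: the bounds mirror the reported min/max of the filtered data.
    """
    return [
        (judgment, summary)
        for judgment, summary in pairs
        if min_cr <= compression_ratio(summary, judgment) <= max_cr
    ]
```

Pairs whose summary is longer than, or nearly as long as, the judgment fall outside the band and are discarded, as are summaries that are only a few tokens long.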
Algorithm 1. Preprocessing and filtering of IN-Abs dataset for legal text summarization.
- Randomization and Data Splitting
- Training Set: Consisting of 6500 judgment–summary pairs, this subset is utilized to fine-tune the legal text summarization model, allowing it to learn domain-specific linguistic structures and summarization patterns.
- Testing Set: Comprising 257 judgment–summary pairs, this subset serves as an independent evaluation benchmark to assess model performance and generalization capabilities.
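The randomization and split can be reproduced with a seeded shuffle. This is a minimal sketch: the function name and the seed value are assumptions, while the split sizes (6500 training pairs out of 6757) follow the text.

```python
import random


def shuffle_and_split(pairs, n_train=6500, seed=42):
    """Shuffle pairs reproducibly and split into train and test subsets.

    Assumption: the seed is arbitrary; the paper does not state one.
    """
    rng = random.Random(seed)  # local generator, leaves global state untouched
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

With 6757 filtered pairs as input, the remainder after taking 6500 training pairs is the 257-pair test set.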
5. Results and Discussion
5.1. Dynamic Legal RAG Results
5.1.1. Performance Evaluation of Different Retrievers
- (A) Description of Retrieval Models
- (B) Evaluation Metrics
- ROUGE-1, ROUGE-2, and ROUGE-L [52]: These measure lexical overlap between the retrieved documents and the ground truth.
- Cosine Similarity: Evaluates the contextual relevance of retrieved results.
- Legal Coverage: Assessed using a 5-point Likert scale by six legal annotators, measuring how well a retrieved document aligns with the legal query.
- Irrelevant Retrieval Percentage: Computes the proportion of retrieved documents that are semantically unrelated to the query.
- Redundancy Percentage: Measures how frequently the same legal excerpts appear across multiple retrieved results.
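Three of the automatic metrics above can be approximated in a few lines. These are simplified stand-ins, not the evaluation code used in the paper: ROUGE-1 is computed as a plain unigram-overlap F1 without stemming, cosine similarity is taken over term-frequency vectors rather than learned embeddings, and redundancy is counted as exact repeats after whitespace and case normalization.

```python
import math
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between two texts (simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(text_a, text_b):
    """Cosine similarity of term-frequency vectors (assumed representation)."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def redundancy_pct(retrieved_chunks):
    """Percentage of retrieved chunks that repeat an earlier chunk."""
    seen, repeats = set(), 0
    for chunk in retrieved_chunks:
        key = " ".join(chunk.lower().split())
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return 100.0 * repeats / len(retrieved_chunks) if retrieved_chunks else 0.0
```

Legal Coverage and Irrelevant Retrieval Percentage depend on human or semantic relevance judgments and are therefore not sketched here.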
5.1.2. Analysis of Top-K Chunk Selection
5.1.3. Selection of the Optimal Retriever
5.1.4. Qualitative Analysis of Retrieved Context in Dynamic Legal RAG
- Sample Query 1—Statutory Anchor: Section 300 IPC
- Sample Query 2—Precedent: Kesavananda Bharati v. State of Kerala (1973)
5.2. Results on Legal Text Summarization
5.2.1. Performance of Pretrained Transformer Models (Zero-Shot)
5.2.2. Performance of Fine-Tuned Models (LoRA-Based Adaptation)
5.2.3. Performance of Fine-Tuned Models Integrated with Legal Domain Knowledge
5.2.4. Summary of Comparative Analysis
- DeepSeek-7B is the best-performing zero-shot baseline (ROUGE-1 0.4405) due to its extended context window and strong reasoning capabilities, while LLaMA 2-7B is the weakest (ROUGE-1 0.3324).
- Fine-tuning improves every model, with LLaMA 3.1-8B emerging as the strongest fine-tuned model by ROUGE-1 (0.3986 to 0.4528), leveraging its superior long-context handling and multilingual capabilities.
- Domain knowledge integration via NER and RAG leads to the highest performance gains, particularly for LLaMA 3.1-8B (ROUGE-1 0.5214, BERTScore F1 0.8906), which benefits from real-time retrieval of legal precedents and statutes.
- LLaMA 2-7B, despite its poor baseline performance, improves sharply when enhanced with domain knowledge (ROUGE-1 from 0.3457 after fine-tuning to 0.4883), suggesting that its primary limitation lies in contextual understanding rather than model architecture.
5.3. Legal-Domain-Specific Evaluation Metrics
- Citation Density (CD): The number of legal citations (e.g., case names, statute references) per 100 tokens.
- Statutory Inclusion Rate (SIR): The percentage of statutes/provisions mentioned in the ground truth that are also included in the generated summary.
- Precedent Alignment Score (PAS): Measures the inclusion and accuracy of cited precedents with respect to the gold summary.
- Factual Consistency Index (FCI): Proportion of factual claims in the summary that are supported by retrieved legal content.
- Legal Entity Preservation Rate (LEPR): The percentage of extracted legal entities retained in the summary, relative to the input judgment.
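Some of these metrics reduce to simple counts once the relevant entities have been extracted. The sketch below illustrates Citation Density and an inclusion rate that can serve for both SIR and LEPR. It assumes a regular expression for provision references and verbatim, case-insensitive matching, neither of which is specified in the paper; all function names are hypothetical.

```python
import re

# Assumption: only provision-style references such as "Article 16(4)" or
# "Section 304B" are counted; case-name citations would need a separate pattern.
PROVISION_CITATION_RE = re.compile(r"\b(?:Section|Article)\s+\d+[A-Z]*(?:\(\d+\))?")


def count_provision_citations(text):
    return len(PROVISION_CITATION_RE.findall(text))


def citation_density(num_citations, num_tokens):
    """Citations per 100 tokens."""
    return 100.0 * num_citations / num_tokens


def inclusion_rate(reference_items, summary_text):
    """Percentage of reference items that appear verbatim in the summary.

    With statutes from the gold summary this approximates SIR; with entities
    extracted from the input judgment it approximates LEPR.
    """
    items = set(reference_items)
    if not items:
        return 0.0
    text = summary_text.lower()
    hits = sum(1 for item in items if item.lower() in text)
    return 100.0 * hits / len(items)
```

Substring matching is deliberately naive: "Article 1" would be counted as present in a summary that mentions "Article 16(4)", so a real evaluation would match on normalized entity spans instead.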
5.4. Qualitative Results: Summary Comparisons Across Models
5.4.1. Sample Case 1: Indra Sawhney v. Union of India (1992)
- Excerpt from Judgment:
- Gold-Standard (Human Annotated) Summary:
5.4.2. Sample Case 2: Maneka Gandhi v. Union of India (1978)
- Excerpt from Judgment:
- Gold-Standard (Human Annotated) Summary:
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Robertson, S.; Zaragoza, H. The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr. 2009, 3, 333–389.
- Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Yang, A.; Fan, A.; et al. The llama 3 herd of models. arXiv 2024, arXiv:2407.21783.
- Ramos, J. Using tf-idf to determine word relevance in document queries. In First Instructional Conference on Machine Learning; Citeseer: Princeton, NJ, USA, 2003; Volume 242, pp. 29–48.
- Koubarakis, M.; Skiadopoulos, S.; Tryfonopoulos, C. Logic and computational complexity for Boolean information retrieval. IEEE Trans. Knowl. Data Eng. 2006, 18, 1659–1666.
- Gomes, T.; Ladeira, M. A new conceptual framework for enhancing legal information retrieval at the Brazilian Superior Court of Justice. In Proceedings of the 12th International Conference on Management of Digital EcoSystems, Virtual, 2–4 November 2020; pp. 26–29.
- Costa, W.M.; Pedrosa, G.V. Legal Information Retrieval Based on a Concept-Frequency Representation and Thesaurus. In Proceedings of the 25th International Conference on Enterprise Information Systems, Prague, Czech Republic, 24–26 April 2023; pp. 303–311.
- Mandal, A.; Ghosh, K.; Ghosh, S.; Mandal, S. Unsupervised approaches for measuring textual similarity between legal court case reports. Artif. Intell. Law 2021, 29, 417–451.
- Liu, L.; Liu, L.; Han, Z. Query Revaluation Method For Legal Information Retrieval. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE 2020) Working Notes, Hyderabad, India, 16–20 December 2020; pp. 18–21.
- Balaji, N.N.A.; Bharathi, B.; Bhuvana, J. Legal Information Retrieval and Rhetorical Role Labelling for Legal Judgements. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE 2020) Working Notes, Hyderabad, India, 16–20 December 2020; pp. 26–30.
- Kanapala, A.; Jannu, S.; Pamula, R. Passage-based text summarization for legal information retrieval. Arab. J. Sci. Eng. 2019, 44, 9159–9169.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 4171–4186.
- Chalkidis, I.; Fergadiotis, M.; Malakasiotis, P.; Aletras, N.; Androutsopoulos, I. LEGAL-BERT: The muppets straight out of law school. arXiv 2020, arXiv:2010.02559.
- Anand, D.; Wagh, R. Effective deep learning approaches for summarization of legal texts. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 2141–2150.
- Althammer, S.; Askari, A.; Verberne, S.; Hanbury, A. DoSSIER@ COLIEE 2021: Leveraging dense retrieval and summarization-based re-ranking for case law retrieval. arXiv 2021, arXiv:2108.03937.
- Karpukhin, V.; Oğuz, B.; Min, S.; Lewis, P.; Wu, L.; Edunov, S.; Chen, D.; Yih, W.t. Dense passage retrieval for open-domain question answering. arXiv 2020, arXiv:2004.04906.
- Khattab, O.; Zaharia, M. Colbert: Efficient and effective passage search via contextualized late interaction over bert. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an, China, 25–30 July 2020; pp. 39–48.
- Wang, X.; Macdonald, C.; Tonellotto, N.; Ounis, I. ColBERT-PRF: Semantic pseudo-relevance feedback for dense passage and document retrieval. ACM Trans. Web 2023, 17, 1–39.
- Muennighoff, N. Sgpt: Gpt sentence embeddings for semantic search. arXiv 2022, arXiv:2202.08904.
- Louis, A.; Van Dijck, G.; Spanakis, G. Finding the law: Enhancing statutory article retrieval via graph neural networks. arXiv 2023, arXiv:2301.12847.
- Mao, Y.; He, P.; Liu, X.; Shen, Y.; Gao, J.; Han, J.; Chen, W. Generation-augmented retrieval for open-domain question answering. arXiv 2020, arXiv:2009.08553.
- Mihalcea, R.; Tarau, P. Textrank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; pp. 404–411.
- Erkan, G.; Radev, D.R. Lexrank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 2004, 22, 457–479.
- Jain, D.; Borah, M.D.; Biswas, A. Fine-tuning textrank for legal document summarization: A Bayesian optimization based approach. In Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation, Hyderabad, India, 16–20 December 2020; pp. 41–48.
- Kumar, H.; Jayanth, P.; Anand Kumar, M. Large Language Models for Indian Legal Text Summarisation. In Proceedings of the 2024 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), Bangalore, India, 12–14 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–5.
- Liu, C.L.; Chen, K.C. Extracting the gist of Chinese judgments of the supreme court. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, Montreal, QC, Canada, 17–21 June 2019; pp. 73–82.
- Nallapati, R.; Zhai, F.; Zhou, B. Summarunner: A recurrent neural network based sequence model for extractive summarization of documents. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
- Liu, Y.; Lapata, M. Text summarization with pretrained encoders. arXiv 2019, arXiv:1908.08345.
- Shukla, A.; Bhattacharya, P.; Poddar, S.; Mukherjee, R.; Ghosh, K.; Goyal, P.; Ghosh, S. Legal case document summarization: Extractive and abstractive methods and their evaluation. arXiv 2022, arXiv:2210.07544.
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv 2019, arXiv:1910.13461.
- Ni, J.; Abrego, G.H.; Constant, N.; Ma, J.; Hall, K.B.; Cer, D.; Yang, Y. Sentence-t5: Scalable sentence encoders from pre-trained text-to-text models. arXiv 2021, arXiv:2108.08877.
- Zhang, J.; Zhao, Y.; Saleh, M.; Liu, P. Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of the International Conference on Machine Learning. PMLR, Virtual, 3–18 July 2020; pp. 11328–11339.
- Kale, A.R.; Deshmukh, P.R. Abstractive Text Summarization: A Transformer Based Approach. In Proceedings of the 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), Pune, India, 5–7 August 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–4.
- Myla, S.D.; Saini, R.; Kapoor, N. Enhanced Text Summarization through Hybrid Integration of RoBERTa, T5, and Pegasus Models. In Proceedings of the 2024 International Conference on Advances in Modern Age Technologies for Health and Engineering Science (AMATHE), Shivamogga, India, 9–10 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–8.
- Zaheer, M.; Guruganesh, G.; Dubey, K.A.; Ainslie, J.; Alberti, C.; Ontanon, S.; Pham, P.; Ravula, A.; Wang, Q.; Yang, L.; et al. Big bird: Transformers for longer sequences. Adv. Neural Inf. Process. Syst. 2020, 33, 17283–17297.
- Kumar, V.B.; Bhattacharjee, K.; Gangadharaiah, R. Towards cross-domain transferability of text generation models for legal text. In Proceedings of the Natural Legal Language Processing Workshop 2022, Abu Dhabi, United Arab Emirates, 8 December 2022; pp. 111–118.
- de Oliveira, L.; Rodrigo, A.L. Repurposing decoder-transformer language models for abstractive summarization. arXiv 2019, arXiv:1909.00325.
- Rothe, S.; Narayan, S.; Severyn, A. Leveraging pre-trained checkpoints for sequence generation tasks. Trans. Assoc. Comput. Linguist. 2020, 8, 264–280.
- Jo, S.G.; Park, S.H.; Kim, J.J.; On, B.W. Learning cluster patterns for abstractive summarization. IEEE Access 2023, 11, 146065–146075.
- Kalamkar, P.; Agarwal, A.; Tiwari, A.; Gupta, S.; Karn, S.; Raghavan, V. Named entity recognition in indian court judgments. arXiv 2022, arXiv:2211.03442.
- Polsley, S.; Jhunjhunwala, P.; Huang, R. Casesummarizer: A system for automated summarization of legal texts. In Proceedings of the COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, Osaka, Japan, 11–16 December 2016; pp. 258–262.
- Bhattacharya, P.; Poddar, S.; Rudra, K.; Ghosh, K.; Ghosh, S. Incorporating domain knowledge for extractive summarization of legal case documents. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, São Paulo, Brazil, 21–25 June 2021; pp. 22–31.
- Khandelwal, U.; Levy, O.; Jurafsky, D.; Zettlemoyer, L.; Lewis, M. Generalization through memorization: Nearest neighbor language models. arXiv 2019, arXiv:1911.00172.
- Borgeaud, S.; Mensch, A.; Hoffmann, J.; Cai, T.; Rutherford, E.; Millican, K.; Van Den Driessche, G.B.; Lespiau, J.B.; Damoc, B.; Clark, A.; et al. Improving language models by retrieving from trillions of tokens. In Proceedings of the International Conference on Machine Learning. PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 2206–2240.
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288.
- Bi, X.; Chen, D.; Chen, G.; Chen, S.; Dai, D.; Deng, C.; Ding, H.; Dong, K.; Du, Q.; Fu, Z.; et al. Deepseek llm: Scaling open-source language models with longtermism. arXiv 2024, arXiv:2401.02954.
- Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; de las Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. Mistral 7B. arXiv 2023, arXiv:2310.06825.
- Tunstall, L.; Beeching, E.; Lambert, N.; Rajani, N.; Rasul, K.; Belkada, Y.; Huang, S.; Von Werra, L.; Fourrier, C.; Habib, N.; et al. Zephyr: Direct distillation of lm alignment. arXiv 2023, arXiv:2310.16944.
- Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. Qlora: Efficient finetuning of quantized llms. Adv. Neural Inf. Process. Syst. 2023, 36, 10088–10115.
- Han, Z.; Gao, C.; Liu, J.; Zhang, J.; Zhang, S.Q. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv 2024, arXiv:2403.14608.
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. Lora: Low-rank adaptation of large language models. ICLR 2022, 1, 3.
- Lin, C.Y. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, Barcelona, Spain, 25 July 2004; Association for Computational Linguistics: Stroudsburg, PA, USA, 2004; pp. 74–81.
- Banerjee, S.; Lavie, A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI, USA, 29 June 2005; Association for Computational Linguistics: Stroudsburg, PA, USA, 2005; pp. 65–72.
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. Bertscore: Evaluating text generation with bert. arXiv 2019, arXiv:1904.09675.
- Bai, Y.; Jones, A.; Ndousse, K.; Askell, A.; Chen, A.; DasSarma, N.; Drain, D.; Fort, S.; Ganguli, D.; Henighan, T.; et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv 2022, arXiv:2204.05862.
- Pires, T.; Schlinger, E.; Garrette, D. How multilingual is multilingual BERT? arXiv 2019, arXiv:1906.01502.
- Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, E.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised cross-lingual representation learning at scale. arXiv 2019, arXiv:1911.02116.
Model Category | Model Variant | Decoder Layers | Attention Heads | Embedding Size | MLP Dimension | Key/Value Dimension |
---|---|---|---|---|---|---|
LLaMA | LLaMA 2 7B | 32 | 32 | 32,000 × 4096 | 14,336 | 1024 |
LLaMA | LLaMA 3.1 8B | 32 | 32 | 128,256 × 4096 | 14,336 | 1024 |
LLaMA | DeepSeek 7B | 30 | 30 | 100,015 × 4096 | 11,008 | 4096 |
Mistral | Mistral 7B | 32 | 32 | 32,000 × 4096 | 14,336 | 1024 |
Mistral | Zephyr 7B | 32 | 32 | Padding idx = 2 | 14,336 | 1024 |
Model | Checkpointing Stage | Trainable Parameters | Total Parameters | Trainable % |
---|---|---|---|---|
Mistral-7B & Zephyr-7B | Before Gradient Checkpointing | 0 | 3,752,071,168 | 0.00% |
Mistral-7B & Zephyr-7B | After Gradient Checkpointing | 262,410,240 | 3,752,071,168 | 6.99% |
Mistral-7B & Zephyr-7B | After Applying LoRA | 346,030,080 | 4,098,101,248 | 8.44% |
DeepSeek-7B | Before Gradient Checkpointing | 0 | 3,855,200,256 | 0.00% |
DeepSeek-7B | After Gradient Checkpointing | 819,572,736 | 3,855,200,256 | 21.26% |
DeepSeek-7B | After Applying LoRA | 894,279,680 | 4,749,479,936 | 18.83% |
Llama-2-7B | Before Gradient Checkpointing | 0 | 3,500,412,928 | 0.00% |
Llama-2-7B | After Gradient Checkpointing | 262,410,240 | 3,500,412,928 | 7.50% |
Llama-2-7B | After Applying LoRA | 342,097,920 | 3,842,510,848 | 8.90% |
Llama-3.1-8B-Instruct | Before Gradient Checkpointing | 0 | 4,540,600,320 | 0.00% |
Llama-3.1-8B-Instruct | After Gradient Checkpointing | 1,050,939,392 | 4,540,600,320 | 23.15% |
Llama-3.1-8B-Instruct | After Applying LoRA | 1,134,559,232 | 5,675,159,552 | 19.99% |
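The trainable-parameter percentages in the table above follow directly from the two parameter counts in each row. As a quick arithmetic check, the short sketch below recomputes a few of them; the function name `trainable_pct` is hypothetical and the counts are copied from the table.

```python
def trainable_pct(trainable, total):
    """Share of trainable parameters, in percent, rounded to two decimals."""
    return round(100.0 * trainable / total, 2)


# (trainable parameters, total parameters) copied from the table rows.
ROWS = {
    "Mistral-7B after gradient checkpointing": (262_410_240, 3_752_071_168),
    "Mistral-7B after LoRA": (346_030_080, 4_098_101_248),
    "DeepSeek-7B after LoRA": (894_279_680, 4_749_479_936),
    "Llama-3.1-8B-Instruct after LoRA": (1_134_559_232, 5_675_159_552),
}

PERCENTAGES = {name: trainable_pct(*counts) for name, counts in ROWS.items()}
```

In every model block of the table, the total after applying LoRA equals the earlier total plus the trainable count reported in the same row, which is why the percentage can fall even as the trainable count rises.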
Legal Document | Type of Legal Framework | Legal Significance | Application in RAG |
---|---|---|---|
Landmark Judgments | Case Law | Establishes binding precedents; serves as a primary source for case law analysis | Supports retrieval of judicial interpretations; aids in legal reasoning |
Legal Maxims and Phrases | Doctrinal Principles | Provides interpretative principles; standardizes legal terminology | Enhances contextual understanding; improves semantic enrichment |
Constitution of India | Constitutional Law | Supreme legal document defining rights; governs fundamental governance structure | Facilitates constitutional law queries; supports jurisprudential research |
Indian Penal Code (IPC) | Substantive Criminal Law | Defines criminal offenses; prescribes penalties for crimes | Supports criminal case law retrieval; enhances legal document classification |
Indian Evidence Act | Procedural Law | Governs admissibility of evidence; establishes legal standards for proof | Enables retrieval of evidentiary rules; assists in legal argumentation |
Criminal Procedure Code (CrPC) | Procedural Law | Regulates criminal trials and investigations; ensures due process in criminal justice | Provides procedural case insights; supports RAG-based legal queries |
Civil Procedure Code (CPC) | Procedural Law | Governs civil litigation procedures; regulates dispute resolution mechanisms | Identifies procedural precedents; enhances case law retrieval |
Legal Dictionary | Legal Reference | Defines legal terminology; standardizes legal definitions | Ensures precision in information retrieval; supports document annotation |
Statistic | Judgment Length (Tokens) | Summary Length (Tokens) | Compression Ratio (CR) |
---|---|---|---|
Total Samples | 6757 | 6757 | 6757 |
Mean (Average) | 4039.44 | 807.25 | 0.229 |
Standard Deviation | 3180.25 | 643.38 | 0.105 |
Minimum | 421 | 42 | 0.050 |
25th Percentile (Q1) | 2058 | 428 | 0.144 |
Median (50th Percentile) | 3140 | 641 | 0.216 |
75th Percentile (Q3) | 4904 | 957 | 0.305 |
Maximum | 24,268 | 7771 | 0.500 |
Retriever Model | Top-K Chunks | ROUGE-1 | ROUGE-2 | ROUGE-L | Cosine Similarity | Legal Coverage (1–5) | Irrelevant Retrieval (%) | Redundancy (%) |
---|---|---|---|---|---|---|---|---|
BM25 [1] | Top 1 | 0.7805 | 0.6203 | 0.7234 | 0.85 | 4.4 | 5.8 | 9.3 |
BM25 [1] | Top 3 | 0.9102 | 0.7647 | 0.8894 | 0.94 | 4.9 | 2.3 | 3.9 |
BM25 [1] | Top 5 | 0.8810 | 0.7208 | 0.8543 | 0.92 | 4.6 | 3.6 | 5.1 |
DPR [14] | Top 1 | 0.6521 | 0.5017 | 0.6123 | 0.79 | 4.0 | 8.3 | 12.1 |
DPR [14] | Top 3 | 0.9214 | 0.7789 | 0.8612 | 0.91 | 4.7 | 4.8 | 7.5 |
DPR [14] | Top 5 | 0.8745 | 0.7180 | 0.8326 | 0.89 | 4.5 | 6.1 | 9.0 |
ColBERT [16] | Top 1 | 0.7032 | 0.5509 | 0.6640 | 0.81 | 4.2 | 7.0 | 10.4 |
ColBERT [16] | Top 3 | 0.8743 | 0.7316 | 0.8129 | 0.90 | 4.8 | 3.2 | 3.5 |
ColBERT [16] | Top 5 | 0.9321 | 0.7930 | 0.8821 | 0.88 | 5.0 | 5.0 | 6.8 |
SGPT [18] | Top 1 | 0.7235 | 0.5832 | 0.6824 | 0.83 | 4.3 | 6.2 | 9.1 |
SGPT [18] | Top 3 | 0.8806 | 0.7452 | 0.8227 | 0.91 | 4.8 | 2.1 | 5.2 |
SGPT [18] | Top 5 | 0.8619 | 0.8013 | 0.8521 | 0.90 | 4.7 | 3.5 | 6.0 |
Top-3 Retrieved Chunks | Expert Evaluation and Remarks |
---|---|
Chunk 1: Section 300 of the Indian Penal Code defines murder. Except in cases falling under the exceptions specified in the section, culpable homicide is murder if the act is done with the intention of causing death, or if it is done with the intention of causing such bodily injury as is likely to cause death, or if the offender knows that his act is imminently dangerous to life. | Highly Relevant |
Chunk 2: The classification of homicide into culpable homicide not amounting to murder and murder is elaborated under Sections 299 and 300 IPC. The distinction hinges upon the degree of intention and knowledge involved in the act. | Contextually Supportive |
Chunk 3: In Virsa Singh v. State of Punjab (1958), the Supreme Court clarified that for a conviction under Section 300, it is essential to establish both the intention to inflict the injury and the sufficiency of that injury in the ordinary course of nature to cause death. | Precedential Grounding |
Top-3 Retrieved Chunks | Expert Evaluation and Remarks |
---|---|
Chunk 1: In Kesavananda Bharati v. State of Kerala, the Supreme Court ruled that although Parliament has the power to amend the Constitution under Article 368, it cannot alter or destroy the basic structure of the Constitution. This judgment laid the foundation for the Basic Structure Doctrine in Indian constitutional law. | Highly Relevant |
Chunk 2: The majority opinion in the case (7:6 split) held that features such as judicial review, federalism, secularism, and the rule of law form part of the Constitution’s basic structure and cannot be amended, even through constitutional amendments. | Contextually Supportive |
Chunk 3: The Basic Structure Doctrine introduced in Kesavananda Bharati has been consistently upheld in subsequent rulings, including in Indira Nehru Gandhi v. Raj Narain and Minerva Mills v. Union of India, reinforcing its foundational status in constitutional jurisprudence. | Precedential Continuity |
Model Type | Model | ROUGE-1 | ROUGE-2 | ROUGE-L | METEOR [53] | BERTScore Precision | BERTScore Recall | BERTScore F1 |
---|---|---|---|---|---|---|---|---|
Pretrained Baseline (Zero-Shot) | LLaMA 3.1-8B | 0.3986 | 0.2402 | 0.2458 | 0.3689 | 0.8134 | 0.8312 | 0.8223 |
Pretrained Baseline (Zero-Shot) | DeepSeek-7B | 0.4405 | 0.2756 | 0.2803 | 0.4053 | 0.8391 | 0.8542 | 0.8463 |
Pretrained Baseline (Zero-Shot) | Zephyr-7B Beta | 0.3892 | 0.2321 | 0.2384 | 0.3504 | 0.8105 | 0.8274 | 0.8189 |
Pretrained Baseline (Zero-Shot) | Mistral-7B | 0.3541 | 0.2036 | 0.2101 | 0.3178 | 0.7903 | 0.8065 | 0.7981 |
Pretrained Baseline (Zero-Shot) | LLaMA 2-7B | 0.3324 | 0.1857 | 0.1902 | 0.2987 | 0.7713 | 0.7872 | 0.7786 |
Fine-Tuned Models (LoRA) | LLaMA 3.1-8B | 0.4528 | 0.2874 | 0.2929 | 0.4189 | 0.8493 | 0.8647 | 0.8658 |
Fine-Tuned Models (LoRA) | DeepSeek-7B | 0.4521 | 0.2892 | 0.2956 | 0.4105 | 0.8501 | 0.8652 | 0.8573 |
Fine-Tuned Models (LoRA) | Zephyr-7B Beta | 0.4109 | 0.2525 | 0.2583 | 0.3805 | 0.8232 | 0.8417 | 0.8323 |
Fine-Tuned Models (LoRA) | Mistral-7B | 0.3674 | 0.2159 | 0.2217 | 0.3295 | 0.7992 | 0.8153 | 0.8071 |
Fine-Tuned Models (LoRA) | LLaMA 2-7B | 0.3457 | 0.1983 | 0.2042 | 0.3120 | 0.7821 | 0.7986 | 0.7902 |
Fine-Tuned Models + Domain Knowledge Integration | LLaMA 3.1-8B | 0.5214 | 0.3572 | 0.3652 | 0.4951 | 0.8834 | 0.8979 | 0.8906 |
Fine-Tuned Models + Domain Knowledge Integration | DeepSeek-7B | 0.4593 | 0.2981 | 0.3049 | 0.4267 | 0.8567 | 0.8703 | 0.8631 |
Fine-Tuned Models + Domain Knowledge Integration | Zephyr-7B Beta | 0.4328 | 0.2675 | 0.2746 | 0.4012 | 0.8317 | 0.8472 | 0.8393 |
Fine-Tuned Models + Domain Knowledge Integration | Mistral-7B | 0.4462 | 0.2898 | 0.2969 | 0.4154 | 0.8498 | 0.8635 | 0.8561 |
Fine-Tuned Models + Domain Knowledge Integration | LLaMA 2-7B | 0.4883 | 0.3236 | 0.3312 | 0.4508 | 0.8712 | 0.8856 | 0.8784 |
Evaluation Metric | Zero-Shot Model | Fine-Tuned Model | RAG-Based Model |
---|---|---|---|
Citation Density (CD) | 0.92 | 1.63 | 2.21 |
Statutory Inclusion Rate (SIR) | 42.5% | 68.4% | 87.2% |
Precedent Alignment Score (PAS) | 28.3% | 59.1% | 85.6% |
Factual Consistency Index (FCI) | 61.0% | 75.3% | 92.7% |
Legal Entity Preservation Rate (LEPR) | 52.8% | 79.6% | 93.8% |
Generated Summary | Expert Justification/Observation |
---|---|
Zero-Shot Summary: The judgment discusses the issue of reservation in government jobs for backward classes and interprets constitutional rights. The Court referred to the Mandal Commission and gave directions on equality in employment. | Limitations: |
Fine-Tuned Summary: The Court upheld 27% reservations for OBCs under Article 16(4) and ruled that the 'creamy layer' should be excluded. It also imposed a 50% ceiling on reservations and addressed the validity of Mandal Commission’s recommendations. | Improvements: |
RAG-Based Summary: The Supreme Court, while interpreting Article 16(4), upheld 27% reservation for OBCs based on the Mandal Commission’s findings but excluded the creamy layer to ensure equality. It instituted a 50% reservation ceiling, asserting that excessive quotas violate constitutional balance. The Court also recommended periodic reviews of backward class status to prevent misuse and ensure targeted benefits. | Most Complete: |
Generated Summary | Expert Justification/Observation |
---|---|
Zero-Shot Summary: The case is about a passport issue where the Court discussed personal liberty and gave a ruling on constitutional rights under Article 21. | Limitations: |
Fine-Tuned Summary: The Court held that impounding a passport violates Article 21 and must follow a fair procedure. It expanded the meaning of personal liberty and emphasized that laws must be reasonable and just. | Improvements: |
RAG-Based Summary: In Maneka Gandhi v. Union of India, the Supreme Court held that the right to travel abroad falls under personal liberty in Article 21. The judgment revolutionized Indian constitutional law by introducing the doctrine of procedural due process, asserting that any law depriving personal liberty must be fair, just, and reasonable. The case overruled narrow interpretations of liberty and linked Articles 14, 19, and 21 in a unified reading. | Most Complete: |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ajay Mukund, S.; Easwarakumar, K.S. Optimizing Legal Text Summarization Through Dynamic Retrieval-Augmented Generation and Domain-Specific Adaptation. Symmetry 2025, 17, 633. https://doi.org/10.3390/sym17050633