Search Results (13)

Search Parameters:
Keywords = legal text summarization

28 pages, 1928 KiB  
Article
Deep Learning-Based Automatic Summarization of Chinese Maritime Judgment Documents
by Lin Zhang, Yanan Li and Hongyu Zhang
Appl. Sci. 2025, 15(10), 5434; https://doi.org/10.3390/app15105434 - 13 May 2025
Viewed by 126
Abstract
In the context of China’s accelerating maritime judicial digitization, automatic summarization of lengthy and terminology-rich judgment documents has become a critical need for improving legal efficiency. Focusing on the task of automatic summarization for Chinese maritime judgment documents, we propose HybridSumm, an “extraction–abstraction” hybrid summarization framework that integrates a maritime judgment lexicon to address the unique characteristics of maritime legal texts, including their extended length and dense domain-specific terminology. First, we construct a specialized maritime judgment lexicon to enhance the accuracy of legal term identification, specifically targeting the complexity of maritime terminology. Second, for long-text processing, we design an extractive summarization model that integrates the RoBERTa-wwm-ext pre-trained model with dilated convolutional networks and residual mechanisms. It can efficiently identify key sentences by capturing both local semantic features and global contextual relationships in lengthy judgments. Finally, the abstraction stage employs a Nezha-UniLM encoder–decoder architecture, augmented with a pointer–generator network (for out-of-vocabulary term handling) and a coverage mechanism (to reduce redundancy), ensuring that summaries are logically coherent and legally standardized. Experimental results show that HybridSumm’s lexicon-guided two-stage framework significantly enhances the standardization of legal terminology and semantic coherence in long-text summaries, validating its practical value in advancing judicial intelligence development.
(This article belongs to the Special Issue Data Analysis and Data Mining for Knowledge Discovery)
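For reference, the pointer–generator and coverage mechanisms named in the abstract are conventionally formulated as below (following See et al., 2017); the exact variant used in HybridSumm may differ.

```latex
% Standard pointer-generator final word distribution at decoding step t:
% p_gen interpolates between generating a vocabulary word and copying a
% source token w_i whose attention weight is a_i^t.
P(w) \;=\; p_{\mathrm{gen}}\, P_{\mathrm{vocab}}(w)
      \;+\; (1 - p_{\mathrm{gen}}) \sum_{i:\, w_i = w} a_i^{t}

% Coverage vector and coverage loss: cumulative attention over previous
% steps, penalizing repeated attention to already-covered source tokens.
c^{t} \;=\; \sum_{t'=0}^{t-1} a^{t'}, \qquad
\mathcal{L}_{\mathrm{cov}}^{t} \;=\; \sum_{i} \min\bigl(a_i^{t},\, c_i^{t}\bigr)
```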

18 pages, 2206 KiB  
Article
Multi-Knowledge-Enhanced Model for Korean Abstractive Text Summarization
by Kyoungsu Oh, Youngho Lee and Hyekyung Woo
Electronics 2025, 14(9), 1813; https://doi.org/10.3390/electronics14091813 - 29 Apr 2025
Viewed by 293
Abstract
Text summarization plays a crucial role in processing extensive textual data, particularly in low-resource languages such as Korean. However, abstractive summarization faces persistent challenges, including semantic distortion and inconsistency. This study addresses these limitations by proposing a multi-knowledge-enhanced abstractive summarization model tailored for [...] Read more.
Text summarization plays a crucial role in processing extensive textual data, particularly in low-resource languages such as Korean. However, abstractive summarization faces persistent challenges, including semantic distortion and inconsistency. This study addresses these limitations by proposing a multi-knowledge-enhanced abstractive summarization model tailored for Korean texts. The model integrates internal knowledge, specifically keywords and topics that are extracted using a context-aware BERT-based approach. Unlike traditional statistical extraction methods, our approach utilizes the semantic context to ensure that the internal knowledge is both diverse and representative. By employing a multi-head attention mechanism, the proposed model effectively integrates multiple types of internal knowledge with the original document embeddings. Experimental evaluations on Korean datasets (news and legal texts) demonstrate that our model significantly outperforms baseline methods, achieving notable improvements in lexical overlap, semantic consistency, and structural coherence, as evidenced by higher ROUGE and BERTScore metrics. Furthermore, the method maintains information consistency across diverse categories, including dates, quantities, and organizational details. These findings highlight the potential of context-aware multi-knowledge integration in enhancing Korean abstractive summarization and suggest promising directions for future research into broader knowledge-incorporation strategies. Full article
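To make the fusion step concrete, the sketch below shows one way keyword and topic embeddings (the paper’s “internal knowledge”) could be combined with document token embeddings through multi-head attention; the class name, dimensions, and residual wiring are assumptions for illustration, not the authors’ architecture.

```python
# Hypothetical sketch: fusing internal-knowledge embeddings (keywords, topics)
# with document token embeddings via multi-head attention.
import torch
import torch.nn as nn

class KnowledgeFusion(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, doc_emb, knowledge_emb):
        # doc_emb:       (batch, doc_len, d_model)  document token embeddings
        # knowledge_emb: (batch, k_len, d_model)    keyword/topic embeddings
        fused, _ = self.attn(query=doc_emb, key=knowledge_emb, value=knowledge_emb)
        return self.norm(doc_emb + fused)  # residual connection around the fusion

doc = torch.randn(2, 512, 768)
knowledge = torch.randn(2, 16, 768)
print(KnowledgeFusion()(doc, knowledge).shape)  # torch.Size([2, 512, 768])
```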

35 pages, 18520 KiB  
Article
Optimizing Legal Text Summarization Through Dynamic Retrieval-Augmented Generation and Domain-Specific Adaptation
by S Ajay Mukund and K. S. Easwarakumar
Symmetry 2025, 17(5), 633; https://doi.org/10.3390/sym17050633 - 23 Apr 2025
Viewed by 762
Abstract
Legal text summarization presents distinct challenges due to the intricate and domain-specific nature of legal language. This paper introduces a novel framework integrating dynamic Retrieval-Augmented Generation (RAG) with domain-specific adaptation to enhance the accuracy and contextual relevance of legal document summaries. The proposed Dynamic Legal RAG system achieves a vital form of symmetry between information retrieval and content generation, ensuring that retrieved legal knowledge is both comprehensive and precise. Using the BM25 retriever with top-3 chunk selection, the system optimizes relevance and efficiency, minimizing redundancy while maximizing legally pertinent content. A key design feature is the compression ratio constraint (0.05 to 0.5), maintaining structural symmetry between the original judgment and its summary by balancing representation and information density. Extensive evaluations establish BM25 as the most effective retriever, striking an optimal balance between precision and recall. A comparative analysis of transformer-based (decoder-only) models—DeepSeek-7B, LLaMA 2-7B, and LLaMA 3.1-8B—demonstrates that LLaMA 3.1-8B, enriched with Legal Named Entity Recognition (NER) and the Dynamic RAG system, achieves superior performance with a BERTScore of 0.89. This study lays a strong foundation for future research in hybrid retrieval models, adaptive chunking strategies, and legal-specific evaluation metrics, with practical implications for case law analysis and automated legal drafting.
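As an illustration of the retrieval side described above, the following sketch pairs BM25 top-3 chunk selection with a compression-ratio check in the stated 0.05–0.5 range; it assumes the rank_bm25 package and toy chunks, and is not the authors’ pipeline.

```python
# Minimal sketch of BM25 retrieval with top-3 chunk selection and a
# compression-ratio check; chunking and prompting details will differ.
from rank_bm25 import BM25Okapi

chunks = [
    "The appellant contends that the tribunal erred in law ...",
    "Section 11 of the Act provides that ...",
    "The respondent's counsel argued ...",
    "Costs shall follow the event unless ...",
]
tokenized = [c.lower().split() for c in chunks]
bm25 = BM25Okapi(tokenized)

query = "grounds on which the tribunal's order was challenged".lower().split()
top_chunks = bm25.get_top_n(query, chunks, n=3)  # top-3 legally pertinent chunks

def within_compression_bounds(summary: str, judgment: str,
                              lo: float = 0.05, hi: float = 0.5) -> bool:
    """Enforce the 0.05-0.5 compression-ratio constraint by word count."""
    ratio = len(summary.split()) / max(len(judgment.split()), 1)
    return lo <= ratio <= hi
```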

29 pages, 3281 KiB  
Article
An Automated Repository for the Efficient Management of Complex Documentation
by José Frade and Mário Antunes
Information 2025, 16(3), 205; https://doi.org/10.3390/info16030205 - 5 Mar 2025
Viewed by 628
Abstract
The accelerating digitalization of the public and private sectors has made information technologies (IT) indispensable in modern life. As services shift to digital platforms and technologies expand across industries, the complexity of legal, regulatory, and technical requirement documentation is growing rapidly. This increase presents significant challenges in managing, gathering, and analyzing documents, as their dispersion across various repositories and formats hinders accessibility and efficient processing. This paper presents the development of an automated repository designed to streamline the collection, classification, and analysis of cybersecurity-related documents. By harnessing the capabilities of natural language processing (NLP) models—specifically Generative Pre-Trained Transformer (GPT) technologies—the system automates text ingestion, extraction, and summarization, providing users with visual tools and organized insights into large volumes of data. The repository facilitates the efficient management of evolving cybersecurity documentation, addressing issues of accessibility, complexity, and time constraints. This paper explores the potential applications of NLP in cybersecurity documentation management and highlights the advantages of integrating automated repositories equipped with visualization and search tools. By focusing on legal documents and technical guidelines from Portugal and the European Union (EU), this applied research seeks to enhance cybersecurity governance, streamline document retrieval, and deliver actionable insights to professionals. Ultimately, the goal is to develop a scalable, adaptable platform capable of extending beyond cybersecurity to serve other industries that rely on the effective management of complex documentation.
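As a rough illustration of the automated summarization step, the sketch below calls a GPT model through the OpenAI Python client; the model name, prompt, and helper function are assumptions for illustration (the paper only specifies the use of GPT technologies, not this configuration).

```python
# Illustrative ingest-and-summarize step; model and prompt are assumed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_document(text: str, max_words: int = 200) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; the paper only says "GPT technologies"
        messages=[
            {"role": "system",
             "content": "You summarize cybersecurity legal and technical documents."},
            {"role": "user",
             "content": f"Summarize the following document in at most {max_words} words:\n\n{text}"},
        ],
    )
    return response.choices[0].message.content
```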

15 pages, 806 KiB  
Review
Bitcoin Use Cases: A Scoping Review
by Emma Apatu and Poornima Goudar
Challenges 2024, 15(1), 15; https://doi.org/10.3390/challe15010015 - 14 Mar 2024
Cited by 1 | Viewed by 4055
Abstract
This scoping review examines individual and societal use cases of Bitcoin in the peer-reviewed literature. Arksey and O’Malley’s scoping review methodology was used, and a comprehensive search strategy was employed using the Web of Science and Engineering Village databases. Articles were screened at the title-and-abstract and full-text levels by the authors. One author conducted data extraction to summarize the data. In total, 17 relevant articles were included in this review. Investment and savings were the most widely reported use cases at an individual level, with payments and international transfers less frequently reported in the studies. Only two studies reported on societal use cases of legal tender; however, only one country, El Salvador, executed its intention. Our study suggests that Bitcoin is being used by individuals around the world, with little report of societal use cases (e.g., country adoption). For example, there is evidence on the internet and at the grass-roots level that Bitcoin is being used in circular economies; however, the peer-reviewed literature may not yet capture the full extent of its benefits and challenges. As such, we provide ideas for future research to more comprehensively explore Bitcoin use cases and their impacts on individuals and society.

21 pages, 815 KiB  
Article
Prediction of Arabic Legal Rulings Using Large Language Models
by Adel Ammar, Anis Koubaa, Bilel Benjdira, Omer Nacar and Serry Sibaee
Electronics 2024, 13(4), 764; https://doi.org/10.3390/electronics13040764 - 15 Feb 2024
Cited by 5 | Viewed by 3007
Abstract
In the intricate field of legal studies, the analysis of court decisions is a cornerstone for the effective functioning of the judicial system. The ability to predict court outcomes helps judges during the decision-making process and equips lawyers with invaluable insights, enhancing their strategic approaches to cases. Despite its significance, the domain of Arabic court analysis remains under-explored. This paper pioneers a comprehensive predictive analysis of Arabic court decisions on a dataset of 10,813 real commercial court cases, leveraging the advanced capabilities of current state-of-the-art large language models. Through a systematic exploration, we evaluate three prevalent foundational models (LLaMA-7b, JAIS-13b, and GPT-3.5-turbo) and three training paradigms: zero-shot, one-shot, and tailored fine-tuning. In addition, we assess the benefit of summarizing and/or translating the original Arabic input texts. This leads to a spectrum of 14 model variants, for which we offer a granular performance assessment with a series of different metrics (human assessment, GPT evaluation, ROUGE, and BLEU scores). We show that all variants of the LLaMA models yield limited performance, whereas GPT-3.5-based models outperform all other models by a wide margin, surpassing the average score of the dedicated Arabic-centric JAIS model by 50%. Furthermore, we show that all scores except human evaluation are inconsistent and unreliable for assessing the performance of large language models on court decision prediction. This study paves the way for future research, bridging the gap between computational linguistics and Arabic legal analytics.
(This article belongs to the Special Issue Emerging Theory and Applications in Natural Language Processing)
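For readers who want to reproduce the automatic part of such an evaluation, the sketch below scores a predicted ruling against a reference with ROUGE and BLEU; the choice of the rouge-score and NLTK packages, and the toy strings, are assumptions rather than the authors’ tooling.

```python
# Sketch of automatic scoring of a predicted ruling against a reference.
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "The court dismisses the claim and orders the plaintiff to pay costs."
prediction = "The claim is dismissed and the plaintiff must pay the costs."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, prediction)

bleu = sentence_bleu([reference.split()], prediction.split(),
                     smoothing_function=SmoothingFunction().method1)

print(rouge["rouge1"].fmeasure, rouge["rougeL"].fmeasure, bleu)
```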

32 pages, 1269 KiB  
Article
Evaluation of Automatic Legal Text Summarization Techniques for Greek Case Law
by Marios Koniaris, Dimitris Galanis, Eugenia Giannini and Panayiotis Tsanakas
Information 2023, 14(4), 250; https://doi.org/10.3390/info14040250 - 21 Apr 2023
Cited by 9 | Viewed by 5034
Abstract
The increasing amount of legal information available online is overwhelming for both citizens and legal professionals, making it difficult and time-consuming to find relevant information and keep up with the latest legal developments. Automatic text summarization techniques can be highly beneficial, as they save time, reduce costs, and lessen the cognitive load of legal professionals. However, applying these techniques to legal documents poses several challenges due to the complexity of legal documents and the lack of needed resources, especially in linguistically under-resourced languages such as Greek. In this paper, we address automatic summarization of Greek legal documents. A major challenge in this area is the lack of suitable datasets in the Greek language. In response, we developed a new metadata-rich dataset consisting of selected judgments from the Supreme Civil and Criminal Court of Greece, alongside their reference summaries and category tags, tailored for the purpose of automated legal document summarization. We also adopted several state-of-the-art methods for abstractive and extractive summarization and conducted a comprehensive evaluation of the methods using both human and automatic metrics. Our results: (i) revealed that, while extractive methods exhibit average performance, abstractive methods generate moderately fluent and coherent text but tend to receive low scores in relevance and consistency metrics; (ii) indicated the need for metrics that better capture a legal document summary’s coherence, relevance, and consistency; (iii) demonstrated that fine-tuning BERT models on a specific upstream task can significantly improve the model’s performance.
(This article belongs to the Special Issue Novel Methods and Applications in Natural Language Processing)

14 pages, 1366 KiB  
Article
Data-Driven Analysis of Privacy Policies Using LexRank and KL Summarizer for Environmental Sustainability
by Abdul Quadir Md, Raghav V. Anand, Senthilkumar Mohan, Christy Jackson Joshua, Sabhari S. Girish, Anthra Devarajan and Celestine Iwendi
Sustainability 2023, 15(7), 5941; https://doi.org/10.3390/su15075941 - 29 Mar 2023
Viewed by 2190
Abstract
Natural language processing (NLP) is a field of machine learning that analyses and manipulates huge amounts of data and generates human language. NLP has a variety of applications, such as sentiment analysis, text summarization, spam filtering, and language translation. Privacy policy documents are important legal documents and play a vital part in any agreement. These documents are very long, yet their important points still have to be read thoroughly, and customers may not have the time or the knowledge to understand all the complexities of a privacy policy document. In this context, this paper proposes an optimal model to summarize privacy policies in the best possible way. Text summarization is the process of extracting a summary from the original, much longer text without losing any vital information. Using the proposed common word reduction process combined with natural language processing algorithms, this paper extracts the sentences in the privacy policy document that carry high weightage and displays them to the customer, saving the customer time by providing only the important lines they need to know before signing the document. The proposed method uses two different extractive text summarization algorithms, namely LexRank and Kullback–Leibler (KL) Summarizer, to summarize the obtained text. According to the results, the summarized sentences obtained via the common word reduction process and text summarization algorithms were more significant than the raw privacy policy text. The introduced methodology helps to identify important common words used in a particular sector in greater depth, thus allowing a more in-depth study of a privacy policy. Using the common word reduction process, the sentences were reduced by 14.63%, and by applying extractive NLP algorithms, significant sentences were obtained. The results after applying the NLP algorithms showed a 191.52% increase in the repetition of common words per sentence using the KL Summarizer algorithm, while the LexRank algorithm showed a 361.01% increase in the repetition of common words. This implies that common words play a large role in determining a sector’s privacy policies, making our proposed method a real-world solution for environmental sustainability.
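The sketch below shows LexRank and KL-divergence extractive summarization applied to a toy privacy-policy snippet, here via the sumy package; the library choice and example text are assumptions for illustration, since the paper does not name its implementation.

```python
# Illustrative LexRank and KL extractive summarization of a policy snippet.
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer
from sumy.summarizers.kl import KLSummarizer

policy_text = (
    "We collect personal data to provide and improve the service. "
    "Data may be shared with third-party processors under contract. "
    "You may request deletion of your data at any time. "
    "Cookies are used for analytics and advertising purposes."
)
parser = PlaintextParser.from_string(policy_text, Tokenizer("english"))

for name, summarizer in [("LexRank", LexRankSummarizer()), ("KL", KLSummarizer())]:
    summary = summarizer(parser.document, 2)  # keep the two highest-weight sentences
    print(name, [str(s) for s in summary])
```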

16 pages, 1563 KiB  
Article
A High-Precision Two-Stage Legal Judgment Summarization
by Yue Huang, Lijuan Sun, Chong Han and Jian Guo
Mathematics 2023, 11(6), 1320; https://doi.org/10.3390/math11061320 - 9 Mar 2023
Cited by 9 | Viewed by 2478
Abstract
Legal judgments are generally very long, and relevant information is often scattered throughout the text. To complete a legal judgment summarization, capturing important, relevant information comprehensively from a lengthy text is crucial. Existing abstractive-summarization models based on pre-trained language models have restrictions on the length of an input text. Another concern is that the generated summaries have not been well integrated with the legal judgment’s technical terms and specific topics. In this paper, we treated raw legal judgments as information at different granularities and proposed a two-stage text-summarization model to handle different granularities of information. Specifically, we treated the legal judgments as a sequence of sentences and selected key sentence sets from the full texts as an input corpus for summary generation. In addition, we extracted keywords related to technical terms and specific topics in the legal texts and introduced them into the summary-generation model through an attention mechanism. The experimental results on the CAIL2020 and LCRD datasets showed that our model achieved an overall 0.19–0.41 improvement in its ROUGE score compared to the baseline models. Further analysis also showed that our method could comprehensively capture essential and relevant information from lengthy legal texts and generate better legal judgment summaries.
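The following sketch illustrates the spirit of the first, extractive stage: score sentences, keep a key-sentence set small enough for the generator’s input limit, and collect candidate keywords to feed the second stage’s attention mechanism. The TF-IDF scoring and the function name are stand-ins for illustration, not the paper’s actual selection model.

```python
# Illustrative stage one: key-sentence selection plus keyword extraction.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def stage_one(sentences: list[str], max_sentences: int = 20, n_keywords: int = 10):
    vec = TfidfVectorizer()
    X = vec.fit_transform(sentences)
    sent_scores = np.asarray(X.sum(axis=1)).ravel()          # crude salience score
    keep = sorted(np.argsort(sent_scores)[-max_sentences:])  # preserve original order
    term_scores = np.asarray(X.sum(axis=0)).ravel()
    vocab = np.array(vec.get_feature_names_out())
    keywords = vocab[np.argsort(term_scores)[-n_keywords:]].tolist()
    return [sentences[i] for i in keep], keywords
```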

25 pages, 1181 KiB  
Review
Challenges and Open Problems of Legal Document Anonymization
by Gergely Márk Csányi, Dániel Nagy, Renátó Vági, János Pál Vadász and Tamás Orosz
Symmetry 2021, 13(8), 1490; https://doi.org/10.3390/sym13081490 - 13 Aug 2021
Cited by 31 | Viewed by 6752
Abstract
Data sharing is a central aspect of judicial systems. Openly accessible documents can make the judiciary system more transparent. On the other hand, published legal documents can contain much sensitive information about the involved persons or companies. For this reason, the anonymization of these documents is obligatory to prevent privacy breaches. The General Data Protection Regulation (GDPR) and other modern privacy-protecting regulations have strict definitions of private data, covering both direct and indirect identifiers. In legal documents, there is a wide range of attributes regarding the involved parties. Moreover, legal documents can contain additional information about the relations between the involved parties and about rare events. Hence, the personal data can be represented by a sparse matrix of these attributes. The application of Named Entity Recognition methods is essential for a fair anonymization process but is not enough. Machine learning-based methods should be used together with anonymization models, such as differential privacy, to reduce the re-identification risk. On the other hand, the information content (utility) of the text should be preserved. This paper aims to summarize and highlight the open and symmetrical problems from the fields of structured and unstructured text anonymization. The possible methods for anonymizing legal documents are discussed and illustrated by case studies from Hungarian legal practice.
(This article belongs to the Topic Applied Metaheuristic Computing)
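A minimal sketch of the NER step in such an anonymization pipeline is shown below, here with spaCy and an English model chosen for illustration (a Hungarian legal pipeline would be needed in practice). As the review stresses, entity masking alone does not eliminate re-identification risk.

```python
# Illustrative NER-based redaction; labels and output depend on the model.
import spacy

nlp = spacy.load("en_core_web_sm")

REDACT = {"PERSON", "ORG", "GPE", "DATE"}

def redact(text: str) -> str:
    doc = nlp(text)
    out, last = [], 0
    for ent in doc.ents:
        if ent.label_ in REDACT:
            out.append(text[last:ent.start_char])
            out.append(f"[{ent.label_}]")
            last = ent.end_char
    out.append(text[last:])
    return "".join(out)

print(redact("John Smith sued Acme Ltd. in Budapest on 3 March 2020."))
# e.g. "[PERSON] sued [ORG] in [GPE] on [DATE]." (exact spans depend on the model)
```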

17 pages, 4732 KiB  
Article
Hybridization of Intelligent Solutions Architecture for Text Understanding and Text Generation
by Anton Ivaschenko, Arkadiy Krivosheev, Anastasia Stolbova and Oleg Golovnin
Appl. Sci. 2021, 11(11), 5179; https://doi.org/10.3390/app11115179 - 2 Jun 2021
Cited by 5 | Viewed by 2753
Abstract
This study proposes a new logical model for an intelligent software architecture devoted to improving the efficiency of automated text understanding and text generation in industrial applications. The presented approach introduces a few patterns that make it possible to build adaptable and extensible solutions using machine learning technologies. The main idea is formalized by the concept of expounder hybridization. It summarizes experience from developing document analysis and generation solutions and from social media analysis based on the practical use of artificial neural networks. The results obtained by the best single expounder were improved using the method of aggregating multiple expounders. The quality of the expounders’ combination can be further improved by introducing pro-active competition between them on the basis of, e.g., an auctioning algorithm, using several parameters including precision, solution performance, and score. Analysis of the proposed approach was carried out using a dataset of legal documents, including joint-stock company decision record sheets and protocols. The solution is implemented in an enterprise content management system and illustrated by an example of processing legal documentation.
(This article belongs to the Special Issue 14th International Conference on Intelligent Systems (INTELS’20))
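The competition between expounders can be pictured as a simple auction in which each candidate model bids with a weighted combination of its measured qualities; the interface, parameters, and weights below are hypothetical and intended only to illustrate the idea of aggregating multiple expounders, not the paper’s implementation.

```python
# Hypothetical sketch of expounder hybridization via an auction-style selection.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Expounder:
    name: str
    run: Callable[[str], str]   # text -> extracted or generated result
    precision: float            # measured on a validation set
    speed: float                # normalized solution performance, 0..1

def auction(expounders: List[Expounder], text: str,
            w_precision: float = 0.7, w_speed: float = 0.3) -> str:
    # The expounder with the highest weighted bid processes the text.
    winner = max(expounders,
                 key=lambda e: w_precision * e.precision + w_speed * e.speed)
    return winner.run(text)
```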

17 pages, 454 KiB  
Article
Identification of Judicial Outcomes in Judgments: A Generalized Gini-PLS Approach
by Gildas Tagny-Ngompé, Stéphane Mussard, Guillaume Zambrano, Sébastien Harispe and Jacky Montmain
Stats 2020, 3(4), 427-443; https://doi.org/10.3390/stats3040027 - 27 Sep 2020
Cited by 3 | Viewed by 3178
Abstract
This paper presents and compares several text classification models that can be used to extract the outcome of a judgment from justice decisions, i.e., legal documents summarizing the different rulings made by a judge. Such models can be used to gather important statistics about cases, e.g., the success rate based on specific characteristics of the cases’ parties or jurisdiction, and are therefore important for the development of judicial prediction, not to mention the study of law enforcement in general. We propose in particular the generalized Gini-PLS, which better considers the information in the distribution tails while attenuating, as in the simple Gini-PLS, the influence exerted by outliers. Modeling the studied task as supervised binary classification, we also introduce the LOGIT-Gini-PLS, suited to the explanation of a binary target variable. In addition, various technical aspects of the evaluated text classification approaches, which consist of combinations of judgment representations and classification algorithms, are studied using an annotated corpus of French justice decisions.
(This article belongs to the Special Issue Interdisciplinary Research on Predictive Justice)

24 pages, 435 KiB  
Article
Evaluation of Diversification Techniques for Legal Information Retrieval
by Marios Koniaris, Ioannis Anagnostopoulos and Yannis Vassiliou
Algorithms 2017, 10(1), 22; https://doi.org/10.3390/a10010022 - 29 Jan 2017
Cited by 18 | Viewed by 9139
Abstract
“Public legal information from all countries and international institutions is part of the common heritage of humanity. Maximizing access to this information promotes justice and the rule of law.” In accordance with this declaration on free access to law by the legal information institutes of the world, a plethora of legal information is available through the Internet, and the provision of legal information has never been easier. Given that law is accessed by a much wider group of people, the majority of whom are not legally trained or qualified, diversification techniques should be employed in legal information retrieval so as to increase user satisfaction. We address the diversification of results in legal search by adopting several state-of-the-art methods from the web search, network analysis, and text summarization domains. We provide an exhaustive evaluation of the methods, using a standard dataset from the common law domain that we objectively annotated with relevance judgments for this purpose. Our results: (i) reveal that users receive broader insights across the results they get from a legal information retrieval system; (ii) demonstrate that web search diversification techniques outperform other approaches (e.g., summarization-based and graph-based methods) in the context of legal diversification; and (iii) offer balance boundaries between reinforcing relevant documents and sampling the information space around the legal query.
(This article belongs to the Special Issue Humanistic Data Processing)
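Maximal Marginal Relevance (MMR) is a classic web-search result diversification method of the kind evaluated in this line of work; the sketch below implements it over TF-IDF cosine similarities as an illustration, not necessarily the exact formulation or methods used in the paper.

```python
# Illustrative MMR diversification: trade off query relevance against
# redundancy with already-selected documents.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr(query: str, docs: list[str], k: int = 5, lam: float = 0.7) -> list[int]:
    vec = TfidfVectorizer().fit(docs + [query])
    D = vec.transform(docs)
    q = vec.transform([query])
    rel = cosine_similarity(D, q).ravel()   # relevance of each doc to the query
    div = cosine_similarity(D)              # pairwise document similarity
    selected: list[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(div[i][j] for j in selected) if selected else 0.0
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```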
