1. Introduction
As environmental and social concerns intensify globally, Environmental, Social, and Governance (ESG) evaluations have become a cornerstone of sustainable investment and risk management strategies. Governments, asset managers, and civil society increasingly pressure firms to demonstrate not only financial performance but also ethical and responsible conduct. In response, ESG rating systems have proliferated, offering structured metrics to gauge corporate sustainability. However, a growing body of research highlights a fundamental flaw: ESG ratings often suffer from significant inconsistency and incompleteness, creating a misleading sense of comparability across firms and industries [1, 2].
A key reason for this weakness lies in the heavy reliance on self-reported disclosures and voluntary sustainability reports. These sources, while standardized to some extent, often omit latent risks, exaggerate compliance, or use vague, qualitative language. As a result, ESG assessments may fail to detect companies with elevated legal and reputational risk, especially in cases of greenwashing, social-washing, or governance failures that do not surface in annual disclosures [3]. In fact, several firms with high ESG ratings have faced serious legal disputes for violations of environmental laws, labor abuses, or unethical board practices, indicating a clear gap between perceived ESG performance and actual risk exposure [4].
Legal cases related to ESG controversies, such as lawsuits over environmental damage, labor discrimination, or shareholder rights violations, offer unfiltered and verifiable evidence of how firms’ actions contradict their public ESG narratives. Yet, these cases are rarely integrated into existing ESG scoring models. This omission constitutes a critical form of information asymmetry—where investors, regulators, and stakeholders lack access to negative, yet highly material, corporate behavior that has been adjudicated in court.
Recent advances in Natural Language Processing (NLP) and Explainable AI (XAI) make it feasible to extract meaningful risk indicators from legal documents. Techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations) allow for interpretable modeling of high-dimensional text data, identifying which legal terms, phrases, or arguments most strongly correlate with future risk outcomes [5, 6]. When combined with firm-level features and ESG rating data, this litigation-based information can contribute to a more predictive and grounded ESG risk assessment framework.
In this study, the United States was selected as the research context due to the availability of detailed and publicly accessible court rulings from both federal and state jurisdictions. The U.S. also exhibits one of the most mature and active ESG litigation environments globally, offering rich and well-documented examples of legal accountability. Although this study does not include data from the author’s home country, the proposed methodology is designed to be transferable to other jurisdictions, and future research aims to apply it to local ESG litigation cases. Given the precedent-setting role of U.S. case law and its frequent use in international ESG research, the chosen context provides a relevant and generalizable foundation for global sustainability discussions.
This study addresses two core research questions:
Does augmenting ESG risk models with litigation-derived features improve the prediction accuracy of future ESG-related legal events?
By investigating these questions, the study seeks to advance ESG assessment systems from narrative compliance to empirical accountability. In doing so, it proposes a hybrid risk-scoring framework that integrates both disclosed and adjudicated ESG information. This approach not only enhances system-level robustness in sustainability ratings but also contributes to reducing information asymmetry in capital markets, promoting more ethically aligned corporate behavior.
The remainder of this study is organized as follows.
Section 2 reviews the theoretical background and prior research on ESG disclosure limitations and legal risk modeling.
Section 3 outlines the research methodology, including data sources, feature engineering, and model training.
Section 4 presents the empirical findings and model performance evaluation.
Section 5 concludes the study by summarizing contributions and proposing directions for future research.
2. Literature Review
2.1. Information Asymmetry Theory in ESG Risk Assessment
Information about a firm’s ESG performance is primarily disseminated to the market through voluntary corporate disclosures. However, such disclosures are often subject to selective reporting and positive framing, potentially limiting stakeholders’ ability to accurately assess the firm’s actual risks and sustainability profile. This creates a typical information asymmetry between firms and their stakeholders, wherein the quantity and quality of information available to each party are unevenly distributed. Prior studies have demonstrated that firms often engage in practices such as downplaying or delaying the disclosure of negative ESG-related information, either by minimizing its significance or postponing its release to the public [7]. Such selective disclosure behavior can distort the information environment, resulting in decision-making errors among investors and contributing to broader market inefficiencies [8].
An alternative information source that can mitigate these limitations is legal adjudication data. Court rulings and administrative sanctions are official, independently verified records issued by judicial or regulatory authorities. ESG-related legal disputes often provide concrete evidence of serious environmental, social, or governance risks. Importantly, such legal outcomes have significant financial and reputational consequences and are difficult for firms to selectively conceal or alter, making them a valuable complement to traditional ESG assessment frameworks, especially in capturing negative signals that may be omitted from voluntary disclosures.
Building on this perspective, the present study proposes integrating legal adjudication data with ESG disclosure information to help alleviate information asymmetry and enhance the accuracy of ESG risk assessments. Specifically, we employ large language model (LLM)-based natural language processing techniques to extract and quantify ESG-related risk factors from legal documents, and then combine these measures with conventional ESG indicators. This approach aims to empirically evaluate whether the inclusion of adjudicated legal signals can improve predictive performance compared to disclosure-only models. By doing so, the study addresses the structural limitations of existing ESG assessment practices and contributes to developing a more substantive and reliable framework for evaluating corporate sustainability.
2.2. Inconsistencies in ESG Rating Systems
Environmental, Social, and Governance (ESG) ratings have become indispensable in the global investment ecosystem, shaping capital flows, corporate strategies, and regulatory frameworks. However, a growing body of scholarship highlights fundamental inconsistencies across ESG rating providers, which undermine the reliability and comparability of ESG scores. Berg et al. [1] introduced the term “aggregate confusion” to describe the weak correlations (often below 0.6) among ESG scores assigned by leading agencies such as MSCI, Sustainalytics, and Refinitiv. This divergence suggests that firms can receive markedly different ESG assessments depending on the evaluator, posing serious challenges for investors, regulators, and stakeholders who rely on these metrics for decision-making [9].
The root causes of these inconsistencies are multifaceted. First, ESG evaluation lacks a standardized methodology. Rating agencies differ significantly in their selection of indicators, weighting schemes, materiality frameworks, and data sources. Christensen et al. [2] demonstrated that while one provider might emphasize outcome-based environmental metrics (e.g., carbon emissions), another might prioritize the existence of internal ESG policies regardless of their actual implementation or effectiveness. It is also argued that the conceptual foundations of ESG vary widely among agencies, reflecting divergent normative views on sustainability and leading to inconsistent taxonomies and scoring logic [10].
Second, ESG scores rely heavily on corporate self-disclosures. Many companies voluntarily publish sustainability reports or respond to ESG questionnaires, often presenting curated information that is not subject to external audit or legal verification. Boiral [3] critiques these reports as being more symbolic than substantive, describing them as “simulacra” intended to construct socially desirable narratives rather than reflect operational realities. ESG disclosures are often used strategically for impression management, resulting in selective transparency that omits negative information [11].
Third, the opacity of ESG rating methodologies contributes to limited reproducibility and auditability [12]. Chatterji et al. [4] note that proprietary algorithms and undisclosed data processing methods prevent scholars and practitioners from replicating ESG scores or validating their predictive accuracy. As a result, even well-intentioned firms may be misrepresented due to evaluator-specific criteria, taxonomies, or regionally adjusted scoring rubrics.
The implications of these inconsistencies are profound. From a financial perspective, unreliable ESG scores can lead to mispriced risk, capital misallocation, and legal liabilities for fiduciaries relying on flawed metrics [13]. From a policy standpoint, such inconsistencies erode the legitimacy of ESG integration into regulatory regimes, such as the EU Sustainable Finance Disclosure Regulation (SFDR) or the U.S. SEC’s proposed ESG disclosure rules [14]. Most critically, inconsistent ESG ratings distort perceptions of corporate sustainability performance, allowing firms to maintain high ESG scores while engaging in practices that contradict core environmental or social principles [14, 15, 16, 17].
To address these limitations, this study proposes a hybrid ESG risk assessment framework that integrates litigation-based indicators into conventional scoring systems. Rather than discarding ESG ratings altogether, the approach seeks to complement them with legally adjudicated, externally verified data to improve objectivity, transparency, and predictive value. This integration helps mitigate methodological fragmentation and addresses blind spots in conventional ESG evaluations through the use of explainable, verifiable, and legally grounded evidence.
2.3. Disclosure Gaps and Symbolic Compliance
A major reason for ESG rating inconsistencies lies in the structural limitations of corporate ESG disclosure itself. While transparency initiatives such as the Global Reporting Initiative (GRI), the Sustainability Accounting Standards Board (SASB), and the Task Force on Climate-related Financial Disclosures (TCFD) have made significant progress in standardizing ESG reporting, participation remains largely voluntary and implementation practices vary widely. As a result, companies can engage in what scholars term “symbolic compliance”, a practice where firms adopt ESG language and frameworks to signal legitimacy without implementing substantive changes [18, 19]. Symbolic compliance refers to the strategic adoption of ESG-oriented rhetoric, structures, or policies primarily for reputational or legitimacy purposes, often decoupled from actual operational improvements or measurable sustainability outcomes [20].
Sustainability reports are often designed to protect corporate reputation rather than to provide transparent, actionable data [3, 5, 21]. This phenomenon is particularly evident in industries with high environmental or social risk, where the pressure to appear responsible incentivizes the production of glossy, strategically crafted reports. Under regulatory or public scrutiny, firms tend to enhance the volume and visibility of their sustainability disclosures, often without making substantive improvements in ESG performance [22].
The theoretical underpinning for symbolic compliance can be found in institutional theory. Organizations often adopt formal structures to enhance legitimacy, even when these structures are decoupled from actual operational practices [23]. In the ESG context, this means firms can develop policies on diversity, emissions, or ethics, but these policies may lack enforcement mechanisms, performance targets, or stakeholder accountability.
Symbolic compliance exacerbates the problem of information asymmetry between firms and stakeholders. Investors, policymakers, and consumers may assume that comprehensive ESG disclosures signal strong ESG performance, when in fact, they may reflect public relations strategies rather than actual outcomes. This misalignment distorts market signals and undermines the utility of ESG ratings based on such disclosures. As Lyon and Montgomery [24] identify, the risk is not merely that firms mislead, but that existing frameworks reward them for doing so. Similarly, Al Amosh [25] finds that greater accounting reporting complexity is associated with lower levels of ESG disclosure, particularly in the environmental dimension, suggesting that structural barriers in corporate reporting practices can further limit the reliability and completeness of ESG-related information. Moreover, information asymmetry is further intensified by the selective nature of corporate disclosures [26]. Firms often possess private information about internal ESG-related issues and potential risks but choose not to disclose them unless explicitly mandated by regulatory requirements. This deliberate withholding or selective reporting of negative ESG information results in stakeholders operating under incomplete and potentially misleading assumptions, thus undermining market efficiency and investor decision-making [27]. In this study, latent risks are defined as ESG-relevant risks that are present but not readily visible to stakeholders due to their omission, underreporting, or misrepresentation in corporate disclosures, such as pending litigation, regulatory investigations, or operational practices with hidden environmental or social impacts [28].
Recent research also emphasizes that ESG disclosures alone do not guarantee corporate sustainability, particularly when symbolic compliance is not accompanied by digital transformation or green innovation efforts. Qing and Jin [26] demonstrate that while ESG and AI-based digital transformation both positively influence sustainability outcomes, these effects are contingent upon substantive environmental innovation, underscoring the importance of moving beyond declarative ESG signaling.
Furthermore, symbolic compliance creates blind spots for ESG evaluations. Disclosures typically exclude controversial events such as labor strikes, regulatory fines, or lawsuits unless required by law. This omission limits the ability of ESG scores to serve as forward-looking risk indicators. Only ESG factors considered “material” by financial standards tend to correlate with superior performance, indicating that many ESG disclosures may inflate perception without providing substantive value [29].
In this context, integrating litigation data into ESG assessment offers a critical corrective. Legal cases provide externally validated evidence of ESG-related failures—environmental harm, labor violations, or governance breaches—that are often absent from corporate disclosures. By capturing these “invisible” risks, the proposed model directly addresses the shortcomings of symbolic compliance and significantly reduces information asymmetry between corporations and their stakeholders. This integration enhances the accountability and reliability of ESG assessments, aligning with recent calls for outcome-based ESG metrics that reflect actual impacts rather than aspirational statements.
In parallel, Liu et al. [30] empirically demonstrate that the positive impact of ESG and AI adoption on corporate sustainability is contingent upon organizational capabilities, such as learning capacity, digital-oriented top management teams (TMTs), and operational slack. Their findings reinforce the argument that symbolic compliance alone is insufficient and that measurable ESG performance requires structural and strategic readiness within firms.
2.4. ESG-Related Litigation as a Risk Signal
Legal proceedings linked to ESG issues offer a uniquely reliable and underutilized lens through which to assess corporate sustainability risks. Unlike self-reported data or third-party ESG scores based on voluntary disclosures, litigation records represent adjudicated, externally validated evidence of material harm. They provide insight not only into the frequency of ESG violations but also into their severity, legal consequences, and underlying structural causes.
Karpoff et al. [31] found that firms subject to regulatory sanctions or fraud investigations typically experience significant declines in shareholder value, even after controlling for reputational repair mechanisms. Firms involved in environmental and labor-related lawsuits tend to experience long-term financial underperformance [32]. These findings underscore that litigation is not a peripheral or exceptional ESG event; it is a central indicator of misalignment between corporate actions and ESG principles.
Repeat litigation in ESG domains often signals systemic governance deficiencies, serving as an early-warning indicator of structural failure [33]. Moreover, shareholder engagement tends to yield positive ESG outcomes only when backed by credible enforcement mechanisms such as litigation or regulatory pressure [34].
Despite this, current ESG models largely exclude litigation data, either due to definitional ambiguities or a lack of structured pipelines for legal document integration. ESG rating agencies rarely revise scores in real time following lawsuits, and often rely on voluntary company disclosures rather than court documents or enforcement records [5]. This creates a profound misalignment between ESG scores and real-world ESG performance.
Integrating litigation records into ESG assessment frameworks has the potential to resolve this disconnect. Such records offer not only historical accountability but predictive insight into future risk exposure. Legal actions—especially when interpreted through NLP and explainable AI techniques—can be classified, weighted, and connected to firm-level ESG profiles to build a more grounded and verifiable risk model. By embedding litigation into ESG evaluation, this study advances the shift from perception-based assessments to evidence-based accountability metrics.
2.5. Explainable AI for Legal Text Analysis
The increasing complexity and opacity of legal texts—especially court rulings involving ESG controversies—pose both a challenge and an opportunity for improving ESG risk assessment. Unlike voluntary sustainability reports or public disclosures, judicial decisions provide externally validated, enforceable accounts of corporate behavior. These rulings document violations of environmental laws, labor standards, and governance norms with detailed reasoning and legal context, making them highly credible sources of ESG risk signals. Consequently, case law represents a vital, underutilized reservoir of information that can reveal discrepancies between a firm’s public ESG image and its actual legal exposure.
Yet the practical integration of court rulings into ESG risk models remains limited. This is primarily due to their unstructured nature, technical legal terminology, and length, which render manual analysis both time-intensive and prone to subjectivity [35]. As ESG litigation increases globally, there is a growing need for automated systems that can process large volumes of legal texts and extract meaningful, decision-relevant features.
Recent advances in Natural Language Processing (NLP) offer the tools needed to address this gap. Pre-trained models such as Legal-BERT, CaseLaw-BERT, and Longformer have demonstrated effectiveness in legal document classification, argument extraction, and judgment prediction [36, 37]. However, for their outputs to be trusted and adopted in high-stakes ESG analysis, they must go beyond black-box predictions.
This is where Explainable AI (XAI) becomes indispensable. XAI methods like SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations) make it possible to understand and audit the behavior of NLP models by identifying which words, phrases, or structural components in a legal document most influenced a model’s risk prediction [25, 38]. For instance, if a model highlights repeated fines, non-compliance citations, or adverse judgments as critical signals, these explanations can serve as justifiable risk indicators for investors and regulators.
Moreover, the inclusion of XAI in legal NLP ensures transparency, fairness, and accountability in ESG evaluation systems. It enables stakeholders to trace and verify how specific legal precedents contribute to an overall risk score, which is essential for both regulatory compliance and institutional credibility. XAI also facilitates human-in-the-loop systems, allowing legal experts to validate or refine model outputs [39, 40].
In sum, applying Explainable AI (XAI) to legal text analysis serves a dual function in this study. First, it allows for the scalable and systematic extraction of risk-relevant features embedded within judicial decisions, capturing nuanced legal signals that are often overlooked in conventional ESG evaluations. Second, it enhances the interpretability and credibility of the resulting risk-scoring models, enabling stakeholders—such as investors, regulators, and compliance officers—to understand, trust, and act upon the model’s outputs. By ensuring transparency and accountability in the analysis process, XAI facilitates the integration of litigation-based insights into ESG assessment frameworks, ultimately supporting more robust and ethically grounded decision-making.
3. Research Method
This study deliberately adopts a model development and empirical evaluation framework rather than ex ante hypothesis testing, which aligns with its exploratory and applied aims. Given the novelty of integrating adjudicated legal texts into ESG modeling, this design is methodologically appropriate. The use of cross-validated model comparison, rigorous sampling design, and structured feature extraction supports scientific validity beyond mere descriptive analysis. By comparing a baseline ESG-only model to a litigation-informed hybrid model, we assess whether adjudicated legal features improve ESG risk predictions. This comparative structure serves as a functional alternative to hypothesis testing.
3.1. Data Collection
To empirically investigate the impact of litigation on ESG risk assessment, this study compiled a dataset of ESG-related court rulings retrieved from the LexisNexis legal database. The rulings were sourced exclusively from U.S. federal and state courts to ensure jurisdictional consistency. The collection covers decisions issued between 1 January 2023 and 31 May 2025.
An initial set of candidate cases was identified using ESG-related keywords such as “environmental harm,” “labor rights,” “board failure,” and “regulatory sanction.” To ensure that each ruling was genuinely relevant to ESG risk, all documents were manually reviewed in full. Cases were retained only if they substantively addressed sustainability issues, such as verifiable environmental violations, labor disputes, or governance breaches. To enhance transparency and reproducibility of the sample construction, we clarify our inclusion and exclusion criteria for ESG-related litigation cases. Specifically, a case was included if it met at least one of the following conditions: (1) the lawsuit addressed core environmental issues such as pollution, emissions, or waste management; (2) the litigation involved social aspects including labor practices, workplace discrimination, or community impact; or (3) the case focused on governance failures such as transparency violations, board misconduct, or regulatory non-compliance. We excluded cases that only indirectly mentioned sustainability or where ESG relevance was peripheral to the core legal dispute. For example, intellectual property disputes involving green technologies were included only when the environmental significance was explicitly recognized by the court, whereas generic supply chain contract disputes were excluded unless human rights or environmental concerns were substantively litigated. These criteria were applied consistently during the data screening process to ensure that included cases had clear ESG salience. Purely commercial disputes unrelated to sustainability—for example, trademark or patent conflicts—were excluded based on the substantive content rather than keyword matches alone.
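The initial keyword screen described above can be sketched in a few lines. This is only a minimal illustration, assuming each ruling is available as a plain-text string; the function name `find_candidate_cases`, the case identifiers, and the sample texts are hypothetical, and the keyword list contains only the four examples quoted in the text. The screen merely flags candidates; the full-text manual review against the inclusion criteria still decides which cases are retained.

```python
import re

# Only the example keywords quoted in the text; the full search list is not given.
ESG_KEYWORDS = ["environmental harm", "labor rights", "board failure", "regulatory sanction"]

def find_candidate_cases(rulings, keywords=ESG_KEYWORDS):
    """Return {case_id: [matched keywords]} for rulings mentioning any keyword.

    `rulings` maps a (hypothetical) case identifier to the ruling's plain text.
    Matching is case-insensitive and respects word boundaries.
    """
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    candidates = {}
    for case_id, text in rulings.items():
        hits = [kw for kw, pattern in patterns.items() if pattern.search(text)]
        if hits:
            candidates[case_id] = hits
    return candidates

# Toy texts invented purely for illustration.
sample_rulings = {
    "case_a": "The court found substantial Environmental Harm caused by the discharge.",
    "case_b": "This dispute concerns the validity of a registered trademark.",
    "case_c": "Plaintiffs allege violations of labor rights; a regulatory sanction followed.",
}
candidates = find_candidate_cases(sample_rulings)
print(candidates)
```

A keyword hit is deliberately treated as necessary but not sufficient, which mirrors the statement that exclusion decisions rest on substantive content rather than keyword matches alone.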
For precise firm identification, Named Entity Recognition (NER) was applied to extract defendants’ names from each ruling. The extracted entities were manually cross-checked to prevent misclassification, such as including subsidiaries or similarly named entities that do not correspond to the listed firms of interest. Only rulings where the defendant could be matched to a publicly listed company with available ESG disclosure records were included in the final sample.
Each valid court ruling was then annotated according to a predefined coding manual, classifying the violation domain (Environmental, Social, or Governance) and the legal outcome. This labeling process followed a standardized protocol to ensure consistency across the dataset.
To provide a robust comparative baseline, a control group was constructed using stratified random sampling. Control firms were matched to litigated firms by both industry classification (2-digit NAICS) and market capitalization decile. This stratification helps ensure that the control firms have comparable ESG risk exposure, sector-specific characteristics, and size-related governance structures. By mirroring the key structural dimensions of the litigated firms, this approach minimizes potential sample selection bias and enhances the validity of comparisons between the groups. To ensure correct temporal alignment for causal interpretation, only ESG scores that were disclosed prior to the filing date of each litigation case were used in the analysis. This approach guarantees that the ESG ratings reflect ex-ante information and are not influenced by subsequent legal outcomes. This matching strategy mirrors quasi-experimental designs commonly used in policy evaluation studies, thereby enhancing the internal validity of the comparison and addressing potential confounders in litigation exposure.
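The matching and temporal-alignment steps described above can be illustrated as follows. This is a sketch under stated assumptions rather than the study's procedure: the one-to-one matching ratio, sampling without replacement, the field names (`naics2`, `mcap_decile`), and the helper names `match_controls` and `latest_score_before` are all hypothetical, and the firm records are invented.

```python
import random
from collections import defaultdict
from datetime import date

def match_controls(litigated, pool, seed=0):
    """Draw one control per litigated firm from the same (NAICS-2, size-decile) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for firm in pool:
        strata[(firm["naics2"], firm["mcap_decile"])].append(firm)
    matches = {}
    for firm in litigated:
        available = strata[(firm["naics2"], firm["mcap_decile"])]
        if not available:
            continue  # no comparable control firm; left unmatched in this sketch
        # Sample without replacement so a control firm is used at most once.
        pick = available.pop(rng.randrange(len(available)))
        matches[firm["name"]] = pick["name"]
    return matches

def latest_score_before(scores, filing_date):
    """Return the most recent (date, score) pair disclosed strictly before the filing date."""
    prior = [s for s in scores if s[0] < filing_date]
    return max(prior, key=lambda s: s[0]) if prior else None

# Invented example records.
litigated = [
    {"name": "L1", "naics2": "21", "mcap_decile": 9},
    {"name": "L2", "naics2": "52", "mcap_decile": 5},
]
pool = [
    {"name": "C1", "naics2": "21", "mcap_decile": 9},
    {"name": "C2", "naics2": "21", "mcap_decile": 3},
    {"name": "C3", "naics2": "52", "mcap_decile": 5},
    {"name": "C4", "naics2": "52", "mcap_decile": 5},
]
matches = match_controls(litigated, pool)

scores = [(date(2022, 6, 1), 24.1), (date(2023, 3, 1), 27.5), (date(2023, 9, 1), 31.0)]
ex_ante = latest_score_before(scores, filing_date=date(2023, 5, 15))
print(matches, ex_ante)
```

Filtering on the disclosure date before any modeling is what keeps post-filing rating revisions out of the features.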
3.2. ESG Score Collection and Baseline Modeling
To benchmark the effectiveness of legal-case features, a baseline classification model was first constructed and evaluated using only traditional ESG scores and control variables, prior to the introduction of litigation-derived inputs. This formulation provides a transparent reference point for evaluating whether the inclusion of litigation-based features in subsequent models yields statistically significant improvements in predictive performance. ESG risk scores for both litigated and non-litigated firms were obtained from Sustainalytics, a globally recognized ESG data provider. Sustainalytics’ scores measure a firm’s exposure to and management of ESG risks; higher scores indicate higher unmanaged risk.
Each firm was assigned a binary label: 1 if the firm was involved in ESG-related litigation, 0 otherwise. A Random Forest classifier was trained using Sustainalytics ESG risk scores as the main feature, supplemented with control variables including firm size (log market capitalization), and industry dummy variables. This model served to evaluate the predictive power of conventional ESG data in identifying high-risk firms.
The model structure and feature set are explicitly specified to ensure reproducibility and scientific rigor. The inclusion of control variables and industry fixed effects accounts for structural heterogeneity, supporting causal inference through observational data. To clarify the structure of the baseline model, the prediction function can be written as follows:
$$\hat{y}_i = f_{\text{base}}\left(\mathrm{ESG}_i,\ \ln(\mathrm{MCAP}_i),\ \mathrm{IND}_{ij}\right)$$
and the hybrid model as:
$$\hat{y}_i = f_{\text{hybrid}}\left(\mathrm{ESG}_i,\ \ln(\mathrm{MCAP}_i),\ \mathrm{IND}_{ij},\ \mathbf{L}_i\right)$$
where $\hat{y}_i$ denotes the predicted litigation risk label for firm $i$, $\mathrm{ESG}_i$ is the Sustainalytics ESG risk score, $\ln(\mathrm{MCAP}_i)$ is the log-transformed market capitalization, and $\mathrm{IND}_{ij}$ represents binary indicators for industry sector $j$. In the hybrid model, $\mathbf{L}_i$ denotes the litigation-derived features described in Section 3.3.
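To make the two specifications concrete, the following sketch assembles a baseline and a hybrid feature matrix and fits a Random Forest to each. All data are synthetic placeholders generated for illustration; the number of sectors, the five stand-in litigation features, and the use of a Random Forest for the hybrid model are assumptions, not details taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_firms, n_sectors, n_lit = 200, 4, 5

# Synthetic placeholders, not the study's data.
esg = rng.uniform(5, 50, n_firms)              # ESG risk score (higher = more unmanaged risk)
log_mcap = rng.normal(22, 1.5, n_firms)        # log market capitalization
sector = rng.integers(0, n_sectors, n_firms)   # sector index per firm
ind = np.eye(n_sectors)[sector]                # one-hot industry indicators
lit = rng.normal(size=(n_firms, n_lit))        # stand-in litigation-derived features
noise = rng.normal(scale=0.5, size=n_firms)
y = (lit[:, 0] + 0.02 * esg + noise > 0.55).astype(int)  # synthetic binary label

# Baseline: ESG score + size + industry dummies. Hybrid: the same plus litigation features.
X_base = np.column_stack([esg, log_mcap, ind])
X_hybrid = np.column_stack([X_base, lit])

base_model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_base, y)
hybrid_model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_hybrid, y)

# Predicted probability of the positive (litigated) class for each firm.
p_base = base_model.predict_proba(X_base)[:, 1]
p_hybrid = hybrid_model.predict_proba(X_hybrid)[:, 1]
print(X_base.shape, X_hybrid.shape)
```

Keeping the baseline columns as a strict subset of the hybrid columns means any performance difference can be attributed to the added litigation block rather than to a change in the controls.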
For the XGBoost model, hyperparameters were empirically determined as follows: max_depth = 6, learning_rate = 0.1, n_estimators = 300, subsample = 0.8, and colsample_bytree = 0.8. For the Random Forest model, the configuration included n_estimators = 500, max_depth = None, and min_samples_split = 2. A stratified 5-fold cross-validation procedure was employed to ensure a rigorous separation between training and validation sets, thereby enhancing the stability and generalizability of performance estimates. As the legal case dataset exhibited no substantial class imbalance, oversampling or reweighting strategies were deemed unnecessary. To mitigate overfitting, an early stopping criterion was applied, whereby training was terminated if the validation AUC failed to improve for a predefined number of epochs. These methodological specifications establish a robust baseline against which the incremental value of litigation-based features can be systematically assessed.
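The evaluation protocol described above might look like the following sketch, which uses the stated Random Forest configuration with stratified 5-fold cross-validation and AUC scoring. The data come from scikit-learn's synthetic generator and the `random_state` values are assumptions added for repeatability. The XGBoost configuration is not run here; it could be dropped into the same loop through `xgboost.XGBClassifier`, which accepts the listed hyperparameters.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, roughly balanced binary data standing in for the firm-level sample.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

# Random Forest configuration as listed in the text.
rf = RandomForestClassifier(
    n_estimators=500, max_depth=None, min_samples_split=2, random_state=0
)

# Stratified folds preserve the class ratio in every train/validation split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(rf, X, y, cv=cv, scoring="roc_auc")

print("AUC per fold:", np.round(auc_scores, 3))
print("mean AUC:", auc_scores.mean(), "std:", auc_scores.std())
```

Running the baseline and hybrid feature sets through the same `cv` object keeps the folds identical, so the two AUC distributions are directly comparable.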
3.3. Feature Engineering from Legal Texts
To enrich the ESG risk modeling with litigation-derived information, a structured feature extraction pipeline was constructed by combining surface-level and deep semantic features. This multi-layered engineering process transformed raw unstructured court documents into analytically meaningful variables compatible with machine learning models. First, categorical and binary variables were created from the metadata, including violation domain (Environmental, Social, or Governance), legal outcome, and firm identifiers. Second, term frequency-inverse document frequency (TF-IDF) was applied to capture high-frequency legal keywords that are indicative of ESG controversies—for example, terms such as “injunction,” “sanction,” and “negligence.” These surface-level features enable interpretable, keyword-level analysis of common legal signals.
To capture deeper semantic meaning within the full text of rulings, we employed Legal-BERT, a domain-specific transformer model pre-trained on large-scale legal corpora [41, 42]. We applied TF-IDF vectorization to transform the tokenized risk-related segments into feature vectors and extracted semantic representations using Legal-BERT. To enhance transparency and reproducibility, we additionally specified the tokenization and vectorization configurations as follows. TF-IDF vectorization was performed using unigram-based tokenization with a minimum document frequency threshold of 5 and no maximum limit, so that terms appearing in fewer than five rulings were dropped while less frequent but semantically meaningful terms were retained. Legal-BERT embeddings were generated using a pre-trained model with a hidden size of 768 dimensions. For fusion, we implemented an early-fusion architecture where the TF-IDF vectors and Legal-BERT embeddings were concatenated at the feature level before model training. Legal-BERT was selected because it is specifically optimized for extracting nuanced legal language patterns from complex court decisions, ensuring that the embedded representations accurately reflect the semantic structure of adjudicated ESG risks. This domain adaptation is particularly important for modeling legal reasoning and regulatory context, which generic language models may not capture effectively.
These contextual embeddings were combined with the TF-IDF features through this early-fusion (feature-level concatenation) step, leveraging both shallow lexical salience and deep semantic representations [
43]. To ensure transparency in the feature integration process, SHapley Additive exPlanations (SHAP) were incorporated at the training stage. SHAP scores quantify each feature’s marginal contribution to ESG risk predictions, thereby identifying influential legal signals—such as repeated regulatory sanctions or governance failures—and enhancing interpretability. This integration of explainability into the fusion process ensures that the extracted risk signals are not only accurate but also trustworthy for ESG compliance and investment decision-making.
Building on this foundation, SHAP further enables the alignment of model reasoning with human expectations. In ESG applications, where transparency and accountability are essential, such alignment reinforces stakeholder trust and facilitates more informed compliance and investment strategies.
3.4. Explainable Modeling with SHAP
The incorporation of legal texts into ESG risk prediction frameworks presents a critical methodological challenge—ensuring model interpretability. As ESG assessments increasingly influence high-stakes decision-making processes, including investment allocation, regulatory enforcement, and reputational risk management, there is a growing demand for transparent and auditable justifications of model outputs. This requirement becomes even more pressing within legal contexts, where accountability and the ability to justify decisions are fundamental [
44].
To address this challenge, the present study employs SHapley Additive exPlanations (SHAP), a state-of-the-art explainable artificial intelligence (XAI) technique rooted in cooperative game theory [
45,
46]. SHAP estimates the marginal contribution of each feature to the model’s prediction by evaluating all possible feature combinations, thereby facilitating both local (instance-level) and global (model-level) interpretability. Depending on the fusion model architecture, SHAP values were computed with TreeExplainer for the XGBoost models and DeepExplainer for the neural models. Each ESG case was treated as a local instance, and SHAP values were computed across all features (including TF-IDF terms and contextual Legal-BERT embeddings) using 100 background samples to ensure robust estimation. The top-ranked features per instance were then aggregated to derive global explanations. This dual-layered transparency enables stakeholders to understand not only what the model predicts, but also why it arrives at a particular risk classification.
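The quantity that SHAP explainers compute or approximate can be made concrete with a brute-force sketch that enumerates every feature coalition for a tiny model. This is not TreeExplainer or DeepExplainer: the linear risk score, the three feature names, the four background rows, and the aggregation by mean absolute value are all assumptions chosen so that the result can be verified by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, background):
    """Exact Shapley values for one instance.

    Features absent from a coalition are filled in from each background
    row, and the model output is averaged over the background sample.
    """
    n = len(instance)

    def coalition_value(subset):
        total = 0.0
        for row in background:
            mixed = [instance[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(set(subset) | {i})
                without_i = coalition_value(set(subset))
                phi += weight * (with_i - without_i)
        values.append(phi)
    return values


# Hypothetical linear risk score over three illustrative features:
# [prior_sanctions, fiduciary_breach_mentions, esg_score]
def risk_model(x):
    return 0.5 * x[0] + 0.3 * x[1] - 0.1 * x[2]


background = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
cases = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

# Local explanations: one vector of Shapley values per case.
local = [shapley_values(risk_model, case, background) for case in cases]
# Global importance: mean absolute Shapley value per feature across cases.
global_importance = [
    sum(abs(phi[i]) for phi in local) / len(local) for i in range(3)
]
```

For each case, the local values sum to the difference between that case's prediction and the mean prediction over the background rows, which is the additivity property that makes the per-feature attributions auditable. Exhaustive enumeration is only feasible for a handful of features; this is why dedicated explainers are needed for high-dimensional text features.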
Within the context of litigation-based ESG risk modeling, SHAP is applied to quantify the importance of diverse input features, spanning from structured legal variables (e.g., litigation outcomes, violation domains) to unstructured textual embeddings generated using Legal-BERT. The SHAP analysis reveals legally salient patterns that systematically elevate predicted ESG risk. For instance, frequent references to regulatory sanctions, environmental damage, or board-level negligence consistently emerge as key indicators of elevated risk across multiple cases.
This commitment to interpretability aligns with recent regulatory developments, including the European Union’s Artificial Intelligence Act and the OECD’s principles on trustworthy AI, both of which underscore explainability as a prerequisite for deploying AI in high-risk domains such as finance and law [
47,
48]. By incorporating SHAP into the modeling framework, this study not only enhances the transparency of ESG risk predictions but also supports compliance with emerging ethical and regulatory standards for deploying AI in ESG assessment. Unlike black-box models, the integrated SHAP analysis offers both local and global explanations.
3.5. Model Training and Evaluation
To evaluate the effectiveness of the proposed ESG risk prediction models, this study employed five key performance metrics commonly used in supervised classification tasks: accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). However, rather than applying these metrics in a generic manner, their relevance is contextualized within the specific goals and constraints of litigation-based ESG risk assessment.
Accuracy, defined as the proportion of all correct predictions relative to the total number of cases, offers a high-level overview of the model’s general classification performance. However, in the context of ESG litigation—which is characterized by a relatively low base rate of high-risk events (i.e., lawsuits)—accuracy alone may be misleading. A model that simply predicts the majority class (low-risk) can still achieve high accuracy while failing to identify actual risk cases, thereby undermining its practical value for compliance officers, asset managers, and other stakeholders. Therefore, accuracy is reported for completeness but is not prioritized in performance evaluation.
In contrast, precision plays a particularly important role in this domain. Precision measures the proportion of firms predicted to be at high risk of ESG-related litigation that were in fact involved in such legal cases. In real-world applications—such as ESG investment screening or regulatory supervision—false positives can incur substantial reputational, legal, and financial costs. High precision thus indicates that the model minimizes false alarms and provides reliable alerts for closer due diligence or intervention.
Recall, or sensitivity, captures the model’s ability to correctly identify firms that actually experienced ESG-related litigation. From a risk management perspective, missing high-risk firms (i.e., generating false negatives) poses a significant concern, as these entities may go unmonitored despite posing material environmental, social, or governance threats. A model with low recall could perpetuate existing blind spots in ESG evaluation systems, especially for firms engaging in greenwashing or symbolic compliance. Therefore, achieving high recall is essential for the model’s role as an early-warning system that flags firms whose disclosed ESG profiles diverge from their actual legal exposure.
Given the inherent trade-off between precision and recall—particularly in imbalanced datasets where litigation cases are relatively rare—the F1-score serves as a consolidated metric that balances both concerns. It is especially informative in this study because it rewards models that not only identify high-risk firms accurately but also avoid excessive overprediction. F1-score provides a more meaningful reflection of real-world utility than accuracy, aligning with the operational needs of institutional investors and ESG rating agencies who must balance responsiveness with credibility.
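The argument that accuracy can mislead on imbalanced data, while F1 does not, can be checked with a short sketch. The ten-firm label vector and the two prediction vectors below are hypothetical and serve only to make the four definitions concrete.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive (high-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical sample: 2 high-risk firms (label 1) out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A degenerate model that always predicts the majority (low-risk) class.
majority = classification_metrics(y_true, [0] * 10)
# A screening model that finds one true case and raises one false alarm.
screening = classification_metrics(y_true, [1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
```

Both models reach an accuracy of 0.8, but the majority-class model has a recall and F1-score of 0, whereas the screening model scores 0.5 on precision, recall, and F1.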
Finally, the AUC-ROC is employed to assess the model’s ability to discriminate between high-risk and low-risk firms across all possible classification thresholds. This threshold-agnostic measure is particularly important given that decision-makers may adjust risk tolerance levels depending on regulatory, financial, or strategic objectives. A high AUC value indicates that the model reliably ranks firms by underlying litigation risk, even if a final classification cutoff must later be calibrated. In this way, the AUC-ROC supports flexible deployment across different ESG governance scenarios, from regulatory stress testing to proactive stakeholder engagement.
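The ranking interpretation of AUC-ROC can be sketched directly: it equals the probability that a randomly chosen high-risk firm receives a higher score than a randomly chosen low-risk firm, with ties counted as one half. The scores below are hypothetical, and the pairwise formulation is used for readability rather than efficiency.

```python
def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for two litigated and three non-litigated firms.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc = auc_roc(y_true, scores)  # 5 of 6 pairs are ordered correctly
```

Because the computation uses only the ordering of scores, no classification threshold is involved, which is what allows the cutoff to be calibrated afterwards.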
To ensure that the models generalize well and do not overfit to the limited dataset, a five-fold cross-validation procedure was applied during training and hyperparameter tuning. In addition, dropout regularization was incorporated within the Legal-BERT embedding layer to prevent the model from memorizing case-specific language patterns. Together, these strategies help maintain the empirical rigor and practical reliability of the proposed litigation-aware ESG risk framework.
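A five-fold procedure of this kind can be sketched with the standard library alone. Stratifying the folds by class is an assumption added here because litigation cases are described as rare; the function name, the seed, and the 10-versus-30 label vector are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs with class ratios preserved per fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Hypothetical labels: 10 high-risk and 30 low-risk firms.
labels = [1] * 10 + [0] * 30
splits = list(stratified_k_fold(labels, k=5, seed=0))
```

Each of the five validation folds then contains two high-risk and six low-risk firms, and every firm is used for validation exactly once.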
Taken together, these metrics and methodological safeguards provide a comprehensive and nuanced evaluation framework that reflects not only statistical performance but also the ethical and strategic imperatives of ESG modeling. By prioritizing recall and precision over raw accuracy, validating ranking capacity through AUC, and explicitly addressing overfitting risks through cross-validation and dropout regularization, the study aims to ensure that its proposed hybrid model offers both empirical rigor and actionable relevance for the growing ecosystem of sustainability-oriented decision-makers.
Across multiple cross-validation folds, the proposed hybrid model consistently outperformed the ESG-only baseline, achieving an AUC of 0.76 and an F1-score of 0.68. These results underscore the practical utility of incorporating litigation-derived features into ESG risk classification tasks.
4. Research Analysis
4.1. Descriptive Statistics
To address the first research question—whether ESG-related litigation reveals material risk factors that are underrepresented or entirely omitted in conventional ESG evaluations—this study conducted a comprehensive descriptive analysis of court rulings related to ESG issues. The dataset comprises 213 adjudicated cases between January 2023 and May 2025, each manually categorized into Environmental (E), Social (S), or Governance (G) domains, based on the primary subject matter of the legal dispute. The following
Table 1 provides an overview of ESG-related litigation cases categorized by violation domain.
In this study, ESG-related legal violations are categorized into three domains—Environmental (E), Social (S), and Governance (G)—based on the substantive nature of the alleged misconduct and the primary U.S. federal or state statutes invoked in each case. This classification follows a content-based criterion, whereby the legal provisions directly determine the violation type. Specifically, Environmental violations are defined as breaches of environmental protection laws aimed at safeguarding natural resources and public health. These cases frequently cite the Clean Air Act (CAA), which regulates air pollutant emissions; the Clean Water Act (CWA), which governs water pollution; and the Resource Conservation and Recovery Act (RCRA), which addresses hazardous waste management. Social violations are defined as legal breaches affecting labor rights, human rights, or equal opportunity, such as violations of the Fair Labor Standards Act (FLSA), which establishes wage and hour standards, and Title VII of the Civil Rights Act, which prohibits workplace discrimination. Governance violations refer to legal breaches that undermine corporate governance integrity, including violations of disclosure obligations, accounting fraud, and breaches of fiduciary duty, often adjudicated under the Securities Exchange Act of 1934, the Sarbanes–Oxley Act (SOX), and state-level corporate governance laws like the Delaware General Corporation Law. This taxonomy is legally grounded, provides operational definitions for each category, and is directly applied in our dataset construction, feature engineering, and SHAP-based interpretation to analyze risk drivers by violation type.
The distribution of violation types indicates that governance-related disputes account for the largest share at 42.7%, followed by environmental (35.2%) and social (22.1%) cases. This distribution stands in contrast to the structure of most mainstream ESG rating frameworks, such as MSCI and Sustainalytics, which tend to assign higher weight to environmental indicators—e.g., carbon emissions, energy use, or waste management. Governance is often reduced to policy existence checks (e.g., whether a board diversity policy is in place), rather than dynamic indicators of governance failures. The fact that governance-related lawsuits are disproportionately prevalent suggests a fundamental blind spot in current rating methodologies, particularly in capturing behavioral or structural deficiencies that are not easily disclosed.
Furthermore, legal outcomes demonstrate the material consequences of these violations: approximately 51.6% of the cases resulted in adverse outcomes for the firm, such as court-imposed sanctions, monetary damages, or injunctive relief. These results confirm that ESG litigation is not a marginal issue but a significant manifestation of non-compliance with sustainability norms.
As shown in
Table 2, more than half of the ESG-related litigation cases (approximately 51.6%) resulted in negative outcomes for the firms involved, such as monetary damages, injunctive relief, regulatory sanctions, or settlements. This significant proportion highlights the existence of substantial, tangible risks that traditional ESG rating systems frequently overlook. Such legally adjudicated outcomes provide clear evidence that ESG assessments based primarily on voluntary disclosures and self-reported compliance can fail to capture critical, real-world risk factors, reinforcing the need for integrating litigation-based evidence into ESG evaluation frameworks.
One major reason for this misalignment lies in the heavy reliance of ESG rating agencies on self-reported disclosures, sustainability reports, and voluntary responses to questionnaires. These documents often omit or downplay legally sensitive matters. For instance, while many firms claim to have “comprehensive human rights policies,” court rulings reveal involvement in labor union suppression, discriminatory layoffs, or subcontractor abuse. Such contradictions point to the limitations of narrative- and policy-driven ESG scoring, particularly when they lack empirical enforcement validation.
To further explore this, key sentences and concepts were extracted from the rulings and compared with the respective firms’ disclosed ESG reports. The analysis revealed multiple instances in which firms declared adherence to “ethical supply chain management,” yet were found liable in court for labor-related violations within their subcontracting networks.
Overall, this analysis provides empirical evidence that ESG-related litigation offers a distinct, complementary source of risk information that is largely excluded from conventional ESG rating models. While rating agencies tend to assess firms based on declared intentions, governance frameworks, and reputational indicators, litigation data captures failures of implementation and accountability. By revealing how corporate practices diverge from ESG narratives, court rulings serve as a verifiable, adjudicated counterweight to symbolic compliance. As such, integrating litigation-based data into ESG risk models can play a critical role in mitigating information asymmetry and enhancing the predictive and ethical robustness of sustainability evaluations.
4.2. Model Performance
To examine whether incorporating litigation-derived features can enhance the accuracy and reliability of ESG risk prediction, this study conducted supervised classification experiments using two configurations: a baseline model that relied solely on traditional ESG ratings and financial indicators, and a hybrid model that incorporated additional features extracted from court rulings. The baseline model utilized ESG scores provided by major rating agencies, including Sustainalytics. The hybrid model extended this approach by integrating structured legal information—such as violation domain and legal outcome—as well as semantic representations of court texts generated using Legal-BERT, a domain-specific transformer-based model trained on legal documents.
All models were evaluated using a stratified 70/15/15 split across training, validation, and test sets, and five-fold cross-validation was employed for hyperparameter tuning. Three algorithms (Random Forest, XGBoost, and a Legal-BERT-based classifier) were used to assess performance. Among the baseline models, XGBoost yielded the highest performance, with an F1-score of 0.68 and an AUC-ROC of 0.74. However, further analysis revealed a substantial number of false negatives, especially for firms that had maintained high ESG scores despite later becoming involved in litigation. This suggests that conventional ESG indicators, which rely heavily on self-disclosed information, fail to detect many latent risk factors associated with legal and regulatory exposure.
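A stratified split with these proportions can be sketched as follows. The helper name, the fixed seed, the per-class rounding rule, and the 40-versus-160 label vector are assumptions for illustration and do not reflect the class balance of the study's dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split indices into train/validation/test while preserving class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        # Whatever remains after train and validation goes to the test set.
        test += indices[n_train + n_val:]
    return sorted(train), sorted(val), sorted(test)


# Hypothetical labels: 40 high-risk and 160 low-risk firms.
labels = [1] * 40 + [0] * 160
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting within each class keeps the share of high-risk firms at 20% in all three subsets, so that test-set recall is not distorted by an unrepresentative sample.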
In contrast, the hybrid model incorporating litigation-derived features showed marked improvements across all evaluation metrics. As summarized in
Table 3, the Legal-BERT-based hybrid model achieved an F1-score of 0.81, an AUC-ROC of 0.87, a precision score of 0.79, and a recall score of 0.84. The improvement in recall is particularly notable, indicating the model’s enhanced capacity to identify high-risk firms that may otherwise be overlooked. This reflects the predictive value of litigation-based data, which provides concrete, adjudicated signals of non-compliance and misconduct that are often absent from voluntary ESG disclosures.
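As a brief consistency check, the reported F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The snippet uses only the three figures quoted in this paragraph.

```python
precision, recall = 0.79, 0.84  # values reported for the hybrid model

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.81
```

The recomputed value agrees with the reported F1-score of 0.81 to two decimal places.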
Figure 1 visualizes these performance metrics, clearly illustrating that the hybrid model outperforms the baseline across all key metrics.
Figure 2 further presents the ROC curves of both models, reinforcing the observed differences in classification effectiveness, particularly in recall and AUC-ROC, and confirming that the integration of legal information meaningfully enhances the model’s ability to capture real-world ESG risk.
Finally, this performance difference is examined in greater detail through
Figure 3 and
Figure 4, which compare the confusion matrices of the baseline and hybrid models, respectively. The hybrid approach reduces both false positives and false negatives, making it more suitable for high-stakes decision-making in investment, compliance, and governance contexts.
To further analyze the internal logic of the hybrid model and enhance transparency, SHAP was employed to assess feature importance. The SHAP analysis revealed that the most influential predictors of ESG litigation risk were predominantly derived from legal data rather than traditional ESG scores or financial indicators.
As summarized in
Table 4, the top-ranked feature was prior regulatory sanctions (mean SHAP score = 0.221), followed by mentions of fiduciary breach (0.189) and the frequency of governance-related terms (0.172). Other highly ranked features included adverse legal outcomes such as monetary damages or injunctions (0.157), mentions of environmental violations (0.144), repeated non-compliance statements (0.131), and labor rights violations (0.119).
These features consistently showed strong positive contributions to the model’s predictions of elevated ESG litigation risk. Taken together, these findings highlight that legally grounded and behaviorally observable events provide greater predictive and explanatory value than narrative-based ESG indicators, thereby underscoring the inadequacy of relying solely on self-reported ESG disclosures.
The results in
Table 4 substantiate the claim that litigation-derived features not only offer high predictive utility but also improve model explainability. The top-ranked features correspond to verifiable legal events and language patterns that are often entirely omitted from ESG disclosures. For example, the presence of prior regulatory sanctions and repeated references to governance failures were consistently associated with elevated predicted risk, reflecting the practical and legal consequences of poor ESG performance.
4.3. Key Findings
The empirical analyses presented in
Section 4.1 and
Section 4.2 collectively yield several significant findings that directly address the study’s two research questions.
First, the results affirm that ESG-related litigation contains critical risk information that is underrepresented, if not entirely absent, in conventional ESG ratings. Descriptive evidence revealed a notable disconnect between firms’ ESG scores and their actual exposure to legal sanctions. A majority of litigated firms maintained above-average ESG ratings in the year prior to their legal disputes, despite being subsequently found liable for violations ranging from environmental damage to governance breaches. This misalignment suggests that prevailing ESG assessment systems—primarily based on voluntary disclosures, policy checklists, and reputational indicators—fail to detect latent structural risks, particularly those related to internal control failures and performative (rather than substantive) ESG compliance.
Second, the integration of litigation-derived features into predictive models significantly enhances their ability to identify firms at elevated risk of future ESG-related legal events. The hybrid models outperformed baseline configurations across all performance metrics, with the Legal-BERT-based model achieving the highest F1-score (0.81) and AUC-ROC (0.87). The marked improvement in recall—rising from 0.65 in the baseline model to 0.84 in the litigation-augmented model—is especially noteworthy. It indicates that many high-risk firms overlooked by disclosure-based assessments can be effectively identified when adjudicated legal evidence is incorporated. These findings demonstrate that litigation data offer predictive value not captured by ESG ratings alone and support their inclusion as core input features in next-generation ESG risk modeling.
Third, explainability analysis using SHAP further substantiates the centrality of litigation-based features in model decision-making. Features such as prior sanctions, frequency of governance-related legal terms, and adverse legal outcomes emerged as the most influential predictors, surpassing the explanatory weight of conventional ESG scores. This not only confirms the empirical relevance of legal signals but also enhances the transparency and accountability of AI-driven ESG assessments—an increasingly important concern in high-stakes decision-making contexts. The use of SHAP ensured that model outputs were interpretable and aligned with human domain knowledge, allowing regulators, investors, and analysts to understand the legal basis for elevated risk classifications.
Taken together, these findings point to the limitations of current ESG evaluation systems and propose a credible path forward. While ESG scores remain useful for benchmarking disclosure practices and policy commitments, they are insufficient as standalone indicators of actual sustainability performance or legal vulnerability. By augmenting ESG assessments with legally adjudicated data and ensuring explainability through SHAP, the proposed framework offers a more robust, predictive, and ethically grounded approach to ESG risk evaluation. This approach not only improves model accuracy but also strengthens the legitimacy of ESG analysis as a tool for institutional accountability and market transparency.
5. Conclusions
5.1. Summary of Contributions
This study offers a significant contribution to the field of ESG risk assessment by proposing a hybrid evaluation model that integrates legal litigation data with traditional ESG metrics. While conventional ESG scores are largely built on self-reported disclosures and standardized questionnaires, they frequently fail to reflect hidden or latent risks associated with legal and regulatory non-compliance. Our work addresses this critical gap by incorporating court rulings, enforcement records, and textual signals from litigation documents into a machine learning framework designed to predict ESG-related legal risk.
Through the application of transformer-based sentence embeddings and explainable AI (SHAP), we were able to uncover the relative importance of various features in ESG litigation prediction. Notably, indicators such as recurring governance failures, prior sanctions by regulatory bodies, and negatively framed judicial expressions were found to be stronger predictors of ESG litigation than traditional ESG scores. This suggests that many high-rated firms under current ESG schemes may still be exposed to substantial legal and reputational risks.
Furthermore, this research enhances transparency and trust in ESG modeling by integrating explainability mechanisms that allow stakeholders to understand the basis for predictions. The result is a more robust, evidence-based ESG risk signal that can serve as a supplementary tool for investors, regulators, and ESG rating agencies seeking to enhance the granularity and reliability of sustainability assessments.
These contributions fill a crucial gap in the current ESG literature. While prior studies have raised concerns over the opacity, inconsistency, and limited predictive value of ESG ratings based on self-reported disclosures [
49,
50], few have proposed alternative frameworks that incorporate verifiable legal outcomes as a direct signal of ESG-related risks. For instance, Liang and Renneboog [
51] explored the relationship between ESG performance and firm value using disclosure-based metrics, but did not account for the legal accountability dimension. Krueger et al. [
52] investigated the financial implications of ESG controversies but largely relied on news sentiment rather than adjudicated outcomes. By contrast, this study introduces a litigation-augmented ESG risk model that operationalizes court rulings, enforcement records, and legal language as explainable features. In doing so, it provides a legally grounded, auditable, and transparent mechanism to complement or challenge traditional ESG ratings. This novel integration of explainable AI and court-based signals advances the field by offering a more rigorous and accountable approach to ESG risk detection—particularly in contexts where greenwashing or symbolic compliance is suspected.
5.2. Implications for ESG Evaluation Practices
The findings from this study carry substantial implications for how ESG evaluations are designed and applied in practice. First, they highlight the insufficiency of relying solely on self-reported or third-party ESG ratings, which often fail to reveal latent risks related to regulatory sanctions, governance failures, or repeat violations. By integrating adjudicated legal outcomes and explainable AI methods, this approach provides a more evidence-based perspective that investors, regulators, and ESG rating agencies can use to detect greenwashing or symbolic compliance more effectively.
Importantly, the SHAP-based explanations in this framework are designed to be practically actionable. By clearly highlighting which legal signals—such as prior regulatory sanctions, repeated governance breaches, or adverse court outcomes—most strongly contribute to a firm’s predicted risk, the model enables compliance officers, ESG analysts, and institutional investors to trace, justify, and communicate risk scores with transparency. This interpretability supports the integration of litigation-aware risk signals into existing due diligence, audit, and monitoring processes.
To complement the global SHAP-based feature importance analysis, this study introduces a case-level qualitative interpretation that highlights the model’s practical relevance and interpretability. A representative example involves a major airline that became the subject of ESG-related litigation. In that case, the court determined that the company had violated its fiduciary duties by pursuing ESG objectives without adhering to its own internal governance protocols. Despite having received a relatively neutral ESG rating from commercial providers, the ruling emphasized misalignments between the firm’s public sustainability disclosures and its operational decisions—particularly with respect to fleet renewal and carbon offsetting practices. Similar patterns are reflected in our SHAP analysis (
Table 4), where features such as Prior Regulatory Sanctions, Mentions of Fiduciary Breach, and Adverse Legal Outcomes emerge as the most influential predictors of ESG litigation risk. Even in cases where ESG scores appear neutral or positive, the presence of repeated regulatory infractions and a high frequency of governance-related terminology in legal documents significantly contribute to elevated risk classifications. This analysis demonstrates that the proposed model moves beyond numerical prediction to offer legally grounded and interpretable risk signals. By identifying inconsistencies between disclosed ESG commitments and actual governance practices, the model enables stakeholders to reassess ESG risk with a heightened degree of contextual and evidentiary rigor. The integration of adjudicated legal data not only enhances prediction accuracy but also provides a transparent and verifiable basis for decision-making, which is essential for applications in investment, compliance, and regulatory oversight.
However, while SHAP provides a clear ranking of feature contributions, its practical utility depends on how effectively these insights are integrated into actual compliance, audit, or investment workflows. For example, a human-in-the-loop process should validate whether repeated regulatory sanctions flagged by the model indeed warrant immediate risk reclassification or further investigation. Moreover, domain experts must interpret the context of legal outcomes to avoid over-reliance on machine-generated signals that may lack nuance in complex cases. To support such workflows, future studies should develop scenario-based guidelines or case studies demonstrating how XAI outputs can be systematically reviewed, challenged, or refined in institutional ESG risk management.
This human-in-the-loop review also creates a feedback loop: domain experts examine the risk drivers identified by the model and correct or refine them as needed, which helps ensure that the final risk assessments align with legal realities and organizational risk tolerance and bridges the gap between automated signals and expert judgment.
5.3. Limitations of the Study
While this study offers an innovative approach to enhancing ESG risk models through litigation data and explainable AI, several limitations should be acknowledged.
First, the legal dataset used here is limited to U.S. federal and state court rulings, which may constrain the generalizability of the results to other legal systems and regulatory environments. Differences in court transparency, litigation culture, and ESG disclosure mandates can significantly affect how ESG controversies manifest in legal proceedings. The choice of the U.S. as the legal context is nevertheless supported by its globally recognized case law system, high transparency, and the precedent-setting nature of its rulings, which make U.S. court data a valuable foundation for methodological development and a relevant reference point for international ESG research. To examine broader applicability, future research should apply the same framework to diverse jurisdictions (such as the EU, where regulatory enforcement and ESG reporting are more codified, or Asian countries with emerging ESG regimes) and adapt the legal text processing pipeline for multilingual and region-specific legal contexts. Furthermore, jurisdiction-specific differences in ESG litigation may influence how model outcomes align with stakeholder expectations, indicating the need for stakeholder engagement in future validation efforts.
Second, litigation data itself may be subject to inherent reporting lags, even within the defined observation window. Such lags arise because legal cases often progress through multiple procedural stages—from the underlying ESG-related incident to investigation, filing, trial, and final adjudication—each of which may span months or years. Consequently, the date when a ruling becomes publicly available may substantially postdate the actual occurrence of the ESG violation. This temporal misalignment can attenuate the timeliness of predictions, as the model’s “observation” of the event is effectively delayed relative to the real-world risk emergence. In high-stakes applications such as investment screening or regulatory monitoring, this lag may result in a delayed response to emerging risks, limiting the model’s preventive utility. Future studies could address this limitation by incorporating additional legal process metadata (e.g., filing dates, preliminary rulings) or by integrating alternative, earlier-stage data sources (e.g., regulatory notices, press releases, whistleblower reports) to mitigate the impact of reporting delays and improve the responsiveness of ESG risk prediction systems.
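The effect of this lag, and of moving the observation point from the ruling to the filing, can be illustrated with a short calculation. The sketch below uses invented dates purely for illustration; the case records and the helper `lag_days` are hypothetical and are not drawn from the study's dataset.

```python
from datetime import date
from statistics import median


def lag_days(start: date, end: date) -> int:
    """Number of days from `start` to `end`; rejects reversed dates."""
    if end < start:
        raise ValueError("end date precedes start date")
    return (end - start).days


# Hypothetical records: (case id, incident date, filing date, ruling date).
cases = [
    ("case_1", date(2019, 3, 1), date(2020, 1, 15), date(2022, 6, 30)),
    ("case_2", date(2020, 7, 10), date(2021, 2, 1), date(2023, 1, 20)),
    ("case_3", date(2021, 1, 5), date(2021, 9, 12), date(2023, 4, 3)),
]

ruling_lags = [lag_days(incident, ruling) for _, incident, _, ruling in cases]
filing_lags = [lag_days(incident, filing) for _, incident, filing, _ in cases]

# Observing a case at filing rather than at ruling shortens the lag
# by the duration of the court proceedings.
print("median lag to ruling (days):", median(ruling_lags))
print("median lag to filing (days):", median(filing_lags))
```

If filing dates were available as metadata, the same comparison on real cases would indicate how much timeliness is gained by observing a case at an earlier procedural stage.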
Third, the litigation dataset used in this study comprises 213 ESG-related rulings. This number is meaningful given the novelty of ESG as a legal concept and the limited global accumulation of such cases, but the small sample size still constrains the statistical power of multi-class or multi-dimensional modeling. This limitation reflects the current scarcity of ESG-related case law. Future research should expand the dataset as more cases become available, drawing on broader legal databases such as LexisNexis to incorporate additional jurisdictions and enrich the training corpus, thereby improving both the robustness and generalizability of model outcomes.
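A rough sense of how sample size limits precision can be obtained from the Wilson score interval for a proportion. The sketch below is illustrative only: the 80% hit rate and the split of the 213 rulings into three equally sized classes are assumptions made for the example, not results or design choices of this study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical: roughly the same 80% hit rate measured on all 213 rulings
# versus on one of three equally sized classes (71 rulings each).
widths = {}
for n in (213, 71):
    successes = round(0.8 * n)
    low, high = wilson_interval(successes, n)
    widths[n] = high - low
    print(f"n={n}: 95% interval width = {widths[n]:.3f}")
```

Under these assumptions the interval for a single class is noticeably wider than for the pooled sample, which is one way to see why multi-class modeling on a dataset of this size yields less certain estimates.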
Fourth, the two-year observation window provides valuable, up-to-date insights into recent ESG-related litigation but may not fully capture repeated litigation patterns, long-term governance failures, or evolving risk trajectories. Extending the dataset to cover a longer time span would enable dynamic tracking of persistent violators, changes in corporate governance practices, and the cumulative effect of regulatory actions. Incorporating additional longitudinal data sources, such as administrative penalties or regulatory investigations, could also help build a more comprehensive view of structural ESG risks over time.
Finally, although the control group was constructed using stratified random sampling by industry classification and market capitalization to ensure basic comparability, residual selection bias may still exist. This stratification approach remains appropriate given known industry- and size-specific ESG risk exposures, but future research could test more advanced matching techniques, such as propensity score matching (PSM) or coarsened exact matching (CEM), to further improve covariate balance and validate the robustness of the comparative analysis under alternative assumptions.
More broadly, the methodological framework proposed in this study is not confined to a single jurisdiction but is designed to be applicable across legal systems. With appropriate access to data, the framework can be flexibly adapted to align with the legal structures and ESG governance practices of different countries. Addressing these limitations in future research will involve securing access to commercial ESG datasets, conducting comparative benchmarking, and implementing external validation using real-world ESG events to enhance the framework's robustness and practical relevance. By doing so, the proposed approach can evolve into a more reliable and widely applicable tool for advancing transparency, accountability, and comparability in ESG assessments across diverse legal and regulatory contexts.
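As an illustration of the matching alternatives mentioned in the final limitation, the following sketch shows a simplified one-to-one variant of coarsened exact matching on industry and market-capitalization bins. It is not the sampling procedure used in this study; full CEM retains all units in matched strata and applies weights, whereas this version pairs each litigated firm with a single control. The firm records, the bin edges, and the function names `coarsen` and `cem_match` are hypothetical.

```python
import random
from collections import defaultdict


def coarsen(firm, cap_bins):
    """Map a firm to a stratum: (industry, market-cap bin index)."""
    bin_index = sum(firm["market_cap"] >= edge for edge in cap_bins)
    return (firm["industry"], bin_index)


def cem_match(treated, controls, cap_bins, seed=0):
    """One-to-one exact matching on coarsened covariates, without replacement."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for firm in controls:
        pool[coarsen(firm, cap_bins)].append(firm)
    # Shuffle within each stratum so the chosen control is random.
    for members in pool.values():
        rng.shuffle(members)

    pairs, unmatched = [], []
    for firm in treated:
        candidates = pool[coarsen(firm, cap_bins)]
        if candidates:
            pairs.append((firm, candidates.pop()))
        else:
            unmatched.append(firm)  # no control left in the same stratum
    return pairs, unmatched


# Hypothetical bin edges in USD: below 2bn, 2bn to 10bn, 10bn and above.
cap_bins = [2e9, 10e9]
treated = [
    {"id": "T1", "industry": "energy", "market_cap": 5e9},
    {"id": "T2", "industry": "tech", "market_cap": 50e9},
    {"id": "T3", "industry": "retail", "market_cap": 1e9},
]
controls = [
    {"id": "C1", "industry": "energy", "market_cap": 7e9},
    {"id": "C2", "industry": "tech", "market_cap": 12e9},
    {"id": "C3", "industry": "energy", "market_cap": 0.5e9},
]
pairs, unmatched = cem_match(treated, controls, cap_bins)
```

Litigated firms left in `unmatched` have no comparable control under the chosen coarsening; reporting how many are dropped is a simple check on how much the matched comparison can be generalized.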