Article

Coal Mine Accident Risk Analysis with Large Language Models and Bayesian Networks

1
University of Chinese Academy of Sciences, Beijing 100049, China
2
Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
*
Author to whom correspondence should be addressed.
Sustainability 2025, 17(5), 1896; https://doi.org/10.3390/su17051896
Submission received: 23 January 2025 / Revised: 16 February 2025 / Accepted: 17 February 2025 / Published: 24 February 2025

Abstract

Coal mining, characterized by its complex operational environment and significant management challenges, is a prototypical high-risk industry with frequent accidents. Accurate identification of the key risk factors influencing coal mine safety is critical for reducing accident rates and enhancing operational safety. Comprehensive analyses of coal mine accident investigation reports provide invaluable insights into latent risk factors and the underlying mechanisms of accidents. In this study, we construct an integrated research framework that synthesizes large language models, association rule mining, and Bayesian networks to systematically analyze 700 coal mine accident investigation reports. First, a large language model is employed to extract risk factors, identifying multiple layers of risks, including 14 direct, 38 composite, and 75 specific factors. Next, the Apriori algorithm is applied to mine 281 strong association rules, which serve as the foundation for constructing a Bayesian network model comprising 127 nodes. Finally, sensitivity analysis and critical path analysis are conducted on the Bayesian network to reveal seven primary risk factors, mainly related to on-site safety management, the execution of operational procedures, and insufficient safety supervision. The novelty of our framework lies in its efficient processing of unstructured text data via large language models, which significantly enhances the accuracy and comprehensiveness of risk factor identification compared to traditional methods. The findings provide robust theoretical and practical support for coal mine safety risk management and offer valuable insights for risk management practices in other high-risk industries.
From a policy perspective, we recommend that the government strengthen legislation and supervision of coal mine safety with a particular focus on the enforcement of operational procedures and on-site safety management, promote comprehensive safety education and training to enhance frontline personnel’s awareness and emergency response capabilities, and leverage data-driven technologies to develop intelligent risk early-warning systems. These measures will improve the precision and efficiency of safety management and provide a scientific basis for accident prevention and control.

1. Introduction

The occurrence of coal mine accidents not only results in human casualties and economic losses, but also generates a cascade of profound and long-lasting impacts, such as production suspensions, administrative penalties imposed on regulatory bodies, erosion of public trust in coal mining enterprises, and disruptions to coal supply chains. Given the severe and far-reaching consequences of such accidents, research focused on the prevention and mitigation of coal mine accidents holds substantial practical and societal value. Over the past several decades, the safety standards of coal mine production in China have significantly improved. As illustrated in Figure 1, between 2013 and 2023, three key indicators of safety performance—the frequency of coal mine accidents, the number of fatalities, and the death rate per million tons of coal—have all demonstrated a consistent downward trajectory. While the safety situation in Chinese coal mines has shown continuous improvement, the steady enhancement of safety standards faces persistent challenges, including increasing uncertainty in the external environment, inherent weaknesses in the safety infrastructure of mines, insufficient enforcement of corporate responsibility, and a shortage of skilled technical personnel. These issues constitute latent risks that may compromise the safety of coal mining operations. According to well-established accident causation theories [1,2,3,4], the occurrence of coal mine accidents is the result of the complex interaction of multiple risk factors. Effectively identifying and managing these latent risks in the coal mining process could lead to a significant reduction in the incidence of accidents.
Early research on coal mine accidents predominantly focused on the harm inflicted on miners, emphasizing the factors influencing both the likelihood of injury and its severity during such events. Key determinants identified in these studies include miners’ experience, the size of the coal mine, and the employed mining methods. These insights led to the development of protective measures aimed at enhancing miners’ safety in the workplace [5,6,7]. In addition, early research predominantly relied on accident statistics and case studies, utilizing quantitative methods and empirical observations to investigate accident causation [8]. While these approaches were innovative at the time, their effectiveness was limited by restricted data availability and relatively narrow methodological scope, hindering a comprehensive understanding of the complex causes of accidents. This limitation led subsequent scholars to recognize that a single-factor approach cannot fully account for the complexity of accidents, underscoring the need for a multidimensional analysis of the interactions between various contributing factors. Furthermore, early studies made significant contributions to the development of safety measures and the formulation of safety theories, providing invaluable theoretical foundations and practical insights that have been instrumental in shaping more refined risk assessment models and safety management systems in later research.
With the progressive improvement of coal mining environments, scholarly attention shifted toward the potential occupational health hazards associated with mining operations. Contemporary studies highlight that a wide range of factors—including occupational safety measures, the enforcement of safety regulations, risk control technologies, working conditions, and individual characteristics—have a significant impact on miners’ occupational health [9,10,11,12]. Currently, research on coal mine accidents primarily centers on accident prevention and risk control, with the overarching goal of minimizing their adverse impacts. A critical and effective research avenue involves the in-depth analysis of risk factors contributing to coal mine accidents, providing a robust scientific foundation for the formulation of preventive measures. Studies addressing coal mine accident risk factors can be broadly classified into three approaches: knowledge-driven, data-driven, and a fusion of knowledge and data-driven methodologies. In addition, risk factors are dynamic in nature, continuously evolving as mining environments, technological advancements, and safety management measures are progressively optimized [13]. By integrating historical accident data, researchers can trace the temporal evolution of risk factors, demonstrating that changes in production conditions, management levels, and external environments exert profound impacts on accident risk characteristics [14,15]. This dynamic evolution highlights the necessity of adopting adaptive risk management strategies to promptly identify and address emerging safety hazards [16]. Understanding the mechanisms behind these evolving risk factors not only aids in developing more proactive and targeted preventive measures but also lays a strong theoretical foundation and provides methodological support for future research employing knowledge-driven, data-driven, and hybrid approaches.
Knowledge-driven research primarily builds on the expertise of scholars and employs established theoretical frameworks to analyze critical risk factors associated with coal mine accidents. Among these frameworks, process safety management (PSM) is one of the most frequently applied. By integrating systematic risk assessment models such as HAZOP [17,18] and FTA [19], this approach effectively identifies and evaluates risk factors, thereby offering essential support for enhancing coal mine safety management systems [20,21,22,23]. For example, researchers have developed various risk factor analysis models [24,25,26,27,28,29,30] and have investigated the interaction mechanisms of coal mine accident risk factors within relevant theoretical frameworks [31,32,33,34,35]. These studies further assess the impact of these risk factors on accident outcomes, thereby providing a robust theoretical foundation for developing effective risk control strategies. Despite its contributions, the knowledge-driven approach exhibits certain limitations. It often focuses on specific types of coal mine accidents or examines risk factors in isolation, with minimal emphasis on the interrelationships and interactions among diverse factors. Such a narrow scope limits a comprehensive understanding of coal mine accident risks and undermines holistic risk management efforts [36,37,38].
Data-driven research on coal mine accident risk factors emphasizes the analysis of extensive coal mine accident datasets to identify and thoroughly investigate the key factors influencing such incidents. For instance, text mining techniques have been employed to extract risk factors from a vast number of coal mine accident investigation reports [39,40,41,42,43,44,45,46,47], while machine learning and deep learning models have been utilized to conduct advanced analyses of these identified factors [48,49,50,51,52,53]. This approach relies extensively on coal mine accident investigation reports, which systematically document the causes of accidents, providing a reliable data foundation for research in coal mine process safety management [54,55,56]. The adoption of advanced digital technologies in the coal mining sector has enabled real-time monitoring of production processes and the rapid detection of potential hazards, thereby offering critical support for preventing major accidents and safeguarding production facilities and personnel. The integration of digital technologies not only accelerates the response to safety hazards but also creates new opportunities for multi-source data integration and the optimization of risk management frameworks [57,58,59]. Historical data, including accident investigation reports, provide detailed insights into the causes and processes of accidents, forming a solid foundation for identifying the characteristics of risk factors and uncovering their evolutionary mechanisms. At the same time, real-time monitoring data capture dynamic changes in potential risks during production processes, enabling timely adjustments to safety management strategies.
By combining a deep analysis of risk factors’ characteristics and evolutionary mechanisms within historical data with the dynamic feedback provided by real-time monitoring, it becomes possible not only to validate analytical findings but also to optimize risk management strategies further. The seamless integration of historical and real-time data allows for a more precise understanding of the evolutionary trajectories of risk factors. Guided by process safety principles, this integration not only enhances the accuracy of analytical results but also provides systematic, scientifically grounded support for coal mine accident prevention and risk control [60].
Knowledge and data fusion-driven research on coal mine accident risk factors begins by leveraging the expertise of scholars to formulate research questions, followed by an in-depth analysis of coal mine accident data to investigate the root causes of these risks. Based on the findings, targeted risk mitigation strategies are developed, providing a robust foundation for effective risk management and accident prevention in coal mining operations [61,62,63,64,65]. This research approach not only builds on existing theoretical knowledge to refine the analysis of risk factors (such as categorizing risks by accident type [66,67] or location [68,69]) but also harnesses coal mine accident data to uncover novel insights and advance the understanding of underlying risk dynamics. The integration of knowledge and data is fundamental to conducting comprehensive and in-depth studies of coal mine accidents while establishing a foundation for dynamic process safety management. This fusion-driven research framework, guided by process safety principles, emphasizes systematization, prevention, and adaptability. By deeply integrating theoretical expertise with empirical data, it enables more precise identification and control of risk factors. Such an approach not only uncovers the underlying patterns and interactions among risk factors but also provides scientifically grounded and systematically informed decision-making support for coal mine safety management. As a result, it facilitates significant improvements in safety production levels while advancing the full implementation of process safety principles across the industry [70,71].
In summary, early research primarily relied on accident statistics and case studies to conduct quantitative investigations into the direct impacts of coal mine accidents on miner injuries and economic losses, identifying key risk factors such as miner experience, mine size, and mining methods. Although these methods were innovative at the time, their ability to reveal the complex causes and dynamic evolution of risk factors was limited by restricted data sources and a narrow methodological scope. With the continuous improvement of coal mining environments and technological advancements, the research focus has gradually shifted from isolated accident outcomes to the evolution of risk factors and their profound effects on miners’ occupational health. Accordingly, scholars have introduced knowledge-driven approaches based on theoretical frameworks, data-driven methods relying on big data analytics, and hybrid methodologies that integrate both strategies to explore the composition, interaction mechanisms, and evolution of accident risks from multiple dimensions. However, despite their distinct advantages, these approaches suffer from limitations—such as neglecting the interactions among multiple factors, incomplete information extraction, and an insufficient understanding of causal relationships—which prevent the establishment of a systematic, dynamic, and comprehensive risk assessment model. In light of these limitations and research gaps, there is an urgent need for a novel approach to overcome the shortcomings of existing methods. To address this gap, the present study proposes the use of advanced deep learning techniques and large language models to efficiently extract and analyze risk information from textual data while preserving semantic integrity. This approach aims to uncover the causal relationships among risk factors and track their dynamic evolution over time. 
Building on this foundation, this study seeks not only to construct a precise and efficient model for identifying and assessing coal mine accident risk factors but also to provide more scientific and systematic decision support for coal mine safety management and accident prevention.
From the perspective of risk factors, the key to effectively preventing and controlling coal mine accidents lies in the precise identification of the critical factors that contribute to their occurrence. Recent studies have widely employed text mining techniques for identifying coal mine accident risk factors, with the primary advantage being their ability to efficiently process large volumes of textual data, thereby significantly improving analysis efficiency [72,73]. However, one major drawback of text mining methods is that the process may disrupt the intrinsic semantic structure of the data, resulting in potential information loss [74,75]. Additionally, given the lack of in-depth understanding of the specific context within the coal mining industry, the risk factors identified solely through text mining often require further verification and refinement. The outcomes may not fully or accurately capture the real-world challenges encountered in coal mine production. As shown in Table 1, the main limitations and challenges of existing methods in handling risk factors are outlined in detail. To overcome the limitations inherent in existing methods for coal mine accident risk identification and to facilitate more precise analyses, this study proposes leveraging deep learning techniques using large language models [76,77,78]. Large language models are equipped with advanced natural language processing and understanding capabilities, enabling them to not only efficiently identify potential risk factors but also to uncover the causal relationships between these factors, while effectively preserving the semantic integrity of the data [79,80,81]. In the context of complex textual data, the efficiency of large language models further enhances the reliability and validity of the analysis results. 
By incorporating large language models, this study provides robust technical support for the development of a more accurate and practical coal mine accident risk factor analysis model. Furthermore, it offers a novel theoretical foundation and decision-making framework for coal mine safety management and accident prevention, thereby significantly enhancing the capacity to address the diverse safety challenges faced in coal mine production.

2. Materials and Methods

2.1. Dataset

To standardize the reporting and investigative processes for coal mine safety incidents, ensure the implementation of accountability measures for safety failures, and reduce the occurrence of production-related accidents, relevant government agencies have established comprehensive regulations governing the reporting and investigation of coal mine accidents, in accordance with applicable laws and regulations. As such, the formulation of each coal mine accident investigation report must follow a specified procedural framework to guarantee the authenticity, professionalism, objectivity, and authority of the findings. In comparison to other forms of accident-related data, investigation reports offer a more thorough and accurate representation of coal mine accidents. This rationale underpins the decision to select coal mine accident investigation reports as the primary source of data for this study. A total of 700 coal mine accident investigation reports were obtained from the official website of the National Mine Safety Administration, available at https://www.chinamine-safety.gov.cn/, and the respective provincial-level mine safety regulatory agencies. These reports encompass all accident levels and major accident types, providing a highly representative sample. They cover common safety hazards in coal mining, such as roof accidents, gas accidents, mechanical and electrical accidents, and transportation accidents, with accident levels spanning particularly major, major, significant, and general accidents. The reports are sourced from the National Mine Safety Administration and 25 provincial-level coal mine safety regulatory authorities, reflecting the safety conditions and management practices across different regions. However, given the greater number of coal mines and more frequent accidents in certain regions, regional imbalances may exist within the dataset.
Nevertheless, these reports span a long period, offering valuable insights into coal mine safety issues over time. Through comprehensive analysis of these data, this study provides a scientific foundation for coal mine safety risk assessments. Future research could expand the dataset further by incorporating coal mine accident data from additional regions and different mine types, thereby enhancing the generalizability of the conclusions.
Coal mine accidents can be categorized into four levels based on the number of fatalities or direct economic losses incurred: particularly major accidents, major accidents, significant accidents, and general accidents. In terms of fatalities, a particularly major accident is an incident resulting in 30 or more deaths; a major accident involves at least 10 but fewer than 30 deaths; a significant accident causes at least 3 but fewer than 10 deaths; and a general accident results in fewer than 3 deaths. Using this classification standard, the 700 coal mine accident investigation reports were analyzed, and the resulting distribution of accidents across these categories is shown in Table 2. Furthermore, Figure 2 illustrates the distribution of coal mine accident types within the 700 reports.
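The fatality thresholds above can be read as a simple lookup. The following is a minimal sketch, not the procedure used in the study: it assumes half-open intervals (a count of exactly 10 or exactly 30 deaths falls in the higher level), it ignores the direct-economic-loss criterion because the text gives no thresholds for it, and the function name `accident_level` is purely illustrative.

```python
def accident_level(fatalities: int) -> str:
    """Map a fatality count to one of the four accident levels.

    Assumption: boundaries are half-open, so 10 deaths is "major"
    and 30 deaths is "particularly major".
    """
    if fatalities < 0:
        raise ValueError("fatalities must be non-negative")
    if fatalities >= 30:
        return "particularly major"
    if fatalities >= 10:
        return "major"
    if fatalities >= 3:
        return "significant"
    return "general"


if __name__ == "__main__":
    for count in (0, 2, 3, 9, 10, 29, 30):
        print(count, accident_level(count))
```

Applying such a function to the fatality count recorded in each report would yield a per-level tally of the kind summarized in Table 2.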

2.2. Identification of Coal Mine Accident Risk Factors

Identifying and extracting risk factors that contribute to coal mine accidents from accident investigation reports is a critical step in research focused on coal mine accidents based on risk factors. Text mining, as the primary method for identifying and extracting these risk factors, offers significant advantages in efficiently processing and automatically analyzing large-scale textual data. It effectively uncovers latent risk patterns and associations. This method is particularly suited for handling complex and content-rich accident investigation reports, significantly enhancing both the efficiency and coverage of the analysis [82]. However, the effectiveness of text mining is highly dependent on data quality and algorithm precision. Insufficient data preprocessing or the presence of ambiguity and unstructured content in the text can severely limit the accuracy of the analysis. Furthermore, text mining has inherent limitations in processing complex contexts and subtle semantic differences, necessitating continuous optimization of algorithms to meet specific analytical needs [83]. Given the unstructured nature of text data and the complexity of language, traditional text mining methods face challenges in accurately understanding text semantics. To address this issue, this study introduces Large Language Models (LLMs) to identify and extract coal mine accident risk factors. LLMs are a class of artificial intelligence models trained on vast amounts of data, designed to understand and generate natural language text. They are widely applied in various natural language processing (NLP) tasks, including text generation, machine translation, sentiment analysis, and question answering systems [84,85,86]. Leveraging their advanced natural language processing capabilities, this study applies LLMs to identify coal mine accident risk factors, ensuring the accuracy and consistency of the risk factor extraction process.
Cross-disciplinary studies have demonstrated that LLM-based approaches offer notable advantages in text semantic understanding tasks. For example, in the legal domain, Wei et al. (2023) compared a fine-tuned DistilBERT model with a non-fine-tuned counterpart in legal text classification and found that the fine-tuned model achieved substantially higher F1 scores at both the document and segment levels, while also outperforming traditional logistic regression models in identifying implicit associations within legal terminology and other complex semantic structures [87]. In the medical domain, Peng et al. (2023) showed that their proposed GatorTronGPT achieved an F1 score improvement of 3% to 10% over traditional models (e.g., BioGPT) in drug–disease relation extraction tasks and that a model trained on synthetic clinical text, GatorTronS, also surpassed baseline models (such as ClinicalBERT) in clinical concept extraction tasks [88]. Moreover, Li et al. (2023) employed CNN and BERT models to analyze construction accident narratives, demonstrating that deep learning models attained significantly higher accuracy in accident classification tasks than traditional methods combining TF-IDF with SVM, and observed that LLMs (such as GPT) can effectively predict accident types and visualize key semantic features through transfer learning [89]. Given that coal mine accident investigation reports exhibit considerable structural and linguistic similarities to the legal, medical, and construction accident reports mentioned above, these findings provide both theoretical and empirical evidence supporting the inference that LLM-based approaches could likewise offer significant advantages in semantic text understanding within the domain of coal mine accidents. Future work could build on this foundation by directly comparing the performance of LLM-based methods with traditional text mining approaches specifically in coal mine accident investigations. 
Large Language Models (LLMs) can be applied using several strategies, each tailored to specific objectives. The prompting strategy, for instance, involves providing explicit textual input to guide the model in generating relevant information or text. This approach is widely employed in tasks such as text generation, machine translation, and question answering systems. Its key advantage lies in its simplicity and flexibility. However, there is an inherent risk that the generated output may diverge from the intended target [90,91,92]. A second strategy, fine-tuning, involves adapting a pretrained LLM to a specific task or dataset to optimize its performance in a particular context. While this strategy effectively enhances task-specific performance, it requires a substantial volume of labeled data and may lead to overfitting [93,94,95]. Another approach, zero-shot learning, enables LLMs to handle new tasks without specific training on them, demonstrating the model’s remarkable generalization capabilities. This strategy offers high flexibility and convenience, as it eliminates the need for task-specific training; however, its effectiveness may be limited when dealing with complex tasks [96,97]. Lastly, few-shot learning involves providing the model with a small number of examples to facilitate quick adaptation to new tasks, making it particularly useful in data-scarce scenarios. While this strategy is effective in situations where sample sizes are limited, its success largely depends on the quality of the examples provided [98,99]. The diversity and flexibility inherent in these strategies underscore the broad applicability and significant potential of LLMs across various tasks. Given the length and complexity of coal mine accident investigation reports, this study will employ the prompting strategy to apply LLMs. 
Logically coherent, clearly structured, and highly targeted prompt instructions guide the model to identify and extract the risk factors associated with coal mine accidents.
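To make the prompting strategy concrete, the sketch below assembles a prompt from an instruction, one worked example, and the report to be analyzed, and then parses a JSON answer. Everything in it is an assumption made for illustration: the instruction wording, the example report, the `direct`/`indirect` output keys, and the function names `build_prompt` and `parse_answer` are hypothetical, since the text does not reproduce the prompts actually used. No model is called; the string returned by `build_prompt` would be passed to whichever LLM interface is available.

```python
import json

# Hypothetical instruction and in-context example; the study's actual
# prompt wording is not given in the text.
INSTRUCTION = (
    "You are a coal mine safety analyst. Read the accident investigation "
    "report and list the risk factors it names. Answer with JSON only, using "
    'the keys "direct" and "indirect", each holding a list of short phrases.'
)

EXAMPLE_REPORT = (
    "Direct cause: workers entered an unsupported area and the roof collapsed. "
    "Indirect cause: the mine did not carry out safety training."
)
EXAMPLE_ANSWER = {
    "direct": ["roof collapse in unsupported area"],
    "indirect": ["missing safety training"],
}


def build_prompt(report_text: str) -> str:
    """Assemble instruction, one worked example, and the target report."""
    return "\n\n".join([
        INSTRUCTION,
        "Example report:\n" + EXAMPLE_REPORT,
        "Example answer:\n" + json.dumps(EXAMPLE_ANSWER, ensure_ascii=False),
        "Report:\n" + report_text.strip(),
        "Answer:",
    ])


def parse_answer(raw: str) -> dict:
    """Parse the model's JSON answer, tolerating a missing key."""
    data = json.loads(raw)
    return {key: [str(x) for x in data.get(key, [])]
            for key in ("direct", "indirect")}
```

Requesting a fixed JSON shape is one way to keep the extracted factors consistent across hundreds of reports, which matters because the downstream association rule mining treats each report as a set of discrete factor labels.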

2.3. Association Rule Mining

Association rule mining is a pivotal technique in data analysis, designed to uncover the underlying relationships between different attributes within a dataset [100,101]. Initially, it found applications in supermarket basket analysis [102,103,104], where it was used to identify latent patterns in consumer purchasing behavior. Association rules are typically expressed in the form of “if… then…”, consisting of an antecedent (condition) and a consequent (outcome). The core principle of association rule mining is to determine the likelihood that the consequent will occur given the presence of a specific antecedent within the dataset. In the process of association rule mining, two fundamental statistical measures are considered: support and confidence. Support indicates the frequency with which the association rule appears in the dataset, whereas confidence quantifies the reliability of the consequent occurring when the antecedent condition is met [105,106]. The mining process itself involves generating candidate itemsets, calculating the support of itemsets, constructing association rules, and evaluating the confidence of these rules. This technique has seen widespread application across various fields, including market basket analysis, recommender systems, and network traffic analysis [107,108,109].
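For reference, the two measures can be written in their standard form. The notation is introduced here and is not taken from the paper: T is the set of transactions (in this study, one transaction per accident report), A and B are disjoint sets of risk factors, and supp(X) is the fraction of transactions containing every item of X. The `\operatorname` command assumes the `amsmath` package.

```latex
\[
\operatorname{supp}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad
\operatorname{supp}(A \Rightarrow B) = \operatorname{supp}(A \cup B),
\qquad
\operatorname{conf}(A \Rightarrow B) = \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)}
\]
```

As a purely hypothetical illustration: if 140 of 700 reports mention factor A and 105 of them also mention factor B, the rule A ⇒ B has support 105/700 = 0.15 and confidence 105/140 = 0.75.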
Among the various association rule mining algorithms, the Apriori and FP-growth algorithms are the most widely adopted methods [110,111,112]. In recent years, there has been increasing scholarly interest in applying association rule mining techniques to the analysis of accident causation. By conducting in-depth mining of extensive accident datasets, this approach aims to uncover the latent factors and patterns that contribute to the occurrence of accidents. This method not only facilitates the identification of factors that are potentially associated with accidents under specific conditions but also offers new insights and strategies for accident prevention, management, and emergency response. This study utilizes the Apriori algorithm to investigate the hidden association rules between risk factors in coal mine accidents, with the goal of revealing the causal chains that lead to such incidents. The outcomes of this analysis provide a robust theoretical foundation for the development of Bayesian network models and for advancing risk assessment methodologies.
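The level-wise search that Apriori performs can be sketched in a few lines of plain Python. This is a minimal, unoptimized illustration under stated assumptions, not the implementation used in the study: the function names `apriori` and `strong_rules`, the toy transactions, and the thresholds (0.6 and 0.9) are all hypothetical, and the paper's actual support and confidence thresholds are not given in this section.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    current = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))}
    return frequent


def strong_rules(frequent, min_confidence):
    """List (antecedent, consequent, support, confidence) for strong rules."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[ante]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    found.append((ante, itemset - ante, sup, conf))
    return found


# Hypothetical toy data: each set holds the risk factors found in one report.
reports = [
    {"inadequate supervision", "procedure violation", "roof fall"},
    {"inadequate supervision", "procedure violation"},
    {"procedure violation", "roof fall"},
    {"inadequate supervision", "procedure violation", "roof fall"},
]
frequent = apriori(reports, min_support=0.6)
rules = strong_rules(frequent, min_confidence=0.9)
```

On this toy input the pair {inadequate supervision, roof fall} appears in only half of the reports and is discarded, so the three-item set is pruned without ever being counted. That pruning is the property that keeps Apriori tractable when the number of distinct risk factors is large.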

2.4. Bayesian Network

Bayesian networks, a type of probabilistic graphical model, represent the dependencies between random variables through a directed acyclic graph (DAG) structure [113,114,115]. Their flexibility and expressive power have made Bayesian networks indispensable in a wide range of disciplines, particularly for addressing complex systems and managing uncertainty [116,117,118]. In medical diagnostics, Bayesian networks synthesize patient symptoms, test results, and medical history, utilizing probabilistic inference to provide diagnostic support, thereby assisting healthcare providers in making more accurate and informed decisions [119,120,121]. In industrial and safety management, they are employed to assess potential safety risks, evaluate the likelihood of accidents, and identify their root causes, offering a scientific foundation for the development of more effective safety strategies and preventive measures [122,123]. In financial risk management, Bayesian networks model the interactions between market fluctuations, risk factors, and returns, providing a probabilistic framework for investment decision-making [124,125,126]. In natural language processing, they are widely applied in tasks such as semantic analysis, information retrieval, and text classification, helping to uncover latent information structures in textual data [127,128]. In the fields of ecology and environmental management, Bayesian networks are utilized to analyze the intricate relationships among various ecological factors and predict the potential impacts of environmental changes on ecosystems [129,130]. Beyond these, Bayesian networks also hold substantial promise in areas such as bioinformatics, cybersecurity, supply chain management, artificial intelligence, and machine learning [131,132,133]. 
This study applies the Bayesian network model to comprehensively examine the risk factors involved in coal mine accidents, explore their interdependencies, identify the critical pathways through which accidents occur, and uncover the underlying risk mechanisms. These insights provide both theoretical and practical guidance for the prevention and control of coal mine accidents.
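A small worked example may help show what inference and sensitivity analysis on such a network involve. The sketch below uses a hypothetical three-node chain (supervision deficiency S, procedure violation V, accident A) with invented conditional probabilities; none of the numbers or node names come from the study, whose network has 127 nodes. Inference is done by brute-force enumeration of the joint distribution, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical chain S -> V -> A; every variable is binary (1 = present).
ORDER = ["S", "V", "A"]
PARENTS = {"S": (), "V": ("S",), "A": ("V",)}
# CPT[node][parent values] = P(node = 1 | parents); values are invented.
CPT = {
    "S": {(): 0.3},
    "V": {(0,): 0.1, (1,): 0.6},
    "A": {(0,): 0.05, (1,): 0.4},
}


def joint(assign):
    """Joint probability of a full assignment via the DAG factorization."""
    p = 1.0
    for node in ORDER:
        parent_values = tuple(assign[q] for q in PARENTS[node])
        p_one = CPT[node][parent_values]
        p *= p_one if assign[node] == 1 else 1.0 - p_one
    return p


def prob(query, evidence=None):
    """P(query | evidence) by summing the joint over all assignments."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=len(ORDER)):
        assign = dict(zip(ORDER, values))
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = joint(assign)
        denominator += p
        if all(assign[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


# A simple sensitivity measure: how much does fixing S shift P(A = 1)?
baseline = prob({"A": 1})
shift = prob({"A": 1}, {"S": 1}) - prob({"A": 1}, {"S": 0})
```

With these invented numbers, the accident probability is 0.1375 at baseline, 0.26 when the supervision deficiency is present, and 0.085 when it is absent. Ranking nodes by the size of such shifts is one common way to identify influential factors; whether the study uses this exact measure is not stated here.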

2.5. Research Framework

This study develops a comprehensive framework for identifying, analyzing, and modeling risk factors in coal mine accidents. Its key components are risk factor extraction, classification, association rule mining, and Bayesian network analysis. First, based on coal mine accident investigation reports, a large language model (LLM), combined with prompting and few-shot learning, is employed to process textual data, extracting direct risk factors from explicitly identified direct causes and deriving composite (also termed comprehensive) and specific risk factors from indirect causes. Subsequently, the Apriori algorithm is applied to the extracted risk factors for association rule mining, identifying strong association patterns among different risk factors. Extreme frequent itemsets (EFIs) are leveraged to identify and filter key highly associated risk factors. Based on the extracted strong association rules, a Bayesian network is constructed to more precisely characterize the dependency relationships among risk factors. The network’s conditional probability distributions are then estimated through parameter learning. Sensitivity analysis, critical path analysis, and high-frequency feature statistics are subsequently conducted to further identify the most influential risk factors and their interdependencies in accident occurrences. The proposed research framework systematically identifies coal mine accident risk factors and elucidates their interconnections. By integrating large language models with association rule mining, this framework efficiently extracts key risk factors from accident investigation reports and uncovers their underlying association patterns. Additionally, the Bayesian network modeling approach quantifies causal relationships among risk factors, establishing a scientific basis for risk prediction and decision-making. 
This framework advances the automation of accident risk analysis while providing more precise and data-driven risk assessments, thereby supporting more effective coal mine safety management and control strategies. The proposed research framework is illustrated in Figure 3.

3. Results and Analysis

3.1. Identification Results of Risk Factors

This study employs large language models (LLMs) to extract risk factors from accident investigation reports, driven by the following three primary considerations:
(1)
Transfer Learning Advantages. LLMs are deep neural networks designed to process natural language and generate coherent text. Trained on vast text corpora (e.g., Wikipedia, books, news articles, and web pages), they have demonstrated outstanding performance across various natural language processing (NLP) tasks. The application of transfer learning allows these models to rapidly adapt to new tasks, such as risk factor identification, significantly enhancing both training efficiency and accuracy [134,135]. Through transfer learning, pretrained models are fine-tuned to meet specific task requirements, thus circumventing the complexities and resource demands of training from scratch. This approach not only optimizes model deployment speed and performance but also substantially boosts research and practical application efficiency.
(2)
Deep Semantic Understanding. LLMs, particularly those based on multi-layer Transformer architectures, are equipped to grasp and analyze the deep semantic structures and latent information embedded in the text [136,137]. This advanced capability enables these models to accurately extract risk factors that may not be immediately apparent from the complex structures of accident reports. Deep semantic understanding is particularly critical when processing large volumes of unstructured text data, providing robust technical support for the precise identification of risk factors.
(3)
Entity Relationship Extraction. LLMs (e.g., GPT and BERT), having been pretrained on extensive textual data, acquire sophisticated linguistic patterns and are fine-tuned for specific tasks such as entity relationship extraction. By leveraging attention mechanisms, these models focus on pivotal information within the text, using contextual understanding to identify and deconstruct complex relationships [138,139]. As a result, LLMs excel in extracting and interpreting entities and their interrelationships within intricate textual data. This capacity is particularly valuable for analyzing causal relationships, thereby enhancing our understanding of the interactions between risk factors in coal mine accidents.
Given the inherent capabilities of LLMs, utilizing them to extract risk factors from coal mine accident investigation reports presents a highly effective and justified approach. Drawing on their formidable transfer learning capacity, deep semantic comprehension, and efficient entity relationship extraction, LLMs are well suited to process and analyze complex accident reports, providing essential support for the precise identification and extraction of potential risk factors. To further optimize the efficiency of LLM deployment for text analysis, this study utilizes an API-based approach for model invocation.

3.1.1. Prompt Design

Prompting is a technique that involves meticulously crafting input text to guide a pretrained large language model (such as GPT-3) in generating the desired output [140,141]. This method has demonstrated remarkable efficacy and wide applicability across a variety of domains, including information extraction, text generation, and question answering systems [142,143]. When utilizing large language models to extract risk factors from coal mine accident investigation reports, the design of effective prompts is crucial. By strategically formulating input prompts to mirror human reasoning processes, the performance and efficiency of large language models in tasks involving logical reasoning, computational analysis, and decision support can be significantly enhanced, leading to clearer and more interpretable output.
First and foremost, the design of input prompts must be specific, unambiguous, and precise, as this is essential for enabling the model to accurately identify and extract key information, such as deficiencies in safety management and instances of employee misconduct. A well-crafted prompt not only enhances the relevance and accuracy of the model’s outputs but also bolsters its effectiveness in real-world applications. By optimizing the design of input prompts, the overall performance of the model can be significantly improved, making it more applicable and reliable in complex, dynamic environments. Moreover, adopting a chain-of-thought approach plays a crucial role in augmenting the efficacy of prompt design. Chain-of-thought, or causal reasoning, is a cognitive process that systematically links information points through logical sequences [144,145]. In the context of prompt design, utilizing this approach ensures that each step builds on the result of the previous one, thus forming a coherent and structured logical progression. This method enhances the systematicity and effectiveness of the prompt design, ensuring that each element of the prompt is functionally interconnected. Coal mine accident investigation reports typically encompass various sections, including the company profile, accident location, sequence of events, causes of the accident, and accountability. The risk factors influencing coal mine accidents are predominantly found in the section on accident causes, which are further categorized into direct and indirect causes. Direct causes refer to the specific events or actions that directly triggered the accident, while indirect causes are more concerned with the underlying risk factors that contributed to the occurrence. 
Given the distinct ways in which direct and indirect causes are described in accident reports, this study develops separate prompts for extracting information related to direct causes and to indirect causes, ensuring that each set of risk factors is addressed appropriately.
To ensure that the prompt design meets the highest standards and is practically viable, this study conducted rigorous validation across several state-of-the-art large language models, including Kimi, ChatGPT, and Tongyi Qianwen [146,147,148]. Identical prompts were applied across all models to systematically evaluate their performance in recognizing and extracting key information under different conditions. The results revealed a high level of consistency and strong effectiveness across all tested models, providing compelling evidence for the applicability and validity of the proposed prompt design. This outcome not only further affirms the reliability of the approach but also establishes a solid theoretical and practical foundation for its broader application in future contexts.

3.1.2. Direct Risk Factor Identification Results

This study utilizes large language models, with carefully crafted prompts, to extract direct risk factors from the direct causes of accidents. By conducting an in-depth analysis of coal mine accident reports and integrating the field’s systematic definitions of direct risk factors, we designed a specialized set of prompts for risk factor extraction. These prompts ensure that the model can accurately identify and extract critical factors. The direct causes encompass the triggering events leading to accidents, which include human behavior, equipment and technical failures, environmental changes, and managerial negligence. When designing text prompts to extract risk factors from the direct causes, particular emphasis is placed on human behavior-related factors such as violations of safety protocols or improper operations. To ensure that the large language model accurately identifies and extracts risk factors from the direct causes, while avoiding erroneous inferences or excessive generalization, the prompts are exclusively focused on human behavior-related events. Other types of accident triggers are categorized under indirect causes. To further enhance the model’s recognition performance, this study integrates a few-shot learning approach. By providing specific examples of risk factor extraction (including accident text and corresponding annotations) within the prompt, the model is guided toward more accurate extraction. The prompts direct the model’s attention to specific tasks, thereby improving precision, while few-shot learning enables the model to perform effectively despite limited annotated data. The synergy of these two strategies not only strengthens the model’s reasoning capabilities but also enhances its adaptability in complex scenarios, improving both its generalization ability and its practical application in real-world contexts. 
The process is illustrated in Figure 4, and the results of direct risk factor extraction are presented in Table 3.
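The combination of a task instruction with a few annotated examples can be sketched as follows. This is only an illustrative sketch: the instruction wording, the `FewShotExample` type, the `build_direct_cause_prompt` helper, and the sample texts are all hypothetical and are not the prompts shown in Figure 4.

```python
from dataclasses import dataclass


@dataclass
class FewShotExample:
    cause_text: str          # excerpt from a report's "direct cause" section
    risk_factors: list[str]  # manually annotated factors for that excerpt


def build_direct_cause_prompt(examples, report_text):
    """Assemble instruction + annotated examples + the new excerpt to analyze."""
    parts = [
        # Hypothetical instruction wording, not the authors' prompt.
        "Task: extract the direct risk factors from the direct cause of a "
        "coal mine accident. List only human behavior-related events that "
        "are explicitly stated; do not infer unstated factors.",
    ]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\n"
            f"Direct cause: {ex.cause_text}\n"
            f"Risk factors: {'; '.join(ex.risk_factors)}"
        )
    # The final block is left open so the model completes the factor list.
    parts.append(f"Direct cause: {report_text}\nRisk factors:")
    return "\n\n".join(parts)


# Invented sample texts, for illustration only.
examples = [
    FewShotExample(
        "A worker entered the unsupported roof area in violation of procedures.",
        ["entering an unsupported area", "violation of operating procedures"],
    )
]
prompt = build_direct_cause_prompt(
    examples, "Blasting was carried out without a prior gas check."
)
```

The resulting string would then be sent to whichever model API is in use; that call is deliberately omitted here because the text does not specify it.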
The extracted direct risk factors primarily encompass irregular operational behaviors, insufficient implementation of safety measures, deficiencies in on-site management, and violations of technical specifications. First, non-compliance with operational procedures by workers, unauthorized actions, and failure to implement essential safety precautions were identified as direct triggers of the accidents. These deviations from standard practices were pivotal in increasing the likelihood of incidents. Second, inadequate enforcement of safety measures, such as failing to reinforce supports as mandated or proceeding with excavation without addressing potential hazards, further amplified the operational risks. In terms of on-site management, the prevalence of unauthorized commands and illegal production practices resulted in a breakdown of safety oversight. Additionally, violations of technical standards, including insufficient support strength and disorganized ventilation systems, contributed to escalating the accident risks. Together, these factors represent the primary sources of direct risks in coal mine accidents, underscoring critical vulnerabilities in operational procedures and safety management.

3.1.3. Indirect Risk Factor Identification Results

The identification and extraction of both composite and specific risk factors must be conducted from the indirect causes of coal mine accidents. The indirect causes section uncovers the deeper, underlying reasons for the triggering events of the accidents, such as deficiencies in site management, inadequate safety training, and the failure to effectively implement corporate responsibilities. These indirect causes are typically presented in a list format in coal mine accident investigation reports, with each cause being distinct and independent. When risk factors arise from the same indirect cause, a high degree of interrelationship exists among them. Each indirect cause typically encompasses one composite risk factor and several specific risk factors, with the latter representing concrete manifestations of the former. Therefore, when designing the text prompts for the identification and extraction of risk factors from the indirect causes, each cause should be addressed individually. The extraction process should follow a methodical approach, considering the hierarchical structure and causal relationships of the risk factors. The extraction procedures for both composite and specific risk factors are aligned with those for direct risk factors, employing a combined strategy of text prompts and few-shot learning. Figure 5 and Figure 6 present the text prompts used for extracting composite and specific risk factors, and the extraction results are displayed in Table 4 and Table 5, with the extracted risk factors highlighted in yellow.
Based on the extraction results, the composite risk factors predominantly span several critical areas, including management, technical operations, safety education and training, and safety supervision, along with the implementation of responsibilities. First, risks associated with management highlight significant deficiencies in the safety management systems and their enforcement within coal mining enterprises. These include violations of operational procedures and negligence in safety practices, underscoring the enterprise’s insufficient emphasis on safety management, which creates substantial gaps in the execution of safety protocols. Second, the risks within technical operations are primarily linked to inadequate operating procedures and neglect in equipment management. The lack of effective standards and controls not only amplifies safety hazards but also severely diminishes the enterprise’s capacity to anticipate and mitigate potential risks during production. Risks concerning safety education and training primarily stem from weak safety awareness among workers and insufficient training programs. This results in a lack of essential skills for risk identification and emergency response, thereby further increasing the likelihood of accidents. Finally, risks in safety supervision and the implementation of responsibilities expose deeper, systemic issues in the enterprise’s safety oversight, responsibility systems, and institutional enforcement. These include ineffective supervision and inadequately implemented accountability structures, reflecting the enterprise’s failure to fully discharge its safety obligations and resulting in lapses in safety management. These composite risk factors are deeply interrelated and represent fundamental weaknesses within the safety management systems of coal mining enterprises. They indicate that accidents are rarely attributable to a single cause but rather emerge as the inevitable result of multiple, interconnected factors. 
Thus, the accurate identification and effective management of these composite risks are essential not only for preventing accidents but also for enhancing the overall safety management system and ensuring production safety.
Based on the extraction results, the specific risk factors primarily encompass the following aspects. First, the inadequate implementation of safety management is a significant issue, evident in the failure to strictly adhere to on-site safety protocols, the delayed correction of unsafe practices, the neglect of safety inspection responsibilities, and disorganized on-site management. These issues expose critical deficiencies in the enforcement of safety management systems and the overall robustness of the management framework within coal mining enterprises. Second, the deficiency in safety education and training emerges as a pressing concern. This is particularly reflected in the weak safety awareness of workers, insufficient risk identification capabilities, and the superficial nature of training programs, which highlight the enterprise’s considerable shortcomings in enhancing employees’ safety knowledge. As a result, employees lack the necessary knowledge and emergency response skills to adequately address potential risks. Third, risks in technical operations primarily center around inadequate support measures, non-compliance with safety technical specifications, and violations of operational procedures. These issues point to significant weaknesses in the management of technical operations and the absence of effective safety standards, further exacerbating safety hazards during production. Fourth, the failure to enforce safety supervision and responsibility is evident in the lack of clarity around safety responsibilities, the formalized nature of safety inspections, and the inadequate performance of both higher-level companies and local regulatory bodies. These factors underscore structural deficiencies in the enforcement of safety production responsibilities and regulatory frameworks, rendering safety management systems largely ineffective. 
Lastly, deficiencies in emergency management and production organization are reflected in the failure to promptly identify and eliminate safety hazards, insufficient emergency response equipment, and poor coordination of production activities. These issues collectively heighten the potential for accidents, further amplifying the risks to safety.
In conclusion, the risk factors extracted in this study offer a profound reflection of the management deficiencies and safety vulnerabilities present at multiple levels within coal mining enterprises. These issues not only highlight gaps in day-to-day management but also emphasize the disjointedness and inefficiencies in the implementation of safety measures, which in turn increase the likelihood of accidents. To verify the accuracy and validity of the extracted risk factors, this study extracts risk factors using multiple large language models to verify consistency across models, thereby validating the effectiveness of the designed prompts in risk factor extraction. Simultaneously, the results are compared with manually extracted risk factors to provide additional assurance of the accuracy and effectiveness of the extraction process. These comparative analyses demonstrate the feasibility and reliability of the proposed extraction method in practice. A thorough analysis and effective response to these overarching risk factors are essential for enhancing the safety management standards of coal mining enterprises, ensuring operational safety, and mitigating the occurrence of accidents. Therefore, addressing these issues through targeted corrective actions is pivotal to achieving sustainable safety management and establishing a robust safety framework within coal mining enterprises.
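The text does not state which agreement measures were used for the cross-model and manual comparisons. As one possible way to quantify them, the sketch below assumes set-based measures: Jaccard similarity between the factor sets returned by two models, and precision and recall against a manually extracted reference set. The function names and factor codes are illustrative assumptions.

```python
def jaccard(a, b):
    """Overlap between two extracted factor sets (1.0 means identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def precision_recall(predicted, reference):
    """Compare model output with a manually extracted reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical factor codes for a single report.
model_a = {"S5", "C2"}
model_b = {"S5", "C2", "S14"}
manual = {"S5", "C2", "S14"}

agreement = jaccard(model_a, model_b)
precision, recall = precision_recall(model_a, manual)
```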

3.2. Association Rule Mining Analysis

The occurrence of coal mine accidents is typically the result of the interaction of multiple risk factors. By identifying and mitigating the key risk factors and their propagation pathways, the frequency of such accidents can be significantly reduced. Accordingly, this study aims to identify and analyze critical risk factors based on those extracted by a large language model for coal mine accidents. The Apriori algorithm is employed to mine association rules from the Boolean dataset of coal mine accident risk factors, enabling the discovery of highly frequent itemsets and strong associations between risk factors. This analysis will provide a solid foundation for the identification and analysis of key risk factors, as well as for the construction of the Bayesian network topology.
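A Boolean dataset of the kind mentioned above can be sketched as a one-row-per-report, one-column-per-factor table. The helper name and the factor codes below (including the direct-factor code `D1`) are assumptions made for illustration.

```python
def to_boolean_rows(reports):
    """Turn per-report factor sets into a 0/1 matrix.

    reports: list of sets of factor codes, one set per accident report.
    Returns the sorted column codes and one 0/1 row per report.
    """
    codes = sorted({code for report in reports for code in report})
    rows = [[1 if code in report else 0 for code in codes] for report in reports]
    return codes, rows


# Hypothetical extraction results for three reports.
reports = [{"D1", "C2", "S5"}, {"C2", "S14"}, {"D1", "S5"}]
codes, rows = to_boolean_rows(reports)
```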
When applying the Apriori algorithm for association rule mining, it is crucial to establish two key variables—support and confidence—whose threshold values directly influence the results of the mining process. The current academic literature on association rule mining reveals that there is no universally accepted standard for determining the minimum support and confidence thresholds. To strike a balance between minimizing the generation of irrelevant rules and ensuring the inclusion of significant associations, this study employs an iterative trial-and-error method to determine the optimal threshold combinations. After extensive testing and incorporating feedback from experts and coal mine management, the identified strong association rules were compared and analyzed against the operational rules and practices of coal mine production. Ultimately, the minimum support threshold was set at 0.5, the minimum confidence threshold at 0.4, the maximum number of antecedents was limited to 3, and the minimum lift threshold was set to 1. Through association rule mining, a set of highly frequent itemsets (see Table 6) and 281 strong association rules were derived (some of the results are shown in Table 7). The risk factors within the highly frequent itemsets indicate that these factors consistently appear in coal mine accidents and are key contributors to their occurrence. As seen in Table 6, the number of direct and comprehensive risk factors is relatively small, while specific risk factors are more numerous, underscoring the dominant role that specific risk factors play in triggering coal mine accidents. Furthermore, an analysis of the strong association rules shows that these rules are consistent with the established safety management protocols and procedures in coal mine operations, thus providing an accurate representation of the causal relationships between risk factors. 
For instance, the first association rule in Table 7 involves specific risk factors {S17, S5, S29, S14} and a comprehensive risk factor {C2}. These four specific risk factors all point to the inadequate management of on-site safety risks, while the comprehensive risk factor C2 represents “insufficient implementation of on-site safety measures”. This rule suggests that when on-site safety risks are not effectively addressed in a timely manner, they are strongly associated with the inadequate execution of safety measures, which, in turn, significantly heightens the likelihood of an accident occurring.
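To make the roles of support, confidence, and lift concrete, the following is a minimal pure-Python sketch of level-wise (Apriori-style) mining. It is not the implementation used in the study; the function names and the toy transactions are assumptions, while the default thresholds mirror the values reported above (support 0.5, confidence 0.4, lift 1, at most three antecedents).

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support=0.5):
    """Level-wise search: a k-itemset is only a candidate if all of its
    (k-1)-subsets were frequent at the previous level."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]
    while level:
        kept = {}
        for s in level:
            sup = support(s, transactions)
            if sup >= min_support:
                kept[s] = sup
        frequent.update(kept)
        k = len(next(iter(level))) + 1
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(sub) in kept for sub in combinations(c, k - 1))
        ]
    return frequent


def strong_rules(frequent, min_conf=0.4, min_lift=1.0, max_antecedents=3):
    """Split each frequent itemset into antecedent -> consequent rules."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, min(len(itemset), max_antecedents + 1)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                cons = itemset - ante
                conf = sup / frequent[ante]   # P(consequent | antecedent)
                lift = conf / frequent[cons]  # ratio of confidence to P(consequent)
                if conf >= min_conf and lift >= min_lift:
                    found.append((ante, cons, sup, conf, lift))
    return found


# Toy transactions with hypothetical factor codes.
transactions = [{"S5", "C2"}, {"S5", "C2", "S14"}, {"S5", "C2"}, {"S14"}]
frequent = apriori(transactions)
rules_found = strong_rules(frequent)
```

In this toy data, S5 and C2 co-occur in three of four transactions, so the pair is frequent and yields rules in both directions with lift above 1, whereas pairs involving S14 fall below the support threshold.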
Based on the results of association rule mining on coal mine accident risk factors, a Bayesian network topology was constructed. In this network, the antecedents and consequents of strong association rules were used as nodes, while their relationships were represented by directed edges [149,150]. To ensure the structural validity of the Bayesian network, the network design strictly adhered to the directed acyclic graph (DAG) property, preventing the formation of directed cycles and ensuring the correct propagation of information and the effective computation of probabilities in Bayesian inference. This study employs a strong association rule-based approach as an initial structural learning method for the Bayesian network [151,152]. Its primary advantage lies in its ability to rapidly extract frequent dependencies among risk factors, thereby improving modeling efficiency and providing an intuitive initial network topology. However, since association rules identify correlations between variables based on co-occurrence frequency, they do not inherently imply causal relationships, which may introduce spurious associations. To address this issue, after the initial network construction, the structure was refined using causal constraints, theoretical analysis, and expert knowledge to enhance the reliability of causal inference [153]. Compared to score-based learning approaches, such as those using the Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC), the strong association rule-based approach offers greater computational efficiency and interpretability, making it particularly suitable for large-scale datasets and scenarios where expert knowledge is limited. However, score-based approaches optimize the network structure globally, ensuring the best model fit while mitigating the influence of spurious associations. Nonetheless, these methods entail high computational complexity, making them less efficient for large datasets. 
To achieve a balance between efficiency and accuracy, this study integrates the strengths of both approaches: strong association rules are first used to identify highly correlated variables, followed by causal constraint optimization to ensure that the final Bayesian network structure aligns with both data characteristics and the theoretical framework and practical standards of coal mine safety management.
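The step from rules to an acyclic topology can be sketched as follows. This sketch assumes that each rule contributes an edge from every antecedent item to every consequent item and that an edge is set aside whenever it would close a directed cycle. In the study, the final structure was refined with causal constraints and expert knowledge, so the `rejected` list here only stands in for candidates that would need such review. All names and codes are hypothetical.

```python
def creates_cycle(edges, src, dst):
    """True if dst already reaches src, so adding src -> dst closes a cycle."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return False


def rules_to_dag(rules):
    """rules: iterable of (antecedents, consequents), in priority order.

    Returns an adjacency dict {parent: {children}} plus the edges that were
    skipped because they would have violated the DAG property.
    """
    edges, rejected = {}, []
    for antecedents, consequents in rules:
        for a in antecedents:
            for c in consequents:
                if a == c or creates_cycle(edges, a, c):
                    rejected.append((a, c))
                else:
                    edges.setdefault(a, set()).add(c)
    return edges, rejected


# Hypothetical rules; the third one would close the cycle S5 -> C2 -> D1 -> S5.
rules = [(("S5",), ("C2",)), (("C2",), ("D1",)), (("D1",), ("S5",))]
edges, rejected = rules_to_dag(rules)
```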
Following the initial construction of the Bayesian network topology, the causal relationships within the network were systematically evaluated and refined based on theoretical analysis and domain knowledge. This process ensured that the network structure was not only consistent with the results of data-driven association rule mining but also aligned with established theoretical frameworks and industry standards, thereby enhancing its accuracy and reliability. Specifically, association rule mining was employed to extract potential causal relationships, which were then subjected to structural validation and causal constraint refinement based on theoretical knowledge and industry standards in the field of coal mine safety. The evaluation criteria primarily included theoretical consistency, causality direction correctness, and the adequacy of empirical data support. The evaluation process focused on three key aspects: First, ensuring that the directionality of causal relationships adheres to established theoretical principles to prevent erroneous causal inference; second, identifying and eliminating spurious correlations, meaning that if an observed association between two variables is driven solely by a third confounding factor rather than a true causal relationship, the connection should be adjusted or removed; and third, improving the interpretability and practical applicability of the network, ensuring that the model not only captures meaningful probabilistic dependencies but also supports evidence-based decision-making in coal mine safety management. Ultimately, the optimized Bayesian network topology (as depicted in Figure 7) represents risk factors as nodes, with directed edges illustrating the probabilistic dependencies and causal relationships between these factors. 
This refined network structure reduces potential errors in causal inference, enhances the model’s stability and interpretability, and provides a robust scientific foundation for further analysis of key risk factors in coal mine accidents using Bayesian network methodology.

3.3. Main Risk Factor Analysis of Coal Mine Accidents

The construction and analysis of the Bayesian network for coal mine accident risk factors aim to quantitatively identify the critical risk factors contributing to coal mine accidents, thereby facilitating the development of more targeted accident prevention strategies. In this study, the Bayesian network is constructed and analyzed using R, based on the previously established network topology. Parameter learning is carried out through maximum likelihood estimation. The data used for parameter learning are derived from the risk factor identification results produced by the large language model. The results are processed with Python (v. 3.10.9) and converted into the format required for parameter learning. The binary values (0/1) are mapped to “F/T,” where “0” represents the non-occurrence of a particular risk factor (F) and “1” represents its occurrence (T). This conversion prepares the data for Bayesian network parameter learning. Subsequent analyses include sensitivity analysis, critical path analysis, and frequency statistics to systematically identify and summarize the key risk factors influencing coal mine accidents.
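The 0/1 to F/T mapping and the idea behind maximum likelihood estimation of a conditional probability table can be illustrated in a few lines. The study performs parameter learning in R; the Python below is a swapped-in illustration of the counting principle only, with hypothetical function names and toy data.

```python
from collections import Counter


def to_ft(rows):
    """Map binary indicators to the labels used for parameter learning."""
    return [{k: "T" if v == 1 else "F" for k, v in row.items()} for row in rows]


def mle_cpt(records, child, parents):
    """Maximum likelihood estimate of P(child | parents):
    count(child value, parent configuration) / count(parent configuration)."""
    joint, marginal = Counter(), Counter()
    for record in records:
        config = tuple(record[p] for p in parents)
        joint[(config, record[child])] += 1
        marginal[config] += 1
    return {(config, value): n / marginal[config]
            for (config, value), n in joint.items()}


# Toy data with hypothetical factor codes: S5 is the parent of C2.
rows = [
    {"S5": 1, "C2": 1},
    {"S5": 1, "C2": 1},
    {"S5": 1, "C2": 0},
    {"S5": 0, "C2": 0},
]
records = to_ft(rows)
cpt = mle_cpt(records, child="C2", parents=["S5"])
```

Note that parent configurations never observed in the data receive no entry under plain maximum likelihood, which is one reason sparse configurations need care when many parents are involved.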

3.3.1. Sensitivity Analysis

Sensitivity analysis is a vital tool in Bayesian networks, employed to quantify the influence of uncertain factors in input nodes on the target variable. In this study, R is utilized to conduct a sensitivity analysis of the nodes within the Bayesian network to identify the key risk factors. The application of sensitivity analysis to coal mine accident risk factors facilitates the identification of those factors that significantly affect the occurrence of coal mine accidents. The first step in sensitivity analysis is to define the target node. In this research, the average degree and the upper quartile of all nodes in the Bayesian network are computed to identify nodes with higher degrees, which are then regarded as important nodes. Through this process, 43 critical nodes were selected (see Figure 8), and these selected nodes are highlighted in red. Each of these 43 nodes is treated sequentially as the target node, with the remaining nodes acting as evidence nodes for sensitivity analysis. In each iteration, only the evidence node with the greatest influence on the target node is retained, and the sensitivity analysis threshold is set to 0.5. The results of the analysis are presented in Figure 9, where red nodes represent evidence nodes (those with the greatest impact on the target node), and blue nodes represent the target nodes.
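The degree-based selection of important nodes can be sketched as below. The exact selection rule is not spelled out in the text, so this sketch assumes that a node is kept when its degree (in-degree plus out-degree) is at least the larger of the mean degree and the upper quartile. The helper names and the toy edge list are hypothetical.

```python
from statistics import mean, quantiles


def node_degrees(edges):
    """Total degree (in + out) of each node in a directed edge list."""
    degree = {}
    for src, dst in edges:
        degree[src] = degree.get(src, 0) + 1
        degree[dst] = degree.get(dst, 0) + 1
    return degree


def high_degree_nodes(edges):
    """Return nodes whose degree reaches max(mean, upper quartile)."""
    degree = node_degrees(edges)
    values = list(degree.values())
    avg = mean(values)
    q3 = quantiles(values, n=4, method="inclusive")[2]  # upper quartile
    cutoff = max(avg, q3)  # assumed rule; the source does not specify it
    selected = sorted(n for n, d in degree.items() if d >= cutoff)
    return selected, avg, q3


# Toy network: A has degree 3, B and C have degree 2, D has degree 1.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
selected, avg, q3 = high_degree_nodes(toy_edges)
```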
From the analysis, it is apparent that different evidence nodes exert significantly different influences on the target node. Further examination reveals that some nodes function both as target and evidence nodes, signifying that their influence within the network extends beyond merely being affected by other nodes; they also influence other nodes. Based on this observation, a sensitivity analysis chain is constructed. In this process, the most influential evidence node identified in the previous analysis is designated as the target node, and the search for the next most influential evidence node continues recursively until no further nodes meet the required conditions. The sensitivity analysis threshold remains at 0.5. To ensure the depth and robustness of the analysis, the number of nodes in the sensitivity chain must be at least four. The aim of sensitivity chain analysis is to progressively uncover the influence of the original target node within the network by identifying evidence nodes with the greatest impact on the target node. In this process, only the evidence node with the greatest influence on the target node is retained, which then becomes the new target node for subsequent recursive analysis. This iterative process results in multiple sensitivity chains, as shown in Figure 10, with the red nodes representing the initial target nodes. Upon reviewing the sensitivity chains, it becomes clear that although the initial target nodes vary across the chains, certain nodes appear in multiple chains. Furthermore, the sensitivity analysis results for these common nodes are not consistent. To ensure the optimality of the sensitivity analysis results for all nodes in the chain, the chains are merged. The merging process is depicted in Figure 11. Starting from the initial target node of each chain, the most influential preceding node is identified and treated as the new starting point. 
The search for the most influential preceding node continues until no more nodes meet the required criteria. Similarly, starting from the initial target node, the most influential succeeding node is identified across all chains, with the process continuing until no further qualifying nodes are found. The final sensitivity chain analysis results are shown in Figure 12, where red nodes represent the initial target nodes of each chain. Based on this analysis, 13 high-sensitivity risk factors were identified: S68, S36, C35, S60, S23, S30, S43, S48, S29, S38, S44, S53, and S55. The sensitivity analysis results illuminate the complexity of the mechanisms underlying coal mine accidents. These high-sensitivity risk factors have a substantial impact on the occurrence of coal mine accidents. To effectively reduce the frequency of such accidents, coal mine enterprises should prioritize managing and mitigating the safety risks associated with these high-sensitivity factors.
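The recursive construction of a sensitivity chain can be sketched as follows, assuming the pairwise influence values have already been computed and stored as `influence[target][evidence]`. The numbers are invented. Applying the 0.5 threshold as "greater than or equal to" and skipping nodes already in the chain (to guarantee termination) are assumptions of this sketch, and the subsequent merging of chains is not covered.

```python
def sensitivity_chain(influence, start, threshold=0.5):
    """Follow the most influential evidence node from each target in turn."""
    chain = [start]
    target = start
    while True:
        candidates = {
            evidence: value
            for evidence, value in influence.get(target, {}).items()
            if value >= threshold and evidence not in chain
        }
        if not candidates:
            return chain
        target = max(candidates, key=candidates.get)  # keep only the strongest
        chain.append(target)


def build_chains(influence, targets, threshold=0.5, min_nodes=4):
    """Keep only chains with at least min_nodes nodes, as required above."""
    chains = [sensitivity_chain(influence, t, threshold) for t in targets]
    return [c for c in chains if len(c) >= min_nodes]


# Hypothetical influence values between hypothetical nodes.
influence = {
    "A": {"B": 0.8, "C": 0.6},
    "B": {"C": 0.7, "A": 0.9},
    "C": {"D": 0.55},
    "D": {"E": 0.3},
    "X": {"Y": 0.9},
}
chains = build_chains(influence, targets=["A", "X"])
```

Here the chain starting at A grows to four nodes and is retained, while the chain starting at X stops after two nodes and is discarded.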

3.3.2. Critical Path Analysis

In this study, risk factors are categorized into three types: direct, comprehensive, and specific. Direct risk factors are closely associated with human behavior and have the potential to directly trigger coal mine accidents. They include operational errors, violations, and inappropriate responses during emergencies. Typically, they arise from the decisions and actions of on-site personnel and have a direct causal relationship with the occurrence of accidents. Comprehensive risk factors, in contrast, are systemic or managerial issues that indirectly increase the likelihood of accidents by influencing the quality of corporate management, the implementation of safety protocols, and the enforcement of oversight mechanisms. They include gaps in management systems, inadequate training, insufficient supervision, and resource shortages. Although these factors may not directly initiate accidents, they significantly amplify the severity of other risk factors. Specific risk factors are potential hazards that might lead to accidents under particular environmental or situational conditions and are often linked to specific workflows or locations. They include hazards within the work environment, the risk of equipment or process failures under certain conditions, and time-dependent risks. While these factors may not directly cause accidents, they can significantly raise the likelihood of an accident when those conditions are met.
Given the characteristics of these three categories, this study uses comprehensive risk factors as the target nodes in critical path analysis, based on causal relationships. Direct risk factors often serve as direct triggers of accidents, but their impact is typically short-term and immediate. They are therefore not well suited to reflecting systemic issues and are excluded from serving as target nodes. Specific risk factors can highlight hazards in particular environments or circumstances, but they tend to be localized and condition-dependent, which limits their applicability across contexts, so they are likewise unsuitable as primary target nodes. Comprehensive risk factors, by contrast, are systemic and span multiple levels, such as management systems, oversight structures, and safety investments. They not only indirectly increase the likelihood of accidents by influencing other risk factors but also provide insight into the organization’s overall safety management capabilities, including the accumulation of long-term risks. This broader and more integrated perspective is critical for uncovering the deeper managerial issues and systemic risks that underpin accident occurrences.
Using Bayesian network diagnostic inference for critical path analysis, we first identify, among all comprehensive risk factor nodes, the parent node with the highest posterior probability, which indicates that this factor significantly increases the likelihood of coal mine accidents. This parent node is then treated as the new evidence node, and forward and backward inference is carried out to identify, in turn, its parent node with the highest posterior probability. Each parent node identified in this way represents the primary driver of the node that precedes it. The process is repeated until no additional nodes can be inferred, resulting in the critical path shown in Figure 13. The nodes on this path can be regarded as critical risk factors for coal mine accidents.
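As a rough illustration of the tracing step, the sketch below walks from a starting node to the parent with the highest posterior probability until no parent remains, and then reports the nodes shared by paths from different starting points. The node names and the posterior values are hypothetical. In practice the posteriors would come from inference on the fitted Bayesian network rather than from a fixed lookup table.

```python
from typing import Dict, List

# Hypothetical posteriors: POSTERIOR[child][parent] = P(parent = T | child = T).
# The values are invented and stand in for results of network inference.
POSTERIOR: Dict[str, Dict[str, float]] = {
    "X1": {"M1": 0.71, "M2": 0.64},
    "X2": {"M2": 0.69, "M3": 0.52},
    "M1": {"R1": 0.80},
    "M2": {"R1": 0.77, "R2": 0.41},
    "M3": {"R2": 0.66},
    "R1": {"R0": 0.90},
}


def trace_path(start: str, posterior: Dict[str, Dict[str, float]]) -> List[str]:
    """Repeatedly move to the parent with the highest posterior probability."""
    path = [start]
    current = start
    while posterior.get(current):
        # Ignore parents already on the path so the walk cannot loop.
        parents = {p: v for p, v in posterior[current].items() if p not in path}
        if not parents:
            break
        current = max(parents, key=parents.get)
        path.append(current)
    return path


paths = {start: trace_path(start, POSTERIOR) for start in ("X1", "X2")}
# Nodes reached from every starting point.
shared = set.intersection(*(set(p) for p in paths.values()))
print(paths)
print(sorted(shared))
```

In this toy table both paths converge on R1 and R0, which mirrors the kind of convergence a critical path diagram can reveal when several starting nodes lead to the same upstream factors.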
As illustrated in Figure 13, nodes S15 and C18 appear consistently regardless of the initial parent node chosen, underscoring a fundamental lack of risk awareness regarding production safety and poor management coordination within coal mining enterprises. Addressing these issues would disrupt the primary path leading to accidents and thereby significantly reduce the likelihood of their occurrence. The critical path analysis further reveals that inadequate oversight in the formulation and approval of procedural measures (S30) leads to unclear safety production responsibilities (S10), which in turn triggers insufficient safety supervision (C4) and lapses in on-site safety management (C12). Moreover, the failure to implement on-site safety measures (C2), weak safety culture (C14), and organizational deficiencies in production (S21) further increase the likelihood of accidents. The critical path analysis identifies the main risk factors influencing coal mine accidents as C12, C2, C4, S10, S30, C18, S15, and S21.

3.3.3. Frequency Statistical Analysis

A further analysis was conducted to assess the frequency of risk factors across all coal mine accident investigation reports (see Figure 14). From Figure 14, the top ten most frequent risk factors are as follows: S1, S5, S62, S39, S7, C2, S4, S14, S23, and S57. The higher the frequency of occurrence, the more likely it is that these risk factors are either overlooked in real-world operations or remain difficult to control within the existing management framework. High-frequency risk factors are often deeply embedded in various stages of the production process, having gone unrecognized and unaddressed over extended periods. This not only amplifies the cumulative effect of latent risks but also plays a significant role in triggering coal mine accidents.
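A frequency count of this kind is straightforward once each report has been reduced to a set of factor codes. The sketch below assumes such a representation; the factor codes F1 to F4 and the four sample reports are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical extraction output: one set of factor codes per report.
reports: List[Set[str]] = [
    {"F1", "F2", "F3"},
    {"F1", "F3"},
    {"F1", "F4"},
    {"F2", "F3"},
]


def factor_frequencies(reports: Iterable[Set[str]]) -> Counter:
    """Count in how many reports each factor appears."""
    counts: Counter = Counter()
    for factors in reports:
        counts.update(factors)  # a set contributes each factor once per report
    return counts


def top_factors(counts: Counter, k: int) -> List[Tuple[str, int]]:
    """Return the k most frequent factors, breaking ties by factor code."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


counts = factor_frequencies(reports)
print(top_factors(counts, 2))  # [('F1', 3), ('F3', 3)]
```

Sorting on the pair (negative count, code) keeps the ranking deterministic when several factors share the same frequency.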
By synthesizing the findings from sensitivity analysis, critical path analysis, and frequency statistics, the most critical risk factors contributing to coal mine accidents are identified and summarized in Table 8. Notably, issues such as violations of operational protocols by on-site personnel and inadequate enforcement of regulations are particularly concerning. These reflect significant deficiencies in workers’ safety awareness, operational competence, and risk identification abilities. The root cause of these problems lies in the failure of enterprises to provide effective safety training and technical support. Moreover, the lack of strict oversight in the formulation and approval of safety protocols has led to poor implementation of safety measures. As a result, workers are often unable to adopt appropriate safety precautions in the face of potential hazards, which raises the likelihood of accidents.
Furthermore, the absence of robust on-site management exacerbates the situation. The management has failed to fulfill its core responsibilities, particularly in overseeing and providing technical guidance to subordinate coal mines. This failure has resulted in unresolved safety hazards, thereby increasing the risk of rule violations. The inadequate implementation of safety measures at the operational level has further amplified the probability of accidents. Inadequate supervision and inspections have allowed risks to accumulate unchecked, ultimately creating the conditions for potentially catastrophic incidents.

3.3.4. Analysis of Risk Factors Associated with Main Risk Factors

The primary risk factors in coal mining enterprises’ safety production processes frequently occur and are often concealed, making early detection and control challenging. If these key risk factors are not addressed in a timely and effective manner, the risks will inevitably spread to surrounding nodes, triggering new safety hazards. This diffusion of risk not only heightens the probability of accidents but also complicates the identification and mitigation of safety risks.
For example, consider the major risk factor S23 (On-site workers violated work regulations). When the state of this node is set to T (true), the posterior probabilities of other nodes in the Bayesian network are computed. Among these, nodes S33, C7, S25, and S24 exhibit elevated posterior probabilities, indicating that when node S23 is not effectively controlled, the likelihood of related risk factors occurring significantly increases. Specifically, these factors include S33 (Violation of safety technical measures), C7 (Inadequate technical management), S25 (No safety management organization has been established), and S24 (Failure to follow the instructions of the on-site primary responsible person). A similar analysis of nodes S55, S60, S30, C2, C4, and C12 reveals associated risk factors with higher posterior probabilities, as shown in Table 9.
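The effect of fixing one node to T can be illustrated on a toy network by exact enumeration of the joint distribution. The three binary nodes (M, V, K), the structure M → V → K, and all conditional probabilities below are hypothetical. Enumeration is used only because the example is tiny; a network with 127 nodes would need a dedicated inference algorithm.

```python
from itertools import product
from typing import Dict

# Hypothetical network: node -> (parents, CPT), where the CPT maps the tuple of
# parent states to P(node = True | parents). All numbers are invented.
NETWORK = {
    "M": ((), {(): 0.30}),
    "V": (("M",), {(True,): 0.70, (False,): 0.20}),
    "K": (("V",), {(True,): 0.60, (False,): 0.10}),
}


def joint(assignment: Dict[str, bool]) -> float:
    """Probability of one complete assignment under the chain rule."""
    p = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def posterior(query: str, evidence: Dict[str, bool]) -> float:
    """P(query = True | evidence), summing over all consistent assignments."""
    nodes = list(NETWORK)
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if any(assignment[n] != v for n, v in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint(assignment)
        denominator += p
        if assignment[query]:
            numerator += p
    return numerator / denominator


for node in ("M", "K"):
    prior = posterior(node, {})
    updated = posterior(node, {"V": True})
    print(f"{node}: prior={prior:.3f}, given V=T -> {updated:.3f}")
```

In this toy network, setting V to T raises the probability of K from 0.275 to 0.6 and that of M from 0.3 to 0.6, which is the kind of increase the text reports for the nodes linked to S23.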
This demonstrates that when key risk factors remain uncontrolled, the probability of occurrence for linked risk factors rises significantly. In this context, implementing joint defense measures becomes a crucial strategy to prevent the further spread of risks within the accident network and to effectively reduce the likelihood of accidents.

4. Conclusions

This study employs large language models (LLMs) for an in-depth analysis of coal mine accident investigation reports, capitalizing on their natural language processing (NLP) capabilities to extract key risk factors affecting coal mine safety. Because these reports lack a standardized format and vary considerably in their textual expression, traditional text mining methods struggle to process them, which limits their applicability. To overcome these challenges, this study developed a specialized set of prompts for risk factor extraction, guiding the LLM to accurately identify risk factors within the reports. Ultimately, 14 direct risk factors, 38 comprehensive risk factors, and 75 specific risk factors were identified.
Compared to traditional text mining techniques, such as word segmentation, keyword extraction, and semantic analysis, LLMs demonstrate superior abilities in contextual understanding and semantic inference. They handle complex and diverse linguistic structures, which enables them to process varied and non-standardized text with high accuracy. Traditional methods rely on predefined rules or dictionaries that may fail to account for all potential risk factors, and they often fail to capture the complex contexts and implicit causal relationships within texts. LLMs, in contrast, adapt flexibly to different expressions and identify semantic layers and subtle nuances in context, making the extraction process more comprehensive and precise. The LLM-based NLP approach also preserves the integrity of the original information in the reports and minimizes the loss of crucial details.
In this study, risk factors extracted by LLMs, such as “violation of work regulations by on-site workers” (S23), “failure to strictly implement pre-shift meeting arrangements” (S55), and “negligence in managing subordinate coal mines” (S60), demonstrate a level of detail and depth beyond the capabilities of traditional methods. These nuanced risk factors uncover issues that traditional methods typically overlook, rendering the findings more consistent with the actual challenges in coal mine safety management. Consequently, the recommendations proposed in this study are more actionable, providing a robust and reliable foundation for future risk assessments and safety decision-making.
After extracting risk factors from coal mine accident investigation reports, the next critical step is to assess their relationships with coal mine accidents. Given the large number of risk factors involved and their intricate interdependencies, traditional methods have limitations in effectively identifying causal relationships among them. Furthermore, overly complex network structures can impede the effective incorporation of expert knowledge, thereby diminishing both analytical accuracy and practical applicability. In contrast, Bayesian networks provide an efficient approach to handling multiple interrelated factors while simplifying complex analytical problems through graphical structures. They not only capture the intricate dependencies among risk factors but also integrate data-driven learning with expert knowledge, thereby improving the model’s robustness and predictive reliability. Moreover, Bayesian networks excel at modeling uncertainty, making them particularly suitable for analyzing unpredictable risks in coal mine accidents.
Therefore, this study employs Bayesian networks to analyze the risk factors associated with coal mine accidents, aiming to achieve a more comprehensive understanding of their underlying relationships and to generate valuable insights for coal mine safety management.
In a novel approach, this study integrates Bayesian networks with association rule mining to thoroughly examine coal mine accident risk factors. By utilizing the Apriori algorithm to uncover strong association rules among the risk factors, 362 robust association rules were generated, yielding a Bayesian network with 127 risk factor nodes. Through a combination of sensitivity analysis, critical path analysis, and frequency statistics, seven major risk factors were identified: S23 (On-site workers violated work regulations), S55 (Failure to strictly implement pre-shift meeting arrangements), S60 (Neglect in managing subordinate coal mines), S30 (Insufficient scrutiny in the preparation and approval of procedures and measures), C2 (Inadequate implementation of on-site safety measures), C4 (Deficient safety supervision and inspection), and C12 (Poor on-site safety management). The results indicate that the most significant risk factors contributing to coal mine accidents are concentrated in areas such as on-site safety management, the execution of operational procedures, and inadequate safety oversight. These issues, compared to other risks, are more critical and should be prioritized in accident control efforts.
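To make the notion of a strong association rule concrete, the sketch below computes support, confidence, and lift for rules between pairs of factors. It is a brute-force enumeration over item pairs, not the full Apriori algorithm with candidate pruning, and the transactions, factor codes, and thresholds are all hypothetical rather than those used in the study.

```python
from itertools import combinations
from typing import List, Set, Tuple

# Hypothetical transactions: the set of factor codes found in each report.
transactions: List[Set[str]] = [
    {"F1", "F2"},
    {"F1", "F2"},
    {"F1", "F2", "F3"},
    {"F3"},
    {"F3"},
]


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(
    transactions: List[Set[str]], min_support: float, min_confidence: float
) -> List[Tuple[str, str, float, float, float]]:
    """Return (antecedent, consequent, support, confidence, lift) for pairs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue  # infrequent pairs cannot yield strong rules
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / support({x}, transactions)
            if confidence >= min_confidence:
                lift = confidence / support({y}, transactions)
                rules.append((x, y, s_ab, confidence, lift))
    return rules


# Thresholds chosen only for this toy example.
rules = pair_rules(transactions, min_support=0.4, min_confidence=0.8)
for x, y, s, c, l in rules:
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

Each retained rule x -> y can then be read as a candidate directed edge between two risk factor nodes when the network structure is assembled.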
Further analysis reveals that these primary risk factors expose several key shortcomings in coal mine safety management, particularly in the areas of insufficient enforcement of operational procedures, the failure to implement pre-shift meeting protocols, lack of oversight of subordinate coal mines, insufficient scrutiny in procedural approvals, and deficiencies in on-site safety measures and management. To effectively prevent accidents, efforts should focus on addressing these risk factors by strengthening the enforcement of operational procedures, ensuring the rigorous implementation of pre-shift meeting arrangements, and improving the supervision of subordinate coal mines. Additionally, refining the procedural and approval processes, enhancing on-site safety measures, and bolstering the supervisory and inspection capabilities of safety management personnel are vital strategies for reducing the occurrence of coal mine accidents. By establishing and enhancing these management and oversight mechanisms, it becomes possible to systematically identify potential risks and implement effective countermeasures, ultimately improving the effectiveness of accident prevention strategies.
With the rapid advancement of large language models, their application in analyzing coal mine accident risk factors is poised to reach unprecedented depths. Harnessing their sophisticated natural language processing capabilities, large language models excel in efficiently processing complex and heterogeneous accident reports, uncovering latent risk factors, and identifying nuanced details that often elude conventional approaches. Notably, the integration of large language models with advanced techniques such as Bayesian networks and deep learning significantly enhances the precision and comprehensiveness of risk prediction and analysis, offering robust scientific support for coal mine safety management. In this study, large language models are employed to systematically extract key risk factors from accident reports, which are subsequently analyzed using the Apriori algorithm to identify association rules among risk factors. These association rules are further integrated into a Bayesian network to construct a causal analysis framework. This comprehensive approach provides a rigorous theoretical foundation for dynamic risk assessment and the optimization of decision-making processes. The proposed research framework, which synergizes cutting-edge data analysis methodologies with intelligent tools, delineates a forward-looking pathway for enhancing the intelligence of coal mine safety management.
Specifically, large language models demonstrate substantial potential for advancing coal mine safety management across various critical domains. In the area of safety education and training, these models integrate extensive accident case studies and historical data to develop personalized and scenario-based training materials, thereby enhancing the safety awareness and operational skills of both managerial and frontline personnel. In on-site safety management, large language models, when combined with real-time monitoring systems, enable the consolidation and analysis of fragmented data from multiple sources. This facilitates the accurate identification of potential risks and provides management teams with data-driven risk alerts and optimization strategies, resulting in improved management efficiency and more timely hazard responses. Furthermore, in process optimization, large language models extract association rules from accident data to identify common failure patterns, offering robust scientific support for refining process design and strengthening control measures at critical operational stages. In the context of safety supervision and inspection, these models dynamically analyze risk evolution trends, delivering intuitive risk assessment outcomes and actionable strategy recommendations that enhance the efficiency of supervisory activities and bolster the scientific rigor of decision-making processes.
The research framework developed in this study offers significant potential for real-world applications. It supports the creation of intelligent coal mine risk assessment and early warning systems capable of real-time monitoring and dynamic analysis of on-site conditions. Additionally, it enables the optimization of operational processes by identifying and addressing high-risk areas, facilitates quantitative risk assessment and adaptive strategy refinement within safety supervision and inspection, and enhances the precision and applicability of safety education and training by generating data-driven, scenario-specific learning content tailored for management and operational personnel. By implementing this framework, coal mine safety management can advance toward higher levels of intelligence, systemization, and precision, providing a robust scientific foundation for improving coal mine production safety and strengthening process safety management practices.

Author Contributions

Conceptualization, G.D. and A.C.; Methodology, G.D.; Software, G.D.; Validation, G.D.; Formal analysis, G.D.; Investigation, G.D. and A.C.; Resources, G.D.; Data curation, G.D.; Writing—original draft, G.D.; Writing—review & editing, G.D. and A.C.; Supervision, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in this study are not publicly available. For data usage requests or further inquiries, please contact the corresponding author(s).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Heinrich, H.W. Industrial Accident Prevention. A Scientific Approach; McGraw-Hill Book Company: New York, NY, USA, 1941. [Google Scholar]
  2. Reason, J. Human Error; Cambridge University Press: Cambridge, UK, 1990. [Google Scholar]
  3. Leveson, N.G. Engineering a Safer World: Systems Thinking Applied to Safety; The MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  4. Reason, J. Managing the Risks of Organizational Accidents; Routledge: London, UK, 2016. [Google Scholar]
  5. Bennett, J.D.; Passmore, D.L. Probability of death, disability, and restricted work activity in United States underground bituminous coal mines, 1975–1981. J. Saf. Res. 1984, 15, 69–76. [Google Scholar] [CrossRef]
  6. Leigh, J.; Macaskill, P.; Kuosma, E. Global burden of disease and injury due to occupational factors. Epidemiology 1999, 10, 626–631. [Google Scholar] [CrossRef] [PubMed]
  7. Butani, S.J. Relative risk analysis of injuries in coal mining by age and experience at present company. J. Occup. Accid. 1988, 10, 209–216. [Google Scholar] [CrossRef]
  8. Ma, Q.; Wan, M.; Shao, J. Six-hierarchy model of accident analysis and its application in coal mine accidents. J. Saf. Sci. Resil. 2022, 3, 61–71. [Google Scholar] [CrossRef]
  9. Weeks, J.L. Strongmen and straw men: Authoritarian regimes and the initiation of international conflict. Am. Political Sci. Rev. 2012, 106, 326–347. [Google Scholar] [CrossRef]
  10. Zhou, L.; Cao, Q.; Yu, K. Research on occupational safety, health management and risk control technology in coal mines. Int. J. Environ. Res. Public Health 2018, 15, 868. [Google Scholar] [CrossRef]
  11. Samantra, C.; Datta, S.; Mahapatra, S.S. Analysis of occupational health hazards and associated risks in fuzzy environment: A case research in an Indian underground coal mine. Int. J. Inj. Control Saf. Promot. 2017, 24, 311–327. [Google Scholar] [CrossRef] [PubMed]
  12. Nikulin, A.; Nikulina, A.Y. Assessment of occupational health and safety effectiveness at a mining company. Ecol. Environ. Conserv. 2017, 23, 351–355. [Google Scholar]
  13. Zhang, Y.; Wang, S.X.; Yao, J.T. The impact of behavior safety management system on coal mine work safety: A system dynamics model of quadripartite evolutionary game. Resour. Policy 2023, 82, 103497. [Google Scholar] [CrossRef]
  14. Oei, P.Y.; Brauers, H.; Herpich, P. Lessons from Germany’s hard coal mining phase-out: Policies and transition from 1950 to 2018. Clim. Policy 2020, 20, 963–979. [Google Scholar] [CrossRef]
  15. Zhang, G.; Wang, E. Risk identification for coal and gas outburst in underground coal mines: A critical review and future directions. Gas Sci. Eng. 2023, 118, 205106. [Google Scholar] [CrossRef]
  16. Tubis, A.; Werbińska-Wojciechowska, S.; Wroblewski, A. Risk assessment methods in mining industry—A systematic review. Appl. Sci. 2020, 10, 5172. [Google Scholar] [CrossRef]
  17. Galante, E.; Bordalo, D.; Nobrega, M. Risk assessment methodology: Quantitative HazOp. J. Saf. Eng. 2014, 3, 31–36. [Google Scholar]
  18. Single, J.I.; Schmidt, J.; Denecke, J. State of research on the automation of HAZOP studies. J. Loss Prev. Process Ind. 2019, 62, 103952. [Google Scholar] [CrossRef]
  19. Vesely, B. Fault Tree Analysis (FTA): Concepts and Applications; NASA HQ: Washington, DC, USA, 2002. [Google Scholar]
  20. Baybutt, P. A framework for critical thinking in process safety managemen. Process Saf. Prog. 2016, 35, 337–340. [Google Scholar] [CrossRef]
  21. Hooi, Y.K.; Hassan, M.F.; Shariff, A.M. ICT in process safety management. In Proceedings of the IEEE 2014 International Conference on Computer and Information Sciences (ICCOINS), Kuala Lumpur, Malaysia, 3–5 June 2014; pp. 1–6. [Google Scholar]
  22. Theophilus, S.C.; Nwankwo, C.D.; Acquah-Andoh, E. Integrating human factors (HF) into a process safety management system (PSMS). Process Saf. Prog. 2018, 37, 67–85. [Google Scholar] [CrossRef]
  23. Liaw, H.J. Deficiencies frequently encountered in the management of process safety information. Process Saf. Environ. Prot. 2019, 132, 226–230. [Google Scholar] [CrossRef]
  24. Sari, M.; Selcuk, A.S.; Karpuz, C. Stochastic modeling of accident risks associated with an underground coal mine in Turkey. Saf. Sci. 2009, 47, 78–87. [Google Scholar] [CrossRef]
  25. Liu, R.; Cheng, W.; Yu, Y. Human factors analysis of major coal mine accidents in China based on the HFACS-CM model and AHP method. Int. J. Ind. Ergon. 2018, 68, 270–279. [Google Scholar] [CrossRef]
  26. Maiti, J.; Khanzode, V.V. Development of a relative risk model for roof and side fall fatal accidents in underground coal mines in India. Saf. Sci. 2009, 47, 1068–1076. [Google Scholar] [CrossRef]
  27. Yuxin, W.; Gui, F.; Qian, L. Accident case-driven study on the causal modeling and prevention strategies of coal-mine gas-explosion accidents: A systematic analysis of coal-mine accidents in China. Resour. Policy 2024, 88, 104425. [Google Scholar] [CrossRef]
  28. Tong, R.; Cheng, M.; Yang, X. Exposure levels and health damage assessment of dust in a coal mine of Shanxi Province, China. Process Saf. Environ. Prot. 2019, 128, 184–192. [Google Scholar] [CrossRef]
  29. Li, M.; Wang, H.; Wang, D. Risk assessment of gas explosion in coal mines based on fuzzy AHP and bayesian network. Process Saf. Environ. Prot. 2020, 135, 207–218. [Google Scholar] [CrossRef]
  30. He, Y.; Li, J.; Yu, M. Path analysis of coal mine accident risk factors based on the 24Model. Int. J. Occup. Saf. Ergon. 2024, 1–14. [Google Scholar] [CrossRef]
  31. Qiao, W. Analysis and measurement of multifactor risk in underground coal mine accidents based on coupling theory. Reliab. Eng. Syst. Saf. 2021, 208, 107433. [Google Scholar] [CrossRef]
  32. Liang, K.; Liu, J.; Wang, C. The coal mine accident causation model based on the hazard theory. Procedia Eng. 2011, 26, 2199–2205. [Google Scholar] [CrossRef]
  33. Yang, L.; Fang, X.; Wang, X. Risk prediction of coal and gas outburst in deep coal mines based on the SAPSO-ELM algorithm. Int. J. Environ. Res. Public Health 2022, 19, 12382. [Google Scholar] [CrossRef] [PubMed]
  34. Deng, Y.; Song, L.; Zhou, Z. An approach for understanding and promoting coal mine safety by exploring coal mine risk network. Complexity 2017, 2017, 7628569. [Google Scholar] [CrossRef]
  35. Tian, J.; Wang, Y.; Gao, S. Analysis of mining-related injuries in Chinese coal mines and related risk factors: A statistical research study based on a meta-analysis. Int. J. Environ. Res. Public Health 2022, 19, 16249. [Google Scholar] [CrossRef] [PubMed]
  36. Fu, G.; Xie, X.; Jia, Q. Accidents analysis and prevention of coal and gas outburst: Understanding human errors in accidents. Process Saf. Environ. Prot. 2020, 134, 1–23. [Google Scholar] [CrossRef]
  37. Miao, D.; Wang, W.; Lv, Y. Research on the classification and control of human factor characteristics of coal mine accidents based on K-Means clustering analysis. Int. J. Ind. Ergon. 2023, 97, 103481. [Google Scholar] [CrossRef]
  38. Zhang, M.; Li, H.; Xia, H. Human factors analysis of coal mine gas accidents based on improved HFACS model. Hum. Factors Ergon. Manuf. Serv. Ind. 2024, 34, 309–324. [Google Scholar] [CrossRef]
  39. Li, S.; You, M.; Li, D. Identifying coal mine safety production risk factors by employing text mining and Bayesian network techniques. Process Saf. Environ. Prot. 2022, 162, 1067–1081. [Google Scholar] [CrossRef]
  40. Qiu, Z.; Liu, Q.; Li, X. Construction and analysis of a coal mine accident causation network based on text mining. Process Saf. Environ. Prot. 2021, 153, 320–328. [Google Scholar] [CrossRef]
  41. Na, X.U.; Ling, M.A.; Liu, Q. An improved text mining approach to extract safety risk factors from construction accident reports. Saf. Sci. 2021, 138, 105216. [Google Scholar]
  42. Tingjiang, T.; Enyuan, W.; Ke, Z. Research on assisting coal mine hazard investigation for accident prevention through text mining and deep learning. Resour. Policy 2023, 85, 103802. [Google Scholar] [CrossRef]
  43. Lu, C.; Li, S.; Xu, K. Coal mine safety accidents, environmental regulation and economic development—An empirical study of PVAR based on ten major coal provinces in China. Sustainability 2022, 14, 14334. [Google Scholar] [CrossRef]
  44. Hu, J.; Huang, R.; Xu, F. Data mining in coal-mine gas explosion accidents based on evidence-based safety: A case study in China. Sustainability 2022, 14, 16346. [Google Scholar] [CrossRef]
  45. Wang, D.; Sui, W.; Ranville, J.F. Hazard identification and risk assessment of groundwater inrush from a coal mine: A review. Bull. Eng. Geol. Environ. 2022, 81, 421. [Google Scholar] [CrossRef]
  46. Tian, S.; Wang, Y.; Hongxia, L.I. Analysis of the causes and safety countermeasures of coal mine accidents: A case study of coal mine accidents in China from 2018 to 2022. Process Saf. Environ. Prot. 2024, 187, 864–875. [Google Scholar] [CrossRef]
  47. Li, F.; Duan, B.; Sun, Y. Quantitative risk assessment model of working positions for roof accidents in coal mine. Saf. Sci. 2024, 178, 106628. [Google Scholar] [CrossRef]
  48. Tripathy, D.P.; Parida, S.; Khandu, L. Safety risk assessment and risk prediction in underground coal mines using machine learning techniques. J. Inst. Eng. Ser. D 2021, 102, 495–504. [Google Scholar] [CrossRef]
  49. Xuecai, X.; Shifei, S.; Gui, F. Accident case data–accident cause model hybrid-driven coal and gas outburst accident analysis: Evidence from 84 accidents in China during 2008–2018. Process Saf. Environ. Prot. 2022, 164, 67–90. [Google Scholar] [CrossRef]
  50. Matloob, S.; Li, Y.; Khan, K.Z. Safety measurements and risk assessment of coal mining industry using artificial intelligence and machine learning. Open J. Bus. Manag. 2021, 9, 1198–1209. [Google Scholar] [CrossRef]
  51. Amoako, R.; Buaba, J.; Brickey, A. Identifying risk factors from MSHA accidents and injury data using logistic regression. Min. Metall. Explor. 2021, 38, 509–527. [Google Scholar]
  52. You, M.; Li, S.; Li, D. Applications of artificial intelligence for coal mine gas risk assessment. Saf. Sci. 2021, 143, 105420. [Google Scholar] [CrossRef]
  53. Fan, Z.; Xu, F. Health risks of occupational exposure to toxic chemicals in coal mine workplaces based on risk assessment mathematical model based on deep learning. Environ. Technol. Innov. 2021, 22, 101500. [Google Scholar] [CrossRef]
  54. He, Z.; Wu, Q.; Wen, L. A process mining approach to improve emergency rescue processes of fatal gas explosion accidents in Chinese coal mines. Saf. Sci. 2019, 111, 154–166. [Google Scholar] [CrossRef]
  55. Hao, M.; Nie, Y. Hazard identification, risk assessment and management of industrial system: Process safety in mining industry. Saf. Sci. 2022, 154, 105863. [Google Scholar] [CrossRef]
  56. Xuecai, X.; Xueming, S.; Gui, F. Accident causes data-driven coal and gas outburst accidents prevention: Application of data mining and machine learning in accident path mining and accident case-based deduction. Process Saf. Environ. Prot. 2022, 162, 891–913. [Google Scholar] [CrossRef]
  57. Du, J.; Chen, J.; Pu, Y. Risk assessment of dynamic disasters in deep coal mines based on multi-source, multi-parameter indexes, and engineering application. Process Saf. Environ. Prot. 2021, 155, 575–586. [Google Scholar] [CrossRef]
  58. Li, B.; Wang, E.; Shang, Z. Optimize the early warning time of coal and gas outburst by multi-source information fusion method during the tunneling process. Process Saf. Environ. Prot. 2021, 149, 839–849. [Google Scholar] [CrossRef]
  59. Jin, L.I.U. Construction of Emergency Services Platform for Coal Mining Accidents by Integrating Multi Source Datasets. International Journal of Simulation—Systems. Sci. Technol. 2016, 17, 36.1–36.5. [Google Scholar]
  60. Cai, G.; Zheng, X.; Guo, J.; Gao, W. Real-time identification of borehole rescue environment situation in underground disaster areas based on multi-source heterogeneous data fusion. Saf. Sci. 2025, 181, 106690. [Google Scholar] [CrossRef]
  61. Zhang, J.; Xu, K.; Reniers, G. Statistical analysis the characteristics of extraordinarily severe coal mine accidents (ESCMAs) in China from 1950 to 2018. Process Saf. Environ. Prot. 2020, 133, 332–340. [Google Scholar] [CrossRef]
  62. Chen, H.; Qi, H.; Long, R. Research on 10-year tendency of China coal mine accidents and the characteristics of human factors. Saf. Sci. 2012, 50, 745–750. [Google Scholar] [CrossRef]
  63. Li, X.; Cao, Z.; Xu, Y. Characteristics and trends of coal mine safety development. Energy Sources Part A Recovery Util. Environ. Eff. 2021, 47, 2316–2334. [Google Scholar] [CrossRef]
  64. Zhang, J.; Fu, J.; Hao, H. Root causes of coal mine accidents: Characteristics of safety culture deficiencies based on accident statistics. Process Saf. Environ. Prot. 2020, 136, 78–91. [Google Scholar] [CrossRef]
  65. Zhang, C.; Wang, P.; Wang, E. Characteristics of coal resources in China and statistical analysis and preventive measures for coal mine accidents. Int. J. Coal Sci. Technol. 2023, 10, 22. [Google Scholar] [CrossRef] [PubMed]
  66. Fu, G.; Zhao, Z.; Hao, C. The accident path of coal mine gas explosion based on 24Model: A case study of the Ruizhiyuan gas explosion accident. Processes 2019, 7, 73. [Google Scholar] [CrossRef]
  67. Yin, W.; Fu, G.; Yang, C. Fatal gas explosion accidents on Chinese coal mines and the characteristics of unsafe behaviors: 2000–2014. Saf. Sci. 2017, 92, 173–179. [Google Scholar]
  68. Jiang, W.; Qu, F.; Zhang, L. Quantitative identification and analysis on hazard sources of roof fall accident in coal mine. Procedia Eng. 2012, 45, 83–88. [Google Scholar] [CrossRef]
  69. Yinnan, H.; Ruxiang, Q. Analysis of the spatial distribution and future trends of coal mine accidents: A case study of coal mine accidents in China from 2005–2022. Spat. Stat. 2024, 63, 100851. [Google Scholar] [CrossRef]
  70. Amin, M.T.; Khan, F.; Amyotte, P. A bibliometric review of process safety and risk analysis. Process Saf. Environ. Prot. 2019, 126, 366–381. [Google Scholar]
  71. Yang, Y.; Chen, G.; Reniers, G. A bibliometric analysis of process safety research in China: Understanding safety research progress as a basis for making China’s chemical industry more sustainable. J. Clean. Prod. 2020, 263, 121433. [Google Scholar]
  72. Lee, S.; Song, J.; Kim, Y. An empirical comparison of four text mining methods. J. Comput. Inf. Syst. 2010, 51, 1–10. [Google Scholar]
  73. Jusoh, S.; Alfawareh, H.M. Techniques, applications and challenging issue in text mining. Int. J. Comput. Sci. Issues 2012, 9, 431. [Google Scholar]
  74. Talib, R.; Hanif, M.K.; Ayesha, S. Text mining: Techniques, applications and issues. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 414–418. [Google Scholar] [CrossRef]
  75. Baden, C.; Pipal, C.; Schoonvelde, M. Three gaps in computational text analysis methods for social sciences: A research agenda. Commun. Methods Meas. 2022, 16, 1–18. [Google Scholar]
  76. Kasneci, E.; Seßler, K.; Küchemann, S. ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 2023, 103, 102274. [Google Scholar]
  77. Meyer, J.G.; Urbanowicz, R.J.; Martin, P.C.N. ChatGPT and large language models in academia: Opportunities and challenges. BioData Min. 2023, 16, 20. [Google Scholar] [CrossRef] [PubMed]
  78. Teubner, T.; Flath, C.M.; Weinhardt, C. Welcome to the era of chatgpt et al. the prospects of large language models. Bus. Inf. Syst. Eng. 2023, 65, 95–101. [Google Scholar] [CrossRef]
  79. Min, B.; Ross, H.; Sulem, E. Recent advances in natural language processing via large pre-trained language models: A survey. ACM Comput. Surv. 2023, 56, 1–40. [Google Scholar] [CrossRef]
  80. Chen, S.; Zhang, Y.; Yang, Q. Multi-task learning in natural language processing: An overview. ACM Comput. Surv. 2024, 56, 1–32. [Google Scholar]
  81. Sindhu, B.; Prathamesh, R.P.; Sameera, M.B. The evolution of large language model: Models, applications and challenges. In Proceedings of the IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), Bhubaneswar, India, 15–17 February 2024; pp. 1–8. [Google Scholar]
  82. Gaikwad, S.V.; Chaugule, A.; Patil, P. Text mining methods and techniques. Int. J. Comput. Appl. 2014, 85, 42–45. [Google Scholar]
  83. Hickman, L.; Thapa, S.; Tay, L. Text preprocessing for text mining in organizational research: Review and recommendations. Organ. Res. Methods 2022, 25, 114–146. [Google Scholar]
  84. Meng, X.; Yan, X.; Zhang, K.; Liu, D.; Cui, X.; Yang, Y.; Zhang, M.; Cao, C.; Wang, J.; Wang, X.; et al. The application of large language models in medicine: A scoping review. Iscience 2024, 27, 109713. [Google Scholar]
  85. Topsakal, O.; Akinci, T.C. Creating large language model applications utilizing langchain: A primer on developing llm apps fast. In Proceedings of the International Conference on Applied Engineering and Natural Sciences, Konya, Turkey, 16–17 June 2023; Volume 1, pp. 1050–1056. [Google Scholar]
  86. Yang, J.; Jin, H.; Tang, R. Harnessing the power of llms in practice: A survey on chatgpt and beyond. ACM Trans. Knowl. Discov. Data 2024, 18, 1–32. [Google Scholar]
  87. Wei, F.; Keeling, R.; Huber-Fliflet, N. Empirical study of LLM fine-tuning for text classification in legal document review. In Proceedings of the IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023; pp. 2786–2792. [Google Scholar]
  88. Peng, C.; Yang, X.; Chen, A. A study of generative large language model for medical research and healthcare. NPJ Digit. Med. 2023, 6, 210. [Google Scholar]
  89. Li, J.; Wu, C. Deep learning and text mining: Classifying and extracting key information from construction accident narratives. Appl. Sci. 2023, 13, 10599. [Google Scholar] [CrossRef]
  90. Qi, S.; Cao, Z.; Rao, J. What is the limitation of multimodal llms? A deeper look into multimodal llms through prompt probing. Inf. Process. Manag. 2023, 60, 103510. [Google Scholar] [CrossRef]
  91. Wang, L.; Chen, X.; Deng, X.W. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. NPJ Digit. Med. 2024, 7, 41. [Google Scholar] [CrossRef]
  92. Venerito, V.; Lalwani, D.; Del Vescovo, S. Prompt engineering: The next big skill in rheumatology research. Int. J. Rheum. Dis. 2024, 27, e15157. [Google Scholar] [CrossRef] [PubMed]
  93. Luo, L.; Ning, J.; Zhao, Y. Taiyi: A bilingual fine-tuned large language model for diverse biomedical tasks. J. Am. Med. Inform. Assoc. 2024, 31, 1865–1874. [Google Scholar] [CrossRef] [PubMed]
  94. Wang, A.; Liu, C.; Yang, J. Fine-tuning large language models for rare disease concept normalization. J. Am. Med. Inform. Assoc. 2024, 31, 2076–2083. [Google Scholar] [CrossRef]
  95. Pornprasit, C.; Tantithamthavorn, C. Fine-tuning and prompt engineering for large language models-based code review automation. Inf. Softw. Technol. 2024, 175, 107523. [Google Scholar] [CrossRef]
  96. Ziems, C.; Held, W.; Shaikh, O. Can large language models transform computational social science? Comput. Linguist. 2024, 50, 237–291. [Google Scholar] [CrossRef]
  97. Russe, M.F.; Reisert, M.; Bamberg, F. Improving the use of LLMs in radiology through prompt engineering: From precision prompts to zero-shot learning. RöFo-Fortschritte Geb. Röntgenstrahlen Bildgeb. Verfahr. 2024, 196, 1166–1170. [Google Scholar] [CrossRef] [PubMed]
  98. Viswanathan, V.; Gashteovski, K.; Lawrence, C. Large language models enable few-shot clustering. arXiv 2023, arXiv:2307.00524. [Google Scholar] [CrossRef]
  99. Li, T.; Shetty, S.; Kamath, A. CancerGPT for few shot drug pair synergy prediction using large pretrained language models. NPJ Digit. Med. 2024, 7, 40. [Google Scholar] [CrossRef]
  100. Kumbhare, T.A.; Chobe, S.V. An overview of association rule mining algorithms. Int. J. Comput. Sci. Inf. Technol. 2014, 5, 927–930. [Google Scholar]
  101. Telikani, A.; Gandomi, A.H.; Shahbahrami, A. A survey of evolutionary computation for association rule mining. Inf. Sci. 2020, 524, 318–352. [Google Scholar] [CrossRef]
  102. Wang, F.; Wen, Y.; Guo, T. Collaborative filtering and association rule mining-based market basket recommendation on spark. Concurr. Comput. Pract. Exp. 2020, 32, e5565. [Google Scholar]
  103. Ünvan, Y.A. Market basket analysis with association rules. Commun. Stat. Theory Methods 2021, 50, 1615–1628. [Google Scholar] [CrossRef]
  104. Alawadh, M.M.; Barnawi, A.M. A survey on methods and applications of intelligent market basket analysis based on association rule. J. Big Data 2022, 4, 1–25. [Google Scholar]
  105. Qodmanan, H.R.; Nasiri, M.; Minaei-Bidgoli, B. Multi objective association rule mining with genetic algorithm without specifying minimum support and minimum confidence. Expert Syst. Appl. 2011, 38, 288–298. [Google Scholar]
  106. Baralis, E.; Cagliero, L.; Cerquitelli, T. Generalized association rule mining with constraints. Inf. Sci. 2012, 194, 68–84. [Google Scholar] [CrossRef]
  107. Altaf, W.; Shahbaz, M.; Guergachi, A. Applications of association rule mining in health informatics: A survey. Artif. Intell. Rev. 2017, 47, 313–340. [Google Scholar]
  108. Shaukat, K.; Zaheer, S.; Nawaz, I. Association rule mining: An application perspective. Int. J. Comput. Sci. Innov. 2015, 2015, 29–38. [Google Scholar]
  109. Diaz-Garcia, J.A.; Ruiz, M.D.; Martin-Bautista, M.J. A survey on the use of association rules mining techniques in textual social media. Artif. Intell. Rev. 2023, 56, 1175–1200. [Google Scholar] [PubMed]
  110. Yuan, X. An improved Apriori algorithm for mining association rules. In Proceedings of the AIP Conference Proceedings, Provo, UT, USA, 16–21 July 2017; AIP Publishing: College Park, MD, USA, 2017; Volume 1820. [Google Scholar]
  111. Zeng, Y.; Yin, S.; Liu, J. Research of Improved FP-Growth Algorithm in Association Rules Mining. Sci. Program. 2015, 2015, 910281. [Google Scholar]
  112. Shawkat, M.; Badawi, M.; El-ghamrawy, S. An optimized FP-growth algorithm for discovery of association rules. J. Supercomput. 2022, 78, 5479–5506. [Google Scholar] [CrossRef]
  113. Friedman, N.; Koller, D. Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks. Mach. Learn. 2003, 50, 95–125. [Google Scholar]
  114. Atienza, D.; Bielza, C.; Larrañaga, P. Semiparametric bayesian networks. Inf. Sci. 2022, 584, 564–582. [Google Scholar]
  115. Ben-Gal, I. Bayesian Networks. In Encyclopedia of Statistics in Quality and Reliability; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
  116. Gemela, J. Learning Bayesian networks using various datasources and applications to financial analysis. Soft Comput. 2003, 7, 297–303. [Google Scholar]
  117. Drury, B.; Valverde-Rebaza, J.; Moura, M.F. A survey of the applications of Bayesian networks in agriculture. Eng. Appl. Artif. Intell. 2017, 65, 29–42. [Google Scholar]
  118. Plankensteiner, K.; Bluder, O.; Pilz, J. Bayesian network model with application to smart power semiconductor lifetime data. Risk Anal. 2015, 35, 1623–1639. [Google Scholar]
  119. Refai, A.; Merouani, H.F.; Aouras, H. Maintenance of a Bayesian network: Application using medical diagnosis. Evol. Syst. 2016, 7, 187–196. [Google Scholar]
  120. Lautenbach, K.; Pinl, A. A Petri net representation of Bayesian message flows: Importance of Bayesian networks for biological applications. Nat. Comput. 2011, 10, 683–709. [Google Scholar]
  121. Kaur, D.; Sobiesk, M.; Patil, S. Application of Bayesian networks to generate synthetic health data. J. Am. Med. Inform. Assoc. 2021, 28, 801–811. [Google Scholar]
  122. Smith, D.; Veitch, B.; Khan, F. Understanding industrial safety: Comparing Fault tree, Bayesian network, and FRAM approaches. J. Loss Prev. Process Ind. 2017, 45, 88–101. [Google Scholar] [CrossRef]
  123. Hwang, S.; Boyle, L.N.; Banerjee, A.G. Identifying characteristics that impact motor carrier safety using Bayesian networks. Accid. Anal. Prev. 2019, 128, 40–45. [Google Scholar] [CrossRef]
  124. Wang, G.W.Y.; Yang, Z.; Zhang, D. Application of Bayesian networks in analysing tanker shipping bankruptcy risks. Marit. Bus. Rev. 2017, 2, 177–198. [Google Scholar] [CrossRef]
  125. Lin, E.M.H.; Sun, E.W.; Yu, M.T. Behavioral data-driven analysis with Bayesian method for risk management of financial services. Int. J. Prod. Econ. 2020, 228, 107737. [Google Scholar]
  126. Gandy, A.; Veraart, L.A.M. A Bayesian methodology for systemic risk assessment in financial networks. Manag. Sci. 2017, 63, 4428–4446. [Google Scholar] [CrossRef]
  127. Sohn, S.; Larson, D.W.; Habermann, E.B. Detection of clinically important colorectal surgical site infection using Bayesian network. J. Surg. Res. 2017, 209, 168–173. [Google Scholar] [CrossRef] [PubMed]
  128. Turner, R.J.; Hagoort, K.; Meijer, R.J. Bayesian network analysis of antidepressant treatment trajectories. Sci. Rep. 2023, 13, 8428. [Google Scholar] [CrossRef] [PubMed]
  129. Milns, I.; Beale, C.M.; Smith, V.A. Revealing ecological networks using Bayesian network inference algorithms. Ecology 2010, 91, 1892–1899. [Google Scholar] [CrossRef]
  130. Zhang, W.; Liu, G.; Yang, Z. Urban agglomeration ecological risk transfer model based on Bayesian and ecological network. Resour. Conserv. Recycl. 2020, 161, 105006. [Google Scholar]
  131. Su, X.; Bai, P.; Du, F. Application of Bayesian networks in situation assessment. In Proceedings of the International Conference on Intelligent Computing and Information Science, Chongqing, China, 8–9 January 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 643–648. [Google Scholar]
  132. Weber, P.; Medina-Oliva, G.; Simon, C. Overview on Bayesian networks applications for dependability, risk analysis and maintenance areas. Eng. Appl. Artif. Intell. 2012, 25, 671–682. [Google Scholar]
  133. Cai, B.; Liu, Y.; Liu, Z. Application of Bayesian networks in quantitative risk assessment of subsea blowout preventer operations. Risk Anal. 2013, 33, 1293–1311. [Google Scholar] [CrossRef] [PubMed]
  134. Nicula, B.; Dascalu, M.; Newton, N.N. Automated paraphrase quality assessment using language models and transfer learning. Computers 2021, 10, 166. [Google Scholar] [CrossRef]
  135. Sulaiman, N.; Hamzah, F. Evaluation of Transfer Learning and Adaptability in Large Language Models with the Glue Benchmark; Authorea Preprints: Hoboken, NJ, USA, 2024. [Google Scholar]
  136. Liu, Y.; Han, T.; Ma, S. Summary of chatgpt-related research and perspective towards the future of large language models. Meta-Radiology 2023, 1, 100017. [Google Scholar] [CrossRef]
  137. Elhafsi, A.; Sinha, R.; Agia, C. Semantic anomaly detection with large language models. Auton. Robot. 2023, 47, 1035–1055. [Google Scholar]
  138. Zhou, S.; Zhou, Z.; Wang, C. A User-Centered Framework for Data Privacy Protection Using Large Language Models and Attention Mechanisms. Appl. Sci. 2024, 14, 6824. [Google Scholar] [CrossRef]
  139. Zhang, R.; Han, J.; Liu, C. LLaMA-adapter: Efficient fine-tuning of large language models with zero-initialized attention. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 22–27 April 2024. [Google Scholar]
  140. Liu, X.; Wang, J.; Sun, J. Prompting frameworks for large language models: A survey. arXiv 2023, arXiv:2311.12785. [Google Scholar]
  141. Marvin, G.; Hellen, N.; Jjingo, D. Prompt engineering in large language models. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 27–28 June 2023; Springer Nature: Singapore, 2023; pp. 387–402. [Google Scholar]
  142. Cao, C.; Sang, J.; Arora, R. Prompting is all you need: LLMs for systematic review screening. medRxiv 2024. [Google Scholar] [CrossRef]
  143. Cheng, Y.; Chen, J.; Huang, Q. Prompt sapper: A LLM-empowered production tool for building AI chains. ACM Trans. Softw. Eng. Methodol. 2024, 33, 1–24. [Google Scholar] [CrossRef]
  144. Zou, A.; Zhang, Z.; Zhao, H. Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models. arXiv 2023, arXiv:2310.06692. [Google Scholar]
  145. Zhang, Z.; Zhang, A.; Li, M. Automatic chain of thought prompting in large language models. arXiv 2022, arXiv:2210.03493. [Google Scholar]
  146. OpenAI. ChatGPT (GPT-4). 2023. Available online: https://openai.com/chatgpt (accessed on 22 January 2025).
  147. Alibaba Cloud. Qwen, the AI Assistant. Available online: https://tongyi.aliyun.com/acloud.com/qwen (accessed on 22 January 2025).
  148. Moonshot AI. Academic Search Assistant. 2024. Available online: https://kimi.moonshot.cn/ (accessed on 22 January 2025).
  149. Lamma, E.; Riguzzi, F.; Stambazzi, A. Improving the SLA algorithm using association rules. In AI*IA 2003: Advances in Artificial Intelligence: 8th Congress of the Italian Association for Artificial Intelligence, Pisa, Italy, 23–26 September 2003; Proceedings 8; Springer: Berlin/Heidelberg, Germany, 2003; pp. 165–175. [Google Scholar]
  150. Iqbal, K.; Asghar, S. Generating hierarchical association rules with the use of Bayesian network. In Proceedings of the IEEE Third International Conference on Network and System Security, Melbourne, Australia, 19–21 October 2009; pp. 528–533. [Google Scholar]
  151. Lamma, E.; Riguzzi, F.; Storari, S. Improving the K2 algorithm using association rule parameters. Mod. Inf. Process. Elsevier Sci. 2006, 1, 207–217. [Google Scholar]
  152. Storari, S.; Riguzzi, F.; Lamma, E. Exploiting association and correlation rules parameters for learning Bayesian networks. Intell. Data Anal. 2009, 13, 689–701. [Google Scholar]
  153. Jinhua, W.; Xuehua, M.; Jie, C. Fault diagnosis method of Bayesian network based on association rules. Trans. Inst. Meas. Control 2024, 01423312241267256. [Google Scholar] [CrossRef]
Figure 1. China’s coal mine safety production trends, 2013–2023.
Figure 2. Counts of coal mine accident types.
Figure 3. Research framework.
Figure 4. Direct risk factor extraction process.
Figure 5. Comprehensive risk factor extraction process.
Figure 6. Specific risk factor extraction process.
Figure 7. Bayesian network topology structure.
Figure 8. Node degree.
Figure 9. Sensitivity analysis results.
Figure 10. Initial outcomes of sensitivity chain analysis.
Figure 11. Process of sensitivity chain merging.
Figure 12. Risk factor sensitivity analysis chains.
Figure 13. Critical path analysis results.
Figure 14. Frequency statistics of risk factors.
Table 1. Comparison between this study and previous literature.
| Comparison Dimensions | Previous Studies | This Study |
| --- | --- | --- |
| Research Methods | Text mining methods: (1) Limited contextual and semantic understanding. (2) Scalability and adaptability constraints. (3) Lack of generative and complex reasoning capabilities. | Large language models: (1) Advanced contextual and semantic understanding. (2) Scalability and cross-domain adaptability. (3) Generative and inferential capabilities. |
| Risk Identification | (1) Limited ability to capture implicit and evolving risks. (2) Challenges in handling ambiguity and misinformation. (3) Lack of advanced causal and predictive insights. | (1) Enhanced contextual and semantic understanding. (2) Robust adaptability to dynamic and unstructured data. (3) Advanced predictive and inferential capabilities. |
Table 2. Accident severity level statistics.
| Accident Severity Level | Number of Accidents |
| --- | --- |
| Extremely serious accident | 13 |
| Major accident | 43 |
| Significant accident | 105 |
| Minor accident | 539 |
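As a quick arithmetic check on Table 2, the four counts can be summed and converted to shares. The sketch below only restates the table's numbers; the variable and function names are illustrative and do not come from the study.

```python
# Accident counts by severity level, copied from Table 2.
severity_counts = {
    "Extremely serious accident": 13,
    "Major accident": 43,
    "Significant accident": 105,
    "Minor accident": 539,
}

# Total number of accidents covered by the table.
total = sum(severity_counts.values())


def share(level: str) -> float:
    """Fraction of all accidents that fall into the given severity level."""
    return severity_counts[level] / total


for level, count in severity_counts.items():
    print(f"{level}: {count} ({share(level):.1%})")
print("Total:", total)
```

The total is 700, which matches the number of investigation reports stated in the abstract, and minor accidents account for 77% of the sample.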
Table 3. Direct risk factors.
| Risk Factor ID | Direct Risk Factors |
| --- | --- |
| D1 | Violation of operating procedures by workers |
| D2 | Unauthorized operation without following on-site instructions |
| D3 | Failure to implement safety protection measures |
| D4 | Violation of safety technical regulations |
| D5 | Insufficient support strength leading to roof collapse |
| D6 | Non-compliance with roof inspection and tapping protocol |
| D7 | Illegal and unlawful organization of production |
| D8 | Failure to implement enhanced support measures |
| D9 | Risk-taking in organizing operations |
| D10 | On-site violation of command procedures |
| D11 | Non-compliant and unauthorized operations by workers |
| D12 | Disruption of the ventilation system |
| D13 | Parallel operations in violation of safety rules |
| D14 | Arranging excavation without eliminating accident hazards |
Table 4. Comprehensive risk factors.
| Risk Factor ID | Comprehensive Risk Factors |
| --- | --- |
| C1 | On-site violations and irregular operations |
| C2 | Inadequate implementation of on-site safety measures |
| C3 | Failure to enforce mutual safety measures |
| C4 | Deficient safety supervision and inspection |
| C5 | Insufficient safety education and training |
| C6 | Prioritization of production over safety in coal mines |
| C7 | Inadequate technical management |
| C8 | Weak implementation of safety policies by regulatory authorities |
| C9 | Inadequate identification and control of safety risks |
| C10 | Illegal contracting |
| C11 | Non-compliance with safety technical measures |
| C12 | Poor on-site safety management |
| C13 | Incomplete and non-targeted operating procedures |
| C14 | Weak safety production concepts |
| C15 | Violation of labor discipline by on-site personnel |
| C16 | Unclear focus on key tasks |
| C17 | Deficient safety management |
| C18 | Lack of coordination in safety production management |
| C19 | Failure to timely implement regional rock burst prevention measures |
| C20 | Unclear division of responsibilities in safety risk control management |
| C21 | Insufficient safety confirmation |
| C22 | Inadequate implementation of safety production management systems |
| C23 | Low safety awareness among operators |
| C24 | Non-standardized management of outsourced projects and personnel |
| C25 | Lack of comprehensive reflection and lesson learning from accidents |
| C26 | Inadequate hidden danger investigation and remediation |
| C27 | Failure to fulfill corporate responsibility for safety |
| C28 | Weak safety supervision and regulation |
| C29 | Non-compliance with job responsibilities |
| C30 | Gross negligence in safety management |
| C31 | Inadequate safety management and supervision by the parent company |
| C32 | Poor equipment safety management |
| C33 | Insufficient investment in safety production |
| C34 | Lax standards and irregular procedures for resumption of work and production |
| C35 | Inadequate safety risk assessment |
| C36 | Incomplete safety responsibility system |
| C37 | Disorderly blasting management |
| C38 | Deficient safety management structure |
Table 5. Specific risk factors.
| Risk Factor ID | Specific Risk Factors |
| --- | --- |
| S1 | Failure to detect and prevent employees’ violations of work regulations |
| S2 | No mutual support arrangement was made during the accident shift |
| S3 | Inadequate fulfillment of safety inspection responsibilities |
| S4 | Safety education and training were conducted superficially |
| S5 | Failure to promptly identify and eliminate potential safety hazards |
| S6 | Employees have a low ability to identify job-related risks |
| S7 | Workers exhibit a lack of safety awareness |
| S8 | No extension or reinforcement measures were taken for advanced support |
| S9 | Parallel and cross operations at the work site |
| S10 | Safety responsibilities are unclear |
| S11 | Safety management systems are not implemented |
| S12 | On-site operations were carried out without following approval procedures |
| S13 | The safety responsibility system for all personnel is not fully established |
| S14 | Failure to detect signs of roof collapse on-site |
| S15 | Insufficient awareness of safety risks |
| S16 | Poor reliability of emergency rescue equipment |
| S17 | No warning line set at the construction site |
| S18 | Workers entered hazardous areas in violation of regulations |
| S19 | Safety risk identification was merely formalized |
| S20 | Safety production work arrangements are not specific |
| S21 | Production organization has vulnerabilities |
| S22 | Failure to provide a safe working environment for employees |
| S23 | On-site workers violated work regulations |
| S24 | Failure to follow the instructions of the on-site primary responsible person |
| S25 | No safety management organization has been established |
| S26 | No safety protection measures were taken during operations |
| S27 | On-site supervisors failed to effectively control on-site work |
| S28 | Inadequate implementation of inspection procedures for critical locations |
| S29 | Operating procedures and safety technical measures lack practicality and specificity |
| S30 | Insufficient scrutiny in the preparation and approval of procedures and measures |
| S31 | On-site personnel lack self-protection and mutual protection awareness |
| S32 | Some personnel’s professional skills do not meet business requirements |
| S33 | Violation of safety technical measures |
| S34 | Weak compliance awareness among management personnel |
| S35 | Inadequate enforcement of the pre-shift meeting system |
| S36 | Insufficient emphasis on critical work arrangements |
| S37 | Inadequate on-site safety confirmation |
| S38 | Inadequate safety management of the construction team |
| S39 | Inadequate enforcement of on-site safety management systems |
| S40 | Poor coordination of safety production management work |
| S41 | Inadequate fulfillment of the position responsibility system |
| S42 | No temporary support was provided when replacing supports |
| S43 | Disorderly on-site safety management |
| S44 | Failure to implement the safety hazard inspection system |
| S45 | Safety inspections are merely formalized |
| S46 | Insufficient allocation of safety management personnel |
| S47 | Irregular management of labor employment |
| S48 | The safety management system is not fully established |
| S49 | Other personnel on-site failed to stop violations of regulations in time |
| S50 | Weak legal awareness among management personnel |
| S51 | Failure to take effective anti-impact measures |
| S52 | Unauthorized alternating excavation operations |
| S53 | Failure to implement personnel limitation measures as required |
| S54 | Workers engaging in risky operations |
| S55 | Failure to strictly implement pre-shift meeting arrangements |
| S56 | Operations conducted in unsafe locations |
| S57 | Failure to properly supervise, inspect, and approve work during shifts |
| S58 | Failure to monitor safety confirmations and hazard handling of shift operations promptly |
| S59 | Illegal subcontracting to unqualified companies |
| S60 | Neglect in managing subordinate coal mines |
| S61 | Insufficient supervision of the implementation of safety technical measures |
| S62 | The superior company failed to fulfill its safety management responsibilities |
| S63 | Accompanying mine leaders did not strictly follow the regulations for underground visits |
| S64 | Ineffective implementation of safety technical measures |
| S65 | Insufficient attention to on-site safety production by coal mine management |
| S66 | Unauthorized use of civilian explosives |
| S67 | Illegal production activities in unapproved integration and renovation areas |
| S68 | Failure to meet safety production conditions |
| S69 | Non-compliance with regulatory instructions from government authorities |
| S70 | Failure to effectively fulfill job duties and responsibilities |
| S71 | Inadequate performance of local regulatory authorities |
| S72 | Insufficient fulfillment of integration responsibilities by the superior company |
| S73 | Inadequate safety supervision |
| S74 | Weak equipment management capabilities |
| S75 | Insufficient efforts in law enforcement, inspection, and supervisory guidance |
Table 6. The extremely frequent itemset of risk factors.
Factors included in frequent itemset:
D1, D11, C2, C5, C7, C9, C12, C18, S1, S3, S4, S5,
S7, S8, S9, S10, S11, S12, S13, S14, S15, S16, S17, S18,
S19, S20, S23, S24, S25, S26, S27, S28, S29, S30, S32, S33,
S34, S35, S36, S37, S38, S39, S40, S41, S43, S44, S45, S46,
S47, S49, S50, S57, S61, S62, S64, S65
Table 7. Strong association rules (partial).
| Rules | Support | Confidence | Lift |
| --- | --- | --- | --- |
| {S17, S5, S29, S14} => {C2} | 0.504 | 0.94 | 1.205 |
| {S4, S7} => {C5} | 0.523 | 0.809 | 1.109 |
| {S62, S29, S7} => {C7} | 0.501 | 0.882 | 1.32 |
| {S17} => {S1} | 0.561 | 1.0 | 1.058 |
| {S62, S29, S11, S30} => {S10} | 0.5 | 0.948 | 1.811 |
| {S10, S30} => {S11} | 0.503 | 1.0 | 1.678 |
| {S16, S30} => {S12} | 0.504 | 0.969 | 1.638 |
| {S5, S29, S30} => {S13} | 0.501 | 0.871 | 1.629 |
| {S16, S11} => {S14} | 0.507 | 1.0 | 1.31 |
| {S33, S17, S19} => {S16} | 0.501 | 0.997 | 1.879 |
| {S18} => {S17} | 0.501 | 1.0 | 1.782 |
| {S17, S5, S39, S14} => {S18} | 0.5 | 0.925 | 1.844 |
| {S16, S17, S29, S33} => {S19} | 0.501 | 0.975 | 1.905 |
| {S33, S16} => {S20} | 0.503 | 0.972 | 1.788 |
| {S24, S5, S29} => {S23} | 0.539 | 1.0 | 1.315 |
| {S20, S23} => {S24} | 0.507 | 0.991 | 1.554 |
| {S62, S27, S23, S5} => {S25} | 0.5 | 0.871 | 1.624 |
| {S24, S43, S30} => {S27} | 0.501 | 0.994 | 1.613 |
| {S4, S39, S14, S32} => {S28} | 0.503 | 1.0 | 1.564 |
| {S19} => {S29} | 0.512 | 1.0 | 1.578 |
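The three columns of Table 7 follow the standard definitions of support, confidence, and lift for a rule A => B. The sketch below computes them on a small hypothetical set of reports; the toy data and function names are assumptions made for illustration and are not taken from the study's 700 reports or its Apriori implementation.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], reports: List[FrozenSet[str]]) -> float:
    """Fraction of reports that contain every factor in the itemset."""
    items = frozenset(itemset)
    return sum(items <= report for report in reports) / len(reports)


def confidence(antecedent, consequent, reports) -> float:
    """support(A and B together) divided by support(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, reports) / support(antecedent, reports)


def lift(antecedent, consequent, reports) -> float:
    """confidence(A => B) divided by support(B)."""
    return confidence(antecedent, consequent, reports) / support(consequent, reports)


# Hypothetical toy data: each set lists the factor IDs found in one report.
reports = [
    frozenset({"S17", "S18", "S1"}),
    frozenset({"S17", "S1"}),
    frozenset({"S1"}),
    frozenset({"S5"}),
]

print(support({"S18", "S17"}, reports))     # support of the rule {S18} => {S17}
print(confidence({"S18"}, {"S17"}, reports))
print(lift({"S18"}, {"S17"}, reports))

# Because lift = confidence / support(consequent), the Table 7 row
# {S18} => {S17} (confidence 1.0, lift 1.782) implies support(S17) of about:
implied_support_s17 = 1.0 / 1.782
print(round(implied_support_s17, 3))
```

The implied support of S17, about 0.561, agrees with the row {S17} => {S1}: with confidence 1.0, the rule's support of 0.561 equals the support of S17 itself, so the two rows are mutually consistent.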
Table 8. Bayesian network risk factor analysis results.
| Analysis Method | Risk Factor |
| --- | --- |
| Sensitivity Analysis | S68 Failure to meet safety production conditions |
| | S36 Insufficient emphasis on critical work arrangements |
| | C35 Inadequate safety risk assessment |
| | S60 Neglect in managing subordinate coal mines |
| | S23 On-site workers violated work regulations |
| | S30 Insufficient scrutiny in the preparation and approval of procedures and measures |
| | S43 Disorderly on-site safety management |
| | S48 The safety management system is not fully established |
| | S29 Operating procedures and safety technical measures lack practicality and specificity |
| | S38 Inadequate safety management of the construction team |
| | S44 Failure to implement the safety hazard inspection system |
| | S53 Failure to implement personnel limitation measures as required |
| | S55 Failure to strictly implement pre-shift meeting arrangements |
| Critical Path Analysis | C12 Poor on-site safety management |
| | C4 Deficient safety supervision and inspection |
| | C2 Inadequate implementation of on-site safety measures |
| | S10 Safety responsibilities are unclear |
| | S30 Insufficient scrutiny in the preparation and approval of procedures and measures |
| | C18 Lack of coordination in safety production management |
| | S15 Insufficient awareness of safety risks |
| | S21 Production organization has vulnerabilities |
| High-Frequency Risk Factors | S1 Failure to detect and prevent employees’ violations of work regulations |
| | S55 Failure to strictly implement pre-shift meeting arrangements |
| | S60 Neglect in managing subordinate coal mines |
| | S30 Insufficient scrutiny in the preparation and approval of procedures and measures |
| | S23 On-site workers violated work regulations |
| | C2 Inadequate implementation of on-site safety measures |
| | C4 Deficient safety supervision and inspection |
| | S14 Failure to detect signs of roof collapse on-site |
| | S35 Inadequate enforcement of the pre-shift meeting system |
| | C12 Poor on-site safety management |
| Main Risk Factors | S23 On-site workers violated work regulations |
| | S55 Failure to strictly implement pre-shift meeting arrangements |
| | S60 Neglect in managing subordinate coal mines |
| | S30 Insufficient scrutiny in the preparation and approval of procedures and measures |
| | C2 Inadequate implementation of on-site safety measures |
| | C4 Deficient safety supervision and inspection |
| | C12 Poor on-site safety management |
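One reading that is consistent with Table 8 is that the seven main risk factors are exactly those appearing in at least two of the three analyses (sensitivity, critical path, high frequency). This selection rule is inferred from the table's contents rather than stated here, so the sketch below should be taken as a cross-check under that assumption; the variable names are illustrative.

```python
from collections import Counter

# Factor IDs copied from the three analysis groups in Table 8.
sensitivity = {"S68", "S36", "C35", "S60", "S23", "S30", "S43",
               "S48", "S29", "S38", "S44", "S53", "S55"}
critical_path = {"C12", "C4", "C2", "S10", "S30", "C18", "S15", "S21"}
high_frequency = {"S1", "S55", "S60", "S30", "S23", "C2", "C4",
                  "S14", "S35", "C12"}

# Main risk factors as reported in the last group of Table 8.
main_reported = {"S23", "S55", "S60", "S30", "C2", "C4", "C12"}

# Count in how many of the three analyses each factor appears.
counts = Counter(
    factor
    for group in (sensitivity, critical_path, high_frequency)
    for factor in group
)

# Assumed rule: keep factors flagged by two or more analyses.
in_two_or_more = {factor for factor, n in counts.items() if n >= 2}

print(sorted(in_two_or_more))
print(in_two_or_more == main_reported)
```

Under this reading, S30 is the only factor flagged by all three analyses, while the other six main factors each appear in two.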
Table 9. Associated factors of the main risk factors.
| Main Risk Factors | Associated Factors |
| --- | --- |
| S23 On-site workers violated work regulations | S33 Violation of safety technical measures |
| | C7 Inadequate technical management |
| | S25 No safety management organization has been established |
| | S24 Failure to follow the instructions of the on-site primary responsible person |
| S55 Failure to strictly implement pre-shift meeting arrangements | S57 Failure to properly supervise, inspect, and approve work during shifts |
| | S47 Irregular management of labor employment |
| | S53 Failure to implement personnel limitation measures as required |
| | S14 Failure to detect signs of roof collapse on-site |
| | S59 Illegal subcontracting to unqualified companies |
| S60 Neglect in managing subordinate coal mines | S5 Failure to promptly identify and eliminate potential safety hazards |
| | S65 Insufficient attention to on-site safety production by coal mine management |
| | S37 Inadequate on-site safety confirmation |
| S30 Insufficient scrutiny in the preparation and approval of procedures and measures | S16 Poor reliability of emergency rescue equipment |
| | S39 Inadequate enforcement of on-site safety management systems |
| | S10 Safety responsibilities are unclear |
| C2 Inadequate implementation of on-site safety measures | S1 Failure to detect and prevent employees’ violations of work regulations |
| C4 Deficient safety supervision and inspection | C8 Weak implementation of safety policies by regulatory authorities |
| | S72 Insufficient fulfillment of integration responsibilities by the superior company |
| C12 Poor on-site safety management | S62 The superior company failed to fulfill its safety management responsibilities |

Share and Cite

MDPI and ACS Style

Du, G.; Chen, A. Coal Mine Accident Risk Analysis with Large Language Models and Bayesian Networks. Sustainability 2025, 17, 1896. https://doi.org/10.3390/su17051896
