1. Introduction
Coal mine accidents not only cause casualties and economic losses but also trigger a cascade of profound and long-lasting impacts, such as production suspensions, administrative penalties imposed on regulatory bodies, erosion of public trust in coal mining enterprises, and disruptions to coal supply chains. Given these severe and far-reaching consequences, research on the prevention and mitigation of coal mine accidents holds substantial practical and societal value. Over the past several decades, the safety standards of coal mine production in China have improved significantly. As illustrated in Figure 1, between 2013 and 2023, three key indicators of safety performance (the frequency of coal mine accidents, the number of fatalities, and the death rate per million tons of coal) all followed a consistent downward trajectory. Although the safety situation in Chinese coal mines has continued to improve, further progress faces persistent challenges, including increasing uncertainty in the external environment, inherent weaknesses in the safety infrastructure of mines, insufficient enforcement of corporate responsibility, and a shortage of skilled technical personnel. These issues constitute latent risks that may compromise the safety of coal mining operations. According to well-established accident causation theories [1,2,3,4], coal mine accidents result from the complex interaction of multiple risk factors. Effectively identifying and managing these latent risks in the coal mining process could significantly reduce the incidence of accidents.
Early research on coal mine accidents predominantly focused on the harm inflicted on miners, emphasizing the factors that influence both the likelihood and the severity of injury. Key determinants identified in these studies include miners’ experience, the size of the coal mine, and the mining methods employed. These insights led to the development of protective measures aimed at enhancing miners’ safety in the workplace [5,6,7]. In addition, early research relied largely on accident statistics and case studies, using quantitative methods and empirical observations to investigate accident causation [8]. While these approaches were innovative at the time, their effectiveness was limited by restricted data availability and a relatively narrow methodological scope, which hindered a comprehensive understanding of the complex causes of accidents. This limitation led subsequent scholars to recognize that a single-factor approach cannot fully account for the complexity of accidents, underscoring the need for a multidimensional analysis of the interactions among contributing factors. Nevertheless, early studies contributed significantly to the development of safety measures and the formulation of safety theories, providing theoretical foundations and practical insights that shaped more refined risk assessment models and safety management systems in later research.
With the progressive improvement of coal mining environments, scholarly attention shifted toward the potential occupational health hazards associated with mining operations. Contemporary studies highlight that a wide range of factors, including occupational safety measures, the enforcement of safety regulations, risk control technologies, working conditions, and individual characteristics, have a significant impact on miners’ occupational health [9,10,11,12]. Currently, research on coal mine accidents centers primarily on accident prevention and risk control, with the overarching goal of minimizing their adverse impacts. A critical and effective research avenue is the in-depth analysis of the risk factors that contribute to coal mine accidents, which provides a scientific foundation for formulating preventive measures. Studies addressing coal mine accident risk factors can be broadly classified into three approaches: knowledge-driven, data-driven, and knowledge and data fusion-driven. In addition, risk factors are dynamic in nature, evolving continuously as mining environments, technologies, and safety management measures change [13]. By integrating historical accident data, researchers can trace the temporal evolution of risk factors, demonstrating that changes in production conditions, management levels, and external environments exert profound impacts on accident risk characteristics [14,15]. This dynamic evolution highlights the necessity of adaptive risk management strategies that promptly identify and address emerging safety hazards [16]. Understanding the mechanisms behind these evolving risk factors not only aids in developing more proactive and targeted preventive measures but also lays a theoretical foundation and provides methodological support for future research employing knowledge-driven, data-driven, and hybrid approaches.
Knowledge-driven research builds primarily on the expertise of scholars and employs established theoretical frameworks to analyze critical risk factors associated with coal mine accidents. Among these frameworks, process safety management (PSM) is one of the most frequently applied. By integrating systematic risk assessment methods such as hazard and operability study (HAZOP) [17,18] and fault tree analysis (FTA) [19], this approach identifies and evaluates risk factors, thereby offering essential support for enhancing coal mine safety management systems [20,21,22,23]. For example, researchers have developed various risk factor analysis models [24,25,26,27,28,29,30] and have investigated the interaction mechanisms of coal mine accident risk factors within relevant theoretical frameworks [31,32,33,34,35]. These studies further assess the impact of these risk factors on accident outcomes, thereby providing a theoretical foundation for developing effective risk control strategies. Despite its contributions, the knowledge-driven approach has limitations. It often focuses on specific types of coal mine accidents or examines risk factors in isolation, with minimal emphasis on the interrelationships and interactions among diverse factors. Such a narrow scope limits a comprehensive understanding of coal mine accident risks and undermines holistic risk management efforts [36,37,38].
Data-driven research on coal mine accident risk factors emphasizes the analysis of extensive coal mine accident datasets to identify and thoroughly investigate the key factors influencing such incidents. For instance, text mining techniques have been employed to extract risk factors from a vast number of coal mine accident investigation reports [39,40,41,42,43,44,45,46,47], while machine learning and deep learning models have been utilized to conduct advanced analyses of these identified factors [48,49,50,51,52,53]. This approach relies extensively on coal mine accident investigation reports, which systematically document the causes of accidents, providing a reliable data foundation for research in coal mine process safety management [54,55,56]. The adoption of advanced digital technologies in the coal mining sector has enabled real-time monitoring of production processes and the rapid detection of potential hazards, thereby offering critical support for preventing major accidents and safeguarding production facilities and personnel. The integration of digital technologies not only accelerates the response to safety hazards but also creates new opportunities for multi-source data integration and the optimization of risk management frameworks [57,58,59]. Historical data, including accident investigation reports, provide detailed insights into the causes and processes of accidents, forming a solid foundation for identifying the characteristics of risk factors and uncovering their evolutionary mechanisms. At the same time, real-time monitoring data capture dynamic changes in potential risks during production processes, enabling timely adjustments to safety management strategies.
By combining a deep analysis of the characteristics and evolutionary mechanisms of risk factors in historical data with the dynamic feedback provided by real-time monitoring, it becomes possible not only to validate analytical findings but also to optimize risk management strategies further. The integration of historical and real-time data allows for a more precise understanding of the evolutionary trajectories of risk factors. Guided by process safety principles, this integration not only enhances the accuracy of analytical results but also provides systematic, scientifically grounded support for coal mine accident prevention and risk control [60].
Knowledge and data fusion-driven research on coal mine accident risk factors begins by leveraging the expertise of scholars to formulate research questions, followed by an in-depth analysis of coal mine accident data to investigate the root causes of these risks. Based on the findings, targeted risk mitigation strategies are developed, providing a foundation for effective risk management and accident prevention in coal mining operations [61,62,63,64,65]. This research approach not only builds on existing theoretical knowledge to refine the analysis of risk factors, for example by categorizing risks by accident type [66,67] or location [68,69], but also harnesses coal mine accident data to uncover novel insights and advance the understanding of underlying risk dynamics. The integration of knowledge and data is fundamental to conducting comprehensive and in-depth studies of coal mine accidents while establishing a foundation for dynamic process safety management. This fusion-driven research framework, guided by process safety principles, emphasizes systematization, prevention, and adaptability. By deeply integrating theoretical expertise with empirical data, it enables more precise identification and control of risk factors. Such an approach not only uncovers the underlying patterns and interactions among risk factors but also provides scientifically grounded decision-making support for coal mine safety management. As a result, it facilitates significant improvements in production safety while advancing the full implementation of process safety principles across the industry [70,71].
In summary, early research relied primarily on accident statistics and case studies to investigate quantitatively the direct impacts of coal mine accidents on miner injuries and economic losses, identifying key risk factors such as miner experience, mine size, and mining methods. Although these methods were innovative at the time, their ability to reveal the complex causes and dynamic evolution of risk factors was limited by restricted data sources and a narrow methodological scope. With the continuous improvement of coal mining environments and technological advancements, the research focus has gradually shifted from isolated accident outcomes to the evolution of risk factors and their profound effects on miners’ occupational health. Accordingly, scholars have introduced knowledge-driven approaches based on theoretical frameworks, data-driven methods relying on big data analytics, and hybrid methodologies that integrate both strategies to explore the composition, interaction mechanisms, and evolution of accident risks from multiple dimensions. However, despite their distinct advantages, these approaches suffer from limitations, such as neglecting the interactions among multiple factors, incomplete information extraction, and an insufficient understanding of causal relationships, which prevent the establishment of a systematic, dynamic, and comprehensive risk assessment model. To address these gaps, the present study proposes the use of advanced deep learning techniques and large language models to efficiently extract and analyze risk information from textual data while preserving semantic integrity. This approach aims to uncover the causal relationships among risk factors and track their dynamic evolution over time.
Building on this foundation, this study seeks not only to construct a precise and efficient model for identifying and assessing coal mine accident risk factors but also to provide more scientific and systematic decision support for coal mine safety management and accident prevention.
From the perspective of risk factors, the key to effectively preventing and controlling coal mine accidents lies in the precise identification of the critical factors that contribute to their occurrence. Recent studies have widely employed text mining techniques to identify coal mine accident risk factors; their primary advantage is the ability to process large volumes of textual data efficiently, which significantly improves analysis efficiency [72,73]. However, one major drawback of text mining methods is that the process may disrupt the intrinsic semantic structure of the data, resulting in potential information loss [74,75]. Additionally, because these methods lack an in-depth understanding of the specific context of the coal mining industry, the risk factors identified solely through text mining often require further verification and refinement, and the outcomes may not fully or accurately capture the real-world challenges encountered in coal mine production. Table 1 outlines the main limitations and challenges of existing methods in handling risk factors. To overcome these limitations and to facilitate more precise analyses, this study proposes leveraging deep learning techniques based on large language models [76,77,78]. Large language models are equipped with advanced natural language processing and understanding capabilities, enabling them not only to identify potential risk factors efficiently but also to uncover the causal relationships between these factors while preserving the semantic integrity of the data [79,80,81]. In the context of complex textual data, the efficiency of large language models further enhances the reliability and validity of the analysis results. By incorporating large language models, this study provides technical support for the development of a more accurate and practical coal mine accident risk factor analysis model. Furthermore, it offers a novel theoretical foundation and decision-making framework for coal mine safety management and accident prevention, thereby enhancing the capacity to address the diverse safety challenges faced in coal mine production.
4. Conclusions
This study employs large language models (LLMs) for an in-depth analysis of coal mine accident investigation reports, capitalizing on their natural language processing (NLP) capabilities to extract key risk factors affecting coal mine safety. Because these reports lack a standardized format and vary considerably in wording, traditional text mining methods struggle to process them, which limits their applicability. To overcome these challenges, this study developed a specialized set of prompts for risk factor extraction that guides the LLM to accurately identify risk factors within the reports. Ultimately, 14 direct risk factors, 38 integrated risk factors, and 75 specific risk factors were identified. Compared with traditional text mining techniques, such as word segmentation, keyword extraction, and semantic analysis, LLMs demonstrate superior contextual understanding and semantic inference. They handle complex and diverse linguistic structures, enabling them to process varied, non-standardized text with high accuracy. Furthermore, whereas traditional text mining methods rely on predefined rules, dictionaries, and keyword extraction, and therefore may miss potential risk factors as well as the complex contexts and implicit causal relationships within texts, LLMs adapt flexibly to different expressions and identify semantic layers and subtle contextual nuances, making extraction more comprehensive and precise. The LLM-based NLP approach also preserves the integrity of the original information in the reports and minimizes the loss of crucial details. In this study, risk factors extracted by LLMs, such as “violation of work regulations by on-site workers” (S23), “failure to strictly implement pre-shift meeting arrangements” (S55), and “negligence in managing subordinate coal mines” (S60), show a level of detail and depth beyond the capabilities of traditional methods. These nuanced risk factors uncover issues that traditional methods typically overlook, rendering the findings more consistent with the actual challenges in coal mine safety management. Consequently, the recommendations proposed in this study are more actionable, providing a reliable foundation for future risk assessments and safety decision-making.
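The prompt-guided extraction step described above can be illustrated with a minimal sketch. The prompt wording, the JSON reply format, and the helper names `build_prompt` and `parse_reply` are assumptions made for illustration only; they are not the prompts used in this study, and no particular LLM service is assumed (the model call itself is left out).

```python
import json

# Hypothetical prompt template; the study's actual prompts are not reproduced here.
PROMPT_TEMPLATE = (
    "You are a coal mine safety analyst.\n"
    "Read the accident investigation report below and list every risk factor "
    "that contributed to the accident.\n"
    "Return only a JSON array of objects with keys 'factor' and 'evidence'.\n\n"
    "Report:\n{report}"
)


def build_prompt(report_text: str) -> str:
    """Fill the template with the text of one investigation report."""
    return PROMPT_TEMPLATE.format(report=report_text.strip())


def parse_reply(reply: str) -> list[str]:
    """Return the unique factor names found in a model reply.

    The reply is assumed to contain one JSON array; prose around it is ignored.
    """
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        items = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return []
    factors = []
    for item in items:
        if isinstance(item, dict):
            name = str(item.get("factor", "")).strip()
            if name and name not in factors:
                factors.append(name)
    return factors


# Illustrative reply text, written by hand rather than produced by a model.
example_reply = (
    'Identified factors: [{"factor": "On-site workers violated work regulations", '
    '"evidence": "..."}, {"factor": "Failure to strictly implement pre-shift '
    'meeting arrangements", "evidence": "..."}]'
)
extracted = parse_reply(example_reply)
```

Requesting a structured reply and parsing it defensively is one way to cope with the non-standardized wording of the reports; in practice the extracted names would still need to be mapped onto a common factor coding scheme before further analysis.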
After extracting risk factors from coal mine accident investigation reports, the next critical step is to assess their relationships with coal mine accidents. Given the large number of risk factors involved and their intricate interdependencies, traditional methods have limitations in effectively identifying causal relationships among them. Furthermore, overly complex network structures can impede the effective incorporation of expert knowledge, thereby diminishing both analytical accuracy and practical applicability. In contrast, Bayesian networks provide an efficient approach to handling multiple interrelated factors while simplifying complex analytical problems through graphical structures. They not only capture the intricate dependencies among risk factors but also integrate data-driven learning with expert knowledge, thereby improving the model’s robustness and predictive reliability. Moreover, Bayesian networks excel at modeling uncertainty, making them particularly suitable for analyzing unpredictable risks in coal mine accidents.
Therefore, this study employs Bayesian networks to analyze the risk factors associated with coal mine accidents, aiming to achieve a more comprehensive understanding of their underlying relationships and to generate valuable insights for coal mine safety management.
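How a Bayesian network combines interrelated factors under uncertainty can be shown with a deliberately small sketch. The three-node structure (two root risk factors and one accident node) and every probability below are hypothetical values chosen for illustration; they are not estimates from this study.

```python
from itertools import product

# Hypothetical priors for two root risk factors (illustrative numbers only).
P_VIOLATION = 0.30   # P(V = 1): on-site rule violation present
P_POOR_MGMT = 0.20   # P(M = 1): poor on-site safety management present

# Hypothetical conditional probability table: P(A = 1 | V, M).
P_ACCIDENT = {(0, 0): 0.01, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.50}


def joint(v: int, m: int, a: int) -> float:
    """Joint probability from the factorization P(V) * P(M) * P(A | V, M)."""
    pv = P_VIOLATION if v else 1 - P_VIOLATION
    pm = P_POOR_MGMT if m else 1 - P_POOR_MGMT
    pa = P_ACCIDENT[(v, m)] if a else 1 - P_ACCIDENT[(v, m)]
    return pv * pm * pa


def prob_accident() -> float:
    """Marginal P(A = 1), obtained by summing the joint over V and M."""
    return sum(joint(v, m, 1) for v, m in product((0, 1), repeat=2))


def posterior_violation_given_accident() -> float:
    """Diagnostic query P(V = 1 | A = 1) by enumeration (Bayes' rule)."""
    return sum(joint(1, m, 1) for m in (0, 1)) / prob_accident()
```

With these illustrative numbers the marginal accident probability is 0.0856, and observing an accident raises the probability of a rule violation from the prior of 0.30 to about 0.77. Diagnostic queries of this kind are what make the graphical structure useful for ranking risk factors.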
In a novel approach, this study integrates Bayesian networks with association rule mining to examine coal mine accident risk factors. The Apriori algorithm was used to mine strong association rules among the risk factors, yielding 362 rules, from which a Bayesian network with 127 risk factor nodes was constructed. Through a combination of sensitivity analysis, critical path analysis, and frequency statistics, seven major risk factors were identified: S23 (On-site workers violated work regulations), S55 (Failure to strictly implement pre-shift meeting arrangements), S60 (Neglect in managing subordinate coal mines), S30 (Insufficient scrutiny in the preparation and approval of procedures and measures), C2 (Inadequate implementation of on-site safety measures), C4 (Deficient safety supervision and inspection), and C12 (Poor on-site safety management). The results indicate that the most significant risk factors contributing to coal mine accidents are concentrated in on-site safety management, the execution of operational procedures, and safety oversight. These issues are more critical than other risks and should be prioritized in accident control efforts.
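The association rule mining step can be sketched as follows. This is a compact, generic implementation of the Apriori level-wise search with support, confidence, and lift; the toy reports, the factor codes assigned to them, and the thresholds (`min_support=0.4`, `min_confidence=0.8`) are assumptions for illustration and do not reproduce the study's data or settings.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    # Level 1: frequent single items.
    current = {}
    for item in sorted({i for t in sets for i in t}):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                cons = itemset - ante
                conf = supp / frequent[ante]
                if conf >= min_confidence:
                    out.append((ante, cons, supp, conf, conf / frequent[cons]))
    return out


# Toy data: each set lists the factor codes found in one hypothetical report.
reports = [
    {"S23", "C12", "C2"},
    {"S23", "C12"},
    {"S55", "C4"},
    {"S23", "C12", "C4"},
    {"S55", "C4", "C2"},
]
frequent = apriori(reports, min_support=0.4)
strong = strong_rules(frequent, min_confidence=0.8)
```

On this toy data the frequent pairs are {S23, C12} and {S55, C4}, and three rules pass the confidence threshold (S23 → C12, C12 → S23, and S55 → C4). The prune step relies on the downward-closure property: an itemset can be frequent only if all of its subsets are frequent.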
Further analysis reveals that these primary risk factors expose several key shortcomings in coal mine safety management, particularly in the areas of insufficient enforcement of operational procedures, the failure to implement pre-shift meeting protocols, lack of oversight of subordinate coal mines, insufficient scrutiny in procedural approvals, and deficiencies in on-site safety measures and management. To effectively prevent accidents, efforts should focus on addressing these risk factors by strengthening the enforcement of operational procedures, ensuring the rigorous implementation of pre-shift meeting arrangements, and improving the supervision of subordinate coal mines. Additionally, refining the procedural and approval processes, enhancing on-site safety measures, and bolstering the supervisory and inspection capabilities of safety management personnel are vital strategies for reducing the occurrence of coal mine accidents. By establishing and enhancing these management and oversight mechanisms, it becomes possible to systematically identify potential risks and implement effective countermeasures, ultimately improving the effectiveness of accident prevention strategies.
With the rapid advancement of large language models, their application in analyzing coal mine accident risk factors is poised to reach unprecedented depths. Harnessing their sophisticated natural language processing capabilities, large language models excel in efficiently processing complex and heterogeneous accident reports, uncovering latent risk factors, and identifying nuanced details that often elude conventional approaches. Notably, the integration of large language models with advanced techniques such as Bayesian networks and deep learning significantly enhances the precision and comprehensiveness of risk prediction and analysis, offering robust scientific support for coal mine safety management. In this study, large language models are employed to systematically extract key risk factors from accident reports, which are subsequently analyzed using the Apriori algorithm to identify association rules among risk factors. These association rules are further integrated into a Bayesian network to construct a causal analysis framework. This comprehensive approach provides a rigorous theoretical foundation for dynamic risk assessment and the optimization of decision-making processes. The proposed research framework, which synergizes cutting-edge data analysis methodologies with intelligent tools, delineates a forward-looking pathway for enhancing the intelligence of coal mine safety management.
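One possible way to carry association rules into a Bayesian network is sketched below. The mapping used here (one directed edge per single-antecedent rule, added in descending order of confidence and skipped if it would create a cycle, followed by smoothed frequency estimates for each conditional probability table) is an assumption for illustration; the text above does not specify the exact construction procedure, and the reports and rules are toy values.

```python
from itertools import product


def creates_cycle(edges, src, dst):
    """True if adding src -> dst would close a directed cycle (dst already reaches src)."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(child for parent, child in edges if parent == node)
    return False


def build_dag(pair_rules):
    """Add edges in descending confidence order, skipping edges that would create a cycle."""
    edges = []
    for ante, cons, conf in sorted(pair_rules, key=lambda r: -r[2]):
        if (ante, cons) not in edges and not creates_cycle(edges, ante, cons):
            edges.append((ante, cons))
    return edges


def estimate_cpt(reports, child, parents, alpha=1.0):
    """Estimate P(child = 1 | parent states) from report counts with Laplace smoothing."""
    cpt = {}
    for states in product((0, 1), repeat=len(parents)):
        rows = [r for r in reports
                if all((p in r) == bool(s) for p, s in zip(parents, states))]
        hits = sum(child in r for r in rows)
        cpt[states] = (hits + alpha) / (len(rows) + 2 * alpha)
    return cpt


# Toy inputs: hypothetical reports and (antecedent, consequent, confidence) rules.
reports = [
    {"S23", "C12", "C2"},
    {"S23", "C12"},
    {"S55", "C4"},
    {"S23", "C12", "C4"},
    {"S55", "C4", "C2"},
]
pair_rules = [("S23", "C12", 1.0), ("C12", "S23", 1.0), ("S55", "C4", 1.0)]

edges = build_dag(pair_rules)
cpt = estimate_cpt(reports, child="C12", parents=["S23"])
```

The cycle check is needed because association rules are symmetric in support and often appear in both directions, whereas a Bayesian network must be a directed acyclic graph. Expert knowledge would normally decide the direction of such edges; the confidence-based ordering here is only a placeholder for that judgment.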
Specifically, large language models demonstrate substantial potential for advancing coal mine safety management across various critical domains. In the area of safety education and training, these models integrate extensive accident case studies and historical data to develop personalized and scenario-based training materials, thereby enhancing the safety awareness and operational skills of both managerial and frontline personnel. In on-site safety management, large language models, when combined with real-time monitoring systems, enable the consolidation and analysis of fragmented data from multiple sources. This facilitates the accurate identification of potential risks and provides management teams with data-driven risk alerts and optimization strategies, resulting in improved management efficiency and more timely hazard responses. Furthermore, in process optimization, large language models extract association rules from accident data to identify common failure patterns, offering robust scientific support for refining process design and strengthening control measures at critical operational stages. In the context of safety supervision and inspection, these models dynamically analyze risk evolution trends, delivering intuitive risk assessment outcomes and actionable strategy recommendations that enhance the efficiency of supervisory activities and bolster the scientific rigor of decision-making processes. The research framework developed in this study offers significant potential for real-world applications. It supports the creation of intelligent coal mine risk assessment and early warning systems capable of real-time monitoring and dynamic analysis of on-site conditions. 
Additionally, it enables the optimization of operational processes by identifying and addressing high-risk areas, facilitates quantitative risk assessment and adaptive strategy refinement within safety supervision and inspection, and enhances the precision and applicability of safety education and training by generating data-driven, scenario-specific learning content tailored for management and operational personnel. By implementing this framework, coal mine safety management can advance toward higher levels of intelligence, systemization, and precision, providing a robust scientific foundation for improving coal mine production safety and strengthening process safety management practices.