Deep Learning-Based Anomaly Detection in Occupational Accident Data Using Fractional Dimensions

Akgüller, Ömer; Batrancea, Larissa M.; Balcı, Mehmet Ali; Tuna, Gökhan; Nichita, Anca

doi:10.3390/fractalfract8100604

Open AccessArticle

Deep Learning-Based Anomaly Detection in Occupational Accident Data Using Fractional Dimensions

by

Ömer Akgüller

^1,2

,

Larissa M. Batrancea

^3,*

,

Mehmet Ali Balcı

^1,*

,

Gökhan Tuna

⁴

and

Anca Nichita

⁵

¹

Department of Mathematics, Faculty of Science, Mugla Sitki Kocman University, Mugla 48000, Turkey

²

Engineering Sciences Department, Engineering and Architecture Faculty, İzmir Katip Çelebi University, İzmir 35620, Turkey

³

Department of Business, Babeş-Bolyai University, 400174 Cluj-Napoca, Romania

⁴

Department of Resource Management in Health, Social Security Institution of the Republic of Turkey, Ankara 06520, Turkey

⁵

Faculty of Economics, “1 Decembrie 1918” University of Alba Iulia, 510009 Alba Iulia, Romania

^*

Authors to whom correspondence should be addressed.

Fractal Fract. 2024, 8(10), 604; https://doi.org/10.3390/fractalfract8100604

Submission received: 19 September 2024 / Revised: 14 October 2024 / Accepted: 15 October 2024 / Published: 17 October 2024

(This article belongs to the Special Issue Fractal Dynamics and Machine Learning in Financial Markets)

Download

Browse Figures

Versions Notes

Abstract

:

This study examines the effectiveness of Convolutional Autoencoder (CAE) and Variational Autoencoder (VAE) models in detecting anomalies within occupational accident data from the Mining of Coal and Lignite (NACE05), Manufacture of Other Transport Equipment (NACE30), and Manufacture of Basic Metals (NACE24) sectors. By applying fractional dimension methods—Box Counting, Hall–Wood, Genton, and Wavelet—we aim to uncover hidden risks and complex patterns that traditional time series analyses often overlook. The results demonstrate that the VAE model consistently detects a broader range of anomalies, particularly in sectors with complex operational processes like NACE05 and NACE30. In contrast, the CAE model tends to focus on more specific, moderate anomalies. Among the fractional dimension methods, Genton and Hall–Wood reveal the most significant differences in anomaly detection performance between the models, while Box Counting and Wavelet yield more consistent outcomes across sectors. These findings suggest that integrating VAE models with appropriate fractional dimension methods can significantly enhance proactive risk management in high-risk industries by identifying a wider spectrum of safety-related anomalies. This approach offers practical insights for improving safety monitoring systems and contributes to the advancement of data-driven occupational safety practices. By enabling earlier detection of potential hazards, the study supports the development of more effective safety policies, and could lead to substantial improvements in workplace safety outcomes.

Keywords:

anomaly detection; fractional dimension; deep learning; occupational safety

1. Introduction

Occupational safety is a critical concern across industries, particularly in high-risk sectors such as coal extraction [1,2], manufacturing [3,4,5], and metal production [6,7]. The analysis of occupational accident data plays a crucial role in identifying patterns and anomalies that may signify potential risks. Detecting such anomalies in accident data can enable preventive measures, reducing the likelihood of future incidents and improving overall workplace safety [8]. Beyond its immediate impact on the health and well-being of workers, maintaining a safe work environment also has significant financial implications for companies and industries. Accidents in high-risk sectors not only lead to direct costs such as medical expenses, compensation claims, and legal fees, but also cause indirect costs in the form of lost productivity, equipment damage, and reputational harm [9,10,11,12].

In financial markets, a company’s safety record can influence investor confidence and stock performance, especially in industries with a high risk of workplace incidents. Companies with frequent accidents or poor safety management may experience negative market reactions, with shareholders perceiving these issues as indicators of poor operational management or increased liability [13,14]. Conversely, organizations that demonstrate a commitment to safety and the ability to proactively manage risks are often viewed more favorably by investors, leading to increased market stability and investor trust [15]. Moreover, regulatory bodies and government agencies impose fines and penalties on companies that fail to maintain adequate safety standards, which can further impact financial performance [16]. As such, early detection of anomalies in accident data not only helps to safeguard workers, but also plays a crucial role in maintaining financial health by preventing costly accidents, legal repercussions, and losses in market value. In this context, effective anomaly detection systems are not only tools for operational safety, but also strategic assets for sustaining business continuity and protecting corporate reputation in the eyes of investors and financial markets.

The Coal and Lignite Extraction (NACE05), Manufacture of Other Transportation Vehicles (NACE30), and Basic Metal Industry (NACE24) sectors are known for their elevated occupational hazards [17,18]. These industries inherently involve complex and high-risk processes, ranging from underground mining operations to heavy machinery manufacturing and metalworking, where the potential for accidents is significantly higher than in many other sectors. Despite ongoing safety efforts, including strict regulations, regular inspections, and safety training programs, accident rates remain a pressing concern. Frequent exposure to hazardous materials, high-pressure environments, and dangerous machinery means that even minor lapses in safety protocols can lead to severe, sometimes fatal, incidents.

Traditional methods for analyzing accident data, such as simple statistical summaries or rule-based systems, are often limited in their ability to detect the subtle patterns that can precede major incidents. These methods generally rely on reactive approaches, identifying risks only after an accident has occurred, rather than proactively identifying warning signs that may signal the potential for a future event. Furthermore, they may overlook complex interactions between different variables in the data, such as correlations between equipment failure, worker behavior, and environmental conditions, which could contribute to an accident.

This motivates the use of advanced anomaly detection techniques, particularly those leveraging deep learning models, which have the capacity to process large volumes of data and uncover hidden risks that may not be apparent through traditional methods. Deep learning models, such as Convolutional Autoencoders (CAE) and Variational Autoencoders (VAE), offer the ability to detect anomalies by learning patterns from historical data and identifying deviations from these learned patterns in real-time [19,20,21,22]. To enhance this approach, this study integrates the use of Gramian Angular Fields (GAFs), which play a pivotal role in transforming time series data into images that can be more effectively processed by these models.

The GAF is a powerful technique that enables the transformation of one-dimensional time series data into two-dimensional images, capturing temporal correlations and angular relationships within the data [23,24,25]. This transformation is particularly valuable in anomaly detection, as it allows the CAE and VAE models to learn from image representations, making it easier to capture non-linear relationships and hidden patterns that would otherwise be difficult to identify in raw time series data. In this study, GAFs are applied to the fractional dimension series derived from occupational accident data, allowing for a visual interpretation of the complexity and irregularity inherent in the data. This approach provides a richer feature set for the models, improving their ability to detect anomalies in high-risk sectors.

The use of an eight-window-length sliding window with a step size of 1 allows the GAF to be applied dynamically across time intervals, ensuring that the temporal evolution of accident risks is captured. For each window, a GAF is generated, representing the fractional dimensions of the data as an image. This image is then input into the CAE and VAE models to detect potential anomalies. The fractional dimension at the midpoint of each window is selected as the anomaly point, providing a focused, data-driven means of identifying critical moments where safety risks may be emerging.

Fractional dimensions—Box Counting, Hall–Wood, Genton, and Wavelet methods—are central to this approach, as they capture the inherent complexity and irregularity of occupational accident data. Traditional time series methods often fail to detect subtle, non-linear, and chaotic behaviors underlying accident occurrences. These methods can overlook important structural information that could provide early warnings of potential hazards. Fractional dimension analysis, in conjunction with GAFs, overcomes these limitations by offering a way to visualize and analyze complex patterns that traditional methods might miss.

Box Counting, Hall–Wood, Genton, and Wavelet methods each bring unique advantages to the analysis of accident data. Box Counting measures how accidents scale across different time frames or environments, while Hall–Wood focuses on local fractal dimensions, identifying regions of varying complexity and risk. The Genton method is particularly adept at estimating irregularities in the data, which may arise from erratic behaviors or unpredictable conditions. Finally, the Wavelet method provides a multi-scale approach to fractal dimension estimation, allowing for the identification of anomalies across different frequency components of the data. When combined with GAFs, these methods provide a more comprehensive understanding of the hidden patterns within accident data.

The study’s primary objective is to assess the ability of CAE and VAE models, enhanced by the use of GAFs and fractional dimension methods, to detect anomalies in accident data and predict potential hazards. A secondary goal is to evaluate how well these models and methods can optimize safety measures in high-risk industries, reducing accident-related costs and legal liabilities. Additionally, this research seeks to determine how effectively these models can adapt to varying industrial contexts, specifically within the NACE05, NACE30, and NACE24 sectors, highlighting the strengths and limitations of each approach.

By adopting GAFs in combination with deep learning models and fractional dimension methods, this study aims to demonstrate how industries can transition from reactive to proactive accident prevention through advanced machine learning and data-driven approaches. The ability to visually represent and analyze complex patterns in accident data using GAFs has significant implications for improving workplace safety, reducing costs associated with accidents, and safeguarding corporate reputation, while also contributing to the academic discourse on machine learning in industrial safety management.

Occupational safety remains a paramount concern across various industries, especially in high-risk sectors such as NACE05, NACE24, and NACE30. Despite stringent safety regulations and advancements in protective technologies, these industries continue to face challenges in predicting and preventing accidents. A significant research gap exists in the effective detection of subtle anomalies within operational data that precede safety incidents. Traditional statistical methods often fail to capture complex patterns and temporal dependencies inherent in such data, leading to missed opportunities for proactive intervention. This study addresses this gap by employing advanced anomaly detection techniques using Gramian Angular Summation Fields (GASF) of fractional dimension series. By transforming time-series data into visual representations through GASF, we capture intricate patterns and nonlinear relationships that are not easily detectable in raw data. The fractional dimension series, derived using methods like Box Counting, Hall–Wood, Genton, and Wavelet, provide a nuanced characterization of the data’s fractal properties, which are essential for identifying irregularities associated with safety risks.

We applied CAE and VAE models to these GASF-transformed fractional dimension series across the three NACE sectors. This approach allows for deep learning models to learn complex features and detect anomalies that may correlate with unsafe operational behaviors, lapses in safety protocols, or emerging risks. By leveraging the strengths of both CAE and VAE models, we aim to enhance the sensitivity and specificity of anomaly detection in occupational safety contexts. Our findings indicate that the VAE model, in particular, is effective at detecting a broader and more varied range of anomalies in sectors like NACE05 and NACE30, which are characterized by complex operational processes and higher safety risks. The CAE model showed proficiency in identifying moderate anomalies, which may represent minor deviations from standard practices. The variations in model performance across different fractional dimension methods highlight the importance of selecting appropriate analytical techniques tailored to the specific characteristics of each sector. By integrating GASF of fractional dimension series with advanced autoencoder models, this study provides a novel methodological framework for anomaly detection in occupational safety. It demonstrates how capturing the fractal and temporal properties of operational data can lead to more effective identification of potential safety hazards. This approach not only enhances the detection capabilities, but also contributes to a deeper understanding of the underlying operational behaviors that may compromise safety.

In conclusion, this research fills a critical gap in the field of occupational safety by introducing an innovative anomaly detection methodology that combines GASF of fractional dimension series with deep learning models. The ability to detect and interpret anomalies related to safety incidents can inform the development of more targeted safety policies, improve operational practices, and ultimately reduce the occurrence of workplace accidents. This study lays the groundwork for future applications of advanced analytical techniques in enhancing occupational safety across various high-risk industries.

This study is structured as follows: In Section 2, we begin by providing a detailed description of the dataset, followed by the computation of fractional dimensions, the application of Gramian Angular Fields, and the process of anomaly detection using Convolutional Autoencoder and Variational Autoencoder models. Section 3 presents the emerging fractional dimensions for each sector, along with the detected anomalies, accompanied by brief interpretations of the results. In Section 4, we engage in a comprehensive discussion of the findings, focusing on the distributions of anomaly scores to further analyze the outcomes. Finally, Section 5 concludes the study with summarizing remarks, highlighting the key findings and their broader implications.

2. Materials and Methods

2.1. Data Set

The dataset utilized in this study was meticulously gathered from the Turkish Republic Social Security Institution, a trusted and authoritative source of occupational data within Turkey. Access to this dataset was granted under official permission, ensuring that the data are both legitimate and reliable. Spanning from the beginning of 2012 to the beginning of 2023, this dataset provides a comprehensive temporal window, encompassing over a decade of occupational accident records. This extended timeframe allows for a thorough examination of trends, seasonality, and the evolution of safety practices within key industrial sectors in Turkey. The duration and scope of this dataset are critical for understanding long-term patterns and the impact of various interventions on occupational safety over the years.

For this study, we strategically selected three industrial sectors that are of paramount importance due to their high incidence rates of occupational accidents. These sectors were identified based on the latest available data from the 2022 Statistical Yearbook of the Social Security Institution, which ranks industries according to their occupational accident incidence rates. By focusing on these sectors, our study addresses areas with the greatest need for safety improvements and the highest potential impact on reducing workplace injuries. The sectors chosen are:

•: NACE05—Coal and Lignite Extraction: The Coal and Lignite Extraction sector is a cornerstone of Turkey’s energy production, yet it is notoriously hazardous due to the inherent risks of mining. This sector has been a focal point for occupational safety due to the high frequency and severity of accidents associated with underground and surface mining operations. The dataset for this sector includes 2092 time ticks, corresponding to individual periods during which occupational accidents were recorded. Over the study period, a total of 67,547 occupational accidents were reported in this sector. This significant number of incidents reflects the perilous conditions faced by workers in coal and lignite extraction and underscores the critical need for ongoing safety enhancements and rigorous monitoring protocols in the mining industry.
•: NACE30—Manufacture of Other Transportation Vehicles: This sector includes the manufacturing of a wide range of transportation vehicles, such as ships, trains, and aircraft, excluding motor vehicles. The sector is characterized by complex production processes, strict regulatory standards, and the need for precision in manufacturing, which collectively influence the safety landscape. The dataset for this sector consists of 2007 time ticks, with a total of 31,095 occupational accidents recorded. Although the total number of accidents is lower compared to the coal and lignite extraction sector, the nature of the work involves high-stakes operations where safety failures can have severe consequences. The data highlights the critical need for targeted safety measures and continuous monitoring to prevent accidents in this highly specialized industry, where even minor incidents can lead to significant disruptions.
•: NACE24—Basic Metal Industry: The Basic Metal Industry is foundational to numerous other industrial activities, producing essential materials such as steel, aluminum, and other non-ferrous metals. This sector is particularly hazardous due to the intense heat, heavy machinery, and hazardous chemicals involved in metal production processes. The dataset for this sector includes 2098 time ticks, during which a staggering total of 175,881 occupational accidents were recorded. This makes the Basic Metal Industry the most accident-prone sector among those analyzed in this study. The sheer volume of incidents in this sector highlights the persistent dangers that workers face and the urgent need for comprehensive safety protocols and innovations to mitigate risks. This sector’s high accident rate also underscores its critical role in the broader industrial ecosystem, where safety improvements could have far-reaching effects.

The time series presented for the NACE05 sector in Figure 1, which pertains to Coal and Lignite Extraction, exhibits a highly variable pattern with multiple significant spikes in occupational accidents over the period from 2012 to early 2023. The data show an initial period of low activity, followed by a sharp and intense cluster of spikes around 2013 and 2014. This period is characterized by frequent and severe accident occurrences, with peaks reaching nearly 300 incidents. After a relatively quiet period from 2015 to 2018, the series resumes with increased volatility starting around 2019, culminating in another set of spikes, particularly around 2020 and again in 2022, with a notable increase in accident frequency and magnitude.

The time series depicted in Figure 2 represents the total number of occupational accidents in the NACE30 sector, which involves the manufacture of other transportation vehicles, from the beginning of 2012 to the middle of 2023. The early part of the series shows a relatively low and stable accident rate, with some sporadic spikes around 2014. After a period of reduced activity from 2015 to 2018, there is a noticeable increase in both the frequency and magnitude of accidents, starting around 2019. This trend continues into 2022, with the data showing a dense clustering of accident occurrences and multiple significant spikes, particularly evident as the series progresses. The increase in volatility and the concentration of accidents in the later years suggest that the sector has experienced periods of heightened risk or operational changes that impacted safety.

The time series displayed in Figure 3 represents the total number of occupational accidents in the NACE24 sector, which pertains to the Basic Metal Industry, from the beginning of 2012 to 2023. The data are characterized by multiple prominent spikes, particularly concentrated around 2014 and then again from 2020 onward. These spikes reach over 200 accidents at their peak, indicating periods of significantly increased risk or adverse events within the industry. The early part of the series (2012 to 2014) shows a rapid escalation in accident counts, followed by a relative lull from 2015 to 2018. However, starting in 2019, the data exhibits a marked increase in both the frequency and intensity of accidents, with numerous spikes and a generally higher baseline of incidents continuing into 2022 and beyond.

Overall, the dataset’s extensive coverage, both in terms of time and sectoral focus, provides a rich foundation for advanced analysis. By concentrating on these three high-risk sectors, our study aims to offer actionable insights into the patterns and causes of occupational accidents. The detailed time series data for each sector allows for the application of sophisticated analytical techniques, such as Gramian Angular Fields (GAFs) and deep learning models, to detect anomalies and predict future risks. This approach not only helps in understanding past incidents, but also in proactively identifying areas where safety measures can be improved, ultimately contributing to the prevention of occupational accidents and the promotion of safer work environments across Turkey’s most hazardous industries.

2.2. Fractal Dimension

A numerical metric known as fractal dimension characterizes the intricacy and self-resemblance of time series data, providing insight into the fundamental dynamics of the system that generates the series. The fractal dimension, unlike traditional geometric dimensions, is not limited to integer values. It can take non-integer values, which indicate the extent to which a time series fills the space it occupies. This idea is especially valuable for studying time series that display irregular, fragmented, or chaotic behavior when conventional statistical techniques may be inadequate. The fractal dimension is a useful tool for analyzing time series data, since it captures detailed patterns and scale invariance, allowing for the characterization of roughness, complexity, and underlying structure.

The Box Counting method is one of the most common techniques used to estimate the fractal dimension of a time series or a geometric object [26,27,28]. This method involves plotting the time series

X = {x_{1}, x_{2}, \dots, x_{n}}

in a two-dimensional space, where the x-axis represents time and the y-axis represents the value

x_{i}

at each time step. To estimate the fractal dimension, a grid of boxes with a uniform size

ϵ

is superimposed on the plot. The method counts the number of boxes

N (ϵ)

that contain at least one point from the time series. By varying the size of the boxes

ϵ

and repeating the counting process, a relationship between

N (ϵ)

and

ϵ

is established. The fractal dimension

D_{B}

is then estimated by analyzing the scaling behavior of

N (ϵ)

with respect to

ϵ

. Mathematically, the fractal dimension is defined as

D_{B} = lim_{ϵ \to 0} \frac{log N (ϵ)}{log (1 / ϵ)},

(1)

where

D_{B}

is obtained as the slope of the line in the plot of

log N (ϵ)

against

log (1 / ϵ)

.

The Hall–Wood method provides an estimation of the fractal dimension by utilizing the properties of fractional Brownian motion, a stochastic process that generalizes Brownian motion [29,30]. Given a time series

X = {x_{1}, x_{2}, \dots, x_{n}}

, the method begins by computing the empirical variogram

γ (h)

, which measures the variance of the differences between pairs of observations separated by a lag h. The variogram is calculated as

γ (h) = \frac{1}{2 N (h)} \sum_{i = 1}^{N (h)} {(x_{i + h} - x_{i})}^{2},

(2)

where

N (h)

represents the number of pairs with lag h. For time series exhibiting self-similar behavior, the variogram follows a power-law relation,

γ (h) \sim c h^{2 H},

(3)

where H is the Hurst exponent and c is a constant. The fractal dimension

D_{H}

is then derived from the Hurst exponent using the relation

D_{H} = 2 - H

. The Hurst exponent H is typically estimated from the slope of the log-log plot of

γ (h)

versus h, where the linearity of this plot indicates self-similarity in the time series.

The Genton method offers a robust approach to estimating the fractal dimension, particularly for time series that may be affected by noise or nonstationarity. This method also employs the variogram, but improves upon traditional methods by incorporating robust statistical measures [31,32]. The empirical variogram is calculated similarly to the Hall–Wood method, but the Genton method uses the median of squared differences to provide a more resilient estimate, defined as

γ_{r o b u s t} (h) = median {({|x_{i + h} - x_{i}|}^{p})}^{\frac{2}{p}},

(4)

where p is typically set to 1 for median absolute deviation or 2 for standard variance. Like the Hall–Wood method, the Genton method assumes a power-law relationship

γ_{r o b u s t} (h) \sim c h^{2 H}

, and the fractal dimension

D_{G}

is estimated as

D_{G} = 2 - H

.

The Wavelet method leverages the multi-resolution properties of wavelets to estimate the fractal dimension, particularly suitable for nonstationary time series [33,34]. The method begins by decomposing the time series

X = {x_{1}, x_{2}, \dots, x_{n}}

using a wavelet transform. A wavelet

ψ (t)

is applied at different scales a, producing wavelet coefficients

W (a, b)

, where b is the translation parameter, according to the formula

W (a, b) = \frac{1}{\sqrt{a}} \int_{- \infty}^{\infty} X (t) ψ (\frac{t - b}{a}) d t .

(5)

The energy or variance of the wavelet coefficients at each scale a is then computed as

E (a) = \sum_{b} {| W (a, b) |}^{2}

. The wavelet method relies on the scaling behavior of

E (a)

, which follows a power-law relationship

E (a) \sim a^{β}

, where

β

is an exponent related to the Hurst exponent H. The Hurst exponent is then derived from the relation

H = \frac{β + 1}{2}

, and the fractal dimension

D_{W}

is computed as

D_{W} = 2 - H = \frac{3 - β}{2} .

(6)

This method is particularly powerful for analyzing the fractal properties of time series across different frequency bands, capturing the complexity and scaling behavior of the data.

The Box Counting method is one of the most intuitive and widely used techniques for estimating fractal dimensions. It is straightforward to implement, and works well for a wide range of time series and geometric objects. The primary advantage of the Box Counting method is its simplicity and ability to provide a quick estimate of the fractal dimension by simply counting the number of grid boxes that contain points from the time series as the box size varies. However, its accuracy can be limited, especially when dealing with highly irregular or noisy data. The method can also be sensitive to the choice of grid size, and may struggle to accurately capture the fractal dimension of time series that exhibit complex scaling behavior.

In contrast, the Hall–Wood method offers a more sophisticated approach, particularly useful for time series that can be modeled as fractional Brownian motion. This method leverages the variogram, a statistical tool that captures the variability between time points at different lags, to estimate the fractal dimension. The Hall–Wood method is particularly effective for self-similar time series, where the variogram exhibits a power-law relationship with the lag. By focusing on the Hurst exponent, which characterizes the degree of self-similarity, the Hall–Wood method provides a robust estimate of the fractal dimension. However, its reliance on the assumption of fractional Brownian motion can be a limitation when dealing with time series that do not follow this model.

The Genton method enhances the Hall–Wood approach by introducing robust statistical techniques to mitigate the effects of noise and nonstationarity in the time series. While it also uses the variogram, the Genton method replaces the standard variance with a robust estimator, such as the median absolute deviation, to improve the reliability of the fractal dimension estimate. This makes the Genton method particularly suitable for time series that are contaminated by outliers or exhibit irregular behavior. The trade-off, however, is increased computational complexity and the potential need for careful parameter selection, such as the choice of the robust estimator, to ensure accurate results.

The Wavelet method stands out for its ability to handle nonstationary time series, making it particularly powerful in contexts where the data exhibit varying degrees of complexity across different scales. By decomposing the time series into wavelet coefficients at multiple scales, the Wavelet method captures the scaling behavior and energy distribution across different frequency bands. This multi-resolution analysis enables a nuanced estimation of the fractal dimension that can adapt to the inherent complexity of the time series. However, the method requires a good understanding of wavelet transforms and careful selection of the wavelet function, which can make it more challenging to apply compared to the Box Counting method. Additionally, the Wavelet method may involve higher computational costs, particularly for long or complex time series.

2.3. Gramian Angular Fields

This paper presents a complete framework for identifying anomalies in time series data of occupational accidents in various sectors. The methodology combines fractal dimensions and Gramian Angular Fields (GAF) with modern deep learning clustering methods. Our approach consists of using fractal dimensions to measure the intricacy of accident patterns, converting these dimensions into GAF heat maps, and utilizing deep learning algorithms to detect abnormal behavior that may indicate possible safety hazards.

Gramian Angular Fields (GAF) are a technique used to transform time series data into a 2D matrix representation, enabling the application of image-based techniques such as Convolutional Neural Networks (CNNs) for time series analysis [35,36,37]. The core idea behind GAF is to convert the time series data into a polar coordinate system, encoding the angular relationships between different time points, and then construct a Gram matrix from these angular values. This process preserves the temporal dynamics and allows the extraction of spatial features from the time series.

Given a univariate time series

X = {x_{1}, x_{2}, \dots, x_{n}}

, where

x_{i}

represents the value of the time series at time step i, the first step is to normalize the time series to the interval

[- 1, 1]

. This normalization can be performed using min-max scaling

{\tilde{x}}_{i} = \frac{x_{i} - min (X)}{max (X) - min (X)} \times 2 - 1,

(7)

where

{\tilde{x}}_{i}

is the normalized value of

x_{i}

.

Next, each

{\tilde{x}}_{i}

is encoded as an angular value

ϕ_{i}

in the polar coordinate system

ϕ_{i} = arccos ({\tilde{x}}_{i}),

(8)

where

ϕ_{i}

represents the angular component corresponding to

{\tilde{x}}_{i}

in polar coordinates. Note that

{\tilde{x}}_{i}

is in the range

[- 1, 1]

, so

ϕ_{i}

will be in the range

[0, π]

.

The Gramian Angular Summation Field (GASF) is defined as a matrix

G

where each element

G_{i j}

represents the cosine of the sum of the angular values

ϕ_{i}

and

ϕ_{j}

G_{i j}^{GASF} = cos (ϕ_{i} + ϕ_{j}) .

(9)

Expanding the cosine function using trigonometric identities, we can rewrite the above expression as

G_{i j}^{GASF} = cos (ϕ_{i}) cos (ϕ_{j}) - sin (ϕ_{i}) sin (ϕ_{j}) .

(10)

Given that

cos (ϕ_{i}) = {\tilde{x}}_{i}

, we have

G_{i j}^{GASF} = {\tilde{x}}_{i} {\tilde{x}}_{j} - \sqrt{1 - {\tilde{x}}_{i}^{2}} \cdot \sqrt{1 - {\tilde{x}}_{j}^{2}} .

(11)

The GASF matrix

G

encodes the temporal correlations between different time points through the summation of their angular representations.

The Gramian Angular Difference Field (GADF) is an alternative to GASF, focusing on the sine of the difference between the angular values

ϕ_{i}

and

ϕ_{j}

G_{i j}^{GADF} = sin (ϕ_{i} - ϕ_{j}) .

(12)

This can be expanded using the trigonometric identity

G_{i j}^{GADF} = sin (ϕ_{i}) cos (ϕ_{j}) - cos (ϕ_{i}) sin (ϕ_{j}) .

(13)

Substituting

cos (ϕ_{i}) = {\tilde{x}}_{i}

and

sin (ϕ_{i}) = \sqrt{1 - {\tilde{x}}_{i}^{2}}

, we obtain

G_{i j}^{GADF} = \sqrt{1 - {\tilde{x}}_{i}^{2}} \cdot {\tilde{x}}_{j} - {\tilde{x}}_{i} \cdot \sqrt{1 - {\tilde{x}}_{j}^{2}} .

(14)

The GADF matrix

G

captures the temporal differences between time points, providing an alternative perspective on the time series data.

The choice of Gramian Angular Fields (GAF) in this study is pivotal for transforming time-series data into a format suitable for advanced anomaly detection techniques. GAF was selected over other potential methods because of its unique ability to convert one-dimensional time-series data into two-dimensional images while preserving temporal dependencies and intrinsic patterns. This transformation enables the application of powerful image-based deep learning models, such as Convolutional Autoencoders (CAE) and Variational Autoencoders (VAE), which are adept at capturing complex spatial features and detecting anomalies that might be overlooked by traditional time-series analysis methods.

One of the primary advantages of using GAF is its preservation of temporal dynamics. GAF encodes the temporal correlations of time-series data into spatial structures within images, meaning that both the magnitude and the temporal relationships between data points are maintained. This allows models to detect anomalies based on both value deviations and temporal patterns. Additionally, by converting time-series data into images, GAF facilitates the use of autoencoders.

Furthermore, GAF can represent nonlinear dynamics inherent in fractional dimension series, which are crucial for understanding the fractal characteristics of the data. This is particularly important in sectors where operational processes exhibit complex, nonlinear behaviors that traditional linear methods might not capture. The visual representation of time-series data through GAF also provides an intuitive way to interpret anomalies, aiding in the qualitative analysis of the data and supporting the quantitative findings of the models.

However, there are limitations to using GAF. Transforming time-series data into GAF images and processing them through deep learning models can be computationally intensive, requiring significant processing power and memory resources, especially with large datasets. While GAF aims to preserve temporal dependencies, the transformation process may lead to some loss of fine-grained temporal information. If the anomaly is highly dependent on specific time intervals, this could reduce the sensitivity of detection. Moreover, the conversion to two-dimensional images increases the dimensionality of the data, which might introduce challenges related to the “curse of dimensionality”, potentially affecting the performance of the models if not managed properly. The use of complex models like CAE and VAE on GAF-transformed data also increases the risk of overfitting, especially if the dataset is not sufficiently large or diverse to generalize well.

2.4. Anomaly Detection

A Convolutional Autoencoder (CAE) is a deep learning architecture well-suited for unsupervised anomaly detection [38,39]. It comprises two primary components: an encoder and a decoder. Given an input image

X

, such as a heat map derived from Gramian Angular Fields (GAF) of fractal dimensional time series data, the encoder maps this image to a lower-dimensional latent space

Z

. Mathematically, the encoder can be represented as a function

f_{enc}

, parameterized by weights

θ_{enc}

, such that

Z = f_{enc} (X; θ_{enc}) .

(15)

Here,

Z

is the latent representation, a compressed version of the input image

X

, capturing the essential features of the image while reducing its dimensionality.

The decoder then reconstructs the image from this latent representation. The decoder is represented as a function

f_{dec}

, parameterized by weights

θ_{dec}

, such that the reconstructed image

\hat{X}

is given by

\hat{X} = f_{dec} (Z; θ_{dec}) .

(16)

The overall goal of the CAE is to learn the parameters

θ_{enc}

and

θ_{dec}

such that the reconstructed image

\hat{X}

is as close as possible to the original input image

X

. This is achieved by minimizing the reconstruction error, which is typically measured using a loss function

L (X, \hat{X})

. A common choice for the loss function is the Mean Squared Error (MSE), defined as

L (X, \hat{X}) = \frac{1}{n} \sum_{i = 1}^{n} {(X_{i} - {\hat{X}}_{i})}^{2},

(17)

where n is the number of pixels in the image,

X_{i}

is the pixel value at position i in the original image, and

{\hat{X}}_{i}

is the corresponding pixel value in the reconstructed image.

During training, the CAE minimizes this loss function over a set of normal images, leading to learned representations

θ_{enc}^{*}

and

θ_{dec}^{*}

that are optimized to reconstruct normal data with low error. The trained CAE is then used for anomaly detection. When presented with a new image

X

, the reconstruction error

L (X, \hat{X})

is computed. For normal images, this error is expected to be small, as the CAE has learned to reconstruct such images accurately. However, for anomalous images, the reconstruction error tends to be significantly higher because the CAE has not encountered such patterns during training and, therefore, fails to reconstruct them accurately.

To detect anomalies, a threshold

τ

is set on the reconstruction error. If the error for a new image exceeds this threshold, i.e., if

L (X, \hat{X}) > τ,

(18)

then the image is flagged as anomalous. The threshold

τ

can be determined based on the distribution of reconstruction errors for a validation set of normal images, often using a percentile or statistical measure.

CAEs are particularly effective in this context, because they utilize convolutional layers in both the encoder and decoder, which excel at capturing spatial hierarchies in image data. This ability to model both local and global features allows CAEs to effectively learn the intricate spatial and temporal patterns present in GAF heat maps, making them ideal for identifying subtle deviations and anomalies in the data.

The CAE is a reliable technique for detecting anomalies without supervision, especially useful in situations where the objective is to identify and recreate intricate spatial patterns in the data. The CAE functions by compressing the input image into a latent space and subsequently reconstructing it, with the main goal of reducing the reconstruction error. This process is highly efficient when the regular data displays consistent patterns, enabling the CAE to acquire a concise representation that accurately reproduces these patterns with few mistakes. When the reconstruction error surpasses a predetermined threshold, anomalies are identified. This occurs because the CAE, which is usually unable to effectively rebuild patterns it has not been exposed to during training, fails in these cases. Nevertheless, a possible drawback of CAEs is their deterministic characteristic, as they assign a specific latent vector to each input, which may not adequately represent the inherent diversity in the data, especially when there is noise or minor anomalies present.

A Variational Autoencoder (VAE) is a probabilistic graphical model that extends the standard autoencoder by introducing a probabilistic approach to the latent space [40,41,42]. Unlike traditional autoencoders, which directly map the input image

X

to a single point in the latent space, a VAE models the latent space as a distribution. Specifically, the encoder in a VAE does not output a fixed latent vector

z

, but rather the parameters of a probability distribution, typically a multivariate Gaussian distribution

q_{ϕ} (z | X)

, where

ϕ

represents the parameters of the encoder network.

The encoder maps the input image

X

to the mean

μ (X)

and the logarithm of the variance

log σ^{2} (X)

of the Gaussian distribution

μ (X), log σ^{2} (X) = f_{enc} (X; ϕ) .

(19)

The latent variable

z

is then sampled from this Gaussian distribution

z \sim N (μ (X), σ^{2} (X)) .

(20)

This probabilistic formulation allows the VAE to capture the uncertainty in the latent space, providing a richer and more flexible representation of the input data.

The decoder, parameterized by

θ

, takes the sampled latent variable

z

and reconstructs the image

\hat{X}

by mapping

z

back to the image space

\hat{X} = f_{dec} (z; θ) .

(21)

The VAE is trained to maximize a variational lower bound on the data likelihood, known as the Evidence Lower Bound (ELBO). The ELBO consists of two terms: the reconstruction loss, which ensures that the decoded images resemble the input images, and the Kullback–Leibler (KL) divergence, which regularizes the latent space to be close to a prior distribution (usually a standard normal distribution

N (0, I)

). Mathematically, the ELBO is given by

L (X; ϕ, θ) = E_{q_{ϕ} (z | X)} [log p_{θ} (X | z)] - D_{KL} (q_{ϕ} (z | X) ∥ p (z)) .

(22)

The first term,

E_{q_{ϕ} (z | X)} [log p_{θ} (X | z)]

, represents the reconstruction loss, typically implemented as the negative log-likelihood of the reconstructed image given the latent variable

z

. The second term,

D_{KL} (q_{ϕ} (z | X) ∥ p (z))

, is the KL divergence between the learned latent distribution

q_{ϕ} (z | X)

and the prior

p (z)

, enforcing that the learned latent distribution is close to the prior distribution.

For anomaly detection, VAEs are particularly useful because they not only learn to reconstruct the data, but also provide a probabilistic model of the latent space. Given a new input image

X_{new}

, the encoder maps it to a latent distribution

q_{ϕ} (z | X_{new})

, and the decoder reconstructs the image

{\hat{X}}_{new}

. The anomaly score can be computed by evaluating the log-likelihood of the reconstructed image

{\hat{X}}_{new}

under the learned distribution

Anomaly Score = - log p_{θ} (X_{new} | z) .

(23)

Images with low log-likelihood, indicating that they do not fit well with the learned distribution of normal data, are flagged as anomalies. This approach is particularly powerful when dealing with GAF heat maps of fractal time series, as it can effectively model the inherent variability in the data while identifying outliers that deviate from the normal patterns.

The VAE overcomes certain constraints of the CAE by incorporating a probabilistic methodology in the latent space. VAEs, instead of directly mapping the input to a single point, employ a mapping that results in a distribution, usually a Gaussian, from which latent variables are sampled. The utilization of probabilistic modeling enables the VAE to encompass a wider spectrum of variations in the data, hence enhancing its capacity to model intricate distributions and identify abnormalities that may not be readily reproducible by a deterministic model, such as the CAE. The VAE’s capacity to represent uncertainty and produce novel samples from the acquired latent distribution also offers added versatility in anomaly detection, as images with a low likelihood according to the acquired distribution can be identified as anomalies. Nevertheless, the VAE adds more intricacy to the training process, as it necessitates optimizing a variational lower limit. This involves striking a balance between the precision of reconstruction and the regularization of the distribution in the latent space.

In the realm of anomaly detection, the absence of ground truth labels presents a significant challenge for quantitatively evaluating the performance and reliability of different detection methods. Without definitive labels indicating which data points are truly anomalous, researchers and practitioners must rely on a variety of intrinsic and comparative metrics to assess the effectiveness of their models. These metrics provide mathematical frameworks to understand how well an anomaly detection method distinguishes between normal and anomalous instances based on the inherent structure and distribution of the data.

The Mean Anomaly Score serves as a fundamental metric by averaging the anomaly scores assigned to each data point across all detection methods employed. Mathematically, if each data point

x_{i}

is assigned an anomaly score

s_{i j}

by method j, the mean anomaly score for data point

x_{i}

is given by

{\bar{s}}_{i} = \frac{1}{m} \sum_{j = 1}^{m} s_{i j},

(24)

where m is the number of detection methods. The overall mean anomaly score is then the average of these individual mean scores across all data points, providing a singular value that encapsulates the general tendency of the dataset towards anomaly as detected by the ensemble of methods.

The Silhouette Score offers insight into the consistency within clusters of data points, effectively measuring how similar each data point is to its own cluster compared to other clusters. For a data point

x_{i}

, the Silhouette coefficient

σ_{i}

is calculated using the mean intra-cluster distance

a (i)

and the mean nearest-cluster distance

b (i)

, such that

σ_{i} = \frac{b (i) - a (i)}{max {a (i), b (i)}} .

(25)

The overall Silhouette Score is the average of

σ_{i}

across all data points, with values ranging from −1 to +1. A higher Silhouette Score indicates well-separated and distinct clusters, suggesting effective differentiation between normal and anomalous data points.

The Mean Local Outlier Factor (LOF) score quantifies the degree to which a data point is considered an outlier based on the local density deviation with respect to its neighbors. For each data point

x_{i}

, the LOF score

LOF (x_{i})

is computed by comparing the local density of

x_{i}

to the local densities of its k-nearest neighbors. The mean LOF Score is the average of these scores across all data points, with higher values indicating a greater likelihood of being an outlier. This metric captures the relative outlierness of data points without relying on predefined cluster structures.

The Davies–Bouldin Index (DBI) assesses the average similarity ratio of each cluster with its most similar cluster, where similarity is defined in terms of within-cluster scatter and between-cluster separation. Mathematically, for each cluster i, the DBI identifies the cluster j that minimizes the ratio

\frac{S_{i} + S_{j}}{d (i, j)},

(26)

where

S_{i}

and

S_{j}

represent the average intra-cluster distances for clusters i and j, respectively, and

d (i, j)

is the distance between the centroids of clusters i and j. The overall DBI is the mean of these ratios across all clusters. A lower Davies–Bouldin Index signifies better clustering performance, with compact and well-separated clusters indicating more effective anomaly detection.

The Calinski–Harabasz Index (CHI) evaluates the ratio of between-cluster variance to within-cluster variance, providing a measure of cluster separation and compactness. For a given clustering configuration, CHI is calculated as

CHI = \frac{Tr (B_{k})}{Tr (W_{k})} \times \frac{N - k}{k - 1},

(27)

where

Tr (B_{k})

is the trace of the between-group dispersion matrix,

Tr (W_{k})

is the trace of the within-cluster dispersion matrix, N is the total number of data points, and k is the number of clusters. A higher Calinski–Harabasz Index indicates better-defined clusters with greater separation and lower intra-cluster variance, thereby reflecting more precise anomaly detection.

Lastly, the Dunn Index measures the ratio between the smallest inter-cluster distance to the largest intra-cluster distance, serving as an indicator of cluster compactness and separation. For a set of clusters, the Dunn Index is defined as

Dunn = \frac{{min}_{1 \leq i < j \leq k} δ (C_{i}, C_{j})}{{max}_{1 \leq l \leq k} Δ (C_{l})},

(28)

where

δ (C_{i}, C_{j})

represents the minimum distance between any two points in clusters

C_{i}

and

C_{j}

, and

Δ (C_{l})

denotes the maximum intra-cluster distance within cluster

C_{l}

. A higher Dunn Index signifies better clustering quality, with well-separated and tightly knit clusters indicating superior anomaly detection performance.

3. Results

3.1. Fractional Dimensions

In this study, for each NACE sector (NACE05, NACE30, and NACE24), we employed a sliding window technique to systematically segment the occupational accident time series data into smaller, overlapping sub-time series. This method enables us to capture temporal patterns and trends in occupational accidents over time at a finer granularity. Specifically, sliding windows with a width of eight data points were utilized, corresponding to weekly or bi-weekly periods. This choice of window size allows us to examine shorter-term variations and anomalies in the accident data, reflecting potential shifts in safety conditions or operational risks over a relatively short period.

The windows were applied with a step length of 1, meaning that the windows shift by one time unit at each step, ensuring maximum overlap between consecutive segments. This overlapping structure provides a more continuous and smooth analysis of the data, enabling the detection of subtle changes in the underlying patterns. By applying this technique to each NACE sector, we aim to identify potential anomalies and emerging trends in occupational accidents, which may signal underlying structural shifts or changes in operational practices within these industries.

For each sub-time series generated by the sliding windows, we computed four separate fractional dimension series, corresponding to the four distinct methods employed in this study: Box Counting, Hall–Wood, Genton, and Wavelet. Each of these methods captures different aspects of the data’s fractal or geometric complexity, providing complementary insights into the structure of the time series. The resulting fractional dimension series offer a detailed quantitative characterization of the complexity and variability within the occupational accident data, effectively highlighting non-linear patterns and irregularities that may not be readily visible using traditional methods.

These fractional dimensions serve as a powerful tool for measuring the self-similarity and roughness of the time series, allowing us to explore how the complexity of accident occurrences evolves over time within each NACE sector. By examining how these dimensions fluctuate across different sub-time series, we gain a more granular understanding of the underlying dynamics, including potential shifts in accident trends, the emergence of anomalous periods, and the effect of external factors on occupational safety.

3.1.1. Fractional Dimensions for NACE05

Figure 4 presents the fractional dimension series for the NACE05 sector (Coal and Lignite Extraction) calculated using sliding windows of size 8. Each of the four subplots corresponds to one of the four different methods used: Box Counting, Hall–Wood, Genton, and Wavelet, offering unique perspectives on the complexity and structure of the sub-time series data.

In the Box Counting method plot, the data points generally cluster around specific values, with fractional dimensions hovering around 1.5 and higher. This indicates that, on average, the time series exhibits moderate complexity, with several consistent bands suggesting recurring patterns in the accident data. However, some points dip significantly below 1, reflecting intervals where the complexity sharply decreases, possibly due to a reduction in the variability of accident occurrences during certain periods.

The Hall–Wood method plot shows more scattered results compared to Box Counting, with a wide range of fractional dimension values extending up to 7. The scattering of points may indicate that the Hall–Wood method captures more granular fluctuations in the complexity of the time series, potentially reflecting changes in external factors, such as operational changes or sudden shifts in safety regulations, that impact accident rates. However, most of the data points remain between 1.5 and 3, suggesting that the underlying accident patterns exhibit moderate complexity across most time windows.

The Genton method plot presents another perspective, where the fractional dimensions appear more densely packed around values of 1 to 2. This indicates that the Genton method primarily captures a lower range of complexities, perhaps focusing on smaller-scale irregularities within the time series. The presence of a few extreme outliers above 5 and negative values might point to sporadic, anomalous events in the accident data, such as significant safety lapses or rare operational changes, which temporarily disrupt the otherwise stable patterns.

Finally, the Wavelet method plot shows a distribution of fractional dimensions that predominantly lie between 1.5 and 3. This distribution suggests that the Wavelet method detects consistent levels of moderate complexity within the time series, with fewer extreme outliers than the Genton method. The relative clustering of points between 2 and 2.5 indicates that this method may be more sensitive to capturing recurring fluctuations in accident data over time, perhaps highlighting the cyclic or periodic nature of certain operational hazards.

In summary, while all four methods provide insights into the complexity of the NACE05 accident data, they capture different aspects of the underlying dynamics. The Box Counting method highlights consistent patterns with a few deviations, the Hall–Wood method captures more scattered and granular variations, the Genton method points to sporadic anomalies, and the Wavelet method emphasizes recurring fluctuations. Each method contributes a unique view, allowing for a comprehensive understanding of the temporal evolution and complexity in occupational accidents within the Coal and Lignite Extraction sector.

3.1.2. Fractional Dimensions for NACE30

Figure 5 presents the fractional dimension series for the NACE30 sector, calculated using a sliding window size of 8, across four methods: Box Counting, Hall–Wood, Genton, and Wavelet.

In the Box Counting method plot, we observe a prominent clustering of fractional dimension values around 1.5, with some deviations reaching up to 2. The relatively consistent clustering indicates that the time series for this sector demonstrates a moderately complex structure, with frequent and repeated patterns over time. However, some points fall below 1, indicating periods where the accident data are less complex and possibly more regular. These consistent bands of complexity suggest a stable yet repetitive accident pattern in the NACE30 sector.

The Hall–Wood method plot displays a wider spread of fractional dimension values compared to the Box Counting method, with dimensions ranging between 1 and 6. This more scattered distribution may suggest that the Hall–Wood method is sensitive to detecting more intricate variations and localized irregularities in the accident data. The spread across a wide range implies that, at times, the accident patterns may exhibit high complexity, which could indicate periods of increased unpredictability or external influences impacting accident rates.

The Genton method plot also shows a wide range of fractional dimension values, with values fluctuating between 0 and 5, although a dense clustering occurs around the 1.5 to 2 range. This suggests that while the general complexity of the time series is moderate, the Genton method is capturing occasional outliers and anomalies. These outliers may point to rare but significant deviations in the underlying accident dynamics, potentially highlighting times of sudden changes in safety protocols, operational risks, or external events affecting the manufacturing processes.

Lastly, the Wavelet method plot shows a relatively tight clustering of fractional dimensions between 1.5 and 2.5. The wavelet-based approach seems to highlight recurring trends and periodic fluctuations in the accident data, possibly reflecting cycles in operational risks or safety management practices within the NACE30 sector. The lack of extreme outliers in this method indicates that, overall, the accident data in this sector exhibits more regular and less chaotic behavior when analyzed with wavelet transformations.

In conclusion, the four methods provide complementary insights into the complexity of the accident data in the NACE30 sector. The Box Counting and Wavelet methods emphasize more consistent patterns and moderate complexity, while the Hall–Wood and Genton methods capture a broader range of variations and occasional extreme complexity, possibly related to anomalous events. Each method, therefore, contributes to a more nuanced understanding of the temporal dynamics in occupational accidents within the sector of manufacturing other transportation vehicles.

3.1.3. Fractional Dimensions for NACE24

Figure 6 presents the fractional dimension series for the NACE24 sector calculated using a sliding window size of 8 across four methods.

In the Box Counting method plot, we observe a strong clustering of fractional dimensions around 1.5, with some data points reaching as high as 2. This suggests a moderate level of complexity in the accident patterns over time, with consistent repeating behaviors dominating the data. Occasional dips below 1 reflect periods where the time series exhibits reduced complexity, possibly indicating more regular patterns in accident occurrences during those intervals. The clustering around specific values, as in the previous NACE sectors, could be indicative of relatively stable yet slightly complex dynamics within the Basic Metal Industry.

In the Hall–Wood method plot, we see a much more scattered distribution of fractional dimensions, with values ranging from close to 0 up to 7. This wider spread indicates that the Hall–Wood method is capturing more intricate and localized variations in the complexity of the time series. The high range suggests moments of elevated complexity, which may correspond to periods of heightened unpredictability in accident rates, potentially due to external shocks or operational changes within the sector.

The Genton method plot also reveals a scattered distribution, with most fractional dimensions falling between 1.5 and 2, although some extreme outliers rise above 6, while a few even fall below zero. These extreme values might represent rare, but significant, deviations from the normal accident pattern, potentially linked to sudden changes in safety protocols or operational anomalies. The clustering of points around 1.5 and 2 aligns with the moderate complexity seen in other methods, but the presence of outliers suggests that the Genton method is particularly sensitive to capturing sporadic anomalies.

The Wavelet method plot exhibits a distribution that remains fairly tight between 1.5 and 2.5, with fewer extreme outliers compared to the Genton and Hall–Wood methods. This consistency suggests that the Wavelet method is adept at identifying regular fluctuations and possibly cyclical behaviors in accident patterns over time. The relative lack of large deviations implies that, overall, the accident patterns in the Basic Metal Industry tend to exhibit more regularity and less chaotic behavior when viewed through the lens of wavelet analysis.

The fractional dimension analysis of the NACE24 sector indicates a somewhat complicated structure in occupational accident data, with the Box Counting and Wavelet approaches highlighting consistent, stable patterns. Conversely, the Hall–Wood and Genton techniques encompass a wider array of variations and anomalies, signifying instances of heightened complexity or substantial deviations from standard patterns. Collectively, these methodologies offer an extensive comprehension of the temporal dynamics in the accident data pertaining to the Basic Metal Industry.

3.2. GAFs

This study utilizes Gramian Angular Fields (GAFs) to convert fractional dimension time series data into a visual format that facilitates anomaly detection via deep learning techniques. GAFs transform time series data into organized images by encoding angular information, so effectively capturing temporal dependencies and patterns within the data. The benefit of utilizing GAFs is their capacity to transform time series into two-dimensional matrices, facilitating the use of advanced image-based methodologies.

For this study, we have specifically chosen to use Gramian Angular Summation Fields (GASFs) as the method of representation. GASFs encode the summation of angular values between time series points, effectively capturing both the magnitude and directional trends of the data. This choice is particularly justified for the anomaly detection task in our dataset, which focuses on understanding longer-term trends and cumulative patterns in occupational accident data across different sectors (NACE05, NACE24, and NACE30). GASFs emphasize the overall trajectory of the time series, allowing for the identification of gradual shifts or cumulative deviations from normal behavior, which are crucial in detecting anomalies that develop over time.

The decision to use GASFs over Gramian Angular Difference Fields (GADFs) is grounded in the nature of the data and the objectives of this study. While GADFs highlight localized variations and are more sensitive to short-term fluctuations, GASFs provide a more comprehensive view of the global patterns within the time series, making them better suited for detecting gradual anomalies that emerge over longer periods. This aligns with the goal of identifying sustained changes in the accident data, where a cumulative perspective on the data are more informative than focusing solely on short-term differences. Thus, GASFs were chosen as the more appropriate method for this study, as they allow us to capture and analyze broader temporal trends that are critical for understanding the dynamics of occupational safety in these sectors.

Figure 7, Figure 8 and Figure 9 presents samples of images generated using Gramian Angular Summation Fields (GASFs) for different NACE sectors, specifically NACE05, NACE24, and NACE30, with window lengths of 8. Each row of images corresponds to a specific NACE sector and highlights the results from four different fractional dimension methods: Box Counting, Hall–Wood, Genton, and Wavelet.

In these images, the contrast and patterns reflect the variation in the fractional dimensions over time. For instance, the Box Counting method produces images that display more homogeneous and repetitive patterns across sectors, indicating a more consistent capture of high-dimensional anomalies. The Hall–Wood method produces images with more intersecting lines and changes in color intensity, reflecting its sensitivity to diverse structural changes. The Genton method shows highly textured images with visible grids and intersections, implying that this method captures a wide range of irregularities and fluctuations in the dataset. Finally, the Wavelet method generates images with smoother gradients and fewer intersections, which may point to its focus on detecting gradual and ongoing changes rather than sharp anomalies. The differences between the methods visually highlight their varying capabilities in capturing distinct anomaly types and patterns, offering insight into how each approach interacts with the underlying data structure.

3.3. Deep Learning Models Results

The methodology used in this study follows a structured approach for anomaly detection in the fractional dimension series derived from occupational accident data. First, an 8-length sliding window with a step size of 1 is applied to the fractional dimensions dataset, allowing for dynamic subseries analysis over time. For each window, the GASF technique is employed to convert the fractional dimension series into an image representation. This transformation enables the use of image-based techniques for anomaly detection.

Both a Convolutional Autoencoder and a Variational Autoencoder are employed to detect anomalies within the generated images. These autoencoders are trained in an unsupervised manner, learning the normal patterns in the data and identifying any significant deviations as anomalies. The autoencoder is trained in an unsupervised manner, meaning that no labeled data are required for the detection process. The autoencoder learns the normal patterns within the image representations and flags any deviations as potential anomalies. Finally, the fractional dimension corresponding to the midpoint of the sliding window is selected as the anomaly point, indicating when the anomaly likely occurred in the time series.

3.4. Results for NACE05

Figure 10 consists of four plots representing the anomaly detection results from the CAE applied to fractional dimension datasets for NACE05, using different methods: Box Counting, Hall–Wood, Genton, and Wavelet. The black points represent detected anomalies, while the lighter blue points illustrate the normal data points across the dataset.

The anomalies detected in the Box Counting method are relatively consistent, showing clusters of anomalies around specific intervals in the dataset. The majority of the black points are concentrated near the upper range of fractional dimension values (around 1.8 to 2.0). This suggests that outlier behavior occurs when the dimensionality is high. There are distinct regions, particularly around indices 500 and 1500, where anomalies are notably dense, implying a potential pattern or shift in behavior in those specific time windows.

In the Hall–Wood method, anomalies are distributed more widely across the range of fractional dimensions, with many points falling between 4.0 and 6.0. This indicates that the detected anomalies in this method are less clustered and more spread out, covering a wider range of fractional dimension values. The dispersion of anomalies across various levels, especially around index 500, suggests that the method is capturing more diverse anomalies. These variations may point to different underlying structures or events causing the anomalies within this time frame.

The Genton method shows a noticeable pattern where anomalies are concentrated at higher fractional dimension values, particularly above 3.0. The clustering of anomalies between indices 500 and 1000, as well as later around 1500, highlights potential areas of interest where abrupt changes in the data occur. The concentration of anomalies within these regions suggests that the method may be highly sensitive to fluctuations in this range of dimensions, possibly indicating significant structural changes in the dataset during these periods.

The anomalies detected using the Wavelet method are more evenly distributed across the fractional dimension space, particularly between 2.0 and 3.0. While there are no extreme outliers in terms of values, the anomalies are relatively scattered across the timeline, though with a slight tendency to concentrate around indices 400 and 1500. This may suggest that the Wavelet method detects subtler, more gradual anomalies in the dataset, potentially related to smoother, longer-term shifts in the underlying dynamics.

In Table 1, we present anomaly detection metrics for anomalies detected by CAE.

Figure 11 contains four plots that display the anomaly detection results using a VAE over the fractional dimension dataset for the NACE05 sector.

The Box Counting method reveals a similar pattern to the previous figure, with anomalies largely concentrated around the higher fractional dimension values, specifically between 1.8 and 2.0. There is a consistent set of detected anomalies throughout the dataset, particularly clustering around the earlier part (indices 0–500) and slightly less so in the later part (1500–2000). These clusters indicate that the VAE has identified recurrent anomaly structures in high-dimensional spaces, possibly capturing events or periods with irregular structural changes within the data.

The Hall–Wood method presents a wider spread of anomalies detected by the VAE. Most anomalies occur within a range of fractional dimensions between 4.0 and 6.0, with some points even higher, reaching around 7.0. Unlike the Box Counting method, the anomalies here are more dispersed across the time indices, particularly noticeable in the first half of the dataset, around indices 0–1000. This distribution suggests that the VAE is detecting a more diverse set of irregularities across multiple time points, possibly indicating various phases of activity or dynamic shifts in this period.

In the Genton method, anomalies are clustered in the mid-to-higher fractional dimension range, between 2.5 and 4.5. The anomalies appear fairly consistent across the timeline, but there are particularly dense concentrations around indices 500–1000 and 1500–2000. This pattern suggests the VAE’s sensitivity to fluctuations in fractional dimension values in these ranges, indicating potential irregularities or structural transitions in the data during these time intervals. The spread of anomalies across a moderate dimensional range highlights that the VAE is picking up on both subtle and more pronounced shifts.

The Wavelet method shows a more regular distribution of anomalies compared to the other methods, with most anomalies occurring in the fractional dimension range between 2.0 and 3.0. These anomalies are relatively scattered across the timeline, with some denser periods, particularly around indices 0–500 and 1500–2000. The VAE is detecting changes within this middle range of fractional dimensions, indicating that it is sensitive to gradual, consistent changes in the dataset over time, potentially reflecting subtle yet ongoing variations or shifts.

In Table 2, we present anomaly detection metrics for anomalies detected by VAE.

3.5. Results for NACE30

Figure 12 illustrates the results of anomaly detection using a CAE over fractional dimension datasets for NACE30, with black points representing detected anomalies, and light points showing the overall dataset. The four methods used to compute fractional dimensions—Box Counting, Hall–Wood, Genton, and Wavelet—are shown in separate plots.

The anomalies detected in the Box Counting method are predominantly located in the upper range of fractional dimensions, between 1.6 and 2.0. There is a clear clustering of anomalies, especially around indices 0–500 and 1000–1500. This pattern indicates that certain periods within the dataset exhibit heightened irregularity in high-dimensional spaces, suggesting shifts in the data structure or external factors affecting the sector during these periods.

The anomalies detected with the Hall–Wood method are more spread out, with a significant portion appearing in the range of 3.0 to 5.0 fractional dimensions. The anomaly distribution is slightly more concentrated in the earlier part of the dataset, particularly between indices 0 and 500, although there are smaller clusters at later points as well. This suggests that the Hall–Wood method is capturing more diverse anomalies, possibly reflecting a wider range of variability or disruptions in the dataset at different time intervals.

The anomalies detected with the Genton method exhibit a broader spread, with anomalies occurring between 2.0 and 4.0, and occasional points reaching 5.0. There is a noticeable concentration of anomalies between indices 0–1000, after which they become more sparsely distributed. This indicates that early periods in the dataset experienced more structural changes or irregularities compared to later periods. The presence of anomalies over a broader range of fractional dimensions suggests that this method captures both subtle and more pronounced shifts in the dataset.

The Wavelet method shows a relatively even distribution of anomalies, mostly in the range of 2.0 to 3.0. The anomalies are more scattered throughout the dataset, with no particularly dense clusters but a consistent presence across the time series. This suggests that the Wavelet method is detecting more gradual or smaller-scale anomalies that might not be as abrupt but indicate ongoing variations in the dataset. The even distribution implies that changes in the dataset are more continuous and less concentrated in specific time windows.

In Table 3, we present anomaly detection metrics for anomalies detected by CAE.

Figure 13 displays the results of anomaly detection using a VAE applied to the fractional dimension datasets for NACE30.

In the Box Counting method, the detected anomalies are concentrated at higher fractional dimension values, particularly between 1.6 and 2.0. The anomalies tend to cluster around certain periods, notably between indices 0–500 and around 1000–1500. These clusters suggest that significant anomalies or irregular patterns in the data are more common during these time intervals. The concentration of anomalies in high-dimensional spaces might indicate that notable deviations or shifts in the dataset occur when the data becomes more complex or dynamic.

The Hall–Wood method reveals a broader spread of anomalies across a wider range of fractional dimensions, with most anomalies occurring between 3.0 and 5.0. The anomalies are more evenly distributed across the timeline, with no single period dominating, although there is a noticeable density of anomalies in the early part of the dataset (0–500). This suggests that the VAE is detecting a variety of anomalies over time, capturing irregularities in both subtle and more pronounced shifts in the dataset. The Hall–Wood method seems particularly sensitive to a range of structural changes, highlighting its ability to identify different types of anomalies.

The Genton method shows a higher concentration of anomalies around the mid-to-higher range of fractional dimensions, primarily between 2.5 and 4.0. The anomalies appear more consistently throughout the dataset, with fewer clear clusters, but a higher density between indices 500–1000. This suggests that the VAE, using the Genton method, captures persistent anomalies in this dimension range. The method is likely sensitive to moderate changes in the dataset, detecting both smaller and more sustained anomalies across a wider timeline.

The anomalies detected using the Wavelet method are distributed fairly evenly across the fractional dimension range, primarily between 2.0 and 3.0. The anomalies are also spread relatively uniformly across the timeline, though with a slight increase in density around indices 0–500. This suggests that the Wavelet method captures ongoing, consistent changes in the dataset, detecting anomalies that may represent gradual or longer-term shifts rather than abrupt changes. The relatively even distribution indicates that this method is well-suited to identifying more continuous, subtle anomalies.

In Table 4, we present anomaly detection metrics for anomalies detected by VAE.

3.6. Results for NACE24

Figure 14 illustrates anomaly detection results using a CAE applied to the fractional dimension datasets for NACE24. The four subplots represent different methods for computing fractional dimensions, the light-colored points represent the overall dataset, while the black points indicate detected anomalies.

The Box Counting method shows a consistent detection of anomalies, primarily in the higher fractional dimension range between 1.6 and 2.0. These anomalies are clustered at various intervals across the dataset, with concentrations around indices 0–500 and 1000–1500. The anomalies become slightly more dispersed towards the latter half of the dataset. This concentration of anomalies at high-dimensional values suggests that significant deviations or structural shifts occur during these periods, possibly pointing to irregular activities in the industry during these intervals.

The Hall–Wood method displays a wider spread of detected anomalies, with most occurring in the 3.0 to 5.0 fractional dimension range. These anomalies are somewhat evenly distributed throughout the dataset, though there is a notable density of anomalies around indices 0–500. The spread suggests that the Hall–Wood method is capable of capturing a variety of structural changes in the data, detecting both early irregularities and a more consistent pattern of anomalies later in the dataset. This points to ongoing variability in the underlying processes of the industry.

The anomalies detected using the Genton method are concentrated around the 2.5 to 4.0 range in fractional dimensions. The anomaly detection is fairly dense, particularly in the earlier part of the dataset (0–1000). However, as the timeline progresses, the anomalies become more scattered. The dense concentration of anomalies in the earlier part of the dataset suggests more significant irregularities or disruptions during those periods, whereas the latter part of the dataset may exhibit more subtle or gradual changes.

In the Wavelet method, anomalies are detected mostly between 2.0 and 3.0, with a relatively even distribution across the timeline. There are no clear clusters of anomalies in specific time intervals, but there is a consistent presence of anomalies across the dataset, suggesting ongoing subtle variations in the dataset over time. The more evenly spread anomalies indicate that the Wavelet method is sensitive to continuous but less pronounced shifts, possibly indicating gradual, ongoing changes in the data rather than abrupt disruptions.

In Table 5, we present anomaly detection metrics for anomalies detected by CAE.

Figure 15 presents anomaly detection results using a VAE applied to the fractional dimension datasets for NACE24.

In the Box Counting method, the anomalies are primarily located in the higher fractional dimension range, between 1.6 and 2.0. There are dense clusters of anomalies, particularly at indices 0–500 and 1000–1500. The anomaly detection becomes more dispersed in the latter half of the dataset. The concentration of anomalies in the upper range of fractional dimensions suggests that notable deviations or irregular events are occurring during these time periods, likely indicating significant structural changes in the dataset during these intervals.

The Hall–Wood method exhibits a broader range of detected anomalies, with most falling between 3.0 and 5.0. Anomalies are relatively evenly distributed throughout the dataset, though there is a denser concentration of anomalies around indices 0–500. This pattern indicates that the Hall–Wood method detects a diverse set of anomalies across multiple time periods, capturing both early irregularities and consistent patterns of anomalies later in the dataset. This spread suggests ongoing variability or disruptions in the sector over time.

Anomalies detected using the Genton method are concentrated between 2.0 and 4.0 fractional dimensions. The anomalies are denser in the earlier part of the dataset (indices 0–1000), while they become more dispersed in the later periods. This suggests that the Genton method detects a high number of anomalies during the early stages of the dataset, reflecting significant irregularities during that time. As the dataset progresses, fewer anomalies are detected, possibly indicating a period of stabilization or smaller, less pronounced shifts.

The Wavelet method displays a relatively uniform distribution of anomalies, mostly in the 2.0 to 3.0 range. The anomalies are consistently present throughout the dataset, with no distinct clustering, though there are some slight increases in density between indices 0–500 and 1500–2000. This even distribution of anomalies suggests that the Wavelet method is sensitive to ongoing, subtle changes in the dataset, potentially capturing continuous variations that may not be abrupt but reflect long-term structural shifts.

In Table 6, we present anomaly detection metrics for anomalies detected by VAE.

4. Discussion

The analysis of fractional dimensions across the NACE05, NACE30, and NACE24 sectors reveals nuanced insights into the complexity and dynamics of occupational accident data within each industry. Utilizing four distinct methods—Box Counting, Hall–Wood, Genton, and Wavelet—the study consistently identifies moderate levels of complexity, as indicated by fractional dimensions predominantly ranging between 1.5 and 3. The Box Counting and Wavelet methods across all sectors highlight stable and recurring patterns, suggesting a degree of regularity and predictability in accident occurrences. These methods often show tight clustering around specific values, reflecting consistent behaviors and periodic fluctuations that may be tied to cyclical operational risks or established safety protocols.

Conversely, the Hall–Wood and Genton methods demonstrate a broader spread of fractional dimension values, capturing more granular and sporadic variations. This scatter suggests that these methods are sensitive to irregularities and anomalous events, such as sudden changes in safety regulations or unexpected operational shifts, which introduce higher complexity and unpredictability into the accident data. Particularly in the NACE30 and NACE24 sectors, these methods reveal occasional extreme values, pointing to rare but significant deviations from normal patterns that could be critical for targeted safety interventions.

Overall, the complementary nature of the four methods provides a comprehensive understanding of the temporal dynamics in occupational accidents across different industrial sectors. While Box Counting and Wavelet methods emphasize consistent and moderate complexity, Hall–Wood and Genton approaches uncover underlying irregularities and potential anomalies. This multifaceted analysis underscores the importance of employing diverse analytical techniques to fully capture the intricate behaviors and evolving risks within various occupational environments, thereby informing more effective safety management and policy development.

For each data point, an anomaly score is calculated by comparing the original data with the model’s output. The difference between the two, often measured using metrics like mean squared error (MSE), indicates how well the model was able to reconstruct the input. If the reconstruction error (or anomaly score) is low, the data point is considered normal. If the error is high, the model struggled to accurately reconstruct the input, suggesting that the data point is anomalous.

In the remaining part of this section, we will present a detailed discussion on the anomaly scores detected by both the Convolutional Autoencoder (CAE) and the Variational Autoencoder (VAE). We will compare and interpret the performance of each method in identifying anomalous data points. Furthermore, we will conduct a statistical analysis of the anomalies detected by both methods, examining the distribution, frequency, and severity of these anomalies.

4.1. Discussions on NACE05

Figure 16 represents the distributions of anomaly scores detected by both the CAE and the VAE for four different fractional dimension methods: Box Counting, Hall–Wood, Genton, and Wavelet, within the NACE05 sector.

The results for the anomaly detection in the NACE05 sector using CAE and VAE models show several interesting patterns when comparing different fractal dimension methods. For the Box Counting method, both models—CAE and VAE—exhibit similar distributions in their anomaly scores, though the distributions are not identical. The CAE has a more varied distribution with visible peaks and valleys, whereas the VAE presents a smoother and more uniform spread. Both models tend to cluster their anomaly scores around the middle range of the normalized scale, but the CAE has a higher concentration of anomalies in the central region, indicating that it may detect a broader range of moderate anomalies. In contrast, the VAE shows a wider distribution, suggesting that it captures a more extensive variety of anomalies, which could imply its capacity to detect more subtle deviations as well.

The Hall–Wood method shows more distinct distributions between the CAE and VAE models. The CAE distribution appears more symmetric, centering the anomaly scores around the middle of the scale. This symmetry suggests that CAE identifies moderate anomalies more consistently. On the other hand, the VAE’s distribution is skewed towards higher anomaly scores, indicating that it is more sensitive to capturing larger deviations in this particular fractal dimension. This result implies that the VAE may be more effective for identifying more extreme outliers, while the CAE is better at detecting more moderate ones.

For the Genton method, the contrast between CAE and VAE is particularly notable. The CAE model produces a very narrow and concentrated distribution, indicating that it identifies only a small range of anomalies. This could mean that the CAE is more conservative and focuses on clearer, more obvious anomalies. In contrast, the VAE model exhibits a much broader distribution of anomaly scores, suggesting that it captures a wider variety of anomalies. This broader range could mean that the VAE is better suited for detecting outliers in datasets where anomalies are diverse in nature. The difference between the CAE and VAE in this method is the most pronounced, with the VAE clearly offering a more comprehensive detection of anomalies.

The Wavelet method yields more similar distributions for both the CAE and VAE models. Both models concentrate their anomaly scores around the central region of the scale, with only slight differences in the tails of the distributions. The CAE’s distribution is slightly more peaked, implying that it focuses on a narrower range of anomalies, while the VAE has a broader and smoother distribution, suggesting that it detects a more diverse set of deviations. Overall, for the Wavelet method, both models perform comparably, though the VAE still demonstrates a tendency to capture a wider range of anomalies.

Comparatively, across the four fractal dimension methods, the Genton method stands out as the one where the difference between CAE and VAE is most significant, with the VAE detecting a wider variety of anomalies. The Hall–Wood method also shows notable differences, particularly in the upper range of anomaly scores, where the VAE is more sensitive to larger deviations. On the other hand, the Box Counting and Wavelet methods display more consistency between the two models, with Box Counting showing more localized clustering of anomaly scores and Wavelet presenting more evenly distributed scores. These observations suggest that the choice of fractal dimension method significantly influences how CAE and VAE detect anomalies. The Genton and Hall–Wood methods show larger variances between the models, with the VAE excelling in capturing a broader range of anomalies, while the Box Counting and Wavelet methods demonstrate more consistent behavior between the models, with both methods detecting a similar range of anomalies.

From the quantitative results in Table 1 and Table 2, several key points emerge. The Mean Anomaly Score is consistently higher for the VAE across most methods, indicating a broader detection of anomalies compared to CAE. The Silhouette Score, however, is more negative in both models, indicating that the separation between normal and anomalous data points may not be distinct. Mean LOF Score values are notably higher for Genton, Hall–Wood, and Wavelet in VAE, reinforcing the idea that VAE captures more complex anomaly patterns. The Davies–Bouldin Index is lower for the Wavelet method in both models, suggesting that clustering in this method is more distinct. The Calinski–Harabasz Index and Dunn Index show consistent patterns with the VAE model exhibiting better cluster separation and compactness in most cases, especially in the Hall–Wood and Genton methods. These metrics affirm the tendency of VAE to detect a broader and more diverse set of anomalies across different methods, while the CAE remains more focused on moderate anomalies with tighter clustering.

In the mining sector, safety is a paramount concern due to the high-risk nature of underground and surface mining operations. The Box Counting method revealed pronounced variations in the CAE distributions for NACE05, indicating that the CAE model detected specific anomalies that might correspond to irregular safety incidents or non-compliance with safety protocols. The higher Davies–Bouldin Index (1.479 for CAE) suggests less distinct clustering, which could reflect a variety of operational behaviors leading to safety risks. The Hall–Wood and Genton methods showed that the VAE model captured a broader range of anomalies with higher mean anomaly scores (2.027 for VAE in Hall–Wood, 1.688 for VAE in Genton). This suggests that VAE was more sensitive to detecting significant deviations, potentially corresponding to critical safety incidents such as equipment failures, hazardous environmental conditions, or procedural lapses. The detection of a wide variety of anomalies by VAE implies that mining operations may have complex safety challenges that require robust monitoring. The models’ ability to identify these anomalies can help safety managers pinpoint specific areas where safety practices may be lacking or where additional training is necessary. For instance, anomalies could indicate patterns in near-miss incidents, equipment malfunctions, or non-adherence to safety checklists, allowing for targeted interventions.

4.2. Discussions on NACE30

The violin plots provided is Figure 17 show the distributions of anomaly scores detected by both the CAE and the VAE for the NACE30 sector, analyzed through four different fractional dimension methods.

In the anomaly detection results for the Manufacture of Other Transportation Vehicles (NACE30) sector, using the CAE and VAE models reveals several interesting patterns across different fractal dimensions. In the Box Counting method, the anomaly score distributions differ notably between CAE and VAE. The CAE model shows more pronounced variability with distinct peaks, suggesting that it identifies a more differentiated set of anomalies. In contrast, the VAE’s distribution is smoother and shows a more consistent spread across a wider range of anomaly scores. Both models primarily detect anomalies around the middle range of scores, but the VAE’s broader distribution suggests it might be more sensitive to a greater variety of anomalies. However, the CAE’s sharper features indicate that it is better at identifying distinct, concentrated patterns of anomalies, whereas the VAE might be picking up more subtle variations.

In the Hall–Wood method, the differences between CAE and VAE become more apparent. The CAE distribution is narrower and more centralized, focusing on a smaller range of anomalies with a focus on moderate deviations. This suggests that CAE may miss some of the more extreme outliers, but provides a more focused detection of smaller, consistent anomalies. On the other hand, the VAE distribution is wider and smoother, capturing a more diverse set of anomaly scores, including higher outliers. This implies that the VAE is more attuned to larger deviations, making it more effective at identifying extreme anomalies within this method.

The Genton method reveals the most striking differences between the two models. The CAE model’s anomaly score distribution is narrow and concentrated near the lower end of the scale, indicating that it detects very few anomalies, likely focusing on highly specific deviations. In contrast, the VAE displays a much broader distribution with a peak towards the mid-to-upper range of anomaly scores. This suggests that the VAE is more sensitive to variations in the data and is identifying a broader set of outliers. The CAE, however, appears much more selective, detecting only a limited set of anomalies with this method. This divergence in anomaly detection suggests that the VAE is far more effective for identifying diverse and more complex anomalies in this context.

The Wavelet method shows somewhat similar distributions for both CAE and VAE models, though VAE demonstrates a slightly broader spread in anomaly scores. Both models have central peaks, indicating that they detect most anomalies in the mid-range of the scale. However, the VAE’s broader tail suggests that it detects a wider range of anomalies, while the CAE focuses more on a narrower band of scores, indicating fewer variations in its anomaly detection. This means that both models perform similarly in this method, but VAE might capture more subtle outliers compared to CAE, which is more focused on a specific anomaly range.

From the results presented in Table 3 and Table 4, some clear comparative insights emerge. The Mean Anomaly Score remains consistent between CAE and VAE for all methods, with slightly higher scores observed in VAE for the Genton and Hall–Wood methods, highlighting its broader detection capability. The Silhouette Score for both models in Box Counting is positive, indicating good cluster cohesion, but it becomes negative in other methods, implying overlapping clusters or difficulty distinguishing between normal and anomalous data points. The Mean LOF Score remains notably high in the Genton and Hall–Wood methods, particularly in VAE, reinforcing its sensitivity to diverse anomaly patterns. The Davies–Bouldin Index is lower for Wavelet in both models, indicating more distinct clustering in this method. Interestingly, the Calinski–Harabasz Index for VAE is higher than CAE in the Genton method, indicating better-defined clusters, while the Dunn Index remains zero for both models, suggesting poor cluster separation for all methods.

The manufacture of transport equipment includes the production of ships, trains, aircraft, and other vehicles, involving complex assembly processes and coordination of numerous components. In NACE30, the CAE model identified distinct peaks in anomaly distributions using the Box Counting method, reflected by a high Calinski–Harabasz Index (8.523 for CAE). This indicates a segmented range of outliers, which may correspond to specific operational behaviors affecting safety. The VAE model, particularly with the Hall–Wood and Genton methods, captured a broader range of anomalies with higher mean LOF scores (798,700 in Hall–Wood for VAE) and mean anomaly scores, suggesting sensitivity to both moderate and extreme deviations. These anomalies might be linked to complex supply chain issues, inconsistencies in quality control, or variations in assembly processes that could impact worker safety. In the context of occupational safety, these findings suggest that the manufacturing processes in NACE30 are subject to variability that could pose safety risks. The ability of the models to detect a diverse set of anomalies can assist safety professionals in identifying problematic stages in the production process. For example, anomalies may reveal patterns of equipment misuse, lapses in protective measures, or areas where safety training is insufficient.

4.3. Discussions on NACE24

Figure 18 show the distributions of anomaly scores detected by both the CAE and the VAE for the NACE24 sector, analyzed through four different fractional dimension methods.

In the anomaly detection results for the Basic Metal Industry (NACE24) sector using CAE and VAE models, notable patterns emerge across the different fractal dimension methods. In the Box Counting method, the anomaly score distributions for CAE and VAE show a relatively similar structure, but with some key differences. The CAE distribution is characterized by distinct peaks and troughs, indicating that it detects a more segmented and well-defined range of anomalies. On the other hand, the VAE distribution is smoother, reflecting a more even spread of anomaly scores. Despite these differences, both models cluster the majority of anomaly scores toward the middle range, although the VAE captures a slightly broader range. This broader spread suggests that VAE may be more sensitive to a wider variety of anomalies, while CAE identifies more specific and pronounced clusters of outliers.

The Hall–Wood method reveals a significant divergence between the models. The CAE distribution is narrow and concentrated, with the majority of detected anomalies focused around a mid-range value. This pattern suggests that CAE is better suited to detecting moderate deviations and misses some of the larger outliers. In contrast, the VAE distribution is much broader, with a more pronounced tail extending toward higher anomaly scores. This indicates that VAE is more sensitive to larger anomalies, capturing more extreme deviations, whereas CAE focuses on consistent, moderate outliers.

In the Genton method, the distinction between CAE and VAE is even more pronounced. The CAE distribution is very narrow and concentrated at the lower end of the spectrum, indicating that it detects very few anomalies, most of which are highly specific. In contrast, the VAE distribution is much broader, capturing a wide array of anomaly scores, especially in the mid-to-high range. This broader detection range suggests that VAE is far more sensitive to variations in the data and can identify a more diverse set of outliers, while CAE remains focused on a limited subset of anomalies.

The Wavelet method yields somewhat comparable results for both models, though the VAE still displays a slightly broader distribution. Both models have central peaks, suggesting that most anomalies detected are clustered around mid-range values. However, the VAE exhibits a longer tail, indicating that it captures more subtle and less obvious outliers compared to the CAE. Overall, both models perform similarly in this method, but VAE’s broader distribution suggests a greater sensitivity to subtle anomalies.

Comparing the results from Table 5 and Table 6 provides a more detailed quantitative evaluation. The Mean Anomaly Score remains largely consistent across models and methods, though slightly higher values for the VAE in the Genton and Hall–Wood methods highlight its broader anomaly detection capabilities. The Silhouette Score shows negative values across all methods, with slightly more negative values in the Genton method for both models, indicating some overlap between the detected anomalies and normal data points. The Mean LOF Score is consistent between the models, with higher values in the Genton and Hall–Wood methods, suggesting greater outlier sensitivity in VAE. The Davies–Bouldin Index remains lower for both models in the Wavelet method, indicating clearer clustering, while Calinski–Harabasz Index values are notably higher in VAE for the Genton method, indicating better-defined clusters. The Dunn Index remains very low or zero, implying poor separation between clusters across methods.

In the basic metals manufacturing sector, safety issues often revolve around heavy machinery, high temperatures, and hazardous materials. The findings showed more uniform behavior between CAE and VAE models in NACE24 across all methods, with fewer extreme outliers detected. The Wavelet method displayed consistent mean anomaly scores (1.986 for both models), and the Genton method indicated fewer high deviations. This uniformity suggests that safety policies and practices in NACE24 might be more standardized and effectively implemented, leading to fewer anomalies related to safety incidents. The tighter clustering indicated by lower Davies–Bouldin Index values implies that operational behaviors are more consistent, potentially due to strict adherence to safety regulations and comprehensive training programs. However, the models still detected moderate anomalies, which could correlate with minor safety incidents or near-misses. These findings highlight the importance of continuous monitoring and incremental improvements in safety practices, even in sectors with strong safety records. The consistent detection of moderate anomalies can prompt reviews of routine procedures and encourage a proactive approach to hazard identification.

4.4. Comparative Discussions

The Box Counting method revealed similarities in the distributions across all sectors, with both models displaying somewhat comparable behavior. In NACE05, however, there were more pronounced variations in the CAE distributions compared to NACE24, where both models exhibited smoother and more regular patterns. In NACE30, the CAE’s anomaly distribution showed distinct peaks, indicating a more segmented range of outliers. Quantitatively, the mean anomaly scores across all sectors for both models were fairly close, with slightly higher scores for NACE30. The higher Davies–Bouldin Index for CAE in NACE05 (1.479) compared to NACE24 (1.083) reflected these variations. The distinct peaks in NACE30 were corroborated by a high Calinski–Harabasz Index (8.523 for CAE), suggesting that the CAE model identified a more segmented range of outliers. Overall, while the Box Counting method yielded similar results across sectors, the models behaved more distinctly in NACE05 and NACE30, indicating more nuanced or segmented anomaly patterns compared to the uniform behavior in NACE24.

In contrast, the Hall–Wood method highlighted more significant differences between the sectors and between the models. In NACE05 and NACE30, there was a stark contrast between CAE and VAE performances. The VAE model captured a broader range of outliers, particularly larger deviations, as indicated by higher mean LOF scores—104,678 in NACE05 and 798,700 in NACE30 for VAE, compared to lower scores for CAE. In NACE05, the mean anomaly score for VAE (2.027) was notably higher than that of CAE, reflecting VAE’s ability to detect a wider variety of deviations. The CAE’s detection remained more centralized, as seen in its silhouette score of −0.191 compared to VAE’s −0.162. NACE24 followed a similar pattern but with less pronounced differences between the models, as both had identical mean anomaly scores of 2.179. In NACE30, the CAE focused on moderate anomalies, indicated by a Davies–Bouldin Index of 2.619, while the VAE captured a broader range of anomalies, highlighted by a Calinski–Harabasz Index of 1.223 for CAE versus 0.750 for VAE. This demonstrates that the Hall–Wood method was particularly effective for VAE in identifying diverse outliers in NACE30. Overall, the method showed that both models, especially VAE, were adept at detecting extreme and varied anomalies in NACE05 and NACE30, while NACE24 exhibited a more moderate anomaly pattern.

The Genton method presented the most pronounced divergence between sectors in terms of model performance. In NACE05, the VAE captured a broad distribution of anomaly scores with a mean anomaly score of 1.688, while the CAE focused on a narrower range of outliers, reflected by a lower silhouette score of −2.848. A similar pattern was observed in NACE30, where the VAE detected a wide variety of anomalies, indicated by a mean LOF score of

1.025 \times 10^{7}

, whereas the CAE remained concentrated on specific moderate deviations, with a silhouette score of 3.521 for CAE versus −1.781 for VAE. In NACE24, the CAE detected even fewer anomalies, and while the VAE still captured a broader set of outliers, its mean anomaly score of 1.745 and Davies–Bouldin Index of 6.999 suggested fewer high deviations compared to the other sectors. This implies that while the Genton method is effective at capturing a wide range of deviations in NACE05 and NACE30, it is less suited for detecting significant outliers in NACE24. Overall, this method revealed that both models, especially VAE, were better at identifying anomalies in NACE05 and NACE30, consistently detecting a more diverse set of outliers.

Lastly, the Wavelet method showed less pronounced differences between sectors, though certain patterns emerged. In NACE05, both CAE and VAE exhibited similar behavior, with mean anomaly scores of 1.914 for both models and central clustering of anomaly scores. The VAE showed a broader distribution, reflected by a higher mean LOF score of 104,678 compared to 75,175 for CAE. NACE30 demonstrated a similar trend, with both models detecting anomalies within a central range, but the VAE had a broader tail, indicating greater sensitivity to subtle outliers, as suggested by silhouette scores of −0.051 for CAE and −0.066 for VAE. In NACE24, both models performed more uniformly, with identical mean anomaly scores of 1.986 and nearly identical Davies–Bouldin Index values. The Wavelet method thus showed more consistency across sectors, although the VAE’s broader detection range in NACE05 and NACE30 suggested greater sensitivity to subtle deviations, while NACE24’s results were more focused and moderate.

Comparing the sectors overall, NACE05, and NACE30 displayed more significant differences between the models across most methods, with the VAE consistently capturing a broader and more varied range of anomalies. This was particularly evident through higher mean LOF scores and Calinski–Harabasz Index values. NACE24, however, showed more uniform behavior between the models, with fewer extreme outliers detected, especially in the Genton and Hall–Wood methods. The tighter clustering in NACE24 was suggested by Davies–Bouldin Index values of 1.402 for CAE and 3.101 for VAE in the Hall–Wood method. This indicates that anomalies in NACE24 are more moderate and less varied compared to the wider range of deviations observed in NACE05 and NACE30.

In conclusion, while both CAE and VAE models are effective for anomaly detection across all three sectors, the VAE consistently provides a broader scope of detection. This is particularly true in sectors with more complex or varied data patterns, such as NACE05 and NACE30. The CAE tends to focus on more specific or moderate anomalies, which is especially noticeable in NACE24. Additionally, the choice of fractional dimension method significantly influences the results. The Genton and Hall–Wood methods showed the greatest variances between models and sectors, highlighting the models’ differing sensitivities to anomaly types, while the Box Counting and Wavelet methods yielded more consistent outcomes across all sectors. This suggests that selecting an appropriate method is crucial depending on the specific characteristics of the sector and the anomalies of interest.

5. Conclusions

This study explored the detection of anomalies in occupational accident data across three key sectors in Turkey—Mining of Coal and Lignite (NACE05), Manufacture of Basic Metals (NACE24), and Manufacture of Other Transport Equipment (NACE30)—using Convolutional Autoencoder (CAE) and Variational Autoencoder (VAE) models. By employing four distinct fractional dimension methods—Box Counting, Hall–Wood, Genton, and Wavelet—we aimed to capture the complexity and fractal characteristics of the data from multiple perspectives. A significant aspect of our methodology was the use of Gramian Angular Fields (GAFs), which transformed the fractional dimension time series into images. This transformation enabled the deep learning models to effectively process intricate patterns and nonlinear relationships inherent in the data, enhancing the detection of anomalies that might be precursors to safety incidents.

The results revealed that both CAE and VAE models were effective in detecting anomalies across all sectors, but the VAE consistently reported a broader range of anomalies than the CAE. This was particularly evident in sectors with complex operational processes like NACE05 and NACE30. In NACE30, which involves the manufacture of transport equipment such as ships, trains, and aircraft, the CAE model identified distinct peaks in anomaly distributions using the Box Counting method. This was reflected by a high Calinski–Harabasz Index (8.523 for CAE), indicating a segmented range of outliers that may correspond to specific operational behaviors affecting safety. The VAE model, especially with the Hall–Wood and Genton methods, captured a broader range of anomalies with higher mean Local Outlier Factor (LOF) scores (798,700 in Hall–Wood for VAE) and mean anomaly scores. These findings suggest that the VAE was more sensitive to both moderate and extreme deviations, potentially linked to complex supply chain issues, inconsistencies in quality control, or variations in assembly processes that could impact worker safety.

In NACE05, the mining sector, the VAE model again captured a broad distribution of anomaly scores with a mean anomaly score of 1.688 using the Genton method, while the CAE focused on a narrower range of outliers, reflected by a lower silhouette score of −2.848. This indicates that the VAE was more adept at identifying diverse anomalies that may correlate with operational risks inherent in mining activities, such as equipment failures or hazardous environmental conditions.

NACE24, the manufacture of basic metals, showed more uniform behavior between the models, with fewer extreme outliers detected. Both models exhibited similar mean anomaly scores and nearly identical Davies–Bouldin Index values across methods like Box Counting and Wavelet. This suggests that anomalies in NACE24 were more moderate and less varied, possibly due to standardized safety practices and consistent operational procedures.

The choice of fractional dimension method significantly influenced the results. The Genton and Hall–Wood methods presented the most pronounced divergence between sectors in terms of model performance. In these methods, the VAE consistently identified a broader range of anomalies in NACE05 and NACE30, highlighting its sensitivity to complex data patterns. The Box Counting and Wavelet methods showed less pronounced differences between sectors, producing more consistent results across both models. These methods may be more robust when analyzing sectors with more regular or moderate anomaly patterns, such as NACE24.

The integration of GAFs was crucial in enhancing the anomaly detection capabilities of both models. By transforming the fractional dimension time series into images, GAFs allowed the models to exploit spatial hierarchies and local dependencies, capturing complex temporal dynamics and nonlinear relationships. This approach improved the detection of anomalies and provided visual interpretability, aiding in qualitative analysis and supporting the development of targeted safety interventions.

In the context of occupational safety, these findings have significant implications. The ability of the models, particularly the VAE, to detect a diverse set of anomalies can assist safety professionals in identifying problematic stages in operational processes. For instance, in NACE30, anomalies may reveal patterns of equipment misuse, lapses in protective measures, or areas where safety training is insufficient. In NACE05, detecting a wide variety of anomalies could help pinpoint specific areas where safety practices may be lacking or where additional monitoring is necessary.

In conclusion, this study demonstrates the effectiveness of combining fractional dimensions, GAF transformations, and deep learning models in detecting anomalies in occupational accident data. The VAE consistently reported a broader range of anomalies than the CAE, especially in sectors with more complex or varied data patterns like NACE05 and NACE30. The CAE was effective in detecting more specific or moderate anomalies, which was particularly noticeable in NACE24. The choice of fractional dimension method played a significant role, with the Genton and Hall–Wood methods showing the greatest variances between models and sectors, while the Box Counting and Wavelet methods yielded more consistent outcomes across all sectors.

However, there are limitations to this study. The analysis was confined to specific sectors, which may limit the generalizability of the findings to other industries with different operational risks or accident patterns. Additionally, while we explored several fractional dimension methods, future research could investigate the impact of combining these methods or incorporating other advanced time series techniques. The use of unsupervised learning models like CAE and VAE poses challenges in fine-tuning for specific industrial environments, which could affect their performance.

Future studies could explore hybrid models that integrate different anomaly detection techniques or incorporate supervised learning approaches to enhance detection accuracy. Applying this methodology across a wider range of sectors could further validate its effectiveness in diverse industrial contexts. Investigating the real-time implementation of these methods in operational settings could also provide dynamic safety monitoring, enhancing the ability to prevent accidents before they occur.

Overall, this research contributes to the advancement of occupational safety by introducing an innovative anomaly detection framework that leverages the strengths of fractional dimensions, GAF transformations, and deep learning models. By capturing complex patterns and nonlinear relationships in operational data, this approach offers a powerful tool for identifying potential safety hazards. The findings underscore the importance of selecting appropriate models and analytical methods based on the specific characteristics of the data and the nature of the anomalies of interest. Integrating these insights into safety management systems can inform the development of more effective safety policies and practices, ultimately reducing workplace accidents and improving employee well-being across industries.

Author Contributions

Conceptualization, Ö.A., L.M.B., M.A.B., G.T. and A.N.; methodology, Ö.A., L.M.B., M.A.B. and G.T.; software, Ö.A. and M.A.B.; validation, G.T.; formal analysis, Ö.A., L.M.B., M.A.B., G.T. and A.N.; investigation, Ö.A., L.M.B., M.A.B., G.T. and A.N.; resources, Ö.A., L.M.B., M.A.B., G.T. and A.N.; data curation, Ö.A., M.A.B. and G.T; writing—original draft preparation, Ö.A., L.M.B., M.A.B., G.T. and A.N.; writing—review and editing, Ö.A., L.M.B., M.A.B., G.T. and A.N.; visualization, Ö.A. and M.A.B.; supervision, L.M.B. and M.A.B.; project administration, L.M.B. and M.A.B.; funding acquisition, L.M.B. and A.N. All authors have read and agreed to the published version of the manuscript.

Funding

This study was conducted with financial support from the scientific research funds of the “1 Decembrie 1918” University of Alba Iulia, Romania.

Data Availability Statement

The data used in this study were provided to us through the letter numbered E-35441757-207.01.02.99-95499662 from the Presidency of the Social Security Institution of the Republic of Turkey. We extend our gratitude to the Presidency of the Social Security Institution of the Republic of Turkey for supplying the data. The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Beeche, C.A.; Garcia, M.A.; Leng, S.; Roghanchi, P.; Pu, J. Computational risk modeling of underground coal mines based on NIOSH employment demographics. Saf. Sci. 2023, 164, 106170. [Google Scholar] [CrossRef] [PubMed]
Ilić Krstić, I.; Avramović, D.; Živković, S. Occupational injuries in underground coal mining in Serbia: A case study. Work 2021, 69, 815–825. [Google Scholar] [CrossRef]
Geldart, S.; Smith, C.A.; Shannon, H.S.; Lohfeld, L. Organizational practices and workplace health and safety: A cross-sectional study in manufacturing companies. Saf. Sci. 2010, 48, 562–569. [Google Scholar] [CrossRef]
Sitompul, Y.R.; Simarmata, V.P.A. Description of Work Accident and Occupational Safety and Health Activities of Paint Manufacturing Industry PTSU, in West Java 2016–2017. Int. J. Health Sci. Res. 2022, 12, 280–289. [Google Scholar] [CrossRef]
Shanmugasundar, G.; Sabarinath, S.S.; Babu, K.R.; Srividhya, M. Analysis of occupational health and safety measures of employee in material manufacturing industry using statistical methods. Mater. Today Proc. 2021, 46, 3259–3262. [Google Scholar] [CrossRef]
Berhan, E. Management commitment and its impact on occupational health and safety improvement: A case of iron, steel and metal manufacturing industries. Int. J. Workplace Health Manag. 2020, 13, 427–444. [Google Scholar] [CrossRef]
Małysa, T. Application of Forecasting as an Element of Effective Management in the Field of Improving Occupational Health and Safety in the Steel Industry in Poland. Sustainability 2022, 14, 1351. [Google Scholar] [CrossRef]
Badri, A.; Boudreau-Trudel, B.; Souissi, A.S. Occupational health and safety in the industry 4.0 era: A cause for major concern? Saf. Sci. 2018, 109, 403–411. [Google Scholar] [CrossRef]
Forteza, F.J.; Carretero-Gomez, J.M.; Sese, A. Occupational risks, accidents on sites and economic performance of construction firms. Saf. Sci. 2017, 94, 61–76. [Google Scholar] [CrossRef]
Matthews, L.R.; Quinlan, M.; Jessup, G.M.; Bohle, P. Hidden costs, hidden lives: Financial effects of fatal work injuries on families. Econ. Labour Relat. Rev. 2022, 33, 586–609. [Google Scholar] [CrossRef]
Sheehan, L.R.; Lane, T.J.; Collie, A. The impact of income sources on financial stress in workers’ compensation claimants. J. Occup. Rehabil. 2020, 30, 679–688. [Google Scholar] [CrossRef] [PubMed]
Kim, D.K.; Park, S. An analysis of the effects of occupational accidents on corporate management performance. Saf. Sci. 2021, 138, 105228. [Google Scholar] [CrossRef]
Gander, P.; Hartley, L.; Powell, D.; Cabon, P.; Hitchcock, E.; Mills, A.; Popkin, S. Fatigue risk management: Organizational factors at the regulatory and industry/company level. Accid. Anal. Prev. 2011, 43, 573–590. [Google Scholar] [CrossRef] [PubMed]
Gatzert, N. The impact of corporate reputation and reputation damaging events on financial performance: Empirical evidence from the literature. Eur. Manag. J. 2015, 33, 485–499. [Google Scholar] [CrossRef]
Flammer, C. Corporate social responsibility and shareholder reaction: The environmental awareness of investors. Acad. Manag. J. 2013, 56, 758–781. [Google Scholar] [CrossRef]
Pouliakas, K.; Theodossiou, I. The economics of health and safety at work: An interdiciplinary review of the theory and policy. J. Econ. Surv. 2013, 27, 167–208. [Google Scholar] [CrossRef]
Chattopadhyay, S.; Chattopadhyay, D. Coal and other mining operations: Role of sustainability. In Fossil Energy; Springer: New York, NY, USA, 2020; pp. 333–356. [Google Scholar]
Gautam, P.K.; Gautam, R.K.; Banerjee, S.; Chattopadhyaya, M.; Pandey, J. Heavy metals in the environment: Fate, transport, toxicity and remediation technologies. Nova Sci. Publ. 2016, 60, 101–130. [Google Scholar]
Choi, K.; Yi, J.; Park, C.; Yoon, S. Deep learning for anomaly detection in time-series data: Review, analysis, and guidelines. IEEE Access 2021, 9, 120043–120065. [Google Scholar] [CrossRef]
Li, G.; Jung, J.J. Deep learning for anomaly detection in multivariate time series: Approaches, applications, and challenges. Inf. Fusion 2023, 91, 93–102. [Google Scholar] [CrossRef]
Memarzadeh, M.; Matthews, B.; Avrekh, I. Unsupervised anomaly detection in flight data using convolutional variational auto-encoder. Aerospace 2020, 7, 115. [Google Scholar] [CrossRef]
Pota, M.; De Pietro, G.; Esposito, M. Real-time anomaly detection on time series of industrial furnaces: A comparison of autoencoder architectures. Eng. Appl. Artif. Intell. 2023, 124, 106597. [Google Scholar] [CrossRef]
Hong, Y.Y.; Martinez, J.J.F.; Fajardo, A.C. Day-ahead solar irradiation forecasting utilizing gramian angular field and convolutional long short-term memory. IEEE Access 2020, 8, 18741–18753. [Google Scholar] [CrossRef]
Qin, Z.; Zhang, Y.; Meng, S.; Qin, Z.; Choo, K.K.R. Imaging and fusing time series for wearable sensor-based human activity recognition. Inf. Fusion 2020, 53, 80–87. [Google Scholar] [CrossRef]
Yokkampon, U.; Mowshowitz, A.; Chumkamon, S.; Hayashi, E. Autoencoder with Gramian Angular Summation Field for Anomaly Detection in Multivariate Time Series Data. J. Adv. Artif. Life Robot. 2022, 2, 206–210. [Google Scholar] [CrossRef]
Chen, W.S.; Yuan, S.Y. Some fractal dimension estimate algorithms and their applications to one-dimensional biomedical signals. Biomed. Eng. Appl. Basis Commun. 2002, 14, 100–108. [Google Scholar] [CrossRef]
Kaminsky, R.; Mochurad, L.; Shakhovska, N.; Melnykova, N. Calculation of the exact value of the fractal dimension in the time series for the box-counting method. In Proceedings of the 2019 9th International Conference on Advanced Computer Information Technologies (ACIT), Ceske Budejovice, Czech Republic, 5–7 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 248–251. [Google Scholar]
So, G.B.; So, H.R.; Jin, G.G. Enhancement of the box-counting algorithm for fractal dimension estimation. Pattern Recognit. Lett. 2017, 98, 53–58. [Google Scholar] [CrossRef]
Chai, R. Fractal dimension of fractional Brownian motion based on random sets. Fractals 2020, 28, 2040020. [Google Scholar] [CrossRef]
Valentim, C.A.; Inacio, C.M.C., Jr.; David, S.A. Fractal methods and power spectral density as means to explore EEG patterns in patients undertaking mental tasks. Fractal Fract. 2021, 5, 225. [Google Scholar] [CrossRef]
Balcı, M.A.; Batrancea, L.M.; Akgüller, Ö.; Gaban, L.; Rus, M.I.; Tulai, H. Fractality of Borsa Istanbul during the COVID-19 pandemic. Mathematics 2022, 10, 2503. [Google Scholar] [CrossRef]
Radu, V.; Dumitrescu, C.; Vasile, E.; Tanase, L.C.; Stefan, M.C.; Radu, F. Analysis of the Romanian capital market using the fractal dimension. Fractal Fract. 2022, 6, 564. [Google Scholar] [CrossRef]
Sarraj, M.; Ben Mabrouk, A. The Systematic Risk at the Crisis—A Multifractal Non-Uniform Wavelet Systematic Risk Estimation. Fractal Fract. 2021, 5, 135. [Google Scholar] [CrossRef]
Wang, W.; Xiang, H.; Zhao, D. Estimating the fractal dimension of hydrological time series by wavelet analysis. J. Sichuan Univ. (Eng. Sci. Ed.) 2005, 37, 1–4. [Google Scholar]
Jaleel, M.; Kucukler, O.F.; Alsalemi, A.; Amira, A.; Malekmohamadi, H.; Diao, K. Analyzing gas data using deep learning and 2-d gramian angular fields. IEEE Sens. J. 2023, 23, 6109–6116. [Google Scholar] [CrossRef]
Shankar, A.; Khaing, H.K.; Dandapat, S.; Barma, S. Epileptic seizure classification based on Gramian angular field transformation and deep learning. In Proceedings of the 2020 IEEE Applied Signal Processing Conference (ASPCON), Kolkata, India, 7–9 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 147–151. [Google Scholar]
Jiang, W.; Zhang, D.; Ling, L.; Lin, R. Time series classification based on image transformation using feature fusion strategy. Neural Process. Lett. 2022, 54, 3727–3748. [Google Scholar] [CrossRef]
Abbasi, S.; Famouri, M.; Shafiee, M.J.; Wong, A. OutlierNets: Highly compact deep autoencoder network architectures for on-device acoustic anomaly detection. Sensors 2021, 21, 4805. [Google Scholar] [CrossRef]
Thill, M.; Konen, W.; Wang, H.; Bäck, T. Temporal convolutional autoencoder for unsupervised anomaly detection in time series. Appl. Soft Comput. 2021, 112, 107751. [Google Scholar] [CrossRef]
Bond-Taylor, S.; Leach, A.; Long, Y.; Willcocks, C.G. Deep generative modelling: A comparative review of vaes, gans, normalizing flows, energy-based and autoregressive models. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7327–7347. [Google Scholar] [CrossRef]
Ghosh, P.; Sajjadi, M.S.; Vergari, A.; Black, M.J.; Schölkopf, B. From Variational to Deterministic Autoencoders. In Proceedings of the Eight International Conference on Learning Representations (ICLR 2020), Virtual, 26 April–1 May 2020. [Google Scholar]
Li, P.; Pei, Y.; Li, J. A comprehensive survey on design and application of autoencoder in deep learning. Appl. Soft Comput. 2023, 138, 110176. [Google Scholar] [CrossRef]

$Fractalfract 08 00604 g001$

Figure 1. Time series of occupational accidents in the NACE05 sector (Coal and Lignite Extraction) from 2012 to 2023. The figure shows a highly variable pattern with significant spikes in accident occurrences, particularly around 2013–2014, 2020, and 2022, indicating potential periods of abnormal safety conditions.

$Fractalfract 08 00604 g001$

$Fractalfract 08 00604 g002$

Figure 2. Time series of occupational accidents in the NACE30 sector (Manufacture of Other Transportation Vehicles) from 2012 to 2023. The figure illustrates a transition from stable accident rates in the early years to increased volatility and frequent spikes post-2019, indicating potential anomalies in the sector’s safety conditions.

$Fractalfract 08 00604 g002$

$Fractalfract 08 00604 g003$

Figure 3. Time series of occupational accidents in the NACE24 sector (Basic Metal Industry) from 2012 to 2023. The figure highlights multiple significant spikes in accident occurrences, particularly around 2014 and from 2020 onward, indicating potential anomalies in the sector’s safety conditions.

$Fractalfract 08 00604 g003$

$Fractalfract 08 00604 g004$

Figure 4. Fractional dimension series for NACE05 sector with sliding window size of 8.

$Fractalfract 08 00604 g004$

$Fractalfract 08 00604 g005$

Figure 5. Fractional dimension series for NACE30 sector with sliding window size of 8.

$Fractalfract 08 00604 g005$

$Fractalfract 08 00604 g006$

Figure 6. Fractional dimension series for NACE24 sector with sliding window size of 8.

$Fractalfract 08 00604 g006$

$Fractalfract 08 00604 g007$

Figure 7. Samples of images emerging from GASFs for NACE05.

$Fractalfract 08 00604 g007$

$Fractalfract 08 00604 g008$

Figure 8. Samples of images emerging from GASFs for NACE30.

$Fractalfract 08 00604 g008$

$Fractalfract 08 00604 g009$

Figure 9. Samples of images emerging from GASFs for NACE24.

$Fractalfract 08 00604 g009$

$Fractalfract 08 00604 g010$

Figure 10. Detected anomalies for NACE05 by using CAE.

$Fractalfract 08 00604 g010$

$Fractalfract 08 00604 g011$

Figure 11. Detected anomalies for NACE05 by using VAE.

$Fractalfract 08 00604 g011$

$Fractalfract 08 00604 g012$

Figure 12. Detected anomalies for NACE30 by using CAE.

$Fractalfract 08 00604 g012$

$Fractalfract 08 00604 g013$

Figure 13. Detected anomalies for NACE30 by using VAE.

$Fractalfract 08 00604 g013$

$Fractalfract 08 00604 g014$

Figure 14. Detected anomalies for NACE24 by using CAE.

$Fractalfract 08 00604 g014$

$Fractalfract 08 00604 g015$

Figure 15. Detected anomalies for NACE24 by using VAE.

$Fractalfract 08 00604 g015$

$Fractalfract 08 00604 g016$

Figure 16. Distributions of anomaly scores detected by CAE and VAE for NACE05 sector.

$Fractalfract 08 00604 g016$

$Fractalfract 08 00604 g017$

Figure 17. Distributions of anomaly scores detected by CAE and VAE for NACE30 sector.

$Fractalfract 08 00604 g017$

$Fractalfract 08 00604 g018$

Figure 18. Distributions of anomaly scores detected by CAE and VAE for NACE24 sector.

$Fractalfract 08 00604 g018$

Table 1. Anomaly detection metrics for NACE05 sector with CAE.

Dimension	Box Counting	Genton	Hall–Wood	Wavelet
Mean Anomaly Score	$1.356658 \times 10^{0}$	$1.658744 \times 10^{0}$	2.027355	1.914555
Silhouette Score	$- 3.305738 \times 10^{- 1}$	$- 2.848759 \times 10^{- 1}$	$- 0.191293$	$- 0.051750$
Mean LOF Score	$9.503116 \times 10^{6}$	$4.282022 \times 10^{6}$	751,575.983577	104,678.389288
Davies–Bouldin Index	$1.479347 \times 10^{1}$	$4.879715 \times 10^{0}$	2.774837	0.895968
Calinski–Harabasz Index	$9.022555 \times 10^{- 3}$	$2.515651 \times 10^{- 1}$	1.248670	2.189310
Dunn Index	$7.528629 \times 10^{- 5}$	$0.000000 \times 10^{0}$	0.000000	0.000000

Table 2. Anomaly detection metrics for NACE05 sector with VAE.

Dimension	Box Counting	Genton	Hall–Wood	Wavelet
Mean Anomaly Score	$1.356658 \times 10^{0}$	$1.658744 \times 10^{0}$	2.027355	1.914555
Silhouette Score	$- 3.305738 \times 10^{- 1}$	$- 2.538866 \times 10^{- 1}$	−0.162309	−0.051750
Mean LOF Score	$9.503116 \times 10^{6}$	$4.282022 \times 10^{6}$	751,575.983577	104,678.389288
Davies–Bouldin Index	$1.479347 \times 10^{1}$	$4.618381 \times 10^{0}$	2.325548	0.895968
Calinski–Harabasz Index	$9.022555 \times 10^{- 3}$	$2.565173 \times 10^{- 1}$	1.543713	2.189310
Dunn Index	$7.528629 \times 10^{- 5}$	$0.000000 \times 10^{0}$	0.000000	0.000000

Table 3. Anomaly detection metrics for NACE30 sector with CAE.

Dimension	Box Counting	Genton	Hall–Wood	Wavelet
Mean Anomaly Score	$1.387400 \times 10^{0}$	$1.675677 \times 10^{0}$	2.062522	1.960624
Silhouette Score	$3.521386 \times 10^{- 1}$	$- 9.698684 \times 10^{- 2}$	−0.052766	−0.066488
Mean LOF Score	$9.807179 \times 10^{6}$	$1.025165 \times 10^{7}$	798,700.715486	1.060215
Davies–Bouldin Index	$7.223060 \times 10^{- 1}$	$2.228118 \times 10^{0}$	2.619118	0.812285
Calinski–Harabasz Index	$8.523012 \times 10^{0}$	$1.330725 \times 10^{0}$	1.223944	0.979110
Dunn Index	$0.000000 \times 10^{0}$	$0.000000 \times 10^{0}$	0.000000	0.000000

Table 4. Anomaly detection metrics for NACE30 sector with VAE.

Dimension	Box Counting	Genton	Hall–Wood	Wavelet
Mean Anomaly Score	$1.387400 \times 10^{0}$	$1.675677 \times 10^{0}$	2.062522	1.960624
Silhouette Score	$3.521386 \times 10^{- 1}$	$- 1.781847 \times 10^{- 1}$	−0.059242	−0.066488
Mean LOF Score	$9.807179 \times 10^{6}$	$1.025165 \times 10^{7}$	798,700.715486	1.060215
Davies–Bouldin Index	$7.223060 \times 10^{- 1}$	$3.108188 \times 10^{0}$	3.163852	0.812285
Calinski–Harabasz Index	$8.523012 \times 10^{0}$	$4.657136 \times 10^{- 1}$	0.750734	0.979110
Dunn Index	$0.000000 \times 10^{0}$	$0.000000 \times 10^{0}$	0.000000	0.000000

Table 5. Anomaly detection metrics for NACE24 sector with CAE.

Dimension	Box Counting	Genton	Hall–Wood	Wavelet
Mean Anomaly Score	$1.394522 \times 10^{0}$	$1.745517 \times 10^{0}$	$2.179524 \times 10^{0}$	1.985830
Silhouette Score	$- 2.297276 \times 10^{- 1}$	$- 3.918465 \times 10^{- 1}$	$- 1.525400 \times 10^{- 2}$	−0.132634
Mean LOF Score	$6.225602 \times 10^{6}$	$6.270015 \times 10^{6}$	$1.053318 \times 10^{6}$	1.028634
Davies–Bouldin Index	$1.083728 \times 10^{0}$	$8.073675 \times 10^{1}$	$2.402526 \times 10^{0}$	1.112059
Calinski–Harabasz Index	$1.136497 \times 10^{0}$	$6.767209 \times 10^{- 4}$	$2.402606 \times 10^{0}$	1.400540
Dunn Index	$0.000000 \times 10^{0}$	$2.852035 \times 10^{- 5}$	$0.000000 \times 10^{0}$	0.000000

Table 6. Anomaly detection metrics for NACE24 sector with VAE.

Dimension	Box Counting	Genton	Hall–Wood	Wavelet
Mean Anomaly Score	$1.394522 \times 10^{0}$	$1.745517 \times 10^{0}$	$2.179524 \times 10^{0}$	1.985830
Silhouette Score	$- 2.297276 \times 10^{- 1}$	$- 3.913471 \times 10^{- 1}$	$- 6.768336 \times 10^{- 2}$	−0.132634
Mean LOF Score	$6.225602 \times 10^{6}$	$6.270015 \times 10^{6}$	$1.053318 \times 10^{6}$	1.028634
Davies–Bouldin Index	$1.083728 \times 10^{0}$	$6.999509 \times 10^{1}$	$3.100156 \times 10^{0}$	1.112059
Calinski–Harabasz Index	$1.136497 \times 10^{0}$	$7.413058 \times 10^{- 4}$	$1.155069 \times 10^{0}$	1.400540
Dunn Index	$0.000000 \times 10^{0}$	$2.852035 \times 10^{- 5}$	$0.000000 \times 10^{0}$	0.000000

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Akgüller, Ö.; Batrancea, L.M.; Balcı, M.A.; Tuna, G.; Nichita, A. Deep Learning-Based Anomaly Detection in Occupational Accident Data Using Fractional Dimensions. Fractal Fract. 2024, 8, 604. https://doi.org/10.3390/fractalfract8100604

AMA Style

Akgüller Ö, Batrancea LM, Balcı MA, Tuna G, Nichita A. Deep Learning-Based Anomaly Detection in Occupational Accident Data Using Fractional Dimensions. Fractal and Fractional. 2024; 8(10):604. https://doi.org/10.3390/fractalfract8100604

Chicago/Turabian Style

Akgüller, Ömer, Larissa M. Batrancea, Mehmet Ali Balcı, Gökhan Tuna, and Anca Nichita. 2024. "Deep Learning-Based Anomaly Detection in Occupational Accident Data Using Fractional Dimensions" Fractal and Fractional 8, no. 10: 604. https://doi.org/10.3390/fractalfract8100604

APA Style

Akgüller, Ö., Batrancea, L. M., Balcı, M. A., Tuna, G., & Nichita, A. (2024). Deep Learning-Based Anomaly Detection in Occupational Accident Data Using Fractional Dimensions. Fractal and Fractional, 8(10), 604. https://doi.org/10.3390/fractalfract8100604

Article Menu

Deep Learning-Based Anomaly Detection in Occupational Accident Data Using Fractional Dimensions

Abstract

1. Introduction

2. Materials and Methods

2.1. Data Set

2.2. Fractal Dimension

2.3. Gramian Angular Fields

2.4. Anomaly Detection

3. Results

3.1. Fractional Dimensions

3.1.1. Fractional Dimensions for NACE05

3.1.2. Fractional Dimensions for NACE30

3.1.3. Fractional Dimensions for NACE24

3.2. GAFs

3.3. Deep Learning Models Results

3.4. Results for NACE05

3.5. Results for NACE30

3.6. Results for NACE24

4. Discussion

4.1. Discussions on NACE05

4.2. Discussions on NACE30

4.3. Discussions on NACE24

4.4. Comparative Discussions

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI