2.1. Data Set
The dataset utilized in this study was meticulously gathered from the Turkish Republic Social Security Institution, a trusted and authoritative source of occupational data within Turkey. Access to this dataset was granted under official permission, ensuring that the data are both legitimate and reliable. Spanning from the beginning of 2012 to the beginning of 2023, this dataset provides a comprehensive temporal window, encompassing over a decade of occupational accident records. This extended timeframe allows for a thorough examination of trends, seasonality, and the evolution of safety practices within key industrial sectors in Turkey. The duration and scope of this dataset are critical for understanding long-term patterns and the impact of various interventions on occupational safety over the years.
For this study, we strategically selected three industrial sectors that are of paramount importance due to their high incidence rates of occupational accidents. These sectors were identified based on the latest available data from the 2022 Statistical Yearbook of the Social Security Institution, which ranks industries according to their occupational accident incidence rates. By focusing on these sectors, our study addresses areas with the greatest need for safety improvements and the highest potential impact on reducing workplace injuries. The sectors chosen are:
- NACE05—Coal and Lignite Extraction: The Coal and Lignite Extraction sector is a cornerstone of Turkey’s energy production, yet it is notoriously hazardous due to the inherent risks of mining. This sector has been a focal point for occupational safety due to the high frequency and severity of accidents associated with underground and surface mining operations. The dataset for this sector includes 2092 time ticks, corresponding to individual periods during which occupational accidents were recorded. Over the study period, a total of 67,547 occupational accidents were reported in this sector. This significant number of incidents reflects the perilous conditions faced by workers in coal and lignite extraction and underscores the critical need for ongoing safety enhancements and rigorous monitoring protocols in the mining industry.
- NACE30—Manufacture of Other Transportation Vehicles: This sector includes the manufacturing of a wide range of transportation vehicles, such as ships, trains, and aircraft, excluding motor vehicles. The sector is characterized by complex production processes, strict regulatory standards, and the need for precision in manufacturing, which collectively influence the safety landscape. The dataset for this sector consists of 2007 time ticks, with a total of 31,095 occupational accidents recorded. Although the total number of accidents is lower than in the coal and lignite extraction sector, the nature of the work involves high-stakes operations where safety failures can have severe consequences. The data highlight the critical need for targeted safety measures and continuous monitoring to prevent accidents in this highly specialized industry, where even minor incidents can lead to significant disruptions.
- NACE24—Basic Metal Industry: The Basic Metal Industry is foundational to numerous other industrial activities, producing essential materials such as steel, aluminum, and other non-ferrous metals. This sector is particularly hazardous due to the intense heat, heavy machinery, and hazardous chemicals involved in metal production processes. The dataset for this sector includes 2098 time ticks, during which a staggering total of 175,881 occupational accidents were recorded. This makes the Basic Metal Industry the most accident-prone sector among those analyzed in this study. The sheer volume of incidents in this sector highlights the persistent dangers that workers face and the urgent need for comprehensive safety protocols and innovations to mitigate risks. This sector’s high accident rate also underscores its critical role in the broader industrial ecosystem, where safety improvements could have far-reaching effects.
The time series presented for the NACE05 sector in Figure 1, which pertains to Coal and Lignite Extraction, exhibits a highly variable pattern with multiple significant spikes in occupational accidents over the period from 2012 to early 2023. The data show an initial period of low activity, followed by a sharp and intense cluster of spikes around 2013 and 2014. This period is characterized by frequent and severe accident occurrences, with peaks reaching nearly 300 incidents. After a relatively quiet period from 2015 to 2018, the series resumes with increased volatility starting around 2019, culminating in another set of spikes, particularly around 2020 and again in 2022, with a notable increase in accident frequency and magnitude.
The time series depicted in Figure 2 represents the total number of occupational accidents in the NACE30 sector, which involves the manufacture of other transportation vehicles, from the beginning of 2012 to the middle of 2023. The early part of the series shows a relatively low and stable accident rate, with some sporadic spikes around 2014. After a period of reduced activity from 2015 to 2018, there is a noticeable increase in both the frequency and magnitude of accidents, starting around 2019. This trend continues into 2022, with the data showing a dense clustering of accident occurrences and multiple significant spikes, particularly evident as the series progresses. The increase in volatility and the concentration of accidents in the later years suggest that the sector has experienced periods of heightened risk or operational changes that impacted safety.
The time series displayed in Figure 3 represents the total number of occupational accidents in the NACE24 sector, which pertains to the Basic Metal Industry, from the beginning of 2012 to 2023. The data are characterized by multiple prominent spikes, particularly concentrated around 2014 and then again from 2020 onward. These spikes reach over 200 accidents at their peak, indicating periods of significantly increased risk or adverse events within the industry. The early part of the series (2012 to 2014) shows a rapid escalation in accident counts, followed by a relative lull from 2015 to 2018. However, starting in 2019, the data exhibit a marked increase in both the frequency and intensity of accidents, with numerous spikes and a generally higher baseline of incidents continuing into 2022 and beyond.
Overall, the dataset’s extensive coverage, both in terms of time and sectoral focus, provides a rich foundation for advanced analysis. By concentrating on these three high-risk sectors, our study aims to offer actionable insights into the patterns and causes of occupational accidents. The detailed time series data for each sector allows for the application of sophisticated analytical techniques, such as Gramian Angular Fields (GAFs) and deep learning models, to detect anomalies and predict future risks. This approach not only helps in understanding past incidents, but also in proactively identifying areas where safety measures can be improved, ultimately contributing to the prevention of occupational accidents and the promotion of safer work environments across Turkey’s most hazardous industries.
2.2. Fractal Dimension
The fractal dimension is a numerical metric that characterizes the intricacy and self-similarity of time series data, providing insight into the fundamental dynamics of the system that generates the series. Unlike traditional geometric dimensions, the fractal dimension is not limited to integer values: it can take non-integer values that indicate the extent to which a time series fills the space it occupies. This concept is especially valuable for studying time series that display irregular, fragmented, or chaotic behavior, where conventional statistical techniques may be inadequate. Because it captures detailed patterns and scale invariance, the fractal dimension allows the roughness, complexity, and underlying structure of a series to be characterized.
The Box Counting method is one of the most common techniques used to estimate the fractal dimension of a time series or a geometric object [26,27,28]. This method involves plotting the time series $\{x_t\}_{t=1}^{N}$ in a two-dimensional space, where the $x$-axis represents time and the $y$-axis represents the value $x_t$ at each time step. To estimate the fractal dimension, a grid of boxes with a uniform size $\varepsilon$ is superimposed on the plot. The method counts the number of boxes $N(\varepsilon)$ that contain at least one point from the time series. By varying the box size $\varepsilon$ and repeating the counting process, a relationship between $N(\varepsilon)$ and $\varepsilon$ is established. The fractal dimension $D$ is then estimated by analyzing the scaling behavior of $N(\varepsilon)$ with respect to $\varepsilon$. Mathematically, the fractal dimension is defined as

$$D = \lim_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{\log (1/\varepsilon)},$$

where $D$ is obtained as the slope of the line in the plot of $\log N(\varepsilon)$ against $\log (1/\varepsilon)$.
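A minimal Python sketch of this procedure is given below: the series is placed in the unit square, the occupied grid cells are counted at several box sizes, and the slope of $\log N(\varepsilon)$ against $\log(1/\varepsilon)$ is fitted. The particular grid sizes and the normalization step are illustrative assumptions rather than prescriptions of the method.

```python
import numpy as np

def box_counting_dimension(series, box_sizes=(1/2, 1/4, 1/8, 1/16, 1/32, 1/64)):
    """Estimate the box-counting dimension of a 1D time series.

    The series is plotted in the unit square (time on x, value on y);
    for each box size eps the number of occupied boxes N(eps) is counted,
    and the dimension is the slope of log N(eps) vs. log(1/eps).
    """
    x = np.linspace(0.0, 1.0, len(series))
    y = np.asarray(series, dtype=float)
    y = (y - y.min()) / (y.max() - y.min() + 1e-12)  # normalize to [0, 1]

    counts = []
    for eps in box_sizes:
        # Assign each point to a grid cell of side eps and count unique cells.
        cells = set(zip((x // eps).astype(int), (y // eps).astype(int)))
        counts.append(len(cells))

    slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes)), np.log(counts), 1)
    return slope

# Example usage with a synthetic random-walk series:
# rng = np.random.default_rng(0)
# print(box_counting_dimension(np.cumsum(rng.normal(size=2000))))
```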
The Hall–Wood method provides an estimation of the fractal dimension by utilizing the properties of fractional Brownian motion, a stochastic process that generalizes Brownian motion [29,30]. Given a time series $\{x_t\}_{t=1}^{N}$, the method begins by computing the empirical variogram $\hat{\gamma}(h)$, which measures the variance of the differences between pairs of observations separated by a lag $h$. The variogram is calculated as

$$\hat{\gamma}(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left( x_{t_i + h} - x_{t_i} \right)^2,$$

where $n(h)$ represents the number of pairs with lag $h$. For time series exhibiting self-similar behavior, the variogram follows a power-law relation, $\hat{\gamma}(h) \sim c\,h^{2H}$, where $H$ is the Hurst exponent and $c$ is a constant. The fractal dimension $D$ is then derived from the Hurst exponent using the relation $D = 2 - H$. The Hurst exponent $H$ is typically estimated from the slope of the log-log plot of $\hat{\gamma}(h)$ versus $h$, where the linearity of this plot indicates self-similarity in the time series.
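Assuming a regularly sampled series and a small set of lags, this estimate can be sketched as follows: compute the empirical variogram, fit the log-log slope, and recover $H$ and $D = 2 - H$. The maximum lag of 10 is an arbitrary illustrative choice.

```python
import numpy as np

def variogram_fractal_dimension(series, max_lag=10):
    """Hall–Wood-style estimate: fit log variogram vs. log lag.

    gamma(h) = (1 / (2 n(h))) * sum (x_{t+h} - x_t)^2 is assumed ~ c * h^(2H),
    so H is half the fitted slope and D = 2 - H.
    """
    x = np.asarray(series, dtype=float)
    lags = np.arange(1, max_lag + 1)
    gamma = np.array([0.5 * np.mean((x[h:] - x[:-h]) ** 2) for h in lags])

    slope, _ = np.polyfit(np.log(lags), np.log(gamma), 1)
    hurst = slope / 2.0
    return 2.0 - hurst  # fractal dimension D = 2 - H
```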
The Genton method offers a robust approach to estimating the fractal dimension, particularly for time series that may be affected by noise or nonstationarity. This method also employs the variogram, but improves upon traditional methods by incorporating robust statistical measures [31,32]. The empirical variogram is calculated similarly to the Hall–Wood method, but the Genton method uses the median of the powered absolute differences to provide a more resilient estimate, defined as

$$\hat{\gamma}_p(h) = \frac{1}{2}\, \underset{i}{\operatorname{median}}\; \left| x_{t_i + h} - x_{t_i} \right|^{p},$$

where $p$ is typically set to 1 for the median absolute deviation or 2 for the standard variance. Like the Hall–Wood method, the Genton method assumes a power-law relationship between $\hat{\gamma}_p(h)$ and $h$, and the fractal dimension $D$ is estimated as $D = 2 - H$.
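Since this approach differs from the Hall–Wood estimate mainly in the robust summary of the increments, it can be sketched as a small variant of the previous function. The published Genton estimator is based on a more refined robust scale statistic, so the median-based version below should be read as an assumption-laden illustration only.

```python
import numpy as np

def robust_variogram_fractal_dimension(series, max_lag=10, p=2):
    """Robust (median-based) variant of the variogram estimate.

    Replaces the mean of squared increments with a median of |increments|^p,
    in the spirit of the Genton approach; treat this as a sketch only.
    """
    x = np.asarray(series, dtype=float)
    lags = np.arange(1, max_lag + 1)
    gamma = np.array([0.5 * np.median(np.abs(x[h:] - x[:-h]) ** p) for h in lags])

    slope, _ = np.polyfit(np.log(lags), np.log(gamma), 1)
    hurst = slope / p  # slope ≈ p·H under the assumed power law
    return 2.0 - hurst
```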
The Wavelet method leverages the multi-resolution properties of wavelets to estimate the fractal dimension and is particularly suitable for nonstationary time series [33,34]. The method begins by decomposing the time series $x(t)$ using a wavelet transform. A wavelet $\psi$ is applied at different scales $a$, producing wavelet coefficients $W(a, b)$, where $b$ is the translation parameter, according to the formula

$$W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt.$$

The energy or variance of the wavelet coefficients at each scale $a$ is then computed as $E(a) = \frac{1}{n_a} \sum_{b} |W(a, b)|^2$, where $n_a$ is the number of coefficients at scale $a$. The wavelet method relies on the scaling behavior of $E(a)$, which follows a power-law relationship $E(a) \sim a^{\beta}$, where $\beta$ is an exponent related to the Hurst exponent $H$. The Hurst exponent is then derived from the relation $H = (\beta - 1)/2$, and the fractal dimension $D$ is computed as

$$D = 2 - H.$$

This method is particularly powerful for analyzing the fractal properties of time series across different frequency bands, capturing the complexity and scaling behavior of the data.
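A sketch of a wavelet-based estimate using the PyWavelets discrete transform is shown below. The choice of a Daubechies wavelet, the number of decomposition levels, and the use of the dyadic DWT in place of the continuous transform are assumptions of this illustration; the relation $H = (\text{slope} - 1)/2$ follows the fractional-Brownian-motion-type scaling described above.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_fractal_dimension(series, wavelet="db4", max_level=6):
    """Wavelet-variance estimate of the fractal dimension.

    The variance of the detail coefficients at dyadic scale 2^j is fitted to a
    power law in the scale; under a fractional-Brownian-motion-type model the
    log-variance grows linearly with j, and H is recovered from that slope.
    """
    coeffs = pywt.wavedec(np.asarray(series, dtype=float), wavelet, level=max_level)
    details = coeffs[1:][::-1]  # detail coefficients, finest scale first (j = 1, 2, ...)

    levels = np.arange(1, len(details) + 1)
    energies = np.array([np.mean(d ** 2) for d in details])

    # log2 E(2^j) ≈ (2H + 1) * j + const  =>  H = (slope - 1) / 2
    slope, _ = np.polyfit(levels, np.log2(energies), 1)
    hurst = (slope - 1.0) / 2.0
    return 2.0 - hurst
```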
The Box Counting method is one of the most intuitive and widely used techniques for estimating fractal dimensions. It is straightforward to implement, and works well for a wide range of time series and geometric objects. The primary advantage of the Box Counting method is its simplicity and ability to provide a quick estimate of the fractal dimension by simply counting the number of grid boxes that contain points from the time series as the box size varies. However, its accuracy can be limited, especially when dealing with highly irregular or noisy data. The method can also be sensitive to the choice of grid size, and may struggle to accurately capture the fractal dimension of time series that exhibit complex scaling behavior.
In contrast, the Hall–Wood method offers a more sophisticated approach, particularly useful for time series that can be modeled as fractional Brownian motion. This method leverages the variogram, a statistical tool that captures the variability between time points at different lags, to estimate the fractal dimension. The Hall–Wood method is particularly effective for self-similar time series, where the variogram exhibits a power-law relationship with the lag. By focusing on the Hurst exponent, which characterizes the degree of self-similarity, the Hall–Wood method provides a robust estimate of the fractal dimension. However, its reliance on the assumption of fractional Brownian motion can be a limitation when dealing with time series that do not follow this model.
The Genton method enhances the Hall–Wood approach by introducing robust statistical techniques to mitigate the effects of noise and nonstationarity in the time series. While it also uses the variogram, the Genton method replaces the standard variance with a robust estimator, such as the median absolute deviation, to improve the reliability of the fractal dimension estimate. This makes the Genton method particularly suitable for time series that are contaminated by outliers or exhibit irregular behavior. The trade-off, however, is increased computational complexity and the potential need for careful parameter selection, such as the choice of the robust estimator, to ensure accurate results.
The Wavelet method stands out for its ability to handle nonstationary time series, making it particularly powerful in contexts where the data exhibit varying degrees of complexity across different scales. By decomposing the time series into wavelet coefficients at multiple scales, the Wavelet method captures the scaling behavior and energy distribution across different frequency bands. This multi-resolution analysis enables a nuanced estimation of the fractal dimension that can adapt to the inherent complexity of the time series. However, the method requires a good understanding of wavelet transforms and careful selection of the wavelet function, which can make it more challenging to apply compared to the Box Counting method. Additionally, the Wavelet method may involve higher computational costs, particularly for long or complex time series.
2.3. Gramian Angular Fields
This paper presents a complete framework for identifying anomalies in time series data of occupational accidents in various sectors. The methodology combines fractal dimensions and Gramian Angular Fields (GAF) with modern deep learning clustering methods. Our approach consists of using fractal dimensions to measure the intricacy of accident patterns, converting these dimensions into GAF heat maps, and utilizing deep learning algorithms to detect abnormal behavior that may indicate possible safety hazards.
Gramian Angular Fields (GAF) are a technique used to transform time series data into a 2D matrix representation, enabling the application of image-based techniques such as Convolutional Neural Networks (CNNs) for time series analysis [35,36,37]. The core idea behind GAF is to convert the time series data into a polar coordinate system, encoding the angular relationships between different time points, and then construct a Gram matrix from these angular values. This process preserves the temporal dynamics and allows the extraction of spatial features from the time series.
Given a univariate time series $X = \{x_1, x_2, \ldots, x_n\}$, where $x_i$ represents the value of the time series at time step $i$, the first step is to normalize the time series to the interval $[-1, 1]$. This normalization can be performed using min-max scaling,

$$\tilde{x}_i = \frac{\left(x_i - \max(X)\right) + \left(x_i - \min(X)\right)}{\max(X) - \min(X)},$$

where $\tilde{x}_i$ is the normalized value of $x_i$.

Next, each $\tilde{x}_i$ is encoded as an angular value $\phi_i$ in the polar coordinate system,

$$\phi_i = \arccos(\tilde{x}_i), \qquad \tilde{x}_i \in [-1, 1],$$

where $\phi_i$ represents the angular component corresponding to $\tilde{x}_i$ in polar coordinates. Note that $\tilde{x}_i$ is in the range $[-1, 1]$, so $\phi_i$ will be in the range $[0, \pi]$.

The Gramian Angular Summation Field (GASF) is defined as a matrix $G^{\mathrm{GASF}}$ whose element $G^{\mathrm{GASF}}_{ij}$ represents the cosine of the sum of the angular values $\phi_i$ and $\phi_j$:

$$G^{\mathrm{GASF}}_{ij} = \cos(\phi_i + \phi_j).$$

Expanding the cosine function using trigonometric identities, we can rewrite the above expression as

$$G^{\mathrm{GASF}}_{ij} = \cos\phi_i \cos\phi_j - \sin\phi_i \sin\phi_j.$$

Given that $\cos\phi_i = \tilde{x}_i$ and $\sin\phi_i = \sqrt{1 - \tilde{x}_i^2}$, we have

$$G^{\mathrm{GASF}}_{ij} = \tilde{x}_i \tilde{x}_j - \sqrt{1 - \tilde{x}_i^2}\,\sqrt{1 - \tilde{x}_j^2}.$$

The GASF matrix $G^{\mathrm{GASF}}$ encodes the temporal correlations between different time points through the summation of their angular representations.

The Gramian Angular Difference Field (GADF) is an alternative to GASF, focusing on the sine of the difference between the angular values $\phi_i$ and $\phi_j$:

$$G^{\mathrm{GADF}}_{ij} = \sin(\phi_i - \phi_j).$$

This can be expanded using the trigonometric identity

$$G^{\mathrm{GADF}}_{ij} = \sin\phi_i \cos\phi_j - \cos\phi_i \sin\phi_j.$$

Substituting $\sin\phi_i = \sqrt{1 - \tilde{x}_i^2}$ and $\cos\phi_i = \tilde{x}_i$, we obtain

$$G^{\mathrm{GADF}}_{ij} = \sqrt{1 - \tilde{x}_i^2}\,\tilde{x}_j - \tilde{x}_i \sqrt{1 - \tilde{x}_j^2}.$$

The GADF matrix $G^{\mathrm{GADF}}$ captures the temporal differences between time points, providing an alternative perspective on the time series data.
The choice of Gramian Angular Fields (GAF) in this study is pivotal for transforming time-series data into a format suitable for advanced anomaly detection techniques. GAF was selected over other potential methods because of its unique ability to convert one-dimensional time-series data into two-dimensional images while preserving temporal dependencies and intrinsic patterns. This transformation enables the application of powerful image-based deep learning models, such as Convolutional Autoencoders (CAE) and Variational Autoencoders (VAE), which are adept at capturing complex spatial features and detecting anomalies that might be overlooked by traditional time-series analysis methods.
One of the primary advantages of using GAF is its preservation of temporal dynamics. GAF encodes the temporal correlations of time-series data into spatial structures within images, meaning that both the magnitude and the temporal relationships between data points are maintained. This allows models to detect anomalies based on both value deviations and temporal patterns. Additionally, by converting time-series data into images, GAF facilitates the use of autoencoders.
Furthermore, GAF can represent nonlinear dynamics inherent in fractional dimension series, which are crucial for understanding the fractal characteristics of the data. This is particularly important in sectors where operational processes exhibit complex, nonlinear behaviors that traditional linear methods might not capture. The visual representation of time-series data through GAF also provides an intuitive way to interpret anomalies, aiding in the qualitative analysis of the data and supporting the quantitative findings of the models.
However, there are limitations to using GAF. Transforming time-series data into GAF images and processing them through deep learning models can be computationally intensive, requiring significant processing power and memory resources, especially with large datasets. While GAF aims to preserve temporal dependencies, the transformation process may lead to some loss of fine-grained temporal information. If the anomaly is highly dependent on specific time intervals, this could reduce the sensitivity of detection. Moreover, the conversion to two-dimensional images increases the dimensionality of the data, which might introduce challenges related to the “curse of dimensionality”, potentially affecting the performance of the models if not managed properly. The use of complex models like CAE and VAE on GAF-transformed data also increases the risk of overfitting, especially if the dataset is not sufficiently large or diverse to generalize well.
2.4. Anomaly Detection
A Convolutional Autoencoder (CAE) is a deep learning architecture well-suited for unsupervised anomaly detection [38,39]. It comprises two primary components: an encoder and a decoder. Given an input image $X$, such as a heat map derived from Gramian Angular Fields (GAF) of fractal dimensional time series data, the encoder maps this image to a lower-dimensional latent space. Mathematically, the encoder can be represented as a function $f_{\theta}$, parameterized by weights $\theta$, such that

$$z = f_{\theta}(X).$$

Here, $z$ is the latent representation, a compressed version of the input image $X$, capturing the essential features of the image while reducing its dimensionality.

The decoder then reconstructs the image from this latent representation. The decoder is represented as a function $g_{\phi}$, parameterized by weights $\phi$, such that the reconstructed image $\hat{X}$ is given by

$$\hat{X} = g_{\phi}(z) = g_{\phi}\big(f_{\theta}(X)\big).$$

The overall goal of the CAE is to learn the parameters $\theta$ and $\phi$ such that the reconstructed image $\hat{X}$ is as close as possible to the original input image $X$. This is achieved by minimizing the reconstruction error, which is typically measured using a loss function $\mathcal{L}(X, \hat{X})$. A common choice for the loss function is the Mean Squared Error (MSE), defined as

$$\mathcal{L}(X, \hat{X}) = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2,$$

where $n$ is the number of pixels in the image, $x_i$ is the pixel value at position $i$ in the original image, and $\hat{x}_i$ is the corresponding pixel value in the reconstructed image.

During training, the CAE minimizes this loss function over a set of normal images, leading to learned mappings $f_{\theta}$ and $g_{\phi}$ that are optimized to reconstruct normal data with low error. The trained CAE is then used for anomaly detection. When presented with a new image $X_{\mathrm{new}}$, the reconstruction error $\mathcal{L}(X_{\mathrm{new}}, \hat{X}_{\mathrm{new}})$ is computed. For normal images, this error is expected to be small, as the CAE has learned to reconstruct such images accurately. However, for anomalous images, the reconstruction error tends to be significantly higher because the CAE has not encountered such patterns during training and, therefore, fails to reconstruct them accurately.

To detect anomalies, a threshold $\tau$ is set on the reconstruction error. If the error for a new image exceeds this threshold, i.e., if

$$\mathcal{L}(X_{\mathrm{new}}, \hat{X}_{\mathrm{new}}) > \tau,$$

then the image is flagged as anomalous. The threshold $\tau$ can be determined based on the distribution of reconstruction errors for a validation set of normal images, often using a percentile or statistical measure.
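A compact PyTorch sketch of such a CAE is given below. The layer widths, the assumed 64×64 single-channel input, the Adam optimizer, the 95th-percentile threshold, and the placeholder names (`normal_loader`, `val_images`, `new_images`) are illustrative assumptions rather than the configuration used in this study.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Minimal CAE for single-channel GAF images (assumed 64x64)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                                   # z = f_theta(X)
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),        # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),       # 32 -> 16
        )
        self.decoder = nn.Sequential(                                   # X_hat = g_phi(z)
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def reconstruction_errors(model, images):
    """Per-image MSE reconstruction error L(X, X_hat)."""
    with torch.no_grad():
        recon = model(images)
        return ((images - recon) ** 2).mean(dim=(1, 2, 3))

# Training sketch: minimize MSE over normal GAF images, then set tau from
# the validation error distribution and flag images whose error exceeds it.
# model = ConvAutoencoder()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# for batch in normal_loader:
#     loss = nn.functional.mse_loss(model(batch), batch)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
# tau = torch.quantile(reconstruction_errors(model, val_images), 0.95)
# flags = reconstruction_errors(model, new_images) > tau
```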
CAEs are particularly effective in this context, because they utilize convolutional layers in both the encoder and decoder, which excel at capturing spatial hierarchies in image data. This ability to model both local and global features allows CAEs to effectively learn the intricate spatial and temporal patterns present in GAF heat maps, making them ideal for identifying subtle deviations and anomalies in the data.
The CAE is a reliable technique for unsupervised anomaly detection, especially useful when the objective is to identify and reconstruct intricate spatial patterns in the data. It works by compressing the input image into a latent space and subsequently reconstructing it, with the goal of minimizing the reconstruction error. This process is highly effective when the normal data display consistent patterns, enabling the CAE to learn a concise representation that reproduces these patterns with minimal error. Anomalies are identified when the reconstruction error exceeds a predetermined threshold, because the CAE cannot effectively rebuild patterns it has not encountered during training. A possible drawback of CAEs, however, is their deterministic nature: each input is mapped to a single latent vector, which may not adequately represent the inherent variability in the data, especially in the presence of noise or minor anomalies.
A Variational Autoencoder (VAE) is a probabilistic graphical model that extends the standard autoencoder by introducing a probabilistic approach to the latent space [40,41,42]. Unlike traditional autoencoders, which directly map the input image $X$ to a single point in the latent space, a VAE models the latent space as a distribution. Specifically, the encoder in a VAE does not output a fixed latent vector $z$, but rather the parameters of a probability distribution, typically a multivariate Gaussian distribution $q_{\theta}(z \mid X)$, where $\theta$ represents the parameters of the encoder network.

The encoder maps the input image $X$ to the mean $\mu(X)$ and the logarithm of the variance $\log \sigma^2(X)$ of the Gaussian distribution

$$q_{\theta}(z \mid X) = \mathcal{N}\big(z;\, \mu(X),\, \operatorname{diag}(\sigma^2(X))\big).$$

The latent variable $z$ is then sampled from this Gaussian distribution,

$$z \sim q_{\theta}(z \mid X).$$

This probabilistic formulation allows the VAE to capture the uncertainty in the latent space, providing a richer and more flexible representation of the input data.
The decoder, parameterized by $\phi$, takes the sampled latent variable $z$ and reconstructs the image $\hat{X}$ by mapping $z$ back to the image space,

$$\hat{X} = g_{\phi}(z).$$

The VAE is trained to maximize a variational lower bound on the data likelihood, known as the Evidence Lower Bound (ELBO). The ELBO consists of two terms: the reconstruction loss, which ensures that the decoded images resemble the input images, and the Kullback–Leibler (KL) divergence, which regularizes the latent space to be close to a prior distribution (usually a standard normal distribution $p(z) = \mathcal{N}(0, I)$). Mathematically, the ELBO is given by

$$\mathcal{L}_{\mathrm{ELBO}}(\theta, \phi; X) = \mathbb{E}_{q_{\theta}(z \mid X)}\big[\log p_{\phi}(X \mid z)\big] - D_{\mathrm{KL}}\big(q_{\theta}(z \mid X)\,\|\, p(z)\big).$$

The first term, $\mathbb{E}_{q_{\theta}(z \mid X)}[\log p_{\phi}(X \mid z)]$, represents the reconstruction loss, typically implemented as the negative log-likelihood of the reconstructed image given the latent variable $z$. The second term, $D_{\mathrm{KL}}\big(q_{\theta}(z \mid X)\,\|\, p(z)\big)$, is the KL divergence between the learned latent distribution $q_{\theta}(z \mid X)$ and the prior $p(z)$, enforcing that the learned latent distribution is close to the prior distribution.
For anomaly detection, VAEs are particularly useful because they not only learn to reconstruct the data, but also provide a probabilistic model of the latent space. Given a new input image $X_{\mathrm{new}}$, the encoder maps it to a latent distribution $q_{\theta}(z \mid X_{\mathrm{new}})$, and the decoder reconstructs the image $\hat{X}_{\mathrm{new}}$. The anomaly score can be computed by evaluating the log-likelihood of the reconstructed image under the learned model,

$$\mathrm{score}(X_{\mathrm{new}}) = -\,\mathbb{E}_{q_{\theta}(z \mid X_{\mathrm{new}})}\big[\log p_{\phi}(X_{\mathrm{new}} \mid z)\big].$$

Images with low log-likelihood, indicating that they do not fit well with the learned distribution of normal data, are flagged as anomalies. This approach is particularly powerful when dealing with GAF heat maps of fractal time series, as it can effectively model the inherent variability in the data while identifying outliers that deviate from the normal patterns.
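A minimal PyTorch sketch of a convolutional VAE with the reparameterization trick and the ELBO objective follows. The architecture mirrors the CAE sketch above; the layer sizes, the latent dimension of 32, and the MSE surrogate for the reconstruction term are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvVAE(nn.Module):
    """Minimal convolutional VAE for single-channel GAF images (assumed 64x64)."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(32 * 16 * 16, latent_dim)      # mu(X)
        self.fc_logvar = nn.Linear(32 * 16 * 16, latent_dim)  # log sigma^2(X)
        self.fc_dec = nn.Linear(latent_dim, 32 * 16 * 16)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization
        x_hat = self.dec(self.fc_dec(z).view(-1, 32, 16, 16))
        return x_hat, mu, logvar

def negative_elbo(x, x_hat, mu, logvar):
    """Reconstruction term (MSE surrogate) + KL(q(z|X) || N(0, I))."""
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Training minimizes negative_elbo over normal images; at test time the
# reconstruction term of a new image serves as its anomaly score.
```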
The VAE addresses some of these limitations of the CAE by adopting a probabilistic treatment of the latent space. Instead of mapping the input directly to a single point, the VAE maps it to a distribution, usually a Gaussian, from which latent variables are sampled. This probabilistic modeling enables the VAE to capture a wider spectrum of variation in the data, enhancing its capacity to model intricate distributions and to identify anomalies that a deterministic model such as the CAE may not readily reproduce. The VAE’s ability to represent uncertainty and to generate novel samples from the learned latent distribution also offers added versatility in anomaly detection, as images with low likelihood under the learned distribution can be flagged as anomalies. The VAE does, however, complicate training, since it requires optimizing a variational lower bound, balancing reconstruction accuracy against the regularization of the latent distribution.
In the realm of anomaly detection, the absence of ground truth labels presents a significant challenge for quantitatively evaluating the performance and reliability of different detection methods. Without definitive labels indicating which data points are truly anomalous, researchers and practitioners must rely on a variety of intrinsic and comparative metrics to assess the effectiveness of their models. These metrics provide mathematical frameworks to understand how well an anomaly detection method distinguishes between normal and anomalous instances based on the inherent structure and distribution of the data.
The Mean Anomaly Score serves as a fundamental metric by averaging the anomaly scores assigned to each data point across all detection methods employed. Mathematically, if each data point $x_i$ is assigned an anomaly score $s_{ij}$ by method $j$, the mean anomaly score for data point $x_i$ is given by

$$\bar{s}_i = \frac{1}{m} \sum_{j=1}^{m} s_{ij},$$

where $m$ is the number of detection methods. The overall mean anomaly score is then the average of these individual mean scores across all data points, providing a singular value that encapsulates the general tendency of the dataset towards anomaly as detected by the ensemble of methods.
The Silhouette Score offers insight into the consistency within clusters of data points, effectively measuring how similar each data point is to its own cluster compared to other clusters. For a data point $x_i$, the Silhouette coefficient $s(i)$ is calculated using the mean intra-cluster distance $a(i)$ and the mean nearest-cluster distance $b(i)$, such that

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}.$$

The overall Silhouette Score is the average of $s(i)$ across all data points, with values ranging from −1 to +1. A higher Silhouette Score indicates well-separated and distinct clusters, suggesting effective differentiation between normal and anomalous data points.
The Mean Local Outlier Factor (LOF) score quantifies the degree to which a data point is considered an outlier based on the local density deviation with respect to its neighbors. For each data point $x_i$, the LOF score $\mathrm{LOF}_k(x_i)$ is computed by comparing the local density of $x_i$ to the local densities of its $k$-nearest neighbors. The mean LOF score is the average of these scores across all data points, with higher values indicating a greater likelihood of being an outlier. This metric captures the relative outlierness of data points without relying on predefined cluster structures.
The Davies–Bouldin Index (DBI) assesses the average similarity ratio of each cluster with its most similar cluster, where similarity is defined in terms of within-cluster scatter and between-cluster separation. Mathematically, for each cluster $i$, the DBI identifies the cluster $j$ that maximizes the ratio

$$R_{ij} = \frac{s_i + s_j}{d_{ij}},$$

where $s_i$ and $s_j$ represent the average intra-cluster distances for clusters $i$ and $j$, respectively, and $d_{ij}$ is the distance between the centroids of clusters $i$ and $j$. The overall DBI is the mean of these maximal ratios across all clusters. A lower Davies–Bouldin Index signifies better clustering performance, with compact and well-separated clusters indicating more effective anomaly detection.
The Calinski–Harabasz Index (CHI) evaluates the ratio of between-cluster variance to within-cluster variance, providing a measure of cluster separation and compactness. For a given clustering configuration, the CHI is calculated as

$$\mathrm{CHI} = \frac{\operatorname{tr}(B_k)}{\operatorname{tr}(W_k)} \cdot \frac{N - k}{k - 1},$$

where $\operatorname{tr}(B_k)$ is the trace of the between-group dispersion matrix, $\operatorname{tr}(W_k)$ is the trace of the within-cluster dispersion matrix, $N$ is the total number of data points, and $k$ is the number of clusters. A higher Calinski–Harabasz Index indicates better-defined clusters with greater separation and lower intra-cluster variance, thereby reflecting more precise anomaly detection.
Lastly, the Dunn Index measures the ratio of the smallest inter-cluster distance to the largest intra-cluster distance, serving as an indicator of cluster compactness and separation. For a set of clusters, the Dunn Index is defined as

$$\mathrm{DI} = \frac{\min_{1 \le i < j \le k} \delta(C_i, C_j)}{\max_{1 \le l \le k} \Delta(C_l)},$$

where $\delta(C_i, C_j)$ represents the minimum distance between any two points in clusters $C_i$ and $C_j$, and $\Delta(C_l)$ denotes the maximum intra-cluster distance within cluster $C_l$. A higher Dunn Index signifies better clustering quality, with well-separated and tightly knit clusters indicating superior anomaly detection performance.
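The sketch below shows one way these evaluation metrics could be computed in Python: the Silhouette, Davies–Bouldin, and Calinski–Harabasz scores come from scikit-learn, the mean LOF score from LocalOutlierFactor, and the Dunn Index and mean anomaly score are written out directly. Names such as `latent_features`, `cluster_labels`, and `scores_by_method` are placeholders assumed for illustration, not objects defined in this study.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score
from sklearn.neighbors import LocalOutlierFactor

def dunn_index(X, labels):
    """Min inter-cluster distance over max intra-cluster diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    inter = min(cdist(a, b).min() for i, a in enumerate(clusters)
                for b in clusters[i + 1:])
    intra = max(cdist(c, c).max() for c in clusters)
    return inter / intra

def evaluate(latent_features, cluster_labels, scores_by_method):
    """Intrinsic evaluation without ground-truth anomaly labels.

    scores_by_method: array of shape (n_points, n_methods) holding per-method
    anomaly scores; its overall mean gives the mean anomaly score.
    """
    lof = LocalOutlierFactor(n_neighbors=20).fit(latent_features)
    return {
        "mean_anomaly_score": float(np.mean(scores_by_method)),
        "silhouette": silhouette_score(latent_features, cluster_labels),
        "mean_lof": float(-lof.negative_outlier_factor_.mean()),  # LOF > 1 suggests outliers
        "davies_bouldin": davies_bouldin_score(latent_features, cluster_labels),
        "calinski_harabasz": calinski_harabasz_score(latent_features, cluster_labels),
        "dunn": dunn_index(latent_features, cluster_labels),
    }
```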