Anomaly Detection Algorithm for Urban Infrastructure Construction Equipment based on Multidimensional Time Series

Wu, Bingjian; Zhang, Fan; Wang, Yi; Hu, Min; Bai, Xue

doi:10.3390/su16083335


by Bingjian Wu ^1,2, Fan Zhang ^1,2, Yi Wang ^1,2, Min Hu ^1,2,* and Xue Bai ^1,2

¹ SHU-SUCG Research Center for Building Industrialization, Shanghai University, Shanghai 201800, China
² SILC Business School, Shanghai University, Shanghai 201800, China
^* Author to whom correspondence should be addressed.

Sustainability 2024, 16(8), 3335; https://doi.org/10.3390/su16083335

Submission received: 4 January 2024 / Revised: 7 April 2024 / Accepted: 10 April 2024 / Published: 16 April 2024


Abstract:

Safety is the foundation of urban sustainable development. The urban construction and operation process involves a large amount of multidimensional time series data. By detecting anomalies in these multidimensional time subsequences (MTSs), decision support can be provided for early warning of urban construction and operation risks. Considering the complexity of urban infrastructure, there is an urgent need for fast and accurate anomaly detection. This paper proposes a real-time anomaly detection algorithm based on improved distance measurement (RADIM). RADIM retains the relationships between dimensions in multidimensional subsequences, using an Extended Frobenius Norm with Local Weights (EFN_lw) and a Euclidean distance based on multidimensional data (ED_mv) to measure the similarity of MTSs. Moreover, a threshold update mechanism based on First-order Mean Difference (TMFD) is designed to detect real-time anomalies by assessing deviations. This method has been applied to tunnel construction. According to comparative experiments, RADIM exhibits better adaptability, real-time performance, and accuracy in risk warning of tunnel boring machines and construction status.

Keywords:

urban infrastructure construction; multidimensional time series; real time; abnormal detection; distance measurement

1. Introduction

With the development of smart cities, urban construction, operation, and maintenance generate a large number of multidimensional time series containing real-time status information of equipment and facilities [1]. Finding abnormal operating states in these multidimensional time series and taking timely measures to avoid risks have become a focus of attention in the sustainable development of urban areas. Affected by various factors, equipment inevitably suffers breakdowns, degradation, and failures, which can result in accidents, severe environmental disasters, and casualties [1,2]. To improve the safety and quality of urban construction, it is of great significance to detect equipment abnormalities in advance and deal with them. The application of Internet of Things technology and the era of Industry 4.0 have significantly increased the convenience and reliability of monitoring technology [3]. Increasing the number of sensors and improving the quality of data collection on large engineering equipment will enable more effective diagnosis of anomalies. Therefore, time series anomaly detection based on sensing information has attracted increasing attention and made progress [4,5].

At present, most time-series anomaly detection methods address single-dimensional problems [6]. Multidimensional problems are often transformed into single-dimensional problems through dimension reduction (such as principal component analysis (PCA) [7], kernel PCA (KPCA) [8], and locally linear embedding (LLE) [9]) or through independent analysis of each dimension [10]. The main methods include K-means [11], SVM [12], and LSTM [13]. However, they have certain limitations. First, dimensionality reduction leads to data distortion or loss, making it difficult to balance data characterization accuracy against computational complexity. Second, independent dimension analysis ignores the interactions between dimensions, so the relationships between them cannot be discerned. Moreover, existing algorithms are usually limited by their computational complexity, which makes it difficult for them to meet the needs of real-time detection for large-scale engineering equipment. To address these issues, some scholars have begun to explore direct anomaly detection methods based on multidimensional time series. For instance, they use correlation measurement [14,15], clustering [16], transformers [17], or deep neural networks [18] to establish models and perform anomaly recognition for sample points or sample subsequences. These methods consider the relationships between dimensions, reduce the distortion of the original data, and are more suitable for anomaly detection in multidimensional time series. Although new methods are constantly being developed, particularly in machine learning and deep neural networks, they cannot be applied with stable performance across all types of anomaly detection situations [7,19,20]. The main difficulties lie in the following aspects:

(1): The actual engineering data distribution is unbalanced between normal and abnormal data. Traditional methods cannot effectively retain the relationships between the dimensions of multidimensional time series, overlooking the value of inter-dimensional relationships in anomaly detection tasks.
(2): The complex relationships of dimensional data contribute to the algorithm's relatively low adaptability, often failing to detect anomalies effectively due to changes in external situations.
(3): There are higher requirements for real-time detection of anomalies, particularly in the complex systems and large equipment used in urban construction. If effective constraints are not promptly applied to anomalies, they can quickly spread throughout the entire system, affecting the quality of construction projects.

A new algorithm called RADIM (real-time abnormal detection algorithm based on improved distance measurements) has been proposed to address the bottleneck issue in anomaly detection for multidimensional time series. RADIM combines two essential methods to utilize the relationships between dimensions of multidimensional sequences effectively. The first method is the Extended Frobenius Norm with Local Weights (EFN_lw), which is particularly adept at capturing the nuances of multidimensional data distributions. It calculates similarity by considering the local weightings of data points, thereby improving sensitivity to real-time changes in data patterns. The second method is the Euclidean distance based on the mean value (ED_mv), which simplifies the complexity of multidimensional analysis by focusing on the mean values of distances, providing a balance between accuracy and computational efficiency.

RADIM also includes the dynamic threshold mechanism based on first-order difference (TMFD), which dynamically sets a reasonable threshold based on the latest data patterns. This approach adapts to real-time changes more effectively than static thresholds, addressing the crucial need for timely response in anomaly detection. Using a sliding window sequence segmentation method further supports the real-time aspect of the algorithm, ensuring that the detection is continuously updated with the latest data.

Theoretical analysis and experimental results show that our algorithm has relatively low time complexity, with single detection times typically under one second. This meets the stringent requirements for real-time anomaly detection in large-scale equipment. Furthermore, RADIM can perform anomaly detection without sufficient support from abnormal data, thereby overcoming the low detection accuracy caused by data imbalance. This also enhances RADIM's capability to detect anomalies in complex, multidimensional scenarios where traditional methods might struggle.

The rest of this paper is organized as follows. Section 2 covers modeling preparation: it analyzes and classifies multidimensional time series anomalies in light of the anomalies of engineering equipment and, on this basis, introduces the process and principles of the real-time abnormal detection algorithm based on improved distance measurements. Section 3 describes the basic processes and formulas for the critical modules of the algorithm. Section 4 presents the model evaluation indicators and six experiments designed to verify the algorithm's performance, analyzing its accuracy, stability, and real-time performance. The last section concludes and outlines the next steps for the research.

2. Algorithm Definition

2.1. Anomaly Types for Multidimensional Time Series

In anomaly detection, an anomaly refers to data that differ sharply from the rest of the data set and are generated by a different mechanism rather than by random deviation [21]. Abnormal data usually fall into three categories: point, contextual, and collective [22]. A point anomaly usually refers to a single data instance exceeding the normal range corresponding to the rest of the data. A contextual anomaly refers to a data instance that is abnormal in a specific context. A collective anomaly represents a collection of related data instances that are abnormal with respect to the entire data set.

For time-series data, both point anomalies and collective anomalies can be converted into contextual anomalies, as time-series anomalies possess the contextual attributes and behavioral attributes of contextual anomalies [22]. However, an ideal multidimensional time series detection method needs to compare and analyze the single changes of time series characteristics from the perspective of time attributes and behavior attributes and consider the relationship between dimensions. Combined with the characteristics of engineering equipment anomalies, multidimensional time subsequences (MTS) anomalies were divided into two categories: asynchronous anomalies and synchronous anomalies, as shown in Figure 1.

2.1.1. Asynchronous Anomalies

Anomaly types are defined from two perspectives: the correlation between dimensions and the temporal attributes of anomaly events. On the one hand, when the physical (structural or systemic) correlation between dimensions is weak, the behavioral attributes of the dimensions differ. In this case, the anomaly is classified as asynchronous whether or not the time attributes are consistent, as shown in Figure 1a. On the other hand, when there is a strong correlation between dimensions, so that their behavioral attributes are closely related, but the time attributes are inconsistent, the anomaly is also considered asynchronous, as depicted in area ① in Figure 1b. For instance, during shield tunneling, insufficient grouting at a specific location due to blockage in the grouting pipeline is one of many factors affecting ground settlement, so only a weak correlation exists between it and the ground settlement curve. In this case, the data from the blocked grouting pipeline and the ground settlement data form an MTS representing an asynchronous anomaly with inconsistent behavioral attributes. Additionally, since the grouting pipelines in the shield are independent of each other, the detection data would show a significantly abnormal distribution in only one grouting pipeline. In this case, the data from each grouting pipeline form an MTS representing an asynchronous anomaly with inconsistent time attributes.

2.1.2. Synchronous Anomalies

A synchronous anomaly occurs when multiple sensors in an engineering device have consistent temporal attributes and strong physical (structural or systemic) correlations between each dimension. Generally, during the operation of engineering equipment, if the operation mode changes abnormally due to the external environment, correlated deviations differing from normal characteristics could appear on the detected unit. For example, in the process of shield tunneling, when the external load changes suddenly, sensors in the same region will fluctuate at the same time, showing similar abnormal behavior, as shown in area ② in Figure 1b.

2.2. Algorithm Design

2.2.1. Detection Process

Data characteristics vary across engineering backgrounds. Therefore, problem analysis and data processing should incorporate the characteristics of the engineering equipment before anomaly detection. The time window is usually set according to the equipment periodicity, and an MTS is 'cut' by a fixed-width non-overlapping window. Different similarity measurement methods for data feature extraction are designed by combining the characteristics of synchronous and asynchronous anomalies from the perspective of equipment component characteristics. According to the status of the equipment, the normal class and abnormal class are distinguished in the training phase. In the testing phase, the running state of the equipment is tracked to modify the allowable deviation fluctuation range. Once the deviation value of a time subsequence is significantly beyond the range, the value can be considered an outlier, and the corresponding subsequence is abnormal. The algorithm is named RADIM (real-time abnormal detection algorithm based on improved distance measurements), and its framework is shown in Figure 2.

2.2.2. Matching Principle of Anomaly Types and Similarity Measurements

The matching of anomaly types and the similarity measurement method are the cores of the model. Based on the analysis of the structure of the equipment system and the historical inspection data, the model assumes that the distribution characteristics of multidimensional data are relatively stable over a period of time and do not change significantly. Therefore, two measurement methods were set up to reflect the data distribution characteristics and the characteristics of anomaly types. The selection criteria and the corresponding anomaly types are shown in Table 1.

One method is the extended Frobenius Norm based on local weight (EFN_lw). EFN_lw transforms the time sub-series matrix into eigenvalues and eigenvectors based on the singular value decomposition and then calculates the similarity between different time sub-series. Because EFN_lw can preserve the original features of the data, it is easy to capture the corresponding anomalous events and reduce the false negative rate in the case of synchronous anomalies with complex data distribution or asynchronous anomalies with weak correlations between dimensions.

The other method is the Euclidean distance based on the mean value (ED_mv). It targets asynchronous anomalies in which the dimensions of the multidimensional time series are strongly correlated and the temporal properties of the anomalous events are inconsistent. In this case, ED_mv measures the similarity of MTSs by the mean straight-line distance between the dimensions within each time subsequence, which responds to their inconsistency in the time domain. Since the Euclidean distance calculation is simple, the ED_mv method is preferred for this anomaly type.
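The matching principle described in this section can be condensed into a small decision rule. The sketch below is only an illustration of the text above: the function name `choose_measure` and its two boolean inputs are hypothetical, and the rule is inferred from the prose rather than taken from Table 1.

```python
def choose_measure(strong_correlation: bool, consistent_timing: bool) -> str:
    """Pick a similarity measure from the anomaly characteristics.

    Assumption: the rule is read off the prose of Sections 2.1 and 2.2.2,
    not from Table 1, which is not reproduced here.
    """
    # Strong correlation with inconsistent timing: asynchronous anomaly
    # under strong correlation, for which ED_mv is preferred.
    if strong_correlation and not consistent_timing:
        return "ED_mv"
    # Synchronous anomalies, and asynchronous anomalies under weak
    # correlation, are both matched to EFN_lw.
    return "EFN_lw"
```

In practice the two inputs would come from engineering knowledge of the equipment, as in the per-data-set analysis of Section 4.1.2.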

3. Algorithm Implementation

3.1. Overview

Suppose there is a data set Θ = {x_{1}, x_{2}, \dots, x_{η}} containing η samples, where x_{i} \in R^{n} and n represents the dimension of the time series. Using a sliding window with a width of m (m ≫ n), Θ is decomposed into MTSs of equal width A_{0}, A_{1}, \dots, A_{t} (t < η / m, t \in N^{*}), and then a dynamic workspace Π is set. Π is similar to a queue, which follows the principle of 'first-in-first-out'. The first k slots of the dynamic workspace hold the normal subsequences, and the last one holds the subsequence to be detected. The algorithm includes three main steps. First, it selects the appropriate method to calculate the MTS similarity based on the MTS distribution characteristics. Subsequently, it dynamically updates the normal deviation range (NDR) based on the results of continuous data detection, which ensures the adaptability of the algorithm to changing patterns. Finally, the similarity calculation and the updated NDR are used to determine whether there is any abnormality in the MTS. The specific process is shown in Figure 3.
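The segmentation and the first-in-first-out workspace can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `segment`, `Workspace`, and the toy sizes are hypothetical names and values, non-overlapping windows are assumed (as stated in Section 2.2.1), and the number of windows is taken as the integer part of η / m.

```python
from collections import deque

import numpy as np


def segment(theta: np.ndarray, m: int) -> list:
    """Cut an (eta, n) series into non-overlapping (m, n) windows.

    Assumption: an incomplete trailing window is dropped.
    """
    t = theta.shape[0] // m
    return [theta[i * m:(i + 1) * m] for i in range(t)]


class Workspace:
    """Holds the k most recent normal subsequences plus one candidate slot."""

    def __init__(self, k: int):
        # deque(maxlen=k) evicts the oldest entry first (first-in-first-out).
        self.normal = deque(maxlen=k)
        self.candidate = None

    def offer(self, window: np.ndarray) -> None:
        """Place the subsequence to be detected in the last slot."""
        self.candidate = window

    def accept(self) -> None:
        """Move the candidate into the normal set once it is judged normal."""
        self.normal.append(self.candidate)
        self.candidate = None


# Toy example: eta = 10 samples, n = 2 dimensions, window width m = 3.
theta = np.arange(20, dtype=float).reshape(10, 2)
windows = segment(theta, m=3)
ws = Workspace(k=2)
for w in windows:
    ws.offer(w)
    ws.accept()  # here every window is simply treated as normal
```

The toy sizes do not satisfy m ≫ n; they only keep the example small.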

3.2. Similarity Measurement

First, the similarity measurement method was selected according to the data distribution features, and then the distance values of the time subsequences in Π were calculated using the corresponding algorithmic rules.

3.2.1. EFN_lw

EFN_lw is based on the extended Frobenius norm (Eros), a similarity measurement proposed by Yang et al. [23]. This method, which detects offline anomalies in multidimensional time series well, has been applied to climate detection, bridges, finance, and other fields in recent years [24,25,26]. Given its good performance, we improved it to meet the needs of real-time detection. Our analysis indicates that the calculation of the weight coefficient is the main factor limiting real-time detection. Eros uses all the data to obtain the weight coefficient, which therefore reflects the overall characteristics of the data and is not sensitive enough to state changes. When a time subsequence is added, the weight coefficient changes, and so do the distance values of the associated multidimensional time subsequences. The contradiction between the new and old distance values directly limits the online use of the distance measurement. Therefore, this paper redesigns the weight calculation method and proposes EFN_lw. The specific process is shown in Figure 4. While maintaining accuracy, this method resolves the contradiction between the old and new distance values and reduces the time complexity from O(n²) to O(n), which meets the requirement of online detection.

To compare the performance of the two methods objectively, experiments on the relevant data sets were used to demonstrate the rationality of the EFN_lw method for real-time distance measurement, as discussed in Section 4.2.

Figure 4a shows the step of calculating the distance value of the time subsequences in the detection sequence space. Assume that the time subsequence to be detected is A_k′ and that the time subsequence A_j′ is taken from the normal sequence space. Let the eigenvectors of matrices A_k′ and A_j′ be V_{A_{k}'} = [β_{1}, β_{2}, \dots, β_{n}] and V_{A_{j}'} = [β_{1}', β_{2}', \dots, β_{n}'], and let the corresponding eigenvalues be \Sigma_{A_{k}'} = {[λ_{1}, λ_{2}, \dots, λ_{n}]}^{T} and \Sigma_{A_{j}'} = {[λ_{1}', λ_{2}', \dots, λ_{n}']}^{T}.

The similarity between A_k′ and A_j′ is shown in Formula (1) as follows:

D (A_{k}', A_{j}') = \sum_{i = 1}^{n} ϖ_{i} | < β_{i}, β_{i}' > |

(1)

In this formula, the larger the calculated value is, the higher the similarity between A_k′ and A_j′. The parameter ϖ_{i} is the local weight coefficient, which can be calculated using Formula (2) as follows:

ϖ_{i} = \frac{λ_{i} + λ_{i}'}{\sum_{j = 1}^{n} (λ_{j} + λ_{j}')}

(2)

According to Formulas (1) and (2), the distance value for the two vector matrices can be derived as shown in Formula (3). The smaller the value, the higher the similarity.

D_{EFN_lw} (A_{k}', A_{j}') = \sqrt{2 - 2 \cdot D (A_{k}', A_{j}')}

(3)

Suppose that k time subsequences are stored in the normal sequence space. Then, the final distance value of A_k′ is as follows:

D_{EFN_lw} (A_{k}') = \frac{1}{k} \sum_{i = 0}^{k - 1} D_{EFN_lw} (A_{k}', A_{i}')

(4)

Figure 4b shows the distance value calculation process for the first k multidimensional time subsequences in the original time series data. The calculation is given in Formula (5) as follows:

D_{EFN_lw} (A_{i}') = \frac{1}{k - 1} \sum_{j = 0, j \neq i}^{k - 1} D_{EFN_lw} (A_{i}', A_{j}'), 0 \leq i \leq k - 1

(5)
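Formulas (1) to (5) can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' code. The function names are hypothetical, and the decomposition is assumed to be applied to the n × n covariance matrix of each (m, n) window, as in the original Eros formulation, since the text does not state which matrix is decomposed. The sketch also assumes n ≥ 2 and that the windows are not constant, so the eigenvalue sum is non-zero.

```python
import numpy as np


def eig_decomp(window: np.ndarray):
    """Eigenvalues and eigenvectors of the window's covariance matrix via SVD.

    Assumption: the covariance matrix is the decomposed matrix (as in Eros).
    """
    cov = np.cov(window, rowvar=False)   # (n, n), columns are dimensions
    u, s, _ = np.linalg.svd(cov)
    return s, u                          # s is sorted in descending order


def efn_lw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Pairwise EFN_lw distance between two windows, Formulas (1) to (3)."""
    lam_a, v_a = eig_decomp(a)
    lam_b, v_b = eig_decomp(b)
    w = (lam_a + lam_b) / np.sum(lam_a + lam_b)   # local weights, Formula (2)
    inner = np.abs(np.sum(v_a * v_b, axis=0))     # |<beta_i, beta_i'>| per column
    sim = float(np.sum(w * inner))                # similarity, Formula (1)
    # Clamp at zero to guard against rounding when sim is marginally above 1.
    return float(np.sqrt(max(0.0, 2.0 - 2.0 * sim)))  # Formula (3)


def efn_lw_score(candidate: np.ndarray, normals) -> float:
    """Mean distance from the candidate to the k normal windows, Formula (4)."""
    return float(np.mean([efn_lw_distance(candidate, a) for a in normals]))


def efn_lw_initial_scores(normals) -> list:
    """Leave-one-out mean distance for each of the first k windows, Formula (5)."""
    k = len(normals)
    return [
        float(np.mean([efn_lw_distance(normals[i], normals[j])
                       for j in range(k) if j != i]))
        for i in range(k)
    ]
```

Because the weights in Formula (2) depend only on the two windows being compared, adding a new window does not change any previously computed pairwise distance, which is the property the text relies on for online detection.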

3.2.2. ED_mv

The Euclidean distance measures the distance between single-dimensional time series data. Because this measure is simple and widely used, it is extended here to measure the distance of multidimensional time series data. The process is shown in Figure 5.

The multidimensional time subsequence A_{t} is split into n single-dimensional vectors according to the dimension, and these are denoted as A_{t} = [a_{1 *}, a_{2 *}, \dots, a_{n *}]. Formula (6) shows the Euclidean distance between two single-dimensional vectors.

D (a_{p *}, a_{q *}) = \sqrt{\sum_{i = 1}^{m} {(a_{p i} - a_{q i})}^{2}}, 1 \leq p < q \leq n

(6)

Due to the symmetry of the Euclidean distance, the calculations were not repeated during traversal. Formula (7) shows the distance value of A_{t}.

D_{ED_mv} (A_{t}) = \frac{2 \cdot \sum_{p = 1}^{n} \sum_{q = p + 1}^{n} D (a_{p *}, a_{q *})}{n (n - 1)}

(7)
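Formulas (6) and (7) amount to the mean Euclidean distance over all unordered pairs of dimensions. A minimal sketch, with the hypothetical function name `ed_mv` and an (m, n) array layout assumed for the window:

```python
from itertools import combinations

import numpy as np


def ed_mv(window: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between the n columns of an (m, n) window."""
    n = window.shape[1]
    if n < 2:
        raise ValueError("ED_mv needs at least two dimensions")
    # combinations() visits each pair p < q once, so no distance is repeated.
    total = sum(
        np.linalg.norm(window[:, p] - window[:, q])   # Formula (6)
        for p, q in combinations(range(n), 2)
    )
    return float(2.0 * total / (n * (n - 1)))          # Formula (7)


# Two dimensions with columns (0, 0) and (3, 4): the only pair is 5 apart.
example = ed_mv(np.array([[0.0, 3.0], [0.0, 4.0]]))
```

The factor 2 / (n (n − 1)) is the reciprocal of the number of unordered pairs, so the result is a plain average of the pairwise distances.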

3.3. Threshold Mechanism Based on the First-Order Difference (TMFD)

The threshold mechanism based on the first-order difference is mainly used to calculate the threshold dynamically and thereby determine the NDR in the current mode. To increase the flexibility of the algorithm in different scenarios, a threshold and a magnification θ are set to describe the NDR. The threshold includes the initial threshold thre_st and the dynamic threshold thre_dy. The first k multidimensional time subsequences in the original time series are used to determine thre_st and θ in order to construct the initial NDR. Then, abnormality judgments are performed on the subsequent subsequences. The judgment results incrementally revise thre_dy, which constitutes the dynamic update mechanism. The process of TMFD is shown in Figure 6.

First, the first-order difference is determined from the distance value. Assume that the distance value corresponding to the time subsequence A_{t} (t ≥ 0) is D_{SMM} (A_{t}) and that the distance values are stored sequentially in the one-dimensional array DV (t). The first-order difference of the distance values then gives the difference array ι (t) = DV (t) - DV (t - 1) = D_{SMM} (A_{t}) - D_{SMM} (A_{t - 1}), where t > 0 and ι (0) = 0.

Second, the initial NDR is determined. To obtain the parameter range, two initial parameter values first need to be determined: thre_st and θ. thre_st can be obtained according to Formula (8) as follows:

thre_st = \frac{\sum_{t = 0}^{k - 1} \max (ι (t), 0)}{count (ι (t) > 0)}, 0 \leq t < k

(8)

The max function filters the positive values of ι, and the count function records the number of positive values. The magnification θ is selected based on experimental experience and generally takes a value between 2 and 10. After determining these two initial parameters, the initial NDR is U (0, thre_st \times θ).

Third, the dynamic update mechanism is set to update the NDR in real time. As shown in Figure 6, if the first-order difference value corresponding to the time subsequence A_{t} (t \geq k) to be detected is normal and positive, then the NDR needs to be updated. The basis for the abnormality judgment is explained in detail in the next section. Here, a one-dimensional array ι_pos = [p_{1}, p_{2}, \dots, p_{u}], u < t, is defined to store the positive values in the difference array ι in real time. Assuming that the dynamic threshold corresponding to A_{t - 1} is thre_dy (A_{t - 1}), Formula (9) is obtained as follows:

thre_dy (A_{t}) = \frac{1}{u} [(u - 1) \cdot thre_dy (A_{t - 1}) + p_{u}]

(9)

Lastly, the NDR is updated synchronously, as shown in Formula (10).

NDR = U (0, thre_dy (A_{t}) \times θ)

(10)
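The threshold computations in Formulas (8) to (10) are a mean of positive first-order differences followed by a running-mean update. The sketch below illustrates this with hypothetical function names. It assumes that the counter u includes the positive differences already seen in the first k subsequences, so that the dynamic threshold starts from thre_st; the text does not state this explicitly.

```python
import numpy as np


def initial_threshold(dv_init) -> float:
    """Formula (8): mean of the positive first-order differences of the first k distances."""
    dv = np.asarray(dv_init, dtype=float)
    diffs = np.diff(dv, prepend=dv[0])   # prepend makes the first difference 0
    pos = diffs[diffs > 0]
    if pos.size == 0:
        raise ValueError("no positive first-order difference in the initial window")
    return float(pos.mean())


def update_threshold(thre_prev: float, u: int, p_u: float) -> float:
    """Formula (9): running mean after the u-th positive difference p_u."""
    return ((u - 1) * thre_prev + p_u) / u


def ndr_upper(thre: float, theta: float) -> float:
    """Upper end of the NDR in Formula (10): threshold times magnification."""
    return thre * theta


# Distances 1.0, 1.5, 1.2, 2.0 give differences 0, 0.5, -0.3, 0.8.
thre_st = initial_threshold([1.0, 1.5, 1.2, 2.0])         # mean of 0.5 and 0.8
thre_dy = update_threshold(thre_st, u=3, p_u=0.2)         # third positive value
upper = ndr_upper(thre_dy, theta=3.0)
```

Because Formula (9) is an incremental mean, the update needs only the previous threshold and the counter u, not the full history of differences.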

3.4. Abnormal Judgement

Considering that most anomalies in engineering equipment are continuous, the abnormality judgment method of this paper accumulates the degree of anomaly and uses the cumulative result to judge whether the anomaly has ended. Assuming that the current time subsequence is A_{t}, the abnormal detection result is defined as follows: if the result is normal, then anomaly = 0; otherwise, anomaly = 1. In addition, Cu_dev denotes the cumulative deviation value. If A_{t - 1} is normal, then Cu_dev = 0; otherwise, Cu_dev = D_{SMM} (A_{t - 1}) - D_{SMM} (A_{t - d}), d \leq t, where A_{t - d} is normal. The corresponding abnormality judgment is shown in Formula (11).

anomaly = \begin{cases} 0, & D_{SMM} (A_{t}) - D_{SMM} (A_{t - 1}) + Cu_dev \in NDR \\ 1, & D_{SMM} (A_{t}) - D_{SMM} (A_{t - 1}) + Cu_dev \notin NDR \end{cases}

(11)
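The judgment in Formula (11) can be sketched as below. The names `judge` and `detect` are hypothetical, and two simplifying assumptions are made and labeled in the code: a value counts as outside the NDR only when it exceeds the upper bound (the text allows normal differences that are not positive), and the upper bound is held fixed instead of being updated by TMFD. A_{t - d} is taken to be the most recent normal subsequence.

```python
def judge(dv, t, prev_anomalous, last_normal_idx, upper):
    """Return 1 if A_t is judged abnormal, else 0 (Formula (11)).

    Assumption: only values above `upper` fall outside the NDR;
    non-positive values are treated as normal.
    """
    # Cumulative deviation: zero after a normal subsequence, otherwise the
    # drift accumulated since the most recent normal subsequence.
    cu_dev = dv[t - 1] - dv[last_normal_idx] if prev_anomalous else 0.0
    value = dv[t] - dv[t - 1] + cu_dev
    return int(value > upper)


def detect(dv, k, upper):
    """Label dv[k:], treating the first k subsequences as normal.

    Assumption: `upper` is fixed here; in RADIM it is updated dynamically.
    """
    labels, prev, last_normal = [], False, k - 1
    for t in range(k, len(dv)):
        a = judge(dv, t, prev, last_normal, upper)
        labels.append(a)
        prev = bool(a)
        if not a:
            last_normal = t
    return labels


# The jump at index 3 is flagged; index 4 rises by only 0.1 but stays flagged
# because the cumulative deviation keeps it far from the last normal level.
labels = detect([1.0, 1.1, 1.0, 3.0, 3.1, 1.2], k=3, upper=1.0)
```

During an ongoing anomaly the tested quantity reduces to the difference between the current distance and the last normal distance, which is how the accumulation decides whether the anomaly has ended.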

4. Experiment Design and Analysis

The model in this paper is written in Python. The hardware configuration of the experimental computer is a 2.30 GHz CPU with 8.00 GB RAM, and the operating system is Windows 10.

4.1. Experiment Design

4.1.1. Data Sets

In this experiment, two benchmark data sets and four shield engineering data sets were selected to verify the effectiveness of the algorithm. The details of the data sets are shown in Table 2. Data set 1 consists of two-dimensional time series data generated by a function generator, with a random disturbance in the range (−0.5, 0.5) simulating noise [27], as shown in Formulas (12) and (13).

ψ_{1} (\partial) = \begin{cases} 2 \sin \partial + ε_{noise}, & 0 \leq \partial < 2500, 3000 \leq \partial < 3500 \\ 2 \sin \partial + 4 + ε_{noise}, & 2500 \leq \partial < 3000 \end{cases}

(12)

ψ_{2} (\partial) = \begin{cases} \sin \partial + ε_{noise}, & 0 \leq \partial < 1500, 2000 \leq \partial < 3500 \\ \sin \partial + 2 + ε_{noise}, & 1500 \leq \partial < 2000 \end{cases}

(13)
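A series with the shape of Formulas (12) and (13) can be generated as follows. This is a sketch under stated assumptions, not the original generator from [27]: the function name `make_dataset1` is hypothetical, the noise is assumed to be uniform, and the integer sample index is used directly as the argument of the sine.

```python
import numpy as np


def make_dataset1(length: int = 3500, seed: int = 0) -> np.ndarray:
    """Two-dimensional synthetic series with one level shift per dimension."""
    rng = np.random.default_rng(seed)
    x = np.arange(length, dtype=float)   # assumption: index used as sine argument
    # Assumption: noise is drawn uniformly from [-0.5, 0.5).
    noise = rng.uniform(-0.5, 0.5, size=(length, 2))
    psi1 = 2.0 * np.sin(x) + noise[:, 0]
    psi2 = np.sin(x) + noise[:, 1]
    psi1[2500:3000] += 4.0   # anomalous segment of Formula (12)
    psi2[1500:2000] += 2.0   # anomalous segment of Formula (13)
    return np.column_stack([psi1, psi2])


data = make_dataset1()
```

The two level shifts occupy different index ranges, which is what makes the anomalies asynchronous across the two dimensions.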

Data set 2 is extracted from a video of an actor performing various actions with and without a replica gun. The two dimensions record the movement trajectory of the actor's right hand on the X and Y coordinates. The actor's action is identified as an anomaly if the actor draws the gun from the holster mounted on the hip, points it at the target, and then puts it back into the holster [28].

Data sets 3–6 are construction data from shields used in tunnel construction in different regions of China. Data set 3 records grease pressure data from six grease pressure sensors, which monitor the grease pressure at the tail of the shield to prevent grease leakage. Grease leakage may cause groundwater to surge into the tunnel, affecting construction safety and quality. The pressure data were collected every second from 15:32:24 on 15 December 2017 to 5:05:49 on 24 December 2017; a grease pressure anomaly occurred during this period.

Data set 4 is used to monitor the operating safety of the propulsion system during shield construction. The propulsion system is crucial for shield tunneling, and abnormal warnings from the propulsion system help engineers notice changes in excavation conditions in time and adjust the construction plans. This data set includes five sensor values: the cutter head torque, total jack thrust, balance pressure, propulsion speed right, and grouting pressure. It records historical information from 2 September 2017 to 6 September 2017 at a sampling interval of 1 min.

Data sets 5 and 6 collect data from sensors that can reflect the stability of the ground. Ground instability can easily lead to ground collapse, severely affecting construction quality and increasing construction costs. Data set 5 is used to diagnose whether there is an anomaly in the grouting pipe. This data set collects grouting pressure data at four different positions, recorded between 14:27:29 on 13 December 2019 and 11:32:58 on 20 December 2019, a period that corresponds to two actual grouting pipe blockage failures. Data set 6 is for abnormality detection in ground subsidence. The data come from the construction of a cross-river tunnel. We selected five sealing pressure data values at different positions in the sealed cabin that record a surface collapse event from 1 May 2008 to 25 May 2008. The data sampling interval is 3 min. The specific information of the data sets is shown in Table 2.

4.1.2. Experiment Procedure

For ease of description, the RADIM algorithms based on EFN_lw and ED_mv are marked RADIM_Elw and RADIM_mv, respectively. Since the selection of the similarity measurement method mainly depends on anomaly types, the anomaly characteristics of different data sets are analyzed according to the engineering background of the data sets. The specific matching principle has been explained in Section 2.2.

Data set 1 describes typical asynchronous anomalies generated at different times. Because the period and function types of the two dimensions are similar, the data set is considered to contain asynchronous anomalies under strong correlation, and RADIM_mv is chosen. Data set 2 records a continuous action. In this process, the two axes record the changes in each direction in the same action cycle, but there is no obvious correlation between the two dimensions. Asynchronous anomalies under weak correlation are therefore likely, so RADIM_Elw is more appropriate. The grease anomaly in data set 3 indicates that the oil in the barrel is generally exhausted or difficult to discharge. Under normal operating conditions, the pressures of the pipelines are significantly correlated; when a grease seal anomaly occurs, the pressure values of the pipelines are usually consistent in terms of time characteristics, so synchronous anomalies with strong correlation are more likely, and such anomalies are easier to find using RADIM_Elw. In data set 4, the five sensors have different attributes, and the correlation between the dimensions is weak. When the soil quality suddenly changes, the stability relationship is broken, and the data of each sensor change independently. This data set is therefore assumed to be prone to asynchronous anomalies with weak correlations, so RADIM_Elw is preferred. Data set 5 records two abnormal incidents of grouting pipe blockage, which occurred in pipeline 1 and pipeline 2, respectively. During normal grouting, each grouting pipe completes grouting at different positions according to the construction requirements. For a period of time, the ratio between the dimensions is relatively stable, but when a certain grouting pipe is blocked, the distance relationship between the dimensions is broken, resulting in an asynchronous anomaly under strong correlation, which is more appropriately detected using RADIM_mv. The major ground collapse accident in data set 6 is typically the cumulative effect of excavation instability over time, a process prone to strongly correlated asynchronous anomalies between the pressure sensors in the excavation. RADIM_mv is also suited to this significant change under strong correlation. Therefore, in this experiment, data sets 1, 5, and 6 are used to verify the performance of RADIM_mv, and data sets 2, 3, and 4 are used to verify the performance of RADIM_Elw.

To comprehensively evaluate the performance of this algorithm, the precision, recall, F1_Score, and running time were selected to evaluate the algorithm's performance from three perspectives: feasibility, stability, and algorithm comparison.

In general, the parameter settings can directly affect the detection result. This algorithm has three key parameters: the time window length, the magnification, and the number of MTSs in the workspace. The time window length mainly depends on the physical meaning of the collection field. If there are obvious periods or time nodes in the data set, they can be used as the division basis. Otherwise, a value is assigned according to the data distribution characteristics. The magnification is selected mainly based on the statistical characteristics of normal data fluctuations in the context of the field. After obtaining the initial threshold thre_st, the normal interval needs to be expanded to a certain extent according to these statistical characteristics. If the data have high volatility, a larger magnification is recommended; if the volatility is low, a smaller value should be used. The value is usually between 2 and 10 and, in practical applications, can be adjusted dynamically according to the effect of anomaly detection. The number of MTSs in the workspace determines the size of the training set. Since the algorithm requires little data in the training phase, approximately 20% of the total sample is usually selected. In this section, the parameters are adjusted according to the characteristics of the data. After 50 training runs, the parameter groups of the different data sets are shown in Table 3.

4.1.3. Model Evaluation

To compare the algorithm's performance in terms of feasibility, stability, accuracy, and real-time performance, five core evaluation indicators were selected: precision, recall (sensitivity), specificity, F1_Score, and running time. The formulas for the first four indicators are given in Formulas (14) to (17).

\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (14)

\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (15)

\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (16)

\mathrm{F1\_Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (17)

TP represents the number of true positive results, TN indicates the number of true negative results, and FP and FN represent the numbers of false positive and false negative results, respectively. Among these indicators, precision measures the proportion of subsequences correctly identified as normal out of those judged as normal by the model; a high precision indicates that the model generates fewer false positives. Recall measures the proportion of actual normal subsequences correctly identified by the model, with a high recall reflecting better performance in confirming normal subsequences. Specificity measures the proportion of actual abnormal subsequences correctly identified as such by the model, where high specificity means better performance in confirming abnormal subsequences. F1_Score, the harmonic mean of precision and recall, assesses the model's overall efficacy in identifying normal subsequences. Running time reflects the amount of time taken by the model to run; the smaller the running time, the better the real-time performance of the model in anomaly detection.
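Formulas (14) to (17) translate directly into code. The sketch below is illustrative only: the `ConfusionCounts` class, the function names, and the example counts are hypothetical and are not taken from the experiments. Following the wording above, the positive class stands for normal subsequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of true/false positives and negatives for one detection run."""
    tp: int
    tn: int
    fp: int
    fn: int


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fp)        # Formula (14)


def recall(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fn)        # Formula (15)


def specificity(c: ConfusionCounts) -> float:
    return _ratio(c.tn, c.tn + c.fp)        # Formula (16)


def f1_score(c: ConfusionCounts) -> float:
    p, r = precision(c), recall(c)
    return _ratio(2 * p * r, p + r)         # Formula (17)


# Hypothetical counts chosen only to exercise the functions.
counts = ConfusionCounts(tp=8, tn=88, fp=2, fn=2)
print(f"precision={precision(counts):.3f}")
print(f"recall={recall(counts):.3f}")
print(f"specificity={specificity(counts):.3f}")
print(f"F1_Score={f1_score(counts):.3f}")
```

The zero-denominator guard is a design choice for the sketch, so that a run with no predicted positives returns 0.0 instead of raising an error.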

Considering the difference in detection mechanisms between the algorithms, both the training time and the testing time were included in the running time, so that the running times are compared fairly and on the same training data.

4.2. Benchmark Data Set Experiments

To better evaluate the performance of the algorithm, an experiment on parameter changes was carried out to determine the robustness of the detection results. In addition, the characteristics of the algorithm were analyzed by comparing it with other anomaly detection algorithms, namely the meta-feature-based anomaly detection approach (MFAD) [7] and long short-term memory (LSTM) [31]. Before performing anomaly detection, the suitability of EFN_lw, the improvement on Eros, for real-time distance measurement needs to be verified according to the previous matching results. As mentioned in Section 3.2, Eros is a similarity measurement method for multidimensional time series, and it provides a new solution for their anomaly detection. Although the existing anomaly detection algorithms based on Eros achieve high accuracy, they are difficult to use for online detection; EFN_lw is intended to remove this bottleneck. To verify the running speed of EFN_lw, engineering data sets 3 and 4 in Table 3 (the grouting system and pressure system data sets) were selected for experimental comparison.

In the experiment, the two measurement methods, Eros and EFN_lw, were each run 20 times on the grouting system data set and the pressure system data set. The average running times are listed in Table 4.

Table 4 shows that the average running time of EFN_lw is lower than that of Eros on both data sets (0.5939 s versus 0.8753 s on the grouting system data set, and 0.3047 s versus 0.4581 s on the pressure system data set). In addition, as the size of the data set increases, the running time of EFN_lw grows less than that of Eros. This indicates that EFN_lw computes faster, which is consistent with the time complexity of the two methods. The average time taken to detect anomalies in a single multidimensional time subsequence in data sets 3 and 4 is much less than 0.0011 s. In real-time anomaly detection, EFN_lw therefore effectively reduces the time spent in the distance measurement phase, to the point where it is almost negligible. Both the overall running time and the detection time of a single judgment illustrate the advantage of EFN_lw in real-time detection.
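The size of the gap in Table 4 can be made explicit with a short calculation. The snippet below only re-expresses the four running times reported in Table 4; the helper name `relative_reduction` is hypothetical.

```python
# Average running times in seconds, copied from Table 4.
TABLE_4 = {
    "Grouting System": {"EFN_lw": 0.5939, "Eros": 0.8753},
    "Pressure System": {"EFN_lw": 0.3047, "Eros": 0.4581},
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline running time saved by the improved method."""
    return (baseline - improved) / baseline


for name, times in TABLE_4.items():
    saved = relative_reduction(times["Eros"], times["EFN_lw"])
    print(f"{name}: EFN_lw is {saved:.1%} faster than Eros")

# Change in running time between the two data sets, per method.
for method in ("EFN_lw", "Eros"):
    delta = TABLE_4["Grouting System"][method] - TABLE_4["Pressure System"][method]
    print(f"{method}: +{delta:.4f} s from pressure to grouting data set")
```

On these figures EFN_lw saves roughly one third of the Eros running time on each data set, and its running time grows by about 0.29 s between the two data sets compared with about 0.42 s for Eros.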

4.2.1. Manual Data Set

The red areas in Figure 7 are the correctly detected positions, which are consistent with the results in Table 2. This preliminarily demonstrates the feasibility of the RADIM_mv algorithm.

4.2.2. Video Surveillance Data Set

Using RADIM_Elw to detect the anomalies in this data set yielded the detection results shown in Figure 8. Compared with Table 2, the algorithm accurately identified the three marked abnormal positions, with only a small difference in the anomaly boundaries. This difference was caused by the width setting of the time window and did not affect the detection performance of the algorithm. This verification suggests that the algorithm can solve the problem of multidimensional anomaly detection.

4.3. Engineering Validation

4.3.1. Grease System Data Set

Three abnormal oil pressures were found, corresponding to ring numbers 500, 516, and 517–520 (Figure 9), which were consistent with the results in Table 2. The precision, recall, and F1_Score of the algorithm were all 1, and the average running time was 0.855 s, which shows that the algorithm performs well in synchronous anomaly detection for high-dimensional data sets. According to Figure 9, RADIM_Elw robustly detected anomalies and ignored meaningless noise, which helped to discover the occurrence of grease leakage and to overhaul the grease pipelines.

4.3.2. Propulsion System Data Set

Figure 10 shows that there are obvious differences in the data distribution characteristics of the five dimensions in the data set. With the RADIM_Elw algorithm, a total of nine inconsistencies were detected in data set 4, and the positions of the abnormal marks were found accurately, with no missed reports. The average running time of the algorithm was 0.635 s. RADIM_Elw can therefore be considered suitable for complex multidimensional time series anomaly detection problems. Additionally, multiple consecutive alarms in the propulsion system could remind constructors to optimize construction decisions when tunneling conditions shift.

4.3.3. Grouting System Data Set

Comparing the results in Figure 11 and Table 2 shows that the algorithm produced some false reports, which are indicated in blue. The first and third detections roughly matched the actual abnormal marks, but the second discord was an obvious false alarm. Since the blockage of the grouting pipe during shield construction is a gradual accumulation process, it is considered normal for the detected interval to be longer than the time recorded in the anomaly log. The performance of the algorithm on this data set was quantified according to the information in Table 2: the average running time was 2.339 s, and the F1_Score was 0.980.

4.3.4. Pressure System Data Set

Figure 12 shows that the algorithm detected two inconsistencies. Compared with the locations in Table 2, the algorithm detected the locations of the accidents. The subtle difference was mainly due to the magnification and time window parameters; in actual engineering, such a small range of false detection is considered allowable. To objectively measure the performance of the algorithm on this data set, key indicators were calculated from the perspective of sample size, resulting in a precision of 0.952, a recall of 0.978, an F1_Score of 0.965, and an average running time of 0.197 s.
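As a consistency check on the pressure system results, Formula (17) can be evaluated with the reported values, assuming that the value 0.952 reported above is the precision in the sense of Formula (14):

```latex
\mathrm{F1\_Score} = 2 \times \frac{0.952 \times 0.978}{0.952 + 0.978} = \frac{1.862112}{1.930} \approx 0.965
```

The result agrees with the reported F1 value of 0.965.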

Overall, verification on the above data sets with the four key evaluation indicators shows that the algorithm proposed in this paper finds abnormal locations relatively accurately and in a very short time, and that it performs well overall.

4.4. Parameter Sensitivity

To study the sensitivity of the algorithm to its key parameters and to judge its stability, two representative engineering data sets, the grease system and grouting system data sets, were selected from the previous data sets for testing. Three sets of control experiments were set up, each with one of the three key parameters as the only variable. In each set of experiments, the parameter values to be verified were set at equal intervals. Each parameter value was tested 50 times, and the average of the results was recorded. Based on general experience with the key parameters, the intervals for the window length, magnification, and MTS number in the workspace were set to 5, 0.5, and 5, respectively. The experimental results are shown in Figure 13, Figure 14 and Figure 15.

The following conclusions can be drawn from Figure 13, Figure 14 and Figure 15. First, judging by the degree of fluctuation of the curves, the overall volatility is small, and the algorithm is stable with respect to parameter changes. Comparing the distribution of the curves under different parameter changes gives the following order of sensitivity under the set interval values: magnification > window length > MTS number in the workspace. This order suggests that, in actual parameter adjustment, the magnification should be coarsely tuned first, using values drawn from engineering experience, followed by fine tuning of the time window length and, finally, precision tuning of the MTS number in the workspace to determine the optimal parameter combination. Second, the relationship between the key parameters and the algorithm running time was studied. The experiment shows a relatively clear inverse relationship between the running time of the algorithm and the time window length. Therefore, to reduce the running time on an actual engineering data set, the width of the time window can be increased appropriately, which provides some time redundancy for anomaly detection and helps to meet the time requirement.
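The control experiment described in this section, in which one parameter is varied at equal intervals while the others stay fixed and each setting is averaged over repeated runs, can be sketched as below. Everything here is a hypothetical illustration: `toy_evaluate` is a made-up stand-in score and not the RADIM detector, the sweep start values and counts are assumptions, and only the base values (175, 4, 75, the grease system row of Table 3), the step sizes (5, 0.5, 5), and the 50 repeats come from the text. Because the toy score is constructed by hand, the ranking it prints says nothing about RADIM itself.

```python
import random
import statistics
from typing import Callable, Dict

Params = Dict[str, float]


def sweep_parameter(evaluate: Callable[[Params, int], float], base: Params,
                    name: str, start: float, step: float, count: int,
                    repeats: int = 50) -> Dict[float, float]:
    """Vary one parameter at equal intervals; return the mean score per value."""
    means: Dict[float, float] = {}
    for i in range(count):
        value = start + i * step
        params = dict(base)
        params[name] = value          # all other parameters stay at base values
        scores = [evaluate(params, seed) for seed in range(repeats)]
        means[value] = statistics.mean(scores)
    return means


def spread(means: Dict[float, float]) -> float:
    """Range of the mean score over the sweep; larger means more sensitive."""
    return max(means.values()) - min(means.values())


def toy_evaluate(params: Params, seed: int) -> float:
    """Made-up score with small seeded noise; NOT the RADIM algorithm."""
    rng = random.Random(seed)
    penalty = (0.02 * abs(params["magnification"] - 4.0)
               + 0.0005 * abs(params["window_length"] - 175)
               + 0.0001 * abs(params["mts_number"] - 75))
    return max(0.0, 0.98 - penalty + rng.uniform(-0.002, 0.002))


BASE: Params = {"window_length": 175, "magnification": 4.0, "mts_number": 75}

# name -> (start, step, count); steps follow the text, starts are assumed.
SWEEPS = {
    "window_length": (150, 5, 11),
    "magnification": (2.0, 0.5, 9),
    "mts_number": (50, 5, 11),
}

sensitivity = {
    name: spread(sweep_parameter(toy_evaluate, BASE, name, *grid))
    for name, grid in SWEEPS.items()
}
for name in sorted(sensitivity, key=sensitivity.get, reverse=True):
    print(f"{name}: spread of mean score = {sensitivity[name]:.4f}")
```

Replacing `toy_evaluate` with a function that runs a real detector and returns its F1_Score would give the kind of curves summarized in Figures 13 to 15.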

4.5. Algorithm Performance

In this section, the typical algorithms LSTM and MFAD were chosen for experimental comparison. In theory, the time complexity of RADIM is roughly the same as that of MFAD, namely O(n), whereas the time complexity of LSTM is higher, at O(n²). In the experiment, the four engineering data sets in this paper were used to compare the three algorithms under different engineering backgrounds. The experiment was repeated 100 times, and the optimal and average values of the detection results of each algorithm were recorded. The results are shown in Figure 16.

Figure 16a–e show the performance of the three algorithms on the four experimental data sets in terms of five metrics: precision, recall (sensitivity), specificity, F1_Score, and running time.

In Figure 16a–c, it is evident that the three algorithms can detect most anomalies, with the optimal precision and recall for the four experimental data sets being close to or equal to 1. The average precision and recall of RADIM and LSTM exceed 0.9, indicating steady and accurate performance. F1_Score measures the algorithms' accuracy comprehensively by combining precision and recall. In terms of specificity, the average level of RADIM is not less than 0.9 on all four data sets; the average level of LSTM is not less than 0.81, except for 0.36 on the grouting system data set; and the average level of MFAD is lower than 0.72 on all four data sets. This shows that RADIM identifies normal subsequences correctly with a lower false alarm rate than MFAD, and that its accuracy in identifying actual abnormal subsequences is significantly better than that of the other two algorithms. In Figure 16d, RADIM and LSTM performed better than MFAD with respect to the F1_Score, both exceeding 0.96 on all data sets except data set 4, the propulsion system data set. Given the engineering background, the dimensional relationships in the propulsion system data set may be more complicated, which leads to a decline in detection accuracy for all three algorithms. RADIM detected anomalies in the propulsion system data set with an average F1_Score of 0.875, which is higher than that of LSTM and MFAD. Additionally, the difference between the average and optimal F1_Score values helps in analyzing the stability of the experimental runs; the small differences for RADIM indicate relatively stable performance on the four experimental data sets. From the perspective of anomaly types, the synchronous anomaly in data set 3 and the asynchronous ones in data set 5 are well detected by RADIM, with an F1_Score of over 0.97, while LSTM showed a weaker ability to detect synchronous anomalies.

Figure 16e illustrates the running times of the algorithms on the four experimental data sets. Although running time varies with data volume, each algorithm's running time remained consistent within the same data set. Notably, RADIM needed less than 5 s to process tens of thousands of multidimensional data points and proved significantly more efficient than LSTM and MFAD, which supports its suitability for real-time detection.

In conclusion, regarding algorithm accuracy, the performance of RADIM and LSTM is about the same. However, RADIM has a clear advantage in specificity and is more stable across synchronous and asynchronous anomalies. From the perspective of running time, RADIM and MFAD consume much less time than LSTM. That is, in real-time anomaly detection RADIM combines the accuracy of LSTM with the low time consumption of MFAD, thus meeting the requirements of real-time detection for engineering equipment.

5. Conclusions

Real-time anomaly detection of multidimensional time series in large-scale engineering facilities in intelligent construction is extremely challenging. Three factors increase the difficulty of algorithm research: the complexity of multidimensional time series data, the noise interference of the field environment, and the requirements for real-time detection accuracy. In this work, the detection method RADIM is introduced. The algorithm is simple in principle but can effectively handle both synchronous and asynchronous anomaly monitoring of MTSs while preserving the integrity of the information between dimensions. Compared with traditional MTS anomaly detection algorithms, the time complexity of RADIM is reduced to O(n), and the anomaly detection threshold can be updated dynamically, enabling adaptive, real-time, and high-precision anomaly detection. The algorithm is applied to four engineering equipment data sets with large data sizes and different detection types. Comparative analysis of the experimental results shows that the algorithm performs better than typical anomaly detection algorithms in large-scale urban construction equipment anomaly detection. RADIM's consistent performance on engineering equipment data sets illustrates its potential for a wide range of applications in other urban development scenarios that involve multidimensional time series, such as bridge structural inspections and operating condition inspections of large and complex equipment systems.

The existing limitation of the proposed RADIM algorithm is that its parameter values depend on the distribution characteristics of the data set. Therefore, more data analysis and experience are needed in the initial parameter adjustment process, and for complex data sets the time required for parameter adjustment is relatively long. In addition, the similarity measurement methods in the algorithm are incomplete and cannot fully meet the needs of multidimensional time series anomaly detection in complex situations.

In future work, three main areas will be the focus of improvement. (1) Different types of multidimensional raw data in urban infrastructure will be considered to extend or improve the similarity measurement method (SMM). (2) A model database will be established to record the characteristics and parameter group values of typical data sets. This will provide decision-making input or suggestions for subsequent data sets from the same kind of large-scale engineering equipment or with the same abnormality types, aiming to reduce the time needed for parameter adjustment in complex data sets. (3) The similarity measurement method will be further designed and improved, and an adaptive matching mechanism will be established, making the algorithm more operational and generalized.

Author Contributions

Conceptualization, M.H. and B.W.; methodology, M.H., F.Z. and B.W.; software, Y.W. and X.B.; validation, Y.W., B.W. and F.Z.; investigation, M.H. and B.W.; data curation, F.Z. and X.B.; writing—original draft preparation, M.H. and B.W.; writing—review and editing, X.B. and B.W.; visualization, B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to the specificity of research work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Xiang, Y.; Tang, T.; Su, T.; Brach, C.; Liu, L.; Mao, S.S.; Geimer, M. Fast crdnn: Towards on site training of mobile construction machines. IEEE Access 2021, 9, 124253–124267.
2. Hu, M.; Bai, X.; Xu, W.; Wu, B. Review of anomaly detection algorithms for multi-dimensional time series. J. Comput. Appl. 2020, 40, 1553–1564.
3. Zhong, R.Y.; Xu, X.; Klotz, E.; Newman, S.T. Intelligent Manufacturing in the Context of Industry 4.0: A Review. Engineering 2017, 3, 616–630.
4. Habeeb, R.A.A.; Nasaruddin, F.; Gani, A.; Hashem, I.; Ahmed, E.; Imran, M. Real-time big data processing for anomaly detection: A Survey. IJIM 2018, 45, 289–307.
5. Ding, N.; Ma, H.; Gao, H.; Ma, Y.; Tan, G. Real-time anomaly detection based on long short-Term memory and Gaussian Mixture Model. Comput. Electr. Eng. 2019, 79, 106458.
6. Zhang, C.; Chen, Y.; Yin, A.; Qin, Z.; Zhang, X.; Zhang, K.; Jiang, Z.L. An Improvement of PAA on Trend-Based Approximation for Time Series. In Proceedings of the 18th ICA3PP 2018, Guangzhou, China, 15–17 November 2018; pp. 248–262.
7. Hu, M.; Ji, Z.; Yan, K.; Guo, Y.; Feng, X.; Gong, J.; Zhao, X.; Dong, L. Detecting Anomalies in Time Series Data via a Meta-Feature Based Approach. IEEE Access 2018, 6, 27760–27776.
8. Navi, M.; Meskin, N.; Davoodi, M. Sensor fault detection and isolation of an industrial gas turbine using partial adaptive KPCA. J. Process Control 2018, 64, 37–48.
9. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.
10. Canizo, M.; Triguero, I.; Conde, A.; Onieva, E. Multi-head CNN–RNN for multitime series anomaly detection: An industrial case study. Neurocomputing 2019, 363, 246–260.
11. Han, Z.J. An adaptive K-means initialization method based on data density. Comput. Appl. Softw. 2014, 31, 182–187.
12. Kaur, R.; Kang, S.S. An enhancement in classifier support vector machine to improve plant disease detection. In Proceedings of the 2015 IEEE 3rd International Conference on MOOCs, Innovation and Technology in Education (MITE), Amritsar, India, 1–2 October 2015; pp. 135–140.
13. Tran, K.P.; Nguyen, H.D.; Thomassey, S. Anomaly detection using Long Short Term Memory Networks and its applications in Supply Chain Management. IFAC-PapersOnLine 2019, 52, 2408–2412.
14. Wang, W.; Bao, J.; Li, T. Bound smoothing based time series anomaly detection using multiple similarity measures. J. Intell. Manuf. 2021, 32, 1711–1727.
15. Zheng, J.; Qu, H.; Li, Z.; Li, L.; Tang, X. A deep hypersphere approach to high-dimensional anomaly detection. Appl. Soft Comput. 2022, 125, 109146.
16. Li, J.; Izakian, H.; Pedrycz, W.; Jamal, I. Clustering-based anomaly detection in multivariate time series data. Appl. Soft Comput. 2021, 100, 106919.
17. Wang, X.; Pi, D.; Zhang, X.; Liu, H.; Guo, C. Variational transformer-based anomaly detection approach for multivariate time series. Measurement 2022, 191, 110791.
18. Wambura, S.; Huang, J.; Li, H. Robust Anomaly Detection in Feature-Evolving Time Series. Comput. J. 2021, 65, 1242–1256.
19. Audibert, J.; Michiardi, P.; Guyard, F.; Marti, S.; Zuluaga, M.A. Do Deep Neural Networks Contribute to Multivariate Time Series Anomaly Detection? Pattern Recognit. 2022, 132, 108945.
20. Gauci, M.; Chen, J.; Li, W.; Dodd, T.J.; Groß, R. Self-organized aggregation without computation. Int. J. Robot. Res. 2014, 33, 1145–1161.
21. Hawkins, D.M. A single outlier in normal samples. In Identification of Outliers, 3rd ed.; Springer: Dordrecht, The Netherlands, 1980; pp. 27–41. Available online: https://www.springer.com/cn/book/9789401539968 (accessed on 6 April 2024).
22. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 15.
23. Yang, K.; Shahabi, C. A PCA-based similarity measure for multivariate time series. In Proceedings of the 2nd ACM International Workshop on Multimedia Databases, Washington, DC, USA, 8–13 November 2004.
24. Guo, X.F.; Li, F. Analysis on similarity of multivariate time series based on Eros. Comput. Eng. Appl. 2014, 48, 111–114.
25. Weng, X.Q.; Shen, J.Y. Outlier Mining for Multivariate Time Series Based on Sliding Window. Comput. Eng. 2007, 33, 102–104.
26. Chen, Z. Research on Anomaly Detection and Data Quality Assessment of Bridge Health Monitoring Data. Master's Thesis, Department of Engineering, Chongqing University, Chongqing, China, 2017.
27. Wang, T.; Lu, G.; Yan, P. Multi-sensors based condition monitoring of rotary machines: An approach of multi-dimensional time-series analysis. Measurement 2019, 134, 326–335.
28. Kaya, H.; Gündüz-Öğüdücü, Ş. A distance based time series classification framework. Inf. Syst. 2015, 51, 27–42.
29. Keogh, E.; Lin, J.; Fu, A. HOT SAX: Efficiently finding the most unusual time series subsequence. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM'05), Houston, TX, USA, 27–30 November 2005.
30. Hu, M.; Wu, F.F.; Zhu, B.; Lu, B.; Pu, J.L. A New Hazard Identification Method-State Transition Graph. Appl. Mech. Mater. 2011, 48–49, 71–78.
31. Malhotra, P.; Vig, L.; Shroff, G.; Agarwal, P. Long Short Term Memory Networks for Anomaly Detection in Time Series. Presented at ESANN. [Online]. Available online: http://www.i6doc.com/en/ (accessed on 6 April 2024).

Figure 1. Abnormal types of multidimensional time series: (a) the anomaly types under weak correlation between dimensions; (b) the anomaly types under strong correlation between dimensions.

Figure 2. Overview of RADIM architecture.

Figure 3. RADIM algorithm process.

Figure 4. EFN_lw algorithm flow: (a) an MTS in the detection sequence region; (b) the first K MTS in the normal sequence region.

Figure 5. ED_mv flowchart.

Figure 6. TMFD process.

Figure 7. Experimental results for the manual data set.

Figure 8. Experimental results for the video surveillance data set.

Figure 9. Experimental results for the grease system data set.

Figure 10. Experimental results for the propulsion system data set: 1 discord, location 7280~7420, ring number 947; 2 discord, location 7770~7910, ring number 948; 3 discord, location 9170~9310, ring number 949; 4 discord, location 9800~10,080, ring number 950; 5 discord, location 10,430~11,130, ring number 951; 6 discord, location 11,200~117,600, ring number 952; 7 discord, location 12,530~15,470, ring number 953~956; 8 discord, location 15,750~15,890, ring number 957; 9 discord, location 16,590~18,270, ring number 958~959.

Figure 11. Experimental results for the grouting system data set.

Figure 12. Experimental results for the pressure system data set.

Figure 13. Influence of the time window width parameter on the experimental results: (a) grease system data set; (b) grouting system data set.

Figure 14. Effect of magnification on the experimental results: (a) grease system data set; (b) grouting system data set.

Figure 15. Influence of the number of multidimensional time subsequences in the workspace on the experimental results: (a) grease system data set; (b) grouting system data set.

Figure 16. Algorithm performance evaluation: (a) precision; (b) recall; (c) specificity; (d) F1-score; (e) running time.

Table 1. Criteria for the selection of similarity measurement methods.

Similarity Measure Method	Selection Criterion: MTS Correlation	Selection Criterion: Temporal Attributes of MTS Abnormal Events	Anomaly Type
ED_mv	Strong	Consistency	Synchronous anomaly
EFN_lw	Strong	Inconsistency	Asynchronous anomaly
EFN_lw	Weak	Consistency or Inconsistency	Asynchronous anomaly

Table 2. Experimental data set description.

No	Data Set Name	Dimensionality	Total Sample Size	Training Sample Size	Real Discord	Data Sources
1	Manual	2	3500	1000	1500–2000, 2500–3000	Composite data
2	Video Surveillance	2	11,250	1400	300–430, 1465–1590, 1913–2964	[28,29]
3	Grease System	6	56,580	13,125	500, 516–520 (RingNo)	Shanghai Metro Line 13
4	Propulsion System	5	18,340	3500	947–959 (RingNo)	Hangzhou Wenyi Tunnel Project
5	Grouting System	4	186,320	4000	2019/12/16 09:02–15 (RingNo:219) 2019/12/18 11:11:15 (RingNo:234)	Hangzhou Shaoxing Metro Project
6	Pressure System	5	11,720	2250	6650–11,720	Tunnel Project in Shanghai [30]

Table 3. Parameter setting value.

Number	Data Set Name	Time Window Length	Magnification	MTS Number in Workspace
1	Grease System	175	4	75
2	Propulsion System	70	3	50
3	Grouting System	100	9	40
4	Pressure System	75	4	30

Table 4. Running time of the Eros and EFN_lw methods.

Method	Time of Grouting System Data Set (s)	Time of Pressure System Data Set (s)
EFN_lw	0.5939	0.3047
Eros	0.8753	0.4581

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
