Bridge Damage Identification Based on Encoded Images and Convolutional Neural Network

Wang, Xiaoguang; Li, Wanhua; Ma, Ming; Yang, Fan; Song, Shuai

doi:10.3390/buildings14103104


by Xiaoguang Wang ¹,²,*, Wanhua Li ³, Ming Ma ², Fan Yang ¹ and Shuai Song ⁴

¹ Highway School, Chang’an University, Xi’an 710064, China
² CCCC First Highway Consultants Co., Ltd., Xi’an 710075, China
³ School of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China
⁴ School of Civil Engineering, Qingdao University of Technology, Qingdao 266520, China
* Author to whom correspondence should be addressed.

Buildings 2024, 14(10), 3104; https://doi.org/10.3390/buildings14103104

Submission received: 29 August 2024 / Revised: 24 September 2024 / Accepted: 24 September 2024 / Published: 27 September 2024

(This article belongs to the Section Building Structures)


Abstract

Bridges are prone to damage from various factors, impacting the overall safety of transportation networks. Accurate damage identification is crucial for maintaining bridge integrity. This study proposes a novel method using encoded images and a convolutional neural network (CNN) for bridge damage identification. By converting raw acceleration data into encoded images, the data can be represented from multiple perspectives, enhancing the extraction of essential features related to bridge damage states. The method was validated using data simulated from a continuous rigid-frame bridge model. The results demonstrate that using encoded images as inputs yields a higher recall rate, precision, and F1-score compared to using acceleration responses as inputs, achieving a comprehensive accuracy of 92%. This study concludes that the combination of encoded images and CNN provides a robust approach for accurate and efficient bridge damage identification.

Keywords:

bridge; structural health monitoring; damage identification; encoded image; convolutional neural network

1. Introduction

Bridges are susceptible to damage caused by vehicle loads, material aging, environmental erosion, natural disasters, and other factors during their long-term operation [1]. As a crucial component of the transportation network, the condition of a bridge directly affects the operation of the entire network [2]. Hence, it is imperative to identify bridge damage promptly and accurately, and damage identification is also an essential function of a bridge health monitoring system [3,4].

In the field of structural health monitoring, deep neural networks (DNNs) have shown significant advantages in handling complex and nonlinear data [5]. In recent years, DNNs have typically been used for damage identification by serving simultaneously as feature extractors and pattern classifiers. DNNs extract and optimize damage features automatically, which makes them efficient and removes the need for manual intervention [6,7]. Liu et al. [8] used a convolutional neural network (CNN) and long short-term memory (LSTM) for damage identification in bridge structures, finding that these DNN models could accurately identify structural damage in both simulated and measured data and outperformed traditional methods. Wang et al. [9] used a CNN for damage identification in a steel truss bridge model, and the results demonstrated that the CNN effectively identified damage locations and severities, showing its robustness and its potential for enhancing bridge health monitoring systems. Nguyen et al. [10] combined a CNN with the Gapped Smoothing Method (GSM) to identify and localize damage in the Bo Nghi bridge, demonstrating the method’s effectiveness in actual bridge engineering. Zhou et al. [11] combined a Generative Adversarial Network (GAN) with a Deep Adaptation Network (DAN) for cross-domain damage identification: the GAN augmented the number of training samples, the DAN extracted essential damage features, and the method was validated on a steel truss bridge model. However, the excitations on bridges under operating conditions resemble white noise, resulting in small structural responses with significant randomness [12,13,14]. Zhou et al. [15] compared different features for bridge seismic damage detection and showed that manually constructed features can achieve better damage identification results, although constructing them is often time-consuming and labor-intensive. Directly applying DNNs to extract damage features from bridge structural responses may therefore miss critical damage characteristics, reducing damage identification accuracy.

In the field of structural health monitoring, image-encoding techniques are commonly employed to represent raw measured data for data anomaly detection, and this representation can achieve very high anomaly detection accuracy [8]. Among the various image-encoding techniques, the simple ones directly use the time series waveform [16] or its spectrogram [17,18] as the encoded image, whereas the advanced ones project the time series into two-dimensional image spaces, such as the Gramian Angular Field (GAF) [19] and the Markov Transition Field (MTF) [20]. De Santo et al. [21] proposed a state evaluation framework using time-series-encoding techniques, including the recurrence plot, GAF, MTF, and the wavelet transform; a CNN classified the encoded images to predict maintenance tasks, and the framework was evaluated on the NASA bearings dataset. Mantawy et al. [22] encoded bridge experiment data into images using GAF and MTF and demonstrated that MTF-encoded acceleration images achieved the highest accuracy in damage state prediction with a CNN. Liao et al. [23] used GAF to encode guided wave data into images and CNNs as classifiers to identify structural damage states; the method was validated with guided wave data measured from composite structures. Rahadian et al. [24] used statistical analysis and the Pearson index to determine the correlation between image-encoding methods and CNN accuracy, enabling the selection of appropriate encoding techniques for diverse datasets while avoiding the tedious matching of encoded images and CNN models. Advanced image-encoding techniques can accurately depict changes in the dynamic characteristics of data by calculating correlations or transition probabilities between data points, and they have proven effective in extracting the essential features of time series data [25].
In other words, image-encoding techniques focus primarily on how data features change rather than on their actual values, so data with similar patterns of change can be assigned to the same category. This property gives image-encoding techniques the theoretical capacity to handle highly random data. Deng et al. [26] reviewed the application of image-encoding techniques to abnormal data detection in structural health monitoring. Currently, these techniques are applied mainly to anomaly detection in structural responses; their use in evaluating structural states remains limited, and further research is needed to determine whether they can effectively highlight essential structural characteristics.

Bridge damage features are difficult to extract and easily obscured by noise, which leads to low damage identification accuracy. To address this issue, a novel damage identification method based on image-encoding techniques is proposed in this paper. Raw structural responses are transformed into images using multiple image-encoding techniques to emphasize key features related to damage states. The method captures damage characteristics from several perspectives without increasing the input size, and it is highly adaptive, requiring no complex manual adjustment or parameter tuning. A CNN is then constructed as the pattern recognition tool that maps the encoded images to damage states. The proposed method is validated using simulated data from a continuous rigid-frame bridge model, with raw acceleration responses also used as inputs for comparison.

2. Methodology

This paper proposes a novel bridge damage identification method based on image-encoding techniques and a CNN. The image-encoding techniques extract characteristic changes in time series data by converting them into two-dimensional images. The CNN, known for its image processing capabilities, is then used as a pattern classifier to determine the location and extent of the damage. The framework of the proposed method is illustrated in Figure 1. The measured bridge responses are presented as multi-column data, with each column corresponding to a testing channel. Three different image-encoding techniques are applied to transform these responses into images. For each column, three images are generated and subsequently fused into a single image. As a result, each column of structural response corresponds to one fused image. These fused images are then used as input to the CNN, which establishes a mapping between the encoded images and the structural damage states.

2.1. Image Encoding

This section introduces the fundamental concepts of the image-encoding techniques. Three image-encoding techniques are adopted to encode the time series of bridge responses, namely the Gramian Angular Summation Field (GASF), the Markov Transition Field (MTF), and the Unthresholded Recurrence Plot (URP), and their fused image is used to represent the raw data in this study.

2.1.1. GASF

The Gramian Angular Field (GAF) represents time series data in a polar coordinate system and examines the angular relationships between data points at different time intervals, thereby revealing the correlations between data points. GAF is particularly effective for processing and analyzing time series data [9].

For a given time series X = {x_1, x_2, …, x_i, …, x_j, …, x_n}, where i, j ∈ [1, n] and n is the number of data points, the first step of GAF is to rescale X into the interval [−1, 1], as expressed in Equation (1).

{\tilde{x}}_{i} = \frac{(x_{i} - \max (X)) + (x_{i} - \min (X))}{\max (X) - \min (X)}, \quad \tilde{X} = \{{\tilde{x}}_{1}, {\tilde{x}}_{2}, \dots, {\tilde{x}}_{n}\}

(1)

Polar coordinates are then used to represent the rescaled time series: the rescaled values are encoded as the cosines of the angles, and the time stamps are encoded as the radii, as expressed in Equations (2) and (3).

ϕ_{i} = \arccos ({\tilde{x}}_{i}), \quad - 1 \leq {\tilde{x}}_{i} \leq 1, \quad {\tilde{x}}_{i} \in \tilde{X}

(2)

r_{i} = \frac{t_{i}}{N}, \quad t_{i} \in ℕ

(3)

where ϕ_i is the angle whose cosine is x̃_i, t_i is the time stamp, N is a constant factor used to regularize the span of the polar coordinate system, and ℕ is the set of natural numbers.
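The rescaling and polar encoding of Equations (1)–(3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the time stamps are taken as t_i = 1, …, n and N defaults to n, since the text does not fix either choice.

```python
import math


def rescale(x):
    """Eq. (1): map the series x linearly onto [-1, 1] using its min and max."""
    lo, hi = min(x), max(x)
    span = hi - lo
    if span == 0:
        raise ValueError("a constant series cannot be rescaled")
    return [((xi - hi) + (xi - lo)) / span for xi in x]


def polar_encode(x, N=None):
    """Eqs. (2)-(3): angle phi_i = arccos(x~_i) and radius r_i = t_i / N.

    Assumption: time stamps t_i = 1, ..., n and N = n unless given.
    """
    x_t = rescale(x)
    N = N or len(x_t)
    # Clamp to guard against round-off pushing a value just outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in x_t]
    r = [t / N for t in range(1, len(x_t) + 1)]
    return phi, r
```

For example, `polar_encode([0.0, 1.0, 2.0, 3.0, 4.0])` rescales the series to −1, −0.5, 0, 0.5, 1, so the angles run from π down to 0 while the radii grow from 0.2 to 1.0.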

After the rescaled time series is encoded in polar coordinates, the elements of a Gram matrix can be used to examine the correlation between two points at different time intervals through a trigonometric function of the sum or difference of their angles. Two variants are derived from GAF: GASF, which takes the cosine of the angle sum, and the Gramian Angular Difference Field (GADF), which takes the sine of the angle difference. They are defined as follows:

\begin{array}{l} GASF = [\cos (ϕ_{i} + ϕ_{j})] = [\begin{matrix} \cos (ϕ_{1} + ϕ_{1}) & \cos (ϕ_{1} + ϕ_{2}) & \dots & \cos (ϕ_{1} + ϕ_{n}) \\ \cos (ϕ_{2} + ϕ_{1}) & \cos (ϕ_{2} + ϕ_{2}) & \dots & \cos (ϕ_{2} + ϕ_{n}) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ \cos (ϕ_{n} + ϕ_{1}) & \cos (ϕ_{n} + ϕ_{2}) & \dots & \cos (ϕ_{n} + ϕ_{n}) \end{matrix}] \\ = {\tilde{X}}^{\prime} \cdot \tilde{X} - {\sqrt{I - {\tilde{X}}^{2}}}^{\prime} \cdot \sqrt{I - {\tilde{X}}^{2}} \end{array}

(4)

\begin{array}{l} GADF = [\sin (ϕ_{i} - ϕ_{j})] = [\begin{matrix} \sin (ϕ_{1} - ϕ_{1}) & \sin (ϕ_{1} - ϕ_{2}) & \dots & \sin (ϕ_{1} - ϕ_{n}) \\ \sin (ϕ_{2} - ϕ_{1}) & \sin (ϕ_{2} - ϕ_{2}) & \dots & \sin (ϕ_{2} - ϕ_{n}) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ \sin (ϕ_{n} - ϕ_{1}) & \sin (ϕ_{n} - ϕ_{2}) & \dots & \sin (ϕ_{n} - ϕ_{n}) \end{matrix}] \\ = {\sqrt{I - {\tilde{X}}^{2}}}^{\prime} \cdot \tilde{X} - {\tilde{X}}^{\prime} \cdot \sqrt{I - {\tilde{X}}^{2}} \end{array}

(5)

In the formulas, I is the unit row vector [1, 1, …, 1], the square and square root of X̃ are taken element-wise, and (·)′ denotes the transpose of the corresponding vector.

In GAF, time increases from the upper-left corner to the lower-right corner of the matrix, so temporal dependency is preserved. In addition, GAF examines the temporal correlation of the sequence by superposing (or differencing) the directions of two time points separated by a time interval k, which gives the elements G_ij with |i − j| = k.
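As a check on Equations (4) and (5), the sketch below builds GASF and GADF twice: once from the angles and once from the element-wise form of the matrix expressions, where √(1 − x̃_i²) plays the role of sin ϕ_i. It assumes the input has already been rescaled by Equation (1); the function names are hypothetical.

```python
import math


def gaf(x_t):
    """Return (GASF, GADF) for a series already rescaled into [-1, 1]."""
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in x_t]
    n = len(phi)
    gasf = [[math.cos(phi[i] + phi[j]) for j in range(n)] for i in range(n)]  # Eq. (4)
    gadf = [[math.sin(phi[i] - phi[j]) for j in range(n)] for i in range(n)]  # Eq. (5)
    return gasf, gadf


def gaf_closed_form(x_t):
    """Matrix forms of Eqs. (4)-(5), written out element by element.

    s_i = sqrt(1 - x~_i^2) equals sin(phi_i) because phi_i lies in [0, pi].
    """
    s = [math.sqrt(max(0.0, 1.0 - v * v)) for v in x_t]
    gasf = [[xi * xj - si * sj for xj, sj in zip(x_t, s)] for xi, si in zip(x_t, s)]
    gadf = [[si * xj - xi * sj for xj, sj in zip(x_t, s)] for xi, si in zip(x_t, s)]
    return gasf, gadf
```

The two routes agree to round-off, which confirms that GASF needs the product of the transposed and untransposed square-root terms, and that the GADF diagonal is always zero.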

2.1.2. MTF

MTF [27] is based on the Markov process: it describes the state transition probability between every pair of points in time series data so as to further analyze the dynamic characteristics and spatial structure of the data. The principle of encoding a time series as an MTF is as follows.

The normalized time series is usually assumed to follow a Gaussian distribution, so breakpoints that divide the area under the Gaussian curve into Q regions of equal size are easy to determine. The breakpoints form a sorted set {q_1, q_2, …, q_i, …, q_j, …, q_Q} such that, with P(·) denoting the Gaussian cumulative distribution function, the area between adjacent breakpoints satisfies Equation (6).

P (q_{i + 1}) - P (q_{i}) = \frac{1}{Q}

(6)

After the quantile bins are determined, each point x_i of the time series is assigned to its corresponding bin q_j, and the transitions between bins along the time axis are counted as a first-order Markov chain to obtain a weighted adjacency matrix W of size Q × Q. After each row is normalized so that Σ_j w_ij = 1, W becomes the following Markov transition matrix:

W = [\begin{matrix} w_{11 | x_{t} \in q_{1}, x_{t - 1} \in q_{1}} & w_{12 | x_{t} \in q_{1}, x_{t - 1} \in q_{2}} & \dots & w_{1 Q | x_{t} \in q_{1}, x_{t - 1} \in q_{Q}} \\ w_{21 | x_{t} \in q_{2}, x_{t - 1} \in q_{1}} & w_{22 | x_{t} \in q_{2}, x_{t - 1} \in q_{2}} & \dots & w_{2 Q | x_{t} \in q_{2}, x_{t - 1} \in q_{Q}} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ w_{Q 1 | x_{t} \in q_{Q}, x_{t - 1} \in q_{1}} & w_{Q 2 | x_{t} \in q_{Q}, x_{t - 1} \in q_{2}} & \dots & w_{Q Q | x_{t} \in q_{Q}, x_{t - 1} \in q_{Q}} \end{matrix}]

(7)

Because W is insensitive to the distribution of the data and discards temporal dependency, MTF extends the Markov matrix by aligning each transition probability along the temporal order, as defined in Equation (8).

M = [\begin{matrix} M_{11} & M_{12} & \dots & M_{1 n} \\ M_{21} & M_{22} & \dots & M_{2 n} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ M_{n 1} & M_{n 2} & \dots & M_{n n} \end{matrix}] = [\begin{matrix} m_{11 | x_{1} \in q_{i}, x_{1} \in q_{j}} & m_{12 | x_{1} \in q_{i}, x_{2} \in q_{j}} & \dots & m_{1 n | x_{1} \in q_{i}, x_{n} \in q_{j}} \\ m_{21 | x_{2} \in q_{i}, x_{1} \in q_{j}} & m_{22 | x_{2} \in q_{i}, x_{2} \in q_{j}} & \dots & m_{2 n | x_{2} \in q_{i}, x_{n} \in q_{j}} \\ \dots & \dots & ⋱ & \dots \\ m_{n 1 | x_{n} \in q_{i}, x_{1} \in q_{j}} & m_{n 2 | x_{n} \in q_{i}, x_{2} \in q_{j}} & \dots & m_{n n | x_{n} \in q_{i}, x_{n} \in q_{j}} \end{matrix}]

(8)

In MTF, M_ij with |i − j| = k denotes the transition probability between points separated by time interval k. The main diagonal M_ii corresponds to a time interval of 0 and captures the self-transition probability, i.e., the probability that the state does not change. For an original time series of length n, the MTF is of size n × n.
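A minimal MTF sketch following Equations (6)–(8) is given below. Several details are assumptions rather than the authors' settings: the series is z-scored before the Gaussian breakpoints are applied, Q = 8 bins is only a default, and the transition matrix uses the common from-to orientation (row = bin at t − 1, column = bin at t, each row summing to 1), which may differ from the index order written in Equation (7).

```python
import bisect
from statistics import NormalDist, fmean, pstdev


def mtf(x, Q=8):
    """Markov Transition Field of series x using Q equal-probability Gaussian bins."""
    mu, sd = fmean(x), pstdev(x)
    if sd == 0:
        raise ValueError("a constant series has no transitions to encode")
    z = [(v - mu) / sd for v in x]  # standardize so N(0, 1) breakpoints apply
    # Q - 1 breakpoints splitting N(0, 1) into Q regions of probability 1/Q (Eq. 6)
    cuts = [NormalDist().inv_cdf(k / Q) for k in range(1, Q)]
    states = [bisect.bisect_right(cuts, v) for v in z]  # bin index 0..Q-1
    # First-order transition counts W[a][b]: bin a at t-1 followed by bin b at t,
    # then each row normalized to sum to 1 (Eq. 7)
    W = [[0.0] * Q for _ in range(Q)]
    for a, b in zip(states, states[1:]):
        W[a][b] += 1.0
    for row in W:
        total = sum(row)
        if total:
            row[:] = [c / total for c in row]
    # Spread the transition probabilities along the time axis (Eq. 8)
    n = len(x)
    M = [[W[states[i]][states[j]] for j in range(n)] for i in range(n)]
    return M, W, states
```

Because M only looks up entries of W, a 1000-point acceleration record yields a 1000 × 1000 image regardless of Q; Q controls how finely amplitude levels are distinguished.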

2.1.3. URP

The recurrence plot (RP) was proposed by Eckmann [28] to analyze the recurrent behavior of nonlinear time series, and it has proven highly effective. The main idea of RP is to reconstruct the phase space of the time series using an embedding dimension m and a time delay τ, revealing the times at which the trajectory returns to a state it has visited before. RP is defined as follows:

RP_{i, j} (ε) = H (ε - ‖\vec{x} (i) - \vec{x} (j)‖), \quad \vec{x} (\cdot) \in ℜ^{m}, \quad i, j = 1, \dots, N

(9)

where N is the number of states, m is the dimension of the phase space trajectory, \vec{x}(i) and \vec{x}(j) are the observed sub-series at states i and j, respectively, ε is the threshold, ‖·‖ is a norm (such as the Euclidean norm or the maximum norm), and H(y) is the Heaviside unit step function used to binarize the distance matrix, as defined in Equation (10).

H (y) = \begin{cases} 0, & y < 0 \\ 1, & y \geq 0 \end{cases}

(10)

where y = ε − ‖\vec{x}(i) − \vec{x}(j)‖. H(y) is 0 when y is negative, meaning that no recurrence relationship exists between states i and j, and an empty point is placed in the RP. H(y) is 1 when y is non-negative, indicating a recurrence relationship between the two states, and a black dot is plotted in the RP.

Binarization discards information. To avoid this loss, the thresholding step can be omitted and the two binary values replaced by a range of color values that reflect the distances themselves; this variant is called URP [29,30]. The URP encoding technique is also adopted in this paper to convert time series into images. In URP, each pixel depicts the phase-space distance between two states rather than a binary recurrence state, as expressed in Equation (11).

U R P_{i, j} = ‖\vec{x} (i) - \vec{x} (j)‖

(11)
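The recurrence computations in Equations (9)–(11) are sketched below with a delay embedding and the Euclidean norm. The embedding dimension m = 3 and delay τ = 1 are placeholder values chosen for illustration; the values used in this study are not reported in this section.

```python
import math


def embed(x, m=3, tau=1):
    """Delay embedding: x_vec(i) = (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    count = len(x) - (m - 1) * tau
    if count <= 0:
        raise ValueError("series too short for this m and tau")
    return [tuple(x[i + k * tau] for k in range(m)) for i in range(count)]


def urp(x, m=3, tau=1):
    """Eq. (11): unthresholded distance matrix between embedded states."""
    vecs = embed(x, m, tau)
    return [[math.dist(a, b) for b in vecs] for a in vecs]


def rp(x, eps, m=3, tau=1):
    """Eqs. (9)-(10): binary recurrence plot, H(eps - distance)."""
    return [[1 if eps - d >= 0 else 0 for d in row] for row in urp(x, m, tau)]
```

The URP has n − (m − 1)τ rows and columns, slightly fewer than the n × n GASF and MTF images, which is one reason the images must be brought to a common size before they are fused.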

2.1.4. Fusion

Images produced by different encoding techniques represent the same time series from different perspectives and with different characterization capabilities. For a time series whose characteristics vary from one location to another, a single encoding technique can hardly capture all of them. To take full advantage of the different encoding techniques, this paper proposes a fused image that combines the three encoded images described above, to further highlight the abnormal features in the monitoring data.

First, the images produced by the different encoding techniques are converted to the same size. The fused image is then obtained by a weighted superposition, expressed as follows:

I_{F} = \sum_{i = 1}^{n} (α_{i} \cdot I_{i})

(12)

\sum_{i = 1}^{n} α_{i} = 1

(13)

where I_F is the fused image, I_i is the image produced by the ith encoding technique, and α_i is the corresponding fusion weight. To balance the features captured by the different encoding techniques, the three weights are set equal, i.e., α₁ = α₂ = α₃ = 1/3. Figure 2 demonstrates the process of encoding a time series as three channel images and fusing them into one multiple-channel image.
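Once the images share a size, Equations (12) and (13) reduce to a weighted pixel-wise sum. The sketch below assumes nearest-neighbour resizing and min-max scaling of each image to [0, 1] before fusion; neither step is specified in the text, so both are illustrative choices. It also treats each encoded image as a single 2D array rather than the multi-channel color images shown in Figure 2.

```python
def resize_nearest(img, size):
    """Nearest-neighbour resize of a 2D list to size x size (assumed method)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def minmax01(img):
    """Scale pixel values to [0, 1] so images on different scales can be summed."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in img]


def fuse(images, weights=None, size=None):
    """Eqs. (12)-(13): I_F = sum_i alpha_i * I_i, with sum_i alpha_i = 1."""
    n = len(images)
    weights = weights or [1.0 / n] * n  # equal weights, as in the text
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("fusion weights must sum to 1")
    size = size or min(len(img) for img in images)
    prepared = [minmax01(resize_nearest(img, size)) for img in images]
    return [[sum(a * img[r][c] for a, img in zip(weights, prepared))
             for c in range(size)] for r in range(size)]
```

Scaling before summing keeps one encoding, for instance a URP with large distance values, from dominating the others when equal weights are used.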

2.2. Convolutional Neural Network

A CNN uses feedforward operations such as convolution, pooling, and nonlinear mapping to automatically extract features from input data, and it performs well in various pattern recognition and computer vision tasks [31]. To exploit these advantages, this paper designs a two-dimensional CNN to recognize bridge damage patterns. The structure of the designed CNN is shown in Figure 3. The key layers, training strategies, and hyperparameter configurations of the CNN are described as follows.

2.2.1. Key Layers

In the designed CNN, the fused GASF, MTF, and URP image is used as the input, and the damage condition is the output. Between the input layer and the output layer, there are two stacked feature learning modules, each consisting of a convolution layer, a batch normalization layer, and a max pooling layer. The feature learning modules are followed by a flattening layer and two fully connected layers with Dropout. The output layer uses the Softmax activation function to identify each damage condition, and the Leaky ReLU activation function is applied in all other activation layers. To provide a comprehensive understanding of the network’s construction, the parameters of each layer are summarized in Table 1.

(1): Convolutional layer

In the convolution layer, the input data are convolved with specific convolution kernels to extract features, and the output is generally passed to an activation layer, which produces the corresponding feature map. The inactivated (pre-activation) feedforward operation of the convolution layer can be expressed as follows [31]:

u_{l}^{k} = \sum_{h = 1}^{U} x_{l - 1}^{h} * w_{l}^{k} + b_{l}^{k}

(14)

where u_l^k is the latent representation of the kth inactivated feature map of the current layer; x_{l−1}^h is the hth feature map of the previous layer, or the hth channel of the input image when the current layer is the first convolutional layer; U is the number of feature maps or channels; the symbol * denotes the 2D convolution operation; and w_l^k and b_l^k are the weight (convolution kernel) and bias of the convolution operation, respectively. The convolution operation is demonstrated in Figure 4.
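For a single output map, Equation (14) can be sketched as below, assuming stride 1 and no padding. Like most deep-learning frameworks, the sketch implements * as cross-correlation (the kernel is not flipped), and it gives each input channel h its own 2D slice of the kernel w_l^k; the function name and list-of-lists layout are illustrative.

```python
def conv2d_valid(channels, kernels, bias):
    """Eq. (14) for one output map: u = sum_h x^h * w^h + b (stride 1, no padding).

    channels: U input maps (2D lists); kernels: U kernel slices (2D lists).
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    H, W = len(channels[0]), len(channels[0][0])
    out = [[bias] * (W - kw + 1) for _ in range(H - kh + 1)]
    for x, k in zip(channels, kernels):  # sum over the U input channels
        for r in range(H - kh + 1):
            for c in range(W - kw + 1):
                out[r][c] += sum(k[i][j] * x[r + i][c + j]
                                 for i in range(kh) for j in range(kw))
    return out
```

For example, a 2 × 2 identity-diagonal kernel over a 3 × 3 input produces a 2 × 2 map in which each entry is the sum of two diagonal neighbours plus the bias.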

(2): Activation layer

The convolution operation is a linear mapping. In order to enhance the feature expression ability of the model, an activation layer with a nonlinear mapping function can be added after the convolution layer so that the model can better learn complex nonlinear features. The inactivated output of the convolutional layer can be formally converted into a feature map by the activation function. The process is expressed by a mathematical formula as follows:

y_{l}^{k} = f (u_{l}^{k})

(15)

where y_l^k is the kth feature map of the current layer, u_l^k is the inactivated output of the preceding convolutional layer, and f(·) is a nonlinear activation function.

The Leaky ReLU activation function [32,33], which mitigates the vanishing gradient problem and converges quickly, is adopted in this paper; its expression is shown in Equation (16).

\mathrm{LeakyReLU} (x) = \begin{cases} x, & x \geq 0 \\ α x, & x < 0 \end{cases}

(16)

where α is a small positive number to ensure that the left half has a certain gradient.
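Equations (15) and (16) amount to an element-wise mapping of the pre-activation map. The slope α = 0.01 below is a common default and an assumption; the value used in the designed CNN is not stated here.

```python
def leaky_relu(x, alpha=0.01):
    """Eq. (16): identity for x >= 0, small slope alpha for x < 0."""
    return x if x >= 0 else alpha * x


def activate(feature_map, alpha=0.01):
    """Eq. (15): y = f(u), applied element-wise to a 2D pre-activation map."""
    return [[leaky_relu(v, alpha) for v in row] for row in feature_map]
```

Because negative inputs keep a gradient of α instead of 0, units that receive negative pre-activations can still be updated during training.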

(3): Batch normalization layer

The batch normalization (BN) layer [34] normalizes the data to zero mean and unit variance in each training iteration, which alleviates the difficulty of coordinating parameter updates across layers and reduces the risk of overfitting [35]. This simple and effective method gives the network faster convergence and stronger generalization. The BN transformation is expressed as follows:

μ_{D} = \frac{1}{m} \sum_{i = 1}^{m} x_{i}

(17)

σ_{D}^{2} = \frac{1}{m} \sum_{i = 1}^{m} {(x_{i} - μ_{D})}^{2}

(18)

{\hat{x}}_{i} = \frac{x_{i} - μ_{D}}{\sqrt{σ_{D}^{2} + ε}}

(19)

y_{i} = γ {\hat{x}}_{i} + β

(20)

where m is the size of the mini-batch D = {x_1, x_2, …, x_m}; μ_D and σ_D² are the mean and variance of the mini-batch, respectively; x̂_i is the normalized value; and ε is a small constant added to the mini-batch variance for numerical stability. γ and β are two trainable parameters that linearly transform x̂_i into y_i to enhance its expressiveness.
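The training-time transform of Equations (17)–(20) for one feature can be sketched as follows. The default ε = 1e-5 and the fixed γ and β are assumptions for illustration; a full BN layer would also learn γ and β and keep running statistics for inference, which this sketch omits.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Eqs. (17)-(20) for one feature over a mini-batch D = {x_1, ..., x_m}."""
    m = len(batch)
    mu = sum(batch) / m                                        # Eq. (17)
    var = sum((x - mu) ** 2 for x in batch) / m                # Eq. (18)
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in batch]   # Eq. (19)
    return [gamma * xh + beta for xh in x_hat]                 # Eq. (20)
```

With γ = 1 and β = 0 the output has zero mean and (for ε → 0) unit variance; other values of γ and β let the network rescale and shift the normalized feature if that helps.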

(4): Pooling layer

The pooling layer replaces the output of the network at a given position with a summary statistic of the neighboring outputs. It reduces the data dimension, concentrates the extracted features, and helps prevent overfitting to some extent [36]. Figure 5 shows the commonly used Max Pooling and Average Pooling operations. The designed CNN adopts max pooling, which can be described as follows:

P_{i, j} \subset \{1, 2, \dots, N_{in}\}^{2}, \quad \forall (i, j) \in \{1, 2, \dots, N_{out}\}^{2}

(21)

{output}_{i, j} = \max_{(m, n) \in P_{i, j}} ({Input}_{m, n})

(22)

where P_i,j is the pooling region; N_in is the dimension of input, while N_out is the dimension of output.
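A minimal max-pooling sketch for Equations (21) and (22) follows. It assumes a square N_in × N_in input and, by default, non-overlapping 2 × 2 windows; these sizes are illustrative, and the pooling parameters of the designed CNN are those listed in Table 1.

```python
def max_pool2d(x, size=2, stride=2):
    """Eqs. (21)-(22): each output is the max over its pooling region P_ij."""
    n_in = len(x)
    n_out = (n_in - size) // stride + 1
    return [[max(x[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(n_out)] for r in range(n_out)]
```

With size = stride = 2, each output keeps the strongest response in a 2 × 2 block, halving both spatial dimensions.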

(5): Fully connected layer

The feature representations learned in the convolution and pooling layers are flattened into column vectors and input into the fully connected (FC) layer, and then the mapping relationship between complex high-dimensional features and simple low-dimensional labels is established. To reduce the complexity of the model and improve the generalization ability of the model, regularization and Dropout [37] techniques are usually combined with the FC layer.

(6): Output layer

The output layer at the end of the network completes the label mapping: high-level features from the FC layer are passed to it to determine their associated classes. This layer uses the categorical cross-entropy function as the loss function to guide learning and the Softmax function to calculate the probability of each possible class, as expressed in Equations (23) and (24).

L = - \frac{1}{N} \sum_{n = 1}^{N} \sum_{k = 1}^{K} l_{k}^{(n)} \log (Ψ_{k}^{n})

(23)

Ψ_{k}^{n} = \frac{\exp (η_{k})}{\sum_{m = 1}^{K} \exp (η_{m})}

(24)

where N is the number of samples in the mini-batch, K is the number of classes, l_k^(n) is the one-hot ground-truth label of sample n for class k, Ψ_k^n is the predicted probability that sample n belongs to class k, and η_k is the kth input (logit) to the Softmax function.
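Equations (23) and (24) can be sketched as below. Subtracting the largest logit before exponentiating leaves the Softmax output unchanged but avoids overflow; this numerical safeguard is an addition, not part of the equations.

```python
import math


def softmax(logits):
    """Eq. (24), shifted by the max logit for numerical stability."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, batch_labels):
    """Eq. (23): mean over N samples of -sum_k l_k * log(psi_k), one-hot labels."""
    N = len(batch_logits)
    loss = 0.0
    for logits, onehot in zip(batch_logits, batch_labels):
        probs = softmax(logits)
        # Only the true class contributes, since the other labels are zero.
        loss -= sum(l * math.log(p) for l, p in zip(onehot, probs) if l)
    return loss / N
```

With uniform logits over the six damage scenarios, the loss equals log 6 ≈ 1.79, a useful sanity check for an untrained network.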

2.2.2. Training Strategies and Hyperparameter Configurations

The performance of the network depends not only on the layer layout but also on the training strategy and hyperparameter configuration. To avoid overfitting and achieve efficient and stable convergence, the Adam optimization algorithm, mini-batch training, L2 regularization, and Dropout [37] are applied to the network. The Adam optimization algorithm [38] updates the parameters of each layer with an adaptive learning rate that changes across learning stages, and three hyperparameters, α, β1, and β2, control the optimization process. Mini-batch training, L2 regularization, and Dropout serve to prevent overfitting. In this study, the hyperparameter configurations of the network are determined by manual trial and error, with the aim of achieving rapid and stable convergence during pre-training. The final configurations are presented in Table 2.
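The Adam update can be sketched for a flat list of parameters as follows. The defaults α = 0.001, β1 = 0.9, and β2 = 0.999 are the commonly published values, not the configuration in Table 2, and the `l2` argument shows one simple way (assumed here) to fold an L2 penalty into the gradient.

```python
import math


def adam_step(theta, grad, state, alpha=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0):
    """One Adam update applied in place to a list of parameters.

    `state` holds the moment estimates "m", "v" and the step counter "t".
    `l2` adds the gradient l2 * theta of an L2 penalty to the loss gradient.
    """
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grad):
        g = g + l2 * theta[i]
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # first moment
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # second moment
        m_hat = state["m"][i] / (1 - beta1 ** t)                     # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        theta[i] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def new_state(n_params):
    """Zero-initialized Adam state for n_params parameters."""
    return {"m": [0.0] * n_params, "v": [0.0] * n_params, "t": 0}
```

Usage: starting from θ = 1 and repeatedly calling `adam_step` with the gradient 2θ of θ² drives θ toward 0. The first step moves θ by about α regardless of the gradient's scale, which is the adaptive behavior referred to above.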

3. Case Study

3.1. Bridge Damage Simulation

The proposed damage identification method was validated on a continuous rigid-frame bridge model constructed with reinforced concrete. The steel reinforcement used is HRB335, with the girder made of C50 concrete and the piers made of C40 concrete. Damage scenarios were focused on the piers, which have hollow rectangular sections with a reinforcement ratio of 0.6%. The layout of the bridge model is illustrated in Figure 6.

A finite element model of the bridge is established in OpenSees (version 3.7.0) to simulate its damage status and vibration responses. The damage status of the bridge is simulated by altering the stiffness of elements at the bottom of the four piers. The four levels of damage assigned to the elements are defined as Intact, Minor, Moderate, and Severe, corresponding to stiffness reductions of 0%, 20%, 50%, and 80%, respectively. To obtain the vibration responses of the damaged bridge model, white noise excitation is applied to the bases of the four piers in the longitudinal direction. The horizontal acceleration responses at the top and bottom of each pier are extracted, totaling eight measurement points. The sampling frequency for both excitation and response is set at 100 Hz, and the duration of the white noise excitation is set at 10 s, resulting in each measurement point having 1000 data points of acceleration response.

Six damage scenarios are generated with representative combinations of damage levels, and the damage states of the six damage scenarios are detailed in Table 3.

Randomly generated white noise is used to excite the model to obtain corresponding acceleration responses, producing a total of 6000 samples. For each sample, the acceleration responses from eight measurement points are used as inputs, and the damage states of the four locations are used as outputs.

3.2. Bridge Damage Identification

3.2.1. Sample Preparation

In actual engineering, the impact of environmental noise on the bridge cannot be ignored [39,40]. Therefore, randomly generated white noise is added to the simulated structural responses to represent environmental and measurement noise. The noise addition process is shown in Equation (25).

\mathrm{acc}_{\mathrm{noise}} = \mathrm{acc} + \mathrm{SNR} \cdot \mathrm{noise}

(25)

where acc is the raw bridge acceleration response, acc_noise is the acceleration response with noise, noise is Gaussian white noise with a mean of 0 and a standard deviation of 1, and SNR is the signal-to-noise ratio. In this study, to simulate bridge acceleration responses that are completely overwhelmed by noise, SNR is set to 1, meaning that the noise and the acceleration response have equal energy.
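The sketch below adds white Gaussian noise to a response. It swaps in the common power-ratio definition of SNR, scaling the noise so that its mean power equals the signal power divided by `snr`; with `snr = 1` this gives the equal-energy condition described above. Equation (25) as written adds unit-variance noise directly, which matches that condition only when the response itself has unit mean power. The `seed` argument is an assumption added for reproducibility.

```python
import math
import random


def add_noise(acc, snr=1.0, seed=None):
    """Add white Gaussian noise whose mean power is P_signal / snr (power-ratio SNR)."""
    rng = random.Random(seed)
    p_signal = sum(a * a for a in acc) / len(acc)  # mean power of the response
    sigma = math.sqrt(p_signal / snr)              # noise standard deviation
    return [a + sigma * rng.gauss(0.0, 1.0) for a in acc]
```

For a 10 s record sampled at 100 Hz (1000 points, as in Section 3.1), `add_noise(acc, snr=1.0, seed=0)` returns a noisy copy whose added noise has roughly the same mean power as `acc`.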

After the noise is added, the time series are converted into four types of images (GASF, MTF, URP, and the fused image) using the encoding techniques described in Section 2.1. The four encoded images of the acceleration responses from the top of pier 2 are shown in Figure 7.

Figure 7 shows that the encoding techniques differ in how they represent damage: as the damage progresses from minor to severe, the features in the different encoded images change in different ways rather than following one consistent pattern. Using multiple encoded images can therefore better represent the essential features of the damage samples.

To compare how well encoded images represent bridge damage states, two sample libraries are created, one with raw acceleration responses and one with fused encoded images as inputs, both with damage scenario labels as outputs. Each library is divided into training, validation, and testing sets at a ratio of 6:2:2.
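A possible implementation of the 6:2:2 split is sketched below; random shuffling with a fixed seed is an assumption, since only the ratio is stated. Applied to the 6000 samples of Section 3.1, it yields 3600 training, 1200 validation, and 1200 testing samples, consistent with the 1200 testing samples used in Section 3.2.3.

```python
import random


def split_622(samples, seed=0):
    """Shuffle samples and split them into training/validation/testing at 6:2:2."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # fixed seed: an assumption for reproducibility
    n_train = len(idx) * 6 // 10
    n_val = len(idx) * 2 // 10

    def pick(ids):
        return [samples[i] for i in ids]

    return (pick(idx[:n_train]),
            pick(idx[n_train:n_train + n_val]),
            pick(idx[n_train + n_val:]))
```

Splitting within each damage scenario separately would guarantee balanced classes in every subset, a variant that is not described in the text.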

3.2.2. Network Training and Validation

Two CNN models are constructed according to the framework illustrated in Section 2.2, and the damage samples of acceleration responses and encoded images are employed to train the two models. During the learning process, the losses and accuracies of the training and validation samples are tracked, and they are demonstrated in Figure 8 and Figure 9.

Both models were trained on a GPU with a processing speed of 1.8 GHz, and they converged within 100 epochs, achieving satisfactory accuracy in bridge damage identification. The network using raw acceleration responses as input took approximately 22 min to train, with the training and validation losses stabilizing after 80 epochs. By contrast, the network using images as input required about 35 min to complete training, but its losses stabilized after only 50 epochs. Although the overall training time was longer for the image-based network, it demonstrated faster convergence and higher learning efficiency compared to the raw response-based network.

No obvious divergence between the training and validation losses can be observed, suggesting that no overfitting occurred. In Figure 8a and Figure 9a, the losses of the model using acceleration responses stabilized after 80 epochs, whereas those of the model using encoded images stabilized after 40 epochs, indicating that the second model converges faster. In Figure 8b and Figure 9b, the accuracy of the model using acceleration responses as inputs is 88%, while the model using encoded images as inputs achieved an accuracy of 93%.

In summary, for the same bridge damage samples, using encoded images as input features allowed the CNN to achieve faster convergence and higher identification accuracy.

3.2.3. Damage Identification

In this section, 1200 testing samples are used to evaluate how well the two models identify bridge damage. These samples were not seen by either model during training, so they provide an independent check of generalization. Feeding the testing samples into the trained models yields the damage identification results for the rigid-frame bridge model.

To show the correct identifications and misidentifications for each damage scenario directly, this study displays the identification results as confusion matrices. To compare the models with different inputs quantitatively, the precision, recall, and F1-score of each damage scenario are calculated using Equations (26)–(28), respectively.
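The sketch below is a minimal illustration under stated assumptions. It obtains predicted scenarios from the softmax output layer (Table 1) by taking the most probable class, then counts these predictions into a confusion matrix. Rows index the true scenario and columns the predicted scenario. This matches the layout described for Figure 10, where the right column holds recall. The helper names and the toy probabilities are hypothetical.

```python
def predict_labels(probabilities):
    """Index of the most probable scenario for each softmax output vector."""
    return [max(range(len(p)), key=p.__getitem__) for p in probabilities]


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows = true scenario, columns = predicted scenario."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


# Toy softmax outputs for five samples and three scenarios (illustrative only).
probs = [[0.8, 0.1, 0.1], [0.4, 0.5, 0.1], [0.2, 0.7, 0.1],
         [0.1, 0.6, 0.3], [0.1, 0.2, 0.7]]
y_true = [0, 0, 1, 1, 2]
cm = confusion_matrix(y_true, predict_labels(probs), n_classes=3)
print(cm)  # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
```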

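\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (26)

\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (27)

F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (28)

where, for a given damage scenario:

- TP (true positives) is the number of samples of that scenario identified correctly;
- FP (false positives) is the number of samples of other scenarios identified as that scenario;
- FN (false negatives) is the number of samples of that scenario identified as another scenario.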

The confusion matrices of the damage identification results of the two models are shown in Figure 10. In each matrix, the rightmost column gives the recall rate of each scenario and the bottom row gives the precision. The F1-scores of the different damage scenarios are listed in Table 4.
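Equations (26)–(28) can be evaluated for each scenario directly from such a confusion matrix. For a given scenario, the diagonal entry is TP, the rest of that scenario's column is FP, and the rest of its row is FN. The minimal sketch below assumes rows are true scenarios. The function name and the 3 × 3 matrix are illustrative, not data from this study.

```python
def per_class_metrics(cm):
    """Precision, recall and F1-score per class, Eqs. (26)-(28).

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as k
        fn = sum(cm[k]) - tp                       # class k predicted as others
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1))
    return results


# Illustrative 3-class confusion matrix (not data from this study).
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 1, 9]]
for k, (p, r, f1) in enumerate(per_class_metrics(cm), start=1):
    print(f"class {k}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it simplifies per class to 2TP/(2TP + FP + FN). For the first class of the toy matrix, this gives precision 8/9, recall 0.8, and F1 = 16/19 ≈ 0.842.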

When acceleration responses are used as inputs, scenarios 1 and 2 have lower recall rates, and there is substantial misidentification between them. In addition, the precision for scenario 3 is well below the average. When encoded images are used as inputs, the recall rates and precisions are more uniform across all scenarios, with no markedly poor identification results. The model using encoded images as inputs also achieves a higher F1-score in every scenario (average 0.911 versus 0.879; Table 4). Taken together, all three metrics indicate that using encoded images as inputs allows the CNN to achieve better damage identification results.

4. Conclusions

In this study, a bridge damage identification method based on encoded images and a CNN was proposed, and a continuous rigid-frame bridge model was employed to validate the proposed method. The damage identification results of different types of inputs were investigated and compared, and the following conclusions can be drawn:

Different image-encoding techniques can represent structural responses from various perspectives, and using fusion images can integrate these features, allowing for a more comprehensive representation of the responses.
Using encoded images as inputs enables more accurate damage identification, allowing for a more precise distinction between the closely related damage levels of scenario 1 and scenario 2.
Using encoded images as inputs and a CNN as the pattern classifier enables accurate identification of bridge damage states, achieving higher recall rates, precision, and F1-scores.
The bridge model used in this case study was simplified, and further research should be conducted for practical engineering applications, considering factors such as modeling errors and environmental interference.

Author Contributions

Conceptualization, X.W.; methodology, X.W. and S.S.; validation, X.W.; formal analysis, X.W., M.M. and F.Y.; investigation, X.W.; data curation, M.M. and F.Y.; writing—original draft preparation, X.W. and S.S.; visualization, X.W. and S.S.; writing—review and editing, X.W., W.L., M.M., F.Y. and S.S.; funding acquisition, X.W., W.L., M.M., F.Y. and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Key R&D program of Shaanxi Province (2023-YBGY-140 and 2020GY-096), the Traffic Scientific Research Project of Shaanxi Provincial Department of Transportation (22-53K and 23-46X), the Key R&D projects in Ningxia Hui Autonomous Region (2022BEG03173), and the Shaanxi Province Youth Science and Technology New Star Project (2022 KJXX-110).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Authors Xiaoguang Wang and Ming Ma were employed by CCCC First Highway Consultants Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Li, H.N.; Ren, L.; Jia, Z.G.; Yi, T.H.; Li, D.S. State-of-the-art in structural health monitoring of large and complex civil infrastructures. J. Civ. Struct. Health Monit. 2016, 6, 3–16.
2. Hajializadeh, D.; Obrien, E.J.; O’Connor, A.J. Virtual structural health monitoring and remaining life prediction of steel bridges. Can. J. Civ. Eng. 2017, 44, 264–273.
3. Vagnoli, M.; Remenyte-Prescott, R.; Andrews, J. Railway bridge structural health monitoring and fault detection: State-of-the-art methods and future challenges. Struct. Health Monit. 2018, 17, 971–1007.
4. Shakya, A.; Mishra, M.; Maity, D.; Santarsiero, G. Structural health monitoring based on the hybrid ant colony algorithm by using Hooke–Jeeves pattern search. SN Appl. Sci. 2019, 1, 799.
5. Zhou, X.; Li, Q.; Cui, R.; Zhu, X. Deep neural network based time–frequency decomposition for structural seismic responses training with synthetic samples. Comput. Aided Civ. Infrastruct. Eng. 2024. Early View.
6. Ahmed, H.; Mostafa, K.; Hegazy, T. Utilizing different artificial intelligence techniques for efficient condition assessment of building components. Can. J. Civ. Eng. 2024, 51, 379–389.
7. Ding, X.; Kwon, T.J. Enhancing winter road maintenance with explainable AI: SHAP analysis for interpreting machine learning models in road friction estimation. Can. J. Civ. Eng. 2024, 51, 529–544.
8. Liu, G.; Niu, Y.; Zhao, W.; Duan, Y.; Shu, J. Data anomaly detection for structural health monitoring using a combination network of GANomaly and CNN. Smart Struct. Syst. 2022, 29, 53–62.
9. Wang, Z.; Oates, T. Imaging time-series to improve classification and imputation. arXiv 2015, arXiv:1506.00327.
10. Nguyen, D.H.; Nguyen, Q.B.; Bui-Tien, T.; De Roeck, G.; Wahab, M.A. Damage detection in girder bridges using modal curvatures gapped smoothing method and Convolutional Neural Network: Application to Bo Nghi bridge. Theor. Appl. Fract. Mech. 2020, 109, 102728.
11. Zhou, X.; Li, M.; Liu, Y.; Yu, W.; Elchalakani, M. Cross-domain damage identification of bridge based on generative adversarial and deep adaptation networks. Structures 2024, 64, 106540.
12. Lu, P.; Liu, Z.; Zhang, T. A machine learning model to predict the seismic lifecycle behavior of a cross-sea cable-stayed bridge. Buildings 2024, 14, 1190.
13. Jia, L.; Xu, J.; Luo, K.; Li, W.; Liu, Y.; Pei, H. Mechanical performance analysis and parametric study of the transition section of a hybrid cable-stayed suspension bridge. Buildings 2024, 14, 1805.
14. Wu, J.; Zou, H.; He, N.; Xu, H.; Wang, Z.; Rui, X. Experimental and numerical analysis of flexural properties and mesoscopic failure mechanism of single-shell lining concrete. Buildings 2024, 14, 2620.
15. Zhou, X.; Zhao, Y.; Khan, I.; Cao, L. Comparative study on CNN-based bridge seismic damage identification using various features. KSCE J. Civ. Eng. 2024, 9, 1–10.
16. Bao, Y.; Tang, Z.; Li, H.; Zhang, Y. Computer vision and deep learning–based data anomaly detection method for structural health monitoring. Struct. Health Monit. 2019, 18, 401–421.
17. Tang, Z.; Chen, Z.; Bao, Y.; Li, H. Convolutional neural network-based data anomaly detection method using multiple information for structural health monitoring. Struct. Control Health Monit. 2019, 26, e2296.
18. Shajihan, S.A.V.; Wang, S.; Zhai, G.; Spencer Jr, B.F. CNN based data anomaly detection using multi-channel imagery for structural health monitoring. Smart Struct. Syst. 2022, 29, 181–193.
19. Mao, J.; Wang, H.; Spencer, B.F. Toward data anomaly detection for automated structural health monitoring: Exploiting generative adversarial nets and autoencoders. Struct. Health Monit. 2021, 20, 1609–1626.
20. Lei, X.; Xia, Y.; Wang, A.; Jian, X.; Zhong, H.; Sun, L. Mutual information based anomaly detection of monitoring data with attention mechanism and residual learning. Mech. Syst. Signal Process. 2023, 182, 109607.
21. De Santo, A.; Ferraro, A.; Galli, A.; Moscato, V.; Sperlì, G. Evaluating time series encoding techniques for Predictive Maintenance. Expert Syst. Appl. 2022, 210, 118435.
22. Mantawy, I.M.; Mantawy, M.O. Convolutional neural network based structural health monitoring for rocking bridge system by encoding time-series into images. Struct. Control Health Monit. 2022, 29, e2897.
23. Liao, Y.; Qing, X.; Wang, Y.; Zhang, F. Damage localization for composite structure using guided wave signals with Gramian angular field image coding and convolutional neural networks. Compos. Struct. 2023, 312, 116871.
24. Rahadian, H.; Bandong, S.; Widyotriatmo, A.; Joelianto, E. Image encoding selection based on Pearson correlation coefficient for time series anomaly detection. Alex. Eng. J. 2023, 82, 304–322.
25. Yang, C.L.; Chen, Z.X.; Yang, C.Y. Sensor classification using convolutional neural network by encoding multivariate time series as two-dimensional colored images. Sensors 2019, 20, 168.
26. Deng, Y.; Zhao, Y.; Ju, H.; Yi, T.H.; Li, A. Abnormal data detection for structural health monitoring: State-of-the-art review. Dev. Built Environ. 2024, 17, 100337.
27. Liu, L.; Wang, Z. Encoding temporal Markov dynamics in graph for time series visualization. arXiv 2016, arXiv:1610.07273.
28. Eckmann, J.P.; Kamphorst, S.O.; Ruelle, D. Recurrence plots of dynamical systems. Europhys. Lett. (EPL) 1987, 4, 973–977.
29. Yang, H. Multiscale recurrence quantification analysis of spatial cardiac vectorcardiogram signals. IEEE Trans. Biomed. Eng. 2010, 58, 339–347.
30. Sipers, A.; Borm, P.; Peeters, R. On the unique reconstruction of a signal from its unthresholded recurrence plot. Phys. Lett. A 2011, 375, 2309–2321.
31. Zhu, S.; Yu, T.; Xu, T.; Chen, H.; Dustdar, S.; Gigan, S.; Pan, Y. Intelligent computing: The latest advances, challenges, and future. Intell. Comput. 2023, 2, 0006.
32. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1–6.
33. Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical evaluation of rectified activations in convolutional network. arXiv 2015, arXiv:1505.00853.
34. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
35. Zhang, Y.; Miyamori, Y.; Mikami, S.; Saito, T. Vibration-based structural state identification by a 1-dimensional convolutional neural network. Comput. Aided Civ. Infrastruct. Eng. 2019, 34, 822–839.
36. Graham, B. Fractional max-pooling. arXiv 2014, arXiv:1412.6071.
37. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
38. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
39. Hernandez-Garcia, M.; Masri, S. Application of statistical monitoring using latent-variable techniques for detection of faults in sensor networks. J. Intell. Mater. Syst. Struct. 2013, 25, 121–136.
40. Huang, H.; Yi, T.; Li, H. Sensor fault diagnosis for structural health monitoring based on statistical hypothesis test and missing variable approach. J. Aerosp. Eng. 2017, 30, B4015003.

Figure 1. Framework of damage identification.

Figure 2. Image encoding of time series.

Figure 3. Structure of the designed CNN.

Figure 4. Convolution operation.

Figure 5. Pooling operations.

Figure 6. Layout of the bridge model (unit: cm).

Figure 7. Encoded images of the six damage scenarios.

Figure 8. Learning process tracking on the acceleration responses. (a) Loss. (b) Accuracy.

Figure 9. Learning process tracking on the encoded images. (a) Loss. (b) Accuracy.

Figure 10. Damage identification results. (a) Model trained on acceleration responses. (b) Model trained on encoded images.

Table 1. Detailed configurations of the designed CNN.

Layer	Type	Kernel Num.	Kernel Size	Stride	Padding	With BN	Activation	With Dropout
1	Input	None	None	None	None	None	None	None
2	Convolutional	16	3 × 3	1	Same	True	LReLU	False
3	Pooling	None	2 × 2	2	Valid	False	None	False
4	Convolutional	32	5 × 5	2	Same	True	LReLU	False
5	Pooling	None	2 × 2	2	Valid	False	None	False
6	Flatten	None	None	None	None	None	None	False
7	FC	None	None	None	None	False	LReLU	True
8	FC	None	None	None	None	False	LReLU	True
9	Output	None	None	None	None	None	Softmax	None
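Table 1 does not state the input image size or the number of neurons in the FC layers. The sketch below traces feature-map sizes through layers 2–5 for a hypothetical 64 × 64 × 3 input. It assumes TensorFlow/Keras-style padding rules: "same" gives ceil(n/s) and "valid" gives floor((n − k)/s) + 1. Under those assumptions, the flattened length it returns is the input width of the first FC layer (layer 7).

```python
import math


def out_size(n, kernel, stride, padding):
    """Output length along one axis (TensorFlow/Keras padding convention)."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1  # "valid": no padding


# Layers 2-5 of Table 1: (type, kernel size, stride, padding, number of kernels)
LAYERS = [
    ("conv", 3, 1, "same", 16),
    ("pool", 2, 2, "valid", None),
    ("conv", 5, 2, "same", 32),
    ("pool", 2, 2, "valid", None),
]


def trace_shapes(height, width, channels):
    """Feature-map shape after each layer and the flattened length (layer 6)."""
    shapes = [(height, width, channels)]
    for kind, k, s, padding, n_kernels in LAYERS:
        height = out_size(height, k, s, padding)
        width = out_size(width, k, s, padding)
        if kind == "conv":
            channels = n_kernels  # pooling leaves the channel count unchanged
        shapes.append((height, width, channels))
    return shapes, height * width * channels


shapes, flat = trace_shapes(64, 64, 3)  # 64 x 64 x 3 input is hypothetical
print(shapes)  # [(64, 64, 3), (64, 64, 16), (32, 32, 16), (16, 16, 32), (8, 8, 32)]
print(flat)    # 2048
```

Batch normalization and LReLU do not change tensor shapes, so they are omitted from the trace. Because layer 4 uses a stride of 2 together with "same" padding, each of layers 3–5 halves the spatial size, giving an overall 8× reduction along each axis.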

Table 2. Hyperparameter configurations of the CNN.

Hyperparameter	Value	Description
α	1 × 10⁻⁴	Initial learning rate
β1	0.98	Exponential decay rate of Adam's first-moment (momentum) estimate
β2	0.95	Exponential decay rate of Adam's second-moment (squared-gradient) estimate
Batch size	256	Number of samples fed into the network in a single batch
Epoch	100	Number of complete passes through all training samples
Dropout rate	0.3	Fraction of hidden units randomly deactivated during training
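For reference, the sketch below applies one Adam update (Kingma and Ba) to a single scalar parameter, using the α, β1, and β2 values of Table 2. The ε value of 1 × 10⁻⁸ is an assumed default that Table 2 does not list, and the function name is hypothetical. In practice, a deep learning framework's built-in optimizer would be used.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.98, beta2=0.95, eps=1e-8):
    """One Adam update for a scalar parameter.

    lr, beta1 and beta2 follow Table 2; eps = 1e-8 is an assumed default.
    t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# First step from zero-initialized moments: theta moves by almost exactly lr.
theta, m, v = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
```

On the first step, the bias-corrected ratio m̂/√v̂ equals the sign of the gradient, so the parameter moves by almost exactly α regardless of the gradient's magnitude. With β1 = 0.98 and β2 = 0.95, the two moment estimates average over roughly 1/(1 − β) ≈ 50 and 20 recent steps, respectively.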

Table 3. Details of damage scenarios (damage location and extent at each pier).

Damage Scenario	Pier 1	Pier 2	Pier 3	Pier 4
Scenario 1	Intact	Intact	Intact	Intact
Scenario 2	Minor	Intact	Minor	Intact
Scenario 3	Moderate	Intact	Minor	Minor
Scenario 4	Severe	Minor	Minor	Moderate
Scenario 5	Severe	Minor	Moderate	Moderate
Scenario 6	Severe	Moderate	Moderate	Severe

Table 4. F1-scores of different damage scenarios.

F1-Score	Scenario 1	Scenario 2	Scenario 3	Scenario 4	Scenario 5	Scenario 6	Average
Model 1 (acceleration responses)	0.873	0.861	0.847	0.874	0.888	0.929	0.879
Model 2 (encoded images)	0.893	0.901	0.913	0.917	0.900	0.942	0.911
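The Average column of Table 4 can be reproduced as the unweighted (macro) mean of the six per-scenario F1-scores, as the short check below shows. The function name is illustrative. The check also confirms that Model 2 scores higher than Model 1 in every scenario.

```python
# Per-scenario F1-scores from Table 4 (scenarios 1-6)
f1_model_1 = [0.873, 0.861, 0.847, 0.874, 0.888, 0.929]  # acceleration responses
f1_model_2 = [0.893, 0.901, 0.913, 0.917, 0.900, 0.942]  # encoded images


def macro_average(scores):
    """Unweighted mean over scenarios."""
    return sum(scores) / len(scores)


print(round(macro_average(f1_model_1), 3))  # 0.879
print(round(macro_average(f1_model_2), 3))  # 0.911
print(all(b > a for a, b in zip(f1_model_1, f1_model_2)))  # True
```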

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

