**3. Bridge Monitoring Data Verification**

Following the proposed workflow, anomaly detection is performed on the bridge monitoring data set. First, all original samples are preprocessed: missing values are removed and the samples are standardized. To test the generalization ability of the model, the data set is split randomly into a training set and a test set: 80% of the samples (22,616) form the training set, and the remaining 20% (5,656) form the test set. To simulate real operating conditions, the class distribution of the test set is imbalanced. Table 2 shows the distribution of the test samples.
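A minimal sketch of this preprocessing and splitting step is given below (a Python illustration, not the authors' code; the array names, the treatment of missing values, and the use of scikit-learn are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: the real data set holds 28,272 samples of 72,000 points each.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 72000)).astype(np.float32)
y = rng.integers(0, 7, size=200)  # 7 data-pattern classes

# Handle missing values (here: replace NaN with 0; the exact treatment is an
# assumption), then standardize each sample to zero mean and unit variance.
X = np.nan_to_num(X)
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Random 80%/20% split (22,616 training and 5,656 test samples in the paper).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```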

**Table 2.** Data distribution of the test set.


Constructing a training set with balanced classes benefits the training process, so data expansion is carried out on the two under-represented anomaly types in the training set, namely outlier and drift. Outlier samples are generated from the normal samples in the training set by magnifying individual points. Each drift sample is augmented seven times by adding Gaussian noise with a standard deviation of 2%, 3%, 4%, 5%, 6%, 7%, and 8% of the signal, respectively, and once by a symmetrical flip, yielding eight times the original number of drift samples. This produces an additional 10,860 (13,575 × 80%) outlier samples and 4,345 (679 × 80% × 8) drift samples. After adding them, the new training set contains 37,821 samples (22,616 + 10,860 + 4,345).
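The expansion of the two minority classes might look like the following sketch (the magnification gain, the number of magnified points, and the reading of "symmetrical flip" as time reversal are all assumptions):

```python
import numpy as np

def make_outlier(sample, rng, gain=10.0, n_points=3):
    """Create an outlier sample from a normal one by magnifying a few points
    (gain and n_points are illustrative assumptions)."""
    s = sample.copy()
    idx = rng.choice(s.size, size=n_points, replace=False)
    s[idx] *= gain
    return s

def expand_drift(samples, rng):
    """Expand each drift sample into 8: seven noisy copies plus one flipped copy."""
    out = []
    for s in samples:
        for pct in (0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08):
            out.append(s + rng.normal(0.0, pct * s.std(), size=s.shape))
        out.append(s[::-1])  # symmetrical flip (assumed: time reversal)
    return np.asarray(out)

rng = np.random.default_rng(0)
drift = rng.standard_normal((5, 72000))   # a few drift samples for illustration
print(expand_drift(drift, rng).shape)     # (40, 72000): 8x the input count
```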

Down-sampling is then applied to the new training set and the test set, reducing each sample from 1 × 72,000 to 2 × 3,600 while retaining most of its features.
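One plausible implementation of this reduction splits each 72,000-point signal into 3,600 non-overlapping windows of 20 points and keeps two statistics per window; the choice of per-window minimum and maximum below is an assumption, picked because extrema preserve the short spikes that characterize outlier samples:

```python
import numpy as np

def downsample(signal, window=20):
    """Reduce a 1 x 72,000 signal to 2 x 3,600 by keeping the minimum and
    maximum of each non-overlapping window (assumed statistics)."""
    w = signal.reshape(-1, window)                    # (3600, 20)
    return np.stack([w.min(axis=1), w.max(axis=1)])   # (2, 3600)

x = np.random.default_rng(0).standard_normal(72000)
print(downsample(x).shape)  # (2, 3600)
```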

To build the 1D-CNN architecture, two one-dimensional convolutional layers are stacked to extract deep features from the samples efficiently, followed by a flatten layer and two dense layers that convert the two-dimensional feature maps into a one-dimensional output. The last layer of the network is a softmax multi-classifier: the softmax function maps the outputs of the previous layer into the interval (0, 1) such that they sum to 1 and can be interpreted as class probabilities, and the class with the largest probability is selected as the predicted data type. The network structure is shown in Figure 6, the detailed structure of the 1D-CNN in Table 3, and the hyperparameter configuration in Table 4.
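A minimal Keras sketch of such a network is shown below (the filter counts, kernel sizes, pooling, and dense width are placeholders; the actual values appear in Table 3):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Down-sampled input of shape (3600, 2), channels-last layout.
model = keras.Sequential([
    layers.Input(shape=(3600, 2)),
    layers.Conv1D(16, kernel_size=9, activation="relu"),  # assumed filters/kernel
    layers.MaxPooling1D(4),
    layers.Conv1D(32, kernel_size=9, activation="relu"),  # assumed filters/kernel
    layers.MaxPooling1D(4),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                  # assumed width
    layers.Dense(7, activation="softmax"),                # 7 data-pattern classes
])
model.summary()
```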

**Figure 6.** Schematic of the proposed CNN architecture.


**Table 3.** The detailed architecture of CNN.

**Table 4.** The configurations of training process.


Mean Squared Error (MSE), used as the loss function for training and validation, can be expressed as:

$$MSE = \frac{1}{N} \sum\_{i=1}^{N} (Y\_i - Y\_{0,i})^2 \tag{3}$$

where *Y_i* represents the predicted value of the *i*-th sample, *Y*_{0,i} represents the corresponding true label, and *N* represents the total number of samples.

During training, 12.5% of the training set is held out as a validation set. The training and validation loss (MSE) and the training and validation accuracy are monitored throughout; their evolution is shown in Figures 7 and 8.
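Continuing the model sketch above (and assuming X_train has already been down-sampled to shape (N, 3600, 2)), the training setup can be expressed as follows; the optimizer, batch size, and epoch count are assumptions, with the real values in Table 4:

```python
from tensorflow.keras.utils import to_categorical

# The MSE loss compares the softmax outputs against one-hot label vectors.
y_train_1h = to_categorical(y_train, num_classes=7)

model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
history = model.fit(
    X_train, y_train_1h,
    validation_split=0.125,  # 12.5% of the training set used for validation
    batch_size=256,          # assumed; the text notes a large batch size
    epochs=100,              # assumed
)
```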

**Figure 7.** Training and validation loss curve.

It can be seen that the loss shows an overall downward trend and the accuracy an overall upward trend. Both change rapidly at the beginning of training, indicating that the learning rate is appropriate. Local glitches and oscillations occur, possibly because a large batch size was selected for the large number of samples, and because the real-world data set contains a small number of mislabeled samples. After the loss and accuracy stabilize, the final training and validation accuracy both exceed 95%.

**Figure 8.** Training and validation accuracy curve.

Table 5 summarizes the classification results statistically. In binary or multi-class classification, precision, recall, and F1 score measure the quality of the results; the F1 score is the harmonic mean of the first two. Recall is defined relative to the samples, that is, how many of the positive samples are predicted correctly. Take the missing-type samples in Table 5 as an example: of 603 missing-type samples, 602 are predicted correctly, so the recall is 602/603 = 99.83%. Precision is defined relative to the predictions: it indicates how many of the samples predicted as positive are actually correct. Taking the normal-type samples as an example, a total of 2,590 samples are predicted to be normal, and 2,542 of those predictions are correct, so the precision is 2542/2590 = 98.15%. Recall and precision are sometimes in tension; the most common single indicator combining them is the F1 score, computed as follows:

$$F\_1 = 2\frac{precision \cdot recall}{precision + recall} \times 100\% \tag{4}$$

where *F*1 represents the F1 score, *recall* the recall, and *precision* the precision.
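As a quick check of the worked figures above (plain Python; the final call uses hypothetical same-class values, since Equation (4) pairs the precision and recall of a single class):

```python
# Recall of the missing class: 602 of 603 missing samples predicted correctly.
print(round(602 / 603 * 100, 2))    # 99.83

# Precision of the normal class: 2,542 of 2,590 "normal" predictions correct.
print(round(2542 / 2590 * 100, 2))  # 98.15

def f1(precision, recall):
    """F1 score per Equation (4), expressed as a percentage."""
    return 2 * precision * recall / (precision + recall) * 100

# Hypothetical class with precision 0.98 and recall 0.90:
print(round(f1(0.98, 0.90), 2))     # 93.83
```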


**Table 5.** The prediction result of the test set.

It can be seen that the proposed method can effectively identify the various data patterns. The recall of the normal, missing, minor, square, trend, and drift categories exceeds 90%, and all types except outlier and drift also achieve a high F1 score. A small number of minor samples are classified as normal. Some outlier samples are classified as normal, and a few as minor: an outlier sample may contain only a few peaks, so most of its features closely resemble those of a normal sample, and features of such a small scale can be lost during convolution. Trend and drift are partly confused with each other, probably because both exhibit slanted features.
