*2.2. Data Preprocessing*

Because missing values or calculation errors may occur during data acquisition, data cleaning is performed on all data to handle missing or incorrectly calculated values, which appear in MATLAB as "NaN". To keep the data length unchanged for input to the neural network, all "NaN" values are replaced with 0.
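As an illustration only, the replacement step can be sketched in Python as follows. The function name `replace_nan_with_zero` and the sample values are hypothetical and do not come from the original MATLAB processing; the sketch assumes each record is a plain sequence of floating-point samples.

```python
import math


def replace_nan_with_zero(series):
    """Return a copy of `series` in which every NaN is replaced by 0.0.

    The output has the same length as the input, so the record can still
    be fed to a network that expects a fixed input size.
    """
    return [0.0 if math.isnan(v) else float(v) for v in series]


# Hypothetical acceleration samples with two invalid (NaN) readings.
raw = [0.12, float("nan"), -0.05, float("nan"), 0.30]
cleaned = replace_nan_with_zero(raw)
print(cleaned)
```

Substituting 0 rather than deleting the invalid samples is what keeps the sequence length fixed, at the cost of introducing artificial zero readings into the record.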

**Figure 1.** The monitored bridge and the position of the accelerometer on the bridge deck and tower. Source: The 1st International Project Competition for Structural Health Monitoring, July 2020.

**Figure 2.** The example for each data pattern. (**a**) Normal data; (**b**) Missing data; (**c**) Minor data; (**d**) Outlier data; (**e**) Square data; (**f**) Trend data; (**g**) Drift data.


**Table 1.** Description of each type of data pattern.

Zero-mean normalization is applied to each one-hour data segment so that the normalized data have a mean of zero and a standard deviation of one. This rescaling reduces the effect of self-variation and of large differences in magnitude between values, which makes the data better suited to the subsequent steps. The normalization is given by Equation (1):

$$\mathbf{x}^{*} = \frac{\mathbf{x} - \mu}{\sigma} \tag{1}$$

where *x* = {*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x*<sub>*N*</sub>} is a one-hour time series, *μ* is the mean of all sampling points, *σ* is the standard deviation of all sampling points, and *x*\* is the normalized time series.
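A minimal sketch of Equation (1) in Python is given below for illustration. The function name `zero_mean_normalize` and the sample values are hypothetical. Two points are assumptions, since the text does not specify them: *σ* is taken as the population standard deviation (dividing by *N*), and a constant series (*σ* = 0) is mapped to all zeros to avoid division by zero.

```python
import statistics


def zero_mean_normalize(series):
    """Apply x* = (x - mu) / sigma to one time series.

    Assumption: sigma is the population standard deviation (divide by N).
    Assumption: a constant series (sigma == 0) is returned as all zeros.
    """
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series, mu)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(v - mu) / sigma for v in series]


# Hypothetical short series standing in for a one-hour record.
hour = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
normalized = zero_mean_normalize(hour)
print(normalized)
```

For this sample, *μ* = 5 and *σ* = 2, so the first point maps to (2 − 5)/2 = −1.5. If the sample standard deviation (dividing by *N* − 1) were used instead, the normalized values would be slightly smaller in magnitude.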
