A Novel Imputation Model for Missing Concrete Dam Monitoring Data

Cui, Xinran; Gu, Hao; Gu, Chongshi; Cao, Wenhan; Wang, Jiayi

doi:10.3390/math11092178


by Xinran Cui ^1,2,3, Hao Gu ^1,3,*, Chongshi Gu ^1,2,3, Wenhan Cao ^1,2,3 and Jiayi Wang ^1,2,3

¹ College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing 210098, China

² National Engineering Research Center of Water Resources Efficient Utilization and Engineering Safety, Hohai University, Nanjing 210098, China

³ State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing 210098, China

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(9), 2178; https://doi.org/10.3390/math11092178

Submission received: 25 March 2023 / Revised: 25 April 2023 / Accepted: 3 May 2023 / Published: 5 May 2023

(This article belongs to the Special Issue Mathematical Modeling and Numerical Simulation in Engineering)


Abstract: To ensure the safety of concrete dams, a large number of monitoring instruments are embedded in the bodies and foundations of the dams. However, monitoring data are often missing due to failure of monitoring equipment, human error and other factors, which makes it difficult to diagnose dam safety and to predict deformation precisely. In this paper, a new method for imputing missing deformation data is proposed. First, the traditional increment speed distance index used in the deformation similarity index does not account for the fact that deformation often changes very little over two consecutive days, so the denominator of the index tends to be equal to zero. An improved index that solves this problem is proposed. A combined weighting method for calculating the comprehensive deformation similarity index is then proposed and used together with the k-means clustering method to classify deformation monitoring points. Subsequently, a panel data model that imputes different types of missing data is established. The proposed method can impute missing concrete dam deformation data more accurately; therefore, it can effectively solve the missing deformation monitoring data problem.

Keywords: missing data imputation; distance similarity index; measurement point clustering; panel data model; concrete dam

MSC: 62P30

1. Introduction

The behavior of a concrete dam is a nonlinear dynamic evolution process in which materials and structures interact under the synergistic action of multiple factors [1,2]. As a comprehensive effect quantity of the performance of the concrete dam, deformation always attracts more attention as it indicates the transformation of the structural behavior of the dam [3,4,5]. An enormous amount of deformation monitoring data were gathered for fundamental analysis and predictions of dam deformation behavior during the lifespan of a concrete dam [6]. However, the failure of monitoring instruments, the complicated monitoring environments and human errors inevitably led, to some extent, to the loss of actual measured data [7]. Health monitoring and evaluation may be less effective and accurate due to missing data. In some situations, it may even result in an error of judgement. Therefore, it is of significant importance to submit feasible missing data imputation models to improve the reliability of the predicted results for dam safety monitoring [8]. Generally, missing data patterns can be approximately classified into different groups such as univariate, multivariate, monotone, nonmonotone and file matching. There are two common cases of missing monitoring concrete dam data: single missing data and multiple missing data [9]. At present, the methods for imputing missing data fall into two main categories: statistics-based methods and machine learning-based methods. Statistical missing data imputation methods such as mean imputation [10], hot-deck imputation [11], K-Nearest Neighbor (KNN) imputation [12], Multiple Imputation by Chained Equation (MICE) method [13] and linear regression imputation [14] are utilized for single imputation. However, the predicted results may be biased because they neglect the variability of the missing values and require pooling of results [15]. 
When faced with data that have more influencing factors and higher dimensions, machine learning-based methods usually perform better. Gu et al. [6] proposed a multi-value missing data imputation approach using BP (back propagation) mapping of spatially adjacent points, derived from the single-value missing data completion method based on the nonlocal average method. Li et al. [8] suggested a framework for imputing missing sensor data based on a deep-stacked bidirectional long short-term memory neural network with a self-attention mechanism, which handles various missing data scenarios with different missing rates in dam structural monitoring systems with high accuracy and robustness. Mao et al. [16] proposed a deep neural network multi-view learning method for imputing successive missing values in dam structural health monitoring systems. Wan et al. [17] proposed a novel data recovery model based on Bayesian multi-task learning with a multi-dimensional Gaussian prior process for structural health monitoring. Lin et al. [18] proposed a new hybrid multiple imputation framework based on deep features and showed that it was better than an imputation method based directly on the original data. The above methods all take a single measurement point as the object of interpolation; however, they ignore the interrelationships between measurement points, which affects the accuracy and validity of the models.

In fact, the deformation sequences of all measured points in the concrete dam contain deformation information both in time and cross-sectional dimension. Due to the integrity of the dam structure, variations in adjacent monitoring points are related. As a result, deformation monitoring data have a spatio-temporal characteristic and can reflect the real-time deformation state of the dam [19,20]. If the dam deformation data are analyzed from only a single time series or cross-sectional series, it will be difficult to get a comprehensive and effective understanding of the overall deformation behavior of the dam. The continuous development of spatio-temporal data mining technologies has led to development of the panel data model [20,21]. Panel models have been utilized in domains where there is uncertainty or loss of data [22,23,24]. For instance, fuzzy logic [25] is employed in statistical debugging of stochastic simulations (i.e., Monte Carlo simulations) to quantify the extent to which a simulation with stochastics produces the intended result (i.e., passes or fails a test case). Before establishing a panel model for regional characteristics of the deformation of concrete dams, it is necessary to consider the correlations between all deformation measurement points in the concrete dams and to cluster the measurement points based on their deformation similarities [26,27]; this can effectively eliminate the interference caused by differences in the deformation laws of different measurement points. An important issue during the process of zoning the deformation measurement points is solving the problem of what type of similarity indicators to measure to determine the correlation between measurement points. The traditional indicators combine the “Absolute Quantity Euclidean Distance”, “Increment Quantity Euclidean Distance” and “Increment Speed Euclidean Distance” indicators to characterize the similarities of deformation monitoring sequences [19,28]. 
Nevertheless, when monitored values from two consecutive days in the deformation monitoring sequence show little variation or do not change at all, the usual “Increment Speed Euclidean Distance” indicator does not work because its denominator is very small or equal to zero. Another main problem is calculating the weight of each indicator to determine the Comprehensive Euclidean Distance of the measurement points. Commonly used weighting methods include objective weight [29], subjective weight and combined weight methods [30]. The objective weight does not depend on the subjective attitude of the decision-maker [31]; however, if the objective environment changes, its value changes accordingly and cannot be inherited. The subjective weight reflects strong personal judgement and can, to a certain extent, be inherited [32]. However, because of its subjective randomness, the weight assignment is sometimes inaccurate owing to the personal factors of decision-makers.

This paper proposes a novel model for imputation of missing concrete dam monitoring data based on clustering theory and the panel data model. It aims to address the shortcomings of traditional methods, which do not consider correlations between measurement points from the perspective of spatio-temporal integrity when imputing missing deformation data. First, the Increment Speed Euclidean Distance index is improved to avoid the impact of a very small or zero denominator on the indicator results while still reflecting the similarity of the relative deformation increase. Additionally, a combined weighting method is proposed that joins the transparency of the objective weighting method with the empirical character of the subjective weighting method, improving on the traditional single index weighting method so that the proposed comprehensive distance index is better founded. Based on the established similarity criteria, the k-means clustering method is used to determine the measurement point partition. Finally, a panel data model is established based on the deformation specific effect quantities of different measuring parts and used to impute missing data for the corresponding parts. This paper is divided into three parts. The first part proposes the deformation similarity criterion and clustering method for dam measurement points. The second part introduces different forms of panel data models and the basis for choosing among them. The third part tests the feasibility and effectiveness of the model. The clustering process and panel model establishment were completed using Python and Matlab code editors.

2. Deformation Similarity Criterion and the Method for Clustering Measurement Points

For large-volume concrete dams, local deformation abnormalities in a single measurement point do not represent changes in the safety of the dam structure; thus, analysis of local deformation of the dam alone is no longer sufficient to meet the requirements. Based on the panel characteristics of concrete dam deformation, if we can consider the elevation and regional differences in dam deformations from the traditional “point” analysis, we can avoid the biased judgement caused by considering local deformation in a large part. At the same time, concrete dam deformation varies greatly in different regions due to the large differences in factors influencing deformation behavior in different areas of concrete dams (such as load action, constraint conditions, material properties, environmental factors, etc.). Thus, how to model parts with similar deformation patterns and homogeneous responses to loads is associated with the robustness of the regional analysis model. Deformation partition analysis needs to deal with the following two questions: (1) what statistics should be used to characterize the degree of similarity between deformations in the measured points? And (2) what criteria are used to determine the degree of similarity between the regions? In the following section, the similarity criterion for deformation is developed based on panel characteristics of concrete dam deformation combined with spatial and temporal information on deformation. The structural deformation properties in both time and cross-sectional dimensions are then studied to establish the concrete dam deformation partitioning method.

2.1. Deformation Partitioning Criterion

Concrete dam measurement point deformation partitioning is the process of distinguishing and classifying the deformation properties of the entire dam based on the similarity (dissimilarity) of the deformation of each measurement point. Under the premise that no assumptions should be made regarding the deformation field, the deformation of each component of the concrete dam is examined and processed using mathematical methods and the appropriate classification criteria are established. On one hand, dam deformation monitoring data are compressed and retrieved, while on the other hand, the basis for further investigation is created. The goal is to make the deformation law inside the region as near to the closest degree of similarity as possible, while making the deformation law between regions as dissimilar as possible. Traditional deformation partitioning methods utilize the mean values of time series deformation of each measurement point, i.e., the deformation series degenerates into a cross-sectional series. This method can only represent the average change in dam deformation, which leads to loss of deformation information on the time dimension. Besides, the method is based on an assumption that the deformation of each measurement point varies in the same direction in the time dimension, making it difficult to reflect changes in deformation properties over time. As shown in Figure 1, if the above method of taking the mean value is adopted, measurement point 1 and measurement point 3 should be grouped in one category; however, if we consider the change in the deformation sequence over the whole time period, it is more reasonable to group measurement point 2 and measurement point 3 into one category.

Concrete dam deformation data provide information on the following three aspects: first, the absolute dam deformation values; second, the dynamic deformation time series, i.e., the increment in deformation over time; and third, the fluctuation in deformation development, i.e., the degree of variability or fluctuation. Thus, three similarity indices (absolute distance, incremental distance and growth rate distance) are merged to effectively reflect the similarity in deformation monitoring sequence on which dam deformation partitioning can be performed.

When preprocessing the deformation monitoring data, let {δ_it} and {δ_jt} denote the absolute deformation of monitoring point i and monitoring point j at time section t, in which i and j are the indices of the cross-sectional dimension (spatial units), with i, j = 1, 2, …, n, and t is the monitoring time index of the time dimension (time periods), with t = 1, 2, …, T. In total, there are n monitoring points and T monitoring days.

First, the Absolute Quantity Euclidean Distance between points i and j, denoted as d_ij (AQED), can be expressed as follows:

d_{i j} (AQED) = {[\sum_{t = 1}^{T} {(δ_{i t} - δ_{j t})}^{2}]}^{1 / 2}

(1)

where δ_it is the deformation value of measurement point i at time t; δ_jt is the deformation value of measurement point j at time t; and d_ij (AQED) characterizes the distance between measurement point i and measurement point j during the whole period T.

Second, the Increment Quantity Euclidean Distance between points i and j, denoted as d_ij (IQED), can be expressed as follows:

d_{i j} (IQED) = {[\sum_{t = 1}^{T} {(Δ δ_{i t} - Δ δ_{j t})}^{2}]}^{1 / 2}

(2)

where Δδ_it = δ_it − δ_i,t−1 and Δδ_jt = δ_jt − δ_j,t−1. Δδ_it and Δδ_jt denote the differences in the absolute amount of deformation between two adjacent periods. d_ij (IQED) characterizes the difference between the absolute quantity of indices in adjacent periods, which specifies the magnitudes of fluctuations in the data between points i and j in T time sections.
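The first two distances can be illustrated with a minimal NumPy sketch. The function names `aqed` and `iqed` and the sample series are hypothetical, and the sketch assumes that the increments are only defined for t = 2, …, T, since no value δ_i,0 is given.

```python
import numpy as np

def aqed(delta_i, delta_j):
    """Absolute Quantity Euclidean Distance, following Equation (1)."""
    delta_i = np.asarray(delta_i, dtype=float)
    delta_j = np.asarray(delta_j, dtype=float)
    return float(np.sqrt(np.sum((delta_i - delta_j) ** 2)))

def iqed(delta_i, delta_j):
    """Increment Quantity Euclidean Distance, following Equation (2).

    Assumption: increments are taken for t = 2..T (np.diff), so each
    increment series has T - 1 entries.
    """
    inc_i = np.diff(np.asarray(delta_i, dtype=float))
    inc_j = np.diff(np.asarray(delta_j, dtype=float))
    return float(np.sqrt(np.sum((inc_i - inc_j) ** 2)))

# Two hypothetical series that differ only by a constant offset of 1.
delta_1 = [1.0, 2.0, 4.0, 7.0]
delta_2 = [2.0, 3.0, 5.0, 8.0]

print(aqed(delta_1, delta_2))  # offset of 1 over 4 epochs
print(iqed(delta_1, delta_2))  # identical increments
```

In this example the two series have different absolute levels but identical increments, so the absolute distance is non-zero while the incremental distance vanishes.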

Third, the Increment Speed Euclidean Distance between points i and j, denoted as d_ij (ISED), can be expressed as follows:

d_{i j} (ISED) = {[\sum_{t = 1}^{T} {(\frac{Δ δ_{i t}}{Δ δ_{i, t - 1}} - \frac{Δ δ_{j t}}{Δ δ_{j, t - 1}})}^{2}]}^{1 / 2}

(3)

d_ij (ISED) portrays the difference in the incremental deformation trend of measurement points i and j over time. If the deformations change in the same direction over time, then the more coordinated this change is, the more similar the points are and the smaller d_ij (ISED) is; if the deformations change in opposite directions, the similarity is poor and d_ij (ISED) is larger, which is in line with the basic principle of similarity metrics. However, there are two problems associated with the traditional Increment Speed Euclidean Distance formula: (1) the denominator equals 0 when the measured deformation values at a measuring point are unchanged in two adjacent time periods, so d_ij (ISED) is undefined; (2) when the measured values at a measuring point change very little in two adjacent time periods, the denominator is close to 0 and d_ij (ISED) becomes extremely large. Both cases make the growth rate distance inaccurate. Thus, the Relative Deformation Increase Euclidean Distance between points i and j, denoted as d_ij (RDIED), is proposed. It can be expressed as follows:

d_{i j} (RDIED) = {[\sum_{t = 1}^{T} {(\frac{Δ δ_{i t} - Δ δ_{j t}}{Δ δ_{\max} - Δ δ_{\min}})}^{2}]}^{1 / 2}

(4)

where Δδ_max and Δδ_min represent the maximum value and the minimum value of Δδ_it and Δδ_jt, respectively, i.e., of the increments in the absolute amount of deformation between two adjacent periods. Δδ_max − Δδ_min indicates the amplitude of the incremental deformation at measurement points i and j. Thus, d_ij (RDIED) reflects the relative increment in deformation considering the maximum amplitude of the deformation increment sequences and can be used as the deformation similarity index instead of d_ij (ISED).
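The difference between the two indices can be seen in a small NumPy sketch. The function names and sample series are hypothetical; the sketch assumes that Δδ_max and Δδ_min are taken over the increments of both points i and j, and that a pair with zero increment amplitude is assigned a distance of zero.

```python
import numpy as np

def ised(delta_i, delta_j):
    """Traditional Increment Speed Euclidean Distance, following Equation (3)."""
    inc_i = np.diff(np.asarray(delta_i, dtype=float))
    inc_j = np.diff(np.asarray(delta_j, dtype=float))
    # Division by a zero increment yields inf (or nan); warnings are silenced
    # here only to make the failure mode visible in the result.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio_i = inc_i[1:] / inc_i[:-1]
        ratio_j = inc_j[1:] / inc_j[:-1]
    return float(np.sqrt(np.sum((ratio_i - ratio_j) ** 2)))

def rdied(delta_i, delta_j):
    """Relative Deformation Increase Euclidean Distance, following Equation (4)."""
    inc_i = np.diff(np.asarray(delta_i, dtype=float))
    inc_j = np.diff(np.asarray(delta_j, dtype=float))
    both = np.concatenate([inc_i, inc_j])
    span = both.max() - both.min()
    if span == 0.0:
        # Assumption: all increments are identical, so the distance is zero.
        return 0.0
    return float(np.sqrt(np.sum(((inc_i - inc_j) / span) ** 2)))

# Point i does not move between the second and third epochs.
delta_i = [0.0, 1.0, 1.0, 2.0]
delta_j = [0.0, 0.5, 1.5, 2.0]

print(ised(delta_i, delta_j))   # not finite because of the zero increment
print(rdied(delta_i, delta_j))  # finite
```

Because the normalising amplitude is shared by the whole pair of series, a single unchanged reading no longer makes the index blow up.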

In order to accurately describe the deformation characteristics of each measurement point, it is necessary to establish a comprehensive criterion to portray the deformation similarity. The “comprehensive distance” (Comprehensive Euclidean Distance) between measurement points i and j, abbreviated as d_ij (CED), is introduced as:

d_{i j} (CED) = ω_{1} d_{i j} (AQED) + ω_{2} d_{i j} (IQED) + ω_{3} d_{i j} (RDIED)

(5)

where ω₁ + ω₂ + ω₃ = 1. ω₁, ω₂ and ω₃ denote the weights of the three distances.
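A minimal sketch of how Equation (5) could be evaluated for every pair of measurement points is given here. The function name `pairwise_ced`, the sample matrix and the weights (0.5, 0.3, 0.2) are hypothetical placeholders and are not values from this study; the increments are assumed to be defined for t = 2, …, T.

```python
import numpy as np

def pairwise_ced(Y, weights):
    """Comprehensive Euclidean Distance matrix for an n x T deformation matrix."""
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(weights, dtype=float)
    if w.shape != (3,) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be three values that sum to 1")
    inc = np.diff(Y, axis=1)              # increments for t = 2..T
    n = Y.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d_aqed = np.linalg.norm(Y[i] - Y[j])
            d_iqed = np.linalg.norm(inc[i] - inc[j])
            both = np.concatenate([inc[i], inc[j]])
            span = both.max() - both.min()
            # Assumption: zero amplitude means identical increments.
            d_rdied = d_iqed / span if span > 0 else 0.0
            D[i, j] = D[j, i] = w[0] * d_aqed + w[1] * d_iqed + w[2] * d_rdied
    return D

# Three hypothetical measurement points observed at four epochs.
Y = [[0.0, 1.0, 1.0, 2.0],
     [0.0, 0.5, 1.5, 2.0],
     [1.0, 2.0, 2.0, 3.0]]
D = pairwise_ced(Y, weights=(0.5, 0.3, 0.2))
print(np.round(D, 3))
```

Points 1 and 3 differ only by a constant offset, so only the absolute distance contributes to their comprehensive distance.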

The Comprehensive Euclidean Distance d_ij (CED) is a weighted combination of the above three distances and the weight coefficients can be subjectively given or objectively determined based on the actual situation. In order to reflect the comprehensive information in the concrete dam’s spatial deformation data itself, a combination of the entropy weighting method and the CRiteria Importance Through Intercriteria Correlation (CRITIC) method are used to calculate the comprehensive distance weight coefficients.

2.2. The Combined Weighting Method

2.2.1. The Entropy Weight Method

The Entropy Weight Method (EWM) [33] is an objective assignment method that can determine sample attribute weights based on the degree of variation in the sample attributes. Generally speaking, the greater the degree of variation in sample attributes, the lower its information entropy, which means that the sample provides more information and has greater weight in the comprehensive evaluation. Let the evaluation value y_it = δ_it, where i and t are as described in Section 2.1. Based on the idea of information entropy and the distance index evaluation system, the original data can be represented by matrix Y = [y_it]_n×T as:

Y = {[\begin{matrix} y_{11} & y_{12} & \dots & y_{1 T} \\ y_{21} & y_{22} & \dots & y_{2 T} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ y_{n 1} & y_{n 2} & \dots & y_{n T} \end{matrix}]}_{n \times T}

(6)

The three distances can be employed in conjunction with the aforementioned deformation similarity indices. Thus, the distance between adjacent measurement points can be viewed as the evaluation object. The EWM calculation steps are as follows:

Step 1: The characteristic weight p_it can be calculated as:

p_{i t} = \frac{y_{i t}}{\sum_{t = 1}^{T} y_{i t}}, (i = 1, 2, \dots, n; t = 1, 2, \dots, T)

(7)

where p_it ∈ [0, 1].

Step 2: entropy value calculation. The entropy value S_i of the indicator is calculated as follows:

S_{i} = - \frac{1}{\ln T} \sum_{t = 1}^{T} p_{i t} \ln p_{i t}

(8)

Step 3: entropy weight determination. The entropy weight ω_i of the indicator is calculated as follows:

ω_{i} = \frac{1 - S_{i}}{\sum_{i = 1}^{n} (1 - S_{i})}

(9)
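Steps 1 to 3 can be sketched in NumPy as follows. The function name and the sample matrix are hypothetical; the sketch assumes non-negative evaluation values, treats 0 · ln 0 as 0, and normalises the entropy weights over the n rows of Y so that they sum to 1.

```python
import numpy as np

def entropy_weights(Y):
    """Entropy weights for the rows of an n x T evaluation matrix."""
    Y = np.asarray(Y, dtype=float)
    if np.any(Y < 0):
        raise ValueError("the entropy weight method expects non-negative values")
    T = Y.shape[1]
    P = Y / Y.sum(axis=1, keepdims=True)                      # Equation (7)
    safe_P = np.where(P > 0, P, 1.0)                          # avoids log(0)
    S = -(P * np.log(safe_P)).sum(axis=1) / np.log(T)         # Equation (8)
    d = 1.0 - S                                               # degree of variation
    return d / d.sum()                                        # Equation (9)

# Row 0 is perfectly uniform (entropy 1); row 1 is fully concentrated (entropy 0).
Y = [[1.0, 1.0, 1.0, 1.0],
     [4.0, 0.0, 0.0, 0.0]]
omega = entropy_weights(Y)
print(np.round(omega, 6))
```

The uniform row carries no distinguishing information and receives a weight close to zero, whereas the strongly varying row receives almost all of the weight.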

2.2.2. The Criteria Importance through Intercriteria Correlation Method

The Criteria Importance Through Intercriteria Correlation (CRITIC) method is an objective weighting method proposed by Diakoulaki [34]. The basic idea is to establish the objective weights of indicators based on the discriminative power of, and the conflict between, evaluation indicators [35]. First, the discriminative power of an indicator refers to the variation in the value of the same indicator across different samples. The greater the variability of an evaluation indicator across samples, the stronger its discriminative power, and vice versa. Second, the conflict between indicators refers to the degree of correlation between them; the correlation coefficient is usually used to measure the magnitude and direction of conflict. In the CRITIC method, the standard deviation s_i is adopted to represent the fluctuations in the internal values of each indicator.

{\bar{y}}_{i} = \frac{1}{T} \sum_{t = 1}^{T} y_{i t}

(10)

s_{i} = \sqrt{\frac{1}{T - 1} \sum_{t = 1}^{T} {(y_{i t} - {\bar{y}}_{i})}^{2}}

(11)

where {\bar{y}}_{i} is the average value of deformation data y_it. The larger the standard deviation, the greater the numerical difference in the indicator.

Correlation coefficient is used to express the correlation between indicators. The stronger the correlation with other indicators, the more the same information is reflected and, to a certain extent, the evaluation intensity of the indicators is weakened, thus the weight assigned to the indicators should be reduced. The correlation coefficient r_ij between every two indices can be expressed as follows:

r_{i j} = \frac{\sum_{t = 1}^{T} (y_{i t} - \overline{y_{i}}) (y_{j t} - \overline{y_{j}})}{\sqrt{\sum_{t = 1}^{T} {(y_{i t} - \overline{y_{i}})}^{2} \sum_{t = 1}^{T} {(y_{j t} - \overline{y_{j}})}^{2}}}

(12)

where {\bar{y}}_{i} and {\bar{y}}_{j} are the average values of the deformation data y_i and y_j. The correlation coefficient r_ij is the index used to measure the size and direction of conflict between data. Thus, the index G_i can be given as follows:

G_{i} = s_{i} \sum_{j = 1}^{n} (1 - r_{i j})

(13)

where \sum_{j = 1}^{n} (1 - r_{i j}) is the quantitative value of the contradiction between the index i and other indices. Therefore, the objective weight w_i calculated by CRITIC can be expressed as:

w_{i} = \frac{G_{i}}{\sum_{i = 1}^{n} G_{i}}, (i = 1, 2, \dots, n)

(14)
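Equations (10) to (14) can be sketched in a few lines of NumPy. The function name and sample data are hypothetical; the sketch assumes that r_ij is the standard Pearson correlation coefficient, as computed by `numpy.corrcoef`, and that each row of Y is one indicator.

```python
import numpy as np

def critic_weights(Y):
    """CRITIC weights for the rows of an n x T evaluation matrix."""
    Y = np.asarray(Y, dtype=float)
    s = Y.std(axis=1, ddof=1)            # Equation (11), sample standard deviation
    R = np.corrcoef(Y)                   # Pearson correlation between rows
    G = s * (1.0 - R).sum(axis=1)        # Equation (13)
    return G / G.sum()                   # Equation (14)

# Rows 0 and 1 are perfectly correlated; row 2 moves in the opposite direction.
Y = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0],
     [4.0, 3.0, 2.0, 1.0]]
w = critic_weights(Y)
print(np.round(w, 3))
```

Row 1 gains weight through its larger standard deviation and row 2 through its conflict with the other two rows, which illustrates the two ingredients of the method.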

2.2.3. Calculation of the Combined Weights of the Distances

The EWM is based on the degree of variability between indicators, while the CRITIC method is based on the contrast intensity of the same indicator and the conflict between indicators. Combining the weights of the two methods retains the advantages of both and yields more accurate weighting. The formula for their combination can be expressed as follows:

W_{i} = \frac{\sqrt{ω_{i} w_{i}}}{\sum_{i = 1}^{n} \sqrt{ω_{i} w_{i}}}

(15)

where ω_i is the weight calculated by the EWM; w_i is the weight calculated by the CRITIC method.
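Equation (15) is a normalised geometric mean of the two weight vectors, which the following sketch illustrates. The function name and the input weights are hypothetical values chosen for illustration only.

```python
import numpy as np

def combined_weights(omega, w):
    """Combine EWM weights (omega) and CRITIC weights (w) as in Equation (15)."""
    omega = np.asarray(omega, dtype=float)
    w = np.asarray(w, dtype=float)
    g = np.sqrt(omega * w)       # geometric mean of the two weights
    return g / g.sum()           # renormalise so the result sums to 1

omega = [0.5, 0.3, 0.2]          # hypothetical EWM weights
w = [0.2, 0.3, 0.5]              # hypothetical CRITIC weights
W = combined_weights(omega, w)
print(np.round(W, 4))
```

An indicator that is rated highly by only one of the two methods is pulled towards the middle, so the combined weights are less extreme than either input.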

The Absolute Quantity Euclidean Distance d_ij (AQED), the Increment Quantity Euclidean Distance d_ij (IQED) and the Relative Deformation Increase Euclidean Distance d_ij (RDIED) are selected as evaluation indices by combining the above deformation similarity indices. The evaluation object is the distance between all pairs of measurement points. The obtained W_i is the weight coefficient; substituting it into Equation (5) gives d_ij (CED) between different measurement points, which can be used as the deformation similarity criterion.

2.3. Clustering Method for Dam Deformation Monitoring Points

The k-means method is introduced into the clustering analysis in response to the question of which criterion to use to determine the degree of similarity between deformation regions. The k-means method is a classical unsupervised machine learning algorithm that clusters samples based on their distances to K clustering centers μ_k. Each partition is set as c_k (k = 1, 2, …, K) [36]. The algorithm is used widely due to its computational simplicity and high efficiency. Based on the similarity measurement criteria proposed above, the calculation steps are as follows [37]:

Step 1: Randomly select k sample points from the dataset y_i (i = 1, 2, …, n) as cluster centers μ_k.

Step 2: Calculate the Absolute Quantity Euclidean Distance, the Increment Quantity Euclidean Distance and the Relative Deformation Increase Euclidean Distance according to Equation (1), Equation (2) and Equation (4).

Step 3: Refer to Equation (15) and calculate the combined weight coefficients of the three distances in step 2. Substitute the weight coefficients into Equation (5) to calculate the comprehensive distance d_ij (CED) between n measurement points.

Step 4: Based on the comprehensive distance d_ij (CED) between each sample point and the cluster center μ_k, the sample point is placed into a cluster corresponding to the cluster center with the greatest similarity.

Step 5: Recalculate the cluster center μ_k of each cluster based on the existing samples in the cluster.

Step 6: Iterate step 4 and step 5 until the objective function converges, that is, the cluster center does not change. This marks the end of the clustering process. The core code for clustering is shown in Figure 2. The flow chart of the method for clustering dam deformation monitoring points is shown in Figure 3.
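The six steps can be sketched as a small k-means loop that uses the comprehensive distance instead of the plain Euclidean distance. This is an illustrative sketch and not the code shown in Figure 2: the function names, the sample data and the fixed weights (0.4, 0.3, 0.3) are hypothetical, the weights are passed in rather than computed from Equation (15), and an empty cluster simply keeps its previous center.

```python
import numpy as np

def ced(a, b, weights):
    """Comprehensive distance between two deformation series (Equation (5))."""
    inc_a, inc_b = np.diff(a), np.diff(b)
    both = np.concatenate([inc_a, inc_b])
    span = both.max() - both.min()
    d_iqed = np.linalg.norm(inc_a - inc_b)
    d_rdied = d_iqed / span if span > 0 else 0.0
    return weights[0] * np.linalg.norm(a - b) + weights[1] * d_iqed + weights[2] * d_rdied

def kmeans_ced(Y, k, weights, seed=0, max_iter=100):
    """k-means over the rows of Y using the comprehensive distance."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct rows as initial centers (fancy indexing copies).
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Steps 2-4: assign each point to the nearest center.
        dist = np.array([[ced(y, c, weights) for c in centers] for y in Y])
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # Step 6: assignments are stable
        labels = new_labels
        # Step 5: recompute each center as the mean of its members.
        for c in range(k):
            members = Y[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels, centers

# Two rising series and two falling series at a different level.
Y = [[0.0, 1.0, 2.0, 3.0],
     [0.1, 1.1, 2.1, 3.1],
     [10.0, 9.0, 8.0, 7.0],
     [10.2, 9.2, 8.2, 7.2]]
labels, centers = kmeans_ced(Y, k=2, weights=(0.4, 0.3, 0.3))
print(labels)
```

Passing the weights as an argument keeps the clustering loop independent of how the weights were obtained.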

Using the k-means clustering method, a complete deformation partitioning process is proposed based on the deformation similarity criterion and the method for determining the number of regions. The choice of the value of k is the key step when using k-means to cluster deformation measurement points on concrete dams. In this paper, the elbow method is proposed for selecting the value of k. The core index of the elbow method is the Sum of Squared Errors (SSE), which can be calculated as:

SSE = \sum_{k = 1}^{K} \sum_{y \in c_{k}} {| y - y_{μ} |}^{2}

(16)

where K is the number of clusters; c_k represents the k-th cluster; and y_μ is the average value of the monitoring deformation data y_it in cluster c_k. The basic idea of the elbow method is that as the number of clusters k increases, sample grouping becomes more refined, the degree of aggregation of each cluster gradually increases and the SSE gradually decreases. When k is less than the true number of clusters, increasing k raises the degree of aggregation of each cluster considerably, so the SSE decreases significantly. When k reaches the true number of clusters, the gain in aggregation obtained by increasing k further drops rapidly, so the decline in SSE slows sharply and then flattens as k continues to increase. The graph of SSE against k is therefore elbow-shaped, and the k value at the elbow is taken as the true number of clusters in the data. Figure 4 shows the selection process for parameter k using the elbow method.
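The SSE of a given partition can be computed with a short NumPy function. The function name, the sample points and the candidate partitions are hypothetical, and the sketch assumes that y_μ in Equation (16) is the mean of the samples in each cluster.

```python
import numpy as np

def sse(Y, labels):
    """Sum of squared errors of a partition, following Equation (16)."""
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = Y[labels == c]
        center = members.mean(axis=0)          # assumed meaning of y_mu
        total += np.sum((members - center) ** 2)
    return float(total)

# Four hypothetical samples forming two obvious groups.
Y = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
sse_by_k = {
    1: sse(Y, [0, 0, 0, 0]),
    2: sse(Y, [0, 0, 1, 1]),
    3: sse(Y, [0, 1, 2, 2]),
}
print(sse_by_k)
```

The SSE falls steeply from k = 1 to k = 2 and only marginally afterwards, which is the elbow pattern described above.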

3. Deformation Varied Intercept Panel Model

Parts with similar deformation patterns can be distinguished based on the deformation partitioning method obtained in the previous study. In order to process missing concrete dam deformation monitoring data, an analytical model, which characterizes the deformation law of concrete dams, needs to be established. The overall deformation behavior of the concrete dam is the result of the synergistic effect of deformation in each section of the dam. The most important factors affecting the deformation of concrete dams at any point are water pressure (H), temperature (T) and aging (θ), all of which can be characterized using polynomials [38]. In practical engineering, as shown in Figure 5, deformation monitoring points A, B, C and D at different locations on the dam under external loads can be mostly explained by common influencing factors (water pressure, temperature and aging). However, there is a significant difference in the deformation law between measuring points C and D near the dam foundation and bank slope and measuring points A and B near the dam crest, which is associated with the synergistic effect of complex factors such as constraint conditions, material properties and surrounding environments in different parts of the dam, resulting in different specific deformation effects α_i in each part. There will be heterogeneities in the parameters of the deformation analysis model if these complex factors that cannot be monitored and quantified are not captured by explanatory variables. Although the independent variables in traditional deformation analysis models can describe the most important factors that affect deformation results, they often ignore the deformation specificity of different measurement points caused by these unmonitored factors and it is necessary to consider the amount of deformation that distinguishes heterogeneities in measurement points. 
Therefore, based on the panel data theory, the specific deformation effect quantity α_i characterizing the measurement points is introduced into the model to establish a concrete subarea variable intercept panel model.

We constructed an analytical model of the changing law of dam deformation based on the deformation varied intercept panel model. The general formula of the varied intercept panel model can be expressed as:

y_{i t} = X_{i t} β + α_{i} + u_{i t}

(17)

where y_it is the deformation monitoring value of measurement point i at time t; X_it is the 1 × M vector of independent variables; and M is the number of independent variables. The most important factors influencing deformation are selected in combination with the mechanical characteristics of the concrete dam. Thus, X_it = [1, H_t^1, H_t^2, H_t^3, H_t^4, T_1,t, T_2,t, …, T_m,t, θ_t, ln θ_t], where H_t^1, H_t^2, H_t^3 and H_t^4 are the factors influencing the water pressure components of the deformation monitoring data; T_1,t, T_2,t, …, T_m,t are the factors influencing the temperature components of the deformation monitoring data; θ_t and ln θ_t are the factors influencing the ageing components of the deformation monitoring data; and α_i is a constant that represents the specific effect quantity of deformation produced by different parts of the dam under their unique influencing factors. It is difficult to explicitly include these influencing factors (such as dam body shape, constraint conditions, material properties, load effects, etc.) as independent variables in the model. Therefore, α_i is adopted to represent the specific individual effects produced by these influences. β = [a₀, a₁, a₂, a₃, a₄, b₁, …, b_m, c₁, c₂]^T is the matching M × 1 vector of parameters to be estimated. u_it is a random error component with mean 0 and variance σ², and the errors are independent and identically distributed.
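The regressor vector can be assembled into a T × M design matrix as sketched here. The function name and all sample values are hypothetical; in particular, the annual and semi-annual sine and cosine terms used as temperature factors and the scaling θ_t = t / 100 are assumptions made only to obtain a runnable example.

```python
import numpy as np

def design_matrix(H, temp, theta):
    """Stack [1, H, H^2, H^3, H^4, T_1..T_m, theta, ln(theta)] column-wise."""
    H = np.asarray(H, dtype=float)            # water pressure variable, shape (T,)
    temp = np.asarray(temp, dtype=float)      # temperature factors, shape (T, m)
    theta = np.asarray(theta, dtype=float)    # ageing variable, shape (T,), > 0
    return np.column_stack([
        np.ones_like(H), H, H**2, H**3, H**4,  # constant and water pressure terms
        temp,                                  # m temperature columns
        theta, np.log(theta),                  # ageing terms
    ])

t = np.arange(1, 4)                            # three hypothetical monitoring days
H = np.array([1.0, 2.0, 3.0])
temp = np.column_stack([                       # assumed harmonic temperature terms
    np.sin(2 * np.pi * t / 365), np.cos(2 * np.pi * t / 365),
    np.sin(4 * np.pi * t / 365), np.cos(4 * np.pi * t / 365),
])
theta = t / 100.0                              # assumed scaling of the ageing variable
X = design_matrix(H, temp, theta)
print(X.shape)
```

With m = 4 temperature factors the matrix has M = 1 + 4 + 4 + 2 = 11 columns, matching the length of β.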

If parameters α_i and β in the panel do not vary with measurement point i and time t, i.e., assuming that there is no significant difference in individual specific effect quantities between different measurement points, then the panel model is equivalent to the traditional regression model. However, in practical engineering, linear regression models are extremely special cases. When unknown individual specific effects exist between monitoring sites but the model assumes that the effects of the omitted influences do not differ significantly, specification errors can easily occur. Because the deformation laws in different parts of the concrete dam differ considerably, a panel model with idiosyncratic effects that vary across measurement points is more consistent with the actual working conditions. For this reason, this study investigated deformation models based on fixed effects and random effects panel models.

3.1. Fixed Effects Panel Model

The deformation monitoring fixed effects panel model can be expressed in matrix form as:

y_{i t} = X_{i t} β + γ_{i} + u_{i t}

(18)

where y_it is the deformation monitoring value of measurement point i at time t, and X_it is the 1 × M vector of independent variables, in which T_1,t, T_2,t, …, T_m,t are the factors influencing the temperature component of the deformation monitoring data. Generally, m is equal to 4. The water pressure factors are usually expressed as H_{t}^{i} = H^{i}, i = 1, 2, 3, 4. The temperature factors are usually set as a harmonic function of time t, i.e.,

f (t) = \sum_{i = 1}^{2} (b_{1 i} \sin \frac{2 π i t}{365} + b_{2 i} \cos \frac{2 π i t}{365})

The parameter β does not change with time. γ_i is the fixed effect unique to each deformation monitoring point; it depicts the inherent specific effect of different parts of the dam and reflects the difference in deformation between monitoring points.
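As an illustration of how the regressor vector X_it described above might be assembled, the following minimal Python sketch builds one row per observation. It is not taken from the paper: the function name `build_regressors` is hypothetical, H is assumed to be the reservoir water depth, θ is assumed to be a positive ageing variable supplied by the caller, and the ordering of the four harmonic terms as T_1,t to T_4,t is an assumption.

```python
import numpy as np

def build_regressors(H, t, theta):
    """Return rows X_it = [1, H, H^2, H^3, H^4, T_1..T_4, theta, ln(theta)].

    H     : water depth per observation (assumed meaning of H)
    t     : day index used in the annual harmonic terms
    theta : positive ageing variable (its definition is assumed to be given elsewhere)
    """
    H = np.asarray(H, dtype=float)
    t = np.asarray(t, dtype=float)
    theta = np.asarray(theta, dtype=float)

    # Water pressure factors: first to fourth power of H.
    water = [H ** p for p in (1, 2, 3, 4)]

    # Temperature factors: annual and semi-annual sine/cosine pairs (m = 4).
    temperature = []
    for i in (1, 2):
        temperature.append(np.sin(2 * np.pi * i * t / 365))
        temperature.append(np.cos(2 * np.pi * i * t / 365))

    # Ageing factors: theta and its natural logarithm.
    ageing = [theta, np.log(theta)]

    return np.column_stack([np.ones_like(H)] + water + temperature + ageing)
```

With these choices M = 11, which matches the length of β = [a₀, a₁, …, a₄, b₁, …, b₄, c₁, c₂]^T when m = 4.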

3.2. Random Effects Panel Model

Concrete dams are affected by various random factors during operation. If the quantity of the idiosyncratic effects at different measurement points is considered as a random variable, then the actual state of dam deformation is described by the random effects panel model. Thus, the parameters of the model primarily focus on reflecting the main components of the deformation monitoring values. The random effects parts reflect the idiosyncratic components of deformation at different measurement points.

The matrix form of the random effects panel model for deformation monitoring can be expressed as follows:

y_{i t} = X_{i t} β + α_{i} + ε_{i t}

(19)

where y_it and X_it are as described in Section 3.1; ε_it satisfies the condition E (ε_it|x_i₁, …, x_iT) = 0 and ε_{i t} \overset{i i d}{~} (0, σ_{ε}^{2}); and α_i is the random effect of deformation at different measurement points. For all measurement points i, j and all times t, the conditions E (α_i|x_i₁, …, x_iT) = 0, E (α_i²) = σ_{α}^{2}, E (α_iα_j) = 0 for i ≠ j, and E (ε_itα_j) = 0 are satisfied. α_i characterizes the idiosyncratic effect of external complex factors on the deformation of different parts of the dam. The idiosyncratic effect of each monitoring point is a random variable. Additionally, the idiosyncratic effects of the overall dam deformation conform to a normal distribution. The differential characteristics of the deformation of different parts of the dam can be further reflected in the distribution of α_i.

In the random effects regression model, ordinary least squares (OLS) estimation ignores the correlation that the shared α_i induces between observations from the same measurement point, which makes the estimates inefficient. Generalized least squares (GLS) estimation [39] can be used to solve this problem and to obtain efficient estimates of the parameters.

The above analysis shows that the factors affecting the deformation value y_it of concrete dams are expressed in two parts: the independent variables represent the common influencing factors (water pressure, temperature, ageing, etc.) of deformation at all measurement points, and the idiosyncratic effect quantities α₁, …, α_n reflect the variability of deformation at different measurement points. The idiosyncratic effect quantities are treated in one of two ways, as fixed or as random, and the corresponding panel models are the fixed effects model and the random effects model. In actual engineering work, a panel model suitable for describing the deformation characteristics of concrete dams needs to be selected, i.e., the type of model needs to be chosen. The choice between a fixed effects model and a random effects model can be made by testing the deformation monitoring sequences.
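The role of GLS in Equation (19) can be made concrete with the standard error-components algebra, sketched below. This is textbook random effects material rather than a derivation given in this paper; it assumes a balanced panel with T observations per measurement point, and the symbols v_it and λ are introduced here purely for illustration.

```latex
% Composite error and its within-point covariance structure
v_{it} = \alpha_i + \varepsilon_{it}, \qquad
\operatorname{Var}(v_{it}) = \sigma_{\alpha}^{2} + \sigma_{\varepsilon}^{2}, \qquad
\operatorname{Cov}(v_{it}, v_{is}) = \sigma_{\alpha}^{2} \quad (t \neq s)

% GLS is equivalent to OLS on quasi-demeaned data
y_{it} - \lambda \bar{y}_{i} = (X_{it} - \lambda \bar{X}_{i})\,\beta + (v_{it} - \lambda \bar{v}_{i}), \qquad
\lambda = 1 - \sqrt{\frac{\sigma_{\varepsilon}^{2}}{\sigma_{\varepsilon}^{2} + T\,\sigma_{\alpha}^{2}}}
```

Because every observation of point i shares the same α_i, the composite errors are correlated within a point. When λ approaches 1 the transformation coincides with the fixed effects (within) estimator, and when λ = 0 it reduces to pooled OLS.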

3.3. Model Type Selection

The deformation monitoring sequence of a concrete dam can be modeled with the partition panel model according to the method proposed above. However, the form of the dummy variable that expresses the difference between measurement points, that is, whether the specific effect quantity is a fixed or a random variable, needs to be determined with a specification test of the panel model. It is usual to fit both models and compare whether the results are consistent. If the results are consistent and show little or no heterogeneity, then the fixed effects model is selected; otherwise, the random effects model is selected, and it can then be used to find the source of heterogeneity and to draw a conservative conclusion. In general, the selection can be based on the Hausman test [40], and the results of the model can be examined from two aspects: the overall validity of the model and the importance of each variable.
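For readers unfamiliar with the Hausman test, the statistic in its usual textbook form can be sketched as follows. The function name `hausman_statistic` and its arguments are hypothetical, and the coefficient vectors and covariance matrices are assumed to come from previously fitted fixed effects and random effects models restricted to the same time-varying regressors.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, cov_fe, cov_re):
    """Return (statistic, degrees of freedom) of the Hausman test.

    statistic = (b_fe - b_re)' [cov_fe - cov_re]^(-1) (b_fe - b_re)
    """
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    cov_diff = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    # A pseudo-inverse is used because the covariance difference can be
    # close to singular in finite samples.
    stat = float(diff @ np.linalg.pinv(cov_diff) @ diff)
    dof = diff.size  # number of coefficients being compared
    return stat, dof
```

The statistic is conventionally compared with a chi-square distribution with `dof` degrees of freedom; a small value means that the two sets of estimates agree closely, while a large value is read as evidence against the random effects assumptions.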

4. Steps for Processing Missing Concrete Dam Deformation Monitoring Data

In actual engineering, there are generally two cases of missing data in concrete dam deformation monitoring, as shown in Figure 6: one is that data for one measurement point are missing in a certain time period—this is called single-value missing data—and the other is that data for all measurement points are missing in a certain time period—this is called multiple-value missing data. In both cases, the missing data can be imputed using the methods proposed in this paper. The imputation steps are as follows:

Step 1: The deformation monitoring values of all measured points on the concrete dam are clustered based on the monitoring value of the time period with complete monitoring data (the part before or after the missing data) so that the parts with similar deformation laws and homogenous responses to the load are classified into categories. The traditional point analysis method is replaced by the regional analysis method, which avoids, to a great extent, the bias caused by only considering local deformation of the dam body.

Step 2: The panel data model is used to model all the data in the period, except for missing measurement point data from the same area. After obtaining the panel data model expression with respect to the data, fitted values for missing data for each measuring point are obtained.

Step 3: To validate the model, the values calculated by the model at the missing data points are compared with the measured values from the complete monitoring data. The imputation steps are summarized in the flow chart in Figure 7.
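Step 2 can be sketched in Python for the fixed effects case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `impute_fixed_effects` is hypothetical, the points passed in are assumed to belong to one cluster from Step 1, the regressors are assumed to be shared by all points in the cluster and to exclude the constant column (it is absorbed by γ_i), and the within (demeaning) estimator is used as one standard way to fit Equation (18).

```python
import numpy as np

def impute_fixed_effects(y, X):
    """Fill NaN entries of y with fitted values of a fixed effects panel model.

    y : (n_points, n_times) deformation values, NaN where data are missing
    X : (n_times, M) regressors shared by all points, without a constant column
    Returns (filled y, estimated beta).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(y)

    # Within transformation: subtract each point's own means over its observed times.
    y_dm, X_dm = [], []
    for i in range(y.shape[0]):
        m = observed[i]
        y_dm.append(y[i, m] - y[i, m].mean())
        X_dm.append(X[m] - X[m].mean(axis=0))

    # Pooled least squares on the demeaned data gives the common slope vector beta.
    beta, *_ = np.linalg.lstsq(np.vstack(X_dm), np.concatenate(y_dm), rcond=None)

    filled = y.copy()
    for i in range(y.shape[0]):
        m = observed[i]
        # Point-specific fixed effect recovered from the observed means.
        gamma_i = y[i, m].mean() - X[m].mean(axis=0) @ beta
        filled[i, ~m] = X[~m] @ beta + gamma_i
    return filled, beta
```

The same function covers both missing-data cases in Figure 6: a gap at a single point simply removes that point's rows for the affected times, while a gap at all points removes those times from the fit altogether.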

5. Case Study

Jinping-Ⅰ dam, the highest concrete hyperbolic arch dam in the world, is taken as an example to verify the feasibility of the novel missing data imputation model. The top elevation of the dam is 1885.0 m, the lowest base elevation is 1580.0 m and the maximum dam height is 305.0 m. It is located on the main stream of the Yalong River in Liangshan Yi Autonomous Prefecture, Sichuan Province. The construction project started in 2005 and the pouring of the first concrete for the dam body commenced on 23 October 2009. The concrete reached the dam crest elevation on 23 December 2013, and the reservoir was first impounded to the normal pool level of 1880 m on 24 August 2014. Several horizontal plumbs are installed in the dam to fully monitor the deformation of the structure. The exact distribution of the deformation measurement points is shown in Figure 8.

6. Results and Discussion

6.1. Clustering Results of the Measurement Points

The clustering is based on the Comprehensive Euclidean Distance d_ij (CED) between the deformations of the measurement points. The deformation partitioning was calculated for 31 plumb line measurement points based on the changing characteristics of the deformation from 17 December 2016 to 17 December 2018. First, the weight of each distance indicator was calculated. Based on the combined weights method described in Section 2.2.3, the weights of the Absolute Quantity Euclidean Distance d_ij (AQED), the Increment Quantity Euclidean Distance d_ij (IQED) and the Relative Deformation Increase Euclidean Distance d_ij (RDIED) were calculated as 0.2949, 0.2585 and 0.4466, respectively. The elbow method was then used to determine the number of clusters, k, which was found to be 6, as shown in Figure 9. Based on the k-means clustering classification process in Section 2.3, the comprehensive distance d_ij (CED) between each measuring point and the cluster center was calculated, and the final classification result was determined as shown in Figure 10 and Table 1.
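The way the three weights enter the comprehensive distance can be illustrated with a short sketch. The exact definition of CED is given in Section 2 and is not repeated here, so the weighted-sum form below is an assumption, as is the premise that the three input distance matrices are already on comparable scales; the function name `comprehensive_distance` is hypothetical.

```python
import numpy as np

# Combined weights reported above for AQED, IQED and RDIED.
W_AQED, W_IQED, W_RDIED = 0.2949, 0.2585, 0.4466

def comprehensive_distance(d_aqed, d_iqed, d_rdied):
    """Weighted sum of three (n_points x n_points) distance matrices (assumed form)."""
    d_aqed, d_iqed, d_rdied = (
        np.asarray(d, dtype=float) for d in (d_aqed, d_iqed, d_rdied)
    )
    return W_AQED * d_aqed + W_IQED * d_iqed + W_RDIED * d_rdied
```

Since the three weights sum to 1, the combined distance stays on the same scale as its inputs, and RDIED contributes most to the result.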

6.2. Imputation of Missing Data

6.2.1. Imputation of Single-Value Missing Data

Taking measuring point PL13-4 in Cluster 5 as an example, some of the data were randomly deleted, including data scattered over different intervals as well as entire segments of data corresponding to a specific period, to construct a dataset with missing data. The resulting data values are plotted in Figure 11. Based on the deformation zoning results, before establishing the concrete dam deformation zoning panel model, a fixed effects panel model and a random effects panel model for the deformation monitoring sequences were established for all measurement points in the area where measuring point PL13-4 is located, to determine the appropriate regression model. A comparison of the fixed effects panel data model and the random effects panel data model is shown in Figure 12, which indicates that the results of the two models were similar. During regression analysis, the fixed effects model and the random effects model were subjected to the F-test and both models gave a p value of 0.0000, indicating that both models were significant. As stated in Section 3.3, when the results of the fixed effects model and the random effects model are consistent, the fixed effects model is employed. Therefore, the fixed effects panel data model was selected as the reliable model.

In the area where the PL13-4 measurement point is located, a fixed effects panel data model was adopted to fit and analyze the deformation monitoring effect quantity at the measurement point for the two time periods before and after the missing data, and the missing data values were calculated. To validate the imputation method for missing data, the proposed method was compared with traditional methods such as the linear interpolation method and the OLS interpolation method.

The deformation monitoring effect quantities of each measurement point in the two time periods (before and after the missing data) were modeled and analyzed using the panel data random effects model. The method proposed in this paper was used to fill in the missing data, assuming that there are several discontinuous missing data collected from measurement point PL13-3 during the study period. The results of the panel data model are presented in Table 2. The imputation results are shown in Table 3 and Figure 13. As can be seen in Table 3, compared with the traditional methods, imputing multiple discontinuous missing values at a single measurement point using the panel data model proposed in this paper gave results that are closer to the true values.
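For reference, the linear interpolation baseline used in the comparison can be sketched in a few lines. This is an illustrative version under assumptions rather than the authors' code: the function name `linear_interpolate` is hypothetical, the time stamps are assumed to be sorted in increasing order, and `np.interp` holds the nearest observed value constant outside the observed range.

```python
import numpy as np

def linear_interpolate(t, y):
    """Fill NaN entries of y by linear interpolation over time t (t must be increasing)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    filled = y.copy()
    # Interpolate only at the missing times, using the observed samples as knots.
    filled[~observed] = np.interp(t[~observed], t[observed], y[observed])
    return filled
```

Because this baseline uses only the two observations that bracket a gap, it cannot reproduce load-driven changes inside a long gap, which is consistent with the larger errors of the linear column in Table 3.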

Traditional missing data imputation models usually analyze one-dimensional time series data for a single measurement point, making it difficult to account for multiple factors impacting the accuracy of the imputation of missing data. The panel data model takes the impact of both common and idiosyncratic factors on the deformation into account, effectively improving the explanatory ability and estimation accuracy of the model. The shortcomings of the traditional analytical models can therefore be compensated.

6.2.2. Imputation of Multiple-Value Missing Data

To verify the generalizability of the proposed imputation model, we took the measurement points in another region as a second example and assumed that all measured values (PL11-5, PL13-5 and PL16-5) in Cluster 6 from 13 July 2017 to 13 December 2017 were missing, as plotted in Figure 14. The data before and after the missing data were used to build the panel data imputation model. Comparison of the fixed effects panel data model and the random effects panel data model for Cluster 6 is shown in Figure 15. The fixed effects model and the random effects model were established and gave the same result. The parameters of the panel data model are given in Table 4. To verify that zoning the measurement points can improve the accuracy of imputation of deformation data, a global panel data model, a mean imputation model and a linear regression imputation model were compared. The imputation results of measurement points in Cluster 6 are shown in Figure 16, Figure 17 and Figure 18. At the same time, the average values of the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) of the three measurement points were calculated to compare accuracy. The formula for calculating RMSE and MAE values for each measurement point can be expressed as:

\begin{array}{l} RMSE = \sqrt{\frac{\sum_{t = 1}^{T} {(y_{i t} - y_{e s t, t})}^{2}}{T}} \\ MAE = \frac{\sum_{t = 1}^{T} | y_{i t} - y_{e s t, t} |}{T} \end{array}

(20)

where y_it represents the true deformation value of measurement point i at time t, y_est,t represents the estimate of y_it calculated using the imputation model being evaluated, and T is the number of imputed values. The mean RMSE and MAE values are displayed in Table 5.
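Equation (20) translates directly into code. The sketch below uses NumPy, and the function names `rmse` and `mae` are chosen here for illustration.

```python
import numpy as np

def rmse(y_true, y_est):
    """Root mean square error between true and imputed values, as in Equation (20)."""
    y_true = np.asarray(y_true, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_est) ** 2)))

def mae(y_true, y_est):
    """Mean absolute error between true and imputed values, as in Equation (20)."""
    y_true = np.asarray(y_true, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    return float(np.mean(np.abs(y_true - y_est)))
```

Averaging the per-point values over the measurement points of a cluster gives mean RMSE and MAE values of the kind compared between methods.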

From the comparison of the imputation results of each model, it can be concluded that the panel data model proposed in this paper imputes missing data relatively well, and its mean RMSE and MAE values are the smallest of the four methods in Table 5. The panel data model replaces the traditional point analysis method with a regional analysis, which largely avoids the bias caused by considering only local deformation of the concrete dam. The random effects panel model for imputation of missing data has high flexibility and sensitivity that can overcome interference from random disturbances, abnormal fluctuations and covariance effects. Thus, it can more objectively portray the basic law of dam deformation.

7. Model Applicability Verification

To further verify the applicability of the model proposed in this paper, Xiaowan dam was taken as an example; measuring points in the dam body were clustered and the panel data model was used to impute missing data. The clusters of the measurement points on Xiaowan dam are shown in Figure 19.

Taking the measurement points in Cluster 2 as an example, and assuming that the data for these measurement points from 30 May 2014 to 30 June 2014 were missing, a fixed effects panel data model and a random effects panel data model were established for the data collected before and after the missing data period. The model fitting results are shown in Figure 20. The regression coefficients of the panel data model for the points in Cluster 2 are shown in Table 6. Similar to the analysis in Section 6, the modeling results of the fixed effects panel model and the random effects panel model were the same, and the p values of the F-test were both 0.0000. The results of imputation of missing data for measurement points in Cluster 2 are shown in Figure 21, Figure 22 and Figure 23. A comparison of the calculated RMSE and MAE values of the verification model is shown in Table 7. This analysis verifies the accuracy and applicability of the model.

8. Conclusions

In this paper, a novel missing data imputation model for concrete dam deformation monitoring was proposed. The study can be summarized as follows:

(1): An improved deformation similarity index called Relative Deformation Increase Euclidean Distance is proposed for solving the problems associated with the traditional Increment Speed Euclidean Distance deformation similarity index. Like the traditional indicator, the improved indicator reflects the relative growth of deformation. However, the improved indicator avoids situations where the denominator is extremely small or equal to 0, thus expanding the scope of its application.
(2): A combined weighting method that combines the advantages of two objective weighting methods is proposed for improving the construction of the traditional deformation similarity index and applied to the clustering of deformation monitoring points. This method can improve the accuracy of clustering of deformation monitoring points and help to comprehensively describe the deformation characteristics of each area of the dam.
(3): Considering the correlation of the deformation monitoring effect quantities between measurement points in the same cluster and the influence of non-load factors, a method for imputing missing concrete dam deformation monitoring data based on panel data theory is proposed to make up for the deficiencies of traditional methods. The validity and generality of the method are verified using the Jinping-Ⅰ dam and Xiaowan dam engineering examples.
(4): The data series used for monitoring dam deformation is a type of time series data, which is also widely used in other disciplines such as economics, meteorology and the power industry. In these fields, missingness inevitably exists in time series data. Therefore, the method proposed in this article can also be applied to other fields. Meanwhile, the proposed method considers two dimensions of missing time series data, comprehensively reflecting the correlation between data and the environment. It provides another direction for processing missing time series data. However, as mentioned, there are various types of missing data. This paper only discusses the single-value and multiple-value missing data that are common in dam monitoring and demonstrates the applicability of the model to them. The applicability of the model to other missing data types needs further verification.

Author Contributions

Conceptualization, X.C. and H.G.; Methodology, X.C.; Software, W.C.; Data Curation, C.G.; Writing—Original Draft Preparation, X.C.; Writing—Review & Editing, X.C., J.W., H.G. and W.C.; Supervision, H.G. and C.G.; Funding Acquisition, H.G. and C.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant Nos. U2243223, 52239009, 52209159, 52079046, 52079049 and 52179128), the Basic Scientific Research Funding of State Key Laboratory (Grant No. 522012272), the Water Conservancy Science and Technology Project of Jiangsu (Grant No. 2022024), and Jiangsu Young Science and Technological Talents Support Project (Grant No. TJ-2022-076).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to project confidentiality.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Huang, C.Z.; Gu, C.S.; He, J. A novel method for processing missing data of concrete dam deformation. Adv. Sci. Technol. Water Resour. 2022, 42, 89–94.
2. Fu, X.; Gu, C.S.; Qin, D. Deformation features of a super-high arch dam structural system. Optik 2017, 130, 681–695.
3. Wei, W.; Gu, C.S.; Fu, X. Processing Method of Missing Data in Dam Safety Monitoring. Math. Probl. Eng. 2021, 2021, 12.
4. Qin, Q.F.; Bai, X.F.; Sun, J.N.; Li, F.; Zhu, J.J.; Mei, Y.; Yang, X.J. Overview and prospect of dam deformation monitoring technology. In Proceedings of the Conference on AOPC—Optical Sensing and Imaging Technology, Beijing, China, 20–22 June 2021.
5. Liu, Y.T.; Zheng, D.J.; Georgakis, C.; Kabel, T.; Cao, E.H.; Wu, X.; Ma, J.J. Deformation Analysis of an Ultra-High Arch Dam under Different Water Level Conditions Based on Optimized Dynamic Panel Clustering. Appl. Sci. 2022, 12, 481.
6. Gu, H.; Wang, T.F.; Zhu, Y.T.; Wang, C.; Yang, D.S.; Huang, L.X. A Completion Method for Missing Concrete Dam Deformation Monitoring Data Pieces. Appl. Sci. 2021, 11, 463.
7. Ge, W.; Wang, X.W.; Li, Z.K.; Zhang, H.X.; Guo, X.Y.; Wang, T.; Gao, W.X.; Lin, C.N.; van Gelder, P. Interval Analysis of the Loss of Life Caused by Dam Failure. J. Water Resour. Plan. Manag. 2021, 147, 7.
8. Li, Y.T.; Bao, T.F.; Chen, H.; Zhang, K.; Shu, X.S.; Chen, Z.X.; Hu, Y.H. A large-scale sensor missing data imputation framework for dams using deep learning and transfer learning strategy. Measurement 2021, 178, 14.
9. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley & Sons: Hoboken, NJ, USA, 2019; Volume 793.
10. Norazian, M.N.; Shukri, Y.A.; Azam, R.N.; Al Bakri, A.M.M. Estimation of missing values in air pollution data using single imputation techniques. Scienceasia 2008, 34, 341–345.
11. Andridge, R.R.; Little, R.J.A. A Review of Hot Deck Imputation for Survey Non-response. Int. Stat. Rev. 2010, 78, 40–64.
12. Al-Helali, B.; Chen, Q.; Xue, B.; Zhang, M.J. A Hybrid GP-KNN Imputation for Symbolic Regression with Missing Values. In Proceedings of the 31st Australasian Joint Conference on Artificial Intelligence (AI), Victoria Univ Wellington, Wellington, New Zealand, 11–14 December 2018; pp. 345–357.
13. Zhang, Z.H. Multiple imputation with multivariate imputation by chained equation (MICE) package. Ann. Transl. Med. 2016, 4, 5.
14. Dai, M.; Jin, Y.; Zha, Q.; Liu, Y. Binary Logistic Regression Imputation and Application. Math. Pract. Theory 2013, 43, 162–167.
15. Bertsimas, D.; Pawlowski, C.; Zhuo, Y.D. From Predictive Methods to Missing Data Imputation: An Optimization Approach. J. Mach. Learn. Res. 2018, 18, 39.
16. Mao, Y.C.; Zhang, J.H.; Qi, H.; Wang, L.B. DNN-MVL: DNN-Multi-View-Learning-Based Recover Block Missing Data in a Dam Safety Monitoring System. Sensors 2019, 19, 2895.
17. Wan, H.P.; Ni, Y.Q. Bayesian multi-task learning methodology for reconstruction of structural health monitoring data. Struct. Health Monit. 2019, 18, 1282–1309.
18. Lin, J.; Li, N.H.; Alam, M.A.; Ma, Y.Q. Data-driven missing data imputation in cluster monitoring system based on deep neural network. Appl. Intell. 2020, 50, 860–877.
19. Chen, B.; Hu, T.Y.; Huang, Z.S.; Fang, C.H. A spatio-temporal clustering and diagnosis method for concrete arch dams using deformation monitoring data. Struct. Health Monit. 2019, 18, 1355–1371.
20. Wang, J.Y.; Gu, H.; Chen, B.; Gu, C.S.; Zhang, Q.N.; Xing, Z.K. A Spatio-Temporal Dam Deformation Zoning Method Considering Non-Uniform Distribution of Monitoring Information. IEEE Access 2021, 9, 117615–117628.
21. Shao, C.F.; Gu, C.S.; Yang, M.; Xu, Y.X.; Su, H.Z. A novel model of dam displacement based on panel data. Struct. Control Health Monit. 2018, 25, 2037.
22. Gore, R.; Reynolds, P.F.; Kamensky, D.; Diallo, S.; Padilla, J. Statistical Debugging for Simulations. ACM Trans. Model. Comput. Simul. 2015, 25, 2699722.
23. Kamensky, D.; Gore, R.; Reynolds, P.F. Applying enhanced fault localization technology to Monte Carlo simulations. In Proceedings of the Winter Simulation Conference (WSC)/Conference on Modeling and Analysis for Semiconductor Manufacturing (MASM), Phoenix, AZ, USA, 11–14 December 2011; pp. 2798–2809.
24. Hao, P.; Zheng, Z.; Gao, Y.; Zhang, Z. Statistical Fault Localization in Decision Support System Based on Probability Distribution Criterion. In Proceedings of the Joint World Congress of the International-Fuzzy-Systems-Association (IFSA)/Annual Meeting of the North-American-Fuzzy-Information-Processing-Society (NAFIPS), Edmonton, AB, Canada, 24–28 June 2013; pp. 878–883.
25. Liu, L.; Lian, M.J.; Lu, C.W.; Zhang, S.; Liu, R.M.; Xiong, N.N. TCSA: A Traffic Congestion Situation Assessment Scheme Based on Multi-Index Fuzzy Comprehensive Evaluation in 5G-IoV. Electronics 2022, 11, 1032.
26. Cao, W.H.; Wen, Z.P.; Su, H.Z. Spatiotemporal clustering analysis and zonal prediction model for deformation behavior of super-high arch dams. Expert Syst. Appl. 2023, 216, 119439.
27. Li, J.J.; Chen, X.D.; Gu, C.S.; Huo, Z.Y. Seepage Comprehensive Evaluation of Concrete Dam Based on Grey Cluster Analysis. Water 2019, 11, 1499.
28. Shi, Z.W.; Gu, C.S.; Qin, D. Variable-intercept panel model for deformation zoning of a super-high arch dam. SpringerPlus 2016, 5, 898.
29. Wang, Z.; Wang, J.; Shi, L.; Diao, R.; Chen, D. Node importance ranking of multi-attribute social networks based on objective weight determining method. Appl. Res. Comput. 2016, 33, 2933–2936.
30. Jian, W.U.; Changyong, L.; Wennian, L.I. Method to determine attribute weights based on subjective and objective integrated. Syst. Eng. Electron. 2007, 29, 383–387.
31. Zhu, B.; Jin, W.; Li, L.; Zhao, J.; Chen, Z.; Zhang, Y.; Li, W. Evaluation of Brake Pedal Feeling Based on Subjective and Objective Comprehensive Weighting Method. Automot. Eng. 2021, 43, 697–704.
32. Shi, D.; Hu, B.; Liu, Y.; Chen, J. An Improved Weighting Method of AHM-RS Radar Equipment Supportability Evaluation Index. Fire Control Command Control 2020, 45, 170–176.
33. Chen, C.H. A Novel Multi-Criteria Decision-Making Model for Building Material Supplier Selection Based on Entropy-AHP Weighted TOPSIS. Entropy 2020, 22, 259.
34. Diakoulaki, D.; Mavrotas, G.; Papayannakis, L. Determining objective weights in multiple criteria problems: The critic method. Comput. Oper. Res. 1995, 22, 763.
35. Hou, J.M.; Xu, Z.H.; Yu, W.J.; Ding, S.Y. Hybrid Load-Following Operation Strategy for Building Triple-Feed System Considering Energy Storage Characteristics. Electr. Power Constr. 2022, 43, 50–62.
36. Chen, M.; Yin, C.J.; Xi, Y.P. A new clustering algorithm Partition K-means. In Proceedings of the International Conference on Advanced Materials and Computer Science, Chengdu, China, 1–2 May 2011; p. 577.
37. Zhang, D.; Li, M.; Xu, D.; Zhang, Z. A survey on theory and algorithms for k-means problems. Sci. Sin. Math. 2020, 50, 1387–1404.
38. Gu, C.S.; Zhao, E.F. Theory and Method of Dam Safety Monitoring; Hohai University Press: Nanjing, China, 2019.
39. Kantar, Y.M. Generalized least squares and weighted least squares estimation methods for distributional parameters. Revstat-Stat. J. 2015, 13, 263.
40. Holly, A. Hausman specification test. In Encyclopedia of Statistical Sciences; Wiley Online Library: Hoboken, NJ, USA, 2004.

Figure 1. Changes in the law of deformation measurement of measurement points in different periods.

Figure 2. Core source code for k-means clustering process.

Figure 3. Flow chart of the method for clustering dam deformation monitoring points.

Figure 4. Schematic diagram of parameter k determined using the elbow method.

Figure 5. Schematic diagram of the specific effect of deformation at different measuring points on a concrete dam.

Figure 6. Schematic diagram of two cases of missing data.

Figure 7. Flow chart showing the imputation steps of missing data.

Figure 8. Distribution of measurement points on the concrete dam.

Figure 9. Schematic diagram of the selection of the number of clusters, k, of measurement points.

Figure 10. The final classification result of the concrete dam.

Figure 11. Deformation monitoring value hydrograph of measuring point PL13-4 after deleting some data.

Figure 12. Comparison of the fixed effects panel data model and the random effects panel data model for point PL13-4: (a) fixed effects model; (b) random effects model.

Figure 13. Comparison of the results of missing deformation values at measurement point PL13-4.

Figure 14. Missing values of all measurement points in Cluster 6.

Figure 15. Comparison of the fixed effects panel data model and the random effects panel data model for Cluster 6 of Jinping-Ⅰ Hydropower Station: (a) fixed effects model; (b) random effects model.

Figure 16. Comparison of imputation results of deformation at measurement point PL11-5.

Figure 17. Comparison of imputation results of deformation at measurement point PL13-5.

Figure 18. Comparison of imputation results of deformation at measurement point PL16-5.

Figure 19. Clusters of measurement points on Xiaowan dam.

Figure 20. Comparison of the fixed effects panel data model and the random effects panel data model of Cluster 2 of Xiaowan dam: (a) fixed effects model; (b) random effects model.

Figure 21. Comparison of imputation results of deformation at measurement point A15-PL-01.

Figure 22. Comparison of imputation results of deformation at measurement point A19-PL-01.

Figure 23. Comparison of imputation results of deformation at measurement point A19-PL-02.

Table 1. Clustering of deformation measurement points.

Cluster Number	Measurement Points
Cluster 1	PL1-1, PL5-2, PL5-3, PL5-4, PL9-4, PL19-2, PL19-3, PL19-4, PL19-5
Cluster 2	PL5-1, PL9-1, PL9-2, PL13-1
Cluster 3	PL9-3, PL11-1, PL11-2, PL13-2, PL16-1, PL16-2
Cluster 4	PL9-5, PL19-1, PL23-1, PL23-2, IP23-1
Cluster 5	PL11-3, PL13-3, PL16-3, PL11-4, PL13-4, PL16-4
Cluster 6	PL11-5, PL13-5, PL16-5

Table 2. Regression coefficients of the fixed effects panel data model for data from measuring point PL13-3.

	Coefficient	R²	Prob > F
H¹_t	0.1695	0.8488	0.0000
H²_t	−0.0022
H³_t	1.108 × 10⁻⁵
H⁴_t	−1.36 × 10⁻⁸
T_1,t	−0.0454
T_2,t	−0.1804
T_3,t	−0.9589
T_4,t	−1.7557
θ_t	−2.2023
lnθ_t	8.1392

Table 3. Results of Single-Value Missing Data Imputation for measurement point PL13-4.

Date	True (mm)	Panel Data (mm)	Linear (mm)	OLS (mm)
12 January 2017	41.37	42.02	42.43	36.82
8 February 2017	37.71	38.04	39.88	34.66
26 February 2017	34.08	34.17	42.9	31.78
6 March 2017	32.24	32.22	47.16	25.25
13 March 2017	30.11	29.93	41.74	13.99
21 March 2017	27.76	27.41	41.73	18.97
28 March 2017	25.45	25.3	40.59	20.23
4 April 2017	23.24	23.28	39.82	21.52
11 April 2017	21.36	21.64	39.02	20.69
18 April 2017	21.53	21.69	38.26	19.87
25 April 2017	21.84	21.96	41.32	19.96
2 May 2017	22.81	22.84	37.84	20.72
9 May 2017	21.92	21.72	38.99	20.65
16 May 2017	20.78	20.29	41.36	21.29
23 May 2017	19.97	19.62	36.04	22.04
30 May 2017	20.87	20.28	32.33	22.59
7 June 2017	20.98	20.42	30.77	23.76

Table 4. Regression coefficients of the panel data model for points measured in Cluster 6.

	Coefficient	R²	Prob > F
H¹_t	0.0852	0.9698	0.0000
H²_t	3.085 × 10⁻⁵
H³_t	4.31 × 10⁻⁸
H⁴_t	2.442 × 10⁻¹⁰
T_1,t	−0.0883
T_2,t	−0.0947
T_3,t	−0.4196
T_4,t	−0.5479
θ_t	−0.9529
lnθ_t	3.7467

Table 5. Comparison of calculated RMSE and MAE values of the model.

Imputation Method	Panel Data Model	Global Panel Data Model	Linear Regression	Mean Interpolation
Root Mean Square Error	0.59	5.64	17.13	9.21
Mean Absolute Error	0.37	3.78	4.01	2.76

Table 6. Regression coefficients of the panel data model for points measured in Cluster 2 of Xiaowan dam.

	Coefficient	R²	Prob > F
H¹_t	16.034	0.9981	0.0000
H²_t	−0.1790
H³_t	0.0007
H⁴_t	−9.687 × 10⁻⁷
T_1,t	11.421
T_2,t	−0.8241
T_3,t	5.3385
T_4,t	−4.5594
θ_t	−16.199
lnθ_t	−0.7188

Table 7. Comparison of calculated RMSE and MAE values of the verification model.

Imputation Method	Panel Data Model	Global Panel Data Model	Linear Regression	Mean Interpolation
Root Mean Square Error	0.12	1.04	2.57	2.96
Mean Absolute Error	0.05	0.98	1.86	2.04


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

