When employing neural networks for prediction, accuracy heavily relies on the size of the training dataset, and insufficient training samples lead to a significant decrease in predictive accuracy. However, in the era of big data, similar prediction tasks can usually be found to provide prior knowledge to the network. In this study, for the task of predicting the degradation of the bonding strength between CFRP and concrete under the combined effect of hydrothermal and salt attacks, similar prediction tasks involve predicting the degradation of the bonding strength between CFRP and concrete under the influence of either a hydrothermal or a salt attack individually. These tasks share common inputs (such as the material properties of CFRP and concrete) and a common output (bonding strength), despite differences in their parameters related to temperature, humidity, and salt. These shared characteristics provide a solid knowledge foundation for neural network learning. Therefore, when the sample size for a target prediction task is small, it is recommended to first learn prior knowledge on large-scale datasets similar to the target task and then transfer the neural network to the target task. In the field of deep learning, this problem is called few-shot learning, and the MAML algorithm has been proposed [51] to solve it.
3.1. MAML Model
Unlike conventional neural networks, which optimize learning outcomes over individual samples, the MAML algorithm takes tasks as its training unit. Instead of directly acquiring a mathematical model for prediction, it aims to learn "how to learn a mathematical model more efficiently and effectively." MAML consists of two components: a meta-learning process and a fine-tuning process. Meta-learning seeks model parameters that can be shared across various tasks and adjusted rapidly; its objective is to train the model on a series of related tasks to gather general knowledge useful for tackling new tasks. The meta-learning process iterates over tasks: for each task, the model applies its current parameters and computes the loss specific to that task, and the model parameters are then updated from the aggregated losses across the tasks in a batch. Fine-tuning takes place after meta-learning (as delineated in the fine-tune steps of Algorithm 1); the shallow-layer parameters are fixed and the remaining parameters are further adjusted to adapt to a specific new task. This fine-tuning stage resembles traditional machine learning training: the model is trained on data from the new task, adjusting parameters via gradient descent to minimize the loss function associated with that task. Since the model's initial parameters have already been optimized during meta-learning, fine-tuning typically requires fewer iterations and can swiftly adapt to new tasks. Consequently, even with limited data, MAML can still achieve high accuracy.
Algorithm 1. Algorithm of MAML.
Input: point set X with 19 characteristics and 436 samples; label set Y with 1 characteristic and 436 samples.
Training parameters: $\theta$, the weights and biases in the net.
Hyperparameters: epoch1, the number of times each task is trained in the MAML; epoch2, the number of times each task is trained in the fine-tune; $\alpha$, the parameter controlling the learning rate of the net; $\beta$, the parameter controlling the learning rate of the task.
Data segmentation: 1. Divide the samples into a training set and a test set. 2. Divide the training set into M tasks and the test set into N tasks.
Steps in the MAML:
1. Initialize $\theta$
2. for j = 1, …, epoch1
3.  for batches of tasks in the M tasks
4.   initialize $\theta' = \theta$ and get $\hat{y}$ by forward propagation
5.   calculate the loss $L_{T_i}(f_{\theta'})$ of each task $T_i$
6.   update $\theta' \leftarrow \theta' - \alpha \nabla_{\theta'} L_{T_i}(f_{\theta'})$ for i times
7.  end
8.  calculate the loss function of each batch: $\sum_{i=1}^{B} L_{T_i}(f_{\theta'_i})$
9.  update the $\theta$ by $\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{i=1}^{B} L_{T_i}(f_{\theta'_i})$
10. end
Output: the meta-learned initialization $\theta$
Steps in the fine-tune of a specific task in the N tasks:
1. Freeze the $\theta^{(k)}$ in the shallow layers of the net
2. for j = 1, …, epoch2
3.  for batches of samples in the training set
4.   get $\hat{y}$ by forward propagation
5.   calculate the loss function $L = \frac{1}{m}\sum_{j=1}^{m}(\hat{y}_j - y_j)^2$
6.   update $\theta$ to $\theta \leftarrow \theta - \alpha \nabla_{\theta} L$
7.  end
8. end
Output: the fine-tuned parameters $\theta$ and the prediction $\hat{y}$
The steps outlined in Algorithm 1 describe the MAML process comprehensively. The notation j is the current training time, m represents the sample number, $\hat{y}$ is the predicted value, $\theta_j$ represents the updated weights and biases at the current training time, y is the labeled value, $\theta'$ is a temporary parameter used to search for gradient-descent directions, B is the number of tasks in a batch, $T_i$ is the i-th task in the training, $\partial$ is the notation of a partial differential, $\theta^{(k)}$ is the component of the array $\theta$ that represents a single weight or bias value, L is the loss function of the regression problem, and $\nabla$ is the gradient of a variable.
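To make the procedure concrete, a minimal sketch of Algorithm 1 is given below. It is a simplified illustration rather than the exact implementation used in this study: the layer sizes, the placeholder task generator, the single inner gradient step, and the use of torch.func.functional_call (PyTorch 2.0 or later) are all assumptions made for the example.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)

# Small regression net; 19 inputs and 1 output match the point/label sets above.
net = nn.Sequential(nn.Linear(19, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()
alpha, beta = 0.01, 0.01                        # inner/outer learning rates (both 0.01 here)
meta_opt = torch.optim.SGD(net.parameters(), lr=beta)

def sample_task(n=10):
    """Placeholder task generator standing in for the M meta-training tasks."""
    x, y = torch.randn(n, 19), torch.randn(n, 1)
    return x[:5], y[:5], x[5:], y[5:]           # support (adapt) / query (evaluate) split

epoch1, tasks_per_batch, inner_steps = 100, 4, 1

for _ in range(epoch1):                                     # Algorithm 1, step 2
    meta_loss = 0.0
    for _ in range(tasks_per_batch):                        # step 3: a batch of tasks T_i
        x_spt, y_spt, x_qry, y_qry = sample_task()
        theta = dict(net.named_parameters())                # theta' initialized from theta
        for _ in range(inner_steps):                        # steps 4-6: inner update
            pred = functional_call(net, theta, (x_spt,))
            grads = torch.autograd.grad(loss_fn(pred, y_spt),
                                        list(theta.values()), create_graph=True)
            theta = {k: p - alpha * g
                     for (k, p), g in zip(theta.items(), grads)}
        pred_q = functional_call(net, theta, (x_qry,))
        meta_loss = meta_loss + loss_fn(pred_q, y_qry)      # step 8: batch loss
    meta_opt.zero_grad()
    meta_loss.backward()                                    # step 9: theta <- theta - beta*grad
    meta_opt.step()
```

Note that create_graph=True retains the inner-update graph so that the outer update differentiates through the adaptation step, which is the defining feature of MAML.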
The MAML model has three assumptions. The first is the fast adaptation assumption. Under this assumption, the MAML model can achieve good performance on a new task with just a few gradient updates, meaning that the initial parameters obtained through meta-training are close to the optimal parameters in the task space. The second is the shared parameter assumption. Under this assumption, although tasks are different, they can share the same model’s initialization parameters, namely its meta-parameters. These meta-parameters are learned through an optimization process across multiple tasks. The third is the finite sample assumption. Under this assumption, for each task only a small number of training samples (usually referred to as K-shot learning) are needed to adapt the model. This assumption is aligned with the goal of MAML, which is to achieve rapid learning.
This research utilizes two databases, Meta1 and Meta2, to develop the MAML model. The Meta1 database considers the CFRP-to-concrete interfacial ultimate loads under different environments (hygrothermal attack or salt attack) separately; the combined database thus possesses the features that influence the interfacial ultimate load under both wet–dry attacks and salt attacks, enabling the MAML model to perform transfer learning between these two domains during training. This enhances the model's generalization ability and performance on new tasks, making it applicable to data obtained from both wet–dry attack and salt attack environments. The Meta2 database is entirely independent of the Meta1 database; it considers the interfacial ultimate load under the joint action of sulfate ions and wet–dry cycle environments. Because training on Meta1 already adapts the MAML model to environments where wet–dry cycles and salt ions act together, the model demonstrates even stronger applicability when fine-tuned on Meta2.
3.2. MAML Training and Test Results
To ascertain the accuracy of the MAML prediction methodology, the database Meta2, which is not involved in the meta-training process, is utilized to evaluate network performance. This held-out database is partitioned into distinct training and testing sets, and these sets are kept fixed during the fine-tuning process. Throughout every meta-learning epoch, however, the assignment of tasks to training and testing sets remained dynamic: upon the completion of an epoch, tasks were reallocated so that the training and testing sets were consistently resampled at a 4:1 ratio. The hyperparameters $\alpha$ and $\beta$ of the model were both set to 0.01. To mitigate the possibility of incidental outcomes, a total of six repeated predictions were conducted, and the predictive results are presented in Table 5.
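The exact task-partitioning scheme is not restated here, but one simple way to implement the per-epoch reallocation at a 4:1 ratio is sketched below; the task count and the NumPy-based splitting are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np

rng = np.random.default_rng()

def reallocate_tasks(n_samples, n_tasks, train_ratio=0.8):
    """Reshuffle sample indices into tasks; split the tasks 4:1 into train/test."""
    idx = rng.permutation(n_samples)            # fresh shuffle at each epoch
    tasks = np.array_split(idx, n_tasks)        # partition indices into tasks
    n_train = int(round(train_ratio * n_tasks))
    return tasks[:n_train], tasks[n_train:]     # M training tasks, N testing tasks

# Example: 436 samples divided into 20 tasks, re-drawn once per meta-learning epoch.
train_tasks, test_tasks = reallocate_tasks(n_samples=436, n_tasks=20)
```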
In the partitioned test set, as depicted in Table 5, the predictive result of each repeated prediction is compared against the true value, and the error between them is calculated; here the error is defined as (predicted value − true value)/true value. At the end of each test, the average error is calculated as the mean of the absolute errors of the 20 predictions. Although the overall average error is relatively small, ranging from 10.97% to 21.41%, there are occasional instances of significant errors in individual data points. These may stem from incomplete features in those specific data instances. In the Meta2 database, the chloride concentration and temperature feature values for all sample data are 0, because only the coupling effect of sulfate ion corrosion with the wet–dry cycle environment is considered. As a result, certain data points may lack crucial features, preventing the model from fully comprehending the data and thereby leading to significant prediction errors. This is an unavoidable bias in deep learning algorithms. However, since these larger biases occur only sporadically, the network can still be seen to possess good learning and prediction capabilities. Moreover, no indication of overfitting appears even with 19 input factors, which may be attributed to the regularization, cross-validation, and model architecture within the MAML algorithm. Furthermore, the average errors from the six repeated tests are not consistent, which may be attributed to variability in the initialization of model weights and biases before each training process.
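As a small illustration, the error metric of Table 5 can be computed as follows; the sample values in the snippet are placeholders, not data from this study.

```python
import numpy as np

def relative_errors(pred, true):
    """Per-sample error (predicted - true) / true, as used in Table 5."""
    pred, true = np.asarray(pred), np.asarray(true)
    return (pred - true) / true

pred = np.array([0.92, 1.10, 0.85])   # illustrative predicted values only
true = np.array([1.00, 1.00, 1.00])   # illustrative true values only

# Average error of a test: mean absolute relative error over the predictions.
avg_error = np.mean(np.abs(relative_errors(pred, true)))
```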
Figure 1 illustrates the evolution of the proximity between predicted values and true values during the training and testing processes of the MAML algorithm. To assess the prediction accuracy more comprehensively, we no longer rely solely on simple relative errors but introduce the coefficient of determination, R-squared, to better characterize this relationship. R-squared is defined in Equation (5):

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \quad (5)$$

where $y_i$ is the i-th test value, $\hat{y}_i$ is the i-th predicted value, and $\bar{y}$ is the average value of the test values.
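Equation (5) translates directly into code; the snippet below is a straightforward NumPy implementation of this definition.

```python
import numpy as np

def r_squared(y_test, y_pred):
    """Coefficient of determination per Equation (5)."""
    y_test, y_pred = np.asarray(y_test), np.asarray(y_pred)
    ss_res = np.sum((y_test - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```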
The number of iterations, serving as a special hyperparameter, is also examined, as shown in Figure 1. In MAML, an epoch refers to one complete pass of training and testing over the entire dataset, so the epoch count equals the number of such iterations. To achieve the best results, it is essential to set the maximum number of epochs properly; in this study, epochs in the range of 0 to 5000 are considered. Since an R-squared value approaching 1.0 indicates better prediction results, it is evident that, with increasing training epochs, the predicted results during both training and testing converge to higher values, around 0.8. As depicted in Figure 1a, convergence during training is notably rapid: in the initial iterations the predicted values quickly approach the true values, reaching an R-squared value of approximately 0.8 at around 300 iterations. Subsequently, the convergence rate slows, with a slight further increase in R-squared, nearing stability around 1000 iterations. The trend suggests that the value would eventually approach 1.0, but only at significantly increased computational cost. Figure 1b illustrates the evolution of R-squared values during the testing process as iterations increase, and a notable distinction between the testing and training processes can be observed. While some individual tests still exhibit the rapid convergence seen in training, with initially high R-squared values, the convergence of tests 1, 2, and 3 is significantly slower. Although the results of these tests converge by the 1000th iteration, their R-squared values decrease slightly, stabilizing between 0.75 and 0.82. This is likely due to the disparity in the amount of data used between the training and testing processes. The final convergence results may therefore be somewhat biased, but the primary objective of achieving rapid convergence while maintaining prediction accuracy on a small-sample database has been accomplished.
Hence, the MAML prediction approach proves suitable for forecasting the interfacial bond performance between CFRP and concrete under varied environmental influences. Moreover, given the limited sample size of the Meta2 tasks and the favorable prediction outcomes achieved by MAML, the method is also applicable to other tasks with scant samples.
3.3. Comparison between MAML and Previous Analytical Models
In recent years, with the increasing application of deep learning in various engineering fields, scholars in civil engineering have increasingly used traditional ANN models, namely backpropagation networks, for implicit regression problems. As shown in Figure 2, the operation of a traditional neural network is mainly divided into two processes: forward propagation and backward propagation. During forward propagation, with m training samples available, where X represents the input variables and y represents the output results, the m samples can be represented as $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$. The input variable X is represented as an $m \times n_x$ matrix, where m denotes the number of training samples and $n_x$ represents the dimensionality of the input feature vectors. Similarly, the output variable y is represented as an $m \times n_y$ matrix, with m denoting the number of training samples and $n_y$ representing the dimensionality of the output feature vectors. This constitutes the preparation of the dataset before training.
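As a concrete illustration of this data layout, the sketch below builds the X and y matrices with the dimensions used in this study; the zero entries are placeholders for real feature values.

```python
import numpy as np

m, n_x, n_y = 436, 19, 1            # samples, input features, output features
X = np.zeros((m, n_x))              # input matrix: one row per training sample
y = np.zeros((m, n_y))              # output matrix, aligned row-by-row with X
```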
As further indicated in Figure 2, the input matrix X undergoes computations through the hidden layers of the neural network: each layer calculates its affine result and applies an activation function (typically ReLU) to generate the layer's output. These computations culminate in the final output $\hat{y}$; this is the process of forward propagation. Backpropagation differs: forward propagation first computes the predicted value $\hat{y}$, and the loss function is then used to calculate the difference L between the predicted value $\hat{y}$ and the actual value y. During backpropagation, the gradients of each layer's parameters, $\partial L/\partial W$, are computed in reverse order based on the loss function, starting from the last layer. Subsequently, the weights of each layer are adjusted layer by layer ($W \leftarrow W - \alpha\,\partial L/\partial W$), ultimately updating the parameters. This iterative procedure leads to a gradual decrease in the loss function L and constitutes the backpropagation process.
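The forward and backward passes described above condense into a few lines of code. The following is a minimal sketch assuming a PyTorch net with one ReLU hidden layer and placeholder data, not the exact BPNN configuration used in the comparison below.

```python
import torch
import torch.nn as nn

# Minimal BPNN: forward pass through a ReLU hidden layer, then backpropagation.
bpnn = nn.Sequential(nn.Linear(19, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(bpnn.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(8, 19)              # placeholder batch of m = 8 samples
y = torch.randn(8, 1)               # placeholder labels

y_hat = bpnn(x)                     # forward propagation: affine layers + ReLU
loss = loss_fn(y_hat, y)            # L: difference between y_hat and y
opt.zero_grad()
loss.backward()                     # backward: dL/dW computed from the last layer back
opt.step()                          # W <- W - alpha * dL/dW, layer by layer
```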
In this section, a comparison is made between the BPNN, a widely used algorithm in civil engineering for implicit regression problems, and the MAML algorithm in their predictions of the degradation level of CFRP-to-concrete interface bond strength under the coupling effect of hydrothermal and salt attacks. This aims to demonstrate that the MAML algorithm, suitable for few-shot learning, is more appropriate for addressing complex implicit regression problems. To ensure a more reliable comparison of the predictive results between BPNN and MAML, the same database is adopted. The following content compares the accuracy of the predictive results, the fitting capability of the network models, and the convergence speed and precision of the two algorithms.
Figure 3 illustrates the comparison between the prediction results of the MAML and the BPNN on the database Meta2. A total of two repeated tests are conducted to mitigate randomness. The x-axis represents the sample number of the test set, while the y-axis denotes the normalized bond strength. The black line represents the true values of the 20 test samples, the red line the predicted values of the MAML algorithm, and the blue line the predicted values of the BPNN algorithm.
As shown in Figure 3, the predicted results of the BPNN in both tests are significantly higher than the true values. Moreover, after normalization, most of the BPNN's predicted values are distributed between 0.15 and 0.20. For samples with lower strength, the BPNN does not exhibit satisfactory learning results, indicating that the network's learning ability is limited and the results are underfitting. The main reason for the underfitting is the small sample size, which is also one of the important limitations of neural network algorithms in practical engineering applications. For the MAML algorithm, which incorporates prior knowledge, the predictive results (the red line in Figure 3) are significantly superior to those of the BPNN algorithm; moreover, it achieves better fitting for both larger and smaller values.
Figure 4 provides a more intuitive comparison of the fitting capabilities of the two algorithms, depicting their predictions plotted against the actual values. The black line in the middle represents perfect fitting, where the predicted values match the actual values; the red points represent predictions from the MAML algorithm, while the blue points represent predictions from the BPNN algorithm. The R² value measures how closely the scatter follows this line, with values closer to 1.0 indicating a closer approximation to perfect fitting. The results of the MAML algorithm are distributed primarily near the line of perfect fitting, with the fitting results of the two repeated predictions being almost identical, with R-squared values of 0.9137 and 0.9472, respectively. In contrast, the results of the BPNN mostly cluster in the top-left of the graph, indicating that the network struggles to predict effectively as the actual values decrease. This discrepancy highlights the superior predictive performance of MAML, compared with BPNN, on this complex task.
The loss function reflects the difference between the predicted values and the actual values. The closer the loss function is to zero, the closer the predicted values are to the actual values. This function is a crucial variable in the learning process of neural networks, and its evolution pattern represents the learning path or process of the neural network.
Figure 5 compares the training and testing losses of MAML and BPNN across the repeated tests. Barring the initial phase, MAML consistently exhibits lower training losses than BPNN. The training loss of MAML swiftly approaches zero and converges steadily after a relatively small number of epochs, whereas BPNN's training loss fluctuates with increasing epochs, remaining largely unconverged and frequently exhibiting outliers. It is not surprising that the BPNN algorithm fails to converge on this database: with only around 80 samples, and not all of them of high quality, the dataset falls far short of the big-data requirements of neural network algorithms. In contrast, both in training and testing, the MAML algorithm converges rapidly on this small-sample database.
In summary, MAML demonstrates superior applicability and accuracy in predicting the CFRP-to-concrete interfacial bond strength affected by the coupling effect of hydrothermal and salt ion attacks. However, this does not imply that the MAML algorithm universally outperforms the BPNN algorithm; when the database is sufficiently large and of high quality, the BPNN algorithm has been shown to perform well. The significant role of the MAML algorithm therefore lies in its ability to utilize lower-quality databases, generated by combining two or more unrelated factors, to accomplish prediction tasks under multiple influences. This is something the BPNN cannot achieve, constrained as it is by its requirements for a database of a certain size and quality.