1. Introduction
The objective of this research is to develop an accurate technique to perform ‘en masse’ seismic assessments of low-rise buildings by predicting capacity curves with machine learning (ML), employing a simple set of the buildings’ geometrical parameters, rather than engaging with a tedious modeling process.
There are numerous studies focused on the seismic risk and vulnerability of buildings, analyzing risk on either an urban or a building scale. Urban-scale analyses are mainly based on macro-seismic approaches [1,2]. Because of the obvious difficulty of modeling each building individually (the mechanical approach), these methods are characterized by a trade-off between accuracy and effectiveness in terms of time and resources. Typically, they assess urban areas by grouping similar buildings into typologies or building classes whose seismic behavior has been previously calculated in depth [3,4,5]. For example, in Europe, it is common to use the building classes of the RISK-UE project [6]. This methodology was applied to the city of Barcelona [7] and to some Portuguese cities [8]. Hybrid approaches can, however, improve the accuracy of macro-seismic assessments by combining them with other techniques. For instance, visual screening can improve the knowledge level of the structures under assessment, which in turn helps to create more representative building classes and increases accuracy. Other hybrid approaches identify and model those features of the structures that have the greatest impact on their seismic performance by means of sensitivity or parametric analyses [9,10]. Empirical methods have also been used to perform seismic assessments on an urban scale [11]. This approach relies heavily on damage data from previous earthquakes and their impact on the building stock, which are not always easy to obtain.
In contrast to the above approaches, when studying a building in detail, a specific mechanical model of the building is defined and calculated. This is much more accurate, but requires many hours of work: the blueprints of the building must be obtained or drafted, the parameters of the materials must be determined, a model must be built by a specialist and, finally, the results are obtained. This is excessively time-consuming when calculating a large number of buildings. However, by shortcutting the modeling process with ML, the methodology presented in this paper combines the advantages of both worlds (mechanical and macro-seismic) while avoiding their main drawbacks. In short, the main innovation and impact of this paper is that it sets out a methodology for an ‘en masse’ mechanical approach that can replace macro-seismic methods in the seismic assessment of urban areas and in real-time evaluation tools.
The seismic behavior of individual structures can be assessed by means of dynamic or static analyses, accounting for their nonlinear behavior. Dynamic analyses impose a higher computational burden than static ones. In contrast, static analyses are characterized by an affordable computational time while retaining high performance targets. In this work, nonlinear static (push-over) analyses are performed to assess the seismic behavior of the structures following Eurocode 8. This method is based on the calculation of the capacity curves of the building, which capture the relationship between base shear and roof displacement under an incremental force applied to the structure. Nonlinear static analysis has been selected over dynamic analysis mainly because (i) low-rise and mainly regular buildings are assessed, for which push-over analyses produce reliable results [12,13]. In the case of mid- and high-rise buildings, dynamic analyses such as time history analysis would be required, and are usually carried out, as in [14]. (ii) The time history analysis method, a well-known dynamic approach, requires real accelerograms to determine the behavior of the structure for a specific seismic record. It is therefore not a generic method, as the aim of this research requires; it is mainly used to perform exhaustive analyses of particular case study buildings or infrastructures such as tunnels [15], slopes [16] or special structures [17]. (iii) The goal of this work is to develop a general method that allows the determination of a capacity curve for all types of regular low-rise RC structures in a fast and simple manner. The choice of push-over analysis over time history analysis facilitated the generation of a dataset large enough to achieve this last objective.
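As a concrete point of reference for the curves discussed throughout this paper, a capacity curve is simply a sampled base shear vs. roof displacement relationship. The toy sketch below generates an idealized bilinear (elastic-then-hardening) curve; all stiffness and yield values are invented for illustration and are not taken from the study's models.

```python
import numpy as np

def bilinear_capacity_curve(disp, k_elastic=50000.0, d_yield=0.02, k_plastic=2000.0):
    """Toy elastic-then-hardening capacity curve: base shear (kN) as a
    function of roof displacement (m). Stiffness and yield values are
    invented for illustration only."""
    v_yield = k_elastic * d_yield
    return np.where(disp <= d_yield,
                    k_elastic * disp,
                    v_yield + k_plastic * (disp - d_yield))

disp = np.linspace(0.0, 0.2, 100)       # 100 sampling points, as in this study
shear = bilinear_capacity_curve(disp)   # one (displacement, shear) pair per point
```

Real push-over curves produced by the structural software are of course not bilinear; this shape only illustrates the elastic branch followed by a plastic regime.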
As a first implementation of the proposed method, a typology of low-rise prismatic reinforced concrete (RC) buildings is defined and a training set of more than 7000 structures is parametrically generated. The capacity curves of these models are obtained by means of push-over analysis using the SAP2000 software. After defining and training an appropriate neural network model, full capacity curves are predicted in a single run of the network, with a resolution of up to 100 points. This is substantially more efficient and comprehensive than previous approaches that predicted one point at a time: such point-by-point approaches are not only slower, more tedious and more complicated, but also completely lose the information of where the curve ends.
However, push-over analysis is still time-consuming, computationally expensive and requires advanced modeling expertise. In general, the 3D modeling of structures within engineering software packages is not an option when a large number of buildings needs to be assessed. For this reason, there is an increasing volume of research that advocates the use of ML techniques to bypass these limitations. One such method that has gained interest in recent years is artificial neural networks (ANN), mainly due to their extraordinary capability to approximate very complex functions. ANN can perform nonlinear modeling without prior knowledge of the relationships between the input and output variables. Consequently, they constitute a general and flexible modeling tool for prediction. It comes as no surprise that many engineering disciplines are witnessing an intense engagement with neural network models to solve a broad range of challenging problems [18,19,20,21,22,23,24,25].
The choice of ANN responds, on the one hand, to the fact that these models are naturally well suited to the challenges outlined above (strong nonlinearity and a large regression output of up to 100 points). On the other hand, the finite element method (FEM), as used in the SAP2000 calculations of this study, solves the differential equations that arise from structural analysis. Neural networks, in turn, are naturally equipped to deal with continuous and differentiable data, which allow for an error optimization process through the gradient descent algorithm. In other words, the differentiable character of the data generated by FEM calculations is another aspect that makes the problem under study a good fit for a neural network model.
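To illustrate why differentiability matters, the minimal sketch below fits a smooth target with plain gradient descent on a mean-squared-error loss, the same optimization principle used to train the networks in this study. The data, model and learning rate are hypothetical; this is not the study's training code.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 1.0   # smooth, differentiable target (a stand-in for FEM output)

a, b = 0.0, 0.0     # parameters of the model y_hat = a * x + b
lr = 0.5            # learning rate
for _ in range(2000):
    err = a * x + b - y
    # Analytic gradients of MSE = mean(err**2) with respect to a and b
    a -= lr * 2.0 * np.mean(err * x)
    b -= lr * 2.0 * np.mean(err)

print(round(a, 3), round(b, 3))  # → 3.0 1.0
```

Because the loss surface is smooth, each step moves the parameters toward the minimum; the same mechanism, scaled up, drives the training of the ANN on FEM-generated curves.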
Many studies use ML methods to assess the damage, vulnerability and response of buildings under seismic action, ranging from entire structures to structural elements and material stress tests. In [26], an approach very similar to the method proposed in this paper is used to assess a set of 30 buildings using time history analysis. The present research expands that study to account for a much broader range of buildings, based on a training set of more than 7000 parametrically generated structures. Similarly, a very in-depth application of neural networks was carried out in [24] to predict seismic-induced stress in specific elements of a two-span, two-story structure. In contrast with the present research, their model is specific to a case study structure, and the trained networks cannot be used to predict stresses in similar yet different structures. This is also the case in [27], where a recurrent neural network model with Bayesian training and mutual information is used for the response prediction of large buildings. Recurrent nets are a common choice for time series and, more broadly, data that feature temporal correlations. Although not specific to response prediction, other work featuring recurrent models in the area of seismic analysis can also be found in the literature [28,29]. In light of the results obtained with the methodology presented here, the authors are of the opinion that the feedforward model chosen in this work was sufficient to address the problem at hand. However, a recurrent approach would perhaps be better suited to the challenges described in the Future Work section (for instance, extending the proposed methodology to high-rise buildings). Additionally, a good number of recent studies have focused on improvements of the ML algorithms and methodologies used to assess the impact of seismic actions on structures [30,31,32,33,34,35].
ANN have also been used in a classification fashion to establish which damage-level category a certain structure would fall into for a given seismic demand [36]; that approach uses neural networks for pattern recognition based on structural parameters. In the method proposed here, a multidimensional regression approach is used instead to predict capacity curves, which describe the structural damage in more detail. In [37], an approach very similar to the present work was adopted to predict fragility curves with neural networks and other ML methods. Yet, since their study was based on dynamic analysis, capacity curves were not considered, as opposed to the static approach employed in this paper. Furthermore, fragility curves were not computed in a single run of the network, as the present method proposes: each run estimates an individual pair of coordinates, and the curves are then rebuilt from these estimations.
Other data mining techniques, including ANN, have been employed to predict the performance point of school buildings under seismic action [37]. Similarly, in [38], the same objective was targeted using a genetic algorithm instead, achieving similar levels of error. The advantage of this last approach is that the model provides a transparent mathematical formula for the prediction of the performance point. However, both studies focus on predicting the performance point directly, without considering the capacity curves of the buildings. This imposes additional complexity on the model, thus reducing the accuracy.
Other work has used neural networks in simplified approaches to determine the seismic performance of buildings. In [39], an experimental database was employed to train a neural network with very few input parameters, seven in total, in order to predict stress and deformation values at specific locations within masonry-infilled RC frames under seismic action. That study aims to simplify the complex modeling of mixed-element structures, but is still limited in its scope of application due to the much-reduced number of input parameters. Finally, in [40], neural networks were used to predict a bilinear simplification of capacity curves with accurate results. By contrast, in the methodology proposed in this paper, original capacity curves are predicted without simplification, broadening their scope of application.
In summary, the main contribution of this work with regard to previous research is the model’s ability to instantly perform a mechanical seismic assessment of a virtually unlimited number of diverse buildings beyond the already vast training dataset. Additionally, this study presents a unified methodology that allows for the prediction of capacity curves (i) using simple input parameters derived from the geometric and material properties of the buildings alone, (ii) performing in a plastic regime, (iii) in high resolution of up to 100 points per curve, (iv) in a single and coherent process (rather than in a point-by-point fashion), (v) for entire buildings with a great range of variability in size (limited to prismatic low-rise) and (vi) with immediate applicability to real-world emergency relief use cases under the seismic regulations in Eurocode 8.
3. Results
Following the experimentation, a final configuration of the network was achieved, which is detailed in Table 10. Specific examples of the results are provided in Appendix A.
With the final network configuration, after 1200 epochs, the average validation MAE across 60 random splits between training and validation sets was 0.0123 (1.23%). A graph plotting the MAE (%) per epoch for the minimum, maximum and average results across the randomized splits is shown in Figure 17. Additionally, the RMSE was also calculated in order to draw comparisons with other work, yielding a value of 0.0134 and following a very similar distribution to the MAE.
However, as can be observed in Figure 17, the best balance between the validation error and the overfitting ratio is achieved around epoch #700, where the average training and validation MAE are 0.0112 and 0.0124, respectively. This pair of values yields an overfitting ratio of 0.904, which lies within an acceptable range [52]. Beyond this point, the validation error does not improve significantly, while the divergence between training and validation errors increases at a steady pace, thus increasing the overfitting ratio without a substantial gain in validation accuracy.
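The quoted figures suggest the overfitting ratio is simply the ratio of training to validation MAE; under that assumption, it can be reproduced as follows (the small difference from the reported 0.904 presumably comes from rounding of the quoted MAE values):

```python
def overfitting_ratio(train_mae: float, val_mae: float) -> float:
    """Ratio of training to validation error; values near 1.0 indicate the
    model generalizes about as well as it fits the training data."""
    return train_mae / val_mae

# Average errors reported around epoch #700
print(round(overfitting_ratio(0.0112, 0.0124), 3))  # → 0.903
```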
Table 11 shows the evolution of the average overfitting ratio across epochs.
The distributions of the four error indicators defined in the scope of this work (MAE, full area error, fitted area error and Ld) are shown in Figure 18, Figure 19, Figure 20 and Figure 21, respectively. The average fitted area error obtained for capacity curves is 2.35%, while the average full area error is 2.65%. The last displacement error (Ld) is 20.81% on average.
Out of a total of 2200 samples in the validation set, fewer than 50 samples present a fitted area error above 5.0% and none exceed 11.0%, while more than 1200 samples remain below the 2% threshold. Regarding the last displacement error, although most samples display an error below 10%, there is a constant spread of this error along the percentage axis all the way up to 100%.
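Since the exact formulas for the area errors are defined earlier in the paper, the following is only a plausible sketch of an area-based error between a predicted and a reference capacity curve, using trapezoidal integration; the normalization by the reference area and all sample values are assumptions made for illustration.

```python
import numpy as np

def trapezoid_area(y, x):
    """Area under a sampled curve via the trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def area_error(pred, ref, disp):
    """Relative difference between the areas under a predicted and a
    reference capacity curve sampled at the same roof displacements.
    Normalizing by the reference area is an assumption; the paper defines
    its own 'full' and 'fitted' variants of this indicator."""
    return abs(trapezoid_area(pred, disp) - trapezoid_area(ref, disp)) \
        / trapezoid_area(ref, disp)

disp = np.linspace(0.0, 0.2, 100)      # hypothetical roof displacements (m)
ref = 1000.0 * np.tanh(25.0 * disp)    # hypothetical reference curve (kN)
pred = 0.98 * ref                      # a prediction uniformly 2% low
print(round(area_error(pred, ref, disp), 3))  # → 0.02
```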
To illustrate what these errors mean in visual terms, specific sample curves have been graphed for three representative error values within the distributions obtained. For the fitted area error, Figure 22 shows a sample with an error below 1%, Figure 23 corresponds to a sample with an error close to the average of the distribution (2.35%), and Figure 24 to an error close to 5%, which represents the largest error range containing a meaningful number of samples (at least 10). More data on these specific samples can be consulted in Table A1 of Appendix A.
For the last displacement error (Ld), the thresholds chosen to provide a visual illustration are Ld < 5%, Ld ≈ 21.32% (the average of the distribution) and Ld ≈ 60%, as shown in Figure 25, Figure 26 and Figure 27, respectively. More data on these specific samples can be consulted in Table A2 of Appendix A.
4. Discussion of Results
The tests on different network architectures yielded the best results for configurations featuring two hidden layers; in particular, the scheme that delivered the best results was 30-65-65-100, as shown in Table 6. Architectures with only one hidden layer returned better results at larger sizes, but were still not as competitive. The fact that the network clearly performs better as its complexity increases reflects the difficulty of the prediction task. However, as illustrated previously in Figure 11, after a certain point, increasing the complexity of the model does not improve the results due to overfitting. With an even larger training set, these deeper architectures might improve on the results presented in this paper, leaving room for future work.
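The best-performing 30-65-65-100 scheme can be sketched as a plain feedforward pass mapping 30 input parameters to a 100-point curve. Weights here are random and the ReLU activation is an assumption (the study's actual activations and trained weights are specified elsewhere); the sketch only illustrates the layer layout.

```python
import numpy as np

rng = np.random.default_rng(42)

# Layer sizes of the best-performing scheme: 30 inputs (geometric and
# material parameters), two hidden layers of 65 units each and 100 outputs
# (one base-shear value per point of the predicted capacity curve).
sizes = [30, 65, 65, 100]
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Single forward pass: ReLU on hidden layers, linear output layer."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)
    return x @ weights[-1] + biases[-1]

curve = forward(rng.normal(size=30))  # one full 100-point curve per run
print(curve.shape)  # → (100,)
```

The linear output layer suits the regression target: each of the 100 outputs is an unbounded real value of the curve.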
Regarding the size of the output layer, and rather counter-intuitively, smaller curve resolutions did not improve the metrics. An initial output layer size of 100 was chosen because it was considered to provide enough resolution for the problem at hand while not being excessively large for training. Interestingly, lowering this value proved detrimental, while increasing it up to 135 yielded equally accurate results (see Table 7). This can be explained by the fact that lower resolutions are less faithful to the calculation algorithms within the engineering software that produced the capacity curves (SAP2000); since neural networks perform best with clear patterns, lower resolutions introduce harmful noise into the training process.
At the fine-tuning stage, batch size played an important role in maximizing the curve prediction accuracy. Although there is some disagreement on the regularizing effect of batch size [64,65], in these experiments it was found that larger batch sizes can prevent overfitting by regularizing the network to some extent, because the loss values are averaged over all elements in the batch before being back-propagated to adjust the weights and biases of the model. Table 9 shows how the lowest batch sizes produced very good training results, but lagged when tested against the validation set, thus displaying more acute overfitting. The best-performing batch size tested was 24 samples.
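The averaging mechanism described above can be sketched with a toy scalar model: per-sample gradients are averaged over the batch before a single weight update, so one noisy sample has less influence the larger the batch. The model, data and batch contents are invented; this is an illustration of the principle, not the study's training loop.

```python
import numpy as np

def batch_gradient(w, xs, ys):
    """Gradient of the mean squared error of a scalar linear model
    y_hat = w * x, averaged over a batch: individual noisy per-sample
    gradients are smoothed before a single weight update is applied."""
    grads = 2.0 * (w * xs - ys) * xs
    return float(np.mean(grads))

rng = np.random.default_rng(1)
xs = rng.normal(size=24)                       # a batch of 24 samples, as in the study
ys = 2.0 * xs + 0.1 * rng.normal(size=24)      # noisy targets
g_single = batch_gradient(1.0, xs[:1], ys[:1])  # batch size 1: a noisy estimate
g_batch = batch_gradient(1.0, xs, ys)           # batch size 24: a smoother estimate
```

Repeating this over many draws, the 24-sample gradient estimates scatter far less than single-sample ones, which is the smoothing effect credited with the mild regularization observed.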
Beyond the exploration of network parameters, it is important to note that the one critical factor in achieving the results discussed above was the size of the dataset. Initial attempts, not reported in this text, were carried out with 2000 samples and delivered poor results. The increase to 7000 samples had the largest impact on accuracy among all of the options and parameters tested in that first round of experiments. However, these results should be tested and validated on real cases to measure the impact of the modeling assumptions (such as the lack of infill panels) made in this method. Additionally, several uncertainties implicit in the model's results need to be explored further; for example, the material properties were modeled according to industry regulations rather than empirical material surveys on the ground.
Although this research is centered on applicability to a very broad range of structures, the results achieved still compare quite well with other work previously discussed in this paper. In [26], the maximum inter-story drift ratio (MIDR) was predicted using different ANN models; the best results featured the prediction of MIDR values with an error between 1.5% and 2%. Despite the obvious difference in the parameters being measured, the average fitted area error of 2.35% achieved in the present study remains within close range of those results. Another interesting point of comparison is the work of Pérez Ramírez et al. [27], where a recurrent neural network model with Bayesian training and mutual information is employed to estimate the acceleration response spectrum of buildings. In this case, the model also aims to predict a response curve and, therefore, a better comparison can be established. Their results show the prediction accuracy of the network for a large residential building and a scaled model of a five-story steel structure under both seismic action and ambient vibrations. For the 30th floor of the residential building, the lowest RMSE values were 0.1789 and 0.1827 for moderate and high-level seismic excitations, respectively. These results are less accurate than the RMSE of 0.0134 achieved in the present work, although, of course, predicting the vibration response of a high-rise building is also a much tougher challenge.
Future Work
In the short term, it may be interesting to use genetic algorithms to evolve an even more optimal network architecture. Additionally, a comparison with other regression methods or machine learning approaches would be desirable to contextualize the results presented in this paper. In the long term, it might be interesting to explore a similar approach with (i) dynamic nonlinear analysis, e.g., time history analysis, (ii) irregular structures and high-rise buildings with higher vibration modes and (iii) a wider variation of sectional and material properties. It should also be observed that this work has followed the capacity spectrum method from Eurocode 8, which is proven to work well with low-rise structures. However, a research effort to predict capacity curves under a set of strong earthquakes that produce greater nonlinearities would extend the applicability of the method presented in this paper. All of these cases (dynamic analysis, high-rise buildings and stronger earthquakes) pose a greater challenge in terms of machine learning and may require more specialized neural network architectures. In this regard, because capacity curves can be regarded as sequential data, network models better suited to handling dynamic inputs, such as recurrent neural networks [66,67], should be explored.
5. Conclusions
A method based on artificial neural networks to estimate the capacity curves of low-rise RC buildings was developed and implemented. In the methodology presented, no modeling of the specific building is required. Curves can be estimated with an average curve-area fitting above 97.6%, only requiring the basic geometric parameters of the building to be specified.
As a first implementation, a typology of prismatic RC buildings was defined and a training set of more than 7000 structures was parametrically generated. The capacity curves of these models were obtained by means of push-over analysis using SAP2000 software.
The proposed method is fit for the accurate, almost instant assessment of the seismic vulnerability of regular low-rise RC buildings. It provides a fast and reliable alternative for the calculation of capacity curves when detailed information about a building is unavailable but basic data are (number of spans, span dimensions, beam and column profiles and slab thickness). It may also provide a fast and robust alternative when, due to the large volume of buildings to be assessed, individual modeling is not feasible. While current macro-seismic approaches address these issues, their accuracy is not comparable. This is very relevant in urban scenarios where the seismic vulnerability of a great number of buildings needs to be assessed. The resulting trained network can be used by emergency services and other government bodies as a decision-making tool for prevention purposes (targeted retrofitting interventions, for example) and, after a seismic event, for quick and effective relief action.
The main conclusions of the research presented in this paper are the following:
- ANN provide an accurate approximation method for the nonlinear static push-over calculation of low-rise structures within a wide range of sizes and geometric configurations.
- The accuracy of the method successfully addresses the shortcomings of current macro-seismic approaches, while remaining fast and efficient.
- Stress-deformation curves in a plastic regime can be predicted with ANN in one go for entire buildings using only basic geometric parameters. For low-rise structures, this work achieves a curve area error below 2.7% and a resolution of up to 100 points.
- The relative simplicity of the ANN architecture required to predict the capacity curves of low-rise buildings makes a strong case for future research on high-rise structures using deeper networks and larger datasets.