### **1. Introduction**

Bridges, which are undoubtedly of high significance for transportation networks, can also be seen as the products of construction projects. Completing a project within budget is one of the key factors of its success. Success is more likely if the cost estimates are realistic and close to the actual costs. Therefore, cost estimates are needed at the successive stages of a construction project. Early cost estimates rely on basic information and parameters of a project. Although their expected accuracy is relatively low (they can be considered qualitative predictions rather than precise cost estimates), they are delivered when the crucial decisions are made, and thus their impact on the final cost is great.

Along with the intensive development and modernization of transport infrastructure in Poland, bridge construction has also increased over the past few years. On the one hand, it is important to start the process of cost estimation for a bridge project as early as possible. On the other hand, some artificial intelligence and machine learning tools offer capabilities, such as learning from experience and knowledge generalization, which make them applicable to early cost estimation models. For bridge projects in particular, such models are intended to provide early estimates or forecasts of the final cost.

The aim of this paper is to introduce a cost estimation model for bridge construction projects based on machine learning, namely the support vector machine (SVM) method. The goal of the research was to develop a model supporting fast cost estimates of total construction costs of bridges in the early stages of construction projects.

### *1.1. Literature Review*

The problem of cost modeling for bridge projects is present in scientific publications. One can distinguish various approaches to this issue.

Part of the research focuses on the development of models for estimating the costs of either selected cost components or elements of bridge structures. In [1], the costs of preliminary engineering, as components of the total costs of newly built bridges, are addressed. The authors introduced statistical models that link variation in preliminary engineering costs with specific parameters. A conceptual model aiding cost estimates of bridge foundations is presented in [2]. The proposed model supports a three-stage decision process that includes foundation system selection, estimation of material quantities, and foundation cost estimation. In this study, stepwise regression analysis was applied. Another work [3] reports an analysis that aimed to develop material quantity models of the abutment and caisson as components of a whole bridge structure with a prestressed concrete I-girder superstructure. The application of multiple regression analysis resulted in a number of equations proposed for estimating the concrete volume and reinforcing steel weight of the abutment and caisson. Another study [4] addresses cost estimates of bridge superstructures. The proposed method, based on linear regression and bootstrap resampling, provides estimates in the early stages of road projects.

Another part of the research presents efforts to develop models for estimating the construction costs of specific kinds of bridges. The authors of [5] proposed a model for the cost estimation of timber bridges based on artificial neural networks (ANN). The performance of the proposed neural network-based model is reported to be better than that of the model based on linear regression. Another work [6] introduces a model for approximate cost estimation for prestressed concrete beam bridges based on the quantity of standard work. The proposed method supports cost estimates for a typical beam bridge structure using three parameters: length of span, total length of bridge, and width. Another paper [7] presents a methodology for cost estimates of railroad bridges. The proposed model combines case-based reasoning, genetic algorithms, and multiple regression as tools. Another work [8] introduced a computer-aided system providing cost estimates of prestressed concrete road bridges. The system, built upon a database of data collected from completed bridge projects, allows estimating the material quantities and costs of all bridge elements. The estimating models that constitute the core of the system were developed with the use of statistical analysis. The authors of [9] focus on the use of Bridge Information Modeling (BrIM) for detailed cost estimates. They discuss the extraction of information from the bridge model and the cost estimation process prepared on this basis. The methodology for generating cash flow and required payments is presented as well.

The problem of risk analysis in bridge construction is addressed in [10]. The research aimed to identify and analyze risks associated with bridge construction. Impacts of risks on cost and schedule in bridge projects are discussed.

Some publications refer to the issue of replacement, renovation, repair, and maintenance costs of bridges. Replacement cost prediction models, developed with the use of regression techniques, are introduced in [11], in which the authors investigated the applicability of nonlinear and log-linear models for the task. Another work [12] presents the development of a model for cost estimation of repair and maintenance of bridges using artificial neural networks. Another paper [13] presents the development, discussion, and performance assessment of a set of regression models for estimating the costs of rehabilitating bridges. One of the papers specifically addresses the repair or replacement costs of bridges damaged by Hurricane Katrina [14]. The authors analyzed and compared damage patterns to bridges and examples of repair measures. Relationships between storm surge elevation, damage level, and repair costs were developed. The issue of potential design considerations for bridges in vulnerable coastal regions is discussed. Some studies address the topic of life-cycle costs of bridges. In another report [15], the life-cycle cost-effectiveness of fiber-reinforced-polymer bridge decks is investigated and analyzed. The author used a life-cycle cost analysis method tailored for comparing new materials with conventional ones. Publications on cost optimization of concrete bridge components and systems are reviewed in [16], along with a presentation of the state of the art in life-cycle cost analysis and design of concrete bridges.

Support vector machines (SVM) are machine learning systems with the ability to learn from experience (hidden in the data presented to the systems) and to generalize knowledge. The theory of SVM, developed by Vapnik and co-workers, is based on the principles of statistical learning [17,18]. The methodology and theory of SVM are also broadly presented in the literature by other authors, e.g., [19–21]. SVM can be applied to either classification or regression problems. SVM implementations in construction management introduced in works published in recent years include automated document classification for improving information flow in construction management systems [22], a methodology of legal decision support aiming at mitigating the negative impacts of conflicts that occur in the course of construction projects [23], risk hedging prediction for construction material suppliers [24], prediction of construction contractor default [25], prediction of company failure in the construction industry [26], and dynamic prediction of construction project success [27].

Recent works can also be found in the field of cost analyses in construction supported specifically by SVM. SVM-based modeling of variations of construction prices with the use of the construction cost index in Taiwan was introduced in [28]. The study established a hybrid intelligence system based on the fusion of SVM and differential evolution for estimation of the construction cost index. The system is reported to perform with satisfactorily high accuracy. In another work [29], the authors developed models supporting the prediction of construction project cost and schedule success, using early project planning status information as the input. The alternative models, based on either ANN or SVM, were compared, and the latter proved to perform better. In one of the works [30], SVM-based machine learning, along with interval estimation and differential evolution, is implemented for modeling the cost at completion of construction projects (one of the metrics known from the Earned Value Management method). The proposed model proved its capability of delivering reliable forecasts. The authors of [31] focused on conceptual cost estimates of school buildings. Models based on linear regression, ANN, and SVM were developed and compared. A study on the estimation of costs and durations of urban road construction supported by alternative artificial intelligence tools, namely ANN or SVM, is presented in [32]. The SVM-based model is reported to perform with significantly better accuracy in terms of costs, whereas for duration prediction the SVM-based model is only slightly better than the one based on ANN.

### *1.2. Research Objectives*

The aim of this paper is to present the results of studies on the development of a machine learning-based regression model, using the support vector machine (SVM) method, to support early estimates of total construction costs of bridges. The paper opens with an introduction and a review of the literature. The following section presents a synthesis of the SVM-based regression methodology and the assumptions for treating the prediction of the total construction costs of bridges as a regression problem. These are followed by the results of the SVM-based regression analysis and their discussion. The last section includes conclusions and a recapitulation.

The main assumption for the model proposed in this paper is the use of the SVM method. The rationale for this assumption is the method's capability of dealing with high-dimensional data, its applicability to nonlinear regression, and the fact that the method allows finding a global solution for a given task. Moreover, SVM works well on small sets of training data. The following remarks can be made with reference to these properties. First, it is possible to take into account many variables that play the role of cost predictors in the problem of early cost estimation of bridges. Second, nonlinear relationships between the cost predictors and the total construction costs of bridges can be modeled with the use of the SVM machine learning-based regression model. Third, the SVM-based model can be built upon a moderate amount of training data that characterize bridges and their costs.

The novelty of the introduced model lies in the fact that it offers cost predictions of bridges as whole objects. Moreover, several types of bridge structures are considered. Earlier works [2–4] focused mostly on estimates of either the substructure or the superstructure, while other works are limited to specific types of bridges [5–8]. The application of the SVM-based regression method for the development of a cost estimation model allows overcoming some drawbacks of the models built on the basis of regression analysis [2–4] or ANN [5]. When compared to linear regression, the SVM method does not require a priori assumptions about the functional relationship for the developed model. When compared to ANN, SVM is not at risk of the so-called local minima problem.

### **2. Methodology and Concept of a Model**

The development of a model capable of providing early cost estimates of bridges based on the SVM method is understood here as solving a regression problem with the use of machine learning. The dependent variable of the sought-for regression model was the total construction cost of a bridge, later denoted as *y*. The independent variables, given as vectors of cost predictors and later denoted as *x*, represent information such as the features, characteristics, and specificity of bridges. The sought-for model was intended to provide a multidimensional mapping from the set of cost predictors to the set of values representing total construction costs. Formally, the implicit regression function *f*, which is supposed to provide the mapping *x* → *y*, denoted as:

$$y = f(\mathbf{x}),\tag{1}$$

is supposed to be found with the use of machine learning based on the SVM method. This method is based on knowledge generalization and learning from examples (that represent some experiences) presented to a machine.

### *2.1. Support Vector Machines Method in Regression Analysis*

The following fundamentals of the method were compiled and summarized after [17–21]. The SVM method allows approximating *f* with a hyperplane. For nonlinear problems, the linear approximation is achieved through a transformation of the space of independent variables into a higher-dimensional, linear feature space. If the set of training examples is given as χ such that χ = {[*x*, *y*] ∈ *R<sup>m</sup>* × *R*}, and Φ is a nonlinear transformation used to determine a new feature space *H* for the inputs, Φ: *R<sup>m</sup>* → *H*, Φ(*x*) ∈ *H*, *y* ∈ *R*, then the function *f* can be given as follows:

$$f(\mathbf{x}) = \mathbf{w}^{\mathrm{T}} \Phi(\mathbf{x}) + w_0 \tag{2}$$

The transformation Φ(*x*) is supposed to increase the expressive power of the representation, and the approximation function is computed in the higher dimensional, linear feature space *H*. Support vectors (*sv*) are the training data points that lie closest to the hyperplane and thus they affect its optimal location.

To measure the errors of the training process, Vapnik's ε-insensitive loss function is assumed:

$$l(f(\mathbf{x}), y) = |y - f(\mathbf{x})|_{\varepsilon}, \tag{3}$$

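where:

$$|y - f(\mathbf{x})|_{\varepsilon} = \begin{cases} 0, & \text{if } |y - f(\mathbf{x})| \le \varepsilon, \\ |y - f(\mathbf{x})| - \varepsilon, & \text{otherwise.} \end{cases} \tag{4}$$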

Here, ε defines the width of a tube of insensitivity around the true values *y*, within which deviations of the fitted function are not penalized. Consequently, the value of ε affects the number of support vectors.
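The behavior of the ε-insensitive loss can be illustrated with a minimal Python sketch. The function name `eps_insensitive_loss` and the numerical values are hypothetical and chosen only for illustration; they do not come from the data set used in this paper.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Vapnik's epsilon-insensitive loss for a single prediction.

    Deviations not larger than eps cost nothing; larger deviations
    are penalized linearly by the amount exceeding eps.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical costs (arbitrary units) and hypothetical model outputs.
actual = [10.0, 12.5, 8.0]
predicted = [10.3, 11.0, 8.0]

# Only the second prediction falls outside the tube of half-width 0.5.
losses = [eps_insensitive_loss(y, f, eps=0.5) for y, f in zip(actual, predicted)]
```

In this sketch, only the example whose deviation exceeds ε contributes to the loss, which is why a wider tube tends to leave fewer training points acting as support vectors.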

Following this, the problem comes down to optimization by machine learning:

$$\frac{1}{2} \|\mathbf{w}\|^2 + C \sum (\xi + \xi^*) \to \min,\tag{5}$$

subject to the constraints for both sides of the ε-tube:

$$\mathbf{w}^{\mathrm{T}} \Phi(\mathbf{x}) + w_0 - y \le \varepsilon + \xi \quad \text{and} \quad y - (\mathbf{w}^{\mathrm{T}} \Phi(\mathbf{x}) + w_0) \le \varepsilon + \xi^* \quad \text{and} \quad \xi, \xi^* \ge 0 \tag{6}$$

The use of the loss function (3) means that deviations smaller than ε are tolerated. *C* is the regularization parameter of the SVM method; it determines the compromise between the complexity of the model (the margin of the decision function) and the training accuracy. The variables ξ and ξ\* in (5) and (6) are slack variables that penalize predictions outside the ε-tube. The optimization problem (5) is solved with the use of Lagrange multipliers, which gives:

$$f(\mathbf{x}) = \sum_{nsv} (\alpha - \alpha^*) \Phi(\mathbf{x})^T \Phi(\mathbf{x}') + w_0, \tag{7}$$

where *nsv* stands for the number of support vectors and α, α\* are the multipliers for the optimal solution such that:

$$0 \le \alpha \le C \text{ and } 0 \le \alpha^* \le C \tag{8}$$
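For orientation, the step from (5) and (6) to (7) can be sketched with the dual problem in the form commonly given in the SVM literature. It is not reproduced from this paper, and the index *i* over training examples is notation assumed here for illustration. The multipliers are obtained by maximizing

$$-\frac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, \Phi(\mathbf{x}_i)^T \Phi(\mathbf{x}_j) - \varepsilon \sum_i (\alpha_i + \alpha_i^*) + \sum_i y_i (\alpha_i - \alpha_i^*)$$

subject to

$$\sum_i (\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad 0 \le \alpha_i, \alpha_i^* \le C,$$

and the weight vector follows from the stationarity condition of the Lagrangian:

$$\mathbf{w} = \sum_i (\alpha_i - \alpha_i^*)\, \Phi(\mathbf{x}_i).$$

Substituting this expansion of **w** into (2) yields the form (7). Only the training examples with a nonzero difference between the two multipliers contribute to the sum, and these are the support vectors.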

The choice of an appropriate transformation Φ and the explicit calculation of Φ(*x*)<sup>*T*</sup>Φ(*x*′) are difficult and computationally complex. To simplify the computations, the kernel functions *K*(*x*, *x*′) are introduced instead:

$$K(\mathbf{x}, \mathbf{x}') = \Phi(\mathbf{x})^T \Phi(\mathbf{x}'),\tag{9}$$

The kernel functions most commonly mentioned for use in the SVM method are: polynomial (10), radial basis (11), and sigmoidal (12):

$$K(\mathbf{x}, \mathbf{x}') = (\gamma\, \mathbf{x} \cdot \mathbf{x}' + c)^d,\tag{10}$$

$$K(\mathbf{x}, \mathbf{x}') = \exp(-\gamma \|\mathbf{x} - \mathbf{x}'\|^2),\tag{11}$$

$$K(\mathbf{x}, \mathbf{x}') = \tanh(\gamma\, \mathbf{x} \cdot \mathbf{x}' + c),\tag{12}$$
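The three kernel functions named above can be written down directly. The following is a minimal pure-Python sketch; the function names, the default parameter values, and the sample vectors are assumptions made for illustration only and are not the settings used in this study.

```python
import math


def dot(x, z):
    """Inner product of two equally long sequences of numbers."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, gamma=1.0, c=1.0, d=2):
    """Polynomial kernel: (gamma * <x, z> + c) ** d."""
    return (gamma * dot(x, z) + c) ** d


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=1.0, c=0.0):
    """Sigmoidal kernel: tanh(gamma * <x, z> + c)."""
    return math.tanh(gamma * dot(x, z) + c)


# Two hypothetical predictor vectors.
x = [1.0, 2.0]
z = [3.0, 0.5]

k_poly = polynomial_kernel(x, z)  # <x, z> = 4, so (4 + 1) ** 2
k_rbf = rbf_kernel(x, z)          # decreases as x and z move apart
k_sig = sigmoid_kernel(x, z)      # bounded between -1 and 1
```

Each function returns a scalar similarity between two predictor vectors, so the transformation Φ never has to be computed explicitly.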

Taking into account the above, the approximation function can be given finally as:

$$f(\mathbf{x}) = \sum_{nsv} (\alpha - \alpha^*) K(\mathbf{x}, \mathbf{x}') + w_0. \tag{13}$$
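The form of (13) can be illustrated with a short Python sketch that evaluates the approximation function for an already trained model. It assumes a radial basis kernel; the support vectors, the coefficients (each standing for a difference α − α\*), the bias `w0`, and the value of `gamma` are hypothetical numbers, not results of this study.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, coeffs, w0, gamma):
    """Evaluate f(x) as a kernel expansion over the support vectors.

    coeffs[i] holds (alpha - alpha*) for the i-th support vector.
    """
    return sum(
        c * rbf_kernel(x, sv, gamma) for c, sv in zip(coeffs, support_vectors)
    ) + w0


# Hypothetical trained model with two support vectors.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
coeffs = [2.0, -1.0]
w0 = 5.0

# Prediction at the first support vector and at a point far from both.
y_near = svr_predict([0.0, 0.0], support_vectors, coeffs, w0, gamma=0.5)
y_far = svr_predict([100.0, 100.0], support_vectors, coeffs, w0, gamma=0.5)
```

Far from all support vectors the kernel values vanish and the prediction reduces to the bias term, which shows that the fitted function is shaped only by the support vectors.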
