**1. Introduction**

Cost management, and knowing whether a final account is on budget, is critical for measuring a project's success [1]. For example, the Project Management Institute [2] highlights the importance of monitoring and controlling costs using estimates as baselines to achieve budgeting goals. Cost estimation is the process of producing cost estimates by quantifying and valuing the resources necessary to develop a project [3]. The process is iterative: estimates are updated as more information becomes available during the inception and design stages, which is fundamental to decision-making. Cost estimation makes it possible to determine a project's economic feasibility and to evaluate alternatives; moreover, it can drive the project scope, given the greater influence project owners have in the initial stages [2].

The most commonly used method to estimate costs in the early stages of building projects is the superficial area method [4]. This method, also called the floor area method, consists of multiplying the total gross internal floor area (GIFA) by an appropriate cost/m² rate derived from historical data [5]. This traditional method provides low accuracy, ranging from −15% to +25% [6,7]. Increasing the accuracy and reliability of cost estimates is of utmost importance for decision-makers to optimally assess alternatives and improve investment decisions early in projects.
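To make the arithmetic of the floor area method concrete, the following Python sketch multiplies GIFA by a cost/m² rate and applies the −15%/+25% accuracy range to the resulting point estimate. The function name `superficial_area_estimate`, the rate of 3,000 currency units per m², and the reading of the accuracy range as a band around the point estimate are illustrative assumptions, not part of any cited method.

```python
def superficial_area_estimate(gifa_m2: float, rate_per_m2: float,
                              lower_pct: float = -0.15,
                              upper_pct: float = 0.25) -> tuple[float, float, float]:
    """Return (low, point, high) for a floor area method estimate.

    The point estimate is GIFA multiplied by a cost/m2 rate. The bounds apply
    the -15% / +25% accuracy range to the point estimate (an interpretive
    assumption for illustration, not a prescribed formula).
    """
    if gifa_m2 <= 0 or rate_per_m2 <= 0:
        raise ValueError("GIFA and rate must be positive")
    point = gifa_m2 * rate_per_m2
    return point * (1 + lower_pct), point, point * (1 + upper_pct)


# Hypothetical inputs: 2,000 m2 of GIFA at 3,000 currency units per m2.
low, point, high = superficial_area_estimate(2000, 3000)
print(f"{low:,.0f} <= {point:,.0f} <= {high:,.0f}")  # → 5,100,000 <= 6,000,000 <= 7,500,000
```

The width of the band (2.4 million on a 6 million estimate here) shows why the low accuracy of this method matters for comparing alternatives.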

**Citation:** Castro Miranda, S.L.; Del Rey Castillo, E.; Gonzalez, V.; Adafin, J. Predictive Analytics for Early-Stage Construction Costs Estimation. *Buildings* **2022**, *12*, 1043. https://doi.org/10.3390/buildings12071043

Academic Editor: Osama Abudayyeh

Received: 8 May 2022; Accepted: 15 July 2022; Published: 19 July 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The term predictive analytics has been used since 2006 to describe finding and exploiting relationships in data [8]. Some methods, such as regression analysis, have been used in statistics for more than 200 years, starting with the least squares method of Legendre and Gauss, which was used to determine orbits about the sun from astronomical observations [9]. Other, more recent techniques, including Artificial Neural Networks (ANN), Decision Trees (DT), and Case-Based Reasoning (CBR), have evolved with increasing computing power and the growing volume of stored data [10]. Predictive analytics has been classified as a subset of data science [11] whose aim is to make empirical predictions [12]. It was first applied in credit scoring in the 1950s and has since increased its presence and benefits in fraud detection, healthcare, marketing, insurance, and retail [13,14].
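Because the least squares method underpins regression analysis, a minimal Python sketch may help. It fits a straight line, cost = a + b × GIFA, to a handful of hypothetical historical projects using the closed-form ordinary least squares solution. The function `least_squares_fit` and the data are invented for illustration, and a single predictor is used only to keep the example small.

```python
def least_squares_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit y = a + b*x by ordinary least squares (closed-form solution)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical historical projects: GIFA (m2) and final cost (thousands).
gifa = [1000.0, 1500.0, 2000.0, 2500.0]
cost = [3100.0, 4600.0, 6100.0, 7600.0]
a, b = least_squares_fit(gifa, cost)
print(a, b)  # → 100.0 3.0
```

The fitted slope `b` plays the role of a marginal cost/m² rate, while the intercept `a` absorbs costs that do not scale with floor area.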

In the process of creating predictive models, the initial stages involve collecting and preparing observational data related to the phenomenon to be forecast. The amount of data is critical to achieving higher accuracy [12,15]. Given the data-intensive nature of predictive analytics, two characteristics of construction information can make predictive analytics suitable for cost estimation. First, construction projects consume a large amount of information in the form of drawings, schedules, contract documents, and specifications [10]. Second, project data, including cost, are becoming highly structured with the advent of 5D building information modelling (BIM), which provides quantities in real time from the information linked to virtual models [16]. The potential of predictive analytics in the construction industry has been widely supported by research conducted since 2000 [15,17].
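The way 5D BIM links quantities to costs can be sketched as a simple data structure: each model element carries a measured quantity, and a unit-rate table turns those quantities into costs that can be recomputed whenever the model changes. The `ModelElement` class, the `cost_by_category` function, and the rates shown are hypothetical and do not correspond to any BIM software API or to a data standard such as IFC.

```python
from dataclasses import dataclass


@dataclass
class ModelElement:
    """Hypothetical, simplified stand-in for an element taken from a BIM model."""
    element_id: str
    category: str    # e.g. "wall", "slab"
    quantity: float  # measured quantity derived from the model geometry
    unit: str        # unit of measurement for the quantity


def cost_by_category(elements: list[ModelElement],
                     unit_rates: dict[tuple[str, str], float]) -> dict[str, float]:
    """Multiply each element's quantity by its unit rate and total per category."""
    totals: dict[str, float] = {}
    for el in elements:
        key = (el.category, el.unit)
        if key not in unit_rates:
            raise KeyError(f"no unit rate for {key}")
        totals[el.category] = totals.get(el.category, 0.0) + el.quantity * unit_rates[key]
    return totals


# Hypothetical elements and unit rates (currency units per m2).
elements = [
    ModelElement("W1", "wall", 120.0, "m2"),
    ModelElement("W2", "wall", 80.0, "m2"),
    ModelElement("S1", "slab", 300.0, "m2"),
]
rates = {("wall", "m2"): 150.0, ("slab", "m2"): 90.0}
print(cost_by_category(elements, rates))  # → {'wall': 30000.0, 'slab': 27000.0}
```

Because costs are derived from quantities rather than typed in, editing an element's quantity and re-running the function updates the totals, which is also what makes such records usable as structured training data.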

A review of 27 studies on the use of artificial intelligence for construction cost estimation revealed three main drawbacks in the research area: (1) the need to consider more modelling parameters; (2) the need for standard validation methods to estimate model accuracy; and (3) the ambiguity and opacity of experimental results [17]. A later review analysed more than 100 publications on artificial intelligence and parametric estimation for construction cost and identified the modelling process for each technique [15]. Elfaki [17] and Elmousalami [15] focused on providing guidelines to improve experimentation and the modelling process from a research perspective. However, explicit benefits and implications for practice, such as accuracy levels, have not been addressed. Predictive analytics has tremendous potential to benefit construction projects, but the industry has not widely adopted this new technology [10].
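As an illustration of what a standard validation method might look like, the Python sketch below scores a simple floor-area-rate model with k-fold cross-validation and the mean absolute percentage error (MAPE). Both the metric and the baseline model are assumptions chosen for brevity; the reviews cited above do not prescribe them, and the function names and data are hypothetical.

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent (assumes no zero actual costs)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def k_fold_mape(gifa: list[float], cost: list[float], k: int) -> list[float]:
    """Score a floor-area-rate model with k-fold cross-validation.

    Each fold is held out once; the cost/m2 rate is the mean rate of the
    remaining projects, and MAPE is computed on the held-out fold.
    Folds are contiguous slices (no shuffling) so results are reproducible.
    """
    n = len(gifa)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of projects")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    scores, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i not in test]
        rate = sum(cost[i] / gifa[i] for i in train) / len(train)
        preds = [gifa[i] * rate for i in test]
        scores.append(mape([cost[i] for i in test], preds))
        start += size
    return scores


# Hypothetical projects: GIFA (m2) and final cost (thousands).
gifa = [1000.0, 2000.0, 1000.0, 2000.0]
cost = [3000.0, 6000.0, 3600.0, 7200.0]
scores = k_fold_mape(gifa, cost, k=2)
print([round(s, 2) for s in scores])
```

Reporting the per-fold errors, rather than a single fit on all data, is what makes accuracy figures comparable across studies; in practice the projects would usually be shuffled before splitting.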

In this paper, a systematic literature review based on the approach suggested by Kitchenham and Charters [18] was conducted to explore the applications of predictive analytics techniques to the early-stage cost estimation of building projects. The review aimed to investigate how predictive analytics can enhance practice by: (1) exploring how model inputs are determined; (2) identifying the techniques used and the accuracy of the models; and (3) examining the direct benefits and challenges identified by the authors. The remainder of the paper is structured as follows. Section 2 provides background on cost estimation and predictive analytics, Section 3 reports the methodology, Section 4 presents the results and discussion, and Section 5 provides the conclusion.
