**3. Methodology**

Systematic literature reviews can support the development of a new knowledge base for practitioners and managers by providing collective insights [36]. According to Borrego [37], these rigorous reviews have become a significant source of evidence in medical research and are gaining importance in areas such as psychology and education. Denyer and Tranfield [38], in turn, highlighted the potential of systematic literature reviews as an evidence-based approach for management research. According to Pan [39], two guidelines have become well known for systematic reviews: the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and the Kitchenham guide [18,40]. Although PRISMA was designed primarily for studies that evaluate the effects of health interventions, Page [40] argues that its checklist items are applicable to other areas and that it has been adopted as a global standard for conducting systematic literature reviews. However, Denyer and Tranfield [38] argued that fit-for-purpose methodologies should be developed according to the unique characteristics of each study's design. The present review focused on applications of predictive analytics techniques, which have evolved in the field of informatics and require intensive use of computation. Since the guidelines by Kitchenham and Charters [18] for systematic literature reviews were adapted from medicine and psychology and, according to Ayodele [41], have been implemented in computer science, this study followed those guidelines, considering them appropriate to address the research objective. A step-by-step description of the methodology is illustrated in Figure 2. Overall, the review process consisted of three main stages: planning, conducting, and reporting the review.

#### **Figure 2.** Methodology.

The planning stage was the most crucial part of the review because it provided a guide for the activities necessary to address the research objective. Accordingly, the first step in this stage was to identify the need for the review. For this purpose, a scoping review was conducted in the area of cost estimation, focusing on its challenges and future trends. A further review of cost modelling techniques established the need to aggregate the individual results of the studies and transform them into recommendations for their uptake. In the second step, the resulting objective of investigating how predictive analytics can enhance cost estimation was divided into three questions:

Q1. How does predictive analytics determine the input parameters of models, and what are the parameters commonly used?

Q2. What is the predictive power of the predictive analytics techniques to forecast the construction cost in the early stages of building projects, and what are the most explored techniques?

Q3. What are the benefits and challenges in the use of predictive analytics techniques in cost estimation?

Following the suggestions of Kitchenham and Charters [18], the third step was to create a protocol specifying the fundamental procedures for conducting the review. This formal document is essential in systematic literature reviews because it serves as a plan that helps maintain objectivity in the research [36].

The second stage, conducting the review, started with the identification of research. The database search engine selected was Scopus, and the target material for the review was published applications of predictive analytics for estimating the costs of building construction projects in the early stages. The search syntax was `TITLE ((cost OR costs) AND (estimation OR prediction OR modeling OR modelling OR model OR estimate) AND (buildings OR construction OR projects))`, and it returned 1586 documents.
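For reproducibility, the boolean query can be assembled programmatically from its synonym groups. A minimal Python sketch follows; only the search terms come from the review, while the variable and helper names are illustrative assumptions:

```python
# Illustrative sketch: assembling the Scopus TITLE query from term groups.
# The terms mirror the syntax reported above; everything else is assumed.
cost_terms = ["cost", "costs"]
model_terms = ["estimation", "prediction", "modeling", "modelling", "model", "estimate"]
scope_terms = ["buildings", "construction", "projects"]

def or_group(terms):
    """Join a list of synonyms into a parenthesised OR clause."""
    return "(" + " OR ".join(terms) + ")"

query = ("TITLE ("
         + " AND ".join(or_group(g) for g in (cost_terms, model_terms, scope_terms))
         + ")")
print(query)
```

Building the string from term lists makes it easy to rerun or extend the search with additional synonyms while keeping the grouping consistent.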

Aiming to find resources to answer the research questions, the selection of primary studies was based on the inclusion criteria; any study not fulfilling all the indicators was excluded from the review. The following list contains the criteria used to include and exclude literature:


The selection of primary studies was conducted in two phases: first, by analysing the titles and abstracts, and then by fully reviewing the studies. In the first filter, candidates were excluded when their characteristics were clearly against the selection criteria. In the second filter, a study was selected only when it fulfilled all the selection criteria. The preselection narrowed the list of papers from 1586 down to 127, and the full review then identified 30 papers. A backward and forward snowballing process was performed on the 30 articles, applying the same selection approach and following the suggestions of Wohlin [42]. This process identified 16 additional studies, bringing the total to 46 papers.

Quality assessment of studies that use a variety of empirical methods remains a major problem [43]. To control the quality of the studies in the review, the presence of their publication venues in the Scimago H index and the Google h5 index, together with the number of citations on Google Scholar, formed part of a quality-monitoring process.

In the data extraction and monitoring step, the necessary information from the articles was exported from the Scopus search list in XML format and stored in an Excel sheet. This information consisted of title, authors, year of publication, venue, and number of citations up to May 2022. In addition to the bibliographical data, the following content data items were sought to answer the research questions.
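The XML-to-spreadsheet step can be sketched as below. The element names (`record`, `title`, and so on) and the sample record are hypothetical stand-ins, not the actual Scopus export schema:

```python
import csv
import xml.etree.ElementTree as ET

# Illustrative sketch: flattening exported bibliographic records into rows.
# The XML layout here is an assumed stand-in for the real Scopus export.
xml_data = """<records>
  <record>
    <title>Early-stage cost prediction for building projects</title>
    <authors>Doe, J.; Roe, A.</authors>
    <year>2020</year>
    <venue>Hypothetical Journal</venue>
    <citations>14</citations>
  </record>
</records>"""

# One dict per record, keyed by the child element tags.
rows = [{child.tag: child.text for child in rec}
        for rec in ET.fromstring(xml_data).iter("record")]

# Write an Excel-compatible CSV with the bibliographic columns.
with open("extraction.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "authors", "year", "venue", "citations"])
    writer.writeheader()
    writer.writerows(rows)
```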


Systematic literature reviews typically use meta-analysis to combine and assess quantitative experimental results [44], but the present study used a descriptive statistical and content analysis approach. The bibliographic information was first analysed to obtain an overview of the publications and to understand the context of the research area. The compilation was synthesised into the following items: date of publication, number of publications over time, and country of origin of the study.

The synthesis of the data to answer research question one provided the number of techniques used in the process of selecting the initial parameters of the models and the parameters most frequently used. To determine these parameters, the ranked lists of parameters provided in the studies were aggregated with the Borda–Kendall technique. This method was selected because it has been widely used for rank aggregation and the derived techniques are intuitive and easy to understand [45–47].
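As an illustration of this rank aggregation, a minimal Borda-count sketch is given below. The parameter names are invented for the example and are not the review's actual ranked lists; each study's ranking awards n−1 points to its top item, n−2 to the next, and so on, and totals determine the aggregate order:

```python
from collections import defaultdict

def borda_aggregate(ranked_lists):
    """Aggregate several ranked lists by summing Borda points per item.

    Items absent from a list simply score zero there. Ties in the total
    are broken alphabetically for a deterministic result.
    """
    scores = defaultdict(int)
    for ranking in ranked_lists:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - 1 - pos  # top item gets n-1 points
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical per-study rankings of cost-model input parameters.
studies = [
    ["floor area", "storeys", "duration"],
    ["floor area", "duration", "storeys"],
    ["storeys", "floor area"],
]
print(borda_aggregate(studies))  # -> ['floor area', 'storeys', 'duration']
```

Here "floor area" totals 4 points, "storeys" 2, and "duration" 1, so the aggregate ranking places floor area first even though one study ranked storeys higher.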

The techniques implemented in the studies and the accuracy of the models were collected to answer the second research question. The most explored techniques were reported as percentages. The accuracy of the models was summarised in averages and distributed in quartiles, while the second component of predictive power, the validation methods, was grouped by type.
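The average-and-quartiles summary can be reproduced with standard descriptive statistics. The accuracy values below are illustrative only, not figures from the reviewed studies:

```python
import statistics

# Illustrative model accuracies (e.g. R^2 values); not the review's data.
accuracies = [0.72, 0.81, 0.85, 0.88, 0.90, 0.91, 0.93, 0.95]

mean_accuracy = statistics.mean(accuracies)
# quantiles(n=4) returns the three quartile cut points Q1, Q2 (median), Q3.
q1, median, q3 = statistics.quantiles(accuracies, n=4)
print(round(mean_accuracy, 3), q1, median, q3)
```

Binning each model's reported accuracy by these cut points gives the quartile distribution, while the mean summarises overall performance.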

To answer research question three, the benefits and drawbacks of using predictive analytics techniques in cost modelling were compiled through reciprocal translation, which allowed different terms describing the same meaning to be integrated [18]. The ideas were extracted only from the discussion and conclusion sections to ensure that they were derived from the experimentation. They were then tabulated and ranked according to the number of authors mentioning them. The last stage of a systematic literature review is reporting. For this purpose, the report followed the structure of the protocol, since it contains the fundamental elements of the review.
