1. Introduction
The focus of this paper is the development of an estimation function for the construction time of utility-scale power plants and the associated statistical analysis of the results from the estimation function. The construction cost of power plants has often been studied in the literature; however, a statistical analysis of the construction time for power plants has not been performed previously. The development of this function is novel and aims to increase technical accuracy and to inform policy makers. The technical motivation is to provide a function for use in the preliminary project planning phase to accurately estimate the required construction time, logistics, and budget. The policy motivation is to inform policy makers about the time required to implement renewable energy sources and integrate the sources into the grid, because energy security is a concern of many governments. This paper determines the construction time for power plants using stepwise multiple linear regression techniques with interaction terms. The paper is limited in geographical scope to Canadian power plants because of the limited number of provinces. Data were collected for the provinces of Alberta, British Columbia, Manitoba, New Brunswick, Nova Scotia, Ontario, Prince Edward Island, Quebec, and Saskatchewan.
Prior to a discussion of the investigation, the statistical tools and approach are explained and justified. The investigation begins with existing construction time data for hydroelectric, nuclear, and wind power plants across Canada. These data are explored for sensitivities to year of construction, technology, location, and size. A regression is performed, leading to estimation functions that are subsequently compared to the construction time of the power plants. These estimation functions address the following research objectives:
Determination of construction time dependency on jurisdiction of construction;
Assessment of the impact of power plant technology on construction time;
Influence of power plant installed capacity on construction time;
Impact of historical factors on construction time.
2. Literature Review
The majority of previous research on construction time has focused on transportation and civic infrastructure rather than power systems. The main exception is nuclear power plants, which have been subject to a number of economic and construction analyses. The value of these analyses is to provide improved estimates of the necessary budgets and timelines to complete large projects so that subsequent projects may be completed more efficiently.
Estimates for construction time and costs are part of the larger field of project and operations management. Models of project management have developed over many centuries, with early models using trial and error approaches without conception of the final products [1]. Standardized management tools and practices did not appear until the 1950s. These tools and practices were developed in response to military and aerospace programs of the Cold War, with one example being the critical path model. Standard project monitoring tools were not enforced until the 1980s by American federal agencies along with the development of concurrent engineering practices [1].
The development of project management tools has aided large construction projects; however, the effectiveness of the tools is subject to the users. Several researchers have identified that social biases exist in project planning such as future-perfect strategizing [2], strategic misrepresentation [3,4,5,6], and escalation of commitment [7]. The biases have been studied in the context of power plant construction. Strategic misrepresentation was investigated for offshore wind farms for the United Kingdom coupled with reference class forecasting [6]. The authors considered 10 UK wind power plants and applied Flyvbjerg’s reference class forecast (RCF) approach. Flyvbjerg’s approach identified cost overruns as a result of optimism bias and strategic misrepresentation. Nine of the 10 wind power plants considered exhibited a cost overrun, and all had time overruns exceeding 10% of the original duration. Strategic misrepresentation was found to extend to the benefits of the wind farms as capacity factors were overestimated. Koch and their colleagues applied the RCF to the London Array and suggested extending the construction time by 30%.
A similar study was completed by Locatelli for the Flamanville, France and Olkiluoto, Finland nuclear power plants [7]. Both power plants are facing cost overruns and construction delays, with optimism bias and unfamiliarity with the design identified as contributing factors. The study included detailed analyses of the two projects using ATLAS coding of qualitative data to identify collaborations, network management, procurement, etc. An earlier study revealed that scaling up nuclear power led to significant increases in cost and time [8]. Subsequent work by Locatelli predicted the completion of Flamanville for 2019–2020 [9] and attributed the delays to a lack of standardization. Locatelli contrasts the French construction experience with the South Korean construction of nuclear plants, which occurs on time and on budget because of a standardized construction and supply chain.
Another study applied the Construction Industry Institute Engineering Procurement Construction (EPC) project life cycle to nuclear power plants to determine the origins of delays [10]. The EPC life cycle identifies six phases; however, each phase may be sensitive to the type of project such as nuclear. This result confirms an earlier study completed for the Ford Amendment on nuclear that concluded delays in the 1970s and 1980s resulted from a lack of experienced personnel, poor project management, and regulatory approval of permits in a dynamic regulatory and political environment.
The majority of research on construction times has focused on waterworks and sewage [11], buildings [12], transportation infrastructure [13], and petrochemical projects [14]. Six categories of delay were identified: contractor performance, owner administration, planning and design, government regulations, environmental assessments, and supervision. Of these, contractor performance is considered the major delay cause. Upward of 70% of projects experience delays with an average delay of 39% of the intended construction duration [11]. Contractor performance was echoed as a cause of delay by Alhajri et al. with specific examples cited of poor site management and conflict with the contractor from interviews conducted with project managers, engineers, and supervisors [14].
Al-Momani considered construction delays and causes for 130 public projects in Jordan such as residential, medical, and schools. The sources of delays varied based on the group surveyed. Engineers identified cash flow from the project owners as the main delay, while owners identified design errors by the engineers and labor availability of the contractors as the main sources of delay. The survey groups also identified that the early stages of a project were the origins of most delays, which coincides with the findings by Wright. The analysis used Excel to develop a simple linear regression for the construction delays [12].
A more recent study of power transmission projects attempted to determine the origins of the delays that afflict that sector. A survey of 311 stakeholders identified 63 delay factors that can be organized into ten groups: sector-specific, general, administrative, employer, contractor, consultant, materials, equipment, labor, and unavoidable. The sector-specific and general factors are novel from this study as they identified factors that are particular to power transmission projects such as access roads, right-of-way, poor site management, and poor coordination between different types of work [15]. The study did not include any regression techniques and was solely empirical; however, the identification of factors impacting delays is useful for power plant related analyses.
Love and their colleagues examined cost overrun probability distributions for 276 Australian construction projects with the objective of determining whether the distribution is normal [13]. Normality was assessed because applying an incorrect distribution would affect the predicted results. This assessment is important since Flyvbjerg assumes Gaussian distributions in their technique. The authors used Kolmogorov–Smirnov, Anderson–Darling, and Chi-squared tests to compare distributions and found that the distributions are non-Gaussian and are best described by a three-parameter Fréchet function for cost overruns. This result suggests that Flyvbjerg’s normality assumption for cost overruns is incorrect.
Further, there is debate whether the project type and geographical location affect the cost overrun prediction. Flyvbjerg predicts independence of location but dependence of project type [16]. Dependence upon project type was also identified by Bhargava and their colleagues for highway projects, and project type influenced both cost and construction time [17]. Meanwhile, Odeck predicts independence of project type [18]. Last, Flyvbjerg’s approach is that projects of the same class are assumed to have the same optimism bias [19]; however, earlier work by Love and Odeck shows that the bias is not uniform for a given class [18,20,21]. The present work assesses the dependence of location and project type, and applies non-Gaussian distributions, thus contributing to the literature. The novelty is the consideration of construction time instead of cost.
The technical approach considered here is the use of multiple linear regression (MLR) techniques, as described in Section 4. Regression analyses have an established presence in predicting cost and time overruns of infrastructure projects. Samarghandi et al. used regression techniques to quantify construction delay factors in Iran for residential projects and educational facilities [22]. Educational facilities have been a particular interest for assessing cost and time overruns, with a more advanced regression analysis executed by Asiedu et al. [23]. Samarghandi et al. used linear regression techniques where a single dependent variable was correlated with a single independent variable. The Asiedu et al. team used multiple linear regression (MLR), where they identified 10 predictive variables that could influence cost overruns. The MLR analysis determined that five of the variables influenced the overruns [23]. A study of 911 building projects in Ghana also applied MLR analysis in the statistical software R and considered seven predictive variables [24]. A total of three variables were determined to influence the completion cost of the projects.
Other construction projects, such as buildings and drainage, have been analyzed using regression techniques. Senouci et al. used linear regression models to predict cost overruns in Qatari construction projects. These models were developed in Excel and determined the correlation between contract price and cost overrun [25]. Croatian water infrastructure projects have also been modeled using MLR techniques. Ninety-three projects were considered with data collected via survey of project managers. These surveys identified 108 variables that were evaluated using MLR, and a set of 5 variables were determined to have the majority of the impact on the success of a water project [26].
Electrical infrastructure projects have been studied less often; however, an extensive analysis of cost overruns was done in 2017 of 401 projects across 57 countries. These projects were constructed between 1936 and 2014, and the team applied a linear regression analysis to the construction costs. They determined that large projects such as hydroelectric and nuclear facilities have a high correlation with cost overruns while decentralized facilities such as wind and solar have a negative correlation [27]. This study did not include a time trend, as the data were clustered in certain decades. For example, most of the nuclear power plants used in this study were constructed during the 1970s. A subsequent study examined hydroelectric power plants and their time overruns by considering 57 hydroelectric dams installed between 1975 and 2015 [28]. This study only included hydroelectric facilities financed by the World Bank Group and assessed the uncertainties in the costs and benefits of the technology. The data analyzed indicated that 80% of the 57 facilities considered experienced a time overrun, which highlights the need to accurately predict the construction time of hydroelectric facilities [28].
4. Model
The theoretical framework is composed of early diagnostics, regression analysis, and residual analysis, and was completed using R, a free programming language and software environment for statistical analysis and graphics. Two early diagnostics were performed prior to the regression analysis. First, the dependent variable (Time) was plotted against each independent variable to observe any nonlinearities. Second, a correlation matrix was computed to check for any indication of multicollinearity among the independent variables. In regression analysis, a function must be established that relates the dependent variable to the independent variables. The function may use the dependent and independent variables directly, known as a level–level form, or use the logarithm of one or both variables. Additionally, the function may include interaction terms, which describe relationships between the independent variables that may influence the dependent variable. Further, the function may be linear or polynomial. For the regression analysis here, models of several functional forms were considered, with the objective of determining the form that best captured the relationship between the construction time and the independent variables. Each of the following functional forms was applied to the data, and a stepwise regression was applied within each form. The stepwise regression added parameters to the model such as technology type, decade of construction, or midpoint of construction. At each step and form, the residuals of the model were calculated and recorded. The form that had the lowest residuals was selected as the best model. Below are the forms considered:
Level–level without interaction terms;
Log–level without interaction terms;
Level–log without interaction terms;
Log–log without interaction terms;
Level–level with interaction terms;
Log–level with interaction terms;
Level–log with interaction terms;
Log–log with interaction terms;
Polynomial models with linear dependent variable;
Polynomial models with logarithmic dependent variable.
The level–level form without interaction terms is the simplest model and uses the dependent and independent variables directly, for example, construction time as a function of power plant installed capacity. The log–level form without interaction terms is similar to the level–level form except that the logarithm of the dependent variable is used. In this case, that would be the logarithm of construction time as a function of installed capacity. A logarithm may be necessary, as some technologies have long construction periods. A related form is the level–log form without interaction terms. This model would be the construction time as a function of the logarithm of installed capacity. The power plants considered here range in capacity from a few MW to the GW scale; therefore, the logarithm of installed capacity may provide a better fit. Lastly, the log–log form without interaction terms uses the logarithm of both the dependent and independent variables.
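As an illustrative summary of the four basic forms described above, the relationships can be written for a single independent variable. The variable names Time and MW follow the text; the coefficient symbols \beta_0 and \beta_1, the error term u, and the use of the natural logarithm are assumptions made here for illustration and are not taken from the paper's own equations.

```latex
\begin{align*}
\text{level--level:}\quad & \text{Time} = \beta_0 + \beta_1\,\text{MW} + u \\
\text{log--level:}\quad   & \ln(\text{Time}) = \beta_0 + \beta_1\,\text{MW} + u \\
\text{level--log:}\quad   & \text{Time} = \beta_0 + \beta_1\,\ln(\text{MW}) + u \\
\text{log--log:}\quad     & \ln(\text{Time}) = \beta_0 + \beta_1\,\ln(\text{MW}) + u
\end{align*}
```

Under these assumptions, \beta_1 in the log–log form can be read as an elasticity: the approximate percentage change in construction time for a one percent change in installed capacity.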
Interaction terms describe any relationships between the independent variables that may have an effect on the construction time. Models with interaction terms are more sophisticated than the basic models and contribute to the stepwise regression approach used here. The simplest models were attempted first, and interaction terms were then added to determine whether a more accurate model for predicting construction time was possible. An example of an interaction in the context of power plant construction is that the jurisdiction where a power plant is constructed can interact with the power plant size. A jurisdiction with large hydroelectric resources may become more efficient at construction of large installed capacities; therefore, the construction time would be shorter compared to other jurisdictions. The model would require a term to account for the interaction of jurisdiction with power plant size.
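The jurisdiction and size example can be sketched as a level–level model with one interaction term. Here D_j is a hypothetical indicator variable equal to 1 for plants built in jurisdiction j and 0 otherwise; the symbols \beta_0 to \beta_3 and u are likewise assumptions for illustration and do not reproduce the paper's fitted equations.

```latex
\begin{align*}
\text{Time} &= \beta_0 + \beta_1\,\text{MW} + \beta_2\,D_j + \beta_3\,(\text{MW}\times D_j) + u \\
\frac{\partial\,\text{Time}}{\partial\,\text{MW}} &= \beta_1 + \beta_3\,D_j
\end{align*}
```

In this sketch, a negative \beta_3 would correspond to the situation described above, in which additional capacity adds less construction time in jurisdiction j than elsewhere.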
The level–level, log–level, level–log, and log–log models all attempt to construct a linear relationship between the independent variables and the dependent variables. A line of best fit is the result of these models. With some data, a linear relationship is not feasible and a nonlinear relationship is required. The polynomials describe nonlinear relationships such as quadratic, cubic, or quartic relationships between the independent and dependent variables. These models are more sophisticated than the linear models and may also use logarithms of the variables.
Each model is estimated by the ordinary least squares (OLS) method [29]. Initial model development and selection of the functional form was performed using a stepwise regression procedure. For each functional form, stepwise regression selects a group of independent variables by employing forward selection, backward elimination, or bidirectional elimination methods. Stepwise regression adds or removes independent variables in a regression model by optimizing an objective function or criterion such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), or the R² value. The set of models suggested by the stepwise regression was then reduced by applying several selection criteria: standard error of regression, overall significance of a model, individual significance of independent variables, and adjusted R² value. The number of models was further reduced by a multicollinearity test and a residual analysis. The ideal regression model will determine the function that relates the dependent variable to a single independent variable. Actual data may involve the dependent variable being related to several independent variables, or the independent variables may be related to each other, which is multicollinearity. The existence of these relationships must be checked and the extent of the relationships evaluated.
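A minimal sketch of forward selection by AIC is given below, written in Python rather than R and not taken from the authors' analysis. The data are synthetic, and the column names (`log_mw`, `nuclear`, `irrelevant`), the coefficients used to generate them, and the helper functions `ols`, `aic`, and `forward_stepwise` are all hypothetical. The AIC is computed for Gaussian errors up to an additive constant.

```python
import math
import random

def ols(X, y):
    """Fit OLS by solving the normal equations (X'X) b = X'y with
    Gauss-Jordan elimination and partial pivoting. Returns (beta, rss)."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    rss = sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2
              for r in range(n))
    return beta, rss

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts coefficients."""
    return n * math.log(rss / n) + 2 * k

def forward_stepwise(columns, y):
    """Add one variable at a time while doing so lowers the AIC."""
    n = len(y)
    selected, remaining = [], list(columns)

    def design(names):
        # Intercept column followed by the chosen variables
        return [[1.0] + [columns[name][r] for name in names] for r in range(n)]

    best = aic(ols(design(selected), y)[1], n, 1)
    while remaining:
        scores = {name: aic(ols(design(selected + [name]), y)[1],
                            n, len(selected) + 2)
                  for name in remaining}
        cand = min(scores, key=scores.get)
        if scores[cand] >= best:
            break  # no candidate improves the criterion
        best = scores[cand]
        selected.append(cand)
        remaining.remove(cand)
    return selected, best

# Synthetic, hypothetical data: log construction time depends on log capacity
# and a nuclear indicator; the third column is pure noise.
rng = random.Random(42)
n = 60
log_mw = [math.log(rng.uniform(5, 2000)) for _ in range(n)]
nuclear = [1.0 if i % 4 == 0 else 0.0 for i in range(n)]
irrelevant = [rng.gauss(0, 1) for _ in range(n)]
log_time = [1.0 + 0.5 * m + 0.8 * d + rng.gauss(0, 0.2)
            for m, d in zip(log_mw, nuclear)]

columns = {"log_mw": log_mw, "nuclear": nuclear, "irrelevant": irrelevant}
selected, best_aic = forward_stepwise(columns, log_time)
print("selected variables:", selected)
```

Backward and bidirectional elimination follow the same pattern, starting from the full model or allowing removals at each step. A pure noise column can still be selected by chance, which is one reason the stepwise result is screened further by significance tests and residual analysis.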
Residual analysis was implemented to see if the surviving models satisfied the six multiple linear regression (MLR) Gauss–Markov assumptions. First, residuals were plotted against fitted values to detect any serious violation of the linearity in parameters, zero conditional mean, and homoskedasticity assumptions. Similarly, a histogram and a normal probability plot of residuals were created to check the normality assumption. Finally, outliers and influential observations were determined using outlier tests, standardized residuals, and Cook’s Distance. After the residual analysis, model selection was completed by assessing the remaining models in terms of their predictive power.
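The influence diagnostics mentioned above can be illustrated for the simplest case of one independent variable, where leverage has a closed form. The sketch below is in Python, uses made-up numbers rather than the paper's data, and the function name `influence_diagnostics` is hypothetical. It computes leverage, internally standardized residuals, and Cook's Distance for each observation.

```python
import math

def influence_diagnostics(x, y):
    """Return (leverage, standardized residual, Cook's distance) for each
    observation of a simple linear regression y = b0 + b1 * x."""
    n = len(x)
    p = 2  # number of estimated coefficients: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    out = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx   # leverage of this observation
        r = e / math.sqrt(mse * (1.0 - h))     # standardized residual
        d = r * r * h / (p * (1.0 - h))        # Cook's distance
        out.append((h, r, d))
    return out

# Hypothetical data: the last observation departs from the linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.9, 12.0]

diagnostics = influence_diagnostics(x, y)
most_influential = max(range(len(x)), key=lambda i: diagnostics[i][2])
print("most influential observation index:", most_influential)
```

An observation combining high leverage with a large residual receives the largest Cook's Distance and is a candidate for closer inspection before the final model is chosen.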
7. Conclusions
The construction times of utility-scale power plants in Canada were examined using multiple linear regression analysis techniques to assist in identifying the factors that may contribute to time overruns. Construction time overruns for power plants is an area where there has been little study, while other infrastructure projects such as educational facilities, water systems, and highways have been thoroughly studied. This lack of study has implications for construction planning, policy development to adopt more renewable power, and larger grid level planning. If the construction time is poorly estimated, the logistics for delivering supplies and workers to the site will be incorrect, policy windows that are favorable to certain technologies may close, and the retirement of older, inefficient power plants may be delayed.
The MLR analysis of Canadian power plants revealed that the construction time is strongly dependent on the installed capacity of the power plant, shown by all the models (Equations (2)–(6)) requiring the inclusion of the power plant size through the MW variable. The construction time is also a function of the type of technology, as the analyses demonstrated that hydro and nuclear power plants required separate models (Equations (4)–(6)) to accurately estimate their construction time compared to a model developed from hydro, nuclear, and wind (Equation (2)). Lastly, the construction time is a function of location, as both the hydro and nuclear models required location indicators.
The inclusion of technology indicators on construction time means that an accurate estimate of the construction time for novel power plants may be difficult in jurisdictions where no power plants of that type currently exist. Construction planners are recommended to use the worst case, i.e., longest construction time, from existing technologies. In the Canadian context, the regression model developed here for hydroelectric facilities is recommended. These regression models are also of use to policy developers because there is typically a policy window within which a policy should be enacted. Policy analysts aiming to encourage the use of more utility-scale renewable energy are recommended to use the regression models to estimate how long the power plants will take to construct. This duration will indicate how long a particular construction may remain at the forefront of the public’s perception. The duration will also help with planning to meet environmental targets. Lastly, the inclusion of jurisdictions in the models suggests that policy planners should examine the policies implemented by certain Canadian provinces that may have eased the implementation of power plant technologies.
The development of regression models for the construction time of Canadian utility-scale power plants fills a gap in the knowledge of construction time and time overruns for energy infrastructure. These models have several limitations, the major limitation being that the models were developed with publicly available data. As a result, not every power plant in Canada was used in the regression analyses, as many power plants have not released the necessary data. For this research to continue further, power plant construction companies and energy ministries are encouraged to make timeline data publicly available. A second limitation is that the end times of the construction may not be uniformly defined across the entire data set. Some reported end of construction times were when a power plant was grid connected, while others were when a generator became operational. The last limitation is that the start times of the construction varied: some facilities used the issuance of government licenses as the start, while others used the breaking of ground.