*2.2. Calibrating Data*

In the previous section, it was shown that forecasting is important for good decision-making in project management, and that such forecasting requires accurate estimates for the durations and costs of the project activities. While many studies have investigated the project management domain from different angles, they all—implicitly or explicitly—agree that good forecasting is a necessary requirement for coping with the *entropy of projects*, but that this requires *energy*, i.e. the managerial effort of the project manager and her team.

Hence, accurate estimates should ideally be based on a mix of data from similar past projects and human judgement (the expertise often so readily available in the project team). Many simulation studies in the literature opt for well-known statistical distributions to model activity uncertainty, and randomly vary the parameters for the average duration and the standard deviation without really knowing which values are realistic. Despite the relevance of such studies, they do not take any human judgement into account when estimating the distribution parameters, and hardly make use of data from past projects. Instead, they simply rely on arbitrarily chosen numbers for the distribution parameters without a link to real projects. The idea of calibrating data is to overcome these shortcomings by relying on data from past projects to fit probability distributions, without ignoring the observation that these data are prone to human biases and possible misjudgements.

Figure 1 gives a graphical summary of the central idea of calibrating project data for activity duration distributions. A calibration method filters data from empirical projects (input) by removing parts that cannot be used further in the analysis (calibration), and identifies the activity duration distribution parameters that appear most appropriate in a real-life context. The goal is to classify the project activities into clusters that have identical values for the parameters (average and variance) of a predefined probability distribution (output). The three parts (input–calibration–output) are briefly summarized along the following lines, followed by some details about the existing calibration methods.
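The input–calibration–output idea can be illustrated with a minimal sketch. The data layout (planned and real durations per activity, with a cluster label) and the filtering rule below are hypothetical illustrations of the pipeline, not the actual procedure of the methods discussed next.

```python
import math
from statistics import mean, stdev

# Hypothetical input: (cluster label, planned duration, real duration) per activity.
records = [
    ("design", 10.0, 12.5), ("design", 8.0, 8.0), ("design", 5.0, 6.0),
    ("build", 20.0, 18.0), ("build", 15.0, 21.0), ("build", 30.0, 33.0),
]

def calibrate(records):
    """Filter unusable records, then fit lognormal parameters per cluster.

    Calibration step (illustrative rule): drop records with missing or
    non-positive durations, since they cannot yield a duration ratio.
    Output step: per cluster, the mean and standard deviation of
    log(real / planned), i.e. the parameters of a lognormal ratio.
    """
    clusters = {}
    for label, planned, real in records:
        if planned and real and planned > 0 and real > 0:  # calibration filter
            clusters.setdefault(label, []).append(math.log(real / planned))
    return {label: (mean(logs), stdev(logs)) for label, logs in clusters.items()}

params = calibrate(records)
for label, (mu, sigma) in params.items():
    print(f"{label}: mu={mu:.3f}, sigma={sigma:.3f}")
```

In this sketch the clusters are given; the calibration methods discussed below instead derive which activities can be grouped under identical parameter values.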


To the best of our knowledge, only two calibration methods have been proposed in the literature that explicitly take the presence of the two human biases—the Parkinson effect and the effect of rounding errors—into account, and the current study extends these methods with a third method. The two existing calibration procedures rely on a predefined distribution for the activity durations of the project, as outlined in the *calibration* step. More specifically, the *lognormal distribution* is chosen for modelling activity duration uncertainty, which means that the null hypothesis for all calibration tests (step 2 in Figure 1) is that the ratio of the real activity duration to the estimated activity duration from the schedule follows a lognormal distribution. While previous studies have given arguments why the lognormal distribution is a good candidate for modelling activity duration uncertainty (see, e.g., the study by [15], who advocated the use of this distribution based on theoretical arguments and empirical evidence), this choice obviously restricts the two current and the newly presented calibration methods. Indeed, many other distributions have been used in the literature to model activity duration uncertainty, such as the beta distribution (e.g., [16]), the generalised beta distribution (e.g., [17]) or the triangular distribution (e.g., [18]), but a detailed discussion of the choice of distribution and a comparison of these distributions for modelling activity uncertainty is beyond the scope of our study. This does not mean, however, that our study has no practical or academic value.
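The null hypothesis of step 2 can be made concrete: if the ratio real/planned follows a lognormal distribution, then its logarithm follows a normal distribution. The sketch below illustrates this with simulated ratios and an informal moment-based check (sample skewness near zero); the parameter values and sample size are assumptions for illustration, and the cited methods use formal statistical tests rather than this shortcut.

```python
import math
import random
from statistics import mean, stdev

random.seed(42)

# Simulated duration ratios real/planned, drawn from a lognormal with
# illustrative parameters mu = 0.0, sigma = 0.3.
ratios = [random.lognormvariate(0.0, 0.3) for _ in range(2000)]

# Under the null hypothesis, log(real/planned) is normally distributed.
logs = [math.log(r) for r in ratios]
mu_hat, sigma_hat = mean(logs), stdev(logs)

# Informal check: the sample skewness of the log-ratios should be close to 0
# for a normal distribution (a formal calibration test would use a proper
# goodness-of-fit statistic instead).
skew = mean(((x - mu_hat) / sigma_hat) ** 3 for x in logs)
print(f"mu={mu_hat:.3f}, sigma={sigma_hat:.3f}, skewness={skew:.3f}")
```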
The key point of the calibration methods used in this study is that, although they assume that the core distribution of an activity duration is the *lognormal distribution*, the parameters of this distribution (such as the average and the standard deviation) cannot be observed directly from empirical data due to distorting human factors such as hidden earliness or rounded data. Consequently, since the calibration methods test whether activity durations follow a lognormal distribution after correcting for the Parkinson effect and rounding errors, we will refer—in line with the previous studies—to the assumed distribution for activity duration as the *Parkinson distribution with lognormal core* (PDLC).
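The distortion that the PDLC captures can be illustrated by simulation: hidden earliness reports an activity that finishes early as exactly on time, and reported durations are rounded to a coarse grid. The full-Parkinson assumption (no earliness reported at all), the parameter values and the rounding granularity below are illustrative assumptions, not the exact model of the cited studies.

```python
import math
import random
from statistics import mean

random.seed(7)

def report(true_ratio, grid=0.5):
    """Distort a true duration ratio as it might appear in project records.

    Parkinson effect (full form, assumed here): early finishes are hidden,
    so any ratio below 1 is reported as exactly 1 (on time).
    Rounding: the reported ratio is rounded to a multiple of `grid`.
    """
    parkinsoned = max(true_ratio, 1.0)
    return round(parkinsoned / grid) * grid

# True ratios follow a lognormal core with illustrative parameters (mu = 0).
true_ratios = [random.lognormvariate(0.0, 0.3) for _ in range(2000)]
reported = [report(r) for r in true_ratios]

# The lognormal parameters can no longer be read off the reported data:
# the naive estimate of mu from the reported ratios is biased upwards.
naive_mu = mean(math.log(r) for r in reported)
print(f"true mu = 0.000, naive mu from reported data = {naive_mu:.3f}")
```

This upward bias is exactly what a calibration method has to undo before the lognormal core parameters can be estimated.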
