**5. Computational Results**

This section presents the results of a set of computational experiments on the same set of projects as used in [20]. All projects are taken from the database of [21], which consisted of 51 projects at the time of its introduction. Additional projects have been added since, resulting in a database of 125 projects from companies in Belgium. Twenty-eight projects did not contain authentic time tracking data and were removed from the analysis (97 left), and 14 further projects only contained activities that ended exactly on time (which are assumed to be fully subject to the Parkinson effect). Hence, the 83 remaining projects were used in the extended calibration study and are also used in the computational experiments of the current paper. The average values for six summary statistics of these 83 projects were published in the extended calibration procedure study and are therefore not repeated here. However, Figure 6 displays a summary of the 83 projects used for the analysis. The top graph shows that more than 70% of the projects come from the construction industry, followed by almost 25% IT projects. The bottom graph displays the real time/cost performance of the projects. It shows that the database does not contain projects in the bottom right quadrant (over budget and ahead of schedule), but the three other quadrants contain projects with varying degrees of earliness/lateness and budget underruns and overruns.

The results of our computational experiments are divided over three sections. In Section 5.1, all projects are used to test the statistical partitioning heuristic without managerial partitioning, while Section 5.2 makes use of a subset of these projects and adds managerial partitioning to the tests. Finally, Section 5.3 lists limitations of the statistical partitioning heuristic that can serve as guidelines for future research in this domain.

First of all, it is important to note that the statistical partitioning heuristic still relies on the *p*-value to determine whether or not a certain partition follows the PDLC. The reason for this is twofold. First, it allows us to compare the results of the partitioning heuristic with those of the calibration procedures, in which *p* was the only goodness-of-fit measure that was considered. Second, the only other eligible measure, *SEY*, does not provide a uniform basis for comparison between projects or partitions, as its numerical value strongly depends on, and can thus vary greatly with, the input values from the data set (i.e., the *ln*(*RDi*/*PDi*) values). In other words, no universal fit threshold can be set for *SEY*. This also explains why we will focus more on the *p*-values than on the *SEY* results in the upcoming discussions.
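Since the lognormal core of the PDLC implies that the ln(RDi/PDi) values of a well-fitting partition should be approximately normally distributed, the *p*-value check can be illustrated with a minimal sketch. The durations below are hypothetical, and a Shapiro–Wilk normality test is used here as a stand-in for the paper's actual goodness-of-fit test:

```python
import numpy as np
from scipy import stats

def log_ratios(pd_days, rd_days):
    """The ln(RD_i/PD_i) values used for fitting the lognormal core of the PDLC."""
    return np.log(np.asarray(rd_days, float) / np.asarray(pd_days, float))

def fit_pvalue(pd_days, rd_days):
    """Normality p-value of the log ratios (Shapiro-Wilk as a stand-in test)."""
    return stats.shapiro(log_ratios(pd_days, rd_days))[1]  # [1] = p-value

# Hypothetical project: real durations placed exactly on lognormal quantiles,
# so the fit should be accepted (p > 0.05).
n = 12
planned = np.full(n, 10.0)
z = stats.norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))  # Blom scores
real = planned * np.exp(0.2 * z)
print(fit_pvalue(planned, real) > 0.05)  # → True
```

A partition would be accepted once this *p*-value exceeds the usual 0.05 threshold.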

**Figure 6.** Empirical project database used for the analysis. (**a**) Sector of the 83 projects (mainly construction projects); (**b**) Project time/cost performance.

Secondly, it should also be stressed that *SEY* always remains the main stopping criterion when applying the partitioning heuristic under the advanced stopping strategy. We therefore did not include the *R*<sup>2</sup><sub>*a*</sub> values in the two tables with computational results, since they are only of secondary importance. Average *SEY* values, by contrast, are mentioned in the tables because of their prime role in the stopping strategy of the statistical procedure.
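The scale dependence of *SEY* can be illustrated with a small sketch. We assume here that *SEY* is the standard error of the estimate of the regression of the log ratios on their normal scores, i.e., sqrt(SS_res/(n − 2)); under that assumption, multiplying all input values by a constant multiplies *SEY* by the same constant, which is why no universal fit threshold can be set for it:

```python
import numpy as np

def sey(x, y):
    """Standard error of the estimate for the regression of y on x
    (assumed definition: sqrt(SS_res / (n - 2)))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return np.sqrt(np.sum(resid ** 2) / (len(y) - 2))

# Two data sets with an identical shape but different scales.
x = np.linspace(-2, 2, 20)
rng = np.random.default_rng(0)
y = 0.5 * x + rng.normal(0.0, 0.1, 20)
# Scaling the log ratios by 10 scales SEY by exactly 10.
print(round(sey(x, 10 * y) / sey(x, y), 6))  # → 10.0
```

The *p*-value of a goodness-of-fit test, in contrast, is scale-free, which makes it the only measure suitable for comparisons across projects.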

Finally, we consider eight different settings for the statistical partitioning heuristic, and, since each of them can be performed with or without managerial partitioning, the results had to be divided over two tables. Table 1 shows the outcomes for the application of the statistical partitioning heuristic to our database under the eight different settings without using human partitioning as an initialization step. A second table will show similar results, but now adding a human partitioning step prior to the statistical partitioning steps (Table 2). The eight settings reflect the choices that must be made for hypothesis testing (Section 4.2) and for the selection and stopping strategies of Sections 4.3.1 and 4.3.2. Each choice can be set to either 0 or 1. To represent these different settings in Tables 1 and 2, the code format *rounding*–*selection*–*stopping* is introduced as follows:
- *rounding* = 1 if the rounding effect is accounted for (through the averaging of Blom scores), and 0 otherwise;
- *selection* = 1 if any activity can be eliminated in a partitioning step, and 0 if only on-time activities can be eliminated;
- *stopping* = 1 if the advanced stopping strategy is applied (continuing to further decrease *SEY*), and 0 if the heuristic stops as soon as *p* > 0.05.


As a result, the eight settings for the parameters (*rounding*–*selection*–*stopping*) are then equal to (0-0-0), (1-0-0), (0-0-1), (1-0-1), (0-1-0), (1-1-0), (0-1-1), and (1-1-1).
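Since each of the three options is an independent binary choice, the eight settings can be enumerated mechanically (note that the order below differs from the listing above):

```python
from itertools import product

# Each of rounding, selection, stopping is 0 or 1, giving 2^3 = 8 settings.
settings = list(product((0, 1), repeat=3))  # (rounding, selection, stopping)
labels = ["-".join(map(str, s)) for s in settings]
print(labels)
# → ['0-0-0', '0-0-1', '0-1-0', '0-1-1', '1-0-0', '1-0-1', '1-1-0', '1-1-1']
```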

## *5.1. Without Managerial Partitioning*

Table 1 displays the results for the statistical partitioning heuristic without managerial partitioning under the eight different settings. The table is split into four main parts ((*a*) to (*d*)), which are explained in the following paragraphs.


**Table 1.** Results for the partitioning heuristic without managerial partitioning.

**(***a***) # partitions:** This part displays the number of created partitions (total, average per project and maximum) as well as the percentage of projects with one up to six created partitions. All 83 available projects are considered for every setting of the partitioning heuristic. The total number of activities over these projects amounts to no less than 5068 (an average of 61 activities per project), which can be deemed quite an extensive empirical dataset. Note that the total number of partitions is equal to the number of considered projects for the settings with *selection* = 0 (shown in the first four (-, 0, -) settings). Indeed, when only on-time points can be eliminated, partition *P* by definition follows a pure Parkinson distribution and should therefore not explicitly be considered. We thus only look at partition *L* for evaluating the partitioning heuristic with *selection* = 0. When *selection* = 1 (shown in the last four columns), on the other hand, the partitions created by removing any (i.e., not necessarily on-time) activity from the initial project no longer trivially adhere to the pure Parkinson distribution. Therefore, all created partitions are considered explicitly in these cases. This explains why the number of partitions in Table 1 is greater than 83 for the settings with *selection* = 1.

The row with the average number of partitions per project (avg/p) also shows interesting results. In contrast to the situation where *selection* = 0, there can be more (or fewer) than two partitions when *selection* is set to 1. There is a logical correspondence between the average number of partitions and the average number of partitioning steps per project (part (*b*) of the table). Indeed, the more partitioning steps are executed, the greater the chance that an extra partition is created. As such, setting (0-1-1), which exhibited the highest number of partitioning steps for *selection* = 1 (1705), also yields the most partitions per project, namely three on average. The minimum is observed for setting (1-1-0) (1.7 partitions per project), which also clearly required the fewest partitioning steps (365). Notice that this minimum is less than 2, which means that, under this setting, there are many projects for which the PDLC is accepted (i.e., *p* > 0.05) without the elimination of a single activity, so that all activities fit the proposed distribution as a whole. This is largely due to the beneficial influence of accounting for the rounding effect through the appropriate averaging of Blom scores. When we want to optimize the fit (i.e., further decrease *SEY*), however, activities need to be eliminated, thus producing at least one extra partition. This explains why, for setting (1-1-1), there is on average almost one partition more per project than for setting (1-1-0) (2.6 compared to 1.7).
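The averaging of Blom scores mentioned above can be sketched as follows. We assume the rounding correction assigns every group of tied log ratios (e.g., exactly on-time activities, for which ln(RDi/PDi) = 0) the average of the Blom scores their sorted positions would occupy; the helper names are our own:

```python
import numpy as np
from scipy import stats

def blom_scores(n):
    """Blom's approximation of the expected standard-normal order statistics."""
    i = np.arange(1, n + 1)
    return stats.norm.ppf((i - 0.375) / (n + 0.25))

def averaged_blom_scores(values):
    """Sort the values and assign Blom scores; tied (rounded) values share
    the average of the scores they jointly occupy (rounding = 1)."""
    values = np.sort(np.asarray(values, float))
    scores = blom_scores(len(values))
    out = np.empty_like(scores)
    start = 0
    for end in range(1, len(values) + 1):
        if end == len(values) or values[end] != values[start]:
            out[start:end] = scores[start:end].mean()  # average over the tie
            start = end
    return values, out

# Three on-time activities (log ratio 0.0) and a tie at 0.2 share scores.
vals, z = averaged_blom_scores([0.0, 0.0, 0.0, 0.1, 0.2, 0.2])
print(np.round(z, 3))
```

Averaging over ties keeps the plotting positions symmetric, which is what improves the fit for data sets with many identical (rounded or on-time) durations.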

Furthermore, the maximum number of partitions over all projects is also displayed in Table 1, together with the grouping of the projects according to the number of partitions into which they are divided by executing the partitioning heuristic under the different settings. A maximum of six partitions, which is in itself still manageable in practice, only occurs for one project under setting (0-1-1). This is also the only setting for which there are more projects with three partitions than with two partitions, the latter clearly being the most common case and corresponding with the situation where *selection* = 0 (with, by definition, only one partition *L* and one partition *P*).

**(***b***) # partitioning steps:** Going further down the rows of the table, we see that settings with *selection* = 1 require significantly fewer partitioning steps than settings with *selection* = 0. This means that a potential fit can be obtained much faster by allowing all activities (i.e., early, on-time and tardy) to be removed from the base partition, which indicates a first advantage of the partitioning heuristic with respect to the calibration procedures. For setting (1-1-0), for example, an average project only needs four partitioning steps. Obviously, when the advanced stopping strategy is used (*stopping* = 1), the number of necessary partitioning steps increases, from 4 to 9. Conversely, accounting for rounding (*rounding* = 1) appears to decrease the required number of partitioning steps, i.e., from 16 to 4 and from 21 to 9. This is assumed to be a positive effect, since fewer partitioning steps ultimately yield fewer partitions and thus bigger clusters of data with similar characteristics.

**(***c***) % activities / partition:** For *selection* = 0, we observe that partition *L* of an average project comprises between half (54%) and three quarters (73%) of the total activities, depending on the other selected options. This implies that up to about half of the activities (46%) were removed from the base partition and put in partition *P* (for setting (0-0-1)), which is quite a considerable portion given that all these eliminated activities had to be on time. This indicates that a great part of the activities of the considered real-life projects were reported as being on time, which supports the existence of the Parkinson effect (and, secondarily, the rounding effect) and therefore the relevance of the applied methodologies (i.e., the calibration procedures and the partitioning heuristic to validate the PDLC). Note that no values are reported for the settings with *selection* = 1 since, in these cases, even partition *P* is subject to further hypothesis testing, possibly resulting in several new partitions. The division of these partitions into new partitions until the stopping criterion is met is shown by the values for the % activities in each partition under part (*a*) of this table.

**(***d***) Goodness of fit:** More importantly, one can observe that setting (1-1-1) clearly yields the largest *p*-value and thus the best fit to the PDLC. This *p*-value is significantly larger than the optimum for the extended calibration procedure when no managerial partitioning is executed (0.731 >> 0.385; the latter value is not shown in the table but is the maximum value of the extended calibration procedure found in Table 2 of [20]), and even larger than the overall optimum that occurs when applying initial partitioning according to RP and S4 (0.731 > 0.606; the latter value is the overall maximum *p*-value found in the previously mentioned study). It can thus already be stated that the statistical partitioning heuristic performs better than the extended calibration procedure, which is also confirmed by the percentages of accepted partitions (or projects) without execution of managerial partitioning (maxima: 95% > 81% for the extended calibration procedure). Moreover, accounting for the rounding effect (*rounding* = 1) always appears to be beneficial for the validation chance of the PDLC. Similarly, there is a clear advantage in allowing every activity to be eliminated (*selection* = 1) instead of only the on-time points (*selection* = 0), supported by both the *p*-values and the percentages of accepted partitions.

We now mention a couple of qualitative reasons why a better performance is observed for *selection* = 1 than for *selection* = 0. First of all, the biggest residual in a certain partitioning step will always be at least as big—and most likely bigger—in the former case than in the latter, since the algorithm can choose from *all* activities when *selection* = 1 and not just from the on-time fraction. Eliminating an activity with a bigger residual means a stronger decrease of *SEY* and thus a faster evolution towards the acceptance of the PDLC. This also explains why *selection* = 1 requires fewer partitioning steps than *selection* = 0, as mentioned earlier.
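A single partitioning step under *selection* = 1 can be sketched as follows, under the mechanics assumed above: regress the sorted log ratios on their Blom scores, then eliminate the activity with the largest absolute residual (function names are illustrative, not the authors' own):

```python
import numpy as np
from scipy import stats

def elimination_step(log_ratios):
    """One partitioning step (selection = 1, assumed mechanics): regress the
    sorted log ratios on their Blom scores and eliminate the activity with
    the largest absolute residual, which gives the largest drop in SEY."""
    y = np.sort(np.asarray(log_ratios, float))
    n = len(y)
    x = stats.norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))  # Blom scores
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    worst = int(np.argmax(np.abs(resid)))
    return np.delete(y, worst), y[worst]

# A small set of log ratios with one planted outlier (3.0).
data = [-0.2, -0.1, 0.0, 0.05, 0.1, 0.2, 3.0]
kept, removed = elimination_step(data)
print(removed)  # → 3.0
```

Because any activity is a candidate, a clear outlier such as the planted 3.0 is removed in the very first step, which illustrates why *selection* = 1 converges in fewer steps than restricting elimination to on-time points.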

Secondly, although Table 1 does not yet consider managerial partitioning, statistical partitioning is performed when *selection* is set to 1. This means that, in contrast to what is the case for the calibration procedures or when setting *selection* to zero, the early and tardy activities that show very diverse duration characteristics can now be assigned to different partitions for which specific distribution profiles can be defined, instead of obstinately trying to fit a single distribution profile to a set of activities that is simply too heterogeneous. A good illustrative example is given by the detection of clear outliers in the project data discussed in [20] while validating the extended calibration procedure. These authors propose two straightforward criteria to select outliers, and compare their approach with the one taken in the empirical validation of the original calibration procedure [19]. In that empirical validation, the authors eliminated 66 activities from the set of projects as clear outliers, but did not explicitly state how they did this. Using the two proposed criteria to detect outliers for the extended calibration procedure resulted in the detection of the same 66 activities as clear outliers, except for one project. This project (ID C2014-03) also had clear outliers according to the two new criteria, but these outliers were not detected in the first empirical validation study. In the extended calibration study, it was therefore argued that failing to identify and eliminate clear outliers could lead to serious distortions in the results, as a motivation for why the two criteria should always be strictly applied. This argument, however, is no longer as compelling when the statistical partitioning heuristic is used.
Using the newly proposed selection and stopping strategies, non-removed outliers would obviously exhibit the biggest residuals and thus automatically be put in a separate partition, so that they could no longer impede the validation of the PDLC for the other activities (the resulting outlier partition can then simply be removed from the project database). This also implies that it would no longer be a huge problem if the clear outliers were not identified and eliminated beforehand, since the procedure does this automatically when *selection* = 1. The partitioning heuristic is therefore less prone to human error and prevents biased outcomes resulting from such errors, which is of course an advantage of the partitioning heuristic with respect to the calibration procedures and supports its applicability and robustness.

## *5.2. With Managerial Partitioning*

Table 2 presents results similar to those of the previous table, but now with the managerial partitioning step carried out as an initialization prior to the statistical partitioning algorithm. The table no longer considers all eight settings for the statistical partitioning heuristic, but fixes the *rounding* value to 1, because this was shown to have a positive effect on both partitioning efficiency (i.e., fewer partitioning steps) and, foremost, goodness of fit (i.e., higher *p*-values). In addition, the *stopping* option is also fixed to 1, since this produces better *p*-values than *stopping* = 0. Moreover, the former setting in fact incorporates the latter since, up to the point where *p* becomes greater than 0.05, both approaches run completely in parallel. In contrast, the value of the *selection* option is not fixed, since the experiments are set up to assess its influence in combination with managerial partitioning. The settings included in Table 2 are thus reduced to (1-0-1) and (1-1-1). Although Table 2 (with managerial partitioning) contains more information than Table 1 (without managerial partitioning), it will be discussed less extensively, as many aspects have already been addressed. Rather, we now focus on the most notable results and differences.


**Table 2.** Results for the partitioning heuristic with managerial partitioning.

\* For partitioning criterion WP, a different scale applies for the next six rows: 1/2/3/4/5/6 partition(s) should be regarded as 1-5/6-10/11-15/16-20/21-25/26-30 partitions, respectively.

**(***a***) # Projects:** A first difference is the number of projects that are considered. This is no longer always 83 because, for some projects, two of the three criteria for managerial partitioning were not defined by the project manager (i.e., the WPs and/or RPs of the activities were not known, cf. S0 of Section 3.1). The total number of activities that are considered is thus also less than 5068 for WP and RP as partitioning criteria, but remains adequate, with 3796 and 887 activities, respectively.

**(***b***) # partitions:** The number of partitions (human) displayed in the table reflects the number of partitions created by performing managerial partitioning according to the different criteria. This is the initial partitioning operation (i.e., before executing the actual partitioning heuristic) and obviously yields the same partitions for both *selection* values. Subpartitions, on the other hand, are created by performing statistical partitioning and are therefore only present when *selection* = 1. In that case, each of the partitions obtained from managerial partitioning is further divided into smaller partitions, therefore called *subpartitions*, using the statistical partitioning heuristic. This means that each project in fact goes through two consecutive partitioning phases when the partitioning heuristic is applied with setting (1-1-1) and managerial partitioning is included. The number of subpartitions is obviously larger than the number of partitions, and even reaches 631 over 53 projects for the WP criterion. This comes down to almost 12 subpartitions per project, which might be too many to be practical and is less relevant, since it implies an average of only six activities per subpartition. However, this is not a problem when one of the other managerial criteria is applied, with an average of about five subpartitions per project. The main reason is that project managers apparently define far too many WPs, on average eight per project, with an excessive maximum of 26 WPs for one project. This issue could be resolved by encouraging project managers to limit the number of identified WPs through the consideration of higher-level classification criteria.
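The two consecutive partitioning phases can be sketched schematically. The activity records and field names below are hypothetical; phase 1 groups activities by a manager-defined criterion, after which phase 2 (*selection* = 1) would run the statistical heuristic on each group separately, possibly splitting it into subpartitions:

```python
from collections import defaultdict

def managerial_partition(activities, key):
    """Phase 1: group activities by a manager-defined criterion (e.g., WP or RP)."""
    groups = defaultdict(list)
    for act in activities:
        groups[act[key]].append(act)
    return dict(groups)

# Hypothetical activity records: id, work package label, log duration ratio.
activities = [
    {"id": 1, "wp": "WP1", "lr": 0.05},
    {"id": 2, "wp": "WP1", "lr": -0.10},
    {"id": 3, "wp": "WP2", "lr": 0.00},
    {"id": 4, "wp": "WP2", "lr": 0.20},
]
partitions = managerial_partition(activities, "wp")
# Phase 2 would now apply the statistical partitioning heuristic to each
# group in `partitions`, yielding the subpartitions counted in Table 2.
print(sorted(partitions))  # → ['WP1', 'WP2']
```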

**(***c***) # partitioning steps:** The number of partitioning steps does not fundamentally differ between the two tables, and Table 2 still shows that the setting with *selection* = 1 requires significantly fewer partitioning steps than the setting with *selection* = 0. Furthermore, the introduction of managerial partitioning does not seem to increase the average number of partitioning steps (it remains about 9 (between 8 and 10) for (1-1-1), as in Table 1), which means that the computational effort to partition the data remains just as low.

**(***d***) % activities / partition:** The percentage of activities per partition differs between the two tables. For the setting with *selection* = 0, partition *L* on average still comprises about 80% (between 77% and 79%, as shown in row '% act partition *L*') of the initial activities, and even 90% for the WP criterion. This is much more than the 59% for (1-0-1) without managerial partitioning in Table 1. Hence, in order to obtain a fit to the PDLC, a far smaller portion of (on-time) activities needs to be removed from the managerial partitions than was the case for the complete project. This indicates that the application of managerial partitioning criteria is indeed relevant and beneficial, and that their definition by project managers should thus be encouraged.

**(***f***) Goodness of fit:** The absolute best fit so far in this research is obtained by applying the partitioning heuristic with setting (1-1-1) in combination with managerial partitioning according to the criterion that already proved most profitable in an earlier study, namely RP. The average *p*-value of 0.811 is significantly higher than the maximum for the extended calibration procedure, which is 0.606 for partitioning step S4 preceded by managerial partitioning according to, again, RP. The percentage of accepted partitions is equal for both and very high (97%), so we can conclude that the partitioning heuristic outperforms the calibration procedures, even apart from its qualitative benefits concerning flexibility and robustness. Therefore, we will no longer consider the (extended) calibration procedure in the remainder of the discussion.

However, the mentioned *p*-value of 0.811 is not vastly higher than that for partitioning setting (1-1-1) combined with either of the other managerial criteria (*p* ranging from 0.756 to 0.783), or even without managerial partitioning (*p* = 0.731; see Table 1), and also partitioning setting (1-0-1) combined with managerial partitioning according to, again, RP comes close with a *p* of 0.741. The reason for this is that a combination of managerial partitioning and statistical partitioning (which occurs when *selection* = 1) should in fact be seen as a 'double' optimization. Both partitioning approaches already perform very well separately, but combining them takes the distribution fitting another (small) step closer to 'optimal' partitioning. Furthermore, managerial and statistical partitioning do not only perform well on their own; they are also mutually quite comparable. To show this, we compare the partitioning heuristic with setting (1-1-1) (i.e., with statistical partitioning) and no managerial partitioning (see Table 1) to that with setting (1-0-1) (i.e., without statistical partitioning) and managerial partitioning according to RP (see Table 2). Remarkably, both exhibit almost identical *p*-values (0.731 versus 0.741) and percentages of accepted partitions (94% versus 95%). This observation is in fact hugely promising, as it indicates that we can simply perform the partitioning heuristic with inclusion of statistical partitioning (i.e., set *selection* to 1) and still obtain very relevant partitions without requiring realistic input for managerial criteria (i.e., WPs or, even better, RPs accurately defined by the project manager). Statistical partitioning is far less prone to human judgement and bias than managerial partitioning. In the latter case, project managers indeed need to *accurately* define the WPs or RPs; otherwise, the resulting partitions would be faulty and unrealistic anyhow. It might be beneficial to bypass this uncertain human factor, and thus create a more solid and trustworthy methodology for categorizing activities into risk classes and assigning specific distribution profiles to them. The partitioning heuristic developed in this section allows just this. Apart from the discussion of whether managerial or statistical partitioning (or both) is to be preferred, our results clearly show that it is essential to create partitions for a project in order to obtain decent fits of the activity durations to the PDLC.
