*4.2. Hypothesis Test*

The *hypothesis test* (S1) of the statistical partitioning heuristic follows the same methodology as in both calibration procedures and may or may not incorporate the data correction for rounding errors (S4). The test still assesses whether *ln*(*RDij*/*PDij*) is normally distributed by employing Blom scores and the table of Looney and Gulledge. When the correction for *rounding* errors (S4) is applied, it again consists of averaging the Blom scores within each cluster of tied points. Therefore, the S1 and S4 procedures need not be elaborated on in detail here.
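To make the test concrete, the following is a minimal sketch of a Blom-score correlation test for lognormality of *RD*/*PD* ratios. The tie-averaging step corresponds to the rounding-error correction (S4). Note that the critical value `critical_r` is an illustrative placeholder: in the actual procedure it is read from the table of Looney and Gulledge for the given sample size and significance level.

```python
import numpy as np
from scipy.stats import norm, pearsonr

def blom_scores(x):
    """Blom scores z_i = Phi^{-1}((r_i - 3/8) / (n + 1/4)) for sample x.
    Clusters of tied observations receive the average of their scores,
    mirroring the rounding-error correction (S4)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    order = np.argsort(x)
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    scores = norm.ppf((ranks - 0.375) / (n + 0.25))
    for v in np.unique(x):           # average scores within each tie cluster
        mask = x == v
        scores[mask] = scores[mask].mean()
    return scores

def lognormality_test(rd, pd_, critical_r=0.95):
    """Correlation-type normality test on ln(RD/PD).
    critical_r is a placeholder; the real threshold comes from the
    Looney and Gulledge table for the given n and alpha."""
    y = np.log(np.asarray(rd, dtype=float) / np.asarray(pd_, dtype=float))
    r, _ = pearsonr(y, blom_scores(y))
    return r >= critical_r
```

The test accepts the hypothesis when the correlation between the ordered log-ratios and their Blom scores exceeds the tabulated critical value, in the spirit of correlation-based normality tests.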

Recall that the hypothesis (S1) was also tested in steps S2 and S3 of the calibration procedures, after the removal of all on-time points and a portion of the tardy points to incorporate the Parkinson effect. The major difference between the calibration procedures and the new statistical partitioning method lies precisely in this treatment of the data for the Parkinson effect (S2 or S3). The (extended) calibration method permanently removes data from the project clusters (since these data follow the Parkinson effect) and continues the hypothesis testing only on the remaining portion of the data.

The new statistical partitioning heuristic, by contrast, does not automatically remove data points from the clusters; instead, it splits each partition into two separate clusters (subpartitions) and then tests the same hypothesis on both. This iterative process of splitting and testing continues until a stop criterion is met, and the data of all created subpartitions that pass the test are kept in the database. More precisely, at some point during the search, each subpartition is either accepted (i.e., its data follow a lognormal distribution) or rejected (i.e., its data do not follow a lognormal distribution or the sample size of the cluster has become too small). As shown in Figure 4, we have set the minimum sample size to 3, since partitions containing too few points are accepted too easily. The way partitions are split into two subpartitions is defined by two newly developed statistical strategies (selection and stopping), which are discussed in the next section.
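The split-and-test loop above can be sketched as follows. This is one plausible reading of the procedure, not the authors' implementation: the test function is passed in as a parameter, and the default split at the median is a hypothetical placeholder for the selection strategy described in the next section.

```python
def partition_heuristic(y, passes_test, min_size=3, split=None):
    """Iteratively split partitions and retest until a stop criterion is met.
    passes_test(partition) -> bool stands in for the lognormality test (S1);
    split(partition) -> (left, right) stands in for the selection strategy
    (default: split the sorted data at the median, an assumed placeholder)."""
    if split is None:
        split = lambda part: (part[:len(part) // 2], part[len(part) // 2:])
    accepted, rejected = [], []
    stack = [sorted(y)]
    while stack:
        part = stack.pop()
        if len(part) < min_size:
            rejected.append(part)        # stop criterion: sample too small
        elif passes_test(part):
            accepted.append(part)        # keep this subpartition's data
        else:
            left, right = split(part)    # split and retest both halves
            stack.extend([left, right])
    return accepted, rejected
```

Because each split strictly reduces the partition sizes, the search always terminates: every branch ends in an accepted subpartition or one that falls below the minimum sample size of 3.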
