## *3.1. GMM Clustering*

GMM clustering was carried out in R using the MCLUST package [78]. GMM clustering assumes that the observed data are generated from a mixture of K components, where the density of each component is described by a multivariate Gaussian distribution. MCLUST fits 14 different models to the data, parameterised by the volume, shape (spherical or ellipsoidal), and orientation of the component covariances. In the case of ellipsoidal models, the relative shapes of the fitted ellipsoids and the alignment of their axes are also specified. This parameterisation is known as the Volume-Shape-Orientation (VSO) decomposition. For a given model, each of the volume, shape, and orientation can be constrained to be equal across components, denoted 'E', or left free to vary, denoted 'V'. Additionally, the orientation of a component can be restricted to the coordinate axes, denoted 'I'. For example, 'EVI' denotes components of equal volume, with variable shapes (i.e., not spherical) and orientations aligned with the coordinate axes.
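These covariance constraints can be illustrated outside of R as well. The sketch below uses Python's scikit-learn rather than the MCLUST package used in this work; its `covariance_type` options only loosely mirror mclust's families ('spherical' resembles the spherical EII/VII models, 'diag' the axis-aligned 'I' orientation, and 'full' the unconstrained VVV model). The dataset and parameter values are purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical data: two elongated, axis-aligned clusters
X = np.vstack([
    rng.normal([0.0, 0.0], [1.0, 0.2], size=(200, 2)),
    rng.normal([5.0, 5.0], [0.2, 1.0], size=(200, 2)),
])

# covariance_type loosely mirrors mclust's constraints:
#   "spherical" -> spherical components (cf. EII/VII)
#   "diag"      -> axis-aligned ellipsoids (orientation 'I')
#   "full"      -> freely shaped and oriented ellipsoids (cf. VVV)
fits = {}
for cov in ("spherical", "diag", "full"):
    gm = GaussianMixture(n_components=2, covariance_type=cov,
                         random_state=0).fit(X)
    fits[cov] = gm
    print(f"{cov:9s} log-likelihood per point: {gm.score(X):.3f}")
```

More flexible covariance structures fit elongated clusters better at the cost of extra free parameters, which is exactly the trade-off the BIC (Section 3.1) is used to arbitrate.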

MCLUST makes use of the Bayesian Information Criterion (BIC; Schwarz [79]) to compare mixture models fitted to the data. The best-fit model and number of components are chosen based on the largest BIC value. A difference in BIC between models of greater than 2 provides positive evidence for a better fit, while a difference of 6–10 is considered strong evidence [80]. This standard GMM fitting method is the same as that employed in some previous studies, for example Horváth et al. [56] and Bhave et al. [54].
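The selection of K by BIC can be sketched as follows, again in Python with scikit-learn rather than MCLUST, on hypothetical three-cluster data. Note the sign convention: mclust maximises its BIC (2 × log-likelihood minus the complexity penalty), whereas scikit-learn's `.bic()` returns the negative of that quantity, so here it is minimised.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Hypothetical data: three well-separated Gaussian clusters
X = np.vstack([rng.normal(m, 0.5, size=(150, 2))
               for m in ([0, 0], [4, 0], [2, 3])])

# mclust maximises its BIC; scikit-learn's .bic() is the negative
# of that quantity, so the best model minimises it here.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)  # typically selects K = 3 here
print(f"BIC selects K = {best_k}")
```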

## *3.2. Combination of Gaussian Components*

In cases where Gaussian components overlapped, or components were suspected to be non-Gaussian, as has been shown for the BATSE and *Fermi*/GBM GRB duration distributions [51,52], the MCLUST function clustCombi was used to hierarchically combine components using an entropy criterion [74]. Entropy measures the uncertainty with which observations are assigned to a certain cluster or component; a large decrease in entropy therefore signifies a fit with smaller classification uncertainty. For MCLUST, the final number of clusters was chosen at the observed 'elbow' in the entropy plot, the point at which the largest decrease in entropy occurs and, therefore, the model with smaller uncertainty is reached.
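The idea behind the entropy criterion can be sketched as follows. This is a Python illustration of the principle underlying clustCombi, not the actual R implementation: the total soft-clustering entropy is computed from the posterior membership probabilities, and the pair of components whose combination yields the lowest entropy is merged. All data and variable names are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Hypothetical 1-D data: two heavily overlapping components
# plus one well-separated cluster
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 1)),
    rng.normal(1.5, 1.0, size=(200, 1)),
    rng.normal(8.0, 1.0, size=(200, 1)),
])

gm = GaussianMixture(n_components=3, random_state=0).fit(X)
z = gm.predict_proba(X)  # posterior membership probabilities z_ik

def entropy(z):
    # Total soft-clustering entropy: -sum_i sum_k z_ik * log(z_ik)
    zc = np.clip(z, 1e-12, 1.0)
    return float(-np.sum(zc * np.log(zc)))

def merge(z, i, j):
    # Combine components i and j by summing their posteriors
    kept = [k for k in range(z.shape[1]) if k not in (i, j)]
    return np.column_stack([z[:, i] + z[:, j]] + [z[:, k] for k in kept])

# Greedy step: merge the pair whose combination minimises the entropy
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
i, j = min(pairs, key=lambda p: entropy(merge(z, *p)))
z2 = merge(z, i, j)
print(f"entropy: {entropy(z):.1f} -> {entropy(z2):.1f}")
```

Merging overlapping components collapses their ambiguous posteriors into a single confident assignment, which is why the entropy drops sharply at the elbow when the spurious split is removed.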

There are several methods for combining Gaussian mixture components. In comparison to the entropy criterion, these methods have limitations, for example, requiring spherical components [81] or one-dimensional data [82]. Other suggested methods assume the number of clusters in advance [83] or rely on hard clustering, which assigns each point to a single cluster rather than applying a probabilistic method (e.g., Tantrum et al. [84]). The method employed in this study is a probabilistic soft-clustering method that is computationally efficient and applicable in multiple dimensions, and was therefore chosen to achieve a robust clustering result for the complex GRB datasets.

## **4. Results**

The results of the initial MCLUST fit and subsequent clustCombi method applied to the *Swift*/BAT and *Fermi*/GBM samples are summarised in Table 2.

**Table 2.** Number of components (K), Bayesian Information Criterion (BIC) values, models, and number of bursts (#) identified in the MCLUST and subsequent clustCombi fits to the *Swift*/BAT and *Fermi*/GBM samples.

