#### *3.3. Principal Component Analysis*

After performing a two-level wavelet decomposition, the dataset for each GRB increased in length by a factor of four. A dimensionality reduction technique was therefore used to extract only the most significant information encoded in the wavelet coefficients. PCA is an orthogonal transformation that extracts uncorrelated Principal Components from correlated data [91,92]. It involves eigenvalue decomposition of the covariance matrix of the input wavelet coefficient data, with the eigenvectors sorted by the magnitude of their eigenvalues. The user chooses how many eigenvectors to keep based on the percentage of variance explained by each. The retained eigenvectors, known as the Principal Components (PCs), represent the original data in a new PCA reference frame, and the matrix of PCs is used to project the wavelet coefficients onto the lower-dimensional PCA space.
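The eigendecomposition route to the PCs described above can be sketched as follows. This is a minimal illustration, not the pipeline used here: the matrix `X` is a random placeholder standing in for the wavelet-coefficient matrix (rows: GRBs, columns: coefficients), and the 90% variance threshold is one of the values quoted later in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder wavelet-coefficient matrix (illustrative, not real GRB data).
X = rng.normal(size=(100, 16))

# Centre the data and form the covariance matrix of the coefficients.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigendecomposition; sort eigenvectors by descending eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the smallest number of components reaching the chosen
# cumulative explained-variance fraction (here 90%).
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.90)) + 1

# Project the centred coefficients onto the k retained PCs.
X_pca = Xc @ eigvecs[:, :k]
```

The projection `X_pca` is the lower-dimensional representation that the text refers to as the new PCA reference frame.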

In this work, PCA was carried out using the sklearn.decomposition.PCA function. For *Swift*/BAT, the components whose cumulative variance reached >70% were chosen as the new representation of the dataset, since the number of components required to exceed 90% was large. For BATSE and *Fermi*/GBM, the number of retained components ensured that >90% of the variance was captured.
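This selection can be expressed directly in scikit-learn: passing a float in (0, 1) as `n_components` makes `sklearn.decomposition.PCA` keep the smallest number of components whose cumulative explained variance exceeds that fraction. The matrix below is again a random placeholder, not the actual wavelet-coefficient data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Placeholder wavelet-coefficient matrix (rows: GRBs, cols: coefficients).
X = rng.normal(size=(200, 32))

# A float n_components keeps just enough components to exceed that
# cumulative explained-variance fraction (0.90 for BATSE and Fermi/GBM;
# 0.70 was used for the smaller Swift/BAT sample).
pca = PCA(n_components=0.90)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape[1], pca.explained_variance_ratio_.sum())
```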

#### *3.4. t-SNE*

The chosen PCA components require transformation to a 2D space so that features can be visualised. Stochastic Neighbour Embedding (SNE; Hinton and Roweis [93]) provided a 2D visual representation of the components on arbitrary axes by computing the probability that each point is a neighbour of another point. This used a Gaussian probability density and Kullback–Leibler divergence minimisation [94] to ensure that the low-dimensional space adequately represented the high-dimensional space. A user-specified parameter called the Perplexity controlled the balance between local and global structure. In general, the Perplexity can be considered representative of the number of nearest neighbours of each point.

t-SNE (t-distributed SNE; Maaten and Hinton [95]) replaced the Gaussian comparison between points with a Student t-distribution with a single degree of freedom. The sklearn.manifold.TSNE method was used with a Perplexity chosen to maximise the separation of clusters in the final representation: the smaller *Swift*/BAT sample was analysed with a Perplexity of 40, while Perplexities of 50 and 70 were used for the larger BATSE and *Fermi*/GBM samples, respectively. The result is a 2D representation of the PCA feature space in which similar light curves are grouped together.
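The embedding step can be sketched with the same scikit-learn class named in the text. The input matrix here is a random stand-in for the PCA-reduced features, and the Perplexity of 40 is the value quoted for the *Swift*/BAT sample; the `init` and `random_state` settings are illustrative choices for reproducibility, not documented choices of this work.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Placeholder PCA-reduced feature matrix (rows: GRBs, cols: retained PCs).
X_pca = rng.normal(size=(150, 10))

# Perplexity roughly sets the effective number of nearest neighbours;
# 40/50/70 were used for Swift/BAT, BATSE and Fermi/GBM respectively.
tsne = TSNE(n_components=2, perplexity=40, init="pca", random_state=0)
embedding = tsne.fit_transform(X_pca)

print(embedding.shape)  # (150, 2): one 2D point per light curve
```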

#### *3.5. GMM Clustering*

Finally, Gaussian Mixture Model (GMM)-based clustering was applied to the t-SNE plots to identify clusters using the MCLUST package in R [96,97]. GMM clustering assumes that the observed data are generated from a mixture of K components, where the density of each component is described by a multivariate Gaussian distribution. MCLUST applies 14 different models and chooses the best-fit model and number of clusters based on the Bayesian Information Criterion (BIC; Schwarz et al. [98]). Since the underlying distributions are non-Gaussian, clusters are combined using the clustCombi function to converge on the optimum number of clusters, calculated via an entropy criterion [99].
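MCLUST itself runs in R; as a loose Python analogue of its BIC-driven model selection (a sketch only, without MCLUST's 14 covariance parameterisations or the entropy-based clustCombi merging), one can fit `sklearn.mixture.GaussianMixture` models over a range of component counts and keep the one minimising the BIC. The two-blob toy data below stands in for a t-SNE embedding.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Toy 2D "embedding" with two well-separated groups (illustrative only).
X = np.vstack([
    rng.normal(loc=(-5, 0), scale=0.5, size=(100, 2)),
    rng.normal(loc=(5, 0), scale=0.5, size=(100, 2)),
])

# Fit GMMs with 1..6 components and select the number of clusters
# by minimising the Bayesian Information Criterion.
best_k, best_bic, best_model = None, np.inf, None
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic = gmm.bic(X)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

labels = best_model.predict(X)
print(best_k)
```

For this toy data the BIC favours two components; on real t-SNE output, where clusters are non-Gaussian, a merging step such as MCLUST's clustCombi is what reduces the over-fitted Gaussian components to the final cluster count.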
