*4.6. Clustering*

Sequence parameters and filtered structure parameters were each used separately as input for clustering. First, hierarchical clustering was performed on the scaled features using Euclidean distance and Ward's method (Supplementary Figures S1 and S3). Then, k-means clustering was applied, and the within-group sum of squares was plotted as a function of the number of clusters (Supplementary Figures S2 and S4). The k-means analysis did not clearly support any particular number of clusters; we therefore chose a low number of clusters in both cases (four for sequence-based and five for structure-based clustering), which is not in contradiction with the k-means analysis. This choice of cluster numbers reflects our preference for an overall high-level classification. Clustering was done in R, using the ward.D2 agglomeration method for hierarchical clustering and the kmeans function for k-means clustering.
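The workflow described above can be illustrated with a minimal, dependency-free Python sketch. It is not the R code used in this study: the toy data, the function names (`scale`, `ward_clusters`, `kmeans_wss`) and the choice of three clusters are assumptions made purely for illustration. The sketch standardizes the features, merges clusters by Ward's minimum-variance criterion, and computes the within-group sum of squares for a range of k-means cluster numbers.

```python
import random

def scale(rows):
    """Center each column to mean 0 and scale it to unit sample standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(c) / len(points) for c in zip(*points)]

def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    Returns a list of clusters, each a list of row indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([points[p] for p in clusters[i]])
                cj = centroid([points[p] for p in clusters[j]])
                ni, nj = len(clusters[i]), len(clusters[j])
                # Increase in within-group sum of squares if i and j are merged.
                cost = ni * nj / (ni + nj) * sq_dist(ci, cj)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

def kmeans_wss(points, k, n_starts=10, n_iter=100, seed=0):
    """Lloyd's k-means; returns the lowest within-group sum of squares found."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_starts):
        centers = rng.sample(points, k)  # random data points as initial centers
        for _ in range(n_iter):
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[nearest].append(p)
            # An empty group keeps its previous center.
            new_centers = [centroid(g) if g else centers[c]
                           for c, g in enumerate(groups)]
            if new_centers == centers:
                break
            centers = new_centers
        wss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, wss)
    return best

# Hypothetical toy data: three well-separated 2D blobs of ten points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0)]
data = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
        for cx, cy in blob_centers for _ in range(10)]

features = scale(data)
clusters = ward_clusters(features, k=3)
wss = [kmeans_wss(features, k) for k in range(1, 7)]

print("Ward cluster sizes:", sorted(len(c) for c in clusters))
for k, w in enumerate(wss, start=1):
    print(f"k={k}: within-group sum of squares = {w:.3f}")
```

Plotting `wss` against k gives the kind of curve shown in Supplementary Figures S2 and S4. On well-separated toy data the curve tends to show a clear bend, whereas on real data it may decline smoothly, in which case the number of clusters has to be chosen on other grounds, as was done here.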
