*2.3. Clustering Based on Bioinformatics*

Twelve bioinformatics features, namely Shannon entropy, instability index, aliphatic index, charged residues, half-life, melting temperature, N-terminal of the sequence, molecular weight, extinction coefficient, net charge at pH 7, and isoelectric point, were determined for the D1, D2, and D3 domains of ACE2 in all nineteen species (Figure 6).

For each species, a twelve-dimensional feature vector was constructed from these features (Figure 6). For each of the D1, D2, and D3 domains, a distance matrix was computed using the Euclidean distance

$$d(S, T) = \sqrt{\sum_{i=1}^{12} (f_i - g_i)^2}$$

Here *f<sub>i</sub>* and *g<sub>i</sub>* denote the *i*th feature of the species *S* and *T*, respectively. The distance matrices for all three domains are presented as heatmaps in Figures 7–9. In addition, K-means clustering was applied with the distance matrix as input, forming several clusters of species for the D1 and D2 domains across eighteen species (Figures 7 and 8) and for the D3 domain across all nineteen species (Figure 9).
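The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the feature values are made-up toy numbers with three features instead of twelve, the function names `distance_matrix` and `kmeans` are hypothetical, and treating each row of the distance matrix as the point to be clustered is an assumption about what "inputting the distance matrix" means. The text does not say whether the features were rescaled before computing distances, so no scaling is applied here.

```python
import math
import random


def distance_matrix(features):
    """Pairwise Euclidean distances between feature vectors (one vector per species)."""
    return [[math.dist(f, g) for g in features] for f in features]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (hypothetical): four species, three features each.
features = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 4.0],
    [10.0, 10.0, 10.0],
    [10.0, 11.0, 10.0],
]

D = distance_matrix(features)   # symmetric, zero diagonal
labels = kmeans(D, k=2)         # cluster the rows of the distance matrix
print(labels)
```

With these toy values the first two species fall into one cluster and the last two into the other. The number of clusters `k` has to be chosen by the user; the value used here is only for illustration.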

A final set of six clusters was formed using the K-means clustering method, considering all three domains together for the eighteen species (Figure 10). Although the species S7 was clustered with S5 and S11 based on full-length ACE2 sequence homology, it formed a singleton cluster when the bioinformatics features were taken into account. Similarly, the species S16 formed a singleton cluster, although it was clustered with S17, S18, and S19 based on the amino acid homology of ACE2. Sequence homology of ACE2 grouped the four species S16, S17, S18, and S19 into a single cluster, but the bioinformatics features placed S18 in the cluster to which the three species S1, S2, and S3 belonged. Based on the bioinformatics features, S4 clustered with S15, although the ACE2 sequence of S4 was similar to those of S9, S10, and S12.

The clusters {*S*1, *S*2, *S*3}, {*S*6, *S*13}, and {*S*9, *S*10, *S*12} were the same under both the full-length ACE2 homology and the bioinformatics features.
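One simple way to identify clusters that are unchanged between two clusterings is to compare the partitions as sets of sets. The sketch below assumes nothing about the actual species assignments: the labels `A` to `H` and both partitions are hypothetical, and `shared_clusters` is an illustrative helper name.

```python
def shared_clusters(partition_a, partition_b):
    """Return the clusters that appear, with identical members, in both partitions."""
    a = {frozenset(cluster) for cluster in partition_a}
    b = {frozenset(cluster) for cluster in partition_b}
    return a & b


# Hypothetical partitions of eight items under two criteria.
by_homology = [{"A", "B", "C"}, {"D", "E"}, {"F", "G", "H"}]
by_features = [{"A", "B", "C"}, {"D"}, {"E", "F"}, {"G", "H"}]

unchanged = shared_clusters(by_homology, by_features)
print(unchanged)
```

In this toy case only the cluster containing `A`, `B`, and `C` is preserved; every other cluster is split or merged when the criterion changes.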


**Figure 6.** Bioinformatics of the D1, D2, and D3 domains of ACE2 from eighteen species. For *Salmo salar*, only D3 bioinformatics was presented.


**Figure 7.** Distance matrix based on the bioinformatics feature vectors of D1 of ACE2 across eighteen species and associated clusters.


**Figure 8.** Distance matrix based on the bioinformatics feature vectors of D2 of ACE2 across eighteen species and associated clusters.

**Figure 9.** Distance matrix based on the bioinformatics feature vectors of D3 of ACE2 across nineteen species and associated clusters.

**Figure 10.** Clusters of species based on the bioinformatics of the D1, D2, and D3 domains.
