*4.2. Methods*

Examining amino acid substitutions: Amino acid substitutions relative to the human ACE2 receptor were examined in all species. Only substitutions that occurred at the binding residues in the three domains D1, D2, and D3 were taken into account [14]. Based on their character, the substitutions at the ACE2 binding residues across species were divided into two types: substitutions that affected transmission (M1) and substitutions that did not affect transmission (M2).
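As a minimal sketch of this comparison, the following Python function lists the substitutions at a set of binding positions between two pre-aligned sequences. The function name, the toy sequences, and the positions are hypothetical and serve only as illustration; the actual binding residues are those reported in [14], and the M1/M2 assignment is not reproduced here.

```python
def binding_site_substitutions(reference, other, binding_positions):
    """List substitutions at binding positions between two aligned sequences.

    reference, other: aligned sequences of equal length (gaps as '-').
    binding_positions: 1-based alignment columns (hypothetical input).
    """
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    for pos in sorted(binding_positions):
        ref_aa, other_aa = reference[pos - 1], other[pos - 1]
        if ref_aa != other_aa:
            # Conventional notation: reference residue, position, new residue.
            substitutions.append(f"{ref_aa}{pos}{other_aa}")
    return substitutions


# Toy example; the sequences and positions are illustrative only.
reference_seq = "STIEEQAKTF"
species_seq = "STIEELAKTY"
print(binding_site_substitutions(reference_seq, species_seq, [1, 6, 10]))
# prints ['Q6L', 'F10Y']
```

Positions outside the binding set are ignored, which mirrors the restriction to binding residues described above.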

Multiple sequence alignments and associated phylogenetic trees were developed using the NCBI web-suite across all individual binding domains D1, D2, and D3 in eighteen species and D3 in *Salmo salar* [37,38].

K-means clustering: Clustering derives homogeneous subclasses within the data such that the data points in each cluster are as similar as possible according to a distance measure, here the widely used Euclidean distance. *K-means clustering* is one of the simplest and most commonly used clustering techniques [39,40]. The algorithm is described briefly below:

*Algorithm:* K-means is an iterative algorithm that partitions the feature vectors into K (pre-defined) non-overlapping clusters, such that each data point belongs to exactly one cluster [39].


In the present study, the nineteen species were clustered using *Matlab* by inputting the distance matrix derived from the feature vectors associated with the three domains of ACE2 across all species.
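The iteration can be sketched in a few lines of Python. This is not the *Matlab* routine used in the study: it is a plain Lloyd-style implementation that works directly on feature vectors with the Euclidean distance, and the function name, the random initialization, and the toy data are assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Toy feature vectors forming two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
print(toy_labels)
```

Each point receives exactly one label, which corresponds to the requirement that every data point belongs to only one cluster.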

Secondary structure predictions: The secondary structure of the full-length ACE2 sequence of each species was predicted using the web-server CFSSP (Chou and Fasman Secondary Structure Prediction Server) [41]. This server predicts secondary structure regions such as alpha-helices, beta-sheets, and turns from the amino acid sequence [41]. After the full-length ACE2 secondary structures were obtained, the regions corresponding to the individual domains D1, D2, and D3 were extracted for each species.

Bioinformatics features: Several bioinformatics features, viz. Shannon entropy, instability index, aliphatic index, charged residues, half-life, melting temperature, N-terminal of the sequence, molecular weight, extinction coefficient, net charge at pH 7, and isoelectric point, were determined for the D1, D2, and D3 domains of ACE2 of all nineteen species using the web-servers *Pfeature* and *ProtParam* [42,43].

Computational analysis of the intrinsic disorder predisposition: The per-residue propensity of the ACE2 proteins from the nineteen species for intrinsic disorder was evaluated by the PONDR® VSL2 algorithm [44,45], which is one of the more accurate stand-alone per-residue disorder predictors [46,47]. In these analyses, residues with disorder scores exceeding the threshold value of 0.5 are considered intrinsically disordered, whereas residues with predicted disorder scores between 0.2 and 0.5 are considered flexible.
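The thresholding can be illustrated with a short Python sketch. It does not run the PONDR® VSL2 predictor; it only labels a list of already computed per-residue scores. The function name, the label "ordered" for scores below 0.2, and the treatment of scores exactly equal to 0.2 or 0.5 as flexible are assumptions not stated in the text.

```python
def classify_residues(scores, disorder_cutoff=0.5, flexible_cutoff=0.2):
    """Label per-residue disorder scores using the two thresholds."""
    labels = []
    for score in scores:
        if score > disorder_cutoff:
            labels.append("disordered")
        elif score >= flexible_cutoff:
            # Assumption: both boundary values count as flexible.
            labels.append("flexible")
        else:
            # Assumption: residues below 0.2 are labeled "ordered".
            labels.append("ordered")
    return labels


# Hypothetical scores for five residues.
print(classify_residues([0.1, 0.2, 0.35, 0.5, 0.8]))
# prints ['ordered', 'flexible', 'flexible', 'flexible', 'disordered']
```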

Shannon entropy: Shannon entropy measures the amount of complexity in a primary sequence of ACE2. It was determined using the web-server *Pfeature* by the formula

$$SE = -\sum_{i=1}^{20} p_i \log_2(p_i),$$

where *p<sub>i</sub>* denotes the frequency probability of a given amino acid in the sequence [42].
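For illustration, the formula can be evaluated directly from residue counts. The sketch below is not the *Pfeature* implementation; the function name is hypothetical, and amino acids absent from the sequence are simply skipped, since they contribute nothing to the sum.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy (in bits) of the residue composition of a sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    counts = Counter(sequence)
    # p_i is the relative frequency of residue i in the sequence.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(shannon_entropy("ACDE"))  # four equally frequent residues: prints 2.0
```

A sequence made of a single repeated residue has entropy 0, and the maximum for twenty equally frequent amino acids is log2(20), about 4.32 bits.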

Instability index: The instability index, determined using the web-server *ProtParam*, estimates the stability of a protein in a test tube. A protein with an instability index below 40 is predicted to be stable, whereas a value above 40 indicates that the protein may be unstable [42].

Aliphatic index: The aliphatic index of a protein is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine). It may be regarded as a positive factor for the increase of thermostability of globular proteins, such as ACE2 [42].
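A minimal Python sketch of this quantity is given below. It assumes the commonly documented formula, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X denotes the mole percent of each residue; the function name is hypothetical and the code is not the web-server implementation.

```python
def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)

    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / n

    # Assumed coefficients: 2.9 for Val and 3.9 for Ile and Leu, i.e. the
    # side-chain volumes relative to that of Ala.
    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


print(round(aliphatic_index("AVIL"), 1))  # prints 292.5
```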

N-terminal: It was reported that the N-terminal of a protein is responsible for its function. For each domain sequence, the N-terminal residue was determined using *Pfeature* [42].

In vivo half-life: The half-life predicts the time it takes for half of the protein amount to degrade after its synthesis in the cell. The N-end rule originated from the observations that the identity of the N-terminal residue of a protein plays an essential role in determining its stability in vivo [48].

Extinction coefficients: The extinction coefficient measures how much light a protein absorbs at a particular wavelength. It is useful to estimate this coefficient when a protein is purified [48].
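As a rough illustration, the estimate at 280 nm in water can be sketched in Python under the assumption of the commonly documented formula: 1490 per tyrosine, 5500 per tryptophan, and 125 per cystine (disulfide-bonded cysteine pair), in units of M<sup>-1</sup> cm<sup>-1</sup>. The function name and the `cystines` parameter are hypothetical, and the number of disulfide bonds has to be supplied by the user.

```python
def extinction_coefficient_280(sequence, cystines=0):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    cystines: assumed number of disulfide-bonded Cys pairs (0 if reduced).
    """
    n_cys = sequence.count("C")
    if cystines < 0 or 2 * cystines > n_cys:
        raise ValueError("cystines must be between 0 and half the Cys count")
    # Assumed per-residue contributions: Tyr 1490, Trp 5500, cystine 125.
    return (1490 * sequence.count("Y")
            + 5500 * sequence.count("W")
            + 125 * cystines)


print(extinction_coefficient_280("WYYC"))  # prints 8480
```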

Polarity sequence: Every amino acid in the domains D1, D2, and D3 of ACE2 was classified as polar (P) or non-polar (Q), so that D1, D2, and D3 for eighteen species and the domain of *Salmo salar* became binary sequences over the two symbols P and Q. The homology of these sequences was then determined for each domain and, from it, a phylogenetic relationship was derived.
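The encoding step can be sketched as follows. The text does not list which residues were treated as polar, so the grouping used here (A, V, L, I, M, F, W, P, and G as non-polar, all other standard residues as polar) is an assumption based on a common textbook classification, and the function name is hypothetical.

```python
# Assumption: one common textbook grouping; the study may have used another.
NONPOLAR = set("AVLIMFWPG")
POLAR = set("STCYNQDEKRH")


def polarity_sequence(sequence):
    """Encode a protein sequence as P (polar) and Q (non-polar) symbols."""
    encoded = []
    for residue in sequence.upper():
        if residue in POLAR:
            encoded.append("P")
        elif residue in NONPOLAR:
            encoded.append("Q")
        else:
            raise ValueError(f"unexpected residue: {residue!r}")
    return "".join(encoded)


# Toy sequence for illustration only.
print(polarity_sequence("STIEEQAKTF"))  # prints PPQPPPQPPQ
```

The resulting P/Q strings for each domain could then be aligned and compared like ordinary sequences.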

**Author Contributions:** S.S.H. conceived the problem. S.S.H., V.N.U., D.A., and S.G. carried out the work. S.S.H., P.P.C., V.N.U., and G.K.A. analyzed the results and wrote the primary draft of the article; K.L., A.A.A.A., M.M.T., and P.A. edited the manuscript. B.D.U., M.S., D.P., A.S., T.M.A.E.-A., R.K., K.T., G.P., and A.M.B. critically reviewed the manuscript, and N.R., S.P.S., W.B.-d.-C., Á.S.-A., and G.C. read the final draft. All authors have agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors do not have any conflicts of interest to declare.
