**3. Discussion**

In this study, we collected the ACE2 protein sequences of nineteen species to investigate the possible transmission of SARS-CoV-2 among these species relative to human ACE2. Multiple sequence alignment of these ACE2 receptors enabled us to estimate their amino acid similarity, from which we observed that *Salmo salar* (Atlantic salmon) was quite distant. The alignment also showed that some of the amino acid substitutions at the binding residues, relative to human ACE2, replace a residue with one of similar binding properties. This suggests that these ACE2 variants interact with the RBD of the S protein much as human ACE2 does, making transmission across these species feasible. ACE2 sequences from *Homo sapiens* and *Pan troglodytes* (chimpanzee) were almost identical (99.01% sequence identity). Although ACE2 from *Macaca mulatta* (rhesus macaque) also shared high sequence identity with the human protein (95.16%), it carries substitutions at 39 positions. However, none of these substitutions affects the residues involved in the interaction with the RBD of the S protein, making viral transmission between these species highly likely. Likewise, *Pteropus vampyrus* (large flying fox) and *Pteropus alecto* (black flying fox) have exactly the same ACE2 sequence, suggesting that the virus could be readily transmitted between them and that each has an equal chance of being infected by the other.
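The identity values and substitution counts above come from pairwise comparisons over the alignment. The sketch below shows one common way to compute them, assuming the two sequences are already aligned (equal length, `-` for gaps) and that identity is expressed relative to the ungapped length of the reference (human) sequence; the study does not state which convention it used. The function names and the coarse amino acid classes used to flag "similar" substitutions are illustrative assumptions, not the authors' method.

```python
"""Percent identity and substitution listing for a pre-computed pairwise alignment.

A minimal sketch: the alignment itself is assumed to come from an external tool.
"""

# Coarse physicochemical classes used to flag "similar" substitutions.
# This grouping is a simplified, illustrative assumption, not the one used in the study.
_CLASSES = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQ",
    "glycine": "G",
    "proline": "P",
    "cysteine": "C",
}
AA_CLASS = {aa: name for name, members in _CLASSES.items() for aa in members}


def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Identical residues as a percentage of the ungapped reference length."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    ref_len = sum(1 for r in aligned_ref if r != "-")
    if ref_len == 0:
        raise ValueError("reference contains no residues")
    same = sum(1 for r, q in zip(aligned_ref, aligned_query) if r != "-" and r == q)
    return 100.0 * same / ref_len


def substitutions(aligned_ref, aligned_query, positions=None):
    """List (position, ref_aa, query_aa) differences in 1-based reference numbering.

    Columns where the reference has a gap have no reference position and are skipped.
    If `positions` is given (e.g., RBD contact residues), only those are reported.
    """
    result = []
    pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r == "-":
            continue
        pos += 1
        if r != q and (positions is None or pos in positions):
            result.append((pos, r, q))
    return result


def is_similar(a: str, b: str) -> bool:
    """True if both residues fall into the same coarse physicochemical class."""
    return a in AA_CLASS and AA_CLASS[a] == AA_CLASS.get(b)


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, not real ACE2 fragments).
    ref = "ACDEFGHIKL"
    query = "ACEEFGHVKL"
    print(percent_identity(ref, query))  # 80.0
    subs = substitutions(ref, query, positions={3, 8})
    print(subs)  # [(3, 'D', 'E'), (8, 'I', 'V')]
    print(all(is_similar(r, q) for _, r, q in subs))  # True
```

As a consistency check: human ACE2 has 805 residues, so if identity is taken relative to the human sequence and the alignment has no gaps, 39 substitutions give 766/805 ≈ 95.16%, which matches the value reported for *Macaca mulatta*.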

Further analysis led us to propose a possible transmission flow among the nineteen species, as illustrated in Figure 13. The multifaceted examination of the ACE2 protein indicated that interspecies SARS-CoV-2 transmission is quite possible, and we have tried to provide better insight into it by predicting possible transmission both among species within the same cluster and between clusters. However, further in-depth analysis will be necessary to identify new hosts of SARS-CoV-2 and to determine possible ways of preventing interspecies transmission.

The results reported in this study allow us to propose possible routes of SARS-CoV-2 transmission among species. Unsurprisingly, our results indicate that, among the species studied, primates are the most at risk, followed by carnivores, cetartiodactyls, and finally bats. It is reassuring that the transmission flow predicted from our analyses is in line with conventional evolutionary knowledge and with reported infection cases. One should keep in mind, though, that the major goal of this study was to provide comprehensive structural evidence that could help clarify why some hosts are more susceptible than others to SARS-CoV-2 and could constitute a reservoir for further virus spillover. Clearly, more detailed studies are needed to take into account the structural properties of ACE2 and the peculiarities of its interaction with the RBD of the S protein [15–18], as well as the presence of different ACE2 isoforms in individual animal species (e.g., humans have at least five ACE2 isoforms [26]). Moreover, one should consider the epigenetic regulation and the determinants of ACE2 expression (e.g., despite having the same protein sequence, ACE2 is expressed at different levels in different human cells, and different expression levels are found in the same type of nasal epithelial cells or pneumocytes from humans and mice). It will also be necessary to analyze ACE2 sequences from more species and to investigate whether these species could, in the long term, become healthy carriers of the virus or even transmitters and spreaders of the disease.

It is well known that protein pairs with sequence identity greater than 40% are very likely to be structurally similar, whereas pairs with sequence identity of 20–35% fall into a 'twilight zone' where structural similarity is considerably less common; fewer than 10% of protein pairs with sequence identity below 25% have similar structures [29–31]. The sequence identity of the ACE2 proteins from the nineteen species analyzed in this study ranges from 99.01% (*Homo sapiens* vs. *Pan troglodytes*) to 58.0% (*Homo sapiens* vs. *Danio rerio*), with the lowest identity overall, 57.13%, found between the proteins from *Danio rerio* and *Rhinolophus ferrumequinum*. One might therefore expect a rather similar overall structural organization for all these proteins, even the most distant ones. In fact, even the lowest pairwise sequence identity among these ACE2 proteins is well above the 20–35% characteristic of the 'twilight zone'. On the other hand, global, fold-level structural similarity does not exclude local structural variability that might define, for example, the peculiarities of protein–protein interactions. Structural information is currently available only for ACE2 from *Homo sapiens* and *Felis catus*. Consequently, previous studies that analyzed the interactions between the viral spike protein and ACE2 from many domestic and other animals, such as *Pan troglodytes* (chimpanzee), *Macaca mulatta* (rhesus monkey), *Felis catus* (domestic cat), *Equus caballus* (horse), *Oryctolagus cuniculus* (rabbit), *Canis lupus familiaris* (dog), *Sus scrofa* (pig), *Ovis aries* (sheep), *Bos taurus* (cattle), *Mus musculus* (house mouse), and *Mustela putorius furo* (ferret) [32–34], focused on the structural aspects of these interactions and used typical structural biology approaches such as homology modelling and docking.
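The argument in the preceding paragraph amounts to placing each observed identity relative to the thresholds quoted from [29–31]. The short sketch below makes that comparison explicit for the three identity values reported above. The constant and function names are hypothetical, and identities between 35% and 40% or below 20% are deliberately left unclassified because the text does not assign them to a zone.

```python
"""Place pairwise sequence identities relative to the thresholds quoted in the text."""

SIMILAR_ABOVE = 40.0      # > 40%: structural similarity very likely
TWILIGHT = (20.0, 35.0)   # 20-35%: "twilight zone"


def identity_zone(identity_pct: float) -> str:
    """Return a label for an identity (in percent) using only the quoted thresholds."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage in [0, 100]")
    if identity_pct > SIMILAR_ABOVE:
        return "similar fold very likely"
    if TWILIGHT[0] <= identity_pct <= TWILIGHT[1]:
        return "twilight zone"
    return "not covered by the quoted thresholds"


def margin_above_twilight(identity_pct: float) -> float:
    """Percentage points by which an identity exceeds the upper twilight-zone bound."""
    return identity_pct - TWILIGHT[1]


if __name__ == "__main__":
    # Identity values reported in the text.
    reported = {
        "Homo sapiens vs. Pan troglodytes": 99.01,
        "Homo sapiens vs. Danio rerio": 58.0,
        "Danio rerio vs. Rhinolophus ferrumequinum": 57.13,
    }
    for pair, pct in reported.items():
        print(f"{pair}: {pct:.2f}% -> {identity_zone(pct)}, "
              f"{margin_above_twilight(pct):.2f} points above 35%")
```

Even the most distant pair (57.13%) lies about 22 percentage points above the upper bound of the twilight zone.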
In line with our previous study [25], we therefore chose to compare the per-residue intrinsic disorder predispositions of the ACE2 proteins from the nineteen species analyzed in this study rather than building their homology models. Figure 14 summarizes the results of this analysis and shows that, although these proteins have rather similar intrinsic disorder predispositions, their disorder profiles are not identical.

Furthermore, these differences in intrinsic disorder predisposition are not evenly distributed along the protein sequences: some regions of the disorder profiles (e.g., the N-terminal 150 residues and residues 500–700) show rather noticeable variability. Figure 14 also shows that the S protein binding domains D1 and D2 of ACE2 are characterized by highly variable intrinsic disorder predispositions, whereas the D3 domains are more conserved. We also examined the intrinsic disorder profiles of ACE2 proteins in the six clusters, focusing on the S protein binding domains D1, D2, and D3 (see Figure 15).

This comparison revealed that, as a rule, the within-cluster variability of intrinsic disorder propensity was noticeably lower than the variability between clusters. These observations support the notion that the capability of ACE2 to interact with the SARS-CoV-2 S protein may depend on the peculiarities of the local intrinsic disorder predisposition of ACE2 [25].
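The region-level observations above (D1 and D2 more variable than D3; lower variability within clusters than between them) can be quantified from the per-residue disorder profiles. The sketch below is one way to do so, assuming every profile (e.g., PONDR VSL2 scores) has already been mapped onto a common, human-based residue numbering. The region boundaries are those given in the Figure 14 caption; the choice of metrics (across-species standard deviation per residue, and mean absolute difference between profiles), the function names, and the toy data and cluster labels are all illustrative assumptions rather than the procedure used in the study.

```python
"""Region-level comparison of per-residue disorder profiles (a sketch).

Assumes every profile (scores in [0, 1]) is indexed by the same residue numbering.
"""
from itertools import combinations
from statistics import mean, pstdev

# ACE2 regions from the Figure 14 caption: 1-based, inclusive residue ranges.
DOMAINS = {"D1": (24, 42), "D2": (79, 84), "D3": (330, 393)}


def _check_common_length(profiles):
    if len({len(p) for p in profiles.values()}) != 1:
        raise ValueError("profiles must share a common residue numbering")


def domain_variability(profiles, domains=DOMAINS):
    """Mean (over residues) of the across-species standard deviation, per region."""
    _check_common_length(profiles)
    spread = [pstdev(column) for column in zip(*profiles.values())]
    return {name: mean(spread[start - 1:end]) for name, (start, end) in domains.items()}


def window_distance(p, q, start, end):
    """Mean absolute difference of two profiles over residues start..end (1-based, inclusive)."""
    return mean(abs(a - b) for a, b in zip(p[start - 1:end], q[start - 1:end]))


def within_between(profiles, clusters, start, end):
    """Return (mean within-cluster distance, mean between-cluster distance) for one region."""
    _check_common_length(profiles)
    within, between = [], []
    for a, b in combinations(sorted(profiles), 2):
        d = window_distance(profiles[a], profiles[b], start, end)
        (within if clusters[a] == clusters[b] else between).append(d)
    if not within or not between:
        raise ValueError("need >= 2 clusters and at least one cluster with >= 2 members")
    return mean(within), mean(between)


if __name__ == "__main__":
    # Hypothetical toy profiles of 400 residues: constant 0.3 except in D1 and D2,
    # where each species gets its own score. Cluster labels are also hypothetical.
    def toy_profile(score):
        profile = [0.3] * 400
        for i in list(range(23, 42)) + list(range(78, 84)):  # residues 24-42 and 79-84
            profile[i] = score
        return profile

    profiles = {"s1": toy_profile(0.20), "s2": toy_profile(0.25),
                "s3": toy_profile(0.70), "s4": toy_profile(0.75)}
    clusters = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}

    print(domain_variability(profiles))  # D1 and D2 vary across species; D3 is 0.0
    start, end = DOMAINS["D1"]
    w, b = within_between(profiles, clusters, start, end)
    print(f"D1: within = {w:.3f}, between = {b:.3f}")  # D1: within = 0.050, between = 0.500
```

Other distance measures (e.g., root-mean-square difference) could be substituted in `window_distance` without changing the rest of the comparison.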

**Figure 14.** Per-residue intrinsic disorder predisposition of ACE2 proteins. (**A**) Intrinsic disorder distribution along the amino acid sequences of the ACE2 proteins from the nineteen species analyzed in this study. Light gray vertical bars show the locations of the ACE2 regions responsible for interaction with the SARS-CoV-2 S protein: domains D1 (residues 24–42), D2 (residues 79–84), and D3 (residues 330–393). (**B**–**D**) Zoomed-in disorder profiles focusing on domains D1 (**B**), D2 (**C**), and D3 (**D**), which are responsible for the ACE2–S interaction. Disorder predispositions were evaluated using the PONDR® VSL2 algorithm.

**Figure 15.** Peculiarities of intrinsic disorder predisposition within the D1 (**A**,**D**,**G**,**J**,**M**, and **P**), D2 (**B**,**E**,**H**,**K**,**N**, and **Q**), and D3 (**C**,**F**,**I**,**L**,**O**, and **R**) domains of ACE2 proteins from cluster 1 (**A**–**C**), cluster 2 (**D**–**F**), cluster 3 (**G**–**I**), cluster 4 (**J**–**L**), cluster 5 (**M**–**O**), and cluster 6 (**P**–**R**). For keys, see Figure 14.
