4.1. Principal Component Analysis
Multiple sequence alignment was used to develop and test hypotheses based on protein structure, function, and phylogenies. High-quality alignments from large datasets are challenging to generate and interpret due to risks of misalignments, computational burden, and the multivariate nature of peptide sequences [
18,
20]. In this study, FFT and PCA were explored to help with these problems and to ultimately investigate conotoxin divergence patterns based on peptide sequences available in ConoServer.
Feature extraction and sorting revealed that the conopeptides in each gene superfamily were isolated from cone snails of three main diet types, defined by the prey they subdue: piscivorous species that immobilize and gulp fish, molluscivorous species that feed on other gastropods, and vermivorous species that feed on polychaete worms [
16]. Generalists are available in ConoServer but were omitted to increase the specificity of the pattern analysis. Strikingly, several gene superfamilies are diet-specific, meaning that these groups only contain peptides that target one diet type. It was found that peptides from the G2, I3, K, Q, Y, and R gene superfamilies are only expressed in vermivorous cone snails, while conodipine is expressed only in piscivorous cone snails (
Table 1). Our findings are consistent with previous reports in which several novel conotoxins were isolated only from vermivorous and piscivorous species [
33,
34].
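The diet-specificity check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `(superfamily, diet)` record layout and the placeholder names `SF1` and `SF2` are hypothetical, and the toy records are not ConoServer data.

```python
from collections import defaultdict

def diet_specific_superfamilies(records):
    """Return {superfamily: diet} for superfamilies seen with exactly one diet type.

    `records` is an iterable of (superfamily, diet) pairs; this layout is an
    assumption made for illustration, not the ConoServer export format.
    """
    diets_by_family = defaultdict(set)
    for superfamily, diet in records:
        diets_by_family[superfamily].add(diet)
    # Keep only the superfamilies whose peptides all come from one diet type.
    return {
        family: next(iter(diets))
        for family, diets in diets_by_family.items()
        if len(diets) == 1
    }

# Toy records with placeholder superfamily names (not real data).
toy_records = [
    ("SF1", "vermivorous"),
    ("SF1", "vermivorous"),
    ("SF2", "piscivorous"),
    ("SF2", "molluscivorous"),
]
print(diet_specific_superfamilies(toy_records))
```

In this toy input only `SF1` is reported as diet-specific, because `SF2` contains peptides from two diet types.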
To assess superfamilies that house conopeptides targeting more than one diet type, NJTs and PCA distinguished peptide sequence similarities based on diet preference using the BLOSUM62 scoring matrix. NJTs were used for calculating and visualizing gene superfamilies with a small number of samples because PCA failed to display the appropriate similarity space or proximity, as similar sequences tend to lie far from each other in space, such as in the case of the C superfamily (
Table 2) which shows similarity in the NJT dendrogram (
Figure 3). The distances between points of similar sequences appeared to be inflated because other components were absent. The NJT dendrograms revealed that peptide sequences within the same diet type lie closer to one another than to sequences from other diet types (see divergent MSTLGMTLL and insulin in
Figure 3). In some cases, sequences within the same diet type are located in the same node (see C, vermivorous; insulin, molluscivorous in
Figure 3), implying divergence events based on diet type. Both patterns suggest sequence similarities in peptides isolated from cone snails with the same diet.
Principal component analysis was used to examine the data structure of the complex peptide sequences within superfamilies to investigate the role of diet preference in conotoxin divergence. PCA revealed prominent clustering in the similarity spaces (
Table 2). The components, which are data features of the peptide sequences, were generated by an eigenvector decomposition formed from the sum of the BLOSUM62 scores at the aligned position between each pair of sequences. In this study, the score matrix size can be huge, ranging up to a 100 × 100 matrix and more for large gene superfamilies. Interpreting a large data matrix can be tedious. Hence, to maintain objectivity, PCA was used to reduce dimensions to find directions of differences and visualize them by using the axes [
20]. The PCA algorithm was tested using an artificial dataset composed of highly similar venom peptide sequences from six different taxa. The PCA plots showed positive clustering per taxon, as expected (
Figure 2). This suggests that the algorithm [
23,
28] satisfies objectivity and reproducibility [
20]. PCA of large gene superfamilies exhibited positive clustering based on diet types. The findings imply there is little variation in conopeptide sequences from the same diet group, as evidenced by the PCA plots for each gene superfamily (
Table 2). It is noteworthy that the peptide sequences in each gene superfamily originated from different species of cone snails, meaning the vermivorous peptide samples in the A superfamily (
Table 2) originated from multiple species. Hence, peptide sequence similarity bias based on where they are expressed can be neglected.
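The decomposition described in this paragraph can be sketched as follows. This is a minimal sketch, not the implementation used in the study: it assumes NumPy is available, uses a simple identity score as a stand-in for BLOSUM62 so that the example stays self-contained, and scales each eigenvector by the square root of its eigenvalue, which is also an assumption. The function names and the toy alignment are hypothetical.

```python
import numpy as np

def pairwise_score_matrix(aligned, score):
    """Sum column-wise substitution scores for every pair of aligned sequences."""
    n = len(aligned)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            total = sum(score(a, b) for a, b in zip(aligned[i], aligned[j]))
            S[i, j] = S[j, i] = total
    return S

def pca_coordinates(S, n_components=2):
    """Project sequences onto the leading eigenvectors of the symmetric matrix S."""
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Scaling by sqrt(eigenvalue) is an assumption; negative values are clipped to 0.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def identity_score(a, b):
    """Stand-in for BLOSUM62: 1 for identical residues, 0 otherwise or for gaps."""
    return 1.0 if a == b and a != "-" else 0.0

# Toy alignment: two pairs of near-identical sequences (not real peptides).
toy_alignment = ["AAAA", "AAAC", "WWWW", "WWWY"]
coords = pca_coordinates(pairwise_score_matrix(toy_alignment, identity_score))
print(coords)
```

In a real analysis, `score` would look up BLOSUM62 values instead, for instance from the matrix returned by Biopython's `substitution_matrices.load("BLOSUM62")`. In the toy case, the two similar sequences in each pair land on the same point, while the two pairs are separated.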
Sequence similarity based on diet type has far-reaching implications because it bears on the role of diet preference in the evolutionary divergence of conotoxins. Diet has long been thought to be a significant driving force shaping venom composition and evolutionary patterns because venom is closely tied to a species’ ability to capture prey [
13]. Inside prey are target receptors where conotoxins bind and act to reach specific physiological endpoints [
1]. PCA revealed data structures and patterns in peptide sequences grouped by gene superfamily. It is important to note that gene superfamily classification is based on signal sequence and cysteine framework similarities, whereas the pharmacological families nested within it classify conotoxins by the specific target receptor they bind [
32]. Hence, from a structure-function standpoint, peptide sequences in each gene superfamily are expected to share significant similarities. Interestingly, the data structure within gene superfamilies showed peptide sequence similarities based on cone snail diet preference. These findings may imply that conotoxins diverged to increase their affinity for, and fit to, target receptors found in polychaete worms, fish, or mollusks. Furthermore, hypermutation of the M-regions presumably fine-tunes conotoxins to increase their binding affinity for receptors in their specific prey. There is evidence of structural differences between nicotinic acetylcholine receptors (nAChRs) in invertebrates (e.g., nematodes and polychaetes) and vertebrates (Figure A2). In fire worm-eating (vermivorous) cone snails, the venom contains many peptides specific to the homopentameric homolog of the nAChR, which has only alpha subunits (Figure A2). The abundance of highly specific conotoxins in worm-hunting cone snails is presumed to serve to target the all-alpha subunit nicotinic receptor found at the neuromuscular junction of fire worms [
35].
Conotoxin divergence based on diet can be beneficial for binding to receptors in prey. Environmental fluctuations (e.g., sudden climate changes or catastrophes) bring rapidly changing prey, predators, and competition, which force conotoxins to adapt quickly by diverging; diversification thus appears to be a mechanism for evolutionary success [
1,
36]. In conotoxins, venom component diversity plays a crucial role in receptor binding to reach the desired physiological endpoints. Olivera described this as the combinatorial library search strategy, an optimum evolutionary technique of cone snails to evolve new peptides in their venom to generate neuropharmacological diversity [
1]. Early accounts revealed that multiple conotoxins play a role in producing the desired physiological response in prey capture, as individual venom components did not produce the same effect as crude venom. This indicates that conopeptides work together in groups (or cabals) to reach a physiological endpoint [
1,
37]. Hence, increased spontaneous mutation rates in the M-regions of conotoxins are beneficial because they enrich the combinatorial pool of conopeptides to achieve the appropriate formulation to adapt to a particular hunting situation [
6,
38]. It appears that cone snails evolved to fine-tune their conopeptide armory to respond to evolutionary pressures by producing powerful cabals of peptides selective for the cone snail’s prey of choice [
3].
Lastly, several unusual patterns emerged from the data. Most peptides in the database were isolated from the Indo-Pacific region. Strikingly, the molluscivorous cone snails represented in the dataset are found exclusively in the Pacific region (
Table 1). This result suggests that venom variation may be affected by environmental conditions. Climate, seasonal changes, and temperature have been reported to influence venom variation in scorpions and snakes [
39,
40]. Future studies on this topic can be pursued to establish the role of the geographical distribution in cone snail venom composition and diversity.
4.2. Structural Element Analysis
Prey shifts can accelerate conotoxin diversity. Due to environmental fluctuations, cone snails must adapt to rapidly changing predators, prey, and competition. Food resource utilization is among the critical evolutionary events that can lead to biodiversity on Earth, and these shifts open opportunities for studying the underlying molecular changes [
41]. Morphological observations and sequencing efforts indicate a vermivorous ancestry that evolved into molluscivorous and piscivorous diets. These evolutionary reconstructions based on curated databases show that ancestral cone snails preyed on marine worms and evolved the capacity to prey on mollusks and fish [
26,
27]. These findings align with the molecular phylogenetic analyses of Puillandre et al. and Aman et al., which suggest that two separate events in the Miocene epoch gave rise to the fish-hunting and mollusk-hunting cone snail lineages [
42,
43]. These prey shifts led to a series of adaptive radiations that continue to the present, as evidenced by fossil records showing few fish-hunting and mollusk-hunting cone snail species in the geologic past, by anatomical specialization, and now by the increasing role of venom specialization [
44,
45].
All cone snails use their venom as the primary weapon for prey capture. In our previous data mining study [
7], sequences in each superfamily clustered by diet type; for example, conopeptides from mollusk-hunting (molluscivorous) cone snails are more similar in sequence to one another than to conopeptides from other diet groups. These findings opened opportunities for identifying diet-based subgroups that may play functional roles in conotoxin affinity for target receptors.
Molecular analyses revealed an interesting pattern when sequences from the α-conotoxin pharmacological family were aligned and analyzed. All samples showed the type I cysteine framework (-CC--C--C--) that is known to potently and selectively target nAChR subtypes [
32,
46]. However, within the α-conotoxin family, subgroups were apparent based on structural similarity. Conopeptides from fish-hunting (piscivorous) cone snails have fewer intervening amino acids between the first/second and third cysteine residues, as indicated by the misalignment of these positions (
Figure 4). Furthermore, conopeptides from worm-hunting (vermivorous) cone snails have fewer intervening amino acids between the third and fourth cysteine residues, as indicated by the gaps near the C-terminus. In contrast, conopeptides from molluscivorous cone snails show full occupancy between cysteine residues (
Figure 2). These results suggest that the number of intervening amino acids between cysteines may play a functional role in conopeptides and, unexpectedly, that it varies by diet subgroup. Most strikingly, these classifications were observed previously in the structural determinant study by Gomez et al., although that study did not identify what caused the subgrouping [
11]. Their analysis revealed that type I α-conotoxins fall into at least three distinct categories (as in
Figure A3). Peptides from different subgroups showed dissimilar α-conotoxin backbones, indicated by gaps in their structural alignment. Our results are similar to their findings; however, we identified that these subgroups correspond to cone snail diet types. Notice that in
Figure A3, group 1.1 contains peptides from molluscivorous cone snails, such as TxIA from
C. textile; group 1.2 contains peptides from vermivorous cone snails, such as RgIA from
C. regius; and group 1.3 contains peptides from piscivorous cone snails, such as GI from
C. geographus. These three subgroups thus correspond to the diet preference of the source species, indicating that insertions and deletions between cysteine residues may have contributed to venom specificity.
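The loop-length comparison discussed above can be illustrated with a short Python sketch that counts the residues between cysteines in a CC-C-C framework. The function name is hypothetical, and the mature sequences for GI, RgIA, and TxIA are given as commonly reported in the literature; they are included only for illustration and should be checked against ConoServer before reuse.

```python
def framework_i_loops(sequence):
    """Return (loop1, loop2) residue counts for a CC-C-C (framework I) peptide.

    loop1 is the number of residues between the second and third cysteine,
    loop2 the number between the third and fourth. Alignment gaps ("-") are
    removed before counting.
    """
    residues = sequence.replace("-", "").upper()
    cys = [i for i, aa in enumerate(residues) if aa == "C"]
    # Framework I requires four cysteines, the first two adjacent.
    if len(cys) != 4 or cys[1] != cys[0] + 1:
        raise ValueError("sequence does not match the CC-C-C framework")
    return cys[2] - cys[1] - 1, cys[3] - cys[2] - 1

# Sequences as commonly reported; verify against ConoServer before reuse.
examples = {
    "GI (C. geographus, piscivorous)": "ECCNPACGRHYSC",
    "RgIA (C. regius, vermivorous)": "GCCSDPRCRYRCR",
    "TxIA (C. textile, molluscivorous)": "GCCSRPPCIANNPDLC",
}
for name, seq in examples.items():
    print(name, framework_i_loops(seq))
```

For these three inputs the function returns (3, 5), (4, 3), and (4, 7), respectively: the piscivorous example has the shortest first loop, and the vermivorous example has the shortest second loop.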
Structural element analysis revealed positions containing specificity residues located between the third and fourth cysteine in a type I cysteine framework (
Table 3). The tests identified that positions 12, 13, 15, and 16 contain structural elements that are nonoverlapping and statistically reliable based on their high SH score and z-score (
Table 3). Furthermore, their high multi-relief weights indicate that these positions are conserved within subgroups but divergent between subgroups. These findings may have identified structural elements critical to functional specificity, providing insights into why these conopeptides group by, and act on, specific prey types. Overall, it was found that α-conotoxins have structural homology within the same diet subgroup but are divergent from other diet subgroups.
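The idea of positions that are conserved within subgroups but divergent between them can be illustrated with a per-column Jensen-Shannon divergence. This is a simplified stand-in, not the Sequence Harmony or multi-Relief implementation used in the study, and its scale should not be equated with the SH scores in Table 3. The function names and the toy alignments are hypothetical, and gaps are treated as an ordinary symbol.

```python
from collections import Counter
from math import log2

def _frequencies(column):
    """Relative frequency of each symbol in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def _entropy(freqs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)

def column_divergence(group_a, group_b):
    """Jensen-Shannon divergence (bits) per alignment column between two groups.

    0 means identical residue composition in both groups; 1 means the two
    groups share no residues at that column.
    """
    scores = []
    for col_a, col_b in zip(zip(*group_a), zip(*group_b)):
        p, q = _frequencies(col_a), _frequencies(col_b)
        mix = {aa: 0.5 * (p.get(aa, 0.0) + q.get(aa, 0.0)) for aa in set(p) | set(q)}
        scores.append(_entropy(mix) - 0.5 * (_entropy(p) + _entropy(q)))
    return scores

# Toy aligned groups (not real peptides): only the last column differs.
group_a = ["CAP", "CAP"]
group_b = ["CAD", "CAD"]
print(column_divergence(group_a, group_b))
```

Columns where each group is internally conserved but the groups disagree score close to 1, which is the pattern the paragraph above describes for the positions between the third and fourth cysteines.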
Strong conservation patterns are most prominent in molluscivorous peptides. The tests revealed highly conserved hydrophobic, hydrophilic, and small residues between the third and fourth cysteines. In contrast, the prominence of gaps reflects fewer intervening amino acids between cysteine residues in piscivorous and vermivorous peptides.