Article

Lessons from Deep Learning Structural Prediction of Multistate Multidomain Proteins—The Case Study of Coiled-Coil NOD-like Receptors

by Teodor Asvadur Șulea, Eliza Cristina Martin, Cosmin Alexandru Bugeac, Floriana Sibel Bectaș, Anca-L Iacob, Laurențiu Spiridon and Andrei-Jose Petrescu *
Department of Bioinformatics and Structural Biochemistry, Institute of Biochemistry of the Romanian Academy, Splaiul Independentei 296, 060031 Bucharest, Romania
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(2), 500; https://doi.org/10.3390/ijms26020500
Submission received: 27 November 2024 / Revised: 3 January 2025 / Accepted: 7 January 2025 / Published: 9 January 2025

Abstract

We test here the prediction capabilities of the new generation of deep learning predictors in the more challenging situation of multistate multidomain proteins, using as a case study a coiled-coil family of Nucleotide-binding Oligomerization Domain-like (NOD-like) receptors from A. thaliana and a few extra examples for reference. Results reveal a truly remarkable ability of these platforms to correctly predict the 3D structure of modules that fold in well-established topologies. A lower performance is noticed in modeling morphing regions of these proteins, such as the coiled coils. Predictors also display good sensitivity to local sequence drifts, which is reflected in the modeling solution proposed for the overall modular configuration. In multivalued 1D to 3D mappings, the platforms display a marked tendency to model proteins in the most compact configuration and must be retrained by information filtering to drive modeling toward the sparser ones. Bias toward order and compactness is seen at the secondary structure level as well. All in all, using AI predictors for modeling multidomain multistate proteins when global templates are at hand is fruitful, but the above challenges have to be taken into account. In the absence of global templates, a piecewise modeling approach with experimentally constrained reconstruction of the global architecture might give more realistic results.

1. Introduction

There is a general consensus that structural biology has entered a new era with the advent of the new generation of deep learning predictors such as AlphaFold and RoseTTAFold [1,2,3,4,5,6]. However, how these predictions relate to the complex reality that proteins face in their cellular environment is still an open question. Proteins are complex molecular objects that undergo a tightly controlled structural progression in the cell during their turnover, from nascent to native to denatured and, finally, degraded states. Only ~5% of them are able to refold in solution, while the rest need interactions with chaperones to guide their folding into the native state, meaning that their structure is intrinsically constrained and depends on the environment [7,8]. Moreover, between 60% and 80% of a proteome comprises sequences with multiple structurally and functionally distinct modules, including intrinsically disordered regions [9,10], with their complexity increasing along the tree of life [11], as such modular connectivity ensures an optimized hierarchical bundling of molecular functions, driven by a significant increase in local encounter rate [12,13]. Delineating domains is not always trivial, as only a few multidomain proteins fold in a discrete, beads-on-a-string manner, while the rest display more or less extensive interdomain interfaces and intricate, hard-to-distinguish connecting regions, generating an almost continuous domain overlap in the folding space [14]. This causes significant delineation differences between the main domain classification methods: CATH [15], SCOP [16], SCOPe [17] and ECOD [18]. Further complications arise from the fact that the functional structure of many multidomain proteins depends on the integration or exchange of cofactors [19] or on external interactors inducing drastic structural transitions during their functional cycle, for instance, in processes such as signal transduction or viral cell entry [20,21].
Thus, it is legitimate to ask what the limits of the new deep learning technologies are in protein structural prediction.
A good example for assessing AI prediction capabilities on more complex multidomain proteins is the family of coiled-coil NOD-like (CNL) receptors on which, over the past decade, we gained significant insight using a computationally aided experimental approach [22,23,24,25,26,27,28,29,30,31,32]. CNL receptors are involved in the so-called effector-triggered immunity (ETI) and display many of the above-mentioned complexities. The ‘core’ of a CNL sequence contains three main modules: a ‘signaling/connector’ CC domain, an NBD ‘switch’ and an LRR ‘sensor’. Of these, NBD further consists of two blocks, NBS-Arc1 and Arc2, with Arc2 undergoing a 180° rotation with respect to NBS-Arc1 upon CNL activation, induced by interaction with a pathogen effector protein. This further generates an overall change of the receptor structure accompanied by a cofactor exchange from ADP in the inactive state to ATP in the active state. In some CNLs, extensions of this ‘core’ region, referred to as ‘integrated domains’, may arise toward both the N- and C-terminal ends [33]. Recently, the structure of one member of this family from Arabidopsis thaliana, HOPZ-ACTIVATED RESISTANCE1 (ZAR1), has been solved in its inactive, intermediate and final active states, the last of which forms a pentameric resistosome that protrudes into the cell membrane and leads to cell death [34,35]. However, even if all CNLs maintain the same canonical domain organization, other members of this family were shown to perform other functions [36], such as those related to defense gene activation via interactions with transcription factors [37] or to immune signaling by various types of sensor and helper CNLs that (re)act as singletons, pairs or networks [38].
Herein, we use the best-known AI prediction methods, AlphaFold2 (AF2) [39], AlphaFold3 (AF3) [40] and RoseTTAFold All-Atom (RFAA) [41], to build 3D models of 32 representatives selected from the CNL family of Arabidopsis thaliana and analyze them both by assessing their standalone quality and by comparing them with the up-to-date available experimental data.

2. Results

2.1. Gauging the Modeling Performance of AI Platforms by Using ZAR1-Solved Structures

As the structure of the A. thaliana ZAR1-CNL receptor was recently solved by cryogenic electron microscopy (Cryo-EM) in its inactive, transition and active states, the predictive capacity of AF2, AF3 and RFAA was assessed directly by comparing models obtained in various training conditions (described under Section 4) with the experimental data, both globally and on a domain-by-domain basis.

2.1.1. Model Performance at Local Domain Level

At the domain level, the Cα RMSDs of all models against the active and inactive experimental structures are shown in Table 1 and Table 2, respectively. It is interesting to note that while predictions of NBD and LRR were, in all cases, of very high quality, most models presented severe departures from the experimental data in the CC domain region, with RMSDs of over 12 Å. Only by using a trivial “AF2—Active/Inactive control” workflow can a better representation of all ZAR1 modules be obtained. However, this is not significant, since this workflow only uses ZAR1 active/inactive templates to train the local implementation of AF2 and eliminates the multiple sequence alignment (MSA) step that AF2 performs by default. When the MSA step is allowed as well, the prediction quality of the CC severely degrades to the same level as the rest of the models.
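The domain-level Cα RMSD comparison described above can be illustrated with a minimal sketch. It assumes that the Cα coordinates of a model and of the experimental structure have already been extracted and matched residue by residue, and it uses the standard Kabsch superposition; the function names and the domain index ranges passed in are hypothetical and are not taken from the workflow used in this study.

```python
import numpy as np


def kabsch_rmsd(model_xyz, ref_xyz):
    """C-alpha RMSD (same units as input) after optimal rigid-body superposition."""
    P = np.asarray(model_xyz, dtype=float)
    Q = np.asarray(ref_xyz, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("both inputs must be (N, 3) arrays of matched atoms")
    # Remove translation by centering both coordinate sets.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch: SVD of the covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))


def domain_rmsd(model_xyz, ref_xyz, domains):
    """RMSD per domain; `domains` maps a name to a slice of residue indices (assumed)."""
    model_xyz = np.asarray(model_xyz, dtype=float)
    ref_xyz = np.asarray(ref_xyz, dtype=float)
    return {name: kabsch_rmsd(model_xyz[sl], ref_xyz[sl]) for name, sl in domains.items()}
```

Superposing each domain separately, as `domain_rmsd` does, keeps a large rigid-body displacement between domains from inflating the local RMSD, which is why the domain-level and global values are reported separately.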
At closer inspection, the local modeling solution proposed by all platforms for the CC region is basically a four alpha-helix bundle, which mixes up segments modeled according to the active and inactive state templates, respectively, as shown in Figure 1.
Mapping the secondary structure over the LRR domain indicates that all models accurately reproduce most of the structural features. As can be seen from Figure 2, the extended beta strands forming the beta-sheet ventral structure of the LRR domain consistently match those derived from Cryo-EM data. In contrast, on the dorsal side of the horseshoe, all models display a highly biased helical tendency when compared to the experimental data. Coming back to the ventral region, it is interesting to note that not all beta strands forming the beta-sheet network correspond to typical LxxLxL motifs, as is the case for those found in the regions 533-RGVVSTT-538, 794-EGLMLSS-800 and 840-RGGVWMK-846.

2.1.2. Prediction Performance at the Global Architectural Level

The global RMSDs of the highest-ranked models are presented in Table 3. As expected, the lowest RMSDs were shown by the trivial “AF2—Active/Inactive control” models due to the severe restrictions imposed by the input filtration and curation of structural information. Also close to the experiment, as can be seen, was the so-called “AF2—Active MSA” model, which, besides relying only on the active-specific templates, also included the AF2 MSA step. This is significant, as it indicates that feeding AI modeling with specific structural 3D information might improve, in some cases, the prediction quality of the global architecture of multistate multidomain proteins, at least in high sequence homology cases.
As can also be seen from Table 3, when based on the default training input (i.e., based on the complete structural information including all NBD templates in any of their active, inactive or transition states), all AF2-DB, all-AF3s and all-RFAAs models were closer to the inactive ADP compact domain configuration (<6 Å) rather than to the active ATP model (>20 Å) in which the domains were sparse (Figure 3).
Moreover, the ~6 Å RMSD is largely due to the CC modeling problems, as discussed above, rather than to departures from the overall configuration. It is highly remarkable that even when modeling ZAR1 in the presence of ATP, which is specific to the ‘open’ active state, both AF3 and RFAA model the protein in its inactive ADP compact configuration. These models feature the ATP correctly located in its binding site but placed in the wrong overall architecture, which means that ligand information is not taken into account in modeling the protein moiety.
Model stability was tested by molecular dynamics (MD) simulations. To this end, all models were first subjected to energy optimization, and those showing the lowest final potential energy were selected for classical MD runs, as described in Section 4. None of the selected models unfolded during this test or displayed any significant local structural departures, while the RMSD remained low, similar to that of the experimental models, as can be seen from Supplementary Table S3.
Gauging the modeling performance in interdomain interface predictions is also significant. As discussed, except for the AF2 filtered workflows designed for targeted active/inactive modeling, the default AI platforms modeled ZAR1 in the ‘closed’, inactive ADP state, even when modeled in the presence of the active state ATP ligand. Detailed analysis indicates that all models reproduced the experimental interdomain interfaces of this compact state well. Interestingly, even if the local modeling solution proposed for the CC highly diverges from the experiment, all platforms correctly predict its interface with the LRR domain. The agreement between models and the inactive ADP experimental structure is indicated by MD-based free energy estimations and, in more detail, by the interdomain contact histograms shown in Table 4 and Figure 4, respectively. The slightly tighter (~5–10%) binding shown by models vs. experiment results from an average 16% increase in the number of contacts, which also indicates a model tendency toward compacting the structures.
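As an illustration of how interdomain contacts of the kind discussed above can be counted and compared between a model and an experimental structure, a minimal sketch is given below. The contact definition used here (any pair of atoms from the two domains within a distance cutoff) and the 8 Å default are assumptions made for the example only; the contact criterion actually used for Figure 4 is not restated here.

```python
import numpy as np


def count_interdomain_contacts(xyz_a, xyz_b, cutoff=8.0):
    """Number of atom pairs (one atom per domain) closer than `cutoff` (assumed, in Å)."""
    a = np.asarray(xyz_a, dtype=float)
    b = np.asarray(xyz_b, dtype=float)
    # Pairwise distance matrix between the two domains, shape (len(a), len(b)).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return int(np.count_nonzero(dist <= cutoff))


def relative_increase(n_model, n_experiment):
    """Fractional change in contact count of a model relative to the experiment."""
    if n_experiment <= 0:
        raise ValueError("experimental contact count must be positive")
    return (n_model - n_experiment) / n_experiment
```

With these definitions, a model showing 116 contacts where the experiment shows 100 would give a relative increase of 0.16, which is the kind of quantity behind the average 16% figure quoted above.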

2.2. Modeling the CNL Set

2.2.1. Sequence Selection

The original set of 1257 sequences retrieved from NLRscape [42] was subjected to a filtering procedure described in Section 4. In the first round, sequences lacking NBD and those with TIR signatures at the N-terminal end were eliminated. This reduced the set to 376 predicted CNLs. In the second step, any sequence lacking any of the nine canonical NBD motifs was eliminated using NLRexpress [43], resulting in a set of 256 predicted fully functional CNL sequences. This set was further subjected to clustering at 70% sequence identity and 70% sequence coverage using MMSeqs2 v15.0 [44], resulting in 36 groups, given in Table 5. From each cluster, the most cited and experimentally used sequence was selected as representative and was used further. However, four clusters were eliminated from the analysis: the first (#16) because its members have no citations and present extra marginal domains; the second (#21) because the cited member has extra domains; the third (#25) because its sequences are unusually short (<675 aa); and the fourth, represented by sequence A0A5S9WPD4, because it is missing parts of the CC and LRR domains with respect to A0A1P8AP86, with which it shares > 75% identity on the rest of the common frame.
With these, the set of 32 A. thaliana CNL representatives shown in Table 5 were retained for further analysis.
The similarity tree and the similarity matrix of this set are shown in Figure 5, while the annotated global alignment is available in Supplementary Figure S1. As can be seen, the tree has two main and even branches corresponding to two main types of CC domain: a group of 16 EDVID motif CC type CNLs and a group of 16 RPS5-like CC type helper CNLs. Interestingly, the first ‘EDVID’ branch is sparser, with similarity levels varying from 20 to 40–50%, while the ‘RPS5’ branch is far more compact, with all similarity levels higher than 40%.

2.2.2. Model Generation

The models retrieved from the AF2 Database.v4 (latest release at the time of writing), those from the AlphaFold3 Server and from the NeuroSnap [45] implementation of RoseTTAFold All-Atom, along with all models generated by the local deployment of AlphaFold2 are available for download at https://github.com/Teohoho/CNL_Paper_Data (created on 26 November 2024).

2.2.3. Model Refinement and Analysis

All models were minimized using the default L-BFGS algorithm as implemented in OpenMM [46]. Pre-optimization and post-optimization model quality was assessed with MolProbity [47], and the results are shown in Supplementary Tables S1 and S2. The overall distribution of pre- and post-minimization Clash Score for a subset of the models is shown in Figure 6. As can be seen, global energy minimization results in a significant quality improvement in the case of AF2 and AF3 models, while the NeuroSnap-generated RFAA models still lag behind with a large Clash Score, indicating potential entanglements and significant compactness. As RFAA models display this trend both in the presence and absence of ligands, these small molecules are unlikely to be the root cause of the poor scores of the NeuroSnap RFAA models.
In trying to rescue these models, simulated annealing was used to further optimize them. Only some of these models were able to withstand this step, while attempting to heat the rest led to severe numerical instability, making them unfixable via the described annealing protocol. For this reason, the RFAA models were not simulated further.
For the highest-ranked AF2 and AF3 models, stability was evaluated using GCHMC simulations performed with Robosample [48], as described in Section 4. The highest RMSD (Å) obtained during the simulation is presented in Supplementary Table S4. Results indicate that all CNL models are stable, do not unfold globally or locally and are confined in a configurational basin not larger than 2 Å RMSD, on average.

2.2.4. Conformation Preference

As described under Section 4, a simplified representation of the CNL configuration was set for each model in a reference system centered on the alpha carbon of the glycine of the “VVG” NBD entry motif, with the xOy plane defined by this origin and the NBS and Arc1 domain mass centers. This plane consistently separates the ADP-inactive and ATP-active NBD configurations. Figure 7 displays the distribution of CC, Arc2 and LRR1-5 mass centers for a subset of models. The full domain assignment is given in Supplementary Table S3. As can be seen, all models on the ‘EDVID’ CNL branch are generated in either ADP-inactive or ATP-active configurations with a bias toward the compact ADP-inactive structure. It also shows that modeling CNLs in the presence of ATP does not affect the preference toward the closed, compact ADP form, indicating that information regarding ligand specificity is not taken into account by either the AF3 or the RFAA AI platform in modeling the protein moiety. Interestingly, in the absence of ADP or ATP, the AF3 predictor models the sequences closer to the sparse, active CNL configuration. Turning now to the second ‘RPS5-like’ CNL branch, all models display a sparse CC configuration, completely different from that shown by the ‘EDVID’ group and documented by the ZAR1 Cryo-EM data, which is represented by the cluster of CC domains in the bottom left of each 3D scatter plot. A more detailed analysis reveals that on this branch, the CC no longer interfaces with the rest of the CNL, suggesting a completely different mechanism of signaling or multimerization.
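The reference system described above can be sketched as follows. The origin and the xOy plane follow the description in the text (Gly Cα of the “VVG” motif, NBS and Arc1 mass centers); the specific axis orientation chosen here (x pointing toward the NBS mass center, z along the plane normal) and the function names are assumptions made for illustration, so the sign convention of the out-of-plane coordinate is not necessarily that of Figure 7.

```python
import numpy as np


def mass_center(xyz, masses=None):
    """Mass center of a set of atoms; unit masses are assumed when none are given."""
    return np.average(np.asarray(xyz, dtype=float), axis=0, weights=masses)


def build_frame(origin, nbs_center, arc1_center):
    """Return (origin, axes) with axes rows ex, ey, ez; xOy contains both mass centers."""
    o = np.asarray(origin, dtype=float)
    u = np.asarray(nbs_center, dtype=float) - o
    v = np.asarray(arc1_center, dtype=float) - o
    normal = np.cross(u, v)
    if np.linalg.norm(u) < 1e-9 or np.linalg.norm(normal) < 1e-9:
        raise ValueError("origin, NBS and Arc1 centers must not be collinear")
    ex = u / np.linalg.norm(u)            # x axis: origin -> NBS center (assumed choice)
    ez = normal / np.linalg.norm(normal)  # z axis: normal to the xOy plane
    ey = np.cross(ez, ex)                 # completes a right-handed frame
    return o, np.vstack([ex, ey, ez])


def to_frame(points, origin, axes):
    """Express Cartesian points in the local frame; column 2 is the out-of-plane z."""
    return (np.atleast_2d(np.asarray(points, dtype=float)) - origin) @ axes.T
```

In such a frame, the side of the xOy plane on which the Arc2 mass center falls is given by the sign of its z coordinate, which is one simple way to read the separation between the two NBD configurations mentioned above.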
In order to see if these results hold for CNL families from other species, several well-studied receptors from wheat, barley and potato were also subjected to the same modeling protocols. Over 75% of these further examples were modeled in the ADP inactive, compact state, as can be seen from Supplementary Table S5, which presents the domain coordinates corresponding to Figure 7.
The conformational preference toward more compact modeling solutions may be gauged in several ways. For instance, it can be estimated from the ratio of two principal moments: that of the presumably most extended model, the “AF2—Active control” model (af2ac), used as an internal reference, and that of the current model, I_af2ac/I_model. If the current model is less prolate, this ratio will be larger than 1. For each of the 32 CNLs, this ratio was measured for all AF2, AF3, RFAA and OmegaFold models. The histogram of this ratio for the NBD-LRR1-5 region (Figure 8) shows that over 80% of the models have this region in the more compact inactive ADP configuration.
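The principal moment ratio introduced above can be sketched as follows. Two choices in this example are assumptions, since they are not specified in the text: unit masses on the selected atoms (e.g., Cα), and the use of the largest principal moment of inertia for the ratio. With these choices, a model more compact than the extended reference gives a ratio larger than 1.

```python
import numpy as np


def principal_moments(xyz, masses=None):
    """Principal moments of inertia in ascending order (unit masses assumed by default)."""
    r = np.asarray(xyz, dtype=float)
    m = np.ones(len(r)) if masses is None else np.asarray(masses, dtype=float)
    # Work relative to the mass center.
    r = r - np.average(r, axis=0, weights=m)
    # Inertia tensor: sum_i m_i * (|r_i|^2 * Identity - r_i r_i^T)
    r2 = np.sum(r * r, axis=1)
    tensor = np.sum(m * r2) * np.eye(3) - (r * m[:, None]).T @ r
    return np.linalg.eigvalsh(tensor)


def moment_ratio(reference_xyz, model_xyz):
    """I_ref / I_model using the largest principal moment (an assumed choice)."""
    return float(principal_moments(reference_xyz)[-1] / principal_moments(model_xyz)[-1])
```

For example, a five-point rod compared with a six-point octahedral arrangement of similar size gives a ratio of 2.5, since the elongated shape has the larger maximal moment.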
Given that half of the CNLs are RPS5-type helpers in which the CC no longer interfaces with the rest of the receptor and adopts random configurations (Figure 7), the histogram for the overall CC-NBD-LRR1-5 region is less well separated, as most of the helper models have a principal moment larger than that of their reference, which is constrained to adopt an active ATP ZAR1-like configuration (Figure 9).
Nevertheless, even in these conditions, the overall accessibility is lower than that shown by the control. The histogram of accessibility ratios between the AF2 active control (af2ac) references and the rest of the models for each CNL analyzed herein is shown in Figure 10. This indicates that all model solutions are more compact than the reference, mainly due to more buried residues in the CC region, even when this is wobbling randomly apart from the NBD-LRR1-5 core.
All these results indicate that, at least in the case of CNL receptors, AI platforms have a strong tendency to select the experimental, closed, compact ADP structure as a template to the detriment of the more extended ATP one.
To assess what happens when only one template is at hand, a closely related multidomain protein family is fortunately available: the toll-interleukin-1 receptor (TIR) NOD-like receptors (TNL), for which structural databases contain only one entry, solved in the more extended ATP state. The modular organization of TNL mirrors that of CNL, with TIR replacing CC, resulting in a TIR/NBS-Arc1/Arc2/LRR four-module structure that undergoes the same conformational change of the central region upon activation as a CNL. This provided the opportunity to test, on one example, the influence of the template and MSA step in modeling the overall configuration of a multidomain multistate protein. Results on the TNL sequence M1C2N4 (retrieved from NLRscape) show that all MSA-based AI platforms (AF2, AF3 and RFAA) provide solutions following the active ATP template. In contrast, OmegaFold, which does not rely on MSA but rather on a protein language model based on a geometry-inspired transformer, models the TNL in a compact ADP conformation, as shown in Figure 11.

3. Discussion

Model Quality

The recent release of several NOD-like receptor structures, including one present in the A. thaliana CNL set taken under investigation here, allowed a detailed analysis of the prediction capabilities of several AlphaFold and RoseTTAFold AI modeling platforms.
At the domain level, the structures of the folded NBD-Arc1, Arc2 and LRR modules, belonging to CATH 3.40.50.300, 1.10.533.10 and 3.80.10 topologies, have been predicted to within 2 Å by all assessed platforms. In contrast, deviations from the experimental structure higher than 12 Å are seen in the coiled-coil N-terminal region. Analysis of the local solution provided by AI platforms for this region indicates that they combine structural information from the ADP-inactive and ATP-active templates of ZAR1. Two key factors contribute to this issue: the first, more general, is the problem of multivalued 1D to 3D mappings, as is the case of all proteins solved in more than one structural state; the second, more specific, is the structural promiscuity and multivalency of CC domains. While mostly helical in terms of secondary structure, in plants, the CNL-CC region is known to adopt multiple, condition-dependent configurations and is shown to be able to stand alone or generate homo- or hetero-multimers [23,29,49]. This greatly raises the level of modeling inaccuracy, so departures from experiments are expected, and any kind of experimental constraint is of great help in pinning down a closer-to-reality, context-dependent CC modeling solution.
At the lower secondary structure level, detailed analysis carried out on the LRR domain indicates that, although the overall architecture is well predicted, mainly due to a consistent identification of ventral beta strands, there is a noticeable tendency on the dorsal side of the horseshoe to provide local helical solutions in experimentally coiled regions. This is most likely due to an excess of animal LRR templates in the default structural training set and/or a tendency of AI platforms toward providing locally ordered, more compact solutions. Hence, a careful review of the models provided by AI platforms for LRR domains is needed, at least in the case of irregular plant NLRs, as AI modeling platforms were shown to provide inconsistent local solutions [50] given the underrepresentation of these domains in structural databases and their significant LRR motif irregularities [51].
At the global CNL architecture level, predictions based on the overall structural information used by default by AI platforms are all biased toward the more compact ADP-inactive configuration, and only by feeding the AF2 predictor with filtered information specific to the ATP-active state can modeling be driven toward the sparser ATP-active configuration. Notably, when the protein is modeled in the presence of ATP, the AF3 and RFAA platforms model the protein in its more compact ADP-inactive state, indicating that information related to ligand specificity is not taken into account in modeling the protein moiety but is merely used to place the ligand at the right binding site. From an information point of view, this is surprising, as it would have been a very simple way to distinguish, in multistate situations, between active and inactive NBD templates, but seemingly, this was not used in the AF3 and RFAA network training workflows. Using ligands for template selection in such MSA-based AI platforms would be very useful in modeling the various states of proteins involved in signal transduction or proteins subjected to allostery.
The bias toward the more compact ADP-inactive state is, to a lesser extent, maintained in modeling all sequences on the ‘EDVID’ branch of A. thaliana CNLs. However, some sequences are modeled closer to the ATP-active state, suggesting that information on contacts at various interfaces of the CNL is missing from what was used by default in training the predictor. Surprisingly, the NeuroSnap implementation of RFAA was shown, in some cases, to be prone to generate entangled, overly compact models displaying severe steric clashes.
On the other hand, the sensitivity of AI platforms to the local, detailed characteristics of the sequence of a modular protein is plainly demonstrated by the vastly different modeling solutions proposed for the CNL receptors found on the ‘RPS5’ branch. All models of this group display the CC region as completely separated from the rest of the CNL. This solution correlates well with the presence of a myristoylation signal at the N-terminal end of the CNL sequence, which suggests a direct binding of the CNL to the membrane, in stark contrast to the membrane insertion mechanism that was proven for the EDVID-type receptor ZAR1. Even when separated from the NBD-LRR1-5 core, the overall exposed surface of the CC-NBD-LRR1-5 region is lower than that of the internal reference af2ac models, as seen in Figure 10. This adds to the observation of a general trend toward more compact AI modeling solutions. This trend is also confirmed by several examples of CNLs from receptor families found in other plant species, which also show that when facing a choice between ADP compact and ATP-extended NBD templates, all AI platforms display a clear preference to model CNLs in the more compact state.
Interestingly, on the other hand, templates and the MSA step lead the way in finding AF2, AF3 and RFAA solutions. This is clearly shown by the TNL modeling example presented herein. All AF/RF default solutions provided for this receptor are models following the only existing template, which is in ATP extended TNL conformation. In contrast, in the solution provided by OmegaFold, which is not based on MSA, the receptor is modeled in an ADP compact state of the NBD, as shown in Figure 11. When feeding AF2 with only ADP templates and removing the MSA step, the resulting modeling solutions become inconsistent and diverge from both the NBD (NBS-Arc1/Arc2) standard, well-documented states.
It is generally recognized that modeling multidomain proteins, especially those that undergo structural transitions during their functional cycle, is far more complex than modeling single-domain proteins [1], and the present case study plainly shows many of the challenges to be confronted. For multidomain multistate proteins, AI platforms were reported to predict only a single ‘ground state’ [52]. Here, however, 20% of the selected CNL set was not modeled in the ground state but instead in the active snapshot, probably due to the lack of information regarding interdomain contacts. Another surprise is that simple information, such as the nature of the ligand, is not used by the new generation of predictors to discriminate between protein states. Our AF2 inactive/active MSA models show that such an approach would be well suited to discriminate between states. The low accessibility of the RPS5-type helpers is also notable. This suggests that the local solution of the CC is more compact than the elongated one seen in active ZAR1. This is consistent with previous reports that AI predictors do not perform very well in modeling elongated CC regions, which are modeled in some cases as ‘balls’ [53].

4. Materials and Methods

4.1. Sequence Selection of the A. thaliana Representative CNL Set

A representative set of A. thaliana CNL sequences was selected from the NLRscape web resource (https://nlrscape.biochim.ro/home.php, accessed on 20 May 2024). In the first step, only sequences containing the canonical CC-NBD-LRR domain organization were retained, while those containing additional sequence stretches at either the N- or C-terminal end were eliminated. The retained set was further filtered using NLRexpress (https://nlrexpress.biochim.ro/, accessed on 20 May 2024) to discard sequences lacking one or more of the 9 characteristic CNL motifs, retaining only the predicted fully functional ones. This reduced set was then clustered using MMSeqs2 at 70% coverage and 70% identity. Cluster representatives were finally selected based on the number of literature mentions and the degree of experimental usage.

4.2. Sequence Analysis of Representative Set Sequences

Relevant data related to representatives and their structural profiles were retrieved from NLRscape, while the delineation of the CC, NBD and LRR canonical domains and of the two invariant globular NBD regions, NBS-Arc1 and Arc2, was based on sequence profiling and the precise location of the 9 CNL motifs identified with NLRexpress and LRRpredictor.
The phylogenetic analysis was conducted using the maximum likelihood method implemented in IQtree [54] (http://iqtree.cibiv.univie.ac.at/, accessed on 20 August 2024). This analysis was performed on either the full-length sequences or individual domains and subdomains. Sequence alignments were performed using MAFFT v7.490 [55]. The phylogeny model was selected using the automated substitution selection and FreeRate model to account for heterogeneity [56]. Additionally, we employed the ultrafast bootstrap approximation [57] and the SH-aLRT branch test, both utilizing 1000 replicates, alongside the approximate Bayes test [58]. Tree graphics were generated using iTOL v7.0 (https://itol.embl.de/, accessed on 20 August 2024) [59]. Sequence identity matrices were computed on either the full-length sequences or individual domains and subdomains using Ugene v46.0 [60] and in-house scripts.

4.3. Template and MSA Filtering with the Locally Implemented Version of AF2

In order to assess the effects induced by template and MSA restriction/enrichment on AI modeling, the latest version of AlphaFold2 (v. 2.3.2) was locally implemented and used to test various template and MSA filtering schemes.
Firstly, since the generation of the PDB snapshot used by default by AF2, which already contains the A. thaliana ZAR1 CNL structures solved in the active ATP open conformation (6J5T), the transition conformation and the inactive ADP closed conformation (6J5W), newer structures of NLR proteins have been published (8RFH, 8XUO, 8XUQ, 8XUV) [61,62]. In order to test the effects of template restrictions in guiding AF2, these new templates were split into “Active” and “Inactive” conformations based on their ligand/structural similarity to ZAR1, as presented in Table 6. Other plant proteins that were used are also presented.
Secondly, as the CC and LRR domains have well-described architectures, CATH 1.20 Up-Down Bundle and CATH 3.80.10 Leucine-Rich Repeat, respectively, the AF2 PDB database was filtered to retain only protein chains containing regions of these two types. Subsequently, only these were included as input in the final modeling of the “MSA” models.
Moreover, given that the MSA that AF2 generates should reflect features specific to NLR proteins, input sequences were filtered to retain only the complete set of plant NLRs retrieved from NLRscape.
Given that the structure of A. thaliana ZAR1 was already solved in its inactive, transition and active states, its default and restricted template models were used to gauge the prediction capabilities of the AI platforms tested herein. In order to improve the local modeling of ZAR1 (Q38834), the lab-implemented version of AF2 was modified to accept templates with higher than 95% sequence similarity to the input sequence.
Files containing all sequence names and all PDB accession codes for the structures used are available on the above-mentioned GitHub Repository.

4.4. Model Generation

Default AF2 models were retrieved from the AlphaFold2 Database [63]. AlphaFold2 v. 2.3.2, downloaded from the DeepMind GitHub repository [64], was used for local implementation. The installed AF2 was run on consumer-grade desktops equipped with NVIDIA RTX 3080Ti GPUs. Instead of a Docker deployment, AF2 was installed directly, given that this allows simpler and more flexible handling. AF2 databases were modified to pass the customized sequence/structure subsets as described above. In this way, four model sets were generated using the local AF2 deployment:
  • No MSA input, only the “Active” experimental structures from Table 6—“Active Control”.
  • No MSA input, only the “Inactive” experimental structures from Table 6—“Inactive Control”.
  • MSA consisting of only NLR Proteins retrieved from NLRscape, “Active” experimental structures from Table 6 and structures corresponding to CATH families for CC and LRR architectures—“Active MSA”.
  • MSA consisting of only NLR Proteins retrieved from NLRscape, “Inactive” experimental structures from Table 6 and structures corresponding to CATH families for CC and LRR architectures—“Inactive MSA”.
To test the effect of ligand presence on CNL protein modeling, AlphaFold3 and the NeuroSnap server implementation of RoseTTAFold All-Atom (RFAA) were used to model all sequences with or without their endogenous ADP/ATP ligands. Hence, all sequences of the representative set were modeled on these two web platforms in: (a) the presence of ADP, corresponding to the inactive state; (b) the absence of ligands, corresponding to the transition state; and (c) the presence of ATP, corresponding to the active state.
In this way, 39 models were generated overall for each sequence of the representative set: 1 AF2-DB, 3 NeuroSnap-RFAA, 4 × 5 local AF2 and 3 × 5 AF3 models, ranked from 1 to 5 in the latter two cases. All of these were further subjected to refinement and analysis, as described below.
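The per-sequence model count stated above can be checked with a short tally. This is only a bookkeeping sketch; the dictionary keys are descriptive labels chosen here, not identifiers used by the authors.

```python
# Models generated per sequence, following the breakdown given in the text.
MODELS_PER_SEQUENCE = {
    "AF2-DB (default database model)": 1,
    "NeuroSnap-RFAA (ADP, no ligand, ATP)": 3,
    "AF2 local (4 model sets x 5 ranked models)": 4 * 5,
    "AF3 (3 ligand states x 5 ranked models)": 3 * 5,
}


def total_models(counts: dict) -> int:
    """Sum the number of models over all predictor/setting combinations."""
    return sum(counts.values())


if __name__ == "__main__":
    for label, n in MODELS_PER_SEQUENCE.items():
        print(f"{n:2d}  {label}")
    print(f"{total_models(MODELS_PER_SEQUENCE):2d}  total per sequence")
```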
For the conformation preference analysis (Section 2.2.4), as a control to the template and MSA-based AF/RFAA predictors, we also analyzed the models generated by OmegaFold, which relies on a protein language model based on a geometry-inspired transformer rather than MSA [65].

4.5. Model Refinement

Raw models were first scored with MolProbity v4.3. All models were then optimized by energy minimization using the L-BFGS algorithm implemented in OpenMM v. 7.3.0 and rescored. For RFAA models, an additional simulated annealing step was needed. The models were heated to 600 K in 0.6 ns, followed by a 1 ns MD simulation, and then cooled stepwise from 600 K to 300 K in 1 ns, from 300 K to 100 K in 2 ns and from 100 K to 0 K in 3 ns, with a final L-BFGS minimization. To preserve the secondary structures, harmonic constraints were applied to backbone atoms in these regions, using a 10 kcal/nm² force constant. Finally, MolProbity scores were computed for the resulting final state of the RFAA models.
Simulation files were generated using tLEaP from the AmberTools23 package [66]. Models were parameterized using the ff19SB force field [67], and parameters for ATP/ADP ligands were taken from Meagher et al. [68].
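To make the timing of the annealing protocol described above easier to follow, the sketch below encodes it as a piecewise temperature schedule. Several points are assumptions not stated in the text: the starting temperature of the heating ramp (300 K here), a constant 600 K during the 1 ns MD stage, and linear interpolation within each stage. It does not run any simulation.

```python
from typing import List, Tuple

# One stage: (duration in ns, start temperature in K, end temperature in K)
Segment = Tuple[float, float, float]


def annealing_segments(start_temp: float = 300.0) -> List[Segment]:
    """Stages of the RFAA annealing protocol; start_temp is an assumption."""
    return [
        (0.6, start_temp, 600.0),  # heating to 600 K
        (1.0, 600.0, 600.0),       # MD stage (assumed to be held at 600 K)
        (1.0, 600.0, 300.0),       # cooling 600 K -> 300 K
        (2.0, 300.0, 100.0),       # cooling 300 K -> 100 K
        (3.0, 100.0, 0.0),         # cooling 100 K -> 0 K
    ]


def temperature_at(t_ns: float, segments: List[Segment]) -> float:
    """Target temperature at time t_ns, interpolated linearly within a stage."""
    elapsed = 0.0
    for duration, t_start, t_end in segments:
        if t_ns <= elapsed + duration:
            fraction = (t_ns - elapsed) / duration
            return t_start + fraction * (t_end - t_start)
        elapsed += duration
    return segments[-1][2]  # past the end of the schedule


if __name__ == "__main__":
    schedule = annealing_segments()
    print("total length (ns):", sum(d for d, _, _ in schedule))
    for t in (0.0, 0.6, 2.1, 4.6, 7.6):
        print(t, "ns ->", round(temperature_at(t, schedule), 1), "K")
```

In an actual run, each target temperature would be passed to the thermostat of the MD engine before the final minimization.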

4.6. Model Analysis

4.6.1. Model Quality

The quality of all models was evaluated using MolProbity. Moreover, interdomain distances between all heavy atoms were used to probe for unrealistic clashes (<2 Å) among the CC, NBD and LRR domains. These criteria were also used in the optimization protocol described above.
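The interdomain clash criterion can be sketched as a minimum heavy-atom distance test between each pair of domains. The function names and the toy coordinates below are hypothetical; in practice, the coordinates would come from the heavy atoms of the CC, NBD and LRR domains of a model.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def min_heavy_atom_distance(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Smallest distance (in Å) between any atom of a and any atom of b."""
    return min(math.dist(p, q) for p in a for q in b)


def clashing_domain_pairs(
    domains: Dict[str, Sequence[Coord]], cutoff: float = 2.0
) -> List[Tuple[str, str, float]]:
    """Return the domain pairs whose closest heavy atoms lie below the cutoff."""
    clashes = []
    for name_a, name_b in combinations(sorted(domains), 2):
        distance = min_heavy_atom_distance(domains[name_a], domains[name_b])
        if distance < cutoff:
            clashes.append((name_a, name_b, distance))
    return clashes


if __name__ == "__main__":
    # Toy coordinates, for illustration only.
    toy_model = {
        "CC": [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
        "NBD": [(1.5, 0.0, 0.0)],
        "LRR": [(10.0, 0.0, 0.0)],
    }
    print(clashing_domain_pairs(toy_model))
```

The all-pairs loop is quadratic in the number of atoms; for full-size models, a neighbor search would be the more practical choice.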

4.6.2. Assessment of Overall Architecture

The RCSB PDB database contains NLR structures solved in multiple states: ADP-inactive monomer, transition and ATP-active oligomers. These differ mainly in the configuration of the NBD ‘switch’, in that Arc2 performs a ~180° rotation with respect to NBS-Arc1 during activation. In a simplified representation, this can be seen as a flip of the Arc2 mass center (m.c.) to the opposite side of the plane defined within NBS-Arc1 by three points: the Cα of the conserved Gly from the entry NBD motif (VVG), the m.c. of NBS and the m.c. of Arc1. In comparing the module architecture of various CNL models, this plane, with its origin at the entry Gly Cα, was used to represent the relative location of the Arc2 m.c., which makes it possible to evaluate whether the model is in the ADP-closed or ATP-open state. The same frame was used to locate the m.c. of the CC and LRR1-5 to see whether the overall architecture is consistent with or diverges from that of ZAR1, the only experimentally known CNL structure. Given that the number of LRR repeats varies widely over the representative set, only the first 5 repeats, known to interact with NBD in the ADP-closed state [25], were considered in the m.c. calculation of LRR (LRR1-5).
Principal moments and SASA were calculated using the respective functions from the MDTraj package [69].
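The plane-based description of the Arc2 flip can be sketched as a signed-distance calculation: the plane is defined by the entry Gly Cα, the NBS m.c. and the Arc1 m.c., and the sign of the distance of the Arc2 m.c. tells on which side of the plane it lies. The function names are hypothetical. Which sign corresponds to the ADP-closed or ATP-open state is not defined here and would have to be calibrated against the experimental ZAR1 structures.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def mass_center(coords: Sequence[Vec], masses: Sequence[float]) -> Vec:
    """Mass-weighted center of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[i] for c, m in zip(coords, masses)) / total for i in range(3)
    )


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def signed_distance_to_plane(gly_ca: Vec, nbs_mc: Vec, arc1_mc: Vec, point: Vec) -> float:
    """Signed distance of point from the plane through the three reference points."""
    normal = _cross(_sub(nbs_mc, gly_ca), _sub(arc1_mc, gly_ca))
    norm = math.sqrt(_dot(normal, normal))
    if norm == 0.0:
        raise ValueError("the three reference points are collinear")
    return _dot(normal, _sub(point, gly_ca)) / norm


def same_side(d_model: float, d_reference: float) -> bool:
    """True if a model and a reference structure place Arc2 on the same side."""
    return d_model * d_reference > 0.0
```

A model would then be assigned to the state of whichever reference structure (active or inactive) has its Arc2 m.c. on the same side of the plane.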

4.6.3. Model Stability

Model stability was tested by two simulation methods. First, for the generated ZAR1 models, a more detailed approach was used to compare structural prediction performance. The top models in each class, as well as the crystallized structures, were immersed in an explicit water box, and MD simulations were run using OpenMM at 300 K and constant pressure for 10 ns. Since measuring interdomain interface interactions was of special interest, SHAKE was not used; instead, a 1 fs timestep was chosen, allowing free hydrogen movements.
Second, the stability of all generated models was assessed by a method with higher sampling efficiency, namely the generalized coordinate Hamiltonian Monte Carlo (GCHMC) [70], implemented in the Robosample package. The method is able to use constraints without disrupting the correct conformation probability distribution. As GCHMC uses both Cartesian and bond–angle–torsion coordinates, this ensures great freedom in choosing constraints. Herein, three groups of degrees of freedom (Gibbs blocks) took turns to oversample the interfaces between the three domains and the linker areas: one sampling all of the internal degrees of freedom (DoFs) of the system, one sampling the torsional DoFs of the interface residues (defined as residues that are closer than 5 Å to residues on other domains) and one sampling the interface sidechains and the phi/psi torsions of the linker domains at the same time. All simulation parameters are provided on the above-mentioned GitHub repository.

4.6.4. Interface Analysis

Since NLR proteins are multidomain, it is important to evaluate how well the generated models conserve the interfaces between the domains as well as how stable the interfaces are in time. To this end, using the above-described molecular dynamics trajectories, we employed Prodigy [71] to compute the binding free energy between each pair of domains (CC-NBD, CC-LRR and NBD-LRR) for a subset of snapshots equally spaced across each simulation.
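The selection of equally spaced snapshots for the pairwise interface analysis can be sketched as follows. The number of snapshots is not stated in the text, so the values used below are hypothetical, and the binding free energy calculation itself (done with Prodigy) is not reproduced here.

```python
from typing import List, Tuple

# Domain pairs for which interface energies were computed, as listed in the text.
DOMAIN_PAIRS: List[Tuple[str, str]] = [("CC", "NBD"), ("CC", "LRR"), ("NBD", "LRR")]


def evenly_spaced_frames(n_frames: int, n_snapshots: int) -> List[int]:
    """Indices of n_snapshots frames spread evenly over a trajectory."""
    if n_frames <= 0 or n_snapshots <= 0:
        raise ValueError("n_frames and n_snapshots must be positive")
    if n_snapshots >= n_frames:
        return list(range(n_frames))  # fewer frames than requested: keep all
    if n_snapshots == 1:
        return [0]
    step = (n_frames - 1) / (n_snapshots - 1)
    return [round(i * step) for i in range(n_snapshots)]


if __name__ == "__main__":
    # Hypothetical trajectory of 1001 frames sampled at 11 points.
    frames = evenly_spaced_frames(1001, 11)
    for frame in frames:
        for dom_a, dom_b in DOMAIN_PAIRS:
            # Placeholder: here each frame would be split into the two
            # domains and passed to the binding-energy predictor.
            pass
    print(frames)
```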

5. Conclusions

The present work evaluates the ability of some of the best-known new-generation deep learning predictors to provide plausible models in the more challenging situation of multistate multidomain proteins, taking as a case study a coiled-coil family of NOD-like receptors from A. thaliana. This class is interesting on at least two fronts. First, some structural information in this group does exist, but it is not comprehensive, which allows both gauging the AI predictor performance and evaluating extrapolation effects induced by sequence drifts when the overall modular architecture is conserved. Second, an in-depth understanding of the functioning of this immune system family has a tangible economic impact on plant yield [72]; thus, any structural information might be of use, which is why this work was directed toward the CNL receptors most frequently used in experiments.
The results presented herein reveal a remarkable ability of these platforms to correctly predict the 3D structure of the sequence modules that fold in well-established topologies. Another positive finding is the sensitivity of these platforms to local sequence drifts in the modeling solution proposed for the overall modular configuration of the multidomain protein, as shown in the present work by the structural discrimination between the two main CNL branches. On the other hand, as expected, their performance is lower in modeling more promiscuous, morphing regions of a modular sequence, such as the coiled-coil regions, which locally display strong helical propensity and may be found in multiple global, context-dependent configurations. In such cases, feeding the predictor with filtered structural information rather than with the default, general information proved useful in directing the modeling process closer to reality. The results also indicate that in multivalued 1D to 3D instances, when several templates are at hand for the same sequence, AI predictors display a consistent preference to model such modular proteins in their most compact configuration rather than in the sparser ones. A similar bias was noticed at the local secondary structure level as well, for instance, in modeling many dorsal coiled stretches of the LRR domain as helices.
The trends revealed by this work suggest that using the new generation of AI predictors for modeling multidomain multistate proteins is fruitful when global templates are at hand, although the challenges mentioned above have to be taken into account. Challenges further increase when global templates are missing and only local remote homologs can be found in structural databases. In this case, in our hands, a piecewise modeling approach is better suited to give results closer to the real functional system: local AI models are further corrected for large insertions and then joined into the overall modular architecture using all known experimental constraints, such as those given by SAXS [73], Cryo-EM [74], NMR [75] or HDX-MS [76].
Finally, since deep learning predictions are mostly based on the empirical evidence gathered over time in structural databases, it is highly relevant to explore the properties of such information-based models with methods focused on the physics of the modular system, such as structural optimization, stability tests and/or probing of their potential energy surface (PES), in order to identify local barriers and configurational basins, especially when interactions with ligands or other molecular species are known to shape the modular architecture. While MD simulation provides a detailed description of the system, the new generation of enhanced sampling methods, such as GCHMC, is shown to probe a far larger conformational space of the system and, in this sense, should be more useful, for instance, in probing the pathways between states of multistate multidomain proteins.

Supplementary Materials

The Supplementary Figure S1 and Tables S1–S5 can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26020500/s1. All generated models and files containing filtered input for AlphaFold2 can be downloaded at: https://github.com/Teohoho/CNL_Paper_Data.

Author Contributions

Conceptualization, A.-J.P. and L.S.; methodology, A.-J.P., E.C.M., L.S.; software, T.A.Ș.; validation, C.A.B., F.S.B. and A.-L.I.; formal analysis, T.A.Ș. and E.C.M.; investigation, T.A.Ș., E.C.M., C.A.B., F.S.B. and A.-L.I.; resources, A.-J.P. and L.S.; writing—original draft preparation, T.A.Ș. and E.C.M.; writing—review and editing, A.-J.P. and L.S.; supervision, A.-J.P. All authors have read and agreed to the published version of the manuscript.

Funding

Romanian Academy programs of IBAR no 3, PN-III-P4-ID-PCE-2020-2444.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article, the Supplementary Materials and the GitHub repository listed above.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Perrakis, A.; Sixma, T.K. AI revolutions in biology. EMBO Rep. 2021, 22, e54046.
  2. Goulet, A.; Cambillau, C. Present Impact of AlphaFold2 Revolution on Structural Biology, and an Illustration With the Structure Prediction of the Bacteriophage J-1 Host Adhesion Device. Front. Mol. Biosci. 2022, 9, 907452.
  3. Bertoline, L.M.F.; Lima, A.N.; Krieger, J.E.; Teixeira, S.K. Before and after AlphaFold2: An overview of protein structure prediction. Front. Bioinform. 2023, 3, 1120370.
  4. Wodak, S.J.; Velankar, S. Structural biology: The transformational era. Proteomics 2023, 23, e2200084.
  5. Read, R.J.; Baker, E.N.; Bond, C.S.; Garman, E.F.; van Raaij, M.J. AlphaFold and the future of structural biology. IUCrJ 2023, 10, 377–379.
  6. Dahlström, K.M.; Salminen, T.A. Apprehensions and Emerging Solutions in ML-Based Protein Structure Prediction; Elsevier Ltd.: Amsterdam, The Netherlands, 2024.
  7. Zheng, W.; Schafer, N.P.; Wolynes, P.G. Frustration in the energy landscapes of multidomain protein misfolding. Proc. Natl. Acad. Sci. USA 2013, 110, 1680–1685.
  8. Hartl, F.U.; Hayer-Hartl, M. Converging concepts of protein folding in vitro and in vivo. Nat. Struct. Mol. Biol. 2009, 16, 574–581.
  9. Han, J.-H.; Batey, S.; Nickson, A.A.; Teichmann, S.A.; Clarke, J. The folding and evolution of multidomain proteins. Nat. Rev. Mol. Cell Biol. 2007, 8, 319–330.
  10. Holehouse, A.S.; Kragelund, B.B. The molecular basis for cellular function of intrinsically disordered protein regions. Nat. Rev. Mol. Cell Biol. 2024, 25, 187–211.
  11. Rebeaud, M.E.; Mallik, S.; Goloubinoff, P.; Tawfik, D.S. On the evolution of chaperones and cochaperones and the expansion of proteomes across the Tree of Life. Proc. Natl. Acad. Sci. USA 2021, 118, e2020885118.
  12. Aziz, M.F.; Caetano-Anollés, G. Evolution of networks of protein domain organization. Sci. Rep. 2021, 11, 12075.
  13. Kjaergaard, M. Estimation of Effective Concentrations Enforced by Complex Linker Architectures from Conformational Ensembles. Biochemistry 2022, 61, 171–182.
  14. Harrison, A.; Pearl, F.; Mott, R.; Thornton, J.; Orengo, C. Quantifying the similarities within fold space. J. Mol. Biol. 2002, 323, 909–926.
  15. Waman, V.P.; Bordin, N.; Alcraft, R.; Vickerstaff, R.; Rauer, C.; Chan, Q.; Sillitoe, I.; Yamamori, H.; Orengo, C. CATH 2024: CATH-AlphaFlow Doubles the Number of Structures in CATH and Reveals Nearly 200 New Folds. J. Mol. Biol. 2024, 436, 168551.
  16. Andreeva, A.; Kulesha, E.; Gough, J.; Murzin, A.G. The SCOP database in 2020: Expanded classification of representative family and superfamily domains of known protein structures. Nucleic Acids Res. 2020, 48, D376–D382.
  17. Chandonia, J.-M.; Guan, L.; Lin, S.; Yu, C.; Fox, N.K.; Brenner, S.E. SCOPe: Improvements to the structural classification of proteins-extended database to facilitate variant interpretation and machine learning. Nucleic Acids Res. 2022, 50, D553–D559.
  18. Schaeffer, R.D.; Zhang, J.; Kinch, L.N.; Pei, J.; Cong, Q.; Grishin, N.V. Classification of domains in predicted structures of the human proteome. Proc. Natl. Acad. Sci. USA 2023, 120, e2214069120.
  19. Rasmussen, S.G.F.; DeVree, B.T.; Zou, Y.; Kruse, A.C.; Chung, K.Y.; Kobilka, T.S.; Thian, F.S.; Chae, P.S.; Pardon, E.; Calinski, D.; et al. Crystal structure of the β2 adrenergic receptor—Gs protein complex. Nature 2011, 477, 549–555.
  20. Hilger, D.; Masureel, M.; Kobilka, B.K. Structure and dynamics of GPCR signaling complexes. Nat. Struct. Mol. Biol. 2018, 25, 4–12.
  21. Jackson, C.B.; Farzan, M.; Chen, B.; Choe, H. Mechanisms of SARS-CoV-2 entry into cells. Nat. Rev. Mol. Cell Biol. 2022, 23, 3–20.
  22. Slootweg, E.; Roosien, J.; Spiridon, L.N.; Petrescu, A.-J.; Tameling, W.; Joosten, M.; Pomp, R.; van Schaik, C.; Dees, R.; Borst, J.W.; et al. Nucleocytoplasmic distribution is required for activation of resistance by the potato NB-LRR receptor Rx1 and is balanced by its functional domains. Plant Cell 2010, 22, 4195–4215.
  23. Maekawa, T.; Cheng, W.; Spiridon, L.N.; Töller, A.; Lukasik, E.; Saijo, Y.; Liu, P.; Shen, Q.-H.; Micluta, M.A.; Somssich, I.E.; et al. Coiled-coil domain-dependent homodimerization of intracellular barley immune receptors defines a minimal functional module for triggering cell death. Cell Host Microbe 2011, 9, 187–199.
  24. Sela, H.; Spiridon, L.N.; Petrescu, A.; Akerman, M.; Mandel-Gutfreund, Y.; Nevo, E.; Loutre, C.; Keller, B.; Schulman, A.H.; Fahima, T. Ancient diversity of splicing motifs and protein surfaces in the wild emmer wheat (Triticum dicoccoides) LR10 coiled coil (CC) and leucine-rich repeat (LRR) domains. Mol. Plant Pathol. 2012, 13, 276–287.
  25. Slootweg, E.J.; Spiridon, L.N.; Roosien, J.; Butterbach, P.; Pomp, R.; Westerhof, L.; Wilbers, R.; Bakker, E.; Bakker, J.; Petrescu, A.-J.; et al. Structural determinants at the interface of the ARC2 and leucine-rich repeat domains control the activation of the plant immune receptors Rx1 and Gpa2. Plant Physiol. 2013, 162, 1510–1528.
  26. Sela, H.; Spiridon, L.N.; Ashkenazi, H.; Bhullar, N.K.; Brunner, S.; Petrescu, A.-J.; Fahima, T.; Keller, B.; Jordan, T. Three-dimensional modeling and diversity analysis reveals distinct AVR recognition sites and evolutionary pathways in wild and domesticated wheat Pm3 R genes. Mol. Plant-Microbe Interact. 2014, 27, 835–845.
  27. Sueldo, D.J.; Shimels, M.; Spiridon, L.N.; Caldararu, O.; Petrescu, A.; Joosten, M.H.A.J.; Tameling, W.I.L. Random mutagenesis of the nucleotide-binding domain of NRC1 (NB-LRR Required for Hypersensitive Response-Associated Cell Death-1), a downstream signalling nucleotide-binding, leucine-rich repeat (NB-LRR) protein, identifies gain-of-function mutations in the nucleotide-binding pocket. New Phytol. 2015, 208, 210–223.
  28. De Oliveira, A.S.; Koolhaas, I.; Boiteux, L.S.; Caldararu, O.F.; Petrescu, A.; Resende, R.d.O.; Kormelink, R. Cell death triggering and effector recognition by Sw-5 SD-CNL proteins from resistant and susceptible tomato isolines to Tomato spotted wilt virus. Mol. Plant Pathol. 2016, 17, 1442–1454.
  29. Wróblewski, T.; Spiridon, L.; Martin, E.C.; Petrescu, A.-J.; Cavanaugh, K.; Truco, M.J.; Xu, H.; Gozdowski, D.; Pawłowski, K.; Michelmore, R.W.; et al. Genome-wide functional analyses of plant coiled–coil NLR-type pathogen receptors reveal essential roles of their N-terminal domain in oligomerization, networking, and immunity. PLoS Biol. 2018, 16, e2005821.
  30. Slootweg, E.J.; Spiridon, L.N.; Martin, E.C.; Tameling, W.I.; Townsend, P.D.; Pomp, R.; Roosien, J.; Drawska, O.; Sukarta, O.C.; Schots, A.; et al. Distinct roles of non-overlapping surface regions of the coiled-coil domain in the potato immune receptor Rx1. Plant Physiol. 2018, 178, 1310–1331.
  31. Baudin, M.; Schreiber, K.J.; Martin, E.C.; Petrescu, A.J.; Lewis, J.D. Structure–function analysis of ZAR1 immune receptor reveals key molecular interactions for activity. Plant J. 2020, 101, 352–370.
  32. Baudin, M.; Martin, E.C.; Sass, C.; Hassan, J.A.; Bendix, C.; Sauceda, R.; Diplock, N.; Specht, C.D.; Petrescu, A.J.; Lewis, J.D. A natural diversity screen in Arabidopsis thaliana reveals determinants for HopZ1a recognition in the ZAR1-ZED1 immune complex. Plant Cell Environ. 2020, 44, 629–644.
  33. Duxbury, Z.; Wu, C.-H.; Ding, P. A Comparative Overview of the Intracellular Guardians of Plants and Animals: NLRs in Innate Immunity and Beyond. Annu. Rev. Plant Biol. 2021, 72, 155–184.
  34. Wang, J.; Wang, J.; Hu, M.; Wu, S.; Qi, J.; Wang, G.; Han, Z.; Qi, Y.; Gao, N.; Wang, H.-W.; et al. Ligand-triggered allosteric ADP release primes a plant NLR complex. Science 2019, 364, 43.
  35. Wang, J.; Hu, M.; Wang, J.; Qi, J.; Han, Z.; Wang, G.; Qi, Y.; Wang, H.-W.; Zhou, J.-M.; Chai, J. Reconstitution and structure of a plant NLR resistosome conferring immunity. Science 2019, 364, 44.
  36. Cesari, S. Multiple Strategies for Pathogen Perception by Plant Immune Receptors; John Wiley and Sons Inc.: Hoboken, NJ, USA, 2018.
  37. Xu, S.; Wei, X.; Yang, Q.; Hu, D.; Zhang, Y.; Yuan, X.; Kang, F.; Wu, Z.; Yan, Z.; Luo, X.; et al. A KNOX Ⅱ transcription factor suppresses the NLR immune receptor BRG8-mediated immunity in rice. Plant Commun. 2024, 5, 101001.
  38. Contreras, M.P.; Lüdke, D.; Pai, H.; Toghani, A.; Kamoun, S. NLR receptors in plant immunity: Making sense of the alphabet soup. EMBO Rep. 2023, 24, e57495.
  39. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
  40. Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
  41. Krishna, R.; Wang, J.; Ahern, W.; Sturmfels, P.; Venkatesh, P.; Kalvet, I.; Lee, G.R.; Morey-Burrows, F.S.; Anishchenko, I.; Humphreys, I.R.; et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 2024, 384, eadl2528.
  42. Martin, E.C.; Ion, C.F.; Ifrimescu, F.; Spiridon, L.; Bakker, J.; Goverse, A.; Petrescu, A.-J. NLRscape: An atlas of plant NLR proteins. Nucleic Acids Res. 2023, 51, D1470–D1482.
  43. Martin, E.C.; Spiridon, L.; Goverse, A.; Petrescu, A.-J. NLRexpress—A bundle of machine learning motif predictors—Reveals motif stability underlying plant Nod-like receptors diversity. Front. Plant Sci. 2022, 13, 975888.
  44. Steinegger, M.; Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 2017, 35, 1026–1028.
  45. Neurosnap Platform. Available online: https://neurosnap.ai/ (accessed on 31 May 2024).
  46. Eastman, P.; Swails, J.; Chodera, J.D.; McGibbon, R.T.; Zhao, Y.; Beauchamp, K.A.; Wang, L.-P.; Simmonett, A.C.; Harrigan, M.P.; Stern, C.D.; et al. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 2017, 13, e1005659.
  47. Williams, C.J.; Headd, J.J.; Moriarty, N.W.; Prisant, M.G.; Videau, L.L.; Deis, L.N.; Verma, V.; Keedy, D.A.; Hintze, B.J.; Chen, V.B.; et al. MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci. 2018, 27, 293–315.
  48. Spiridon, L.; Şulea, T.A.; Minh, D.D.; Petrescu, A.-J. Robosample: A rigid-body molecular simulation program based on robot mechanics. Biochim. Biophys. Acta (BBA)—Gen. Subj. 2020, 1864, 129616.
  49. Hao, W.; Collier, S.M.; Moffett, P.; Chai, J. Structural basis for the interaction between the potato virus X resistance protein (Rx) and its cofactor ran GTPase-activating protein 2 (RanGAP2). J. Biol. Chem. 2013, 288, 35868–35876.
  50. van Grinsven, I.L.; Martin, E.C.; Petrescu, A.-J.; Kormelink, R. Tsw—A case study on structure-function puzzles in plant NLRs with unusually large LRR domains. Front. Plant Sci. 2022, 13, 983693.
  51. Martin, E.C.; Sukarta, O.C.A.; Spiridon, L.; Grigore, L.G.; Constantinescu, V.; Tacutu, R.; Goverse, A.; Petrescu, A.-J. Lrrpredictor—A new LRR motif detection method for irregular motifs of plant NLR proteins using an ensemble of classifiers. Genes 2020, 11, 286.
  52. Agarwal, V.; McShan, A.C. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat. Chem. Biol. 2024, 20, 950–959.
  53. Peng, C.-X.; Liang, F.; Xia, Y.-H.; Zhao, K.-L.; Hou, M.-H.; Zhang, G.-J. Recent Advances and Challenges in Protein Structure Prediction. J. Chem. Inf. Model. 2024, 64, 76–95.
  54. Minh, B.Q.; Schmidt, H.A.; Chernomor, O.; Schrempf, D.; Woodhams, M.D.; von Haeseler, A.; Lanfear, R. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 2020, 37, 1530–1534.
  55. Katoh, K.; Standley, D.M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 2013, 30, 772–780.
  56. Soubrier, J.; Steel, M.; Lee, M.S.; Der Sarkissian, C.; Guindon, S.; Ho, S.Y.; Cooper, A. The influence of rate heterogeneity among sites on the time dependence of molecular rates. Mol. Biol. Evol. 2012, 29, 3345–3358.
  57. Minh, B.Q.; Nguyen, M.A.T.; Von Haeseler, A. Ultrafast Approximation for Phylogenetic Bootstrap. Mol. Biol. Evol. 2013, 30, 1188–1195.
  58. Anisimova, M.; Gil, M.; Dufayard, J.-F.; Dessimoz, C.; Gascuel, O. Survey of Branch Support Methods Demonstrates Accuracy, Power, and Robustness of Fast Likelihood-based Approximation Schemes. Syst. Biol. 2011, 60, 685–699.
  59. Letunic, I.; Bork, P. Interactive Tree Of Life (iTOL) v4: Recent updates and new developments. Nucleic Acids Res. 2019, 47, W256–W259.
  60. Okonechnikov, K.; Golosova, O.; Fursov, M.; The UGENE Team. Unipro UGENE: A unified bioinformatics toolkit. Bioinformatics 2012, 28, 1166–1167.
  61. Selvaraj, M.; Toghani, A.; Pai, H.; Sugihara, Y.; Kourelis, J.; Yuen, E.L.H.; Ibrahim, T.; Zhao, H.; Xie, R.; Maqbool, A.; et al. Activation of plant immunity through conversion of a helper NLR homodimer into a resistosome. PLoS Biol. 2024, 22, e3002868.
  62. Ma, S.; An, C.; Lawson, A.W.; Cao, Y.; Sun, Y.; Tan, E.Y.J.; Pan, J.; Jirschitzka, J.; Kümmel, F.; Mukhi, N.; et al. Oligomerization-mediated autoinhibition and cofactor binding of a plant NLR. Nature 2024, 632, 869–876.
  63. Varadi, M.; Anyango, S.; Deshpande, M.; Nair, S.; Natassia, C.; Yordanova, G.; Yuan, D.; Stroe, O.; Wood, G.; Laydon, A.; et al. AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022, 50, D439–D444.
  64. AlphaFold2 Github Repository. Available online: https://github.com/google-deepmind/alphafold (accessed on 20 May 2024).
  65. Wu, R.; Ding, F.; Wang, R.; Shen, R.; Zhang, X.; Luo, S.; Peng, J. High-resolution de novo structure prediction from primary sequence. bioRxiv 2022.
  66. Case, D.A.; Aktulga, H.M.; Belfon, K.; Cerutti, D.S.; Cisneros, G.A.; Cruzeiro, V.W.D.; Forouzesh, N.; Giese, T.J.; Götz, A.W.; Gohlke, H.; et al. AmberTools. J. Chem. Inf. Model. 2023, 63, 6183–6191.
  67. Tian, C.; Kasavajhala, K.; Belfon, K.A.A.; Raguette, L.; Huang, H.; Migues, A.N.; Bickel, J.; Wang, Y.; Pincay, J.; Wu, Q.; et al. Ff19SB: Amino-Acid-Specific Protein Backbone Parameters Trained against Quantum Mechanics Energy Surfaces in Solution. J. Chem. Theory Comput. 2020, 16, 528–552.
  68. Meagher, K.L.; Redman, L.T.; Carlson, H.A. Development of polyphosphate parameters for use with the AMBER force field. J. Comput. Chem. 2003, 24, 1016–1025.
  69. McGibbon, R.T.; Beauchamp, K.A.; Harrigan, M.P.; Klein, C.; Swails, J.M.; Hernández, C.X.; Schwantes, C.R.; Wang, L.-P.; Lane, T.J.; Pande, V.S. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys. J. 2015, 109, 1528–1532.
  70. Spiridon, L.; Minh, D.D.L. Hamiltonian Monte Carlo with Constrained Molecular Dynamics as Gibbs Sampling. J. Chem. Theory Comput. 2017, 13, 4649–4659.
  71. Vangone, A.; Bonvin, A.M. Contacts-based prediction of binding affinity in protein–protein complexes. eLife 2015, 4, e07454.
  72. Ficke, A.; Cowger, C.; Bergstrom, G.C.; Brodal, G. Understanding yield loss and pathogen biology to improve disease management: Septoria nodorum blotch—A case study in wheat. Plant Dis. 2018, 102, 696–707.
  73. Brookes, E.; Rocco, M.; Vachette, P.; Trewhella, J. AlphaFold-predicted protein structures and small-angle X-ray scattering: Insights from an extended examination of selected data in the Small-Angle Scattering Biological Data Bank. J. Appl. Crystallogr. 2023, 56 Pt 4, 910–926.
  74. Ramelot, T.A.; Tejero, R.; Montelione, G.T. Representing structures of the multiple conformational states of proteins. Curr. Opin. Struct. Biol. 2023, 83, 102703.
  75. Abdollahi, H.; Prestegard, J.H.; Valafar, H. Computational modeling multiple conformational states of proteins with residual dipolar coupling data. Curr. Opin. Struct. Biol. 2023, 82, 102655.
  76. Papadopoulos, N.; Nédélec, A.; Derenne, A.; Şulea, T.A.; Pecquet, C.; Chachoua, I.; Vertenoeil, G.; Tilmant, T.; Petrescu, A.-J.; Mazzucchelli, G.; et al. Oncogenic CALR mutant C-terminus mediates dual binding to the thrombopoietin receptor triggering complex dimerization and activation. Nat. Commun. 2023, 14, 1881.
Figure 1. ADP, ATP and no ligand, respectively. Crystal structures of active (right) and inactive (left) versus the AlphaFold Database model (center). A simplified representation of each state is presented, showcasing how the AF2 model combines the two existing structures.
Figure 2. Alignment of secondary structure profiles of all models generated for ZAR1 and of the inactive crystal. Model names have been abbreviated as follows: AF2 DB: AlphaFold2 Database; AF2 AC: AlphaFold2 Active Control; AF2 IC: AlphaFold2 Inactive Control; AF2 AM: AlphaFold2 Active MSA; AF2 IM: AlphaFold2 Inactive MSA; AF3 ATP/ADP/No Ligand: AlphaFold3 model with ATP, ADP and no ligand, respectively; RFAA ADP/ATP/No Ligand: RoseTTAFold all-atom model, with ADP, ATP and no ligand, respectively; OmegaFold: OmegaFold. The underlined sequence residues highlight areas that adopt a beta strand structure but do not feature typical LxxLxL motifs (which are marked in purple).
Figure 3. Comparison between the solved structures of the sparser active (ATP-bound) state (left) and the more compact inactive (ADP-bound) state (right). Domain colors: CC, orange; NBS, blue; LRR, pink.
Figure 4. Bar plots for each inter-domain pair for the inactive conformations. The black bars show how many times a residue in each domain interacts with a residue in the corresponding domain, for the generated models. The red bars are contacts featured in the experimentally solved structure. (A): CC-NBS; (B): NBS-LRR; (C): CC-LRR.
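The per-residue contact counts summarized in Figure 4 can be illustrated with a minimal sketch. The contact definition used here (Cα atoms closer than 8 Å), the function names and the toy coordinates are assumptions made for illustration only; they are not taken from the article.

```python
import numpy as np

def interdomain_contacts(coords, dom_a, dom_b, cutoff=8.0):
    """Return the set of residue index pairs (i, j), with i in dom_a and
    j in dom_b, whose coordinates are closer than `cutoff` (assumed, in Å)."""
    a = np.asarray(dom_a)
    b = np.asarray(dom_b)
    # Pairwise distance matrix between the two domains, shape (len(a), len(b)).
    dist = np.linalg.norm(coords[a][:, None, :] - coords[b][None, :, :], axis=-1)
    ii, jj = np.nonzero(dist < cutoff)
    return {(int(a[i]), int(b[j])) for i, j in zip(ii, jj)}

def contact_frequency(models, dom_a, dom_b, cutoff=8.0):
    """For each residue of dom_a, count in how many models it contacts
    at least one residue of dom_b."""
    counts = {int(i): 0 for i in dom_a}
    for coords in models:
        touching = {i for i, _ in interdomain_contacts(coords, dom_a, dom_b, cutoff)}
        for i in touching:
            counts[i] += 1
    return counts

# Hypothetical toy data: six "residues", domain A = 0..2, domain B = 3..5.
dom_a, dom_b = [0, 1, 2], [3, 4, 5]
a_xyz = [[0, 0, 0], [5, 0, 0], [10, 0, 0]]
extended = np.array(a_xyz + [[15, 0, 0], [20, 0, 0], [25, 0, 0]], dtype=float)
separated = np.array(a_xyz + [[115, 0, 0], [120, 0, 0], [125, 0, 0]], dtype=float)
compact = np.array(a_xyz + [[0, 6, 0], [5, 6, 0], [10, 6, 0]], dtype=float)

counts = contact_frequency([extended, separated, compact], dom_a, dom_b)
print(counts)
```

In this toy set, only the last residue of domain A touches domain B in the extended arrangement, none do in the separated one, and all do in the compact one, so the counts play the role of the black bars in the figure.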
Figure 5. Similarity tree and matrix for the 32 sequences modeled further, as well as for the A0A5S9WPD4 sequence, which was eliminated (due to high similarity to A0A1P8AP86).
Figure 6. Histogram of pre- and post-minimization distributions of MolProbity scores for a subset of the generated models.
Figure 7. Distribution of CC (brown circles), Arc2 (blue circles) and LRR (purple circles) domains of the modeled sequences, and the same domains of the experimental structures (red circle: active Arc2 center, red triangle: active CC center, red star: active LRR center; blue circle: inactive Arc2 center, blue triangle: inactive CC center, blue star: inactive LRR center). The orange triangle represents the “VGG—NBS center—Arc1 center” origin plane, as described.
Figure 8. Histogram of the I_af2ac/I_model ratio for the NBD-LRR region. The blue line corresponds to the experimental inactive state, and the red line to the experimental active state.
Figure 9. Histogram of the I_af2ac/I_model ratio for the CC-NBD-LRR1-5 region. The EDVID type is shown in blue bars, and the RPS5 type is shown in orange bars. The blue line corresponds to the experimental inactive state, and the red line to the experimental active state.
Figure 10. Histogram of the ACC_af2ac/ACC_model ratios for the CC-NBD-LRR1-5 region. The blue line corresponds to the experimental inactive state, and the red line to the experimental active state.
Figure 11. From left to right: model of the M1C2N4 (TNL) protein generated by OmegaFold, with the solved structure of the NBS domain of the ZAR1 protein; model of M1C2N4 generated by AlphaFold3, with the solved structure of the NBS and LRR domains of the TNL protein ROQ1 (RCSB code: 7JLV). Domain colors: TIR, orange; NBS, blue; LRR, purple.
Table 1. RMSD (Å) of individual domains for the highest ranked models using an active ATP state structure as reference.
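| Model Name | CC | NBS-ARC1 | ARC2 | LRR |
|---|---|---|---|---|
| AF2—Database | 14.46 | 1.86 | 1.63 | 1.45 |
| AF2—Active Control | 1.39 | 0.49 | 0.42 | 0.53 |
| AF2—Inactive Control | 18.51 | 1.86 | 1.51 | 0.84 |
| AF2—Active MSA | 1.55 | 0.59 | 0.61 | 0.96 |
| AF2—Inactive MSA | 14.40 | 1.79 | 1.53 | 0.97 |
| AF3—ADP | 14.41 | 1.80 | 1.64 | 1.42 |
| AF3—ATP | 14.39 | 1.89 | 1.58 | 1.33 |
| AF3—No Ligand | 14.45 | 1.92 | 1.59 | 1.51 |
| RFAA—ADP | 14.42 | 2.04 | 1.64 | 1.86 |
| RFAA—ATP | 14.50 | 2.05 | 1.78 | 2.02 |
| RFAA—No Ligand | 14.45 | 1.92 | 1.65 | 1.96 |
| OmegaFold | 14.48 | 1.87 | 1.33 | 3.11 |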
Table 2. RMSD (Å) of individual domains for the highest ranked models using the Inactive ADP state structure as reference.
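| Model Name | CC | NBS-ARC1 | ARC2 | LRR |
|---|---|---|---|---|
| AF2—Database | 12.42 | 1.33 | 0.88 | 1.34 |
| AF2—Active Control | 19.19 | 1.89 | 1.45 | 0.78 |
| AF2—Inactive Control | 0.80 | 0.69 | 0.35 | 0.47 |
| AF2—Active MSA | 18.48 | 1.88 | 1.51 | 0.90 |
| AF2—Inactive MSA | 12.54 | 0.86 | 0.44 | 0.75 |
| AF3—ADP | 11.99 | 1.31 | 0.98 | 1.22 |
| AF3—ATP | 12.11 | 1.35 | 0.87 | 1.16 |
| AF3—No Ligand | 12.08 | 1.26 | 0.93 | 1.37 |
| RFAA—ADP | 12.74 | 1.41 | 1.02 | 1.77 |
| RFAA—ATP | 12.65 | 1.53 | 1.04 | 1.92 |
| RFAA—No Ligand | 12.36 | 1.42 | 1.11 | 1.89 |
| OmegaFold | 12.99 | 1.24 | 0.87 | 3.04 |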
Table 3. Global RMSD (Å) for the highest ranked models using crystal structures as reference.
| Model Name | RMSD vs. Active | RMSD vs. Inactive |
|---|---|---|
| AF2—Database | 22.158 | 6.004 |
| AF2—Active Control | 0.832 | 23.036 |
| AF2—Inactive Control | 22.612 | 0.675 |
| AF2—Active MSA | 1.271 | 22.873 |
| AF2—Inactive MSA | 21.997 | 5.837 |
| AF3—ADP | 22.119 | 5.785 |
| AF3—ATP | 22.076 | 5.879 |
| AF3—No Ligand | 22.191 | 5.889 |
| RFAA—ADP | 22.406 | 6.756 |
| RFAA—ATP | 22.646 | 6.792 |
| RFAA—No Ligand | 22.424 | 6.027 |
| OmegaFold | 22.374 | 6.442 |
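Tables 1–3 contrast low per-domain RMSD values with much larger global ones. The sketch below shows, on synthetic coordinates, how such a pattern arises when rigid domains are rearranged relative to one another. It assumes RMSD is computed after optimal rigid superposition (the Kabsch algorithm); the function name and the toy two-domain system are illustrative assumptions, not the article's actual pipeline or data.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid
    superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)          # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                    # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Hypothetical two-domain "protein": two rigid point clouds 30 Å apart.
rng = np.random.default_rng(0)
dom_a = rng.normal(scale=5.0, size=(30, 3))
dom_b = rng.normal(scale=5.0, size=(30, 3)) + np.array([30.0, 0.0, 0.0])
reference = np.vstack([dom_a, dom_b])

# "Model": domain B swung by 90 degrees about the z axis; each domain stays rigid.
rot_z = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
model = np.vstack([dom_a, dom_b @ rot_z.T])

rmsd_dom_a = kabsch_rmsd(model[:30], reference[:30])
rmsd_dom_b = kabsch_rmsd(model[30:], reference[30:])
rmsd_global = kabsch_rmsd(model, reference)
print(f"domain A: {rmsd_dom_a:.3f}  domain B: {rmsd_dom_b:.3f}  global: {rmsd_global:.3f}")
```

Each domain superposes essentially perfectly on its own, while the global value is large because no single rigid transformation can match both domains at once. This is the same qualitative behavior as a model with well-folded modules placed in the wrong relative arrangement.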
Table 4. Binding free energy between ZAR1 domains. Energies reported in kcal/mole.
| Model Name | CC/NBD Interface | NBD/LRR Interface | CC/LRR Interface |
|---|---|---|---|
| Inactive Crystal | −8.8 | −14.0 | −7.8 |
| AF2—Database | −10.1 | −15.1 | −8.5 |
| AF2—Inactive Control | −9.7 | −14.7 | −8.3 |
| AF2—Inactive MSA | −10.1 | −14.9 | −8.6 |
| AF3—ADP | −10.9 | −15.6 | −8.9 |
| AF3—ATP | −10.7 | −15.0 | −9.2 |
| AF3—No Ligand | −10.6 | −15.4 | −9.1 |
| RFAA—ADP | −9.7 | −15.0 | −9.6 |
| RFAA—ATP | −9.7 | −16.2 | −9.3 |
| RFAA—No Ligand | −9.7 | −14.4 | −9.2 |
Table 5. The 36 CNL clusters represented by the member with the most publications in each cluster. Clusters that were eliminated are written in bold.
| Protein Name | Length | Cluster | Domains | Publications |
|---|---|---|---|---|
| A0A1P8AP86 | 888 | 1 | CC-NBD-ARC1-ARC2-LRR | 2 |
| Q9SI85 | 893 | 2 | CC-NBD-ARC1-ARC2-LRR | 3 |
| Q940K0 | 889 | 3 | CC-NBD-ARC1-ARC2-LRR | 6 |
| Q9M667 | 835 | 4 | CC-NBD-ARC1-ARC2-LRR | 5 |
| Q9C646 | 899 | 5 | CC-NBD-ARC1-ARC2-LRR | 3 |
| Q39214 | 926 | 6 | CC-NBD-ARC1-ARC2-LRR | 10 |
| Q8W474 | 907 | 7 | CC-NBD-ARC1-ARC2-LRR-X | 4 |
| Q8RXS5 | 888 | 8 | CC-NBD-ARC1-ARC2-LRR | 3 |
| Q8W3K3 | 910 | 9 | CC-NBD-ARC1-ARC2-LRR | 3 |
| O64973 | 889 | 10 | CC-NBD-ARC1-ARC2-LRR | 10 |
| A0A654EJG6 | 904 | 11 | CC-NBD-ARC1-ARC2-LRR | 0 |
| Q8L3R3 | 885 | 12 | CC-NBD-ARC1-ARC2-LRR | 4 |
| Q8W4J9 | 908 | 13 | CC-NBD-ARC1-ARC2-LRR | 12 |
| Q9STE5 | 847 | 14 | CC-NBD-ARC1-ARC2-LRR | 2 |
| Q9LQ54 | 870 | 15 | CC-NBD-ARC1-ARC2-LRR | 3 |
| A0A7G2ET34 | 1306 | 16 | X-CC-NBD-ARC1-ARC2-LRR | 0 |
| Q9XIF0 | 906 | 17 | CC-NBD-ARC1-ARC2-LRR | 2 |
| Q9LVT3 | 948 | 18 | CC-NBD-ARC1-ARC2-LRR-X | 2 |
| Q9STE7 | 847 | 19 | CC-NBD-ARC1-ARC2-LRR | 2 |
| Q9FG90 | 862 | 20 | CC-NBD-ARC1-ARC2-LRR | 2 |
| Q9LRR5 | 1424 | 21 | CC-NBD-ARC1-ARC2-LRR-X-LRR | 2 |
| Q9FLB4 | 874 | 22 | CC-NBD-ARC1-ARC2-LRR | 2 |
| A0A5S9WIX4 | 875 | 23 | CC-NBD-ARC1-ARC2-LRR | 0 |
| A0A5S9WPD4 | 1025 | 24 | CC-NBD-ARC1-ARC2-LRR-X | 0 |
| A0A654EJV2 | 661 | 25 | CC-NBD-ARC1-ARC2-LRR | 0 |
| A0A654FPA2 | 881 | 26 | CC-NBD-ARC1-ARC2-LRR | 0 |
| O82484 | 892 | 27 | CC-NBD-ARC1-ARC2-LRR | 2 |
| Q8W3K0 | 1138 | 28 | CC-NBD-ARC1-ARC2-LRR | 5 |
| Q38834 | 852 | 29 | CC-NBD-ARC1-ARC2-LRR | 10 |
| Q9LRR4 | 1054 | 30 | CC-NBD-ARC1-ARC2-LRR | 2 |
| P60839 | 884 | 31 | CC-NBD-ARC1-ARC2-LRR | 2 |
| P60838 | 894 | 32 | CC-NBD-ARC1-ARC2-LRR | 5 |
| A0A654EJC3 | 921 | 33 | CC-NBD-ARC1-ARC2-LRR | 0 |
| Q9LMP6 | 851 | 34 | CC-NBD-ARC1-ARC2-LRR | 2 |
| Q9SX38 | 857 | 35 | CC-NBD-ARC1-ARC2-LRR | 2 |
| Q42484 | 909 | 36 | CC-NBD-ARC1-ARC2-LRR | 17 |
Table 6. Table containing relevant solved structures of NLR proteins. The ZAR1 structures have been bolded.
| UniProt Acc. | RCSB Acc. | Chain | Domains | Organism | Geometric Classification |
|---|---|---|---|---|---|
| **Q38834** | **6J5T** | C | CC-NBD-ARC1-ARC2-LRR | A. thaliana | Active |
| **Q38834** | **6J6I** | C | CC-NBD-ARC1-ARC2-LRR | A. thaliana | Active |
| Q9ZSN5 | 7CRC | A | X-TIR-NBD-ARC1-ARC2-LRR | A. thaliana | Active |
| Q9ZSN5 | 7DFV | A | X-TIR-NBD-ARC1-ARC2-LRR | A. thaliana | Active |
| A0A290U7C4 | 7JLV | A | TIR-NBD-ARC1-ARC2-LRR-X | N. benthamiana | Active |
| S5ABD6 | 7XC2 | A | CC-NBD-ARC1-ARC2-LRR | T. monococcum | Active |
| S5ABD6 | 7XE0 | A | CC-NBD-ARC1-ARC2-LRR | T. monococcum | Active |
| S5ABD6 | 7XX2 | A | CC-NBD-ARC1-ARC2-LRR | T. monococcum | Active |
| **Q38834** | **6J5W** | A | CC-NBD-ARC1-ARC2-LRR | A. thaliana | Inactive |
| A1X877 | 6S2P | N | CC-NBD-ARC1-ARC2-LRR | S. lycopersicum | Inactive |
| A1X877 | 8BV0 | A | CC-NBD-ARC1-ARC2-LRR | S. lycopersicum | Inactive |
| A0A0S3ANR1 | 8RFH | A | CC-NBD-ARC1-ARC2-LRR | N. benthamiana | Inactive |
| A0A3Q7IF17 | 8XUO | A | CC-NBD-ARC1-ARC2-LRR | S. lycopersicum | Inactive |
| A0A3Q7IF17 | 8XUQ | A | CC-NBD-ARC1-ARC2-LRR | S. lycopersicum | Inactive |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Șulea, T.A.; Martin, E.C.; Bugeac, C.A.; Bectaș, F.S.; Iacob, A.-L.; Spiridon, L.; Petrescu, A.-J. Lessons from Deep Learning Structural Prediction of Multistate Multidomain Proteins—The Case Study of Coiled-Coil NOD-like Receptors. Int. J. Mol. Sci. 2025, 26, 500. https://doi.org/10.3390/ijms26020500