Article

RLPredictiOme, a Machine Learning-Derived Method for High-Throughput Prediction of Plant Receptor-like Proteins, Reveals Novel Classes of Transmembrane Receptors

by Jose Cleydson F. Silva 1, Marco Aurélio Ferreira 2, Thales F. M. Carvalho 3, Fabyano F. Silva 4, Sabrina de A. Silveira 5, Sergio H. Brommonschenkel 6 and Elizabeth P. B. Fontes 2,*

1 National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Viçosa 36570-900, Brazil
2 Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
3 Institute of Engineering, Science and Technology, Universidade Federal dos Vales do Jequitinhonha e Mucuri, Janaúba 39447-814, Brazil
4 Department of Animal Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
5 Department of Computer Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
6 Plant Pathology Department/Bioagro, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2022, 23(20), 12176; https://doi.org/10.3390/ijms232012176
Submission received: 30 July 2022 / Revised: 8 October 2022 / Accepted: 9 October 2022 / Published: 12 October 2022
(This article belongs to the Special Issue State-of-the-Art Molecular Plant Sciences in Brazil)

Abstract

Cell surface receptors play essential roles in perceiving and processing external and internal signals at the cell surface of plants and animals. The receptor-like protein kinases (RLK) and receptor-like proteins (RLPs), two major classes of proteins with membrane receptor configuration, play a crucial role in plant development and disease defense. Although RLPs and RLKs share a similar single-pass transmembrane configuration, RLPs harbor short divergent C-terminal regions instead of the conserved kinase domain of RLKs. This RLP receptor structural design precludes sequence comparison algorithms from being used for high-throughput predictions of the RLP family in plant genomes, as has been extensively performed for RLK superfamily predictions. Here, we developed the RLPredictiOme, implemented with machine learning models in combination with Bayesian inference, capable of predicting RLP subfamilies in plant genomes. The ML models were simultaneously trained using six types of features, along with three stages to distinguish RLPs from non-RLPs (NRLPs), RLPs from RLKs, and classify new subfamilies of RLPs in plants. The ML models achieved high accuracy, precision, sensitivity, and specificity for predicting RLPs with relatively high probability ranging from 0.79 to 0.99. The prediction of the method was assessed with three datasets, two of which contained leucine-rich repeats (LRR)-RLPs from Arabidopsis and rice, and the last one consisted of the complete set of previously described Arabidopsis RLPs. In these validation tests, more than 90% of known RLPs were correctly predicted via RLPredictiOme. In addition to predicting previously characterized RLPs, RLPredictiOme uncovered new RLP subfamilies in the Arabidopsis genome. These include probable lipid transfer (PLT)-RLP, plastocyanin-like-RLP, ring finger-RLP, glycosyl-hydrolase-RLP, and glycerophosphoryldiester phosphodiesterase (GDPD, GDPDL)-RLP subfamilies, yet to be characterized. 
Molecular evolution studies comparing these proteins with the only Arabidopsis GDPDL-RLK indicated that the ectodomain of GDPDL-RLPs might have undergone purifying selection, with a predominance of synonymous substitutions. Expression analyses revealed that the predicted GDPDL-RLPs display a basal expression level and respond to developmental and biotic signals. The results of these biological assays indicate that these subfamily members have maintained functional domains during evolution and may play relevant roles in development and plant defense. Therefore, RLPredictiOme provides a framework for genome-wide surveys of the RLP superfamily as a foundation to rationalize functional studies of surface receptors and their relationships with different biological processes.

1. Introduction

The capacity to transiently regulate cellular processes in response to external environmental signals is crucial to all living organisms. While the downstream regulatory events in a signaling cascade can involve biochemical modifications, including protein phosphorylation, ligand binding, and allosteric regulation, as well as changes in the transcription/translation profiles, the initial sensing event is predominantly mediated by membrane receptors. In plants, two major classes of proteins with membrane receptor structural configuration co-exist, namely receptor-like kinases (RLK) and receptor-like proteins (RLP) [1,2]. The receptor-like kinases comprise a large family with more than 420 family members in Arabidopsis [3]. These transmembrane receptors harbor a divergent extracellular domain (ectodomain, ECD) at the N-terminal region, followed by a transmembrane segment (TM) and a C-terminal cytoplasmic signaling domain. This configuration of a single-pass transmembrane kinase receptor invokes a mechanism of ligand binding-induced homo or hetero oligomerization of RLKs as the essential early event for signaling and transducing from the receptor, similarly to the receptor-tyrosine kinases (RTK) of mammalian cells [4,5]. In this scenario, ECD is the stimulus-sensing, ligand recognition domain that induces multimerization, and the kinase domain functions as the phosphorylation-dependent transducing module that relays the signal intracellularly.
Phylogenetic analyses based on the RLK kinase domains organized their ectodomain into clusters of conserved motifs and classified the RLKs into 15 subfamilies. Among them, the leucine-rich repeat (LRR)-RLK subfamily is further subdivided into 13 subfamilies (LRRI-XIII) according to the LRR motif organization ranging from 3 to 26 LRRs [6,7]. The RLK family size has been determined in other plant species, which revealed even larger RLK gene families in the genome of soybean, rice, and tomato [3,8,9,10]. The complexity of the RLK superfamily may reflect the intricate coordination of plant responses to external signals during plant development and interactions with the biotic and abiotic environment. Accordingly, several RLKs have been functionally characterized in development, environmental stresses, and plant defenses (for more details, see references [11,12,13,14,15,16,17,18,19,20,21,22]).
RLKs are also involved in plant immunity and function as pattern recognition receptors (PRRs), which perceive pathogen-associated molecular patterns (PAMPs) or damage-associated molecular patterns (DAMPs) presented, respectively, by pathogens and plants during infection. Interaction of PRRs with PAMPs/DAMPs initiates PAMP-triggered immunity (PTI), the first layer of the innate immune system in plants [23]. Many examples of leucine-rich repeat receptor-like kinases (LRR-RLKs) have been functionally characterized as PRRs (for more details, see references [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]).
The second class of plant transmembrane proteins, RLPs, consists of an N-terminal extracellular domain, which shares similar motifs with RLK ectodomains, and an internal single transmembrane segment followed by a short cytoplasmic domain that lacks a transducing kinase domain [23]. RLPs are structurally similar to Toll-like receptors (TLRs) involved in mammalian immunity, which also contain a leucine-rich repeat ectodomain and a short cytoplasmic tail [5]. The RLP configuration poses a higher degree of complexity for signaling, as RLPs depend on heterodimerization with RLKs or association with receptor-like cytoplasmic kinases (RLCKs) to transduce a stimulus from the receptor. Accordingly, the leucine-rich repeat receptor-like protein (LRR-RLP) TOO MANY MOUTHS (TMM) forms complexes with the LRR-RLKs ERECTA and ERECTA-LIKE 1 (ERL1) to perceive the EPIDERMAL PATTERNING FACTOR 1 (EPF1) and EPF2 peptides for the regulation of stomatal patterning [43], and the CLAVATA2 RLP is required for the stability of the CLAVATA1 (CLV1) RLK [44]. Likewise, the lysin motif (LysM)-RLPs LYSIN-MOTIF 1 (LYM1) and LYM3 associate with the LysM-RLK CERK1 (CHITIN ELICITOR RECEPTOR KINASE 1) to recognize bacterial peptidoglycans [45], and the LRR-RLP RLP23 forms a complex with the LRR-RLK SUPPRESSOR OF BIR1-1 (SOBIR1) that recognizes NECROSIS- AND ETHYLENE-INDUCING PEPTIDE 1 (NEP1)-LIKE PROTEINS (NLPs) to trigger PTI signaling [46]. In addition to these Arabidopsis RLPs, the first characterized RLP, Cf-9, was identified in tomato plants as an LRR-RLP and has been shown to trigger effector-triggered immunity (ETI)-like signaling, elicited specifically by the Cladosporium fulvum Avr9 effector [47]. The tomato LRR-RLP Cf-4 is also required for resistance to C. fulvum expressing the Avr4 gene [48]. Cf-9 and Cf-4 associate with the RLKs SOBIR1 and BRI1-ASSOCIATED KINASE 1 (BAK1) to initiate receptor endocytosis and plant immunity [49]. Likewise, the N. benthamiana LRR-RLP RESPONSE TO XEG1 (RXEG1), which recognizes the glycoside hydrolase 12 protein XEG1, and the RLP RE02 (Response to VmE02) form complexes with BAK1 and SOBIR1 to transduce the XEG1- and VmE02-induced defense signals, respectively [50,51]. The rice RLP, OsRLP1, also interacts with OsSOBIR1 to induce immune responses against viral infection [52].
Although some progress has been reached in characterizing RLPs, a biological function has been assigned to only a few plant RLPs, despite their conceptual relevance in cell signaling events. While 15 RLK subfamilies with distinct ECD have been detected in Arabidopsis, only three Arabidopsis RLP subfamilies have been identified based on single-gene identification and functional studies [2]. The only genome-wide study of RLPs was restricted to the LRR-RLP subfamily [53]. In the case of RLKs, the successful identification and organization of the superfamily in different subfamilies relied on methods that use algorithms, such as BLAST and hidden Markov models (HMM), to perform searches for sequence alignments of conserved regions. One possible explanation for the poor characterization of RLPs may be the difficulty of assigning members to this family based on sequence comparison, as they lack the conserved C-terminal serine/threonine kinase domain, restricting the prediction of novel RLPs. In addition to requiring RLPs to be associated with a kinase domain-containing receptor for signaling, the lack of a cytoplasmic transducing kinase domain prevents genome-wide predictions of RLP subfamilies based on sequence comparisons. Therefore, a complete inventory of the RLP family in the genome of different plant species is lacking, and, hence, functional studies have been limited.
The limitation of software based on multiple sequence alignments for identifying RLPs may be overcome with the application of artificial intelligence algorithms developed based on filters that support the point features of these receptors. In artificial intelligence, machine learning (ML) has emerged as a potential tool in molecular biology to analyze massive datasets and extract knowledge from complex biosystems [54]. ML has been extensively used in all sorts of thematic issues, from medicine to robotics [55,56,57]. In plant science, ML has been applied for viral gene identification [58], the diagnosis of bacterial infection [59], salt stress tolerance [60], and the taxonomy of grapevine [61], in addition to global analysis of gene expression, in response to hormones and environmental stresses [62], plant immunity, and miRNA network prediction [54]. Trained models have also been successfully used for functional protein classification in plant genomes [63].
To provide a framework for identifying RLPs and predicting their function, we developed RLPredictiOme, a machine learning method combined with Bayesian inference. In addition to using six different types of features to train the ML models, the method used multiple datasets based on RLK ectodomains and on the hypothesis that RLPs lack the kinase domain but retain the same receptor configuration as RLKs. It is reasonable to suppose that the RLP family may contain all ectodomains identified in RLKs, as RLPs may have emerged during evolution from RLKs that lost the kinase domain. So far, five RLP groups containing distinct RLK ectodomains have been identified [53]. Our ML models could distinguish RLPs from non-RLPs (NRLPs) and RLPs from RLKs, and classify RLP subfamilies, with relatively high accuracy, precision, sensitivity, and specificity. To demonstrate the capacity to predict RLP families, we validated the method with biological experiments describing a new RLP family, designated GDPDL-RLP. RLPredictiOme may facilitate the prediction of RLPs and provide new insights into their role in plants.

2. Results

2.1. Revisiting the Ectodomain of the RLK Superfamily in Plants

We surveyed the genomes of 80 plant species to identify the functional ectodomains of RLKs based on in silico models as a first step for defining the datasets. A total of 40,418 sequences were retrieved. We identified 100 classes of RLK ectodomains associated with C-terminal kinase domains (Table 1). However, most of these ectodomains generated subfamilies with fewer than 10 members. Sequences with identity higher than 0.85 were removed using the CD-HIT software. Additionally, only sequences with a single transmembrane segment were selected. A total of 14,787 amino acid sequences were recovered, and their ectodomains were used as positive datasets for filtering RLPs versus NRLPs and RLPs versus RLKs.
Three datasets were created to represent a higher number of negative examples. The first dataset contained 14,973 positive examples and 15,993 negative ones. The second and third datasets each contained the same 14,973 positive examples and 15,973 negative examples. To distinguish RLPs from NRLPs, we used six types of features (see Methods section) from the three datasets, giving a total of 18 training sets. On the other hand, to distinguish RLPs from RLKs, only one dataset was used, with 14,973 positive examples (ectodomains of the RLKs) and negative examples (full-length sequences of the RLKs), resulting in six training sets, one per feature type.
The RLP subfamily members were assigned according to the ectodomains of RLKs. For each training set, 15 classes were considered, and a 16th class, designated Other-RLPs, was defined by grouping the smaller subfamilies (Table 2). In some plant species, uncharacterized RLK subfamilies have only one to ten members and were grouped in the class Other-RLPs. LRR-RLKs, unknown-RLKs, S-domain-RLKs, and WAK-RLKs are over-represented RLK subfamilies in plants. In contrast, thaumatin, GDPD, and malectin are small subfamilies not represented in all plant species [9]. For each over-represented subfamily, 500 sequences were randomly selected to compose ten additional datasets; considering the six types of features mentioned above, this yielded 60 training sets.

2.2. Feature Analysis

We implemented the RLPredictiOme method using six distinct types of attributes (Figure 1). These included (i) the frequency of the chemical properties of amino acid side chains (CPAASC), which have 9 features, and (ii) CPAASC2 extracted from N-terminal and C-terminal regions with 18 features; (iii) the amino acid composition with 20 features and (iv) amino acid composition extracted from N-terminal and C-terminal regions with 40 features (Figure 1B). Furthermore, we used (v) dipeptide and (vi) tripeptide compositions resulting in 400 and 8000 features, respectively. The simultaneous use of six types of features and multiple datasets provided RLPredictiOme with information to apply Bayesian inference (see Section 4) as a powerful ensemble method to make robust predictions.
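The feature counts above (20, 40, 400, and 8000) follow directly from counting k-mers over the 20 standard amino acids. The following is a minimal sketch of such composition features, not the authors' implementation: the function names, the toy sequence, the handling of non-standard residues, and the split of a sequence into N-terminal and C-terminal halves are all assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence, k):
    """Return the relative frequency of every possible k-mer, in a fixed order."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # assumption: skip windows with non-standard residues
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / windows for kmer in kmers]


def terminal_composition(sequence, k=1, fraction=0.5):
    """Concatenate the compositions of the N-terminal and C-terminal parts.

    The split point (half of the sequence) is a hypothetical choice.
    """
    cut = int(len(sequence) * fraction)
    return kmer_composition(sequence[:cut], k) + kmer_composition(sequence[cut:], k)


toy = "MKTLLLAVALLSSAAG"  # hypothetical toy sequence
aac = kmer_composition(toy, 1)   # amino acid composition
dpc = kmer_composition(toy, 2)   # dipeptide composition
tpc = kmer_composition(toy, 3)   # tripeptide composition
aac_nc = terminal_composition(toy)  # N-terminal + C-terminal composition
print(len(aac), len(aac_nc), len(dpc), len(tpc))  # 20 40 400 8000
```

Each vector has a fixed length regardless of protein length, which is what allows sequences of different sizes to be fed to the same classifier.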
For the classification models for RLPs/NRLPs (first step, Figure 1C), the tripeptide composition was the best-performing feature among all those tested in the models built with the RLPs/NRLPs datasets using the logistic regression algorithm (Table 3). The three models built with tripeptide composition achieved accuracy (ACC) of 0.953, 0.955, and 0.953, respectively, and Matthews correlation coefficient (MCC) of 0.906, 0.910, and 0.96, respectively. Furthermore, the false discovery rate (FDR) was lower than 0.05.
For the classification models for RLPs/RLKs (second step, Figure 1D), the amino acid composition of the N-terminus and C-terminus and the tripeptide composition were the features achieving the best performance, both resulting in an ACC of 0.97, an MCC of 0.95, and an FDR lower than 0.05 (Table 4). In the RLP subfamily models built with RLP subfamily datasets (third step, Figure 1E), the tripeptide composition outperformed the others, with ACC and MCC of 0.984 and 0.866, respectively (Table 5).

2.3. ML Model Capacity of Distinguishing RLPs from NRLPs

The ability of the ML models to distinguish RLPs from NRLPs was examined through the predictive capacity of the models created with the RLPs/NRLPs datasets (Figure 1C). The models that classify RLPs/NRLPs were evaluated using 10-fold cross-validation based on the following metrics: ACC, sensitivity, precision, F-measure, specificity, FDR, and MCC. For each dataset, 21 models (21 algorithms) were selected, and the performance results are presented in Table 3. In general, the selected models provided average values for ACC, F-measure, FDR, MCC, precision, sensitivity, and specificity equal to 0.93, 0.934, 0.070, 0.861, 0.948, 0.948, and 0.911, respectively.
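For readers less familiar with these metrics, the sketch below shows how each one is derived from the four confusion-matrix counts of a binary classifier (here, RLP as the positive class and NRLP as the negative class). It uses the standard textbook definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute ACC, sensitivity, specificity, precision, F-measure, FDR, and MCC."""
    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
        "fdr": ratio(fp, tp + fp),  # equals 1 - precision
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for one cross-validation fold.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that FDR is the complement of precision, so an FDR below 0.05 corresponds to a precision above 0.95.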

2.4. ML Model Abilities to Distinguish RLPs from RLKs

To distinguish RLPs from RLKs, we assessed the generality of models constructed with RLP/RLK datasets (Figure 1D). The outcome of 10-fold cross-validations and evaluated metrics for RLPs/RLKs models are shown in Table 4. The quadratic discriminant analysis and gradient boosting classifier with the amino acid composition of the N-terminus, C-terminus, and tripeptide features outperformed the others (Table 4). On average, the six models provided an ACC of 0.968, F-measure of 0.967, FDR of 0.04, MCC of 0.936, precision of 0.981, sensitivity of 0.981, and specificity of 0.955.

2.5. The Ability of ML Models to Classify RLP Subfamilies

To classify RLP subfamilies, we evaluated models built with RLP subfamily datasets using 10-fold cross-validation. The performance of the models was examined with the previously mentioned metrics (Figure 1E). The tripeptide and dipeptide composition features achieved average MCC values higher than 0.90 when using the K-nearest neighbor algorithm. The N-terminus and C-terminus amino acid composition feature achieved an average MCC value of 0.899 using a calibrated classifier and linear discriminant analysis (Table 4). On average, the six models provided an ACC of 0.98, an F-measure of 0.874, a precision of 0.877, and a sensitivity of 0.87, while MCC varied from 0.759 to 0.953 (Table 5).

2.6. Validation of RLPredictiOme

For RLPredictiOme validation, we tested the ML models in combination with Bayesian inference as an ensemble method (Figure 1). In the first validation, we submitted 47 near-characterized sequences of RLPs to RLPredictiOme. The validation dataset comprises thirty-nine LRR-RLPs, six LysM-RLPs, two WAK-RLPs, and one salt stress-responsive/antifungal-RLP (Table 6). However, six of these proteins were not classified as RLPs because they did not have a TM. The test resulted in thirty-seven LRR-RLPs and two LysM-RLPs being correctly classified, whereas two LysM-RLPs were classified as undefined due to the relatively low probability (p) provided by Bayesian inference for the RLP subfamily. The remaining two LysM-RLPs (Q67UE8.1 LYP4 and Q69T51.1 LYP6), one WAK-RLP (AKP45167), and one salt stress-responsive/antifungal-RLP (LOC_Os04g56430.1) were not classified as RLPs due to the absence of a TM (Table 6).
In the second validation, we used the data of a genome-wide study of RLPs restricted to the LRR-RLP subfamily [53]. The 57 LRR-RLPs of Arabidopsis were submitted to the RLPredictiOme predictor. As a result, 47 LRR-RLPs were classified correctly, although 13 LRR-RLPs did not have a signal peptide (SP). One LRR-RLP harboring SP was undefined, and the remaining nine LRR-RLPs were not classified as RLP due to the TM absence (Table 7). Interestingly, the AtRLP4 protein was previously classified as LRR-RLP; however, the RLPredictiOme classified it as malectin-RLP due to one di-glucose binding domain within the endoplasmic reticulum-associated LRR domain.
In a third validation, we selected 148 LRR-RLPs described in a genome-wide study of rice RLPs [64] (Table S1). The results show that 78 LRR-RLPs with SP and TM were correctly classified with a relatively high probability (greater than 0.98). Additionally, of 73 LRR-RLPs with a single TM, 71 were correctly classified, whereas 2 were classified as Other-RLPs with an estimated probability ranging from 0.792 to 0.805. Only four predicted LRR-RLPs from rice were classified as NRLPs; two lack both SP and TM, and two do not harbor a TM. The fourth validation was carried out to ensure that RLPredictiOme does not classify proteins at random. For this, 100 randomly generated sequences were submitted to RLPredictiOme, and all sequences were classified as NRLPs in the first step (Table 8).

2.7. High Throughput Prediction of RLPs in the Arabidopsis Genome Using RLPredictiOme

We performed high-throughput prediction by submitting the Arabidopsis sequences to RLPredictiOme. The cutoff for the probability filter was set to 0.6 in the first two steps and 0.7 in the last step (Figure 1F). In the third step, the probability estimates were more flexible in order to predict the RLP subfamilies.
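The three-step filter with these cutoffs can be pictured as a simple cascade. The sketch below is only an illustration under stated assumptions: the function name, the labels returned for rejected sequences, the use of "greater than or equal to" at each cutoff, and the example probabilities are hypothetical, and the probabilities themselves would come from the Bayesian ensemble described in the Methods rather than from this code.

```python
# Cutoffs quoted in the text: 0.6 for the first two steps, 0.7 for the last step.
STEP_CUTOFFS = {"rlp_vs_nrlp": 0.6, "rlp_vs_rlk": 0.6, "subfamily": 0.7}


def cascade_label(p_rlp_vs_nrlp, p_rlp_vs_rlk, subfamily_probs):
    """Apply the three probability filters in order and return a label.

    p_rlp_vs_nrlp   -- probability of being an RLP rather than an NRLP (step 1)
    p_rlp_vs_rlk    -- probability of being an RLP rather than an RLK (step 2)
    subfamily_probs -- mapping of subfamily name to probability (step 3)
    """
    if p_rlp_vs_nrlp < STEP_CUTOFFS["rlp_vs_nrlp"]:
        return "NRLP"
    if p_rlp_vs_rlk < STEP_CUTOFFS["rlp_vs_rlk"]:
        return "RLK"  # assumption: sequences failing step 2 are treated as RLK-like
    best_subfamily = max(subfamily_probs, key=subfamily_probs.get)
    if subfamily_probs[best_subfamily] < STEP_CUTOFFS["subfamily"]:
        return "undefined"
    return best_subfamily


# Hypothetical probabilities for three candidate sequences.
print(cascade_label(0.95, 0.90, {"LRR-RLP": 0.98, "LysM-RLP": 0.02}))
print(cascade_label(0.40, 0.90, {"LRR-RLP": 0.98, "LysM-RLP": 0.02}))
print(cascade_label(0.95, 0.90, {"LRR-RLP": 0.55, "LysM-RLP": 0.45}))
```

Under these assumptions, a sequence reaches the subfamily step only after passing both binary filters, which matches the observation that random sequences are rejected as NRLPs in the first step.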
From this genome-wide prediction, RLPredictiOme classified 176 RLP sequences into 15 subfamilies (Table S2). Table 9 summarizes the correct predictions within the subfamily. The number of proteins with unknown functions is highlighted in red, whereas the blue description represents the RLPs subfamilies predicted in other subfamilies. The LRR-RLPs subfamily contained 49 members. Three new members (AT5G37360, AT5G19230, and AT4G28560), predicted with relatively high probability, were not classified into a known subfamily, whereas two sequences were incorrectly classified. Interestingly, AtRLP4 has two domains, an LRR domain, and an endoplasmic reticulum protein-associated Di-glucose binding domain, which characterizes malectin proteins. The RLPredictiOme method classified the AtRLP4 into the malectin-RLP subfamily (see Table S2).
The candidate sequences with a legume lectin domain were classified into two RLP subfamilies, B-Lectin-RLP and L-Lectin-RLP (Table S2). Only one member was classified as B-Lectin-RLP with an unknown function, while six members were classified into the L-Lectin-RLP subfamily, also designated as unknown function proteins. Seven proteins were classified incorrectly into this subfamily. The 20 lysin motif-containing candidate proteins were classified as LysM-RLPs (Table S2). Two (AT1G77630.1 and AT2G17120.1) of the three previously characterized LysM-RLPs [65] and two classified LysM-RLPs (AT3G06360.1 and AT5G26270.1) belong to subfamilies previously identified as unknown function subfamilies, and one sequence (AT1G63550.1) belongs to the salt stress response/antifungal-RLP family. The other 15 sequences may belong to the lipid transfer protein family, not yet characterized. Additionally, the lipid transfer family ectodomain associated with a kinase domain was allocated to the other-RLP group as probable lipid transfer-RLK. Twelve sequences were classified as probable lipid transfer-RLPs; however, misclassification occurred in the LysM-RLP and unknown-RLP groups, which may be functionally similar. This may be due to the over-representation of these two groups.
In the malectin-RLP subfamily, RLPredictiOme correctly classified two previously characterized members (AT1G28340.1 and AT1G24485.1). Four candidate members were identified in subfamilies of unknown function, and seven sequences were incorrectly predicted (Table S2). Furthermore, the third previously identified malectin-RLP (AT3G46240.1) was predicted as an RCC1-RLP. This subfamily has seven predicted members without known functions. One salt stress response/antifungal-RLP was predicted within this family. The salt stress response/antifungal-RLPs had four members correctly classified and four predicted within other subfamilies (three in WAK-RLP and one in RCC1-RLP). The S-domain-RLP subfamily had one correctly and one incorrectly predicted sequence (Table S2).
As for the thaumatin-RLP subfamily, all six members were correctly predicted (Table S2). The WAK-RLP subfamily correctly predicted five members but also incorporated one candidate sequence with an unknown function and three salt stress response/antifungal-RLPs. Ectodomains without a functional domain were classified within a subfamily designated unknown-RLPs. This group also includes RLPs harboring the ectodomains PERK-like, extensin, RKF3-like, CrRLK1, and RLK10-like proline-rich proteins. RLPredictiOme predicted 46 sequences with unknown functions classified as an unknown-RLP subfamily (Table S2). The protein sequences, which are not classified correctly or have a low relative probability of subfamily classification, were designated as undefined and not considered RLPs. In summary, a total of 78 proteins were classified in this group (Table S2).
RLPredictiOme identified probable lipid transfer-RLPs, considered a novel RLP class associated with RLKs, yet to be characterized. Furthermore, three new classes of RLPs were predicted: plastocyanin-like-RLP, ring finger-RLP, and glycosyl-hydrolase-RLP, which contained eight, five, and seven members, respectively. Interestingly, five glycerophosphoryl diester phosphodiesterase family (GDPDL) members were predicted as other-RLPs. Because GDPDL is a rare protein family in plants, we selected GDPDL-RLPs for experimental validation of these receptor-like protein candidates. The number of predicted RLPs in each subfamily is shown in Table 9.

2.8. GDPDL Family Downstream Analysis

Phylogenetic analysis of the kinase domains of the RLK family and of IRE1A and IRE1B, endoplasmic reticulum (ER)-specific protein kinases, clustered the kinase domains of GDPDL-RLK and thaumatin in the same group, distinct from the ER kinases (Figure 2A). These results suggest that GDPDL-RLKs are not ER transmembrane proteins. The secondary structure and the topology of GDPDL show that the N-terminal region of GDPDL-RLK is composed of a signal peptide, a GDPD domain, and more than 10 candidate sites for N-glycosylation (Figure 2B). As an RLK, GDPDL-RLK contains an ectodomain facing the extracellular space, a transmembrane segment, and a cytoplasmic portion harboring the kinase domain. The topology of the classified GDPDL-RLPs fits a typical RLP configuration, with an N-terminal signal peptide, the glycerophosphoryl diester phosphodiesterase ectodomain, and the transmembrane segment, although it lacks a short C-terminal cytoplasmic domain. GDPDL1 and GDPDL6 harbor two glycerophosphoryl diester phosphodiesterase domains, whereas GDPDL3/4/5 have a single domain localized in a similar position compared with GDPDL-RLK.
The molecular evolution of the new GDPDLs and the GDPDL-RLK ectodomain was investigated by calculating the ratio between non-synonymous and synonymous substitutions (Ka/Ks). Compared to the full-length sequence of GDPDL-RLK, only the gene pair GDPDL-RLK/GDPDL6, with a Ka/Ks ratio > 1, may have undergone positive selection (Table 10). The gene pairs formed by the ectodomain sequence of GDPDL-RLK and GDPDL1/3/4 were subjected to purifying selection, as suggested by their Ka/Ks ratio < 1 and p-value < 0.05. The divergence times of GDPDL1/3/4 were 23.7, 32.5, and 120.1 Mya. These results suggest that, despite the divergence time of GDPDL1/3/4 compared to the GDPDL-RLK ectodomain, the higher frequency of synonymous mutations may have kept GDPDL1/3/4 and the GDPDL-RLK ectodomain functionally similar.
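The interpretation of Ka/Ks used here follows the usual convention: a ratio above 1 points to positive selection, a ratio below 1 to purifying selection, and a ratio of 1 to neutral evolution. The sketch below encodes only that rule; the function name and the Ka and Ks values are hypothetical and do not reproduce Table 10, and the significance test (p-value) mentioned in the text is not covered.

```python
def selection_regime(ka, ks):
    """Classify a gene pair by its Ka/Ks ratio.

    ka -- non-synonymous substitutions per non-synonymous site
    ks -- synonymous substitutions per synonymous site
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio > 1:
        regime = "positive selection"
    elif ratio < 1:
        regime = "purifying selection"
    else:
        regime = "neutral evolution"
    return ratio, regime


# Hypothetical values for two gene pairs.
print(selection_regime(ka=0.30, ks=0.20))  # ratio above 1
print(selection_regime(ka=0.10, ks=0.50))  # ratio below 1
```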

2.9. Identification of GDPDLs- and SNC4-Interacting Proteins from Arabidopsis

Protein–protein interactions between the GDPDLs and GDPDL-RLK, also designated SUPPRESSOR OF NPR1, CONSTITUTIVE 4 (SNC4), and Arabidopsis proteins were identified in silico through the protein–protein interactome using Cytoscape software and several databases (the BioGRID database, the Arabidopsis interactome database, and the STRING database). This procedure identified a protein–protein interaction (PPI) network containing GDPDLs and directly interacting Arabidopsis proteins (Figure 3). GDPDL6 formed the largest hub (degree 38). Among the GDPDL6-interacting proteins, the glycogen synthase kinase 3/SHAGGY-like kinases (GSKs-AT1G57870) may represent a candidate protein for signaling (Figure 3A, Table 11). Although GSKs have been recently discovered in plants, evidence suggests that they are involved in different biological processes, such as brassinosteroid signaling, flower development, and injury responses [66]. The node-hub GDPDL5 contains the AtMLP328 pathogenesis-related protein and other proteins of unknown function (Figure 3A, Table 11). AtMLP328 is a member of the major latex protein-like (MLPL) gene family responsible for promoting vegetative growth and delaying flowering.
The cluster of GDPDL3-interacting proteins includes the BRASSINOSTEROID INSENSITIVE 1 (BRI1)-ASSOCIATED RECEPTOR KINASE 1 (BAK1), also designated SOMATIC EMBRYOGENESIS RECEPTOR KINASE 3 (SERK3). BAK1 has been shown to function as a co-receptor for many RLKs, including the recruitment of receptor-like proteins and SOBIR1 to form a heterodimeric complex upon recognition of ligands by RLPs, for example, RLP23-SOBIR1-BAK1, Cf-4-BAK1/SERK3-SOBIR1, RE02-BAK1-SOBIR1, and RXEG1-BAK1-SOBIR1 [46,49,51,67] (Figure 3A, Table 11).
The interactions of GDPDLs- and SNC4 converge to centralized hubs represented by BPA1, AT1G01080, and AT4G17720 (BPL1), which contain an RNA binding motif (Figure 3A, Table 11). The BPA1 protein has been shown to interact with Arabidopsis ACD11, which induces the expression of genes associated with disease resistance and genes involved in the ROS-mediated response defense upon recognizing fungal elicitors [68,69]. Furthermore, BPA1 and BPL1 are induced during geminivirus infection [70]. The GDPDLs-Arabidopsis PPI network is enriched for proteins involved in plant defense response to pathogens and vegetative growth, indicating that this new RLP family may be involved in immunity and developmental signaling.
To gain further insights into the cellular processes involving GDPDLs, we performed functional enrichment analyses of their direct interactors. In all three categories (biological process, molecular function, and cellular component ontology), we identified enriched GO terms with a p-value < 0.05. Under molecular function, we identified enriched terms for glycerophosphodiester phosphodiesterase activity, nucleotide binding, purine ribonucleotide binding, and hydrolase activity, which are unusual enzyme activities associated with membrane receptor activity (Table 10). Under the cellular component ontology, we observed an over-representation of proteins from the plasma membrane, membrane-bounded, and plant-type cell wall terms, which may suggest that the location and functional activities of these hubs are specific to transmembrane proteins (Figure 3B). Under the biological process ontology, the defense response, response to external stimulus, and developmental growth terms were significantly enriched, which suggests that this family of proteins may be related to immunity and plant development (Table S3).

2.10. The Expression Profile of the GDPDLs in Response to Pathogens and in Different Organs

To gain insights into the potential defense response of the GDPDL genes and to validate these candidate receptor-like proteins as expressed genes, we investigated their expression profiles in publicly available expression datasets using Genevestigator (NEBION AG, Zurich, Switzerland; www.genevestigator.com, academic free license, accessed on 28 February 2020) (Figure 4A). According to these microarray data, GDPDL1-RLK is induced by aphids, the bacterium Pseudomonas syringae, and the begomovirus cabbage leaf curl virus (CabLCV), but not by nematodes. Likewise, GDPDL2-RLP is induced by bacteria and aphids, and by begomoviruses to a lesser extent. GDPDL3-RLP and GDPDL4-RLP are upregulated by aphids and bacteria and downregulated by begomovirus. GDPDL5 and GDPDL6 are not induced by aphids and bacteria but are downregulated by CabLCV. As for organ-specific expression, except for GDPDL5-RLP and GDPDL6-RLP, which are only expressed in flowers and siliques, the remaining GDPDLs are expressed in all organs tested, although to different extents (Figure 4B). While GDPDL1 and GDPDL2 expression predominates in the developed rosette, GDPDL3 is highly expressed in germinated seeds, and GDPDL4 expression is fairly evenly distributed across all organs.
Pathogen-induced and organ-specific expression profiles of the predicted GDPDL-RLP genes were confirmed by qRT-PCR (Figure 5 and Figure 6). We also monitored the expression of the GDPDL-RLP genes in response to infections with tobacco rattle virus (TRV) and CabLCV. The antibacterial immune responses (PTI) were activated by treatment with flg22, and the expression of GDPDLs was monitored (Figure 5). Consistent with the microarray data, GDPDL5 and GDPDL6 expression was not affected by flg22 treatment but was downregulated by CabLCV, whereas GDPDL1 and GDPDL2 were induced by flg22 and CabLCV. All five GDPDLs analyzed by qRT-PCR were induced by TRV, a plant RNA virus. Remarkably, these GDPDL proteins are interconnected via interactions with RNA recognition motif-containing proteins, which form centralized hubs in the interaction network (Figure 3A, Table 11). This result may suggest an involvement of GDPDLs in the antiviral response induced by an RNA virus.
We also confirmed the expression profile of these GDPDL genes in different tissues by qRT-PCR. We used the root, pedicel, inflorescence axis, and flower tissues. GDPDL1 and GDPDL2 displayed similar expression patterns across tissues (Figure 6A,B), with the highest expression levels in the inflorescence axis and pedicel, suggesting distinct functions in development. GDPDL3 is most expressed in roots and barely detected in other tissues (Figure 6C). Interestingly, the expression levels of GDPDL4 are uniform in all tissues, suggesting that this protein may have a varied role during development (Figure 6D). In contrast, qRT-PCR confirmed that the GDPDL5 and GDPDL6 transcripts accumulated to elevated levels in flowers (Figure 6E,F). These gene expression analyses confirmed that GDPDL-RLPs are expressed in response to stimuli and during development, substantiating the argument that they may form a new class of RLPs involved in immunity and developmental signaling.

3. Discussion

Due to the functional relevance of the RLK family in several biological processes, this large family has been extensively studied in different plant species [6,9,71,72,73,74,75]. In contrast, far less is known about the plant RLP family, despite their conceptual relevance in signaling modules. RLPs can perceive external signals but depend on association with RLKs for signal transduction due to the lack of a cytoplasmic kinase domain at the C-terminus. The absence of a conserved kinase domain precludes using sequence comparison algorithms for genome-wide studies of the plant RLP family. Thus, identifying RLPs in plant genomes is challenging, and few RLPs have been described in plant species. Moreover, a large-scale RLP prediction tool has not been developed. Here, we developed the RLPredictiOme method based on machine learning approaches and Bayesian inference for the high-throughput prediction of RLPs.
Typically, the ML classification models applied in plant molecular biology require actual data to train ML-supervised algorithms [54,76,77,78]. RLPredictiOme can predict RLP subfamilies using the RLK ectodomain and six types of features simultaneously during the prediction process. The prediction model consists of three successive steps built with trained models and different algorithms capable of distinguishing RLPs from NRLPs, RLPs from RLKs, and finally, predicting an RLP subfamily. The combination of several ML models with different algorithms has been applied for protein and viral sequence classification [58,63]. Using different classifiers requires methods that compile the results of the classifiers into a single final prediction. Some methods have used different techniques for model combination, including a majority vote of the classifiers or an average probability for the classifications [63,79]. The approach applied in RLPredictiOme for combining models is based on the success and failure of predictions, which are modeled with Bayesian inference. Bayesian inference is applied after the classifications in each step. The validation results of RLPredictiOme showed high probabilities for classifying RLP proteins (see Table 7, columns RLP-NRLP Probability, RLP-RLK Probability, and RLP-Subfamily Probability). In contrast, NRLP proteins were predicted with a lower probability (Table 8). Finally, based on the probability of Bayesian inferences for each step, the last step is used as a decision-making process for the prediction of RLPs (Figure 1F). RLPredictiOme predicts RLP proteins with a probability ranging from 0.79 to 0.99 (see Table 7, Table 8 and Table 9, column Decision probability). Thus, the ML models can be successfully combined with Bayesian inference to perform robust high-throughput predictions of RLPs in plant genomes.
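For comparison, the two simpler combination schemes mentioned above (majority vote and average probability) can be sketched as follows. This is only an illustration of those generic schemes, not of the RLPredictiOme decision step; the function names, the 0.5 threshold, and the example votes are assumptions made for the sketch.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return the label predicted by most classifiers (ties: first label seen)."""
    return Counter(labels).most_common(1)[0][0]


def average_probability(probabilities, threshold=0.5):
    """Average positive-class probabilities and compare the mean to a threshold.

    The 0.5 threshold is an illustrative assumption.
    """
    avg = mean(probabilities)
    return avg, avg >= threshold


# Hypothetical outputs of five classifiers for one protein sequence.
votes = ["RLP", "RLP", "NRLP", "RLP", "NRLP"]
probs = [0.9, 0.8, 0.4, 0.7, 0.2]

print(majority_vote(votes))
print(average_probability(probs))
```

Both schemes reduce the classifier outputs to a single point estimate, whereas modeling successes and failures with a posterior distribution also conveys how certain the combined prediction is.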
RLPredictiOme could predict new RLP subfamilies with higher probability in all steps, although less represented groups were also classified into a corresponding subfamily, yet with lower probability. Furthermore, groups less represented by RLPs tended to be classified within other RLP subfamilies. This was the case for the probable lipid transfer-RLP subfamily, which shares similar functional characteristics with LysM-RLP. The lipid transfer proteins (LTPs), already described as non-specific lipid transfer proteins (nsLTPs), contain an eight-cysteine motif that is stabilized by four disulfide bonds (Wang et al., 2019). The probable lipid transfer family (PLT)-RLPs found by RLPredictiOme harbor a five-cysteine motif (CC-Xn-CXC-Xn-C) in the TP_2 functional domain, unlike the typical nsLTPs [80]. Phylogenetic relationships, structure, and genome-wide distribution of LTPs, involved in response to nematodes, have been described in cucumbers (Wang et al., 2019). Furthermore, PLTs have been shown to play a crucial role in regulating various plant biological processes and responding to biotic and abiotic stress [81,82]. Due to evidence of association with kinases, PLT-RLPs may be classified as a new subfamily of RLPs or may represent an expansion of the LysM-RLP subfamily, which exhibits similar functional roles.
In silico and in vitro analyses of GDPDL-RLPs confirmed the efficiency of RLPredictiOme in identifying a new family of RLPs based on the ectodomain of GDPDL-RLK sequences. The GDPDL-RLK is a reduced class of RLKs in plants. Among all the plant species analyzed, they have been found only in Arabidopsis halleri (Araha.28943s0001.1), Arabidopsis lyrata (475793), Arabidopsis thaliana (AT1G66980.1), Boechera stricta (Bostr.26959s0213.1, Bostr.26959s0216.1), Brassica rapa (Brara.K00110.1), and Capsella grandiflora (Cagra.0792s0001.1), all from the Brassicaceae family, and in Panicum virgatum (Pavir.6NG294600.1), from the Poaceae family. Despite only one GDPDL-RLK in the Arabidopsis genome [83], RLPredictiOme identified five sequences as GDPDL-RLP. Furthermore, the GDPDL-RLK subfamily has been maintained in only a few plant species; therefore, this family is likely undergoing a reduction in size and distribution. The GDPDL2-RLK (AT1G66980) has been previously characterized as SNC4, an atypical receptor-like kinase with a predicted extracellular GDPD domain involved in regulating plant immunity [84]. The glycerophosphodiester phosphodiesterase (GDPD) hydrolyzes glycerophosphodiesters into sn-glycerol-3-phosphate (G-3-P) and plays a significant role in various biological processes [84]. The GDPDL2-RLK ectodomain is structurally similar to the predicted GDPDL-RLPs (Figure 2B). Molecular evolution investigated by calculating ka/ks of GDPDL-RLP-GDPDL-RLK pairs revealed a significant rate of synonymous substitutions, indicating that although the kinase domain has been lost, the functional characteristics of the ectodomain remained conserved during evolution (Table 10).
A common feature of the RLK subfamilies is that they are often more extensive than the RLP subfamily counterparts, which suggests that some members of the RLK subfamilies have lost their conserved C-terminal kinase domain during evolution. In contrast, RLPredictiOme identified a new RLP subfamily, GDPDL-RLP, which seems to have expanded compared to the corresponding GDPDL-RLK subfamily. Therefore, we were interested in examining the expression profile of the GDPDL-RLP members to verify a basal level of expression during development or in response to pathogens. In silico analyses from publicly available expression databases indicated that the RLP members display differential expression profiles in response to pathogens and in different organs, indicating that they may be involved in development and immunity.
GDPDL1 (GDPDL-RLP) has been previously shown to be expressed in the rosettes of Arabidopsis plants [85]. We confirmed by qRT-PCR that GDPDL1 is expressed in the pedicels of the rosette and flowers. GDPDL1 has also been shown to be involved in processes that confer rigidity to the cell wall, related to defense against insects, nematodes, and oomycetes [85]. Accordingly, the previously published microarray data showed a high GDPDL1 induction in response to these pathogens and pests.
GDPDL1 and GDPDL2 displayed the highest expression in pedicels and flower stems and were highly expressed in response to pathogens and flg22. Among all members of this new GDPDL family, GDPDL3 was barely detected in the organs examined except in roots, consistent with its role in root morphogenesis [86]. GDPDL4 was uniformly expressed in all organs evaluated. GDPDL4 has been described as a highly expressed gene in rosettes and is involved in the development of root hair [85,87]. Therefore, the expression profile of already described GDPDLs is consistent with their assigned function.
Two undescribed family members, GDPDL6 and GDPDL5, displayed elevated levels of expression in flowers, suggesting that both genes may be involved in the development of reproductive organs and structures. These genes are also induced by biotic signals, as RT-qPCR demonstrated they were upregulated by TRV infection and microarray data showed their slight induction by nematodes. We found that all GDPDLs are induced by the RNA virus TRV and form interconnected protein-protein hubs with RNA-binding proteins. It would be relevant to investigate whether GDPDLs function in RNA virus infection. The expression pattern and evolution studies of members of the GDPDL-RLP subfamily further substantiate the notion that the members of this subfamily have maintained functional domains and may play relevant roles in development and plant defense.

4. Materials and Methods

4.1. Reclassification of the Plant RLK Ectodomains for Composing Datasets

The amino acid sequences of 80 plant species were retrieved from the Phytozome database (version 11.1 by DOE Joint Genome Institute, Lawrence Berkeley National Laboratory; https://phytozome.jgi.doe.gov/, accessed on 28 February 2020). We applied filters to remove unknown protein sequences without functional annotation. The sequences were re-annotated using the SMART (version 8.0, licensed by Creative Commons Licence, manufactured by Heidelberg, Germany; smart.embl-heidelberg.de) and Pfam (pfam.sanger.ac.uk) databases. Then, the amino acid sequences containing a predicted kinase domain were selected. The signal peptide was predicted using SignalP v.4.0 [50] and Phobius [88] software, whereas the transmembrane segment was identified using TMHMM [89] and Phobius software. Then, the sequences were filtered based on the presence of a signal peptide and a transmembrane segment. Furthermore, the redundant sequences were removed using the CD-HIT algorithm [90]. Subsequently, the amino acid sequences were grouped according to the functional domain of the extracellular ectodomain (LRR-RLK, WAK-RLK, and LysM-RLK, for example) [9,91].
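The filtering and grouping logic described above can be sketched in a few lines, assuming the predictions of the external tools have already been parsed into one record per sequence. The `Annotation` record, its field names, and the "at least one transmembrane segment" rule are hypothetical choices for illustration; they are not the output formats of SignalP, Phobius, TMHMM, SMART, or Pfam.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical per-sequence summary of already-parsed tool predictions."""
    seq_id: str
    has_signal_peptide: bool  # e.g. taken from signal peptide predictions
    n_tm_segments: int        # e.g. taken from transmembrane predictions
    has_kinase_domain: bool   # e.g. taken from domain annotation
    ectodomain: str           # e.g. "LRR", "WAK", "LysM"


def is_candidate_rlk(a):
    """Keep sequences with a kinase domain, a signal peptide, and a TM segment."""
    return a.has_kinase_domain and a.has_signal_peptide and a.n_tm_segments >= 1


def group_by_ectodomain(annotations):
    """Group the retained sequence identifiers by ectodomain type."""
    groups = {}
    for a in annotations:
        if is_candidate_rlk(a):
            groups.setdefault(f"{a.ectodomain}-RLK", []).append(a.seq_id)
    return groups


# Made-up records, for illustration only.
records = [
    Annotation("s1", True, 1, True, "LRR"),
    Annotation("s2", False, 1, True, "LRR"),   # no signal peptide: dropped
    Annotation("s3", True, 1, True, "LysM"),
    Annotation("s4", True, 0, True, "WAK"),    # no TM segment: dropped
]
print(group_by_ectodomain(records))
```

Redundancy removal (CD-HIT in the text) is a separate step and is not reproduced here.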

4.2. Dataset Composition

For the classification of RLPs, we used three steps: two steps of binary classification and one multilabel classification. In summary, the first stage compares RLPs with other families of NRLP; the second compares RLPs with receptor-like kinases (RLKs); and the third performs the classification of a protein sequence within an RLP subfamily using the functional ectodomain present in RLKs. In the first stage, the training dataset consisted of amino acid sequences containing the extracellular ectodomain, the region of the membrane segment, and the cytoplasmic region that precedes (upstream) the kinase domain of RLKs (but without the kinase domain) as a positive class (RLP). The negative class was composed of randomly selected full-length amino acid sequences (NRLP); the sequences of the positive class were removed from the negative dataset. The dataset was divided into three different datasets to increase the number of negative examples.
In the second stage, the positive class contained the training dataset (RLP), and the negative class used the full-length amino acid sequences of RLKs. In the third stage, the data from RLP positive classes were labeled according to the reclassification of RLKs based on their ectodomain. In this case, a putative LRR-RLP, for instance, contained an ectodomain of the leucine-rich repeat receptor-like kinase (LRR-RLK), a transmembrane segment, and a short cytoplasmic region excluding a kinase domain. Furthermore, the whole dataset was distributed into ten different sub-datasets to work around the computational time limitations of the training.
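The construction of an RLP-like positive example from an RLK, as described above, amounts to keeping everything upstream of the kinase domain. A minimal sketch, assuming the kinase domain start is available as a 1-based residue position from a domain annotation (the function name and the toy sequence are hypothetical):

```python
def make_rlp_like(sequence, kinase_start):
    """Return the part of an RLK sequence that precedes its kinase domain.

    kinase_start is the 1-based position of the first kinase-domain residue,
    so the returned fragment keeps the ectodomain, the transmembrane segment,
    and the cytoplasmic stretch upstream of the kinase domain.
    """
    if not 1 < kinase_start <= len(sequence):
        raise ValueError("kinase_start must lie inside the sequence")
    return sequence[:kinase_start - 1]


# Toy sequence: residues 1-8 stand for the retained region, 9-14 for the kinase domain.
print(make_rlp_like("MKTLLAAAKINASE", 9))
```

The full-length RLK sequence would then serve as the negative example in the second stage, while the truncated fragment is the positive one.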

4.3. Feature Extraction

Six feature types representing residue frequency composition were calculated for each amino acid sequence. These included (i) amino acid composition frequency of the full-length sequence, (ii) amino acid composition frequency (monopeptide) of the N-terminal and C-terminal regions, (iii) dipeptide frequency, (iv) tripeptide frequency, (v) frequency of chemical properties of amino acid side chains (CPAASC), and (vi) CPAASC2 frequency of the N-terminal and C-terminal regions. A numerical feature vector was created for each sequence of the positive and negative datasets. The CPAASC feature describes the frequency of the chemical properties of amino acid side chains, such as positively charged, negatively charged, polar uncharged, aromatic, nonpolar aliphatic, hydrophobicity, volume, and mass, relative to the total number of amino acids in the full-length peptide sequence [63]. In contrast, the CPAASC2 is calculated from the frequency of the chemical properties of amino acid side chains of the N-terminal and C-terminal regions. The full-length sequence is split into two equal (or nearly equal) regions, and the proportion of amino acid composition was also calculated for each of these regions. We consider the N-terminus the first region of the complete amino acid sequence and the C-terminus the second region of the full-length sequence.
The amino acid composition feature describes the frequency of an individual amino acid type within the total number of amino acids in the full-length peptide sequence (Saravanan and Gautham, 2015). The amino acid composition comprises 20 features (ACDEFGHIKLMNPQRSTVWY). The amino acid composition frequency of the N-terminal and C-terminal regions is calculated from the individual amino acid types within each region and comprises 40 features. The dipeptide frequency describes all combinations of amino acid pairs and comprises 400 features [92]. The tripeptide frequency describes all combinations of three amino acids, resulting in 8000 features [93].
The six types of features were used to train all classification models in the three proposed steps. In summary, to compare RLPs with NRLP proteins (first stage), three training datasets were created for each feature type, totaling 18 training sets. To compare RLPs with RLKs (second stage), one training dataset was created for each feature type. Finally, to classify RLPs within a subfamily (third stage), ten training datasets were created for each feature type, resulting in 60 training sets.
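The composition-based features above can be illustrated with a short, dependency-free sketch. It covers the 20 full-length composition features, the 40 N-/C-terminal features, and the 400 dipeptide or 8000 tripeptide features; the function names, the choice to give the extra residue of an odd-length sequence to the N-terminal half, and the normalization of k-mer counts by the number of overlapping windows are assumptions made for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Frequency of each of the 20 amino acids in seq (20 features)."""
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def split_halves(seq):
    """Split a sequence into two (nearly) equal N- and C-terminal regions."""
    mid = (len(seq) + 1) // 2  # assumption: odd lengths favor the N-terminal half
    return seq[:mid], seq[mid:]


def terminal_composition(seq):
    """Composition of the N- and C-terminal regions, concatenated (40 features)."""
    n_term, c_term = split_halves(seq)
    return aa_composition(n_term) + aa_composition(c_term)


def kmer_frequency(seq, k):
    """Frequency of every possible k-mer (k=2: 400 features, k=3: 8000 features)."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows with non-standard residues are not counted
            counts[window] += 1
    return [counts[m] / total for m in kmers]


example = "ACDEFGACAC"
print(len(aa_composition(example)), len(terminal_composition(example)))
print(len(kmer_frequency(example, 2)), len(kmer_frequency(example, 3)))
```

The CPAASC features would follow the same pattern after mapping each residue to its side-chain property class; that mapping is not reproduced here.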

4.4. Dealing with Imbalanced Datasets

The RLK superfamily in plants has been broadly characterized and is subdivided into different groups with different numbers of members in the subfamilies. The LRR-RLK is the largest subfamily, whereas other subfamilies have fewer members. Therefore, we used the SMOTE algorithm [94] to oversample the minority class, resulting in a balanced dataset. SMOTE creates synthetic samples based on the feature values of the minority class.
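The idea behind this kind of oversampling can be shown with a simplified, pure-Python sketch: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbors. This is an illustration of the interpolation principle only, not the reference SMOTE implementation used in the study; the function name, `k`, and the toy feature vectors are assumptions.

```python
import math
import random


def smote_like(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by interpolating between minority samples.

    minority: list of feature vectors (lists of floats), at least two samples.
    k: number of nearest neighbors considered for each base sample.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority-class neighbors of the base sample (Euclidean distance).
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class with three 2-D feature vectors.
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
for sample in smote_like(minority_class, n_new=4, k=2):
    print(sample)
```

Because every synthetic point lies between two real minority samples, the oversampled class stays within the region of feature space already occupied by that class.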

4.5. Machine Learning Algorithms

The RLPredictiOme method embeds several ML models built with the previously described training sets. This study tested 20 ML algorithms to select those best suited to the supervised learning task. Those algorithms are implemented in the Python library Scikit-learn v.0.22.1 [95]. The algorithms AdaBoost, probability calibration, Gradient Boosting, K-Nearest Neighbors, Linear Discriminant Analysis, Logistic Regression, and Deep Neural Network were selected to compose RLPredictiOme [96,97,98,99,100,101,102,103,104].

4.6. Performance Assessment of the Models

The evaluation metrics commonly used in bioinformatics were applied to choose the most efficient algorithms and training models. We evaluated accuracy, F-measure, false discovery rate (FDR), Matthews correlation coefficient (MCC), precision, sensitivity, and specificity for each training set and algorithm. These metrics are calculated from the confusion matrix (contingency matrix) using the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). For multi-class models, the PyCM Python library (a multi-class confusion matrix library) was used [105].
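For the binary case, the listed metrics follow directly from the four confusion-matrix counts using their standard definitions. The sketch below is a plain-Python illustration; the function names and the example counts are hypothetical and do not correspond to any result in the tables.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)   # also called recall
    specificity = _ratio(tn, tn + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "fdr": _ratio(fp, fp + tp),     # false discovery rate = 1 - precision
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Made-up counts for illustration.
for name, value in binary_metrics(tp=90, tn=85, fp=15, fn=10).items():
    print(f"{name}: {value:.3f}")
```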

4.7. Bayesian Inference in Ensemble Methods

Ensemble methods under an ML approach combine the predictions of several classification models to improve the overall performance. Thus, they attempt to avoid misclassification due to noise, bias, and variance in the data. In an ensemble method, several models are used to predict each data instance. In the binary classification contrasts involving the models RLPs versus NRLPs, and RLPs versus RLKs, we treated the results as n independent Bernoulli trials (0 or 1 values) with probability parameter π. Thus, the number of successes (x) derived from these trials follows a binomial distribution [106]. In this context, we assumed a Beta distribution as the prior distribution for π [107]. Under the Bayes theorem, the posterior distribution for π (probability of success of classification) is also a Beta distribution, because the Beta prior is conjugate to the binomial likelihood. The multilabel models to classify RLP subfamilies have different probabilities of success. Thus, the counts of classification successes across subfamilies follow a multivariate generalization of the binomial distribution, named the multinomial distribution. We assumed the multinomial distribution for the response vector x and its vector of success probabilities, where N is a vector of the total counts in each RLP subfamily. Thus, the data distribution assumes a multinomial model for all trials. The prior distribution widely used for multinomial models is the Dirichlet distribution, which has the parameters π and θ. The data vector (x) accounts for the total counts in each RLP subfamily.
We performed Bayesian inference using the PyMC3 Python library for Bayesian statistical modeling, which uses Markov chain Monte Carlo (MCMC) algorithms to explore the posterior distributions [108]. Based on previous analyses with MCMC chains, we opted to use a single chain with 10,000 iterations per amino acid sequence. We used a burn-in of 2000 iterations and four chains for all models. The Gibbs sampler algorithm was used to generate random samples from the posterior distribution for all analyses [109].
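Because the Beta prior is conjugate to the binomial likelihood and the Dirichlet prior is conjugate to the multinomial likelihood, the posterior parameters can also be written in closed form. The sketch below uses that closed form as a compact stand-in for the MCMC sampling described above; it is not the PyMC3 model of the study. The uniform priors (Beta(1, 1) and Dirichlet(1, ..., 1)), the function names, and the vote counts are assumptions chosen for illustration.

```python
def beta_posterior(successes, trials, alpha=1.0, beta=1.0):
    """Posterior Beta parameters and mean for a binomial success probability.

    Prior Beta(alpha, beta) with x successes in n trials gives the posterior
    Beta(alpha + x, beta + n - x).
    """
    a = alpha + successes
    b = beta + (trials - successes)
    return a, b, a / (a + b)


def dirichlet_posterior(counts, prior=None):
    """Posterior Dirichlet parameters and means for multinomial class counts.

    Prior Dirichlet(prior) with observed counts gives Dirichlet(prior + counts).
    """
    prior = prior or [1.0] * len(counts)
    posterior = [p + c for p, c in zip(prior, counts)]
    total = sum(posterior)
    return posterior, [value / total for value in posterior]


# Hypothetical example: 16 of 18 binary models vote "RLP" for a sequence.
print(beta_posterior(successes=16, trials=18))

# Hypothetical example: 10 multilabel models vote for three subfamilies as 8/1/1.
print(dirichlet_posterior([8, 1, 1]))
```

With these made-up votes, the posterior mean for the binary step is 17/20 = 0.85, and the most voted subfamily receives a posterior mean of 9/13; sampling-based inference additionally provides the full posterior distribution rather than only its mean.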

4.8. Classifier Evaluation Strategy

The classification models were evaluated using 10-fold cross-validation. Thus, the data were divided into ten subsets, with training on nine subsets and validation on the remaining one. This procedure was repeated ten times. The RLPredictiOme method was then tested with three independent datasets. One dataset was composed of 44 RLPs already described in the literature, and another contained 57 LRR-RLPs and legume-like (L-type) lectins, G-type lectins, calcium-dependent (C-type) lectins, and the lectin-like Lysin-motifs (LysM) described in Arabidopsis [53,110,111]. In addition, 100 random amino acid sequences were created by an in-house algorithm to demonstrate that the classifiers do not make random predictions.
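The 10-fold procedure described above can be sketched as an index generator: the samples are shuffled once, dealt into ten folds, and each fold serves once as the validation set while the other nine are used for training. The function name, the fixed seed, and the sample count are assumptions made for this illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


for round_number, (train, valid) in enumerate(k_fold_indices(25, k=10), start=1):
    print(round_number, len(train), len(valid))
```

Every sample is used for validation exactly once across the ten rounds, so the averaged metrics are not tied to a single train/validation split.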

4.9. RLP Subfamilies Downstream Analysis

The functional domain prediction analysis was carried out with the Pfam database (version 31, licensed by Creative Commons Zero (“CC0”), manufactured by European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI); Hinxton, Cambridge; http://pfam.xfam.org/) with a Hidden Markov Model (HMM) algorithm implemented in HMMER software. The signal peptide and transmembrane segment were predicted with SignalP v.4.0 and TMHMM software, respectively [50]. The topology diagram was generated with the Protter Web server [112]. The sequence alignment of the RLP superfamily was conducted using the Muscle algorithm (version V1.4.4 by EMBL-EBI, Hinxton, Cambridge, United Kingdom; www.ebi.ac.uk/Tools/msa/muscle/). The phylogenetic analysis was performed by the maximum likelihood statistical method with 10,000 bootstraps using FastTree software [113]. The tree was edited using the FigTree software (version V1.4.4 by Andrew Rambaut; http://tree.bio.ed.ac.uk/software/figtree/). The gene expression of the glycerophosphoryl diester phosphodiesterase RLP subfamily was investigated through the meta-analysis of transcriptomes using Genevestigator V3 [114] and ePlant [115] for the expression in tissues and responses to pathogens.

4.10. Protein-Protein Interaction (PPI) Network Analysis

GDPDLs and SNC4 from Arabidopsis were used as query terms to identify their respective interactions described in the BAR database (Centre for the Analysis of Genome Evolution and Function (CAGEF), University of Toronto, Toronto, Canada; http://bar.utoronto.ca/interactions/). The IntAct and BioGRID databases were selected for searching. The protein–protein interactions (PPI) were visualized in the Cytoscape software (version 3.8.1, licensed by LGPL, manufactured by National Resource for Network Biology (NRNB), USA; https://cytoscape.org/), which allowed us to spot the firework topology of the interaction network and measure the network centrality metrics for each protein. We used betweenness, closeness, eccentricity, and degree. Briefly, the betweenness centrality of a protein in the PPI network, represented by the graph G = (V, E), was calculated from the number of times the protein lies on the shortest paths among all nodes. The closeness centrality of a protein v is computed from the sum of the shortest path distances from v to all other proteins. The eccentricity centrality of a protein v is the maximum distance from v to all other proteins in graph G. The degree centrality of protein v is the total number of adjacent proteins.
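Three of the four centrality metrics can be illustrated on a toy undirected graph with a breadth-first search. This sketch is independent of Cytoscape; the node names and edges are made up and do not represent the GDPDL network. Closeness is computed here as the reciprocal of the average shortest-path distance, which is one common convention and an assumption of this example; betweenness is omitted to keep the sketch short.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def centralities(graph):
    """Degree, eccentricity, and closeness for every node of a connected graph."""
    result = {}
    for v in graph:
        distances = shortest_path_lengths(graph, v)
        others = [d for node, d in distances.items() if node != v]
        total = sum(others)
        result[v] = {
            "degree": len(graph[v]),                        # adjacent proteins
            "eccentricity": max(others) if others else 0,   # farthest protein
            "closeness": len(others) / total if total else 0.0,
        }
    return result


# Toy star-shaped network: one hub connected to three peripheral nodes.
toy_graph = {
    "hub": {"p1", "p2", "p3"},
    "p1": {"hub"},
    "p2": {"hub"},
    "p3": {"hub"},
}
for protein, metrics in centralities(toy_graph).items():
    print(protein, metrics)
```

In the star-shaped example the hub has the highest degree and closeness and the lowest eccentricity, which is the signature expected of a centralized hub.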

4.11. Plant Growth, Treatment with flg22, and Viral infection with TRV and CabLCV

All gene expression experiments used Arabidopsis thaliana ecotype Columbia (Col-0) at different ages. The seeds were germinated on sterile half-strength Murashige and Skoog (MS; Sigma-Aldrich) plates containing 10% (w/v) sucrose and 0.8% (w/v) agar, and the seedlings were grown under normal growth conditions at 21 °C under a 16 h light/8 h dark cycle. After 10 days, the seedlings were transferred to a tissue culture plate containing 2 mL of 100 nM flg22 (Sigma-Aldrich) and incubated for 15 min [116]. For the viral infection assay with tobacco rattle virus (TRV), Agrobacterium cultures containing TRV-RNA1 (pTRV1) and TRV-RNA2 (pTRV2) T-DNA constructs were infiltrated into the lower leaf of four-leaf stage N. benthamiana plants using a 1-mL needleless syringe. Infected leaves were confirmed by conventional RT-PCR using TRV-RNA2-specific primers. TRV was mechanically inoculated into A. thaliana plants grown in soil in a growth chamber for 14 days by rubbing the leaves with sap (0.05 M K2HPO4, pH 7.2, 0.01 M Na2SO3) from infected N. benthamiana leaves. Two weeks after inoculation, viral infection was confirmed by RT-PCR. For infection with cabbage leaf curl virus (CabLCV), plants at the seven-leaf stage were inoculated with plasmids containing partial tandem repeats of CabLCV DNA-A and DNA-B [117], using biolistic delivery as previously described [118,119]. Inoculated plants were transferred to a growth chamber, and infection was confirmed by conventional PCR using CabLCV DNA-B-specific primers.

4.12. RNA Extraction, Synthesis of cDNA, and qRT-PCR Analysis

For quantitative RT-PCR, total RNA was extracted from frozen leaves or seedlings with TRIzol (Invitrogen) according to the manufacturer’s instructions. To quantify flg22-induced expression, total RNA was extracted from a pool of 10 flg22-treated seedlings (as described in 4.11). For the TRV infection experiment, total RNA was extracted from a pool of 10 infected plants two weeks post-inoculation (as described in 4.11). For CabLCV infection, total RNA was extracted from a pool of 10 infected plants 21 days after inoculation. To quantify gene expression in different organs, total RNA was extracted from flowers, the inflorescence axis, and pedicels of 35-day-old soil-grown Col-0 plants, and from roots of 10-day-old plants grown in MS medium under the conditions described in 4.11. We used three samples of different pools of 10 plants each (therefore n = 3, biological replicates) and three technical replicates.
Total RNA was treated with 2 units of RNase-free DNase (Promega). First-strand cDNA was synthesized from 3.5 mg of total RNA using oligo-dT(18) and M-MLV Reverse Transcriptase (Invitrogen), according to the manufacturer’s instructions. Real-time RT-PCR reactions were performed on ABI7500 equipment (Applied Biosystems), using SYBR Green PCR Master Mix (Bio-Rad). The amplification reactions were performed as follows: 2 min at 50 °C, 10 min at 95 °C, and 40 cycles of 94 °C for 15 s and 60 °C for 1 min. To quantify gene expression, we used the 2^(−ΔCt) method and actin 3 (At3g53750) as the endogenous control gene for data normalization.
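The 2^(−ΔCt) calculation can be written out explicitly: ΔCt is the mean Ct of the target gene minus the mean Ct of the endogenous control, and the relative expression is 2 raised to the negative of that difference. The function name and the Ct values below are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Relative expression by the 2^(-dCt) method.

    ct_target and ct_reference are lists of technical-replicate Ct values for
    the gene of interest and the endogenous control, respectively.
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2 ** (-delta_ct)


# Made-up Ct values: three technical replicates each.
target_ct = [24.0, 24.2, 23.8]
actin_ct = [20.0, 20.1, 19.9]
print(relative_expression(target_ct, actin_ct))
```

With these made-up values, ΔCt is 4, so the target transcript is estimated at 2^(−4) = 1/16 of the control transcript level.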

5. Conclusions

An extensive family of RLKs and RLPs on the cell surface perceives external stimuli and allows communication of plant cells with the environment. Due to their conceptual relevance in cell signaling, RLKs have been extensively studied and characterized. In contrast, little is known about the RLP family, which does not harbor conserved domains that could serve as prototypes for genome-wide searches and characterization of members in different plant species. As a result of this investigation, a new method, based on artificial intelligence and machine learning models in combination with Bayesian inference, designated RLPredictiOme, is proposed to perform genome-wide surveys of RLPs in plant species.
We provided evidence indicating that RLPredictiOme reliably predicts RLP subfamilies in plant genomes. First, the ML models achieved high accuracy, precision, sensitivity, and specificity for predicting RLPs, with relatively high probabilities ranging from 0.79 to 0.99. Second, in the validation tests, more than 90% of known RLPs from Arabidopsis and rice were correctly predicted by RLPredictiOme. Finally, RLPredictiOme may have outperformed prediction methods based on sequence comparison because it discovered new RLP subfamilies in the Arabidopsis genome. Therefore, RLPredictiOme provides a reliable means to rationalize functional studies of the RLP gene family.
The new GDPDL-RLP subfamily seems to have expanded from the only GDPDL-RLK representative in the Arabidopsis genome. All five GDPDL-RLPs were expressed in different organs and responded to biotic signals. Evolution studies showed that their ectodomain may have undergone purifying selection, indicating that the members of this subfamily may have kept conserved functional signatures during evolution. In addition, an in silico analysis demonstrated that GDPDL-RLPs form biologically relevant hubs in the GDPDL-RLP-Arabidopsis protein-protein interactions network. Collectively, these biological studies confirmed the prediction of the new GDPDL-RLP subfamily.
In addition to using a set of conventional extractable features for training the classification models, RLPredictiOme also filters the conserved characteristics of the RLP configuration. These conserved attributes include the presence of a signal peptide, RLK ectodomains, a transmembrane segment, and the lack of a C-terminal kinase domain. Therefore, RLPredictiOme has the potential to predict RLPs from other organisms as well. Furthermore, the consistent and expanded results obtained with RLPredictiOme, which applies a different approach from sequence comparison methods, support this new method as an innovative and promising tool for predicting RLPs. RLPredictiOme will ultimately serve as an essential complement for protein annotation, identification, and functional prediction of novel RLPs in different plant species and organisms.

Supplementary Materials

The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms232012176/s1.

Author Contributions

J.C.F.S., conceptualization, writing—original draft preparation; M.A.F. conducted laboratory experiment; T.F.M.C., server configuration online and front-end developer; F.F.S., S.d.A.S., S.H.B., E.P.B.F., writing—review and editing, supervision; E.P.B.F., project administration, and funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq Grant no. 403819/2021-0 to E.P.B.F.) and Fundação de Amparo à Pesquisa do Estado de Minas Gerais, Brazil (Fapemig Grants no APQ-01282-17 and RED-00205-22 to E.P.B.F.).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available at http://209.145.56.49:8080/web/.

Acknowledgments

This work was partially supported by the Brazilian funding agencies: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), and the National Institute of Science and Technology in Plant-Pest Interactions (INCTIPP).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

ACC: accuracy
BAK1: BRI1-ASSOCIATED RECEPTOR KINASE1
BRI1: BRASSINOSTEROID INSENSITIVE 1
CabLCV: cabbage leaf curl virus
CAP: adenylate-cyclase-associated
CERK1: CHITIN ELICITOR RECEPTOR KINASE 1
CLV1: CLAVATA1
CPAASC2: chemical properties of amino acid side chains 2
DAMPs: damage-associated molecular patterns
ECD: extracellular domain
EPF1: EPIDERMAL PATTERNING FACTOR 1
EPF2: EPIDERMAL PATTERNING FACTOR 2
ER: endoplasmic reticulum
ERL1: ERECTA-LIKE 1
ETI: effector-triggered immunity
FDR: false discovery rate
GDPDL: glycerophosphoryl diester phosphodiesterase family
HMM: hidden Markov model
LRR: leucine-rich repeat
LRR-RLK: leucine-rich repeat receptor-like kinase
LYM1: LYSIN-MOTIF 1
LYM3: LYSIN-MOTIF 3
LysM: lysin-motifs
MCC: Matthews correlation coefficient
ML: machine learning
MLPL: major latex protein-like
MS: Murashige and Skoog
NEP1: NECROSIS- AND ETHYLENE-INDUCING PEPTIDE 1
NLPs: NEP1-LIKE PROTEINS
NRLPs: non-RLPs
nsLTP: non-specific lipid transfer proteins
PAMPs: pathogen-associated molecular patterns
PEPR1: PEP1 RECEPTOR 1
PEPR2: PEP1 RECEPTOR 2
PPI: protein-protein interaction
PRRs: pattern recognition receptors
PSK: PHYTOSULFOKINE
PSKR1: PHYTOSULFOKINE RECEPTOR 1
PTI: PAMP-triggered immunity
RLCK: receptor-like cytoplasmic kinases
RLP: receptor-like protein
SOBIR1: SUPPRESSOR OF BIR1-1
SP: signal peptide
TMM: RLP TOO MANY MOUTHS
TN: true negatives
TP: true positives
TRV: tobacco rattle virus

References

  1. Tang, D.; Wang, G.; Zhou, J.M. Receptor kinases in plant-pathogen interactions: More than pattern recognition. Plant Cell 2017, 29, 618–637. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. He, Y.; Zhou, J.; Shan, L.; Meng, X. Plant cell surface receptor-mediated signaling–a common theme amid diversity. J. Cell Sci. 2018, 131, jcs209353. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Shiu, S.H.; Karlowski, W.M.; Pan, R.; Tzeng, Y.H.; Mayer, K.F.; Li, W.H. Comparative analysis of the receptor-like kinase family in Arabidopsis and rice. Plant Cell 2004, 16, 1220–1234. [Google Scholar] [CrossRef] [Green Version]
  4. Ma, X.; Xu, G.; He, P.; Shan, L. SERKing coreceptors for receptors. Trends Plant Sci. 2016, 21, 1017–1033. [Google Scholar] [CrossRef]
  5. Botos, I.; Segal, D.M.; Davies, D.R. The structural biology of Toll-like receptors. Structure 2011, 19, 447–459. [Google Scholar] [CrossRef] [Green Version]
  6. Shiu, S.H.; Bleecker, A.B. Plant receptor-like kinase gene family: Diversity, function, and signaling. Sci. STKE 2001, 2001, re22. [Google Scholar] [CrossRef]
  7. Shiu, S.H.; Bleecker, A.B. Expansion of the receptor-like kinase/Pelle gene family and receptor-like proteins in Arabidopsis. Plant Physiol. 2003, 132, 530–543. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  8. Gao, L.L.; Xue, H.W. Global analysis of expression profiles of rice receptor-like kinase genes. Mol. Plant 2012, 5, 143–153. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  9. Sakamoto, T.; Deguchi, M.; Brustolini, O.J.; Santos, A.A.; Silva, F.F.; Fontes, E.P. The tomato RLK superfamily: Phylogeny and functional predictions about the role of the LRRII-RLK subfamily in antiviral defense. BMC Plant Biol. 2012, 12, 229. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  10. Zhou, F.; Guo, Y.; Qiu, L.J. Genome-wide identification and evolutionary analysis of leucine-rich repeat receptor-like protein kinase genes in soybean. BMC Plant Biol. 2016, 16, 58. [Google Scholar] [CrossRef]
  11. Li, J.; Chory, J. A putative leucine-rich repeat receptor kinase involved in brassinosteroid signal transduction. Cell 1997, 90, 929–938. [Google Scholar] [CrossRef] [Green Version]
  12. Lee, J.S.; Kuroha, T.; Hnilova, M.; Khatayevich, D.; Kanaoka, M.M.; McAbee, J.M.; Sarikaya, M.; Tamerler, C.; Torii, K.U. Direct interaction of ligand–receptor pairs specifying stomatal patterning. Genes Dev. 2012, 26, 126–136. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  13. Jia, G.; Liu, X.; Owen, H.A.; Zhao, D. Signaling of cell fate determination by the TPD1 small protein and EMS1 receptor kinase. Proc. Natl. Acad. Sci. USA 2008, 105, 2220–2225. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Cho, S.K.; Larue, C.T.; Chevalier, D.; Wang, H.; Jinn, T.L.; Zhang, S.; Walker, J.C. Regulation of floral organ abscission in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 2008, 105, 15629–15634. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Kumpf, R.P.; Shi, C.L.; Larrieu, A.; Stø, I.M.; Butenko, M.A.; Péret, B.; Riiser, E.S.; Bennett, M.J.; Aalen, R.B. Floral organ abscission peptide IDA and its HAE/HSL2 receptors control cell separation during lateral root emergence. Proc. Natl. Acad. Sci. USA 2013, 110, 5235–5240. [Google Scholar] [CrossRef] [Green Version]
  16. Chen, D.; Guo, H.; Chen, S.; Yue, Q.; Wang, P.; Chen, X. Receptor-like kinase HAESA-like 1 positively regulates seed longevity in Arabidopsis. Planta 2022, 256, 21. [Google Scholar] [CrossRef] [PubMed]
  17. Ogawa, M.; Shinohara, H.; Sakagami, Y.; Matsubayashi, Y. Arabidopsis CLV3 peptide directly binds CLV1 ectodomain. Science 2008, 319, 294. [Google Scholar] [CrossRef] [PubMed]
  18. Ou, Y.; Kui, H.; Li, J. Receptor-like kinases in root development: Current progress and future directions. Mol. Plant 2021, 14, 166–185. [Google Scholar] [CrossRef]
  19. Hirakawa, Y.; Shinohara, H.; Kondo, Y.; Inoue, A.; Nakanomyo, I.; Ogawa, M.; Sawa, S.; Ohashi-Ito, K.; Matsubayashi, Y.; Fukuda, H. Non-cell-autonomous control of vascular stem cell fate by a CLE peptide/receptor system. Proc. Natl. Acad. Sci. USA 2008, 105, 15208–15213. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  20. Wang, J.; Li, H.; Han, Z.; Zhang, H.; Wang, T.; Lin, G.; Chang, J.; Yang, W.; Chai, J. Allosteric receptor activation by the plant peptide hormone phytosulfokine. Nature 2015, 525, 265–268. [Google Scholar] [CrossRef]
  21. Haruta, M.; Sabat, G.; Stecker, K.; Minkoff, B.B.; Sussman, M.R. A peptide hormone and its receptor protein kinase regulate plant cell expansion. Science 2014, 343, 408–411. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  22. Zhong, S.; Li, L.; Wang, Z.; Ge, Z.; Li, Q.; Bleckmann, A.; Wang, J.; Song, Z.; Shi, Y.; Liu, T.; et al. RALF peptide signaling controls the polytubey block in Arabidopsis. Science 2022, 375, 290–296. [Google Scholar] [CrossRef] [PubMed]
  23. Macho, A.P.; Zipfel, C. Plant PRRs and the activation of innate immune signaling. Mol. Cell 2014, 54, 263–272. [Google Scholar] [CrossRef] [Green Version]
  24. Gómez-Gómez, L.; Boller, T. FLS2: An LRR receptor–like kinase involved in the perception of the bacterial elicitor flagellin in Arabidopsis. Mol. Cell 2000, 5, 1003–1011. [Google Scholar] [CrossRef]
  25. Zipfel, C.; Kunze, G.; Chinchilla, D.; Caniard, A.; Jones, J.D.; Boller, T.; Felix, G. Perception of the bacterial PAMP EF-Tu by the receptor EFR restricts Agrobacterium-mediated transformation. Cell 2006, 125, 749–760. [Google Scholar] [CrossRef]
  26. Yamaguchi, Y.; Pearce, G.; Ryan, C.A. The cell surface leucine-rich repeat receptor for At Pep1, an endogenous peptide elicitor in Arabidopsis, is functional in transgenic tobacco cells. Proc. Natl. Acad. Sci. USA 2006, 103, 10104–10109. [Google Scholar] [CrossRef] [Green Version]
  27. Yamaguchi, Y.; Huffaker, A.; Bryan, A.C.; Tax, F.E.; Ryan, C.A. PEPR2 is a second receptor for the Pep1 and Pep2 peptides andcontributes to defense responses in Arabidopsis. Plant Cell 2010, 22, 508–522. [Google Scholar] [CrossRef] [Green Version]
  28. Miya, A.; Albert, P.; Shinya, T.; Desaki, Y.; Ichimura, K.; Shirasu, K.; Narusaka, Y.; Kawakami, N.; Kaku, H.; Shibuya, N. CERK1, a LysM receptor kinase, is essential for chitin elicitor signaling in Arabidopsis. Proc. Natl. Acad. Sci. USA 2007, 104, 19613–19618. [Google Scholar] [CrossRef] [Green Version]
  29. Wan, J.; Zhang, X.C.; Neece, D.; Ramonell, K.M.; Clough, S.; Kim, S.y.; Stacey, M.G.; Stacey, G. A LysM receptor-like kinase plays a critical role in chitin signaling and fungal resistance in Arabidopsis. Plant Cell 2008, 20, 471–481. [Google Scholar] [CrossRef] [Green Version]
  30. Wan, J.; Tanaka, K.; Zhang, X.C.; Son, G.H.; Brechenmacher, L.; Nguyen, T.H.N.; Stacey, G. LYK4, a lysin motif receptor-like kinase, is important for chitin signaling and plant innate immunity in Arabidopsis. Plant Physiol. 2012, 160, 396–406. [Google Scholar] [CrossRef]
  31. Petutschnig, E.K.; Jones, A.M.; Serazetdinova, L.; Lipka, U.; Lipka, V. The lysin motif receptor- like kinase (LysM-RLK) CERK1 is a major chitin-binding protein in Arabidopsis thaliana and subject to chitin-induced phosphorylation. Plant Biotechnol. J. 2010, 285, 28902–28911. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  32. Cao, Y.; Liang, Y.; Tanaka, K.; Nguyen, C.T.; Jedrzejczak, R.P.; Joachimiak, A.; Stacey, G. The kinase LYK5 is a major chitin receptor in Arabidopsis and forms a chitin-induced complex with related kinase CERK1. eLife 2014, 3, e03766. [Google Scholar] [CrossRef] [PubMed]
  33. Ranf, S.; Gisch, N.; Schäffer, M.; Illig, T.; Westphal, L.; Knirel, Y.A.; Sánchez-Carballo, P.M.; Zähringer, U.; Hückelhoven, R.; Lee, J.; et al. A lectin S-domain receptor kinase mediates lipopolysaccharide sensing in Arabidopsis thaliana. Nat. Immun. 2015, 16, 426–433. [Google Scholar] [CrossRef]
  34. Yu, H.; Ruan, H.; Xia, X.; Chicowski, A.S.; Whitham, S.A.; Li, Z.; Wang, G.; Liu, W. Maize FERONIA-like receptor genes are involved in the response of multiple disease resistance in maize. Mol. Plant Pathol. 2022, 23, 1331–1345. [Google Scholar] [CrossRef] [PubMed]
  35. Ortiz-Morea, F.A.; Liu, J.; Shan, L.; He, P. Malectin-like receptor kinases as protector deities in plant immunity. Nat. Plants 2022, 8, 27–37. [Google Scholar] [CrossRef] [PubMed]
  36. Chen, X.; Ding, Y.; Yang, Y.; Song, C.; Wang, B.; Yang, S.; Guo, Y.; Gong, Z. Protein kinases in plant responses to drought, salt, andcold stress. J. Integr. Plant Biol. 2021, 63, 53–78. [Google Scholar] [CrossRef] [PubMed]
  37. Invernizzi, M.; Hanemian, M.; Keller, J.; Libourel, C.; Roby, D. PERKing up our understanding of the proline-rich extensin-like receptor kinases, a forgotten plant receptor kinase family. New Phytol. 2022, 235, 875–884. [Google Scholar] [CrossRef]
  38. Xie, Y.; Sun, P.; Li, Z.; Zhang, F.; You, C.; Zhang, Z. FERONIA receptor kinase integrates with hormone signaling to regulate plant growth, development, and responses to environmental stimuli. Int. J. Mol. Sci. 2022, 23, 3730. [Google Scholar] [CrossRef]
  39. Xie, Y.H.; Zhang, F.J.; Sun, P.; Li, Z.Y.; Zheng, P.F.; Gu, K.D.; Hao, Y.J.; Zhang, Z.; You, C.X. Apple receptor-like kinase FERONIA regulates salt tolerance and ABA sensitivity in Malus domestica. J. Plant Physiol. 2022, 270, 153616. [Google Scholar] [CrossRef]
  40. Yang, L.; Gao, C.; Jiang, L. Leucine-rich repeat receptor-like protein kinase AtORPK1 promotes oxidative stress resistance in and AtORPK1-AtKAPP mediated module in Arabidopsis. Plant Sci. J. 2022, 315, 111147. [Google Scholar] [CrossRef]
  41. Zhou, H.; Xiao, F.; Zheng, Y.; Liu, G.; Zhuang, Y.; Wang, Z.; Zhang, Y.; He, J.; Fu, C.; Lin, H. PAMP-INDUCED SECRETED PEPTIDE 3 modulates salt tolerance through RECEPTOR-LIKE KINASE 7 in plants. Plant Cell 2022, 34, 927–944. [Google Scholar] [CrossRef] [PubMed]
  42. Liu, Z.; Hou, S.; Rodrigues, O.; Wang, P.; Luo, D.; Munemasa, S.; Lei, J.; Liu, J.; Ortiz-Morea, F.A.; Wang, X.; et al. Phytocytokine signalling reopens stomata in plant immunity and water loss. Nature 2022, 605, 332–339. [Google Scholar] [CrossRef] [PubMed]
  43. Lin, G.; Zhang, L.; Han, Z.; Yang, X.; Liu, W.; Li, E.; Chang, J.; Qi, Y.; Shpak, E.D.; Chai, J. A receptor-like protein acts as a specificity switch for the regulation of stomatal development. Genes Dev. 2017, 31, 927–938. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  44. Jeong, S.; Trotochaud, A.E.; Clark, S.E. The Arabidopsis CLAVATA2 gene encodes a receptor-like protein required for the stability of the CLAVATA1 receptor-like kinase. Plant Cell 1999, 11, 1925–1933. [Google Scholar] [CrossRef] [Green Version]
  45. Willmann, R.; Lajunen, H.M.; Erbs, G.; Newman, M.A.; Kolb, D.; Tsuda, K.; Katagiri, F.; Fliegmann, J.; Bono, J.J.; Cullimore, J.V.; et al. Arabidopsis lysin-motif proteins LYM1 LYM3 CERK1 mediate bacterial peptidoglycan sensing and immunity to bacterial infection. Proc. Natl. Acad. Sci. USA 2011, 108, 19824–19829. [Google Scholar] [CrossRef] [Green Version]
  46. Albert, I.; Böhm, H.; Albert, M.; Feiler, C.E.; Imkampe, J.; Wallmeroth, N.; Brancato, C.; Raaymakers, T.M.; Oome, S.; Zhang, H.; et al. An RLP23–SOBIR1–BAK1 complex mediates NLP-triggered immunity. Nat. Plants 2015, 1, 15140. [Google Scholar] [CrossRef]
  47. Jones, D.A.; Thomas, C.M.; Hammond-Kosack, K.E.; Balint-Kurti, P.J.; Jones, J.D. Isolation of the tomato Cf-9 gene for resistance to Cladosporium fulvum by transposon tagging. Science 1994, 266, 789–793. [Google Scholar] [CrossRef]
  48. Thomas, C.M.; Jones, D.A.; Parniske, M.; Harrison, K.; Balint-Kurti, P.J.; Hatzixanthis, K.; Jones, J. Characterization of the tomato Cf-4 gene for resistance to Cladosporium fulvum identifies sequences that determine recognitional specificity in Cf-4 and Cf-9. Plant Cell 1997, 9, 2209–2224. [Google Scholar] [CrossRef] [Green Version]
  49. Postma, J.; Liebrand, T.W.; Bi, G.; Evrard, A.; Bye, R.R.; Mbengue, M.; Kuhn, H.; Joosten, M.H.; Robatzek, S. Avr4 promotes Cf-4 receptor-like protein association with the BAK1/SERK3 receptor-like kinase to initiate receptor endocytosis and plant immunity. New Phytol. 2016, 210, 627–642. [Google Scholar] [CrossRef] [Green Version]
  50. Nielsen, H. Predicting secretory proteins with SignalP. In Protein Function Prediction; Springer: New York, NY, USA, 2017; pp. 59–73. [Google Scholar] [CrossRef]
  51. Wang, Y.; Xu, Y.; Sun, Y.; Wang, H.; Qi, J.; Wan, B.; Ye, W.; Lin, Y.; Shao, Y.; Dong, S.; et al. Leucine-rich repeat receptor-like gene screen reveals that Nicotiana RXEG1 regulates glycoside hydrolase 12 MAMP detection. Nat. Commun. 2018, 9, 594. [Google Scholar] [CrossRef] [Green Version]
  52. Yu, H.; Xie, W.; Li, J.; Zhou, F.; Zhang, Q. A whole-genome SNP array (RICE 6 K) for genomic breeding in rice. Plant Biotechnol. J. 2014, 12, 28–37. [Google Scholar] [CrossRef]
  53. Jamieson, P.A.; Shan, L.; He, P. Plant cell surface molecular cypher: Receptor-like proteins and 957 their roles in immunity and development. Plant Sci. J. 2018, 274, 242–251. [Google Scholar] [CrossRef] [PubMed]
  54. Silva, J.C.F.; Teixeira, R.M.; Silva, F.F.; Brommonschenkel, S.H.; Fontes, E.P. Machine learning approaches and their current application in Plant Mol Biol: A systematic review. Plant Sci. J. 2019, 284, 37–47. [Google Scholar] [CrossRef] [PubMed]
  55. Gastaldo, P.; Pinna, L.; Seminara, L.; Valle, M.; Zunino, R. A tensor-based approach to touch modality classification by using machine learning. Rob. Auton. Syst. 2015, 63, 268–278. [Google Scholar] [CrossRef]
  56. Kang, J.; Schwartz, R.; Flickinger, J.; Beriwal, S. Machine learning approaches for predicting radiation therapy outcomes: A clinician’s perspective. Int. J. Radiat. Oncol. Biol. Phys. 2015, 93, 1127–1135. [Google Scholar] [CrossRef] [PubMed]
  57. Zhang, B.; He, X.; Ouyang, F.; Gu, D.; Dong, Y.; Zhang, L.; Mo, X.; Huang, W.; Tian, J.; Zhang, S. Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma. Cancer Lett. 2017, 403, 21–27. [Google Scholar] [CrossRef]
  58. Silva, J.C.F.; Carvalho, T.F.; Fontes, E.P.; Cerqueira, F.R. Fangorn Forest (F2): A machine learning approach to classify genes and genera in the family Geminiviridae. BMC Bioinform. 2017, 18, 431. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  59. Pineda, M.; Pérez-Bueno, M.L.; Barón, M. Detection of bacterial infection in melon plants by classification methods based on imaging data. Front. Plant Sci. 2018, 9, 164. [Google Scholar] [CrossRef] [Green Version]
  60. Moghimi, A.; Yang, C.; Miller, M.E.; Kianian, S.F.; Marchetto, P.M. A novel approach to assess salt stress tolerance in wheat using hyperspectral imaging. Front. Plant Sci. 2018, 9, 1182. [Google Scholar] [CrossRef] [PubMed]
  61. Gutiérrez, S.; Fernández-Novales, J.; Diago, M.P.; Tardaguila, J. On-the-go hyperspectral imaging under field conditions and machine learning for the classification of grapevine varieties. Front. Plant Sci. 2018, 9, 1102. [Google Scholar] [CrossRef]
  62. Ma, C.; Xin, M.; Feldmann, K.A.; Wang, X. Machine learning–based differential network analysis: A study of stress-responsive transcriptomes in Arabidopsis. Plant Cell 2014, 26, 520–537. [Google Scholar] [CrossRef] [Green Version]
  63. Carvalho, T.F.M.; Silva, J.C.F.; Calil, I.P.; Fontes, E.P.B.; Cerqueira, F.R. Rama: A machine learning approach for ribosomal protein prediction in plants. Sci. Rep. 2017, 7, 16273. [Google Scholar] [CrossRef] [Green Version]
  64. Fritz-Laylin, L.K.; Krishnamurthy, N.; Tör, M.; Sjölander, K.V.; Jones, J.D. Phylogenomic analysis of the receptor-like proteins of rice and Arabidopsis. Plant Physiol. 2005, 138, 611–623. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  65. Buendia, L.; Girardin, A.; Wang, T.; Cottret, L.; Lefebvre, B. LysM receptor-like kinase and LysM receptor-like protein families: An update on phylogeny and functional characterization. Front. Plant Sci. 2018, 9, 1531. [Google Scholar] [CrossRef] [Green Version]
  66. Jonak, C.; Hirt, H. Glycogen synthase kinase 3/SHAGGY-like kinases in plants: An emerging family with novel functions. Trends Plant Sci. 2002, 7, 457–461. [Google Scholar] [CrossRef]
  67. Nie, J.; Zhou, W.; Liu, J.; Tan, N.; Zhou, J.M.; Huang, L. A receptor-like protein from Nicotiana benthamiana mediates VmE02 PAMP-triggered immunity. New Phytol. 2021, 229, 2260–2272. [Google Scholar] [CrossRef]
  68. Petersen, N.H.; Joensen, J.; McKinney, L.V.; Brodersen, P.; Petersen, M.; Hofius, D.; Mundy, J. Identification of proteins interacting with Arabidopsis ACD11. J. Plant Physiol. 2009, 166, 661–666. [Google Scholar] [CrossRef]
  69. Li, Q.; Ai, G.; Shen, D.; Zou, F.; Wang, J.; Bai, T.; Chen, Y.; Li, S.; Zhang, M.; Jing, M.; et al. A Phytophthora capsici effector targets ACD11 binding partners that regulate ROS-mediated defense response in Arabidopsis. Mol. Plant 2019, 12, 565–581. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  70. Ascencio-Ibánez, J.T.; Sozzani, R.; Lee, T.J.; Chu, T.M.; Wolfinger, R.D.; Cella, R.; Hanley-Bowdoin, L. Global analysis of Arabidopsis gene expression uncovers a complex array of changes impacting pathogen response and cell cycle during geminivirus infection. Plant Physiol. 2008, 148, 436–454. [Google Scholar] [CrossRef]
  71. Liu, J.; Chen, N.; Grant, J.N.; Cheng, Z.M.; Stewart Jr, C.N.; Hewezi, T. Soybean kinome: Functional classification and gene expression patterns. J. Exp. Bot. 2015, 66, 1919–1934. [Google Scholar] [CrossRef] [PubMed]
  72. Yan, J.; Su, P.; Wei, Z.; Nevo, E.; Kong, L. Genome-wide identification, classification, evolutionary analysis and gene expression patterns of the protein kinase gene family in wheat and Aegilops tauschii. Plant Mol. Biol. 2017, 95, 227–242. [Google Scholar] [CrossRef]
  73. Zuo, C.; Liu, H.; Lv, Q.; Chen, Z.; Tian, Y.; Mao, J.; Chu, M.; Ma, Z.; An, Z.; Chen, B. Genome-wide analysis of the apple (Malus domestica) cysteine-rich receptor-like kinase (CRK) family: Annotation, genomic organization, and expression profiles in response to fungal infection. Plant Mol. Biol. Rep. 2020, 38, 14–24. [Google Scholar] [CrossRef]
  74. Yan, J.; Li, G.; Guo, X.; Li, Y.; Cao, X. Genome-wide classification, evolutionary analysis and gene expression patterns of the kinome in Gossypium. PLoS ONE 2018, 13, e0197392. [Google Scholar] [CrossRef] [Green Version]
  75. Dezhsetan, S. Genome scanning for identification and mapping of receptor-like kinase (RLK) gene superfamily in Solanum tuberosum. Physiol. Mol. Biol. Plants 2017, 23, 755–765. [Google Scholar] [CrossRef]
  76. Pal, T.; Jaiswal, V.; Chauhan, R.S. DRPPP: A machine learning based tool for prediction of disease resistance proteins in plants. Comput. Biol. Med. 2016, 78, 42–48. [Google Scholar] [CrossRef] [PubMed]
  77. Ni, Y.; Aghamirzaie, D.; Elmarakeby, H.; Collakova, E.; Li, S.; Grene, R.; Heath, L.S. A machine learning approach to predict gene regulatory networks in seed development in Arabidopsis. Front. Plant Sci. 2016, 7, 1936. [Google Scholar] [CrossRef] [Green Version]
  78. Kushwaha, S.K.; Chauhan, P.; Hedlund, K.; Ahrén, D. NBSPred: A support vector machine-based high-throughput pipeline for plant resistance protein NBSLRR prediction. Bioinformatics 2016, 32, 1223–1225. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  79. Dietterich, T.G. Ensemble methods in machine learning. In Proceedings of the International Workshop on Multiple Classifier Systems, Cagliari, Italy, 21–23 June 2000; pp. 1–15. [Google Scholar]
  80. Wang, X.; Li, Q.; Cheng, C.; Zhang, K.; Lou, Q.; Li, J.; Chen, J. Genome-wide analysis of a putative lipid transfer protein LTP_2 gene family reveals CsLTP_2 genes involved in response of cucumber against root-knot nematode (Meloidogyne incognita). Genome 2020, 63, 225–238. [Google Scholar] [CrossRef] [PubMed]
  81. Torres-Schumann, S.; Godoy, J.A.; Pintor-Toro, J.A. A probable lipid transfer protein gene is induced by NaCl in stems of tomato plants. Plant Mol. Biol. 1992, 18, 749–757. [Google Scholar] [CrossRef] [PubMed]
  82. Kapoor, R.; Kumar, G.; Arya, P.; Jaswal, R.; Jain, P.; Singh, K.; Sharma, T.R. Genome-wide analysis and expression profiling of rice hybrid proline-rich proteins in response to biotic and abiotic stresses, and hormone treatment. Plants 2019, 8, 343. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  83. Bi, D.; Cheng, Y.T.; Li, X.; Zhang, Y. Activation of plant immune responses by a gain-of-function mutation in an atypical receptor-like kinase. Plant Physiol. 2010, 153, 1771–1779. [Google Scholar] [CrossRef] [Green Version]
  84. Zhang, Z.; Liu, Y.; Ding, P.; Li, Y.; Kong, Q.; Zhang, Y. Splicing of receptor-like kinase-encoding SNC4 and CERK1 is regulated by two conserved splicing factors that are required for plant immunity. Mol. Plant 2014, 7, 1766–1775. [Google Scholar] [CrossRef] [Green Version]
  85. Duruflé, H.; Hervé, V.; Ranocha, P.; Balliau, T.; Zivy, M.; Chourré, J.; San Clemente, H.; Burlat, V.; Albenne, C.; Déjean, S.; et al. Cellwall modifications of two Arabidopsis thaliana ecotypes, Col, and Sha, in response to sub-optimal growth conditions: An integrative study. PlantSci.J. 2017, 263, 183–193. [Google Scholar]
  86. Hayashi, S.; Ishii, T.; Matsunaga, T.; Tominaga, R.; Kuromori, T.; Wada, T.; Shinozaki, K.; Hirayama, T. The glycerophosphoryl diester phosphodiesterase-like proteins SHV3 and its homologs play important roles in cell wall organization. Plant Cell Physiol. 2008, 49, 1522–1535. [Google Scholar] [CrossRef] [Green Version]
  87. Salazar-Henao, J.E.; Lin, W.D.; Schmidt, W. Discriminative gene co-expression network analysis uncovers novel modules involved in the formation of phosphate deficiency-induced root hairs in Arabidopsis. Sci. Rep. 2016, 6, 26820. [Google Scholar] [CrossRef] [PubMed]
  88. Käll, L.; Krogh, A.; Sonnhammer, E.L. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 2004, 338, 1027–1036. [Google Scholar] [CrossRef]
  89. Sonnhammer, E.L.; Von Heijne, G.; Krogh, A. A hidden Markov model for predicting transmembrane helices in protein sequences. In Proceedings of the ISMB, Montréal, QC, Canada, 28 June–1 July 1998; Volume 6, pp. 175–182. [Google Scholar]
  90. Fu, L.; Niu, B.; Zhu, Z.; Wu, S.; Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 2012, 28, 3150–3152. [Google Scholar] [CrossRef]
  91. Shiu, S.H.; Bleecker, A.B. Receptor-like kinases from Arabidopsis form a monophyletic gene family related to animal receptor kinases. Proc. Natl. Acad. Sci. USA 2001, 98, 10763–10768. [Google Scholar] [CrossRef] [Green Version]
  92. Saravanan, V.; Gautham, N. Harnessing computational biology for exact linear B-cell epitope prediction: A novel amino acid composition-based feature descriptor. OMICS 2015, 19, 648–658. [Google Scholar] [CrossRef] [PubMed]
  93. Bhasin, M.; Raghava, G.P. Classification of nuclear receptors based on amino acid composition and dipeptide composition. J. Biol. Chem. 2004, 279, 23262–23266. [Google Scholar] [CrossRef]
  94. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  95. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn Res. 2011, 12, 2825–2830. [Google Scholar]
  96. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of online learning and an application to boosting. In Proceedings of the European Conference on Computational Learning Theory, Barcelona, Spain, 13–15 March 1995. [Google Scholar]
  97. Platt, J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classif. 1999, 10, 61–74. [Google Scholar]
  98. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378. [Google Scholar] [CrossRef]
  99. Samworth, R.J. Optimal weighted nearest neighbour classifiers. Ann. Stat. 2012, 40, 2733–2763. [Google Scholar] [CrossRef]
  100. Hastie, T.; Tibshirani, R.; Friedman, J.H.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009; Volume 2. [Google Scholar]
  101. Kim, K.S.; Choi, H.H.; Moon, C.S.; Mun, C.W. Comparison of k-nearest neighbor, quadratic discriminant and linear discriminant analysis in classification of electromyogram signals based on the wrist-motion directions. Curr. Appl. Phys. 2011, 11, 740–745. [Google Scholar] [CrossRef]
  102. Schmidt, M.; LeRoux, N.; Bach, F. Minimizing finite sums with the stochastic average gradient. Math. Program. 2017, 162, 83–112. [Google Scholar] [CrossRef] [Green Version]
  103. King, G.; Zeng, L. Logistic regression in rare events data. Polit. Anal. 2001, 9, 137–163. [Google Scholar] [CrossRef] [Green Version]
  104. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.-r.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process Mag. 2012, 29, 82–97. [Google Scholar] [CrossRef]
  105. Haghighi, S.; Jasemi, M.; Hessabi, S.; Zolanvari, A. PyCM: Multiclass confusion matrix library in Python. J. Open Source Softw. 2018, 3, 729. [Google Scholar] [CrossRef] [Green Version]
  106. Feller, W. An Introduction to Probability Theory and Its Applications; John Wiley & Sons: Hoboken, NJ, USA, 2008; Volume 2. [Google Scholar]
  107. Gupta, A.K.; Nadarajah, S. Handbook of Beta Distribution and Its Applications; CRC Press: Boca Raton, FL, USA, 2004. [Google Scholar]
  108. Salvatier, J.; Wiecki, T.V.; Fonnesbeck, C. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2016, 2, e55. [Google Scholar] [CrossRef] [Green Version]
  109. Geman, S.; Geman, D. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. J. Appl. Stat. 1993, 20, 25–62. [Google Scholar] [CrossRef]
  110. Faulkner, C.; Petutschnig, E.; Benitez-Alfonso, Y.; Beck, M.; Robatzek, S.; Lipka, V.; Maule, A.J. LYM2-dependent chitin perception limits molecular flux via plasmodesmata. Proc. Natl. Acad. Sci. USA 2013, 110, 9166–9170. [Google Scholar] [CrossRef] [Green Version]
  111. Liu, B.; Li, J.F.; Ao, Y.; Qu, J.; Li, Z.; Su, J.; Zhang, Y.; Liu, J.; Feng, D.; Qi, K.; et al. Lysin motif–containing proteins LYP4 and LYP6 play dual roles in peptidoglycan and chitin perception in rice innate immunity. Plant Cell 2012, 24, 3406–3419. [Google Scholar] [CrossRef] [Green Version]
  112. Omasits, U.; Ahrens, C.H.; Müller, S.; Wollscheid, B. Protter: Interactive protein feature visualization and integration with experimental proteomic data. Bioinformatics 2014, 30, 884–886. [Google Scholar] [CrossRef] [Green Version]
  113. Price, M.N.; Dehal, P.S.; Arkin, A.P. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE 2010, 5, e9490. [Google Scholar] [CrossRef]
  114. Hruz, T.; Laule, O.; Szabo, G.; Wessendorp, F.; Bleuler, S.; Oertle, L.; Widmayer, P.; Gruissem, W.; Zimmermann, P. Genevestigator v3: A reference expression database for the meta-analysis of transcriptomes. Adv. Bioinform. 2008, 2008, 420747. [Google Scholar] [CrossRef] [Green Version]
  115. Waese, J.; Fan, J.; Pasha, A.; Yu, H.; Fucile, G.; Shi, R.; Cumming, M.; Kelley, L.A.; Sternberg, M.J.; Krishnakumar, V.; et al. ePlant: Visualizing and exploring multiple levels of data for hypothesis generation in plant biology. Plant Cell 2017, 29, 1806–1821. [Google Scholar] [CrossRef] [Green Version]
  116. Li, B.; Ferreira, M.A.; Huang, M.; Camargos, L.F.; Yu, X.; Teixeira, R.M.; Carpinetti, P.A.; Mendes, G.C.; Gouveia-Mageste, B.C.; Liu, C.; et al. The receptor-like kinase NIK1 targets FLS2/BAK1 immune complex and inversely modulates antiviral and antibacterial immunity. Nat. Commun. 2019, 10, 4996. [Google Scholar] [CrossRef] [Green Version]
  117. Fontes, E.P.; Santos, A.A.; Luz, D.F.; Waclawovsky, A.J.; Chory, J. The geminivirus nuclear shuttle protein is a virulence factor that suppresses transmembrane receptor kinase activity. Genes Dev. 2004, 18, 2545–2556. [Google Scholar] [CrossRef] [Green Version]
  118. Santos, A.A.; Carvalho, C.M.; Florentino, L.H.; Ramos, H.J.; Fontes, E.P. Conserved threonine residues within the A-loop of the receptor NIK differentially regulate the kinase function required for antiviral signaling. PLoS ONE 2009, 4, e5781. [Google Scholar] [CrossRef] [Green Version]
  119. Zorzatto, C.; Machado, J.P.B.; Lopes, K.V.; Nascimento, K.J.; Pereira, W.A.; Brustolini, O.J.; Reis, P.A.; Calil, I.P.; Deguchi, M.; Sachetto-Martins, G.; et al. NIK1-mediated translation suppression functions as a plant antiviral immunity mechanism. Nature 2015, 520, 679–682. [Google Scholar] [CrossRef]
Figure 1. Schematic representation of the RLPredictiOme method. Amino acid sequences are submitted to the method and pass through the sequential filters A to F. (A) Prediction of the signal peptide and transmembrane segment. (B) Attribute vector provided to the ML models. (C) The first classification step distinguishes RLPs from NRLPs (RLP/NRLP). The resulting binary vector is subjected to Bayesian inference using a Binomial distribution conjugated with a Beta distribution. (D) The second classification step distinguishes RLPs from RLKs (RLP/RLK). The resulting binary vector is subjected to Bayesian inference using a Binomial distribution conjugated with a Beta distribution. (E) The third step uses the ML models for subfamily classification to assign RLP families. The resulting numerical vector is subjected to Bayesian inference using the Multinomial and Dirichlet probability distributions. (F) Bayesian inference for decision-making and final prediction, using the binary vector resulting from the previous inferences.
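The conjugate updates named in panels (C) to (E) can be illustrated with a minimal sketch. It assumes that each ML model contributes one vote, that the priors are uniform (Beta(1, 1) and a symmetric Dirichlet with concentration 1), and that the posterior mean is the reported probability. None of these choices is stated in the caption, and the function names are hypothetical.

```python
from collections import Counter


def beta_binomial_posterior_mean(votes, alpha=1.0, beta=1.0):
    """Posterior mean of the positive-class probability.

    votes: iterable of 0/1 predictions, one per ML model (assumed).
    alpha, beta: Beta prior parameters (uniform prior assumed here).
    """
    votes = list(votes)
    successes = sum(votes)
    failures = len(votes) - successes
    # Conjugacy: a Beta(alpha, beta) prior with a Binomial likelihood
    # yields a Beta(alpha + successes, beta + failures) posterior.
    return (alpha + successes) / (alpha + beta + successes + failures)


def dirichlet_multinomial_posterior_means(labels, classes, prior=1.0):
    """Posterior mean probability of each subfamily label.

    labels: predicted subfamily labels, one per ML model (assumed).
    classes: all candidate subfamilies.
    prior: symmetric Dirichlet concentration (assumed value).
    """
    labels = list(labels)
    counts = Counter(labels)
    # Conjugacy: a Dirichlet prior with a Multinomial likelihood yields a
    # Dirichlet posterior whose parameters are prior + observed counts.
    total = len(labels) + prior * len(classes)
    return {c: (counts.get(c, 0) + prior) / total for c in classes}


if __name__ == "__main__":
    print(beta_binomial_posterior_mean([1, 1, 1, 0, 1, 1]))
    print(dirichlet_multinomial_posterior_means(
        ["LRR-RLP"] * 4 + ["LysM-RLP"],
        ["LRR-RLP", "LysM-RLP", "WAK-RLP"],
    ))
```

With a uniform prior, a unanimous vote never yields a probability of exactly 1. A ceiling of this kind may be one reason why tabulated probabilities stay below 1, although the prior actually used by the method is not given here.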
Figure 2. Analysis in silico of the GDPDL-RLPs. (A) Phylogenetic tree of the kinase catalytic domain of RLKs, IRE1A and IRE1B. (B) The topology of GDPDL-RLPs.
Figure 3. GDPDL-RLP-interacting Arabidopsis proteins. (A) GDPDL-RLP-interacting proteins were identified in the Arabidopsis interactome, and the network was assembled with the Cytoscape software. GDPDL-RLPs and SNC4 (GDPDL2) are indicated in green, and GDPDL-specific interacting proteins in light blue. RNA-binding proteins, which interact with all 6 GDPDLs, including GDPDL_RLK (SNC4), are shown in red. CSN5A, a central hub of plant-pathogen interactions, is shown in orange. (B) Gene enrichment of proteins from the GDPD-RLP-Arabidopsis protein-protein interaction (PPI) network under the molecular function term. (C) Gene enrichment of proteins from the GDPD-RLP-Arabidopsis PPI network under the cellular component term.
Figure 4. Analysis in silico of the expression of GDPDL-RLPs. (A) The expression profile of the GDPDL-RLPs in response to pathogens. (B) The expression profile of the GDPDL-RLPs in different organs and developmental stages.
Figure 5. Expression analysis of the GDPDL genes in response to biotic signals. For the flg22-induced expression of GDPDLs (as indicated in the figure), 12-day-old Arabidopsis seedlings were treated with 100 nM flg22, and total RNA was prepared from 100 µg of a pool of 10 flg22-treated plants. For TRV infection, Arabidopsis leaves were mechanically inoculated with TRV from infected N. benthamiana leaves, and TRV infection was diagnosed by PCR. For CabLCV infection, Arabidopsis plants were inoculated with infectious DNA-A and DNA-B clones, and viral accumulation was monitored by PCR. Total RNA was extracted from a pool of 10 infected plants 15 days after TRV inoculation and 21 days after CabLCV inoculation. Transcript accumulation of the indicated genes was monitored by quantitative RT-PCR with gene-specific primers. Gene expression was calculated by the 2^−ΔCT method using actin as an endogenous control. Error bars indicate the mean ± SD (n = 3, technical replicates). * p < 0.05.
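The 2^−ΔCT calculation mentioned in the caption can be sketched as follows. The Ct values are invented for illustration, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which SD estimator was used.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_reference):
    """Return 2^-dCt for one replicate, with dCt = Ct(target) - Ct(reference)."""
    return 2.0 ** -(ct_target - ct_reference)


def summarize(ct_pairs):
    """Mean and sample SD of 2^-dCt over replicate (target, reference) Ct pairs."""
    values = [relative_expression(target, reference) for target, reference in ct_pairs]
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical Ct values: (GDPDL gene, actin) for three technical replicates.
    replicates = [(24.0, 18.0), (25.0, 18.0), (23.0, 18.0)]
    avg, sd = summarize(replicates)
    print(f"relative expression = {avg:.4f} ± {sd:.4f}")
```

Because the reference gene Ct is subtracted before exponentiation, a target that crosses the threshold one cycle later than another is reported at half the relative expression.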
Figure 6. Organ-specific expression of the GDPDL genes. Total RNA was extracted from different Arabidopsis organs (as indicated in the figure) of 35-day-old plants. We used 3 samples of different pools of 10 plants each (therefore n = 3, biological replicates), and the transcript levels of the indicated genes (GDPDL1, GDPDL2, GDPDL3, GDPDL4, GDPDL5, and GDPDL6) were determined by qRT-PCR using gene-specific primers. Gene expression was calculated by the 2^−ΔCT method using actin as an endogenous control. Error bars indicate the mean ± SD (n = 3 biological replicates, each with 3 technical replicates) of three independent experiments.
Table 1. Number of RLKs harboring the indicated ectodomain type.

| Description | Total | Description | Total | Description | Total |
| --- | --- | --- | --- | --- | --- |
| LRR-RLK | 14,087 | CHASE-RLK | 8 | CUB-RLK | 2 |
| Unknown-RLK | 10,020 | Cysteine-rich-secretory-RLK | 7 | DUF1084-RLK | 2 |
| S-domain-RLK | 3859 | GDPDL-RLK | 7 | DUF726-RLK | 2 |
| Malectin-RLK | 3299 | Universal-stress-RLK | 6 | Endomembrane-RLK | 2 |
| Salt-stress-response/antifungal-RLK | 2345 | ACT-RLK | 5 | GAF-domain | 2 |
| L-Lectin-RLK | 2213 | Probable-lipid-transfer-RLK | 5 | GTPase-RLK | 2 |
| WAK-RLK | 1844 | Ankyrin-Kinase | 4 | Glycosyl hydrolases-RLK | 2 |
| B-lectin-RLK | 549 | Chromo-RLK | 4 | Glycosyltransferase-RLK | 2 |
| LysM-RLK | 381 | PAN-like-Kinase | 4 | HAD-RLK | 2 |
| WAK-EGF-RLK | 285 | PB1-RLK | 4 | HAD-hydrolase-like-RLK | 2 |
| EGF-like-RLK | 212 | Sel1-RLK | 4 | MSP-RLK | 2 |
| WAK-EFG-RLK | 177 | Alpha/beta-hydrolase-RLK | 3 | NB-ARC-RLK | 2 |
| RCC1-RLK | 148 | Cytochrome P450-RLK | 3 | PQQ-enzyme-RLK | 2 |
| B-Lectin-RLK | 145 | Helix-loop-helix-DNA-binding-RLK | 3 | Peptidase-RLK | 2 |
| PAN-RLK | 131 | Histidine-phosphatase-RLK | 3 | PfkB-RLK | 2 |
| C-Lectin-RLK | 90 | Major-Facilitator-RLK | 3 | Wnt-and-FGF-inhibitory-regulator-RLK | 2 |
| Glycosyl-hydrolases-RLK | 90 | MatE-RLK | 3 | Adenylate-cyclase-associated-(CAP)-N-terminal-RLK | 1 |
| Thaumatin-RLK | 86 | PPR | 3 | Alcohol-dehydrogenase-GroES-like-RLK | 1 |
| NAF-RLK | 79 | PPR-RLK | 3 | Aldose-1-epimerase-RLK | 1 |
| Ethylene-responsive-RLK | 74 | Phospholipase-RLK | 3 | Ankyrin-RLK | 1 |
| EF-hand-RLK | 50 | Proline-rich-RLK | 3 | Castor-and-Pollux-RLK | 1 |
| Cache-RLK | 32 | Sugar-(and other)-transporter-RLK | 3 | Cyclic nucleotide-binding-RLK | 1 |
| Chitinase-RLK | 15 | Transmembrane-RLK | 3 | Cyclic-nucleotide-binding-RLK | 1 |
| PAS-RLK | 12 | Alpha-amylase-catalytic-RLK | 2 | Cytochrome-P450-RLK | 1 |
| Plastocyanin-like-RLK | 12 | Barwin-RLK | 2 | DEAD/DEAH-box-helicase-RLK | 1 |
| Ring-finger-RLK | 9 | C2-RLK | 2 | DUF1221-RLK | 1 |
| Adenovirus E3-RLK | 8 | | | | |

Table 2. Subfamily size of receptor-like kinase proteins.

| No | Label | Count |
| --- | --- | --- |
| 1 | L-Lectin-RLK | 980 |
| 2 | LRR-RLK | 5404 |
| 3 | S-domain-RLK | 1626 |
| 4 | Malectin-RLK | 1313 |
| 5 | Salt-stress-response/antifungal-RLK | 1004 |
| 6 | WAK-RLK | 1362 |
| 7 | B-Lectin-RLK | 362 |
| 8 | Unknown-RLK | 3285 |
| 10 | PAN-RLK | 41 |
| 11 | Ethylene-responsive-RLK | 29 |
| 12 | Thaumatin-RLK | 52 |
| 13 | RCC1-RLK | 65 |
| 14 | Glycosyl-hydrolases-RLK | 40 |
| 15 | C-Lectin-RLK | 21 |
| 16 | Other-RLKs | 192 |

Table 3. Summarized results of the evaluation models built with the RLPs/NRLPs datasets.

| Data Set | Algorithm | ACC | F1 | FDR | MCC | Precision | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AAComposition_1 | Logistic RegressionCV | 0.9173 | 0.9211 | 0.0878 | 0.8343 | 0.9303 | 0.9303 | 0.9032 |
| AAComposition_2 | Logistic RegressionCV | 0.9205 | 0.9241 | 0.0839 | 0.8407 | 0.9322 | 0.9322 | 0.9078 |
| AAComposition_3 | Logistic RegressionCV | 0.9209 | 0.9245 | 0.0831 | 0.8416 | 0.9321 | 0.9321 | 0.9088 |
| AAComposition_N_C terminal_1 | MLP Classifier | 0.9457 | 0.9478 | 0.0534 | 0.8912 | 0.9490 | 0.9490 | 0.9421 |
| AAComposition_N_C terminal_2 | MLP Classifier | 0.9468 | 0.9487 | 0.0513 | 0.8934 | 0.9487 | 0.9487 | 0.9446 |
| AAComposition_N_C terminal_3 | MLP Classifier | 0.9482 | 0.9499 | 0.0457 | 0.8964 | 0.9456 | 0.9456 | 0.9511 |
| CPAASC_1 | Linear Discriminant Analysis | 0.9020 | 0.9102 | 0.1315 | 0.8074 | 0.9561 | 0.9561 | 0.8436 |
| CPAASC_2 | Linear Discriminant Analysis | 0.9042 | 0.9120 | 0.1282 | 0.8116 | 0.9562 | 0.9562 | 0.8481 |
| CPAASC_3 | Linear Discriminant Analysis | 0.9040 | 0.9119 | 0.1288 | 0.8113 | 0.9566 | 0.9566 | 0.8473 |
| CPAASC_N_C terminal_1 | Linear Discriminant Analysis | 0.9104 | 0.9172 | 0.1183 | 0.8232 | 0.9558 | 0.9558 | 0.8614 |
| CPAASC_N_C terminal_2 | Linear Discriminant Analysis | 0.9132 | 0.9196 | 0.1148 | 0.8284 | 0.9569 | 0.9569 | 0.8660 |
| CPAASC_N_C terminal_3 | Linear Discriminant Analysis | 0.9140 | 0.9204 | 0.1137 | 0.8301 | 0.9572 | 0.9572 | 0.8674 |
| Dipeptide_1 | MLP Classifier | 0.9439 | 0.9457 | 0.0497 | 0.8878 | 0.9412 | 0.9412 | 0.9468 |
| Dipeptide_2 | MLP Classifier | 0.9481 | 0.9500 | 0.0501 | 0.8960 | 0.9500 | 0.9500 | 0.9459 |
| Dipeptide_3 | MLP Classifier | 0.9447 | 0.9466 | 0.0497 | 0.8894 | 0.9428 | 0.9428 | 0.9468 |
| Tripeptide_1 | Logistic RegressionCV | 0.9535 | 0.9551 | 0.0410 | 0.9069 | 0.9511 | 0.9511 | 0.9561 |
| Tripeptide_2 | Logistic RegressionCV | 0.9550 | 0.9565 | 0.0389 | 0.9100 | 0.9519 | 0.9519 | 0.9584 |
| Tripeptide_3 | Logistic RegressionCV | 0.9534 | 0.9549 | 0.0404 | 0.9067 | 0.9502 | 0.9502 | 0.9568 |
| Mean | | 0.9303 | 0.9342 | 0.0784 | 0.8615 | 0.9480 | 0.9480 | 0.9112 |

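For readers unfamiliar with the column headers, the sketch below computes the same set of metrics from a binary confusion matrix using their textbook definitions. It is only an illustration: the counts are invented, the function name is hypothetical, and the table does not state how the published values were averaged across classes or folds, so the formulas here may not reproduce them exactly.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes are present
    and the classifier predicts each class at least once.
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    fdr = fp / (tp + fp)  # false discovery rate
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {
        "ACC": accuracy,
        "F1": f1,
        "FDR": fdr,
        "MCC": mcc,
        "Precision": precision,
        "Sensitivity": sensitivity,
        "Specificity": specificity,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    for name, value in binary_metrics(tp=90, fp=10, tn=85, fn=15).items():
        print(f"{name}: {value:.4f}")
```

MCC is worth reading alongside accuracy because it uses all four cells of the confusion matrix and stays informative when the two classes differ in size.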
Table 4. Summarized results of the evaluation models built with the RLPs/RLKs datasets.

| Data Set | Algorithm | ACC | F1 | FDR | MCC | Precision | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AAComposition_N_C terminal | Quadratic Discriminant Analysis | 0.9775 | 0.9773 | 0.0337 | 0.9552 | 0.9884 | 0.9884 | 0.9670 |
| Tripeptide | Gradient Boosting Classifier | 0.9762 | 0.9760 | 0.0367 | 0.9527 | 0.9890 | 0.9890 | 0.9639 |
| CPAASC_N_C_terminal | Linear Discriminant Analysis | 0.9707 | 0.9706 | 0.0479 | 0.9421 | 0.9899 | 0.9899 | 0.9523 |
| CPAASC | Linear Discriminant Analysis | 0.9647 | 0.9647 | 0.0572 | 0.9304 | 0.9877 | 0.9877 | 0.9426 |
| Dipeptide | MLP Classifier | 0.9627 | 0.9617 | 0.0344 | 0.9254 | 0.9579 | 0.9579 | 0.9673 |
| AAComposition | Quadratic Discriminant Analysis | 0.9571 | 0.9571 | 0.0627 | 0.9151 | 0.9777 | 0.9777 | 0.9374 |
| Mean | | 0.9681 | 0.9679 | 0.0454 | 0.9368 | 0.9818 | 0.9818 | 0.9551 |

Table 5. Summarized results of the evaluation models built with the RLP subfamily datasets.

| Data Set | Algorithm | ACC | F1 | MCC | Precision | Sensitivity |
| --- | --- | --- | --- | --- | --- | --- |
| AAComposition_10 | Linear Discriminant Analysis | 0.984 | 0.872 | 0.864 | 0.872 | 0.872 |
| AAComposition_1 | Calibrated ClassifierCV | 0.984 | 0.869 | 0.861 | 0.869 | 0.869 |
| AAComposition_2 | Calibrated ClassifierCV | 0.984 | 0.874 | 0.866 | 0.874 | 0.874 |
| AAComposition_3 | Linear Discriminant Analysis | 0.984 | 0.873 | 0.864 | 0.873 | 0.873 |
| AAComposition_4 | Linear Discriminant Analysis | 0.984 | 0.870 | 0.862 | 0.870 | 0.870 |
| AAComposition_5 | Linear Discriminant Analysis | 0.983 | 0.867 | 0.858 | 0.867 | 0.867 |
| AAComposition_6 | Linear Discriminant Analysis | 0.984 | 0.871 | 0.863 | 0.871 | 0.871 |
| AAComposition_7 | Calibrated ClassifierCV | 0.984 | 0.869 | 0.861 | 0.869 | 0.869 |
| AAComposition_8 | Calibrated ClassifierCV | 0.985 | 0.876 | 0.868 | 0.876 | 0.876 |
| AAComposition_9 | Linear Discriminant Analysis | 0.984 | 0.875 | 0.867 | 0.875 | 0.875 |
| Mean | | 0.984 | 0.872 | 0.863 | 0.872 | 0.872 |
| AAComposition_N_C_terminal_10 | Calibrated ClassifierCV | 0.989 | 0.911 | 0.905 | 0.911 | 0.911 |
| AAComposition_N_C_terminal_1 | Calibrated ClassifierCV | 0.988 | 0.904 | 0.897 | 0.904 | 0.904 |
| AAComposition_N_C_terminal_2 | Calibrated ClassifierCV | 0.989 | 0.908 | 0.902 | 0.908 | 0.908 |
| AAComposition_N_C_terminal_3 | Calibrated ClassifierCV | 0.988 | 0.902 | 0.896 | 0.902 | 0.902 |
| AAComposition_N_C_terminal_4 | KNeighbors Classifier | 0.989 | 0.911 | 0.905 | 0.911 | 0.911 |
| AAComposition_N_C_terminal_5 | KNeighbors Classifier | 0.989 | 0.909 | 0.903 | 0.909 | 0.909 |
| AAComposition_N_C_terminal_6 | KNeighbors Classifier | 0.988 | 0.903 | 0.896 | 0.903 | 0.903 |
| AAComposition_N_C_terminal_7 | KNeighbors Classifier | 0.988 | 0.900 | 0.894 | 0.900 | 0.900 |
| AAComposition_N_C_terminal_8 | Calibrated ClassifierCV | 0.988 | 0.903 | 0.897 | 0.903 | 0.903 |
| AAComposition_N_C_terminal_9 | Calibrated ClassifierCV | 0.988 | 0.907 | 0.900 | 0.907 | 0.907 |
| Mean | | 0.988 | 0.906 | 0.899 | 0.906 | 0.906 |
| CPAASC_10 | Linear Discriminant Analysis | 0.972 | 0.778 | 0.764 | 0.778 | 0.778 |
| CPAASC_1 | AdaBoost Classifier | 0.971 | 0.772 | 0.757 | 0.772 | 0.772 |
| CPAASC_2 | AdaBoost Classifier | 0.972 | 0.776 | 0.761 | 0.776 | 0.776 |
| CPAASC_3 | AdaBoost Classifier | 0.972 | 0.773 | 0.759 | 0.773 | 0.773 |
| CPAASC_4 | Linear Discriminant Analysis | 0.971 | 0.770 | 0.755 | 0.770 | 0.770 |
| CPAASC_5 | Linear Discriminant Analysis | 0.972 | 0.773 | 0.759 | 0.773 | 0.773 |
| CPAASC_6 | Linear Discriminant Analysis | 0.971 | 0.771 | 0.756 | 0.771 | 0.771 |
| CPAASC_7 | AdaBoost Classifier | 0.972 | 0.773 | 0.758 | 0.773 | 0.773 |
| CPAASC_8 | Linear Discriminant Analysis | 0.972 | 0.778 | 0.763 | 0.778 | 0.778 |
| CPAASC_9 | AdaBoost Classifier | 0.972 | 0.774 | 0.759 | 0.774 | 0.774 |
| Mean | | 0.972 | 0.774 | 0.759 | 0.774 | 0.774 |
| CPAASC_N_C_terminal_10 | AdaBoost Classifier | 0.975 | 0.800 | 0.787 | 0.800 | 0.800 |
| CPAASC_N_C_terminal_1 | Linear Discriminant Analysis | 0.976 | 0.810 | 0.797 | 0.810 | 0.810 |
| CPAASC_N_C_terminal_2 | AdaBoost Classifier | 0.975 | 0.803 | 0.790 | 0.803 | 0.803 |
| CPAASC_N_C_terminal_3 | Linear Discriminant Analysis | 0.976 | 0.804 | 0.792 | 0.804 | 0.804 |
| CPAASC_N_C_terminal_4 | Linear Discriminant Analysis | 0.976 | 0.805 | 0.793 | 0.805 | 0.805 |
| CPAASC_N_C_terminal_5 | AdaBoost Classifier | 0.975 | 0.802 | 0.789 | 0.802 | 0.802 |
| CPAASC_N_C_terminal_6 | Linear Discriminant Analysis | 0.976 | 0.808 | 0.795 | 0.808 | 0.808 |
| CPAASC_N_C_terminal_7 | Linear Discriminant Analysis | 0.976 | 0.808 | 0.795 | 0.808 | 0.808 |
| CPAASC_N_C_terminal_8 | AdaBoost Classifier | 0.975 | 0.802 | 0.789 | 0.802 | 0.802 |
| CPAASC_N_C_terminal_9 | Linear Discriminant Analysis | 0.976 | 0.805 | 0.792 | 0.805 | 0.805 |
| Mean | | 0.976 | 0.805 | 0.792 | 0.805 | 0.805 |
| Dipeptide_10 | KNeighbors Classifier | 0.992 | 0.935 | 0.931 | 0.935 | 0.935 |
| Dipeptide_1 | KNeighbors Classifier | 0.992 | 0.937 | 0.933 | 0.937 | 0.937 |
| Dipeptide_2 | KNeighbors Classifier | 0.992 | 0.935 | 0.931 | 0.935 | 0.935 |
| Dipeptide_3 | KNeighbors Classifier | 0.992 | 0.934 | 0.930 | 0.934 | 0.934 |
| Dipeptide_4 | KNeighbors Classifier | 0.991 | 0.932 | 0.927 | 0.932 | 0.932 |
| Dipeptide_5 | KNeighbors Classifier | 0.992 | 0.934 | 0.930 | 0.934 | 0.934 |
| Dipeptide_6 | KNeighbors Classifier | 0.991 | 0.931 | 0.926 | 0.931 | 0.931 |
| Dipeptide_7 | KNeighbors Classifier | 0.992 | 0.933 | 0.929 | 0.933 | 0.933 |
| Dipeptide_8 | KNeighbors Classifier | 0.991 | 0.925 | 0.920 | 0.925 | 0.925 |
| Dipeptide_9 | KNeighbors Classifier | 0.991 | 0.929 | 0.925 | 0.929 | 0.929 |
| Mean | | 0.992 | 0.932 | 0.928 | 0.932 | 0.932 |
| Tripeptide_1 | KNeighbors Classifier | 0.995 | 0.957 | 0.954 | 0.957 | 0.957 |
| Tripeptide_2 | KNeighbors Classifier | 0.994 | 0.955 | 0.952 | 0.955 | 0.955 |
| Tripeptide_3 | KNeighbors Classifier | 0.994 | 0.956 | 0.953 | 0.956 | 0.956 |
| Tripeptide_4 | KNeighbors Classifier | 0.995 | 0.958 | 0.955 | 0.958 | 0.958 |
| Tripeptide_5 | KNeighbors Classifier | 0.995 | 0.958 | 0.955 | 0.958 | 0.958 |
| Tripeptide_6 | KNeighbors Classifier | 0.994 | 0.954 | 0.951 | 0.954 | 0.954 |
| Tripeptide_7 | KNeighbors Classifier | 0.994 | 0.955 | 0.952 | 0.955 | 0.955 |
| Tripeptide_8 | KNeighbors Classifier | 0.994 | 0.951 | 0.948 | 0.951 | 0.951 |
| Tripeptide_9 | KNeighbors Classifier | 0.995 | 0.958 | 0.955 | 0.958 | 0.958 |
| Tripeptide_10 | KNeighbors Classifier | 0.995 | 0.959 | 0.957 | 0.959 | 0.959 |
| Mean | | 0.994 | 0.956 | 0.953 | 0.956 | 0.956 |

Table 6. Validation of the almost characterized RLPs.
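
| Accession | SP | TM | RLP-NRLP | RLP-NRLP Probability | RLP-RLK | RLP-RLK Probability | RLP-Subfamily | RLP-Subfamily Probability | Classification | Decision Probability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NP_001234733.2 | Y | Y | RLP | 0.9961 | RLP | 0.5751 | LRR-RLP | 0.7666 | (LRR-RLP) | 0.9894 |
| sQ9LNV9.2_RLP1 | Y | Y | RLP | 0.9961 | RLP | 0.7161 | LRR-RLP | 0.7671 | (LRR-RLP) | 0.9891 |
| sp—Q93ZH0.1—LYM1 | Y | Y | RLP | 0.8941 | RLP | 0.9915 | LysM-RLP | 0.467 | (LysM-RLP) | 0.989 |
| CAC40826.1_HcrVf2 | Y | Y | RLP | 0.9961 | RLP | 0.9895 | LRR-RLP | 0.8333 | (LRR-RLP) | 0.9888 |
| AAA65235.1_Cf-9 | Y | Y | RLP | 0.9965 | RLP | 0.9906 | LRR-RLP | 0.8331 | (LRR-RLP) | 0.9887 |
| AAC78594.1_Hcr2-2A | Y | Y | RLP | 0.9965 | RLP | 0.8569 | LRR-RLP | 0.849 | (LRR-RLP) | 0.9885 |
| Q9SSD1.1 | Y | Y | RLP | 0.9966 | RLP | 0.991 | LRR-RLP | 0.4667 | (LRR-RLP) | 0.9885 |
| AAC15779.1_Cf-2.1 | Y | Y | RLP | 0.9965 | RLP | 0.855 | LRR-RLP | 0.85 | (LRR-RLP) | 0.9882 |
| sp—Q7FZR1.1—RLP52 | Y | Y | RLP | 0.9966 | RLP | 0.9903 | LRR-RLP | 0.8336 | (LRR-RLP) | 0.9882 |
| QED40966.1 | Y | Y | RLP | 0.9962 | RLP | 0.7168 | LRR-RLP | 0.8506 | (LRR-RLP) | 0.9881 |
| CAC40827.1_HcrVf3 | Y | Y | RLP | 0.9964 | RLP | 0.9909 | LRR-RLP | 0.8501 | (LRR-RLP) | 0.988 |
| sp—Q9LJS0.1—RLP42 | Y | Y | RLP | 0.9966 | RLP | 0.9911 | LRR-RLP | 0.8502 | (LRR-RLP) | 0.988 |
| AAC78593.1_Hcr2-0B | Y | Y | RLP | 0.9962 | RLP | 0.991 | LRR-RLP | 0.8495 | (LRR-RLP) | 0.9879 |
| Q9FK66.1_RLP55 | Y | Y | RLP | 0.9958 | RLP | 0.9915 | LRR-RLP | 0.6669 | (LRR-RLP) | 0.9879 |
| sQ9SN38.1_RLP5 | Y | Y | RLP | 0.9963 | RLP | 0.9912 | LRR-RLP | 0.8497 | (LRR-RLP) | 0.9879 |
| AAC78596.1_Hcr2-5D | Y | Y | RLP | 0.9959 | RLP | 0.9909 | LRR-RLP | 0.85 | (LRR-RLP) | 0.9878 |
| BAE95828.1 (LysM) | Y | Y | RLP | 0.9964 | RLP | 0.99 | Undefined | 0.4169 | (Undefined) | 0.9878 |
| Q9LJS2.1 | Y | Y | RLP | 0.9964 | RLP | 0.9906 | LRR-RLP | 0.8505 | (LRR-RLP) | 0.9878 |
| AJG42080.1_RLM2 | Y | Y | RLP | 0.9963 | RLP | 0.9908 | LRR-RLP | 0.8493 | (LRR-RLP) | 0.9877 |
| CAA05269.1_Hcr9-4E | Y | Y | RLP | 0.9962 | RLP | 0.9893 | LRR-RLP | 0.8332 | (LRR-RLP) | 0.9877 |
| AJG42091.1_LEPR3 | Y | Y | RLP | 0.9967 | RLP | 0.9911 | LRR-RLP | 0.8508 | (LRR-RLP) | 0.9875 |
| Q9M2Y3.1_RLP44 | Y | Y | RLP | 0.9962 | RLP | 0.9902 | LRR-RLP | 0.7503 | (LRR-RLP) | 0.9875 |
| CAC40825.1_HcrVf1 | Y | Y | RLP | 0.9965 | RLP | 0.9921 | LRR-RLP | 0.8166 | (LRR-RLP) | 0.9874 |
| NP_001234474.2 | Y | Y | RLP | 0.9963 | RLP | 0.991 | LRR-RLP | 0.8332 | (LRR-RLP) | 0.9874 |
| Solyc08g016270.1.1 | Y | Y | RLP | 0.9961 | RLP | 0.72 | LRR-RLP | 0.6335 | (LRR-RLP) | 0.9874 |
| AAC78595.1_Hcr2-5B | Y | Y | RLP | 0.9963 | RLP | 0.8517 | LRR-RLP | 0.85 | (LRR-RLP) | 0.9873 |
| O80809.1_CLV2 | Y | Y | RLP | 0.9964 | RLP | 0.991 | LRR-RLP | 0.8496 | (LRR-RLP) | 0.9873 |
| sp—O23006.1—LYM2 | Y | Y | RLP | 0.9962 | RLP | 0.9908 | Undefined | 0.5005 | (Undefined) | 0.9873 |
| sp—O48849.1—RLP23 | Y | Y | RLP | 0.9959 | RLP | 0.9906 | LRR-RLP | 0.7833 | (LRR-RLP) | 0.9873 |
| AAC78592.1_Hcr2-0A | Y | Y | RLP | 0.9966 | RLP | 0.8518 | LRR-RLP | 0.8513 | (LRR-RLP) | 0.9872 |
| sp—Q6NPN4.1—LYM3 | Y | Y | RLP | 0.9452 | RLP | 0.99 | LysM-RLP | 0.4501 | (LysM-RLP) | 0.9872 |
| AAC78591.1 | Y | Y | RLP | 0.9966 | RLP | 0.9899 | LRR-RLP | 0.8507 | (LRR-RLP) | 0.9871 |
| AJV90937.1 | Y | Y | RLP | 0.9968 | RLP | 0.8507 | LRR-RLP | 0.8332 | (LRR-RLP) | 0.9871 |
| AUT14025.1 | Y | Y | RLP | 0.9962 | RLP | 0.8537 | LRR-RLP | 0.7329 | (LRR-RLP) | 0.987 |
| AAC15780.1_Cf-2.2 | Y | Y | RLP | 0.9961 | RLP | 0.8555 | LRR-RLP | 0.8491 | (LRR-RLP) | 0.9863 |
| AGI92782.1_RLP1.813 | Y | Y | RLP | 0.9963 | RLP | 0.9906 | LRR-RLP | 0.4005 | (LRR-RLP) | 0.9862 |
| NP_187187.1 | Y | Y | RLP | 0.9964 | RLP | 0.9913 | LRR-RLP | 0.6497 | (LRR-RLP) | 0.986 |
| AKR80573.1_I-7 | Y | Y | RLP | 0.9963 | RLP | 0.8605 | LRR-RLP | 0.65 | (LRR-RLP) | 0.9855 |
| NP_001362850.1_EIX2 | Y | Y | RLP | 0.9961 | RLP | 0.8581 | LRR-RLP | 0.6005 | (LRR-RLP) | 0.985 |
| sp—Q9SHI4.1—RLP3 | N | Y | RLP | 0.9965 | RLP | 0.9904 | LRR-RLP | 0.8328 | (LRR-RLP) | 0.8015 |
| NP_001355132.1 | N | Y | RLP | 0.9965 | RLP | 0.9903 | LRR-RLP | 0.5163 | (LRR-RLP) | 0.8012 |
| Q940E8.1_FEA2 | Y | N | RLP | 0.9487 | RLP | 0.8554 | LRR-RLP | 0.849 | NRLP | 0.2048 |
| sp—Q67UE8.1—LYP4 | Y | N | RLP | 0.7894 | RLP | 0.8564 | Undefined | 0.0 | NRLP | 0.2017 |
| AFB75328.1 | Y | N | RLP | 0.9472 | RLP | 0.857 | LRR-RLP | 0.5667 | NRLP | 0.2012 |
| AKP45167.1 | Y | N | RLP | 0.9462 | RLP | 0.8543 | Undefined | 0.4495 | NRLP | 0.201 |
| sp—Q69T51.1—LYP6 | Y | N | RLP | 0.8422 | RLP | 0.8544 | Undefined | 0.0 | NRLP | 0.2007 |
| LOC_Os04g56430.1 | Y | N | RLP | 0.9471 | RLP | 0.8518 | Salt-stress-response/antifungal-RLP | 0.4334 | NRLP | 0.1986 |
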
Table 7. Validation of the RLPs from the genome-wide study of Arabidopsis RLPs restricted to the LRR-RLP subfamily.
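
| Accession | SP | TM | RLP-NRLP | RLP-NRLP Probability | RLP-RLK | RLP-RLK Probability | RLP-Subfamily | RLP-Subfamily Probability | Classification | Decision Probability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AT1G65380.1 | Y | Y | RLP | 0.9962 | RLP | 0.9907 | LRR-RLP | 0.8505 | (LRR-RLP) | 0.9902 |
| AT1G17240.1 | Y | Y | RLP | 0.9962 | RLP | 0.9913 | LRR-RLP | 0.8497 | (LRR-RLP) | 0.9886 |
| AT4G13880.1 | Y | Y | RLP | 0.9963 | RLP | 0.9899 | LRR-RLP | 0.8001 | (LRR-RLP) | 0.9884 |
| AT5G27060.1 | Y | Y | RLP | 0.9962 | RLP | 0.991 | LRR-RLP | 0.6669 | (LRR-RLP) | 0.9884 |
| AT3G23110.1 | Y | Y | RLP | 0.9964 | RLP | 0.9912 | LRR-RLP | 0.6502 | (LRR-RLP) | 0.9883 |
| AT1G80080.1 | Y | Y | RLP | 0.9961 | RLP | 0.9911 | LRR-RLP | 0.5506 | (LRR-RLP) | 0.9883 |
| AT2G32680.1 | Y | Y | RLP | 0.9967 | RLP | 0.9918 | LRR-RLP | 0.7838 | (LRR-RLP) | 0.9882 |
| AT1G74180.1 | Y | Y | RLP | 0.9959 | RLP | 0.858 | LRR-RLP | 0.8163 | (LRR-RLP) | 0.988 |
| AT3G05370.1 | Y | Y | RLP | 0.9962 | RLP | 0.8556 | LRR-RLP | 0.6337 | (LRR-RLP) | 0.988 |
| AT3G11080.1 | Y | Y | RLP | 0.9962 | RLP | 0.991 | LRR-RLP | 0.8496 | (LRR-RLP) | 0.988 |
| AT3G28890.1 | Y | Y | RLP | 0.9966 | RLP | 0.8561 | LRR-RLP | 0.6336 | (LRR-RLP) | 0.988 |
| AT2G25440.1 | Y | Y | RLP | 0.9962 | RLP | 0.9902 | LRR-RLP | 0.4832 | (LRR-RLP) | 0.9878 |
| AT5G45770.1 | Y | Y | RLP | 0.9965 | RLP | 0.99 | LRR-RLP | 0.683 | (LRR-RLP) | 0.9878 |
| AT2G42800.1 | Y | Y | RLP | 0.9963 | RLP | 0.9908 | LRR-RLP | 0.6665 | (LRR-RLP) | 0.9876 |
| AT3G05360.1 | Y | Y | RLP | 0.9967 | RLP | 0.9913 | LRR-RLP | 0.6668 | (LRR-RLP) | 0.9876 |
| AT5G65830.1 | Y | Y | RLP | 0.9966 | RLP | 0.8566 | LRR-RLP | 0.667 | (LRR-RLP) | 0.9876 |
| AT1G28340.1 | Y | Y | RLP | 0.8425 | RLP | 0.9905 | Malectin-RLP | 0.4502 | (Malectin-RLP) | 0.9875 |
| AT1G74190.1 | Y | Y | RLP | 0.9959 | RLP | 0.8564 | LRR-RLP | 0.8499 | (LRR-RLP) | 0.9871 |
| AT2G15080.1 | Y | Y | RLP | 0.9965 | RLP | 0.9904 | LRR-RLP | 0.8502 | (LRR-RLP) | 0.987 |
| AT3G05650.1 | Y | Y | RLP | 0.9964 | RLP | 0.9906 | LRR-RLP | 0.6664 | (LRR-RLP) | 0.9868 |
| AT1G45616.1 | Y | Y | RLP | 0.9961 | RLP | 0.9913 | LRR-RLP | 0.7665 | (LRR-RLP) | 0.9868 |
| AT3G05660.1 | Y | Y | RLP | 0.9966 | RLP | 0.8557 | LRR-RLP | 0.85 | (LRR-RLP) | 0.9866 |
| AT1G58190.1 | Y | Y | RLP | 0.9962 | RLP | 0.8521 | LRR-RLP | 0.6663 | (LRR-RLP) | 0.9866 |
| AT3G49750.1 | Y | Y | RLP | 0.9963 | RLP | 0.9909 | LRR-RLP | 0.7502 | (LRR-RLP) | 0.9865 |
| AT4G13920.1 | Y | Y | RLP | 0.9967 | RLP | 0.9911 | LRR-RLP | 0.8498 | (LRR-RLP) | 0.9865 |
| AT5G25910.1 | Y | Y | RLP | 0.9964 | RLP | 0.9899 | LRR-RLP | 0.8501 | (LRR-RLP) | 0.9864 |
| AT2G33060.1 | Y | Y | RLP | 0.9966 | RLP | 0.9914 | LRR-RLP | 0.8332 | (LRR-RLP) | 0.9863 |
| AT4G04220.1 | Y | Y | RLP | 0.9962 | RLP | 0.9911 | LRR-RLP | 0.8506 | (LRR-RLP) | 0.9863 |
| AT2G33050.1 | Y | Y | RLP | 0.9964 | RLP | 0.9915 | LRR-RLP | 0.7498 | (LRR-RLP) | 0.986 |
| AT1G71400.1 | Y | Y | RLP | 0.996 | RLP | 0.8563 | LRR-RLP | 0.6831 | (LRR-RLP) | 0.9851 |
| AT4G18760.1 | Y | Y | RLP | 0.9967 | RLP | 0.9903 | LRR-RLP | 0.8495 | (LRR-RLP) | 0.9885 |
| AT1G71390.1 | N | Y | RLP | 0.9966 | RLP | 0.99 | LRR-RLP | 0.6667 | (LRR-RLP) | 0.8021 |
| AT2G25470.1 | N | Y | RLP | 0.9964 | RLP | 0.8556 | LRR-RLP | 0.8502 | (LRR-RLP) | 0.8014 |
| AT1G47890.1 | N | Y | RLP | 0.9967 | RLP | 0.9908 | LRR-RLP | 0.8501 | (LRR-RLP) | 0.8001 |
| AT4G13810.1 | N | Y | RLP | 0.9964 | RLP | 0.9907 | LRR-RLP | 0.833 | (LRR-RLP) | 0.7997 |
| AT3G23010.1 | N | Y | RLP | 0.9965 | RLP | 0.9908 | LRR-RLP | 0.667 | (LRR-RLP) | 0.7995 |
| AT1G74170.1 | N | Y | RLP | 0.9964 | RLP | 0.8561 | LRR-RLP | 0.7164 | (LRR-RLP) | 0.7994 |
| AT3G24982.1 | N | Y | RLP | 0.9963 | RLP | 0.989 | LRR-RLP | 0.8512 | (LRR-RLP) | 0.7993 |
| AT1G17250.1 | N | Y | RLP | 0.9965 | RLP | 0.9911 | LRR-RLP | 0.8496 | (LRR-RLP) | 0.799 |
| AT3G23120.1 | N | Y | RLP | 0.997 | RLP | 0.9905 | LRR-RLP | 0.6835 | (LRR-RLP) | 0.7976 |
| AT3G53240.1 | N | Y | RLP | 0.9961 | RLP | 0.9905 | LRR-RLP | 0.783 | (LRR-RLP) | 0.7973 |
| AT1G07390.1 | N | Y | RLP | 0.9957 | RLP | 0.7119 | LRR-RLP | 0.7826 | (LRR-RLP) | 0.7969 |
| AT3G11010.1 | N | Y | RLP | 0.9961 | RLP | 0.9902 | LRR-RLP | 0.6665 | (LRR-RLP) | 0.7958 |
| AT1G34290.1 | Y | Y | RLP | 0.9964 | RLP | 0.9898 | Undefined | 0.2166 | (Undefined) | 0.7949 |
| AT5G49290.1 | N | Y | RLP | 0.9966 | RLP | 0.9901 | LRR-RLP | 0.6833 | (LRR-RLP) | 0.7941 |
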
AT2G32660 N
AT2G33020 N
AT2G33030 N
AT2G33080 N
AT3G24900 N
AT3G25010 N
AT4G13900 N
AT5G40170 N
AT3G25020 N
Table 8. Random sequences tested against RLPredictiOme.
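
| Accession | SP | TM | RLP-NRLP | RLP-NRLP Probability | RLP-RLK | RLP-RLK Probability | RLP-Subfamily | RLP-Subfamily Probability | Classification | Decision Probability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Alien_71_464 | Y | Y | NRLP | 0.0532 | RLP | 0.7145 | Other-RLP | 0.4166 | NRLP | 0.4033 |
| Alien_78_801 | Y | Y | NRLP | 0.0532 | RLP | 0.857 | WAK-RLP | 0.3169 | NRLP | 0.4014 |
| Alien_88_471 | N | Y | NRLP | 0.369 | RLP | 0.855 | Unknown | 0.2837 | NRLP | 0.2068 |
| Alien_90_956 | N | Y | NRLP | 0.0527 | RLK-like | 0.5721 | Other-RLP | 0.3499 | NRLP | 0.2064 |
| Alien_94_666 | N | Y | NRLP | 0.0535 | RLP | 0.8558 | S-domain-RLP | 0.3164 | NRLP | 0.2045 |
| Alien_11_789 | N | Y | NRLP | 0.0524 | RLK-like | 0.4288 | Other-RLP | 0.4331 | NRLP | 0.2034 |
| Alien_34_248 | N | Y | NRLP | 0.2093 | RLP | 0.8571 | Other-RLP | 0.4004 | NRLP | 0.2022 |
| Alien_70_660 | N | Y | NRLP | 0.3677 | RLP | 0.8564 | Unknown | 0.2491 | NRLP | 0.2002 |
| Alien_59_959 | N | Y | NRLP | 0.052 | RLK-like | 0.576 | S-domain-RLP | 0.417 | NRLP | 0.1994 |
| Alien_20_195 | Y | N | NRLP | 0.3704 | RLP | 0.8544 | Unknown | 0.2671 | NRLP | 0.1987 |
| Alien_23_503 | N | Y | NRLP | 0.3698 | RLP | 0.8596 | Unknown | 0.3 | NRLP | 0.1987 |
| Alien_69_854 | N | Y | NRLP | 0.0542 | RLP | 0.7198 | Other-RLP | 0.4327 | NRLP | 0.1985 |
| Alien_2_750 | N | Y | NRLP | 0.0526 | RLK-like | 0.5768 | Other-RLP | 0.3331 | NRLP | 0.1956 |
| Alien_66_528 | N | N | NRLP | 0.0001 | RLP | 0.8549 | S-domain-RLP | 0.3829 | NRLP | 0.0195 |
| Alien_1_268 | N | N | NRLP | 0.0002 | RLP | 0.8536 | Other-RLP | 0.3831 | NRLP | 0.0093 |
| Alien_51_917 | N | N | NRLP | 0.0002 | RLK-like | 0.573 | Unknown | 0.283 | NRLP | 0.0044 |
| Alien_79_429 | N | N | NRLP | 0.3166 | RLP | 0.8588 | Other-RLP | 0.3001 | NRLP | 0.0041 |
| Alien_61_779 | N | N | NRLP | 0.0002 | RLP | 0.7131 | S-domain-RLP | 0.3834 | NRLP | 0.0036 |
| Alien_67_112 | N | N | NRLP | 0.1591 | RLP | 0.7131 | Other-RLP | 0.3342 | NRLP | 0.0035 |
| Alien_42_363 | N | N | NRLP | 0.316 | RLP | 0.8576 | S-domain-RLP | 0.3336 | NRLP | 0.003 |
| Alien_4_417 | N | N | NRLP | 0.0002 | RLK-like | 0.5712 | WAK-RLP | 0.4337 | NRLP | 0.0029 |
| Alien_24_102 | N | N | NRLP | 0.4222 | RLP | 0.861 | WAK-RLP | 0.3498 | NRLP | 0.0027 |
| Alien_9_882 | N | N | NRLP | 0.0002 | RLP | 0.7132 | S-domain-RLP | 0.3664 | NRLP | 0.0019 |
| Alien_7_199 | N | N | NRLP | 0.3166 | RLP | 0.8564 | WAK-RLP | 0.3504 | NRLP | 0.0018 |
| Alien_29_460 | N | N | NRLP | 0.2089 | RLP | 0.8554 | Unknown | 0.284 | NRLP | 0.0017 |
| Alien_50_474 | N | N | NRLP | 0.0009 | RLP | 0.8548 | Unknown | 0.2495 | NRLP | 0.0017 |
| Alien_72_442 | N | N | NRLP | 0.0002 | RLP | 0.8498 | Unknown | 0.2333 | NRLP | 0.0017 |
| Alien_97_120 | N | N | NRLP | 0.3685 | RLP | 0.8566 | Unknown | 0.2999 | NRLP | 0.0017 |
| Alien_38_893 | N | N | NRLP | 0.0003 | RLK-like | 0.5771 | S-domain-RLP | 0.4499 | NRLP | 0.0016 |
| Alien_73_528 | N | N | NRLP | 0.0002 | RLP | 0.857 | S-domain-RLP | 0.3665 | NRLP | 0.0016 |
| Alien_83_641 | N | N | NRLP | 0.0003 | RLP | 0.7085 | Other-RLP | 0.3502 | NRLP | 0.0016 |
| Alien_44_248 | N | N | NRLP | 0.0003 | RLP | 0.7133 | S-domain-RLP | 0.3833 | NRLP | 0.0015 |
| Alien_62_945 | N | N | NRLP | 0.0002 | RLK-like | 0.5733 | S-domain-RLP | 0.4834 | NRLP | 0.0015 |
| Alien_16_855 | N | N | NRLP | 0.0002 | RLK-like | 0.4308 | Unknown | 0.2658 | NRLP | 0.0014 |
| Alien_40_703 | N | N | NRLP | 0.0002 | RLP | 0.711 | S-domain-RLP | 0.3499 | NRLP | 0.0014 |
| Alien_45_534 | N | N | NRLP | 0.0002 | RLP | 0.8553 | WAK-RLP | 0.3165 | NRLP | 0.0014 |
| Alien_74_665 | N | N | NRLP | 0.0001 | RLP | 0.8547 | Unknown | 0.2503 | NRLP | 0.0014 |
| Alien_18_925 | N | N | NRLP | 0.0001 | RLK-like | 0.5679 | Other-RLP | 0.4166 | NRLP | 0.0013 |
| Alien_33_955 | N | N | NRLP | 0.0003 | RLK-like | 0.4348 | Unknown | 0.2332 | NRLP | 0.0013 |
| Alien_39_171 | N | N | NRLP | 0.1577 | RLP | 0.8516 | Unknown | 0.2665 | NRLP | 0.0012 |
| Alien_49_350 | N | N | NRLP | 0.0002 | RLP | 0.8573 | S-domain-RLP | 0.4842 | NRLP | 0.0012 |
| Alien_63_622 | N | N | NRLP | 0.0002 | RLP | 0.8555 | Unknown | 0.2664 | NRLP | 0.0012 |
| Alien_89_627 | N | N | NRLP | 0.0002 | RLP | 0.8567 | Other-RLP | 0.3835 | NRLP | 0.0012 |
| Alien_91_929 | N | N | NRLP | 0.0003 | RLK-like | 0.573 | Other-RLP | 0.4331 | NRLP | 0.0012 |
| Alien_14_450 | N | N | NRLP | 0.3148 | RLP | 0.7157 | WAK-RLP | 0.333 | NRLP | 0.0011 |
| Alien_15_536 | N | N | NRLP | 0.0007 | RLP | 0.8566 | Unknown | 0.2668 | NRLP | 0.0011 |
| Alien_22_586 | N | N | NRLP | 0.001 | RLP | 0.8562 | S-domain-RLP | 0.3993 | NRLP | 0.0011 |
| Alien_3_226 | N | N | NRLP | 0.0003 | RLK-like | 0.431 | Unknown | 0.2991 | NRLP | 0.0011 |
| Alien_57_326 | N | N | NRLP | 0.3151 | RLP | 0.8605 | Unknown | 0.2502 | NRLP | 0.0011 |
| Alien_13_137 | N | N | NRLP | 0.2113 | RLK-like | 0.5764 | Unknown | 0.1667 | NRLP | 0.001 |
| Alien_35_659 | N | N | NRLP | 0.0002 | RLK-like | 0.5687 | Other-RLP | 0.3829 | NRLP | 0.001 |
| Alien_37_440 | N | N | NRLP | 0.0003 | RLK-like | 0.5743 | Unknown | 0.2666 | NRLP | 0.001 |
| Alien_48_571 | N | N | NRLP | 0.0002 | RLP | 0.8586 | Unknown | 0.2999 | NRLP | 0.001 |
| Alien_54_839 | N | N | NRLP | 0.0004 | RLP | 0.7158 | Unknown | 0.2674 | NRLP | 0.001 |
| Alien_12_553 | N | N | NRLP | 0.3185 | RLP | 0.858 | Unknown | 0.2335 | NRLP | 0.0009 |
| Alien_17_304 | N | N | NRLP | 0.3169 | RLP | 0.8541 | Unknown | 0.2828 | NRLP | 0.0009 |
| Alien_25_176 | N | N | NRLP | 0.0003 | RLP | 0.8568 | Unknown | 0.2667 | NRLP | 0.0009 |
| Alien_30_623 | N | N | NRLP | 0.0002 | RLP | 0.8547 | Other-RLP | 0.3833 | NRLP | 0.0009 |
| Alien_32_240 | N | N | NRLP | 0.1576 | RLP | 0.8531 | Unknown | 0.2499 | NRLP | 0.0009 |
| Alien_53_589 | N | N | NRLP | 0.0006 | RLP | 0.7103 | Unknown | 0.3 | NRLP | 0.0009 |
| Alien_58_715 | N | N | NRLP | 0.0001 | RLK-like | 0.5748 | S-domain-RLP | 0.3842 | NRLP | 0.0009 |
| Alien_82_456 | N | N | NRLP | 0.0001 | RLP | 0.855 | S-domain-RLP | 0.3165 | NRLP | 0.0009 |
| Alien_85_415 | N | N | NRLP | 0.0004 | RLP | 0.715 | Unknown | 0.2167 | NRLP | 0.0009 |
| Alien_8_947 | N | N | NRLP | 0.0001 | RLK-like | 0.5689 | Unknown | 0.25 | NRLP | 0.0009 |
| Alien_10_555 | N | N | NRLP | 0.0002 | RLP | 0.8536 | Unknown | 0.2996 | NRLP | 0.0008 |
| Alien_19_229 | N | N | NRLP | 0.0003 | RLP | 0.8599 | PAN-RLP | 0.3336 | NRLP | 0.0008 |
| Alien_27_824 | N | N | NRLP | 0.0002 | RLP | 0.7111 | Unknown | 0.3337 | NRLP | 0.0008 |
| Alien_41_731 | N | N | NRLP | 0.0004 | RLP | 0.7117 | Unknown | 0.2666 | NRLP | 0.0008 |
| Alien_43_686 | N | N | NRLP | 0.0001 | RLP | 0.7129 | S-domain-RLP | 0.3662 | NRLP | 0.0008 |
| Alien_47_420 | N | N | NRLP | 0.0004 | RLP | 0.8546 | Other-RLP | 0.4172 | NRLP | 0.0008 |
| Alien_52_779 | N | N | NRLP | 0.0003 | RLK-like | 0.4383 | Unknown | 0.2999 | NRLP | 0.0008 |
| Alien_55_478 | N | N | NRLP | 0.0002 | RLP | 0.7179 | Other-RLP | 0.3997 | NRLP | 0.0008 |
| Alien_60_817 | N | N | NRLP | 0.0002 | RLP | 0.7135 | Unknown | 0.2999 | NRLP | 0.0008 |
| Alien_64_626 | N | N | NRLP | 0.0002 | RLP | 0.7138 | Other-RLP | 0.4 | NRLP | 0.0008 |
| Alien_75_673 | N | N | NRLP | 0.0002 | RLP | 0.8548 | Unknown | 0.2832 | NRLP | 0.0008 |
| Alien_81_442 | N | N | NRLP | 0.0003 | RLK-like | 0.5736 | S-domain-RLP | 0.4833 | NRLP | 0.0008 |
| Alien_87_495 | N | N | NRLP | 0.0005 | RLP | 0.8555 | S-domain-RLP | 0.3838 | NRLP | 0.0008 |
| Alien_93_110 | N | N | NRLP | 0.3149 | RLP | 0.8597 | WAK-RLP | 0.467 | NRLP | 0.0008 |
| Alien_99_622 | N | N | NRLP | 0.0002 | RLP | 0.8568 | Unknown | 0.25 | NRLP | 0.0008 |
| Alien_21_499 | N | N | NRLP | 0.0002 | RLP | 0.86 | S-domain-RLP | 0.3498 | NRLP | 0.0007 |
| Alien_31_429 | N | N | NRLP | 0.0002 | RLP | 0.7128 | Unknown | 0.2996 | NRLP | 0.0007 |
| Alien_46_860 | N | N | NRLP | 0.0002 | RLK-like | 0.571 | Unknown | 0.2995 | NRLP | 0.0007 |
| Alien_56_859 | N | N | NRLP | 0.0005 | RLK-like | 0.5724 | S-domain-RLP | 0.3328 | NRLP | 0.0007 |
| Alien_5_855 | N | N | NRLP | 0.0003 | RLK-like | 0.572 | Unknown | 0.2997 | NRLP | 0.0007 |
| Alien_65_609 | N | N | NRLP | 0.0002 | RLK-like | 0.4257 | Unknown | 0.2667 | NRLP | 0.0007 |
| Alien_6_529 | N | N | NRLP | 0.0001 | RLP | 0.8565 | Unknown | 0.2504 | NRLP | 0.0007 |
| Alien_86_232 | N | N | NRLP | 0.1581 | RLP | 0.8535 | Other-RLP | 0.3495 | NRLP | 0.0007 |
| Alien_92_960 | N | N | NRLP | 0.0005 | RLK-like | 0.5741 | Other-RLP | 0.3168 | NRLP | 0.0007 |
| Alien_95_597 | N | N | NRLP | 0.157 | RLP | 0.8588 | Unknown | 0.2833 | NRLP | 0.0007 |
| Alien_96_597 | N | N | NRLP | 0.3704 | RLP | 0.8544 | WAK-RLP | 0.3999 | NRLP | 0.0007 |
| Alien_0_119 | N | N | NRLP | 0.0528 | RLP | 0.7163 | PAN-RLP | 0.4339 | NRLP | 0.0006 |
| Alien_26_112 | N | N | NRLP | 0.5285 | RLP | 0.8585 | Unknown | 0.2664 | NRLP | 0.0006 |
| Alien_76_327 | N | N | NRLP | 0.0003 | RLP | 0.7066 | Other-RLP | 0.4002 | NRLP | 0.0006 |
| Alien_77_685 | N | N | NRLP | 0.0002 | RLK-like | 0.569 | Unknown | 0.2494 | NRLP | 0.0006 |
| Alien_98_323 | N | N | NRLP | 0.1046 | RLP | 0.7172 | Other-RLP | 0.5328 | NRLP | 0.0006 |
| Alien_28_468 | N | N | NRLP | 0.0001 | RLP | 0.8563 | Unknown | 0.2831 | NRLP | 0.0005 |
| Alien_36_821 | N | N | NRLP | 0.0001 | RLP | 0.717 | Unknown | 0.2337 | NRLP | 0.0005 |
| Alien_68_626 | N | N | NRLP | 0.0002 | RLP | 0.8541 | Unknown | 0.2835 | NRLP | 0.0005 |
| Alien_80_637 | N | N | NRLP | 0.0002 | RLK-like | 0.5715 | S-domain-RLP | 0.4333 | NRLP | 0.0005 |
| Alien_84_494 | N | N | NRLP | 0.1614 | RLP | 0.8574 | S-domain-RLP | 0.3501 | NRLP | 0.0005 |
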
Table 9. Number of RLPs and predicted RLKs.
Class (Subfamily) | RLP | Correctly Classified * | Unknown Function ** | Incorrectly Subfamily Classified *** | Mistakenly Classified **** | RLKs in Arabidopsis
LRR-RLP4946302235
L-Lectin-RLP505 545
Salt stress response/antifungal-RLP9315044
WAK-RLP651 442
S-domain-RLP11 137
Unknown-RLP (Extensin, PERK, RKF3, URKI)4343 1128
Malectin-RLP6231515
RCC1-RLP4 4 8
LysM-RLP422 3
B-lectin-RLP1 1 2
C-Lectin-RLP0 2
Ethylene-responsive-RLP33 32
PAS-RLP0 2
Thaumatin-RLP66 2
PPR-RLP0 1
Glycosyl-hydrolases-RLP3 3 0
PAN-RLP1 1 10
Other-RLP351124 130
Undefined78
Total17612247745468
* Correctly classified as shown in Table S2 in black bold. ** Unknown function as shown in Table S2 in red. *** Incorrectly subfamily classified as shown in Table S2 in blue. **** Mistakes as shown in Table S2 in standard black.
Table 10. Molecular evolution analysis of the GDPDLs.

| Sequence | Ka | Ks | Ka/Ks | Selection | Date (Mya) | p-Value |
| --- | --- | --- | --- | --- | --- | --- |
| GDPDL5-GDPDL3 | 0.382 | 1.578 | 0.242 | Purifying | 129.316 | 7.98 × 10⁻⁴⁹ |
| GDPD (ectodomain)-GDPDL4 | 0.214 | 1.466 | 0.146 | Purifying | 120.193 | 2.22 × 10⁻⁴⁵ |
| GDPDL4-GDPD-RLK | 0.214 | 1.288 | 0.166 | Purifying | 105.602 | 9.31 × 10⁻⁴⁵ |
| GDPDL1-GDPDL4 | 0.180 | 0.940 | 0.192 | Purifying | 77.037 | 1.60 × 10⁻⁵¹ |
| GDPDL3-GDPDL4 | 0.164 | 0.852 | 0.192 | Purifying | 69.822 | 1.12 × 10⁻⁴⁶ |
| GDPDL4-GDPDL6 | 0.646 | 0.802 | 0.805 | Purifying | 65.744 | 0.146094 |
| GDPD-RLK-GDPDL6 | 0.695 | 0.638 | 1.090 | Positive | 52.286 | 0.109708 |
| GDPD (ectodomain)-GDPDL3 | 0.170 | 0.397 | 0.428 | Purifying | 32.525 | 4.56 × 10⁻¹³ |
| GDPDL3-GDPD-RLK | 0.167 | 0.394 | 0.423 | Purifying | 32.333 | 3.06 × 10⁻¹³ |
| GDPD-RLK-GDPDL3 | 0.167 | 0.394 | 0.423 | Purifying | 32.333 | 3.06 × 10⁻¹³ |
| GDPDL1-GDPDL3 | 0.141 | 0.390 | 0.363 | Purifying | 31.961 | 1.05 × 10⁻¹⁷ |
| GDPDL1-GDPD-RLK | 0.120 | 0.327 | 0.368 | Purifying | 26.786 | 5.38 × 10⁻¹⁶ |
| GDPD-RLK-GDPDL1 | 0.120 | 0.327 | 0.368 | Purifying | 26.786 | 5.38 × 10⁻¹⁶ |
| GDPDL1-GDPD (ectodomain) | 0.125 | 0.326 | 0.384 | Purifying | 26.730 | 5.08 × 10⁻¹⁵ |

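The Selection column follows the usual reading of the Ka/Ks ratio: values below 1 indicate purifying selection and values above 1 indicate positive selection. A minimal sketch of that rule is given below. The function name and the "Neutral" label for a ratio of exactly 1 are assumptions, and the divergence dates and p-values in the table are not reproduced by this code.

```python
def classify_selection(ka, ks):
    """Return (Ka/Ks rounded to 3 decimals, selection label).

    ka: non-synonymous substitution rate.
    ks: synonymous substitution rate (must be non-zero).
    """
    ratio = ka / ks
    if ratio < 1:
        label = "Purifying"
    elif ratio > 1:
        label = "Positive"
    else:
        label = "Neutral"  # assumed label; this case does not occur in the table
    return round(ratio, 3), label


if __name__ == "__main__":
    # Ka and Ks pairs copied from two rows of the table.
    print(classify_selection(0.382, 1.578))  # GDPDL5-GDPDL3
    print(classify_selection(0.695, 0.638))  # GDPD-RLK-GDPDL6
```

Ratios recomputed from the rounded Ka and Ks values can differ from the tabulated ones in the last decimal place, because the published ratios were presumably derived from unrounded estimates.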
Table 11. Protein-protein interactions between the GDPDL proteins and Arabidopsis proteins. The colors indicate the hubs from Figure 3A.
Table 11. Protein-protein interactions between the GDPDL proteins and Arabidopsis proteins. The colors indicate the hubs from Figure 3A.
NameBetweenness CentralityCloseness CentralityDegreeEccentricityDescription
SNC40.192340750.37614679123glycerophosphoryl diester phosphodiesterase family protein, putative, expressed
RLP510.00.2751677924leucine rich repeat family protein, putative, expressed
SNC13.0111 × 10−40.2770270344rp3 protein, putative, expressed
SUA1.0037 × 10−40.2770270344RNA recognition motif family protein, expressed
DRT1111.0037 × 10−40.2770270344G-patch domain containing protein, expressed
AT2G200500.00.2742474914AGC_PKA/PKG_like.1-ACG kinases include homologs to PKA, PKG and PKC, expressed
AT1G597800.00.2742474914NBS-LRR disease resistance protein, putative, expressed
AT3G553500.00.2760942834trp repressor/replication initiator, putative, expressed
BPA10.308183660.5189873462RNA recognition motif containing protein, putative, expressed
AT4G17720.308183660.5189873462RNA recognition motif, putative, expressed
AT1G229200.00.2742474924COP9 signalosome complex subunit 5b, putative, expressed
GDPDL50.178352760.37104072103glycerophosphoryl diester phosphodiesterase family protein, putative, expressed
MLP3280.00.2770270374pathogenesis-related Bet v I family protein, putative, expressed
AGL460.00.2770270374OsMADS89-MADS-box family gene with M-gamma type-box, expressed
AT2G471150.043020.277966184expressed protein
AT1G296600.043020.277966184GDSL-like lipase/acylhydrolase, putative, expressed
AT5G519500.043020.277966184HOTHEAD precursor, putative, expressed
AT1G206800.043020.277966184Ser/Thr-rich protein T10 in DGCR region, putative, expressed
AT2G177100.043020.277966184expressed protein
AT5G425300.043020.277966184
BPA10.308183660.5189873462RNA recognition motif containing protein, putative, expressed
AT4G177200.308183660.5189873462RNA recognition motif, putative, expressed
GDPDL30.16933420.37104072103glycerophosphoryl diester phosphodiesterase family protein, putative, expressed
SHV20.00.2751677954COBRA-like protein 7 precursor, putative, expressed
MRH10.00.2751677954MRH1, putative, expressed
BST10.00.2751677954endonuclease/exonuclease/phosphatase family domain containing protein, expressed
MRH60.00.2751677954universal stress protein domain containing protein, putative, expressed
MRH20.00.2751677954kinesin motor domain containing protein, expressed
ATCOAE0.00.2715231814dephospho-CoA kinase, putative, expressed
AT3G237500.00.2715231814receptor protein kinase TMK1 precursor, putative, expressed
BPA10.308183660.5189873462RNA recognition motif containing protein, putative, expressed
AT4G177200.308183660.5189873462RNA recognition motif, putative, expressed
GDPDL1 | 0.12794717 | 0.37442922 | 10 | 3 | glycerophosphoryl diester phosphodiesterase family protein, putative, expressed
AT1G49750 | 0.0 | 0.27333333 | 1 | 4 | uncharacterized protein At4g06744 precursor, putative, expressed
AT3G45710 | 0.0 | 0.27333333 | 1 | 4 | peptide transporter PTR2, putative, expressed
PLDGAMMA1 | 0.00779455 | 0.29181495 | 3 | 4 | phospholipase D, putative, expressed
MAP18 | 0.0 | 0.27333333 | 1 | 4 | Unknown function
CDS1 | 0.0 | 0.28275862 | 2 | 4 | phosphatidate cytidylyltransferase, putative, expressed
BPA1 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif containing protein, putative, expressed
AT4G17720 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif, putative, expressed
GDPDL4 | 0.21573054 | 0.38497653 | 14 | 3 | glycerophosphoryl diester phosphodiesterase family protein, putative, expressed
AT5G38480 | 0.0 | 0.27891156 | 1 | 4 | 14-3-3 protein, putative, expressed
FLA7 | 0.00445805 | 0.29390681 | 6 | 4 | fasciclin domain containing protein, expressed
SKU5 | 0.0 | 0.2877193 | 4 | 4 | monocopper oxidase, putative, expressed
FLA8 | 0.0 | 0.2877193 | 4 | 4 | fasciclin-like arabinogalactan protein, putative, expressed
ZW9 | 0.00445805 | 0.29390681 | 6 | 4 | ubiquitin carboxyl-terminal hydrolase, putative, expressed
AT1G32860 | 0.00853443 | 0.29496403 | 2 | 4 | glycosyl hydrolases family 17, putative, expressed
AT3G56370 | 0.0 | 0.27891156 | 1 | 4 | receptor-like protein kinase precursor, putative, expressed
AT4G09000 | 0.0 | 0.27891156 | 1 | 4 | 14-3-3 protein, putative, expressed
BG_PPAP | 0.0 | 0.27891156 | 1 | 4 | glycosyl hydrolases family 17, putative, expressed
AT1G01080 | 0.06480132 | 0.39047619 | 3 | 4 | RNA recognition motif containing protein, putative, expressed
AT5G65430 | 0.0 | 0.27891156 | 1 | 4 | 14-3-3 protein, putative, expressed
BPA1 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif containing protein, putative, expressed
AT4G17720 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif, putative, expressed
GDPDL6 | 0.67455299 | 0.4969697 | 38 | 3 | glycerophosphoryl diester phosphodiesterase family protein, putative, expressed
AT4G11860 | 0.0 | 0.27891156 | 1 | 4 | ubiquitin interaction motif family protein, expressed
AT3G23410 | 0.0 | 0.33333333 | 1 | 4 | alcohol oxidase, putative, expressed
AT4G23400 | 0.0 | 0.33333333 | 1 | 4 | aquaporin protein, putative, expressed
AT4G30850 | 0.0 | 0.33333333 | 1 | 4 | haemolysin-III, putative, expressed
AT1G57870 | 0.0 | 0.33333333 | 1 | 4 | CGMC_GSK.5-CGMC includes CDA, MAPK, GSK3, and CLKC kinases, expressed
AT1G31812 | 0.0 | 0.33333333 | 1 | 4 | acyl CoA binding protein, putative, expressed
AT1G14360 | 0.0 | 0.33333333 | 1 | 4 | solute carrier family 35 member B1, putative, expressed
AT5G06320 | 0.0 | 0.33333333 | 1 | 4 | harpin-induced protein 1 domain containing protein, expressed
AT1G07550 | 0.0 | 0.33333333 | 1 | 4 | senescence-induced receptor-like serine/threonine-protein kinase precursor, putative, expressed
AT5G07340 | 0.0 | 0.33333333 | 1 | 4 | calreticulin precursor protein, putative, expressed
AT2G41705 | 0.0 | 0.33333333 | 1 | 4 | crcB-like protein, expressed
AT3G12180 | 0.0 | 0.33333333 | 1 | 4 | cornichon protein, putative, expressed
AT5G11890 | 0.0 | 0.33333333 | 1 | 4 | harpin-induced protein 1 domain containing protein, expressed
AT1G14020 | 0.0 | 0.33333333 | 1 | 4 | auxin-independent growth promoter protein, putative, expressed
AT1G34640 | 0.0 | 0.33333333 | 1 | 4 | expressed protein
AT3G66654 | 0.0 | 0.33333333 | 1 | 4 | peptidyl-prolyl cis-trans isomerase, putative, expressed
AT2G22425 | 0.0 | 0.33333333 | 1 | 4 | signal peptidase complex subunit 1, putative, expressed
AT2G27290 | 0.0 | 0.33333333 | 1 | 4 | protein of unknown function DUF1279 domain containing protein, expressed
AT5G49540 | 0.0 | 0.33333333 | 1 | 4 | transmembrane protein 93, putative, expressed
AT1G13770 | 0.0 | 0.33333333 | 1 | 4 | DUF647 domain containing protein, putative, expressed
AT1G29060 | 0.0 | 0.33333333 | 1 | 4 | expressed protein
AT4G14455 | 0.0 | 0.33333333 | 1 | 4 | SNARE domain containing protein, putative, expressed
AT4G25360 | 0.0 | 0.33333333 | 1 | 4 | leaf senescence related protein, putative, expressed
AT4G12250 | 0.0 | 0.33333333 | 1 | 4 | UDP-glucuronate 4-epimerase, putative, expressed
AT5G35460 | 0.0 | 0.33333333 | 1 | 4 | integral membrane protein, putative, expressed
AT1G16170 | 0.0 | 0.33333333 | 1 | 4 | expressed protein
AT5G03345 | 0.0 | 0.33333333 | 1 | 4 | expressed protein
AT1G47640 | 0.0 | 0.33333333 | 1 | 4 | SSA2-2S albumin seed storage family protein precursor, putative, expressed
AT5G52420 | 0.0 | 0.33333333 | 1 | 4 | expressed protein
BPA1 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif containing protein, putative, expressed
AT4G17720 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif, putative, expressed
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
