1. Introduction
Black Sigatoka, caused by the ascomycete fungus
Pseudocercospora (previously
Mycosphaerella)
fijiensis, is one of the most important threats to banana production worldwide [
1]. Black Sigatoka is found in almost all banana-growing countries, producing massive losses in banana yield [
2,
3].
P. fijiensis is a hemibiotroph; the conidia or ascospores germinate, forming mycelia that penetrate through the stomata and colonize the intercellular spaces of the leaf during the biotrophic phase [
4]. The fungus then switches to a necrotrophic phase, leading to the death of leaf cells and the formation of characteristic necrotic lesions [
1].
Effector proteins are key actors in phytopathology. They play different roles such as suppressing plant immunity by interfering with the host perception, signaling, and biosynthesis of phytoregulators [
5]; effectors can also directly protect the producer microorganism, among other functions [
6,
7]. As a hemibiotroph,
P. fijiensis would be expected to produce biotrophy-related effectors able to suppress host defense and prevent host death, and, later, necrotrophy-related effectors and toxic compounds to kill host cells. Currently, three effectors of
P. fijiensis are known: PfAvr4, PfEcp2 [
8], and PfECP6 [
3].
PfAvr4 is a functional ortholog of the
C. fulvum Avr4 (
CfAvr4). These effectors bind to fungal cell wall chitin to protect against degradation by host chitinases [
8,
9]. Avr4 has been identified in different fungi of the Dothideomycete class, including the pathogens of the black Sigatoka complex in banana,
P. musae and
P. eumusae, the tomato pathogen,
P. fuligena, the pine tree pathogen,
Dothistroma septosporum, the poplar pathogen,
Septoria musiva (SmAvr4), and several
Cercospora species such as
C. beticola,
C. apii,
C. nicotianae, and
C. zeina [
10,
11]. Avr4 is one of the most studied fungal effectors. It has eight cysteines, all of which participate in disulfide bonds; three disulfide bonds are required for Avr4 protein stability and their disruption enables degradation of Avr4 by proteases in the plant apoplast [
12]. In
C. fulvum, natural Cys to Tyr mutant AVR4 proteins have been found. These mutants retain their chitin-binding ability and, when bound to chitin, are less sensitive to proteases, but are not recognized by the tomato Cf-4 resistance protein [
12], giving these mutants an advantage. When Mesarich et al. [
13] compared the AVR4s from
C. fulvum,
D. septosporum,
P. fuligena,
C. apii,
C. nicotianae, and
C. beticola, the first three pathogens were able to trigger a Cf-4-dependent hypersensitive response, while the last three did not. The authors analyzed four conserved amino acid residues shared between the Cys6–Cys7 region of the AVR4s from
C. fulvum,
D. septosporum, and
P. fuligena, which are absent in AVR4s from
C. apii,
C. nicotianae, and
C. beticola, and found that the proline residue (Pro 87) is necessary for Cf-4-mediated HR elicitation.
Ecp2 was also first described in
C. fulvum where it contributed to pathogen virulence [
14]. This effector has the Hce2 domain, defined as “a domain that causes plant cell death” [
15]; this domain is widely distributed in the fungal kingdom [
15]. Homologs of Ecp2 have been identified in 135 fungal genomes, including
D. septosporum,
Mycosphaerella graminicola, and
P. fijiensis [
16]. In
D. septosporum, Ecp2 comprises a family of seven members but four of them appear to be non-functional [
17]. Although Ecp2 has been described in many reports, its precise function is still unknown [
15].
The third effector, Ecp6, sequesters the free immunogenic chitin oligomers released by the activity of plant chitinases, thus avoiding recognition by the plant [
18]. This effector shows a broad distribution among different pathogenic and nonpathogenic fungal species [
19].
Although many investigations have focused on these three effectors [
8], very little is known about the full catalog of effectors of
P. fijiensis. Some genomics reports have included a prediction of the effectorome of this pathogen, but in all of them it was a secondary objective rather than the main one. Ohm et al. [
20] predicted 143 small secreted proteins as candidate effectors using 200 amino acids as the upper length limit, but the majority of their effort was focused on the comparative analysis of 18 genomes with respect to the presence of transposons, repeat regions, orthologous genes, shared PFAM domains, lipases, proteases, secondary metabolism enzymes, and toxins. Another prediction of the
P. fijiensis effectorome was carried out by Arango-Isaza et al. [
2] who recovered the secreted proteins with <300 amino acids, no transmembrane domains, no signal anchor motifs, and at least four cysteine residues. They predicted 172
P. fijiensis effectors; 107 of them had no blast hits and 37 had GO terms related to hydrolases such as chitinases, peptidases, cellulases, xylanases, and peroxidases. This report was mainly focused on the genome structure of the fungus; they conducted a genomic comparison of
P. fijiensis with other Capnodiales fungi which involved the identification of repeat-induced point mutation (RIP), long terminal repeat (LTR) retrotransposons, and a comparison of genome melting curves of different Dothideomycetes. Similarly, Noar and Daub [
21] used 300 amino acids as the upper length limit for protein effectors, and predicted 231 effectors in
P. fijiensis. This analysis described differentially expressed genes in the transcriptome of
P. fijiensis during interaction with the banana host. These genes included ABC transporters, cytochrome P450s, polyketide synthases (PKS), and non-ribosomal peptide synthases (NRPS), as well as proteins with CFEM, DUF, and other domains. The goal of this work was to expand on the knowledge of polyketides in this pathogen, among other pathogenicity-related genes.
Lastly, Chang et al. [
3] compared the genomes of the pathogens of the Sigatoka disease complex (
P. fijiensis,
P. eumusae, and
P. musae). They identified the transposable elements, the shared and species-specific gene families, and the phylogenetic relationships and synteny existing among these genomes. They also predicted the effectoromes of these pathogens. Defining an effector as a secreted protein with <250 amino acids and a high percentage of cysteine residues (5%), they were able to predict 105 effectors of
P. fijiensis. They identified 234 gene families, including seven putative effectors exclusively present in the three Sigatoka species.
All these predictions identified canonical effectors; the term ‘canonical’ is used to define those effectors that meet classical criteria such as a small size, extracellular localization, the presence of a signal peptide, no transmembrane domain, and cysteine richness [
22,
23,
24]. However, effector proteins that deviate from one or more of these features also exist and are termed “non-canonical effectors” [
25,
26,
27]. Recently, we created WideEffHunter, an algorithm that identifies non-canonical effectors based on domains and motifs associated with effector proteins, and the shared homology with validated effectors [
28]. Here, we predicted, for the first time, the complete effectorome of
P. fijiensis; this effectorome comprises 5179 effector candidates, 240 of them canonical candidates. In agreement with Ohm et al. [
20], peptidases and lipases were not among the groups of the most expanded effector families in this hemibiotroph; rather, Fungal_TF_MHR and other transcription factors, the mycotoxin biosynthesis protein UstYa-like, the Concanavalin A-like lectins, Hydrophobic surface binding protein A (HsbA), CFEM, Salicylate hydroxylase, and Isochorismatase families were found to be the expanded effector families; these are likely to support functions necessary for the hemibiotrophic lifestyle of the pathogen.
Ohm et al. [
20] analyzed 18 genomes, including
P. fijiensis, and predicted the core and dispensable scaffolds. Here, we found in
P. fijiensis that dispensable scaffolds harbor only 409 effectors, all of them non-canonical (~8% of the effectorome), while similar proportions of the total effectorome (canonical and non-canonical effectors) were distributed throughout all the core scaffolds; 30% of the effectorome was concentrated in scaffolds 1 and 2. Interestingly, effector paralogs are dispersed in different scaffolds. It is currently believed that the core genomes of many filamentous fungi contain essential conserved genes while the dispensable genome scaffolds contain pathogenicity genes, including effectors [
29,
30]. Our results are novel in showing that most effectors are harbored in the core scaffolds of
P. fijiensis. We found similar results in
Cochliobolus heterostrophus,
Mycosphaerella graminicola,
Leptosphaeria maculans, and
Stagonospora nodorum, revealing a different effectorome genomic structure in these pathogens compared to the currently known model.
In many fungi, effectors are concentrated in genomic islands [
7,
31,
32], or clustered close to repetitive DNA such as transposons [
31,
33]. Ohm et al. [
20] reported that the small secreted proteins of
P. fijiensis do not cluster in close proximity to transposons, as in other fungi. Pathogens follow different evolutionary trajectories, and, evidently, in some fungi, the effectors do not cluster [
34].
P. fijiensis may have its effectors dispersed throughout all its scaffolds, in a balanced proportion of canonical and non-canonical effectors.
2. Materials and Methods
Sequence information and prediction of effectors in P. fijiensis. The complete genome and deduced proteome of
Pseudocercospora fijiensis v2.0 (strain CIRAD86) were downloaded from the JGI MycoCosm database, accessed on 10 April 2021 (
https://mycocosm.jgi.doe.gov/Mycfi2/Mycfi2.home.html) [
2]. Canonical effectors were identified using EffHunter v1.0 [
24]. The identification parameters were fixed as <400 amino acids and >4 cysteines, and the algorithm was executed on a Linux/Unix operating system (
https://github.com/GisCarreon/EffHunter_v.1.0, accessed on 28 May 2021). Non-canonical effectors were identified with WideEffHunter v2.0 [
28], with updated effector-related domains (accessed on 17 April 2024).
A set of scripts written in Perl language (version 5.30.0) in the Linux environment was used to analyze the number of amino acids, the most abundant amino acids and number and percentage of cysteine residues. The information was compiled in a tabular file.
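The per-protein statistics described above can be illustrated with a short sketch. The original analysis used Perl scripts that are not reproduced here; the Python below is only an assumed equivalent, and the function names (`parse_fasta`, `sequence_stats`, `stats_table`) and column headers are hypothetical.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first token of the header line as the identifier.
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_stats(seq):
    """Length, most abundant residue, cysteine count and cysteine percentage."""
    seq = seq.upper().rstrip("*")  # drop a trailing stop symbol if present
    counts = Counter(seq)
    length = len(seq)
    cys = counts["C"]
    return {
        "length": length,
        "most_abundant": counts.most_common(1)[0][0] if length else "",
        "cys_count": cys,
        "cys_percent": round(100 * cys / length, 2) if length else 0.0,
    }


def stats_table(fasta_text):
    """Return tab-separated lines (header first), one per protein."""
    rows = ["id\tlength\tmost_abundant\tcys_count\tcys_percent"]
    for ident, seq in parse_fasta(fasta_text):
        s = sequence_stats(seq)
        rows.append(
            f"{ident}\t{s['length']}\t{s['most_abundant']}"
            f"\t{s['cys_count']}\t{s['cys_percent']}"
        )
    return rows
```

Writing the returned lines to a file gives a tabular summary comparable in layout to the one described in the text.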
Phylogenetic taxonomy distribution. Homologs of the P. fijiensis effector candidates were searched for with Blastp in the non-redundant database at GenBank (accessed on 23 April 2024), with an e-value cutoff of 1 × 10⁻¹⁰. Regarding taxonomy distribution, candidates were classified into five groups: (1) wide phylogenetic distribution (homologs in non-related fungi; fungi of different families or orders); (2) closely related fungal genera (genera belonging to the Mycosphaerellaceae family such as Cladosporium species, Cercospora species, Dothistroma species, Zymoseptoria species, etc.); (3) effectors of the Sigatoka complex (shared with P. eumusae and P. musae); (4) effectors shared with a closely related fungus (P. eumusae or P. musae); and (5) effectors specific to P. fijiensis.
Phylogenetic distribution was classified as discontinuous when patchy or non-continuous distribution in fungi was observed for the homologs.
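The five-group classification can be expressed as a simple decision rule. The sketch below is an assumption about how such a rule might be coded, not the procedure actually used: the input representation (a list of `(species, family)` pairs for the significant Blastp hits), the precedence of the groups, and the function name `classify_distribution` are all hypothetical.

```python
QUERY_SPECIES = "Pseudocercospora fijiensis"
SIGATOKA_RELATIVES = {"Pseudocercospora eumusae", "Pseudocercospora musae"}
FAMILY = "Mycosphaerellaceae"


def classify_distribution(hits):
    """Assign one of the five taxonomy groups to an effector candidate.

    hits: iterable of (species, family) pairs, one per significant homolog.
    Assumed precedence: the widest distribution found decides the group.
    """
    # Hits to the query species itself carry no taxonomic information.
    others = {(sp, fam) for sp, fam in hits if sp != QUERY_SPECIES}
    if any(fam != FAMILY for _, fam in others):
        return "1: wide phylogenetic distribution"
    species = {sp for sp, _ in others}
    if species - SIGATOKA_RELATIVES:
        return "2: closely related fungal genera"
    if species == SIGATOKA_RELATIVES:
        return "3: Sigatoka complex"
    if species:
        return "4: shared with one closely related fungus"
    return "5: specific to P. fijiensis"
```

For example, a candidate whose only homologs are in P. eumusae and P. musae falls in group 3, whereas a single additional hit in a Cercospora species moves it to group 2.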
Functional domains and motifs discovery. For the identification of functional domains, effector candidates were analyzed with the NCBI CD-Search tool of the Conserved Domains and Protein Classification resource (
https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi, accessed on 17 April 2024) [
35] (version v3.20). This server compiles several sources such as Pfam version 34.0 [
36], SMART [
37], the COGs collection [
38], TIGRFAMS [
39], the NCBI Protein Clusters collection [
40], NCBIfam [
41], and CDD’s internal data curation effort [
42]. The analysis was executed with an expect value (E-value) threshold of 0.01 and 500 as the maximum number of hits.
Duplication and diversification of effectors in the P. fijiensis genome. To find gene duplication and diversification, the full effectorome of P. fijiensis was compared against itself with Blastp in “all-against-all” mode. Settings were established to obtain a maximum of 50 target sequences per query, and hits with e-value < 1 × 10⁻⁴.
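As an illustration of the filtering step, the sketch below parses Blastp results in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score) and keeps hits under the thresholds stated above. Excluding self-hits and the function name `filter_hits` are assumptions introduced here for clarity; the text does not state how self-matches were handled.

```python
from collections import defaultdict


def filter_hits(tabular_text, max_evalue=1e-4, max_targets=50):
    """Return {query: [subjects]} for all-against-all Blastp tabular output.

    Keeps hits with e-value below max_evalue, at most max_targets distinct
    subjects per query, in the order in which they appear in the input.
    """
    kept = defaultdict(list)
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query == subject:
            continue  # assumption: a protein matching itself is not a paralog
        if evalue >= max_evalue:
            continue
        targets = kept[query]
        if subject not in targets and len(targets) < max_targets:
            targets.append(subject)
    return dict(kept)
```

The resulting dictionary lists, for each effector candidate, the other candidates that could be considered putative paralogs under these thresholds.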
Proteins were classified according to their function, and the proteins belonging to the same group were subjected to a multiple sequence alignment (MSA) using the Clustal Omega program from the EMBL-EBI server (
https://www.ebi.ac.uk/jdispatcher/msa/clustalo, accessed on 22 April 2024) [
43] with default parameters. A consensus sequence for each group was established at 70% identity. Those sequences with <40% identity with the group were considered “unclustered”.
Scaffolds 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, and 19 were classified as “core”, while the scaffolds 11, 13, 14, 15, 16, 17, 18, 20, 21, 22, 44, 76, and 85 were classified as “dispensable scaffolds” according to Ohm et al. [
20] and Noar and Daub [
44].
The information on scaffold organization of all effector candidates was compiled in a database in tabular format.
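A tabulation of this kind can be sketched as follows. The scaffold sets are passed in as arguments rather than hard-coded, and the protein identifiers in the usage example are invented for illustration; the function names `tally_by_compartment` and `percent_share` are hypothetical.

```python
from collections import Counter


def tally_by_compartment(effector_scaffolds, core, dispensable):
    """Count effectors located in core, dispensable, or unassigned scaffolds.

    effector_scaffolds: dict mapping a protein identifier to a scaffold number.
    core, dispensable: sets of scaffold numbers.
    """
    tally = Counter()
    for scaffold in effector_scaffolds.values():
        if scaffold in core:
            tally["core"] += 1
        elif scaffold in dispensable:
            tally["dispensable"] += 1
        else:
            tally["unassigned"] += 1
    return tally


def percent_share(tally):
    """Convert counts into percentages of the total, rounded to 2 decimals."""
    total = sum(tally.values())
    if not total:
        return {}
    return {name: round(100 * n / total, 2) for name, n in tally.items()}


# Usage with made-up identifiers and a few scaffold numbers.
example = {"eff_A": 1, "eff_B": 2, "eff_C": 11, "eff_D": 1}
counts = tally_by_compartment(example, core={1, 2}, dispensable={11, 13})
shares = percent_share(counts)
```

Keeping the scaffold sets as parameters makes the same tabulation reusable for the other fungi analyzed, where the core and dispensable compartments differ.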
Genomic organization of effectors in other fungi. To analyze genomic organization of effectors in other fungi, two hemibiotrophs,
Mycosphaerella graminicola (PRJNA19047) and
Leptosphaeria maculans (PRJNA171003), and two necrotrophs,
Cochliobolus heterostrophus (PRJNA42739) and
Stagonospora nodorum (PRJNA13754), were selected. Their genome annotations (GFF format) and deduced proteomes (FASTA format) were downloaded from the NCBI platform (
https://www.ncbi.nlm.nih.gov/genome, accessed on 17 May 2024) and the Joint Genome Institute’s (JGI) Mycocosm Portal (
https://mycocosm.jgi.doe.gov/mycocosm/home, accessed on 17 May 2024). The effectoromes were predicted with EffHunter v1.0 and WideEffHunter v2.0, as used above for the prediction of the
P. fijiensis effectorome. Core and dispensable scaffolds or chromosomes were identified based on Ohm et al. [
20].
P. fijiensis effector expression in conidia, mycelia, and interaction with the plant host. Data regarding expression analysis of predicted
P. fijiensis effectors were collected from the previous reports with transcriptomics data of
P. fijiensis mycelia in vitro and in interaction with the banana host [
44], and transcriptomics data of
P. fijiensis conidia (both accessed on 29 April 2024) [
45].
Searching for protein motifs in the effectors. The motifs RXLR, YFWxC, LysM, EAR, [LI]xAR, PDI, CRN, and ToxA were screened using the regular expressions (Regex) available in Carreón-Anguiano et al. [
45], accessed on 9 May 2024.
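Regex-based motif screening can be illustrated with a minimal sketch. The patterns below are literal readings of three motif names (x taken as any residue) and are assumptions made for illustration only; they are not the expressions from the cited work, which may include positional or contextual constraints. The remaining motifs are omitted because their patterns cannot be inferred from their names.

```python
import re

# Illustrative patterns only (assumption): "x" is read as any single residue.
MOTIF_PATTERNS = {
    "RXLR": re.compile(r"R.LR"),
    "YFWxC": re.compile(r"[YFW].C"),
    "[LI]xAR": re.compile(r"[LI].AR"),
}


def screen_motifs(seq):
    """Return {motif name: [1-based start positions]} for motifs found in seq.

    finditer reports non-overlapping matches, scanning left to right.
    """
    seq = seq.upper()
    found = {}
    for name, pattern in MOTIF_PATTERNS.items():
        positions = [m.start() + 1 for m in pattern.finditer(seq)]
        if positions:
            found[name] = positions
    return found


# Usage on an invented sequence.
hits = screen_motifs("MKRSLRAAYACLTARG")
```

Short degenerate patterns like these match by chance in long proteins, so counts obtained this way are best read as an upper bound on motif occurrence.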
3. Results
3.1. Effectorome Generalities
EffHunter v1.0 predicted 240 canonical effectors while WideEffHunter v2.0 predicted 4939 non-canonical effectors; therefore, the complete P. fijiensis effectorome comprises 5179 effectors, of which canonical effectors represent ~4.63% and non-canonical effectors ~95.37%.
WideEffHunter identified 2774 proteins with <400 amino acids, but with low cysteine content or no signal peptide. This shows that, even when the canonical length criterion is met, many of the proteins do not meet all the canonical criteria. Non-canonical effectors were 49 to 4644 amino acids in length. In fact, 2165 effectors were larger than 400 amino acids and 307 of them were larger than 1000 amino acids.
The phylogenetic distribution for these 5179 identified effector candidates was determined (
Table 1) and it was found that their distribution was discontinuous, which supports their being potential effectors. The majority (3179) are lineage-specific, i.e., with homologs only in fungi of the Mycosphaerellaceae family, and 2000 have a wide phylogenetic distribution, sharing homologs with other fungal families and even other fungal orders or fungal classes.
Cysteine content is another common criterion used to identify effectors. The 4939 non-canonical effectors identified by WideEffHunter contained from 0 to 98 cysteine residues, in proteins ranging from 49 to 4644 amino acids in length.
Table 2 shows the categorization of
P. fijiensis effectors in terms of cysteine content. In both canonical and non-canonical groups, the bulk of effectors are in the range of <2.99% cys. It is important to note that a higher cys percentage does not mean a greater number of cys residues, since effector length is highly variable. Among the non-canonical effectors, the protein with ID 180235 has 98 cys residues in 3193 amino acids (3.07%), while the protein with ID 9960 has 15.69% cys, with 8 cys residues in 51 amino acids. Therefore, it would be more informative to report the number of cysteine residues instead of the % Cys content as the cysteine content criterion during effector mining.
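The contrast between the two measures can be checked with a few lines of arithmetic. The sketch below uses only the two proteins and the values quoted above; the helper name `cys_percent` is hypothetical.

```python
def cys_percent(cys_count, length):
    """Cysteine content as a percentage of protein length (2 decimals)."""
    return round(100 * cys_count / length, 2)


# (cysteine residues, length in amino acids) for the two proteins cited above.
proteins = {"180235": (98, 3193), "9960": (8, 51)}

# Ranking by absolute cysteine count and by percentage gives opposite orders.
by_count = sorted(proteins, key=lambda p: proteins[p][0], reverse=True)
by_percent = sorted(proteins, key=lambda p: cys_percent(*proteins[p]), reverse=True)
```

Protein 180235 ranks first by residue count but last by percentage, which is why the two criteria can select different candidates.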
3.2. Effector Functions of P. fijiensis Effectors Are Associated with Stages of Biotrophy and Necrotrophy
The prediction of potential effector functions consisted of domain- and homology-based methods.
WideEffHunter was able to retrieve the three known
P. fijiensis effectors: Avr4, Ecp2, and Ecp6 (
Table S1).
Orthologs of other extracellular proteins (ECPs) of the biotroph
Passalora fulva (syn.
C. fulvum) were identified by Blastp. In the canonical set, ECP7, ECP9, ECP20, ECP32, ECP44, ECP49, ECP52, ECP58, and ECP60 were identified, along with one homolog of ECP45 in the non-canonical set (
Table S1). Some ECPs share homology with proteins of known function such as extracellular lysophospholipase (ECP44), cerato-platanin (ECP45), CAP-domain-containing PR-1-like protein (ECP57), and malate dehydrogenase (ECP60). The roles of the majority of the ECPs are still largely unknown.
In the search for domains, 1942 domains were identified, 56 found in the canonical effectors (
Table S2) and 1900 domains among the non-canonical effectors (
Table S3).
Table 3 and
Table 4 show the domains with the largest number of hits in the effectorome of
P. fijiensis. Fourteen domains were shared among the canonical and non-canonical effectors of
P. fijiensis (
Table 5).
Two canonical candidates contained Rapid Alkalinization Factor (RALF) domains.
P. fijiensis is a hemibiotrophic fungus and is therefore expected to produce effectors to support the biotrophic stage, and, later, the necrotrophic stage of host infection.
The Fungal_TF_MHR domain is among the most frequently found domains in non-canonical effectors. This domain belongs to a large family of fungal zinc cluster transcription factors that contain an N-terminal GAL4-like C6 zinc binuclear cluster DNA-binding domain. Consequently, GAL4 was another domain found in many non-canonical effector candidates in this fungus. Other transcription factor domains found in this effectorome were Fungal_trans_2, NAC_BTF3, PHD_SF superfamily, bHLH_SF, bHLHzip_SREBP_like, bHLHzip_USF_MITF, zf-C2H2, and ZnF_GATA. WD40-repeat proteins were also a large family of effectors in P. fijiensis. More than 85 effectors are predicted transcription factors (TFs). Thirty PKcs were identified, reinforcing that signal transduction plays a key role in the effectorome. Additionally, Ankyrin repeat, an important domain that mediates protein–protein interactions, was among the top domains found in the non-canonical effectors.
Another family as large as Fungal_TF_MHR was the NADB_Rossmann superfamily (
Table 4).
During necrotrophy, the pathogen is expected to secrete toxins that kill the host cells. The domain Mycotoxin biosynthesis protein UstYa-like was the third most frequently found domain (
Table 4). Mitochondrial carrier proteins and metabolites seem to play a role as well in
P. fijiensis pathogenesis, since Mito_carr was another protein family expanded in the effectorome.
Fifteen proteins with the domain Hydrophobic surface binding protein A (HsbA) were identified among the non-canonical set. Lytic functions in effector candidates include carboxylesterase, palmitoyl protein thioesterase, GDSL-like Lipase/Acylhydrolase, pectate lyase, cellulases, glycosyl hydrolase families (10, 61, 16, 17, 18, 28, and 43), and cutinase, as well as proteases like peptidase A4, peptidase M43, peptidase S10, and trypsin. All these functions have been reported to be involved in pathogen–host interaction.
Additionally, the secreted phytotoxic protein cerato-platanin was identified in the non-canonical set. The other necrotrophy-related effectors were two homologs of necrosis-inducing proteins (NPPs) and CFEM-domain-containing proteins (
Table 5).
Interestingly, eight candidates, one in the canonical set and seven in the non-canonical set, are predicted isochorismatase hydrolases (
Table 5), and seven non-canonical candidates are predicted salicylate hydroxylases (
Table S1). These effectors target the salicylate metabolism in the host cell.
Lectins, carbohydrate epitope-binding proteins, comprise another large group of effectors in P. fijiensis. This group contains Avr4, which, as mentioned before, binds chitin, and 31 concanavalin A-like lectin/glucanase domain superfamily proteins: 2 canonical and 29 non-canonical candidates.
The Hce2 domain was found in three effectors: Hce2 class I (small secreted proteins of 80–400 amino acids in length) was found in the canonical effector Ecp2, and two Hce2 class II (similar modular architecture but in proteins with <800 amino acids) in two non-canonical effectors.
Table 5 shows all domains that are shared among the canonical and non-canonical effectors in
P. fijiensis.
3.3. Predicted P. fijiensis Effectors Are Expressed in Mycelia and Conidia, and in Interaction with the Plant Host
To explore whether predicted effectors of
P. fijiensis are indeed expressed, we analyzed the transcriptomes of
P. fijiensis mycelia grown in vitro and in interaction with the banana host, reported by Noar and Daub [
44], as well as during conidial germination, reported by Carreón-Anguiano et al. [
46]. Expression data were collected for 4622 of the 5179 effector candidates predicted here; 4457 of them showed differential expression in
P. fijiensis in interaction with the host (“in planta”), while 158 showed a similar expression to candidates expressed in in vitro mycelia (
Table S1). Among those with differential expression, 2126 were over-expressed in planta while 2331 had a higher expression in the in vitro mycelia. Five hundred and sixty-four (564) candidates showed no expression in pathogen–host interaction (
Table S4).
In P. fijiensis conidia, 618 effectors (11.93% of total effectorome) were expressed, 40 canonical and 576 non-canonical effectors. Seven effectors, two of them canonical (protein IDs 205343 and 211577) and five non-canonical (protein IDs 180057, 187795, 7253, 112824, and 182312), were exclusively expressed in conidia. In addition, Blastp vs. the Pathogen–Host Interaction database identified 920 homologs: 47 in the canonical effectors group and 873 in the non-canonical effectors.
3.4. Effectors Are Distributed Throughout the P. fijiensis Genome
To explore genomic organization of the effectors, they were classified according to their scaffold location. Ohm et al. [
20] identified in silico the dispensome of 18 dothideomycetes, including
P. fijiensis. Later, Noar and Daub [
44] confirmed experimentally that scaffolds 11, 13, 14, 15, 16, 17, 18, 20, 21, and 22 in the
P. fijiensis genome are actually dispensable DNA; scaffolds 44, 76, and 85 were also predicted to be part of the dispensable genome [
20]. Here, we termed all these scaffolds “dispensable scaffolds”, and scaffolds 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, and 19 as the “core scaffolds”. It was found that core scaffolds contain both canonical and non-canonical effectors, the latter having the largest proportion in all scaffolds (
Figure S1). Scaffolds 1 and 2 contained 33.30% of all effectors; 94.72% of them are expressed during interaction with the host (
Table S1).
In total, the core scaffolds contained 4770 effectors and 92.35% of these effectors are expressed during interaction with the host (
Table S1), supporting their participation in
P. fijiensis pathogenesis. Effectors of a particular length or cysteine content were not concentrated in any scaffold; rather, candidates with similar ranges of lengths in amino acids and cysteine richness were distributed in all scaffolds, suggesting that the effectors of
P. fijiensis do not show a tendency to concentrate in dispensable scaffolds; rather, they are well-dispersed throughout the genome.
Curiously, in the core scaffolds, the proportion of both classes of effectors was similar. For example, scaffold 1 contains 17.92% of the total canonical effectors and 21.59% of the total non-canonical effectors found in the core scaffolds; in scaffold 2, their proportions were 16.25 and 14.70%, respectively, and so on (
Table 6).
The dispensable scaffolds contained 409 effectors (7.89% of the effectorome), and 51.34% were expressed during interaction with the plant host. All effectors associated with the dispensome were non-canonical proteins. Scaffolds 16, 44, 76, and 85 each only had one effector and the effectors of scaffolds 16 and 44 were expressed during interaction with the host.
With the exception of scaffolds 16, 44, 76, and 85, the distribution of the non-canonical effectors in the dispensable scaffolds was quite homogeneous (
Table 6), similar to the distribution in the core scaffolds.
To determine whether the higher effector content found in core scaffolds compared to dispensable scaffolds occurs in other fungi, we analyzed the hemibiotrophs M. graminicola and L. maculans, and the necrotrophs C. heterostrophus and S. nodorum.
The sizes of the effectoromes are in the range of those predicted for
P. fijiensis (
Table S5). Similarly, the majority of the effectors are in the core genomes and the largest sets are also non-canonical effectors. Dispensable genomes harbor more non-canonical effectors than canonical, while
S. nodorum has no effectors at all in the dispensable genome. Although initially surprising, this may be a widely occurring phenomenon among fungi that remains to be confirmed by further investigation.
To analyze in more detail the effector genomic distribution, all vs. all Blastp was performed to distinguish redundant groups, and then the members in the scaffolds were localized. We selected 12 functionally redundant groups: salicylate hydroxylases, isochorismatases, cutinases, concanavalin A-like proteins, and proteins containing the domains DUF, Fungal_TF_MHR, Mycotoxin biosynthesis protein UstYa-like, CFEM, HsbA, Hce2, LysM, and CAP (
Table 7). Expansion is observed in the families of salicylate hydroxylases, cutinases, and proteins containing CAP and Hce2 domains since all their members are grouped in a single cluster in each family.
Concanavalin A-like proteins, and the HsbA and CFEM protein families had two clusters each. Some members in these families were ungrouped.
The largest expansion of candidates was found here in the DUF family, comprising eight clusters with 19, 14, 7, 7, 5, 4, 4, and 4 members, respectively. The other 256 DUF proteins were ungrouped. DUF is actually not a single domain; it comprises diverse domains generically known as “Domains of Unknown Function”.
The distribution in the scaffolds suggests that the members of each family are well-dispersed in the genome (
Table 7).
3.5. Motif Screening in P. fijiensis Effectors
To expand on our knowledge of this fungus, the presence of motifs was screened in the complete effectorome.
Among the total 5179 effectors, 91 canonical and 4173 non-canonical effectors contain 15 known motifs (
Figure 1A); the most frequent motifs were RXLR in 1186 effectors, YFWxC in 836 effectors, LysM in 726 effectors, EAR in 566 effectors, [LI]xAR in 457 effectors, PDI in 193 effectors, CRN in 113 effectors, and ToxA in 96 effectors.
Figure 1B shows the occurrence of these motifs in
P. fijiensis effectors throughout the scaffolds. The bulk of the motifs was associated with core scaffolds, and the scaffold distribution was similar to that observed for the effectors in
Figure S1. Effectors harbored in dispensable scaffolds lacked known motifs.
The occurrence of each motif in canonical and non-canonical effectors was also investigated.
Table 8 shows the results. The most frequent motif found in canonical effectors was YFWxC, while, in non-canonical effectors, it was RXLR. YFWxC was also frequently found in non-canonical effectors as the second most abundant motif. EAR_1 and LysM were abundant in both canonical and non-canonical effectors.
3.6. Presence of Motifs in Other Fungal Effectors
RXLR, YFWxC, LysM, EAR, [LI]xAR, PDI, CRN, and ToxA motifs were screened in the effectoromes of different fungi which were not predicted by WideEffHunter, but by other strategies used by their respective authors.
Table S6 shows these results. There were a few fungi where CRN, [LI]xAR, and ToxA were absent, but RXLR, YFWxC, LysM, EAR, and PDI were present in all effectoromes screened here, ruling out the possibility that their occurrence is a bias introduced by WideEffHunter.
4. Discussion
Effector identification has been extremely complicated since each candidate is different; many effectors meet a few but not all the criteria commonly used for effector identification. In addition, most fungal effector predictions have paid attention to canonical candidates [
20,
47,
48] when effectoromes are much more complex, comprising also largely elusive non-canonical effectors [
25,
27,
49,
50]. Here, we identify the full effectorome of
P. fijiensis, one of the most important phytosanitary threats to bananas and plantains worldwide.
Cysteine content has been an important common criterion used to identify effectors. Some reports have based effector identification on at least four cysteines [
51,
52,
53,
54], but others employ 2% [
3] or 5% cysteine content [
55]. Depending on the criteria used to identify effectors, some true effectors may be excluded. For example, three validated effectors are currently known in
P. fijiensis: Avr4, Ecp2, and Ecp6 [
3,
8]. Avr4 of
P. fijiensis has 10 cys in its 121 amino acids (8.26%), Ecp2 has 5 cys in its 161 amino acids (3.11%), and Ecp6 has 8 cys in 413 amino acids (1.94%). Avr4 is retrieved regardless of which cysteine content criterion is used, but Ecp2 is filtered out if 5% cys is established as the cutoff; likewise, Ecp6 is excluded as a canonical effector since it is larger than 400 amino acids and has 1.94% cysteine content.
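The effect of each cutoff on the three validated effectors can be made explicit with a small sketch. It uses only the cysteine counts and percentages quoted above, plus whether the protein exceeds 400 amino acids; the data layout and the names `KNOWN_EFFECTORS`, `CRITERIA`, and `retrieved` are hypothetical.

```python
# Values as reported in the text; over_400_aa marks the canonical length limit.
KNOWN_EFFECTORS = {
    "Avr4": {"cys_count": 10, "cys_percent": 8.26, "over_400_aa": False},
    "Ecp2": {"cys_count": 5, "cys_percent": 3.11, "over_400_aa": False},
    "Ecp6": {"cys_count": 8, "cys_percent": 1.94, "over_400_aa": True},
}

# Cysteine criteria mentioned in the discussion.
CRITERIA = {
    "at least 4 cys": lambda e: e["cys_count"] >= 4,
    "2% cys": lambda e: e["cys_percent"] >= 2.0,
    "5% cys": lambda e: e["cys_percent"] >= 5.0,
}


def retrieved(criterion, enforce_length=True):
    """Names of known effectors that pass a cysteine criterion.

    enforce_length: also require the protein not to exceed 400 amino acids.
    """
    test = CRITERIA[criterion]
    return [
        name
        for name, e in KNOWN_EFFECTORS.items()
        if test(e) and not (enforce_length and e["over_400_aa"])
    ]
```

Under these assumptions only Avr4 survives the 5% cutoff, Ecp2 is recovered with the 2% or four-cysteine criteria, and Ecp6 is recovered only when the length limit is lifted and the criterion is the residue count.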
Which cys criterion to apply depends on the interest of the researcher; they may be interested in the full effectorome [
56] or prioritizing certain effectors for analysis in the laboratory [
57].
We used the EffHunter algorithm for canonical effector identification, which, to date, is the fungal effector predictor with the highest F1 score. EffHunter is stringent in its identification of canonical fungal effectors, with negligible or zero false positives [
24]. Two hundred and forty canonical effectors were predicted by EffHunter in
P. fijiensis. This number of canonical candidates is higher than that predicted by Stergiopoulos et al. [
16], but this is because these authors used a length cutoff of 300 amino acids, while, in our work, 400 amino acids were used according to the best results obtained by Carreón-Anguiano et al. [
24]. The size of the canonical effectorome predicted here for
P. fijiensis is in the range that was predicted for
Cladosporium fulvum [
58],
Pyrenophora tritici-repentis [
59], and
Fusarium oxysporum f. sp.
cepae [
60].
Non-canonical effectors were identified with WideEffHunter, which combines searches for effector-associated motifs and domains with homology to known effectors [
28]. WideEffHunter expanded the
P. fijiensis effectorome by 4939 non-canonical candidates. With this strategy, the authentic
P. fijiensis effector Ecp6 was retrieved; this effector had eluded the search for canonical candidates because, at 413 amino acids, it exceeds the 400 amino acid limit used by EffHunter. The size of the full effectorome, 5179 effectors, is larger than that of other fungal effectoromes previously predicted by WideEffHunter (1517–3811 effectors), but it is in the range of the effectorome sizes of
Bremia lactucae,
Trichoderma harzianum, and
Pestalotiopsis fici predicted by EffectorP 3.0 [
28].
Many effectors have a discontinuous phylogenetic distribution (phylogenetic “patches”) [
6,
16,
61]. Some authors discard candidates that have homologs in fungi phylogenetically distant from their microorganism under study. This is carried out to increase the chances of retrieving true effectors or to prioritize certain effectors for further study [
62,
63]. Homologs of effectors can indeed be distributed in phylogenetically distant fungi, although they may have lower identities, e-values, and scores [
6,
16]. Interestingly, fungi as well as other kingdoms of living beings present effectors with a discontinuous phylogenetic distribution [
46,
61]. To support or to discard effector candidates predicted in
P. fijiensis by WideEffHunter, BLASTp was performed for each of the 5179 candidates of
P. fijiensis. Candidates that were largely conserved, with a continuous phylogenetic distribution of homologs, were to be discarded, whereas candidates whose homologs have a discontinuous phylogenetic distribution in close relatives or in distant fungi were to be selected. All 5179 candidates showed a patchy phylogenetic distribution (either a wide and discontinuous distribution or a lineage-specific distribution), supporting our effector identification strategy. This reveals that the
P. fijiensis effectorome is larger and more complex than previously believed. Furthermore, the pan-effectorome of
P. fijiensis might be larger, as other strains may harbor unknown effectors on their dispensable chromosomes. Similarly, non-canonical effectors that do not meet WideEffHunter criteria may be elusive to our identification strategy.
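The distinction between continuous, patchy, and lineage-specific distributions can be illustrated with a toy classification. This is a sketch under stated assumptions, not the procedure applied in this work: the function `classify_distribution`, its decision rules, and the example taxon sets are all hypothetical, and the input is assumed to be the set of taxa in which a BLASTp homolog of a candidate was found.

```python
def classify_distribution(taxa_with_hits, surveyed_taxa, own_lineage):
    """Toy classification of the taxonomic distribution of a candidate's homologs.

    taxa_with_hits: taxa in which a homolog was found
    surveyed_taxa:  all taxa that were searched
    own_lineage:    subset of surveyed_taxa closely related to the query species
    """
    surveyed = set(surveyed_taxa)
    hits = set(taxa_with_hits) & surveyed
    if hits <= set(own_lineage):
        # No homologs outside the query lineage (includes the no-hit case).
        return "lineage-specific"
    if hits == surveyed:
        # Homologs in every surveyed taxon: broadly conserved.
        return "continuous"
    # Homologs in some distant taxa but absent from others.
    return "patchy"


# Hypothetical survey, for illustration only.
surveyed = {"Dothideomycetes", "Sordariomycetes", "Eurotiomycetes", "Basidiomycota"}
own = {"Dothideomycetes"}

print(classify_distribution({"Dothideomycetes"}, surveyed, own))
print(classify_distribution(surveyed, surveyed, own))
print(classify_distribution({"Dothideomycetes", "Basidiomycota"}, surveyed, own))
```

In practice the outcome depends on which taxa are surveyed and on the identity, e-value, and score thresholds used to call a homolog, none of which are modeled here.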
Although
P. fijiensis has one of the largest fungal genomes (74.14 Mbp) [
20], the size of the effectorome is not related to the size of the genome, e.g., in
Puccinia graminis f. sp.
tritici (88 Mbp), 659 effectors were predicted, while, in
Blumeria graminis f. sp.
tritici (158 Mbp), 161 were predicted [
24]. The size of the effectoromes has been associated with the lifestyles of the fungi. Ohm et al. [
20] compared the effectoromes of
Pleosporales necrotrophs with
Capnodiales hemibiotrophs and found larger effectoromes in the necrotrophs. They proposed that, in hemibiotrophs, “a large arsenal of effectors could be detrimental, as it could lead to detection by the host plant and triggering of its defenses”; the authors also hypothesize that hemibiotrophs could down-regulate their effectors during stealth pathogenesis in order for the pathogen to remain undetected in the host. Ohm et al. [
20] used 200 amino acids as the upper limit for protein size and found that the most expanded effector families in the necrotrophs were small, secreted peptidases, lipases, carbohydrate-active enzymes, and enzymes involved in the synthesis of secondary metabolites. Using 400 amino acids as the upper size limit, and without restricting our search to hydrolases, we found a different but consistent pattern of effectorome sizes: necrotrophs have on average ~200 effectors, biotrophs ~300 effectors, and hemibiotrophs ~400 effectors [
24]. We proposed that evading host perception, suppressing host defense responses, and keeping the host alive demand a larger catalog of effectors in biotrophic and hemibiotrophic fungi.
P. fijiensis is a hemibiotrophic fungus, but a large proportion of its effectorome is expressed during the necrotrophic stage of its interaction with the banana host (4615 effectors, 89.11%), according to the transcriptomics results of Noar and Daub [
44]. In
P. fijiensis conidia, 618 effectors (11.93%) are expressed [
45]. These results are consistent with our proposal that large effectoromes exist in hemibiotrophs to enable them to colonize and survive inside the host. In
P. fijiensis, conidia and mycelia are both infective and have been used for banana inoculation [
64]. Therefore, the large proportion of canonical and non-canonical candidates showing in vitro and in vivo expression supports their involvement in
P. fijiensis pathogenicity. Only 10.89% of predicted effectors of
P. fijiensis were not expressed at all in any of the conditions.
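The proportions quoted in this paragraph follow from simple ratios over the full effectorome. The sketch below recomputes them, assuming that the 5179 total is the sum of the 240 canonical candidates and the 4939 candidates contributed by WideEffHunter; the helper name `percent_of_effectorome` is hypothetical. Recomputed values may differ from the quoted percentages in the last digit because of rounding.

```python
# Counts quoted in the text.
canonical = 240        # predicted by EffHunter
non_canonical = 4939   # contributed by WideEffHunter (assumed to be additive)
total = canonical + non_canonical


def percent_of_effectorome(count: int, effectorome_size: int = total) -> float:
    """Share of the effectorome represented by `count`, in percent (2 decimals)."""
    return round(100.0 * count / effectorome_size, 2)


expressed_in_planta = 4615   # expressed during the necrotrophic stage
expressed_in_conidia = 618   # expressed in conidia

print("full effectorome:", total)
print("in planta (%):", percent_of_effectorome(expressed_in_planta))
print("conidia (%):", percent_of_effectorome(expressed_in_conidia))
```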
We did not find in
P. fijiensis an expansion in the lipase and peptidase (hydrolases) families, consistent with the findings of Ohm et al. [
20] for biotrophic and hemibiotrophic fungi. Instead, expansions in other families were found, for example, in the transcription factors Fungal_TF_MHR, the mycotoxin biosynthesis protein UstYa-like, the lectins Concanavalin A-like, Hydrophobic surface binding protein A (HsbA), CFEM, Salicylate hydroxylase, Isochorismatase, and LysM, among others. Salicylic acid (SA) is a critical signaling molecule in the defense response to biotrophic and hemibiotrophic pathogens, and the
P. fijiensis effectorome comprises eight isochorismatases and seven salicylate hydroxylases that can disrupt salicylate metabolism and suppress plant immunity; the former hydrolyzes isochorismate, the precursor of salicylate, and the latter catalyzes the decarboxylative hydroxylation of salicylate into catechol [
65].
The NADB_Rossmann superfamily was found to be greatly expanded in the
P. fijiensis effectorome. This domain is present in many dehydrogenases and other redox enzymes, evidencing the importance of redox activity in the effectorome of
P. fijiensis, as in other biotrophic and hemibiotrophic fungi [
66,
67].
Another family as large as the NADB_Rossmann superfamily was the Fungal_TF_MHR domain family. UstYa-like proteins are involved in the biosynthesis of ustiloxins, bicyclic ribosomal peptides, and cyclic peptidyl secondary metabolites. These toxins were first described in
Ustilaginoidea virens, but homologous genes to UstYa were found in the Ascomycota and Basidiomycota genomes [
68]. The lectin domain is involved in oligosaccharide binding and is associated with proteins involved in trafficking and sorting along the secretory pathway through vesicles [
69]. HsbAs are able to recruit lytic enzymes on the host's cell wall and synergistically promote its degradation, which is important for necrotrophy [
70]. Lectins and LysM proteins most likely interfere with pathogen recognition by the host.
The RALF domain was also identified in some effectors. These small, secreted cysteine-rich peptides were first described in plants, and are involved in diverse processes. RALF homologs have been identified in fungal phytopathogens where they play a role in host cell alkalinization, the activation of virulence factors, and host infection [
71,
72].
Interestingly, a characterization of the effectorome of
P. fijiensis revealed similar proportions of canonical and non-canonical effectors throughout the core scaffolds (
Table 6). For example, one canonical and one non-canonical CAP-protein were found in scaffold 2, and the same was observed in scaffold 7. The four CAP-proteins group together in a single cluster (
Table 7), suggesting that duplication, diversification, and genomic reorganization contribute to the evolution of effectors in
P. fijiensis. The presence of other effector families also supports that these events are occurring in the effectorome of
P. fijiensis; these families include salicylate hydroxylases, isochorismatases, and cutinases, among others (
Table 7). Salicylate hydroxylase members are distributed in the scaffolds 4, 5, 7, 8, 9, 10, and 12. Likewise, the DUF1793-glutaminase A (gtaA) family comprises six members distributed in scaffolds 1, 3, and 12. The wide genomic distribution of members of effector families probably protects the pathogen from the loss of important functions if any effector-containing genomic region should be lost.
In other fungi, some effectors have been found physically clustered in genomic islands, for example, in
Verticillium tricorpus and
V. dahliae [
73]. The physical clustering of effectors was recently used to isolate novel effector genes in
Fusarium oxysporum f. sp.
physali (Foph), when the regions containing the Secreted in Xylem (SIX) effectors in
F. oxysporum f. sp.
lycopersici (Fol) [
74] were compared. In addition, it is generally believed that the core genomes of many filamentous fungi contain conserved genes essential for normal development, while dispensable genomes contain pathogenicity genes such as effectors and genes involved in the biosynthesis of secondary metabolites [
29,
30]. However, there are effector genes located in core chromosomes, often located in close proximity to repetitive genomic regions like transposons [
34]. This is true, according to Ohm et al. (2012), for
C. heterostrophus C5,
C. sativus,
L. maculans,
M. graminicola,
M. populorum,
P. tritici-repentis, and
Setosphaeria turcica, but it is not observed for
Alternaria brassicicola,
Baudoinia compniacensis, or
P. fijiensis, or its close relative,
C. fulvum [
20]. Other comparative genomic analyses of diverse plant pathogens have recently revealed that effectors do not always cluster, and do not necessarily colocalize with transposons [
34]. In
P. fijiensis, the effectors are likely dispersed throughout the genome, based on the balanced distribution of canonical and non-canonical effectors per core scaffold, as well as the genomic dispersion of paralogous effectors that we have observed. Similarly, in
L. maculans, nine Avrs designated AvrLm1–AvrLm9 have been genetically mapped on unlinked genomic regions [
7].
Many fungal dispensable chromosomes, also named B chromosomes, have been associated with roles in niche adaptation, pathogenicity, and host specificity [
75,
76,
77]. In
P. fijiensis and other Dothideomycetes, Ohm et al. [
20] identified in silico the dispensable scaffolds based on the following features: low G + C content, low gene density, a lower proportion of genes encoding proteins with PFAM domains than in other scaffolds, and a high proportion of repetitive DNA. Here, 409 non-canonical effectors were localized in the dispensable DNA of
P. fijiensis, and 50% of these effector candidates were expressed during interaction with the
Musa host. Considering that the majority of the effectors described in the literature to date are canonical, and effectors have frequently been associated with dispensable chromosomes [
34], it was surprising that none of the canonical effectors of
P. fijiensis are harbored in the dispensable scaffolds. The association of canonical effectors with the dispensome is a widely occurring phenomenon in fungi, but not universal, as revealed by
P. fijiensis and
C. heterostrophus. In
P. fijiensis, some of the most expressed in planta proteins were located in the dispensable DNA [
44], evidencing the importance of these effectors to its pathogenicity. Similar to our results, in the wheat blast fungus
Magnaporthe oryzae [
78], and the wheat-pathogenic fungus
Zymoseptoria tritici [
30], effectors were recently associated with both core and dispensable genomes. However, the difference between
P. fijiensis and those fungi is that, in
M. oryzae and
Z. tritici, the effectors were canonical, while, in
P. fijiensis, only non-canonical effectors were found in the dispensable scaffolds.
Unlike the effector families described above, whose members arise from gene duplication and whose sequences group in a single cluster, some members of the concanavalin A-like, HsbA, and CFEM families are ungrouped, suggesting that the effectorome of P. fijiensis is also expanding through horizontal transfer.
The last feature analyzed in the effectorome of
P. fijiensis was the occurrence of effector motifs. WideEffHunter incorporated this identification criterion and found “oomycete motifs” in fungi and “fungal motifs” in oomycetes [
28]. Recently, WideEffHunter identified various motifs within effectors expressed in
P. fijiensis conidia: the RXLR motif was found in 161 effectors, LysM in 100 effectors, Y/F/WxC in 90 effectors, EAR-1 in 61 effectors, [LI]xAR in 60 effectors, PDI in 25 effectors, ToxA in 19 effectors, and crinkler (CRN) in 16 effectors, among others [
46]. Here, in the complete effectorome of
P. fijiensis, WideEffHunter retrieved 1186 members containing the RXLR motif, which was the most frequently found motif, followed by Y/F/WxC in 836 effectors, LysM in 726 effectors, EAR in 566 effectors, [LI]xAR in 457 effectors, PDI in 193 effectors, CRN in 113 effectors, and ToxA in 96 effectors. These motifs were present not only in non-canonical effectors but also in canonical ones, which were not retrieved on the basis of motifs. Among the canonical effectors of P. fijiensis, RXLR was found in only four, while Y/F/WxC was present in 35. These data reinforce the utility of including domains and motifs in effector identification strategies. Additionally, to validate our results and demonstrate that the occurrence of these motifs in the effectorome does not result from contamination with false positives by WideEffHunter, these motifs were screened and positively identified in other fungal effectoromes from previous reports (
Table S6). Our findings further demonstrate that, although many effectors are not conserved at the sequence level, they share known protein domains and motifs.
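Motif screening of this kind can be illustrated with regular expressions. The following is a minimal sketch, not WideEffHunter's implementation: the three patterns are simplified forms of the motif names used above (with "x" read as any residue) and are assumptions, the function names are hypothetical, and the toy sequences are invented rather than real P. fijiensis proteins.

```python
import re

# Simplified patterns for three motifs named in the text ("x" = any residue).
# These are illustrative assumptions, not WideEffHunter's own definitions.
MOTIFS = {
    "RXLR": re.compile(r"R.LR"),
    "Y/F/WxC": re.compile(r"[YFW].C"),
    "[LI]xAR": re.compile(r"[LI].AR"),
}


def find_motifs(sequence: str) -> dict:
    """Return {motif name: [0-based start positions]} for motifs found in a sequence."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # finditer reports non-overlapping matches only.
        positions = [match.start() for match in pattern.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


def count_proteins_per_motif(proteins: dict) -> dict:
    """Count how many proteins carry each motif at least once."""
    counts = {name: 0 for name in MOTIFS}
    for seq in proteins.values():
        for name in find_motifs(seq):
            counts[name] += 1
    return counts


# Invented toy sequences, for illustration only.
toy_proteins = {
    "protA": "MKTSRALRAAWGCPL",
    "protB": "MLSAIQARTTYKC",
    "protC": "MAAAGGGSSS",
}

for name, seq in toy_proteins.items():
    print(name, find_motifs(seq))
print(count_proteins_per_motif(toy_proteins))
```

Short patterns such as these match by chance in many proteins, so a motif hit on its own is weak evidence; this is why motif occurrence is combined with domains and homology to known effectors in the identification strategy discussed here.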
Motif screening permitted the identification of 719 novel RXLR-like effectors, 19 CRN-like effectors, and 138 Y/F/WxC effectors in the fungus,
P. graminis [
50]. This strategy also increased the predicted effectorome of P. infestans from 563 to 5814 effectors [
61].
In summary, effectors of P. fijiensis share some of the effector features found in other microorganisms, such as a discontinuous or patchy phylogenetic distribution and the expansion of certain effector families, but its evolutionary history differs from that of other pathogens. After effector gene duplication, the members seem to be distributed throughout the genome instead of being physically clustered together; interestingly, the effectors are predominantly found in the P. fijiensis core genome.