**4. Discussion**

The NLR genes constitute one of the largest gene families in angiosperms, with an average of about 300 genes per genome [8]. In recent years, genome-wide identification and comparative analysis of NLR genes have greatly accelerated the mining of functional NLR genes from various crops and from ecologically or economically important plants. For the Arecaceae species, however, the lack of NLR information has greatly hampered genome-guided identification of functional disease resistance genes. Taking advantage of the recently released genomes of five Arecaceae species, this study compared the NLR profiles of the five species in several respects, and the results may serve as a primary resource for molecular breeding of Arecaceae species.

In the Arecaceae species, different proportions of NLR genes form clusters of varied size. The members of each NLR gene cluster provide a large pool of candidates for positive selection to act on. Additionally, recombination can maintain high polymorphism among NLR genes within a cluster, facilitating the generation of new NLR genes [35,36]. Comparative analysis of NLR genes in the five Arecaceae species showed that two of the genomes have more than 300 NLR genes, whereas the other three have fewer than 300, suggesting that species-specific NLR gain and loss have occurred. NLR content dynamics are an important mechanism by which plants cope with varied environments and achieve ecological adaptation [8,37]. The species-specific NLR contraction or expansion in the five Arecaceae species suggests that they may have faced different selection pressures from environmental microbes after separating from their common ancestor, although the exact trajectory of environmental microbe diversity can hardly be traced. Reconstruction of the ancestral states of NLR genes at several divergence nodes of Arecaceae revealed 101 ancestral NLR lineages in the common ancestor, including two RNL lineages and 99 CNL lineages. Within the monocot clade, the number of recovered ancestral NLR lineages in the Arecaceae family is much larger than that in the Orchidaceae family (29), but lower than that in the Poaceae (456) [4,14]. The large differences in ancestral NLR lineage numbers among the three monocot families suggest that dramatic NLR contraction and amplification have occurred in the monocot lineage. Compared with several investigated dicot families, the Arecaceae family has fewer ancestral NLR lineages than Fabaceae (119) and Solanaceae (166), and far fewer than Brassicaceae (228) [13]. The large ancestral NLR lineage numbers in these dicot families may reflect the presence of an additional NLR subclass, TNL, in their genomes.

The current NLR profile of a genome is shaped by differential inheritance and amplification of ancestral NLR lineages [4]. Previous studies in Cucurbitaceae and Poaceae revealed that species in the two families experienced a similar pattern of NLR "contraction" [9–11], whereas Fabaceae and Rosaceae species exhibited "consistent expansion" of NLR genes after the radiation of each family [5,12]. A distinct "first expansion and then contraction" pattern of NLR genes was observed in five Brassicaceae species [13]. Species in the Arecaceae family, however, exhibited two different modes of NLR gene evolution after diverging from their common ancestor. *D. jenkinsiana* and *E. guineensis* have experienced "consistent expansion" of the ancestral NLR lineages, whereas NLR genes in *P. dactylifera*, *C. nucifera*, and *C. simplicifolius* show "first expansion and then contraction". Such divergence of NLR gene evolution patterns among species of the same family has also been observed in another monocot lineage, Orchidaceae, and in two dicot families [15–17]. These results provide additional evidence that rapid variation in NLR gene content can occur to facilitate plant adaptation to changing environments.
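
The pattern labels discussed here can be made concrete with a small sketch. The Python function below is an illustration, not the procedure used in this study: it assumes that NLR gene counts are available at successive nodes from the common ancestor to an extant species, and the function name, the extra labels "stable" and "mixed", and the example counts are hypothetical placeholders rather than values from this analysis.

```python
from typing import Sequence


def classify_trajectory(counts: Sequence[int]) -> str:
    """Label a root-to-tip series of NLR gene counts (hypothetical helper).

    counts[0] is the count at the common ancestor, counts[-1] the count
    in the extant species; intermediate entries are internal nodes.
    """
    if len(counts) < 2:
        raise ValueError("need counts for at least two nodes")

    # Change in count along each branch; branches with no change are ignored.
    steps = [later - earlier for earlier, later in zip(counts, counts[1:])]
    nonzero = [s for s in steps if s != 0]

    if not nonzero:
        return "stable"
    if all(s > 0 for s in nonzero):
        return "consistent expansion"
    if all(s < 0 for s in nonzero):
        return "contraction"

    # Mixed signs: check for gains followed only by losses.
    first_loss = next(i for i, s in enumerate(nonzero) if s < 0)
    if first_loss > 0 and all(s < 0 for s in nonzero[first_loss:]):
        return "first expansion and then contraction"
    return "mixed"


if __name__ == "__main__":
    # Placeholder counts for illustration only (not data from this study).
    for series in ([100, 150, 200], [100, 180, 120], [100, 80, 60]):
        print(series, classify_trajectory(series))
```

A classification of this kind depends on how many internal nodes are reconstructed; with only the ancestor and the extant species, a "first expansion and then contraction" history cannot be distinguished from a net gain or loss.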

The high diversity of NLR genes is evident not only in gene content but also in gene structure. For example, the three NLR subclasses have distinct N-terminal domains that support distinct functional mechanisms, either making holes in the cell membrane or acting as an enzyme [38]. Loss of characteristic domains was detected for many NLR genes in the Arecaceae species. This pattern has also been observed in several angiosperms in previous studies. For example, only a minority of NLR genes were reported to have intact structures in the *C. annuum* (23.2%), *S. lycopersicum* (42.7%), *S. tuberosum* (28.2%), *P. trichocarpa* (46.2%), *M. truncatula* (39.1%), *Lotus japonicus* (31.0%), and *Oryza sativa* (30.6%) genomes [5,39–41]. The remaining large proportion of NLR genes further expanded NLR diversity through the loss of the N-terminal domain, the C-terminal domain, or both (Figure S2). Notably, several studies have reported that NLR genes with atypical structures also function in plant immunity [42–45], suggesting that the loss of the N-terminal or C-terminal domain may also be a mechanism for generating NLR functional diversity. It is noteworthy that a deeply diverged CNL lineage showed widespread loss of the N-terminal CC domain (Figure S2). The long-term maintenance and expansion of this CNL lineage suggest that the loss of the CC domain did not completely abolish the function of genes in this lineage. Considering that the CC domain has been shown to be indispensable for the multimerization of CNL proteins and for pore formation in the plant cell membrane, the CC-lacking structure of this CNL lineage may suggest a different functional mechanism.
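
The distinction between intact and truncated NLR structures can be expressed as a simple rule on domain composition. The following Python sketch is illustrative only and does not reproduce the annotation pipeline of this study: it assumes each gene is described by a list of domain labels, and the label spellings ("CC", "TIR", "RPW8", "NBS", "LRR"), the function names, and the category wording are assumptions chosen for the example.

```python
from typing import Iterable, List

# Assumed labels for the N-terminal domains of the CNL, TNL, and RNL subclasses.
N_TERMINAL = {"CC", "TIR", "RPW8"}


def classify_structure(domains: Iterable[str]) -> str:
    """Classify one NLR gene by which characteristic domains it retains."""
    names = set(domains)
    if "NBS" not in names:
        raise ValueError("no NBS domain: not treated as an NLR gene here")

    has_n_terminal = bool(names & N_TERMINAL)
    has_lrr = "LRR" in names

    if has_n_terminal and has_lrr:
        return "intact"
    if has_lrr:
        return "N-terminal domain lost"
    if has_n_terminal:
        return "C-terminal domain lost"
    return "both terminal domains lost"


def intact_proportion(genes: List[List[str]]) -> float:
    """Fraction of genes classified as intact (0.0 for an empty list)."""
    if not genes:
        return 0.0
    intact = sum(classify_structure(g) == "intact" for g in genes)
    return intact / len(genes)


if __name__ == "__main__":
    # Toy gene set for illustration only (not data from this study).
    toy = [["CC", "NBS", "LRR"], ["NBS", "LRR"], ["CC", "NBS"], ["NBS"]]
    for gene in toy:
        print(gene, "->", classify_structure(gene))
    print("intact proportion:", intact_proportion(toy))
```

Under this rule, a CNL-lineage gene lacking the CC domain falls into the "N-terminal domain lost" category, which is the case highlighted for the deeply diverged CNL lineage.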

In contrast to the domain loss found in many NLR genes, we also detected fusion of alien domains to a small proportion of Arecaceae NLR genes, forming the NLR-ID structure. This provides another way to expand structural and functional diversity. In recent years, studies have increasingly found that alien domains fused to plant NLR proteins can act as targets for pathogen effectors. Research on the rice blast resistance NLR pairs RGA4/RGA5 and Pik-1/Pik-2 provided the first experimental evidence for this model. Both RGA5 and Pik-1 are fused with an HMA domain, which serves as a decoy that interacts directly with pathogen effectors to stimulate the disease resistance activity of RGA4 and Pik-2 [46,47]. The *Arabidopsis* NLR genes RPS4/RRS1 provide another example supporting the important functions of alien domains. The RRS1 protein, which is fused with a WRKY domain, can interact directly with the pathogen effector AvrRPS4 to stimulate the disease resistance activity of RPS4 [48]. In this study, different NLR alien domains were found in the five species. These domains may act as "baits" for pathogen effectors in plant cells. Among them, the v-SNARE and PKinase domains have been detected in many proteins that play a role in plant disease resistance [32,33], but for others, such as SRF-TF, DUF761, and DUF4283, there is no direct evidence of a relation to plant immunity. The discovery of these alien domains may help in exploring additional potential plant immunity-related proteins.
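
As a rough illustration of how NLR-ID candidates might be flagged, the Python sketch below reports any domain label that is not one of the canonical NLR domains. It is a simplification under stated assumptions and not the detection procedure of this study: the canonical label set, the function name, and the simplified domain lists in the example are hypothetical.

```python
from typing import Iterable, List

# Assumed labels for canonical NLR domains; real annotations may use other names.
CANONICAL = {"CC", "TIR", "RPW8", "NBS", "LRR"}


def integrated_domains(domains: Iterable[str]) -> List[str]:
    """Return non-canonical (alien) domain labels, in order, without repeats."""
    seen = set()
    alien = []
    for name in domains:
        if name not in CANONICAL and name not in seen:
            seen.add(name)
            alien.append(name)
    return alien


if __name__ == "__main__":
    # Simplified, illustrative domain lists (not annotations from this study).
    examples = {
        "NLR with an HMA domain": ["CC", "NBS", "LRR", "HMA"],
        "NLR with a WRKY domain": ["TIR", "NBS", "LRR", "WRKY"],
        "canonical CNL": ["CC", "NBS", "LRR"],
    }
    for label, doms in examples.items():
        print(label, "->", integrated_domains(doms))
```

A gene with a non-empty result would be a candidate NLR-ID; whether the flagged domain actually serves as an effector target requires experimental evidence.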
