#### **1. Introduction**

The innate immune system protects plants against foreign pathogens [1]. A core component of the plant immune system is a set of genes, termed plant disease resistance genes (*R* genes), whose products recognize pathogen-derived virulence proteins (called "effectors") and activate downstream defense responses [1]. Upon recognizing pathogen invasion, R proteins trigger a hypersensitive response and a series of immune responses that ultimately kill the infected cells, restraining the proliferation and spread of the pathogen [1]. Nucleotide-binding leucine-rich repeat (NLR) genes form the largest class of *R* genes, accounting for over 60% of the *R* genes functionally characterized to date [2]. A typical NLR protein contains a variable domain at the N-terminus, a highly conserved nucleotide-binding site (NBS) domain in the middle, and a diverse leucine-rich repeat (LRR) domain at the C-terminus [3]. Because the N-terminal domain in angiosperms is usually annotated as a coiled-coil (CC), Toll/interleukin-1 receptor (TIR), or RPW8 domain, angiosperm NLR genes are classified into three subclasses: CC-NBS-LRR (CNL), TIR-NBS-LRR (TNL), and RPW8-NBS-LRR (RNL) [4,5]. Functionally, CNL and TNL proteins act as "sensor NLRs" that recognize specific pathogen effectors to trigger downstream immune responses, while RNL proteins serve as "helper NLRs" that transduce signals downstream of CNL and TNL proteins [6,7].

NLR genes constitute a large gene family in plant genomes, usually comprising hundreds of members [8], and they evolve rapidly in response to fast-evolving pathogens [4]. As more plant genomes have been sequenced, genome-wide evolutionary analyses and comparative genomic studies of NLR genes have been performed in many species and taxa, and different taxa have exhibited distinct evolutionary patterns. For example, frequent gene losses and limited gene duplications resulted in a small number of NLR genes in Cucurbitaceae species [9]. A similar pattern of NLR gene contraction, caused by gene losses or frequent gene deletions, was also reported for Poaceae species [10,11]. In contrast, NLR genes in Fabaceae and Rosaceae species exhibited a "consistent expansion" evolutionary pattern [5,12], while five Brassicaceae species exhibited a "first expansion and then contraction" pattern [13]. Moreover, species belonging to the same family may also show distinct patterns of NLR gene evolution [14,15]. For example, among four orchid species, *Phalaenopsis equestris* and *Dendrobium catenatum* exhibited an "early contraction to recent expansion" evolutionary pattern, while *Gastrodia elata* and *Apostasia shenzhenica* showed a "contraction" evolutionary pattern [14].

**Citation:** Li, X.-T.; Zhou, G.-C.; Feng, X.-Y.; Zeng, Z.; Liu, Y.; Shao, Z.-Q. Frequent Gene Duplication/Loss Shapes Distinct Evolutionary Patterns of NLR Genes in Arecaceae Species. *Horticulturae* **2021**, *7*, 539. https://doi.org/10.3390/horticulturae7120539

Academic Editor: Young-Doo Park

Received: 4 November 2021; Accepted: 30 November 2021; Published: 2 December 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The distinct evolutionary patterns of these angiosperm lineages provide valuable resources for understanding how *R* genes evolve rapidly under threats from different pathogens. However, most of the investigated angiosperm lineages belong to the dicot clade, while only two monocot lineages have been surveyed [10,11,14]. Because the monocot and dicot clades differ in NLR subclass composition [8], investigating more monocot lineages would provide new insights into NLR gene dynamics during monocot evolution.

The Arecaceae comprises 183 genera and 2450 species, which are distributed throughout the tropical and subtropical areas of Africa, the Americas, Asia, Madagascar, and the Pacific, and are widely grown as ornamentals in botanical gardens (Flora of China, www.iplant.cn/foc/, accessed on 2 August 2021). Recently, the genomes of five horticultural plants from the Arecaceae family of the Arecales, namely *Elaeis guineensis* (2n = 32), *Phoenix dactylifera* (2n = 36), *Daemonorops jenkinsiana* (2n = 24), *Cocos nucifera* (2n = 32) and *Calamus simplicifolius* (2n = 26), were sequenced and made available [16–19]. Among them, oil palm (*E. guineensis*) is a source of vegetable oil and has very important economic value [20], date palm (*P. dactylifera*) is the most popular fruit crop in the Middle East and North Africa, and *C. nucifera* is widely distributed and has considerable food and medicinal value [21]. These horticultural plants face infection by various pathogens throughout their lifespan. However, the composition and evolutionary pattern of NLR genes in the Arecaceae family have rarely been investigated [22,23]. Deciphering the evolutionary pattern of NLR genes among the five Arecaceae species would provide an additional example of dynamic NLR gene evolution across speciation events in the monocot lineage. Additionally, the NLR information obtained may serve as a primary resource for disease resistance breeding in Arecaceae species.

#### **2. Materials and Methods**

#### *2.1. Identification and Classification of the NLR Genes*

The whole genomes of *E. guineensis*, *P. dactylifera*, *D. jenkinsiana*, *C. nucifera* and *C. simplicifolius* were used in this study. Genomic sequences and annotation files were obtained from the GigaScience database. NLR genes of the five genomes were retrieved from the ANNA database (https://biobigdata.nju.edu.cn/ANNA/, accessed on 10 August 2021). All the identified NLR genes were searched against NCBI's Conserved Domain Database (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, accessed on 30 August 2021) using the default settings to determine whether they encoded CC, RPW8, LRR and other integrated domains (E-value: 10⁻⁴). The domains that are commonly encoded by NLR genes, such as NBS, LRR, TIR, RPW8, CC, AAA+ and DUF1863, were removed from the integrated domain list.
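The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' script: the `(gene_id, domain_name, e_value)` tuple layout and the function name `integrated_domains` are hypothetical, domain names are assumed to have already been normalised to the labels used in the text, and hits are assumed to be kept when their E-value is at or below 10⁻⁴.

```python
# Domains the text lists as commonly encoded by NLR genes; these are
# excluded from the integrated-domain list.
CANONICAL_NLR_DOMAINS = {"NBS", "LRR", "TIR", "RPW8", "CC", "AAA+", "DUF1863"}
E_VALUE_CUTOFF = 1e-4  # threshold stated in the text


def integrated_domains(hits, cutoff=E_VALUE_CUTOFF):
    """Return {gene_id: sorted list of non-canonical domains passing the cutoff}.

    `hits` is an iterable of (gene_id, domain_name, e_value) tuples.
    This layout is an assumption for illustration, not the CD-Search output format.
    """
    result = {}
    for gene_id, domain, e_value in hits:
        if e_value > cutoff:
            continue  # hit is not significant enough
        if domain.upper() in CANONICAL_NLR_DOMAINS:
            continue  # canonical NLR domain, not an integrated domain
        result.setdefault(gene_id, set()).add(domain)
    return {gene: sorted(domains) for gene, domains in result.items()}


# Made-up hits, for illustration only.
example_hits = [
    ("g1", "NBS", 1e-30),
    ("g1", "WRKY", 1e-12),
    ("g1", "zf-BED", 0.5),    # fails the E-value cutoff
    ("g2", "LRR", 1e-8),      # canonical domain only
    ("g3", "Pkinase", 1e-6),
]
print(integrated_domains(example_hits))
```

Genes that carry only canonical domains (such as `g2` here) do not appear in the result, which keeps the output limited to genes with at least one candidate integrated domain.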

#### *2.2. Cluster Arrangement of the Identified NLR Genes*

Gene clustering was determined according to the criterion used for *Medicago truncatula* [24]: if two neighboring NLR genes were located within 250 kb on a chromosome, these two genes were regarded as members of the same gene cluster. Based on this criterion, the NLR genes in the five Arecaceae genomes were assigned to clustered loci and singleton loci.
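The 250 kb criterion amounts to a single pass over the genes of each chromosome in positional order. The sketch below is one possible reading of it, under assumptions the text does not spell out: the distance is measured from the end of one gene to the start of the next, a gap of exactly 250 kb still counts as "within", and the input tuples `(gene_id, chromosome, start, end)` and the function name `assign_loci` are hypothetical.

```python
from collections import defaultdict

MAX_GAP_BP = 250_000  # 250 kb criterion from the text


def assign_loci(genes, max_gap=MAX_GAP_BP):
    """Group NLR genes into loci; each locus is a list of gene IDs.

    `genes` is an iterable of (gene_id, chromosome, start, end) tuples
    (a hypothetical layout). Loci with two or more genes are clustered
    loci; loci with one gene are singleton loci.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    loci = []
    for chrom in sorted(by_chrom):
        current, prev_end = [], None
        for start, end, gene_id in sorted(by_chrom[chrom]):
            if current and start - prev_end <= max_gap:
                # Neighbouring gene is close enough: extend the current locus.
                current.append(gene_id)
                prev_end = max(prev_end, end)
            else:
                # Gap too large (or first gene on the chromosome): start a new locus.
                if current:
                    loci.append(current)
                current, prev_end = [gene_id], end
        if current:
            loci.append(current)
    return loci


# Made-up coordinates, for illustration only.
example_genes = [
    ("a", "chr1", 1_000, 5_000),
    ("b", "chr1", 200_000, 204_000),  # 195 kb after "a": same cluster
    ("c", "chr1", 600_000, 603_000),  # 396 kb after "b": new locus
    ("d", "chr2", 10, 2_000),
]
loci = assign_loci(example_genes)
clustered = [locus for locus in loci if len(locus) >= 2]
singletons = [locus for locus in loci if len(locus) == 1]
print(clustered, singletons)
```

Because the criterion is applied to neighbouring pairs, a cluster can span far more than 250 kb in total as long as each consecutive gap stays within the limit.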

#### *2.3. Sequence Alignment and Phylogenetic Analysis of NLR Genes*

The amino acid sequences of the NBS domain were extracted from the identified NLR genes and aligned using ClustalW, as integrated in MEGA 7.0, with default settings [25]. Sequences that were too short (<190 amino acids, less than two-thirds of a regular NBS domain) or too divergent were removed to prevent interference with the alignment and the subsequent phylogenetic analysis. The resulting alignment was manually corrected and improved using MEGA 7.0. The phylogenetic tree was constructed using IQ-TREE (version 1.6.12) with the maximum likelihood method, following the selection of the best-fit model by ModelFinder [26,27]. Branch support values were assessed using the SH-aLRT and UFBoot2 tests with 1000 replicates [28].
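As a rough sketch of the length filter and tree-building step, the code below drops NBS sequences shorter than 190 amino acids and assembles an IQ-TREE command line. Several parts are assumptions rather than the authors' procedure: the helper names are hypothetical, the "too divergent" criterion is not defined in the text and is therefore not implemented, and the exact invocation the authors ran is not given. The flags shown (`-s`, `-m MFP`, `-alrt`, `-bb`) are standard options in the IQ-TREE 1.6 series for the alignment file, ModelFinder-based model selection, SH-aLRT replicates and ultrafast bootstrap replicates, respectively.

```python
MIN_NBS_LENGTH = 190  # sequences shorter than this are removed, per the text


def read_fasta(text):
    """Parse FASTA-formatted text into {sequence_id: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def filter_short(records, min_length=MIN_NBS_LENGTH):
    """Keep sequences with at least `min_length` residues (gap characters ignored)."""
    return {
        name: seq
        for name, seq in records.items()
        if len(seq.replace("-", "")) >= min_length
    }


def iqtree_command(alignment_path, replicates=1000):
    """Build (but do not run) an IQ-TREE 1.6-style command as an argument list."""
    return [
        "iqtree",
        "-s", alignment_path,       # input alignment
        "-m", "MFP",                # ModelFinder, then tree inference
        "-alrt", str(replicates),   # SH-aLRT test
        "-bb", str(replicates),     # ultrafast bootstrap
    ]


# Made-up sequences, for illustration only.
example_fasta = ">x demo\n" + "A" * 100 + "\n" + "A" * 95 + "\n>y\n" + "M" * 189 + "\n"
kept = filter_short(read_fasta(example_fasta))
print(sorted(kept))
print(" ".join(iqtree_command("nbs_alignment.fasta")))
```

The command is returned as a list so that it can be passed to `subprocess.run` without shell quoting issues; the file name `nbs_alignment.fasta` is a placeholder.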
