#### **3. Study on the Potential Adaptive Evolution and Domestication of Lotus**

The availability of a lotus reference genome has facilitated the resequencing of diverse lotus germplasm, and several studies have examined how the lotus genome has been shaped by adaptive evolution and artificial selection. Lotus comprises only two species, Asian lotus and American lotus; apart from flower color, their plant architecture and morphology are very similar. Nevertheless, molecular phylogenetic analyses have verified significant genetic differentiation between American and Asian lotus [25–28]. De novo deep sequencing of the American lotus showed a genome size of 843 Mb, larger than that of Asian lotus, with approximately 81% repeat sequence (Table 1). The dramatic difference in repeat content between the two species merits further investigation, because most protein-coding genes show a one-to-one synteny pattern. A total of 29,533 structural variations (SVs) were detected between the two lotus species, with SV-associated genes over-represented in 'regulation of mitotic cell cycle' and 'protein transporter activity' [25]. The same study also suggested that selection on an *MYB* gene might contribute to the color difference between Asian and American lotus [25]. When the two species diverged, and how they have remained so similar while evolving independently in separate geographical regions, are still open questions. Wild lotus is widely distributed worldwide and maintains higher genomic diversity than cultivated lotus. Tropical and temperate lotus are the two ecotypes of Asian lotus. A comparison of the genomes of these two ecotypes showed that a total of 453 genes were subjected to selection, including *cyp714a* genes that may relate to rhizome morphogenesis. It also identified a 10-Mb region on chromosome 1 that might play key roles in environmental adaptation and that includes a homolog of the *Arabidopsis* gene *at5g2394*, which encodes an acyltransferase protein [24]. 
A comparison of expression patterns between the two ecotypes suggested that genes encoding granule-bound starch synthases, genes involved in storage organ development and vernalization, members of the *CONSTANS-like* gene family, and cold-response genes may relate to ecotypic differentiation [26].

Knowledge of the genetic background of parental lines is very important in breeding. The origin, classification, and evolution of cultivated lotus have been investigated through population re-sequencing. A total of 18 lotus accessions, covering American, seed, rhizome, flower, wild, and Thai lotus, were re-sequenced and used to construct a phylogenetic tree. The results indicated that rhizome lotus is more closely related to wild lotus, whereas seed and flower lotus are admixed [26]; this was supported by re-sequencing of an enlarged population of 296 accessions of different germplasm (58 wild, 163 rhizome, 39 flower, 32 seed lotus varieties) [28]. Further re-sequencing of 69 lotus accessions showed that flower lotus might be admixed with rhizome or seed lotus [27]. All of these studies found low genetic variation in rhizome lotus and higher genetic diversity in seed lotus. The origin of the different subgroups remains controversial, possibly because the same accession can carry different names and has therefore been assigned to different subgroups by different researchers. Based on this genomic diversity, potential domestication signals in the cultivar subgroups can be inferred, because selected genomic regions have lower nucleotide diversity. When the seed, rhizome, and flower lotus subgroups were each compared with the wild lotus subgroup, a total of 1214, 95, and 37 artificially selected regions containing 2176, 77, and 24 genes were identified in seed, rhizome, and flower lotus, respectively [27]. Several of these selected genes are involved in key developmental processes associated with different organs. For example, a *SUPERMAN-like* gene affecting seed weight and size and a *legumin A-like* gene involved in storage protein synthesis were identified in the seed lotus subgroup, while an *expansin-A13-like* gene was identified in the rhizome lotus subgroup [27]. 
These specifically selected genes, which control agronomic traits in the different subgroups, are also possible targets for lotus breeding. Meanwhile, different types of molecular markers have been developed, which may further help to clarify the relationships between subgroups and support marker-assisted breeding of new lotus varieties [29–38].
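The reasoning used above, that artificially selected regions show reduced nucleotide diversity in a cultivated subgroup relative to wild lotus, can be illustrated with a small calculation. The sketch below is not the pipeline used in the cited studies; it assumes biallelic SNPs with known alternate-allele counts, and the function names, the toy allele counts, and the ratio threshold are all hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def site_pi(alt_count: int, n: int) -> float:
    """Nucleotide diversity at one biallelic SNP: the fraction of
    chromosome pairs (out of n sampled) that differ at this site."""
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    if not 0 <= alt_count <= n:
        raise ValueError("alt_count must lie between 0 and n")
    ref_count = n - alt_count
    return 2.0 * alt_count * ref_count / (n * (n - 1))


def window_pi(alt_counts: Sequence[int], n: int, window_len: int) -> float:
    """Per-base nucleotide diversity of a window: the sum of per-site
    values divided by the window length in base pairs."""
    return sum(site_pi(c, n) for c in alt_counts) / window_len


def reduced_diversity_windows(
    wild: Sequence[Sequence[int]],
    cultivar: Sequence[Sequence[int]],
    n_wild: int,
    n_cultivar: int,
    window_len: int,
    min_ratio: float,
) -> List[int]:
    """Return indices of windows where pi(wild) / pi(cultivar) >= min_ratio.
    min_ratio is a hypothetical cutoff, not a value from the cited work."""
    flagged = []
    for i, (w, c) in enumerate(zip(wild, cultivar)):
        pi_w = window_pi(w, n_wild, window_len)
        pi_c = window_pi(c, n_cultivar, window_len)
        if pi_c == 0.0:
            # No diversity left in the cultivar: flag only if wild varies.
            if pi_w > 0.0:
                flagged.append(i)
        elif pi_w / pi_c >= min_ratio:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Toy data: alternate-allele counts per SNP, two windows of 1 kb,
    # 10 sampled chromosomes per group (all values invented).
    wild_counts = [[5, 4, 3], [5, 5, 2]]
    cultivar_counts = [[5, 4, 2], [1, 0, 0]]
    print(reduced_diversity_windows(
        wild_counts, cultivar_counts, 10, 10, 1000, 3.0))
```

In the toy data only the second window loses most of its diversity in the cultivated group, so it is the only one flagged. Real analyses typically combine such a diversity ratio with other statistics and an empirical cutoff taken from the genome-wide distribution.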

#### **4. Identification of Genes with Potential Application in Lotus Breeding**

As the largest aquatic vegetable crop in China, lotus is bred mainly through traditional cross-breeding, supplemented by physical and chemical mutagenesis, which together have yielded thousands of varieties [39]. However, the selection of high-quality varieties has relied mainly on breeders' experience, because the mechanisms underlying each economic trait remained unclear. With the development of lotus genomics and molecular genetics, genome-based breeding is gradually becoming an effective approach for lotus. Causal genes regulating essential traits, such as flower color and shape, rhizome yield, and seed quality, have been widely studied.

Flower color, shape, and flowering time are important traits that determine the ornamental value of lotus. Lotus flowers occur in three colors: red and white in Asian lotus and yellow in American lotus. The red color in Asian lotus is determined by anthocyanin content [40,41], which is controlled by genes encoding key enzymes of the anthocyanin biosynthetic pathway and by the transcription factors (TFs) that regulate them, such as *MYB*, *basic-Helix-Loop-Helix* (*bHLH*), and *WD40*. Among the enzyme-encoding genes in this pathway, *NnANS* and *NnUFGT* seem to be the two decisive genes [42,43]. A transcriptome analysis suggested that several TFs, including five *MYB*, two *bHLH*, and one *WD-repeat* gene, may be involved in the regulation of anthocyanin biosynthesis in lotus [43]. Among them, the *bHLH* gene *NnTT8* was verified to regulate anthocyanin biosynthesis [44]. In contrast, the yellow color of American lotus is determined by carotenoids, and no anthocyanin was detected [25,45]. Further analysis indicated that the difference in the coding region between *NnMYB5* (GenBank accession KU198697) and *NlMYB5* (GenBank accession KU198698) is the main reason for the different colors of the two species. Flower morphology is another factor that determines the ornamental value of lotus. Flower development is controlled by intricate gene-regulatory networks, and many vital genes that control flowering time have been identified in flowering plant species. However, the molecular regulatory mechanism has not been well characterized in lotus. Comparative transcriptomic analysis of different bud development stages in temperate and tropical lotus identified 147 flowering-time-associated genes that participate in the photoperiod, gibberellic acid, and vernalization pathways [46]. The *MADS-box* TFs are widely involved in plant growth and development. A total of 44 *MADS-box* genes were identified in lotus, and among the selected candidates, *NnMADS14* (a *SEPALLATA3* (*SEP3*) homolog) was found to be related to floral organogenesis in lotus [47]. Lotus possesses distinct types of flower morphology, and petaloidy of floral organs is common. 
Comparative transcriptomic analysis identified many hormonal signal transduction pathway genes and *MADS-box* genes, and *AGAMOUS* (*AG*) was predicted as the candidate gene related to carpel petaloidy [48,49]. Genome-wide DNA methylation analysis showed that different floral organs exhibit different methylation levels, and that a *plant U-box* (*PUB33*) homolog might play a crucial role in stamen petaloidy [50]. Furthermore, *NnFTIP1* was shown to interact with *NnFT1* and regulate flowering time in lotus [51].

The rhizome is the main edible part of lotus, so understanding the mechanisms underlying rhizome formation and expansion is important for rhizome lotus breeding. Comparative transcriptomic and proteomic analyses of rhizome development have been conducted to identify the key genes and pathways critical for this physiological process [52–54]. Furthermore, re-sequencing of natural populations and a genetic F<sub>2</sub> population has identified several genomic regions and candidate genes that might be involved in lotus rhizome enlargement [55]. One candidate gene, *CONSTANS-LIKE 5* (*COL5*), was analyzed systematically. Functional analysis in the potato system indicated that *NnCOL5* might be positively associated with rhizome enlargement by regulating the expression of *CO-FT* genes and the GA signaling pathway [56]. In addition, a SNP identified in another candidate gene, *NnADAP* of the *AP2* subfamily, is closely associated with the rhizome enlargement phenotype and soluble sugar content [57]. Temperate and tropical lotus differ markedly, especially in rhizome morphology. Many genes are highly differentiated between them, such as *APL* homologs and granule-bound starch synthase genes [26]. Temperate lotus is distributed at 20° north latitude and shows a distinct annual growth cycle, whereas tropical lotus is distributed south of 17° north latitude and exhibits perennial growth. Asian wild lotus can be further divided into temperate, subtropical, and tropical types, which are distributed in northeast China, the Yangtze River and Pearl River Basin, and Thailand and India [27]. Different lotus groups are subject to different selection pressures, such as light, temperature, UV, and soil type. Genes under selection have been examined by integrating population genetics and omics data. 
Several genes related to photosynthesis and DNA repair were selected, such as those encoding NAD<sup>+</sup> ADP-ribosyltransferase, 8-oxoguanine-DNA glycosylase 1, and DNA polymerase epsilon subunit B2. A *vacuolar iron transporter* (*VIT*) family gene, *nodulin-like 21*, which encodes a vacuolar iron transporter, may be related to metal ion metabolism [24]. The lotus homolog of *Arabidopsis VIN3* was predicted to be related to flowering time and dormancy, with higher expression in temperate lotus than in tropical lotus [26].

Lotus seeds are rich in nutrients and functional compounds such as alkaloids, flavonoids, and polyphenols [58,59], and they are consumed "as both food and medicine" [60]. Increasing the yield and nutritional quality of lotus seed is therefore essential. The main factors determining lotus seed yield are seed size and the number of seeds per seedpod. Transcriptome analysis of the cotyledons of "CA" and "Jianxuan-17 (JX-17)" seeds at different developmental stages identified 8437 differentially expressed genes (DEGs). Many DEGs are involved in the brassinosteroid biosynthesis pathway, and further analysis predicted two *AGPase* genes as candidates affecting lotus seed yield [61]. Phytohormones therefore appear to be involved in lotus seed development. A combination of metabolomic and proteomic methods revealed that 15 days after pollination (DAP) is the switch point from the physiologically active stage to the nutrient accumulation stage [62]. Starch is the primary nutritional component of mature lotus seed [63]. Its content and the proportion of amylose to amylopectin largely determine the nutritional value and the taste of lotus cotyledon, respectively. ADP-glucose pyrophosphorylase (AGPase) plays an important role in regulating starch biosynthesis. The *AGPase* genes evolved under purifying selection, and *NnAGPL2a* and *NnAGPS1a* were the candidate genes related to starch content [64]. Starch branching enzyme (SBE) genes are key regulatory genes in starch synthesis, and *NnSBEI* and *NnSBEIII* were identified as related to the chain length of amylopectin in lotus [65]. In addition, comparative metabolomics between the wild germplasm "CA" and the domesticated cultivar "JX-17" indicated a tradeoff between seed yield and metabolite content [66]. To improve the nutritional and medicinal value of lotus seed, a metabolomics-assisted strategy might be applied in lotus breeding in the future [67]. Seed dormancy is one of the domestication traits. 
The classical stay-green *G* gene, which controls seed dormancy, has been cloned as a domestication and improvement gene in soybean, rice, and tomato. The *G* gene interacts with *NCEDS* and *SPY* and in turn regulates abscisic acid (ABA) synthesis [68]. *NnDREB1* and *NnPER1* were identified in lotus and may be involved in the ABA signal transduction pathway, thereby modulating seed longevity and dormancy [69].

In addition to the breeding objectives above, there are other, more diverse objectives, such as submergence tolerance and high antioxidant content. Lotus has evolved novel features to adapt to an aquatic lifestyle. Many putative copper-dependent proteins, especially those of the *COG2132* gene family, have expanded in lotus and form a separate phylogenetic clade with functions distinct from those in *Arabidopsis* [20]. Although lotus grows in water, research has shown that it is actually "afraid" of water. A time-course submergence experiment combined with RNA-seq analysis showed that lotus has a low tolerance to complete submergence and adopts two major strategies to cope with submergence stress at different stages. In the early stage (3–6 h), it initiates a low-oxygen "escape" strategy (LOES), with rapid accumulation of ethylene, rapid elongation of petioles, a significant increase in aerenchyma density, and up-regulation of *ERF-VII* genes, while innate immunity genes are also elevated. In the later stage (24–120 h), it enters a "breath-holding" mode that limits anaerobic respiration to the lowest level [70]. Flooding is a serious abiotic stress affecting plant growth and can be classified into waterlogging and submergence. During the rainy season, lotus is vulnerable to submergence, so breeding flood-tolerant lotus varieties is necessary to protect its economic value.

*WRKY* TFs play key roles in modulating plant responses to biotic and abiotic stress and in regulating secondary metabolism. A total of 65 *WRKY* genes were identified in lotus and found to be regulated by salicylic acid (SA) and jasmonic acid (JA); of these, *NnWRKY40a* and *NnWRKY40b* were significantly induced by JA and promoted benzylisoquinoline alkaloid (BIA) biosynthesis [71]. Lotus predominantly accumulates BIAs, and the leaf and embryo have different alkaloid components, which may be caused by two clusters of *CYP80* genes that synthesize bis-BIAs and aporphine-type BIAs, respectively. Five TFs (three *MYBs*, one ethylene-response factor, and one *bHLH*) were identified as regulators of the BIA biosynthetic pathway in lotus [72].

#### **5. Conclusions and Perspective**

New lotus varieties with high yield, wide adaptability, and stress resistance play a vital role in improving the economic value of this important horticultural crop. Over the past decades, variant identification, functional gene cloning, and metabolite variation among diverse germplasm resources have been investigated, driven by progressively improved genome information that can facilitate breeding practice in lotus (Figure 1). However, the lack of a high-quality reference genome remains a limiting factor for molecular breeding. Improving the lotus reference genome will be a prerequisite in the future, because it directly affects the accuracy of molecular markers and the efficiency of cloning functional genes. Gapless reference genomes and pan-genomes have become the new standard, on the basis of which abundant genomic information, such as open chromatin and additional variants, can be explored. With the explosive growth of large-scale omics data, deep learning can be used to mine biological information and decipher gene regulatory networks. Moreover, a sound genetic transformation system has not yet been established in lotus, which still restricts the validation of gene function and genome-based gene editing and further hinders breeding strategies. Few studies on epigenomics, covering histone marks, accessible chromatin regions, and genomic interactions, have been conducted, and more are needed in the future. Building on these investigations, the collection of wild lotus germplasm, the classification of both wild and cultivated germplasm, the analysis of domestication, and the identification of molecular markers and genes closely linked to important agronomic traits should be conducted systematically in the coming years. Combining these efforts and developing multiple breeding targets will improve breeding efficiency in lotus.

*Int. J. Mol. Sci.* **2022**, *23*

**Figure 1.** Flowchart of the molecular breeding process of lotus.

**Author Contributions:** H.Q.: Original draft preparation and writing. F.Y.: review and editing; J.D.: discussion and editing; P.Y.: editing, review, conceptualization, supervision. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Natural Science Foundation of China (NSFC no. 32102422).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We thank all the colleagues who have been involved in the studies on lotus.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**

1. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant *Arabidopsis thaliana*. *Nature* **2000**, *408*, 796–815.

2. Chen, F.; Dong, W.; Zhang, J.; Guo, X.; Chen, J.; Wang, Z.; Lin, Z.; Tang, H.; Zhang, L. The sequenced angiosperm genomes and genome databases. *Front. Plant Sci.* **2018**, *9*, 418.

3. Li, H.; Yang, X.; Zhang, Y.; Gao, Z.; Liang, Y.; Chen, J.; Shi, T. Nelumbo genome database, an integrative resource for gene expression and variants of *Nelumbo nucifera*. *Sci. Data* **2021**, *8*, 38.

4. van Dijk, E.L.; Jaszczyszyn, Y.; Naquin, D.; Thermes, C. The Third Revolution in Sequencing Technology. *Trends Genet.* **2018**, *34*, 666–681.

5. Liang, Y.; Liu, H.J.; Yan, J.; Tian, F. Natural Variation in Crops, Realized Understanding, Continuing Promise. *Annu. Rev. Plant Biol.* **2021**, *72*, 357–385.

6. Huang, X.; Wei, X.; Sang, T.; Zhao, Q.; Feng, Q.; Zhao, Y.; Li, C.; Zhu, C.; Lu, T.; Zhang, Z.; et al. Genome-wide association studies of 14 agronomic traits in rice landraces. *Nat. Genet.* **2010**, *42*, 961–967.

7. Huang, X.; Kurata, N.; Wei, X.; Wang, Z.X.; Wang, A.; Zhao, Q.; Zhao, Y.; Liu, K.; Lu, H.; Li, W.; et al. A map of rice genome variation reveals the origin of cultivated rice. *Nature* **2012**, *490*, 497–501.

8. Liu, S.; Zhang, M.; Feng, F.; Tian, Z. Toward a "Green Revolution" for Soybean. *Mol. Plant* **2020**, *13*, 688–697.


MDPI St. Alban-Anlage 66 4052 Basel Switzerland www.mdpi.com

*International Journal of Molecular Sciences* Editorial Office E-mail: ijms@mdpi.com www.mdpi.com/journal/ijms


mdpi.com ISBN 978-3-0365-9378-4