#### **1. Introduction**

Food shortage has become a growing challenge as the world's population increases and natural resources decline. Because crops provide the staple food supply for the world, breeding crops with high yield, good quality, and strong stress resistance is essential to ensure food security. To achieve this, it is necessary to obtain a deeper understanding of the genetic background of crops, especially their genome information. Since the genome of the first flowering plant, *Arabidopsis thaliana*, was sequenced in 2000 [1], more and more plant whole genomes have been sequenced and deposited in publicly available databases [2], including the *Nelumbo* genome database (http://nelumbo.biocloud.net/nelumbo/home) [3]. Third-generation sequencing, which produces long sequence reads, has shown advantages over next-generation sequencing (NGS) in generating highly contiguous reference genome assemblies [4].

Genomics is the cornerstone of breeding, and studies based on whole-genome sequencing and genome-wide association studies have greatly advanced genomics-assisted breeding in many crops [5,6]. Cloning and functional analysis of genes associated with important agronomic traits in rice (*Oryza sativa*), soybean (*Glycine max*), and tomato (*Solanum lycopersicum*) have also demonstrated that high-quality genomes are a prerequisite for clarifying variation in each species [7–11]. However, population genetic analyses that rely on a single reference genome lose variant information, especially in highly polymorphic regions. A pan-genome contains the totality of genome sequence information of the target species and therefore covers variant information more comprehensively. Pan-genomes have been constructed in many plants, such as rice, maize, brassica, and soybean, and applied to identify causal genes [12–15]. Pan-genomes and graph pan-genomes are becoming the new references as sequencing technology is upgraded. The information on genome maps, domestication, improvement-related genes, and regulatory pathways promotes the understanding of plant evolution and accelerates breeding [15].

Lotus is one of the relict plants retaining the original morphology of its ancestors, as are *Ginkgo biloba*, *Liriodendron*, and *Metasequoia glyptostroboides*. It belongs to the *Nelumbo* genus of the Nelumbonaceae family, which includes two species, namely Asian lotus (*Nelumbo nucifera* Gaertn.) and American lotus (*Nelumbo lutea* Pear.). The two species are named for their different geographical distributions. Asian lotus is mainly distributed in Asia and the north of Oceania, while American lotus is distributed in North America and South America. The plant morphology differs between them. Asian lotus is a tall plant with oval leaves and seeds and red or white flowers, whereas American lotus is a short plant with nearly round, dark green leaves, spherical seeds, and yellow flowers [16]. There is no strict reproductive isolation between them, and their life cycles are similar, at about five months. Asian lotus is commonly called lotus and has more than 3000 years of cultivation history as a horticultural crop [17]. Lotus seeds and rhizomes have rich nutritional value and unique health-care functions. Lotus seeds contain starch, proteins, amino acids, polysaccharides, polyphenols, alkaloids, and mineral elements. Lotus rhizome has a high vitamin C content. During the long period of domestication and artificial selection, about 4500 lotus cultivars have been obtained to date [18]. These cultivars have been planted to produce edible vegetables, snacks, beverages, restorative materials, and ornamental flowers, which impact human life and economic development. The lotus industry is also important for rural revitalization in the Yangtze River, Pearl River, and Huang Huai river basins. Cultivated lotus is generally divided into rhizome lotus, seed lotus, and flower lotus according to usage. The notable feature of rhizome lotus is an enlarged rhizome accompanied by few flowers. It can be divided into powdery and crisp types according to the taste of the rhizome. Different varieties have been bred to suit regional taste preferences or for further usage. 
The main breeding goal of rhizome lotus is to improve the yield and quality of the rhizome. Seed lotus is grown mainly for lotus seed production, with high yield, good quality, and disease resistance being the breeding goals. Flower lotus is preferred for ornamental use and has distinct flower colors and shapes. During long cultivation, ornamental lotuses with different flower morphologies have been obtained, including few-petaled, double-petaled, petaloidy, and thousand-petaled flowers. Red, pink, yellow, and white are the main flower colors. Currently, breeding objectives mainly target flower shape and color, yield and quality of lotus seed and rhizome, and wide adaptability.

As a basal eudicot species, lotus plays an essential role in studying plant evolution and phylogeny. It is adapted to the aquatic environment, while its relatives are shrubs or trees living on land. Water lily, which lies at the base of the angiosperm phylogeny, has similar living conditions and flowers, but the genomes of the two plants are vastly different [19]. Lotus has unique features such as a water-repellent self-cleaning function, multi-seed production, and flower thermogenesis, which may relate to flower protogyny or provide a warm environment for pollination [16]. Because of its importance in plant phylogeny and its wide application, lotus has gained increasing attention from the scientific community. Since the release of the first versions of two lotus reference genomes [20,21], genome-based investigations have been conducted continuously. Subsequently, a high-resolution genetic map and a BioNano optical map were applied to improve the accuracy and assembly of the lotus genome [22]. A hybrid assembly was completed using PacBio sequencing data and previously published short reads [23]. High-quality genome assemblies of "Taikonglian NO. 3" and American lotus were also recently generated [24,25]. High-throughput re-sequencing of different lotus cultivars has been used to identify numerous molecular markers, promoting marker-assisted selection. Moreover, "omics" approaches such as transcriptomics, proteomics, and metabolomics have been applied to elucidate the molecular regulatory networks of yield, quality, and stress response in lotus. Here, we briefly review the latest progress of studies on the lotus genome and how genome information could be used in lotus breeding. The existing challenges and potential prospects are also discussed.

#### **2. Sequencing, Assembly and Annotation of Lotus Genome**

Lotus occupies a crucial phylogenetic position in flowering plants. A high-quality reference genome of lotus plays a vital role in studying the origin of eudicots and in lotus molecular breeding. In the last decade, several lotus varieties were sequenced on different platforms, which resulted in different versions of the genome assembly and annotation (Table 1). Based on NGS, a wild lotus, "China antique (CA)", was successfully sequenced and assembled [20]. The total sequenced genome length of "CA" is 804 Mb, of which 543.4 Mb (67.6%) were anchored to nine megascaffolds. The contig N50 was 38.8 Kb and the scaffold N50 was 3.4 Mb. The heterozygosity of the "CA" genome is 0.03%, and repetitive sequences account for about 57%. A total of 26,685 protein-coding genes were predicted, with an average gene length of 6561 bp. Simultaneously, another wild strain of lotus, "Chinese Taizi", was assembled using NGS technology. The final assembled genome size is 792 Mb, with a contig N50 of 39.3 Kb and a scaffold N50 of 986.5 Kb [21]. The total length of transposable elements is 392 Mb (49.48%), and 36,385 protein-coding genes were annotated. Both studies predicted that lotus underwent one whole-genome duplication (WGD) event, designated λ, instead of the paleo-hexaploid (γ WGD) event that occurred in core eudicots [20,21]. These two genomes were further anchored to eight pseudo-chromosomes by constructing a higher-resolution genetic map and physical maps [22].
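The contig and scaffold N50 values quoted above are contiguity statistics: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following is a minimal Python sketch of that calculation, not the procedure used in the cited studies. The function name `n50` and the toy contig lengths are hypothetical and chosen only for illustration; the last lines simply re-derive the anchored fraction of the "CA" assembly from the figures given in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the summed length; the length of
    the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not real lotus data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print("toy N50:", n50(toy_contigs))

# Anchored fraction of the "CA" assembly, using the figures quoted in the text.
anchored_mb, total_mb = 543.4, 804.0
anchored_pct = round(anchored_mb / total_mb * 100, 1)
print("CA anchored fraction (%):", anchored_pct)
```

Because N50 depends only on the length distribution, it measures contiguity rather than correctness, so it is best read alongside the other assembly statistics reported for each genome.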

**Table 1.** Comparison of assembled lotus genomes.


With the advent of new sequencing platforms, the genome of "CA" was re-assembled using 11.9 Gb of long-read data from PacBio Sequel and 94.2 Gb of previously sequenced short-read data [23]. The new assembly of "CA" is 807.6 Mb with a contig N50 of 484.3 Kb, a marked improvement in contiguity over the first version. The proportion of repetitive sequence (58.5%) was similar to that of the first version. Moreover, a cultivated lotus, "Taikonglian NO. 3", was assembled using the Oxford Nanopore sequencing platform (57.9 Gb raw data) with a contig N50 of 5.1 Mb, and eight chromosomes were anchored based on high-throughput chromatin conformation capture (Hi-C) data [24]. The other lotus species, American lotus, was recently assembled using PacBio RSII (74.6 Gb raw data) and Hi-C (50.32 Gb raw data); the total length is 843 Mb and the contig N50 is 1.34 Mb [25]. These data demonstrate that long-read sequencing technology has greatly improved the quality of the lotus genome. The successful assembly of the genomes of Asian lotus, including wild and cultivated varieties, and of American lotus will assist the investigation of functional genomics as well as molecular breeding in lotus.
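To make the gain in contiguity concrete, the contig N50 values quoted in this section can be expressed relative to the first NGS-based "CA" assembly. The short Python sketch below does only that arithmetic. The dictionary labels are illustrative, and the conversion of 1 Mb to 1000 Kb is an assumption about the units used in the cited reports.

```python
KB_PER_MB = 1000  # assumption: decimal units (1 Mb = 1000 Kb)

# Contig N50 values in Kb, taken from the figures quoted in the text.
contig_n50_kb = {
    "CA (NGS) [20]": 38.8,
    "CA (PacBio Sequel + short reads) [23]": 484.3,
    "Taikonglian NO. 3 (Oxford Nanopore) [24]": 5.1 * KB_PER_MB,
    "American lotus (PacBio RSII) [25]": 1.34 * KB_PER_MB,
}

# Fold change of each assembly relative to the first "CA" assembly.
baseline = contig_n50_kb["CA (NGS) [20]"]
fold_change = {name: value / baseline for name, value in contig_n50_kb.items()}

for name, fold in fold_change.items():
    print(f"{name}: contig N50 = {contig_n50_kb[name]:.1f} Kb, {fold:.1f}x baseline")
```

On these figures, the hybrid re-assembly of "CA" is roughly an order of magnitude more contiguous than the NGS-only version, and the Nanopore-based assembly of "Taikonglian NO. 3" roughly two orders of magnitude. The assemblies differ in accession, coverage, and assembly software, so the ratios are indicative only.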
