Article

A High-Quality Phased Genome Assembly of Stinging Nettle (Urtica dioica ssp. dioica)

1 Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC V6T 1Z4, Canada
2 Biology Department, University of British Columbia, 1177 Research Rd, Kelowna, BC V1V 2W9, Canada
3 Institute of Plant and Environmental Sciences, Faculty of Agrobiology and Food Resources, Slovak University of Agriculture in Nitra, 949 76 Nitra, Slovakia
4 Botany Department, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
5 Biodiversity Research Centre, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
6 Luxembourg Institute of Science and Technology (LIST), 5, Rue Bommel, L-4940 Hautcharage, Luxembourg
7 Beaty Biodiversity Museum, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
* Author to whom correspondence should be addressed.
Plants 2025, 14(1), 124; https://doi.org/10.3390/plants14010124
Submission received: 21 November 2024 / Revised: 24 December 2024 / Accepted: 2 January 2025 / Published: 3 January 2025
(This article belongs to the Special Issue Nettle: From Weed to Green Enterprise)

Abstract

Stinging nettles (Urtica dioica) have a long history of association with human civilization, having been used as a source of textile fibers, food and medicine. Here, we present a chromosome-level, phased genome assembly for a diploid female clone of Urtica dioica from Romania. Using a combination of PacBio HiFi, Oxford Nanopore, and Illumina sequencing, as well as Hi-C long-range interaction data (using a novel Hi-C protocol presented here), we assembled two haplotypes of 574.9 Mbp (contig N50 = 10.9 Mbp, scaffold N50 = 44.0 Mbp) and 521.2 Mbp (contig N50 = 13.5 Mbp, scaffold N50 = 48.0 Mbp), with assembly BUSCO scores of 92.6% and 92.2%. We annotated 20,333 and 20,140 genes for each haplotype, covering over 90% of the complete BUSCO genes and including two copies of a gene putatively encoding the neurotoxic peptide urthionin, which could contribute to nettle’s characteristic sting. Despite its relatively small size, the nettle genome displays very high levels of repetitiveness, with transposable elements comprising more than 60% of the genome, as well as considerable structural variation. This genome assembly represents an important resource for the nettle community and will enable the investigation of the genetic basis of the many interesting characteristics of this species.

1. Introduction

Urtica dioica L. (U. dioica ssp. dioica; stinging nettle, or common nettle) is an herbaceous perennial in the Urticaceae family, native to Eurasia and northwest Africa [1]. U. dioica is widely distributed in temperate and tropical climates and can grow on a variety of soil types, although it prefers moist habitats rich in nitrogen and phosphorus [2]. It is often found on disturbed soils, where it can grow in dense stands, which has earned U. dioica a reputation as a weed. However, since at least the Bronze Age and continuing to the present day, U. dioica and its close relatives have also been used by humans as sources of fiber, food, and medicine [3,4,5]. Similar to cannabis (Cannabis sativa L.), flax (Linum usitatissimum L.), and ramie (Boehmeria nivea (L.) Gaudich.), the stems of U. dioica produce long (43–58 mm) bast fibers that are rich in crystalline cellulose (>70%) and have a relatively low lignin content (2–7%) [6]. These fibers have historically been used in papermaking, cordage, and textiles, and there is renewed interest in using nettle in both clothing and composite materials. Both the shoots and roots of stinging nettle are consumed in some cultures, and dried nettle powder contains 30% protein (w/w) [7]. The lipids produced in the leaves also contain a very high proportion of polyunsaturated fatty acids (41% alpha-linolenic acid, 18:3n-3; 12% linoleic acid, 18:2n-6), which are beneficial to human nutrition [8]. Teas and other extracts made from stinging nettle have been reported to have anti-proliferative, antibacterial, anti-inflammatory, hypoglycemic, and other pharmacological activities [9].
The common name of stinging nettle comes from the irritation that vertebrates experience when their skin contacts certain trichomes of U. dioica. Although this effect was previously attributed to histamine, organic acids, and neurotransmitters [10], it has recently been demonstrated that small peptides, namely urthionin (Δ-Uf1a) and urticatoxin (β/δ-Uf2a), a modulator of voltage-gated sodium channels, contribute to the painful sensation [11]. Small neurotoxic peptides in Dendrocnide species similarly act on voltage-gated sodium channels [12].
U. dioica is either diploid (2n = 24, 26) or tetraploid (2n = 48, 52), with tetraploids being much more abundant [2,13]. Occasionally, triploid and pentaploid individuals are found within populations [2]. Genome size estimates for diploid U. dioica also vary among studies, ranging from 558 Mbp to 660 Mbp [2,13]. U. dioica is distinct from slender nettle, Urtica gracilis Aiton (also known as U. dioica ssp. gracilis (Aiton) Selander), which is native to North America. While U. dioica is generally dioecious, meaning that female and male flowers are on separate individuals, U. gracilis is monoecious, with female and male flowers on the same plant [14]. Complex variation of morphological traits in the genus Urtica sometimes makes the classification of species and subspecies difficult, although attempts have been made to resolve these phylogenetic relationships using molecular data [15]. More recently, reference genome assemblies have been produced for a few species in the Urticaceae family; these include U. urens L. [16] and Parietaria judaica L. [17], as well as an assembly produced by the Darwin Tree of Life project [18] for a U. dioica L. individual collected in the UK (Table S1). While these assemblies provide very valuable resources for the research community, they are not fully haplotype-resolved: they report only one chromosome-level primary haplotype, and contigs belonging to the other haplotype(s) are collected in a shorter secondary haplotype assembly. A high-quality, haplotype-resolved reference genome of a well-characterized U. dioica individual would allow for accurate comparison of sequence and structural diversity between haplotypes, further facilitating the study of phenotypic, genetic, and karyotypic diversity within U. dioica and helping improve our understanding of taxonomic relationships within the Urticaceae.
In this study, we develop a diploid, phased chromosome-level genome assembly for Urtica dioica ssp. dioica using a combination of sequencing approaches, including PacBio HiFi and Oxford Nanopore (ONT) long reads, whole genome shotgun (WGS) Illumina short reads, and chromatin conformation capture (Hi-C). We further explore characteristics of the stinging nettle genome, such as the repeat landscape across chromosomes, the presence of large structural variants (SVs) between haplotypes, and how it compares to related taxa. Finally, we comment on the presence of putative neurotoxic peptides associated with the nettle’s sting.

2. Results and Discussion

2.1. Genome Sequencing and Assembly

We generated 66.3 Gbp of PacBio HiFi data (115X genome coverage) from a female, diploid individual of stinging nettle. HiFi reads had a median read length of 15.45 kbp and a median read quality of Q29. Only reads with quality ≥ Q20 were used for the assembly (50.7 Gbp, 88X genomic coverage). We also generated 29.02 Gbp of paired-end Hi-C reads (50X genome coverage), with 96.49% of the reads having quality > Q30. The combination of long reads and long-range interaction data produced an initial phased contig-level assembly. Haplotype 1 (H1) had a total length of 574.795 Mbp and a contig N50 of 20.587 Mbp (1404 contigs); haplotype 2 (H2) had a total length of 521.297 Mbp and a contig N50 of 24.787 Mbp (229 contigs). Genome size estimates for Urtica dioica vary considerably among previous studies [3,13], so we performed confirmatory flow cytometry on the individual that we sequenced, whose genome size had originally been estimated at 1C = 650 Mbp [13]. Our new estimate, reported here, is 2C = 1.26 pg, which translates to 1C = 616 Mbp. Although our assembled genome is smaller than that estimate, it has high BUSCO completeness scores (>92% for each haplotype) and kmer completeness (98.15% for both haplotypes combined; Table S2, Figure S1), consistent with recent suggestions that flow cytometry overestimates genome sizes [19]. Scaffolding using Hi-C data anchored contigs into 13 pseudochromosomes for each haplotype. We visually inspected Hi-C contact maps for the two haplotypes and manually corrected misjoins and misassemblies, retaining only putative SVs whose presence was strongly supported by Hi-C data and by long HiFi and ONT reads spanning their borders. Finally, we compared the chromosome organization of the haplotype assemblies with that of the U. dioica assembly produced by the Darwin Tree of Life project [1] to ensure that Hi-C contact maps supported our contig ordering (Figure S2).
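The coverage figures above follow from dividing the number of sequenced bases by the genome size. The minimal Python sketch below reproduces them under one stated assumption: that the ~575 Mbp H1 assembly length is used as the genome size (the text does not state which denominator was used). The function name is hypothetical.

```python
def mean_depth(sequenced_gbp: float, genome_mbp: float) -> float:
    """Mean sequencing depth (X) = sequenced bases / genome size."""
    return sequenced_gbp * 1_000 / genome_mbp


# Assumption: genome size taken as the 574.9 Mbp H1 assembly length.
GENOME_MBP = 574.9

for label, gbp in [("HiFi, all reads", 66.3), ("HiFi, >=Q20", 50.7), ("Hi-C", 29.02)]:
    # Prints 115X, 88X and 50X respectively
    print(f"{label}: {mean_depth(gbp, GENOME_MBP):.0f}X")
```

Using the flow cytometry estimate (1C = 616 Mbp) as the denominator instead would give somewhat lower depths (about 108X for all HiFi reads), so the choice of genome size matters when comparing coverage values across studies.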
The final H1 assembly had a total length of 574.93 Mbp, a scaffold N50 of 43.96 Mbp, and a contig N50 of 10.89 Mbp, with 92.59% of contigs above 50 kbp in length. The final H2 assembly was of similar quality, with a total length of 521.16 Mbp, a scaffold N50 of 47.99 Mbp, and a contig N50 of 13.53 Mbp, with 98.98% of contigs above 50 kbp in length. The BUSCO scores were 92.6% complete for H1 and 92.2% complete for H2, with very low levels of duplication (<3%) (Table 1; Figure 1). Although 89.5% of the H1 assembly and 97.8% of the H2 assembly were placed on chromosomes, 72.33 Mbp of sequence remained unplaced. Of these unplaced sequences, 95.44% (69.0 Mbp) were repetitive, and we detected only five BUSCO genes within them. Hifiasm assigned to H1 all repetitive sequences that could not be unequivocally assigned to either haplotype, which explains the difference in size between H1 and H2.
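Contig and scaffold N50 are the same statistic computed on different sequence sets: scaffolds as assembled, and contigs obtained by breaking scaffolds at gaps. The minimal Python sketch below illustrates why the contig N50 is smaller than the scaffold N50 for the same assembly. It assumes gaps are encoded as runs of N (gap conventions vary between scaffolders), uses hypothetical function names, and is not the implementation used by BBmap or any other tool cited here.

```python
import re


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("no sequences")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # not reached for non-empty input: running equals the total at the last step


def contig_lengths(scaffold_seq: str) -> list[int]:
    """Split a scaffold at runs of N (assumed gap characters) and return contig lengths."""
    return [len(piece) for piece in re.split(r"[Nn]+", scaffold_seq) if piece]


# Toy scaffold: three contigs (10, 5 and 3 bp) separated by gaps of 100 and 50 Ns
scaffold = "ACGTACGTAC" + "N" * 100 + "ACGTA" + "N" * 50 + "ACG"
print(n50([len(scaffold)]))           # scaffold N50 of this one-scaffold toy assembly: 168
print(n50(contig_lengths(scaffold)))  # contig N50 after breaking at gaps: 10
```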

2.2. Genome Structure

We expected to find very high synteny between the two haplotypes of our stinging nettle assembly, especially given our high HiFi and Hi-C coverage and our thorough manual curation, which should minimize the chances of misassemblies between haplotypes. However, we observed numerous SVs between the haplotype assemblies (see chromosomes 1, 2, 3, and 8 in Figure 2 and Figure S3). In particular, the two haplotypes differed by a massive inversion (18.4 Mbp in length) on chromosome 8, which was surrounded by multiple duplicated regions (Figure 3). We manually checked all of these SVs using Hi-C data and looked for HiFi and Nanopore reads spanning their breakpoints; this included manually “correcting” these SVs in either haplotype to determine whether doing so improved the Hi-C contact maps. In all cases, these checks supported the current chromosome organization of the two haplotypes (Figure 2, Tables S3 and S4).
We also observed fragmented alignments between haplotypes in regions with low gene density and high repetitiveness in several chromosomes (compare panels a–d in Figure 1). Across all 13 chromosomes, SyRI classified only about 71% of the regions between the two haplotypes as syntenic, with around 16% of the genome remaining unaligned (Table S5) because of its extremely repetitive content (see below). Although we cannot completely exclude the possibility that some of these patterns are artifacts of sequencing or assembly, these results indicate a high occurrence of structural variation in the stinging nettle genome (Figure 1a, Figure 2b and Figure S2). Structural variants between haplotypes, especially if they have been maintained in nettle for many generations, could also contribute to the high estimated heterozygosity (~1.5%), which was obtained from short-read data from the same stinging nettle individual and was therefore determined independently of the genome assembly. While the report of such extensive structural variation in the nettle genome is novel, the importance of SVs in maintaining sequence and functional diversity in wild [20] and cultivated species [21] is increasingly recognized. Given the wide range of ecosystems in which stinging nettles grow [2], it is tempting to speculate that these SVs could be involved in adaptation to particular environments, as is the case in other systems [22,23]. However, additional experimental evidence and species-wide genetic and phenotypic analyses will be required to assess the adaptive relevance of SVs in nettle.

2.3. Genes and Repeats Landscape

Gene annotation with the BRAKER3 pipeline identified 20,333 and 20,140 genes on H1 and H2, respectively, with annotation BUSCO scores of 90.5% for H1 and 90.4% for H2, indicating high levels of completeness (Figure 1b, Table 1 and Table S6). Ab initio and homology-based gene predictions were complemented with RNAseq data from three nettle tissues, obtained from a previous study [24]; while this RNAseq dataset is unlikely to include every transcribed gene in the nettle genome, it provides additional experimental support for our gene annotation. Across the entire genome, including contigs that could not be placed on the 13 nettle chromosomes, transposable elements (TEs) accounted for 69.14% and 68.59% of the total sequence of H1 and H2, respectively (Table 1). Within the 13 chromosomes, the most abundant repeats were Long Terminal Repeat retrotransposons (LTRs), which collectively covered 47.23% and 50.51% of the two assemblies, and Terminal Inverted Repeats (TIRs), which accounted for 15.63% and 13.24% of the chromosomes (Table S7). The LTR Assembly Index (LAI) values were 16.96 and 11.15 for H1 and H2, respectively, which are comparable to those of previously benchmarked high-quality genome assemblies [25]. TE density in 500 kbp windows ranged from 0 to >6000 TEs per window; most windows contained fewer than 1000 TEs, and only 37 windows contained more than 2000 TEs (Figure 1c, Table S7).
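The windowed TE density described above amounts to binning annotated features into fixed 500 kbp windows and counting them. The minimal Python sketch below assumes each TE is assigned to the window containing its 0-based start coordinate, so a TE crossing a window boundary is counted once; that binning rule, the coordinates, and the function names are hypothetical and may differ from how the Circos histograms were produced.

```python
from collections import Counter


def counts_per_window(starts: list[int], window: int = 500_000) -> Counter:
    """Count features per fixed-size window, binning each feature by its 0-based start."""
    return Counter(start // window for start in starts)


def windows_above(counts: Counter, threshold: int) -> list[int]:
    """Indices of windows whose feature count exceeds the threshold."""
    return sorted(w for w, n in counts.items() if n > threshold)


# Hypothetical TE start coordinates on one chromosome
te_starts = [10, 250_000, 499_999, 500_000, 1_200_000, 1_200_500]
density = counts_per_window(te_starts)
print(density)                    # TE count per 500 kbp window index
print(windows_above(density, 2))  # windows with more than 2 TEs
```

Windows without any TE are simply absent from the `Counter`; for plotting, they would need to be filled in with zero counts.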
Scans for tandem repeat patterns across the genome using RepeatOBserver [26] also identified these TE clusters. RepeatOBserver also calculates a repeat Shannon diversity index, which describes the diversity of tandem repeats across the genome, and uses it to identify the putative location of centromeres. In most species, centromeric regions have been shown to have high repeat levels but low repeat diversity, and to correspond to minima of the repeat Shannon diversity index (meaning that most of the sequence in these regions is made up of only one or a few very abundant repeats). These patterns can also be confirmed through visual inspection of Fourier transform repeat heatmaps, in which discrete banding identifies long stretches of tandem repeats typically associated with centromeres (see, for example, heatmaps for metacentric Arabidopsis thaliana (L.) Heynh. and holocentric Morus notabilis C.K. Schneid. chromosomes, Figure 4a,b [26,27,28]). In our U. dioica assemblies, repeat patterns are consistent with the presence of acrocentric or near-telocentric centromeres in five of the 13 chromosomes (chromosomes 8, 9, 11, 12, 13). For the remaining eight chromosomes (chromosomes 1–7, 10), we observed more diffuse and fragmented repeat patterns over a large proportion of the chromosome, suggesting the presence of polycentric centromeres (Figure 1a, Figure 4c and Figure S4).
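The Shannon diversity index is H = -Σ p_i ln p_i, where p_i is the proportion of category i. To illustrate why a window dominated by one tandem repeat scores low, the Python sketch below applies this generic index to overlapping k-mer counts in a sequence window. It is an illustration under stated assumptions (k-mers as categories, a toy repeat and a toy random sequence, hypothetical function names), not the RepeatOBserver implementation.

```python
import math
import random
from collections import Counter


def shannon_index(counts) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over non-zero category counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def kmer_shannon(window: str, k: int = 3) -> float:
    """Shannon diversity of overlapping k-mers in one sequence window."""
    kmers = Counter(window[i:i + k] for i in range(len(window) - k + 1))
    return shannon_index(kmers.values())


tandem = "ACG" * 100  # a single repeat unit, as in a satellite array
rng = random.Random(1)
mixed = "".join(rng.choice("ACGT") for _ in range(300))  # non-repetitive toy sequence
print(round(kmer_shannon(tandem), 3))  # low: only 3 distinct 3-mers
print(round(kmer_shannon(mixed), 3))   # higher: many distinct 3-mers
```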
While acrocentric centromeres have previously been observed in the genome of another member of the Urticaceae (ramie, Boehmeria nivea (L.) Gaudich.; [29]), polycentric centromeres have not been reported in this family; however, they have been described in the Moraceae, the family most closely related to the Urticaceae [28]. The positions of the predicted centromeres in the U. dioica genome are also consistent with these regions having the lowest gene density in each chromosome, as is typical of centromeres (Figure 1b,e; Table S8). However, these analyses only provide putative centromere locations; definitive identification of centromeric regions would require direct experimental evidence (e.g., localization of the centromeric histone H3 variant, CENH3).

2.4. Genome Evolution

The haploid chromosome number of U. dioica has been variously reported as 12 or 13 [30]. In this study, we unequivocally identify 13 separate chromosomes in the U. dioica genome assembly. However, base chromosome number is highly variable within the Urticaceae (ranging from 7 to 14 [31]), as well as within Urtica species [32]. While the Urticaceae family has a complex phylogeny, multiple sources support a monophyletic origin and subdivision into four clades (I–IV [33,34]). The divergence between clades II/III and clades I/IV was estimated at 84.87 million years ago, and the Urticaceae split from their nearest relative, the Moraceae, 100.01 million years ago [33]. To understand the evolution of genome organization in this family, we compared our U. dioica (clade III) genome assembly with those of three other species in the Urticaceae: Urtica urens (clade III), Boehmeria nivea (clade I), and Parietaria judaica (clade I). Despite evidence of abundant large-scale chromosome rearrangements, we found synteny to be quite conserved across all of these species (Figure 5).
Interestingly, while the phylogenetic distances between B. nivea or P. judaica and Urtica species are comparable, chromosomal synteny is much higher between Urtica and B. nivea than between either of them and P. judaica. The base chromosome number in U. urens and many other related Urtica species is often n = 12 [31,32], whereas U. dioica is almost always n = 13 in diploid cytotypes, and n = 12 is more frequently observed in tetraploids [13]. We find that U. dioica chromosomes 8 and 13 are syntenic with chromosome 2 in U. urens, highlighting a clear history of chromosome fusion/fission (Figure 5). Instability in the organization of these chromosomes within U. dioica could explain the high level of structural variation that we observe between haplotypes on chromosome 8 of our assembly (Figure 3, Figure S3). As more genomes in this family become available, it will be interesting to investigate further the reasons for this high flexibility in chromosome arrangement between and within related species in the Urticaceae.

2.5. Putative Neurotoxic Nettle Sting Peptides

Nettle owes its common name to its sting, which constitutes an effective defense mechanism against herbivores. The sting is caused by brittle, needle-like trichomes on the leaves and stems, which break upon contact and release pain-inducing chemicals [35]. While the physiological mechanism of the nettle’s sting has been studied extensively since the early 1940s, the pain-inducing compounds were only recently identified. An initial study by Fu et al. [10] showed that simple acids, including oxalic acid, tartaric acid, and formic acid, which are potentially irritant to animals, are the dominant compounds in stinging hairs, suggesting that they could be the pain-inducing compounds in nettle and challenging earlier hypotheses [35,36,37]. More recently, small neurotoxic peptides were shown to play a major role in causing stinging pain. In particular, two classes of such peptides were described in Urtica spp. [11]. One is a 42-amino-acid peptide (4.3 kDa) named urthionin, which has cytolytic activity and a structure resembling thionins, a known group of plant toxins that disrupt cell membranes [38]. The other is a 63-amino-acid peptide (6.7 kDa) called urticatoxin, which has neurotoxic activity and is seemingly specific to species in the tribe Urticeae, such as those in the genera Urtica and Dendrocnide. Urticatoxin was originally described in U. ferox (tree nettle) and was found to induce more severe pain than urthionin. The neurotoxicity of urticatoxin was shown to be due to its ability to modulate the activity of vertebrate voltage-gated sodium channels, similar to the gympietides (i.e., Excelsatoxin and Moroidotoxin) previously described in Dendrocnide species [12]. Despite their similar effects, urticatoxins and gympietides appear to have evolved independently, given their structural differences [11].
Using a homology-based approach, we identified two copies of a gene putatively encoding the urthionin peptide on chromosome 9 (86% amino acid match to the U. ferox mature peptide, Δ-Uf1a). The two paralogs were adjacent to each other and had identical sequences, suggesting a recent tandem duplication. Hits for urticatoxin on chromosomes 6 and 9 matched the peptides identified in U. ferox and in Dendrocnide species at only 26–49% (Table 2), suggesting that this class of peptides might not be present in U. dioica. No genes similar to the two other neurotoxic peptides identified in Dendrocnide species (Excelsatoxin A and Moroidotoxin A) were found in the nettle genome.
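Reported match percentages depend on how alignment columns are counted: tools differ in whether gapped columns, or only aligned residues, form the denominator. The Python sketch below implements one common convention (columns with a gap in one sequence count as mismatches; columns gapped in both are ignored). It uses hypothetical aligned fragments, not the actual urthionin or urticatoxin sequences, and is not necessarily how miniprot computes its identity values.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two equal-length aligned sequences ('-' = gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A match requires identical residues; double gaps were already removed
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned peptide fragments (not real toxin sequences)
print(round(percent_identity("MKT-LVC", "MKSALVC"), 1))  # 5 of 7 columns match: 71.4
```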

3. Materials and Methods

3.1. Plant Materials and Sequencing

We collected a diploid female individual of Urtica dioica ssp. dioica from beside the River Jiu, north of Rovinari, Romania, and cultivated it in Vancouver, British Columbia (as clone 11–4). Young leaves were collected and flash-frozen in liquid nitrogen for extractions. A voucher specimen was deposited in the herbarium of the Beaty Biodiversity Museum at the University of British Columbia (UBC).
Nuclear DNA content was estimated using an Attune NxT Flow Cytometer (Thermo Fisher Scientific) at the University of British Columbia; samples were prepared following the protocol in [39] using a Tris-MgCl2 lysis buffer [40]. RNA was removed with RNase A prior to staining with propidium iodide [39], and a tomato (Solanum lycopersicum L.) sample prepared with the same method was used as a standard when estimating the nettle genome size.
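With an internal standard, the sample's DNA content is obtained from the ratio of the sample and standard G0/G1 peak positions, and the result is converted from picograms to base pairs. The Python sketch below assumes a linear fluorescence response and the widely used conversion factor of 978 Mbp per pg, which reproduces the 616 Mbp reported in the Results. The tomato standard's 2C value and the measured peak positions are not given in this section, so they are left as parameters; the function names are hypothetical.

```python
PG_TO_MBP = 978.0  # widely used conversion factor: 1 pg of DNA ~ 978 Mbp


def sample_2c_pg(sample_peak: float, standard_peak: float, standard_2c_pg: float) -> float:
    """2C DNA content of the sample from the ratio of its G0/G1 peak to the standard's."""
    return standard_2c_pg * sample_peak / standard_peak


def one_c_mbp(two_c_pg: float) -> float:
    """Convert a 2C value in pg to a 1C genome size in Mbp."""
    return two_c_pg / 2 * PG_TO_MBP


print(round(one_c_mbp(1.26)))  # 2C = 1.26 pg gives 616 Mbp
```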
High-molecular-weight (HMW) DNA was extracted using a modified CTAB method [41]. A PacBio HiFi sequencing library was prepared and sequenced on a Revio instrument by Novogene (San Diego, CA, USA). HiFi reads were quality-controlled using SMRT Tools v13.0.0 runqc-reports (PacBio, 2024). Whole genome shotgun (WGS) sequencing was performed on an Illumina HiSeq 2000 platform by CD Genomics (Shirley, NY, USA) to produce 150 bp paired-end reads. DNA for Nanopore sequencing was extracted using a Thermo Fisher MagMAX Plant DNA kit (Waltham, MA, USA) and then further purified using a Qiagen DNeasy PowerClean column (Hilden, Germany). Nanopore sequencing libraries were generated from HMW DNA using the Genomic DNA by Ligation (SQK-LSK110) protocol. Sequencing was carried out with FLO-FLG001 R9.4.1 flow cells on a MinION instrument. The resulting fast5 files were basecalled with Guppy 6.0.1 using the super-accuracy model (dna_r9.4.1_450bps_sup.cfg) on an NVIDIA RTX 3060 Ti GPU. Reads with a Q score below 9 were discarded.
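Read-level quality thresholds such as Q9 for ONT or Q20 for HiFi reads are usually applied to a mean quality computed in error-probability space, not by averaging Phred scores directly. The Python sketch below illustrates that convention, assuming Sanger-encoded (ASCII offset 33) quality strings. It is an illustration, not the code used by Guppy, SMRT Tools, or any filtering tool, and the function names are hypothetical.

```python
import math


def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred quality of a read, averaged in error-probability space."""
    if not qual:
        raise ValueError("empty quality string")
    errors = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return -10 * math.log10(sum(errors) / len(errors))


def passes(qual: str, min_q: float) -> bool:
    """True if the read's mean quality reaches the threshold (e.g., 9 for ONT, 20 for HiFi)."""
    return mean_read_quality(qual) >= min_q


# Two bases at Q10 ('+') and Q30 ('?'): the naive Phred mean is 20,
# but the error-space mean is about Q13, so the read fails a Q20 filter.
q = mean_read_quality("+?")
print(round(q, 2), passes("+?", 20), passes("+?", 9))  # 12.97 False True
```

Averaging in error space is stricter, because a few low-quality bases dominate the mean error rate.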
A Hi-C library was prepared using a modified protocol. In brief, ground frozen tissue was cross-linked in a 1.5% formaldehyde solution containing protease inhibitor. Following nuclei isolation, chromatin was digested with DpnII. After end-filling in the presence of biotinylated dATP, blunt ends were ligated, and DNA was extracted. A total of 3.5 µg of DNA was sheared by ultrasonication (Covaris, Woburn, MA, USA), and fragments in the 300–500 bp size range were selected using SPRI beads [42]. Biotinylated fragments were pulled down using streptavidin-coated beads (Invitrogen, Waltham, MA, USA), and Illumina libraries were prepared following Todesco et al. [43]. The Hi-C method used here combines elements from several previously published Hi-C protocols (including [44,45]) and has been optimized to work on a variety of plant species and to reduce library preparation costs. A detailed protocol is provided in the Supplementary Material. The resulting libraries were sequenced by Novogene (San Diego, CA, USA) on an Illumina NovaSeq X Plus instrument (Illumina Inc., San Diego, CA, USA) to generate 150 bp paired-end reads.
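DpnII recognizes the palindromic 4 bp site GATC and cuts immediately 5′ of it, so an in silico digest gives a rough expectation of restriction-fragment sizes along the genome. The minimal Python sketch below works on a toy sequence. Because GATC is its own reverse complement, scanning one strand is enough. The function names are hypothetical.

```python
import re


def dpnii_sites(seq: str) -> list[int]:
    """0-based positions of DpnII sites (GATC); the enzyme cuts just before the G."""
    return [m.start() for m in re.finditer("GATC", seq.upper())]


def fragment_lengths(seq: str) -> list[int]:
    """Lengths of the fragments produced by cutting at every DpnII site."""
    cuts = [0] + dpnii_sites(seq) + [len(seq)]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]


toy = "aaGATCttttGATCc"
print(dpnii_sites(toy), fragment_lengths(toy))
```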

3.2. De Novo Assembly and Quality Evaluation

We tested multiple assemblers, assembly parameters, and software versions to determine what worked best for our nettle assembly, and selected the approach that produced the most complete kmer representation in both haplotypes, based on a kmer plot analysis with Merqury [46]. PacBio HiFi reads were first filtered by mean Q score > 20 using fastq.filter -e 0.01 (https://github.com/LUMC/fastq-filter (accessed 15 March 2024)). An initial contig-level genome assembly was produced with hifiasm v0.19.8-r-603, integrating Hi-C reads using the --h1 and --h2 options [47] and keeping default values for the remaining parameters. This resulted in two separate haplotype assemblies. Juicer v1.9.9 [48] and 3D-DNA v180419 [49] were used to map Hi-C reads and create a contact map, which was used to manually sort and orient the contigs into a chromosome-scale assembly for each haplotype in Juicebox v1.11.08 [50]. To verify the positions and orientations of contigs, we aligned the two haplotypes and compared syntenic regions. Alignments were performed using minimap2 v2.28 [51,52], and regions harboring putative structural variants (SVs) were surveyed for obvious misassemblies on the Hi-C contact map. For regions showing ambiguous patterns in the Hi-C contact map alone, we used Synteny and Rearrangement Identifier (SyRI) v1.7.0 [53] to obtain precise coordinates of the putative SVs. SVs larger than 10,000 bp were then manually reoriented in individual haplotypes to determine whether doing so improved the Hi-C contact profiles in the region (Table S3). SVs smaller than 10,000 bp could not be visually assessed in Juicebox and were therefore omitted from this manual curation step. To check whether these putative SVs were supported by long reads spanning the expected breakpoints (following [23]), we then mapped filtered HiFi reads and ONT reads ≥30 kbp to the assembled genome using winnowmap v2.03 [54] and visualized the resulting alignments in IGV (Table S4).
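The long-read check described above asks whether individual reads align continuously across each putative breakpoint, while the 10,000 bp cut-off decides which SVs are examined by hand on the Hi-C map. The Python sketch below expresses both criteria. The flank length, data structure, coordinates, and function names are hypothetical, and the authors inspected alignments visually in IGV rather than with code like this.

```python
from dataclasses import dataclass


@dataclass
class ReadAlignment:
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def spans_breakpoint(aln: ReadAlignment, contig: str, pos: int, flank: int = 1_000) -> bool:
    """True if one continuous alignment covers pos with at least `flank` bp on each side."""
    return aln.contig == contig and aln.start <= pos - flank and aln.end >= pos + flank


def supporting_reads(alns: list[ReadAlignment], contig: str, pos: int, flank: int = 1_000) -> int:
    """Number of reads spanning the breakpoint."""
    return sum(spans_breakpoint(a, contig, pos, flank) for a in alns)


def needs_hic_review(sv_length: int, min_len: int = 10_000) -> bool:
    """SVs larger than the cut-off are candidates for manual Hi-C curation."""
    return sv_length > min_len


reads = [
    ReadAlignment("chr8", 0, 40_000),       # spans the breakpoint at 20 kbp
    ReadAlignment("chr8", 19_500, 60_000),  # starts too close to the breakpoint
    ReadAlignment("chr1", 0, 40_000),       # different chromosome
]
print(supporting_reads(reads, "chr8", 20_000), needs_hic_review(18_400_000))  # 1 True
```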
We used this evidence to curate both haplotype assemblies and correct likely misassemblies. Furthermore, to finalize the assembly, we re-mapped the Hi-C reads to produce a haplotype-aware H1 + H2 contact map using 3D-DNA with parameter -q 0 [49], which allowed visualization of reads that mapped to multiple regions in the genome (i.e., with a mapping quality of 0). This allowed us to resolve highly repetitive regions that previously showed no visible interactions in Juicebox. However, this final step fragmented our genomes into smaller contigs as a trade-off for smoother Hi-C contact map patterns. We then aligned our assemblies to the primary haplotype of a published U. dioica assembly (NCBI accession: GCA_964188135.1, hereafter Udio_DToL) to check for synteny, and we assigned corresponding chromosome numbers. To verify whether differences in chromosome organization with respect to Udio_DToL could reflect misassemblies in our haplotypes, we re-ordered our contigs based on Udio_DToL using RagTag v2.1.0 [55] (commands: ragtag.py correct followed by ragtag.py scaffold), re-mapped our Hi-C reads, and generated a new contact map in Juicebox. If our assembly better represents the real order and orientation of the contigs in our sequenced individual, regions that have a different organization in this reference-based scaffolding assembly should appear as misassemblies in the Hi-C contact map (Figure S3).
Genome assembly statistics were assessed with Bandage v0.8.1 [56], BBmap v39.06 [57], and BUSCO v5.1.2 (dataset: eudicot_odb10 [58,59]). A substantial number of contigs could not be placed on the chromosomes; while they lacked Hi-C interactions with the chromosomes, they showed strong interactions with themselves on Hi-C contact maps. To assess whether these unplaced contigs contain genes or are mostly composed of repeats, we ran the same BUSCO analysis as above and the redmask.py v0.0.2 command in Red with default parameters to identify repetitive regions [60]. Additionally, we used the Illumina WGS data to calculate kmer completeness and QV score with Merqury [46], and to estimate genome-wide heterozygosity with GenomeScope [61]. In brief, we removed adapters and kept only paired reads from the raw Illumina data using Trimmomatic v0.39 with parameters ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:36 [62]. Then, for quality assessment, a meryl database was built from the trimmed reads with k = 19, from which Merqury generated kmer plots to obtain kmer completeness and QV score (Figure S1). Finally, to obtain a genome-wide heterozygosity estimate, we used Jellyfish v2.3.1 [63] to generate a kmer frequency histogram, from which we calculated the % heterozygosity using GenomeScope v1.0 with k = 21 (as recommended for most species), ensuring a model fit above 90%.
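GenomeScope fits its model to a histogram of k-mer counts, here produced with Jellyfish. The Python sketch below shows what that histogram contains: canonical k-mers are counted (a k-mer and its reverse complement are merged), and the histogram records how many distinct k-mers occur at each multiplicity. It is a toy in-memory illustration, not a substitute for Jellyfish on real data. Skipping k-mers that contain non-ACGT symbols is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_histogram(reads: list[str], k: int = 21) -> Counter:
    """Map each multiplicity to the number of distinct canonical k-mers seen that many times."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


print(kmer_histogram(["ACGT", "ACGT"], k=2))
```

In a heterozygous diploid genome, such a histogram typically shows one peak for heterozygous k-mers and a second at about twice that depth for homozygous k-mers. GenomeScope estimates heterozygosity from the relative sizes of these peaks.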

3.3. Genome Annotation and Visualization

We annotated our stinging nettle genome assembly using BRAKER3, which integrates RNAseq data into the annotation process and has shown superior benchmarking performance in published studies [64,65,66]. To run the pipeline, we first soft-masked repetitive regions in the genome using the redmask.py v0.0.2 command in Red with default parameters [60]. RNAseq data from three tissue types (leaf, fiber, and callus) were retrieved from [24]; raw paired-end Illumina reads were filtered with Trimmomatic v0.39 with parameters ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:36 [62]. Filtered read pairs were aligned to the soft-masked genome using HISAT2 v2.2.1 (with the --dta option), and the alignment file was converted to BAM format using samtools [67]. Before proceeding, we checked that the average alignment rate of the RNAseq reads to the genome was above 80%, to confirm that the individuals from which the RNAseq data were obtained were sufficiently similar to our U. dioica assembly. The resulting BAM file, as well as the Viridiplantae protein database from OrthoDB v11 [68,69], were incorporated in the BRAKER3 pipeline (using the --bam and --prot_seq options). Completeness of the annotation was assessed with BUSCO with --mode protein (dataset: eudicot_odb10 [58,59]).
For transposable element (TE) annotation, we used the Extensive de novo TE Annotator (EDTA) pipeline v2.2.0 [70] with default parameters. Additionally, we used RepeatOBserver v1 [26] to analyze patterns of tandem repeats across the chromosomes and identify putative telomeric and centromeric repeats. The location of the putative centromere was predicted in RepeatOBserver using the default values for the Shannon diversity standard deviation cut-off (Shannon_bin_size = 500, Shannon_SD = 2). Genome-wide patterns of gene and TE density, Shannon diversity, and rearrangements between haplotypes, as well as putative centromere locations, were visualized using Circos v0.69-8 [71]; computed Shannon diversity scores were averaged over 250 kbp windows into a line plot, and gene and TE annotations were formatted as histograms of counts per 500 kbp window. To ensure proper visualization of TE patterns across the genome, we capped the y-axis of the TE density track at 4000, so values for six outlier windows with very high TE counts (>4000) are cut off the plot; the locations and TE counts of those windows are reported in Table S7. Additionally, we used the LTR Assembly Index (LAI), calculated with LTR_retriever, to assess LTR integrity [25]. LAI is an independent assembly metric that indicates how intact the LTRs are; LTR integrity can be affected both by the age of the LTRs and by misassemblies. According to [25], LAI values of 10–20 are considered “Reference” level.
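The LTR Assembly Index is defined in the original LAI paper [25] as the percentage of LTR retrotransposon sequence contained in intact elements, corrected for the age of LTR insertions using the whole-genome LTR identity. The Python sketch below encodes that headline formula and the quality classes, including the 10–20 "Reference" band quoted above. LTR_retriever additionally filters its inputs and computes values in sliding windows, so this is a simplified illustration, and the function names are hypothetical.

```python
def raw_lai(intact_ltr_bp: float, total_ltr_bp: float) -> float:
    """Percentage of LTR retrotransposon sequence contained in intact elements."""
    return 100.0 * intact_ltr_bp / total_ltr_bp


def lai(intact_ltr_bp: float, total_ltr_bp: float, ltr_identity_pct: float) -> float:
    """Age-corrected LAI = raw LAI + 2.8138 * (94 - whole-genome LTR identity)."""
    return raw_lai(intact_ltr_bp, total_ltr_bp) + 2.8138 * (94.0 - ltr_identity_pct)


def lai_class(value: float) -> str:
    """Quality classes: Draft (<10), Reference (10 to <20), Gold (>=20)."""
    if value < 10:
        return "Draft"
    if value < 20:
        return "Reference"
    return "Gold"


print(lai_class(16.96), lai_class(11.15))  # H1 and H2 values: Reference Reference
```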

3.4. Comparative Genomics

To investigate patterns of chromosome evolution within the nettle family, we first compiled a list of published reference genomes within the Urticaceae. Assembly quality was assessed using the annotation BUSCO score (threshold: ≥90%). We included Urtica urens L. and Parietaria judaica L. (the primary haplotypes of both assemblies, from the Darwin Tree of Life Project [16,17,18]) and a wild accession of Boehmeria nivea [29]. For each genome, we downloaded the gene annotation file (GFF format) together with the associated protein sequences (amino acid FASTA format). The annotation was then converted to BED format using the lines corresponding to "mRNA" features. An extra column matching the headers of the protein FASTA file was added, so that the BED file records the genomic position of every protein. Where needed, the file conversion was performed with a custom shell script. For this analysis, we used the H1 haplotype of our U. dioica assembly because it had a larger total assembly size; quality scores are otherwise similar between the two haplotypes. Once the inputs were prepared for all downloaded datasets and our H1 assembly, we ran GENESPACE v1.3.1 [72] with default parameters to compare chromosome synteny across the four Urticaceae species. In brief, GENESPACE first infers orthogroups with OrthoFinder v2.5.4 [73]. It then identifies syntenic blocks from gene order with MCScanX [74] and visualizes them.
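The following Python sketch shows one way to perform the GFF-to-BED conversion used as GENESPACE input; the authors used a custom shell script. The sketch assumes GFF3-style `key=value` attributes. It also assumes that the `ID` attribute of each mRNA matches the first word of the corresponding protein FASTA header, which should be checked for each downloaded dataset. BED coordinates are 0-based and half-open, so the 1-based GFF start is shifted by one.

```python
def parse_attributes(field: str) -> dict[str, str]:
    """Parse a GFF3 attribute column such as 'ID=g1.t1;Parent=g1'."""
    return dict(item.split("=", 1) for item in field.split(";") if "=" in item)


def gff_mrna_to_bed(gff_lines, id_key="ID"):
    """Yield (chrom, start0, end, protein_id) for every mRNA feature."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "mRNA":
            continue
        attrs = parse_attributes(fields[8])
        # GFF is 1-based and closed; BED is 0-based and half-open.
        yield fields[0], int(fields[3]) - 1, int(fields[4]), attrs[id_key]


def fasta_ids(fasta_lines):
    """Return the first word of each FASTA header (without the '>')."""
    return {line[1:].split()[0] for line in fasta_lines if line.startswith(">")}


def to_bed_line(row):
    """Format one BED row as a tab-separated line."""
    return "\t".join(str(value) for value in row)
```

Comparing `{row[3] for row in rows}` with `fasta_ids(...)` before running GENESPACE is a quick way to catch proteins whose IDs do not match between the two files.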

3.5. Identification of Sting Genes

To identify putative neurotoxic peptides in our stinging nettle genome, we obtained the amino acid sequences of the 4.3-kDa urthionin (Δ-Uf1a) and of several versions of the 6.7-kDa neurotoxic urticatoxin peptide (β/δ-Uf2a, β/δ-Uf2b, β/δ-Dm2a, β/δ-De2a). These peptides were described in Urtica ferox G. Forst, Dendrocnide moroides (Wedd.) Chew and D. excelsa (Wedd.) Chew [11]. Xie et al. [11] also reported these toxins in other Urtica species, including in transcriptome data of U. dioica and U. incisa Poir. However, these putative toxin transcripts were identified by homology to U. ferox sequences, so we omitted them from our analysis. We also included the two 4-kDa gympietides (Moroidotoxin A, Excelsatoxin A) described in D. moroides and D. excelsa [12]. After compiling this list of seven sting-associated neurotoxic peptides from the literature, we searched for possible homologs in stinging nettle. We aligned their protein and transcript sequences to our genome assembly using miniprot v0.13 (r248) with the options -Iut16 [75]. Here, -I sets the maximum intron size based on the genome size, allowing a flexible range of intron sizes during homolog detection; -u also reports unmapped queries; and -t16 runs 16 threads. The alignment results are visualized in Figure S5.
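Homolog hits such as those summarized in Table 2 can be inspected with a script like the following. The sketch reads alignments in PAF format and keeps hits that span a minimum fraction of the query peptide. It assumes the 12 standard leading PAF columns: query name, length, start and end; strand; target name, length, start and end; residue matches; alignment block length; and mapping quality. Any miniprot-specific tags are ignored. The 50% coverage cut-off is a hypothetical value, not one used in this study.

```python
from dataclasses import dataclass


@dataclass
class PafHit:
    query: str
    query_len: int
    query_start: int
    query_end: int
    strand: str
    target: str
    target_start: int
    target_end: int
    residue_matches: int

    @property
    def query_coverage(self) -> float:
        """Fraction of the query sequence spanned by the alignment."""
        return (self.query_end - self.query_start) / self.query_len


def parse_paf(lines):
    """Yield PafHit objects built from the standard leading PAF columns."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a complete PAF record
        yield PafHit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                     f[5], int(f[7]), int(f[8]), int(f[9]))


def filter_hits(hits, min_coverage=0.5):
    """Keep hits spanning at least `min_coverage` of the query (hypothetical cut-off)."""
    return [hit for hit in hits if hit.query_coverage >= min_coverage]
```

Coverage is computed from the query coordinates rather than from residue matches. Under this filter, a short partial hit is removed even if most of the residues it spans are identical.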

4. Conclusions

We have generated a highly complete, diploid genome assembly of stinging nettle. This multi-purpose species has been interwoven with various cultures for thousands of years and is seeing renewed interest as a source of natural, sustainable fiber. Despite its compact size, we found the stinging nettle genome to be highly repetitive, with more than two-thirds of its sequence composed of transposable elements. We also identified surprisingly high levels of structural variation between the haplotypes of the individual we sequenced. Despite our best efforts, we cannot completely exclude that these differences are due to haplotype misassemblies. Nevertheless, the two observations are likely linked, given that transposable elements are known to facilitate the generation of chromosomal rearrangements [76]. The possible presence of several polycentric chromosomes, suggested by diffuse patterns of short tandem repeats, adds further complexity to the nettle genome. The stinging nettle genome is therefore an attractive case study for understanding the evolution of chromosomal variation within and between species. Future studies could determine whether the inversions we identified play a role in ecotypic adaptation in nettle, as has been observed in other plant species [77]. This assembly also provides a valuable resource for the nettle community. It will greatly facilitate analyses of the genetic basis of useful and interesting traits in the nettle family, including the genes controlling fiber quality, the production of bioactive molecules found in this species, and the formation of nettle's characteristic stinging hairs.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants14010124/s1, Figure S1: K-mer plot for the U. dioica ssp. dioica assembly; Figure S2: Alignment between the U. dioica assembly from Darwin Tree of Life (DToL) and the U. dioica ssp. dioica H1 and H2 assembly from this study; Figure S3: SyRI alignment between U. dioica ssp. dioica H1 and H2 haplotypes; Figure S4: Fourier transformed repeat spectra for each chromosome; Figure S5: Confirmation of the putative stinging peptide genes; Table S1: Available genome assemblies in the Urticaceae family; Table S2: Statistics for haplotype assemblies produced by Hifiasm; Table S3: Rounds 3 and 4 of the manual curation of putative chromosomal inversions based on Hi-C data; Table S4: Manual curation of putative chromosomal inversions based on long-read alignment; Table S5: Comparison between H1 and H2 haplotypes using SyRI; Table S6: Annotation statistics for the two haplotypes; Table S7: Transposable elements annotation statistics; Table S8: Coordinates of putative centromeric regions; Supplementary Methods: Hi-C Protocol.

Author Contributions

Conceptualization: Q.C. and M.K.D.; funding and resources: Q.C., M.K.D., G.G., D.M.P. and M.T.; data production: K.H., C.R.D., M.K. and D.M.P.; formal analyses, investigation, and visualization: K.H. and M.T.; sample preparation and laboratory work: K.H., C.R.D., M.K. and D.M.P.; writing, review, and editing: all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by NSERC Discovery Grants to MT (RGPIN-2023-03344), QC (RGPIN-2019-04041), and MD (RGPIN-2020-05147).

Data Availability Statement

The raw sequencing data and the assembled genome are available at NCBI under BioProject numbers PRJNA663211 (haplotype 1) and PRJNA1198346 (haplotype 2). Genome annotation files for genes and TEs are available on Figshare: https://figshare.com/projects/A_high-quality_phased_genome_assembly_of_stinging_nettle_Urtica_dioica_ssp_dioica/230981. All the scripts used in this study are available at https://github.com/kaede0e/stinging_nettle_genome_assembly.git. Clones of individual 11-4 are available upon request ([email protected]).

Acknowledgments

We thank Eric Gonzales Segovia and Cassandra Elphinstone for their help with genome scaffolding and tandem repeats analyses, respectively, as well as Andy Johnson and Cassandra Elphinstone for assistance with the flow cytometry. We also gratefully acknowledge the Digital Research Alliance of Canada (DRAC) for access to their computational resources and the Natural History Museum, London (UK) for funding the fieldwork in Europe, undertaken by DP and QC.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Taylor, K. Biological Flora of the British Isles: Urtica dioica L. J. Ecol. 2009, 97, 1436–1458.
  2. Rejlová, L.; Chrtek, J.; Trávníček, P.; Lučanová, M.; Vít, P.; Urfus, T. Polyploid Evolution: The Ultimate Way to Grasp the Nettle. PLoS ONE 2019, 14, e0218389.
  3. Harwood, J.; Edom, G. Nettle Fibre: Its Prospects, Uses and Problems in Historical Perspective. Text. Hist. 2012, 43, 107–119.
  4. Viotti, C.; Albrecht, K.; Amaducci, S.; Bardos, P.; Bertheau, C.; Blaudez, D.; Bothe, L.; Cazaux, D.; Ferrarini, A.; Govilas, J.; et al. Nettle, a Long-Known Fiber Plant with New Perspectives. Materials 2022, 15, 4288.
  5. Bhusal, K.K.; Magar, S.K.; Thapa, R.; Lamsal, A.; Bhandari, S.; Maharjan, R.; Shrestha, S.; Shrestha, J. Nutritional and Pharmacological Importance of Stinging Nettle (Urtica dioica L.): A Review. Heliyon 2022, 8, e09717.
  6. Xu, X.; Backes, A.; Legay, S.; Berni, R.; Faleri, C.; Gatti, E.; Hausman, J.-F.; Cai, G.; Guerriero, G. Cell Wall Composition and Transcriptomics in Stem Tissues of Stinging Nettle (Urtica dioica L.): Spotlight on a Neglected Fibre Crop. Plant Direct 2019, 3, e00151.
  7. Man, S.M.; Paucean, A.; Chis, M.S.; Muste, S.; Pop, A.; Muresan, A.E.; Martis, G. Effect of Nettle Leaves Powder (Urtica dioica L.) Addition on the Quality of Bread. Hop Med. Plants 2019, 27, 104–112.
  8. Guil-Guerrero, J.L.; Rebolloso-Fuentes, M.M.; Isasa, M.E.T. Fatty Acids and Carotenoids from Stinging Nettle (Urtica dioica L.). J. Food Compos. Anal. 2003, 16, 111–119.
  9. Devkota, H.P.; Paudel, K.R.; Khanal, S.; Baral, A.; Panth, N.; Adhikari-Devkota, A.; Jha, N.K.; Das, N.; Singh, S.K.; Chellappan, D.K.; et al. Stinging Nettle (Urtica Dioica L.): Nutritional Composition, Bioactive Compounds, and Food Functional Properties. Molecules 2022, 27, 5219.
  10. Fu, H.Y.; Chen, S.J.; Chen, R.F.; Ding, W.H.; Kuo-Huang, L.L.; Huang, R.N. Identification of Oxalic Acid and Tartaric Acid as Major Persistent Pain-Inducing Toxins in the Stinging Hairs of the Nettle, Urtica thunbergiana. Ann. Bot. 2006, 98, 57–65.
  11. Xie, J.; Robinson, S.D.; Gilding, E.K.; Jami, S.; Deuis, J.R.; Rehm, F.B.H.; Yap, K.; Ragnarsson, L.; Chan, L.Y.; Hamilton, B.R.; et al. Neurotoxic and Cytotoxic Peptides Underlie the Painful Stings of the Tree Nettle Urtica ferox. J. Biol. Chem. 2022, 298, 102218.
  12. Gilding, E.K.; Jami, S.; Deuis, J.R.; Israel, M.R.; Harvey, P.J.; Poth, A.G.; Rehm, F.B.H.; Stow, J.L.; Robinson, S.D.; Yap, K.; et al. Neurotoxic Peptides from the Venom of the Giant Australian Stinging Tree. Sci. Adv. 2020, 6, eabb8828.
  13. Cronk, Q.; Hidalgo, O.; Pellicer, J.; Percy, D.; Leitch, I.J. Salix Transect of Europe: Variation in Ploidy and Genome Size in Willow-Associated Common Nettle, Urtica dioica L. Sens. Lat., from Greece to Arctic Norway. Biodivers. Data J. 2016, 4, e10003.
  14. Bassett, I.J.; Crompton, C.W.; Woodland, D.W. The Biology of Canadian Weeds. 21. Urtica dioica L. Can. J. Plant Sci. 1977, 57, 491–498.
  15. Grosse-Veldmann, B.; Nürk, N.M.; Smissen, R.; Breitwieser, I.; Quandt, D.; Weigend, M. Pulling the Sting out of Nettle Systematics—A Comprehensive Phylogeny of the Genus Urtica L. (Urticaceae). Mol. Phylogenet. Evol. 2016, 102, 9–19.
  16. Christenhusz, M.J.M.; Twyford, A.D.; Royal Botanic Gardens Kew Genome Acquisition Lab; Plant Genome Sizing Collective; Darwin Tree of Life Barcoding Collective; Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory Team; Wellcome Sanger Institute Scientific Operations: Sequencing Operations; Wellcome Sanger Institute Tree of Life Core Informatics Team; Tree of Life Core Informatics Collective; Darwin Tree of Life Consortium. The Genome Sequence of the Small Nettle, Urtica urens L. (Urticaceae). Wellcome Open Res. 2024, 9, 639.
  17. Christenhusz, M.J.M.; Royal Botanic Gardens Kew Genome Acquisition Lab; Plant Genome Sizing Collective; Darwin Tree of Life Barcoding Collective; Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory Team; Wellcome Sanger Institute Scientific Operations: Sequencing Operations; Wellcome Sanger Institute Tree of Life Core Informatics Team; Tree of Life Core Informatics Collective; Darwin Tree of Life Consortium. The Genome Sequence of Pellitory-of-the-Wall, Parietaria judaica L. (Urticaceae). Wellcome Open Res. 2024, 9, 608.
  18. Darwin Tree of Life Project Consortium. Sequence Locally, Think Globally: The Darwin Tree of Life Project. Proc. Natl. Acad. Sci. USA 2022, 119, e2115642118.
  19. Sun, H.; Ding, J.; Piednoël, M.; Schneeberger, K. FindGSE: Estimating Genome Size Variation within Human and Arabidopsis Using k-Mer Frequencies. Bioinformatics 2018, 34, 550–557.
  20. Mérot, C.; Oomen, R.A.; Tigano, A.; Wellenreuther, M. A Roadmap for Understanding the Evolutionary Significance of Structural Genomic Variation. Trends Ecol. Evol. 2020, 35, 561–572.
  21. Alonge, M.; Wang, X.; Benoit, M.; Soyk, S.; Pereira, L.; Zhang, L.; Suresh, H.; Ramakrishnan, S.; Maumus, F.; Ciren, D.; et al. Major Impacts of Widespread Structural Variation on Gene Expression and Crop Improvement in Tomato. Cell 2020, 182, 145–161.e23.
  22. Battlay, P.; Wilson, J.; Bieker, V.C.; Lee, C.; Prapas, D.; Petersen, B.; Craig, S.; van Boheemen, L.; Scalone, R.; de Silva, N.P.; et al. Large Haploblocks Underlie Rapid Adaptation in the Invasive Weed Ambrosia artemisiifolia. Nat. Commun. 2023, 14, 1717.
  23. Harringmeyer, O.S.; Hoekstra, H.E. Chromosomal Inversion Polymorphisms Shape the Genomic Landscape of Deer Mice. Nat. Ecol. Evol. 2022, 6, 1965–1979.
  24. Xu, X.; Legay, S.; Berni, R.; Hausman, J.F.; Guerriero, G. Transcriptomic Changes in Internode Explants of Stinging Nettle during Callogenesis. Int. J. Mol. Sci. 2021, 22, 12319.
  25. Ou, S.; Chen, J.; Jiang, N. Assessing Genome Assembly Quality Using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018, 46, e126.
  26. Elphinstone, C.; Elphinstone, R.; Todesco, M.; Rieseberg, L. RepeatOBserver: Tandem Repeat Visualization and Centromere Detection. bioRxiv 2023. 2023.12.30.573697.
  27. Naish, M.; Alonge, M.; Wlodzimierz, P.; Tock, A.J.; Abramson, B.W.; Schmücker, A.; Mandáková, T.; Jamge, B.; Lambing, C.; Kuo, P.; et al. The Genetic and Epigenetic Landscape of the Arabidopsis Centromeres. Science 2021, 374, eabi7489.
  28. Xuan, Y.; Ma, B.; Li, D.; Tian, Y.; Zeng, Q.; He, N. Chromosome Restructuring and Number Change during the Evolution of Morus notabilis and Morus alba. Hortic. Res. 2022, 9, uhab030.
  29. Wang, Y.; Li, F.; He, Q.; Bao, Z.; Zeng, Z.; An, D.; Zhang, T.; Yan, L.; Wang, H.; Zhu, S.; et al. Genomic Analyses Provide Comprehensive Insights into the Domestication of Bast Fiber Crop Ramie (Boehmeria nivea). Plant J. 2021, 107, 787–800.
  30. Rice, A.; Glick, L.; Abadi, S.; Einhorn, M.; Kopelman, N.M.; Salman-Minkov, A.; Mayzel, J.; Chay, O.; Mayrose, I. The Chromosome Counts Database (CCDB)—A Community Resource of Plant Chromosome Numbers. New Phytol. 2015, 206, 19–26.
  31. Sharma, M.L.; Mehra, P.N. Chromosome Numbers in Some East Himalayan Urticaceae. Cytologia 1979, 44, 799–803.
  32. de Lange, P.J.; Murray, B.G. Contributions to a Chromosome Atlas of the New Zealand Flora—37. Miscellaneous Families. N. Z. J. Bot. 2002, 40, 1–23.
  33. Huang, X.; Deng, T.; Moore, M.J.; Wang, H.; Li, Z.; Lin, N.; Yusupov, Z.; Tojibaev, K.S.; Wang, Y.; Sun, H. Tropical Asian Origin, Boreotropical Migration and Long-Distance Dispersal in Nettles (Urticeae, Urticaceae). Mol. Phylogenet. Evol. 2019, 137, 190–199.
  34. Wu, Z.Y.; Monro, A.K.; Milne, R.I.; Wang, H.; Yi, T.S.; Liu, J.; Li, D.Z. Molecular Phylogeny of the Nettle Family (Urticaceae) Inferred from Multiple Loci of Three Genomes and Extensive Generic Sampling. Mol. Phylogenet. Evol. 2013, 69, 814–827.
  35. Pollard, A.J.; Briggs, D. Genecological Studies of Urtica dioica L. New Phytol. 1984, 97, 507–522.
  36. Emmelin, N.; Feldberg, W. The Mechanism of the Sting of the Common Nettle (Urtica urens). J. Physiol. 1947, 106, 440–455.
  37. Collier, H.O.J.; Chesher, G.B. Identification of 5-Hydroxytryptamine in the Sting of the Nettle (Urtica dioica). Br. J. Pharmacol. Chemother. 1956, 11, 186–189.
  38. Stec, B. Plant Thionins—The Structural Perspective. Cell. Mol. Life Sci. 2006, 63, 1370–1385.
  39. Doležel, J.; Greilhuber, J.; Suda, J. Estimation of Nuclear DNA Content in Plants Using Flow Cytometry. Nat. Protoc. 2007, 2, 2233–2244.
  40. Baack, E.J.; Whitney, K.D.; Rieseberg, L.H. Hybridization and Genome Size Evolution: Timing and Magnitude of Nuclear DNA Content Increases in Helianthus Homoploid Hybrid Species. New Phytol. 2005, 167, 623–630.
  41. Stoffel, K.; van Leeuwen, H.; Kozik, A.; Caldwell, D.; Ashrafi, H.; Cui, X.; Tan, X.; Hill, T.; Reyes-Chin-Wo, S.; Truco, M.-J.; et al. Development and Application of a 6.5 Million Feature Affymetrix Genechip® for Massively Parallel Discovery of Single Position Polymorphisms in Lettuce (Lactuca spp.). BMC Genom. 2012, 13, 185.
  42. Rohland, N.; Reich, D. Cost-Effective, High-Throughput DNA Sequencing Libraries for Multiplexed Target Capture. Genome Res. 2012, 22, 939–946.
  43. Todesco, M.; Owens, G.L.; Bercovich, N.; Légaré, J.S.; Soudi, S.; Burge, D.O.; Huang, K.; Ostevik, K.L.; Drummond, E.B.M.; Imerovski, I.; et al. Massive Haplotypes Underlie Ecotypic Differentiation in Sunflowers. Nature 2020, 584, 602–607.
  44. Padmarasu, S.; Himmelbach, A.; Mascher, M.; Stein, N. In Situ Hi-C for Plants: An Improved Method to Detect Long-Range Chromatin Interactions. In Plant Long Non-Coding RNAs: Methods and Protocols; Chekanova, J.A., Wang, H.-L.V., Eds.; Springer: New York, NY, USA, 2019; pp. 441–472. ISBN 978-1-4939-9045-0.
  45. Wang, N.; Liu, C. Study of Cell-Type-Specific Chromatin Organization: In Situ Hi-C Library Preparation for Low-Input Plant Materials. In Plant Epigenetics and Epigenomics: Methods and Protocols; Spillane, C., McKeown, P., Eds.; Springer US: New York, NY, USA, 2020; pp. 115–127. ISBN 978-1-0716-0179-2.
  46. Rhie, A.; Walenz, B.P.; Koren, S.; Phillippy, A.M. Merqury: Reference-Free Quality, Completeness, and Phasing Assessment for Genome Assemblies. Genome Biol. 2020, 21, 245.
  47. Cheng, H.; Concepcion, G.T.; Feng, X.; Zhang, H.; Li, H. Haplotype-Resolved de Novo Assembly Using Phased Assembly Graphs with Hifiasm. Nat. Methods 2021, 18, 170–175.
  48. Durand, N.C.; Shamim, M.S.; Machol, I.; Rao, S.S.P.; Huntley, M.H.; Lander, E.S.; Aiden, E.L. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016, 3, 95–98.
  49. Dudchenko, O.; Batra, S.S.; Omer, A.D.; Nyquist, S.K.; Hoeger, M.; Durand, N.C.; Shamim, M.S.; Machol, I.; Lander, E.S.; Aiden, A.P.; et al. De Novo Assembly of the Aedes aegypti Genome Using Hi-C Yields Chromosome-Length Scaffolds. Science 2017, 356, 92–95.
  50. Durand, N.C.; Robinson, J.T.; Shamim, M.S.; Machol, I.; Mesirov, J.P.; Lander, E.S.; Aiden, E.L. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 2016, 3, 99–101.
  51. Li, H. Minimap2: Pairwise Alignment for Nucleotide Sequences. Bioinformatics 2018, 34, 3094–3100.
  52. Li, H. New Strategies to Improve Minimap2 Alignment Accuracy. Bioinformatics 2021, 37, 4572–4574.
  53. Goel, M.; Sun, H.; Jiao, W.B.; Schneeberger, K. SyRI: Finding Genomic Rearrangements and Local Sequence Differences from Whole-Genome Assemblies. Genome Biol. 2019, 20, 277.
  54. Jain, C.; Rhie, A.; Hansen, N.F.; Koren, S.; Phillippy, A.M. Long-Read Mapping to Repetitive Reference Sequences Using Winnowmap2. Nat. Methods 2022, 19, 705–710.
  55. Alonge, M.; Soyk, S.; Ramakrishnan, S.; Wang, X.; Goodwin, S.; Sedlazeck, F.J.; Lippman, Z.B.; Schatz, M.C. RaGOO: Fast and Accurate Reference-Guided Scaffolding of Draft Genomes. Genome Biol. 2019, 20, 224.
  56. Wick, R.R.; Schultz, M.B.; Zobel, J.; Holt, K.E. Bandage: Interactive Visualization of de Novo Genome Assemblies. Bioinformatics 2015, 31, 3350–3352.
  57. Bushnell, B. BBMap: A Fast, Accurate, Splice-Aware Aligner. In Proceedings of the 9th Annual Genomics of Energy & Environment Meeting, Walnut Creek, CA, USA, 17–20 March 2014.
  58. Simão, F.A.; Waterhouse, R.M.; Ioannidis, P.; Kriventseva, E.V.; Zdobnov, E.M. BUSCO: Assessing Genome Assembly and Annotation Completeness with Single-Copy Orthologs. Bioinformatics 2015, 31, 3210–3212.
  59. Manni, M.; Berkeley, M.R.; Seppey, M.; Simão, F.A.; Zdobnov, E.M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 2021, 38, 4647–4654.
  60. Girgis, H.Z. Red: An Intelligent, Rapid, Accurate Tool for Detecting Repeats de-Novo on the Genomic Scale. BMC Bioinform. 2015, 16, 227.
  61. Ranallo-Benavidez, T.R.; Jaron, K.S.; Schatz, M.C. GenomeScope 2.0 and Smudgeplot for Reference-Free Profiling of Polyploid Genomes. Nat. Commun. 2020, 11, 1432.
  62. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A Flexible Trimmer for Illumina Sequence Data. Bioinformatics 2014, 30, 2114–2120.
  63. Marçais, G.; Kingsford, C. A Fast, Lock-Free Approach for Efficient Parallel Counting of Occurrences of k-Mers. Bioinformatics 2011, 27, 764–770.
  64. Brůna, T.; Hoff, K.J.; Lomsadze, A.; Stanke, M.; Borodovsky, M. BRAKER2: Automatic Eukaryotic Genome Annotation with GeneMark-EP+ and AUGUSTUS Supported by a Protein Database. NAR Genom. Bioinform. 2021, 3, lqaa108.
  65. Gabriel, L.; Brůna, T.; Hoff, K.J.; Ebel, M.; Lomsadze, A.; Borodovsky, M.; Stanke, M. BRAKER3: Fully Automated Genome Annotation Using RNA-Seq and Protein Evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. bioRxiv 2023. 2023.06.10.544449.
  66. Hoff, K.J.; Lange, S.; Lomsadze, A.; Borodovsky, M.; Stanke, M. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 2016, 32, 767–769.
  67. Danecek, P.; Bonfield, J.K.; Liddle, J.; Marshall, J.; Ohan, V.; Pollard, M.O.; Whitwham, A.; Keane, T.; McCarthy, S.A.; Davies, R.M.; et al. Twelve Years of SAMtools and BCFtools. Gigascience 2021, 10, giab008.
  68. Waterhouse, R.M.; Tegenfeldt, F.; Li, J.; Zdobnov, E.M.; Kriventseva, E.V. OrthoDB: A Hierarchical Catalog of Animal, Fungal and Bacterial Orthologs. Nucleic Acids Res. 2013, 41, D358–D365.
  69. Kuznetsov, D.; Tegenfeldt, F.; Manni, M.; Seppey, M.; Berkeley, M.; Kriventseva, E.V.; Zdobnov, E.M. OrthoDB V11: Annotation of Orthologs in the Widest Sampling of Organismal Diversity. Nucleic Acids Res. 2023, 51, D445–D451.
  70. Ou, S.; Su, W.; Liao, Y.; Chougule, K.; Agda, J.R.A.; Hellinga, A.J.; Lugo, C.S.B.; Elliott, T.A.; Ware, D.; Peterson, T.; et al. Benchmarking Transposable Element Annotation Methods for Creation of a Streamlined, Comprehensive Pipeline. Genome Biol. 2019, 20, 275.
  71. Krzywinski, M.I.; Schein, J.E.; Birol, I.; Connors, J.; Gascoyne, R.; Horsman, D.; Jones, S.J.; Marra, M.A. Circos: An Information Aesthetic for Comparative Genomics. Genome Res. 2009, 19, 1639–1645.
  72. Lovell, J.T.; Sreedasyam, A.; Schranz, M.E.; Wilson, M.; Carlson, J.W.; Harkess, A.; Emms, D.; Goodstein, D.M.; Schmutz, J. GENESPACE Tracks Regions of Interest and Gene Copy Number Variation across Multiple Genomes. Elife 2022, 11, e78526.
  73. Emms, D.M.; Kelly, S. OrthoFinder: Phylogenetic Orthology Inference for Comparative Genomics. Genome Biol. 2019, 20, 238.
  74. Wang, Y.; Tang, H.; DeBarry, J.D.; Tan, X.; Li, J.; Wang, X.; Lee, T.; Jin, H.; Marler, B.; Guo, H.; et al. MCScanX: A Toolkit for Detection and Evolutionary Analysis of Gene Synteny and Collinearity. Nucleic Acids Res. 2012, 40, e49.
  75. Li, H. Protein-to-Genome Alignment with Miniprot. Bioinformatics 2023, 39, btad014.
  76. Bourque, G.; Burns, K.H.; Gehring, M.; Gorbunova, V.; Seluanov, A.; Hammell, M.; Imbeault, M.; Izsvák, Z.; Levin, H.L.; Macfarlan, T.S.; et al. Ten Things You Should Know about Transposable Elements. Genome Biol. 2018, 19, 199.
  77. Huang, K.; Rieseberg, L.H. Frequency, Origins, and Evolutionary Role of Chromosomal Inversions in Plants. Front. Plant Sci. 2020, 11, 296.
Figure 1. Haplotype-resolved assembly of a female Urtica dioica ssp. dioica individual (2n = 26). The two haplotypes are compared (H1 on the right, blue; H2 on the left, yellow). The tracks in the Circos plot represent (a) aligned regions between haplotypes, (b) gene density, (c) TE density, (d) Shannon diversity score of repeats, and (e) predicted centromeric regions, highlighted in black on the ideogram.
Figure 2. Chromosome structure of the U. dioica genome: (a) Hi-C contact map of haplotype H1 and (b) alignment between haplotypes H1 and H2 of the U. dioica ssp. dioica genome assembly presented in this study. Only the 13 chromosomes are shown in (a); smaller additional contigs were not plotted for clarity. In (b), blue represents forward-strand alignments and red represents reverse-strand alignments.
Figure 3. The 18.4 Mbp inversion found between haplotypes on chromosome 8 is supported by Hi-C data. In the left panel, Hi-C data are aligned to both haplotypes simultaneously to create a haplotype-aware heatmap. Green lines represent contigs. Only the section of the genome-wide haplotype-aware heatmap corresponding to chromosome 8 is shown. Contigs at the top left correspond to the H1 version of chromosome 8, and contigs at the bottom right correspond to the H2 version. The right panel shows changes in the contact map when Hi-C reads are mapped to modified versions of the assembly, in which the orientation of the large putative inversion on chromosome 8 is flipped in H1 (top) or H2 (bottom). In both cases, the haplotype-aware Hi-C patterns are disrupted, which supports opposite orientations of the inversion in the two haplotypes.
Figure 4. Fourier transform spectra of repeat occurrence in (a) Arabidopsis thaliana chromosome 5, with a metacentric centromeric signal; (b) Morus notabilis chromosome 2, representative of a holocentric chromosome (Elphinstone et al., 2023 [26]); and (c) U. dioica ssp. dioica chromosome 1 (this study). Colour intensity corresponds to the number of times a specific repeat is found in a 5 kbp window. Bright horizontal lines indicate a sequence that is repeated many times across that region of the chromosome, such as the tandem repeats found in telomeric and centromeric regions. Multiple bands in those regions represent harmonics of the base repeat sequence.
Figure 5. Comparison of chromosome organization in four species of the Urticaceae family. Estimated divergence times are based on [33]. Although the Boehmeria nivea genome is expected to have 14 chromosomes, the 15th largest scaffold in the assembly contained more than 300 genes and was therefore included in the figure. The placement of that section of the genome varies considerably across species, but comparison with Urtica spp. suggests that it might be part of chromosome 12 in B. nivea.
Table 1. Genome assembly statistics. Heterozygosity was estimated from short-read whole-genome sequencing data generated for this study.
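| Parameter | Haplotype 1 | Haplotype 2 |
|---|---|---|
| Total length (bp) | 574,934,600 | 521,157,583 |
| Contig number | 1459 | 376 |
| Scaffold number | 1598 | 248 |
| Contig N50 (Mbp) | 10.89 | 13.53 |
| Scaffold N50 (Mbp) | 43.96 | 47.99 |
| % of main genome in scaffolds > 50 kbp | 92.59 | 98.98 |
| % of genome anchored to 13 chromosomes | 89.52 | 97.83 |
| BUSCO complete (C, %) | 92.6 | 92.2 |
| BUSCO complete, single-copy (S, %) | 90.5 | 90.1 |
| BUSCO complete, duplicated (D, %) | 2.1 | 2.1 |
| QV score (Merqury) | 42.05 | 44.45 |
| k-mer completeness (%; Merqury) | 81.72 | 81.61 |
| Number of protein-coding genes annotated | 20,333 | 20,140 |
| Protein BUSCO (complete, %) | 90.5 | 90.4 |
| Transposable element (TE) coverage (%) | 69.1 | 68.6 |
| Whole-genome LTR Assembly Index (LAI) | 16.96 | 11.15 |
| Heterozygosity (%) | 1.53 (single value for the diploid genome) | |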
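Two of the Table 1 metrics follow standard definitions that can be illustrated in a few lines. N50 is the length L such that sequences of length at least L contain at least half of the assembled bases, and Merqury's QV is Phred-scaled, so the per-base error rate is 10^(-QV/10). The sketch below computes both; the function names are illustrative and not taken from any particular assembly tool, and it only converts a reported QV rather than re-deriving it from k-mers as Merqury does.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L cover at least half of the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() requires at least one length")


def qv_to_error_rate(qv: float) -> float:
    """Invert the Phred scale: QV = -10 * log10(error rate)."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    print(n50([5, 4, 3, 2, 1]))  # 4, because 5 + 4 = 9 >= 15 / 2
    for hap, qv in (("Haplotype 1", 42.05), ("Haplotype 2", 44.45)):
        err = qv_to_error_rate(qv)
        print(f"{hap}: error rate {err:.2e}, about 1 error per {1 / err:,.0f} bp")
```

Under this conversion, the QV scores of 42.05 and 44.45 correspond to roughly one base error per 16 kbp and per 28 kbp, respectively. The BUSCO rows are also consistent with the usual convention that complete (C) equals single-copy (S) plus duplicated (D): 90.5 + 2.1 = 92.6 and 90.1 + 2.1 = 92.2.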
Table 2. Identification of U. dioica homologs of pain-inducing peptides identified in other stinging species. Nt. = nucleotides; Aa = amino acids.
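| Peptide name | Species described | Nt. length | Aa length | Source | U. dioica chr. | Start | End | Nt. matches | Aa matches | Annotated mRNA ID |
|---|---|---|---|---|---|---|---|---|---|---|
| Urthionin A (Δ-Uf1a) | Urtica ferox | 126 | 42 | [11] | 09_H1 | 19832138 | 19832264 | 108 | 36 | NA |
| | | | | | 09_H1 | 19823191 | 19823317 | 108 | 36 | NA |
| | | | | | 09_H2 | 19679225 | 19679351 | 108 | 36 | NA |
| | | | | | 09_H2 | 19670282 | 19670408 | 108 | 36 | g13644.t1 |
| Urticatoxin (β/δ-Uf2a) | Urtica ferox | 189 | 63 | [11] | 06_H1 | 4057458 | 4057644 | 90 | 30 | NA |
| | | | | | 06_H2 | 4669732 | 4669918 | 90 | 30 | NA |
| Urticatoxin (β/δ-Uf2b) | Urtica ferox | 189 | 63 | [11] | 06_H1 | 4057458 | 4057644 | 93 | 31 | NA |
| | | | | | 06_H2 | 4669732 | 4669918 | 93 | 31 | NA |
| Urticatoxin (β/δ-De2a) | Dendrocnide excelsa | 180 | 60 | [11] | NA | NA | NA | NA | NA | NA |
| Urticatoxin (β/δ-Dm2a) | Dendrocnide moroides | 183 | 61 | [11] | 09_H1 | 27999734 | 27999800 | 48 | 16 | g14471.t1 |
| | | | | | 09_H2 | 27490551 | 27499780 | 78 | 26 | NA |
| Excelsatoxin A | Dendrocnide excelsa | 105 | 35 | [12] | NA | NA | NA | NA | NA | NA |
| Moroidotoxin A | Dendrocnide moroides | 105 | 35 | [12] | NA | NA | NA | NA | NA | NA |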
MDPI and ACS Style

Hirabayashi, K.; Dumigan, C.R.; Kučka, M.; Percy, D.M.; Guerriero, G.; Cronk, Q.; Deyholos, M.K.; Todesco, M. A High-Quality Phased Genome Assembly of Stinging Nettle (Urtica dioica ssp. dioica). Plants 2025, 14, 124. https://doi.org/10.3390/plants14010124