### *2.2. Pan-Genome*

Although an increasing amount of reference genome data is available for *Brassica* species, a high-quality reference genome composed of genome data from multiple individuals within a species is required for comparative studies and for evaluating genetic diversity, which is essential for breeding improvement [57]. The original effort to assemble a core genome representing a population of *Streptococcus agalactiae* [60] introduced the term "pan-genome". Pan-genomics exploits the genomic content of all individual lines of the species under study, including closely related species such as crop wild relatives, and differentiates core genes, which are present in every individual of the species, from variable genes, which are present in only one or some individuals [61,62]. Genomic variation takes many forms, including SNPs, copy number variation, and presence-absence variation (PAV). These variants may arise from many rounds of DNA exchange within or between the (sub)genomes, accumulated within a species during evolution from the ancestral species and through subsequent breeding with different genetic backgrounds, resulting in high species diversity [63]. The first pan-genome of *B. rapa* was constructed by comparing genomic data from three *B. rapa* accessions: the Chinese cabbage cultivar Chiifu, an oil-like rapid cycling line (RC-144), and a Japanese vegetable turnip [64]. It is 283.84 Mb in size and includes 41,052 predicted gene models, of which 90% are common genes and 9% are unique genes found in only one of the three genomes. The pan-genome of *B. oleracea*, constructed by an iterative assembly approach, revealed that variable genes make up just over 20% of the pan-genome: 12,806 out of 61,379 genes [65], whilst in *B. napus* over one third of the genes were variable: 35,481 out of the 94,013 genes in the pan-genome [66]. Recently, a de novo approach yielded a pan-genome representing nine *B. napus* genomes, comprising 58,714 (~56%) core gene clusters present in at least seven genomes, 44,035 (~42%) dispensable gene clusters, and around 2% specific genes [57].
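The core/variable distinction and the proportions quoted above can be illustrated with a minimal sketch. It assumes a toy presence-absence table and is not the pipeline used in any of the cited studies; the function names `classify_genes` and `variable_fraction`, the accession labels, and the gene IDs are hypothetical. The optional `core_min` threshold mirrors the relaxed definition in [57], where clusters present in at least seven of nine genomes are counted as core.

```python
from typing import Dict, Optional, Set


def classify_genes(
    presence: Dict[str, Set[str]],
    n_accessions: int,
    core_min: Optional[int] = None,
) -> Dict[str, str]:
    """Label each gene as 'core', 'dispensable', or 'specific'.

    presence maps a gene ID to the set of accessions that carry it.
    core_min is the minimum number of accessions for a gene to count
    as core; it defaults to n_accessions (present in every individual).
    """
    threshold = n_accessions if core_min is None else core_min
    labels = {}
    for gene, carriers in presence.items():
        count = len(carriers)
        if count >= threshold:
            labels[gene] = "core"
        elif count == 1:
            labels[gene] = "specific"      # found in a single accession
        else:
            labels[gene] = "dispensable"   # found in some, not all
    return labels


def variable_fraction(n_variable: int, n_total: int) -> float:
    """Share of pan-genome genes that are variable (not core)."""
    return n_variable / n_total


# Toy presence-absence data for three hypothetical accessions.
toy = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1", "acc2"},
    "geneC": {"acc3"},
}
labels = classify_genes(toy, n_accessions=3)
print(labels)

# Gene counts quoted in the text above.
print(f"B. oleracea variable genes: {variable_fraction(12_806, 61_379):.1%}")  # 20.9%
print(f"B. napus variable genes: {variable_fraction(35_481, 94_013):.1%}")     # 37.7%
```

Passing `core_min=7` with `n_accessions=9` would reproduce the "present in at least seven genomes" rule, at the cost of labeling some genes as core even though they are absent from one or two accessions.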

Further dissection of gene functions within pan-genomes across species showed that dispensable genes may have higher mutation rates and be less functionally conserved than core genes, and that their functions are associated with response mechanisms to biotic and abiotic signals [62]. In the pan-genome of *B. oleracea*, Bayer et al. [67] found that the most abundant resistance gene analog (RGA) candidates in the additional pan-genome contigs were nucleotide-binding site leucine-rich repeat (NBS-LRR) genes, compared to receptor-like kinase (RLK) and receptor-like protein (RLP) genes, and identified 59 RGA candidates linked to known QTL for *Sclerotinia*, clubroot, and *Fusarium* resistance. Of the 1749 RGAs in the pan-genome of *B. napus* [68], 753 are variable, and 368 of these are not present in the reference genome (cv. Darmor-bzh), suggesting that many genes related to plant resistance mechanisms may be unknown. This is consistent with the fact that the core genes (10,584 SNPs identified, 70.97%) contained more SNPs than variable genes (4734 SNPs identified in 299 genes), and 688 SNPs were found in 106 RGAs within QTL (*LepR1*, *LepR2*, *Rlm1*, *Rlm3*, *Rlm4*, *Rlm7*, and *Rlm9*) conferring blackleg resistance in *B. napus* [68]. Transcriptomic analysis coupled with pathotyping could assist functional annotation of RGA candidates [69,70], while their roles in response mechanisms could be validated using gene technology and/or CRISPR/Cas tools [71]. Pan-genomes of *Brassica* oilseed crops provide insights into the genetic variation underlying phenotypic variation and differential trait expression among individuals within a species, which supports the discovery of new candidate genes. Higher coverage of pan-genomes allows more precise characterization and prediction of the gene content of individuals within a species [63].

### **3. Breeding for Economically Important Agronomic Traits of** *B. napus*

### *3.1. Oil Content and Specialty as Priority*

Canola oil, the main product of *B. napus*, is the most consumed vegetable oil worldwide [72]. Canola food oil has a unique fatty acid profile compared with other vegetable oils: it is low in saturated fatty acids (SFAs), typically 7%, and high in monounsaturated fatty acids (MUFAs) and polyunsaturated fatty acids (PUFAs), comprising 61% oleic acid, 21% linoleic acid, and 11% alpha-linolenic acid (ALA) [73]. Phytosterols and vitamin E are additional healthy components of canola oil [74]. The clinically proven health benefits of a canola oil-based diet include reduced blood glucose levels and risk of coronary heart disease, enhanced immunity, and prevention of tumor cell growth, to which MUFAs, PUFAs, and vitamin E contribute [75]. For cooking purposes and biofuel production, canola oil with high oleic acid (C18:1) and low linolenic acid (C18:3) is preferred because it provides high stability and a longer shelf-life [12,76].

Tailored canola fatty acid profiles are required for specific industries and industrial applications, and these were achieved through conventional breeding and mutagenesis during the 1990s [76] (Figure 1). To meet the demands of the growing cooking oil market, the priority for improving canola varieties is to increase the oleic acid (C18:1 or ω-9) level and decrease linolenic acid (C18:3 or ω-3) and SFAs, with specific targets varying by country. For example, in Australia, the aim is 67–75% oleic acid and less than 3% linolenic acid [77]. Expanding the acreage of high oleic and specialty canola varieties (to 33% of canola acres) is also part of Canada's plan for the canola crop in the period 2015–2025, along with further reducing SFAs to below 6.8%, and in particular palmitic and stearic acid levels to below 4% [76,78]. To maximize canola uptake in both the biofuel and feed industries, varieties with high oleic acid and extremely low glucosinolate content are targeted [12]. High oil yield (40–45% of the mass) and low production cost make canola a potential crop for producing high-demand fatty acids [36].

**Figure 1.** Desired optimal traits in current canola varieties, specifically for oil and meal content, herbicide, and disease resistance.

The biosynthesis of the major C18 unsaturated fatty acids (UFAs) in prokaryotes and eukaryotes has been well documented since the 1990s, with synthesis pathways, key fatty acid synthetic enzymes, and key regulating factors identified [79]. In *B. napus*, fatty acid desaturase (*FAD*) genes, specifically *FAD2* and *FAD3*, together with stearoyl-acyl carrier protein desaturase (*SAD*), which catalyzes the desaturation of stearic acid (C18:0) to C18:1, are a focus for controlling oil quality [79]; however, they were also found to be involved in abiotic stress tolerance, such as under high cadmium (250 μM) and salinity (100 mM) conditions [80]. Elevated oleic acid levels were reported in lines with mutated *FAD2* genes [35,81]. Targeted mutagenesis of a *FAD2-Aa* allele of *B. napus* using CRISPR/Cas9 produced a heritable mutant allele, *fad2\_Aa*, with a 4-bp deletion, which was separated from the transgenes by backcrossing [35]. Genome-wide analysis of the *FAD* gene family identified 84, 45, and 44 *FAD* genes in the *B. napus*, *B. rapa*, and *B. oleracea* genomes, respectively, with different distributions; soluble and membrane-bound *FAD*s were assigned to four and six subfamilies, respectively [82]. Based on 201,187 SNP markers developed from SLAF-seq (specific locus amplified fragment sequencing) and GWAS of four important fatty acid content traits (erucic acid, oleic acid, linoleic acid, and linolenic acid), 148 SNP loci were found to be significantly associated with these traits, and 20 orthologs of candidate genes regulating fatty acid biosynthesis (fatty acid synthesis, desaturation, elongation, and metabolism) and 14 candidate genes on chromosomes A8 and C3 with potential contributions were identified [83]. The roles of *BnTT8* genes in *B. napus*, part of the conserved gene complex controlling flavonoid accumulation in crop plants, have been elucidated by using CRISPR/Cas9 to induce targeted mutations at the *BnA09.TT8* and *BnC09.TT8b* genes; the mutant lines produced seeds with a yellow seed coat, a significant increase in seed oil and protein content, and altered fatty acid composition [84].

High-demand long-chain unsaturated fatty acids, such as eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA), which were once obtained exclusively from fish, are now novel components of transgenic *B. napus* lines [36,85]. In 2005, transformation of *B. juncea* using *A. tumefaciens* carrying six different constructs of desaturase genes from *Thraustochytrium* sp., *Pythium irregulare*, and *Calendula officinalis*, and a fatty acid elongase from *Physcomitrella patens*, produced very long chain PUFAs such as arachidonic acid (AA) and EPA at levels of up to 25% and 15% of total seed fatty acids, respectively; however, the DHA biosynthetic pathway needed further optimization [34]. Using a similar approach, Walsh et al. [85] produced transgenic *B. napus* expressing PUFA synthases (*OrfA*, *OrfB*, and hybrid *OrfC*) from *Schizochytrium* sp. ATCC 20888 (Schizo20888). These microalgal PUFA synthases assemble C2 units from malonyl-CoA into long chain PUFAs in the cytoplasm, using locally supplied NADPH. The average DHA content in T2 seeds from the inbred lines of the selected events was around 2.87–3.43%, and the total content of DHA and EPA was around 4.4% in field-produced canola oil. In the latest transgenic DHA canola variety, developed by Petrie et al. [36], seven fatty acid biosynthesis genes from yeast and microalgae were synthesized de novo as a single fragment of 19,750 bp; these genes drive DHA production through the delta-6-desaturase aerobic long-chain (C20) polyunsaturated fatty acid synthesis pathway in *B. napus*. DHA levels of 9 to 11%, similar to those obtained from fish, were obtained from the best transformation event (NS-B50027-4) in open field trials in Australia and Canada. Given that DHA canola is approved for cultivation and for human and animal consumption of its oil in Australia, and has cultivation approval in the US [86,87], the production of other valuable fatty acids through genetic engineering tools is quite promising.

Regarding renewable and sustainable resources, erucic acid from canola oil is an excellent raw material for producing polymers used in the film manufacture, nylon, lubricant, and emollient industries [13,17]. Furthermore, with the increasing demand for canola oil for food and efforts to promote renewable resources, high erucic acid rapeseed (HEAR, 45–60% erucic acid) varieties have been grown in European countries for biofuel and as a raw oil supplement (<2% of total weight of crushed seeds) for human consumption [88,89]. By overexpressing the fatty acid elongase gene (*fae1*) simultaneously with the lysophosphatidic acid acyltransferase gene from *Limnanthes douglasii* (Ld-LPAAT), erucic acid synthesis became predominant over PUFA synthesis in the competition for incorporation into the triacylglycerol backbone in transgenic lines, which resulted in an increase in erucic acid content of up to 72% and a PUFA content as low as 6% [13]. The roles of the LPAAT genes of *B. napus*, BnLPAT2 and BnLPAT5, in regulating oil biosynthesis have recently been confirmed by targeted mutations using Cas9 with single-gRNAs and multi-sgRNAs, whereby the resulting *Bnlpat2* and *Bnlpat5* mutants showed decreased oil content and enlarged oil bodies [33]. Cytoplasmic genomes are also attracting more interest as a new approach for maximizing oil content in canola [90].
