*HWA1***- and** *HWA2***-Mediated Hybrid Weakness in Rice Involves Cell Death, Reactive Oxygen Species Accumulation, and Disease Resistance-Related Gene Upregulation**

**Kumpei Shiragaki <sup>1</sup>, Takahiro Iizuka <sup>1</sup>, Katsuyuki Ichitani <sup>2</sup>, Tsutomu Kuboyama <sup>3</sup>, Toshinobu Morikawa <sup>1</sup>, Masayuki Oda <sup>1</sup> and Takahiro Tezuka <sup>1,4,</sup>\***


Received: 28 August 2019; Accepted: 24 October 2019; Published: 25 October 2019

**Abstract:** Hybrid weakness is a type of reproductive isolation in which F1 hybrids of normal parents exhibit weaker growth characteristics than their parents. The F1 hybrid of the *Oryza sativa* Indian cultivars 'P.T.B.7' and 'A.D.T.14' exhibits hybrid weakness that is associated with the *HWA1* and *HWA2* loci. Accordingly, the aim of the present study was to analyze the hybrid weakness phenotype of the 'P.T.B.7' × 'A.D.T.14' hybrids. The height and tiller number of the F1 hybrid were lower than those of either parent, and the F1 hybrid also exhibited leaf yellowing that was not observed in either parent. In addition, the present study demonstrates that SPAD values, an index correlated with chlorophyll content, are effective for evaluating the progression of hybrid weakness that is associated with the *HWA1* and *HWA2* loci because they accurately reflect the degree of leaf yellowing. Both cell death and H2O2, a reactive oxygen species, were detected in the yellowing leaves of the F1 hybrid. Furthermore, disease resistance-related genes were upregulated in the yellowing leaves of the F1 hybrids, whereas photosynthesis-related genes tended to be downregulated. These results suggest that the hybrid weakness associated with the *HWA1* and *HWA2* loci involves hypersensitive response-like mechanisms.

**Keywords:** *Oryza sativa*; hybrid weakness; cell death; reactive oxygen species; leaf yellowing; SPAD; hypersensitive response

#### **1. Introduction**

The traits of existing crop cultivars can be improved by crossing cultivars or lines to introduce beneficial traits, such as resistance or tolerance to disease or stress, to susceptible cultivars. However, because reproductive isolation mechanisms can hinder the production of hybrids, methods must be developed to overcome the underlying mechanisms of such reproductive isolation.

One type of post-zygotic reproductive isolation is hybrid weakness (also referred to as hybrid lethality or hybrid necrosis). F1 hybrids that exhibit this phenomenon are characterized by weaker growth than their parents, and the phenomenon has been reported to occur in the offspring of crosses involving a number of species, including *Oryza sativa* [1–4], *Nicotiana* spp. [5], *Capsicum* spp. [6], *Arabidopsis thaliana* [7], *Triticum* spp. [8,9], *Gossypium* spp. [10], and *Phaseolus vulgaris* [11]. The genetic mechanisms of hybrid weakness are explained by the Bateson–Dobzhansky–Muller model [12–14], which posits that the reduced hybrid vigor is driven by deleterious interactions between genes at different loci. In many cases, one of the causal genes is related to disease resistance (*R*), and interactions between the *R* gene and the other causal gene cause autoimmune responses in the hybrid offspring [2,7]. The autoimmune responses include the accumulation of reactive oxygen species such as H2O2, cell death, upregulation of disease resistance-related genes, and downregulation of photosynthesis-related genes [2,7,15,16].

In rice, hybrid weakness has been reported to result from interactions between the *HWI1* locus, which encodes the LRR-RLK gene (*R* gene), and the *HWI2* locus, which encodes a subtilisin-like protease, and hybrids have been reported to exhibit localized programmed cell death (PCD), the high accumulation of salicylic and jasmonic acids, and amplified heat-related weakness symptoms [2]. These results demonstrate that the interaction of causal genes can activate downstream immune responses, such as hypersensitive response-like mechanisms [2,7].

The hybrid weakness that results from the interaction of *Hwa1-1*, a dominant allele of the *HWA1* locus, and *Hwa2-1*, a dominant allele of the *HWA2* locus, was first reported in rice by Oka [4]. In that study, F1 hybrid seedlings that exhibited hybrid weakness were reported to exhibit normal germination and seedling growth until developing three to four leaves, after which plant growth halted and the leaves yellowed. Then, unless the environment was particularly favorable, the plants died before reaching anthesis. The distributions of the *Hwa1-1* and *Hwa2-1* alleles were limited to Indian cultivars [4], and both the *HWA1* and *HWA2* loci were located in a 1637-kb region of the long arm of chromosome 11 [17]. However, the causal genes have not been identified, and the molecular mechanism underlying the hybrid weakness associated with the *HWA1* and *HWA2* loci remains unclear.

Accordingly, the aim of the present study was to characterize the phenotypes of the hybrid weakness that is associated with the *HWA1* and *HWA2* loci, in order to understand the system's underlying mechanisms. The effectiveness of SPAD values, an index correlated with chlorophyll content [18], for determining the progression of the hybrid weakness was also evaluated. In addition, the occurrence of cell death and H2O2 accumulation was evaluated, and the expression of disease resistance-related and photosynthesis-related genes in the leaves of F1 hybrids exhibiting hybrid weakness was analyzed.

#### **2. Results**

#### *2.1. Hybrid Weakness Phenotypes*

The *Oryza sativa* Indian cultivars 'A.D.T.14' and 'P.T.B.7' are homozygous for the *Hwa1-1* and *Hwa2-1* alleles, respectively. All the F1 hybrids of a cross between 'A.D.T.14' and 'P.T.B.7' exhibited dwarfism, reduced tiller number, and leaf yellowing (Figure 1). Increases in the height of the F1 hybrids nearly halted at 50 days after sowing (DAS), whereas the height of the parents continued increasing (Figure 2A). The progression of plant age in leaf number in the F1 hybrids was the same as that in both parents (Figure 2B). The tiller number of both parents continued increasing and reached >35 tillers at 70 DAS, whereas that of the F1 hybrids increased little and only reached five tillers by 70 DAS (Figure 2C). In addition, both parents headed by 80 DAS, whereas none of the F1 hybrids had started heading by 140 DAS (Figure 1).

Leaf yellowing was first observed in the F1 hybrids that had developed seventh or eighth leaves at 30 DAS. Afterward, the leaves turned yellow sequentially, from the lower to the upper leaves. At 60 DAS, the fourth, fifth, and sixth leaves of the F1 hybrids turned yellow, starting from the leaf tip, and progressing toward the leaf base, whereas those of both parents remained green (Figure 3A–C). Meanwhile, the SPAD values of the fourth, fifth, and sixth leaves of the F1 hybrids were lower than those of the parents (Figure 3D–F). Furthermore, in the leaves of the F1 hybrids, the SPAD values of the lower leaves were lower than those of the upper leaves (fourth vs. fifth and sixth leaves and fifth vs. sixth leaves), and within each leaf, the SPAD values of the leaf tips were lower than those of the leaf bases (Figure 3D–F).

**Figure 1.** Parental and F1 hybrid phenotypes at 80 days after sowing. (**A**) *Oryza sativa* 'A.D.T.14'; (**B**) F1 hybrid; and (**C**) *O. sativa* 'P.T.B.7'. Arrows indicate emerging panicles. Scale bars indicate 50 cm.

**Figure 2.** Phenotypic traits of parental and F1 hybrid rice. (**A**) plant height; (**B**) plant age in leaf number; and (**C**) tiller number. Values and error bars indicate mean ± SE values (*n* = 5), although some error bars are hidden by the symbols. Mid-parental and hybrid values were compared using two-tailed Student's *t*-test. Significance: \*\* *P* < 0.01, \* *P* < 0.05.

**Figure 3.** Phenotypes and SPAD values of leaves (fourth to sixth) from parental and F1 hybrid rice. The phenotypes (**A**–**C**) and SPAD values (**D**–**F**) of fourth (**A**, **D**), fifth (**B**, **E**), and sixth (**C**, **F**) leaves were assessed at 60 days after sowing. SPAD value was measured at the tip, middle, and base of each leaf. Values and error bars indicate mean ± SE values (*n* = 3). Different lowercase letters in each plot (**D**–**F**) indicate significant differences (Tukey HSD test, *P* < 0.05).

#### *2.2. Cell Death and H2O2 Accumulation*

The physiological changes that accompanied leaf yellowing were surveyed by analyzing F1 hybrid leaves that had been classified into four stages based on degree of yellowing (Figure 4A). Chlorophyll content was assessed by SPAD analysis and spectrophotometry. The SPAD values of Stage-1, -2, and -3 leaf tips, Stage-3 leaf middles, and Stage-3 leaf bases were lower than those of Stage-0 leaves (Figure 4B). The SPAD values of Stage-3 leaf tips, Stage-3 leaf middles, and Stage-3 leaf bases were lower than those of Stage-1 leaves (Figure 4B). The SPAD values of Stage-3 leaf middles and Stage-3 leaf bases were lower than those of Stage-2 leaves (Figure 4B). Total chlorophyll content also decreased in all leaf parts (tip, middle, and base) as yellowing progressed (Figure 4C). Because SPAD value proved useful, as discussed later, the progression of yellowing in the leaves used in the subsequent experiments was evaluated based on SPAD value.

To determine whether cell death occurred in F1 hybrid leaves, cellular ion leakage, which increases when dying cells lose membrane integrity, was measured. Ion leakage increased slightly in the tips of Stage-2 leaves and significantly in the tips of Stage-3 leaves (Figure 4D). Cell death in F1 hybrid leaves was also evaluated using trypan blue staining, which stains dead cells because their membranes are highly permeable. Only Stage-3 leaves contained dead cells (Figure 5A), and the analysis of transverse sections of Stage-3 leaves revealed that the dead cells were located around vascular and epidermal cells (Figure 5B).

Meanwhile, 3,3′-diaminobenzidine (DAB) staining revealed the presence of H2O2, which, as a reactive oxygen species, is an important regulator of cell death. Plant tissue is stained brown when DAB is oxidized by H2O2 into an insoluble polymer. H2O2 was detected in the leaves of all stages, except Stage 0 (Figure 6).

**Figure 4.** Physiological changes of yellowing hybrid leaves. (**A**) Stages of yellowing: Stage 0, no yellowing; Stage 1, 1/4 of leaf yellow; Stage 2, 1/2 of leaf yellow; Stage 3, 3/4 of leaf yellow. Scale bars indicate 2 cm. (**B**) Changes in the SPAD values at the tip, middle, and base of leaves during the progression of yellowing. (**C**) Changes in chlorophyll content during the progression of yellowing. (**D**) Changes in ion leakage during the progression of yellowing. Values and error bars (**B**–**D**) indicate mean ± SE values (*n* = 3), and different lowercase letters in each plot (**B**–**D**) indicate significant differences (Tukey HSD test, *P* < 0.05).

**Figure 5.** Trypan blue staining of dead cells in F1 leaves. (**A**) Stained leaves from each yellowing stage. (**B**) Transverse section of a stained Stage-3 leaf. Scale bar indicates 100 μm. Xy: Xylem; Ph: Phloem; Ep: Epidermal cell; BS: Bundle sheath cell.

**Figure 6.** Presence of reactive oxygen species in hybrid leaves. 3,3′-diaminobenzidine (DAB) staining was used to detect H2O2. Scale bars indicate 1 cm.

#### *2.3. Hybrid Weakness-Related Gene Expression*

At 70 DAS, the 11th (Stage 3) and 13th (Stage 0) leaves of the parents and F1 offspring were collected for gene expression analysis (Figure 7). The 13th leaves of both the parents and hybrids were entirely green, as indicated by high SPAD values, even though the SPAD values of the F1 hybrids were somewhat lower than those of either parent (Figure 7A,C). Meanwhile, the 11th leaves of the F1 hybrids exhibited significant yellowing, as indicated by low SPAD values, whereas those of both parents were entirely green, as indicated by high SPAD values (Figure 7B,D).

The expression of 11 disease resistance-related genes and four photosynthesis-related genes was surveyed (Table 1 and Table S1). The *PR1* genes (*PR1A* and *PR1B*), the expression of which is induced by salicylic acid [19,20], were upregulated in the 11th leaves of the F1 hybrids (Figure 8), as were several *PR2* genes (*Gns5*, *Gns2*, and *OsEGL2*), which encode glucanase-related proteins that degrade fungal cell walls [21] (Figure 8). Meanwhile, of several genes that encode chitinase-related proteins (*PR4*, *CHT9*, *CHT11*, and *RIXI*), which also degrade fungal cell walls [22], *PR4*, *CHT9*, and *RIXI* were all upregulated in the 11th leaves of the F1 hybrids; however, only *PR4* was upregulated significantly (Figure 8). The expression of *ACO2*, which encodes an enzyme related to ethylene production [23], was similar in the 11th and 13th leaves of the F1 hybrids (Figure 8). Finally, *PDC1*, the expression of which is induced by jasmonic acid [24], was upregulated in the 11th leaves of the F1 hybrids, although not significantly (Figure 8).

Of the four photosynthesis-related genes, *PSAF*, *LHCB*, and *OsRbcL* were somewhat downregulated in the 11th leaves of the F1 hybrids; however, none of these trends were significant (Figure 8).

**Figure 7.** Phenotypes and SPAD values of the parental and F1 leaves (11th and 13th) used for gene expression analysis. The phenotypes (**A**, **B**) and SPAD values (**C**, **D**) of 11th (**A**, **C**) and 13th (**B**, **D**) leaves were assessed at 70 days after sowing. SPAD value was measured at the tip, middle, and base of each leaf. Values and error bars indicate mean ± SE values (*n* = 3), and different lowercase letters in each plot (**C**, **D**) indicate significant differences (Tukey HSD test, *P* < 0.05).


**Table 1.** Genes analyzed for RT-PCR.

<sup>a</sup> Identity of each gene was referenced using the Rice Annotation Project database (https://rapdb.dna.affrc.go.jp/); <sup>b</sup> CGSNL (Committee on Gene Symbolization, Nomenclature and Linkage, Rice Genetics Cooperative) gene names were referenced using Oryzabase (https://shigen.nig.ac.jp/rice/oryzabase/) [25].

**Figure 8.** Relative gene expression levels in leaves of the parents ('P.T.B.7' and 'A.D.T.14') and their F1 hybrid. Values and error bars indicate mean ± SE values (*n* = 3), and different lowercase letters in each plot indicate significant differences (Tukey HSD test, *P* < 0.05).

#### **3. Discussion**

Oka [4] reported that F1 hybrids that exhibit hybrid weakness associated with the *HWA1* and *HWA2* loci exhibit growth termination and leaf yellowing after the seedlings developed three or four leaves. However, the plant growth phenotypes were not described in detail. In contrast, the present study determined that F1 hybrids from the cross of 'A.D.T.14' and 'P.T.B.7' rice exhibited limited growth and tiller number, as well as leaf yellowing (Figures 1–3). Although leaf yellowing was also reported by Oka [4], the timing of the yellowing process differed. In the present study, leaf yellowing was observed in F1 hybrids that had developed seven or eight leaves at 30 DAS and, furthermore, was associated with the downregulation of photosynthesis-related genes (Figure 8).

In *O. sativa*, three other gene sets have been reported to cause hybrid weakness, and the phenotypes associated with each system are different. More specifically, the hybrid weakness associated with the *HWC1* and *HWC2* loci is characterized by short stature, short roots, and rolled leaves [26], whereas that associated with the *HWI1* and *HWI2* loci is characterized by short stature and impaired root formation [2], and that associated with the *HW3* and *HW4* loci is characterized by short culms, fewer panicles, pale green leaves, and chlorotic leaf spots [3]. Remarkably, leaf yellowing has only been reported for the hybrid weakness associated with the *HWA1* and *HWA2* loci. Together, these reports suggest that either the causal genes of each system have different functions or the processes downstream of the causal gene interactions are different.

In the present study, the usefulness of SPAD value for determining the progression of leaf yellowing during hybrid weakness associated with the *HWA1* and *HWA2* loci was evaluated. SPAD values generally corresponded with leaf yellowing (Figure 4A,B) but failed to identify significant differences between the chlorophyll content of Stage-0 leaves and that of either the bases of Stage-1 leaves or the middles or bases of Stage-2 leaves. These results indicate that spectrophotometry is more sensitive than SPAD values to changes in chlorophyll content (Figure 4B,C). However, because SPAD value accurately reflected the degree of leaf yellowing and because spectrophotometry requires leaf destruction (Figure 4), SPAD value measurement is an effective and nondestructive method that can be used to quickly and easily evaluate hybrid weakness associated with the *HWA1* and *HWA2* loci.

The hybrid weakness phenotype examined in the present study also exhibited cell death in the yellow leaves (Figures 4D and 5). Similarly, the hybrid weakness associated with the *HWI1* and *HWI2* loci involved cell death at the basal nodes [2], and the hybrid weakness associated with the *HW3* and *HW4* loci involved cell death in leaves [3]. Cell death has also been detected in the leaves of intraspecific *Arabidopsis* hybrids that exhibit hybrid necrosis [7,27] and in the leaves, stems, and roots of interspecific *Nicotiana* hybrids that exhibit hybrid lethality [28,29]. Therefore, despite differences in localization, cell death appears to be a common feature of hybrid weakness in plants.

In the present study, cell death was only detected in Stage-2 and Stage-3 leaves, which indicates that the timing of cell death does not coincide with that of either leaf yellowing or reductions in SPAD value or chlorophyll content (Figure 4). On the other hand, H2O2 was detected in Stage-1, -2, and -3 leaves that exhibited yellow leaf tips and low SPAD values (Figure 6). H2O2 commonly triggers plant cell death during hypersensitive reactions, senescence, abiotic stress responses, and development [30–32]. These observations suggest that reactive oxygen species lead to cell death in the hybrid weakness associated with the *HWA1* and *HWA2* loci.

In many cases of hybrid weakness, one of the causal genes is an *R* gene, and the interaction of the *R* gene with another causal gene triggers an autoimmune response [7,27,33,34]. In the hybrid weakness associated with the *HWI1* and *HWI2* loci, the causal genes are an LRR-RLK gene (*R* gene) and a subtilisin-like protease gene, respectively, and the interaction of the causal genes results in an autoimmune response [2]. Meanwhile, in the hybrid weakness associated with the *HW3* and *HW4* loci, *HW3* encodes a calmodulin-binding protein, and even though the gene is a defense-response gene, not an *R* gene, the interaction of *HW3* with *HW4* results in an autoimmune response [3]. Furthermore, in the hybrid weakness associated with the *HWA1* and *HWA2* loci, a candidate region, which harbored both loci, also contained 12 *R* genes, along with many other genes [17]. During hypersensitivity reactions, reactive oxygen species are produced, thereby mediating cell death, chloroplast disruption, and the upregulation of defense-related genes [35,36]. In the present study, cell death was detected in the F1 leaves after H2O2 generation and leaf yellowing (Figure 3, Figure 5, and Figure 6), and the yellow leaves of the F1 hybrids exhibited upregulated defense-related genes (Figure 8). These results suggest that the hybrid weakness associated with the *HWA1* and *HWA2* loci involves hypersensitive reaction-like responses. However, because leaf senescence is also associated with H2O2 production, cell death, the upregulation of certain defense genes, and leaf yellowing [37], it is possible that the hybrid weakness associated with the *HWA1* and *HWA2* loci involves premature senescence. Additional molecular studies will reveal the exact mechanism underlying the hybrid weakness associated with the *HWA1* and *HWA2* loci.

#### **4. Materials and Methods**

#### *4.1. Plant Materials and Growth Conditions*

The *Oryza sativa* Indian cultivars 'A.D.T.14' (*indica* [17]), which is homozygous for the *Hwa1-1* allele, and 'P.T.B.7' (*aus* [17]), which is homozygous for the *Hwa2-1* allele, were crossed to generate F1 hybrid offspring. The genotypes of the two cultivars were previously reported by Oka [4]. F1 seeds were obtained by crossing 'P.T.B.7' (♀) and 'A.D.T.14' (♂) parents. Seeds of 'A.D.T.14', 'P.T.B.7', and the F1 offspring were sown on 17 July 2011. After the seeds were germinated on moistened filter paper in Petri dishes, the seedlings were transplanted to soil (Sukoyaka-Jinko-Baido; Yanmar Co., Ltd., Osaka, Japan) in Wagner pots of 1/5000 a. The seedlings were grown under natural light conditions in a greenhouse at Osaka Prefecture University, Sakai, Japan. The temperature and humidity of the greenhouse were recorded using a data logger (Ondotori; T&D Co., Ltd., Matsumoto, Japan), and the plants were fertilized weekly using Otsuka-A prescription (OAT Agrio Co., Ltd., Tokyo, Japan), which contained 18.6 mM N, 5.1 mM P, 8.6 mM K, 8.2 mM Ca, and 0.4 mM Mg. The plants were cultivated for 140 DAS to survey plant height, plant age in leaf number, tiller number, days to heading, and SPAD value. The plant height was measured from the surface of the soil to the tip of the tallest leaves. To evaluate the relationship between leaf yellowing and physiology, the leaves were classified according to degree of leaf yellowing. Leaves in which 0%, 25%, 50%, or > 75% of the blade had turned yellow were assigned to Stages 0, 1, 2, and 3, respectively (Figure 4A). The staged leaves were used to measure SPAD value and chlorophyll content, as well as to detect dead cells and H2O2. Some of the seedlings were cultivated in Wagner pots of 1/10,000 a in an incubator (14 h natural light and 10 h dark, 28 °C, light intensity: 512 μmol m<sup>−2</sup> s<sup>−1</sup>), and at 70 DAS, these plants were used as material for gene expression analysis.

#### *4.2. SPAD and Chlorophyll Measurement*

A SPAD meter (SPAD-502; Konica Minolta, Inc., Tokyo, Japan) was used to measure the SPAD values of the leaves without causing damage. SPAD values were obtained from the tip, middle, and base of each leaf. Meanwhile, total chlorophyll content was measured using a previously described spectrophotometric method [38]. Briefly, the leaves were cut into small pieces, weighed, treated with 20 mL 80% acetone, and ground using a pestle until bleached. The resulting solutions were transferred to 1.5 mL tubes and centrifuged at 10,000 × *g* for 5 min. Each supernatant was transferred to a cuvette, and the absorbance of each supernatant was measured at 663.6 and 646.6 nm, after the spectrophotometer (V-530; JASCO Corp., Hachioji, Japan) was zeroed at 750 nm. Total chlorophyll concentration (mg g<sup>−1</sup> FW) was calculated using the following equation: [(17.76 × OD<sub>646.6</sub> + 7.34 × OD<sub>663.6</sub>) × extraction volume in a cuvette]/fresh weight (g).
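The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' script: the function name and the example readings are hypothetical, and the code applies the equation exactly as written, so any unit conversion (for example, from μg to mg, if the coefficients yield μg per mL) is left to the user as an explicit assumption.

```python
def total_chlorophyll(od_646_6, od_663_6, extract_volume, fresh_weight_g):
    """Apply the equation from the text:
    [(17.76 * OD646.6 + 7.34 * OD663.6) * extraction volume] / fresh weight (g).

    Absorbances are assumed to be read after zeroing at 750 nm, as described.
    The result carries whatever units the coefficients and volume imply;
    no unit conversion is applied here (assumption: the caller handles it).
    """
    if fresh_weight_g <= 0:
        raise ValueError("fresh weight must be positive")
    concentration = 17.76 * od_646_6 + 7.34 * od_663_6
    return concentration * extract_volume / fresh_weight_g


# Hypothetical readings for one leaf sample (not data from the study).
example = total_chlorophyll(od_646_6=0.5, od_663_6=1.0,
                            extract_volume=20.0, fresh_weight_g=0.1)
print(round(example, 1))
```

Keeping the unit conversion outside the function makes it harder to apply a scaling factor twice when results from several leaf parts are pooled.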

#### *4.3. Ion Leakage Measurement*

Ion leakage was measured as described previously [39]. Leaf disks (3 cm<sup>2</sup>) were taken from the tips, middles, and bases of the leaves, floated for 5 min in water that contained 0.2% (v/v) Tween 20 to remove ions released when the disks were cut, transferred to Petri dishes that contained fresh water with Tween 20 (0.2%), and incubated for 3 h to allow ions to leak out of dead cells. The conductivity of the solutions (value A) was measured using a conductivity meter (Twin Cond B-173; Horiba, Ltd., Kyoto, Japan). The leaf disks were then incubated at 95 °C for 25 min to release the ions of the whole leaf disks by destroying the tissue and cooled to room temperature, and the conductivity of the solutions was measured again (value B). Finally, ion leakage (%) was calculated using the following equation: (value A/value B) × 100%.
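As a minimal sketch of this ratio, the following Python snippet computes ion leakage from the two conductivity readings. The function name and the readings for the three leaf parts are hypothetical and serve only to illustrate the arithmetic.

```python
def ion_leakage_percent(value_a, value_b):
    """Ion leakage (%) = (value A / value B) * 100.

    value_a: conductivity after the 3 h incubation.
    value_b: conductivity after heating the disks at 95 degrees C.
    """
    if value_b <= 0:
        raise ValueError("value B (total conductivity) must be positive")
    return value_a / value_b * 100.0


# Hypothetical conductivity readings (value A, value B) for one leaf.
readings = {"tip": (30.0, 120.0), "middle": (12.0, 120.0), "base": (6.0, 120.0)}
for part, (a, b) in readings.items():
    print(part, ion_leakage_percent(a, b))
```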

#### *4.4. Trypan Blue Staining*

Trypan blue staining was performed as described previously [40]. Detached leaves were stained by boiling for 8 min in a 1:1 (v:v) mixture of ethanol and lactophenol (i.e., alcoholic lactophenol) that contained 0.1 mg mL<sup>−1</sup> trypan blue, cleared in 70% chloral hydrate solution overnight, and then preserved in 70% glycerol. Trypan blue stains dead cells. Transverse slices were prepared using a hand-section method and visualized using a light microscope (Olympus BX50; Olympus, Co. Ltd., Tokyo, Japan).

#### *4.5. Detection of Hydrogen Peroxide Accumulation*

Hydrogen peroxide was detected visually, using previously described methods [41]. Briefly, leaves were soaked in a 3,3′-diaminobenzidine (DAB) solution for 24 h, transferred to boiling 96% ethanol until bleached, and then visualized. The presence of H2O2 was indicated by brown staining.

#### *4.6. Real-Time qRT-PCR*

Total RNA was isolated from leaves using an RNAiso PLUS kit (Takara Bio, Inc., Shiga, Japan), according to the manufacturer's protocol, and then treated with RNase-free DNase (Promega Co., Madison, USA). First-strand cDNA was synthesized from total RNA (2 μg) using oligo (dT)18 primers and ReverTra Ace (Toyobo Co., Ltd., Osaka, Japan). Real-time RT-PCR was carried out to analyze the expression of 11 defense-related genes and four photosynthesis-related genes (Table 1), using *Actin* as an internal control. The primers used to amplify *PR1A, PR1B, Gns5, PR4*, and *Actin* had been reported previously [42], and the other primers were designed based on RAP-DB locus ID using the Primer-BLAST design tool [43] (Table S1). Real-time RT-PCR was performed in 20 μL reaction mixtures that contained 10 μL KAPA SYBR FAST qPCR Master Mix (2×) ABI PRISM (Takara Bio), 0.4 μL each of the forward and reverse primers (10 μM), and 1 μL cDNA template. Amplification was performed using an Applied Biosystems 7300 Real-Time PCR System (Applied Biosystems, Foster City, CA, USA) under the following conditions: initial denaturation at 94 °C for 10 min, followed by 40 cycles of 15 s at 94 °C and 1 min at 60 °C, with a final 30 s extension at 72 °C. The results were analyzed using ABI Prism software (Applied Biosystems). Each gene expression level was divided by the expression level of *Actin* to calculate relative expression level.
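The normalization step can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the text does not say how the raw expression levels were derived from the instrument output, so the sketch starts from already quantified levels, and all function names and numbers are hypothetical. The mean ± SE summary mirrors the way values are reported in the figure captions (*n* = 3).

```python
from math import sqrt
from statistics import mean, stdev


def relative_expression(target_level, actin_level):
    """Divide a target gene's expression level by the Actin level."""
    if actin_level <= 0:
        raise ValueError("Actin expression level must be positive")
    return target_level / actin_level


def mean_and_se(values):
    """Return (mean, standard error) using the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed for an SE")
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical quantified levels for three biological replicates.
target_levels = [2.0, 4.0, 6.0]
actin_levels = [1.0, 2.0, 2.0]

ratios = [relative_expression(t, a) for t, a in zip(target_levels, actin_levels)]
avg, se = mean_and_se(ratios)
print(ratios, avg, se)
```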

#### *4.7. Statistical Analysis*

Data were analyzed using SPSS (version 22; IBM, Co., Armonk, USA). Tukey HSD tests were used to compare SPAD, chlorophyll content, and ion leakage values, and two-tailed Student's *t*-tests were used to compare mid-parental (mean of 'A.D.T.14' and 'P.T.B.7') and hybrid values of plant height, foliar age, and tiller number.
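The mid-parent comparison can be illustrated with a short Python sketch. The analysis in the study was run in SPSS; the code below is only a hand-rolled equivalent of the pooled-variance Student's *t* statistic. Two points are assumptions rather than statements from the text: mid-parental values are formed by averaging the two parents replicate by replicate, and the measurements shown are invented. The sketch returns the *t* statistic and degrees of freedom only; obtaining a *P* value requires the *t* distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def mid_parent_values(parent_a, parent_b):
    """Average the two parents replicate by replicate (pairing is an assumption)."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents need the same number of replicates")
    return [(a + b) / 2 for a, b in zip(parent_a, parent_b)]


def student_t(sample_x, sample_y):
    """Pooled-variance two-sample Student's t statistic and degrees of freedom."""
    nx, ny = len(sample_x), len(sample_y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(sample_x) + (ny - 1) * variance(sample_y)) / df
    standard_error = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(sample_x) - mean(sample_y)) / standard_error, df


# Hypothetical plant heights (cm), five plants per line.
parent_1 = [100, 102, 98, 101, 99]
parent_2 = [90, 92, 88, 91, 89]
hybrid = [50, 52, 48, 51, 49]

mid_parent = mid_parent_values(parent_1, parent_2)
t_stat, dof = student_t(mid_parent, hybrid)
print(mid_parent, t_stat, dof)
```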

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2223-7747/8/11/450/s1, Table S1: Real-time RT-PCR primers.

**Author Contributions:** Conceptualization, K.I., T.K., and T.T.; formal analysis, K.S.; funding acquisition, K.I., T.K., and T.T.; investigation, K.S. and T.I.; resources, K.I.; supervision, K.I., T.K., T.M., M.O., and T.T.; writing—original draft, K.S.; writing—review and editing, T.T.

**Funding:** This research was funded by JSPS KAKENHI (grant no. JP24580009) from the Japan Society for the Promotion of Science.

**Acknowledgments:** We are grateful to the Genebank of the National Institute of Agrobiological Sciences (Tsukuba, Japan) for providing seeds of the parent lines ('A.D.T.14' and 'P.T.B.7'). We would like to thank Editage (www.editage.com) for English language editing.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Molecular and Morphological Divergence of Australian Wild Rice**

## **Dinh Thi Lam <sup>1,2</sup>, Katsuyuki Ichitani <sup>3</sup>, Robert J. Henry <sup>4</sup> and Ryuji Ishikawa <sup>5,</sup>\***


Received: 10 December 2019; Accepted: 4 February 2020; Published: 10 February 2020

**Abstract:** Two types of perennial wild rice, Australian *Oryza rufipogon* and a new taxon Jpn2 have been observed in Australia in addition to the annual species *Oryza meridionalis*. Jpn2 is distinct owing to its larger spikelet size but shares *O. meridionalis*-like morphological features including a high density of bristle cells on the awn surface. All the morphological traits resemble *O. meridionalis* except for the larger spikelet size. Because Jpn2 has distinct cytoplasmic genomes, including the chloroplast (cp), cp insertion/deletion/simple sequence repeats were designed to establish marker systems to distinguish wild rice in Australia in different natural populations. It was shown that the new taxon is distinct from Asian *O. rufipogon* but instead resembles *O. meridionalis.* In addition, higher diversity was detected in north-eastern Australia. Reproductive barriers among species and Jpn2 tested by cross-hybridization suggested a unique biological relationship of Jpn2 with other species. Insertions of retrotransposable elements in the Jpn2 genome were extracted from raw reads generated using next-generation sequencing. Jpn2 tended to share insertions with other *O. meridionalis* accessions and with Australian *O. rufipogon* accessions in particular cases, but not Asian *O. rufipogon* except for two insertions. One insertion was restricted to Jpn2 in Australia and shared with some *O. rufipogon* in Thailand.

**Keywords:** *Oryza*; speciation; divergence; life history; phylogenetic relation; Australian continent

### **1. Introduction**

The *Oryza* genus comprises 23 species with varying genome compositions and ploidy levels [1]. The two cultivated species, *Oryza sativa* and *Oryza glaberrima*, are AA genome species, and their progenitors were wild *Oryza rufipogon* and *Oryza barthii*, respectively. The AA genome species are dispersed across the major continents and were once classified as a single species, *Oryza perennis*, comprising Asian, American, African, and Oceanian forms [2]. The Asian species, *O. rufipogon*, exhibits different life histories that vary from annual to perennial. Its life history is a continuum with annual, intermediate, and perennial forms [3,4]. The American species, *Oryza glumaepatula*, also varies from annual to perennial. The African species, *O. barthii* and *Oryza longistaminata*, however, are exclusively annual and perennial types, respectively.

Oceanian species had been known as *O. perennis* (later changed to the current species nomenclature) including annual and perennial types as a continuum within a single species [3]. After rearrangement of the species classification, an annual type was defined as an Oceanian endemic species, *Oryza meridionalis*, and the perennial form as *O. rufipogon* [4]. Their distributions in Australia are well studied [5,6]. Speciation of these species has been confirmed using retrotransposon insertions [7,8] and crossing ability [9–11].

In general, annual and perennial species have different adaptive strategies to allocate their energy resources [3,12]. Annual species tend to have higher seed productivity than perennial species. *O. meridionalis*, the Australian annual species, produces and disperses large numbers of seeds. *O. meridionalis* inhabits ponds or the periphery of ponds, ditches, or lakes during the rainy season. Water levels in wild rice habitats recede, and water in the peripheral areas inhabited by annual species disappears during the dry season [13]. Annual species produce large amounts of seed for the next generation. In contrast, the life history of Australian perennial species is similar to that of Asian perennial species except for a unique taxon known as Jpn2 or taxon B [6,14]. In addition, Jpn2 type wild rice exhibits different morphological and genetic characteristics [14]. Including the new wild rice type, Australian perennial and annual rice chloroplast (cp) genomes have been completely sequenced in order to understand their evolutionary relationships with other wild rice [15,16]. This showed that the cp genome of Australian *O. rufipogon*, Jpn1 (taxon A), has a closer relationship to *O. meridionalis* than to Asian *O. rufipogon*, although its nuclear type tended to show higher similarity to Asian *O. rufipogon.* Another perennial species, Jpn2 (taxon B), was similar to *O. meridionalis* not only in the cp genome but also in the nuclear type [14,17,18]. This analysis showed that all Australian wild rice shared some cp genetic similarity with *O. meridionalis*. Nuclear genomes in Australia showed huge variation never seen in Asian wild rice. These findings, together with ecological observations, confirmed that there were two types of perennial rice. Their distribution in northern Queensland and their unique morphological traits were also reported [6,14].

In this paper, we further characterized these two taxa at morphological and reproductive levels, which enabled us to determine how they have diverged at the species level. Cytoplasmic markers to distinguish them were developed and variation among natural populations was evaluated. These findings will help to distinguish these taxa in field research for further analysis and also give clues to their evolutionary origins. Retro-transposable elements were also used to screen the species examined in this study. Some of these provide clear evidence of phylogenetic relationships because of the unique mechanism of transposition insertion.

#### **2. Results**

#### *2.1. Morphological Features*

Two types of Australian perennial wild rice were collected (Table 1). Our previous report [14] identified two types of perennials, Jpn1 (taxon A) and Jpn2 (taxon B), and morphological traits discriminated between them. Bristle cells have a thorn-like architecture along the awns (Figure 1), and scanning electron microscopy (SEM) enabled us to compare the density of these cells. The density varied from 2.33 to 5.33 per 200 μm square among Asian wild rice (Table 2). In *O. meridionalis*, W1299 and W1300 had 12.67 and 14.67 per 200 μm<sup>2</sup>, respectively. The density in Jpn1 was similar to that in Asian wild rice, whereas that in Jpn2 was similar to that in *O. meridionalis*. There were significant differences between the two groups, W1299/W1300/Jpn2 and W0106/W0120/W0137/Jpn1. Regarding other traits, Jpn2 shared short anthers with annual accessions such as W0106 in *O. rufipogon* and W1299 and W1300 in *O. meridionalis*.


**Table 1.** Samples collected in Australia and control core collections developed in NBRP.

\*W1299 noted as no rank in Oryza database, was added to the core collection in this study.



\*,\*\*: Significant differences compared with the longest anther of Jpn1, the largest panicle of Jpn2, the widest width of W0137, and density of bristle cells of W1300 at 5 and 1% levels, respectively.

**Figure 1.** Variation in the density of bristle cells in awns. Panel **A**: spikelet of Jpn1. Panels **B** to **E**: enlarged SEM photos of W0120 (Panel **B**), Jpn1 (Panel **C**), W1299 (Panel **D**), and Jpn2 (Panel **E**). Panel **F**: density of bristle cells per 200 μm<sup>2</sup>. Bars indicate standard error (n = 3).

#### *2.2. Maternal Lineages*

In order to trace maternal lineages, next-generation sequencing data obtained from Jpn1 and Jpn2 were used for re-sequencing and comparison with the Nipponbare complete cp genome sequence. More than 53 million reads were obtained from the two accessions. Two genome sequences of *O. meridionalis*, and *O. rufipogon* were added for comparison. In all cases, 100% coverage was achieved with 733 to 2002 mean depth. When the nuclear genome was used as a reference genome, 66%–88% coverage with 7.6 to 11.4 mean depth was obtained.

Simple sequence repeats were found at 20 loci in the cp genomes. Simple insertions or deletions (INDELs) were also found at 21 loci (Table 3). Two loci were not amplified, and six loci were not confirmed because of difficulty of primer design for these fragments. One region ranging from nucleotide 17,336 to 17,392 of the Nipponbare cp genome was amplified as a single amplicon because of its short size. In total, 29 insertions/deletions (INDELs)/simple sequence repeats (SSRs) in the cp genome were polymorphic. Australian rice accessions including *O. meridionalis*, Jpn1, and Jpn2 shared the same genotype at 26 out of the 29 loci developed by plastid INDELs and SSRs.


**Table 3.** INDEL and SSR markers in chloroplast genomes and developed markers.

#### *Plants* **2020** , *9*, 224




Five chloroplast markers, INDEL1, INDEL11, INDEL13, INDEL18, and INDEL19, were polymorphic among natural populations (Table S2, Supplementary Materials). Plastid types were defined as distinct combinations of the genotypes at these markers. In total, nine plastid types (Types 1 to 9) were recognized, together with the r1 and r2 types found in the control *O. rufipogon*. Asian *O. rufipogon* and *O. sativa* accessions were clearly different from the Australian wild rice.
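The idea of defining a plastid type as a distinct combination of marker genotypes can be sketched in a few lines of Python. This is only an illustration: the accession names (`acc_A` to `acc_D`) and allele codes are hypothetical placeholders, not data from Table S2, and the labels are assigned in order of first appearance, so they do not correspond to Types 1 to 9, r1, or r2.

```python
# Hypothetical allele calls at the five polymorphic cp markers (not real data).
MARKERS = ("INDEL1", "INDEL11", "INDEL13", "INDEL18", "INDEL19")

genotypes = {
    "acc_A": ("a", "a", "b", "a", "a"),
    "acc_B": ("a", "a", "b", "a", "a"),
    "acc_C": ("b", "a", "b", "a", "c"),
    "acc_D": ("a", "b", "a", "a", "a"),
}


def assign_plastid_types(calls):
    """Give every distinct combination of marker genotypes its own label."""
    label_of_haplotype = {}
    assigned = {}
    for accession, haplotype in calls.items():
        if haplotype not in label_of_haplotype:
            # First time this combination is seen: create a new label.
            label_of_haplotype[haplotype] = f"type_{len(label_of_haplotype) + 1}"
        assigned[accession] = label_of_haplotype[haplotype]
    return assigned


plastid_types = assign_plastid_types(genotypes)
for accession, label in plastid_types.items():
    print(accession, label)
```

Accessions with identical calls at all five markers receive the same label, so the number of distinct labels equals the number of plastid types in the sample.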

Three accessions of Papua New Guinea (PNG) *O. rufipogon*, W1235, W1238, and W1239, and W2109 of Australian *O. rufipogon* shared the Type 5 plastid type with *O. meridionalis*. W1230 of PNG *O. rufipogon* shared the r2 plastid type with the Asian type. W1236 carried a unique plastid type. Jpn2 shared Type 1 with *O. meridionalis*. Other *O. meridionalis* accessions in the core collection were divided into three types, Types 1, 5, and 8. Only two types, Types 1 and 2, were detected in the Northern Territory and in Western Australia. Newly collected accessions from Queensland carried seven types, five of which were newly detected.

#### *2.3. Reproductive Isolation*

Biological species can be detected by the pollen fertility of hybrids. Jpn1 and Jpn2 were crossed with Asian wild rice and *O. meridionalis*. Each F1 plant was grown in a greenhouse, and leaf samples were used to check whether they were hybrids originating from the cross. Anthers were taken to check pollen fertility by staining with I2–KI. Well-stained pollen grains were counted.

Seed fertility was also assessed but this may not reflect reproductive ability of the respective plants (Table 4). W0106, W0120, and W1299 showed more than 95% pollen fertility. However, except for W0120, they showed lower seed fertility of 19.5% in W0106 and 22.3% in W1299. The panicles were bagged to prevent out-crossing and this might explain the low seed fertility. In combinations with Jpn1 and Asian *O. rufipogon*, F1 plants with W0106 and W0120 had more than 90% pollen fertility. However, seed fertility was relatively low, similar to self-pollination of W0106 and W1299. We relied on data from pollen fertility rather than seed fertility and concluded that by this criterion, Jpn1 is related to Asian *O. rufipogon*, and that Jpn2 is not close to either *O. rufipogon* or *O. meridionalis*.


**Table 4.** Pollen and seed fertility of self-pollinated plants and F1 plants among Asian *O. rufipogon*, *O. meridionalis* and alternative perennials in Australia.

\*Mean: data obtained from multiple plants were averaged and reported as the mean.

#### *2.4. Unique Insertion of Retrotransposable Element in Jpn2*

In total, six presumed insertions were confirmed only in the Jpn2 genome but not in Nipponbare (Table 5). Two *pSINE1* insertions were shared among Jpn2 and 19 *O. meridionalis* accessions. Another insertion amplified with Chr3-10559212-r (w/L) and pSINE1-L was shared among Jpn1, Jpn2, and 19 *O. meridionalis* accessions (Figure 2). Chr1-4067055-f (w/L) and pSINE1-L amplified the same amplicons not only from Jpn2 and 19 *O. meridionalis* accessions but also from W0106, which originated in India, suggesting that some parts of the Jpn2 genome share the insertion with wild rice from India. No *O. rufipogon* accessions in the core collection except for W2266 and W2267 were tested because of lack of DNA, and 19 *O. meridionalis* showed these insertions. These results suggested that the insertion was probably shared among *O. meridionalis* and W0106. Chr3-10203820-f (w/L) amplified with pSINE1-L only in Jpn2, and no *O. meridionalis* accession showed any amplicon. In screening for the insertion among 30 *O. rufipogon* accessions in the core collection, W0180 and W1921, both of which originated from Thailand, showed amplicons. The insertion sequence in Jpn2 was screened from the raw reads and 53 bp were recovered. When aligned with *pSINE1*, the recovered sequence showed a high similarity of 92.4%. When *pSINE3* insertions were examined, three of the presumed insertions were amplified only among Jpn2 and 19 *O. meridionalis* accessions.


**Figure 2.** *pSINE1* and *pSINE3* insertions amplified with flanking primers and outward primers from *pSINE* consensus sequences. From lane 1 to 8, Nipponbare, W0106, W0120, W0137, Jpn1, Jpn2, W1299, and W1300 were used as each DNA template.


**Table 5.** Screening retrotransposable element insertions.

#### **3. Discussion**

#### *3.1. Unique Morphological Traits in Australian Wild Rice*

*O. rufipogon* is composed of a continuum of annual and perennial strains in Asia. They represent different life history traits related to the r–K strategy to maximize fitness [4,12]. Perennial and annual types are regarded as K- and r-strategists, respectively. Intermediates represent the r–K continuum. K selection works for individuals to increase their life span and r selection works to produce more offspring. Thus, perennials spend more energy on vegetative organs before the flowering stage. Annuals spend energy to produce more panicles and seeds. Because anther size is related to preference for outcrossing, perennials tend to carry longer anthers than annuals and produce more pollen to maximize the chance of outcrossing [3,4,12]. Such resource allocation was also confirmed in three Asian *O. rufipogon*, the Australian perennial Jpn1, and the Oceanian annual *O. meridionalis*. In order to adapt to the dry season, *O. meridionalis* plants produce many seeds and die after scattering their seeds. *O. meridionalis* has short anthers and slender panicles. The appearance of Jpn1 is similar to Asian *O. rufipogon*, with similar long anthers and open panicles. Our measurements also suggested this trend. In our previous paper, we reported that Jpn2 represents a perennial life history [14]. It generated shoots and roots from its stems to follow water in peripheral areas, growing to the inner side because of the shrinking water mass during the dry season. The morphological appearance of Jpn2 was quite different from that of Jpn1. Anther length is a distinctive characteristic of this type, being short as in *O. meridionalis* [14]. *O. rufipogon* W0106, an annual type, also shared this short anther characteristic. In this study, we also demonstrated another morphological trait characteristic of Jpn2. Jpn2 has a high density of bristle cells along the awn. Genome sequencing also suggested that Jpn2 shared higher similarity to *O. meridionalis* than to *O. rufipogon* [16]. These characteristics of Jpn2 imply that this species/taxon has diverged from *O. meridionalis*.

#### *3.2. Maternal Variation*

Cytoplasmic marker systems can be developed for the mitochondrial genome as suggested in this report. Other markers were also developed based on whole cp genome sequences. Whole cp genome sequences have been determined for several Australian accessions [15–19]. The maternal genome data clearly showed that Jpn1 and Jpn2 shared high similarity to *O. meridionalis* with some variation. INDEL and SSR markers were designed based on the cp genome sequences. Core collections and natural populations were examined to determine the distribution of cp variation. Higher variation was found among accessions in Queensland compared with other accessions collected from the Northern Territory and Western Australia. Variations in cp genomes were distinguished at high resolution using single nucleotide polymorphisms [17,18]. This study showed that easily scored INDELs and SSRs also detected higher diversity in the northern Queensland accessions. These marker systems with whole cp genome screening will provide more clues about the maternal relationships among these related species/taxa and how they diverged.

#### *3.3. Reproductive Barriers Among Australian Wild Rice*

Reproductive barriers among *Oryza* species, including the Australian species, have been confirmed, and numerical characteristics supported these speciation-related reproductive barriers [2]. *O. meridionalis* already has strong genetic reproductive barriers, and sterility has been detected in F1 lines of crosses with Asian wild rice. Even among *O. meridionalis*, some sterile lines were reported [9]. In our study, the F1 between Jpn2 and *O. meridionalis* displayed reproductive sterility. Jpn2, in particular, developed a reproductive barrier with both Asian wild rice and *O. meridionalis*. Because the extent of the reproductive barrier corresponded to that between two different organisms, it is concluded that Jpn2 does not belong to *O. rufipogon* or *O. meridionalis*. We have not yet determined when they diverged from each other. Clade analysis of cp genomes suggested that a clade including *O. meridionalis* diverged at a date estimated as 0.86–11.99 million years ago [18]. Similar estimates have also been reported based on sequences from the *Oryza* genus [19–22]. Such a long time since divergence has allowed the accumulation of quite diverse genomes in the north-eastern part of Australia and created Jpn2 and various wild rice found at the P5 site.

F1 hybrids between Jpn2 and *O. meridionalis* showed relatively high pollen fertility with *O. meridionalis*, although the F1 showed complete seed sterility. It was suggested that the divergence between these two plants is a relatively recent event compared with the divergence from other species.

#### *3.4. Retrotransposable Elements*

Retrotransposable elements are well known markers for examining evolutionary pathways. This is mainly due to the unique mechanisms of transpositions. Two retrotransposable elements, *pSINE1* and *pSINE3*, were recognized in species divergence among AA genome and between Asian wild rice and *O. meridionalis* [3,6,23]. These have offered researchers a powerful tool for phylogenetic analysis. On the other hand, when there are no genome sequences, new markers to distinguish particular genomes are not available. In fact, there was no genomic data on insertions in a novel taxon such as Jpn2. Thus, we established a screening methodology to extract retrotransposable elements from raw reads. Recent developments in sequencing technology offer huge numbers of reads to increase target sites. Even with our limited volume of data, we succeeded in picking up insertions in the Jpn2 genome. The uniqueness of Jpn2 was also found with an insertion of the *pSINE1* retrotransposon, which was detected in Jpn2 only and in none of the other accessions of *O. meridionalis*. Two accessions of *O. rufipogon* in Thailand may provide key information on how Australian wild rice originated. This tool will open the way to draw a more precise evolutionary pathway and to understand valuable genetic resources among wild rice.

#### **4. Materials and Methods**

#### *4.1. Plant Materials*

Wild rice was collected in Australia with permission from the Queensland government, EcoAccess. We developed these collections as de novo resources, which can be accessed repeatedly from the same site with accurate GPS data allowing us to reconfirm their life cycles. Successive observations were made from 2009 until 2011, and the life history traits at the collection sites were reconfirmed year by year. This field research was supported by overseas scientific research funds (JSPS) and collaborative research with the Queensland Herbarium and Queensland Alliance for Agriculture and Food Innovation (QAAFI), University of Queensland. Thirty populations were collected from their natural habitat. Observations of the ecological habitats and life cycle of each population in April 2008, August 2009, and September 2009 were used to determine their life history, such as annual or perennial behavior, especially for the Jpn1 and Jpn2 populations. Jpn1 and Jpn2 were typical perennial sites and individuals survived as living plants in a swamp (Jpn1) or a pond (Jpn2). These accessions were compared with cultivated rice, *Oryza sativa*, and with the wild species *O. rufipogon* (W0106, W0120, and W0137) and *O. meridionalis* (W1297, W1299 and W1300). All plant materials were grown in greenhouse conditions at Kagoshima University. Samples collected from nature were compared with Jpn1 and Jpn2 grown from collected seeds to assess environmental effects on anther length and lemma size. Jpn1 and Jpn2 were crossed with W0106, W0120, W1297, and W1299 to test these relationships. The density of bristle cells on the surface of awns per 100 μm<sup>2</sup> was counted using a scanning electron microscope (JSM-7000F, JEOL co., Japan). A core collection derived from the National Bio-Resource (NBR) Project in Japan was kindly provided, as shown in Table S1 (Supplementary Materials) [24].

#### *4.2. Crossing and Fertility Test*

Jpn1 and Jpn2 were crossed with Asian wild rice, W0106 and W0120, and with Australian *O. meridionalis*, W1299 and W1300. F1 plants were grown in a greenhouse at Kagoshima University. Anthers were taken and stained with I2–KI solution, and well-stained grains were counted as fertile pollen. The remaining panicles were wrapped in paraffin bags to prevent outcrossing. Fully filled grains were counted to calculate seed fertility.

#### *4.3. Data Mining from Whole Genome Sequences*

Whole genome sequences of Jpn1 and Jpn2 were obtained using an Illumina GAIIx to develop INDEL markers of the cpDNA and retrotransposon INDEL markers. The total numbers of paired-end reads were 52,087,744 for Jpn1 and 54,749,858 for Jpn2. Total nucleotides sequenced were 3.9 Gb for Jpn1 and 4.1 Gb for Jpn2. Mean depth was 2022 in Jpn1 and 2002.5 in Jpn2. We aligned the raw reads from our draft data to the cpDNA of Nipponbare (GenBank: GU592207.1) using CLC Genomics Workbench version 6.0. Several INDELs were grouped together so that they could be screened in single PCR reactions with the designed markers listed in Table 5.

Retrotransposable elements *pSINE1* and *pSINE3* have been reported to be uniquely found in either species, *O. rufipogon* or *O. meridionalis* [7,8,23]. Consensus sequences of the 5' and 3' termini were used to design a consensus probe to screen the draft sequence data. Based on the alignment of *pSINE1* elements, consensus sequences were presumed [8]. Two probe sequences were applied for *pSINE1* to screen the data: CCA.CA.CTTGTGGAGCTAGCCGG, in which the periods indicate degenerate nucleotides, for the 5' termini, and TAGGT.TTCCCTAATATTCGCG for the 3' termini. These degenerate probes were applied to screen raw reads of Jpn1 and Jpn2. The 5' termini were confirmed to carry AAGACCCCTGGGCATTTCTC as the complementary sequence. Then, internal sequences of the 5' probe were obtained from the reads to confirm that they shared homology, which ranged from 74% to 82%. In these cases, we adopted the outside of the 5' termini as flanking sequences of *pSINE1* insertions. The 3' termini carried TAG followed by a poly T stretch. We adopted the downstream sites as flanking sequences of *pSINE1* insertions.
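The probe-based screening of raw reads described above can be sketched as a pattern search in which each period in the probe matches any single nucleotide. This is a minimal illustration and not the pipeline used in the study: the function names and the toy reads are hypothetical, and requiring exact matches at the non-degenerate positions, as well as searching both strands, is an assumption.

```python
import re

# Probe sequences quoted in the text; "." marks a degenerate position.
PSINE1_5P = "CCA.CA.CTTGTGGAGCTAGCCGG"
PSINE1_3P = "TAGGT.TTCCCTAATATTCGCG"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_to_regex(probe):
    """Turn a probe with '.' wildcards into a compiled nucleotide pattern."""
    return re.compile(probe.replace(".", "[ACGT]"))


def screen_reads(reads, probe):
    """Return (read_index, strand, start) for each read containing the probe."""
    pattern = probe_to_regex(probe)
    hits = []
    for i, read in enumerate(reads):
        # Search the read as given ("+") and its reverse complement ("-").
        for strand, seq in (("+", read), ("-", reverse_complement(read))):
            match = pattern.search(seq)
            if match:
                hits.append((i, strand, match.start()))
    return hits


# Toy reads (hypothetical): one carries the 5' probe, one carries it on the
# opposite strand, and one carries no probe at all.
forward_read = "GGGG" + PSINE1_5P.replace(".", "T") + "AAAA"
toy_reads = [forward_read, reverse_complement(forward_read), "ACGTACGTACGT"]

hits = screen_reads(toy_reads, PSINE1_5P)
print(hits)
```

The reported start position marks where the probe begins in the oriented read, so the sequence before that position would be the candidate flanking sequence outside the 5' terminus.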

Based on *pSINE3* family elements, *r3004*, *r3005*, *r3012*, and *r3024*, the 5' terminal consensus probe GCCGGGAAGACCCCGGGCC was used to screen internal sequences. The internal sequences were used to design an internal probe, CTAGCTCAGCTTGTGCTA. In order to examine the flanking sequences of insertions, the consensus probe was applied. After confirming the 5' end shared the 5' terminus of *pSINE3*, the outside sequences from TTTCTC were regarded as *pSINE3* insertions.

When multiple reads were obtained as single locations, we specified the genome position based on the Nipponbare genome and detected 14 and 16 insertions that did not overlap. Of these, 21 could be aligned to the Nipponbare genome without *pSINE1* insertions at the site. Flanking sequences in the Nipponbare genome were applied to design primers to amplify the presumed insertions of either *pSINE1* or *pSINE3*. Outward primers inside *pSINE1* or *pSINE3* were also designed as shown in Table 5. Preliminary screening was performed with Nipponbare, three *O. rufipogon*, W0106, W0120, and W0137, two *O. meridionalis*, W1299 and W1300, and Jpn1, and Jpn2.

#### *4.4. Data Analysis*

Dendrograms were constructed using the neighbor-joining method based on Nei's unbiased genetic distances with the Populations 1.2.30 beta2 program, which was downloaded from http://bioinformatics.org/~tryphon/populations/#ancre\_bibliographie. All dendrograms were drawn with the TreeExplorer software, which is supplied with MEGA and is used to show and edit population dendrograms [25].

#### **5. Conclusions**

These data suggested that Jpn2 (taxon B) may be a distinct new species belonging to the *Oryza* genus and isolated from other species by reproductive barriers.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2223-7747/9/2/224/s1, Table S1: Lists of core collections in the NBRP (National Bio-resource Project). Table S2: plastid types among core collections and natural populations in Australia.

**Author Contributions:** Conceptualization, R.I., R.J.H., and K.I.; methodology, R.I.; software, R.I.; validation, D.T.L., R.I., R.J.H., and K.I.; formal analysis, D.T.L., R.J.H., K.I., and R.I.; investigation, D.T.L., R.J.H., K.I., and
R.I.; resources, R.I., R.J.H., K.I., and R.I.; writing—original draft preparation, R.I.; writing—review and editing, D.T.L., R.I., R.J.H., and K.I.; visualization, D.T.L. and R.I.; supervision, R.I.; project administration, R.I.; funding acquisition, R.I. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by a Grant-in-aid B (Overseas project. No. 16H05777) and partly by a Grant-in-aid for Scientific Research on Innovative Areas (15H05968), partly by a Grant-in-aid for Scientific Research A (19H00542), and partly by a Grant-in-aid for Scientific Research A (19H00549).

**Acknowledgments:** We acknowledge Bryan Simon, who supported our mission in Australia.

**Conflicts of Interest:** The authors declare no conflicts of interest.



© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **Segregation Distortion Observed in the Progeny of Crosses Between** *Oryza sativa* **and** *O. meridionalis* **Caused by Abortion During Seed Development**

**Daiki Toyomoto 1, Masato Uemura 2, Satoru Taura 3, Tadashi Sato 4, Robert Henry 5, Ryuji Ishikawa <sup>6</sup> and Katsuyuki Ichitani 1,2,\***


Received: 31 August 2019; Accepted: 3 October 2019; Published: 8 October 2019

**Abstract:** Wild rice relatives having the same AA genome as domesticated rice (*Oryza sativa*) comprise the primary gene pool for rice genetic improvement. Among them, *O. meridionalis* and *O. rufipogon* are found in the northern part of Australia. Three Australian wild rice strains, Jpn1 (*O. rufipogon*), Jpn2, and W1297 (*O. meridionalis*), and one cultivated rice cultivar Taichung 65 (T65) were used in this study. A recurrent backcrossing strategy was adopted to produce chromosomal segment substitution lines (CSSLs) carrying chromosomal segments from wild relatives, and these lines were used for trait evaluation and genetic analysis. The segregation of the DNA marker RM136 locus on chromosome 6 was found to be highly distorted, and a recessive lethal gene causing abortion at the seed developmental stage was shown to be located between two DNA markers, KGC6\_10.09 and KGC6\_22.19, on chromosome 6 of W1297. We named this gene *SEED DEVELOPMENT 1* (gene symbol: *SDV1*). *O. sativa* is thought to share the functional dominant allele *Sdv1-s* (s for *sativa*), and *O. meridionalis* is thought to share the recessive abortive allele *sdv1*-*m* (m for *meridionalis*). Though carrying the *sdv1*-*m* allele, the *O. meridionalis* accessions can self-fertilize and bear seeds. We speculate that the *SDV1* gene may have been duplicated before the divergence between *O. meridionalis* and the other AA genome *Oryza* species, and that *O. meridionalis* has lost the function of the *SDV1* gene and has kept the function of another putative gene named *SDV2*.

**Keywords:** reproductive barrier; segregation distortion; abortion; wild rice; *O. meridionalis*; *O. sativa*; gene duplication

#### **1. Introduction**

Rice (*Oryza sativa*) is one of the most important staple crops in the world. It feeds about one-third of the world population. Wild rice relatives having the same AA genome as domesticated rice comprise the primary gene pool for rice genetic improvement and include the following species: *O. rufipogon*, *O. meridionalis*, *O. glumaepatula*, *O. barthii*, and *O. longistaminata*. Another domesticated *Oryza* species, *O. glaberrima* (African rice), also has an AA genome and contributes to rice improvement. Though there are several reproductive barriers among these species as described below, transfer of useful genes such as disease resistance genes from AA genome *Oryza* species to rice has been successful via hybridization.

*O. meridionalis* and *O. rufipogon* are found in the northern part of Australia [1]. *O. rufipogon* is inferred to be the direct progenitor of *O. sativa* [2], and widely distributed not only in Australia but also in South and South East Asia and New Guinea. On the other hand, the distribution of *O. meridionalis* is confined to the northern parts of Australia and Irian Jaya, Indonesia [1]. Molecular data provides support for the divergence of *O. meridionalis* from the other AA genome *Oryza* species [3–7]. This is reflected by low pollen fertility of the hybrids between *O. meridionalis* and the other AA genome species [8,9], with almost no progeny being produced from the selfing of the hybrids. To utilize the rice breeding potential of wild relatives of rice, a recurrent backcrossing strategy has been adopted to produce chromosomal segment substitution lines (CSSLs) carrying chromosomal segments from wild relatives of rice in the genetic background of cultivated rice [10–13]. Subsequent backcrossing with *O. sativa* as pollen parent was successful, because the F1 plants between *O. sativa* and its wild relatives retained female fertility.

To elucidate the genetic potential for the improvement of cultivated rice using these wild species, we produced three kinds of CSSLs with different Australian wild rice strains in the same genetic background. As a model agronomic trait, we selected late heading, because the wild rice strains in this study head later than the recurrent parent Taichung 65 by about 50 days, and heading time is easily scored. We have succeeded in mapping the late-heading-time genes from these wild rice strains (see below) and found a new genetic distortion phenomenon in the genus *Oryza*. In this study, we report the genetic mechanism of the new distortion phenomenon.

#### **2. Results**

#### *2.1. Mapping of Photoperiod Sensitivity Gene*

Three wild rice strains, Jpn1, Jpn2, and W1297, and one cultivated rice cultivar Taichung 65 (T65) were used in this study. We bred various CSSLs in a T65 genetic background incorporating chromosomal segments of the three Australian wild rice strains, W1297, Jpn1, and Jpn2, by recurrent backcrossing (see Materials and Methods). Hereafter, the backcrossing populations using Jpn1, Jpn2, and W1297 as donor parent are described as BCnFm (Jpn1), BCnFm (Jpn2), and BCnFm (W1297), respectively. "n" and "m" represent numbers of backcrossing and selfing, respectively. The frequency distributions of days to heading of the three BC3F2 populations are shown in Figure 1. All populations showed bimodal distributions. A total of 94 DNA markers covering all 12 chromosomes and showing polymorphism between T65 and the three wild rice strains were subjected to preliminary linkage analysis using bulked DNA from the three BC3F2 populations. Only one marker, RM136, located 568 kbp away from a photoperiod sensitivity gene *HD1* [14], showed heterozygosity in all the bulk DNAs. Chi-square values for the independence between genotypes of RM136 and days to heading (early and late heading divided by the dotted line in Figure 1) were 26.880, 81.073, and 86.693 for Jpn1, Jpn2, and W1297, respectively, all highly significant (*P* < 0.0001). These results suggest that the three strains from Australia carry photoperiod-sensitive alleles of the *HD1* locus, because heterozygotes and homozygotes of these strains at the RM136 locus headed much later than the homozygotes of T65. T65 has been shown to carry a photoperiod-insensitive allele at the *HD1* (= *Se1*) locus [15], which behaves as an early heading-time allele in a usual cropping season in Japan [16–18].

In the BC3F2 (W1297), the segregation of the RM136 locus was highly distorted: very few homozygotes of W1297 appeared. To check whether this phenomenon was specific to the cross with W1297 as donor parent, and to evaluate it more clearly under a more uniform genetic background, BC4F2 populations with the same cross combinations were subjected to further study. As for W1297, the BC3F1 plant that produced the BC3F2 population for the analysis was backcrossed again to produce BC4F1 plants. Among them, late flowering plants were selected to produce BC4F2 populations. As for Jpn1 and Jpn2, BC3F1 plants different from those that produced the BC3F2 populations for the above experiment were backcrossed to produce BC4F1 plants. Among them, late flowering plants were selected to produce BC4F2 populations.

**Figure 1.** Frequency distributions of days to heading of the three BC3F2 populations using T65 as the recurrent parent. Jpn1, Jpn2, and W1297 were respectively used as donor parent in subfigure (**a**), (**b**), and (**c**). Three classified genotypes were assessed for RM136 as indicated: white, homozygous for T65, grey, heterozygous, black, homozygous for wild rice strains, Jpn1 (**a**), Jpn2 (**b**), and W1297 (**c**). Dotted lines dividing each population into early heading and late heading were drawn for chi-square analysis (see text).

#### *2.2. Mapping of Segregation Distortion Gene*

In the BC4F2 (W1297), the segregation of RM136 genotypes was distorted again (data not shown). In our preliminary experiment, among the published DNA markers around RM136, RM314 [19] located at 4,845 kb, RM276 [19] at 6,231 kb, RM7023 [20] at 6,972 kb, RM3628 [20] at 23,738 kb, and RM5314 [20] at 24,843 kb on the IRGSP 1.0 pseudomolecule for chromosome 6 were fixed for the T65 allele. On the other hand, RM6818 and RM193 (Table 1) were segregating. These results suggest that the cause of segregation distortion is located between RM7023 and RM3628. Because other published DNA markers in our stocks failed in amplification of W1297 or did not distinguish T65 from W1297, we designed new DNA markers (Table 1) and performed linkage analysis. For the five consecutive markers from KGC6\_12.02 to KGC6\_19.48, only homozygotes of T65 and heterozygotes appeared (Figure 2), and no recombination occurred among the five markers (Table 2). The ratio of 64 homozygotes of T65 to 119 heterozygotes fitted very well to 1:2 (χ<sup>2</sup>(1:2) = 0.221, *P* = 0.638).
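The goodness-of-fit figures quoted above can be reproduced with a short calculation. The sketch below uses only the Python standard library; the function names are hypothetical, and the *P* value uses the identity that, for one degree of freedom, the upper-tail chi-square probability equals erfc(√(χ²/2)). It is offered as a check on the arithmetic, not as the software used in the study.

```python
import math


def chi_square_gof(observed, ratio):
    """Chi-square goodness-of-fit statistic for counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# 64 T65 homozygotes and 119 heterozygotes tested against a 1:2 ratio.
chi2 = chi_square_gof([64, 119], [1, 2])
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")  # chi2 = 0.221, P = 0.638
```

With 183 plants, the expected counts under 1:2 are 61 and 122, so each class deviates by only three plants, which is why the statistic is so small.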

**Figure 2.** Frequency distributions of days to heading of the three BC4F2 populations using T65 as recurrent parent. Jpn1, Jpn2, and W1297 were respectively used as donor parent in subfigure (**a**), (**b**) and (**c**). Three classified genotypes were assessed for KGC6\_12.02 as indicated: white, homozygous for T65, grey, heterozygous, black, homozygous for wild rice strains, Jpn1 (**a**), Jpn2 (**b**), and W1297 (**c**).


**Table 1.** Primer sequences of DNA markers designed or redesigned for linkage analysis of *SDV1* gene.

**Table 2.** Haplotypes around the segregation distortion region on rice chromosome 6 of BC4F2 (W1297).


<sup>1</sup> T, H, and W respectively denote homozygotes for T65, heterozygotes, and homozygotes for W1297.

The distorted segregation ratio 1:2:0 can be explained by one pair of recessive lethal genes. If the lethality occurred at the seedling stage, about 25% of seedlings would be expected to die. However, our visual observation did not fit with such a phenomenon. We then speculated that segregation distortion occurred during seed development. If so, seed fertility of the heterozygotes should be lower than that of the T65 homozygotes by about 25%. To test this, we examined the seed fertility of each of the BC4F2 plants. In the BC4F2 (W1297) population, the heterozygotes for KGC6\_12.02 showed lower seed fertility than the homozygotes of the T65 allele (Figure 3). If lower fertility was caused by one recessive gene, many of the sterile seeds were expected to be aborted after fertilization. Therefore, sterile seeds were dehusked to see if sterility occurred before or after fertilization (Figure 4). The proportion of seeds aborted after fertilization for heterozygotes for KGC6\_12.02 was higher than that for homozygotes of the T65 allele (Figure 5). These results suggested that homozygotes of the W1297 allele for KGC6\_12.02 die at the seed developmental stage.
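The expectation behind this reasoning can be made explicit by enumerating the zygotes produced when a heterozygote is selfed. The following sketch is an illustration under simple Mendelian assumptions (equal transmission of both alleles through egg and pollen, complete abortion of one genotype); the allele labels `T` (T65) and `w` (W1297) and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import product


def selfing_classes(lethal_genotype=("w", "w")):
    """Enumerate zygotes from selfing a T/w heterozygote with one lethal genotype.

    Returns the zygote frequencies, the aborted fraction, and the genotype
    frequencies among the surviving seeds.
    """
    gametes = {"T": Fraction(1, 2), "w": Fraction(1, 2)}
    lethal = tuple(sorted(lethal_genotype))
    zygotes = {}
    for (egg, p_egg), (pollen, p_pollen) in product(gametes.items(), repeat=2):
        genotype = tuple(sorted((egg, pollen)))
        zygotes[genotype] = zygotes.get(genotype, Fraction(0)) + p_egg * p_pollen
    aborted = zygotes.get(lethal, Fraction(0))
    survivors = {g: f / (1 - aborted) for g, f in zygotes.items() if g != lethal}
    return zygotes, aborted, survivors


zygotes, aborted, survivors = selfing_classes()
print("aborted fraction:", aborted)
for genotype, freq in survivors.items():
    print("/".join(genotype), freq)
```

Under these assumptions one quarter of the fertilized seeds abort, and the surviving seeds segregate one T65 homozygote to two heterozygotes, which corresponds to the 1:2:0 ratio and the roughly 25% reduction in seed fertility discussed above.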

**Figure 3.** Scatter diagram of days to heading and seed fertility in the two BC4F2 populations using T65 as the recurrent parent. Jpn2 and W1297 were respectively used as donor parent in subfigure (**a**) and (**b**). Two classified genotypes were assessed for KGC6\_12.02 as indicated: solid circle, homozygous for T65; open circle, heterozygous. In (**b**), plants used for testcross (Table 3) or damaged by birds in Figure 2 were removed in this figure.

**Figure 4.** Sterile seeds aborted after fertilization (top) and normal fertile seeds (bottom) found in the BC4F2 (W1297). One unit of the rightmost scale indicates 1 mm.

**Figure 5.** The scatter diagram between seed fertility (the ratio of fertile seeds) (X-axis) and the ratio of seeds aborted after fertilization in sterile seeds (Y-axis) in the BC4F2 population (W1297). Two classified genotypes were assessed for KGC6\_12.02 as indicated: solid circle, homozygous for T65; open circle, heterozygous. Plants used for testcross (Table 3) or damaged by birds in Figure 2 were removed in this figure.

#### *2.3. Segregation Distortion Caused by Abortion During Seed Development*

To confirm this hypothesis, the following experiments were performed. First, pollen fertility was examined for all BC4F2 (W1297) plants; all plants showed more than 90% pollen fertility (data not shown), suggesting that pollen sterility was not the cause of the distorted segregation ratio. Second, heterozygotes for KGC6\_12.02 were reciprocally backcrossed to T65 to produce a BC5F1 generation. In the BC5F1 from both cross combinations, segregation fitted a ratio of 1 heterozygote : 1 homozygote for the T65 allele (Table 3), indicating that normal gene segregation occurred at both the egg and pollen developmental stages. The BC4F2 plants used for backcrossing were also selfed to produce a BC4F3 generation, and DNA was extracted from the embryos of the fertile seeds. The segregation ratio was strongly distorted from 1:2:1 at the KGC6\_12.02 locus, and no homozygotes for the W1297 allele appeared, indicating that segregation distortion occurred during seed development and was not caused by fertile seeds that failed to germinate, although the segregation ratio did not fit a 1:2:0 ratio (Table 3). The BC4F3 plants derived from selfed seeds of the BC4F2 plants heterozygous for the KGC6\_12.02 locus also showed distorted segregation, and the ratio fitted 1:2:0, confirming the other experimental results (Table 3). Taken together, the experimental results indicated that a recessive lethal gene causing abortion at the seed developmental stage was located between KGC6\_10.09 and KGC6\_22.19 on chromosome 6 of W1297 (Table 2).


**Table 3.** Segregation of progeny of BC4F2 (W1297) heterozygous for KGC6\_12.02 genotype.

<sup>1</sup> T, H, and W respectively denote homozygotes for T65, heterozygotes, and homozygotes for W1297.

The same segregation distortion was also found in the BC4F2 (Jpn2) population (Figure 2). The ratio of 59 homozygotes of the T65 allele : 97 heterozygotes at the KGC6\_12.02 locus fitted very well to 1:2 (χ2(1:2) = 0.556, *P* = 0.456), and no homozygotes of the Jpn2 allele appeared. The seed fertility of heterozygotes of KGC6\_12 was lower than that of the homozygotes of the T65 allele, supporting the view that a recessive lethal gene causing abortion at the seed developmental stage was located close to KGC6\_12 of Jpn2 (Figure 3). The seed fertility of BC4F2 (Jpn2) was highly variable for both homozygotes of the T65 allele and heterozygotes at the KGC6\_12.02 locus, suggesting that other genetic factor(s) were involved in the large variance of seed fertility. Our preliminary results showed that low pollen fertility might be responsible for the low seed fertility of some plants (unpublished data). Therefore, the cause of the seed sterility was not investigated further. For Jpn1, both BC3F2 (Jpn1) and BC4F2 (Jpn1) showed that normal gene segregation occurred around the *HD1* locus (Figures 1 and 2).

These results indicated that the two Australian *O. meridionalis* strains, W1297 and Jpn2, carry a recessive lethal gene causing abortion at the seed developmental stage, which was located between the two DNA markers, KGC6\_10.09 and KGC6\_22.19, spanning 12 Mb on chromosome 6, and that the Australian *O. rufipogon* strain Jpn1 does not carry such a gene.

#### **3. Discussion**

Many genes conferring hybrid seed sterility, hybrid pollen sterility, and segregation distortion have been found on *Oryza* chromosome 6 in inter- and intra-specific crosses, most of which involve *O. sativa* [21–27]. However, to our knowledge, segregation distortion caused by seed abortion after fertilization has not been reported in the genus *Oryza*. We name this gene *SEED DEVELOPMENT 1* (gene symbol: *SDV1*), according to the gene nomenclature system for rice [28], because this gene is involved in the early seed developmental stage. In intraspecific crosses within *O. sativa*, there have been no reports on chromosome 6 of segregation distortion or partial seed sterility as described above, though other phenomena have been reported [21–27]. Therefore, all *O*. *sativa* is thought to share the same functional dominant allele found in T65. This allele was called *Sdv1-s* (s for *sativa*). The homozygotes of the W1297 allele and the Jpn2 allele of this locus do not exist in the T65 genetic background, probably because they die at the early seed development stage. W1297 and Jpn2 originated from different places in Australia: W1297 is from the Northern Territory, and Jpn2 is from Queensland. According to Juliano et al. [29], most crosses between Northern Territory and Queensland accessions produced sterile hybrids. Our preliminary results showed that the hybrids from the reciprocal crosses between W1297 and Jpn2 were highly sterile (unpublished data). DNA marker-based analyses showed *O. meridionalis* genetic differentiation corresponding to geographic origin [29]. Further, in the CSSLs carrying chromosomal segments of an *O. meridionalis* accession, W1625, in a T65 genetic background [12], no lines were fixed for the W1625 chromosomal segment on which the *SDV1* locus is located (https://shigen.nig.ac.jp/rice/Oryzabase/locale/change?lang=en). Taken as a whole, the results described above suggest that all *O. meridionalis* share the recessive abortive allele. This allele was called *sdv1-m* (m for *meridionalis*).

Though carrying the recessive abortive allele in homozygous form at the *SDV1* locus, the *O. meridionalis* accessions can self-fertilize and bear seeds. We speculate that the *SDV1* gene may have been duplicated before the divergence between *O. meridionalis* and the other AA genome *Oryza* species, and that *O. meridionalis* has lost the function of the *SDV1* gene and has kept the function of the other gene, while *O. sativa* has kept the function of the *SDV1* gene and has lost the function of the other gene (Figure 6). Such duplication and loss of reproductive barrier-related genes has been reported: Yamagata et al. [30] found that the reciprocal loss of duplicated genes encoding mitochondrial ribosomal protein L27, essential for the later stage of pollen development, causes hybrid pollen sterility in the F1 hybrid between *O. sativa* and *O. glumaepatula*. Nguyen et al. [31] reported that the duplication and loss of function of genes encoding RNA polymerase III subunit C4 causes hybrid pollen sterility in the F1 hybrid between *O. sativa* and *O. nivara* (annual form of *O. rufipogon*). Ichitani et al. [32] performed linkage analysis of hybrid chlorosis genes in rice, and found that the causal recessive genes *hca1-1* and *hca2-1* are located on the distal regions of the short arms of chromosomes 12 and 11, respectively, which are known to be highly conserved as a duplicated chromosomal segment.

There are other models explaining the hybrid incompatibility (abortion, lethality, or weakness) known as the Bateson–Dobzhansky–Muller (BDM) model (for a review, Bomblies et al. [33]). In the incompatibility caused by the two nonallelic dominant genes, if the one locus is heterozygote or fixed for the incompatibility-causing allele, both heterozygotes and homozygotes of the incompatibility-causing allele of the other locus should show incompatibility. If the one locus is fixed for the normal allele, incompatibility does not occur. In the incompatibility caused by the heterozygote on one locus, only heterozygotes should show incompatibility. Therefore, these models cannot explain the segregation distortion in this study. In the hybrid breakdown model proposed by Oka [34], the combination of the heterozygotes or the homozygotes of recessive alleles at one locus and the homozygotes of recessive alleles at the other locus show incompatibility. This model cannot explain the segregation in this study either. Therefore, the gene duplication model as described above fits the phenomenon in this study best.

As the counterpart of *SDV1*, "the other" putative gene is named *SEED DEVELOPMENT 2* (gene symbol: *SDV2*). *SDV1* and *SDV2* are thought to be derived from duplication. *O. meridionalis* accessions and *O. sativa* accessions should carry the functional allele and the unfunctional allele at the *SDV2* locus, respectively. We name these respective alleles *Sdv2-m* and *sdv2-s*. The presence and chromosomal location of *SDV2* have not been elucidated. We are undertaking the genetic analysis of *SDV2*, tracing back to earlier backcrossing populations.
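The duplicate-gene model can be made concrete with a small enumeration. This is a sketch under stated assumptions rather than a description of the authors' analysis: it assumes that *SDV1* and *SDV2* assort independently (the location of *SDV2* is unknown) and that a seed develops normally if it carries at least one functional allele (*Sdv1-s* or *Sdv2-m*). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def gamete_freqs(genotype):
    """Gamete frequencies for a two-locus genotype, assuming independent assortment.

    genotype is ((a1, a2), (b1, b2)): the two alleles at SDV1 and at SDV2.
    """
    freqs = {}
    for a, b in product(genotype[0], genotype[1]):
        freqs[(a, b)] = freqs.get((a, b), Fraction(0)) + Fraction(1, 4)
    return freqs

def is_viable(sdv1, sdv2):
    """Assumed rule: at least one functional copy (Sdv1-s or Sdv2-m) is required."""
    return "Sdv1-s" in sdv1 or "Sdv2-m" in sdv2

def selfed_classes(genotype):
    """SDV1 genotype frequencies among viable zygotes, plus the aborted fraction."""
    gametes = gamete_freqs(genotype)
    viable, aborted = {}, Fraction(0)
    for (g1, p1), (g2, p2) in product(gametes.items(), repeat=2):
        sdv1 = tuple(sorted((g1[0], g2[0])))
        sdv2 = tuple(sorted((g1[1], g2[1])))
        if is_viable(sdv1, sdv2):
            viable[sdv1] = viable.get(sdv1, Fraction(0)) + p1 * p2
        else:
            aborted += p1 * p2
    return viable, aborted

# Backcross-derived plant: heterozygous at SDV1, fixed for sdv2-s (T65 background).
bc_plant = (("Sdv1-s", "sdv1-m"), ("sdv2-s", "sdv2-s"))
# F1-like plant: heterozygous at both loci.
f1_plant = (("Sdv1-s", "sdv1-m"), ("Sdv2-m", "sdv2-s"))

bc_viable, bc_aborted = selfed_classes(bc_plant)
f1_viable, f1_aborted = selfed_classes(f1_plant)
print(bc_viable, bc_aborted)
print(f1_viable, f1_aborted)
```

Under these assumptions, a plant fixed for *sdv2-s* loses one quarter of its zygotes and the viable progeny segregate 1:2:0 at *SDV1*, whereas a double heterozygote loses only one sixteenth of its zygotes and still yields *sdv1-m* homozygotes. This would explain why distortion becomes conspicuous only in advanced backcross generations.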

**Figure 6.** A genetic model that explains segregation distortion and seed sterility assuming gene duplication and loss of gene function for seed development.

If useful genes of *O. meridionalis* for rice genetic improvement are located close to *sdv1-m*, the introgression of these genes into *O. sativa* genetic background should be combined with *Sdv2-m*. Therefore, the chromosomal location of *SDV2* and tightly linked DNA markers to it are urgently needed.

In the frequently cited high-density rice genetic linkage map by Harushima et al. [35], the centromeric region of chromosome 6 is located between 64.7 cM and 65.7 cM. Some of the DNA marker sequences located in the centromeric region are available in NCBI (https://www.ncbi.nlm.nih.gov). C574 (accession name: D15395) is located at 13,685 kb, and G294 (accession name: D14774) is located at 17,056 kb in the Nipponbare genome (Os-Nipponbare-Reference-IRGSP-1.0). Therefore, the physical size of the centromeric region is at least 3,371 kb (17,056 kb − 13,685 kb). The candidate chromosomal region of *SDV1* encompasses this region (Tables 1 and 2). Recombination is, in general, highly suppressed around the centromere, and our result, in which no recombination occurred among the five markers from KGC6\_12.02 to KGC6\_19.48, is consistent with that. The combination of high-resolution linkage analysis with gene expression analysis, gene disruption, and association study will be necessary to identify the *SDV1* gene.

Seed development is dissected into embryogenesis and endosperm development. We are undertaking microscopic observation of seed development of *Sdv1-s sdv1-m* heterozygotes in the T65 background to define the cause of the seed abortion. Several genes required for embryogenesis and endosperm development have been reported [36–39]. Identification of the *SDV1* and *SDV2* genes will contribute to the molecular genetics of seed development.

Direct evidence supporting the gene model would be that DNA from the embryos of aborted seeds derived from heterozygotes for KGC6\_12.02 is homozygous for the W1297 allele. In our preliminary experiments, we tried to extract DNA from such embryos, modifying the method described below so that the DNA concentration would be higher. A few embryos were homozygous for the W1297 allele. However, PCR failed in most cases. This suggests that embryogenesis stops at an early stage in the homozygotes of *sdv1-m*. One alternative approach might be to extract DNA from developing seeds rather than from mature seeds. A combination of microscopic observation of developing embryos and DNA genotyping will contribute to understanding the abortion mechanism caused by the *sdv1-m* gene.

According to the chloroplast genome analyses by Wambugu et al. [40], Yin et al. [41] and Sotowa et al. [42], *O. rufipogon* in Australia carries a chloroplast genome similar to that of *O. meridionalis* rather than that of *O. rufipogon* in Asia and *O. sativa*, probably because of chloroplast capture (introgression). During the process of chloroplast capture, some nuclear genome genes could be shared by *O. rufipogon* in Australia and *O. meridionalis*. However, they might carry distinct alleles on *SDV1* and *SDV2* loci. When the DNA sequences of these alleles of the two loci are uncovered, this information can be applied for the analysis of plants growing in the wild, and possible ongoing hybridization between *O. rufipogon* and *O. meridionalis* can be monitored in the Northern part of Australia, in which the two species are sympatric. Hybrids have been found in the wild and confirmed by molecular analysis [43], but the low frequency of these hybrids and the continued existence of the two distinct AA genome taxa in the northern Australian environment may be explained by these genes that create a reproductive barrier.

#### **4. Materials and Methods**

#### *4.1. Plant Material*

Three wild rice strains, Jpn1, Jpn2, and W1297, and one cultivated rice cultivar, Taichung 65 (T65), were used in this study. W1297 is a strain of *O. meridionalis* collected in Darwin, Northern Territory, Australia, and provided by the National Institute of Genetics, Mishima, Japan. Jpn1 and Jpn2 were collected in Australia with permission from the Queensland government, under the EcoAccess program [42]. Judging from its perennial life history, typical of Australian *O. rufipogon*, and Indel marker genotypes, Jpn1 was classified as *O. rufipogon*. The Australian *O*. *rufipogon* population at the Jpn1 site has been shown to have a chloroplast similar to that of *O*. *meridionalis* and a nuclear genome closer to *O*. *rufipogon* [44], suggesting that it may need to be considered as a distinct taxon. Jpn2 was distinct, with a short anther, typical of *O. meridionalis*, and a perennial life history in its habitat in Queensland, Australia [42]. *O*. *meridionalis* is now described as including both annual and perennial types [45]. Jpn2 had Indel marker genotypes that were the same as those of 18 *O. meridionalis* Core collection accessions. It was treated as a type of *O. meridionalis* based on five Indel DNA markers that reflect varietal differentiation in comparisons such as Indica–Japonica and temperate Japonica–tropical Japonica with high accuracy [46,47]. Our visual observations indicated that the three wild rice strains each showed a uniform phenotype in the first growing year, suggesting that they had been fixed for at least the loci controlling agronomic traits. Before anthesis, the panicles of the wild rice strains were covered with bags made of glassine paper to force self-fertilization in every generation. The selfed progeny also showed uniform phenotypes. The preliminary analysis of DNA markers covering all 12 chromosomes indicated that they were homozygous at all the DNA marker loci. T65 is a Japonica cultivar used frequently in the study of rice genetics, as a recurrent parent of CSSLs and isogenic lines, and in the study of induced mutation [12,21,48].

We bred CSSLs in a T65 genetic background incorporating the three Australian wild rice strains, W1297, Jpn1, and Jpn2, chromosomal segments by recurrent backcrossing. First T65 was crossed with W1297, Jpn1, and Jpn2 as pollen parents. One plant per each wild rice strain was used for producing the F1 generation. Then the F1 was backcrossed with T65 as a pollen parent in all subsequent backcross generations with some exception described above. A total of 39, 43, and 33 BC1F1 plants were obtained using W1297, Jpn1, and Jpn2 as donor parent, respectively. All the BC1F1 plants were backcrossed with T65. One BC2F1 plant originating from each BC1F1 plant was backcrossed with T65 to produce BC3F1. One BC3F1 plant originating from each BC2F1 plant was backcrossed with T65 to produce BC4F1. W1297, Jpn1, and Jpn2 have many characters different from T65, such as late heading, red pericarp, long awn, and easy shattering. Some BC3F1 plants had such characteristics in T65 genetic background, which was suitable for genetic dissection of these characters. As a model character, late heading was selected. We selected the latest BC3F1 plants and collected seeds from these plants to produce a BC3F2 generation. As shown above, because the segregation of genes conferring days to heading did not fit the expected Mendelian single gene segregation, we focused on the analysis of the distorted segregation. The BC4F2 generations deriving from the late heading BC4F1 plants were also examined.

Plant cultivation followed Ichitani et al. [48]. Germinated seeds were sown in nursery beds in a greenhouse. About two weeks after sowing, seedlings were transferred out of the greenhouse. About 30 days after the sowing date, seedlings were planted in a paddy field at the Experimental Farm of Kagoshima University, Kagoshima, Japan. The fertilizers applied were 4, 6, and 5 g/m2, respectively, for N, K2O, and P2O5. Plant spacing was 15 × 30 cm. Sowing and transplanting were done on May 31 and June 24 in 2015, on May 27 and June 28 in 2016, and on May 25 and July 4 in 2017, respectively. Hybridization was performed as follows. For emasculation, panicles of the egg donor were soaked in hot water at 43 °C for 7 min. For pollination, the upper halves of the open spikelets were cut off about 30 min after emasculation. All the closed spikelets were cut off. Then pollen of the pollen donor was scattered on them. After pollination, panicles were covered with bags made of glassine paper. At least one panicle was left without pollination to check whether emasculation was complete.

#### *4.2. Trait Evaluation*

Heading date was recorded for each plant when the first developing panicle emerged from the leaf sheath of the flag leaf. Heading date was converted into days to heading. Seed fertility was evaluated by collecting 50 seeds from the upper side of each of the three panicles, using a modification of the method of Wan and Ikehashi [49], counting fertile and sterile spikelets on the upper half of 3–4 panicles for each plant. Seeds were scored as fertile or sterile. In the W1297 cross, sterile seeds were dehusked to see if sterility occurred before or after fertilization. The BC4F2 plants that produced the BC5F1 generation were dug up, and transferred from the paddy field to a glass house a day before pollination. We empirically know that rice plants undergoing such a treatment show lower seed fertility, probably because of root damage. Therefore, we did not evaluate seed fertility of these plants. Panicles of some plants were damaged by birds after heading. This is the reason for the inconsistency in BC4F2 plant number among tables and figures.
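The trait values used in Figures 2, 3 and 5 can be derived as in the following sketch. The definitions are assumptions made for illustration: days to heading is taken as the number of days from sowing to heading, seed fertility as the proportion of fertile spikelets among all scored spikelets, and the abortion ratio as the proportion of sterile seeds that were aborted after fertilization. The plant data are hypothetical.

```python
from datetime import date

def days_to_heading(sowing, heading):
    """Days from sowing to heading (assumed definition of 'days to heading')."""
    return (heading - sowing).days

def seed_fertility(fertile, sterile):
    """Proportion of fertile spikelets among all scored spikelets."""
    return fertile / (fertile + sterile)

def aborted_after_fertilization_ratio(aborted_after, sterile):
    """Proportion of sterile seeds that were aborted after fertilization."""
    return aborted_after / sterile if sterile else 0.0

# Hypothetical plant: sown on 27 May 2016 and headed on 25 August 2016,
# with 105 fertile and 45 sterile spikelets, 30 of the latter aborted after fertilization.
dth = days_to_heading(date(2016, 5, 27), date(2016, 8, 25))
fert = seed_fertility(fertile=105, sterile=45)
abort = aborted_after_fertilization_ratio(aborted_after=30, sterile=45)
print(dth, round(fert, 2), round(abort, 2))
```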

Pollen fertility of the BC4F2 (W1297) population was evaluated using iodine-potassium iodide solution. Panicles were collected about three days after emerging from the leaf sheath of the flag leaf, and dried in paper bags at room temperature. All the anthers in a spikelet collected one day before anthesis were cleaved to gather pollen on a glass slide. Pollen grains were stained with iodine-potassium iodide solution. More than 200 pollen grains were scored for each individual. Densely stained pollen grains of normal size were scored as fertile. The other pollen grains were scored as sterile.

#### *4.3. DNA Analysis*

DNA from leaves and from embryos of fertile seeds was extracted according to Ichitani et al. [48] with some modifications. Each leaf tip, 2.5 cm long from a single plant, or embryo from a dehusked seed was put in a well of a 96-deep-well plate. Then 100 μL of extraction buffer (100 mM Tris–HCl (pH 8.0), 1 M KCl, and 10 mM EDTA) was added with a 5-mm-diameter stainless steel ball to the well. After being covered with a hard lid, the plate was shaken hard (ShakeMaster ver. 1.2; BioMedical Science Inc., Tokyo, Japan) for 1 min to grind the leaves or embryos. After centrifuging, the plate was incubated at 70 °C for half an hour, then at room temperature for half an hour. Then 10 μL of the supernatant was recovered and 8 μL of 2-propanol was added. After centrifuging, the supernatant was discarded and the DNA pellet was rinsed with 50 μL of 70% ethanol. The DNA pellet was dried and dissolved in 50 μL of sterilized distilled water. It was very difficult to separate the embryo completely from the other parts of the seed. However, our preliminary experiment indicated that even if DNA was extracted from whole dehusked seeds produced by a heterozygote for a DNA marker such as KGC6\_12.02 (Table 1), DNA marker segregation was observed, suggesting that DNA from the parts of the dehusked seed other than the embryo was negligible. The PCR mixture, PCR cycle, electrophoresis, DNA staining, and gel image documentation also followed Ichitani et al. [48].

#### *4.4. DNA Markers*

Most published PCR-based DNA markers for *Oryza* are based on an *O. sativa* genome sequence such as Nipponbare (Os-Nipponbare-Reference-IRGSP-1.0, [50]) and 93-11 (GCA\_0000046551, [51]). However, a preliminary survey comparing the genome of Nipponbare (IRGSP 1.0) and that of an *O. meridionalis* accession (GCA\_000338895.2, [7]) showed that there were many discrepancies between them, so *O. sativa* genome-based DNA markers were expected to fail in amplification from the *O. meridionalis* genome. Our strategy for designing co-dominant DNA markers was as follows. First, insertion/deletion (indel) polymorphisms ranging from 5 to 100 base pairs were searched for between the Nipponbare and the *O. meridionalis* genomes. Then, the indels found only between *O. meridionalis* and Nipponbare, not between *O. meridionalis* and two Indica cultivars, 93-11 and HR-12 (GCA\_000725085), were selected. The events causing such indels were thought to have occurred in Japonica rice after Japonica-Indica differentiation. T65 is a typical Japonica cultivar. Our preliminary survey showed that T65 shared the banding patterns of Nipponbare in most of the DNA markers examined [52]. Therefore, the indels described above were expected to show polymorphism between T65 and *O. meridionalis*. The selected indels were screened based on sequence similarity surrounding the indels between the Nipponbare and the *O. meridionalis* genomes. The primer design followed Busung et al. [53].
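The selection rule for candidate indels can be summarised as a simple filter. The following sketch is illustrative only: the `Indel` record, its field names and the example positions are hypothetical and do not reflect the data structures or coordinates actually used in the study.

```python
from dataclasses import dataclass

@dataclass
class Indel:
    position_kb: float    # hypothetical position on Nipponbare chromosome 6
    length_bp: int        # indel size in base pairs
    vs_nipponbare: bool   # polymorphic between O. meridionalis and Nipponbare
    vs_9311: bool         # polymorphic between O. meridionalis and 93-11
    vs_hr12: bool         # polymorphic between O. meridionalis and HR-12

def is_candidate(indel, min_len=5, max_len=100):
    """Keep 5-100 bp indels seen against Nipponbare but not against the Indica genomes."""
    return (min_len <= indel.length_bp <= max_len
            and indel.vs_nipponbare
            and not indel.vs_9311
            and not indel.vs_hr12)

# Hypothetical indels, for illustration only.
indels = [
    Indel(10090.0, 12, True, False, False),   # passes all criteria
    Indel(12020.0, 3, True, False, False),    # too short
    Indel(15500.0, 40, True, True, False),    # also differs from 93-11
    Indel(19480.0, 150, True, False, False),  # too long
]
candidates = [i for i in indels if is_candidate(i)]
print([c.position_kb for c in candidates])
```

Candidates passing this filter would still need the flanking-sequence similarity check and primer design described in the text.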

**Author Contributions:** Conceptualization, K.I.; methodology, K.I.; validation, D.T. and M.U.; formal analysis, D.T., M.U., and K.I.; investigation, D.T. and M.U.; resources, S.T., T.S., R.H., R.I., K.I.; data curation, D.T., M.U., K.I.; writing—original draft preparation, D.T. and K.I.; writing—review and editing, D.T., M.U., S.T., T.S., R.H., R.I. and K.I.; visualization, D.T., M.U., and K.I.; supervision, K.I.; project administration, R.I.; funding acquisition, R.I.

**Funding:** This research was funded by JSPS KAKENHI Grant Number JP16H05777 from the Japan Society for the Promotion of Science.

**Acknowledgments:** We are grateful to the National Institute of Genetics for their kind provision of W1297 seeds. We thank Mr. Masaaki Ikenoue, Mr. Nishiobino Tsubasa, Ms. Yoko Nakashima and Ms. Asako Kobai for their technical assistance.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

1. Henry, R.J.; Rice, N.; Waters, D.L.E.; Kasem, S.; Ishikawa, R.; Hao, Y.; Dillon, S.; Crayn, D.; Wing, R.; Vaughan, D. Australian *Oryza*: Utility and conservation. *Rice* **2010**, *3*, 235–241.


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Relationships between Iraqi Rice Varieties at the Nuclear and Plastid Genome Levels**

#### **Hayba Badro, Agnelo Furtado and Robert Henry \***

Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD 4072, Australia; haybaq@yahoo.com (H.B.); a.furtado@uq.edu.au (A.F.)

**\*** Correspondence: robert.henry@uq.edu.au

Received: 16 September 2019; Accepted: 5 November 2019; Published: 7 November 2019

**Abstract:** Due to the importance of the rice crop in Iraq, this study was conducted to determine the origin of the major varieties and to understand the evolutionary relationships between Iraqi rice varieties and other Asian rice accessions, which could be significant for the improvement of this crop. Five varieties of *Oryza sativa* (Amber33, Furat, Yasmin, Buhooth1 and Amber al-Baraka) were obtained from Baghdad, Iraq, and their whole genomic DNA was sequenced. Raw sequence reads of 33 domesticated Asian rice accessions were obtained from the Sequence Read Archive (SRA-NCBI). The whole chloroplast genome sequence was assembled, whereas for the nuclear genome only the sequence of 916 concatenated nuclear genes was assembled. Phylogenetic analysis of both the chloroplast and nuclear genomes resolved two main clusters, Indica and Japonica, and five further sub-clusters based on ecotype: *indica*, *aus*, *tropical*-*japonica*, *temperate*-*japonica* and *basmati*. Amber33, Furat, Yasmin and Buhooth1 belonged to the *basmati*, *indica* and *japonica* ecotypes, respectively, with Amber33 placed in the *basmati* group as a sister of cultivars from Pakistan and India. This confirms the traditional story that Amber was transferred by a group of people who had migrated from India and settled in southern Iraq a long time ago.

**Keywords:** rice (*Oryza sativa*); evolutionary relationships; chloroplast genome; nuclear genome; phylogeny

#### **1. Introduction**

Rice is grown in a wide range of environments worldwide; however, most of the world's rice is cultivated and consumed in Asia [1–3]. Iraq has favorable agricultural conditions for rice cultivation, and rice is a staple food for the majority of the Iraqi people [4]. In Iraq, rice grows as a summer crop, and a number of traditional, introduced and improved rice varieties are cultivated in the central and southern regions, as well as in the valleys of northern Iraq [1].

The variety Amber is the most important local Iraqi rice variety and is characterised by high quality in terms of taste (aromatic character) [1]. It has been cultivated in central and southern Iraq, especially in the marshes, for a long time. Anecdotal evidence suggests that Amber was introduced to the marshlands of southern Iraq when water buffalo breeding was introduced to the region by a foreign group from South Asia, probably from the Indian subcontinent. This popular view was reported in a study by Al-Zahery et al. [5], which highlighted the paternal and maternal origin of the human population in the marsh areas and observed marginal influences of Indian origin on the gene pool of an autochthonous population of the region. A number of rice varieties have also been introduced to Iraq since the middle of the last century to improve rice productivity [6]. IR8, the first of these varieties, was introduced in 1968 by the International Rice Research Institute (IRRI) (Philippines); it has high yield potential, but its grain quality is not as high as that of Amber. Since aroma is one of the key traits in determining grain quality in rice [7], Amber became a control variety in the central and southern regions of Iraq to assess the grain quality of introduced varieties [1]. Accordingly, Furat and Yasmin were also introduced from Vietnam to Iraq in the late 20th century because they are aromatic, tolerant to limited water, highly productive, and have high grain quality [4]. An understanding of the origin of local Iraqi rice and of the genetic relationships between Iraqi rice and Asian domesticated rice, which is the aim of the current study, will effectively guide Iraqi rice breeding. However, few studies have investigated Iraqi rice in general, or the origin and evolution of Iraqi varieties, especially Amber, in particular [4,8,9].

Each living organism is the consequence of an evolutionary process [10]; therefore, it is imperative to enrich our understanding of the evolutionary history of organisms and the relationships among them to guide their genetic improvement. Methods of determining evolutionary history (phylogeny) have undergone many stages of development. Morphological markers may be influenced by environmental factors and growth practices. More recent methods have used molecular markers, which are independent of environmental factors [11], including techniques such as RFLP, AFLP, RAPD, SSR and ISSR, along with morphological markers, to study phylogenetic relationships [12]. The development of high-throughput sequencing technology has revolutionised the study of genetics and evolutionary relationships. Most recently, through next-generation sequencing (NGS), whole-genome sequencing and re-sequencing have become available, so the investigation of the entire genome, rather than targeting precise regions, is now a real opportunity [13–15].

Every plant cell has three genomes—nuclear, chloroplast, and mitochondrial—that may differ in evolutionary history. The chloroplast genome is a maternal genome which is highly-conserved and not involved in recombination, therefore, it is the most commonly used tool to determine the origin and the evolutionary relationships among plant species [16–19]. However, sometimes, evolutionary analysis based on the chloroplast genome must be supported by nuclear genome-based analysis to achieve the most reliable results because the chloroplast genome can only represent the maternal evolutionary history with a slow evolutionary rate [20,21]. Phylogenetic analysis using the nuclear genome can deliver inconsistent trees due to recombination that may confuse phylogenetic resolution. However, this analysis provides greater insights into evolutionary relationships. Several studies have strongly suggested applying this analysis along with chloroplast phylogenetic analysis [18,19,22]. Many studies have applied phylogenetic analysis at both genome levels [23–25], and the results of most of these studies showed that the nuclear genome followed a different evolutionary history pattern to that of the chloroplast genome.

We reported the whole chloroplast genome sequences for Iraqi rice and compared them with the whole chloroplast sequences of other domesticated Asian rice varieties. This provided an important tool for estimating genetic distance and determining evolutionary relationships between rice accessions; the nuclear genomes also provided further information on the relationships between the varieties studied. The study aimed to determine the origin and evolution of Iraqi rice, especially Amber33.

#### **2. Results**

#### *2.1. DNA Sequencing and Data Processing*

The sequencing of the five Iraqi varieties (Table 1) generated about 51 Gb of data comprising 337 million 151-bp paired-end reads. The minimum and maximum numbers of reads were about 58 and 93 million, with sequence depth ranging between 23× and 38×, for Buhooth1 and Furat, respectively. When the raw data were trimmed at a quality limit of 0.01, an average of 15% of the read length and 9% of the reads were removed; thus, the number of reads and the data coverage were reduced to between 53 and 86 million and between 18× and 30×, respectively (Table S1). For the downloaded data (Table 2), the average length of raw reads was 83 bp, the number of reads ranged between 43 and 117 million, and the sequence coverage ranged between 10× and 26×. Finally, the number of reads and the data coverage of each of the data sets were assessed after trimming the raw reads at the quality limit of 0.01 (Table S1).
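The relationship between read number, read length and sequencing depth can be checked with a short calculation. This is a sketch under an explicit assumption: the genome size of about 373 Mb (roughly the size of the Nipponbare IRGSP-1.0 assembly) is not stated in the text and is used here only for illustration.

```python
def sequencing_depth(n_reads, read_length_bp, genome_size_bp):
    """Average depth = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp

GENOME_SIZE_BP = 373_000_000  # assumption: approximate rice reference genome size
READ_LENGTH_BP = 151          # raw read length reported for the Iraqi varieties

total_gb = 337_000_000 * READ_LENGTH_BP / 1e9
depth_min = sequencing_depth(58_000_000, READ_LENGTH_BP, GENOME_SIZE_BP)
depth_max = sequencing_depth(93_000_000, READ_LENGTH_BP, GENOME_SIZE_BP)
print(f"{total_gb:.1f} Gb, {depth_min:.1f}x to {depth_max:.1f}x")
```

With the assumed genome size, the rounded read counts give about 51 Gb in total and depths of roughly 23× and 38×, in line with the values reported for the raw data.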


**Table 1.** The Iraqi plant materials used in this study.



32 domesticated Asian rice accessions and one domesticated African rice as an out-group downloaded from SRA-NCBI: their unique ID, species, country of origin, and ecotype were taken from the study of [26]; the alignment names were generated in this study. \* In: *indica* subpopulation, TrpJ: *tropical japonica* subpopulation, TmpJ: *temperate japonica* subpopulation, Aus: *aus* population, Bas: *basmati* population, Ch: China, Indo: Indonesia, Jap: Japan, Pak: Pakistan, Phil: Philippines, Sril: Sri Lanka, Viet: Vietnam.

#### *2.2. Chloroplast Genome Assembly*

Mapping all varieties against the reference, *O. sativa* sub sp. japonica *Nipponbare* "GenBank: GU592207.1", under three different fraction settings identified the most accurate and reliable mapping setting. The number of mismatches and gaps for each variety was virtually stable across all settings (Table S2). This stability indicates that most of these variations arose from actual differences between the sample and reference sequences rather than from the choice of settings; on that basis, setting number two (length fraction (LF) of 1 and similarity fraction (SF) of 0.8) was applied to the subsequent assembly step, the improvement process (Imp). Moreover, three different settings of Word "W" and Bubble "B" size in *de novo* assembly generated a satisfactory number of contigs covering the whole chloroplast genome, with around five large chloroplast contigs of more than 12 kb produced from each setting. Subsequently, the four main regions of the chloroplast genome, large single copy (LSC), inverted repeat A (IR A), small single copy (SSC) and inverted repeat B (IR B), were assembled successfully for all 38 varieties through the *de novo* assembly pipeline. The lengths of these regions were about 80 kb for LSC, 12 kb for SSC and 20 kb for each of IR A and IR B (Figure S1 shows only Iraqi rice varieties). In manual curation, the comparison between the two sub-approaches of the chloroplast genome assembly pipeline showed no significant differences in the number of variations; any minor conflicts were resolved by reference to the reads (Table S3 shows only Iraqi rice varieties). The minimum and maximum lengths of the whole chloroplast genome for all Iraqi varieties and downloaded accessions were 134,259 and 134,556 bp, respectively, while the coverage ranged from 839× to 11,466×, with an average of 3818× (Table 3).


**Table 3.** The results of the chloroplast and nuclear genome assembly.

The table includes the length of the chloroplast genome, the number of bases of mapped reads, and the coverage of the assembled chloroplast genome for five Iraqi varieties, 32 domesticated Asian accessions and one domesticated African rice as an out-group downloaded from SRA-NCBI. This table also shows the length of the nuclear genome.

#### *2.3. Phylogenetic Analysis of the Chloroplast Genome*

Two phylogenetic approaches were used to analyse the multiple alignment of thirty-nine chloroplast genomes, which had a total length of 134,535 bp. Although the results of the two phylogenetic methods showed some minor differences at the tips of some subclades, the composition of the main clades and subclades, which followed their ecotype classifications, was identical (Figure 1). Phylogenetic analysis of the chloroplast genome divided the thirty-nine rice accessions into two main clades, an Indica clade and a Japonica clade. The Indica clade (In) included most individuals under the *indica* (6 accessions) and *aus* (5 accessions) ecotypes, except two individuals, B009 and IRIS\_313-10718. The Japonica clade contained two subclades, a main Japonica clade and a Basmati clade; the first subclade, the main Japonica clade (Jap), included all *japonica* individuals (13 accessions) from the two subpopulations of the *japonica* ecotype, *tropical* and *temperate*, while the second subclade, the Basmati clade (Bas), included all individuals of the *basmati* ecotype (6 accessions) and the individuals excluded from the first clade (Indica). Additionally, the Iraqi varieties were distributed as follows: Furat, Yasmin and Amber al-Baraka fell into the Indica clade, whereas Amber33 and Buhooth1 fell into the Japonica clade. Buhooth1 was closer to accessions of the *tropical japonica* ecotype than to those of the *temperate japonica* ecotype and, interestingly, Amber33 was located within the Basmati subclade.

The multiple alignment of chloroplast genomes comprised 134,535 bp; the number of identical sites was 134,270 (99.8%), while the number of variable bases among all the accessions totaled 265 (0.2%). These 265 variable bases were sorted into 85 variant positions, which were in turn grouped into four types of polymorphism: single nucleotide polymorphisms (SNP), multi-nucleotide polymorphisms (MNP), insertions (Ins) and deletions (Del) (Table 4). The most abundant polymorphism type among all accessions was the SNP. Of the 85 polymorphisms, 83%, 12% and 5% were located in the LSC, SSC and IR (A and B) regions of the chloroplast genome, respectively (Table S4).
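The split of alignment columns into identical and variable sites can be illustrated as follows. This is a minimal sketch on toy sequences, not the Geneious procedure used in the study; treating a gap character as its own state (so that any column containing a gap alongside bases counts as variable) is an assumption of this sketch.

```python
from typing import List, Tuple


def count_sites(alignment: List[str]) -> Tuple[int, int]:
    """Return (identical, variable) column counts for aligned sequences.

    Gaps ('-') are treated as an ordinary character state (an assumption).
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # A column is identical when every sequence carries the same character.
    identical = sum(1 for column in zip(*alignment) if len(set(column)) == 1)
    return identical, length - identical


def percent(part: int, whole: int) -> float:
    """Share of 'part' in 'whole', as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    toy = ["ACGT-A", "ACGTTA", "ACCTTA"]  # toy data, not from the study
    print(count_sites(toy))
    # Counts reported in the text: 134,270 identical and 265 variable sites.
    print(f"{percent(134_270, 134_535):.1f}% identical, "
          f"{percent(265, 134_535):.1f}% variable")
```

Applied to the reported counts, the percentages round to 99.8% and 0.2%, in line with the values given above.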

**Figure 1.** Phylogenetic relationships among chloroplast genomes of thirty-nine rice accessions. Tree topology based on MrBayes software (branch labels represent probability percentage).

Considering the variations identified, all thirty-nine rice accessions were sorted into three main groups: (1) Indica, (2) Japonica and (3) Basmati. As expected, the highest number of polymorphisms among the accessions studied (255 bases in 76 variant positions) was found in the Indica group (11 accessions and 3 Iraqi varieties); within these 76 variants, there was only one variation (a 1-bp deletion at position 75,990 bp) between *indica* and *aus* accessions. The second largest number of variations (55 bases within 21 variant positions) was found in the Basmati group (8 accessions and one Iraqi variety). Part of the Basmati group (4 accessions) showed unique polymorphisms (2 variable bases (SNPs) within 2 variant positions); three of these accessions were from Pakistan (IRIS\_313–8656, IRIS\_313–11026, and IRIS\_313–11021) and one was from Iran (CX104). As expected, the Japonica group, 13 accessions along with the reference (*O. sativa* sub sp. japonica Nipponbare "GenBank: GU592207.1") and one Iraqi variety, possessed the lowest number of polymorphisms (13 bases within 10 variant positions) (Table S4). Most of the polymorphisms in the Japonica group belonged to only four accessions from the *tropical japonica* (TrpJ) subpopulation: CX352, IRIS\_313–10073, CX243, and IRIS\_313–11248.

Furthermore, a heat-map was drawn according to the number of variable bases (Table S5); in this map, the two main clusters, Indica and Japonica, were clearly distinguished, whereas the Basmati group was nested within the Japonica group. Within the Japonica group two individuals, 24:IRIS\_313–11479 and 27:IRIS\_313–11248, clearly showed the greatest distances among the rice accessions. This cluster surprisingly also included two individuals, 33:IRIS\_313–10718 and 34:B009, from the *aus* and *indica* ecotypes, respectively. There were no variable bases between a number of pairs (dark red in Table S5) such as (3:IRIS\_313–11152 and 4:IRIS\_313–9505), (9:CX126 and 11:CX37), (9:CX126 and 13:CX25), (17:IRIS\_313–10073 and 18:CX243), (19:Ref-GU592207.1 and 20:IRIS\_313–11153), (20:IRIS\_313–11153 and 21:IRIS\_313–10373), and (30:IRIS\_313–8656, 31:IRIS\_313–11026 and 37:CX104), whereas the highest number of variable bases, 260 bases, was found between 14:CX227 and 24:IRIS\_313–11479 (dark green in Table S5). The smallest numbers of variable bases between the Iraqi varieties Amber33, Furat, Yasmin, Buhooth1, and Amber al-Baraka and the other domesticated rice accessions were 1, 3, 1, 6 and 4 bases, found with accessions 28:IRIS\_313-10670, 12:CX10, 3:IRIS\_313–11152, 20:IRIS\_313–11153 and 5:IRIS\_313–10549, respectively (Table S5).

#### *2.4. Phylogenetic Analysis of the Nuclear Genome*

Within a group of thirty-nine rice accessions, the multiple alignment of 916 concatenated nuclear genes was 621,012 bp in length; the minimum and maximum lengths were 616,099 and 616,393 bp, respectively (Table 3). The nuclear phylogenies using two different methods showed that the two main clusters, Indica and Japonica, and further five sub-clusters were based upon their ecotype, *indica*, *aus*, *tropical japonica*, *temperate japonica* and *basmati* (Figure 2). Unlike the results of the chloroplast phylogeny, the accessions of *indica*, and *aus* ecotypes were represented by two well-resolved subclades within the Indica clade. The Iraqi varieties, Furat and Yasmin, were found in the *indica* subclade while the rest of the Iraqi collection was grouped in the Japonica clade, where Amber33 acted as a sister to all the *basmati* varieties within the *basmati* subcluster, which included all accessions with the *basmati* ecotype. Buhooth1 was part of the *temperate japonica* subcluster that comprised accessions from the *temperate japonica* ecotype and the reference, *O.s japonica* cv. Nipponbare. Amber al-Baraka was a sister to both the Indica and Japonica clades; however, Geneious Tree Builder showed that it was close to the Indica clade, while MrBayes suggested that Amber al-Baraka was closer to the Japonica clade.

**Figure 2.** Evolutionary relationships among the multiple alignment of 916 concatenated nuclear genes of domesticated rice. Tree topology based on MrBayes software (branch labels represent probability percentage).

#### **3. Discussion**

Rice phylogeny has been extensively studied, as a better understanding of the evolutionary relationships among rice species is critical for rice breeding programmes as well as comparative genomics studies. Recent advances in next-generation DNA sequencing (NGS) have improved the phylogenetic reconstruction of plant species including *Oryza*. In this study, both plastid and nuclear genomes were assembled using NGS reads (whole genome DNA sequencing) to identify the phylogenetic relationships among Iraqi rice varieties and other accessions. According to Sims et al. [27], the accuracy of a genome assembly using NGS reads depends on many factors including sequencing depth (coverage) and the accuracy of the assembly pipeline. In this study, even after trimming, the sequence coverage of the sequenced and downloaded accessions (Table S1) was sufficient to cover all of the chloroplast genome and most of the nuclear genome, supporting a high-quality assembly.

A dual pipeline was applied to the assembly of the chloroplast genome in this study; this pipeline consisted of two procedures, mapping assembly (MA), and *de novo* assembly (*d*A). The comparison between the sequence of *de novo* and mapping consensus showed no significant differences in terms of the number of variations. Interestingly, the variety Yasmin showed no difference in both approaches with regard to a number of variations (Table S3), but the length of the consensuses was different; this observation indicates that even when the number of copies of an insertion or deletion was similar, the number of bases that were inserted or deleted was diverse. Therefore, in agreement with an earlier study [28], a manual-curation step was critical in resolving any conflicts by reference to the reads. A pipeline of nuclear genes assembly was also developed in this study. This pipeline involved multiple tools on the CLC Genomics Workbench, unlike a previous study [25] that used different software packages to assemble the nuclear genes for phylogenetic analysis at the nuclear genome level. The number of genes selected to represent the nuclear genome in the phylogenetic analysis was only 916 genes with a length of 621,012 bp, considerably lower than that reported previously [25].

Phylogenetic analysis of the chloroplast genome sorted the thirty-nine rice accessions into two main clades, an Indica clade and a Japonica clade (Figure 1). The Indica clade (In) included most individuals under the *indica* and the *aus* ecotypes except for two accessions. Accessions from the *indica* and *aus* ecotypes were not clearly distinct but were placed together in one clade; this was confirmed by the results of genetic polymorphism analysis that showed only one variation (a 1-bp deletion at position 75,990 bp; Table S4) between the *indica* and *aus* accessions. The Japonica clade (Jap) contained two subclades, the main Japonica clade (Jap), which included all individuals from the two *japonica* subpopulations, and the Basmati clade (Bas), which included all basmati accessions as well as the individuals excluded from the first clade (Indica). Moreover, the presence of accessions from the *aus* ecotype in the Indica clade as well as in the Basmati subclade within the Japonica clade agrees with earlier outcomes [18], which indicated that the two different ecotypes, *indica* and *japonica*, might be involved in the origins of the maternal genome in two Korean *aus* landrace rices. This also agrees with the conclusion of Civáň et al. [29], which suggested that *aromatic* rice resulted from a hybridization between *japonica* and *aus*. Analysis of genetic polymorphisms at the chloroplast genome level revealed that the most abundant variation type was the SNP, accounting for 57% of the 85 variants (Table 4). This analysis also showed 255 nucleotide differences within 76 variant positions between *O. sativa* ssp. *indica* and the *japonica* reference (GU592207), in agreement with the previous studies of Brozynska et al. [22] and Wambugu et al. [28].
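The four polymorphism types summarised in Table 4 can be illustrated with a simple classifier. This is a minimal sketch based on common conventions (allele lengths relative to the reference); the exact rules applied by the "variant/SNP detection" tool are not described in the text and may differ, and the example alleles are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant as SNP, MNP, Ins or Del from its two alleles.

    Gap characters ('-') mark absent bases and are ignored when measuring
    allele length (a convention assumed for this sketch).
    """
    ref = ref.replace("-", "")
    alt = alt.replace("-", "")
    if len(ref) == len(alt):
        if ref == alt:
            raise ValueError("alleles are identical; not a variant")
        return "SNP" if len(ref) == 1 else "MNP"
    return "Ins" if len(alt) > len(ref) else "Del"


def summarise(variants: Iterable[Tuple[str, str]]) -> Counter:
    """Count variants per type for a list of (ref, alt) allele pairs."""
    return Counter(classify_variant(ref, alt) for ref, alt in variants)


if __name__ == "__main__":
    # Hypothetical alleles for illustration only.
    example = [("A", "G"), ("AC", "GT"), ("-", "T"), ("T", "-")]
    print(summarise(example))
```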


**Table 4.** Summary of the number and types of variants in the chloroplast-genomes of thirty-nine domesticated rice-accessions.

At the nuclear genome level, the phylogenetic analysis using two different approaches sorted accessions from the *indica* and *aus* ecotypes into two completely independent subclades within the Indica clade, unlike the result of the chloroplast phylogeny, whilst the second clade was a Japonica clade which included three sub-clusters, *tropical japonica*, *temperate japonica*, and *basmati* (Figure 2). Accordingly, the findings of the evolutionary relationship based on nuclear and chloroplast data in the current study aligned with an earlier study by Garris et al. [30], which reported that the closest evolutionary relationships were between the *indica* and *aus* groups, and among the *tropical japonica*, *temperate japonica*, and *aromatic* groups. In general, in the present study, the phylogenetic analysis at both genome levels, chloroplast and nuclear, showed relatively comparable evolutionary history patterns with insignificant differences at the tips of clades, unlike other studies that recorded significant differences in evolutionary history pattern between chloroplast and nuclear genomes (regardless of plant materials) [23–25,29]. Furthermore, the phylogenetic trees of both genomes, chloroplast and nuclear, constructed using different methodologies, were highly compatible. However, the placement of Amber al-Baraka at the nuclear genome level differed slightly according to the method used: Geneious Tree Builder placed it close to the Indica clade, whereas MrBayes placed it closer to the Japonica clade and distant from the Indica clade. This was unexpected and requires further investigation.

The phylogenetic analysis of both the chloroplast and nuclear genomes indicated that Amber33, Furat and Yasmin, and Buhooth1 belonged to the *basmati*, *indica* and *japonica* ecotypes, respectively. Our results support the view that Buhooth1 is an improved cultivar: its nuclear phylogeny placed it with the *temperate japonica* subpopulation, whereas its chloroplast phylogeny placed it with the *tropical japonica* subpopulation. Furat and Yasmin were introduced to Iraq from Vietnam [4]; this was evident from the results of the phylogenetic analysis of the nuclear genome, but their chloroplast genomes were closely related to accessions from China, India and the Philippines. This may be explained by the breeding history of these genotypes.

In this study, Amber33, which is a local Iraqi variety, was placed in the *basmati* ecotype group as a sister of cultivars from Pakistan and India by analysing the evolutionary relationship at both levels of the genome. Based on distance analysis, the number of differences in the chloroplast genome between Amber33 and all accessions within the Basmati subclade was in the following order: 28:IRIS\_313–10670 (1 bp), 35:CX59 (1 bp), 30:IRIS\_313–8656 (3 bp), 31:IRIS\_313–11026 (3 bp), 37:CX104 (3 bp), 33:IRIS\_313–10718 (3 bp), 32:IRIS\_313–11021 (5 bp), 34:B009 (6 bp) (Table S5); it can accordingly be concluded that Amber33 is closely related to accessions from India, which is visibly reflected in the observed phylogenetic tree (Figure 1). This is consistent with the popular tradition that the Amber variety was brought by a group of people who had migrated from India (the Southeast) and settled in southern Iraq a long time ago.

Recently, the term Basmati has been used to indicate a long-grain and high-quality rice, but this name originally refers to *aromatic* rice because it was derived from the Sanskrit words "Vas" and "Matup" which stand for "aroma" and "ingrained from the beginning", respectively, and then both words were combined making 'Vasmati' which changed to become 'Basmati' later on [31,32]. Therefore, the presence of Amber33 within the Basmati subcluster does not necessarily mean that it is a long grain cultivar; indeed, it is an aromatic medium-grain cultivar. Furthermore, Basmati is a group that can be described basically as the fifth isozyme group identified by Glaszmann [33], and it is closer to the *japonica* group than the *indica* [7,30,34]; this group is also phenotypically diverse as it includes both long or medium grain, and aromatic or nonaromatic varieties [7]. In many studies, this group is also known as the "*aromatic*" subpopulation [7,33], but most of the time it is known as "Group V" to avoid confusion. In this study, we refer to this group as "Basmati" according to Wang et al.'s study [26] which is the information resource of the downloaded accessions.

#### **4. Conclusions**

In the present study, we have assembled the whole chloroplast genome and the nuclear genome of the five Iraqi rice varieties, together with thirty-three domesticated rice accessions, to find the origin of Iraqi varieties, especially Amber33, and to gain insight into the evolutionary relations between Iraqi and domesticated Asian rice varieties. Our results suggest the possibility of an Indian and/or Pakistani origin for Amber33; to evaluate this hypothesis, further historical biogeographical analyses are required. Moreover, further studies on the chloroplast and nuclear genomes of Iraqi rice varieties are required to determine the functional genome annotations that might be useful for future rice breeding programmes in Iraq.

#### **5. Materials and Methods**

#### *5.1. Plant Materials*

A total of five varieties of *Oryza sativa* were provided and tested by the Office of Agricultural Research and the Directorate of Seed Testing and Certification, Ministry of Agriculture, Baghdad, Iraq, respectively. Among these varieties, one, Amber33, is local and is one of the most highly valued varieties in Iraq because of its fragrance; two, Furat and Yasmin, were introduced from Vietnam but are successfully cultivated in the central and southern regions of Iraq; and the other two, Buhooth1 and Amber al-Baraka, are improved varieties [4]. The plant materials used in this study are described in detail in Table 1.

#### *5.2. Seed Germination and Growth*

About 15 seeds of each variety, a total of 75 seeds, were first dehusked and then placed in a container with plenty of liquid fertilizer, Flowfeed EX7, diluted to half concentration (full concentration is 0.5 g/L) to break dormancy; this was the non-heat treatment method. Once the radicle emerged, the germinated seeds were transferred to a petri dish covered with a layer of tissue saturated with liquid fertilizer, and planted within three days. All the germination and planting processes were carried out under strict quarantine conditions in quarantine facilities.

#### *5.3. DNA Extraction and Sequencing*

After harvesting leaf tissues, total genomic DNA was extracted individually using the modified CTAB protocol described by Furtado [35] with slight modifications. These modifications can be summarised as follows: the mixture of ground plant tissue and nuclear extraction buffer was incubated at 65 °C for 60 min with periodic mixing by inverting the tubes every 5 min, and the speed and time of centrifugation were increased to 4000× *g* and 7 min, respectively, after the steps of protein denaturation and DNA precipitation. However, the most vital modification in the DNA extraction procedure was the exclusion of the mixture of phenol:chloroform:isoamyl alcohol (25:24:1). The quality of DNA was assessed by NanoDrop™ 8000 Spectrophotometers (Thermo Scientific, http://www.nanodrop.com) while the DNA quantity was estimated by agarose gel electrophoresis (1%, 120 V for 1 h) based on Furtado's study [35].

The whole genomic DNA of Iraqi rice varieties was sequenced by preparing and indexing five PCR-free libraries separately (one library for each variety), then pooling them together and sequencing over a half lane of an Illumina HiSeq 4000 flow-cell at MACROGEN (Seoul, Korea; http://dna.macrogen.com).

#### *5.4. Data Downloaded for Sequence Comparisons*

Raw sequence reads of 33 domesticated rice accessions were sourced from the Sequence Read Archive (SRA)-NCBI website (https://www.ncbi.nlm.nih.gov/sra) using "Download/Search for Reads in SRA" tool on CLC Genomics Workbench version 11.0.1 (CLC Bio, a QIAGEN Company, Aarhus, Denmark; www.clcbio.com). All of the species, except one, were Asian rice (*O. sativa*) relatives. *O. glaberrima*, an African rice, was included as an out-group. All related information such as the sample unique ID, project accession, species, country of origin, and ecotype was obtained from an earlier study [26], as shown in Table 2.

#### *5.5. Data Processing*

The raw reads of both sequenced and downloaded data were subjected to quality control (QC) analysis using the "Create Sequencing QC Report" tool in the CLC Genomics Workbench, which was used to verify the integrity of the data and determine the appropriate trimming score. The low-quality reads were trimmed at a quality limit of 0.01 and a minimum PHRED score of 25 using the "Trim Sequences" tool in the CLC Genomics Workbench.
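To illustrate what trimming at a quality limit of 0.01 can mean in practice, the sketch below converts PHRED scores to error probabilities and keeps the stretch of a read that maximises the running sum of (limit minus error probability). This follows a modified-Mott style of trimming, which we assume here resembles the workbench's quality trimming; it is a simplified illustration, not the tool's actual implementation, and it does not model the separate minimum PHRED score setting.

```python
from typing import List, Tuple


def error_probability(phred: int) -> float:
    """Convert a PHRED quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def trim_by_quality(phred_scores: List[int],
                    limit: float = 0.01) -> Tuple[int, int]:
    """Return (start, end) of the retained region as a half-open interval.

    Each base contributes (limit - error probability) to a running sum that
    is reset whenever it drops to zero or below; the retained region is the
    stretch ending where the running sum reaches its maximum.
    """
    best_sum = 0.0
    best_start, best_end = 0, 0
    running = 0.0
    start = 0
    for i, q in enumerate(phred_scores):
        running += limit - error_probability(q)
        if running <= 0:
            # Low-quality stretch: discard everything up to and including i.
            running = 0.0
            start = i + 1
        elif running > best_sum:
            best_sum = running
            best_start, best_end = start, i + 1
    return best_start, best_end


if __name__ == "__main__":
    # Toy quality string: poor ends, good middle (not real data).
    scores = [2, 2, 30, 30, 30, 30, 2, 2]
    start, end = trim_by_quality(scores)
    print(f"keep bases {start}..{end - 1}")
```

A read with no base better than the limit yields an empty interval (0, 0) and would be discarded.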

#### *5.6. Chloroplast Genome Assembly*

The chloroplast genome of each domesticated rice accession was assembled and validated using a dual pipeline approach: (1) mapping assembly (MA), and (2) *de novo* assembly (*d*A) [36]. In the mapping assembly (MA) pipeline, the trimmed reads were mapped against the reference, *O. sativa* sub sp. japonica Nipponbare "GenBank: GU592207.1", using the "Map reads to reference" tool at three different settings of length fraction and similarity fraction: (1) 0.8 and 0.8, (2) 1 and 0.8, and (3) 1 and 0.9; this step was named "R". Additionally, in an attempt to improve the Cp map, two tools, "InDels and Structural Variants" and "Local Realignment", were applied; this step was named "S". All the analyses of mapping assembly were performed on the CLC Genomics Workbench 11.0.1.
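The effect of the three length-fraction and similarity-fraction settings can be sketched as a simple acceptance test on a read alignment. The interpretation used here (length fraction as the minimum share of the read that must align, similarity fraction as the minimum identity within the aligned part) is our reading of these parameters; the `ReadAlignment` class and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadAlignment:
    """Hypothetical summary of one read aligned to the reference."""
    read_length: int      # total read length in bases
    aligned_length: int   # number of read bases included in the alignment
    matches: int          # aligned bases identical to the reference


# The three settings from the text: (length fraction, similarity fraction).
SETTINGS = {1: (0.8, 0.8), 2: (1.0, 0.8), 3: (1.0, 0.9)}


def passes(aln: ReadAlignment, length_fraction: float,
           similarity_fraction: float) -> bool:
    """True if the alignment satisfies both fraction thresholds."""
    if aln.aligned_length == 0:
        return False
    long_enough = aln.aligned_length >= length_fraction * aln.read_length
    similar_enough = aln.matches >= similarity_fraction * aln.aligned_length
    return long_enough and similar_enough


if __name__ == "__main__":
    partial = ReadAlignment(read_length=151, aligned_length=130, matches=125)
    for number, (lf, sf) in SETTINGS.items():
        print(f"setting {number}: {passes(partial, lf, sf)}")
```

Under this reading, settings two and three require the whole read to align, so partially aligned reads are accepted only under setting one.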

In the *de novo* assembly pipeline, the Fast "F" model was used with combinations of Word "W" and Bubble "B" settings. Contigs generated by *de novo* were blasted against the Cp reference *O. sativa* sub sp. japonica Nipponbare "GenBank: GU592207.1" to select the Cp-exclusive contigs, and they were then updated using the "Map Reads to Contigs" tool on the CLC Genomics Workbench 11.0.1. Lastly, the updated contigs were aligned to a reference sequence to recognise overlaps and gaps using Clone Manager Professional 9.0 (www.scied.com). When non-overlapping contigs were produced, supplemental *de novo* assembly was conducted at various W-and B-settings to plug all gaps by creating additional contigs, and then all the overlapping contigs were subjected to the further analysis.

An additional improvement process was performed on both the mapping and *de novo* assembly pipelines. The improvement (Imp) process was similar to the mapping assembly (MA) pipeline, repeated twice, Imp-1 and Imp-2, with one difference, the consensus generated from each process would be a reference for the following process. The sequences of both improved Cp consensus generated by the mapping and *de novo* improvement processes were compared to identify all mismatches and then were manually corrected by reference to the reads; this step was named "manual-curation" (Figure S2). Eventually, the Cp sequence of each variety was ready for the phylogenetic analysis.

#### *5.7. Phylogenetic Analysis*

The consensus chloroplast sequences of the Iraqi rice and the other domesticated rice accessions were used to perform a phylogenetic analysis using Geneious software version 9.1.8 (https://www.geneious.com). The multiple alignment was conducted using the MAFFT Alignment plugin [37] with default parameters; subsequently, to analyse evolutionary relationships, phylogenetic trees were constructed with two methods that root the tree using the outgroup method: MrBayes [38] and Geneious Tree Builder. The distance between the chloroplast genomes of Iraqi and comparative rice was determined by detecting all the variants using the "variant/SNP detection" tool in Geneious and then counting the differences (the number of non-identical bases), which is one of the outputs of the phylogenetic tree construction process.
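The distance measure described above, the number of non-identical bases between two aligned sequences, can be sketched as follows. This is an illustrative reimplementation on toy data rather than the Geneious output; counting a gap against a base as one difference is an assumption, and the sequence names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def distance_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise difference counts for every pair of aligned sequences."""
    return {(x, y): count_differences(aligned[x], aligned[y])
            for x, y in combinations(aligned, 2)}


def closest(name: str, aligned: Dict[str, str]) -> Tuple[str, int]:
    """Accession with the fewest differences from the named sequence."""
    others = ((other, count_differences(aligned[name], seq))
              for other, seq in aligned.items() if other != name)
    return min(others, key=lambda pair: pair[1])


if __name__ == "__main__":
    toy = {"var_A": "ACGT", "acc_B": "ACGA", "acc_C": "TCGA"}  # toy data
    print(distance_matrix(toy))
    print(closest("var_A", toy))
```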

#### *5.8. Phylogenetic Analysis of the Nuclear Genome*

An evolutionary relationship analysis at the level of the nuclear genome was undertaken using the CLC Genomics Workbench 11.0.1 and Geneious software version 9.1.8; this analysis started with the nuclear genome assembly (NGA) pipeline (Figure S3). In the NGA pipeline, the "Map Reads to Reference" tool was used to map the trimmed reads of the Iraqi rice (Table 1), and the domesticated rice accessions from Asia and Africa (Table 2) against the reference, which is *O. sativa* sub-spp. Japonica cv Nipponbare "GenBank: IRGSP1.0", applying the following setting: length-fraction of 1 and similarity-fraction of 0.8. After mapping, the consensus sequence of the whole genome for each variety was extracted using the "Extract Consensus Sequence" tool, and from that, the genome and coding sequence (CDS) tracks were generated by the "Convert to Tracks" tool. By investigating the CDS tracks for all varieties, a subset of 916 genes was identified in all varieties, and then the nucleotide sequences of the 916 CDS were separately extracted from the genomes using the "Extract Annotations" tool. At the final stage of the nuclear genome assembly (NGA), all the nucleotide sequences of the 916 CDS selected from each genome were concatenated into a super-matrix of 621,012 bp by the "Join Sequences" tool. The super-matrices of all varieties were then aligned using MAFFT [37] in Geneious with default parameters; the alignment output was used in the following phylogenetic inference. Phylogenetic trees were constructed with MrBayes [38] and Geneious Tree Builder (https://www.geneious.com) and rooted using the outgroup method; the default tree search settings were applied for both methods.
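The concatenation step of the NGA pipeline, in which the CDS shared by all varieties are joined into one super-matrix per variety, can be sketched as follows. This is a minimal illustration rather than the "Join Sequences" tool itself; the gene identifiers and sequences are hypothetical, and joining genes in sorted identifier order is an assumption made so that every variety uses the same gene order.

```python
from typing import Dict, List, Tuple


def build_supermatrix(
    cds_by_variety: Dict[str, Dict[str, str]]
) -> Tuple[List[str], Dict[str, str]]:
    """Concatenate the CDS shared by all varieties, in one common order.

    Returns the ordered list of shared gene identifiers and, for each
    variety, the concatenated sequence of those genes.
    """
    if not cds_by_variety:
        raise ValueError("no varieties supplied")
    # Keep only the genes present in every variety.
    shared = set.intersection(*(set(genes) for genes in cds_by_variety.values()))
    order = sorted(shared)
    matrices = {variety: "".join(genes[gene] for gene in order)
                for variety, genes in cds_by_variety.items()}
    return order, matrices


if __name__ == "__main__":
    # Hypothetical gene identifiers and sequences for illustration only.
    data = {
        "variety_1": {"gene_a": "ATG", "gene_b": "CCC", "gene_c": "TTT"},
        "variety_2": {"gene_a": "ATG", "gene_b": "CCA"},
    }
    order, matrices = build_supermatrix(data)
    print(order)
    print(matrices)
```

Using a single shared gene order keeps homologous positions in register across varieties before the concatenated sequences are aligned.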

*Plants* **2019**, *8*, 481

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2223-7747/8/11/481/s1, Figure S1: The results of *de novo* Assembly on Clone Manager Professional 9.0 software; Figure S2: Illustration of the Chloroplast Genome Assembly Pipeline; Figure S3: Illustration of the Nuclear Genome Assembly Pipeline. Table S1: Summary of the output of sequencing and downloading (Raw Data) and trimming processes; Table S2: Summary of Mapping Assembly process using three different setting of Length fraction and Similarity Fraction; Table S3: Comparison between Mapping and *de novo* assembly in the number of variations in the chloroplast-genome; Table S4: Details of the polymorphisms identified in aligned chloroplast-genomes using the "variant/SNP detection" tool; Table S5: Distance matrix corresponding to the number of non-identical bases in the sequences of domesticated-rice chloroplast-genomes.

**Author Contributions:** Conceptualization, R.H. and H.B.; Methodology, H.B. and A.F.; Software, A.F. and H.B.; Validation, H.B.; Formal Analysis, H.B.; Investigation, H.B.; Resources, H.B.; Data Curation, H.B.; Writing-Original Draft Preparation, H.B.; Writing-Review & Editing, R.H., A.F. and H.B.; Visualization, H.B.; Supervision, R.H. and A.F.; Project Administration, A.F.; Funding Acquisition, H.B.

**Funding:** A PhD scholarship was provided by HCED Iraq program.

**Acknowledgments:** We would like to acknowledge the HCED Iraq program for providing PhD scholarship and sincerely thank the Office of Agricultural Research and Directorate of Seed Testing and Certification (Ministry of Agriculture, Baghdad, IRAQ) for providing us with the seeds of five Iraqi varieties for phylogenetic analysis. We acknowledge the University of Queensland Research Computing Centre (UQ-RCC) for providing all the computing resources.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Data Availability Statement:** All raw NGS sequence data were submitted to the NCBI Sequence Read Archive (SRA) and are available as SRA Submission# SUB6410326 under BioProject# PRJNA576935; BioSample# SAMN13014963, SAMN13014964, SAMN13014965, SAMN13014966, and SAMN13014967 represent Amber33, Furat, Yasmin, Buhooth1, and Amber al-Baraka, respectively.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Advances in Molecular Genetics and Genomics of African Rice (***Oryza glaberrima* **Steud)**

## **Peterson W. Wambugu 1, Marie-Noelle Ndjiondjop <sup>2</sup> and Robert Henry 3,\***


Received: 23 August 2019; Accepted: 25 September 2019; Published: 26 September 2019

**Abstract:** African rice (*Oryza glaberrima*) has a pool of genes for resistance to diverse biotic and abiotic stresses, making it an important genetic resource for rice improvement. African rice has potential for breeding for climate resilience and adapting rice cultivation to climate change. Over the last decade, there have been tremendous technological and analytical advances in genomics that have dramatically altered the landscape of rice research. Here we review the remarkable advances in knowledge that have been witnessed in the last few years in the area of genetics and genomics of African rice. Advances in cheap DNA sequencing technologies have fuelled development of numerous genomic and transcriptomic resources. Genomics has been pivotal in elucidating the genetic architecture of important traits thereby providing a basis for unlocking important trait variation. Whole genome re-sequencing studies have provided great insights on the domestication process, though key studies continue giving conflicting conclusions and theories. However, the genomic resources of African rice appear to be under-utilized as there seems to be little evidence that these vast resources are being productively exploited for example in practical rice improvement programmes. Challenges in deploying African rice genetic resources in rice improvement and the genomics efforts made in addressing them are highlighted.

**Keywords:** African rice; climate change; genomic resources; genetic potential; genome sequencing; domestication; transcriptome and chloroplast

#### **1. Background**

African rice (*Oryza glaberrima* Steud) is one of the two rice species that have undergone independent domestication, the other one being Asian rice (*Oryza sativa*). African rice was domesticated about 3500 years ago from its putative progenitor, *Oryza barthii*. These two cultivated species play a vital role in enhancing food security in sub-Saharan Africa where the popularity of rice as a staple food is rising rapidly [1]. Despite this growing popularity of rice, the region is yet to attain self-sufficiency in rice production [1]. Achieving self-sufficiency will require significant yield increases that almost completely close the existing gap between current and potential yields [2]. Climate change is, however, predicted to be a major threat that is likely to hamper the attainment of these enhanced yields in sub-Saharan Africa [3]. Being part of the *Oryza* primary gene pool and with its wide adaptive potential, African rice presents an important genetic resource that can support the breeding of high yielding climate resilient rice genotypes. Though its production is limited to only a few rice-growing agro-ecologies in West Africa, African rice is of global importance as it is a source of readily available genetic diversity for rice improvement.

Genomic science presents novel tools for exploiting the genetic potential of African rice for accelerated rice productivity. Over the last decade, there have been tremendous technological advances, especially in DNA sequencing, which have provided various genomic and genetic tools that have been pivotal in dramatically expanding the frontiers of crop research. Some of these advances include changes in sequencing instruments, chemistry, read length, throughput and bioinformatic tools. In African rice, some of the tools and resources that have been provided by these advances include complete genome reference sequences [4], novel mapping populations [5,6], bacterial artificial chromosome libraries [7] and numerous high throughput molecular markers [4,8]. Other advances include various analytical and bioinformatics tools, resources and platforms. This paper reviews some of the remarkable advances in knowledge that have been witnessed in the last few years in the area of genetics and genomics of this African indigenous *Oryza* species. Challenges in exploiting the immense genetic potential of African rice are highlighted.

#### **2. Genetic Potential and Capacity for Climate Change Adaptation**

Literature suggests that African rice possesses important traits that impart great adaptability to various biotic and abiotic stresses as well as climate change adaptation. The superior drought and thermal tolerance capacity of African rice has been reported [9]. This African indigenous rice species may have developed these traits as an adaptive mechanism against the harsh Sahelo-Saharan climate, which is largely characterized by arid conditions. This drought tolerance is achieved through a series of morphological, phenological and physiological responses. Bimpong et al. [10] reported that, compared to Asian rice, some accessions of African rice have the capacity to retain more transpirable water when faced with drought stress. These authors suggested that these accessions have the capacity to close stomata early enough during periods of drought as a biological survival strategy that ensures effective use of available water. Some varieties of African rice are early maturing and are therefore able to escape terminal drought [11]. African rice has been found to have thin leaves that roll easily during drought, thus reducing transpiration, and thin roots with a high soil penetrative capacity that helps in extracting water from the soil [12]. Its leaf and root architecture traits therefore play an important role in enhancing drought tolerance. In a study conducted by Bimpong et al. [13], alien introgression lines derived from a cross between *O. glaberrima* and *O. sativa* had higher yields under drought conditions than the *O. sativa* parent. This demonstrates the potential of transferring drought related traits from *O. glaberrima* to *O. sativa*. About half of the beneficial alleles in the novel drought related quantitative trait loci (QTLs) identified in this study were derived from African rice. In a related study, Shaibu et al. [14] evaluated a total of about 2000 accessions of African rice for drought tolerance and found that some accessions had higher yields under drought conditions than the CG14 *O. glaberrima* drought tolerant check. Though the African rice genotypes were not significantly different from those of the *O. sativa* checks, they provide an important genetic resource for widening the gene pool that can be used to breed for drought tolerance.


Natural variation that imparts greater thermal tolerance and adaptation to heat stress compared to *O. sativa* has been identified [9]. This adaptation is particularly important as recent modelling studies have reported potential massive rice yield declines in West Africa's Sahel region due to high temperature-induced reduction in photosynthesis [3]. African rice therefore possesses valuable genetic diversity for breeding for heat stress tolerance in the face of the ever-growing problem of climate change and variability. African rice has also been found to be more tolerant to phosphorus deficiency than Asian rice [36]. Climate change is predicted to lead to increased soil salinity, especially in low lying coastal areas, and it is expected that this will cause a significant decline in rice yields [37]. Farmers in West Africa where salinity is high have reported that the key strategy they use in mitigating salinity is planting of tolerant African rice varieties [8]. Owing to its high salt tolerance, African rice seems to be an important source of genes for breeding for salinity tolerance. Various types of predictions, such as climate modelling, show that of all regions, sub-Saharan Africa will be worst hit by climate change [38]. Incidentally, this region has low technological, financial and infrastructural climate change adaptive capacity. It is critically important that rice breeders in sub-Saharan Africa lay concrete strategies for exploiting these important African rice traits for climate change adaptation. Effective deployment of these adapted genetic resources will enhance the resilience and sustainability of rice production systems. In order to leverage the power of genomic tools in taking advantage of these traits, there is a need to decipher the loci associated with this adaptive potential or phenotype.
Table 1 summarizes the genetic potential of African rice in terms of resistance to a wide range of biotic and abiotic stresses among them drought, soil acidity, iron and aluminium toxicity and weed competitiveness [39].

#### **3. Genetic and Molecular Basis of Important Traits**

Genomic research holds the key to greater understanding and unlocking of the genetic potential of both wild and domesticated species. In order to leverage the potential of African rice in rice improvement programmes, there is a need for a sound understanding of the molecular underpinning of the functionally important variation. The lack of knowledge on the molecular and genetic basis of important traits acts as a major impediment in the deployment of African rice genetic resources in rice improvement. Aided by the increased availability of genomic resources and other remarkable advances in genomics and molecular genetics, the last couple of years have seen concerted efforts in linking genotypes and phenotypes. These have led to the discovery of more loci or causal mutations associated with various traits, particularly tolerance to various biotic and abiotic stresses. African rice has superior tolerance to a broad array of nutrient deficiencies and toxicities which are prevalent in most soils. In addition to previously detected QTLs for resistance to iron toxicity, which seem stable across genetic backgrounds and environments, seven novel ones were identified [40]. Phosphorus deficiency is a major constraint in rice production, particularly in sub-Saharan Africa. A novel allele that is associated with enhanced uptake of phosphorus has been identified in the *OsPSTOL1* (*P-Starvation tolerance*) gene, which is a major gene controlling the uptake of phosphorus. Candidate genomic regions that are associated with high concentrations of minerals, among them key micronutrients, have been identified [41]. A genome wide association study based on whole genome resequencing identified genomic regions controlling tolerance to salinity and geographic differentiation, with a total of 28 single nucleotide polymorphisms (SNPs) associated with various salt tolerance traits being identified [8].
This genetic resource of SNP markers is vital for plant breeding and adapting African rice to saline conditions.

Transcriptomic and histological analysis of African rice has identified a set of novel candidate genes for resistance to the root knot nematode, *Meloidogyne graminicola*, a pest responsible for major yield losses in *O. sativa* [42]. A second major gene, *RYMV2*, controlling resistance to *Rice yellow mottle virus* (RYMV), which is one of the most devastating rice infecting viruses in Africa, has been identified in *O. glaberrima* [26]. Efforts to fine map the *RYMV2* gene led to the identification of a putative loss-of-function one base deletion mutation in one of the candidate genes for *RYMV2*. This low frequency mutation was highly associated with RYMV resistance and affected a gene homologous to the *CPR5* defense gene in *Arabidopsis thaliana* [43]. Using *O. sativa* and *O. glaberrima* introgression lines, Gutierrez et al. [44] for the first time identified a major QTL controlling resistance to *Rice stripe necrosis virus*, located on chromosome 11. These authors also identified a host of other QTLs for various traits, signifying the power of chromosome segment substitution lines (CSSLs) as a genetic mapping tool. The continued identification of such loci and alleles is important in rice improvement as it assists in marker assisted selection.

The regulatory mechanisms of key domestication traits are increasingly being unravelled using genomics. Analysis of chromosome segment substitution lines with different genetic backgrounds revealed that the awnless phenotype in African rice was due to a novel recessive allele in the *Regulator of Awn Elongation 3 (RAE3)* gene located on chromosome 6 [45]. Other studies have identified the genetic architecture of traits that were selected for by farmers during domestication for adapting the crop to their farming systems. A study by Li et al. [9] has uncovered the QTL responsible for thermal tolerance. Further analysis of this QTL identified a candidate gene, *OsPAB1 (Os03g0387100),* which was differentially expressed under heat stress and may have been selected for by farmers for adapting African rice to high temperatures. An African rice specific functional SNP, *H99*, in this gene was also identified that may allow the marker assisted introgression of thermal tolerance-enhancing alleles from African rice to other varieties. Despite the growing popularity of genome wide association studies [8], their application seems limited in African rice, as most researchers working on this species still seem to rely on QTL mapping, which has lower resolution. Similarly, the application of systems genetics approaches to understand complex traits has been minimal or almost non-existent.

#### **4. Genomic and Transcriptomic Resources**

#### *4.1. Genomic Sequences*

African rice has one of the smallest genomes in the *Oryza* genus and its assembled reference is about 20% smaller than that of its domesticated counterpart (Table 2). Size differences between the various species are due to lineage-specific expansion and contraction of genes and gene families during the evolutionary process [46]. The first draft genome of African rice was presented by Sakai et al. [47]. This draft genome, which was produced through whole genome shotgun sequencing, had a size of about 206 Mb, which corresponds to about 0.6X coverage of the African rice genome, whose size is estimated to be about 357 Mb [48]. Though this genome sequence provided some useful insights on genomic evolution of African rice, it had limited utility as a large portion of the genome was missing. A few years later, a much-improved reference sequence in terms of assembly and annotation was released by Wang et al. [4] under the International *Oryza* Mapping and Alignment Project (IOMAP). Based on the estimated size of the *O. glaberrima* genome, this reference seems incomplete. Recent studies have also reported various assembly errors [49–51]. The CG14 reference sequence was assembled against the *O. sativa* Nipponbare reference sequence and may therefore have missed some *O. glaberrima* specific polymorphisms. The *PSTOL1* locus, which is the major gene controlling the uptake of phosphorus, was for example found to be missing from the assembled CG14 reference. The *PSTOL1* locus is located within Phosphorus uptake 1 (*Pup1*), which is the major QTL for phosphorus uptake. Aligning the *PSTOL1* locus with unplaced scaffolds revealed that it was present in an unanchored scaffold belonging to chromosome 12. Further analysis revealed that this particular locus and the adjacent sequence of a *Pup1* specific INDEL region spanning about 90 kb are absent in the Nipponbare reference, thus explaining the gap in the CG14 reference in this particular genomic region [51].
Though the identified assembly errors and gaps may hinder its effective utility in rice genetics and genomics, this reference sequence is arguably the most valuable genomic resource for African rice and has opened opportunities for detailed studies on this species. The CG14 reference is also relatively poorly annotated [52,53]. Owing to the importance of this species as a source of readily accessible diversity for rice improvement, there is need for concerted global efforts from the rice scientific community to initiate efforts aimed at improving the quality of this reference sequence.
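
The coverage figure quoted for the first draft genome can be checked with simple arithmetic. The short sketch below uses only the two sizes given in the text (about 206 Mb assembled against an estimated 357 Mb genome); the variable names are our own and serve purely as illustration.

```python
# Sizes quoted in the text, in megabases (Mb).
draft_assembly_mb = 206      # first draft assembly, Sakai et al. [47]
estimated_genome_mb = 357    # estimated genome size [48]

# Fraction of the estimated genome represented in the draft assembly.
fraction_covered = draft_assembly_mb / estimated_genome_mb
fraction_missing = 1 - fraction_covered

print(f"covered: {fraction_covered:.2f}, missing: {fraction_missing:.2f}")
```

Under these figures, a little over 40% of the estimated genome was absent from the first draft, which is consistent with its limited utility as described above.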


**Table 2.** Important assembly and annotation features of selected *Oryza* species.

Genomic studies have revealed that one reference sequence is not enough to represent the full genetic variability present in a species [57,58]. It is against this background that additional varieties of African rice were sequenced. Moreover, as stated, recent studies have identified various errors and gaps in the CG14 reference in addition to its relatively poor annotation [52,53]. These challenges have necessitated additional sequencing efforts to address them. In this regard, sequencing, de novo assembly and annotation of two additional genomes was undertaken. Similarly, using the same de novo approaches, the CG14 genome was also reassembled. These sequencing efforts yielded assemblies which, though smaller and more fragmented, produced better resolution in some loci such as *RYMV1* than the original CG14 reference. As shown in Table 3, they also predicted more protein coding genes than the IOMAP generated reference [50]. A similarly higher number of genes were reported by Zhang et al. [46]. A high-quality reference genome is fundamental for various genetic and genomic applications such as functional and comparative genomics. Lack of quality genomic resources has in some cases limited capacity to validate gene function thereby hindering the unlocking of novel trait variation.

**Table 3.** Description of various African rice genome assemblies.


Source: [4,50].

In addition to the whole genome sequences, advances in genomics have presented opportunities that have fuelled the development of other types of genomic resources, key among them being high throughput genetic markers. The IOMAP led initiative in which the CG14 variety reference genome was sequenced generated the first large set of genomic data for African rice. Sequencing 20 diverse accessions of *O. glaberrima* identified a total of 4,447,424 SNPs [4]. Recently, Meyer et al. [8] generated a genome wide SNP map that contained a total of 2.32 million SNPs by resequencing a total of 93 landraces. Molecular characterization of the *O. glaberrima* accessions conserved in AfricaRice genebank using diversity arrays technology (DArTseq) led to the identification of 3834 polymorphic SNPs [59]. Over 1.4 million Simple Sequence Repeats (SSR) have been identified in the African rice genome [46,47], providing a useful set of genetic markers. The lack of a dedicated set of high throughput markers that can study polymorphisms in interspecific crosses between Asian and African rice has been blamed for the limited exploitation of African rice genetic resources in interspecific breeding [60]. In order to address this gap, Pariasca-Tanaka et al. [60] developed a cost-effective high-throughput genotyping panel comprising 2015 polymerase chain reaction (PCR)-based SNPs, out of which 322 were polymorphic between the two species. These genomic resources provide versatile tools for dissecting the genetic basis of agriculturally important traits, for population genomic studies and other modern breeding applications.

#### *4.2. Chloroplast Genome Sequences*

The first chloroplast genome sequences were published by Mariac et al. [61]. Additional chloroplast sequences, including data for multiple accessions, were later reported by Wambugu et al. [62] (Figure 1). These authors used a combination of both de novo and read mapping approaches in assembling the genomes. To date, a total of six African rice chloroplast genomes have been released, with sizes ranging from 132,629 to 134,661 bp. The significant differences in the sizes of the various released genome sequences can be attributed to the protocol used in retrieving the chloroplast sequences and the assembly approaches used. However, the genome assembled by Mariac et al. [61], with a size of 132,629 bp, appears to be unusually small and to the best of our knowledge is the smallest of all the *Oryza* chloroplast genomes that have so far been assembled. These genome sequences are providing a versatile tool for use in population genetics, phylogenetic and phylogeographic studies. Wambugu et al. [62] used chloroplast sequences to establish the phylogenetic relationships between African rice and other species constituting the *Oryza* primary gene pool.

**Figure 1.** Gene map of *O. glaberrima* chloroplast genome [62].

#### *4.3. Transcriptomic Resources*

Transcriptome analysis has played an important role in supporting the assembly, annotation and analysis of the African rice genome and that of other species including *Oryza* wild species. Wang et al. [4] generated, to our knowledge, the largest multi tissue transcriptomic data for African rice. This RNA sequencing data was used to identify assembly gaps in the CG14 African rice reference. Sequence analysis identified seven genes that were missing in the reference, but RNA data indicated that they were transcribed, clearly pointing to genome assembly gaps. In the same study, RNA sequence data was used to conduct comparative analysis of domestication genes in African rice and its progenitor. RNA sequence data was used to confirm the deletion of some genes in African rice, among them the ortholog of the *O. sativa* shattering gene (*OsSh1*), which may have been lost during the process of evolution. While transcription for the *O. glaberrima* shattering gene (*OgSh4*) was detected in *O. barthii*, expression level for this gene was found to be limited or absent in African rice. This RNA sequencing analysis together with the analysis of mutation profiles led to the conclusion that African and Asian farmers may have targeted the same traits and genes but sometimes selected different mutations during the domestication process. A similar conclusion was made by Win et al. [63] who used gene expression analysis to unravel the genetic mechanism underlying loss of seed shattering in African rice. Zhang et al. [46] used transcriptome, EST and homology searches to validate predicted gene models during the annotation of de novo assembled *Oryza* genomes. Further insights into the domestication process were given by Nabholz et al. [64] who used transcriptome sequencing to analyse the genetic diversity of various African rice transcripts. Genetic variation in African rice was reported to be the lowest for all grass species and perhaps for all domesticated crop species [64,65]. As noted by Ndjiondjop et al. [59], it might appear puzzling how a species with such a narrow genetic base can possess such unique and exceptional genetic potential in terms of broad resistance to a variety of biotic and abiotic stresses. However, this situation does not seem to be unique as other studies have reported a negative correlation between neutral and functional diversity [66].

Analysis of the transcriptome has been used to decipher the genetic and molecular basis of important morphological, biochemical and physiological traits. African rice and *O. barthii* have uniquely different panicle architectures, but the underlying genetic cause has remained unknown. Comparative RNA analysis of these two African taxa revealed that these differences in panicle morphology are due to expression differences in the miR2118-triggered phased siRNAs [67]. RNA-seq analysis was used to elucidate the cytological and molecular mechanisms of resistance to *M. graminicola* root-knot nematodes, with differentially expressed genes being identified [42]. Meyer et al. [8] used RNA analysis to identify genes that may be associated with tolerance to salinity based on their gene expression patterns. MicroRNAs are non-coding RNAs that may be involved in regulation of genes involved in response to various biotic and abiotic stresses. In a study analysing miRNAs that are involved in salinity stress response in African rice, Mondal et al. [52] identified a total of 150 conserved and 348 novel miRNAs which may have potential roles in gene expression. A total of 29 known and 32 novel differentially regulated miRNAs were identified, suggesting they may have a direct role in response to salinity stress. Additional miRNAs belonging to different gene families have been reported for different *Oryza* species [46,68]. African rice has been found to have fewer polycistronic miRNA precursors compared to *O. barthii* [69], perhaps as a result of evolutionary and domestication processes. Identification of these important gene expression regulators and their analysis will aid in giving greater insights into their functional and evolutionary roles. This information on the genetic and molecular basis of various traits in African rice is useful to plant breeders as the loci identified can be genetically manipulated in order to impart increased tolerance to biotic and abiotic stresses.

#### **5. Supporting the Conservation and Utilization of African Rice Germplasm Using Genomics**

African rice has huge genetic resources which are conserved in various ex situ conservation facilities globally. The largest collection totalling about 3910 accessions is held at AfricaRice Genebank, with the second largest collection of about 2828 accessions being conserved at the International Rice Research Institute (IRRI). As already stated, these collections are a rich reservoir of genes and alleles that
is important for rice improvement particularly on tolerance to biotic and abiotic stresses. However, this diversity remains grossly underutilized [23,39]. Over the years, biotechnology-based approaches such as molecular markers have played a key role in genebank management. The current advances in genomics, particularly in DNA sequencing, are offering tools that have capacity for revolutionising the conservation and utilization of plant genetic resources [70,71]. However, compared to other areas of plant science, biodiversity conservation has been slow in embracing these technological advances [72]. Recently, there has been a commendable attempt towards leveraging these genomic-enabled advances in supporting the conservation and utilization of African rice germplasm currently conserved at the AfricaRice genebank. The molecular characterization of this collection using high density molecular markers has recently been reported. A total of 2927 accessions were genotyped with 31,739 DArTseq-based SNP markers. This data has assisted in identification of duplicates, constitution of core and mini-core collections as well as identifying human errors during various genebank operations [59,73]. SNP genotyping is assisting in revealing cases of taxonomic misidentification [74] which is common in genebanks and negatively impacts deployment of genetic resources in plant breeding and other research purposes. This is arguably the largest molecular data collected on this collection and presents a valuable resource for supporting decision making on key conservation aspects. Species SNP diagnostic markers which have capacity for accurately discriminating various *Oryza* species have been developed. Next generation sequencing based approaches have been used to identify duplicates in genebank collections [75] thereby providing a basis for rationalising germplasm collections. 
The current genomics-enhanced revolution will continue providing novel genomics, analytical and breeding tools that allow more rational and efficient conservation as well as more targeted exploitation of genetic resources.
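
To make the idea of marker-based duplicate identification concrete, the following is a minimal sketch and not the procedure used in the cited studies, whose details are not given here. It assumes SNP genotypes coded as 0, 1 or 2 (with `None` for missing calls) and flags accession pairs whose proportion of identical calls meets a hypothetical threshold of 0.99. The function names, accession names and threshold are all invented for illustration.

```python
from itertools import combinations

def identity(g1, g2):
    """Proportion of SNPs with identical calls, ignoring missing (None) data."""
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        return 0.0
    return sum(a == b for a, b in shared) / len(shared)

def candidate_duplicates(genotypes, threshold=0.99):
    """Return accession pairs whose genotype identity meets the threshold."""
    pairs = []
    for (name1, g1), (name2, g2) in combinations(sorted(genotypes.items()), 2):
        if identity(g1, g2) >= threshold:
            pairs.append((name1, name2))
    return pairs

# Toy data: accession names and genotype calls are invented for illustration.
genotypes = {
    "acc_A": [0, 2, 2, 0, 1, 0],
    "acc_B": [0, 2, 2, 0, 1, 0],      # identical to acc_A
    "acc_C": [2, 0, 2, None, 1, 2],   # differs at several SNPs
}
print(candidate_duplicates(genotypes))
```

In practice a threshold below 1.0 would be chosen to allow for genotyping error, and flagged pairs would still need to be checked against passport and phenotypic data before any accession is treated as a duplicate.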

One of the greatest challenges limiting the use of genetic resources particularly those conserved in genebanks is inadequate understanding of their potential genetic value due to inadequate characterization [72,76]. Genetic diversity especially for genebank samples is usually studied anonymously with little or no efforts to identify the functional diversity [70]. Efforts have been made to analyse the genetic variation of *O. glaberrima* conserved at AfricaRice genebank anonymously using molecular markers [74,77,78]. Genome sequencing has been used to identify functional diversity related to different traits particularly on tolerance to biotic and abiotic stresses [8]. However, in most instances, functional diversity has been studied in only a few genotypes for a particular trait, leaving most of the African rice intraspecific variation largely unknown [23,79]. This limited characterization can largely be attributed to cost related considerations as this remains a major limiting factor. Even with the reduced sequencing and genotyping costs, many labs and researchers particularly in developing countries can still not afford to undertake genomic analysis of large sample sizes. Analysis of bulked samples is becoming a popular approach in genetic mapping and population genetic studies as it allows cost effective analysis of a large number of samples [80,81]. Pool sequencing and whole genome-based bulk segregant analysis are some of the commonly used cutting-edge approaches [82–86]. The lack of quality phenotypic data is increasingly emerging as a major bottleneck in establishing phenotype-to-genotype relationships. The on-going rapid advances in genomics seem to be outpacing capacity to undertake high throughput phenotypic analysis. This calls for an urgent need to invest in human resource capacity and physical infrastructure that will ensure enhanced phenotyping capabilities. 
Major initiatives aimed at exploiting the vast genetic potential of African rice through intra and inter-specific crossing are currently underway. These initiatives include Rapid Alleles Mobilization (RAM) and Methodologies and new resources for genotyping and phenotyping (MENERGEP) of African rice species and their pathogens for developing strategic disease resistance breeding programs, both of which are being implemented by AfricaRice and other partners [59].

#### **6. Grain Quality and Its Genetic Control**

While priority has been placed on breeding for high yielding crops especially in developing countries, there is also a need to ensure that these varieties deliver nutritional security which contributes to human health. Although African rice has potential to contribute genes for improving rice quality [39], this genetic potential has not been deployed in rice improvement and remains poorly studied. However, research interest in the physicochemical and functional properties of starch in African rice is growing [86–93]. Analysis of starch physicochemical properties has revealed that African rice has unique starch traits [92], a finding that could perhaps explain the renewed interest in starch traits in this species. Generally, it has higher amylose content (AC) than Asian rice and could be a potential natural source of slowly digestible starch, traits that could confer potential health benefits [92]. It has been found to have a wider diversity of AC than earlier reported [89]. The health benefits of high amylose foods are increasingly being recognised, with such foods being associated with positive gastro-intestinal indices. African rice therefore has potential for use in the development of functional foods [94]. Analysis of *O. glaberrima* introgression lines has revealed that African rice is a novel genetic resource for addressing micronutrient malnutrition through bio-fortification [41].

Deploying African rice genetic resources in breeding for healthier rice is however constrained by the poor understanding of molecular and genetic mechanisms underlying the unique starch traits, such as AC. Unlike in the case of Asian rice, lack of knowledge on marker-trait associations has hindered the use of marker-assisted selection. Recently, a whole genome based bulk segregant analysis conducted by Wambugu et al. [86] identified genetic markers that are putatively associated with AC. By sequencing bulks of interspecific progenies with low and high AC, this study identified a G/A SNP associated with the *Granule Bound Starch Synthase* (*GBSS*) gene located on chromosome 6. Other putative AC associated SNPs were identified in genes encoding the *NAC* and *CCAAT-HAP5* transcription factors, located on chromosomes 1 and 11, respectively, which have previously been associated with starch biosynthesis. Analysis of natural variation in the *GBSS* locus identified several novel non-synonymous SNPs whose functional importance is still unknown. This study provides useful insights on the genetic control of AC, with the identified candidate genes being novel targets for manipulating AC in African rice.
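
The logic of comparing sequenced bulks can be illustrated with a small sketch. The statistic used in the study above is not stated here, so the example assumes the SNP-index approach that is commonly applied in sequencing-based bulk segregant analysis: at each site, the SNP-index of a bulk is the fraction of reads carrying the alternative allele, and sites where the index differs strongly between the high and low bulks are candidates for association with the trait. All function names and read counts below are invented for illustration.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads carrying the alternative allele at one site."""
    if total_reads == 0:
        raise ValueError("no reads at this site")
    return alt_reads / total_reads

def delta_snp_index(high_bulk, low_bulk):
    """Difference in SNP-index between the high and low bulks at one site.

    Each bulk is given as an (alt_reads, total_reads) tuple.
    """
    return snp_index(*high_bulk) - snp_index(*low_bulk)

# Invented read counts for three sites: (alt, total) in the high and low bulks.
sites = {
    "site_1": ((45, 50), (6, 48)),    # strongly differentiated between bulks
    "site_2": ((24, 50), (26, 52)),   # similar allele frequency in both bulks
    "site_3": ((10, 40), (30, 40)),   # differentiated in the opposite direction
}
for name, (high, low) in sites.items():
    print(name, round(delta_snp_index(high, low), 3))
```

Values near zero suggest no association with the trait, whereas values approaching 1 or -1 mark sites where the two bulks carry different alleles. A real analysis would also need depth filters and a significance threshold, which are omitted from this sketch.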

#### **7. Challenges in Deploying African Rice Genetic Diversity in Interspecific Breeding**

One of the greatest challenges that have constrained the deployment of African rice diversity in rice breeding is the strong and remnant sterility observed in interspecific crosses with Asian rice. This limits rice breeders from taking advantage of heterosis between the two cultivated species. Over the years, there have been intense research efforts on the sterility barriers between the two cultivated species [95–97]. Various approaches have been used in overcoming these barriers, with the first successful cross being achieved about three decades ago through the use of anther culture and embryo rescue techniques [15]. The use of these conventional biotechnological approaches led to the development of New Rice for Africa (NERICA) varieties, which is arguably the most successful rice improvement program in sub-Saharan Africa. Research has identified a host of loci which are associated with reproductive barriers in cultivated rice [98]. Among these is the *S1* locus, which has a major effect on this interspecific sterility [99]. Despite the huge initial success that was achieved in generating interspecific crosses between *O. sativa* and *O. glaberrima*, the process is still fraught with technical challenges in addition to being tedious and time consuming.

A variety of other methodological approaches have been developed and their effectiveness in addressing these sterility challenges tested. As reported by Lorieux et al. [98], a multi-institutional collaboration has sought to address the challenge of sterility barriers by developing interspecific bridge lines. These are interspecific crosses between *O. sativa* and *O. glaberrima* that are developed through marker-assisted selection of progenies that are homozygous for the *S1*<sup>s</sup> allele. Because these crosses carry large introgressions of the *O. glaberrima* genome and show significantly increased fertility in subsequent crosses with diverse *O. sativa* lines, they ensure effective exploitation of useful *O. glaberrima* genes in conventional breeding programmes. Using mutagenesis, Koide et al. [95] isolated a mutant with an allele in the *S1* locus that is associated with increased fertility. Through this forward genetics approach, these authors were able to create a neutral allele that facilitates crossing of the two cultivated species. Another closely related challenge is segregation distortion, which has been reported in various genomic regions associated with a sterility locus, such as the short arm of chromosome 6 where the *S1* locus is located [44]. Segregation distortion may affect the accuracy of QTL mapping as it may cause the effect of some QTLs to be overestimated. QTLs mapping to regions that segregate in a non-Mendelian fashion should therefore be interpreted with caution.
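Segregation distortion at a marker is typically detected by comparing observed genotype counts with the expected Mendelian ratio. The sketch below shows one simple way of doing this with a chi-square goodness-of-fit test, assuming a codominant marker scored in an F2 population (expected 1:2:1 ratio). The genotype counts are hypothetical, and the function names are introduced here for illustration only.

```python
# Illustrative sketch only: chi-square goodness-of-fit test for segregation
# distortion at a single codominant marker. The 1:2:1 expectation assumes an
# F2 population; the genotype counts below are hypothetical.

# Chi-square critical value for 2 degrees of freedom at alpha = 0.05.
CRITICAL_DF2_ALPHA_05 = 5.991


def chi_square_segregation(observed, expected_ratio=(1, 2, 1)):
    """Chi-square statistic comparing observed counts with an expected ratio."""
    if len(observed) != len(expected_ratio):
        raise ValueError("observed and expected_ratio must have equal length")
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(observed, expected_ratio=(1, 2, 1),
                 critical=CRITICAL_DF2_ALPHA_05):
    """True if the marker deviates significantly from the expected ratio.

    The default critical value is valid only for three genotype classes
    (2 degrees of freedom); pass a different value for other designs.
    """
    return chi_square_segregation(observed, expected_ratio) > critical


# Hypothetical counts of (parent A homozygotes, heterozygotes, parent B homozygotes).
mendelian_marker = (25, 50, 25)   # matches 1:2:1 exactly, statistic is 0
skewed_marker = (10, 50, 40)      # deficit of one homozygous class, statistic is 18

flags = {
    "mendelian_marker": is_distorted(mendelian_marker),
    "skewed_marker": is_distorted(skewed_marker),
}
```

Markers flagged in this way could be down-weighted or examined separately before QTL effects in the surrounding region are interpreted. When many markers are tested across the genome, a correction for multiple testing would also be needed.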

#### **8. Origin and Domestication of African Rice**

Although there has been exceptional interest in studying the domestication and evolutionary history of African rice over the years, it does not match the attention given to Asian rice, whose domestication is perhaps the most studied of all crop species. Several theories on the origin of African rice have been proposed but the debate rages on. An Asian origin of this species has previously been advanced but has been rejected [65]. Proposals that African rice was domesticated from Asian rice in West Africa have been put forward but have received very little support [100–103]. The dominant theory, around which many studies and opinions appear to converge, postulates that African rice was domesticated from *O. barthii* in West Africa. This has been supported by studies using gene sequence analysis [65], chloroplast genome-based phylogenetic analysis [62] and population genomics [4]. Despite this general consensus, the exact location where domestication took place has proved difficult to establish. Whole-genome resequencing studies [4,104,105] have provided great insights into the domestication process, and especially the domestication centre, but key studies have reached conflicting conclusions. Using a population genetics approach, Wang et al. [4] were the first authors to map the domestication centre of African rice using whole-genome analysis. By resequencing 94 *O. barthii* and 20 *O. glaberrima* accessions and performing comparative genetic analysis of selected domestication genes, these authors mapped the domestication centre along the Niger River, consistent with original proposals from Porteres [106] that were later supported by Li et al. [65]. Moreover, this resequencing study identified the specific *O. barthii* population from which African rice was domesticated. Recently, analysis of 246 whole-genome sequences similarly mapped the Inner Niger Delta as the domestication centre [105]. These findings have, however, been disputed by Choi et al. [104], who analysed whole-genome resequencing data from 286 African rice and *O. barthii* individuals. These authors proposed a non-centric origin of African rice instead of the single-origin theory proposed by many previous studies. Moreover, they reported that the progenitor population proposed by Wang et al. [4] lacked genetic differentiation from *O. glaberrima* and had greater resemblance to *O. glaberrima* than to *O. barthii*. They therefore concluded that this population may have been misidentified or constitutes a feral weedy population. Rather than settling the debate on the origin of African rice as would have been expected, it appears that the era of whole-genome data is leading to greater controversies and conflicting theories. The different approaches used in the analysis of whole-genome sequence data, and in its interpretation, may be the cause of these contradictory theories and conclusions.

Characterization of domestication genes has enabled a deeper understanding of the domestication process of various species. The recently released *O. barthii* reference assembly [56] forms a valuable resource that will allow more insightful analysis of the evolution and domestication of African rice. Genomic analysis is increasing our understanding of the molecular basis of domestication of African rice. While significant progress has been made in the identification, and in some cases cloning, of domestication genes in Asian rice, relatively little is known about these genes in African rice [45,63,107,108]. The *O. barthii* reference assembly [56] will facilitate identification and analysis of orthologous loci between the domesticate and its progenitor and hence allow in-depth understanding of the target domestication genes. Analysis of selected domestication genes shows that ancient farmers in Africa and Asia targeted the same set of genes during domestication, indicating independent and convergent evolution [4,105]. There is increasing evidence that the genetic and molecular basis of the key domestication traits is in some cases conserved between African and Asian rice [4,109], though in other cases the genes and mutation profiles are different [45]. As highlighted earlier, this points to convergent evolution between the two species driven by human selection. The domestication process was associated with major shifts in various morphological traits, among them grain size, where humans showed a strong preference for big seeds. However, in African rice, the selection process appears not to have followed the dominant trend as far as grain size is concerned, as the cultivated species typically has smaller seeds than its progenitor. The shift to small seeds has been attributed to a SNP mutation in the *GL4* gene that led to a stop codon. Interestingly, this mutation also led to loss of seed shattering [110]. Analysis of 93 diverse African rice landraces identified *SH3* as an additional gene controlling seed shattering, which together with *SH4* led to multiple seed shattering phenotypes [111]. Using association analysis and a positional cloning approach, a C/T SNP underlying the loss of seed shattering was identified [63]. The transition from the prostrate growth of *O. barthii* to erect growth in African rice has been attributed to a mutation in the promoter region of the *PROG7* (*PROSTRATE GROWTH 7*) gene, which is located on chromosome 7 [107]. A 113 kb deletion in the *RICE PLANT ARCHITECTURE DOMESTICATION* (*RPAD*) locus on chromosome 7 has been reported as an additional genetic factor controlling plant architecture in both Asian and African rice [112]. This knowledge is important for plant improvement as important genetic variation can be introduced by targeting these genes and mutations. The domestication process may have been associated with loss of important diversity, which may need to be reintroduced in a well-targeted manner.

#### **9. Conclusions**

In order to meet the food and nutritional requirements of the rapidly growing human population, there is an urgent need to increase per capita rice production. More innovations in rice breeding present an option for achieving the much-needed increases in rice productivity. This can be achieved by the development of super-varieties that have the capacity to produce high yields per unit area under low water and nutrient input, in addition to being tolerant to diverse biotic and abiotic stresses. African rice offers a variety of these agronomically important traits. Production of such varieties will require sound knowledge of rice genetics and genomics. There is a need to leverage the genomic capabilities presented by cheap genome sequencing technologies to advance their contribution to rice improvement. African rice remains an untapped resource that can play a vital role in the development of novel gene pools. Additional efforts are required in the development of more structural and functional genomic resources. Identification of more functional genetic diversity is also of great value in these efforts. The on-going phenotypic and molecular characterization of African rice genetic resources is also critical in enhancing the utility of these resources in rice improvement.

**Author Contributions:** P.W.W. wrote the paper; R.H. and M.-N.N. edited an earlier version of the manuscript; all authors read and approved the final manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Plants* Editorial Office E-mail: plants@mdpi.com www.mdpi.com/journal/plants
