# **Advances in Cereal Crops Breeding**

Edited by Igor G. Loskutov Printed Edition of the Special Issue Published in *Plants*

www.mdpi.com/journal/plants

Editor

**Igor G. Loskutov**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editor* Igor G. Loskutov N. I. Vavilov Institute of Plant Genetic Resources (VIR) Russia

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Plants* (ISSN 2223-7747) (available at: https://www.mdpi.com/journal/plants/special_issues/Cereal_Breeding_Advance).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-2650-8 (Hbk) ISBN 978-3-0365-2651-5 (PDF)**

Cover image courtesy of Igor G. Loskutov

© 2021 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

### **Contents**


**Thao Duc Le, Floran Gathignol, Huong Thi Vu, Khanh Le Nguyen, Linh Hien Tran, Hien Thi Thu Vu, Tu Xuan Dinh, Françoise Lazennec, Xuan Hoi Pham, Anne-Aliénor Véry, Pascal Gantet and Giang Thi Hoang**

Genome-Wide Association Mapping of Salinity Tolerance at the Seedling Stage in a Panel of Vietnamese Landraces Reveals New Valuable QTLs for Salinity Stress Tolerance Breeding in Rice

Reprinted from: *Plants* **2021**, *10*, 1088, doi:10.3390/plants10061088

**Shivani Saini, Navdeep Kaur, Deeksha Marothia, Baldev Singh, Varinder Singh, Pascal Gantet and Pratap Kumar Pati**

Morphological Analysis, Protein Profiling and Expression Analysis of Auxin Homeostasis Genes of Roots of Two Contrasting Cultivars of Rice Provide Inputs on Mechanisms Involved in Rice Adaptation towards Salinity Stress

Reprinted from: *Plants* **2021**, *10*, 1544, doi:10.3390/plants10081544

### **About the Editor**

**Igor G. Loskutov** was born in 1956. In 1978, he graduated from Saint Petersburg State University with a diploma in Agrochemistry and Soil Science. He completed his Ph.D. (plant breeding) in 1985 and his D.Sc. (botany and plant breeding) in 2003, both at the N.I. Vavilov Institute of Plant Genetic Resources (VIR).

He has vast experience in different areas of plant genetic resources: the theoretical, practical, and legislative aspects of collecting, evaluating, and storing plant genetic resources; the botany, systematics, evolution, distribution, and diversity of cultivated and wild oats; and the genomics, genetics, and breeding of the main characters and properties of cereals.

At present, he is Head of the Department of Genetic Resources of Oat, Barley and Rye, N.I. Vavilov Institute of Plant Genetic Resources (VIR), Russia, and Professor in the Department of Agrochemistry, Biology Faculty, Saint Petersburg State University, Russia.

He has supervised 8 bachelor's and 3 master's theses at Saint Petersburg State University and 10 PhD dissertations at the Vavilov Institute of Plant Genetic Resources.

He has published 437 papers in Soviet/Russian national and international peer-reviewed journals such as *Euphytica, J. Bot., Genome, J. Agric. Food Chem., Gen. Res. Crop Evol., Rus. J. Gen.: Appl. Res., Rus. J. Gen., Agronomy, Plants, Molecules*, etc., as well as in journals and books published by Springer. He is an editorial board member of several international journals. He has authored six monographs and holds six Russian patents. Prof. Loskutov has also played an instrumental role in various prestigious Russian and international collaborative research projects with the USA, Germany, Italy, Switzerland, Sweden, France, China, etc.

As a Visiting Professor, he has given lectures at universities and research institutes in the USA, Germany, Italy, China, Brazil, Sweden, Tajikistan, Finland, France, the UK, Turkey, Israel, etc.

### **Preface to "Advances in Cereal Crops Breeding"**

This Special Issue presents some recent advances in cereal crop breeding. The studies collected here address only some of the bottlenecks in the breeding of specific crops; advances in the modern breeding of grain crops are multifaceted and diverse, and they are occurring in different countries and on different continents. In the breeding of agricultural crops, climate change and the associated shifts in growing conditions for many crops important to humanity are becoming increasingly important. Climate change can lead to either an excess or a sharp shortage of precipitation, coupled with an increase in temperature, which will affect edaphic factors of plant growth and development through salinization, acidification, or drying out of the soil. It can also lead to the emergence of new diseases and stronger epiphytotics of already known diseases, or to a greater spread of agricultural pests. These factors ultimately affect the productivity and quality of the products obtained, on which the food security of each country depends. In the future, breeders around the world will be assisted in solving many of these problems, along with the traditional ones, by recently developed "omics" genotyping technologies.

> **Igor G. Loskutov** *Editor*

### *Editorial* **Advances in Cereal Crops Breeding**

**Igor G. Loskutov**

Federal Research Center the N.I. Vavilov All-Russian Institute of Plant Genetic Resources (VIR), St. Petersburg 190000, Russia; i.loskutov@vir.nw.ru

Cereals are the main food and feed crops on our planet, with wheat, rice, and maize occupying three-quarters of the total acreage. The vast majority of plant breeders and plant geneticists around the world are engaged in cereal breeding. The genetic resources for crop genepools, including breeding and research materials, landraces, and wild crop relatives, which collectively are the pillars of modern plant breeding, are maintained ex situ in gene banks. The main challenges or bottlenecks in the advanced breeding techniques currently used in cereals are connected with concerns related to climate change, with breeding programs aiming to increase yield and tolerance to biotic and abiotic stresses (e.g., yield potential and resistance to main diseases and pests, as well as increased drought tolerance, heat tolerance, and nutrient efficiency). In the last few years, a trend has emerged in cereal crop breeding aimed at combining high agronomic and biochemical parameters in a single cultivar. Currently, traditional genetic and innovative molecular genetic methods are widely used in the breeding of grain crops. The success of biotechnology approaches has expanded the breeding possibilities and allowed interspecies and intergenus hybrids to be obtained. The development of molecular biology and genomics has completely overcome the barriers limiting the breeding of living organisms, while methods for genome editing of agricultural crops are still being improved to achieve higher levels of accuracy. Studies aimed at finding genes and quantitative trait loci (QTLs) that affect the main breeding traits and at identifying the desired allelic variants are currently relevant. In the field of genetic sequencing, genotyping by sequencing (GBS) is a method used to discover single-nucleotide polymorphisms (SNPs) for genotyping studies, such as genome-wide association studies (GWAS).

The acquisition of large-scale phenotypic data has become one of the major bottlenecks hindering crop breeding and functional genomics studies. Nevertheless, recent technological advances have provided potential solutions to relieve such bottlenecks and to explore advanced methods for large-scale phenotyping, data acquisition, and data processing in the coming years. The phenomics data generated are already beginning to be used to identify genes and QTLs through QTL mapping, association mapping, and genome-wide association studies (GWAS), in order to achieve crop improvements through genomics-assisted breeding (GAB). There is no doubt that accurate high-throughput phenotyping platforms will accelerate improvements in plant genetics.

This Special Issue on 'Advances in Cereal Crops Breeding' comprises 9 papers covering a wide array of aspects: the expression-level investigation of genes involved in salinity stress adaptation and their relationship with proteomics in rice; the use of genetic analysis to assess general combining ability (GCA) and specific combining ability (SCA) in promising maize hybrids; the use of PCR-based DNA markers in rice; the identification of quantitative trait loci (QTLs) in wheat and the use of simple sequence repeats (SSRs) in rice; the use of single-nucleotide polymorphisms (SNPs) in genome-wide association studies (GWAS) in cereals; Nanopore direct RNA sequencing of LTR retrotransposon-related RNAs in triticale; and genomic selection of heterotic maize hybrids.

In order to better understand the mechanisms involved in salinity stress adaptations in rice, two contrasting rice cultivars were compared in a recent study: Luna Suvarna, a salt-tolerant cultivar, and IR64, a salt-sensitive cultivar. The expression-level investigation of auxin signaling pathway genes revealed increases in the transcript levels of several auxin homeostasis genes in Luna Suvarna compared with IR64 under salinity stress. Furthermore, protein profiling showed 18 proteins that were differentially regulated between the roots of the two cultivars, some of which were salinity-stress-responsive proteins found exclusively in the proteome of Luna Suvarna roots, revealing the critical role of these proteins in imparting salinity stress tolerance. The results show that Luna Suvarna involves a combination of morphological and molecular traits of the root system that could prime the plant to better tolerate salinity stress [1].

**Citation:** Loskutov, I.G. Advances in Cereal Crops Breeding. *Plants* **2021**, *10*, 1705. https://doi.org/10.3390/plants10081705

Received: 9 August 2021 Accepted: 16 August 2021 Published: 19 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The tolerance of rice to salinity stress involves diverse and complementary mechanisms, such as the regulation of genome expression, activation of specific ion transport systems to manage excess sodium at the cell or plant level, and anatomical changes that mitigate sodium penetration into the inner tissues of the plant. The identification of salinity tolerance QTLs associated with different mechanisms involved in salinity tolerance requires the greatest possible genetic diversity to be explored. In the investigation of genotyped rice landraces, SNP markers were used, with the aim of identifying new QTLs involved in salinity stress tolerance via a genome-wide association study (GWAS). Twenty-one identified QTLs colocalized with known QTLs. Several genes within these QTLs have functions related to salinity stress tolerance and are mainly involved in gene regulation, signal transduction, and hormone signaling. This study provides promising QTLs for breeding programs to enhance salinity tolerance and identifies candidate genes that should be further functionally studied to better understand salinity tolerance mechanisms in rice [2].

In addition to water flooding and salinity, rice growers in some parts of the world are also facing drought; thus, developing new rice genotypes tolerant to water scarcity is one of the best strategies to maximize yield potential and achieve water savings. In a recent study, rice genotypes were characterized for grain and agronomic parameters under normal and drought stress conditions and genetic differentiation was determined via specific DNA markers related to drought tolerance using simple sequence repeats (SSR) and cultivar grouping, establishing their genetic relationships with different traits. All genotypes were grouped into two major clusters with 66% similarity based on Jaccard's similarity index. As a result of the study, genotypes were identified that could be included as appropriate materials for developing a drought-tolerant breeding program. Genetic diversity is needed to grow new rice cultivars that combine drought tolerance with high grain yields, which is essential to maintaining food security [3].
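As an illustration of the similarity measure mentioned above, the following minimal sketch computes Jaccard's index for two genotypes scored as binary SSR band profiles. The profiles and the function name are hypothetical and are not taken from the cited study [3]; the index is the number of bands shared by both genotypes divided by the number of bands present in at least one of them.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard index for two binary marker profiles (1 = band present, 0 = absent)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of markers")
    # Bands present in both genotypes.
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    # Bands present in at least one genotype (joint absences are ignored).
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    return shared / either if either else 0.0


# Hypothetical band scores for two genotypes across six SSR alleles.
genotype_a = [1, 1, 0, 1, 0, 1]
genotype_b = [1, 0, 0, 1, 1, 1]

similarity = jaccard_similarity(genotype_a, genotype_b)
print(similarity)  # 3 shared bands / 5 bands present in either = 0.6
```

A matrix of such pairwise values is what a clustering step would typically take as input; the choice of clustering method is outside the scope of this sketch.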

Recent studies on the tolerance to biotic and abiotic stressors in rice hybrids with donor lines of the genes of interest showed the effectiveness of such hybrids. As a result of studies combining PCR-based molecular marker analysis with traditional breeding, early-maturing rice lines carrying genes for tolerance to salinity (SalTol) and flooding (Sub1A) were obtained, which are suitable for cultivation in southern Russia. The development of resistant rice varieties and their introduction into production will make it possible to avoid the epiphytotic development of disease, preserving the biological productivity of rice and resulting in environmentally friendly agricultural products [4].

The combining ability and genetic diversity of plants are important prerequisites for the development of outstanding hybrids that are tolerant to high plant density. A recent study was carried out to assess general combining ability (GCA) and specific combining ability (SCA), identify promising hybrids, estimate genetic diversity among the inbred lines, and correlate genetic distance (GD) to hybrid performance and SCA across different plant densities. As a result, no significant correlation was found between GD and either hybrid performance or SCA for grain yield and other traits, proving to be of no predictive value. Nevertheless, SCA could be used to predict hybrid performance across all plant densities. Overall, this study presents useful information regarding the inheritance of maize grain yield and other important traits under high plant density [5].
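To make the GCA and SCA terms concrete, here is a minimal sketch that assumes a simple factorial (line × tester) table of cross means; the design actually used in the cited study [5] is not described here, and the yield values are hypothetical. Under this assumption, the GCA of a parent is the deviation of its mean cross performance from the grand mean, and the SCA of a cross is what remains after the grand mean and both parental GCA effects are removed.

```python
def combining_ability(cross_means):
    """Estimate GCA and SCA effects from a line x tester table of cross means.

    cross_means[i][j] is the mean performance of the cross line i x tester j.
    Returns (grand_mean, gca_lines, gca_testers, sca).
    """
    n_lines = len(cross_means)
    n_testers = len(cross_means[0])
    grand_mean = sum(sum(row) for row in cross_means) / (n_lines * n_testers)

    # GCA: parental mean across all its crosses minus the grand mean.
    gca_lines = [sum(row) / n_testers - grand_mean for row in cross_means]
    gca_testers = [
        sum(cross_means[i][j] for i in range(n_lines)) / n_lines - grand_mean
        for j in range(n_testers)
    ]

    # SCA: deviation of each cross from its additive (GCA-based) expectation.
    sca = [
        [
            cross_means[i][j] - grand_mean - gca_lines[i] - gca_testers[j]
            for j in range(n_testers)
        ]
        for i in range(n_lines)
    ]
    return grand_mean, gca_lines, gca_testers, sca


# Hypothetical grain yields (t/ha) for 2 lines crossed with 2 testers.
yields = [
    [8.0, 9.0],
    [6.0, 9.0],
]
grand_mean, gca_lines, gca_testers, sca = combining_ability(yields)
print(grand_mean, gca_lines, gca_testers, sca)
```

In this toy table the first line has a positive GCA (its crosses average above the grand mean), while the SCA values capture the part of each cross that the two parental averages do not explain.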

In addition to studying the productivity of plants and genes associated with general adaptability, the genetic improvement of root systems is of interest as an efficient approach to improve the yield potential and nitrogen use efficiency (NUE) of crops. QMrl-7B is a major stable quantitative trait locus (QTL) controlling the maximum root length in wheat. Two types of near isogenic lines (A-NILs with superior and B-NILs with inferior alleles) were used to specify the effects of QMrl-7B on root, grain output, and nitrogen-related traits under both low-nitrogen (LN) and high-nitrogen (HN) environments. The QMrl-7B A-NILs manifested larger root systems compared to the B-NILs, which is favorable to N uptake and accumulation, and eventually enhanced grain production. This study provides valuable information for the genetic improvement of root traits and breeding of elite wheat varieties with high yield potential [6].

Traditional plant breeding approaches supplemented with SNP markers used for genome-wide association studies (GWAS) and genetic editing, as well as high-throughput chemotyping techniques, are exploited to speed up the breeding of desired genotypes. To enrich cereal grains with functional components, the new breeding programs need a source of genes in order to improve the contents of the beneficial components. The sources of these valuable genes are plant genetic resources deposited in genebanks, including landraces, rare crop species, and even wild relatives of cultivated plants. Correlations between the contents of certain bioactive compounds and the resistance to diseases or tolerance to certain abiotic stressors suggest that breeding programs aimed at increasing the levels of health-benefiting components in cereal grain might at the same time allow the development of cultivars adapted to unfavorable environmental conditions [7].

Using Nanopore long-read direct RNA sequencing, functionally important but previously unexplored RNA molecules have been identified in triticale, including long non-coding RNAs (lncRNAs), which are often associated with repeat-rich regions of genomes, and transposon-derived transcripts expressed during early stages of seed development. Detailed analysis of the protein-coding potential of the RTE-RNAs showed that 75% of them carry open reading frames (ORFs) for a diverse set of GAG proteins, the main components of virus-like particles of LTR retrotransposons. This demonstrated experimentally that certain RTE-RNAs originate from autonomous LTR retrotransposons, with ongoing transposition activity during early stages of triticale seed development. Overall, these results provide a framework for further exploration of the newly discovered lncRNAs and RTE-RNAs in functional and genome-wide association studies in triticale and wheat. The results also demonstrate that Nanopore direct RNA sequencing is an indispensable tool for the elucidation of lncRNA and retrotransposon transcripts [8].

Genomic selection (GS) shows great promise in terms of strongly increasing rates of genetic improvement in plant breeding programs. It allows for comparatively larger gains from selection by estimating all marker effects simultaneously, while the subsequent selection of genetically superior individuals is based on their genomic estimated breeding value (GEBV) instead of using a few significant markers, as is the case in classical marker-assisted selection (MAS). GS is ideal for complex traits with lower heritability and complex genetic architectures.
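The idea of estimating all marker effects simultaneously can be written compactly. The sketch below uses ridge regression as one common illustrative choice of model; it is an assumption made here for exposition and not a method attributed to any paper in this Special Issue. The symbols are likewise assumptions: $\mathbf{y}$ is the vector of phenotypes for $n$ individuals, $\mathbf{X}$ is the $n \times m$ column-centered marker matrix, $\boldsymbol{\beta}$ holds the $m$ marker effects, and $\lambda > 0$ is a shrinkage parameter.

$$
\mathbf{y} = \mu\mathbf{1} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{\top}\mathbf{X} + \lambda\mathbf{I}\right)^{-1}\mathbf{X}^{\top}\left(\mathbf{y} - \hat{\mu}\mathbf{1}\right),
\qquad
\mathrm{GEBV}_i = \sum_{j=1}^{m} x_{ij}\,\hat{\beta}_j
$$

Because every marker contributes to $\mathrm{GEBV}_i$, individuals are ranked on the sum of many small effects rather than on a handful of markers that passed a significance threshold, which is the contrast with MAS drawn above.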

Genomic selection (GS) can accelerate variety improvement when the training set (TS) size and its relationship with the breeding set (BS) are optimized for the prediction accuracies (PAs) of genomic prediction (GP) models. Sixteen GP algorithms were run on phenotypic best linear unbiased predictors (BLUPs) and best linear unbiased estimators (BLUEs) of resistance to both fall armyworm (FAW) and maize weevil (MW) in a tropical maize panel. Random-based training sets (RBTS) and pedigree-based training sets (PBTSs) were designed to study biotic resistance. For PBTS, the FAW resistance PAs were generally higher than those for RBTS, except for one dataset. GP models generally showed similar PAs across individual traits, whilst the TS designation was determinant, since a positive correlation between TS size and PAs was observed for RBTS, while for the PBTS, this correlation was negative. The resulting population could be of interest in future breeding activities targeted at improving insect resistance in maize and could be potentially useful for GS of complex traits with low to moderate heritability. This study has pioneered the use of GS for maize resistance to insect pests [9].
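Prediction accuracy is commonly reported as the Pearson correlation between predicted and observed values in a validation set. The sketch below assumes that definition, which is not spelled out in the text above, and uses hypothetical numbers rather than data from the cited study [9].

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson correlation between observed phenotypes and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    covariance = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return covariance / math.sqrt(var_obs * var_pred)


# Hypothetical validation set: observed resistance scores vs. model predictions.
observed_scores = [1.0, 2.0, 3.0, 4.0]
predicted_scores = [1.5, 1.5, 3.5, 3.5]

accuracy = pearson_correlation(observed_scores, predicted_scores)
print(round(accuracy, 3))
```

Comparing this value across training sets of different size or composition is one way the effect of training set design on accuracy could be examined.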

Advances in cereals breeding to develop new improved cultivars are some of the most important factors in agricultural production, playing an essential role in ensuring sustainable agriculture. Along with classical breeding goals, innovative, modern plant breeding methodologies are applied here to create new cultivars of crops for current and future agriculture applications. This endeavor includes the development of cultivars for stress cultivation conditions to achieve sustainable agricultural production, increased food quality, and increased security, and to supply raw materials for innovative industrial products and to meet the needs of mankind.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


### *Review* **Wheat, Barley, and Oat Breeding for Health Benefit Components in Grain**

**Igor G. Loskutov \* and Elena K. Khlestkina**

Federal Research Center the N.I. Vavilov All-Russian Institute of Plant Genetic Resources (VIR), St. Petersburg 190000, Russia; e.khlestkina@vir.nw.ru

**\*** Correspondence: i.loskutov@vir.nw.ru

**Abstract:** Cereal grains provide half of the calories consumed by humans. In addition, they contain important compounds beneficial for health. In recent years, a broad spectrum of new cereal grain-derived products for dietary purposes has emerged on the global food market. Special breeding programs aimed at cultivars utilizable for these new products have been launched for both the main sources of staple foods (such as rice, wheat, and maize) and other cereal crops (oat, barley, sorghum, millet, etc.). The breeding paradigm has been switched from traditional grain quality indicators (for example, high breadmaking quality and protein content for common wheat or content of protein, lysine, and starch for barley and oat) to more specialized ones (high content of bioactive compounds, vitamins, dietary fibers, and oils, etc.). To enrich cereal grain with functional components while growing plants, in contrast to the post-harvest improvement of staple foods with natural and synthetic additives, the new breeding programs need a source of genes for the improvement of the content of health benefit components in grain. The current review aims to consider current trends and achievements in wheat, barley, and oat breeding for health-benefiting components. The sources of these valuable genes are plant genetic resources deposited in genebanks: landraces, rare crop species, or even wild relatives of cultivated plants. Traditional plant breeding approaches supplemented with marker-assisted selection and genetic editing, as well as high-throughput chemotyping techniques, are exploited to speed up the breeding for the desired genotypes. Biochemical and genetic bases for the enrichment of the grain of modern cereal crop cultivars with micronutrients, oils, phenolics, and other compounds are discussed, and certain cases of contributions to special health-improving diets are summarized. Correlations between the content of certain bioactive compounds and the resistance to diseases or tolerance to certain abiotic stressors suggest that breeding programs aimed at raising the levels of health-benefiting components in cereal grain might at the same time match the task of developing cultivars adapted to unfavorable environmental conditions.

**Keywords:** barley; breeding; marker-assisted selection; genes; genetic resources; genome editing; health benefits; metabolomics; oat; QTL; wheat

#### **1. Introduction**

Cereal crops are the main food and feed sources worldwide, supplying more than half of the calories consumed by humans [1]. An overwhelming majority of plant breeders and geneticists work on no other crops but cereals. Breeding methods depend on the biological features of a crop and on the genetic research standards, traditions, economic objectives, and levels of agricultural technologies in the country where plant breeding is underway. The general breeding trend of the past decades, however, was finding solutions to the problem of higher yields in cereal crops; furthermore, special attention was paid in many countries to increasing plant resistance against diseases and various abiotic stressors. The concentration of all efforts on these two targets and none other resulted in a certain decline in the genetic diversity in those plant characters that are associated with the biochemical composition of cereal grain [2]. In the last few years, cereal crop breeding has seen a trend aimed at combining high biochemical and agronomic parameters in one cultivar [3–5]. In addition to protein, cereal grains are rich in other chemical compounds, such as fats with their good assimilability by the organism and a well-balanced composition of chemical constituents, including fatty acids [6–10], vitamins of the B, A, E, and F groups, organic compounds of iron, calcium, phosphorus, manganese, copper, molybdenum, and other trace elements [3], and diverse biologically active compounds: polysaccharides, phenolic compounds, carotenoids, tocopherols, avenanthramides, etc.

**Citation:** Loskutov, I.G.; Khlestkina, E.K. Wheat, Barley, and Oat Breeding for Health Benefit Components in Grain. *Plants* **2021**, *10*, 86. https://doi.org/10.3390/plants10010086

Received: 3 December 2020 Accepted: 30 December 2020 Published: 3 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In recent years, the world food market has seen the emergence of a wide range of new cereal crop products designed for dietetic purposes. Currently, available data confirm the importance of biochemical composition in cereal crop grains since it underpins their dietetic, prophylactic, and curative effect on the human organism [11]. Cereals are rich in protein, starch, oils, vitamins, micronutrients, and various antioxidants. The research that examines the potential of a number of cereal crops for prophylactic or medicinal uses has been expanding from year to year [12–16]. In addition to determining types of bioactivity for different grain components, an important challenge is to concentrate further efforts of researchers on disclosing the mechanisms of their effect [17].

It is recognized that breeding techniques can help to increase the percentage of individual constituents in the grain to a very high level. An important role in promoting this breeding trend is played by the achievements in the modern genetics of cereal crops and of traits associated with the quality and dietary value of their products. New breeding programs imply that the developed high-yielding cultivars will combine maximum contents of the abovementioned components and optimal correlations among them with other grain quality indicators and resistance to biotic stressors. Marker-assisted selection techniques are used more and more often to accelerate the development of cultivars enriched in useful grain components [4,18]. There are also examples of studies employing genetic editing technologies for these purposes [19–21]. The current review aims to consider current trends and achievements in wheat, barley, and oat breeding for health-benefiting components.

#### **2. Major Dietary Components in Grain and Breeding Programs for Health Benefit**

#### *2.1. Micronutrients*

The long-standing problem of micronutrient deficiencies in human diets is the most significant for public healthcare worldwide. It is especially true for cereal-based diets: They are poor in both the number of micronutrients and their bioavailability for the organism since breeding of these major food and feed crops primarily aims at developing higher-yielding varieties to meet global demand. Due to dilution effects, an increase in grain mass sometimes causes a reduction in micronutrient contents. In most countries, people eat meals produced from cereal crops with low micronutrient content; it is a serious global problem invoked by the uniformity of different diets and may lead to significant health deteriorations [22,23]. Iron-deficiency anemia is one of the most widespread health disorders provoked by the worldwide deficit in micronutrients [24], while zinc deficiency in food is faced on average by one-third of the world's population [25]. Increasing the content of these trace elements in wheat by breeding techniques is considered one of the ways to enhance the consumption of micronutrients with food [26].

It has been noticed that cereal crop cultivars can be enriched in the desired micronutrients through the application of agricultural practices or by plant breeding [22,27–30]. Such procedures, however, might lead to an increase in micronutrient content in leaves but not in grain [31]. Methods combining breeding and agrochemical approaches were proposed to solve this problem: they helped accumulate micronutrients in the edible parts of plants [27–29,32]. There are considerable variations in the concentration of micronutrients in seeds or kernels of most crops [3,32]. Genetic variability in the micronutrient content is often observed to be less expressed in fruit and more in leaves. Nevertheless, screening large collections of staple cereal crops reveals extensive diversity of micronutrient concentrations in their grains [26,32,33]. Increased content of most micronutrients was observed in local varieties and landraces of wheat and other cereals, compared with improved commercial cultivars [34].

The content of micronutrients in grain was analyzed in 65 commercial Russian cultivars of four major cereal crops: wheat, barley, rye, and oat. Statistically significant variations were found in the content of all studied trace elements (Fe, Zn, and Mn). The highest levels were registered for barley and oat cultivars. Among barley genotypes, the content of Fe, Zn, and Mn varied with a 3- to 5.5-fold difference between the extremes (Table 1). Oat cultivars manifested a 7-fold difference between the extremes in the Zn content and nearly 3-fold in Mn [3].

**Table 1.** Average values and ranges for the content of micronutrients (Fe, Mn, Zn) in caryopses of cereal crops [3].


A detailed study of a set of commercial oat cultivars of different geographical origin in the context of their micronutrient content and biochemical parameters showed that genotypic differences in the Fe and Zn levels in grain were small (1.9–2.7 times), but in Mn, they were relatively high (10.5 times). A 1.8-fold difference was observed between the lowest (10.9%) and the highest (19.3%) protein content levels in oat grain [3]. A wide range of variation in oil content (2.7–8.1%) was found in all studied oat accessions. The amounts of protein, oil, oleic acid, and Zn in grain demonstrated statistically significant positive correlations among themselves [3]. The identified oat cultivars with high nutritive value will be included in breeding programs and used directly in high-quality food production.
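The fold differences quoted above are simply the ratio of the highest to the lowest value in a range. The short check below reproduces the 1.8-fold figure for protein from the stated extremes and applies the same ratio to the oil content range; the oil ratio is derived here for illustration and is not a figure reported in the source.

```python
def fold_difference(lowest, highest):
    """Ratio of the highest to the lowest value in a range."""
    return highest / lowest


# Extremes stated in the text (percent of grain).
protein_fold = fold_difference(10.9, 19.3)
oil_fold = fold_difference(2.7, 8.1)

print(round(protein_fold, 1))  # 1.8, matching the figure in the text
print(round(oil_fold, 1))      # 3.0, derived here from the stated oil range
```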

Molecular-genetic research on 335 spring barley accessions was conducted for more effective utilization of the micronutrient diversity in cereal crop breeding. A genome-wide association study (GWAS) was employed for mapping quantitative trait loci (QTL) linked to the content of macro- and micronutrients in grain (Fe, Zn, Ba, Ca, Cu, K, Mg, Mn, Na, P, S, Si, and Sr). The analyses of the tested populations helped to identify specific QTL for each of the studied indicators and map them on chromosomes. The QTL identified are valuable for the future development of barley cultivars with increased content of nutrients, especially Zn and Fe [35].

#### *2.2. β-glucans*

A physiologically important dietary component in the grain is (1,3;1,4)-β-D-glucan, or the non-starchy water-soluble polysaccharide β-glucan. This component is reported to be typical of some species of the Poaceae family: its content varies within 3–11% in barley, 1–2% in rye, and <1% in wheat, while in other cereals, it is present only in trace amounts [36]. At the same time, the content and composition of dietary fibers in various cereal crop species are genetically determined. This means, in the opinion of many scientists, that it is possible to produce new lines of such crops with different correlations between the levels of β-glucan polysaccharides and arabinoxylans that would be optimal for various uses [37–39]. The study of β-glucan content in oat and barley cultivars is associated with their uses for dietetic and medical purposes [37,38].

β-Glucans are not evenly distributed within the grain: most are found in the endosperm cell walls, aleurone, and subaleurone layers, and their content varies from 1.8 to 7% [40,41]. The concentration of β-glucans in oat grain and their degree of polymerization depend not only on the cultivar but also on the conditions of cultivation, grain processing, and post-harvest storage [42].

The presence in the grain of a higher amount of β-glucans, which are soluble dietary fiber (soluble non-starch polysaccharides), determines the viscosity of oat and barley broths. These broths have a beneficial effect on important functions of the human gastrointestinal tract, so they are widely used in the food industry for dietetic and curative purposes [36,43]. Among the numerous products of barley and oat biosynthesis, probably the most valuable for the human organism are soluble cellulose fibers, above all β-glucans (as well as arabinoxylan, xyloglucan, and some other secondary cellulose components), as they can reduce the level of cholesterol in the blood and noticeably mitigate the risk of cardiovascular diseases [38,44,45]. Abundant evidence of the beneficial role played by β-glucans prompted the U.S. Food and Drug Administration (FDA) to state officially that soluble dietary fibers from whole oat grain used to produce flakes, bran, or flour help to reduce the risk of cardiovascular diseases [46]. Insoluble fractions of dietary fiber are partly cellulose, xylose, and arabinose [39]. Insoluble dietary fiber has general gastrointestinal effects and, in most cases, has an impact on weight loss. There is convincing evidence that β-glucans contained in oat grain are partially responsible for decreasing the levels of glucose in human blood and of cholesterol in serum [12]; this effect is associated with their physicochemical and rheological characteristics, such as molecular weight, solubility in water, and viscosity [42,47].

Genetic diversity of barley and oats in the content of β-glucans in their grain was evaluated in the framework of two European Union (EU) programs. The HEALTHGRAIN Diversity Screen project resulted in finding significant differences in the content of β-glucans and antioxidants in the grain of five tested oat cultivars [48]. The AVEQ project (*Avena* genetic resources for quality in human consumption) analyzed 658 oat cultivars and confirmed the contribution of both genetic and environmental aspects to the formation of the tested character [49]. It is worth mentioning that, compared with cultivated and other wild di- and tetraploid oat species, higher contents of β-glucans and other antioxidants were found in the hexaploid (wild) *A. fatua, A. occidentalis,* and (cultivated) *A. byzantina*, and diploid (wild) *A. atlantica* [38,39,49–51].

Measuring the content of β-glucans in oat grain in large and diverse sets of cultivars and species showed that its values were significantly dispersed [37,38,49]. Naked oat forms demonstrated a higher total content of the analyzed polysaccharide than hulled ones, but the latter contained more insoluble β-glucans in their grain [52–54]. Computer modeling helped to rank the factors affecting the β-glucan content in hulled and naked oat cultivars during their cultivation. The analysis showed that, among the factors considered, the choice of cultivar was the most important model parameter determining the final β-glucan accumulation in grain [55]. Comparative studies of naked and hulled barley have also yielded contradictory data. Some authors found no significant differences between these two forms of the crop [56,57], while others found that naked barleys contained more β-glucans than hulled ones [43,58]. Meanwhile, the group of Tibetan naked barleys was reported to have the highest content of β-glucans in their grain [59].

The amount of β-glucans in oat grain is also associated with protein and fat accumulation, grain volume weight, and grain productivity [60,61]. The content of these polysaccharides depended on meteorological conditions and agricultural practices used in oat cultivation [61]. The content of β-glucans in barley grain is determined by both the genotype and the growing conditions [43,59,62]; some authors insist that it is the genotype that plays a decisive role [63,64], while others give preference to the environmental conditions [65,66]. When 33 barley cultivars and lines were tested in two arid areas in the United States, the genotype was shown to account for 51% [64] to 66% [67] of the variability in grain β-glucan content. At the same time, environmental conditions accounted for 69% of the variation in grain protein content and for 83 and 70% of that in yield and grain volume weight, respectively [64]. The study of 9 barley cultivars and 10 oat ones showed that cultivar-specific differences in the β-glucan content persisted across the years [63]. The content of β-glucans in the grain is also influenced by plant development phases. It was reported that the content of β-glucans gradually increased in the process of grain formation, and in the maturation phase, it either reached a plateau or decreased [57]. At present, there are contradictory data concerning the linkage of β-glucan accumulation in barley grain with 1000 grain weight, protein content, or starch content [56,62]. Some authors did not find any interplay between these characters, while others reported a positive correlation. When the content of β-glucans was measured in the grain of six-row and two-row barley cultivars, no differences between these two cultivar groups were reported [43].
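The genotype and environment percentages quoted above can be read as shares of the total variation observed in a genotype-by-environment trial. The following minimal sketch, assuming one observation per genotype in each environment, splits the total sum of squares into genotype, environment, and residual parts. The function name and the toy β-glucan values are hypothetical, and the cited studies may well have used different estimators (for example, variance components from replicated trials).

```python
from statistics import mean


def variation_shares(table):
    """Split total variation into genotype, environment, and residual shares.

    `table` maps each genotype to a list of trait values, one per environment,
    with environments listed in the same order for every genotype.
    """
    rows = list(table.values())
    n_env = len(rows[0])
    if any(len(row) != n_env for row in rows):
        raise ValueError("every genotype needs one value per environment")

    grand = mean(value for row in rows for value in row)
    ss_total = sum((value - grand) ** 2 for row in rows for value in row)
    if ss_total == 0:
        raise ValueError("trait does not vary; shares are undefined")

    # Sum of squares explained by genotype means and by environment means.
    ss_genotype = n_env * sum((mean(row) - grand) ** 2 for row in rows)
    env_means = [mean(row[j] for row in rows) for j in range(n_env)]
    ss_environment = len(rows) * sum((m - grand) ** 2 for m in env_means)
    ss_residual = ss_total - ss_genotype - ss_environment

    return {
        "genotype": ss_genotype / ss_total,
        "environment": ss_environment / ss_total,
        "residual": ss_residual / ss_total,
    }


# Hypothetical β-glucan content (%) of three cultivars in two environments.
trial = {"cv_A": [4.0, 5.0], "cv_B": [5.0, 6.0], "cv_C": [6.0, 7.0]}
shares = variation_shares(trial)
```

With these invented values, the cultivars differ more from one another than the two environments do, so the genotype share is the larger one; with real trial data, the balance depends on the crop, the trait, and the range of environments sampled.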

A population of 1700 oat lines carrying mutations induced by high-frequency mutagenesis was produced from cv. 'Belinda' (Sweden) as a TILLING resource for breeding with molecular-based, high-precision selection methods and was used to evaluate the variability of β-glucan content in this crop [68]. Their assessment resulted in identifying 10 lines with grain β-glucan concentrations higher than 6.7% and 10 lines with β-glucan content below 3.6% (the β-glucan concentration in cv. 'Belinda' was 4.9%). The maximum range of variation in the content of these polysaccharides was from 1.8 to 7.5% [69]. The comparatively recent identification of genes participating in the biosynthesis of β-glucans in cereals [70] and their first genetic map open new opportunities for the genetic improvement of grain quality indicators and of the resulting food products, which is very important for human health [71].

Three markers (Adh8, ABG019, and Bmy2) significantly linked to the regulation of β-glucan content in barley grain were identified, and a group of *HvCslF* genes was mapped: at least two of them lay in the region of barley chromosome 2H covered by the QTL for (1,3;1,4)-β-glucan near the Bmy2 marker [72]. A genome-wide association study (GWAS) employing oat germplasm of worldwide origin from the American Gene Bank was aimed at identifying QTL linked to β-glucan content in grain and found three independent markers closely associated with the target character. A comparison of these results with the data obtained for rice showed that one of the described markers, localized on rice chromosome 7, was adjacent to the *CslF* gene family responsible for β-glucan synthesis in grain. Thus, GWAS in oat can be a successful QTL detection technique, given the future development of higher-density markers [73].

By now, the GWAS approach has already started to be used to analyze the association between the genotype and the content of β-glucans and fatty acids in oats. Researchers have identified four loci contributing to changes in the fatty acid composition and content in oat grain. However, genome regions conducive to changing the content of proteins, oils, saccharic and uronic acids, which, in their turn, produce a direct effect on grain quality, remain unexplored [74]. Furthermore, positive correlations were demonstrated in barley between 1000 grain weight and tocol concentration, between dietary fiber content and phenolic compounds, and between husk weight and total antioxidants in hulled barley [38,50].

#### *2.3. Antioxidants*

Cereal crop grains are known to have high nutritive value and contain diverse chemical compounds with antioxidant properties. Research efforts have been undertaken in recent years to study the content of antioxidants in the grain of various cultivated cereals [50,75–79].

Since the mid-1930s, oat flour has been used as a natural antioxidant. Later, more in-depth research was done to assess the antioxidative properties of oat flour versus those of chemical antioxidants. It was ascertained that adding sterols extracted from oat to heated soybean oil significantly slowed its oxidation compared with the control. At present, alongside the extensive utilization of synthetic antioxidants, oat flour has found a stable niche as a natural ingredient in eco-friendly food products [7].

A comparison of bakery products made from wheat that synthesized such antioxidant compounds as anthocyanins with those from an anthocyanin-free wheat line demonstrated that the presence of anthocyanins increased the shelf life of bakery products and their resistance to molding under provocative conditions [80]. Cereal crops contain secondary metabolites with antioxidant activity belonging to three groups: phenolic compounds, carotenoids, and tocopherols [81].

#### *2.4. Phenolic Compounds and Avenanthramides*

Oat and barley grains contain a considerable amount of various phenolic compounds exhibiting biological activity, including antioxidative, anti-inflammatory, and antiproliferative (preventive activity against cancerous and cardiac diseases) effects [50]. One of the most abundant and powerful antioxidants found in nature, the flavonoid quercetin, has been found in wheat. It is characterized by numerous biological effects, including antithrombotic activity [82].

Many published studies testify that a major part of the phenolic compounds in grain occurs in a bound form: their content in oat and wheat grains reaches 75% [83,84]. Phenolic acids, like most flavonoids in cereal crops, are concentrated in structures bound to the cell wall: 93% of the total flavonoid content in wheat and 61% in oats [83]. The highest level of total flavonoids is characteristic of maize grain, followed by wheat, oats, and rice [83]. Phenolic acids are the most widespread phenolic compounds in oats, especially ferulic acid (250 mg/kg), which is present mainly in bound forms linked through ester or ether bonds to cell wall components but also exists in the free form [85].

Bioactive chemical compounds are unevenly distributed within the grain. Grains of four naked barley cultivars were divided into five layers to measure the total phenolic content and total antioxidant activity. The total content of soluble phenolic compounds was observed to decrease from the outer layer (2.8–7.7 μg/g) towards the inner endosperm structures (0.87–1.35 μg/g) [78,86]. It has been proven that most antioxidants contained in whole grain are located in the bran and germ fractions of the grain. For example, wholegrain wheat flour was found to contain in its bran/germ fraction 83% of the total phenolic content in grain and 79% of total flavonoids [87].

In the study of molecular mechanisms of 'melanin-like' black seed pigments known to be strong antioxidants, comparative transcriptome analysis of two near-isogenic lines differing by the allelic state of the *Blp* (black lemma and pericarp) locus revealed that black seed color is related to the increased level of ferulic acid and other phenolic compounds [88]. The melanic nature of the purified black pigments was confirmed by a series of solubility tests and Fourier transform infrared spectroscopy, while intracellular pigmented structures were described to appear in chloroplast-derived plastids designated "melanoplasts" [89]. The most frequently mentioned flavonoids of cereal crops are the flavonols kaempferol and quercetin, the flavanone naringenin and its glycosylated forms, catechin, and epicatechin in barley [90–93].

Pigmentation of the grain's outer coating can be analyzed as an important indicator of antioxidant activity. A barley cultivar with purple grain contained 11 anthocyanins, while only one anthocyanin was observed in black and yellow barley grains. The purple barley bran extract had the highest total antioxidant activity [94]. Another study of naked barley demonstrated the presence of higher antioxidant activity in pigmented grains compared with non-pigmented ones [78]. A study of naked and hulled oats showed that naked oat cultivars had significantly higher values of total antioxidant activity. Among hulled oat cultivars, these values were higher in dark-hulled forms compared with white-hulled oats [50].

Differences between naked and hulled oats and barleys prompted the development of a model well suited to comparative analyses: a barley line mutant for the *Nud* gene (nakedness), derived by gene editing from cv. 'Golden Promise' [21]. Using this model will help to distinguish the pleiotropic effects of the *Nud* gene on the grain's biochemical composition from the influence of closely linked genes.

Analyzing grain extracts of wheat lines with different combinations of the *Ba* (*Blue aleurone*) and *Pp* (*Purple pericarp*) genes on the genetic background of elite cultivars demonstrated a higher diversity of flavonoid compounds in the carriers of dominant alleles of the *Ba* and *Pp* genes. Comparing the products made from the grain of a purple-grained line with those from an anthocyanin-free isogenic line revealed significant differences, which was also true for the samples that had passed a full processing cycle, including baking at elevated temperatures [80,95]. The analysis of anthocyanin extracts obtained under conditions simulating those of food digestion by a human organism showed that ingesting 100 g of bread crisps or biscuits made from flour with added purple wheat grain bran raised the assimilation of anthocyanins to 1.03 and 0.83 mg, respectively, i.e., 100 g of bran would supply the organism with up to 3.32 g of anthocyanins. Besides, purple-grained wheat matched or even exceeded the reference line in the quality and taste of its products [95].

Recently, new high-yielding wheat cultivars that are resistant to fungal diseases and have high anthocyanin content in grain have been developed [4]. The efficiency of a breeding strategy lasting only three years from the first cross until the state cultivar competitive testing has been demonstrated. The strategy is based on marker-assisted selection (MAS) [4]. MAS has also demonstrated its efficacy in creating barley with certain alleles of anthocyanin regulatory genes [18]. For breeding blue-grained wheat, FISH or C-banding is needed in addition to molecular markers, since the *Ba* gene is alien to wheat and can be inherited from wheat lines with either chromosome 4B or 4D substituted by the *Thinopyrum ponticum* chromosome 4 [96,97]. Unlike bread wheat, barley has its own *Ba* gene. Recent findings on the regulatory features of anthocyanin biosynthesis in barley [98] are useful for both MAS-based and genetic editing-based breeding strategies.

Interestingly, 30 years ago, the purple- and blue-grain characters were regarded as having "a limited practical use from a scientific point" [99]. Since that time, studies demonstrating the health benefits of plant anthocyanins, including those from wheat grain [16], have refuted the old point of view and shown these traits to be economically important. Commercial cultivars of wheat with increased anthocyanin content have been released in Canada, China, Japan, and several European countries [100,101].

The class of phenolics with antioxidative effect and bioactivity includes avenanthramides (AVA), a class of hydroxycinnamoyl anthranilate alkaloids contained only in oats. Twenty-five components of these compounds were detected in kernels, and twenty in hulls [102]. The most widespread in oats are AVA-A (2p), AVA-B (2f), and AVA-C (2c) [9,103,104]. There is documented evidence that avenanthramides demonstrate antioxidant, anti-inflammatory, antiatherogenic, and antiproliferative activity [105–107].

It has been shown that oat cultivars differ in the AVA content of their grain. The cultivated diploid species *A. strigosa* had a very high AVA content reaching 4.1 g/kg, and the hexaploid *A. byzantina* contained 3.0 g/kg. By contrast, wild oat species with different ploidy levels were characterized by relatively low AVA content values (240–1585 mg/kg) [108]. Analyzing a representative set of cultivated and wild oat species revealed an even wider diversity of the AVA content in grain [109]. It has been concluded that wild oat species are an important source of diversity for breeding programs, which calls for further studies into the pattern of AVA content and composition variability across the genus *Avena* L. Wild oat species might possess a unique AVA composition, promising for crosses with cultivated oats.

#### *2.5. Tocols*

The health benefits of oats are also associated with the presence of several antioxidant compounds known as tocols, specifically tocopherols and tocotrienols. The fat-soluble vitamin E contains tocopherols and tocotrienols [110], which make the oil more resistant to oxidation. Both tocopherols and tocotrienols have several isomeric forms designated as α, β, γ, and δ [111]. All in all, vitamin E can comprise eight isomers, with prevailing α-isomers (70–85%) and δ-isomers not exceeding 1%. The total tocopherol content in oat cultivars can reach 2.6–3.2 mg/100 g, which is many times lower than in barley [101]. Tocopherols are mainly present in the germ fraction of grain, while tocotrienols are found in the pericarp and endosperm. Tocotrienols prevail in oats, barley, and wheat; their concentrations vary from 40 to 60 μg/g depending on the crop [112].

Eight isomers of tocols have been found in barley grain oil (four tocopherols and four tocotrienols). They play an exceptionally important role, regulating cholesterol in human blood. Tocols also demonstrate very high activity as antioxidants, blocking harmful peroxidation of lipids in cell membranes [101]. Tocols (16–94 mg/kg) consist of a polar chromanol ring linked to an isoprenoid-derived hydrocarbon chain. They are powerful scavengers of free radicals, also demonstrating an ability to inhibit the proliferation of some cancer cells [108].

Furthermore, positive correlations were demonstrated in barley between 1000 grain weight and tocol concentration, between dietary fiber content and phenolic compounds, and between husk weight and total antioxidants in hulled barley [38,50]. Presently, molecular-genetic studies of this type of antioxidant are based on simple sequence repeat (SSR) markers. It is worth mentioning that naked barley with the *Waxy* gene and zero amylose content in starch has higher contents of both β-glucans and tocols [113].

#### *2.6. Sterols*

Sterols are important components of vegetable oils. Their content in oat grain varies, according to different sources, from 0.1% to 9.3% of the total fatty acid content. This indicator often depends not only on the oat genotype but also on the extraction technique. When cultivars of rye, wheat, barley, and oats grown in the same year and at the same location were compared, the highest plant sterol content was observed in rye (mean content 95.5 mg/100 g, wb), whereas the total sterol contents (mg/100 g, wb) of wheat, barley, and oats were 69.0, 76.1, and 44.7, respectively [114]. Among the six sterol components, the main one is sitosterol, which accounts for up to 70% of the total sterol content; campesterol and stigmasterol account for about a further 20% [7,101]. The content of sterols in oats can reach 447 mg/kg and includes, in addition to the abovementioned, D-5 and D-7 avenasterols [114]. Oat grain also contains phytic acid (5.6–8.7 mg/g), which manifests antioxidant activity due to its ability to chelate metal ions, thus making them catalytically inactive and inhibiting the metal-mediated formation of free radicals. However, this chelating activity reduces the bioavailability of major minerals [110].

#### *2.7. Carotenoids*

Carotenoids (yellow, orange, and red pigments), which belong to the isoprenoids, are among the most widespread plant antioxidants. Carotenoid content in oat grain can reach 1.8 μg/g [86]; lutein is considered the main xanthophyll in wheat, barley, and oat grains, and zeaxanthin the secondary one [115].

A comparative investigation of four groups of wheat genotypes (spelt wheat, landraces, old cultivars, and primitive wheat) for carotenoid content and composition in grain revealed a high level of variation in carotenoid content among the genotypes and the groups. Lutein contributed 70–90% of the carotenoids in the grain [116]. In durum wheat, which is used for the production of pasta, carotenoid content is also an important technological and market indicator. In semolina and pasta, a yellow color is desirable, and it depends on the carotenoid accumulation in kernels. Genetic dissection of the carotenoid content character showed quantitative trait loci (QTL) on all wheat chromosomes [117]. The major QTL, responsible for 60% of heritability, are located on the long arms of chromosomes 7A and 7B. Variability in these QTL is explained by allelic variations of the phytoene synthase (PSY) genes. Molecular markers are available for MAS-based breeding programs aimed at enriching durum wheat grain with carotenoids [117].

#### *2.8. Other Antioxidant Compounds*

Oat is the only cereal grass that contains saponins. Its steroidal glycosides, known as avenacosides A and B (65.5 and 377.5 mg/kg, respectively), exhibit anticancer activity through diverse, complex mechanisms, including inhibition of neoplasm cell growth through cell cycle arrest and stimulation of cancer cell apoptosis [13]. Oat accommodates two classes of saponins: avenacosides (steroid-linked saccharides) and avenacins (triterpenoid-linked saccharides), which were shown to lower the cholesterol level, stimulate the immune system, and demonstrate anticancer properties [14]. Targeted breeding for increased content of these compounds in oat lines has not yet been attempted, but interline and interspecies differences in this indicator have already been identified [118]. Grains of five Finnish barley cultivars grown in 2006–2008 were analyzed for their total content of folic acid. It was noted that the external and germ-containing grain layers had the highest levels of this compound (up to 1710 ng/g) [77,79].

#### **3. Assessment of Cereal Crop Genetic Resources According to the Diversity and Concentration of Health-Friendly Dietary Grain Components**

Secondary metabolites associated with quality traits in released and processed products are presently identified using metabolomic profiling, or chemotyping. Such an approach enables researchers to evaluate plant genetic resources according to these traits, including varieties of cultivated species and populations of wild ones. Chemotyping the grain of cultivated and wild *Avena* L. spp. showed that the range of variability in the metabolomic profile of improved cultivars was significantly narrower than that of wild species. Metabolites whose content may have been reduced in the process of domestication and breeding in comparison with wild oats were identified [2]. Presumably, this is connected with selection during oat domestication and a decline of metabolome diversity while "domestication syndrome" traits were shaped [119]. The diversity of metabolomic profiles may be lost in the process of selection when highly specialized single-line intensive-type cultivars are developed, because this process is always accompanied by a decrease in genetic polymorphism in a breeding object compared with the metagenome of numerous ecotypes, local varieties, and natural races of dozens of wild species [2,119]. A study of naked and hulled oat forms disclosed differences in their metabolites, which serves as an additional justification for the differentiation between these subspecies of common oat [2]. Landraces, which are plant varieties selected and grown regionally but not officially tested and released as registered varieties, are a source of special genetic characteristics derived from many years of adaptation to the respective territory. Such local varieties are often more resistant to the biotic and abiotic stresses typical of their environment. In addition, such varieties may be a source of special phytochemicals (also known as bioactives) considered health-beneficial, while the content of these compounds may be lower in commercial cultivars [2,120].

The bands of secondary metabolites in oat accessions exposed to *Fusarium* infection were analyzed, and correlations between metabolites and resistance were disclosed. High-protein oat forms with increased content of certain secondary metabolites demonstrated less damage from *Fusarium*, accumulated fewer toxins, and were more adaptable to the biotic stress [121].

Matthews et al. [122] used metabolite profiling to compare 45 lines of tetraploid and hexaploid wheat. The extracts were analyzed by ultra-performance liquid chromatography coupled with time-of-flight mass spectrometry (UPLC-TOF-MS). The two species, bread and durum wheat, formed two distinct groups differing in sterols, fatty acids, and phospholipids, while *T. aestivum* L. split into two groups (corresponding to hard and soft bread wheat) according to differences in heterocyclic amines and polyketides. This and similar studies underpin the use of chemotyping in breeding both for desired agronomic traits and for higher contents of health-benefiting compounds in cereal grain.

Information obtained with the molecular metabolomic approach on mQTL (metabolite quantitative trait loci) and mGWAS (metabolome-based GWAS) ensures a new level for qualitative and quantitative characterization of secondary metabolites interesting for breeding. Such analyses can provide knowledge about the interactions among metabolites themselves and between them and important breeding indicators. It may lead to the development of more rational models linking a certain metabolite with such characters as plant productivity or end-product quality. Even more promising is the possibility to examine the interplay between quantitative variation in metabolites and changes in the plant phenotype [123].

Given the genetic potential of grain crops, the properties and structure of the kernel can be shaped in a directed way during ontogenesis, so the target component composition of the final product can be addressed when developing new cultivars. Wider application of chemotyping, chemical research methods, metabolomic analysis of grain quality, and the search for high content of rare beneficial (dietary or curative) components will result in the release of new crop cultivars, thus promoting next-generation breeding trends and technologies [50].

#### **4. The Effect of Dietary Components in Grain on Life Functions of Plants Themselves**

The content of all biochemical components in the grain of cereal crops varies, and so does grain composition. These variations arise from differences between environments, variation in the genotype of the crop, and interactions of biotic and abiotic factors with the genotype. Biotic and abiotic factors depend on climate change, soil, and the various stressors affecting plants. The genotypic variation includes the differences between individual genotypes.

#### *4.1. Biotic Stress Resistance*

The presence of antimicrobial flavonoid compounds in extracts from barley and wheat grains soaked in water has been offered as a general explanation of why grain in the soil is not affected by microorganisms despite environmental conditions favorable for infection [124]. Higher disease resistance of plants with enhanced flavonoid biosynthesis has been described in rye, barley, and wheat [125]. In vitro infection of developing barley caryopses of wild type and proanthocyanidin-free mutants with the fungal pathogens *Fusarium poae*, *F. culmorum,* and *F. graminearum* revealed the mutants to be more sensitive to *Fusarium* attack than wild-type plants [126].

Considering the available data on interactions between compounds with antioxidant properties in cereal crop kernels and *Fusarium* spp., it seems appropriate to suppose that some of the former could significantly contribute to the grain's protection mechanism against toxicogenic fungi and mycotoxin accumulation. It has been proven that the crucial role in Fusarium Head Blight (FHB) resistance is played by five main classes of antioxidant metabolites: phenolic acids, flavonoids, carotenoids, tocopherols, and benzoxazinoids [127].

Cereal crop diseases caused by pathogenic and toxicogenic species of the *Fusarium* genus (FHB) inflict serious economic losses worldwide. Therefore, the development of sustainable strategies to prevent FHB contamination and mycotoxin accumulation has become a target of intensive research in recent years, and the use of FHB-resistant genotypes has been chosen as one of the prioritized trends in breeding practice [121,128,129]. Even now, however, the knowledge of complex mechanisms regulating resistance in cereal crops is still insufficient, and selecting resistant genotypes remains a difficult task for breeders. It has been established that, in addition to their fungicidal properties, a number of antioxidant secondary metabolites in cereals can regulate mycotoxin production by various pathogenic fungi [127].

The first weighty general argument in favor of phenolic compounds, carotenoids, and tocopherols is their ability to suppress reactive oxygen species (ROS), thus protecting biological cells. Besides, tocopherols and carotenoids can entrap free radicals of lipid peroxides and, therefore, arrest lipid peroxidation chain development [130]. Cinnamic acid derivatives, such as sinapic, caffeic, *p*-coumaric, chlorogenic, and ferulic acids, are effective inhibitors of *F. graminearum* and *F. culmorum* development, while benzoic acid derivatives, except syringic acid, produce an antiactivating effect [131,132]. There is an opinion that cereal crop metabolites with antioxidant activity suppress the toxigenic action of a fungal infection. Numerous research works have demonstrated the efficiency of phenolic compounds [133,134], carotenoids [135], tocopherols, and even benzoxazinoids [136] in restraining the growth and mycotoxin production of toxigenic *Fusarium* fungi. Finally, phenolic compounds that take part in reinforcing plant structure are known to contribute to building a physical barrier against pathogenic infection. There is a positive interrelation between the content of phenolic acids, both free and bound to the cell wall, and FHB resistance in wheat [137]. A high level of FHB resistance in barley with black-pigmented grain is supposedly associated with increased content of phenolic compounds [133].

High-protein oat forms were observed to be less affected by *Fusarium* head blight and accumulate fewer toxins; they are more adaptable to biotic stress. A relationship was identified between FHB resistance and accumulation of pipecolic acid, monoacylglycerols, tyrosine, galactinol, certain phytosterols, saccharides, and adenosine [121].

There are, however, still many unproven assumptions about the participation of metabolites in the FHB resistance mechanism in cereals. Although the genetic architecture that supports secondary metabolite synthesis and regulation in cereal crops is exceptionally intricate, such proof may be obtained through comprehensive genetic and functional genomic studies [127].

Accumulation of avenanthramides in oats is also associated with the penetration of a fungal infection. Avenanthramides are mostly contained in oat grain, but under an attack by crown rust or leaf blotch, they start to synthesize in leaves as a means of protection against disease agents [110]. The fact that the amount of avenanthramides in grain significantly increases during imbibition [138], plant development [139], steeping [140], and storage [141] is also related to plant protection against potential susceptibility to pathogenic flora.

#### *4.2. Abiotic Stress Resistance*

Polyphenolic compounds in grain may protect seeds from unfavorable abiotic environmental conditions. Some of these compounds may act as sunscreens against potentially damaging UV-B radiation [142]. This may explain the purple coloration of the grain and other plant parts in tetraploid wheat *T. aethiopicum* Jakubz. [143], which is adapted to intensive solar UV-B radiation in the highland areas of Ethiopia. Studies of near-isogenic wheat lines differing in the anthocyanin content of the pericarp and coleoptile under various stress conditions showed that both pericarp and coleoptile anthocyanins protected seedlings from osmotic stress [144], while protection of seedlings under a moderate irradiation dose (pretreatment of dry seeds with 50 Gy before sowing) or moderate Cd toxicity (25 μM CdCl<sub>2</sub>) was due to the coleoptile anthocyanins only [145,146]. Flavonoid substances can prevent negative effects of excessive moisture, such as pre-harvest sprouting, by reducing the permeability of the seed coat to water [147], inhibiting α-amylase (an enzyme whose activity is directly related to grain germination) [148], or inactivating the dehydrogenase required for the initial phase of respiration in ripening grain and young shoots [149].

Avenanthramide accumulation in oat grain is affected by the weather and geographic conditions under which the studied material is cultivated [109,150–153]. Changes in the concentration of avenanthramides in response to salinity stress in CBF3 transgenic oat demonstrated that these compounds might have a potential role in enhancing abiotic stress tolerance in oats [154]. Havrlentova et al. [155] suggested that oats with higher β-D-glucan content may have thicker and, therefore, more insulating cell walls, better adapted to heat stress conditions. The same relationship between higher β-D-glucan content and greater cell wall thickness has been reported in barley [156]. Sterols might be important for cold acclimation of wheat [157,158] and oat [159]. Thus, breeding programs aimed at increasing the content of health-benefit components in cereal grain may at the same time improve cultivar adaptability to unfavorable environmental conditions.

#### **5. Conclusions**

Each of the abovementioned natural components (dietary or curative) is promising for use as a food additive or an ingredient of pharmaceutical and cosmetic products. They are expected to play an ever-growing role in the food industry, expanding the assortment of healthy food for the population. The demand for such products has already prompted plant breeders to launch new breeding programs aimed at the development of cereal crop cultivars with higher contents of bioactive components in grain. Such programs have often been based on molecular breeding techniques from the very beginning. Screening promising cultivars and hybrids for the content of antioxidants and other bioactive compounds in the grain is required to expand and promote this breeding trend. To increase the performance and efficiency of such screening, it also seems expedient to apply simple, non-destructive and, as a rule, indirect techniques for assessing antioxidant levels in the grain of plant genotypes, and to employ the entire genetic diversity of cereal crops to identify contrasting initial sources for breeding food and feed cultivars. The results obtained from studying existing cereal cultivars, together with the achievements of plant breeding in releasing new high-yielding and high-quality cultivars, enable producers to develop a wide assortment of health-friendly dietary products that contribute to human physical fitness.

**Author Contributions:** Conceptualization, I.G.L., E.K.K.; writing, I.G.L., E.K.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** This article was prepared with the support of the Ministry of Science and Higher Education of the Russian Federation under agreement № 075-15-2020-911 dated 16.11.2020 on providing a grant in the form of subsidies from the Federal budget of the Russian Federation. The grant was provided for state support for the creation and development of a World-class Scientific Center "Agrotechnologies for the Future".

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Genetic Diversity and Combining Ability of White Maize Inbred Lines under Different Plant Densities**

#### **Mohamed M. Kamara <sup>1</sup>, Medhat Rehan <sup>2,3</sup>, Khaled M. Ibrahim <sup>4</sup>, Abdullah S. Alsohim <sup>3</sup>, Mohsen M. Elsharkawy <sup>5</sup>, Ahmed M. S. Kheir <sup>6</sup>, Emad M. Hafez <sup>1</sup> and Mohamed A. El-Esawi <sup>7,</sup>\***


Received: 7 August 2020; Accepted: 31 August 2020; Published: 3 September 2020

**Abstract:** Knowledge of combining ability and genetic diversity is an important prerequisite for the development of outstanding hybrids that are tolerant to high plant density. This work was carried out to assess general combining ability (GCA) and specific combining ability (SCA), identify promising hybrids, estimate genetic diversity among the inbred lines and correlate genetic distance to hybrid performance and SCA across different plant densities. A total of 28 F1 hybrids obtained by crossing eight diverse inbred lines (four local and four exotic) were evaluated under three plant densities, 59,500 (D1), 71,400 (D2) and 83,300 (D3) plants ha<sup>−1</sup>, using a split-plot design with three replications at two locations during the 2018 season. Increasing plant density from D1 to D3 significantly decreased leaf angle (LANG), chlorophyll content (CHLC), all ear characteristics and grain yield per plant (GYPP). In contrast, days to silking (DTS), anthesis–silking interval (ASI), plant height (PLHT), ear height (EHT), and grain yield per hectare (GYPH) were significantly increased. Both additive and non-additive gene actions were involved in the inheritance of all the evaluated traits, but additive gene action was predominant for most traits. Inbred lines L1, L2, and L5 were the best general combiners for increasing grain yield and other desirable traits across research environments. Two hybrids, L2 × L5 and L2 × L8, were found to be good specific combiners for ASI, LANG, GYPP and GYPH. These hybrids are therefore suitable for further testing and promotion for commercialization under high plant density. Genetic distance (GD) among pairs of inbred lines ranged from 0.31 to 0.78, with an average of 0.61. Clustering based on molecular GD effectively grouped the inbred lines according to their origin. No significant correlation was found between GD and either hybrid performance or SCA for grain yield and other traits, so GD proved to be of no predictive value.
Nevertheless, SCA could be used to predict the hybrid performance across all plant densities. Overall, this work presents useful information regarding the inheritance of maize grain yield and other important traits under high plant density.

**Keywords:** maize; density tolerance; combining ability; gene effects; genetic diversity

#### **1. Introduction**

Maize (*Zea mays* L.) is one of the main economic crops that contribute to global food security. It is widely used for food, animal feed, edible oil and fuel worldwide [1]. In Egypt, maize is considered the second most important crop, with annual grain production reaching about 7.30 Mt from approximately 0.94 Mha in 2018 [2]. This production is insufficient to meet the demands of a fast-growing population. The gap between production and consumption is approximately 45% [3]. This gap could be narrowed by further increasing the yield potential of hybrids and the total yield per unit area [4]. Increasing planting density is required to increase grain yield production in maize [5]. The average density of intensive maize cultivation in the USA is 97,500 plants ha<sup>−1</sup> [6]. The recommended planting density in Egypt is 53,533 plants ha<sup>−1</sup> [7], which is around half the density used in the USA. The use of lower plant densities decreases light interception, leading to high grain production per plant but low grain production per unit area [8]. Yield could be maximized by growing maize hybrids that can tolerate high plant densities of up to 100,000 plants ha<sup>−1</sup>. However, high plant densities intensify interplant competition for light, nutrients, and water [9]. Additionally, high density increases the anthesis–silking interval [10], thereby increasing kernel abortion [11] and reducing single plant yield. Al-Naggar et al. [12] showed that with increased planting density, plant and ear heights increased, whereas chlorophyll content, grains per ear and thousand grain weight decreased. The tolerance of the current Egyptian maize hybrids to high plant densities is low. This is probably attributable to their tallness, decumbent leaves, single ears and large size [7,13].
Conversely, modern maize hybrids in developed countries are characterized by early silking, short anthesis–silking interval and prolificacy, which are essential adaptive traits to high plant density tolerance [10,14–16].

Breeding programs should be directed towards the development of hybrids that are not only high yielding, but also show enhanced tolerance to high plant density. The successful identification of desirable hybrid combinations depends on the combining ability of the parents and the gene effects involved in the expression of the target trait [17]. Furthermore, knowledge of gene action is important to devise an appropriate breeding strategy [18]. General combining ability (GCA) and specific combining ability (SCA) are widely used in the selection of good parents and hybrids, respectively [19]. Among different biometrical approaches, the diallel mating design is commonly used by maize breeders to estimate GCA and SCA effects [20–22]. GCA is associated with additive gene effects, whereas SCA is typically associated with non-additive gene effects [23]. Both additive and non-additive gene actions were reported to be important in the inheritance of maize grain yield under high plant density [24]. However, grain yield and other assessed traits under different plant densities among selected maize inbred lines were mostly controlled by additive gene action [7,25].

The assessment of diversity and genetic distance among the available maize inbreds is important for a hybrid breeding program, in order to identify inbreds that would produce crosses with good levels of heterosis without testing all hybrid combinations [26,27]. Different types of DNA markers are available to estimate genetic distance. Simple sequence repeat (SSR) markers, or microsatellites, have been considered the markers of choice owing to their co-dominant, highly polymorphic, multi-allelic nature and high reproducibility [28–30]. However, contradictory results have been reported with respect to the relationship between genetic distance and hybrid performance in maize. Significant correlations were reported between molecular marker-based GD and F1 hybrid grain yield in maize [31,32], whereas other studies reported no significant correlation [33,34]. The objectives of this study were (i) to estimate GCA of the inbred lines and SCA of the hybrids under different plant densities; (ii) to determine the mode of gene action controlling grain yield and other important agronomic traits; (iii) to identify promising hybrids that yield well at high plant density; and (iv) to assess genetic diversity among the eight inbred lines and correlate genetic distance to hybrid performance and SCA.

#### **2. Results**

#### *2.1. Analysis of Variance*

The analysis of variance (ANOVA) revealed highly significant mean squares for locations (L), densities (D), hybrids (H) and their interactions (L × D, H × L, H × D and H × D × L) for all the studied characteristics (Table 1). Moreover, general combining ability (GCA) and specific combining ability (SCA) mean squares were highly significant for all the measured traits. The magnitude of GCA mean squares was higher than that of SCA mean squares (the GCA/SCA ratio was greater than unity) for all the studied traits, except number of kernels per row (NKPR). Mean squares of the GCA × L, SCA × L, GCA × D, SCA × D, GCA × L × D and SCA × L × D interactions were significant for all the studied traits, with the following exceptions: GCA × L and GCA × L × D for leaf angle (LANG) and chlorophyll content (CHLC), GCA × D for ear diameter (ED), and SCA × D for EHT, LANG and ED.

**Table 1.** Analysis of variance for the evaluated crosses under three plant densities combined across two locations for all the studied traits.


\* and \*\* significant at 0.05 and 0.01 levels of probability, respectively. DTS: days to 50% silking, ASI: anthesis–silking interval, PLHT: plant height, EHT: ear height, LANG: leaf angle, CHLC: chlorophyll content, ED: ear diameter, NRPE: number of rows per ear, NKPR: number of kernels per row, TKW: thousand kernel weight, GYPP: grain yield per plant and GYPH: grain yield per hectare.

#### *2.2. Changes in the Studied Traits Due to Increased Plant Density*

Across the two locations, mean grain yield per plant (GYPP) decreased significantly as plant density increased, by 9.60% at D2 and 20.59% at D3 relative to D1 (Figure 1A). This reduction was accompanied by reductions in leaf angle (LANG) (−5.97 and −11.23%), chlorophyll content (CHLC) (−5.48 and −12.15%) and all yield attributes: ear diameter (ED) (−7.68 and −14.01%), number of rows per ear (NRPE) (−6.21 and −9.83%), number of kernels per row (NKPR) (−7.38 and −17.77%), and thousand kernel weight (TKW) (−6.39 and −13.13%) at plant densities D2 and D3, respectively, as compared to D1. Conversely, high plant density (D2 and D3) caused a significant increase in grain yield per hectare (GYPH) compared with the low density (D1), by 8.48 and 11.23%, respectively (Figure 1B). Similarly, D2 and D3 caused significant increases in days to 50% silking (DTS) (5.10 and 11.31%), anthesis–silking interval (ASI) (12.87 and 39.88%), plant height (PLHT) (3.78 and 9.75%) and ear height (EHT) (6.64 and 12.86%), respectively, as compared with low plant density (D1).
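The relative changes quoted above are percentage differences from the D1 mean. As a minimal sketch, the calculation can be reproduced from the trait means reported in Section 2.3; the function name `percent_change` is illustrative and not taken from the study. Because the published means are rounded, the recomputed values may differ from the reported percentages in the second decimal place.

```python
def percent_change(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Trait means at D1, D2 and D3 as reported in Section 2.3.
trait_means = {
    "DTS (days)": (58.22, 61.19, 64.80),
    "ASI (days)": (3.26, 3.68, 4.56),
    "GYPP (g)": (170.11, 153.78, 135.09),
}

for trait, (d1, d2, d3) in trait_means.items():
    change_d2 = percent_change(d1, d2)
    change_d3 = percent_change(d1, d3)
    print(f"{trait}: D2 {change_d2:+.2f}%, D3 {change_d3:+.2f}%")
```

Negative values indicate a reduction relative to D1 (as for GYPP), positive values an increase (as for DTS and ASI).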

**Figure 1.** Changes in the studied traits due to increased plant density: (**A**) reduction in leaf angle (LANG), chlorophyll content (CHLC), ear diameter (ED), number of rows per ear (NRPE), number of kernels per row (NKPR), thousand kernel weight (TKW) and grain yield per plant (GYPP); (**B**) increase in days to 50% silking (DTS), anthesis–silking interval (ASI), plant height (PLHT), ear height (EHT) and grain yield per hectare (GYPH) under D2 and D3 compared with D1.

#### *2.3. Performance of F1 Hybrids*

The mean performances of the 28 F1 hybrids and the commercial check hybrid SC128 for all the studied characteristics are provided in Supplementary Materials, Table S1. The evaluated hybrids showed wide variation for all studied traits under all plant densities. The mean values for DTS were 58.22 days in D1, 61.19 days in D2, and 64.80 days in D3 (Table 2). The earliest hybrids were L1 × L3 at D1, L3 × L4 at D2 and L1 × L4 at D3, while the latest hybrids were L6 × L8 under D1 and D2 and L3 × L6 under D3 (Table 2). A total of 21, 17 and 4 hybrids were significantly earlier than the check hybrid SC128 under D1, D2 and D3, respectively (Supplementary Materials, Table S1). Likewise, the means of ASI were 3.26 days in D1, 3.68 days in D2, and 4.56 days in D3. The longest ASI was shown by the hybrid L3 × L7, and the shortest by L2 × L5, under the three plant densities (Table 2). The highest PLHT mean was 263.52 cm in D3, while it was 240.12 cm and 249.20 cm in D1 and D2, respectively. The tallest hybrids were L4 × L7 under D1 and D3, and L2 × L4 under D2, while the shortest hybrid was L2 × L6 under the three plant densities (Table 2). The means of EHT were 117.86, 125.68 and 133.02 cm in D1, D2 and D3, respectively. A total of 12, 11 and 14 hybrids were significantly shorter than the check hybrid SC128 under D1, D2 and D3, respectively (Supplementary Materials, Table S1).


**Table 2.** Minimum, maximum and mean values of all the studied traits under three plant densities across two locations.

The hybrid L6 × L7 had the highest ear height under the three plant densities, while the hybrids L3 × L6 in D1 and L2 × L6 under D2 and D3 had the lowest ear heights (Table 2). A total of 13, 20 and 19 hybrids had significantly lower ear placement than the check hybrid SC128 under D1, D2 and D3, respectively (Supplementary Materials, Table S1). Furthermore, the hybrid L4 × L5 displayed the lowest LANG, while L3 × L7 gave the highest one under the three plant densities. The means of CHLC were 50.34, 47.59 and 44.23 SPAD units under D1, D2 and D3, respectively. The highest hybrid in CHLC was L2 × L8, while the lowest hybrid was L7 × L8 across the three plant densities (Table 2). Moreover, the hybrids L5 × L6 at D1, L3 × L4 at D2 and L1 × L5 at D3 significantly surpassed the check hybrid SC128 for this trait (Table S1). The means of ED were 5.16 cm in D1, 4.76 cm under D2, and 4.44 cm in D3. The hybrid L1 × L7 at D1 and L2 × L4 at D2 and D3 exhibited the lowest ED, while L1 × L8, L1 × L3 and L1 × L4 gave the highest ones under D1, D2 and D3, respectively (Table 2). The mean for the NRPE was 14.83 in D1 and 13.91 in D2, while it was 13.37 in D3. The hybrid L2 × L5 under D1 and L1 × L5 under D2 and D3 exhibited the highest NRPE, while L3 × L7 in D1, L3 × L4 under D2 and L1 × L3 in D3 had the lowest mean values (Table 2). Additionally, two hybrids under D1, four hybrids at D2 and three hybrids at D3 possessed higher NRPE than the check hybrid SC128 (Supplementary Materials, Table S1). The mean values of the NKPR were 40.28, 37.31 and 33.12 for D1, D2 and D3, respectively. The hybrid L2 × L8 had the highest NKPR, but the hybrid L1 × L5 displayed the lowest one under the three plant densities. Means of the TKW were 356.0 g, 333.24 g, and 309.26 g in D1, D2, and D3, respectively. 
The heaviest TKW was recorded for the hybrids L2 × L8 under D1 and L1 × L4 under D2 and D3, whereas the hybrids L3 × L8 in D1, L5 × L6 under D2 and L5 × L7 under D3 exhibited the lightest TKW (Table 2). Furthermore, four hybrids under D1, five hybrids at D2 and three hybrids at D3 significantly exceeded the check hybrid SC128 for this trait (Supplementary Materials, Table S1). The highest mean of GYPP was 170.11 g in D1, while it was 153.78 and 135.09 g in D2 and D3, respectively. Conversely, the highest mean of GYPH was obtained in D3 (11.26 t ha<sup>−1</sup>), followed by D2 (10.98 t ha<sup>−1</sup>) and then by D1 (10.12 t ha<sup>−1</sup>) (Table 2). The hybrid L2 × L8 was the top-yielding hybrid and significantly out-yielded the check hybrid SC128 by 9.98, 13.16 and 10.26% under D1, D2 and D3, respectively. Moreover, the hybrid L2 × L5 significantly surpassed the check hybrid SC128 by 5.26% only under D2 (Supplementary Materials, Table S1). The optimum plant density for obtaining the highest GYPH was D3 for all hybrids except L2 × L7, L3 × L4, L3 × L7 and L2 × L8, for which the optimum density was D2 (Supplementary Materials, Table S1). This indicates that the optimum plant density is genotype dependent and should be identified separately for each hybrid.

#### *2.4. General Combining Ability (GCA) Effects*

Estimates of GCA effects are presented in Table 3. High positive GCA effects are of interest for all the studied characteristics except DTS, ASI, PLHT, EHT and LANG, for which high negative values are desirable from the breeder's point of view. The highest significant and negative GCA effects under the three plant densities were obtained by the inbred lines L1 and L3 for DTS; L1, L2 and L5 for ASI; L1, L5, L6 and L8 for PLHT; L3, L5 and L8 for EHT; and L1, L2 and L4 for LANG. Additionally, the inbred lines L4 in D1 and D2, as well as L5 in D3, for DTS; L4 in D3 and L8 in D1 and D2 for ASI; L2 in D3 and L3 under D1 and D3 for PLHT; and L5 under D1 and D3 for LANG also expressed significant and negative GCA effects for these traits. In contrast, the inbred lines L1 in D2 and D3, L5 under D1 and L2 under the three plant densities possessed significant and positive GCA effects for CHLC. Regarding ED, the inbred lines L1 and L8 in D1 and D3, as well as L3 in D2, had significant and positive GCA effects.

The highest positive and significant GCA effects for NRPE belonged to L1 in D2 and D3, L5 and L8 in D1 and D3, and L2 under the three plant densities. Likewise, the inbreds L3 and L7 in D1; L1 and L6 in D3 and L2 under the three plant densities were determined and considered to be good general combiners for NKPR. The highest positive and significant GCA effects for TKW belonged to L1 and L2 under the three plant densities, L4 under D1 and D2 and L6 under D3. Furthermore, the inbred lines L1, L2 and L5 under the three plant densities and L8 under D3 had significant and positive GCA effects for GYPP and GYPH. Based on the summarized results, it can be concluded that parental lines L1, L2 and L5 had the highest GCA effects for grain yield and the majority of studied traits.


**Table 3.** General combining ability (GCA) effects of the eight parental inbred lines for all the studied traits under three plant densities across two locations.

\* and \*\* significant at 0.05 and 0.01 levels of probability, respectively. DTS: days to 50% silking, ASI: anthesis–silking interval, PLHT: plant height, EHT: ear height, LANG: leaf angle, CHLC: chlorophyll content, ED: ear diameter, NRPE: number of rows per ear, NKPR: number of kernels per row, TKW: thousand kernel weight, GYPP: grain yield per plant and GYPH: grain yield per hectare.

#### *2.5. Specific Combining Ability (SCA) Effects*

The estimated SCA values under the three plant densities across two locations are presented in Table 4. The hybrids that presented the highest significant and negative (desirable) SCA effects under the three plant densities were L1 × L6, L2 × L4, L3 × L5, L3 × L8 and L4 × L7 for DTS; L1 × L7, L2 × L5, L2 × L7, L2 × L8, L3 × L4, L3 × L6 and L4 × L5 for ASI; L1 × L4, L2 × L6, L2 × L7, L2 × L8, L3 × L4 and L3 × L7 for PLHT; L1 × L7, L1 × L8, L2 × L6 and L3 × L6 for EHT; and L1 × L4, L1 × L5, L1 × L6, L1 × L7, L2 × L5, L2 × L8, L3 × L4, L3 × L6, L4 × L5, L4 × L7 and L7 × L8 for LANG. In contrast, the hybrid combinations L1 × L7, L2 × L8, L3 × L4 and L5 × L6 for CHLC; L2 × L5 and L2 × L7 for ED; L1 × L5, L2 × L3, L3 × L6 and L6 × L7 for NRPE; L1 × L6, L2 × L8 and L6 × L7 for NKPR; L1 × L4, L1 × L6, L2 × L5, L2 × L8, L3 × L5, L4 × L5, L6 × L7 and L7 × L8 for TKW; and L1 × L3, L1 × L6, L2 × L5, L2 × L8, L3 × L4, L3 × L6, L4 × L5, L6 × L7 and L7 × L8 for GYPP and GYPH had the highest significant and positive (desirable) SCA effects under the three plant densities. Moreover, the hybrids L1 × L5 in D2 and D3, L4 × L7 in D1 and D2, and L2 × L4 and L5 × L7 under D3 displayed significant and positive SCA effects for GYPP and GYPH. It is notable that the crosses that showed high SCA effects for GYPP and GYPH also showed desirable SCA effects for some other traits, i.e., DTS, LANG, NKPR and TKW for the hybrid L1 × L6; ASI, LANG and TKW for the two hybrids L2 × L5 and L4 × L5; ASI, PLHT, LANG, CHLC, NKPR and TKW for the hybrid L2 × L8; and PLHT, NRPE, NKPR and TKW for the hybrid L6 × L7.
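How GCA and SCA effects are separated from a table of cross means can be illustrated with a minimal sketch. It assumes a half diallel without parents or reciprocals analysed under the additive model x_ij = μ + g_i + g_j + s_ij (the approach commonly known as Griffing's Method 4), which is consistent with the 28 = 8 × 7/2 crosses evaluated here but is not confirmed in this section as the authors' exact procedure. The function name and the four-parent data set are hypothetical and serve only to show the arithmetic.

```python
from itertools import combinations


def griffing_method4(cross_means, n_parents):
    """Estimate GCA and SCA effects from half-diallel cross means.

    cross_means maps a pair (i, j) with i < j to the mean of cross i x j.
    Assumes every one of the n(n-1)/2 crosses is present and n_parents > 2.
    """
    n = n_parents
    # X_i. : sum of all crosses in which parent i takes part.
    totals = [0.0] * n
    for (i, j), value in cross_means.items():
        totals[i] += value
        totals[j] += value
    grand_total = sum(cross_means.values())  # X.. over all crosses

    # g_i = (n * X_i. - 2 * X..) / (n * (n - 2))
    gca = [(n * t - 2.0 * grand_total) / (n * (n - 2)) for t in totals]

    # s_ij = x_ij - (X_i. + X_j.) / (n - 2) + 2 * X.. / ((n - 1) * (n - 2))
    sca = {
        (i, j): value
        - (totals[i] + totals[j]) / (n - 2)
        + 2.0 * grand_total / ((n - 1) * (n - 2))
        for (i, j), value in cross_means.items()
    }
    return gca, sca


# Hypothetical grain yield means (g per plant) for crosses among four lines.
example = dict(zip(combinations(range(4), 2),
                   [150.0, 142.0, 161.0, 138.0, 155.0, 147.0]))
gca_effects, sca_effects = griffing_method4(example, 4)

for line, effect in enumerate(gca_effects, start=1):
    print(f"GCA of line {line}: {effect:+.2f}")
for (i, j), effect in sca_effects.items():
    print(f"SCA of line {i + 1} x line {j + 1}: {effect:+.2f}")
```

By construction, the GCA effects sum to zero across parents and the SCA effects sum to zero across the crosses of each parent, which is a convenient check on any implementation. Significance testing, as reported in Tables 3 and 4, additionally requires the error variance from the replicated trial and is not covered by this sketch.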


*Plants* **2020** , *9*, 1140





CHLC: chlorophyll content, ED: ear diameter, NRPE: number of rows per ear, NKPR: number of kernels per row, TKW: thousand kernel weight, GYPP: grain yield per plant and GYPH: grain yield per hectare.

#### *2.6. SSR Polymorphisms, Genetic Distance (GD) and Cluster Analysis*

Out of twenty-two SSR primer pairs analyzed, ten were polymorphic among the eight inbreds studied (Table 5). The primer pairs generated a total of 80 polymorphic fragments (Figure 2). The number of alleles per locus ranged from 2 to 6, with an average number of 2.7 alleles/locus (Table 5). The major allele frequency had an average of 0.59 with a range extended from 0.25 to 0.88. The gene diversity and polymorphic information content (PIC) averaged 0.50 and 0.41, with ranges of 0.22–0.81 and 0.19–0.79, respectively. The umc1033 locus showed the highest gene diversity and PIC (Table 5). Genetic distance estimates based on SSR markers ranged from 0.31 to 0.78 with an average of 0.61 (Table 6). The lowest genetic distance (0.31) was obtained between the inbred lines (L1 and L4), whereas the highest genetic distance (0.78) was observed between the inbred lines (L1 and L8), (L2 and L5), (L2 and L6) and (L2 and L8). The dendrogram constructed based on GD revealed two main clusters; L1, L2, L3 and L4 constituted the first group, while L5, L6, L7 and L8 formed the second one (Figure 3).
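Gene diversity and PIC are both functions of the allele frequencies at a locus. The sketch below uses the standard definitions, gene diversity H = 1 − Σp_i² and PIC = H − ΣΣ 2p_i²p_j² over all allele pairs i < j; whether the authors used software implementing exactly these formulas is not stated in this section. The two-allele locus in the example is hypothetical: it assumes that seven of the eight inbreds carry one allele and one inbred carries the other.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content for one locus."""
    correction = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - correction


# Hypothetical locus: 7 of 8 inbred lines share one allele, 1 carries another.
allele_freqs = [7 / 8, 1 / 8]
print(f"major allele frequency: {max(allele_freqs):.2f}")
print(f"gene diversity: {gene_diversity(allele_freqs):.2f}")
print(f"PIC: {pic(allele_freqs):.2f}")
```

Under this assumption the major allele frequency is about 0.88, and the resulting gene diversity and PIC coincide with the lower ends of the ranges reported above (0.22 and 0.19), which suggests that the least informative marker had this allele distribution.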

**Table 5.** Number of alleles, major allele frequency, gene diversity and polymorphic information content (PIC) of the ten SSR markers used in this study.


**Figure 2.** Amplification pattern of representative SSR markers with the eight maize inbred lines (L1–L8). M refers to the 100 bp DNA ladder.

**Table 6.** Genetic distance (GD) matrix among the eight maize inbred lines based on SSR analysis.


**Figure 3.** Dendrogram of the eight maize inbred lines constructed from SSR data using the unweighted pair group method with arithmetic mean (UPGMA) according to Jaccard's coefficients.
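As an illustration of how a pairwise genetic distance can be derived from SSR band data, the following sketch computes a Jaccard-based distance between two lines scored for presence (1) or absence (0) of each fragment. It assumes that GD is taken as one minus Jaccard's similarity coefficient, which is a common convention but is not spelled out in this section; the band profiles and the function name are hypothetical.

```python
def jaccard_distance(profile_a, profile_b):
    """1 - Jaccard similarity for two equal-length 0/1 band profiles.

    Bands absent in both lines are ignored, as in Jaccard's coefficient.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of bands")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    present = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    if present == 0:
        raise ValueError("no band is present in either profile")
    return 1.0 - shared / present


# Hypothetical presence/absence scores for two inbred lines over ten bands.
line_x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
line_y = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
print(f"GD = {jaccard_distance(line_x, line_y):.2f}")
```

A matrix of such distances for all 28 pairs of lines is the input from which a UPGMA dendrogram is built by repeatedly merging the two closest clusters and averaging their distances to the remaining clusters.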

#### *2.7. Association between Genetic Distance, F1 Hybrid Performance and SCA*

Correlations between GD estimated for pairs of inbred lines with each of F1 hybrid performance and SCA were not significant for all measured traits (Table 7, Figure 4A,B). However, significant and positive association was observed between F1 hybrid performance and SCA for all the studied traits across the three plant densities (Table 7).
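The associations summarized here are correlation coefficients computed across the 28 hybrids, pairing the genetic distance of each parental combination with the hybrid mean or SCA effect for a trait. A minimal sketch of the calculation is given below, assuming Pearson's product-moment coefficient (the type of coefficient is not named in this section); the values for five crosses are invented for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five crosses (not data from this study).
genetic_distance = [0.31, 0.55, 0.62, 0.70, 0.78]
sca_for_yield = [4.1, -2.3, 6.0, -5.2, 1.4]
hybrid_yield = [151.0, 139.5, 158.2, 133.0, 147.8]

print(f"r(GD, SCA)    = {pearson_r(genetic_distance, sca_for_yield):+.2f}")
print(f"r(yield, SCA) = {pearson_r(hybrid_yield, sca_for_yield):+.2f}")
```

A coefficient near zero between GD and SCA, as in the pattern described above, means that the marker-based distance between two parents carries little information about how their cross will perform.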

**Table 7.** Correlation coefficients among parental genetic distance (GD), F1 hybrid performance and SCA for all studied traits across all environments.


\*\* significant at 0.01 level of probability.

**Figure 4.** Corrplot depicting correlation coefficient of genetic distance based on molecular data with F1 hybrid performance (**A**) and SCA (**B**) for all studied traits. GD: genetic distance, DTS: days to 50% silking, ASI: anthesis–silking interval, PLHT: plant height, EHT: ear height, LANG: leaf angle, CHLC: chlorophyll content, ED: ear diameter, NRPE: number of rows per ear, NKPR: number of kernels per row, TKW: thousand kernel weight, GYPP: grain yield per plant and GYPH: grain yield per hectare.

#### **3. Discussion**

#### *3.1. Analysis of Variance and Hybrid Performance*

The significant mean squares of L, D and H observed for all the studied characteristics (Table 1) indicate that the tested locations and densities were dissimilar and that there were adequate genetic differences among the hybrids for effective selection for all the studied traits. Significant differences among maize hybrids under different plant densities have also been reported [10,35–37]. The significant mean squares for the H × D interaction indicated inconsistent performance of the hybrids across plant densities; that is, the ranks of the maize hybrids differed from one density to another for all measured traits. Therefore, selection of hybrids under various plant densities may be a promising strategy to improve the adaptation of maize hybrids to higher plant density. These results are consistent with the findings of other studies [12,13,36,38].

The significant GCA and SCA effects imply that both additive and non-additive gene effects are involved in governing all traits. The predominant mode of inheritance of a specific trait can be inferred from the ratio of GCA to SCA variances. In the present study, the GCA/SCA ratio was greater than unity for all evaluated characteristics except NKPR, which indicates the preponderance of additive gene effects in controlling the inheritance of all measured traits other than NKPR, which was mainly controlled by non-additive gene action. Therefore, selection breeding methods can be effective for the improvement of these traits. This finding is in agreement with that of Mason and Zuber [25] and Al-Naggar et al. [7], who reported that additive genetic effects were important in the inheritance of grain yield and other agronomic traits under different plant densities. However, this result is in contrast to the findings of other studies [36,39], which reported that non-additive gene effects were more important in controlling grain yield inheritance under varying plant densities.

The significant mean squares of the GCA × L and GCA × D interactions for most traits in the present study indicate that the GCA effects of the inbred lines varied significantly across environments. This result is in agreement with the findings of several authors [17,26,40,41]. Likewise, the significant SCA × L and SCA × D interactions observed for most traits imply that the performance of the hybrids was not consistent across the research environments. This suggests the need for extensive evaluation of the hybrids in multiple environments in order to identify the highest-yielding and most stable hybrids tolerant to high plant densities [39].

The highest GYPP of all evaluated hybrids in this study was observed under low density (D1), where competition between plants is minimal [12]. As planting density increases, the resources available to each plant (water, nutrients and intercepted light) decrease, which increases plant–plant competition, reduces the assimilate supply to developing cobs and, consequently, reduces grain yield per plant [42–44]. The observed reduction in GYPP due to raising plant density from D1 to D2 and D3 in this study could be a result of the reduction in all yield attributes (ED, NRPE, NKPR and TKW). These results are consistent with Tang et al. [45], who stated that increasing plant density in maize leads to a reduction in ear diameter, grains per ear, thousand kernel weight and, finally, single plant yield. Hashemi et al. [46] also demonstrated that grain yield per plant and all yield components decreased linearly with increasing plant density. Moreover, increasing plant density also reduced LANG and CHLC. The decrease in leaf angle and chlorophyll content in response to high plant density has also been reported previously in maize [13,47,48].

On the other hand, high plant density (D3) caused significant increases in DTS, ASI, PLHT, EHT and GYPH compared with the low density (D1). Delayed silking and a longer ASI, as symptoms of intense interplant competition for growth resources, can be associated with significant yield reductions [15,49]. Increasing plant density imposes greater stress during pollination, which can lead to increased kernel abortion and decreased grain fill [8,11]. These two traits (early DTS and short ASI) could be effective indicators for selecting hybrids tolerant to high density [50]. The increased values of PLHT and EHT might be related to the stress imposed on maize plants by competition for light under elevated plant density, which potentially increases stem elongation [51,52]. The increase in GYPH with increasing plant density is largely attributed to the higher number of plants per unit area. This suggests that the gain from more plants per unit area more than offsets the reduction in GYPP due to competition between plants. These results are in accordance with the results reported in other studies [10,12,53,54].
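The trade-off between per-plant yield and plant number can be checked with simple arithmetic. The sketch below assumes that yield per hectare is approximately the mean yield per plant multiplied by the number of plants per hectare; this is a rough consistency check rather than the authors' stated procedure for deriving GYPH. The GYPP means and densities are those reported in the text, and the function name is illustrative.

```python
def yield_per_hectare_t(yield_per_plant_g, plants_per_ha):
    """Approximate grain yield in t/ha from per-plant yield (g) and density."""
    grams_per_tonne = 1_000_000
    return yield_per_plant_g * plants_per_ha / grams_per_tonne


# (density label, plants per ha, mean GYPP in g) as reported in this study.
densities = [
    ("D1", 59_500, 170.11),
    ("D2", 71_400, 153.78),
    ("D3", 83_300, 135.09),
]

for label, plants, gypp in densities:
    estimate = yield_per_hectare_t(gypp, plants)
    print(f"{label}: {gypp:.2f} g/plant x {plants} plants/ha ~ {estimate:.2f} t/ha")
```

The products agree closely with the reported GYPH means of 10.12, 10.98 and 11.26 t ha<sup>−1</sup>: although each plant yields about 21% less at D3 than at D1, there are 40% more plants per hectare, so yield per unit area still rises.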

The two hybrids L2 × L5 and L2 × L8 had the highest GYPP and GYPH under three plant densities, and could be considered the most highly responsive and tolerant to high plant density. Interestingly, the hybrid L2 × L8 significantly outyielded the check hybrid SC128 under all densities; moreover, it had outstanding features, such as short ASI, short plant and ear position, erect leaf under high plant density. Therefore, this hybrid should be tested extensively in multilocation trials and promoted for adoption to high plant density tolerance. Similar to our results, Al-Naggar et al. [12] reported that the selection of hybrids with high grain yield, better plant and ear heights, short ASI, and erect leaf under high plant density stress is important for the development of tolerant hybrids to high plant densities.

#### *3.2. GCA and SCA Estimates*

Combining ability analysis helps in the identification of parents with good GCA effects and hybrids with good SCA effects [23]. Selection of parents giving good-performing hybrids is one of the challenges facing breeders. Parents with desirable GCA effects for the target traits can be used to accumulate favorable alleles by recombination and selection [55]. In the current study, high GCA values for the evaluated traits were scattered among the eight inbred lines and changed across plant densities, demonstrating the effects of plant densities on GCA values. Moreover, none of the inbred lines exhibited significant GCA effects for all the measured traits under any of the testing densities. Similar results were reported by other researchers [56,57]. Significant and negative GCA effects were displayed by the inbreds L1 and L3 for DTS and by L1, L2 and L5 for ASI across the three plant densities, indicating that these inbreds could be good combiners and possess favorable alleles for earliness. Likewise, inbred lines L5 and L8 were the best general combiners for reduced plant and ear heights, which are important for lodging tolerance, especially under high plant density. The inbred line L2 had the highest positive GCA values for CHLC, NRPE, NKPR and TKW, suggesting that this line could be a good combiner for improving these traits. Moreover, the best general combiners for GYPP and GYPH were L1, L2, and L5 under the three plant densities and L8 under D3. These inbreds could transfer desirable alleles for improved grain yield to their progenies to develop hybrids tolerant to high plant density. The superiority of these inbreds in GCA effects for grain yield was associated with their superiority in GCA effects for some other traits. Interestingly, the inbred line L1, which had desirable GCA effects for GYPP and GYPH, was also found to be a good general combiner for earliness, short ASI, short PLHT, reduced LANG and increased TKW. Previous findings showed that positive GCA effects for grain yield and negative GCA effects for DTS, PLHT, and LANG are a good indicator of high plant density tolerance [13]. Thus, the inbred line L1 has potential to be used to improve maize grain yield under high plant density.

Estimates of SCA effects provide important information about the non-additive gene effects (dominance and epistatic interaction), which can also be related to hybrid vigor, assisting in the selection of the best hybrid combinations [58]. The highly positive and significant SCA effects for grain yield and its components indicated that the produced hybrids were good specific combiners for developing high-yielding hybrids [1]. In the present study, the most promising specific combiners for grain yield (GYPP or GYPH) and some of its components were L1 × L3, L1 × L6, L2 × L5, L2 × L8, L4 × L5 and L7 × L8 under the three plant densities. These hybrids involved at least one high GCA parent, which could be exploited by conventional breeding procedures. This finding is in line with the result reported in other studies [56,59]. In their studies, high SCA was observed in cross combinations involving one line with high GCA and another with low GCA effects.

Two hybrids, L2 × L5 and L2 × L8, had desirable significant positive SCA coupled with high mean grain yield under the three plant densities, revealing good correspondence between mean grain yield and SCA effects [1]. Despite their significant SCA effects, three crosses (L3 × L4, L3 × L6 and L6 × L7) derived from parents with low × low GCA effects for GYPP and GYPH were not favorable due to insufficient additive variance. This indicates that both GCA and SCA should be taken into consideration in the selection of elite parents for the development of heterotic hybrids [18]. It is notable that none of the hybrids exhibited significant SCA effects for all the traits. However, the hybrids L2 × L5, L2 × L8 and L4 × L5 were found to be good specific combiners for more than one trait, such as ASI, LANG, TKW, GYPP and GYPH. Accordingly, these hybrids would be useful to increase maize grain yield under high plant density because of their complementary characteristics, including short ASI, erect leaves and high grain yield under high plant density. In concordance with the findings reported here, desirable significant SCA under high plant density for ASI, LANG and grain yield has previously been reported by Al-Naggar et al. [13].
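To make the GCA and SCA quantities discussed above concrete, the following minimal sketch computes point estimates from a table of F1 cross means using the standard least-squares formulas for a half diallel without parents or reciprocals (Griffing's method 4). It is an illustration only, not the DIALLEL-SAS analysis used in the study: it ignores replications, locations and significance testing, and the parent labels and cross means are hypothetical values constructed to be purely additive.

```python
from itertools import combinations


def griffing_method4(cross_means, parents):
    """Estimate GCA and SCA effects from F1 cross means (no parents, no reciprocals).

    cross_means: dict mapping a pair of parent labels (i, j) to the cross mean.
    Returns (gca, sca) as dicts.
    """
    p = len(parents)
    # Store crosses order-independently: (i, j) and (j, i) are the same cross.
    x = {frozenset(pair): value for pair, value in cross_means.items()}
    grand_total = sum(x.values())  # sum over all p(p-1)/2 crosses
    # Array total for each parent: sum of all crosses involving that parent.
    totals = {
        i: sum(x[frozenset((i, j))] for j in parents if j != i) for i in parents
    }
    gca = {
        i: (p * totals[i] - 2 * grand_total) / (p * (p - 2)) for i in parents
    }
    sca = {}
    for i, j in combinations(parents, 2):
        sca[(i, j)] = (
            x[frozenset((i, j))]
            - (totals[i] + totals[j]) / (p - 2)
            + 2 * grand_total / ((p - 1) * (p - 2))
        )
    return gca, sca


# Hypothetical, purely additive example: mean = 10 + g_i + g_j, so SCA should be zero.
parents = ["A", "B", "C", "D"]
true_g = {"A": 1.0, "B": -1.0, "C": 2.0, "D": -2.0}
cross_means = {
    (i, j): 10.0 + true_g[i] + true_g[j] for i, j in combinations(parents, 2)
}
gca, sca = griffing_method4(cross_means, parents)
```

Because the example data contain no non-additive component, the estimated GCA values recover the assumed parental effects and every SCA estimate is zero; any departure from zero in real data reflects the specific (non-additive) contribution of a cross.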

#### *3.3. SSR Polymorphisms, Genetic Distance (GD) and Cluster Analysis*

The mean number of alleles (2.7) per locus obtained in this study was close to the values reported by other researchers [26,27,34], who detected averages of 2.9, 2.57 and 3.0 alleles per locus, respectively. However, it was lower than the 6.21 alleles/locus reported by Oppong et al. [60] or the 5.7 alleles/locus found by Oyekunle et al. [61] in maize inbred lines using SSR markers. The differences in the means of alleles among different studies could be attributed to the differences in sample size, repeat length and number of the SSR markers involved in the studies [27]. The lower values observed in this study could arise from the small number of lines used for genotyping.

The PIC demonstrates the informativeness of the SSR loci and their potential to detect differences among the inbred lines based on their genetic relationships [62]. Informative markers can be categorized as highly informative (PIC > 0.5), reasonably informative (0.25 < PIC < 0.5) and slightly informative (PIC < 0.25), as reported by Botstein et al. [63]. Accordingly, four markers (umc1014, phi112, phi015 and umc1033) with high PIC values, and hence high discriminatory power, were identified. The average gene diversity (0.50) detected among the tested inbred lines in this study indicated high levels of polymorphism within the inbred lines. This result is in close agreement with the findings reported in other studies [30,64]. The frequency of the most common (major) alleles had an average of 0.59, suggesting that, on average, 59.0% of the studied inbreds shared a common major allele at a given locus.
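The PIC categories of Botstein et al. [63] can be expressed as a small helper. This is a sketch; the function name is illustrative, and the handling of values exactly equal to 0.25 or 0.5 is an assumption, since the text defines the classes with strict inequalities only.

```python
def classify_pic(pic):
    """Classify a marker by its PIC value using the Botstein et al. thresholds."""
    if pic > 0.5:
        return "highly informative"
    if pic > 0.25:
        # Assumption: a value of exactly 0.5 falls in the middle class.
        return "reasonably informative"
    # Assumption: a value of exactly 0.25 falls in the lowest class.
    return "slightly informative"


# Hypothetical PIC values for three markers.
labels = [classify_pic(v) for v in (0.62, 0.40, 0.18)]
```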

Assessing genetic diversity is essential for enhancing the yield and conservation strategies of main crops [65–70], such as maize, which has a high economic importance [71]. The average genetic diversity existing among all the inbred lines was relatively high (0.61). This indicated that there was considerable genetic diversity among the inbreds based on the microsatellite marker analysis [72]. The largest GD in this study was between the Egyptian (local) and CIMMYT (exotic) inbred lines. The relatively large genetic distance between local and exotic lines suggests the opportunity to use these lines for the development of high-yielding and stress-tolerant hybrids. Indeed, the two high-yielding hybrids (L2 × L5 and L2 × L8) under the three plant densities consisted of local × exotic line combinations. This indicates that novel and complementary alleles existing in the germplasm from the two sources can be exploited for superior maize hybrid development and population improvement [73]. Moreover, it implies the potential benefits of exchanging germplasm between breeding programs for the development of high-yielding and density-tolerant hybrids.

The dendrogram constructed using the UPGMA clustering grouped the inbred lines into two main clusters, which generally agreed with their origin. One cluster was composed of CIMMYT inbred lines, while the other consisted of local inbreds. This result is consistent with the findings of Mageto et al. [17], who reported that clustering based on GD grouped maize inbred lines according to their origin. Similarly, [34,64] revealed the effectiveness of SSR markers for classifying maize inbreds according to their origin in their studies.

#### *3.4. Association between Genetic Distance, F1 Hybrid Performance and SCA*

Our results showed that GD of the parental inbreds was not significantly correlated with the mean of F1 hybrids for any of the evaluated traits across the tested environments. This implied that the SSR-based GD could not be used to predict the performance of F1 hybrids in this study. This result is consistent with those reported by [26,33,34,40]. Bernardo [74] attributed this poor correlation to the lack of linkage between genes controlling the trait and markers used to estimate GD, inadequate genome coverage and different levels of dominance among hybrids. Contrary to the current finding, a significant correlation was reported between molecular GD and F1 hybrid performance [32,75]. There was no significant correlation between GD and SCA for all the traits, suggesting that SSR-based GD might not be effective in predicting SCA effects in the studied materials. Similarly, non-significant association between genetic distances and SCA was reported by [40,76]. However, Betran et al. [75] reported a significant correlation between GD and SCA for maize grain yield. Furthermore, our results showed that SCA effects were significantly correlated with F1 hybrid performance for all the traits. This indicated that SCA could be used to predict the performance of F1 hybrids. This result is in agreement with the findings of [17,26].
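The association tests described above rest on Pearson's correlation coefficient between a predictor (GD or SCA) and F1 performance. The sketch below shows the calculation and the usual t statistic with n − 2 degrees of freedom. The genetic distances and yields are hypothetical placeholders, not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical parental genetic distances and F1 grain yields for six crosses.
gd = [0.35, 0.52, 0.61, 0.44, 0.70, 0.58]
f1_yield = [9.1, 8.7, 9.8, 9.4, 9.0, 9.6]
r = pearson_r(gd, f1_yield)
t = t_statistic(r, len(gd))
```

The t value is then compared with the critical value for the chosen significance level; with 28 hybrids, as in this study, the test has 26 degrees of freedom.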

#### **4. Conclusions**

This study revealed a considerable variability among F1 hybrids for all traits under different plant densities. Additive and non-additive gene effects are involved in the genetic control of all traits, with a predominance of the additive gene action for most traits. Selection of potential hybrids for density tolerance breeding programs should be based on both GCA and SCA effects. The inbred lines L1 and L3 were identified as excellent combiners for earliness, L5 and L8 for reduced plant and ear heights and L1, L2, and L5 for increased grain yield under varying plant densities. The best hybrids L2 × L5 and L2 × L8 for grain yield and other multiple traits were identified for further evaluation. The estimated GD based on SSR markers in this study could not be used to predict the hybrids performance and SCA effects. Nevertheless, SCA could be used to predict the hybrids performance across all plant densities. Although SSR determined that GD was not useful in predicting hybrid performance and SCA effects, it was effective in classifying the inbred lines according to their origin, signifying the efficiency of SSR marker for diversity and clustering analyses. The findings of the present study might have important implications for breeding programs designed to improve density tolerance in maize.

#### **5. Materials and Methods**

#### *5.1. Plant Materials*

Eight white maize (*Zea mays* L.) inbred lines showing clear differences in grain yield and other agronomic characteristics were chosen as parents in this study. Four inbreds (L1, L2, L3 and L4) were obtained from the Maize Research Department, Agricultural Research Center (ARC) in Egypt and the other four (L5, L6, L7 and L8) were introduced from the International Maize and Wheat Improvement Center (CIMMYT). The parental codes, names and sources of these inbred lines are listed in Table 8.


**Table 8.** Code, name and source of the parental maize inbred lines.

#### *5.2. Production and Evaluation of F1 Hybrids*

In the 2017 season, all possible diallel crosses (excluding reciprocals) were made among the eight inbred lines to obtain seeds of 28 F1 hybrids. In the 2018 season, the resulting 28 F1 white hybrids plus the commercial check hybrid SC128 were evaluated under three plant densities, i.e., 59,500 (D1), 71,400 (D2) and 83,300 (D3) plants ha<sup>−1</sup>, at two locations. The two locations were a private farm in El-Mahmoudia, El-Behira, Egypt (31°3′ N, 30°48′ E), and the Experimental Farm, Faculty of Agriculture, Kafrelsheikh University, Egypt (31°6′ N, 30°56′ E). A split-plot design in a randomized complete block (RCB) arrangement with three replications was used in each location. The three plant densities were assigned to the main plots, while the hybrids were assigned to the subplots. Each subplot consisted of one ridge 6 m long and 0.7 m wide. Two seeds were sown in hills spaced 24, 20 and 17 cm apart, and thereafter (before the first irrigation) were thinned to one plant per hill to achieve the three plant densities, i.e., D1, D2 and D3, respectively. Phosphorus at the rate of 476 kg ha<sup>−1</sup> in the form of calcium superphosphate (15.5% P<sub>2</sub>O<sub>5</sub>) was added to the soil during seedbed preparation, and potassium sulphate (48% K<sub>2</sub>O) at a level of 120 kg ha<sup>−1</sup> was applied after thinning. Moreover, nitrogen at the rate of 286 kg ha<sup>−1</sup> was added in two equal doses before the first and second irrigation. All other standard agronomic practices, including weed control, were followed in each location. Soil analysis was conducted on soil samples collected from 30 cm depth from each location according to the Association of Official Analytical Chemists (A.O.A.C. 2005) [77] (Supplementary Materials, Table S2). Additionally, the meteorological data are presented in the Supplementary Materials, Figure S1.
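The design arithmetic above can be checked with a short sketch: a half diallel among eight lines gives 8 × 7 / 2 = 28 crosses, and one plant per hill on 0.7 m ridges gives a density of 10,000 m² divided by the area occupied by each hill. The line labels follow the paper; the function name is illustrative.

```python
from itertools import combinations

# All possible single crosses among the eight inbred lines, excluding reciprocals.
lines = [f"L{i}" for i in range(1, 9)]
crosses = list(combinations(lines, 2))  # 28 pairs such as ("L2", "L5")


def plants_per_ha(ridge_width_m, hill_spacing_m):
    """Plants per hectare with one plant per hill."""
    return 10_000 / (ridge_width_m * hill_spacing_m)


# Hill spacings of 24, 20 and 17 cm on ridges 0.7 m wide (D1, D2, D3).
densities = {
    name: plants_per_ha(0.7, spacing)
    for name, spacing in (("D1", 0.24), ("D2", 0.20), ("D3", 0.17))
}
```

The computed values for D1 and D2 round to the reported 59,500 and 71,400 plants per hectare. The 17 cm spacing gives roughly 84,000, slightly above the reported 83,300, presumably because the stated spacing is rounded.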

#### *5.3. Data Collection*

Data were collected on days to 50% silking (DTS, days from planting to 50% extrusion of silks from the plants), anthesis–silking interval (ASI, calculated as the difference between days to 50% silking and days to 50% anthesis), plant height (PLHT, measured in cm as the distance from the soil surface to the top of the first tassel branch) and ear height (EHT, measured in cm as the distance from the soil surface to the base of the topmost ear). Leaf angle (LANG, °) was measured as the angle between the stem and the blade of the leaf just above the ear leaf. Chlorophyll content (CHLC, SPAD units) was measured with a hand-held chlorophyll meter (SPAD-502; Minolta Sensing Co., Ltd., Hangzhou, Japan) on the leaf of the topmost ear. The LANG and CHLC values were recorded on ten guarded plants within each plot, and then the values were averaged per plot. At harvest, ear diameter (ED), number of rows per ear (NRPE), number of kernels per row (NKPR), thousand kernel weight (TKW), grain yield per plant (GYPP, in g plant<sup>−1</sup>) and grain yield per hectare (GYPH, in ton ha<sup>−1</sup>) were estimated. Plots were hand-harvested, and the weight of the shelled grain (adjusted to 15.5% grain moisture content) was used to calculate GYPP and GYPH. Grain moisture at harvest was measured using a hand-held moisture meter.

#### *5.4. Molecular Analysis*

#### 5.4.1. DNA Isolation

Leaves were sampled from 10 to 15 seedlings of each inbred line twenty days after planting. Genomic DNA was isolated using the CTAB method [78]. DNA quantity and quality were assessed using a NanoDrop spectrophotometer (ND-1000, USA).

#### 5.4.2. SSR Primers and PCR Amplification

Twenty-two SSR markers were randomly selected from the MaizeGDB database (www.maizegdb.org). The 22 primer pairs were tested to identify the polymorphic ones. Only ten markers were found to be polymorphic, and they were used for the SSR analysis (Supplementary Materials, Table S3). PCR was performed in a 10 μL reaction mixture containing 1 μL of 20 ng/μL genomic DNA template, 1 unit of Taq DNA polymerase (Promega, Madison, WI, USA), 2 mM MgCl<sub>2</sub>, 0.2 mM dNTPs and 0.5 μM each of the reverse and forward primers. The PCR started with an initial denaturation at 94 °C for 2 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 55 °C for 30 s and extension at 72 °C for 30 s, and ended with a final extension at 72 °C for 3 min. Amplified products were electrophoresed on a 1.5% agarose gel. The gels were stained with ethidium bromide, destained with tap water and photographed using a gel documentation system (UVITEC, Cambridge, UK).

#### *5.5. Statistical Analysis*

Analysis of variance (ANOVA) was computed for all data using SAS software (SAS Institute Inc., 2008). Combined analysis of variance of the split-plot design across the two locations was performed if the homogeneity test was non-significant. Least significant difference (LSD) values were calculated to test the significance of differences between means according to Steel et al. [79]. General combining ability (GCA) effects of the parents and specific combining ability (SCA) effects of the hybrids, as well as their mean squares, were computed according to Griffing's method 4, model I [80], using the DIALLEL-SAS program [81]. The significance of GCA and SCA effects was tested at the 5% and 1% probability levels. Pearson's coefficients of correlation (r) were calculated and plotted using the package corrplot [82]. Based on the mean of each trait, the reduction or increase due to increased plant density was calculated as follows:

Change% = 100(D2 or D3 − D1)/D1
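As a worked illustration of this formula, the sketch below computes the percentage change of a trait mean at a higher density relative to D1. The trait means are hypothetical values, not results from this study.

```python
def percent_change(d1_mean, dx_mean):
    """Percentage change of a trait mean at density D2 or D3 relative to D1."""
    return 100 * (dx_mean - d1_mean) / d1_mean


# Hypothetical GYPP means (g per plant) at D1 and D3.
change_d3 = percent_change(200.0, 150.0)  # negative value = reduction
```

A negative result indicates a reduction relative to the low density, and a positive result an increase.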

#### *5.6. SSR Data Analysis*

The amplified bands were scored for each SSR marker based on the presence or absence of bands, generating a binary data matrix of (1) and (0) for each marker. The number of alleles per locus, major allele frequency, gene diversity and polymorphic information content (PIC) were calculated to assess allele diversity of each marker. The value of polymorphic information content (PIC) of each SSR marker was determined as described by Botstein et al. [63] as follows:

$$\text{PIC} = 1 - \sum_{i=1}^{n} P_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2P_i^2 P_j^2$$

where $P_i$ and $P_j$ are the frequencies of the *i*th and *j*th alleles of a given marker, respectively, and *n* is the number of alleles.
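The PIC formula of Botstein et al. [63] can be evaluated directly from allele frequencies. The sketch below also computes gene diversity as one minus the sum of squared allele frequencies, which is the usual definition; the allele calls for the eight lines are hypothetical.

```python
from collections import Counter


def allele_frequencies(calls):
    """Relative frequency of each allele among the genotyped lines."""
    counts = Counter(calls)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def gene_diversity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content following Botstein et al."""
    n = len(freqs)
    squares = sum(p * p for p in freqs)
    cross_terms = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    return 1 - squares - cross_terms


# Hypothetical allele calls at one SSR locus for eight inbred lines.
calls = ["a", "a", "a", "b", "b", "c", "a", "b"]
freqs = allele_frequencies(calls)
locus_pic = pic(freqs)
locus_diversity = gene_diversity(freqs)
```

For a locus with two equally frequent alleles, the formula gives a gene diversity of 0.5 and a PIC of 0.375; PIC is always at most equal to gene diversity because of the additional cross-product term.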

Genetic distances between pairs of inbred lines were calculated according to [83], using the PAST program. The dendrogram tree was generated with the unweighted pair group method using arithmetic averages (UPGMA) by the computational package MVSP version 3.1.
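The sketch below illustrates the general workflow of computing pairwise distances from a binary band matrix and clustering them by UPGMA. It is not the PAST or MVSP procedure used here: the Jaccard distance is an assumed stand-in, since the distance measure of [83] is not restated in this section, and the band profiles and line names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the share of bands present in both profiles among bands present in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1 - both / either


def upgma(names, dist):
    """Agglomerate clusters by average linkage.

    dist maps frozenset({a, b}) to the distance between lines a and b.
    Returns a list of (cluster_1, cluster_2, average_distance) merges in order.
    """
    def average(c1, c2):
        # UPGMA distance: mean of all pairwise distances between cluster members.
        return sum(dist[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical presence/absence band profiles for four lines.
profiles = {
    "A": [1, 1, 0, 0, 1, 0],
    "B": [1, 1, 0, 0, 0, 0],
    "C": [0, 0, 1, 1, 0, 1],
    "D": [0, 0, 1, 1, 1, 1],
}
dist = {
    frozenset((a, b)): jaccard_distance(profiles[a], profiles[b])
    for a, b in combinations(profiles, 2)
}
merges = upgma(list(profiles), dist)
```

The merge list records the average distance at which each pair of clusters joins; dendrogram software typically draws branch heights from these values.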

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2223-7747/9/9/1140/s1. Table S1: Mean performance of the 28 F1 crosses and the check hybrid SC128 for all the studied traits under the three plant densities across the two locations. Table S2: Physical and chemical soil properties for the two locations during 2018 season. Table S3: List of SSR primers and their sequences used in the present study. Figure S1: Daily maximum temperature (T max), minimum temperature (T min) and solar radiation (SRAD) for the two locations during 2018 season.

**Author Contributions:** M.M.K., M.R., K.M.I., A.S.A., M.M.E., A.M.S.K., M.A.E.-E., and E.M.H. designed the study, performed the experiments, analyzed the data and wrote the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** Faculty of agriculture, Kafrelsheikh University, Egypt, is thankfully acknowledged for carrying out this work. Tanta University in Egypt is also thankfully acknowledged for the support provided for conducting this work. The Agricultural Research Center (ARC) in Egypt and the International Maize and Wheat Improvement Center (CIMMYT), are thankfully acknowledged for providing us the seeds of the inbred lines used in this study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


#### *Plants* **2020**, *9*, 1140


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Rice Breeding in Russia Using Genetic Markers**

#### **Elena Dubina <sup>1</sup>, Pavel Kostylev <sup>2,</sup>\*, Margarita Ruban <sup>1</sup>, Sergey Lesnyak <sup>1</sup>, Elena Krasnova <sup>2</sup> and Kirill Azarin <sup>3</sup>**


Received: 17 October 2020; Accepted: 12 November 2020; Published: 15 November 2020

**Abstract:** The article concentrates on studying tolerance to soil salinization, water flooding, and blast in Russian and Asian rice varieties, as well as hybrids of the second and third generations from their crossing in order to obtain sustainable paddy crops based on domestic varieties using DNA markers. Samples IR 52713-2B-8-2B-1-2, IR 74099-3R-3-3, and NSIC Rc 106 were used as donors of the *SalTol* tolerance gene. Varieties with the *Sub1A* locus were used as donors of the flood resistance gene: Br-11, CR-1009, Inbara-3, TDK-1, and Khan Dan. The lines C101-A-51 (Pi-2), C101-Lac (Pi-1, Pi-33), IR-58 (Pi-ta), and Moroberekan (Pi-b) were used to transfer blast resistance genes. Hybridization of the stress-sensitive domestic varieties Novator, Flagman, Virazh, and Boyarin with donor lines of the genes of interest was carried out. As a result of the studies carried out using molecular marking based on PCR in combination with traditional breeding, early-maturing rice lines with genes for resistance to salinity (*SalTol*) and flooding (*Sub1A*), suitable for cultivation in southern Russia, were obtained. Introgression and pyramiding of the blast resistance genes *Pi-1, Pi-2, Pi-33, Pi-ta*, and *Pi-b* into the genotypes of domestic rice varieties were carried out. DNA marker analysis revealed disease-resistant rice samples carrying 5 target genes in a homozygous state. The created rice varieties that carry the genes for blast resistance (Pentagen, Magnate, Pirouette, Argamac, Kapitan, and Lenaris) were submitted for state variety testing. The introduction of such varieties into production will allow us to avoid epiphytotic development of the disease, preserving the biological productivity of rice and obtaining environmentally friendly agricultural products.

**Keywords:** rice; salinity; submergence tolerance; blast; SSR markers; PCR analysis

#### **1. Introduction**

Rice (*Oryza sativa* L.) is the most important food crop for more than half of the world's population (China, Japan, India, Bangladesh, etc.). Biotic and abiotic stressors are the main obstacles to increasing global crop production and expanding rice production. It was found that only about 10% of the world's agricultural land is located in areas that do not suffer from stress factors [1].

Decreased rice yields in adverse climatic conditions threaten global food security. Genetic loci that ensure productivity in difficult conditions exist in the germplasm of cultivated plants, their wild relatives, and species adapted to extreme conditions [2].

One-fifth of the world's irrigated land (North Africa, Central and South-East Asia, etc.) is adversely affected by high soil salinity [3]. About 45 million hectares in the world are subject to soil salinization [4]. In the Russian Federation, rice is grown on an area of about 200 thousand hectares, of which about 80 thousand hectares are saline [5]. The decline in productivity on saline soils can be overcome by developing rice varieties tolerant to salinity and introducing them into agricultural production. Several non-allelic genes provide tolerance to salinity during ontogenesis [6]. The main locus of salt tolerance is *SalTol*, which was first identified in some rice varieties [7,8]. This locus is mapped on chromosome 1 and its main function is to control the balance of Na<sup>+</sup>/K<sup>+</sup> ions in rice plants [9].

One of the serious abiotic stress factors for rice, which inhibits plant growth and affects crop yield, is prolonged submersion of plants under water, which often happens to large areas of land in the rice-growing regions of South-East Asia [5]. Rice dies if total flooding lasts more than two weeks. A negative effect on the growth and development of rice plants at this time is exerted by a lack of oxygen (O2) and limited diffusion of carbon dioxide (CO2). Lack of light due to turbid flood water in the rice paddies during this period limits the ability of plants to photosynthesize and can even lead to their death [4–6].

Scientists in Asia have found the *Sub1A* gene, which regulates the response of plant cells to ethylene and gibberellin, leading to restricted carbohydrate consumption and dormancy of shoots under water, which contributes to submergence tolerance [10,11]. In Russia, this gene can be used to develop varieties resistant to a deep layer of water during the germination phase, which would be an effective way to protect rice from weeds without herbicides. Most weeds die under water without oxygen, whereas rice can survive. To develop such varieties, it is necessary to combine in one genotype genes for increased energy of initial growth, the capacity for anaerobic germination, and resistance to prolonged flooding and lodging.

In all countries of the world, including Russia, blast is among the most dangerous fungal diseases of rice and causes large yield losses in the years of epiphytoty. The most effective way to protect rice without fungicides is to grow blast-resistant varieties. More than 50 genes of resistance to this pathogen are known: *Pi-1, Pi-2, Pi-33, Pi-b, Pi-ta, Pi-z*, etc. [12]. Combining several effective resistance genes with their contribution on the genetic basis of the best varieties is an effective breeding strategy for resistance to variable fungal pathogens. Lines with a combination of 3–5 resistance genes show an increase and broadening of the spectrum of blast resistance in comparison with lines with separate genes. A number of successful breeding programs have already been carried out abroad to develop blast-resistant rice varieties by the gene pyramiding method using marker breeding [13].

Resistance to various biotic and abiotic factors is difficult to assess because the breeding material can be evaluated only in the presence of the appropriate stress factor. At present, when breeding agricultural plants for resistance, the segregating population obtained by crossing resistance sources with productive genotypes is tested against a natural background, or artificial infection is carried out under controlled conditions. This procedure, although it gives excellent results, is quite lengthy and costly. In addition, there are always susceptible plants that have escaped damage [14].

The use of DNA markers allows us to speed up the assessment and conduct selection without phenotypic assessment, at an early stage, regardless of the external conditions. In recent years, great progress has been made in the development of molecular marking technologies and their application to control complex agronomic traits using marker breeding [15]. The technology of molecular marking of resistance loci makes it possible to quickly select plant forms with target genes without using provocative backgrounds [16]. The identification of molecular markers linked to genes of resistance to these factors facilitates breeding work. The use of DNA markers brings the breeding of agricultural plants to a qualitatively new level, making it possible to evaluate genotypes directly and not through phenotypic manifestations, which, ultimately, is realized in the accelerated development of varieties with a complex of valuable traits [14]. Therefore, it is relevant to develop new rice varieties by marking [17].

The purpose of the study was the development of initial rice material using DNA markers for breeding highly productive varieties resistant to biotic and abiotic environmental stress factors: soil salinity, prolonged flooding, and blast.

#### **2. Materials and Methods**

We used samples from the collection of the Institute of Agricultural Genetics (Vietnam) as donors of the transferred salt tolerance gene: IR 52713-2B-8-2B-1-2, IR 74099-3R-3-3, and NSIC Rc 106, which were crossed with the early-maturing Krasnodar variety Novator. These varieties carry the *SalTol* locus, which has been mapped near the centromeric region of the first chromosome. RM493 and RM7075 [18] were used as flanking SSR markers of this locus, as they showed the greatest difference in the length of microsatellite repeats between the parental forms.

Varieties with the *Sub1A* locus were used as donors of the flooding resistance gene: BR-11, CR-1009, Inbara-3, TDK-1, and Khan Dan. The early-ripening variety Novator and rice lines with the introgressed blast resistance genes *Pi-2* and *Pi-33* were taken as recipients. The *Sub1A* locus is mapped to an interval of 0.06 cM on chromosome 9 [11]. We used the microsatellite markers CR25K and SSR1A for the *Sub1A* gene. The *Sub1A* gene was identified by PCR-based molecular marking using specific primers.

When transferring blast resistance genes, lines C101-A-51 (*Pi-2*), C101-Lac (*Pi-1, Pi-33*), IR-58 (*Pi-ta*), and Moroberekan (*Pi-b*) were used. To identify the *Pi-1* gene, we used primer pairs of the flanking microsatellite SSR markers RM224 and RM144; for the *Pi-2* gene, RM527 and SSR140; for the *Pi-33* gene, RM310 and RM72; and for the *Pi-b* and *Pi-ta* genes, intragenic markers developed in the laboratory of biotechnology, Federal Scientific Rice Centre. The genes *Pi-1*, *Pi-2*, *Pi-33*, *Pi-b*, and *Pi-ta* are localized on chromosomes 11, 6, 8, 2, and 12, respectively (Table 1) [19,20].


**Table 1.** Nucleotide sequences of codominant markers for identification of the allelic status of genes Pi-1, Pi-2, Pi-a, and Pi-b.

The early-ripening released rice varieties Boyarin, Flagman, and Virage served as the paternal form. During plant hybridization, pneumocastration of maternal forms and pollination by the Twell method were used [21]. Hybrid plants were grown on checks of Federal State Unitary Enterprise "Proletarskoe" (Rostov region) and the Federal State Unitary Elite Seed-growing Enterprise "Krasnoe" of the Federal Scientific Rice Centre, Krasnodar region. From the selected rice leaves, genomic DNA was isolated under laboratory conditions at the Federal Scientific Rice Centre, the Academy of Biology and Biotechnology of the Southern Federal University, and the All-Russian Research Institute of Agricultural Biotechnology. PCR products were separated by electrophoresis in 2.5% agarose and 8% acrylamide gels. The experimental data were statistically processed using Microsoft Excel and Statistica 6 software.

Plant damage was recorded on the 14th day after inoculation, in accordance with the express method for assessing rice varietal resistance to blast. The assessment took two indicators into account: the type of reaction (in points) and the degree of damage (in percentages), using the ten-point scale of the International Rice Research Institute [12]:


The intensity of disease development (IDD, %) was calculated by the formula (Equation (1)):

$$\text{IDD} = \frac{\sum (a \times b)}{n \times 9} \tag{1}$$

where IDD is the intensity of disease development (%), Σ(a × b) is the sum of the products of the number of infected plants and the corresponding damage point, and n is the number of recorded plants (pcs).
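Equation (1) can be applied to a tally of plants per damage point as in the sketch below. Two assumptions are made here that the text does not state explicitly: the ratio is multiplied by 100 to express IDD in percent, and n counts all recorded plants, including those with a score of 0. The plant counts are hypothetical.

```python
def intensity_of_disease_development(counts_by_point, max_point=9):
    """IDD in percent from a mapping {damage point: number of plants}.

    Assumption: the factor 100 converts the ratio in Equation (1) to percent.
    """
    n = sum(counts_by_point.values())  # all recorded plants, healthy ones included
    weighted = sum(point * count for point, count in counts_by_point.items())
    return 100 * weighted / (n * max_point)


# Hypothetical tally: 10 plants scored 0, 5 scored 1, 3 scored 3 and 2 scored 5.
idd = intensity_of_disease_development({0: 10, 1: 5, 3: 3, 5: 2})
```

With this definition, a sample in which every plant scores 9 has an IDD of 100%, and a sample with no symptoms has an IDD of 0%.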

Depending on the damage points, all varieties were conventionally divided into 4 groups: resistant, intermediate, susceptible, and strongly susceptible.

#### **3. Results and Discussion**

The development of blast-resistant varieties and their rapid introduction into production is the most promising solution in the fight against this disease. However, the development of resistant varieties is one of the most difficult areas of breeding. The causative agent of the disease has a great potential for variability, which, combined with its colossal reproductive capacity, gives the pathogen very high adaptability. Combining several effective resistance genes in the genetic background of the best varieties widely used in production is an effective breeding strategy for long-term resistance to variable fungal pathogens.

Using marker-assisted selection (MAS), that is, breeding with DNA markers linked to genes of interest, we introduced five blast resistance genes into domestic rice varieties adapted to the agro-climatic conditions of rice cultivation in southern Russia.

A series of crosses made it possible to obtain rice lines based on the varieties Boyarin, Flagman and Virage with the introgressed and pyramided blast resistance genes *Pi-1, Pi-2, Pi-33, Pi-ta*, and *Pi-b* in a homozygous state. During all cycles of backcrossing, the transfer of the dominant alleles of each such gene in the offspring was controlled by closely linked molecular markers. Plants with no resistance alleles in the genotype were discarded.

At the first stage of work in 2005 at the Agrarian Research Center "Donskoy", six hybrids were obtained from crossing the varieties Boyarin and Virage with three donors of blast resistance carrying the *Pi-1, Pi-2*, and *Pi-33* genes. After analysis at the Federal Scientific Rice Centre, homozygous forms were identified for the dominant alleles.

At the second stage of work (2008), after crossing the *Pi-1* + *33* × Boyarin and *Pi-2* × Boyarin hybrids between themselves, it was possible to obtain forms with three pyramided genes simultaneously: *Pi-1, Pi-2*, and *Pi-33* in a homozygous state.

At the third stage of work (2010), they were hybridized with varieties—donors of the *Pi-ta* and *Pi-b* genes—for combining 5 genes. There were two types of crosses: ((*Pi-1* + *2* + *33*) × *Pi-ta*) × *Pi-b* and *Pi-1* + *2* + *33* × (*Pi-ta* × *Pi-b*).

Leaves were selected from the best F2 hybrid plants for DNA analysis at the All-Russian Research Institute of Agricultural Biotechnology and the Federal Scientific Rice Centre using one marker for each gene. Based on the analysis results, it was possible to isolate two rice samples that were homozygous for all five dominant alleles. Reanalysis of the leaves of these samples confirmed the previous year's results, i.e., homozygosity for the dominant alleles of all five loci.

Figures 1 and 2 show the panicles of two lines, 1225/13 and 1396/13, which show the presence of dominant alleles at five loci in the homozygous state: *Pi-1*, *Pi-2*, *Pi-33*, *Pi-b*, and *Pi-ta*.

**Figure 1.** Panicle of the early-ripening line 1225/13.

**Figure 2.** Panicle of the mid-ripening line 1396/13.

Line 1225/13 is early maturing (110 days to maturity) and dwarfish (80 cm), with a small panicle (15 cm) (Figure 1).

The second line, 1396/13, is mid-ripening (120 days to maturity) and taller (100 cm), with a large, long panicle (22 cm) (Figure 2).

Against the infectious background in the Federal Scientific Rice Centre, the index of disease development (IDD) in this line was only 1.4%, while the variety Novator was damaged by 67.7%. The results of the analysis made it possible to send these lines to the breeding nursery in 2014–2015 for testing for yield and blast resistance. The variety **Pentagen** (1396/13), carrying 5 genes for blast resistance, is widely used in hybridization with high-yielding Russian varieties.

In the process of work at the Federal Scientific Rice Centre in 2007–2008, crosses were carried out and F1 hybrids were obtained from the combination (Flagman × C101-Lac) × (Flagman × C-101-A-51), which have the blast resistance genes *Pi-33* and *Pi-2* in their genotypes, respectively. The resulting F1 generation was used in backcrosses with the recipient parental forms. It should be noted that the F1 plants had a high degree of sterility (up to 95%). After the first series of backcrosses in 2008, the BC1 and BC2 generations were obtained in artificial climate chambers. In BC1 populations, fertility increased and averaged about 50%. Starting from the first backcrossing, marker control was carried out for the presence of transferred donor alleles in the hybrid offspring. In 2009, plants of the BC3 and BC4 generations were obtained. Among these plants, we selected the forms with the shortest growing season and the highest panicle fertility. From the BC4F1 stage (the first self-pollination of rice plants, which makes it possible to transfer the donor allele to a homozygous state), individual selection was carried out. Segregation for the *Pi-2*, *Pi-33*, and *Sub1A* genes fit the Mendelian framework: DNA analysis of the second-generation plants gave a ratio of 1:2:1 by genotype and 3:1 by phenotype.

Plants were selected that were closest in morphotype to the recipient parental form and had donor genes for resistance to the pathogen *Pyricularia oryzae* Cav. in their genotype in a homozygous state [22].

Figure 3 shows the results of PCR analysis for identification of the *Pi-33* blast resistance gene in the BC4F3 hybrid material.

**Figure 3.** Electrophoregram of genomic DNA amplification products at the loci RM310 and RM72: 1–4, 7–12, analyzed hybrid BC4F3 plants; 5, donor line of the *Pi-33* gene C101-Lac; 6, Flagman, maternal form.

The figure shows that plants Number 2, 4, and 7–12 are homozygotes for the dominant allele; plants Number 1 and 3 are heterozygous. The size of the PCR product in varieties with the dominant allele of the Pi-33 gene, which determines the resistance, is 198 bp; in varieties with a recessive allele, it is 152 bp.
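The reading of such a codominant marker can be expressed as a simple rule on the observed band sizes. The sketch below is illustrative only: the function name and the 5 bp sizing tolerance are assumptions, while the 198 bp and 152 bp product sizes are those reported above for the dominant and recessive alleles of *Pi-33*.

```python
def call_genotype(band_sizes_bp, dominant_bp=198, recessive_bp=152, tolerance_bp=5):
    """Classify one gel lane of a codominant marker from its band sizes (bp).

    tolerance_bp is a hypothetical allowance for sizing error on the gel.
    """
    has_dominant = any(abs(b - dominant_bp) <= tolerance_bp for b in band_sizes_bp)
    has_recessive = any(abs(b - recessive_bp) <= tolerance_bp for b in band_sizes_bp)
    if has_dominant and has_recessive:
        return "heterozygous"
    if has_dominant:
        return "homozygous dominant"
    if has_recessive:
        return "homozygous recessive"
    return "no call"


print(call_genotype([198]))       # one band at the donor size
print(call_genotype([198, 152]))  # both bands present
```

A lane with no band near either expected size is returned as "no call" rather than being forced into a genotype class.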

In 2015–2016, the resulting rice lines with introgressed blast resistance genes *Pi-2* and *Pi-33* were crossed with the variety Khan Dan (bred in Vietnam), the donor of the *Sub1A* gene. This work was performed to obtain breeding material with combined genes for disease resistance and tolerance to prolonged immersion of plants under water. In 2017–2020, F4 and BC2F3 generations were obtained using climate chambers at the Federal Scientific Rice Centre (All-Russian Rice Research Institute, Krasnodar, Russia).

To increase economic efficiency and reduce labor costs, multi-primer systems have been developed to identify two genes (*Pi* and *Sub1A*) in a hybrid material simultaneously.

At the first stage, when we selected DNA markers for reliable interpretation of PCR products and identification of non-specifically amplified fragments, the following parameters were taken into account: the annealing temperature of the specific primer pairs introduced into the reaction mixture, the difference in the size of PCR products synthesized during amplification with specific primer pairs (at least 100 base pairs), and the self-complementarity of the primer sequences.

The results of testing the combination of primer pairs flanking the marker regions of the *Pi-2* + *Sub1A* genes are shown in Figure 4.

**Figure 4.** Electrophoregram of multiplex PCR of genomic DNA amplification products at the loci RM527 and SSR140 for the Pi-2 gene and at the Sub1A203 locus for the *Sub1A* gene: 1–5, 9–13, analyzed hybrid plants of the BC2F3 generation; 6, Khan Dan, donor of the *Sub1A* gene; 7, Flagman, maternal form; 8, C101Lac-A-51, donor line of the *Pi-2* gene.

The electrophoregram shows that when PCR with such a combination of molecular markers is carried out, the target products specific for DNA markers of the desired genes are reliably amplified. Samples Number 3 and 12 have dominant alleles of the genes *Pi-2* and *Sub1A* in a homozygous state in their genotype; Samples 1, 4, 5, and 9 are homozygous for the *Sub1A* gene and have the *Pi-2* gene in the genotype in a heterozygous state; Sample 10 is a recessive homozygote for two target genes and was rejected. The size of the PCR product in varieties with the dominant allele of the *Pi-2* gene, which determines the resistance, is 233 bp. The size of the PCR product in varieties with the dominant allele of the *Sub1A* gene, which determines the resistance, is 118 bp. Clear identification on the electrophoregram makes it possible to reliably identify the presence of dominant alleles of the target genes.
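One of the selection criteria for the multiplex system, a difference of at least 100 bp between the products of different primer pairs, can be checked mechanically. The following sketch is a hypothetical helper, not part of the published protocol; it uses only the two product sizes reported above (233 bp for *Pi-2* and 118 bp for *Sub1A*) and does not model annealing temperature or primer self-complementarity.

```python
from itertools import combinations


def well_separated(product_sizes_bp, min_gap_bp=100):
    """Return True if every pair of expected products differs by at least min_gap_bp."""
    return all(
        abs(a - b) >= min_gap_bp
        for a, b in combinations(product_sizes_bp.values(), 2)
    )


expected_products = {"Pi-2": 233, "Sub1A": 118}
print(well_separated(expected_products))  # the two sizes differ by 115 bp
```

The same check extends to more than two markers, since every pair of products is compared.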

The introduction of such varieties into production will allow us to avoid epiphytotic development of the disease, preserve the biological productivity of rice, and obtain environmentally friendly agricultural products.

**Magnat** is the first cultivar in Russia created at the Agrarian Research Center Donskoy together with the Federal Scientific Rice Centre by the method of marker selection from a hybrid population (C101A-51 × Boyarin) × (C101 LAC × Boyarin) with transfer of blast resistance genes. Sample C101 LAC is a donor of the genes *Pi-1* and *Pi-33*, and C101A-51 is a donor of *Pi-2*. The growing season is 125 ± 1 days and the plant height is 96 ± 2 cm. The panicle is erect and compact, 17.5 ± 0.5 cm long, and bears 185 ± 5 spikelets. The grain is oval, 8.3 ± 0.2 mm long, 3.1 ± 0.1 mm wide, and 2.2 ± 0.1 mm thick and weighs 24.0 ± 2.0 mg. The yield of the Magnat variety was 8.25 t/ha, which is 1.1 t/ha higher than that of the Boyarin standard.

The rice variety **Pirouette** was bred at the Agrarian Research Center Donskoy, together with the Federal Scientific Rice Centre, by the method of stepwise hybridization and marker breeding from a hybrid population (C101-A-51 (*Pi-2*) × Boyarin) × (C101-Lac (*Pi-1* + *33*) × Virage). It contains three blast resistance genes: *Pi-1, Pi-2,* and *Pi-33.* The variety is mid-ripening; the growing season from flooding to full ripeness is 124 ± 1 days. The average yield of the variety Pirouette was 9.57 t/ha, which is 1.13 t/ha higher than that of the standard variety Yuzhanin. Plant height is 88 ± 2 cm; the panicle is erect, compact, and 17.5 ± 0.5 cm long and carries 165 ± 5 spikelets. The spikelets are oval, 8.9 ± 0.2 mm long, and 3.7 ± 0.1 mm wide. The weight of 1000 grains is 31.6 ± 2.0 g. The variety is resistant to lodging and shedding, is cold-tolerant, and germinates well from under a layer of water. It has been included in the Register of Breeding Achievements of the Russian Federation for the North Caucasus region since 2020.

The rice variety **Kapitan** was bred at the Agrarian Research Center Donskoy in cooperation with the Federal Scientific Rice Centre by the method of triple backcrossing and marker breeding from the Flagman × IR-36 hybrid population. The variety is mid-ripening and the growing season from the flooding to full ripeness is 120 ± 1 days. On average, over the years of competitive testing, the yield of the variety Kapitan was 8.13 t/ha, which is 0.64 t/ha higher than that of the variety Yuzhanin. A higher yield of this variety is formed due to more grain in the panicle and an increased weight of the caryopsis. The average height of plants is 90 ± 2 cm; the panicle is semi-inclined, compact, and 18.5 ± 0.5 cm long; and the average number of spikelets is 140 ± 10 pieces (Figure 5). The grains are oval, 9.5 ± 0.2 mm long, and 3.6 ± 0.1 mm wide. The average weight of 1000 grains is 35.0 ± 2.1 g. The variety carries the *Pi-ta* gene and is resistant to blast, lodging, and shedding. The variety has been under state testing since 2019.

**Figure 5.** Rice panicles of the variety Kapitan.

The rice variety **Argamak** was bred at the Agrarian Research Center Donskoy by individual multiple selection of plants with the largest panicles from a hybrid population Il. 14 (*Pi-1, Pi-2, Pi-33*) × Kuboyar. The variety belongs to the mid-ripening group, and the growing season from flooding to full ripeness is 119 days. On average, over the years of competitive testing (2017–2019), the yield of the variety was 8.79 t/ha, which is 1.59 t/ha higher than that of the variety Yuzhanin. The maximum yield was formed in 2019: 10.1 t/ha, 2.55 t/ha more than the standard. The high yield of this variety was formed due to the greater grain content of the panicle than that of the standard and the increased density of the stem. Plant height is 93 ± 2 cm on average; the panicle is erect, compact, and 16 ± 0.5 cm long; the average number of spikelets is 142 ± 6 pieces. The grains are oval, 8.4 ± 0.2 mm long, and 3.3 ± 0.1 mm wide. The weight of 1000 grains is 31.1 ± 1.9 g. The variety is resistant to blast, lodging, and shedding. It has been in state varietal testing since 2020.

The rice variety **Lenaris** (Federal Scientific Rice Centre) showed high adaptability, resistance to lodging, and suitability for direct combine harvesting. Its yield was 10.6 t/ha. Plants had high spikelet fertility and short stems (77 ± 5 cm) and were also resistant to the Krasnodar population of *P. oryzae*. The panicle is slightly drooping and compact; its length is 18 ± 1.0 cm. The mass of 1000 grains is about 30.4 ± 1.8 g.

In 2013–2014, the Agrarian Research Center Donskoy conducted crosses and obtained F1–F2 hybrids of the variety Novator with Asian donor rice varieties carrying the *SalTol* and *Sub1A* genes. The hybrids of the second generation varied significantly in terms of quantitative traits: growing season (from early ripening to non-flowering), plant height (75–122 cm), panicle length (14–25 cm), number of filled grains (80–206 pcs), number of spikelets (99–300 pcs), panicle density (4.4–16.6 pcs/cm), 1000-grain weight (26.3–34.9 g), grain weight per panicle (2.1–5.5 g), etc.

Hybridization of the salt-sensitive domestic variety Novator with the lines IR52713-2B-8-2B-1-2, IR74099-3R-3-3, and IR61920-3B-22-2-1 (NSIC Rc 106), which are *SalTol* locus donors, was carried out. The first generation of hybrids was used to generate an F2 hybrid population. From the populations of plants of the second generation, 90 early-ripening samples with well-ripened grains (30 in each combination of crossing) were selected, which were analyzed by PCR for the presence of introduced *SalTol* alleles. As an example, Figure 6 shows the data of electrophoretic analysis of PCR products with the RM 493 marker. The donor allele of the parental line NSIC Rc 106, designated as 2.2, was found in a homozygous state in Sample 282. The rest of the plants, whose amplification spectra are presented in this electrophoregram, carried the alleles of the donor and the variety Novator; that is, they were heterozygous for the *SalTol* locus (Figure 6). Similar results were obtained during DNA analysis of the studied rice samples with the RM 7075 marker (Figure 7).

**Figure 6.** Electrophoregram of genomic DNA amplification products with RM 493: 1.1, Novator; 1.2, NSIC Rc 106; 17–296, hybrid plants NSIC Rc 106 × Novator; DNA marker (100–1500 bp).

**Figure 7.** Electrophoregram of genomic DNA amplification products with RM 7075: 1.1, Novator; 2.1, NSIC Rc 106; 17–286, hybrid plants NSIC Rc 106 × Novator; DNA marker (100–1500 bp).

In general, according to the results of DNA analysis of F2 hybrids, 17 plants homozygous for the dominant allele of the *SalTol* locus were identified, 29 samples carried *SalTol* in a heterozygous state, and 44 plants showed only recessive alleles inherited from the variety Novator.

Segregation for *SalTol* genes did not fit into the Mendelian framework, since the sample was unrepresentative due to selection. Plants with recessive alleles of the gene prevailed, and the number of salt-tolerant dominant homozygotes was less than the expected number. This is due to the linkage of *SalTol* genes with genes unfavorable for plants in our conditions: photosensitivity, late maturity, spikelet shedding, and spinosity.
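The size of the deviation from the expected 1:2:1 genotypic ratio can be quantified with a chi-square goodness-of-fit statistic. The sketch below is an illustration added for clarity and is not an analysis reported by the authors; it uses the F2 counts given above (17 dominant homozygotes, 29 heterozygotes, 44 recessive homozygotes) and the standard tabulated critical value of 5.991 for two degrees of freedom at the 0.05 level. Because the plants were pre-selected for earliness, the result describes the selected sample, not the full F2 population.

```python
def chi_square(observed, ratio):
    """Chi-square goodness-of-fit statistic for counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Dominant homozygotes, heterozygotes, recessive homozygotes among 90 F2 plants.
saltol_counts = [17, 29, 44]
stat = chi_square(saltol_counts, [1, 2, 1])

CRITICAL_DF2_P05 = 5.991  # tabulated value, 2 degrees of freedom, alpha = 0.05
print(round(stat, 2), stat > CRITICAL_DF2_P05)
```

The statistic is far above the critical value, which agrees with the statement that the observed segregation does not fit the Mendelian expectation.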

Testing plants under salinity in the early stages of development is a quick, common method based on simple criteria. It was shown that at the initial vegetation stage, the length of the root and shoots and seed germination are potential indicators of resistance to the effects of increased salt concentrations [18,19]. Evaluation of the potential salt tolerance of the studied rice hybrids and their parental forms revealed significant variations in salinity tolerance depending on the genotype. The greatest decrease in seed germination—52%—was found in the salt-sensitive variety Novator. The line NSIC Rc 106 and second-generation plants, which were homo- and heterozygous for the *SalTol* locus, showed the highest resistance by seed germination (germination decrease of less than 5%). The donor lines IR52713-2B-8-2B-1-2, and IR74099-3R-3-3 and hybrid combinations obtained on their basis with the *SalTol* gene in a homozygous state also showed high resistance for this trait.

The least suppression of growth indices, as well as in the case of seed germination, was recorded in the lines NSIC Rc 106, IR52713-2B-8-2B-1-2, IR74099-3R-3-3, and *SalTol* homozygous plants from the F2 generation; the greatest decrease in the length of shoots and roots under salt stress was shown in the variety Novator and in hybrid plants that did not inherit the *SalTol* locus according to molecular analysis data. Thus, DNA analysis made it possible to simplify the breeding scheme and obtain salt-tolerant F2 hybrids carrying the *SalTol* locus in a homozygous state. These results indicate that the developed codominant markers of the *SalTol* locus RM 493 and RM 7075 are an effective tool for marker-assisted selection of salt-tolerant forms based on domestic rice genotypes.

In 2018–2020, rice samples carrying *SalTol* genes were studied in a control nursery and in competitive variety testing, and productive forms were identified.

At the same time, in 2013, hybrids were obtained by crossing the variety Novator with donors of the *Sub1A* gene. The Asian varieties turned out to be late-ripening and photosensitive and did not flower under our conditions. Hybridization was carried out only with the help of artificial climate chambers. The first generation in 2013 was characterized by a high degree of sterility (90–95%) and brown color of the flowering scales during maturation, which indicates significant genetic differences between the parental forms. In the second generation in 2014, a very wide range of segregation was observed in terms of the growing season, plant height, panicle length and shape, number of spikelets, and spinosity (Table 2).

**Table 2.** Variations in the quantitative traits in F2 hybrids from crossing submergence-resistant samples with the variety Novator, 2014.


Note: The average value is indicated in brackets.

This wide range of variability is not observed in other crops. This is due to the genetic and ecological-geographical remoteness of the crossed forms. In each combination, about 400 plants were selected for morphometric and genetic analysis. Among the F2 hybrids, we managed to select the best plants according to many traits, combining early maturity, optimal plant height, good grain size in panicles, non-shattering, and fertility of spikelets (Table 3).


**Table 3.** Selected F2 hybrid plants from crossing submergence-resistant samples with Novator, 2014.

Note \*: (BxN), BR 11 × Novator; (CxN), CR-1009 × Novator; (IxN), Inbara 3 × Novator; (TxN), TDK-1 × Novator.

PCR analysis of leaves was carried out in 20 plants of each of the four hybrids, as a result of which, forms with the *Sub1A* flood resistance gene were isolated. The electrophoretic analysis of PCR products with the RM 7481 marker is shown in Figure 8. The donor allele of the parental line CR-1009 was homozygous in Samples 2, 3, 5, 9, 13 and 17. Plants 2, 4, 6–8, 10, 11, 16, 18, and 19 were heterozygous at the *Sub1A* locus; that is, they carried both the alleles of the donor and the recessive alleles inherited from the variety Novator. Thus, according to the results of PCR analysis with the RM 7481 marker, 14 homozygotes of F2 plants at the *Sub1A* locus were identified, 40 samples carried *Sub1A* in a heterozygous state, and 22 plants inherited only the recessive allele from the variety Novator.

Of the analyzed BR-11 × Novator hybrid plants, the Sub1A gene (in homo- and heterozygous state) was present in nine, i.e., in a ratio of 9:11, although with monohybrid segregation, it should have been 15:5. In the hybrid combination CR-1009 × Novator, F2 segregated in a ratio of 18:2, i.e., almost all of the selected plants had the *Sub1A* gene. In the hybrids Inbara-3 × Novator and TDK-1 × Novator, segregation took place in a ratio of 14:6 or approximately 3:1, i.e., close to Mendelian.
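The comparison with the expected 15:5 (3:1) ratio can be made explicit with a chi-square statistic for each cross. This is a rough illustrative sketch rather than the authors' analysis: with only 20 plants per cross the smaller expected class is 5, so the chi-square approximation is coarse, and the plants were selected rather than sampled at random. The critical value 3.841 is the standard tabulated value for one degree of freedom at the 0.05 level.

```python
CRITICAL_DF1_P05 = 3.841  # tabulated value, 1 degree of freedom, alpha = 0.05


def chi_square_3_to_1(carriers, non_carriers):
    """Chi-square statistic for carrier : non-carrier counts against 3:1."""
    total = carriers + non_carriers
    expected_carriers = 0.75 * total
    expected_non_carriers = 0.25 * total
    return ((carriers - expected_carriers) ** 2 / expected_carriers
            + (non_carriers - expected_non_carriers) ** 2 / expected_non_carriers)


# Plants with and without Sub1A among the 20 analyzed per cross.
crosses = {
    "BR-11 x Novator": (9, 11),
    "CR-1009 x Novator": (18, 2),
    "Inbara-3 x Novator": (14, 6),
    "TDK-1 x Novator": (14, 6),
}
for name, (with_gene, without_gene) in crosses.items():
    stat = chi_square_3_to_1(with_gene, without_gene)
    print(f"{name}: chi2 = {stat:.2f}, exceeds critical value: {stat > CRITICAL_DF1_P05}")
```

Under this simple test only the 9:11 ratio exceeds the critical value; the 18:2 ratio departs from 3:1 in the opposite direction but, with 20 plants, does not reach it.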

The deviations in segregation of the two combinations can be explained by the influence of selection and gene linkage. A total of 55 plants with the target gene in the homo- and heterozygous state were isolated from 80 plants of four hybrids. In 2015, the selected samples with the *Sub1A* gene were reproduced at the Federal State Unitary Enterprise "Proletarskoye" of the Rostov Region, where the best F3 plants were selected from them for DNA analysis.

In F3 plants, significant morphological and biological segregation continued. Significant variation was noted for the growing season, plant height, size of panicles and caryopses, fertility, grain shedding, etc. The best forms were selected from them and leaves were taken for DNA analysis. At the next stage of work, in 2016–2020, constant lines carrying the *Sub1A* gene in a homozygous state were selected and tested for yield and resistance to prolonged flooding. As a result, rice varieties for herbicide-free technologies will be developed that vigorously overcome a deep layer of water in the germination phase with minimal seed loss.

#### **4. Conclusions**

1. As a result of the studies carried out using molecular marking based on PCR in combination with traditional breeding, early-maturing rice lines with genes for resistance to salinization (*SalTol*) and to flooding (*Sub1A*), which are suitable for cultivation in the south of Russia, were isolated.

2. Rice lines have been developed, the genotype of which contains five effective blast resistance genes (*Pi-1, Pi-2, Pi-33, Pi-ta,* and *Pi-b*). The introduction of such varieties into production will allow us to avoid epiphytotic development of the disease, preserving the biological productivity of rice and obtaining environmentally friendly agricultural products.

3. Samples of the F4 and BC2F3 generations were obtained that combine, in the homo- and heterozygous state, genes for blast resistance (*Pi*) and for tolerance to prolonged flooding (*Sub1A*), the latter as a factor in weed control; this was confirmed by DNA analysis. Testing the obtained rice breeding resources for submergence tolerance under laboratory conditions made it possible to select tolerant rice forms that will be studied in the breeding process for a complex of agronomically valuable traits. Their use will reduce the use of chemical plant protection products against diseases and weeds, thereby improving the ecological status of the rice-growing industry.

The research was carried out with the financial support of the Kuban Science Foundation in the framework of the scientific project № 20.1/1.

**Author Contributions:** Conceptualization, P.K.; methodology, E.D. and K.A.; validation, P.K.; formal analysis, E.D.; investigation, E.D., P.K., M.R., S.L. and E.K.; resources, E.D.; writing—original draft preparation, E.D., P.K. and K.A.; project administration, E.D.; funding acquisition, E.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Federal Scientific Rice Centre.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Nanopore RNA Sequencing Revealed Long Non-Coding and LTR Retrotransposon-Related RNAs Expressed at Early Stages of Triticale Seed Development**

**Ilya Kirov 1,2,\*, Maxim Dudnikov 1,2, Pavel Merkulov 1, Andrey Shingaliev 1, Murad Omarov 1,3, Elizaveta Kolganova 1, Alexandra Sigaeva 1, Gennady Karlov <sup>1</sup> and Alexander Soloviev <sup>1</sup>**


Received: 24 November 2020; Accepted: 15 December 2020; Published: 17 December 2020

**Abstract:** The intergenic space of plant genomes encodes many functionally important yet unexplored RNAs. The genomic loci encoding these RNAs are often considered "junk" DNA, as they are frequently associated with repeat-rich regions of the genome. The latter makes the annotation of these loci and the assembly of the corresponding transcripts using short RNAseq reads particularly challenging. Here, using long-read Nanopore direct RNA sequencing, we aimed to identify these "junk" RNA molecules, including long non-coding RNAs (lncRNAs) and transposon-derived transcripts expressed during early stages (10 days post anthesis) of seed development of triticale (AABBRR, 2*n* = 6*x* = 42), an interspecific hybrid between wheat and rye. Altogether, we found 796 lncRNAs and 20 LTR retrotransposon-related transcripts (RTE-RNAs) expressed at this stage, with most of them being previously unannotated and located in the intergenic as well as intronic regions. Sequence analysis of the lncRNAs provides evidence for the frequent exonization of class I (retrotransposons) and class II (DNA transposons) transposon sequences and suggests a direct influence of "junk" DNA on the structure and origin of lncRNAs. We show that the expression patterns of lncRNAs and RTE-related transcripts have high stage specificity. In turn, almost half of the lncRNAs located in Genomes A and B have the highest expression levels at 10–30 days post anthesis in wheat. Detailed analysis of the protein-coding potential of the RTE-RNAs showed that 75% of them carry open reading frames (ORFs) for a diverse set of GAG proteins, the main component of virus-like particles of LTR retrotransposons. We further experimentally demonstrated that some RTE-RNAs originate from autonomous LTR retrotransposons with ongoing transposition activity during early stages of triticale seed development.
Overall, our results provide a framework for further exploration of the newly discovered lncRNAs and RTE-RNAs in functional and genome-wide association studies in triticale and wheat. Our study also demonstrates that Nanopore direct RNA sequencing is an indispensable tool for the elucidation of lncRNA and retrotransposon transcripts.

**Keywords:** long non-coding RNAs; seed development; Nanopore sequencing; retrotransposons; triticale

#### **1. Introduction**

Long non-coding RNAs (lncRNAs) are a diverse set of RNAs longer than 200 nt with no or very little coding potential. Traditionally, lncRNAs are considered to be protein non-coding, although many of them carry small open reading frames and encode functional peptides in different plants [1]. A broad range of functions has been described for lncRNAs in plants, including acting as miRNA sponges, protein scaffolding, and post-transcriptional regulation of target genes via antisense pairing followed by mRNA decoy [2]. Based on their genomic localization relative to other genes, lncRNAs are broadly grouped into different classes: (1) lincRNAs or long non-coding intergenic RNAs; (2) NAT lncRNAs or natural anti-sense lncRNAs; (3) intronic lncRNAs, located in the introns; and (4) sense lncRNAs [3]. LncRNAs have been identified in many plant species and their expression in response to various abiotic and biotic stresses has been investigated, although our knowledge about lncRNA functions is still very limited [4–9]. However, the catalogue of lncRNAs for some plants with complex polyploid and repeat-rich genomes, including wheat (*Triticum aestivum* L.), remains largely unexplored [10].

Intergenic space and introns are frequent sources of lncRNAs, and these genomic regions are enriched in insertions or remnants of transposable elements (TEs) [11]. Corroborating this, lncRNA sequences are more often associated with TEs than protein-coding genes and the bulk of them possess TE-related sequences [11–16]. For example, up to 75% of human lncRNAs have at least one exon with sequences originating from TEs [12,17]. A similar trend was demonstrated in some plant species, including maize, where 65% of lncRNAs had similarities to TEs [18]. More intriguingly, the TE-derived sequences may trigger the origin of new lncRNAs, providing a positive feedback loop with the evolution of lncRNAs [11,19]. TE-derived lncRNAs can have important and conserved biological functions [20]. For example, the rice lncRNA MIKKI is derived from LTR retrotransposons and has been shown to sequester miR171 and prevent degradation of its targets, mRNAs of SCARECROW-Like (SCL) transcription factors, in roots [20]. It is important to note that TEs can become transcriptionally and transpositionally active under certain circumstances, including stressful conditions, and in some developmental stages (e.g., microsporogenesis) and tissues (e.g., developing endosperm) [21–23]. However, individual TE transcripts and their coding potential have only been studied episodically in plants. The association of lncRNAs with repeat sequences like TEs makes the annotation of many lncRNAs challenging because of the ambiguity in the mapping of repeat-derived short RNAseq reads. RNAseq reads with multiple mapping positions in the genome sequence are frequently discarded from further analysis. Although some tools have been developed so far to overcome these obstacles, most of the lncRNA identification pipelines still ignore ambiguously mapped reads [24,25].
Thus, long-read sequencing technologies provide a great opportunity for transcriptome exploration, including the identification of transcribed repeats (e.g., transposable elements) or repeat-related transcripts [26]. Moreover, Panda and Slotkin [26] showed that by using Nanopore long cDNA reads, it is even possible to trace the expression of individual TEs from multicopy families in *Arabidopsis* and maize. The application of long-read technologies to the exploration of the lncRNA repertoire in plants has been demonstrated for several plant species, including *Oryza sativa* L. ssp. *japonica* [27], *Populus simonii* [28], *P. qiongdaoensis* [29], poplar "Nanlin 895" [30], *Trifolium pratense* [31], *Cardamine violifolia* [32], and *Vigna angularis* [33]. These studies demonstrated that long-read sequencing can be used to obtain a comprehensive catalogue of lncRNAs.

Grain development is one of the most important and practically relevant biological processes. It involves massive biochemical, physiological, and transcriptomic changes [34–36]. Wheat grain development is divided into five stages: (i) undifferentiated embryo and cellularization of the endosperm (0–7 days post anthesis (dpa)); (ii) differentiation of the embryo with formation of the main cell types (transfer cells, aleurone, starchy endosperm and the surrounding cells) (7–14 dpa); (iii) root and leaf primordia differentiation, full kernel development and the milk-ripe stage (14–21 dpa); (iv) further growth and differentiation of primary and lateral roots, and the dough stage of endosperm (21–31 dpa); (v) fully differentiated embryo (31–50 dpa) [34]. The transcriptome during wheat seed development has been extensively studied using RNAseq [35,37–43]. Global transcriptome analysis of developing seeds has shown that the expression of protein-coding genes is highly dynamic. Recently, Madhavan et al. (2020) used publicly available Illumina RNAseq reads and identified 443 lncRNAs expressed during the grain filling stage (14 and 30 dpa) [44]. It is currently unknown which types of lncRNA are expressed during other stages of wheat seed development.

Here, we used Nanopore long-read sequencing to discover lncRNAs, TE transcripts, and TE-related lncRNAs that are specifically expressed during the cell proliferation stage of seed development (10 dpa) in triticale (× *Triticosecale* Wittmack, AABBRR genome, 2*n* = 6*x* = 42), an interspecific hybrid between wheat and rye (*Secale cereale* L.). We identified 796 lncRNAs and 20 LTR retrotransposon-derived transcripts, with most of them being previously unannotated. The majority of the detected retrotransposon RNAs had a single intron, carried open reading frames (ORFs) encoding a diverged set of GAG proteins, and were encoded by potentially autonomous and non-autonomous retrotransposons. The lncRNAs were also expressed during wheat seed development and had high stage specificity. Moreover, we found that lncRNA loci were biased toward frequent TE sequence exonization and were mainly located in the intergenic regions of A and B genomes of triticale. Our work explored the lncRNA landscape during the early stage of wheat and triticale seed development and provides a unique dataset for further functional studies of lncRNA and TEs, and their implications for seed development. Finally, the identified lncRNAs can be further incorporated into genome-wide association studies for marker-assisted improvement of the bread quality of modern triticale genotypes.

#### **2. Material and Methods**

#### *2.1. Plant Material and DNA Isolation*

For this study, the spring triticale line "L8665" obtained from the Department of Genetics, Russian State Agrarian University, was used. For DNA isolation, seeds of this line were germinated in the dark at room temperature for 5–7 days, and genomic DNA was isolated using the cetyltrimethylammonium bromide (CTAB) protocol [45].

#### *2.2. Sample Collection and RNA Isolation*

Plants of the spring triticale line "L8665" were grown in a greenhouse under natural light conditions. Developing seeds at 10 days post anthesis and flag leaves were dissected and placed into liquid nitrogen. RNA was isolated using the ExtractRNA kit (Evrogen, Moscow, Russia) following the manufacturer's instructions. The RNA concentration and integrity were estimated by Nanodrop (Nanodrop Technologies, Wilmington, DE, USA) and gel electrophoresis using a 1.2% agarose gel with ethidium bromide staining.

#### *2.3. RT-PCR*

For RT-PCR, RNA was treated with DNase I (Qiagen, Hilden, Germany, Q-79254) following the manufacturer's instructions and used for cDNA synthesis. cDNA was synthesized using the MMLV RT kit (Evrogen, Moscow, Russia). Primers used for RT-PCR and the expected fragment lengths are listed in Table 1. The CDC (Cell division control protein, AAA-superfamily of ATPases; Ta54227) gene was used as a reference because of its high expression stability, as shown by a previous study [46].


**Table 1.** Primers used for RT-PCR.

The PCR conditions were 94 °C for 1 min; 35 cycles of 94 °C for 1 min, 58 °C for 1 min, and 72 °C for 1 min; and a final elongation at 72 °C for 3 min.

#### *2.4. Nanopore Direct RNA Sequencing and Transcript Assembly*

Poly-A+ mRNA was purified from 100 μg of total RNA using the Dynabeads mRNA DIRECT Kit (ThermoFisher Scientific, Waltham, MA, USA) following the manufacturer's instructions. The final poly-A+ RNA concentration was measured using a Quantus Fluorometer (Promega Corporation, Madison, WI, USA) and checked by gel electrophoresis. For Nanopore sequencing, a library was prepared from 1 μg of poly-A+ RNA using the Nanopore Direct RNA Sequencing Kit SQK-RNA002 (Oxford Nanopore Technologies, Oxford, UK). Direct RNA Sequencing (DRS) was carried out using a MinION and flow cell FLO-MIN106. Basecalling was performed by Guppy (Version 4.0.11).

For transcript assembly, sequences of the A and B genomes and unanchored scaffolds (Un) of *Triticum aestivum* [48] downloaded from the EnsemblPlants server (https://plants.ensembl.org/Triticum\_aestivum/Info/Index) were combined with *Secale cereale* genome sequences [49] into a single fasta file. Reads were mapped to the obtained fasta file by minimap2 software [50] with the '-ax splice' argument. The obtained sam file was converted to a bam file by samtools [51] (samtools view -Sb), followed by sorting of the bam file by bamtools [52]. Transcript assembly was performed by StringTie2 [53] with the following arguments: -L -j 2 -f 0.05. The obtained gtf file was converted to gff format by the gffread tool [54]. The sorted bam and gff files were used for read mapping visualization in a locally installed JBrowse [55].

To obtain a high-confidence set of transcripts, we extracted transcript sequences from the StringTie2 gtf assembly using the gffread tool [54]. The reads were then mapped back to the transcripts using minimap2 (settings: -ax map-ont), and a bam file containing only primary alignments with mapping quality ≥ 30 was obtained using samtools with the -F 256 -q 30 -b parameters. This bam file was used to count the number of reads per transcript, and transcripts with > 5 DRS reads were selected.
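The read-support filter described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the alignments are available as plain SAM text lines, and the function names `count_primary_reads` and `high_confidence` are hypothetical. Unmapped records are skipped explicitly here, which is an assumption added for safety.

```python
from collections import Counter


def count_primary_reads(sam_lines, min_mapq=30):
    """Count primary alignments per reference sequence.

    Mirrors the idea of `samtools view -F 256 -q 30`: secondary
    alignments (flag 256) and alignments with MAPQ below `min_mapq`
    are dropped. Unmapped records (flag 4) are also skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:  # malformed record
            continue
        flag, ref, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 4 or flag & 256:
            continue
        if mapq < min_mapq:
            continue
        counts[ref] += 1
    return counts


def high_confidence(counts, min_reads=5):
    """Return transcripts supported by more than `min_reads` reads."""
    return sorted(t for t, n in counts.items() if n > min_reads)
```

In practice the same counts would more likely be obtained directly from the filtered bam file; the sketch only makes the two thresholds (MAPQ and read support) explicit.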

#### *2.5. Long Non-Coding RNA Prediction*

To identify lncRNAs, all high-confidence transcripts with a length of >200 bp were selected using biopython [56]. Transcripts with open reading frame (ORF) lengths of >300 bp predicted by getorf were filtered out. Protein coding potential was calculated by three tools: LncFinder [57], PLEK [58], and CNCI [59]. LncFinder [57] was run in RStudio Version 1.2.1335 (http://www.rstudio.com/) with R version 3.6.0. The following parameters were utilized: parallel.cores = 20, SS.features = TRUE, format = "DNA", frequencies.file = "wheat", svm.model = "wheat". PLEK [58] and CNCI [59] were run with the default settings. Only transcripts identified by all three tools as non-coding and without similarity to Pfam domains were classified as lncRNAs.
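The combined filter can be sketched as follows. This is an illustrative simplification, not the published workflow: the ORF finder only scans ATG-to-stop ORFs on the forward strand (getorf behaves differently), and the per-tool calls and the Pfam flag are assumed to have been computed elsewhere. The names `longest_orf_len` and `is_lncrna_candidate` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_len(seq):
    """Length in nt (stop codon included) of the longest ATG..stop ORF
    in the three forward frames. Simplified stand-in for getorf."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def is_lncrna_candidate(seq, tool_calls, has_pfam_hit,
                        min_len=200, max_orf=300):
    """Apply the four criteria: length, ORF length, consensus of the
    coding-potential tools, and absence of Pfam similarity.

    `tool_calls` maps a tool name to "coding" or "noncoding".
    """
    return (
        len(seq) > min_len
        and longest_orf_len(seq) <= max_orf
        and all(call == "noncoding" for call in tool_calls.values())
        and not has_pfam_hit
    )
```

Requiring agreement of all tools is the conservative choice: a transcript flagged as coding by any single predictor is excluded.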

To identify lncRNAs with exons overlapping the annotated TEs, we intersected CLARITE TE annotation (https://urgi.versailles.inra.fr/download/iwgsc/IWGSC\_RefSeq\_Annotations/v1.0/) with the genomic coordinates of the assembled exons.
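The intersection step can be sketched in plain Python. The sketch assumes 1-based inclusive coordinates (as in GFF files) and uses the >50 bp overlap threshold reported in Section 3.2; the function names and the tuple layout are hypothetical, and a dedicated interval tool would be preferable for genome-scale annotations.

```python
def overlap_len(a_start, a_end, b_start, b_end):
    """Overlap (bp) of two 1-based inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def te_overlapping_exons(exons, tes, min_overlap=50):
    """Map each exon name to the TEs it overlaps by more than
    `min_overlap` bp. `exons` and `tes` are lists of
    (chrom, start, end, name) tuples."""
    tes_by_chrom = {}
    for chrom, start, end, name in tes:
        tes_by_chrom.setdefault(chrom, []).append((start, end, name))

    hits = {}
    for chrom, start, end, name in exons:
        for te_start, te_end, te_name in tes_by_chrom.get(chrom, []):
            if overlap_len(start, end, te_start, te_end) > min_overlap:
                hits.setdefault(name, []).append(te_name)
    return hits
```

The nested loop is quadratic per chromosome, which is acceptable for a few hundred lncRNA exons but not for a full annotation.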

#### *2.6. Retrotransposon-Related Transcript Annotation*

LTR retrotransposons (RTEs) were predicted in the genome using LTRharvest 1.5.10 with the default parameters [60] and LTRdigest 1.5.10 [61] with the following parameters: -aaout yes -pptlen 10 30 -pbsoffset 0 3 -pdomevalcutoff 0.001. Hidden Markov model (HMM) profiles of RTE domains were downloaded from the GyDB database [62]. The gff3 file from the LTRdigest analysis was processed by a custom Python script (https://github.com/Kirovez/LTR-RTE-analysis/blob/master/LtrDiParser\_v2.2.py) to extract sequences of LTR retrotransposons possessing similarity to any RTE domains including GAG, reverse transcriptase, RNase H, aspartic protease, and integrase. To identify retrotransposon-related transcripts, the TEsorter tool (parameters: -eval 0.00001 -db gydb) was run with the set of all high-confidence transcripts. The transcripts with a similarity to RTE domains were manually checked in the locally installed JBrowse [55]. TEsorter data were also used for RTE classification. We also ran TEsorter with the high-confidence transcripts and the RExDB [63] database (-db rexdb) to identify the transcripts of Class II transposons, but no such transcripts were detected.

#### *2.7. GAG Protein Analysis*

To find GAG proteins, ORFs were predicted for RTE transcripts, followed by a BLASTp search against the corresponding proteins. GAG proteins were aligned by MAFFT [64] with the standard parameters and a phylogenetic tree was constructed using iTOL [65]. The multiple alignment visualization was carried out in Jalview version 2.11.1.3 [66]. RNA-binding motifs (CX2CX4HX4C, where X is any amino acid) were identified by a custom Python script (https://github.com/Kirovez/LTR-RTE-analysis/blob/master/RBM\_GAG\_screen.py).
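The motif search can be expressed as a regular expression. The following is a minimal sketch of the idea, not the custom script referenced above; the function name `find_rbm` is hypothetical.

```python
import re

# CX2CX4HX4C: Cys, any 2 residues, Cys, any 4 residues, His,
# any 4 residues, Cys.
RBM_PATTERN = re.compile(r"C.{2}C.{4}H.{4}C")


def find_rbm(protein):
    """Return (0-based start, matched motif) for every
    non-overlapping CX2CX4HX4C match in a protein sequence."""
    return [(m.start(), m.group())
            for m in RBM_PATTERN.finditer(protein.upper())]
```

A protein for which `find_rbm` returns an empty list would be counted as lacking the RNA-binding motif under this definition.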

#### *2.8. Gene Ontology Enrichment*

Gene ontology (GO) enrichment analysis was performed using ShinyGO v0.61 [67] (http://bioinformatics.sdstate.edu/go/) with a false discovery rate (FDR) of <0.01.

#### *2.9. Expression Analysis*

For RNA-Seq analysis of lncRNA and RTE transcript expression in different organs and developmental stages, publicly available data were used (Table 2). Reads were mapped to the de novo assembled transcriptome by Hisat2 [68] with the default options. The obtained alignment files were used to calculate RPM (reads per million reads) values for every transcript. For this purpose, we used Salmon v0.8.1 [69] and the quant command with the default parameters.
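For reference, the normalization units used in this study follow the standard definitions, which can be written as a short sketch. This is not how Salmon computes its estimates; it only illustrates the arithmetic behind RPM and RPKM from raw per-transcript read counts, and the function names are hypothetical.

```python
def rpm(counts):
    """Reads per million: count / total mapped reads * 1e6."""
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: n * 1e6 / total for t, n in counts.items()}


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: RPM divided by the
    transcript length expressed in kilobases."""
    per_million = rpm(counts)
    return {t: per_million[t] / (lengths_bp[t] / 1000.0) for t in counts}
```

RPKM additionally corrects for transcript length, which matters when transcripts of very different lengths are compared within one sample.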


**Table 2.** Publicly available RNAseq data used in this study.

#### *2.10. Extrachromosomal Circular DNA Isolation*

Extrachromosomal circular DNA (eccDNA) was isolated and amplified according to the protocol of Lanciano et al. [72] with several modifications. Briefly, 5 μg of genomic DNA was treated with Plasmid-Safe ATP-Dependent DNase (Epicentre, Madison, WI, USA) for 48 h according to the manufacturer's instructions. DNA was precipitated with 0.1 volume of 3 M sodium acetate and 2.5 volumes of absolute ethanol, followed by overnight incubation at −20 °C. After centrifugation, the eccDNA pellet was subjected to a rolling circle amplification (RCA) reaction using the Illustra TempliPhi 100 Amplification Kit (GE Healthcare, Chicago, IL, USA) for 65 h at 28 °C. Detection of the eccDNA of LTR retrotransposon TaeST2.45518.1 (named '*MIG*', location in wheat genome: 7B: 312,336,869 ... 312,341,902 (5 kb)) was performed by inverse PCR with specific primers: Forward: CACACCACTAGCAACCTCCA; Reverse: TGCTTGTGACAAGATGGGCA. The PCR conditions were 94 °C for 1 min; 35 cycles of 94 °C for 1 min, 58 °C for 1 min, 72 °C for 1 min; and a final elongation at 72 °C for 3 min.

#### *2.11. Statistics and Visualization*

Statistical analysis was done in RStudio Version 1.2.1335 (http://www.rstudio.com/) with R version 3.6.0. Bar plots, density plots, and box plots were drawn by ggplot2 [73]. Heatmaps were constructed by the ComplexHeatmap [74] R package.

#### **3. Results**

#### *3.1. Direct Oxford Nanopore RNA Sequencing*

Total RNA was isolated from whole spikes of hexaploid triticale (AABBRR) cv. L8665 at 10 days post anthesis (dpa). This RNA was used for direct RNA sequencing by MinION (Oxford Nanopore). In total, 1,100,000 direct RNA sequencing (DRS) reads with N50~1.1 kb were obtained (Figure 1). To assemble the transcripts, reads were mapped to an artificial genome sequence created by combining the A and B genome sequences of the wheat chromosome-level assembly with the rye draft genome contigs. Overall, 82,785 transcripts from 74,904 loci were assembled, with 47,378 and 26,169 loci located in Genomes A/B (AB loci) and R, respectively. A total of 36,490 transcripts had >5 mapped DRS reads, representing a high-confidence set of transcripts that was used for further analysis.
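For readers unfamiliar with the read N50 statistic quoted above, a minimal sketch of its standard definition follows: the length L such that reads of length at least L contain half of all sequenced bases. The function name `n50` is hypothetical, and sequencing summary tools may differ in edge-case handling.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths."""
    if not lengths:
        raise ValueError("at least one read length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

Unlike the mean, N50 is weighted by bases rather than by reads, so it is less sensitive to a large number of very short reads.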

To estimate the triticale seed development phase used for Nanopore sequencing, we determined the expression values (reads per million mapped reads (RPM)) of the key genes involved in starch biosynthesis (expression started at Phase 1), the genes of storage proteins (high molecular weight glutenins and gliadins), and those of the *wbm* protein (Table 3), which are expressed during the grain filling stage. We identified the expression of starch biosynthesis genes, while no genes of storage proteins or the *wbm* protein were expressed. This suggests that the seeds used for Nanopore sequencing were in the early stages of seed development (before 14 dpa).

Thus, using direct Nanopore RNA sequencing, we assembled a high-confidence transcriptome of triticale seed at the early development stage and detected the expression of key genes known to be involved in the biological process (starch biosynthesis) occurring at this stage.

#### *3.2. Long Non-Coding RNA Prediction*

The assembled high-confidence set of transcripts was used for long non-coding RNA (lncRNA) prediction. The following criteria were applied to distinguish lncRNAs from protein-coding transcripts: (1) transcript length of >200 bp; (2) ORF length of <300 bp; (3) classification as non-coding by all three tools, LncFinder [57], PLEK [58], and CNCI [59]; and (4) no similarity to any Pfam domains. Using these criteria, we identified 796 triticale lncRNAs (Supplementary Files S1 and S2) encoded by 780 loci in Genomes A (167 lncRNAs), B (212 lncRNAs), and R (410 lncRNAs) and in unanchored wheat scaffolds (seven lncRNAs) (Figure 1A). Most of the lncRNAs had lengths of <1000 bp (Figure 1B). LncRNAs (386 transcripts) encoded by the loci of Genomes A and B or unanchored wheat scaffolds (AB lncRNAs) were used for further analysis because of the significantly better annotation of these genomes compared with the R genome. Intersection of the genome positions of the AB lncRNA loci with lncRNA and mRNA loci previously annotated in the A or B wheat genomes showed that 281 (73%) of the triticale AB lncRNAs are located in previously unannotated genomic regions (Figure 1C). This proportion is significantly higher (Fisher's Exact Test, *p*-value < 2.2 × 10<sup>−16</sup>) than that for non-lncRNA AB transcripts (10%, 2523), pointing to the underexplored nature of lncRNA loci. Moreover, only 13% (106) of the AB lncRNAs in our dataset were previously known.

**Figure 1.** Identification and classification of triticale long non-coding RNAs (lncRNAs). (**A**) Bar plot showing the number of triticale lncRNA loci located on different chromosomes of Genomes A, B, and R and unanchored wheat contigs (Un). Sc indicates loci mapped to *Secale cereale* contigs. (**B**) Density plot of the lncRNA length distribution. (**C**) Pie graph showing the portion of the AB lncRNAs located in the unannotated genomic regions. (**D**) The portion of genes encoding AB lncRNAs and other RNAs (not classified as lncRNAs) that have exons with transposable elements. Three stars (\*\*\*) indicate significant differences based on Fisher's Exact Test for Count Data, p-value < 0.001. (**E**) RT-PCR with specific primers on three genic lncRNAs of distinct types (**F**), and RNA isolated from developing grain (10 days post anthesis (dpa)) and flag leaves. CDC: reference gene (the cell division control protein). (**F**) Types of genic lncRNAs and the corresponding wheat genes.



LncRNAs are frequently associated with transposons. We found that 111 (29%) AB lncRNAs had exons containing transposon sequences with a length of >50 bp (Figure 1D). This was significantly more than expected by chance (10%, Fisher's exact test, *p*-value < 2.2 × 10<sup>−16</sup>). Of the TE-related lncRNAs, 61, 47, and 3 lncRNAs had exons with similarity to Class I, Class II, and unclassified TEs, respectively.

The AB lncRNAs were classified according to their position relative to the annotated wheat protein-coding genes, resulting in 17 genic lncRNAs. Gene ontology analysis revealed that the genes overlapping with genic lncRNAs were significantly enriched (FDR < 0.01) in several Gene Ontology categories including "vesicle-mediated transport", "lipid modification", "ATPase activity", and "hydrolase activity". The genic lncRNAs were of different types, with intronic-anti-sense (two lncRNAs, Figure 1F (top)), exonic-anti-sense (five lncRNAs, Figure 1F (middle)), and exonic-sense (nine lncRNAs, Figure 1F (bottom)) transcripts being the most common types. In addition, exonic (sense and anti-sense) lncRNAs were found. The expression of three genic lncRNAs belonging to distinct types (depicted in Figure 1F) was estimated in developing grain (10 dpa) and flag leaves by RT-PCR (Figure 1E). The RT-PCR results suggested that two lncRNAs (lnc001 and lnc003) were expressed in both samples, while one lncRNA (lnc002) was expressed only in developing grain (10 dpa).

Altogether, we identified hundreds of previously unknown genic and intergenic lncRNAs of triticale and showed that they frequently possess the remnants of Class I and Class II transposable elements.

#### *3.3. AB lncRNAs Are Prone to Tissue-Specific Expression*

To find lncRNAs with possible specific roles during seed development, we used wheat RNAseq data to estimate lncRNA expression in several seed developmental stages, leaf tissues, and pistils of wheat.

A non-zero expression value in at least one condition was obtained for 351 AB lncRNAs. To estimate any biases in AB lncRNA expression compared with all high-confidence transcripts assembled from DRS reads, we calculated the tissue-specificity index (TSI). The results showed that the TSI was significantly higher for lncRNAs (Wilcoxon rank sum test with continuity correction, *p*-value < 2.2 × 10<sup>−16</sup>), suggesting high tissue specificity of lncRNA expression (Figure 2A). We further found that 46% of the lncRNAs had their highest level of expression at 10, 20 or 30 dpa, with 107 lncRNAs having their maximum expression level at 10 dpa; this is in accordance with the stage of our triticale sample (10 dpa) used for RNA isolation (Figure 2B). Moreover, we identified 95 AB lncRNAs for which >90% of the sum of RPKM (reads per kilobase per million reads) values across all the samples was accounted for by the 10, 20, and 30 dpa stages (Figure 2C).
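The two summary statistics used in this section can be sketched as follows. The exact TSI formula is not given in the text; the sketch assumes the widely used tau index, which also ranges from 0 (uniform expression) to 1 (expression in a single sample), so it should be read as one plausible definition rather than the one applied here. The sample labels and function names are hypothetical.

```python
def tau_index(values):
    """Tau tissue-specificity index over expression values
    (one value per tissue or stage). Returns None when the
    transcript is not expressed in any sample."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two samples are required")
    top = max(values)
    if top <= 0:
        return None
    return sum(1 - v / top for v in values) / (n - 1)


def seed_fraction(expr, seed_samples=("10dpa", "20dpa", "30dpa")):
    """Fraction of the summed expression contributed by the seed
    stages. `expr` maps a sample label to an RPKM value."""
    total = sum(expr.values())
    if total == 0:
        return None
    return sum(expr.get(s, 0.0) for s in seed_samples) / total
```

Under this sketch, a lncRNA would be treated as seed-enriched when `seed_fraction` exceeds 0.9, matching the >90% criterion stated above.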

Thus, the expression pattern of the identified lncRNAs was found to be tissue-specific, with almost half of the lncRNAs demonstrating the maximum expression level during seed development.

#### *3.4. Retrotransposon-Related Transcripts Encoding GAG Proteins Are Expressed during Early Seed Development in Triticale*

Early embryonic and endosperm development may be accompanied by the activation of transposable elements (TEs) [22]. Therefore, we analyzed the assembled transcripts for the presence of open reading frames encoding TE-related proteins. We focused on the transcripts of Genomes A/B because of the high quality of wheat genome assembly and annotation compared with the rye genome. No transcripts corresponding to DNA transposons were detected. However, we found 20 transcripts (RTE-RNAs) carrying a single ORF with similarities to distinct proteins of LTR retrotransposons. Surprisingly, we found no transcripts encoding the full set of RTE proteins (GAG and POL). To check whether any RTE-RNAs were encoded by LTR retrotransposons with detectable LTR sequences (RTEs), we predicted RTEs in Genomes A and B. The results showed that 5 and 10 RTE-RNAs (15) were transcribed from full-length (potentially autonomous) or non-complete (one or more RTE domains were not detected while both LTRs were present) RTEs, respectively. Thus, almost 25% of the RTE-RNAs were found to be transcribed from potentially autonomous RTEs. For five RTE-RNAs, no associated RTEs were predicted. In total, 75% (15) of the RTE-RNAs were found to carry an ORF encoding a single GAG protein (GAG-RNAs). Of those, eight (53%) and five (33%) GAG-RNAs were encoded by full-length (Figure 3A) and non-complete copies of LTR retrotransposons, respectively. For two (14%) GAG-RNAs, no corresponding RTEs were identified (Table 4). It should be noted that four GAG-RNA genes were found to be located in the introns of three annotated protein-coding genes in the sense or anti-sense orientation, including TraesCS2B02G261900 (sense and anti-sense), TraesCS5A02G298800 (anti-sense), and TraesCS1B02G222500 (sense). Two GAG-RNA genes (TaeST2.11597.1 and TaeST2.11598.1) were found to be located in the introns of the same gene (TraesCS2B02G261900).

**Figure 2.** Expression patterns of triticale AB lncRNAs based on wheat RNAseq data. (**A**) Boxplot of the tissue-specificity index (TSI) for lncRNAs and all AB high-confidence transcripts assembled from direct RNA sequencing (DRS) reads (TSI values close to 1 represent high tissue specificity). Stars indicate significant differences estimated by the Wilcoxon rank sum test with continuity correction (*p*-value < 2.2 × 10<sup>−16</sup>). (**B**) Bar plot showing the number of AB lncRNAs with the maximum expression (reads per kilobase per million reads, RPKM) level at a specific stage of wheat development (10, 20, and 30 days post anthesis (dpa)) or tissue (leaves and pistils). (**C**) Heatmap of expression values (log(RPKM)) of AB lncRNAs with >90% of RPKM values based on wheat RNAseq data (10, 20, and 30 dpa; pistils and leaves).

**Figure 3.** Identification, phylogenetic analysis, and expression pattern of RTE-related transcripts. (**A**) Full-length LTR retrotransposons of Tork (TaeST2.19707.1) and Retrofit (TaeST2.45518.1) lineages expressing a short isoform encoding the GAG protein (shGAG). The dark blue rectangles show ORFs encoding all retrotransposon proteins (genome scheme) and the GAG protein (isoform scheme). Orange color indicates untranslated regions (UTRs). The Nanopore direct RNA read alignment on the LTR retrotransposon genome sequence is also shown. (**B**) Neighbor-joining phylogenetic tree built from 15 GAG proteins. Red and gray highlight the GAG proteins expressed from RTEs without one or more encoded proteins and from loci without predicted RTEs, respectively. Red stars indicate GAG proteins with <300 aa. The vertical red line indicates a group of GAGs encoded by truncated retrotransposons possessing only GAG ORFs. (**C**) Multiple alignment of Ty1/Copia GAG proteins. Green and red highlight variable regions between the Tork and Retrofit regions. Gray shows the RNA-binding motif (CX2CX4HX4C, where X is any amino acid). (**D**) Heatmap of the log(RPKM) expression of isoforms encoding GAG proteins in wheat leaves and pistils, and during seed development.


**Table 4.** Number of retrotransposon-related transcripts (RTE-RNAs) in different groups based on completeness of associated LTR retrotransposons (RTEs) and the type of predicted RTE proteins encoded by open reading frames (ORFs) (reverse transcriptase, RT; aspartic protease, AP; RNAse H).

We further focused on GAG-encoding RTE-RNAs (GAG-RNAs) as the most represented group. Oxford Nanopore RNA sequencing provides a unique opportunity to analyze the exon–intron structure of GAG-RNA-encoding loci and predict the deduced full GAG protein sequences. Classification of the deduced GAG proteins showed that 93% (14) and 7% (1) of them belong to Ty1/Copia and Ty3/Gypsy elements, respectively. Based on the information from the transcript assembly, we grouped all GAG-RNAs into three categories based on the number of introns they possessed: (1) a single intron, (2) two introns, and (3) no introns. The vast majority (87%, 13) of GAG-RNAs were found to carry a single intron (Figure 3A), while one GAG-RNA from Ty3/Gypsy had no introns, showing that the exon–intron structure may differ between the two LTR retrotransposon superfamilies. To understand the functional role of splicing in the generation of GAG-RNA transcripts, we also predicted ORFs for unspliced RNA variants corresponding to the regions between two LTRs. We observed that unspliced transcripts expressed from the whole RTEs had significantly longer ORFs, resulting in GAG proteins being fused with other RTE proteins. Additionally, for two GAG-RNA encoding RTEs (TaeST2.19707.1 and TaeST2.45518.1, Figure 3A), only one very long ORF (>4000 bp) was predicted, while other RTEs were found to have two or three ORFs encoding distinct proteins. Thus, we showed that the splicing of the GAG-RNA isoform is critical to ensure the production of the entire GAG protein.

We next performed a comparison and phylogenetic analysis of the amino acid sequences of the 15 GAG proteins. The multiple alignment revealed significant differences between the single Ty3/Gypsy GAG (TaeST2.45979, GAG length 514 aa) and the Ty1/Copia GAG proteins. We compared the Ty1/Copia GAG sequences and found that Ty1/Copia GAGs originated from two RTE lineages, Tork (seven sequences) and Retrofit (seven sequences). Phylogenetic tree analysis revealed three and two groups of highly similar (up to 99%) GAG proteins in the Retrofit and Tork lineages, respectively (Figure 3B). In addition, we detected pronounced sequence divergence between the GAG proteins of the Tork and Retrofit lineages. The GAG proteins of the Retrofit lineage have a specific C-terminal region of ~50 aa that is not found in the Tork GAG sequences (Figure 3C). In turn, Tork GAGs have specific amino acid sequences before the RNA-binding motif (RBM) site. These differences are also reflected in the phylogenetic tree's topology, where the branches corresponding to the Tork and Retrofit Ty1/Copia lineages are readily distinguishable (Figure 3B).

In addition, we noticed that the divergence of groups within a single lineage correlated well with the completeness of the RTEs expressing the corresponding GAG-RNAs; the most diverged group of GAGs belonged to the Tork lineage and was encoded by truncated retrotransposons (Figure 3B) that had a single ORF encoding GAG.

We then analyzed the GAG protein sequences in more detail. In particular, we checked for the presence of the RNA-binding motif (RBM) (CX2CX4HX4C, where X is any amino acid), a specific part of GAG proteins that is responsible for GAG–RNA interactions. We found that the RBM could be identified in all except three GAG proteins, including one Ty3/Gypsy (TaeST2.45979.1) and two Ty1/Copia elements (TaeST2.14377.1, TaeST2.16660.1) of the Tork lineage. The Ty1/Copia GAGs without the CX2CX4HX4C motif (TaeST2.14377.1 and TaeST2.16660.1) are truncated GAG proteins with smaller protein lengths (175 aa and 182 aa vs >300 aa for the full-length Ty1/Copia GAG protein) and probably very limited functionality. Altogether, the results of the phylogenetic analysis suggest that a divergent set of GAG proteins, including truncated GAGs with no RNA-binding capacity, is expressed during triticale seed development.

We then estimated the expression patterns of the GAG-RNAs in several seed developmental stages and in leaf tissues and pistils of wheat. Nine of the 15 GAG-RNA loci had a specific expression pattern with maximum expression levels at early developmental stages or in pistils. For two GAG-RNA loci (TaeST2.14377.1 and TaeST2.44075.1), the expression level in wheat was very low, suggesting a possible triticale-specific expression pattern (Figure 3D). Thus, the expression data showed that most of the identified triticale GAG-RNAs were also expressed during wheat seed development, and some RTEs expressed genomic RNA (gRNA) as well as a short isoform (shGAG) carrying an ORF for the GAG protein.

Overall, our results suggest that tens of transcripts encoded by full-length and truncated LTR retrotransposon copies are expressed at early stages of triticale seed development. Three-fourths of these RNAs carry ORFs encoding a set of GAG proteins of variable length and phylogenetic diversity.

#### *3.5. A Full-Length Ty1*/*Copia LTR Retrotransposon Is Active in Triticale Seeds*

To transpose in the genome, RTEs need to express the full-length genomic RNA (gRNA). Because Nanopore sequencing did not detect gRNA for the RTEs expressing the shGAG RNAs TaeST2.19707.1 (RTE3B, location in wheat genome: 3B:555,156,557 ... 555,163,131 (6.58 kb)) and TaeST2.45518.1 (named '*MIG*', location in wheat genome: 7B:312,336,869 ... 312,341,902 (5 kb)) (Figure 3A), we performed RT-PCR with primer pairs designed to detect (a) shGAG isoforms and (b) gRNA isoforms. RT-PCR detected both shGAG and gRNA in developing triticale seeds (10 dpa) and flag leaves, although the gRNA expression level was lower (Figure 4A). Next, we assessed whether the RTEs were capable of transposing. To answer this question, we tested whether these RTEs generate extrachromosomal circular DNA (eccDNA) using inverse PCR. EccDNAs are byproducts of RTE activity in plants [71]. We first checked whether inverse PCR with the designed primers produced PCR products with genomic DNA. The primers for RTE3B did produce PCR fragments with genomic DNA; therefore, the activity of RTE3B could not be assessed by inverse PCR. We continued eccDNA detection only for *MIG* (Supplementary File S3). For this, we enriched the eccDNA fraction by exonuclease treatment of total genomic DNA using an enzyme that specifically cuts linear DNA while leaving circular DNA molecules intact. The product was then amplified by rolling circle amplification, and inverse PCR was carried out. We enriched eccDNA in genomic DNA isolated from developing grain at 10 dpa, as well as glume and lemma tissue. Specific products were detected only for eccDNA isolated from developing grain; no products were obtained with eccDNA from glume and lemma tissues. Thus, our results showed that the RTE *MIG* expresses both shGAG RNA and gRNA isoforms and forms eccDNA, which is consistent with transposition activity.

Here, we provide experimental evidence suggesting that some detected RTE-RNAs originate from autonomous LTR retrotransposons with ongoing transposition activity in triticale at early stages of seed development.

**Figure 4.** Expression and extrachromosomal circular DNA (eccDNA) formation. (**A**) RT-PCR detection of the shGAG (a short isoform carrying ORFs for GAG protein) and gRNA isoforms of RTE3B and *MIG* full-length RTEs. CDC: reference gene (the cell division control protein). (**B**) Inverse PCR with genomic DNA and eccDNA-enriched fractions obtained from developing seeds (10 dpa) and glume and lemma triticale tissues. The positions of the primers on eccDNA are shown in the small scheme on the right. PCR with shGAG primers with total and eccDNA-enriched DNA was used as a control.

#### **4. Discussion**

#### *4.1. A Set of Intergenic lncRNAs Detected by Nanopore Sequencing Is Expressed during the Early Stage of Triticale Seed Development*

Wheat seed development is a dynamic multistage process that involves significant changes in the transcriptome landscape. Here, we uncovered the lncRNA transcriptome of triticale seed at 10 days post anthesis (dpa), corresponding to the second stage of grain development, known as embryo differentiation (7–14 dpa) [34]. This stage is characterized by the formation of Type A starch granules and the expression of genes involved in starch biogenesis [36]. In turn, the genes encoding the main storage proteins (the high molecular weight (HMW) glutenins and gliadins [36]) and the genes involved in storage protein biogenesis (e.g., wbm [75]) are expressed during the mid-development stage (14–21 dpa). In agreement with this, we detected the expression of the starch metabolism genes (Table 3), while expression of the storage protein genes (HMW glutenins and gliadins) and the wbm gene (found recently in the analyzed line (L8665) [76]) was not detected by Nanopore read analysis. These results indicate that the analyzed triticale transcriptome corresponds to the early stage of grain development.

Our analysis showed that hundreds (796) of lncRNAs were expressed during this stage. Surprisingly, we found that 87% of the A and B lncRNAs were expressed from as yet unannotated regions of Genomes A and B. LncRNAs are often underrepresented in plant genome annotation. For example, 8009 lncRNAs were previously identified in the intergenic regions of barley [77], and 1760 unannotated lncRNAs were identified in foxtail millet [78]. The annotation of the lncRNAs expressed from the intergenic space can be challenging because of the biological properties of lncRNAs, including their high tissue- and stage-specificity. Indeed, we identified a high tissue-specificity index for triticale lncRNAs, suggesting that a large number of lncRNAs are expressed during a narrow time window. Additionally, lncRNAs often possess exons containing transposon-similar multicopy sequences that make the transcript assembly difficult because of ambiguity in short RNAseq read mapping. Here, we found that almost 30% of the triticale lncRNAs possessed exons with similarity to TEs. This is in accordance with previous reports on rice [79], where 73% of lncRNAs were found to overlap with different TEs. Furthermore, 9.18% of sunflower lncRNAs have TE-related exons [80]. Consequently, RNAseq-based lncRNA identification can underestimate the number of lncRNAs in a cell or lead to transcript misassembly, because short reads from the repeat regions are often mismapped or discarded from the analysis [25]. Notably, however, the study of plant lncRNAs and transposon-derived transcripts has mostly been limited to short-read RNAseq data. Here, to avoid lncRNA identification biases caused by the short length of RNAseq reads, we applied Nanopore direct RNA sequencing. This approach allowed us to precisely determine the transcribed regions in a complex wheat genome. In addition, Nanopore direct RNA sequencing is strand-specific and can be used to identify natural anti-sense lncRNAs. The application of third-generation sequencing could help to illuminate "the dark side" of developing seed transcriptomes involving lncRNAs and transposon-derived loci, thereby overcoming the obstacles in lncRNA discovery by short RNAseq reads. It will be especially useful for crop plants, where lncRNA loci can be included in genome-wide association studies (GWAS) to determine which of them influence key agronomical traits. Notably, lncRNA loci have not been widely used for GWAS analysis so far [81]. We believe that the triticale lncRNAs identified in this study could act as valuable new targets for marker-assisted selection.

#### *4.2. Transcripts Encoding Diverse GAG Proteins Are Expressed during the Early Stage of Seed Development*

Seed development is accompanied by epigenetic relaxation, which may trigger retrotransposon (RTE) activity [21–23]. Here, we identified 20 transcripts with similarities to RTE proteins. Interestingly, we found that 75% of the RTE-related transcripts carry ORFs encoding GAG proteins, the main component of the RTE virus-like particle. It is known that during their lifecycle, RTEs have to produce significantly more GAG proteins than other RTE proteins. To ensure that there are excess GAG proteins compared with other RTE proteins, an *Arabidopsis* RTE called *EVD* encodes a special isoform (shGAG) encoding the GAG protein [82]. Because almost no systematic studies of RTE transcripts have been carried out at the single isoform level in crop plants, it was not previously clear whether shGAG transcript production is a common pattern for plant species. In this study, we detected isoform production for several full-length triticale RTEs and showed their transcription in wheat, implying that shGAG is a very common pattern for plants. Moreover, we showed that one of the full-length RTEs, TaeST2.45518.1, is capable of producing extrachromosomal circular DNAs (eccDNA), which are a byproduct of RTE activity and have been used to isolate transpositionally active RTEs [74]. Whether this RTE is active in the triticale embryo and can produce copies that are transmitted to the next generation or whether it is active in other tissues of developing grains (e.g., endosperm) warrants further investigation.

Based on the knowledge of splicing patterns of GAG-encoding isoforms, we were able to predict the GAG protein sequence and analyze it in more detail. Our results point to the existence of a divergent group of GAG proteins expressed during triticale and wheat seed development. These GAG proteins and the RTEs encoding them have several distinct features: (1) the RTEs encoding these GAG proteins are non-autonomous elements and possess no similarity to POL proteins; (2) the lengths of most of these GAG proteins (169–270 aa) are less than that of the conventional GAG protein (>300 aa); and (3) half of these GAG proteins lack the RNA-binding motif and cannot interact with RTE RNAs. The elements encoding GAG-RNA consist of two LTRs and the internal part is similar to the GAG protein. This structure makes these elements very similar to the previously described TR-GAG elements (terminal repeat with GAG domain) [83] found in many angiosperm species. However, the TR-GAG elements described in the current paper are classified as Ty1/Copia, while GAG proteins of previously identified TR-GAG elements are similar to both the Ty1/Copia and the Ty3/Gypsy superfamilies. While further functional and evolutionary studies are required, we suggest that these GAG loci are intermediate products of RTE diversification or "domestication". The "domestication" of GAG proteins has been documented in animals and insects [84–87]. Because of the short lengths of these GAG proteins, it can also be suggested that they may be involved in control of the activity of functional RTEs via incorporation into their virus-like particles. The mechanism of copy number control of RTEs via virus-like particle (VLP) misassembly, which is caused by a truncated GAG form known as a dominant-negative factor, has been well described for yeast [88,89]. No such examples have been described in plants.
In the future, it will be intriguing to check whether TR-GAG proteins are capable of interacting with normal GAG proteins of full-length RTEs, which, as we showed here, are expressed in the same developmental stage.

Previously, Nanopore long-read sequencing of transcriptomes was used to annotate expressed transposons in *Arabidopsis* mutants lacking key systems of TE suppression, resulting in elevated expression of transposons [26]. The authors detected the expression of almost 1300 TEs. However, elucidation of transposon expression in a "wild-type" genetic background would provide a unique opportunity to trace the natural evolutionary forces shaping the plant retrotranscriptome and to more deeply understand the features of host–transposon interactions. Recent studies on maize and sunflower [26,90] and our current results point to the great advantage of Nanopore RNA sequencing to decipher RTE expression in crop plants, even those with complex genomes such as triticale. Together with the growing number of publicly available long-read RNA sequencing datasets, this opens a broad avenue for studies of transposon expression in plants at the isoform level.

#### **5. Conclusions**

Here, using Nanopore direct RNA sequencing, we identified hundreds of previously unknown lncRNAs and LTR retrotransposon-derived transcripts that are expressed in the early stages of triticale and wheat seed development. We showed that triticale lncRNAs often possess similar sequences to transposons and their expression has high stage and tissue specificity, with half of the lncRNAs having the highest expression level at 10–30 days post anthesis in wheat. In addition, we found that most of the detected retrotransposon-related RNAs have a single intron, carry ORFs encoding for a divergent set of GAG proteins, and are encoded by potentially autonomous and non-autonomous retrotransposons. Of these, we identified one Ty1/Copia LTR retrotransposon that produces extrachromosomal circular DNA, and we suggest that it has transposition activity in developing triticale seeds. Finally, this study identified a unique set of lncRNAs and LTR retrotransposons expressed in the early stages of seed development, which we believe will be useful for further exploration of their functional potential and the association with phenotypic variation in triticale and wheat.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2223-7747/9/12/1794/s1, Supplementary File S1: Genomic location and overlapped genes for 976 triticale lncRNAs; Supplementary File S2: Fasta sequences of 796 lncRNAs; Supplementary File S3: Sequence of the RTE7B LTR retrotransposon that is expressed and produces eccDNA in triticale developing seed at 10 dpa.

**Author Contributions:** Conceptualization, I.K. and A.S. (Alexander Soloviev); methodology, I.K.; formal analysis, I.K., M.D., P.M., A.S. (Andrey Shingaliev), M.O., E.K., and A.S. (Alexandra Sigaeva); writing—original draft preparation, I.K.; writing—review and editing, I.K., A.S., and G.K.; funding acquisition, A.S. and G.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Ministry of Education and Science of Russian Federation (Goszadanie No 0431-2019-0005).

**Conflicts of Interest:** The authors declare no conflict of interest.

**Data Availability Statement:** Nanopore data produced for this study are available in Sequence Read Archive (SRA) NCBI under Bioproject Accession PRJNA683988.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Factors Influencing Genomic Prediction Accuracies of Tropical Maize Resistance to Fall Armyworm and Weevils**

**Arfang Badji 1,\*, Lewis Machida 2, Daniel Bomet Kwemoi 3,\*, Frank Kumi 4, Dennis Okii 1, Natasha Mwila 1, Symphorien Agbahoungba 5, Angele Ibanda 1, Astere Bararyenya 1, Selma Ndapewa Nghituwamhata 1, Thomas Odong 1, Peter Wasswa 1, Michael Otim 3, Mildred Ochwo-Ssemakula 1, Herbert Talwana 1, Godfrey Asea 3, Samuel Kyamanywa <sup>1</sup> and Patrick Rubaihayo <sup>1</sup>**


**Abstract:** Genomic selection (GS) can accelerate variety improvement when training set (TS) size and its relationship with the breeding set (BS) are optimized for prediction accuracies (PAs) of genomic prediction (GP) models. Sixteen GP algorithms were run on phenotypic best linear unbiased predictors (BLUPs) and estimators (BLUEs) of resistance to both fall armyworm (FAW) and maize weevil (MW) in a tropical maize panel. For MW resistance, 37% of the panel was the TS, and the BS was the remainder, whilst for FAW, random-based training sets (RBTS) and pedigree-based training sets (PBTSs) were designed. PAs achieved with BLUPs varied from 0.66 to 0.82 for MW-resistance traits, and for FAW resistance, 0.694 to 0.714 for RBTS of 37%, and 0.843 to 0.844 for RBTS of 85%, and these were at least two-fold those from BLUEs. For PBTS, FAW resistance PAs were generally higher than those for RBTS, except for one dataset. GP models generally showed similar PAs across individual traits whilst the TS designation was determinant, since a positive correlation (R = 0.92\*\*\*) between TS size and PAs was observed for RBTS, and for the PBTS, it was negative (R = 0.44\*\*). This study pioneered the use of GS for maize resistance to insect pests in sub-Saharan Africa.

**Keywords:** prediction accuracy; mixed linear and Bayesian models; machine learning algorithms; training set size and composition; parametric and nonparametric models

**Citation:** Badji, A.; Machida, L.; Kwemoi, D.B.; Kumi, F.; Okii, D.; Mwila, N.; Agbahoungba, S.; Ibanda, A.; Bararyenya, A.; Nghituwamhata, S.N.; Odong, T.; et al. Factors Influencing Genomic Prediction Accuracies of Tropical Maize Resistance to Fall Armyworm and Weevils. *Plants* **2021**, *10*, 29. https://dx.doi.org/10.3390/plants10010029

Received: 13 July 2020 Accepted: 14 September 2020 Published: 24 December 2020

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Insect damage on maize plants and stored grains potentially impedes food security in Africa [1–3]. The fall armyworm (FAW) and stem borers in the field, and the maize weevils (MWs) in storage facilities, are some of the most devastating insect pests on the continent. These insect pests cause yield losses ranging from 10–90% leading to loss of grain marketability, and consumer health concerns due to the possible contamination of the grain with mycotoxins, such as aflatoxins [3–6]. In Africa, tremendous efforts were made during the last two decades to build host plant resistance to insect pests in maize through traditional pedigree (phenotypic)-based selection (PS) with substantial desirable results. Several Africa-adapted maize lines were developed and successfully tested for resistance to MW damage on grains [7–12]. Some of the success stories are from the International Center for Maize and Wheat Improvement (CIMMYT) of Kenya through the Insect Resistant Maize for Africa project (IRMA) that produced several storage pest and stem borer resistant maize lines [7,8,13–15]. On the other hand, the FAW is a new pest on the continent, first reported in 2016 in West and Central African countries [16], from where it spread throughout the African continent [17]. Hence, although efforts to develop FAW resistant varieties are underway at several institutions, including CIMMYT, published reports of FAW resistant varieties are not yet available [18,19].

The complex nature of insect resistance traits makes PS slow and expensive, and thus, difficult to implement, especially for resource-constrained breeding programs [20]. Application of traditional marker-assisted selection (MAS) is hampered by the necessity to first discover resistance-associated genomic regions through genetic linkage and genome-wide association mapping methods, both of which have several shortcomings, especially for complex traits [21–23]. In addition, genetic linkage and genome-wide association mapping studies have seldom been explored in African germplasm [8,24], which further impedes the application of MAS in the development of insect-resistant maize germplasm in Africa. In a previous study, we discovered several quantitative trait nucleotides and genes that are putatively associated with FAW and MW resistance, confirming the quantitative nature of these traits, hence the difficulty in improving these traits through MAS [25]. An alternative to both PS and MAS is genomic selection (GS), which uses whole-genome markers to perform genomic prediction (GP) of breeding values of unphenotyped genotypes, from which one can select superior candidate genotypes for crossing to produce hybrids or to advance to the next generation [26]. GS was reported to achieve up to threefold annual genetic gain in maize improvement when compared to MAS, due to a more efficient accounting of trait-associated quantitative trait loci (QTL), faster selection cycles, and lower phenotyping costs [27–33].

Several statistical and machine learning GP models with various strengths and weaknesses have been developed to adapt to different contexts that are partly influenced by the genetic architecture of traits (number and effect size of QTL, proportions of additive and non-additive genetic effects) and reproductive classes of plants (allogamous vs. autogamous vs. clonally propagated) [34–36]. Therefore, to effectively implement GS in crop improvement programs, it is necessary to employ a holistic approach to determine the best GP strategy for particular breeding targets for given crop species [31,37]. Statistical models for GS vary in their prior assumptions and treatment of marker effects [31]. Parametric models focus on parameter estimates rather than prediction, while nonparametric algorithms give priority to prediction and have fewer assumptions [38]. Some parametric methods assume the SNP effects follow a normal distribution with equal variance for all loci, which seems unrealistic in practice.

Representative parametric methods are ridge regression best linear unbiased predictors (RR-BLUP) [39] and genomic BLUP (GBLUP) [40]. GBLUP was the first GP method to be developed, and it replaced the traditional pedigree-based relationship matrix with a genomic information-based matrix to improve prediction accuracies (PAs) [41]. Parametric methods BayesA [26] and weighted Bayesian shrinkage regression (wBSR) [42], on the other hand, consider a prior distribution of effects with a higher probability of moderate to large effects. Parametric models such as BayesB [43] and BayesCπ [44] assume that the effects of some SNPs are zero. The Bayesian least absolute shrinkage and selection operator (Bayesian LASSO) assumes that the effects of all markers follow a double exponential distribution [45], whilst the Bayesian sparse linear mixed model (BSLMM), a parametric method developed by Zhou et al. [46], combines the hypotheses of both GBLUP and Bayesian methods and achieves higher PAs than BayesCπ and BayesLASSO. Nonparametric or semi-parametric approaches such as random forest and reproducing kernel Hilbert space (RKHS) [47,48] are better suited for accounting for non-additive genetic effects [37,38], in contrast with parametric genomic prediction models [23,38,47,49].

Several studies compared the performances of GP models under different conditions. In a simulation study, Meuwissen et al. [26] found that while GBLUP achieved PAs of up to 73.2%, BayesA and BayesB comparatively provided additional increases of around 9% and 16%, respectively. However, when a population is composed of close relatives and the target traits are controlled by several small effect genes, the different methods perform similarly [50–52]. On the contrary, BayesB and BayesCπ are better when dealing with distant relatives and traits affected by a small number of large-effect loci [23]. Kernel methods such as RKHS are robust in predicting non-additive effects and in solving complex multi-environment multi-trait models [53,54]. Compared to the above-mentioned parametric methods, machine learning techniques such as support vector regression (SVR), multilayer perceptron, and convolutional neural network models performed poorly in some studies [55,56]. However, there are also instances where RKHS outperformed one or several of the parametric methods, for instance, GBLUP, rrBLUP, and Bayesian algorithms, for several traits in several crops including maize [51,57–59]. These results were most likely because nonparametric GS models capture more adequately the non-additive genetic components, which are an essential characteristic of complex traits [23,37,38], and hence could be good candidate tools for the prediction of FAW and MW-resistance traits, which are controlled by both additive and non-additive gene action [21,23,31,41,60]. Therefore, since GP for maize resistance to insect pests such as FAW and MW is not yet well explored, it is pivotal to compare the performances of several available prediction algorithms to better inform future GS programs. For this purpose, the Genomic Prediction 0.2.6 plugin of KDCompute 1.5.2. beta (https://kdcompute.igss-africa.org/kdcompute/home), an online database developed by Diversity Array Technologies (DArT, https://www.diversityarrays.com) for the analysis of DArT marker data, is of great interest. It hosts a suite of parametric, semiparametric, and nonparametric GP methods that can be run simultaneously on genotype-phenotype datasets.
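As a rough illustration of the simplest parametric approach mentioned above, the following minimal sketch fits a ridge regression on simulated marker data and scores it on held-out lines. It is not the KDCompute implementation: the function names, the simulated data, the shrinkage value `lam`, and the use of the Pearson correlation between predicted and observed values as the PA are all assumptions made for illustration.

```python
import numpy as np

def rr_blup_predict(X_train, y_train, X_test, lam):
    """Ridge-regression prediction of line values from marker scores.

    lam is the shrinkage parameter (hypothetical value chosen by the user;
    in RR-BLUP it corresponds to the ratio of residual to marker variance).
    """
    mu = y_train.mean()
    col_means = X_train.mean(axis=0)
    Xc = X_train - col_means                      # centre markers on the training set
    p = Xc.shape[1]
    # Ridge solution: (X'X + lam * I)^-1 X'(y - mu)
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - mu))
    return mu + (X_test - col_means) @ beta

def prediction_accuracy(predicted, observed):
    """Pearson correlation between predicted and observed values."""
    return float(np.corrcoef(predicted, observed)[0, 1])

# Simulated example (all values are illustrative, not data from this study)
rng = np.random.default_rng(0)
n_lines, n_markers = 200, 500
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)  # 0/1/2 marker scores
true_effects = rng.normal(0.0, 0.1, n_markers)
y = X @ true_effects + rng.normal(0.0, 1.0, n_lines)

train = np.arange(n_lines) < 150                  # first 150 lines form the training set
pred = rr_blup_predict(X[train], y[train], X[~train], lam=100.0)
pa = prediction_accuracy(pred, y[~train])
print(f"Prediction accuracy on held-out lines: {pa:.2f}")
```

All markers share one shrinkage parameter here, which mirrors the equal-variance assumption discussed above; the Bayesian alternatives relax exactly this point.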

Additional factors that influence PAs are the different sizes of the training sets (TSs) and breeding sets (BSs) and their genetic relationships, the number of markers used to estimate genomic estimated breeding values (GEBV) of lines, the population structure, and the extent of linkage disequilibrium [21,23,31,41,60]. Since phenotyping is the current bottleneck in plant breeding and one of the disadvantages of GP is the requirement of large TSs for high PAs to be achieved, determination of effective TS composition and size is critical for effective implementation of GS in crop improvement programs [21,61–64]. Additionally, the best TS determination will depend on the genetic architecture and the extent of population structure of the trait targeted for GP [63], two parameters that are substantially variable among plant breeding traits. Another factor that is a determinant of the predictive ability is the kinship between the TS and the BS [63]. Several methods are used for TS optimization and these generally fall into two categories, namely, untargeted and targeted approaches. For the untargeted approach, the TS is determined independently of its genomic information, whereas the targeted method considers the genomic relationship between the TS and the BS as a means of maximizing PAs [65]. However, deciding on the best TS selection method is not straightforward and depends on context [66].

Furthermore, in maize, GPs were previously conducted using either genotypic best linear unbiased estimators (BLUEs) [67–69] or best linear unbiased predictors (BLUPs) [31,41,70] as means of phenotypic correction [70]. BLUEs are obtained by treating the genotypic effect of a mixed linear model as fixed and provide, for each individual of a population, an estimated mean whose expectation equals its true value. On the other hand, BLUPs are generated by considering the genotypic factors as random, which allows for the shrinkage of the means towards the population mean [71]. Whether to use BLUPs or BLUEs in GP is debatable. Phenotypic BLUEs avoid the double penalization from which BLUPs suffer. With phenotypic BLUPs, this double penalization is, however, compensated through maximization of the correlation between predicted and true line values, while phenotypic BLUEs do not rely on this shrinkage [70]. However, the shrinkage in the BLUP procedure accounts better for outliers and environmental variabilities [72], permitting better estimates of individual genetic effects than BLUEs [71], and therefore, it usually yields more accurate predictions of phenotypic performance [70,72,73]. Furthermore, BLUPs are better in handling unbalanced data, wherein, for example, the number of individuals is not the same in different locations or in the different replications of an experiment [49,70]. On that basis, the current study was conducted to evaluate the efficacies of different parametric, semiparametric, and nonparametric methods from both statistical and machine learning GP models in generating prediction accuracies (PAs) for maize resistance to FAW and MW in a diverse panel using both genotypic BLUEs and BLUPs.
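To make the shrinkage concrete, consider a simplified, balanced one-way layout with $n$ genotypes and $r$ replicates each. This layout, the symbols $g_i$, $r$, $\sigma_g^2$, and $\sigma_e^2$, and the assumption of known variance components are introduced here for illustration only and are not the models fitted in this study:

$$y_{ij} = \mu + g_i + e_{ij}, \qquad i = 1, \dots, n, \quad j = 1, \dots, r$$

$$\text{BLUE}(\mu + g_i) = \bar{y}_{i\cdot}, \qquad \text{BLUP}(g_i) = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2 / r}\,\left(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}\right)$$

With $\sigma_g^2 = 1$, $\sigma_e^2 = 2$, and $r = 2$, the shrinkage factor is $1/(1 + 2/2) = 0.5$, so a line whose mean lies 4 units above the grand mean receives a BLUP of 2. With a single replicate ($r = 1$) the factor drops to $1/3$, which shows why lines with fewer observations are pulled more strongly towards the population mean.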

#### **2. Material and Methods**

#### *2.1. Genetic Material and Field Experiments*

The panel used in this study consisted of 358 maize lines with diverse genetic and geographic backgrounds, and they were sourced from the National Crop Resources Research Institute (NaCRRI/Namulonge, Uganda), the International Institute for Tropical Agriculture (IITA/Ibadan, Nigeria), and The International Maize and Wheat Improvement Center (CIMMYT/Nairobi, Kenya). The panel consisted of 71 inbred lines developed for various purposes at NaCRRI; 28 and five stem borer (SB)-resistant inbred lines from CIMMYT [6,13,14] and IITA, respectively; 19 storage pest (SP)-resistant inbred lines [7,8]; and a doubled haploid (DH) panel of 235 lines developed at CIMMYT using six parents, three of which were stem borer-resistant, one was a storage pest-resistant inbred line (these were also included in the population), and two were CML elite lines (one, CML132, was included in the panel) (Supplementary Materials Table S1).

The panel was planted and evaluated in three environments, at Mubuku Irrigation Experimental Station in Kasese, western Uganda in 2017 (316 lines) during the second rainy season (2017B) and the National Crop Resources Research Institute (NaCRRI), Namulonge, central Uganda in 2018 (92 lines) and 2019 (252 lines) both during the first rainy seasons (2018A and 2019A, respectively). Detailed information on these locations is presented in Table 1.


**Table 1.** Geographical, climatic, and soil characteristics of the planting locations [74].

Each combination of location and season was considered an environment, resulting in a total of three environments. An augmented experimental design was adopted in all three environments using six checks in 2017B, two in 2018A, and four in 2019A replicated in all the blocks. The experiments in 2017B, 2018A, and 2019A consisted of twelve, five, and ten blocks, respectively, containing the replicated checks and unreplicated lines and the experiment in 2018A was replicated twice.

#### *2.2. Genotyping, Quality Control, and Imputation for Genomic Prediction Analyses*

Genotyping of the panel and SNP quality were described in our previous study [25]. In brief, maize leaves at the sixth-leaf stage of development were harvested from 341 of the 358 lines of the panel (5–10 plants per line) in 2017B and 2018A (for lines not captured in 2017B). The leaf samples were oven-dried overnight at 36 degrees Celsius and shipped to the Biosciences east and central Africa (BecA) Laboratory of the International Livestock Research Institute (ILRI, Nairobi, Kenya) for DNA extraction and genotyping. Diversity Array Technology (DArT) genotyping facilities (44) were used to successfully identify 34,509 SNPs from 341 of the 358 lines composing the panel; hence, only these lines were considered for the GP analyses. Duplicate SNPs were first removed using the R package DartR (45), leaving 28,919 unique SNPs (DRSNP) distributed across all the 10 chromosomes of the entire maize genome.

The DRSNP dataset was imputed before GP using KDCompute 1.5.2. beta (https://kdcompute.igss-africa.org/kdcompute/home), an online database developed by Diversity Array Technologies (DArT, https://www.diversityarrays.com) for the analysis of DArT marker data. KDCompute uses a suite of imputation methods to impute the SNP dataset and scores the imputation results by calculating the simple matching coefficient (SMC). The method with the highest SMC is considered optimal and is used to impute the original genotypic dataset.
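The scoring step can be sketched as follows. The source does not describe how KDCompute computes the SMC internally, so this sketch assumes that a subset of known genotype calls is masked, imputed by each candidate method, and compared with the original calls; the function names and the toy data are hypothetical.

```python
import numpy as np

def simple_matching_coefficient(true_calls, imputed_calls):
    """Proportion of masked genotype calls that the imputation recovered exactly."""
    true_calls = np.asarray(true_calls)
    imputed_calls = np.asarray(imputed_calls)
    if true_calls.shape != imputed_calls.shape:
        raise ValueError("true and imputed calls must have the same shape")
    if true_calls.size == 0:
        raise ValueError("at least one genotype call is required")
    return float(np.mean(true_calls == imputed_calls))

def best_imputation_method(true_calls, imputed_by_method):
    """Score every candidate method and return the one with the highest SMC."""
    scores = {
        name: simple_matching_coefficient(true_calls, calls)
        for name, calls in imputed_by_method.items()
    }
    return max(scores, key=scores.get), scores

# Toy example: six masked calls coded 0/1/2 and two hypothetical imputation methods
masked_truth = [0, 1, 2, 1, 0, 2]
candidates = {
    "method_a": [0, 1, 2, 1, 1, 2],   # five of six calls recovered
    "method_b": [0, 0, 2, 1, 1, 1],   # three of six calls recovered
}
best, scores = best_imputation_method(masked_truth, candidates)
print(best, scores)
```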

#### *2.3. FAW and MW Resistance Phenotyping*

After germination, plants were left unprotected to allow sufficient natural pressure of fall armyworm (FAW) population to build up. FAW damage scoring in all the three environments was carried out two months after planting when adequate natural FAW infestation levels had manifested, and scoring was based on a visual assessment using a scale of 1 (no or minor leaf damage) to 9 (all leaves highly damaged) [75], illustrated in Figure S1 [18].

Rearing of and bioassays for MW were performed as described in previous experiments carried out at NaCRRI [76,77]. Weevils were reared prior to the MW bioassay to obtain enough insects aged between 0 and 7 days for infestation. During rearing, standard conditions were provided for weevils to ensure proper acclimatization during the experiment. Rearing was carried out by preparing a weevil-maize grain culture of 300–400 unsexed insects and 1.5 kg of grains contained in 3000 cm<sup>3</sup> plastic jars incubated for 14 days in the laboratory at a temperature of 28 ± 2 °C and relative humidity of 70% ± 5%, to enhance oviposition. The lids of the jars were perforated and a gauze-wire mesh with a pore size smaller than 1 mm was fitted on each lid to allow proper ventilation while preventing the weevils from escaping.

After harvesting and shelling, grains of each line were bulked across environments. Then, samples of 30 g were weighed from each grain bulk, aiming to produce three replicates per line for the MW bioassay experiment. However, due to the lack of an adequate amount of grains for most of the lines of the panel, only 64, 123, and 132 lines could generate three, two, and one replicates, respectively, and were therefore considered for the MW bioassay experiment. Each of these samples was wrapped in polythene bags and kept at −20 °C for 14 days to eliminate any weevil infestation prior to the start of the experiment. After this disinfestation process, samples were left to thaw and transferred into 250 cm<sup>3</sup> glass jars and infested with 32 unsexed weevils. After 10 days of incubation to allow oviposition, all dead and living adult insects were removed. One month after infestation (MAI), each sample was removed from its jar, and the grains and the flour were isolated and their weights were recorded. The total number of holes inflicted by the weevils on the grains was counted along with the number of grains affected by such damage. Additionally, the numbers of dead and living weevils were recorded. After these measurements were collected, the grains were returned to their respective jars and all the measurements were repeated at two and three MAI. The collected data were used to infer, for each sample, the cumulative grain weight loss (GWL), the cumulative number of emerged adult weevil progenies (AP), and the final number of damage-affected kernels (AK).

#### *2.4. Statistical Analyses of the Phenotypic Data*

Both best linear unbiased estimators (BLUEs) and predictors (BLUPs) were generated using the general linear model with only phenotype option of the software Trait Association through Evolution and Linkage (TASSEL) [78] and the *ranef* function of the R package [79] lme4 [80], respectively. The mixed linear model for generating BLUEs (all factors considered as fixed) and BLUPs (all factors considered as random) for MW traits (GWL, AP, AK, NH, and FP) was as follows:

$$Y = \mu + \text{Replication} + \text{Genotype} + \text{Error}$$

The mixed model for generating BLUEs (all factors considered as fixed) and BLUPs (all factors considered as random) for FAW damage scores across environments was:

$$Y = \mu + \text{Location} + \text{Block} + \text{Genotype} + \text{Location}:\text{Genotype} + \text{Error}$$

where *μ* in the two equations is the intercept.

#### *2.5. Strategies for TS and BS Determination*

#### 2.5.1. MW Resistance Traits

Due to an inadequate amount of seed, only 37% of the lines (126 out of the 341 that had genotypic data) had phenotypic data on grain weight loss (GWL), adult progeny emergence (AP), and number of affected kernels (AK). Therefore, to estimate GP accuracies for MW resistance, these 126 lines were used as the TS and the remaining 215 lines with only genotypic data constituted the breeding set (BS).

#### 2.5.2. FAW Damage Resistance

The GP analyses for FAW resistance were carried out on the 341 lines of the panel that were genotyped and phenotyped for FAW damage resistance. To determine TS and BS sizes and compositions for the evaluation of maize resistance to FAW damage, two strategies, namely, random-based TS (RBTS) and pedigree-based TS (PBTS), were used.

#### 2.5.3. Random-Based TS Determination

For the RBTS, the 126 (37%) lines used for GPs of MW-resistance traits were first used as the TS for FAW to predict the GEBVs of the remaining 215 lines. To build the second TS for FAW, the 215 (63%) lines used earlier as the BS were considered as the TS. Then, to determine the third and fourth TSs for FAW, random selections of 75% and 85% of the lines in the entire panel were performed with the Excel formula "*=INDEX(\$A:\$A,RANDBETWEEN(1,COUNTA(\$A:\$A)),1)*", which was dragged down until the required number of lines for each percentage was obtained.
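The random selection can also be reproduced outside a spreadsheet. Because `RANDBETWEEN` draws with replacement, the same line can be returned more than once, so the sketch below samples without replacement instead. It is an illustrative alternative rather than the procedure used in this study: the line identifiers, the seed, and the rounding of the TS size are assumptions.

```python
import random

def random_training_set(line_ids, fraction, seed=None):
    """Split line identifiers into a random training set and the remaining breeding set."""
    n_train = round(fraction * len(line_ids))      # rounding rule is an assumption
    rng = random.Random(seed)
    chosen = set(rng.sample(list(line_ids), n_train))  # sampling without replacement
    training_set = [line for line in line_ids if line in chosen]
    breeding_set = [line for line in line_ids if line not in chosen]
    return training_set, breeding_set

# Hypothetical identifiers for the 341 genotyped lines
panel = [f"line_{i:03d}" for i in range(1, 342)]

ts75, bs75 = random_training_set(panel, 0.75, seed=1)
ts85, bs85 = random_training_set(panel, 0.85, seed=1)
print(len(ts75), len(bs75), len(ts85), len(bs85))
```

Fixing the seed makes the split reproducible, which a volatile spreadsheet formula does not guarantee.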

#### 2.5.4. Pedigree-Based TS Determination

The four datasets determined based on the pedigrees of the lines in the panel (PBTS strategy) are presented in Table 2. For the first dataset (FAW.Ped1), the 235 (68.91%) CIMMYT doubled haploid (DH) lines were used as the TS and the remainder (106 lines) as the BS. For the second dataset (FAW.Ped2), the TS and BS were switched, so that the TS in FAW.Ped1 became the BS and the BS in FAW.Ped1 became the TS. The third dataset, FAW.Ped3, had a TS composed of the 294 lines that were neither stem borer (SB)-resistant nor storage pest (SP)-resistant lines from CIMMYT, whilst the 28 SB and 19 SP-resistant lines from CIMMYT constituted the BS. In the last dataset, FAW.Ped4, the 235 DH lines, the 28 SB and 19 SP-resistant lines from CIMMYT, and the five SB-resistant lines from IITA, amounting to 287 (84.16%) genotypes, were considered as the TS and the remaining 54 lines from NaCRRI were considered as the BS (Table 2).

**Table 2.** Compositions of the pedigree-based training sets (TSs) for fall armyworm (FAW) datasets.


DH, doubled haploid; FAW, fall armyworm; FAW.Ped1 to 4, FAW datasets 1–4 with TS based on pedigree information of the lines in the panel; SB, stem borer; SP, storage pest; TS, training set; CIMMYT, International Center for Maize and Wheat Improvement; IITA, International Institute for Tropical Agriculture.
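The TS and BS sizes of the four pedigree-based datasets follow from the group sizes given in the text, and the arithmetic can be checked with a short sketch. The group labels and the function name are hypothetical, and the NaCRRI count is taken as the 54 remaining genotyped lines stated above.

```python
# Numbers of genotyped lines per pedigree group, as stated in the text
group_sizes = {
    "CIMMYT_DH": 235,
    "CIMMYT_SB": 28,
    "CIMMYT_SP": 19,
    "IITA_SB": 5,
    "NaCRRI": 54,
}

# Groups forming the training set of each pedigree-based dataset
training_groups = {
    "FAW.Ped1": {"CIMMYT_DH"},
    "FAW.Ped2": {"CIMMYT_SB", "CIMMYT_SP", "IITA_SB", "NaCRRI"},
    "FAW.Ped3": {"CIMMYT_DH", "IITA_SB", "NaCRRI"},
    "FAW.Ped4": {"CIMMYT_DH", "CIMMYT_SB", "CIMMYT_SP", "IITA_SB"},
}

def split_sizes(ts_groups):
    """Return (TS size, BS size, TS percentage of the genotyped panel)."""
    total = sum(group_sizes.values())
    n_ts = sum(group_sizes[group] for group in ts_groups)
    return n_ts, total - n_ts, round(100 * n_ts / total, 2)

summary = {name: split_sizes(groups) for name, groups in training_groups.items()}
for name, (n_ts, n_bs, pct) in summary.items():
    print(f"{name}: TS = {n_ts} ({pct}%), BS = {n_bs}")
```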

#### *2.6. Genomic Prediction Algorithms*

The GP analyses were performed using the BLUEs and BLUPs of the phenotypes and the 28,919 DRSNPs. Sixteen algorithms available in 10 GP methods were implemented using the Genomic Prediction 0.2.6 plugin of the KDCompute 1.5.2. beta. The 10 methods were directly translated from functions of five R packages designed for GP analyses:

#### 2.6.1. Bayesian Models

Bayesian models have different prior distributions with a general model that can be written as follows: *y* = 1<sub>*n*</sub>*μ* + *Zu* + *ε*, where *y* is the vector of observations, *μ* is the overall mean, *Z* is the design matrix for random effects, *u* is the vector of random effects, and *ε* is the vector of residuals [31].

The BLR (Bayesian Linear Regression) algorithms from the BLR R package [81] are used to fit the Bayesian ridge regression. The marker effects are assumed to have a Gaussian prior distribution with mean 0 and variance σ<sup>2</sup>, where σ<sup>2</sup> is unknown and assumed to have a scaled inverse χ<sup>2</sup> distribution. In the KDCompute Genomic Prediction 0.2.6 plugin, the Gibbs sampler is run with 4000 iterations and a burn-in period of 1000 iterations as default parameters.
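A Gibbs sampler for this kind of model can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the BLR package's sampler: marker effects are drawn jointly rather than one at a time, the prior hyperparameters `df` and `scale` are hypothetical values, a flat prior is placed on the intercept, and the burn-in iterations are taken to be the first 1000 of the 4000 iterations.

```python
import numpy as np

def gibbs_bayesian_ridge(X, y, n_iter=4000, burn_in=1000, df=5.0, scale=1.0, seed=0):
    """Posterior mean of marker effects under a Gaussian prior with unknown variance."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX = X.T @ X
    mu, beta = y.mean(), np.zeros(p)
    var_b, var_e = 1.0, y.var()
    kept = []
    for it in range(n_iter):
        # 1) Intercept given everything else (flat prior assumed)
        mu = rng.normal((y - X @ beta).mean(), np.sqrt(var_e / n))
        # 2) Marker effects given everything else: N(C^-1 X'r, var_e * C^-1)
        C = XtX + (var_e / var_b) * np.eye(p)
        chol = np.linalg.cholesky(C)
        mean = np.linalg.solve(C, X.T @ (y - mu))
        beta = mean + np.sqrt(var_e) * np.linalg.solve(chol.T, rng.standard_normal(p))
        # 3) Variances given everything else: scaled inverse chi-square draws
        var_b = (df * scale + beta @ beta) / rng.chisquare(df + p)
        resid = y - mu - X @ beta
        var_e = (df * scale + resid @ resid) / rng.chisquare(df + n)
        if it >= burn_in:
            kept.append(beta)
    return np.mean(kept, axis=0)

# Small simulated example (illustrative values only)
rng = np.random.default_rng(42)
X_sim = rng.standard_normal((100, 5))
effects_sim = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
y_sim = 3.0 + X_sim @ effects_sim + 0.5 * rng.standard_normal(100)
posterior_mean = gibbs_bayesian_ridge(X_sim, y_sim)
print(np.round(posterior_mean, 2))
```

Samples drawn before the burn-in threshold are discarded so that the arbitrary starting values do not influence the posterior mean.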

The Bayesian Generalized Linear Regression (BGLR) package fits various types of parametric and semi-parametric Bayesian regressions. The parametric Bayesian algorithms used from this package rely on different prior distributions that induce different types of shrinkage of the marker effects [82], including: Gaussian (Bayesian ridge regression, BRR [83]), scaled-t (BayesA [26]), double-exponential (Bayesian LASSO, BL [84]), and two-component mixtures with a point mass at zero and a slab that can be either Gaussian (BayesC [44]) or scaled-t (BayesB [43]). In the KDCompute genomic prediction 0.2.6 plugin, the default parameters for running the Gibbs sampler were used: 4000 iterations, of which 1000 form the burn-in period.

Reproducing kernel Hilbert space (RKHS) regression [47,48] is a semiparametric Bayesian method from the BGLR package implemented on the KDCompute genomic prediction 0.2.6 plugin. The RKHS method employs a kernel function to convert the molecular markers into distances between pairs of observations, thereby generating a square matrix that can be fitted in a linear model. This non-linear regression method is expected to capture dominance and epistasis effects more efficiently. This approach can be modelled as:

$$y = W\mu + K_{h}\alpha + \varepsilon,$$

where *μ* represents the fixed effects vector and *ε* is a vector of random residuals. The parameters *α* and *ε* are assumed to have independent prior distributions *α* ∼ *N*(0, *K<sub>h</sub>σ*<sup>2</sup><sub>*α*</sub>) and *ε* ∼ *N*(0, *Iσ*<sup>2</sup><sub>*ε*</sub>), respectively, and the matrix *K<sub>h</sub>* relies on a reproducing kernel function with a smoothing parameter *h*. The matrix *K<sub>h</sub>* is built from the genomic distances among genotypes and can be interpreted as a correlation matrix, while *h* controls the rate of decay of the correlation among genotypes [51]. To perform this analysis, the same number of iterations and burn-in parameters as for the other Bayesian methods described above were set on the KDCompute genomic prediction 0.2.6 plugin.
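To illustrate how the smoothing parameter *h* shapes the kernel matrix, the sketch below builds a Gaussian kernel from a toy marker matrix. The kernel form exp(−*h* *d*<sup>2</sup>/mean(*d*<sup>2</sup>)), based on squared Euclidean distances, is an assumption made for illustration (a common choice for RKHS regression); the text does not state which kernel the plugin uses, and the toy data and function names are hypothetical.

```python
import math

def squared_distances(markers):
    """Pairwise squared Euclidean distances between rows of a marker matrix."""
    n = len(markers)
    return [[sum((a - b) ** 2 for a, b in zip(markers[i], markers[j]))
             for j in range(n)] for i in range(n)]

def gaussian_kernel(markers, h):
    """K_h[i][j] = exp(-h * d_ij^2 / mean(d^2)); assumed kernel form."""
    d2 = squared_distances(markers)
    n = len(d2)
    # Scale by the mean off-diagonal distance so that h is unit-free.
    off_diag = [d2[i][j] for i in range(n) for j in range(n) if i != j]
    scale = sum(off_diag) / len(off_diag)
    return [[math.exp(-h * d2[i][j] / scale) for j in range(n)] for i in range(n)]

# Toy genotypes coded 0/1/2 (hypothetical data, 4 lines x 5 markers).
toy_markers = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 1, 1],
    [2, 0, 0, 2, 1],
    [1, 1, 1, 1, 1],
]

k_smooth = gaussian_kernel(toy_markers, h=0.25)  # slow decay of correlation
k_sharp = gaussian_kernel(toy_markers, h=4.0)    # fast decay of correlation
```

With a small *h*, even distant lines keep a sizeable kernel value; with a large *h*, the kernel approaches the identity matrix, so that only near-identical genotypes share information.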

#### 2.6.2. Mixed Models

The sommer (solving mixed model equations in R) package [85] was used to implement the *mmer* (mixed model equations in R) function on the KDCompute genomic prediction 0.2.6 plugin. The package solves the mixed model equations proposed by Henderson [86]. It works with incidence matrices and known variance covariance matrices for each random effect using four algorithms: efficient mixed model association (EMMA) [87], average information (AI) [88], expectation maximization (EM) [89], and the default Newton–Raphson (NR) [90].

The model fitted by sommer can be formulated as [85]: *y* = *Xβ* + *Zu* + *ε*, with variance *V*(*y*) = *V*(*Zu* + *ε*) = *ZGZ*′ + *R*. The mixed model equations for this model are:

$$
\left[\begin{array}{cc}X^{\prime}R^{-1}X & X^{\prime}R^{-1}Z\\ Z^{\prime}R^{-1}X & Z^{\prime}R^{-1}Z+G^{-1}\end{array}\right]^{-1}\left[\begin{array}{c}X^{\prime}R^{-1}y\\ Z^{\prime}R^{-1}y\end{array}\right] = \left[\begin{array}{c}\beta\\ u\end{array}\right],
$$

where *G* = *Kσ*<sup>2</sup><sub>*u*</sub> is the variance covariance matrix of the random effect *u*, drawn from a multivariate normal distribution *u* ∼ *MVN*(0, *Kσ*<sup>2</sup><sub>*u*</sub>), *K* is the additive or genomic relationship matrix (**A** or **A<sub>g</sub>**) in the genomics context, *X* and *Z* are incidence matrices for fixed and random effects, respectively, and *R* is the matrix for residuals (here *Iσ*<sup>2</sup><sub>*e*</sub>). A mixed model with a single variance component other than the error (*σ*<sup>2</sup><sub>*e*</sub>) can be used to estimate the genetic variance (*σ*<sup>2</sup><sub>*u*</sub>) along with genotype BLUPs to exploit the genetic relationships between individuals coded in **K** (**A**). The genomic relationship matrix was constructed according to VanRaden as *K* = *ZZ*′/(2∑*p<sub>i</sub>*(1 − *p<sub>i</sub>*)), where, in this expression, *Z* denotes the centered marker matrix and *p<sub>i</sub>* the allele frequency at marker *i* [91].
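The VanRaden relationship matrix can be sketched in a few lines. The example below is a minimal illustration, assuming genotypes coded as allele counts (0/1/2) and marker columns centered by twice the allele frequency; the toy data and the function name are hypothetical, and this is not the plugin's own code.

```python
def vanraden_grm(genotypes):
    """Genomic relationship matrix K = ZZ' / (2 * sum(p_i * (1 - p_i))).

    genotypes: list of rows (lines) with allele counts 0/1/2 per marker.
    Z is the marker matrix centered by twice the allele frequency.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency p_i of the counted allele at each marker.
    p = [sum(row[i] for row in genotypes) / (2 * n) for i in range(m)]
    # Center each marker column: z_ki = x_ki - 2 * p_i.
    z = [[row[i] - 2 * p[i] for i in range(m)] for row in genotypes]
    denom = 2 * sum(pi * (1 - pi) for pi in p)
    return [[sum(z[a][i] * z[b][i] for i in range(m)) / denom
             for b in range(n)] for a in range(n)]

# Toy genotypes (hypothetical data, 4 lines x 4 markers).
toy_genotypes = [
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [2, 0, 0, 1],
    [1, 2, 0, 1],
]
K = vanraden_grm(toy_genotypes)
```

Because every marker column is centered, each row of the resulting matrix sums to zero, so relationships are expressed relative to the average of the sampled lines.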

The ridge regression best linear unbiased predictor (rrBLUP) package can either estimate marker effects by ridge regression or, alternatively, calculate BLUPs based on an additive relationship matrix or a Gaussian kernel. Additionally, the mixed model solution (MMS) of the rrBLUP package, which calculates the maximum-likelihood (ML) or restricted-ML (REML) solutions for mixed models to perform GP [92], was fitted in the KDCompute genomic prediction 0.2.6 plugin.

The mixed models fitted by rrBLUP can be formulated as:

$$y = X\beta + Zu + \varepsilon,$$

where *β* is a vector of fixed effects and *u* is a vector of random effects with variance *Var*[*u*] = *Kσ*<sup>2</sup><sub>*u*</sub>. The residual variance is *Var*[*ε*] = *Iσ*<sup>2</sup><sub>*ε*</sub>. This class of mixed models, in which there is a single variance component other than the residual error, has a close relationship with ridge regression (ridge parameter *λ* = *σ*<sup>2</sup><sub>*ε*</sub>/*σ*<sup>2</sup><sub>*u*</sub>) (https://kdcompute.igss-africa.org/kdcompute/home).
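The link with ridge regression can be made concrete for the simplest case of a single marker. The sketch below, under the assumption of one centered marker and an unpenalized intercept, shows how the ridge parameter (the residual variance divided by the random-effect variance) shrinks the estimated effect towards zero; the data and the function name are hypothetical.

```python
def ridge_marker_effect(z, y, var_e, var_u):
    """Ridge estimate of one marker effect with lambda = var_e / var_u."""
    lam = var_e / var_u
    z_mean = sum(z) / len(z)
    y_mean = sum(y) / len(y)
    zc = [v - z_mean for v in z]  # centered marker scores
    yc = [v - y_mean for v in y]  # centered phenotypes
    # Least squares would divide by sum(zc^2); ridge adds lambda to it.
    return sum(a * b for a, b in zip(zc, yc)) / (sum(a * a for a in zc) + lam)

# Toy data (hypothetical): allele counts and phenotypes for six lines.
z = [0, 1, 2, 0, 1, 2]
y = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]

effect_no_shrink = ridge_marker_effect(z, y, var_e=1.0, var_u=1e12)  # lambda ~ 0
effect_moderate = ridge_marker_effect(z, y, var_e=1.0, var_u=1.0)    # lambda = 1
effect_strong = ridge_marker_effect(z, y, var_e=4.0, var_u=1.0)      # lambda = 4
```

A large residual variance relative to the random-effect variance (low heritability) therefore translates into stronger shrinkage of the marker effect.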

#### 2.6.3. Machine Learning Algorithms

The R package randomForest, which implements Breiman's random forest algorithm for classification and regression [93], was used on the KDCompute genomic prediction 0.2.6 plugin to fit the function missForest. Random forest is a non-linear machine learning algorithm that uses a two-layer randomization process to build decorrelated bootstrapped trees. As a first randomization layer, it builds multiple trees, each using a bootstrap sample of the marker data in the training set. Then, a second randomization process is carried out at each new node to grow the final trees: at each node of each tree, the random forest method selects a random subset of variables, and only those variables are used as candidates to find the best split for the node [94]. To predict the breeding value of a line in the TS, predictions are averaged over the trees for which the given observation was not used to build the tree [51]. On the KDCompute 1.5.2. beta platform, both options for *mtry*, the number of variables randomly sampled as candidates at each split, were implemented in this study: square root (*sqrt*(*p*)) and regression (*p/3*), where *p* is the number of variables in x. Additionally, the number of trees to grow (*ntree*) was set to 10, while *node size* (minimum size of terminal nodes) and *max nodes* (maximum number of terminal nodes that trees in the forest can have) were set to 5 and 10, respectively. The 16 methods used in this study and their statistical characteristics are presented in Table 3.
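For the 28,919 SNPs used in the GP analyses, the two *mtry* options correspond to very different numbers of candidate markers per split. The sketch below computes both; rounding down mirrors the usual defaults of the randomForest R package, but how the plugin itself rounds is an assumption, and the function name is illustrative.

```python
import math

def mtry_options(p):
    """Two candidate mtry values for p predictors: sqrt(p) and p/3, rounded down."""
    return {"sqrt": max(math.isqrt(p), 1), "regression": max(p // 3, 1)}

N_MARKERS = 28_919  # number of SNPs reported for the GP analyses
mtry = mtry_options(N_MARKERS)
print(mtry)  # → {'sqrt': 170, 'regression': 9639}
```

The regression option thus evaluates roughly 57 times more candidate markers at each split than the square-root option.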


**Table 3.** Genomic prediction methods used for the analysis of the different traits and datasets.

#### *2.7. Cross-Validations and PA Estimation*

To calculate the predictive accuracies (PAs) of each of the 16 methods, a cross-validation approach was performed using the data for the TS, with 10 folds and five repetitions amounting to 50 replications. The PAs were estimated as the correlation coefficient (R2) between the observed phenotypic values and the predicted genomic-estimated breeding values (GEBV), averaged across the 50 cross-validation replications (https://kdcompute.igss-africa.org/kdcompute/plugins).
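The cross-validation scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the KDCompute implementation: a simple least-squares line on one covariate stands in for a GP algorithm, the data are simulated, and all function names are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def fit_predict_line(x_train, y_train, x_test):
    """Placeholder model: least-squares line (stands in for a GP algorithm)."""
    n = len(x_train)
    mx, my = sum(x_train) / n, sum(y_train) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x_train, y_train))
             / sum((a - mx) ** 2 for a in x_train))
    return [my + slope * (a - mx) for a in x_test]

def repeated_kfold_pa(x, y, folds=10, repeats=5, seed=1):
    """Return one accuracy per replication (folds * repeats values)."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        idx = list(range(len(y)))
        rng.shuffle(idx)  # new random partition for each repetition
        for f in range(folds):
            test = idx[f::folds]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            pred = fit_predict_line([x[i] for i in train], [y[i] for i in train],
                                    [x[i] for i in test])
            accuracies.append(pearson(pred, [y[i] for i in test]))
    return accuracies

# Simulated data (hypothetical): one covariate plus noise for 100 lines.
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(100)]
y = [0.8 * a + data_rng.gauss(0, 0.5) for a in x]

pa = repeated_kfold_pa(x, y)
mean_pa = sum(pa) / len(pa)  # PA averaged across the 50 replications
```

Each line is predicted only by a model that did not see its phenotype, and the reported accuracy is the mean over the 50 fold-level correlations.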

#### **3. Results**

#### *3.1. Higher PAs Achieved for FAW and MW-Resistance Traits with BLUPs when Compared to BLUEs across GP Algorithms*

Both genotypic BLUEs and BLUPs for resistance to FAW and MW traits such as AK, AP, and GWL were used in GPs. In general, BLUPs produced better predictions than BLUEs, with PAs at least two-fold higher (Figure 1). The PAs realized with BLUEs (Figure S2) varied from −0.246 for FAW (mms\_ML) to 0.299 for AP (BayesB), while PAs for BLUPs ranged from 0.668 for AP (mmer\_NR) to 0.823 for AP (missForest\_Reg). The differences in accuracies between BLUEs and BLUPs were large, despite the highly significant (*p* < 0.001) correlations between BLUEs and BLUPs for each trait, which ranged from 0.93 for FAW to close to 1 for AP, AK, and GWL (presented in Figure 1); therefore, only results for BLUPs will be presented hereafter.

**Figure 1.** Boxplot of PAs (prediction accuracies) for best linear unbiased estimators (BLUEs) (in pink) and predictors (BLUPs) (in blue) of maize resistance to FAW and MW across prediction models and correlations (r) between BLUEs and BLUPs for each trait. FAW, fall armyworm; GWL, grain weight loss; AP, adult progeny emergence; AK, number of affected kernels. \*\*\* significant at *p* < 0.001.

#### *3.2. PAs for MW Resistance Traits Using BLUPs*

The PAs were generally high for the tested MW traits, mostly above 0.668 across the 12 GP models that were successfully run on the datasets (Figure 2); however, RKHS failed to work for AK. The highest PAs were achieved for AP with missForest\_Reg (0.823), followed by BRR (0.805) and RKHS (0.804), whilst the mmer\_NR algorithm had the lowest prediction accuracy of 0.667 (Figure 2). The PAs achieved for GWL ranged from 0.742 for missForest\_Sqt to 0.795 for mmer\_NR, while for AK, they varied from 0.749 for missForest\_Sqt to 0.779 for BRR (Figure 2). In general, Bayesian models predicted better than both mixed model and machine learning methods, although the differences were small (Figure S3).

**Figure 2.** Boxplots of the genomic prediction accuracies of BLUPs for MW-resistance traits: GWL, grain weight loss; AP, adult progeny emergence; AK, number of affected kernels (See Table 3 for GP algorithms).

#### *3.3. PA for FAW Resistance Using BLUPs*

The FAW-resistance datasets showed high predictive abilities with 10 of the 16 GP algorithms used in the study. For the RBTS approach, the PAs were lowest with the dataset that had a TS composed of 37% (lowest size) of the panel and highest with the largest TS (85% of the panel). Even with a TS of 37%, the PAs were still high, ranging from 0.694 to 0.714 for the mms\_ML and BLR methods, respectively (Figure 3). However, it should be noted that with equal TS sizes and the same composition (37% of the panel), higher PAs were achieved for MW-resistance traits (GWL, AP, and AK) compared to FAW resistance (Figure S3). The PAs for the RBTS of 63% varied from 0.833 for the BL method to 0.838 for missForest\_Sqt; thus, there was little variation among the different methods. Similarly, there was minimal variation among GP algorithms on the dataset with a 75% TS, whose PAs varied from 0.838 for mms\_REML to 0.843 for missForest\_Reg. The same trend was obtained on the dataset with a RBTS of 85% of the panel, with PAs ranging from 0.843 for the BRR model to 0.847 for the missForest\_Reg method. Furthermore, there was a high and significant (*p* < 2.2 × 10<sup>−16</sup>) positive correlation of 0.92 (Figure 4) between the PAs and TS sizes for FAW datasets for the RBTS, denoting a steady improvement of the PAs as the TS size increased. However, the PAs for FAW resistance reached a plateau at TS sizes above 63% of the panel (Figure 5).

**Figure 3.** Boxplot of PAs for maize resistance to the fall armyworm (FAW) datasets with the RBTS approach with random selection of 37, 63, 75, and 85% of the entire panel (see Table 3 for GP algorithms).

**Figure 4.** Pearson correlation between training set (TS) sizes and prediction accuracies (PAs) across the 10 genomic prediction algorithms conducted on RBTS (**A**) and PBTS (**B**) datasets for fall armyworm (FAW) resistance.

**Figure 5.** Prediction accuracies for FAW with RBTS across algorithms and training sets with different sizes in percent of the total panel.

Although the PAs did not vary much among GP algorithms, especially when the analyses involved TS sizes equal to or larger than 63% of the panel, the machine learning methods slightly outperformed other GP algorithms for all the traits, except for the TS of 37%, where Bayesian methods such as BLR and BayesC showed a slight advantage over the machine learning methods (Figure S4). The PAs for FAW-resistance datasets with PBTS were generally high, mostly above 0.82 (Figure 6). For the first dataset (FAW.Ped1), with a TS of 68.91% of the panel (see Table 2), the PAs varied from 0.828 for BLR to 0.835 for missForest\_Sqt. For FAW.Ped2 (TS = 31.09%), the PAs ranged from 0.862 for BayesC to 0.864 for mms\_REML.

**Figure 6.** Boxplots of PAs for maize resistance to the fall armyworm (FAW) datasets using the PBTS approach (see Table 2 for the PBTS strategy and Table 3 for GP algorithms).

For FAW.Ped4, with a TS of 84.16%, PAs varied from 0.860 for missForest\_Sqt to 0.864 for mms\_ML. However, for FAW.Ped3, with the largest TS (86.22%), eight of the 10 algorithms achieved low PAs (below 0.20) and only missForest\_Reg and missForest\_Sqt attained PAs of 0.749 and 0.750, respectively. Thus, the Pearson correlation between the sizes of the PBTS datasets and the prediction accuracies for the 10 GP algorithms revealed a significant (*p* < 0.0036) negative relationship of r = −0.45 (Figure 4).

In the FAW datasets, the PAs were more influenced by the composition of the TS and its genetic relationship with the BS (see Table 2). Using the doubled haploid (DH) lines as the TS (FAW.Ped1) or as the BS (FAW.Ped2), or the DH lines together with the stem borer (SB) and storage pest (SP)-resistant lines as the TS (FAW.Ped4), permitted achieving relatively high PAs with all 10 algorithms. In contrast, when the CIMMYT SB and SP-resistant lines were considered as the BS and the remainder as the TS (FAW.Ped3), only the machine learning algorithms missForest\_Reg and missForest\_Sqt achieved relatively high PAs. Furthermore, the composition of the TS and its relationship with the BS determined which GP methods achieved the highest PAs; machine learning algorithms worked best on FAW.Ped1 and FAW.Ped3, linear mixed model approaches outperformed Bayesian and machine learning algorithms on FAW.Ped2 and FAW.Ped4, and Bayesian methods ranked either second or third on all datasets (Figure S5). It should be noted that the PBTS strategy generally achieved better PAs than the RBTS irrespective of the size of the TS, except for the FAW.Ped3 dataset (Figures 3 and 6).

#### **4. Discussion**

Tropical maize germplasm is characterized by rapid linkage disequilibrium (LD) decay with high diversity [95]. These germplasm genetic characteristics make genomic selection (GS) a promising approach to integrate into African breeding programs [96]. However, genomic prediction (GP) models are very diverse and their differential performances depend on crops and trait architectures, besides other parameters such as the size of the training set (TS) and its genetic relationship with the breeding set (BS) [31,37]. Therefore, this study aimed at assessing the feasibility of genomic selection for maize resistance to FAW and MW through estimation of the genomic prediction accuracies achieved by parametric, semiparametric, and nonparametric (machine learning) genomic prediction (GP) algorithms using phenotypic BLUEs and BLUPs, and random and pedigree-based TS determination strategies.

#### *4.1. Higher PAs Were Achieved for BLUPs Compared to BLUEs for Both FAW and MW-Resistance Traits*

With a RBTS of 37% of the panel, which was the smallest and expected to give the worst PAs, PAs were higher (at least two-fold) across both FAW and MW-resistance traits and for all GP models when trait BLUPs were used as phenotypes compared to BLUEs, although there were high Pearson correlations between these two categories of phenotypic data for each trait. In general, BLUPs were reported to have higher predictability than BLUEs owing to better accounting for outliers and environmental variabilities permitted by the shrinkage procedure in BLUPs, which results in more accurate estimates of individual genetic effects [70–73]. Furthermore, most of the predictive differences between BLUPs and BLUEs might have stemmed from BLUPs being more suitable than BLUEs in fitting data recorded from unbalanced experiments [49,70], as was the case for both FAW damage scores across environments and the MW bioassay in this study. Therefore, for all subsequent analyses with larger RBTS sizes and the PBTS strategy for FAW, only BLUPs were considered in this study and are discussed further.

#### *4.2. High PAs Were Achieved for FAW and MW-Resistance Traits Using Moderately Sized Training Sets*

The obtained PAs were high for both MW and FAW-resistance traits even with TS of moderate sizes, confirming the potential of genomic selection (GS) in Africa-adapted germplasms [28–30,33]. With a TS of 37% of the entire panel, high PAs (above 0.70) were achieved for the MW-resistance traits, grain weight loss (GWL), adult progeny emergence (AP), and the number of affected kernels (AK), and for FAW resistance, in agreement with the moderate to high heritability values reported earlier for these traits [21,31,41]. These results are important considering that one of the disadvantages of GS is the requirement of large TS, which limits the reduction of phenotyping cost [62,64].

The PAs increased to about 0.85 as the size of the TS (RBTS approach) increased for FAW resistance, which was the only trait phenotyped for all the lines of the panel. It would be interesting to phenotype other lines of the panel that were not evaluated for MW-resistance traits to establish larger TS, which may improve the PAs [31,65,97,98]. Very few reports of GP are available for maize resistance to biotic stresses. High PAs were achieved for maize resistance to chlorotic mottle virus (up to 0.95) and maize lethal necrosis (reaching 0.87) in tropical germplasm [67]. However, lower PAs of up to 0.59 were obtained in a study that assessed the predictability of maize resistance to the European corn borer [99] in temperate germplasm. Additionally, Gowda et al. [69] reported moderate PAs (close to 0.60) for maize resistance to a biotic stress, maize lethal necrosis, in tropical maize populations.

#### *4.3. GP Algorithms Performed Differently on FAW and MW Maize Resistance Traits*

In this study, several GP models that included statistical and machine learning algorithms from parametric, semi-parametric, and nonparametric approaches were used to predict FAW and MW-resistance traits. These GP algorithms, as expected, performed differently on the different traits although the predictive variations were generally minimal, especially when large TS were involved, similarly to earlier model benchmarking reports [100,101]. Bayesian models (parametric: BLR and BRR, and semi-parametric: RKHS) performed better on MW traits, GWL, AP, and AK, while nonparametric machine learning algorithms (missForest, here), and to a lesser extent, the linear mixed model (especially in the PBTS approach), achieved the highest PAs on FAW datasets. The differential performances of the different GP algorithms on the insect resistance traits evaluated in this study could be due to differences in the genetic structures (extent of additive vs. non-additive gene action) of the respective traits [23,38,47,49]. Maize resistance to FAW, which was moderately heritable across environments [25], would be expected to be controlled by both additive and non-additive genetic factors, including epistasis [102–104], whereas, MW-resistance traits such as GWL, AP, and AK with heritability values above 90% [25] were most likely characterized by a prevalence of additive gene action [105,106] in the current panel.

This supposed difference in genetic architecture between FAW and MW-resistance traits could be the reason for non-linear methods such as random forest performing better at predicting FAW resistance, since these are more capable of integrating epistasis in the statistical modelling [27,51]. However, the RKHS algorithm, also a non-linear GP approach known to efficiently handle epistatic genetic relationships [51,59], did not run successfully on the FAW dataset, although it was among the best models for predicting MW-resistance traits, except for the BLUPs of the number of affected kernels (AK), on which it also failed to run. In this study, the reasons for some GP algorithms failing to run on either MW or FAW-resistance datasets are unclear, but this could be related to the structure of the BLUP datasets that failed to run. It should be noted that all the algorithms ran successfully on the phenotypic BLUE datasets with the smallest TS (37% of the panel), which were used to compare PAs between BLUPs and BLUEs in this study. However, the two- to three-fold gain in predictive ability with BLUPs compared to BLUEs would be an incentive to consider BLUPs in future GS activities for maize resistance to MW and FAW. Overall, future GS efforts for maize resistance to MW and FAW are recommended to focus more on Bayesian and machine learning algorithms such as random forest, BayesA, BayesB, BayesC, BRR, and BLR, which outperformed mixed linear models for most datasets considered in the current study.

#### *4.4. Influences of the Sizes and the Compositions of TS and BS on PAs*

Two factors, the relative sizes of the TS and BS (RBTS approach) and their genetic relationship (PBTS approach), influenced the levels of PAs across FAW-resistance datasets, corroborating earlier reports [31,63,65,97,98,107,108]. A net increase in PAs for maize resistance to FAW was realized when the size of the TS was increased from 37% (0.694 to 0.714) to 63% (0.833 to 0.838), similar to earlier reports on wheat yield [109]. This increase was followed by a slight gain in predictability at 75% (0.837 to 0.843) and 85% (0.843 to 0.847), and thus, the PAs plateaued when TS sizes above 63% were considered in this study, as reported earlier in other studies [21,64,109–111]. Thus, future GS programs for maize resistance to FAW could be designed around TS composed of a minimum of 60% of the entire breeding germplasm to achieve high genetic gains. These results were further supported by the highly significant (*p* < 2.2 × 10<sup>−16</sup>) positive correlation (R = 0.92) between TS size and PAs. Similarly, positive correlations between the number of lines in the TS and the PAs, and a plateau for the PAs, were also reported by Edwards et al. [109].
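The plateau can be made explicit with a small calculation on the PA ranges quoted in this paragraph. The sketch below uses the midpoint of each reported range as a crude summary (an assumption made only for illustration; the per-algorithm values are not given in the text) and computes the PA gained per additional percentage point of TS.

```python
# Reported PA ranges for FAW resistance by TS size (% of panel), from the text.
pa_ranges = {37: (0.694, 0.714), 63: (0.833, 0.838),
             75: (0.837, 0.843), 85: (0.843, 0.847)}

sizes = sorted(pa_ranges)
# Midpoint of each range, used here as a rough summary of the PAs.
midpoints = {s: sum(pa_ranges[s]) / 2 for s in sizes}

# PA gained per additional percentage point of TS between successive sizes.
gain_per_point = {
    (a, b): (midpoints[b] - midpoints[a]) / (b - a)
    for a, b in zip(sizes, sizes[1:])
}

for (a, b), g in gain_per_point.items():
    print(f"{a}% -> {b}%: {g:.4f} PA per percentage point")
```

On these midpoints, the gain per percentage point between 37% and 63% is roughly ten times larger than the gains beyond 63%, which is consistent with the plateau described above.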

The composition of the TS and its relationship with the BS are determinant factors for the genomic predictability of complex traits [63,112–114]. In the current study, using the PBTS approach, these two parameters were more important than the size of the TS, since higher PAs were achieved in FAW.Ped2 (0.862 to 0.864) with a TS of 31.09% compared to all other FAW PBTS datasets, including FAW.Ped3 (0.114 to 0.750), with the largest TS of 86.22%. In fact, FAW.Ped3 achieved the lowest PAs among all the PBTS FAW datasets. These results were further illustrated by the significant (*p* < 0.0036) negative correlation (R = −0.45) between the sizes of the PBTS and the achieved PAs.

However, it is not very clear why, in FAW.Ped3, predicting the BS (47 CIMMYT SB and SP-resistant lines) from the TS (DH, IITA SB, and NaCRRI lines) led to lower PAs. A possible explanation could be that these two sets were distantly related, since only two and one CIMMYT SB and SP-resistant lines, respectively, were used as parents to develop the DH lines. Spindel et al. [111] argued that high PAs can be achieved with small-sized TS when lines in the TS and the BS are closely related, since such TS would sample the full genetic diversity of the population. However, the more distantly related the TS and the BS are, the larger the required TS size to reach high PAs [111]. Using the CIMMYT SB and SP-resistant lines as a TS would most likely lead to lower PAs, since such a TS would be additionally disadvantaged by its small size (47 lines). The DH lines in the current study are involved as a TS in most of the best performing GP datasets evaluated (both in the RBTS and PBTS approaches) and as the only lines in the BS of the best performing pedigree-based dataset (FAW.Ped2). This DH population could be of interest in future breeding activities targeted at improving insect resistance in maize [23,115–117] and potentially useful for GS of complex traits with low to moderate heritability [118].

#### **5. Conclusions**

This study assessed prediction accuracies of genomic-estimated breeding values for fall armyworm (FAW) and maize weevil (MW)-resistance traits in a diverse Africa-adapted maize panel using several parametric, semi-parametric, and non-parametric genomic prediction models. Prediction accuracies for maize resistance to FAW and MW traits were relatively high, even with a moderate training set size. For FAW resistance, although the prediction accuracies were positively correlated with the size of the training set, the composition and the relationship of the training set with the breeding set were more influential in predicting line performance. Additionally, TS determination-related parameters were more important than the type of genomic prediction models in predicting FAW and MW-resistance traits. However, Bayesian models on MW-resistance traits and machine learning models on FAW damage resistance outperformed mixed linear models in almost all the datasets used in this study. Therefore, future genomic selection programs for maize resistance to insect pests such as FAW and MW in Africa should put more effort into designing effective training sets and use selected Bayesian and machine learning GP algorithms to improve genetic gains, shorten breeding cycles, and accelerate variety release. Such programs could greatly benefit from using the genetically diverse maize panel used in this study as a base population, since it consists of lines adapted to several African agro-ecologies.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2223-7747/10/1/29/s1, Figure S1: Rating of maize plants based on foliar damage by FAW, Figure S2: Boxplot of PA for best linear unbiased estimators (BLUEs) of maize resistance to the fall armyworm (FAW) and maize weevil (MW) with identical training set size (37%) and compositions, Figure S3: Comparisons of genomic prediction accuracies of the three best algorithms for best linear unbiased predictors (BLUPs) of maize weevil resistance traits: number of affected kernels (AK), adult progeny emergence (AP), and grain weight loss (GWL) vs. fall armyworm resistance dataset with identical TS, Figure S4: Genomic prediction accuracies of the three best algorithms for each fall armyworm resistance BLUPs datasets with RBTS of 37, 62, 75, and 85% of the entire dataset, Figure S5: Genomic prediction accuracies of the three best algorithms for each fall armyworm resistance BLUPs datasets with PBTS, Table S1: Descriptions of parents and crosses that constituted the doubled-haploid population.

**Author Contributions:** Conceptualization, A.B. (Arfang Badji), P.R., S.K., M.O., D.B.K., and L.M.; methodology, A.B. (Arfang Badji), D.B.K., and L.M.; investigation, A.B. (Arfang Badji) and D.B.K.; formal analysis, A.B. (Arfang Badji) and L.M.; resources, A.B. (Arfang Badji), G.A., M.O., D.B.K., and L.M.; visualization, A.B. (Arfang Badji); supervision, P.R., S.K., M.O., and L.M.; project administration, P.R. and M.O.; funding acquisition, A.B. (Arfang Badji), M.O., G.A., D.B.K., and P.R.; writing—original draft preparation, A.B. (Arfang Badji); writing—review and editing, all authors (A.B. (Arfang Badji), L.M., D.B.K., F.K., D.O., N.M., S.A., A.I., A.B. (Astere Bararyenya), S.N.N., T.O., P.W., M.O., M.O.-S., H.T., G.A., S.K., P.R.). All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the capacity building competitive grant training the next generation of scientists provided by Carnegie Corporation of New York through the Regional Universities Forum for Capacity Building in Agriculture (RUFORUM: RU/2016/Intra-ACP/RG/001). A. Badji received a Ph.D. scholarship from the Intra-ACP Academic mobility for Crop Scientists for Africa Agriculture (CSAA) project. Genotyping of the lines was carried out through a project of D.B.K. thanks to the Integrated Genotyping Service and Support (IGSS) coordinated by the International Livestock Research Institute (ILRI) and Biosciences eastern and central Africa (BecA), grant number: IGSS-DL0274. The National Crops Resources Research Institute (NaCRRI) of Namulonge, UGANDA through a grant of the USAID Feed-the-Future Uganda, Agriculture Research Activity/Maize paid the article processing charges. Further, NaCRRI financially and logistically supported field and laboratory activities of this research.

**Acknowledgments:** The authors thank all the technicians for experimental setup and data collection in the fields and laboratories of NaCRRI at Namulonge and Kasese, UGANDA. The authors acknowledge NaCRRI, the International Maize and Wheat Improvement Center (CIMMYT) of Nairobi, KENYA, and the International Institute of Tropical Agriculture (IITA) of Ibadan, NIGERIA for providing the original germplasm used for this research. The authors thank Clay SNELLER of the Ohio State University and all the personnel of ILRI and BecA who provided the genotyping support at BecA/ILRI.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Genetic Diversity of Selected Rice Genotypes under Water Stress Conditions**

**Mahmoud M. Gaballah <sup>1</sup>, Azza M. Metwally <sup>2</sup>, Milan Skalicky <sup>3</sup>, Mohamed M. Hassan <sup>4</sup>, Marian Brestic <sup>3,5</sup>, Ayman EL Sabagh <sup>6,\*</sup> and Aysam M. Fayed <sup>2</sup>**


**Abstract:** Drought is the most challenging abiotic stress for rice production in the world. Thus, developing new rice genotypes tolerant to water scarcity is one of the best strategies to achieve and maximize high yield potential with water savings. The study aims to characterize 16 rice genotypes for grain and agronomic parameters under normal and drought stress conditions, and to assess their genetic differentiation by determining specific DNA markers related to drought tolerance using Simple Sequence Repeats (SSR) markers, grouping cultivars, and establishing their genetic relationship for different traits. The experiment was conducted under irrigated (normal) and water stress conditions. Mean squares due to genotype × environment interactions were highly significant for major traits. For the number of panicles/plant, the genotypes Giza179, IET1444, Hybrid1, and Hybrid2 showed the maximum mean values. The desired sterility percentage values were produced by genotypes IET1444, Giza178, Hybrid2, and Giza179, while Sakha101, Giza179, Hybrid1, and Hybrid2 achieved the highest values of grain yield/plant. The genotypes Giza178, Giza179, Hybrid1, and Hybrid2 produced maximum values for water use efficiency. The effective number of alleles per locus ranged from 1.20 to 3.0 alleles with an average of 1.28 alleles, and the He values for all SSR markers used varied from 0.94 to 1.00 with an average of 0.98. The polymorphic information content (PIC) values for the SSR markers varied from 0.83 to 0.99, with an average of 0.95, along with a highly significant correlation between PIC values and the number of amplified alleles detected per locus. The highest similarity coefficient was observed between Giza181 and Giza182 (indica type), which are susceptible to drought stress. A high similarity percentage between the genotypes (japonica type; Sakha104 with Sakha102 and Sakha106 (0.45), Sakha101 with Sakha102 and Sakha106 (0.40), Sakha105 with Hybrid1 (0.40), Hybrid1 with Giza178 (0.40) and GZ1368-S-5-4 with Giza181 (0.40)) was also observed; these genotypes are also susceptible to drought stress. All genotypes are grouped into two major clusters in the dendrogram at 66% similarity based on Jaccard's similarity index. The first cluster (A) was divided into two minor groups A1 and A2, in which A1 had two groups A1-1 and A1-2, containing drought-tolerant genotypes like IET1444, GZ1368-S-5-4 and Hybrid1. On the other hand, the A1-2 cluster was divided into A1-2-1, containing the Hybrid2 genotype, and A1-2-2, containing Giza179 and Giza178 at coefficient 0.91, showing moderate tolerance to drought stress. The genotypes GZ1368-S-5-4, IET1444, Giza178, and Giza179 could be included as appropriate materials in a breeding program for developing drought-tolerant varieties. Genetic diversity for developing new rice cultivars that combine drought tolerance with high grain yields is essential to maintaining food security.

**Keywords:** rice; drought stress; genetic diversity; SSR markers; dendrogram

**Citation:** Gaballah, M.M.; Metwally, A.M.; Skalicky, M.; Hassan, M.M.; Brestic, M.; EL Sabagh, A.; Fayed, A.M. Genetic Diversity of Selected Rice Genotypes under Water Stress Conditions. *Plants* **2021**, *10*, 27. https://dx.doi.org/10.3390/plants10010027

Received: 13 November 2020 Accepted: 21 December 2020 Published: 24 December 2020

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Rice is a highly diversified crop that is grown under diverse ecological conditions. It is the staple food for more than 50% of the world's population and is the world's most important food in terms of a natural calorie source [1]. It occupies almost one-fifth of the total land area under cereals [2]. With the changing climate, the frequent occurrence of extreme events contributes to various abiotic stresses that limit rice productivity globally. Among them, drought is one of the most critical abiotic stresses and continually threatens the world's food security [3]. The severity of drought depends on many factors, such as the occurrence and distribution of rainfall, evaporative demand, and the moisture-retaining capacity of the soil [4]. Therefore, it is imperative to identify genotypes that can grow under water-scarce conditions in order to expand rice-growing areas into water-limited lands, which can help meet the challenge of ever-increasing global food demand [5]. Rice varieties with distinct genetic structures hold promise for the future improvement of rice cultivars against drought stress [6]. Hence, the assessment of genetic diversity becomes important in establishing relationships among different cultivars [7]. The first step towards determining the magnitude of these risks is to evaluate the genetic diversity in improved rice genotypes, as the success of a crop improvement program depends on the extent of genetic variability and on how heritable the desirable characters are [8]. The identification of genotypes and their interrelationships is therefore important. The development of new biotechnological techniques provides increased support for evaluating genetic variation at both the phenotypic and genotypic levels. The results derived from analyses of genetic diversity at the DNA level could be used for designing effective breeding programs aiming to broaden the genetic basis of commercially grown varieties.

Molecular marker technology is a powerful tool for determining genetic variation in rice varieties. In contrast to morphological traits, molecular markers can reveal large differences among genotypes at the DNA level and are unaffected by environmental influence, providing a more direct, reliable, and efficient tool for germplasm characterization, conservation, and management [9]. Simple sequence repeat (SSR) markers can detect a high level of allelic diversity, and they have been extensively used to identify genetic variation among rice subspecies [10]. SSR markers are efficient in detecting genetic polymorphisms and discriminating among genotypes from germplasm of various sources; they can even detect finer levels of variation among closely related breeding materials within the same variety [11]. Several quantitative trait loci (QTLs) in rice with consistent effects on grain yield under water-limited conditions have been reported [12]. Among them, DTY1.1, located on chromosome 1 of the rice genome, was identified in the rice variety N22 and successfully transferred to the susceptible genotypes IR64 and MTU1010 [13]. In addition, two other major-effect QTLs, DTY3.1 and DTY2.1, were identified, which explain about 30% and 15% of the phenotypic variance, respectively [14]. Later, Shamshudin et al. [15] reported another two QTLs, DTY2.2 and DTY12.1, for reproductive-stage drought tolerance in rice. However, all of these QTLs were identified on the basis of stable grain yield under drought conditions. Because of the low heritability of grain yield under drought stress, selection for secondary traits has been more effective than selection for grain yield itself. Given the lack of a useful selection index for traits related to drought tolerance, it is essential to find molecular markers associated with drought tolerance in rice [16]. Many SSR markers have been reported to be linked to drought tolerance traits in rice, such as yield under drought [13,17,18]. 
Although several investigations have addressed rice germplasm characterization and diversity analysis, variability studies of the commonly grown landraces and cultivars are limited. Therefore, this study was conducted with three main goals: (i) morphological characterization of the genotypes for grain and agronomic parameters under normal and drought stress conditions; (ii) genetic differentiation of 16 rice cultivars by determining specific DNA markers related to drought tolerance using SSR markers; and (iii) grouping of the cultivars according to their genotypes and subsequent determination of their genetic relationships for different traits.

#### **2. Materials and Methods**

*2.1. Experiment Site*

The field experiment was conducted at the Rice Research and Training Center Farm, Sakha Research Station, Agricultural Research Center, Egypt, in two consecutive rice-growing seasons (2018 and 2019) to investigate the morphological traits and genetic diversity of 16 rice genotypes. The physical and chemical properties of the soil at the Sakha Research Station in 2018 and 2019 are shown in Table 1.

**Table 1.** Physical and chemical properties of the soil at Sakha Research Station in 2018 and 2019 (the soil was collected before starting the field preparation in each season).


#### *2.2. Treatments and Design*

The origin, pedigree, and salient features of the sixteen rice genotypes are shown in Table 2. All selected rice materials were grown under fully irrigated (normal) and water stress conditions (flush irrigation every twelve days, imposed from fifteen days after transplanting) in a randomized complete block design with three replications.

#### *2.3. Experimental Procedures*

Seeds of all cultivars were sown in a nursery on 5 May and transplanted into the main field after 30 days in both years (2018 and 2019). Single seedlings of each genotype were transplanted in 5 rows at a spacing of 20 × 20 cm (between and within rows). Data were recorded from 10 randomly selected plants of each genotype. For characterization of root structure, large iron cylinders of 20 cm diameter and 60 cm height were used. They were driven into the soil with a hammer, dug out with a spade, and pulled out using hooks. The roots were separated from the soil by thorough washing in a special washing facility. After the quantitative data were taken, the shoot was separated from the root using a sharp knife and dried in an oven at 70 °C for five days. Root length (cm) was measured from the base of the plant to the tip of the longest root. Root volume (mm3) was determined as the volume of water displaced by the plant root system. Root thickness (mm) was taken as the average diameter of the tip portion (about 1 cm from the tip) of three random secondary roots at the middle position of the root system of each plant. The number of roots/plant was counted at the maximum tillering stage, and the root: shoot ratio was calculated as the ratio of the root dry weight (g) to the shoot dry weight (g) at the maximum tillering stage. Days to heading was recorded after flowering from the daily count of panicle exsertion. The physiological maturity date was recorded when 80% of the grains had turned golden yellow. Leaf rolling was scored visually following the method proposed by De Datta et al. (1988): susceptible varieties and lines were the first to show rolling symptoms in the morning, and highly sensitive lines did not unroll in the early morning hours. The flag leaf area (cm2) was measured using a leaf area meter (LI-3100, LI-COR Inc., Lincoln, NE, USA). Plant height (cm) was measured from the soil surface to the tip of the tallest panicle of each plant. Relative water content (%) was calculated using the formula (Fw − Dw) × 100/(Tw − Dw), where Fw is fresh leaf weight, Dw is leaf dry weight, and Tw is turgid leaf weight. The number of panicles/plant was counted at harvest. The 100-grain weight (g) was recorded as the weight of 100 randomly chosen filled grains/plant. Sterility percentage (%) was calculated by dividing the number of unfilled spikelets/panicle by the total number of spikelets/panicle. Grain yield/plant (g) was recorded as the weight of the filled grains collected from all the tillers of a single plant. Water use efficiency was calculated as the economic yield divided by the total water consumed during the crop growth period.
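The derived traits described above reduce to a few simple ratios. The following is a minimal Python sketch of those calculations, not the authors' code; the function names are hypothetical, and the units for water use efficiency are not stated in the text, so the function simply returns yield per unit of water in whatever units are supplied.

```python
def relative_water_content(fresh_w, dry_w, turgid_w):
    """RWC (%) = (Fw - Dw) * 100 / (Tw - Dw), as given in the text."""
    if turgid_w <= dry_w:
        raise ValueError("turgid weight must exceed dry weight")
    return (fresh_w - dry_w) * 100.0 / (turgid_w - dry_w)


def sterility_percentage(unfilled_spikelets, total_spikelets):
    """Unfilled spikelets per panicle as a percentage of total spikelets."""
    if total_spikelets <= 0:
        raise ValueError("total spikelets must be positive")
    return unfilled_spikelets * 100.0 / total_spikelets


def root_shoot_ratio(root_dry_w, shoot_dry_w):
    """Root dry weight (g) divided by shoot dry weight (g)."""
    if shoot_dry_w <= 0:
        raise ValueError("shoot dry weight must be positive")
    return root_dry_w / shoot_dry_w


def water_use_efficiency(economic_yield, water_consumed):
    """Economic yield per unit of water consumed (units are an assumption)."""
    if water_consumed <= 0:
        raise ValueError("water consumed must be positive")
    return economic_yield / water_consumed


if __name__ == "__main__":
    # Illustrative values only; they are not data from the study.
    print(relative_water_content(fresh_w=0.8, dry_w=0.2, turgid_w=1.0))
    print(sterility_percentage(unfilled_spikelets=12, total_spikelets=120))
```

Keeping each trait as a separate function makes it easy to apply the same formulas to the normal and drought plots before averaging over plants and replications.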

All cultural practices were applied as recommended. Nitrogenous fertilizer was used in three splits as top dressing; phosphorus and potash were applied in full dose at sowing. Insect and weed control was used as and when required.

#### *2.4. Statistical Analysis*

A combined analysis of variance for the two years was carried out for the yield and yield components. Phenotypic correlation between yield and yield-related traits was done following Steel et al. [19], and the data were analyzed using the Co-State software program.

#### *2.5. Genomic DNA Extraction*

Ten seeds of each advanced genotype were placed in a Petri dish on filter paper soaked in distilled water for germination under aseptic conditions. The germinated seeds were then grown in labeled pots. Genomic DNA was extracted from healthy portions of young leaves harvested from 21-day-old seedlings. DNA isolation was carried out using a modified mini-preparation CTAB (cetyltrimethylammonium bromide) method, which does not require liquid nitrogen and needs only a minimal amount of tissue [20]. Leaf tissues were cut into small pieces, homogenized, and digested with extraction buffer (1 M Tris, 0.5 M Na2EDTA, 5 M NaCl, and distilled H2O, pH 8.0) and 20% SDS. Following incubation of the leaf extracts for 10 min at 65 °C in a water bath, 100 μL of 5 M NaCl was added and mixed well by gentle inversion. Then, 100 μL of 10× CTAB was added, and the mixture was again incubated for 10 min at 65 °C in a water bath. After that, 900 μL of a mixture of chloroform and isoamyl alcohol (24:1) was added, and the samples were centrifuged for 8 min at 11,000 rpm in a microcentrifuge. Then, 500 μL of the upper aqueous layer was separated, 600 μL of ice-cold isopropanol was added to it, and the mixture was mixed and centrifuged for 12 min at 13,200 rpm. A small pellet was visible, and the supernatant was decanted. The pellet was then washed with 200 μL of cold 70% ethanol and centrifuged at 13,200 rpm for 12 min. After removal of the ethanol and air drying, the DNA pellet was re-suspended in 100 μL of 1× TE buffer and dissolved by warming in a 65 °C water bath for up to 1 h (with frequent mixing or flicking of the tube with a finger). The DNA was then stored at −20 °C in an ultra-freezer. DNA quality was assessed by agarose gel (0.8%) electrophoresis and visualized under UV light.


**Table 2.** Origin, pedigree, salience, and feature of sixteen rice genotypes.

#### *2.6. SSR Markers and PCR Amplification*

Ten SSR markers related to drought tolerance traits/QTLs were used. The sequences of the primer pairs are available in the Gramene Web database (http://www.gramene.org). Primer names, repeat motifs, chromosome numbers, and related traits/QTLs are shown in Table 3. PCR amplification reactions were carried out in 10 μL reaction mixtures containing 50 ng/μL of template DNA, 0.5 μL of each forward and reverse primer, 5 μL of PCR master mix (Fermentas), and 3 μL of ddH2O. A thermal cycler was used with the following PCR profile: an initial denaturation step at 94 °C for 5 min, followed by 35 cycles of denaturation at 94 °C for 1 min, annealing at 55 °C for 30 s, and primer elongation at 72 °C for 1 min, and then a final extension at 72 °C for 5 min. Amplified products were stored at −20 °C until further use.



(root to shoot ratio), (Plant height), (Days to heading), FGN (Filled grain number), TSW (1000-seed weight), (Root Thickness), (Root Length), SW (Seed weight), SN (Seed number), SSP (seed sterility percentage), RDW (root dry weight), RFW (root fresh weight), SP (Seed Percentage), HI (Harvest index), PL (panicle length), LL (Leaf Length), LW (leaf width), SN (Spikelet number), RV (Root volume), DTM (days to maturity).

#### *2.7. Electrophoretic Separation and Visualization of Amplified Products*

Five μL of each PCR-amplified product was loaded into a well of a 3% agarose gel supplemented with ethidium bromide. TAE (1×) was used as the running buffer, and a 50 bp DNA ladder (0.5 μg/μL, Fermentas) was used to estimate the molecular size of the amplified fragments. Electrophoresis was conducted at 60 V for 2 h. Gels were then visualized and photographed using a Biometra gel documentation unit (Bio-Doc, Biometra, Germany).

#### *2.8. SSR Data Analysis*

The amplified SSR DNA bands representing different alleles were scored as different genotypes. For each marker, allelic bands were compared against a 100 bp DNA ladder. Then, fragment data was converted into the binary encoded allelic data to apply the multivariate analyses. Genetic distance, the ratios of shared DNA bands, and genetic similarities were estimated from the allele binary formatted data set using Nei and Li's coefficient [26]. Genetic distance was calculated as follows:

$$\text{GDn} = 1 - \left[ 2 \text{N11}/(2 \text{N11} + \text{N10} + \text{N01}) \right]$$

where N11 is the number of loci having bands present in both accessions, N10 is the number of loci having a band present only in the first accession, and N01 is the number of loci having a band present only in the second accession.
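As an illustration of the distance defined above, the sketch below computes GDn for two accessions scored as binary band vectors (1 = band present, 0 = absent). It is a minimal example rather than the software used in the study; the function name is hypothetical, and it assumes that N10 and N01 count bands present in only one of the two accessions.

```python
def nei_li_distance(bands_a, bands_b):
    """Nei and Li distance: 1 - 2*N11 / (2*N11 + N10 + N01).

    bands_a, bands_b: equal-length sequences of 0/1 band scores.
    """
    if len(bands_a) != len(bands_b):
        raise ValueError("both accessions must be scored at the same bands")
    # N11: present in both; N10: only in the first; N01: only in the second.
    n11 = sum(1 for a, b in zip(bands_a, bands_b) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(bands_a, bands_b) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(bands_a, bands_b) if a == 0 and b == 1)
    denominator = 2 * n11 + n10 + n01
    if denominator == 0:
        raise ValueError("neither accession has any band scored as present")
    return 1.0 - (2.0 * n11) / denominator


if __name__ == "__main__":
    # Illustrative band scores only; they are not data from the study.
    accession_1 = [1, 1, 0, 1, 0]
    accession_2 = [1, 0, 0, 1, 1]
    print(nei_li_distance(accession_1, accession_2))
```

Bands absent from both accessions do not enter the formula, so adding uninformative all-zero columns to the binary matrix leaves the distance unchanged.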

The accessions were clustered based on the matrix of genetic similarities using the unweighted pair group method with arithmetic averages (UPGMA). Polymorphic information content (PIC) values were calculated for each microsatellite based on the allelic frequencies detected in the accessions studied, using the following formula:

$$\text{PIC}_i = 1 - \sum_{j=1}^{n} P_{ij}^2$$

where Pij is the frequency of the j-th allele for the i-th marker, and the summation extends over n alleles. Polymorphic loci were defined as those whose most frequent allele had a frequency of less than 0.95.
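The PIC calculation can be sketched in a few lines of Python. This example assumes the widely used form PIC_i = 1 − Σ_j Pij², which matches the symbol definitions given here but is an assumption about the exact formula applied; the function names are hypothetical, and each accession is assumed to be scored for a single allele per marker.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for the alleles scored at one marker."""
    if not alleles:
        raise ValueError("at least one scored allele is required")
    counts = Counter(alleles)
    total = len(alleles)
    return {allele: count / total for allele, count in counts.items()}


def pic(alleles):
    """PIC_i = 1 - sum_j P_ij^2 (assumed form of the formula)."""
    freqs = allele_frequencies(alleles)
    return 1.0 - sum(p * p for p in freqs.values())


def is_polymorphic(alleles, threshold=0.95):
    """A locus is polymorphic if its most frequent allele is below 0.95."""
    freqs = allele_frequencies(alleles)
    return max(freqs.values()) < threshold


if __name__ == "__main__":
    # Illustrative fragment sizes (bp) for four accessions; not study data.
    scored = [150, 150, 160, 170]
    print(pic(scored), is_polymorphic(scored))
```

Under this form, PIC approaches 1 only when many alleles occur at similar frequencies, and it equals 0 for a monomorphic marker.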

Genetic diversity of the entries/populations (based on a set of measured molecular data) was estimated using diversity parameters other than PIC [27]. These are calculated as follows: percentage of polymorphic loci (PPL):

$$P = (k/n) \times 100\%$$

where k is the number of polymorphic loci, n is the total number of loci investigated. The average number of alleles per locus (A):

$$\mathbf{A} = \Sigma \text{ Ai/n}$$

where Ai is the number of alleles at the i-th locus and n is the total number of loci investigated. The average number of alleles per polymorphic locus (Ap):

$$\text{Ap} = \Sigma \text{ Api/np}$$

where, Api is the number of alleles at a certain polymorphic locus, np is the total number of polymorphic loci investigated.

Percentage of polymorphic alleles (PPA)

$$\text{PPA} = (\Sigma \text{ Api} / \Sigma \text{ Ai}) \times 100\%$$
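The four diversity parameters defined in this subsection (PPL, A, Ap, and PPA) can be computed together from the number of alleles per locus and a flag marking which loci are polymorphic. The following Python sketch is illustrative only; the function name and the dictionary keys are hypothetical.

```python
def diversity_summary(alleles_per_locus, polymorphic_flags):
    """Compute PPL, A, Ap, and PPA.

    alleles_per_locus: list of allele counts Ai, one entry per locus.
    polymorphic_flags: list of booleans, True where the locus is polymorphic.
    """
    if len(alleles_per_locus) != len(polymorphic_flags):
        raise ValueError("one polymorphism flag is required per locus")
    n = len(alleles_per_locus)
    if n == 0:
        raise ValueError("at least one locus is required")
    # Allele counts Api restricted to polymorphic loci.
    poly_counts = [a for a, flag in zip(alleles_per_locus, polymorphic_flags) if flag]
    k = len(poly_counts)
    total_alleles = sum(alleles_per_locus)
    return {
        "PPL": k * 100.0 / n,                        # P = (k / n) * 100%
        "A": total_alleles / n,                      # A = sum(Ai) / n
        "Ap": sum(poly_counts) / k if k else 0.0,    # Ap = sum(Api) / np
        "PPA": sum(poly_counts) * 100.0 / total_alleles,
    }


if __name__ == "__main__":
    # Illustrative counts for four loci; they are not data from the study.
    print(diversity_summary([2, 8, 9, 1], [True, True, True, False]))
```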

The similarity matrix based on Nei and Li's [26] genetic distance for SSR characterization was also used for principal coordinate analysis (PCoA) with the Dcenter, Eigen, Output, and Mxplot subprograms in NTSYS-PC.

#### **3. Results and Discussion**

The analyses of variance in Tables 4 and 5 showed that the mean squares due to years were significant for most of the studied traits, except for days to heading, which would indicate large overall differences among the genotypes between years. Abdallah et al. [28] observed that the ordinary analysis of variance showed highly significant differences among environments, genotypes, and the environment × genotype interaction for root and shoot traits in both treatments (normal and drought). The variation due to the interaction between year and variety was not significant for all measures [29]. Mean squares due to environments were highly significant for all traits studied, indicating that the environments differed significantly. Mean squares due to genotype × environment interactions were highly significant for all traits except root thickness, flag leaf area, plant height, relative water content, number of panicles/plant, 100-grain weight, sterility percentage, and grain yield/plant, which indicated that the tested genotypes varied from environment to environment and ranked differently under stress than under the normal condition. Raman et al. [30] recorded that the variance analysis for grain yield indicated a highly significant genotype × degree of stress severity interaction. Mean squares due to genotype × year interactions were significant for all traits studied, except root length, root volume, root thickness, number of roots/plant, root: shoot ratio, and leaf rolling.

Because the genotype mean squares were considerably larger than the G × Y interaction mean squares, some genotypes consistently surpassed the others, which allowed the most superior genotypes to be identified. Genotype × environment × year mean squares were not significant for the studied traits, except for days to heading and leaf rolling, indicating that for these two traits each genotype's performance in a given environment changed from one year to another. The significant differences among rice genotypes in this investigation revealed genetic variability in the studied material and provided an excellent opportunity for yield improvement. Grain yield and other characteristics were stable across the seasons, while the significant genotype × environment interaction indicated that the differences among genotypes were apparent (Tables 4 and 5). This research shows that further improvement through selection for all studied characteristics could be effective. Genotype characteristics that confer an advantage in some water stress environments may prove useless or may even be a liability in other environments. This is reflected in the large G × E interactions in drought trials and the difficulty of identifying drought-tolerant check cultivars (Zhang et al. [31]).

#### *3.1. Performance Across Environments*

The ordinary analysis of variance indicated highly significant differences among genotypes for all traits studied in the combined data (Tables 4 and 5). The mean performances of the studied genotypes, combined over environments, are presented in Tables 6–8. For root length, the genotypes showing high values were IET1444, Sakha101, Sakha106, and Hybrid 2 (26.39, 25.88, 25.73, and 25.01 cm, respectively), while the lowest values were obtained from Sakha102, Giza178, Sakha104, and Sakha103 (21.94, 21.42, 19.83, and 18.71 cm, respectively). The genotypes Giza178, Hybrid 1, IET1444, and Sakha101 gave the highest values for root volume (64.06, 57.66, 54.84, and 52.28 mm3, respectively), whereas the genotypes Giza177, Giza182, Sakha105, and Sakha103 gave the lowest (35.00, 29.88, 27.27, and 16.50 mm3, respectively). Concerning root thickness, the genotypes E. Yasmine, Hybrid 1, IET1444, and Hybrid 2 had the highest values (1.08, 1.08, 1.07, and 0.97 mm, respectively). On the other hand, Giza178, Sakha102, Sakha104, and Giza182 had the lowest values (0.58, 0.56, 0.46, and 0.41 mm, respectively) (Table 6).


**Table 4.** Analysis of variance for growth characteristics under normal and drought conditions.

\*, \*\* significant and highly significant at probability 0.05 and 0.01, respectively. Env. (Environment), Gen. (Genotypes).

**Table 5.** Analysis of variance for grain yield and related traits under normal and drought in 2018 and 2019 rice growing seasons.


\*, \*\* significant and highly significant at probability 0.05 and 0.01, respectively.

**Table 6.** Performance of Growth characteristics under normal and drought conditions and their combined data.


d, n, and c are drought, normal, and combined data, respectively.

Regarding the number of roots/plant, Sakha101, IET1444, Sakha104, and Sakha106 produced the greatest numbers (272.65, 270.14, 266.86, and 248.20, respectively), while the lowest numbers were found for Sakha103, Giza182, E. Yasmine, and Giza181 (169.13, 159.39, 157.18, and 135.81, respectively). For root: shoot ratio, the genotypes Giza181, Giza179, IET1444, and Giza178 recorded the highest values (1.22, 1.13, 1.11, and 1.09, respectively), whereas the lowest values, including that of Sakha103, were 0.57, 0.55, 0.48, and 0.42. The genotype Giza178 produced the highest values for root volume and number of roots/plant. Hybrid 1 had superior values for root volume and root thickness, and IET1444 had high root volume and root: shoot ratio under drought stress compared to other genotypes, indicating that these genotypes can avoid water stress through an increased ability to absorb water from the soil. Moreover, efforts to increase yield under drought conditions have also focused on improving secondary traits such as root architecture (root length, root volume, root thickness, number of roots, and root: shoot ratio). Rice genotypes that can maintain their water status through adapted root systems fall under the drought avoidance category. These genotypes can minimize the yield losses caused by drought [32]. Rice genotypes that avoid drought usually have deep, coarse roots with a high ability for branching and soil penetration and a higher root to shoot ratio [33]. Gaballah [34] reported that the rice genotypes Moroberekan, Giza 178, and Sakha104 had the highest values for root characters under water shortage. Abdallah et al. [16] found that the genotypes GZ5121-5-2 and GZ1368-S-5-4 had thicker roots, higher root diameter, and higher root length density than those grown under normal conditions.

**Table 7.** Growth characteristics of genotype under normal, drought conditions, and their combined data.


d, n, and c are drought, normal, and combined data, respectively.


**Table 8.** Grain yield performance and related traits under normal, drought conditions and their combined data.

d, n, and c are drought, normal, and combined data, respectively.

The genotypes Sakha 102, Sakha 103, Giza177, and Sakha 105 took the fewest days to heading (94.42, 94.33, 94.17, and 93.92 days, respectively); in contrast, E. Yasmine, Giza 181, GZ1368-S-5-4, and IET1444 took significantly longer (118.25, 114.08, 112.67, and 112.50 days, respectively). These differences among rice genotypes might be attributed to their genetic background. The opposite strategy was observed in other cultivars, which showed a significant delay in maturity under drought. Heading delay is a typical drought response observed in rice (Gaballah and Abd Allah [35]), which is expected to confer a benefit in environments where stress is temporary, provided that development and flowering resume after the stress is relieved. Gaballah [34] mentioned that Moroberakan, Giza178, and Sakha104 gave the highest values for days to heading under normal and drought conditions.

The desirable mean values for leaf rolling were found in Hybrid 1, Hybrid 2, GZ1368-S-5-4, and IET1444 (2.08, 2.17, 1.58, and 1.50, respectively), while the undesirable mean values were observed in Giza177, Giza178, Giza179, and Giza181 (3.21, 2.33, 2.17, and 3.67, respectively) (Table 7). In this investigation, drought tolerance could be assessed by visual scoring based on leaf rolling. A smaller degree of leaf rolling indicates a greater degree of dehydration avoidance through the development of deep roots. Gaballah [34] mentioned that drought imposed after 12 days increased leaf rolling in rice genotypes.

Concerning the flag leaf area, which is an important functional factor for photosynthesis, assimilation, and transpiration throughout the life of the plant, the highest values, with significant differences among genotypes, were recorded for IET1444, Giza181, E. Yasmine, and Sakha101 (25.08, 21.50, 20.62, and 20.57 cm2, respectively). The lowest value was recorded for Sakha102 (14.27 cm2). Reduced soil moisture levels produced a lower leaf area, possibly because cell division is inhibited under water-starved conditions. Zubaer et al. [36] mentioned that the highest leaf area was found at 100% field capacity (FC) of the soil in all the rice genotypes. The leaf area was reduced by reducing moisture levels, but the degree of reduction was higher in Basmati (14.7 for 70% FC and 53.2% for 40% FC). There were significant differences in plant height among the studied rice genotypes, suggesting that growth rates differed among these genotypes. With respect to plant height (Table 7), the most desirable mean values towards dwarfism were obtained in the genotypes Sakha101, Giza 179, Sakha105, and Giza 177 (85.43, 85.72, 86.60, and 87.13 cm, respectively). In contrast, the tallest plants were obtained in Sakha106, Sakha104, Sakha102, and IET1444 (96.13, 96.86, 100.45, and 104.37 cm, respectively). The tolerance of rice cultivars can be assessed by the reduction in plant height under drought stress conditions. Lafitte et al. [37] indicated that lowland stress reduced height by only 4 cm (3%) on average, ranging from a 43 cm reduction to a 22 cm increase in height. Regarding relative water content, the genotypes Sakha104, IET1444, Giza179, and Sakha101 gave the maximum values (83.30, 83.72, 83.76, and 84.55%, respectively), while the minimum values were recorded in Sakha105, Sakha103, Giza177, and Giza181 (53.61, 58.78, 61.24, and 66.87%, respectively) (Table 7). Plant responses to tissue water potential determine their level of drought tolerance. Traits such as the maintenance of leaf turgor (RWC) and leaf rolling have been used as selection criteria in rice [38], owing to the ability of rice cultivars to retain water in the leaf tissue to overcome water shortage. In the present study, a similar mechanism of drought tolerance appears to operate in genotypes such as Giza 179, Sakha 101, Sakha 104, and IET1444, which were able to maintain significantly higher RWC under drought conditions. These genotypes can therefore be considered tolerant to moderately tolerant to drought stress.

For the number of panicles/plant, the genotypes Giza179, IET1444, Hybrid 1, and Hybrid 2 recorded the highest mean values (20.16, 20.18, 20.70, and 22.08, respectively), whereas the lowest values (13.92, 14.81, 14.95, and 15.35) were recorded for the genotypes GZ1368-S-5-4, Sakha102, Sakha106, and Giza177, respectively (Table 8). The heaviest 100-grain weights (3.00, 3.03, 3.14, and 3.19 g) were achieved by Sakha102, Sakha106, Giza182, and E. Yasmine, respectively. In contrast, the genotypes GZ1368-S-5-4, Giza178, IET1444, and Giza181 gave the lightest values (2.10, 2.25, 2.30, and 2.40 g, respectively). The desirable values for sterility percentage were found in the genotypes IET1444, Giza178, Hybrid 2, and Giza179 (9.58, 9.81, 10.05, and 10.28%, respectively), whereas the genotypes Sakha106, Sakha104, Giza177, and Giza181 gave undesirable values (14.53, 15.75, 16.08, and 16.25%, respectively). Concerning grain yield/plant, the genotypes Sakha101, Giza179, Hybrid 1, and Hybrid 2 gave the greatest values (35.56, 37.51, 40.00, and 41.50 g/plant, respectively). On the other hand, the lowest values were obtained from the genotypes IET1444, Giza181, Giza182, and E. Yasmine (27.14, 28.08, 29.96, and 31.03 g/plant, respectively). Mukamuhirwa et al. [39] reported that the cultivar Intsindagirabigega was the most tolerant to drought, while Zong geng was the most sensitive. The genotypes Giza178, Giza179, Hybrid 1, and Hybrid 2 gave the maximum values for water use efficiency (0.78, 0.84, 0.88, and 0.92, respectively), while the genotypes Giza181, IET1444, Sakha105, and Sakha103 recorded the minimum values (0.62, 0.62, 0.66, and 0.67, respectively). The rice genotypes Giza 179, Hybrid 1, and Hybrid 2 had higher values for the number of panicles/plant, 100-grain weight, grain yield/plant, and water use efficiency under normal conditions, under drought stress, and in their combined data. Gaballah and AbdAllah [35] mentioned that water stress reduced plant height and induced leaf rolling in the susceptible rice genotypes. 
The reduction of grain yield, number of panicles/plants, 100-grain weight, and high sterility percentage resulted from water stress at flowering and ripening stages. Water stress during vegetative, panicle initiation, flowering has reduced grain yield/plant by 28%, 34%, and 40%, respectively. Drought mitigation, through the development of drought-tolerant varieties with higher yields suitable for water-limiting environments, will be the critical factor in improving stable rice production.

#### *3.2. Number of Alleles and Allelic Diversity*

The sixteen rice genotypes used in the present study were subjected to DNA polymorphism screening and assessment using SSR markers, which offer excellent potential for generating large numbers of markers evenly distributed throughout the genome and have been used efficiently to give reliable and reproducible genetic markers. Ten SSR primer pairs related to drought tolerance, with known map positions distributed across the rice genome, were used to screen the set of sixteen selected indica, japonica, and tropical-japonica rice genotypes with different levels and mechanisms of drought tolerance.

The 10 SSR markers, spread over seven chromosomes (1, 2, 4, 5, 6, 8, and 9), generated polymorphic alleles. Table 9 shows that a total of 85 alleles were detected at the nine markers' loci across the sixteen rice genotypes. The number of alleles per locus generated by each marker varied from 2 to 15, with an average of 8.5 alleles per locus. The effective number of alleles per locus ranged from 1.20 to 3.0, with an average of 2.28. The highest effective numbers of alleles per locus were observed for RM263 (3.0), RM289 (2.81), and RM242 (2.63). Similarly low numbers of alleles per locus were also obtained by [40] (3.33) and [41] (2.5). In contrast, a high number of alleles per locus was obtained by [42] (8.57). There was a significant positive correlation between the number of alleles detected at a locus and the number of repeats within the targeted microsatellite DNA (r = 0.57 \*\*). Thus, the larger the repeat number in the microsatellite DNA, the larger the number of alleles detected. Moreover, it was reported that the dinucleotide repeat motif (GA) displayed a high level of variation among rice genotypes [24]. On the other hand, [40] reported no correlation between the number of alleles detected and the number of SSR repeats.

**Table 9.** The correlation coefficient for polymorphic SSR markers.


\*\* is highly significant at probability 0.001.

#### *3.3. Gene Diversity*

The gene diversity or heterozygosity (He) of a locus is defined as the probability that an individual in the population is heterozygous at that locus [43]. Markers with higher values of this measure tend to be more informative because they capture more allelic variation. As shown in Table 9, the He values for all SSR markers used in this study varied from 0.94 to 1.00, with an average of 0.98. These findings are in agreement with the observations of [44–46]. The highest He value (1.00) was recorded for RM23 and RM518. Meanwhile, the lowest He values (0.98) were achieved by RM518.
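For readers who want to relate gene diversity to allele frequencies, the sketch below uses two commonly applied estimators: expected heterozygosity He = 1 − Σ p_i² and the effective number of alleles Ne = 1/Σ p_i². The source does not state which estimators or software settings were used, so both formulas and the function names are assumptions made for illustration only.

```python
def _check_frequencies(freqs):
    """Validate that allele frequencies are non-negative and sum to 1."""
    if not freqs or any(p < 0 for p in freqs):
        raise ValueError("frequencies must be non-negative and non-empty")
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")


def gene_diversity(freqs):
    """Expected heterozygosity, He = 1 - sum(p_i^2) (assumed estimator)."""
    _check_frequencies(freqs)
    return 1.0 - sum(p * p for p in freqs)


def effective_alleles(freqs):
    """Effective number of alleles, Ne = 1 / sum(p_i^2) (assumed estimator)."""
    _check_frequencies(freqs)
    return 1.0 / sum(p * p for p in freqs)


if __name__ == "__main__":
    # Illustrative allele frequencies at one locus; not data from the study.
    locus = [0.5, 0.25, 0.25]
    print(gene_diversity(locus), effective_alleles(locus))
```

Under these assumed estimators the two quantities are linked by He = 1 − 1/Ne, so a locus with equally frequent alleles has Ne equal to its observed allele count.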

#### *3.4. PIC Value*

The PIC value refers to the value of a marker for detecting polymorphism within a population, depending on the number of detectable alleles and the distribution of their frequencies; thus, it provides an estimate of the discriminating power of the marker [47]. As shown in Table 3, the PIC values for the SSR markers used in this study varied from 0.83 to 0.99, with an average of 0.95. This result is consistent with Sajib et al. [40], who reported high variation in PIC values for all tested SSR loci (from 0.14 to 0.71, with an average of 0.48). Higher average PIC values were reported by Zeng et al. [48] (0.57) and Ram et al. [49] (0.707). The highest PIC values were observed for RM23 (0.99), RM518 (0.99), RM223 (0.98), and RM276 (0.98). A highly significant correlation was found between PIC values and the number of amplified alleles detected per locus (r = −0.75 \*\*), as shown in Table 9. A significant correlation was also found between PIC value and the effective number of alleles (r = 0.92 \*\*), and a highly significant correlation was found between PIC and gene diversity (r = −0.92 \*\*). Similar results were obtained by Kumar et al. [50].

Figure 1 shows the PCR-amplified fragments produced by the most polymorphic markers in the current study: RM23, RM518, RM223, and RM276. These markers had the highest PIC values, gave the same value of 1.00, and showed the highest numbers of alleles (8 to 9 per locus), suggesting that they could be used for molecular characterization of a large number of rice genotypes rather than for mapping populations for drought tolerance. The results were similar to those obtained by [10,51,52].

#### *3.5. Identified MAS Marker*

Among the ten polymorphic SSR markers, RM518 was able to divide the studied genotypes into seven groups according to their drought tolerance potential. The first group carried the first allele, with a molecular weight of 125.89 bp, and included the drought-susceptible genotypes japonica type 1 and 2. The second allele, with a molecular weight of 172.37 bp, appeared in the second group, which included the moderately drought-tolerant indica–japonica genotypes Sakha105, Hybrid 1, Giza178, Giza181, and Giza182. The third allele, with a molecular weight of 163.038 bp, was found in six genotypes: IET1444, Sakha101, Sakha102, Sakha106, Sakha104, and Giza177. The fourth allele (375.85 bp) was found in one genotype, GZ1368-S-5-4, which is of the indica–japonica type and tolerant to drought stress. The fifth allele (169.65 bp) was found in one genotype, Hybrid 2, which is moderately tolerant to drought, and the sixth allele (158.56 bp) was found in one cultivar, Sakha103, which is susceptible to drought stress. This result agreed with [53], who reported that RM472 was linked to maximum root length and root dry weight. This marker could be useful in MAS for these characters in rice. The results were similar to those obtained by [46,54].

**Figure 1.** Agarose gel electrophoresis of PCR amplified fragments for the polymorphic SSR markers RM23, RM518, RM223, and RM276. M is a 100 bp DNA ladder.

Lane 4: Giza 181; Lane 8: Sakha 103; Lane 12: E. Yasmine; Lane 16: IET 1444.

#### *3.6. Similarity*

The maximum similarity coefficient (0.5) was recorded between Giza181 and Giza182, which are both indica-type and drought-susceptible genotypes (Table 10). Moreover, relatively high similarity was observed for the japonica-type, drought-susceptible Sakha104 with Sakha102 and Sakha106 (0.45), Sakha101 with Sakha102 and Sakha106 (0.40), Sakha105 with Hybrid 1 (0.40), Hybrid 1 with Giza178 (0.40) and GZ1368-S-5-4 with Giza181 (0.40). On the other hand, no similarity was observed between genotypes such as Sakha103, Giza179, IET1444, Sakha102, and Sakha104 and the genotypes Giza178, GZ1368-S-5-4, Giza181, Giza182, E. Yasmine and Hybrid 2. These results were in harmony with [2], who reported a low similarity coefficient between japonica-type and indica-type genotypes, and with [55], who reported a relatively high level of similarity between closely related genotypes. Moreover, the findings are similar to those observed by [17,18].
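For readers unfamiliar with how such pairwise coefficients arise from SSR band data, the following sketch computes Jaccard's similarity between two genotypes scored as presence (1) or absence (0) of each allele. The band profiles and the function name are hypothetical and are not taken from Table 10.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two 0/1 band profiles of equal length.

    shared / (bands present in at least one genotype); bands absent from
    both genotypes are ignored.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of bands")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    in_either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return shared / in_either if in_either else 0.0


# Hypothetical band profiles for two genotypes (six scored alleles).
genotype_1 = [1, 1, 0, 1, 0, 1]
genotype_2 = [1, 0, 0, 1, 1, 1]
print(jaccard(genotype_1, genotype_2))  # 0.6
```

A coefficient of zero, as reported for several genotype pairs above, corresponds to two profiles that share no band at all.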

#### *3.7. Cluster Analysis*

The genetic relationships among the rice genotypes are presented in a dendrogram based on informative microsatellite alleles (Figure 2). All genotypes were grouped into two major clusters in the dendrogram at 66% similarity based on Jaccard's similarity index. Jaccard's index measures the similarity between two sets of data and ranges from 0% to 100%; the higher the percentage, the more similar the two populations. Although it is easy to interpret, it is extremely sensitive to small sample sizes and may give erroneous results, especially with very small samples or data sets with missing observations. The first cluster (A) was divided into two minor groups, A1 and A2. The A1 sub-cluster included two groups, A1-1 and A1-2, of which A1-1 contained the genotypes IET1444, GZ1368-S-5-4, and Hybrid1, which are drought stress-tolerant. The A1-2 cluster, in turn, was further divided into A1-2-1 (containing the Hybrid2 genotype) and A1-2-2 (containing Giza179 and Giza178) at a coefficient of 0.91; these genotypes represented moderate tolerance to drought stress.

**Figure 2.** Dendrogram derived from unweighted pair group method with arithmetic averages (UPGMA) cluster analysis of sixteen rice genotypes based on Jaccard's similarity coefficient using 10 SSR markers.

The A2 cluster was likewise subdivided: A2-1 included Giza181 and Giza182 at a coefficient of 0.94, while A2-2 contained the Egyptian Yasmine genotype, which had a coefficient of 0.76 with cluster A2-1. The A2 cluster comprised the sensitive, indica-type genotypes. Similarly, cluster B was divided into two minor groups, B1 and B2. Cluster B1 included B1-1 and B1-2 at a coefficient of 0.65, and B1-1 was divided into B1-1-1 and B1-1-2 at a coefficient of 0.72. In general, the B cluster comprised japonica-type genotypes, all of which were sensitive to drought stress. El-Malky et al. [42] reported the ability of SSR markers to divide varieties into two distinct groups, one including the indica varieties and the other the japonica varieties. Moreover, Zeng et al. [48] found that all genotypes grouped into two major branches in the dendrogram with less than 10% similarity based on Jaccard's similarity index; one branch represented the subspecies japonica, and the other represented the subspecies indica or hybrids between japonica and indica rice.
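The clustering step behind a dendrogram such as Figure 2 can be sketched as follows. This is a minimal pure-Python illustration of UPGMA (average linkage), assuming that similarities are first converted to distances as 1 − similarity; the three genotype labels and their similarity values are hypothetical, and the software actually used by the authors is not stated in this section.

```python
from itertools import combinations


def upgma(names, dist):
    """Agglomerative UPGMA clustering.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between leaves a, b.
    Returns the merges in order as (cluster_1, cluster_2, merge_distance),
    where each cluster is a tuple of leaf labels.
    """
    clusters = [(name,) for name in names]
    merges = []

    def average_distance(c1, c2):
        # UPGMA: arithmetic mean of all leaf-to-leaf distances between clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merges.append((c1, c2, average_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical Jaccard similarities among three genotypes.
similarity = {("A", "B"): 0.8, ("A", "C"): 0.2, ("B", "C"): 0.4}
distance = {frozenset(pair): 1.0 - s for pair, s in similarity.items()}

for left, right, height in upgma(["A", "B", "C"], distance):
    print(left, right, round(height, 2))
```

In this toy example, A and B merge first because they are the most similar pair, and C then joins them at the mean of its distances to A and B.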


**Table 10.** Similarity coefficient among studied genotypes based on SSR markers.

#### **4. Conclusions**

Genetic improvement of drought tolerance in rice can be supported by the phenotypic characterization results obtained in the present study. Extensive genetic diversity analyses proved valuable for selecting truly promising drought-tolerant genotypes, which can be used in crosses to develop genotypes with increased tolerance to water stress. The rice genotypes Giza179, IET1444, Hybrid1, and Hybrid2 achieved the greatest values for grain yield/plant and the maximum values for water use efficiency. In summary, the genotypes GZ1368-S-5-4, IET1444, Giza178, and Giza179 are suitable materials for drought tolerance breeding. Thus, the results of this study indicate that combining genetic analyses with phenotypic data is very important for accelerating breeding programs, as it helps to select suitable genotypes for improving target traits and to exclude poorly performing genotypes.

**Author Contributions:** Conceptualization, M.M.G., and A.M.F.; methodology, M.M.G. and A.M.M.; software, A.M.M.; validation, A.E.S., M.M.G.; formal analysis, A.M.F.; investigation, M.M.G.; resources, M.M.G. and A.M.F.; data curation, M.M.G.; writing—original draft preparation, M.M.G., A.M.F., and A.M.M.; writing—review and editing, A.E.S., M.S., M.B., M.M.H.; visualization, A.M.F.; supervision, M.M.G.; project administration, A.M.F.; funding acquisition, M.S., M.B., M.M.H., A.E.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The current work was funded by Taif University Researchers Supporting Project number (TURSP—2020/59), Taif university, Taif, Saudi Arabia.

**Acknowledgments:** The authors sincerely acknowledge the contributions of Rice Research and Training Center (RRTC), Field Crops Research Institute, Agricultural Research Center, 33717, Sakha, Kafr Elsheikh, Egypt for providing necessary laboratory facility during the investigation. Besides, the authors extend their appreciation to Taif University for funding current work by Taif University Researchers Supporting Project number (TURSP—2020/59), Taif University, Taif, Saudi Arabia.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


*Article*

### *QMrl-7B* **Enhances Root System, Biomass, Nitrogen Accumulation and Yield in Bread Wheat**

**Jiajia Liu 1,2,†, Qi Zhang 1,2,†, Deyuan Meng 1,2, Xiaoli Ren 1,2, Hanwen Li 3, Zhenqi Su 3, Na Zhang 1, Liya Zhi 1, Jun Ji 1, Junming Li 1, Fa Cui 4,\* and Liqiang Song 1,5,\***


**Abstract:** Genetic improvement of root systems is an efficient approach to improve yield potential and nitrogen use efficiency (NUE) of crops. *QMrl-7B* was a major stable quantitative trait locus (QTL) controlling the maximum root length in wheat (*Triticum aestivum* L.). Two types of near isogenic lines (A-NILs with superior and B-NILs with inferior alleles) were used to specify the effects of *QMrl-7B* on root, grain output and nitrogen-related traits under both low nitrogen (LN) and high nitrogen (HN) environments. Trials in two consecutive growing seasons showed that the root traits, including root length (RL), root area (RA) and root dry weight (RDW), of the A-NILs were higher than those of the B-NILs at seedling stage (SS) before winter, jointing stage (JS), 10 days post anthesis (PA10) and maturity (MS). Under the LN environment, in particular, all the root traits showed significant differences between the two types of NILs (*p* < 0.05). In contrast, there were no significant differences in aerial biomass and aerial N accumulation (ANA) between the two types of NILs at SS and JS stages. At PA10 stage, the aerial biomass and ANA of the A-NILs were significantly higher than those of the B-NILs under both LN and HN environments (*p* < 0.05). At MS stage, the A-NILs also exhibited significantly higher thousand-grain weight (TGW), plot grain yield, harvest index (HI), grain N accumulation (GNA), nitrogen harvest index (NHI) and nitrogen partial factor productivity (NPFP) than the B-NILs under the corresponding environments (*p* < 0.05). In summary, the *QMrl-7B* A-NILs had larger root systems than the B-NILs, which favored N uptake and accumulation and eventually enhanced grain production. This research provides valuable information for genetic improvement of root traits and breeding elite wheat varieties with high yield potential and NPFP.

**Keywords:** *Triticum aestivum* L.; *QMrl-7B*; root traits; grain yield; nitrogen use efficiency

#### **1. Introduction**

Bread wheat (*Triticum aestivum* L.) is one of the major crops worldwide; its production greatly affects food security and the global economy [1]. In general, high grain productivity largely depends on water and fertilizer input. However, over-application of fertilizers has led to not only natural resources exhaustion, but also soil, air and water quality degradation [2,3]. To resolve these environmental issues and ensure food security, breeding crops with efficient use of water and nutrients is urgently required for sustainable agriculture [4]. Roots are the primary organs that determine the acquisition efficiency of soil resources and have a direct impact on grain yield [5]; more and deeper roots may improve the water and mineral uptake from deeper soil layers and reduce nitrate leaching losses to the environment [6]. Although root traits are difficult to characterize and their breeding values are seldom assessed under field conditions, manipulating root system architecture to enhance nutrient uptake has been proposed to enable a very much needed new green revolution and further increase in yield potential [7].

**Citation:** Liu, J.; Zhang, Q.; Meng, D.; Ren, X.; Li, H.; Su, Z.; Zhang, N.; Zhi, L.; Ji, J.; Li, J.; et al. *QMrl-7B* Enhances Root System, Biomass, Nitrogen Accumulation and Yield in Bread Wheat. *Plants* **2021**, *10*, 764. https://doi.org/10.3390/plants10040764

Academic Editor: Igor G. Loskutov

Received: 16 March 2021 Accepted: 10 April 2021 Published: 13 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Root traits can be dissected into root number, root length (RL), root weight, root surface area (RA), root volume, root thickness, and density of primary roots, lateral roots and adventitious roots, as well as root/shoot dry weight ratio, etc. [8,9]. Since the 1990s, a large number of quantitative trait loci (QTLs) controlling root system architecture have been reported in rice, and some of them have been successfully cloned [10]. In maize, several major QTLs involving root morphology have been detected, but no causal genes have been reported yet [11]. In recent years, many QTLs for root traits in wheat have also been documented [9,12–16]. However, most of these QTLs were identified at seedling stage in hydroponic culture, and in most cases it was not clear whether these root-related QTLs were associated with yield-related traits. Considering that roots grow mainly in soil and that root traits are plastic in adapting to environmental factors such as limitation of water [17] and nutrients [18], field experiments under diverse environments are necessary to elucidate the genetic effects of QTLs identified in hydroponic culture. With precise evaluation and verification at the population level, QTLs associated with root traits may be used in molecular wheat breeding practice.

Nitrogen (N), as the key element of proteins and other biomacromolecules, is quantitatively the most important mineral nutrient for plant growth and development. Application of enough synthetic N fertilizer at the appropriate time can greatly improve crop yield [19]. However, only 30%~40% of the applied N fertilizer is taken up from soil by crops. Therefore, improving nitrogen use efficiency (NUE) in crops can help minimize the detrimental impact of N fertilizers on the environment and favor sustainable agriculture [20,21]. As a result, a number of NUE-improved cultivars of main cereal crops have been released. Numerous studies on rice [22,23], maize [24,25] and barley [26] indicated that root traits are closely related to N uptake and genetically controlled by major QTLs [27,28]. In wheat, several studies also discovered the co-localization of QTLs for root traits, nitrogen uptake and grain productivity [8,9]. These results pointed to a common genetic basis of root traits and N utilization, suggesting the tremendous potential of root traits in improving grain yield and NUE. Nevertheless, a fuller understanding of the role of the key loci conferring high NUE will facilitate their future application in molecular breeding.

Near-isogenic lines (NILs) are powerful tools to characterize the gene/QTL function for certain plant traits [29]. We [9] detected a major stable QTL, named *QMrl-7B,* controlling the maximum root length of wheat at seedling stage in hydroponic culture and developed a pair of *QMrl-7B* NILs with superior and inferior alleles, respectively. The objective of this study was to specify *QMrl-7B*'s genetic effects on root, above-ground biomass, grain yield and nitrogen accumulation, using the pair of *QMrl-7B* NILs as materials at the population level under different nitrogen environments, which would provide a valuable resource for molecular improvement of root traits.

#### **2. Results**

#### *2.1. Root Morphology of QMrl-7B NILs*

Field trials showed that the root traits of KN9204 and the *QMrl*-*7B* NILs tended to increase rapidly from the early seedling stage and then to decrease gradually later in the growth period; the highest values of root length, root area and root dry weight of the three genotypes were recorded at 10 days post anthesis (Figure 1, Table 1). Identical trends in root traits were observed in both the 2017~2018 and 2018~2019 growing seasons.

**Figure 1.** Root length (RL) (**A**–**D**), root surface area (RA) (**E**–**H**) and root dry weight (RDW) (**I**–**L**) of KN9204 and the *QMrl*-*7B* near isogenic lines (NILs) at different stages. Note: 2017~2018 and 2018~2019 indicate growing seasons; LN and HN indicate low nitrogen and high nitrogen environments, respectively; AA indicates *QMrl*-*7B* NILs with the superior alleles; BB indicates *QMrl-7B* NILs with the inferior alleles. SS, JS, PA10 and MS indicate seedling stage, jointing stage, 10 days post anthesis and maturity, respectively. Different lowercases indicate significant differences (*p* < 0.05) among the materials.

#### 2.1.1. Root Length (RL)

In the 2017~2018 growing season, the mean RLs of A-NILs vs. B-NILs at SS, JS, PA10 and MS were 33.6 vs. 25.1, 90.2 vs. 72.1, 146.3 vs. 105.1 and 92.2 vs. 69.0 cm/cm<sup>2</sup> under the LN environment (Table 1, Figure 1A), and 42.8 vs. 32.5, 128.2 vs. 113.3, 218.4 vs. 185.6 and 173.4 vs. 137.2 cm/cm<sup>2</sup> under the HN environment (Table 1, Figure 1B), respectively, indicating that the RLs of the A-NILs were 33.9%, 25.1%, 39.2% and 33.6% higher under the LN environment, and 31.7%, 13.2%, 17.7% and 26.4% higher under the HN environment, than those of the B-NILs at the comparable stages (*p* < 0.05). In the 2018~2019 growing season, the mean RLs of the A-NILs at the comparable stages were also significantly longer than those of the B-NILs under the corresponding nitrogen environments, except for the RLs at JS stage under the HN environment (Table 1; Figure 1C,D).
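The percentages quoted for the NIL comparisons are relative differences with the B-NIL mean as the baseline, i.e., (A − B)/B × 100. A short sketch, using the 2017~2018 LN root length means reported above (the function and variable names are illustrative), reproduces them:

```python
def percent_increase(a_nil, b_nil):
    """Relative advantage of the A-NIL mean over the B-NIL mean, in percent."""
    return (a_nil - b_nil) / b_nil * 100.0


# Mean root lengths (A-NILs, B-NILs) under LN in 2017~2018, as reported in the text.
rl_ln_2017 = {
    "SS": (33.6, 25.1),
    "JS": (90.2, 72.1),
    "PA10": (146.3, 105.1),
    "MS": (92.2, 69.0),
}

for stage, (a_nil, b_nil) in rl_ln_2017.items():
    print(stage, round(percent_increase(a_nil, b_nil), 1))
# SS 33.9
# JS 25.1
# PA10 39.2
# MS 33.6
```

The same calculation applies to the RA, RDW, ADW, ANA and GNA comparisons reported in the following sections.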



#### 2.1.2. Root Surface Area (RA)

In the 2017~2018 growing season, likewise, the mean RAs of A-NILs vs. B-NILs at SS, JS, PA10 and MS were 1.7 vs. 1.2, 9.1 vs. 7.3, 13.6 vs. 10.1 and 7.6 vs. 5.8 cm<sup>2</sup>/cm<sup>2</sup> under the LN environment (Table 1, Figure 1E), and 2.3 vs. 2.0, 11.8 vs. 10.7, 18.3 vs. 14.0 and 12.3 vs. 10.0 cm<sup>2</sup>/cm<sup>2</sup> under the HN environment (Table 1, Figure 1F), respectively, indicating that the mean RAs of the A-NILs were 41.7%, 24.7%, 34.7% and 31.0% higher under the LN environment, and 15%, 10.3%, 30.7% and 23% higher under the HN environment, than those of the B-NILs at the comparable stages (*p* < 0.05). In the 2018~2019 growing season, the same trends in RA difference between the two types of NILs were observed at the comparable growth stages under the corresponding nitrogen environments, except for the RAs at SS stage under the HN environment (Table 1; Figure 1G,H).

#### 2.1.3. Root Dry Weight (RDW)

In the 2017~2018 growing season, similarly, the mean RDWs of A-NILs vs. B-NILs at SS, JS, PA10 and MS were 2.2 vs. 1.8, 8.5 vs. 6.7, 11.5 vs. 9.1 and 6.1 vs. 4.7 mg/cm<sup>2</sup>, respectively, under the LN environment (Table 1, Figure 1I), indicating that the A-NILs were heavier than the B-NILs by 22.2%, 26.9%, 26.4% and 29.8% in RDW at the four growth stages (*p* < 0.05). Under the HN environment, the mean RDWs of A-NILs vs. B-NILs at SS, JS, PA10 and MS were 2.8 vs. 2.5, 10.0 vs. 9.0, 13.2 vs. 11.7 and 8.8 vs. 6.6 mg/cm<sup>2</sup>, respectively (Table 1, Figure 1J), indicating that the A-NILs were 12.0%, 11.1%, 12.8% and 33.3% heavier than the B-NILs in RDW at the four growth stages (*p* < 0.05). In the 2018~2019 growing season, the mean RDWs of the A-NILs at the comparable stages were also significantly heavier than those of the B-NILs under the corresponding nitrogen environments, except for the RDW at SS under the HN environment (Table 1; Figure 1K,L).

#### 2.1.4. Root Vertical Distribution

To investigate the root distribution in soil, the root length density (RLD), root area density (RAD) and root weight density (RWD) were measured for every 10 cm soil layer at SS, JS, PA10 and MS stages. The largest RLD, RAD and RWD values at each growth stage were recorded in the upper soil layers (0~10 cm and 10~20 cm), and the root indices then decreased gradually with increasing soil depth (Figure 2, Figures S1–S3). Noticeably, the root distribution in the 30~40 cm soil layer was much lower than in the neighboring soil layers (20~30 and 40~50 cm), which may result from the restriction of root growth by compact soil in this plough-bottom layer. The A-NILs exhibited superior RLDs, RADs and RWDs over the B-NILs in each soil layer (except for 30~40 cm) at most of the comparable stages (*p* < 0.05). Taking the 10~20 cm soil layer at PA10 stage as an example, the mean RLDs of the A-NILs under the LN environment were 3.6 cm/cm<sup>3</sup> in 2017~2018 and 4.0 cm/cm<sup>3</sup> in 2018~2019, which were 33.3% and 14.3% higher than those of the B-NILs (2.7 and 3.5 cm/cm<sup>3</sup>), respectively (Figure 2A,C). Under the HN environment, the corresponding RLDs of the A-NILs were 5.2 and 4.9 cm/cm<sup>3</sup>, which were 36.8% and 40.0% higher than those of the B-NILs (3.8 and 3.5 cm/cm<sup>3</sup>), respectively (Figure 2B,D). As expected, the RAD (Figure 2E–H) and RWD (Figure 2I–L) exhibited a distribution pattern across soil layers consistent with that of RLD.

In addition, the root distribution in the 0~30, 30~60, 60~100 and 100~150 cm groups of soil layers at PA10 stage was further analyzed (Table 2). The mean RL, RA and RDW of the A-NILs were significantly different from those of the B-NILs in most soil layers under the LN environment (*p* < 0.05), except for RL in the 0~30 cm soil layer and RA in the 60~100 cm soil layer in 2018~2019. Under the HN environment, significant differences in RL, RA and RDW between the two genotypes mainly occurred in the 0~30 and 100~150 cm soil layers (*p* < 0.05). The more abundant roots of the A-NILs relative to the B-NILs in both the upper and deeper soil would be expected to improve water and mineral uptake, especially in the water-deficient North China Plain.

**Figure 2.** Root length density (RLD) (**A**–**D**), root area density (RAD) (**E**–**H**) and root weight density (RWD) (**I**–**L**) of KN9204 and the *QMrl*-*7B* near isogenic lines (NILs) in different soil layers at 10 days post anthesis. 2017~2018 and 2018~2019 indicate growing seasons; LN and HN indicate low nitrogen and high nitrogen environments, respectively; L indicates soil layer.



Different lowercases indicate significant differences (*p* < 0.05) among materials at the same environment; SV indicates source of variation; E and G indicate environment and genotype, respectively; E\*G indicates their interaction; "\*" and "\*\*" indicate significant differences at *p* < 0.05 and *p* < 0.01 levels, respectively; "ns" indicates no significant differences.

#### *2.2. Aerial Biomass and Grain Yield of QMrl-7B NILs*

#### 2.2.1. Aerial Dry Weight (ADW)

Field trials showed that the ADWs of KN9204 and the *QMrl*-*7B* NILs increased gradually with the advancement of wheat development (Figure 3). In the 2017~2018 growing season, the mean ADWs of A-NILs vs. B-NILs at SS, JS, PA10 and MS were 48.1 vs. 41.0, 216.0 vs. 204.1, 573.8 vs. 525.2 and 882.2 vs. 832.7 g/m<sup>2</sup> under the LN environment (Figure 3A), and 100.8 vs. 89.4, 462.0 vs. 456.2, 1186.0 vs. 974.4 and 1475.2 vs. 1447.6 g/m<sup>2</sup> under the HN environment, respectively (Figure 3B). In the 2018~2019 growing season, consistent trends in ADW difference between the two types of NILs were again observed at the comparable growth stages under the corresponding nitrogen environments (Figure 3C,D).

**Figure 3.** Aerial dry weight (ADW) of KN9204 and the *QMrl-7B* near isogenic lines (NILs) at different stages. 2017~2018 (**A**,**B**) and 2018~2019 (**C**,**D**) indicate growing seasons; LN (**A**,**C**) and HN (**B**,**D**) indicate low nitrogen and high nitrogen environments, respectively; AA indicates *QMrl-7B* NILs with the superior alleles; BB indicates *QMrl-7B* NILs with the inferior alleles; SS, JS, PA10 and MS indicate seedling stage, jointing stage, 10 days post anthesis and maturity, respectively; Different lowercases indicate significant differences (*p* < 0.05) among the genotypes at the same growth stage.

Interestingly, unlike the findings for root traits, no significant differences in ADW were found between the two types of NILs at SS and JS stages under both LN and HN environments. The biggest difference in ADW between the two types of NILs was recorded at PA10 stage (Figure 3). The mean ADWs of A-NILs vs. B-NILs at this stage in 2018~2019 were 762.7 vs. 658.5 g/m<sup>2</sup> under the LN environment and 1349.1 vs. 1049.1 g/m<sup>2</sup> under the HN environment, respectively. This finding indicated that the A-NILs were heavier than the B-NILs in ADW by 9.3% and 15.8% under the LN environment, and by 21.7% and 28.6% under the HN environment, in the two trial years, respectively. Prior to harvest, no significant difference in ADW between the two types of NILs was observed under the HN environment in the two growing seasons. Under the LN environment, however, the mean ADWs of A-NILs vs. B-NILs at MS were 1231.0 vs. 1135.9 g/m<sup>2</sup> in 2018~2019, indicating that there were 6.76% and 8.37% phenotypic differences between the two types of NILs in the two years, respectively.

#### 2.2.2. Grain Yield

The trends in agronomic traits of KN9204 and the *QMrl*-*7B* NILs were basically the same in the two growing seasons. Under both LN and HN environments, there were no significant differences in plant height (PH), spike length (SL), total spikelets per spike (TSPS) and kernel number per spike (KNPS) between the two types of NILs, but the A-NILs manifested superior TGW and plot grain yield over the B-NILs (Table 3). Under the LN environment, the mean TGWs of the A-NILs were 38.8 g in 2017~2018 and 40.7 g in 2018~2019, which were 1.9 g (5.15%) and 3.3 g (8.82%) heavier than those of the B-NILs, respectively (*p* < 0.05). Under the HN environment, the mean TGWs of the A-NILs were 32.6 g in 2017~2018 and 37.9 g in 2018~2019, which were 5.50% and 6.76% higher than those of the B-NILs in the comparable growing seasons, respectively (*p* < 0.05). Consequently, the A-NILs yielded more than the B-NILs. Under the LN environment, the grain yields (GYs) of the A-NILs were 4030.9 and 5735.4 kg/ha in 2017~2018 and 2018~2019, which were 454.8 kg/ha (12.72%) and 550.2 kg/ha (10.61%) higher than those of the B-NILs, respectively (*p* < 0.05). Under the HN environment, the GYs of the A-NILs were 6388.9 and 8426.8 kg/ha in 2017~2018 and 2018~2019, which were 6.40% and 9.99% higher than those of the B-NILs (6004.1 and 7661.4 kg/ha), respectively (*p* < 0.05). What is more, the mean HI of the A-NILs was also significantly higher than that of the B-NILs under the corresponding nitrogen environments.

#### *2.3. Nitrogen Accumulation of QMrl-7B NILs*

#### 2.3.1. The Aerial N Content (ANC) and Accumulation (ANA)

Field trials revealed that the ANCs of KN9204 and the *QMrl*-*7B* NILs tended to decrease with the advancement of wheat development (Table S1, Figure 4A–D). Under the LN environment, the ANCs of A-NILs vs. B-NILs at SS, JS, PA10 and MS stages were 2.47% vs. 2.45%, 1.70% vs. 1.71%, 1.33% vs. 1.22% and 1.25% vs. 1.13% in 2017~2018, and 3.53% vs. 3.35%, 2.00% vs. 2.01%, 1.64% vs. 1.57% and 1.80% vs. 1.62% in 2018~2019, respectively. Under the HN environment, the ANCs of A-NILs vs. B-NILs at the comparable stages were 2.79% vs. 2.75%, 2.42% vs. 2.40%, 1.79% vs. 1.76% and 1.65% vs. 1.50% in 2017~2018, and 3.69% vs. 3.69%, 2.32% vs. 2.30%, 2.05% vs. 1.95% and 1.98% vs. 1.91% in 2018~2019, respectively. These results showed that the A-NILs generally exhibited higher ANC than the B-NILs, but the differences were not significant in most cases. Significant differences were found at SS and MS stages under the LN environment in 2018~2019 (*p* < 0.05).

The ANA tended to increase with the advancement of the growth period (Table S1, Figure 4E–H), but no significant differences were found between the two types of NILs at SS and JS. At PA10 and MS, on the other hand, the A-NILs exhibited significantly higher ANA than the B-NILs under both LN and HN environments (*p* < 0.05). At PA10 stage, the mean ANAs of the A-NILs vs. B-NILs were 7.62 vs. 6.35 and 12.51 vs. 10.37 g/m<sup>2</sup> under the LN environment, and 21.19 vs. 17.12 and 27.69 vs. 20.36 g/m<sup>2</sup> under the HN environment in the two growing seasons, respectively, indicating that the ANAs of the A-NILs were higher than those of the B-NILs by 20.0%, 20.6%, 23.8% and 36.0% under the corresponding environments, respectively. At MS stage, the A-NILs also accumulated more N than the B-NILs: the mean ANAs of the A-NILs vs. B-NILs were 10.99 vs. 9.42 and 22.12 vs. 18.43 g/m<sup>2</sup> under the LN environment, and 24.34 vs. 21.71 and 40.05 vs. 36.71 g/m<sup>2</sup> under the HN environment in the two growing seasons, respectively, indicating that the ANAs of the A-NILs were higher than those of the B-NILs by 16.7% and 20.0% under the LN environment and by 12.1% and 9.1% under the HN environment.
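The link between the dry weight, N content and N accumulation figures can be illustrated with a short sketch. It assumes the usual definition, N accumulation = dry weight × N content, which is not spelled out in this section; the function name is illustrative. The inputs are the 2017~2018 LN means at PA10 reported above.

```python
def n_accumulation(dry_weight_g_m2, n_content_percent):
    """N accumulation (g/m^2), assumed to be dry weight times N concentration."""
    return dry_weight_g_m2 * n_content_percent / 100.0


# 2017~2018, LN environment, PA10 stage (means reported in the text).
ana_a_nils = n_accumulation(573.8, 1.33)  # ADW and ANC of the A-NILs
ana_b_nils = n_accumulation(525.2, 1.22)  # ADW and ANC of the B-NILs

print(round(ana_a_nils, 2), round(ana_b_nils, 2))
```

The products come out close to, but not exactly at, the reported ANAs of 7.62 and 6.35 g/m<sup>2</sup>. A small gap is expected because the inputs here are rounded genotype means, whereas ANA is normally computed per plot before averaging.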

#### 2.3.2. The Grain N Content (GNC) and Accumulation (GNA)

Compared to the B-NILs, the A-NILs had higher mean GNCs, but the differences were not significant (Table 4). The GNCs of A-NILs vs. B-NILs were 2.15% vs. 2.01% and 2.64% vs. 2.44% under the LN environment, and 2.51% vs. 2.23% and 2.98% vs. 2.87% under the HN environment in the two trial years, respectively. In contrast, there were significant differences in GNAs between the two genotypes (*p* < 0.05). The GNAs of the A-NILs vs. B-NILs were 8.9 vs. 7.2 and 15.6 vs. 12.8 g/m<sup>2</sup> under the LN environment, and 16.4 vs. 13.5 and 25.3 vs. 22.1 g/m<sup>2</sup> under the HN environment in the two years; the GNAs of the A-NILs were thus 23.6%, 21.9%, 21.5%, and 14.5% higher than those of the B-NILs under the corresponding environments, respectively (Table 4).


**Table 3.** Agronomic traits of KN9204 and the *QMrl*-*7B* NILs.

yield and harvest index, respectively; different lowercases indicate significant differences (*p* < 0.05) among materials at the same environment by ANOVA.

**Figure 4.** Aerial N content (ANC) (**A**–**D**) and accumulation (ANA) (**E**–**H**) of KN9204 and the *QMrl*-*7B* near isogenic lines (NILs) at different stages. 2017~2018 and 2018~2019 indicate growing seasons; LN and HN indicate low nitrogen and high nitrogen environments, respectively; AA indicates *QMrl*-*7B* NILs with the superior alleles; BB indicates *QMrl*-*7B* NILs with the inferior alleles. SS, JS, PA10 and MS indicate seedling stage, jointing stage, 10 days post anthesis and maturity, respectively; different lowercases indicate significant differences (*p* < 0.05) among the genotypes at the same growth stage.



2017~2018 and 2018~2019 indicate growing seasons; LN and HN indicate low nitrogen and high nitrogen environments, respectively; AA indicates *QMrl*-*7B* NILs with the superior alleles; BB indicates *QMrl*-*7B* NILs with the inferior alleles; GNC indicates grain N content; GNA indicates grain N accumulation; NHI indicates N harvest index; NPFP indicates partial factor productivity of applied N; different lowercases indicate significant differences (*p* < 0.05) among the genotypes at the same environment by ANOVA.

As expected, the A-NILs manifested significantly higher mean NHIs than the B-NILs under both LN and HN environments (*p* < 0.05) (Table 4). The NHIs of A-NILs vs. B-NILs were 0.81 vs. 0.77 and 0.71 vs. 0.69 under the LN environment, and 0.68 vs. 0.62 and 0.63 vs. 0.60 under the HN environment in the two consecutive growing seasons, respectively, indicating that the NHIs of the A-NILs were higher than those of the B-NILs by 2.9% to 5.2% under the LN environment and by 5.0% to 9.7% under the HN environment. Meanwhile, the NPFPs of the A-NILs vs. the B-NILs were 28.02 vs. 26.33 kg kg<sup>−1</sup> in 2017~2018 and 36.96 vs. 33.60 kg kg<sup>−1</sup> in 2018~2019, respectively, indicating that the NPFPs of the A-NILs were 6.4% to 10.0% higher than those of the B-NILs under normal nitrogen management (*p* < 0.05) (Table 4).
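The two efficiency indices can be illustrated with a minimal sketch, assuming the usual definitions: NHI = grain N accumulation / total aerial N accumulation at maturity, and NPFP = grain yield / amount of N applied. The N rate of 228 kg/ha used below is not stated in this section; it is simply the value implied by dividing the reported HN grain yields by the reported NPFPs, and should be treated as an assumption. Function and variable names are illustrative.

```python
def nitrogen_harvest_index(grain_n_g_m2, aerial_n_g_m2):
    """NHI: share of above-ground N found in the grain at maturity."""
    return grain_n_g_m2 / aerial_n_g_m2


def n_partial_factor_productivity(grain_yield_kg_ha, n_applied_kg_ha):
    """NPFP: kg of grain produced per kg of N applied."""
    return grain_yield_kg_ha / n_applied_kg_ha


# Assumption: back-calculated from the reported yields and NPFPs, not quoted from the text.
ASSUMED_N_RATE_KG_HA = 228.0

# A-NILs, 2017~2018: GNA and ANA at maturity under LN; grain yield under HN.
nhi = nitrogen_harvest_index(8.9, 10.99)
npfp = n_partial_factor_productivity(6388.9, ASSUMED_N_RATE_KG_HA)

print(round(nhi, 2))   # 0.81
print(round(npfp, 2))  # 28.02
```

Both rounded values match the figures reported for the A-NILs in 2017~2018, which suggests, but does not prove, that these are the definitions behind Table 4.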

#### **3. Discussion**

#### *3.1. The Plasticity of Wheat Root Traits Is Affected by Both Genetic and Environmental Factors*

A characteristic feature of plant developmental plasticity is that it does not follow a rigidly predefined plan but, instead, is continuously susceptible to modification by interactions with the environment [30,31]. Root architecture is a complicated trait that is not only controlled by endogenous genes/QTLs but also affected by the soil environment. In Arabidopsis, for example, genes such as *MONOPTEROS* (*MP*) and *BODENLOS* (*BDL*) regulate root architecture through repressing primary root development [32,33]. In rice, Yao et al. [34] found that the short-root mutant, *srt5*, showed extreme inhibition of seminal root, crown root and lateral root elongation, as well as altered root hair formation at the seedling stage. The PIN1 family genes *OsPIN1* and *ZmPIN1* play important roles in root growth in rice [35] and maize [36], respectively. In wheat, suppression of *LATERAL ROOT DENSITY* (*LRD*) expression in RNAi plants confers the ability to maintain root growth under water limitation and has a positive pleiotropic effect on grain size and number under optimal growth conditions [37]. Overexpression of *TaTRIP1* [38] affects root growth in Arabidopsis, while knockdown of the transcription factor *TabZIP60* can increase lateral root branching in wheat [39]. Uga et al. [22] reported that *DRO1*, a rice quantitative trait locus controlling root growth angle, is involved in cell elongation in the root tip that causes asymmetric root growth and downward bending of the root in response to gravity. Maccaferri et al. [14] revealed 20 clusters of QTLs controlling root architecture traits such as root length, root number and root angle in wheat. *QMrl-7B*, a major stable QTL controlling maximum root length, was reported to regulate root development of wheat in hydroponic culture under different nitrogen conditions [9]. All the above findings indicate that root architecture is controlled both by major genes and by QTLs with moderate or minor effects.

Root developmental plasticity is strongly influenced by environmental factors, including soil water deficiency [40] and insufficient nutrient availability [18]. The developmental response of crops to drought stress is manifested through enhanced root growth and suppressed shoot growth, resulting in an increased root/shoot ratio [41]. According to Zhang et al. [17], the root growth of bread wheat in the North China Plain was even before winter, remained almost static in winter, increased rapidly between the jointing and grain filling stages, and then decreased at maturity. In the present study, the pair of *QMrl-7B* near-isogenic lines showed similar root growth patterns, and root traits including root length, root surface area and root dry weight responded plastically to varied soil nitrogen supplies. Interestingly, there were significant differences in root traits between the two types of *QMrl-7B* NILs from the seedling stage to maturity under both low and high nitrogen environments, indicating that *QMrl-7B* played a vital role in the maintenance of root traits (Tables 1 and 2). The *QMrl-7B* allele from KN9204 had a significant positive effect on wheat root growth and development. For root vertical distribution, there was always a significant difference between the two types of NILs, especially in deep soil, regardless of the nitrogen environment (Tables 1 and 2). This result further supported the persistent effect of *QMrl-7B* on root development.

#### *3.2. The Association of Root System with Nitrogen Accumulation*

As an integral part of plants, roots are involved in the acquisition of water and nutrients and thereby affect the efficiency of nitrogen uptake and utilization. Several studies in maize revealed that a larger root system contributed to effective N accumulation in N-efficient cultivars in comparison with N-inefficient cultivars [42,43]. Different wheat varieties responded to low N supply by expanding their root traits, such as root length, but manifested varied N accumulation [44]. Ehdaie et al. [45] reported positive and significant correlations between root biomass and plant N content, and between root biomass and grain yield, in wheat. KN9204, the donor of the superior *QMrl-7B* alleles, is a nitrogen-efficient wheat cultivar [46] with long roots and a large root system [47]. In the current study, the two types of *QMrl-7B* NILs did not show a significant difference in aerial nitrogen accumulation before the jointing stage, but the A-NILs, with large root systems, exhibited enhanced N accumulation in both the aerial vegetative organs at anthesis and the grain compared with the B-NILs, with small root systems, particularly under the LN environment (Figure 4, Table 4). These results demonstrated that *QMrl-7B* has a prolonged positive effect on N accumulation during later vegetative growth and reproductive development of wheat.

Saengwilai et al. [48] found that maize genotypes with few crown roots in six RILs had 45% greater rooting depth in low-N soils, which further enhanced N acquisition, biomass and grain yield. Li et al. [28] detected 331 QTLs for root and NUE-related traits in maize and found that about 70% of the QTLs for NUE-related traits co-located in a cluster with those for root traits, suggesting genetic associations between root and NUE-related traits in most cases. Some reports in wheat revealed the linkage or co-localization of root trait QTLs with N uptake QTLs [8,12]. Using the KN9204-derived RIL population, Fan et al. [9] detected a series of QTLs for root architecture and NUE-related traits and found that most of them mapped to nine clusters. In the present study, the pleiotropic effects of *QMrl-7B* were shown by the persistently larger root system (Figures 1 and 2) and the higher N accumulation in the above-ground part and grain of the A-NILs (Table 3). In rice, Obara et al. [49] detected five QTLs for root system architecture and found that the most effective QTL increased the maximum root length and total root length by 15.2–24.6% in a near-isogenic line (NIL) over a wide range of nitrogen concentrations. Other studies showed that root and NUE-related traits might be regulated by the same gene. For example, overexpression of *TaNAC2-5A* enhanced root growth and the nitrate influx rate in wheat, increased the root's ability to acquire nitrogen and nitrogen accumulation in aerial parts, and eventually allocated more nitrogen to grains [50].

#### *3.3. The Ideal Root System Enhances Biomass, Grain Yield and NUE*

Up to now, studies have principally supported the theory that a larger root system is positively correlated with enhanced nutrient uptake, biomass accumulation and yield formation [51]. In the present study, the A-NILs with superior alleles at *QMrl-7B* exhibited much larger root systems than the B-NILs with inferior alleles from the seedling stage to harvest. Interestingly, the seedling aerial biomass of the A-NILs was not significantly different from that of the B-NILs (Figure 1), and this lack of difference between the two types of NILs persisted until the jointing stage. At the PA10 stage, the aerial biomass of the A-NILs increased dramatically and significantly surpassed that of the B-NILs (*p* < 0.05) (Figure 1). Up to maturity, the root dry weight of the two types of NILs paralleled the aerial biomass and grain yield linearly. These results illustrated that root biomass is not correlated with aboveground biomass during early vegetative growth of a given wheat genotype, but that the large root system formed during the seedling stage is potentially associated with the final biomass and grain yield. The *QMrl-7B* donor parent KN9204, a nitrogen-efficient cultivar [46], bears a larger root system but a moderate tiller number and vegetative biomass at the early seedling stage compared with the well-known 1RS-1BL cultivar 'Lovrin 10' [47]. Overall, we propose that a luxuriant root system, rather than abundant above-ground biomass before jointing, may be an essential characteristic of modern wheat cultivars with high yield and NUE.

In wheat, deep root systems contribute to greater yield potential under drought conditions [52]. The drought-adapted genotype SeriM82 showed longer root systems in deep soil layers and higher potential grain yield [41], and KN9204, with its robust root system, showed high grain yield and high NUE [46]. Similarly, the A-NILs with large root systems also showed a higher aerial biomass prior to harvest than the B-NILs (Figure 3), indicating higher potential grain yield. Some studies have pointed out that abundant roots in deep soil are essential for wheat growth and final yield, especially under water and nutrient deficiency [53,54]. The A-NILs manifested large root systems in the 100~150 cm soil layers under both LN and HN environments, and also showed significantly higher grain yield than the B-NILs. These results demonstrated that *QMrl-7B* has a positive effect on aboveground biomass and grain yield.

Among the root system architecture traits, the maximum root length determines the root depth in soil and is considered the most important root trait affecting crop yield [55]. Cane et al. [56] detected a QTL controlling root length on chromosome 7B that co-located with grain weight in durum wheat. Fan et al. [9] found that the cluster C7B had a striking effect on TGW and that the KN9204 allele at the *QMrl-7B* locus could improve TGW by 4 g (10.64%). In the present study, the mean TGW of the A-NILs was significantly higher than that of the B-NILs, by 5.15% to 8.82% under the LN environment and 5.50% to 6.76% under the HN environment (Table 3), when they were planted at the population level. However, this effect was much smaller than that obtained at the individual level when the RILs (10.64%) and *QMrl-7B* NILs (9.19%) were planted in a large row [9]. It seems that planting density has a vital influence on the precise evaluation of the genetic effect of *QMrl-7B*. Moreover, the increased TGW conferred by *QMrl-7B* greatly contributed to the plot grain yield of the A-NILs, which exceeded that of the B-NILs by 10.61% to 12.72% under the LN environment and 6.40% to 9.99% under the HN environment (Table 3). These results at the population level further showed that *QMrl-7B* is of great value for increasing grain weight and grain yield.
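The percentages quoted in this paragraph are relative differences against the lower-performing line. As a minimal sketch of that arithmetic (the function names are illustrative, and the baseline TGW below is back-calculated from the reported 4 g and 10.64% rather than taken from the source):

```python
def percent_increase(improved: float, baseline: float) -> float:
    """Relative increase of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


def baseline_from_gain(absolute_gain: float, percent_gain: float) -> float:
    """Baseline value implied by an absolute gain and its percentage."""
    return absolute_gain / (percent_gain / 100.0)


# Reported in the text: the KN9204 allele improved TGW by 4 g (10.64%).
# Both values below are derived here for illustration only.
implied_baseline_tgw = baseline_from_gain(4.0, 10.64)  # grams
implied_improved_tgw = implied_baseline_tgw + 4.0      # grams
```

Under these assumptions the implied baseline TGW is roughly 37.6 g, which is only a consistency check on the reported figures, not a measured value.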

In conclusion, NILs with superior alleles of *QMrl-7B* not only manifested a luxuriant root system, but also had positive effects on aboveground biomass, grain yield and NPFP, indicating that *QMrl-7B* could facilitate genetic improvement of the wheat root system. This study therefore provides a valuable case showing that improving the root system via genetic manipulation can contribute directly to increased yield and NPFP.

#### **4. Materials and Methods**

#### *4.1. Plant Materials and Experimental Design*

A major stable QTL, *QMrl-7B* (controlling maximum root length), was identified in hydroponic culture using the recombinant inbred line population derived from the cross between KN9204 and J411 (KJ-RIL) [9]. This QTL was located in the interval 89.50–92.50 cM, and the candidate physical region preliminarily ranged from 580.13 to 590.13 Mb (IWGSC1.0) [9]. A residual heterozygous line, KJ-RIL239, which was heterozygous within the confidence interval of *QMrl-7B* as detected by twelve PCR markers across this interval [9], was selected from the F6 progeny and self-pollinated for four generations to the F10 progeny. From this progeny, two types of *QMrl-7B* NILs were developed: those with superior alleles from KN9204 (A-NILs) and those with inferior alleles from J411 (B-NILs). In this study, the superior parent KN9204, three A-NILs (A1, A2 and A3) and three B-NILs (B1, B2 and B3) were used as materials.

The seven materials were evaluated under two nitrogen environments in a split-plot design with three replicates at Luancheng (37°53′ N, 114°41′ E, 54 m altitude), Hebei Province, China, in the 2017~2018 and 2018~2019 growing seasons (two years × two controlled environments × three replicates). The low nitrogen (LN) environment was located on a long-term experimental site where no nitrogen fertilizer but 600 kg ha<sup>−1</sup> of superphosphate (around 16% P<sub>2</sub>O<sub>5</sub>) was applied throughout the growing period. In the high nitrogen (HN) environment, 300 kg ha<sup>−1</sup> of diammonium phosphate and 225 kg ha<sup>−1</sup> of urea were applied before sowing, and 150 kg ha<sup>−1</sup> of urea was applied at the elongation stage with irrigation every year. The field was irrigated twice, at elongation and at anthesis, to maintain a soil water status suitable for wheat growth. Soil fertility within the top tillage soil layer (0~20 cm) in each environment was measured after harvest (Table S2).

The plot was 6.3 m<sup>2</sup> (7.0 m × 0.9 m) containing 6 rows 0.18 m apart, and 280 seeds were evenly planted in each row. All of the recommended agronomic practices were followed in each of the trials except for the nitrogen fertilization treatment as described above.

#### *4.2. Root Sampling and Measurement*

Roots were sampled at the seedling stage before winter (SS), the jointing stage (JS), 10 days post anthesis (PA10) and maturity (MS) under both LN and HN environments during the 2017~2018 and 2018~2019 growing seasons. After removing the above-ground part of the plants, a corer of 10 cm diameter was used to take soil cores from the rows in each plot. The sampling depth was 60, 100, 150 and 160 cm at SS, JS, PA10 and MS, respectively, at intervals of 10 cm. The soil cores were taken to the laboratory and the root samples were obtained as described by Zhang et al. [18]. The root samples were stored at −20 °C to prevent decay. To quantify root length (RL, cm) and root surface area (RA, cm<sup>2</sup>), the root samples were spread in a transparent dish, scanned using a ScanMaker i800 Plus Scanner (600 DPI) and analyzed with LA-S software (Hangzhou Wanshen Detection Technology Co., Ltd., Hangzhou, China, www.wseen.com). After being scanned, the roots were collected, oven-dried at 105 °C for an hour and then kept at 80 °C until constant weight to determine root dry weight (RDW, mg). The root length density (RLD, cm/cm<sup>3</sup>), root area density (RAD, cm<sup>2</sup>/cm<sup>3</sup>) and root weight density (RWD, mg/cm<sup>3</sup>) were calculated by dividing RL, RA and RDW by the soil core volume.
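The density calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes each density is computed per 10 cm core segment, that the segment is a cylinder with the 10 cm corer diameter, and all function names and input values are hypothetical.

```python
import math


def core_segment_volume_cm3(diameter_cm: float = 10.0,
                            segment_length_cm: float = 10.0) -> float:
    """Volume of one cylindrical soil core segment (assumed geometry)."""
    radius_cm = diameter_cm / 2.0
    return math.pi * radius_cm ** 2 * segment_length_cm


def root_densities(rl_cm: float, ra_cm2: float, rdw_mg: float,
                   volume_cm3: float) -> dict:
    """Divide RL, RA and RDW by the soil core volume, as in the text."""
    return {
        "RLD_cm_per_cm3": rl_cm / volume_cm3,
        "RAD_cm2_per_cm3": ra_cm2 / volume_cm3,
        "RWD_mg_per_cm3": rdw_mg / volume_cm3,
    }


volume = core_segment_volume_cm3()  # about 785.4 cm3 for a 10 cm x 10 cm segment
# Hypothetical measurements for one segment, for illustration only.
example = root_densities(rl_cm=1200.0, ra_cm2=150.0, rdw_mg=60.0,
                         volume_cm3=volume)
```

Keeping the volume as an explicit argument makes it easy to substitute a different segment length if the cores were divided differently.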

#### *4.3. Yield-Related Trait Evaluation*

Ten representative plants in the center of the plot were randomly sampled at physiological maturity to evaluate the yield-related traits. Plant height (PH, cm), spike number per plant (SN), spike length (SL, cm), total spikelets per spike (TSPS), sterile spikelet number per spike (SSPS) and kernel number per spike (KNPS) were determined. Thousand-grain weight (TGW, g) was evaluated after harvest using the Seed Counting and Analysis System of the WSeen SC-G Instrument (Hangzhou Wanshen Detection Technology Co., Ltd., Hangzhou, China, www.wseen.com). The grain yield per plot (GY, kg/ha) was measured after harvest.

#### *4.4. Measurement of Nitrogen-Related Traits*

Ten representative plants in each plot were randomly sampled at the SS, JS, PA10 and MS stages, and the aerial part was oven-dried at 105 °C for an hour and then kept at 80 °C until constant weight to determine dry matter accumulation (DW, g). The aerial part at MS was further divided into shoot and grain parts. The dry matter accumulation was converted to aerial dry weight per unit area (ADW, g/m<sup>2</sup>) and grain dry weight per unit area (GDW, g/m<sup>2</sup>) according to the number of plants per unit area. The dried samples were ground and passed through a 0.5 mm sieve to determine the total aerial N content (ANC, %) and total grain N content (GNC, %) using a standard Kjeldahl procedure. Based on grain yield, dry weight and total N content, a suite of traits was calculated as follows:

Harvest index (HI) = GDW/ADW

Aerial N accumulation (ANA, g/m<sup>2</sup>) = ANC × ADW

Grain N accumulation (GNA, g/m<sup>2</sup>) = GNC × GDW

N harvest index (NHI) = GNA/ANA

Partial factor productivity of applied N (NPFP, kg kg<sup>−1</sup>) = GY/N applied amount
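The derived traits above can be sketched in a few lines. This is an illustration only: the function name and input values are hypothetical, N contents given in percent are converted to fractions (an assumption about how the products were evaluated), and NPFP is returned as `None` when no N was applied, since the source does not state how that case was handled.

```python
def nitrogen_traits(adw_g_m2: float, gdw_g_m2: float,
                    anc_pct: float, gnc_pct: float,
                    gy_kg_ha: float, n_applied_kg_ha: float) -> dict:
    """Compute HI, ANA, GNA, NHI and NPFP from the definitions in the text."""
    hi = gdw_g_m2 / adw_g_m2
    ana = anc_pct / 100.0 * adw_g_m2   # assumption: percent converted to fraction
    gna = gnc_pct / 100.0 * gdw_g_m2   # assumption: percent converted to fraction
    nhi = gna / ana
    # Assumption: NPFP is undefined when the applied N amount is zero.
    npfp = gy_kg_ha / n_applied_kg_ha if n_applied_kg_ha > 0 else None
    return {"HI": hi, "ANA": ana, "GNA": gna, "NHI": nhi, "NPFP": npfp}


# Hypothetical plot values, not data from the study.
traits = nitrogen_traits(adw_g_m2=1500.0, gdw_g_m2=600.0,
                         anc_pct=1.2, gnc_pct=2.0,
                         gy_kg_ha=7500.0, n_applied_kg_ha=180.0)
```

With these made-up inputs the sketch gives HI = 0.4, ANA = 18 g/m<sup>2</sup>, GNA = 12 g/m<sup>2</sup>, NHI of about 0.67 and NPFP of about 41.7 kg kg<sup>−1</sup>.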

Statistical analyses were conducted using the SPSS 20.0 (SPSS, Chicago, IL, United States) and the ANOVA was used to test the difference of the above traits among the genotypes at *p* < 0.05.
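The analysis itself was run in SPSS. Purely to illustrate the kind of test involved, the sketch below computes a one-way ANOVA F statistic across genotype groups in plain Python. It is a simplification (the trial used a split-plot design), the function name and data are hypothetical, and the *p*-value would still need to be obtained from an F distribution with the returned degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical trait values for three genotypes with three replicates each.
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0],
                                      [2.0, 3.0, 4.0],
                                      [5.0, 6.0, 7.0]])
```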

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/plants10040764/s1, Table S1: Aerial N content (ANC) and accumulation (ANA) of KN9204 and the *QMrl-7B* NILs at different stages; Table S2: Summary of the major macronutrients in top tillage soil layer (0~20 cm) during the two growing seasons; Figure S1: Root length density (RLD), root surface area density (RAD) and root weight density (RWD) of KN9204 and the *QMrl-7B* NILs at seedling stage before winter (SS); Figure S2: Root length density (RLD), root surface area density (RAD) and root weight density (RWD) of KN9204 and the *QMrl-7B* NILs at jointing stage (JS); Figure S3: Root length density (RLD), root surface area density (RAD) and root weight density (RWD) of KN9204 and the *QMrl-7B* NILs at maturity (MS).

**Author Contributions:** L.S. and F.C. planned the research project. J.L. (Jiajia Liu), Q.Z., F.C. and L.S. genotyped the materials. J.L. (Jiajia Liu), Q.Z., L.S., J.J., N.Z., D.M., X.R., L.Z. and J.L. (Junming Li) conducted the field experiments. J.L. (Jiajia Liu), H.L. and L.S. analyzed the data and wrote the manuscript. F.C., Z.S. and J.L. (Junming Li) revised the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was jointly funded by the Natural Science Foundation of Hebei Province (C2019503066), the National Key Research and Development Program of China (2016YFD0100706) and China Agriculture Research System (CARS-03).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** We are grateful to our colleague Xiying Zhang for her critical review of the manuscript.

**Conflicts of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

#### **References**


### *Article* **Genome-Wide Association Mapping of Salinity Tolerance at the Seedling Stage in a Panel of Vietnamese Landraces Reveals New Valuable QTLs for Salinity Stress Tolerance Breeding in Rice**

**Thao Duc Le 1, Floran Gathignol 2, Huong Thi Vu 1, Khanh Le Nguyen 3, Linh Hien Tran 1, Hien Thi Thu Vu 4, Tu Xuan Dinh 5, Françoise Lazennec 2, Xuan Hoi Pham 1, Anne-Aliénor Véry 6, Pascal Gantet 2,7,\* and Giang Thi Hoang 1,\***


**Abstract:** Rice tolerance to salinity stress involves diverse and complementary mechanisms, such as the regulation of genome expression, activation of specific ion-transport systems to manage excess sodium at the cell or plant level, and anatomical changes that avoid sodium penetration into the inner tissues of the plant. These complementary mechanisms can act synergistically to improve salinity tolerance in the plant, which makes it attractive in breeding programs to pyramid complementary QTLs (quantitative trait loci) in order to improve salinity stress tolerance of the plant at different developmental stages and in different environments. This approach presupposes the identification of salinity tolerance QTLs associated with different mechanisms involved in salinity tolerance, which requires the greatest possible genetic diversity to be explored. To contribute to this goal, we screened an original panel of 179 Vietnamese rice landraces genotyped with 21,623 SNP markers for salinity stress tolerance under 100 mM NaCl treatment, at the seedling stage, with the aim of identifying new QTLs involved in salinity stress tolerance via a genome-wide association study (GWAS). Nine salinity tolerance-related traits, including the salt injury score, chlorophyll and water content, and K<sup>+</sup> and Na<sup>+</sup> contents, were measured in leaves. GWAS analysis allowed the identification of 26 QTLs. Interestingly, ten of them were associated with several different traits, which indicates that these QTLs act pleiotropically to control the different levels of plant responses to salinity stress. Twenty-one identified QTLs colocalized with known QTLs. Several genes within these QTLs have functions related to salinity stress tolerance and are mainly involved in gene regulation, signal transduction or hormone signaling. Our study provides promising QTLs for breeding programs to enhance salinity tolerance and identifies candidate genes that should be further functionally studied to better understand salinity tolerance mechanisms in rice.

**Keywords:** rice; GWAS; salinity tolerance; Vietnamese landraces; QTL

**Citation:** Le, T.D.; Gathignol, F.; Vu, H.T.; Nguyen, K.L.; Tran, L.H.; Vu, H.T.T.; Dinh, T.X.; Lazennec, F.; Pham, X.H.; Véry, A.-A.; et al. Genome-Wide Association Mapping of Salinity Tolerance at the Seedling Stage in a Panel of Vietnamese Landraces Reveals New Valuable QTLs for Salinity Stress Tolerance Breeding in Rice. *Plants* **2021**, *10*, 1088. https://doi.org/10.3390/plants10061088

Academic Editor: Igor G. Loskutov

Received: 14 April 2021 Accepted: 25 May 2021 Published: 28 May 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

More than one third of cultivated lands are polluted by excess salt (NaCl) [1]. Sodium is a toxic element for plants, and this is particularly true for rice, which is often cultivated in river delta areas where irrigation water is increasingly contaminated by sea water [2]. Rice is the most important food crop, feeding more than three billion people in the world [3]. In Vietnam, rice occupies 85% of the total agricultural area [4]. However, with 3620 km of coastline spreading from north to south, Vietnam has been ranked among the top five countries likely to be most affected by climate change [5]. Vietnam is one of the leading rice exporters in the world and consequently plays an important role in food supply security, particularly in Asian countries [6,7]. The Mekong River Delta and Red River Delta are the main areas of rice production in Vietnam; the Mekong River Delta represents 50% of the total rice production area and supplies 90% of the rice exported by the country [7]. The Mekong River Delta is increasingly threatened by rising salinity due to sea water intrusion that results from different climatic and anthropic factors [8]. According to data from the Ministry of Science and Technology of Vietnam, at the end of 2015 and the first months of 2016, saline intrusion in the Mekong River Delta reached the highest level measured during the past 100 years. In addition to global management of the causes leading to increases in salinity, it is important to breed new varieties of rice tolerant to salinity, which requires the identification of genetic determinants conferring salinity tolerance [9]. Several salinity tolerance QTLs (quantitative trait loci) have been identified in rice using association genetics approaches, and the mechanisms underlying rice salinity tolerance are becoming well understood (for reviews see [10,11]). The mechanisms involved in rice salinity tolerance act at different levels and combine transcriptional and posttranscriptional or posttranslational regulation events that lead to sodium exclusion or compartmentation in specific cellular structures, osmolyte production or anatomical changes that avoid sodium penetration into the internal tissues of the plant [10]. These mechanisms act in different complementary ways that synergistically allow salinity tolerance [10]. For these reasons, it is worthwhile to combine genetic sources with different and complementary salinity tolerance mechanisms to increase resistance to salinity, which can also buffer the susceptibility of QTL effects to environmental conditions [12,13]. To identify such complementary sources of salinity tolerance, it is necessary to look for them in the widest and most diverse panels of varieties possible.

To contribute to this goal, in this study, we screened a genotyped panel of 179 Vietnamese landrace varieties of indica and japonica rice collected in different agrosystems from North to South Vietnam for salinity tolerance [14]. Vietnamese landrace varieties are often underrepresented in the international panels studied, such as the 3K panel developed by the International Rice Research Institute (IRRI), even though they potentially constitute an original source of valuable alleles [15,16]. We have already used this panel to identify valuable QTLs associated with root, leaf or panicle traits and water deficit tolerance by genome-wide association study (GWAS) [17–20]. The plants were screened for salinity stress tolerance at an early developmental stage using a hydroponic culture system in the presence of 100 mM NaCl. The phenotypic standard evaluation score (SES) [21], chlorophyll and relative water content and the concentrations of K<sup>+</sup> and Na<sup>+</sup> ions were measured in leaves. GWAS revealed 26 QTLs, including 10 QTLs associated with several traits. Most of these QTLs contain candidate genes that may explain their effect on salinity tolerance, and the functions of these genes are discussed further.

#### **2. Results**

#### *2.1. Phenotypic Variation and Heritability of Salinity Tolerance-Related Traits*

The phenotyping experiment was conducted for three consecutive years, from 2017 to 2019. The observed salt tolerance diversity in different accessions was reproducible. The data from the last trial were chosen for performing GWAS; for this trial, the screening protocol was improved and standardized for the Vietnamese rice landrace panel and for the parameter measurements, as described in the Materials and Methods section. In this trial, on the tenth day after the start of salinization, 25 plots of 19 accessions, including the susceptible check IR29, were observed to have simultaneously reached a score of 7. A total of 9 salinity tolerance-related traits were evaluated, three of which (leaf water content (WC), chlorophyll a to chlorophyll b ratio in leaves (Chla\_b), and ratio of Na<sup>+</sup>/K<sup>+</sup> in leaves (Na\_K)) were computed from the directly measured traits. Statistical analysis was conducted for the full panel and the indica and japonica subpanels (Table 1). Within the full panel and the indica subpanel, significant replication and genotypic effects were observed for most of the traits, with the exception of Chla\_b. Meanwhile, the genotypic effect for the chlorophyll traits of the japonica subpanel was not significant (Table 1). The broad-sense heritability (H<sup>2</sup>) calculated for each trait with a significant genotypic effect was moderate to high, varying from 0.40 to 0.76, with high values recorded for WC, Score and the three ion content traits.
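The heritability formula is not given in this section. As a hedged sketch, one common entry-mean estimator derives the variance components from the ANOVA mean squares for genotype and residual; this estimator, the function name and the input values are assumptions for illustration and may differ from the calculation the authors describe in their Materials and Methods.

```python
def broad_sense_heritability(ms_genotype: float, ms_error: float,
                             n_reps: int) -> float:
    """Entry-mean broad-sense heritability from ANOVA mean squares.

    Assumed model: sigma2_g = (MS_genotype - MS_error) / r and
    H2 = sigma2_g / (sigma2_g + sigma2_e / r), with sigma2_e = MS_error.
    """
    sigma2_g = (ms_genotype - ms_error) / n_reps
    sigma2_e = ms_error
    return sigma2_g / (sigma2_g + sigma2_e / n_reps)


# Hypothetical mean squares with three replications.
h2_example = broad_sense_heritability(ms_genotype=10.0, ms_error=4.0, n_reps=3)
```

Under this model the expression simplifies to 1 − MS_error/MS_genotype, so the hypothetical inputs above give H<sup>2</sup> = 0.6.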

**Table 1.** Phenotypic variation and broad-sense trait heritability for the three panels.


n: number of accessions; Rep: replication; CV: coefficient of variation; H<sup>2</sup>: broad-sense heritability; WC: leaf water content; Score: score of visual salt injury; Chl\_total: total chlorophyll content in leaves; Chla: chlorophyll a content in leaves; Chlb: chlorophyll b content in leaves; Chla\_b: chlorophyll a to chlorophyll b ratio in leaves; ConcK: K<sup>+</sup> concentration in leaves; ConcNa: Na<sup>+</sup> concentration in leaves; Na\_K: ratio of Na<sup>+</sup>/K<sup>+</sup> in leaves.

Significant phenotypic variation was observed for all of the traits, with coefficients of variation (CVs) ranging from 15.35% to 70.47% (Table 1). Figure 1 shows statistically significant differences in the mean values of WC, Score and ConcK between the indica and japonica subpanels. Specifically, the indica subpanel displayed a lower WC and higher Score and ConcK than the japonica subpanel (Table 1, Figure 1). Consequently, for the Vietnamese rice landrace panel used in this study, indica accessions were considered less salt-tolerant than japonica accessions.
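For reference, the coefficient of variation expresses the standard deviation as a percentage of the mean. A minimal sketch, assuming the sample standard deviation is used (the source does not say which) and with hypothetical trait values:

```python
from statistics import mean, stdev


def coefficient_of_variation(values) -> float:
    """CV in percent: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical trait values, not data from the panel.
cv_example = coefficient_of_variation([10.0, 12.0, 14.0])
```

Because the CV is unitless, it allows the spread of traits measured on different scales, such as Score and ConcNa, to be compared directly.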

**Figure 1.** Boxplots of the distribution of salinity tolerance-related traits. Indica subpanel in red; japonica subpanel in blue; statistical significance (ANOVA *p*-values) between the two subpanels is indicated; (**a**) leaf water content (WC); (**b**) score of visual salt injury (Score); (**c**) total chlorophyll content in leaves (Chl\_total); (**d**) chlorophyll a content in leaves (Chla); (**e**) chlorophyll b content in leaves (Chlb); (**f**) chlorophyll a to chlorophyll b ratio in leaves (Chla\_b); (**g**) K<sup>+</sup> concentration in leaves (ConcK); (**h**) Na<sup>+</sup> concentration in leaves (ConcNa); (**i**) ratio of Na<sup>+</sup>/K<sup>+</sup> in leaves (Na\_K).

The correlations among the traits showed the same tendency within the full panel and the two subpanels (Figure S1). However, the correlation coefficients varied widely between the traits (Table 2). For instance, Score, ConcNa and Na\_K were strongly negatively correlated with WC. ConcNa and Na\_K were also highly correlated with Score. In contrast, ConcK showed weak correlations with the other traits, except for a moderate correlation with WC. Overall, higher correlations were observed among WC, Score, ConcNa and Na\_K.


**Table 2.** Correlation coefficients between traits in the three panels (below the diagonal). Probabilities above the diagonal.

F: full panel; I: indica subpanel; J: japonica subpanel; WC: leaf water content; Score: score of visual salt injury; Chl\_total: total chlorophyll content in leaves; Chla: chlorophyll a content in leaves; Chlb: chlorophyll b content in leaves; Chla\_b: chlorophyll a to chlorophyll b ratio in leaves; ConcK: K<sup>+</sup> concentration in leaves; ConcNa: Na<sup>+</sup> concentration in leaves; Na\_K: ratio of Na<sup>+</sup>/K<sup>+</sup> in leaves.
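The coefficients summarized in Table 2 are pairwise correlations between traits. As a minimal sketch, assuming Pearson's coefficient (the source does not name the method in this section) and using hypothetical values, a strongly negative relationship such as that between WC and Score would be computed like this:

```python
import math


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical accession means: injury score rises as water content falls.
score = [1.0, 2.0, 3.0, 4.0]
water_content = [8.0, 6.0, 4.0, 2.0]
r_example = pearson_r(score, water_content)
```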

#### *2.2. SNP-Trait Associations*

GWAS analyses were conducted for the full panel and for the indica and japonica subpanels separately. The GWAS results are presented in the Q-Q and Manhattan plots in Figure 2 and Figure S2. Using a *p*-value threshold of 1 × 10<sup>−4</sup>, we identified 64 associations between 58 SNPs and the studied traits, but no associations were detected in the japonica subpanel. These 58 significant SNPs were distributed in 26 QTL regions. Within the detected QTL regions, the number of significant SNPs increased to 119 when the threshold value was set at 1 × 10<sup>−3</sup> (Table S1). Among these SNPs, 110 were found in the full panel, 44 were identified in the indica subpanel, and 35 were common to the full panel and the indica subpanel.
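The thresholding step and the SNP counts in this paragraph can be illustrated with a short sketch. The SNP names and *p*-values are made up, the strict inequality is an assumption, and the helper names are hypothetical; the only figures taken from the text are the two thresholds and the counts 110, 44 and 35.

```python
import math


def significant_snps(p_values: dict, threshold: float = 1e-4) -> set:
    """Return the SNPs whose p-value falls below the threshold."""
    return {snp for snp, p in p_values.items() if p < threshold}


def union_size(n_first: int, n_second: int, n_common: int) -> int:
    """Size of the union of two sets by inclusion-exclusion."""
    return n_first + n_second - n_common


# Hypothetical association results for three SNPs.
example_p = {"snp_a": 3.2e-5, "snp_b": 4.0e-4, "snp_c": 7.5e-3}
strict_hits = significant_snps(example_p, 1e-4)
relaxed_hits = significant_snps(example_p, 1e-3)

# Height of the 1e-4 threshold on a -log10(p) Manhattan plot axis.
threshold_height = -math.log10(1e-4)

# Counts from the text: 110 in the full panel, 44 in indica, 35 shared.
total_relaxed_snps = union_size(110, 44, 35)
```

The inclusion-exclusion count reproduces the 119 significant SNPs reported at the relaxed threshold.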

A total of 16 QTLs were associated with Chla\_b, 6 with WC, 6 with Score, 4 with ConcNa, 3 with Chl\_total, 3 with Chlb, 2 with Chla, 3 with Na\_K, and 1 with ConcK (Table 3). Ten of the 26 identified QTLs were associated with multiple traits, including QTL\_25 on chromosome 11 associated with five traits (i.e., WC, Score, ConcK, ConcNa, and Na\_K); QTL\_21 on chromosome 9 associated with 4 traits (i.e., WC, Chla\_b, ConcNa, and Na\_K); three QTLs (QTL\_9, QTL\_20, and QTL\_23) associated with three traits; and five other QTLs (i.e., QTL\_13, QTL\_16, QTL\_17, QTL\_19, and QTL\_24) associated with two traits. Most of the QTLs associated with a single trait were detected for chlorophyll traits, except for QTL\_1, which was related to Score. The number of significant SNPs within each QTL varied from 1 to 33: QTL\_25 was defined by 33 SNPs, QTL\_21 by 14 SNPs, QTL\_1 and QTL\_4 by 8 SNPs, and QTL\_16 and QTL\_19 by 7 SNPs (Table 3).
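The counts in this paragraph can be cross-checked by tallying QTL-trait pairs in two ways. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the resulting total is a derived figure, distinct from the 64 SNP-trait associations reported earlier.

```python
# Number of QTLs associated with each trait, as listed in the text.
qtls_per_trait = {
    "Chla_b": 16, "WC": 6, "Score": 6, "ConcNa": 4, "Chl_total": 3,
    "Chlb": 3, "Chla": 2, "Na_K": 3, "ConcK": 1,
}

# Number of traits for each of the ten multi-trait QTLs, as listed in the text.
traits_per_multi_qtl = {
    "QTL_25": 5, "QTL_21": 4,
    "QTL_9": 3, "QTL_20": 3, "QTL_23": 3,
    "QTL_13": 2, "QTL_16": 2, "QTL_17": 2, "QTL_19": 2, "QTL_24": 2,
}

total_qtls = 26
single_trait_qtls = total_qtls - len(traits_per_multi_qtl)

# Each single-trait QTL contributes exactly one QTL-trait pair.
pairs_by_trait = sum(qtls_per_trait.values())
pairs_by_qtl = sum(traits_per_multi_qtl.values()) + single_trait_qtls
```

Both tallies give the same number of QTL-trait pairs, so the per-trait and per-QTL descriptions are mutually consistent.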

**Figure 2.** Manhattan plots and Q-Q plots for GWAS of salinity tolerance-related traits in the full panel. (**a**) Leaf water content (WC); (**b**) score of visual salt injury (Score); (**c**) total chlorophyll content in leaves (Chl\_total); (**d**) chlorophyll a content in leaves (Chla); (**e**) chlorophyll b content in leaves (Chlb); (**f**) chlorophyll a to chlorophyll b ratio in leaves (Chla\_b); (**g**) K<sup>+</sup> concentration in leaves (ConcK); (**h**) Na<sup>+</sup> concentration in leaves (ConcNa); (**i**) ratio of Na<sup>+</sup>/K<sup>+</sup> in leaves (Na\_K). In the Manhattan plots, significant SNPs are highlighted in red.

Therefore, among the 26 detected QTLs, QTL\_25 was considered the major QTL because it was mapped by the highest number (33) of significant SNPs and was associated with the greatest number (5) of traits in both the full panel and the indica subpanel (Figure 3). Next was QTL\_21, which was common to 4 traits and supported by 14 significant SNPs.

**Figure 3.** QTL\_25. (**a**) Manhattan plot for K<sup>+</sup>, Na<sup>+</sup> and water content in leaves in the full panel; (**b**) linkage disequilibrium (LD) heatmap. In the Manhattan plots, significant SNPs are highlighted in red, and genes of interest are mentioned. The genomic region of QTL\_25 is specified in the boundary area in the LD heatmap.

#### *2.3. Colocalizing QTLs and Candidate Genes Underlying the Detected QTLs Involved in Salinity Tolerance*

The positions of the QTLs identified in this study were compared with those of QTLs detected in mapping populations and in other GWASs related to salinity tolerance. Most of our QTLs colocalized with already known QTLs, except for QTL\_6, QTL\_17 and QTL\_22 (Table S2). We found a total of 100 colocalizations, of which 17 were detected by GWAS [22–26], and 83 other colocalizations were mapped in biparental populations [13,27–45]. In particular, 8 colocalizations shared similar traits (leaf chlorophyll content, K<sup>+</sup> concentration, Na<sup>+</sup> concentration, leaf water content). In addition, colocalizations with QTLs identified in previous studies for other traits using the same Vietnamese rice panel and genotyping data were observed (Table S3). For the latter, forty-nine overlapping associations were found that underlie QTL\_3, QTL\_6, QTL\_8, QTL\_17, QTL\_20, and QTL\_26.

In the region of almost all QTLs identified, a number of candidate genes related to the response of plants to salt or abiotic stress were found, with the exception of QTL\_7, QTL\_11, QTL\_14, QTL\_15, QTL\_17, QTL\_18 and QTL\_22 (Table 3). These candidate genes encode different kinds of proteins, including transcription factors, receptor-like kinases (RLKs), a mitogen-activated protein kinase (MAPK), enzymes and transporters.


*Plants* **2021**, *10*, 1088

**Table 3.** List of candidate genes located within the identified QTLs.



#### **3. Discussion**

Rice is considered to be very sensitive to salinity [98,99]. Here, to determine the response of the Vietnamese rice landrace panel to salinity, a moderate salinity stress (100 mM NaCl) was applied at the seedling stage. We assessed a total of 9 phenotypic traits, all of which showed high variability within the panel in response to salinity stress. Of these 9 traits, WC, Score and three ion content traits (ConcNa, ConcK and Na\_K) exhibited high heritability (0.60–0.76). Additionally, strong correlations (0.59–0.97) were observed among these traits with the exception of ConcK, indicating that WC, Score, ConcNa and Na\_K were strongly associated with the response of rice plants to salinity stress, which is consistent with previous studies on rice salinity tolerance evaluation [100–102]. WC is a physiological parameter of the plant water status that expresses the response of plants to osmotic stress [103,104], ionic content traits reflect the level of ionic stress (ion homeostasis) [105], and the salt injury score is an indicator of plant damage/survival (growth performance) under salinity stress [100]. Previous studies reported that rice accessions tolerant to salinity stress are able to reduce osmotic stress, prevent excess accumulation of Na<sup>+</sup> and take up more K<sup>+</sup> to maintain a low shoot Na<sup>+</sup>/K<sup>+</sup> ratio [106–109].

Correlations among the traits varied in the same direction in the full panel and the two subpanels. However, the japonica subpanel had, on average, greater WC and lower Score than the indica subpanel, indicating that japonica accessions are more salt-tolerant than indica accessions. This finding contradicts the results reported in a previous study [101] that used 4 japonica varieties and 6 indica varieties. This contradiction can be explained by the difference in the number of rice accessions included in the screening. In our study, 112 indica and 64 japonica accessions were evaluated.

In this study, GWAS analyses were applied to the full panel and to the two subpanels. In total, we identified 119 significant SNPs assigned to 26 QTLs. Twenty-two QTLs were detected in the full panel and 15 QTLs were detected in the indica subpanel, but no japonica-specific QTLs were found, although japonica accessions seem to have, on average, a higher salinity tolerance than indica accessions. Similarly, in other studies using the same rice panel and genotyping data screened for water deficit tolerance and leaf traits, no japonica-specific QTLs were detected [17,18], likely because japonica accessions represent only one-third of the total accessions of the panel. In this study, of the 26 identified QTLs, 11 QTLs were detected in both the full panel and the indica subpanel, and 10 QTLs were associated with two or more traits (Table 3). Interestingly, all the QTLs that were detected for WC colocalized with QTLs associated with Score and/or ion content traits, except for QTL\_13, which was found for WC and Chla\_b. These results suggest that WC, Score and ion content traits have a shared genetic basis related to salinity stress responses. However, there was no overlap between QTLs detected for chlorophyll-related traits and ion content traits.

The QTLs discovered in this study were located on most chromosomes, apart from chromosome 6. We found 3 QTLs (QTL\_1, QTL\_2 and QTL\_3) on chromosome 1, but none of them colocalized with *Saltol*, a well-known major QTL for rice salinity tolerance at the seedling stage [110,111]. A large number of QTLs for salinity tolerance detected in this study colocalized with QTLs detected in other studies and populations under conditions of salt stress at vegetative or reproductive stages, which validates our approach (Tables S2 and S3). Interestingly, of the 26 QTLs identified in the present study, 3 QTLs (QTL\_6, QTL\_17 and QTL\_22) did not colocalize with previously reported QTLs and thus constitute novel QTLs. They may be of high interest as new sources of salinity tolerance for breeding programs.

The major QTL identified in our study, QTL\_25 at 18,273.1–18,684.5 kb on chromosome 11, was associated with WC, Score, ConcK, ConcNa and Na\_K (Table 3, Figure 3). In particular, QTL\_25 was mapped by 33 significant SNPs, each of which contributed 5.75–14.09% to the phenotypic variation (Table S1). QTL\_25 colocalized with QTLs previously identified under salinity stress in different mapping populations (Table S2), i.e., with 2 QTLs associated with leaf water content [42], with QTL qSHL11.1 for shoot length and QTL qRTL11.1 for root length [39], and with 4 GWAS-derived QTLs for the number of unfilled grains per plant [25]. This suggests that QTL\_25 has a pleiotropic effect on plant growth and reproduction under salinity stress and likely acts synergistically with other major salinity tolerance QTLs, such as *Saltol*, in enhancing salinity tolerance in rice.

Compared with previous GWASs [14,17,18,20,112] that used the same rice panel and genotyping data, 28 associations belonging to 6 QTLs identified in this study colocalized with 23 associations reported in [14,17,20,112], but there was no colocalization between QTLs for salinity tolerance-related traits and QTLs for leaf mass traits [18] (Table S3). Remarkably, 9 associations for various drought tolerance-related traits, including relative water content after 2 and 3 weeks of drought stress, slope of relative water content after 2 weeks of drought stress, drought sensitivity score after 2, 3 and 4 weeks of drought stress, and recovery ability, belonging to QTL q9 of [17], colocalized with all associations in QTL\_17 for Chla\_b and Score, suggesting that this genomic region contains important genetic determinants for rice adaptation to osmotic stresses.

Underlying 19 out of the 26 QTLs detected in this study, a high number of genes were annotated or functionally associated with salinity tolerance (Table 3).

Most candidate genes encode transcription factors reported to be involved in the rice response to salinity or abiotic stresses. Found in QTL\_1 for Score, the *OsERF922* gene (*ETHYLENE RESPONSE FACTOR 922*, *Os01g54890*) negatively regulates tolerance to salinity stress through an ABA signaling pathway, since transgenic rice plants overexpressing *OsERF922* exhibited reduced salinity tolerance with an increased shoot Na<sup>+</sup>/K<sup>+</sup> ratio and ABA level, and knockdown of *OsERF922* expression reduced ABA accumulation [46]. Additionally, the expression of *OsERF#103* (*ETHYLENE RESPONSE FACTOR 103*, *Os02g52670*) (in QTL\_5), which, like *OsERF922*, is a member of the ERF gene family, was reported to be upregulated under drought and salinity stress conditions at the seedling stage [65].

Furthermore, we found two potential genes encoding bZIP transcription factors. In plants, bZIP genes are involved in the response to abiotic stress [66,113]. One of these two genes is *OsbZIP23* (*b-ZIP TRANSCRIPTION FACTOR 23*, *Os02g52780*) (in QTL\_5), which was functionally characterized as an ABA-dependent enhancer of drought and salinity tolerance [66,67]. *OsbZIP23* overexpression significantly enhances tolerance to drought and, in particular, to high salinity stress compared with the wild type [66,67], whereas the *OsbZIP23* mutant displays significantly reduced tolerance to drought and salinity stress [67]. In addition, the SUMO protease *OsOTS1* (*OVERLY TOLERANT TO SALT 1*), a gene involved in tolerance to high salinity [114], was reported to directly target OsbZIP23, which results in activation of OsbZIP23 and stimulation of OsbZIP23-dependent gene expression, thereby promoting tolerance to drought stress [115]. Similar to *OsbZIP23*, *OsbZIP33* (*b-ZIP TRANSCRIPTION FACTOR 33*, *Os03g58250*), located in QTL\_10, also acts as an ABA-dependent enhancer of drought and salinity tolerance. *OsbZIP33* is highly upregulated under drought and high salinity stress conditions, and *OsbZIP33*-overexpressing transgenic plants exhibited significantly increased drought tolerance [78].

Three candidate genes belonging to the zinc-finger transcription factors were identified: *OsSAP3* (*STRESS-ASSOCIATED PROTEIN 3*, *Os01g56040*) in QTL\_2, *OsPHD7* (*PHD FINGER PROTEIN 7*, *Os01g66420*) in QTL\_3, and *OsCga1* (*CYTOKININ GATA TRANSCRIPTION FACTOR 1*, *Os02g12790*) in QTL\_4. *OsSAP3* and *OsPHD7* are related to abiotic stress responses. In particular, the expression of *OsSAP3* is induced in response to drought and salinity stress [50], and *OsPHD7* is upregulated under drought stress [57]; moreover, *OsCga1* is associated with the development of chloroplasts [59] and stay-green [60]. Stay-green refers to the ability to maintain green leaves and photosynthetic capacity and is thus related to plant adaptation to osmotic stress [116]. Overexpression of *OsCga1* delays leaf senescence [59].

Underlying QTL\_2, *OsRDCP3* (*RING DOMAIN-CONTAINING PROTEIN 3*, *Os01g56070*) was predicted to be involved in drought stress tolerance [51], and *OsABCI6* (*ABC TRANSPORTER I FAMILY MEMBER 6*, *Os01g56400*) was proposed to be involved in the response to abiotic stress [52,53]. Similarly, the expression of *OsTET2* (*TETRASPANIN 2*, *Os02g12750*), which encodes an integral membrane protein and is found in QTL\_4, was increased in drought-stressed seedlings; in addition, this gene was highly upregulated under heat and salinity stress [58].

Two other candidate transcription factor genes were found in QTL\_26 on chromosome 12: *OsHox33* (*HOMEOBOX GENE 33*, *Os12g41860*) and *OsARF25* (*AUXIN RESPONSE FACTOR 25*, *Os12g41950*). *OsHox33*, encoding an HDZIP transcription factor, is involved in leaf senescence because its knockdown accelerates leaf senescence [92], and it is a target of a salinity stress-responsive miRNA [75]. *OsARF25* is also a salinity tolerance-related candidate gene discovered by GWAS, as reported by [93].

Another transcription factor gene identified, *OsAS2* (*ASYMMETRIC LEAVES 2*, *Os01g66590*) in QTL\_3 [55], was associated with the development of plants. *LhCa5* (*PHOTOSYSTEM I LIGHT HARVESTING COMPLEX GENE 5*, *Os02g52650*) in QTL\_5 was predicted to function in the photosystem [64].

Within the region of QTL\_25, the strongest QTL found in this study, we detected a consecutive set of four *BRASSINOSTEROID INSENSITIVE 1-associated receptor kinase 1* (*BAK1*) genes: *Os11g31530* (*OsBDG1*), *Os11g31540* (*OsLRR2*), *Os11g31550*, and *Os11g31560* (Figure 3b). *BAK1*, encoding a leucine-rich repeat type II receptor-like kinase, functions as a coreceptor of BRI1 in brassinosteroid signaling in plants [117]. Perception of brassinosteroids through the BRI1-BAK1 complex can influence the growth and development of rice plants [118], e.g., by regulating the leaf angle and grain size [119] and ABA-induced stomatal closure, which is critical for the survival of plants under water stress [120]. Among these four *BAK1* genes, *OsBDG1* and *OsLRR2* are considered to be involved in salt and/or abiotic stress responses [90,91]. Under salinity stress conditions, *OsBDG1* is significantly upregulated in roots of the salt-sensitive rice cultivar IR29, whereas *OsLRR2* is upregulated in roots of the salt-tolerant rice cultivar FL478 [90]. Additionally, the expression of *OsLRR2* is highly induced in leaves after cold and drought treatment; thus, *OsLRR2* is a putative candidate gene involved in tolerance to abiotic stress [91]. Interestingly, two significant SNPs identified in this study, Sj11\_18426630R and Dj11\_18426457R, were located in the sequence of *OsBDG1* (Figure 3). Dj11\_18426457R is intronic, while Sj11\_18426630R is positioned within the coding sequence (exon 5) and changes the amino acid sequence in the LRR domain of the OsBDG1 protein. These observations open the way to a functional characterization of these *BAK1* candidate genes.

Three other genes encoding receptor-like kinases (RLKs) associated with enhanced abiotic stress tolerance are *Os08g28710* (*OsRLCK253*) in QTL\_19 and *Os12g42060* (*OsWAK128*) and *Os12g42070* (*OsRLCK375*, *OsWAK129*) in QTL\_26. Functionally, *OsRLCK253* confers tolerance to salt and water deficits in transgenic *Arabidopsis thaliana* plants during different growth stages, resulting in yield protection against stress [86]. *OsWAK128* and *Os12g42070* were candidate genes near a GWAS-derived QTL related to salinity tolerance at the seedling stage [93]. In addition, a mitogen-activated protein kinase (MAPK) encoded by the *OsRLCK84* gene (*Os02g53030*) in QTL\_5 was activated in response to salinity stress [68].

#### **4. Materials and Methods**

#### *4.1. Plant Materials and Genotyping*

This study included 179 Vietnamese rice landraces and 3 control genotypes (Nipponbare, Azucena and IR64). The Vietnamese rice accessions came from diverse locations throughout Vietnam and were originally provided by the Plant Resource Center (21°00′05″ N and 105°43′33″ E). All 182 accessions were genotyped with 21,623 single nucleotide polymorphism (SNP) markers (minor allele frequency above 5%) obtained by genotyping-by-sequencing [14]. IR29 was used as a susceptibility check for phenotyping experiments. The names of the accessions, provinces of origin and ecosystem are described in Table S4. More detailed information on this panel can be found in [14].

#### *4.2. Phenotyping Experiment*

#### 4.2.1. Salt Treatment

The experiment was conducted from August 26, 2019, to September 24, 2019, at the Agriculture Genetics Institute, Hanoi, Vietnam (21°02′55″ N and 105°46′58″ E). The accessions were grown in hydroponics following the IRRI standard protocol with three replicates [100]. Within each replicate, the accessions were randomly distributed in 5.2 individual plastic trays (36 × 31 × 15 cm), each fitted with a styrofoam float with 35 slots (2 mm diameter) and filled with Peters solution composed of 1 g/L Peters water-soluble fertilizer (20-20-20 NPK) and 200 mg/L ferrous sulfate [21]. A total of 16 plastic trays were used.

The experiment was set up under greenhouse conditions. After breaking dormancy at 50 °C for five days, seeds were soaked in water for 2–3 days. When germination began, seeds were incubated in a culture room (28 °C, photoperiod 12 h light/12 h dark) for 2 days. Once the primary root had emerged well at a length of 2–3 cm, seedlings were cultured in styrofoam floats with a nylon net bottom according to the experimental design. Four seedlings were cultured per slot. Three days after seeding, seedlings were thinned to keep 3 well-developed plants per slot. The pH (5.2) and the level of nutrient solution were adjusted daily. The Peters solution was replaced weekly until the end of the experiment. Salinity stress was applied when plants reached the fourth leaf stage. NaCl was added gradually to the hydroponic medium to avoid osmotic shock: it was supplied in two steps of 50 mM NaCl, two days apart, to obtain a final concentration of 100 mM NaCl. The experiment was stopped once all the plants exhibited drying in most leaves (average evaluation score of 7).

#### 4.2.2. Scoring and Sampling

For each plant, salinity tolerance score was evaluated based on leaf injury symptoms using the modified standard evaluation score (SES) for rice [21], as follows: score 1—normal growth, no leaf symptoms; score 3—near normal growth, but leaf tips or few leaves whitish and rolled; score 5—growth severely retarded, most leaves rolled, only a few elongating; score 7—complete cessation of growth, most leaves dry, some plants dying; score 9—almost all plants dead or dying.

After scoring, the second fully expanded leaves of the three plants in each slot were harvested. A 1.5 cm fragment was quickly cut from the leaf base, and the material from each slot was packed separately in aluminum foil, without folding the leaves, and placed on ice for chlorophyll determination. The rest of the cut leaves were immediately put into a small zip plastic bag of known weight for measuring the water content.

#### 4.2.3. Chlorophyll Determination

The chlorophyll content was estimated as described in the protocol of [121] with some modifications. The harvested samples were weighed, put into 2-mL Eppendorf tubes, and ground in liquid nitrogen. The pellet was resuspended in 1.5 mL of 85% acetone solution and centrifuged at 12,000× *g* at 4 °C for 15 min. One milliliter of the supernatant was collected, and the absorbance was measured at wavelengths of 645 and 663 nm using a 7305 UV/visible spectrophotometer (Jenway, Staffordshire, UK). The chlorophyll content was calculated as follows: total chlorophyll (Chl\_total, μg/mL) = 20.2 × A<sub>645</sub> + 8.02 × A<sub>663</sub>; chlorophyll a (Chla, μg/mL) = 12.7 × A<sub>663</sub> − 2.69 × A<sub>645</sub>; chlorophyll b (Chlb, μg/mL) = 22.9 × A<sub>645</sub> − 4.68 × A<sub>663</sub>. The values were then converted to the amount of chlorophyll per milligram of fresh tissue (μg/mg). The ratio of chlorophyll a to chlorophyll b (Chla\_b) was also determined.
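The absorbance-to-chlorophyll conversion described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' script: the function names are hypothetical, and the per-milligram conversion assumes that the extract concentration is scaled by the 1.5 mL acetone volume and divided by the sample fresh weight in mg.

```python
def chlorophyll_ug_per_ml(a645: float, a663: float) -> dict:
    """Chlorophyll concentrations in the extract (ug/mL) from absorbances
    at 645 and 663 nm, using the coefficients given in the text."""
    return {
        "Chl_total": 20.2 * a645 + 8.02 * a663,
        "Chla": 12.7 * a663 - 2.69 * a645,
        "Chlb": 22.9 * a645 - 4.68 * a663,
    }


def per_mg_fresh_tissue(conc_ug_per_ml: float,
                        fresh_weight_mg: float,
                        extract_volume_ml: float = 1.5) -> float:
    """Convert an extract concentration (ug/mL) to ug per mg fresh tissue.

    Assumption: the whole sample was extracted in `extract_volume_ml`
    (1.5 mL of 85% acetone in the text).
    """
    if fresh_weight_mg <= 0:
        raise ValueError("fresh weight must be positive")
    return conc_ug_per_ml * extract_volume_ml / fresh_weight_mg


def chla_to_chlb_ratio(chla: float, chlb: float) -> float:
    """Chla_b trait: ratio of chlorophyll a to chlorophyll b."""
    return chla / chlb
```

For example, `chlorophyll_ug_per_ml(0.2, 0.5)` uses hypothetical absorbances; the resulting values would then be passed to `per_mg_fresh_tissue` together with the weighed sample mass.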

#### 4.2.4. Water Content Measurement

The bags containing samples were weighed to determine the sample fresh weight (FW). After being dried for 3 days at 70 °C in an oven, the sample dry weight (DW) was measured. The leaf water content of each sample was calculated using the formula: WC (%) = (FW − DW) × 100/FW.
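The water content formula can be written as a short Python sketch. The helper that subtracts the bag weight is an assumption about how FW was obtained from the weighed bags (the text only states that the bags had a known weight); all names are hypothetical.

```python
def sample_weight(gross_weight: float, bag_weight: float) -> float:
    """Sample weight after subtracting the known weight of the empty bag
    (assumed tare step; both values in the same unit)."""
    return gross_weight - bag_weight


def water_content_percent(fw: float, dw: float) -> float:
    """Leaf water content, WC (%) = (FW - DW) * 100 / FW."""
    if fw <= 0:
        raise ValueError("fresh weight must be positive")
    if dw < 0 or dw > fw:
        raise ValueError("dry weight must lie between 0 and fresh weight")
    return (fw - dw) * 100.0 / fw
```

With a hypothetical sample of 200 mg fresh weight and 50 mg dry weight, `water_content_percent(200.0, 50.0)` gives 75.0.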

#### 4.2.5. Ion Content Measurement

The above dried samples with known weight (DW, mg) were used for measurement of Na<sup>+</sup> and K<sup>+</sup> ion content. The samples were put into 15-mL Falcon tubes, and 10 mL of 0.1 N hydrochloric acid solution was added. After sample ion solubilization at room temperature overnight, 2 mL of sample solution at 200-fold dilution (10 μL of first sample solution + 2 mL of 0.1 N hydrochloric acid solution) was used to measure Na<sup>+</sup> and K<sup>+</sup> concentrations (mg/L) with a SpectrAA 220FS atomic absorption spectrometer (Varian, US). The Na<sup>+</sup> and K<sup>+</sup> contents (ConcNa and ConcK) were then converted back to the quantity of Na<sup>+</sup> and K<sup>+</sup> ions per gram of dry weight (mg/gDW) by the following equations: ConcNa = [Na<sup>+</sup> measurement (mg/L) × dilution rate (200) × volume of first sample solution (10 mL)]/DW (mg); ConcK = [K<sup>+</sup> measurement (mg/L) × dilution rate (200) × volume of first sample solution (10 mL)]/DW (mg). The Na<sup>+</sup>/K<sup>+</sup> ratio (Na\_K) was calculated as the proportion of Na<sup>+</sup> content to K<sup>+</sup> content.
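As a worked illustration of the two conversion equations, the Python sketch below back-calculates ion content per gram of dry weight. The function and parameter names are hypothetical; the default dilution rate (200) and first-solution volume (10 mL) are taken from the text. The units work out because mg/L × mL divided by mg of dry weight equals mg per g of dry weight.

```python
def ion_content_mg_per_g_dw(measured_mg_per_l: float,
                            dw_mg: float,
                            dilution_rate: float = 200.0,
                            first_volume_ml: float = 10.0) -> float:
    """Ion content (mg per g dry weight) from the spectrometer reading.

    measured_mg_per_l : concentration read in the diluted solution (mg/L)
    dw_mg             : dry weight of the sample (mg)
    """
    if dw_mg <= 0:
        raise ValueError("dry weight must be positive")
    return measured_mg_per_l * dilution_rate * first_volume_ml / dw_mg


def na_k_ratio(conc_na: float, conc_k: float) -> float:
    """Na_K trait: ratio of Na+ content to K+ content."""
    if conc_k <= 0:
        raise ValueError("K+ content must be positive")
    return conc_na / conc_k
```

For a hypothetical reading of 0.5 mg/L Na<sup>+</sup> and 0.25 mg/L K<sup>+</sup> on a 50 mg sample, the functions return 20 and 10 mg/gDW, and a Na\_K ratio of 2.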

#### *4.3. Statistical Analysis of Phenotypic Data*

Statistical analysis of phenotypic data (means, standard deviations, coefficients of variation (CVs), graphs) was carried out in R v3.6.2. Analysis of variance (ANOVA) was performed to test the effect of genotype and replication using a linear model fitted with the R function lm(). Broad-sense heritability (H<sup>2</sup>) was estimated from the variance among phenotypic measurements across the three replicates of the panel, using the following formula: H<sup>2</sup> = (*F*-value − 1)/*F*-value, where the *F*-value was derived from the ANOVA for the genotype effect [18]. Phenotypic correlations between traits were evaluated by the Pearson method using the corrplot R package. The R function cor.test() was used to test the significance of the correlation coefficients.
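The heritability formula can be illustrated with a small Python sketch. It is not the authors' R code: it assumes a balanced layout in which every genotype is measured once in every replicate and an additive model (genotype + replicate), and all function names are hypothetical.

```python
def genotype_f_value(data: list) -> float:
    """F-value for the genotype effect in a balanced genotype x replicate
    layout, where data[g][r] is the phenotype of genotype g in replicate r.
    Assumes an additive model: trait ~ genotype + replicate."""
    g, r = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (g * r)
    geno_means = [sum(row) / r for row in data]
    rep_means = [sum(data[i][j] for i in range(g)) / g for j in range(r)]
    ss_geno = r * sum((m - grand) ** 2 for m in geno_means)
    ss_rep = g * sum((m - grand) ** 2 for m in rep_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_geno - ss_rep
    ms_geno = ss_geno / (g - 1)
    ms_error = ss_error / ((g - 1) * (r - 1))
    return ms_geno / ms_error


def broad_sense_h2(f_value: float) -> float:
    """H2 = (F - 1) / F, as given in the text."""
    if f_value <= 0:
        raise ValueError("F-value must be positive")
    return (f_value - 1.0) / f_value
```

Note that an *F*-value below 1 yields a negative estimate; how such cases were handled is not stated in the text.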

#### *4.4. Genome-Wide Association Study*

The phenotypic data from the salt test and SNP genotypic data on the full panel and the indica and japonica subpanels were separately used to study the marker-trait associations by incorporating a kinship matrix along with population structure. In the Tassel software v.5.0, the structure matrix was determined with 6 axes on the SNP data of the population by running a principal component analysis (PCA). The kinship matrix was built by the pairwise identity-by-state method to account for relatedness among the 182 accessions. Q-Q and Manhattan plots of the negative log10-transformed observed *p*-values for each SNP-trait association were created to visualize the GWAS results. Markers with a *p*-value ≤ 5 × 10<sup>−4</sup> were declared significant.
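The significance filter and the −log10 transformation used for the Manhattan and Q-Q plots can be sketched as follows. This Python snippet only illustrates the thresholding step, not the Tassel mixed-model analysis itself; the marker names in the usage note are hypothetical, and a marker is treated as significant when its *p*-value is at or below the 5 × 10<sup>−4</sup> threshold.

```python
import math

P_THRESHOLD = 5e-4  # significance threshold quoted in the text


def neg_log10(p_value: float) -> float:
    """-log10(p), the quantity plotted on the y-axis of a Manhattan plot."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)


def significant_markers(p_values: dict, threshold: float = P_THRESHOLD) -> list:
    """Names of markers whose p-value is at or below the threshold,
    sorted from the strongest to the weakest association."""
    hits = [name for name, p in p_values.items() if p <= threshold]
    return sorted(hits, key=lambda name: p_values[name])
```

Usage with made-up values: `significant_markers({"snp_a": 1e-5, "snp_b": 5e-4, "snp_c": 0.01})` keeps `snp_a` and `snp_b` and drops `snp_c`.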

The number of QTLs from the detected associations was determined based on linkage disequilibrium (LD) between SNPs surrounding the significant markers. The LD heatmaps were plotted by using the LDheatmap R package, and the genomic regions of QTLs were limited by LD blocks with r<sup>2</sup> values (squared allele frequency correlation) between SNPs > 0.4. For a low LD block (<50 kb), the interval of QTLs was enlarged by a distance of +/− 50 kb. The qqman package in R software was utilized to highlight the significant markers of strong QTLs in Manhattan plots. The genes in the genomic regions of strong QTLs were scanned in the MSU rice database.
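The two rules used to delimit QTLs (an r<sup>2</sup> cut-off of 0.4 and a 50 kb extension for short LD blocks) can be sketched in Python. This is a simplified illustration under stated assumptions: r<sup>2</sup> is computed as the squared Pearson correlation between allele counts coded 0/1/2, which may differ from the estimator used by the LDheatmap package, and the ±50 kb rule is interpreted as extending the block by 50 kb on each side. Function names are hypothetical.

```python
def r_squared(x: list, y: list) -> float:
    """Squared Pearson correlation between two SNPs coded as allele counts."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic SNP: r2 is undefined")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def in_same_block(x: list, y: list, min_r2: float = 0.4) -> bool:
    """True when two SNPs exceed the r2 cut-off used to delimit LD blocks."""
    return r_squared(x, y) > min_r2


def qtl_interval(block_start_bp: int, block_end_bp: int,
                 min_block_bp: int = 50_000, flank_bp: int = 50_000) -> tuple:
    """QTL interval: the LD block itself, or the block extended by
    `flank_bp` on each side when the block is shorter than `min_block_bp`."""
    if block_end_bp - block_start_bp < min_block_bp:
        return (max(0, block_start_bp - flank_bp), block_end_bp + flank_bp)
    return (block_start_bp, block_end_bp)
```

Under this interpretation, a block as long as the QTL\_25 region (18,273.1–18,684.5 kb) is left unchanged, whereas a hypothetical 20 kb block would be widened to 120 kb.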

#### **5. Conclusions and Future Prospects**

Our approach identified different QTLs characterized by the presence of a high number of genes associated with the response to salinity or abiotic stress. Interestingly, these genes are related to hormone transduction pathways or transcriptional modulation of gene expression in response to stress, suggesting that these QTLs act in complementary ways to control salinity tolerance, which is of major interest for breeding programs. Pyramiding several favorable QTLs in a variety should ensure better resilience of the plant to salinity stress under different environmental conditions and thus better sustainability of the variety. Therefore, it will be interesting to introgress the major QTLs identified in this study, such as QTL\_25, into modern varieties cultivated in the Mekong or Red River Delta areas, such as Bac Thom 7 and Khang Dan 18. The function of the four *BAK1* genes in QTL\_25 should be specified by generating single and multiple gene mutations using the CRISPR/Cas9 system.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/plants10061088/s1, Figure S1: Correlation plots in the full panel and the indica and japonica subpanels, Figure S2: Manhattan plots and Q-Q plots for GWAS of salinity tolerance-related traits in the indica panel. In the Manhattan plots, significant SNPs are highlighted in red, Table S1: GWAS associations and significant SNPs at *p* ≤ 1 × 10<sup>−3</sup> in the full panel and the indica subpanel, Table S2: Colocalizations of the QTLs identified in this study with previous reports, Table S3: Colocalizations of the QTLs detected in this and previous studies using the same rice panel and genotyping data, Table S4: List of the 183 rice accessions used in the experiment.

**Author Contributions:** Conceptualization, P.G.; methodology, formal analysis, data curation, G.T.H., A.-A.V. and P.G.; investigation, T.D.L., H.T.V., K.L.N., F.G., L.H.T., H.T.T.V., F.L., X.H.P., A.-A.V., G.T.H.; writing—original draft preparation, G.T.H. and P.G.; writing—review and editing, T.D.L., H.T.V., K.L.N., H.T.T.V., T.X.D., X.H.P., A.-A.V.; visualization, supervision, project administration, G.T.H.; funding acquisition, G.T.H. and P.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Ministry of Science and Technology of Vietnam and the French embassy in Vietnam in the frame of project "Application of functional genomics and association genetics to characterize genes involved in abiotic stresses tolerance in rice" (code: NDT.56.FRA/19). This research was also supported by the Global Rice Science Partnership (2011–2016) and by the CGIAR Research Program (CRP) on rice agri-food systems (RICE, 2017–2022).

**Data Availability Statement:** The GBS genotyping dataset supporting the results of this study has been deposited as a downloadable Excel file in TropGeneDB: http://tropgenedb.cirad.fr/tropgene/JSP/interface.jsp?module=RICE (accessed on 24 May 2021) tab "studies", study type "genotype", study "Vietnamese panel-GBS data". The seeds of the accessions are available in the National Key Laboratory for Plant Cell Biotechnology of Agricultural Genetics Institute, Hanoi, Vietnam.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### **Morphological Analysis, Protein Profiling and Expression Analysis of Auxin Homeostasis Genes of Roots of Two Contrasting Cultivars of Rice Provide Inputs on Mechanisms Involved in Rice Adaptation towards Salinity Stress**

**Shivani Saini 1, Navdeep Kaur 1, Deeksha Marothia 1, Baldev Singh 1, Varinder Singh 1, Pascal Gantet 2,3,\* and Pratap Kumar Pati 1,\***


**Abstract:** Plants remodel their root architecture in response to a salinity stress stimulus. This process is regulated by an array of factors including phytohormones, particularly auxin. In the present study, in order to better understand the mechanisms involved in salinity stress adaptation in rice, we compared two contrasting rice cultivars—Luna Suvarna, a salt tolerant, and IR64, a salt sensitive cultivar. Phenotypic investigations suggested that Luna Suvarna in comparison with IR64 presented stress adaptive root traits which correlated with a higher accumulation of auxin in its roots. The expression level investigation of auxin signaling pathway genes revealed an increase in several auxin homeostasis genes transcript levels in Luna Suvarna compared with IR64 under salinity stress. Furthermore, protein profiling showed 18 proteins that were differentially regulated between the roots of two cultivars, and some of them were salinity stress responsive proteins found exclusively in the proteome of Luna Suvarna roots, revealing the critical role of these proteins in imparting salinity stress tolerance. This included proteins related to the salt overly sensitive pathway, root growth, the reactive oxygen species scavenging system, and abscisic acid activation. Taken together, our results highlight that Luna Suvarna involves a combination of morphological and molecular traits of the root system that could prime the plant to better tolerate salinity stress.

**Keywords:** rice; abiotic stress; salinity; root; auxin; *YUCCA*; *PIN*; proteomics; mass spectrometry

#### **1. Introduction**

The plant root is the vital organ that serves a wide range of functions and regulates crop productivity. As roots are in direct interface with the soil, they act as the primary site for perceiving environmental stress-related signals for plants [1,2]. Among various environmental stresses, salinity has emerged as one of the most serious threats limiting global crop production and yield [3]. Currently, almost 20% of the world's total irrigated land is estimated to be affected by salinity stress and it is expected that by the end of the year 2050, more than 50% of the world's arable land will become saline [4–7]. High soil salinity induces undesirable changes at phenotypic, biochemical, physiological, cellular, genetic and molecular levels, which are detrimental to plant growth and survival [8]. The root system responds to abiotic stresses by triggering stress adaptive mechanisms, which are supposed to be regulated by a number of factors [2,9,10].

The potential of several phytohormones to ameliorate the damaging effects of salinity stress has attracted the attention of researchers in the recent past [11,12]. Among different phytohormones, auxin is an important plant hormone well-known for controlling the different aspects of plant growth and development including tropistic growth, vascular tissue differentiation, axillary bud formation, cell elongation, flower organ development and abiotic stress tolerance [13–16]. It has also been regarded as a master player in triggering salinity stress-induced changes in root system architecture [12]. Auxin regulates the root growth rates by promoting lateral root formation and mediating the size of root meristem by controlling the transition from cell division to cell differentiation processes [17,18]. The processes that determine the spatiotemporal distribution of auxin and the maintenance of auxin homeostasis required for root growth and development include local auxin biosynthesis, transport, perception, signaling, conjugation and degradation [19,20].

**Citation:** Saini, S.; Kaur, N.; Marothia, D.; Singh, B.; Singh, V.; Gantet, P.; Pati, P.K. Morphological Analysis, Protein Profiling and Expression Analysis of Auxin Homeostasis Genes of Roots of Two Contrasting Cultivars of Rice Provide Inputs on Mechanisms Involved in Rice Adaptation towards Salinity Stress. *Plants* **2021**, *10*, 1544. https://doi.org/10.3390/plants10081544

Academic Editors: Igor G. Loskutov and Masayuki Fujita

Received: 15 April 2021 Accepted: 24 July 2021 Published: 28 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Although roots are the critical site for the perception of salinity stress signals and are responsible for triggering stress-related mechanisms in plants, very little attention has been paid to analyzing this underground part of the plant in the context of understanding salinity tolerance. Physiological, biochemical and genetic studies have provided ample evidence in support of the key role of auxin in triggering abiotic stress-mediated differential modifications in the root system architecture of plants [21]. The key role of the maintenance of auxin homeostasis in regulating salinity stress tolerance is emerging in plant biology [22]. In the present study, in order to better understand the mechanisms conferring rice adaptation to salinity stress, we conducted a comparative analysis of various auxin-related genes (which regulate auxin homeostasis) in the roots of salt sensitive IR64 and salt tolerant Luna Suvarna (LS) cultivars of rice under optimal as well as salinity stress conditions. Further, the endogenous content of indole-3-acetic acid (IAA) has also been estimated in the roots of two contrasting salinity stress cultivars of rice, and an analysis of their root morphology has been performed.

For finding significant clues on the adaptive behavior of plants to salinity stresses, studies at the protein level might be a better option compared to at the transcript level since many post-transcriptional and post-translational changes often take place in plants and hence, the rate of transcription and the translation will not necessarily always correlate. Hence, proteome-based approaches involving two-dimensional gel electrophoresis (2-DE) and mass spectrometry (MS) are often utilized for unraveling proteins associated with induced changes in plants as they are very reliable, sensitive and powerful technologies [9,23]. For example, a comparative study of the leaf proteome profiles of the wild salt tolerant Poaceae species *Porteresia coarctata* with two rice cultivars variable in salt sensitivity—IR64 (salt-sensitive) and Pokkali (salt-tolerant)—suggested that, in the leaves of *Porteresia coarctata*, several proteins exhibited up-regulation that could provide it a physiological advantage under salinity stress [24]. However, there are limited reports on a comparative proteome analysis of the root of contrasting salt-responsive cultivars of rice. Therefore, herein a comparison of the root proteome of two rice cultivars differing in salt tolerance has been conducted using 2-DE and MS. Our results showed that salt tolerant rice cultivar LS has better stress adaptive root traits, elevated expression of auxin homeostasis genes and more endogenous IAA content than IR64 cultivar, which could be linked to the acquisition of natural salinity stress tolerance in LS. Further, several salinity stress responsive proteins were detected exclusively in the roots of LS, which might be providing a peculiar property for attaining salinity stress adaptation and tolerance in rice.

#### **2. Results**

#### *2.1. Analysis of Morphological Parameters and IAA Quantification in IR64 and LS*

The differences in the morphological parameters of the two cultivars were clearly observable when cultivated under normal conditions (Figure 1A,B). Shoots were approximately 40% longer and roots approximately 70% longer in the salt-tolerant cultivar LS than in the sensitive cultivar IR64 (Figure 1A–C,F). The number of roots (primary root and crown roots) was also 169% higher in LS (Figure 1G). Moreover, the fresh weight of shoots and roots was 101% and 137% higher, respectively, in LS (Figure 1D,H). Similarly, the dry weight of shoots and roots was approximately 122% and 170% higher, respectively, in LS than in IR64 (Figure 1E,I). The amount of endogenous IAA was also quantified in the roots of IR64 and LS. The endogenous IAA concentration was significantly higher in LS roots (1.086 μg/g FW) than in IR64 roots (0.6608 μg/g FW) (Figure 1J).
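The percentage differences quoted above express LS values relative to IR64. As a minimal sketch of that arithmetic, applied to the two IAA concentrations given in the text (the function name `percent_increase` is illustrative and not part of the original analysis):

```python
def percent_increase(reference: float, value: float) -> float:
    """Return how much larger `value` is than `reference`, in percent."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (value - reference) / reference * 100.0


# Endogenous IAA contents reported in the text (ug per g fresh weight)
iaa_ir64 = 0.6608
iaa_ls = 1.086

iaa_gain = percent_increase(iaa_ir64, iaa_ls)
print(f"IAA in LS roots is {iaa_gain:.1f}% higher than in IR64 roots")
```

Under this convention, the reported IAA values correspond to roughly 64% more IAA in LS roots than in IR64 roots.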

#### *2.2. Expression Analysis of Genes Involved in Auxin Homeostasis in IR64 and LS Roots*

To better understand the cause of the differences observed in the IAA content in the roots of both cultivars, the transcript levels of various genes involved in auxin homeostasis were measured by qRT-PCR under optimal and salinity stress conditions. Among various auxin biosynthesis genes, the transcript levels of *OsYUCCA5*, *OsYUCCA7*, and *OsYUCCA8* exhibited significant up-regulation of 2.79, 3.53, and 2.58 fold, respectively, in the roots of the salt-tolerant cultivar LS as compared to the salt-sensitive cultivar IR64 of rice under normal conditions (Figure 2A). In response to salinity stress, significant down-regulation of *OsYUCCA3, OsYUCCA4, OsYUCCA6, OsYUCCA7*, and *OsYUCCA9* by 0.43, 0.15, 0.18, 0.27, and 0.39 fold was observed in IR64 with respect to the control (Figure 2A). In the LS cultivar, auxin biosynthesis genes *OsYUCCA3, OsYUCCA4, OsYUCCA5, OsYUCCA6, OsYUCCA7,* and *OsYUCCA9* exhibited significant up-regulation by 2.83, 2.79, 7.88, 6.75, 4.53 and 3.07 fold, respectively, in the roots upon salinity stress with respect to the IR64 (control) (Figure 2A). On the contrary, the expression level of *OsYUCCA1* and *OsYUCCA8* exhibited significant down-regulation by 0.41 and 0.75 fold, respectively, in the LS root compared to the control (Figure 2A). The transcript levels of *OsYUCCA2* did not show any significant difference in two contrasting salinity stress responsive cultivars of rice under normal and salinity stress conditions.
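Because all expression values are given relative to untreated IR64 (set to 1), values below 1 denote down-regulation and values above 1 denote up-regulation. A log2 transformation, which is not used in the original text and is shown here only as an optional aid to reading, makes the two directions symmetric around zero. The sketch below uses two of the values reported above; it does not assess statistical significance.

```python
import math


def log2_fold_change(fold: float) -> float:
    """Convert a relative expression value (control = 1) to log2 scale."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold)


# Two of the values reported above, relative to untreated IR64 (= 1)
reported = {
    "OsYUCCA4, IR64 +NaCl": 0.15,
    "OsYUCCA5, LS +NaCl": 7.88,
}

for label, fold in reported.items():
    lfc = log2_fold_change(fold)
    trend = "up" if lfc > 0 else "down" if lfc < 0 else "unchanged"
    print(f"{label}: {fold} fold -> log2 = {lfc:.2f} ({trend})")
```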

Among different auxin efflux transporter *OsPIN* genes, *OsPIN2, OsPIN5a,* and *OsPIN5b* exhibited higher transcript level accumulation of 1.95, 2.36, and 2.46 fold, respectively, in the roots of untreated LS as compared to IR64 (Figure 2B). On the contrary, *OsPIN1b* was significantly down-regulated (0.48 fold) in the roots of LS as compared to the salt-sensitive cultivar IR64 (Figure 2B). In response to salinity stress in the IR64 cultivar, significant down-regulation of *OsPIN1a* and *OsPIN2* by 0.28 and 0.23 fold, respectively, was observed as compared to the control. The expression of auxin efflux transporters *OsPIN1a, OsPIN1b, OsPIN2, OsPIN3a,* and *OsPIN5b* was up-regulated by 4.51, 1.62, 5.46, 1.42, and 1.54 fold, respectively, in the roots of the LS cultivar under salinity stress as compared to the untreated IR64 control (Figure 2B). However, the transcript levels of *OsPIN1c* and *OsPIN1d* did not show any significant differences between the two cultivars under control or salinity stress conditions (Figure 2B). Under salinity stress, *OsPIN5a* and *OsPIN9* were down-regulated by 0.85 and 0.33 fold, respectively, in LS roots relative to the control (Figure 2B).

**Figure 1.** (**A**) Growth response of IR64 and Luna Suvarna (LS) shoot under control conditions; (**B**) Growth response of IR64 and Luna Suvarna (LS) roots under control conditions; (**C**) Comparative analysis of shoot length in IR64 and Luna Suvarna; (**D**) Comparative analysis of the fresh weight of shoot in IR64 and Luna Suvarna; (**E**) Comparative analysis of the dry weight of shoot in IR64 and Luna Suvarna; (**F**) Comparative analysis of root length in IR64 and Luna Suvarna; (**G**) Comparative analysis of the number of roots in IR64 and Luna Suvarna; (**H**) Comparative analysis of the fresh weight of roots in IR64 and Luna Suvarna; (**I**) Comparative analysis of the dry weight of roots in IR64 and Luna Suvarna; (**J**) IAA estimation in the roots of IR64 and LS. Data represent mean ± SE (n = 15) for the analysis of growth parameters while for IAA quantification mean ± SE (n = 3). Asterisks signs (\*) represent values which were significantly different among different samples (Fisher LSD, *p* ≤ 0.05). Blue color represents IR64 while red color represents LS cultivar.

**Figure 2.** Real-time gene expression analysis of auxin homeostasis genes under control and salinity stress conditions in IR64 and LS roots. (**A**) Real-time gene expression of auxin biosynthesis genes; (**B**) Real-time gene expression of auxin transport genes; (**C**) Real-time gene expression of auxin conjugation and degradation genes; (**D**) Real-time gene expression of auxin receptor genes; (**E**) Real-time gene expression of auxin signaling genes. Three biological replicates were taken and bars represent mean ± SE. Asterisks signs (\*, \*\*, \*\*\*) represent values which were significantly different among different samples (Fisher LSD, *p* ≤ 0.05). The transcript levels of LS under normal condition and, LS and IR64 upon salinity stress treatment were compared with IR64 (control), whose expression was assumed as 1. (-NaCl) refers to untreated samples, (+NaCl) refers to salinity treated samples. Blue color refers to IR64 root (-NaCl), red color refers to LS root (-NaCl), green color refers to IR64 root (+NaCl), and purple color refers to LS root (+NaCl).

The transcript levels of auxin conjugation and degradation gene *OsGH3.13* were found to be higher by 2.04 fold in the LS root under normal conditions (Figure 2C). In response to the salinity stress, no significant change was observed in the gene expression of *OsGH3.13* in IR64, whereas *OsGH3.13* displayed significant down-regulation by 0.63 fold in the LS root with respect to the control (Figure 2C). The gene expression of *OsGH3.8* did not exhibit a significant change in the roots of LS under the normal conditions with respect to IR64. *OsGH3.8* displayed 1.21 fold higher accumulations in the tolerant cultivar LS, and its expression remained unaltered in IR64 in response to salinity stress (Figure 2C). The expression of auxin receptor genes, particularly *OsTIR1, OsAFB2* and *OsABP1*, was studied in the roots of salt-sensitive cultivar IR64 and salt-tolerant cultivar LS of rice under normal and salinity stress (Figure 2D). It was observed that the expression of *OsTIR1* and *OsAFB2* displayed up-regulation by 1.96 and 1.86 fold in the roots of the LS under the normal conditions as compared to IR64. Similarly, the transcript level accumulation of *OsABP1*, the auxin receptor of the proteasome independent pathway was also found to be higher by 1.46 fold in the roots of LS than IR64 (Figure 2D). Under the effect of salinity stress, *OsTIR1* exhibited 1.46 fold up-regulation, while *OsAFB2* displayed 0.73 fold down-regulation in IR64 than in the control (Figure 2D). In LS roots, it was observed that the expression of *OsAFB2* and *OsABP1* displayed up-regulation by 1.36 and 1.22 fold, respectively, in response to salinity stress. On the contrary, *OsTIR1* showed 0.68 fold down-regulation in the roots of LS (Figure 2D). 
Interestingly, various auxin signaling genes, such as *OsARF1, OsARF2, OsARF16, OsAUX*/*IAA1* and *OsAUX*/*IAA4*, also exhibited higher gene expression of 2.49, 3.26, 1.72, 1.66, and 1.54 fold, respectively, in the LS root as compared to the IR64 under control conditions (Figure 2E). In response to salinity stress, the transcript levels of *OsARF1, OsARF2, OsARF16, OsAUX*/*IAA1*, and *OsAUX*/*IAA4* displayed down-regulation by 0.91, 0.62, 0.09, 0.6, and 0.67 fold, respectively, in IR64 with respect to the control. On the contrary, *OsARF2, OsARF16, OsAUX*/*IAA1* and *OsAUX*/*IAA4* exhibited up-regulation by 1.59, 1.96, 1.52, and 1.48 fold, respectively, in the LS root under salinity stress stimuli (Figure 2E). The expression level of *OsARF1* did not show any significant differences in the two contrasting salt tolerant cultivars of rice under salinity stress (Figure 2E).

#### *2.3. 2-DE Analysis of Root Proteins in IR64 and LS*

In complement to the auxin-related gene expression analysis, proteins isolated from the roots of 14-day-old seedlings of the salt-sensitive IR64 and salt-tolerant LS cultivars of rice were subjected to 2-DE analysis. A total of 146 spots were detected in IR64 and 166 spots in LS (Figure 3) using PDQuest 8.0.2 software. Among these protein spots, 27 differed between the roots of IR64 and LS: they were either present in one cultivar and absent in the other or, in a few cases, of altered intensity. These spots were excised and sent for MALDI TOF/TOF MS/MS analysis. Out of the 27 spots, only 18 proteins were successfully identified (Tables 1 and 2). The plant intracellular Ras group related LRR protein 2, B3 domain-containing protein, and Ubiquitin fold modifier protein 1 displayed higher protein expression by 1.76, 1.1, and 3.75 fold, respectively, in the roots of LS in comparison to IR64 (Table 2). Among the 18 identified proteins, 13 were implicated in abiotic stress responses and two exhibited enzymatic activity. One protein was a core nucleosome component, another was involved in autophagy and protein transport, and the remaining one functions in the DNA methylation process.

**Figure 3.** Two-dimensional gel electrophoretic analysis of the protein profiles of rice root proteome under control condition (**A**) IR64 root (**B**) Luna Suvarna (LS) root.

**Table 1.** Specifically expressed proteins in Luna Suvarna (LS) versus IR64 roots identified using MALDI-ToF/ToF mass spectrometry.




Spots 1–14 were exclusively observed in the case of Luna Suvarna root, whereas spot 15 was differentially less expressed in the IR64 root.

**Table 2.** Highly expressed proteins in Luna Suvarna (LS) roots as compared to IR64 identified using MALDI-ToF/ToF mass spectrometry.


Comparative expression analysis of the identified stress marker genes (which are known for conferring tolerance against salinity stress) (Table 3) was performed in the roots of IR64 and LS under native conditions to complement the MS results at the transcript level. The transcript levels of *4-coumarate CoA ligase 9 (CCoA), β-glucosidases 12 (β-gluc), phytosulfokines 3 (PSK)*, and *B3 domain-containing protein (B3D)* were observed to be 1.09, 1.02, 3.64, and 1.78 fold higher, respectively, in LS roots compared to IR64 under normal conditions (Figure 4). However, the gene expression of *calcineurin B-like interacting protein kinase 21 (CIPK), delta-pyrroline-5-carboxylate synthase (P5CS), cyanate hydratase (CHS), DEAD-box ATP-dependent RNA helicases 53 (DEAD), plant intracellular Ras group related LRR protein 2 (RAS-LRR)*, and *minichromosome maintenance 6 (MCM6)* was 0.69, 0.38, 0.30, 0.21, 0.64, and 0.22 fold lower, respectively, in the roots of tolerant cultivar LS (Figure 4) with respect to the salt-sensitive IR64. Under salinity stress, *β-gluc* and *B3D* displayed up-regulation of 1.22 and 2.44 fold, respectively, in IR64 roots compared to the control (Figure 4). On the contrary, *CIPK, CCoA, PSK, P5CS, CHS,* and *MCM6* showed lower transcript levels by 0.54, 0.53, 0.066, 0.15, 0.75, and 0.72 fold, respectively, in IR64, while *DEAD* exhibited no significant change in its expression. In response to salinity stress, the expression of *CIPK, CCoA, PSK, CHS, DEAD* and *MCM6* was 1.85, 1.86, 15.26, 1.32, 1.5, and 1.37 fold higher, respectively, in LS roots compared to control IR64 (Figure 4). However, *β-gluc, P5CS, B3D* and *RAS-LRR* exhibited 0.83, 0.78, 0.41, and 0.65 fold lower transcript level accumulation, respectively, in the roots of the tolerant cultivar (LS) under salinity stress stimuli (Figure 4).

**Figure 4.** The relative expression levels of genes encoding for proteins differentially accumulated in IR64 and Luna Suvarna (LS) roots under control and salinity stress conditions. Three biological replicates were taken and bars represent mean ± SE. Asterisks signs (\*, \*\*, \*\*\*) represent values which were significantly different among different samples (Fisher LSD, *p* ≤ 0.05). The transcript levels of LS under normal condition and, LS and IR64 upon salinity stress treatment were compared with IR64 (control), whose expression was assumed as 1. (-NaCl) refers to untreated samples, (+NaCl) refers to salinity treated samples. Blue color refers to IR64 root (-NaCl), red color refers to LS root (-NaCl), green color refers to IR64 root (+NaCl) and purple color refers to LS root (+NaCl). CIPK: Calcineurin B-like interacting protein kinase 21; CCoA: 4-coumarate CoA ligase like 9; β-gluc: Beta-glucosidase 12; PSK: Phytosulfokines 3; P5CS: Delta-1-pyrroline-5-carboxylate synthase; CHS: Cyanate hydratase; DEAD: DEAD-box ATP-dependent RNA helicase 53; RAS-LRR: Plant intracellular Ras group related LRR protein 2; B3D: B3 domain-containing protein; MCM6: minichromosome maintenance 6.

#### **3. Discussion**

Plant roots perceive the salinity stress signals and promptly pass them to the shoot to activate various stress-responsive pathways [1,2,9,10]. Although roots are the important site for the perception of salinity stress-related signals, not much attention has been paid to exploring this underground part of the plant in the context of understanding salinity tolerance attributes. In the present study, a positive correlation has been observed between the root system architecture, auxin content, stress marker proteins, and salinity stress adaptation. It was observed that the salinity stress tolerant LS cultivar of rice has a longer primary root, a larger number of roots and a higher fresh weight and dry weight in comparison to IR64. Plants are known to develop deeper roots, more lateral roots, longer root hairs, and a greater root number and biomass as a natural defense against stress conditions, including salinity [25–27]. Moreover, it has been demonstrated that a larger root/shoot length ratio and a higher root biomass promote the adaptation of plants to environmental stresses [24,28]. Thus, the distinct differences observed in the root phenotype of LS compared to IR64 could reflect the acquisition of adaptive morphological traits that enable LS plants to mitigate salinity stress when exposed to it.

It is believed that phytohormones are critical signaling molecules that function downstream of environmental stimuli and regulate various stress adaptive pathways [29]. In previous studies, it has been demonstrated that high salinity stress greatly affects root architecture by inhibiting primary and lateral root growth through altering the accumulation and distribution of the critical phytohormone, auxin [30–33]. Among different auxins, the role of primary auxin IAA is thought to be fundamental as it is the key player in regulating root development [34,35]. Thus, in the present work, the endogenous levels of IAA have been estimated. Upon analysis, it was observed that LS exhibited significantly higher IAA content compared to IR64 roots. Earlier, it was reported that *iaam-OX* transgenic lines (with higher endogenous IAA level) and wild-type plants of Arabidopsis pretreated with IAA exhibited resistance towards drought stress [15]. However, the triple mutants, *yuc1yuc2yuc6,* which were deficient in endogenous IAA content, showed decreased resistance towards drought stress [15]. Moreover, augmented levels of indole-3-butyric acid (IBA) in growing leaves and higher IAA content in the roots of the highly salt-resistant maize variety, SR03, were observed in response to salinity stress [32]. It was revealed that the increased IAA concentration enhanced the accumulation of cell growth-related agents, such as β-expansins (involved in cell wall extension), under salinity stress [32,36]. In Arabidopsis *iar4* mutants, reduced root meristem activity and root growth were reported due to diminished auxin distribution in root tips, indicating the key role of auxin in root growth and development [30]. The exogenous application of auxin is also well known to positively modulate root architecture, especially the lateral root number [15,33,37,38]. 
Thus, the higher endogenous levels of IAA observed in salt tolerant rice cultivar (LS) could be considered as one of the prominent reasons for the acquisition of salinity stress adaptive root traits observed in the LS cultivar.

It is well realized that the process of auxin-mediated root development is regulated by a complex interplay between auxin metabolism, its signaling and transport leading to the spatio-temporal distribution of auxin [12,39,40]. Thus, to get insights into the molecular dynamics of auxin homeostasis, the transcript-level expression of different genes involved in the auxin pathway has been analyzed in roots in the present work. Recent studies suggest that the local biosynthesis of auxin by YUCCA flavin monooxygenases in the roots is the primary source for normal root development and root gravitropic responses [35]. Moreover, it has been demonstrated that five *YUCCA* genes—*YUCCA3*, *YUCCA5*, *YUCCA7*, *YUCCA8*, and *YUCCA9*—express highly in Arabidopsis roots, playing an essential role in the root development [35]. However, the link of *YUCCA* genes and salinity stress adaptation has never been evaluated in rice. Interestingly, in the present study, it was observed that the transcript level accumulation of different *OsYUCCA* genes was higher in the roots of LS. It might be the primary reason for more auxin biosynthesis and its accumulation in LS roots. Further, in response to salinity stress, the transcript level accumulation of *OsYUCCA3, OsYUCCA4, OsYUCCA5, OsYUCCA6, OsYUCCA7,* and *OsYUCCA9* was enhanced in LS roots. In Arabidopsis, *YUCCA8* and *YUCCA9* have been linked with the development of lateral roots, while their mutants develop shorter primary roots suggesting their key role in the development of root system architecture [41]. Hence, the present study hints towards a link between *OsYUCCA* genes mediated enhanced auxin accumulation and subsequently better developed root system architecture for the acquisition of salinity stress tolerance in rice.

Once IAA is biosynthesized, it is transported to the area of its requirement with the help of cell-to-cell auxin transport mediated by *OsPINs* in rice [42]. Earlier, it was found that polar auxin transport is affected by osmotic stress caused by increased salinity or drought [31]. Moreover, flavonoids and phenolic compounds that are accumulated in response to stress exposure also inhibit polar auxin transport [43,44]. Interestingly, in the present study, the transcript levels of auxin efflux carrier genes, such as *OsPIN1a, OsPIN2, OsPIN3a*, *OsPIN5a*, and *OsPIN5b*, were found to be higher in the LS root. Moreover, under salinity stress, the expression of auxin transport genes, particularly *OsPIN1a, OsPIN1b, OsPIN2, OsPIN3a,* and *OsPIN5b*, was up-regulated in the roots of LS. *OsPIN1b* and *OsPIN9* have been suggested to participate in root development in rice, by regulating auxin-cytokinin interaction [45]. Further, *OsPIN2* expresses highly in roots and enhances shoot to root auxin transport [46]. Thus, the increased expression of such transporter genes in LS roots suggests that salt-tolerant rice cultivar has better capability to maintain auxin homeostasis under salinity stress; however, further investigations are necessary to consolidate these findings.

The optimum concentration of IAA is maintained in a cell through its conjugation and degradation by *OsGH3* genes [47]. Hence, the expression of the auxin conjugation and degradation gene *OsGH3.13* was analyzed in the roots of the salt-sensitive (IR64) and salt-tolerant (LS) cultivars of rice. It was observed that the expression of *OsGH3.13* was significantly higher in the roots of LS, which appears to contradict the higher IAA content observed in LS. Thus, it can be inferred that the role of the *OsYUCCA* genes is probably more critical in regulating auxin content in rice as compared to *OsGH3*. Further, the analysis of *OsGH3.8* at the transcript level suggests no difference in expression level between IR64 and LS roots. However, under salinity stress, the transcript level accumulation of *OsGH3.13* was down-regulated in the roots of LS compared to IR64. On the contrary, *OsGH3.8* exhibited up-regulation in the root but not to a significant level. This indicates that the lower expression of *OsGH3.13* under NaCl application in LS roots might be responsible for providing tolerance against salinity stress, probably by enhancing IAA levels.

Various findings have suggested that *OsAFB2* and *OsTIR1* are the auxin signaling receptors affected by salinity stress [31,48]. However, their probable role in providing a natural defense against salinity stress has never been evaluated. In the present study, the transcript level accumulation of auxin receptor genes *OsTIR1, OsAFB2,* and *OsABP1* was found to be elevated in the roots of LS with respect to IR64 under normal conditions. This enhanced expression of auxin receptors can be linked to the higher auxin content in the roots of LS compared to IR64. Under salinity stress, the expression of *OsTIR1* showed down-regulation in LS roots with respect to control IR64. On the contrary, the transcript levels of *OsAFB2* and *OsABP1* exhibited up-regulation in the roots of LS compared to IR64 under the salt application. The expression of various auxin signaling genes, *OsARF1, OsARF2, OsARF16, OsAUX*/*IAA1*, and *OsAUX*/*IAA4*, was also found to be higher in the roots of LS compared to IR64 under both control as well as salinity stress conditions, which is consistent with the elevated endogenous IAA level in LS roots compared to IR64. Previous studies have also linked elevated auxin concentration with increased expression of auxin transport and downstream signaling genes [49], thus promoting auxin-mediated root development. In rice, 31 auxin repressor (*OsAUX*/*IAAs*) and 25 auxin activator (*OsARFs)* genes that participate in auxin signaling were observed to be suppressed by cold, heat and drought stress [31]. On the contrary, some *OsAUX*/*IAA* genes, such as *OsAUX*/*IAA 6, 9, 18, 19, 20* and *28*, and *OsARF 4, 11, 13, 14, 16, 18* and *19*, were induced by at least one among cold, heat, and drought stress [31]. Hence, various auxin signaling genes respond differentially to abiotic stresses such as cold, heat and drought [31]. However, to the best of our knowledge, there is no report on the effect of salinity stress on the auxin signaling genes. It has been suggested that the Arabidopsis Aux/IAA protein IAA14 participates in the early stages of lateral root development [50,51]. Hence, the observed elevated expression of *AUX*/*IAA* in LS roots compared to IR64 might be linked to the salt tolerance and the higher number of roots detected in the tolerant cultivar.

The comparative proteomics study revealed that some proteins were specifically present in the roots of LS compared to IR64. Among these, CIPK21 and CIPK29, the Ca2+ sensing proteins of the CIPK gene family, have been previously linked with the enhanced tolerance against salinity stress conditions in Arabidopsis [52,53]. It was suggested that *salt overly sensitive 3* (*SOS3*) encodes for CBL, which functions in sensing the cytosolic Ca2+ concentration by directly binding to it [52,54]. The Ca2+ bound CBL proteins directly activate their interacting partners, such as CIPK6, which are involved in auxin transport, regulation of root length and lateral root development [52,54]. CIPK6 also enhances the transcript levels of *NAC, PIN2,* and *P5CS* genes, which promote salinity stress resistance in plants [52]. P5CS protein, which is involved in proline biosynthesis, was also observed to show increased levels in LS roots in the current study. Proline is involved in the maintenance of cell turgor or osmotic balance, stabilizing membranes to prevent the leakage of electrolytes, and regulates reactive oxygen species (ROS) homeostasis [55,56]. Thus, it can be supposed that the increased levels of an enzyme involved in proline biosynthesis could be responsible for enhancing the proline content in the LS roots that has been previously linked to salinity tolerance [2]. In LS roots, a higher abundance of lignin biosynthesis protein, CCoA, was also observed. There is evidence that salinity stress causes increased lignification of the cell wall through maintaining the structural rigidity and durability of desert poplar plants [57]. Thus, the current finding indicates the possible role of lignin deposition in enhancing salinity stress tolerance in rice [57–59]. Further, higher protein accumulation of detoxification enzyme, CHS (which detoxifies cytotoxic compounds such as cyanate), was observed exclusively in LS roots. 
CHS also supplies salinity-stressed plants with alternative sources of nitrogen and carbon for better adaptation [60,61]. PSK, which was found exclusively in LS roots, is linked to plant immunity and the maintenance of cellular homeostasis, and is also involved in normal root growth and development [62]. PSK also decreases ethylene production which hinders the primary root growth by inhibiting cell proliferation in the meristematic zone and cell elongation in the elongation zone [62–64]. Hence, the augmented root growth and salinity stress tolerance observed in LS compared to IR64 could also be linked to the higher PSK content. Another stress marker protein, DEAD, was observed exclusively in the roots of LS. In earlier reports, DEAD has been shown to provide salinity stress tolerance in transgenic tobacco by reducing oxidative stress through activating the ROS scavenging system [65,66]. It also improves a plant's photosynthesis machinery, enhances plant growth and development, and mitigates salinity stress [65]. Hence, the higher accumulation of DEAD in LS roots might be involved in scavenging excess ROS, leading to the promotion of salinity stress tolerance. Moreover, β-gluc, which enhances the ABA pool, was also found exclusively in LS roots. The key role of β-gluc in releasing active and free forms of abscisic acid (ABA) from physiologically inactive ABA-glucose conjugate pool, resulting in the alleviation of salinity stress, has already been reported [67,68]. Therefore, the higher accumulation of β-gluc in LS roots might promote ABA accumulation, thus enhancing salinity stress tolerance. The content of MCM6 protein was found exclusively in the roots of the salt tolerant cultivar (LS), which plays an important role in the initiation and elongation steps of eukaryotic DNA replication [69]. In one of the previous studies, the role of MCM6 in providing resistance against high salinity and cold stress has already been elucidated [69]. 
It was also suggested that the ectopic over-expression of *Pisum sativum PsMCM6* in tobacco confers salinity stress tolerance without affecting yield [65,69]. Further, some proteins, such as RAS-LRR, B3D and ubiquitin fold modifier 1, were relatively more abundant in LS roots than in IR64 roots. The enhanced protein accumulation of RAS-LRR (which encodes polygalacturonase inhibitor proteins, PGIPs) plays a critical role in mitigating salinity stress [68,70]. Further, the roots of LS also exhibited higher protein accumulation of B3D (which triggers various stress-responsive genes) with respect to IR64 roots. The RAV (related to ABI3/VP1) protein contains an AP2 domain in its N-terminal region and a B3 domain (B3D) in its C-terminal region, and it also confers salinity stress resistance by regulating various stress-related genes (*RD29A, RD29B, RAB18, ABI1, ERD15, KIN, ERD10,* and *COR15a*) [71]. The protein content of ubiquitin fold modifier 1 (UFM1) was found to be several-fold higher in the roots of the tolerant cultivar (LS), which could prevent oxidative damage caused by free radicals. It has been suggested that, in addition to ubiquitin, plants utilize a number of ubiquitin-like proteins, such as related to ubiquitin 1 (RUB1), small ubiquitin-like modifier (SUMO), UFM1, and homology to ubiquitin (HUB), which participate in providing abiotic stress tolerance [72,73]. These proteins confer resistance against salinity stress by preventing the damage caused by free radicals and also prevent endoplasmic reticulum-induced apoptosis in protein secretory cells [73,74]. In IR64, upon the application of salinity stress, the expression of the *CIPK*, *CCoA, P5CS, PSK, CHS,* and *MCM6* genes was down-regulated with respect to the control IR64, which might lead to salinity stress susceptibility in the sensitive cultivar. The present finding also indicates that, in IR64, the protein turnover rate might be high, probably targeting the salinity stress responsive proteins for degradation and thereby leading to salinity stress sensitivity. 
On the contrary, in LS upon salinity stress application, higher transcript accumulation of *CIPK*, *CCoA, PSK, CHS, DEAD* and *MCM6* was observed, which could be linked to its acquisition of the salinity stress resistance property.

**Table 3.** Role of identified stress marker proteins in salinity stress tolerance.


#### **4. Materials and Methods**

#### *4.1. Plant Material*

The certified and disease-free seeds of salinity stress-sensitive IR64 and salinity stress-tolerant LS rice (*Oryza sativa* L.) cultivars were procured from Punjab Agricultural University, Ludhiana, India, and the Central Rice Research Institute (CRRI), Cuttack, India, respectively. LS can tolerate salt stress up to 8 dS m<sup>−1</sup>. The seeds were surface sterilized with 70% ethanol (*v*/*v*) for 1 min and treated with 0.4% sodium hypochlorite solution containing a drop of Tween-20 for 30 min. The seeds were washed thrice with autoclaved distilled water and then dried on autoclaved Whatman paper (3 mm) for 5 min. After surface sterilization, the seeds were sown in a plastic tray containing autoclaved sand moistened with sterile distilled water and incubated in the culture room at 25 °C (day/night), 70–80% relative humidity (day/night), and a 14 h photoperiod for 14 days. After 14 days, IR64 and LS seedlings were treated with 100 mM NaCl for 8 h to impose salinity stress. The roots were then separated for protein and RNA extraction to conduct two-dimensional gel electrophoresis and gene expression studies, respectively.
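For readers reproducing the stress treatment, the mass of NaCl required for a given volume of 100 mM solution follows directly from its molar mass (58.44 g/mol). The sketch below is illustrative only; the solution volume is an assumption, since the volume used in the experiment is not stated.

```python
NACL_MOLAR_MASS_G_PER_MOL = 58.44  # Na (22.99) + Cl (35.45)


def nacl_grams(concentration_mm: float, volume_l: float) -> float:
    """Mass of NaCl (g) needed for `volume_l` litres at `concentration_mm` mM."""
    if concentration_mm < 0 or volume_l < 0:
        raise ValueError("concentration and volume must be non-negative")
    moles = concentration_mm / 1000.0 * volume_l
    return moles * NACL_MOLAR_MASS_G_PER_MOL


# Assumed example volume of 1 L; the treatment concentration is from the text.
grams_per_litre = nacl_grams(100.0, 1.0)
print(f"{grams_per_litre:.3f} g NaCl per litre for a 100 mM solution")
```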

#### *4.2. Study of Morphological Parameters*

The seedlings of IR64 and LS were harvested after 2 months and dipped in water to remove adhering sand particles. A representative sample of 15 seedlings each of IR64 and LS was selected to study the morphological parameters. Root and shoot lengths were measured using a meter scale, and fresh weights were recorded in grams. The root and shoot of each sample were then dried in an oven at 70 °C until a constant weight was achieved, and the dry weights were recorded. The number of roots of each seedling of IR64 and LS was also counted.
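The growth parameters are reported as mean ± SE (n = 15). A minimal sketch of that summary is given below, assuming the standard error is the sample standard deviation divided by the square root of n; the measurements in the example are hypothetical placeholders, not data from this study.

```python
import math
import statistics


def mean_and_se(values):
    """Return (mean, standard error of the mean) for a list of measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se


# Hypothetical root lengths (cm) for 15 seedlings; illustrative values only
root_lengths_cm = [10.2, 11.0, 9.8, 10.5, 10.9, 11.3, 9.6, 10.1,
                   10.7, 10.4, 11.1, 9.9, 10.3, 10.8, 10.0]

mean, se = mean_and_se(root_lengths_cm)
print(f"root length: {mean:.2f} ± {se:.2f} cm (n = {len(root_lengths_cm)})")
```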

#### *4.3. IAA Estimation*

To estimate the content of IAA, 5 g of fresh roots of IR64 and LS were crushed finely in liquid nitrogen and extracted in chilled 80% ethanol (15 mL/g) containing butylated hydroxytoluene (BHT) (100 mg/L) [75]. The homogenized material was kept in the dark at 4 °C for 24 h and then filtered. The solid residues were re-extracted thrice with 80% ethanol for 4 h without adding BHT. The BHT-containing filtrate and the filtrate without BHT were combined and centrifuged at 8000 rpm for 20 min. The supernatant was concentrated by drying at 30 °C in a rotavapor in the dark and used for further processing, while the pellet was discarded. The concentrate was resuspended in 2.5 mL of 0.1 M potassium phosphate buffer (pH 8) and applied to the PVP column after 3 bed volumes of potassium phosphate buffer had been added to the column. After elution, a further 3 bed volumes of potassium phosphate buffer were added to the PVP column. The eluate was concentrated to 10 mL by drying in the rotavapor at 30 °C and its pH was adjusted to 2.5 with 1 N HCl. The concentrated 10 mL eluate was mixed with diethyl ether (30 mL) containing BHT (100 mg/L), vortexed, and kept for 10 min, after which the supernatant (approx. 30 mL) was transferred to a fresh flask. This step was repeated four times. The resulting extract was mixed with 1.5 g of Na2SO4 and kept for 30 min, then evaporated and dried completely at 30 °C using the rotavapor. Next, 5 mL of distilled water was added and evaporated on the rotavapor at 30 °C. This step was repeated twice and a dried pellet was obtained. The pellet was then dissolved in 1.5 mL of methanol (HPLC grade) for IAA estimation. Elution was carried out with 100% methanol (HPLC grade):water (formic acid 0.05% *v*/*v*), 35:65, at a flow rate of 1 mL·min⁻¹. The column eluates were passed through a UV detector at 254 nm, and IAA was estimated with reference to an authentic standard of IAA (1 mM) (Sigma Chemical Co., St. Louis, MO, USA). Readings were taken in triplicate and the average of the peaks was obtained.
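The estimation against a single authentic standard can be illustrated with a short calculation. The following is only a sketch: it assumes a linear detector response through the origin (single-point external calibration), and the function name and the peak-area values are hypothetical. The 1 mM standard, the 1.5 mL methanol volume, the 5 g of fresh roots, and the averaging of three readings come from the procedure above.

```python
from statistics import mean


def iaa_nmol_per_g_fw(sample_areas, standard_area, standard_mM=1.0,
                      reconstitution_mL=1.5, fresh_weight_g=5.0):
    """Single-point external-standard estimate of IAA in nmol per g fresh weight.

    Assumes the UV response at 254 nm is proportional to concentration.
    """
    if standard_area <= 0 or fresh_weight_g <= 0:
        raise ValueError("standard_area and fresh_weight_g must be positive")
    mean_area = mean(sample_areas)                    # average of replicate peaks
    conc_mM = mean_area / standard_area * standard_mM  # mM equals umol/mL
    total_umol = conc_mM * reconstitution_mL           # IAA in the methanol extract
    return total_umol * 1000.0 / fresh_weight_g        # umol -> nmol, per g roots


# Hypothetical peak areas for three replicate injections and one standard run
estimate = iaa_nmol_per_g_fw([100, 110, 90], standard_area=1000)
```

A multi-point calibration curve would be preferable if the response is not strictly proportional; the source reports only a single 1 mM standard.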

#### *4.4. RNA Extraction and cDNA Synthesis*

A total of 150 mg of root sample of IR64 and LS was homogenized in liquid nitrogen using a pestle and mortar. Total RNA was isolated using Trizol reagent (Invitrogen, http://www.invitrogen.com, last accessed on 20 July 2021), as per the manufacturer's instructions. RNase-free DNase (Sigma-Aldrich, USA) was used to remove genomic DNA, and 2 μg of RNA was used to synthesize cDNA in a 10 μL reaction using the iScript cDNA synthesis kit (Bio-Rad, Hercules, CA, USA) as per the manufacturer's recommendations.

#### *4.5. Quantitative Real-Time (qRT) PCR Analysis*

qRT-PCR was performed to study the differential expression of auxin homeostasis genes in the roots of IR64 and LS. The nucleotide sequences of genes involved in auxin homeostasis were retrieved from the rice annotation project database (RAP-DB), and gene-specific primers were designed using the PrimerQuest tool of Integrated DNA Technologies, USA (http://www.idtdna.com/primerquest/Home/Index, last accessed on 20 July 2021) (Supporting Information-Table S1). The qRT-PCR reaction was performed in 96-well plates using SYBR Green detection chemistry in the StepOne Plus Real-Time PCR machine (Applied Biosystems, Waltham, MA, USA). Each 10 μL reaction contained 5 μL of 2X Fast SYBR Green (Applied Biosystems), 7.5 ng of cDNA, and 5 μmol each of forward and reverse gene-specific primers, with the final volume made up to 10 μL using sterile nuclease-free water. A no-template control (NTC) was also set up for each primer pair. Thermal cycling was carried out using the following parameters: an initial denaturation step at 95 °C for 20 s to activate the Taq DNA polymerase, followed by 40 cycles of denaturation at 95 °C for 3 s and annealing at 60 °C for 30 s. The melting curve was generated by heating the amplicon from 60 to 90 °C. Baseline and threshold cycles (Ct) were automatically determined using the StepOne Plus Software version 2.3 (Applied Biosystems, USA). Fold changes were calculated using the comparative CT (ΔΔCT) method and normalized against *OsUBQ5* (LOC\_Os1g328400), used as an endogenous control.
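The ΔΔCT calculation can be sketched as follows. This is an illustration rather than the authors' analysis script: it assumes the common 2^(−ΔΔCT) form, which presumes roughly 100% amplification efficiency for both the target gene and the reference gene (*OsUBQ5* here). The function name and the Ct values are hypothetical.

```python
from statistics import mean


def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by the comparative CT method, assuming 2**(-ddCT).

    Each argument is a list of replicate Ct values.
    """
    # dCT: target normalized to the endogenous reference within each condition
    dct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    dct_control = mean(ct_target_control) - mean(ct_ref_control)
    # ddCT: sample condition relative to the calibrator (control) condition
    ddct = dct_sample - dct_control
    return 2.0 ** (-ddct)


# Hypothetical triplicate Ct values: the target amplifies two cycles earlier
# in the sample than in the control after normalization, giving a 4-fold change.
fc = fold_change_ddct([22.0, 22.0, 22.0], [18.0, 18.0, 18.0],
                      [24.0, 24.0, 24.0], [18.0, 18.0, 18.0])
```

If primer efficiencies differ appreciably from 100%, an efficiency-corrected model would be more appropriate than the fixed base of 2 used here.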

#### *4.6. Protein Extraction*

A phenol-based method was used for extracting proteins from 1 g of roots of IR64 and LS, as reported previously [76]. The samples were homogenized with 6 mL of extraction buffer containing 100 mM KCl, 700 mM sucrose, 50 mM EDTA, and 500 mM Tris-HCl pH 8.0. Further, 2% β-mercaptoethanol, 1 mM PMSF, and a 10 mM protease inhibitor cocktail were added to the extraction buffer just before use. The mixture was vortexed and incubated with agitation on ice for 10 min. After incubation, 6 mL of Tris-buffered phenol was added, and the mixture was again vortexed and incubated on a shaker on ice for 10 min. The homogeneous mixture was centrifuged at 12,000 rpm at 4 °C for 20 min, and the upper phenolic phase was collected carefully in a fresh tube. Again, 3 mL of the extraction buffer was added to the Tris-buffered phenol, the extraction process was repeated, and the upper phenolic phase was collected. Further, 5 volumes of 0.1 M ammonium acetate in 100% cold methanol were added to the phenolic phase and the tube was shaken gently. The mixture was incubated at −20 °C overnight for protein precipitation. The protein pellet was recovered after 24 h by centrifugation at 12,000 rpm at 4 °C for 10 min and the supernatant was discarded. The pellet so obtained was washed thrice with 0.1 M ammonium acetate in cold methanol, then with a mixture containing 80% methanol and 20% acetone, followed by 100% cold methanol. The final wash was with 100% chilled acetone, and the washed pellet was air-dried and stored at −80 °C for 2-DE.

#### *4.7. Protein Solubilization and Quantification*

The protein pellets were suspended thoroughly in rehydration buffer (ReadyPrep™ 2-D Starter Kit Rehydration/Sample Buffer #1632106, Bio-Rad, USA). Protein concentration was quantified with the Bradford protein estimation assay [77], using bovine serum albumin (BSA) as the standard.
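Quantification against a BSA standard is usually done by fitting a standard curve and interpolating the unknown. The sketch below assumes a linear absorbance response over the standard range and an ordinary least-squares fit; the function names, the standard concentrations, and the absorbance readings are hypothetical and are not taken from the source.

```python
def fit_standard_curve(concentrations, absorbances):
    """Ordinary least-squares line: absorbance = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired standards")
    mean_x = sum(concentrations) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, absorbances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def protein_concentration(absorbance, slope, intercept, dilution_factor=1.0):
    """Interpolate an unknown from the fitted BSA curve (same units as standards)."""
    return (absorbance - intercept) / slope * dilution_factor


# Hypothetical BSA standards (mg/mL) and their absorbance readings
slope, intercept = fit_standard_curve([0.0, 0.25, 0.5, 1.0],
                                      [0.05, 0.175, 0.30, 0.55])
unknown_mg_per_mL = protein_concentration(0.30, slope, intercept)
```

Readings outside the range spanned by the standards should be diluted and re-measured rather than extrapolated.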

#### *4.8. Two-Dimensional Gel Electrophoretic (2-DE) Analysis*

For isoelectric focusing (IEF), 150 μg of protein was dissolved in a total of 130 μL of rehydration buffer containing 8 M urea, 2% CHAPS, 50 mM DTT, 0.2% Bio-Lyte® 3/10 ampholyte, and 0.001% bromophenol blue (Bio-Rad, USA) and passively rehydrated over IPG strips (7 cm, pH 3–10, ReadyStrips, Cat. No. 163-200, Bio-Rad, USA) overnight at 20 °C. After rehydration, the strips were focused at 250 V for 40 min, then at 4000 V for 2 h with linear voltage amplification, and finally to 10,000 V·h with rapid amplification. After IEF, the strips were incubated for 15 min with equilibration buffer I, containing 6 M urea, 375 mM Tris-HCl pH 8.8, 2% SDS, and 2% DTT, for reduction (ReadyPrep 2-D Starter Kit Equilibration Buffer I #1632107, Bio-Rad, USA). For alkylation of the proteins, the strip was further incubated for 15 min with 2.5% iodoacetamide dissolved in equilibration buffer II, containing 6 M urea, 375 mM Tris-HCl pH 8.8, and 2% SDS (ReadyPrep 2-D Starter Kit Equilibration Buffer II #1632108). The second-dimension electrophoresis was performed using a 12% polyacrylamide gel. After mounting the strip on the gel, it was sealed with 0.5% agarose containing 0.1% bromophenol blue, and the protein molecular marker was also loaded. Electrophoresis was performed at a constant voltage of 100 V for 2 h in Tris-glycine-SDS running buffer.

#### *4.9. Gel Staining, Imaging, and Analysis*

After 2-DE, gels were stained with Coomassie brilliant blue and were stored in 5% acetic acid until further analysis. Gel imaging was conducted using the Molecular Imager Gel Doc XR system (Bio-Rad, USA) and the images were analyzed using PDQuest 8.0.2 software.

#### *4.10. Protein in-Gel Digestion and Mass Spectrometry (MS) Analysis*

Protein spots showing variations in intensity, or differing in presence or absence, were manually excised from Coomassie brilliant blue-stained gels and subjected to mass spectrometric analysis [78]. The excised gel pieces were destained using 100 mM NH4HCO3/50% ACN solution, washed twice with 200 μL of Milli-Q water for 5 min each, and dehydrated using 100 μL of acetonitrile. The samples were subjected to trypsinolysis in 25 μL of trypsin solution (Sigma, USA) at a concentration of 20 μg/mL in 25 mmol/L NH4HCO3 and incubated overnight at 37 °C. The digested peptides were extracted from the gels twice at room temperature using 50% trifluoroacetic acid/50% acetonitrile. The extracted peptides were mixed with 0.5 μL of α-cyano-4-hydroxycinnamic acid (Bruker) at a concentration of 20 mg/mL, prepared in 0.1% trifluoroacetic acid and 30% (*v*/*v*) acetonitrile, and dried at room temperature. The trypsin-digested protein samples were subjected to mass spectrometric analysis using an UltrafleXtreme™ mass spectrometer (Bruker Daltonics Inc., Germany). The instrument was calibrated and fine-tuned with a mass standard starter kit (Bruker) and standard tryptic-digested BSA (Bruker, Germany). TOF spectra were recorded in positive ion reflector mode over a mass range of 700–3500 Da. For protein characterization, the obtained MS spectra were searched against a non-redundant database (SwissProt database) using the MASCOT search engine with these parameters: taxonomy, *Oryza sativa* (rice); parent ion mass tolerance, ±1.2 Da; MS/MS tolerance, 100 ppm; variable modifications, oxidation of methionine (M) and carbamidomethylation of cysteine (C); and enzyme, trypsin.
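The two tolerance settings are expressed in different units (absolute daltons for the parent ion, parts per million for MS/MS), which can be illustrated with a small helper. This is only a sketch of how such windows are evaluated in general, not a description of MASCOT's internal matching; the function names and the example masses are hypothetical.

```python
def mass_error_ppm(observed, theoretical):
    """Signed mass error in parts per million relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tolerance, unit="Da"):
    """True if the observed mass lies inside the +/- tolerance window."""
    if unit == "Da":
        return abs(observed - theoretical) <= tolerance
    if unit == "ppm":
        return abs(mass_error_ppm(observed, theoretical)) <= tolerance
    raise ValueError("unit must be 'Da' or 'ppm'")


def in_acquisition_range(mass, low=700.0, high=3500.0):
    """Mass range over which the TOF spectra were recorded (700-3500 Da)."""
    return low <= mass <= high


# Hypothetical peptide: a 0.5 Da error passes a 1.2 Da window but corresponds
# to 500 ppm at 1000 Da, so it fails a 100 ppm window.
parent_ok = within_tolerance(1000.5, 1000.0, 1.2, unit="Da")
fragment_ok = within_tolerance(1000.5, 1000.0, 100, unit="ppm")
```

Because a ppm window scales with mass, the same 100 ppm setting allows about 0.07 Da at 700 Da and 0.35 Da at 3500 Da.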

#### *4.11. Statistical Analysis*

All data obtained from the different experiments were evaluated statistically. An unpaired t-test and a one-way analysis of variance (ANOVA) with the Fisher LSD test were conducted to compare mean differences using SigmaStat version 3.5. Comparisons with *p* < 0.05 were considered significantly different.
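For readers who want to see what these tests compute, the test statistics can be sketched with the Python standard library. This is an illustration under stated assumptions, not a reproduction of the SigmaStat analysis: the t-test is assumed to be the pooled-variance (Student) form, the function names and data are hypothetical, and only the statistics and degrees of freedom are computed (p-values additionally require the t and F distributions).

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Pooled-variance Student t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def one_way_anova_f(*groups):
    """F statistic with between-group and within-group degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical triplicate measurements for two cultivars
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
t_stat, t_df = unpaired_t(group_a, group_b)
f_stat, df_b, df_w = one_way_anova_f(group_a, group_b)
```

With exactly two groups the ANOVA F statistic equals the square of the pooled t statistic, which is a convenient consistency check. Post hoc comparisons such as Fisher's LSD apply only when more than two groups are compared.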

#### **5. Conclusions**

The present study shows that salt-tolerant rice cultivars present salinity stress-adaptive root traits, likely due to an elevated endogenous auxin content and augmented levels of key salinity stress tolerance-providing proteins in their roots. The salt-tolerant rice cultivar LS exhibited higher transcript-level expression of different genes involved in auxin homeostasis under both control and salinity stress conditions. Thus, our study suggests that an elevated level of auxin and a higher buffering capacity of the auxin homeostasis process may be critical for the acquisition of salinity stress adaptation in rice. Upon 2-DE and MS analysis, several salinity stress tolerance-providing proteins were detected that exhibited higher constitutive expression in the roots of LS than in those of IR64. In LS roots, the transcript levels of some identified stress marker proteins were lower, whereas their protein accumulation was higher, which indicates that their protein turnover rate might be low. Taken together, these results highlight morphological and molecular features that are critical for rice adaptation to salinity stress and reveal that this process is multifactorial. Moreover, our results pinpoint several candidate genes that could be artificially overexpressed to increase salinity stress tolerance in rice.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/plants10081544/s1, Table S1: List of primers used for the quantitative real-time polymerase chain reaction.

**Author Contributions:** Conceptualization, P.K.P.; methodology, P.K.P. and S.S.; validation, P.K.P., S.S. and N.K.; formal analysis, S.S., N.K., D.M., B.S. and V.S.; investigation, S.S.; resources, P.K.P., S.S., N.K. and D.M.; data curation, P.K.P., S.S. and N.K.; writing—original draft preparation, S.S. and N.K.; writing—review and editing, P.K.P., P.G., S.S. and N.K.; visualization, S.S., N.K. and D.M.; supervision, P.K.P. and P.G.; project administration, P.K.P.; funding acquisition, P.K.P. and P.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** We are thankful to the funding agencies, the Department of Science and Technology (DST), Government of India, and the Department of Biotechnology (DBT), Government of India, and to CGIAR Research Program (CRP) on rice agri-food systems (RICE, 2017–2022) for supporting this research work.

**Data Availability Statement:** The available data are presented in the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Plants* Editorial Office E-mail: plants@mdpi.com www.mdpi.com/journal/plants
