**The Potential of Payment for Ecosystem Services for Crop Wild Relative Conservation**

**Nicholas Tyack <sup>1,\*</sup>, Hannes Dempewolf <sup>2</sup> and Colin K. Khoury <sup>3</sup>**


Received: 28 August 2020; Accepted: 25 September 2020; Published: 2 October 2020

**Abstract:** Crop wild relatives (CWR) have proven to be very valuable in agricultural breeding programs but remain a relatively under-utilized and under-protected resource. CWR have provided resistance to pests and diseases, abiotic stress tolerance, quality improvements and yield increases, with the annual contribution of these traits to agriculture estimated at USD 115 billion globally, and they are considered to possess many valuable traits that have not yet been explored. The use of the genetic diversity found in CWR for breeding provides much-needed resilience to modern agricultural systems and has great potential to help sustainably increase agricultural production to feed a growing world population in the face of climate change and other stresses. A number of CWR taxa are at risk, however, necessitating coordinated local, national, regional and global efforts to preserve the genetic diversity of these plants through complementary in situ and ex situ conservation. We discuss the absence of adequate institutional frameworks to incentivize CWR conservation services and propose payment for ecosystem services (PES) as an under-explored mechanism for financing these efforts. Such mechanisms could become a powerful tool for enhancing the long-term protection of CWR.

**Keywords:** crop wild relatives; payment for ecosystem services; payment for environmental services; agrobiodiversity conservation; climate change; agricultural adaptation

#### **1. Introduction**

Crop wild relatives (CWR) are wild or weedy plants that are the progenitors or close relatives of crops, including both those that readily cross with cultivated taxa and more distantly related species within the same or related genera [1]. CWR constitute genetic resources with demonstrated value for plant breeding because they carry useful traits that are not easily found in crops, owing to domestication bottlenecks and the subsequent narrowing of the cultivated gene pool through breeding efforts [2]. Further, CWR are often important cultural resources, directly harvested for food, spice, medicine, ceremony or other uses [3], and they provide various ecosystem services within their natural habitats.

Introgression of CWR traits has led to improvements in an ever-growing list of crops, including enhanced resistance to pests and diseases, tolerance to abiotic stresses such as drought, heat and salinity, yield increases, quality improvements and other desired characteristics [1,4]. Maxted and Kell identify the transfer of useful traits from 185 CWR taxa to 29 crop species [5]. Estimates of the value of these contributions range from USD 8 million (USD 17.6 million in 2020) per year for increased sugar content and improved taste from a single wild tomato species [6] to USD 267–384 million (USD 310–445 million in 2020) per year for the wild sunflower gene pool [7]. Pimentel et al. estimate the contributions of CWR genetic material to increased crop yields at approximately USD 20 billion (USD 32.4 billion in 2020) per year in the United States and USD 115 billion (USD 186.3 billion in 2020) per year worldwide [8].
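The 2020-equivalent figures above are inflation adjustments of the original estimates. The following minimal sketch assumes a single price-index multiplier per base year. It recovers the multiplier implied by one of the paper's own value pairs and reapplies it to the other estimate from the same study. The paper does not state which price index it used, so the function names and the multiplier are illustrative only.

```python
def implied_multiplier(original: float, adjusted_2020: float) -> float:
    """Ratio between a 2020-adjusted value and the original estimate."""
    return adjusted_2020 / original


def adjust_to_2020(original: float, multiplier: float) -> float:
    """Scale an estimate by a price-index ratio (index in 2020 / index in the base year)."""
    return original * multiplier


# Value pairs as reported in the text, in USD billion: (original, 2020-equivalent).
pimentel_world = (115.0, 186.3)
pimentel_us = (20.0, 32.4)

# Both Pimentel et al. figures come from the same study [8], so they should share
# one multiplier; deriving it from one pair and applying it to the other checks
# that the two reported conversions are consistent.
factor = implied_multiplier(*pimentel_world)
print(f"implied multiplier: {factor:.2f}")
print(f"US estimate in 2020 USD: {adjust_to_2020(pimentel_us[0], factor):.1f} billion")
```

Estimates from different base years (for example, the tomato and sunflower figures) imply different multipliers and cannot share this factor.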

However, with more than a fifth of plant species worldwide threatened with extinction [9] amid what has been called the "sixth great extinction event in Earth's history" [10], CWR face threats from development and other forms of habitat modification, the industrialization of agriculture, invasive species, pollution, overharvesting, overgrazing and climate change. A recent European Red List assessment of Vascular Plants found that 11.5% of assessed CWR species in Europe were threatened, with another 4.5% classified as Near Threatened [11]. Jarvis et al. estimated that 16–22% of peanut, potato and cowpea wild relative species are likely to become extinct by 2055 [12], while Ureta et al. predicted that climate change is likely to lead to severe reductions in the potential distributions of maize wild relatives [13].

Up to double the amount of food that is presently produced may be required by 2050 to feed a global population of around 9 billion [14]. At the same time, by the end of the 21st century, normal growing season temperatures are expected to exceed the most extreme seasonal temperatures recorded from 1900 to 2006, which will likely have severe effects on the cultivation of many crops [15,16]. For example, rice grain yield can decline by up to 10% for each 1 °C increase in growing season minimum temperature in the dry season [17]. With rapidly evolving techniques in molecular biology, it is increasingly feasible to access and utilize genetic material from CWR, including distant relatives, and the expanded use of CWR with these tools offers a potential pathway to meeting these challenges through the development of more resilient and productive crop varieties [1,2,18]. For example, a gene from the Asian wild rice species *Oryza rufipogon* Griff. has been shown to significantly increase rice yields [19]. The incorporation of CWR-derived genetic diversity into elite gene pools has been a key tool for plant breeders for many decades, with hundreds of different taxa used in this way, especially to introduce disease tolerance traits into domesticated crops. In recent years, the value of crop wild relatives for addressing abiotic stress tolerance, including many traits of relevance to climate change, has become more widely recognized, and many breeding programs around the world are using these genetic resources in pursuit of that goal. As a result, the range of stakeholders involved is also broadening beyond the public breeding programs in universities and national and international agricultural research programs. Several private companies are adding pre-breeders to their staff, who often engage in pre-competitive public-private partnerships to utilize CWR and other types of genetic resources [2].

However, the users of CWR diversity for the most part remain far removed from where CWR are found in nature, and CWR remain a largely unrecognized group within conservation policy and the ecosystem services literature [20]. The continued lack of sufficient investment in CWR conservation may lead to permanent gaps in the pool of wild genetic resources available to crop breeders. The extirpation of CWR populations will also have negative local impacts, including the loss of their contributions as wild-harvested plants and of the ecosystem services they provide.

At the root of these conservation deficiencies is a lack of, or inadequacies in, institutions and payment systems through which the beneficiaries of CWR conservation services could compensate those who can supply them. Adapting payment for ecosystem services (PES) mechanisms to CWR offers a potentially useful tool for correcting this failure and for creating a missing institutional framework for the conservation of wild and weedy genetic diversity to support future agricultural research and crop improvement efforts. While early PES schemes encountered challenges in implementation, the mechanism has been shown to be effective in strengthening ecosystem service provision [21], and well-designed PES programs can offer a low-cost and efficient solution for mitigating market failures in ecosystem service provision, such as those associated with the carbon fixation services provided by forests or the water filtration services provided by wetlands and riparian buffers [22]. The use of PES in recent years has provided potential practitioners with a number of lessons that can help improve program design [23]. Initial research on implementing PES for CWR already exists: in Zambia, competitive tenders have been held for farmer conservation of several CWR taxa in field margins [24], and experiences from Latin America provide insights for the design of payment for agrobiodiversity conservation service programs more broadly [25]. Importantly, the successful conservation of CWR requires a combination of action at different geographic (local, national, global) and social (individual, market, societal) levels, spanning in situ, ex situ and on-farm conservation [26]. If designed well, PES instruments provide the flexibility to further CWR conservation across these different dimensions.
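Competitive tenders of the kind mentioned for Zambia [24] are a form of reverse auction: landholders state the payment they would accept, and the program funds the cheapest offers. The following minimal pay-as-bid sketch works under assumed rules (rank offers by asking price per hectare and accept each one that still fits the budget). It does not describe the design of the Zambian tenders, and the farmers, prices and budget are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    farmer: str
    usd_per_ha: float  # annual payment the farmer asks for, per hectare
    hectares: float    # field-margin area offered for CWR conservation


def select_bids(bids: list[Bid], budget_usd: float) -> list[Bid]:
    """Pay-as-bid reverse auction: fund the cheapest offers per hectare that fit the budget."""
    accepted: list[Bid] = []
    remaining = budget_usd
    for bid in sorted(bids, key=lambda b: b.usd_per_ha):
        cost = bid.usd_per_ha * bid.hectares
        if cost <= remaining:  # offers that no longer fit are skipped; scanning continues
            accepted.append(bid)
            remaining -= cost
    return accepted


# Hypothetical offers and budget.
offers = [Bid("farmer_A", 30, 2), Bid("farmer_B", 80, 1), Bid("farmer_C", 50, 3)]
winners = select_bids(offers, budget_usd=200)
print([b.farmer for b in winners])
```

Ranking purely on price per hectare ignores differences in the conservation value of each field margin. A program could instead rank offers on cost per unit of expected conservation benefit.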

We describe here the ways in which economic benefits flow from the conservation of CWR, discuss the absence of adequate institutional frameworks to incentivize CWR conservation services and assess the potential of payment for ecosystem services (PES) as a tool for the conservation and sustainable use of CWR genetic diversity.

#### **2. CWR and Ecosystem Services**

Crop wild relative populations provide a number of ecosystem services (Table 1), including direct contributions locally as well as cultural and genetic resource services more widely. With regard to agriculture and plant breeding, the conservation of these populations provides the important "supporting" ecosystem service of plant genetic diversity [27]. CWR germplasm, as a tangible material product of ecosystem processes, is an "ecosystem good" [28]. It is collected from wild, weedy or human-managed habitats, deposited in gene banks or other repositories for ex situ conservation and passed through pre-breeding and breeding, finally resulting in the introduction of beneficial traits into crop varieties (a provisioning service). In traditional farming systems, wild relatives of numerous crops also provide genetic diversity "spontaneously" to crops in the field through natural gene flow. For example, wild and cultivated populations of cowpea often overlap, and there is evidence of substantial hybridization between the two in the field [29]. The Millennium Ecosystem Assessment has treated genetic resources primarily as a provisioning good or service [30], while other authors see genetic diversity as mainly providing a supporting service to agriculture [27]. The Economics of Ecosystems and Biodiversity (TEEB) clarifies that the collection of useful genetic resources from nature constitutes a provisioning service, whereas the maintenance of genetic diversity, e.g., through the in situ conservation of CWR, should be considered a supporting or "habitat" service [31].



When CWR and other forms of agrobiodiversity are conserved and used for crop improvement, agricultural system resilience may increase through, among other factors, greater resistance to pests and diseases. Changing conditions and the outbreak of new pests and diseases can cause significant losses in productivity, as occurred during the 1970–1971 southern corn leaf blight outbreak, which led to the loss of almost 710 million bushels of the US maize crop [34]. Such epidemics can turn into catastrophes, as in the Irish Potato Famine, which was caused by an outbreak of late blight and associated with the death or displacement of 25% of the Irish population [35].

The use of CWR to breed more resistant or resilient crop varieties can help to avert persistent crop failures stemming from a genetic deficiency. For example, one of the very few potato cultivars immune to the most recent and virulent strains of late blight is Sarpo Mira, whose durable, broad-spectrum resistance comes from genes that originated in the wild potato species *Solanum demissum* Lindl. [36]. The use of CWR can provide insurance value to agriculture by reducing the risks that climate change, droughts, pests and diseases pose to the genetically homogeneous monocultures of modern industrial agriculture. Although valuable traits can also be found in traditional crop varieties and other sources, in some cases the unique traits found in CWR genomes are essentially non-substitutable, as illustrated by the cytoplasmic male sterility derived from sunflower CWR [37].

CWR germplasm is a mostly renewable and non-rival resource in that it can be multiplied and shared at relatively low cost (for most crops), though it is possible to exclude others from using it. As with other forms of plant genetic resources, CWR thus constitute an imperfect public good with both public and private characteristics. Importantly, the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA) [38], which came into force in 2004, requires various publicly held collections of CWR of a list of globally important crops to be shared freely, strengthening the public and non-excludable aspects of these resources. The ITPGRFA is a major international agreement that governs access to the CWR species of many agricultural, horticultural and forage crops. Parties are governed by the Treaty's Multilateral System of access and benefit sharing, which specifies that publicly held CWR materials included in Annex 1 of the ITPGRFA in countries that have ratified the Treaty are non-excludable. These materials must be made available to those who request them for the purpose of research and breeding, rendering these species a global public good whose benefits can be enjoyed by all. Payment to the Treaty's Benefit Sharing Fund is nonetheless required if the accessed materials are used to breed a commercialized variety that is not itself available without restriction to others for further research and breeding. Under this Multilateral System, plant breeders and other researchers around the world have access to CWR germplasm held in the gene banks of the CGIAR and other international collections, as well as in the national gene banks of countries that have ratified and implemented the Treaty (though a number of countries possessing significant CWR resources have not yet done so, including China, Israel and Mexico). Non-public collections can also be placed under the umbrella of the Treaty. CWR not under the purview of the Multilateral System of the ITPGRFA fall under the governance of the Nagoya Protocol on Access and Benefit Sharing under the Convention on Biological Diversity [39] and are subject to bilateral agreement negotiations.
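The access rules described above can be read as a simple decision procedure. The function below is a deliberately simplified, hypothetical sketch of that reading, with assumed parameter names. In practice, which regime applies also depends on national implementing legislation, on the status of international collections and on the terms attached to each collection, so it should not be treated as legal guidance.

```python
def governing_regime(annex1_crop: bool, provider_is_party: bool,
                     publicly_held: bool, voluntarily_included: bool = False) -> str:
    """Simplified reading of the text: which access regime covers a CWR accession.

    annex1_crop          -- the material belongs to a crop listed in Annex 1 of the ITPGRFA
    provider_is_party    -- the holding country has ratified the Treaty
    publicly_held        -- the accession is in a public collection
    voluntarily_included -- a non-public collection has been placed under the Treaty
    """
    in_multilateral_system = (
        annex1_crop and provider_is_party and (publicly_held or voluntarily_included)
    )
    if in_multilateral_system:
        return "ITPGRFA Multilateral System"
    # Everything else is negotiated case by case under the Nagoya Protocol.
    return "Nagoya Protocol (bilateral agreement)"


print(governing_regime(annex1_crop=True, provider_is_party=True, publicly_held=True))
print(governing_regime(annex1_crop=True, provider_is_party=False, publicly_held=True))
```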

Land containing CWR populations maintains two important ecosystem services related to the production of the ecosystem good of CWR germplasm. First, the CWR habitat serves as a natural repository for these resources. Second, it serves as an incubator for the creation of novel allelic variation and genotypes, some of which may be useful to breeders. When the CWR habitat is destroyed, both of these services are lost. CWR populations often contain higher allelic diversity than their cultivated counterparts, owing to genetic bottlenecks in crops and continued evolution in the wild, as has been demonstrated for soybean. A range of CWR populations maintained across diverse topographies with different microclimates can both provide a valuable reservoir of genetic diversity and incubate new traits and alleles of value [40]. Conversely, changes in the CWR habitat can destroy these resources.

In situ genetic reserves are thus an important conservation strategy, complementary to gene banks and other ex situ repositories (and vice versa), and systems employing both in situ and ex situ conservation are generally considered the most robust in terms of successful long-term conservation [41,42]. A CWR population conserved in situ typically contains more genetic diversity than a single accession collected from it and stored in a gene bank, which is only a subset meant to represent that population. The in situ population thus provides a storage service associated with the conservation of genetic diversity. The incubator service refers to the maintenance of evolutionary relationships and processes in populations preserved in situ. Continued evolution of CWR populations has value for adapting crops to changes in pest and disease pressures [43]. As ecosystems change, for example through climatic shifts and the subsequent migration of species, populations of CWR may adapt, and these adaptations may benefit future breeding efforts. For example, CWR living in dry ecosystems that become even drier due to climate change may respond to selection for increased drought tolerance and become particularly useful for improving this trait in crops, even on short time scales [44]. It is important to note that CWR occur not only in wild ecosystems but also in disturbed and human-managed environments, including in and around agricultural fields. While farmers, wild-harvesters and others may consider these populations locally useful resources, they may also regard them as agricultural nuisances. Wild pumpkins (*Cucurbita* L.) are one example: populations close to agricultural fields in the United States and Mexico may be purposely extirpated because gene flow from them can introduce bitterness into the fruit of cucurbit crops [45].

*Ex situ* conservation makes it possible to conserve a great diversity of samples in a small area and is thus indispensable for maintaining CWR genetic resources and facilitating their use for crop improvement. However, each sample in a gene bank represents only a single snapshot in time of a limited portion of the genetic diversity of the wild population. Coupling the in situ conservation of a CWR population with ex situ protection thus allows the wild population to continue to evolve while its diversity remains readily available for agricultural research.

#### **3. The Rationale for Payment for Ecosystem Services for CWR Conservation**

Crop wild relatives have demonstrated economic and cultural value [46] and potentially even greater future value, but major gaps have been recognized in their conservation, both in situ and ex situ. Despite this demonstrated economic value and the threatened status of many CWR species, with over 70% of CWR taxa identified as high priority for further collecting [47], strong institutional frameworks to support the conservation of CWR do not exist. In the absence of such frameworks, the owners of land serving as habitat for CWR populations are not compensated for conserving CWR and thus have no incentive to do so. In fact, they may well have various incentives to convert CWR habitat into farmland or develop it for other purposes. Indeed, most past CWR conservation projects have not been implemented by private actors (individuals or firms). Rather, the financing for most major initiatives to conserve CWR in situ to date has come from internationally funded projects (e.g., by the Global Environment Facility).

A key issue associated with the protection of CWR populations in situ is that many of the agricultural benefits arising from CWR conservation are separated, both temporally and geographically, from those who are able to provide that conservation. For example, a CWR accession might remain in a gene bank for several years or even decades before it is selected for breeding, after which it typically takes ten years for the effort to result in a released variety that provides economic benefits.
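Standard discounting makes it concrete why this separation weakens incentives. At discount rate r, a benefit B received t years from now is worth B / (1 + r)^t today. The sketch below assumes a hypothetical 5% annual rate and a hypothetical 20 years in storage before breeding starts. Only the roughly ten-year breeding period comes from the text.

```python
def present_value(benefit: float, annual_rate: float, years: float) -> float:
    """Value today of a benefit received `years` from now, discounted at `annual_rate`."""
    return benefit / (1.0 + annual_rate) ** years


YEARS_IN_GENEBANK = 20  # hypothetical; the text says "several years or even decades"
YEARS_TO_VARIETY = 10   # breeding period to a released variety, per the text
RATE = 0.05             # hypothetical annual discount rate

scenarios = [
    ("breeding only", YEARS_TO_VARIETY),
    ("gene bank storage + breeding", YEARS_IN_GENEBANK + YEARS_TO_VARIETY),
]
for label, years in scenarios:
    share = present_value(1.0, RATE, years)
    print(f"{label}: {years} years -> {share:.0%} of the benefit's face value today")
```

Under these assumptions, a benefit that arrives after storage and breeding is worth well under half as much today as one that arrives after breeding alone. That gap helps explain why distant beneficiaries tend to underinvest in conservation now.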

In this paper, we explore payment for ecosystem services (PES) instruments as incentive mechanisms to compensate the provision of CWR conservation services. Such market instruments have the potential to help bridge the gap between those willing to pay for CWR conservation and those able to conserve CWR, and to create stronger institutional frameworks for CWR conservation.

#### **4. Payment for Ecosystem Services Mechanisms for CWR Conservation**

Adapting payment for ecosystem services (PES) to the CWR context could incentivize the provision of CWR conservation services, bridging the gap between the beneficiaries and providers of these services. The core idea of PES is that "external (ecosystem service) beneficiaries make direct, contractual and conditional payments to local landholders and users in return for adopting practices that secure ecosystem conservation and restoration" [48].

Although payment for ecosystem services has been hailed as arguably "the most promising innovation in conservation since [the enactment of the Convention on Biological Diversity at] Rio 1992" [48], the mechanism has not been widely applied to agrobiodiversity conservation, particularly wild agrobiodiversity [49]. Noting the lack of research in this area, Narloch et al. [49] also proposed "payments for agrobiodiversity conservation services" (PACS) as a PES-like solution to the loss of landraces and other local crop varieties. They define PACS as an "economic instrument to tackle market, intervention and global appropriation failures associated with the public good characteristics of agrobiodiversity conservation services through the use of (monetary or in kind) reward mechanisms in order to increase the private benefits from local (plant and animal genetic resources)".

Though PACS is designed to incentivize the on-farm conservation of under-utilized and endangered crop varieties, PES instruments are also promising for mitigating the market failure associated with the under-conservation of CWR, since they increase the private benefits of conserving CWR, and they could fund both in situ and ex situ conservation activities.

#### *4.1. Who Will Pay for CWR Conservation?*

Large-scale CWR conservation efforts are slowly increasing at the international level and in various countries, as seen in the examples of the Sierra de Manantlan Biosphere Reserve in Mexico, the Ammiad Project in Israel, the Erebuni Reserve in Armenia and the Global Environment Facility project "In situ conservation of crop wild relatives through enhanced information management and field application", which developed national CWR conservation strategies in Armenia, Bolivia, Madagascar, Sri Lanka and Uzbekistan [7]. Given the value of CWR for adapting agriculture to climate change, the focus of a major CWR-related project over the past decade [50], PES mechanisms for a complementary system of in situ and ex situ CWR conservation may also be attractive investment targets for the proposed USD 100 billion Green Climate Fund, which envisions significant investment in climate change adaptation and mitigation measures [51].

There may also be under-explored demand for CWR conservation in the private sector. For example, the drug company Merck & Co. paid an upfront fee of USD 1 million to Costa Rica's National Institute of Biodiversity to help conserve rainforest biodiversity in exchange for the rights to use samples of the plants, insects and microorganisms collected through this program to create new pharmaceutical products [52]. Investing in payments for CWR conservation services could provide agricultural sector companies with germplasm that might be difficult to access otherwise. A PES for CWR conservation program funded by a private agricultural firm could be seen as an investment in the long-term sustainability of the industry and could also be an opportunity for green marketing and corporate social responsibility programs [53,54]. Given that firms are under increasing pressure from stakeholders to reduce their impacts on ecosystems and biodiversity [55], companies that invest in PES for CWR conservation could gain a competitive advantage by advertising their activities through sustainability labelling programs.

Individual governments, companies or organizations may fund PES programs for CWR conservation unilaterally, or they could contribute to a CWR conservation fund that aims to conserve the wild genetic resources of a given crop gene pool or set of crops, providing streamlined access to contributors. The creation of CWR endowments could also aid the sustainability and permanence of such programs. The interest from such endowments would be used to pay those safeguarding the plants at regular intervals, contingent on the persistence of the CWR populations or on the maintenance of CWR within a protected area. Such programs could be arranged through existing access and benefit sharing agreements, either bilateral (Nagoya Protocol) or multilateral (ITPGRFA), and could contribute new resources to existing funds, such as the ITPGRFA's Benefit Sharing Fund.
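The endowment idea rests on simple arithmetic. A fund paying out at rate r each year can sustain an annual payment P indefinitely only if its principal is at least P / r. The sketch below pairs that sizing rule with the persistence condition described above. The 4% payout rate and the USD 50,000 annual payment are hypothetical. The sketch also ignores inflation, management fees and investment risk.

```python
def endowment_required(annual_payment_usd: float, payout_rate: float) -> float:
    """Principal whose yearly return at `payout_rate` covers `annual_payment_usd`."""
    if payout_rate <= 0:
        raise ValueError("payout_rate must be positive")
    return annual_payment_usd / payout_rate


def annual_disbursement(principal: float, payout_rate: float,
                        population_persists: bool) -> float:
    """Pay out the year's return only if the monitored CWR population persists.

    In this sketch, withheld returns are simply not disbursed; a real fund might
    reinvest them or redirect them to other populations.
    """
    return principal * payout_rate if population_persists else 0.0


principal = endowment_required(50_000, 0.04)  # hypothetical figures
print(f"principal needed: USD {principal:,.0f}")
print(annual_disbursement(principal, 0.04, population_persists=True))
print(annual_disbursement(principal, 0.04, population_persists=False))
```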

A key challenge facing PES programs for CWR is that the benefits of CWR conservation are typically distant, both spatially and temporally, from the conservation activities (with the exception of farmers who manage CWR on-farm). In addition, because CWR and the plant genetic resources held in the Multilateral System as a whole act as an important global public good, the benefits of these conservation activities are highly diffuse. Thus, the majority of funds made available for CWR conservation through PES would likely come either from international institutions (such as the Benefit Sharing Fund of the ITPGRFA, the GEF or the Green Climate Fund) or from national governments.

#### *4.2. Who Will Provide CWR Conservation Services?*

Crop wild relative conservation differs from the protection of cultivated crop diversity in that CWR taxa are wild species that generally do not require farmers for their persistence. That said, many CWR taxa can be weedy and can benefit from human disturbance, for example on roadsides or at the edges of agricultural fields, as with the maize CWR managed within the Sierra de Manantlan Biosphere Reserve. These taxa thus constitute a particular class of wild and weedy biodiversity that calls for specific forms of conservation.

While PES for CWR conservation will in some ways require different designs from past PES programs, many similarities exist, and lessons can be drawn from past experience with PES and, more specifically, PACS. Depending on the taxa, PES for in situ or on-farm CWR conservation might target farmers (as in Zambia for wild millet, cowpea and sorghum taxa [24]), private landowners, forest managers or conservationists working in protected areas. In some circumstances, successful CWR conservation will require collaboration between communities that do not typically work together (e.g., conservationists and ecologists on the one hand and agricultural scientists and crop breeders on the other). PES offers a tool for bridging such gaps, similar to the REDD+ (Reducing Emissions from Deforestation and Forest Degradation in Developing Countries) program, which has brought together climate scientists and those working in forestry management and biodiversity conservation.

#### *4.3. Towards Designing a CWR PES Conservation Portfolio*

In this section, we describe a portfolio of CWR conservation actions that fit into the PES framework. These programs could also be coupled with existing PES schemes by "bundling" CWR with other ecosystem services such as carbon fixation (e.g., REDD+) or water purification [56]. Though this portfolio focuses on in situ conservation, gene banks and other ex situ repositories are important to long-term CWR conservation and accessibility for use. PES programs for in situ CWR conservation would therefore ideally be integrated with the work of the ex situ conservation community. A PES scheme for CWR conservation could take many forms, including:

- **Within preserves.** CWR can be conserved through the creation of new preserves, the addition of new land containing CWR populations to existing reserves or the provision of new funding to the budgets of existing reserves to support CWR conservation programs.

- **On farms or in other highly human-managed environments.** Property owners could be compensated for protecting CWR populations on their lands. One way to implement this would be to include CWR within agri-environmental payment schemes or other existing PES programs such as REDD+. Many countries have agri-environmental payment programs designed to preserve the provision of environmental public goods such as biodiversity, cultural heritage and scenery, and these programs theoretically have the potential to correct the market failures associated with these goods [57]. Such programs could, for example, compensate farmers for conserving CWR in field margins [24] or for setting aside larger portions of their cropland as non-agricultural conservation land. Owners and managers of roadsides could also be compensated for protecting CWR in their mowing and herbicide use practices.

- **Use in landscaping, forage programs and plantations.** PES could be used to fund the use of CWR in landscaping, forage programs and plantations of edible and medicinal species. These three strategies could be particularly useful for expanding the distributions of threatened CWR and may enable long-term conservation without further funding if the target species becomes sufficiently popular.

- **Use in restoration projects.** CWR could be prioritized or subsidized for use in restoration activities in their native ranges through PES funding mechanisms.

The creation of new protected areas for CWR conservation may emphasize sites that contain several important CWR populations, following optimal reserve design principles [58,59]. The Sierra de Manantlan Biosphere Reserve exemplifies this, with a core area preserving CWR habitat surrounded by buffer and transition zones containing settlements. Sites need not always be very large, as shown by the plant micro-reserve initiative pioneered in Valencia, Spain [60], and by the Ammiad Project in Israel, a one-hectare site that has nonetheless succeeded thus far in conserving an important population of wild wheat [31]. Creating new preserves may be expensive but is sometimes necessary, and it is feasible in areas where CWR populations overlap with high levels of other ecosystem services and with species that are targeted for conservation.
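When candidate sites each hold different CWR taxa, greedy complementarity is one simple heuristic from the reserve-selection literature for choosing among them. The procedure repeatedly adds the site that contributes the most taxa not yet protected. The sketch below illustrates the heuristic with hypothetical sites and taxa. It is not necessarily the method used in [58,59].

```python
def greedy_reserve_selection(site_taxa: dict[str, set[str]], max_sites: int) -> list[str]:
    """Repeatedly pick the site that adds the most CWR taxa not yet covered."""
    chosen: list[str] = []
    covered: set[str] = set()
    candidates = dict(site_taxa)
    while candidates and len(chosen) < max_sites:
        # Ties are broken by insertion order of the candidate sites.
        best = max(candidates, key=lambda site: len(candidates[site] - covered))
        gain = candidates[best] - covered
        if not gain:  # no remaining site adds a new taxon
            break
        chosen.append(best)
        covered |= gain
        del candidates[best]
    return chosen


# Hypothetical sites and the CWR taxa recorded at each.
sites = {
    "site_1": {"wild_wheat", "wild_barley", "wild_lentil"},
    "site_2": {"wild_wheat", "wild_barley"},
    "site_3": {"wild_chickpea", "wild_pea"},
    "site_4": {"wild_lentil", "wild_pea"},
}
print(greedy_reserve_selection(sites, max_sites=3))
```

Greedy selection is fast and transparent but not guaranteed to be optimal. Site costs, population viability and connectivity would also matter in a real design.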

Contracting with farmers and other local people, on the other hand, could help decrease the costs of conserving a CWR population by eliminating the need to purchase the land and turn it into a protected area. It would also allow payments to be transferred if the CWR population shifts due to climate change, and it would enable those administering the fund to redirect payments to other populations if the original one goes extinct. For example, research conducted in Zambia to determine how much farmers would need to be paid to conserve CWR in field margins estimated conservation costs at USD 23–91 per hectare per year [24]. Payment could be monetary or in kind. CWR conservation may also be compatible with agroforestry activities such as shade-grown coffee or cacao, allowing local people to continue economic activities in the area. Farmers who take part in these programs may also be able to command a premium for their products through eco-labelling, gaining multiple benefits from their involvement. As CWR are present in a wide range of habitats, this type of PES scheme for in situ conservation could also include the management of roadside CWR populations, sustainable management plans for harvested populations of edible CWR and the conservation of CWR populations present in agricultural landscapes by farmers. In any such strategy, payments should be contingent on the quality of the conservation effort (generally, whether the population persists, whether it increases or decreases in size, etc.) to incentivize effective conservation of the CWR populations. Although it may prove cheaper than the creation of new CWR preserves, this strategy may be difficult to design and implement successfully due to complications with land tenure, questions about whom to pay and how to structure payments, and local culture. The long-term sustainability of the strategy will most likely depend on whether payments continue.
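Making payments contingent on conservation outcomes can be expressed as a simple payment rule. The sketch below uses the per-hectare cost range reported for Zambia [24] as example rates. The plant-count monitoring, the half payment for declining populations and the 10-hectare field size are hypothetical design choices for illustration, not features of any existing scheme.

```python
def contract_payment(hectares: float, usd_per_ha: float,
                     plants_last_year: int, plants_this_year: int) -> float:
    """Hypothetical outcome-based annual payment for conserving a CWR population."""
    if plants_this_year == 0:
        return 0.0                     # population lost: no payment
    full_payment = hectares * usd_per_ha
    if plants_this_year >= plants_last_year:
        return full_payment            # stable or growing: full payment
    return 0.5 * full_payment          # declining: reduced payment (assumed 50%)


# Full-payment range for a hypothetical 10 ha of field margins at USD 23-91/ha/year.
low = contract_payment(10, 23, 100, 100)
high = contract_payment(10, 91, 100, 100)
print(f"full payment range: USD {low:.0f}-{high:.0f} per year")
print(contract_payment(10, 91, 120, 80))  # declining population
print(contract_payment(10, 91, 120, 0))   # population lost
```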

#### *4.4. Prioritizing CWR for Conservation*

Given limited budgets, it is important to prioritize the most important resources to conserve. For CWR, the potential PES practitioner must prioritize among crop gene pools, individual species and intraspecific populations. Crop gene pools may be prioritized by several factors. These include the economic value of the related crop, its value for food security or other cultural values, and its contributions to development and poverty alleviation. Another tool that can be used for prioritization is the so-called Weitzman approach. It uses diversity, risk status and conservation cost indices to construct a priority portfolio of conservation targets that maximizes the diversity that can be conserved with any given amount of funding [61,62]. Geographic prioritization of CWR for conservation is possible because their native distributions are known to be concentrated in primary regions of diversity around the world. Previous efforts to map the ranges of over 1000 CWR related to 81 globally important crops identified areas that are particularly CWR rich. These include the Mediterranean, Near East and southern Europe, South America, Southeast and East Asia and Mesoamerica. Turkey is one of the top global hotspots for CWR diversity, with up to 84 taxa potentially overlapping in a single 25 km<sup>2</sup> area. An online tool has been developed that allows conservation practitioners to identify where CWR taxon richness is highest [63]. A related gap analysis methodology has combined conservation and native distribution data on CWR to map under-collected (ex situ) and under-protected (in situ) areas, identifying priority populations for conservation [64,65].
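The sketch below shows how a Weitzman-style ranking might be computed. It assumes the simplified criterion (D + U) × ΔP / C from Weitzman's "Noah's Ark" formulation as we understand it:

- D is a target's distinctiveness.
- U is its direct utility.
- ΔP is the gain in survival probability from conservation.
- C is the cost.

The formulations in [61,62] may differ, and all population names and values are hypothetical. Funding targets greedily by score is a heuristic and does not guarantee the optimal portfolio for a fixed budget.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """A candidate CWR population or species (all values hypothetical)."""
    name: str
    distinctiveness: float  # D: diversity lost if the target disappears
    utility: float          # U: direct value, in the same units as D
    delta_survival: float   # dP: gain in survival probability from conservation
    cost: float             # C: cost of the conservation action


def weitzman_score(t: Target) -> float:
    """Expected diversity-plus-utility gained per unit of cost: (D + U) * dP / C."""
    return (t.distinctiveness + t.utility) * t.delta_survival / t.cost


def prioritize(targets: list[Target], budget: float) -> tuple[list[str], float]:
    """Fund targets in descending score order while the budget allows (greedy heuristic)."""
    chosen, spent = [], 0.0
    for t in sorted(targets, key=weitzman_score, reverse=True):
        if spent + t.cost <= budget:
            chosen.append(t.name)
            spent += t.cost
    return chosen, spent


candidates = [
    Target("pop_A", 0.8, 0.2, 0.3, 10.0),
    Target("pop_B", 0.5, 0.5, 0.6, 15.0),
    Target("pop_C", 0.9, 0.0, 0.1, 5.0),
    Target("pop_D", 0.3, 0.1, 0.5, 4.0),
]
print(prioritize(candidates, budget=25.0))  # (['pop_D', 'pop_B', 'pop_C'], 24.0)
```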

Another important consideration is the potential usefulness of CWR species and populations to crop improvement efforts, which has been described for the CWR of many crops [66]. Consulting breeders of the crop of interest about which CWR species and populations interest them most may provide additional context for selecting conservation targets. Specific trait values are often difficult to measure in CWR [2]. A maximum diversity approach might therefore be adopted instead, selecting CWR on the basis of genetic diversity and genetic uniqueness.
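One way to implement a maximum diversity approach is a greedy max-min selection over a matrix of pairwise genetic distances. This is an assumed illustration, not the procedure of [2]. The selection begins with the population that has the highest mean distance to all others, i.e., the most "unique" one. It then repeatedly adds the population whose nearest already-selected neighbour is farthest away. Population names and distances are hypothetical.

```python
def max_diversity_selection(names, dist, k):
    """Greedy max-min selection from a symmetric genetic distance matrix.

    Starts from the most distinct population (highest mean distance to the
    others), then repeatedly adds the population farthest from its nearest
    already-selected neighbour.
    """
    n = len(names)
    if k <= 0:
        return []
    if n <= 1:
        return list(names)
    first = max(range(n), key=lambda i: sum(dist[i]) / (n - 1))
    selected = [first]
    while len(selected) < min(k, n):
        remaining = [i for i in range(n) if i not in selected]
        nxt = max(remaining, key=lambda i: min(dist[i][j] for j in selected))
        selected.append(nxt)
    return [names[i] for i in selected]


names = ["P1", "P2", "P3", "P4"]
dist = [
    [0.0, 0.1, 0.6, 0.5],
    [0.1, 0.0, 0.55, 0.45],
    [0.6, 0.55, 0.0, 0.3],
    [0.5, 0.45, 0.3, 0.0],
]
print(max_diversity_selection(names, dist, 3))  # ['P3', 'P1', 'P4']
```

P2 is left out because it is nearly identical to P1. A conservation budget spent on both would add little diversity.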

#### **5. Assessing the Effectiveness of PES for CWR**

The success of PES mechanisms designed to conserve CWR-related ecosystem services may be assessed according to three main criteria: ecological effectiveness, economic efficiency and social equity [47,67]. This section discusses strategies for maximizing these aspects of PES programs for CWR conservation.

#### *5.1. Ecological Effectiveness*

The ecological effectiveness of PES schemes for CWR conservation refers to how efficiently and sustainably programs contribute to preserving the targeted CWR genetic diversity in the medium term. A major concern of any PES scheme for the in situ conservation of CWR is how effective its payments are in preserving the ecological functions of CWR habitats and the evolutionary potential of the CWR populations. Practitioners designing a PES program for CWR conservation should assess whether the CWR population itself can be expected to survive indefinitely. In other words, they should estimate the risk that the population will go extinct, which would eliminate the benefits the payments for ecosystem services were meant to achieve. If the population is likely to go extinct in the medium term even with conservation action, it may be a priority for collecting for ex situ conservation. However, it may not be a wise choice for a PES program for in situ CWR protection. Methods from plant conservation biology can be used to set targets for populations and to measure their progress. Studies have also predicted the effects of climate change on the ranges of specific CWR species. Their methodologies could be adapted to determine which potential reserve areas are most likely to remain suitable habitat for CWR species in future climates [10,13,68]. Last, the management of in situ CWR populations conserved through the PES mechanism should include measures to reduce the risks posed by development, livestock, crop introgression and other threats. Genetic erosion may also occur through selection for desired traits in CWR that are used in landscaping, plantations and forage programs. Care should be taken to prevent excessive narrowing of the conserved genetic diversity.
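Extinction risk can be estimated with a count-based population viability analysis (PVA), one of the simpler methods from plant conservation biology. The sketch below assumes that annual log growth rates are independent and normally distributed. It estimates by simulation the probability that a population falls below a quasi-extinction threshold within a planning horizon. The parameter values are hypothetical. In practice, `mu` and `sigma` would be estimated from repeated census counts of the target population.

```python
import math
import random


def quasi_extinction_risk(n0, mu, sigma, years, threshold, n_runs=10_000, seed=1):
    """Monte Carlo count-based PVA: N(t+1) = N(t) * exp(r), r ~ Normal(mu, sigma).

    Returns the fraction of simulated trajectories that fall below the
    quasi-extinction threshold at least once within the time horizon.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        n = float(n0)
        for _ in range(years):
            n *= math.exp(rng.gauss(mu, sigma))
            if n < threshold:
                hits += 1
                break
    return hits / n_runs


# Hypothetical populations of 100 plants, 20-year horizon, threshold of 20 plants.
declining = quasi_extinction_risk(100, mu=-0.05, sigma=0.3, years=20, threshold=20)
growing = quasi_extinction_risk(100, mu=0.05, sigma=0.3, years=20, threshold=20)
print(declining, growing)
```

A population whose estimated risk stays high even under optimistic growth assumptions would, by the reasoning above, be a candidate for ex situ collection rather than for PES-funded in situ protection.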

#### *5.2. Economic Efficiency*

The term economic efficiency in the context of PES for CWR conservation refers to the use of project resources such that the net economic benefit resulting from the PES program is maximized. At a larger scale, economic efficiency would imply that the limited funds available for CWR conservation globally are spent in ways that maximize the economic benefits flowing from these projects (through crop improvement and other uses). It is difficult to predict which CWR populations will end up being most useful. Nevertheless, the prioritization techniques discussed in Section 4.4 may help maximize the value of the genetic material conserved. The cost effectiveness of PES schemes for CWR conservation can also be enhanced by preserving the most unique populations and/or overlapping populations of many taxa of interest. Where multiple habitats contain similar amounts of CWR genetic diversity, costs can be cut by selecting the habitat that is cheapest to conserve. The same holds for the other elements of PES programs, as long as the less expensive providers deliver a service of equally high quality. In a conservation auction, potential service providers reveal their cost structures through bidding. This may help drive down conservation costs by identifying the lowest-cost project partners [49]. Furthermore, complementary funding is essential for increasing the economic benefits flowing from CWR conservation. This includes funding for the characterization and evaluation of CWR germplasm and for its use in pre-breeding activities.
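A conservation auction can be sketched as a sealed-bid procurement. Each landholder states an annual price, and the program assigns each offer a benefit score, for example the number of CWR taxa or populations covered. The version below is an assumption, not the design used in [49], and all bidders and figures are hypothetical. It works as follows:

- Bids are ranked by price per unit of benefit.
- Each winner is paid its own bid.
- Selection stops at a reserve price per unit of benefit.
- Bids that would exceed the budget are skipped.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    bidder: str
    price: float    # annual payment requested by the landholder (USD)
    benefit: float  # conservation benefit score assigned by the program


def run_auction(bids, budget, max_price_per_benefit):
    """Select bids by ascending price per unit of benefit, paying each winner its bid."""
    winners, spent = [], 0.0
    for bid in sorted(bids, key=lambda b: b.price / b.benefit):
        if bid.price / bid.benefit > max_price_per_benefit:
            break  # every remaining bid is even less cost-effective
        if spent + bid.price <= budget:
            winners.append(bid.bidder)
            spent += bid.price
    return winners, spent


bids = [
    Bid("farm_A", 120.0, 3.0),
    Bid("farm_B", 60.0, 1.0),
    Bid("farm_C", 200.0, 8.0),
    Bid("farm_D", 90.0, 1.0),
]
print(run_auction(bids, budget=400.0, max_price_per_benefit=70.0))
# (['farm_C', 'farm_A', 'farm_B'], 380.0)
```

Ranking on price per unit of benefit, rather than on price alone, keeps a cheap but low-value offer from displacing a more expensive site that covers many CWR taxa.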

#### *5.3. Social Equity*

Finally, social equity should be a key consideration for the design of PES schemes for CWR conservation. Though many authors have emphasized economic efficiency as the primary goal of PES schemes, incorporating social equity concerns is important to the success of PES mechanisms designed for CWR conservation so as to avoid the so-called "PES curse" of negative social impacts [47,69]. There may be more overlap between social equity concerns and the ecological effectiveness and economic efficiency aspects of PES than previously considered, at least for CWR.

The history of CWR conservation contains several examples of conservation programs that were designed to be both ecologically effective and socially equitable. In addition, even though CWR are not cultivated species, on-farm conservation techniques are still essential for the successful conservation of many CWR taxa. Recent research by Fagandini Ruiz et al. [70] on quinoa wild relatives around Lake Titicaca in southern Peru illustrates this. They found six quinoa CWR both on permanent native meadows and on cultivated land with fallow cycles and plot borders. Other examples of how farmers and local communities have been included in CWR conservation efforts are numerous. For example, the Potato Park in Peru, or Parque de la Papa, is a biocultural heritage area that preserves wild relatives and landrace varieties of potato as well as other Andean crops like quinoa and oca. The park is maintained by six local Quechua communities. Its management uses customary laws and institutions to aid in the conservation and sustainable use of the area's natural resources [71]. The Global Environment Facility's CWR Project developed a management plan for wild yams in Madagascar. The plan allows sustainable harvest and management of these CWR instead of simply cutting off access to plants that locals had harvested, eaten and sold for centuries. The Sierra de Manantlan Biosphere Reserve in Mexico focuses on people as an integral part of the ecosystem. It combines the in situ conservation of maize wild relatives and landraces with the development of local agrarian communities, ecotourism and sustainable forest management. The disease-resistant maize wild relative *Zea diploperennis* (Iltis et al.) is preserved under strict protection within the core zones of the reserve. So are *Zea perennis* (Hitchc. Reeves and Mangels) and *Zea mays* ssp. *parviglumis* (H.H. Iltis and Doebley). Yet around 40,000 people live in the buffer zone [72]. A project funded by the ITPGRFA's Benefit Sharing Fund is helping to train local farmers and their families in the conservation of a maize wild relative in Nicaragua's Apacunca Genetic Reserve and the surrounding area. The training is part of a wider package of development activities. The project seeks to involve communities in the recovery, conservation and use of teosinte (a maize CWR) while ensuring that they also receive some benefit. Thus, local communities have played a central role in many past CWR conservation projects.

Projects inclusive of social equity considerations tend to be internally originated and driven, owned by the community, fully backed by local practice and culture and strongly supported by other stakeholders [7]. Those designing PES schemes for CWR conservation should recognize the synergies between social equity and the ecological effectiveness and economic efficiency of CWR preservation mechanisms. At the same time, they should acknowledge the potential tradeoffs between these goals. It might not always be possible to use these strategies, but social equity should at least be taken into account during PES mechanism design. Participatory approaches give CWR conservation opportunities to bring added benefits through the social and economic empowerment of often-marginalized groups. They do so by sharing the benefits of the program with those who live nearby. Local people may, for example, be involved in planting CWR species, in maintaining plantations of edible CWR, and in training and job creation in ecotourism activities centered on CWR. Such approaches may also help to tap into local ethnobotanical knowledge by engaging local parabotanists. These parabotanists may be better suited to identify, manage and educate others about the CWR resources of a particular area. For example, local rural communities in the Sierra de Manantlan were found to have detailed knowledge of the utility of the area's flora, using more than 500 of the plant species present there [73].

#### **6. Conclusions**

Payment for ecosystem services (PES) has been shown to be an effective mechanism for mitigating market failures associated with the provision of ecosystem services such as water filtration, carbon fixation and a number of other economically and culturally valuable functions provided by the natural world. In this article, we argue that PES may offer a useful tool for ensuring the conservation of priority, at-risk populations of CWR of important crops and may assist CWR conservation efforts at a local, national, regional and global scale.

Currently, adequate institutional frameworks to support the conservation of CWR worldwide do not exist, in part due to insufficient incentives for providing CWR conservation services. Payment for ecosystem services could be a promising tool for solving this problem by directly linking payments from public and private beneficiaries of CWR conservation services to their suppliers, bridging substantial spatial and temporal gaps. PES in particular offers a flexible mechanism for advancing CWR conservation that may prove successful in a broad range of situations and scenarios in developing and developed countries alike.

The loss of CWR genetic diversity results in the irreversible destruction of resources of significant importance to the sustainability and resilience of future agriculture, to local food and cultural security and to the provision of local ecosystem services. To ensure that this diversity is present and available when needed, it is necessary that investments be made in the conservation of CWR. The payment for ecosystem services (PES) mechanism has the potential to aid in this goal by providing economic incentives for the maintenance of CWR resources.

**Author Contributions:** Conceptualization, N.T. and H.D.; investigation, N.T., H.D. and C.K.K.; writing—original draft preparation, N.T.; writing—review and editing, N.T., H.D. and C.K.K.; funding acquisition, N.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** The initial research resulting in this article was conducted in 2011–2012 with financial support from the Fulbright Program.

**Acknowledgments:** Many thanks to Luigi Guarino, Ehsan Dulloo, Bo Cutter, Devra Jarvis, Adam Drucker, Nanete Neves, Peter Tyack, Lisen Runsten, Vijay Kolinjivadi and Chelsea Smith for providing comments on previous draft versions. Thanks also to the Fulbright Program for funding and to the Global Crop Diversity Trust for hosting this research.

**Conflicts of Interest:** The authors declare no conflict of interest and the funders had no role in the conceptualization or writing of the manuscript.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **SNP Markers and Evaluation of Duplicate Holdings of** *Brassica oleracea* **in Two European Genebanks**

### **Anna E. Palmé 1, Jenny Hagenblad 2, Svein Øivind Solberg 3,\*, Karolina Aloisi <sup>1</sup> and Anna Artemyeva <sup>4</sup>**


Received: 17 June 2020; Accepted: 20 July 2020; Published: 22 July 2020

**Abstract:** Around the world, there are more than 1500 genebanks storing plant genetic resources to be used in breeding and research. Such resources are essential for future food security, but many genebanks experience backlogs in their conservation work, often combined with limited budgets. Therefore, avoiding duplicate holdings is on the agenda. A process of coordination has started, aiming at sharing the responsibility of maintaining the unique accessions while allowing access according to the international treaty for plant genetic resources. Identifying duplicate holdings based on passport data has been one component of this. In the past, and especially in vegetables, different selections within the same varieties were common and the naming practices of cultivars/selections were flexible. Here, we examined 10 accession pairs/groups of cabbage (*Brassica oleracea* var. *capitata*) with similar names maintained in the Russian and Nordic genebanks. The accessions were analyzed for 11 morphological traits and with a SNP (Single Nucleotide Polymorphism) array developed for *B. napus*. Both proved to be useful tools for understanding the genetic structure among the accessions and for identifying duplicates, and a subset of 500 SNP markers is suggested for future *Brassica oleracea* genetic characterization. Within five out of 10 pairs/groups, we detected clear genetic differences among the accessions, and three of these were confirmed by significant differences in one or several morphological traits. In one case, a white cabbage and a red cabbage had similar accession names. The study highlights the need for caution when identifying duplicate accessions based solely on the name, especially for older varieties of cross-pollinated species such as cabbage.

**Keywords:** *Brassica oleracea*; conservation; diversity; genebank; plant genetic resources; SNP

#### **1. Introduction**

A report from the Food and Agriculture Organization of the United Nations (FAO) indicates that up to 70% of the 7.4 million accessions around the world might be duplicate holdings [1]. At the same time, genebanks are struggling with inadequate resources and backlogs in regeneration, characterization, and documentation [2]. From a bird's-eye view, duplication is not an efficient conservation approach. At a local level, however, each collection holder aims to have a large and influential collection, and requesting and maintaining accessions from other genebanks (duplication) has been one way to achieve this. A European Genebank Integrated System (AEGIS) has managed to involve institutions in more than 30 countries in an action for coordination and collaboration on plant genetic resource conservation [3]. The main idea is to share responsibility by establishing and operating a European Collection of unique and important germplasm, and to increase conservation efficiency and quality while facilitating the use of the genetic resources [4].

A critical step in this initiative has been the selection of accessions (generally seeds, conserved in genebanks). There are several challenges in such selections, but one is how to handle accessions with the same or similar names [5]. Same or similar names could be due to the duplication of accessions among genebanks, but can also indicate different selections (enterprises' selections) and/or a flexible naming practice of seed material in the past [6,7]. Official variety lists (cultivar lists) and control came in the mid-twentieth century [8,9], and plant breeders' rights came after the ratification of the International Union for the Protection of New Varieties of Plants, which was launched in 1969 [10]. In 1969, local companies' selections were still listed under similar names in Scandinavia, but almost all of them were removed from the national variety lists between 1970 and 1980 [11]. All this has resulted in a large number of older varieties with the same or similar names. An important question is whether accessions of such varieties should be regarded as duplicates when efforts are made to increase the efficiency in genebank conservation.

Recent developments such as using passport data with digital object identifiers (DOI) on accessions and transactions [12], large-scale morphological characterization and phenotyping [13], and molecular studies with next-generation sequencing platforms [14–18] have all improved the possibilities to identify duplicates. Regardless of the approach, proper data and transparent genebank information systems are needed to facilitate duplication assessments [19,20]. *Brassica oleracea* L. (2n = 18) comprises many important crops, including cauliflower, broccoli, and cabbages as well as wild species and subspecies which are cross-compatible with the cultivars [21,22]. The issue of duplicate genebank holdings of *B. oleracea* has been raised [23], and genetic diversity has been investigated using, for example, AFLP (Amplified Fragment Length Polymorphism) markers [24–26] and microsatellites [27]. In these studies, substantial diversity within accessions was observed, but so was a clear differentiation among accessions. A high-density SNP genotyping array for *Brassica napus* and its ancestral diploid species has been developed [28], and in this study we test it on *B. oleracea*.

The main objectives of this study were: (1) to examine the suitability of this Illumina Infinium SNP array to study the genetic diversity and structure in cabbage; (2) to evaluate potential duplicates among genebank accessions with similar names by using morphological traits and the mentioned genetic markers. If the SNP array could be used successfully, we wanted to identify a sub-set of SNP markers to be used for the future screenings of a larger number of accessions in a process of identifying duplicates and incorrectly labelled accessions at a European or global scale.

#### **2. Results**

The *Brassica* Working Group of the European Cooperative Programme for Plant Genetic Resources (ECPGR) has prioritized AEGIS, which means there is an ongoing process of searching for potential duplicates. Based on the cabbage passport data from the N. I. Vavilov Institute of Plant Genetic Resources in St. Petersburg (VIR) and the Nordic Genetic Resource Centre (NGB), we were able to identify 40 pairs, triplets, and groups (hereafter termed groups) based on the "accession name" or "donor name". Here, we present the results for 10 such groups, with a total of 27 accessions (Table 1).
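The search for potential duplicates in passport data can be approximated by normalizing accession names and grouping records that share a key. The sketch below is hypothetical and deliberately crude. It folds case, Nordic letters and diacritics, then groups on the first word of the name. As a result, "Amager Høj" and "Amager Hög" fall together. However, it also lumps all Amager types into one group and would never link "Grami" to them. The accession identifiers are invented for illustration. Name-based groups of this kind are only a starting point for genetic and morphological checks.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lower-case, fold Nordic letters and diacritics, drop punctuation."""
    name = name.lower().replace("ø", "o").replace("æ", "ae")  # not decomposed by NFKD
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    name = re.sub(r"[^a-z0-9]+", " ", name)
    return " ".join(name.split())


def candidate_groups(records):
    """Group (accession_id, genebank, name) records by the first word of the name.

    Returns only keys shared by more than one accession.
    """
    groups = defaultdict(list)
    for acc_id, genebank, name in records:
        tokens = normalize_name(name).split()
        if tokens:
            groups[tokens[0]].append((acc_id, genebank))
    return {key: members for key, members in groups.items() if len(members) > 1}


records = [  # hypothetical accession identifiers
    ("NGB-001", "NGB", "Amager Høj Grøn"),
    ("VIR-001", "VIR", "Amager Hög"),
    ("NGB-002", "NGB", "Stavanger Torv"),
    ("VIR-002", "VIR", "Stavanger Torg"),
    ("NGB-003", "NGB", "Jåtunsalgets Vinterkål"),
    ("VIR-003", "VIR", "Jatunsalgets vinterkal"),
    ("VIR-004", "VIR", "Grami"),
]
for key, members in candidate_groups(records).items():
    print(key, members)
```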

#### *2.1. Morphological Diversity*

Two types of morphological descriptors were analyzed: continuous descriptors, such as plant height and leaf length, and categorical descriptors, such as leaf color and head shape. Many of the continuous, numeric descriptors were positively correlated (arrows pointing to the right in Figure 1). The first two principal components explained 42% and 23% of the total variation, respectively. Accessions with the same or similar names grouped together, but only to a certain extent.
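The percentages reported for PC1 and PC2 are the shares of total variance carried by each component. They can be computed as in the following numpy sketch, which assumes that the continuous descriptors were standardized before the PCA. The preprocessing is not stated in the text. The example matrix and its columns are hypothetical, and categorical descriptors such as leaf color would be handled separately.

```python
import numpy as np


def pca_standardized(X):
    """PCA of a (samples x descriptors) matrix after scaling each column to unit variance.

    Returns (scores, loadings, explained_ratio). Constant columns must be removed
    first, since they cannot be standardized.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                     # sample coordinates on each PC
    loadings = Vt.T                    # descriptor weights (the biplot arrows)
    explained = s**2 / np.sum(s**2)    # share of total variance per PC
    return scores, loadings, explained


# Hypothetical plants x (days to maturity, plant height, head weight)
X = [[130, 45, 1.8], [142, 52, 2.1], [121, 40, 1.5], [150, 58, 2.6], [138, 47, 2.4]]
scores, loadings, explained = pca_standardized(X)
print(np.round(100 * explained, 1))  # % of total variation per component
```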

**Table 1.** Overview of the accessions included in the study, sorted into groups based on accession name. For each accession, a code is given. Information is provided on the accession number, genebank holding institute, acquisition year, and donor institute. Acquisition year indicates the year when the accession entered the genebank holding institute, and the donor institute is the organization from which the material was sent.


**Figure 1.** Morphological data: Principal component analysis (PCA) biplot where the descriptors of importance are given as arrows and where the length of an arrow is a measure of the descriptor's variance and the angle between arrows is a measure of the correlation between descriptors, with a small angle expressing a high correlation. PC1 and PC2 explain 42% and 23% of the total variation. Accession names are abbreviated as codes; see Table 1.

Within the Amager Tall group, A1 (Amager Hög) and A4 (Grami) did not cluster with A3 (Amager Høj Grøn Grami), A6 (Amager Høj, Grøn, Toftø 67), and A7 (Amager Tall Resistent). The biplot furthermore indicated that the two Amager Winter accessions (A11 and A12), the two Stavanger Torg accessions (ST1 and ST2), and the two Kissendrup accessions (K1 and K2) did not cluster within their respective group.

The two Stavanger Torv accessions were early maturing, needing around 120 days to reach maturity, while most of the other accessions needed 140 days or more (Figure 2). There was also large variation in time to maturity and other traits within many of the accessions. The results of the Tukey multiple comparisons of means (Table S1) showed that A1 differed significantly from A4, A6, and A7 in time to maturity (all *p* < 0.05). Furthermore, A4 differed from A1 and A6 in leaf lamina length and plant height, and from A1 in head weight (all *p* < 0.05). A11 and A12 differed in plant height and, notably, also in leaf lamina color: A11 plants were purple, while A12 plants were green. Significant differences were also detected among the Langendijker Summer accessions. The L2 and L3 plants were purple, while L1 had a mixture of purple and green leaves. In addition, L1 differed from L2 in core length (*p* < 0.05). The Stavanger Torg accessions ST1 and ST2 differed in head height and head density, while the Kissendrup accessions differed in time to maturity (all *p* < 0.05). No clear differentiation was detected among the accessions within the Blåtopp, Ruhm von Enkhuizen, Loke, or Jåtunsalgets Vinterkål groups (Table S1).

**Figure 2.** Boxplots describing the variation in the continuous morphological descriptors. Accession names are abbreviated as codes; see Table 1.

#### *2.2. Marker Efficiency and Accession Diversity*

Among the 5965 markers, 3969 failed to amplify in all of the genotyped cabbage plants. Mapping data were available for 3750 of these, and 99% of those were located on the A genome in *B. napus*. These markers were therefore not expected to be found in *B. oleracea*. The largest proportion of failed markers was found on *B. napus* chromosome A04 (0.540) and the lowest on chromosome C06 (0.002). Duplicate samples showed high consistency across runs: each re-genotyped individual differed in only two markers. Individual 135 differed in Bn-A01-p16331424 and Bn-scaff\_15712\_6-p1025930, and individual 136 in Bn-scaff\_15877\_1-p926737 and Bn-scaff\_16553\_1-p6743.
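The two quality checks in this paragraph reduce to simple counting: per-chromosome failure rates and concordance between repeated genotyping runs. The sketch below makes two assumptions. Genotype calls are stored per marker as strings, with None for a failed call. A chromosome assignment from the *B. napus* map is available for each marker. The marker names in the example are hypothetical.

```python
from collections import Counter


def failure_proportions(marker_chrom, failed):
    """Share of markers on each chromosome that failed to amplify in every plant.

    marker_chrom: dict mapping marker name -> chromosome (e.g. "A04", "C06")
    failed:       set of marker names with no successful call in any plant
    """
    totals = Counter(marker_chrom.values())
    fails = Counter(marker_chrom[m] for m in failed if m in marker_chrom)
    return {chrom: fails[chrom] / n for chrom, n in totals.items()}


def discordant_markers(run1, run2):
    """Markers called in both genotyping runs of one plant but with different genotypes."""
    shared = run1.keys() & run2.keys()
    return sorted(m for m in shared
                  if run1[m] is not None and run2[m] is not None and run1[m] != run2[m])


chrom = {"m1": "A04", "m2": "A04", "m3": "C06", "m4": "C06"}
print(failure_proportions(chrom, {"m1"}))  # {'A04': 0.5, 'C06': 0.0}
print(discordant_markers({"m1": "AA", "m2": "AB", "m3": None},
                         {"m1": "AA", "m2": "BB", "m3": "AA"}))  # ['m2']
```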

The highest average number of alleles across all loci was found in accession A4 (1.8) and the lowest in accession K1 (1.2) (Table 2). These two accessions also had the highest and the lowest observed heterozygosity and genetic diversity, calculated as Nei's h (expected heterozygosity under Hardy–Weinberg Equilibrium, HWE) (Table 2). In some cases, accessions with similar names had similar levels of within-accession diversity (for example, ST1 and ST2), but in many cases, different levels of diversity were observed among the accessions within a group (Table 2). The genetic diversity of the accessions from NGB did not differ significantly from that of the accessions from VIR (*t*-test, *p* = 0.734). Genetic diversity was not significantly correlated with acquisition year for the full dataset (*p* = 0.177) or for accessions from NGB (*p* = 0.687). It was, however, positively correlated with acquisition year for VIR accessions (c = 0.553, *p* < 0.05).
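The three within-accession statistics in Table 2 can be computed from a genotype matrix. This sketch assumes biallelic SNPs coded as allele dosages (0, 1 or 2 copies of one allele, None for missing). For Nei's h it uses the uncorrected biallelic form, h = 1 − p² − (1 − p)², averaged over loci. Software packages often apply a sample-size correction, so values may differ slightly from those in Table 2.

```python
def diversity_stats(genotypes):
    """Mean number of alleles (Na), observed heterozygosity (Ho) and Nei's h.

    genotypes: list of loci, each a list of per-plant dosages (0, 1, 2) or None.
    """
    na, ho, he = [], [], []
    for locus in genotypes:
        calls = [g for g in locus if g is not None]
        if not calls:
            continue  # locus failed in every plant of this accession
        p = sum(calls) / (2 * len(calls))                   # frequency of the counted allele
        na.append(int(p > 0) + int(p < 1))                  # 1 if monomorphic, 2 if polymorphic
        ho.append(sum(g == 1 for g in calls) / len(calls))  # share of heterozygous plants
        he.append(1 - p**2 - (1 - p) ** 2)                  # Nei's h for a biallelic locus
    if not na:
        raise ValueError("no genotyped loci")
    k = len(na)
    return sum(na) / k, sum(ho) / k, sum(he) / k


# Hypothetical accession: 3 loci x 4 plants
loci = [[0, 1, 2, 1], [0, 0, 0, 0], [1, 1, None, 1]]
print(diversity_stats(loci))  # Na ~ 1.67, Ho = 0.5, h ~ 0.33
```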


**Table 2.** Genetic diversity in individual accessions as measured by SNP markers.

#### *2.3. Accession Comparisons*

All the pairwise FST values were significantly different from 0 (*p* < 0.001 for all comparisons), indicating genetic differentiation among all accessions. In general, the average pairwise FST values were lower between pairs of accessions belonging to the same group (average 0.193) than between pairs of accessions belonging to different groups (average 0.270) (*t*-test, *p* < 0.001). The lowest FST value (0.087) was found between accessions B6 and B3 (Table 3), both from the Blåtopp group and with a similar morphology (Figure 1, Table S1). The same was true for the Loke pair (LO1 and LO2), with a very low FST between the accessions (Table 3) and no significant morphological differences (Table S1). The highest FST values within groups were found between the pairs A8 and A9 (0.326) and B2 and B3 (0.321, Table 3), accessions showing little morphological differentiation (Figure 1, Table S1). The highest FST value overall was found between the accessions A1 and K1 (0.564, Table S2), the two accessions with the lowest level of genetic diversity. The pattern was reflected when looking at the average number of pairwise shared alleles, where accessions A1 and K1 shared few alleles with many of the other accessions. The lowest number of shared alleles was found between the accessions K1 and A11, and the highest between A1 and A4 (Table S3).
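This section does not specify the estimator and significance test behind the pairwise FST values. As one common option, the sketch below implements Hudson's FST in its ratio-of-averages form across loci. It pairs the estimator with a permutation test that reshuffles plants between two accessions. Both are assumptions for illustration, and the genotype data are hypothetical. The estimator can return small negative values for undifferentiated samples.

```python
import random


def allele_freqs(pop):
    """pop: list of plants, each a list of 0/1/2 allele dosages over the same loci."""
    return [sum(plant[i] for plant in pop) / (2 * len(pop)) for i in range(len(pop[0]))]


def hudson_fst(pop1, pop2):
    """Hudson's FST, ratio of averages across loci (negative values are possible)."""
    p1, p2 = allele_freqs(pop1), allele_freqs(pop2)
    n1, n2 = 2 * len(pop1), 2 * len(pop2)  # sampled gene copies per accession
    num = den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den if den > 0 else 0.0


def fst_permutation_test(pop1, pop2, n_perm=999, seed=1):
    """Shuffle plants between the two accessions to build a null distribution for FST."""
    observed = hudson_fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hudson_fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Two hypothetical accessions of five plants genotyped at three SNPs
acc1 = [[0, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0]]
acc2 = [[2, 2, 1], [2, 2, 2], [2, 1, 2], [2, 2, 2], [1, 2, 2]]
print(fst_permutation_test(acc1, acc2))
```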


**Table 3.** Pairwise FST values within groups based on all markers (full dataset) and from subsets of 10 markers (subsample average). Subsample values that do not include the pairwise FST value for the full dataset are highlighted in bold.

A STRUCTURE analysis showed equally high support for two and three clusters (Figure S1). At K = 2, the first cluster contained A1, A3, A6, A7, A12, B1, B3, B6, J1, and J2, and the second contained A8, K1, K2, L1, L2, and L3. The remaining accessions showed some level of mixed clustering, with similar degrees of mixture for R1, R2, and R3 and for ST1 and ST2 (Figure 3A). At K = 3, the cluster consisting of accessions A8, K1, K2, L1, L2, and L3 remained intact. A second cluster consisted of some of the accessions showing mixed clustering at K = 2: R1, R2, R3, and ST1. The third cluster contained accessions A1, A3, A6, A7, A12, B1, B3, B6, J1, and J2, with the remaining accessions showing various degrees of mixed clustering (Figure 3B). In general, accessions within the same group tended to belong to the same clusters (Figure 3). For example, all the Kissendrup, Langendijker, Ruhm von Enkhuizen, Jatunsalgets, and Loke accessions clustered within the same group. The most notable exception was accession A8, Amager Kurzstrunkiger Original. Based on its name, A8 was expected to cluster with the other Amager accessions, but instead it clearly clustered with the Kissendrup and Langendijker accessions (Figure 3).

The PCA based on the allele frequencies in the different accessions (Figure 4) supported the clustering found with STRUCTURE (Figure 3) to a large degree and the morphology-based clustering to a lesser degree (Figure 1). One cluster contained A8, K1, K2, L1, L2, and L3 (upper right); one cluster contained accessions R1, R2, R3, and ST1 (lower center); and one cluster contained accessions A1, A3, A6, A7, A9, A12, B1, B3, B6, J1, LO1, J2, and LO2 (upper left). Accessions A4, B2, A11, and ST2 were located between the latter two clusters, substantiating the STRUCTURE analysis at K = 3. There was no evidence of clustering according to genebank origin. The VIR accessions acquired at an earlier date tended to be located less centrally in the PCA (c = −0.540, *p* = 0.056) than the NGB accessions. An individual-based PCA showed good agreement with the accession average-based PCA. Most individuals of each accession clustered together, but two exceptions were found in accession L3 and accession A4 (Figure S2).

**Figure 3.** STRUCTURE analysis based on the full SNP dataset, assuming (**A**) two genetic clusters (K = 2); (**B**) three genetic clusters (K = 3). Different colours symbolize the different genetic clusters identified in each individual.

#### *2.4. Genotypic and Morphological Comparisons*

In the accession-level PCA based on the SNP data, the first principal component (PC1) separated most of the purple-leafed accessions from most of the green-leafed accessions. The green-leafed accession A8, however, clustered among the purple-leafed accessions, while the purple-leafed accession A11 clustered among the green-leafed (Figure 4). The accession L1, with purple/green leaves, clustered among the purple accessions. Neither head density nor head shape showed any clustering in the accession-level PCA (data not shown).

#### *2.5. Limiting the Number of SNP Markers*

Subsets of the data were analyzed to determine whether a more limited number of markers could capture a similar amount of information as the full dataset. Even with as few as 10 randomly chosen markers, the FST values obtained were in most cases similar to those calculated from the full dataset (Table 3). Increasing the number of markers to 50, 100, and 500 reduced the variance of the FST estimates (Figure S3) from an average of 0.088 for 10 markers to 0.011 for 500 markers. The mean FST values did not, however, change consistently in any given direction and could either increase or decrease with an increasing number of markers. In every case, the mean FST changed by less than 0.01.
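The marker-subsampling experiment can be reproduced in outline by repeatedly drawing random subsets of loci and recomputing FST. The sketch below again assumes Hudson's estimator and works from per-accession allele frequencies. It reports the mean and variance of the estimates across replicates. The number of replicates is a hypothetical choice, not taken from the text.

```python
import random
import statistics


def hudson_fst_from_freqs(p1, p2, n1, n2, loci):
    """Hudson's FST (ratio of averages) over the given loci.

    p1, p2: allele frequencies per locus in two accessions
    n1, n2: number of sampled gene copies (2 x plants) in each accession
    """
    num = den = 0.0
    for i in loci:
        a, b = p1[i], p2[i]
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den if den > 0 else 0.0


def subsample_fst(p1, p2, n1, n2, n_markers, n_reps=100, seed=1):
    """Mean and variance of FST over n_reps random subsets of n_markers loci."""
    rng = random.Random(seed)
    loci = range(len(p1))
    estimates = [hudson_fst_from_freqs(p1, p2, n1, n2, rng.sample(loci, n_markers))
                 for _ in range(n_reps)]
    return statistics.mean(estimates), statistics.variance(estimates)
```

Calling `subsample_fst` with `n_markers` set to 10, 50, 100 and 500 on the same pair of accessions should show the variance shrinking as the subset grows, while the mean stays close to the full-data value.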

**Figure 4.** SNP marker data: accession-level principal component analysis (PCA) of the SNP markers, where color indicates the leaf color of the cabbage accessions (purple circles = purple leaves, green squares = green leaves, reddish-brown triangles = purple/green leaves). PC1 and PC2 explain 15% and 10% of the total variation, respectively.

Individual-level PCA clustering according to accession could be obtained with a limited number of markers. Even when subsampling only the 20 markers whose alleles provided the largest segregation along PC1 and PC2, a reasonable clustering according to accession was obtained (Figure S4a). The same number of markers used in the accession-level PCA could only separate accessions along PC1 (Figure S4b).

A single subsample for each 10, 50, 100, and 500 markers was used to investigate whether the same clustering could be obtained in a STRUCTURE analysis as with the full dataset. All subsets indicated K = 2 to be the level of structuring best explaining the data. With only 10 markers, very limited power was obtained to identify structuring, although the clustering of the Jatunsalgets Vinterkål group (J1 and J2) and the Ruhm group (R1, R2, and R3) could be discerned. Surprisingly, as few as 50 markers yielded a structuring similar to that obtained for the full dataset, as well as from 500 markers (Figure S5). A STRUCTURE analysis of 100 randomly chosen markers, however, showed that a lower number of markers was not sufficient to reliably replicate the results of the full dataset.

To explore how efficiently 50 markers capture the same structure at K = 2 as the full dataset, nine additional datasets of 50 randomly chosen markers were generated. A comparison of the STRUCTURE analyses of the 10 sets of 50 markers showed that some, but not all, clusters were reliably identified in each subset (Figure S6). In particular, the clustering of the Jatunsalgets Vinterkål (J1 and J2) and Loke (LO1 and LO2) groups varied. An analysis with the software CLUMPP showed that the clustering among the 10 sets was not very consistent (H = 0.752).

A subset of markers was chosen to provide good resolution for discriminating between accessions. The markers were selected for the highest possible discriminatory power along the first two principal components of the accession-level PCA (Figure 4). They were further required to show a high level of genetic diversity (h > 0.3) and not to be in high linkage disequilibrium with each other (D < 0.25). In total, 500 markers were chosen (Table S4).
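The selection criteria above can be sketched as a greedy filter. This is a minimal illustration, not the authors' procedure: the scoring rule (`pc_score`, the combined absolute loadings on PC1 and PC2), the greedy ordering, and the input layout (a dict of pairwise D values keyed by marker pairs) are all assumptions made for the example.

```python
def pc_score(loading_pc1, loading_pc2):
    """Hypothetical discriminatory score: combined absolute loading on PC1 and PC2."""
    return abs(loading_pc1) + abs(loading_pc2)


def select_markers(candidates, ld, max_markers=500, min_h=0.3, max_ld=0.25):
    """Greedily pick highly discriminating, diverse markers in low mutual LD.

    candidates: list of (marker_id, score, h) tuples, where score is a
        discriminatory score (e.g. from pc_score) and h is Nei's gene diversity.
    ld: dict mapping frozenset({marker_a, marker_b}) -> D for that pair;
        pairs that are absent are treated as unlinked (D = 0).
    """
    chosen = []
    # Visit markers from the most to the least discriminating.
    for marker, _score, h in sorted(candidates, key=lambda c: c[1], reverse=True):
        if h <= min_h:  # require h > 0.3
            continue
        # Reject the marker if it is in high LD with any marker already chosen.
        if any(abs(ld.get(frozenset((marker, other)), 0.0)) >= max_ld for other in chosen):
            continue
        chosen.append(marker)
        if len(chosen) == max_markers:
            break
    return chosen


# Toy input: m2 is linked to m1, and m3 has too little diversity.
candidates = [("m1", 0.9, 0.45), ("m2", 0.8, 0.40), ("m3", 0.7, 0.20), ("m4", 0.6, 0.35)]
ld = {frozenset(("m1", "m2")): 0.30}
print(select_markers(candidates, ld, max_markers=2))  # ['m1', 'm4']
```

Running the same function with `max_markers=20` and relaxed filters would give a top-20 selection in the spirit of the subsampling behind Figure S4, again only as an illustration.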

#### **3. Discussion**

#### *3.1. Discrimination Power and the Number of Markers*

The array used in this study for genotyping *B. oleracea* was originally developed for *B. napus* [28]. Not surprisingly, the array was most efficient for markers located on the "C" genome of *B. napus*, with fewer than 1% of the markers on most chromosomes failing to amplify. Nevertheless, as many as 50% of the markers mapping to the "A" genome of *B. napus* amplified successfully in our *B. oleracea* material. Further studies are needed to determine the map positions of these markers in *B. oleracea* and to evaluate the extent of cross-amplification between the "A" and "C" genomes.

Although the cost of genotyping and sequencing continues to fall, using a limited number of markers while retaining sufficient discriminatory power is still of interest. We found that a random sample of only a few percent of the amplifying markers would have given an overall picture reasonably similar to that obtained with the full dataset. In most cases, the FST value between two accessions in the same group could be estimated with reasonable accuracy from as few as 50 or 100 markers, similar to the findings of Willing et al. [29].

A STRUCTURE analysis further suggested that as few as 50 randomly chosen markers could often capture a large part of the genetic structuring in the data, although the clustering of some accessions was inconsistent and the single analysis of 100 random markers failed to replicate the clustering obtained with the full dataset. In addition, parameters such as the amount of gene flow and the evenness of sampling have been shown to influence the number of markers needed to discriminate between groups of differently related individuals [30]. The minimum number of markers needed will vary between organisms and from case to case, but may be surprisingly low. Choosing the 20 most discriminatory markers for the individual-level PCA gave an outcome highly similar to that of the full dataset. However, these 20 markers might not be the most discriminatory in another sample of cabbage accessions. For this reason, we recommend a subset of at least 500 SNP markers for duplication assessments in cabbage to obtain robust, detailed results. A list of 500 SNP markers with high discriminatory power, a high level of genetic diversity, and low linkage disequilibrium with each other is provided in Table S4.

#### *3.2. Duplication Assessment and Genetic Similarity*

This study clearly demonstrates that the same or similar names do not necessarily indicate duplicate holdings. Both the STRUCTURE analysis and the PCA detected genetic differences among accessions grouped by accession name. In the STRUCTURE analysis, five of the 10 groups showed genetic differences within the group: A4 vs. A1, A3, A6, and A7 (Amager Tall group); A8 vs. A9 (Amager Short group); A11 vs. A12 (Amager Winter group); B2 vs. B1, B3, and B6 (Blåtopp group); and ST1 vs. ST2 (Stavanger Torv group) (Figure 3). A similar pattern was seen in the PCA based on SNP markers (Figure 4). Three of the five cases were supported by significant differences in one or more morphological traits (Table S1). The difference between the two Amager Winter accessions was obvious from their morphology: A11 (Amager Winter) was identified as a red cabbage type, whereas A12 (Amager Vinter Gefion) was a white cabbage. The differences between the two Stavanger Torv accessions (ST1 and ST2) were less obvious; both are early-maturing white cabbage types, but they differed significantly in two head characters. Within the Amager Tall group, morphological data also confirmed that the accessions were different (Table S1), whereas no clear morphological differences were observed in the Amager Short or Blåtopp groups.

Genetic similarity can be used as a criterion to identify and handle duplicate holdings in genebanks [23,24]. In clonal and highly inbred material, it can be a relatively easy task to determine whether accessions are duplicates, but in open pollinated crops such as cabbage, the genetic structure is more complex. Even accessions that have the same origin—for example, accessions donated from the same seed lot to different genebanks—are expected to diverge over time [27]. In these species, the overall pattern of genetic diversity needs to be taken into consideration.

In instances where several lines of evidence (e.g., low FST values, a large proportion of shared alleles, morphological similarity, and common clustering in the STRUCTURE analysis and PCA) indicate higher-than-average similarity, the accessions can be considered duplicates, and one of them can be removed from the genebank holdings (or the accessions bulked) with minimal loss of genetic diversity [23]. Examples from our dataset that could be bulked, removed, or given lower conservation priority could be B3 and B6 and one of the LO accessions. Our study has shown that using accession names alone is not a good strategy for reducing duplicate holdings, as the same or similar names do not imply identical genetic composition. A better approach would combine accession names and other passport data as a first step with marker evaluations as a second step. Alternatively, morphological evaluations could be used, or a more extensive evaluation of passport data that traces the transfer of accessions between genebanks, for example, using donor accession numbers or other relevant information. The ECPGR *Brassica* group has established an online tool for identifying duplicate holdings based on accession names and other passport data. This is a useful first step that could be followed by an extensive evaluation of the potential duplicates with the developed marker set.
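The two-step approach suggested above (names and passport data first, markers second) might be sketched as follows. The name normalization rules, the helper names, the accession `X1`, and the FST cut-off of 0.02 are all hypothetical; in practice, several lines of evidence (shared alleles, morphology, STRUCTURE and PCA clustering) would be weighed rather than a single threshold.

```python
import unicodedata
from collections import defaultdict
from itertools import combinations


def normalize_name(name):
    """Lower-case, map letters that do not decompose (ø, æ), strip diacritics and extra spaces."""
    name = name.lower().replace("ø", "o").replace("æ", "ae")
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def candidate_groups(accessions):
    """Step 1: group (accession_id, name) pairs whose normalized names match."""
    groups = defaultdict(list)
    for acc_id, name in accessions:
        groups[normalize_name(name)].append(acc_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def confirm_duplicates(group, fst, max_fst=0.02):
    """Step 2: keep pairs whose marker-based pairwise FST is below a hypothetical cut-off."""
    return [(a, b) for a, b in combinations(sorted(group), 2)
            if fst[frozenset((a, b))] < max_fst]


accs = [("J2", "Jåtunsalgets Vinterkål"), ("X1", "Jatunsalgets  vinterkål"), ("B1", "Blåtopp")]
print(candidate_groups(accs))  # {'jatunsalgets vinterkal': ['J2', 'X1']}
```

As the J1/J2 and A11/A12 cases show, a name match from step 1 is only a candidate; step 2 is what guards against removing genetically distinct material.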

#### *3.3. Cultivation History and Naming Practices*

There can be several explanations for why accessions with the same or similar names are genetically and/or morphologically different. Minor differences could be explained by breeding history and naming practices. As mentioned, selections within a cultivar were common in the 19th and early 20th centuries, and a cultivar could have many complicated names [7]. One example is "Jatunsalgets Vinterkål Berbes St. Orginal" (J1) and "Jåtunsalgets Vinterkål" (J2). These two accessions are morphologically close and cluster together in both the STRUCTURE analysis and the SNP-based PCA, but they are not identical, either morphologically or genetically. Jåtunsalget was a small Norwegian seed enterprise, and only these two accessions are recorded under its name in the ECPGR *Brassica* database [31]. "Jåtunsalgets Vinterkål" was listed on the Norwegian variety list in 1979 [32]; however, its pedigree was given as "Jåtun Amager x en Hollansk sort i 1929", meaning "a selection of Amager crossed with a Dutch cultivar in the year 1929". Most likely, "Jåtun Vinterkål" was already marketed in the 1930s but was listed much later. The oldest (and most original) accession is J1, acquired by VIR in 1953, whereas J2 (from NGB) entered the Nordic Genebank more than 20 years later from unknown sources [32].

A more complicated example is the Amager varieties. The ECPGR *Brassica* database [31] contains 102 records with "Amager" in the accession name. Amager is a geographical area and a village just outside Copenhagen that hosted both seed enterprises and extensive vegetable production. We divided the Amager accessions in our study into three sub-groups based on naming; one of them was the Amager Tall sub-group, where the Danish "Høj" and the Swedish "Hög" both mean "Tall" (21 accessions in the ECPGR database). Our study included five such accessions, of which A4 (accession name "Grami") was genetically clearly different from the other four. Most selections carry a second name (e.g., "Grön Grami", "Grön Toftø", or "Resistent") describing further selection properties or the name of the enterprise. We hypothesized that A4 ("Grami" from VIR) would be similar to A3 ("Amager Høj Grøn Grami" from NGB), but this was not the case. In retrospect, perhaps we should not have grouped A4 with the Amager Tall accessions, as its only link to Amager was the name "Grami", which also appears in the name of A3.

Within the remaining two Amager sub-groups, genetic differences were also detected. Amager Winter is commonly known as a white cabbage (green leaf laminas), marketed by A. Hansen Amagerfrø in Denmark, among others, in the mid-20th century [7]. The ECPGR *Brassica* database [31] contains six accessions matching this name, two of which we included.

Test cultivations showed that A11 (Amager Winter, K192) was a red cabbage (purple leaf laminas). A11 came from an unknown source in Denmark, was acquired by VIR in 1969, and has so far been through at least six regeneration cycles at the VIR experimental field. The accession was already listed as purple at the time of entry (as accession number K192 in the VIR catalogue). Red Amager cabbages have certainly been traded. According to the Nordic Genetic Resource Center cultivar database [32], a red cabbage cultivar/selection named "Amager" was released in 1959 and was bred by Østergård frøavl (Denmark; breeder's name Stenballe P 59 68). Other red Amager cabbage cultivars/selections were "Amager 304", bred by A. Hansen Amagerfrø (breeder's name Tagenhus P 59 69, released in 1959); "Holdbar Amager", bred by L. Dæhnfeldt (breeder's name Toftø S 1960, released in 1960); and "Amager Caro" and "Amager Rega", both released in 1974 by Ohsens Enke (Denmark). A11 cannot be one of the latter, as it was already acquired by VIR in 1969, but it could be one of the earlier red "Amager" cultivars. What is certain is that A11 is not a duplicate of A12, which has a similar name (Amager Vinter Gefion, NGB1879) but is a genetically different white cabbage. Although A11 differs from the remaining accessions in the "Amager" group in the PCA, it shows a higher similarity to the "Amager" group than to other red cabbages (Figure 4). This, together with the results of the STRUCTURE analysis (Figure 3), tentatively suggests that the accession is the result of breeding the purple color into an Amager background. It is hard to know whether the polymorphism observed in accession L1 (compared with L2 and L3; Figure 3) is due to gene flow from accessions with a different leaf color, as this accession contains both green and purple plants, or whether there are other explanations.

The genetic data showed clear differences within the Amager Short pair (Amager Kurzstunkiger Orginal, A8, and Amager L NF Orginal, A9). Based on passport data, we know that they came from different seed companies (one in Denmark and one in Norway) and were included in the VIR collection in different years (1935 and 1967, respectively). Accessions named "Amager Kurztunkig" are found in Germany, Poland, the Czech Republic, and Belarus [31] and are most likely duplicates of the original VIR accession included in this study. The prefix "Kurz" is German and corresponds to "Lav" in the Scandinavian languages, or "short" in English. Our reason for pairing A8 and A9 was the prefix "Kurz" in A8 and the abbreviation L (possibly "Lav") in A9. Both were short in plant height and had quite similar morphological characteristics, but they were not very close in the morphological PCA (Figure 1). Most likely, A9 is an Amager selection from Norsk Frø (NF). From Norway, a cultivar with a similar name (Amager L1 Orginal, not included in this study) is known, with the pedigree "Jåtun Amager x Jåtunsalgets vinterkål 1932" [32]. This cultivar was marketed from 1933 onward but was not approved until 1961. Amager Kurzstunkiger came to VIR in 1935 from a Danish enterprise but with a German accession name. A8 and A9 clearly have different histories and, as demonstrated, they are genetically different.

#### *3.4. The Effect of Genebank Conservation*

Changes in genetic composition, major or minor, may occur during field regeneration [33–37]. Our study was not designed to track changes from generation to generation, but some of the differences we observe are probably the result of this process. Minor changes are expected during regeneration in genebanks, especially if a small number of plants is used [38] or if isolation during flowering is insufficient.

Genebanks use a standard number of plants, usually 20 to 50 individuals, and net cage isolation with pollinators to reduce the risk of genetic change during regeneration. The FAO genebank standards do not specify the number of plants [39]. At VIR, regeneration typically takes place every 5–7 years, meaning that material acquired in the 1930s has been through at least 10 regeneration cycles. This is expected to increase differentiation among accessions and reduce genetic diversity within accessions. Heterozygosity is predicted to decrease each generation by a fraction inversely proportional to the population size [40], $H_t = \left(1 - \frac{1}{2N}\right) H_{t-1}$, where $N$ is the diploid population size. For example, with 10 regeneration cycles and a population size of 20, heterozygosity is expected to decrease by 22% on average. We found that accessions acquired a long time ago tended to have lower genetic diversity than more recent additions, a pattern consistent with the loss of genetic diversity through genetic drift. Additionally, more recently acquired accessions tended to cluster more centrally in the PCA plots, which could also be the result of genetic drift acting to differentiate older accessions.
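The 22% figure can be checked by iterating the recurrence above. The sketch assumes, as in the example in the text, one generation per regeneration cycle and a constant population of 20 diploid plants. In population genetic theory, $N$ in this formula is the effective population size, which is often smaller than the number of plants grown, and the calculation ignores gene flow and selection.

```python
def expected_heterozygosity(h0, n_diploid, generations):
    """Apply H_t = (1 - 1/(2N)) * H_{t-1} once per generation (regeneration cycle)."""
    h = h0
    for _ in range(generations):
        h *= 1 - 1 / (2 * n_diploid)
    return h


remaining = expected_heterozygosity(1.0, n_diploid=20, generations=10)
print(f"retained {remaining:.3f}, lost {1 - remaining:.1%}")  # retained 0.776, lost 22.4%
```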

Other factors such as selection and gene flow could also affect genebank accessions. Some selection is expected in response to local conditions, both environmental conditions at the regeneration site (such as climate and soil) and cultivation practices (such as harvesting time and methods). In addition, if accessions are not completely isolated, gene flow will occur from other genebank accessions of the same species cultivated at the same time, from cultivated fields in the area, and from weeds. Bees and flies are the main pollinators of cabbage [41], so isolation is crucial to avoid unwanted gene flow through cross-pollination between accessions. In contrast to genetic drift, external gene flow is expected to increase diversity within accessions and decrease divergence among accessions cultivated at the same time.

Van Hintum et al. [24] demonstrated that the genetic changes caused by regeneration within an accession were of similar magnitude to differences among genebank accessions of cabbages with the same or similar names. Therefore, they questioned the rationale behind conserving a large number of accessions with the same or similar names. Our study supports the occurrence of genetic change during regeneration and similarity among some accessions with similar names. At the same time, however, we have shown that similar names do not always imply the same genetic material.

#### *3.5. Implications for Genebank Conservation*

All genebanks have limited budgets and need to adapt their operations to their economic frame. One way to adapt is to remove or pool/bulk duplicates and thus make funds available for the high-quality conservation of the remaining unique accessions. This approach has been used in many species, including *Brassica* crops [23,42].

AEGIS has suggested a roadmap for handling duplicate holdings at the European level, identifying the most appropriate accessions based on passport data [4]. However, the decision to remove duplicates is the responsibility of individual genebanks. Our study clearly shows that relying exclusively on accession names when identifying duplicates can be risky, especially for old cross-pollinating cultivars with complex breeding histories and naming practices. We found that in five of the 10 groups, accessions with the same or similar names had clear genetic differences. In most cases, these differences were corroborated by significant differences in one or more morphological traits.

Additional passport data, such as accession numbers, donor institute, and donor accession number, can help pinpoint the origin of an accession and the time at which it split from other accessions. If such data are available, the chance of correctly identifying duplicates from documentation alone increases. For recently acquired material, such as modern cultivars, this can be a safe approach. For older cultivars and landraces, however, it is often difficult: documentation is often missing and, as discussed above, accessions can have diverged substantially from a common origin through complex breeding histories and regeneration in genebanks.

Detailed morphological characterization has been suggested [43], as have combined morphological and molecular characterization and molecular characterization alone [18,44]. Our findings support the need for characterization before deciding to remove or bulk accessions. However, a major challenge is the cost of such characterization. Most genebanks are underfunded and have backlogs in regeneration and viability monitoring. For tracking future duplications, the introduction of digital object identifiers (DOIs) at the accession level [12] would make transactions between genebanks easier to trace, but it cannot capture what has already been duplicated. International collaboration and genotyping using next-generation sequencing could be a cost-effective way forward [14–18]. Here, proper accession-level information could go hand in hand with facilitating the use of germplasm, as genetic information is catalogued, linked to the accessions, and made available to users.

#### **4. Materials and Methods**

#### *4.1. Plant Material, Cultivation, and Morphological Characterization*

In Europe, there are 35 *Brassica* collections, located in 24 countries and holding more than 11,000 *B. oleracea* accessions [45]. In total, 980 cabbage accessions are maintained at VIR and 189 at NGB. An initial study [46] characterized six groups for morphological traits and detected some differences within the groups. In this study, another 10 groups (Table 1) were examined to determine whether accessions within a group differed. For each group, 10 plants per accession were planted. These plants were randomized, and each of the 10 plants was characterized. The study thus consisted of 10 such randomized pair/triplet characterizations. The planting distance was 50 cm between plants. The work was done at Alnarp, Sweden (55° N, 13° E); the soil was a loamy clay, and fertilization consisted of 100 kg ha⁻¹ PROMAGNA 11-5-18™ (Yara, Norway) at planting and 30 kg ha⁻¹ YaraMila 22-0-12™ (Yara, Norway) one month after planting. Plants were irrigated, and biological control and fungicides were applied. Plants were evaluated just before harvest. SI units were used for leaf, head, and core size parameters, and UPOV (International Union for the Protection of New Varieties of Plants) [47] descriptors were used for leaf color, head shape, and head density. Details are provided in Table S1. Principal component analysis (PCA) bi-plots were used for an overview of the data and characters. An ANOVA was performed for each numeric character, including data from all individuals in the group. If the ANOVA indicated significant differences among accessions, Tukey's multiple comparison of means [48] was used to identify accessions that differed from each other. χ² tests were used for categorical characters.
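For a single numeric character, the one-way ANOVA described above reduces to the ratio of between-accession to within-accession mean squares. The sketch below computes only that F ratio, from toy values; the p-value would come from the F distribution with the returned degrees of freedom, and the Tukey comparisons from a statistics package (for example, R's `TukeyHSD` or SciPy's `scipy.stats.tukey_hsd`). The text does not state which software the authors used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for one numeric character.

    groups: one list of measurements per accession (e.g. 10 plants each).
    """
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Variation of accession means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual plants around their own accession mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy measurements for a pair of accessions.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```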

#### *4.2. DNA Extraction*

As far as we know, the selected accessions have not previously been included in any molecular studies. DNA was extracted from the same 10 plants per accession that were morphologically characterized. Leaf samples (2 cm²) were collected from the field-grown plants, placed in 2 ml Eppendorf tubes, immediately frozen in liquid nitrogen, and subsequently freeze-dried overnight in a LyoLab 3000 (Heto Lab Equipment). The freeze-dried material was powdered in a mixer mill (Merck Retsch MM 300) using steel beads; then, 600 μl of CTAB buffer (0.1 M Tris, pH 8.0; 0.01 M EDTA; 0.7 M NaCl; 1% CTAB; and 1% β-mercaptoethanol) was added to each powdered sample according to Doyle and Doyle [49]. The samples were incubated in a thermomixer (Eppendorf) at 600 rpm and 60 °C for 60 min. DNA was extracted by adding one volume of chloroform/isoamyl alcohol (24:1), after which the samples were mixed and centrifuged for 20 min at 13,200 rpm. The supernatants were transferred to new tubes, 5 μl of RNase (1 mg/ml) was added, and the samples were incubated in the thermomixer (600 rpm, 37 °C, 30 min). Cold isopropanol (0.8 volumes) was added, followed by mixing and centrifugation (10 min at 13,200 rpm). The DNA pellet was cleaned in 500 μl of wash buffer (76% ethanol, 0.2 M sodium acetate) for 20 min at room temperature with subsequent centrifugation for 5 min at 13,200 rpm, followed by 500 μl of rinse buffer (76% ethanol, 0.01 M ammonium acetate), mixing, and centrifugation for 5 min at 13,200 rpm. The samples were left to dry at room temperature for 1 h and were then resuspended in 50 μl of ddH₂O. The DNA concentration and the A260/A280 absorbance ratio were determined using an Eppendorf BioSpectrometer. The DNA concentrations of the samples were then adjusted for further analysis.
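The concentration measurement and adjustment mentioned above involve simple arithmetic. The sketch assumes the conventional factor of 50 ng/µl per A260 unit for double-stranded DNA and a hypothetical target concentration, since the protocol does not state the target. The BioSpectrometer reports concentrations directly, so these helpers only illustrate the calculation behind the numbers.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional conversion factor for double-stranded DNA


def dsdna_concentration(a260, dilution_factor=1.0):
    """Concentration (ng/µl) of dsDNA estimated from absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values around 1.8 are usually taken to indicate clean DNA."""
    return a260 / a280


def dilution_volumes(stock_conc, target_conc, final_volume):
    """C1 * V1 = C2 * V2: volumes of DNA stock and water needed to reach target_conc."""
    if stock_conc < target_conc:
        raise ValueError("stock is already more dilute than the target")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Hypothetical sample: a 200 ng/µl stock diluted to a hypothetical 50 ng/µl in 40 µl.
print(dilution_volumes(200.0, 50.0, 40.0))  # (10.0, 30.0)
```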

#### *4.3. SNP Analysis and Statistics*

Array genotyping was performed by TraitGenetics GmbH (Gatersleben, Germany) using a 15K Illumina Infinium array containing a subset of markers from the Brassica 60K array [28]. The array had previously been tested for *B. oleracea* and contained a total of 13,714 SNPs. An initial run was carried out with 92 individuals and a second with 178 individuals. Ten individuals were analyzed from each accession to capture the within-accession variation. Cabbage accessions are expected to harbor substantial within-accession variation, so several individuals per accession need to be analyzed. By analyzing 10 individuals per accession, we obtained an adequate picture of the within-accession variation while still being able to include many accessions in the study. The first run with 92 individuals was a test run; since it was successful, the second run with 178 individuals was performed in the same way to increase the number of individuals analyzed. After the two runs were merged, failed and invariant markers were removed, as were markers failing in more than 50% of the individuals. The remaining 5965 markers were used for further analysis. Of these, 68% (4090 markers) had less than 10% missing data. After removal of the above markers, individuals with more than 40% failed markers were removed (2 individuals + 2 controls). All remaining individuals had less than 10% missing data.
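The marker and individual filters described above can be expressed as two passes over a genotype table. This is an illustrative sketch only: the data layout (a dict from marker to per-individual allele counts, with `None` for failed calls) is an assumption, and the thresholds are the 50% and 40% values from the text.

```python
def is_invariant(observed):
    """A biallelic marker is invariant if every call is homozygous for the same allele."""
    return all(c == 0 for c in observed) or all(c == 2 for c in observed)


def filter_markers(genotypes, max_missing=0.5):
    """Drop failed, invariant, and poorly amplifying markers.

    genotypes: dict marker -> list of calls per individual
    (0/1/2 copies of one allele, None for a failed call).
    """
    kept = {}
    for marker, calls in genotypes.items():
        observed = [c for c in calls if c is not None]
        if not observed or is_invariant(observed):  # failed or invariant marker
            continue
        if 1 - len(observed) / len(calls) > max_missing:  # fails in > 50% of individuals
            continue
        kept[marker] = calls
    return kept


def filter_individuals(genotypes, n_individuals, max_missing=0.4):
    """Indices of individuals with at most 40% failed calls over the retained markers."""
    columns = list(genotypes.values())
    keep = []
    for i in range(n_individuals):
        missing = sum(calls[i] is None for calls in columns)
        if missing / len(columns) <= max_missing:
            keep.append(i)
    return keep


# Toy table with four individuals.
genotypes = {"m1": [0, 1, 2, None], "m2": [2, 2, 2, 2], "m3": [None, None, None, 1],
             "m4": [None] * 4, "m5": [0, None, 1, 2]}
kept = filter_markers(genotypes)
print(sorted(kept), filter_individuals(kept, 4))  # ['m1', 'm5'] [0, 2]
```

The order matters: individuals are assessed only over markers that survived the first pass, matching the sequence in the text.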

Two individuals of accession K2248 were genotyped in both runs. For both individuals, the second genotyping had a lower success rate and was excluded from the downstream analysis. In total, 266 individuals were kept for further analyses and were analyzed with 5965 markers.

Deviations from Hardy–Weinberg equilibrium (HWE) were tested using a χ² test with and without Bonferroni correction. All accessions had less than 10% of markers deviating from HWE before Bonferroni correction. After correction, no marker deviated significantly from HWE in any accession, so no marker was removed for this reason.
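A per-marker HWE check with Bonferroni correction, run within one accession, might look like the sketch below. It assumes biallelic SNPs summarized as genotype counts; the authors' own implementation is not described. With only 10 individuals per accession, expected genotype counts are small and the χ² approximation is rough, so an exact test is a common alternative.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg proportions at a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def deviating_markers(counts_by_marker, alpha=0.05, bonferroni=True):
    """Markers whose HWE p-value is below alpha (divided by the number of tests if Bonferroni)."""
    threshold = alpha / len(counts_by_marker) if bonferroni else alpha
    return [m for m, counts in counts_by_marker.items() if hwe_chi2(*counts)[1] < threshold]


counts = {"m1": (25, 50, 25), "m2": (10, 0, 10)}  # hypothetical genotype counts
print(deviating_markers(counts))  # ['m2']
```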

Wright's FST [50] and Nei's h were estimated according to Nei [51] using purpose-written Perl scripts. The significance of FST values was determined by permutation tests (1000 permutations). Subsets of markers were analyzed to evaluate whether FST values between pairs of accessions from the same group could have been estimated with similar accuracy from a smaller number of markers. Subsets of 500, 100, 50, and 10 markers were randomly drawn from the dataset and used to calculate FST values. This was repeated 1000 times for each subset size, and the average FST value and standard error over the 1000 replicates were calculated.
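The FST calculations above can be sketched as follows. The estimator shown is Nei's ratio (H_T - H_S)/H_T for a pair of accessions, with heterozygosity summed over loci before the ratio is taken and without sample-size corrections; whether the authors' Perl scripts used exactly this form is an assumption. The same function is reused for the permutation test (shuffling individuals between the two accessions) and for the marker-subsampling replicates. The helper names and the random seed are hypothetical, and `subsample_fst` returns the standard deviation across replicates as its measure of spread.

```python
import random
from statistics import mean, stdev


def allele_freqs(individuals, markers):
    """Frequency of the counted allele per marker; each individual holds 0/1/2 or None per marker."""
    freqs = []
    for m in markers:
        calls = [ind[m] for ind in individuals if ind[m] is not None]
        freqs.append(sum(calls) / (2 * len(calls)) if calls else None)
    return freqs


def fst_nei(pop1, pop2, markers):
    """Multilocus FST = (H_T - H_S) / H_T, summing H over loci before taking the ratio."""
    hs_sum = ht_sum = 0.0
    for p1, p2 in zip(allele_freqs(pop1, markers), allele_freqs(pop2, markers)):
        if p1 is None or p2 is None:
            continue
        hs_sum += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-accession h
        p_bar = (p1 + p2) / 2
        ht_sum += 2 * p_bar * (1 - p_bar)  # h of the pooled allele frequency
    return (ht_sum - hs_sum) / ht_sum if ht_sum > 0 else 0.0


def permutation_p(pop1, pop2, markers, n_perm=1000, seed=1):
    """Share of label permutations giving an FST at least as large as the observed one."""
    rng = random.Random(seed)
    observed = fst_nei(pop1, pop2, markers)
    pooled = pop1 + pop2
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst_nei(pooled[:len(pop1)], pooled[len(pop1):], markers) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def subsample_fst(pop1, pop2, n_markers, n_total, replicates=1000, seed=1):
    """Mean and standard deviation of FST over random marker subsets of size n_markers."""
    rng = random.Random(seed)
    values = [fst_nei(pop1, pop2, rng.sample(range(n_total), n_markers))
              for _ in range(replicates)]
    return mean(values), stdev(values)


# Toy accessions fixed for different alleles at all 10 markers.
pop1 = [[0] * 10 for _ in range(5)]
pop2 = [[2] * 10 for _ in range(5)]
print(fst_nei(pop1, pop2, range(10)))  # 1.0
```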

PCA of the genetic data was carried out using R v3.2.4 [52] and the *prcomp* function. For the accession-level PCA, the frequencies of each allele at each locus were treated as independent variables, whereas the individual-level PCA used the number of copies of each allele at each locus.
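The two input matrices for *prcomp* described above can be built as in the sketch below (Python is used for consistency with the other examples, whereas the authors used R). It assumes biallelic SNPs coded as copies of one allele and, for the individual-level matrix, no missing calls, since the text does not say how missing data were handled in the PCA. Column centering mirrors the `prcomp` default `center = TRUE`; the decomposition itself is left to a linear algebra library.

```python
from statistics import mean


def individual_matrix(individuals):
    """Individual-level input: copies of allele A and of allele B at each locus.

    individuals: list of genotype lists (0/1/2 copies of allele A; no missing calls assumed).
    """
    return [[value for c in calls for value in (c, 2 - c)] for calls in individuals]


def accession_matrix(accessions):
    """Accession-level input: frequency of allele A and of allele B at each locus.

    accessions: list of accessions, each a list of individuals' genotype lists;
    failed calls (None) are ignored when computing frequencies.
    """
    rows = []
    for individuals in accessions:
        row = []
        for m in range(len(individuals[0])):
            calls = [ind[m] for ind in individuals if ind[m] is not None]
            p = sum(calls) / (2 * len(calls))
            row.extend([p, 1 - p])
        rows.append(row)
    return rows


def center_columns(matrix):
    """Subtract column means, as prcomp does by default before the decomposition."""
    col_means = [mean(col) for col in zip(*matrix)]
    return [[x - mu for x, mu in zip(row, col_means)] for row in matrix]


print(individual_matrix([[0, 2], [2, 2]]))  # [[0, 2, 2, 0], [2, 0, 2, 0]]
```

Because the two allele columns of a locus mirror each other, they carry the same information; including both simply follows the description in the text.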

The software STRUCTURE (v2.3.4) [53,54] was used to explore the data for genetic structuring. It was run with a burn-in of 20,000 iterations followed by 50,000 iterations for parameter estimation, with non-amplifying markers treated as missing data. Each analysis using the admixture model was repeated 10 times for each number of clusters (K = 1 to 10), until the likelihood values for the runs no longer improved. The number of clusters in the dataset was evaluated by calculating ΔK according to Evanno et al. [55]. CLUMPP v1.1.1 [56] was used to compare the results of individual runs and to calculate the similarity coefficient H and the average ancestry matrix. In CLUMPP, the FullSearch, Greedy, and LargeKGreedy algorithms were used for comparing runs with K < 4, 4 ≤ K ≤ 6, and K > 6, respectively. The results were displayed graphically using DISTRUCT v1.1 [57]. STRUCTURE was also used to analyze the subsets of randomly chosen markers (10, 50, 100, and 500 markers, respectively), and the repeatability of the STRUCTURE analysis of 50 randomly chosen markers was assessed using CLUMPP.
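ΔK as described by Evanno et al. [55] can be computed from the ln P(D) values of the replicate STRUCTURE runs. The sketch below uses the mean ln P(D) per K in the second-order difference, a common simplification, and the run values are made up for illustration. The helper that maps K to a CLUMPP search algorithm only restates the rule given in the text.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """ΔK = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using the mean ln P(D) per K.

    lnp_by_k: dict mapping K -> list of ln P(D) values from replicate runs.
    ΔK is undefined for the smallest and largest K tested.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


def clumpp_algorithm(k):
    """Search algorithm per K range, as described in the text."""
    if k < 4:
        return "FullSearch"
    if k <= 6:
        return "Greedy"
    return "LargeKGreedy"


# Made-up ln P(D) values for two replicate runs per K.
runs = {1: [-1000.0, -1002.0], 2: [-800.0, -802.0], 3: [-790.0, -794.0], 4: [-785.0, -791.0]}
delta = evanno_delta_k(runs)
print(max(delta, key=delta.get))  # 2
```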

#### **5. Conclusions**

Our study is a contribution to AEGIS and to the work of avoiding unwanted duplicate holdings. We tested the SNP markers developed for *B. napus* and found that many of them (nearly 6000) were suitable for analyzing genetic structure and identifying duplicates in *B. oleracea*. Of these, a subset of 500 markers is recommended for future large-scale analyses of *B. oleracea* var. *capitata*. Both the SNP data and the morphological data demonstrate the complex relationships among old cabbage cultivars and show that similar accession names do not necessarily mean that accessions are genetically or morphologically similar. This emphasizes that for old cultivars of cross-pollinating species such as cabbage, extra care should be taken when identifying duplicates.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2223-7747/9/8/925/s1: Figure S1: Data for determining the optimal number of clusters (K) in the STRUCTURE analysis of the full SNP dataset. Figure S2: Results of the individual-level principal component analysis (PCA) based on SNP markers. Figure S3: FST values between different accessions based on 1000 subsamples of 10, 50, 100, and 500 SNP markers, respectively. Figure S4: Results from PCA when subsampling 20 SNP markers: (A) individual-level analysis and (B) accession-level analysis. Figure S5: Subsampling of SNP markers for STRUCTURE analysis assuming two genetic clusters (K = 2). Figure S6: Comparison of STRUCTURE analyses of 10 subsets of 50 randomly chosen SNP markers. Table S1: Morphological descriptors with means and standard deviations. Table S2: Pairwise FST values among all investigated accessions. Table S3: Average number of pairwise shared SNP alleles among all investigated accessions. Table S4: Marker subset suggested for the identification of duplicate accessions (Excel file).

*Plants* **2020**, *9*, 925

**Author Contributions:** Conceptualization, A.A. and S.Ø.S.; methodology, J.H., K.A., and A.E.P.; formal analysis, K.A., J.H., and A.E.P.; investigation, A.E.P.; writing—original draft preparation, A.P., J.H., and S.Ø.S.; writing—review and editing, all; visualization, J.H., S.Ø.S., and A.E.P.; project administration, S.Ø.S. and A.A.; funding acquisition, S.Ø.S. and A.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Nordic countries (Norway, Sweden, Denmark, Finland, and Iceland) through the Nordic Council of Ministers.

**Acknowledgments:** We would like to thank Flemming Yndgaard and Jerker Niss at the Nordic Genetic Resource Center for assistance with the visualization of morphological data and with field cultivation, respectively.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Review*

### **Document or Lose It—On the Importance of Information Management for Genetic Resources Conservation in Genebanks**

#### **Stephan Weise \*, Ulrike Lohwasser and Markus Oppermann**

Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466 Seeland, Germany; lohwasse@ipk-gatersleben.de (U.L.); opperman@ipk-gatersleben.de (M.O.) **\*** Correspondence: weise@ipk-gatersleben.de

Received: 15 July 2020; Accepted: 14 August 2020; Published: 18 August 2020

**Abstract:** Genebanks play an important role in the long-term conservation of plant genetic resources and are complementary to the conservation of diversity in farmers' fields and in nature. In this context, documentation plays a critical role. Without well-structured documentation, it is not possible to make statements about the value of a resource, especially with regard to its potential for breeding and research. In particular, comprehensive information management is a prerequisite for the further development of genebank collections. This requires detailed information about the composition of a collection, thus allowing statements about which species and/or regions of origin are under-represented. This task is of strategic importance, especially due to the threats to crop plants and their wild relatives caused by advancing climate change. Both the actual conservation management and the fulfilment of legal obligations depend on information. Hence, documentation units have been established in almost all genebanks worldwide. They all face the challenge that knowledge about genebank accessions must be permanently managed and passed on across generations. International standards such as Multi-Crop Passport Descriptors (MCPD) have been established for the exchange of data between genebanks, and allow the operation of international information systems, such as the World Information and Early Warning System on Plant Genetic Resources for Food and Agriculture (WIEWS), the European Search Catalogue for Plant Genetic Resources (EURISCO) or Genesys.

**Keywords:** documentation; genebank; plant genetic resources; agricultural biodiversity

#### **1. Introduction**

For many centuries, humans have been taking advantage of the plant world and adapting it to their needs. The resulting diversity of useful plants is the main source of food for humans and animals [1]. In addition to the nutritional aspect, plants also supply raw materials for the chemical and pharmaceutical industries and are renewable energy sources [2,3]. Global biodiversity is severely threatened by human intervention, not least against the background of advancing climate change [4]. This also applies to the diversity of crops.

Genebanks play an important role in the long-term conservation of these plant genetic resources [5]. They complement the conservation of diversity in farmers' fields and in nature. There are about 1800 genebanks worldwide, more than 600 of them in Europe [6]. The genetic diversity stored in genebanks can provide new impulses for research and breeding, e.g., by introducing into existing breeding stocks new alleles that have morphologically and physiologically beneficial effects on plant characteristics [7]. Besides maintenance and regeneration, an important task of genebanks is therefore the phenotypic characterisation of accessions [8].

In this context, documentation plays a crucial role [9,10]. Imagine a supermarket whose shelves hold only unlabelled goods, and where only a few people know what is on which shelf. There are no records of where the goods come from, how old they are or how they should be priced. Such a store obviously cannot work. The same holds true for genebank holdings. Without as much information as possible available in a well-structured form, it is not possible to make informed statements about the value of a resource, especially with regard to its breeding and research potential. This information covers many areas, from the data needed to manage collections optimally, through basic genebank data (passport data), to phenotypic and comprehensive genetic data. Apart from the conservation of the accessions themselves, the management of these data is one of the greatest challenges facing genebanks [11,12].

This article aims to provide an overview of the relevant systems and structures for documenting plant genetic resources in genebanks. It describes the development of documentation, the current situation and international cooperation, as well as the data required for long-term, sustainable conservation management. In addition, current needs for integrative information management are outlined on the basis of several requirements for future genebank documentation.

#### **2. Information Required**

For the long-term conservation and use of plant genetic resources, it is necessary to document a large amount of information at various levels, especially for the identification and characterisation of accessions.

#### *2.1. Basic Data*

Basic information on plant genetic resources is contained in the passport data. These data serve in particular to identify the material and include the accession number, the scientific name and information on the origin and acquisition of the material (year of acquisition, donor, collecting mission, collecting site). Ideally, they follow the Multi-Crop Passport Descriptors (MCPD) standard [13,14], which was developed as a uniform, global format. This is in line with the recommendations of the Genebank Standards [15] of the Food and Agriculture Organization of the United Nations (FAO), which were developed by the FAO Commission on Genetic Resources for Food and Agriculture (http://www.fao.org/cgrfa/).
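
To make the passport standard more concrete, the following Python sketch checks a single passport record against a few MCPD descriptors (INSTCODE, ACCENUMB, GENUS, SAMPSTAT, ACQDATE). Which fields it treats as mandatory, and the subset of biological-status codes it accepts, are simplifying assumptions for illustration; the MCPD specification [13,14] remains the authoritative reference.

```python
import re

# Descriptors treated as mandatory in this sketch (an assumption).
REQUIRED = ("INSTCODE", "ACCENUMB", "GENUS")

# A subset of MCPD biological status (SAMPSTAT) codes.
SAMPSTAT = {
    "100": "wild",
    "200": "weedy",
    "300": "traditional cultivar/landrace",
    "400": "breeding/research material",
    "500": "advanced or improved cultivar",
    "999": "other",
}

# MCPD dates are YYYYMMDD; an unknown month and/or day is written as dashes.
MCPD_DATE = re.compile(r"^\d{4}(\d{2}(\d{2}|--)|----)$")


def validate_passport(record: dict) -> list:
    """Return the problems found in one passport record (empty list if none)."""
    problems = []
    for field in REQUIRED:
        if not record.get(field):
            problems.append(f"missing {field}")
    status = record.get("SAMPSTAT")
    if status and status not in SAMPSTAT:
        problems.append(f"unknown SAMPSTAT {status}")
    acqdate = record.get("ACQDATE")
    if acqdate and not MCPD_DATE.match(acqdate):
        problems.append(f"malformed ACQDATE {acqdate}")
    return problems


# Hypothetical record: institute code and accession number are invented.
record = {"INSTCODE": "XYZ001", "ACCENUMB": "A-1", "GENUS": "Hordeum",
          "SAMPSTAT": "300", "ACQDATE": "1968----"}
print(validate_passport(record))  # []
```

A check of this kind is most useful at data entry, before records are passed on to aggregating systems.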

The geographical origin of a genebank accession (particularly in connection with environmental data) can provide information on possible adaptations to biotic/abiotic stress factors. Such data should be complemented by information on the type of material (biostatus, e.g., wild form, landrace, etc.). Other important data include phenotypic characterisation of the individual accession, including morphological and agronomical traits.

This basic information helps to identify individual samples of a genetic resource and to estimate their value, especially their potential for breeding, but also for research.

#### *2.2. Stable and Unique Identifiers*

Genebanks have existed for many decades. Over such a long period, the conditions under which genebanks operate have changed and continue to change. On the one hand, changes result from technological progress, which means that both the type of storage and the data themselves must be adapted. On the other hand, social, political and economic changes lead to organisational changes in genebanks. As a result, the description and use of plant genetic resources are also subject to constant change, and an accession may be given different identifiers over time. In addition, exchange between genebanks is common, both to provide conservation security through targeted safety duplication and to complement the collections of individual countries. Moreover, material is supplied to researchers and breeders, as well as to other users.

Before the introduction of information systems spanning several collections, a local, sometimes temporary, unambiguous identification of an accession was sufficient. However, changing identifiers produce chains of identifiers for a single accession and make it more difficult to trace transferred material. Even within a collection, unambiguity could not always be guaranteed, for example, when the same material received multiple accession numbers over the course of time. Identical identifiers for different accessions in aggregating information systems also pose a problem [16]. This led directly to the idea of introducing a system that would provide unique and stable identifiers for genebank accessions [17]. Of the various approaches that can be considered for this purpose, digital object identifiers (DOIs) appear to be the most common. The idea of the DOI dates back to the 1990s [18]. A DOI is a unique and persistent identifier of a (digital) object, with which metadata describing the object are associated. A DOI is resolved through the Handle System, for example, via the resolver at doi.org. The DOI system was introduced in 2000 and is maintained by the International DOI Foundation (IDF) [19]. At its core is a central database containing the URL at which the referenced object is currently available. The organisation that registered the DOI is responsible for updating the database entry whenever the metadata change. Not least owing to the support of the Secretariat of the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA), DOIs have become a quasi-standard for plant genetic resources material [20]. An undeniable advantage of DOIs is their wide acceptance in the scientific community. However, it should be noted critically that accessions of plant genetic resources are not immutable digital objects: the data describing them are generally subject to changes and additions.

Moreover, it is an indispensable task of the documentation units of the genebanks to map the historically used identifiers of the accessions to these new, unique identifiers.
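
Mapping historical identifiers to stable ones amounts to following a chain of successors until a current identifier is reached and then looking up its DOI. The sketch below assumes two hypothetical lookup tables kept by a documentation unit; the identifiers and the DOI in the example are invented, and only the resolver address https://doi.org/ is real.

```python
DOI_RESOLVER = "https://doi.org/"


def resolve_identifier(identifier, successor, doi_of):
    """Follow historical identifiers to the current one; return (current, DOI or None).

    successor: maps an old identifier to the identifier that replaced it.
    doi_of:    maps current identifiers to their DOI.
    Both tables are hypothetical structures kept by a documentation unit.
    """
    seen = set()
    current = identifier
    while current in successor:
        if current in seen:  # guard against inconsistent, circular records
            raise ValueError(f"circular identifier chain at {current}")
        seen.add(current)
        current = successor[current]
    return current, doi_of.get(current)


def doi_url(doi):
    """Build the resolver URL for a DOI."""
    return DOI_RESOLVER + doi


successor = {"OLD-7": "TMP-3", "TMP-3": "ACC-12"}  # hypothetical chain
doi_of = {"ACC-12": "10.99999/EXAMPLE-12"}          # invented DOI
current, doi = resolve_identifier("OLD-7", successor, doi_of)
print(current, doi_url(doi))  # ACC-12 https://doi.org/10.99999/EXAMPLE-12
```

Keeping the full successor table, rather than overwriting old identifiers, is what allows material received under a historical number to be traced later.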

#### *2.3. Data for Collection Development*

What is needed to recognise which parts of a collection are over- or under-represented? First and foremost, the composition of a collection naturally depends on its exact purpose. Beyond that, information on individual accessions that is as comprehensive as possible is an essential prerequisite for the further development of genebank collections. Using the botanical determinations of accessions, a genebank collection can be examined to determine the extent to which the gene pool of a genus is completely represented or which species/subspecies are missing. Diversity within species/subspecies should also be adequately represented. Adding geographical information makes it possible to specify more precisely the regions from which material is missing and should be collected. Such detailed information on the composition of a collection is necessary to perform gap analyses [21–23]. In addition, precise geographical data can be used to draw conclusions about ecological conditions. This task is of strategic importance, especially because of the threat posed to crops and their wild relatives by progressing climate change [24].
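
A basic gap analysis of this kind can be sketched as set arithmetic over passport data: which species of the gene pool have no accession at all, and which target regions are unrepresented for the species that are present. The reference lists of species and regions are hypothetical inputs supplied by a curator; published gap analyses [21–23] additionally draw on ecogeographic and occurrence data.

```python
from collections import defaultdict


def gap_analysis(accessions, genepool_species, target_regions):
    """Find gene-pool species without accessions and, for species that are
    present, target regions without accessions.

    accessions: iterable of (species, region) pairs from passport data.
    genepool_species, target_regions: hypothetical reference lists.
    """
    regions_by_species = defaultdict(set)
    for species, region in accessions:
        regions_by_species[species].add(region)

    present = set(regions_by_species)
    missing_species = sorted(set(genepool_species) - present)

    missing_regions = {}
    for species in genepool_species:
        if species in present:
            gaps = sorted(set(target_regions) - regions_by_species[species])
            if gaps:
                missing_regions[species] = gaps
    return missing_species, missing_regions


holdings = [("Species A", "Europe"), ("Species A", "Asia"), ("Species B", "Asia")]
print(gap_analysis(holdings, ["Species A", "Species B", "Species C"],
                   ["Europe", "Asia", "Africa"]))
# (['Species C'], {'Species A': ['Africa'], 'Species B': ['Africa', 'Europe']})
```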

Comparing data on accessions also makes it possible to identify potential duplicates within a collection. However, this task is not trivial. A reliable statement can only be made by jointly analysing phenotypic, genotypic and passport data in combination with comparative cultivation trials [25,26].
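
As a first screening step only, candidate duplicates can be grouped on passport data. The sketch below assumes that accessions sharing genus, species, country of origin (ORIGCTY) and a normalised accession name (ACCENAME) are worth a closer look; this key and the simplistic name normalisation are assumptions, and any group it returns still needs the joint phenotypic, genotypic and passport analysis described above before a duplicate can be confirmed.

```python
import re
from collections import defaultdict


def normalise_name(name):
    """Lower-case and keep only ASCII letters and digits, so that 'Alpha-2'
    and 'ALPHA 2' compare equal (simplistic: non-ASCII letters are dropped)."""
    return re.sub(r"[^0-9a-z]", "", name.lower())


def candidate_duplicates(records):
    """Group accession numbers whose genus, species, country of origin and
    normalised name coincide; records without a name are skipped."""
    groups = defaultdict(list)
    for r in records:
        name = normalise_name(r.get("ACCENAME", ""))
        if not name:
            continue
        key = (r["GENUS"], r["SPECIES"], r.get("ORIGCTY", ""), name)
        groups[key].append(r["ACCENUMB"])
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


records = [  # hypothetical passport records
    {"ACCENUMB": "A1", "GENUS": "Hordeum", "SPECIES": "vulgare", "ORIGCTY": "DEU", "ACCENAME": "Alpha-2"},
    {"ACCENUMB": "B7", "GENUS": "Hordeum", "SPECIES": "vulgare", "ORIGCTY": "DEU", "ACCENAME": "ALPHA 2"},
    {"ACCENUMB": "C3", "GENUS": "Hordeum", "SPECIES": "vulgare", "ORIGCTY": "SWE", "ACCENAME": "Alpha-2"},
]
print(candidate_duplicates(records))  # [['A1', 'B7']]
```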

By analysing combined data from different genebanks, it is also possible to check which accessions are maintained in other collections and could, if necessary, be obtained from them [27,28]. Such comparisons are an important means of expanding a collection in a targeted manner through exchange with other genebanks, and they can also make genebank work more effective. One initiative that pursues this goal is A European Genebank Integrated System (AEGIS, https://www.ecpgr.cgiar.org/aegis/) of the European Cooperative Programme for Plant Genetic Resources (ECPGR) [6,29]. AEGIS aims to identify, among the accessions held in different genebanks, the most appropriate accession (MAA). Genebanks participating in AEGIS undertake to assume responsibility for conserving these accessions and to maintain them in the long term according to uniform standards.
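
Once gaps are known, finding potential sources is a lookup over combined holdings. In the following sketch, the holdings table (institute code mapped to reported taxa) is a hypothetical structure that might, for example, be derived from an aggregated export; it does not reproduce the data format of EURISCO, Genesys or AEGIS, and the choice of a most appropriate accession would involve further criteria not modelled here.

```python
def find_providers(missing_taxa, holdings):
    """For each taxon missing from one's own collection, list the institutes
    that report holding it.

    holdings: maps an institute code to a set of taxa (hypothetical structure).
    """
    return {
        taxon: sorted(inst for inst, taxa in holdings.items() if taxon in taxa)
        for taxon in missing_taxa
    }


holdings = {"AAA001": {"Species C", "Species D"}, "BBB002": {"Species C"}}  # invented codes
print(find_providers(["Species C", "Species E"], holdings))
# {'Species C': ['AAA001', 'BBB002'], 'Species E': []}
```

An empty list, as for Species E, points to material that is not available from the other collections considered and may have to be collected.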

#### *2.4. Securing the Legal Status of Acquisitions*

There are international agreements governing the conservation and sustainable use of plant genetic resources as well as access and benefit sharing (ABS), in particular, the Convention on Biological Diversity (CBD, https://www.cbd.int/), which entered into force in 1993, and its supplementary agreement, the Nagoya Protocol (Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from Their Utilization to the Convention on Biological Diversity, https://www.cbd.int/abs/about/), which entered into force in 2014. To comply with these agreements, it is essential to document the origin of genebank accessions, the date of their inclusion in a collection and any existing collecting permits.

In accordance with the CBD, the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA, http://www.fao.org/plant-treaty/) came into force in 2004 and the signatory states committed themselves to conserve, characterise and evaluate plant genetic resources and to ensure their sustainable use. The main component of the Treaty is the Multilateral System of Access and Benefit Sharing (MLS), which regulates facilitated access to plant genetic resources that have been included in the MLS and the equitable sharing of the resulting benefits. A Standard Material Transfer Agreement (SMTA, http://www.fao.org/plant-treaty/areas-of-work/the-multilateral-system/the-smta/en/) was created for this purpose. The reporting obligations attached to the SMTA are usually part of the tasks of the documentation units of the genebanks.

#### *2.5. Material Management Data*

The actual conservation management in the genebanks is also dependent on information, e.g., on storage conditions and location, and the quality and quantity of seeds and other plant propagules at accession level, in order to enable efficient and effective genebank management. This includes information such as germinability or storage quantity as well as information on health tests that have been carried out or are pending. In addition, information on the regeneration of the individual accessions must be documented as well as how and where the material is stored (e.g., active or base collection, cold store, shelf, safety duplication sites) [15]. Depending on size and information technology equipment, the solutions of the individual genebanks differ [12,30].
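
Material management data become actionable when they trigger tasks such as regeneration. The sketch below flags a seed lot when its germination has fallen below a fraction of the initial value or when the remaining stock covers fewer than a given number of sowings. Both thresholds are illustrative parameters, not values taken from this article; the applicable values should come from the FAO Genebank Standards [15] and crop-specific protocols.

```python
from dataclasses import dataclass


@dataclass
class SeedLot:
    accession: str
    initial_germination: float  # germination (%) when the lot was stored
    current_germination: float  # germination (%) at the latest test
    seeds_in_stock: int
    seeds_per_sowing: int       # seeds needed for one regeneration sowing


def regeneration_reasons(lot, min_viability_ratio=0.85, min_sowings=3):
    """Return why a lot should be regenerated (empty list: no action).
    The default thresholds are illustrative assumptions."""
    reasons = []
    if lot.current_germination < min_viability_ratio * lot.initial_germination:
        reasons.append("viability")
    if lot.seeds_in_stock < min_sowings * lot.seeds_per_sowing:
        reasons.append("quantity")
    return reasons


print(regeneration_reasons(SeedLot("ACC-1", 96.0, 80.0, 5000, 200)))  # ['viability']
```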

#### *2.6. Data to Protect Against Losses*

How do genebanks generally try to protect plant genetic resources from loss? A fundamental idea of genebanks is protection against the loss of accessions or entire collections, e.g., in case of war or catastrophe. The FAO Genebank Standards [15] recommend achieving this through the deliberate duplication of unique accessions in genebanks located in geographically distant areas. In recent years, the Svalbard Global Seed Vault (https://www.croptrust.org/our-work/svalbard-global-seed-vault/) was introduced as an additional safety backup system [31–33]. That this is justified is shown by the example of the genebank of the International Center for Agricultural Research in the Dry Areas (ICARDA), which was reconstructed in Morocco and Lebanon from safety duplicates (https://www.seedvault.no/news/withdrawal-of-icarda-aleppo-seeds-accomplished/). However, no such mechanism has yet been established for the associated data. Nor does it make sense to store data as a black box, as is possible with the physical material. If material is lost, information about where safety duplicates are located, and about who supplied and who received the material (donors and recipients of samples), is the key to finding it again. Unfortunately, this information is held primarily by the genebank itself, which could be lost. Usually, only the basic passport data are shared along with safety duplicates, and in some cases not even these. In addition, not all genebanks have a strategy or the capacity to secure their data beyond this. Similar to the idea of the Global Seed Vault, a global data safe or corresponding, connected repositories would be one way to meet this challenge. Of course, this must not result in losing the link to the physical material. The International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org/) shows that such an approach can work [34,35].
The INSDC dates back to the 1980s and ensures the continuous synchronisation of DNA and RNA sequence data from the three major international systems, GenBank (https://www.ncbi.nlm.nih.gov/Genbank/), the DNA Data Bank of Japan (DDBJ, https://www.ddbj.nig.ac.jp/) and the European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena).
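
The recovery information described above can be thought of as a query that must still be answerable after the genebank's own systems are lost. The following sketch gathers, for one accession, the safety-duplicate sites, the donor and the recipients of samples; the three input tables are hypothetical and stand in for whatever a data safe or connected repositories would actually hold.

```python
def recovery_sources(accession, safety_duplicates, distributions, donors):
    """List where a lost accession might be recovered from.

    safety_duplicates: accession -> list of sites holding safety duplicates.
    distributions:     list of (accession, recipient) pairs for samples sent out.
    donors:            accession -> donor of the original material.
    All three tables are hypothetical structures for illustration.
    """
    return {
        "safety_duplicates": sorted(safety_duplicates.get(accession, [])),
        "donor": donors.get(accession),
        "recipients": sorted({rcpt for acc, rcpt in distributions if acc == accession}),
    }


dup = {"ACC-1": ["Svalbard", "Site B"]}
dist = [("ACC-1", "Breeder X"), ("ACC-2", "Lab Y"), ("ACC-1", "Lab Y"), ("ACC-1", "Breeder X")]
don = {"ACC-1": "Donor Z"}
print(recovery_sources("ACC-1", dup, dist, don))
# {'safety_duplicates': ['Site B', 'Svalbard'], 'donor': 'Donor Z', 'recipients': ['Breeder X', 'Lab Y']}
```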

#### **3. Information Management for Genetic Resources Conservation**

#### *3.1. Documentation Development*

The main intention of genebanks is to conserve collections of plant genetic resources for posterity. This means that the documentation of the material must also be ensured across generations. In order to meet the resulting requirements, documentation units have been established in almost all genebanks worldwide. They all face the challenge that the knowledge about the genebank material must be continuously managed, supplemented and passed on across technical and personnel generations. The knowledge must not be tied to individual persons alone. Some genebanks are many decades old, which means that there have been several changes of personnel during this period and possibly also one or more predecessor institutions. Ideally, however, there should still be complete documentation of all accessions of plant genetic resources acquired, maintained and distributed, covering the entire period since the foundation of the genebank.

Historically, the documentation structures and management methods of genebanks in their first decades were influenced by experience from botanical gardens and focused on breeding activities. Accordingly, there was collection-oriented documentation of origins, often in the form of collecting mission reports, and practice-oriented documentation of the collections in inventory books and cultivation records. The taxonomic classification of the accessions was an important ordering criterion. Many genebanks regularly produced a catalogue, a so-called Index Seminum, to make their holdings publicly accessible. Depending on the size and focus of the genebank, more or less complex index card systems and registries were established. These were supplemented by archives of correspondence and collecting mission reports, which were often only weakly linked to collection management. These document holdings were difficult to search and exploit. The first electronic systems were therefore introduced to make the information easier to search, filter and present; owing to their limited storage capacity, they were run in parallel with paper documentation. With the advancing technical development of databases and spreadsheet programs, different IT solutions were created depending on the size, focus and organisational form (centralised or decentralised) of the genebank, and these increasingly replaced paper documentation [36–40].

Many larger genebanks have implemented in-house genebank information systems in recent years, e.g., GENIS [41], GBIS [42] or GBIMS [43]. In addition, genebanks have also cooperated, sometimes across national borders. For example, the management system SESTO (https://sesto.nordgen.org/) was developed for joint use by the Nordic countries and was operated by the Nordic Genetic Resource Centre (NordGen) in Alnarp, Sweden. SESTO was also used by the genebanks of the Baltic countries for documentation purposes. However, a large number of smaller collections still lack these capabilities and manage their data with simpler means, such as MS Excel lists [44]. For these collections, the freely available system GRIN-Global (https://www.grin-global.org/) [45] is increasingly establishing itself as an alternative to proprietary systems, as seen, e.g., in Barata et al. [30]. This system was originally developed for the Germplasm Resources Information Network (GRIN) of the United States Department of Agriculture (USDA), and since 2011 an open-source version has been made available jointly by the Global Crop Diversity Trust, Bioversity International and the USDA's Agricultural Research Service. In addition to basic passport data, GRIN-Global can manage phenotypic data. The system also supports the maintenance of material management data, e.g., germination rates or existing storage quantities, which can be curated via a corresponding interface. Furthermore, GRIN-Global has an online search and ordering system. This represents a major step forward for structured and sustainable documentation. The Genebank Information System (GBIS) [42] of the German Federal ex situ Genebank for Agricultural and Horticultural Crop Species follows a similar approach. Like GRIN-Global, GBIS can manage different types of data (passport and phenotypic data, management data, plant health tests, germination rates, orders, etc.).
An online search and ordering system is also publicly available. However, in contrast to GRIN-Global, GBIS is explicitly designed for use in a single genebank and is fully integrated into that genebank's specific work processes. In this context, GRIN-Global and GBIS serve as examples of two different philosophies in the development of genebank information systems.

#### *3.2. Current Situation*

To provide an overview of the current state of information management in genebanks, the authors conducted an ad hoc survey among the National Inventory Focal Points of the European Search Catalogue for Plant Genetic Resources (EURISCO) network (see below). They were asked whether they could provide information on (1) how the genebanks in their respective countries manage their data and (2) which data are managed in addition to basic passport data. Of the 40 persons contacted, 30 replied. Although this survey is not representative and certainly reflects only part of the overall situation, it provides a basic overview. A more extensive survey, aimed in particular at prospective approaches to the future development of documentation systems in the individual countries, would be a logical next step, but was not feasible within the scope of this article.

As expected, information management in genebanks is very diverse. Because data from different domains are often managed in different systems, respondents could give multiple answers. Forty percent of the respondents stated that genebank-specific (in-house) information systems are used. In July 2020, the Nordic and Baltic countries began using a joint system based on GRIN-Global (Nordic Baltic Genebanks Information System (NBIS), https://www.nordic-baltic-genebanks.org/gringlobal/). Thus, 28% in total use GRIN-Global, while only 3% use other systems shared by several genebanks. More than 40% use MS Access or FoxPro for data management. The most widely used tool is MS Excel (59%). Thirty-four percent still document in paper form (Figure 1).

**Figure 1.** Overview of the management systems used in various genebanks. Different systems are often used depending on the type of data. Therefore, multiple answers were possible.

All respondents indicated that they manage passport data. This was to be expected because this is the basic information for genebanks. In addition, it was stated that more than 90% also hold phenotypic data and information on seed stocks. Sixty-nine percent also manage data on seed orders.

#### *3.3. International Collaboration*

Despite ever better IT support through genebank information systems, these systems remained largely isolated from each other. Since the 1980s, efforts have been made to compile data on accessions of one or more crop species held in genebanks in a region, or even worldwide, into databases, the so-called Central Crop Databases (CCDBs) [46]. Two of the earliest CCDBs are the European Barley Database [47] and the European Prunus Database [48]. The CCDBs have strengthened cooperation between genebanks and were made possible by networking genebanks and collections. The CCDBs also aimed to make genebank material more accessible to users and to identify possible duplicates between individual collections. However, these goals could only be achieved to a limited extent, mainly because of the limited availability of these databases, low data quality and missing data [49]. Another major challenge was the long absence of uniform standards for describing and exchanging passport data. In 1997, a first draft of the Multi-Crop Passport Descriptors (MCPD) was presented [50], which was subsequently developed into a globally accepted standard [13,14].

Aggregating platforms and databases for a cross-genebank search for suitable accessions have also been and are being developed. MCPD and Darwin Core [51,52] have been established for the exchange of passport data between genebanks and these platforms. It is only through such standards that it is possible to feed and operate international information systems such as the World Information and Early Warning System on Plant Genetic Resources for Food and Agriculture (WIEWS, http://www.fao.org/wiews/), EURISCO (http://eurisco.ecpgr.org/), Genesys (https://www.genesys-pgr.org/) or the Global Biodiversity Information Facility (GBIF, https://www.gbif.org/), which combine the information as homogeneously as possible and make it available beyond the boundaries of the individual genebank collection.
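
Exchange standards work through crosswalks between field sets. The following sketch converts a few MCPD fields to Darwin Core terms, including reformatting an MCPD date (YYYYMMDD, with unknown parts written as dashes) as an ISO 8601 date for eventDate. The mapping is a simplified, partial assumption for illustration, not an official MCPD-to-Darwin Core mapping; for instance, INSTCODE holds an FAO WIEWS institute code, which is not necessarily what a Darwin Core publisher would put in institutionCode.

```python
# Simplified, partial crosswalk (an illustrative assumption, not an official mapping).
MCPD_TO_DWC = {
    "INSTCODE": "institutionCode",
    "ACCENUMB": "catalogNumber",
    "GENUS": "genus",
    "SPECIES": "specificEpithet",
    "COLLSITE": "locality",
    "DECLATITUDE": "decimalLatitude",
    "DECLONGITUDE": "decimalLongitude",
}


def mcpd_date_to_iso(value):
    """Convert an MCPD date (YYYYMMDD, unknown parts as '-') to ISO 8601,
    keeping only the known leading components."""
    year, month, day = value[:4], value[4:6], value[6:8]
    if len(value) != 8 or "-" in year:
        raise ValueError(f"unsupported MCPD date: {value}")
    if "-" in month:
        return year
    if "-" in day:
        return f"{year}-{month}"
    return f"{year}-{month}-{day}"


def to_darwin_core(mcpd):
    """Translate the mapped MCPD fields; unmapped fields are dropped."""
    dwc = {MCPD_TO_DWC[k]: v for k, v in mcpd.items() if k in MCPD_TO_DWC}
    if "GENUS" in mcpd and "SPECIES" in mcpd:
        dwc["scientificName"] = f"{mcpd['GENUS']} {mcpd['SPECIES']}"
    if mcpd.get("COLLDATE"):
        dwc["eventDate"] = mcpd_date_to_iso(mcpd["COLLDATE"])
    return dwc


print(mcpd_date_to_iso("19850715"), mcpd_date_to_iso("198507--"))  # 1985-07-15 1985-07
```

Fields without a counterpart in the crosswalk (here, e.g., SAMPSTAT) are silently dropped, which is one reason why crop-specific information can be lost when passport data pass through general biodiversity standards.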

While the intention of WIEWS is to provide periodic, country-driven assessments of the plant genetic resources conservation status for FAO, the European Search Catalogue for Plant Genetic Resources (EURISCO) provides detailed accession-specific information on the majority of European collections [53]. Of the approximately 600 European collections of plant genetic resources, more than 400 provide their accession-level data to EURISCO. Information on the geographical location of the genebanks can be found on the EURISCO website. The development of EURISCO started in 1999. For this purpose, a network of national inventories covering 43 countries was successively established. These national inventories bring together the data from the respective collections of their countries and then make them available to the EURISCO information system in a coordinated manner. EURISCO currently documents more than two million genebank accessions, the data of which are regularly updated. The MCPD standard plays a key role here.

Another approach is the Genesys information system, which has been under development since 2008 and builds on the System-wide Information Network for Genetic Resources (SINGER). SINGER was an integrated system for the management and exchange of data on plant genetic resources held in the genebanks of the Consultative Group on International Agricultural Research (CGIAR, https://cgiar.org/). Originally developed in 1994, it was put on a new technological basis with Genesys. In addition, EURISCO as the European hub and the US GRIN system as the North American hub regularly feed their data into Genesys. The University of Queensland's Crop Trait Mining Informatics Platform is a source of additional Australian data [54]; it aims to make information available through the existing Genesys system to support the development and use of plant genetic resources. Both EURISCO and Genesys comprise passport data as well as phenotypic data.

The above-mentioned GBIF is both an international network and an infrastructure designed to make data on global biodiversity freely and permanently available. For this purpose, standards and tools for the exchange of information are provided. With its all-encompassing approach to aggregating data from all areas of biodiversity, GBIF holds a special position. In the plant sector, this network mainly includes data from natural history collections, such as botanical gardens, herbaria and other biodiversity databases, but also includes data on plant genetic resources, which are made available either via aggregators, such as EURISCO, or directly through the genebanks.

The systems just described have made a significant contribution to standardising the documentation of plant genetic resources and improving international cooperation. They make it easier to find information on accessions maintained in a large number of genebanks. In addition, they provide positive impulses for coordinating the conservation of collections. In particular, the European approach AEGIS is being promoted as a virtual European genebank and aims to ensure the efficient conservation of genebank material according to uniform standards.

In general, networks at the regional and/or international level play an outstanding role in developing the above approaches. They contribute to bundling existing strengths and resources in order to master common challenges. In this context, the European Cooperative Programme for Plant Genetic Resources (ECPGR, https://www.ecpgr.cgiar.org/) should be mentioned, under which a large number of Central Crop Databases as well as the EURISCO system were developed. From the field of natural science collections, the Distributed System of Scientific Collections (DiSSCo, https://www.dissco.eu/) approach should be mentioned here. DiSSCo aims to curate and make accessible the holdings of European natural science collections according to uniform criteria. This approach could provide additional impulses to advance the international networking of genebanks and their documentation.

In the context of international cooperation, reference should be made once again to the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA, see above). Article 17 of the Treaty sets out the vision of a global information system. Since 2015, a Global Information System (GLIS) for plant genetic resources has been under development on the basis of existing information systems. The goal of GLIS is to provide a global entry point for knowledge and information to support the conservation, management and use of plant genetic resources (http://www.fao.org/plant-treaty/areas-of-work/global-information-system/). Besides the GLIS DOI portal (https://ssl.fao.org/glis/) of the Treaty Secretariat, WIEWS, Genesys, EURISCO and GRIN-Global are major components of GLIS.

Successful international cooperation, including in the field of documentation, supports the achievement of the Sustainable Development Goals (SDG, http://www.fao.org/sustainable-development-goals/) adopted by the 193 member states of the United Nations.

Finally, it is worth mentioning the international DivSeek network (https://divseekintl.org/), which is a worldwide collaboration to support the creation, integration and exchange of data on plant genetic resources. A number of working groups have been established for this purpose.

#### **4. Challenges**

As already mentioned, genebanks manage more than just passport data and material management data on accessions. For plant genetic resources, however, there are no generally accepted standards for capturing and exchanging data other than passport data [55]. This is a particular challenge for phenotypic and genotypic data.

Phenotypic data are collected in genebanks for various reasons. On the one hand, they provide important information for better exploitation of the collection material. On the other hand, they support collection management, for example, by helping to ensure the quality of seed multiplication. Many genebanks record a range of phenotypic and agronomic traits during each multiplication cycle, both to characterise the material and to detect potential mixing or swapping of material. This is particularly important for cross-pollinated species. Initiatives to harmonise the collection of phenotypic data in genebanks have been underway since the late 1970s. The IPGRI/Bioversity descriptor lists developed for different crop species are a good example, e.g., [56–58]. However, these lists never achieved general acceptance. In many genebanks, the phenotyping of material is based on such lists, but they have often been further developed and adapted to local practice. As a result, all data collected in this way are of limited comparability. In addition, there are a large number of scientific experiments, not carried out by the genebanks themselves, in which phenotypic data are collected from genebank material. The European information system EURISCO (see above) also collects phenotypic data [53]. Because of the problems just described, it was decided not to standardise the data themselves, but only the exchange format. This is a minimum consensus format that contains only the fields that are absolutely necessary. The idea was first to gather a critical mass of data, on the basis of which a discussion with providers and users about harmonising traits and methods would be worthwhile [59]. In this context, current approaches aim to improve the comparability and traceability of phenotypic data, e.g., through recommendations for more extensive metadata documentation such as the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) [55,60,61]. Mapping traits and methods to ontology terms, such as those of the Crop Ontology [62,63], is also promising.
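
The benefit of ontology mapping can be illustrated with a minimal observation record. In the sketch below, the record fields, the curator-maintained lookup table and the term identifier CO_TERM:0001 are all hypothetical; they do not reproduce the EURISCO exchange format, MIAPPE or actual Crop Ontology identifiers. The point is that observations recorded under different local trait names become comparable once they share a term, while unmapped observations are reported rather than silently merged.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """Hypothetical minimal record for one phenotypic observation."""
    accession: str
    trait: str                  # trait name as used by the data provider
    method: str                 # how the trait was measured or scored
    value: str
    term: Optional[str] = None  # ontology term, once mapped


def map_to_terms(observations, term_for):
    """Attach ontology terms via a curator-maintained table
    term_for[(trait, method)] -> term id; return (mapped, unmapped)."""
    mapped, unmapped = [], []
    for obs in observations:
        term = term_for.get((obs.trait, obs.method))
        if term is None:
            unmapped.append(obs)
        else:
            mapped.append(replace(obs, term=term))
    return mapped, unmapped


obs = [Observation("A1", "plant height", "ruler, cm", "95"),
       Observation("B2", "height_cm", "ruler, cm", "90"),
       Observation("C3", "awn colour", "visual", "black")]
table = {("plant height", "ruler, cm"): "CO_TERM:0001",  # hypothetical term id
         ("height_cm", "ruler, cm"): "CO_TERM:0001"}
mapped, unmapped = map_to_terms(obs, table)
print([o.accession for o in unmapped])  # ['C3']
```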

Platforms for combining and analysing the data from the different domains are also essential, and can be adapted to the needs of projects or communities. A promising and already successfully used approach is the platform Germinate, developed by the James Hutton Institute, which is used in various projects to represent their data [64].

In addition to phenotypic data on genebank collections, genotypic data are increasingly coming into focus. Genotypic data can help to better exploit the treasures stored in genebanks [65,66]. Against this background, reference should be made here to pilot projects, which aim to exploit entire collections on a molecular level. For example, the BRIDGE project carried out the genotyping of the entire barley collection with more than 20,000 accessions of the German Federal ex situ Genebank for Agricultural and Horticultural Crop Species [67]. The analysis of such genome-wide genotyping-by-sequencing data provides the basis for gene annotation, marker-assisted selection or a better understanding of the population structures of globally domesticated crops, to name a few. In addition, molecular data can also be used to improve the curation of genebank collections, for example, with regard to potential duplication [68].
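
For duplicate screening with genotyping-by-sequencing data, one simple measure is identity by state: the share of markers with identical calls among markers scored in both accessions. The sketch assumes genotypes coded as allele dosages (0, 1, 2) with None for missing calls and uses an arbitrary threshold; flagged pairs are candidates only and, as with passport-based screening, need confirmation.

```python
def identity_by_state(g1, g2):
    """Share of identical calls among markers called in both accessions;
    None marks a missing call. Returns None if no marker is shared."""
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        return None
    return sum(a == b for a, b in shared) / len(shared)


def flag_pairs(genotypes, threshold=0.98):
    """List accession pairs whose identity reaches a (hypothetical) threshold."""
    names = sorted(genotypes)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ibs = identity_by_state(genotypes[a], genotypes[b])
            if ibs is not None and ibs >= threshold:
                flagged.append((a, b, ibs))
    return flagged


genotypes = {"A": [0, 1, 2, 2, None], "B": [0, 1, 2, 2, 1], "C": [2, 1, 0, 2, 1]}
print(flag_pairs(genotypes))  # [('A', 'B', 1.0)]
```

Pairwise comparison scales quadratically with collection size, so for collections of tens of thousands of accessions, such as the barley collection mentioned above, more efficient clustering approaches would likely be needed.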

Moreover, such approaches can contribute significantly to the long-term development of traditional genebanks into biodigital resource centres, i.e., integrated centres which, in addition to the actual plant genetic resources, also provide a large amount of associated information from various data domains and thus enable better and more targeted access to the material [26,69]. In line with this objective, and building on first experiences, e.g., from the above-mentioned BRIDGE project, a number of further research projects with international participation are currently under way. Examples are the Activated GEnebank NeTwork (AGENT, https://www.agent-project.eu/) and Intelligent Collections of Food Legumes Genetic Resources for European Agrofood Systems (INCREASE, https://increase-h2020.eu/) projects.

However, this welcome development also raises new challenges. Genotyping requires the generation of single-seed descent (SSD) lines, but depending on the type of accession, these SSD lines reflect only a selection from the original accession. This is particularly problematic in the case of populations. Strictly speaking, the SSD lines would have to be preserved as precision collections in addition to the original accessions. A further challenge for documentation is that the SSD lines are not explicitly represented in the data either. For example, it is doubtful whether an SSD line derived from a landrace can still be called a landrace, since single-seed descent largely removes the character of a landrace. Conversely, data obtained by genotyping an SSD line cannot be transferred to the entire accession. The same applies to phenotypic data, which are not necessarily valid for both the original accession and the SSD line created from it. If, for capacity reasons, not all SSD lines can be preserved, data collected from them should not be assigned to the original accessions without comment.
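One way to avoid silently transferring data between an SSD line and its source accession is to record the line as a separate material with an explicit link to its parent. The minimal Python data model below is a hypothetical sketch (class and function names are ours, not taken from any genebank information system): observations are attached to the material actually measured, and data from SSD lines are returned for the parent accession only on request and with a provenance label.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Material:
    material_id: str
    parent_id: Optional[str] = None  # for SSD lines: the accession they were derived from
    is_ssd_line: bool = False

@dataclass(frozen=True)
class Observation:
    material_id: str  # the material that was actually genotyped or phenotyped
    trait: str
    value: str

def observations_for_accession(accession_id: str,
                               materials: Dict[str, Material],
                               observations: List[Observation],
                               include_derived: bool = False) -> List[Tuple[Observation, str]]:
    """Return (observation, provenance) pairs for an accession.

    Data measured on SSD lines are returned only when requested and are
    always labelled as derived, never reassigned to the accession itself."""
    derived = {m.material_id for m in materials.values()
               if m.is_ssd_line and m.parent_id == accession_id}
    result = []
    for obs in observations:
        if obs.material_id == accession_id:
            result.append((obs, "original accession"))
        elif include_derived and obs.material_id in derived:
            result.append((obs, f"SSD line {obs.material_id} derived from {accession_id}"))
    return result

# Hypothetical identifiers
materials = {"ACC1": Material("ACC1"),
             "ACC1-SSD1": Material("ACC1-SSD1", parent_id="ACC1", is_ssd_line=True)}
observations = [Observation("ACC1", "plant height (cm)", "95"),
                Observation("ACC1-SSD1", "GBS genotype", "run-01")]
for obs, provenance in observations_for_accession("ACC1", materials, observations, include_derived=True):
    print(obs.trait, "|", provenance)
```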

#### **5. Conclusions**

In this article, an overview was given of the most important topics, systems and structures of the documentation of plant genetic resources in genebanks. Furthermore, existing challenges were described.

In the context of the conservation of plant genetic resources, structured documentation of the material is essential. This includes a variety of basic information, such as origin or legal status, but also data on conservation management, such as storage quantities and germinability. In addition, phenotypic and agronomic characterisation play an important role: only in this way is it possible to exploit the potential of the conserved material for research and breeding.

Over the past decades, there have been significant efforts to intensify cooperation between genebanks in order to conserve plant biodiversity in the best possible way. To this end, networks have been established, such as the European Cooperative Programme for Plant Genetic Resources. Such networks have been an important basis for the development of international aggregator systems such as EURISCO or Genesys, which serve as central entry points for the search for plant genetic resources.

These information systems are currently limited to passport data and phenotypic data. In terms of a sustainable use of genebank collections, it seems promising to continuously expand the interactions between the globally available information systems and to enrich the existing information on genebank accessions with additional data from other domains. In this context, a number of pilot projects have been successfully carried out in the recent past, which aimed at the molecular exploitation of genebank collections. These promising approaches must be continued in the future.

In order to meet the requirements of the modern research landscape, which is characterised by constant diversification and big data, the information systems of the genebanks must be able to connect with other data sources and to make their data available in a network. The linkage with each other and with other data domains leads to new knowledge about plant genetic resources, which ultimately improves their usability and the reputation of the genebanks.

**Author Contributions:** Coordinated the drafting, S.W.; conceptualization and writing, S.W., U.L. and M.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** We would like to thank the National Inventory Focal Points of the EURISCO network for providing data on the information management situation in their countries through an ad hoc survey. We also thank Helmut Knüpffer for providing information on the history of genebank documentation.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Genebank Phenomics: A Strategic Approach to Enhance Value and Utilization of Crop Germplasm**

#### **Giao N. Nguyen \* and Sally L. Norton**

Australian Grains Genebank, Agriculture Victoria, 110 Natimuk Road, Horsham 3400, Australia; sally.norton@agriculture.vic.gov.au

**\*** Correspondence: giao.nguyen@agriculture.vic.gov.au; Tel.: +61-3-4344-3315

Received: 3 June 2020; Accepted: 26 June 2020; Published: 29 June 2020

**Abstract:** Genetically diverse plant germplasm stored in ex-situ genebanks is an excellent resource for breeding new high-yielding and sustainable crop varieties to ensure future food security. Novel alleles have been discovered through routine genebank activities such as seed regeneration and characterization, and their subsequent utilization has provided significant genetic gains and improvements in the selection of favorable traits, including yield and biotic and abiotic stress resistance. Although some genebanks have implemented cost-effective genotyping technologies through advances in DNA technology, the adoption of modern phenotyping is lagging. The introduction of advanced phenotyping technologies in recent decades has provided genebank scientists with time- and cost-effective screening tools to obtain valuable phenotypic data for more traits on large germplasm collections during routine activities. The utilization of these phenotyping tools, coupled with high-throughput genotyping, will accelerate the use of genetic resources and fast-track the development of more resilient food crops for the future. In this review, we highlight current digital phenotyping methods that can capture traits during annual seed regeneration to enrich genebank phenotypic datasets. Next, we describe strategies for the collection and use of phenotypic data of specific traits for downstream research using high-throughput phenotyping technology. Finally, we examine the challenges and future perspectives of genebank phenomics.

**Keywords:** high-throughput phenotyping; statistical modelling; phenotypic breeding; genomic selection

#### **1. Introduction**

The global population is forecast to reach 9.6 billion people by 2050, and the current 1.3% annual growth rate of crop productivity will need to increase to 2.4% to meet expected food security needs [1]. Concomitantly, climate change is affecting global food and biofuel production chains through rising temperatures, increasing carbon dioxide concentrations, unpredictable rainfall patterns, and soil degradation [2]. Extreme weather conditions, such as drought and heat occurring during critical crop growth phases, reduce the yield and production of major grain crops and can trigger the emergence of new pests and diseases that cause further production losses [2,3]. To produce enough food to meet this increasing demand under current and future growing conditions, it is critical that plant breeders develop new high-yielding, environmentally resilient, and sustainable crop varieties.

Crop improvement through breeding relies heavily on genebanks worldwide to provide genetically diverse material containing the genes and alleles that govern desirable agronomic traits [4]. Novel alleles discovered in genebank genetic resources underpin selection for, and enhance genetic gain in, breeding programs targeting favorable traits such as high yield and adaptation to abiotic and biotic stress [5–7]. However, under current practices, most genebanks can only offer end-users limited passport data and basic characterization data on morphological traits, guided by standard international descriptors [8]. Only a small number of accessions have agronomic and quality trait data available [9]. There are around 1750 individual genebanks worldwide that preserve approximately 7.4 million accessions of agricultural genetic materials [10]. However, only 10% of these accessions are used for breeding purposes, partly due to poor phenotypic and genotypic characterization or a lack of evaluation for agronomic traits [11], or because they are not publicly available [5].

Li et al. [12] pointed out two major limitations preventing the exploitation of genebank genetic resources in breeding programs: 1) the time and resources available for thorough characterization of accessions at a large scale; and 2) identifying and introducing allelic variation into elite breeding materials. Missing characterization data make searching for an accession with specific desirable agronomic traits among the millions of accessions held in genebanks akin to 'finding the proverbial needle in a haystack' [13]. Therefore, to improve the utilization of germplasm, genebanks are increasingly required to move beyond providing basic passport data, which define only the identity and origin of the genetic resources, and to thoroughly catalogue and make publicly available additional information on accessions, such as agronomic, physiological, and genetic traits, that meets the specific needs of end-users [14–16].

McCouch et al. [17] proposed a strategic three-step approach to effectively mine genebank genetic resources that combines genomics and phenomics with efficient database management to enhance the value of germplasm readily available to breeders. Although the use of genomics by genebanks has advanced with the development of DNA technology and next-generation genome sequencing [18,19], genebank phenomics still lags behind in valorizing available plant genetic resources [20,21]. The lack of robust, cost-efficient phenotyping tools and of systematic collection of phenotypic data on accessions is currently a bottleneck, restricting the exploration and utilization of genebank genetic resources for downstream research and breeding [22]. A vast amount of useful agronomic and physiological information from genebank seed regeneration trials is not systematically recorded, contributing to the underutilization of germplasm [5]. Since standard genebank characterization practices can be expensive and time-consuming, a strategic, cost-effective approach for simultaneously collecting data on multiple phenotypic traits from genebank accessions during routine annual seed regeneration is essential to efficiently collect this valuable data and provide it to end-users [7,23,24]. These phenotypic data can then be made readily available for use in combination with genotypic information in subsequent genomic studies and breeding [19,25]. High-throughput phenotyping (HTP) using sensors and imagers is a promising, efficient, and cost-effective approach to collecting phenotypic data for multiple traits across large-scale trials, which can then be used together with genomic data for accurate selection in breeding [26–28]. This approach has been successfully applied for genomic selection in wheat using various sensor-derived representations of agronomic and adaptive traits [29,30].

Although there are numerous excellent reviews on genebank mining using genomic approaches [13,18,31,32], only a handful of publications have addressed the exploitation of plant genetic resources using a phenomic approach, and even these do not fully cover genebank management practice as a whole [17,33–35]. In this manuscript, we examine: (i) current HTP methods that can be applied to phenotype accessions to enrich genebanks' phenotypic datasets; (ii) compatible crop traits that can be phenotyped by HTP technology and catalogued together with passport data in a genebank database; and (iii) data management strategies to effectively exploit these phenotypic data for future use. Finally, we discuss the challenges and future perspectives of genebank phenomics. Although there are numerous HTP methods, we limit our discussion to those most applicable to characterizing and evaluating genebank germplasm in accordance with international crop descriptors.

#### **2. Phenomics to Unlock the Genetic Potential of Genebank Germplasm**

#### *2.1. Plant Phenomics and Its Potential Applications for Plant Genetic Resources Research*

Plant phenomics is a multidisciplinary field that enables the systematic and comprehensive research and development of robust HTP tools and methods for the capture, processing, handling, and meta-analysis of data on the phenotypic properties, growth, and performance of crops and their environments [22,36]. The foundation of plant phenomics is the advent of HTP technology which, in contrast to more conventional, arduous, manual, and destructive phenotyping, uses sensor- or image-based instruments to measure morphological, agronomic, and physiological characteristics of crops non-destructively and simultaneously on a large scale across time and space. HTP technology is fundamentally based on the interaction between plant cellular components and natural light in the 400–2500 nm spectral range [37]. By capturing and analyzing these interactions proximally or remotely, important morphological, agronomic, and physiological properties can be derived, such as crop growth status, phenology, water and nutrient content, and yield potential [38]. The HTP approach has been widely used for decades in agriculture and plant science research with promising outcomes [39,40].

Various HTP platforms that use a combination of multiple sensors have been developed over many years for plant science research under controlled and field conditions [27,39]. In controlled environments, systems such as the automated Scanalyzer 3D imaging platform developed by LemnaTec GmbH (Aachen, Germany) have been used effectively to phenotype various crops and traits [41,42]. In the field, many HTP platforms are currently deployed, such as the gantry-type Field Scanalyzer [43]; manned [44] or unmanned ground vehicles (UGV) [45]; and manned [29] or unmanned aerial vehicles (UAV) [46,47]. These platforms are equipped with multiple sensor types and can capture various crop traits at the same time. These sensor technologies will continue to advance and are likely to become less expensive, and hence more affordable for plant science applications.

Sensors developed for HTP can be broadly classified as either active or passive, a distinction that needs to be considered when capturing data. Passive sensors record radiation from natural sources, chiefly reflected sunlight, so the data they capture are strongly affected by environmental conditions. Examples of passive sensors are the Analytical Spectral Devices (ASD) FieldSpec spectroradiometer [48,49] and red-green-blue (RGB) [50,51], multispectral [52], hyperspectral [53], and thermal cameras [54]. Active sensors, in contrast, use their own light source, so the resulting reflectance is much less affected by the environment; the Crop Circle [42] and light detection and ranging (LiDAR) [55] are typical examples. Regardless of which type is deployed, sensors must be well calibrated and raw data should be normalized before analysis for quality assurance. Individual or multiple sensors can be handheld or mounted on vehicles or platforms, depending on the experimental setup and availability [56].
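Calibration and normalization of passive spectral data are often done with a dark reference and a white reference panel of known reflectance. The sketch below assumes this flat-field approach, in which relative reflectance is `panel_reflectance * (raw - dark) / (white - dark)`; the exact procedure depends on the sensor and its software.

```python
import numpy as np

def calibrate_reflectance(raw, dark, white, panel_reflectance=1.0):
    """Flat-field calibration of raw sensor counts to relative reflectance.

    raw, dark, white: same-shaped arrays (per band or per pixel) holding the
    target signal, the dark-current signal, and the white-panel signal.
    panel_reflectance: known reflectance of the white reference panel."""
    raw, dark, white = (np.asarray(x, dtype=float) for x in (raw, dark, white))
    span = white - dark
    if np.any(span <= 0):
        raise ValueError("white reference must exceed dark reference in every band")
    return panel_reflectance * (raw - dark) / span

print(calibrate_reflectance([300, 1200], [100, 100], [2100, 2300]))  # [0.1 0.5]
```

Because the white reference is recorded under the same illumination as the crop, this ratio partly compensates for changing light conditions, which matters most for passive sensors.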

The workflow of deploying sensors for phenotyping crop experiments usually involves three main steps: 1) data capture; 2) raw data processing and storage; and 3) validation and comprehensive data analysis. Raw data are captured by sensors and processed by software algorithms to derive digital plant parameters such as vegetation indices (VIs) or structural properties. Once validated against ground truth from conventional observations, these digital parameters can be used as proxies for crop traits in subsequent analyses. For instance, one of the most common vegetation indices is the normalized difference vegetation index (NDVI), which is derived from the red and near-infrared (NIR) spectral bands and is widely used as a proxy for biomass, grain yield, and crop N status [57]. 2D and 3D structural models can be reconstructed from RGB, multispectral, and thermal imagery to derive important agronomic traits for various crops under different environments, such as flowering time of rice [58] and wheat [43]; crop biomass of field peas [42] and wheat [41]; plant height and biomass of rice [59] and barley [60]; seed characteristics of lentils [61], rice [62], and field peas [63]; architectural and physiological properties of apple trees [64]; height and morphological characteristics of blueberries [65]; canopy temperature of black poplars [66]; bunch architecture of grapevines [67]; and ripeness estimation [68] and fruit counts [69] of mangos. Recent advances in computer algorithms and machine learning have significantly improved the throughput of raw data processing and analysis, with processing pipelines enabling data capture, analysis, and the extraction of multiple patterns and features simultaneously [70]. Machine learning in sensor- and image-based phenotyping has been applied successfully for germination assessment of tomato seeds [71], head counting [72,73], yield prediction in wheat [74], and prediction of seed longevity in oilseed rape from chemical composition [75].
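The NDVI mentioned above is computed from red and NIR reflectance as (NIR - Red) / (NIR + Red), giving values between -1 and 1. A minimal NumPy version, assuming already calibrated reflectance values as input, is:

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), element-wise on calibrated reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # Where both bands are zero the index is undefined; return NaN instead of dividing by zero.
    return np.divide(nir - red, total, out=np.full_like(total, np.nan), where=total != 0)

# Dense green canopy versus a sparser surface (illustrative reflectance values)
print(ndvi([0.50, 0.30], [0.10, 0.25]))
```

The same function works on whole image bands, so a per-plot NDVI can be obtained by averaging the result over the plot's pixels.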

Because a thorough discussion of the development and application of HTP tools for agricultural research is not the main purpose of this review, readers can find detailed information about sensors and platforms, image processing and storage, and data analysis approaches in numerous excellent reviews and the references cited therein [22,26,27,36,76–79].

#### *2.2. Why Genebank Phenomics?*

Multiple factors, both subjective and objective, make genebank phenomics feasible and strategic, namely the availability of cost-efficient HTP technology, the nature of routine genebank operations, the pressure to exploit genetic resources efficiently for crop improvement, and the need to conserve genetic diversity. Phenotyping is the most expensive yet indispensable component of any plant research and crop improvement program for understanding the genetic basis of traits and the interaction between genotypes and environments. The use by genebanks of the HTP tools and methods discussed above will shorten the time required, increase throughput, improve consistency, reduce the overall cost of phenotyping projects, and improve selection accuracy in breeding programs, especially for large-scale trials [80].

Genebanks routinely regenerate seed to maintain the viability, quality, and quantity of accessions, e.g., when the quantity or viability of specific accessions falls below a standard threshold [7,81]. Characterization of germplasm for a range of phenotypic traits is undertaken during the regeneration process; however, traits are recorded manually, can be subjective, and are time-consuming to collect. This limits the amount of data that can be captured, wasting an opportunity for comprehensive characterization of genetic materials and, in turn, restricting their subsequent utilization. Mining superior agronomic alleles for breeding is crucial for improving crop yield and resilience, and the availability of comprehensive phenotypic data for genebank germplasm enables researchers and breeders to more accurately identify desired accessions for breeding projects [82].
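The regeneration trigger described above can be expressed as a simple rule. In this sketch both thresholds are illustrative assumptions (a minimum of 1500 seeds and a viability floor of 85% of initial germination); each genebank applies its own standards.

```python
from dataclasses import dataclass

@dataclass
class SeedLotStatus:
    accession_id: str
    seeds_remaining: int
    initial_germination: float  # germination (%) when the lot entered storage
    current_germination: float  # germination (%) at the latest monitoring test

def needs_regeneration(lot: SeedLotStatus, min_seeds: int = 1500,
                       min_relative_viability: float = 0.85) -> bool:
    """Flag a lot whose seed quantity or relative viability falls below threshold.

    Default thresholds are illustrative assumptions, not a prescribed standard."""
    low_stock = lot.seeds_remaining < min_seeds
    low_viability = lot.current_germination < min_relative_viability * lot.initial_germination
    return low_stock or low_viability

lots = [SeedLotStatus("ACC-A", 5000, 95.0, 90.0),
        SeedLotStatus("ACC-B", 5000, 95.0, 78.0),
        SeedLotStatus("ACC-C", 800, 95.0, 93.0)]
print([lot.accession_id for lot in lots if needs_regeneration(lot)])  # ['ACC-B', 'ACC-C']
```

The flagged list is exactly the set of accessions whose regeneration plots offer an opportunity for HTP-based characterization.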

Applying low-cost HTP methods to assess the true value of genetic resources, by accurately estimating their agronomic traits to obtain a complete phenotypic representation of collections, will significantly improve the gains of pre-breeding and breeding programs at marginal extra expense. This is particularly useful for studying complex traits such as grain yield. Multiple secondary traits captured by HTP tools that correlate well with target traits (e.g., grain yield) can be used as surrogates in yield selection models to improve prediction accuracy. For instance, Rutkoski et al. [29] showed that the use of canopy temperature and NDVI measured by aerial thermal and hyperspectral sensors substantially improved genomic and pedigree yield predictions for 557 wheat lines across five growing environments. Interestingly, the authors also pointed out that the genetic value for grain yield can be accurately estimated using these secondary phenotypic traits in the absence of pedigree and genomic data. Phenotypic profiling of genebank accessions can, therefore, provide direct support for phenomic selection or the choice of parents in breeding programs.
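How secondary traits can act as predictors of yield can be shown with ordinary least squares. This is a deliberately simplified stand-in, not the genomic and pedigree models used by Rutkoski et al. [29]: yield is regressed on plot-level canopy temperature and NDVI, and the fitted coefficients are then used to predict yield for plots where only these secondary traits were measured. The data are invented for illustration.

```python
import numpy as np

def fit_secondary_trait_model(traits, yields):
    """Ordinary least-squares fit of yield on secondary traits, with an intercept.

    traits: (n_plots, n_traits) array, e.g. columns [canopy temperature, NDVI].
    Returns coefficients [intercept, b_1, ..., b_k]."""
    traits = np.asarray(traits, dtype=float)
    X = np.column_stack([np.ones(len(traits)), traits])
    coef, *_ = np.linalg.lstsq(X, np.asarray(yields, dtype=float), rcond=None)
    return coef

def predict_yield(coef, traits):
    """Predict yield for plots where only the secondary traits were measured."""
    return coef[0] + np.asarray(traits, dtype=float) @ coef[1:]

# Toy data: canopy temperature (deg C) and NDVI for five plots
traits = np.array([[25.0, 0.60], [27.0, 0.50], [24.0, 0.70], [26.0, 0.65], [28.0, 0.45]])
yields = np.array([7.9, 7.2, 8.6, 8.1, 6.7])  # t/ha, invented for illustration
coef = fit_secondary_trait_model(traits, yields)
print(predict_yield(coef, [[25.5, 0.62]]))
```

In practice, secondary traits are usually combined with genomic or pedigree information in multi-trait mixed models rather than used in a plain regression.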

Genomic selection has proven to be an excellent tool for estimating genomic breeding values and is now widely used as a routine selection method in crop breeding [83,84]. However, since its successful introduction over the last two decades, genetic variance has been lost significantly faster in breeding programs using genomic selection than under conventional phenotypic selection [85]. To slow this loss of genetic diversity, a physiological breeding approach that combines multiple integrative traits captured by HTP tools with genomic selection methods, placing a heavier weight on the phenotypic components, could be an alternative [86]. The advantage of phenomic selection has been demonstrated by Rutkoski et al. [87], who reported that an optimized breeding scheme using phenotypic selection for quantitative stem rust resistance in wheat would achieve genetic gains equal to those of genomic selection while retaining higher genetic variance. This phenotypic selection approach is further supported by a recent study by Rincent et al. [88], in which distinctive endophenotypes, such as transcripts, small RNAs, or metabolites, could be used as phenomic markers for the selection process. The authors found that near-infrared spectroscopy (NIRS) absorbance matrices between 400 and 2500 nm from winter wheat grains and leaf tissue could provide better yield prediction than molecular markers. Thus, using these low-cost, high-throughput endophenotypic markers significantly improved genetic gains while better conserving the allelic diversity of breeding populations.
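Spectra such as NIRS absorbances usually contain far more wavelengths than there are samples, so a penalized regression is needed to use them as predictors. The sketch below uses ridge regression in its dual form as an assumed, simplified stand-in for the prediction models used in phenomic selection studies; `lam` is a hypothetical tuning parameter that would normally be chosen by cross-validation, and the data are simulated.

```python
import numpy as np

def ridge_fit(spectra, y, lam=1.0):
    """Ridge regression of a trait on spectra (n_samples x n_wavelengths).

    Uses the dual form beta = Xc^T (Xc Xc^T + lam*I)^-1 (y - mean(y)), with Xc the
    column-centred spectra; this needs only an n x n solve when wavelengths
    outnumber samples."""
    X = np.asarray(spectra, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), y - y_mean)
    return Xc.T @ alpha, x_mean, y_mean

def ridge_predict(model, spectra):
    """Predict trait values for new spectra from a fitted (beta, x_mean, y_mean) model."""
    beta, x_mean, y_mean = model
    return y_mean + (np.asarray(spectra, dtype=float) - x_mean) @ beta

# Simulated example: 40 samples, 500 "wavelengths", trait driven by two bands
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))
y = X[:, 10] - 0.5 * X[:, 200] + rng.normal(scale=0.1, size=40)
model = ridge_fit(X[:30], y[:30], lam=10.0)
# Predictive correlation on the 10 held-out samples (value depends on the simulation)
print(np.corrcoef(ridge_predict(model, X[30:]), y[30:])[0, 1])
```

Larger values of `lam` shrink the wavelength coefficients more strongly, trading training fit for stability.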

Finally, safeguarding the integrity, diversity, and allelic variability of ex-situ genetic resources for future use is the mandatory task of every genebank whose materials fuel breeding programs, underpinning food security efforts and bringing billions of dollars in benefits [17]. For instance, by 1997, the use of wild materials from genebanks to develop environmentally resilient and resistant crops was estimated to benefit the world economy by approximately USD 115 billion annually [89]. Selection for desirable agronomic traits is the driving force of plant domestication and crop improvement. However, extensive breeding selection leads to the loss of genetic variants, narrowing a crop's genetic base and eroding overall crop diversity [40,90]. Alarmingly, there is also mounting evidence that the allelic variance of genebank accessions might be lost over time through seed regeneration due to genetic drift and inbreeding, while collection sizes and maintenance costs keep increasing [91]. Because genebank accessions are collected from various geographical locations, their original phenotypic variance could be lost during ex-situ conservation and seed regeneration [92]. While DNA fingerprinting is the most effective method to verify the genetic integrity of regenerated materials, the associated cost is still too high for large-scale genotyping of thousands of accessions per year [62]. Thus, a complete phenotypic assessment of accessions during periodic seed regeneration could be a countermeasure to ensure that original phenotypic features are preserved. Furthermore, accessions possessing desirable agronomic characteristics can be recommended for immediate use, whereas those without attributes of immediate interest can be conserved for further evaluation under different and specific environmental conditions, or potentially be discarded.
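The expected loss of diversity through drift during repeated regeneration can be approximated with the classical relation H_t = H_0 (1 - 1/(2N_e))^t, where N_e is the effective population size and t the number of generations. The sketch assumes one generation per regeneration cycle and ignores mutation, migration, selection, and the mating system, so it illustrates the trend rather than the situation of any particular collection.

```python
def expected_heterozygosity(h0: float, ne: float, cycles: int) -> float:
    """Expected heterozygosity after `cycles` regeneration generations under pure drift.

    H_t = H_0 * (1 - 1/(2*Ne))**t; mutation, migration, selection, and the
    mating system are ignored."""
    if ne < 1:
        raise ValueError("effective population size must be at least 1")
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** cycles

# Diversity retained after 10 regeneration cycles for two hypothetical sample sizes
for ne in (10, 50):
    print(ne, round(expected_heterozygosity(0.5, ne, 10), 3))
# prints "10 0.299" and then "50 0.452"
```

The contrast shows why small regeneration samples erode diversity quickly, and why periodic phenotypic checks during regeneration are worthwhile.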

#### **3. Phenomic Characterization and Evaluation of Genebank Accessions**

Missing or incomplete passport, characterization, and evaluation data are among the main reasons for the underutilization of genebank germplasm. For decades, the crop descriptor lists developed by Bioversity International have been routinely used to standardize genebank data collection and to facilitate the exchange of information between genebanks and end-users. These data are also used by genebanks to catalogue morphological and physiological characteristics of various crop species for germplasm validation processes [8]. In this section, we discuss the potential of deploying HTP technology to routinely collect quantitative data on specific traits in line with these descriptor lists, and the possibility of collecting additional data at marginal cost to enrich genebank collections. Highly heritable morphological and physiological features can provide invaluable information for strategic selection schemes used by plant breeders to speed up the development of new, high-yielding, environmentally adaptive cultivars [93]. Table 1 details systematic HTP approaches for phenotyping genebank germplasm for morphological and physiological traits in different environments.





*Plants* **2020**, *9*, 817


<sup>1</sup> RGB, red green blue; <sup>2</sup> LiDAR, Light Detection and Ranging; <sup>3</sup> UAV, Unmanned Aerial Vehicle; <sup>4</sup> UGV, Unmanned Ground Vehicle; <sup>5</sup> P-TRAP, Panicle TRAit Phenotyping; <sup>6</sup> Panicle-SEG, Panicle segmentation algorithm; <sup>7</sup> CCD, charge-coupled device; <sup>8</sup> D3P, Digital Plant Phenotyping Platform.


**Table 1.** *Cont*.

#### *3.1. Morphology*

The collection of data on the morphology of accessions is a critical part of the routine curatorial activities of any genebank. These data describe overall plant architecture, height, leaf shape, and leaf angle. Conventionally, these data are visually assessed and manually recorded by curators, a process that is sometimes subjective and prone to human error. This labor-intensive and time-consuming note-taking can be replaced by robust HTP technology (Table 1). Morphological characteristics of various crops, such as number of tillers (wheat) [94]; node and internode length (tomato) [95]; panicle, branch, and leaf number (rice, maize, tomato) [97]; and leaf shape (legumes) [102], can be easily acquired with cost-effective RGB imaging tools. HTP technology can be flexibly applied for trait capture under various growing conditions, including field and greenhouse environments. For example, plant height is an important botanical trait, defined as the shortest distance from ground level to the upper boundary of photosynthetic tissues [145]. It is a useful indicator of crop growth rate, biomass, yield potential, and lodging resistance [46,132]. Studies on wheat have shown that lodging can cause yield losses of up to 80% [146]. Thus, strategic exploitation of genebank germplasm for novel alleles is crucial for the development of lodging-resistant cultivars. Several HTP methods using sensors such as ultrasonic sensors, LiDAR, or RGB cameras can be used to measure plant height in the greenhouse and field [98]. However, combining LiDAR and RGB cameras mounted on ground- or aerial-based vehicles appears more feasible, with a similarly high level of accuracy [147]. Using this method, plant height can be estimated on the principles of structure-from-motion photogrammetry, where the average plant height within a plot is the difference between the digital surface model (DSM) and the digital terrain model (DTM) [99]. Quantitative measurement of lodging can be derived from the differences between DSMs before and after lodging events, as demonstrated in barley [131], wheat [132], and rice [133].
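The DSM/DTM approach can be sketched with NumPy arrays standing in for co-registered elevation rasters. Plot height is taken as the mean of DSM minus DTM over the plot's pixels, and lodging as the fraction of pixels whose surface elevation dropped by more than a threshold between two flights; the 0.2 m threshold, the function names, and the elevations are hypothetical.

```python
import numpy as np

def plot_height(dsm, dtm, plot_mask):
    """Mean plant height of one plot: mean of (DSM - DTM) over the plot's pixels."""
    canopy_height = np.asarray(dsm, dtype=float) - np.asarray(dtm, dtype=float)
    return float(canopy_height[plot_mask].mean())

def lodging_fraction(dsm_before, dsm_after, plot_mask, drop_threshold=0.2):
    """Fraction of plot pixels whose surface dropped by more than drop_threshold (m)."""
    drop = np.asarray(dsm_before, dtype=float) - np.asarray(dsm_after, dtype=float)
    return float(np.mean(drop[plot_mask] > drop_threshold))

# Toy 2 x 2 plot with elevations in metres (hypothetical values)
dtm = np.full((2, 2), 100.0)
dsm = dtm + np.array([[0.8, 0.9], [1.0, 0.9]])
dsm_after = dsm - np.array([[0.0, 0.5], [0.6, 0.1]])
mask = np.ones((2, 2), dtype=bool)
print(round(plot_height(dsm, dtm, mask), 2), lodging_fraction(dsm, dsm_after, mask))  # 0.9 0.5
```

Some workflows use a high percentile of DSM minus DTM instead of the mean, to reduce the influence of soil visible between plants.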

#### *3.2. Inflorescence and Fruit*

Inflorescence and fruit are important and distinctive botanical features used to identify and classify genebank accessions. In physiological breeding, highly heritable traits in cereals, such as spike length, spike weight, and floret number per spike, serve as indicators of agronomic value, yield, and adaptation in selection schemes [146]. These traits can be quantitatively measured in large-scale seed regeneration trials using cost-effective HTP technology (Table 1). For instance, Grillo et al. [148] developed a method to differentiate wheat landraces by glume size, shape, color, and texture using a color scanner. Likewise, Makanza et al. [118] designed a simple, low-cost RGB imaging method to quantitatively measure seed size, number, and weight on intact maize cobs in the field. Most recently, Genaev et al. [113] described a simple RGB imaging setup that can precisely quantify morphological features of the wheat spike, such as spike shape and awnedness. These studies suggest that HTP methods can help curators digitally characterize a wide range of traits related to the inflorescence and fruit and, once these data are incorporated into genebank databases, readily provide quantitative data to end-users for subsequent genetic analysis and breeding.

#### *3.3. Seed Characteristics*

Seed traits such as shape, size, and coat color are crucial criteria for determining commodity market value and are under strong genetic control. Genebanks routinely characterize seeds based on general morphology and use the data for both in-house quality assurance and end-user purposes. Currently, most genebanks collect these data manually after seed regeneration cycles, based on visual assessment of traits, an approach that can be subjective and lead to inaccurate results. Image-based phenotyping methods using RGB, multispectral, and hyperspectral cameras could be a cost-effective and accurate substitute for manual phenotyping, since shape, size, and coat color are easily reconstructed and analyzed from reflectance spectra of the seed surface and are known to be related to chemical properties (Table 1) [149]. These HTP tools have been applied for seed quality, purity, viability, and vigor testing and for variety identification in various crop species [150,151]. Potentially, this cost-effective HTP technology could be used to develop seed descriptor states for crop species [120,152]; as a routine method for managing genebank accessions as they enter the collection from new acquisitions or seed regeneration events, to avoid physical contamination and maintain genetic integrity [62]; and for genomic selection for seed traits [153].
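Image-based seed size measurement typically combines segmentation with connected-component labelling. The sketch below assumes dark seeds on a uniform light background and a fixed grey-level threshold; the labelling is written out in plain Python to keep the example self-contained, although libraries such as SciPy (`scipy.ndimage.label`) provide the same operation. The image is synthetic.

```python
from collections import deque
import numpy as np

def label_components(mask):
    """Label 4-connected foreground regions of a boolean mask; 0 = background."""
    labels = np.zeros(mask.shape, dtype=int)
    rows, cols = mask.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and labels[r, c] == 0:
                count += 1
                labels[r, c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

def measure_seeds(gray, threshold, min_area=1):
    """Count seeds darker than `threshold` and return their areas in pixels."""
    mask = np.asarray(gray) < threshold
    labels, count = label_components(mask)
    areas = np.bincount(labels.ravel(), minlength=count + 1)[1:]  # drop background
    areas = areas[areas >= min_area]  # discard specks smaller than min_area
    return len(areas), areas

# Synthetic 6 x 6 image: two dark "seeds" on a white background
img = np.full((6, 6), 255)
img[0:2, 0:2] = 10
img[3:6, 3:6] = 20
print(measure_seeds(img, threshold=128))
```

Pixel areas can be converted to mm² with a scale reference in the image, and the same labelled regions can feed shape and color measurements.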

#### *3.4. Phenology*

Understanding the timing of key physiological growth stages, such as germination and flowering, and their variation is critical for crop production and for breeding new high-yielding and environmentally adaptive varieties. Therefore, crop phenological traits such as germination, flowering, and maturity are routinely recorded during genebank seed regeneration cycles. Research has shown that image-based phenotyping can effectively measure these traits as a replacement for conventional visual assessment. For instance, HTP technology has been used to phenotype emergence [123,124], heading, and flowering [43,129] of various crop species (Table 1). This also suggests that these trait data can be captured systematically and simultaneously with other traits using HTP technology, reducing cost and increasing the opportunity to explore the genetic potential of individual accessions.
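Phenological dates are often derived from time series of image-based traits by finding when a trait first crosses a threshold. In this sketch, emergence is hypothetically defined as the day on which canopy cover first reaches 5%, with linear interpolation between imaging dates; the definition, threshold, and data are assumptions, not a descriptor standard.

```python
import numpy as np

def crossing_day(days, values, threshold):
    """First day on which a time series reaches `threshold`, linearly interpolated.

    Returns None if the threshold is never reached."""
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    reached = np.nonzero(values >= threshold)[0]
    if reached.size == 0:
        return None
    i = int(reached[0])
    if i == 0:
        return float(days[0])
    # The previous observation is below the threshold, so v1 > v0 and the division is safe.
    d0, d1, v0, v1 = days[i - 1], days[i], values[i - 1], values[i]
    return float(d0 + (threshold - v0) * (d1 - d0) / (v1 - v0))

# Canopy cover (fraction) on four imaging dates, in days after sowing (toy data)
print(crossing_day([10, 14, 18, 22], [0.00, 0.02, 0.08, 0.20], threshold=0.05))
```

Here the threshold falls halfway between the 14- and 18-day images, so the estimate is approximately day 16; the same function can be applied to heading or flowering proxies with their own thresholds.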

#### *3.5. Physiological and Agronomic Traits*

Most genebanks omit the collection of physiological and agronomic traits (known as evaluation data) from their standard curatorial procedures because of funding shortages or the labor required. However, this information, together with genebank passport data, is critical for prioritizing valuable germplasm, enhancing its utilization, and selecting parents for breeding [35,154]. Core collections can also be generated using phenotypic data on useful agronomic traits [155]. Numerous reports from international research groups have shown that specially designed HTP platforms can capture data on multiple traits simultaneously, which can subsequently be exploited not only by curators but also by plant scientists and breeders (Table 1).

This is particularly helpful when dissecting the genetic basis of polygenic traits such as grain yield or adaptive traits. For instance, grain yield is a critically important selection trait in physiological breeding and can be captured along with other descriptive traits using HTP technology during standard curatorial procedures at genebanks. However, grain yield is a genetically complex trait that can only be improved by simultaneously enhancing secondary morphological and physiological traits such as plant architecture, lodging resistance, photosynthetic capacity, canopy temperature, and harvest index. This approach has been proposed for major crop species such as wheat [86,156], rice [157], and pulses [158]. HTP technology therefore has a distinctive advantage over conventional manual data collection because it can quantitatively capture multiple secondary traits at the same time. This reduces time, labor, and phenotyping cost while yielding a comprehensive data set that fully describes crop growth and yield, which is highly valuable to breeding programs. Distinctive secondary traits can also be used directly to breed for adaptation, or indirectly in forward genetics for molecular cloning and gene identification. For example, stay-green is an adaptive trait that confers better drought tolerance and nitrogen use efficiency in crops [56,159]. Research has shown that stay-green is part of the drought adaptation mechanism that increases yield stability and lodging resistance in sorghum and other cereals, where it can prolong grain filling and improve yield [146,160]. Interestingly, other reports show that canopy temperature (CT), an indicator of evaporative cooling from the canopy surface and an adaptive trait for high yield and drought tolerance, is associated with stay-green and deeper roots [161]. Thus, an HTP approach using a combination of sensors can capture stay-green and CT together with other traits such as the normalized difference vegetation index (NDVI), height, biomass, and ground cover, and these data can be used for selection in breeding programs [162,163].
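NDVI and CT, two of the traits named above, are simple to compute once reflectance and thermal values have been extracted for each plot. NDVI is defined as (NIR − Red) / (NIR + Red), and canopy temperature depression (CTD) is commonly expressed as air temperature minus canopy temperature. The sketch below assumes that plot-level pixel values have already been extracted from calibrated multispectral and thermal images; the function names are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / total


def plot_summary(nir_pixels, red_pixels, canopy_temps_c, air_temp_c):
    """Plot-level mean NDVI, mean canopy temperature and canopy temperature depression."""
    if len(nir_pixels) != len(red_pixels) or not nir_pixels or not canopy_temps_c:
        raise ValueError("need matching, non-empty reflectance lists and temperatures")
    # Per-pixel NDVI, then averaged over the plot.
    ndvi_values = [ndvi(n, r) for n, r in zip(nir_pixels, red_pixels)]
    mean_ndvi = sum(ndvi_values) / len(ndvi_values)
    canopy_temp = sum(canopy_temps_c) / len(canopy_temps_c)
    return {
        "mean_ndvi": mean_ndvi,
        "canopy_temp_c": canopy_temp,
        # Positive values mean the canopy is cooler than the air (more evaporative cooling).
        "ct_depression_c": air_temp_c - canopy_temp,
    }
```

Averaging per-pixel NDVI, as here, gives slightly different values from computing NDVI on plot-mean reflectances; either convention is defensible, but it should be fixed and recorded so that data remain comparable across seasons.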

#### **4. Challenges**

#### *4.1. Lack of Resources*

Despite the enormous potential of phenotyping and characterizing genebank germplasm to enhance genetic gain in plant breeding, there are several constraints that genebanks must fully address before they can move forward. The first, and perhaps the biggest, challenge is the availability of resources for a long-term phenotyping scheme [7]. Although HTP phenotyping of genebank germplasm will provide valuable information for end users, the costs of purchasing, establishing, and operating sensors and phenotyping platforms, and of analyzing, validating, and publishing phenotypic data in a searchable online platform, are not trivial and might not be affordable for every genebank [164,165]. Hidden costs such as equipment and database maintenance, software licensing, and upgrades must also be considered. Therefore, genebank managers need to carefully weigh investment, labor cost, and goals before initiating HTP projects. For instance, simple low-cost HTP tools such as PhenoBox [166] can be developed for effective phenotyping of seed regeneration in the greenhouse without the complex and costly automated phenotyping platforms reported by Nguyen et al. [41,42]. More importantly, unlike short-term research projects, genebanks are long-term investments with large numbers of accessions that require well-planned, consistent phenotyping programs. Adequate planning and resources must be made available for effective long-term phenotyping if research and breeding programs are to achieve the increased plant production required in the future.

#### *4.2. Technical Difficulties in Data Management and Analysis*

Data capture, standardization, quality assurance, and analysis are technical challenges related to genebank phenotyping. HTP technologies generate large volumes of 'big data' in a short period of time using standardized protocols. However, appropriate storage, backup, data management, and analysis require substantial infrastructure investment and a multidisciplinary approach [167,168]. These data must be thoroughly validated before they can be used. In contrast to genomic data, plant phenotypes are not constant: they are plastic and change over time, as they result from ongoing interactions between genotypes and their environments [169].

Furthermore, phenotypic data from field seed regenerations, mainly collected by passive sensors and cameras, are highly influenced by spatial and temporal climatic conditions and must be processed through sophisticated computational algorithms before they can be made readily available to genebank scientists [26]. Therefore, if data collection is not standardized to account for unpredictable weather and changes in agronomic practices, analysis across years will become difficult because of the disparity between data sets, rendering the HTP effort useless [23]. To cope with fluctuating climatic conditions, Xu [170] introduced the strategic 'envirotyping' approach, in which local environmental data such as soil, weather, biotic factors, and crop management practices are documented as metadata together with plant phenotypic data.
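The envirotyping idea can be illustrated as a data-structure sketch in which every phenotypic observation is joined to the environmental record of the site-season in which it was measured. This is a minimal sketch with hypothetical class and field names (`mean_temp_c`, `rainfall_mm`, `soil_type`, a free-form `management` dictionary); it does not reproduce Xu's protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Envirotype:
    """Environmental metadata for one site-season (illustrative fields)."""
    site: str
    season: int
    mean_temp_c: float
    rainfall_mm: float
    soil_type: str
    management: dict = field(default_factory=dict)  # e.g. sowing date, irrigation


@dataclass
class Observation:
    """One phenotypic measurement on one accession in one site-season."""
    accession_id: str
    site: str
    season: int
    trait: str
    value: float


def attach_envirotypes(observations, envirotypes):
    """Return flat records pairing each observation with its site-season metadata."""
    lookup = {(e.site, e.season): e for e in envirotypes}
    records = []
    for obs in observations:
        env = lookup.get((obs.site, obs.season))
        if env is None:
            # Refuse to emit phenotypes without their environmental context.
            raise KeyError(f"no environmental metadata for {obs.site} {obs.season}")
        record = {
            "accession_id": obs.accession_id,
            "trait": obs.trait,
            "value": obs.value,
            "site": obs.site,
            "season": obs.season,
            "mean_temp_c": env.mean_temp_c,
            "rainfall_mm": env.rainfall_mm,
            "soil_type": env.soil_type,
        }
        # Management practices become prefixed columns, e.g. "mgmt_irrigated".
        record.update({f"mgmt_{k}": v for k, v in env.management.items()})
        records.append(record)
    return records
```

Raising an error for a missing site-season, rather than silently dropping the environment columns, enforces the principle that phenotypes are never stored without their environmental context.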

All data must be integrated into a well-structured, publicly searchable database for end users [171]. For instance, the Genebank Information System (GBIS) of the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany, currently houses approximately 151,000 crop accessions and comprehensively manages passport data, seed/line management, taxonomy, phenotypic characterization, and evaluation data [172]. To ensure that phenotypic data generated from HTP are fully described and annotated, the plant phenotyping community has recommended a convention on the minimum information about plant phenotyping experiments (MIAPPE), under which all experimental conditions are described and published together with the phenotypic data [173]. Wilkinson et al. [174] introduced the FAIR principles (findability, accessibility, interoperability, and reusability) for the management of scholarly data; their application enhances the handling, sharing, reuse, and interpretation of data and metadata. In addition to passport and phenotypic data, images showing morphological features that are not easily analyzed or represented by numerical data should be included [175]. Clearly, the systematic collection and stewardship of phenotypic data and metadata will help genebank scientists fully describe the datasets and the conditions under which seed regenerations are conducted, and will enable phenotypic plasticity to be interpreted in statistical models.
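A completeness check of this kind can be automated before data are published. The sketch below uses a hypothetical set of required fields loosely inspired by MIAPPE's study, biological material, environment, and observed variable sections; it is not the official MIAPPE checklist, and a real implementation should validate against the published specification.

```python
# Hypothetical required fields, loosely inspired by MIAPPE sections;
# NOT the official MIAPPE checklist.
REQUIRED_FIELDS = (
    "study_title",
    "study_location",
    "study_start_date",
    "biological_material_id",
    "growth_facility",
    "environment_parameters",
    "observed_variables",
)


def missing_metadata(record, required=REQUIRED_FIELDS):
    """List required metadata fields that are absent or empty in `record`."""
    missing = []
    for name in required:
        value = record.get(name)
        # Treat None and empty strings, lists or dicts as missing.
        if value is None or (isinstance(value, (str, list, dict)) and len(value) == 0):
            missing.append(name)
    return missing
```

Returning the list of missing fields, rather than a single pass/fail flag, tells curators exactly what to add before a dataset is released.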

#### *4.3. Users' Awareness*

Finally, communicating genebank data sets to the research and breeding community is critical for increasing the successful utilization of genebank germplasm. Currently, only around 10% of genebank germplasm is used in plant breeding, for a range of technical reasons including a lack of good-quality phenotypic and genotypic data on the germplasm, accessions' low level of adaptation to changing environments, and genetic drift [11]. Although accurate phenotypic data are valuable to plant breeders for identifying outstanding performers as potential parents, phenotyping is still the most expensive part of any breeding program [176]. Breeders are often reluctant to take the risk of screening large numbers of diverse genebank accessions without any certainty of their genetic potential for beneficial agronomic traits, because of the cost and the significant challenge of identifying valuable germplasm or novel alleles in a 'sea of seeds' [177]. In this context, readily available phenotypic and genomic data that enrich genebank passport data will enhance the utilization and overall value of germplasm stored in genebanks.

#### **5. Future Perspectives**

#### *5.1. Systematic HTP Phenotyping of Routine Genebank Seed Regenerations*

Despite the challenges posed by the deployment of sensors and image-based HTP protocols, systematic data collection during genebank seed regeneration cycles can effectively yield data on multiple traits for downstream research. A key advantage of HTP is that multiple sensors can be deployed at the same time to capture many independent observations simultaneously and non-destructively, allowing more targeted prioritization of accessions from large genetic resources collections for downstream studies. Targeted beneficial endophenotypes of individual genebank accessions can be used directly for low-cost phenomic selection in breeding, or to prioritize higher-value germplasm when selecting crossing parents [88]. A proposed strategic phenomic approach for collecting multiple trait data, managing genebank collections, and increasing the use of data and seed by end users through the adoption of HTP technologies is shown in Figure 1.

Routine seed regeneration at genebanks is often conducted in small, unreplicated plots, or even in single rows in the field or pots in greenhouses. Whenever possible, seed regeneration blocks should be replicated with a reasonable number of individuals to facilitate statistical analysis and to ensure that sufficient seeds are used to maintain the genetic diversity and integrity of accessions (Figure 1) [178,179]. A large amount of morphological, agronomic, physiological (Table 1), and environmental data [170] can be collected simultaneously using HTP during routine seed regeneration cycles over successive years (Figure 1). Even though these phenotypic data are highly incomplete [23], meaningful inferences, such as the identification of novel alleles, can still be drawn using appropriate analysis methods [25]. Measuring grain yield from small seed regeneration plots is generally not meaningful and is sometimes impractical when thousands of lines are regenerated in a single sowing event [30]. A practical and cost-effective alternative is to measure secondary correlated traits that contribute to grain yield, such as early vigor, height, canopy properties, and biomass, during the growth phase. Moreover, these phenotypic data can be used immediately in conjunction with genetic data generated by advanced genotyping technologies, such as diversity array technology combined with next-generation sequencing (DArT-Seq), and appropriate data quenching for direct phenomic and genomic selection from landrace accessions (Figure 1) [180–182].
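To illustrate why replicated regeneration blocks help statistical analysis, the sketch below estimates entry-mean broad-sense heritability, H² = σ²G / (σ²G + σ²e / r), from replicated plots using a one-way ANOVA, where r is the number of replicates per accession. It is a minimal sketch assuming a balanced, completely randomized layout with no block or spatial effects; real regeneration trials would normally call for a mixed model. The function name is hypothetical. With unreplicated plots (r = 1) the residual variance cannot be estimated at all, which is the practical case for replication.

```python
def entry_mean_heritability(data):
    """Entry-mean broad-sense heritability from a balanced one-way layout.

    data: dict mapping accession id -> list of replicate plot values.
    Assumes every accession has the same number of replicates r >= 2 and no block effects.
    """
    groups = list(data.values())
    g = len(groups)
    r = len(groups[0]) if groups else 0
    if g < 2 or r < 2 or any(len(v) != r for v in groups):
        raise ValueError("need >= 2 accessions with an equal number (>= 2) of replicates")
    grand_mean = sum(sum(v) for v in groups) / (g * r)
    means = [sum(v) / r for v in groups]
    # Mean squares from a one-way ANOVA (accessions as the only factor).
    ms_genotype = r * sum((m - grand_mean) ** 2 for m in means) / (g - 1)
    ms_error = sum((x - m) ** 2 for v, m in zip(groups, means) for x in v) / (g * (r - 1))
    # Variance components; a negative genetic estimate is truncated to zero.
    var_g = max((ms_genotype - ms_error) / r, 0.0)
    var_e = ms_error
    denominator = var_g + var_e / r
    return var_g / denominator if denominator > 0 else 0.0
```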

**Figure 1.** A proposed strategic phenomic approach to improve the value and utilization of genetic resources. IPPN, international plant phenotyping network; EPPN, European plant phenotyping network; NAPPN, North American plant phenotyping network; APPF, Australian plant phenomics facility; DPPN, German plant phenotyping network; PHENOME, French plant phenomic infrastructure; GLIS, global information system; GeneSys-PGR, global portal on crop genetic resources; GRIN-Global, global germplasm resource information network; EURISCO, European plant genetic resources search catalogue; SINGER, system-wide information network for genetic resources; GODAN, global open data for agriculture and nutrition.

This strategic phenomic approach is already being deployed at genebanks. For example, the Australian Grains Genebank (AGG), Horsham, Victoria, was established in 2014 and currently houses approximately 195,000 accessions of 918 crop species, including wheat, barley, canola, field pea, chickpea, lentil, sorghum, maize, cowpea, mungbean, and millets. The number of accessions has increased by around 2750 per annum (Figure 2) [9]. Annually, the AGG regenerates more than 3500 accessions of genetically diverse crop species and wild relatives in the field and greenhouse, depending on viability, quantity in stock, and user demand (Figure 2) [9]. This routine activity requires large inputs of labor and resources, costing in excess of A\$500,000 per annum. Because of the large number of accessions regenerated annually, obtaining a complete phenotypic data set with conventional phenotyping methods is not possible. Several field HTP platforms can acquire multiple crop traits, such as plant height, biomass, leaf area index, and canopy temperature, across thousands of seed regeneration plots at the same time [54,99,136]. The AGG is currently applying several HTP platforms, such as the automated phenotyping of Plant Phenomics Victoria, Horsham [41,42], laboratory-based phenotyping of spikes, and airborne platforms (Figure 2), to capture more useful morphological, agronomic, and physiological traits from seed regeneration cycles. Once validated and analyzed, these data will be made publicly available with passport data, which will help end users prioritize higher-value germplasm for target traits in subsequent studies (Figure 1).
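As a rough indication of scale, dividing the stated annual regeneration cost by the number of accessions regenerated each year gives an approximate cost per regenerated accession. Because both figures are stated as minimums ("in excess of" A\$500,000 and "more than" 3500 accessions), and because the calculation assumes the cost is spread evenly across accessions, the result is only indicative:

$$
\frac{\text{A\$}\,500{,}000\ \text{per year}}{3500\ \text{accessions per year}} \approx \text{A\$}\,143\ \text{per regenerated accession}
$$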

**Figure 2.** Australian Grains Genebank (AGG) storage facilities and its application of HTP technology for phenotyping routine seed regenerations in the laboratory, greenhouse, and field.

#### *5.2. A Combination of Genebanks' Data Mining Approaches*

To enhance the value and utilization of germplasm, it is crucial that traits are identified and linked with genebank accessions (Figure 1). Several methods have been proposed for mining genebank data, such as using published data sources and user feedback, core and mini core collections, and phenotyping and genotyping approaches [15]. Overall, these methods aim to identify genebank accessions containing agronomic traits of interest. Core collections can be constructed using phenotypic (Figure 1) [88,183], genomic [13], and geographical information on accessions for certain crop traits. A variety of software packages are available for developing core collections solely from phenotypic traits; for instance, Chung et al. [184] analyzed 11 quantitative and 28 qualitative phenotypic traits from 10,368 characterized rice accessions and derived a core collection of 107 entries using PowerCore software. Similarly, Dutta et al. [185] constructed a core set of 2,208 accessions from 22,469 accessions of wheat and its wild relatives using 34 highly heritable phenotypic traits.
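Core selection from phenotypic traits can be illustrated with a simple greedy maximin heuristic: standardize each trait, then repeatedly add the accession that is farthest from its nearest neighbor already in the core. This is a stand-in for illustration only and is not the algorithm implemented in PowerCore; it assumes complete quantitative trait data (no missing values, no qualitative traits) and equal trait weights, and the function names are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each trait column (sample SD); constant columns are only centred."""
    columns = list(zip(*rows))
    stats = []
    for col in columns:
        mean = sum(col) / len(col)
        if len(col) > 1:
            sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        else:
            sd = 0.0
        stats.append((mean, sd if sd > 0 else 1.0))
    return [[(x - mean) / sd for x, (mean, sd) in zip(row, stats)] for row in rows]


def maximin_core(ids, traits, k):
    """Greedily pick k accessions that are maximally spread in standardized trait space."""
    if not 0 < k <= len(ids):
        raise ValueError("k must be between 1 and the number of accessions")
    z = standardize(traits)
    origin = [0.0] * len(z[0])
    # Seed the core with the accession farthest from the trait means (origin after z-scoring).
    first = max(range(len(ids)), key=lambda i: math.dist(z[i], origin))
    chosen = [first]
    # For every accession, the distance to its nearest accession already in the core.
    nearest = [math.dist(z[i], z[first]) for i in range(len(ids))]
    while len(chosen) < k:
        # Add the accession least well represented by the current core.
        nxt = max((i for i in range(len(ids)) if i not in chosen), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(nearest[i], math.dist(z[i], z[nxt])) for i in range(len(ids))]
    return [ids[i] for i in chosen]
```

Maximin selection favors covering the extremes of trait space, which suits trait discovery; strategies that aim to preserve allele frequencies rather than extremes would choose differently.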

Accessions can also be grouped based on geographical information, as in the focused identification of germplasm strategy (FIGS) approach [186,187]. The underlying principle of FIGS is that crops are likely to have evolved under environmental selection pressures and developed adaptations in response to extreme climatic conditions. The method therefore uses detailed eco-geographical information and the weather conditions at the sites where accessions were collected to predict their adaptive traits for abiotic and biotic stresses. The FIGS approach has been successfully used to identify several core sets, for example for wheat stem rust [188], drought in faba bean [187], powdery mildew in wheat [189], and Russian wheat aphid [190]. These data mining approaches narrow down the number of accessions for further analysis while still maintaining allelic variation in the subsets.
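The filtering logic at the heart of FIGS can be sketched as selecting accessions whose collecting-site environment matches a target stress profile. This is a deliberately simplified illustration: the published FIGS work uses richer eco-geographical data and, in several studies, predictive models, whereas here the two climate fields, the thresholds, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Accession:
    accession_id: str
    annual_rainfall_mm: float  # long-term rainfall at the collecting site (hypothetical field)
    max_summer_temp_c: float   # mean maximum temperature of the warmest month (hypothetical)


def figs_style_subset(accessions, max_rainfall_mm, min_summer_temp_c, limit=None):
    """Keep accessions collected from sites matching a target stress profile.

    The thresholds encode the hypothesis that drought- and heat-adapted material is
    more likely to come from dry, hot collecting sites; they are user assumptions.
    """
    hits = [
        a for a in accessions
        if a.annual_rainfall_mm <= max_rainfall_mm and a.max_summer_temp_c >= min_summer_temp_c
    ]
    # Most extreme sites first: driest, then hottest.
    hits.sort(key=lambda a: (a.annual_rainfall_mm, -a.max_summer_temp_c))
    return hits if limit is None else hits[:limit]
```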

Several approaches can also be used in sequence to increase the chance of identifying target accessions, as reported by Haupt and Schmid [191], who selected two core collections of 183 and 366 soybean accessions from an original collection of more than 17,000 accessions using a combination of the FIGS approach and SNP markers. Because HTP captures multiple traits non-destructively at the same time, core collections can be readily developed, and accessions containing promising traits of interest can be chosen for further studies.

#### *5.3. A Collaborative Network for Data Collection, Analyzing and Sharing*

To improve HTP practices, genebanks should work closely with universities, research institutes, and industry to standardize seed regeneration procedures, phenotyping protocols, and sensor calibration so that the resulting phenotypic data are comparable across genebanks and can be fully exploited by end users. Numerous initiatives have been implemented at the national and international scale that aim to bring academia and industry together to address common phenotyping questions and integrate the plant phenotyping community (Figure 1) [171]. For instance, world-class plant phenotyping infrastructures have been established in Australia to enhance the capability, capacity, and scientific rigor of national plant phenomics studies and applications. These include the Australian Plant Phenomics Facility, with three nodes located at the University of Adelaide, the Australian National University, and the Commonwealth Scientific and Industrial Research Organisation, Canberra. Moreover, Plant Phenomics Victoria has two nodes, located in Horsham and Bundoora, Victoria (Figure 1). At the global scale, the International Plant Phenotyping Network [171] is an organization representing plant phenotyping centers that forms multidisciplinary working groups and enables communication between stakeholders through conferences and training workshops, so that up-to-date information about new HTP infrastructures and methodologies for various crop phenotypes can be effectively shared (Figure 1). This collaborative networking is essential to foster advances in plant phenotyping technologies, including affordable phenotyping, sensors and platforms, target traits, data analysis pipelines, and data management.

To make valuable information on germplasm available to global users, a cooperative platform for data collection, analysis, and sharing is urgently required. Several international initiatives, such as DivSeek, the Breeding API (BrAPI), the Research Data Alliance (RDA), and Global Open Data for Agriculture and Nutrition (GODAN), have been developed, all of which aim to facilitate the integration and sharing of evaluation and characterization data in order to improve the value and utilization of germplasm [192]. For instance, the DivSeek international network is a global, community-driven initiative that facilitates cooperation and interaction among its members through working groups [17]. Through it, genebanks, phenotyping scientists, and breeders can develop and share methodologies, tools, and best-practice phenotyping technologies to evaluate genetic resources, improving the generation, integration, and sharing of phenotypic data [193]. Moreover, global information platforms such as the global portal on crop genetic resources (GeneSys-PGR) have enabled breeders and genebank users to explore and request germplasm accessions conserved in genebanks worldwide through free online search engines [177]. The Global Information System (GLIS) is an international portal that links all current plant genetic resources systems by assigning unique digital object identifiers (DOIs) to individual accessions [192]. Through DOIs and the linkages between these portals, invaluable phenotypic evaluation and characterization data on germplasm can be effectively shared with the global user network.

Individual institutions can set up different collaboration protocols for sharing and exchanging phenotypic and genotypic data (Figure 1) [172]. For example, the International Maize and Wheat Improvement Center (CIMMYT), El Batán, State of Mexico, Mexico, distributes seed all over the world and receives data in return from experimental trials, providing valuable information for assessing genotype-by-environment interaction [194]. A similar but more stringent protocol could be introduced to enforce the existing clause in the Standard Material Transfer Agreement (SMTA) used for seed distribution by genebanks, under which end users are obliged to return basic phenotypic data on the genetic resources they have used, such as trial location, phenology, biomass, and grain yield. This information would clearly enrich the genebanks' databases and GeneSys-PGR, which end users could then consult as reference guides. In addition, phenotypic and genotypic data should be linked and shared with other national and international plant genetic resources databases through the use of DOIs and the global portals discussed above.
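Linking returned trial data to passport records through accession DOIs amounts to a keyed join. The sketch below is illustrative only: the DOIs are placeholder strings, the record fields are hypothetical, and it does not call any real GLIS or GeneSys-PGR interface.

```python
def link_feedback(passport, feedback):
    """Attach returned trial records to passport entries keyed by accession DOI.

    passport: dict mapping DOI -> passport record (dict).
    feedback: iterable of trial dicts, each with a "doi" key plus trial fields
              (e.g. location, phenology, biomass, grain yield).
    Returns (linked, unmatched); unmatched trials reference DOIs not in `passport`.
    """
    # Copy each passport record so the input dictionaries are not modified.
    linked = {doi: {**record, "trials": []} for doi, record in passport.items()}
    unmatched = []
    for trial in feedback:
        doi = trial.get("doi")
        if doi in linked:
            # Store the trial without repeating the DOI key.
            linked[doi]["trials"].append({k: v for k, v in trial.items() if k != "doi"})
        else:
            unmatched.append(trial)
    return linked, unmatched
```

Keeping unmatched trials separate, instead of discarding them, gives curators a list of DOIs to check for typing errors or accessions missing from the local database.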

Although genebank scientists make use of invaluable knowledge and techniques from other disciplines, such as plant physiology, breeding, agronomy, seed physiology, and computer science, their own translational studies are indispensable to fully utilize HTP technology for phenotyping genebank accessions [34]. For instance, HTP methods can theoretically be applied to quantitatively analyze the 3D canopy structure of wheat using multi-view stereo and structure-from-motion algorithms [50]. This protocol is not yet ready for large-scale phenotyping of genebank accessions; translational research must first be conducted by genebanks to optimize it and to determine whether its throughput is adequate for large-scale phenotyping of various crop species. Similarly, more studies should be dedicated to verifying and developing feasible seed testing methods using multispectral and hyperspectral imagery for handling genebank accessions of various crop species [62].

#### **6. Conclusions**

The application of HTP technology for large-scale phenotypic characterization and validation of genebank germplasm is essential if genebanks are to fulfill their biorepository role (i.e., the preservation and support of further experimentation and plant breeding) [195]. With a comprehensive phenomics approach combining pedigree, genomic, and phenotypic data [17], the true value of genebank genetic resources becomes evident. These resources should therefore be utilized more strategically and efficiently by breeding programs, which should aim to double the current rate of genetic gain to feed the growing world population under the changing conditions expected in the future.

**Author Contributions:** G.N.N. conceived the topic and wrote the manuscript. S.L.N. provided critical comments and edits and acquired the funding. Both authors contributed to editing, revision, and approval of the submitted manuscript. Both authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the Grains Research and Development Corporation, grant number 9176106 to S.L.N., through the Australian Grains Genebank project—Phase 3, 2017-2022.

**Acknowledgments:** The authors wish to thank Dr. Lance Maphosa for his critical reading of and comments on the manuscript. We thank Christine Best, Katherine Dunsford, and Susan Robson for assistance in taking phenotyping photographs in the laboratory and the field.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Germplasm Acquisition and Distribution by CGIAR Genebanks**

**Michael Halewood 1,\*, Nelissa Jamora 2, Isabel Lopez Noriega 1, Noelle L. Anglin 3, Peter Wenzl 1, Thomas Payne 4, Marie-Noelle Ndjiondjop 5, Luigi Guarino 2, P. Lava Kumar 6, Mariana Yazbek 7, Alice Muchugi 8, Vania Azevedo 9, Marimagne Tchamba 6, Chris S. Jones 10, Ramaiah Venuprasad 11, Nicolas Roux 1, Edwin Rojas <sup>3</sup> and Charlotte Lusty <sup>2</sup>**


Received: 31 July 2020; Accepted: 10 September 2020; Published: 1 October 2020

**Abstract:** The international collections of plant genetic resources for food and agriculture (PGRFA) hosted by 11 CGIAR Centers are important components of the United Nations Food and Agriculture Organization's global system of conservation and use of PGRFA. They also play an important supportive role in realizing Target 2.5 of the Sustainable Development Goals. This paper analyzes CGIAR genebanks' trends in acquiring and distributing PGRFA over the last 35 years, with a particular focus on the last decade. The paper highlights a number of factors influencing the Centers' acquisition of new PGRFA to include in the international collections, including increased capacity to analyze gaps in those collections and precisely target new collecting missions, availability of financial resources, and the state of international and national access and benefit-sharing laws and phytosanitary regulations. Factors contributing to Centers' distributions of PGRFA included the extent of accession-level information, users' capacity to identify the materials they want, and policies. The genebanks' rates of both acquisition and distribution increased over the last decade. The paper ends on a cautionary note concerning the potential of unresolved tensions regarding access and benefit sharing and digital genomic sequence information to undermine international cooperation to conserve and use PGRFA.

**Keywords:** plant genetic resources for food and agriculture; genebanks; access and benefit sharing; multilateral system; CGIAR

#### **1. Introduction**

Over the last three decades, under the auspices of the United Nations (UN) Food and Agriculture Organization (FAO), the international community has repeatedly committed itself to developing and maintaining a global system on plant genetic resources for food and agriculture (global system) (http://www.fao.org/agriculture/crops/thematic-sitemap/theme/seeds-pgr/gpa-old/gsystem/en/). This global system includes specialized international bodies that monitor the status of the conservation and use of plant genetic resources for food and agriculture (PGRFA), develop normative instruments when necessary, and support the implementation and use of those instruments. Furthermore, in 2015, the Sustainable Development Goals were adopted, including Target 2.5 concerning the sustainable management of genetic diversity, and the following target indicator focusing on PGRFA in particular: "[n]umber of plant genetic resources for food and agriculture secured in medium or long term conservation facilities" (http://www.fao.org/sustainable-development-goals/indicators/251a/en/).

Through their management of international PGRFA collections, the CGIAR Centers make important contributions to both the global system and SDG Target 2.5. Their contributions include assembling and conserving PGRFA, adding value to those materials through extensive characterization, evaluation, documentation, and health testing, and supplying samples that are free of quarantine pests and diseases to researchers, plant breeders, farmers, national and community genebanks, and seed companies around the world. The international collections hosted by the CGIAR Centers include over 760,000 accessions of crops, forages, and trees that were originally obtained from 207 countries, as well as pre-bred materials.

Over the last ten years, the CGIAR Centers' genebanks have distributed more than 1.1 million PGRFA samples to recipients in 163 countries (data source: Online Reporting Tool (ORT), https://grants.croptrust.org). These transfers represent approximately 23% of all PGRFA transferred following the rules of the multilateral system of access and benefit sharing (multilateral system) created by the International Treaty on Plant Genetic Resources for Food and Agriculture (Plant Treaty). The multilateral system is the internationally sanctioned mechanism for PGRFA exchanges under the global system. The CGIAR breeding programs were the source of an additional 66% (approximately 3.3 million samples) of the PGRFA transferred through the multilateral system. The remaining 11% of materials exchanged through the multilateral system were transferred by organizations and individuals outside the CGIAR (source: Plant Treaty Secretariat).

Given their central position within the multilateral system, the CGIAR genebanks' patterns of international acquisition and distribution of PGRFA over time are potentially significant proxies for the overall status and functioning of the global system in general, and institutions governing access to genetic resources and benefit sharing in particular. The CGIAR genebank managers previously participated in a study of factors affecting acquisitions by the CGIAR genebanks from 1984 to 2009 [1]. The study found that the following factors contributed to a significant drop in genebanks' rate of PGRFA acquisitions from the mid-1990s to 2009: decreased levels of international support for collecting expeditions, overstretched staff, inability to characterize and evaluate the materials already collected, and challenges associated with targeting gaps in existing collections. It established that the most consistent overarching factor was "the highly politicized nature of access and benefit sharing issues at the international, national, and local levels, combined with low levels of legal certainty". The study concluded on a millennial note, looking forward to the resolution of outstanding international tensions over access and benefit-sharing issues, and the full implementation of the multilateral system, with the result that more PGRFA would be made available to include in the international collections maintained by the CGIAR genebanks on behalf of the international community.

The research presented in this paper was initiated with the primary objective of revisiting the conclusions of the earlier study. We structured our research around the following questions: have the CGIAR genebanks' rates of acquisition and distribution of germplasm changed in the last 10 years (2010–2019) compared with the previous decade (2000–2009)? If there were significant changes, what factors, either external or internal to the CGIAR, contributed to those changes? Finally, recalling one of the main findings of the earlier study, how have international policy frameworks in particular affected the genebanks' acquisitions and distributions of PGRFA? The data and methods we used to investigate these research questions, and our principal findings, are set out and discussed below.

Before proceeding, we note that the earlier study which served as initial inspiration for the research presented here focused almost exclusively on CGIAR genebanks' acquisitions of PGRFA [1]. In the early stages of our research planning, we decided to expand the scope of our investigation to also include the genebanks' rates of germplasm distribution and contributing factors.

It is important to underscore that this paper focuses almost exclusively on acquisitions and distributions of PGRFA by the CGIAR Centers' genebanks, and not by the Centers' breeding programs (other than when the breeding programs access materials from, or donate them to, the genebanks). One reason for focusing on genebanks is that data concerning the genebanks' acquisitions and distributions are easier to assemble as a result of the historical, CGIAR system-wide coordination of genebank activities. It would take considerably more time and resources to compile time-sequenced data for breeders' acquisitions of germplasm in particular. Another reason for focusing exclusively on genebanks is that the mandates of the genebanks and the breeders, and their contributions to the global system, while closely linked, are different and therefore amenable to separate studies. It is the responsibility of genebanks to assemble and maintain globally relevant collections of PGRFA, maintain the genetic integrity of conserved materials, and make them available, in the form received by the Center, to recipients worldwide. The CGIAR breeding programs, on the other hand, develop new and improved materials which they distribute globally. Both make enormously important, but different, contributions to the global system. While it would be extremely interesting to examine the breeding programs' experiences in this regard, it is beyond the scope of the research presented here.

#### **2. Materials and Methods**

The CGIAR Centers hosting international PGRFA collections are the Africa Rice Center (AfricaRice), International Center for Agricultural Research in the Dry Areas (ICARDA), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), International Institute of Tropical Agriculture (IITA), International Potato Center (CIP), International Rice Research Institute (IRRI), International Livestock Research Institute (ILRI), the Alliance of Bioversity International and the International Center for Tropical Agriculture (Alliance of Bioversity and CIAT), International Maize and Wheat Improvement Center (CIMMYT), and World Agroforestry (ICRAF). To assemble data on the genebanks' acquisitions and distributions from 2013 to 2019 inclusive, records were compiled from the CGIAR Genebank Research Program 2012–2016 and the CGIAR Genebank Platform 2017–2021. These records are stored in the Online Reporting Tool (ORT) maintained by the Global Crop Diversity Trust (Crop Trust), coordinator of the CGIAR Genebank Platform. The genebank managers reviewed those records and provided additional information, subdividing each year's acquisitions into three mutually exclusive categories: materials from the Centers' own breeding programs, materials from new collecting expeditions, and donations from other organizations (outside the CGIAR). Additional information on the countries and crops from which Centers acquired PGRFA was assembled from the Centers' collecting expedition reports in the ORT and from each genebank's own records. For both acquisitions and distributions for 2010–2012 inclusive, each genebank provided the requisite annual data directly, including the subdivision of acquired materials into the three source categories, provider countries, and crops.

Each Center genebank also provided aggregate totals of its annual acquisitions and distributions for 2005–2009 inclusive. Although some of these data had previously been compiled for CGIAR reports to the Plant Treaty's Governing Body and the FAO Commission on Genetic Resources for Food and Agriculture (CGRFA), those compilations were not organized by calendar year and so could not be used for this study. These data were combined with pre-existing data on annual acquisitions and distributions by the Centers covering 1980–2004 (from the 2012 study referenced above).

Genebank managers were also asked to respond to a survey (see Appendix A) providing reflections on trends in acquisitions and distributions from 2010 to 2019, contributing factors, and experiences acquiring (or attempting to acquire) new materials during the last five years. Staff from the Centers' genebanks, the Crop Trust, and the Genebank Platform Policy Module subsequently worked together in group teleconferences to review and synthesize key findings from the assembled data and survey.

The respondents/informants provided expert knowledge concerning genebanks' performance targets and quality management standards, methods for identifying and prioritizing gaps in collections to be addressed through new collecting expeditions, Centers' efforts to ensure healthy, quarantine organism-free genetic materials, and international crop conservation strategies. Literature reviews were conducted to gain insights into the experiences of genebanks outside the CGIAR. To validate data and key findings, several consultations with CGIAR genebank managers and staff were conducted.

#### **3. Findings**

#### *3.1. Acquisitions*

There was a dramatic rise in acquisitions of new PGRFA by the genebanks between 2010 and 2018 compared with the previous 10 years, although acquisitions remained below the peaks reached in the 1980s and 1990s, as illustrated in Figures 1 and 2 below. The increase over the last 10 years peaked in 2012, when the genebanks received almost 14,000 samples of distinct PGRFA to include as new accessions in the international collections. Of course, not all of the materials that the genebanks receive are ultimately accessioned: materials that are not viable, or that are redundant with respect to materials already in the collection, are not included. For brevity, the tables and figures presented below refer to "accessions"; in fact, the data refer to materials received with the intention of including them as accessions, assuming they are viable and not redundant. In 2019, the number of materials newly acquired by the Centers for the international collections dropped back to the lower levels that characterized the mid-1990s to 2009.

**Figure 1.** Acquisitions and distributions by all CGIAR Centers 1980–2019.

**Figure 2.** Details of CGIAR genebank acquisitions 2005–2019.

In total, over the course of ten years, from 2010 to 2019 inclusive, the CGIAR genebanks acquired 116,921 samples of distinct PGRFA to include in their Article 15 collections.

Approximately 65% of the materials acquired by the genebanks came from providers in 142 different countries. The remaining 35% came from the Centers' own breeding programs, as discussed below. A total of 84% of those countries are developing countries or countries with economies in transition as defined in the International Monetary Fund's World Economic Outlook Database (October 2018, accessible at: https://www.imf.org/external/pubs/ft/weo/2018/02/weodata/groups.htm). Approximately 18% of the materials from countries came from new collecting expeditions; the other 82% was material that was already in ex situ conditions before being sent to the Centers.
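Combining the rounded shares above gives approximate absolute numbers and shows how the country-sourced and collecting-expedition figures relate. Because every input is itself approximate, the results below are rough approximations rather than reported counts:

```latex
\begin{aligned}
\text{from providers in countries} &\approx 0.65 \times 116{,}921 \approx 76{,}000 \\
\text{from CGIAR breeding programs} &\approx 0.35 \times 116{,}921 \approx 40{,}900 \\
\text{share from new collecting} &\approx 0.65 \times 0.18 \approx 0.117 \quad (\approx 13{,}700 \text{ samples})
\end{aligned}
```

A share of roughly 12% is broadly consistent with the 11% of all CGIAR acquisitions attributed to collecting expeditions in the comparison with other genebanks later in this section; the small difference may reflect rounding of the underlying percentages.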

All of the materials from providers in countries were received either under the standard material transfer agreement (SMTA) adopted for exchanges of materials under the Plant Treaty's multilateral system (which allows the Center to conserve, use, and pass on the materials with the same SMTA) or under other agreements whereby the providers gave the Centers permission to subsequently distribute the materials concerned with the SMTA. In this context, it is interesting to note that 31 of the countries from which materials were made available are not currently contracting parties to the Plant Treaty, yet they were still willing to provide the materials, and have them subsequently redistributed by the CGIAR Centers, under the conditions established by the Plant Treaty. We did not analyze whether some other countries that are now Plant Treaty members made materials available before becoming members.

Approximately one third of the PGRFA acquired by the genebanks from providers in countries between 2010 and 2019 were associated with a project coordinated by the Crop Trust from 2007 to 2012 called "Securing the Biological Basis of Agriculture" (hereinafter referred to as the "Regeneration project"), funded by the Bill and Melinda Gates Foundation and the Grains Research and Development Corporation. That project provided financial and technical support for organizations around the world to regenerate unique ex situ PGRFA that were at risk of being lost, to send a copy of the regenerated materials to an internationally recognized genebank, to send a copy for safety back-up in the Svalbard Global Seed Vault, and to make the materials available through the multilateral system of access and benefit sharing. Activities with 84 national partners in 54 countries resulted in the regeneration of approximately 73,000 threatened accessions, of which more than half were duplicated in CGIAR genebanks with permission to make them available through the multilateral system.

As of 2019, another 2256 samples of 1508 unique accessions collected from 25 countries were sent to CGIAR Center genebanks (ICARDA, ICRISAT, IRRI, IITA, CIP) by the Millennium Seed Bank (MSB), associated with the project called "Adapting Agriculture to Climate Change: Collecting, Protecting and Preparing Crop Wild Relatives" (hereinafter the "CWR project") coordinated by the Crop Trust from 2011 to 2021 [2]. The CWR project, with funding from the Norwegian government, provided financial and technical support for project partners to target and collect wild species related to crops, to create a safety back-up, and make collected material available through the multilateral system. It is critically important to have safety duplication for PGRFA accessions hosted by organizations that can ensure the required conditions for long-term storage. Details concerning the Centers, crops, providing countries, and related programs are set out in Table 1 below.


**Table 1.** Providers of materials acquired by CGIAR genebanks 2010–2019 (excluding CGIAR breeding programs)—Notes: associated with Regeneration project (+), CWR project (\*). We separate the two genebanks hosted by the Alliance of Bioversity and CIAT.



As mentioned above, approximately 35% of the materials acquired by the genebanks between 2010 and 2019 came from the Centers' own plant breeding programs, mostly from the CIMMYT wheat breeding program. Every genebank has a periodically reviewed policy and process for strategically acquiring materials from breeders' collections for long-term conservation, with the express aim of ensuring that incoming materials are likely either to be in high demand or to represent diversity not already contained in the collection.

The availability of funds, both for providers and for the Centers as recipients, was one of the factors most frequently mentioned by the genebank managers as affecting the ability of CGIAR Centers to acquire new materials for the genebanks.

The Centers' representatives confirm that in many of the instances where the genebanks were able to acquire new materials, it was critically important to be able to provide financial and technical support for providers' activities related to the collection, regeneration, phytosanitary cleaning and inspection, and shipping of accessions. In some cases, the Centers provided this support; in others (most notably, the Regeneration and CWR projects), the support came from other organizations.

Generally speaking, the financial costs of preparing and sending materials are not particularly high. Costs arise mainly from a lack of capacity and means to multiply, dry, test, and clean seed that has been collected or conserved but is available only in small quantities and with unknown viability. Transaction costs are considerably higher when vegetatively propagated germplasm, including roots, tubers, bananas, or trees, is involved, because movement across international borders is subject to stringent phytosanitary restrictions. These demand expensive disease cleaning and testing at both shipping and receipt, along with extended periods in quarantine based on the risk assessed by the National Plant Protection Organizations (NPPO). In such cases, the Centers need to make substantial investments in strengthening partners' capacity to test and clean materials.

Once materials arrive at the Center, they need to be processed through post-entry quarantine, health-tested, cleaned of infectious diseases, multiplied, tested for viability, and dried before being introduced into the collection and made available for distribution. For clonally propagated crops, these steps can be very expensive and time-consuming, generally taking four years or longer before a new accession becomes available (N. L. Anglin, pers. comm.). Post-entry quarantine (PEQ) procedures, such as growing the first generation in a quarantine greenhouse (a requirement for all materials newly acquired by the genebanks), are also expensive. Occasionally, the NPPO requires new acquisitions to remain in PEQ for periods ranging from four months to two years so that risks can be assessed.

Direct costs for new collecting missions are also relatively modest. Of course, the cost per sample collected differs greatly by crop type, wild or cultivated form, and geographical location. For example, similar projects with similar budgets, working with national agricultural research system (NARS) partners to conduct collecting missions, resulted in very different numbers of acquisitions: samples of 307 landraces and 94 wild relatives and forages were gathered in Tajikistan and Lebanon for the same cost as samples of 106 bananas collected in Papua New Guinea, Samoa, and the Cook Islands. When the costs of incorporating the materials into the collections are also taken into account, per-accession costs for seed and clonal crop collections can differ by an order of magnitude or more.
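The collecting-mission comparison above can be expressed as a cost ratio. Writing $C$ for the common budget (its value is not reported, so it is left symbolic) and assuming, for simplicity, that every sample within a mission cost the same, the cost per banana sample relative to the cost per seed-crop sample is:

```latex
\frac{C / 106}{C / (307 + 94)} = \frac{401}{106} \approx 3.8
```

Under these assumptions, each banana sample cost roughly 3.8 times as much to collect as each landrace, wild relative, or forage sample, before any of the post-collection costs of incorporating clonal materials into the collections are counted.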

In addition to the direct costs associated with managing and transferring biological materials, there are transaction costs associated with obtaining the requisite permissions to provide or access materials from both ex situ and in situ conditions. These transaction costs are often substantially higher where national policies and laws are unclear or non-existent. In such cases, it can take extended rounds of communication over long periods of time with many different levels of national R&D partners, lawyers, and competent authorities before final decisions to provide materials can be made, or, as sometimes occurs, no final determination is ever communicated. Unlike some national genebanks or networks of collections [3], the CGIAR does not have a centralized service responsible for helping Centers comply with the requisite processes for organizing collecting missions and partnerships and obtaining the requisite permissions.

Indeed, along with the availability of funds, the CGIAR genebank managers emphasized that "restrictive or unclear laws or policies" were leading variables influencing their ability to acquire new PGRFA for the international collections. They note that the Plant Treaty's multilateral system has contributed stability and a sound legal basis for providing and receiving germplasm, and that their ability to acquire materials through new collecting is evidence of cooperation and coordination between national authorities responsible for implementing the Plant Treaty and those responsible for regulating access to genetic resources outside the multilateral system (including those implementing the CBD or Nagoya Protocol). However, the Centers are concerned that unresolved disputes concerning the enhancement of the Plant Treaty's multilateral system of access and benefit sharing, and digital sequence information (DSI) in particular (under both the Plant Treaty and the Nagoya Protocol), are holding back some countries (and providers within countries) from making more PGRFA available through the multilateral system. If international tensions over these issues remain unresolved for too long, they could further undermine the Centers' ability to access, generate, use, and distribute PGRFA and associated information.

The Center genebanks confirm that, over time, they tend to obtain materials for their collections from the same set of countries or subregions, where they have established connections. As a corollary, there are some countries and subregions from which they rarely, if ever, obtain materials. A number of the genebanks confirm that they rarely make overtures to organizations and/or countries that have strongly signaled in the past that they are unwilling to make new materials available. Their general perception is that, despite the entry into force of the Nagoya Protocol and the existence of the Plant Treaty's multilateral system of access and benefit sharing, some of these same countries have not substantially altered their approach to making materials available upon request for inclusion in the international genebanks' collections.

The initial stimulus, and subsequent financial and technical support for PGRFA providers, from internationally coordinated projects such as the Regeneration and CWR projects were clearly critical factors contributing to the extraordinary increase in PGRFA made available for the CGIAR Centers to include in the international collections, and thus in the multilateral system.

The Centers' genebank representatives also report that the strength of the Centers' long-term relationships with providers and provider countries is equally important. Most materials are made available to Centers as part of projects with providers. Centers rarely obtain new materials for the genebank by "cold-calling" would-be providers with simple requests for materials and no other form of engagement. In recognition of these factors, the Centers' most recent collective efforts under the CGIAR Genebank Platform (suspended temporarily due to the COVID-19 pandemic) to catalyze and support new collecting expeditions involve support for "two-way flows" of germplasm: from the genebanks to the providers, identified through a jointly conducted analysis of potentially useful germplasm to respond to local needs, and from the providers to the genebanks. These efforts also involve financial and technical support for institutional capacity building for partner organizations in the country concerned. This is consistent with the practices of other organizations seeking to acquire PGRFA for public genebanks through new collecting activities [4].

Of course, these factors affecting the ability of the genebanks to acquire new materials need to be considered within the broader context of what additional PGRFA needs to be collected, backed up, and conserved as part of the global system, either by CGIAR genebanks or other organizations hosting globally available PGRFA collections. Methods for conducting gap analyses and strategies for coordination with other organizations are considered below.

From 2007 to 2017, CGIAR Centers acquired PGRFA through collecting expeditions in 14 countries; collecting in six of those countries was supported by the CWR project. During the same period, the Center for Genetic Resources in the Netherlands (CGN, with 23,000 accessions) received materials through collecting expeditions in five countries, the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK, with 105,000 accessions) received materials from collecting in six countries, and the National Plant Germplasm System of the United States Department of Agriculture (NPGS-USDA, with 500,000 accessions) organized collecting expeditions in at least 20 foreign countries [5–11]. Accessions from collecting expeditions in other countries account for 8% of CGN's total germplasm acquisitions [12]. At the beginning of the century, such materials represented 20% of the material acquired by NPGS-USDA [3]. In the last decade, collecting expeditions have contributed 11% of all acquisitions by CGIAR genebanks. The fact that, relative to the cumulative size of their collections, the CGIAR Centers have engaged in relatively fewer collecting activities during this period than these other organizations may be attributable to a combination of factors. First, the diversity of some CGIAR mandate crops is relatively well represented in ex situ collections compared with the crops that IPK, CGN, and NPGS-USDA have been prioritizing in their collecting activities, i.e., vegetables (e.g., lettuce, Allium, Brassica, chicory, spinach, asparagus, and carrot), fruit and nut trees (e.g., apple, pear, pomegranate, pistachio, walnut, and hazelnut), berries (Fragaria, Rubus, and Ribes), and temperate grasses (Poa, Festuca, Agrostis, Koeleria, and Puccinellia). Second, IPK, CGN, and NPGS-USDA have been acquiring most of their new materials from Transcaucasia, Central Asia, and Europe [10,11], regions that are arguably more open to new collecting missions than other regions of the world. Third, IPK, CGN, and NPGS-USDA may also have had more, and more regular, financial resources to dedicate to supporting new collecting activities. We acknowledge that we are only scratching the surface of potential comparisons between CGIAR and other genebanks around the world; we hope to be able to deepen such analyses in the future.

#### *3.2. Distributions*

Over the last 10 years, the CGIAR genebanks have distributed an average of 115,000 germplasm samples per year to recipients around the world. Although Centers' distributions fluctuate significantly from year to year, the overall rate of distribution from 2010 to 2019 is higher than in the previous decade, when the annual average (2000–2009) was 95,000 samples, as shown in Figure 1 above. Comparing total distributions in 2010–2014 and 2015–2019, four Centers increased their annual distributions in the latter half of the last decade, four remained generally the same, and three decreased.
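As a rough check on the size of this change, using only the two rounded annual averages reported above (so the result is approximate):

```latex
\frac{115{,}000 - 95{,}000}{95{,}000} = \frac{20{,}000}{95{,}000} \approx 0.21
```

That is, average annual distributions in 2010–2019 were roughly one-fifth higher than in 2000–2009; over a full decade, the two averages correspond to about 1.15 million versus 0.95 million samples distributed.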

There is considerable year-to-year fluctuation in the ratio of materials the CGIAR genebanks send to recipients within the CGIAR (mainly breeders) and to recipients outside the CGIAR, as can be seen in Figure 3. Since 2017, the Centers' genebanks have been distributing proportionately more materials to recipients outside the CGIAR. Some Centers do not have crop breeding programs (e.g., Bioversity, ILRI), so almost all of their distributions are to recipients outside the CGIAR. In 2019, the Centers and crops with a high proportion of materials distributed to breeders (within their own Center or in other CGIAR Centers) were AfricaRice (rice); ICRISAT (pigeon pea, chickpea); ICARDA (grasspea, barley); CIP (sweet potato); IRRI (rice); IITA (cassava); and CIMMYT (wheat).

**Figure 3.** Centers' distributions, 2010–2019, broken down by (i) transfers within the CGIAR (internal), and (ii) transfers to recipients outside the CGIAR (external).

Between 2017 and 2019, approximately 80% of the PGRFA samples distributed by the genebanks to recipients outside the CGIAR went to NARS partners, national genebanks, advanced research institutes (ARIs), and universities (see Figure 4 for more details). The proportion of samples distributed to farmers, farmer organizations, and non-governmental organizations (NGOs) during this time was still relatively small (8%), approximately the same rate as noted earlier for 2015 [13]. CIP and ICRAF were the two Centers with the largest proportion of materials distributed to farmers and NGOs between 2017 and 2019 inclusive, as seen in Figure 4b. Much of the material distributed by CIP during this period was part of its repatriation program, in which desired potato landraces are matched to farmers' descriptions and returned to farmers who have lost them due to normal attrition, environmental impacts (drought, hail, diseases), or other reasons, to help maintain and/or increase on-farm diversity. CIP also gives material to farmers in the repatriation program to help them respond to climate change or improve yields.

In terms of absolute numbers, the largest provider of germplasm materials was CIMMYT, distributing 28% of the materials cumulatively distributed by the genebanks between 2010 and 2019, followed by IRRI and ICRISAT, distributing approximately 26% and 12%, respectively.

**Figure 4.** Recipients of germplasm distributed by CGIAR genebanks, 2017–2019.

Landraces are the most frequently requested materials (50% between 2017 and 2019), followed by breeding materials (24%), and wild relatives (13%) (see Figure 5 for more details).

**Figure 5.** Types of germplasm distributed by CGIAR genebanks, 2017–2019.

The countries that received the highest numbers of samples from CGIAR genebanks between 2017 and 2019 are set out in Table 2. These data do not include transfers within or between CGIAR Centers. Four of these countries are not contracting parties to the Plant Treaty.


**Table 2.** 20 countries which received the most samples from the CGIAR genebanks, 2017–2019 (not including intra- and inter-CGIAR Center transfers).

The genebanks' annual distribution reflects demand from users, which is often driven by the needs of the research and development projects in which they are involved, and by the profile or visibility of the genebanks among research and development organizations and non-governmental organizations in different countries. The Center hosting the genebank is frequently involved in the research in some way. Here, we provide some examples of projects with which distributed material was associated. The overall spike in distributions in 2013 is largely attributable to internal CIMMYT transfers of 36,500 wheat accessions to CIMMYT researchers and breeders characterizing maize and wheat genetic diversity for use in breeding programs, as part of CIMMYT's Seeds of Discovery (SeeD) project. CIMMYT's high number of distributions in 2013 also includes 8500 wheat accessions sent to recipients in Iran and Turkey as part of those countries' restoration efforts. In 2019, ICRAF distributed considerably more tree germplasm than in recent years as a result of requests from organizations involved in "Regreening Africa", a large-scale project being implemented in East and West Africa. Between 2017 and 2019, the ICARDA genebank's rate of distribution to recipients outside the CGIAR increased by 200–300% over the previous five-year period. This reflects a combination of the Center's progress in getting its genebank's operations back "up to speed" after the disruptions associated with relocating from Syria and a surge of new interest in the genebank's materials following the publicity generated when the Center retrieved materials from the Svalbard Global Seed Vault. Large increases in the ICRISAT genebank's distributions in 2014 and 2018 were associated with support for a consortium of Indian organizations conducting chickpea and pigeon pea multilocation trials. In 2017, the ICRISAT genebank's distributions were higher than usual partly because it responded to a request from Italian organizations for core collections of cereals (sorghum, pearl millet, finger, kodo, proso, barnyard, little, and foxtail millets) as part of a program to reproduce local varieties and develop new crops adapted to Italian local conditions. The IITA genebank's increased rate of distribution in 2015 was associated with requests for cowpea and Bambara groundnut germplasm from Nigerian universities to support their research programs on those crops.

The emergence of a new or rare disease can also create a need to screen large numbers of accessions for resistance [14,15].

The genebank managers confirm that one of the most important factors affecting demand for PGRFA is the quality and relevance of the accession-level information that the Centers compile about the materials in their collections, a finding consistent with the relevant literature [16,17]. Accession-level information helps users make informed decisions about which materials in the collections are potentially most useful for their specific purposes. It also allows the genebank to make more targeted selections of materials in response to requests, since users often do not know where to begin in choosing an appropriate accession for their needs. The genebanks highlighted the importance of trait-specific data, including nutritional qualities, responses to biotic and abiotic stresses, and agronomic performance; genetic sequence information linked to traits or describing relationships among accessions; and geographic information about the place of collection, including climate conditions and soil type. IRRI reports a significant increase in requests for genetic stocks of the 3000 sequenced rice accessions, particularly between 2015 and 2018, after their full genome sequences were made publicly available. The genebanks have minted digital object identifiers under the Plant Treaty's global information system (GLIS-DOIs) for almost all of the accessions in the Center-hosted international collections, with a long-term view to supporting this process and linking publications and data back to accessions.

While such information is essential for generating interest in, and demand for, materials in the genebanks, it also permits users to make more targeted requests for a narrower range of materials with each request; otherwise, users must choose from among thousands of accessions. Documenting the traits and uniqueness of each accession thus helps create a path to utilization of the germplasm. Further, creating cores or subsets of accessions, small groups of accessions defined by the genebank to help narrow the search for specific traits, has influenced demand and could be scaled up significantly. AfricaRice's genebank has used molecular markers to create core and mini-core sets that represent the maximum possible genetic variation contained in the whole African rice collection [18]. As a result, there has been a significant increase in requests for the mini-core set for use in rice genetic and breeding studies and gene discovery (M. N. Ndjiondjop, pers. comm.). The Centers' genebank managers note that, as they have been able to increase the quantity of accession-level information, requests are indeed becoming more informed and better targeted. Scientists working with genebanks both outside [19,20] and within the CGIAR are developing methods to help users identify useful materials in ex situ collections, including the ICARDA-led focused identification of germplasm strategy (FIGS) [21,22]. In 2018, IITA developed a FIGS population of drought- and heat-tolerant cowpea, which led to a significant increase in requests: over 5000 samples were distributed to recipients in 19 countries for research purposes (T. Marimagne, pers. comm.).
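The idea of a core or mini-core set, a small subset chosen to capture as much of a collection's genetic variation as possible, can be illustrated with one generic selection heuristic: greedy max-min (farthest-point) selection on marker genotypes. This is a minimal sketch under stated assumptions, not the method used by AfricaRice [18] and not the FIGS approach; the accession IDs, the toy 0/1/2 marker coding, and the function names `hamming` and `greedy_core` are hypothetical.

```python
from typing import Dict, List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the marker loci at which two accessions carry different genotypes."""
    return sum(x != y for x, y in zip(a, b))


def greedy_core(genotypes: Dict[str, Sequence[int]], k: int) -> List[str]:
    """Select k accessions by greedy max-min (farthest-point) selection.

    The seed is the accession with the largest summed distance to all others;
    each further pick is the accession whose nearest already-selected
    accession is farthest away. Ties go to the smallest accession ID.
    """
    if k <= 0:
        return []
    ids = sorted(genotypes)
    if k >= len(ids):
        return ids  # the whole collection is already "core"
    first = max(ids, key=lambda i: sum(hamming(genotypes[i], genotypes[j]) for j in ids))
    core = [first]
    # Distance from every unselected accession to its closest selected one.
    nearest = {i: hamming(genotypes[i], genotypes[first]) for i in ids if i != first}
    while len(core) < k:
        pick = max(nearest, key=lambda i: nearest[i])
        core.append(pick)
        del nearest[pick]
        for i in nearest:
            nearest[i] = min(nearest[i], hamming(genotypes[i], genotypes[pick]))
    return core


if __name__ == "__main__":
    # Hypothetical toy data: 4 markers, each coded 0/1/2 (copies of one allele).
    toy = {
        "A": [0, 0, 0, 0],
        "B": [0, 0, 0, 1],  # near-duplicate of A
        "C": [2, 2, 2, 2],
        "D": [2, 2, 2, 1],  # near-duplicate of C
        "E": [1, 1, 1, 1],
    }
    print(greedy_core(toy, 3))  # ['E', 'A', 'C']
```

The near-duplicates B and D are left out of the three-accession subset, which is the behavior a diversity-maximizing core is meant to have. Practical core-collection work may use many more markers and other distance measures or selection criteria than this toy example.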

The supply of PGRFA from the genebanks also depends on the ready availability of sufficient stocks of pest- and disease-free materials, with legal certainty concerning the conditions under which the materials can be provided and received. The costs of multiplying, assuring the health of, and distributing samples of crops that are clonally propagated or have recalcitrant seeds are much higher than for crops with orthodox seeds. Under the framework of the CGIAR Genebank CRP (2012–2016) and the CGIAR Genebank Platform (2017–2021), the Crop Trust, in cooperation with the Centers, has developed performance targets and a monitoring system to assess the availability, safety duplication, documentation, and quality management of the collections. Ultimately, the target is to have 90% of the accessions in the collections immediately available and 90% safety-duplicated at two locations (for seed collections only). As of the end of 2019, 78% of all materials in the CGIAR genebanks were immediately available, 60% of the seed collection was secured in safety duplication at two levels, and 77% was duplicated at the Svalbard Global Seed Vault. A total of 72% of the clonal crop collection was safety-duplicated in the form of cryopreservation or in vitro cultures in at least one location.

The COVID-19 pandemic has highlighted the strategic importance of cryopreserving clonal crops. In vitro cultures require continuous monitoring and upkeep by genebank staff in person. If scientists' access to these collections is limited by governmental policies restricting movement, some accessions in those collections could deteriorate and be lost. If those accessions are not safety-duplicated elsewhere, or if the safety duplicates are themselves in vitro collections vulnerable to the same risk, then unique materials, potentially including varieties no longer found in farmers' fields, could disappear. Cryopreserved back-up collections of these materials would address this risk.

The CGIAR genebanks report that, on occasion, materials they send, and materials they are meant to receive, are held up for long periods of time due to the implementation of phytosanitary regulations, occasionally to the point where materials die before they arrive at their destination.

The agreements between the CGIAR Centers and the Plant Treaty's Governing Body (signed in 2006) have created legal certainty concerning the status of the collections and the conditions under which they can be distributed. This is reflected in the fact that almost all transfers of PGRFA from the genebanks are under the Plant Treaty's SMTA (with the exception of materials sent for service agreements, or restoration or direct use by farmers in cultivation, as per the opinions of the Ad Hoc Technical Advisory Committee on the Multilateral System and SMTA) [23]. Just as the Centers received materials under the SMTA from providers in countries that are not contracting parties, the Centers' genebanks also distributed PGRFA, using the SMTA, to recipients in thirteen countries that are not Plant Treaty contracting parties (between 2012 and 2019 inclusive).

Despite the benefits to the Centers of operating under the Plant Treaty's framework, the genebanks note with concern that some large seed companies, some universities, and one national agricultural research organization are unwilling to receive materials under the SMTA, which makes it impossible for the Centers to distribute materials from the genebanks (and also most of the materials from the breeding programs) to them. Other genebanks noted that their ability to distribute materials from the international collections was being constrained by the policies of the country in which they are located.

#### **4. Discussion**

#### *4.1. Targeted Acquisitions within the Context of the Global System*

Most of the international collections hosted by the CGIAR Centers originated as working collections to support research and breeding programs of international and national public agricultural research organizations. Their subsequent growth depended partly on taking advantage of opportunities to acquire PGRFA from a wide range of sources in an ad hoc manner, for example, being offered material:


The collections grew in opportunistic "fits and starts" without resources or tools to systematically analyze their structure and coverage vis-à-vis what exists in situ or in other collections around the world. Consequently, it can be a challenge for genebank managers to be certain whether newly acquired materials duplicate materials they already have, are the same as materials conserved and made internationally available by other genebanks, or are truly unique. Characterization, documentation, and some cross-referencing are key to resolving this issue, but duplicates or near duplicates are still likely to be abundant within and among genebanks.

In recent years, the Centers' genebanks have started to take advantage of modern molecular tools such as genotyping to characterize the genetic structure of the collections and to identify genetic differences among and within accessions, for example, in potato [25–27], sweet potato [28,29], cassava [30], forage grasses [31–33], Mexican wheat landraces [34], and African rice [18]. IRRI generated whole genome sequences of three thousand accessions in its collection [35]. While still constrained by resource limitations, molecular-level characterization has unprecedented potential to identify redundancies not only within collections but also across collections, both inside and outside the CGIAR, and to identify potential misclassifications, introgressions, levels of domestication, genetic origins, and putative hybrids. However, even with these data, it is often difficult to determine an adequate cut-off threshold for deciding that a material is a duplicate or too genetically similar to be incorporated into the collections. The cost of generating raw sequence data has dropped precipitously, but the expertise and computing power needed to analyze the data remain expensive, and the analysis is time-consuming, especially for whole genomes rather than reduced-representation sequencing approaches for genotyping. Relying on molecular data alone to resolve duplication issues also raises other biological complications. Nevertheless, the CGIAR Genebank Platform has been piloting training programs with scientists from national agricultural research organizations to use genotypic information to analyze within- and among-accession genetic diversity for a range of crops.

All CGIAR genebanks now have acquisition policies in place, and current processes are based on a more critical assessment of whether new materials add diversity to the collections or respond to a specific need or mandate. This is particularly important for clonal collections where, resources permitting, the CGIAR genebanks can use genotyping to help confirm whether a sample of newly acquired material is unique to the collection before undertaking expensive procedures to test, clean, and reproduce the materials for introduction into the collection as new accessions [31]. Centers working with clonal crops have also developed tools that use genetic sequence information to test for the presence of viruses and bacteria, reducing costly delays for Centers acquiring and distributing clonal PGRFA [36–38].

Under the framework of the CGIAR Genebank Platform, the Centers and the Crop Trust have developed three methods for analyzing gaps in the coverage of their own collections. First, so-called "diversity trees" have been constructed for 22 crops. The trees are developed using published literature and expert knowledge to categorize the diversity held in each crop gene pool into known variety or genotype groups or wild species, which allows accessions to be mapped onto the groups and the extent to which the collection represents the gene pool to be quantified [32,39,40]. Second, spatial analyses have been undertaken using a method that assesses the ecogeographic gaps and coverage of current CGIAR crop collections. The method, which works best for collections in which a high percentage of accessions have latitude–longitude information on their origin, looks for relationships between geographic patterns in crop distribution and genetic structure, and uses these relationships to build distribution models for crop landraces [41]. Third, a method for trait-based gap analyses uses machine learning to analyze and predict the distribution of adaptive priority traits in relation to the environment; it presupposes well-characterized collections and works best where landraces have been grown in an environment long enough for their traits to become associated with it. Figure 6 gives a preliminary indication of the coverage of cultivated gene pools in the CGIAR genebank collections obtained through the use of these tools, and clearly illustrates that the cultivated diversity of some crops is well represented while that of others is considerably less so.
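The diversity-tree approach amounts to mapping each accession onto an end-group of a hierarchical classification and then counting how many end-groups are represented. The Python sketch below illustrates that counting step only, under stated assumptions: the tree, group names, and `min_accessions` parameter are hypothetical, and it assumes each accession has already been assigned to an end-group (the expert-driven step the text describes).

```python
from collections import Counter

# Hypothetical diversity tree: internal nodes are dicts, and lists hold the
# leaf end-groups (variety/genotype groups or wild species). Real trees are
# built from published literature and expert knowledge.
TREE = {
    "cultivated": {
        "group_1": ["landrace_A", "landrace_B"],
        "group_2": ["landrace_C"],
    },
    "wild": ["wild_species_X", "wild_species_Y"],
}

def end_groups(node):
    """Yield the leaf end-groups of a nested dict/list tree, in order."""
    if isinstance(node, dict):
        for child in node.values():
            yield from end_groups(child)
    else:
        yield from node

def coverage(tree, accession_groups, min_accessions=1):
    """Count accessions per end-group and the share of end-groups represented.

    accession_groups: the end-group each accession has been mapped to;
    labels that are not leaves of the tree are ignored.
    """
    leaves = list(end_groups(tree))
    leaf_set = set(leaves)
    counts = Counter(g for g in accession_groups if g in leaf_set)
    per_group = {leaf: counts[leaf] for leaf in leaves}
    represented = sum(1 for n in per_group.values() if n >= min_accessions)
    return per_group, represented / len(leaves)

mapped = ["landrace_A", "landrace_A", "landrace_C", "wild_species_X", "unassigned"]
per_group, share_covered = coverage(TREE, mapped)
print(per_group)
print(f"{share_covered:.0%} of end-groups represented")
```

Here three of five end-groups hold at least one accession, so 60% are represented; raising `min_accessions` to 2 drops this to 20%. The per-group counts correspond to the kind of quantity that colors encode in a figure such as Figure 6.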

In addition to the three strategies listed above, some genebanks are using collection-wide genotyping to make key decisions on new acquisitions. This strategy is only effective for genebanks that have genotyped their entire ex situ collection with a particular marker system. Any new material being considered for acquisition can then be genotyped with the same marker system (GBS, SNP, DArTseq, etc.), and the resulting fingerprints compared with those of the entire germplasm collection to gain genetic insights. Phylogenetic results and genetic distance measures produced from the genotyping data can clearly show which samples are unique and which are redundant with respect to the existing germplasm collection. Once these data are available, decisions can be made on which material to introduce into the genebank, usually with the aim of maximizing genetic diversity and avoiding the introduction of material that is genetically similar to existing accessions. This strategy is especially useful in clonal genebanks, in which introduction and virus cleaning are expensive and time-consuming.
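The screening step can be sketched as a nearest-neighbor search over genotype fingerprints. The Python example below is a minimal illustration under stated assumptions, not a CGIAR pipeline. Genotypes are coded as allele dosages (0/1/2) on a shared marker panel. The distance used is a simple identity-by-state (IBS) measure, chosen here for clarity; real analyses may use other distances or phylogenetic methods. The `threshold` value is hypothetical, reflecting the difficulty, noted above, of choosing a cut-off.

```python
import numpy as np

def ibs_distance(a, b):
    """Identity-by-state (allele-sharing) distance between two genotype vectors.

    Genotypes are coded as allele dosages 0/1/2 with np.nan for missing calls;
    only loci called in both samples are compared. 0.0 means identical calls,
    1.0 means opposite homozygotes at every compared locus.
    """
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return np.nan
    return float(np.mean(np.abs(a[mask] - b[mask])) / 2.0)

def screen_candidate(candidate, collection, ids, threshold=0.05):
    """Find the nearest existing accession to a candidate and flag redundancy.

    collection: 2-D array with one row per existing accession, genotyped with
    the same marker panel and coding as the candidate.
    threshold: illustrative cut-off only; it must be calibrated per crop and
    marker system.
    """
    dists = np.array([ibs_distance(candidate, row) for row in collection])
    nearest = int(np.nanargmin(dists))  # raises ValueError if all distances are NaN
    return ids[nearest], float(dists[nearest]), bool(dists[nearest] <= threshold)

collection = np.array([[0, 1, 2, 2, 0, 1],
                       [2, 2, 0, 0, 1, 1]], dtype=float)
ids = ["ACC-001", "ACC-002"]
candidate = np.array([0, 1, 2, 2, 0, np.nan])
print(screen_candidate(candidate, collection, ids))
```

In this toy case, the candidate matches `ACC-001` at every locus called in both samples, so its distance is 0.0 and it is flagged as redundant. A candidate heterozygous at every locus would sit at distance 1/3 from both accessions and would be retained. Because the redundancy call depends entirely on `threshold`, one reasonable (assumed) practice is to calibrate it against distances observed among known duplicate pairs in the same collection.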

**Figure 6.** Preliminary assessment of the coverage of traditional landraces in international collections managed by CGIAR Centers. Note: these findings were generated by mapping accessions onto diversity trees. The colors correspond to numbers of accessions representing identified landrace end-groups or varieties making up the crop gene pool. This figure is reproduced from the 2019 Annual Report of the CGIAR Genebank Platform [40].

The Genebank Platform has recently initiated communications with national agricultural research organizations and national Plant Treaty Focal Points in 20 countries to use combinations of these tools to identify complementary holdings and gaps in collections, and priority areas for collecting (with the understanding that collected materials would be made available through the Plant Treaty's multilateral system). At the same time, the partners undertook to work together to identify potentially useful materials from the international collections to test in the countries concerned. While COVID-19 has affected the preparations for this work, the ambition remains to support the collecting of PGRFA in target countries where there are significant gaps in ex situ conservation.

The CGIAR Centers do not have the objective to host complete collections of the diversity of their mandate crops. Rather, they seek to coordinate efforts with other collections to ensure the broadest possible coverage, long-term conservation, and availability of materials, in the global system as a whole. The Centers will pursue this approach, and the use of the tools and methods described above, in the process of defining collecting priorities and revising the current set of crop conservation strategies (available online at https://www.croptrust.org/resources/#ex-situ-conservation-strategies). The Centers (and the Crop Trust) do not organize collecting expeditions on their own. They work in collaboration with NARS partners who take responsibility for conducting the collecting, obtaining requisite permissions in compliance with national laws and regulations, and first depositing samples in their own national genebanks before transferring them to the CGIAR genebanks.

#### *4.2. Phytosanitary Standards and Initiatives*

Maintaining the health of the germplasm in the international collections is critically important to guard against the international spread of quarantine pests and diseases. In 2019, the 11 germplasm health units (GHUs) of the 10 Centers hosting PGRFA collections collectively facilitated 2,004 exchanges of materials with 141 countries. Altogether, 152,469 samples of 105,961 accessions were analyzed in 594,909 diagnostic tests, and 13,248 samples were rejected, either to be replaced with healthy materials from other batches or to be subjected to phytosanitary treatments to eliminate the pests concerned [40]. Most of the pests detected are target-specific species. Viral pathogens are most frequently intercepted in legume seeds and clonal crops. Centers distribute only germplasm that is free of quarantine organisms; untested and unclean materials are not distributed until pest-free stocks have been established. As of 2019, nearly 80% of the germplasm conserved in CGIAR genebanks had been tested for quarantine pests, and about 78% of the collection was available for immediate distribution.
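The 2019 totals above also imply a few ratios that the text does not state directly. The short Python check below derives them (the variable names are ours) and assumes that all reported tests and rejections refer to the 152,469 samples analyzed.

```python
# Totals reported for the 11 CGIAR germplasm health units in 2019.
samples_tested = 152_469
accessions_tested = 105_961
diagnostic_tests = 594_909
samples_rejected = 13_248

tests_per_sample = diagnostic_tests / samples_tested        # ~3.90
samples_per_accession = samples_tested / accessions_tested  # ~1.44
rejection_rate = samples_rejected / samples_tested          # ~0.0869

print(f"{tests_per_sample:.1f} diagnostic tests per sample")
print(f"{samples_per_accession:.1f} samples per accession")
print(f"{rejection_rate:.1%} of samples rejected")
```

On these assumptions, each sample underwent roughly 3.9 diagnostic tests on average, and just under 9% of samples were rejected. Rejection here does not imply loss, since rejected samples were replaced with healthy material or treated.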

The International Plant Protection Convention (IPPC) is essential for promoting and harmonizing countries' phytosanitary regulations, and for facilitating the international movement of plant genetic resources that are free of quarantine pests and diseases. However, the system can and should be further tailored to the particularities of the international movement and use of plant genetic resources. Under the framework of the IPPC, contracting parties occasionally adopt International Standards for Phytosanitary Measures (ISPMs) tailored to different commodities and articles that could carry quarantine pests and diseases across international borders (e.g., farm equipment, commercial seed). To date, approximately 42 ISPMs have been adopted. Unfortunately, no ISPM has been developed for the regulation of the international movement of plant germplasm from genebanks. ISPM 36 (Integrated Measures for Plants for Planting) and ISPM 38 (International Movement of Seeds) partially, but insufficiently, address some of the issues. As a result, the national plant quarantine facilities of many countries either develop and follow their own norms or follow those prescribed by ISPMs dealing with commercial seed (or plantlets), which are not appropriate for plant germplasm shipped to and from genebanks. The commercial seed ISPMs, which anticipate tons of seed in a single shipment, prescribe testing protocols that consume most of the very small quantities of seed transferred to and from genebanks, or create long delays that are unnecessary to address the risks associated with genebank germplasm. These delays can result in materials dying in transit before they arrive at their intended destinations, or arriving so late that they miss an entire planting season, thereby delaying or halting planned research and/or plant breeding activities.
While phytosanitary issues are most challenging for recalcitrant seed or clonal crops, the genebanks report similar challenges with seed crops.

To address this situation, CGIAR germplasm health experts are working with partners from other organizations to develop a draft protocol for a comprehensive phytosanitary compliance assurance procedure. The procedure would demonstrate the best practices in use for germplasm production and health assurance and maintain transparency in risk assessment and mitigation strategies, with the aim of obtaining accreditation from national plant protection organizations (NPPOs) as trusted facilities so that germplasm distribution can be fast-tracked. The initiative is referred to as the CGIAR GreenPass Phytosanitary Protocol (GreenPass). If the concept is endorsed by the IPPC's Commission on Phytosanitary Measures, NPPOs could justify eliminating redundant checks by cutting some steps or reducing the processing time for material from GreenPass-accredited facilities. (These issues are addressed in more detail by Kumar et al. in this Special Edition of *Plants*.)

#### *4.3. Genebank and Plant Genetic Resources Valuation*

During the late 1990s, the genebanks received mounting criticism that materials stored in genebanks were rarely used. This was a concern because necessary investments in genebanks are difficult to obtain if potential investors do not appreciate the value of crop diversity, including the multitude of services and benefits it can provide. Several studies followed in the early 2000s which contradicted the viewpoint that genebanks were "unused" [42,43] and a large body of research documented the high rates of return from the genetic improvement of crops for yield, yield stability, quality, nutritional composition, resource use efficiency, and resistance to pests and diseases [44–49]. Most of the economic benefits have been generated from farm productivity gains which can be attributed to research and breeding programs by publicly funded institutions, such as the CGIAR, and society and consumers have especially benefited from lower food prices. Since then, however, the research on the value of international collections and the genetic diversity held in the CGIAR genebanks has not kept pace.

Recently, the CGIAR Genebank Platform and the Crop Trust supported the establishment of a Community of Practice on Genebank Impacts to revive interest in applied economics research in this area. The work resulted in several papers that have made important contributions to earlier research on genebank valuation [50]. For example, Bernal-Galeano et al. (2020) [51] estimated the gross benefit from the "Victoria" potato variety in Uganda at USD 1.04 billion, which exceeds the annual operational cost of the CIP genebank several times over. Villanueva et al. (2020) [52] estimated that a 10% increase in the genetic contribution of IRRI genebank accessions to an improved rice variety grown by farmers in East India is associated with a yield increase of 27%. Aberkane et al. (2020) [53], Sellitti et al. (2020) [54], and Kitonga et al. (2020) [55] focused on the use of wild, semi-natural, and landrace genetic materials in enhancing crop diversity options for breeders and farmers. The study by Ocampo-Giraldo et al. (2020) [56] highlighted the importance of combining ex situ and in situ approaches in a dynamic model of conservation. Alexandra et al. (2020) [57] described the formation of the Pacific Community's Centre for Pacific Crops and Trees (CePaCT) in Fiji and underscored the role of a global effort to collect, conserve, and breed taro in response to disease outbreaks.

This current set of studies attests to the value of genebanks in at least two ways. First, the studies contribute to a better understanding of the role, function, and value of genebanks in light of climate change and evolving food security challenges. Several authors were able to trace the ancestry of varieties currently adopted by farmers to specific accessions in the genebanks and to apportion economic gains by drawing on pedigree information. Second, the studies highlight the importance of long-term conservation and safety duplication of unique and diverse genetic materials for potential, as yet unknown, uses by future generations. Consistent with Gollin (2020) [58], the findings support the need to refocus global conservation strategies on the efficient management of genetic resources, including acquisition and conservation priorities.

#### *4.4. Money, Politics, and Law*

The findings presented above concerning the influence of political tensions, restrictive policies, and legal uncertainty are similar to the conclusion of the previous study [1] of factors affecting Centers' acquisitions between 1980 and 2009.

On the positive side, there is evidence that the Plant Treaty's multilateral system of access and benefit sharing is contributing positively to the willingness of many countries, national genebanks, and other providers to make PGRFA available and to safety-duplicate material in the CGIAR Center-hosted international collections. Perhaps the most significant evidence (which is so obvious it is often overlooked) is that all of the material in the international collections ultimately came from countries and, to date, 146 countries and the EU have voluntarily ratified the Plant Treaty, which invites the Centers to manage those collections under the Plant Treaty framework and to make those materials available under the SMTA. As of November 2019, at least 135,000 accessions maintained in CGIAR genebanks (approximately 17% of the collections) had originally been obtained from countries that were not Plant Treaty contracting parties, including, but not limited to, Mexico, China, Nigeria, Colombia, Thailand, Russia, Vietnam, Azerbaijan, the Republic of South Africa, Uzbekistan, and Kazakhstan. It is partly for this reason that the CGIAR Centers feel duty-bound to continue making materials available under the SMTA to recipients in countries that are not Plant Treaty members. Some contracting parties, particularly those that have been criticized in recent years for not making more PGRFA directly available through the multilateral system, complain that there is insufficient recognition of the fact that many of the genetic resources that the CGIAR Centers' genebanks distribute through the multilateral system (with the exception of accessioned breeders' materials) originally came from them. Information about the country sources (provenance) of materials held in the Center-hosted international collections is available through Genesys, and Centers have published papers highlighting the origins of materials in the collections [31,53].
However, more can and should be done, through more popular, less expert-oriented mechanisms, to celebrate countries' contributions to the international collections and the multilateral system. Indeed, during the Plant Treaty Governing Body meeting in October 2019, CGIAR undertook to work with the Plant Treaty Secretariat to publicize this information more broadly.

More recent evidence of the positive influence of the Plant Treaty on providers' willingness to include materials in the international collections is the surge of materials received by the Centers' genebanks between 2010 and 2019. All of those materials were either provided under the SMTA or with permission for the Centers to subsequently make them available under the SMTA. It seems likely that many of those providers would not have been willing (or permitted) to provide materials for inclusion in the international collections in the absence of the Plant Treaty, the multilateral system, and the SMTA. Indeed, some genebanks have been informed by national partners that while those partners can provide Annex 1 PGRFA, they cannot provide non-Annex 1 PGRFA, because their national rules only apply to materials formally included within the scope of the multilateral system.

The fact that the Centers were able to acquire materials, particularly through new collecting expeditions, is evidence that regulatory frameworks other than the Plant Treaty's multilateral system are also contributing to the willingness and ability of providers to provide access to those materials. Many in situ PGRFA are not automatically included in the multilateral system, so collection must be approved by a competent authority subject to national law (or, in the absence of a law, some other standard observed by the authority) to allow the material to be collected, deposited in the national genebank, and later sent under the SMTA to the CGIAR genebanks. (In this context, it is relevant to note that some countries have explicitly adopted the policy of not wanting to regulate access in this manner.) The CGIAR genebanks never collect on their own. They work in partnership with national organizations that manage the interface with national authorities, the logistics, and the actual collecting. Transferring material within a country from one ABS system to another requires coordination and cooperation between the competent authorities and stakeholders.

It is certainly the case that the existence of the SMTA (because it is standard) made it possible for the Centers to process agreements to receive and manage those same materials. It would have been impossible for the Centers to negotiate unique transfer agreements with sui generis benefit sharing, dispute resolution, scope of use, and other conditions in each case, and then to put systems in place to manage materials under a plethora of different conditions.

It is important to consider the significance of the fact that most of the materials made available to the genebanks between 2010 and 2019 were associated with the Regeneration and CWR projects. First, it highlights the fact that conserving and providing PGRFA require financial and technical support and coordination, and that in the absence of such support, many potential actors in the global system are unable to play their anticipated roles, including securing the unique materials in their collections through safety duplication. In this context, it is important to emphasize that the funds and/or other non-monetary benefits provided to project partners in the Regeneration and CWR projects were modest, designed to subsidize or cover the costs of regeneration and/or collecting, health testing, and shipping, in some cases the costs of training to do those things, and the costs for the Centers to receive the materials. Extra funding is needed to support collecting new PGRFA. Those costs, transaction costs in particular, would decrease if countries had well-defined systems for processing requests for germplasm, and clear signals from competent authorities that such requests were acceptable to consider. It is interesting to note that the USDA has a dedicated Plant Exploration Program managed by a Plant Exchange Office that is responsible for organizing collecting expeditions and reaching out to national competent authorities on behalf of the genebanks included within the national system [3]. There is no such service or function within the CGIAR. Second, the Regeneration and CWR projects highlight the continued importance of, and need for, scientifically informed priority setting and international coordination to generate a shared sense of purpose and to motivate the wide range of actors, spread across the world, who ended up being engaged in those projects.
The rounds of projects supported by the Plant Treaty's Benefit-sharing Fund (BSF) have played a similar role, catalyzing the interest and engagement of a broad range of actors operating at levels from individual farms to international organizations, and providing financial and technical support for project participants to, among other things, collect, multiply, health-test, and share PGRFA through the multilateral system. It would be interesting, but it is beyond the scope of this paper, to identify the range of materials included in the multilateral system in general as a result of the BSF projects.

This is not to say that all of these internationally coordinated projects worked perfectly, or were immune from criticism, or that many contracting parties, organizations, and people involved in the conservation and use of PGRFA do not have alternative visions or priorities. The point is that while the Plant Treaty's multilateral system of access and benefit sharing provides an internationally sanctioned, clearly defined legal platform for exchanging plant genetic resources, additional financial and technical support, and inspiring visions for dispersed actors to work together, are also necessary to take advantage of the multilateral system. Other actors (including potentially the Plant Treaty's own Governing Body or, more realistically, a specialist group it establishes) could develop a vision and an internationally coordinated program that similarly stimulates and supports activities that take advantage of the multilateral system to access, improve, and share PGRFA, and to safeguard threatened PGRFA in furtherance of the Sustainable Development Goals (e.g., pooling and evaluating a genetically diverse range of PGRFA for traits adapted to local climate changes).

Meanwhile, the CGIAR Centers' genebanks continue to distribute hundreds of thousands of PGRFA samples under the Plant Treaty's SMTA. Indeed, in the decade between 2010 and 2019 (inclusive), the Centers' genebanks distributed 21% more PGRFA samples than they did over the previous decade. These numbers attest to the utility of the multilateral system and increased reliance upon it as a means of accessing genetic materials.

The Plant Treaty's positive impact on the availability of PGRFA more generally (i.e., without the involvement of the Centers' genebanks) is further evidenced by the fact that, since 2014, sixteen additional countries have become members of the Plant Treaty. Among them is the USA, with the result that approximately 500,000 additional PGRFA accessions are available through the multilateral system of access and benefit sharing. In addition, during the same period, one additional international organization, the International Center for Biosaline Agriculture (ICBA), agreed to make PGRFA available under the terms and conditions of the Plant Treaty.

On the negative side, however, there is also evidence that a number of Plant Treaty contracting parties continue to be reluctant to implement the multilateral system. Overall, national-level implementation of the multilateral system is still relatively low. This is perhaps most clearly manifested in the fact that only 44 of the 146 countries that are contracting parties have confirmed which PGRFA within their borders are actually included in the multilateral system (see "Material Available in the Multilateral System" on the Plant Treaty website at http://www.fao.org/plant-treaty/areas-of-work/the-multilateral-system/collections/en/). This may not be a "black letter law" obligation under the Treaty framework, but it is a commonly acknowledged prerequisite for the multilateral system to function in practice [59]. Furthermore, some of the countries that have provided notice have confirmed only a fraction of the collections maintained in their national genebanks or other national public organizations. There are even fewer notices to the Governing Body concerning PGRFA voluntarily included by natural and legal persons. Few countries have reported to the Governing Body that they have adopted new policies or administrative measures to implement the multilateral system. On the one hand, the Plant Treaty does not require a country to develop new policies or laws to implement the multilateral system; on the other hand, many countries report that, in the absence of new policy instruments approved by the national government that clarify who has the right to provide materials under the multilateral system, they are unable to do so [59]. In such cases, the absence of new policy measures does indeed reflect a lack of national-level implementation.

The relatively slow rate of national-level implementation by a number of countries can be partially accounted for by developing country contracting parties' dissatisfaction with the fact that, to date, there has been only one payment to the Plant Treaty's Benefit-sharing Fund by a commercial user of materials from the multilateral system. This dissatisfaction contributed to the Governing Body launching, at its Fifth Session in 2013, a process to enhance the functioning of the multilateral system. That process continued until 2019, when it was suspended with no new agreement reached and with high levels of unresolved political tension between contracting parties.

International tension and disagreement concerning access and benefit-sharing issues in other fora have also spilled over into meetings under the Plant Treaty framework. The Nagoya Protocol, adopted in 2010 and in force since 2014, was designed to address (mostly developing) countries' concerns that the Convention on Biological Diversity (CBD) did not sufficiently promote its benefit-sharing objective. However, by 2016, the Conference of the Parties to the CBD in Mexico was overtaken by tensions over benefit sharing from the commercial use of digital sequence information (DSI), which is not addressed by the Nagoya Protocol (or at least not in a way that the international community agrees upon). Since then, the issue has dominated the agendas of the CBD, the Plant Treaty, the FAO CGRFA, and other fora. Under these circumstances, some parties who are not content with levels of monetary benefit sharing from the use of PGRFA and/or genomic sequence information may be reluctant to make more materials available through the multilateral system by depositing them in the international collections hosted by the CGIAR Centers until the DSI and ABS issues are resolved.

The previous study [1] of factors affecting the Centers' germplasm acquisitions from 1980 to 2009 adopted an implicitly millennial framework, looking forward to a new era when international tensions over access and benefit sharing were resolved, and national laws implementing international agreements provided requisite legal certainty for PGRFA providers and recipients alike. In 2010, it seemed reasonable to expect that this state of affairs could be achieved within a few years. Since that time, tensions over access and benefit sharing have increased, and the possibility of arriving at a set of final international agreement(s) on the issue appears to have receded still further into the distance.

#### **5. Conclusions**

Over the course of the decade 2010–2019, the CGIAR Center genebanks received a surge of PGRFA from providers around the world, with permission to make those materials available through the Plant Treaty's multilateral system of access and benefit sharing. Most of those newly deposited materials were associated with an internationally coordinated project that provided financial and technical support for the providers to regenerate and safety-duplicate unique, at-risk PGRFA in ex situ collections around the world.

The Regeneration and CWR projects, the BSF project cycles, and the CGIAR genebanks' experiences over the last ten years highlight the critical importance of internationally coordinated projects in motivating otherwise dispersed and disconnected actors, and instilling in them a sufficiently shared sense of purpose, to make materials available through the multilateral system. Those projects and experiences also underscore the importance of providing modest levels of financial support to cover both the providers' and the recipient genebanks' costs of collecting, multiplying, health-testing, sending, and receiving PGRFA, and of providing technical support and training to providers and partners.

The number of collecting expeditions organized by CGIAR Centers in this period was higher than in the previous decade, but lower than that of some national genebanks, which play an important role in the global system of conservation, sustainable use, and exchange of plant genetic resources. Increasingly sophisticated and globally coordinated gap analyses are making it possible to identify where gaps in collections exist. The gap analyses presented in this paper highlight that the ex situ collections managed by the CGIAR represent the cultivated gene pools of some crops well, while there are significant gaps for other crops. Given the need to keep costs to a minimum, and the possibility of sharing responsibilities with other actors in the global system, CGIAR Centers must redouble their efforts to address the results of gap analyses in concert with other organizations that host internationally available collections.

While financial support, scientific leadership, and international coordination are indispensable, so too are supportive policies. The CGIAR genebanks' experiences highlight the importance of supportive policies and, conversely, the negative impacts of restrictive or unclear policies and laws on their ability to acquire new materials to include in the international collections and to distribute those materials to recipients around the world. In retrospect, it is perhaps surprising that over the same period of time the international community launched and attempted to renegotiate the multilateral system of access and benefit sharing (with all the uncertainty that attends an international negotiation), while the Centers' genebanks enjoyed a surge of new materials being made available to include in the international collections. During the same period, they continued to distribute an extraordinary diversity of PGRFA to recipients around the world (even more than in the previous decade), a fact that reflects the persistent need/demand for access to those materials for agricultural research and development, the deep-rooted nature of the CGIAR collections within the global system, and the positive influence of the Centers' Article 15 agreements with the Governing Body of the Plant Treaty.

However, with the suspension of the process to enhance the multilateral system, and widespread tensions concerning the governance of digital genomic sequence information, international disagreement over access and benefit sharing is becoming still more geopolitically polarized. There is a significant risk that this increased polarization could further undermine the willingness of a range of actors to make materials available through the multilateral system in general, and to the CGIAR genebanks in particular. The Centers highlight the importance of resolving those tensions to "head off" unintended potential negative impacts on the CGIAR's mission, the global system, and the SDGs. Rapid loss of biodiversity, climate change, the COVID-19 pandemic, rising populations, depleted soils, and a range of other challenges make the conservation, availability, and use of PGRFA more important than ever. It is essential that the Plant Treaty (together with the Nagoya Protocol and the IPPC) be implemented in ways that support all actors in the global system in fulfilling their roles.

**Author Contributions:** Conceptualization, M.H., N.J., I.L.N.; methodology, M.H., N.J., I.L.N.; validation, M.H., N.J., I.L.N., C.L., N.L.A., P.W., T.P., M.-N.N., L.G., P.L.K., M.Y., A.M., V.A., M.T., C.S.J., R.V., N.R., E.R.; formal analysis, M.H., N.J., I.L.N., C.L., N.L.A., P.W., T.P., M.-N.N., L.G., P.L.K., M.Y., A.M., V.A., M.T., C.S.J., R.V., N.R.; investigation, M.H., N.J., I.L.N.; data curation, N.J., N.L.A., P.W., T.P., M.-N.N., M.Y., A.M., V.A., M.T., C.S.J., R.V., N.R., E.R.; writing—original draft preparation, M.H., N.L., I.L.N.; writing—review and editing, M.H., N.J., I.L.N., C.L., T.P., N.L.A., P.W., L.G., P.L.K., A.M., C.S.J., M.-N.N., R.V.; visualization, M.H., N.J., I.L.N.; supervision, M.H., N.J.; project administration, M.H., N.J.; funding acquisition, M.H., I.L.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by CGIAR Genebank Platform.

**Acknowledgments:** The authors thank Ana Bedmar and Gaia Gullotta for their valuable contributions to literature surveys and data compilation and analysis.

**Conflicts of Interest:** N.L.A., P.W., T.P., M.-N.N., M.Y., A.M., V.A., M.T., C.S.J., R.V., N.R., P.L.K., E.R. work in the management of CGIAR Centers' genebanks (and/or germplasm health units) that are the subject of this paper. N.J., C.L., L.G. work at the Crop Trust. M.H., I.L.N. are co-coordinators of the Genebank Platform Policy Module.

#### **Appendix A**

**Table A1.** Survey on Acquisitions and Distributions.


Q2. Name of respondent (You may have more than one respondent filling in the same survey, but please fill in just one survey per Center)

Q3. Over the course of the last 10 years (2010–2019), the overall rate of acquisition of new PGRFA by your genebank from sources outside the CGIAR has...?

Answer Choices

Declined

Increased

Remained steady

Q4. Which of the following factors have affected your genebank's overall rate of acquisition of PGRFA from sources outside the CGIAR?

Answer Choices

most relevant diversity is already collected

no space left in the genebank for additional accessions

no money for additional collecting missions

no capacity/money to evaluate or characterize what is already in the genebank, so there is no point adding more

restrictive or unclear policies or laws make it difficult to get permission to access new PGRFA

unwillingness of countries to allow collecting missions or to share their ex situ materials

provider's country cannot issue a certificate that complies with the phytosanitary rules of the country hosting my genebank

other (please specify)

Q5. Please add additional information to further clarify your answer from above. If you chose more than one factor, please rank them in terms of their importance.

Q6. In the last 10 years (2010–2019), the overall rate of distribution of PGRFA from your genebank has...?

Answer Choices

Declined

Increased

Remained steady

Q7. Which of the following factors have affected the rate of distributions from your genebank?

Answer Choices

number of requests for materials

a general tendency towards more targeted requests (i.e., requestors are more specific about the range of materials they are seeking, so you end up sending less material per request)

your responses to requests are more targeted (i.e., you spend more time than in the past, with more information about your collection, determining which particular materials are best suited to the needs of the requestor, so you end up sending less material per request)

you do not have sufficient resources to regenerate enough material to send materials in response to all requests

requested materials do not meet requisite phytosanitary standards

restrictive policies or laws or conditions make it difficult for the genebank to distribute PGRFA

status of information available about materials in the genebank (e.g., characterization, evaluation, subsets, etc.)

other (please specify)

Q8. Please add additional information to further clarify your answer from above. If you chose more than one factor, please rank them in terms of their impact (either positive or negative) on your distributions.

Q9. Are you sometimes asked for PGRFA that you do not have?

Answer Choices

Yes

No

If 'yes', please provide a brief description of the materials concerned and why they are being asked for.

Q10. Please describe the circumstances that you believe contributed to particularly steep spikes or dips in your genebank's rate of acquisition of PGRFA in any year or years between 2010–2019 (e.g., internationally coordinated projects, organizations looking to transfer collections, joint projects with national research organizations, etc.)

Q11. Please describe the circumstances that you believe contributed to particularly steep spikes or dips in your genebank's rate of distribution of PGRFA in any year or years between 2010–2019 (e.g., sources of demand linked to particular projects in particular countries, etc.)

Q12. How many requests for PGRFA to include in your collection(s) has your genebank made to organizations outside the CGIAR in the last 5 years? (for materials in in situ and ex situ conditions)

Answer Choices

0

1–3

4–6

7–9

other (please indicate number)


Q13. Regarding the requests you made described in question above

Answer Choices

(a) how many were explicitly rejected?

(b) how many were ignored (i.e., simply no answer)?

(c) how many were accepted, but materials are not yet acquired by your Center?

(d) how many were accepted and the materials have actually been acquired by your Center?

if (a) to (d) do not describe what happened, please describe the outcome

Q14. Was there a difference in the kinds of responses you received depending on whether you were seeking to acquire material from in situ conditions (therefore involving new collecting missions) or materials that were already in ex situ collections? Please explain.

Q15. Was there a difference in the kinds of responses you received depending on the types of organizations to whom you addressed your request? Please explain.

Q16. Would you have preferred to make more requests to acquire more PGRFA over the last 5 years?

Answer Choices

Yes

No

If 'yes', why didn't you make more requests?

Q17. The Plant Treaty has:

Answer Choices

made it much harder for my genebank to acquire PGRFA

made it a little harder for my genebank to acquire PGRFA

not had any appreciable impact on my genebank's ability to acquire PGRFA

made it a little easier for my genebank to acquire PGRFA

made it much easier for my genebank to acquire PGRFA

Briefly explain your response.

Q18. The Nagoya Protocol (to the Convention on Biological Diversity) has:

Answer Choices

made it much harder for my genebank to acquire PGRFA

made it a little harder for my genebank to acquire PGRFA

not had any appreciable impact on my genebank's ability to acquire PGRFA

made it a little easier for my genebank to acquire PGRFA

made it much easier for my genebank to acquire PGRFA

Briefly explain your response.

Q19. The IPPC and national phytosanitary rules have:

Answer Choices

made it much harder for my genebank to acquire PGRFA

made it a little harder for my genebank to acquire PGRFA

not had any appreciable impact on my genebank's ability to acquire PGRFA

made it a little easier for my genebank to acquire PGRFA

made it much easier for my genebank to acquire PGRFA

Briefly explain your response.

Q20. Are you concerned that unresolved international negotiations concerning digital genomic sequence information (DSI), and/or the suspension of negotiations to enhance the Plant Treaty's multilateral system of access and benefit-sharing, could, in the future, have a negative impact on the following:

CGIAR Centers' genebanks' ability to access PGRFA to include in international collections?

CGIAR Centers' ability to access, generate, use and share digital genomic sequence information (DSI)?

Centers' management of their Article 15 collections (including distributing PGRFA and related information)?

Willingness of some organizations to enter into partnership with CGIAR Centers?

If you answered 'yes' to any of the above, please explain

Q21. If you already have evidence of negative impacts as a result of the unresolved issues regarding DSI and the multilateral system enhancement, please provide details here.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Phytosanitary Interventions for Safe Global Germplasm Exchange and the Prevention of Transboundary Pest Spread: The Role of CGIAR Germplasm Health Units**

**P. Lava Kumar 1,\*, Maritza Cuervo 2, J. F. Kreuze 3, Giovanna Muller 3, Gururaj Kulkarni 4, Safaa G. Kumari 5, Sebastien Massart 6, Monica Mezzalama 7,†, Amos Alakonya 7, Alice Muchugi 8, Ignazio Graziosi 8,‡, Marie-Noelle Ndjiondjop 9, Rajan Sharma <sup>10</sup> and Alemayehu Teressa Negawo <sup>11</sup>**


**Abstract:** The inherent ability of seeds (orthodox, intermediate, and recalcitrant seeds and vegetative propagules) to serve as carriers of pests and pathogens (hereafter referred to as pests) and the risk of transboundary spread along with the seed movement present a high-risk factor for international germplasm distribution activities. Quarantine and phytosanitary procedures have been established by many countries around the world to minimize seed-borne pest spread by screening export and import consignments of germplasm. The effectiveness of these time-consuming and cost-intensive procedures depends on the knowledge of pest distribution, availability of diagnostic tools for seed health testing, qualified operators, procedures for inspection, and seed phytosanitation. This review describes a unique multidisciplinary approach used by the CGIAR Germplasm Health Units (GHUs) in ensuring phytosanitary protection for the safe conservation and global movement of germplasm from the 11 CGIAR genebanks and breeding programs that acquire and distribute germplasm to and from all parts of the world for agricultural research and food security. We also present the challenges, lessons learned, and recommendations stemming from the experience of GHUs, which collaborate with the national quarantine systems to export and distribute about 100,000 germplasm samples annually to partners located in about 90 to 100 countries. Furthermore, we describe how GHUs adjust their procedures to stay in alignment with evolving phytosanitary regulations and pest risk scenarios. In conclusion, we state the benefits of globally coordinated phytosanitary networks for the prevention of the intercontinental spread of pests that are transmissible through plant propagation materials.

**Citation:** Kumar, P.L.; Cuervo, M.; Kreuze, J.F.; Muller, G.; Kulkarni, G.; Kumari, S.G.; Massart, S.; Mezzalama, M.; Alakonya, A.; Muchugi, A.; et al. Phytosanitary Interventions for Safe Global Germplasm Exchange and the Prevention of Transboundary Pest Spread: The Role of CGIAR Germplasm Health Units. *Plants* **2021**, *10*, 328. https://doi.org/10.3390/plants10020328


Academic Editors: Andreas W. Ebert and Johannes M. M. Engels

Received: 29 November 2020; Accepted: 28 January 2021; Published: 9 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Keywords:** CGIAR; crop genetic resources; diagnostics; germplasm; crop breeding; pathogen; pest; Plant Treaty; phytosanitary regulations; transboundary pests; invasive species; prevention; quarantine; seed; seed health; virus indexing

#### **1. Introduction**

#### *1.1. International Germplasm Transfers for Food Security and Biodiversity Conservation*

The international exchange of genetic resources, such as botanic seeds and vegetative propagules, has played a crucial role in agricultural and food diversification, to the extent that about 68% of national food supplies are derived from crops of foreign origin [1]. At the forefront of these international exchanges are the CGIAR genebanks, breeding programs, and seed system programs, which have made vital contributions for over five decades by assembling germplasm from all over the world for conservation, adding value to those materials by characterizing and breeding them, and making them available to users around the world [2,3]. (*Note: 'germplasm' is used to denote plant propagation material, both true seed (orthodox, intermediate, and recalcitrant seeds) and vegetative propagules, from genebanks and breeding programs.*) Established in 1971, the CGIAR is part of the global agricultural research system, which makes critically important contributions to the United Nations Sustainable Development Goals (SDGs) by alleviating poverty and hunger, improving food and nutrition security, and conserving biodiversity [4].

The 11 CGIAR genebanks conserve 760,467 accessions of cereals, grain legumes, forages, tree species, root and tuber crops, and bananas. These represent more than 174 genera and over 1000 species obtained from 207 countries, and are conserved in 35 collections around the world as seeds, in vitro material, and living plants in fields or screenhouses (Supplementary Table S1) [5]. Between 2007 and 2016, the CGIAR centers distributed 3.91 million samples, about 30% from genebanks and 70% from crop breeding programs, to 163 countries [3,5]. These distributions from the CGIAR programs account for almost 89% of the total annual international germplasm exchanges (*note: 'exchange' and 'transfer' are used interchangeably to denote germplasm exports or imports between countries*) under the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA or the Plant Treaty) [2,3,6–8]. Between 2010 and 2019, the CGIAR genebanks acquired 116,921 distinct accessions, about 35% of which were acquired through the centers' own breeding programs and 65% from collection missions or through national programs in 142 countries [7]. During the same period, the CGIAR genebanks distributed, on average, 115,000 samples of germplasm per year, and more than 80% of the recipients were in developing countries [5,7]. A detailed analysis of the CGIAR genebanks' acquisition and distribution of germplasm in the last decade is presented by Halewood et al. [7]. The demand for the global movement of plant genetic resources (PGR) from the international genebanks and breeding programs is increasing due to worldwide efforts to develop nutrient-rich, high-yielding varieties that are resilient to biotic and abiotic stresses and better adapted to a changing climate, through various programs such as the CGIAR's 'Crops to End Hunger' initiative [9].

Import and export of germplasm and other biological resources are influenced by several international and national policies, treaties, and legal frameworks [3]. The ITPGRFA and its multilateral system of access and benefit-sharing, together with the Convention on Biological Diversity (CBD) agreements, guide the CGIAR centers' policies on germplasm acquisition, conservation, regeneration, use, and distribution [3]. The availability of pest- and disease-free germplasm is an important requirement for international distribution from genebanks and breeding programs.

#### *1.2. Pathogen and Pest Threats to International Germplasm Transfers*

It is well known that plants and seeds can harbor pathogens and pests (hereafter referred to as pests; *note: 'pest' is used to denote any species, strain or biotype of plant, animal or pathogenic agent injurious to plants or plant products*), including bacteria, fungi, phytoplasmas, viroids, viruses, insects, nematodes, and other harmful biotic agents, and that the transfer of germplasm carries a simultaneous risk of moving pests between geographies and introducing them into territories where they are not known to exist [10,11]. International seed transfers have been recognized as important pathways for the transboundary spread of pests through human activities associated with collection and distribution [10]. The threat may become severe if more virulent strains or races of the pathogens are introduced [12]. Even pests with a low seed transmission rate, especially viruses, may lead to the disease reaching epiphytotic proportions in a field if other conditions (e.g., the occurrence of insect vectors and susceptible hosts) and the climate are favorable [13].
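To illustrate why even a low seed transmission rate matters, the sketch below computes the chance that a seed lot contains at least one infected seed. It is a simplified model that assumes each seed is infected independently with the same probability. The function names and the example numbers (a 0.1% transmission rate and 10,000 seeds) are hypothetical and are not taken from the cited studies.

```python
import math


def prob_at_least_one_infected(transmission_rate: float, n_seeds: int) -> float:
    """Probability that at least one of `n_seeds` carries the pest.

    Simplifying assumption (hypothetical): every seed is infected
    independently with the same probability `transmission_rate`.
    """
    if not 0.0 <= transmission_rate <= 1.0:
        raise ValueError("transmission_rate must be between 0 and 1")
    if n_seeds < 0:
        raise ValueError("n_seeds must be non-negative")
    if transmission_rate == 1.0:
        # Every seed is infected; log1p(-1) would be undefined.
        return 1.0 if n_seeds > 0 else 0.0
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(n_seeds * math.log1p(-transmission_rate))


def expected_infected(transmission_rate: float, n_seeds: int) -> float:
    """Expected number of infected seeds (mean of a binomial distribution)."""
    return transmission_rate * n_seeds


if __name__ == "__main__":
    # Hypothetical illustration: 0.1% seed transmission, 10,000 seeds sown.
    rate, n = 0.001, 10_000
    print(f"expected infected seeds: {expected_infected(rate, n):.1f}")
    print(f"P(at least one infected seed): {prob_at_least_one_infected(rate, n):.5f}")
```

Under these assumed numbers the sketch reports about 10 expected infected seeds and a probability of at least one infected seed above 0.9999: a low per-seed rate does not make a large seed lot safe. Whether that initial inoculum develops into an epiphytotic still depends on vectors, susceptible hosts, and climate, which this model deliberately ignores. The `log1p`/`expm1` form is used instead of `1 - (1 - p) ** n` so that very small rates do not lose precision.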

The introduction of economically important alien pests (non-indigenous pests introduced into a new territory) from their centers of origin into new environments has been reported in many different parts of the world [14]. Considering that every plant serves as a host for several insects and microbes, both beneficial and harmful, every introduction of plant material is expected to result in the introduction of exotic organisms. For instance, over the past 500 years, European farmers introduced wheat and its pathogens, *Mycosphaerella graminicola* and *Phaeosphaeria nodorum*, into the Americas, Australia, and South Africa [13,14]. Examples of introduced pests causing epidemics and pandemics with disastrous consequences for food production, livelihoods, and environmental biodiversity include *Phytophthora infestans*, which was introduced from Central America into Ireland and caused the Irish potato famine of the 1840s [11,15]. Recent examples of devastating outbreaks caused by transboundary pest introductions into regions where the CGIAR operates include the maize lethal necrosis (MLN) epidemic in East Africa caused by maize chlorotic mottle virus (MCMV), which was introduced from East Asia [16]; the fall armyworm (*Spodoptera frugiperda*) outbreak in Africa and Asia, caused by the likely introduction of the insect pest from the Americas [17]; the cassava mosaic disease outbreak in East Asia caused by Sri Lankan cassava mosaic virus (SLCMV), which was introduced from South Asia [18]; the banana bunchy top virus (BBTV) outbreak in the West African sub-region due to the spread of the virus through planting material from the Central African sub-region [19]; the expansion of banana wilt caused by *Fusarium oxysporum* Tropical Race 4, which was introduced from Southeast Asia into South Asia (India), West Asia (Jordan and Israel), Mozambique, and Colombia [20]; the outbreak of potato cyst nematode, *Globodera pallida*, in Kenya through the likely introduction of the pest from Europe [21]; the wheat blast outbreak in Bangladesh [22] and, more recently, in Zambia [23], caused by *Magnaporthe oryzae* pathotype *Triticum*, introduced from South America; and the spread of *Candidatus* Liberibacter solanacearum haplotype A, which is responsible for potato zebra chip disease and potato purple top disease, together with its vector *Bactericera cockerelli* (potato psyllid), likely from the Central American region to Ecuador in South America [24]. These new pests were recognized in their new territories during obvious disease outbreaks. A few examples of transboundary pests that have caused economically significant disease outbreaks through their spread to sub-Saharan Africa are indicated in Table 1 and Figure 1.

**Table 1.** Some of the most economically important disease outbreaks, caused by pests introduced into sub-Saharan Africa (SSA).




Source: CABI Invasive Species Database: https://www.cabi.org/isc/ (accessed on 29 November 2020).

**Figure 1.** Examples of a few transboundary pests that have caused significant economic damage to food production and posed a major risk to germplasm transfers in sub-Saharan Africa. (**A**) Papaya mealybug (*Paracoccus marginatus*), (**B**) Taro blight (*Phytophthora colocasiae*), (**C**) Panama disease (*Fusarium oxysporum* Tropical Race 4), (**D**) Cassava brown streak (cassava brown streak ipomoviruses), (**E**) Maize lethal necrosis (Maize chlorotic mottle virus), (**F**) Fall armyworm (*Spodoptera frugiperda*), (**G**) Potato cyst nematode (*Globodera pallida*), and (**H**) Banana bunchy top (Banana bunchy top virus).

Many alien pests that could have caused significant damage were intercepted by quarantine authorities at the port of entry, preventing their introduction and establishment. For instance, in the USA, 7% of the 725,000 pest interception reports at ports of entry between 1984 and 2000 were attributed to plant propagative material [25]. In India, 45 viruses of quarantine importance were intercepted in imported plant germplasm between 1985 and 2016 [26]. The number of first reports of crop pests in new hosts and/or new regions has increased in recent years, driven by agricultural intensification, international trade, and climate change [13,27]. An analysis of 1300 known invasive pests and pathogens estimated their potential cost to global agriculture at over US\$ 540 billion per year if they continue to spread [28,29]. National capacities to prevent and manage alien pests are sub-optimal in much of Africa, as well as in parts of Latin America, the Middle East, Central Asia, and Indochina, predisposing global biodiversity hotspots in these regions to the risk of exotic pest invasion [30]. The increasing risk of pest introduction and the limited capacity to act against invasions in most countries warrant robust strategies to prevent transboundary pest spread, especially through propagation material, as such strategies decrease the chance of pest introduction, establishment, and further spread.
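For a sense of scale, the USA interception figures can be converted into approximate counts. This is a back-of-the-envelope sketch that assumes the 7% share applies to all 725,000 reports and that the 1984–2000 period spans 17 calendar years; both are assumptions, and the cited source may count differently:

$$
0.07 \times 725{,}000 \approx 50{,}750 \ \text{interceptions on propagative material}, \qquad \frac{50{,}750}{17} \approx 2{,}985 \ \text{per year}.
$$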

#### *1.3. Transboundary Pest Risk to Germplasm Distribution and Premises for the Establishment of CGIAR Germplasm Health Programs*

The nature of the CGIAR germplasm acquisition and distribution operations presents a high-risk scenario for transboundary pest spread; consequently, the safety of germplasm movement has been a major concern. The main reasons for this are the diverse origins of accessions, which are acquired from different geographies and converge at a research station for regeneration and characterization, and their distribution to diverse locations [7,15]. For instance, in the past 10 years, CGIAR genebanks have distributed 854,000 samples to 150 countries, at an average of 105,000 samples per year, catering for about 2000 requests from 100 countries [5]. The risk of exotic pest introduction through an accession into a new territory, and of its spread under favorable environmental conditions, is high. Similarly, endemic pests at regeneration sites can migrate to new territories along with the germplasm. Poor phytosanitary management also has detrimental effects on the survival of accessions during regeneration and evaluation and lowers the viability of the germplasm in the conservation facility, which could lead to a loss of diversity in the collections and to genetic erosion [31].

Without proper phytosanitary measures by international agencies such as the CGIAR, germplasm distribution increases the possibility of pest dissemination into areas previously considered disease-free [32,33]. Unintended pest spread through germplasm is a significant concern for the CGIAR genebanks and breeding programs, the majority of which distribute germplasm to developing countries and biodiversity hotspots that lack sufficient quarantine capacity to prevent pest entry or respond to pest outbreaks [30]. Recognizing these pest risks, the CGIAR centers have set up Germplasm Health Units (GHUs) with the objectives of (i) averting the spread of quarantine pests with CGIAR germplasm transfers, (ii) preventing pest outbreaks, (iii) safeguarding biodiversity, and (iv) strengthening the development of phytosanitary capacities. The GHUs ensure compliance with the Food and Agriculture Organization (FAO)-International Plant Protection Convention (IPPC) procedures, which have the force of a legal treaty and are enforced by the National Plant Protection Organization (NPPO or quarantine agency) to regulate pest spread through transfers of germplasm [34]. The international movement of germplasm is subject to the same rules, with the directive that germplasm should be free of regulated pests for safe transfer across international boundaries [35]. Therefore, it is crucial to test the health of germplasm accessions before distribution and before they are used as planting material.

As each center's liaison, the GHUs engage with the NPPOs of the host and recipient countries to organize import permits, inspect regeneration fields, conduct germplasm health testing and phytosanitation, and prepare germplasm for export or import in accordance with the International Standards for Phytosanitary Measures (ISPMs) of the IPPC and other recommended actions, including the FAO-International Board for Plant Genetic Resources (IBPGR) technical guidelines for the safe movement of germplasm [36].

In this paper, we describe the mission and functions of the CGIAR GHUs and how transdisciplinary approaches are adopted for the phytosanitary protection of germplasm at all stages of the value chain, from acquisition for conservation in genebanks to regeneration for accession increase, breeding, safety duplication and regional and international distribution for the safe global movement of germplasm from the 11 CGIAR genebanks and breeding programs. We present the emerging phytosanitary challenges and the increased risk of transboundary pests and pathogens to the international exchange of germplasm, followed by lessons learned and recommendations stemming from experience of the international network of GHUs, which operate across all continents in collaboration with the NPPOs and plant health organizations. We conclude by stating the benefits of globally coordinated phytosanitary networks to prevent the transboundary spread of diseases through plant propagation materials.

#### **2. Historical Evolution of GHU and Its Core Functions**

#### *2.1. Development of Institutional Capacity for the Prevention of Transboundary Pest Spread through Germplasm*

The first set of plant quarantine procedures, as a legal measure, was established in 1873 in Germany to regulate potato tuber imports from the USA in order to prevent the spread of the Colorado potato beetle, *Leptinotarsa decemlineata*. The first international agreement on measures to control pests through regulation of the movement of plants was established in 1878 as the "International Convention on Measures to be taken against *Phylloxera vastatrix* (present name *Viteus vitifoliae*)", an insect pest that was introduced with vine cuttings imported from the USA to France in 1865 [14]. This agreement between seven European countries, which came into force on 3 November 1881, specified procedures for the certification of plant material for export and import in order to control grapevine phylloxera [14]. Many countries subsequently followed suit by imposing quarantine regulations to contain the spread of pests through plants and plant products, which led to the establishment of the International Convention for the Protection of Plants in 1929 by the International Institute of Agriculture in Rome. The International Plant Protection Convention (IPPC), adopted by the sixth Conference of the FAO in 1951, came into force on 3 April 1952 and eventually replaced previous agreements as the multilateral treaty on plant protection [34]. The IPPC and the 1994 World Trade Organization (WTO) Agreement on the Application of Sanitary and Phytosanitary Measures (the SPS Agreement) shaped international plant quarantine policy and the standards for regulatory measures implemented by member countries for the protection of plants, animals, and human life [34]. However, the implementation of quarantine procedures differed around the world, with many developing and underdeveloped countries unable to implement adequate measures due to poor resources and a lack of adequate physical and technical capacity [28,37].

When the early CGIAR centers (IRRI, CIMMYT, CIAT, and IITA) were established in the 1960s, and when the CGIAR was formed in 1971 and new centers such as CIP, ICARDA, and ICRISAT were established, many NPPOs were still under development. In the early years of the CGIAR, international germplasm exchange programs faced multiple challenges owing to a lack of baseline knowledge of the pests affecting the centers' mandate crops and weak quarantine infrastructure in many of the countries in which they operated [38]. For instance, in the 1970s and 1980s, IITA depended on an intermediary quarantine station in the Netherlands to import cassava from outside Africa [15]. Similarly, the INIBAP (International Network for the Improvement of Banana and Plantain) Transit Centre (ITC) was established by INIBAP in 1985 at the Universiteit Leuven in Belgium as a transit center for the *Musa* collection. When INIBAP and IPGRI (International Plant Genetic Resources Institute) merged to form Bioversity International at the end of 2006, "INIBAP" in the center's name was replaced by "International" [39].

To address the pest risks associated with CGIAR germplasm exchange activities, the IBPGR established a task force in 1975, which led to the publication of "Plant Health and Quarantine in the International Transfer of Genetic Resources" [40], outlining the control actions required to address the seed health challenges encountered by International Agricultural Research Centers (IARCs). This was followed by a series of consultations over the following decade and an informal recommendation by the Regional Plant Protection Organizations (RPPOs) in 1988, which fostered a joint FAO-IBPGR program to facilitate the safe exchange of germplasm, including the drafting of technical guidelines for the safe movement of germplasm of major crops [36]. A CGIAR-commissioned study in 1989 on "Plant Quarantine and the International Transfer of Germplasm" identified the lack of accurate, up-to-date information on pests, and national quarantine officials' poor access to updated information, as among the main problems, and recommended that CGIAR centers adopt standardized phytosanitary procedures for germplasm transfers [37]. This led to the foundation of an "inter-center collaboration on germplasm health and exchange" and to the first meeting of the pathologists and virologists in charge of germplasm health at CIMMYT, CIP, IBPGR, ICARDA, ICRISAT, IITA, and IRRI, which the IBPGR convened in Rome in October 1990 as a formal meeting of the germplasm health program [41]. Following the recommendations of the Sixth International Plant Protection Congress, held in Montreal in August 1993, CIMMYT, CIP, ICARDA, ICRISAT, IITA, and IRRI established GHUs as independent units within the centers to undertake research and facilitate the safe exchange of PGR from genebanks and breeding programs. Similar programs were subsequently established in the remaining five centers. These programs go by different names, namely the Germplasm Health Unit (AfricaRice, Bioversity, CIAT, ICRAF, IITA, and ILRI), the Seed Health Unit (CIMMYT, ICARDA, and IRRI), the Health Quarantine Unit (CIP), and the Plant Quarantine Unit (ICRISAT), but they serve the same mission of preventing phytosanitary risks associated with CGIAR germplasm activities and ensuring the safe transfer of germplasm.

#### *2.2. GHUs as CGIAR Gateway for Safe Germplasm Exchange*

The GHU mission is to maintain the multidisciplinary capacities required for health testing, ensure the implementation of phytosanitary procedures to eliminate pests, facilitate the production of pest-free germplasm, and make "go/no-go" decisions on germplasm transfers from the centers based on their phytosanitary status. The six strategic objectives of GHUs are: (i) to ensure that the transboundary movement of germplasm and non-seed biological materials complies with the regulatory guidelines of the importing and exporting countries and that the materials are free of quarantine pests; (ii) to develop and adopt phytosanitary procedures to generate pest-free germplasm; (iii) to develop diagnostic tools for seed health monitoring and pest surveillance; (iv) to conduct pest risk assessments of germplasm activities, including conservation, seed increase, and transfers; (v) to contribute to the development of phytosanitary capacity; and (vi) to organize a GHU Community of Practice that links the centers in a network for transboundary pest prevention.

From the moment of their establishment, these centers began to develop expertise in the safe exchange of germplasm of their mandate crops by mobilizing interdisciplinary capacities for seed and plant health testing, phytosanitation, and therapy procedures in order to generate pest-free planting material or to salvage germplasm by eliminating contaminating pests [36,41]. This was extremely challenging because knowledge of the pests affecting the crop species of interest in the centers' operational territories was inadequate. The centers therefore often had to simultaneously conduct research on the identification and characterization of pests, develop diagnostics for pest detection, and establish procedures for germplasm phytosanitation [15]. In this process, the CGIAR centers worked closely with the NPPOs of their host countries to standardize safe germplasm exchange procedures; this work eventually became formal collaboration between each center and its host country NPPO, leading to special accords for CGIAR germplasm transfers. For instance, in 1978, the Indian Council of Agricultural Research (ICAR) granted permission to set up an "Export Certification Quarantine Laboratory" (Plant Quarantine Unit) at the ICRISAT headquarters in Patancheru, Hyderabad, India [38]. Procedures must continually adapt to the changing phytosanitary risks in the world. For instance, the MLN outbreak in East Africa led to the redrafting of safe maize exchange procedures [42], and the characterization in the 2000s of the causal viruses of cassava brown streak disease, which was first described in the 1920s, led to the drafting of a new protocol for cassava virus indexing and the production of virus-free cassava planting material [43], followed by the reindexing of the cassava collection held in the IITA Genetic Resources Center at Ibadan, Nigeria. GHU programs play a particularly important role in minimizing the pest risks associated with germplasm transfers, because many national programs, especially in sub-Saharan Africa, lack sufficient facilities to carry out the required testing and/or phytosanitation of accessions before releasing them for propagation [30]. GHUs perform all their activities in close collaboration with NPPOs, RPPOs, and several national and international plant health programs. Center policy requires that all incoming and outgoing germplasm be channeled and cleared through GHUs to ensure safe import or export, and over the years GHUs have become the centers' gateway for the safe international exchange of germplasm.

In 2017, GHUs were aligned with the Germplasm Health (GH) component of the CGIAR Genebank Platform [5]. This offered a unique opportunity for strengthening collaboration among GHUs, catalyzing a harmonized CGIAR GHU strategy, adopting a common Quality Management System (QMS) to ensure uniform standards across the centers, and implementing cross-center R4D initiatives to address recalcitrant and emerging phytosanitary challenges. These recent developments, pursued in close collaboration with NPPO partners, have turned GHUs into a global network for transboundary pest prevention that effectively addresses the emerging needs of CGIAR programs.

#### **3. Procedures for Germplasm Health Testing and Safe International Transfers**

#### *3.1. Multistage Phytosanitary Controls for Pest Prevention*

The multidisciplinary, multistage process that GHUs use to ensure the phytosanitary safety of bioresources has five stages [15]: (i) germplasm health testing for pests using a range of diagnostic methods, including conventional bioassays, culturing methods, serological detection using enzyme-linked immunosorbent assay (ELISA), and nucleic acid-based detection (nucleic acid hybridization techniques; DNA and RNA amplification in various formats, including polymerase chain reaction (PCR) and isothermal amplification methods such as loop-mediated isothermal amplification (LAMP) and recombinase polymerase amplification (RPA); or sequence-independent high-throughput sequencing (HTS) combined with bioinformatic virus detection [44]); (ii) physical inspection to eliminate infected and physically damaged true seeds and vegetative propagules [12]; (iii) pest risk mitigation during germplasm regeneration using optimal procedures, including inspection of plants during the active growth stage, the use of pesticides and weed management at field, nursery, and screenhouse production sites, and the use of virus-free planting material for clonally propagated germplasm [36]; (iv) phytosanitation (treatment) of germplasm as a curative procedure to eliminate pests and salvage germplasm [36]; and (v) documentation for traceability and regulatory compliance, which includes an import permit issued by the NPPO of the importing country that lists the phytosanitary conditions for import; a phytosanitary certificate issued by the NPPO of the exporting country, certifying that the germplasm complies with the conditions listed in the import permit; and a health statement issued by the GHU, describing the origin, propagation, and health assessment of the germplasm [36].
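Stages (i) and (v) feed the GHU's go/no-go decision described in Section 2.2. The sketch below is a hypothetical illustration of how the three documents and the test results might be combined into that decision; the class names, fields, and the decision rule itself are assumptions made for this example, not a documented GHU system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImportPermit:
    """Issued by the NPPO of the importing country (hypothetical fields)."""
    accession_id: str
    required_tests: set[str]  # pests the permit requires to be checked


@dataclass
class Consignment:
    accession_id: str
    permit: ImportPermit | None
    has_phytosanitary_certificate: bool  # issued by the exporting NPPO
    has_ghu_health_statement: bool       # origin, propagation, health data
    test_results: dict[str, bool] = field(default_factory=dict)  # pest -> detected?


def go_no_go(c: Consignment) -> tuple[bool, list[str]]:
    """Return (go, reasons) for one consignment.

    Assumed rule for illustration: 'go' only if all three documents are
    present, every permit-listed test has been run, and no pest was detected.
    """
    reasons: list[str] = []
    if c.permit is None:
        reasons.append("missing import permit")
    elif c.permit.accession_id != c.accession_id:
        reasons.append("import permit does not match accession")
    if not c.has_phytosanitary_certificate:
        reasons.append("missing phytosanitary certificate")
    if not c.has_ghu_health_statement:
        reasons.append("missing GHU health statement")
    if c.permit is not None:
        untested = c.permit.required_tests - set(c.test_results)
        reasons.extend(f"not tested: {p}" for p in sorted(untested))
    reasons.extend(f"detected: {p}" for p, hit in sorted(c.test_results.items()) if hit)
    return (not reasons, reasons)
```

Returning the list of reasons rather than a bare boolean keeps every no-go decision traceable, in the spirit of the documentation stage.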

Generally, CGIAR centers acquire (import) or distribute (export) small quantities of germplasm, such as a few grams of botanical seed or a small number of vegetative propagules of each accession in the form of in vitro plantlets, tubers, corms, or stem cuttings. All countries regulate incoming genetic resources according to national and international laws and regulations designed to prevent the introduction of pests. CGIAR centers must therefore align their phytosanitary compliance procedures with those of the host country NPPO, the statutory organization that sets the policies, laws, and regulations governing plant material transfers in accordance with the IPPC framework and agreements. The import and export of germplasm is a collaborative endeavor among the GHU, the NPPOs of the exporting and importing countries, the germplasm provider, and the germplasm recipient (Figure 2). The GHUs submit imported germplasm to the NPPO of the host country for post-entry inspection to ensure compliance with the import conditions, including checks for quarantine pests in a post-entry testing facility or regeneration of the germplasm in a quarantine isolation area, before safe material is released to the requester. A similar procedure is used to export germplasm from the centers: the GHUs conduct the specific checks indicated on the import permit and dispatch the material with or without seed treatment, depending on the importer's requirements.

#### *3.2. Criteria for Pest Monitoring*

A wide variety of pests is reported for each crop species; some occur wherever the host species is grown, whereas others are restricted to a few geographic areas [45]. Monitoring germplasm for pests depends on knowledge of the pests affecting a crop species, particularly in the country regenerating the accessions, on the crop propagation method (true seed or vegetative propagation), and on the ability of a pest to spread through germplasm. On the basis of the pest biology of different crop species and the economic significance and risk associated with crops and production systems, the NPPO of each country has established national pest lists that categorize pests as "unregulated" or "regulated", with regulated pests further divided into "quarantine pests" and "regulated non-quarantine pests" [46]. Pests whose introduction into an area can cause severe damage are classified as quarantine pests. A region may be free of a quarantine pest, or the pest may be present but not widely distributed (e.g., cassava brown streak virus, *Fusarium oxysporum* f. sp. *cubense* Race 4, and *Candidatus* Liberibacter solanacearum). Quarantine pests are strictly controlled through official monitoring measures enforced by the NPPO. By contrast, regulated non-quarantine pests are widely distributed, and their presence in germplasm causes planting material losses or initiates new disease cycles (e.g., cucumber mosaic virus). Unregulated organisms include endophytes, saprophytes, and other pests of no phytosanitary significance.

The pest categorization and the country-specific lists of quarantine and regulated non-quarantine pests are established by the NPPO; these lists are dynamic and are updated regularly [47]. Countries and regions also use alternative classifications to designate regulated and unregulated pests. For instance, the European and Mediterranean Plant Protection Organization (EPPO) uses "A1 pests" and "A2 pests" for designated pests that are absent from or present in the EPPO region, respectively, and the Nigerian Agricultural Quarantine Services (NAQS) classifies pests into three categories: Category A (quarantine pest) for pathogens that are not present in Nigeria and/or in any country in West Africa; Category B (restricted regionally occurring pest) for pathogens that have a restricted local distribution in Nigeria and/or West Africa, against which field inspection and/or seed health testing methods can provide adequate protection; and Category C (regulated non-quarantine pest) for internationally widespread pathogens that may affect seed quality [15].
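The three schemes above key on the same underlying question, namely how widely a pest is distributed in the area of concern. The sketch below encodes that shared logic as a simplified, hypothetical rule of thumb; the `Distribution` enum, the function names, and the exact decision thresholds are assumptions for illustration and do not reproduce any NPPO's formal procedure, which also weighs economic significance through pest risk analysis.

```python
from enum import Enum


class Distribution(Enum):
    """Hypothetical encoding of a pest's status in the area of concern."""
    ABSENT = "absent"
    RESTRICTED = "present but not widely distributed"
    WIDESPREAD = "widely distributed"


def ippc_style_category(dist: Distribution, severe_if_introduced: bool,
                        affects_planting_material: bool) -> str:
    """Simplified rule of thumb, assumed for illustration only."""
    if dist is not Distribution.WIDESPREAD and severe_if_introduced:
        return "quarantine pest"
    if dist is Distribution.WIDESPREAD and affects_planting_material:
        return "regulated non-quarantine pest"
    return "unregulated"


def eppo_list(dist: Distribution) -> str:
    # A1: absent from the EPPO region; A2: present in the region.
    return "A1" if dist is Distribution.ABSENT else "A2"


def naqs_category(dist: Distribution) -> str:
    # A: absent from Nigeria/West Africa; B: restricted local distribution;
    # C: internationally widespread. This sketch ignores the extra condition
    # that Category C pathogens must affect seed quality.
    return {
        Distribution.ABSENT: "A",
        Distribution.RESTRICTED: "B",
        Distribution.WIDESPREAD: "C",
    }[dist]
```

For example, a pathogen absent from a region and damaging if introduced maps to "quarantine pest", "A1", and "A" under the three schemes, whereas a widespread pathogen that harms planting material maps to "regulated non-quarantine pest", "A2", and "C".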

National plant pest lists provide information on the pests likely to be associated with a plant species in the country of origin, and national regulated pest lists identify the pests that need to be controlled through rigorous quarantine measures. Both types of list form the basis for setting the conditions of germplasm transfers between countries. The IPPC has established standards and frameworks for preparing regulated pest lists [48] and pest risk analysis (PRA) procedures for establishing scientifically justified regulations to prevent regulated pest incursions [48,49]. However, such analyses are often limited to commercially traded crops (e.g., chickpea, groundnut, maize, potato, rice, sorghum, soybean, and wheat), and information on pest occurrence, economic significance, distribution, and epidemiology is scarce or non-existent for several minor and orphan crops and for wild relatives. In addition, changes to pest nomenclature resulting from taxonomic revisions require the pest lists to be revised, further complicating compliance. Many countries do not update their pest lists regularly, and import or export conditions based on outdated pest lists pose a challenge for regulatory compliance. Given the variations and gaps in country-specific lists due to limited knowledge of the pests affecting crop species, GHUs have taken a standardized approach of checking for all quarantine and regulated non-quarantine pests reported for each species (detailed in Section 4). These procedures, which minimize the risk of spreading known and unknown pests through germplasm, are in line with the FAO-IBPGR Technical Guidelines for the Safe Exchange of Germplasm [36], ISPMs, and other best practices [50].
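Because lists may use superseded names, merging a species' regulated pest list with the pests named in an import permit is only reliable after names are normalized. The sketch below illustrates this under stated assumptions: the synonym table is hypothetical and contains a single entry (the phylloxera renaming mentioned in Section 2.1), and the functions `normalize`, `build_test_panel`, and `outdated_names` are illustrative names, not part of any published tool.

```python
from __future__ import annotations

# Hypothetical synonym table mapping outdated pest names to current ones.
# Only the phylloxera entry is taken from this article (Section 2.1).
SYNONYMS = {
    "phylloxera vastatrix": "viteus vitifoliae",
}


def _key(name: str) -> str:
    """Lower-case and collapse internal whitespace."""
    return " ".join(name.lower().split())


def normalize(name: str) -> str:
    """Return the current name for a pest, resolving known synonyms."""
    key = _key(name)
    return SYNONYMS.get(key, key)


def build_test_panel(species_regulated_pests: list[str],
                     permit_pests: list[str]) -> set[str]:
    """All quarantine and regulated non-quarantine pests reported for the
    species, plus any extra pests named in the import permit, de-duplicated
    by current name."""
    return {normalize(p) for p in species_regulated_pests} | {
        normalize(p) for p in permit_pests
    }


def outdated_names(names: list[str]) -> list[str]:
    """Names that appear under a superseded synonym and should be updated."""
    return [n for n in names if _key(n) in SYNONYMS]
```

Without normalization, a permit listing the old name and a species list using the new one would yield two entries for the same organism, which is exactly the compliance problem the text describes.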

#### **4. Germplasm Health Testing and Pest Elimination**

Seed health testing and pest detection are the first line of defense in managing seed-borne and seed-transmitted pests. In the case of true seed crops, some pests infecting host crops are seed-borne (e.g., *Fusarium oxysporum* in cowpea), some are seed-transmitted (e.g., bean common mosaic virus in cowpea and common bean), and some can be either seed-borne or seed-transmitted (e.g., *Phyllachora maydis*, which causes tar spot of maize) [10,51]. Seed-borne and seed-transmitted pests are a concern for germplasm conservation and exchange, and they are therefore eliminated using seed treatment methods or by regenerating germplasm and harvesting seed from healthy plants. By contrast, most pests affecting vegetatively propagated crops, especially intracellular pests such as phytoplasmas, viruses, and viroids, can spread through vegetative propagules, and eliminating them requires complex procedures. The GHUs routinely check for about 320 pests that are endemic in germplasm production sites, including bacteria, fungi, insects, nematodes, oomycetes, phytoplasmas, viruses, and viroids (Supplementary Table S2). Testing also covers any other pests listed in the import permit of the country receiving the germplasm. Depending on its center's crop mandate, each GHU specializes in enabling the production of quality germplasm in accordance with the best available procedures for pest diagnosis and detection, phytosanitary treatment, and international transfer. GHUs apply similar procedures to genebank and breeding program materials, although genebank materials are more diverse, including wild species, landraces, and new acquisitions from collection missions, and may demand complex, time-consuming procedures owing to differences in species biology and pest risks. Breeding program materials mostly comprise staple cereals, grain and oil seed legumes, roots, tubers, and banana. In general, managing the phytosanitary risks associated with true seed crops is relatively easy and effective, because not all pathogens and pests are seed-borne or seed-transmitted. Moreover, seed-borne and seed-transmitted infections are relatively easy to control or eliminate using chemical or heat treatments, thus salvaging the germplasm. In clonally propagated crops, however, systemically infectious pathogens, especially viruses and viroids, are difficult to eliminate without complex procedures, which are expensive and time-consuming. The procedures used to generate pest-free germplasm are briefly summarized below by crop group.

#### *4.1. True Seed Crops*

#### 4.1.1. Cereals

Cereal germplasm from breeding programs and genebanks is inspected for both seed-borne and seed-transmitted pests. General procedures for testing, detection, diagnosis, and seed treatment to eliminate seed-borne pests are used [51,52], including International Seed Testing Association (ISTA) methods where applicable [53]. The general phytosanitary procedures used for true seed include (i) inspection at the active-growth (flowering/pre-harvest) stage to check for the presence of any regulated and seed-transmitted pests; (ii) dry seed examination using a desk magnifier (2×) to remove admixtures of plant debris, sclerotia, galls, insects, smut sori, and discolored and moldy seeds; (iii) seed washing and a sedimentation test to detect spores that were not detected in either the dry seed examination or incubation tests; (iv) standard blotter tests to detect fungi; (v) agar tests on selective media to detect bacterial pathogens; (vi) a seed soaking test to detect nematodes; (vii) a fungicidal seed treatment to remove saprophytic fungi and seed-borne pathogens; and (viii) seed fumigation using aluminum phosphide (or methyl bromide for sorghum seeds, as required by the Indian NPPO at ICRISAT, India) [54]. The tests performed for some important seed-borne and seed-transmitted diseases of various CGIAR mandate crops are summarized below.

*Barley:* The most important seed-borne pests of barley are the fungi causing loose smut (*Ustilago nuda*), covered smut (*Ustilago hordei*), spot blotch (*Bipolaris sorokiniana*), head blight (*Fusarium graminearum*), barley leaf stripe (*Pyrenophora graminea*), and ergot (*Claviceps purpurea*); a bacterium causing basal glume rot (*Pseudomonas syringae* pv. *atrofaciens*); a virus (barley stripe mosaic virus (BSMV)); a seed gall nematode (*Anguina tritici*); and an insect, the khapra beetle (*Trogoderma granarium*). Standard phytosanitary procedures are used to test and generate pest-free germplasm for import and export, taking into account any additional conditions laid down by the NPPOs of the importing and exporting countries.

*Maize:* The main risks associated with exporting maize germplasm come from pathogens with a restricted geographical distribution, such as *Pantoea stewartii* subsp. *stewartii*, maize dwarf mosaic virus, maize chlorotic mottle virus, sugarcane mosaic virus, and wheat streak mosaic virus. These pathogens have been shown to be seed-borne and seed-transmitted, although some have a low transmission rate of <1%. Many other maize pathogens are listed in the requirements of importing countries, and the measures taken to guarantee that seeds are pathogen-free cover a wide range of possible threats through strict phytosanitary procedures in the multiplication field plots, exhaustive laboratory seed testing using conventional, serological, and molecular methods, and seed treatments.
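A transmission rate below 1% helps explain why seed testing must be exhaustive. Under the standard simplifying assumptions (not stated in this article) that seeds are infected independently at rate `p` and that the assay detects every infected seed it examines, the probability that a random sample of `n` seeds contains at least one infected seed is `1 - (1 - p)^n`. Detecting infection with confidence `c` therefore requires `n >= ln(1 - c) / ln(1 - p)`. A minimal sketch of that arithmetic:

```python
import math


def detection_probability(p: float, n: int) -> float:
    """P(at least one infected seed among n), assuming independent
    infection at rate p and a perfectly sensitive assay."""
    return 1 - (1 - p) ** n


def seeds_needed(p: float, confidence: float) -> int:
    """Smallest sample size n whose detection probability is >= confidence."""
    if not (0 < p < 1 and 0 < confidence < 1):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


# Transmission rate of 1%, 95% confidence of catching at least one infected seed.
print(seeds_needed(0.01, 0.95))  # 299
```

At 95% confidence the required sample is close to 3/p (the statistical "rule of three"), so each tenfold drop in transmission rate multiplies the number of seeds to be tested roughly tenfold. Real assays are not perfectly sensitive and pooled testing changes the calculation, so these figures are a lower bound.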

*Rice:* Many pests and pathogens have been identified as posing a risk to rice germplasm. GHUs use the procedures summarized above to test rice seed for seed-borne pests, including bacteria (*Pseudomonas* spp. and *Xanthomonas* spp.), fungi (*Magnaporthe oryzae*, *Tilletia barclayana*, etc.), an oomycete (*Sclerophthora macrospora*), a phytoplasma ('*Candidatus* Phytoplasma' 16SrIII-L), a virus (rice yellow mottle virus), and a nematode (*Aphelenchoides besseyi*).

*Sorghum and millets:* Important seed-borne diseases of sorghum include ergot (*Claviceps sorghi*), anthracnose (*Colletotrichum graminicola*), leaf blight (*Exserohilum turcicum*), downy mildew (*Peronosclerospora sorghi*), loose kernel smut (*Sporisorium cruentum*), long smut (*S*. *ehrenbergii*), head smut (*S*. *reilianum*), covered kernel smut (*S*. *sorghi*), bacterial blight (*Ralstonia andropogoni*), bacterial leaf streak (*Xanthomonas vasicola* pv. *holcicola*), and bacterial leaf spot (*Pseudomonas syringae* pv. *syringae*). Ergot (*Claviceps fusiformis*) and smut (*Moesziomyces penicillariae*) are the major seed-borne diseases of pearl millet, and there are also some reports of downy mildew (*Sclerospora graminicola*) being seed-borne. *Melanopsichium eleusinis*, *Pyricularia grisea*, and *Bipolaris* sp. are important pathogens of small millets, for which salvaging treatments are used to recover pest-free seed.

*Wheat:* The main risks associated with exporting bread and durum wheat germplasm come from pathogens with a restricted geographical distribution, such as Karnal bunt (*Tilletia indica*), common bunt (*T. tritici* and *T. laevis*), *Alternaria triticina*, *Xanthomonas translucens* pv. *undulosa*, BSMV, and wheat streak mosaic virus (WSMV). Nevertheless, many more wheat pathogens are listed in the requirements of importing countries, and the measures taken to guarantee that seeds are pathogen-free cover a wide range of possible threats through strict phytosanitary procedures in the multiplication field plots, exhaustive laboratory seed testing, and seed treatments. Imported germplasm is subject to NPPO regulations and is inspected very carefully for wheat blast (*Magnaporthe oryzae* pathotype *Triticum*), dwarf bunt (*T. controversa*), and flag smut (*Urocystis agropyri*). In addition to fungal pathogens, seed exports and imports are also inspected for seed-borne insect pests, such as the khapra beetle (*Trogoderma granarium*).

#### 4.1.2. Grain and Oil Seed Legumes

Legume germplasm is particularly prone to pest attack, and many legume pests are known to spread through seed [45]. A list of regulated pests and pathogens frequently tested for at CGIAR legume germplasm regeneration sites is given in Supplementary Table S2. Stringent phytosanitary and seed health testing procedures, such as those described for cereals, are also applied to legumes to prevent the transfer of fungal, bacterial, and viral diseases through germplasm. In general, germplasm and breeding lines for international transfer are regenerated under screenhouse conditions to avoid viral infections, and the emerging plants are inspected for viral symptoms and indexed by ELISA or PCR-based methods to ensure that they are free from viruses before seed is harvested. Grow-out tests under screenhouse conditions, a standard practice for legumes, are performed to assess seed-transmitted viruses. Although time-consuming, this procedure offers reliable detection and thus minimizes the risk of transferring viruses. Some of the important seed-borne and seed-transmitted pests tested for in export and import quarantine are listed by crop below.

*Bean:* About 23 seed-borne bacterial, fungal, and viral pathogens are reported to be important for beans, including common blight (*Xanthomonas campestris* pv. *phaseoli*), charcoal rot (*Macrophomina phaseolina*), and anthracnose (*Colletotrichum truncatum*), along with three seed-transmitted viruses (alfalfa mosaic virus (AMV), bean common mosaic virus (BCMV), and peanut mottle virus (PeMoV)).

*Cowpea, bambara groundnut and other* Vigna *species*: Several fungal and bacterial pathogens of cowpea are seed-borne, including cowpea bacterial blight (*Xanthomonas axonopodis* pv. *vignicola*), web blight (*Rhizoctonia solani*), and brown blotch (*Colletotrichum capsici*). About 10 viruses are reported to be seed-transmitted in cowpea; those most frequently of interest are cucumber mosaic virus (CMV), cowpea yellow mosaic virus (CYMV), cowpea mottle virus (CmeV), southern bean mosaic virus (SBMV), and cowpea mild mottle virus (CPMMV). Cowpea seeds are fumigated with phostoxin (55% aluminum phosphide) to eliminate insect pests and treated with fungicide to eliminate seed-borne pathogens.

*Chickpea and pigeonpea*: Important seed-borne diseases of these two grain legumes are blight (*Ascochyta rabiei*), grey mold (*Botrytis cinerea*), wilt (*Fusarium oxysporum* f. sp. *ciceri*), and stem blight (*Phomopsis longicolla*) in chickpea; blight (*Botryodiplodia theobromae*) and wilt (*Fusarium oxysporum* f. sp. *udum*) in pigeonpea.

*Faba bean*: Twenty fungal species belonging to 13 genera (*Aspergillus*, *Penicillium*, *Alternaria*, *Botrytis*, *Cephalosporium*, *Cladosporium*, *Epicoccum*, *Fusarium*, *Rhizoctonia*, *Rhizopus*, *Stemphylium*, *Trichothecium*, and *Verticillium*) have been recognized as seed-borne risks, along with four seed-transmitted viruses (broad bean stain virus (BBSV), bean yellow mosaic virus (BYMV), broad bean mottle virus (BBMV), and pea seed-borne mosaic virus (PSbMV)). Broomrapes (*Orobanche* and *Phelipanche* spp.), which are root-parasitic weeds, are also considered a threat, and germplasm multiplication in broomrape-infested fields is avoided.

*Groundnut*: The important quarantine pests of groundnut are dry root rot (*M. phaseolina*/*Rhizoctonia bataticola*), root rot (*Rhizoctonia solani*), pod rot (*Sclerotium rolfsii*), groundnut scab (*Sphaceloma arachidis*), *Ralstonia solanacearum* (African strains), the seed bruchid (*Stator pruininus*), the testa nematode (*Aphelenchoides arachidis*), peanut mottle virus (PMV), peanut stripe virus (PStV), peanut clump virus (PCV), Indian peanut clump virus (IPCV), peanut stunt virus (PSV), and tobacco streak virus (TSV).

*Lentil*: The important seed-borne fungal diseases of lentil include ascochyta blight (*Ascochyta lentis*), fusarium wilt (*Fusarium oxysporum* f. sp. *lentis*), botrytis grey mold (*Botrytis fabae* and *B. cinerea*), stemphylium blight (*Stemphylium botryosum*), phoma blight (*Phoma medicaginis* var. *medicaginis*), and anthracnose (*C. lindemuthianum* and *C. truncatum*). Other seed-associated risks include the stem nematode (*Ditylenchus dipsaci*) and the seed-transmitted viruses AMV, BYMV, PSbMV, CMV, and BBSV.

*Soybean*: The seed-borne fungal and bacterial pathogens of soybean include bacterial pustule (*X. axonopodis* pv. *glycines*), brown spot (*Septoria glycines*), frogeye leaf spot (*Cercospora sojina*), yellow leaf spot (*P. manshurica*), charcoal rot (*M. phaseolina*), and anthracnose (*C. truncatum*). Many seed-transmitted viruses are also reported in soybean, including BCMV, CMV, CYMV, CmeV, CPMMV, and SBMV in West Africa, and rigorous tests are conducted for other viruses depending on the country of origin. Seeds are fumigated with phostoxin (55% aluminum phosphide) to eliminate insect pests and treated with fungicide to eliminate seed-borne pathogens.

#### *4.2. Vegetatively Propagated Crops*

Banana (and plantain), cassava, potato, sweetpotato, and yam are the major vegetatively propagated crops (VPCs) exchanged by CGIAR programs [55]. Vegetative propagation poses the greatest risk of introducing pests through planting material, which can carry infections from one season to the next cropping cycle and thus accumulate pathogens, especially viruses, over generations of cultivation. Many transboundary pest introductions have been linked with the transfer of vegetative propagules: the spread of BBTV to Africa and within the continent [19,56]; in potato, the necrotic strains of potato virus Y (PVY) in Brazil, aggressive strains of potato late blight in Africa and Asia, and the potato cyst nematode (*Globodera pallida*) in East Africa; and in cassava, the regional spread of cassava brown streak virus (CBSV), attributed to contaminated stem propagation, and, in Asia, the spread of Sri Lankan cassava mosaic virus (SLCMV) from South Asia to East Asia [57]. Therefore, many countries regulate the importation of vegetative germplasm, and the FAO-IPGRI technical guidelines recommend that only in vitro plants that have been tested for pathogens be moved between countries [36]. Pollen or true seed of these crops is also exchanged for breeding purposes under adequate phytosanitary controls. Limiting international movement to sterile in vitro plants leaves intracellular obligate pathogens, such as viruses, viroids, and phytoplasmas, as the only remaining concern. Depending on the country, some viruses are regulated quarantine pests (e.g., BBTV, CBSV, and PVY), whereas several others are unregulated (e.g., sweet potato mild mosaic virus). Nonetheless, the standard GHU procedure is to generate virus-free in vitro plants for the conservation and distribution of these crops, as per the FAO/IBPGR technical guidelines [36]. All exported and imported material is tested for viruses and other pests under NPPO guidance, and only material free of viruses and other pests is released to end-users. Unlike those for cereals and legumes, the phytosanitation and testing procedures for clonally propagated crops differ by crop species, as explained below.

*Banana:* Banana is a perennial herbaceous plant traditionally propagated using suckers (side shoots arising from underground corms), and its planting material thus often carries soil-borne insects and fungi in addition to shoot-invading viruses, fungi, and bacteria. Several banana pathogens have a restricted geographic distribution, including *Fusarium oxysporum* f. sp. *cubense* tropical race 4 (Panama disease), banana Xanthomonas wilt (*Xanthomonas campestris* pv. *musacearum*), and viruses such as BBTV and banana bract mosaic virus (BBrMV). Guaranteeing the movement of pathogen-free germplasm is therefore essential to minimize the risk of introducing these regulated quarantine pests into new countries. Pathogens that are often symptomless in germplasm (e.g., in vitro plants, corms, and suckers), such as viruses, pose a special risk to the movement of vegetative germplasm. The germplasm collections of the Alliance of Bioversity International and CIAT (1617 accessions) and of IITA (393 accessions) are maintained as in vitro cultures. Bacterial and fungal contaminants in banana shoot tip cultures can be eradicated by isolating small explants (e.g., 1 mm meristems) and culturing them in vitro, but virus infection still presents an important risk. To mitigate these risks, the Conservation Thematic Group of MusaNet, an international network for *Musa* genetic resources coordinated by Bioversity International, has recently published a new version of the technical guidelines for minimizing the risk of pest introduction through the movement of germplasm [58]. These guidelines followed a recommendation based on an analysis of the phytosanitary procedures carried out by GHUs [56]. Under the new guidelines, at least four plants of each accession are grown for six months in a greenhouse.
Leaf samples are taken from the limb and midrib of the three youngest leaves after 3 and 6 months for the comprehensive detection of the five most important viruses by PCR/RT-PCR: BBrMV, BBTV, banana streak virus (BSV), banana mild mosaic virus (BanMMV), and cucumber mosaic virus (CMV). Comprehensive indexing by electron microscopy is also conducted to search for any viral particles. Sanitation of virus-infected banana accessions is a complex process requiring a combination of meristem culture, thermotherapy, and chemotherapy. Despite numerous efforts and the continuous optimization of the protocols, the success rate of banana sanitation is around 70%. An accession that indexes negative is added to an in vitro banana collection for further safe propagation and distribution. Every precaution is taken to avoid reinfection of in vitro plants, which could occur if plants were transferred to the field or greenhouse before distribution.
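The release rule implied by the banana indexing scheme above (at least four plants, PCR/RT-PCR for five viruses at 3 and 6 months, plus electron microscopy) can be sketched as a simple check. This is a hypothetical illustration, not MusaNet or GHU software: the `PlantRecord` layout, the function name, the per-plant electron microscopy flag, and the rule that a missing result counts as a failure are all assumptions.

```python
from dataclasses import dataclass

# Panel, sampling times and plant count as described in the text.
BANANA_PANEL = ("BBrMV", "BBTV", "BSV", "BanMMV", "CMV")
SAMPLING_MONTHS = (3, 6)   # leaves sampled after 3 and 6 months
MIN_PLANTS = 4             # "at least four plants for each accession"


@dataclass
class PlantRecord:
    """Results for one greenhouse plant (hypothetical record layout).

    pcr[month][virus] is True if the PCR/RT-PCR test was positive.
    em_particles_seen is True if electron microscopy found viral particles.
    """
    pcr: dict[int, dict[str, bool]]
    em_particles_seen: bool = False


def accession_indexes_negative(plants: list[PlantRecord]) -> bool:
    """Return True only if every plant is negative on every check."""
    if len(plants) < MIN_PLANTS:
        return False  # too few plants grown to index the accession
    for plant in plants:
        if plant.em_particles_seen:
            return False
        for month in SAMPLING_MONTHS:
            results = plant.pcr.get(month)
            if results is None:
                return False  # a scheduled sampling is missing
            for virus in BANANA_PANEL:
                # A missing result counts as a failure, not as a negative.
                if results.get(virus, True):
                    return False
    return True


# Example: four clean plants pass; one positive BSV test at 6 months fails.
clean = {m: {v: False for v in BANANA_PANEL} for m in SAMPLING_MONTHS}
plants = [PlantRecord(pcr={m: dict(r) for m, r in clean.items()}) for _ in range(4)]
print(accession_indexes_negative(plants))  # True
plants[2].pcr[6]["BSV"] = True
print(accession_indexes_negative(plants))  # False
```

Treating a missing result as a failure reflects the conservative stance of the procedure: an accession is only added to the in vitro collection once it has positively indexed negative.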

One of the major challenges for banana germplasm exchange arose from the discovery that the BSV genome is integrated, as endogenous BSV (eBSV), in the genome of *M. balbisiana*, which contributes the B genome. The eBSV can spontaneously release infectious particles, especially following in vitro culture and interspecific crosses [19]. The presence of infectious eBSVs within B genomes has emerged as a main constraint for health indexing and safe *Musa* germplasm transfers: plants apparently negative for BSV could spontaneously become positive through expression of the eBSV sequence. The discovery of this phenomenon in bananas in the 1990s halted banana germplasm distribution from CGIAR centers. However, the Inter-African Phytosanitary Council, the Regional Plant Protection Organization of Africa, made a provision allowing the distribution within Africa of virus-free banana and plantain that may carry eBSV [19]. Based on this regulation, the IITA genebank and breeding programs distribute virus-free banana germplasm within Africa with the informed consent of the recipients. Advances in technology and in knowledge of viruses integrated in host genomes have since provided a way to overcome this natural bottleneck to germplasm distribution. First, diagnostic techniques were established to distinguish eBSV from episomal virus particles for virus indexing purposes; second, molecular markers were established to identify *Musa* accessions with activatable eBSV; and finally, a decision model was developed to enable the distribution of *Musa* germplasm with eBSV sequences based on the consent of the importer [58].
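The three-part approach just described (separating episomal BSV from integrated eBSV, flagging activatable eBSV with markers, and conditioning release on importer consent) can be read as a simple decision rule. The sketch below is a hypothetical simplification for illustration only, not the published decision model [58]; the function `ebsv_decision`, its three boolean inputs, and the `Decision` labels are assumptions.

```python
from enum import Enum


class Decision(Enum):
    RELEASE = "release"
    RELEASE_WITH_CONSENT = "release (importer's informed consent obtained)"
    WITHHOLD = "withhold"


def ebsv_decision(episomal_bsv_detected: bool,
                  activatable_ebsv_marker: bool,
                  importer_consents: bool) -> Decision:
    """Simplified, hypothetical reading of the eBSV distribution logic."""
    # Episomal (infectious) BSV particles mean the plant is not virus-free.
    if episomal_bsv_detected:
        return Decision.WITHHOLD
    # Virus-free, but the genome carries eBSV that could be activated later:
    # distribution depends on the importer's informed consent.
    if activatable_ebsv_marker:
        if importer_consents:
            return Decision.RELEASE_WITH_CONSENT
        return Decision.WITHHOLD
    # Virus-free and no activatable eBSV detected.
    return Decision.RELEASE


print(ebsv_decision(False, True, True).value)
```

Ordering the checks this way means consent is only ever consulted for material that has already indexed free of episomal virus, mirroring the provision that only virus-free banana that may carry eBSV is distributed.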

*Cassava:* Cassava is cultivated for its tuberous roots and is traditionally propagated using stem cuttings. The crop is conserved in field collections and in vitro. The in vitro collections of 6500 cassava accessions at the CIAT in Colombia and 3700 accessions at the IITA in Nigeria are the largest cassava ex situ collections. The germplasm is exchanged as in vitro plants and botanic seed. Viruses and phytoplasmas pose a major threat to the distribution of cassava as in vitro plants. A diverse range of viruses infect cassava in Latin America, Africa, and Asia (Supplementary Table S1) [59,60]. The sanitary testing of the cassava collection held by the CIAT in Colombia checks for viruses prevalent in the region: cassava common mosaic virus (CsCMV), cassava virus X (CsXV), and four other viruses associated with cassava frogskin disease, namely cassava frogskin-associated virus (CsFSaV), cassava polero-like virus (CsPLV), cassava new alphaflexivirus (CsNAV), and cassava torrado-like virus (CsTLV) [60]. The sanitary testing of the in vitro African cassava collection conserved at the IITA, Nigeria, mainly checks for viruses prevalent in Africa: African cassava mosaic virus (ACMV), a complex of East African cassava mosaic viruses (EACMVs) and their strains, cassava brown streak ipomoviruses (CBSIVs), and 16Sr phytoplasmas [57]. Testing of the cassava collection in Asia mainly focuses on the viruses (Indian cassava mosaic virus (ICMV) and SLCMV) and phytoplasmas prevalent in the region. Both the CIAT and IITA GHUs have the diagnostic capability to test for all viruses known to infect cassava.

Several procedures have been established for virus and phytoplasma detection to generate virus-free planting material from meristem cultures in vitro, with or without thermotherapy, chemotherapy, or cryotherapy. The basic procedure includes a heat treatment applied to stem cuttings with a length of about 30, at 28 and 38 °C for 6 h in the dark and 18 h in the light, in an incubator [43]. After sanitation with 3% sodium hypochlorite, apical shoots from the stem cuttings are used for meristem excision and in vitro plant development. In vitro plants about 2 to 4 months old are virus indexed by PCR or RT-PCR to detect and eliminate virus-infected plants, and the remaining plants are re-indexed a second time after 3 to 4 months to ascertain their health status. The virus-free plants are used as a mother stock for conservation as 'clean stock' and for further propagation and use. It takes about 6 to 12 months to generate a virus-free stock of cassava germplasm. These procedures are known to be robust and 90% efficient in eliminating viruses. Only virus-free in vitro plants are transferred for propagation purposes. Cassava germplasm distributed as botanic seed poses little risk of virus spread, as none of the viruses reported to infect cassava has been detected in seedlings. Nonetheless, cassava botanic seeds are surface sterilized with insecticides and pesticides and germinated in screenhouses for physical inspection before the seedlings are released to end-users. Occasionally, cassava germplasm is transferred as stem cuttings generated from virus-free in vitro plants under insect-proof screenhouse conditions, after the stems have been treated with a slurry of an insecticide and fungicide cocktail to eliminate microorganisms and arthropod pests.
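The two-round indexing described above, combined with the region-specific panels listed for each collection, can be sketched as a filter over in vitro plants. This is a hypothetical illustration, not CIAT or IITA procedure: the dictionary keys, function names, and the rule that a missing result counts as a failure are assumptions; the agent abbreviations are those given in the text.

```python
# Regional test panels as listed in the text (abbreviations only).
# The keys are hypothetical labels for the three collections.
CASSAVA_PANELS = {
    "CIAT-Colombia": ("CsCMV", "CsXV", "CsFSaV", "CsPLV", "CsNAV", "CsTLV"),
    "IITA-Nigeria": ("ACMV", "EACMVs", "CBSIVs", "16Sr phytoplasmas"),
    "Asia": ("ICMV", "SLCMV", "phytoplasmas"),
}


def passes_two_rounds(collection: str,
                      round1: dict[str, bool],
                      round2: dict[str, bool]) -> bool:
    """True if every agent in the regional panel tested negative in both
    indexing rounds (plants about 2-4 months old, then 3-4 months later).

    round1 / round2 map agent -> True if the PCR/RT-PCR test was positive.
    A missing result is treated as a failure, not as a negative.
    """
    panel = CASSAVA_PANELS[collection]
    return all(not rnd.get(agent, True)
               for rnd in (round1, round2)
               for agent in panel)


def select_clean_stock(collection: str,
                       results: dict[str, tuple[dict[str, bool], dict[str, bool]]]
                       ) -> list[str]:
    """Return sorted IDs of plants usable as 'clean stock' mother plants."""
    return sorted(pid for pid, (r1, r2) in results.items()
                  if passes_two_rounds(collection, r1, r2))


neg = {a: False for a in CASSAVA_PANELS["IITA-Nigeria"]}
pos = dict(neg, ACMV=True)
print(select_clean_stock("IITA-Nigeria",
                         {"p2": (neg, neg), "p1": (neg, neg), "p3": (neg, pos)}))
```

Keeping the panels as data rather than code reflects the point made above that each GHU tests for the agents prevalent in its region, while the pass rule itself stays the same.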

*Potato:* Potato is propagated through tubers. Besides viruses, viroids, and phytoplasmas, field-produced tubers can transmit a long list of bacterial, fungal, and nematode diseases and pests, including brown rot (*Ralstonia solanacearum* phylotype IIB), soft rot (*Pectobacterium* and *Dickeya* spp.), ring rot (*Clavibacter michiganensis* subsp. *sepedonicus*), wart (*Synchytrium endobioticum*), common scab (*Streptomyces* sp.), powdery scab (*Spongospora subterranea* f. sp. *subterranea*), late blight (*Phytophthora infestans*), nematodes, and insects. Potato is known to be infected by more than 50 different viruses, but only a handful of them (PVY, PVX, PVS, PVA, and potato leaf roll virus) are significant pathogens globally; however, some unique local viruses can be of major concern [61]. The CIP maintains one of the world's largest in vitro genebank collections, with over 7209 accessions of potato, many of which are of local origin. The CIP uses an ISO/IEC 17025-accredited process for ensuring that in vitro plants are free of all pathogens, both known and unknown. The process combines an antibacterial treatment before and during in vitro introduction with virus indexing by ELISA (9 viruses), RT-PCR (1 virus and 1 viroid), and biological indexing on indicator hosts (11 species). This indexing is performed twice, before and after thermotherapy and meristem tip culture. Due to the extremely contagious and stable nature of potato spindle tuber viroid (PSTVd), all plants are tested for this viroid before the introduction process even starts and are destroyed immediately if found positive. The protocol for virus cleaning at the CIP is 91% efficient, and the most difficult viruses to clean are PVS and PVT. Only material that has been certified free of any pathogens after this process is permitted to be moved internationally. Any germplasm received from other institutions or countries is tested in a similar way before it can be made available for further distribution. In addition to this rigorous indexing process, further diagnostic tests are performed when required by the importing country.
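The order of the potato intake stages described above (PSTVd pre-screen, antibacterial treatment and in vitro introduction, indexing, thermotherapy with meristem tip culture, re-indexing, certification) can be sketched as a small state walk. This is a hypothetical illustration, not the CIP's accredited procedure: the function, the `Outcome` labels, and the assumption that every accession passes through thermotherapy are ours. The text gives only the number of ELISA targets and indicator hosts, so the sketch records counts rather than inventing target names.

```python
from enum import Enum

# Assay panel sizes as stated in the text; individual targets are not named
# there, so only the counts are recorded.
ELISA_VIRUS_COUNT = 9
RT_PCR_TARGETS = ("1 virus", "1 viroid")
INDICATOR_HOST_SPECIES = 11


class Outcome(Enum):
    DESTROYED = "PSTVd-positive at pre-screen; destroyed"
    NOT_CLEAN = "pathogen still detected after clean-up; not released"
    CERTIFIED = "certified pathogen-free; may move internationally"


def panel_description() -> str:
    return (f"ELISA x{ELISA_VIRUS_COUNT}, RT-PCR ({', '.join(RT_PCR_TARGETS)}), "
            f"{INDICATOR_HOST_SPECIES} indicator hosts")


def potato_intake(pstvd_prescreen_positive: bool,
                  hits_before: set[str],
                  hits_after: set[str]) -> tuple[Outcome, list[str]]:
    """Walk one accession through the stages described in the text.

    hits_before / hits_after: agents detected in the two indexing rounds.
    """
    log = ["PSTVd pre-screen"]
    if pstvd_prescreen_positive:
        log.append("destroyed immediately")
        return Outcome.DESTROYED, log
    log.append("antibacterial treatment; in vitro introduction")
    log.append(f"indexing round 1 [{panel_description()}]: "
               f"{sorted(hits_before) or 'nothing detected'}")
    log.append("thermotherapy + meristem tip culture")
    log.append(f"indexing round 2 [{panel_description()}]: "
               f"{sorted(hits_after) or 'nothing detected'}")
    if hits_after:
        return Outcome.NOT_CLEAN, log
    return Outcome.CERTIFIED, log


outcome, steps = potato_intake(False, {"PVS"}, set())
print(outcome.value)
for step in steps:
    print(" -", step)
```

Only the second indexing round decides certification here, because the first round serves to document which agents the clean-up has to remove.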
Breeders occasionally move true seed and pollen internationally, which is generally considered safer than moving plants or organs. For true seed and pollen, both parents (female and male) are tested at the pre-flowering stage for seed-transmitted viruses (APLV, APMMV, AVB-O, AMV, PVT, and PYV) and PSTVd, and must test negative. The seeds and pollen must be free of any insect pests and are treated at −70 °C for seven days if contamination is suspected. The seeds are surface sterilized with 2.5% sodium hypochlorite for 10 min to kill any seed-borne pathogens.
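The true seed and pollen procedure above amounts to an eligibility check on both parents followed by conditional treatments. The sketch below is a hypothetical illustration: the function names and keyword arguments are assumptions, inspection for insect pests is assumed to happen separately, and hypochlorite sterilization is applied to seed only because the text mentions it only for seeds.

```python
# Targets both parents must test negative for at the pre-flowering stage.
PARENT_PANEL = ("APLV", "APMMV", "AVB-O", "AMV", "PVT", "PYV", "PSTVd")


def parents_eligible(female: dict[str, bool], male: dict[str, bool]) -> bool:
    """True only if both parents tested negative for every panel target.

    Values are True for a positive test; a missing target counts as a failure.
    """
    return all(not parent.get(target, True)
               for parent in (female, male)
               for target in PARENT_PANEL)


def treatment_plan(female: dict[str, bool], male: dict[str, bool], *,
                   is_pollen: bool, contamination_suspected: bool):
    """Return the list of treatments, or None if the cross is not eligible."""
    if not parents_eligible(female, male):
        return None
    plan = []
    if contamination_suspected:
        plan.append("hold at -70 °C for 7 days")
    if not is_pollen:
        plan.append("surface-sterilize in 2.5% sodium hypochlorite for 10 min")
    return plan


neg = {t: False for t in PARENT_PANEL}
print(treatment_plan(neg, neg, is_pollen=False, contamination_suspected=True))
print(treatment_plan(neg, dict(neg, PVT=True), is_pollen=False,
                     contamination_suspected=False))  # None
```

Screening the parents rather than the seed lot itself matches the text: seed-transmitted agents are excluded before pollination, and the post-harvest treatments only address surface contamination and pests.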

*Sweetpotato:* Sweetpotato is traditionally multiplied through stem (vine) cuttings. As with other VPCs, viruses are the principal concern for its vegetative propagation. More than 30 viruses have been reported to infect sweetpotato, but only a handful of them are of global significance: the crinivirus sweet potato chlorotic stunt virus (SPCSV); the potyviruses sweet potato feathery mottle virus (SPFMV), sweet potato virus C (SPVC), sweet potato virus G (SPVG), and sweet potato virus 2 (SPV2); and several related begomoviruses (Supplementary Table S2) [62,63]. The CIP genebank holds 8,054 accessions of sweetpotato, and, as for potato, the CIP has an ISO/IEC 17025-accredited process for ensuring that in vitro sweetpotato plants are free of all known pathogens. The process is similar to that described for potato, except that testing is conducted for sweetpotato-specific viruses by PCR (begomoviruses) and ELISA (10 viruses), and that the range of biological indicator hosts is replaced by a single universal indicator host, *Ipomoea setosa*, onto which sweetpotato accessions are grafted. The efficiency of the current virus clean-up protocols for sweetpotato is 69%, and the most difficult viruses to eliminate are begomoviruses, SPFMV, and SPVG. As for potato, only in vitro plants that have been confirmed to be free of known pathogens are distributed internationally. Sweetpotato pollen or seed is not commonly moved internationally by the CIP or its partners, but the procedure would be similar to that for potato. Only two sweetpotato viruses, a begomovirus (sweet potato leaf curl virus) and a mastrevirus (sweet potato symptomless virus 1), have been reported to be seed-transmitted.
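To put the reported clean-up efficiencies side by side (banana around 70%, cassava 90%, potato 91%, sweetpotato 69%), a back-of-the-envelope calculation can show how they translate into repeat work. This sketch rests on an assumption not made in the text: that "efficiency" is the probability that one sanitation cycle frees an infected accession of viruses, and that repeated cycles succeed independently with that same probability (a geometric model). Real repeat cycles are probably less successful than first attempts, since the hardest viruses (e.g., PVS and PVT in potato; begomoviruses, SPFMV, and SPVG in sweetpotato) remain.

```python
# Reported clean-up efficiencies; banana is given as "around 70%".
EFFICIENCY = {"banana": 0.70, "cassava": 0.90, "potato": 0.91, "sweetpotato": 0.69}


def expected_cycles(p: float) -> float:
    """Mean number of sanitation cycles until success, ASSUMING each cycle
    succeeds independently with the same probability p (geometric model)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1 / p


def still_infected(n: int, p: float, cycles: int = 1) -> float:
    """Expected number of n infected accessions still infected after
    `cycles` rounds, under the same independence assumption."""
    return n * (1 - p) ** cycles


for crop, p in EFFICIENCY.items():
    print(f"{crop:12s} {expected_cycles(p):.2f} cycles on average; "
          f"{still_infected(100, p):.0f} of 100 still infected after one round")
```

Under these assumptions, banana and sweetpotato need roughly three times as many repeat cycles per 100 infected accessions as cassava and potato, which is one way to read why their sanitation is described as the more demanding.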

*Yam*: Unlike other clonal crops, multiple species of yam (*Dioscorea* spp.), which originated in different parts of the world, have been domesticated for the production of underground edible tubers; the crop is traditionally propagated using tubers [64]. Of about ten popularly grown species, *D. rotundata* (white yam) of West African origin and *D. alata* (water yam) of Asiatic origin are widely cultivated worldwide. The IITA genebank in Nigeria holds the world's largest in vitro yam collection [64]. The global yam germplasm collection comprises about 5839 accessions of about 10 species, including *D. rotundata*, *D. alata*, *D. bulbifera*, *D. cayennensis*, *D. dumetorum*, *D. esculenta*, and a few other species. Yam is known to be infected by over 15 viruses [57,65], including yam mosaic virus (YMV), CMV, yam mild mosaic virus (YMMV), several badnaviruses generically referred to as yam bacilliform viruses (YBVs), Japanese yam mosaic virus, Chinese yam necrotic mosaic virus, yam asymptomatic virus 1, yam virus Y, yam chlorotic necrosis virus (YCNV), yam chlorotic mosaic virus (YCMV), Dioscorea mosaic-associated virus, and air potato ampelovirus 1. Of these, YMV is known to cause the most economic damage in *D. rotundata* and *D. cayennensis* worldwide, whereas YCMV and YCNV are important for Chinese and Japanese yam in Asia. The other viruses detected in yam cause either mild mottling or no symptoms at all. Many yam viruses are not regulated, although the IITA uses protocols to generate virus-free in vitro plants for conservation and distribution [66]. Virus elimination is achieved by selecting asymptomatic plants for thermotherapy and regenerating in vitro plants from meristem explants. The in vitro plants are virus indexed using PCR-based methods after three months of post-flask growth to ascertain their health status. Plants that test negative are bulk propagated for conservation and distribution.
Integrated viral sequences of badnaviruses have been detected in yam genome sequences, but unlike eBSV in bananas, the endogenous badnavirus sequences in yam are defective and not known to generate infectious particles. They are therefore not regarded as having any phytosanitary significance. Compared with other VPCs conserved by the CGIAR, yam phytosanitation is a lengthy process, usually taking between 12 and 24 months. Of the various factors involved, slow in vitro meristem growth is a major bottleneck for the production of virus-free plants. True seed of yam is exchanged by breeding programs after seed treatment with a fungicide and an insecticide. Tubers generated from virus-free yam plants are also used for international exchange by breeding and seed system initiatives, especially within the West African sub-region.

#### *4.3. Trees*

*Germplasm conservation approach*. The World Agroforestry Centre (ICRAF) characterizes and conserves 20,000 accessions of over 200 species of agroforestry trees, both indigenous and exotic, for agroforestry and restoration programs in Africa, Asia, and Latin America. Germplasm is conserved as seeds in genebanks or as adult trees in field genebanks, where provenances are grown and evaluated in common gardens for domestication and cultivar testing [67]. Because of the wide range of growing conditions and geographic locations, this conservation approach exposes tree germplasm to a plethora of seed-borne pests and pathogens, including both native and non-native microorganisms and insects. Furthermore, the vast taxonomic and geographic diversity of the collected germplasm makes it susceptible to poorly characterized or unknown pests and diseases [68]. This poses a major challenge for implementing phytosanitary measures to detect and mitigate pests and diseases of tree germplasm. Plant health interventions were put in place through collaborative research with the NPPO. High-priority tree taxa were selected based on their strategic importance (food, timber, non-timber products, and biodiversity conservation), and major biotic threats were determined through laboratory testing of seed and vegetative tissue and field monitoring. Prioritized species include relevant food and multipurpose trees native to Africa, such as baobab (*Adansonia digitata*) and marula (*Sclerocarya birrea*), and exotic trees, such as southern silky oak (*Grevillea robusta*) and eucalypts (*Eucalyptus* spp.). Special efforts have been made to characterize the taxonomy and pathogenicity of emerging fungal diseases of unknown origin within the Botryosphaeriaceae family, which have the potential to affect multiple tree species in Africa [69].
Standard operating procedures (SOPs) for assessing the health of both seeds and trees in the open field were developed and include: (a) seed health testing for the detection of seed-borne pathogens; (b) field assessment of the incidence of and damage caused by tree cankers; (c) detection and identification of tree canker fungi; and (d) field assessment of the incidence of and damage caused by fruit and seed insect pests. These procedures are to be used routinely as phytosanitary measures to evaluate germplasm prior to exchange or collection and to monitor plant health in the field.

#### *4.4. Forages*

The CIAT and ILRI genebanks hold 22,694 and 18,662 accessions of forages (cereals and legumes), respectively, which include over 1400 species, the vast majority of which are wild [5]. Most forage accessions are distributed as seeds, except for a few that rarely produce seeds, such as Napier grass (or elephant grass, *Pennisetum purpureum*), which are maintained by vegetative propagation in field genebanks. The regeneration of forage germplasm must cope with a great variety of growth habits and development times, owing to the large number of genera and the considerable number of species within each genus. All of this makes it difficult to apply standardized health testing measures on a large scale. For this reason, most of these materials are regenerated in the open field and only occasionally in a screenhouse, which makes integrated pest and disease management expensive. Except for a few widely used species, knowledge of forage pests and diseases is too limited to support rigorous health testing. Most countries lack sufficient records of the pests and diseases affecting forages, which complicates international germplasm transfers. As with tree germplasm, strict integrated phytosanitary control measures are used to maintain germplasm health, but much work is still needed to generate baseline knowledge on seed-borne and seed-transmitted risks in order to develop robust procedures for safe exchange.

#### **5. GHU Support for CGIAR Programs**

#### *5.1. Enabling Safe Germplasm Transfers*

The GHU operations are demand-driven and accommodate the evolving needs of the genebanks and breeding programs. The centers export and receive thousands of germplasm samples from genebanks and breeding programs, as well as elite seed of released cultivars, for evaluation and use by national and international partners [5–7]. Over 80% of the CGIAR centers' germplasm exports are made from twelve countries: Belgium, Colombia, Côte d'Ivoire, Ethiopia, India, Kenya, Lebanon, Mexico, Morocco, Nigeria, Peru, and the Philippines. All of these countries except Belgium and Morocco host the headquarters, main genebanks, and breeding programs of the centers. The remaining international transfers and/or regeneration activities of the CGIAR centers are operated from about 12 to 16 other countries, including Benin, Cameroon, Mali, Malawi, Tanzania, Turkey, Uganda, Vietnam, Zambia, and Zimbabwe, which also host CGIAR breeding programs and ex situ and in situ genebank collections. Germplasm exports from the CGIAR centers located in these countries reach between 90 and 130 countries per year, on all continents (Figure 3).

Uniform standards are applied to all export and import events to ensure pest-free germplasm transfers. For instance, in 2018 and 2019, GHUs facilitated 1300 germplasm transfer events from genebanks and 2600 from breeding programs, to 150 countries (Figure 3; Supplementary Figure S1). This onerous task involves producing germplasm and testing it extensively to ascertain its health status before release to end-users. In 2018 and 2019, GHUs tested 335,928 genebank samples (including those for import, export, and regeneration) and 118,044 breeding samples (for import and export); 7% of the genebank samples and 3% of the breeding samples were found to be infected with pests and were removed (data not shown) [5]. In this process, a total of 2.47 million diagnostic reactions were used to analyze the 453,972 samples over the two years, at an average annual cost of about US\$ 12 million. The proportion of infected samples varies by crop, and the pests most frequently detected are those endemic in regeneration sites. An example set of pest-infested legume seeds is shown in Figure 4, and the percentages of rejections during the phytosanitary testing of crop germplasm in 2019 are shown in Figure 5. Infected accessions are returned to the phytosanitary treatment cycle or replaced with healthy stock, and the infected samples themselves are incinerated. The data and knowledge gained from extensive phytosanitary surveillance of germplasm have helped GHUs to improve their procedures and protocols, some of which have played a vital role in confirming the first occurrence of regulated pests in new territories [15–17]. The technical resources and skill set of GHUs also support the centers' and partners' initiatives in combating emerging pests, supplying reference material for diagnosis and phenotyping, developing national program capacity, and improving awareness and advocacy associated with transboundary pest prevention and control.
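The 2018 and 2019 testing figures can be cross-checked with a short calculation. The derived quantities below (approximate numbers of samples removed, reactions per sample, cost per sample) are back-of-the-envelope estimates, not figures reported by the GHUs; the cost estimate assumes the US$ 12 million annual figure applies to both years and covers all tested samples, and the 2.47 million reaction count is itself rounded.

```python
genebank_samples = 335_928
breeding_samples = 118_044
diagnostic_reactions = 2_470_000   # "2.47 million" (rounded in the text)
annual_cost_usd = 12_000_000       # "about US$ 12 million" per year
years = 2

total_samples = genebank_samples + breeding_samples   # should equal 453,972
removed_genebank = 0.07 * genebank_samples             # 7% of genebank samples
removed_breeding = 0.03 * breeding_samples             # 3% of breeding samples
removed_total = removed_genebank + removed_breeding
reactions_per_sample = diagnostic_reactions / total_samples
cost_per_sample = annual_cost_usd * years / total_samples  # assumption: all costs

print(f"total samples:        {total_samples:,}")
print(f"removed (genebank):   ~{removed_genebank:,.0f}")
print(f"removed (breeding):   ~{removed_breeding:,.0f}")
print(f"overall removal rate: {removed_total / total_samples:.1%}")
print(f"reactions per sample: ~{reactions_per_sample:.1f}")
print(f"cost per sample:      ~US$ {cost_per_sample:.0f}")
```

The sum of the two sample counts matches the 453,972 total given in the text, and the calculation suggests roughly 27,000 samples removed over the two years, about five to six diagnostic reactions per sample, and on the order of US$ 50 per sample tested.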

**Figure 3.** Countries from which the CGIAR centers received germplasm (**A**), and countries that received germplasm from the CGIAR centers (**B**) in 2018–19. The data combine transfers from the genebanks and crop breeding programs. The intensity of the orange shade indicates the number of transfer instances.

**Figure 4.** Symptoms of necrosis, size reduction, and malformation in seeds of faba bean, virus-infected (right) and healthy (left): causal viruses are broad bean stain virus (BBSV) (**A**), broad bean mottle virus (**B**), and bean yellow mosaic virus (**C**). Symptoms of necrosis in lentil seeds caused by BBSV (**D**), virus-infected (right), healthy (left). Cracking caused by pea seed-borne mosaic virus (PSbMV) in pea seed (**E**), virus-infected (right), healthy (left).

**Figure 5.** Percentages of samples removed, due to pest interception during phytosanitary processing.

#### *5.2. Partnerships Enabling GHU Functions*

Excellent partnerships and special arrangements, both formal and informal, between the GHUs and the NPPOs of the host countries and regional plant protection organizations (e.g., the Inter-African Phytosanitary Council in Africa and the Comunidad Andina de Naciones (CAN) in South America) play a significant role in enabling the successful exchange of germplasm. Such arrangements have enabled the establishment of special facilities for processing CGIAR germplasm. For instance, the National Bureau of Plant Genetic Resources (NBPGR), which is responsible for the quarantine monitoring of germplasm exchanges from India, established its Regional Station at Rajendranagar, Hyderabad, India, in 1986 as the sole plant quarantine authority for clearing the germplasm and breeding material of ICRISAT's mandate crops [54]. Similarly, special arrangements exist between the CIAT and the Colombian Institute of Agriculture (ICA) in Colombia; the CIP and the Servicio Nacional de Sanidad Agraria (SENASA) in Peru; the CIMMYT and the Servicio Nacional de Sanidad, Inocuidad y Calidad Agroalimentaria (SENASICA) in Mexico; the IRRI and the Bureau of Plant Industry in the Philippines; the ICARDA and the Plant Protection and Plant Quarantine Department of Lebanon; the IITA and the Nigeria Agricultural Quarantine Service (NAQS) in Nigeria; and the ICRAF, IITA, CIP, and CIMMYT and the Kenya Plant Health Inspectorate Service (KEPHIS) in Kenya, to name a few. Close partnerships also exist with non-NPPO agencies, for example, between the Alliance of Bioversity International and CIAT and the University of Liège (Belgium), and between the ICRAF and the Kenya Forestry Research Institute (Kenya), which enable the centers' GHU activities.
Under these arrangements, the partners recognize GHUs as part of the national diagnostic facilities, undertake collaborative research on phytosanitary issues and pest surveillance, jointly organize national and regional phytosanitary capacity development events on emerging disease control, and carry out awareness and advocacy activities to improve phytosanitary practices, policies, and regulations.

#### *5.3. GHUs in Capacity Development*

The GHUs play important roles in training personnel, building institutional capacity, and raising awareness at a global level. The GHUs organize at least 10 workshops each year for staff from national and regional organizations on various phytosanitary themes, including diagnostics, seed health testing, and seed treatment. Since 2017, GHUs have organized "The International Phytosanitary Awareness Week" in coordination with the NPPOs, RPPOs, and IPPC [70]. These activities aim to inform interested parties about GHUs and their role in the secure distribution of germplasm, as well as about the importance of pathogens and pests in crop production. The week-long event also aims to strengthen links between institutions by consolidating a "Community of Practice", fostering collaboration and exchange in areas such as joint projects, technical-scientific support, and capacity building, with the goal of providing a 'front line' free flow of information to assist in decision-making and in facing upcoming challenges. These meetings bring together representatives of public-sector institutions and academia, as well as staff, to take part in a variety of activities based on a main theme that changes every year. For instance, the 2020 theme focused on *"phytosanitary safety for transboundary pest prevention"* to mark the United Nations International Year of Plant Health 2020 (IYPH 2020) [71]. These activities increase awareness of the importance of phytosanitation locally and globally and create a scientific forum for discussing the phytosanitary challenges and organizational responsibilities associated with distributing healthy seed and sustaining agricultural production, which will contribute to the global fight against hunger and malnutrition.

#### **6. Challenges and Opportunities**

#### *6.1. Evaluation and Reevaluation of Germplasm Health*

The comprehensive phytosanitary testing procedure used to declare the health status of an accession and its suitability for safe distribution or conservation is a tedious and time-consuming task (about 6 to 24 months for clonally propagated crops and 3 to 6 months for true seed crops). Untested accessions and accessions that fail health tests are marked as "unavailable for distribution". For instance, about 75% of the true seed accessions of staple crops held in the ex situ collections of the CGIAR genebanks have been health certified and are available for distribution, compared with about 55% of clonally propagated accessions [5]. However, the discovery of new pests in a crop species sometimes necessitates reevaluating accessions for the newly reported pests and recertifying their health status and availability for distribution. For instance, the discovery of new viruses and a phytoplasma involved in the etiology of cassava frogskin disease in Colombia [60] led to a reevaluation and recertification of the in vitro cassava collection held at the CIAT in Colombia. Similarly, the discovery of the causal viruses of cassava brown streak disease (CBSD) in 2002 and the CBSD outbreak in the Great Lakes region of Eastern Africa in 2008 [57] led to a precautionary evaluation of the in vitro and field collections of cassava conserved at the IITA Ibadan station in Nigeria. The outbreak of MLN in Eastern Africa in 2011 disrupted the distribution of maize germplasm in the region; distribution resumed after the establishment of procedures for the reliable detection of maize chlorotic mottle virus (MCMV) in seed lots and phytosanitary procedures for the safe production of maize seed [42]. While such instances are infrequent, they have a significant impact on germplasm distribution and incur additional costs.
At the same time, these instances have helped GHUs gain experience in quickly adapting to new pest situations and in establishing optimal protocols for phytosanitation and diagnostics to restore the phytosanitary protection of germplasm and field crops. They have also helped GHUs engage with national, regional, and continental efforts to control epidemics caused by introduced transboundary pests or by the emergence of new strains or species of an established threat.

#### *6.2. Variable Standards and Different Phytosanitary Demands*

GHUs work on different crops, wild relatives, and pests in various countries. Their needs therefore vary with the phytosanitary status of the crops in each geography, the technologies available for detection, diagnosis, and phytosanitation, and the standards adopted by the NPPO in the country of operation [45]. For instance, the import conditions for cassava between Nigeria and Ghana differ from those between Nigeria and Vietnam, due to different pest risks. The standards for germplasm distribution from genebanks are not always well established. NPPOs adopt ISPMs designed for commercial consignments of plants and plant products, with their own specific modifications for the small sample sizes distributed from genebanks. Owing to better knowledge of pest risks, the standards for some crops are relatively well defined (e.g., banana, bean, cassava, cowpea, chickpea, groundnut, maize, potato, rice, and wheat) [47]. However, the vast taxonomic range, geographic diversity, and limited knowledge of the pest risks to crop wild relatives, trees, and forages pose significant challenges to implementing appropriate testing standards for pest detection. To overcome some of these challenges, GHUs began developing harmonized Quality Management Systems (QMS), termed the GHU-QMS, to achieve uniform standards across GHUs. The GHUs of the CIAT, CIMMYT, and CIP are ISO/IEC 17025 accredited for the quality assurance of seed health testing methods. As of 2019, 139 SOPs had been developed, with 7 to 30 per GHU, depending on the center and country. These procedures were introduced with the aim of having all GHUs conform to uniform standards by the end of 2021.

The implementation of phytosanitary measures and policies for tree germplasm is critically lacking in Africa. The extraordinary taxonomic and geographic diversity of the collected tree germplasm, and the availability of field genebanks in addition to seed banks, present an opportunity for boosting the detection and characterization of emerging pathogens in line with the "sentinel plant" concept [72]. This should foster fruitful collaborations with the NPPOs and IPPC and contribute to the much-needed updating of the lists of quarantine pests and diseases of tree species.

#### *6.3. Changes in Pest Dynamics*

The changes occurring in the dynamics of pests have a significant impact on germplasm transfers from the centers. Several economically important pest outbreaks in the last decade were attributed to introduced pests, as explained in the previous sections [14]. The perception of pest risk is also influenced by the severe destruction caused by unrelated pest outbreaks. For instance, the olive decline caused by introduced *Xylella fastidiosa* in Italy, citrus greening caused by *Candidatus* Liberibacter spp. in the USA, and several other examples, including the COVID-19 pandemic, have significantly influenced regulatory procedures and decision-making relating to germplasm transfers [13]. In addition, the discovery of new virus species using novel diagnostic technologies is adding to the burden of risks from pests that are already known [73,74]. A study estimated that many alien pests introduced into countries have yet to be detected [13], a status termed "pseudo-absence", in which a pest may be present in a region but is considered absent because it has not been found. This familiar but unquantified risk of "known-unknowns" and "unknown knowns" is a major threat to international germplasm exchange programs, which rely on knowledge of pest occurrence in the country of germplasm origin.

Over the years, GHUs have adjusted to changing pest dynamics, including undetermined pest risks, and have taken adaptive measures to sustain operations. Following the MLN outbreak in East Africa, the CIMMYT GHU team established sampling and treatment procedures to sustain maize germplasm transfers. Similarly, the IITA and CIAT established cassava virus elimination protocols to maintain germplasm transfers between continents, including the use of transit centers for intermediary evaluation before delivering material to its final destination. Recently, the ICRAF GHU documented the invasive pests of native African tree germplasm, providing a resource for updating pest lists [69]. For true seed crops, health testing of plants from seedling to harvest, combined with a seed health test, offers robust measures for detecting both known and new pathogens. Clonal crops are more complicated, especially with respect to cryptic and latent viruses, which induce no symptoms and evade detection. To overcome these challenges, GHUs have adopted HTS technologies and the bioinformatic reconstruction of viral sequences, which make it conceptually feasible to detect any viral agent by sequencing the nucleic acids of a host and identifying the sequences of known or unknown agents among the generated reads [73,74]. These developments will strongly affect the way virus diagnostics is performed in the coming years. A pilot project applying HTS technologies to improve the virus indexing of clonal crop germplasm accessions has been initiated for banana, cassava, potato, sweetpotato, and yam at Bioversity International, the CIAT, the CIP, and the IITA.
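The "known agents" half of HTS-based indexing can be illustrated with a toy k-mer screen: reads from a host sample are flagged when they share exact k-mers with a reference viral sequence. This is a deliberately simplified sketch, not the GHU or pilot-project pipeline: it checks the forward strand only, uses exact matching, and cannot find unknown agents, which additionally require de novo assembly of reads and homology searches. The function names, parameters, and the synthetic sequences in the example are all hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of a nucleotide sequence (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads: list[str], references: dict[str, str],
                 k: int = 21, min_shared: int = 1) -> dict[str, int]:
    """Count reads sharing at least `min_shared` k-mers with each reference.

    Toy illustration only: forward strand, exact k-mer matches, and only
    KNOWN references can be found this way.
    """
    index = {name: kmers(seq, k) for name, seq in references.items()}
    hits = {name: 0 for name in references}
    for read in reads:
        read_kmers = kmers(read, k)
        for name, ref_kmers in index.items():
            if len(read_kmers & ref_kmers) >= min_shared:
                hits[name] += 1
    return hits


# Synthetic sequences for illustration; not real viral genomes.
refs = {"virus_A": "ACGTTGCAAGGCTTACGATCGATCGGATCCA",
        "virus_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAA"}
reads = ["GCAAGGCTTACGATC", "GGGGCCCCAAAATTT", "ATATATATATATATA"]
print(screen_reads(reads, refs, k=8))  # {'virus_A': 1, 'virus_B': 1}
```

In practice, HTS pipelines rely on dedicated aligners and assemblers that handle sequencing errors, reverse complements, and divergent strains, which is part of why the text notes that PCR and ELISA remain the reference methods while HTS is being piloted.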

#### *6.4. Keeping up with Evolving Technologies*

New technologies are constantly emerging for phytosanitation and for the faster, more accurate detection of existing and newly diagnosed pests. GHUs strike a balance when adopting them, favoring technologies that offer cost and time efficiency, meet regulatory requirements, and comply with ISO/QMS systems. The GHU operating system favors well-standardized procedures for as long as they remain effective and offer reliable results for decision-making. Developing new standardized procedures is expensive and time-consuming, and it requires extensive testing under various scenarios to determine the robustness, reliability, and suitability of the new method for its intended purpose. GHUs aim to stay up to date and relevant while avoiding change for the sake of change. For example, the GHUs' use of HTS-based diagnostic methods in the phytosanitary context is limited to the virus indexing of mother stocks, while PCR- and ELISA-based methods remain the 'gold standard' for virus indexing.

GHUs have identified a need to intensify efforts towards developing nucleic acid-based detection protocols for several pests that are difficult to detect through routinely used conventional tests, such as the blotter technique. Efforts are also required to standardize protocols for non-invasive techniques for detecting seed-borne pests (e.g., Videometer spectral imaging for detecting fungal pathogens and soft X-ray analysis to detect hidden seed infestation by pests) [75]. Similarly, new and safe solutions for crop protection and seed treatment are needed, as some fungicide and insecticide treatments are banned or restricted for use on specific crops in some countries. Given the high volume of samples processed annually, mobile digital data collection devices are needed to facilitate the processing of materials; they would also notably improve the traceability of the process and enable real-time data collection and analysis.

#### *6.5. Insufficient Phytosanitary Standards for Germplasm Transfers from Genebanks and Breeding Programs*

Specific phytosanitary standards for the international exchange of germplasm have not been developed. The FAO Genebank Standards [35] lack adequate detail on the procedures for the import and export of germplasm from international genebanks. Therefore, NPPOs either develop and follow their own norms or follow those prescribed through ISPMs, which were established to address the SPS regulations governing trade in plants and plant products as part of the WTO treaty [34]. To date, the 43 ISPMs developed by the IPPC are aligned with SPS requirements concerning commercial trade and large consignment volumes. These regulations are inadequate for the international transfer of germplasm. ISPM 36 on the "international movement of plants for propagation" [76] and ISPM 38 on the "international movement of seed" [77] address a few of these issues but are mainly designed for commercial shipment volumes. The ad hoc norms for germplasm exchange from genebanks and breeding programs impose different requirements depending on the country, making germplasm transfers a challenging endeavor. In addition, conflicting regulatory frameworks in different countries, arising from outdated regulations, outdated pest lists, or their absence, constrain the exchange of germplasm. All of this delays clearance, so that germplasm may lose viability before it reaches its destination, or arrive late, resulting in the loss of an entire planting season.

Other challenges emerge from unforeseen policy changes in the countries of operation. Policy changes are most often triggered by (i) new pest outbreaks, (ii) the perceived risk of invasive pests spreading into territories, (iii) the introduction of new or amended procedures, and (iv) changes to administrative and implementation protocols. GHUs have adopted the flexibility needed to align with policy requirements in the countries of operation and thus enable germplasm distribution. In some cases, however, policies do not account for biological complexities and thereby restrict germplasm distribution. For instance, the genomes of some viruses are integrated into host genomes (e.g., endogenous badnavirus sequences in banana and yam genomes). In essence, integrated viral genomes are an inseparable part of the host. Because existing regulations do not consider these complexities, all germplasm with integrated viral genomes was withheld from international transfer, amounting to over 50% of the *Musa* collection held in the CGIAR genebanks. In 2015, the IITA and Bioversity International GHUs, together with the MusaNet Working Group on Genetic Resources, developed a new approach to the transfer of germplasm with integrated viral sequences (see Section 4 for details) [58]. NPPOs have approved international transfers of *Musa* germplasm organized in accordance with this protocol, making a significant proportion of banana germplasm available again for distribution.

Current phytosanitary policies are also insufficient to address germplasm "safety duplication" efforts (also referred to as black box conservation) in the Svalbard Global Seed Vault and/or in other third-party countries [5]. Safety duplication involves transferring both health-certified and untested accessions to another (third-party) country, in sealed envelopes or as in vitro plants, exclusively for conservation and for repatriation to the "country of origin" when required. NPPO requirements, however, are difficult to fulfill, as the procedures stipulate mandatory health declarations and prohibit the entry of untested germplasm. Nevertheless, ad hoc bilateral arrangements have been established between countries of origin and third-party countries to facilitate safety duplication as an interim measure. This system works, although not always smoothly, owing to ambiguities arising from differing interpretations among NPPOs. GHUs are working with regulatory agencies to establish a standard policy that streamlines the procedure for this important genebank activity.

To address some of the phytosanitary policy challenges associated with germplasm exchange, GHUs have initiated the development of the "CGIAR GreenPass Phytosanitary Protocol (GreenPass)" [78] as a comprehensive procedure for assuring phytosanitary compliance. This protocol will detail the best procedures in use for germplasm regeneration and health assurance, while maintaining transparency in risk assessment and mitigation strategies, in order to obtain NPPO accreditation and fast-track germplasm distribution. It is hoped that endorsement of this initiative by the IPPC and other stakeholders will eliminate redundant checks or reduce processing times for material from GreenPass-accredited facilities.

#### **7. Conclusions**

The CGIAR germplasm health program has over 50 years of experience [41]. GHUs have served as a vital conduit for the globally coordinated CGIAR crop research programs, which have tested thousands of germplasm accessions and new breeding lines at multiple field sites and in mega-environments to identify lines with superior yields, high nutritional value, and resilience to biotic and abiotic stresses. The seeds of those accessions were made widely available for crop productivity improvement, leading to broad social, economic, and environmental impacts [79–81]. For instance, the International Wheat Improvement Network (IWIN) organized approximately 700 field sites in over 90 countries to develop around 1000 high-yielding, disease-resistant lines targeted at major agro-ecologies, which are delivered annually as international public goods (IPGs) [81]. GHUs continue to facilitate crucial germplasm transfers to a very large number of stakeholders around the world, which is vital for delivering IPGs with a positive impact on the SDGs associated with (i) nutrition and food security; (ii) poverty reduction; (iii) environmental health and biodiversity; and (iv) climate adaptation and greenhouse gas reduction.

The efforts of GHUs in thoroughly testing germplasm accessions for known pests before their release for international transfer have averted the inadvertent spread of quarantine pests. This is of great significance, as most CGIAR centers operate in countries where some of the most dreaded pests are prevalent (e.g., cassava brown streak virus, Karnal bunt, maize lethal necrosis, rice blight, and wheat blast, to name a few) [82]. Years of experience indicate that adaptability is vital for sustaining operations in an era of constant change driven by pest outbreaks, agricultural intensification, climate variability, and evolving phytosanitary policies and regulations [83]. A study on the invasion patterns and spread pathways of 1517 invasive species reported that horticulture and the nursery trade are the dominant pathways for the incursion of invasive alien species [84]. Increasing international exchange and globalization create a high risk that introduced pests will become established and spread quickly [82]. Safe and efficient germplasm transfer forms a critical preventive pest control approach for the CGIAR programs under the IPPC treaty and national laws. It is also safe to assume that the drivers of transboundary pest outbreaks are difficult to contain and that high levels of vigilance will be required to monitor pest dynamics in order to sustain CGIAR operations. This requires regularly updating existing protocols to cover hitherto unknown pests, enhanced collaboration with phytosanitary organizations and academia to obtain the most advanced information on pest detection and epidemiology, and adequate funding for continuous adaptation to new pest challenges. It is imperative for GHUs to leverage technological advances in diagnostics, ICTs, remote sensing, and modeling to predict and monitor pest dynamics at a global level, in order to understand pest dispersal mechanisms and their impact on genebank and breeding programs in the short, medium, and long term.

The GHUs' high-level capacity, experience, track record, and global distribution in the developing world enable them to play an important role as centers of excellence supporting national and regional pest and disease surveillance and rapid response. A strong case exists for positioning GHUs as part of a global network of phytosanitary hubs for the research, diagnosis, and control of established and emerging pests as part of the One CGIAR program, which is set to be operational in 2022 [85].

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2223-7747/10/2/328/s1, Table S1: Details of crops and country locations of the CGIAR GHUs; Table S2: List of crops and pests routinely assessed by the CGIAR GHUs; Figure S1: Countries receiving germplasm from genebanks (A) and breeding programs (B), and the countries from which germplasm was received by the CGIAR genebanks (C) and CGIAR breeding programs (D), during 2018–19. The intensity of blue reflects the number of transfer events.

**Author Contributions:** The first author conceived the article, guided its development, and prepared the first draft; and all authors contributed equally to the preparation of the article. All authors have read and agreed to the published version of the manuscript.

**Funding:** The writing of this article and open access publication fees were funded through the CGIAR Genebank Platform.

**Institutional Review Board Statement:** Not applicable for studies not involving humans.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We thank the editors of this special issue, Andreas Ebert and Jan Engels, for inviting us to write this article and for their critical review and valuable suggestions for improvement. We also thank Nelissa Jamora of Crop Trust (Bonn, Germany) and Alabi Tunrayo of IITA for their help with the preparation of geographic maps used in this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

